Polar Codes for Sources with Finite Reconstruction Alphabets

Polar Codes for Sources with Finite Reconstruction Alphabets Aria G. Sahebi and S. Sandeep Pradhan Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 4809, USA. Email: ariaghs@umich.edu, pradhanv@umich.edu Abstract Polar codes are a new class of codes introduced by Arikan capable of achieving the symmetric capacity of discrete memoryless channels. It was later shown that polar codes can achieve the symmetric rate-distortion bound for the lossy source coding problem when the size of the reconstruction alphabet is a prime. In this paper, we show that polar codes achieve the symmetric rate-distortion bound for finite-alphabet sources regardless of the size of the reconstruction alphabet. I. ITRODUCTIO Polar codes were originally proposed by Arikan in [] for discrete memoryless channels with a binary input alphabet. Polar codes over binary input channels are coset codes capable of achieving the symmetric capacity of channels. These codes are constructed [ ] based on the Kronecker power 0 of the 2 2 matrix and are the first known class of capacity achieving codes with an explicit construction. It was subseuently shown by Korada and Urbanke [2] that polar codes employed with a successive cancelation encoder are also optimal for the lossy source coding problem when the size of the reconstruction alphabet is two. Karzand and Telatar extended this result to the case where the size of the reconstruction alphabet is a prime [3]. In an earlier work [4], we have shown that polar codes can achieve the symmetric capacity of arbitrary discrete memoryless channels regardless of the size of the channel input alphabet. It is shown in [4] that in general, when the channel input alphabet is an Abelian group G, channel polarization occurs in several levels. For each subgroup H of G, there exists a corresponding asymptotic case in which the channel output can determine the coset of the subgroup to which the channel input belongs. Then a modification is made to the encoding and decoding rules to achieve the symmetric capacity of the channel. As mentioned in [4], polar codes employed with the modified encoding rule fall into a class of codes called nested group codes. In this paper, motivated by the results of [2] [4], we show that polar codes achieve the symmetric rate-distortion bound for the lossy source coding problem when the size of the reconstruction alphabet is finite. We show that similar to the channel coding problem, polar transformations applied to test channels can converge to several asymptotic cases each corresponding to a subgroup H of the reconstruction alphabet G. We employ a modified randomized rounding encoding rule to achieve the symmetric rate-distortion bound. This paper is organized as follows: In Section II, some definitions and basic facts are stated which are used in the paper. In Section III we show that polar codes achive the symmetric rate-distortion bound when the size of te reconstruction alphabet is of the form p r where p is a prime and r is a positive integer. This result is generalized in Section IV to the case where the reconstruction alphabet is some arbitrary Abelian group of arbitrary size and arbitrary group operation. We conclude in Section V. II. PRELIMIARIES Source and Channel Models: The source is modeled as a discrete-time random process X with each sample taking values in a fixed finite set X with probability distribution p X. The reconstruction alphabet is denoted by U and the uality of reconstruction is measured by a single-letter distortion function d : X U R +. We denote the source by X, U,p X,d. A test channel is specified by a tuple U, X,W where U is the channel input alphabet, X is the channel output alphabet and W is the channel transition kernel. 2 Achievability and the Rate-Distortion Function: A transmission system with parameters n,θ,,τ for compressing a given source X, U,p X,d consists of an encoding mapping and a decoding mapping e : X n {,2,,Θ}, g : {,2,,Θ} U n such that the following condition is met: P dx n,gex n > τ where X n is the random vector of length n generated by the source. In this transmission system, n denotes the block length, log Θ denotes the number of channel uses, denotes the distortion level and τ denotes the probability of exceeding the distortion level. Given a source, a pair of non-negative real numbers R,D is said to be achievable if there exists for every ǫ > 0, and for

all sufficiently large numbers n a transmission system with parameters n,θ,,τ for compressing the source such that log Θ R + ǫ, D + ǫ, τ ǫ n The optimal rate distortion function R D of the source is given by the infimum of the rates R such that R,D is achievable. It is known that the optimal rate-distortion function is given by: RD min IX;U p U X E px p U X {dx,u} D where p U X is the conditional probability of U given X and p X p U X is the joint distribution of X and U. The symmetric rate-distortion function RD is defined as follows: RD min p U X E px p U X {dx,u} D p U IX;U where p U is the marginal distribution of U given by p U u x X p Xxp U X u x and is the size of the reconstruction alphabet U. 3 Binary Polar Codes: For any 2 n, a polar code of length designed for the channel Z 2, Y,W is a linear code characterized by a generator matrix G and a set of indices A {,,} of perfect channels. The generator matrix for polar codes is defined as [ G ] B F n where 0 B is a permutation of rows, F and denotes the Kronecker product. The set A is a function of the channel. The decoding algorithm for polar codes is a specific form of successive cancellation []. For the source coding problem, when the size of the reconstruction alphabet is binary, the encoding rule is a form of soft decicion successive encoding and the decoding rule is a matrix multiplication [2]. 4 Groups, Rings and Fields: All groups referred to in this paper are Abelian groups. Given a group G, +, a subset H of G is called a subgroup of G if it is closed under the group operation. In this case, H,+ is a group on its own right. This is denoted by H G. A coset C of a subgroup H is a shift of H by an arbitrary element a G i.e. C a + H for some a G. For any subgroup H of G, its cosets partition the group G. A transversal T of a subgroup H of G is a subset of G containing one and only one element from each coset shift of H. We give some examples in the following: The simplest nontrivial example of groups is Z 2 with addition mod-2 which is a ring and a field with multiplication mod-2. The group Z 2 Z 2 is also a ring and a field under component-wise mod-2 addition and a carefully defined multiplication. The group Z 4 with mod-4 addition and multiplication is a ring but not a field since the element 2 Z 4 does not have a multiplicative inverse. The subset {0, 2} is a subgroup of Z 4 since it is closed under mod-4 addition. {0} and Z 4 are the two other subgroups of Z 4. The group Z 6 is neither a field nor a ring. Subgroups of Z 6 are: {0}, {0,3}, {0,2,4} and Z 6. 5 Polar Codes Over Abelian Groups: For any discrete memoryless channel, there always exists an Abelian group of the same size as that of the channel input alphabet. In general, for an Abelian group, there may not exist a multiplication operation. Since polar encoders are characterized by a matrix multiplication, before using these codes for channels of arbitrary input alphabet sizes, a generator matrix for codes over Abelian groups needs to be properly defined. In Appendix A of [5] a convention is introduced to generate codes over groups using {0, }-valued generator matrices. 6 Channel Parameters: For a test channel U, X, W assume U is euipped with the structure of a group G,+. The symmetric capacity of the test channel is defined as I 0 W IU;X where the channel input U is uniformly distributed over U and X is the output of the channel; i.e. for U, I 0 W u U x X Wx ulog Wx u ũ U Wx ũ The Bhattacharyya distance between two distinct input symbols u and ũ is defined as ZW {u,ũ} x X Wx uwx ũ and the average Bhattacharyya distance is defined as ZW u,ũ U u ũ ZW {u,ũ} We use the following two uantities in the paper extensively: D d W Wx u Wx u + d u U x X D d W Wx u Wx u + d 2 u U x X where d is some element of G and + is the group operation. 7 otation: We denote by Oǫ any function of ǫ which is right-continuous around 0 and that Oǫ 0 as ǫ 0. We denote by a ǫ b to mean a b + Oǫ. For positive integers and r, let {A 0,A,,A r } be a partition of the index set {,2,,}. Given sets T t for t 0,,r, the direct sum r T t At is defined as the set of all tuples u u,,u such that u i T t whenever i A t.

III. POLAR CODES FOR SOURCES WITH RECOSTRUCTIO ALPHABET Z p r In this section, we consider sources whose reconstruction alphabet size is of the form p r for some prime number p and a positive integer r. In this case, the reconstruction alphabet can be considered as a ring with addition and multiplication modulo p r. We prove the achievability of the symmetric rate-distortion bound for these sources using polar codes and later in Section IV we will generalize this result to sources with arbitrary finite reconstruction alphabets. A. Z p r Rings Let G Z p r {0,,2,,p r } with addition and multiplication modulo p r be the input alphabet of the channel, where p is a prime and r is an integer. For t 0,,,r, define the subgroups H t of G as the set: H t p t G {0,p t,2p t,,p r t p t } and for t 0,,,r, define the subsets K t of G as K t H t \H t+ ; i.e. K t is defined as the set of elements of G which are a multiple of p t but are not a multiple of p t+. ote that K 0 is the set of all invertible elements of G and K r {0}. One can sort the sets K 0 > K > > K r in a decreasing order of invertibility of its elements. Let T t be a transversal of H t in G; i.e. a subset of G containing one and only one element from each coset of H t in G. One valid choice for T t is {0,,,p t }. ote that given H t and T t, each element g of G can be represented uniuely as a sum g ĝ + g where ĝ T t and g H t. B. Recursive Channel Transformation The Basic Channel Transforms: For a test channel U G, X,W where G, the channel transformations are given by: W x,x 2 u Wx u + u 2Wx 2 u 2 u 2 G W + x,x 2,u u 2 Wx u + u 2 Wx 2 u 2 2 for x,x 2 X and u,u 2 G. Repeating these operations n times recursively, we obtain 2 n channels W,,W. For i,,, these channels are given by: W i x,u i u i W x u G u i+ G i where G is the generator matrix for polar codes. Let V be a random vector uniformly distributed over G and assume the random vectors V, U and X are distributed over G G X according to p V U Xv,u,x n {u v G } 3 Wx i u i i The conditional probability distribution induced from the above euation is consistent with 3. We use this probability distribution extensively throughout the paper. C. Encoding and Decoding Let n be a positive integer and let G be the generator matrix for polar codes where 2 n. Let {A t 0 t r} be a partition of the index set {,2,,}. Let b be an arbitrary element from the set r HAt t. Assume that the partition {A t 0 t r} and the vector b are known to both the encoder and the decoder. The encoder maps a source seuence x to a vector v G by the following rule: For i,,, if i A t, let v i be a random element g from the set b i + T t picked with probability Pv i g P Vi V i,x g v i,x P Vi V i,x b i + T t v i,x This encoding rule is a generalization of the randomized rounding encoding rule used for the binary case in [2]. Given a seuence v G, the decoder decodes it to v G. For the sake of analysis we assume that the vector b is uniformly randomly distributed over the set r HAt t Although it s common information between the encoder and the decoder. The average distortion is given by E{dX,U } and the rate of the code is given by R r r A t A t log T t t log p D. Test Channel Polarization The following result has been proved in [5]: For all ǫ > 0, there exists a number ǫ 2 nǫ and a partition {A ǫ 0,A ǫ,,a ǫ r} of {,,} such that for t 0,,r and i A ǫ t, Z d W i < Oǫ if d H s for 0 s < t and Z d W i > Oǫ if d H s for t s < r. For t 0,,r and i A ǫ t, we have IW i t logp+oǫ and Z t W i Oǫ where Z t W Z d W H t d H t Moreover, as ǫ 0, p 0,,p r. A ǫ t p t for some probabilities In the next section, we show that for any β < 2 and for t 0,,r, lim P Z t n > 2 2βn P Z t 4 n r st Remark III.. This observation implies the following stronger result: For all ǫ > 0, there exists a number ǫ 2 nǫ and a partition {A ǫ 0,A ǫ,,a ǫ r} of {,,} such that for t 0,,r and i A ǫ t, p s

IW i t logp + Oǫ and Zt W i > 2 2βnǫ. A Moreover, as ǫ 0, ǫ t p t for some probabilities p 0,,p r. E. Rate of Polarization In this section we derive a rate of polarization result for the source coding problem. In this proof, we do not assume is a power of a prime and hence the rate of polarization result derived in this section is valid for the general case. It is shown in [6] with a slight generalization that if a random process Z n satisfies the following two properties Z n+ kz n w.p. 2 Z n+ Zn 2 w.p. 6 2 for some constant k, then for any β < 2, lim n PZ n < 2 2βn PZ 0. We prove that the random process D d n satisfied these properties. First note that by definition D d W + [ Wx u + u 2 Wx 2 u 2 u 2 G x,x 2 X u G Wx u + u 2 + dwx 2 u 2 + d If we add and subtract the term Wx u +u 2 Wx 2 u 2 +d to the term inside brackets and use the ineuality a+b 2 2a 2 + b 2 with a Wx u +u 2 Wx 2 u 2 Wx u + u 2 Wx 2 u 2 +d b Wx u + u 2 Wx 2 u 2 + d Wx u + u 2 + dwx 2 u 2 + d we obtain D d W + 2 2 u 2 G x [,x 2 X,u G Wx u +u 2 Wx 2 u 2 Wx u +u 2 Wx 2 u 2 +d 2 + Wx u + u 2 Wx 2 u 2 + d 5 Wx u + u 2 + dwx 2 u 2 + d 2] 7 This summation can be expanded into two separate summations. For the first summation, we have 2 2 u 2 G x,x 2 X,u G Wx u +u 2 Wx 2 u 2 Wx u +u 2 Wx 2 u 2 +d 2 2 2 2 u 2 G x,x 2 X,u G Wx u + u 2 2 Wx 2 u 2 Wx 2 u 2 + d 2 Wx 2 u 2 Wx 2 u 2 + d 2 u 2 G x 2 X 2 D d W 8 ] 2 Similarly, for the second summation we can show that u 2 G x,x 2 X u G 2 2 Wx u + u 2 Wx 2 u 2 + d Wx u + u 2 + dwx 2 u 2 + d 2 2 D d W 9 Therefore, it follows from 7, 8 and 9 that condition 5 is satisfied for k 4. ext we show that Dd W Dd W 2. ote that D d W v G x,x 2 X v G x,x 2 X [ v 2 G 2 v 2 G Wx v + v 2 Wx 2 v 2 Wx v + d + v 2 Wx 2 v 2 [ v 2 G ] 2 Wx 2 v 2 Wx v + v 2 2 ] 2 2 Wx v + d + v 2 The suared term in the brackets can be expanded as 2 v 2,v 2 G Wx 2 v 2 Wx 2 v 2Wx v + v 2 Wx v + d + v 2 Wx v + v 2 Wx v + d + v 2 Therefore, Dd W can be written as a summation of four terms D d W D + D 2 + D 3 + D 4 where D v G x,x 2 X 2 v 2,v 2 G Wx 2 v 2 Wx 2 v 2Wx v + v 2 Wx v + v 2 D2 2 Wx 2 v 2 v G x,x 2 X v 2,v 2 G Wx 2 v 2Wx v + v 2 + dwx v + v 2 + d D3 2 Wx 2 v 2 v G x,x 2 X v 2,v 2 G Wx 2 v 2Wx v + v 2 Wx v + v 2 + d D4 2 Wx 2 v 2 v G x,x 2 X v 2,v 2 G Wx 2 v 2Wx v + v 2 + dwx v + v 2 For d G define S d W Wx vwx v + d v G x X

ote that S d W S d W. We have D 2 Wx 2 v 2 Wx 2 v 2 x 2 X v 2,v 2 G Wx v + v 2 Wx v + v 2 v G x X 2 Wx 2 v 2 Wx 2 v 2S v2 v W 2 x 2 X v 2,v 2 G 2 Wx 2 v 2 Wx 2 v 2 as a W x 2 X v 2 G v 2 v2 a 2 S a W Wx 2 v 2 Wx 2 v 2 a v 2 G x 2 X 2 S a WS a W 2 S a W 2 With similar arguments we can show that D2 2 S a W 2 D3 2 S a WS a d W D4 2 S a WS a d W Therefore D d W 4 Sa W 2 S a WS a d W 2 S a W S a d W 2 ote that D d W Wx v Wx v + d 2 Therefore v G x X 2S 0 W 2S d W 2 Dd W 4S0 W S d W 2 2 To show that Dd W Dd W it suffices to show that S a W S a d W S 0 W S d W. We will make use of the rearrangement ineuality: Lemma III.. Let π be an arbitrary permutation of the set {,,n. If a a n and b b n then n n a i b i a i b πi i i The rearrangement ineuality implies that S 0 W S d W S a W S a d W. Therefore it follows that condition 6 is also satisfied and hence lim n P D d n < 2 2βn P D d 0. It has been shown in Appendix D of [5] that D d W < ǫ implies Z d W > ǫ. This completes the rate of polarization result. F. Polar Codes Achieve the Rate-Distortion Bound The average distortion for the encoding and decoding rules described in Section III-C is given by D avg p Xx x X b L r HA t t v b +L r T A t t r g v i,x P Vi V i,x P i A t Vi V i,x b i + T t v i,x r dx H t At,v G p Xx x X b L r HA t t v b +L r T A t t r g v i,x i A t This can be written as P Vi V i,x P Vi V i,x b i + T t v i,x H t D avg E Q {dx,v G } where the distribution Q is defined by and Qv i v i,x and hence Qv,x Recall that i dx,v G P Vi V i,x v i v i,x P Vi V i,x Qx p Xx Qv i v i,x r i A t Pv,x b i + T t v i,x H t P Vi V i,x v i v i,x P Vi V i,x i b i + T t v i,x H t P Vi V i,x v i v i,x The total variation distance between the distributions P and Q is given by P Q t.v. Qv,x Pv,x v G,x X x X p Xx v G Qv x Pv x

We have v G v G a [ v G Qv x Pv x r i Pv i v,x Pb i A i + T t v i t,x H t r Pv i v i,x i A t r i A t Pv i v i,x Pb i + T t v i r r,x H t Pv i v i ji+ j A t i j Pv i v i,x Pb i + T t v i,x i Pv i v,x j A t,x H t where in a we used the telescopic ineuality introduced in [2]. It is straightforward to show that P Q t.v. r E i A t { } Pb i + T t v i,x H t It has been shown in Appendix D of [5] that if Z d W > ǫ then D d W 2ǫ ǫ 2. Therefore if Z d W i > ǫ for all d H we have D d W i v i G v i G i,x X W i vi,x v i W i vi 2ǫ ǫ 2 Therefore for all v i G v i G i x X,x v i + d W i vi,x v i W i vi,x v i + d 2ǫ ǫ 2 We have { } E H t PT t V i,x i Pv,x H t PT t v i,x v i G i,x X H t Pvi,x Pg,v i,x g T t v i G i x X v i G i,x X v i G i,x X G H t g T t d H 2ǫ ǫ2 H t g T t [ G Wvi,x g H t G Wvi,x g + d] d H [ Wv i,x g G d H g T t H t Wvi,x g + d] v i G i,x X Wv i,x g Wv i,x g + d Therefore for if for all d H t, Z d > ǫ then for δ 2ǫ ǫ 2 H t we have { } E P H t Pb i + T t V i,x < δ A similar argument as in [2] implies that D avg E Q{dX,V G } E P {dx,v G } + r A t d max δ where d max is the maximum value of the distortion function. ote that Therefore, E P {dx,v G } D D avg D + r A t d max δ ote that from the rate of polarization derived in Section III-E we can choose ǫ to be ǫ 2 2βn. This implies that as n, D + r A t d max δ D. Also note that the rate of the code R r A t t log p converges to r p tt log p and this last uantity is eual to the symmetric capacity of the test channel since the mutual information is a martingale. This means the rate IX;U is achievable with distortion D.

IV. ARBITRARY RECOSTRUCTIO ALPHABETS For an arbitrary Abelian group G, The following polarization result has been provided in [5]: For all ǫ > 0, there exists a number ǫ 2 nǫ and a partition {A ǫ H H G} of {,,} such that for H G and i A ǫ H, Z dw i < Oǫ if d H and Z dw i > Oǫ if d / H. For H G and i A ǫ H, we have IW i G log H + Oǫ and ZH W i Oǫ where Moreover, as ǫ 0, p H,H G. Z H W Z d W H d H A ǫ H p H for some probabilities As mentioned earlier the rate of polarization result derived in Section III-E is valid for the general case. Therefore it follows that for any β < 2 and for H G, lim P Z H n > 2 2βn P Z H n 0 V. COCLUSIO We have shown that polar codes achieve the symmetric rate-distortion bound for finite-alphabet sources regardless of the size of the reconstruction alphabet. REFERECES [] E. Arikan, Channel Polarization: A Method for Constructing Capacity- Achieving Codes for Symmetric Binary-Input Memoryless Channels, IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 305 3073, 2009. [2] S. B. Korada and R. Urbanke, Polar Codes are Optimal for Lossy Source Coding, IEEE Transactions on Information Theory, vol. 56, no. 4, pp. 75 768, Apr. 200. [3] M. Karzand and E. Telatar, Polar Codes for Q-ary Source Coding, Proceedings of IEEE International Symposium on Information Theory, 200, austin, TX. [4] A. G. Sahebi and S. S. Pradhan, Multilevel Polarization of Polar Codes Over Arbitrary Discrete Memoryless Channels, Proc. 49th Allerton Conference on Communication, Control and Computing, Sept. 20. [5], Multilevel Polarization of Polar Codes over Arbitrary Discrete Memoryless Channels, July 20, online: http://http://arxiv.org/abs/07.535. [6] E. Arikan and E. Telatar, On the rate of channel polarization, Proceedings of IEEE International Symposium on Information Theory, 2009, Seoul, Korea. S H p H Remark IV.. This observation implies the following stronger result: For all ǫ > 0, there exists a number ǫ 2 nǫ and a partition {A ǫ H H G} of {,,} such that for H G and i A ǫ i G H, IW log H +Oǫ and Z H W i >. Moreover, as ǫ 0, 2 2βnǫ A ǫ H p H for some probabilities p H,H G. The encoding and decoding rules for the general case is as follows: Let n be a positive integer and let G be the generator matrix for polar codes where 2 n. Let {A H H G} be a partition of the index set {,2,,}. Let b be an arbitrary element from the set H G HAH. Assume that the partition {A H H G} and the vector b are known to both the encoder and the decoder. The encoder maps a source seuence x to a vector v G by the following rule: For a subgroup H of G, let T H be a transversal of H in G. For i,,, if i A H, let v i be a random element g from the set b i + T H picked with probability Pv i g P Vi V i,x g v i,x P Vi V i,x b i + T H v i,x Given a seuence v G, the decoder decodes it to v G. It follows from the analysis of the Z p r case in a straightforward fashion that this encoding/decoding scheme achieves the symmetric rate-distortion bound when the group G is an arbitrary Abelian group.