Fixed-to-Variable Length Distribution Matching

Rana Ali Amjad and Georg Böcherer
Institute for Communications Engineering
Technische Universität München, Germany
Email: raa2463@gmail.com, georg.boecherer@tum.de

arxiv:302.009v2 [cs.it] Jul 203

Abstract

Fixed-to-variable length (f2v) matchers are used to reversibly transform an input sequence of independent and uniformly distributed bits into an output sequence of bits that are (approximately) independent and distributed according to a target distribution. The degree of approximation is measured by the informational divergence between the output distribution and the target distribution. An algorithm is developed that efficiently finds optimal f2v codes. It is shown that by encoding the input bits blockwise, the informational divergence per bit approaches zero as the block length approaches infinity. A relation to data compression by Tunstall coding is established.

I. INTRODUCTION

Distribution matching considers the problem of mapping uniformly distributed bits to symbols that are approximately distributed according to a target distribution. In contrast to the simulation of random processes [1] or the exact generation of distributions [2], distribution matching requires that the original bit sequence can be recovered from the generated symbol sequence. We measure the degree of approximation by the normalized informational divergence (I-divergence), which is an appropriate measure when we want to achieve the capacity of noisy and noiseless channels [3, Sec. 3.4.3 & Chap. 6] by using a matcher. A related work is [4], [3, Chap. 3], where it is shown that variable-to-fixed length (v2f) matching is optimally done by geometric Huffman coding and where the relation to fixed-to-variable length (f2v) source encoders is discussed. In the present work, we consider binary distribution matching by prefix-free f2v codes.

A. Rooted Trees With Probabilities

We use the framework of rooted trees with probabilities [5], [6].
Let $\mathcal{T}$ be the set of all binary trees with $2^m$ leaves and consider some tree $T \in \mathcal{T}$. Index all nodes by the numbers $\mathcal{N} := \{1, 2, 3, \dots\}$, where $1$ is the root. Note that there are at least $|\mathcal{N}| \geq 2^{m+1} - 1$ nodes in the tree, with equality if the tree is complete. A tree is complete if any right-infinite binary sequence starts with a path from the root to a leaf. Let $\mathcal{L} \subset \mathcal{N}$ be the set of leaf nodes and let $\mathcal{B} := \mathcal{N} \setminus \mathcal{L}$ be the set of branching nodes. Probabilities can be assigned to the tree by defining a distribution over the $2^m$ paths through the tree. For each $i \in \mathcal{N}$, denote by $P_T(i)$ the probability that a path is chosen that passes through node $i$. Since each path ends at a different leaf node, $P_T$ defines a leaf distribution, i.e., $\sum_{i \in \mathcal{L}} P_T(i) = 1$. For each branching node $i \in \mathcal{B}$, denote by $P_T^i$ the branching distribution, i.e., the probabilities of branch 0 and branch 1 after passing through node $i$.

[Figure 1: (a) tree probabilities defined by a leaf distribution; (b) tree probabilities defined by branching distributions.]

Fig. 1. A rooted tree with probabilities. The root 1 has children 2 (branch 0) and 3 (branch 1), node 2 has children 4 (branch 0) and 5 (branch 1), and node 4 has children 6 (branch 0) and 7 (branch 1). The set of branching nodes is $\mathcal{B} = \{1, 2, 4\}$ and the set of leaf nodes is $\mathcal{L} = \{3, 5, 6, 7\}$. In (a), a uniform leaf distribution $U_T$ is chosen, i.e., $U_T(i) = \frac{1}{4}$ for each $i \in \mathcal{L}$. The leaf distribution determines the node probabilities ($U_T(1) = 1$, $U_T(2) = \frac{3}{4}$, $U_T(4) = \frac{1}{2}$) and the branching distributions ($U_T^1(0) = \frac{3}{4}$, $U_T^2(0) = \frac{2}{3}$, $U_T^4(0) = \frac{1}{2}$). In (b), identical branching distributions are chosen, i.e., $Q_T^i = Q$ for each $i \in \mathcal{B}$, where $Q(0) =: q_0 = \frac{2}{3}$ and $Q(1) =: q_1 = \frac{1}{3}$. The branching distributions determine the resulting node probabilities, which we denote by $Q_T(i)$; here $Q_T(2) = \frac{2}{3}$, $Q_T(3) = \frac{1}{3}$, $Q_T(4) = \frac{4}{9}$, $Q_T(5) = \frac{2}{9}$, $Q_T(6) = \frac{8}{27}$, and $Q_T(7) = \frac{4}{27}$. Since the tree is complete, $\sum_{i \in \mathcal{L}} Q_T(i) = 1$ and $Q_T$ defines a leaf distribution.

This work was supported by the German Ministry of Education and Research in the framework of an Alexander von Humboldt Professorship.
The probabilities on the tree are completely defined either by defining the branching distributions $\{P_T^i, i \in \mathcal{B}\}$ or by defining the leaf distribution $\{P_T(i), i \in \mathcal{L}\}$. See Fig. 1 for an example.
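As a small sanity check of Fig. 1, the following Python sketch recomputes the node probabilities of both panels with exact fractions. The node numbering and path labels follow the caption; the dictionary `PATHS` and the helper names are our own illustrative choices, not notation from the paper.

```python
from fractions import Fraction

# Fig. 1 tree: node number -> path label from the root (0 = branch 0, 1 = branch 1).
PATHS = {1: "", 2: "0", 3: "1", 4: "00", 5: "01", 6: "000", 7: "001"}
LEAVES = [3, 5, 6, 7]


def from_leaf_distribution(leaf_prob):
    """Panel (a): P_T(i) is the total probability of the leaves below node i."""
    return {
        i: sum(leaf_prob[j] for j in LEAVES if PATHS[j].startswith(path))
        for i, path in PATHS.items()
    }


def from_branching_distribution(q0):
    """Panel (b): every branching node uses the same distribution Q = (q0, 1 - q0)."""
    q = {"0": q0, "1": 1 - q0}
    probs = {}
    for i, path in PATHS.items():
        p = Fraction(1)
        for bit in path:
            p *= q[bit]  # multiply the branching probabilities along the path
        probs[i] = p
    return probs


U = from_leaf_distribution({i: Fraction(1, 4) for i in LEAVES})
Q = from_branching_distribution(Fraction(2, 3))

print(U[2], U[4] / U[2])                      # U_T(2) and the branching probability U_T^2(0)
print(Q[6], Q[7], sum(Q[i] for i in LEAVES))  # Q_T(6), Q_T(7), total leaf probability
```

Running it prints `3/4 2/3` and `8/27 4/27 1`, in agreement with the caption: the leaf distribution fixes the branching distributions in (a), and the branching distributions fix the leaf distribution in (b).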

B. v2f Source Encoding and f2v Distribution Matching

Consider a binary distribution $Q$ with $q_0 = Q(0)$, $q_1 = Q(1)$, $0 < q_0 < 1$, and a binary tree $T$ with $2^m$ leaves. Let $Q_T(i)$, $i \in \mathcal{N}$, be the node probabilities that result from having all branching distributions equal to $Q$, i.e., $Q_T^i = Q$ for each $i \in \mathcal{B}$. See Fig. 1(b) for an example. Let $U_T$ be a uniform leaf distribution, i.e., $U_T(i) = 2^{-m}$ for each $i \in \mathcal{L}$; see Fig. 1(a) for an example.

We use the tree as a v2f source code for a discrete memoryless source (DMS) $Q$. To guarantee lossless compression, the tree for a v2f source encoder has to be complete. Consequently, $Q_T$ defines a leaf distribution, i.e., $\sum_{i \in \mathcal{L}} Q_T(i) = 1$. We denote the set of complete binary trees with $2^m$ leaves by $\mathcal{C}$. Each code word consists of $\log_2 2^m = m$ bits and the resulting entropy rate at the encoder output is

$$\frac{H(Q_T)}{m} = \sum_{i \in \mathcal{L}} Q_T(i) \frac{-\log_2 Q_T(i)}{m} = \sum_{i \in \mathcal{L}} Q_T(i) \frac{-\log_2 Q_T(i) + \log_2 U_T(i) - \log_2 U_T(i)}{m} = 1 - \frac{D(Q_T \| U_T)}{m} \tag{1}$$

where $H(Q_T)$ is the entropy of the leaf distribution defined by $Q_T$ and where $D(Q_T \| U_T)$ is defined accordingly. From (1), we conclude that the objective is to solve

$$\min_{T \in \mathcal{C}} D(Q_T \| U_T). \tag{2}$$

The solution is known to be attained by Tunstall coding [7]. The tree in Fig. 1 is a Tunstall code for $Q$: $q_0 = \frac{2}{3}$, $q_1 = \frac{1}{3}$ and $m = 2$, and the corresponding v2f source encoder is

$$000 \mapsto 00, \quad 001 \mapsto 01, \quad 01 \mapsto 10, \quad 1 \mapsto 11. \tag{3}$$

The dual problem is f2v distribution matching. $Q$ is now a binary target distribution and we generate the codewords defined by the paths through a (not necessarily complete) binary tree uniformly according to $U_T$. For example, the f2v distribution matcher defined by the tree in Fig. 1 is

$$00 \mapsto 000, \quad 01 \mapsto 001, \quad 10 \mapsto 01, \quad 11 \mapsto 1. \tag{4}$$

Denote by $l_i$, $i \in \mathcal{L}$, the path lengths and let $L$ be a random variable that is uniformly distributed over the path lengths according to $U_T$. We want the I-divergence per output bit of $U_T$ and $Q_T$ to be small, i.e., we want to solve

$$\min_{T \in \mathcal{T}} \frac{D(U_T \| Q_T)}{\mathbb{E}[L]}. \tag{5}$$

In contrast to (2), the minimization is now over the set of all (not necessarily complete) binary trees with $2^m$ leaves.
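To make the duality between (3) and (4) concrete, here is a minimal Python sketch, assuming the Fig. 1 tree and $q_0 = \frac{2}{3}$. It applies the matcher (4) blockwise, inverts it by prefix-free parsing (which is what the v2f encoder (3) does), and checks identity (1) numerically. The names `MATCHER`, `ENCODER`, `match`, `dematch`, and `leaf_prob` are hypothetical helpers introduced only for this illustration.

```python
import math

# f2v matcher (4): m = 2 uniform input bits -> leaf path of the Fig. 1 tree.
MATCHER = {"00": "000", "01": "001", "10": "01", "11": "1"}
# The v2f source encoder (3) is the inverse map: leaf path -> m-bit index.
ENCODER = {path: index for index, path in MATCHER.items()}


def match(bits, m=2):
    """Map each m-bit block of the input to its codeword and concatenate."""
    if len(bits) % m:
        raise ValueError("input length must be a multiple of m")
    return "".join(MATCHER[bits[k:k + m]] for k in range(0, len(bits), m))


def dematch(word):
    """Recover the input bits; parsing is unambiguous because the leaf paths are prefix-free."""
    blocks, current = [], ""
    for bit in word:
        current += bit
        if current in ENCODER:  # reached a leaf: emit its m-bit index
            blocks.append(ENCODER[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not end at a leaf")
    return "".join(blocks)


def leaf_prob(path, q0):
    """Q_T(i): product of the branching probabilities along the path to leaf i."""
    return math.prod(q0 if b == "0" else 1 - q0 for b in path)


m, q0 = 2, 2 / 3
QT = [leaf_prob(path, q0) for path in MATCHER.values()]
H = -sum(p * math.log2(p) for p in QT)           # H(Q_T)
D = sum(p * math.log2(p / 2 ** -m) for p in QT)  # D(Q_T || U_T) with U_T(i) = 2^-m

print(match("00011011"))                 # 000001011
print(dematch(match("00011011")))        # 00011011
print(abs(H / m - (1 - D / m)) < 1e-12)  # identity (1): True
```

Because the tree is complete, `dematch` also works as the v2f source encoder on any source sequence; for a non-complete tree, some output sequences would not parse into leaves.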
Note that although $\sum_{i \in \mathcal{L}} Q_T(i) < 1$ for a non-complete tree, problem (5) is well-defined, since there is always a complete tree with leaves $\mathcal{L}' \supseteq \mathcal{L}$ and $\sum_{i \in \mathcal{L}'} Q_T(i) = 1$. The sum in (5) is over the support of $U_T$, which is $\mathcal{L}$. Solving (5) is the problem that we consider in this work.

C. Outline

In Sec. II and Sec. III, we restrict attention to complete trees. We show that Tunstall coding applied to $Q$ minimizes $D(U_T \| Q_T)$ and that iteratively applying Tunstall coding to weighted versions of $Q$ minimizes $D(U_T \| Q_T)/\mathbb{E}[L]$. In Sec. IV, we derive conditions for the optimality of complete trees and show that the I-divergence per bit can be made arbitrarily small by letting the blocklength approach infinity. Finally, in Sec. V, we illustrate by an example that source decoders are sub-optimal distribution matchers and, vice versa, distribution dematchers are sub-optimal source encoders.

II. MINIMIZING I-DIVERGENCE

Let $\mathbb{R}$ be the set of real numbers. For a finite set $\mathcal{S}$, we say that $W\colon \mathcal{S} \to \mathbb{R}$ is a weighted distribution if $W(i) > 0$ for each $i \in \mathcal{S}$. We allow for $\sum_{i \in \mathcal{S}} W(i) \neq 1$. The I-divergence of a distribution $P$ and a weighted distribution $W$ is

$$D(P \| W) = \sum_{i \in \operatorname{supp} P} P(i) \log_2 \frac{P(i)}{W(i)} \tag{6}$$

where $\operatorname{supp} P$ denotes the support of $P$. The reason why we need this generalization of the notion of distributions and I-divergence will become clear in the next section.

Proposition 1. Let $Q$ be a weighted binary target distribution, and let

$$T^* = \operatorname*{argmin}_{T \in \mathcal{C}} D(U_T \| Q_T) \tag{7}$$

be an optimal complete tree. Then we find that
i. An optimal complete tree $T^*$ can be constructed by applying Tunstall coding to $Q$.
ii. If $0 < q_0 \leq 1$ and $0 < q_1 \leq 1$, then $T^*$ also minimizes $D(U_T \| Q_T)$ among all possibly non-complete binary trees $T$, i.e., the optimal tree is complete.

Proof: Part i. We write

$$D(U_T \| Q_T) = \sum_{i \in \mathcal{L}} 2^{-m} \log_2 \frac{2^{-m}}{Q_T(i)} = -m - 2^{-m} \sum_{i \in \mathcal{L}} \log_2 Q_T(i) \tag{8}$$

and hence

$$\operatorname*{argmin}_{T \in \mathcal{C}} D(U_T \| Q_T) = \operatorname*{argmax}_{T \in \mathcal{C}} \sum_{i \in \mathcal{L}} \log_2 Q_T(i). \tag{9}$$

Consider now an arbitrary complete tree $T \in \mathcal{C}$. Since the tree is complete, there exist (at least) two leaves that are siblings, say $j$ and $j+1$. Denote by $k$ the corresponding branching node.
The contribution of these two leaves to the objective function on the right-hand side of (9) can be written as

$$\log_2 Q_T(j) + \log_2 Q_T(j+1) = \log_2[Q_T(k) q_0] + \log_2[Q_T(k) q_1] = \log_2 Q_T(k) + \log_2 Q_T(k) + \log_2 q_0 + \log_2 q_1. \tag{10}$$

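Now consider the tree $T'$ that results from removing the nodes $j$ and $j+1$. The new set of leaf nodes is $\mathcal{L}' = \{k\} \cup \mathcal{L} \setminus \{j, j+1\}$ and the new set of branching nodes is $\mathcal{B}' = \mathcal{B} \setminus \{k\}$. Also, $Q_{T'}$ defines a weighted leaf distribution on $\mathcal{L}'$. The same procedure can be applied repeatedly by setting $T = T'$, until $T$ consists only of the root node, whose weighted probability is $Q_T(1) = 1$ and therefore contributes $\log_2 1 = 0$. We use this idea to rewrite the objective function on the right-hand side of (9) as follows:

$$\sum_{i \in \mathcal{L}} \log_2 Q_T(i) = \sum_{i \in \mathcal{L}'} \log_2 Q_{T'}(i) + \log_2 Q_T(k) + \log_2 q_0 + \log_2 q_1 = \sum_{k \in \mathcal{B}} \log_2 Q_T(k) + (2^m - 1)[\log_2 q_0 + \log_2 q_1]. \tag{11}$$

Since $(2^m - 1)[\log_2 q_0 + \log_2 q_1]$ is a constant independent of the tree $T$, we have

$$\operatorname*{argmax}_{T \in \mathcal{C}} \sum_{i \in \mathcal{L}} \log_2 Q_T(i) = \operatorname*{argmax}_{T \in \mathcal{C}} \sum_{k \in \mathcal{B}} \log_2 Q_T(k). \tag{12}$$

The right-hand side of (12) is maximized by the complete tree whose $2^m - 1$ branching nodes have the greatest weighted probabilities. According to [8, p. 47], this is exactly the tree that is constructed when Tunstall coding is applied to the weighted distribution $Q$.

Part ii. We now consider $q_0 \leq 1$ and $q_1 \leq 1$. Assume we have constructed a non-complete binary tree. Because of non-completeness, some branching node has only one outgoing branch, and we can remove this branch from the tree by attaching the subtree below it directly to that branching node. Without loss of generality, assume that this branch is labeled by a zero. Denote by $\mathcal{S}$ the leaves on the subtree of the branch. Denote the tree after removing the branch by $T'$. Now,

$$Q_{T'}(i) = \frac{Q_T(i)}{q_0} \geq Q_T(i), \quad \text{for each } i \in \mathcal{S} \tag{13}$$

where the inequality follows because $q_0 \leq 1$ by assumption. Thus, for the new tree $T'$, the objective function $\sum_{i \in \mathcal{L}} \log_2 Q_T(i)$ is bounded as

$$\sum_{i \in \mathcal{L}} \log_2 Q_{T'}(i) = \sum_{i \in \mathcal{L} \setminus \mathcal{S}} \log_2 Q_T(i) + \sum_{i \in \mathcal{S}} \left[\log_2 Q_T(i) - \log_2 q_0\right] \geq \sum_{i \in \mathcal{L}} \log_2 Q_T(i). \tag{14}$$

In summary, under the assumption $q_0 \leq 1$ and $q_1 \leq 1$, the objective function that we want to maximize does not decrease when removing branches, which shows that there is an optimal complete tree. This proves statement ii. of the proposition.

III. MINIMIZING I-DIVERGENCE PER BIT

The following two propositions relate the problem of minimizing the I-divergence per bit to the problem of minimizing the un-normalized I-divergence. Let $\mathcal{T}' \subseteq \mathcal{T}$ be some set of binary trees with $2^m$ leaves and define

$$\Delta := \min_{T \in \mathcal{T}'} \frac{D(U_T \| Q_T)}{\mathbb{E}[L]}. \tag{15}$$

Proposition 2. We have

$$T^* := \operatorname*{argmin}_{T \in \mathcal{T}'} \frac{D(U_T \| Q_T)}{\mathbb{E}[L]} = \operatorname*{argmin}_{T \in \mathcal{T}'} D(U_T \| \tilde{Q}_T) \tag{16}$$

where $\tilde{Q}_T$ is the weighted distribution induced by $Q2^{\Delta} := [q_0 2^{\Delta}, q_1 2^{\Delta}]$.

Proof: By (15), for any tree $T \in \mathcal{T}'$, we have

$$\frac{D(U_T \| Q_T)}{\mathbb{E}[L]} \geq \Delta, \quad \text{with equality if } T = T^* \tag{17}$$

$$\Leftrightarrow \quad D(U_T \| Q_T) - \Delta\, \mathbb{E}[L] \geq 0, \quad \text{with equality if } T = T^*. \tag{18}$$

We write the left-hand side of (18) as

$$D(U_T \| Q_T) - \Delta\, \mathbb{E}[L] = \sum_{i \in \mathcal{L}} U_T(i) \log_2 \frac{U_T(i)}{Q_T(i)} - \Delta \sum_{i \in \mathcal{L}} U_T(i)\, l_i = \sum_{i \in \mathcal{L}} U_T(i) \left[\log_2 \frac{U_T(i)}{Q_T(i)} - \log_2 2^{\Delta l_i}\right] = \sum_{i \in \mathcal{L}} U_T(i) \log_2 \frac{U_T(i)}{Q_T(i)\, 2^{\Delta l_i}}. \tag{19}$$

Consider the path through the tree that ends at leaf $i$. Denote by $l_i^0$ and $l_i^1$ the number of times the labels 0 and 1 occur, respectively. The length of the path can be expressed as $l_i = l_i^0 + l_i^1$. The term $Q_T(i)\, 2^{\Delta l_i}$ can now be written as

$$Q_T(i)\, 2^{\Delta l_i} = q_0^{l_i^0} q_1^{l_i^1}\, 2^{\Delta (l_i^0 + l_i^1)} = (q_0 2^{\Delta})^{l_i^0} (q_1 2^{\Delta})^{l_i^1} = \tilde{Q}_T(i). \tag{20}$$

Using (20) and (19) in (18) shows that for any binary tree $T \in \mathcal{T}'$ we have

$$\sum_{i \in \mathcal{L}} U_T(i) \log_2 \frac{U_T(i)}{\tilde{Q}_T(i)} = D(U_T \| \tilde{Q}_T) \geq 0, \quad \text{with equality if } T = T^*, \tag{21}$$

which is the statement of the proposition.

Proposition 3. Define

$$\Delta^* := \min_{T \in \mathcal{C}} \frac{D(U_T \| Q_T)}{\mathbb{E}[L]}. \tag{22}$$

Then the optimal complete tree

$$T^* := \operatorname*{argmin}_{T \in \mathcal{C}} \frac{D(U_T \| Q_T)}{\mathbb{E}[L]} \tag{23}$$

is constructed by applying Tunstall coding to $Q2^{\Delta^*}$.

Proof: The proposition is a consequence of Prop. 2 (with $\mathcal{T}' = \mathcal{C}$) and Prop. 1.i.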
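Props. 1 to 3 can be checked numerically on small trees. The Python sketch below uses our own conventions (a tree is the list of its leaf paths, and `q = (q0, q1)` may be any pair of positive weights). It implements Tunstall coding as repeated splitting of the leaf with the largest weighted probability, compares it with an exhaustive search over all complete trees to illustrate Prop. 1.i, and verifies the reweighting identity (19)-(20) behind Prop. 2. Function names such as `tunstall` and `complete_trees` are hypothetical.

```python
import math


def weight(path, q):
    """Weighted leaf probability Q_T(i): product of q[0] / q[1] along the path to leaf i."""
    return math.prod(q[int(b)] for b in path)


def divergence(leaves, q):
    """D(U_T || Q_T) with U_T uniform over the leaves; q may be a weighted distribution."""
    u = 1 / len(leaves)
    return sum(u * math.log2(u / weight(w, q)) for w in leaves)


def mean_length(leaves):
    """E[L] under the uniform leaf distribution U_T."""
    return sum(len(w) for w in leaves) / len(leaves)


def tunstall(q, m):
    """Tunstall coding: split the leaf with the largest weighted probability 2**m - 1 times."""
    leaves = [""]
    for _ in range(2 ** m - 1):
        heaviest = max(leaves, key=lambda w: weight(w, q))
        leaves.remove(heaviest)
        leaves += [heaviest + "0", heaviest + "1"]
    return sorted(leaves)


def complete_trees(n):
    """All complete binary trees with n leaves (exhaustive, so only for small n)."""
    if n == 1:
        return [[""]]
    return [
        ["0" + w for w in left] + ["1" + w for w in right]
        for k in range(1, n)
        for left in complete_trees(k)
        for right in complete_trees(n - k)
    ]


# Prop. 1.i: Tunstall coding attains the minimum of D(U_T || Q_T) over complete trees,
# also for a weighted distribution whose entries do not sum to one, e.g. (1.3, 0.6).
for q in [(2 / 3, 1 / 3), (0.9, 0.1), (1.3, 0.6)]:
    for m in (1, 2, 3):
        best = min(divergence(t, q) for t in complete_trees(2 ** m))
        assert abs(divergence(tunstall(q, m), q) - best) < 1e-12

# Identity behind Prop. 2, (19)-(20): D(U_T || Q 2^Delta) = D(U_T || Q) - Delta * E[L],
# checked here on the non-complete tree with paths 0 and 10 and an arbitrary Delta.
q, delta, tree = (5 / 6, 1 / 6), 0.3, ["0", "10"]
lhs = divergence(tree, (q[0] * 2 ** delta, q[1] * 2 ** delta))
rhs = divergence(tree, q) - delta * mean_length(tree)
print(tunstall((2 / 3, 1 / 3), 2), abs(lhs - rhs) < 1e-12)  # ['000', '001', '01', '1'] True
```

The first printed value is the Fig. 1 tree, i.e., the Tunstall code (3). The exhaustive search is only feasible for small $m$ (the number of complete binary trees with $n$ leaves is the Catalan number $C_{n-1}$), which is why the greedy construction matters.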

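Algorithm 1.
    $\hat{T} \leftarrow \operatorname*{argmin}_{T \in \mathcal{C}} D(U_T \| Q_T)$   (solved by Tunstall coding on $Q$)
    repeat
        1. $\hat{\Delta} = D(U_{\hat{T}} \| Q_{\hat{T}}) / \mathbb{E}_{U_{\hat{T}}}[L]$
        2. $\hat{T} = \operatorname*{argmin}_{T \in \mathcal{C}} \left[D(U_T \| Q_T) - \hat{\Delta}\, \mathbb{E}_{U_T}[L]\right]$   (solved by Tunstall coding on $Q2^{\hat{\Delta}}$)
    until $D(U_{\hat{T}} \| Q_{\hat{T}}) - \hat{\Delta}\, \mathbb{E}_{U_{\hat{T}}}[L] = 0$
    $\Delta^* = \hat{\Delta}$, $T^* = \hat{T}$

A. Iterative Algorithm

By Prop. 3, if we know the I-divergence per bit $\Delta^*$, then we can find $T^*$ by Tunstall coding. However, $\Delta^*$ is not known a priori. We solve this problem by iteratively applying Tunstall coding to $Q2^{\hat{\Delta}}$, where $\hat{\Delta}$ is an estimate of $\Delta^*$, and by updating our estimate. This procedure is stated in Alg. 1.

Proposition 4. Alg. 1 finds $(\Delta^*, T^*)$ as defined in Prop. 3 in finitely many steps.

Proof: The proof is similar to the proof of [3, Prop. 4.]. We first show that $\hat{\Delta}$ is strictly monotonically decreasing. Let $\hat{\Delta}_i$ be the value that is assigned to $\hat{\Delta}$ in step 1. of the $i$th iteration and denote by $\hat{T}_i$ the value that is assigned to $\hat{T}$ in step 2. of the $i$th iteration, with $\hat{T}_0$ the initial tree. Suppose that the algorithm does not terminate in the $i$th iteration. We have

$$\hat{\Delta}_i = \frac{D(U_{\hat{T}_{i-1}} \| Q_{\hat{T}_{i-1}})}{\mathbb{E}_{U_{\hat{T}_{i-1}}}[L]} \quad \Leftrightarrow \quad D(U_{\hat{T}_{i-1}} \| Q_{\hat{T}_{i-1}}) - \hat{\Delta}_i\, \mathbb{E}_{U_{\hat{T}_{i-1}}}[L] = 0. \tag{24}$$

By step 2, we have

$$\hat{T}_i = \operatorname*{argmin}_{T \in \mathcal{C}} \left[D(U_T \| Q_T) - \hat{\Delta}_i\, \mathbb{E}_{U_T}[L]\right]. \tag{25}$$

By (24), the minimum in (25) is at most zero. Since, by our assumption, the algorithm does not terminate in the $i$th iteration, we have

$$D(U_{\hat{T}_i} \| Q_{\hat{T}_i}) - \hat{\Delta}_i\, \mathbb{E}_{U_{\hat{T}_i}}[L] < 0 \;\Leftrightarrow\; \frac{D(U_{\hat{T}_i} \| Q_{\hat{T}_i})}{\mathbb{E}_{U_{\hat{T}_i}}[L]} < \hat{\Delta}_i \;\Leftrightarrow\; \hat{\Delta}_{i+1} < \hat{\Delta}_i. \tag{26}$$

Now assume the algorithm terminated, and let $\hat{T}$ be the tree after termination. Because of the assignments in steps 1. and 2., the terminating condition implies that for any tree $T \in \mathcal{C}$, we have

$$D(U_T \| Q_T) - \hat{\Delta}\, \mathbb{E}_{U_T}[L] \geq 0, \quad \text{with equality if } T = \hat{T}. \tag{27}$$

Consequently, we have

$$\frac{D(U_T \| Q_T)}{\mathbb{E}_{U_T}[L]} \geq \hat{\Delta}, \quad \text{with equality if } T = \hat{T}. \tag{28}$$

We conclude that after termination, $(\hat{\Delta}, \hat{T})$ is equal to the optimal tuple $(\Delta^*, T^*)$ in Prop. 3. Finally, we have shown that $\hat{\Delta}$ is strictly monotonically decreasing, so that $\hat{T}_i \neq \hat{T}_j$ for all $i < j$. But there is only a finite number of complete binary trees with $2^m$ leaves. Thus, the algorithm terminates after finitely many steps.

IV. OPTIMALITY OF COMPLETE TREES

Complete trees are not optimal in general: Consider $m = 1$ and $Q$: $q_0 = \frac{5}{6}$, $q_1 = \frac{1}{6}$.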
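For $m = 1$, Tunstall coding constructs the (unique) complete binary tree $T$ with 2 leaves, independent of which target vector we pass to it. The path lengths are $l_1 = l_2 = 1$. The I-divergence per bit achieved by this tree is

$$\frac{D(U_T \| Q_T)}{\mathbb{E}[L]} = -1 - \frac{1}{2} \log_2(q_0 q_1) = 0.424 \text{ bits}. \tag{29}$$

Now, we could instead use a non-complete tree $T$ with the paths 0 and 10. In this case, the I-divergence per bit is

$$\frac{D(U_T \| Q_T)}{\mathbb{E}[L]} = \frac{-1 - \frac{1}{2} \log_2(q_0 q_1 q_0)}{\frac{1}{2}(1 + 2)} = 0.37034 \text{ bits}. \tag{30}$$

In summary, for the considered example, using a complete tree is sub-optimal. In the following, we derive simple conditions on the target vector $Q$ that guarantee that the optimal tree is complete.

A. Sufficient Conditions for Optimality

Proposition 5. Let $Q$ be a distribution. If $\max\{q_0, q_1\} \leq 4 \min\{q_0, q_1\}$, then the optimal tree is complete for any $m$ and it is constructed by Alg. 1.

Proof: Let $\Delta$ be given by (15) with $\mathcal{T}' = \mathcal{T}$, so that, by Prop. 2, the optimal tree minimizes $D(U_T \| \tilde{Q}_T)$ over all binary trees with $2^m$ leaves. According to Prop. 1.ii, this tree is complete if the entries of the weighted distribution $Q2^{\Delta}$ are both less than or equal to one. Without loss of generality, assume that $q_0 \geq q_1$. Thus, we only need to check this condition for $q_0$. We have

$$q_0 2^{\Delta} \leq 1 \;\Leftrightarrow\; \log_2 q_0 + \Delta \leq 0 \;\Leftrightarrow\; \Delta \leq -\log_2 q_0. \tag{31}$$

We calculate the I-divergence per bit that is achieved by the (unique) complete tree with 2 leaves, namely

$$\frac{D(U_T \| Q_T)}{\mathbb{E}[L]} = -1 - \frac{1}{2} \log_2(q_0 q_1). \tag{32}$$

For each $m$, the same value is achieved by the complete tree with all path lengths equal to $m$; hence, $\Delta$ is at most the right-hand side of (32). Substituting the right-hand side of (32) for $\Delta$ in (31), we obtain the sufficient condition

$$-1 - \frac{1}{2}\log_2(q_0 q_1) \leq -\log_2 q_0 \;\Leftrightarrow\; \log_2 q_0 - \frac{1}{2}\log_2(q_0 q_1) \leq 1 \;\Leftrightarrow\; \frac{1}{2}\log_2 \frac{q_0}{q_1} \leq 1 \;\Leftrightarrow\; q_0 \leq 4 q_1, \tag{33}$$

which is the condition stated in the proposition.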
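Alg. 1 is short enough to sketch directly. The Python code below is a minimal, non-authoritative version: it reuses the leaf-path representation, replaces the exact termination test "$= 0$" of Alg. 1 by a floating-point tolerance `tol` (our assumption), and is run on the example above ($m = 1$, $q_0 = \frac{5}{6}$), on the non-complete tree with paths 0 and 10, and on the target distribution of Sec. V ($q_0 = 0.615$, $m = 2$). The names `algorithm1`, `per_bit`, and `redundancy` are hypothetical.

```python
import math


def weight(path, q):
    """Weighted leaf probability: product of q[0] / q[1] along the path."""
    return math.prod(q[int(b)] for b in path)


def divergence(leaves, q):
    """D(U_T || Q_T) with U_T uniform over the given leaves."""
    u = 1 / len(leaves)
    return sum(u * math.log2(u / weight(w, q)) for w in leaves)


def mean_length(leaves):
    """E[L] under the uniform leaf distribution U_T."""
    return sum(len(w) for w in leaves) / len(leaves)


def per_bit(leaves, q):
    """I-divergence per output bit, D(U_T || Q_T) / E[L]; also valid for non-complete trees."""
    return divergence(leaves, q) / mean_length(leaves)


def redundancy(leaves, q):
    """D(Q_T || U_T) / m when a complete tree is used as a v2f source encoder."""
    m = math.log2(len(leaves))
    return sum(weight(w, q) * math.log2(weight(w, q) * len(leaves)) for w in leaves) / m


def tunstall(q, m):
    """Tunstall coding on a (possibly weighted) binary distribution q = (q0, q1)."""
    leaves = [""]
    for _ in range(2 ** m - 1):
        heaviest = max(leaves, key=lambda w: weight(w, q))
        leaves.remove(heaviest)
        leaves += [heaviest + "0", heaviest + "1"]
    return sorted(leaves)


def algorithm1(q, m, tol=1e-12, max_iter=1000):
    """Sketch of Alg. 1: repeated Tunstall coding on Q 2^delta_hat."""
    tree = tunstall(q, m)  # initialization: Tunstall coding on Q
    for _ in range(max_iter):
        delta = per_bit(tree, q)  # step 1: current estimate delta_hat
        scale = 2 ** delta
        tree = tunstall((q[0] * scale, q[1] * scale), m)  # step 2
        # Termination test of Alg. 1, with a floating-point tolerance instead of "= 0".
        if divergence(tree, q) - delta * mean_length(tree) > -tol:
            return delta, tree
    raise RuntimeError("no termination within max_iter iterations")


# Sec. IV example: m = 1, q0 = 5/6 (complete tree) versus the non-complete tree {0, 10}.
q = (5 / 6, 1 / 6)
print(round(algorithm1(q, 1)[0], 3), round(per_bit(["0", "10"], q), 5))  # 0.424 0.37034

# Sec. V example: q0 = 0.615, m = 2.
q = (0.615, 0.385)
delta, tree = algorithm1(q, 2)
for name, t in [("Tunstall on Q", tunstall(q, 2)), ("Alg. 1 on Q", tree)]:
    print(name, t, round(redundancy(t, q), 6), round(per_bit(t, q), 6))
# Tunstall on Q ['00', '01', '10', '11'] 0.038503 0.039206
# Alg. 1 on Q ['000', '001', '01', '1'] 0.04176 0.037695
```

The printed values reproduce (29), (30), and the two columns of Table I. Note that $q_0 = \frac{5}{6}$ violates the condition of Prop. 5 ($\frac{5}{6} > 4 \cdot \frac{1}{6}$), consistent with the non-complete tree beating the complete one, whereas $q_0 = 0.615$ satisfies it.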

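B. Asymptotic Achievability for Complete Trees

Proposition 6. Denote by $T(m)$ the complete tree with $2^m$ leaves that is constructed by applying Alg. 1 to a target distribution $Q$. Then we have

$$\frac{D(U_{T(m)} \| Q_{T(m)})}{\mathbb{E}[L]} \leq \frac{1}{m} \log_2 \frac{1}{\min\{q_0, q_1\}} \tag{34}$$

and, in particular, the I-divergence per bit approaches zero as $m \to \infty$.

Proof: Since the paths of the tree form a binary prefix-free code for its leaves, the expected length can be bounded by the converse of the Coding Theorem for DMS [8, p. 45] as

$$\mathbb{E}[L] \geq H[U_{T(m)}] = m. \tag{35}$$

Since $T(m)$ minimizes the I-divergence per bit over $\mathcal{C}$ and (35) holds for every tree in $\mathcal{C}$, we have

$$\frac{D(U_{T(m)} \| Q_{T(m)})}{\mathbb{E}[L]} \leq \frac{1}{m} \min_{T \in \mathcal{C}} D(U_T \| Q_T). \tag{36}$$

The tree that minimizes the right-hand side is found by applying Tunstall coding to $Q$. Without loss of generality, assume that $q_0 \geq q_1$. According to the Tunstall Lemma [8, p. 47], the induced leaf probability of a tree $T$ constructed by Tunstall coding is lower bounded as

$$Q_T(i) \geq 2^{-m} q_1, \quad \text{for each leaf } i \in \mathcal{L}. \tag{37}$$

We can therefore bound the I-divergence as

$$D(U_T \| Q_T) = \sum_{i \in \mathcal{L}} 2^{-m} \log_2 \frac{2^{-m}}{Q_T(i)} \leq \sum_{i \in \mathcal{L}} 2^{-m} \log_2 \frac{2^{-m}}{2^{-m} q_1} = \log_2 \frac{1}{q_1}. \tag{38}$$

We can now bound the I-divergence per bit as

$$\frac{D(U_{T(m)} \| Q_{T(m)})}{\mathbb{E}[L]} \leq \frac{1}{m} \log_2 \frac{1}{q_1} = \frac{1}{m} \log_2 \frac{1}{\min\{q_0, q_1\}}. \tag{39}$$

This proves the proposition.

C. Optimality of Complete Trees for Large Enough m

Proposition 7. For any target distribution $Q$ with $0 < q_0 < 1$ and $q_1 = 1 - q_0$, there is an $m_0$ such that for all $m > m_0$, the tree that minimizes

$$\frac{D(U_T \| Q_T)}{\mathbb{E}[L]} \tag{40}$$

is complete.

Proof: Without loss of generality, assume that $q_0 \geq q_1$. Let $\Delta$ be given by (15) with $\mathcal{T}' = \mathcal{T}$; it is at most the I-divergence per bit of $T(m)$, so by Prop. 6, we have $\Delta \leq \frac{1}{m} \log_2 \frac{1}{q_1}$. Since $q_0 < 1$, we have $-\log_2 q_0 > 0$, and thus there exists an $m_0$ such that

$$\frac{1}{m} \log_2 \frac{1}{q_1} \leq -\log_2 q_0 \;\Rightarrow\; q_0 2^{\Delta} \leq q_0 2^{\frac{1}{m}\log_2 \frac{1}{q_1}} \leq 1, \quad \text{for all } m > m_0. \tag{41}$$

Thus, for all $m > m_0$, both entries of $Q2^{\Delta}$ are at most one. The proposition now follows by Prop. 2 and Prop. 1.ii.

TABLE I
COMPARISON OF V2F SOURCE CODING AND F2V DISTRIBUTION MATCHING ($Q$: $q_0 = 0.615$, $q_1 = 0.385$; $m = 2$)

                               | Tunstall on Q                        | Alg. 1 on Q
v2f source encoder             | 00 ↦ 00, 01 ↦ 01, 10 ↦ 10, 11 ↦ 11   | 1 ↦ 00, 01 ↦ 01, 001 ↦ 10, 000 ↦ 11
redundancy D(Q_T‖U_T)/m        | 0.038503                             | 0.04176
f2v distribution matcher       | 00 ↦ 00, 01 ↦ 01, 10 ↦ 10, 11 ↦ 11   | 00 ↦ 1, 01 ↦ 01, 10 ↦ 001, 11 ↦ 000
I-divergence per bit           | 0.039206                             | 0.037695

V. SOURCE CODING VERSUS DISTRIBUTION MATCHING

An ideal source encoder transforms the output of a DMS $Q$ into a sequence of bits that are independent and uniformly distributed.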
Conversely, applying the corresponding decoder to a sequence of uniformly distributed bits generates a sequence of symbols that are iid according to $Q$. This suggests designing an f2v distribution matcher by first calculating the optimal v2f source encoder: the inverse mapping is f2v and can be used as a distribution matcher. We illustrate by an example that this approach is sub-optimal in general. Consider the DMS $Q$ with $q_0 = 0.615$, $q_1 = 0.385$. We calculate the optimal binary v2f source encoder with blocklength $m = 2$ by applying Tunstall coding to $Q$. The resulting encoder is displayed in the 1st column of Table I. Using the source decoder as a distribution matcher results in an I-divergence per bit of 0.039206 bits. Next, we use Alg. 1 to calculate the optimal f2v matcher for $Q$. The resulting mapping is displayed in the 2nd column of Table I. The achieved I-divergence per bit is 0.037695 bits, which is smaller than the value obtained by using the source decoder. In general, the decoder of an optimal v2f source encoder is a sub-optimal f2v distribution matcher, and the dematcher of an optimal f2v distribution matcher is a sub-optimal v2f source encoder.

REFERENCES

[1] Y. Steinberg and S. Verdú, "Simulation of random processes and rate-distortion theory," IEEE Trans. Inf. Theory, vol. 42, no. 1, pp. 63–86, 1996.
[2] D. Knuth and A. Yao, "The complexity of nonuniform random number generation." New York: Academic Press, 1976, pp. 357–428.
[3] G. Böcherer, "Capacity-achieving probabilistic shaping for noisy and noiseless channels," Ph.D. dissertation, RWTH Aachen University, 2012. [Online]. Available: http://www.georg-boecherer.de/capacityachievingshaping.pdf
[4] G. Böcherer and R. Mathar, "Matching dyadic distributions to channels," in Proc. Data Compression Conf., 2011, pp. 23–32.
[5] R. A. Rueppel and J. L. Massey, "Leaf-average node-sum interchanges in rooted trees with applications," in Communications and Cryptography: Two sides of One Tapestry, R. E. Blahut, D. J. Costello Jr., U. Maurer, and T. Mittelholzer, Eds. Kluwer Academic Publishers, 1994.
[6] G. Böcherer, "Rooted trees with probabilities revisited," Feb. 2013. [Online]. Available: http://arxiv.org/abs/1302.0753
[7] B. Tunstall, "Synthesis of noiseless compression codes," Ph.D. dissertation, 1967.
[8] J. L. Massey, "Applied digital information theory I," lecture notes, ETH Zurich. [Online]. Available: http://www.isiweb.ee.ethz.ch/archive/massey_scr/adit1.pdf