Fixed-to-Variable Length Distribution Matching

Rana Ali Amjad and Georg Böcherer
Institute for Communications Engineering, Technische Universität München, Germany
Email: raa2463@gmail.com, georg.boecherer@tum.de

arXiv:1302.0019v2 [cs.IT] 1 Jul 2013

Abstract: Fixed-to-variable length (f2v) matchers are used to reversibly transform an input sequence of independent and uniformly distributed bits into an output sequence of bits that are (approximately) independent and distributed according to a target distribution. The degree of approximation is measured by the informational divergence between the output distribution and the target distribution. An algorithm is developed that efficiently finds optimal f2v codes. It is shown that by encoding the input bits blockwise, the informational divergence per bit approaches zero as the block length approaches infinity. A relation to data compression by Tunstall coding is established.

I. INTRODUCTION

Distribution matching considers the problem of mapping uniformly distributed bits to symbols that are approximately distributed according to a target distribution. In contrast to the simulation of random processes [1] or the exact generation of distributions [2], distribution matching requires that the original bit sequence can be recovered from the generated symbol sequence. We measure the degree of approximation by the normalized informational divergence (I-divergence), which is an appropriate measure when a matcher is used to achieve the capacity of noisy and noiseless channels [3, Sec. 3.4.3 & Chap. 6]. A related work is [4], [3, Chap. 3], where it is shown that variable-to-fixed length (v2f) matching is optimally done by geometric Huffman coding, and where the relation to fixed-to-variable length (f2v) source encoders is discussed. In the present work, we consider binary distribution matching by prefix-free f2v codes.

This work was supported by the German Ministry of Education and Research in the framework of an Alexander von Humboldt Professorship.

A. Rooted Trees With Probabilities

We use the framework of rooted trees with probabilities [5], [6]. Let $\mathcal{T}_m$ be the set of all binary trees with $2^m$ leaves and consider some tree $T \in \mathcal{T}_m$. Index all nodes by the numbers $\mathcal{N} := \{1, 2, 3, \dots\}$, where $1$ is the root. There are at least $2^{m+1} - 1$ nodes in the tree, with equality if the tree is complete. A tree is complete if every right-infinite binary sequence starts with a path from the root to a leaf. Let $\mathcal{L} \subset \mathcal{N}$ be the set of leaf nodes and let $\mathcal{B} := \mathcal{N} \setminus \mathcal{L}$ be the set of branching nodes. Probabilities can be assigned to the tree by defining a distribution over the $2^m$ paths through the tree. For each $i \in \mathcal{N}$, denote by $P_T(i)$ the probability that a path is chosen that passes through node $i$. Since each path ends at a different leaf node, $P_T$ defines a leaf distribution, i.e., $\sum_{i \in \mathcal{L}} P_T(i) = 1$. For each branching node $i \in \mathcal{B}$, denote by $P_T^i$ the branching distribution, i.e., the probabilities of branch 0 and branch 1 after passing through node $i$. The probabilities on the tree are completely defined either by the branching distributions $\{P_T^i,\ i \in \mathcal{B}\}$ or by the leaf distribution $\{P_T(i),\ i \in \mathcal{L}\}$. See Fig. 1 for an example.

[Fig. 1 (two tree diagrams, not reproduced here): A rooted tree with probabilities. The set of branching nodes is $\mathcal{B} = \{1, 2, 4\}$ and the set of leaf nodes is $\mathcal{L} = \{3, 5, 6, 7\}$. In (a), a uniform leaf distribution $U_T$ is chosen, i.e., $U_T(i) = \frac{1}{4}$ for each $i \in \mathcal{L}$; the leaf distribution determines the node probabilities and the branching distributions. In (b), identical branching distributions are chosen, i.e., $Q_T^i = Q$ for each $i \in \mathcal{B}$, where $Q(0) =: q_0 = \frac{2}{3}$ and $Q(1) =: q_1 = \frac{1}{3}$; the branching distributions determine the resulting node probabilities, which we denote by $Q_T(i)$. Since the tree is complete, $\sum_{i \in \mathcal{L}} Q_T(i) = 1$ and $Q_T$ defines a leaf distribution.]
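To make this bookkeeping concrete, the following Python sketch (ours, not part of the original paper; the leaf-path tree representation and the function name are illustrative choices) computes the node probabilities $Q_T(i)$ of Fig. 1(b) from a common branching distribution $Q$:

    def node_probability(path, q0):
        """Probability Q_T(i) of the node reached by `path` (a string of
        '0'/'1' branch labels read from the root) when every branching
        node uses the same branching distribution (q0, 1 - q0)."""
        q1 = 1.0 - q0
        p = 1.0
        for bit in path:
            p *= q0 if bit == "0" else q1
        return p

    # The tree of Fig. 1, given by its root-to-leaf paths.
    leaves = ["1", "01", "000", "001"]
    for leaf in leaves:
        print(leaf, node_probability(leaf, q0=2/3))
    # prints 1/3, 2/9, 8/27, 4/27 (as decimals), matching Fig. 1(b).
    # The leaf probabilities sum to 1 because this tree is complete:
    print(sum(node_probability(l, 2/3) for l in leaves))  # -> 1.0 (up to float rounding)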

B. v2f Source Encoding and f2v Distribution Matching

Consider a binary distribution $Q$ with $q_0 = Q(0)$, $q_1 = Q(1)$, $0 < q_0 < 1$, and a binary tree $T$ with $2^m$ leaves. Let $Q_T(i)$, $i \in \mathcal{N}$, be the node probabilities that result from setting all branching distributions equal to $Q$, i.e., $Q_T^i = Q$ for each $i \in \mathcal{B}$; see Fig. 1(b) for an example. Let $U_T$ be the uniform leaf distribution, i.e., $U_T(i) = 2^{-m}$ for each $i \in \mathcal{L}$; see Fig. 1(a) for an example.

We use the tree as a v2f source code for a discrete memoryless source (DMS) $Q$. To guarantee lossless compression, the tree of a v2f source encoder has to be complete. Consequently, $Q_T$ defines a leaf distribution, i.e., $\sum_{i \in \mathcal{L}} Q_T(i) = 1$. We denote the set of complete binary trees with $2^m$ leaves by $\mathcal{C}_m$. Each codeword consists of $\log_2 2^m = m$ bits, and the resulting entropy rate at the encoder output is

    H(Q_T) = \sum_{i \in \mathcal{L}} Q_T(i) [-\log_2 Q_T(i)]
           = \sum_{i \in \mathcal{L}} Q_T(i) [-\log_2 Q_T(i) + \log_2 U_T(i) - \log_2 U_T(i)]
           = m - D(Q_T \| U_T)                                            (1)

where $H(Q_T)$ is the entropy of the leaf distribution defined by $Q_T$ and where $D(Q_T \| U_T)$ is defined accordingly. From (1), we conclude that the objective is to solve

    \min_{T \in \mathcal{C}_m} D(Q_T \| U_T).                             (2)

The solution is known to be attained by Tunstall coding [7]. The tree in Fig. 1 is a Tunstall code for $Q\colon q_0 = \frac{2}{3},\ q_1 = \frac{1}{3}$ and $m = 2$, and the corresponding v2f source encoder is

    000 \mapsto 00, \quad 001 \mapsto 01, \quad 01 \mapsto 10, \quad 1 \mapsto 11.   (3)

The dual problem is f2v distribution matching. $Q$ is now a binary target distribution, and we generate the codewords defined by the paths through a (not necessarily complete) binary tree uniformly according to $U_T$. For example, the f2v distribution matcher defined by the tree in Fig. 1 is

    00 \mapsto 000, \quad 01 \mapsto 001, \quad 10 \mapsto 01, \quad 11 \mapsto 1.   (4)

Denote by $\ell_i$, $i \in \mathcal{L}$, the path lengths, and let $L$ be a random variable that is distributed over the path lengths according to $U_T$. We want the I-divergence per output bit of $U_T$ and $Q_T$ to be small, i.e., we want to solve

    \min_{T \in \mathcal{T}_m} \frac{D(U_T \| Q_T)}{\mathrm{E}(L)}.       (5)

In contrast to (2), the minimization is now over the set of all (not necessarily complete) binary trees with $2^m$ leaves. Note that although for a non-complete tree we have $\sum_{i \in \mathcal{L}} Q_T(i) < 1$, the problem (5) is well-defined, since there is always a complete tree with leaves $\mathcal{L}' \supseteq \mathcal{L}$ and $\sum_{i \in \mathcal{L}'} Q_T(i) = 1$. The sum in (5) is over the support of $U_T$, which is $\mathcal{L}$. Solving (5) is the problem that we consider in this work.

C. Outline

In Sec. II and Sec. III, we restrict attention to complete trees. We show that Tunstall coding applied to $Q$ minimizes $D(U_T \| Q_T)$ and that iteratively applying Tunstall coding to weighted versions of $Q$ minimizes $D(U_T \| Q_T)/\mathrm{E}(L)$. In Sec. IV, we derive conditions for the optimality of complete trees and show that the I-divergence per bit can be made arbitrarily small by letting the block length approach infinity. Finally, in Sec. V, we illustrate by an example that source decoders are sub-optimal distribution matchers and, vice versa, distribution dematchers are sub-optimal source encoders.

II. MINIMIZING I-DIVERGENCE

Let $\mathbb{R}$ be the set of real numbers. For a finite set $\mathcal{S}$, we say that $W\colon \mathcal{S} \to \mathbb{R}$ is a weighted distribution if $W(i) > 0$ for each $i \in \mathcal{S}$. We allow $\sum_{i \in \mathcal{S}} W(i) \neq 1$. The I-divergence of a distribution $P$ and a weighted distribution $W$ is

    D(P \| W) = \sum_{i \in \operatorname{supp} P} P(i) \log_2 \frac{P(i)}{W(i)}   (6)

where $\operatorname{supp} P$ denotes the support of $P$. The reason why we need this generalization of the notions of distribution and I-divergence will become clear in the next section.

Proposition 1. Let $Q$ be a weighted binary target distribution, and let

    T^* = \operatorname*{argmin}_{T \in \mathcal{C}_m} D(U_T \| Q_T)      (7)

be an optimal complete tree. Then we find that:

i. An optimal complete tree $T^*$ can be constructed by applying Tunstall coding to $Q$.
ii. If $0 < q_0 \leq 1$ and $0 < q_1 \leq 1$, then $T^*$ also minimizes $D(U_T \| Q_T)$ among all, possibly non-complete, binary trees $T \in \mathcal{T}_m$, i.e., the optimal tree is complete.

Proof: Part i. We write

    \operatorname*{argmin}_{T \in \mathcal{C}_m} D(U_T \| Q_T)
      = \operatorname*{argmin}_{T \in \mathcal{C}_m} \sum_{i \in \mathcal{L}} 2^{-m} \log_2 \frac{2^{-m}}{Q_T(i)}   (8)
      = \operatorname*{argmax}_{T \in \mathcal{C}_m} \sum_{i \in \mathcal{L}} \log_2 Q_T(i).                        (9)

Consider now an arbitrary complete tree $T \in \mathcal{C}_m$. Since the tree is complete, there exist (at least) two leaves that are siblings, say $j$ and $j+1$. Denote by $k$ the corresponding branching node. The contribution of these two leaves to the objective function on the right-hand side of (9) can be written as

    \log_2 Q_T(j) + \log_2 Q_T(j+1) = \log_2 [Q_T(k) q_0] + \log_2 [Q_T(k) q_1]
                                    = \log_2 Q_T(k) + \log_2 Q_T(k) + \log_2 q_0 + \log_2 q_1.   (10)

Now consider the tree $T'$ that results from removing the nodes $j$ and $j+1$. The new set of leaf nodes is $\mathcal{L}' = \{k\} \cup \mathcal{L} \setminus \{j, j+1\}$ and the new set of branching nodes is $\mathcal{B}' = \mathcal{B} \setminus \{k\}$. Also, $Q_{T'}$ defines a weighted leaf distribution on $\mathcal{L}'$. The same procedure can be applied repeatedly by defining $T := T'$, until $T$ consists only of the root node. We use this idea to rewrite the objective function on the right-hand side of (9) as follows:

    \sum_{i \in \mathcal{L}} \log_2 Q_T(i)
      = \sum_{i \in \mathcal{L}'} \log_2 Q_{T'}(i) + \log_2 Q_T(k) + \log_2 q_0 + \log_2 q_1
      = \cdots
      = \sum_{k \in \mathcal{B}} \log_2 Q_T(k) + (2^m - 1) [\log_2 q_0 + \log_2 q_1].   (11)

Since $(2^m - 1)[\log_2 q_0 + \log_2 q_1]$ is a constant independent of the tree $T$, we have

    \operatorname*{argmax}_{T \in \mathcal{C}_m} \sum_{i \in \mathcal{L}} \log_2 Q_T(i) = \operatorname*{argmax}_{T \in \mathcal{C}_m} \sum_{k \in \mathcal{B}} \log_2 Q_T(k).   (12)

The right-hand side of (12) is clearly maximized by the complete tree whose branching nodes have the greatest weighted probabilities. According to [8, p. 47], this is exactly the tree that is constructed when Tunstall coding is applied to the weighted distribution $Q$.

Part ii. We now consider $q_0 \leq 1$ and $q_1 \leq 1$. Assume we have constructed a non-complete binary tree. Because of non-completeness, we can remove a branch from the tree. Without loss of generality, assume that this branch is labeled by a zero. Denote by $\mathcal{S}$ the leaves on the subtree of the branch, and denote the tree after removing the branch by $T'$. Now,

    Q_{T'}(i) = \frac{Q_T(i)}{q_0} \geq Q_T(i), \quad \text{for each } i \in \mathcal{S}   (13)

where the inequality follows because $q_0 \leq 1$ by assumption. Thus, for the new tree $T'$, the objective function (9) is bounded as

    \sum_{i \in \mathcal{L}} \log_2 Q_{T'}(i) = \sum_{i \in \mathcal{L} \setminus \mathcal{S}} \log_2 Q_T(i) + \sum_{i \in \mathcal{S}} [\log_2 Q_T(i) - \log_2 q_0] \geq \sum_{i \in \mathcal{L}} \log_2 Q_T(i).   (14)

In summary, under the assumption $q_0 \leq 1$ and $q_1 \leq 1$, the objective function (9) that we want to maximize does not decrease when branches are removed, which shows that there is an optimal complete tree. This proves statement ii. of the proposition. ∎

III. MINIMIZING I-DIVERGENCE PER BIT

The following two propositions relate the problem of minimizing the I-divergence per bit to the problem of minimizing the un-normalized I-divergence. Let $\mathcal{T} \subseteq \mathcal{T}_m$ be some set of binary trees with $2^m$ leaves and define

    d^* := \min_{T \in \mathcal{T}} \frac{D(U_T \| Q_T)}{\mathrm{E}(L)}.   (15)

Proposition 2. We have

    T^* := \operatorname*{argmin}_{T \in \mathcal{T}} \frac{D(U_T \| Q_T)}{\mathrm{E}(L)} = \operatorname*{argmin}_{T \in \mathcal{T}} D(U_T \| \tilde{Q}_T)   (16)

where $\tilde{Q}_T$ is the weighted distribution induced by $\tilde{Q} := [q_0 2^{d^*},\ q_1 2^{d^*}]$.

Proof: By (15), for any tree $T \in \mathcal{T}$, we have

    \frac{D(U_T \| Q_T)}{\mathrm{E}(L)} \geq d^*, \quad \text{with equality if } T = T^*   (17)
    \Longleftrightarrow\ D(U_T \| Q_T) - d^* \mathrm{E}(L) \geq 0, \quad \text{with equality if } T = T^*.   (18)

We write the left-hand side of (18) as

    D(U_T \| Q_T) - d^* \mathrm{E}(L)
      = \sum_{i \in \mathcal{L}} U_T(i) \log_2 \frac{U_T(i)}{Q_T(i)} - d^* \sum_{i \in \mathcal{L}} U_T(i) \ell_i
      = \sum_{i \in \mathcal{L}} U_T(i) \left[ \log_2 \frac{U_T(i)}{Q_T(i)} - \log_2 2^{d^* \ell_i} \right]
      = \sum_{i \in \mathcal{L}} U_T(i) \log_2 \frac{U_T(i)}{Q_T(i) 2^{d^* \ell_i}}.   (19)

Consider the path through the tree that ends at leaf $i$. Denote by $\ell_i^0$ and $\ell_i^1$ the number of times the labels 0 and 1 occur on it, respectively. The length of the path can be expressed as $\ell_i = \ell_i^0 + \ell_i^1$. The term $Q_T(i) 2^{d^* \ell_i}$ can now be written as

    Q_T(i) 2^{d^* \ell_i} = q_0^{\ell_i^0} q_1^{\ell_i^1} 2^{d^* (\ell_i^0 + \ell_i^1)} = (q_0 2^{d^*})^{\ell_i^0} (q_1 2^{d^*})^{\ell_i^1} = \tilde{Q}_T(i).   (20)

Using (20) and (19) in (18) shows that for any binary tree $T \in \mathcal{T}$ we have

    D(U_T \| \tilde{Q}_T) \geq 0, \quad \text{with equality if } T = T^*   (21)

which is the statement of the proposition. ∎

Proposition 3. Define

    d^* := \min_{T \in \mathcal{C}_m} \frac{D(U_T \| Q_T)}{\mathrm{E}(L)}.   (22)

Then the optimal complete tree

    T^* := \operatorname*{argmin}_{T \in \mathcal{C}_m} \frac{D(U_T \| Q_T)}{\mathrm{E}(L)}   (23)

is constructed by applying Tunstall coding to $\tilde{Q} = [q_0 2^{d^*},\ q_1 2^{d^*}]$.

Proof: The proposition is a consequence of Prop. 2 and Prop. 1.i. ∎
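Propositions 1.i and 3 reduce both optimization problems to Tunstall coding of a (possibly weighted) binary distribution. The following Python sketch of the greedy Tunstall construction is ours, not the paper's: it repeatedly splits the most probable leaf until $2^m$ leaves exist, which by [8, p. 47] yields the complete tree whose branching nodes have the greatest probabilities.

    import heapq

    def tunstall(q0, q1, m):
        """Complete binary tree with 2**m leaves built by Tunstall coding:
        repeatedly split the most probable leaf. The weights q0, q1 need
        not sum to one (weighted distributions are allowed)."""
        # Max-heap via negated weights; start with the root's two children.
        heap = [(-q0, "0"), (-q1, "1")]
        while len(heap) < 2 ** m:
            p, path = heapq.heappop(heap)               # most probable leaf
            heapq.heappush(heap, (p * q0, path + "0"))  # p is negative, q0 > 0
            heapq.heappush(heap, (p * q1, path + "1"))
        return sorted(path for _, path in heap)

    # For q0 = 2/3 and m = 2, this reproduces the tree of Fig. 1,
    # i.e., the codes (3) and (4):
    print(tunstall(2/3, 1/3, 2))   # -> ['000', '001', '01', '1']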

Algorith. ˆT argin solved by Tunstall coding on Q repeat. ˆ = D(U ˆT Q ˆT ) E U ˆT [ D(UT Q T ) ˆ ] Tunstall on Q 2 ˆ 2. ˆT = argin until D(U ˆT Q ˆT ) ˆ E U ˆT = 0 = ˆ, T = ˆT A. Iterative Algorith By Prop. 3, if we know the I-divergence, then we can find T by Tunstall coding. However, is not known a priori. We solve this proble by iteratively applying Tunstall coding to Q 2 ˆ, where ˆ is an estiate of and by updating our estiate. This procedure is stated in Alg.. Proposition 4. Alg. finds (, T ) as defined in Prop. 3 in finitely any steps. Proof: The proof is siilar to the proof of [3, Prop. 4.]. We first show that is strictly onotonically decreasing. Let ˆ i be the value that is assigned to ˆ in step. of the ith iteration and denote by ˆT i the value that is assigned to ˆT in step 2. of the ith iteration. Suppose that the algorith does not terinate in the ith iteration. We have ˆ i = D(U ˆTi Q ˆTi ) E U ˆTi D(U ˆTi Q ˆTi ) ˆ i E U ˆTi = 0. (24) By step 2, we have [ ˆT i = argin D(UT Q T ) ˆ i ] (25) and since by our assuption the algorith does not terinate in the ith iteration, we have D(U ˆTi Q ˆTi ) ˆ i E U ˆTi < 0 D(U ˆTi Q ˆTi ) E U ˆTi < ˆ i ˆ i+ < ˆ i. (26) Now assue the algorith terinated, and let ˆT be the tree after terination. Because of the assignents in steps. and 2., the terinating condition iplies that for any tree T C, we have ˆ 0, with equality if T = ˆT. (27) Consequently, we have ˆ, with equality if T = ˆT. (28) We conclude that after terination, (, ˆT ) is equal to the optial tuple (, T ) in Prop. 3. Finally, we have shown that is strictly onotonically decreasing so that ˆTi ˆT j for all i < j. But there is only a finite nuber of coplete binary trees with 2 leaves. Thus, the algorith terinates after finitely any steps. IV. OPTIMALITY OF COMPLETE TREES Coplete trees are not optial in general: Consider = and Q: q 0 = 5 6, q = 6. For =, Tunstall coding constructs the (unique) coplete binary tree T with 2 leaves, independent of which target vector we pass to it. The path lengths are l = l 2 =. The I-divergence per bit achieved by this is = 2 log 2(q 0 q ) = 0.424 bits. (29) Now, we could instead use a non-coplete tree T with the paths 0 and 0. In this case, I-divergence per bit is = 2 log 2(q 0 q q 0 ) 2 ( + 2) = 0.37034 bits. (30) In suary, for the considered exaple, using a coplete tree is sub-optial. We will in the following derive siple conditions on the target vector Q that guarantee that the optial tree is coplete. A. Sufficient Conditions for Optiality Proposition 5. Let Q be a distribution. If ax{q 0, q } 4 in{q 0, q }, then the optial tree is coplete for any and it is constructed by Alg.. Proof: According to Prop..ii, the tree that iniizes D(U T Q T ) is coplete if the entries of the weighted distribution Q 2 are both less than or equal to one. Without loss of generality, assue that q 0 q. Thus, we only need to check this condition for q 0. We have q 0 2 log 2 q 0 + 0 log 2 q 0. (3) We calculate the value of that is achieved by the (unique) coplete tree with 2 leaves, naely = = 2 log 2(q 0 q ). (32) For each, this is achieved by the coplete tree with all path lengths equal to. Substituting the right-hand side of (32) for in (3), we obtain 2 log 2(q 0 q ) log 2 q 0 + log 2 (q 0 q ) 2 log2 q 0 2 q 0 q q 0 4q q 0 (33) which is the condition stated in the proposition.

B. Asymptotic Achievability for Complete Trees

Proposition 6. Denote by $T^*(m)$ the complete tree with $2^m$ leaves that is constructed by applying Alg. 1 to a target distribution $Q$. Then we have

    \frac{D(U_{T^*(m)} \| Q_{T^*(m)})}{\mathrm{E}(L)} \leq \frac{-\log_2 \min\{q_0, q_1\}}{m}   (34)

and, in particular, the I-divergence per bit approaches zero as $m \to \infty$.

Proof: For any tree, the expected length can be bounded by the converse of the coding theorem for a DMS [8, p. 45] as

    \mathrm{E}(L) \geq H[U_{T(m)}] = m.   (35)

Thus, we have

    \frac{D(U_{T^*(m)} \| Q_{T^*(m)})}{\mathrm{E}(L)} \leq \frac{1}{m} \min_{T(m) \in \mathcal{C}_m} D(U_{T(m)} \| Q_{T(m)}).   (36)

The tree that minimizes the right-hand side is found by applying Tunstall coding to $Q$. Without loss of generality, assume that $q_0 \geq q_1$. According to the Tunstall lemma [8, p. 47], the induced leaf probability of a tree constructed by Tunstall coding is lower bounded as

    Q_{T(m)}(i) \geq 2^{-m} q_1, \quad \text{for each leaf } i \in \mathcal{L}.   (37)

We can therefore bound the I-divergence as

    D(U_{T(m)} \| Q_{T(m)}) = \sum_{i \in \mathcal{L}} 2^{-m} \log_2 \frac{2^{-m}}{Q_{T(m)}(i)} \leq \sum_{i \in \mathcal{L}} 2^{-m} \log_2 \frac{2^{-m}}{2^{-m} q_1} = -\log_2 q_1.   (38)

We can now bound the I-divergence per bit as

    \frac{D(U_{T^*(m)} \| Q_{T^*(m)})}{\mathrm{E}(L)} \leq \frac{-\log_2 q_1}{m} \xrightarrow{m \to \infty} 0.   (39)

This proves the proposition. ∎

C. Optimality of Complete Trees for Large Enough m

Proposition 7. For any target distribution $Q$ with $q_0 < 1$ and $q_1 = 1 - q_0$, there is an $m_0$ such that for all $m > m_0$, the tree that minimizes

    \frac{D(U_T \| Q_T)}{\mathrm{E}(L)}   (40)

is complete.

Proof: Without loss of generality, assume that $q_0 \geq q_1$. By Prop. 6, we have $d^* \leq -\log_2 q_1 / m$, which tends to zero, while $-\log_2 q_0 > 0$. Thus, there exists an $m_0$ such that

    d^* \leq \frac{-\log_2 q_1}{m} \leq -\log_2 q_0 \ \Longleftrightarrow\ q_0 2^{d^*} \leq 1, \quad \text{for all } m > m_0.   (41)

Thus, for all $m > m_0$, both entries of $Q 2^{d^*}$ are smaller than or equal to one. The proposition now follows by Prop. 2 and Prop. 1.ii. ∎

TABLE I
COMPARISON OF V2F SOURCE CODING AND F2V DISTRIBUTION MATCHING
($Q\colon q_0 = 0.615$, $q_1 = 0.385$; $m = 2$)

                                             Tunstall on Q          Alg. 1 on Q
    v2f source encoder                       00 ↦ 00, 01 ↦ 01,      000 ↦ 00, 001 ↦ 01,
                                             10 ↦ 10, 11 ↦ 11       01 ↦ 10, 1 ↦ 11
    redundancy $D(Q_T \| U_T)/m$             0.038503               0.04176
    f2v distribution matcher                 00 ↦ 00, 01 ↦ 01,      00 ↦ 000, 01 ↦ 001,
                                             10 ↦ 10, 11 ↦ 11       10 ↦ 01, 11 ↦ 1
    I-divergence per bit $D(U_T \| Q_T)/\mathrm{E}(L)$   0.039206   0.037695

V. SOURCE CODING VERSUS DISTRIBUTION MATCHING

An ideal source encoder transforms the output of a DMS $Q$ into a sequence of bits that are independent and uniformly distributed. Reversely, applying the corresponding decoder to a sequence of uniformly distributed bits generates a sequence of symbols that are iid according to $Q$. This suggests designing an f2v distribution matcher by first calculating the optimal v2f source encoder; the inverse mapping is f2v and can be used as a distribution matcher. We illustrate by an example that this approach is sub-optimal in general. Consider the DMS $Q$ with $q_0 = 0.615$, $q_1 = 0.385$. We calculate the optimal binary v2f source encoder with blocklength $m = 2$ by applying Tunstall coding to $Q$. The resulting encoder is displayed in the first column of Table I. Using the source decoder as a distribution matcher results in an I-divergence per bit of 0.039206 bits. Next, we use Alg. 1 to calculate the optimal f2v matcher for $Q$. The resulting mapping is displayed in the second column of Table I. The achieved I-divergence per bit is 0.037695 bits, which is smaller than the value obtained by using the source decoder. In general, the decoder of an optimal v2f source encoder is a sub-optimal f2v distribution matcher, and the dematcher of an optimal f2v distribution matcher is a sub-optimal v2f source encoder.
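The figures of merit in Table I can be checked numerically. The following Python snippet is ours, with the same caveats as the sketches above; it evaluates both quantities for the two trees of Table I.

    import math

    def leaf_probs(leaves, q0, q1):
        """Q_T(i) for each leaf path when all branching distributions equal Q."""
        return [math.prod(q0 if b == "0" else q1 for b in p) for p in leaves]

    def idiv_per_bit(leaves, q0, q1):
        """D(U_T || Q_T) / E(L): the distribution-matching figure of merit."""
        u = 1.0 / len(leaves)
        d = sum(u * math.log2(u / q) for q in leaf_probs(leaves, q0, q1))
        return d / sum(u * len(p) for p in leaves)

    def redundancy_per_bit(leaves, q0, q1, m):
        """D(Q_T || U_T) / m: the source-coding figure of merit."""
        u = 1.0 / len(leaves)
        return sum(q * math.log2(q / u) for q in leaf_probs(leaves, q0, q1)) / m

    q0, q1, m = 0.615, 0.385, 2
    for tree in (["00", "01", "10", "11"], ["000", "001", "01", "1"]):
        print(redundancy_per_bit(tree, q0, q1, m), idiv_per_bit(tree, q0, q1))
    # -> approx. 0.0385, 0.0392 for Tunstall on Q and approx. 0.0418, 0.0377
    #    for Alg. 1 on Q, matching Table I up to rounding.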
REFERENCES

[1] Y. Steinberg and S. Verdú, "Simulation of random processes and rate-distortion theory," IEEE Trans. Inf. Theory, vol. 42, no. 1, pp. 63-86, 1996.
[2] D. Knuth and A. Yao, The complexity of nonuniform random number generation. New York: Academic Press, 1976, pp. 357-428.
[3] G. Böcherer, "Capacity-achieving probabilistic shaping for noisy and noiseless channels," Ph.D. dissertation, RWTH Aachen University, 2012. [Online]. Available: http://www.georg-boecherer.de/capacityachievingshaping.pdf
[4] G. Böcherer and R. Mathar, "Matching dyadic distributions to channels," in Proc. Data Compression Conf., 2011, pp. 23-32.
[5] R. A. Rueppel and J. L. Massey, "Leaf-average node-sum interchanges in rooted trees with applications," in Communications and Cryptography: Two Sides of One Tapestry, R. E. Blahut, D. J. Costello Jr., U. Maurer, and T. Mittelholzer, Eds. Kluwer Academic Publishers, 1994.
[6] G. Böcherer, "Rooted trees with probabilities revisited," Feb. 2013. [Online]. Available: http://arxiv.org/abs/1302.0753
[7] B. Tunstall, "Synthesis of noiseless compression codes," Ph.D. dissertation, 1967.
[8] J. L. Massey, "Applied digital information theory I," lecture notes, ETH Zurich. [Online]. Available: http://www.isiweb.ee.ethz.ch/archive/massey_scr/adit1.pdf