Optimum Binary-Constrained Homophonic Coding

Valdemar C. da Rocha Jr. and Cecilio Pimentel
Communications Research Group - CODEC
Department of Electronics and Systems, P.O. Box 7800
Federal University of Pernambuco, 50711-970 Recife PE, BRAZIL

The authors acknowledge partial support of this research by the Brazilian National Council for Scientific and Technological Development (CNPq) under Grants 30424/77-9 and 300987/96-0.

Abstract

This paper introduces an optimum homophonic coding scheme to handle the situation in which the symbols in each homophonic codeword are independent and identically distributed binary random variables obeying an arbitrary probability distribution. An algorithm is given for decomposing the source probabilities which produces, at each step, the expanded alphabet of least entropy. A proof is given that this procedure is optimum in the sense of minimising the overall redundancy.

1 Introduction

Homophonic coding [1, 2] is a cryptographic technique for reducing the redundancy of a message to be enciphered at the cost of some plaintext expansion. The technique consists of a one-to-many replacement of each letter of the original message by a substitute, or homophone, taken from a larger alphabet, to form the plaintext message that is then encrypted. Each homophone is represented (one-to-one) by a codeword, usually in such a manner as to produce uniformly distributed and statistically independent code symbols, consequently making the cipher more secure by increasing its unicity distance [4].

For simplicity, we consider only the homophonic coding of the output sequence U_1, U_2, U_3, ... of a K-ary discrete memoryless source (DMS), but the theory presented can be applied to sources with memory simply by replacing the probability distribution for U_i with the conditional probability distribution for U_i given the observed values of U_1, U_2, ..., U_{i-1}. For a K-ary DMS, the homophonic coding problem reduces to the problem of such coding for a single K-ary random variable U. Let U be a random variable taking values in the finite set {u_1, u_2, ..., u_K}. We assume, with no loss of essential generality, that all K values of U have non-zero probability and that K ≥ 2. Let V be a random variable representing the homophones for U, taking values in the set {v_1, v_2, ...}, which may be finite or countably infinite, and which is characterized by the fact that for each j there is exactly one i such that P(V = v_j | U = u_i) ≠ 0.

For D-ary variable-length homophonic encoding we associate with each homophone a distinct codeword, referred to as a homophonic codeword. Let X_1, X_2, ..., X_W denote a homophonic codeword, where X_i is a D-ary random variable and where the length W of the codeword is in general also a random variable. It is required that the homophonic codewords be assigned in such a manner that X_1, X_2, ..., X_W is a prefix-free encoding of V, i.e., such that the homophonic codewords x_1, x_2, ..., x_w are all distinct and none is the prefix of another. If the components X_1, X_2, ..., X_W of the homophonic codewords are independent and uniformly distributed D-ary random variables, the homophonic coding is said to be perfect [2]. Homophonic coding was defined in [2] to be optimum if it is perfect and minimises the average length of a homophone, E[W], over all perfect homophonic codings [3, 5].

In 2001 we presented at the VI International Symposium on Communication Theory & Applications [6] an algorithm to perform homophonic coding in which the symbols in each homophonic codeword are independent and identically distributed binary random variables obeying an arbitrary probability distribution Π_2 = {p, 1−p}, where p ≥ 1/2. The importance of this algorithm comes from the fact that it satisfies the upper bound for optimum binary-constrained prefix-free homophonic coding, which states that the entropy of a homophone exceeds the source entropy by less than h(p)/p bits [6], where h(·) is the binary entropy function. In the sequel we refer to this binary-constrained algorithm as the BCA-2001 algorithm. The BCA-2001 algorithm provides a solution to the problem posed by Knuth and Yao [7, p. 427] on the generation of probability distributions using a biased coin, a connection brought to our attention by Julia Abrahams [8].

Our motivation for the present paper was the observation that there are cases, as illustrated by Example 1 in Section 2, where homophonic coding can be performed more efficiently than with the BCA-2001 algorithm. In Section 2 we also prove two lemmas that show how to expand the source alphabet with the least increase in the expanded alphabet entropy. In Section 3 we introduce an algorithm to perform optimum binary-constrained homophonic coding and prove that it is optimum in the sense of minimising the overall redundancy. We also present an example where both the BCA-2001 algorithm and the algorithm proposed here produce identical results, and show that the upper bound h(p)/p referred to earlier is tight. Finally, in Section 4 we present some concluding remarks.

2 Motivation and basic results

We present next an example illustrating that in some cases it is possible to perform homophonic coding more efficiently than by using the BCA-2001 algorithm. This was our main motivation for trying to improve the BCA-2001 algorithm.

Example 1 Let U be the K = 2 DMS with P_U(u_1) = 5/9 and P_U(u_2) = 4/9. We consider the perfect binary-constrained homophonic coding of U when Π_2 = {2/3, 1/3} is the code alphabet probability distribution. Applying the BCA-2001 algorithm [6] we obtain

P_U(u_1) = 4/9 + Σ_{i=0}^∞ 8/3^(4+2i)
P_U(u_2) = 1/3 + 2/27 + Σ_{i=0}^∞ 8/3^(5+2i),

which lead to an average homophonic codeword length E[W] = 19/9 and to a homophonic coding redundancy E[W] − H(U) = 2.111 − 0.991 = 1.120 bits. On the other hand, by trial and error we obtain

P_U(u_1) = 2/9 + 3/9
P_U(u_2) = 4/9,

which lead to an average homophonic codeword length E[W] = 5/3 and to a homophonic coding redundancy E[W] − H(U) = 1.667 − 0.991 = 0.676 bits, i.e., a redundancy of about 60% of that obtained with the BCA-2001 algorithm.
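
The arithmetic in Example 1 can be checked directly. The short Python sketch below is not part of the original paper: it lists each homophone probability together with its depth in the code tree for Π_2 = {2/3, 1/3}, truncates the two infinite tails of the BCA-2001 decomposition, and prints E[W], H(U) and the redundancy. The helper name `redundancy` and the truncation length are choices made here for illustration.

```python
from fractions import Fraction
from math import log2

def redundancy(homophones, P_U):
    """homophones: list of (probability, codeword length) pairs covering the source."""
    E_W = sum(prob * length for prob, length in homophones)
    H_U = -sum(float(q) * log2(float(q)) for q in P_U)
    return float(E_W), H_U, float(E_W) - H_U

P_U = [Fraction(5, 9), Fraction(4, 9)]

# BCA-2001 decomposition of Example 1; the two infinite tails are truncated
# after 20 terms, which is more than enough for three decimal places.
bca2001 = [(Fraction(4, 9), 2), (Fraction(1, 3), 1), (Fraction(2, 27), 3)]
bca2001 += [(Fraction(8, 3**(4 + 2*i)), 4 + 2*i) for i in range(20)]   # tail of u_1
bca2001 += [(Fraction(8, 3**(5 + 2*i)), 5 + 2*i) for i in range(20)]   # tail of u_2

# Trial-and-error decomposition: u_1 -> {1/3, 2/9}, u_2 -> {4/9}.
trial = [(Fraction(1, 3), 1), (Fraction(2, 9), 2), (Fraction(4, 9), 2)]

print(redundancy(bca2001, P_U))   # approximately (2.111, 0.991, 1.120)
print(redundancy(trial, P_U))     # approximately (1.667, 0.991, 0.676)
```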

The following two lemmas provide the basis for constructing the algorithm presented in Section 3 for performing optimum binary-constrained homophonic coding.

Lemma 1 Let U be a K-ary DMS with probability distribution P_U = {P_U(u_1), P_U(u_2), ..., P_U(u_K)} and entropy H(U). The entropy H(V) of the probability distribution

P_V = {P_U(u_1), P_U(u_2), ..., βP_U(u_i), (1−β)P_U(u_i), ..., P_U(u_K)},

containing K + 1 terms and obtained by expanding the symbol u_i of U into two new symbols with probabilities βP_U(u_i) and (1−β)P_U(u_i), respectively, where 0 < β < 1, is given by

H(V) = H(U) + P_U(u_i) h(β),

where h(β) denotes the binary entropy function.

Proof: We prove this lemma by expanding and simplifying the expression for H(V) as follows:

H(V) = βP_U(u_i) log(1/(βP_U(u_i))) + (1−β)P_U(u_i) log(1/((1−β)P_U(u_i))) + Σ_{j=1, j≠i}^{K} P_U(u_j) log(1/P_U(u_j))
     = βP_U(u_i) log(1/β) + βP_U(u_i) log(1/P_U(u_i)) + (1−β)P_U(u_i) log(1/(1−β)) + (1−β)P_U(u_i) log(1/P_U(u_i)) + Σ_{j=1, j≠i}^{K} P_U(u_j) log(1/P_U(u_j))
     = Σ_{j=1}^{K} P_U(u_j) log(1/P_U(u_j)) + P_U(u_i)[β log(1/β) + (1−β) log(1/(1−β))]
     = H(U) + P_U(u_i) h(β).
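
As a quick sanity check of Lemma 1 (not part of the paper), the snippet below splits one symbol of an arbitrary distribution and compares the resulting entropy increase with P_U(u_i) h(β); the example distribution and the value of β are arbitrary choices made here.

```python
from math import log2

def H(dist):
    """Entropy in bits of a probability distribution."""
    return -sum(q * log2(q) for q in dist if q > 0)

def h(b):
    """Binary entropy function h(b)."""
    return H([b, 1 - b])

# Splitting u_i (probability 0.3) into two symbols with probabilities
# beta*0.3 and (1 - beta)*0.3 raises the entropy by exactly 0.3*h(beta).
P_U = [0.5, 0.3, 0.2]
beta = 0.4
P_V = [0.5, beta * 0.3, (1 - beta) * 0.3, 0.2]
print(abs(H(P_V) - (H(P_U) + 0.3 * h(beta))))   # ~0 up to rounding error
```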

Lemma 2 Let U be a K-ary DMS with probability distribution P_U = {P_U(u_1), P_U(u_2), ..., P_U(u_K)} and entropy H(P_U(u_1), P_U(u_2), ..., P_U(u_i), ..., P_U(u_j), ..., P_U(u_K)). Let

P_{V_1} = {P_U(u_1), P_U(u_2), ..., α, P_U(u_i) − α, ..., P_U(u_K)}

be the probability distribution obtained by expanding the symbol u_i of U into two new symbols with probabilities α and P_U(u_i) − α, respectively, and let

P_{V_2} = {P_U(u_1), P_U(u_2), ..., α, P_U(u_j) − α, ..., P_U(u_K)}

be the probability distribution obtained by expanding the symbol u_j of U into two new symbols with probabilities α and P_U(u_j) − α, respectively, i.e., both P_{V_1} and P_{V_2} contain K + 1 terms each and 0 < α < min{P_U(u_i), P_U(u_j)}. The entropy H(V_1), associated with P_{V_1}, is greater than the entropy H(V_2), associated with P_{V_2}, if and only if P_U(u_i) > P_U(u_j).

Proof: Let Δ = P_U(u_i) + P_U(u_j) − α > 0. If P_U(u_i) > P_U(u_j) it follows that either

P_U(u_i) > P_U(u_i) − α > P_U(u_j) > P_U(u_j) − α   or   P_U(u_i) > P_U(u_j) > P_U(u_i) − α > P_U(u_j) − α,

or, equivalently, P_U(u_i)/Δ > (P_U(u_i) − α)/Δ > P_U(u_j)/Δ > (P_U(u_j) − α)/Δ or P_U(u_i)/Δ > P_U(u_j)/Δ > (P_U(u_i) − α)/Δ > (P_U(u_j) − α)/Δ. Noting that (P_U(u_j) − α)/Δ = 1 − P_U(u_i)/Δ and (P_U(u_i) − α)/Δ = 1 − P_U(u_j)/Δ, in either case P_U(u_i)/Δ exceeds both P_U(u_j)/Δ and 1 − P_U(u_j)/Δ, so it lies farther from 1/2, and it follows that h(P_U(u_i)/Δ) < h(P_U(u_j)/Δ), i.e., h(P_U(u_j)/Δ) − h(P_U(u_i)/Δ) > 0. Expanding this difference gives

h(P_U(u_j)/Δ) − h(P_U(u_i)/Δ) = (1/Δ)[ (P_U(u_i) − α) log(1/(P_U(u_i) − α)) − (P_U(u_j) − α) log(1/(P_U(u_j) − α)) − P_U(u_i) log(1/P_U(u_i)) + P_U(u_j) log(1/P_U(u_j)) ] > 0.

However, by subtracting H(V_2) from H(V_1) it follows that

H(V_1) − H(V_2) = (P_U(u_i) − α) log(1/(P_U(u_i) − α)) − P_U(u_i) log(1/P_U(u_i)) − (P_U(u_j) − α) log(1/(P_U(u_j) − α)) + P_U(u_j) log(1/P_U(u_j))
               = Δ [ h(P_U(u_j)/Δ) − h(P_U(u_i)/Δ) ] > 0.    (1)

By making α = β_i P_U(u_i) = β_j P_U(u_j) in (1) it follows that

H(V_1) − H(V_2) = P_U(u_i) h(β_i) − P_U(u_j) h(β_j) > 0.    (2)

Corollary 1 (to Lemmas 1 and 2) Let u_r be a source symbol for which P_U(u_r) − α ≥ 0 is a minimum, r ∈ {1, 2, ..., K}. To expand the source alphabet by one symbol as in Lemma 2, with the least increase in the resulting expanded alphabet entropy, we must replace the symbol u_r by two symbols whose probabilities are α and P_U(u_r) − α, respectively.

3 An optimum algorithm

In standard D-ary homophonic coding the designer benefits from the fact that a given symbol probability P_U(u_i), 0 < P_U(u_i) < 1, has an essentially unique base-D decomposition. This follows because P_U(u_i) either has a unique decomposition as an infinite sum of negative powers of D, or it has both a decomposition as a finite sum of distinct negative powers of D and a decomposition as an infinite sum of negative powers of D in which the smallest term of the finite decomposition is expanded as an infinite sum of successive negative powers of D. For example, for D = 3, P_U(u_i) = 4/9 can be decomposed either as P_U(u_i) = 1/3 + 1/9 or as P_U(u_i) = 1/3 + (1/27) Σ_{i=0}^∞ (2/3)^i.
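
For illustration, the finite base-D decomposition mentioned above can be read off the base-D digits of the probability. The sketch below is an illustrative aside (the helper name `base_D_terms` is hypothetical); it reproduces the finite expansion 4/9 = 1/3 + 1/9 for D = 3.

```python
from fractions import Fraction

def base_D_terms(x, D, max_terms=12):
    """Base-D digits of 0 < x < 1, returned as a list of negative powers of D
    (a digit d at position k contributes the term 1/D**k repeated d times)."""
    terms, k = [], 1
    while x > 0 and len(terms) < max_terms:
        x *= D
        digit = int(x)          # next base-D digit
        x -= digit
        terms += [Fraction(1, D**k)] * digit
        k += 1
    return terms

# The finite expansion of 4/9 in base 3; the alternative infinite expansion
# replaces the last term 1/9 by (1/27) * sum_{i>=0} (2/3)^i.
print(base_D_terms(Fraction(4, 9), 3))   # [Fraction(1, 3), Fraction(1, 9)]
```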

Constrained homophonic coding unfortunately does not inherit the essentially unique probability decomposition property described above. This means that in order to split the source symbols into homophones we need to work with the whole set of source symbol probabilities, instead of working with only one symbol probability at a time as in the D-ary case. We describe next a way to perform binary-constrained homophonic coding in the form of an algorithm, which we will refer to as the BCA-2003 algorithm.

3.1 The BCA-2003 Algorithm

Let Π_2 = {p, 1−p}, p ≥ 1/2, be the probability distribution for the homophonic codeword symbols. For a given source, the BCA-2003 algorithm simultaneously finds the decomposition of each source symbol probability as a sum (finite or infinite) of terms p^λ (1−p)^(l−λ), and the corresponding prefix-free homophonic code, where λ is the number of ones and l − λ is the number of zeroes of a homophonic codeword of length l. The homophones are selected as terminal nodes in the binary rooted tree with probabilities, T. From any non-terminal node of this tree two branches emanate, with probabilities p and p̄ = 1 − p, respectively. The label of a path in T is the sequence of zeroes and ones associated with the branches constituting the path. The probability of a path of length l in T, containing λ ones and l − λ zeroes, is p^λ (1−p)^(l−λ). In particular, the probability of a terminal node is computed along the path extending from the root node to that terminal node. Let v(i, j) denote the j-th homophone assigned to the source symbol u_i, 1 ≤ i ≤ K, j = 1, 2, ..., and let α(i, j) denote the probability of v(i, j).

Definition 1 We define the symbol running sum γ_m(i), associated with the symbol u_i, 1 ≤ i ≤ K, at the m-th iteration of the BCA-2003 algorithm as

γ_m(i) = P_U(u_i) − Σ_{k=1}^{j} α(i, k),

with γ_m(i) = P_U(u_i) for j = 0, where j denotes the number of homophones allocated to u_i up to the m-th iteration.

Definition 2 We define the running sum set Γ_m at the m-th iteration of the algorithm as Γ_m = {γ_m(i) : γ_m(i) > 0, 1 ≤ i ≤ K}, with Γ_0 = {P_U(u_1), P_U(u_2), ..., P_U(u_K)}. Let γ_max = max γ_m(i), γ_m(i) ∈ Γ_m.

At m = 0 we grow T from the root, starting with only two leaves. We expand each terminal node of T whose probability exceeds γ_max by the least number of branches sufficient to make the resulting extended terminal node probabilities less than or equal to γ_max. We call the resulting tree the processed binary rooted tree with probabilities, T_p. At the m-th iteration, m ≥ 1, a homophone is assigned to a terminal node of the corresponding T_p, in such a manner that the unused terminal node with largest probability P_m is assigned as a homophone to the symbol u_r having the minimum non-negative value of the difference between its homophone running sum γ_m(r) and P_m, i.e., such that min_i {γ_m(i) − P_m : γ_m(i) − P_m ≥ 0} = γ_m(r) − P_m ≥ 0, 1 ≤ i ≤ K. The algorithm consists of the following steps; a sketch in code follows the list.

1. Let m = 0. Let γ_0(i) = P_U(u_i), 1 ≤ i ≤ K. Let Γ_0 = {P_U(u_1), P_U(u_2), ..., P_U(u_K)}.

2. Determine γ_max and produce the tree T_p for the m-th iteration by expanding each terminal node of the tree from the (m−1)-th iteration, m ≥ 1, whose probability exceeds γ_max, by the least number of branches sufficient to make the resulting extended terminal node probabilities less than or equal to γ_max.

3. Find the unused path E_l of length l in T_p whose probability is largest among unused paths, and denote this largest probability by P_m.

4. If, for 1 ≤ i ≤ K, min_i {γ_m(i) − P_m : γ_m(i) − P_m ≥ 0} = γ_m(r) − P_m ≥ 0, then associate with u_r the homophone (terminal node) v(r, j) and the binary homophonic codeword of length l whose digits constitute the labelling of E_l in T_p. This implies α(r, j) = P_m. Compute the symbol running sum γ′_m(r) after this decomposition and let Γ′_m = Γ_m \ {γ_m(r)}. If γ′_m(r) = 0 then let Γ_{m+1} = Γ′_m; the decomposition of P_U(u_r) is now complete and contains j homophones, and if Γ_{m+1} = ∅ then END. Otherwise, i.e., if γ′_m(r) > 0, let Γ_{m+1} = Γ′_m ∪ {γ′_m(r)}.

5. Let m ← m + 1.

6. Go to step 2.
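
The sketch below is one possible reading of the steps above, not the authors' implementation: it uses exact rational arithmetic, keeps the unused terminal nodes of T_p in a list, assumes that the branch of probability p is labelled '1' (the paper fixes only the pair Π_2 = {p, 1−p}, so the labelling is a choice made here), and truncates decompositions that would be infinite. Running it on the source of Example 2 below reproduces the homophone probabilities and E[W] = 68/27 reported there.

```python
from fractions import Fraction

def bca2003(P, p, max_homophones=50):
    """Sketch of the BCA-2003 iteration using exact arithmetic.

    P : list of source symbol probabilities summing to 1
    p : probability of the code symbol labelled '1' (an assumption made here)
    Returns a list of (symbol index, codeword, homophone probability) triples;
    max_homophones truncates decompositions that would be infinite.
    """
    q = 1 - p
    gamma = list(P)                       # running sums gamma_m(i)
    leaves = [(p, "1"), (q, "0")]         # unused terminal nodes of T_p
    homophones = []

    while len(homophones) < max_homophones:
        active = [g for g in gamma if g > 0]
        if not active:
            break                         # every P_U(u_i) is fully decomposed
        gamma_max = max(active)

        # Step 2: split every unused leaf whose probability exceeds gamma_max
        # by the least number of branches needed to reach gamma_max or less.
        todo, leaves = leaves, []
        while todo:
            prob, word = todo.pop()
            if prob > gamma_max:
                todo.append((prob * p, word + "1"))
                todo.append((prob * q, word + "0"))
            else:
                leaves.append((prob, word))

        # Step 3: unused terminal node of largest probability P_m.
        P_m, word = max(leaves)

        # Step 4: assign it to the symbol u_r minimising gamma(r) - P_m >= 0.
        candidates = [i for i, g in enumerate(gamma) if g >= P_m]
        r = min(candidates, key=lambda i: gamma[i] - P_m)
        leaves.remove((P_m, word))
        gamma[r] -= P_m
        homophones.append((r, word, P_m))

    return homophones


if __name__ == "__main__":
    # Source of Example 2: P_U = {53/81, 16/81, 4/27}, code distribution {2/3, 1/3}.
    P = [Fraction(53, 81), Fraction(16, 81), Fraction(4, 27)]
    result = bca2003(P, Fraction(2, 3))
    for r, word, prob in result:
        print(f"u_{r + 1}: codeword {word}, probability {prob}")
    print("E[W] =", sum(len(word) * prob for _, word, prob in result))   # 68/27
```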

Example 2 Let U be the K = 3 DMS with P_U = {53/81, 16/81, 4/27}. We consider the perfect binary-constrained homophonic coding of U when Π_2 = {2/3, 1/3} is the code alphabet probability distribution. Applying the BCA-2001 algorithm [6] we obtain

P_U(u_1) = 53/81 = 4/9 + 4/27 + 4/81 + 2/243 + Σ_{i=0}^∞ 8/3^(7+2i)
P_U(u_2) = 16/81 = 4/27 + 4/81
P_U(u_3) = 4/27 = 1/9 + 2/81 + Σ_{i=0}^∞ 8/3^(6+2i),

which lead to an average homophonic codeword length E[W] = 214/81 and to a homophonic coding redundancy E[W] − H(U) = 2.642 − 1.271 = 1.371 bits. On the other hand, by using the BCA-2003 algorithm we obtain

P_U(u_1) = 53/81 = 4/9 + 1/9 + 2/27 + 2/81
P_U(u_2) = 16/81 = 4/27 + 4/81
P_U(u_3) = 4/27,

which lead to an average homophonic codeword length E[W] = 68/27 and to a homophonic coding redundancy E[W] − H(U) = 2.519 − 1.271 = 1.248 bits, i.e., a redundancy of about 91% of that obtained with the BCA-2001 algorithm.

Proposition 1 Let U be a K-ary DMS with probability distribution P_U = {P_U(u_1), P_U(u_2), ..., P_U(u_K)} and entropy H(U). Let Π_2 = {p, 1−p} be the homophonic code alphabet probability distribution. The perfect binary-constrained homophonic coding of U in the manner described by the BCA-2003 algorithm minimises the redundancy E[W] − H(U) and is therefore optimum.

Proof: It is well known [9] that the homophone entropy H(V) in perfect homophonic coding is related to the average homophonic codeword length E[W] by the expression H(V) = h(p) E[W]. Therefore, by minimising H(V) we also minimise E[W]. At each step of the BCA-2003 algorithm the alphabet increases by one symbol and, by Lemma 2, the associated alphabet entropy increases by the least possible amount. This procedure is repeated until the decomposition of U into homophones V is complete. It follows that H(V) has the minimum possible value for a given P_U, and the proposition follows.

The tightness of the upper bound H(V|U) < h(p)/p [6] is demonstrated in the following example.

Example 3 Let U be the K = 2 DMS with P_U(u_1) = 1 − (1 − p)^n = 1 − p̄^n and P_U(u_2) = (1 − p)^n = p̄^n. We consider the perfect binary-constrained homophonic coding of U when Π_2 = {p, 1−p} is the code alphabet probability distribution. Applying the BCA-2003 algorithm we obtain the following probabilities for the homophones representing u_1:

α(1, 1) = p, α(1, 2) = p̄p, α(1, 3) = p̄^2 p, ..., α(1, j) = p̄^(j−1) p, ..., α(1, n) = p̄^(n−1) p,

and for u_2 we obtain a single homophone v(2, 1) whose probability is α(2, 1) = p̄^n. It follows that H(V|U = u_1) = −H(U)/(1 − p̄^n) + h(p)/p and H(V|U = u_2) = 0, and thus

H(V|U) = P_U(u_1) H(V|U = u_1) + P_U(u_2) H(V|U = u_2) = (1 − p̄^n) H(V|U = u_1) = −H(U) + (1 − p̄^n) h(p)/p.    (3)

However, since H(V) = H(U) + H(V|U) = (1 − p̄^n) h(p)/p, it follows from (3) that

lim_{n→∞} H(V|U) = h(p)/p,

because lim_{n→∞} H(U) = 0. This example shows that the upper bound h(p)/p is tight. We remark that both the BCA-2001 and BCA-2003 algorithms produce identical results in this example because at each step of either algorithm there is only one possibility for performing the source expansion, i.e., γ_m(1) − P_m > 0 and γ_m(2) − P_m < 0 for 1 ≤ m ≤ n, while for m = n + 1 we have γ_{n+1}(1) = 0 and γ_{n+1}(2) − P_{n+1} = 0.

4 Conclusions

We introduced an algorithm for performing optimum binary-constrained homophonic coding, which has the interesting characteristic of simultaneously performing the selection of the homophone probabilities and of the associated homophonic codewords. Furthermore, we derived properties of this algorithm which allowed us to prove its optimality for minimising the redundancy of binary-constrained homophonic coding. We remark that the BCA-2003 algorithm produces the same results as those obtained by trial and error in Example 1.

References

[1] Ch. G. Günther, "A universal algorithm for homophonic coding," pp. 405-414 in Advances in Cryptology - Eurocrypt '88, Lecture Notes in Computer Science, No. 330. Heidelberg and New York: Springer, 1988.

[2] H. N. Jendal, Y. J. B. Kuhn and J. L. Massey, "An information-theoretic approach to homophonic substitution," pp. 382-394 in Advances in Cryptology - Eurocrypt '89 (Eds. J.-J. Quisquater and J. Vandewalle), Lecture Notes in Computer Science, No. 434. Heidelberg and New York: Springer, 1990.

[3] V. C. da Rocha Jr. and J. L. Massey, "On the entropy bound for optimum homophonic substitution," Proc. IEEE International Symposium on Information Theory, Ulm, Germany, 29 June - 4 July 1997, p. 93.

[4] C. E. Shannon, "Communication theory of secrecy systems," Bell System Tech. J., vol. 28, pp. 656-715, Oct. 1949.

[5] V. C. da Rocha Jr. and J. L. Massey, "Better than optimum homophonic substitution," Proc. IEEE International Symposium on Information Theory, Sorrento, Italy, 25-30 June 2000, p. 24.

[6] V. C. da Rocha Jr. and C. Pimentel, "Binary-constrained homophonic coding," VI International Symposium on Communication Theory & Applications, 15-20 July 2001, Ambleside, England, pp. 263-268.

[7] D. E. Knuth and A. C. Yao, "The complexity of random number generation," in J. F. Traub, editor, Algorithms and Complexity: Recent Results and New Directions, Proceedings of the Symposium on New Directions and Recent Results in Algorithms and Complexity, Carnegie Mellon University, 1976. Academic Press, New York, 1976.

[8] J. Abrahams, "Generation of discrete distributions from biased coins," IEEE Trans. Inform. Theory, vol. IT-42, pp. 1541-1546, September 1996.

[9] J. L. Massey, Applied Digital Information Theory, Fach Nr. 35-47 G, 7. Semester, class notes, ETH Zurich, Chapter 2, Wintersemester 1988-1989.