Motivation for Arithmetic Coding

Motivations for arithmetic coding:

1) The Huffman coding algorithm generates prefix codes with minimum average codeword length, but this length is usually strictly greater than the entropy $H(X_1)$.
2) To improve coding efficiency, one can use a block memoryless code that works on the extended alphabet $\mathcal{X}^n$, but the computational complexity grows exponentially as $n$ increases.

Thus for small $n$ Huffman coding is inefficient, while for large $n$ it is impractical because of its exponential coding complexity.

Solution: Arithmetic coding is one of the algorithms that address this issue. It can achieve the entropy rate of a stationary source with linear coding complexity.

Shannon-Fano-Elias Codes

Let $(X_1, \ldots, X_n)$ be a random vector with joint pmf $p(u_1 u_2 \cdots u_n)$, $u_i \in \mathcal{X} = \{x_0, \ldots, x_{J-1}\}$. We partition the interval $[0, 1]$ into disjoint sub-intervals $I(u_1 u_2 \cdots u_n)$, $u_1 u_2 \cdots u_n \in \mathcal{X}^n$, such that the following properties hold:

1) The length of the interval $I(u_1 u_2 \cdots u_n)$ is equal to $p(u_1 u_2 \cdots u_n)$.
2) $\bigcup_{u_1 \cdots u_n \in \mathcal{X}^n} I(u_1 u_2 \cdots u_n) = [0, 1]$.
3) The intervals $I(u_1 u_2 \cdots u_n)$ are arranged according to the natural lexicographic order on the sequences $u_1 u_2 \cdots u_n$.

[Figure: for $n = 1$, the interval $[0, 1]$ is divided from left to right into $I(x_0), I(x_1), I(x_2), \ldots, I(x_{J-2}), I(x_{J-1})$; for $n = 2$, into $I(x_0 x_0), I(x_0 x_1), \ldots, I(x_0 x_{J-1}), I(x_1 x_0), \ldots, I(x_{J-1} x_{J-1})$.]

Shannon-Fano-Elias Codes (cntd)

$I(x_0 x_0 \cdots x_0) = [0,\; p(x_0 x_0 \cdots x_0)]$
$I(x_0 \cdots x_0 x_1) = [p(x_0 \cdots x_0 x_0),\; p(x_0 \cdots x_0 x_0) + p(x_0 \cdots x_0 x_1)]$
$\vdots$
$I(x_{J-1} x_{J-1} \cdots x_{J-1}) = [1 - p(x_{J-1} x_{J-1} \cdots x_{J-1}),\; 1]$

To get the codeword corresponding to $u_1 u_2 \cdots u_n$, let $I(u_1 u_2 \cdots u_n) = [a, b]$. Represent the midpoint $(a+b)/2$ by its binary expansion

$$\frac{a+b}{2} = 0.B_1 B_2 \cdots B_L \cdots = \sum_{i=1}^{\infty} B_i 2^{-i}, \quad B_i \in \{0, 1\}.$$

Let

$$L = \lceil -\log p(u_1 \cdots u_n) \rceil + 1 = \lceil -\log (b - a) \rceil + 1.$$

The binary sequence $B_1 B_2 \cdots B_L$ is the codeword of $u_1 u_2 \cdots u_n$. The length of the codeword assigned to $u_1 u_2 \cdots u_n$ is therefore $\lceil -\log p(u_1 \cdots u_n) \rceil + 1$.

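Shannon-Fano-Elias Codes: Decoding

Let

$$\left\lfloor \frac{a+b}{2} \right\rfloor_L = 0.B_1 B_2 \cdots B_L$$

be the real number obtained by rounding off $(a+b)/2$ to the first $L$ bits. We can prove that $\lfloor (a+b)/2 \rfloor_L$ is inside the interval $[a, b]$:

$$\frac{a+b}{2} - \left\lfloor \frac{a+b}{2} \right\rfloor_L = 0.\underbrace{00 \cdots 0}_{L} B_{L+1} B_{L+2} \cdots = \sum_{i=L+1}^{\infty} B_i 2^{-i} < 2^{-L} = 2^{-(\lceil -\log p(u_1 \cdots u_n) \rceil + 1)} \le \frac{1}{2}\, p(u_1 \cdots u_n) = \frac{b-a}{2}.$$

Hence $\lfloor (a+b)/2 \rfloor_L$ is inside $[a, b]$. Furthermore,

$$\left[ \left\lfloor \frac{a+b}{2} \right\rfloor_L,\; \left\lfloor \frac{a+b}{2} \right\rfloor_L + 2^{-L} \right] \subset [a, b].$$

After receiving the codeword $B_1 B_2 \cdots B_L$, the decoder searches through all $u_1 u_2 \cdots u_n \in \mathcal{X}^n$ until it finds the unique $u_1 u_2 \cdots u_n$ for which $I(u_1 u_2 \cdots u_n)$ contains $\lfloor (a+b)/2 \rfloor_L = 0.B_1 B_2 \cdots B_L$, and then decodes $B_1 B_2 \cdots B_L$ as that $u_1 u_2 \cdots u_n$.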

Shannon-Fano-Elias Codes: Example

Here $L(x) = \lceil -\log p(x) \rceil + 1$, and the midpoint is written in binary.

x      p(x)     I(x)             L(x)   midpoint   C(x)
x_0    0.25     [0, 0.25]        3      0.001      001
x_1    0.5      [0.25, 0.75]     2      0.10       10
x_2    0.125    [0.75, 0.875]    4      0.1101     1101
x_3    0.125    [0.875, 1]       4      0.1111     1111

The Shannon-Fano-Elias code is a prefix code.
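The table can be checked with a short Python sketch. It is only an illustration for the single-symbol case ($n = 1$): the function names are hypothetical, logarithms are taken to base 2, and exact rational arithmetic (`fractions.Fraction`) is an implementation choice made here to avoid rounding issues.

```python
from fractions import Fraction
from math import floor

def code_length(p):
    """Return ceil(-log2 p) + 1, computed exactly for a Fraction p in (0, 1]."""
    k = 0
    while Fraction(1, 2 ** k) > p:      # smallest k with 2^-k <= p
        k += 1
    return k + 1

def sfe_code(pmf):
    """Map each symbol (in the listed order) to its Shannon-Fano-Elias codeword."""
    codes, a = {}, Fraction(0)
    for sym, p in pmf:
        b = a + p                       # I(sym) = [a, b]
        L = code_length(p)
        mid = (a + b) / 2
        # Keep only the first L binary digits of the midpoint.
        codes[sym] = f"{floor(mid * 2 ** L):0{L}b}"
        a = b
    return codes

def sfe_decode(pmf, bits):
    """Search the intervals for the one that contains 0.B1...BL."""
    point = Fraction(int(bits, 2), 2 ** len(bits))
    a = Fraction(0)
    for sym, p in pmf:
        if a <= point < a + p:
            return sym
        a += p
    raise ValueError("no interval contains the point")

pmf = [("x0", Fraction(1, 4)), ("x1", Fraction(1, 2)),
       ("x2", Fraction(1, 8)), ("x3", Fraction(1, 8))]
codes = sfe_code(pmf)
print(codes)
```

The decoder here is the plain search described on the decoding slide, so its cost grows with the number of intervals.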

Arithmetic Coding

The encoding complexity of the Shannon-Fano-Elias coding algorithm lies mainly in determining the interval $I(u_1 u_2 \cdots u_n)$. Similarly, given $B_1 B_2 \cdots B_L$, the decoding complexity lies mainly in finding the unique interval $I(u_1 u_2 \cdots u_n)$ that contains the point $0.B_1 B_2 \cdots B_L$.

In arithmetic coding, both processes can be realized sequentially with linear complexity.

The idea of arithmetic coding originated with Elias and was later made practical by Rissanen, Pasco, Moffat and Witten.

Arithmetic Coding (Continued)

1) To determine the interval $I(u_1 u_2 \cdots u_n)$, we decompose the joint probability as

$$p(u_1 u_2 \cdots u_n) = p(u_1)\, p(u_2 \mid u_1)\, p(u_3 \mid u_1 u_2) \cdots p(u_n \mid u_1 \cdots u_{n-1})$$

and then construct a sequence of embedded intervals

$$I(u_1) \supset I(u_1 u_2) \supset \cdots \supset I(u_1 u_2 \cdots u_n).$$

2) Partition the interval $[0, 1]$ into disjoint subintervals $I(x_j)$, $0 \le j \le J-1$, arranged from left to right as $I(x_0), I(x_1), I(x_2), \ldots, I(x_{J-2}), I(x_{J-1})$. The length of the interval $I(x_j)$ is equal to $p(x_j)$. Then $I(u_1) = I(x_j)$ if $u_1 = x_j$.

3) If $I(u_1 u_2 \cdots u_i) = [a_i, b_i]$, we partition $[a_i, b_i]$ into disjoint sub-intervals $I(u_1 \cdots u_i x_j)$, $0 \le j \le J-1$, according to the conditional pmf $p(x_j \mid u_1 \cdots u_i)$, $0 \le j \le J-1$. From $a_i$ on the left to $b_i$ on the right, the sub-intervals are $I(u_1 \cdots u_i x_0), I(u_1 \cdots u_i x_1), \ldots, I(u_1 \cdots u_i x_{J-1})$.

Arithmetic Coding (Continued)

The length of the interval $I(u_1 \cdots u_i x_j)$ is equal to

$$p(u_1 \cdots u_i x_j) = p(u_1 \cdots u_i)\, p(x_j \mid u_1 \cdots u_i) = (\text{length of } [a_i, b_i]) \times p(x_j \mid u_1 \cdots u_i).$$

Then $I(u_1 \cdots u_i u_{i+1}) = I(u_1 \cdots u_i x_j)$ if $u_{i+1} = x_j$.

4) Repeat step 3) until the interval $I(u_1 \cdots u_n)$ is determined. This last interval is the desired interval.

5) To get the codeword corresponding to $u_1 \cdots u_n$, we apply the same procedure as in Shannon-Fano-Elias coding. Let $I(u_1 u_2 \cdots u_n) = [a, b]$ and let $L = \lceil -\log p(u_1 \cdots u_n) \rceil + 1$. Rounding off the midpoint $(a+b)/2$ to the first $L$ bits, we get

$$\left\lfloor \frac{a+b}{2} \right\rfloor_L = 0.B_1 B_2 \cdots B_L.$$

The sequence $B_1 B_2 \cdots B_L$ is the codeword corresponding to $u_1 \cdots u_n$.
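Steps 1) to 5) can be sketched in Python as follows. This is a minimal illustration rather than a practical coder: it assumes a memoryless source, so every conditional pmf equals one fixed pmf, it uses exact fractions instead of finite-precision arithmetic, and the function name `arithmetic_encode` is hypothetical.

```python
from fractions import Fraction
from math import floor

def arithmetic_encode(message, pmf):
    """Encode `message` using a fixed pmf given as an ordered dict symbol -> Fraction."""
    symbols = list(pmf)
    a, width = Fraction(0), Fraction(1)          # current interval [a, a + width]
    for u in message:
        # Mass of the symbols that precede u gives the offset of I(u1...ui u).
        offset = sum(pmf[s] for s in symbols[:symbols.index(u)])
        a += width * offset
        width *= pmf[u]
    # width now equals p(u1...un); L = ceil(-log2 width) + 1.
    k = 0
    while Fraction(1, 2 ** k) > width:
        k += 1
    L = k + 1
    mid = a + width / 2
    # Keep the first L binary digits of the midpoint.
    return f"{floor(mid * 2 ** L):0{L}b}"

pmf = {"0": Fraction(2, 5), "1": Fraction(3, 5)}
codeword = arithmetic_encode("10110", pmf)
print(codeword)
```

Each symbol costs a constant amount of interval arithmetic, which is where the linear complexity comes from. With $p(0) = 2/5$, $p(1) = 3/5$ and the message 10110, the sketch returns 100100.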

Arithmetic Coding (Decoding)

The decoding process can be realized sequentially.

1) Partition $[0, 1)$ into disjoint sub-intervals $I(x_j)$, $0 \le j \le J-1$. If $0.B_1 B_2 \cdots B_L \in I(x_j)$, set $u_1 = x_j$.
2) Having decoded $u_1 u_2 \cdots u_i$, partition $I(u_1 u_2 \cdots u_i)$ into disjoint subintervals $I(u_1 u_2 \cdots u_i x_j)$, $0 \le j \le J-1$. If $0.B_1 B_2 \cdots B_L \in I(u_1 u_2 \cdots u_i x_j)$, set $u_{i+1} = x_j$.
3) Repeat step 2) until the sequence $u_1 u_2 \cdots u_n$ is decoded.
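A matching decoder sketch, under the same assumptions as before (memoryless source with a fixed pmf, exact fractions, known length $n$); the name `arithmetic_decode` is hypothetical.

```python
from fractions import Fraction

def arithmetic_decode(bits, n, pmf):
    """Recover n symbols from the codeword `bits` (a string of 0s and 1s)."""
    point = Fraction(int(bits, 2), 2 ** len(bits))   # the number 0.B1B2...BL
    a, width = Fraction(0), Fraction(1)              # current interval
    decoded = []
    for _ in range(n):
        low = a
        for sym, p in pmf.items():
            high = low + width * p                   # sub-interval for sym
            if low <= point < high:
                decoded.append(sym)
                a, width = low, width * p            # descend into it
                break
            low = high
        else:
            raise ValueError("point lies outside the current interval")
    return "".join(decoded)

pmf = {"0": Fraction(2, 5), "1": Fraction(3, 5)}
decoded = arithmetic_decode("100100", 5, pmf)
print(decoded)
```

The loop runs exactly $n$ times, which is why the decoder needs to know the sequence length in advance.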

Arithmetic Coding

1) In arithmetic coding, the length $n$ of the sequence $u_1 u_2 \cdots u_n$ to be compressed is assumed to be known to both the encoder and the decoder.
2) The length of the codeword assigned to $u_1 u_2 \cdots u_n$ is

$$L = \lceil -\log p(u_1 \cdots u_n) \rceil + 1.$$

Since $L < -\log p(u_1 \cdots u_n) + 2$, the expected codeword length per symbol is less than $H(X_1 \cdots X_n)/n + 2/n$. Thus the average codeword length in bits/symbol converges to the entropy rate of a stationary source as $n$ approaches infinity.

Arithmetic Coding (Example)

Let $\{X_i\}$ be a discrete memoryless source with common pmf $p(0) = 2/5$, $p(1) = 3/5$ and alphabet $\mathcal{X} = \{0, 1\}$. Let $u_1 u_2 \cdots u_5 = 10110$. We have

I(1)     = [2/5, 1]
I(10)    = [2/5, 16/25]
I(101)   = [62/125, 16/25]
I(1011)  = [346/625, 16/25]
I(10110) = [346/625, 1838/3125]

The length of $I(10110)$ is $108/3125$, so

$$L = \left\lceil -\log \frac{108}{3125} \right\rceil + 1 = 6.$$

The midpoint is $1784/3125 = 0.100100\ldots$ in binary, and the codeword is 100100.

Arithmetic Coding

Source symbol   Probability   Initial subinterval
x_0             0.2           [0.0, 0.2)
x_1             0.2           [0.2, 0.4)
x_2             0.4           [0.4, 0.8)
x_3             0.2           [0.8, 1.0]

Let the message to be encoded be $x_0 x_1 x_2 x_2 x_3$.

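Encoding sequence: $x_0 x_1 x_2 x_2 x_3$

Each column shows the current interval and its subdivision points just before the symbol in the column header is encoded.

Symbol to encode      x_0    x_1    x_2     x_2      x_3
Upper end             1.0    0.2    0.08    0.072    0.0688
x_2 / x_3 boundary    0.8    0.16   0.072   0.0688   0.06752
x_1 / x_2 boundary    0.4    0.08   0.056   0.0624   0.06496
x_0 / x_1 boundary    0.2    0.04   0.048   0.0592   0.06368
Lower end             0.0    0      0.04    0.056    0.0624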

The final interval is $[0.06752, 0.0688)$. From it we can obtain the codeword length $L$ and the corresponding codeword.
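The last step is left to the reader on the slide. The sketch below works it out, assuming the same rule as before: $L = \lceil -\log_2 (b-a) \rceil + 1$, and the codeword is the first $L$ bits of the midpoint. The variable names are illustrative only, and exact fractions stand in for the decimal values.

```python
from fractions import Fraction
from math import floor

pmf = [("x0", Fraction(1, 5)), ("x1", Fraction(1, 5)),
       ("x2", Fraction(2, 5)), ("x3", Fraction(1, 5))]
message = ["x0", "x1", "x2", "x2", "x3"]

# Refine the interval [a, b] one symbol at a time.
a, b = Fraction(0), Fraction(1)
for u in message:
    low = a
    for sym, p in pmf:
        high = low + (b - a) * p        # sub-interval of the current [a, b]
        if sym == u:
            a, b = low, high
            break
        low = high

# Codeword length: smallest k with 2^-k <= b - a, plus one.
k = 0
while Fraction(1, 2 ** k) > b - a:
    k += 1
L = k + 1

mid = (a + b) / 2
codeword = f"{floor(mid * 2 ** L):0{L}b}"
print(float(a), float(b), L, codeword)
```

Under these assumptions the final interval has length $4/3125 = 0.00128$, which gives $L = 11$ and the codeword 00010001011.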

Adaptive Arithmetic Coding

In the above description of arithmetic coding, we assume that both the encoder and the decoder know in advance the joint pmf of the random vector $(X_1, X_2, \ldots, X_n)$. In practice, the pmf is often unknown and has to be estimated online or offline.

For simplicity, let $\mathcal{X} = \{0, 1\}$. The initial pmf is uniform, i.e., $p(0) = p(1) = 1/2$. After $u_1 u_2 \cdots u_i$ is processed, the conditional pmf given $u_1 u_2 \cdots u_i$ is

$$p(1 \mid u_1 u_2 \cdots u_i) = \frac{(\text{number of 1s in } u_1 u_2 \cdots u_i) + 1}{i + 2}, \qquad p(0 \mid u_1 u_2 \cdots u_i) = \frac{(\text{number of 0s in } u_1 u_2 \cdots u_i) + 1}{i + 2}.$$

Let $u_1 u_2 \cdots u_8 = 11001010$. Then according to the above,

$$p(u_1 u_2 \cdots u_8) = p(11001010) = \frac{1}{2} \cdot \frac{2}{3} \cdot \frac{1}{4} \cdot \frac{2}{5} \cdot \frac{3}{6} \cdot \frac{3}{7} \cdot \frac{4}{8} \cdot \frac{4}{9}.$$

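Adaptive Arithmetic Coding

Another choice for the conditional pmf given $u_1 u_2 \cdots u_i$ is as follows:

$$p(1 \mid u_1 u_2 \cdots u_i) = \frac{(\text{number of 1s in } u_1 u_2 \cdots u_i) + 1/2}{i + 1}, \qquad p(0 \mid u_1 u_2 \cdots u_i) = \frac{(\text{number of 0s in } u_1 u_2 \cdots u_i) + 1/2}{i + 1}.$$

For the same sequence,

$$p(11001010) = \frac{1}{2} \cdot \frac{3/2}{2} \cdot \frac{1/2}{3} \cdot \frac{3/2}{4} \cdot \frac{5/2}{5} \cdot \frac{5/2}{6} \cdot \frac{7/2}{7} \cdot \frac{7/2}{8}.$$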
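Both estimators have the form (count + $\delta$) / ($i + 2\delta$), with $\delta = 1$ for the first rule and $\delta = 1/2$ for the second. The sketch below multiplies out the two products for 11001010; the function name `sequential_probability` and the parameter `delta` are introduced here only for illustration.

```python
from fractions import Fraction

def sequential_probability(bits, delta):
    """Product of the adaptive estimates (count + delta) / (i + 2*delta)."""
    counts = {"0": 0, "1": 0}
    prob = Fraction(1)
    for i, u in enumerate(bits):        # i symbols have been processed so far
        prob *= (counts[u] + delta) / (i + 2 * delta)
        counts[u] += 1                  # update the statistics after coding u
    return prob

add_one = sequential_probability("11001010", Fraction(1))
add_half = sequential_probability("11001010", Fraction(1, 2))
print(add_one, add_half)
```

The products evaluate to 1/630 for the first rule and 35/32768 for the second. In an adaptive arithmetic coder these factors would be used, one per symbol, as the conditional pmf in the interval refinement step.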

Lempel-Ziv Algorithm

The adaptive arithmetic coding presented at the end of the last section is universal because it does not require source statistics and can achieve the ultimate compression rate of any discrete memoryless source.

Lempel-Ziv is another universal source coding algorithm, developed by Ziv and Lempel. One Lempel-Ziv algorithm is LZ77, known as the sliding-window Lempel-Ziv algorithm, which was published in 1977. One year later, they proposed a variant of LZ77, the incremental parsing Lempel-Ziv algorithm, i.e., LZ78.

In this course we will look at LZ78.

Lempel-Ziv Parsing

LZ78 adopts an incremental parsing procedure, which parses the source sequence $u_1 u_2 \cdots u_n$ into non-overlapping variable-length blocks.

The first substring in the incremental parsing of $u_1 u_2 \cdots u_n$ is $u_1$. The second substring is the shortest prefix of $u_2 \cdots u_n$ that has not appeared so far in the parsing.

Assume that $u_1,\; u_{n_1+1} \cdots u_{n_2},\; u_{n_2+1} \cdots u_{n_3},\; \ldots,\; u_{n_{i-1}+1} \cdots u_{n_i}$ (with $n_1 = 1$) are the substrings created so far in the parsing process. The next substring, denoted $u_{n_i+1} \cdots u_{n_{i+1}}$, is the shortest prefix of $u_{n_i+1} \cdots u_n$ that has not appeared in $\{u_1,\; u_{n_1+1} \cdots u_{n_2},\; u_{n_2+1} \cdots u_{n_3},\; \ldots,\; u_{n_{i-1}+1} \cdots u_{n_i}\}$, if such a prefix exists. Otherwise $u_{n_i+1} \cdots u_{n_{i+1}} = u_{n_i+1} \cdots u_n$ with $n_{i+1} = n$, and the incremental parsing procedure terminates.

Lempel-Ziv Parsing: Example

Example 1. Source sequence (spaces mark the phrase boundaries):

1 0 10 11 100 111 00 1110 001 110 01

The incremental parsing procedure yields the following partition:

1, 0, 10, 11, 100, 111, 00, 1110, 001, 110, 01

Example 2. Source sequence:

1 10 11 0 00 110 1

Partition:

1, 10, 11, 0, 00, 110, 1

In this example, the last substring 1 has already appeared.
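The incremental parsing can be sketched in a few lines of Python. The function name `lz78_parse` is hypothetical, and the source sequence is assumed to be given as a string of symbols.

```python
def lz78_parse(seq):
    """Split `seq` into LZ78 phrases: each is the shortest prefix not seen before."""
    seen, phrases, current = set(), [], ""
    for symbol in seq:
        current += symbol
        if current not in seen:         # shortest new phrase found
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Input ended inside a phrase that has already appeared (as in Example 2).
        phrases.append(current)
    return phrases

example1 = lz78_parse("10101110011100111000111001")
example2 = lz78_parse("110110001101")
print(example1)
print(example2)
```

Applied to the two source sequences above with the spaces removed, the sketch reproduces both partitions, including the repeated final phrase 1 in Example 2.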

Lempel-Ziv Parsing

The concatenation of all phrases is equal to the original source sequence.

All phrases are distinct, except that the last phrase may be equal to one of the preceding ones. In Example 2, the last phrase is equal to the first one.

Let $\Lambda$ denote the empty string. Think of $\Lambda$ as an initial phrase before the first phrase in the incremental parsing.

Each new phrase in the parsing is the concatenation of a previous phrase with a new output letter from the source sequence. For example, the first phrase 1 is the concatenation of the empty string $\Lambda$ with the new symbol 1. Similarly, the phrase 110 is the concatenation of the phrase 11 with the new symbol 0.

Lempel-Ziv Encoding

Let $\mathcal{X} = \{x_0, \ldots, x_{J-1}\}$. The Lempel-Ziv encoding of the sequence $u_1 u_2 \cdots u_n$ can be implemented sequentially as follows.

1. The first phrase $u_1$ is uniquely determined by the pair $(0, u_1)$, where the index 0 corresponds to the initial empty phrase $\Lambda$. Represent the pair $(0, u_1)$ by the integer $0 \cdot J + \mathrm{index}(u_1)$, where $\mathrm{index}(u_1) = j$ if $u_1 = x_j$, $0 \le j \le J-1$. Encode the first phrase as the binary representation of the integer $0 \cdot J + \mathrm{index}(u_1) = \mathrm{index}(u_1)$, padded with zeros on the left if necessary so that the total length of the codeword is $\lceil \log J \rceil$.
2. Having determined the $i$th phrase, we know that it is equal to the concatenation of the $m$th phrase with a new symbol $x_j$ for some $0 \le m \le i-1$ and $0 \le j \le J-1$. Encode the $i$th phrase as the binary representation of the integer $mJ + j$, padded with zeros on the left if necessary so that the total length of the codeword is $\lceil \log (iJ) \rceil$.
3. Repeat step 2 until all phrases are encoded.

Lempel-Ziv Encoding: Example

Partitioned phrases: 1 10 11 0 00 110 1, with $\mathcal{X} = \{0, 1\}$, $J = 2$.

Phrase   (m, j)   Codeword   Length
1        (0, 1)   1          1
10       (1, 0)   10         2
11       (1, 1)   011        3
0        (0, 0)   000        3
00       (4, 0)   1000       4
110      (3, 0)   0110       4
1        (0, 1)   0001       4

So the Lempel-Ziv coding transforms the original source sequence

1 10 11 0 00 110 1

into

1 10 011 000 1000 0110 0001
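The table can be reproduced with the sketch below. It assumes $J \ge 2$, takes the alphabet as a string whose positions give the symbol indices, and computes $\lceil \log_2 (iJ) \rceil$ with integer arithmetic; the name `lz78_encode` is hypothetical.

```python
def lz78_encode(seq, alphabet):
    """Return the list of LZ78 codewords (bit strings) for `seq`."""
    J = len(alphabet)
    index = {"": 0}                     # phrase -> phrase number; 0 is the empty phrase
    codewords = []

    def emit(phrase):
        i = len(codewords) + 1          # this is the i-th phrase
        m = index[phrase[:-1]]          # number of the earlier phrase it extends
        j = alphabet.index(phrase[-1])  # index of the new symbol
        length = (i * J - 1).bit_length()   # equals ceil(log2(i * J))
        codewords.append(f"{m * J + j:0{length}b}")

    current = ""
    for symbol in seq:
        current += symbol
        if current not in index:
            emit(current)
            index[current] = len(codewords)
            current = ""
    if current:
        emit(current)                   # final phrase that repeats an earlier one
    return codewords

codewords = lz78_encode("110110001101", "01")
print(codewords)
```

For an integer $N \ge 1$, `(N - 1).bit_length()` equals $\lceil \log_2 N \rceil$, which avoids floating-point logarithms.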

Lempel-Ziv Encoding

In the example, instead of compression, we get expansion. The problem is that the source sequence in the example is too short. In fact, LZ78 can achieve the entropy rate of any stationary source as the length of the source sequence grows without bound.

If there are $t$ phrases in the incremental parsing of $u_1 u_2 \cdots u_n$, then the length of the whole Lempel-Ziv codeword for $u_1 u_2 \cdots u_n$ is

$$\sum_{i=1}^{t} \lceil \log (iJ) \rceil.$$
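As a quick check of the length formula, take the earlier example with $t = 7$ phrases and $J = 2$, assuming base-2 logarithms:

```latex
\sum_{i=1}^{7} \lceil \log_2 (2i) \rceil
  = \lceil \log_2 2 \rceil + \lceil \log_2 4 \rceil + \lceil \log_2 6 \rceil
  + \lceil \log_2 8 \rceil + \lceil \log_2 10 \rceil + \lceil \log_2 12 \rceil
  + \lceil \log_2 14 \rceil
  = 1 + 2 + 3 + 3 + 4 + 4 + 4 = 21 .
```

The source sequence has only 12 binary symbols, so the 21-bit codeword is indeed an expansion.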

Lempel-Ziv Decoding

The decoding process is easy and can also be done sequentially, since the decoder knows in advance that the length of the codeword corresponding to the $i$th phrase is $\lceil \log (iJ) \rceil$.

After receiving the whole codeword, the decoder parses it into non-overlapping substrings of lengths $\lceil \log (iJ) \rceil$, $1 \le i \le t$.

From the $i$th substring, the decoder finds the integer $mJ + j$ and hence the pair $(m, j)$. Then the $i$th phrase is the concatenation of the $m$th phrase with the symbol $x_j$.

Lempel-Ziv Decoding: Example

Received codeword: 1 10 011 000 1000 0110 0001

Codeword   Integer   Pair (m, j)   Phrase
1          1         (0, 1)        1
10         2         (1, 0)        10
011        3         (1, 1)        11
000        0         (0, 0)        0
1000       8         (4, 0)        00
0110       6         (3, 0)        110
0001       1         (0, 1)        1
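A decoder sketch for the same example, again assuming $J \ge 2$ and an alphabet given as a string; the name `lz78_decode` is hypothetical.

```python
def lz78_decode(bits, alphabet):
    """Recover the list of LZ78 phrases from the concatenated codeword `bits`."""
    J = len(alphabet)
    phrases = [""]                          # phrase 0 is the empty phrase
    pos, i = 0, 1
    while pos < len(bits):
        length = (i * J - 1).bit_length()   # ceil(log2(i * J)) bits for phrase i
        value = int(bits[pos:pos + length], 2)
        m, j = divmod(value, J)             # value = m * J + j
        phrases.append(phrases[m] + alphabet[j])
        pos += length
        i += 1
    return phrases[1:]

received = "".join(["1", "10", "011", "000", "1000", "0110", "0001"])
phrases = lz78_decode(received, "01")
print(phrases, "".join(phrases))
```

Because the $i$th codeword length depends only on $i$ and $J$, the decoder can cut the bit stream without any separators.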

Performance of Lempel-Ziv Coding

Theorem 2.6.1. Let $\{X_i\}$ be a discrete stationary source. Let $r(X_1 \cdots X_n)$ be the ratio between the length of the whole Lempel-Ziv codeword for $X_1 \cdots X_n$ and the length $n$ of $X_1 \cdots X_n$, i.e., the compression rate in bits per symbol. Then

$$E[r(X_1 \cdots X_n)] \to H_\infty(X) \quad \text{as } n \to \infty,$$

where $H_\infty(X)$ denotes the entropy rate of the source.