SGN-2306 Signal Compression. 1. Simple Codes


SGN-2306 Signal Compression. 1. Simple Codes

1.1 Signal representation versus signal compression
1.2 Prefix codes
1.3 Trees associated with prefix codes
1.4 Kraft inequality
1.5 A lower bound on the average length of optimal prefix codes
1.6 Shannon codes
1.7 Encoding a binary tree

1.1 Signal representation versus signal compression

An example of image representation. The luminance at one pixel varies from black to white, in 256 gray levels. Alphabet: A = {0, 1, 2, 3, ..., 255}, having 256 symbols. Each symbol is represented using 8 bits = 1 byte. Image size: 512 rows x 512 columns, in total 262144 pixels, thus the file size is 262144 bytes.

Empirical probability of the symbol x (symbol frequency, or histogram):

    h(x) = (number of occurrences of the symbol x) / (number of pixels in the image),  for all symbols x in A.

(Figure: the original image Lena and its symbol frequency h(x), i.e. the histogram over the graylevel values x.)

Ready-made tools for signal compression applied to the Lena image. Comparing several widely available general-purpose compression programs (original file size 262144 bytes -> compressed file size):

Unix compress utility (based on the Ziv-Lempel method): 7.46 bits/pixel.
Unix gzip utility (a variation on the Ziv-Lempel method) with option -1 (faster than with option -9): 6.88 bits/pixel.
Unix gzip utility with option -9 (takes longer than option -1, but usually gives better compression): 7.4 bits/pixel.
Unix bzip2 utility (Burrows-Wheeler method) with option -1 (block size 100k): 5.65 bits/pixel.
Unix bzip2 utility with option -9 (block size 900k): 5.6 bits/pixel.

A simple (revertible) image transformation for Lena. Store the first column of lena, D(:,1). For the second column, D(:,2), do not store it, but store instead the differences E(:,2) = D(:,2) - D(:,1). Having available D(:,1) and E(:,2) we can recover the second column: D(:,2) = E(:,2) + D(:,1). For the third column, D(:,3), instead of storing it, store the differences E(:,3) = D(:,3) - D(:,2). Continue to store in the same way E(:,k) = D(:,k) - D(:,k-1). The difference image is shown in the middle, below.

(Figure: the original image D, the difference image E (one-pixel horizontal translation), and their histograms h(x). Entropy of the original image: 7.44 bits/pixel; entropy of the difference image: about 5.5 bits/pixel.)

Note: the alphabet of the image E(:,:) is ideally A = {-255, -254, ..., -2, -1, 0, 1, 2, ..., 255}, but for the lena image the differences that actually occur span a narrower range. A constant offset was added to each pixel of the difference image, in order to be able to display the negative values, finally using the imagesc function in matlab for showing the values.
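A minimal Matlab sketch of this column-difference transform and of the empirical entropy computation; it assumes the image is already loaded in a matrix D of integer gray levels (the variable names are ours).

    % Sketch: column-difference transform and empirical entropies, assuming D is the
    % 512x512 image as a matrix of integer-valued doubles in 0..255.
    E = D;
    E(:,2:end) = D(:,2:end) - D(:,1:end-1);   % E(:,k) = D(:,k) - D(:,k-1), E(:,1) kept as is
    h  = @(X) accumarray(X(:)-min(X(:))+1, 1) / numel(X);   % histogram as probabilities
    Hf = @(p) -sum(p(p>0).*log2(p(p>0)));                    % empirical entropy in bits
    [Hf(h(D))  Hf(h(E))]                                     % entropies of D and E, bits/pixel
    % D is recovered exactly by cumulative sums along the rows: cumsum(E,2).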

Shannon lossless coding theorem. The basic fact in data compression: if a source generates N symbols independently, with probabilities

    symbol:       1    2    ...  N
    probability:  p_1  p_2  ...  p_N

then the lowest number of bits/symbol needed for coding is

    H = - sum_{i=1}^{N} p_i log2 p_i.    (1)

Example: For original data represented with 8 bits, the symbols are {0, 1, 2, ..., 255}. If all symbols are independent and uniformly distributed, p_i = const = 1/256, and then

    H = - sum_{i=1}^{N} p_i log2 p_i = - sum_{i=1}^{256} (1/256)(-8) = 8,    (2)

thus the data is incompressible (the original representation is as good as one can get). If the value 128 occurs with probability p_128 = 0.99 while the rest of the values occur with probability p_i = const = 0.01/255, then the entropy is

    H = - sum_{i=1}^{N} p_i log2 p_i = 0.16,    (3)

and therefore the data is compressible about 50:1 (since 8/0.16 = 50).

The ways of signal compression. If the symbols are generated independently, nobody can encode the signal at a bitrate lower than the entropy of the independent source. The techniques for designing codes with bitrates close to the entropy are described in the data compression part of the course (lectures 1 to 6). If the symbols are dependent, decorrelation of the source (using any of the following techniques: prediction, transforms, filterbanks) will improve the compression. If the original signal is transformed in a lossy manner (using scalar/vector quantization in conjunction with prediction, transforms, or filterbanks), the compression ratio can be orders of magnitude better.
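Returning to the entropy example above, a small numerical check in Matlab of the value in (3):

    % Sketch verifying the example: one value occurs with probability 0.99, the other
    % 255 values share the remaining 0.01.
    p = [0.99, repmat(0.01/255, 1, 255)];
    H = -sum(p .* log2(p))      % about 0.16 bits/symbol
    8 / H                        % compression potential, roughly 50:1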

1.2 Prefix codes

A code is a set of codewords which are used to represent the symbols (or strings of symbols) of the original alphabet.

Consider the original alphabet as the integers I_8 = {0, 1, ..., 255}. The binary representation using 8 bits is the simplest code. A gray level image usually has its pixel values in the set I_8, and we refer to an uncompressed image as the file containing, in the scanning order left-right and up-down, the values of the pixels represented in binary form.

Consider the ASCII set of symbols. The ASCII table specifies 7-bit binary codes for all English alphabet letters, punctuation signs, digits and several other special symbols. We usually refer to an original (uncompressed) text as the file containing the text stored as a sequence of ASCII symbols.

The above codes have a common feature: all their codewords have the same length in bits. They satisfy an essential property of coding: reading a sequence of codewords one can uniquely decode the message, without the need of separators between codewords. But even with unequal codeword lengths the instantaneous decoding property may hold, this being the case of prefix codes.

1.3 Trees associated with prefix codes

We will visualize prefix codes as leaves in binary (or D-ary) trees. The binary trees will have their nodes labelled by the sequence of bits read on the path from the root to the node (with the convention that a left branch carries a 0 and a right branch carries a 1). In the sequel we will usually identify the codewords with the leaves in a tree.

Example of a code for the alphabet {a, b, c, d}:

    Table 1
    Symbol   Codeword   Codeword length
    a        00         2
    b        01         2
    c        10         2
    d        11         2

(Figure: the binary tree for the code in Table 1, with the root at depth 0, interior nodes, and the four leaves (codewords) at depth 2.)

The string d, d, c, a, a, b is coded as 111110000001. To decode correctly there is no need of codeword separators!

Example of a code for the alphabet {a, b, c, d, e, f, g}:

    Table 2
    Symbol   Codeword length
    a        3
    b        3
    c        4
    d        4
    e        3
    f        2
    g        2

(Figure: the binary tree for the code in Table 2; the codewords are read from the root to the leaves.)

The string eeabgf is coded by concatenating the corresponding codewords. To decode correctly there is no need of codeword separators!

Example of a code for the alphabet {0, 1, 2, 3, 4, 5, 6} (continued). Assume now that the frequencies of the symbols are as follows:

    Symbol i   Codeword length l(i)   Symbol frequency p(i)
    0 (a)      3                      0.125  = 1/8
    1 (b)      3                      0.125  = 1/8
    2 (c)      4                      0.0625 = 1/16
    3 (d)      4                      0.0625 = 1/16
    4 (e)      3                      0.125  = 1/8
    5 (f)      2                      0.25   = 1/4
    6 (g)      2                      0.25   = 1/4

The average codelength is

    L = sum_{i=0}^{6} p(i) l(i) = 3 (0.125 * 3) + 2 (0.0625 * 4) + 2 (0.25 * 2) = 2.625.

When we encode a very long string of n symbols, nL is approximately the length of the encoded string, in bits. Why?

The usual binary representation of {0, 1, 2, 3, 4, 5, 6} requires 3 bits, resulting in an average codelength of 3 bits. The code of Table 2 has a shorter average codelength than the binary representation.

In fact, for the given p(i) the code in Table 2 achieves the shortest possible average codelength in the class of prefix codes. This will be obvious at the end of the present lecture (exercise!). The code in Table 2 obeys the following rule: the more probable a symbol, the shorter its codeword.

1.4 Kraft inequality

The prefix condition introduces constraints on the set of codeword lengths; e.g. there is no prefix code with five codewords such that |w_1| = |w_2| = |w_3| = |w_4| = |w_5| = 2, where |w| denotes the length of the codeword w. These constraints on the possible codeword lengths are completely characterized by the Kraft inequality for prefix codes, very easy to prove and visualize using the associated trees.

Theorem 1 (Kraft inequality)
a) Given a set of numbers l_1, ..., l_n which satisfy the inequality

    sum_{i=1}^{n} 2^{-l_i} <= 1    (4)

then there is a prefix code with |w_1| = l_1, ..., |w_n| = l_n.
b) Reciprocally, for every prefix-free code having the codeword lengths |w_1| = l_1, ..., |w_n| = l_n, the inequality (4) is true.
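Before the proof, a quick numerical check of the Kraft inequality in Matlab, using the codeword lengths of Table 2 and the impossible five-codeword example mentioned above.

    % Sketch: checking the Kraft inequality for a list of candidate codeword lengths.
    l = [3 3 4 4 3 2 2];            % the lengths of Table 2
    K = sum(2.^(-l))                % K <= 1: a prefix code with these lengths exists (here K = 1)
    sum(2.^(-[2 2 2 2 2]))          % five codewords of length 2: K = 1.25 > 1, impossible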

Proof (by a simple example).

(Figure: a binary tree of maximum depth 5, with the code leaves marked in red; below the tree, the 2^5 nodes at depth 5 are partitioned among the leaves, giving sum_i 2^{5-l_i} = 2^5.)

The tree in red has a set of ten codewords (the leaves marked with a red asterisk), having the codelengths l_i = (3, 5, 5, 4, 2, 5, 5, 4, 3, 2). Each leaf has exactly 2^{5-l_i} descendants on the last level, at maximum depth 5. The total number of these descendants cannot exceed the total number of nodes at depth 5, which is 2^5. From sum_i 2^{5-l_i} <= 2^5 it results sum_i 2^{-l_i} <= 1.

A second proof of part a) (showing how to construct a prefix tree once the Kraft inequality is satisfied). Without loss of generality suppose the numbers l_1, ..., l_n are ordered increasingly, and denote by k_i the number of occurrences of the integer i in the sequence l_1, ..., l_n. The sum in (4) can be alternatively written

    sum_{i=1}^{n} 2^{-l_i} = sum_{j=l_1}^{l_n} k_j 2^{-j}.

We construct a prefix code in the following way: take k_{l_1} leaves at depth j = l_1 (which is possible since k_{l_1} 2^{-l_1} <= 1, i.e. k_{l_1} <= 2^{l_1}, and obviously there are 2^{l_1} nodes at depth l_1 in the tree). At depth l_1 + 1 we need to place k_{l_1+1} leaves, and from (4) one easily gets k_{l_1+1} <= 2^{l_1+1} - 2 k_{l_1}, which shows that we have to place at depth l_1 + 1 fewer leaves than the available places (at depth l_1 + 1 there are 2^{l_1+1} nodes, but 2 k_{l_1} of them are not free, namely the sons of the k_{l_1} nodes which were declared leaves at depth l_1). Continuing in the same way, the inequality (4) constrains the number of leaves to be placed at depth j to k_j <= 2^j - 2^{j-l_1} k_{l_1} - ... - 2 k_{j-1}, which makes the placement possible.

A second proof of part b). The proof is by successively reducing the associated tree to trees of smaller depth. Consider the maximal depth of the original tree, l_max = l_n (there are m_n leaves of length l_max, which contribute m_n terms 2^{-l_max} to the sum (4)).

We create another tree, with maximal depth l_max - 1, by pruning two by two the leaves which are siblings, to create shorter words, now of length l_max - 1; if a leaf of the original tree does not have a sibling, we simply replace it by its parent. When pruning the siblings w0 and w1 to get the parent node w we have 2^{-|w|} = 2^{-|w0|} + 2^{-|w1|}, while when replacing a single son wb (b is either 0 or 1) by its parent w we have 2^{-|w|} > 2^{-|wb|}; therefore the overall sum over the leaves W_{lmax-1} of the new tree is greater than or equal to the overall sum over the leaves W_{lmax} of the original tree. By repeating the process, we find the chain of inequalities

    sum_{w in W_{lmax}} 2^{-|w|} <= sum_{w in W_{lmax-1}} 2^{-|w|} <= ... <= 2^{-0} = 1.

If the code is D-ary (the codewords are represented using D symbols), then the Kraft inequality reads

    sum_{i=1}^{n} D^{-l_i} <= 1    (5)

and Theorem 1 holds true with (5) replacing (4).

The Kraft inequality holds true also for any uniquely decodable code (not only for prefix codes, or instantaneous codes).

Example: The code in Table 2 satisfies the Kraft inequality with equality:

    sum_{i=1}^{n} 2^{-l_i} = 2 * 2^{-2} + 3 * 2^{-3} + 2 * 2^{-4} = 1.

1.5 A lower bound on the average length of optimal prefix codes

The average length of the codewords is

    L = sum_{i=1}^{n} p_i l_i    (6)

which is the quantity we want to minimize for a given probability distribution p_1, p_2, ..., p_n. The prefix code strategy will not allow the lengths l_1, ..., l_n to be selected freely: they are constrained by the Kraft inequality. The optimization problem is an integer programming problem with nonlinear constraints. To get a hint of the optimal solution we relax the integer requirement on the codelengths.

A lower bound on the average length of optimal prefix codes (continued)

We minimize the extended criterion

    J = sum_{i=1}^{n} p_i l_i + λ ( sum_{i=1}^{n} 2^{-l_i} - 1 )    (7)

where λ is an extra variable (Lagrange multiplier) to be determined such that the (Kraft) constraint is fulfilled. Taking the derivative of (7) with respect to the real parameters l_j and equating it to zero we get p_j - λ 2^{-l_j} ln 2 = 0, or

    2^{-l_j} = p_j / (λ ln 2);

introducing this in the constraint equality sum_{i=1}^{n} p_i / (λ ln 2) = 1 we obtain the optimal value λ = ( sum_{i=1}^{n} p_i ) / ln 2 = 1 / ln 2. Finally we get

    2^{-l_j} = p_j / (λ ln 2) = p_j,  or  l_j = log2 (1/p_j).    (8)

The optimal length of the code for the symbol j is therefore log2(1/p_j), a quantity which is called the self-information of the symbol j. The average length of the optimal codewords is

    sum_{i=1}^{n} p_i log2 (1/p_i)    (9)

and it is called entropy. By the optimality of l_j = log2(1/p_j) that we proved, no prefix code can achieve a better average length than the entropy. In the case when all the values log2(1/p_j) are integers, there is a prefix code actually attaining the entropy bound. Finding the prefix code which minimizes the average codelength L for the case when at least for one j the value log2(1/p_j) is non-integer is solved by the simple construction discovered by Huffman in 1952 (to be discussed later in the lecture).

1.6 Shannon code

Shannon's code is derived from the result of the optimization solution (8). If the self-information log2(1/p_j) is not an integer, a natural attempt to allocate the code lengths is

    l_j = ceil( log2 (1/p_j) )    (10)

where ceil(x) denotes the smallest integer greater than or equal to x. To show that the lengths (10) define a prefix code we need to check the Kraft inequality, which results at once:

    sum_{i=1}^{n} 2^{-l_i} = sum_{i=1}^{n} 2^{-ceil(log2(1/p_i))} <= sum_{i=1}^{n} 2^{-log2(1/p_i)} = sum_{i=1}^{n} p_i = 1    (11)

the inequality being due to ceil(log2(1/p_i)) >= log2(1/p_i). The average codelength of the prefix code defined by (10) is L = sum_i p_i ceil(log2(1/p_i)), and it can be lower and upper bounded by using

    log2(1/p_j) <= ceil(log2(1/p_j)) < log2(1/p_j) + 1,

which gives

    sum_i p_i log2(1/p_i) <= sum_i p_i ceil(log2(1/p_i)) < sum_i p_i ( log2(1/p_i) + 1 )

and therefore

    H(p) <= L < H(p) + 1,

stating that the average codelength is within 1 bit of the entropy.
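A minimal Matlab sketch of the Shannon length allocation (10) and of the bound H <= L < H+1, for an illustrative distribution (not the one of Table 3 below):

    % Sketch: Shannon codelengths, Kraft check and the entropy bounds for an arbitrary pmf.
    p = [0.6 0.3 0.1];
    l = ceil(log2(1./p));           % Shannon codelengths: [1 2 4]
    sum(2.^(-l))                    % = 0.8125 <= 1, so a prefix code with these lengths exists
    H = -sum(p.*log2(p));           % entropy, about 1.295 bits
    L = sum(p.*l);                  % average Shannon codelength, 1.6 bits; H <= L < H+1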

Example of a Shannon code for the alphabet {0, 1, 2, 3, 4, 5, 6}

Exercise: consider the probabilities given in Table 3 and draw, in the template below, one (out of many possible) prefix trees corresponding to the Shannon code.

(Figure: empty tree template where the Shannon code should be drawn.)

Table 3 lists, for each symbol i = 0, ..., 6, its probability p(i) and the Shannon codelength l(i) = ceil(log2(1/p(i))).

The entropy of the given source is H = - sum_{i=0}^{6} p_i log2 p_i. The average codelength of the Shannon code is L = sum_i p(i) l(i) = 3.2. A slightly better code has shorter codelengths and an average codelength L = 2.97. The trivial binary representation of the symbols requires only 3 bits, which is better than the Shannon code. A simple method to code at a bitrate closer to the entropy is to block the alphabet symbols.
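As a numerical preview of the blocking idea, a short Matlab sketch with an illustrative 7-symbol distribution (the numerical values of Table 3 are not reproduced here):

    % Sketch: Shannon coding one symbol at a time versus coding pairs of symbols.
    p  = [0.3 0.2 0.15 0.12 0.1 0.08 0.05];
    H  = -sum(p.*log2(p));                          % entropy per symbol
    L1 = sum(p .* ceil(log2(1./p)));                % Shannon code, single symbols
    p2 = p(:)*p(:)'; p2 = p2(:);                    % probabilities of the 49 ordered pairs
    L2 = sum(p2 .* ceil(log2(1./p2))) / 2;          % Shannon code on pairs, bits per symbol
    [H  L2  L1]                                     % L2 < H + 1/2, while L1 < H + 1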

Coding blocks of symbols

We start with an example: consider as symbols of the alphabet all the pairs (s_1, s_2) for the source in Table 3. The new alphabet has 7 * 7 = 49 symbols, in the set {(0,0), (0,1), ..., (5,6), (6,6)}, and their probabilities are p(s_1, s_2) = p(s_1) p(s_2). The entropy of the pairs is 2 times the entropy found for a single symbol:

    H(p(s_1,s_2)) = - sum_{(i,j)} p(s_i, s_j) log2 p(s_i, s_j) = - sum_i sum_j p(s_i) p(s_j) log2 ( p(s_i) p(s_j) )
                  = - sum_i sum_j p(s_i) p(s_j) ( log2 p(s_i) + log2 p(s_j) )
                  = - sum_i p(s_i) sum_j p(s_j) log2 p(s_j) - sum_j p(s_j) sum_i p(s_i) log2 p(s_i)
                  = H(p(s)) sum_i p(s_i) + H(p(s)) sum_j p(s_j) = 2 H(p(s)).

The length of the Shannon code for the new alphabet is bounded: L((s_1, s_2)) < H(p(s_1, s_2)) + 1 = 2 H(p(s)) + 1, therefore the number of bits per symbol will be (1/2) L((s_1, s_2)) < H(p(s)) + 1/2. Taking now blocks of n symbols and constructing a Shannon code for them, the number of bits per symbol will be (1/n) L((s_1, ..., s_n)) < H(p(s)) + 1/n. Increasing the block length, one can get as close to the entropy as one wishes.

Exercise: Extend the alphabet of the source to blocks of ten symbols, then design the Shannon code and find its actual average length (you need a computer to solve this).

1.7 Encoding a binary tree

Suppose we have an n-symbol alphabet, for which a code design procedure associated a binary tree having n leaves. How can we enumerate all binary trees which have n leaves? First we define an order in which we are going to inspect the interior nodes and leaves of the tree: we start from the root, then we go through the nodes/leaves at the first depth level from left to right, continue with the second depth level, and so on until the last depth level of the tree. When inspecting the nodes, we write down a string, one bit for each node: if the node is an interior node we write a 1, if it is a leaf we write down a 0. When somebody else is reading the string, he will be able to draw the tree!

A tree with n leaves has n - 1 interior nodes (this can be proven easily by mathematical induction). The number of bits necessary to encode a binary tree with n leaves is therefore n + (n - 1) = 2n - 1. Not all combinations of 2n - 1 bits correspond to a valid binary tree, therefore there are at most 2^{2n-1} binary trees with n leaves.

Encoding a binary tree: example

Inspect the tree and insert a 1 in the interior nodes and a 0 in the leaf nodes, then write the bits in a top-down, left-right scan.

(Figure: a binary tree with the 1/0 labels attached to its nodes, illustrating the encoding of the tree structure.)

The binary string associated with the tree is written down level by level. Commas may be added for clarity, to see the different depth levels, but they are not needed if one wants to decode the string of bits back to a tree structure. There are n = 10 leaves and 2n - 1 = 19 bits in the sequence.

The complete message for you (the prefix tree is also included and is listed first):

(Figure: the encoded bitstream of the exercise, over the symbol alphabet: Space c d g o u k L.)
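A minimal sketch of reading such a bit string back; the helper below is ours (not part of the exercise) and simply scans the bits in the breadth-first order described above, counting leaves and checking that the string describes a complete binary tree.

    % Sketch: decode the "1 = interior node, 0 = leaf" breadth-first bit string of a
    % binary tree; bits is a 0/1 row vector. Returns the number of leaves.
    function n_leaves = decode_tree_bits(bits)
      open = 1;                                  % pending nodes still to be described (root)
      n_leaves = 0;
      for b = bits
        open = open - 1;                         % this bit describes one pending node
        if b == 1, open = open + 2;              % interior node: its two sons become pending
        else,      n_leaves = n_leaves + 1;      % leaf
        end
        if open == 0, break; end                 % tree completely described (extra bits ignored)
      end
      assert(open == 0, 'bit string does not describe a complete binary tree');
    end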

SIGNAL COMPRESSION

2. Huffman Codes

2.1 Huffman algorithm for binary coding alphabets
2.2 Canonical Huffman coding
2.3 Huffman algorithm for D-ary coding alphabets
2.4 Huffman codes for infinite alphabets

A recap of the previous lecture

A prefix code can be represented by a binary tree.
The length of the codewords should be small for frequently occurring symbols.
The Kraft inequality constrains the lengths of the codewords.
The average codelength is a good measure of coding performance.
The ideal codelength for a source emitting symbol i with probability p(i) is log2(1/p(i)) (it minimizes the average codelength).
If the ideal codelength log2(1/p(i)) is not an integer, there is a need for an optimization algorithm. This is the current lecture's topic: the Huffman algorithm.

2.1 Huffman algorithm for binary sources

Huffman coding associates to each symbol i, i in {1, 2, ..., n}, a codeword, based on the known symbol probabilities p_i in {p_1, p_2, ..., p_n}, such that the average codelength is minimized. No code assignment can be better than Huffman's code if the source is independent.

Procedure
0. Arrange the symbol probabilities p_i in increasing order; associate to them a list of n nodes at the zeroth (initial) level of a tree.
1. While there is more than one node in the list:
1.1 Merge the two nodes with smallest probability from the list into a new node, which is assigned as probability the sum of the probabilities of the merged nodes. Delete from the list the two nodes which were merged.
1.2 Arbitrarily assign 0 and 1 to the branches from the new node to the nodes which were merged.
2. Associate to each symbol the codeword formed by the sequence of 0s and 1s read from the root node to the leaf node.

Memory requirement: once the Huffman codewords have been associated to the symbols, they are stored in a table. This table must be stored and used both at the encoder and the decoder (e.g. it may be transmitted along with the data, at the beginning of the message).

Extremely fast: encoding (and decoding) takes basically one table lookup per symbol.

Example of using the Huffman algorithm

We consider the same example which was used to illustrate the Shannon code; the probabilities are given in Table 3. The steps of the algorithm are illustrated in the next two pages.

(Figure: the final Huffman tree for the source of Table 3, with the codewords of the symbols a, ..., g read from root to leaves.)

Table 3 lists, for each symbol a (1), ..., g (7), its probability p(i), the Huffman codelength l_H(i) and the Shannon codelength l_Sh(i) = ceil(log2(1/p(i))); for the symbol a, for instance, l_H = 3 while l_Sh = 4.

The Huffman code assigns shorter codewords than the Shannon code to the symbols a, c and g.

(Figure: the initial step and steps 1-6 of the Huffman algorithm, showing how the list of nodes c, d, a, e, b, g, f is successively merged into a single tree.)

The entropy of the given source is H = - sum_{i=0}^{6} p_i log2 p_i. The average codelength of the Shannon code is L = sum_i p_i l_Sh(i) = 3.2. The average codelength of the Huffman code is L = sum_i p_i l_H(i) = 2.67.

A simple method to code at a bitrate closer to the entropy is to block the alphabet symbols. As an example, consider the new alphabet formed by all pairs of symbols, {(a,a), ..., (g,f), (g,g)}. Building the Huffman code for this source we get the average codelength L_2 = sum_{(i,j)} p(i,j) l_H((i,j)), which gives a per-symbol average codelength (1/2) L_2 better than the Huffman average codelength when coding symbol by symbol. For blocks of three symbols one gets the per-symbol average codelength (1/3) L_3, and for blocks of four symbols (1/4) L_4, closer and closer to the entropy. Note that the better compression is paid for by a higher complexity: coding symbol by symbol we need to store a code table with 7 entries, while when coding blocks of 4 symbols at a time we need a table with 2401 entries.

Optimality of the Huffman code

Proposition. In the tree W we prune (merge) the symbols i_{n-1} and i_n to obtain the tree W_P having the following symbols: (a) the n - 2 symbols i_1, ..., i_{n-2}, and (b) the extra symbol obtained by merging the symbols i_{n-1} and i_n (this extra node having probability p(i_{n-1}) + p(i_n)). Then the average lengths over the two trees are linked by

    L(W) = L(W_P) + p(i_{n-1}) + p(i_n).

Properties of the optimal code:
(a) If for two symbols i_1 and i_2 we have p(i_1) < p(i_2), then l(i_1) >= l(i_2).
(b) The longest two codewords in an optimal code have to be sons of the same node and have to be assigned to the two symbols having the smallest and second smallest probability.

Proof: (a) If l(i_1) < l(i_2) and p(i_1) < p(i_2), just switch the codewords of i_1 and i_2. The resulting average codelength minus the original one is p(i_2)(l(i_1) - l(i_2)) + p(i_1)(l(i_2) - l(i_1)) = (p(i_2) - p(i_1))(l(i_1) - l(i_2)), which is negative, therefore the original code was not optimal. (b) By (a) we know that the longest two codewords in an optimal code have to be assigned to the two symbols having the smallest and second smallest probability. If they do not have the same length, the longer one can be shortened, therefore the code was not optimal (this can also be seen from the fact that the optimal tree has to be complete, i.e., each node has either no descendants or exactly two descendants; having only one descendant is not possible in an optimal tree). By a swapping argument, the symbols with the smallest and second smallest probability can always be arranged to be sons of the same node.

Recursively applying property (b) one gets the Huffman algorithm, which therefore provides the optimal tree.

Huffman procedure

function [Tree, code, L_H, EL_H, H] = huffman2(p)
% The probabilities of the symbols 1...n are given in the vector p
%% Output:
%% The codewords for the original symbols may be displayed by using
%%   for i = 1:n
%%     bitget(code(i), L_H(i):-1:1)
%%   end
%% The Tree structure uses the original symbols 1:n and the indices
%% (n+1):(2*n-1) for the interior nodes
%% The rows in Tree are nodes at the same depth

n = length(p);

%% START THE HUFFMAN CONSTRUCTION
% Initial step
list_h = 1:n;              % Will keep indices of the nodes in the list
plist  = p;                % Will keep the probabilities of the nodes in list_h
m      = n+1;              % Index of the first new node
pred   = zeros(1,2*n-1);   % pred(i) will show which node is the parent of node i

while(length(list_h) > 1)
  [psort, ind] = sort(plist);
  i1 = ind(1); i2 = ind(2);        % nodes ind(1) and ind(2) will be merged
  pnew = plist(i1) + plist(i2);    % Probability of the new node
  pred(list_h(i1)) = m;            % the predecessor of list_h(i1) is m
  pred(list_h(i2)) = m;
  list_h([i1,i2]) = [];            % remove from the list nodes ind(1) and ind(2)
  plist([i1,i2]) = [];             % remove the probabilities of the deleted nodes
  list_h = [list_h m];             % add node m to the list
  plist = [plist pnew];            % specify the probability of node m
  m = m+1;                         % Index of the new node
end % end while
%% END OF HUFFMAN CONSTRUCTION

%% We extract now the structure of the coding tree
Tree = []; nTree = [];
Tree(1,1) = 2*n-1;                 % the root
depth = 1;
ind = find(pred == 2*n-1);         % These are the sons of the root
Tree(depth+1,1:2) = ind;
nTree(depth+1) = 2;
code(ind(1)) = 0; code(ind(2)) = 1;
ntogo = 2*n-4;

while(ntogo > 0)
  nTree(depth+2) = 0;
  for i = 1:nTree(depth+1)
    father = Tree(depth+1,i);
    ind = find(pred == father);    % These are the sons of the father
    if(~isempty(ind))
      Tree(depth+2, nTree(depth+2)+[1 2]) = ind;
      code(ind(1)) = 0 + 2*code(father);
      code(ind(2)) = 1 + 2*code(father);
      nTree(depth+2) = nTree(depth+2) + 2;
      ntogo = ntogo - 2;
    end % end if
  end % end for
  depth = depth + 1;
end % end while

%% We have the tree structure and we find the codes and lengths for the symbols
for i = 1:n
  [i1, i2] = find(Tree == i);
  L_H(i) = i1 - 1;
  bitget(code(i), L_H(i):-1:1)
end

% Huffman's average code length
EL_H = sum(p.*L_H(1:n))

% Find the entropy of the source H
indp = find(p > 0);
H = -sum(p(indp).*log(p(indp)))/log(2);
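A hypothetical usage example of the routine above (the probabilities below are illustrative, not those of Table 3):

    % Sketch: calling huffman2 and comparing the average codelength with the entropy.
    p = [0.3 0.2 0.15 0.12 0.1 0.08 0.05];
    [Tree, code, L_H, EL_H, H] = huffman2(p);
    [EL_H  H]        % average Huffman codelength vs. entropy: H <= EL_H < H+1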

2.2 Canonical Huffman coding

The Huffman construction may generate many different (but equivalent) trees. When assigning 0 and 1 to the branches from a father to its sons, any choice is allowed. The resulting trees will be different, the codes will be different, but the length of a given codeword is the same for all trees. Therefore the optimal code is not unique. At every interior node we can switch the 0/1 decision of the two branches; there are n - 1 interior nodes, therefore the Huffman algorithm can produce at least 2^{n-1} different codes. At some step of the algorithm there may also be ties, cases when the probabilities of the symbols are equal, and we may choose freely which symbol to consider first. This is a second source of non-uniqueness.

Table 4 lists, for the symbols a (1), ..., g (7) with their probabilities p(i), three equivalent optimal codes: two different Huffman codes and the canonical Huffman code (the codewords w(i) are read from the trees in the figure).

(Figure: two equivalent Huffman trees and the canonical Huffman tree for the source of Table 4.)

The canonical Huffman code in Table 5 is listed in Table 6 such that the codelengths are decreasing and, for a given codelength, the symbols are listed in lexicographic order and their codewords are consecutive binary numbers.

(Tables 5 and 6: the canonical Huffman codewords w(i) of the symbols a (1), ..., g (7); in Table 6 the same code is re-listed in the canonical order c, d, a, b, e, f, g.)

With canonical Huffman codes, all we need to store is the symbol string from the first column of Table 6 (here cdabefg) together with, for each codelength, the first codeword of that length and the number of codewords of that length.

Assigning the codewords in canonical Huffman coding

Given: the symbols {i}, for which the optimal lengths {l(i)} are known. Denote maxlength = max_i {l(i)}.

1  /* Find how many codewords of length j = 1, ..., maxlength are in {l_i} */
   For l = 1 to maxlength
     Set num[l] <- 0
   For i = 1 to n
     Set num[l_i] <- num[l_i] + 1
2  /* The integer for the first code of length l is stored in firstcode(l) */
   Set firstcode(maxlength) <- 0
   For l = maxlength - 1 downto 1
     Set firstcode(l) <- ceil( (firstcode(l+1) + num[l+1]) / 2 )
3  For l = 1 to maxlength
     Set nextcode(l) <- firstcode(l)
4  For i = 1 to n
     Set codeword(i) <- nextcode(l_i)
     Set symbol[l_i, nextcode(l_i) - firstcode(l_i)] <- i
     Set nextcode(l_i) <- nextcode(l_i) + 1

Example of building firstcode[l] and symbol[l, j] for canonical Huffman coding

    Table 7
    Symbol i   Code length l_i   Codeword
    1 (a)      2                 01
    2 (b)      5                 00000
    3 (c)      5                 00001
    4 (d)      3                 001
    5 (e)      2                 10
    6 (f)      5                 00010
    7 (g)      5                 00011
    8 (h)      2                 11

    l             1   2   3   4   5
    num[l]        0   3   1   0   4
    firstcode[l]  2   1   1   2   0

    The array symbol[l, j], j = 0, 1, 2, ...:
    l = 2:  1 (a)   5 (e)   8 (h)
    l = 3:  4 (d)
    l = 5:  2 (b)   3 (c)   6 (f)   7 (g)

(Figure: the canonical Huffman code from Table 7 drawn in tree form.)

Decoding using a canonical Huffman code

1  Set v <- nextinputbit(). Set l <- 1.
2  While v < firstcode[l] do
   (a) Set v <- 2v + nextinputbit()
   (b) Set l <- l + 1
   At the end of the while loop the integer v is a valid code of l bits.
3  Return symbol[l, v - firstcode[l]]. This is the index of the decoded symbol.

The codewords of length l occupy consecutive positions when read as integers. When decoding, we receive the bits one after another and interpret them according to the decoding algorithm. Example for the canonical Huffman code of Table 7:

After we receive the first bit, b_1, we find it to be smaller than firstcode[1] = 2, so we read the next bit, b_2. After we receive the second bit, if the binary number v = b_1 b_2 is larger than or equal to firstcode[2] = 1 we are done, and read the decoded symbol from the table symbol[2, v - firstcode[2]]. If the binary number v = b_1 b_2 is smaller than firstcode[2] = 1 we have to read a new bit, b_3. After we receive the third bit, if the binary number v = b_1 b_2 b_3 is larger than or equal to firstcode[3] = 1 we are done, and read the decoded symbol from the table symbol[3, v - firstcode[3]] = d. If the binary number v = b_1 b_2 b_3 is smaller than firstcode[3] = 1 we have to read a new bit, b_4. After we receive the fourth bit, b_4, we find v to be smaller than firstcode[4] = 2, so we read the next bit, b_5. After we receive the fifth bit, we find that the binary number v = b_1 b_2 b_3 b_4 b_5 is larger than or equal to firstcode[5] = 0, so we are done, and read the decoded symbol from the table symbol[5, v - firstcode[5]].

As an example, if the bitstream received starts with 00101...

First stage: Read b_1 = 0, initialize v = b_1 = 0. Since v = 0 < firstcode[1] = 2 we read the next bit, b_2 = 0, and update v = 2*0 + 0 = 0. Now v = 0 < firstcode[2] = 1, so we read the next bit, b_3 = 1, and update v = 2*0 + 1 = 1. Now v = 1 = firstcode[3], so we decode by looking into symbol[3, v - firstcode[3]] = symbol[3, 0] = d.

Second stage: We read now b_4 = 0 and initialize v = b_4 = 0. Since v = 0 < firstcode[1] = 2 we read the next bit, b_5 = 1, and update v = 2*0 + 1 = 1. Now v = 1 >= firstcode[2] = 1, so we decode by reading symbol[2, v - firstcode[2]] = symbol[2, 0] = a.
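A compact Matlab sketch of this decoder, using the firstcode[] and symbol[] arrays of the Table 7 example as reconstructed above (it assumes the input stream contains only complete codewords):

    % Sketch: canonical Huffman decoding of a bit vector, Table 7 example.
    firstcode = [2 1 1 2 0];                                        % firstcode(l), l = 1..5
    symbol    = {{}, {'a','e','h'}, {'d'}, {}, {'b','c','f','g'}};  % symbol{l}{v-firstcode(l)+1}
    bits = [0 0 1 0 1];                                             % stream to decode
    pos = 1; decoded = {};
    while pos <= length(bits)
      v = bits(pos); l = 1; pos = pos + 1;
      while v < firstcode(l)                       % keep reading bits until v is a valid code
        v = 2*v + bits(pos); pos = pos + 1; l = l + 1;
      end
      decoded{end+1} = symbol{l}{v - firstcode(l) + 1};
    end
    decoded                                        % {'d','a'} for the stream 00101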

2.3 Huffman algorithm for D-ary coding alphabets

A prefix code for a D-ary coding alphabet can be represented as a D-ary tree. A complete D-ary tree has the following property: denote by n_i the number of interior nodes and by n the number of leaves. Then

    n = (D - 1) n_i + 1.    (12)

The proof is simple. Start with the complete tree having one interior node (the root), n_i = 1, and n = D leaves; it obeys n = (D - 1) n_i + 1. Now take any complete tree satisfying n = (D - 1) n_i + 1 and split one leaf, getting one extra interior node and D new leaves; after the split, n_new = n - 1 + D = (D - 1) n_i + 1 - 1 + D = (D - 1)(n_i + 1) + 1 = (D - 1) n_i,new + 1.

With D-ary trees, the optimal codes clearly have the following properties:
(a) If for two symbols i_1 and i_2 we have p(i_1) < p(i_2), then l(i_1) >= l(i_2).
(b) The longest codeword should not be the unique son of an interior node (at least one other codeword must have the same length as the longest codeword).

Huffman coding associates a codeword to each symbol, based on the known symbol probabilities {p_1, p_2, ..., p_n}.

Procedure
0. Complete the list of initial nodes {1, ..., n} with n' zero-probability nodes such that in total there are n + n' = (D - 1) n_i + 1 nodes, with n_i an integer (i.e. we can build a complete D-ary tree having n + n' leaves).
1. Arrange the symbol probabilities p_i in increasing order; associate to them a list of n + n' nodes at the (initial) level 0 of a tree.
2. While there is more than one node in the list:
2.1 Merge the D nodes with smallest probability from the list into a new node, which is assigned as probability the sum of the probabilities of the merged nodes. Delete from the list the D nodes which were merged.
2.2 Arbitrarily assign 0, 1, ..., D-1 to the branches from the new node to the nodes which were merged.
3. Associate to each symbol the codeword formed by the sequence of symbols 0, 1, ..., D-1 read from the root node to the leaf node.

2.4 Huffman codes for infinite alphabets

The Huffman algorithm starts building the code tree from the largest depth. If the alphabet is infinite, the Huffman algorithm cannot be directly applied. We consider as alphabet the natural numbers {0, 1, 2, ...}, where the symbol probability is geometric with an arbitrary parameter θ in (0, 1):

    P(i) = (1 - θ) θ^i,  i = 0, 1, 2, ...    (13)

An example of a geometric distribution is when the source is binary, Bernoulli i.i.d., with θ the probability of zero. The runs of 0s are to be encoded using one code, the runs of 1s using another code. The probability of i zero symbols followed by a one is P(i) = (1 - θ) θ^i, hence the need to encode integers i having a geometric distribution. More on run length coding later!

Consider a source with parameter θ. There is an integer l (sometimes denoted l(θ)) which satisfies both inequalities

    θ^l + θ^{l+1} <= 1 < θ^{l-1} + θ^l.    (14)

Proof: take θ^0 = 1, θ, θ^2, θ^3, ..., which is monotonically decreasing to zero. The sequence 1 + θ, θ + θ^2, θ^2 + θ^3, ..., is also monotonically decreasing to zero, and its first term is greater than 1. Therefore there is an integer l satisfying (14).

The construction which follows will demonstrate a Huffman code, therefore an optimal code, for the geometric distribution with parameter θ. The same code will be optimal also for any other θ which satisfies θ^l + θ^{l+1} <= 1 < θ^{l-1} + θ^l for l = l(θ). Therefore the code will be specific to an l value, not to a θ value.

Consider a source with parameter θ and the value l = l(θ) associated to it. For any m >= 1 define the m-reduced source, which has m + l + 1 symbols, with the following probabilities:

    P_m(i) = (1 - θ) θ^i                 for 0 <= i <= m
    P_m(i) = (1 - θ) θ^i / (1 - θ^l)     for m < i <= m + l.    (15)

Easily we see that sum_{i=0}^{m+l} P_m(i) = 1. The probabilities (1 - θ) θ^i are monotonically decreasing when i increases, the same being true for (1 - θ) θ^i / (1 - θ^l). We show that the two symbols with smallest probabilities are m and m + l. First we show P_m(m + l) <= P_m(m - 1), or

    (1 - θ) θ^{m+l} / (1 - θ^l) <= (1 - θ) θ^{m-1},

which is a straight consequence of the left side of (14). Second we show that P_m(m + l - 1) > P_m(m), or

    (1 - θ) θ^{m+l-1} / (1 - θ^l) > (1 - θ) θ^m,

which is a straight consequence of the right side of (14).

Knowing now that the two symbols with smallest probabilities are m and m + l, we apply the Huffman algorithm, which merges the two symbols into a new one, with probability

    (1 - θ) θ^{m+l} / (1 - θ^l) + (1 - θ) θ^m = (1 - θ) θ^m (θ^l + 1 - θ^l) / (1 - θ^l) = (1 - θ) θ^m / (1 - θ^l) = P_{m-1}(m).    (16)

The source after merging the two nodes will have m + l symbols, with probabilities

    P_{m-1}(i) = (1 - θ) θ^i                 for 0 <= i <= m - 1
    P_{m-1}(i) = (1 - θ) θ^i / (1 - θ^l)     for m - 1 < i <= m - 1 + l    (17)

which is exactly the reduced source of order m - 1. With the same type of merging of two nodes, we reduce consecutively the original source until m = 0. At this last stage we are left with an l-symbol alphabet with probabilities

    P_0(i) = (1 - θ) θ^i / (1 - θ^l),  0 <= i <= l - 1.    (18)

A little algebra provides the optimal code for this reduced source. For convenience of notation we present now only the case when l is an integer power of two, l = 2^k. In this case the optimal code for (18) is as follows: use wordlength k for all symbols i = 0, ..., l - 1, i.e. use the binary representation of i on k bits.

The overall code, named the Golomb-Rice code, has the following rules. Any integer i is coded in two parts: first, the last k bits of i are written to the codeword (i.e. the binary representation of (i mod 2^k)); then floor(i/2^k) is computed and exactly floor(i/2^k) bits with value 1 are appended to the codeword, followed by a single 0.

Summary: the symbols m and m + l are merged at successive steps of the Huffman algorithm, until there is no more pair (m, m + l), i.e. when only l symbols are left. The Huffman code for these l symbols is trivial when l = 2^k: it is just the k-bit binary representation of the l symbols. The overall code can be understood by tracking back all the merging operations, as shown in the next example.

Example: take k = 2, l = 2^k = 4. The reduced tree and the final code tree are presented on the next page.

(Figure: the m-reduced source for l = 4 and m = 5, and the corresponding Golomb-Rice code tree.)

Based on the previous construction, the Golomb-Rice code with k = 2 and l = 4 is optimal for all geometric sources with θ obeying the inequalities θ^4 + θ^5 <= 1 < θ^3 + θ^4, i.e. for θ in the interval (0.8192, 0.8567).
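A minimal Matlab sketch of the Golomb-Rice rule stated above, writing the unary part as floor(i/2^k) ones followed by a terminating zero (one common convention; the function name is ours):

    % Sketch: Golomb-Rice codeword of a nonnegative integer i with parameter k.
    function bits = golomb_rice(i, k)
      r    = dec2bin(mod(i, 2^k), k);               % the k low-order bits of i
      q    = floor(i / 2^k);                        % the quotient, coded in unary
      bits = [r repmat('1', 1, q) '0'];             % k-bit remainder, then q ones and a zero
    end
    % For k = 2: golomb_rice(0,2) = '000', golomb_rice(5,2) = '0110', golomb_rice(9,2) = '01110'.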

Golomb-Rice codes: fast encoding and decoding

The overall code, named the Golomb-Rice code, has the following rules. Any integer i is coded in two parts: first, the last k bits of i are written to the codeword (i.e. the binary representation of (i mod 2^k)); then floor(i/2^k) is computed and exactly floor(i/2^k) bits with value 1 are appended to the codeword, followed by a single 0.

The GR codes are thus built of two distinct prefix codes: the last k bits of i form a first prefix code (a totally balanced tree of depth k), and the unary representation of floor(i/2^k) (exactly floor(i/2^k) bits of 1 followed by a delimiter, 0) forms a second prefix code. One can swap the two codes, so that the unary representation comes first and the k bits of i come next, and the code is still decodable.

SGN-2306 Signal Compression

3. Lempel-Ziv Coding

3.1 Dictionary methods
3.2 The LZ77 family of adaptive dictionary coders (Ziv-Lempel 77)
3.3 The gzip variant of Ziv-Lempel 77
3.4 The LZ78 family of adaptive dictionary coders (Ziv-Lempel 78)
3.5 The LZW variant of Ziv-Lempel 78
3.6 Statistical analysis of a simplified Ziv-Lempel algorithm

3.1 Dictionary methods

Replace a substring in the file with a codeword that identifies the substring in a dictionary (or codebook).

Static dictionary. One first builds a suitable dictionary, which will be used for all compression tasks. Example: digram coding, where some of the most frequently occurring pairs of letters are stored in the dictionary. A reasonable small dictionary: the 128 individual ASCII characters, followed by 128 pairs of ASCII symbols (properly selected out of the 2^14 possible pairs). In clear text an ASCII character needs 7 bits. With the above dictionary (256 entries, hence 8 bits per entry), the favorable cases are encoded as digrams (4 bits/character), while in the unfavorable cases encoding a single character needs 8 bits/character instead of 7 bits/character. The dictionary may be enlarged by adding longer words (phrases) to it (e.g. and, the). Unfortunately, using a dictionary with long phrases will make it well adapted and efficient for a certain type of text, but very inefficient for other texts (compare the dictionaries suitable for a mathematical textbook and for a collection of parliamentary speeches).

Semi-static dictionaries: one can build a dictionary well suited for a given text. First the dictionary is sent as side information, and afterwards the text is sent, encoded with the optimal dictionary. This has two drawbacks: (a) the overhead of the side information may be very high for short texts, and (b) at the encoder we need to pass two times through the text (read a large file twice).

An adaptive dictionary is both elegant and simple. The dictionary is built on the fly (or it need not be built at all, it exists only implicitly), using the text seen so far. Advantages: (a) there is only one pass through the text; (b) the dictionary is changing all the time, following the specificity of the recently seen text. A substring of the text is replaced by a pointer to where it has occurred previously. Almost all dictionary methods are variations of two methods, developed by Jacob Ziv and Abraham Lempel in 1977 and 1978, respectively. In both methods the same principle is used: the dictionary is essentially all or part of the text seen before (prior to the current position), and the codewords specify two types of information: (a) a pointer to a previous position and (b) the length of the text to be copied from the past. The variants of Ziv-Lempel coding differ in how pointers are represented and in the limitations they impose on what is referred to by pointers.

A (cartoon-like) example of encoding with an adaptive dictionary is given in the image below. The decoder has to figure out what to put in each empty box, by following each arrow and taking the amount of text suggested by the size of each box.

Pease porridge hot, Pease porridge cold, Pease porridge in a pot Nine days old. Some like it hot, Some like it cold, Some like it in a pot Nine days old.

(Figure: the same text with the repeated substrings replaced by empty boxes and arrows pointing back to their first occurrences.)

3.2 The LZ77 family of adaptive dictionary coders (Ziv-Lempel 77)

The algorithm was devised such that decoding is fast and the memory requirements are low (the compression ratio was sacrificed in favor of low complexity). Any string of characters is first transformed into a string of triplets, whose components have the following significance:

The first component of a triplet says how far back to look in the previous text to find the next phrase. The second component records how long the phrase is. The first and second components form a pointer to a phrase in the past text. The third component gives the character which will follow the next phrase. This is absolutely necessary if there is no phrase match in the past; it is included in every triplet for uniformity of decoding.

We start with a decoding example. Suppose the encoded bitstream contains the triplets

    <0,0,a> <0,0,b> <2,1,a> <3,2,b> <5,3,b> <1,10,a>

When the triplet <5,3,b> is received, the previously decoded text is abaabab. The pointer <5,3,.> tells to copy the past phrase aab after abaabab. The character <.,.,b> tells to append a b after abaababaab.

When the triplet <1,10,a> is received, it tells to copy 10 characters starting with the last available character. This is a recursive reference, but fortunately it can be solved easily: we find that the 10 characters are in fact bbbbbbbbbb. Thus recursive references are similar to run-length coding (to be discussed in a later course).

In LZ77 there are limitations on how far back a pointer can refer and on the maximum size of the string referred to. Usually the window for search is limited to a few thousand characters. Example: with 13 bits one can address 8192 previous positions (several book pages). The length of the phrase is limited to about 16 characters. Longer pointers are expensive in bits, without a significant improvement of the compression. If the length of the phrase is 0, the position is not relevant.

The decoder is very simple and fast, because each decoded character requires only a table lookup (the size of the array is usually smaller than the cache size). The decoding program is sometimes included with the data at very little cost, such that a compressed file can be downloaded from the network without any software; when executed, the program generates the original file.

Example of LZ77 compression

    encoder output     decoder output
    <0,0,a>            a
    <0,0,b>            b
    <2,1,a>            aa
    <3,2,b>            bab
    <5,3,b>            aabb
    <1,10,a>           bbbbbbbbbba

Encoding procedure

Goal: given the text S[1...N] and the window length W, produce a stream of triplets <f, l, c> = <position, length, next character> (the binary codes for f, l, c are discussed later).

1. Set p <- 1.  /* S(p) is the next character to be encoded */
2. While (p <= N)  /* while we did not reach the end of the text */
2.1 Search for the longest match for S[p, p+1, ...] in S[p-W, ..., p-1]. Denote by m the position and by l the length of the match, S[m, m+1, ..., m+l-1] = S[p, p+1, ..., p+l-1].
2.2 Write to the output stream the triplet <position, length, next character>, i.e. <p - m, l, S[p + l]>.
2.3 Set p <- p + l + 1.  /* continue encoding from S[p + l + 1] */
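A compact Matlab sketch of this encoding procedure (greedy longest-match search over the last W characters; the function name is ours). It returns a cell array of {offset, length, next character} triples.

    % Sketch: LZ77 encoding of a character string S with search window W.
    function triplets = lz77_encode(S, W)
      p = 1; triplets = {};
      while p <= length(S)
        best_l = 0; best_f = 0;
        for f = 1:min(W, p-1)                        % candidate match starts at position p - f
          l = 0;
          while p+l <= length(S)-1 && S(p-f+l) == S(p+l)
            l = l + 1;                               % the match may overlap the current position
          end
          if l > best_l, best_l = l; best_f = f; end
        end
        triplets{end+1} = {best_f, best_l, S(p+best_l)};
        p = p + best_l + 1;
      end
    end
    % lz77_encode(['abaababaabb' repmat('b',1,10) 'a'], 8192) reproduces the six triplets above.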

Decoding procedure

Given a stream of triplets <f, l, c> (the binary codes for f, l, c are discussed later).

1. Set p <- 1.  /* S(p) is the next character to be decoded */
2. While there are non-decoded triplets <f, l, c>  /* while we did not reach the end of the text */
2.1 Read the triplet <f, l, c>.
2.2 Set S[p, p+1, ..., p+l-1] = S[p-f, p-f+1, ..., p-f+l-1].
2.3 Set S[p + l] <- c.
2.4 Set p <- p + l + 1.  /* continue decoding from S[p + l + 1] */

3.3 The gzip variant of Ziv-Lempel 77

Distributed by the GNU project / Free Software Foundation (author Gailly, 1993). Gzip uses a simple technique to speed up, at the encoder, the search for the best match in the past. The next three characters are used as an address into a lookup table, which contains a linked list showing where these three characters have occurred in the past. The length of the list is restricted in size, by a parameter selected by the user before starting the encoding. If there are long runs of the same characters, limiting the size of the list helps removing the unhelpful references in the list. Recent occurrences are stored at the beginning of the list.

Binary encoding of the triplets <position, length, next character>. In gzip the encoding is done slightly differently than in classical LZ77: instead of always sending the triplet <position, length, character>, gzip sends either a pair <length, position>, when a match is found, or a <character>, when no match was found in the past. Therefore a previous match is represented by a pointer consisting of position and length. The position is Huffman coded such that more frequent positions (usually recent ones) are encoded using fewer bits than older positions.

The match length and the next character are encoded with a single Huffman code (more efficient than separately Huffman encoding the length and the character and adding an extra bit to signal whether what follows is a length or a character). The Huffman codes are generated semi-statically: blocks of up to 64 Kbytes from the input file are processed at a time. The canonical Huffman codes are generated for the pointers and the raw characters, and a code table is placed at the beginning of the compressed form of the block. The program does not need to read the file twice (64 Kbytes can be kept in memory). With its fast list-search method and compact canonical Huffman representation of pointers and characters, gzip is faster and compresses better than other Ziv-Lempel methods. However, faster versions exist, but their compression ratio is smaller.

3.4 The LZ78 family of adaptive dictionary coders

In LZ77 pointers can refer to any substring in the window of previous text. This may be inefficient, since the same substring may appear many times in the window, and we waste multiple codewords on the same substring. In LZ78 only some substrings can be referenced, but now there is no window restriction on the previous text. The encoded stream consists of pairs <index, character>, where <index, .> points in a table to the longest substring matching the current one, and <., character> is the character following the matched substring.

Example. We want to encode the string abaababaa... The encoder goes along the string and creates a table to which it dynamically adds new entries. When encoding a new part of the string, the encoder searches the existing table to find a match for the new part, and if there are many such matches, it selects the longest one. Then it adds to the encoded stream the address in the table of the longest match. Additionally, it adds to the bitstream the code for the next character.

When starting to encode the string abaababaa..., the table is empty, so there is no match in it, and the encoder adds to the output bitstream the pair <0, a> (0 for no match found in the table, and a for the next character). After this, the encoder adds to the dictionary an entry for the string a, which will have address 1.

Continuing to encode the rest of the string, baababaa..., the table has the single entry a, so no match is found in the table. The encoder adds to the output bitstream the pair <0, b> (0 for no match found in the table, and b for the next character). After this, the encoder adds to the dictionary an entry for the string b, which will have address 2.

Continuing to encode the rest of the string, aababaa..., we can find in the table the entry a, which is the longest match now. The encoder adds to the output bitstream the pair <1, a> (1 for the match found in the first entry of the table, and a for the next character). After this, the encoder adds to the dictionary an entry for the string aa, which will have address 3.

35 TRIE DATA FOR LZ78 CODING a a 3 b a 2 4 a b the data structure in LZ78 grows without any bounds, so the growth must be stopped to avoid the use of too much memory. At the stopping moment the trie can be removed and re-initialized. Or it can be partly rebuilt using a few hundred of the recent bytes. Encoding with LZ78 may be faster than with LZ77, but decoding is slower, since we have to rebuild the dictionaries (tables) at decoding time. 7

3.5 The Lempel-Ziv-Welch (LZW) variant of LZ78

LZW is more popular than the original Ziv-Lempel coding; it is the basis of the Unix compress program. LZW encodes only phrase numbers and does not have explicit characters in the encoded stream. This is possible by initializing the list of phrases to include all characters, say the entries 0 to 127, such that a has address 97 and b has address 98. A new phrase is built from an existing one by appending the first character of the next phrase to it.

    encoder input (parsed)   a    b    a    ab    ab    ba    aba   abaa
    encoder output           97   98   97   128   128   129   131   134
    new entry added          ab   ba   aa   aba   abb   baa   abaa
    address of new entry     128  129  130  131   132   133   134

In the example the encoder output is formed of the indices in the dictionary: 97, 98, 97, 128, 128, 129, 131, 134.

Decoding 97, 98, 97, 128, 128 we find the original text a, b, a, ab, ab and construct the new entries 128, 129, 130, 131 in the dictionary. We explain in detail the decoding starting from the next received index, 129. First we read from the encoded stream the entry 129, which is ba and can be appended to the decoded string: a, b, a, ab, ab, ba. At this moment the new phrase to be added to the dictionary is phrase 132 = abb. Then we read from the encoded stream the entry 131. This is found to be aba and added to the decoded string: a, b, a, ab, ab, ba, aba. We also add to the dictionary the new phrase 133 = baa.

The lag in the construction of the dictionary creates a problem when the encoder references a phrase index which is not yet available to the decoder. This is the case when 134 is received in the encoded stream: there is no index 134 in the dictionary yet. However, we know that phrase 134 should start with aba and contain one extra character. Therefore we add to the decoded string a, b, a, ab, ab, ba, aba, aba?. Now we are able to say what the phrase 134 is, namely abaa, and after this we can substitute ? by a.

There are several variants of LZW. Unix compress uses an increasing number of bits for the indices: fewer when there are fewer entries (other variants use, for the whole file, the maximum number of bits necessary to encode all parsed substrings of the file). When a specified number of phrases is exceeded (full dictionary) the adaptation is stopped. The compression performance is monitored, and if it deteriorates significantly, the dictionary is rebuilt from scratch.

Encoding procedure for LZW

Given the text S[1...N].
1. Set p <- 1.  /* S(p) is the next character to be encoded */
2. For each character d in {0, ..., q-1} of the alphabet do  /* initial dictionary */
     Set D[d] <- character d.
3. Set d <- q - 1.  /* d points to the last entry in the dictionary */
4. While there is still text remaining to be coded do
4.1 Search for the longest match for S[p, p+1, ...] in D. Suppose the match occurs at entry c, with length l.
4.2 Output the code of c.
4.3 Set d <- d + 1.  /* a new entry will be added to the dictionary */
4.4 Set p <- p + l.
4.5 Set D[d] <- D[c] ++ S[p].  /* add an entry to the dictionary by concatenation */

Decoding procedure for LZW

1. Set p <- 1.  /* S(p) is the next character to be decoded */
2. For each character d in {0, ..., q-1} of the alphabet do  /* initial dictionary */
     Set D[d] <- character d.
3. Set d <- q - 1.  /* d points to the last entry in the dictionary */
4. For each code c in the input do
4.1 If d > q - 1 then  /* the first time is an exception */
     Set the last character of D[d] <- the first character of D[c].
4.2 Output D[c].
4.3 Set d <- d + 1.  /* a new entry will be added to the dictionary */
4.4 Set D[d] <- D[c] ++ ?  /* add an entry to the dictionary by concatenation, but the last character is currently unknown */
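A small Matlab sketch of the LZW encoder above, over the two-letter alphabet {a, b} initialized at the ASCII addresses 97 and 98 as in the example (the function name is ours):

    % Sketch: LZW encoding of a character string over the alphabet {a,b}.
    function out = lzw_encode(S)
      D = containers.Map({'a','b'}, {97, 98});       % initial dictionary
      nextaddr = 128;                                % first free address, as in the notes
      out = []; p = 1;
      while p <= length(S)
        l = 1;
        while p+l <= length(S) && isKey(D, S(p:p+l)) % extend the match while it is in D
          l = l + 1;
        end
        out(end+1) = D(S(p:p+l-1));                  % emit the address of the longest match
        if p+l <= length(S)                          % add (match + next character) to D
          D(S(p:p+l)) = nextaddr; nextaddr = nextaddr + 1;
        end
        p = p + l;
      end
    end
    % lzw_encode('abaababbaabaabaa') returns 97 98 97 128 128 129 131 134, as in the example.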

3.6 Statistical analysis of a simplified Ziv-Lempel

Algorithm for the universal data compression system: the binary source sequence is sequentially parsed into strings that have not appeared so far. Let c(n) be the number of phrases in the parsing of the input n-sequence. We need log c(n) bits to describe the location of the prefix of each phrase and 1 bit to describe its last bit. The above two-pass algorithm may be changed into a one-pass algorithm, which allocates fewer bits for coding the prefix locations; the modifications do not change the asymptotic behavior.

Parse the source string into segments. Collect a dictionary of segments. Add to the dictionary a segment one symbol longer than the longest match so far found. Coding: transmit the index of the matching segment in the dictionary plus the terminal bit.

(Example table: the first five parsed segments, with their dictionary indices 1-5 and the transmitted messages, each of the form (prefix index, last bit).)

The length of the code, for increasing sizes of the segment indices, is

    L = sum_{j=1}^{c(n)} ceil(log2 j) + c(n).

If we assign the worst-case length to all segment indices, and if the number of segments is c(n), with n the total length of the input string, the length is

    l = c(n) (1 + log c(n))

and the average length per input symbol is

    l / n = c(n) (1 + log c(n)) / n.
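A short Matlab sketch of the incremental parsing used in this analysis, under the rule stated above: each new phrase extends its longest previously seen prefix by one symbol (the binary string below is illustrative, not the example of the table).

    % Sketch: parse a binary string into distinct phrases, each one symbol longer than
    % its longest prefix already in the dictionary; c(n) is the number of phrases.
    s = '101101010001011';                 % an arbitrary binary string
    dict = {};                             % phrases seen so far
    i = 1;
    while i <= length(s)
      j = i;
      while j <= length(s) && any(strcmp(s(i:j), dict))   % extend while the phrase is known
        j = j + 1;
      end
      dict{end+1} = s(i:min(j,length(s))); % new phrase (the last one may be a repeat, truncated)
      i = j + 1;
    end
    dict                                   % the parsed phrases; c(n) = numel(dict)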

Definition. A parsing of a binary string x_1 x_2 ... x_n is a division of the string into phrases, separated by commas. A distinct parsing is a parsing such that no two phrases are identical.

Lemma (Lempel and Ziv). The number of phrases c(n) in a distinct parsing of a binary sequence x_1 x_2 ... x_n satisfies

    c(n) <= n / ( (1 - ε_n) log n ),   where ε_n = min( 1, (log(log n) + 4) / log n ).

Theorem. Let {X_n} be a stationary ergodic process with entropy rate H(X), and let c(n) be the number of distinct phrases in a distinct parsing of a sample of length n from this process. Then, with probability 1,

    lim sup_{n -> inf}  c(n) log c(n) / n  <=  H(X).

Theorem. Let {X_n} be a stationary ergodic process with entropy rate H(X). Let l(X_1, X_2, ..., X_n) be the Lempel-Ziv codeword length associated with X_1, X_2, ..., X_n. Then, with probability 1,

    lim sup_{n -> inf}  (1/n) l(X_1, X_2, ..., X_n)  <=  H(X).

Proof. We know that l(X_1, X_2, ..., X_n) = c(n)(1 + log c(n)). By the Lempel-Ziv lemma c(n)/n -> 0, and thus

    lim sup_{n -> inf} (1/n) l(X_1, X_2, ..., X_n) = lim sup_{n -> inf} ( c(n) log c(n) / n + c(n)/n ) <= H(X).

SGN-2306 Signal Compression

4. Shannon-Fano-Elias Codes and Arithmetic Coding

4.1 Shannon-Fano-Elias Coding
4.2 Arithmetic Coding

4.1 Shannon-Fano-Elias Coding

We discuss how to encode the symbols {a_1, a_2, ..., a_m}, knowing their probabilities, by using as code a (truncated) binary representation of the cumulative distribution function.

Consider the random variable X taking as values the m letters of the alphabet {a_1, a_2, ..., a_m}, and for the letter a_i the probability mass function is p(X = a_i) = p(a_i) > 0. The (cumulative) distribution function is

    F(x) = Prob(X <= x) = sum_{a_k <= x} p(a_k)

where we assumed the lexicographic ordering relation a_i < a_j if i < j. Note that if one changes the ordering, the cumulative distribution function will be different. y = F(x) is a staircase function, with jumps at x = a_k (see the plot on the next page). Even though there is no inverse function x = F^{-1}(y), we may define a partial inverse as follows: if all p(a_i) > 0, an arbitrary value y in [0, 1) uniquely determines a symbol a_k, namely the symbol that obeys F(a_{k-1}) <= y < F(a_k). We may use the plot of F(x) to identify the value a_k for which F(a_{k-1}) <= y < F(a_k). Note that F(a_k) = F(a_{k-1}) + p(a_k), which is a fast way to compute F(a_k).

To avoid dealing with interval boundaries, define

    Fbar(a_i) = sum_{k=1}^{i-1} p(a_k) + (1/2) p(a_i).

The values Fbar(a_i) are the midpoints of the steps in the distribution plot. If Fbar(a_i) is given, one can find a_i. The same is true if one gives an approximation of Fbar(a_i), as long as it does not go outside the interval F(a_{i-1}) <= y < F(a_i). Therefore the number Fbar(a_i), or an approximation of it, can be used as a code for a_i. Since the real number Fbar(a_i) may happen to have an infinite binary representation, we sometimes have to look for numbers close to it, but having shorter binary representations. From Shannon codes we know that a good code for a_i needs about log2(1/p(a_i)) bits, therefore Fbar(x) needs to be represented with about log2(1/p(x)) bits.

(Figure: the probability mass function p(a_i) and the cumulative distribution F(x), a staircase function with a step of height p(a_i) at each a_i.)

Probability mass function and cumulative distribution for strings

To extend the previous reasoning from symbols to strings of symbols x, we have to: compute for each string of n symbols the probability mass p(x) (such that sum_x p(x) = 1); define a lexicographic ordering for any two strings x and y (each of n symbols), denoted by the ordering symbol x < y; and define the cumulative probability

    Fbar(y) = sum_{x < y} p(x) + (1/2) p(y).

The code for x is obtained as follows: we truncate (floor operation) Fbar(x) to l(x) bits, obtaining the number denoted ⌊Fbar(x)⌋_{l(x)}, where l(x) = ceil(log2(1/p(x))) + 1.

Important notation distinction: ⌊Fbar(x)⌋_k is the binary representation of the subunitary number Fbar(x), using k bits for the fractional part; ceil(log2(1/p(x))) denotes as usual the smallest integer larger than or equal to log2(1/p(x)). The codeword to be used for encoding the string x is ⌊Fbar(x)⌋_{l(x)}.

Property 1. The code is well defined, i.e. from ⌊Fbar(x)⌋_{l(x)} we uniquely identify x.

Proof: From the truncation,

    Fbar(x) - ⌊Fbar(x)⌋_{l(x)} < 2^{-l(x)},   i.e.   ⌊Fbar(x)⌋_{l(x)} > Fbar(x) - 2^{-l(x)}.

Now we use the fact that l(x) = ceil(log2(1/p(x))) + 1, so

    2^{-l(x)} = (1/2) 2^{-ceil(log2(1/p(x)))} <= (1/2) 2^{-log2(1/p(x))} = p(x)/2 = Fbar(x) - F(x^-),

where F(x^-) = F(x) - p(x) denotes the cumulative probability of all strings strictly smaller than x. Therefore

    F(x^-) <= Fbar(x) - 2^{-l(x)} < ⌊Fbar(x)⌋_{l(x)} <= Fbar(x) < F(x).

Finally, the uniqueness of x given ⌊Fbar(x)⌋_{l(x)} follows from F(x^-) < ⌊Fbar(x)⌋_{l(x)} < F(x): looking at the plot of the cumulative function z = F(x), the value z = ⌊Fbar(x)⌋_{l(x)} falls on the step of x, between the base of the step and its middle.

Property 2. The code is prefix free.

Associate to each codeword z_1 z_2 ... z_l the closed interval [0.z_1 z_2 ... z_l ; 0.z_1 z_2 ... z_l + 2^{−l}]. Any number outside this closed interval has at least one different bit among bits 1 to l, and therefore z_1 z_2 ... z_l is not a prefix of any number outside the closed interval. Extending the reasoning to all codewords, the code is prefix free if and only if all intervals corresponding to codewords are disjoint.

The interval corresponding to any codeword has length 2^{−l(x)}. A prefix of the codeword is, e.g., z_1 z_2 ... z_{l−1}. Can that prefix be a codeword itself? If z_1 z_2 ... z_{l−1} is a codeword, then it represents the interval [0.z_1 z_2 ... z_{l−1} ; 0.z_1 z_2 ... z_{l−1} + 2^{−(l−1)}]. But the number 0.z_1 z_2 ... z_l necessarily belongs to that interval, therefore there is an overlap of the intervals.

We already have F(x−1) < ⌊F̄(x)⌋_{l(x)}, and similarly:

    2^{−l(x)} ≤ p(x)/2 = F(x) − F̄(x)   ⟹   F(x) ≥ F̄(x) + 2^{−l(x)} ≥ ⌊F̄(x)⌋_{l(x)} + 2^{−l(x)},

and therefore the interval [⌊F̄(x)⌋_{l(x)} , ⌊F̄(x)⌋_{l(x)} + 2^{−l(x)}] is totally included in the interval [F(x−1), F(x)]. Now, an overlap of codeword intervals is contradicted by our consideration on the cumulative distribution: the intervals [F(x−1), F(x)] of different strings are disjoint steps of F. Consequently the Shannon-Fano-Elias code is prefix free.

Average length of Shannon-Fano-Elias codes

We use l(x) = ⌈log(1/p(x))⌉ + 1 bits to represent x. The expected codelength is

    L = Σ_x p(x) l(x) = Σ_x p(x) (⌈log(1/p(x))⌉ + 1) < H + 2,

where the entropy is H = −Σ_x p(x) log p(x).

Example 1. All probabilities are integer powers of 2. The code table lists, for each symbol: x, p(x), F(x), F̄(x), F̄(x) in binary, l(x) = ⌈log(1/p(x))⌉ + 1, and the codeword (a concrete instance is worked out in the sketch below).
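The numerical entries of the Example 1 table did not survive the transcription, so the sketch below uses an assumed dyadic distribution that is consistent with the averages quoted on the next slide (2.75 bits versus 1.75 bits); it builds the whole table and checks L < H + 2:

```python
import math

p = {1: 0.25, 2: 0.5, 3: 0.125, 4: 0.125}    # assumed dyadic pmf, H = 1.75 bits

F_prev, rows, L_avg = 0.0, [], 0.0
for x, px in p.items():                      # symbols in their fixed order
    Fbar = F_prev + px / 2                   # midpoint of the step of x
    l = math.ceil(math.log2(1 / px)) + 1
    code = format(math.floor(Fbar * 2 ** l), f'0{l}b')
    rows.append((x, px, F_prev + px, Fbar, code, l))
    L_avg += px * l
    F_prev += px

H = -sum(px * math.log2(px) for px in p.values())
for r in rows:
    print(r)   # codewords come out as '001', '10', '1101', '1111'
print(f'average length = {L_avg} bits, entropy = {H} bits')   # 2.75 vs 1.75
```

Note that the resulting codewords are prefix free, and the last bit of the last two codewords could indeed be removed without destroying the prefix property, as remarked on the next slide.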

Average length of Shannon-Fano-Elias codes

In Example 1 the average codelength is 2.75 bits, while the entropy is 1.75 bits. Since all probabilities are powers of two, the Huffman code attains the entropy. One can remove the last bit in the last two codewords of the Shannon-Fano-Elias code of Example 1!

Example 2. The probabilities are not integer powers of 2. The code table has the same columns: x, p(x), F(x), F̄(x), F̄(x) in binary, l(x) = ⌈log(1/p(x))⌉ + 1, and the codeword. The Huffman code is on average 1.2 bits shorter than the Shannon-Fano-Elias code of Example 2.

Arithmetic coding

Motivation for using arithmetic codes:

- Huffman codes are optimal codes for a given probability distribution of the source. However, their average length is longer than the entropy, within a distance of 1 bit. To reach an average codelength closer to the entropy, Huffman coding is applied to blocks of symbols instead of individual symbols. The size of the Huffman table needed to store the code increases exponentially with the length of the block.
- If during encoding we improve our knowledge of the symbol probabilities, we have either to redesign the Huffman table, or to use an adaptive variant of Huffman (but everybody agrees that adaptive Huffman is neither elegant nor computationally attractive).
- When encoding binary images, the probability of one symbol may be extremely small, and therefore the entropy is close to zero. Without blocking the symbols, the Huffman average length is 1 bit! Long blocks are strictly necessary in this application.

Whenever somebody needs to encode long blocks of symbols, or wants to change the code to make it optimal for a new distribution, the solution is arithmetic coding. Its principle is similar to Shannon-Fano-Elias coding, i.e. handling the cumulative distribution to find codes. However, arithmetic coding is better engineered, allowing very efficient implementations (in speed and compression ratio) and an easy adaptation mechanism.

Principle of arithmetic codes

Essential idea: efficiently calculate the probability mass function p(x^n) and the cumulative distribution function F(x^n) for the source sequence x^n = x_1 x_2 ... x_n. Then, similar to Shannon-Fano-Elias codes, use a number in the interval [F(x^n) − p(x^n); F(x^n)] as the code for x^n.

A sketch: expressing F(x^n) with an accuracy of ⌈log(1/p(x^n))⌉ bits gives a code for the source, so the codewords for different sequences are different. But there is no guarantee that the codewords are prefix free. As in Shannon-Fano-Elias coding, we may use ⌈log(1/p(x^n))⌉ + 1 bits to round F(x^n), in which case the prefix condition is satisfied.

A simplified variant. Consider a binary source alphabet and assume we have a fixed block length n that is known to both the encoder and the decoder. We assume we have a simple procedure to calculate p(x_1 x_2 ... x_n) for any string x_1 x_2 ... x_n. We will use the natural lexicographic order on strings: a string x is greater than a string y if x_i = 1, y_i = 0 for the first i such that x_i ≠ y_i. Equivalently, x > y if Σ_i x_i 2^{−i} > Σ_i y_i 2^{−i}, i.e. the binary numbers satisfy 0.x > 0.y.

The strings can be arranged as leaves in a tree of depth n (a parsing tree, not a coding tree!). In the tree, the order x > y of two strings means that x is to the right of y.

We need to compute the cumulative distribution F(x^n) for a string x^n, i.e. to add all p(y^n) for which y^n < x^n. However, there is a much smarter way to perform the sum, described next. Let T_{x_1 x_2 ... x_k} be the subtree of strings starting with x_1 x_2 ... x_k. The probability of the subtree is

    P(T_{x_1 x_2 ... x_k}) = Σ_{z_{k+1} ... z_n} p(x_1 x_2 ... x_k z_{k+1} ... z_n) = p(x_1 x_2 ... x_k).

The cumulative probability can therefore be computed as

    F(x^n) = Σ_{y^n < x^n} p(y^n) = Σ_{T : T is to the left of x^n} P(T) = Σ_{k : x_k = 1} p(x_1 x_2 ... x_{k−1} 0).
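This prefix-based sum is easy to turn into code. A minimal sketch, assuming the probability model is supplied as a function p(string); all names and the value of θ are illustrative only:

```python
def cumulative(x: str, p) -> float:
    """F(x) = sum of p(y) over all strings y (same length) with y < x,
    computed as the sum of the subtree probabilities p(x_1...x_{k-1} 0)
    over the positions k where x_k = 1."""
    return sum(p(x[:k] + '0') for k, bit in enumerate(x) if bit == '1')

# Example model: i.i.d. bits with theta = p(1) = 0.3 (assumed value)
theta = 0.3
def p_iid(s: str) -> float:
    prob = 1.0
    for bit in s:
        prob *= theta if bit == '1' else 1 - theta
    return prob

print(cumulative('0111', p_iid))   # = p(00) + p(010) + p(0110) ~ 0.6811
```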

Example: for a Bernoulli source with θ = p(1) we have

    F(0111) = P(T_1) + P(T_2) + P(T_3) = p(00) + p(010) + p(0110) = (1 − θ)² + θ(1 − θ)² + θ²(1 − θ)².

To encode the next bit of the source sequence, we need to calculate p(x^i x_{i+1}) and update F(x^i x_{i+1}). To decode the sequence, we use the same procedure to calculate p(x^i x_{i+1}) and update F(x^i x_{i+1}) for the candidate values of x_{i+1}, and check when the cumulative distribution exceeds the value corresponding to the codeword.

The most used mechanisms for computing the probabilities are i.i.d. sources and Markov sources:

- for i.i.d. sources, p(x^n) = Π_{i=1}^{n} p(x_i);
- for first-order Markov sources, p(x^n) = p(x_1) Π_{i=2}^{n} p(x_i | x_{i−1}).

Encoding is efficient if the distribution used by the arithmetic coder is close to the true distribution. The adaptation of the probability distribution will be discussed in a separate lecture. The implementation issues are related to computational accuracy, buffer sizes, and speed.
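The closed form for F(0111) can be cross-checked by brute force, summing p(y) over all 4-bit strings y that are lexicographically smaller than 0111; θ = 0.3 below is just an assumed value:

```python
from itertools import product

theta = 0.3                                  # assumed value of p(1)

def p(bits: str) -> float:
    """i.i.d. Bernoulli(theta) probability of a binary string."""
    out = 1.0
    for b in bits:
        out *= theta if b == '1' else 1 - theta
    return out

x = '0111'
brute = sum(p(''.join(y)) for y in product('01', repeat=4) if ''.join(y) < x)
closed = (1 - theta)**2 + theta * (1 - theta)**2 + theta**2 * (1 - theta)**2
print(brute, closed)                         # both ~ 0.6811 for theta = 0.3
```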

Statistical modelling + arithmetic coding = modern data compression

[Diagram: the statistical modeller supplies the next symbol and the cumulative distribution of the symbols to the arithmetic encoder, which converts the input image into a stream of bits.]

Arithmetic coding example. Message: BILL GATES

    Character   Probability   Range
    SPACE       1/10          0.0 - 0.1
    A           1/10          0.1 - 0.2
    B           1/10          0.2 - 0.3
    E           1/10          0.3 - 0.4
    G           1/10          0.4 - 0.5
    I           1/10          0.5 - 0.6
    L           2/10          0.6 - 0.8
    S           1/10          0.8 - 0.9
    T           1/10          0.9 - 1.0

    New character   Low value        High value
                    0.0              1.0
    B               0.2              0.3
    I               0.25             0.26
    L               0.256            0.258
    L               0.2572           0.2576
    SPACE           0.25720          0.25724
    G               0.257216         0.257220
    A               0.2572164        0.2572168
    T               0.25721676       0.2572168
    E               0.257216772      0.257216776
    S               0.2572167752     0.2572167756

BILL GATES → (0.2572167752, 0.2572167756) = (41D8F565h · 2^{−32}, 41D8F567h · 2^{−32}), i.e. 32 bits.

H = −Σ_i p_i log_2(p_i) = 3.12 bits/character. Shannon: to encode BILL GATES we need at least 31.2 bits.
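The low/high table above can be reproduced with a short sketch; the dictionary of ranges simply encodes the probability table, and the printing format is arbitrary:

```python
# Cumulative ranges [low_range, high_range) of each character, as in the table
ranges = {' ': (0.0, 0.1), 'A': (0.1, 0.2), 'B': (0.2, 0.3), 'E': (0.3, 0.4),
          'G': (0.4, 0.5), 'I': (0.5, 0.6), 'L': (0.6, 0.8), 'S': (0.8, 0.9),
          'T': (0.9, 1.0)}

def encode(message: str) -> float:
    low, high = 0.0, 1.0
    for ch in message:
        rng = high - low                       # current interval width
        low, high = low + rng * ranges[ch][0], low + rng * ranges[ch][1]
        print(f'{ch!r:9s} low = {low:.10f}  high = {high:.10f}')
    return low

encode('BILL GATES')   # final interval [0.2572167752, 0.2572167756), up to
                       # floating-point rounding
```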

Encoding principle

    Set low to 0.0
    Set high to 1.0
    While there are still input symbols do
        get an input symbol
        code_range = high - low
        high = low + code_range * high_range(symbol)
        low  = low + code_range * low_range(symbol)
    End of While
    output low

Arithmetic coding: decoding principle

    get encoded number
    Do
        find the symbol whose range straddles the encoded number
        output the symbol
        range = symbol_high_value - symbol_low_value
        subtract symbol_low_value from the encoded number
        divide the encoded number by range
    until no more symbols

For the BILL GATES example, the decoding steps are (columns: encoded number, output symbol, low, high, range):

    0.2572167752   B       0.2   0.3   0.1
    0.572167752    I       0.5   0.6   0.1
    0.72167752     L       0.6   0.8   0.2
    0.6083876      L       0.6   0.8   0.2
    0.041938       SPACE   0.0   0.1   0.1
    0.41938        G       0.4   0.5   0.1
    0.1938         A       0.1   0.2   0.1
    0.938          T       0.9   1.0   0.1
    0.38           E       0.3   0.4   0.1
    0.8            S       0.8   0.9   0.1
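A matching sketch of the decoding loop, using exact rational arithmetic to avoid rounding issues; as in the example, the message length is assumed to be known by the decoder:

```python
from fractions import Fraction as Fr

# Cumulative ranges of each character, expressed in tenths (same model as above)
ranges = {' ': (0, 1), 'A': (1, 2), 'B': (2, 3), 'E': (3, 4), 'G': (4, 5),
          'I': (5, 6), 'L': (6, 8), 'S': (8, 9), 'T': (9, 10)}

def decode(code: Fr, n_symbols: int) -> str:
    out = []
    for _ in range(n_symbols):
        for ch, (lo, hi) in ranges.items():
            lo, hi = Fr(lo, 10), Fr(hi, 10)
            if lo <= code < hi:                  # range straddles the number
                out.append(ch)
                code = (code - lo) / (hi - lo)   # remove the symbol, rescale
                break
    return ''.join(out)

print(decode(Fr('0.2572167752'), 10))            # BILL GATES
```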

References on practical implementations of arithmetic coding

Moffat, Neal and Witten (1998). Source code: ftp://munnari.oz.au/pub/arith_coder/
Witten, Neal and Cleary (1987). Source code: ftp://ftp.cpsc.ucalgary.ca/pub/projects/ar.cod/cacm-87.shar

SGN-2306 Signal Compression. 5. Adaptive Models for Arithmetic Coding: 5.1 Adaptive arithmetic coding, 5.2 Models for data compression, 5.3 Prediction by partial matching

5.1 Adaptive arithmetic coding: an example

We want to encode the data string bccb from the ternary alphabet {a, b, c} using arithmetic coding, and the decoder knows that we want to send 4 symbols. We will use an adaptive zero-order model (which only counts the frequencies of occurrence of the symbols, with no conditioning: the so-called zero-memory model). To evaluate the probabilities p(a), p(b) and p(c) we denote by N(a) the number of occurrences of the character a in the already received substring, by N(b) the number of occurrences of b, and similarly N(c), and then assign the probabilities

    p(a) = (N(a) + 1) / (N(a) + N(b) + N(c) + 3);  p(b) = (N(b) + 1) / (N(a) + N(b) + N(c) + 3);  p(c) = (N(c) + 1) / (N(a) + N(b) + N(c) + 3).   (2)

When encoding the first b, the history available to the decoder is empty, and consequently there is no knowledge about which of the three symbols is more frequent; therefore we assume p(a) = p(b) = p(c) = 1/3, which is consistent with formula (2), since N(a) = N(b) = N(c) = 0.

The arithmetic coder will finally provide us a small subinterval of (0, 1), and virtually any number in that interval will be a codeword for the encoded string. During encoding, the original interval (0, 1) is successively reduced, and we denote by low and high the current limits of the interval (initially low = 0, high = 1).

At the first encoded symbol, b, the interval (0, 1) is split according to the probability distribution into three equal parts (see Fig. 2.8); the interval needed to specify b is (0.3333, 0.6667), therefore low becomes 0.3333 and high becomes 0.6667.

When encoding c (the second character in bccb), we have N(a) = 0, N(b) = 1, N(c) = 0 and therefore p(a) = p(c) = 1/4 and p(b) = 2/4. To specify c, the arithmetic coder will change low to 0.5834 and high will be set to 0.6667.

When encoding c (the third character in bccb), we have N(a) = 0, N(b) = 1, N(c) = 1 and therefore p(a) = 1/5 and p(b) = p(c) = 2/5. To specify c, the arithmetic coder will change low to 0.6334 and high will be set to 0.6667.

When encoding b (the fourth character in bccb), we have N(a) = 0, N(b) = 1, N(c) = 2 and therefore p(a) = 1/6, p(b) = 2/6 and p(c) = 3/6. To specify b, the arithmetic coder will change low to 0.639 and high will be set to 0.65.

Now the encoder has found the interval representing the string bccb, and may choose whatever number in the interval (0.639, 0.65) to send to the decoder. If we assume the encoding is done in decimal, 0.64 will be a suitable value to choose, and the encoder has to send the message 64.
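The interval updates can be reproduced with a short sketch. Note that the sketch carries full precision, so the third and fourth decimals can differ by one unit from the values above, which were obtained by rounding low and high to four decimals at every step:

```python
def adaptive_intervals(message, alphabet=('a', 'b', 'c')):
    counts = {s: 0 for s in alphabet}
    low, high = 0.0, 1.0
    for ch in message:
        total = sum(counts.values()) + len(alphabet)       # Laplace smoothing
        probs = [(counts[s] + 1) / total for s in alphabet]
        width = high - low
        # cumulative share of the symbols preceding ch, and of ch itself
        i = alphabet.index(ch)
        start = sum(probs[:i])
        low, high = low + width * start, low + width * (start + probs[i])
        counts[ch] += 1                                     # update the model
        print(f'{ch}: low = {low:.4f}, high = {high:.4f}')

adaptive_intervals('bccb')
# b: low = 0.3333, high = 0.6667
# c: low = 0.5833, high = 0.6667
# c: low = 0.6333, high = 0.6667
# b: low = 0.6389, high = 0.6500
```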

Which hexadecimal number would be suitable to send, if we assume hexadecimal coding, knowing the hexadecimal representations of the decimal numbers involved? (What about the binary number?)

    Decimal   Hexadecimal
    0.6667    .AAACD9E8
    0.5834    .9559B3D0
    0.6334    .A226809D
    0.6390    .A3958106
    0.6501    .A66CF41F
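The hexadecimal expansions can be checked with a small helper that repeatedly multiplies the fractional part by 16; the function name and the number of digits are arbitrary choices:

```python
def frac_to_hex(x: float, digits: int = 8) -> str:
    """Hexadecimal expansion of a number 0 <= x < 1, digit by digit."""
    out = []
    for _ in range(digits):
        x *= 16
        d = int(x)                     # next hexadecimal digit
        out.append('0123456789ABCDEF'[d])
        x -= d
    return '.' + ''.join(out)

print(frac_to_hex(0.6667))   # .AAACD9E8
print(frac_to_hex(0.639))    # .A3958106
```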
