4. Quantization and Data Compression. ECE 302 Spring 2012 Purdue University, School of ECE Prof. Ilya Pollak


1 4. Quantization and Data Compression ECE 302 Spring 2012 Purdue University, School of ECE Prof. Ilya Pollak

2 What is data compression? Reducing the file size without compromising the quality of the data stored in the file too much (lossy compression) or at all (lossless compression). With compression, you can fit higher-quality data (e.g., higher-resolution pictures or video) into a file of the same size as required for lower-quality uncompressed data.

3 Why data compression? Our appetite for data (high-resolution pictures, HD video, audio, documents, etc) seems to always significantly outpace hardware capabilities for storage and transmission.

4 Data compression: Step 1 If the data is continuous-time (e.g., audio) or continuous-space (e.g., picture), it first needs to be discretized.

5 Data compression: Step 1 If the data is continuous-time (e.g., audio) or continuous-space (e.g., picture), it first needs to be discretized. Sampling is typically done nowadays during signal acquisition (e.g., digital camera for pictures or audio recording equipment for music and speech).

6 Data compression: Step 1 If the data is continuous-time (e.g., audio) or continuous-space (e.g., picture), it first needs to be discretized. Sampling is typically done nowadays during signal acquisition (e.g., digital camera for pictures or audio recording equipment for music and speech). We will not study sampling. It is studied in ECE 301, ECE 438, and ECE 440. We will consider compressing discrete-time or discrete-space data.

7 Example: compression of grayscale images An eight-bit grayscale image is a rectangular array of integers between 0 (black) and 255 (white). Each site in the array is called a pixel.

8 Example: compression of grayscale images An eight-bit grayscale image is a rectangular array of integers between 0 (black) and 255 (white). Each site in the array is called a pixel. It takes one byte (eight bits) to store one pixel value, since it can be any number between 0 and 255.

9 Example: compression of grayscale images An eight-bit grayscale image is a rectangular array of integers between 0 (black) and 255 (white). Each site in the array is called a pixel. It takes one byte (eight bits) to store one pixel value, since it can be any number between 0 and 255. It would take 25 bytes to store a 5x5 image.

10 Example: compression of grayscale images An eight-bit grayscale image is a rectangular array of integers between 0 (black) and 255 (white). Each site in the array is called a pixel. It takes one byte (eight bits) to store one pixel value, since it can be any number between 0 and 255. It would take 25 bytes to store a 5x5 image. Can we do better?

11 Example: compression of grayscale images Can we do better than 25 bytes?

12 Two key ideas Idea #1: Transform the data to create lots of zeros.

13 Two key ideas Idea #1: Transform the data to create lots of zeros. For example, we could rasterize the image, compute the differences, and store the top left value along with the 24 differences [in reality, other transforms are used, but they work in a similar fashion].

14 Two key ideas Idea #1: Transform the data to create lots of zeros. For example, we could rasterize the image, compute the differences, and store the top left value along with the 24 differences [in reality, other transforms are used, but they work in a similar fashion]: 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -155, 0, 0, 155, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

15 Two key ideas Idea #1: Transform the data to create lots of zeros. For example, we could rasterize the image, compute the differences, and store the top left value along with the 24 differences [in reality, other transforms are used, but they work in a similar fashion]: 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -155, 0, 0, 155, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 This seems to make things worse: now the numbers can range from -255 to 255, and therefore we need two bytes per pixel!

16 Two key ideas Idea #1: Transform the data to create lots of zeros. For example, we could rasterize the image, compute the differences, and store the top left value along with the 24 differences [in reality, other transforms are used, but they work in a similar fashion]: 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -155, 0, 0, 155, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 This seems to make things worse: now the numbers can range from -255 to 255, and therefore we need two bytes per pixel! Idea #2: when encoding the data, spend fewer bits on frequently occurring numbers and more bits on rare numbers.
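To make Idea #1 concrete, here is a small Python sketch (not part of the original notes); the 5x5 image contents are made up for illustration, but they produce a difference sequence with exactly the character described above.

import numpy as np

# Hypothetical 5x5 image: a 255-valued background with a short run of 100-valued pixels.
img = np.full((5, 5), 255, dtype=int)
img[2, 1:4] = 100

x = img.flatten()                             # rasterize (row by row)
d = np.concatenate(([x[0]], np.diff(x)))      # top left value, then the 24 differences
print(d)                                      # 255, then mostly zeros, with one -155 and one 155
print("zeros:", int(np.sum(d == 0)), "out of", d.size)   # 22 out of 25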

17 Entropy coding Suppose we are encoding realizations of a discrete random variable X such that value of X: 0, 255, 155, -155; probability: 22/25, 1/25, 1/25, 1/25

18 Entropy coding Suppose we are encoding realizations of a discrete random variable X such that value of X: 0, 255, 155, -155; probability: 22/25, 1/25, 1/25, 1/25 Consider the following fixed-length encoder: value of X: 0, 255, 155, -155; codeword: 00, 01, 10, 11

19 Entropy coding Suppose we are encoding realizations of a discrete random variable X such that value of X: 0, 255, 155, -155; probability: 22/25, 1/25, 1/25, 1/25 Consider the following fixed-length encoder: value of X: 0, 255, 155, -155; codeword: 00, 01, 10, 11 For a file with 25 numbers, E[file size] = 25*2*(22/25+1/25+1/25+1/25) = 50 bits

20 Entropy coding Suppose we are encoding realizations of a discrete random variable X such that value of X: 0, 255, 155, -155; probability: 22/25, 1/25, 1/25, 1/25 Consider the following fixed-length encoder: value of X: 0, 255, 155, -155; codeword: 00, 01, 10, 11 For a file with 25 numbers, E[file size] = 25*2*(22/25+1/25+1/25+1/25) = 50 bits Now consider the following encoder: value of X: 0, 255, 155, -155; codeword: 0, 10, 110, 111

21 Entropy coding Suppose we are encoding realizations of a discrete random variable X such that value of X: 0, 255, 155, -155; probability: 22/25, 1/25, 1/25, 1/25 Consider the following fixed-length encoder: value of X: 0, 255, 155, -155; codeword: 00, 01, 10, 11 For a file with 25 numbers, E[file size] = 25*2*(22/25+1/25+1/25+1/25) = 50 bits Now consider the following encoder: value of X: 0, 255, 155, -155; codeword: 0, 10, 110, 111 For a file with 25 numbers, E[file size] = 25*(1*22/25 + 2*1/25 + 3*1/25 + 3*1/25) = 30 bits!
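A quick numerical check of these two expected file sizes (a sketch; only the symbol probabilities and the codeword lengths matter here):

probs  = [22/25, 1/25, 1/25, 1/25]   # probabilities of the four symbol values
fixed  = [2, 2, 2, 2]                # lengths of the fixed-length codewords
varlen = [1, 2, 3, 3]                # lengths of the variable-length codewords
print(25 * sum(p * l for p, l in zip(probs, fixed)))    # 50.0 bits
print(25 * sum(p * l for p, l in zip(probs, varlen)))   # 30.0 bits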

22 Entropy coding A similar encoding scheme can be devised for a random variable of pixel differences which takes values between -255 and 255, to result in a smaller average file size than two bytes per pixel.

23 Entropy coding A similar encoding scheme can be devised for a random variable of pixel differences which takes values between -255 and 255, to result in a smaller average file size than two bytes per pixel. Another commonly used idea: run-length coding. I.e., instead of encoding each 0 individually, encode the length of each string of zeros.
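A minimal run-length coding sketch along these lines (illustrative only; real coders combine the run lengths with entropy coding):

def rle_zeros(seq):
    """Encode a sequence as (length of zero run, next nonzero value) pairs."""
    out, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            out.append((run, v))
            run = 0
    if run:
        out.append((run, None))   # trailing run of zeros with no terminating value
    return out

print(rle_zeros([255, 0, 0, 0, -155, 0, 0, 155, 0, 0]))
# [(0, 255), (3, -155), (2, 155), (2, None)]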

24 Back to the four-symbol example value of X: 0, 255, 155, -155; probability: 22/25, 1/25, 1/25, 1/25; codeword: 0, 10, 110, 111 Can we do even better than 30 bits?

25 Back to the four-symbol example value of X: 0, 255, 155, -155; probability: 22/25, 1/25, 1/25, 1/25; codeword: 0, 10, 110, 111 Can we do even better than 30 bits? What about this alternative encoder? value of X: 0, 255, 155, -155; probability: 22/25, 1/25, 1/25, 1/25; codeword: 0, 01, 1, 11

26 Back to the four-symbol example value of X: 0, 255, 155, -155; probability: 22/25, 1/25, 1/25, 1/25; codeword: 0, 10, 110, 111 Can we do even better than 30 bits? What about this alternative encoder? value of X: 0, 255, 155, -155; probability: 22/25, 1/25, 1/25, 1/25; codeword: 0, 01, 1, 11 E[file size] = 25*(1*22/25 + 2*1/25 + 1*1/25 + 2*1/25) = 27 bits

27 Back to the four-symbol example value of X: 0, 255, 155, -155; probability: 22/25, 1/25, 1/25, 1/25; codeword: 0, 10, 110, 111 Can we do even better than 30 bits? What about this alternative encoder? value of X: 0, 255, 155, -155; probability: 22/25, 1/25, 1/25, 1/25; codeword: 0, 01, 1, 11 E[file size] = 25*(1*22/25 + 2*1/25 + 1*1/25 + 2*1/25) = 27 bits Is there anything wrong with this encoder?

28 The second encoding is not uniquely decodable! value of X: 0, 255, 155, -155; probability: 22/25, 1/25, 1/25, 1/25; codeword: 0, 01, 1, 11 Encoded string 01 could either be 255 or 0 followed by 155

29 The second encoding is not uniquely decodable! value of X: 0, 255, 155, -155; probability: 22/25, 1/25, 1/25, 1/25; codeword: 0, 01, 1, 11 Encoded string 01 could either be 255 or 0 followed by 155 Therefore, this code is unusable! It turns out that the first code is uniquely decodable.

30 What kinds of distributions are amenable to entropy coding? [Two histograms over the symbols a, b, c, d: one highly concentrated on a single symbol, one uniform.] Concentrated: can do a lot better than two bits per symbol. Uniform: cannot do better than two bits per symbol.

31 What kinds of distributions are amenable to entropy coding? [Two histograms over the symbols a, b, c, d: one highly concentrated on a single symbol, one uniform.] Concentrated: can do a lot better than two bits per symbol. Uniform: cannot do better than two bits per symbol. Conclusion: the transform procedure should be such that the numbers fed into the entropy coder have a highly concentrated histogram (a few very likely values, most values unlikely).

32 What kinds of distributions are amenable to entropy coding? [Two histograms over the symbols a, b, c, d: one highly concentrated on a single symbol, one uniform.] Concentrated: can do a lot better than two bits per symbol. Uniform: cannot do better than two bits per symbol. Conclusion: the transform procedure should be such that the numbers fed into the entropy coder have a highly concentrated histogram (a few very likely values, most values unlikely). Also, if we are encoding each number individually, they should be independent or approximately independent.
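To see the point numerically, here is a sketch that applies a code with lengths 1, 2, 3, 3 to a concentrated distribution and to a uniform one over the same four symbols (the numbers are illustrative):

lengths      = [1, 2, 3, 3]
concentrated = [22/25, 1/25, 1/25, 1/25]   # histogram peaked on one symbol
uniform      = [1/4, 1/4, 1/4, 1/4]        # flat histogram
for name, p in [("concentrated", concentrated), ("uniform", uniform)]:
    print(name, sum(pi * li for pi, li in zip(p, lengths)), "bits/symbol")
# concentrated: 1.2 bits/symbol (much better than 2); uniform: 2.25 bits/symbol (worse than 2)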

33 What if we are willing to lose some information?

34 What if we are willing to lose some information? Quantization

35 Some eight-bit images The five stripes contain random values from (left to right): {252,253,254,255}, {88,89,90,91}, {125,126,127,128}, {61,62,63,64}, {0,1,2,3}. The five stripes contain random integers from (left to right): {240,...,255}, {176,...,191}, {113,...,128}, {49,...,64}, {0,...,15}.

36 Converting continuous-valued to discrete-valued signals Many real-world signals are continuous-valued. audio signal a(t): both the time argument t and the intensity value a(t) are continuous; image u(x,y): both the spatial location (x,y) and the image intensity value u(x,y) are continuous; video v(x,y,t): x,y,t, and v(x,y,t) are all continuous.

37 Converting continuous-valued to discrete-valued signals Many real-world signals are continuous-valued. audio signal a(t): both the time argument t and the intensity value a(t) are continuous; image u(x,y): both the spatial location (x,y) and the image intensity value u(x,y) are continuous; video v(x,y,t): x,y,t, and v(x,y,t) are all continuous. Discretizing the argument values t, x, and y (or sampling) is studied in ECE 301, 438, and 440.

38 Converting continuous-valued to discrete-valued signals Many real-world signals are continuous-valued. audio signal a(t): both the time argument t and the intensity value a(t) are continuous; image u(x,y): both the spatial location (x,y) and the image intensity value u(x,y) are continuous; video v(x,y,t): x,y,t, and v(x,y,t) are all continuous. Discretizing the argument values t, x, and y (or sampling) is studied in ECE 301, 438, and 440. However, in addition to discretizing the argument values, the signal values must be discretized as well in order to be digitally stored.

39 Quantization Digitizing a continuous-valued signal into a discrete and finite set of values. Converting a discrete-valued signal into another discrete-valued signal, with fewer possible discrete values.

40 How to compare two quantizers? Suppose data X(1),...,X(N) is quantized using two quantizers, to result in Y_1(1),...,Y_1(N) and Y_2(1),...,Y_2(N). Suppose both Y_1(1),...,Y_1(N) and Y_2(1),...,Y_2(N) can be encoded with the same number of bits. Which quantization is better? The one that results in less distortion. But how to measure distortion? In general, measuring and modeling perceptual image similarity and similarity of audio are open research problems. Some useful things are known about human audio and visual systems that inform the design of quantizers.

41 Sensitivity of the Human Visual System to Contrast Changes, as a Function of Frequency

42 Sensitivity of the Human Visual System to Contrast Changes, as a Function of Frequency [From Mannos-Sakrison IEEE-IT 1974]

43 Sensitivity of the Human Visual System to Contrast Changes, as a Function of Frequency [From Mannos-Sakrison IEEE-IT 1974] High and low frequencies may be quantized more coarsely

44 But there are many other intricacies in the way the human visual system computes similarity

45 Are these two images similar?

46 What about these two?

47 What about these two? Performance assessment of compression algorithms and quantizers is complicated, because measuring image fidelity is complicated. Often, very simple distortion measures are used such as mean-square error.

48 Scalar vs Vector Quantization [Figure: two pixel values r and s, each ranging over 0 to 255.] Scalar quantization: quantize each value separately (simple thresholding). Vector quantization: quantize several values jointly (more complex).

49 What kinds of joint distributions are amenable to scalar quantization? [Figure: the (r,s) plane, with r and s ranging over 0 to 255.] If (r,s) are jointly uniform over the green square (or, more generally, independent), knowing r does not tell us anything about s. Best thing to do: make quantization decisions independently.

50 What kinds of joint distributions are amenable to scalar quantization? If (r,s) are jointly uniform over the green square (or, more generally, independent), knowing r does not tell us anything about s. Best thing to do: make quantization decisions independently. If (r,s) are jointly uniform over the yellow region, knowing r tells us a lot about s. Best thing to do: make quantization decisions jointly.

51 What kinds of joint distributions are amenable to scalar quantization? If (r,s) are jointly uniform over the green square (or, more generally, independent), knowing r does not tell us anything about s. Best thing to do: make quantization decisions independently. If (r,s) are jointly uniform over the yellow region, knowing r tells us a lot about s. Best thing to do: make quantization decisions jointly. Conclusion: if the data is transformed before quantization, the transform procedure should be such that the coefficients fed into the quantizer are independent (or at least uncorrelated, or almost uncorrelated), in order to enable the simpler scalar quantization.

52 More on Scalar Quantization Does it make sense to do scalar quantization with different quantization bins for different variables?

53 More on Scalar Quantization Does it make sense to do scalar quantization with different quantization bins for different variables? No reason to do this if we are quantizing grayscale pixel values.

54 More on Scalar Quantization Does it make sense to do scalar quantization with different quantization bins for different variables? No reason to do this if we are quantizing grayscale pixel values. However, if we can decompose the image into components that are less perceptually important and more perceptually important, we should use larger quantization bins for the less important components.

55 Structure of a Typical Lossy Compression Algorithm for Audio, Images, or Video data → transform → quantization → entropy coding → compressed bitstream

56 Structure of a Typical Lossy Compression Algorithm for Audio, Images, or Video data → transform → quantization → entropy coding → compressed bitstream Let's more closely consider quantization and entropy coding. (Various transforms are considered in ECE 301 and ECE 438.)

57 Quantization: problem statement Source (e.g., image, video, speech signal) → Sequence of discrete or continuous random variables X(1),...,X(N) (e.g., transformed image pixel values).

58 Quantization: problem statement Source (e.g., image, video, speech signal) → Sequence of discrete or continuous random variables X(1),...,X(N) (e.g., transformed image pixel values) → Quantizer → Sequence of discrete random variables Y(1),...,Y(N), each distributed over a finite set of values (quantization levels)

59 Quantization: problem statement Source (e.g., image, video, speech signal) → Sequence of discrete or continuous random variables X(1),...,X(N) (e.g., transformed image pixel values) → Quantizer → Sequence of discrete random variables Y(1),...,Y(N), each distributed over a finite set of values (quantization levels) Errors: D(1),...,D(N) where D(n) = X(n) - Y(n)

60 MSE is a widely used measure of distortion of quantizers Suppose data X(1),...,X(N) are quantized, to result in Y(1),...,Y(N). E[ Σ_{n=1}^N (X(n) - Y(n))^2 ] = E[ Σ_{n=1}^N (D(n))^2 ] If D(1),..., D(N) are identically distributed, this is the same as N E[(D(n))^2], for any n.
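A sketch of how this distortion would be estimated empirically from data (the quantizers and sample size here are illustrative):

import numpy as np

def empirical_mse(x, y):
    """Empirical mean-square error between source samples x and quantized samples y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((x - y) ** 2)

x  = np.random.default_rng(0).uniform(0, 255, size=10000)
y1 = 32 * np.round(x / 32)   # coarse uniform quantizer
y2 = 8 * np.round(x / 8)     # finer uniform quantizer
print(empirical_mse(x, y1), empirical_mse(x, y2))   # the finer quantizer gives the smaller MSE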

61 Scalar uniform quantization Use quantization intervals (bins) of equal size [x_1,x_2), [x_2,x_3), ..., [x_L,x_{L+1}]. Quantization levels q_1, q_2, ..., q_L. Each quantization level is in the middle of the corresponding quantization bin: q_k = (x_k + x_{k+1})/2.

62 Scalar uniform quantization Use quantization intervals (bins) of equal size [x_1,x_2), [x_2,x_3), ..., [x_L,x_{L+1}]. Quantization levels q_1, q_2, ..., q_L. Each quantization level is in the middle of the corresponding quantization bin: q_k = (x_k + x_{k+1})/2. If quantizer input X is in [x_k, x_{k+1}), the corresponding quantized value is Y = q_k.
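A sketch of such a uniform quantizer over an interval [a, b] (the parameter names and default values are illustrative):

import numpy as np

def uniform_quantize(x, a=0.0, b=255.0, L=8):
    """Map each input to the midpoint q_k of its bin; all bins have equal width (b - a)/L."""
    width = (b - a) / L
    k = np.clip(np.floor((np.asarray(x, dtype=float) - a) / width), 0, L - 1)  # bin index 0..L-1
    return a + (k + 0.5) * width                                               # midpoint level

print(uniform_quantize([3, 100, 250]))   # [ 15.9375 111.5625 239.0625]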

63 Uniform vs non-uniform quantization Uniform quantization is not a good strategy for distributions which significantly differ from uniform.

64 Uniform vs non-uniform quantization Uniform quantization is not a good strategy for distributions which significantly differ from uniform. If the distribution is non-uniform, it is better to spend more quantization levels on more probable parts of the distribution and fewer quantization levels on less probable parts.

65 Scalar Lloyd-Max quantizer X = source random variable with a known distribution. We assume it to be a continuous r.v. with PDF f_X(x) > 0.

66 Scalar Lloyd-Max quantizer X = source random variable with a known distribution. We assume it to be a continuous r.v. with PDF f_X(x) > 0. The results can be extended to discrete or mixed random variables, and to continuous random variables whose density can be zero for some x.

67 Scalar Lloyd-Max quantizer X = source random variable with a known distribution. We assume it to be a continuous r.v. with PDF f_X(x) > 0. The results can be extended to discrete or mixed random variables, and to continuous random variables whose density can be zero for some x. Quantization intervals (x_1, x_2), [x_2, x_3), ..., [x_L, x_{L+1}) and levels q_1, ..., q_L such that x_1 = -∞, x_{L+1} = +∞, and -∞ < q_1 < x_2 ≤ q_2 < x_3 ≤ q_3 < ... < x_L ≤ q_L < +∞. I.e., q_k ∈ k-th quantization interval.

68 Scalar Lloyd-Max quantizer X = source random variable with a known distribution. We assume it to be a continuous r.v. with PDF f_X(x) > 0. The results can be extended to discrete or mixed random variables, and to continuous random variables whose density can be zero for some x. Quantization intervals (x_1, x_2), [x_2, x_3), ..., [x_L, x_{L+1}) and levels q_1, ..., q_L such that x_1 = -∞, x_{L+1} = +∞, and -∞ < q_1 < x_2 ≤ q_2 < x_3 ≤ q_3 < ... < x_L ≤ q_L < +∞. I.e., q_k ∈ k-th quantization interval. Y = the result of quantizing X, a discrete random variable with L possible outcomes, q_1, q_2, ..., q_L, defined by Y = Y(X) = q_1 if X < x_2; q_2 if x_2 ≤ X < x_3; ...; q_{L-1} if x_{L-1} ≤ X < x_L; q_L if X ≥ x_L.

69 Scalar Lloyd-Max quantizer: goal Given the pdf f_X(x) of the source r.v. X and the desired number L of quantization levels, find the quantization interval endpoints x_2, ..., x_L and quantization levels q_1, ..., q_L to minimize the mean-square error, E[(Y - X)^2].

70 Scalar Lloyd-Max quantizer: goal Given the pdf f_X(x) of the source r.v. X and the desired number L of quantization levels, find the quantization interval endpoints x_2, ..., x_L and quantization levels q_1, ..., q_L to minimize the mean-square error, E[(Y - X)^2]. To do this, express the mean-square error in terms of the quantization interval endpoints and quantization levels, and find the minimum (or minima) through differentiation.

71 Scalar Lloyd-Max quantizer: derivation E[(Y - X)^2] = ∫_{-∞}^{+∞} (y(x) - x)^2 f_X(x) dx

72 Scalar Lloyd-Max quantizer: derivation E[(Y - X)^2] = ∫_{-∞}^{+∞} (y(x) - x)^2 f_X(x) dx = Σ_{k=1}^{L} ∫_{x_k}^{x_{k+1}} (y(x) - x)^2 f_X(x) dx

73 Scalar Lloyd-Max quantizer: derivation E[(Y - X)^2] = ∫_{-∞}^{+∞} (y(x) - x)^2 f_X(x) dx = Σ_{k=1}^{L} ∫_{x_k}^{x_{k+1}} (y(x) - x)^2 f_X(x) dx = Σ_{k=1}^{L} ∫_{x_k}^{x_{k+1}} (q_k - x)^2 f_X(x) dx

74 Scalar Lloyd-Max quantizer: derivation Minimize w.r.t. q_k: ∂/∂q_k E[(Y - X)^2] = ∫_{x_k}^{x_{k+1}} 2(q_k - x) f_X(x) dx = 0

75 Scalar Lloyd-Max quantizer: derivation Minimize w.r.t. q_k: ∂/∂q_k E[(Y - X)^2] = ∫_{x_k}^{x_{k+1}} 2(q_k - x) f_X(x) dx = 0, i.e., q_k ∫_{x_k}^{x_{k+1}} f_X(x) dx = ∫_{x_k}^{x_{k+1}} x f_X(x) dx

76 Scalar Lloyd-Max quantizer: derivation Therefore q_k = ∫_{x_k}^{x_{k+1}} x f_X(x) dx / ∫_{x_k}^{x_{k+1}} f_X(x) dx

77 Scalar Lloyd-Max quantizer: derivation Therefore q_k = ∫_{x_k}^{x_{k+1}} x f_X(x) dx / ∫_{x_k}^{x_{k+1}} f_X(x) dx = E[X | X ∈ k-th quantization interval]

78 Scalar Lloyd-Max quantizer: derivation This is a minimum, since ∂^2/∂q_k^2 E[(Y - X)^2] = ∫_{x_k}^{x_{k+1}} 2 f_X(x) dx > 0.

79 Scalar Lloyd-Max quantizer: derivation Minimize w.r.t. x_k, for k = 2, ..., L

80 Scalar Lloyd-Max quantizer: derivation Minimize w.r.t. x_k, for k = 2, ..., L: ∂/∂x_k E[(Y - X)^2] = ∂/∂x_k [ ∫_{x_{k-1}}^{x_k} (q_{k-1} - x)^2 f_X(x) dx + ∫_{x_k}^{x_{k+1}} (q_k - x)^2 f_X(x) dx ]

81 Scalar Lloyd-Max quantizer: derivation Minimize w.r.t. x_k, for k = 2, ..., L: ∂/∂x_k E[(Y - X)^2] = (q_{k-1} - x_k)^2 f_X(x_k) - (q_k - x_k)^2 f_X(x_k)

82 Scalar Lloyd-Max quantizer: derivation Setting (q_{k-1} - x_k)^2 f_X(x_k) - (q_k - x_k)^2 f_X(x_k) = (q_{k-1} - q_k)(q_{k-1} + q_k - 2 x_k) f_X(x_k) = 0. By assumption, f_X(x) > 0 and q_{k-1} ≠ q_k.

83 Scalar Lloyd-Max quantizer: derivation By assumption, f_X(x) > 0 and q_{k-1} ≠ q_k. Therefore, x_k = (q_{k-1} + q_k)/2, for k = 2, ..., L.

84 Scalar Lloyd-Max quantizer: derivation Therefore, x_k = (q_{k-1} + q_k)/2, for k = 2, ..., L. This is a minimum, since ∂^2/∂x_k^2 E[(Y - X)^2] = 2(q_k - q_{k-1}) f_X(x_k) > 0.

85 Nonlinear system to be solved q_k = ∫_{x_k}^{x_{k+1}} x f_X(x) dx / ∫_{x_k}^{x_{k+1}} f_X(x) dx = E[X | X ∈ k-th quantization interval], for k = 1, ..., L; x_k = (q_{k-1} + q_k)/2, for k = 2, ..., L

86 Nonlinear system to be solved q_k = ∫_{x_k}^{x_{k+1}} x f_X(x) dx / ∫_{x_k}^{x_{k+1}} f_X(x) dx = E[X | X ∈ k-th quantization interval], for k = 1, ..., L; x_k = (q_{k-1} + q_k)/2, for k = 2, ..., L Closed-form solution can be found only for very simple PDFs. E.g., if X is uniform, then Lloyd-Max quantizer = uniform quantizer.

87 Nonlinear system to be solved q_k = ∫_{x_k}^{x_{k+1}} x f_X(x) dx / ∫_{x_k}^{x_{k+1}} f_X(x) dx = E[X | X ∈ k-th quantization interval], for k = 1, ..., L; x_k = (q_{k-1} + q_k)/2, for k = 2, ..., L Closed-form solution can be found only for very simple PDFs. E.g., if X is uniform, then Lloyd-Max quantizer = uniform quantizer. In general, an approximate solution can be found numerically, via an iterative algorithm (e.g., the lloyds command in Matlab).

88 Nonlinear system to be solved q_k = ∫_{x_k}^{x_{k+1}} x f_X(x) dx / ∫_{x_k}^{x_{k+1}} f_X(x) dx = E[X | X ∈ k-th quantization interval], for k = 1, ..., L; x_k = (q_{k-1} + q_k)/2, for k = 2, ..., L Closed-form solution can be found only for very simple PDFs. E.g., if X is uniform, then Lloyd-Max quantizer = uniform quantizer. In general, an approximate solution can be found numerically, via an iterative algorithm (e.g., the lloyds command in Matlab). For real data, typically the PDF is not given and therefore needs to be estimated using, for example, histograms constructed from the observed data.
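A sketch of such an iteration on data samples, alternating between the two conditions above (this is the same idea as Matlab's lloyds routine, but the implementation details here are illustrative, not a reference implementation):

import numpy as np

def lloyd_max(samples, L=4, iters=200):
    """Alternate q_k = mean of the k-th interval and x_k = midpoint between adjacent levels."""
    x = np.asarray(samples, dtype=float)
    q = np.quantile(x, (np.arange(L) + 0.5) / L)      # initial levels from data quantiles
    for _ in range(iters):
        edges = (q[:-1] + q[1:]) / 2                  # x_k = (q_{k-1} + q_k)/2
        idx = np.searchsorted(edges, x)               # which interval each sample falls in
        q = np.array([x[idx == k].mean() if np.any(idx == k) else q[k]
                      for k in range(L)])             # q_k = E[X | k-th interval], estimated
    return (q[:-1] + q[1:]) / 2, q

edges, levels = lloyd_max(np.random.default_rng(0).normal(size=100000), L=4)
print(levels)   # close to the known optimum (about -1.51, -0.45, 0.45, 1.51) for a standard normal source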

89 Vector Lloyd-Max quantizer? X = (X(1), ..., X(N)) = source random vector with a given joint distribution. L = a desired number of quantization points.

90 Vector Lloyd-Max quantizer? X = (X(1), ..., X(N)) = source random vector with a given joint distribution. L = a desired number of quantization points. We would like to find: (1) L events A_1, ..., A_L that partition the joint sample space of X(1), ..., X(N), and (2) L quantization points q_1 ∈ A_1, ..., q_L ∈ A_L

91 Vector Lloyd-Max quantizer? X = (X(1), ..., X(N)) = source random vector with a given joint distribution. L = a desired number of quantization points. We would like to find: (1) L events A_1, ..., A_L that partition the joint sample space of X(1), ..., X(N), and (2) L quantization points q_1 ∈ A_1, ..., q_L ∈ A_L, such that the quantized random vector, defined by Y = q_k if X ∈ A_k, for k = 1, ..., L, minimizes the mean-square error, E[ ||Y - X||^2 ] = E[ Σ_{n=1}^N (Y(n) - X(n))^2 ]

92 Vector Lloyd-Max quantizer? X = (X(1), ..., X(N)) = source random vector with a given joint distribution. L = a desired number of quantization points. We would like to find: (1) L events A_1, ..., A_L that partition the joint sample space of X(1), ..., X(N), and (2) L quantization points q_1 ∈ A_1, ..., q_L ∈ A_L, such that the quantized random vector, defined by Y = q_k if X ∈ A_k, for k = 1, ..., L, minimizes the mean-square error, E[ ||Y - X||^2 ] = E[ Σ_{n=1}^N (Y(n) - X(n))^2 ] Difficulty: cannot differentiate with respect to a set A_k, and so unless the set of all allowed partitions is somehow restricted, this cannot be solved.

93 Hopefully, the prior discussion gives you some idea about the various issues involved in quantization. And now, on to entropy coding: data → transform → quantization → entropy coding → compressed bitstream

94 Problem statement Source (e.g., image, video, speech signal, or quantizer output) → Sequence of discrete random variables X(1), ..., X(N) (e.g., transformed image pixel values), assumed to be independent and identically distributed over a finite alphabet {a_1, ..., a_M}.

95 Problem statement Source (e.g., image, video, speech signal, or quantizer output) → Sequence of discrete random variables X(1), ..., X(N) (e.g., transformed image pixel values), assumed to be independent and identically distributed over a finite alphabet {a_1, ..., a_M} → Encoder: mapping between source symbols and binary strings (codewords) → Binary string Requirements: minimize the expected length of the binary string; the binary string needs to be uniquely decodable, i.e., we need to be able to infer X(1), ..., X(N) from it!

96 Problem statement Source (e.g., image, video, speech signal, or quantizer output) → Sequence of discrete random variables X(1), ..., X(N) (e.g., transformed image pixel values), assumed to be independent and identically distributed over a finite alphabet {a_1, ..., a_M} → Encoder: mapping between source symbols and binary strings (codewords) → Binary string Since X(1), ..., X(N) are assumed independent in this model, we will encode each of them separately. Each can assume any value among {a_1, ..., a_M}. Therefore, our code will consist of M codewords, one for each symbol a_1, ..., a_M: symbol a_1 → codeword w_1, ..., symbol a_M → codeword w_M.

97 Unique Decodability symbol: a, b, c, d; codeword: 0, 11, 00, 011 How to decode the following string: 00011? It could be aaab or aad or acb or cab or cd. Not uniquely decodable!

98 A condition that ensures unique decodability Prefix condition: no codeword in the code is a prefix for any other codeword.

99 A condition that ensures unique decodability Prefix condition: no codeword in the code is a prefix for any other codeword. If the prefix condition is satisfied, then the code is uniquely decodable. Proof. Take a bit string W that corresponds to two different strings of symbols, A and B. If the first symbols in A and B are the same, discard them and the corresponding portion of W. Repeat until either there are no bits left in W (in this case A=B) or the first symbols in A and B are different. Then one of the codewords corresponding to these two symbols is a prefix for the other.

100 A condition that ensures unique decodability Prefix condition: no codeword in the code is a prefix for any other codeword. Visualizing binary strings. Form a binary tree where each branch is labeled 0 or 1. Each codeword w can be associated with the unique node of the tree such that the string of 0s and 1s on the path from the root to the node forms w.

101 A condition that ensures unique decodability Prefix condition: no codeword in the code is a prefix for any other codeword. Visualizing binary strings. Form a binary tree where each branch is labeled 0 or 1. Each codeword w can be associated with the unique node of the tree such that the string of 0s and 1s on the path from the root to the node forms w. The prefix condition holds if and only if all the codewords are leaves of the binary tree.

102 A condition that ensures unique decodability Prefix condition: no codeword in the code is a prefix for any other codeword. Visualizing binary strings. Form a binary tree where each branch is labeled 0 or 1. Each codeword w can be associated with the unique node of the tree such that the string of 0s and 1s on the path from the root to the node forms w. The prefix condition holds if and only if all the codewords are leaves of the binary tree---i.e., if no codeword is a descendant of another codeword.

103 Example: no prefix condition, no unique decodability, one word is not a leaf symbol: a, b, c, d; codeword: 0, 11, 00, 011 Codeword 0 is a prefix for both codeword 00 and codeword 011

104 Example: no prefix condition, no unique decodability, one word is not a leaf symbol: a, b, c, d; codeword: 0, 11, 00, 011 Codeword 0 is a prefix for both codeword 00 and codeword 011 [Building the tree:] w_a = 0, w_b = 11

105 Example: no prefix condition, no unique decodability, one word is not a leaf symbol: a, b, c, d; codeword: 0, 11, 00, 011 Codeword 0 is a prefix for both codeword 00 and codeword 011 [Building the tree:] w_a = 0, w_b = 11, w_c = 00

106 Example: no prefix condition, no unique decodability, one word is not a leaf symbol: a, b, c, d; codeword: 0, 11, 00, 011 Codeword 0 is a prefix for both codeword 00 and codeword 011 [Building the tree:] w_a = 0, w_b = 11, w_c = 00, w_d = 011; the node for w_a is not a leaf, since w_c and w_d are its descendants.

107 Example: prefix condition, all words are leaves symbol: a, b, c, d; codeword: 0, 10, 110, 111 w_a = 0

108 Example: prefix condition, all words are leaves symbol: a, b, c, d; codeword: 0, 10, 110, 111 w_a = 0, w_b = 10

109 Example: prefix condition, all words are leaves symbol: a, b, c, d; codeword: 0, 10, 110, 111 w_a = 0, w_b = 10, w_c = 110, w_d = 111

110 Example: prefix condition, all words are leaves symbol: a, b, c, d; codeword: 0, 10, 110, 111 w_a = 0, w_b = 10, w_c = 110, w_d = 111 No path from the root to a codeword contains another codeword. This is equivalent to saying that the prefix condition holds.

111 Example: prefix condition, all words are leaves => unique decodability symbol: a, b, c, d; codeword: 0, 10, 110, 111 Decoding: traverse the string left to right, tracing the corresponding path from the root of the binary tree. Each time a leaf is reached, output the codeword and go back to the root.

112 Example: prefix condition, all words are leaves => unique decodability How to decode the following string? 110111010 (w_a = 0, w_b = 10, w_c = 110, w_d = 111)

113 Example: prefix condition, all words are leaves => unique decodability Trace 1 from the root.

114 Example: prefix condition, all words are leaves => unique decodability Trace 1, 1 from the root.

115 Example: prefix condition, all words are leaves => unique decodability Trace 1, 1, 0 from the root: this is the leaf w_c.

116 Example: prefix condition, all words are leaves => unique decodability output: c

117 Example: prefix condition, all words are leaves => unique decodability Trace 1 from the root.

118 Example: prefix condition, all words are leaves => unique decodability Trace 1, 1 from the root.

119 Example: prefix condition, all words are leaves => unique decodability Trace 1, 1, 1 from the root: this is the leaf w_d.

120 Example: prefix condition, all words are leaves => unique decodability output: cd

121 Example: prefix condition, all words are leaves => unique decodability Trace 0 from the root: this is the leaf w_a.

122 Example: prefix condition, all words are leaves => unique decodability output: cda

123 Example: prefix condition, all words are leaves => unique decodability Trace 1 from the root.

124 Example: prefix condition, all words are leaves => unique decodability Trace 1, 0 from the root: this is the leaf w_b.

125 Example: prefix condition, all words are leaves => unique decodability output: cdab

126 Example: prefix condition, all words are leaves => unique decodability final output: cdab
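A sketch of this decoding procedure in code, using the codeword table above; instead of an explicit tree it accumulates bits until they match a codeword, which is equivalent for a prefix condition code:

codewords = {"a": "0", "b": "10", "c": "110", "d": "111"}
leaves = {w: s for s, w in codewords.items()}   # codeword -> symbol

def decode_prefix(bits):
    """Scan left to right; emit a symbol whenever the accumulated bits form a codeword."""
    out, current = [], ""
    for b in bits:
        current += b
        if current in leaves:        # reached a leaf of the code tree
            out.append(leaves[current])
            current = ""             # go back to the root
    assert current == "", "bit string ended in the middle of a codeword"
    return "".join(out)

print(decode_prefix("110111010"))   # cdab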

127 Prefix condition and unique decodability There are uniquely decodable codes which do not satisfy the prefix condition (e.g., {0, 01}).

128 Prefix condition and unique decodability There are uniquely decodable codes which do not satisfy the prefix condition (e.g., {0, 01}). For any such code, a prefix condition code can be constructed with an identical set of codeword lengths. (E.g., {0, 10} for {0, 01}.)

129 Prefix condition and unique decodability There are uniquely decodable codes which do not satisfy the prefix condition (e.g., {0, 01}). For any such code, a prefix condition code can be constructed with an identical set of codeword lengths. (E.g., {0, 10} for {0, 01}.) For this reason, we can consider just prefix condition codes.

130 Entropy coding Given a discrete random variable X with M possible outcomes ("symbols" or "letters") a_1, ..., a_M and with PMF p_X, what is the lowest achievable expected codeword length among all the uniquely decodable codes? Answer depends on p_X; Shannon's source coding theorem provides bounds. How to construct a prefix condition code which achieves this expected codeword length? Answer: Huffman code.

131 Huffman code Consider a discrete r.v. X with M possible outcomes a_1, ..., a_M and with PMF p_X. Assume that p_X(a_1) ≤ p_X(a_2) ≤ ... ≤ p_X(a_M). (If this condition is not satisfied, reorder the outcomes so that it is satisfied.)

132 Huffman code Consider a discrete r.v. X with M possible outcomes a_1, ..., a_M and with PMF p_X. Assume that p_X(a_1) ≤ p_X(a_2) ≤ ... ≤ p_X(a_M). (If this condition is not satisfied, reorder the outcomes so that it is satisfied.) Consider aggregate outcome a_12 = {a_1, a_2} and a discrete r.v. X' such that X' = a_12 if X = a_1 or X = a_2, and X' = X otherwise.

133 Huffman code Consider a discrete r.v. X with M possible outcomes a_1, ..., a_M and with PMF p_X. Assume that p_X(a_1) ≤ p_X(a_2) ≤ ... ≤ p_X(a_M). (If this condition is not satisfied, reorder the outcomes so that it is satisfied.) Consider aggregate outcome a_12 = {a_1, a_2} and a discrete r.v. X' such that X' = a_12 if X = a_1 or X = a_2, and X' = X otherwise. p_X'(a) = p_X(a_1) + p_X(a_2) if a = a_12, and p_X'(a) = p_X(a) if a = a_3, ..., a_M.

134 Huffman code Consider a discrete r.v. X with M possible outcomes a_1, ..., a_M and with PMF p_X. Assume that p_X(a_1) ≤ p_X(a_2) ≤ ... ≤ p_X(a_M). (If this condition is not satisfied, reorder the outcomes so that it is satisfied.) Consider aggregate outcome a_12 = {a_1, a_2} and a discrete r.v. X' such that X' = a_12 if X = a_1 or X = a_2, and X' = X otherwise. p_X'(a) = p_X(a_1) + p_X(a_2) if a = a_12, and p_X'(a) = p_X(a) if a = a_3, ..., a_M. Suppose we have a tree, T', for an optimal prefix condition code for X'. A tree T for an optimal prefix condition code for X can be obtained from T' by splitting the leaf a_12 into two leaves corresponding to a_1 and a_2.

135 Huffman code Consider a discrete r.v. X with M possible outcomes a_1, ..., a_M and with PMF p_X. Assume that p_X(a_1) ≤ p_X(a_2) ≤ ... ≤ p_X(a_M). (If this condition is not satisfied, reorder the outcomes so that it is satisfied.) Consider aggregate outcome a_12 = {a_1, a_2} and a discrete r.v. X' such that X' = a_12 if X = a_1 or X = a_2, and X' = X otherwise. p_X'(a) = p_X(a_1) + p_X(a_2) if a = a_12, and p_X'(a) = p_X(a) if a = a_3, ..., a_M. Suppose we have a tree, T', for an optimal prefix condition code for X'. A tree T for an optimal prefix condition code for X can be obtained from T' by splitting the leaf a_12 into two leaves corresponding to a_1 and a_2. We won't prove this.

136 Example letter: a_1, a_2, a_3, a_4, a_5; p_X(letter): 0.1, 0.1, 0.25, 0.25, 0.3

137 Example letter: a_1, a_2, a_3, a_4, a_5; p_X(letter): 0.1, 0.1, 0.25, 0.25, 0.3 Step 1: combine the two least likely letters. letter: a_12, a_3, a_4, a_5; p_X'(letter): 0.2, 0.25, 0.25, 0.3

138 Example Step 1: combine the two least likely letters. a_1 and a_2 become the two children of a new node a_12. letter: a_12, a_3, a_4, a_5; p_X'(letter): 0.2, 0.25, 0.25, 0.3

139 Example Step 1: combine the two least likely letters. Tree for X so far: a_1 and a_2 under a_12. Tree for X' (still to be constructed). letter: a_12, a_3, a_4, a_5; p_X'(letter): 0.2, 0.25, 0.25, 0.3

140 Example letter: a_12, a_3, a_4, a_5; p_X'(letter): 0.2, 0.25, 0.25, 0.3 Step 2: combine the two least likely letters from the new alphabet. letter: a_123, a_4, a_5; p_X''(letter): 0.45, 0.25, 0.3

141 Example Step 2: combine the two least likely letters from the new alphabet. a_12 and a_3 become the two children of a new node a_123. letter: a_123, a_4, a_5; p_X''(letter): 0.45, 0.25, 0.3

142-143 Example Step 2: combine the two least likely letters from the new alphabet. Tree for X so far: a_1 and a_2 under a_12; a_12 and a_3 under a_123 (the tree for X' sits inside the tree for X). letter: a_123, a_4, a_5; p_X''(letter): 0.45, 0.25, 0.3

144 Example letter: a_123, a_4, a_5; p(letter): 0.45, 0.25, 0.3 Step 3: again combine the two least likely letters. a_4 and a_5 become the two children of a new node a_45. letter: a_123, a_45; p(letter): 0.45, 0.55

145-147 Example Step 3: again combine the two least likely letters. Tree for X so far: a_1 and a_2 under a_12; a_12 and a_3 under a_123; a_4 and a_5 under a_45. letter: a_123, a_45; p(letter): 0.45, 0.55

148 Example Step 4: combine the last two remaining letters. a_123 and a_45 become the two children of the root a_12345. Done! Tree for X: a_1 and a_2 under a_12; a_12 and a_3 under a_123; a_4 and a_5 under a_45; a_123 and a_45 under a_12345.

149 Example Step 4: combine the last two remaining letters. Done! The codeword for each leaf is the sequence of 0s and 1s along the path from the root to that leaf.

150-154 Example Tree for X, with the branches labeled 0 and 1 and the codewords read off the root-to-leaf paths: letter: a_1, a_2, a_3, a_4, a_5; p_X(letter): 0.1, 0.1, 0.25, 0.25, 0.3; codeword: 000, 001, 01, 10, 11

155 Example Expected codeword length: 3(0.1) + 3(0.1) + 2(0.25) + 2(0.25) + 2(0.3) = 2.2 bits letter: a_1, a_2, a_3, a_4, a_5; p_X(letter): 0.1, 0.1, 0.25, 0.25, 0.3; codeword: 000, 001, 01, 10, 11
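A sketch of the same construction in code, using a heap of (probability, symbol group) pairs; the 0/1 labeling of the branches is arbitrary, so the particular codewords may differ from the ones shown above, but the lengths, and hence the 2.2-bit expected length, come out the same:

import heapq

def huffman(pmf):
    """Return a {symbol: codeword} dict built by repeatedly merging the two least likely groups."""
    heap = [(p, i, [s]) for i, (s, p) in enumerate(pmf.items())]   # i breaks ties in the heap
    heapq.heapify(heap)
    code = {s: "" for s in pmf}
    while len(heap) > 1:
        p1, _, g1 = heapq.heappop(heap)       # two least likely groups
        p2, i2, g2 = heapq.heappop(heap)
        for s in g1:
            code[s] = "0" + code[s]           # prepend the branch label for this merge
        for s in g2:
            code[s] = "1" + code[s]
        heapq.heappush(heap, (p1 + p2, i2, g1 + g2))
    return code

pmf = {"a1": 0.1, "a2": 0.1, "a3": 0.25, "a4": 0.25, "a5": 0.3}
code = huffman(pmf)
print(code)                                      # codeword lengths 3, 3, 2, 2, 2
print(sum(pmf[s] * len(code[s]) for s in pmf))   # 2.2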

156 Self-information Consider again a discrete random variable X with M possible outcomes a_1, ..., a_M and with PMF p_X.

157 Self-information Consider again a discrete random variable X with M possible outcomes a_1, ..., a_M and with PMF p_X. Self-information of outcome a_m is I(a_m) = -log_2 p_X(a_m) bits.

158 Self-information Consider again a discrete random variable X with M possible outcomes a_1, ..., a_M and with PMF p_X. Self-information of outcome a_m is I(a_m) = -log_2 p_X(a_m) bits. E.g., if p_X(a_m) = 1 then I(a_m) = 0. The occurrence of a_m is not at all informative, since it had to occur. The smaller the probability of an outcome, the larger its self-information.

159 Self-information Consider again a discrete random variable X with M possible outcomes a_1, ..., a_M and with PMF p_X. Self-information of outcome a_m is I(a_m) = -log_2 p_X(a_m) bits. E.g., if p_X(a_m) = 1 then I(a_m) = 0. The occurrence of a_m is not at all informative, since it had to occur. The smaller the probability of an outcome, the larger its self-information. Self-information of X is I(X) = -log_2 p_X(X) and is a random variable.

160 Self-information Consider again a discrete random variable X with M possible outcomes a_1, ..., a_M and with PMF p_X. Self-information of outcome a_m is I(a_m) = -log_2 p_X(a_m) bits. E.g., if p_X(a_m) = 1 then I(a_m) = 0. The occurrence of a_m is not at all informative, since it had to occur. The smaller the probability of an outcome, the larger its self-information. Self-information of X is I(X) = -log_2 p_X(X) and is a random variable. Entropy of X is the expected value of its self-information: H(X) = E[I(X)] = -Σ_{m=1}^M p_X(a_m) log_2 p_X(a_m)

161 Source coding theorem (Shannon) For any uniquely decodable code, the expected codeword length is ≥ H(X). Moreover, there exists a prefix condition code for which the expected codeword length is < H(X) + 1.
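For the five-letter example above, the entropy can be computed directly and compared with these bounds (a quick sketch):

from math import log2

pmf = {"a1": 0.1, "a2": 0.1, "a3": 0.25, "a4": 0.25, "a5": 0.3}
H = -sum(p * log2(p) for p in pmf.values())
print(H)   # about 2.185 bits; the Huffman code's 2.2 bits satisfies H(X) <= 2.2 < H(X) + 1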

162 Example Suppose that X has M = 2^K possible outcomes a_1, ..., a_M.

163 Example Suppose that X has M = 2^K possible outcomes a_1, ..., a_M. Suppose that X is uniform, i.e., p_X(a_1) = ... = p_X(a_M) = 2^{-K}.

164 Example Suppose that X has M = 2^K possible outcomes a_1, ..., a_M. Suppose that X is uniform, i.e., p_X(a_1) = ... = p_X(a_M) = 2^{-K}. Then H(X) = E[I(X)] = -Σ_{k=1}^{2^K} 2^{-K} log_2(2^{-K}) = 2^K (2^{-K}) K = K

165 Example Suppose that X has M = 2^K possible outcomes a_1, ..., a_M. Suppose that X is uniform, i.e., p_X(a_1) = ... = p_X(a_M) = 2^{-K}. Then H(X) = E[I(X)] = -Σ_{k=1}^{2^K} 2^{-K} log_2(2^{-K}) = K On the other hand, observe that there exist 2^K different K-bit sequences. Thus, a fixed-length code for X that uses all these 2^K K-bit sequences as codewords for all the 2^K outcomes of X will have expected codeword length of K.

166 Example Suppose that X has M = 2^K possible outcomes a_1, ..., a_M. Suppose that X is uniform, i.e., p_X(a_1) = ... = p_X(a_M) = 2^{-K}. Then H(X) = K. On the other hand, observe that there exist 2^K different K-bit sequences. Thus, a fixed-length code for X that uses all these 2^K K-bit sequences as codewords for all the 2^K outcomes of X will have expected codeword length of K. I.e., for this particular random variable, this fixed-length code achieves the entropy of X, which is the lower bound given by the source coding theorem.

167 Example Suppose that X has M = 2^K possible outcomes a_1, ..., a_M. Suppose that X is uniform, i.e., p_X(a_1) = ... = p_X(a_M) = 2^{-K}. Then H(X) = K. On the other hand, observe that there exist 2^K different K-bit sequences. Thus, a fixed-length code for X that uses all these 2^K K-bit sequences as codewords for all the 2^K outcomes of X will have expected codeword length of K. I.e., for this particular random variable, this fixed-length code achieves the entropy of X, which is the lower bound given by the source coding theorem. Therefore, the K-bit fixed-length code is optimal for this X.

168 Lemma 1: An auxiliary result helpful for proving the source coding theorem log_2 α ≤ (α - 1) log_2 e for all α > 0. Proof: differentiate g(α) = (α - 1) log_2 e - log_2 α and show that g(1) = 0 is its minimum.

169 Another auxiliary result: Kraft inequality If integers d_1, ..., d_M satisfy the inequality Σ_{m=1}^M 2^{-d_m} ≤ 1,  (1) then there exists a prefix condition code whose codeword lengths are these integers. Conversely, the codeword lengths of any prefix condition code satisfy this inequality.
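A sketch that evaluates the left-hand side of (1) for a given set of codeword lengths:

def kraft_sum(lengths):
    """Sum of 2^(-d_m); a prefix condition code with these lengths exists iff this is <= 1."""
    return sum(2.0 ** -d for d in lengths)

print(kraft_sum([3, 3, 2, 2, 2]))   # 1.0  -> a prefix condition code exists (the Huffman lengths above)
print(kraft_sum([1, 1, 2]))         # 1.25 -> no prefix condition code has these lengths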

170 Some useful facts about full binary trees A full binary tree of depth D has 2^D leaves.

171 Some useful facts about full binary trees Tree depth D = 4 A full binary tree of depth D has 2^D leaves. (Here, the depth is D = 4 and the number of leaves is 2^4 = 16.)

172 Some useful facts about full binary trees Tree depth D = 4 A full binary tree of depth D has 2^D leaves. (Here, the depth is D = 4 and the number of leaves is 2^4 = 16.) Depth of red node = 2 In a full binary tree of depth D, each node at depth d has 2^{D-d} leaf descendants. (Here, D = 4, the red node is at depth d = 2, and so it has 2^{4-2} = 4 leaf descendants.)

173 Kraft inequality: proof of sufficiency Suppose d_1 ≤ ... ≤ d_M satisfy (1). Consider the full binary tree of depth d_M, and consider all its nodes at depth d_1. Assign one of these nodes to symbol a_1.

174 Kraft inequality: proof of sufficiency Consider all the nodes at depth d_2 which are not a_1 and not descendants of a_1. Assign one of them to symbol a_2.

175 Kraft inequality: proof of sufficiency Iterate like this M times.

176 Kraft inequality: proof of sufficiency If we have run out of tree nodes to assign after r < M iterations, it means that every leaf in the full binary tree of depth d_M is a descendant of one of the first r symbols, a_1, ..., a_r.

177 Kraft inequality: proof of sufficiency But note that every node at depth d_m has 2^{d_M - d_m} leaf descendants. Note also that the full tree has 2^{d_M} leaves. Therefore, if every leaf in the tree is a descendant of a_1, ..., a_r, then Σ_{m=1}^r 2^{d_M - d_m} = 2^{d_M}

178 Kraft inequality: proof of sufficiency Σ_{m=1}^r 2^{d_M - d_m} = 2^{d_M} implies Σ_{m=1}^r 2^{-d_m} = 1

179 Kraft inequality: proof of sufficiency Therefore, Σ_{m=1}^M 2^{-d_m} = Σ_{m=1}^r 2^{-d_m} + Σ_{m=r+1}^M 2^{-d_m} = 1 + Σ_{m=r+1}^M 2^{-d_m} > 1. This violates (1).

180 Kraft inequality: proof of sufficiency Thus, our procedure can in fact go on for M iterations. After the M-th iteration, we will have constructed a prefix condition code with codeword lengths d_1, ..., d_M.

181 Kraft inequality: proof of necessity Suppose d_1 ≤ ... ≤ d_M, and suppose we have a prefix condition code with these codeword lengths. Consider the binary tree corresponding to this code.

182 Kraft inequality: proof of necessity Suppose d_1 ≤ ... ≤ d_M, and suppose we have a prefix condition code with these codeword lengths. Consider the binary tree corresponding to this code. Complete this tree to obtain a full tree of depth d_M.

183 Kraft inequality: proof of necessity Again use the following facts: the full tree has 2^{d_M} leaves; the number of leaf descendants of the codeword of length d_m is 2^{d_M - d_m}.

184 Kraft inequality: proof of necessity The combined number of all leaf descendants of all codewords must be less than or equal to the total number of leaves in the full tree: Σ_{m=1}^M 2^{d_M - d_m} ≤ 2^{d_M}

185 Kraft inequality: proof of necessity Σ_{m=1}^M 2^{d_M - d_m} ≤ 2^{d_M}, i.e., Σ_{m=1}^M 2^{-d_m} ≤ 1.


Entropy Coding. Connectivity coding. Entropy coding. Definitions. Lossles coder. Input: a set of symbols Output: bitstream. Idea Connectivity coding Entropy Coding dd 7, dd 6, dd 7, dd 5,... TG output... CRRRLSLECRRE Entropy coder output Connectivity data Edgebreaker output Digital Geometry Processing - Spring 8, Technion Digital

More information

SIGNAL COMPRESSION. 8. Lossy image compression: Principle of embedding

SIGNAL COMPRESSION. 8. Lossy image compression: Principle of embedding SIGNAL COMPRESSION 8. Lossy image compression: Principle of embedding 8.1 Lossy compression 8.2 Embedded Zerotree Coder 161 8.1 Lossy compression - many degrees of freedom and many viewpoints The fundamental

More information

TTIC 31230, Fundamentals of Deep Learning David McAllester, April Information Theory and Distribution Modeling

TTIC 31230, Fundamentals of Deep Learning David McAllester, April Information Theory and Distribution Modeling TTIC 31230, Fundamentals of Deep Learning David McAllester, April 2017 Information Theory and Distribution Modeling Why do we model distributions and conditional distributions using the following objective

More information

Source Coding: Part I of Fundamentals of Source and Video Coding

Source Coding: Part I of Fundamentals of Source and Video Coding Foundations and Trends R in sample Vol. 1, No 1 (2011) 1 217 c 2011 Thomas Wiegand and Heiko Schwarz DOI: xxxxxx Source Coding: Part I of Fundamentals of Source and Video Coding Thomas Wiegand 1 and Heiko

More information

Quantum-inspired Huffman Coding

Quantum-inspired Huffman Coding Quantum-inspired Huffman Coding A. S. Tolba, M. Z. Rashad, and M. A. El-Dosuky Dept. of Computer Science, Faculty of Computers and Information Sciences, Mansoura University, Mansoura, Egypt. tolba_954@yahoo.com,

More information

Chapter 2 Date Compression: Source Coding. 2.1 An Introduction to Source Coding 2.2 Optimal Source Codes 2.3 Huffman Code

Chapter 2 Date Compression: Source Coding. 2.1 An Introduction to Source Coding 2.2 Optimal Source Codes 2.3 Huffman Code Chapter 2 Date Compression: Source Coding 2.1 An Introduction to Source Coding 2.2 Optimal Source Codes 2.3 Huffman Code 2.1 An Introduction to Source Coding Source coding can be seen as an efficient way

More information

Bandwidth: Communicate large complex & highly detailed 3D models through lowbandwidth connection (e.g. VRML over the Internet)

Bandwidth: Communicate large complex & highly detailed 3D models through lowbandwidth connection (e.g. VRML over the Internet) Compression Motivation Bandwidth: Communicate large complex & highly detailed 3D models through lowbandwidth connection (e.g. VRML over the Internet) Storage: Store large & complex 3D models (e.g. 3D scanner

More information

Summary of Last Lectures

Summary of Last Lectures Lossless Coding IV a k p k b k a 0.16 111 b 0.04 0001 c 0.04 0000 d 0.16 110 e 0.23 01 f 0.07 1001 g 0.06 1000 h 0.09 001 i 0.15 101 100 root 1 60 1 0 0 1 40 0 32 28 23 e 17 1 0 1 0 1 0 16 a 16 d 15 i

More information

Fast Progressive Wavelet Coding

Fast Progressive Wavelet Coding PRESENTED AT THE IEEE DCC 99 CONFERENCE SNOWBIRD, UTAH, MARCH/APRIL 1999 Fast Progressive Wavelet Coding Henrique S. Malvar Microsoft Research One Microsoft Way, Redmond, WA 98052 E-mail: malvar@microsoft.com

More information

Information Theory and Statistics Lecture 2: Source coding

Information Theory and Statistics Lecture 2: Source coding Information Theory and Statistics Lecture 2: Source coding Łukasz Dębowski ldebowsk@ipipan.waw.pl Ph. D. Programme 2013/2014 Injections and codes Definition (injection) Function f is called an injection

More information

Optimal codes - I. A code is optimal if it has the shortest codeword length L. i i. This can be seen as an optimization problem. min.

Optimal codes - I. A code is optimal if it has the shortest codeword length L. i i. This can be seen as an optimization problem. min. Huffman coding Optimal codes - I A code is optimal if it has the shortest codeword length L L m = i= pl i i This can be seen as an optimization problem min i= li subject to D m m i= lp Gabriele Monfardini

More information

Digital Communications III (ECE 154C) Introduction to Coding and Information Theory

Digital Communications III (ECE 154C) Introduction to Coding and Information Theory Digital Communications III (ECE 154C) Introduction to Coding and Information Theory Tara Javidi These lecture notes were originally developed by late Prof. J. K. Wolf. UC San Diego Spring 2014 1 / 8 I

More information

EE-597 Notes Quantization

EE-597 Notes Quantization EE-597 Notes Quantization Phil Schniter June, 4 Quantization Given a continuous-time and continuous-amplitude signal (t, processing and storage by modern digital hardware requires discretization in both

More information

Multimedia Networking ECE 599

Multimedia Networking ECE 599 Multimedia Networking ECE 599 Prof. Thinh Nguyen School of Electrical Engineering and Computer Science Based on lectures from B. Lee, B. Girod, and A. Mukherjee 1 Outline Digital Signal Representation

More information

Quantization. Introduction. Roadmap. Optimal Quantizer Uniform Quantizer Non Uniform Quantizer Rate Distorsion Theory. Source coding.

Quantization. Introduction. Roadmap. Optimal Quantizer Uniform Quantizer Non Uniform Quantizer Rate Distorsion Theory. Source coding. Roadmap Quantization Optimal Quantizer Uniform Quantizer Non Uniform Quantizer Rate Distorsion Theory Source coding 2 Introduction 4 1 Lossy coding Original source is discrete Lossless coding: bit rate

More information

Communications Theory and Engineering

Communications Theory and Engineering Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 AEP Asymptotic Equipartition Property AEP In information theory, the analog of

More information

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments Dr. Jian Zhang Conjoint Associate Professor NICTA & CSE UNSW COMP9519 Multimedia Systems S2 2006 jzhang@cse.unsw.edu.au

More information

Lecture 1 : Data Compression and Entropy

Lecture 1 : Data Compression and Entropy CPS290: Algorithmic Foundations of Data Science January 8, 207 Lecture : Data Compression and Entropy Lecturer: Kamesh Munagala Scribe: Kamesh Munagala In this lecture, we will study a simple model for

More information

Exercises with solutions (Set B)

Exercises with solutions (Set B) Exercises with solutions (Set B) 3. A fair coin is tossed an infinite number of times. Let Y n be a random variable, with n Z, that describes the outcome of the n-th coin toss. If the outcome of the n-th

More information

Data Compression. Limit of Information Compression. October, Examples of codes 1

Data Compression. Limit of Information Compression. October, Examples of codes 1 Data Compression Limit of Information Compression Radu Trîmbiţaş October, 202 Outline Contents Eamples of codes 2 Kraft Inequality 4 2. Kraft Inequality............................ 4 2.2 Kraft inequality

More information

Vector Quantization Encoder Decoder Original Form image Minimize distortion Table Channel Image Vectors Look-up (X, X i ) X may be a block of l

Vector Quantization Encoder Decoder Original Form image Minimize distortion Table Channel Image Vectors Look-up (X, X i ) X may be a block of l Vector Quantization Encoder Decoder Original Image Form image Vectors X Minimize distortion k k Table X^ k Channel d(x, X^ Look-up i ) X may be a block of l m image or X=( r, g, b ), or a block of DCT

More information

Compression. What. Why. Reduce the amount of information (bits) needed to represent image Video: 720 x 480 res, 30 fps, color

Compression. What. Why. Reduce the amount of information (bits) needed to represent image Video: 720 x 480 res, 30 fps, color Compression What Reduce the amount of information (bits) needed to represent image Video: 720 x 480 res, 30 fps, color Why 720x480x20x3 = 31,104,000 bytes/sec 30x60x120 = 216 Gigabytes for a 2 hour movie

More information

C.M. Liu Perceptual Signal Processing Lab College of Computer Science National Chiao-Tung University

C.M. Liu Perceptual Signal Processing Lab College of Computer Science National Chiao-Tung University Quantization C.M. Liu Perceptual Signal Processing Lab College of Computer Science National Chiao-Tung University http://www.csie.nctu.edu.tw/~cmliu/courses/compression/ Office: EC538 (03)5731877 cmliu@cs.nctu.edu.tw

More information

Coding of memoryless sources 1/35

Coding of memoryless sources 1/35 Coding of memoryless sources 1/35 Outline 1. Morse coding ; 2. Definitions : encoding, encoding efficiency ; 3. fixed length codes, encoding integers ; 4. prefix condition ; 5. Kraft and Mac Millan theorems

More information

repetition, part ii Ole-Johan Skrede INF Digital Image Processing

repetition, part ii Ole-Johan Skrede INF Digital Image Processing repetition, part ii Ole-Johan Skrede 24.05.2017 INF2310 - Digital Image Processing Department of Informatics The Faculty of Mathematics and Natural Sciences University of Oslo today s lecture Coding and

More information

Information Theory. David Rosenberg. June 15, New York University. David Rosenberg (New York University) DS-GA 1003 June 15, / 18

Information Theory. David Rosenberg. June 15, New York University. David Rosenberg (New York University) DS-GA 1003 June 15, / 18 Information Theory David Rosenberg New York University June 15, 2015 David Rosenberg (New York University) DS-GA 1003 June 15, 2015 1 / 18 A Measure of Information? Consider a discrete random variable

More information

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments. Tutorial 1. Acknowledgement and References for lectures 1 to 5

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments. Tutorial 1. Acknowledgement and References for lectures 1 to 5 Lecture : Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments Dr. Jian Zhang Conjoint Associate Professor NICTA & CSE UNSW COMP959 Multimedia Systems S 006 jzhang@cse.unsw.edu.au Acknowledgement

More information

Huffman Coding. C.M. Liu Perceptual Lab, College of Computer Science National Chiao-Tung University

Huffman Coding. C.M. Liu Perceptual Lab, College of Computer Science National Chiao-Tung University Huffman Coding C.M. Liu Perceptual Lab, College of Computer Science National Chiao-Tung University http://www.csie.nctu.edu.tw/~cmliu/courses/compression/ Office: EC538 (03)573877 cmliu@cs.nctu.edu.tw

More information

Multimedia. Multimedia Data Compression (Lossless Compression Algorithms)

Multimedia. Multimedia Data Compression (Lossless Compression Algorithms) Course Code 005636 (Fall 2017) Multimedia Multimedia Data Compression (Lossless Compression Algorithms) Prof. S. M. Riazul Islam, Dept. of Computer Engineering, Sejong University, Korea E-mail: riaz@sejong.ac.kr

More information

Text Compression. Jayadev Misra The University of Texas at Austin December 5, A Very Incomplete Introduction to Information Theory 2

Text Compression. Jayadev Misra The University of Texas at Austin December 5, A Very Incomplete Introduction to Information Theory 2 Text Compression Jayadev Misra The University of Texas at Austin December 5, 2003 Contents 1 Introduction 1 2 A Very Incomplete Introduction to Information Theory 2 3 Huffman Coding 5 3.1 Uniquely Decodable

More information

Review of Quantization. Quantization. Bring in Probability Distribution. L-level Quantization. Uniform partition

Review of Quantization. Quantization. Bring in Probability Distribution. L-level Quantization. Uniform partition Review of Quantization UMCP ENEE631 Slides (created by M.Wu 004) Quantization UMCP ENEE631 Slides (created by M.Wu 001/004) L-level Quantization Minimize errors for this lossy process What L values to

More information

Fault Tolerance Technique in Huffman Coding applies to Baseline JPEG

Fault Tolerance Technique in Huffman Coding applies to Baseline JPEG Fault Tolerance Technique in Huffman Coding applies to Baseline JPEG Cung Nguyen and Robert G. Redinbo Department of Electrical and Computer Engineering University of California, Davis, CA email: cunguyen,

More information

Lecture 10 : Basic Compression Algorithms

Lecture 10 : Basic Compression Algorithms Lecture 10 : Basic Compression Algorithms Modeling and Compression We are interested in modeling multimedia data. To model means to replace something complex with a simpler (= shorter) analog. Some models

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 5 Other Coding Techniques Instructional Objectives At the end of this lesson, the students should be able to:. Convert a gray-scale image into bit-plane

More information

Progressive Wavelet Coding of Images

Progressive Wavelet Coding of Images Progressive Wavelet Coding of Images Henrique Malvar May 1999 Technical Report MSR-TR-99-26 Microsoft Research Microsoft Corporation One Microsoft Way Redmond, WA 98052 1999 IEEE. Published in the IEEE

More information

! Where are we on course map? ! What we did in lab last week. " How it relates to this week. ! Compression. " What is it, examples, classifications

! Where are we on course map? ! What we did in lab last week.  How it relates to this week. ! Compression.  What is it, examples, classifications Lecture #3 Compression! Where are we on course map?! What we did in lab last week " How it relates to this week! Compression " What is it, examples, classifications " Probability based compression # Huffman

More information

Chapter 9 Fundamental Limits in Information Theory

Chapter 9 Fundamental Limits in Information Theory Chapter 9 Fundamental Limits in Information Theory Information Theory is the fundamental theory behind information manipulation, including data compression and data transmission. 9.1 Introduction o For

More information

State of the art Image Compression Techniques

State of the art Image Compression Techniques Chapter 4 State of the art Image Compression Techniques In this thesis we focus mainly on the adaption of state of the art wavelet based image compression techniques to programmable hardware. Thus, an

More information

Image Data Compression

Image Data Compression Image Data Compression Image data compression is important for - image archiving e.g. satellite data - image transmission e.g. web data - multimedia applications e.g. desk-top editing Image data compression

More information

Multimedia Information Systems

Multimedia Information Systems Multimedia Information Systems Samson Cheung EE 639, Fall 2004 Lecture 3 & 4: Color, Video, and Fundamentals of Data Compression 1 Color Science Light is an electromagnetic wave. Its color is characterized

More information

On Common Information and the Encoding of Sources that are Not Successively Refinable

On Common Information and the Encoding of Sources that are Not Successively Refinable On Common Information and the Encoding of Sources that are Not Successively Refinable Kumar Viswanatha, Emrah Akyol, Tejaswi Nanjundaswamy and Kenneth Rose ECE Department, University of California - Santa

More information

SIGNAL COMPRESSION Lecture 7. Variable to Fix Encoding

SIGNAL COMPRESSION Lecture 7. Variable to Fix Encoding SIGNAL COMPRESSION Lecture 7 Variable to Fix Encoding 1. Tunstall codes 2. Petry codes 3. Generalized Tunstall codes for Markov sources (a presentation of the paper by I. Tabus, G. Korodi, J. Rissanen.

More information

Homework Set #2 Data Compression, Huffman code and AEP

Homework Set #2 Data Compression, Huffman code and AEP Homework Set #2 Data Compression, Huffman code and AEP 1. Huffman coding. Consider the random variable ( x1 x X = 2 x 3 x 4 x 5 x 6 x 7 0.50 0.26 0.11 0.04 0.04 0.03 0.02 (a Find a binary Huffman code

More information

Reduce the amount of data required to represent a given quantity of information Data vs information R = 1 1 C

Reduce the amount of data required to represent a given quantity of information Data vs information R = 1 1 C Image Compression Background Reduce the amount of data to represent a digital image Storage and transmission Consider the live streaming of a movie at standard definition video A color frame is 720 480

More information

Lec 03 Entropy and Coding II Hoffman and Golomb Coding

Lec 03 Entropy and Coding II Hoffman and Golomb Coding CS/EE 5590 / ENG 40 Special Topics Multimedia Communication, Spring 207 Lec 03 Entropy and Coding II Hoffman and Golomb Coding Zhu Li Z. Li Multimedia Communciation, 207 Spring p. Outline Lecture 02 ReCap

More information

Information and Entropy. Professor Kevin Gold

Information and Entropy. Professor Kevin Gold Information and Entropy Professor Kevin Gold What s Information? Informally, when I communicate a message to you, that s information. Your grade is 100/100 Information can be encoded as a signal. Words

More information

on a per-coecient basis in large images is computationally expensive. Further, the algorithm in [CR95] needs to be rerun, every time a new rate of com

on a per-coecient basis in large images is computationally expensive. Further, the algorithm in [CR95] needs to be rerun, every time a new rate of com Extending RD-OPT with Global Thresholding for JPEG Optimization Viresh Ratnakar University of Wisconsin-Madison Computer Sciences Department Madison, WI 53706 Phone: (608) 262-6627 Email: ratnakar@cs.wisc.edu

More information

EE67I Multimedia Communication Systems

EE67I Multimedia Communication Systems EE67I Multimedia Communication Systems Lecture 5: LOSSY COMPRESSION In these schemes, we tradeoff error for bitrate leading to distortion. Lossy compression represents a close approximation of an original

More information

Lecture 20: Quantization and Rate-Distortion

Lecture 20: Quantization and Rate-Distortion Lecture 20: Quantization and Rate-Distortion Quantization Introduction to rate-distortion theorem Dr. Yao Xie, ECE587, Information Theory, Duke University Approimating continuous signals... Dr. Yao Xie,

More information

EE368B Image and Video Compression

EE368B Image and Video Compression EE368B Image and Video Compression Homework Set #2 due Friday, October 20, 2000, 9 a.m. Introduction The Lloyd-Max quantizer is a scalar quantizer which can be seen as a special case of a vector quantizer

More information

3F1: Signals and Systems INFORMATION THEORY Examples Paper Solutions

3F1: Signals and Systems INFORMATION THEORY Examples Paper Solutions Engineering Tripos Part IIA THIRD YEAR 3F: Signals and Systems INFORMATION THEORY Examples Paper Solutions. Let the joint probability mass function of two binary random variables X and Y be given in the

More information

Lecture 6: Kraft-McMillan Inequality and Huffman Coding

Lecture 6: Kraft-McMillan Inequality and Huffman Coding EE376A/STATS376A Information Theory Lecture 6-0/25/208 Lecture 6: Kraft-McMillan Inequality and Huffman Coding Lecturer: Tsachy Weissman Scribe: Akhil Prakash, Kai Yee Wan In this lecture, we begin with

More information

Entropies & Information Theory

Entropies & Information Theory Entropies & Information Theory LECTURE I Nilanjana Datta University of Cambridge,U.K. See lecture notes on: http://www.qi.damtp.cam.ac.uk/node/223 Quantum Information Theory Born out of Classical Information

More information

lossless, optimal compressor

lossless, optimal compressor 6. Variable-length Lossless Compression The principal engineering goal of compression is to represent a given sequence a, a 2,..., a n produced by a source as a sequence of bits of minimal possible length.

More information

Compression and Coding. Theory and Applications Part 1: Fundamentals

Compression and Coding. Theory and Applications Part 1: Fundamentals Compression and Coding Theory and Applications Part 1: Fundamentals 1 Transmitter (Encoder) What is the problem? Receiver (Decoder) Transformation information unit Channel Ordering (significance) 2 Why

More information

The information loss in quantization

The information loss in quantization The information loss in quantization The rough meaning of quantization in the frame of coding is representing numerical quantities with a finite set of symbols. The mapping between numbers, which are normally

More information

CS4800: Algorithms & Data Jonathan Ullman

CS4800: Algorithms & Data Jonathan Ullman CS4800: Algorithms & Data Jonathan Ullman Lecture 22: Greedy Algorithms: Huffman Codes Data Compression and Entropy Apr 5, 2018 Data Compression How do we store strings of text compactly? A (binary) code

More information

Motivation for Arithmetic Coding

Motivation for Arithmetic Coding Motivation for Arithmetic Coding Motivations for arithmetic coding: 1) Huffman coding algorithm can generate prefix codes with a minimum average codeword length. But this length is usually strictly greater

More information

Lecture 3 : Algorithms for source coding. September 30, 2016

Lecture 3 : Algorithms for source coding. September 30, 2016 Lecture 3 : Algorithms for source coding September 30, 2016 Outline 1. Huffman code ; proof of optimality ; 2. Coding with intervals : Shannon-Fano-Elias code and Shannon code ; 3. Arithmetic coding. 1/39

More information

Introduction p. 1 Compression Techniques p. 3 Lossless Compression p. 4 Lossy Compression p. 5 Measures of Performance p. 5 Modeling and Coding p.

Introduction p. 1 Compression Techniques p. 3 Lossless Compression p. 4 Lossy Compression p. 5 Measures of Performance p. 5 Modeling and Coding p. Preface p. xvii Introduction p. 1 Compression Techniques p. 3 Lossless Compression p. 4 Lossy Compression p. 5 Measures of Performance p. 5 Modeling and Coding p. 6 Summary p. 10 Projects and Problems

More information

Image Compression. Fundamentals: Coding redundancy. The gray level histogram of an image can reveal a great deal of information about the image

Image Compression. Fundamentals: Coding redundancy. The gray level histogram of an image can reveal a great deal of information about the image Fundamentals: Coding redundancy The gray level histogram of an image can reveal a great deal of information about the image That probability (frequency) of occurrence of gray level r k is p(r k ), p n

More information

Multimedia Communications. Scalar Quantization

Multimedia Communications. Scalar Quantization Multimedia Communications Scalar Quantization Scalar Quantization In many lossy compression applications we want to represent source outputs using a small number of code words. Process of representing

More information

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 13 Competitive Optimality of the Shannon Code So, far we have studied

More information

SIGNAL COMPRESSION Lecture Shannon-Fano-Elias Codes and Arithmetic Coding

SIGNAL COMPRESSION Lecture Shannon-Fano-Elias Codes and Arithmetic Coding SIGNAL COMPRESSION Lecture 3 4.9.2007 Shannon-Fano-Elias Codes and Arithmetic Coding 1 Shannon-Fano-Elias Coding We discuss how to encode the symbols {a 1, a 2,..., a m }, knowing their probabilities,

More information

CSE 421 Greedy: Huffman Codes

CSE 421 Greedy: Huffman Codes CSE 421 Greedy: Huffman Codes Yin Tat Lee 1 Compression Example 100k file, 6 letter alphabet: File Size: ASCII, 8 bits/char: 800kbits 2 3 > 6; 3 bits/char: 300kbits better: 2.52 bits/char 74%*2 +26%*4:

More information

Wavelet Scalable Video Codec Part 1: image compression by JPEG2000

Wavelet Scalable Video Codec Part 1: image compression by JPEG2000 1 Wavelet Scalable Video Codec Part 1: image compression by JPEG2000 Aline Roumy aline.roumy@inria.fr May 2011 2 Motivation for Video Compression Digital video studio standard ITU-R Rec. 601 Y luminance

More information

Shannon-Fano-Elias coding

Shannon-Fano-Elias coding Shannon-Fano-Elias coding Suppose that we have a memoryless source X t taking values in the alphabet {1, 2,..., L}. Suppose that the probabilities for all symbols are strictly positive: p(i) > 0, i. The

More information

ECE533 Digital Image Processing. Embedded Zerotree Wavelet Image Codec

ECE533 Digital Image Processing. Embedded Zerotree Wavelet Image Codec University of Wisconsin Madison Electrical Computer Engineering ECE533 Digital Image Processing Embedded Zerotree Wavelet Image Codec Team members Hongyu Sun Yi Zhang December 12, 2003 Table of Contents

More information

Basic Principles of Lossless Coding. Universal Lossless coding. Lempel-Ziv Coding. 2. Exploit dependences between successive symbols.

Basic Principles of Lossless Coding. Universal Lossless coding. Lempel-Ziv Coding. 2. Exploit dependences between successive symbols. Universal Lossless coding Lempel-Ziv Coding Basic principles of lossless compression Historical review Variable-length-to-block coding Lempel-Ziv coding 1 Basic Principles of Lossless Coding 1. Exploit

More information

Chapter 5: Data Compression

Chapter 5: Data Compression Chapter 5: Data Compression Definition. A source code C for a random variable X is a mapping from the range of X to the set of finite length strings of symbols from a D-ary alphabet. ˆX: source alphabet,

More information

Module 5 EMBEDDED WAVELET CODING. Version 2 ECE IIT, Kharagpur

Module 5 EMBEDDED WAVELET CODING. Version 2 ECE IIT, Kharagpur Module 5 EMBEDDED WAVELET CODING Lesson 13 Zerotree Approach. Instructional Objectives At the end of this lesson, the students should be able to: 1. Explain the principle of embedded coding. 2. Show the

More information

ELEC 515 Information Theory. Distortionless Source Coding

ELEC 515 Information Theory. Distortionless Source Coding ELEC 515 Information Theory Distortionless Source Coding 1 Source Coding Output Alphabet Y={y 1,,y J } Source Encoder Lengths 2 Source Coding Two coding requirements The source sequence can be recovered

More information

A study of image compression techniques, with specific focus on weighted finite automata

A study of image compression techniques, with specific focus on weighted finite automata A study of image compression techniques, with specific focus on weighted finite automata Rikus Muller Thesis presented in partial fulfilment of the requirements for the Degree of Master of Science at the

More information