Multimedia & Computer Visualization. Exercise #5. JPEG compression

dr inż. Jacek Jarnicki, dr inż. Marek Woda
Institute of Computer Engineering, Control and Robotics
Wroclaw University of Technology
{jacek.jarnicki, marek.woda}@pwr.wroc.pl

The purpose of this exercise is to present the JPEG compression algorithm. The first part discusses the compression algorithm for monochrome and color images in detail. The successive steps of the algorithm are described, from the conversion of the image to a luminance-chrominance model, through the calculation of the cosine transform, to entropy coding. The practical part consists in writing a simplified codec (encoder and decoder) that simulates almost the whole process of encoding and image reconstruction. The simplification is that only monochrome images are encoded and, because of its high complexity, the entropy coding part is omitted. Finally, the third part proposes an experiment in which the compression ratio and the quality of the image obtained after encoding and decoding are measured.

We will look at three issues:
- the compression algorithm,
- the implementation of a simplified JPEG codec in MATLAB,
- the impact of compression on the quality of the reconstructed image.

1. Image compression and decompression algorithm

JPEG is a symmetric algorithm, which means that the operations performed in the decoder are the inverses of the corresponding operations in the encoder. Therefore the encoding process is described in detail first, and decoding only briefly afterwards. It is assumed that the compressed image can be monochrome or color, i.e. each pixel is described by one or three integers respectively. Furthermore, it is assumed that the width and height of the image, measured in pixels, are multiples of 8. If this condition is not met, it should be satisfied by adding an appropriate number of rows and/or columns to the image.
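The padding requirement can be illustrated with a short sketch. It is written in plain Python rather than MATLAB, purely as an illustration. The handout does not say which values the added rows and columns should contain; repeating the last row and column is an assumption made here, and the function name pad_to_multiple_of_8 is hypothetical.

```python
def pad_to_multiple_of_8(image):
    """Pad a 2-D list of pixel rows so that both dimensions are multiples of 8.

    Assumption (not from the handout): added rows and columns repeat the
    last row and the last column of the image.
    """
    height, width = len(image), len(image[0])
    new_h = -(-height // 8) * 8   # round up to the next multiple of 8
    new_w = -(-width // 8) * 8
    # Extend every row to the right with copies of its last pixel.
    padded = [row + [row[-1]] * (new_w - width) for row in image]
    # Append copies of the last (already widened) row at the bottom.
    padded += [list(padded[-1]) for _ in range(new_h - height)]
    return padded
```

For a 10 x 13 image this would return a 16 x 16 array; an image whose sides are already multiples of 8 keeps its size.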
The image coding algorithm is carried out in six steps.

Step 1 - Conversion to luminance-chrominance model

This step is performed ONLY for color images that are stored in a model other than luminance-chrominance. When the compressed image is monochrome, this step is skipped. If the points of the source image are described by the R, G, B components, the conversion to a luminance-chrominance model, e.g. the YUV model, is carried out exactly as shown in Exercise 3, by calculating the color components using the formula:

\[
\begin{bmatrix} Y \\ U \\ V \end{bmatrix} =
\begin{bmatrix}
 0.299 &  0.587 &  0.114 \\
-0.146 & -0.288 &  0.434 \\
 0.617 & -0.517 & -0.100
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\tag{1}
\]

If, as assumed above, the R, G, B components are integers, direct application of formula (1) produces Y, U, V components that are in general no longer integers. They should be converted to integers, e.g. by rounding, or formula (1) should be replaced by another relationship that produces integers directly.

Step 2 - Division of the image into blocks

First, an integer equal to half of the range of the elements is subtracted from every image component. For example, if the Y, U, V components are stored on 8 bits, the range is 256 and the number 128 is subtracted. After the subtraction the image is divided into so-called blocks (smaller images) of 8x8 points. The blocks are then processed completely independently of each other. If the image is monochrome, a block is described by a single array; for a color image it is described by three arrays containing the Y, U and V components respectively. To simplify the notation, in either case a block is stored as an array f:

\[
f = [f(i,j)], \quad i = 0, 1, \ldots, 7, \quad j = 0, 1, \ldots, 7
\tag{2}
\]

Step 3 - Calculation of the cosine transform for blocks

For each block the cosine transform described by formulas (3) and (5) is calculated:

\[
F(k,l) = \frac{C(k)\,C(l)}{4} \sum_{i=0}^{7} \sum_{j=0}^{7} f(i,j) \cos\frac{(2i+1)k\pi}{16} \cos\frac{(2j+1)l\pi}{16}
\tag{3}
\]

\[
f(i,j) = \frac{1}{4} \sum_{k=0}^{7} \sum_{l=0}^{7} C(k)\,C(l)\,F(k,l) \cos\frac{(2i+1)k\pi}{16} \cos\frac{(2j+1)l\pi}{16}
\tag{4}
\]

where:

\[
C(k) = \begin{cases} 1/\sqrt{2} & \text{for } k = 0 \\ 1 & \text{for } k \neq 0 \end{cases}
\quad \text{and} \quad
C(l) = \begin{cases} 1/\sqrt{2} & \text{for } l = 0 \\ 1 & \text{for } l \neq 0 \end{cases}
\tag{5}
\]

As you can see, the transformation consists in calculating, for an array f of size 8x8, another array F with elements F(k, l), also of size 8x8. Equation (4) describes the so-called inverse transform. It allows the array f, i.e. the block of data describing the image, to be reconstructed from the array F. The inverse transform will be used in the decoder.

Step 4 - Quantization of cosine transform coefficients

The result of the previous step is the array F, the cosine transform of the block data held in the array f. The elements of F are, of course, real numbers. The transformation called quantization has two tasks. First, it removes image details that are virtually invisible to the human eye. Second, it replaces real numbers with integer approximations. Quantization is performed according to formulas (6), (7) and (8). According to formula (6), the result of quantization is an array of integers $F^Q = [F^Q(k, l)]$, obtained by dividing each cosine transform coefficient from the array F by the corresponding element of the quantization table and rounding the result to the nearest integer. Different tables are usually used for the transform coefficients describing the luminance and chrominance components. The array Q(k, l) given by (7) is an example of a table used for luminance quantization, while (8) shows an example of a chrominance quantization table.

\[
F^Q(k,l) = \mathrm{IntegerRound}\left( \frac{F(k,l)}{Q(k,l)} \right)
\tag{6}
\]

\[
Q(k,l) =
\begin{bmatrix}
16 & 11 & 10 & 16 & 24 & 40 & 51 & 61 \\
12 & 12 & 14 & 19 & 26 & 58 & 60 & 56 \\
14 & 13 & 16 & 24 & 40 & 57 & 69 & 56 \\
14 & 17 & 22 & 29 & 51 & 87 & 80 & 62 \\
18 & 22 & 37 & 56 & 68 & 109 & 103 & 77 \\
24 & 35 & 55 & 64 & 81 & 104 & 113 & 92 \\
49 & 64 & 78 & 87 & 103 & 121 & 120 & 101 \\
72 & 92 & 95 & 98 & 112 & 100 & 103 & 99
\end{bmatrix}
\tag{7}
\]

\[
Q(k,l) =
\begin{bmatrix}
17 & 18 & 24 & 47 & 24 & 40 & 51 & 61 \\
18 & 21 & 26 & 66 & 26 & 58 & 60 & 56 \\
24 & 26 & 56 & 99 & 99 & 99 & 99 & 99 \\
47 & 99 & 99 & 99 & 99 & 99 & 99 & 99 \\
99 & 99 & 99 & 99 & 99 & 99 & 99 & 99 \\
99 & 99 & 99 & 99 & 99 & 99 & 99 & 99 \\
99 & 99 & 99 & 99 & 99 & 99 & 99 & 99 \\
99 & 99 & 99 & 99 & 99 & 99 & 99 & 99
\end{bmatrix}
\tag{8}
\]

For typical photographic images the table $F^Q = [F^Q(k, l)]$ obtained as the result of quantization contains a relatively large number of zeros, located mostly in the lower right corner of the array. Of course, the number and distribution of zeros depend on the values of the transform coefficients in F and on the numbers stored in the quantization tables. As formulas (7) and (8) show, the numbers Q(k, l) are much larger in the lower right area of the quantization tables, because the corresponding coefficients in the table F describe image details that are virtually invisible to the human eye. It should be emphasized again that the quantization tables (7) and (8) contain only sample values, illustrating the idea of eliminating image details by integer approximation of the transform. In practice the values of Q(k, l) can be different. This issue will be illustrated in the practical part of the exercise.

Step 5 - Conversion of the coefficient table to a vector

The next step of the algorithm prepares the data obtained from the quantization of the cosine transform coefficients for coding: the integer array $F^Q = [F^Q(k, l)]$ is converted to a vector. The conversion consists in saving the elements of the array $F^Q$ in the order specified by the so-called zig-zag algorithm. The array is traversed along its anti-diagonals, alternating direction, starting from $F^Q(0, 0)$:

\[
F^Q(k,l) =
\begin{bmatrix}
F^Q(0,0) & F^Q(0,1) & F^Q(0,2) & \cdots \\
F^Q(1,0) & F^Q(1,1) & F^Q(1,2) & \cdots \\
F^Q(2,0) & F^Q(2,1) & F^Q(2,2) & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\tag{9}
\]

For the convenience of further description, the following notation is introduced for the elements of the array $F^Q$:

\[
F^Q = [DC, AC_1, AC_2, AC_3, \ldots, AC_{63}]
\tag{10}
\]
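The scan order that produces vector (10) can be sketched in a few lines. This is an illustration in plain Python, not part of the MATLAB exercise; the function names zigzag_indices and zigzag_scan are hypothetical. The direction of the first diagonal is chosen so that the second element of the vector is the entry at (0, 1) and the third is the entry at (1, 0).

```python
def zigzag_indices(n=8):
    """Return the (k, l) index pairs of an n x n array in zig-zag order."""
    order = []
    for s in range(2 * n - 1):            # s = k + l selects one anti-diagonal
        diagonal = [(k, s - k) for k in range(n) if 0 <= s - k < n]
        # Odd diagonals are walked with k increasing (top-right to bottom-left),
        # even diagonals in the opposite direction.
        if s % 2 == 0:
            diagonal.reverse()
        order.extend(diagonal)
    return order


def zigzag_scan(block):
    """Flatten a square array into a list following the zig-zag order."""
    return [block[k][l] for k, l in zigzag_indices(len(block))]
```

With this ordering the first returned element is the DC component and the remaining 63 elements are the AC components, ending at index (7, 7).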

The first element of the coefficient vector, $DC = F^Q(0, 0)$, is called the constant component. The following elements, $AC_1 = F^Q(0, 1)$, $AC_2 = F^Q(1, 0)$ and so on up to $AC_{63} = F^Q(7, 7)$, defined according to the zig-zag algorithm, are called variable components. Converting the array to a vector along the zig-zag pattern makes sense because the large number of zeros usually located in the lower right corner of the array becomes, in the vector, one long sequence of zeros. Such a sequence is relatively simple to encode efficiently.

Step 6 - Entropy coding of the coefficient vector

The final step of the compression algorithm is the entropy coding of the vectors $F^Q$ stored in the form (10). The idea of entropy coding is that the length of the code word used to store an element depends on the probability of that element appearing. To minimize the average length of the code (the sequence of code words describing successive elements), elements that occur frequently are encoded with shorter code words than elements that occur less frequently. Entropy coding of the vectors $F^Q$ is performed separately for the constant components DC and for the variable components $AC_i$.

Coding of the constant component DC

In a typical photographic image neighbouring blocks are usually quite similar. As a consequence, the constant component of the cosine transform usually varies little between neighbouring blocks. Therefore the JPEG algorithm does not encode the constant component directly but encodes differences. After the division into blocks carried out in the second step of the algorithm, the image has the structure shown in Figure 1.

[Figure: image divided into blocks numbered row by row (block 0, block 1, block 2, ...; block k, block k+1, ...; block 2k, ...), each block i with its constant component DC^i]
Fig. 1. Image divided into blocks

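In the previous steps of the algorithm, the quantized constant component of the cosine transform $DC^i$ was calculated for each block (the upper index is the number of the block according to the diagram in Figure 1). Extracting the constant component from each block gives the vector:

\[
DC = [DC^0, DC^1, DC^2, \ldots, DC^k, DC^{k+1}, \ldots, DC^m]
\tag{11}
\]

where m is the number of image blocks. The elements of the vector DC are encoded as follows:

1) Build the vector

\[
\Delta = [\Delta_0, \Delta_1, \Delta_2, \ldots, \Delta_k, \Delta_{k+1}, \ldots, \Delta_m]
\tag{12}
\]

where:

\[
\Delta_0 = DC^0, \qquad \Delta_k = DC^k - DC^{k-1} \quad \text{for } k = 1, 2, \ldots, m
\tag{13}
\]

2) For each successive $\Delta_k$ determine size using the relationship:

\[
size = \mathrm{IntegerRound}\left( \log_2 \left( \mathrm{abs}(\Delta_k) + 1 \right) \right)
\tag{14}
\]

Here the logarithm is rounded up to the next integer, so that, for example, $\Delta_k = 4$ gives size = 3, in agreement with Table 1.

3) Encode the elements of the vector $\Delta$ using the Huffman code table given below.

Tab. 1. Coding table for the constant component

Δ_k value                            size   Huffman code for size   Supplemental bits
0                                    0      00                      -
-1, 1                                1      010                     0, 1
-3, -2, 2, 3                         2      011                     00, 01, 10, 11
-7, ..., -4, 4, ..., 7               3      100                     000, ..., 011, 100, ..., 111
-15, ..., -8, 8, ..., 15             4      101                     0000, ..., 0111, 1000, ..., 1111
...                                  ...    ...                     ...
-2047, ..., -1024, 1024, ..., 2047   11     1 1111 1110             000 0000 0000, ..., 111 1111 1111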
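The three steps above can be sketched as follows, in plain Python and only for the rows of Table 1 that are printed in the handout. All names (DC_SIZE_CODES, size_of, supplemental_bits, dc_differences, encode_dc) are hypothetical. The rule used for negative values is inferred from the order of the supplemental bits in Table 1 and should be treated as an assumption.

```python
# Rows of Table 1 that appear in the handout: size -> Huffman code for size.
DC_SIZE_CODES = {0: "00", 1: "010", 2: "011", 3: "100", 4: "101",
                 11: "111111110"}


def size_of(value):
    """Size category: 0 for 0, 1 for +-1, 2 for +-2..3, 3 for +-4..7, ..."""
    return abs(value).bit_length()


def supplemental_bits(value):
    """Bits that select one value inside its size category."""
    size = size_of(value)
    if size == 0:
        return ""
    if value < 0:
        # Assumption read off Table 1: negative values occupy the lower
        # half of the codes, e.g. -3 -> 00 and -2 -> 01 for size 2.
        value += (1 << size) - 1
    return format(value, "0{}b".format(size))


def dc_differences(dc):
    """Formula (13): the first value as is, then neighbour differences."""
    return [dc[0]] + [dc[k] - dc[k - 1] for k in range(1, len(dc))]


def encode_dc(dc):
    """Return one bit string per block: size code followed by supplemental bits."""
    return [DC_SIZE_CODES[size_of(d)] + supplemental_bits(d)
            for d in dc_differences(dc)]
```

For the constant components 12, 13, 13, 10, 15 the differences are 12, 1, 0, -3, 5, so the third block, whose difference is 0, costs only the two bits 00.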

The sequence of bits that encodes a successive element $\Delta_k$ of the vector $\Delta$ is built in the following manner. First, for $\Delta_k$ and the size calculated using formula (14), the Huffman code for size is read from column 3 of Table 1. Second, depending on the position of $\Delta_k$ in column 1, the corresponding supplemental bits from column 4 are appended to the bits read from column 3.

Analyzing the contents of Table 1, one can easily see the idea of entropy coding. For a typical photographic image, the probability that blocks located close to each other differ little is large. In this case the numbers $\Delta_k$, which are the differences between the constant components of neighbouring blocks, will most likely be small and will often simply be 0. In Table 1 the number $\Delta_k = 0$ is assigned the shortest code (00). You can also see that larger values of $\Delta_k$, which are less and less likely, have been assigned longer codes.

Encoding of the variable components AC1 - AC63

The variable components $AC_k$ are encoded separately for each block. The constant component DC has already been saved earlier, so for each block the vector to be encoded has the form:

\[
AC = [AC_1, AC_2, AC_3, \ldots, AC_{63}]
\tag{15}
\]

As a result of the quantization of cosine transform coefficients performed in step 4, a relatively large number of the coefficients $AC_k$ take the value 0. Vector (15) can therefore be seen as a system built of some non-zero coefficients $AC_k$ separated by sequences of zero coefficients, i.e.:

\[
[AC_1, AC_2, AC_3, \ldots, AC_{63}] = [\ldots, AC_{k-1}, 0, \ldots, 0, AC_k, 0, \ldots, 0, AC_{k+1}, \ldots]
\]

If two non-zero coefficients $AC_k$ follow directly one after another, the sequence of zeros between them is treated as an empty string of length 0. With this approach, to save the vector of variable components it is enough to specify the encoding rule for consecutive sequences of coefficients of the form $0, \ldots, 0, AC_k$. In the method adopted in the JPEG algorithm, such a sequence is saved as a pair of so-called symbols. The following notation is used, with the individual symbols defined as follows:

\[
(0, \ldots, 0, AC_k) \rightarrow (symbol\_1, symbol\_2), \qquad
symbol\_1 = (runlength, size), \qquad
symbol\_2 = amplitude
\tag{16}
\]

where the runlength, size and amplitude elements are defined as follows:

- runlength: the number of zeros between the non-zero coefficient $AC_k$ and the previous non-zero coefficient $AC_{k-1}$,
- size: a number specifying the range within which the value of the coefficient $AC_k$ lies,
- amplitude: a number expressing the value of the coefficient $AC_k$.

To encode a sequence of coefficients of the form $0, \ldots, 0, AC_k$, the entropy method with a Huffman code is used, similarly to the encoding of the constant components. However, the algorithm in this case is somewhat more complicated. It can be written in four steps:

1) Calculate runlength, i.e. the number of zeros between the non-zero coefficient $AC_k$ and the previous non-zero coefficient $AC_{k-1}$.

2) For the non-zero coefficient $AC_k$ calculate size using the formula (rounded up, as in formula (14)):

\[
size = \mathrm{IntegerRound}\left( \log_2 \left( \mathrm{abs}(AC_k) + 1 \right) \right)
\]

3) Encode the pair (runlength, size) according to the Huffman code table. A fragment of the table is shown below.

Tab. 2. Fragment of the Huffman code table for pairs (runlength, size)

(runlength, size)   Huffman code      (runlength, size)   Huffman code
(0, 1)              00                (0, 6)              1111000
(0, 2)              01                (1, 3)              1111001
(0, 3)              100               (5, 1)              1111010
EOB                 1010              (6, 1)              1111011
(0, 4)              1011              (0, 7)              11111000
(1, 1)              1100              (2, 2)              11111001
(0, 5)              11010             (7, 1)              11111010
(1, 2)              11011             (1, 4)              111110110
(2, 1)              11100
(3, 1)              111010            ZRL                 11111111001
(4, 1)              111011

The markers EOB and ZRL found in Table 2 are used to store, respectively:
- EOB: zeros up to the end of the block,
- ZRL: a sequence of 16 zeros, which can be treated as 15 zeros and a zero coefficient $AC_k$.

Table 2 is only a fragment of the Huffman coding table, containing the most likely combinations of pairs (runlength, size). The full table consists of 162 rows and can be found in AC_HUFFMAN.pdf.

4) Encode the amplitude number, using the size number calculated in step 2 and the following table:

Tab. 3. Table for encoding the values of the coefficient AC_k as amplitude

size   AC_k value                           Amplitude code
0      0                                    ---
1      -1, 1                                0, 1
2      -3, -2, 2, 3                         00, 01, 10, 11
3      -7, ..., -4, 4, ..., 7               000, ..., 011, 100, ..., 111
4      -15, ..., -8, 8, ..., 15             0000, ..., 0111, 1000, ..., 1111
...    ...                                  ...
11     -2047, ..., -1024, 1024, ..., 2047   000 0000 0000, ..., 111 1111 1111

This is how the algorithm implemented in a JPEG encoder works. To sum up, the encoding process can be presented in a simplified way, as shown in Figure 2.

[Figure: 8x8 source data blocks -> DCT -> Quantizer -> Bit coder -> Compressed data; the quantizer uses the quantization tables, the bit coder uses the Huffman code tables]
Fig. 2. Simplified structure of JPEG encoder

The algorithm uses the data describing the coded image and two sets of tables: quantization tables and Huffman code tables. In the encoder, the division of the image into blocks is followed by the calculation of the cosine transform for each block (3) and then by the quantization of the transform coefficients (6), (7), (8). After quantization, the coefficient arrays of each block are converted to vectors according to the zig-zag rule and encoded using the Huffman code tables. The constant components of the individual blocks and the variable components within the blocks are encoded separately.
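The way a vector of variable components is split into (runlength, size) and amplitude symbols can be sketched as below. This is a plain Python illustration that stops before the Huffman lookup, since only a fragment of Table 2 is printed here. The function name ac_symbols and the representation of the EOB and ZRL markers as strings are assumptions made for the example.

```python
def ac_symbols(ac):
    """Split the AC coefficients of one block (zig-zag order) into symbols.

    Returns a list of ((runlength, size), amplitude) pairs, with
    ("ZRL", None) for a run of 16 zeros and ("EOB", None) when only
    zeros remain until the end of the block.
    """
    # Trailing zeros are not coded one by one; a single EOB replaces them.
    last = len(ac)
    while last > 0 and ac[last - 1] == 0:
        last -= 1

    symbols = []
    run = 0
    for value in ac[:last]:
        if value == 0:
            run += 1
            if run == 16:                 # 16 zeros in a row -> one ZRL
                symbols.append(("ZRL", None))
                run = 0
            continue
        size = abs(value).bit_length()    # same size categories as Table 3
        symbols.append(((run, size), value))
        run = 0

    if last < len(ac):
        symbols.append(("EOB", None))
    return symbols
```

A block whose AC vector is 5, 0, 0, -2 followed by zeros would yield the symbols ((0, 3), 5), ((2, 2), -2) and EOB.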

Operations performed in the decoder are basically the inverses of those carried out in the encoder, as shown in Figure 3. In the bit decoder, the compressed data is converted, using the Huffman code tables, into vectors $F^Q$ (10), which the inverse of the zig-zag algorithm turns back into tables $F^Q(k, l)$ (9). The inverse of quantization, i.e. multiplying the individual elements $F^Q(k, l)$ by the corresponding elements of the array Q(k, l), yields approximate arrays F of cosine transform coefficients. The next step is the calculation of the inverse cosine transform (4), i.e. the restoration of the data blocks.

[Figure: Compressed data -> Bit decoder -> Dequantizer -> DCT^-1 -> 8x8 blocks of reconstructed data; the bit decoder uses the Huffman code tables, the dequantizer uses the quantization tables]
Fig. 3. Simplified structure of JPEG decoder

2. Simplified JPEG codec

In the practical part of the exercise you should write a program implementing a codec, i.e. an encoder together with a decoder, for images saved according to the JPEG algorithm. Monochrome images of size 256 x 256 pixels, stored on 8 bits, will be encoded. Because the full encoding and decoding process is quite complicated (especially step 6), the following simplifying assumptions are adopted:
- Only a monochrome image is encoded and decoded, so step 1, which converts the description of the image to the YUV or YIQ color model, is omitted.
- Due to its excessive complexity, step 5 (table to vector conversion using the zig-zag rule) is omitted along with step 6 (entropy coding).

With these restrictions, the result of encoding is just the array of quantized cosine transform coefficients $F^Q = [F^Q(k, l)]$ obtained in step 4. In addition, to make it possible to study experimentally the effect of quantization on the quality of compression, an additional variable α is introduced; it takes integer values 1, 2, ... and indicates the degree of compression. By changing α one can easily influence the degree of compression: all elements of the quantization table are multiplied by α. The relationship describing the quantization process takes the form:

\[
F^Q(k,l) = \mathrm{IntegerRound}\left( \frac{F(k,l)}{\alpha \, Q(k,l)} \right)
\tag{17}
\]

Equation (17) is thus an extended version of formula (6). Its meaning is that the greater the value of α, the larger the number by which the elements F(k, l) of the cosine transform array are divided, and the more zeros appear in the array $F^Q = [F^Q(k, l)]$. A simplified diagram of the encoder, taking into account the control of the degree of compression with the variable α, is shown in Figure 4. In the decoder, whose diagram is shown in Figure 5, the inverse quantization for a given compression level α is carried out using the formula:

\[
\tilde{F}(k,l) = \alpha \, F^Q(k,l) \, Q(k,l)
\tag{18}
\]

where $\tilde{F} = [\tilde{F}(k, l)]$ is the array of cosine transform coefficients recreated in the decoder. As you can see, the operations performed in the encoder and the decoder are practically symmetrical.

[Figure: Source image -> Subtraction of 128 from the values of picture elements -> Division of the image into blocks and calculation of the cosine transform of the blocks, formula (3) -> Quantization of cosine transform coefficients, formula (17) -> Encoded image F^Q; inputs of the quantization stage: quantization table, formula (7), and compression level α]
Fig. 4. Simplified diagram of JPEG encoder
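Formulas (17) and (18) can be tried out on a single block with the sketch below. It is plain Python rather than the MATLAB functions requested later in the exercise, and the names Q_LUMA, quantize and dequantize are hypothetical. One detail is an assumption worth noting: Python's round resolves exact .5 ties to the nearest even integer, which may differ from the rounding used in other environments.

```python
# Sample luminance quantization table, formula (7).
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 56],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantize(F, alpha=1):
    """Formula (17): divide by alpha * Q(k, l) and round to an integer."""
    # Note: round() sends exact .5 ties to the nearest even integer.
    return [[round(F[k][l] / (alpha * Q_LUMA[k][l])) for l in range(8)]
            for k in range(8)]


def dequantize(FQ, alpha=1):
    """Formula (18): multiply back by alpha * Q(k, l)."""
    return [[alpha * FQ[k][l] * Q_LUMA[k][l] for l in range(8)]
            for k in range(8)]
```

Quantizing and then dequantizing a coefficient of 100 at position (0, 0) with α = 2 gives 96 rather than 100, which is the information loss the exercise sets out to observe.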

[Figure: Encoded image F^Q -> Inverse quantization of cosine transform coefficients, formula (18) -> Calculation of the inverse cosine transform for the image blocks, formula (4) -> Addition of 128 to the values of picture elements -> Reconstructed image; inputs of the inverse quantization stage: quantization table, formula (7), and compression level α]
Fig. 5. Simplified diagram of JPEG decoder

JPEG codec skeleton program in MATLAB

The program will consist of three files:
- the "main" program implementing the encoder and decoder, called for example jpeg_codec.m,
- the function quantization.m implementing the quantization formula (17),
- the function dequantization.m used to reverse the quantization operation, implementing formula (18).

The functions quantization.m and dequantization.m will be called at the appropriate places of the jpeg_codec.m program. Guidelines for implementing the individual functions are as follows.

jpeg_codec.m function

Coder (implementation of the process from Figure 4)
- Create a file called jpeg_codec.m.

- Read the source image from the file Lena_gray_8.tif and place it in an array.
- Convert the elements of the array containing the image to double format. Use a simple conversion of an integer array to an array of doubles, i.e. B = double(a), and not B = im2double(a), which scales the elements to values from the range [0, 1].
- Subtract the number 128 from the elements of the array obtained in the previous step.
- Divide the array containing the image elements into blocks and perform the cosine transform on the blocks. This can be done very easily with the function blkproc( ) (refer to the System Help), which executes a given function on blocks of a given size in the array A and places the results in the array B. In the codec code you should just call the function in the form: B = blkproc (A, [8,8], 'dct2'); This divides the array A into blocks of size 8 x 8, applies to each block the two-dimensional cosine transform described by formula (3), implemented in MATLAB as the dct2 function, and places the result in the array B.
- Perform the quantization of the cosine transform coefficients, again using the blkproc( ) function, this time with the quantization function as the third argument. You should write this function on your own (its construction is described later on). The result is a table containing the quantized cosine transform coefficients, which, under the adopted assumptions, is the encoded image.

Decoder (implementation of the process from Figure 5)
- Perform the inverse quantization of the coefficients of the encoded image, again using the blkproc( ) function; the third argument should now be the dequantization function, which reverses the quantization. Like the quantization function, it should be written by you.
- Use the blkproc( ) function again, this time with idct2 as the third parameter, to calculate the inverse cosine transform for the blocks of the processed array. In MATLAB this function calculates the inverse cosine transform described by formula (4).
- Add the number 128 to the elements obtained in the previous step.
- Normalize the array elements to values from the range [0, 1] by dividing them by 255.
- Convert the array to an image in uint8 format using the im2uint8( ) function, obtaining the image reconstructed from the compressed data.
- Display the source image (read from the file) and the reconstructed image in a graphics window.
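As a cross-check of what dct2 and idct2 compute on a single 8 x 8 block, formulas (3), (4) and (5) can be written out directly. The sketch below is plain Python, independent of MATLAB and of blkproc; the names c, dct2 and idct2 merely mirror the notation of this text and are not calls to any library. It is a direct, unoptimized evaluation of the sums.

```python
import math


def c(k):
    """Normalization factor from formula (5)."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct2(f):
    """Forward cosine transform of one 8 x 8 block, formula (3)."""
    return [[c(k) * c(l) / 4 * sum(
                 f[i][j]
                 * math.cos((2 * i + 1) * k * math.pi / 16)
                 * math.cos((2 * j + 1) * l * math.pi / 16)
                 for i in range(8) for j in range(8))
             for l in range(8)] for k in range(8)]


def idct2(F):
    """Inverse cosine transform of one 8 x 8 block, formula (4)."""
    return [[sum(
                 c(k) * c(l) * F[k][l]
                 * math.cos((2 * i + 1) * k * math.pi / 16)
                 * math.cos((2 * j + 1) * l * math.pi / 16)
                 for k in range(8) for l in range(8)) / 4
             for j in range(8)] for i in range(8)]
```

Without quantization in between, applying idct2 to the output of dct2 returns the original block up to floating-point error, which confirms that all information loss in the codec comes from the quantization step.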

quantization.m function

The aim of the function is to perform the quantization described by formula (17). The specific steps are as follows:
- Create a file called quantization.m.
- Declare the form of the function by typing in the first line of the file: function B = quantization(a)
- Declare a variable alpha (compression level α) and set its value, for example to 1.
- Declare an array of size 8 x 8 in accordance with formula (7).
- Implement formula (17), using round( ) to round the results to the nearest integer.

dequantization.m function

The function carries out the inverse quantization described by formula (18). Its structure is similar to that of the quantization function described above.
- Create a file called dequantization.m.
- Declare the form of the function by typing in the first line of the file: function B = dequantization(a)
- Declare a variable alpha (compression level α) and set its value, for example to 1.
- Declare an array of size 8 x 8 in accordance with formula (7).
- Implement formula (18).

Executing the jpeg_codec program should display two images in a graphics window: the source image and the reconstructed image obtained after compression and decompression. The two images should differ. The JPEG algorithm described above is a lossy compression, so the reconstructed image is no longer the same as the source image: some of the details are removed from the source image and permanently lost. By changing the value of the coefficient α (implemented in the program by the variable alpha) through successive integers 1, 2, etc., one can change the degree of compression and observe how the quality of the reconstructed image changes. Examples of the source and reconstructed images for α = 1 and α = 10 are shown in Figures 6 and 7. As the figures show, for α = 1 the reconstructed image hardly differs from the source, while for α = 10 a significant difference can be seen.

You can experiment by changing the compression level α and observing both images. Visual evaluation is, in some sense, conclusive; however, for an objective comparison of images and to determine the impact of compression on quality, numerical criteria are used.

3. Compression ratio and quality of reconstructed image

The compression ratio achieved by any compression algorithm can obviously be measured by comparing the size of the file containing the source image with the size of the file holding the data after compression. For the simplified version of the algorithm proposed in the previous section this is not directly possible, because the result of compression is not a file but only an array of quantized cosine transform coefficients. This is because the implementation skips the last vital step, the entropy coding.

Fig. 6. Source and reconstructed image for α = 1

However, it is possible to introduce a reasonable criterion for measuring the compression ratio, for example the number of zeros in the array of quantized cosine transform coefficients. The more zeros, the better the compression.
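The criterion itself is a one-line computation. The sketch below states it in plain Python for an array given as a list of rows; the function name zero_percentage is hypothetical.

```python
def zero_percentage(coefficients):
    """Percentage of entries equal to zero in a 2-D list of coefficients."""
    total = sum(len(row) for row in coefficients)
    zeros = sum(1 for row in coefficients for value in row if value == 0)
    return 100.0 * zeros / total
```

Applied to the quantized coefficient array for several values of α, the returned percentage is expected to grow as α increases.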

Fig. 7. Source and reconstructed image for α = 10

Assessment of compression ratio

Taking the number of zeros in the array of quantized cosine transform coefficients as the criterion for measuring the compression ratio, modify the previously written program as follows:
- In the jpeg_codec program, after the part that implements the encoder, calculate the number of zeros in the array of quantized cosine transform coefficients. The function nnz( ) (refer to the system help) is convenient here: it returns the number of non-zero elements, so the number of zeros is the total number of elements minus the value returned by nnz( ).
- Add code that displays on the console the percentage of zeros in the array of quantized cosine transform coefficients.

After these simple modifications, observe how the percentage of zeros changes for different values of the coefficient α.

Evaluation of the reconstructed image quality

The quality of the reconstructed image (obtained from the compressed data) can be evaluated by comparing it with the source image. To compare two images, the commonly used mean square error criterion MSE (Mean Square Error) can be applied. It is expressed as:

\[
MSE(X, Y) = \frac{1}{n \cdot m} \sum_{i=1}^{n} \sum_{j=1}^{m} \left( x_{i,j} - y_{i,j} \right)^2
\tag{19}
\]

where:

- X is the array of elements x(i, j) describing the source image,
- Y is the array of elements y(i, j) describing the reconstructed image,
- n, m are the dimensions of the images.

For convenience the MSE criterion is often presented in a slightly different form, on a logarithmic scale. The criterion is then called PSNR (Peak Signal to Noise Ratio), i.e. the ratio of the peak signal to the noise, and is described by the formula:

\[
PSNR(X, Y) = 20 \log_{10} \left( \frac{255}{\sqrt{MSE(X, Y)}} \right)
\tag{20}
\]

The unit of PSNR is the decibel, abbreviated dB. In order not to complicate the task, a ready-to-use function [mse, psnr] = mse_error(a, B) is provided, which calculates the values of MSE and PSNR for two square images. Using this function, carry out an experiment that determines MSE and PSNR for the source and reconstructed images for different values of α (compression level), summarize the results in a table and present them on a chart.
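For reference, formulas (19) and (20) can be evaluated as in the sketch below. It is plain Python and is not the provided mse_error function, whose internals are not shown in this text. The names mse and psnr are hypothetical, and returning infinity for identical images is a convention assumed here.

```python
import math


def mse(x, y):
    """Mean square error of two equally sized images, formula (19)."""
    n, m = len(x), len(x[0])
    return sum((x[i][j] - y[i][j]) ** 2
               for i in range(n) for j in range(m)) / (n * m)


def psnr(x, y):
    """Peak signal to noise ratio in dB for 8-bit images, formula (20)."""
    error = mse(x, y)
    if error == 0:
        return math.inf   # assumed convention: identical images
    return 20 * math.log10(255 / math.sqrt(error))
```

A larger α should produce a larger MSE and therefore a smaller PSNR.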