CSE 408 Multimedia Information System Yezhou Yang

Image and Video Compression CSE 408 Multimedia Information System Yezhou Yang Lots of slides from Hassan Mansour

Class plan Today: Project 2 roundup Today: Image and Video compression Nov 10: final project update and help meeting Then will talk about RGBD data and Skeleton estimation and tracking Dec 01: Final project presentation

Image Compression Most popular techniques: Color subsampling Transform Coding (JPEG, JPEG2000, ) Fractal Coding Available Compression standards: JPEG, JPEG2000, JPEG-LS, JBIG, JBIG2 (ISO) GIF, PNG, TIFF, BMP, WMP,

JPEG quality per compression ratio Original Compressed to 12% Compressed to 6% Compressed to 2%

JPEG compression Zig-Zag Scan DCT(x-128) JPEG file Entropy Coding Quantization

Quantization ˆ F( u, v) F ( u, v ) round Q ( u, v ) F(u, v) represents a DCT coefficient, Q(u, v) is a quantization matrix entry, and F ˆ ( u, v) represents the quantized DCT coefficients which JPEG will use in the succeeding entropy coding. The quantization step is the main source for loss in JPEG compression. The entries of Q(u, v) tend to have larger values towards the lower right corner. This aims to introduce more loss at the higher spatial frequencies a practice supported by Observations 1 and 2. Table 9.1 and 9.2 show the default Q(u, v) values obtained from psychophysical studies with the goal of maximizing the compression ratio while minimizing perceptual losses in JPEG images. (9.1) 6 Li, Drew, Liu

Table 9.1 The Luminance Quantization Table 16 11 10 16 24 40 51 61 12 12 14 19 26 58 60 55 14 13 16 24 40 57 69 56 14 17 22 29 51 87 80 62 18 22 37 56 68 109 103 77 24 35 55 64 81 104 113 92 49 64 78 87 103 121 120 101 72 92 95 98 112 100 103 99 Table 9.2 The Chrominance Quantization Table 17 18 24 47 99 99 99 99 18 21 26 66 99 99 99 99 24 26 56 99 99 99 99 99 47 66 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 7 Li, Drew, Liu

An 8 8 block from the Y image of Lena 200 202 189 188 189 175 175 175 200 203 198 188 189 182 178 175 203 200 200 195 200 187 185 175 200 200 200 200 197 187 187 187 200 205 200 200 195 188 187 175 200 200 200 200 200 190 187 175 205 200 199 200 191 187 187 175 210 200 200 200 188 185 187 186 f(i, j) 515 65-12 4 1 2-8 5-16 3 2 0 0-11 -2 3-12 6 11-1 3 0 1-2 -8 3-4 2-2 -3-5 -2 0-2 7-5 4 0-1 -4 0-3 -1 0 4 1-1 0 3-2 -3 3 3-1 -1 3-2 5-2 4-2 2-3 0 F(u, v) Fig. 9.2: JPEG compression for a smooth image block. 8 Li, Drew, Liu

Fig. 9.2 (Cont d): JPEG compression for a smooth image block. 9 Li, Drew, Liu

Another 8 8 block from the Y image of Lena 70 70 100 70 87 87 150 187 85 100 96 79 87 154 87 113 100 85 116 79 70 87 86 196 136 69 87 200 79 71 117 96 161 70 87 200 103 71 96 113 161 123 147 133 113 113 85 161 146 147 175 100 103 103 163 187 156 146 189 70 113 161 163 197 f(i, j) -80-40 89-73 44 32 53-3 -135-59 -26 6 14-3 -13-28 47-76 66-3 -108-78 33 59-2 10-18 0 33 11-21 1-1 -9-22 8 32 65-36 -1 5-20 28-46 3 24-30 24 6-20 37-28 12-35 33 17-5 -23 33-30 17-5 -4 20 F(u, v) Fig. 9.2: JPEG compression for a smooth image block. 10 Li, Drew, Liu

Fig. 9.3 (Cont d): JPEG compression for a textured image block. 11 Li, Drew, Liu

Run-length Coding (RLC) on AC coefficients F ˆ ( u, v) RLC aims to turn the values into sets {#-zerosto-skip, next non-zero value}. To make it most likely to hit a long run of zeros: a zig-zag scan is used to turn the 8 8 matrix into a 64-vector. F ˆ ( u, v) Fig. 9.4: Zigzag scan in JPEG. 12 Li, Drew, Liu

DPCM on DC coefficients The DC coefficients are coded separately from the AC ones. Differential Pulse Code modulation (DPCM) is the coding method. If the DC coefficients for the first 5 image blocks are 150, 155, 149, 152, 144, then the DPCM would produce 150, 5, -6, 3, -8, assuming d i = DC i+1 DC i, and d 0 = DC 0. 13 Li, Drew, Liu

Entropy Coding The DC and AC coefficients finally undergo an entropy coding step to gain a possible further compression. Use DC as an example: each DPCM coded DC coefficient is represented by (SIZE, AMPLITUDE), where SIZE indicates how many bits are needed for representing the coefficient, and AMPLITUDE contains the actual bits. In the example we re using, codes 150, 5, 6, 3, 8 will be turned into (8, 10010110), (3, 101), (3, 001), (2, 11), (4, 0111). SIZE is Huffman coded since smaller SIZEs occur much more often. AMPLITUDE is not Huffman coded, its value can change widely so Huffman coding has no appreciable benefit. 14 Li, Drew, Liu

Table 9.3: Baseline entropy coding details size category. SIZ E AMPLITUDE 1-1, 1 2-3, -2, 2, 3 3-7..-4, 4..7 4-15..-8, 8..15...... 10-1023..-512, 512..1023 15 Li, Drew, Liu

Information Theory and Entropy Information theory studies the quantification, storage, and communication of information. It was originally proposed by Claude E. Shannon in 1948 to find fundamental limits on signal processing and communication operations such as data compression, in a landmark paper entitled "A Mathematical Theory of Communication".

Video Compression (MPEG) Started with MPEG-1/H.261 (1991, video CD) MPEG-2 was launched in 1994 and became the standard for DVDs. Finally, MPEG-4 was introduced in 1998. MPEG-4 ISO is based on H.263 MPEG-4/AVC is the new H.264 standard used in HD- DVD and Blu-ray Disc. Both MPEG-4 ISO and H.264/AVC are used for mobile video.

Basic Compression Algorithm Picture N Picture N+1 Difference -Encode picture N as JPEG -Perform Motion Estimation/Compensation on picture N+1 using picture N -Encode the difference as JPEG Motion Estimation / Compensation :

10.1 Introduction to Video Compression A video consists of a time-ordered sequence of frames, i.e., images. An obvious solution to video compression would be predictive coding based on previous frames. Compression proceeds by subtracting images: subtract in time order and code the residual error. It can be done even better by searching for just the right parts of the image to subtract from the previous frame. 19 Li, Drew, Liu

10.2 Video Compression with Motion Compensation Consecutive frames in a video are similar temporal redundancy exists. Temporal redundancy is exploited so that not every frame of the video needs to be coded independently as a new image. The difference between the current frame and other frame(s) in the sequence will be coded small values and low entropy, good for compression. Steps of Video compression based on Motion Compensation (MC): 1. Motion Estimation (motion vector search). 2. MC-based Prediction. 3. Derivation of the prediction error, i.e., the difference. 20 Li, Drew, Liu

Motion Compensation Each image is divided into macroblocks of size N x N. - By default, N = 16 for luminance images. For chrominance images, N = 8 if 4:2:0 chroma subsampling is adopted. Motion compensation is performed at the macroblock level. - The current image frame is referred to as Target Frame. - A match is sought between the macroblock in the Target Frame and the most similar macroblock in previous and/or future frame(s) (referred to as Reference frame(s)). - The displacement of the reference macroblock to the target macroblock is called a motion vector MV. - Figure 10.1 shows the case of forward prediction in which the Reference frame is taken to be a previous frame. 21 Li, Drew, Liu

Fig. 10.1: Macroblocks and Motion Vector in Video Compression. MV search is usually limited to a small immediate neighborhood both horizontal and vertical displacements in the range [ p, p]. This makes a search window of size (2p + 1) x (2p + 1). 22 Li, Drew, Liu

10.3 Search for Motion Vectors The difference between two macroblocks can then be measured by their Mean Absolute Difference (MAD): N 1 N 1 MAD( i, j) 1 C( x k, y l) R( x i k, y j l) 2 N k 0 l 0 (10.1) N size of the macroblock, k and l indices for pixels in the macroblock, i and j horizontal and vertical displacements, C ( x + k, y + l ) pixels in macroblock in Target frame, R ( x + i + k, y + j + l ) pixels in macroblock in Reference frame. The goal of the search is to find a vector (i, j) as the motion vector MV = (u, v), such that MAD(i, j) is minimum: ( u, v) ( i, j) MAD( i, j) is minimum, i [ p, p], j [ p, p] (10.2) 23 Li, Drew, Liu

Sequential Search Sequential search: sequentially search the whole (2p + 1) x (2p + 1) window in the Reference frame (also referred to as Full search). - a macroblock centered at each of the positions within the window is compared to the macroblock in the Target frame pixel by pixel and their respective MAD is then derived using Eq. (10.1). - The vector (i, j) that offers the least MAD is designated as the MV (u, v) for the macroblock in the Target frame. - Sequential search method is very costly assuming each pixel comparison requires three operations (subtraction, absolute value, addition), the cost for obtaining a motion vector for a single macroblock is (2p + 1) (2p + 1) N 2 3 O ( p 2 N 2 ). 24 Li, Drew, Liu

PROCEDURE 10.1 Motionvector:sequential-search BEGIN min_mad = LARGE NUMBER; /* Initialization */ for i = p to p { for j = p to p { cur_mad = MAD(i, j); if cur_mad < min_mad { min_mad = cur_mad; u = i; /* Get the coordinates for MV. */ v = j; } } } END 25 Li, Drew, Liu

2D Logarithmic Search Logarithmic search: a more efficient version, that is suboptimal but still usually effective. The procedure for 2D Logarithmic Search of motion vectors takes several iterations and is akin to a binary search: - As illustrated in Fig. 10.2, initially only nine locations in the search window are used as seeds for a MAD-based search; they are marked as 1. - After the one that yields the minimum MAD is located, the center of the new search region is moved to it and the step-size ( offset ) is reduced to half. - In the next iteration, the nine new locations are marked as 2 and so on. 26 Li, Drew, Liu

Fig. 10.2: 2D Logarithmic Search for Motion Vectors. 27 Li, Drew, Liu

PROCEDURE 10.2 Motion-vector:2Dlogarithmic-search BEGIN offset = ; Specify nine macroblocks within the search window in the Reference frame, they are centered at (x 0, y 0 ) and separated by offset horizontally and/or vertically; WHILE last TRUE { Find one of the nine specified macroblocks that yields minimum MAD; IF offset = 1 THEN last = TRUE; offset = offset/2 ; Form a search region with the new offset and new center found; } END 28 Li, Drew, Liu

Using the same example as in the previous subsection, the total operations per second is dropped to: 2 OPS _ per _ second 8 (log 2 p 1) 1 N 3 720 480 30 N N 2 8 log 215 9 16 3 720 480 30 16 16 9 1.25 10 29 Li, Drew, Liu

Hierarchical Search The search can benefit from a hierarchical (multiresolution) approach in which initial estimation of the motion vector can be obtained from images with a significantly reduced resolution. Figure 10.3: a three-level hierarchical search in which the original image is at Level 0, images at Levels 1 and 2 are obtained by down-sampling from the previous levels by a factor of 2, and the initial search is conducted at Level 2. Since the size of the macroblock is smaller and p can also be proportionally reduced, the number of operations required is greatly reduced. 30 Li, Drew, Liu

Fig. 10.3: A Three-level Hierarchical Search for Motion Vectors. 31 Li, Drew, Liu

Hierarchical Search (Cont d) Given the estimated motion vector (u k, v k ) at Level k, a 3 x 3 neighborhood centered at (2 u k, 2 v k ) at Level k 1 is searched for the refined motion vector. The refinement is such that at Level k 1 the motion vector (u k 1, v k 1 ) satisfies: (2u k 1 u k 1 2u k +1, 2v k 1 v k 1 2v k +1) Let (x k, 0 yk 0 ) denote the center of the macroblock at Level k in the target frame. The procedure for hierarchical motion vector search for the macroblock centered at (x 0, 0 y0 0 ) in the Target frame can be outlined as follows: 32 Li, Drew, Liu

PROCEDURE 10.3 Motionvector:hierarchical-search BEGIN END // Get macroblock center position at the lowest resolution Level k x k 0 = x 0 0 /2 k ; y k 0 = y 0 0 / 2 k ; Use Sequential (or 2D Logarithmic) search method to get initial estimated MV(u k, v k ) at Level k; WHILE last TRUE { } Find one of the nine macroblocks that yields minimum MAD at Level k 1 centered at (2(x k 0+u k ) 1 x 2(x k 0+u k ) + 1; 2(y k 0+v k ) 1 y 2(y k 0 +v k ) + 1 ); IF k = 1 THEN last = TRUE; k = k 1; Assign (x k 0 ; yk 0 ) and (uk, v k ) with the new center location and MV; 33 Li, Drew, Liu

Table 10.1: Comparison of Computational Cost of Motion Vector Search Methods 34 Li, Drew, Liu

Decoding example Residual Signal Motion- Compensated + Decoded Signal

Video Compression Reference http://mia.ece.uic.edu/~papers/www/multimediastandar ds/chapter7.pdf

Project 2 Roundup