Transform coding: topics
- Principle of block-wise transform coding
- Properties of orthonormal transforms
- Discrete cosine transform (DCT)
- Bit allocation for transform coefficients
- Threshold coding
- Typical coding artifacts
- Fast implementation of the DCT
Bernd Girod: EE38b Image and Video Compression Transform Coding no.

Principle of block-wise transform coding
[Figure: block diagram. original image -> original image block -> Transform A -> transform coefficients -> Quantization & Transmission -> quantized transform coefficients -> Inverse transform $A^{-1}$ -> reconstructed block -> reconstructed image]
Properties of orthonormal transforms
- Forward transform: $y = A x$, where $x$ is the input signal block of size $N \times N$ arranged as a vector, $A$ is the transform matrix of size $N^2 \times N^2$, and $y$ holds the $N \times N$ transform coefficients arranged as a vector.
- Inverse transform: $x = A^{-1} y = A^T y$.
- Linearity: $x$ is represented as a linear combination of basis functions.
- Parseval's theorem holds: the transform is a rotation of the signal vector around the origin of an $N^2$-dimensional vector space.

Separable orthonormal transforms, I
- An orthonormal transform is separable if the transform of a signal block of size $N \times N$ can be expressed as $y = A x A^T$, where $x$ is the $N \times N$ block of input signal, $A$ is an orthonormal transform matrix of size $N \times N$, and $y$ is the $N \times N$ block of transform coefficients.
- Note: the $N^2 \times N^2$ matrix of the vectorized transform is the Kronecker product $A \otimes A$.
- The inverse transform is $x = A^T y A$.
- Great practical importance: the transform requires 2 matrix multiplications of size $N \times N$ instead of one multiplication of a vector of size $1 \times N^2$ with a matrix of size $N^2 \times N^2$. This reduces the complexity from $O(N^4)$ to $O(N^3)$.
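The equivalence between the separable form and the Kronecker-product form can be checked numerically. The following is a minimal sketch in plain Python, assuming a 2x2 orthonormal matrix and a 2x2 example block chosen only for illustration, with row-by-row vectorization of the block; the helper names `matmul`, `transpose`, and `kron` are hypothetical and not part of the slides.

```python
import math

def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*P)]

def kron(P, Q):
    """Kronecker product of two matrices."""
    return [[p * q for p in prow for q in qrow]
            for prow in P for qrow in Q]

s = 1.0 / math.sqrt(2.0)
A = [[s, s], [s, -s]]            # illustrative 2x2 orthonormal matrix (assumption)
X = [[1.0, 2.0], [3.0, 4.0]]     # illustrative 2x2 signal block (assumption)

# Separable form: two small matrix multiplications.
Y_sep = matmul(matmul(A, X), transpose(A))

# Vectorized form: one multiplication with the N^2 x N^2 matrix kron(A, A).
x_vec = [[v] for row in X for v in row]          # row-by-row vectorization
y_vec = [r[0] for r in matmul(kron(A, A), x_vec)]
y_sep_flat = [v for row in Y_sep for v in row]

# Inverse transform x = A^T y A recovers the block.
X_rec = matmul(matmul(transpose(A), Y_sep), A)

# Parseval: signal energy is preserved by the orthonormal transform.
energy_x = sum(v * v for row in X for v in row)
energy_y = sum(v * v for v in y_sep_flat)
```

For this example both forms give the coefficient block [[5, -1], [-2, 0]] up to rounding, and the energy is 30 in both domains.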
Separable orthonormal transforms, II
- The 2-d transform is realized by two one-dimensional transforms (along rows and columns of the signal block).
[Figure: $N \times N$ block of pixels $x$ -> column-wise N-transform -> $A x$ -> row-wise N-transform -> $A x A^T$, the $N \times N$ block of transform coefficients]

Criteria for the selection of a particular transform
- Decorrelation, energy concentration (e.g., KLT, DCT, ...)
- Visually pleasant basis functions (e.g., pseudo-random noise, m-sequences, lapped transforms)
- Low complexity of computation
Karhunen Loève Transform (KLT)
- The Karhunen Loève Transform (KLT) yields decorrelated transform coefficients.
- Basis functions are eigenvectors of the covariance matrix of the input signal.
- KLT achieves optimum energy concentration.
- Disadvantages:
  - KLT is dependent on signal statistics.
  - KLT is not separable for image blocks.
  - The transform matrix cannot be factored into sparse matrices.

Comparison of various transforms, I
- Karhunen Loève transform (1948/1960)
- Haar transform (1910)
- Walsh-Hadamard transform (1923)
- Slant transform (Enomoto, Shibata, 1971)
- Discrete Cosine Transform (DCT) (Ahmed, Natarajan, Rao, 1974)
[Figure: comparison of 1-d basis functions for block size N=8]
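The decorrelation property of the KLT can be illustrated on the smallest possible case. The sketch below assumes a two-sample signal with unit variances and an assumed correlation coefficient `rho = 0.9` (a value chosen for illustration, not taken from the slides). For this covariance matrix the eigenvectors are known in closed form, so no eigenvalue solver is needed.

```python
import math

def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*P)]

rho = 0.9                               # assumed correlation between the two samples
C_x = [[1.0, rho], [rho, 1.0]]          # covariance matrix of the input signal

# Rows of A are the unit-norm eigenvectors of C_x: (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
s = 1.0 / math.sqrt(2.0)
A = [[s, s], [s, -s]]

# Covariance of the transform coefficients y = A x is A C_x A^T.
C_y = matmul(matmul(A, C_x), transpose(A))

# Fraction of the total energy carried by the first coefficient.
energy_fraction = C_y[0][0] / (C_y[0][0] + C_y[1][1])
```

Here `C_y` is diagonal with the eigenvalues 1 + rho and 1 - rho on the diagonal, so the coefficients are uncorrelated and the first one carries 95% of the energy for the assumed rho.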
Comparison of various transforms, II
Energy concentration measured for typical natural images, block size x3 [Lohscheller]:
- KLT is optimum.
- DCT performs only slightly worse than KLT.

Discrete cosine transform and discrete Fourier transform
- Transform coding of images using the Discrete Fourier Transform (DFT): for stationary image statistics, the energy concentration properties of the DFT converge to those of the KLT for large block sizes.
- Problem of blockwise DFT coding: blocking effects due to the circular topology of the DFT, and Gibbs phenomena.
- Remedy: reflect the image at block boundaries and take the DFT of the larger symmetric block -> DCT.
[Figure: two ways of reflecting a block, labeled "edge folded" and "pixel folded"]
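The link between the DFT of a mirrored block and the DCT can be checked on a one-dimensional example. The sketch below assumes the variant in which the block is mirrored including its last sample, so the extended block has length 2N, and it uses an unnormalized DCT-II (no scaling factors). The function names and the test vector are illustrative assumptions.

```python
import cmath
import math

def dct2_direct(x):
    """Unnormalized DCT-II computed from its cosine definition."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n in range(N)) for k in range(N)]

def dct2_via_mirrored_dft(x):
    """Same coefficients, obtained from the DFT of the mirrored block."""
    N = len(x)
    y = list(x) + list(reversed(x))       # symmetric block of length 2N
    out = []
    for k in range(N):
        # DFT coefficient k of the length-2N symmetric block.
        Y_k = sum(y[n] * cmath.exp(-2j * cmath.pi * k * n / (2 * N))
                  for n in range(2 * N))
        # A half-sample phase shift makes the result real; divide by 2
        # because every input sample appears twice in the mirrored block.
        out.append((cmath.exp(-1j * cmath.pi * k / (2 * N)) * Y_k).real / 2.0)
    return out

x = [8.0, 6.0, 5.0, 9.0, 12.0, 11.0, 7.0, 4.0]   # illustrative sample block
direct = dct2_direct(x)
mirrored = dct2_via_mirrored_dft(x)
max_difference = max(abs(a - b) for a, b in zip(direct, mirrored))
```

Because the mirrored block has no jump at its periodic boundary, the artificial discontinuity seen by a plain blockwise DFT is avoided.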
DCT
- The Type II DCT of blocksize $M \times M$ is defined by a transform matrix $A$ containing the elements
  $a_{ik} = \alpha_i \cos\left(\frac{\pi (2k+1) i}{2M}\right)$ for $i, k = 0, \ldots, M-1$,
  with $\alpha_0 = \sqrt{1/M}$ and $\alpha_i = \sqrt{2/M}$ for $i \neq 0$.
[Figure: 2D basis functions of the DCT]

Amplitude distribution of the DCT coefficients
- Histograms for 8x8 DCT coefficient amplitudes measured for natural images (from Mauersberger):
  - The DC coefficient is typically uniformly distributed.
  - For the other coefficients, the distribution resembles a Laplacian pdf.
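The DCT matrix can be built directly from the standard Type II definition, $a_{ik} = \alpha_i \cos(\pi (2k+1) i / (2M))$ with $\alpha_0 = \sqrt{1/M}$ and $\alpha_i = \sqrt{2/M}$ otherwise. The following sketch checks orthonormality for M = 8 and applies the separable 2-d transform to a constant block; the helper names and the block value 10 are assumptions made for illustration.

```python
import math

def dct_matrix(M):
    """Orthonormal Type II DCT matrix of size M x M."""
    A = []
    for i in range(M):
        alpha = math.sqrt(1.0 / M) if i == 0 else math.sqrt(2.0 / M)
        A.append([alpha * math.cos(math.pi * (2 * k + 1) * i / (2 * M))
                  for k in range(M)])
    return A

def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*P)]

M = 8
A = dct_matrix(M)

# Orthonormality: A A^T should be the identity matrix.
gram = matmul(A, transpose(A))
max_err = max(abs(gram[i][j] - (1.0 if i == j else 0.0))
              for i in range(M) for j in range(M))

# 2-d DCT of a constant block: only the DC coefficient is non-zero.
flat_block = [[10.0] * M for _ in range(M)]
Y = matmul(matmul(A, flat_block), transpose(A))
```

For the constant block the DC coefficient equals M times the pixel value (80 here) and all AC coefficients vanish up to rounding.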
Bit allocation for transform coefficients I
- Problem: divide the bit-rate $R$ among the $M \times M$ transform coefficients such that the resulting distortion $D$ is minimized.
- Assumptions:
  - $R = \sum_i R_i$ (total rate is the sum of the rates $R_i$ for each coefficient $i$)
  - $D = \sum_i D_i$ (total distortion is the sum of the distortions $D_i$ contributed by each coefficient $i$)
- These assumptions lead to the "Pareto condition":
  $\frac{\partial D_i}{\partial R_i} = \frac{\partial D_j}{\partial R_j}$ for all $i, j$

Bit allocation for transform coefficients II
- Additional assumptions (Gaussian r.v. and mse distortion) yield the optimum rate for each transform coefficient $i$:
  $R_i = \max\left\{0, \frac{1}{2} \log_2 \frac{\sigma_i^2}{\theta}\right\}$ bit
  $D_i = \min\left\{\sigma_i^2, \theta\right\}$
  where $\sigma_i^2$ is the variance of transform coefficient $i$ and $\theta$ is a threshold parameter that is the same for all $i$.
- In practice, with variable length coding, one often uses distortion allocation instead of bit allocation.
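The allocation rule for Gaussian coefficients and mse distortion, $R_i = \max\{0, \frac{1}{2}\log_2(\sigma_i^2/\theta)\}$ and $D_i = \min\{\sigma_i^2, \theta\}$, can be evaluated numerically. The sketch below uses assumed example variances and a simple bisection to find the threshold parameter for a target total rate; the function names, the variances, and the target of 3 bits are illustrative choices.

```python
import math

def allocate(variances, theta):
    """Per-coefficient rate (bits) and distortion for threshold parameter theta."""
    rates = [max(0.0, 0.5 * math.log2(v / theta)) for v in variances]
    dists = [min(v, theta) for v in variances]
    return rates, dists

def find_theta(variances, target_rate, iterations=200):
    """Bisection on theta: the total rate decreases monotonically as theta grows."""
    lo, hi = 1e-12, max(variances)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        total = sum(allocate(variances, mid)[0])
        if total > target_rate:
            lo = mid          # rate too high, raise the threshold
        else:
            hi = mid          # rate at or below target, lower the threshold
    return 0.5 * (lo + hi)

variances = [16.0, 4.0, 1.0, 0.25]      # assumed coefficient variances
rates, dists = allocate(variances, 1.0)  # theta = 1 chosen for illustration
theta_for_3_bits = find_theta(variances, 3.0)
```

With theta = 1 the rates are 2, 1, 0, and 0 bits and the distortions are 1, 1, 1, and 0.25. Coefficients whose variance does not exceed the threshold receive no bits and contribute their full variance as distortion.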
Bit allocation for transform coefficients III
- Extension to a weighted m.s.e. distortion measure:
  $D = \sum_i w_i D_i$
  $R_i = \max\left\{0, \frac{1}{2} \log_2 \frac{w_i \sigma_i^2}{\theta}\right\}$ bit
  $D_i = \min\left\{\sigma_i^2, \frac{\theta}{w_i}\right\}$
- Often implemented by scaling the coefficients by $\sqrt{w_i}$ prior to quantization ("weighting matrix").

Threshold coding, I
- Transform coefficients that fall below a threshold are discarded.
- Implementation by a uniform quantizer with threshold characteristic.
[Figure: quantizer characteristic, quantizer output versus quantizer input]
- Positions of non-zero transform coefficients are transmitted in addition to their amplitude values.
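A threshold characteristic can be sketched with a uniform quantizer whose zero bin is wider than the other bins. The version below is one common form and an assumption, not necessarily the characteristic drawn on the slide: magnitudes are truncated toward zero, so every input with magnitude below the step size maps to level 0, and non-zero levels are reconstructed at the midpoint of their interval. The coefficient values and step size are illustrative.

```python
def quantize(coefficient, step):
    """Map a coefficient to an integer level; |coefficient| < step gives 0."""
    level = int(abs(coefficient) / step)      # truncation creates the dead zone
    return -level if coefficient < 0 else level

def dequantize(level, step):
    """Reconstruct a non-zero level at the midpoint of its quantization interval."""
    if level == 0:
        return 0.0
    magnitude = (abs(level) + 0.5) * step
    return -magnitude if level < 0 else magnitude

coefficients = [100.0, -37.0, 12.0, -7.0, 3.0]   # assumed transform coefficients
step = 10.0                                       # assumed quantizer step size

levels = [quantize(c, step) for c in coefficients]
reconstructed = [dequantize(q, step) for q in levels]
nonzero_positions = [i for i, q in enumerate(levels) if q != 0]
```

Here the levels are [10, -3, 1, 0, 0]: the two smallest coefficients are discarded, and only the positions and amplitudes of the remaining three need to be transmitted.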
Threshold coding, II
- Efficient encoding of the position of non-zero transform coefficients: zig-zag-scan + run-level-coding.
[Figure: ordering of the transform coefficients by zig-zag-scan]

Threshold coding, III
[Figure: worked example for one 8x8 block. Processing chain: original 8x8 block -> DCT -> quantization Q -> zig-zag-scan -> run-level-coding -> transmission -> run-level-decoding -> inverse zig-zag-scan -> scaling and inverse DCT -> reconstructed 8x8 block. The mean of the block is transmitted first, followed by (run, level) pairs for the remaining coefficients and an end-of-block (EOB) symbol.]
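The combination of zig-zag-scan and run-level-coding can be sketched as follows. The example assumes a 4x4 block of already quantized coefficients (invented values, smaller than the 8x8 block of the slides to keep the output short), sends the first coefficient separately, and represents the remaining ones as (run of zeros, level) pairs terminated by an EOB marker. The function names are hypothetical.

```python
def zigzag_order(n):
    """Scan positions (row, col) of an n x n block along anti-diagonals."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def run_level_encode(ac):
    """Encode a coefficient sequence as (zero run, level) pairs plus EOB."""
    pairs, run = [], 0
    for value in ac:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")            # trailing zeros are implied by EOB
    return pairs

def run_level_decode(pairs, length):
    """Invert run_level_encode; pad with zeros up to the given length."""
    ac = []
    for pair in pairs:
        if pair == "EOB":
            break
        run, level = pair
        ac.extend([0] * run + [level])
    return ac + [0] * (length - len(ac))

block = [[12, 3, 0, 0],            # assumed quantized coefficients
         [-2, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 0]]
n = len(block)
order = zigzag_order(n)
scan = [block[i][j] for i, j in order]
dc, ac = scan[0], scan[1:]
pairs = run_level_encode(ac)

# Decoder side: rebuild the scan and undo the zig-zag ordering.
decoded_scan = [dc] + run_level_decode(pairs, len(ac))
rebuilt = [[0] * n for _ in range(n)]
for (i, j), value in zip(order, decoded_scan):
    rebuilt[i][j] = value
```

For this block the encoder produces (0, 3), (0, -2), (5, 1), EOB after the separately sent first coefficient 12, and the decoder rebuilds the block exactly.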
Detail in a block vs. DCT coefficients transmitted
[Figure: four columns showing the image block, the DCT coefficients of the block, the quantized DCT coefficients of the block, and the block reconstructed from the quantized coefficients]

Typical DCT coding artifacts
- DCT coding with increasingly coarse quantization, block size 8x8.
[Figure: three decoded images, each labeled with its quantizer stepsize for the AC coefficients]
Adaptive transform coding
[Figure: block diagram. Input signal -> Transform -> Quantization -> Entropy coding; a block classification stage derives a class from the input signal and controls quantization and entropy coding]
- Quantization and entropy coding are optimized separately for each class.
- Typical classes:
  - Blocks without detail
  - Horizontal structures
  - Vertical structures
  - Diagonals
  - Textures without preferred orientation

Influence of DCT block size
- Efficiency as a function of blocksize $N \times N$, measured for 8 bit quantization in the original domain and equivalent quantization in the transform domain:
  $G = \frac{\text{memoryless entropy of original signal}}{\text{mean entropy of transform coefficients}}$
- Block size 8x8 is a good compromise.
Fast DCT algorithm I
- DCT matrix factored into sparse matrices (Arai, Agui, and Nakajima; 1988):
  $y = A x = S P M_1 M_2 M_3 M_4 M_5 M_6 x$
  where $S = \mathrm{diag}(s_0, s_1, \ldots, s_7)$ is a diagonal scaling matrix, $P$ is a permutation matrix, and $M_1, \ldots, M_6$ are sparse matrices.
[Figure: entries of the matrices $S$, $P$, and $M_1, \ldots, M_6$]

Fast DCT algorithm II
- Signal flow graph for the fast (scaled) 8-DCT according to Arai, Agui, Nakajima:
[Figure: signal flow graph with inputs $x_0, \ldots, x_7$, multipliers $a_1, \ldots, a_5$, scaling factors $s_0, \ldots, s_7$, and outputs $y_0, \ldots, y_7$. Legend: a multiplication maps $u$ to $m \cdot u$; an addition (butterfly) maps $u, v$ to $u+v$ and $u-v$.]
- Only 5 + 8 multiplications (direct matrix multiplication: 64 multiplications).
- Multiplier constants:
  $a_1 = C_4$, $a_2 = C_2 - C_6$, $a_3 = C_4$, $a_4 = C_6 + C_2$, $a_5 = C_6$
- Scaling factors:
  $s_0 = \frac{1}{2\sqrt{2}}$, $s_k = \frac{1}{4 C_k}$ for $k = 1, \ldots, 7$, with $C_k = \cos\left(\frac{\pi k}{16}\right)$
Transform coding: summary
- Orthonormal transform: rotation of the coordinate system in signal space.
- Purpose of the transform: decorrelation, energy concentration.
- KLT is optimum, but signal dependent and, hence, without a fast algorithm.
- DCT shows reduced blocking artifacts compared to DFT.
- Bit allocation is proportional to the logarithm of the variance.
- Threshold coding + zig-zag-scan + 8x8 block size is widely used today (e.g. JPEG, MPEG, ITU-T H.263).
- Fast algorithm for the scaled 8-DCT: 5 multiplications, 29 additions.