Basic Principles of Video Coding


Basic Principles of Video Coding
Introduction
Categories of Video Coding Schemes
Information Theory
Overview of Video Coding Techniques: predictive coding, transform coding, quantization, entropy coding, motion estimation

Basic Principles of Video Coding
Fig. 3.1 A typical video coding system: the encoder consists of analysis/source encoding (driven by a source model), lossy quantization (quantizer parameters) and lossless binary encoding (parameter statistics); the coded bit stream passes through a channel subject to noise; the decoder performs binary decoding, dequantization and synthesis/source decoding.

Basic Principles of Video Coding: Analysis/Synthesis & Source Encoding/Decoding
The source model used in the analysis/synthesis part of a video coding system may make assumptions about the spatial and temporal correlation between pixels of a sequence. It might also consider the shape and motion of objects or illumination effects. If the source model consists of statistically independent pixels, the parameters of this model are the luminance and chrominance amplitudes. However, if we use an object model, the parameters are the shape, texture and motion of individual objects. Depending on the source model, different source coding schemes can be employed.

Basic Principles of Video Coding: Quantization/Dequantization
The parameters of the source model are quantized into a finite set of symbols. The quantized parameters depend on the desired trade-off between bit rate and distortion.
Binary Encoding/Decoding
The quantized parameters are finally mapped into binary codewords using lossless coding techniques. The resulting bit stream is transmitted over the communication channel, where noise may corrupt it. The decoder performs the reverse processes, i.e., binary decoding, dequantization and synthesis.

Categories of Source Coding Schemes: Waveform Coding
Assuming statistical independence between pixels, the simplest waveform coding technique is pulse code modulation (PCM). It is seldom used because of its inefficiency. Predictive coding exploits the correlation between adjacent pixels by coding the prediction errors between the predicted pixels and the pixels to be coded. To exploit the correlation within a block of pixels, the pixel block is transformed using unitary transforms, such as the Karhunen-Loeve (KLT), discrete cosine (DCT) or discrete wavelet (DWT) transforms. The transform serves to decorrelate the original pixels and concentrate the signal energy into a few coefficients, which are then coded.

Categories of Source Coding Schemes: Content-based Coding
Block-based coding techniques approximate the shape of objects in a scene with square blocks of fixed size, which results in high prediction errors in boundary blocks. Content-based coding techniques segment a video frame into regions corresponding to different objects and code those objects separately. In object-based analysis-synthesis coding, the objects are segmented and described by individual object models. The shape of an object is described by a 2-D silhouette, its motion by a motion vector field, and its texture by its color waveform. The decoder synthesizes the object using the current shape and motion parameters as well as the color parameters from the preceding frame.

Categories of Source Coding Schemes: Model-based Coding
If the object type in a video sequence is known, e.g., a human head, we can use a specially designed wireframe model to describe the object. This approach is called model-based coding; it is highly efficient because the model adapts to the shape of the object. In semantic coding, the object type as well as the parameters describing its behavior, e.g., facial expressions, are coded and transmitted. This coding scheme has the potential of achieving very high coding efficiency, because the number of possible behaviors is small and hence the number of bits required to specify them is correspondingly small.

Categories of Source Coding Schemes
Table 3.1 Comparison of source models, parameter sets and coding techniques.

Source Model                             | Encoding Parameters                           | Coding Technique
Statistically independent pixels         | Color of each pixel                           | PCM
Statistically dependent pixels           | Color of each block                           | Transform coding, predictive coding and vector quantization
Translationally moving blocks            | Color and motion vector of each block         | Block-based hybrid coding
Unknown moving objects                   | Shape, motion and color of each object        | Object-based analysis-synthesis coding
Known moving objects                     | Shape, motion and color of each known object  | Model-based (or knowledge-based) coding
Known moving objects with known behavior | Shape, color and behavior of each object      | Semantic coding

Source Model: Information Theory
Consider a discrete-time, ergodic source which generates sequences $\{x(n)\}$ of $N$ source symbols. The sequences can be considered as realizations of random sequences $\{X(n)\}$, with the random variables $X(n)$ taking amplitude values $x_k$, $k = 1, 2, \dots, K$. The source is a discrete-amplitude source if the set of amplitudes $\{x_k\}$ is finite; otherwise it is a continuous-amplitude source. The source is memoryless if successive samples are statistically independent. A sequence is ergodic if its ensemble averages equal its time averages; ergodicity implies stationarity.

Source Model: Source Entropy
For discrete-amplitude memoryless sources, the entropy is
$$H(X) = -\sum_{k=1}^{K} P(x_k)\log_2 P(x_k)$$
and for discrete-amplitude sources with memory,
$$H(X) = -\lim_{N\to\infty}\frac{1}{N}\sum_{\text{all }\mathbf{x}} P(\mathbf{x})\log_2 P(\mathbf{x})$$
where $\mathbf{x}$ is a vector of $N$ successive samples $x(n), x(n+1), \dots, x(n+N-1)$. It follows that
$$H(X)_{\text{with memory}} < H(X)_{\text{without memory}} \le \log_2 K.$$

Source Model
The source redundancy, given by
$$R(X) = \log_2 K - H(X),$$
is due to two factors: a non-uniform distribution of probabilities and the presence of memory. For a memoryless source with $P(x_k) = 1/K$, $H(X) = \log_2 K$ and the redundancy is zero. For continuous-amplitude memoryless sources,
$$h(X) = -\int p_X(x)\log_2 p_X(x)\,dx$$
and for continuous-amplitude sources with memory,
$$h(X) = -\lim_{N\to\infty}\frac{1}{N}\int\!\cdots\!\int p_X(\mathbf{x})\log_2 p_X(\mathbf{x})\,d\mathbf{x}.$$

Source Model
It can be shown that for a Gaussian pdf with a given source variance $\sigma_X^2$,
$$h(X)_{\text{with memory}} < h(X)_{\text{without memory}} = \tfrac{1}{2}\log_2(2\pi e\sigma_X^2).$$
Note that $h(X)$ is formally called a differential entropy, and that it is useful to define the entropy power
$$Q = \frac{1}{2\pi e}\,2^{2h(X)},$$
which attains its maximum value $\sigma_X^2$ for a memoryless Gaussian source.

Source Entropy
Fig. 3.2 A frame of the Calendar sequence and its histogram (entropy 7.67 bits/pixel).

Source Entropy
Fig. 3.3 Frame-difference image between two successive frames of Calendar and its histogram (entropy 5.3 bits/pixel).
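
These entropy figures can be reproduced directly from a pixel histogram. The sketch below (NumPy, with synthetic stand-in data rather than the actual Calendar frames; the helper name zero_order_entropy is hypothetical) estimates the zero-order entropy $H(X) = -\sum_k P(x_k)\log_2 P(x_k)$ of a frame and of a frame difference; the difference image has a much more peaked histogram and therefore a lower entropy.

```python
import numpy as np

def zero_order_entropy(image: np.ndarray) -> float:
    """Estimate H(X) = -sum_k P(x_k) log2 P(x_k) from the pixel histogram."""
    _, counts = np.unique(image.ravel(), return_counts=True)
    p = counts / counts.sum()               # empirical symbol probabilities
    return float(-np.sum(p * np.log2(p)))   # bits per pixel

# Synthetic stand-ins for a frame and the next frame (the real Calendar
# sequence would give the 7.67 and 5.3 bits/pixel quoted above).
rng = np.random.default_rng(0)
frame1 = rng.integers(0, 256, size=(288, 352))
frame2 = np.clip(frame1 + rng.integers(-4, 5, size=frame1.shape), 0, 255)

print("frame entropy      :", zero_order_entropy(frame1))
print("difference entropy :", zero_order_entropy(frame2 - frame1))
```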

Rate Distortion Theory
Fig. 3.4 A typical communication system: source ($x$) → source encoder ($y$) → channel encoder ($z$) → channel → channel decoder ($\tilde{z}$) → source decoder ($\tilde{y}$) → destination ($\tilde{x}$).

Rate Distortion Theory
In Figure 3.4, the source produces $N$-tuples, or blocks of $N$ pixels, $x$, which the source coder uses to produce symbols $y$. The channel coder converts the symbols $y$ into channel symbols $z$ that the channel transmits. The channel is characterised by a capacity of $C$ bits per pixel, or $NC$ bits per channel symbol. The channel decoder receives $\tilde{z}$ and converts it to $\tilde{y}$, which the source decoder uses to produce $\tilde{x}$, made available to the destination. For a given $P(z)$, the mutual information is defined as
$$I(z,\tilde{z}) = \sum_{z,\tilde{z}} P(z)\,P(\tilde{z}\mid z)\log_2\frac{P(\tilde{z}\mid z)}{P(\tilde{z})} \quad\text{bits/symbol}$$
where $P(\tilde{z}\mid z)$ is the conditional probability; $I(z,\tilde{z})$ is a measure of the amount of information about $z$ that is conveyed to $\tilde{z}$ by the channel.

Rate Distortion Theory
Shannon has shown that the channel capacity is defined as
$$C = \frac{1}{N}\max_{P(z)} I(z,\tilde{z}) \quad\text{bits/pixel}$$
where the maximization is over all possible distributions of the channel symbols $z$.
Example: If the channel is noiseless, i.e.,
$$P(\tilde{z}\mid z) = \begin{cases}1, & \tilde{z} = z\\ 0, & \text{otherwise,}\end{cases}$$
then $I(z,\tilde{z}) = H(z)$ bits/symbol.

Rate Distortion Theory
That is, the channel conveys all of the information about $z$ to $\tilde{z}$. If the set $\{z\}$ has $K$ members and they are all equally likely, $I(z,\tilde{z})$ is maximized. In this case, $C = \frac{1}{N}\log_2 K$ bits/pixel, or $K = 2^{NC}$. If the channel is noisy, the achievable bit rate will be less than $C$. However, transmission at rates near $C$ is still possible by employing error-correction coding in the channel encoder, at the expense of transmission delay between input and output.
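
As a small illustration (not from the original notes), the sketch below evaluates $I(z,\tilde{z})$ numerically for an assumed source distribution: for a noiseless channel it equals $H(z)$, while a noisy binary channel conveys less than one bit per symbol. The helper name mutual_information is hypothetical.

```python
import numpy as np

def mutual_information(p_z, p_zt_given_z):
    """I(z, z~) = sum_{z,z~} P(z) P(z~|z) log2( P(z~|z) / P(z~) ) in bits/symbol."""
    p_zt = p_z @ p_zt_given_z                      # marginal distribution P(z~)
    total = 0.0
    for i, pz in enumerate(p_z):
        for j, pcond in enumerate(p_zt_given_z[i]):
            if pz > 0 and pcond > 0:
                total += pz * pcond * np.log2(pcond / p_zt[j])
    return total

p_z = np.array([0.5, 0.25, 0.125, 0.125])          # assumed source distribution
noiseless = np.eye(4)                              # P(z~|z) = 1 only when z~ = z
print(mutual_information(p_z, noiseless))          # 1.75 bits/symbol = H(z)

bsc = np.array([[0.9, 0.1], [0.1, 0.9]])           # binary channel with 10% errors
print(mutual_information(np.array([0.5, 0.5]), bsc))  # about 0.53 bits/symbol
```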

Rate Distortion Theory
For distortionless communication, i.e., $\tilde{x} = x$,
$$NC \ge H(y) = H(x) \quad\text{bits/block}.$$
In most video applications, distortionless communication is generally not a requirement; distortion is tolerated as long as it is not subjectively perceivable. Since the channel capacity needs only to satisfy $NC \ge H(y)$, considerable savings in bit rate are possible if $H(y) < H(x)$. In this case, the source coding is irreversible and information is lost.

Rate Distortion Theory
As $\tilde{x} \ne x$, we define a distortion measure $d(x,\tilde{x})$; the average distortion is $\bar{d} = E[\,d(x,\tilde{x})\,]$. If the conditional probability distribution between input and output is $P(\tilde{x}\mid x)$, the mutual information is given by
$$I(x,\tilde{x}) = \sum_{x,\tilde{x}} P(x)\,P(\tilde{x}\mid x)\log_2\frac{P(\tilde{x}\mid x)}{P(\tilde{x})}.$$
In lossy coding, an acceptable distortion threshold $D$ is determined such that
$$\bar{d} = \sum_{x,\tilde{x}} P(x,\tilde{x})\,d(x,\tilde{x}) \le D.$$

Rate Distortion Theory
The rate-distortion function is then defined as
$$R(D) = \frac{1}{N}\min_{P(\tilde{x}\mid x)} I(x,\tilde{x}) \quad\text{bits/pixel}.$$
$R(D)$ gives the lower bound on the channel capacity required to achieve $\bar{d} \le D$. For a memoryless zero-mean Gaussian source with variance $\sigma_x^2$ and a mean-square-error criterion,
$$R_G(D) = \begin{cases}\dfrac{1}{2}\log_2\dfrac{\sigma_x^2}{D}, & 0 < D \le \sigma_x^2\\[4pt] 0, & D > \sigma_x^2,\end{cases}$$
i.e., no information needs to be transmitted if $D \ge \sigma_x^2$.
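
A quick numerical check of this formula, with an assumed source variance, shows how the required rate falls by half a bit for every doubling of the tolerated distortion (a minimal sketch; the helper name rd_gaussian is hypothetical):

```python
import numpy as np

def rd_gaussian(sigma2: float, d: float) -> float:
    """R_G(D) = 0.5 * log2(sigma^2 / D) for 0 < D <= sigma^2, and 0 otherwise."""
    return 0.5 * np.log2(sigma2 / d) if d < sigma2 else 0.0

sigma2 = 100.0                                   # assumed source variance
for d in (1.0, 10.0, 50.0, 100.0, 200.0):
    print(f"D = {d:6.1f}  ->  R(D) = {rd_gaussian(sigma2, d):.3f} bits/pixel")
```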

Video Coding Standards - Overview
Fig. 3.5 A generic video standard encoder. Blocks: T transform, Q quantizer, P picture memory, F loop filter, CC coding control, ME motion estimation, VLC variable-length coder. Signals: h flag for INTRA/INTER, t flag for transmitted or not, q quantization parameters, v coded bit stream, m motion vector, f loop filter on/off.

Video Coding Standards - Overview
Fig. 3.6 A generic video standard decoder: buffer, demultiplexer, variable-length decoder, inverse quantization and inverse transform, with motion compensation driven by side information.

Video Coding Standards - Overview
All video coding standards, such as H.261, H.263, H.264, MPEG-1, MPEG-2 and MPEG-4, employ the block-based hybrid predictive transform coding technique. The image frame is sub-divided into blocks of fixed size. Each block is motion-compensated from the previous frame, resulting in a predicted image. The encoder subtracts this predicted image from the original image and transmits the prediction error. If the prediction is inaccurate, i.e., the prediction error exceeds a threshold, the block of pixels rather than the prediction errors is transformed. Motion vectors are transmitted separately so that the decoder can perform the same motion compensation to reconstruct the image block.
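
As an illustration of the motion-compensation step, here is a minimal full-search sketch (not the method of any particular standard; the helper name full_search and the test data are assumptions). It finds, for one block, the displacement in the previous frame that minimizes the sum of absolute differences; practical encoders use fast search strategies and sub-pixel refinement.

```python
import numpy as np

def full_search(block, ref, top, left, search=7):
    """Exhaustive block matching: return the displacement (dy, dx), within
    +/- search pixels, that minimizes the sum of absolute differences (SAD)."""
    h, w = block.shape
    best_sad, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + h <= ref.shape[0] and x + w <= ref.shape[1]:
                sad = np.abs(block - ref[y:y + h, x:x + w]).sum()
                if sad < best_sad:
                    best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad

rng = np.random.default_rng(1)
previous = rng.integers(0, 256, (64, 64)).astype(np.int32)
current = np.roll(previous, shift=(2, -3), axis=(0, 1))    # simulated global motion
mv, sad = full_search(current[16:32, 16:32], previous, 16, 16)
print("motion vector:", mv, "SAD:", sad)                   # expect (-2, 3) with SAD 0
```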

Predictive Coding
In predictive coding, the pixel itself is not coded; instead, its value is predicted from neighbouring pixels in the same frame or in the previous frame, to exploit the correlation that exists between adjacent pixels. Fig. 3.4 shows the block diagram of a generic lossy predictive coding system. In the encoder, an input sample $s$ is first predicted from the previously reconstructed samples $\hat{s}$ stored in the memory to form the predicted pixel $s_p$. The prediction error $e_p$ is then quantized and coded using a variable-length coder. The decoder structure resembles the prediction loop of the encoder, where the reconstructed samples $\hat{s}$ give the decoder output.

Predictive Coding
Fig. 3.4 A lossy predictive coding system.

Predictive Coding: Error Analysis of Predictive Coding
From Fig. 3.4, the prediction error is
$$e_p = s - s_p.$$
The reconstructed sample $\hat{s}$ is given by
$$\hat{s} = s_p + \hat{e}_p$$
where $\hat{e}_p$ is the quantized prediction error, i.e.,
$$\hat{e}_p = e_p + e_q,$$
where $e_q$ is the quantization error. From the above three equations, we can show that
$$\hat{s} - s = e_q.$$
This relation states that the coding error between the input and output of the predictive coder is the quantization error.
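
The relation $\hat{s} - s = e_q$ can be checked with a minimal first-order DPCM loop (a sketch with an assumed uniform quantizer and predictor coefficient, not a production codec; the helper name dpcm is hypothetical):

```python
import numpy as np

def dpcm(signal, step=4.0, a=0.95):
    """First-order DPCM with a uniform quantizer; returns the reconstructed
    samples s_hat and the quantization errors e_q."""
    recon = np.zeros(len(signal))
    eq = np.zeros(len(signal))
    prev = 0.0                                   # memory of the prediction loop
    for n, s in enumerate(signal):
        sp = a * prev                            # predicted sample s_p
        ep = s - sp                              # prediction error e_p
        ep_hat = step * np.round(ep / step)      # quantized prediction error
        eq[n] = ep_hat - ep                      # quantization error e_q
        recon[n] = sp + ep_hat                   # reconstructed sample s_hat
        prev = recon[n]                          # predict from reconstructed values
    return recon, eq

rng = np.random.default_rng(2)
s = np.cumsum(rng.normal(0, 2, 200)) + 128       # a slowly varying test signal
s_hat, e_q = dpcm(s)
print(np.allclose(s_hat - s, e_q))               # True: coding error equals e_q
```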

Predictive Coding: Optimal Linear Predictor Design
Let the current pixel be $s_0$ and let $s_k$, $k = 1, 2, \dots, K$, be the previous pixels used to predict it. In linear prediction, the predicted value of $s_0$ is given by
$$s_p = \sum_{k=1}^{K} a_k s_k$$
where the $a_k$ are called the prediction coefficients and $K$ is the order of prediction. To determine the prediction coefficients, we minimize the mean square error (MSE) between $s_0$ and $s_p$.

Predictive Coding
Let $S_k$ be the random variables (RVs) corresponding to $s_k$, and $S_p$ the RV corresponding to $s_p$. The MSE is
$$\sigma_p^2 = E\big[(S_0 - S_p)^2\big] = E\Big[\Big(S_0 - \sum_{k=1}^{K} a_k S_k\Big)^2\Big].$$
Minimizing with respect to the prediction coefficients $a_k$, i.e., setting $\partial\sigma_p^2/\partial a_m = 0$, we obtain
$$E\Big[\Big(S_0 - \sum_{k=1}^{K} a_k S_k\Big) S_m\Big] = 0, \quad m = 1, 2, \dots, K.$$
This relation is the orthogonality principle for the linear minimum-MSE estimator, which states that the prediction error is orthogonal to each past sample used for prediction.

Predictive Coding
Let $R(k,m) = E[S_k S_m]$ be the correlation between $S_k$ and $S_m$. The above equation can thus be rewritten as
$$\sum_{k=1}^{K} a_k R(k,m) = R(0,m), \quad m = 1, 2, \dots, K,$$
or in matrix form,
$$\begin{bmatrix} R(1,1) & R(1,2) & \cdots & R(1,K)\\ R(2,1) & R(2,2) & \cdots & R(2,K)\\ \vdots & & \ddots & \vdots\\ R(K,1) & R(K,2) & \cdots & R(K,K)\end{bmatrix}\begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_K\end{bmatrix} = \begin{bmatrix} R(0,1)\\ R(0,2)\\ \vdots\\ R(0,K)\end{bmatrix},$$
i.e., $[\mathbf{R}]\,\mathbf{a} = \mathbf{r}$, which gives
$$\mathbf{a} = [\mathbf{R}]^{-1}\mathbf{r}.$$

Predictive Coding
Thus, the minimum MSE is
$$\sigma_p^2 = E\big[(S_0 - S_p)S_0\big] = R(0,0) - \sum_{k=1}^{K} a_k R(k,0) = R(0,0) - \mathbf{r}^T\mathbf{a} = R(0,0) - \mathbf{r}^T[\mathbf{R}]^{-1}\mathbf{r}.$$
For a stationary source, the autocorrelation of a pixel is a constant, independent of its spatial location, i.e., $R(m,m) = R(0,0)$, $m = 1, 2, \dots, K$. Furthermore, the correlations between two pixels are symmetric, i.e., $R(k,m) = R(m,k)$.
Exercise: Show that the minimum MSE is $\sigma_p^2 = E[(S_0 - S_p)S_0]$.
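
A small sketch (assuming NumPy and a synthetic AR(1) source; the helper name predictor_coefficients is hypothetical) that estimates the autocorrelation, solves the normal equations $[\mathbf{R}]\mathbf{a} = \mathbf{r}$, and evaluates the minimum MSE $R(0,0) - \mathbf{r}^T\mathbf{a}$:

```python
import numpy as np

def predictor_coefficients(x, order=3):
    """Estimate the autocorrelation, solve the normal equations [R] a = r,
    and return the coefficients together with the minimum MSE R(0,0) - r^T a."""
    x = x - x.mean()
    r = np.array([np.dot(x[:len(x) - k], x[k:]) / len(x) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    mse = r[0] - a @ r[1:order + 1]
    return a, mse

rng = np.random.default_rng(3)
x = np.zeros(10000)
for n in range(1, len(x)):                       # synthetic AR(1) source, rho = 0.95
    x[n] = 0.95 * x[n - 1] + rng.normal()
a, mse = predictor_coefficients(x, order=3)
print("prediction coefficients:", np.round(a, 3), " minimum MSE:", round(float(mse), 3))
```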

Transform Coding
Transform coding is a waveform coding technique whereby the image is sub-divided into non-overlapping blocks of pixels. Each block of pixels is either a vector or an array, depending on whether a 1-D or a 2-D transform is used. It is then transformed by a unitary transformation matrix to obtain a block of uncorrelated transform coefficients, so that a large fraction of the total energy is packed into relatively few coefficients; the efficiency of a transform is therefore determined by its energy compaction property. The coefficients are quantized separately by scalar quantizers and converted into binary codewords using binary encoding. In the decoder, the quantized coefficients are recovered and passed through an inverse transform.

Transform Coding: 1-D Transform Coding
Fig. 3.5 1-D transform coding system: the input vector $\mathbf{u} = [u(0), u(1), \dots, u(N-1)]^T$ is transformed by $\mathbf{T}$ into coefficients $v(0), \dots, v(N-1)$; each coefficient is quantized by its own quantizer $Q_0, \dots, Q_{N-1}$; and the inverse transform $\mathbf{T}^{-1}$ produces the reconstructed vector.

Transform Coding
Figure 3.5 depicts the 1-D transform coding process, which can be expressed conveniently in matrix notation. Let $\mathbf{u}$ and $\mathbf{v}$ denote the $N \times 1$ image vector before and after transformation, respectively:
$$\mathbf{v} = \mathbf{T}\mathbf{u}.$$
The transform coefficients $v(n)$, $n = 0, 1, \dots, N-1$, are quantized by a bank of $N$ quantizers, optimised based on the statistics of the coefficients, to give $\mathbf{v}'$. The inverse transform is
$$\mathbf{u}' = \mathbf{T}^{-1}\mathbf{v}'$$
where $\mathbf{T}^{-1}$ is the $N \times N$ inverse transformation matrix. For a unitary transformation, the matrix inverse is given by $\mathbf{T}^{-1} = \mathbf{T}^{*T}$.

Transform Coding
If the matrix $\mathbf{T}$ is real, it is also an orthogonal matrix, i.e., $\mathbf{T}^{-1} = \mathbf{T}^{T}$. The problem is to find the optimum matrices $\mathbf{T}$ and $\mathbf{T}^{-1}$ such that the overall average mean square distortion
$$D = \frac{1}{N}\,E\big[(\mathbf{u} - \mathbf{u}')^T(\mathbf{u} - \mathbf{u}')\big]$$
is minimized. This optimal transform is the Karhunen-Loeve transform.

Transform Coding: 2-D Transform Coding
The 2-D transformation is extended from the 1-D case, i.e.,
$$\mathbf{V} = \mathbf{T}\mathbf{U}\mathbf{T}^{T}$$
where $\mathbf{U}$ and $\mathbf{V}$ denote the $N \times N$ image matrix before and after transformation, respectively. The inverse transform is
$$\mathbf{U} = \mathbf{T}^{T}\mathbf{V}\mathbf{T}.$$
A 2-D transform can be computed as two separable 1-D transforms: the 1-D transformation of the rows is performed first, followed by the 1-D transformation of the columns of the resulting transform coefficients.

Transform Coding: Karhunen-Loeve Transform (KLT)
For a real $N \times 1$ image vector $\mathbf{u}$, the basis vectors of the KLT are given by the orthonormalized eigenvectors $\{\mathbf{w}_k\}$ and eigenvalues $\{\lambda_k\}$ of its covariance matrix $\mathbf{R}$, that is,
$$\mathbf{R}\mathbf{w}_k = \lambda_k \mathbf{w}_k, \quad 0 \le k \le N-1.$$
The KLT of $\mathbf{u}$ is defined as
$$\mathbf{v} = \mathbf{W}^{*T}\mathbf{u}$$
and the inverse transform is
$$\mathbf{u} = \mathbf{W}\mathbf{v} = \sum_{k=0}^{N-1} v(k)\,\mathbf{w}_k$$
where $\mathbf{w}_k$ is the $k$th column of $\mathbf{W}$.

Transform Coding
Using matrix diagonalization, we know that
$$\mathbf{W}^{*T}\mathbf{R}\mathbf{W} = \mathbf{D} = \mathrm{Diag}\{\lambda_k\}.$$
The basis vectors of the KLT of an $8 \times 8$ first-order stationary Markov source, whose covariance matrix is
$$\mathbf{R} = \begin{bmatrix} 1 & \rho & \rho^2 & \cdots & \rho^7\\ \rho & 1 & \rho & \cdots & \rho^6\\ \vdots & & \ddots & & \vdots\\ \rho^7 & \rho^6 & \cdots & \rho & 1\end{bmatrix}$$
where the correlation coefficient $\rho$ is close to 1, are shown in Fig. 3.6.

Transform Coding
Fig. 3.6 Basis vectors of 8 × 8 transforms.

Transform Coding
Even though the KLT is optimal in the mean square error sense, it depends on the statistics of the particular image data; hence there is no general fast algorithm for it. It can be replaced by sub-optimal transforms, such as the discrete cosine and Hadamard transforms.
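
To make the comparison concrete, the following sketch (NumPy; the 8-point case with ρ = 0.95 is an assumption for illustration) derives the KLT basis by eigendecomposition of the first-order Markov covariance matrix and compares the resulting coefficient variances and residual cross-correlation with those of the DCT; for ρ close to 1 the two are nearly identical.

```python
import numpy as np

N, rho = 8, 0.95
# Covariance matrix of a first-order stationary Markov (AR(1)) source.
R = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

# KLT basis: orthonormal eigenvectors of R, ordered by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(R)
klt = eigvecs[:, ::-1].T

# DCT matrix c(k, n) built directly from its definition.
k = np.arange(N)[:, None]
n = np.arange(N)[None, :]
dct_mat = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
dct_mat[0, :] = 1.0 / np.sqrt(N)

for name, T in (("KLT", klt), ("DCT", dct_mat)):
    cov = T @ R @ T.T                            # covariance of the transform coefficients
    off_diag = float(np.abs(cov - np.diag(np.diag(cov))).sum())
    print(name, "variances:", np.round(np.sort(np.diag(cov))[::-1], 3),
          "| residual cross-correlation:", round(off_diag, 3))
```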

Discrete Cosine Transform (DCT)
The $N \times N$ cosine transform matrix $\mathbf{C} = \{c(k,n)\}$ is defined as
$$c(k,n) = \begin{cases}\dfrac{1}{\sqrt{N}}, & k = 0,\; 0 \le n \le N-1\\[6pt] \sqrt{\dfrac{2}{N}}\cos\dfrac{\pi(2n+1)k}{2N}, & 1 \le k \le N-1,\; 0 \le n \le N-1.\end{cases}$$
The 1-D DCT of a sequence $\{u(n),\; 0 \le n \le N-1\}$ is defined as
$$v(k) = \alpha(k)\sum_{n=0}^{N-1} u(n)\cos\frac{\pi(2n+1)k}{2N}, \quad 0 \le k \le N-1,$$
where $\alpha(0) = \sqrt{1/N}$ and $\alpha(k) = \sqrt{2/N}$ for $1 \le k \le N-1$.

Discrete Cosine Transform
The inverse DCT is given by
$$u(n) = \sum_{k=0}^{N-1} \alpha(k)\,v(k)\cos\frac{\pi(2n+1)k}{2N}, \quad 0 \le n \le N-1.$$
The basis vectors of the $8 \times 8$ DCT are shown in Figure 3.6. The 2-D DCT pair is given by
$$v(k,l) = \sum_{m=0}^{N-1}\sum_{n=0}^{N-1} c(k,m)\,u(m,n)\,c(l,n), \qquad \mathbf{V} = \mathbf{C}\mathbf{U}\mathbf{C}^{T},$$
$$u(m,n) = \sum_{k=0}^{N-1}\sum_{l=0}^{N-1} c^{*}(k,m)\,v(k,l)\,c^{*}(l,n), \qquad \mathbf{U} = \mathbf{C}^{*T}\mathbf{V}\mathbf{C}^{*}.$$
Since the DCT is a separable transform, the above equations can be evaluated by first transforming each row of $\mathbf{U}$ and then transforming each column of the intermediate result to obtain $\mathbf{V}$.
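
A short energy-compaction demonstration of this 2-D DCT pair (a sketch assuming SciPy's scipy.fft.dctn/idctn for the orthonormal transform and a synthetic smooth block in place of real image data): keeping only the largest coefficients and inverse transforming shows how quickly the reconstruction error falls.

```python
import numpy as np
from scipy.fft import dctn, idctn        # orthonormal 2-D DCT and its inverse

# A smooth synthetic 8x8 block standing in for real image data.
y, x = np.mgrid[0:8, 0:8]
U = 128 + 40 * np.cos(0.2 * x) + 20 * np.cos(0.15 * y)

V = dctn(U, norm='ortho')                # V = C U C^T
for keep in (1, 3, 6, 10, 64):
    idx = np.argsort(np.abs(V).ravel())[::-1][:keep]
    Vk = np.zeros_like(V)
    Vk.ravel()[idx] = V.ravel()[idx]     # retain only the 'keep' largest coefficients
    Uk = idctn(Vk, norm='ortho')         # U = C^T V C
    print(f"{keep:2d} coefficients -> RMS error {np.sqrt(np.mean((U - Uk) ** 2)):6.3f}")
```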

Discrete Cosine Transform
Fig. 3.7 Basis images of the 8 × 8 DCT.

Discrete Cosine Transform: Properties of DCT
1. The DCT is real and orthogonal, i.e., $\mathbf{C} = \mathbf{C}^{*}$ and $\mathbf{C}^{-1} = \mathbf{C}^{T}$.
2. It is not the real part of the unitary DFT, but it is related to it.
3. The DCT has a fast algorithm similar to that of the FFT.
4. It has excellent energy compaction for highly correlated data, e.g., image data.
5. The $N \times N$ DCT is very close to the KLT for a first-order stationary Markov source of length $N$ whose autocorrelation matrix $\mathbf{R}$ was given earlier, when the correlation coefficient is close to 1.

Discrete Cosine Transform
Fig. 3.8 Distribution of the variances of the transform coefficients (in decreasing order) of a stationary Markov sequence with N = 16, ρ = 0.95, for the KLT, DCT and HT.

Discrete Cosine Transform
Reconstructions of an image from an increasing number of retained DCT coefficients: 1 coefficient, 3 coefficients, 6 coefficients, ...

Discrete Cosine Transform
... further reconstructions with more retained coefficients (including 36 coefficients), up to all 64 coefficients.

Hadamard Transform (HT)
The elements of the basis vectors of the HT take only the binary values ±1 and are therefore well suited for digital signal processing. The Hadamard transform matrices $\mathbf{H}_n$ are $N \times N$ matrices, where $N = 2^n$ and $n$ is an integer. They can easily be generated from the core matrix
$$\mathbf{H}_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1\\ 1 & -1\end{bmatrix}$$
and the Kronecker product recursion
$$\mathbf{H}_n = \mathbf{H}_{n-1}\otimes\mathbf{H}_1 = \frac{1}{\sqrt{2}}\begin{bmatrix}\mathbf{H}_{n-1} & \mathbf{H}_{n-1}\\ \mathbf{H}_{n-1} & -\mathbf{H}_{n-1}\end{bmatrix}.$$

Hadamard Transform
The number of sign transitions in a basis vector of the HT is called its sequency. The HT of an $N \times 1$ vector $\mathbf{u}$ is written as
$$\mathbf{v} = \mathbf{H}\mathbf{u}$$
and the inverse transform is given by
$$\mathbf{u} = \mathbf{H}\mathbf{v}$$
where $\mathbf{H} = \mathbf{H}_n$, $n = \log_2 N$. The 2-D HT pair for $N \times N$ images is obtained by substituting $\mathbf{C}$ with $\mathbf{H}$ in the DCT transform pair given earlier. The basis vectors of the $8 \times 8$ HT are shown in Figure 3.6.

Hadamard Transform: Properties of HT
1. The HT is real, symmetric, and orthogonal, i.e., $\mathbf{H} = \mathbf{H}^{*} = \mathbf{H}^{T} = \mathbf{H}^{-1}$.
2. It has a fast transform: the 1-D transformation can be implemented in $N\log_2 N$ additions.
3. The HT has good to very good energy compaction for highly correlated images.

Hadamard Transform: Example
Consider the data matrix
$$\mathbf{U} = \begin{bmatrix} 1 & 2 & 3 & 4\\ 5 & 6 & 7 & 8\\ 9 & 10 & 11 & 12\\ 13 & 14 & 15 & 16\end{bmatrix}.$$
The 2-D transform of $\mathbf{U}$ is computed with the $4 \times 4$ HT, where
$$\mathbf{H} = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & 1 & -1 & -1\\ 1 & -1 & -1 & 1\\ 1 & -1 & 1 & -1\end{bmatrix}.$$
Note that $\mathbf{H}$ is sequency-ordered.

Hadamard Transform
The row transform is
$$\mathbf{H}\mathbf{U} = \begin{bmatrix} 14 & 16 & 18 & 20\\ -8 & -8 & -8 & -8\\ 0 & 0 & 0 & 0\\ -4 & -4 & -4 & -4\end{bmatrix}$$
and the 2-D Hadamard transform is
$$\mathbf{V} = \mathbf{H}\mathbf{U}\mathbf{H}^{T} = \begin{bmatrix} 34 & -4 & 0 & -2\\ -16 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ -8 & 0 & 0 & 0\end{bmatrix}.$$
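
The example can be verified with a few lines of NumPy (a sketch; the helper name hadamard is hypothetical): build $\mathbf{H}_n$ by the Kronecker recursion, reorder its rows by sequency, and compute $\mathbf{V} = \mathbf{H}\mathbf{U}\mathbf{H}^T$.

```python
import numpy as np

def hadamard(n):
    """Build the 2^n x 2^n Hadamard matrix by the Kronecker recursion."""
    H1 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    H = H1
    for _ in range(n - 1):
        H = np.kron(H1, H)
    return H

U = np.arange(1, 17).reshape(4, 4)              # the data matrix of the example
H = hadamard(2)                                 # natural (Hadamard) row ordering
sequency = [int((np.abs(np.diff(np.sign(row))) > 0).sum()) for row in H]
H = H[np.argsort(sequency)]                     # reorder rows by sequency
V = H @ U @ H.T                                 # 2-D Hadamard transform
print(np.round(V, 2))                           # should match the result above
```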

Transform Coding System
Fig. 3.9 A typical transform coding system: at the encoder, the block U is forward transformed to V, zig-zag scanned and quantized to produce the bit stream B; the decoder inverse quantizes and inverse transforms to reconstruct U.

Transform Coding System
1. Divide the $N \times M$ image into non-overlapping blocks of size $p \times q$ and transform each block to obtain $\mathbf{V}_i$, $i = 1, 2, \dots, I$, where $I = NM/pq$.
2. Scan the coefficients in a zig-zag order, as illustrated by the sketch below. The rationale for doing this is that the variances of the coefficients decrease monotonically along the zig-zag scan path.
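
The zig-zag order itself is easy to generate programmatically; the sketch below (an illustrative helper, not part of the original notes) lists the 8 × 8 scan order used to serialize the coefficients.

```python
import numpy as np

def zigzag_indices(n=8):
    """Return the (row, col) visiting order of the zig-zag scan of an n x n block."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],                      # anti-diagonal
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

coeffs = np.arange(64).reshape(8, 8)            # placeholder coefficient block
scanned = [int(coeffs[r, c]) for r, c in zigzag_indices()]
print(scanned[:10])                             # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```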

Transform Coding System
3. The coefficients are quantized uniformly or non-uniformly. Distortion is introduced in the quantization process, which controls the bit rate.
4. The quantized coefficients can be further compressed losslessly by employing entropy coding, such as run-length and Huffman coding.
5. The codewords are then transmitted over the communication channel.
6. In the receiver, the decoder carries out the reverse process.

Transform Coding System
Fig. 3.10 PSNR (dB) versus bit rate (bpp) comparison of various transform coders (KLT, DCT, HT) for a stationary Markov sequence with ρ = 0.95 and a block size of 8 × 8 pixels.