4x4 Transform and Quantization in H.264/AVC

Similar documents
CHAPTER 3. Implementation of Transformation, Quantization, Inverse Transformation, Inverse Quantization and CAVLC for H.

Converting DCT Coefficients to H.264/AVC

THE newest video coding standard is known as H.264/AVC

The MPEG4/AVC standard: description and basic tasks splitting

h 8x8 chroma a b c d Boundary filtering: 16x16 luma H.264 / MPEG-4 Part 10 : Intra Prediction H.264 / MPEG-4 Part 10 White Paper Reconstruction Filter

Analysis of Integer Transformation and Quantization Blocks using H.264 Standard and the Conventional DCT Techniques

AN ENHANCED EARLY DETECTION METHOD FOR ALL ZERO BLOCK IN H.264

Lecture 9 Video Coding Transforms 2

H.264 / MPEG-4 Part 10 : Intra Prediction

Direction-Adaptive Transforms for Coding Prediction Residuals

Detailed Review of H.264/AVC

Enhanced SATD-based cost function for mode selection of H.264/AVC intra coding

LOSSLESS INTRA CODING IN HEVC WITH INTEGER-TO-INTEGER DST. Fatih Kamisli. Middle East Technical University Ankara, Turkey

Context-adaptive coded block pattern coding for H.264/AVC

Motion Vector Prediction With Reference Frame Consideration

Intra Frame Coding for Advanced Video Coding Standard to reduce Bitrate and obtain consistent PSNR Using Gaussian Pulse

Waveform-Based Coding: Outline

Product Obsolete/Under Obsolescence. Quantization. Author: Latha Pillai

Introduction p. 1 Compression Techniques p. 3 Lossless Compression p. 4 Lossy Compression p. 5 Measures of Performance p. 5 Modeling and Coding p.

A Bit-Plane Decomposition Matrix-Based VLSI Integer Transform Architecture for HEVC

H.264/MPEG4 Part INTRODUCTION Terminology

VIDEO CODING USING A SELF-ADAPTIVE REDUNDANT DICTIONARY CONSISTING OF SPATIAL AND TEMPORAL PREDICTION CANDIDATES. Author 1 and Author 2

A DISTRIBUTED VIDEO CODER BASED ON THE H.264/AVC STANDARD

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

Prediction-Guided Quantization for Video Tone Mapping

Rate-Constrained Multihypothesis Prediction for Motion-Compensated Video Compression

SSIM-Inspired Perceptual Video Coding for HEVC

CSE 408 Multimedia Information System Yezhou Yang

MODERN video coding standards, such as H.263, H.264,

Introduction to Video Compression H.261

Multimedia & Computer Visualization. Exercise #5. JPEG compression

Deterministic sampling masks and compressed sensing: Compensating for partial image loss at the pixel level

IMPROVED INTRA ANGULAR PREDICTION BY DCT-BASED INTERPOLATION FILTER. Shohei Matsuo, Seishi Takamura, and Hirohisa Jozawa

A Video Codec Incorporating Block-Based Multi-Hypothesis Motion-Compensated Prediction

Lecture 7 Predictive Coding & Quantization

High Throughput Entropy Coding in the HEVC Standard

Bit Rate Estimation for Cost Function of H.264/AVC

1462 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 10, OCTOBER 2009

AN IMPROVED CONTEXT ADAPTIVE BINARY ARITHMETIC CODER FOR THE H.264/AVC STANDARD

SYDE 575: Introduction to Image Processing. Image Compression Part 2: Variable-rate compression

Image Compression - JPEG

arxiv: v1 [cs.mm] 10 Mar 2016

CSE 126 Multimedia Systems Midterm Exam (Form A)

Modeling of DCT Coefficients in the H.264 Video Encoder-Edición Única

2. the basis functions have different symmetries. 1 k = 0. x( t) 1 t 0 x(t) 0 t 1

Efficient Large Size Transforms for High-Performance Video Coding

Real-Time Audio and Video

Numerical Analysis. Carmen Arévalo Lund University Arévalo FMN011

Direct Conversions Between DV Format DCT and Ordinary DCT

Video Coding With Linear Compensation (VCLC)

Survey of JPEG compression history analysis

HM9: High Efficiency Video Coding (HEVC) Test Model 9 Encoder Description Il-Koo Kim, Ken McCann, Kazuo Sugimoto, Benjamin Bross, Woo-Jin Han

Transform coding - topics. Principle of block-wise transform coding

1 Overview. Coding flow

Achieving H.264-like compression efficiency with distributed video coding

Scalable resource allocation for H.264 video encoder: Frame-level controller


Transform Coding. Transform Coding Principle

6. H.261 Video Coding Standard

IMAGE COMPRESSION-II. Week IX. 03/6/2003 Image Compression-II 1

Intra Prediction by a linear combination of Template Matching predictors

LORD: LOw-complexity, Rate-controlled, Distributed video coding system

Image Compression. Fundamentals: Coding redundancy. The gray level histogram of an image can reveal a great deal of information about the image

on a per-coecient basis in large images is computationally expensive. Further, the algorithm in [CR95] needs to be rerun, every time a new rate of com

Multimedia Networking ECE 599

A VC-1 TO H.264/AVC INTRA TRANSCODING USING ENCODING INFORMATION TO REDUCE RE-QUANTIZATION NOISE

Lossless Image and Intra-frame Compression with Integer-to-Integer DST

Image Compression. Qiaoyong Zhong. November 19, CAS-MPG Partner Institute for Computational Biology (PICB)

A TWO-STAGE VIDEO CODING FRAMEWORK WITH BOTH SELF-ADAPTIVE REDUNDANT DICTIONARY AND ADAPTIVELY ORTHONORMALIZED DCT BASIS

Image Compression. Universidad Politécnica de Madrid Dr.-Ing. Henryk Richter

A Study of Embedding Operations and Locations for Steganography in H.264 Video

arxiv: v1 [cs.mm] 2 Feb 2017 Abstract

Adaptive Quantization Matrices for HD and UHD Display Resolutions in Scalable HEVC

Number Representation and Waveform Quantization

Image Data Compression

Rounding Transform. and Its Application for Lossless Pyramid Structured Coding ABSTRACT

encoding without prediction) (Server) Quantization: Initial Data 0, 1, 2, Quantized Data 0, 1, 2, 3, 4, 8, 16, 32, 64, 128, 256

8/13/16. Data analysis and modeling: the tools of the trade. Ø Set of numbers. Ø Binary representation of numbers. Ø Floating points.

Compressible Motion Fields

Reduced-Error Constant Correction Truncated Multiplier

Order Adaptive Golomb Rice Coding for High Variability Sources

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Human Visual System Based Adaptive Inter Quantization

L. Yaroslavsky. Fundamentals of Digital Image Processing. Course

Fast Intra Coding Method of H.264 for Video Surveillance System

This appendix provides a very basic introduction to linear algebra concepts.

The training of Karhunen Loève transform matrix and its application for H.264 intra coding

Compression. Reality Check 11 on page 527 explores implementation of the MDCT into a simple, working algorithm to compress audio.

Fast Progressive Wavelet Coding

Modelling of produced bit rate through the percentage of null quantized transform coefficients ( zeros )

The Choice of MPEG-4 AAC encoding parameters as a direct function of the perceptual entropy of the audio signal

Department of Electrical Engineering, Polytechnic University, Brooklyn Fall 05 EL DIGITAL IMAGE PROCESSING (I) Final Exam 1/5/06, 1PM-4PM

Compressed Sensing Using Reed- Solomon and Q-Ary LDPC Codes

(12) Patent Application Publication (10) Pub. No.: US 2009/ A1

The Karhunen-Loeve, Discrete Cosine, and Related Transforms Obtained via the Hadamard Transform

Neural network based intra prediction for video coding

Progressive Wavelet Coding of Images

Modulo Transforms an Alternative to Lifting

(12) United States Patent

A Complete Video Coding Chain Based on Multi-Dimensional Discrete Cosine Transform

Transcription:

Video compression design, analysis, consulting and research White Paper: 4x4 Transform and Quantization in H.264/AVC Iain Richardson / VCodex Limited Version 1.2 Revised November 2010

H.264 Transform and Quantization 1 Overview In an H.264/AVC codec, macroblock data are transformed and quantized prior to coding and rescaled and inverse transformed prior to reconstruction and display (Figure 1). Several transforms are specified in the H.264 standard: a 4x4 core transform, 4x4 and 2x2 Hadamard transforms and an 8x8 transform (High profiles only). Figure 1 Transform and quantization in an H.264 codec This paper describes a derivation of the forward and inverse transform and quantization processes applied to 4x4 blocks of luma and chroma samples in an H.264 codec. The transform is a scaled approximation to a 4x4 Discrete Cosine Transform that can be computed using simple integer arithmetic. A normalisation step is incorporated into forward and inverse quantization operations. 2 The H.264 transform and quantization process The inverse transform and re-scaling processes, shown in Figure 2, are defined in the H.264/AVC standard. Input data (quantized transform coefficients) are re-scaled (a combination of inverse quantization and normalisation, see later). The re-scaled values are transformed using a core inverse transform. In certain cases, an inverse transform is applied to the DC coefficients prior to re-scaling. These processes (or their equivalents) must be implemented in every H.264-compliant decoder. The corresponding forward transform and quantization processes are not standardized but suitable processes can be derived from the inverse transform / rescaling processes (Figure 3). Iain Richardson/Vcodex Ltd 2009-2010 Page 2 of 13

Figure 2 Re-scaling and inverse transform Figure 3 Forward transform and quantization 3 Developing the forward transform and quantization process The basic 4x4 transform used in H.264 is a scaled approximate Discrete Cosine Transform (DCT). The transform and quantization processes are structured such that computational complexity is minimized. This is achieved by reorganising the processes into a core part and a scaling part. Consider a block of pixel data that is processed by a two-dimensional Discrete Cosine Transform (DCT) followed by quantization (dividing by a quantization step size, Q step, then rounding the result) (Figure 4a). Rearrange the DCT process into a core transform (C f ) and a scaling matrix (S f ) (Figure 4b). Scale the quantization process by a constant (2 15 ) and compensate by dividing and rounding the final result (Figure 4c). Combine S f and the quantization process into M f (Figure 4d), where: M f S f 2 15 Q step Equation 1 (The reason for the use of will be explained later). Iain Richardson/Vcodex Ltd 2009-2010 Page 3 of 13

Figure 4 Development of the forward transform and quantization process 4 Developing the rescaling and inverse transform process Consider a re-scaling (or inverse quantization ) operation followed by a twodimensional inverse DCT (IDCT) (Figure 5a). Rearrange the IDCT process into a core transform (C i ) and a scaling matrix (S i ) (Figure 5b). Scale the re-scaling process by a constant (2 6 ) and compensate by dividing and rounding the final result (Figure 5c) 1. Combine the re-scaling process and S into V i (Figure 5d), where: V i = S i 2 6 Q step Equation 2 1 This rounding operation need not be to the nearest integer. Iain Richardson/Vcodex Ltd 2009-2010 Page 4 of 13

Figure 5 Development of the rescaling and inverse transform process 5 Developing C f and S f (4x4 blocks) Consider a 4x4 two-dimensional DCT of a block X: Y = A X A T Equation 3 Where indicates matrix multiplication and: a a a a b c c b A = a a a a c b b c, a = 1 2 b = 1 2 cos π 8 = 0.6532... c = 1 2 cos 3π 8 = 0.2706... The rows of A are orthogonal and have unit norms (i.e. the rows are orthonormal). Calculation of Equation 3 on a practical processor requires approximation of the irrational numbers b and c. A fixed-point approximation is equivalent to scaling each row of A and rounding to the nearest integer. Choosing a particular approximation (multiply by 2.5 and round) gives C f : Iain Richardson/Vcodex Ltd 2009-2010 Page 5 of 13

1 1 1 1 2 1 1 2 C f = 1 1 1 1 1 2 2 1 This approximation is chosen to minimise the complexity of implementing the transform (multiplication by C f requires only additions and binary shifts) whilst maintaining good compression performance. The rows of C f have different norms. To restore the orthonormal property of the 1 original matrix A, multiply all the values c ij in row r by : c 2 rj 1 2 1 2 1 2 1 2 1 10 1 10 1 10 1 10 A 1 = C f R f where R f = 1 2 1 2 1 2 1 2 1 10 1 10 1 10 1 10 j denotes element-by-element multiplication (Hadamard-Schur product 2 ). Note that the new matrix A 1 is orthonormal. The two-dimensional transform (Equation 3) becomes: Y = A 1 X A 1 T = [C f R f ] X [C f T R f T ] Rearranging: Y = [C f X C f T ] [R f R f T ] = [C f X C f T ] S f Where 1 4 1 2 10 1 4 1 2 10 T 1 2 10 1 10 1 2 10 1 10 S f = R f R f = 1 4 1 2 10 1 4 1 2 10 1 2 10 1 10 1 2 10 1 10 2 P = Q R means that each element p ij = q ij r ij Iain Richardson/Vcodex Ltd 2009-2010 Page 6 of 13

6 Developing C i and S i (4x4 blocks) Consider a 4x4 two-dimensional IDCT of a block Y: Z = A T Y A Equation 4 Where a a a a b c c b A = a a a a c b b c, a = 1 2 b = 1 2 cos π 8 = 0.6532... as before. c = 1 2 cos 3π 8 = 0.2706... Choose a particular approximation by scaling each row of A and rounding to the nearest 0.5, giving C i : 1 1 1 1 1 1 2 1 2 1 C i = 1 1 1 1 1 2 1 1 1 2 The rows of C i are orthogonal but have non-unit norms. To restore orthonormality, 1 multiply all the values c ij in row r by : c 2 rj 1 2 1 2 1 2 1 2 2 5 2 5 2 5 2 5 A 2 = C i R i where R i = 1 2 1 2 1 2 1 2 2 5 2 5 2 5 2 5 j The two-dimensional inverse transform (Equation 4) becomes: Z = A 2 T Y A 2 = [C i T R i T ] Y [C i R i ] Rearranging: Z = [C i T ] [Y R i T R i ] [ C i ] = [C i T ] [Y S i ] [ C i ] Iain Richardson/Vcodex Ltd 2009-2010 Page 7 of 13

Where 1 4 1 10 1 4 1 10 T 1 10 2 5 1 10 2 5 S i = R i R i = 1 4 1 10 1 4 1 10 1 10 2 5 1 10 2 5 The core inverse transform C i and the rescaling matrix V i are defined in the H.264 standard. Hence we now develop V i and will then derive M f. 7 Developing V i From Equation 2, V i = S i Q step 2 6 H.264 supports a range of quantization step sizes Q step. The precise step sizes are not defined in the standard, rather the scaling matrix V i is specified. Q step values corresponding to the entries in V i are shown in the following Table. QP Q step 0 0.625 1 0.702.. 2 0.787.. 3 0.884.. 4 0.992.. 5 1.114.. 6 1.250 12 2.5 18 5.0 48 160 51 224 6 The ratio between successive Q step values is chosen to be 2 = 1.2246... so that Q step doubles in size when QP increases by 6. Any value of Q step can be derived from the first 6 values in the table (QP0 QP5) as follows: Q step (QP) = Q step (QP%6) 2 floor(qp/6) The values in the matrix V i depend on Q step (hence QP) and on the scaling factor matrix S i. These are shown for QP 0 to 5 in the following Table. Iain Richardson/Vcodex Ltd 2009-2010 Page 8 of 13

QP Q step 2 6 V i = round ( S i Q step 2 6 ) 0 40 10 13 10 13 13 16 13 16 10 13 10 13 13 16 13 16 1 44.898 11 14 14 18 11 14 14 18 2 50.397 13 16 16 20 13 16 16 20 3 56.569 14 18 18 23 14 18 18 23 4 63.496 16 20 20 25 16 20 20 25 5 71.272 18 23 23 29 18 23 23 29 11 14 14 18 11 14 14 18 13 16 16 20 13 16 16 20 14 18 18 23 14 18 18 23 16 20 20 25 16 20 20 25 18 23 23 29 18 23 23 29 For higher values of QP, the corresponding values in V i are doubled (i.e. V i (QP=6) = 2V i (QP=0), etc). Note that there are only three unique values in each matrix V i. These three values are defined as a table of values v in the H.264 standard, for QP=0 to QP=5 : Iain Richardson/Vcodex Ltd 2009-2010 Page 9 of 13

Table 1 Matrix v defined in H.264 standard QP v (r, 0): V i positions (0,0), (0,2), (2,0), (2,2) v (r, 1): V i positions (1,1), (1,3), (3,1), (3,3) 0 10 16 13 1 11 18 14 2 13 20 16 3 14 23 18 4 16 25 20 5 18 29 23 v (r, 2): Remaining V i positions Hence for QP values from 0 to 5, V i is obtained as: v (QP,0) v(qp,2) v(qp,0) v (QP,2) v (QP,2) v(qp,1) v(qp,2) v (QP,1) V i = v (QP,0) v(qp,2) v(qp,0) v (QP,2) v (QP,2) v(qp,1) v(qp,2) v (QP,1) Denote this as: V i = v(qp, n) Where v (r,n) is row r, column n of v. For larger values of QP (QP>5), index the row of array v by QP%6 and then multiply by 2 floor(qp/6). In general: V i = v (QP%6,n) 2 floor(qp/6) The complete inverse transform and scaling process (for 4x4 blocks in macroblocks excluding 16x16-Intra mode) becomes: Z = round ( [C i T ] [Y v (QP%6,n) 2 floor(qp/6) ] [ C i ] 1 2 6 ) (Note: rounded division by 2 6 can be carried out by adding an offset and right-shifting by 6 bit positions). 8 Deriving M f Combining Equation 1 and Equation 2: Iain Richardson/Vcodex Ltd 2009-2010 Page 10 of 13

M f S i S f 2 21 V i S i, S f are known and V i is defined as described in the previous section. Define M f as: M f = round S i S f 2 21 V i 131072 104857.6 S i S f 2 21 104857.6 83886.1 = 131072 104857.6 104857.6 83886.1 131072 104857.6 104857.6 83886.1 131072 104857.6 104857.6 83886.1 The entries in matrix M f may be calculated as follows (Table 2): Table 2 Tables v and m QP v (r, 0): V i positions (0,0), (0,2), (2,0), (2,2) v (r, 1): V i positions (1,1), (1,3), (3,1), (3,3) v (r, 2): Remaining V i positions m (r, 0): M f positions (0,0), (0,2), (2,0), (2,2) m (r, 1): M f positions (1,1), (1,3), (3,1), (3,3) m (r,2): Remaining M f positions 0 10 16 13 13107 5243 8066 1 11 18 14 11916 4660 7490 2 13 20 16 10082 4194 6554 3 14 23 18 9362 3647 5825 4 16 25 20 8192 3355 5243 5 18 29 23 7282 2893 4559 Hence for QP values from 0 to 5, M f can be obtained from m, the last three columns of Table 2: m(qp,0) m(qp,2) m(qp,0) m(qp,2) m(qp,2) m(qp,1) m(qp,2) m(qp,1) Mf = m(qp,0) m(qp,2) m(qp,0) m(qp,2) m(qp,2) m(qp,1) m(qp,2) m(qp,1) Denote this as: M f = m(qp, n) Where m (r,n) is row r, column n of m. For larger values of QP (QP>5), index the row of array m by QP%6 and then divide by 2 floor(qp/6). In general: M f = m (QP%6,n)/ 2 floor(qp/6) Iain Richardson/Vcodex Ltd 2009-2010 Page 11 of 13

Where m (r,n) is row r, column n of m. The complete forward transform, scaling and quantization process (for 4x4 blocks and for modes excluding 16x16-Intra) becomes: Y = round ( [C f ] [X] [ C f T ] m (QP%6,n)/ 2 floor(qp/6) ] 1 2 15 ) (Note: rounded division by 2 15 may be carried out by adding an offset and rightshifting by 15 bit positions). 9 Further reading ITU-T Recommendation H.264, Advanced Video Coding for Generic Audio-Visual Services, November 2007. H. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, Low-complexity transform and quantization in H.264/AVC, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 598 603, July 2003. T. Wiegand, G.J. Sullivan, G. Bjontegaard, A. Luthra, Overview of the H.264/AVC video coding standard, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, No. 7. (2003), pp. 560-576. I. Richardson, The H.264 Advanced Video Compression Standard, John Wiley & Sons, May 2010. See http:///links.html for links to further resources on H.264 and video compression. Acknowledgement I would like to thank Gary Sullivan for suggesting a treatment of the H.264 transform and quantization processes along these lines and for his helpful comments on earlier drafts of this document. About the author As a researcher, consultant and author working in the field of video compression (video coding), my books on video codec design and the MPEG-4 and H.264 standards are widely read by engineers, academics and managers. I advise companies on video coding standards, design and intellectual property and lead the Centre for Video Communications Research at The Robert Gordon University in Aberdeen, UK and the Fully Configurable Video Coding research initiative. Iain Richardson/Vcodex Ltd 2009-2010 Page 12 of 13

Using the material in this document This document is copyright you may not reproduce the material without permission. Please contact me to ask for permission. Please cite the document as follows: Iain Richardson, 4x4 Transform and Quantization in H.264/AVC, VCodex Ltd White Paper, April 2009, http:/// Iain Richardson/Vcodex Ltd 2009-2010 Page 13 of 13