1462 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 10, OCTOBER 2009


2-D Order-16 Integer Transforms for HD Video Coding

Jie Dong, Student Member, IEEE, King Ngi Ngan, Fellow, IEEE, Chi-Keung Fong, Student Member, IEEE, and Wai-Kuen Cham, Senior Member, IEEE

Abstract: In this paper, the spatial properties of high-definition (HD) videos are investigated based on a large set of HD video sequences. Compared with lower-resolution videos, the prediction errors of HD videos have higher correlation. Hence, we propose using 2-D order-16 transforms for HD video coding, which are expected to exploit this spatial property more efficiently, and specifically propose two types of 2-D order-16 integer transforms: the nonorthogonal integer cosine transform (ICT) and the modified ICT. The former resembles the discrete cosine transform (DCT) and is approximately orthogonal; the transform error introduced by its nonorthogonality is proven to be negligible. The latter modifies the structure of the DCT matrix and is inherently orthogonal, whatever the values of the matrix elements. Both types allow matrix elements to be selected more freely by releasing the orthogonality constraint, and both can provide performance comparable to that of the DCT. Each type is integrated into the audio and video coding standard (AVS) Enhanced Profile (EP) and the H.264 High Profile (HP), and is used adaptively as an alternative to the 2-D order-8 transform according to local activities. In addition, considerable effort has been devoted to further reducing the complexity of the 2-D order-16 transforms; in particular, a fast algorithm is developed for the modified ICT and extended to a universal approach. Experimental results show that 2-D order-16 transforms provide significant performance improvement for both the AVS Enhanced Profile and the H.264 High Profile, which means they can be efficient coding tools, especially for HD video coding.
Index Terms: AVS, H.264, HDTV, ICT, order-16 transform, VBT.

I. Introduction

Today, high-definition (HD) videos are becoming more and more popular, with many applications such as high-definition television (HDTV) broadcasting, high-density storage media, video-on-demand (VOD), and surveillance. The most common algorithms for HD video coding are mainly based on the hybrid framework of motion compensated prediction (MCP) and the discrete cosine transform (DCT).

Manuscript received July 13, 2007; revised April 28, First version published July 7, 2009; current version published September 30, This work was supported in part by the Chinese University of Hong Kong, Hong Kong, China under the Focused Investment Scheme, Project , and the Research Grants Council of Hong Kong, Project CUHK This paper was recommended by Associate Editor P. Topiwala. The authors are with the Department of Electronic Engineering, the Chinese University of Hong Kong, Hong Kong, China (jdong@ee.cuhk.edu.hk; knngan@ee.cuhk.edu.hk; ckfong@ee.cuhk.edu.hk; wkcham@ee.cuhk.edu.hk). Color versions of one or more of the figures in this paper are available online at Digital Object Identifier /TCSVT /$26.00 © 2009 IEEE

MPEG-2 [1] is the most widely used standard for HDTV broadcasting. It uses only basic coding tools, such as the 2-D order-8 DCT, bidirectional motion prediction and compensation, and variable length coding (VLC). Although its efficiency is relatively low, the widespread acceptance of MPEG-2 is mainly attributed to its low complexity and its adoption in consumer electronic products. The latest video coding standard, H.264/AVC [2], brings significant performance improvement over all previous standards. It employs complicated intra and inter prediction tools to achieve accurate prediction, and advanced entropy coding schemes to remove statistical redundancy.
The recently completed High Profile (HP) [3], aimed at HD video coding, further improves the coding efficiency through variable block-size transforms (VBT), where a 2-D order-8 integer cosine transform (ICT) is used adaptively as an alternative to the 2-D order-4 ICT in H.264. The audio and video coding standard (AVS) [4] is the national standard of China. Its Enhanced Profile (EP) [5] also targets HD video coding; it emphasizes efficiency and does not provide rich functionalities, so it is quite similar to the Main Profile of H.264. JPEG 2000 [6] is another attractive choice for HD video coding. Compared with the MCP/DCT-based coding techniques, it provides a wavelet-based framework and offers the richest set of features, such as lossy and lossless coding, spatial and quality scalability, region-of-interest (ROI) coding, and error resilience tools. Although it was originally developed for still image coding and does not remove the temporal redundancy of video sequences, JPEG 2000 can address the needs of very demanding HD applications, e.g., digital cinema and remote medical consultancy.

However, these techniques, which focus on improving coding efficiency, were all originally designed as general coding tools for videos of various resolutions. Although efficient when applied to HD videos, they do not fully exploit the properties of HD videos. Wien and Sun [7] once proposed a VBT scheme for H.264 with seven transform block sizes, the smallest being 4×4. However, this scheme was not proposed for HD video coding, and the reported experimental results were based on a set of CIF sequences. The maximum transform block size was finally reduced to 8×8 due to the visible artifacts and the overall complexity [8]. Recently, Naito and Koike [9] and Ma and Kuo [10] have investigated the efficiency of using super-macroblocks (MBs) for HD video coding within the framework of H.264. Naito and Koike [9] allow the MB size to be adaptively selected up to $2^N \times 2^N$, where $N$ may be configured at the picture level. The block sizes of intra prediction and motion compensation are scaled according to the MB size. Experimental results based on two 4K×2K sequences are reported, and significant bit saving can be observed when the sequences are coded using the fixed QP 45. In [10], the MB size is fixed, and a 2-D order-16 integer transform is also proposed and used adaptively in addition to the 2-D order-4 and order-8 ICTs in H.264 HP. A modest PSNR gain is shown based on two 1080p sequences, but no subjective improvement is visible.

In this paper, we study the unique properties of HD videos and develop special coding tools accordingly. First of all, the spatial properties of HD videos and of their motion prediction errors are both investigated and compared with those of low-resolution videos. It is found that HD videos are distinguished from low-resolution ones by higher spatial correlation. To exploit this property, we propose using 2-D order-16 transforms for HD video coding. Specifically, two types of 2-D order-16 integer transforms are proposed. Both can achieve performance comparable to the DCT if well designed, and both provide freedom in selecting matrix elements by releasing the orthogonality constraint, so that a trade-off can be made between performance and complexity.

One type, called the nonorthogonal ICT (NICT), allows the basis vectors to be nonorthogonal, in order to use matrix elements with small magnitudes on the one hand and to maintain the structure of the DCT matrix, e.g., dyadic symmetries and relative magnitudes, on the other hand. The transform error, which makes the average variance of the reconstruction error larger than that of the quantization error because of the nonorthogonality, is analyzed quantitatively. The specific NICT proposed in this paper balances the performance, the complexity, and the transform error.
The other type, called the modified ICT (MICT), is an orthogonal integer transform in which the dyadic symmetries of the DCT matrix are modified. It is naturally orthogonal, whatever the values of the elements.

There are some related publications. The transform proposed by Wien and Sun [7] has a one-norm transform matrix, which means that the norms of all the basis vectors are the same and the memory for storing the scaling matrix can be saved. However, as the cost, this 2-D order-16 integer transform has much lower coding gain [11] than the DCT and introduces ringing artifacts when integrated into H.264. The transform proposed by Ma and Kuo [10] is simple and compatible with the 2-D order-8 ICT in H.264. However, it modifies the structure of the DCT matrix so significantly that it is actually equivalent to the combined processes of 2-D order-8 ICTs and 2-D order-2 Hadamard transforms (HT). In other words, the four 8×8 partitions of the input block are first transformed by the 2-D order-8 ICT, and then the coefficients at the corresponding positions of the four transformed blocks are transformed by the 2-D order-2 HT and thus further de-correlated. Liang et al. [12], [13] proposed a family of approximations of the DCT, named binDCT, together with a lifting scheme for implementation. The lifting parameters, which are theoretically rational, are replaced by proper dyadic approximations, which enables a multiplication-free fast algorithm. The trade-off between coding gain and computational complexity can be made by tuning the resolution of the approximation. However, the binDCT is not a unitary transform and thus uses the exact inversion, although invertibility is guaranteed by the lifting scheme. As a drawback, matrix multiplication cannot provide exactly the same results as the lifting scheme unless additional shifts are used.
The two proposed types of 2-D order-16 transforms are integrated into AVS EP and H.264 HP, respectively, using 16-bit integer arithmetic, and are used adaptively as an alternative to the 2-D order-8 ICT in whichever standard they are integrated into. The related problems of transform size selection, intra prediction, entropy coding, and coded block pattern (CBP) are solved. Both 2-D order-16 integer transforms are compatible with the 2-D order-8 ones in the standards. The fast algorithm for the MICT is developed and extended to a general approach.

The remainder of this paper is organized as follows. Section II investigates the unique spatial property of HD videos. To fully exploit the property, two types of 2-D order-16 integer transforms are proposed in Section III. Section IV introduces the efforts we have made to reduce the complexity of the proposed transforms. Section V briefly presents the solutions to the related problems that arise when the 2-D order-16 transforms are integrated into the standards. Section VI reports the experimental results, followed by the conclusion in Section VII.

II. Spatial Analysis of HD Videos

The most useful statistical function for all video coding methods is correlation. Spatial correlation is useful for estimating the energy of the coefficients in the case of unitary transform coding, whilst temporal correlation can help to estimate the prediction error and design the predictor for an interframe coder [14]. Our research work starts from analyzing the spatial correlation of raw HD videos. Since all the transforms used for video coding are separable and the spatial properties of columns and rows can be taken as equivalent [15], we focus only on the horizontal correlation. In a video frame, each row is regarded as an observation of a discrete stochastic process, and the observations in total are the rows in one frame. So each column represents a random variable.
In probability theory and statistics, the correlation of a stochastic process is often measured by the correlation coefficient $r$, defined as [16]

$$r(n_1, n_2) = \frac{E[(x(n_1) - \mu_1)(x(n_2) - \mu_2)]}{\sigma_1 \sigma_2} \tag{1}$$

where $n_1$ and $n_2$ are two positions in a certain row, $x(n_1)$ and $x(n_2)$ are the pixel values at the two positions, with ensemble averages $\mu_1$ and $\mu_2$ and standard deviations $\sigma_1$ and $\sigma_2$, and $E$ is the ensemble average operator. Due to the variable local activities in a frame, $r(n, n + \tau)$ depends not only on $\tau$ but also on the position $n$. In other words, a video frame is not a collection of samples of a wide-sense stationary (WSS) stochastic process. Hence, for a given $\tau$, $r(\tau)$ is a random variable instead of a constant. We calculate $r(\tau)$ with $\tau = 1$, which represents the correlation of adjacent pixels. Table I shows the expected value and standard deviation of $r(1)$.

TABLE I: The Correlation Coefficient of Two Adjacent Pixels in Raw Video Sequences. Columns: test sequence, resolution, $\mu_{r1}$, $\sigma_{r1}$. Sequences: Riverbed, Flamingo, Kayak, Fireworks (HD); Raven, Night, Crew, City (HD); FlowerGarden, F1, Basketball, Mobile (SD); Foreman, News, Tempete, Paris (CIF).

TABLE II: The Correlation Coefficients $r(1)$ and $r(2)$ of Prediction Errors of Video Sequences.

A frame in a video sequence is a still image that can be approximated by a first-order Markov source with correlation coefficient tending to 1.0 [15]. Therefore, as expected, for sequences of all resolutions, the mean values of $r(1)$ approach the upper bound of 1.0 and the variances are very small. Although, overall, $r(1)$ of HD videos is slightly higher than that of the others, the difference is not substantial.

However, the transforms in video codecs are usually applied to motion prediction errors instead of original pixels. Even for intra frames, prediction errors are transformed, thanks to the intra predictions in recent video coding standards such as H.264 and AVS. Therefore, it makes sense to investigate the correlation of prediction errors, which influences coding strategies directly. In this paper, the prediction errors are obtained using the criterion of rate-distortion (R-D) cost [17], and only the horizontal correlation is analyzed, as the correlation equivalence of the two spatial directions still holds for prediction errors according to our experimental results.

Table II shows the expected values, $\mu_{r1}$ and $\mu_{r2}$, of $r(1)$ and $r(2)$ of prediction errors. Generally, the correlation of prediction errors increases with the resolution, and the difference in $r(1)$ between HD and low-resolution videos is substantial. The mean values of $r(2)$, the correlation of every other pixel, are relatively high for HD videos, but approach 0 for most of the low-resolution videos.
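The measurement in (1) can be sketched in a few lines. The following is a minimal illustration, not the authors' measurement code: the function name `correlation_stats`, the plain list-of-rows frame layout, and the choice to skip constant columns (for which $r$ is undefined) are all assumptions made here. Each row is treated as one observation and each column as one random variable, as described above.

```python
import math

def correlation_stats(frame, tau=1):
    """Mean and standard deviation of r(n, n + tau) over all column pairs.

    `frame` is a list of equal-length rows of pixel (or prediction-error)
    values. Rows are observations; columns are random variables.
    """
    rows = len(frame)
    width = len(frame[0])
    coeffs = []
    for n in range(width - tau):
        a = [row[n] for row in frame]
        b = [row[n + tau] for row in frame]
        mu_a = sum(a) / rows
        mu_b = sum(b) / rows
        cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / rows
        var_a = sum((p - mu_a) ** 2 for p in a) / rows
        var_b = sum((q - mu_b) ** 2 for q in b) / rows
        if var_a == 0 or var_b == 0:
            continue  # r is undefined for a constant column; skip it (assumption)
        coeffs.append(cov / math.sqrt(var_a * var_b))
    if not coeffs:
        raise ValueError("no column pair with non-zero variance")
    mean = sum(coeffs) / len(coeffs)
    std = math.sqrt(sum((c - mean) ** 2 for c in coeffs) / len(coeffs))
    return mean, std
```

Calling `correlation_stats(frame, tau=2)` gives the statistics of $r(2)$ in the same way; the returned pair corresponds to the $\mu$ and $\sigma$ columns of the tables.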
Therefore, as evident in Table II, the prediction errors of HD videos are distinguished from those of low-resolution videos by higher spatial correlation. However, this conclusion does not hold in some extreme cases. For example, if a human face in the sequence Crew were captured by a CIF video, the spatial correlation of the CIF video would be higher than that of the HD one. Nevertheless, based on a large set of HD videos, the conclusion is useful and quite instructive for studying tools designed specially for coding HD videos.

Table II columns: test sequence, resolution, $\mu_{r1}$ and $\sigma_{r1}$ for $r(1)$, $\mu_{r2}$ and $\sigma_{r2}$ for $r(2)$. Sequences: Riverbed, Flamingo, Kayak, Tractor, Pedestrians, Station, Fireworks (HD); Raven, Optis, Night, Crew, Harbour, City (HD); FlowerGarden, F1, Basketball, Mobile SD (SD); Foreman, News, Tempete, Mobile, Paris (CIF).

For a given bit rate, the MSE produced by transform coding improves with the block size, but the improvement becomes insignificant once the block size grows beyond a certain point [18]. In other words, compared with order-8 transforms, order-16 transforms compact energy into a smaller proportion of coefficients, which are more likely to survive the quantization. In practice, for image or video coding, the performance of the 2-D order-16 transform suffers from suboptimal entropy coding (typically run-length coding), as it is difficult to model accurately the distribution of a run, which has a much larger dynamic range than that produced by order-8 transforms and varies dramatically throughout video sequences. Besides, the order-16 transform is much more computationally complex, so the order-8 transform is a good trade-off in the current image and video coding standards. However, given the aforementioned observation that the prediction errors of HD videos have higher correlation, 2-D order-16 transforms deserve more study as a special coding tool: the transform coefficients are highly likely to be concentrated in the low-frequency domain, and thus runs can be modeled more accurately.
Table II also shows the standard deviations, $\sigma_{r1}$ and $\sigma_{r2}$, of $r(1)$ and $r(2)$ of prediction errors, where there is no significant difference between HD and low-resolution videos. So, the spatial correlation of prediction errors varies greatly within an HD video sequence, just as it does in low-resolution videos. To exploit such a property, VBT should be employed, in which smaller transforms complement the 2-D order-16 one, in order to reduce the complexity and avoid ringing artifacts, especially in more detailed areas.

Fig. 1. Flow diagram of traditional ICTs.

Fig. 2. Flow diagram of the pre-scaled integer transform.

III. The Proposed 2-D Order-16 Integer Transforms

A. Review of ICT

The ICT is generated from the DCT by replacing the real-valued elements of the DCT matrix with integers while maintaining the structure among the matrix elements, such as relative magnitudes and signs, dyadic symmetries, and orthogonality [19]. The ICT can be implemented using integer arithmetic without mismatch between the encoder and decoder and, if well designed, can provide almost the same compression efficiency as the DCT. Furthermore, the ICT has fast algorithms, which can be developed in a similar way to those for the DCT.

The flow diagram of the ICT from the input data to the reconstructed data is shown in Fig. 1. As the transform matrix of the ICT contains only integers and the norms of the basis vectors are much larger than unity, the transformation process shown in (2) is not normalized. Therefore, a normalization process, shown in (3) and known as scaling, is required after transformation to ensure the conservation of energy and the reversibility of the ICT:

$$F_{n \times n} = T_n \, f_{n \times n} \, T_n^T \tag{2}$$

$$S_{n \times n} = F_{n \times n} \,//\, R_{n \times n}. \tag{3}$$

In (2) and (3), $f_{n \times n}$ is the input data, $T_n$ is the $n \times n$ transform matrix, $S_{n \times n}$ is the output of the ICT, and the symbol $//$ indicates that each element of the left matrix is divided by the element at the same position of the right one.
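To make (2) and (3) concrete, the sketch below transforms one 8×8 block and checks that the scaling step restores energy conservation. It is only an illustration under stated assumptions: the order-8 matrix is built from the element values {8; 10, 9, 6, 2; 10, 4} (the AVS-oriented even-part set quoted later in this paper, divided by 4) arranged with the usual DCT sign pattern, the scaling matrix is taken as the outer product of the row norms, and floating-point division is used instead of the integer approximation a codec would use. The helper names are hypothetical.

```python
import math

# Order-8 ICT matrix (assumed values, see the lead-in above).
a, b, c, d, e, f, g = 8, 10, 9, 6, 2, 10, 4
T8 = [
    [a,  a,  a,  a,  a,  a,  a,  a],
    [b,  c,  d,  e, -e, -d, -c, -b],
    [f,  g, -g, -f, -f, -g,  g,  f],
    [c, -e, -b, -d,  d,  b,  e, -c],
    [a, -a, -a,  a,  a, -a, -a,  a],
    [d, -b,  e,  c, -c, -e,  b, -d],
    [g, -f,  f, -g, -g,  f, -f,  g],
    [e, -d,  c, -b,  b, -c,  d, -e],
]

def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transpose(x):
    return [list(col) for col in zip(*x)]

def forward_transform(block):
    """Equation (2): F = T f T^T, exact in integer arithmetic."""
    return matmul(matmul(T8, block), transpose(T8))

def scale(F):
    """Equation (3): divide F element-wise by R = m m^T (row-norm products)."""
    m = [math.sqrt(sum(v * v for v in row)) for row in T8]
    return [[F[i][j] / (m[i] * m[j]) for j in range(8)] for i in range(8)]

def energy(x):
    return sum(v * v for row in x for v in row)

# Arbitrary test block (illustrative data only).
block = [[(3 * i + 5 * j) % 17 for j in range(8)] for i in range(8)]
F = forward_transform(block)   # unnormalized coefficients
S = scale(F)                   # normalized coefficients
```

Because the rows of `T8` are mutually orthogonal, `energy(S)` matches `energy(block)` up to rounding, while `energy(F)` is several orders of magnitude larger, which is why the scaling step is needed.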
The elements of the scaling matrix $R_{n \times n}$ are derived from the norms of the basis vectors of $T_n$ as below, where the vector $m_{n \times 1}$ contains the norms of all basis vectors:

$$R_{n \times n} = m_{n \times 1} \, m_{n \times 1}^T \tag{4}$$

$$m_{n \times 1}(i) = \sqrt{\sum_{j=0}^{n-1} T_n^2(i, j)}, \quad 0 \le i < n. \tag{5}$$

The divisions in (3) are usually approximated by integer multiplication and shift, as shown in (6):

$$S_{n \times n} = (F_{n \times n} \otimes P_{n \times n}) >> N \tag{6}$$

where the symbol $\otimes$ indicates that each element of the left matrix is multiplied by the element at the same position of the right one, and $>> N$ means an $N$-bit right shift. Similarly, for the inverse ICT, the whole process including inverse scaling and inverse transformation can be represented by (7):

$$f_{n \times n} = \left( T_n^T \, (S_{n \times n} \otimes P_{n \times n}) \, T_n \right) >> N. \tag{7}$$

The block diagram shown in Fig. 1 is further simplified by the pre-scaled integer transform (PIT) (see Fig. 2), which moves the inverse scaling from the decoder to the encoder and combines it with the forward scaling as one single process, named combined scaling. With PIT, the inverse scaling is saved in the decoder, so no scaling matrix needs to be stored there, whilst the complexity of encoders remains unchanged. However, the inverse scaling moved to the encoder side can be considered as a frequency weighting matrix applied to the transformed signals. In order not to alter the energy distribution of the transformed signals significantly, the magnitudes of the elements in the scaling matrix should be almost the same. In other words, the norms of the basis vectors should be very close to each other according to (4) and (5), which is a key point in PIT design. Interested readers may refer to [20] for more details.

B. The Proposed Order-16 NICT

The general transform matrix of an order-16 ICT is shown in (8) [19]; its basis vectors have alternating even and odd symmetry with respect to the vertical center line of the matrix.
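The basis vectors with even symmetry form the even part of the transform matrix, $T_{8e}$, whereas those with odd symmetry form the odd part, $T_{8o}$.

$$T_{16} = \left[ \begin{array}{*{16}{r}}
x_0 & x_0 & x_0 & x_0 & x_0 & x_0 & x_0 & x_0 & x_0 & x_0 & x_0 & x_0 & x_0 & x_0 & x_0 & x_0 \\
x_1 & x_3 & x_5 & x_7 & x_9 & x_{11} & x_{13} & x_{15} & -x_{15} & -x_{13} & -x_{11} & -x_9 & -x_7 & -x_5 & -x_3 & -x_1 \\
x_2 & x_6 & x_{10} & x_{14} & -x_{14} & -x_{10} & -x_6 & -x_2 & -x_2 & -x_6 & -x_{10} & -x_{14} & x_{14} & x_{10} & x_6 & x_2 \\
x_3 & x_9 & x_{15} & -x_{11} & -x_5 & -x_1 & -x_7 & -x_{13} & x_{13} & x_7 & x_1 & x_5 & x_{11} & -x_{15} & -x_9 & -x_3 \\
x_4 & x_{12} & -x_{12} & -x_4 & -x_4 & -x_{12} & x_{12} & x_4 & x_4 & x_{12} & -x_{12} & -x_4 & -x_4 & -x_{12} & x_{12} & x_4 \\
x_5 & x_{15} & -x_7 & -x_3 & -x_{13} & x_9 & x_1 & x_{11} & -x_{11} & -x_1 & -x_9 & x_{13} & x_3 & x_7 & -x_{15} & -x_5 \\
x_6 & -x_{14} & -x_2 & -x_{10} & x_{10} & x_2 & x_{14} & -x_6 & -x_6 & x_{14} & x_2 & x_{10} & -x_{10} & -x_2 & -x_{14} & x_6 \\
x_7 & -x_{11} & -x_3 & x_{15} & x_1 & x_{13} & -x_5 & -x_9 & x_9 & x_5 & -x_{13} & -x_1 & -x_{15} & x_3 & x_{11} & -x_7 \\
x_8 & -x_8 & -x_8 & x_8 & x_8 & -x_8 & -x_8 & x_8 & x_8 & -x_8 & -x_8 & x_8 & x_8 & -x_8 & -x_8 & x_8 \\
x_9 & -x_5 & -x_{13} & x_1 & -x_{15} & -x_3 & x_{11} & x_7 & -x_7 & -x_{11} & x_3 & x_{15} & -x_1 & x_{13} & x_5 & -x_9 \\
x_{10} & -x_2 & x_{14} & x_6 & -x_6 & -x_{14} & x_2 & -x_{10} & -x_{10} & x_2 & -x_{14} & -x_6 & x_6 & x_{14} & -x_2 & x_{10} \\
x_{11} & -x_1 & x_9 & x_{13} & -x_3 & x_7 & x_{15} & -x_5 & x_5 & -x_{15} & -x_7 & x_3 & -x_{13} & -x_9 & x_1 & -x_{11} \\
x_{12} & -x_4 & x_4 & -x_{12} & -x_{12} & x_4 & -x_4 & x_{12} & x_{12} & -x_4 & x_4 & -x_{12} & -x_{12} & x_4 & -x_4 & x_{12} \\
x_{13} & -x_7 & x_1 & -x_5 & x_{11} & x_{15} & -x_9 & x_3 & -x_3 & x_9 & -x_{15} & -x_{11} & x_5 & -x_1 & x_7 & -x_{13} \\
x_{14} & -x_{10} & x_6 & -x_2 & x_2 & -x_6 & x_{10} & -x_{14} & -x_{14} & x_{10} & -x_6 & x_2 & -x_2 & x_6 & -x_{10} & x_{14} \\
x_{15} & -x_{13} & x_{11} & -x_9 & x_7 & -x_5 & x_3 & -x_1 & x_1 & -x_3 & x_5 & -x_7 & x_9 & -x_{11} & x_{13} & -x_{15}
\end{array} \right] \tag{8}$$

Designing an order-16 ICT is equivalent to designing $T_{8e}$ and $T_{8o}$, and the orthogonality of each part is necessary and sufficient for the orthogonality of the order-16 ICT:

$$T_{8e} = \left[ \begin{array}{*{8}{r}}
x_0 & x_0 & x_0 & x_0 & x_0 & x_0 & x_0 & x_0 \\
x_2 & x_6 & x_{10} & x_{14} & -x_{14} & -x_{10} & -x_6 & -x_2 \\
x_4 & x_{12} & -x_{12} & -x_4 & -x_4 & -x_{12} & x_{12} & x_4 \\
x_6 & -x_{14} & -x_2 & -x_{10} & x_{10} & x_2 & x_{14} & -x_6 \\
x_8 & -x_8 & -x_8 & x_8 & x_8 & -x_8 & -x_8 & x_8 \\
x_{10} & -x_2 & x_{14} & x_6 & -x_6 & -x_{14} & x_2 & -x_{10} \\
x_{12} & -x_4 & x_4 & -x_{12} & -x_{12} & x_4 & -x_4 & x_{12} \\
x_{14} & -x_{10} & x_6 & -x_2 & x_2 & -x_6 & x_{10} & -x_{14}
\end{array} \right] \tag{9}$$

$$T_{8o} = \left[ \begin{array}{*{8}{r}}
x_1 & x_3 & x_5 & x_7 & x_9 & x_{11} & x_{13} & x_{15} \\
x_3 & x_9 & x_{15} & -x_{11} & -x_5 & -x_1 & -x_7 & -x_{13} \\
x_5 & x_{15} & -x_7 & -x_3 & -x_{13} & x_9 & x_1 & x_{11} \\
x_7 & -x_{11} & -x_3 & x_{15} & x_1 & x_{13} & -x_5 & -x_9 \\
x_9 & -x_5 & -x_{13} & x_1 & -x_{15} & -x_3 & x_{11} & x_7 \\
x_{11} & -x_1 & x_9 & x_{13} & -x_3 & x_7 & x_{15} & -x_5 \\
x_{13} & -x_7 & x_1 & -x_5 & x_{11} & x_{15} & -x_9 & x_3 \\
x_{15} & -x_{13} & x_{11} & -x_9 & x_7 & -x_5 & x_3 & -x_1
\end{array} \right]. \tag{10}$$

$T_{8e}$ is an order-8 ICT. For compatibility, it is designed to be a scaled version of the transform matrix of the order-8 ICT in H.264 or AVS, according to whichever standard it is integrated into. The scaling factor is 4. In other words, the element set of the even part, $\{x_0, x_2, \ldots, x_{12}, x_{14}\}$, is equal to $\{32, 40, 40, 36, 32, 24, 16, 8\}$ and $\{32, 48, 32, 40, 32, 24, 16, 12\}$, as proposed for AVS and H.264, respectively.

The possible element sets of an orthogonal $T_{8o}$, i.e., $x_1, x_3, \ldots, x_{13}, x_{15}$, are the solutions of a set of three quadratic equations [21]. The solutions have relatively large magnitudes, i.e., they need at least 6 bits to represent, and those with small DCT distortion [7] have even larger magnitudes, which increases the computational complexity significantly. In this paper, the NICT is proposed to gain the freedom of trading off complexity against performance at the expense of orthogonality. Except for orthogonality, the NICT preserves all the advantages of the ICT, such as bit-exact implementation and the fast algorithm.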
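The index and sign layout of (8) follows mechanically from the DCT, since entry $(i, j)$ of the order-16 DCT matrix is proportional to $\cos((2j+1)i\pi/32)$. The sketch below, with hypothetical function names, derives that layout by folding the angle into the first quadrant, fills the even part with the AVS-oriented element set quoted above, and checks that the resulting $T_{8e}$ has mutually orthogonal rows. It is an illustration of the structure only; the odd-part elements of the NICT are not needed for this check and are not assumed here.

```python
def ict16_pattern():
    """16x16 grid of (index, sign) pairs with T16[i][j] = sign * x[index]."""
    pattern = []
    for i in range(16):
        row = []
        for j in range(16):
            m = (i * (2 * j + 1)) % 64     # the angle is m * pi / 32
            if m > 32:
                m = 64 - m                 # cos(2*pi - t) = cos(t)
            sign = 1
            if m > 16:
                m, sign = 32 - m, -1       # cos(pi - t) = -cos(t)
            row.append((m, sign))
        pattern.append(row)
    return pattern

def even_part(x_even):
    """T8e: even-indexed rows of T16 restricted to the left eight columns."""
    pattern = ict16_pattern()
    return [[sign * x_even[idx] for idx, sign in pattern[i][:8]]
            for i in range(0, 16, 2)]

# Even-part element set proposed for AVS: {x0, x2, ..., x14}.
X_EVEN_AVS = {0: 32, 2: 40, 4: 40, 6: 36, 8: 32, 10: 24, 12: 16, 14: 8}
T8e = even_part(X_EVEN_AVS)

def gram(T):
    """Matrix of inner products between the rows of T."""
    return [[sum(p * q for p, q in zip(r1, r2)) for r2 in T] for r1 in T]

G = gram(T8e)
rows_orthogonal = all(G[i][j] == 0
                      for i in range(8) for j in range(8) if i != j)
```

Swapping in the H.264-oriented set $\{32, 48, 32, 40, 32, 24, 16, 12\}$ for `X_EVEN_AVS` exercises the same check for the other standard.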
By the properties of orthogonal transforms, if there is no quantization in the frequency domain, signals can be reconstructed perfectly; otherwise, the average variance of the reconstruction error equals that of the quantization error [11]. For a nonorthogonal transform, the average variance of the reconstruction error is larger than that of the quantization error, and the reconstruction is imperfect even when there is no quantization. Hence, we analyze the transform error introduced by the nonorthogonality.

First of all, we discuss the reconstruction error without quantization. Let the vector $x$ be a sample from a 1-D, zero-mean, unit-variance first-order Markov process with length $N$ and adjacent element correlation $\rho$. $x$ is transformed by a nonorthogonal $T$, and the vector $\theta$ is obtained in the transform domain, i.e., $\theta = Tx$. Let the reconstructed vector be $y$, where $y = T^T \theta = T^T T x$. Obviously, $T^T T$ is not equal to the identity matrix $I$, and the difference between $T^T T$ and $I$ is denoted as $E_r$, i.e., $E_r = T^T T - I$. The average variance of the reconstruction error, $\sigma_{r0}^2$, is expressed as

$$\sigma_{r0}^2 = \frac{1}{N} \sum_{k=0}^{N-1} E\left[(x(k) - y(k))^2\right] = \frac{1}{N} E\left[(x - y)^T (x - y)\right] = \frac{1}{N} E\left[(E_r x)^T (E_r x)\right] = \frac{1}{N} E\left[x^T E_r^T E_r x\right]. \tag{11}$$

Then, we denote $M = E_r^T E_r$, and the deduction continues:

$$\sigma_{r0}^2 = \frac{1}{N} E\left[x^T M x\right] = \frac{1}{N} \sum_{k=0}^{N-1} \sum_{j=0}^{N-1} M(k, j)\, E[x(j) x(k)] = \frac{1}{N} \sum_{k=0}^{N-1} \sum_{j=0}^{N-1} M(k, j)\, R(j - k) \le \frac{1}{N} \sigma_x^2 \sum_{k=0}^{N-1} \sum_{j=0}^{N-1} |M(k, j)| \tag{12}$$

where $R(\cdot)$ and $\sigma_x^2$ are the autocorrelation function and the variance of the source, respectively. As the magnitude of the autocorrelation is less than or equal to the variance, the upper bound of $\sigma_{r0}^2$ for a given NICT is as shown in (12). Hence, minimizing the upper bound of $\sigma_{r0}^2$ is equivalent to minimizing the term $\sum_{k=0}^{N-1} \sum_{j=0}^{N-1} |M(k, j)|$ in (12). Table III gives some examples of $\sigma_{r0}^2$ in the design of the NICT with $\sigma_x^2$ equal to 1.0; it will be explained later.
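The quantities in (11) and (12) are easy to evaluate numerically for any candidate matrix. The sketch below is a minimal illustration with hypothetical names: it assumes a unit-variance first-order Markov source, so that $R(d) = \rho^{|d|}$, and it expects `T` to be already normalized so that $T^T T$ is close to the identity. The small 2×2 matrices used at the end are toy examples chosen here, not transforms from this paper.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def transform_error(T, rho):
    """Return (sigma_r0^2, upper bound) for a unit-variance Markov-1 source.

    sigma_r0^2 = (1/N) * sum_k sum_j M(k, j) * rho**|j - k|   as in (12)
    bound      = (1/N) * sum_k sum_j |M(k, j)|                with sigma_x^2 = 1
    where M = Er^T Er and Er = T^T T - I.
    """
    n = len(T)
    TtT = matmul(transpose(T), T)
    Er = [[TtT[k][j] - (1.0 if k == j else 0.0) for j in range(n)]
          for k in range(n)]
    M = matmul(transpose(Er), Er)
    exact = sum(M[k][j] * rho ** abs(j - k)
                for k in range(n) for j in range(n)) / n
    bound = sum(abs(M[k][j]) for k in range(n) for j in range(n)) / n
    return exact, bound

# Toy matrices: an orthonormal one and a slightly perturbed one.
s = 1.0 / math.sqrt(2.0)
T_orthonormal = [[s, s], [s, -s]]
T_perturbed = [[0.70, 0.70], [0.70, -0.72]]

err_orth, bound_orth = transform_error(T_orthonormal, rho=0.95)
err_pert, bound_pert = transform_error(T_perturbed, rho=0.95)
```

For the orthonormal matrix both values are zero up to rounding, while for the perturbed matrix the exact value is positive and stays below the bound, which is the behavior (12) describes.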
Then, we take the quantization into consideration and show the relationship between the average variance of the reconstruction error, $\sigma_r^2$, and that of the quantization error, $\sigma_q^2$. Let $u$ be the quantized version of $\theta$ and $y_r$ be the reconstructed vector, i.e., $y_r = T^T u$. The quantization error is denoted as $q$, i.e., $q = \theta - u$. The average variance $\sigma_r^2$ is expressed by (13):

$$\sigma_r^2 = \frac{1}{N} \sum_{k=0}^{N-1} E\left[(x(k) - y_r(k))^2\right] = \frac{1}{N} E\left[(x - y_r)^T (x - y_r)\right] = \frac{1}{N} E\left[(T^{-1}\theta - T^T u)^T (T^{-1}\theta - T^T u)\right]. \tag{13}$$

Since $T^{-1}$ is not equal to $T^T$ for the nonorthogonal $T$, we use $(T^T - T_{er})$ to represent $T^{-1}$, and replace $T^{-1}$ in (13) with $(T^T - T_{er})$. Then we get (14):

$$\begin{aligned}
\sigma_r^2 &= \frac{1}{N} E\left[(T^T q - T_{er}\theta)^T (T^T q - T_{er}\theta)\right] \\
&= \frac{1}{N} E\left[q^T T T^T q - 2 q^T T T_{er} \theta + (T_{er}\theta)^T T_{er}\theta\right] \\
&= \frac{1}{N} E\left[q^T (I + E_r) q - 2 q^T T (T^T - T^{-1}) \theta + (T^T\theta - T^{-1}\theta)^T (T^T\theta - T^{-1}\theta)\right] \\
&= \frac{1}{N} E\left[q^T q + q^T E_r q - 2 q^T E_r \theta + (y - x)^T (y - x)\right] \\
&= \sigma_q^2 + \sigma_{r0}^2 + \frac{1}{N} E\left[q^T E_r q\right] - \frac{2}{N} E\left[q^T E_r \theta\right].
\end{aligned} \tag{14}$$

The third and fourth terms in (14) are both equal to zero, because the autocorrelation of $q$ and the cross-correlation of $q$ and $\theta$ are both zero, as shown by Widrow et al. [22]. Therefore, the relationship between $\sigma_r^2$ and $\sigma_q^2$ is shown in (15):

$$\sigma_r^2 = \sigma_q^2 + \sigma_{r0}^2. \tag{15}$$

TABLE III: Comparison of Sets of Matrix Elements. Columns: $x_1$, $x_3$, $x_5$, $x_7$, $x_9$, $x_{11}$, $x_{13}$, $x_{15}$, DCT distortion (%), upper bound of $\sigma_{r0}^2$ ($10^{-7}$).

Clearly, the average variance of the reconstruction error, $\sigma_r^2$, is always larger than the quantization error variance $\sigma_q^2$ by $\sigma_{r0}^2$, whether or not the transform coefficients are quantized, whereas with an orthogonal transform $\sigma_r^2$ is equal to $\sigma_q^2$.

The key to designing an NICT is the balance among $\sigma_{r0}^2$, the approximation of the DCT, and the magnitudes of the matrix elements. Table III gives 11 element sets of $T_{8o}$ in (10). The DCT distortions [7] and the upper bounds of $\sigma_{r0}^2$ in (12) shown in the table are those of $T_{8o}$ rather than of $T_{16}$ in (8). The upper five sets are given by Cham and Chan [21] and lead to an orthogonal $T_{8o}$, but the magnitudes and DCT distortions of these sets are larger than those of the remaining six sets, which lead to a nonorthogonal $T_{8o}$. Although nonorthogonal, the bottom six element sets all have negligible $\sigma_{r0}^2$, and in particular the one in the bottom row has the minimum DCT distortion and $\sigma_{r0}^2$. Another advantage of this element set is that its norm is close to those of the even part $T_{8e}$ in (9), so it is suitable for PIT implementation, which is required when the transform is integrated into the AVS platform. Hence, the vector in the bottom row is used as the element set of the odd part, and the performance will be verified in Section VI.

C. The Proposed Order-16 MICT

The MICT transform matrix is obtained by modifying the structure of the order-16 DCT matrix using the principle of dyadic symmetry. The basis vectors of the order-16 MICT are inherently orthogonal whatever the elements are, so, compared with the elements in the NICT transform matrix, even smaller magnitudes can be selected.
The even part of the general transform matrix remains the same as $T_{8e}$ in (9), while the odd part is changed to $M_{8o}$, whose elements are arranged as shown in (16) (only the positions of the elements are shown here, without their signs):

$$M_{8o}: \quad \begin{pmatrix}
x_1 & x_3 & x_5 & x_7 & x_9 & x_{11} & x_{13} & x_{15} \\
x_9 & x_{11} & x_{13} & x_{15} & x_1 & x_3 & x_5 & x_7 \\
x_5 & x_7 & x_1 & x_3 & x_{13} & x_{15} & x_9 & x_{11} \\
x_{15} & x_{13} & x_{11} & x_9 & x_7 & x_5 & x_3 & x_1 \\
x_{13} & x_{15} & x_9 & x_{11} & x_5 & x_7 & x_1 & x_3 \\
x_3 & x_1 & x_7 & x_5 & x_{11} & x_9 & x_{15} & x_{13} \\
x_7 & x_5 & x_3 & x_1 & x_{15} & x_{13} & x_{11} & x_9 \\
x_{11} & x_9 & x_{15} & x_{13} & x_3 & x_1 & x_7 & x_5
\end{pmatrix}. \tag{16}$$

Fig. 3. Compatible structures for order-8 and order-16 (a) forward transformation and (b) inverse transformation.

With the even part $T_{8e}$ in (9) and the odd part $M_{8o}$ in (16), the first three basis vectors of the order-16 MICT transform matrix resemble those of the DCT and represent low-frequency components. So the MICT shows good energy compaction capability, especially for the regions in HD videos where pixels are more highly correlated. Though the dyadic symmetries of most basis vectors in the odd part of the transform matrix are different from those in the DCT matrix, the MICT does not bring a significant performance penalty if the elements are well selected, as shown in Section VI.

Since the elements in the transform matrices are selected without the orthogonality constraint, one is free to make a trade-off between the performance and the magnitudes of the elements, i.e., the computational complexity. In this paper, the magnitudes of the elements are represented by only 3 to 4 bits, and a 16-bit integer implementation can be developed easily. In detail, the even part is selected to be the same as the order-8 ICT in AVS or H.264, according to whichever standard it is integrated into. For the odd part, there are three considerations for the element values. First, the magnitudes should be comparable to those of the even part. Second, the waveform of the second basis vector of the MICT transform matrix should resemble that of the DCT. Third, the element values should be suitable for the design of a fast algorithm, which will be illustrated in Section IV-B.
Taking all these constraints into consideration, $\{x_1, x_3, \ldots, x_{13}, x_{15}\}$ is designed to be $\{11, 11, 11, 9, 8, 6, 4, 1\}$.

IV. Complexity Reduction

This section introduces the research efforts we have made for complexity reduction. First of all, the 2-D order-16 integer transforms are compatible with the 2-D order-8 ICT in AVS EP or H.264 HP, according to whichever standard they are integrated into. Furthermore, the fast algorithm for the 2-D order-16 MICT proposed in Section III-C is developed and extended to a general approach.

A. Compatibility with the Order-8 ICT

It is natural that the order-16 DCT is compatible with the order-8 one. However, this is not always true for integer transforms. Bossen [23] points out that the compatibility of order-8 and order-4 ICTs can be achieved by taking the order-4 ICT as the even part of the order-8 one. In this paper, the order-16 integer transforms use the order-8 ICTs in the standards, or their scaled versions, as the even part for compatibility.

First, the compatible transformation is presented. Fig. 3(a) shows the flow diagram of the forward transformation, applicable to both types of order-16 integer transforms. The bottom-right and upper-right blocks perform the matrix multiplications with the odd and even parts of $T_{16}$, respectively. In the case of the order-16 MICT, where $T_{8e}$ is exactly the same as the order-8 ICT, the upper-right block may be reused directly for the order-16 forward transformation. In the case of the order-16 NICT, where $T_{8e}$ is equal to the order-8 ICT scaled by a factor of 4, the output of the upper-right block should be left shifted by 2 bits before it is reused for the forward transformation of the order-16 NICT. Therefore, a dedicated module for the order-8 forward transformation is saved, and the output data of the order-8 forward transformation can be reused by the order-16 one. For the inverse transformation, compatibility can be achieved in a similar fashion, as shown in Fig. 3(b).

Based on the compatibility of the transform matrices, the inherent relationship between the scaling matrices $R_{8 \times 8}$ and $R_{16 \times 16}$ is derived as (17), according to (4) and (5):

$$R_{8 \times 8}(i, j) \cdot 2^M = R_{16 \times 16}(2i, 2j), \quad 0 \le i, j < 8 \tag{17}$$

where $M$ equals 5 and 1 for the NICT and the MICT, respectively. When the divisions in (3) are approximated by integer multiplication and right shift as shown in (6), the relationship between the scaling matrices $P_{8 \times 8}$ and $P_{16 \times 16}$ is

$$P_{8 \times 8}(i, j) = P_{16 \times 16}(2i, 2j) \cdot 2^K, \quad 0 \le i, j < 8 \tag{18}$$

where $K$ equals 5 and 1 for the NICT and the MICT, respectively. When the order-16 integer transforms are implemented as PIT to be compatible with the order-8 PIT in AVS, (18) also holds for the combined scaling, with $K$ doubled.
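Relation (17) can be checked numerically from (4) and (5) using only the even part, because the even-indexed basis vectors of $T_{16}$ are even-symmetric: each is a row of $T_{8e}$ followed by its mirror image. The sketch below assumes the order-8 matrix built from the element values {8; 10, 9, 6, 2; 10, 4} (the AVS-oriented even-part set of Section III-B divided by 4); the function names and the `scale` parameter, which is 4 for the NICT and 1 for the MICT, are introduced here for illustration only.

```python
import math

# Order-8 ICT matrix (assumed AVS-oriented values, DCT sign pattern).
a, b, c, d, e, f, g = 8, 10, 9, 6, 2, 10, 4
T8 = [
    [a,  a,  a,  a,  a,  a,  a,  a],
    [b,  c,  d,  e, -e, -d, -c, -b],
    [f,  g, -g, -f, -f, -g,  g,  f],
    [c, -e, -b, -d,  d,  b,  e, -c],
    [a, -a, -a,  a,  a, -a, -a,  a],
    [d, -b,  e,  c, -c, -e,  b, -d],
    [g, -f,  f, -g, -g,  f, -f,  g],
    [e, -d,  c, -b,  b, -c,  d, -e],
]

def row_norms(T):
    """Equation (5): Euclidean norm of every basis vector (row)."""
    return [math.sqrt(sum(v * v for v in row)) for row in T]

def even_rows_of_t16(scale):
    """Even-indexed rows of T16: scaled T8 rows followed by their mirror."""
    return [[scale * v for v in row] + [scale * v for v in reversed(row)]
            for row in T8]

def relation_17_holds(scale, M):
    """Check R8(i, j) * 2**M == R16(2i, 2j) for all 0 <= i, j < 8."""
    m8 = row_norms(T8)
    m16_even = row_norms(even_rows_of_t16(scale))  # entry k is row 2k of T16
    return all(math.isclose(m8[i] * m8[j] * 2 ** M, m16_even[i] * m16_even[j])
               for i in range(8) for j in range(8))

nict_ok = relation_17_holds(scale=4, M=5)   # even part is 4x the order-8 ICT
mict_ok = relation_17_holds(scale=1, M=1)   # even part equals the order-8 ICT
```

The factor $2^M$ arises because doubling the row length multiplies each norm by $\sqrt{2}$, and scaling the elements by 4 multiplies it by a further 4, so the product of two norms grows by 2 or by 32.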
Interested readers may refer to [24] for more details. Equation (18) indicates that $P_{8\times 8}(i,j)$ is $2^{K}$ times larger than $P_{16\times 16}(2i,2j)$ if $N$, the number of bits for the right shift in (6), is the same for the order-16 and the order-8 transforms. This difference is easily absorbed by shift operations. In other words, $P_{16\times 16}(2i,2j)$, $0 \le i,j < 8$, may be used for the combined scaling of the order-8 ICT, provided that $N$ in (6) for the 8×8 scaling is $K$ bits less than that for the 16×16 scaling. Hence, the memory for storing an 8×8 scaling matrix is saved, and the scaling matrices are compatible as well.

B. Fast Algorithm for the Order-16 MICT

The ICT and the NICT have fast algorithms, which can be developed in a similar way to those for the DCT. However, fast algorithms for MICTs are not guaranteed to exist, and little research effort has been devoted to them. We propose a universal approach to developing fast algorithms for integer transforms whose structures differ from that of the DCT matrix, including the MICT. Owing to the separability of a 2-D transform and the similarity of the forward and inverse transformation flows, this section focuses on the fast algorithm for the 1-D order-16 transformation.

The eight butterflies in the left part of Fig. 3(a) are easily developed as the first stage, which exploits the symmetries with respect to the line in (8). The upper-right block performs the matrix multiplication with the even part, which in this paper is the same as the order-8 ICTs in the standards. Fortunately, both order-8 ICTs in AVS and H.264 have efficient fast algorithms, which are borrowed here. In the more general case where the even part of the MICT has a structure different from $T_{8e}$ in (9), a fast algorithm can be developed in the same way as for $M_{8o}$, as introduced below. The bottom-right block in Fig. 3(a), which performs the matrix multiplication with the odd part, has a structure quite different from the odd part of the order-16 DCT matrix.
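The first-stage butterflies and the reuse of the order-8 module can be sketched as follows. This is a structural illustration only: `T8E` is taken to be the order-8 ICT of H.264, `ODD_PLACEHOLDER` is an arbitrary integer matrix standing in for the odd part (it is not the $M_{8o}$ of this paper), and the function names and the `even_shift` parameter (0 for the MICT case, 2 for the NICT case) are hypothetical. The check only requires that even rows be symmetric and odd rows antisymmetric about the center.

```python
# Order-8 ICT of H.264, used here as the even part T_8e.
T8E = [
    [8, 8, 8, 8, 8, 8, 8, 8],
    [12, 10, 6, 3, -3, -6, -10, -12],
    [8, 4, -4, -8, -8, -4, 4, 8],
    [10, -3, -12, -6, 6, 12, 3, -10],
    [8, -8, -8, 8, 8, -8, -8, 8],
    [6, -12, 3, 10, -10, -3, 12, -6],
    [4, -8, 8, -4, -4, 8, -8, 4],
    [3, -6, 10, -12, 12, -10, 6, -3],
]

# Arbitrary integer odd part, for illustration only (NOT the paper's M_8o).
ODD_PLACEHOLDER = [[(3 * r + 5 * c) % 7 - 3 for c in range(8)] for r in range(8)]


def matvec(matrix, vector):
    """Plain matrix-vector product on Python integers."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def build_t16(even, odd, even_shift=0):
    """Interleave symmetric even rows and antisymmetric odd rows into a 16x16 matrix."""
    rows = []
    for e_row, o_row in zip(even, odd):
        scaled = [v << even_shift for v in e_row]
        rows.append(scaled + scaled[::-1])                 # even-indexed row
        rows.append(o_row + [-v for v in o_row[::-1]])     # odd-indexed row
    return rows


def forward16_butterfly(even, odd, x, even_shift=0):
    """Eight butterflies, then the order-8 module (even part) and the odd part."""
    sums = [x[n] + x[15 - n] for n in range(8)]
    diffs = [x[n] - x[15 - n] for n in range(8)]
    y = [0] * 16
    y[0::2] = [v << even_shift for v in matvec(even, sums)]  # reused order-8 output
    y[1::2] = matvec(odd, diffs)
    return y
```

For any 16-point integer input `x`, `forward16_butterfly(T8E, ODD_PLACEHOLDER, x, s)` should equal `matvec(build_t16(T8E, ODD_PLACEHOLDER, s), x)`, which is the sense in which the order-8 module is reused inside the order-16 transformation.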
Because the odd part has this different structure, it cannot be decomposed into butterfly operations as is done for $T_{8e}$. Instead, we represent $M_{8o}$ as the product of three 8×8 matrices:

$$M_{8o} = M_1 M_2 M_3. \tag{19}$$

The three matrices are subject to several considerations. First, they should contain integers only. Second, the integers should be small enough that the fast algorithm involves only additions and shifts, with no multiplications. Last but not least, the matrices should be sparse, i.e., contain many zeros.

The row vectors of $M_{8o}$ are orthogonal and have the same length, where the length of a row vector $a$ is defined as $a a^{T}$. We denote the length of every row vector of $M_{8o}$ as $n_O$. Since $M_{8o} M_{8o}^{T} = n_O I$, it follows that $|\det(M_{8o})| = n_O^{4}$, where $\det(\cdot)$ and $|\cdot|$ denote the determinant and the absolute value, respectively. To simplify the problem, we assume that $M_1$, $M_2$, and $M_3$ in (19) are also row-orthogonal and that the row vectors within each matrix have the same length, denoted $n_1$, $n_2$, and $n_3$, respectively. Similarly, $|\det(M_1)|$, $|\det(M_2)|$, and $|\det(M_3)|$ are equal to $n_1^{4}$, $n_2^{4}$, and $n_3^{4}$, respectively. If (19) is satisfied, then necessarily

$$|\det(M_{8o})| = n_O^{4} = |\det(M_1)|\,|\det(M_2)|\,|\det(M_3)| = n_1^{4}\, n_2^{4}\, n_3^{4}. \tag{20}$$

Therefore

$$n_O = n_1 n_2 n_3. \tag{21}$$

So, when designing an order-16 MICT with a fast algorithm, we should make sure that $n_O$ of $M_{8o}$ is the product of three integers. Otherwise, (19) has no solution under this assumption. With the constraint of (21), the lengths of the row vectors in $M_1$, $M_2$, and $M_3$ are much smaller than that in $M_{8o}$, which means that the elements of the three matrices have very small magnitudes and may include many zeros.

Among the three matrices, $M_3$ is found first. Both sides of (19) are postmultiplied by the inverse of $M_3$, which is equal to $M_3^{T}/n_3$, so that (19) becomes

$$M_{8o} M_3^{T} / n_3 = M_1 M_2. \tag{22}$$

Note that $M_3^{T}$ may be regarded as a set of eight column vectors and that $M_{8o} M_3^{T}/n_3$ contains only integers.
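The length relation (21) and the peeling step (22) can be illustrated on a toy example. The sketch below uses three hypothetical 2×2 row-orthogonal integer matrices, not the 8×8 factors of this paper, so it demonstrates the row-length product and the integrality of (22) rather than the determinant exponent in (20). The last check assumes that each row of $M_{8o}$ is a signed permutation of the element set {11, 11, 11, 9, 8, 6, 4, 1}, which is an assumption made here for illustration.

```python
def matmul(a, b):
    """Integer matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def row_length(m):
    """Return the common row length a * a^T if rows are orthogonal, else None."""
    gram = matmul(m, transpose(m))
    n = gram[0][0]
    for i, row in enumerate(gram):
        for j, value in enumerate(row):
            if value != (n if i == j else 0):
                return None
    return n


# Hypothetical toy factors with row lengths 2, 5, and 13.
M1 = [[1, 1], [1, -1]]
M2 = [[2, 1], [1, -2]]
M3 = [[3, 2], [2, -3]]

M = matmul(matmul(M1, M2), M3)           # toy counterpart of (19)
n3 = row_length(M3)

# Toy counterpart of (22): M * M3^T / n3 must be an integer matrix equal to M1 * M2.
peeled = [[v // n3 for v in row] for row in matmul(M, transpose(M3))]

# Row length implied by the odd-part element set, under the stated assumption.
odd_elements = (11, 11, 11, 9, 8, 6, 4, 1)
n_o = sum(v * v for v in odd_elements)
```

Here `row_length(M)` equals the product of the three factor lengths, and `n_o` evaluates to the sum of squares of the element set, which can be compared with the product of three small integers when choosing candidate lengths.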
Based on (22), we first collect a set of candidate column vectors by exhaustive search, in which each column vector $b_i$ satisfies two conditions: the length of $b_i$ is $n_3$, and the elements of $M_{8o} b_i / n_3$ are all integers. If (19) has solutions, we can pick out eight orthogonal column vectors from the set to form $M_3^{T}$ and thus obtain $M_3$. To find $M_2$, both sides of (22) are postmultiplied by the inverse of $M_2$, i.e., $M_2^{T}/n_2$, so that (22) becomes

$$\left(M_{8o} M_3^{T} / n_3\right) M_2^{T} / n_2 = M_1. \tag{23}$$

$M_2$ can be obtained by a search similar to that for $M_3$. Finally, with $M_2$ and $M_3$ known, $M_1$ is computed by (23).

For the $M_{8o}$ in this paper, the absolute value of the determinant equals $n_O^{4}$, and the corresponding $n_O$ is 561. To express $n_O$ as a product of three integers as in (21), we let $n_1$, $n_2$, and $n_3$ be 3, 11, and 17, in any order. The method introduced above is then used to search for $M_1$, $M_2$, and $M_3$, and (24) shows one of the solutions of (19). The magnitudes of the elements in (24) are so small that the fast algorithm can be implemented using only additions and shifts. Fig. 4 shows the flow diagram of the forward transformation of the order-16 MICT. The shifts are not indicated in the bottom-right block, for a clearer view.

Fig. 4. Flow diagram of the order-16 forward transformation.

V. Integration of the 2-D Order-16 Integer Transforms into the Standards

In this paper, the two proposed types of 2-D order-16 integer transforms are integrated into AVS EP [5] and H.264 HP [3], respectively, and used adaptively as an alternative to the 2-D order-8 ICTs. For H.264, the 2-D order-4 ICT is not included in the VBT scheme, although it exists in the HP, because it does not contribute much to the efficiency of HD video coding, as verified in Section VI. The transform size is selected based on the MB-level R-D cost [17]. The transform with the lower cost is selected, and a 1-bit flag is transmitted in the MB header to indicate the choice.

For AVS EP, the largest block size for intra prediction is 8×8.
To apply the 2-D order-16 integer transforms to intra-coded MBs, 16×16 intra prediction is employed, which is simply extended from the five prediction modes in AVS. The reference pixels are not used for prediction directly, but are first filtered by a 3-tap low-pass filter [1 2 1]/4. Otherwise, visible artifacts would be introduced by the large block size intra prediction [25], [26].

In AVS EP and H.264 HP, the 8×8 residual blocks are coded by CABAC [3], [5]. Their arithmetic coding engines can code sources with various statistics by using many automatically updated probability models and therefore remain unchanged in this paper. Two additional sets of probability models, designed specifically for coding the 16×16 residual blocks, are provided for AVS and H.264, respectively, to drive the arithmetic coding engine.

CBP is a 6-bit integer in AVS and H.264, of which the four least significant bits (LSBs) indicate whether each of the four 8×8 luminance blocks contains nonzero coefficients. In this paper, if an MB is coded by the 2-D order-16 transform, CBP becomes a 3-bit integer whose LSB indicates whether the 16×16 luminance block contains nonzero coefficients.

VI. Experimental Results

A. Transform Coding Gain

An important measure of compression performance is the transform coding gain $G_{TC}$ [11], which compares the reconstruction error variance of quantized transform coding, $\sigma_{TC}^{2}$, with the corresponding reconstruction error variance of PCM coding, $\sigma_{PCM}^{2}$:

$$G_{TC} = \frac{\sigma_{PCM}^{2}}{\sigma_{TC}^{2}}. \tag{25}$$

Under the assumptions of optimum quantization and bit allocation at the same overall rate, $G_{TC}$ is the ratio of the arithmetic mean to the geometric mean of the variances of the transform coefficients $\sigma_{\theta_k}^{2}$ $(0 \le k < N)$:

$$G_{TC} = \frac{\dfrac{1}{N}\sum_{k=0}^{N-1}\sigma_{\theta_k}^{2}}{\left(\prod_{k=0}^{N-1}\sigma_{\theta_k}^{2}\right)^{1/N}}. \tag{26}$$

Suppose the source is a 1-D zero-mean, unit-variance first-order Markov process with adjacent element correlation ρ ranging from 0.55 to 0.95, which is typical for the prediction errors of HD videos (see Table II). Fig. 5 shows the coding gains of the order-16 DCT, the proposed MICTs for AVS and H.264, respectively, and the two MICTs proposed by Wien [7] and Ma [10], respectively. The DCT has the highest coding gain for every value of ρ. The coding gains of the two proposed MICTs are too close to be distinguished from each other, and are higher than those of the MICTs proposed by Ma and Wien, especially when ρ approaches 1.0.

Equation (24), $M_{8o} = M_1 M_2 M_3$: the entries of the three factor matrices are not recoverable here.

Fig. 5. Comparing coding gain of orthogonal transforms.

TABLE IV
Test Conditions
Platform: RM62b (AVS), JM11 (H.264)
Sequence structure: IBBPBBP...
Intra frame period: ... s
Entropy coding: Arithmetic coding
Fast motion estimation: on
Deblocking filter: on
R-D optimization: on
QP: fixed (27, 30, 35, 40, AVS); fixed (20, 24, 28, 32, H.264)
Rate control: off
Reference frame: 2 (AVS), 5 (H.264)
Search range: ±32
Frame number: 60

B. Reconstruction Error of Nonorthogonal Transforms

In the case of nonorthogonal transforms, even with optimal quantization and bit allocation, (26) cannot be derived as a substitute for (25), because $\sigma_{PCM}^{2}$ depends on quantization only, whereas $\sigma_{TC}^{2}$ depends not only on quantization but also on the variance of the source. Therefore, we cannot evaluate the performance of the proposed NICTs by (26); instead, we have to implement the transforms and the quantization and compare the reconstruction errors thus introduced. Seven order-16 transforms are tested: the order-16 DCT, the two proposed NICTs, and four ICTs whose even part is the order-8 ICT in H.264 and whose odd-part element sets are shown in the first four rows of Table III.
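For the orthogonal reference case, (26) can be evaluated directly. The sketch below does so for an orthonormal order-16 DCT applied to a unit-variance first-order Markov source with covariance $\rho^{|i-j|}$. It is an illustrative computation with hypothetical function names; it does not reproduce the curves of Fig. 5 for the integer transforms, whose matrices are not given in this section.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix of order n (rows are basis vectors)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def coefficient_variances(transform, rho):
    """Variance of each coefficient for a source with covariance rho**|i - j|."""
    n = len(transform)
    variances = []
    for row in transform:
        v = 0.0
        for i in range(n):
            for j in range(n):
                v += row[i] * (rho ** abs(i - j)) * row[j]
        variances.append(v)
    return variances


def coding_gain(transform, rho):
    """Ratio of arithmetic to geometric mean of the coefficient variances, as in (26)."""
    variances = coefficient_variances(transform, rho)
    n = len(variances)
    arithmetic = sum(variances) / n
    geometric = math.exp(sum(math.log(v) for v in variances) / n)
    return arithmetic / geometric


DCT16 = dct_matrix(16)
gains = {rho: coding_gain(DCT16, rho) for rho in (0.55, 0.75, 0.95)}
```

Because the transform is orthonormal and the source has unit variance, the arithmetic mean of the variances stays at 1, so the gain is determined by the geometric mean alone and grows as ρ approaches 1.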
The seven transforms are applied to 1-D zero-mean, unit-variance first-order Markov processes with adjacent element correlation ρ ranging from 0.55 to 0.95. Before inverse transformation, the transform coefficients are zonal filtered [11], which means the bit rate is reduced by simply retaining the first $n$ $(n < 16)$ transform coefficients. The MSE between the source and the reconstructed values is then calculated. Fig. 6 compares the MSE introduced by the 4:1, 4:2, and 4:3 zonal filters, which retain the first 4, 8, and 12 transform coefficients, respectively, and set all the others to zero. For a clear view, the curves in Fig. 6 show the MSE difference relative to the order-16 DCT. We notice that the four ICTs, although orthogonal, produce larger MSE than the two proposed NICTs. Therefore, the performance of an integer transform is determined largely by how well it approximates the exact DCT rather than by its orthogonality, provided that $\sigma_{r0}^{2}$ is negligible compared with $\sigma_{q}^{2}$ in (15).

TABLE V
Experimental Results on the AVS Platform
Columns: HD Sequence; Proposed NICT: PSNR Gain (dB), Bit Saving (%); Proposed MICT: PSNR Gain (dB), Bit Saving (%).
Rows: Bigship, City, Crew, Optis, Raven, Sheriff, Pedestrian, Riverbed, RushHour, Station, Sunflower, Tractor, Average.
The numerical entries are not recoverable here.

C. Objective Evaluation

The two proposed types of 2-D order-16 integer transforms were integrated into the RM62b platform for AVS EP and the JM11 platform for H.264 HP, respectively, using 16-bit integer arithmetic. The implementation issues are addressed in Section V. The test conditions are listed in Table IV. Tables V and VI show the performance gains compared to AVS EP using a 2-D order-8 ICT and to H.264 HP using both 2-D order-4 and order-8 ICTs, respectively. The improvement is measured by the PSNR gain at equal bit rate or by the bit rate saving at the same PSNR, using the method in [27].
With the NICT, the improvements are all greater than 0.1 dB and more than 0.2 dB on average; in the best case, Riverbed, the gain is up to 0.47 dB on the AVS platform and 0.48 dB on the H.264 platform. The sequence Riverbed has smooth textures and global motion, so the energy of the prediction errors is quite small, whether inter or intra prediction is used. In the transform domain, the prediction errors can be represented by only a few coefficients of the 2-D order-16 NICT, most of which cluster in the low-frequency area. With the MICT, the average improvement is around 0.06 dB less than with the NICT on both platforms. However, for sequences with large homogeneous regions or smooth motion, such as Crew, Pedestrian, RushHour, and Sunflower, the performance of the MICT is comparable to or even better than that of the NICT, owing to the efficiency of the first three basis vectors. For sequences full of detail, such as Sheriff, Station, and Tractor, the gain of the MICT becomes trivial because of its relative inefficiency in compressing high-frequency components. Overall, the performance gap between the proposed NICT and MICT is small.

Fig. 6. MSE with (a) 4:1, (b) 4:2, and (c) 4:3 zonal filtering.

TABLE VI
Experimental Results on the H.264 Platform
Columns: HD Sequence; Proposed NICT with order-4, order-8, and order-16 transforms: PSNR Gain (dB), Bit Saving (%); Proposed NICT with order-8 and order-16 transforms: PSNR Gain (dB), Bit Saving (%); Proposed MICT with order-4, order-8, and order-16 transforms: PSNR Gain (dB), Bit Saving (%); Proposed MICT with order-8 and order-16 transforms: PSNR Gain (dB), Bit Saving (%).
Rows: BigShip, City, Crew, Optis, Raven, Sheriff, Pedestrian, Riverbed, RushHour, Station, Sunflower, Tractor, Average.
The numerical entries are not recoverable here.

Fig. 7(a) and (b) show the percentage of MBs coded by the 2-D order-16 integer transforms. On average, more than half of the MBs are coded by 2-D order-16 transforms, and for some sequences, such as Sunflower and Station, the percentage is up to 80%. This implies that 2-D order-16 integer transforms are very useful in HD video coding. There is a tendency that the higher the bit rate, the less the 2-D order-16 integer transforms are used. This is because, at high bit rates, many high-frequency coefficients survive quantization, and high-cost coefficients, i.e., (level, run) pairs with large runs, have to be coded owing to the large block size of 16×16. Hence, the 2-D order-8 ICT is more likely to be selected.

We also study the VBT scheme using three transforms, the 2-D order-4, order-8, and order-16 transforms, on the H.264 platform. As shown in Table VI, the performance gap between the two VBT schemes with and without the 2-D order-4 ICT is marginal. The same conclusion can be drawn from Fig. 7(c) and (d), where the percentages of MBs coded by the 2-D order-4, order-8, and order-16 transforms are compared on the H.264 platform. The percentage of MBs using the 2-D order-4 ICT is very small, as expected. Therefore, in the proposed VBT scheme, the 2-D order-4 ICT in H.264 is removed.

D. Subjective Evaluation

Using the 2-D order-16 transform can preserve more details and provide better visual quality, especially at low bit rates. Fig. 8 gives two examples for subjective comparison. The images are all cropped from H.264-coded HD videos. The indicated PSNR and number of bits are the R-D information of the associated frames. The vertical edges of the buildings in City and the horizontal edges of the railway sleepers in Station are clearly better preserved with the additional 2-D order-16 transforms. At high bit rates, however, the visual quality without the 2-D order-16 transform is already good enough, so the improved fidelity from using 2-D order-16 transforms is hard to observe visually.


Fig. 7. Proportion of different block size transforms used in H.264 HP. (a) 2-D order-8 ICT and order-16 NICT. (b) 2-D order-8 ICT and order-16 MICT. (c) 2-D order-8 and order-4 ICTs and 2-D order-16 NICT. (d) 2-D order-8 and order-4 ICTs and 2-D order-16 MICT.

Fig. 8. Images cropped from City (720p) and Station (1080p), both coded with QP = 32. (a) and (d) are coded using H.264 HP. (b) and (e) are coded using H.264 HP with additional 2-D order-16 NICT. (c) and (f) are coded using H.264 HP with additional 2-D order-16 MICT.

TABLE VII
Operations of Fast Algorithm and Matrix Multiplication
Columns: Operation; Fast MICT; Matrix Multiplication.
Rows: Addition, Multiplication, Shift.
Only the Shift row survives: 32 for the fast MICT and 0 for matrix multiplication.

TABLE VIII
Execution Time of Fast Algorithms and Matrix Multiplication
Columns: Fast Algorithm (s); Matrix Multiplication (s); Saving Time (%).
Rows: DCT, MICT, $M_{8o}$.
The numerical entries are not recoverable here.

E. Efficiency of the Fast Algorithm

If the transform is implemented by matrix multiplication, the complexity is high because of the many additions and multiplications introduced by the dot products. The fast algorithm developed in Section IV-B implements the transformation process of the MICT with shifts and additions only. Table VII compares the numbers of additions, shifts, and multiplications required to transform a 16-point vector using the fast algorithm and matrix multiplication, respectively. Without multiplications, the computational complexity is reduced significantly. However, the number of additions required by the fast algorithm is considerable, because multiplications by small-magnitude integers are replaced by combinations of additions and shifts.

The 2-D order-16 MICT was implemented in C using both the fast algorithm and matrix multiplication. Ten thousand random 16×16 blocks were generated, in which the numbers were uniformly distributed from −256 to 255. The total execution time for transforming these blocks by each of the two methods was recorded. The PC platform has a 3.2 GHz Intel Pentium 4 CPU and 1.0 GB of RAM. Two Windows API functions, QueryPerformanceFrequency and QueryPerformanceCounter, are used for timing [28]; they measure the execution time to nanosecond accuracy. The time comparison is shown in the third row of Table VIII: the fast algorithm saves about 89% of the computational time.

The efficiency of the fast algorithm for the MICT is compared with that of the fast DCT (FDCT) [29], because there are no publications on fast algorithms for the 2-D order-16 ICT, NICT, or MICT. We repeated the experiment described above for the DCT and recorded the time for computing the 2-D order-16 DCT using the FDCT and matrix multiplication, respectively. The second row of Table VIII shows that the FDCT saves 93% of the computational time compared with matrix multiplication. The time saving of the proposed fast algorithm for the 2-D order-16 MICT is thus 4 percentage points lower than that of the FDCT for the DCT, because of the different dyadic symmetries in $M_{8o}$.

We are also concerned with the efficiency of the fast algorithm used for the bottom-right module in Fig. 3(a), which is designed specifically for the multiplication with the matrix $M_{8o}$.
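The replacement of multiplications by small integers with additions and shifts, which accounts for the considerable addition count noted above, can be sketched as follows. The function name and the operation-counting convention are hypothetical, and the constants used in the usage note are arbitrary small integers rather than entries of (24).

```python
def shift_add_multiply(x, c):
    """Multiply x by the integer constant c using only shifts and additions.

    Returns (product, additions, shifts), where the counts follow the binary
    expansion of |c|: one shift per set bit above bit 0 and one addition per
    set bit beyond the first. A final negation handles negative constants.
    """
    magnitude = abs(c)
    product = 0
    terms = 0
    shifts = 0
    bit = 0
    while magnitude:
        if magnitude & 1:
            if bit:
                product += x << bit   # shifted copy of x
                shifts += 1
            else:
                product += x          # unshifted copy of x
            terms += 1
        magnitude >>= 1
        bit += 1
    additions = max(terms - 1, 0)
    if c < 0:
        product = -product
    return product, additions, shifts
```

For example, `shift_add_multiply(7, 5)` forms 7 + (7 << 2), which uses one addition and one shift. Constants with few nonzero bits are therefore cheap, which is one reason the factor matrices are required to have very small elements.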
Similarly, ten thousand 8×8 random blocks were generated, in which the numbers were uniformly distributed from −256 to 255. As shown in the fourth row of Table VIII, the fast algorithm saves almost 77% of the computational time compared with matrix multiplication. This shows that the fast algorithm for $M_{8o}$ is less efficient than that for the whole 2-D order-16 MICT. If the structure of the even part is different from $T_{8e}$ in (9) and the existing fast algorithms for the 2-D order-8 ICTs cannot be used, the method for decomposing $M_{8o}$ has to be applied to the even part as well, and the fast algorithm will be about 10% less efficient; overall, however, the computational complexity is still reduced significantly.

VII. Conclusion

In this paper, 2-D order-16 transforms are proposed to exploit the spatial property of HD videos. Specifically, we propose two types of 2-D order-16 integer transforms, the NICT and the MICT, both of which allow matrix elements to be selected more freely by releasing the orthogonality constraint and can provide performance comparable to that of the DCT. The latter is even simpler. Furthermore, considerable effort has been spent on reducing the complexity, including compatibility with the 2-D order-8 ICT in AVS or H.264 and the development of the fast algorithm. The two 2-D order-16 integer transforms are integrated into AVS EP and H.264 HP, respectively, and used adaptively as an alternative to the 2-D order-8 ICTs according to local activities. The experimental results show that 2-D order-16 integer transforms can be efficient coding tools, especially for HD video coding.

References

[1] Generic Coding of Moving Pictures and Associated Audio Information Part 2: Video, ITU-T Rec. H.262 and ISO/IEC MPEG-2.
[2] Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification, document JVT-G050, ITU-T Rec. H.264 and ISO/IEC AVC, Mar. 2003.
[3] Advanced Video Coding for Generic Audiovisual Services, ITU-T Rec. H.264 and ISO/IEC AVC, Mar.
[4] Information Technology-Advanced Coding of Audio and Video-Part 2: Video, GB/T AVS, May.
[5] AVS Video Group. (2006, Sep.). Information Technology-Advanced Coding of Audio and Video: Part 2. Video, AVS-N1318.
[6] D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards and Practice. Boston, MA: Kluwer.
[7] ICT Comparison for Adaptive Block Transforms, document ITU-T SG16/Q6 VCEG and VCEG-L12, Jan. 2001.
[8] Simplified Adaptive Block Transforms, document ITU-T SG16/Q6 VCEG, Dec. 2001.
[9] S. Naito and A. Koike, "Efficient coding scheme for super high definition video based on extending H.264 high profile," in Proc. SPIE Vis. Commun. Image Process., Jan. 2006.
[10] S. Ma and C.-C. Kuo, "High-definition video coding with supermacroblocks," in Proc. SPIE Vis. Commun. Image Process., Jan. 2007.
[11] N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video. Englewood Cliffs, NJ: Prentice-Hall.
[12] J. Liang and T. D. Tran, "Fast multiplierless approximations of the DCT with the lifting scheme," IEEE Trans. Signal Process., vol. 49, no. 12, Dec.
[13] A 16-bit Architecture for H.264, Treating DCT Transform and Quantization, document ITU-T SG16/Q6 VCEG and VCEG-M16, Apr. 2001.
[14] A. Habibi, "Comparison of nth-order DPCM encoder with linear transformation and block quantization techniques," IEEE Trans. Commun., vol. 19, no. 6, Dec.
[15] W. K. Pratt, Digital Image Processing. New York: Wiley.
[16] A. Papoulis and S. U. Pillai, Probability, Random Variables, and Stochastic Processes. Boston, MA: McGraw-Hill.
[17] T. Wiegand and B. Girod, "Lagrange multiplier selection in hybrid video coder control," in Proc. IEEE Int. Conf. Image Process., vol. 3, Oct. 2001.
[18] A. N. Netravali and J. O. Limb, "Picture coding: A review," Proc. IEEE, vol. 68, no. 3, Mar.
[19] W. K. Cham, "Development of integer cosine transforms by the principle of dyadic symmetry," in Proc. Inst. Electr. Eng. I: Commun. Speech Vis., no. 4, 1989.
[20] C.-X. Zhang, J. Lou, L. Yu, J. Dong, and W. K. Cham, "The technique of pre-scaled integer transform in hybrid video coding," in Proc. IEEE Int. Symp. Circuits Syst., vol. 1, May 2005.
[21] W. K. Cham and Y. T. Chan, "An order-16 integer cosine transform," IEEE Trans. Signal Process., vol. 39, no. 5, May.
[22] B. Widrow, I. Kollar, and M.-C. Liu, "Statistical theory of quantization," IEEE Trans. Instrum. Meas., vol. 45, no. 2, Apr.
[23] ABT Cleanup and Complexity Reduction, document JVT-E087.doc, Oct. 2002.
[24] J. Dong, J. Lou, C.-X. Zhang, and L. Yu, "A new approach to compatible adaptive block size transforms," in Proc. SPIE Vis. Commun. Image Process., Jul. 2005.
[25] M. Wien, "Variable block size transforms for H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, Jul.
[26] Study of ABT for HDTV Coding, document JVT-E073r2.doc, Oct. 2002.
[27] Calculation of Average PSNR Differences Between RD-curves, document ITU-T SG16/Q6 VCEG, Apr. 2001.

[28] Microsoft Developer Network. [Online]. Available: http://msdn2.microsoft.com/en-us/library/ms674866(vs.85).aspx
[29] C. Loeffler, A. Ligtenberg, and G. S. Moschytz, "Practical fast 1-D DCT algorithms with 11 multiplications," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 2, May 1989, pp. 988-991.

Jie Dong (S'07) received the B.Eng. and M.Eng. degrees in information engineering from Zhejiang University, Hangzhou, China, in 2002 and 2005, respectively. She is currently working toward the Ph.D. degree in electronic engineering at the Chinese University of Hong Kong, Hong Kong, China. Her research interests include HD video compression and processing.

King Ngi Ngan (M'79-SM'91-F'00) received the Ph.D. degree in electrical engineering from Loughborough University of Technology, Leicestershire, U.K. Currently, he is a Chair Professor at the Department of Electronic Engineering, the Chinese University of Hong Kong, Hong Kong, China, and was previously a Full Professor at Nanyang Technological University, Singapore, and University of Western Australia, Perth. He is the author of three books, five edited volumes, and over 200 refereed technical papers in the areas of image/video coding and communications. Prof. Ngan is an Associate Editor of the Journal on Visual Communications and Image Representation, as well as an Area Editor of the European Association for Signal Processing Journal of Signal Processing: Image Communication, and served as an Associate Editor of the IEEE Transactions on Circuits and Systems for Video Technology and Journal of Applied Signal Processing. He chaired a number of prestigious international conferences on video signal processing and communications and served on the advisory and technical committees of numerous professional organizations. He is a general co-chair of the IEEE International Conference on Image Processing, to be held in Hong Kong. He is a Fellow of The Institution of Engineering and Technology (U.K.) and the Institution of Engineers of Australia, and was an IEEE Distinguished Lecturer.

Chi-Keung Fong (S'07) received the B.Eng. degree in electrical and electronic engineering from The University of Hong Kong, Hong Kong, China, in 1999, and the M.Phil. degree in electronic engineering in 2003 from the Chinese University of Hong Kong, Hong Kong, China, where he is currently working toward the Ph.D. degree. His research interests include transform coding, pattern matching, and image processing.

Wai-Kuen Cham (S'77-M'79-SM'91) graduated from the Chinese University of Hong Kong, Hong Kong, China, in 1979, in electronics. He received the M.Sc. and Ph.D. degrees from Loughborough University of Technology, Leicestershire, U.K., in 1980 and 1983, respectively. Since May 1985, he has been with the Department of Electronic Engineering, the Chinese University of Hong Kong. His research interests include image coding, image processing, and video coding.
