1462 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 10, OCTOBER 2009

Size: px
Start display at page:

Download "1462 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 10, OCTOBER 2009"

Transcription

1 1462 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 10, OCTOBER D Order-16 Integer Transforms for HD Video Coding Jie Dong, Student Member, IEEE, King Ngi Ngan, Fellow, IEEE, Chi-Keung Fong, Student Member, IEEE, and Wai-Kuen Cham, Senior Member, IEEE Abstract In this paper, the spatial properties of highdefinition (HD) videos are investigated based on a large set of HD video sequences. Compared with lower resolution videos, the prediction errors of HD videos have higher correlation. Hence, we propose using 2-D order-16 transforms for HD video coding, which are expected to be more efficient to exploit this spatial property, and specifically propose two types of 2-D order-16 integer transforms, nonorthogonal integer cosine transform (ICT) and modified ICT. The former resembles the discrete cosine transform (DCT) and is approximately orthogonal, of which the transform error introduced by the nonorthogonality is proven to be negligible. The latter modifies the structure of the DCT matrix and is inherently orthogonal, no matter what the values of the matrix elements are. Both types allow selecting matrix elements more freely by releasing the orthogonality constraint and can provide comparable performance with that of the DCT. Each type is integrated into the audio and video coding standard (AVS) Enhanced Profile (EP) and the H.264 High Profile (HP), respectively, and used adaptively as an alternative to the 2-D order-8 transform according to local activities. At the same time, many efforts have been devoted to further reducing the complexity of the 2-D order-16 transforms and specially for the modified ICT, a fast algorithm is developed and extended to a universal approach. Experimental results show that 2-D order-16 transforms provide significant performance improvement for both AVS Enhanced Profile and H.264 High Profile, which means they can be efficient coding tools especially for HD video coding. Index Terms AVS, H.264, HDTV, ICT, order-16 transform, VBT. I. Introduction TODAY, high-definition (HD) videos become more and more popular with many applications, such as highdefinition television (HDTV) broadcasting, high density storage media, video-on-demand (VOD), and surveillance. The most common algorithms for HD video coding are mainly based on the hybrid framework of motion compensated prediction (MCP) and discrete cosine transform (DCT). Manuscript received July 13, 2007; revised April 28, First version published July 7, 2009; current version published September 30, This work was supported in part by the Chinese University of Hong Kong, Hong Kong, China under the Focused Investment Scheme, Project , and the Research Grants Council of Hong Kong, Project CUHK This paper was recommended by Associate Editor P. Topiwala. The authors are with the Department of Electronic Engineering, the Chinese University of Hong Kong, Hong Kong, China ( jdong@ee.cuhk.edu.hk; knngan@ee.cuhk.edu.hk; ckfong@ee.cuhk.edu.hk; wkcham@ee.cuhk.edu.hk). Color versions of one or more of the figures in this paper are available online at Digital Object Identifier /TCSVT /$26.00 c 2009 IEEE MPEG-2 [1] is the most widely used standard for HDTV broadcasting, which uses only basic coding tools, such as the 2-D order-8 DCT, bidirectional motion prediction and compensation, and variable length coding (VLC). Although the efficiency is relatively low, the widespread acceptance of MPEG-2 is mainly attributed to its low complexity and its adoption in consumer electronic products. The latest video coding standard H.264/AVC [2] can bring significant performance improvement compared to all the previous standards. It employs complicated intra and inter prediction tools to achieve accurate prediction and advanced entropy coding schemes to remove statistical redundancy. The recently completed High Profile (HP) [3], aiming at HD video coding, further improves the coding efficiency by variable block size transforms (VBT), where a 2-D order-8 integer cosine transform (ICT) is used adaptively as an alternative to the 2-D order-4 ICT in H.264. The audio and video coding standard (AVS) [4] is the national standard of China. Its Enhanced Profile (EP) [5] also aims at HD video coding in the efficiency aspect and does not provide rich functionalities. So, it is quite similar to the Main Profile of H.264. JPEG 2000 [6] is another attractive choice for HD video coding. Compared with those MCP/DCT-based coding techniques, it provides a wavelet-based framework and offers the richest set of features, such as lossy and lossless coding, spatial and quality scalability, region-of-interest (ROI) coding, and error resilience tools. Although it was originally developed for still image coding and does not remove temporal redundancy of video sequences, JPEG 2000 can address the needs for very demanding HD applications, e.g., digital cinema, and remote medical consultancy. However, the techniques, focusing on improving coding efficiency, are all originally designed as general coding tools for videos of various resolutions. Although efficient when applied to HD videos, they do not fully exploit their properties. Wien and Sun [7] once proposed a VBT scheme to H.264, with seven transform block sizes of 4 4 up to However, this scheme was not proposed for HD video coding and the reported experimental results were based on a set of CIF sequences. The maximum transform block size was finally reduced to 8 8 due to the visible artifacts and the overall complexity [8]. Recently, Naito and Koike [9], and Ma and Kuo [10] have investigated the efficiency of using supermacroblock (MB) for HD video coding within the framework of H.264. Naito and Koike [9] allow the MB size to be adaptively selected from to 2 N 2 N (N 4), where N

2 DONG et al.: 2-D ORDER-16 INTEGER TRANSFORMS FOR HD VIDEO CODING 1463 may be configured at the picture level. The block sizes of intra prediction and motion compensation are scaled, according to the MB size. The experimental results based on two 4K 2K sequences are reported and significant bit saving can be observed, when the sequences are coded using the fixed QP 45. In [10], the MB size is fixed to be and a 2-D order- 16 integer transform is also proposed and adaptively used in addition to the 2-D order-4 and order-8 ICTs in H.264 HP. Modest PSNR gain is shown based on two 1080p sequences, but subjective improvement is invisible. In this paper, we study the unique properties of HD videos and thus develop special coding tools. First of all, the spatial properties of HD videos and their motion prediction errors are both investigated and compared with those of low resolution videos. It is found that the HD videos are distinguished from other low resolution ones by higher spatial correlation. To exploit the property, we propose using 2-D order-16 transforms for HD video coding. Specifically, two types of 2-D order-16 integer transforms are proposed, both of which can achieve comparable performance with the DCT, if well-designed, and provide the freedom of selecting matrix elements by releasing the orthogonality constraint, such that trade-off can be made between performance and complexity. One type, called nonorthogonal ICT (NICT), allows the basis vectors to be nonorthogonal, in order to use matrix elements with small magnitudes on one hand and maintain the structure of the DCT matrix, e.g., dyadic symmetries and relative magnitudes, on the other hand. The transform error, which makes the average variance of the reconstruction error larger than that of the quantization error due to the nonorthogonality, is analyzed quantitatively. The specific NICT proposed in this paper achieves the balance among the performance, the complexity, and the transform error. The other type, called modified ICT (MICT), is an orthogonal integer transform with the dyadic symmetries of the DCT matrix modified. It is naturally orthogonal, no matter what the values of the elements are. There are some related publications. The one proposed by Wien and Sun [7] has a one-norm transform matrix, which means the norms of all the basis vectors are the same and the memory for storing the scaling matrix can be saved. However, as the cost, this 2-D order-16 integer transform has much lower coding gain [11] than the DCT and introduces ringing artifacts, when integrated into H.264. The one proposed by Ma and Kuo [10] is simple and compatible with the 2-D order-8 ICT in H.264. However, the proposed transform modifies the structure of the DCT matrix so significantly that it is actually equivalent to the combined processes of 2-D order-8 ICTs and 2-D order-2 Hadamard Transforms (HT). In other words, first, the four 8 8 partitions of the input block are transformed by 2-D order-8 ICT, respectively, and then the coefficients at the corresponding positions of the four transformed blocks are transformed by 2-D order-2 HT and thus further de-correlated. Liang et al. [12], [13] proposed a family of approximations of the DCT, named bindct, and the lifting scheme for implementation. The lifting parameters, which are theoretically rational, are replaced by proper dyadic approximations and thus enable the multiplication-free fast algorithm. The trade-off between coding gain and computational complexity can be made by tuning the resolution of the approximation. However, the bindct is not a unitary transform and thus uses the exact inversion, although invertibility is guaranteed by the lifting scheme. As a drawback, matrix multiplication cannot provide the exact same results as the lifting scheme, unless additional shifts are used. The proposed two types of 2-D order-16 transforms are both integrated into AVS EP and H.264 HP, respectively, using 16-bit integer arithmetic and used adaptively as an alternative to the 2-D order-8 ICT in whichever standard they are integrated into. The related problems of transform size selection, intra prediction, entropy coding, and coded block pattern (CBP) are solved. Both 2-D order-16 integer transforms are compatible with the 2-D order-8 ones in the standards. The fast algorithm for the MICT is developed and extended to a general approach. The remainder of this paper is organized as follows. Section II investigates the unique spatial property of HD videos. To fully exploit the property, two types of 2-D order- 16 integer transforms are proposed in Section III. Section IV introduces the efforts we have made to reduce the complexity of the proposed transforms. Section V briefly presents the solutions of the related problems, when the 2-D order- 16 transforms are integrated into the standards. Section VI reports the experimental results, followed by the conclusion in Section VII. II. Spatial Analysis of HD Videos The most useful statistical function for all video coding methods is correlation. Spatial correlation is useful for estimating the energy of the coefficients in the case of unitary transform coding, whilst temporal correlation can help to estimate the prediction error and design the predictor for interframe coder [14]. Our research work starts from analyzing the spatial correlation of raw HD videos. Since all the transforms used for video coding are separable and the spatial property of columns and rows can be taken as equivalent [15], we only focus on the horizontal correlation. In a video frame, each row is regarded as an observation of a discrete stochastic process and the observations in total are the rows in one frame. So each column represents a random variable. In probability theory and statistics, correlation of a stochastic process is often measured by the function, correlation coefficient r, as defined below [16] r(n 1,n 2 )=E[(x(n 1 ) µ 1 ) (x(n 2 ) µ 2 )]/(σ 1 σ 2 ) (1) where n 1 and n 2 are two positions in a certain row, x(n 1 ) and x(n 2 ) are the values of pixels at the two positions with ensemble averages, µ 1 and µ 2, and standard deviations, σ 1 and σ 2, and E is the ensemble average operator. Due to the variable local activities in a frame, r(n, n + τ) depends not only on τ but also on n in different positions. In other words, a video frame is not a collection of samples of a wide-sense stationary (WSS) stochastic process. Hence, for a given τ, r(τ) is a random variable instead of a constant. We calculate r(τ) with τ = 1, which represents the correlation of adjacent pixels. Table I shows the expected value

3 1464 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 10, OCTOBER 2009 TABLE I The Correlation Coefficient of Two Adjacent Pixels in Raw Video Sequences TABLE II The Correlation Coefficients r(1) and r(2) of Prediction Errors of Video Sequences Test Sequence Resolution r(1) µ r1 σ r1 Riverbed Flamingo HD Kayak Fireworks Raven Night HD Crew City FlowerGarden F1 SD Basketball Mobile Foreman News CIF Tempete Paris and standard deviation of r(1). A frame in a video sequence is a still image that can be approximated by the first-order Markov source with correlation coefficient tending to 1.0 [15]. Therefore, as what we expected, for all resolution sequences, the mean values of r(1) approach the upper bound of 1.0 and the variances are very small. Though, overall, r(1) of HD videos is slightly higher than others, the difference is not substantial. However, the transforms in video codecs are usually applied to motion prediction errors instead of original pixels. Even for intra frames, prediction errors are transformed thanks to the intra predictions in the recent video coding standards, such as H.264 and AVS. Therefore, it makes sense to investigate the correlation of prediction errors, which influence coding strategies directly. In this paper, the prediction errors are obtained using the criterion of rate distortion (R D) cost [17] and only the horizontal correlation is analyzed, as the correlation equivalence of two spatial directions still holds for prediction errors, according to our experimental results. Table II shows the expected values, µ r1 and µ r2,ofr(1) and r(2) of prediction errors. Generally, the correlation of prediction errors increases with the resolution, and the difference of r(1) values between HD and low resolution videos is substantial. The mean values of r(2), the correlation of every other pixel, are relatively high for HD videos, but approach 0 for most of the low resolution videos. Therefore, as evident in Table II, the prediction errors of HD videos are distinguished from the low resolution ones by higher spatial correlation. However, this conclusion does not hold in some extreme cases. For example, if a human face in sequence Crew was captured by a CIF video, the spatial correlation of the CIF would be higher than the HD. However, based on a large set of Test Sequence Resolution r(1) r(2) µ r1 σ r1 µ r2 σ r2 Riverbed Flamingo Kayak HD Tractor Pedestrians Station Fireworks Raven Optis Night HD Crew Harbour City FlowerGarden F1 SD Basketball Mobile SD Foreman News CIF Tempete Mobile Paris HD videos, the conclusion is useful and quite instructive for studying the tools specially for coding HD videos. For a given bit rate, the MSE produced by transform coding improves with the block size, but the improvement is not significant as the block size increases beyond [18]. In other words, compared with order-8 transforms, order- 16 transforms compact energy to a smaller proportion of coefficients, which are more likely to survive the quantization. In practice, for image or video coding, the performance of 2-D order-16 transform suffers nonoptimal entropy coding (typically run-length coding), as it is difficult to accurately model the distribution of a run, which has a much larger dynamic range than that produced by order-8 transforms and dramatically varies throughout video sequences. Besides, order-16 transform is much more computationally complex, so order-8 transform is a good trade-off in the current image and video coding standards. However, based on the aforementioned observation that the prediction errors of HD videos have higher correlation, 2-D order-16 transforms deserve more study as a special coding tool, as the transform coefficients have high probabilities to be distributed in the low frequency domain and thus run can be modeled more accurately. Table II also shows the standard deviations, σ r1 and σ r2,of r(1) and r(2) of prediction errors, where there is no significant difference between HD and low resolution videos. So, the

4 DONG et al.: 2-D ORDER-16 INTEGER TRANSFORMS FOR HD VIDEO CODING 1465 Fig. 1. Fig. 2. Flow diagram of traditional ICTs. Flow diagram of the pre-scaled integer transform. spatial correlation of prediction errors varies greatly within an HD video sequence, just like that in low resolution videos. To exploit such a property, VBT should be employed, in which smaller transforms are the complements to the 2-D order-16 one, in order to reduce the complexity and avoid the ringing artifacts especially in more detailed areas. III. The Proposed 2-D Order-16 Integer Transforms A. Review of ICT ICT is generated from the DCT by replacing the realnumbered elements of the DCT matrix with integers while maintaining the structure, such as relative magnitudes and signs, dyadic symmetries, and orthogonality, among the matrix elements [19]. ICT can be implemented using integer arithmetic without mismatch between the encoder and decoder and if well-designed, can provide almost the same compression efficiency as the DCT. Furthermore, ICT has fast algorithms, which can be developed in the similar way for the DCT. The flow diagram of ICT from the input data to the reconstructed data is shown in Fig. 1. As the transform matrix of ICT contains only integers and the norms of basis vectors are much larger than unity, the transformation process shown as (2) is not normalized. Therefore, a normalization process in (3), known as scaling, is required after transformation to ensure the conservation of energy and the reversibility of ICT F n n = T n f n n T T n (2) S n n = F n n //R n n. (3) In (2) and (3), f n n is the input data, T n is the n n transform matrix, S n n is the output of ICT and symbol // indicates that each element of the left matrix is divided by the element at the same position of the right one. The elements in the scaling matrix R n n are derived from the norms of basis vectors of T n as below, where vector m n 1 contains the norms of all basis vectors R n n = m n 1 m T n 1 (4) m n 1 (i) = n 1 Tn 2 (i, j), 0 i<n. (5) j=0 The divisions in (3) are usually approximated by integer multiplication and shift as shown in (6) S n n = F n n P n n >> N (6) where symbol indicates that each element of the left matrix is multiplied by the element at the same position of the right one and >> N means the N-bit right shift. Similarly, for the inverse ICT, the whole process including inverse scaling and inverse transformation can be represented by (7) as follows: f n n =(Tn T (S n n P n n ) T n ) >> N. (7) The block diagram shown in Fig. 1 is further simplified by the pre-scaled integer transform (PIT) (see Fig. 2), which moves the inverse scaling from the decoder to the encoder and combines it with the forward scaling as one single process, named combined scaling. With PIT, the inverse scaling is saved and thus no scaling matrix is needed to be stored, whilst the complexity of encoders remains unchanged. However, the inverse scaling moved to the encoder side can be considered as a frequency weighting matrix to the transformed signals. In order not to alter the energy distribution of the transformed signals significantly, the elements magnitudes in the scaling matrix should be almost the same. In other words, the norms of the basis vectors should be very close to each other according to (4) and (5), which is a key for PIT designing. Interested readers may refer to [20] for more details. B. The Proposed Order-16 NICT The general transform matrix of an order-16 ICT is shown in (8), at the bottom of the page [19], which has alternating even and odd symmetry with respect to the line. The basis vectors with even symmetry form the even part of the transform matrix, T 8e, whereas those with odd symmetry form the odd T 16 = x 0 x 0 x 0 x 0 x 0 x 0 x 0 x 0 x 0 x 0 x 1 x 3 x 5 x 7 x 9 x 11 x 13 x 15 x 15 x 13 x 2 x 6 x 10 x 14 x 14 x 10 x 6 x 2 x 2 x 6 x 3 x 9 x 15 x 11 x 5 x 1 x 7 x 13 x 13 x 7 x 4 x 12 x 12 x 4 x 4 x 12 x 12 x 4 x 4 x 12 x 5 x 15 x 7 x 3 x 13 x 9 x 1 x 11 x 11 x 1 x 6 x 14 x 2 x 10 x 10 x 2 x 14 x 6 x 6 x 14 x 7 x 11 x 3 x 15 x 1 x 13 x 5 x 9 x 9 x 5 x 8 x 8 x 8 x 8 x 8 x 8 x 8 x 8 x 8 x 8 x 9 x 5 x 13 x 1 x 15 x 3 x 11 x 7 x 7 x 11 x 10 x 2 x 14 x 6 x 6 x 14 x 2 x 10 x 10 x 2 x 11 x 1 x 9 x 13 x 3 x 7 x 15 x 5 x 5 x 15 x 12 x 4 x 4 x 12 x 12 x 4 x 4 x 12 x 12 x 4 x 13 x 7 x 1 x 5 x 11 x 15 x 9 x 3 x 3 x 9 x 14 x 10 x 6 x 2 x 2 x 6 x 10 x 14 x 14 x 10 x 15 x 13 x 11 x 9 x 7 x 5 x 3 x 1 x 1 x 3 (8)

5 1466 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 10, OCTOBER 2009 part, T 8o. Designing an order-16 ICT is equivalent to designing T 8e and T 8o, and the orthogonality of each part is necessary and sufficient for the orthogonality of the order-16 ICT x 0 x 0 x 0 x 0 x 0 x 0 x 0 x 0 x 2 x 6 x 10 x 14 x 14 x 10 x 6 x 2 x 4 x 12 x 12 x 4 x 4 x 12 x 12 x 4 T 8e = x 6 x 14 x 2 x 10 x 10 x 2 x 14 x 6 x 8 x 8 x 8 x 8 x 8 x 8 x 8 x 8 x 10 x 2 x 14 x 6 x 6 x 14 x 2 x 10 x 12 x 4 x 4 x 12 x 12 x 4 x 4 x 12 x 14 x 10 x 6 x 2 x 2 x 6 x 10 x 14 x 1 x 3 x 5 x 7 x 9 x 11 x 13 x 15 x 3 x 9 x 15 x 11 x 5 x 1 x 7 x 13 x 5 x 15 x 7 x 3 x 13 x 9 x 1 x 11 T 8o = x 7 x 11 x 3 x 15 x 1 x 13 x 5 x 9 x 9 x 5 x 13 x 1 x 15 x 3 x 11 x 7. (10) x 11 x 1 x 9 x 13 x 3 x 7 x 15 x 5 x 13 x 7 x 1 x 5 x 11 x 15 x 9 x 3 x 15 x 13 x 11 x 9 x 7 x 5 x 3 x 1 T 8e is an order-8 ICT. For the compatibility, it is designed to be the scaled version of the transform matrix of the order-8 ICT in H.264 or AVS, according to whichever standard it is integrated into. The scaling factor is 4. In other words, the element set of the even part, {x 0,x 2,...x 12,x 14 }, is equal to {32, 40, 40, 36, 32, 24, 16, 8} and {32, 48, 32, 40, 32, 24, 16, 12}, as proposed to AVS and H.264, respectively. The possible element sets of an orthogonal T 8o, i.e., x 1, x 3,... x 13, x 15, are the solutions of a set of three quadratic equations [21]. The solutions have relatively large magnitudes, i.e., represented by at least 6 bits, and those with small DCT distortion [7] have even larger magnitudes that increase the computational complexity significantly. In this paper, NICT is proposed for the freedom of making trade-off between complexity and performance at the expense of orthogonality. Except orthogonality, NICT preserves all the advantages of ICT, such as bit-exact implementation and the fast algorithm. Based on the property of orthogonal transforms, if there is no quantization in the frequency domain, signals can be reconstructed perfectly; otherwise, the average variance of the reconstruction error equals that of the quantization error [11]. For a nonorthogonal transform, the average variance of the reconstruction error is larger than that of the quantization error, and even though there is no quantization, the reconstruction is imperfect. Hence, we analyze the transform error introduced by the nonorthogonality. First of all, we discuss the reconstruction error without quantization. Let vector x be a sample from a 1-D, zeromean, unit-variance first-order Markov process with length N and adjacent element correlation ρ. x is transformed by nonorthogonal T, and the vector θ is obtained in the transform domain, i.e., θ = Tx. Let the reconstructed vector be y, where y = T T θ = T T Tx. Obviously, T T T is not equal to the identity matrix I, and the difference between T T T and I is denoted as E r, i.e., E r = T T T I. The average variance of the (9) reconstruction error σr0 2 is expressed as σr0 2 = 1 N 1 E [(x(k) y(k)) 2 ]= 1 N N E [(x y)t (x y)] k=0 = 1 N E [(E rx) T (E r x)] = 1 N E [xt E T r E rx]. (11) Then, we denote M = Er T E r, and the deduction continues σr0 2 = 1 N E [xt Mx] = 1 N 1 N 1 M(k, j)e [x(j)x(k)] N k=0 j=0 = 1 N 1 N 1 M(k, j)r(j k) N k=0 j=0 1 N 1 N 1 N σ2 x M(k, j) (12) k=0 j=0 where R(.) and σx 2 are the autocorrelation function and the variance of the source, respectively. As the autocorrelation is less than or equal to the variance, the upper bound of σr0 2 for a given NICT is shown in (12). Hence, minimizing σr0 2 is equivalent to minimizing the term N 1 N 1 k=0 j=0 M(k, j) in (12). Table III gives some examples of σr0 2 in the design of NICT with σx 2 equal to 1.0, and it will be explained later. Then, we take the quantization into consideration and show the relationship of the average variances of the reconstruction error σr 2 and the quantization error σ2 q. Let u be the quantized version of θ and y r be the reconstructed vector, i.e., y r = T T u. The quantization error is denoted as q, i.e., q = θ u. The average variance of σr 2 is expressed by (13) σr 2 = 1 N 1 E [(x(k) y r (k)) 2 ] N k=0 = 1 N E [(x y r) T (x y r )] = 1 N E [(T 1 θ T T u) T (T 1 θ T T u)]. (13) Since T 1 is not equal to T T with the nonorthogonal T,we use (T T T er ) to represent T 1, and substitute T 1 into (13) with (T T T er ). Then we get (14) σr 2 = 1 N E [(T T q T er θ) T (T T q T er θ)] = 1 N E [qt TT T q 2q T TT er θ +(T er θ) T T er θ] = 1 N E [qt (I + E r )q 2q T T (T T T 1 )θ +(T T θ T 1 θ) T (T T θ T 1 θ)] = 1 N E [qt q + q T E r q 2q T E r θ +(y x) T (y x)] = σ 2 q + σ2 r0 + 1 N E [qt E r q] 2 N E [qt E r θ]. (14) The third and fourth terms in (14) are both equal to zero, because the autocorrelation of q and the cross-correlation of q and θ are both zero as shown by Widrow et al. [22]. Therefore, the relationship between σr 2 and σ2 q is shown in (15) σr 2 = σ2 q + σ2 r0. (15)

6 DONG et al.: 2-D ORDER-16 INTEGER TRANSFORMS FOR HD VIDEO CODING 1467 TABLE III Comparison of Sets of Matrix Elements x 1 x 3 x 5 x 7 x 9 x 11 x 13 x 15 DCT Upper Bound Distortion (%) of σ 2 r0 (10 7 ) Clearly, the average variance of the reconstruction error σr 2 is always σr0 2 larger than the quantization error variance σ2 q,no matter the transform coefficients are quantized or not, whereas with an orthogonal transform, σr 2 is equal to σ2 q. The key of designing an NICT is the balance among σr0 2, the approximation of the DCT, and the magnitudes of the matrix elements. Table III gives 11 element sets of T 8o in (10). Their DCT distortions [7] and upper bounds of σr0 2 in (12) shown in the table are of T 8o instead of T 16 in (8). The upper five sets are given by Cham and Chan [21], which lead to orthogonal T 8o, but the magnitudes and DCT distortion of these sets are larger than the rest six sets leading to nonorthogonal T 8o. Although nonorthogonal, the bottom six element sets are all with negligible σr0 2, and specially, the one in the bottom row has the minimum DCT distortion and σr0 2. Another advantage of this element set is that its norm is close to those of the even part T 8e in (9) and therefore is suitable for PIT implementation, which is required when integrated into the AVS platform. Hence, the vector in the bottom row is used for the element set of the odd part, and the performance will be verified in Section VI. C. The Proposed Order-16 MICT The MICT transform matrix is obtained by modifying the structure of the order-16 DCT matrix using the principle of dyadic symmetry. The basis vectors of the order-16 MICT are inherently orthogonal no matter what the elements are, so compared with the elements in the NICT transform matrix, even smaller magnitudes can be selected. The even part of the general transform matrix remains the same as the T 8e in (9), while the odd part is changed, shown as (16) x 1 x 3 x 5 x 7 x 9 x 11 x 13 x 15 x 9 x 11 x 13 x 15 x 1 x 3 x 5 x 7 x 5 x 7 x 1 x 3 x 13 x 15 x 9 x 11 M 8o = x 15 x 13 x 11 x 9 x 7 x 5 x 3 x 1 x 13 x 15 x 9 x 11 x 5 x 7 x 1 x 3. (16) x 3 x 1 x 7 x 5 x 11 x 9 x 15 x 13 x 7 x 5 x 3 x 1 x 15 x 13 x 11 x 9 x 11 x 9 x 15 x 13 x 3 x 1 x 7 x 5 Fig. 3. Compatible structures for order-8 and order-16 (a) forward transformation and (b) inverse transformation. With the even part T 8e in (9) and the odd part M 8o in (16), the first three basis vectors of the order-16 MICT transform matrix resemble those of the DCT and represent low frequency components. So the MICT shows good energy compaction capability especially for the regions in HD videos, where pixels are higher correlated. Though the dyadic symmetries of most basis vectors in the odd part of the transform matrix are different from those in the DCT matrix, the MICT will not bring significant performance penalty, if the elements are well-selected, as evident in Section VI. Since the elements in the transform matrices are selected without orthogonality constraint, it is free to make a tradeoff between the performance and the magnitudes of the elements, i.e., the computational complexity. In this paper, the magnitudes of elements are represented by only 3 to 4 bits and 16-bit integer implementation can be developed easily. In detail, the even part is selected to be the same as the order-8 ICT in AVS or H.264, according to whichever it is integrated into. For the odd part, there are three considerations for elements values. First, the magnitudes should be comparable to those of the even part. Second, the waveform of the second basis of the MICT transform matrix should resemble that of the DCT. Third, it should be suitable for the design of fast algorithm, which will be illustrated in Section IV-B. Taking all these constraints into consideration, {x 1,x 3,...x 13,x 15 } is designed to be {11, 11, 11, 9, 8, 6, 4, 1}. IV. Complexity Reduction This section introduces the research efforts we have made for complexity reduction. First of all, the 2-D order-16 integer transforms are compatible with the 2-D order-8 ICT in AVS EP or H.264 HP, according to whichever standard they are integrated into. Furthermore, the fast algorithm for the 2-D order-16 MICT proposed in Section III-C is developed and extended to a general approach. A. Compatibility with the Order-8 ICT It is natural that the order-16 DCT is compatible with the order-8 one. However, it is not always true for integer transforms. Bossen [23] points out that the compatibility of order-8 and order-4 ICTs can be achieved by taking the order-4 ICT as the even part of the order-8 one. In this paper, the order-

7 1468 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 10, OCTOBER integer transforms use the order-8 ICTs in the standards or their scaled version as the even part for the compatibility. First, the compatible transformation is presented. Fig. 3(a) shows the flow diagram of the forward transformation, applicable for both types of order-16 integer transforms. The bottomright and upper-right blocks complete the function of matrix multiplication with the odd and even parts of T 16, respectively. In case of the order-16 MICT, where T 8e is exactly the same as the order-8 ICT, the upper-right block may be reused directly for the order-16 forward transformation. In the other case of the order-16 NICT, where T 8e is equal to the order-8 ICT scaled by a factor 4, the output of upper-right block should be left shifted 2 bits before they are reused for the forward transformation of the order-16 NICT. Therefore, a module specially for the order-8 forward transformation is saved, and the output data from the order-8 forward transformation can be reused by the order-16 one. For the inverse transformation, the compatibility can be achieved in a similar fashion, as shown in Fig. 3(b). Based on the compatibility of transform matrix, the inherent relationship between the scaling matrices, R 8 8 and R 16 16,is derived as (17), according to (4) and (5) R 8 8 (i, j) 2 M = R (2i, 2j), 0 i, j < 8 (17) where M equals 5 and 1 for the case of the NICT and the MICT, respectively. When the divisions in (3) are approximated by integer multiplication and right shift as shown in (6), the relationship between the scaling matrix P 8 8 and P for the scaling is P 8 8 (i, j) =P (2i, 2j) 2 K, 0 i, j < 8 (18) where K equals 5 and 1 for the case of the NICT and the MICT, respectively. When the order-16 integer transforms are implemented as PIT to be compatible with the order- 8 PIT in AVS, (18) is also true for the combined scaling with K doubled. Interested readers may refer to [24] for more details. Equation (18) indicates that P 8 8 (i, j) is 2 K times larger than P (2i, 2j) if N, the number of bits for right shift in (6), is the same for the order-16 and the order-8 transforms. The difference between P 8 8 (i, j) and P (2i, 2j) can be easily sorted out by shift operations. In other words, P (2i, 2j), 0 i, j < 8, may be used for the combined scaling of the order-8 ICT but N in (6) for 8 8 scaling is K bits less than that for the scaling. Hence, the memory for storing an 8 8 scaling matrix is saved, thus achieving the compatibility of scaling matrix. B. Fast Algorithm for the Order-16 MICT ICT and NICT have fast algorithms, which can be developed in a similar way for the DCT. However, fast algorithms for the MICTs are not guaranteed, and little research effort has been devoted to it. We propose a universal approach to developing fast algorithms for the integer transforms with different structures from the DCT matrix, including the MICT. Due to the separability of a 2-D transform and the similarity of the forward and inverse transformation flows, we focus on the fast algorithm for 1-D order-16 transformation in this section. The eight butterflies in the left part of Fig. 3(a) can be easily developed as the first stage, which exploits the symmetries with respect to the line in (8). The upper-right block completes the function of matrix multiplication with the even part, which is the same as the order-8 ICTs in the standards in this paper. Fortunately, both order-8 ICTs in AVS and H.264 have efficient fast algorithms, which are borrowed here. For a more general case that the even part of the MICT has different structure with T 8e in (9), the fast algorithms can be developed in the same way for M 8o as introduced below. The bottom-right block in Fig. 3(a), completing the function of matrix multiplication with the odd part, has a quite different structure from the odd part of the order-16 DCT matrix. So we cannot decompose it to butterfly operations just as we do for T 8e. Instead, we represent M 8o as the product of three 8 8 matrices, as shown in (19) M 8o = M 1 M 2 M 3. (19) There are some considerations for the three matrices. First, they should contain integers only. Second, the integers should be very small such that only additions and shifts are involved and multiplications are avoided in fast algorithms. Last but not least, the matrices should be sparse, which means that many zeros appear in the matrices. Obviously, row vectors in M 8o are orthogonal and have the same length, where the length of a row vector a is defined as a a T. We denote the length of every row vector of M 8o as n O, and clearly det(m 8o ) is equal to n 4 O, where det(.) and. mean determinant and absolute value, respectively. To simplify the problem, we assume M 1, M 2, and M 3 in (19) are also row-orthogonal and the row vectors have the same length in each matrix, denoted as n 1, n 2, and n 3, respectively. Similarly, det(m 1 ), det(m 2 ), and det(m 3 ) are equal to n 4 1, n4 2, and n4 3, respectively. If (19) is satisfied, (20) is necessarily true det(m 8o ) = n 4 O = det(m 1) det(m 2 ) det(m 3 ) = n 4 1 n4 2 n4 3. (20) Therefore n O = n 1 n 2 n 3. (21) So, when designing an order-16 MICT with a fast algorithm, we should make sure that n O of M 8o is the product of three integers. Otherwise, (19) has no solution under this assumption. With the constraint of (21), the lengths of row vectors in M 1, M 2, and M 3 are much smaller than that in M 8o, which means the elements in the three matrices have very small magnitudes and may also contain many zeros. Among the three matrices, M 3 will be found first. Both sides of (19) are postmultiplied by the inverse of M 3, which is equal to M3 T /n 3, so (19) changes to (22) as follows: M 8o M3 T /n 3 = M 1 M 2. (22) It is noticed that M3 T may be regarded as a set of eight column vectors and M 8o M3 T /n 3 contains only integers. So we first collect a set of column vectors by exhaustive search, in which each column vector, b i, satisfies two conditions. One is that the length of b i is n 3, and the other is the elements in (M 8o b i /n 3 ) are all integers. If (19) has solutions, we can pick out eight

8 DONG et al.: 2-D ORDER-16 INTEGER TRANSFORMS FOR HD VIDEO CODING 1469 orthogonal column vectors from the set to form M3 T and thus get M 3. To find M 2, both sides of (22) are postmultiplied by the inverse of M 2, i.e., M2 T /n 2, so (22) becomes (23) (M 8o M3 T /n 3) M2 T /n 2 = M 1. (23) M 2 can be obtained by the similar method of searching M 3. Finally, with M 2 and M 3 known, M 1 is computed by (23). For the case of M 8o in this paper, its determinant is which is equal to and the corresponding n O is 561. To show that n O is a product of three integers as in (21), we let n 1, n 2, and n 3 be 3, 11, and 17 with the order changeable. Then the method introduced above is used to search for M 1, M 2, and M 3, and (24), shown at the bottom of the page, shows one of the solutions of (19). The magnitudes of elements in (24) are so small that the fast algorithm can be implemented by using only additions and shifts. Fig. 4 shows the flow diagram of the forward transformation of the order-16 MICT. The shifts are not indicated in the bottom-right block for a clearer view. Fig. 4. Flow diagram of the order-16 forward transformation. V. Integration of the 2-D Order-16 Integer Transforms into the Standards In this paper, the proposed two types of 2-D order-16 integer transforms are integrated into AVS EP [5] and H.264 HP [3], respectively, and used adaptively as an alternate to the 2-D order-8 ICTs. For H.264, the 2-D order-4 ICT is not included in the VBT scheme, although it exists in the HP. That is because the 2-D order-4 ICT does not contribute much to the efficiency of HD video coding, as verified in Section VI. The selection of transform size is based on the MB-level R D cost [17]. The transform with lower cost is selected and a 1-bit binary signal is transmitted in MB header for the indication. For AVS EP, the largest block size for intra prediction is 8 8. To apply the 2-D order-16 integer transforms to intracoded MB, the intra prediction is employed, which is simply extended from the five prediction modes in AVS. The reference pixels are not used for prediction directly, but filtered by a 3-tap low-pass filter [1 2 1]/4 at first. Otherwise, visible artifacts would be introduced due to large block size intra prediction [25], [26]. In AVS EP and H.264 HP, the 8 8 residual blocks are coded by CABAC [3], [5]. Their arithmetic coding engines can code sources with various statistics by using many automatically updated probability models and therefore remain unchanged in this paper. Two additional sets of probability models specially for coding the residual blocks are designed for AVS and H.264, respectively, to drive the arithmetic coding engine. CBP is a 6-bit integer in AVS and H.264, of which the four least significant bits (LSB) indicate whether the four 8 8 luminance blocks contain nonzero coefficients, respectively. In this paper, if an MB is coded by the 2-D order-16 transform, CBP becomes a 3-bit integer with the LSB indicating whether the luminance block contains nonzero coefficients. VI. Experimental Results A. Transform Coding Gain An important measure for the evaluation of the compression performance is the transform coding gain G TC [11], which weights the reconstruction error variance of quantized transform coding σtc 2 and the corresponding reconstruction error variance of PCM coding σpcm 2 G TC = σ2 PCM σtc 2. (25) Under the assumptions of optimum quantization and bit allocation at the same overall rate, G TC is the ratio of arithmetic to geometric means of the variances of transform coefficients σ 2 θk (0 k<n), as shown in (26) G TC = 1 N N 1 k=0 σ2 θk. (26) ( N 1 k=0 σ2 θk ) 1 N Suppose the source is 1-D zero-mean, unit-variance firstorder Markov processes with adjacent element correlation ρ M 8o =M 1 M 2 M 3 = (24)

9 1470 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 10, OCTOBER 2009 TABLE IV Test Conditions Fig. 5. Comparing coding gain of orthogonal transforms. Platform Sequence structure Intra frame period Entropy coding Fast motion estimation Deblocking filter R D optimization QP RM62b (AVS), JM11 (H.264) IBBPBBP s Arithmetic coding on on on fixed (27, 30, 35, 40, AVS) fixed (20, 24, 28, 32, H.264) off 2 (AVS), 5 (H.264) Rate control Reference frame Search range ±32 Frame number 60 ranging from 0.55 to 0.95, which is typical for the prediction errors of HD videos (see Table II). Fig. 5 shows the coding gains of the order-16 DCT, the proposed MICTs for AVS and H.264, respectively, and the two MICTs proposed by Wien [7] and Ma [10], respectively. DCT has the highest coding gain, no matter what the value of ρ is. The coding gains of the proposed two MICTs are too close to be distinguished from each other, and are higher than those of the MICTs proposed by Ma and Wien, especially when ρ approaches 1.0. B. Reconstruction Error of Nonorthogonal Transforms In the case of nonorthogonal transforms, even with optimal quantization and bit allocation, (26) cannot be derived to substitute (25), because σpcm 2 depends on quantization only, whereas σtc 2 depends not only on quantization but also on the variance of the source. Therefore, we cannot evaluate the performance of the proposed NICTs by (26) and instead, we have to implement the transforms and quantization and compare the reconstruction errors thus introduced. Seven order-16 transforms are tested, including the order-16 DCT, the proposed two NICTs and the four ICTs, of which the even part is the order-8 ICT in H.264 and the element sets of the odd part are shown in the first four rows of Table III. These transforms are applied to the 1-D zero-mean, unit-variance first-order Markov processes with adjacent element correlation ρ ranging from 0.55 to Before inverse transformation, the transform coefficients are zonal filtered [11], which means the bit rate is reduced by simply retaining the first n (n<16) transform coefficients. The MSE between the source and the reconstructed value are calculated. Fig. 6 compares the MSE introduced by 4:1, 4:2, and 4:3 zonal filters. In detail, the three zonal filters retain the first 4, 8, and 12 transform coefficients respectively, and set all the others to zero. For a clear view, the curves in Fig. 6 present the difference MSE compared with that of the order-16 DCT. We notice that the four ICTs produce larger MSE than the proposed two NICT, although orthogonal. Therefore, the performance of an integer transform is greatly determined by the approximation of the exact DCT rather than the orthogonality, if σr0 2 is negligible compared with σ2 q in (15). TABLE V Experimental Results on the AVS Platform Proposed NICT Proposed MICT HD Sequence PSNR Gain Bit Saving PSNR Gain Bit Saving (db) (%) (db) (%) Bigship City Crew Optis Raven Sheriff Pedestrian Riverbed RushHour Station Sunflower Tractor Average C. Objective Evaluation The proposed two types of 2-D order-16 integer transforms were, respectively, integrated into the RM62b platform for AVS EP and the JM11 platform for H.264 HP, using 16-bit integer arithmetic. The implementation problems are solved in Section V. The test conditions are listed in Table IV. Tables V and VI show performance gains, compared to AVS EP using a 2-D order-8 ICT and H.264 HP using both 2-D order-4 and order-8 ICTs, respectively. The improvement is measured by PSNR gain for equal bit rate or by bit rate saving at the same PSNR, using the method in [27]. With the NICT, the improvements are all greater than 0.1 db and more than 0.2 db on average, whilst for the best case of Riverbed, the gain is up to 0.47 db on the AVS platform and 0.48 db on the H.264 platform. The sequence Riverbed has smooth textures and global motions, so the energy of prediction errors is quite small, no matter using inter or intra prediction. In the transform domain, the prediction errors can be represented by only a few coefficients of the 2-D order-16 NICT, most of which cluster in the low frequency area. With the MICT, the coding efficiency is improved on an average of around 0.06 db less than that by NICT on both platforms, but for the sequences with large homogeneous regions or smooth motions, such as Crew, Pedestrian, RushHour,

10 DONG et al.: 2-D ORDER-16 INTEGER TRANSFORMS FOR HD VIDEO CODING 1471 Fig. 6. MSE with (a) 4:1, (b) 4:2, and (c) 4:3 zonal filtering. TABLE VI Experimental Results on the H.264 Platform Proposed NICT Proposed MICT HD Order-4, order-8, order-16 Order-8, order-16 Order-4, order-8, order-16 Order-8, order-16 Sequence PSNR Gain Bit Saving PSNR Gain Bit Saving PSNR Gain Bit Saving PSNR Gain Bit Saving (db) (%) (db) (%) (db) (%) (db) (%) BigShip City Crew Optis Raven Sheriff Pedestrian Riverbed RushHour Station Sunflower Tractor Average and Sunflower, the performance of the MICT is comparable to or even better than the NICT, due to the efficiency of the first three basis vectors. However, for the sequences full of details, such as Sheriff, Station, and Tractor, the gain of the MICT becomes trivial because of the relative inefficiency to compress high frequency components. Overall, the performance gap between the proposed NICT and MICT is small. Fig. 7(a) and (b) show the percentage of MBs coded by the 2-D order-16 integer transforms. On average, more than half of the MBs are coded by 2-D order-16 transforms, and for some sequences, such as Sunflower and Station, the percentage of 2-D order-16 transforms is up to 80%. This implies that 2-D order-16 integer transforms are very useful in HD video coding. There is a tendency that the higher the bit rate is, the less the 2-D order-16 integer transforms are used. That is because at high bit rates, many high frequency coefficients survive the quantization and high-cost coefficients, i.e., a (level, run) pair with large run, have to be coded due to the large block size of Hence, 2-D order-8 ICT is more likely to be selected. We also study the VBT scheme using three transforms, 2-D order-4, order-8, and order-16 transforms, on the H.264 platform. As shown in Table VI, the performance gap of the two VBT schemes with and without the 2-D order-4 ICT is marginal. The same conclusion can also be drawn by Fig. 7(c) and (d), where the percentages of MBs coded by 2-D order-4, order-8, and order-16 transforms are compared on the H.264 platform. The percentage of using 2-D order-4 ICT is very small as we expected. Therefore, in the proposed VBT scheme, the 2-D order-4 ICT in H.264 is removed. D. Subjective Evaluation Using 2-D order-16 transform can preserve more details and provide better visual quality, especially at low bit rate. Fig. 8 gives two examples for the subjective comparison. The images are all cropped from H.264-coded HD videos with size of pixels. The indicated PSNR and number of bits are the R D information of the associated frames. Obviously, the vertical edges of the buildings in City and the horizontal edges of the railway sleepers in Station are better preserved by the additional 2-D order-16 transforms. However, at high bit rate, the visual quality without using 2-D order-16 transform is good enough and therefore the improved fidelity by using 2-D order-16 transforms is hard to be visually observed.

11 1472 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 10, OCTOBER 2009 Fig. 7. Proportion of different block size transforms used in H.264 HP. (a) 2-D order-8 ICT and order-16 NICT. (b) 2-D order-8 ICT and order-16 MICT. (c) 2-D order-8 and order-4 ICTs and 2-D order-16 NICT. (d) 2-D order-8 and order-4 ICTs and 2-D order-16 MICT. Fig. 8. Images cropped from City (720p) and Station (1080p) both coded with QP = 32. (a) and (d) are coded using H.264 HP. (b) and (e) are coded using H.264 HP with additional 2-D order-16 NICT. (c) and (f) are coded using H.264 HP with additional 2-D order-16 MICT. TABLE VII Operations of Fast Algorithm and Matrix Multiplication Operation Fast MICT Matrix Multiplication Addition Multiplication Shift 32 0 TABLE VIII Execution Time of Fast Algorithms and Matrix Multiplication Fast Algorithm (s) Matrix Multiplication (s) Saving Time (%) DCT % MICT % M 8o % E. Efficiency of the Fast Algorithm If the transform is implemented by matrix multiplication, the complexity is high, due to many additions and multiplications introduced by the dot products. The fast algorithm developed in Section IV-B can implement the transformation process of the MICT by shifts and additions only. Table VII compares the numbers of additions, shifts, and multiplications required by transforming a 16-point vector using the fast algorithm and

12 DONG et al.: 2-D ORDER-16 INTEGER TRANSFORMS FOR HD VIDEO CODING 1473 matrix multiplication, respectively. Without multiplication, the computational complexity is reduced significantly. However, the number of additions required for the fast algorithm is considerable, because multiplications with small magnitude integers are replaced by combinations of additions and shifts. The 2-D order-16 MICT was implemented using both fast algorithm and matrix multiplication by the C language. Ten thousand random blocks were generated, in which the numbers were uniformly distributed from 256 to 255. The total execution time for transforming these blocks by the two methods was recoded, respectively. The PC platform has a 3.2 GHz Intel Pentium 4 CPU and 1.0 GB RAM. Two Windows API functions QueryPerformanceFrequency and QueryPerformanceCounter are used for timing [28], which measure the execution time to nanosecond accuracy. The time comparison is shown in the third row of Table VIII, where using fast algorithm can save about 89% of the computational time. The efficiency of the fast algorithm for the MICT is compared with that of the fast DCT (FDCT) [29], because there are no publications related to fast algorithms for 2-D order- 16 ICT, NICT or MICT. We did the experiment as described above for the case of the DCT, and the time for computing the 2-D order-16 DCT using FDCT and matrix multiplication are recorded, respectively. The second row of Table VIII shows FDCT can save 93% of the computational time compared with using matrix multiplication. The proposed fast algorithm for the 2-D order-16 MICT is 4% less efficient than the FDCT for the DCT because of the different dyadic symmetries in M 8o. At the same time, we are also concerned with the efficiency of the fast algorithm used for the bottom-right module in Fig. 3(a), which is specially designed for the multiplication with matrix M 8o. Similarly, ten thousand 8 8 random blocks were generated, in which the numbers were uniformly distributed from 256 to 255. As shown in the fourth row of Table VIII, the fast algorithm can save almost 77% computational time compared with matrix multiplication. This proves that the fast algorithm for M 8o has lower efficiency than that of the whole 2-D order-16 MICT. If the structure of the even part is different from T 8e in (9) and some existing fast algorithms of 2-D order-8 ICTs cannot be used, the method for decomposing M 8o has to be applied to the even part and the fast algorithm will be about 10% less efficient, but overall, the computational complexity is reduced significantly. VII. Conclusion In this paper, 2-D order-16 transforms are proposed to exploit the spatial property of HD videos. Specifically, we propose two types of 2-D order-16 integer transforms, NICT and MICT, both of which allow selecting matrix elements more freely by releasing the orthogonality constraint and can provide comparable performance with that of the DCT. The latter is even simpler. Furthermore, considerable effort has been spent to reduce the complexity, including compatibility with the 2-D order-8 ICT in AVS or H.264 and the development of the fast algorithm. Both 2-D order-16 integer transforms are integrated into AVS EP and H.264 HP, respectively, and used adaptively as an alternative to the 2-D order-8 ICTs according to local activities. The experimental results show that 2-D order-16 integer transforms can be efficient coding tools especially for HD video coding. References [1] Generic Coding of Moving Pictures and Associated Audio Information Part 2: Video, ITU-T Rec. H.262 and ISO/IEC MPEG-2, [2] Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification, document JVT-G050, ITU-T Rec. H.264 and ISO/IEC AVC, Mar. 2003, [Online]. Available: [3] Advanced Video Coding for Generic Audiovisual Services, ITU-T Rec. H.264 and ISO/IEC AVC, Mar [4] Information Technology-Advanced Coding of Audio and Video-Part 2: Video, GB/T AVS, May [5] AVS Video Group. (2006, Sep.). Information Technology-Advanced Coding of Audio and Video: Part 2. Video, AVS-N1318. [6] D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards and Practice. Boston, MA: Kluwer, [7] ICT Comparison for Adaptive Block Transforms, document ITU-T SG16/Q6 VCEG and VCEG-L12, Jan. 2001, [Online]. Available: [8] Simplified Adaptive Block 762 Transforms, document ITU-T SG16/Q6 VCEG, Dec. 2001, [Online]. Available: [9] S. Naito and A. Koike, Efficient coding scheme for super high definition video based on extending H.264 high profile, in Proc. SPIE Vis. Commun. Image Process., vol Jan. 2006, pp [10] S. Ma and C.-C. Kuo, High-definition video coding with supermacroblocks, in Proc. SPIE Vis. Commun. Image Process., vol Jan. 2007, pp [11] N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video. Englewood Cliffs, NJ: Prentice-Hall, [12] J. Liang and T. D. Tran, Fast multiplierless approximations of the DCT with the lifting scheme, IEEE Trans. Signal Process., vol. 49, no. 12, pp , Dec [13] A 16-bit Architecture for H.264, Treating DCT Transform and Quantization, document ITU-T SG16/Q6 VCEG and VCEG-M16, Apr. 2001, [Online]. Available: [14] A. Habibi, Comparison of nth-order DPCM encoder with linear transformation and block quantization techniques, IEEE Trans. Commun., vol. 19, no. 6, pp , Dec [15] W. K. Pratt, Digital Image Processing. New York: Wiley, [16] A. Papoulis and S. U. Pillai, Probability, Random Variables, and Stochastic Processes. Boston, MA: McGraw-Hill, [17] T. Wiegand and B. Girod, Lagrange multiplier selection in hybrid video coder control, in Proc. IEEE Int. Conf. Image Process., vol. 3. Oct. 2001, pp [18] A. N. Netravali and J. O. Limb, Picture coding: A review, Proc. IEEE, vol. 68, no. 3, pp , Mar [19] W. K. Cham, Development of integer cosine transforms by the principle of dyadic symmetry, in Proc. Inst. Electr. Eng. I: Commun. Speech Vis., vol no. 4, 1989, pp [20] C.-X. Zhang, J. Lou, L. Yu, J. Dong, and W. K. Cham, The technique of pre-scaled integer transform in hybrid video coding, in Proc. IEEE Int. Symp. Circuits Syst., vol. 1. May 2005, pp [21] W. K. Cham and Y. T. Chan, An order-16 integer cosine transform, IEEE Trans. Signal Process., vol. 39, no. 5, pp , May [22] B. Widrow, I. Kollar, and M.-C. Liu, Statistical theory of quantization, IEEE Trans. Instrum. Meas., vol. 45, no. 2, pp , Apr [23] ABT Cleanup and Complexity Reduction, document JVT-E087.doc, Oct. 2002, [Online]. Available: [24] J. Dong, J. Lou, C.-X. Zhang, and L. Yu, A new approach to compatible adaptive block size transforms, in Proc. SPIE Vis. Commun. Image Process., vol Jul. 2005, pp [25] M. Wien, Variable block size transforms for H.264/AVC, IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp , Jul [26] Study of ABT for HDTV Coding, document JVT-E073r2.doc, Oct. 2002, [Online]. Available: [27] Calculation of Average PSNR Differences Between RD-curves, document ITU-T SG16/Q6 VCEG, Apr. 2001, [Online]. Available:

13 1474 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 10, OCTOBER 2009 [28] Microsoft Developer Network. [Online]. Available: microsoft.com/en-us/library/ms674866(vs.85).aspx [29] C. Loeffler, A. Ligtenberg, and G. S. Moschytz, Practical fast 1-D DCT algorithms with 11 multiplications, in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 2. May 1989, pp Jie Dong (S 07) received the B.Eng. and M.Eng. degrees in information engineering from Zhejiang University, Hangzhou, China, in 2002 and 2005, respectively. She is currently working toward the Ph.D. degree in electronic engineering from the Chinese University of Hong Kong, Hong Kong, China. Her research interests include HD video compression and processing. King Ngi Ngan (M 79 SM 91 F 00) received the Ph.D. degree in electrical engineering from Loughborough University of Technology, Leicestershire, U.K. Currently, he is a Chair Professor at the Department of Electronic Engineering, the Chinese University of Hong Kong, Hong Kong, China, and was previously a Full Professor at Nanyang Technological University, Singapore, and University of Western Australia, Perth. He is the author of three books, five edited volumes, and over 200 refereed technical papers in the areas of image/video coding and communications. Prof. Ngan is an Associate Editor of the Journal on Visual Communications and Image Representation, as well as an Area Editor of the European Association for Signal Processing Journal of Signal Processing: Image Communication, and served as an Associate Editor of the IEEE Transactions on Circuits and Systems for Video Technology and Journal of Applied Signal Processing. He chaired a number of prestigious international conferences on video signal processing and communications and served on the advisory and technical committees of numerous professional organizations. He is a general co-chair of the IEEE International Conference on Image Processing, to be held in Hong Kong, in He is a Fellow of The Institution of Engineering and Technology (U.K.), and Institution of Engineers of Australia, and was an IEEE Distinguished Lecturer during Chi-Keung Fong (S 07) received the B.Eng. degree in electrical and electronic engineering from The University of Hong Kong, Hong Kong, China, in 1999, and the M.Phil. degree in electronic engineering in 2003, from the Chinese University of Hong Kong, Hong Kong, China, where he is currently working toward the Ph.D. degree. His research interests include transform coding, pattern matching, and image processing. Wai-Kuen Cham (S 77 M 79 SM 91) graduated from the Chinese University of Hong Kong, Hong Kong, China, in 1979, in electronics. He received the M.Sc. and Ph.D. degrees from Loughborough University of Technology, Leicestershire, U.K., in 1980 and 1983, respectively. Since May 1985, he has been with the Department of Electronic Engineering, the Chinese University of Hong Kong. His research interests include image coding, image processing, and video coding.

Basic Principles of Video Coding

Basic Principles of Video Coding Basic Principles of Video Coding Introduction Categories of Video Coding Schemes Information Theory Overview of Video Coding Techniques Predictive coding Transform coding Quantization Entropy coding Motion

More information

THE newest video coding standard is known as H.264/AVC

THE newest video coding standard is known as H.264/AVC IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 17, NO. 6, JUNE 2007 765 Transform-Domain Fast Sum of the Squared Difference Computation for H.264/AVC Rate-Distortion Optimization

More information

Digital Image Processing Lectures 25 & 26

Digital Image Processing Lectures 25 & 26 Lectures 25 & 26, Professor Department of Electrical and Computer Engineering Colorado State University Spring 2015 Area 4: Image Encoding and Compression Goal: To exploit the redundancies in the image

More information

Intraframe Prediction with Intraframe Update Step for Motion-Compensated Lifted Wavelet Video Coding

Intraframe Prediction with Intraframe Update Step for Motion-Compensated Lifted Wavelet Video Coding Intraframe Prediction with Intraframe Update Step for Motion-Compensated Lifted Wavelet Video Coding Aditya Mavlankar, Chuo-Ling Chang, and Bernd Girod Information Systems Laboratory, Department of Electrical

More information

L. Yaroslavsky. Fundamentals of Digital Image Processing. Course

L. Yaroslavsky. Fundamentals of Digital Image Processing. Course L. Yaroslavsky. Fundamentals of Digital Image Processing. Course 0555.330 Lec. 6. Principles of image coding The term image coding or image compression refers to processing image digital data aimed at

More information

Multimedia Networking ECE 599

Multimedia Networking ECE 599 Multimedia Networking ECE 599 Prof. Thinh Nguyen School of Electrical Engineering and Computer Science Based on lectures from B. Lee, B. Girod, and A. Mukherjee 1 Outline Digital Signal Representation

More information

Waveform-Based Coding: Outline

Waveform-Based Coding: Outline Waveform-Based Coding: Transform and Predictive Coding Yao Wang Polytechnic University, Brooklyn, NY11201 http://eeweb.poly.edu/~yao Based on: Y. Wang, J. Ostermann, and Y.-Q. Zhang, Video Processing and

More information

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC9/WG11 CODING OF MOVING PICTURES AND AUDIO ISO/IEC JTC1/SC9/WG11 MPEG 98/M3833 July 1998 Source:

More information

encoding without prediction) (Server) Quantization: Initial Data 0, 1, 2, Quantized Data 0, 1, 2, 3, 4, 8, 16, 32, 64, 128, 256

encoding without prediction) (Server) Quantization: Initial Data 0, 1, 2, Quantized Data 0, 1, 2, 3, 4, 8, 16, 32, 64, 128, 256 General Models for Compression / Decompression -they apply to symbols data, text, and to image but not video 1. Simplest model (Lossless ( encoding without prediction) (server) Signal Encode Transmit (client)

More information

Module 4. Multi-Resolution Analysis. Version 2 ECE IIT, Kharagpur

Module 4. Multi-Resolution Analysis. Version 2 ECE IIT, Kharagpur Module 4 Multi-Resolution Analysis Lesson Multi-resolution Analysis: Discrete avelet Transforms Instructional Objectives At the end of this lesson, the students should be able to:. Define Discrete avelet

More information

Lossless Image and Intra-frame Compression with Integer-to-Integer DST

Lossless Image and Intra-frame Compression with Integer-to-Integer DST 1 Lossless Image and Intra-frame Compression with Integer-to-Integer DST Fatih Kamisli, Member, IEEE arxiv:1708.07154v1 [cs.mm] 3 Aug 017 Abstract Video coding standards are primarily designed for efficient

More information

Compression and Coding

Compression and Coding Compression and Coding Theory and Applications Part 1: Fundamentals Gloria Menegaz 1 Transmitter (Encoder) What is the problem? Receiver (Decoder) Transformation information unit Channel Ordering (significance)

More information

MODERN video coding standards, such as H.263, H.264,

MODERN video coding standards, such as H.263, H.264, 146 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 16, NO. 1, JANUARY 2006 Analysis of Multihypothesis Motion Compensated Prediction (MHMCP) for Robust Visual Communication Wei-Ying

More information

Context-adaptive coded block pattern coding for H.264/AVC

Context-adaptive coded block pattern coding for H.264/AVC Context-adaptive coded block pattern coding for H.264/AVC Yangsoo Kim a), Sungjei Kim, Jinwoo Jeong, and Yoonsik Choe b) Department of Electrical and Electronic Engineering, Yonsei University 134, Sinchon-dong,

More information

4x4 Transform and Quantization in H.264/AVC

4x4 Transform and Quantization in H.264/AVC Video compression design, analysis, consulting and research White Paper: 4x4 Transform and Quantization in H.264/AVC Iain Richardson / VCodex Limited Version 1.2 Revised November 2010 H.264 Transform and

More information

Image Compression. Qiaoyong Zhong. November 19, CAS-MPG Partner Institute for Computational Biology (PICB)

Image Compression. Qiaoyong Zhong. November 19, CAS-MPG Partner Institute for Computational Biology (PICB) Image Compression Qiaoyong Zhong CAS-MPG Partner Institute for Computational Biology (PICB) November 19, 2012 1 / 53 Image Compression The art and science of reducing the amount of data required to represent

More information

Information and Entropy

Information and Entropy Information and Entropy Shannon s Separation Principle Source Coding Principles Entropy Variable Length Codes Huffman Codes Joint Sources Arithmetic Codes Adaptive Codes Thomas Wiegand: Digital Image Communication

More information

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments Dr. Jian Zhang Conjoint Associate Professor NICTA & CSE UNSW COMP9519 Multimedia Systems S2 2006 jzhang@cse.unsw.edu.au

More information

Single Frame Rate-Quantization Model for MPEG-4 AVC/H.264 Video Encoders

Single Frame Rate-Quantization Model for MPEG-4 AVC/H.264 Video Encoders Single Frame Rate-Quantization Model for MPEG-4 AVC/H.264 Video Encoders Tomasz Grajek and Marek Domański Poznan University of Technology Chair of Multimedia Telecommunications and Microelectronics ul.

More information

Image Data Compression

Image Data Compression Image Data Compression Image data compression is important for - image archiving e.g. satellite data - image transmission e.g. web data - multimedia applications e.g. desk-top editing Image data compression

More information

SIGNAL COMPRESSION. 8. Lossy image compression: Principle of embedding

SIGNAL COMPRESSION. 8. Lossy image compression: Principle of embedding SIGNAL COMPRESSION 8. Lossy image compression: Principle of embedding 8.1 Lossy compression 8.2 Embedded Zerotree Coder 161 8.1 Lossy compression - many degrees of freedom and many viewpoints The fundamental

More information

Rate-Distortion Based Temporal Filtering for. Video Compression. Beckman Institute, 405 N. Mathews Ave., Urbana, IL 61801

Rate-Distortion Based Temporal Filtering for. Video Compression. Beckman Institute, 405 N. Mathews Ave., Urbana, IL 61801 Rate-Distortion Based Temporal Filtering for Video Compression Onur G. Guleryuz?, Michael T. Orchard y? University of Illinois at Urbana-Champaign Beckman Institute, 45 N. Mathews Ave., Urbana, IL 68 y

More information

Half-Pel Accurate Motion-Compensated Orthogonal Video Transforms

Half-Pel Accurate Motion-Compensated Orthogonal Video Transforms Flierl and Girod: Half-Pel Accurate Motion-Compensated Orthogonal Video Transforms, IEEE DCC, Mar. 007. Half-Pel Accurate Motion-Compensated Orthogonal Video Transforms Markus Flierl and Bernd Girod Max

More information

SYDE 575: Introduction to Image Processing. Image Compression Part 2: Variable-rate compression

SYDE 575: Introduction to Image Processing. Image Compression Part 2: Variable-rate compression SYDE 575: Introduction to Image Processing Image Compression Part 2: Variable-rate compression Variable-rate Compression: Transform-based compression As mentioned earlier, we wish to transform image data

More information

Enhanced SATD-based cost function for mode selection of H.264/AVC intra coding

Enhanced SATD-based cost function for mode selection of H.264/AVC intra coding SIViP (013) 7:777 786 DOI 10.1007/s11760-011-067-z ORIGINAL PAPER Enhanced SATD-based cost function for mode selection of H.6/AVC intra coding Mohammed Golam Sarwer Q. M. Jonathan Wu Xiao-Ping Zhang Received:

More information

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments. Tutorial 1. Acknowledgement and References for lectures 1 to 5

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments. Tutorial 1. Acknowledgement and References for lectures 1 to 5 Lecture : Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments Dr. Jian Zhang Conjoint Associate Professor NICTA & CSE UNSW COMP959 Multimedia Systems S 006 jzhang@cse.unsw.edu.au Acknowledgement

More information

LOSSLESS INTRA CODING IN HEVC WITH INTEGER-TO-INTEGER DST. Fatih Kamisli. Middle East Technical University Ankara, Turkey

LOSSLESS INTRA CODING IN HEVC WITH INTEGER-TO-INTEGER DST. Fatih Kamisli. Middle East Technical University Ankara, Turkey LOSSLESS INTRA CODING IN HEVC WITH INTEGER-TO-INTEGER DST Fatih Kamisli Middle East Technical University Ankara, Turkey ABSTRACT It is desirable to support efficient lossless coding within video coding

More information

Introduction p. 1 Compression Techniques p. 3 Lossless Compression p. 4 Lossy Compression p. 5 Measures of Performance p. 5 Modeling and Coding p.

Introduction p. 1 Compression Techniques p. 3 Lossless Compression p. 4 Lossy Compression p. 5 Measures of Performance p. 5 Modeling and Coding p. Preface p. xvii Introduction p. 1 Compression Techniques p. 3 Lossless Compression p. 4 Lossy Compression p. 5 Measures of Performance p. 5 Modeling and Coding p. 6 Summary p. 10 Projects and Problems

More information

Wavelet Scalable Video Codec Part 1: image compression by JPEG2000

Wavelet Scalable Video Codec Part 1: image compression by JPEG2000 1 Wavelet Scalable Video Codec Part 1: image compression by JPEG2000 Aline Roumy aline.roumy@inria.fr May 2011 2 Motivation for Video Compression Digital video studio standard ITU-R Rec. 601 Y luminance

More information

MATCHING-PURSUIT DICTIONARY PRUNING FOR MPEG-4 VIDEO OBJECT CODING

MATCHING-PURSUIT DICTIONARY PRUNING FOR MPEG-4 VIDEO OBJECT CODING MATCHING-PURSUIT DICTIONARY PRUNING FOR MPEG-4 VIDEO OBJECT CODING Yannick Morvan, Dirk Farin University of Technology Eindhoven 5600 MB Eindhoven, The Netherlands email: {y.morvan;d.s.farin}@tue.nl Peter

More information

Rate-Constrained Multihypothesis Prediction for Motion-Compensated Video Compression

Rate-Constrained Multihypothesis Prediction for Motion-Compensated Video Compression IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL 12, NO 11, NOVEMBER 2002 957 Rate-Constrained Multihypothesis Prediction for Motion-Compensated Video Compression Markus Flierl, Student

More information

AN ENHANCED EARLY DETECTION METHOD FOR ALL ZERO BLOCK IN H.264

AN ENHANCED EARLY DETECTION METHOD FOR ALL ZERO BLOCK IN H.264 st January 0. Vol. 7 No. 005-0 JATIT & LLS. All rights reserved. ISSN: 99-865 www.jatit.org E-ISSN: 87-95 AN ENHANCED EARLY DETECTION METHOD FOR ALL ZERO BLOCK IN H.6 CONG-DAO HAN School of Electrical

More information

Lecture 7 Predictive Coding & Quantization

Lecture 7 Predictive Coding & Quantization Shujun LI (李树钧): INF-10845-20091 Multimedia Coding Lecture 7 Predictive Coding & Quantization June 3, 2009 Outline Predictive Coding Motion Estimation and Compensation Context-Based Coding Quantization

More information

Estimation-Theoretic Delayed Decoding of Predictively Encoded Video Sequences

Estimation-Theoretic Delayed Decoding of Predictively Encoded Video Sequences Estimation-Theoretic Delayed Decoding of Predictively Encoded Video Sequences Jingning Han, Vinay Melkote, and Kenneth Rose Department of Electrical and Computer Engineering University of California, Santa

More information

A DISTRIBUTED VIDEO CODER BASED ON THE H.264/AVC STANDARD

A DISTRIBUTED VIDEO CODER BASED ON THE H.264/AVC STANDARD 5th European Signal Processing Conference (EUSIPCO 27), Poznan, Poland, September 3-7, 27, copyright by EURASIP A DISTRIBUTED VIDEO CODER BASED ON THE /AVC STANDARD Simone Milani and Giancarlo Calvagno

More information

Predictive Coding. Prediction Prediction in Images

Predictive Coding. Prediction Prediction in Images Prediction Prediction in Images Predictive Coding Principle of Differential Pulse Code Modulation (DPCM) DPCM and entropy-constrained scalar quantization DPCM and transmission errors Adaptive intra-interframe

More information

Enhanced Stochastic Bit Reshuffling for Fine Granular Scalable Video Coding

Enhanced Stochastic Bit Reshuffling for Fine Granular Scalable Video Coding Enhanced Stochastic Bit Reshuffling for Fine Granular Scalable Video Coding Wen-Hsiao Peng, Tihao Chiang, Hsueh-Ming Hang, and Chen-Yi Lee National Chiao-Tung University 1001 Ta-Hsueh Rd., HsinChu 30010,

More information

Predictive Coding. Prediction

Predictive Coding. Prediction Predictive Coding Prediction Prediction in Images Principle of Differential Pulse Code Modulation (DPCM) DPCM and entropy-constrained scalar quantization DPCM and transmission errors Adaptive intra-interframe

More information

CODING SAMPLE DIFFERENCES ATTEMPT 1: NAIVE DIFFERENTIAL CODING

CODING SAMPLE DIFFERENCES ATTEMPT 1: NAIVE DIFFERENTIAL CODING 5 0 DPCM (Differential Pulse Code Modulation) Making scalar quantization work for a correlated source -- a sequential approach. Consider quantizing a slowly varying source (AR, Gauss, ρ =.95, σ 2 = 3.2).

More information

CHAPTER 3. Implementation of Transformation, Quantization, Inverse Transformation, Inverse Quantization and CAVLC for H.

CHAPTER 3. Implementation of Transformation, Quantization, Inverse Transformation, Inverse Quantization and CAVLC for H. CHAPTER 3 Implementation of Transformation, Quantization, Inverse Transformation, Inverse Quantization and CAVLC for H.264 Video Encoder 3.1 Introduction The basics of video processing in H.264 Encoder

More information

Rate-distortion Analysis and Control in DCT-based Scalable Video Coding. Xie Jun

Rate-distortion Analysis and Control in DCT-based Scalable Video Coding. Xie Jun Rate-distortion Analysis and Control in DCT-based Scalable Video Coding Xie Jun School of Computer Engineering A thesis submitted to the Nanyang Technological University in fulfillment of the requirement

More information

Converting DCT Coefficients to H.264/AVC

Converting DCT Coefficients to H.264/AVC MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Converting DCT Coefficients to H.264/AVC Jun Xin, Anthony Vetro, Huifang Sun TR2004-058 June 2004 Abstract Many video coding schemes, including

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Lesson 7 Delta Modulation and DPCM Instructional Objectives At the end of this lesson, the students should be able to: 1. Describe a lossy predictive coding scheme.

More information

Motion Vector Prediction With Reference Frame Consideration

Motion Vector Prediction With Reference Frame Consideration Motion Vector Prediction With Reference Frame Consideration Alexis M. Tourapis *a, Feng Wu b, Shipeng Li b a Thomson Corporate Research, 2 Independence Way, Princeton, NJ, USA 855 b Microsoft Research

More information

A Video Codec Incorporating Block-Based Multi-Hypothesis Motion-Compensated Prediction

A Video Codec Incorporating Block-Based Multi-Hypothesis Motion-Compensated Prediction SPIE Conference on Visual Communications and Image Processing, Perth, Australia, June 2000 1 A Video Codec Incorporating Block-Based Multi-Hypothesis Motion-Compensated Prediction Markus Flierl, Thomas

More information

Can the sample being transmitted be used to refine its own PDF estimate?

Can the sample being transmitted be used to refine its own PDF estimate? Can the sample being transmitted be used to refine its own PDF estimate? Dinei A. Florêncio and Patrice Simard Microsoft Research One Microsoft Way, Redmond, WA 98052 {dinei, patrice}@microsoft.com Abstract

More information

h 8x8 chroma a b c d Boundary filtering: 16x16 luma H.264 / MPEG-4 Part 10 : Intra Prediction H.264 / MPEG-4 Part 10 White Paper Reconstruction Filter

h 8x8 chroma a b c d Boundary filtering: 16x16 luma H.264 / MPEG-4 Part 10 : Intra Prediction H.264 / MPEG-4 Part 10 White Paper Reconstruction Filter H.264 / MPEG-4 Part 10 White Paper Reconstruction Filter 1. Introduction The Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG are finalising a new standard for the coding (compression) of natural

More information

Video Coding With Linear Compensation (VCLC)

Video Coding With Linear Compensation (VCLC) Coding With Linear Compensation () Arif Mahmood Zartash Afzal Uzmi Sohaib Khan School of Science and Engineering Lahore University of Management Sciences, Lahore, Pakistan {arifm, zartash, sohaib}@lums.edu.pk

More information

Review of Quantization. Quantization. Bring in Probability Distribution. L-level Quantization. Uniform partition

Review of Quantization. Quantization. Bring in Probability Distribution. L-level Quantization. Uniform partition Review of Quantization UMCP ENEE631 Slides (created by M.Wu 004) Quantization UMCP ENEE631 Slides (created by M.Wu 001/004) L-level Quantization Minimize errors for this lossy process What L values to

More information

Direction-Adaptive Transforms for Coding Prediction Residuals

Direction-Adaptive Transforms for Coding Prediction Residuals MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Direction-Adaptive Transforms for Coding Prediction Residuals Robert Cohen, Sven Klomp, Anthony Vetro, Huifang Sun TR2010-090 November 2010

More information

Image Compression. 1. Introduction. Greg Ames Dec 07, 2002

Image Compression. 1. Introduction. Greg Ames Dec 07, 2002 Image Compression Greg Ames Dec 07, 2002 Abstract Digital images require large amounts of memory to store and, when retrieved from the internet, can take a considerable amount of time to download. The

More information

Overview. Analog capturing device (camera, microphone) PCM encoded or raw signal ( wav, bmp, ) A/D CONVERTER. Compressed bit stream (mp3, jpg, )

Overview. Analog capturing device (camera, microphone) PCM encoded or raw signal ( wav, bmp, ) A/D CONVERTER. Compressed bit stream (mp3, jpg, ) Overview Analog capturing device (camera, microphone) Sampling Fine Quantization A/D CONVERTER PCM encoded or raw signal ( wav, bmp, ) Transform Quantizer VLC encoding Compressed bit stream (mp3, jpg,

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 5 Other Coding Techniques Instructional Objectives At the end of this lesson, the students should be able to:. Convert a gray-scale image into bit-plane

More information

An Investigation of 3D Dual-Tree Wavelet Transform for Video Coding

An Investigation of 3D Dual-Tree Wavelet Transform for Video Coding MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com An Investigation of 3D Dual-Tree Wavelet Transform for Video Coding Beibei Wang, Yao Wang, Ivan Selesnick and Anthony Vetro TR2004-132 December

More information

Redundancy Allocation Based on Weighted Mismatch-Rate Slope for Multiple Description Video Coding

Redundancy Allocation Based on Weighted Mismatch-Rate Slope for Multiple Description Video Coding 1 Redundancy Allocation Based on Weighted Mismatch-Rate Slope for Multiple Description Video Coding Mohammad Kazemi, Razib Iqbal, Shervin Shirmohammadi Abstract Multiple Description Coding (MDC) is a robust

More information

Introduction to Video Compression H.261

Introduction to Video Compression H.261 Introduction to Video Compression H.6 Dirk Farin, Contact address: Dirk Farin University of Mannheim Dept. Computer Science IV L 5,6, 683 Mannheim, Germany farin@uni-mannheim.de D.F. YUV-Colorspace Computer

More information

DELIVERING video of good quality over the Internet or

DELIVERING video of good quality over the Internet or 1448 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 10, OCTOBER 2009 Error Resilient Video Coding Using B Pictures in H.264 Mengyao Ma, Student Member, IEEE, Oscar C. Au,

More information

IMAGE COMPRESSION-II. Week IX. 03/6/2003 Image Compression-II 1

IMAGE COMPRESSION-II. Week IX. 03/6/2003 Image Compression-II 1 IMAGE COMPRESSION-II Week IX 3/6/23 Image Compression-II 1 IMAGE COMPRESSION Data redundancy Self-information and Entropy Error-free and lossy compression Huffman coding Predictive coding Transform coding

More information

Statistical Analysis and Distortion Modeling of MPEG-4 FGS

Statistical Analysis and Distortion Modeling of MPEG-4 FGS Statistical Analysis and Distortion Modeling of MPEG-4 FGS Min Dai Electrical Engineering Texas A&M University, TX 77843 Dmitri Loguinov Computer Science Texas A&M University, TX 77843 Hayder Radha Hayder

More information

A Novel Multi-Symbol Curve Fit based CABAC Framework for Hybrid Video Codec s with Improved Coding Efficiency and Throughput

A Novel Multi-Symbol Curve Fit based CABAC Framework for Hybrid Video Codec s with Improved Coding Efficiency and Throughput A Novel Multi-Symbol Curve Fit based CABAC Framework for Hybrid Video Codec s with Improved Coding Efficiency and Throughput by Krishnakanth Rapaka A thesis presented to the University of Waterloo in fulfilment

More information

Proyecto final de carrera

Proyecto final de carrera UPC-ETSETB Proyecto final de carrera A comparison of scalar and vector quantization of wavelet decomposed images Author : Albane Delos Adviser: Luis Torres 2 P a g e Table of contents Table of figures...

More information

BASICS OF COMPRESSION THEORY

BASICS OF COMPRESSION THEORY BASICS OF COMPRESSION THEORY Why Compression? Task: storage and transport of multimedia information. E.g.: non-interlaced HDTV: 0x0x0x = Mb/s!! Solutions: Develop technologies for higher bandwidth Find

More information

Compression methods: the 1 st generation

Compression methods: the 1 st generation Compression methods: the 1 st generation 1998-2017 Josef Pelikán CGG MFF UK Praha pepca@cgg.mff.cuni.cz http://cgg.mff.cuni.cz/~pepca/ Still1g 2017 Josef Pelikán, http://cgg.mff.cuni.cz/~pepca 1 / 32 Basic

More information

Transform coding - topics. Principle of block-wise transform coding

Transform coding - topics. Principle of block-wise transform coding Transform coding - topics Principle of block-wise transform coding Properties of orthonormal transforms Discrete cosine transform (DCT) Bit allocation for transform Threshold coding Typical coding artifacts

More information

Product Obsolete/Under Obsolescence. Quantization. Author: Latha Pillai

Product Obsolete/Under Obsolescence. Quantization. Author: Latha Pillai Application Note: Virtex and Virtex-II Series XAPP615 (v1.1) June 25, 2003 R Quantization Author: Latha Pillai Summary This application note describes a reference design to do a quantization and inverse

More information

Laboratory 1 Discrete Cosine Transform and Karhunen-Loeve Transform

Laboratory 1 Discrete Cosine Transform and Karhunen-Loeve Transform Laboratory Discrete Cosine Transform and Karhunen-Loeve Transform Miaohui Wang, ID 55006952 Electronic Engineering, CUHK, Shatin, HK Oct. 26, 202 Objective, To investigate the usage of transform in visual

More information

EE5585 Data Compression April 18, Lecture 23

EE5585 Data Compression April 18, Lecture 23 EE5585 Data Compression April 18, 013 Lecture 3 Instructor: Arya Mazumdar Scribe: Trevor Webster Differential Encoding Suppose we have a signal that is slowly varying For instance, if we were looking at

More information

Analysis of Rate-distortion Functions and Congestion Control in Scalable Internet Video Streaming

Analysis of Rate-distortion Functions and Congestion Control in Scalable Internet Video Streaming Analysis of Rate-distortion Functions and Congestion Control in Scalable Internet Video Streaming Min Dai Electrical Engineering, Texas A&M University Dmitri Loguinov Computer Science, Texas A&M University

More information

Lecture 9 Video Coding Transforms 2

Lecture 9 Video Coding Transforms 2 Lecture 9 Video Coding Transforms 2 Integer Transform of H.264/AVC In previous standards, the DCT was defined as the ideal transform, with unlimited accuracy. This has the problem, that we have encoders

More information

Multimedia & Computer Visualization. Exercise #5. JPEG compression

Multimedia & Computer Visualization. Exercise #5. JPEG compression dr inż. Jacek Jarnicki, dr inż. Marek Woda Institute of Computer Engineering, Control and Robotics Wroclaw University of Technology {jacek.jarnicki, marek.woda}@pwr.wroc.pl Exercise #5 JPEG compression

More information

CHAPTER 3. Transformed Vector Quantization with Orthogonal Polynomials Introduction Vector quantization

CHAPTER 3. Transformed Vector Quantization with Orthogonal Polynomials Introduction Vector quantization 3.1. Introduction CHAPTER 3 Transformed Vector Quantization with Orthogonal Polynomials In the previous chapter, a new integer image coding technique based on orthogonal polynomials for monochrome images

More information

6. H.261 Video Coding Standard

6. H.261 Video Coding Standard 6. H.261 Video Coding Standard ITU-T (formerly CCITT) H-Series of Recommendations 1. H.221 - Frame structure for a 64 to 1920 kbits/s channel in audiovisual teleservices 2. H.230 - Frame synchronous control

More information

SPEECH ANALYSIS AND SYNTHESIS

SPEECH ANALYSIS AND SYNTHESIS 16 Chapter 2 SPEECH ANALYSIS AND SYNTHESIS 2.1 INTRODUCTION: Speech signal analysis is used to characterize the spectral information of an input speech signal. Speech signal analysis [52-53] techniques

More information

Phase-Correlation Motion Estimation Yi Liang

Phase-Correlation Motion Estimation Yi Liang EE 392J Final Project Abstract Phase-Correlation Motion Estimation Yi Liang yiliang@stanford.edu Phase-correlation motion estimation is studied and implemented in this work, with its performance, efficiency

More information

Bit Rate Estimation for Cost Function of H.264/AVC

Bit Rate Estimation for Cost Function of H.264/AVC Bit Rate Estimation for Cost Function of H.264/AVC 257 14 X Bit Rate Estimation for Cost Function of H.264/AVC Mohammed Golam Sarwer 1,2, Lai Man Po 1 and Q. M. Jonathan Wu 2 1 City University of Hong

More information

Compression. What. Why. Reduce the amount of information (bits) needed to represent image Video: 720 x 480 res, 30 fps, color

Compression. What. Why. Reduce the amount of information (bits) needed to represent image Video: 720 x 480 res, 30 fps, color Compression What Reduce the amount of information (bits) needed to represent image Video: 720 x 480 res, 30 fps, color Why 720x480x20x3 = 31,104,000 bytes/sec 30x60x120 = 216 Gigabytes for a 2 hour movie

More information

CSE 408 Multimedia Information System Yezhou Yang

CSE 408 Multimedia Information System Yezhou Yang Image and Video Compression CSE 408 Multimedia Information System Yezhou Yang Lots of slides from Hassan Mansour Class plan Today: Project 2 roundup Today: Image and Video compression Nov 10: final project

More information

SPARSE signal representations have gained popularity in recent

SPARSE signal representations have gained popularity in recent 6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying

More information

Source Coding for Compression

Source Coding for Compression Source Coding for Compression Types of data compression: 1. Lossless -. Lossy removes redundancies (reversible) removes less important information (irreversible) Lec 16b.6-1 M1 Lossless Entropy Coding,

More information

A TWO-STAGE VIDEO CODING FRAMEWORK WITH BOTH SELF-ADAPTIVE REDUNDANT DICTIONARY AND ADAPTIVELY ORTHONORMALIZED DCT BASIS

A TWO-STAGE VIDEO CODING FRAMEWORK WITH BOTH SELF-ADAPTIVE REDUNDANT DICTIONARY AND ADAPTIVELY ORTHONORMALIZED DCT BASIS A TWO-STAGE VIDEO CODING FRAMEWORK WITH BOTH SELF-ADAPTIVE REDUNDANT DICTIONARY AND ADAPTIVELY ORTHONORMALIZED DCT BASIS Yuanyi Xue, Yi Zhou, and Yao Wang Department of Electrical and Computer Engineering

More information

Application of a Bi-Geometric Transparent Composite Model to HEVC: Residual Data Modelling and Rate Control

Application of a Bi-Geometric Transparent Composite Model to HEVC: Residual Data Modelling and Rate Control Application of a Bi-Geometric Transparent Composite Model to HEVC: Residual Data Modelling and Rate Control by Yueming Gao A thesis presented to the University of Waterloo in fulfilment of the thesis requirement

More information

Selective Use Of Multiple Entropy Models In Audio Coding

Selective Use Of Multiple Entropy Models In Audio Coding Selective Use Of Multiple Entropy Models In Audio Coding Sanjeev Mehrotra, Wei-ge Chen Microsoft Corporation One Microsoft Way, Redmond, WA 98052 {sanjeevm,wchen}@microsoft.com Abstract The use of multiple

More information

Image and Multidimensional Signal Processing

Image and Multidimensional Signal Processing Image and Multidimensional Signal Processing Professor William Hoff Dept of Electrical Engineering &Computer Science http://inside.mines.edu/~whoff/ Image Compression 2 Image Compression Goal: Reduce amount

More information

Transform Coding. Transform Coding Principle

Transform Coding. Transform Coding Principle Transform Coding Principle of block-wise transform coding Properties of orthonormal transforms Discrete cosine transform (DCT) Bit allocation for transform coefficients Entropy coding of transform coefficients

More information

Real-Time Audio and Video

Real-Time Audio and Video MM- Multimedia Payloads MM-2 Raw Audio (uncompressed audio) Real-Time Audio and Video Telephony: Speech signal: 2 Hz 3.4 khz! 4 khz PCM (Pulse Coded Modulation)! samples/sec x bits = 64 kbps Teleconferencing:

More information

SSIM-Inspired Perceptual Video Coding for HEVC

SSIM-Inspired Perceptual Video Coding for HEVC 2012 IEEE International Conference on Multimedia and Expo SSIM-Inspired Perceptual Video Coding for HEVC Abdul Rehman and Zhou Wang Dept. of Electrical and Computer Engineering, University of Waterloo,

More information

Scalable resource allocation for H.264 video encoder: Frame-level controller

Scalable resource allocation for H.264 video encoder: Frame-level controller Scalable resource allocation for H.264 video encoder: Frame-level controller Michael M. Bronstein Technion Israel Institute of Technology September 7, 2009 Abstract Tradeoff between different resources

More information

Efficient Alphabet Partitioning Algorithms for Low-complexity Entropy Coding

Efficient Alphabet Partitioning Algorithms for Low-complexity Entropy Coding Efficient Alphabet Partitioning Algorithms for Low-complexity Entropy Coding Amir Said (said@ieee.org) Hewlett Packard Labs, Palo Alto, CA, USA Abstract We analyze the technique for reducing the complexity

More information

Modulo Transforms an Alternative to Lifting

Modulo Transforms an Alternative to Lifting Modulo Transforms an Alternative to Lifting Sridhar Srinivasan Microsoft igital Media ivision sridhsri@microsoft.com Abstract This paper introduces a new paradigm for the construction of reversible transforms

More information

Objectives of Image Coding

Objectives of Image Coding Objectives of Image Coding Representation of an image with acceptable quality, using as small a number of bits as possible Applications: Reduction of channel bandwidth for image transmission Reduction

More information

A Lossless Image Coder With Context Classification, Adaptive Prediction and Adaptive Entropy Coding

A Lossless Image Coder With Context Classification, Adaptive Prediction and Adaptive Entropy Coding A Lossless Image Coder With Context Classification, Adaptive Prediction and Adaptive Entropy Coding Author Golchin, Farshid, Paliwal, Kuldip Published 1998 Conference Title Proc. IEEE Conf. Acoustics,

More information

Number Representation and Waveform Quantization

Number Representation and Waveform Quantization 1 Number Representation and Waveform Quantization 1 Introduction This lab presents two important concepts for working with digital signals. The first section discusses how numbers are stored in memory.

More information

Detailed Review of H.264/AVC

Detailed Review of H.264/AVC Detailed Review of H.264/AVC, Ph.D.. abuhajar@digitavid.net (408) 506-2776 P.O. BOX:720998 San Jose, CA 95172 1 Outline Common Terminologies Color Space Macroblock and Slice Type Slice Block Diagram Intra-Prediction

More information

Source Coding: Part I of Fundamentals of Source and Video Coding

Source Coding: Part I of Fundamentals of Source and Video Coding Foundations and Trends R in sample Vol. 1, No 1 (2011) 1 217 c 2011 Thomas Wiegand and Heiko Schwarz DOI: xxxxxx Source Coding: Part I of Fundamentals of Source and Video Coding Thomas Wiegand 1 and Heiko

More information

on a per-coecient basis in large images is computationally expensive. Further, the algorithm in [CR95] needs to be rerun, every time a new rate of com

on a per-coecient basis in large images is computationally expensive. Further, the algorithm in [CR95] needs to be rerun, every time a new rate of com Extending RD-OPT with Global Thresholding for JPEG Optimization Viresh Ratnakar University of Wisconsin-Madison Computer Sciences Department Madison, WI 53706 Phone: (608) 262-6627 Email: ratnakar@cs.wisc.edu

More information

Image Compression - JPEG

Image Compression - JPEG Overview of JPEG CpSc 86: Multimedia Systems and Applications Image Compression - JPEG What is JPEG? "Joint Photographic Expert Group". Voted as international standard in 99. Works with colour and greyscale

More information

Perceptual Feedback in Multigrid Motion Estimation Using an Improved DCT Quantization

Perceptual Feedback in Multigrid Motion Estimation Using an Improved DCT Quantization IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 10, NO. 10, OCTOBER 2001 1411 Perceptual Feedback in Multigrid Motion Estimation Using an Improved DCT Quantization Jesús Malo, Juan Gutiérrez, I. Epifanio,

More information

Title. Author(s)Lee, Kenneth K. C.; Chan, Y. K. Issue Date Doc URL. Type. Note. File Information

Title. Author(s)Lee, Kenneth K. C.; Chan, Y. K. Issue Date Doc URL. Type. Note. File Information Title Efficient Color Image Compression with Category-Base Author(s)Lee, Kenneth K. C.; Chan, Y. K. Proceedings : APSIPA ASC 2009 : Asia-Pacific Signal Citationand Conference: 747-754 Issue Date 2009-10-04

More information

VIDEO CODING USING A SELF-ADAPTIVE REDUNDANT DICTIONARY CONSISTING OF SPATIAL AND TEMPORAL PREDICTION CANDIDATES. Author 1 and Author 2

VIDEO CODING USING A SELF-ADAPTIVE REDUNDANT DICTIONARY CONSISTING OF SPATIAL AND TEMPORAL PREDICTION CANDIDATES. Author 1 and Author 2 VIDEO CODING USING A SELF-ADAPTIVE REDUNDANT DICTIONARY CONSISTING OF SPATIAL AND TEMPORAL PREDICTION CANDIDATES Author 1 and Author 2 Address - Line 1 Address - Line 2 Address - Line 3 ABSTRACT All standard

More information

Neural network based intra prediction for video coding

Neural network based intra prediction for video coding Neural network based intra prediction for video coding J. Pfaff, P. Helle, D. Maniry, S. Kaltenstadler, W. Samek, H. Schwarz, D. Marpe, T. Wiegand Video Coding and Analytics Department, Fraunhofer Institute

More information