Achieving H.264-like compression efficiency with distributed video coding

Simone Milani (a), Jiajun Wang (b) and Kannan Ramchandran (b)
(a) Dept. of Information Engineering, University of Padova, Italy.
(b) Dept. of Electrical Engineering and Computer Science, University of California, Berkeley.

Further author information: Simone Milani: simone.milani@dei.unipd.it; Jiajun Wang, Kannan Ramchandran: {junewang, kannanr}@eecs.berkeley.edu. The work of Simone Milani was also supported by the Foundation Ing. Aldo Gini.

ABSTRACT

Recently, a new class of distributed source coding (DSC) based video coders has been proposed to enable low-complexity encoding. However, to date, these low-complexity DSC-based video encoders have been unable to compress as efficiently as motion-compensated predictive coding based video codecs, such as H.264/AVC, due to insufficiently accurate modeling of video data. In this work, we examine achieving H.264-like high compression efficiency with a DSC-based approach without the encoding complexity constraint. The success of H.264/AVC highlights the importance of accurately modeling the highly non-stationary video data through fine-granularity motion estimation. This motivates us to deviate from the popular approach of approaching the Wyner-Ziv bound with sophisticated capacity-achieving channel codes that require long block lengths and high decoding complexity, and instead focus on accurately modeling video data. Such a DSC-based, compression-centric encoder is an important first step towards building a robust DSC-based video coding framework.

Keywords: Distributed video coding, H.264/AVC, entropy coding.

1. INTRODUCTION

Distributed video coding (DVC) is a new video coding framework built on the information-theoretic concept of distributed source coding (DSC) [1,2]. In particular, it utilizes principles of lossy distributed compression (also known as source coding with side-information or Wyner-Ziv coding) from multi-terminal information theory, which relies on the assumption of perfect knowledge of the statistical correlation structure between the source and side-information.

In the case of video coding, where the source is the current encoding unit and the side-information is a previously decoded unit, the knowledge of correlation between the source and side-information implies knowledge of the statistics for the displaced frame difference (DFD). (An encoding unit can be a block, a slice or a frame; in our case, an encoding unit is a block.) Under this assumption, motion search can be theoretically transferred from the encoder to the decoder, reducing the encoding complexity without any loss in compression performance [3].

In this paper, we recognize that for practical video coding algorithms, the correlation statistics are not known a priori either at the encoder or at the decoder. However, motion estimation modules can find not only the best motion-compensated predictor and the DFD, but also the correlation statistics between the current block and the best possible side-information. Due to the non-stationarity of video data, the correlation structure between a block and its best side-information varies widely within a frame. This motivates us to generate a symbol (or syndrome, in channel coding terminology) for each quantized transform coefficient and to entropy-code it. This is a large deviation from the popular approach of achieving the Wyner-Ziv bound through sophisticated capacity-achieving channel codes, such as LDPC codes, which require both long block-lengths and high decoding complexity. (While we think the long block-length requirement of efficient channel codes may be a mismatch for the highly non-stationary video data model, we believe the overall performance can be improved by incorporating the ever-changing statistics in both the encoding and the decoding process.) It was first shown in [4] that motion search potentially allows a distributed video coding approach to achieve high compression efficiency without sophisticated channel codes. In light of the success of H.264/AVC, we design and implement a DSC-based video coder that adopts some building blocks of the H.264/AVC scheme that can be utilized in a DSC-based framework, such as more sophisticated motion search (and thus more accurate correlation estimation) and the in-loop deblocking filter [5].

In distributed source coding, the correlation statistics between the source and the side-information are used to determine the amount of information about the source (not the DFD) that needs to be coded to ensure successful decoding. In fact, this information can be conceptualized as a hash of the source. The strength of the hash needed to guarantee successful decoding is determined by the amount of correlation, i.e. the stronger the correlation, the weaker the hash needed. Naturally, the DSC-generated data (hash) has a different probability distribution from that of the DFD, and therefore a new entropy coder needs to be designed to suit the statistics of hashes. This allows us to achieve H.264-like compression efficiency without having to use sophisticated channel codes that entail high decoding complexity.

The setup of this work is not exactly the same as the classical Wyner-Ziv setup, as the side-information is indeed available at the encoder and we can find the correlation between the source and the side-information through motion estimation. However, this approach is still fundamentally different from the predictive-coding framework, as we code information about the source instead of the difference between the source and the motion-compensated predictor. This is important as it allows the proposed architecture to be efficiently extended to a system that is robust to channel losses, as proposed in [6]. This is because a DSC-based encoder sends information about the source, the amount of which depends on the statistical correlation between the source and side-information (predictor).
When channel noise weakens the correlation between the source and the predictor, only an incremental amount of source information is needed at the decoder to ensure successful decoding, which can be thought of as increasing the length or strength of the hash. In a hybrid video coding system, on the other hand, the compressed data (residual signal) deterministically depends on both the source and the predictor. Therefore, channel loss that alters the reconstructed predictor will require the coding of unpredicted signal. In [6], a robust DSC-based video coder was proposed with a baseline encoder that matches the compression efficiency of H With the increasingly wide adoption of the H.264/AVC standard, it is important to have a baseline DSC system that compares well with the current standards in terms of coding efficiency, and this compression-centric baseline is the focus of this work.

Related work

Distributed video coding is a novel tool to enable low-complexity encoding and robust transmission under tight delay constraints [7,8]. However, it typically suffers from much lower compression efficiency when compared to today's state-of-the-art hybrid coders. Previous works [8-10] utilized capacity-achieving channel codes to approach the Wyner-Ziv bound. In these works, bit planes of the same significance from different coefficients are chained together and treated as a single source. (Typically the bit-planes of the transform coefficients at the same location in an 8×8 or 4×4 block are chained together across a frame or a large fraction of a frame.) A channel code is then applied to the source to generate the syndrome (or hash of the source). At the decoder, the chained bit plane obtained from the previous frame is used as the side-information to decode. Such long block-lengths typically entail relatively high decoding complexity.

The paper is organized as follows. The implementation details of the proposed coder are described in Section 2.1, focusing on the syndrome generation procedure (Section 2.2) and on the adopted entropy coder (Section 2.3). First, the probability distribution of syndromes is analyzed in Section 2.3.1, and then Section 2.3.2 describes a quad-tree based arithmetic coder that obtains an effective compression performance. Finally, Section 3 compares the experimental results obtained with the DSC coding scheme with the ones obtained from a standard H.264/AVC coder.

2. PROPOSED SYSTEM

2.1 Overall structure

The structure of the H.264/AVC coder can be seen as the evolution of a comprehensive collection of coding techniques designed over the last 50 years. Many features in H.264/AVC were already present in some of the previous hybrid coders, but were redefined and refined. In addition, some new elements were introduced, resulting in a final coder with a wide set of tools that can be tuned in many different ways. Experimental results have proven that this orchestration of many different coding strategies is an effective solution, as H.264/AVC achieves much higher compression efficiency than all the previous coding standards, including MPEG-4 [11] and H Motivated by the success of H.264/AVC, we incorporate some of its important modules that can be utilized in a DSC-based video coding framework, including the more sophisticated motion estimation module and the deblocking filter. But we retain the distributed source coding fundamental: instead of sending the difference between the current block and the motion-compensated predictor, the DSC-based video coder sends information about the original (quantized) block, which can be conceptualized as a hash of the original block. Figure 1 shows the encoder block diagram of the proposed system.

Figure 1. Encoder block diagram. The key differences between the proposed DSC-based encoder and a H.264/AVC encoder are the syndrome generator and re-designed entropy coder.

In a hybrid coder, Motion Estimation (ME) identifies the best motion-compensated predictor according to a given distortion metric. Both the location of the predictor and the resulting DFD need to be transmitted to the decoder, which reconstructs the coded block by generating the corresponding predictor through Motion Compensation and adding the decoded DFD. In a DSC-based coder, we can use the motion search module to find the correlation between the current block and the best side-information available at the decoder. This value determines the least amount of information (the strength of the hash) that needs to be sent in order to ensure successful decoding. We call this hash a syndrome, using channel coding terminology.
The key differences between the proposed DSC-based encoder and a H.264/AVC encoder are: (1) an additional syndrome generator and (2) a modified entropy coding algorithm to better suit the probability distribution of syndrome values, in place of the original H.264/AVC entropy coders, i.e. the Context-Adaptive Variable-Length Coder (CAVLC) and the Context-Adaptive Binary Arithmetic Coder (CABAC), which were designed according to the statistics of the quantized transformed DFD. We now describe these two modules in more detail.

2.2 Syndrome generation

Given the original 4×4 block $x$ and its motion-compensated predictor $x_p$, the H.264/AVC coder generates the residual DFD $r = x - x_p$ and transforms it into the coefficients $R$ through a separable multiplication-free integer transform (see [13]). (The adopted transform matrix is orthogonal but not orthonormal, and therefore a rescaling is needed to compensate for the amplification introduced at each frequency; see [13].) The dynamic range of the coefficients is then reduced using a dead-zone quantizer as in the following equation

$$R_q(i,j) = \mathrm{sign}(R(i,j)) \left\lfloor \frac{|R(i,j)| + O(i,j,QP,mb\_type)}{\Delta(i,j,QP,mb\_type)} \right\rfloor \qquad (1)$$

where the quantization step $\Delta(i,j,QP,mb\_type)$ and the offset $O(i,j,QP,mb\_type)$ depend on the coefficient position $(i,j)$ in the block, the Quantization Parameter $QP$, and the macroblock coding type $mb\_type$. The standard allows specifying a quantization matrix that may vary the quantization steps for coefficients at different positions according to Rate-Distortion optimization criteria. In the current JVT implementation, the offset $O(i,j,QP,mb\_type)$ is $\frac{1}{3}\Delta(i,j,QP,mb\_type)$ for Intra blocks and $\frac{1}{6}\Delta(i,j,QP,mb\_type)$ for Inter blocks. For the sake of simplicity, in the following paragraphs we will refer to $\Delta(i,j,QP,mb\_type)$ and $O(i,j,QP,mb\_type)$ as $\Delta$ and $O$ respectively.

The adopted DSC scheme, on the other hand, transforms and quantizes the original signal $x$ into quantized coefficients $X_q$. In our implementation, the quantization rule was changed in order to match the characteristics of the input signal and to avoid an excessive mismatch between the quality obtained by the H.264/AVC and DSC coders for the same QP. Therefore, the quantization offset $O$ is set to

O = Δ/3 if QP < 12; O = (Δ/6) (2 2 QP 11 ) if QP ≥ 12. (2)

After quantization, we generate a syndrome for each quantized transform coefficient based on the correlation statistics found through the motion estimation module. More specifically, we adopt the multilevel coset code proposed in [14], where the syndrome constitutes the least significant bits that cannot be inferred from the best side-information. The number $n$ of least significant bits of a quantized coefficient $X_q$ that need to be sent is computed through

$$n = \begin{cases} 2 + \left\lfloor \log_2 \dfrac{|X_q \Delta - X_p|}{\Delta} \right\rfloor & \text{if } d > \Delta \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

where $d = \min\{ |X_q \Delta - X_p|, |X - X_p| \}$, $\Delta$ is the quantization step for that coefficient, and $X_p$ is the corresponding unquantized transform coefficient from the predictor block. Given the value of $n$, the syndrome $Z$ corresponding to the $n$ least significant bits of $X_q$ is generated as follows

$$Z = X_q \,\&\, (2^n - 1) \qquad (4)$$

where $\&$ denotes a bitwise AND operation (note that in equations (3) and (4) we omitted the indexes). Considering the lattice $\Lambda$ that includes all the quantized real values, the symbol $Z$ identifies the sub-lattice $\Lambda_Z$, where the binary representations of all values have the same least significant bits (see Figure 2). Therefore a syndrome can also be thought of as a sub-lattice index. Note that in theory, the number of least significant bits that need to be sent for each quantized coefficient is assumed known at both the encoder and the decoder.
In practice, this information needs to be transmitted together with the syndrome value. For brevity, we map each combination of syndrome bits and their number to a unique symbol $S = 2^n + Z$. Given $S$ and the side-information $X_p$ from motion compensation, the decoder can reconstruct the original quantized value $X_q$ by selecting the point in the sub-lattice $\Lambda_Z$ which is closest to the reference $X_p$. Since these symbols are not equally likely, they can be entropy coded to achieve higher compression efficiency. Here we present a quad-tree based arithmetic coder that is tailored to the distribution of these symbols.

2.3 Entropy coding of syndromes

The compression gain of H.264/AVC is partially due to an effective entropy coding algorithm, the Context-Adaptive Binary Arithmetic Coder (CABAC [15]). Its structure relies on an efficient symbol binarization and on an accurate context modeling that precisely suits the statistics of the input data. First, the syntax elements produced by the video coder are converted into variable-length binary strings, and for each binary digit the modeling block assigns a context that is associated with a binary probability mass function (p.m.f.). Then, both the binary digit and the associated p.m.f. are sent to a binary arithmetic coder that maps them into an interval through a Finite State Machine (FSM) and updates the binary context. The good performance provided by the CABAC coder has led us to apply this efficient scheme to syndrome coding. However, the CABAC coder is optimized for processing quantized DFD transform coefficients, and therefore some changes must be made in order to make it suitable for syndromes.

(In our DSC implementation, $M = 2^{14}/\Delta$ is added to each coefficient value in order to make it positive, where the $2^{14}$ factor depends on the amplification of the 4×4 transform (6 bits) on the input signal (8 bits). For further details, see [13].)
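The syndrome generation and coset decoding of Section 2.2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the floor in the bit count and the threshold `d > delta` are assumptions about Eq. (3), and the null symbol (S = 0) is left to the caller because its reconstruction rule is not spelled out in the text.

```python
import math

def dead_zone_quantize(coeff, delta, offset):
    """Dead-zone quantizer in the spirit of Eq. (1): sign(c) * floor((|c| + O) / delta)."""
    level = int((abs(coeff) + offset) // delta)
    return level if coeff >= 0 else -level

def syndrome_bit_count(xq, x, xp, delta):
    """Number n of least significant bits of xq to send (assumed reading of Eq. (3))."""
    d = min(abs(xq * delta - xp), abs(x - xp))
    if d <= delta:  # assumed threshold: the coefficient is treated as null
        return 0
    return 2 + math.floor(math.log2(abs(xq * delta - xp) / delta))

def encode_symbol(xq, n):
    """Map (n, Z) to the symbol S = 2^n + Z, with Z = xq & (2^n - 1) as in Eq. (4)."""
    if n == 0:
        return 0
    z = xq & ((1 << n) - 1)
    return (1 << n) + z

def decode_symbol(s, xp, delta):
    """Recover xq as the point of the coset Z + k * 2^n closest to xp / delta."""
    if s == 0:
        raise ValueError("null symbol: reconstruction from side-information is not sketched here")
    n = s.bit_length() - 1       # S = 2^n + Z with 0 <= Z < 2^n
    z = s - (1 << n)
    step = 1 << n                # spacing between points of the sub-lattice
    k = round((xp / delta - z) / step)
    return z + k * step

# Usage: source coefficient 372, predictor coefficient 300, quantization step 10.
delta = 10
xq = dead_zone_quantize(372, delta, delta / 6)
n = syndrome_bit_count(xq, 372, 300, delta)
s = encode_symbol(xq, n)
assert decode_symbol(s, 300, delta) == xq
```

Decoding succeeds here because the coset spacing $2^n$ is more than twice the distance between $X_q$ and $X_p/\Delta$, so $X_q$ is the unique nearest point of its coset.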

Figure 2. Partitioning the integer lattice into 3 bins (cosets). The parameter $\Delta$ identifies the quantization step, $X$ is the source, $X_q$ is the quantized codeword and $X_p$ is the side-information (omitting the spatial coordinates $(i,j)$). The number of levels in the partition tree depends on the correlation between $X_q$ and $X_p$ given $X$.

2.3.1 Modeling syndrome distribution

Several works have proposed different probabilistic models for DFD transform coefficients according to the characteristics of the adopted transform and its dimension. Most of the solutions that were adopted for video coding standards prior to H.264/AVC are based on Laplacian and generalized-Gaussian models (see [16,17]). In [18], Kamaci et al. propose a better solution using a Cauchy distribution to model the rate and distortion in a rate control algorithm, while [19] used a Laplacian+impulsive distribution, which proved to be a sufficiently accurate low-cost approximation of the generalized-Gaussian distribution. After quantization this model can be easily approximated by a symmetric geometric p.m.f., simplifying the analysis that characterizes the syndrome distribution.

In our proposed system, we divide the encoding symbols (coefficients) into two categories: (1) null (zero) coefficients (second case in Equation 3), represented by the symbol $S = 0$, and (2) non-null (non-zero) coefficients, represented by the symbol $S = 2^n + Z$ (first case in Equation 3). We first analyze the distribution of these encoding coefficients. From Equation 3, the probability distribution of the symbol $S$ can be approximated as follows

$$p(S) \approx K_S \, p_e^{2^{n-2}} \left(1 - p_e^{2^{n-2}}\right) \frac{\cosh\left((2^{n-1} - Z)\log(p_r)\right)}{\cosh\left(2^{n-1}\log(p_r)\right)} \qquad (5)$$

where $K_S$ is a normalizing constant (see Appendix A). Note that for $p_r \approx 1$, i.e. $\log(p_r) \approx 0$, the term $\cosh((2^{n-1} - Z)\log(p_r)) / \cosh(2^{n-1}\log(p_r))$ is close to 1, and Equation (5) can be simplified as $p(S) \approx K_S \, p_e^{2^{n-2}} (1 - p_e^{2^{n-2}})$.

Experimental results prove that the model fits the syndrome statistics quite well (see Figure 3). The fitting was made with different values of $p_r$ for $n = 2$ syndromes, since the p.d.f. of the encoding coefficients for the 4×4 transform is better fitted by using two different values for $p_r$. (A generalized-Gaussian distribution with exponent lower than 1 can be simplified using a Laplacian model with an additive peak component, which can be well represented by an impulsive term [19] or, more precisely, by another Laplacian component with a lower variance, obtaining a piecewise exponential distribution.)

Figure 3 also reveals that the statistics of syndromes are much more irregular than those of the H.264/AVC coefficients. The distribution is much less biased towards zero even though the probability of having a null coefficient is higher. This makes it challenging to efficiently entropy-code the syndromes. In fact, Figure 4 illustrates the difference between the entropy of syndromes and the entropy of DFD coefficients for typical values of $p_e$ and $p_r$, and we can see that the entropy of syndromes is always higher than that of the DFD. This is a direct result of the binning loss in Wyner-Ziv coding.
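The shape of the model in Equation (5) can be explored numerically with a short Python sketch. The function names, the truncation at `n_max` bit planes, and the normalization over non-null symbols only are illustrative assumptions, not part of the paper.

```python
import math

def syndrome_weight(n, z, p_e, p_r):
    """Unnormalized model of Eq. (5) for the symbol S = 2^n + z (n >= 2, 0 <= z < 2^n)."""
    a = p_e ** (2 ** (n - 2))
    half = 2 ** (n - 1)
    ratio = math.cosh((half - z) * math.log(p_r)) / math.cosh(half * math.log(p_r))
    return a * (1.0 - a) * ratio

def syndrome_pmf(p_e, p_r, n_max=6):
    """Normalize the weights over all non-null symbols with 2 <= n <= n_max (assumed truncation)."""
    weights = {
        (1 << n) + z: syndrome_weight(n, z, p_e, p_r)
        for n in range(2, n_max + 1)
        for z in range(1 << n)
    }
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

# Usage: for a fixed n the weights are symmetric around z = 2^(n-1),
# and they flatten out as p_r approaches 1.
pmf = syndrome_pmf(p_e=0.6, p_r=0.9)
flat = syndrome_pmf(p_e=0.6, p_r=0.999)
```

For a fixed number of bits $n$, the cosh ratio makes the syndrome values near $Z = 2^{n-1}$ the least likely ones, which is one way to see why the syndrome distribution is less biased towards zero than the DFD coefficients.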

Figure 3. Comparison between the probability mass functions of syndromes (solid line) and the model in Eq. (5) (dashed line), for (a) position 0, (b) position 2 and (c) position 5. The results were computed from the sequence foreman with QP =. The x-axis corresponds to the syndrome value S while the y-axis corresponds to its probability. Each plot corresponds to a different position in the scanning order of the 4×4 transform block.

Figure 4. Difference between the entropy of syndromes and the entropy of DFD for different $p_e$ and $p_r$ values.

Figure 5. Probability of non-null syndrome/coefficient for the DSC coder and H.264/AVC as a function of the position in the scanning order (from the sequence foreman (frame 1-14 QP=)). Legend: DSC coder, H(s) = 5.89; H.264/AVC, H(s) = 5.10.

In addition, the positions of null quantized coefficients (called zeros as in [19,20]) must also be considered. In H.264/AVC, the high percentage of zeros in a transform block is efficiently exploited by a run-length coding algorithm. The quantized transform coefficients are scanned in zig-zag order, and the number of null coefficients between two non-zero coefficients (called a run) is coded. In the CABAC coder, run-length coding is replaced by coding the position of non-zero coefficients. A binary context is associated with each position in the scanning order, which models the probability of having a zero coefficient at that position. Experimental results show that in a DFD-based video coder, transform blocks have a low-pass characteristic, since the probability of non-zero quantized coefficients is higher at low frequencies. On the contrary, null DSC syndromes are more evenly distributed across all the frequencies, resulting in a much less evident low-pass characteristic (see Figure 5).
As a result, coding syndromes by using a zig-zag scan followed by run-length coding, or by coding the position of every non-null syndrome, can be quite inefficient. Experimental results show that in a transform block, null syndromes tend to occur in neighboring positions, while non-null syndromes appear to be more sparse. This motivates us to adopt a quad-tree based solution [21,22] in the entropy coder.

Figure 6. Example of quad-tree coding using CBP variables (CBP_bit = 1; example CBP_subblock values 0, 151 and 1256).

2.3.2 Quad-tree based entropy coding of syndromes

Adopting a hierarchical quad-tree partitioning of the 4×4 syndromes into sub-blocks resulted in a much more efficient coding of the syndromes. The top-level variable CBP_bit indicates whether there is any non-zero syndrome value in the 4×4 block. CBP_block then indicates which of the 4 sub-blocks contain non-zero syndrome values. Finally, each of these indicated sub-blocks has a variable CBP_subblock that indicates where and what the non-zero values are (see Figure 6 for an example). (The naming convention comes from the Coded Block Pattern (CBP) structure present in the H.264/AVC coder, which specifies which 8×8 blocks have non-zero coefficients.) At this level, the quad-tree coder characterizes which syndromes are different from zero, which ones are coded using two bits (called d1s or d1-syndromes), and which ones are coded with a higher number of bits. These variables are computed as follows

$$CBP\_block = b_0 + 2 b_1 + 4 b_2 + 8 b_3, \qquad CBP\_subblock = c_0 + 6 c_1 + 36 c_2 + 216 c_3 \qquad (6)$$

$$b_i = \begin{cases} 0 & \text{if all syndromes in sub-block } i \text{ have } n = 0 \\ 1 & \text{otherwise} \end{cases} \qquad c_i = \begin{cases} 0 & \text{if } Z_i = 0 \text{ (null syndrome)} \\ 1 & \text{if } Z_i \text{ is d1 and } z_i \,\&\, 3 = 0 \\ 2 & \text{if } Z_i \text{ is d1 and } z_i \,\&\, 3 = 1 \\ 3 & \text{if } Z_i \text{ is d1 and } z_i \,\&\, 3 = 2 \\ 4 & \text{if } Z_i \text{ is d1 and } z_i \,\&\, 3 = 3 \\ 5 & \text{otherwise} \end{cases}$$

with $i = 0, \ldots, 3$.

Comparing the average bit rate needed to code the position of zeros and d1-syndromes using the original CABAC scheme and the quad-tree scheme (Table 1), we see that the quad-tree solution compares well with the original scheme. Since the arithmetic coder of the CABAC algorithm processes binary symbols, we have designed a binarization unit that maps CBP parameters into a variable-length binary string using a Huffman coding table. Binary symbols are then sent to the arithmetic coder with their contexts. The remaining syndromes are coded separately, specifying the number of coded bit planes and their values for each syndrome.

We observed that the number of non-d1 syndromes in a sub-block is very rarely bigger than one, and when there is more than one non-d1 syndrome, the number of bit planes in each syndrome is the same most of the time. Therefore, it is possible to specify a single number of bit planes for all the non-d1 syndromes in the sub-block, equal to the biggest one among all the non-d1 syndromes in that sub-block. In the DSC setup, giving more information about the source is redundant, but does not preclude correct decoding. Though we code some redundant bits whenever there are non-d1 syndromes with different numbers of bit planes, we save on compressing the CBP parameters, reducing the total number of bits.
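The CBP variables of the quad-tree coder can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: each syndrome is given as a hypothetical pair `(n, z)`, the 4×4 block is assumed to be stored in raster order and split into four 2×2 quadrants, and CBP_subblock is read as a base-6 number, which is consistent with the values 151 and 1256 appearing in Figure 6.

```python
# Raster-order indices of the four 2x2 quadrants of a 4x4 block (assumed partition).
QUADRANTS = [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]

def subblock_digit(n, z):
    """Digit c_i: 0 for a null syndrome, 1..4 for a d1 syndrome (n == 2), 5 otherwise."""
    if n == 0:
        return 0
    if n == 2:
        return 1 + (z & 3)
    return 5

def cbp_variables(block):
    """Return (CBP_bit, CBP_block, {sub-block index: CBP_subblock}) for 16 (n, z) pairs."""
    cbp_block = 0
    cbp_subblocks = {}
    for i, indices in enumerate(QUADRANTS):
        digits = [subblock_digit(*block[k]) for k in indices]
        if any(digits):
            cbp_block |= 1 << i  # b_i = 1: sub-block i holds a non-null syndrome
            cbp_subblocks[i] = sum(c * 6 ** j for j, c in enumerate(digits))
    cbp_bit = int(cbp_block != 0)
    return cbp_bit, cbp_block, cbp_subblocks

# Usage: one sub-block with d1 syndromes of values 0, 0 and 3 and one null position.
block = [(0, 0)] * 16
block[2], block[3], block[6] = (2, 0), (2, 0), (2, 3)
print(cbp_variables(block))
```

Only the sub-blocks flagged in CBP_block carry a CBP_subblock value, so a block in which all syndromes are null costs a single CBP_bit.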

Table 1. Comparison of average bit rate needed to code the position of zeros in H.264/AVC and d1-syndromes in the proposed system (frame 1 QP=). Columns: Sequences, quad-tree, Original CABAC. Rows: foreman, mobile, news.

Figure 7. PSNR vs. bit rate (kbit) for the first frame in the GOP (QP ∈ [15, 39]). Panels: (a) foreman (training sequence); (b) news (test sequence).

3. EXPERIMENTAL RESULTS

In this section, we compare the compression performance of the DSC-based coding system with a quad-tree based entropy coder with that of a H.264/AVC coder using the same set of R-D optimization parameters. (Here both systems use only the 4×4 transform, with Lagrangian R-D optimization disabled and without cancelling unnecessary coefficients at high frequencies.) We first compare the compression efficiency in compressing the first P frame using an identically reconstructed I frame for reference. (The proposed system uses the same I frame encoder as H.264/AVC.) In this case, the reference blocks used by H.264/AVC to compute the DFDs and the side-information used by the DSC coder are identical, and therefore the difference in performance depends only on the entropy coding of the DFD for H.264/AVC and of the syndromes for the proposed system. The DSC-based coder proves to be very effective, as it compares well with H.264/AVC, providing even higher quality for some sequences at medium bit rates (Figure 7). This is the result of adopting a quad-tree based entropy coder, which is more suitable for the statistical distribution of the syndrome symbols than the original CABAC coder and allows packing groups of symbols together.

More specifically, for the foreman sequence we were able to obtain the same bit rate of H.264/AVC up to db of quality. The news sequence can be compressed more efficiently than with H.264/AVC at medium bit rates. This is due to the fact that this sequence has a very low level of motion and a large number of null syndromes. Furthermore, as explained earlier, given the same current block and reference block, the DSC coder generates more null syndromes than H.264/AVC generates zero coefficients. Therefore the DSC coder will have sparser non-null symbols, the case in which quad-tree based approaches are most efficient. The side effect of this phenomenon, however, is that the coding efficiency slightly decreases as we code more frames in a GOP using the IPPP GOP structure, as more coefficients are null or skipped compared to H.264/AVC. The additional distortion caused by this will propagate. Since the DSC-based coder quantizes the original transform coefficients while H.264/AVC quantizes DFD transform coefficients, the DSC-based coder is much more sensitive to this additional distortion. This is reflected in our second set of experiments, where different video sequences were coded using different quantization parameters with the GOP structure IPPP of 15 frames at frame/s. Figure 8 shows that the proposed coder is still able to closely match the compression efficiency of H.264/AVC, especially at low and medium rates (under db).

We disabled all the rate-distortion optimization strategies and coefficient cancellation for both systems to make the comparison fair. We focus instead on the comparison of the different entropy coders. It is in fact possible to improve the compression gain by optimizing the quantization steps and the coding mode, and by nullifying unimportant syndromes. We note a relatively large compression performance gap at high rates. This is due to the fact that quad-tree based compression techniques are most efficient for sparse data. At high rates, the syndromes are much less sparse, causing inefficiency with quad-tree based entropy coding.

Rate control algorithm: Figure 7 shows that, for a given QP, the DSC-based coder achieves a lower perceptual quality at a reduced bit rate. This mismatch can be equalized by implementing a rate control algorithm that tunes the quantization parameter QP both at the macroblock and at the frame level in order to keep the coded bit rate close to a target value. We adopted a modified version of the algorithm proposed in [19], where the percentage of zeros has been replaced with the percentage of null syndromes. For the $n$-th frame, the algorithm allocates $T_n$ bits for the current picture, where $T_n$ is computed as follows

$$T_n = \frac{G}{K_{I,DSC}\, n_I + n_{DSC}}. \qquad (7)$$

The parameter $G$ represents the number of bits that are still available for the current GOP and $n_t$, $t = I, DSC$, is the number of $t$-type frames in the GOP that still remain to be coded. The ratio $K_{I,DSC}$ characterizes the complexity relation between Intra frames (I) and DSC-coded frames (DSC), and it is equal to

$$K_{I,DSC} = \frac{X_I}{X_{DSC}} \quad \text{where} \quad X_t = 2^{QP_t/6} R_t, \quad t = I, DSC. \qquad (8)$$
The quantization parameter $QP_t$ is the one used to quantize the last $t$-type frame, while $R_t$ is the related number of bits. Experimental results show that there is a linear relation between the number of coded bits $R$ and the percentage $\rho$ of null syndromes, and therefore the target bit rate $T_n$ can be related to a target percentage $\rho_n$ through the equation

$$\rho_n = \frac{T_n - q}{m}. \qquad (9)$$

The target percentage $\rho_n$ of null syndromes makes possible the identification of an average quantization step $QP_n$, which has to be corrected at the macroblock level in order to match the bandwidth constraints (see [19] for more details). The parameters $m$ and $q$ are estimated from previously coded frames. In our tests, we kept the same algorithm for both the proposed DSC coder and the H.264/AVC architecture (in this case DSC frames are replaced by P frames), changing only the complexity ratio $K_{I,P}$. In the DSC coder the complexity ratio $K_{I,P}$ is divided by a constant $c = 1.4$ in order to equalize the quality mismatch between DSC frames and P frames in H.264/AVC. The scaling of the complexity ratio reduces the number of bits allocated for Intra frames and increases the bit rate for DSC frames, improving their quality. Experimental results have shown that the algorithm allows an accurate rate control for the proposed DSC-based coder with a small amount of additional complexity (see Figure 9). Moreover, the quality equalization performed by scaling the complexity ratio proves to be effective, since the plots in Figure 9 show that the performance of the DSC coder is clearly improved.

4. CONCLUSION

In this paper we presented the implementation of a compression-centric Distributed Source Coding based coder that utilizes some of the compression techniques adopted by H.264/AVC. While these solutions lead to a much better compression efficiency of the DSC-based coder through accurately modeling video data, they posed significant challenges in redesigning a quad-tree based entropy coder that suits the distribution of the syndromes.
Having such an efficient entropy coder allowed us to approach H.264-like compression efficiency without resorting to sophisticated channel codes, which entail high decoding complexity. Such a compression-centric DSC-based encoder is an important building block of a robust DSC-based encoder. It is part of our ongoing work to extend this system to a robust version.

Figure 8. PSNR vs. bit rate for the first frame in the GOP (QP ∈ [15, 39]), with H.264/AVC as the reference curve. Panels: (a) foreman QCIF (training sequence); (b) news QCIF (training sequence); (c) salesman QCIF (test sequence); (d) sean QCIF (test sequence); (e) foreman CIF (test sequence); (f) news CIF (test sequence).

Figure 9. PSNR vs. bit rate with rate control. Panels: (a) sean QCIF; (b) news QCIF. Each panel includes an H.264/AVC curve.

APPENDIX A. DERIVATION OF PROBABILITY DISTRIBUTION FOR SYNDROMES

The probability distribution for non-zero syndromes can be approximated as follows. According to the first case of equation (3), the number of bits that must be included in the syndrome is

$$n = 2 + \lfloor \log_2 |X_q - X_p| \rfloor \approx 2 + \lfloor \log_2 |X_q - X_{p,q}| \rfloor = 2 + \lfloor \log_2 |E| \rfloor, \qquad (10)$$

where $X_p$ is the side-information (reference block), $X_{p,q}$ is the quantized version of $X_p$, and $E = X_q - X_{p,q}$. Assuming that $X_q$ and the difference $E$ can be approximated by independent symmetrical geometric variables, the probability mass functions of $X_q$ and $E$ can be expressed respectively as

$$p_r(X_q) = \frac{1 - p_r}{1 + p_r}\, p_r^{|X_q - M|}, \qquad p_e(E) = \frac{1 - p_e}{1 + p_e}\, p_e^{|E|}, \qquad (11)$$

where we assume that the coefficients $X_q$ are shifted in such a way that, omitting the tails of the distribution, they can be included in the set $[0, 2M]$. The parameters $p_r$ and $p_e$ completely characterize the two probability distributions. In our implementation, they are estimated from the experimental data using log-linear fitting.

Let the syndrome $Z$ be coded with $n$ bits (i.e. $2^{n-2} \le |E| = |X_q - X_{p,q}| < 2^{n-1}$). We can write the joint p.d.f. of the syndrome $Z$ and the number of bits $n$ as

$$p(Z, n) = \sum_{X_q = 0}^{2M-1} \sum_{X_{p,q} = 0}^{2M-1} p_r(X_q)\, p_e(X_q - X_{p,q})\, \mathbb{1}\!\left(2^{n-2} \le |X_q - X_{p,q}| < 2^{n-1}\right) \mathbb{1}\!\left(Z = X_q \,\&\, (2^n - 1)\right), \qquad (12)$$

where $\mathbb{1}(\cdot)$ is the indicator function.

Let $k_T = M/2^n$ ($k_T$ is an integer since $M$ is a power of 2); then the sum can be written as

$$
\begin{aligned}
p(s = 2^n + Z) = {} & \sum_{k=0}^{k_T - 1} \sum_{E = 2^{n-2}}^{2^{n-1} - 1} p_r(k 2^n + Z)\, p_e(E) \left[ \mathbb{1}(k 2^n + Z \ge E) + 1 \right] \\
& + \sum_{k=k_T}^{2 k_T - 1} \sum_{E = 2^{n-2}}^{2^{n-1} - 1} p_r(k 2^n + Z)\, p_e(E) \left[ \mathbb{1}(k 2^n + Z < 2M - E) + 1 \right] \qquad (13) \\
\approx {} & \sum_{k=0}^{k_T - 1} \sum_{E = 2^{n-2}}^{2^{n-1} - 1} 2\, p_r(k 2^n + Z)\, p_e(E) + \sum_{k=k_T}^{2 k_T - 1} \sum_{E = 2^{n-2}}^{2^{n-1} - 1} 2\, p_r(k 2^n + Z)\, p_e(E) \qquad (14)
\end{aligned}
$$

since typically $\log_2 M \gg n$, thus $p(\mathbb{1}(k 2^n + Z \ge E) = 1) \approx 1$ and $p(\mathbb{1}(k 2^n + Z < 2M - E) = 1) \approx 1$. This can then be rewritten as

$$
\begin{aligned}
p(s = 2^n + Z) &\approx \frac{1 - p_r}{1 + p_r}\, \frac{1 - p_e}{1 + p_e} \left\{ \sum_{k=0}^{k_T - 1} p_r^{k_T 2^n - k 2^n - Z} \sum_{E = 2^{n-2}}^{2^{n-1} - 1} 2 p_e^{E} + \sum_{k=k_T}^{2 k_T - 1} p_r^{k 2^n - k_T 2^n + Z} \sum_{E = 2^{n-2}}^{2^{n-1} - 1} 2 p_e^{E} \right\} \\
&= \frac{1 - p_r}{1 + p_r}\, \frac{2\, p_e^{2^{n-2}} \left(1 - p_e^{2^{n-2}}\right)}{1 + p_e}\, \frac{1 - p_r^{M}}{1 - p_r^{2^n}} \left( p_r^{2^n - Z} + p_r^{Z} \right) \\
&= K_S\, p_e^{2^{n-2}} \left(1 - p_e^{2^{n-2}}\right) \frac{\cosh\!\left((2^{n-1} - Z) \log(p_r)\right)}{\sinh\!\left(2^{n-1} \log(1/p_r)\right)}, \qquad (15)
\end{aligned}
$$

where $K_S$ is a normalizing constant that collects the factors independent of $n$ and $Z$.

REFERENCES

1. D. Slepian and J. K. Wolf, Noiseless coding of correlated information sources, IEEE Trans. on Information Theory 19, Jul.
2. A. D. Wyner and J. Ziv, The rate distortion function for source coding with side information at the decoder, IEEE Trans. on Information Theory 22, pp. 1-10, Jan.
3. P. Ishwar, V. M. Prabhakaran, and K. Ramchandran, Towards a Theory for Video Coding Using Distributed Compression Principles, in Proc. of the International Conference on Image Processing (ICIP).
4. A. Majumdar, R. Puri, P. Ishwar, and K. Ramchandran, Complexity/performance trade-offs for robust distributed video coding, in Proc. International Conference on Image Processing (ICIP).
5. Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, Joint final committee draft (JFCD) of joint video specification (ITU-T Rec. H.264 ISO/IEC AVC), in Joint Video Team, 4th Meeting, (Klagenfurt, Germany), Jul.
6. V. P. J. Wang and K. Ramchandran, Syndrome-based robust video transmission over networks with bursty losses, in Proc. of the International Conference on Image Coding, (Atlanta, GA, USA), Oct.
7. R. Puri and K. Ramchandran, PRISM: A New Robust Video Coding Architecture Based on Distributed Compression Principles, in Proc. of the Allerton Conference on Communication, Control and Computing, pp. 2 8, (Allerton, IL, USA), Oct.
8. B. Girod, A. Aaron, S. Rane, and D. Rebollo-Monedero, Distributed video coding, Proc. of the IEEE, Special Issue on Video Coding and Delivery, Jan. Invited Paper.
9. A. Sehgal, A. Jagmohan, and N. Ahuja, Scalable video coding using Wyner-Ziv codes, Proc. of the Picture Coding Symposium 2004, (San Francisco, CA, USA), Dec.
10. Q. Xu and Z. Xiong, Layered Wyner-Ziv video coding, in Proc. of VCIP 04, Jan.
11. ISO/IEC JTC1, Coding of Audio-Visual Objects - Part 2: Visual. ISO/IEC (MPEG-4 Visual version 1), Apr. 1999; Amendment 1 (version 2), Feb. 2000; Amendment 4 (streaming profile), Jan. 2001.
12. ITU-T, Video Coding for Low Bitrate Communications, Version 1. ITU-T Recommendation H.263.
13. A. Hallapuro, M. Karczewicz, and H. Malvar, Low Complexity Transform and Quantization-Part I: Basic Implementation, in Joint Video Team, 2nd Meeting, (Geneva, CH), Jan. 29-Feb.
14. A. Majumdar, J. Chou, and K. Ramchandran, Robust Distributed Video Compression based on Multilevel Coset Codes, in Proc. of the Asilomar Conference on Signals, Systems, and Computers, Nov.
15. D. Marpe, H. Schwarz, and T. Wiegand, Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard, IEEE Trans. on CSVT 13, July.
16. G. Calvagno, C. Ghirardi, G. Mian, and R. Rinaldo, Modeling of subband data for buffer control, IEEE Trans. on CSVT 7, pp. 2 8, Apr.
17. E. Y. Lam and J. W. Goodman, A mathematical analysis of the DCT coefficient distributions for images, IEEE Image Processing Mag. 9, Oct.
18. N. Kamaci, Y. Altunbasak, and R. M. Mersereau, Frame bit allocation for the H.264/AVC video coder via a Cauchy-density-based rate and distortion models, IEEE Trans. on CSVT 15, Aug.
19. S. Milani, L. Celetto, and G. Mian, A rate control algorithm for the H.264 encoder, in Proc. of the Sixth Baiona Workshop on Signal Processing in Communications, (Baiona, Spain), Sept. 8-10.
20. Z. He and S. K. Mitra, Optimum bit allocation and accurate rate control for video coding via ρ-domain source modeling, 12, Oct.
21. A. Klinger and C. R. Dyer, Experiments on picture representation using regular decomposition, CGIP 5.
22. A. Rosenfeld, Quadtrees and pyramids for pattern recognition and image processing, in Proc. of 5th ICIPR, (Miami, FL, USA).
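The approximation of Appendix A can be checked numerically. The sketch below evaluates the double sum of equation (14) directly, using the symmetric geometric model of equation (11), and compares it with the closed form obtained by summing the two geometric series. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the constant is taken to be K_S = 2(1 - p_r)(1 - p_r^M) / ((1 + p_r)(1 + p_e)), n is at least 2, M is a power of 2 not smaller than 2^n, and the parameter values in the demo are arbitrary.

```python
import math


def sym_geometric(x, p):
    """Symmetric geometric pmf: (1 - p) / (1 + p) * p ** |x|."""
    return (1 - p) / (1 + p) * p ** abs(x)


def p_syndrome_direct(Z, n, M, p_r, p_e):
    """Direct evaluation of the double sum in equation (14)."""
    k_T = M // 2 ** n
    # Sum over the residual magnitudes that need exactly n syndrome bits.
    e_mass = sum(sym_geometric(E, p_e)
                 for E in range(2 ** (n - 2), 2 ** (n - 1)))
    # Sum over all coefficients X_q = k * 2^n + Z in [0, 2M), centered at M.
    r_mass = sum(sym_geometric(k * 2 ** n + Z - M, p_r)
                 for k in range(2 * k_T))
    return 2 * r_mass * e_mass


def p_syndrome_closed(Z, n, M, p_r, p_e):
    """Closed form after summing the geometric series (assumed K_S)."""
    a = 2 ** (n - 2)
    half = 2 ** (n - 1)
    k_s = 2 * (1 - p_r) * (1 - p_r ** M) / ((1 + p_r) * (1 + p_e))
    log_inv = -math.log(p_r)  # log(1 / p_r), positive for 0 < p_r < 1
    return (k_s * p_e ** a * (1 - p_e ** a)
            * math.cosh((half - Z) * log_inv) / math.sinh(half * log_inv))


if __name__ == "__main__":
    for Z in range(8):
        print(Z,
              p_syndrome_direct(Z, 3, 256, 0.95, 0.6),
              p_syndrome_closed(Z, 3, 256, 0.95, 0.6))
```

Because the closed form is an exact resummation of equation (14), the two columns should agree up to floating-point error; any gap with respect to equation (13) comes only from dropping the indicator terms.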
