Optimizing Motion Vector Accuracy in Block-Based Video Coding

Revised and resubmitted to IEEE Trans. Circuits and Systems for Video Technology, 2/00

Jordi Ribas-Corbera
Digital Media Division, Microsoft Corporation, One Microsoft Way, Redmond, WA 98052, USA

David L. Neuhoff
EECS Dept., University of Michigan, 1201 Beal Ave., Ann Arbor, MI 48109, USA

Abstract

In classical block-based video coding, one motion vector per image block is used to improve the prediction of the frame to be coded. These motion vectors and the resulting motion-compensated difference frame must both be encoded into bits. All motion vectors are encoded with the same fixed accuracy, typically 1 or 1/2 pixel, but the best motion vector accuracies are not known. In this paper, we present a theoretical framework to find the motion vector accuracies that minimize the total encoding rate with this type of coder, for the classical case where all motion vectors are encoded with the same accuracy and for new cases where the accuracy is adapted on a frame-by-frame or block-by-block basis. To do this, we analytically model the effect of motion vector accuracy and show that the energy in a block of the difference frame is approximately quadratic in the accuracy of the block's motion vector. This energy-accuracy model is then used to obtain expressions for the total bit rate (motion rate plus difference frame rate) in terms of the blocks' motion accuracies and other key parameters. Minimizing these expressions leads to simple formulas that indicate how to choose the best motion vector accuracies for this type of coder. These formulas also show that the motion accuracy must increase where more texture is present and decrease when there is much scene noise or when the level of compression is high. We implement several entropy and MPEG-like video coders based on our analysis and present experimental results on synthetic and real video sequences. These results suggest that our formulas are accurate and that significant bit rate savings can be achieved when our optimization procedures are used.

Keywords: video coding, motion estimation, motion compensation, motion vector accuracy, bit allocation, difference frame energy, rate modeling.

1 The first author was formerly with the EECS Dept. of the University of Michigan. This work was supported in part by NSF Grant NCR

I. Introduction

Block-based, motion-compensated video coders are widely used because of their good performance and reasonable complexity. For example, they are the basis of the H.263 and MPEG standards [1-3]. As illustrated in Figure 1, in a typical coder of this class, the current frame to be encoded is divided into small blocks of the same size (typically, 8×8 or 16×16 pixels per block), and for each block a motion vector is found that points to a position in the previous frame (actually, in the decoded reproduction thereof) where a good prediction of the block can be found. Aggregating the predictions of all blocks yields a prediction of the current frame. Subtracting this from the current frame yields a prediction error or difference frame that is encoded into bits, typically with a DCT-based technique. In addition, the motion vectors must themselves be encoded. The payoff for this investment in motion compensation is a savings in the number of bits required to encode the difference frame that substantially exceeds the number of bits required to encode the motion vectors, thus significantly reducing the video encoding rate.

To elaborate further, the encoding rate R (in bits per pixel) of such a video coder is the sum of the encoding rate R_D for the difference frame plus the encoding rate R_M for the motion vectors. It is intuitively clear that, with the quality of the video frame reproduction held roughly constant, increasing R_M will usually improve the motion compensation, which in turn will decrease the energy of the difference frame and hence decrease R_D toward some nonzero limit. Clearly, there must be an optimal value of R_M, i.e., a value that minimizes R = R_D + R_M. The motion rate R_M is principally determined by the number of motion vectors, which is inversely proportional to the size of the blocks, and by the accuracy with which motion vectors are represented.
To explain the latter, we note that though in some coders motion vectors are constrained to point only to pixels in the previous frame, i.e. to be integer valued, it is also possible to have noninteger valued motion vectors [1-10]. These point to blocks in an interpolation of the previous frame. For example, motion compensation with Δ pixel accuracy means that the previous frame has been interpolated by the factor 1/Δ (both horizontally and vertically) and that the components of the motion vectors are multiples of Δ. In H.263, MPEG-1 and MPEG-2 [1,2], the previous frame is interpolated by a factor of 2 (for motion compensation purposes), and so Δ = 1/2 pixel, i.e., Δ is half the distance between two adjacent pixels, and the motion vectors are said to have 1/2 pixel (or subpixel) accuracy. Clearly, the number of bits required to describe the motion vectors (i.e., R_M) increases with higher motion vector accuracy (i.e., smaller Δ).

In this paper we analyze the effect of motion vector accuracy on the overall rate of block-based, motion-compensated video coding. Our goal is to find the best possible accuracy and to explore the benefits of adapting the accuracy on a per frame or per block basis. A companion paper [11] uses a key result developed here and similar optimization methods to analyze the effect of block size. To be a bit more concrete, we note that both motion and difference frame rates may be viewed as functions of the motion accuracy: R_M(Δ) and R_D(Δ). As illustrated in Figure 2, the former increases and the latter decreases with smaller Δ. We seek to find the accuracy that minimizes their sum. To accomplish this we need approximations to R_M(Δ) and R_D(Δ) that, in addition to being reasonably accurate, are sufficiently tractable that one may minimize their sum by differentiation. Indeed, we wish to be able to perform this minimization for each frame, so that one might adapt the motion accuracy on a per frame basis. Moreover, to analyze the potential benefits of adapting the accuracy on a per block basis, we also seek expressions for the motion and difference frame rates as functions of the individual motion accuracy for each block. Such expressions must also be sufficiently accurate and tractable.

To approximate the dependence of the motion rate R_M on the accuracy, we take the viewpoint that an "ideal" motion vector is quantized with a uniform scalar quantizer with level spacing Δ, as in [6,10]. Since the number of bits per motion vector component is not ordinarily small, it is straightforward to find simple approximate expressions for the motion rate function R_M(Δ). We consider cases where the quantized motion vectors are coded with and without entropy coding and prediction from previous motion vectors. Moreover, for the situation where the motion vectors and their x (horizontal) and y (vertical) components can have different accuracies, we find an expression R_M(Δ) for the motion rate in terms of Δ = ((Δ_x,1, Δ_y,1), ..., (Δ_x,N, Δ_y,N)), the individual motion vector accuracies of the N blocks of the frame.
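Under this viewpoint, choosing an accuracy Δ amounts to rounding each component of an ideal motion vector to the nearest multiple of Δ. A minimal sketch of this quantization (the function name is ours):

```python
import numpy as np

def quantize_motion_vector(v_ideal, delta):
    """Round each component of an 'ideal' motion vector to the nearest
    multiple of the accuracy delta (in pixels)."""
    return delta * np.round(np.asarray(v_ideal, dtype=float) / delta)

# an ideal vector quantized at full-, half-, and quarter-pixel accuracy
v = (3.37, -1.62)
for delta in (1.0, 0.5, 0.25):
    print(delta, quantize_motion_vector(v, delta))
```

Smaller Δ yields a finer grid of representable vectors and hence more bits per vector, which is exactly the rate-accuracy tradeoff analyzed below.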
To approximate the difference frame rate R_D, we assume as in previous studies [4,5,7] that it is a function ~R_D(S_D) of the difference frame energy S_D, that the difference frame coder is simply a uniform scalar quantizer with entropy coding, and that the difference frame has a Laplacian (two-sided exponential) distribution. In this case, an expression for ~R_D(S_D) can be straightforwardly derived. (It will be shown later that the results also apply fairly well to DCT-type difference frame coders.) A key step in our work is the development in Section 2 of a simple quadratic expression for the difference frame energy as a function of the motion vector accuracies. Indeed, we obtain an expression for the difference frame energy for each block, based on the accuracies Δ_x and Δ_y of the x and y components of its motion vector. Summing these yields a formula for S_D(Δ), the total difference frame energy, which clearly shows the effects of characteristics such as the texture of the frame and the energy of the interframe noise. The latter is introduced in the difference frame by the block-based motion compensation as a result of illumination changes, camera noise, coding distortion in the previous frame, occlusions, nontranslational motion, and other related phenomena. We then obtain in Section 3 a fairly simple expression of the form

R(Δ) = R_M(Δ) + ~R_D(S_D(Δ)).  (1)

In Section 4, we use the above to find closed form expressions for the optimal motion vector accuracies in four cases: 1) the components of all motion vectors of all frames have the same accuracy, as in most previous methods [1-5,7-10], i.e., in all frames Δ = Δ_x,1 = Δ_y,1 = ... = Δ_x,N = Δ_y,N; 2) the motion accuracies are adapted to each frame, but are constant throughout a frame, i.e., in the j-th frame Δ_j = Δ_x,1 = Δ_y,1 = ... = Δ_x,N = Δ_y,N; 3) the motion accuracies are adapted to each block, with Δ_x,i = Δ_y,i for each block; and 4) the accuracy of each component of each motion vector is individually adapted to the block. Using these expressions, we separately explore the cases of lossless and lossy difference frame coding. In the lossless case, where the difference frame quantizer has level spacing 1, the motion vector coder uses fixed-rate nonpredictive coding of the motion vector components. Although this lossless codec has limited practical significance, it allows us to verify our analysis in a very simple setting and to explore the performance of optimal motion accuracies when there is no coding distortion. In the lossy case, where the difference frame rate is much lower, the motion vector coder uses predictive and entropy coding to increase efficiency. Though the expressions for the optimal motion vector accuracies are in closed form, they involve certain parameters that must usually be estimated from the frames of the video, namely the coefficients of the quadratic expression for the difference frame energy S_D(Δ). Several methods for doing the estimation are described in Section 2, some more suited to the lossless case and some to the lossy case. Section 5 of this paper presents the results of experiments that use the expressions mentioned above to predict the overall rate and to adapt the motion vector accuracies when coding real video sequences.
The results indicate that the expressions, though tractable, are nevertheless fairly accurate, and that the adaptation of the motion vector accuracies to individual frames results in significant rate savings; for example, up to 0.4 bits/pixel in the lossless case and up to 35% in the lossy case over the typical Δ = 1/2 pixel accuracy choice of most video coding schemes. For the tested video sequences, adapting motion accuracy on a per block basis was not found to yield significant rate savings, though it is possible that it would for certain sequences. Nevertheless, in related work [12] we found that using block-adaptive motion accuracy can provide a significant reduction of the computational complexity of motion estimation. Section 6 presents concluding remarks.

It is our belief that an important contribution of the present work is that the analytical expressions quantify phenomena that have been qualitatively observed and intuitively understood in other studies. For example, using idealized models, Girod [4] observed qualitatively that the motion vector accuracy should increase with lower interframe noise. Recently, Benzler [9] showed with extensive empirical experiments that using motion vectors with Δ = 1/4 can often result in significant coding improvements over the typical Δ = 1/2, particularly in highly textured video sequences and at higher bit rates, when coding noise (which is a kind of interframe noise) is low. Our formulas for the optimal motion accuracies show quantitatively how the accuracies increase (i.e., the Δ's become smaller) when there is less interframe noise and also when more texture is present.

There has been previous work that has formulated equations like (1) in order to optimize motion vector accuracies. For instance, Buschmann [10] derived a formula for R_M in terms of motion accuracy (and other parameters), but assumed ideal coding of motion vectors (i.e., used optimal rate-distortion functions) and did not consider the effect of motion accuracy on R_D. In [5], the difference frame energy S_D was measured empirically at each step of a top-down, quadtree-based technique that attempted to find the block sizes and accuracies for the motion vectors that minimized an expression similar to (1). In regard to the motion accuracy aspect, since the work in [5] lacked an analytical expression for S_D, no formulas for the optimal motion accuracies were derived. In fact, the motion accuracies were simply heuristically increased while growing the quadtree and hence were not globally optimized. In a related work [6], the difference frame energy S_D was also measured empirically for several motion vector accuracies, but for fixed block-size motion compensation. At a given block, this method selected the motion accuracy that produced the largest decrease of difference energy per motion bit. As in [5], the motion accuracies were not globally optimized and no analytical expressions for S_D or for the optimal motion accuracies were derived. Other previous work found and made use of an analytical expression for the difference frame energy S_D. Specifically, the seminal work of Girod [4] developed an analytical expression for S_D as a function of the probability distribution of the errors in the motion vectors (which is essentially determined by the motion accuracy), the Fourier transform of the frame, and the power spectral density of the interframe noise. Girod's work was extended to interlaced video frames in [7] and to multihypothesis prediction in [13].
However, Girod's expression for S_D is not sufficiently tractable to permit analytical optimization of motion vector accuracy, let alone adaptation on a block-by-block basis. In fact, his work focused only on studying the effect of motion accuracy (and other parameters) on the difference frame energy S_D, and did not explore the effect on the difference frame rate R_D or on the motion rate R_M. Hence, no expressions were found for the (adaptive or nonadaptive) optimal motion accuracies. Nevertheless, Girod gained interesting insights by modeling the spectrum of image data with an isotropic Gaussian distribution and plotting S_D for different values of Δ. Using these plots and plots of the empirical S_D(Δ) on a few video frames, he reached the important conclusion that for many video sequences the best (nonadaptive) motion vector accuracy is between Δ = 1/2 and Δ = 1/4. However, it is evident that the optimal motion accuracy needs to depend on the nature of the video sequence and the distortions that are present. In fact, in this paper we show cases where the best motion accuracies are outside Girod's interval. Even when the best accuracy is in that interval, if the optimal Δ is close to 1/2, using Δ = 1/4 not only would increase the bit rate, but would also significantly increase computational complexity if the typical block matching technique were used for motion estimation. The latter occurred during the MPEG-4 experiments [9], where it was observed that using Δ = 1/2 often worked better than Δ = 1/4 when coding low-textured scenes at low bit rate (i.e., high distortion). It is largely to predict the best nonadaptive value of Δ (given distortion, image texture, and other
characteristics of the scene) and to adapt motion accuracy on a frame-by-frame or block-by-block basis that our more tractable analysis is developed. Other work related to the analysis of the difference frame energy may be found in [8,14,15].

2. Estimating the Energy in Blocks of the Difference Frame

This section develops approximate expressions for the energy of the difference between a block of the current frame and its prediction from the previous frame, as functions of its motion vector and the motion vector accuracy. Summing over all blocks in the current frame yields an approximate expression for S_D(Δ), the difference frame energy as a function of the motion vector accuracies.

Let F[n] and F⁻[n] denote the present frame and the decoded reproduction of the previous frame, respectively, where n = (n_x, n_y) ∈ Z² denotes a pixel location with integer-valued horizontal and vertical positions n_x and n_y, and F[0,0] is below and to the left of F[1,1]. That is, we use sampled rather than matrix indexing of the pixel locations. Let F(x) and F⁻(x) denote continuous-space interpolated versions of F[n] and F⁻[n], respectively, where x = (x,y) ∈ R². We wish to consider the prediction of a block of F[n] by a block of F⁻(x) as pointed to by a motion vector. For convenience we assume that the block of F[n] to be considered covers a B×B square² and place the origin of the (x,y) coordinate axes at its lower left corner. That is, the current block occupies the square of pixels B = {0,1,...,B-1} × {0,1,...,B-1} and, for instance, blocks to the left and below will be at negative (x,y) locations. Given a motion vector v = (v_x, v_y) ∈ R², the prediction for block B in F[n] is

^F[n] = F⁻(n + v), n ∈ B,  (2)

and the energy of the resulting prediction error is

S(v) = Σ_{n∈B} (F[n] - ^F[n])² = Σ_{n∈B} (F[n] - F⁻(n+v))².  (3)

In the analysis to follow, we approximate the above by an integral:

S(v) ≈ ∫_{B_c} (F(x) - ^F(x))² dx = ∫_{B_c} (F(x) - F⁻(x+v))² dx,  (4)

where B_c = [0,B] × [0,B] denotes the block corresponding to B in continuous space and ^F(x) denotes the continuous-space version of ^F[n].

2 This analysis could be easily generalized to rectangular blocks of B_x × B_y pixels.

2.1 Ideal Motion Vector, Ideal Prediction, and Interframe Noise

Let v* ∈ R² denote the motion vector that minimizes the prediction error energy S(v) in (3). We consider v* and its associated block ^F*(x) = F⁻(x+v*), x ∈ B_c, to be the ideal

motion vector and the ideal prediction, respectively, for the current block. An example (for the one-dimensional case) of the current block and its ideal motion vector and prediction is illustrated in Figure 3. In practice, the current block and its ideal prediction are similar (in both discrete and continuous space), since they typically correspond to the same physical image element, moved v* units from the previous frame. Accordingly, we model the ideal prediction as

^F*(x) = F⁻(x+v*) = F(x) + N(x), x ∈ B_c,  (5)

where N(x) is interpreted as interframe noise produced by light changes, camera noise, coding distortion (in the previous frame), nontranslational motion, occlusions, etc. Without such interframe noise, the current block and its ideal prediction would be identical.

2.2 Effect of Motion Vector Errors

In practice, motion estimation can only produce motion vectors with limited accuracy. Hence, we assume there is some motion error vector u = (u_x, u_y) = v - v*, and we model the prediction of the current block as a shifted version of the noiseless ideal prediction plus the interframe noise, i.e., as

^F(x) = F(x+u) + N(x), x ∈ B_c.  (6)

The motion error and resulting prediction are illustrated in Figure 3. We seek an approximate expression for the prediction error energy as a function of the motion vector error u. That is, we seek the form of

~S(u) = S(v*+u) = ∫_{B_c} (F(x) - F(x+u) - N(x))² dx.  (7)

We decompose the above into

~S(u) = ~S_o(u) + ~S_n(u),  (8)

where

~S_o(u) = ∫_{B_c} (F(x) - F(x+u))² dx,  (9)

~S_n(u) = -2 ∫_{B_c} (F(x) - F(x+u)) N(x) dx + ∫_{B_c} N(x)² dx.  (10)

We approximate the term ~S_n(u) as a constant³ C_n; i.e., it has no significant dependence on u. Since the noise N(x) and the difference F(x) - F(x+u) are at most very weakly correlated, it is anticipated that the second integral term in (10) will dominate. For this reason we consider C_n to be the energy or amount of interframe noise for the current block.

3 The first term in ~S_n(u) is approximately zero; the second is approximately the variance of the noise.

We now focus on the noise-free difference energy ~S_o(u). Our principal result is that ~S_o(u) is, approximately, quadratic in u. One way to demonstrate this is simply by expanding ~S_o(u) in a Taylor series about the point u = (0,0). In doing so, one finds that for small u

~S_o(u) ≈ a u_x² + b u_y² + c u_x u_y,  (11)

where

a = ∫_0^B ∫_0^B (∂F(x,y)/∂x)² dx dy,  b = ∫_0^B ∫_0^B (∂F(x,y)/∂y)² dx dy,  (12)

c = 2 ∫_0^B ∫_0^B (∂F(x,y)/∂x)(∂F(x,y)/∂y) dx dy.  (13)

These coefficients may in turn be approximated as

a ≈ Σ_{n_x=1}^{B-1} Σ_{n_y=0}^{B-1} (F[n_x,n_y] - F[n_x-1,n_y])²,  b ≈ Σ_{n_x=0}^{B-1} Σ_{n_y=1}^{B-1} (F[n_x,n_y] - F[n_x,n_y-1])²,  (14)

c ≈ 2 Σ_{n_x=1}^{B-1} Σ_{n_y=1}^{B-1} (F[n_x,n_y] - F[n_x-1,n_y])(F[n_x,n_y] - F[n_x,n_y-1]).  (15)

From the above, we see that the coefficients a and b are essentially measures of the texture of the current frame along x and along y, respectively, while c is related to the correlation of the texture along x and y. Though (11) shows the basic form of ~S_o(u), it does not indicate how small u needs to be in order for the approximation to be accurate. This could be studied by analyzing the error term in the Taylor series expansion, but such an approach appears to be somewhat complex. Instead, we undertake a direct derivation of ~S_o(u) based on a Fourier series representation of F(x). As additional benefits, we will see how a, b and c depend on the frequency components of F(x), and we will see that c is usually small enough that it can be ignored. In the vicinity of the block B_c = [0,B] × [0,B], we approximate F(x) as periodic with period B, both horizontally and vertically.
Accordingly, it has the Fourier series representation

F(x) = K̄ + 2 Σ_{n∈L} K_n cos(ω_o (x,n) + θ_n),  (16)

where K̄ is the mean of F(x) over B_c, n = (n_x, n_y), L = {n : n_x = 0 and 0 < n_y < ∞, or n_x > 0 and -∞ < n_y < ∞}, ω_o = 2π/B, (x,n) = x n_x + y n_y,

K_n e^{j θ_n} = (1/B²) ∫_{B_c} F(x) e^{-j ω_o (x,n)} dx,  (17)

and K_n ≥ 0. Substituting (16) into (9), which is the definition of ~S_o(u), simplifying, and neglecting small terms yields

~S_o(u) ≈ 4B² Σ_{n∈L} K_n² (1 - cos ω_o (u,n)).  (18)

Using the approximation cos β ≈ 1 - β²/2 for |β| ≤ π/2 in the above and simplifying, we find that when, as often happens⁴, K_n ≈ 0 for all n such that |n_x|/B + |n_y|/B > 1/2,

~S_o(u) ≈ 8π² Σ_{n∈L} K_n² (n_x² u_x² + 2 n_x n_y u_x u_y + n_y² u_y²), |u_x| ≤ 1/2, |u_y| ≤ 1/2.  (19)

From the above we see how the coefficients of the quadratic approximation depend on the components of F(x):

a ≈ 8π² Σ_{n∈L} K_n² n_x²,  b ≈ 8π² Σ_{n∈L} K_n² n_y²,  c ≈ 16π² Σ_{n∈L} K_n² n_x n_y.  (20)

Finally, we note that the coefficient c will ordinarily be small enough to ignore, because for typical image blocks [16, p. 39], the larger values of K_n are along the x and y frequency axes (where the product n_x n_y is zero) or close to the origin (where n_x n_y is small). In summary, we obtain the approximations

~S_o(u) ≈ a u_x² + b u_y²,  (21)

~S(u) ≈ a u_x² + b u_y² + C_n,  (22)

when |u_x| ≤ 1/2 and |u_y| ≤ 1/2. Though one can also derive (20) by substituting the Fourier series representation (16) directly into (12) and (13), in this case one will not learn that the approximation is accurate when the magnitudes of u_x and u_y are at most 1/2, which happens when the motion vector accuracy is less than 1 (pixel or subpixel accuracy), the usual case. Additional derivations of (21) are given in [17,18]. The quadratic dependence of the difference frame energy on motion vector errors was observed in the empirical experiments reported in [19].

4 This means that the amplitudes of the components of F(x) at block frequencies above 1/2 cycles/pixel are small.

2.3 Effect of Motion Vector Accuracies

As mentioned in the introduction, we model the effect of motion accuracy as a uniform quantization of ideal motion vectors. Viewing the latter as random, we compute the average energy of the difference between a block and its prediction as

E[~S(U)] ≈ a E[u_x²] + b E[u_y²] + E[C_n] ≈ α Δ_x² + β Δ_y² + γ,  (23)

where E[·] is the expectation operator,

α = a/12,  β = b/12,  γ = E[C_n],  (24)

and where α and β were obtained assuming that the components of the motion error u = (u_x, u_y) are, approximately, uniformly distributed over quantization cells of width Δ_x and Δ_y, respectively. It is interesting to note that the relationship in (23) will hold even when the value of c in (11) is not negligible, because the realizations of the motion errors u_x and u_y will normally be uncorrelated and have zero mean, i.e., E[c u_x u_y] = c E[u_x] E[u_y] = 0. Summing (23) over all blocks of the current frame yields the following expression for the difference frame energy in terms of the motion vector accuracies specified for its blocks:

S_D(Δ) ≈ (1/(B² N)) Σ_{i=1}^{N} (α_i Δ_x,i² + β_i Δ_y,i² + γ_i),  (25)

where α_i, β_i, γ_i are the quadratic coefficients for the i-th block. This is a key result of the paper. Basically, it presumes a kind of linear relationship between the average size of the motion vector error u and the motion vector accuracy Δ.

2.4 Estimating the Quadratic Coefficients

In order to use (25) to optimize the choice of the Δ's, we need estimates of the quadratic coefficients α, β and γ for each block. Here we mention several methods that can be useful in different situations. We assume that the current and previous frames, F[n] and F⁻[n], and a block with coordinates B are given.

A: Estimate α = a/12 and β = b/12 from the formulas for a and b in (12) or (14). (Notice that (12) requires finding the interpolated image F(x), at least approximately.)

B: Find the Fourier series representation (16) of the interpolation F(x) and estimate α and β from the formulas for a and b in (20).

Methods A and B estimate the α and β coefficients using only the present frame. As a result, they are not influenced by the actual motion or interframe noise in the present frame relative to the previous, as shown for example in (5) or (6).
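The finite-difference sums (14)-(15) used by Method A are simple to compute directly from the pixels of a block; a sketch (the function name and the array layout, rows indexed by n_y and columns by n_x, are our assumptions):

```python
import numpy as np

def texture_coeffs(block):
    """Estimate a, b, c of (11) via the finite-difference sums (14)-(15).
    block: B x B array with rows indexed by n_y and columns by n_x.
    Method A then takes alpha = a/12 and beta = b/12, per (24)."""
    f = np.asarray(block, dtype=float)
    dx = f[:, 1:] - f[:, :-1]   # F[nx,ny] - F[nx-1,ny], horizontal texture
    dy = f[1:, :] - f[:-1, :]   # F[nx,ny] - F[nx,ny-1], vertical texture
    a = float(np.sum(dx ** 2))
    b = float(np.sum(dy ** 2))
    c = 2.0 * float(np.sum(dx[1:, :] * dy[:, 1:]))  # products aligned at nx,ny >= 1
    return a, b, c

# a ramp along x has horizontal texture only: a = B*(B-1), b = c = 0
B = 8
ramp = np.tile(np.arange(B, dtype=float), (B, 1))
print(texture_coeffs(ramp))  # (56.0, 0.0, 0.0)
```

As the ramp example suggests, a and b isolate the texture along each axis, consistent with the interpretation following (15).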
However, since we are actually trying to model the energy in the difference between the present frame and a motion-compensated prediction based on the previous frame, we can do better, in some situations, by using curve fitting procedures based on both frames. While it is possible to use such a method to estimate a, b, c, and then compute α, β, γ from them, the following two methods directly estimate α, β, γ in (23). They exploit the fact that the resulting quadratic expression will be used to determine the best motion accuracies from a finite set of candidates. Let Γ = {δ_1, δ_2, ..., δ_J} denote the set of candidate values for the component accuracies Δ_x and Δ_y. Without loss of generality, assume δ_1 < δ_2 < ... < δ_J. We also assume δ_1 << 1.

C: Measure S(v), as defined by (3), for all v in some square grid of points {v_1, v_2, ..., v_M} in the neighborhood of the block, with the horizontal and vertical resolution of the grid equal to δ_1. Let v* be the grid point v_i for which S(v_i) is smallest. For every candidate pair of accuracies Δ = (Δ_x, Δ_y) with components in Γ, quantize v*_x and v*_y with uniform scalar quantizers with level spacings Δ_x and Δ_y, respectively, obtaining the motion vector ^v_Δ and motion vector error u_Δ = ^v_Δ - v*. Measure ~S(u_Δ) = S(^v_Δ). Find the least squares fit of a polynomial of the form α Δ_x² + β Δ_y² + γ to the set of points (Δ, ~S(u_Δ)). Alternatively, one can let γ = S(v*), without much difference.

D: Let the candidate set of accuracies be Γ = {δ_1, 2δ_1, 4δ_1, ..., 2^{J-1} δ_1}, where as before δ_1 << 1. Find v* = (v*_x, v*_y) as in Method C. Then for every candidate pair of accuracies Δ = (Δ_x, Δ_y) with components in Γ, measure S(v* + (i δ_1, j δ_1)) for all integers i and j such that -Δ_x/2 ≤ i δ_1 ≤ Δ_x/2 and -Δ_y/2 ≤ j δ_1 ≤ Δ_y/2. Let ^S(Δ) be the average of the S(v* + (i δ_1, j δ_1))'s. Find the least squares fit of a polynomial of the form α Δ_x² + β Δ_y² + γ to the set of points (Δ, ^S(Δ)). Alternatively, one can let γ = (^S(δ_1,δ_1) + ^S(2δ_1,δ_1) + ^S(δ_1,2δ_1) + ^S(2δ_1,2δ_1))/4, without much difference.

We found Methods C and D to be very accurate, and for this reason we have used them in this study, even though they are somewhat computationally complex. In practice, they are more appropriate for off-line or long-delay coding (e.g., on-demand video streaming, VCD). However, if the motion vector accuracies are restricted to be the same for the x and y components (e.g., see Modes 1, 2 and 3 later in Section 4), the computation is still reasonable for real-time coding, since only a few difference frame energy measurements per block are required. (For real-time coding, a low complexity approach should be used to find the ideal motion vectors, as in [6,9].)
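The least squares fit in the final step of Methods C and D is an ordinary linear regression on the features (Δ_x², Δ_y², 1); a sketch (names and the synthetic test data are ours):

```python
import numpy as np

def fit_quadratic_energy(deltas, energies):
    """Least-squares fit of alpha*dx^2 + beta*dy^2 + gamma to measured
    difference energies, as in the fitting step of Methods C and D.
    deltas: sequence of (dx, dy) accuracy pairs; energies: measured ~S values."""
    d = np.asarray(deltas, dtype=float)
    A = np.column_stack([d[:, 0] ** 2, d[:, 1] ** 2, np.ones(len(d))])
    coef, *_ = np.linalg.lstsq(A, np.asarray(energies, dtype=float), rcond=None)
    return coef  # (alpha, beta, gamma)

# recover known coefficients from synthetic, noise-free measurements
grid = [(dx, dy) for dx in (1.0, 0.5, 0.25) for dy in (1.0, 0.5, 0.25)]
S = [30.0 * dx ** 2 + 12.0 * dy ** 2 + 5.0 for dx, dy in grid]
print(fit_quadratic_energy(grid, S))  # close to [30. 12. 5.]
```

With real measurements the fit absorbs measurement noise; the recovered γ plays the role of the interframe noise energy E[C_n] in (24).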
We do not present results with Methods A and B in this paper, but since they can be found in related work [11,12,18], we discuss them here briefly. Method A is well suited for very low complexity codecs, and it is still quite effective, as was shown in [11,12]. Method B is appropriate if the Fourier coefficients are available for each block. Some comparisons of the performance of these estimation methods were presented in [18, p. 51].

3. Modeling Rate

This section finds expressions for the motion and difference frame rates in terms of the motion vector accuracies and other key parameters. Ordinarily, the motion vectors are found by a simple motion estimation procedure, such as block matching, that computes the motion vectors on a grid with the chosen accuracy. However, as mentioned before, in this study it is assumed that the ideal motion vectors are computed first and then quantized to the desired accuracy.
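Because the ideal vectors are quantized to the desired accuracy, the error statistics assumed in (24), namely E[u_x²] = Δ_x²/12 for a quantization cell of width Δ_x, can be checked with a quick simulation (a sketch of ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 0.25
v = rng.uniform(-8.0, 8.0, size=1_000_000)   # "ideal" vector components
u = delta * np.round(v / delta) - v          # quantization errors
print(np.mean(u ** 2), delta ** 2 / 12)      # both approx 0.00521
```

The empirical mean squared error matches Δ²/12 closely, supporting the uniform-error assumption behind α = a/12 and β = b/12.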

3.1 Motion Vector Coding and Rate

We consider two kinds of motion vector coding: uniform scalar quantization with fixed-rate coding, and DPCM with entropy coding. In the first case, if the motion vector accuracies for the i-th block of the current frame are Δ_x,i and Δ_y,i, then the x and y components of the ideal motion vector for that block are quantized with, respectively, N_x = 2V_m/Δ_x,i and N_y = 2V_m/Δ_y,i uniformly spaced levels, where V_m is the maximum anticipated displacement between two adjacent frames. The outputs of these quantizers are assumed to be encoded with log_2(2V_m/Δ_x,i) and log_2(2V_m/Δ_y,i) bits, respectively. As a result, the overall rate (in bits per pixel) invested in motion vectors is

R_M(Δ) = (1/(B² N)) Σ_{i=1}^{N} [log_2(2V_m/Δ_x,i) + log_2(2V_m/Δ_y,i)].  (26)

If, as we wish to consider, the motion vector accuracies change with every frame or block, then they must also be encoded and sent to the decoder. However, since in this work Δ_x,i and Δ_y,i will take values in a relatively small set, the rate for this is usually negligible in comparison to the motion and difference frame rates.

DPCM is a more popular technique for encoding motion vectors, due to its lower rate [20]. In fact, it has been adopted by a number of current video coding standards [1,2,3]. With DPCM, the quantized motion vector for the i-th block is v_i = v_{i-1} + q_i(v*_i - v_{i-1}), where v*_i is the ideal motion vector, v_{i-1} is the quantized motion vector for the previous block (in scan order), which serves as a prediction of v*_i, and q_i(·) denotes the operation of uniform scalar quantization of the x and y components of the prediction error with level spacings Δ_x,i and Δ_y,i, respectively. Because we consider entropy coding, the numbers of quantization levels are assumed to be large and the quantized prediction errors are assumed to be encoded with variable-length binary codes.
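As a concrete check of (26): with one Δ = 1/2 vector per 16×16 block and V_m = 16 pixels, each vector costs log_2 64 + log_2 64 = 12 bits, i.e. 12/256 bits per pixel. A sketch (the function name is ours):

```python
import numpy as np

def motion_rate_fixed(deltas, V_m, B, N):
    """Motion rate in bits/pixel for fixed-rate coding, per (26).
    deltas: (dx_i, dy_i) for each of the N blocks of B x B pixels."""
    d = np.asarray(deltas, dtype=float)
    bits = np.log2(2 * V_m / d[:, 0]) + np.log2(2 * V_m / d[:, 1])
    return float(np.sum(bits)) / (B * B * N)

print(motion_rate_fixed([(0.5, 0.5)] * 99, V_m=16, B=16, N=99))  # 12/256 = 0.046875
```

Halving every Δ adds exactly 2 bits per vector, which is the log_2 dependence that the optimization in Section 4 balances against the energy reduction in (25).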
Since the probability distribution of the quantized prediction errors depends on the level spacings, which in turn depend on the motion vector accuracies, and since in this study the motion vector accuracies may vary (with frame or block), we assume there is a different variable-length code for every possible accuracy, and that when the accuracy is Δ, the corresponding variable-length code produces on the average approximately H_Δ bits, where H_Δ is the entropy of the quantized prediction error when the level spacing is Δ. When, as usually happens, the level spacing Δ is considerably smaller than the standard deviation of the prediction errors, one may use the well known approximation [21, p. 228] H_Δ ≈ h - log_2 Δ, where h = -∫ p(w) log_2 p(w) dw is the differential entropy of the (unquantized) prediction errors, which are modelled as having probability density p(w). As a result, the motion rate for DPCM with entropy coding is

R_M(Δ) ≈ (1/(B² N)) Σ_{i=1}^{N} (2h - log_2 Δ_x,i Δ_y,i).  (27)

We summarize the two expressions (26) and (27) for motion rate as

    R_M(Δ) ≈ 2H/B² − (1/(B²N)) Σ_{i=1..N} log2(Δx,i Δy,i)    (28)

where H = log2(2V_m) in the case of uniform scalar quantization, and H = h in the case of DPCM with entropy coding.

3.2 Difference Frame Coding and Rate

The difference frame pixels are encoded by a uniform scalar quantizer with level spacing Q, followed by an entropy coder, where the latter is adapted to the individual frame being encoded. Since image pixels take values in the set {0, 1, ..., 255}, the difference frame pixels take values in {−255, ..., 255}. The encoding is lossless when Q = 1, and lossy when Q > 1. Such a coder is adequate for this study because of the low correlation between the pixel values in the difference frame [4,5]. The mean squared error (MSE) distortion in the Q-quantized difference frame can be approximated by the well known expression Q²/12 [22, p. 152], assuming Q is neither too small nor too large. Equivalently, the peak signal-to-noise ratio (PSNR) is approximately 10 log10(12·255²/Q²) ≈ 58.9 − 20 log10 Q dB. For example, Q = 25 and 20 correspond to approximately 31 and 33 dB, respectively. When Q = 1, the lossless coding case, the MSE is 0 rather than 1/12. Note that the PSNR is affected little by the choice of motion vector accuracies.

Letting p̂_Q(d) denote the frequency of the value d in the Q-quantized difference frame, the rate (in bits/pixel) produced when encoding a particular difference frame is given, approximately, by the entropy H(p̂_Q). Furthermore, as in [5,11], we assume that p̂_Q is the distribution resulting from quantizing a Laplacian density p_D(d) with a uniform scalar quantizer with level spacing Q. When σ²/Q² is large (i.e., Q and the distortion are small relative to the variance σ² of p_D),

    H(p̂_Q) ≈ h(p_D) − log2 Q,    (29)

where h(p_D) = (1/2) log2(2e²σ²) is the differential entropy of p_D [21, p. 228]. On the other hand, when σ²/Q² is small (i.e., Q and the distortion are large), H(p̂_Q) is approximately linear in σ².
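The distortion side of this coder (D ≈ Q²/12 and the corresponding PSNR values quoted above) is easy to check numerically. The following sketch uses synthetic, uniformly distributed difference samples purely to exercise the approximation; the variable names are ours:

```python
import math
import random

def quantize(d, Q):
    """Uniform (mid-tread) scalar quantization with level spacing Q."""
    return Q * round(d / Q)

random.seed(0)
Q = 25
# Synthetic difference-frame samples, only to exercise the Q*Q/12 model.
samples = [random.uniform(-250, 250) for _ in range(200000)]
mse = sum((d - quantize(d, Q)) ** 2 for d in samples) / len(samples)
psnr = 10 * math.log10(255 ** 2 / mse)
# mse comes out near Q*Q/12 = 52.1, and psnr near 31 dB, as in the text.
```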
We combine these two approximations to obtain

    H(p̂_Q) ≈ (1/2) log2(2e²σ²/Q²),   σ²/Q² > 1/(2e)
             (e/ln 2)(σ²/Q²),         σ²/Q² ≤ 1/(2e)    (30)

where the linear function of σ² is the unique tangent of (1/2) log2(2e²σ²/Q²) that passes through the origin, and 1/(2e) is the point of tangency. (Note that the right side of (30) is a continuous, differentiable function of σ²/Q².) Figure 4 compares H(p̂_Q) to the above approximation for different values of σ and Q. Though distributions other than Laplacian could also be considered for the difference frame, for example generalized Gaussian as in [23], we
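The piecewise approximation (30) can be compared against the exact first-order entropy of a quantized Laplacian, computed by direct summation over the quantizer cells. This is an illustrative sketch with our own function names:

```python
import math

def entropy_model(sigma2, Q):
    """Piecewise approximation (30), in bits/pixel, to the entropy of the
    Q-quantized Laplacian difference-frame distribution."""
    t = sigma2 / Q ** 2
    if t > 1 / (2 * math.e):
        return 0.5 * math.log2(2 * math.e ** 2 * t)  # high-resolution branch (29)
    return (math.e / math.log(2)) * t                # tangent through the origin

def entropy_exact(sigma, Q, terms=2000):
    """First-order entropy of a Laplacian (std sigma) quantized with level
    spacing Q, summed directly over the quantizer cells."""
    b = sigma / math.sqrt(2)  # Laplacian scale parameter
    cdf = lambda x: 0.5 * math.exp(x / b) if x < 0 else 1 - 0.5 * math.exp(-x / b)
    H = 0.0
    for k in range(-terms, terms + 1):
        p = cdf((k + 0.5) * Q) - cdf((k - 0.5) * Q)
        if p > 0:
            H -= p * math.log2(p)
    return H
```

The two branches of entropy_model meet smoothly at σ²/Q² = 1/(2e), and in the high-resolution regime the model tracks the exact entropy closely, as illustrated in Figure 4.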

find that our results are not very sensitive to the assumption about p_D(d). For example, when σ²/Q² is large (the low distortion case), the differential entropies of the Gaussian and Laplacian differ only by a constant, which does not affect the minimizations performed later.

As our estimate of σ² we use the average of the per-pixel energies of the blocks in the difference frame, and use (25) to give the approximate dependence of the latter on the motion vector accuracy. That is, we choose

    σ² = S_D(Δ) ≈ (1/(NB²)) Σ_{i=1..N} (α_i Δ²x,i + β_i Δ²y,i + γ_i)    (31)

where α_i, β_i, γ_i are the quadratic coefficients for the ith block of the current frame. Substituting (31) into (30) gives an expression for R_D(Δ), the difference frame rate, as a function of the specified motion vector accuracies.

4. Optimizing Motion Vector Accuracies

In this section we optimize the choice of motion vector accuracies. We assume that the level spacing Q of the difference frame quantizer remains fixed at a value giving a satisfactory PSNR. We are then free to choose the motion vector accuracies to minimize the overall encoding rate, which, by summing the expressions for motion and difference frame rates, may be expressed as

    R(Δ) ≈ (1/2) log2(2e²σ²/Q²) + 2H/B² − (1/(B²N)) Σ_{i=1..N} log2(Δx,i Δy,i),   σ²/Q² > 1/(2e)
           (e/ln 2)(σ²/Q²) + 2H/B² − (1/(B²N)) Σ_{i=1..N} log2(Δx,i Δy,i),        σ²/Q² ≤ 1/(2e)    (32)

where σ² is given by (31). In optimizing the motion vector accuracies, we consider four modes of operation, corresponding to four increasing levels of adaptation.

Mode 1: the nonadaptive, classical approach. The components of all motion vectors of all frames have the same accuracy, as in most previous methods [1-4,7-10]; i.e., Δx,1 = Δy,1 = ... = Δx,N = Δy,N = Δ.

Mode 2: frame-by-frame adaptation. The same as the above, except that Δ is individually chosen for each frame.

Mode 3: block-by-block adaptation. The motion vector accuracies Δx,1, Δy,1, ..., Δx,N, Δy,N are individually chosen for each block of each frame; however, it is required that Δx,i = Δy,i, i = 1,...,N.
Mode 4: component-by-component adaptation.

There are no constraints on the motion vector accuracies, so that each component of each motion vector of each block of each frame can be individually tailored.

For each of these modes, we fix Q, which essentially fixes the distortion(5) at the value D ≈ Q²/12, and minimize the total rate (32) subject to the constraints of the mode. The results given below are obtained by equating to zero the partial derivatives of (32) with respect to the motion accuracies.

Optimized Mode 2: The optimal accuracy for the motion vectors in a given frame is

    Δ* = (2/(B²−2))^(1/2) (µ/ν)^(1/2),   Q²/µ < 2eB²/(B²−2)
         (1/(eB²))^(1/2) (Q²/ν)^(1/2),   otherwise    (33)

where µ = (1/(B²N)) Σ_{i=1..N} γ_i is the average interframe noise energy (per pixel) for the given frame (recall that γ_i is the estimated interframe noise for the ith block of the frame), and ν = (1/(B²N)) Σ_{i=1..N} (α_i + β_i) is a measure of the texture (per pixel) of the given frame. By substituting (33) into (32), and using the fact that the condition Q²/µ < 2eB²/(B²−2) ensures that σ²/Q² > 1/(2e) for the optimal Δ*, and vice versa, one may obtain an expression for R(Δ*), which we omit since it is long and not particularly enlightening. Note that the coding distortion D, which is present in the previous frame, is one of the components of the interframe noise µ. Among other things, this implies that µ ≥ D ≈ Q²/12 and that µ increases with Q.

Optimized Mode 1: The optimized motion accuracy for Mode 1, the nonadaptive case, is the same as (33), except that µ and ν are the average interframe noise and texture over all blocks in all frames of the sequence, and N is the total number of such blocks.

Optimized Mode 3: The optimal motion accuracy for the ith block in a given frame is

    Δ*_i = (2/(B²−2))^(1/2) (µ/ν_i)^(1/2),   Q²/µ < 2eB²/(B²−2)
           (1/(eB²))^(1/2) (Q²/ν_i)^(1/2),   otherwise    (34)

where ν_i = (1/B²)(α_i + β_i) is a measure of the texture in the ith block.

Optimized Mode 4:

(5) Recall, D ≈ Q²/12 when Q is neither too large nor too small.
For example, in the lossless case, Q = 1 and D = 0.
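The closed form (33) can be sanity-checked by implementing the Mode 2 rate model, i.e., (32) with one common accuracy Δ and σ² = νΔ² + µ from (31), and comparing its grid minimizer with the formula. A sketch under our notational assumptions (function names and the example values of µ, ν are ours):

```python
import math

E = math.e

def mode2_rate(delta, mu, nu, Q, B=8):
    """Delta-dependent part of the total rate (32) when every block uses
    the same accuracy delta, with sigma^2 = nu*delta^2 + mu from (31)."""
    t = (nu * delta ** 2 + mu) / Q ** 2
    if t > 1 / (2 * E):
        rd = 0.5 * math.log2(2 * E ** 2 * t)  # high-resolution branch
    else:
        rd = (E / math.log(2)) * t            # tangent branch
    return rd - (2 / B ** 2) * math.log2(delta)  # motion rate term in delta

def optimal_accuracy(mu, nu, Q, B=8):
    """Closed-form minimizer of the Mode 2 rate, eq. (33)."""
    if Q ** 2 / mu < 2 * E * B ** 2 / (B ** 2 - 2):
        return math.sqrt(2 * mu / ((B ** 2 - 2) * nu))
    return math.sqrt(Q ** 2 / (E * B ** 2 * nu))
```

For a textured, mildly noisy frame (say µ = 2, ν = 50), a fine grid search over Δ lands on the formula's value in both regimes: a small Δ* for Q = 1 and a much larger Δ* for Q = 25, reproducing the trend that coarser compression calls for coarser motion vectors.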

The optimal motion accuracies for the ith block in a given frame are

    Δ*x,i = (2/(B²−2))^(1/2) (µ/(2α_i/B²))^(1/2),   Q²/µ < 2eB²/(B²−2)
            (1/(eB²))^(1/2) (Q²/(2α_i/B²))^(1/2),   otherwise    (35a)

    Δ*y,i = (2/(B²−2))^(1/2) (µ/(2β_i/B²))^(1/2),   Q²/µ < 2eB²/(B²−2)
            (1/(eB²))^(1/2) (Q²/(2β_i/B²))^(1/2),   otherwise    (35b)

Formulas (33)-(35) show how the optimal motion vector accuracies depend on the parameters α_i, β_i, γ_i and Q. Since they are based on the quadratic model (31) for the difference frame energy (derived in Section 2), they apply when the desired motion accuracies Δx,i, Δy,i are less than or equal to 1, i.e., for pixel or subpixel accurate motion vectors. The block texture measures α_i, β_i are large for high-texture blocks and small for low-texture blocks, as explained in Section 2. Since they appear in the denominators of (33)-(35), the formulas show that the motion vectors of blocks (or frames, in Modes 1 and 2) with more texture must be encoded more accurately (i.e., with smaller Δ's) than those with less.

Recall that the interframe noise µ corresponds to the per-pixel energy of the difference frame in the ideal case where the motion vectors are encoded with infinite accuracy (i.e., the Δ's are zero). Hence, this noise energy cannot be reduced by more accurate block motion compensation. The interframe noise µ would be zero only if the motion-compensated prediction could be made identical to the current frame, which, as we mentioned earlier, is not possible in real scenes because of camera noise, light changes, occlusions, nontranslational motion, encoding distortion D, etc. Because µ appears in the numerators of (33)-(35), these formulas show that for larger values of µ, lower motion vector accuracies are needed, a fact previously observed by Girod [4]. For large values of Q, µ is replaced by Q² in the numerators; the formulas thus indicate that less accurate motion vectors are needed at higher levels of compression.

5.
Experimental Results

We implemented several video coders and present results of their performance on synthetic and real (gray-level) video sequences. There were two synthetic sequences, "low texture" and "high texture", which are moving low-frequency and high-frequency sinusoids, respectively. The real video sequences are the well-known "caltrain" (a scene with a moving train and a calendar(6)) and "miss america" (a video conferencing scene). The frame resolutions were pixels for both synthetic sequences, for "caltrain", and for

(6) The caltrain sequence is a version of Mobile & Calendar, although not the typical MPEG4 version. It is available by anonymous ftp to ipl.rpi.edu.

"miss america". The frame rates were 30 frames/second. Figure 5 shows a frame of each of these sequences. The results described here are a representative subset of those in [18] and are presented for different levels of distortion and different modes of operation.

We used classical full-search block matching [24, p. 335] on each 8x8 block with the minimum absolute error criterion to compute the (approximately) ideal motion vectors with Δo = 1/64 subpixel accuracy(7). The subpixel values were computed using the commonly used bilinear interpolation, although other more advanced interpolation filters, such as those in [9,10,15,27], could have been used instead. In each case, the first frame of the sequence was intracoded by a simple uniform scalar quantizer with level spacing Q, and each of the following frames was predicted from its respective (encoded) previous frame.

5.1 Video Coder 1: Lossless Entropy Coding of Difference Frames, Scalar Quantization of Motion Vectors

In this first video coder, the difference frame pixels are simply losslessly encoded with a first-order entropy coder, and the ideal motion vectors are uniform scalar quantized with the desired motion accuracies Δx,i, Δy,i as the level spacings. Hence, the total encoding rate is modeled by the top term of (32), with Q = 1 (lossless coding) and H = log2(2V_m) (recall (26) and (28)). The maximum value or velocity in the scalar quantizer is set to V_m = 32 pixels.

We ran this coder first in Mode 1 (fixed, rather than adaptive, motion vector accuracy Δ) on 9 frames of "caltrain" with Δ equal to each value in the set Γ = {1, 1/2, 1/4, 1/8, 1/16, 1/32, 1/64}. The solid line in Figure 6 shows the resulting (empirically measured) rates R (in bits per pixel) versus Δ. The dashed line shows the (empirically measured) difference frame rate R_D. The distance between the solid and dashed lines is the rate R_M devoted to motion, which, as expected, increases as Δ decreases (recall Figure 2).
Next we ran the coder in Mode 2 (frame-by-frame adaptation of Δ) on the same 9 frames of "caltrain", using (33) to estimate the best motion vector accuracy Δ* for each frame and Method C of Section 2.4 to estimate the quadratic coefficients α_i, β_i, γ_i in (31) for each frame. This is a good choice because it measures the prediction error energies using the same uniform scalar quantizer that is used for encoding, but other estimation methods could have been used instead. The estimated Δ*'s for the nine frames ranged up to 0.116; i.e., they differed little. The "o" on the Δ-axis of Fig. 6 plots their average. Since each was closer to 1/8 than to any other member of Γ, the performance of the coder with Δ = 1/8, also marked in Fig. 6, provides an indication of how this coder performs with Mode 2 adaptation.

Though Δ need not change significantly over these frames, one cannot conclude that Mode 1 is as good as Mode 2. This is because when Mode 1 is used, as in typical block-based video coders, Δ is usually fixed at 1 or 1/2, as a compromise for many different types

(7) Any motion estimation technique that computes fractional-pel motion vectors could be used as well. For example, the low-complexity methods in [6,19] could be applied. The computational complexity of the full-search approach with different motion accuracies is addressed in [12].

of scenes. As one can see from Fig. 6, this would result in significantly higher rate on these 9 frames (approximately 0.35 bits/pixel larger for Δ = 1/2). As a further example, in [18, pp. 73-74] we ran the coder in Mode 2 on 9 frames of "SRI trees" (a camera panning across a wooded scene) and found that for each frame Δ* ≈ 1/4. This is indicative of the need to adapt Δ, and of how (33) can be used to choose Δ. In summary, the results suggest there are benefits to adapting Δ* to scenes. Frame-by-frame adaptation accomplishes this, but since it appears that the adaptation may need only operate on a scene-by-scene basis, there might be simpler methods for adapting Δ than the frame-by-frame method that we used. One may also see from Fig. 6 that the loss due to using too large a value of Δ is much greater than that due to using too small a value of Δ, an observation that should be useful when using Mode 1.

We also ran the coder in Modes 3 and 4 (block-by-block adaptation of Δx and Δy, with and without the constraint that Δx = Δy). The "+" and "*" in Fig. 6 plot (Δ*avg, R) for Modes 3 and 4, respectively, where Δ*avg is the average of the Δ*'s found using (34) and (35), respectively, and R is the empirical rate of the coder. For practical reasons, each Δ was rounded to the closest value in the set Γ. One can see that Mode 4 gained little over Mode 3, and neither gained substantially over Mode 2 (frame-by-frame adaptation). On the other hand, their use of larger Δ's might reduce the average complexity of block matching.

To gain a sense of the goodness of the predicted Δ*'s, consider Figure 7, which is like Fig. 6, but for only three blocks of one frame of "caltrain", specially chosen so that the parameters α_i, β_i, γ_i were distinctly different for each block. In addition to showing the same lines and marks as Fig. 6, a further mark plots the average accuracy and minimum rate obtained by a full search over all motion accuracy combinations for all three blocks, i.e.,
Mode 4 with (Δx,i, Δy,i) ∈ Γ x Γ, for i = 1, 2, 3. No other blocks were included because of the huge search time that would have been required. One can see that using our optimized accuracies produced encoding rates close to the minimum found by full search. Finally, we mention that a comparison in [18, p. 76] indicates that the rate formulas we use provide accurate predictions.

5.2 Video Coder 2: Lossy Entropy Coding of Difference Frames, DPCM with Entropy Coding of Motion Vectors

In the second video coder, the difference frame is lossy encoded with a uniform scalar quantizer with level spacing Q, followed by a first-order entropy coder. The ideal motion vectors are encoded using conventional DPCM plus entropy coding, as described in Sec. 3.1. The desired block motion accuracies Δx,i, Δy,i determine the level spacings used in the DPCM quantizer. Here, the encoding rate is modeled by (32) with H = h (recall (27), (28)).

Figure 8 is like Fig. 6, but for this lossy video coder and for several levels of distortion, as determined by the values of Q. The solid lines show the total empirical rate R for encoding 4 frames of "caltrain" with different values of Δ. (Since in the lossy case the optimal motion accuracies tend to be larger, we show the rate also for Δ = 2, and omit Δ = 1/64.) Q = 1 operates this coder losslessly, and Q = 25 corresponds to a PSNR of about 31 dB. As in Fig. 6, each "o" shows

the average of our predictions for the value Δ* yielding minimum rate for each frame (from right to left, the "o"'s correspond to Q = 1, 5, 15, 25). The quadratic coefficients α_i, β_i, γ_i were computed using Method D described in Sec. 2.3, although, as in the previous coder, other estimation methods could have been used as well. The remaining marks have the same meanings as in Fig. 6. As with the previous coder, there was little variation in the best Δ from frame to frame; Modes 3 and 4 performed similarly; neither gained significantly over Mode 2; and the performance was less sensitive to an overly small Δ than to an overly large Δ. Notice also that, as expected, the optimal Δ's increased as Q (i.e., distortion) increased. Due to the DPCM motion vector encoder, it was not feasible to produce a plot like Figure 7.

Figure 9 plots "rate-distortion" curves for the different modes of this coder. The solid line is for Mode 1 with motion vector accuracy Δ = 1 (i.e., pixel accurate motion vectors), and the dashed line is for Mode 1 with Δ = 1/2 (i.e., half pixel accuracy). Additional marks indicate the rate-distortion points for optimized Modes 2, 3, and 4.

Figures 10 (a), (b), and (c) are like Fig. 6, but for two frames of "low texture" (Q=25), "high texture" (Q=5), and "miss america" (Q=25), respectively. Fig. 10 (d) is the same as Fig. 8 ("caltrain"), but just for Q=5. These curves illustrate how the rate-accuracy curves change substantially depending on scene texture and compression level, suggesting that adaptation will have benefits. In particular, the optimized Δ*'s decrease when the image texture increases and the compression level decreases. The effect of compression level on the optimal motion accuracies is also reflected in Fig. 11, which shows histograms of the Δx,i values for optimized Mode 4 in "caltrain" when Q=1 (top) and Q=25 (bottom).

5.3 Video Coder 3: DCT and Variable-Length Coding of Difference Frames, DPCM with Entropy Coding of Motion Vectors
Although our model for the difference frame rate in Sec. 3.2 was derived for a coder based on uniform scalar quantization plus entropy coding, rate-distortion theory and high-resolution quantization theory suggest that the rates of more advanced coders are lower than ours by approximately a constant. Therefore, equations (33)-(35) can still be used to find the optimal motion vector accuracies, because our minimization procedure is not affected by a constant difference. To confirm this, we implemented another video coder that uses a block-by-block DCT for the difference frame pixels. Specifically, this third video coder is essentially the same as the second, except that we replaced the difference frame coder with a JPEG encoder [25]. This is quite similar to MPEG [2], because the latter does JPEG-like coding of difference frames and DPCM plus entropy coding of motion vectors.

Figure 12 is also like Fig. 6, but for encoding 30 frames of "caltrain" with video coder 3 at a PSNR of approximately 33 dB. To achieve this PSNR at each frame, we searched for the JPEG quality factor that produced the PSNR closest to 33 dB. Accordingly, we used Q=20 in our formulas. Every fifth frame was intracoded by a JPEG encoder at the same PSNR,

but the rate of the intracoded frames is not included in the results. The plot suggests that our formulas also work well with this more advanced, MPEG-like video coder. Note that we have mainly discussed results on optimizing motion vector accuracies according to (33)-(35), which is the primary focus of this paper. However, results on other aspects of this work (e.g., the precision of our energy and rate models) can be found in [18].

5.4 Discussion of Experimental Results

From our analysis and empirical experiments we conclude the following.

If the motion accuracy is high enough (i.e., Δ small enough), lossless and lossy video coding are relatively insensitive to increases in motion accuracy. This is because the curves of empirical rate versus Δ in Figures 6, 7, 8, 10, and 12 are fairly flat in the vicinity of their minima. Moreover, the performance is generally less sensitive to an overly small Δ than to an overly large Δ.

The rate for optimized Mode 2 (which fixes the same accuracy for all motion vectors in a frame) is as low as that of optimized Modes 3 and 4. Hence, considering different accuracies for different motion vectors did not usually help much. Only for "miss america" in Figure 10 (c) did the adaptive modes achieve significant rate savings over the best nonadaptive case. Nevertheless, block-based accuracy adaptation may have other benefits. For example, in a related work [12], we found that using block-adaptive motion accuracy produces significant computational savings when finding pixel and subpixel accurate motion vectors with typical block matching.

Optimized Modes 2, 3, and 4 are consistently superior to Mode 1 with Δ=1 and Δ=1/2 (the typical motion vector accuracy choices of most video coding schemes). The optimized modes give significant savings in real scenes when Q takes the values 1, 5, and 15. For example, when Q=1, the optimized modes saved up to 0.8 bits per pixel with respect to Δ=1, as shown in Figures 6 and 8.
Optimized Modes 3 and 4 also saved significant rate for Q=25 in Fig. 9 (up to about 17 percent savings with respect to Δ=1/2). The rate savings of the optimized modes are expected to improve further in scenes with higher texture content. For instance, notice that for "high texture" in Figure 10 (b) these modes save up to about 0.8 bits per pixel, or 35 percent, with respect to Mode 1 with Δ=1/2.

The results also indicated that within a scene there was little need to adapt Δ, suggesting that the adaptation can proceed slowly, e.g., over a number of frames, which could perhaps be performed in a simpler fashion than that which we have used.

Optimized Mode 3 is slightly superior to optimized Mode 4 for large Q, probably because the two-dimensional least-squares fit to the quadratic coefficients α_i, β_i, γ_i for Mode 4 is more sensitive to distortion than the one-dimensional least-squares fit for Mode 3.

In general, the experimental results obtained with the video coders suggest that our predictions and assumptions are quite accurate. For example, (33) predicts fairly well the value Δ* where the minimum of the empirical rate for Mode 2 occurs, as seen in Figures 6, 7, 8, 10, and 12. The exact value of Δ that minimizes each of the underlying continuous curves is not known, but when we round each of our Δ*'s to the nearest value in the set of interest Γ, the rate that we obtain is either the minimum or very close to the minimum. Additionally, in the special cases where we made a test, our optimized motion vector accuracies achieved an empirical rate that was very close to that achieved by a full search over all possible accuracy combinations (recall Figure 7). On the other hand, for video coder 2 with Q=25, Fig. 8 shows that the predicted Δ*'s were close to 1/2, whereas the coder would have had smaller rate if they had been closer to 1/4. This is probably because our assumptions and models are less accurate at very large distortions.

Our formulas (33)-(35) show that high motion vector accuracies are required in scenes with high texture, and that lower accuracies are needed in scenes where the prediction is corrupted by camera noise, occlusions, or other phenomena, or when the level of compression is high. This is confirmed by our empirical experiments, particularly those in Fig. 10.

6. Summary and Concluding Remarks

The central theme of this paper is the development of a theoretical framework for finding the best motion accuracies in block-based video coding. Previous research provided interesting insights on this topic, but did not provide concrete formulas for the optimal motion vector accuracies that minimize rate in this type of coding. In this work, we presented effective difference frame and motion vector rate models that are tractable enough that analytical formulas for the optimal motion vector accuracies can be derived.
To do this, we studied the effect of motion vector accuracy on the energy of a block in the difference frame and concluded that this energy is approximately quadratic in the block motion vector errors and accuracies. Adding the block energies, we obtained an expression for the difference frame energy in terms of the blocks' motion accuracies. We then derived formulas for the difference frame rate (which used the energy-accuracy expression) and the motion vector rate, in terms of the motion accuracies. Minimizing these formulas, we obtained concrete, analytical expressions for the optimal motion accuracies for different modes of operation (sequence, frame, or block adaptive).

We implemented three block-based video coders and obtained experimental results on a variety of video sequences and distortion levels. The results suggest that our equations and assumptions are fairly accurate; that, in general, there is not much benefit to adapting motion accuracy on a block-by-block basis; that the optimal motion vector accuracies depend on the texture and interframe noise of the particular scene (and can lie outside the 1/2-1/4 pixel interval [4]); and that optimizing motion vector accuracies with our procedures can provide significant bit rate savings over the typical pixel and half-pixel accuracy

choices of current video coders. Specifically, in our tests, most savings occurred for highly textured scenes at low compression levels, and reached about 35 percent. For some time, it has been known that using several block sizes in video coding provides some coding benefits (e.g., H.263 [1] can use 16x16 or 8x8 blocks), but most video codecs still use the same, fixed motion accuracy. The results of this paper motivated the study and experiments in [27], in which multiple video sequences were coded (with an H.26L codec) at a variety of bit rates with several motion accuracies. This study also confirmed that some accuracies are significantly better than others, according to the specific sequence content and distortion level. As a result, H.26L has recently incorporated several motion accuracies in the test model TML-2 [28].

Our framework for optimizing motion accuracy can potentially be applied to quadtree or object-based video coders, and can be extended to optimize other coding parameters such as block size [11]. Also, our rate equations can potentially be used to predict the bit rate and the potential coding gains when using different accuracies (cf. [18, p. 76]). Finally, our models can be applied to other topics in video coding such as rate control [26] or motion estimation [19].

References

[1] ITU-T, "Video coding for low bitrate communication," ITU-T Recommendation H.263; version 1, Nov. 1995; version 2, Jan. 1998.
[2] D. Le Gall, "MPEG: A video compression standard for multimedia applications," Commun. ACM, Vol. 34, Apr. 1991.
[3] Video Group, "Text of ISO/IEC MPEG4 video VM," ISO/IEC JTC1/SC29/WG11 Coding of Moving Pictures and Assoc. Audio, MPEG 97/W1796, Stockholm, July 1997.
[4] B. Girod, "The efficiency of motion-compensating prediction for hybrid coding of video sequences," IEEE J. Sel. Areas Commun., Vol. 5, Aug. 1987.
[5] F. Moscheni, F. Dufaux and H.
Nicolas, "Entropy criterion for optimal bit allocation between motion and prediction error information," Proc. SPIE VCIP, Cambridge, Nov. 1993.
[6] S. Gupta and A. Gersho, "On fractional pixel motion estimation," Proc. SPIE VCIP, Cambridge, Nov. 1993.
[7] L. Vandendorpe, L. Cuvelier and B. Maison, "Statistical properties of prediction error images in motion compensated interlaced image coding," Proc. IEEE ICIP, Vol. 3, Washington, D.C., Oct. 1995.
[8] H. Ito and N. Farvardin, "On motion compensation of wavelet coefficients," Proc. IEEE ICASSP, Vol. 4, Detroit, May 1995.
[9] U. Benzler, "Results of core experiment P8: motion and aliasing compensating prediction," ISO/IEC JTC1/SC29/WG11 Coding of Moving Pictures and Associated Audio, MPEG 96/1512, Maceio, Nov. 1996.
[10] R. Buschmann, "Efficiency of displacement estimation techniques," Signal Processing: Image Communication, Vol. 10.

[11] J. Ribas-Corbera and D. L. Neuhoff, "Optimizing block size in motion-compensated video coding," Journal of Electronic Imaging, Vol. 7, Jan. 1998.
[12] J. Ribas-Corbera and D. L. Neuhoff, "Reducing rate/complexity in video coding by motion estimation with block adaptive accuracy," Proc. SPIE VCIP, Orlando, Mar. 1996.
[13] B. Girod, "Why B-pictures work: a theory of multi-hypothesis motion-compensated prediction," Proc. IEEE ICIP, Vol. II, Chicago, Oct. 1998.
[14] M. Hötter, "Optimization and efficiency of an object-oriented analysis-synthesis coder," IEEE Trans. Circuits and Systems for Video Technology, Vol. 4, Apr. 1994.
[15] K. Illgner and F. Müller, "Analytical analysis of subpel motion compensation," Proc. Picture Coding Symposium, Berlin.
[16] J. S. Lim, Two-Dimensional Signal and Image Processing, Prentice Hall Signal Processing Series.
[17] J. Ribas-Corbera and D. L. Neuhoff, "On the optimal motion vector accuracy for block-based motion-compensated video coders," Proc. IS&T/SPIE Dig. Video Compr.: Alg. and Tech., San Jose, Feb. 1996.
[18] J. Ribas-Corbera, "Optimizing the motion vector accuracies in block-based video coding," Ph.D. Thesis, University of Michigan, Ann Arbor.
[19] X. Li and C. Gonzales, "A locally quadratic model for the motion estimation error criterion function and its application to subpixel interpolations," IEEE Trans. Circuits and Systems for Video Technology, Vol. 6, Feb. 1996.
[20] P. Guillotel and C. Chevance, "Comparison of motion vector coding techniques," Proc. SPIE Image and Video Compression, Feb. 1994.
[21] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, 1991.
[22] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.
[23] K. Sharifi and A. Leon-Garcia, "Estimation of shape parameter for generalized Gaussian distributions in subband decompositions of video," IEEE Trans.
Circuits and Systems for Video Technology, Vol. 5, Feb. 1995.
[24] A. Netravali and B. Haskell, Digital Pictures: Representation and Compression, Plenum.
[25] G. K. Wallace, "The JPEG still picture compression standard," Commun. ACM, Vol. 34, Apr. 1991.
[26] J. Ribas-Corbera and S. Lei, "Rate control in DCT video coding for low-delay communications," IEEE Trans. Circuits and Systems for Video Technology, Vol. 9, Feb. 1999.
[27] J. Shen and J. Ribas-Corbera, "More experiments and low-complexity AMA for H.26L," ITU-T SG16/Q15, doc. Q15-I-38, Red Bank, Oct. 1999.
[28] Test model TML-2 (G. Bjontegaard, ed.), ITU-T SG16/Q15, doc. Q15-I-36, Red Bank, Oct. 1999.

[Block diagram: the current frame is predicted by motion compensation of the previous decoded frame; the difference frame is sent to the difference frame coder (rate R_D, bits/pixel) and the motion vectors to the motion vector coder (rate R_M, bits/pixel); total rate R = R_D + R_M.]

Figure 1: A typical block-based video coder.

[Plot: typical and optimal behavior of R, R_M, and R_D versus the motion vector accuracy Δ.]

Figure 2: The typical effect of motion vector accuracy Δ (e.g., Δ = 1/2 corresponds to half pixel accuracy) on encoding bit rate R, when distortion is fixed. More accurate motion vectors (smaller Δ) result in higher motion rate R_M and lower difference frame rate R_D. The motion accuracy that minimizes R is denoted Δ*.

[Diagram: current block F(x); previous frame with ideal prediction and computed prediction; v* is the ideal motion vector, v the computed motion vector, Δ the motion vector accuracy, and u = v − v* the motion vector error.]

Figure 3: Illustration of a block's motion vector error u and accuracy Δ, for the one-dimensional case. The location of the prediction for the block is shifted by u units from the ideal prediction, because of the limited accuracy of the computed motion vector.

[Plot: entropy versus standard deviation σ for Q = 1, 3, and 15.]

Figure 4: From top to bottom, the solid lines are the entropy of a Q-quantized Laplacian distribution with respect to the standard deviation, when Q = 1, 3, and 15. The dashed lines are the respective approximations obtained using (30).

Figure 5: A frame from each of the video sequences used in the experiments: (a) "Low Texture", (b) "High Texture", (c) "Miss America", (d) "Caltrain".


ECE472/572 - Lecture 11. Roadmap. Roadmap. Image Compression Fundamentals and Lossless Compression Techniques 11/03/11. ECE47/57 - Lecture Image Compression Fundamentals and Lossless Compression Techniques /03/ Roadmap Preprocessing low level Image Enhancement Image Restoration Image Segmentation Image Acquisition Image

More information

1462 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 10, OCTOBER 2009

1462 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 10, OCTOBER 2009 1462 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 10, OCTOBER 2009 2-D Order-16 Integer Transforms for HD Video Coding Jie Dong, Student Member, IEEE, King Ngi Ngan, Fellow,

More information

SPEECH ANALYSIS AND SYNTHESIS

SPEECH ANALYSIS AND SYNTHESIS 16 Chapter 2 SPEECH ANALYSIS AND SYNTHESIS 2.1 INTRODUCTION: Speech signal analysis is used to characterize the spectral information of an input speech signal. Speech signal analysis [52-53] techniques

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Decision Trees. Tobias Scheffer

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Decision Trees. Tobias Scheffer Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Decision Trees Tobias Scheffer Decision Trees One of many applications: credit risk Employed longer than 3 months Positive credit

More information

ECE 634: Digital Video Systems Wavelets: 2/21/17

ECE 634: Digital Video Systems Wavelets: 2/21/17 ECE 634: Digital Video Systems Wavelets: 2/21/17 Professor Amy Reibman MSEE 356 reibman@purdue.edu hjp://engineering.purdue.edu/~reibman/ece634/index.html A short break to discuss wavelets Wavelet compression

More information

Homework Set 3 Solutions REVISED EECS 455 Oct. 25, Revisions to solutions to problems 2, 6 and marked with ***

Homework Set 3 Solutions REVISED EECS 455 Oct. 25, Revisions to solutions to problems 2, 6 and marked with *** Homework Set 3 Solutions REVISED EECS 455 Oct. 25, 2006 Revisions to solutions to problems 2, 6 and marked with ***. Let U be a continuous random variable with pdf p U (u). Consider an N-point quantizer

More information

Objectives of Image Coding

Objectives of Image Coding Objectives of Image Coding Representation of an image with acceptable quality, using as small a number of bits as possible Applications: Reduction of channel bandwidth for image transmission Reduction

More information

Principles of Communications

Principles of Communications Principles of Communications Weiyao Lin, PhD Shanghai Jiao Tong University Chapter 4: Analog-to-Digital Conversion Textbook: 7.1 7.4 2010/2011 Meixia Tao @ SJTU 1 Outline Analog signal Sampling Quantization

More information

AN IMPROVED CONTEXT ADAPTIVE BINARY ARITHMETIC CODER FOR THE H.264/AVC STANDARD

AN IMPROVED CONTEXT ADAPTIVE BINARY ARITHMETIC CODER FOR THE H.264/AVC STANDARD 4th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006, copyright by EURASIP AN IMPROVED CONTEXT ADAPTIVE BINARY ARITHMETIC CODER FOR THE H.264/AVC STANDARD Simone

More information