Noise-robust Recursive Motion Estimation for H.263-based videoconferencing systems

Stefano Olivieri†, Gerard de Haan‡, and Luigi Albani†

†Philips S.p.A., Philips Research Monza, Via Philips, 12, 252 Monza (MI), Italy
Tel: +39-39-23 784 Fax: +39-39-23 78 E-mail: olivieri@monza.research.philips.com

‡Philips Research Laboratories, Video Processing and Display Electronics Group, Prof. Holstlaan 4 (WY1), 5656 AA Eindhoven, The Netherlands
Tel: +31-4-742555 Fax: +31-4-74263 E-mail: dehaan@natlab.research.philips.com

Abstract

The key element in realizing low-cost real-time software implementations of an H.263 videoconferencing system is a fast motion estimation algorithm that only slightly decreases coding efficiency. We propose a spatio-temporal recursive estimator that combines excellent coding efficiency with high computational efficiency. Experimentally, the new algorithm proves to be comparable to full-search block matching when encoding typical videoconferencing sequences in the presence of additive noise, even though the computational burden is greatly reduced.

1 Introduction

The traditional full-search block-matching algorithm (FS-BMA) has been adopted for temporal redundancy reduction in many H.263-based video coder implementations, such as Telenor's TMN5 [1]. However, because of its considerable computational cost, FS-BMA is unlikely to be implemented in real-time media-processor-based videoconferencing systems. As a consequence, many researchers focus on designing fast motion estimation algorithms that lead to negligible degradation.

Another reason to look for alternatives to FS-BMA is that the resulting motion vectors have a poor relation with the true motion of objects, particularly if the signal contains some noise. Such a relation would be desirable, as true-motion vector fields are usually smooth, both spatially and temporally. This smoothness helps to keep the bit rate low in the typical video codec that applies entropy coding of the motion vector field.
The rate-distortion (RD) optimization framework provides the formal solution to this problem of finding the optimal trade-off between minimizing the entropy of the displacement vectors and minimizing the displaced frame difference [2], [3], [4]. A less complex but sub-optimal solution applies smoothing of the BMA vector field to reduce
the bit rate needed for the coding of the vectors. Care must be taken that this smoothing does not increase the prediction error too much.

We propose an opposite approach. In particular, we use a low-complexity spatio-temporal recursive (3DRS) BMA [5], [6], which inherently leads to consistent, near true-motion vector fields through the use of recursion. Since this recursive estimation does not necessarily lead to optimal vectors for coding applications, we describe an additional recursive post-processing step that enhances the performance of this estimator in the coding application. We show that the proposed refinement technique, performed inside the recursion loop of the original recursive motion estimation algorithm, results in very accurate estimates without affecting the coherency of the motion field. Experimentally, we show that the enhanced 3DRS (E3DRS) performs comparably to FS-BMA when encoding typical videoconferencing sequences in the presence of additive noise, while the computational complexity is drastically decreased.

2 The original 3-D RS block matcher

The 3DRS BMA is a predictor-corrector type displacement estimator, which promotes smoothness of the motion field by employing recursion. Let $\vec{d}(n)$ denote the motion field between the current image $I(\vec{x}, n)$ and the previous image $I(\vec{x}, n-1)$, and let $\vec{d}(\vec{b}_c, n) \in \vec{d}(n)$ indicate the motion vector assigned to the centre $\vec{b}_c = [x_c\ y_c]^T$ of a block of pixels $B(\vec{b}_c)$ in the current image $I(\vec{x}, n)$. The 3DRS BMA estimates the motion vector $\vec{d}(\vec{b}_c, n)$ such that

$$
\begin{aligned}
\vec{d}(\vec{b}_c, n) &= \left\{ \vec{C} \in CS(\vec{b}_c) \mid E(\vec{C}) \le E(\vec{V})\ \ \forall\, \vec{V} \in CS(\vec{b}_c) \right\}, \\
CS(\vec{b}_c) &= \left\{
\begin{array}{l}
\vec{d}(\vec{b}_c - [\ldots]^T, n),\ \ \vec{d}(\vec{b}_c - [\ldots]^T, n),\ \ \vec{d}(\vec{b}_c + [\ldots]^T, n-1), \\
\vec{d}(\vec{b}_c - [\ldots]^T, n) + \vec{v}_1(\vec{b}_c),\ \ \vec{d}(\vec{b}_c - [\ldots]^T, n) + \vec{v}_2(\vec{b}_c)
\end{array}
\right\},
\end{aligned}
\tag{1}
$$

where $E(\vec{V})$ is a matching measure that quantifies block similarity, and $\vec{v}_1 = [v_{1x}\ v_{1y}]^T$ and $\vec{v}_2 = [v_{2x}\ v_{2y}]^T$ are random choices from the range $[-a, a]$.
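The selection rule in (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the matching measure $E$ is taken to be the sum of absolute differences (the text does not name the measure), vectors are stored as `(dx, dy)` tuples that point from a block in the current image to its match in the previous image, and the function names `sad`, `candidate_set` and `select_best`, the block size of 8 and the default `a=2` are all hypothetical. The neighbouring blocks that supply the predictors are passed in by the caller, since their positions are not assumed here.

```python
import random


def sad(cur, prev, left, top, vec, block=8):
    """Matching error E(V), assumed here to be the sum of absolute differences
    between the block of `cur` with top-left corner (left, top) and the block
    of `prev` displaced by vec = (dx, dy). Vectors that point outside `prev`
    get an infinite error."""
    dx, dy = vec
    height, width = len(prev), len(prev[0])
    inside = (0 <= left + dx and left + dx + block <= width
              and 0 <= top + dy and top + dy + block <= height)
    if not inside:
        return float("inf")
    return sum(abs(cur[y][x] - prev[y + dy][x + dx])
               for y in range(top, top + block)
               for x in range(left, left + block))


def candidate_set(spatial_a, spatial_b, temporal, previous, a=2, rng=random):
    """Five candidates: three predictors plus two randomly updated copies of
    the vector estimated for the previous block (update components in [-a, a])."""
    updated = [(previous[0] + rng.randint(-a, a),
                previous[1] + rng.randint(-a, a))
               for _ in range(2)]
    return [spatial_a, spatial_b, temporal] + updated


def select_best(candidates, error):
    """Return the candidate with the smallest matching error (first on ties)."""
    return min(candidates, key=error)
```

A caller would build `error` as `lambda v: sad(cur, prev, left, top, v)` for the block at hand and pass it to `select_best` together with the output of `candidate_set`. Only five error evaluations are needed per block, which is where the low complexity of the recursive search comes from.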
The candidate set $CS(\vec{b}_c)$ consists of 5 vectors: three predictor vectors from a spatio-temporal neighborhood, and two vectors obtained by adding a random update vector to the motion vector estimated for the previous block. This implicitly assumes spatial and/or temporal consistency. Because of its highly consistent motion vectors, the 3DRS algorithm is particularly suitable for field-rate conversion; however, its smoothness constraint seems to be too strong for H.263 video coding, which leads to poor codec performance.

3 The Enhanced 3DRS algorithm for H.263 video coding

To refine the estimated motion field, we must compute $\vec{d}_u(\vec{b}_c, n) \in \vec{d}_u(n)$ such that
$$
\vec{d}_u(\vec{b}_c, n) = \vec{d}(\vec{b}_c, n) + \vec{u}(\vec{b}_c), \tag{2}
$$

where the update term $\vec{u}(\vec{b}_c) = [u_x\ u_y]^T$ can be found for each motion vector $\vec{d}(\vec{b}_c, n) \in \vec{d}(n)$ according to

$$
\vec{u}(\vec{b}_c) = \arg\min_{(u_x, u_y)} E\!\left(\vec{d}(\vec{b}_c, n) + \vec{u}(\vec{b}_c)\right). \tag{3}
$$

From (3), the update term is determined by searching for the location of the best-matching block within the search window centered on the pixel position referenced by the motion vector $\vec{d}(\vec{b}_c, n)$. Large search windows are expected to provide very accurate estimates; however, the computation required to refine the motion field grows quadratically with the dimension of the search window, thus impairing the computational efficiency of the 3DRS algorithm. Besides, large components of the update term may deteriorate the coherency of the estimated motion field, which increases the number of bits needed to encode the motion information.

To ensure effective refinement at reduced computational effort, we propose to compute the update term inside the recursion loop of the 3DRS, that is, $\vec{u}(\vec{b}_c)$ is determined before generating the candidate motion vector set associated with the next block, and to limit the search window such that $u_x, u_y \in \{0, \pm 1\}$. Experimental results show that the gain achieved with larger search windows is negligible. The E3DRS then estimates the motion vector $\vec{d}(\vec{b}_c, n) = \vec{d}_2(\vec{b}_c, n) \in \vec{d}(n)$ for each block $B(\vec{b}_c)$ according to

$$
\begin{aligned}
\vec{d}_s(\vec{b}_c, n) &= \left\{ \vec{C} \in CS_s(\vec{b}_c) \mid E(\vec{C}) \le E(\vec{V})\ \ \forall\, \vec{V} \in CS_s(\vec{b}_c) \right\}, \quad s = 1, 2, \\
CS_1(\vec{b}_c) &= \left\{
\begin{array}{l}
\vec{d}_2(\vec{b}_c - [\ldots]^T, n),\ \ \vec{d}_2(\vec{b}_c - [\ldots]^T, n),\ \ \vec{d}_2(\vec{b}_c - [\ldots]^T, n), \\
\vec{d}_2(\vec{b}_c + [\ldots]^T, n-1),\ \ \vec{d}_2(\vec{b}_c - [\ldots]^T, n) + \alpha\,\vec{u}(\vec{b}_c - [\ldots]^T)
\end{array}
\right\}, \\
CS_2(\vec{b}_c) &= \left\{ \vec{C} \mid \vec{C} = \vec{d}_1(\vec{b}_c, n) + \vec{u}(\vec{b}_c) \right\},
\end{aligned}
\tag{4}
$$

where $\vec{d}_1(\vec{b}_c, n)$ is the motion vector selected from the candidate set $CS_1(\vec{b}_c)$ according to the 3DRS strategy, and $\vec{d}_2(\vec{b}_c, n) = \vec{d}_1(\vec{b}_c, n) + \vec{u}(\vec{b}_c) \in CS_2(\vec{b}_c)$ is the refined motion vector. The E3DRS results in 13 candidates to be evaluated for each block.
From (4), the convergence of the algorithm is accelerated without affecting the smoothness of the estimated motion field, since the refined motion vector for the present block is taken as a candidate vector for the next block. Equation (4) also shows that a directional updating strategy suitable for the E3DRS BMA is adopted. Indeed, the update term for the previous block, $\vec{u}(\vec{b}_c - [\ldots]^T) = \vec{d}_2(\vec{b}_c - [\ldots]^T, n) - \vec{d}_1(\vec{b}_c - [\ldots]^T, n)$, indicates the local trend of the motion; therefore, the updating process adapts to the direction of the minimum. The coefficient that weights this update term in (4) allows faster convergence of the updating process.
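The two-stage estimation of a single block can be sketched as follows. This is a hedged illustration rather than the authors' code: the names `e3drs_block` and `directional_candidate` are hypothetical, vectors are `(x, y)` tuples, `error` stands for any callable that returns the matching error $E$ of a vector, and the `weight` argument with its default of 1 is an assumption made only for the example. The sketch reuses the error of the stage-1 winner as the centre of the refinement window, so a block with five first-stage candidates costs 5 + 8 = 13 error evaluations.

```python
def e3drs_block(cs1, error):
    """One E3DRS step for a single block.

    cs1   -- candidate vectors (x, y) of the first-stage set CS_1
    error -- callable returning the matching error E of a vector
    Returns (d1, d2, u): the 3DRS choice, the refined vector and the update.
    """
    # Stage 1: 3DRS selection among the CS_1 candidates.
    scored = [(error(c), c) for c in cs1]
    best_error, d1 = min(scored, key=lambda pair: pair[0])

    # Stage 2: refinement over the 3x3 window around d1. The centre (u = 0)
    # was already scored in stage 1, so only 8 new vectors are evaluated.
    d2 = d1
    for uy in (-1, 0, 1):
        for ux in (-1, 0, 1):
            if (ux, uy) == (0, 0):
                continue
            cand = (d1[0] + ux, d1[1] + uy)
            cand_error = error(cand)
            if cand_error < best_error:
                best_error, d2 = cand_error, cand

    u = (d2[0] - d1[0], d2[1] - d1[1])
    return d1, d2, u


def directional_candidate(d2_prev, u_prev, weight=1):
    """Candidate for the next block: the previous refined vector pushed further
    along its own update. `weight` is a hypothetical parameter of this sketch."""
    return (d2_prev[0] + weight * u_prev[0], d2_prev[1] + weight * u_prev[1])
```

Because `d2` and `u` are available as soon as a block is finished, they can be fed straight into the candidate set of the next block, which is what keeps the refinement inside the recursion loop.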
[Figure 1: four panels. (a) "Miss America" and (b) "Mother and Daughter": curves for 3drs, e3drs and fs plotted against 1/Q_p. (c) "Miss America" and (d) "Mother and Daughter": PSNR (dB) curves for e3drs and fs.]

Figure 1: Performance comparison between 3DRS, E3DRS and FS-BMA at integer pixel accuracy for the sequences "Miss America" and "Mother and Daughter".

4 Experimental results

The performance of E3DRS, which has been incorporated into an H.263 video encoder on a VLIW programmable media processor, is now shown for typical videoconferencing sequences. The results were obtained by processing 1 frames of the sequences "Miss America" and "Mother and Daughter" in CIF format, with fixed quantizers Q_i and Q_p (Q_p = Q_i 2), no options, and skipping three out of every four frames. Integer pixel accuracy was set for motion estimation.

Figures 1-a and 1-b compare the rate performance of 3DRS, E3DRS and FS-BMA for the original sequences. In this noise-free situation, E3DRS gains some 2% over 3DRS, whereas FS gains only some 7% over E3DRS at most. Figures 1-c and 1-d depict the RD performance of E3DRS and FS for sequences degraded by additive white Gaussian noise at 2 dB SNR. In this case, the recursive strategy of E3DRS significantly increases the consistency of the estimated motion field and improves the noise robustness of the estimates. As a result, no significant difference exists between E3DRS and FS.
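For readers who want to set up a comparable measurement, the two quantities used above, PSNR and additive white Gaussian noise at a chosen SNR, can be sketched as below. This is an illustration under assumptions, not the measurement code behind Figure 1: the function names `psnr` and `add_awgn` are hypothetical, images are lists of rows of grey levels with a peak value of 255, and the SNR is assumed to be the ratio of mean signal power to noise variance (other definitions, for example one based on the signal variance, would give a different noise level).

```python
import math
import random


def psnr(reference, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    count = len(reference) * len(reference[0])
    mse = sum((r - d) ** 2
              for row_r, row_d in zip(reference, decoded)
              for r, d in zip(row_r, row_d)) / count
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def add_awgn(image, snr_db, rng=random):
    """Add zero-mean white Gaussian noise so that the ratio of mean signal
    power to noise variance equals snr_db. Pixel values are not clipped."""
    count = len(image) * len(image[0])
    signal_power = sum(p * p for row in image for p in row) / count
    sigma = math.sqrt(signal_power / 10.0 ** (snr_db / 10.0))
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in image]
```

The noisy frames would be fed to the encoder, and `psnr` would then compare the decoded frames with a reference of the experimenter's choosing.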
5 Conclusions

We presented a fast recursive motion estimation algorithm suitable for low-cost real-time software implementation of the H.263 videoconferencing system on a VLIW programmable media processor. Owing to its enhanced recursive strategy, coherent vectors are estimated without negatively affecting the accuracy of the resulting motion-compensated prediction. Experimentally, we showed that the proposed algorithm performs comparably to full-search block matching when encoding typical videoconferencing sequences in the presence of additive noise, while the resulting computational load on the processor is greatly reduced.

References

[1] Telenor Research, "TMN (H.263) encoder/decoder, version 1.4a", TMN (H.263) Codec, May 1995.
[2] G. M. Schuster, A. K. Katsaggelos, "A theory for the optimal bit allocation between displacement vector field and displaced frame difference", IEEE J. Select. Areas Commun., vol. 15, pp. 1739-1751, Dec. 1997.
[3] M. C. Chen, A. N. Willson, "Rate-distortion optimal motion estimation algorithms for motion-compensated transform video coding", IEEE Trans. Circuits Syst. Video Technol., vol. 8, pp. 147-158, Apr. 1998.
[4] F. Kossentini, Y. W. Lee, M. J. T. Smith, R. K. Ward, "Predictive RD optimized motion estimation for very low bit-rate video coding", IEEE J. Select. Areas Commun., vol. 15, pp. 1752-1763, Dec. 1997.
[5] G. de Haan, P.W.A.C. Biezen, H. Huijgen, O. A. Ojo, "True motion estimation with 3-D recursive search block matching", IEEE Trans. Circuits Syst. Video Technol., vol. 3, pp. 368-379, Oct. 1993.
[6] G. de Haan, P.W.A.C. Biezen, "Sub-pixel motion estimation with 3-D recursive search block-matching", Signal Processing: Image Communication 6 (1995), pp. 485-498.