An Investigation of 3D Dual-Tree Wavelet Transform for Video Coding

Similar documents
Converting DCT Coefficients to H.264/AVC

A Novel Fast Computing Method for Framelet Coefficients

Intraframe Prediction with Intraframe Update Step for Motion-Compensated Lifted Wavelet Video Coding

Half-Pel Accurate Motion-Compensated Orthogonal Video Transforms

- An Image Coding Algorithm

COMPLEX WAVELET TRANSFORM IN SIGNAL AND IMAGE ANALYSIS

A NEW BASIS SELECTION PARADIGM FOR WAVELET PACKET IMAGE CODING

Design of Image Adaptive Wavelets for Denoising Applications

Scalable color image coding with Matching Pursuit

Rational Coefficient Dual-Tree Complex Wavelet Transform: Design and Implementation Adeel Abbas and Trac D. Tran, Member, IEEE

Analysis of Redundant-Wavelet Multihypothesis for Motion Compensation

SIGNAL COMPRESSION. 8. Lossy image compression: Principle of embedding

Learning an Adaptive Dictionary Structure for Efficient Image Sparse Coding

Activity Mining in Sensor Networks

A WAVELET BASED CODING SCHEME VIA ATOMIC APPROXIMATION AND ADAPTIVE SAMPLING OF THE LOWEST FREQUENCY BAND

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

446 SCIENCE IN CHINA (Series F) Vol. 46 introduced in refs. [6, ]. Based on this inequality, we add normalization condition, symmetric conditions and

Digital Image Processing

Image Denoising using Uniform Curvelet Transform and Complex Gaussian Scale Mixture

Progressive Wavelet Coding of Images

Fast Progressive Wavelet Coding

MATCHING-PURSUIT DICTIONARY PRUNING FOR MPEG-4 VIDEO OBJECT CODING

Estimation Error Bounds for Frame Denoising

Direction-Adaptive Transforms for Coding Prediction Residuals

Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs

Wavelet Scalable Video Codec Part 1: image compression by JPEG2000

Can the sample being transmitted be used to refine its own PDF estimate?

MP3D: Highly Scalable Video Coding Scheme Based on Matching Pursuit

Logarithmic quantisation of wavelet coefficients for improved texture classification performance

ECE533 Digital Image Processing. Embedded Zerotree Wavelet Image Codec

Introduction to Wavelets and Wavelet Transforms

The Application of Legendre Multiwavelet Functions in Image Compression

A New Smoothing Model for Analyzing Array CGH Data

Symmetric Wavelet Tight Frames with Two Generators

Frequency-Domain Design and Implementation of Overcomplete Rational-Dilation Wavelet Transforms

Rate-Constrained Multihypothesis Prediction for Motion-Compensated Video Compression

A Higher-Density Discrete Wavelet Transform

Module 4. Multi-Resolution Analysis. Version 2 ECE IIT, Kharagpur

On the Dual-Tree Complex Wavelet Packet and. Abstract. Index Terms I. INTRODUCTION

Sparse and Shift-Invariant Feature Extraction From Non-Negative Data

Denoising and Compression Using Wavelets

A Video Codec Incorporating Block-Based Multi-Hypothesis Motion-Compensated Prediction

Lifting Parameterisation of the 9/7 Wavelet Filter Bank and its Application in Lossless Image Compression

Module 5 EMBEDDED WAVELET CODING. Version 2 ECE IIT, Kharagpur

Waveform-Based Coding: Outline

SDP APPROXIMATION OF THE HALF DELAY AND THE DESIGN OF HILBERT PAIRS. Bogdan Dumitrescu

Directionlets. Anisotropic Multi-directional Representation of Images with Separable Filtering. Vladan Velisavljević Deutsche Telekom, Laboratories

CONSTRUCTION OF AN ORTHONORMAL COMPLEX MULTIRESOLUTION ANALYSIS. Liying Wei and Thierry Blu

A TWO-STAGE VIDEO CODING FRAMEWORK WITH BOTH SELF-ADAPTIVE REDUNDANT DICTIONARY AND ADAPTIVELY ORTHONORMALIZED DCT BASIS

Proyecto final de carrera

Wavelets, Filter Banks and Multiresolution Signal Processing

DUAL TREE COMPLEX WAVELETS

Basic Principles of Video Coding

Low-Complexity Image Denoising via Analytical Form of Generalized Gaussian Random Vectors in AWGN

This is a repository copy of The effect of quality scalable image compression on robust watermarking.

VARIOUS types of wavelet transform are available for

Deconvolution of Overlapping HPLC Aromatic Hydrocarbons Peaks Using Independent Component Analysis ICA

The Dual-Tree Complex Wavelet Transform A Coherent Framework for Multiscale Signal and Image Processing

Image compression using an edge adapted redundant dictionary and wavelets

Wavelet Packet Based Digital Image Watermarking

UWB Geolocation Techniques for IEEE a Personal Area Networks

A Comparison Between Polynomial and Locally Weighted Regression for Fault Detection and Diagnosis of HVAC Equipment

Implementation of CCSDS Recommended Standard for Image DC Compression

+ (50% contribution by each member)

Estimation-Theoretic Delayed Decoding of Predictively Encoded Video Sequences

Introduction p. 1 Compression Techniques p. 3 Lossless Compression p. 4 Lossy Compression p. 5 Measures of Performance p. 5 Modeling and Coding p.

Learning MMSE Optimal Thresholds for FISTA

SCALABLE 3-D WAVELET VIDEO CODING

EE67I Multimedia Communication Systems

IN recent years, the demand for media-rich applications over

ECE 634: Digital Video Systems Wavelets: 2/21/17

Compression and Coding

Design and Implementation of Multistage Vector Quantization Algorithm of Image compression assistant by Multiwavelet Transform

Minimizing Isotropic Total Variation without Subiterations

The Gamma Variate with Random Shape Parameter and Some Applications

Which wavelet bases are the best for image denoising?

Digital Image Processing

THE currently prevalent video coding framework (e.g. A Novel Video Coding Framework using Self-adaptive Dictionary

Image and Multidimensional Signal Processing

Objective: Reduction of data redundancy. Coding redundancy Interpixel redundancy Psychovisual redundancy Fall LIST 2

Multiscale Image Transforms

Truncation Strategy of Tensor Compressive Sensing for Noisy Video Sequences

Digital Image Processing

MR IMAGE COMPRESSION BY HAAR WAVELET TRANSFORM

VIDEO CODING USING A SELF-ADAPTIVE REDUNDANT DICTIONARY CONSISTING OF SPATIAL AND TEMPORAL PREDICTION CANDIDATES. Author 1 and Author 2

A Generalized Uncertainty Principle and Sparse Representation in Pairs of Bases

EFFICIENT IMAGE COMPRESSION ALGORITHMS USING EVOLVED WAVELETS

Wavelet Footprints: Theory, Algorithms, and Applications

EMBEDDED ZEROTREE WAVELET COMPRESSION

The Iteration-Tuned Dictionary for Sparse Representations

Seismic compression. François G. Meyer Department of Electrical Engineering University of Colorado at Boulder

Multiresolution analysis & wavelets (quick tutorial)

Problem with Fourier. Wavelets: a preview. Fourier Gabor Wavelet. Gabor s proposal. in the transform domain. Sinusoid with a small discontinuity

Wavelets: a preview. February 6, 2003 Acknowledgements: Material compiled from the MATLAB Wavelet Toolbox UG.

DPCM FOR QUANTIZED BLOCK-BASED COMPRESSED SENSING OF IMAGES

Invariant Pattern Recognition using Dual-tree Complex Wavelets and Fourier Features

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments

Satellite image deconvolution using complex wavelet packets

Wavelets and Multiresolution Processing

State of the art Image Compression Techniques

Transcription:

MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com An Investigation of 3D Dual-Tree Wavelet Transform for Video Coding Beibei Wang, Yao Wang, Ivan Selesnick and Anthony Vetro TR2004-132 December 2004 Abstract This paper examines the properties of a recently introduced 3-D dual-tree discrete wavelet transform (DDWT) for video coding. The 3-D DDWT is an attractive video representation because it isolates motion along different directions in separate sub bands. However, it is an over complete transform with 8:1 redundancy. We examine the effectiveness of the iterative projective-based noise-shaping scheme proposed by Kingsbury [3] on reducing the number of coefficients. We also investigate the correlation between sub bands at the same spatial/temporal location, both in the significance map and in actual coefficient value. IEEE International Conference on Image Processing (ICIP) This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved. Copyright c Mitsubishi Electric Research Laboratories, Inc., 2004 201 Broadway, Cambridge, Massachusetts 02139

MERLCoverPageSide2

AN INVESTIGATION OF 3D DUAL-TREE WAVELET TRANSFORM FOR VIDEO CODING Beibei Wang 1, Yao Wang 1, Ivan Selesnick 1 and Anthony Vetro 2 1 Polytechnic University, Electrical and Computer Engineering Dept, Brooklyn, NY 2 Mitsubishi Electric Research Laboratories, Cambridge, MA Email: (bb_w, yao)@vision.poly.edu, selesi@duke.poly.edu, avetro@merl.com ABSTRACT This paper examines the properties of a recently introduced 3-D dual-tree discrete wavelet transform (DDWT) for video coding. The 3-D DDWT is an attractive video representation because it isolates motion along different directions in separate subbands. However, it is an overcomplete transform with 8:1 redundancy. We examine the effectiveness of the iterative projective-based noise shaping scheme proposed by Kingsbury [3] on reducing the number of coefficients. We also investigate the correlation between subbands at the same spatial/temporal location, both in the significance map and in actual coefficient value. 1. INTRODUCTION The standard separable discrete wavelet transform (DWT) provides a multi-resolution representation of a signal. The DWT can be used for a variety of applications such as denoising, restoration and enhancement, and compression. Several recently proposed DWT-based video coders have achieved coding efficiency similar to or better than blockbased hybrid video coders [1]. In addition, such coders provide a scalable representation of the video in spatial resolution, temporal resolution and quality. However, the multidimensional DWT mixes orientations in its subbands, which can lead to checkerboard artifacts at the low bit rate range. An important recent development in wavelet-related research is the design and implementation of 2-D multiscale transforms that represent edges more efficiently than does the DWT. Kingsbury s complex dual-tree wavelet transform (DT-CWT) is an outstanding example [2]. The DT-CWT is an overcomplete transform with m limited redundancy ( 2 :1 for m-dimensional signals). This transform has good directional selectivity and its subband responses are approximately shift-invariant. The 2-D DT-CWT has given superior results for image processing applications compared to the DWT [2,3]. Recently, Selesnick and Sendur introduced a 3-D version of the dual-tree wavelet transform and showed that it has superior motion selectivity [4]. In this paper, we explore the suitability of the 3-D DDWT for video coding. Fig.1 Isosurfaces of a typical 3-D DWT (left) and a typical 3-D DDWT (right). For the 3-D DDWT, each subband corresponds to motion in a specific direction. We start by briefly introducing the 3-D DDWT and its properties for video representation. Section 3 describes how to select significant coefficients for video coding. Section 4 investigates the correlation between bases at the same spatial/temporal location for both the significance map and the actual coefficients. The final section summarizes our work and discusses options for video coding using 3-D DDWT. 2. 3-D DUAL-TREE WAVELET TRANSFORM The design and the motion-selectivity of 3-D dual-tree complex wavelet transform are described in [4]. A Daubechies-like algorithm for the construction of Hilbert pairs of short orthonormal (and biorthogonal) wavelet bases yields pairs of bases, which can be used to efficiently implement the motion-selective wavelet transform [5]. Figure 1 illustrates the difference between the standard 3-D DWT and the 3-D DDWT. The figure depicts the wavelets (i.e. the basis functions) associated with the 3-D DWT and the 3-D DDWT respectively. As illustrated, the 3-D DWT mixes different orientations in one wavelet basis, but the 3-D DDWT is free of this effect. The 3-D DDWT has many more subbands than the

3-D DWT (28 high subbands instead of 7, 4 low subbands instead of 1). Only a subset of the high subbands is drawn in Fig.1 for DDWT. The 28 high subbands isolate 2-D edges with different orientations that are moving in different directions. Because of this motion selectivity, the 3-D DDWT is likely to be substantially more effective for the representation of video than the 3-D DWT. A core element common to all state-of-the art video coders is motion-compensated temporal prediction, which is the main contributor to the complexity as well errorsensitivity of a video encoder. Because the subband coefficients associated with the 3-D DDWT directly capture moving edges in different directions, it may not be necessary to perform motion estimation explicitly. This is our primary motivation for exploring the use of 3-D DDWT for video coding. For all the results in this paper we use the standard Daubechies (9, 7)-tap filters in the DWT and the description of the DDWT is in [4]. Each transform uses three levels of wavelet decomposition. All results are tested on two sequences. Foreman (QCIF) and Mobile- Calendar (CIF). 3. ITERATIVE SELECTION OF COEFFICIENTS The major challenge to apply the 3-D complex DDWT for video coding is it is an overcomplete transform with 8:1 redundancy. In our current study, we chose to retain only the real parts of the wavelet coefficients, which can still lead to perfect reconstruction, while retaining the motion selectivity. This reduces the redundancy to 4:1 [4]. An overcomplete transform is not necessarily ineffective for coding, because a redundant set provides flexibility in choosing which basis functions to use in representing a signal. Even though the transform itself is redundant, the critical coefficients that must be retained to represent a video signal accurately can be smaller than that obtained with standard non-redundant transform. In addition, motion-selective oriented basis functions are likely to lead to between visual quality especially at lower bit rates. The matching pursuit algorithm is a well-known technique for video coding with the overcomplete representations [6]. With matching pursuit, larger coefficients are chosen iteratively to represent a signal, but once the largest coefficient is chosen from the remaining ones, its associated basis is deleted from the set of bases to be considered in the following iterations. Kingsbury proposed an iterative projection-based noise shaping (NS) scheme [3], which modifies previously chosen large coefficients to compensate for the loss of small coefficients. It was shown that noise shaping applied to 2-D DT-CWT can yield a more compact set of coefficients than from the 2-D DWT. In this section, we verify that NS is also effective for the 3-D DDWT. (a) Foreman (QCIF) (b) Mobile-Calendar (CIF) Fig. 2 PSNR (db) vs. number of non-zero coefficients for the DDWT using noise shaping (DDWT_NS, upper curve), the DWT (middle curve), and the DDWT without noise shaping (lower curve). Figure 2 compares the reconstruction quality (in terms of PSNR) using the same number of retained coefficients with DWT, DDWT with noise shaping (DDWT_NS) and DDWT without noise shaping (DDWT_w/o_NS). For a given number of coefficients to retain, N, the results for DWT and DDWT_w/o_NS are obtained by simply choosing the N largest ones from the original coefficients. With DDWT_NS, the coefficients are obtained by running the iterative projection algorithm with a preset initial threshold, and gradually reducing it until the number of remaining coefficients reaches N. Figure 2 shows that although the raw number of coefficients with 3-D DDWT is 4 times more than DWT, this number can be reduced substantially by noise shaping. In fact, with the same number of retained coefficients, DDWT_NS yields higher PSNR than DWT. For foreman, 3-D DDWT_NS has a slightly higher PSNR than the DWT (0.3~0.7 db), and is 4~6 db better than DDWT_w/o_NS. For Mobile-Calendar, the DDWT_NS is 1.5~3.4 db better than the DWT.

4. CORRELATION BETWEEN BASES 4.1. Correlation in Significant Maps Figure 2 shows that with DDWT_NS, we can use fewer coefficients to reach a desired reconstruction quality than DWT. However, this does not necessarily mean that DDWT_NS will require fewer bits. This is because DDWT coefficients are spread over more subbands than DWT, and specifying the location of a DDWT coefficient may require more bits than specifying the location of a DWT coefficient. The success of a wavelet-based coder critically depends on whether the location information can be coded efficiently. We hypothesize that although 3-D DDWT has many more subbands, only a few subbands have significant energy for an object feature. Specifically, an oriented edge moving in a particular direction is likely to generate significant coefficients only in the subbands with the same or adjacent spatial orientation and motion pattern. On the other hand, with the 3-D DWT, a moving object in an arbitrary direction that are not characterized by any specific wavelet basis will likely contribute to many small coefficients in all subbands. To validate this hypothesis, we compute the entropy of the vector consisting of the significance bits at the same spatial/temporal location across 28 high subbands. The significance bit in a particular subband is either 0 or 1 depending on whether the corresponding coefficient is below or above a chosen threshold. This entropy will be close to 28 if there is not much correlation between the 28 subbands. On the other hand, if the pattern that describes which basess are simultanously significant is highly predictable, the entropy should be much lower than 28. Similarly, we calculate the entropy of the significance bits across the 7 high subbands of DWT, and compare it to the maximum value of 7. Figure 3 compares the vector entropies of significant maps associated with DWT, DDWT_NS and DDWT_w/o_NS, for varying thresholds from 128 to 8. The results shown here are for the top scale only; other scales follow the same trend. We see that, with DDWT, even without noise shaping, the vector entropy is much lower than 28. Moreover, noise shaping helps reduce the entropy further. In contrast, with DWT, the vector entropy is close to 7 at some threshold values. Also noteworthy is that, at high thresholds, the entropy for DDWT_NS is quite close to that for DWT. 4.2. Correlation in coefficient values In addition to the correlation among the significance maps of all subbands, we also investigate the correlation between the actual coefficient values. Strong correlation would suggest vector quantization or predictive quantization among the subbands. Towards this goal, we compute the correlation matrix and variance of the 28 high subbands. Figure 4 illustrates the correlation matrices for the top scale, for both the DDWT_w/o_NS and DDWT_NS. We note that the correlation patterns in other scales are similar to this top scale. From these correlation matrices, we find that only a few subbands have stronger correlation, and most other subbands are almost independent. After noise shaping, the correlation between bases is reduced significantly. A greater number of subbands are almost independent from each other. It is interesting to note that, for the Foreman sequence (which has predominantly vertical edges and horizontal motion), bands 9-12 are highly correlated before and after noise shaping. The wavelets associated with these four bands have nearly vertical orientations but all moving in the horizontal direction. (a) Foreman (QCIF) Mobile-Calendar (CIF) (b) Fig. 3 The vector entropy of significant maps using the 3- D DWT (the lowest curve), the DDWT_NS (the middle curve) and the DDWT_w/o_NS (the upper curve), for the top scale. Figure 5 illustrates the energy distribution among the 28 subbands for the top scale with and without noise shaping. The energy distribution pattern depends on the edge and motion patterns in the underlying sequence. For example, the energy is more evenly distributed between different subbands with Mobile-Calendar. Furthermore, noise shaping helps to concentrate the energy into fewer subbands.

previous coefficients. How to deduce such coefficient sets is a challenging open research problem. (a) Foreman(QCIF) (a) Foreman (QCIF) (b) Mobile-Calendar (CIF) Fig. 4 The correlation matrices of the 28 subbands of 3-D DDWT_w/o_NS (left) and DDWT_NS (right). The grayscale is logrithmically related to the absolute value of the correlation. The brighter colors represent higher correlation. 5. CONCLUSION We demonstrated that 3-D DDWT has attractive properties for video representation. Although the 3-D DDWT is an overcomplete transform, the raw number of coefficients can be reduced substantially by applying noise shaping. The fact that noise shaping can reduce the number of coefficients to below that required by DWT (for the same video quality) is very encouraging. The vector entropy study validates our hypothesis that only a few bases have significant energy for an object feature. The relatively low vector entropy suggests that the whereabouts of significant coefficients may be coded efficiently by applying vector Huffman coding to the significance bits across subbands. The fact that coefficient values do not have strong correlation among the subbands, on the other hand, indicates that the benefit from vector coding the magnitude bits across the subbands may be limited. In terms of future work, the correlation among adjacent spatial and temporal coefficients still needs to be explored. Context-based arithmetic coding or quadtree coding may be used to explore such correlation if exists. Finally, with noise shaping, the optimal set of coefficients to be retained changes with the target bit rate. To design a scalable video coder, we would like to have a scalable set of coefficients so that each additional coefficient offers a maximum reduction in distortion without modifying the (b) Mobile-Calendar (CIF) Fig. 5 The relative energy of 3-D DDWT 28 subbands with (the right column in each subband) and without noise shaping (the left column). 6. REFERENCES [1] S-T Hsiang and J. W. Woods, Embedded video coding using invertible motion compensated 3-D subband/wavelet filter bank, Signal Processing: Image Communications, Vol. 16, pp. 705-724, May 2001 [2] N.G. Kingsbury, Complex wavelets for shift invariant analysis and filtering of signals, Applied Computational Harmonic Anal, vol. 10, no. 3, pp. 234-253, May 2001 [3] T H Reeves and N G Kingsbury, Overcomplete image coding using iterative projection-based noise shaping, ICIP 02, Rochester, NY, Sept 2002. [4] I. W. Selesnick, and L. Sendur, Video denoising using 2D and 3D dual-tree complex wavelet transforms, Wavelet Appl Signal Image Proc. X (Proc. SPIE 5207), Aug 2003. [5] I. W. Selesnick, The design of approximate Hilbert transform pairs of wavelet bases, IEEE Trans. on Signal Processing, 50(5): 1144-1152, May 2002 [6] S. G. Mallat, Z. Zhang, Matching pursuits with time frequency dictionaries, IEEE Trans. Signal Processing, Vol. 41, pp. 3397-3415, Dec. 1993.