A Real-Time Wavelet Vector Quantization Algorithm and Its VLSI Architecture

Similar documents
ECE533 Digital Image Processing. Embedded Zerotree Wavelet Image Codec

Module 5 EMBEDDED WAVELET CODING. Version 2 ECE IIT, Kharagpur

SIGNAL COMPRESSION. 8. Lossy image compression: Principle of embedding

+ (50% contribution by each member)

- An Image Coding Algorithm

State of the art Image Compression Techniques

Fast Progressive Wavelet Coding

Progressive Wavelet Coding of Images

A NEW BASIS SELECTION PARADIGM FOR WAVELET PACKET IMAGE CODING

Embedded Zerotree Wavelet (EZW)

MODERN video coding standards, such as H.263, H.264,

Transformation Techniques for Real Time High Speed Implementation of Nonlinear Algorithms

Design and Implementation of Multistage Vector Quantization Algorithm of Image compression assistant by Multiwavelet Transform

LATTICE VECTOR QUANTIZATION FOR IMAGE CODING USING EXPANSION OF CODEBOOK

CHAPTER 3. Transformed Vector Quantization with Orthogonal Polynomials Introduction Vector quantization

Rate-Constrained Multihypothesis Prediction for Motion-Compensated Video Compression

Wavelet Packet Based Digital Image Watermarking

EMBEDDED ZEROTREE WAVELET COMPRESSION

EE67I Multimedia Communication Systems

RESOLUTION SCALABLE AND RANDOM ACCESS DECODABLE IMAGE CODING WITH LOW TIME COMPLEXITY

Vector Quantization Encoder Decoder Original Form image Minimize distortion Table Channel Image Vectors Look-up (X, X i ) X may be a block of l

Vector Quantizers for Reduced Bit-Rate Coding of Correlated Sources

Fast Indexing of Lattice Vectors for Image Compression

Half-Pel Accurate Motion-Compensated Orthogonal Video Transforms

Enhanced Stochastic Bit Reshuffling for Fine Granular Scalable Video Coding

Implementation of CCSDS Recommended Standard for Image DC Compression

Compression and Coding

Intraframe Prediction with Intraframe Update Step for Motion-Compensated Lifted Wavelet Video Coding

Efficient Bit-Channel Reliability Computation for Multi-Mode Polar Code Encoders and Decoders

EE368B Image and Video Compression

Vector Quantization and Subband Coding

A Nonuniform Quantization Scheme for High Speed SAR ADC Architecture

Can the sample being transmitted be used to refine its own PDF estimate?

A WAVELET BASED CODING SCHEME VIA ATOMIC APPROXIMATION AND ADAPTIVE SAMPLING OF THE LOWEST FREQUENCY BAND

Introduction p. 1 Compression Techniques p. 3 Lossless Compression p. 4 Lossy Compression p. 5 Measures of Performance p. 5 Modeling and Coding p.

wavelet scalar by run length and Huæman encoding.

THE newest video coding standard is known as H.264/AVC

Logic and Computer Design Fundamentals. Chapter 8 Sequencing and Control

Wavelets, Filter Banks and Multiresolution Signal Processing

Module 4. Multi-Resolution Analysis. Version 2 ECE IIT, Kharagpur

A Low-Error Statistical Fixed-Width Multiplier and Its Applications

Scalar and Vector Quantization. National Chiao Tung University Chun-Jen Tsai 11/06/2014

Improved Successive Cancellation Flip Decoding of Polar Codes Based on Error Distribution

ON SCALABLE CODING OF HIDDEN MARKOV SOURCES. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

JPEG2000 High-Speed SNR Progressive Decoding Scheme

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 7, NO. 6, JUNE Constrained-Storage Vector Quantization with a Universal Codebook

L. Yaroslavsky. Fundamentals of Digital Image Processing. Course

On Compression Encrypted Data part 2. Prof. Ja-Ling Wu The Graduate Institute of Networking and Multimedia National Taiwan University

Multimedia Communications. Scalar Quantization

A Bit-Plane Decomposition Matrix-Based VLSI Integer Transform Architecture for HEVC

2018/5/3. YU Xiangyu

Low-Complexity FPGA Implementation of Compressive Sensing Reconstruction

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments. Tutorial 1. Acknowledgement and References for lectures 1 to 5

CSE 408 Multimedia Information System Yezhou Yang

Introduction to Wavelet. Based on A. Mukherjee s lecture notes

Multimedia Networking ECE 599

Digital Image Watermarking Algorithm Based on Wavelet Packet

EFFICIENT IMAGE COMPRESSION ALGORITHMS USING EVOLVED WAVELETS

HARDWARE IMPLEMENTATION OF FIR/IIR DIGITAL FILTERS USING INTEGRAL STOCHASTIC COMPUTATION. Arash Ardakani, François Leduc-Primeau and Warren J.

AN IMPROVED LOW LATENCY SYSTOLIC STRUCTURED GALOIS FIELD MULTIPLIER

Review of Quantization. Quantization. Bring in Probability Distribution. L-level Quantization. Uniform partition

INTEGER WAVELET TRANSFORM BASED LOSSLESS AUDIO COMPRESSION

Design and Implementation of Efficient Modulo 2 n +1 Adder

on a per-coecient basis in large images is computationally expensive. Further, the algorithm in [CR95] needs to be rerun, every time a new rate of com

High-Speed Low-Power Viterbi Decoder Design for TCM Decoders

Selective Use Of Multiple Entropy Models In Audio Coding

A High-Yield Area-Power Efficient DWT Hardware for Implantable Neural Interface Applications

Compression and Coding. Theory and Applications Part 1: Fundamentals

On Optimal Coding of Hidden Markov Sources

Multiple Description Coding for quincunx images.

Efficient Alphabet Partitioning Algorithms for Low-complexity Entropy Coding

Vector Quantization. Institut Mines-Telecom. Marco Cagnazzo, MN910 Advanced Compression

Encoder Decoder Design for Feedback Control over the Binary Symmetric Channel

Design of Optimal Quantizers for Distributed Source Coding

An Investigation of 3D Dual-Tree Wavelet Transform for Video Coding

CLASSICAL error control codes have been designed

Design and Comparison of Wallace Multiplier Based on Symmetric Stacking and High speed counters

A Video Codec Incorporating Block-Based Multi-Hypothesis Motion-Compensated Prediction

The Choice of MPEG-4 AAC encoding parameters as a direct function of the perceptual entropy of the audio signal

Run-length & Entropy Coding. Redundancy Removal. Sampling. Quantization. Perform inverse operations at the receiver EEE

ECC for NAND Flash. Osso Vahabzadeh. TexasLDPC Inc. Flash Memory Summit 2017 Santa Clara, CA 1

Wavelet Scalable Video Codec Part 1: image compression by JPEG2000

SCALABLE AUDIO CODING USING WATERMARKING

Fault Tolerance Technique in Huffman Coding applies to Baseline JPEG

School of EECS Seoul National University

High rate soft output Viterbi decoder

Relating Entropy Theory to Test Data Compression

Proyecto final de carrera

Analysis and Synthesis of Weighted-Sum Functions

Study of Wavelet Functions of Discrete Wavelet Transformation in Image Watermarking

Proc. of NCC 2010, Chennai, India

Implementation of Lossless Huffman Coding: Image compression using K-Means algorithm and comparison vs. Random numbers and Message source

<Outline> JPEG 2000 Standard - Overview. Modes of current JPEG. JPEG Part I. JPEG 2000 Standard

Image Data Compression

Quantization 2.1 QUANTIZATION AND THE SOURCE ENCODER

Median architecture by accumulative parallel counters

Basic Principles of Video Coding

I. INTRODUCTION. CMOS Technology: An Introduction to QCA Technology As an. T. Srinivasa Padmaja, C. M. Sri Priya

Transcription:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 10, NO. 3, APRIL 2000 475 A Real-Time Wavelet Vector Quantization Algorithm and Its VLSI Architecture Seung-Kwon Paek and Lee-Sup Kim Abstract In this paper, a real-time wavelet image compression algorithm using vector quantization and its VLSI architecture are proposed. The proposed zerotree wavelet vector quantization (WVQ) algorithm focuses on the problem of how to reduce the computation time to encode wavelet images with high coding efficiency. A conventional wavelet image-compression algorithm exploits the tree structure of wavelet coefficients coupled with scalar quantization. However, they can not provide the real-time computation because they use iterative methods to decide zerotrees. In contrast, the zerotree WVQ algorithm predicts in real-time zero-vector trees of insignificant wavelet vectors by a noniterative decision rule and then encodes significant wavelet vectors by the classified VQ. These cause the zerotree WVQ algorithm to provide the best compromise between the coding performance and the computation time. The noniterative decision rule was extracted by the simulation results, which are based on the statistical characteristics of wavelet images. Moreover, the zerotree WVQ exploits the multistage VQ to encode the lowest frequency subband, which is generally known to be robust to wireless channel errors. The proposed WVQ VLSI architecture has only one VQ module to execute in real-time the proposed zerotree WVQ algorithm by utilizing the vacant cycles for zero-vector trees which are not transmitted. And the VQ module has only +1 processing elements (PE's) for the real-time minimum distance calculation, where the codebook size is. PE's are for Euclidean distance calculation and a PE is for parallel distance comparison. Compared with conventional architectures, the proposed VLSI architectures has very cost-effective hardware (H/W) to calculate zerotree WVQ algorithm in real time. Therefore, the zerotree WVQ algorithm and its VLSI architectures are very suitable to wireless image communication, because they provide high coding efficiency, real-time computation, and cost-effective H/W. Index Terms Image compression, vector quantizatioin, VLSI architecture, wavelet, wireless communications. I. INTRODUCTION IMAGE communication over wireless transmission channels is an emerging technology, and there are several challenging issues to overcome current technology limitations. The first requirement for wireless image communication is high compression efficiency to satisfy the limited bandwidth of wireless channels. Conventional standards for video compression, such as JPEG, MPEG, and H.263, have good coding efficiency but still require a much larger bandwidth than that of wireless communication channels to be currently supported. Secondly, the Manuscript received July 8, 1998; revised June 21, 1999. This paper was recommended by Associate Editor J. Villasenor. S.-K. Paek is with Samsung Electronics Company, Kiheung, Korea (e-mail: skpaek@samsung.co.kr). L.-S. Kim is with the Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Taejon 305-701, Korea. Publisher Item Identifier S 1051-8215(00)02803-2. image-compression techniques robust to transmission channel errors are essential to wireless image communication, because wireless communication channels suffer from burst errors in which a large number of consecutive bits are lost or corrupted by the channel-fading effect. The conventional image-coding standards are very susceptible to transmission errors, and hence, they need powerful error-correction codes. Therefore, it is desirable to design a robust image-coding technique, which has a high compression ratio and produces acceptable image quality over a fading channel. Finally, we should consider image compression algorithms and their VLSI architectures which allow portable decoders with small size, low-power consumption, and acceptable reconstructed image quality. Since the introduction of wavelets as a signal processing tool, considerable attention has focused on the application of wavelets to image compression [1] [6]. Initial wavelet-based coding algorithms were designed to exploit the energy-compaction properties of the wavelet transform by applying scalar or vector quantizers for the statistics of each frequency band of wavelet coefficients. Such algorithms have demonstrated modest improvements in coding efficiency over standard transform-based algorithms. Recently, a new class of wavelet image-coding algorithms, which exploit both the frequency and spatial-compaction property of the wavelet transform, has attracted wide attention because its good performance appears to confirm the promised coding efficiencies [4] [6]. These algorithms couple standard scalar quantization of frequency coefficients with spatial tree-structured quantization, and set to zero a tree-structured set where all the high-frequency wavelet coefficients in the tree-structured set have value zero. We call the tree-structured set of zero coefficients as zerotree. The zerotree quantization scheme improves coding gains drastically because only the tree-map informations for the zerotrees are transmitted. Shapiro [4] developed embedded zerotree wavelet (EZW) algorithm which uses the zerotree coding and the successive approximation in an iterative scalar quantization approach. Zero-tree coding divides and conquers the coefficients efficiently using the zerotree structure and thus provides a compact multiresolution representation of significance maps, which are binary maps indicating the position of the significant coefficients. The Set Partitioning in Hierarchical Trees (SPIHT) algorithm by Said [5] used zerotree coding techniques like the EZW algorithm. However, the SPIHT algorithm exploited efficient progressive transmission scheme which orders the coefficient by magnitude and transmits the most significant bit first. Xiong [6] formalized the problem of joint optimization of zerotree scalar quantization. These zerotree wavelet image coding algorithms are based on the successive approximation 1051 8215/00$10.00 2000 IEEE

476 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 10, NO. 3, APRIL 2000 scalar quantization using the self-similarity of the wavelet image transform and are considered to have the best performance in the literature. However, these algorithms exploit the iterative successive approximation and order coefficients by magnitude every iteration step to search for zerotrees. Thus, it takes too much time to encode motion pictures using these zerotree wavelet image coding algorithms in real time. This paper focuses on the problem of reducing the computation time to encode wavelet images using a zerotree quantization scheme with maintaining the picture quality. We propose a real-time wavelet image coding algorithm, which we call a zerotree wavelet vector quantization (WVQ) algorithm. The zerotree WVQ algorithm searches for zerotrees in real time by a noniterative method and applies the zerotree structure to vector quantization schemes. Our algorithm computes the zerotree wavelet image coding in real time, which the iterative algorithms in the literature cannot support. Furthermore, our noniterative method keeps almost the same coding performance as the conventional iterative methods. The real-time zerotree WVQ algorithm combines the multistage VQ (MSVQ) and the classified VQ (CVQ). The MSVQ encodes the lowest frequency subband of wavelet images and has the advantage of robustness to wireless channel errors [7]. The CVQ encodes the other higher frequency subbands using a decision rule which predicts zero-vector trees in real-time and hence has the high coding efficiency. Vector quantization have theoretically better coding performance than scalar quantization. Furthermore, VQ decoding hardware (H/W) has very simple architecture. Thus these two properties of VQ, high coding efficiency and simple decoding H/W, make it suitable to wireless image communication. Nevertheless, the vector quantization has not been exploited in practical implementation because it needs complex encoding H/W and large storage size. This paper focuses on the problem of reducing the encoding H/W complexity of the proposed zerotree WVQ algorithm. Generally, -stage MSVQ VLSI architecture requires VQ modules, each of which searches for the best codeword with the minimum distortion in the codebook. But, the proposed WVQ VLSI architecture has only one VQ module to perform the -stage MSVQ and the CVQ in real time by utilizing the vacant cycles for zero-vector trees which are not transmitted. Thus, the proposed VLSI architecture reduces the VQ H/W size to approximately of the conventional VLSI architectures. Furthermore, the proposed VQ module has only processing elements (PE's) for the real-time minimum distance calculation where the codebook size is. PE's are for Euclidean distance calculation and a PE is for parallel distance comparison. The distance-comparison PE converts all the Euclidean distance values from inner-product PE's to bit-serial sequences and then compares these bits with each other in parallel. Compared with conventional architectures, the proposed VLSI architecture for the zerotree WVQ is very cost-effective and calculates zerotree WVQ in real-time. We had proposed the semi-recursive 2-D discrete wavelet transform (DWT) VLSI architecture for the real-time wavelet transform [8], [9]. Therefore, the semi-recursive 2-D DWT VLSI architecture and the proposed cost-effective WVQ VLSI architecture execute Fig. 1. Same orientation cross-band vector from octave-band subbands. the two-dimensional discrete wavelet transform of images and the proposed zerotree WVQ algorithm of the wavelet images in real-time. Section II begins by defining the tree-structured hierarchy of wavelet coefficients related to spatial regions of images and wavelet self-similarity. We then define zerotree WVQ and describe the proposed zerotree WVQ algorithm, which combines the MSVQ and the CVQ. Section III describes the overall VLSI architecture of the proposed zerotree WVQ and then details of the VQ module which calculate the minimum distortion. Finally, Section IV presents the simulation results of the proposed algorithm and its VLSI architecture. II. WVQ ALGORITHM A. Definition of the Tree Structure In the hierarchical subband image decomposition using wavelet multiresolution analysis, each coefficient, except those coefficients in the lowest subband and the three highest subbands, is exactly related to 2 2 coefficients of the immediately higher subband. These four children coefficients correspond to the same orientation and spatial location as the parent coefficient does in the original image. Each of the four children coefficients is in turn related to 2 2 coefficients in the next higher subband and so on. These coefficients are collectively called the descendants of the parent and a residue tree is defined as the set of all descendants of a parent coefficient in a tree. Fig. 1 shows the tree-structured set of coefficients in wavelet image decomposition. In wavelet image decomposition, it is often true that when a coefficient has a magnitude less than some threshold value, all of its descendants, its residue tree, will have magnitude less than, too. The coefficient with magnitude less than is called an insignificant coefficient, and the coefficient with magnitude greater than is called a significant coefficient. The collection of these insignificant coefficients is then called the zerotree, with respect to, and the coefficient is called as the zerotree root. Thus, the zerotree is defined as an all-zero residue tree. In the literature, a few studies attempted to apply the zerotree idea to scalar quantization in wavelet image coding [4] [6]. In this paper, an insignificant coefficient and the zerotree in scalar quantization are extended to an insignificant vector

PAEK AND KIM: REAL-TIME WAVELET VECTOR QUANTIZATION ALGORITHM 477 Fig. 2. Flowchart of the noniterative search of zero-vector trees. and the zero-vector tree in vector quantization of wavelet coefficients, respectively. The vector is called the insignificant vector, where all of its elements are insignificant coefficients. The zero-vector tree is defined as the residue tree of an insignificant parent vector where all of its descendant vectors are insignificant vectors, and the insignificant parent vector is defined as the zero-vector tree root. That is, a residue tree of wavelet coefficient vectors is a zero-vector tree when a parent vector is an insignificant vector and all of its descendant vectors with the same orientation and spatial location as the parent vector are insignificant vectors, too. B. Non-Iterative Searching of Zero-Vector Trees In this paper, we propose the zerotree WVQ algorithm. In the zerotree WVQ algorithm, we should decide first whether an vector of wavelet coefficients to be coded is significant or insignificant. We decide the vector is insignificant when all the wavelet coefficients in the vector have the magnitudes less than some threshold value. That is, an vector is decided to be insignificant when all the elements of the vector are insignificant coefficients. In other cases, the vector is decided to be significant if there is at least one significant coefficient. The occurrence probability of insignificant vectors in wavelet images depends on the statistical characteristics of the images, frequency bands, the vector dimension, and the threshold value. Generally, the occurrence probability of insignificant vectors in three-level wavelet decomposition of images is 0.15 0.2 at at, or 0.4 0.9 at, where the resolution of each frequency subband is. We will show the statistical characteristics of insignificant vectors and the zero-vector trees in Section IV. As we can see from the simulation results in Section IV, there are many insignificant vectors in wavelet images. Especially, most of the vectors in the highest frequency subbands with the finest resolution are insignificant. Therefore, the coding efficiency can be increased dramatically if the insignificant vectors cannot be transmitted to the decoder. The zerotree wavelet algorithms using scalar quantizations in the literature search for all the insignificant coefficients iteratively and organize the zerotree binary map, which depicts the position of the insignificant coefficients. They then transmit the zerotree map instead of the quantized values of the insignificant coefficients, which causes the coding efficiency to be improved greatly. However, those earlier zerotree wavelet image-coding algorithms take too much time to encode motion pictures in real time, because they search for the zerotrees by iterative methods. Additionally, they should transmit side information of the zerotree map to the decoder, in addition to the quantized value of the significant coefficients. In contrast, the proposed zerotree WVQ algorithm decides the zero-vector trees in real time by a noniterative method. Furthermore, it transmits only the quantized value, i.e., the codebook index of the significant vectors, and no side information of the zero-vector tree map. In the zerotree WVQ algorithm, searching for the insignificant vectors is started in the lowest frequency subbands except the baseband, and is then continued in the next higher frequency subbands. When a vector is decided as the insignificant vector during the

478 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 10, NO. 3, APRIL 2000 Fig. 3. Proposed WVQ image-coding scheme. search, the vector and its residue tree are considered to be a zerotree root vector and a zero-vector tree, respectively. All the descendant vectors of the zerotree root vector are considered as insignificant vectors, and hence, are thrown away, not to be transmitted to the decoder. Therefore, only the escape code for the zerotree root vector instead of the zero-vector tree map is followed by the entropy coding and are then transmitted to the decoder for all the vectors of the zero-vector tree. When a vector to be searched is decided to be significant vector, the codebook index of the vector is followed by the entropy coding and is then transmitted to the decoder. When a vector to be searched is the vector whose parent vector was decided to be a zerotree root vector or one of its descendant insignificant vectors, the vector is automatically considered as the insignificant vector, then thrown away not to be transmitted. This search steps are executed sequentially and noniteratively. It causes the zerotree WVQ algorithm to be executed in real time. The flowchart of noniterative search of zero-vector tree is shown in Fig. 2. It is possible there are significant descendant vectors, although their parent vector is insignificant. In this case, our algorithm puts the wrong decision in predicting whether the descendant vectors are significant or not. This inevitable misprediction is caused by the noniterative searching method. Compared with the iterative searching method, the coding performance of the noniterative searching method is decreased slightly by the misprediction. However, the conditional probability Insignificant Vector at Insignificant Vector at is about 0.90 at or at. So, the ratio of misprediction is about 0.1 at or at. This means that if parent vector in a subband with resolution is insignificant, most of their descendant vectors in the immediately next higher frequency subband are insignificant, too. Therefore, in spite of its misprediction, it is reasonable that our algorithm decides the residue tree of an insignificant vector as a zero-vector tree for real time and noniterative computations of the zerotree WVQ. C. Image Compression Using WVQ We propose the real-time WVQ algorithm which combines the MSVQ and the CVQ in Fig. 3. In a wavelet-decomposed image, most of the information is concentrated in the lowest frequency subband. Any distortion in the lowest frequency subband causes critical defects on the quality of the reconstructed image. This defect is more serious in wireless image communication, where a large number of consecutive bits are lost or corrupted by the channel fading effect [7] [11]. In this paper, the MSVQ is employed to encode the lowest frequency subband, which alleviates the impact of channel fading on the lowest frequency subband. The multistage VQ consists of several consecutive VQ stages where each stage has a small size codebook. The first VQ stage performs a relatively crude quantization of the input vector and then the second VQ stage quantizes the error vector between the input vector and the first VQ stage output vector. The quantized error provides a second approximation to the input vector, leading to a refined representation of the input vector. The third VQ stage quantizes the second-stage error vector to provide a further refinement, and so on. In the MSVQ, most of the information in the lowest frequency subband is encoded by the first stage VQ, and hence, is reconstructed by the decoder if the information of the first stage VQ is protected from channel fading errors. Therefore, the MSVQ is robust to transmission channel errors and is very effective to compress the wavelet images for wireless image communication [7].

PAEK AND KIM: REAL-TIME WAVELET VECTOR QUANTIZATION ALGORITHM 479 (a) (b) Fig. 4. Multistage VQ structure. (a) Encoder block diagram. (b) Decoder block diagram. Fig. 5. VQ VLSI architecture for MSVQ and CVQ. The CVQ is employed to encode the other higher frequency subbands. In the typical CVQ, the input vectors are classified by classifiers into several classes which have different codebooks, and are then quantized by the codebook corresponding to the class of each input vector. The codebook of each class is designed by a prior knowledge of statistical characteristics of the input vectors, and hence, the quantization distortion in the CVQ is relatively small. On the other hand, the computational complexity or H/W size cannot but increase because the CVQ should have the classifier. In the wavelet image decomposition, horizontal and vertical 1-D DWT filters classify the anomalies such as edges or object boundaries by their orientations and frequency subbands. These intrinsic filter characteristics make the CVQ adopted in wavelet image coding without classifiers and additional H/W s. Especially, the CVQ employed in this paper exploits the noniterative searching for zero-vector trees which we mentioned above. That is, the input vector decided as the significant vector by the noniterative search method is quantized by the CVQ and then its codebook index, followed by the entropy coding, is transmitted to the decoder. In contrast, the zero-vector trees decided as the insignificant vectors by the noniterative search method are not transmitted to the decoder, and only the escape codes for the zerotree root vectors, followed by the entropy coding, are transmitted to the decoder. Therefore, the coding performance of the proposed CVQ is much better than that of the typical CVQ.

480 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 10, NO. 3, APRIL 2000 III. WVQ VLSI ARCHITECTURE A vector quantization of dimension and size is defined as mapping from a vector in a -dimensional Euclidean space into one of the reproduction vectors, called codewords. The best codeword for the input vector is to minimize a Euclidean distance between the input vector and a codeword in the codebook, which is defined as (1) is neglected in distance comparison and is stored in a memory with a pre-determined value. Thus, the minimum distance calculation (MDC) is defined as Fig. 6. Cost-effective VQ VLSI architecture for MSVQ and CVQ. (2) Generally, the VQ encoding VLSI architecture consists of codebook storages and the MDC units to search for the best codeword. The MDC unit requires many computations of multiplication and addition to calculate the Euclidean distance and many computations of comparison to decide the minimum distance among the Euclidean distance values. Also, real-time MDC is essential to encode moving pictures. Hence, it is very important to implement the the real-time VQ encoding H/W with a small size of MDC. A. Cost-Effective VLSI Architecture for MSVQ CVQ In the proposed WVQ algorithm, we exploit the MSVQ in the lowest frequency subband and the CVQ coupled with the noniterative zero-vector tree search method in the other higher frequency subbands. Conventional -stage MSVQ VLSI architecture in Fig. 4 should have MDC units for real-time computation. For example, the VLSI architecture combining the 3-stage MSVQ and the CVQ should have three MDC units for the MSVQ and one MDC unit for the CVQ. Although the MDC unit for the CVQ is shared with the MDC units for the MSVQ by time-multiplexing, there should be at least three MDC units as in Fig. 5 However, the proposed cost-effective WVQ VLSI architecture in Fig. 6 has only one MDC unit to compute, in real time, the proposed WVQ algorithm, which combines the 3-stage MSVQ and the CVQ. In the figure, the wavelet image in the lowest subband is quantized by the MDC unit and VQ codebooks and then the error image is stored in a buffer and is then quantized again. After the MSVQ of the lowest subband wavelet image, the CVQ of the other subbands is executed in the MDC unit. For real-time VQ computation, the throughput of the MDC unit should be at least one vector element/cycle. When the MDC unit is assumed to have the throughput of one, the -stage MSVQ with MDC units, as in Fig. 5, has the same throughput of one because it has a fully pipelined architecture. In contrast, the -stage MSVQ with only one MDC unit, as in Fig. 6, generally has the throughput of because it calculates each stage of the MSVQ recursively. Therefore, if it is assumed that vector dimension is and the number of vectors in the lowest frequency subband is, the -stage MSVQ Fig. 7. (a) (b) PE VLSI architecture. (a) IPU. (b) DCU. requires cycles to quantize the vectors in the WVQ VLSI architecture with MDC units. On the other hand, it requires cycles in the WVQ VLSI architecture with only one MDC unit. So cycles should be stalled to execute the CVQ

PAEK AND KIM: REAL-TIME WAVELET VECTOR QUANTIZATION ALGORITHM 481 Fig. 9. High-performance MDC VLSI architecture. (a) Fig. 8. Three types of systolic architecture. (a) Throughput = k. (b) Throughput = 1. (b) Fig. 10. Parallel comparison DCU with 1-bit serial input. in the next higher frequency subband, and thus, the throughput of is insufficient for the real-time -stage MSVQ. In the proposed WVQ algorithm, the CVQ exploits the noniterative searching for zero-vector trees, and hence, skips the insignificant vectors in zero-vector trees which are not transmitted. Therefore, there are many vacant cycles which are caused by the skipped insignificant vectors. The cost-effective WVQ VLSI architecture utilizes these vacant cycles for real-time WVQ. That is, vacant cycles in the CVQ produce faster throughput than the throughput required to compute the CVQ in real time. This faster throughput of the CVQ compensates insufficient throughput of the MSVQ. Only if more than the vectors to be quantized by the CVQ are decided as the skipped insignificant vectors, the cycles stalled by the -stage MSVQ are compensated by the CVQ and thus the WVQ algorithm with -stage MSVQ, and the CVQ is executed in real time by the cost-effective WVQ VLSI architecture. Actually, as we will show the simulation results, the number of skipped insignificant vectors in the CVQ is much more than the, where is smaller than five. In conclusion, the proposed cost-effective WVQ VLSI architecture has the minimum H/W cost of the only one MDC unit and supports the real-time execution of the proposed WVQ algorithm with the -stage MSVQ and the CVQ. B. VLSI Architecture for Minimum Distance Calculation The MDC consists of two kinds of PE's. One is the inner product unit (IPU) to calculate the inner product in of (2). And the other is the distance comparison unit (DCU) to decide the maximum value of. The typical VLSI architectures of IPU and DCU are depicted in Fig. 7. In the IPU, an input vector element is multiplied by a codeword element, and then the result is added to the in the first cycle, or to the accumulated sum in the other cycles. The final result of are produced in the last cycle. In the DCU, the of current stage and the of previous stage are compared, and then the larger value between them and its codebook index are selected. In the literature, there are several VQ VLSI architectures using those two basic processing elements. The 2-D systolic VQ architecture in Fig. 8(a) has the throughput of but has the very large size H/W with PE's [12]. The modified architecture in Fig. 8 (bagamel) has the medium-size H/W of and the throughput of one, which supports real-time MDC [13], but it still has a long latency of and large cost of H/W. In this paper, we propose a new MDC VLSI architecture which is shown in Fig. 9. It has the throughput of one and the latency of.ithas PE's, of which one is for the DCU

482 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 10, NO. 3, APRIL 2000 Fig. 11. Parallel comparison DCU with 3-bit serial input. and the others are for the IPU. Therefore, the proposed MDC architecture provides real-time MDC with smaller latency and smaller size H/W, compared with the conventional MDC architectures. The MDC architecture comparison will be shown in Section IV. In the proposed MDC architecture, inner products of the input vector and the codeword are calculated in parallel by the IPU's. At the th stage, the vector element and the codeword element are sent to the th IPU and the following is calculated in the th IPU: where (3) After the latency of, the th IPU produces the output value of the in (2) and then sends the to the DCU. The DCU compares all the distance values from IPU s concurrently and then produces the best codeword with the minimum distance with the latency of.

PAEK AND KIM: REAL-TIME WAVELET VECTOR QUANTIZATION ALGORITHM 483 (a) (b) Fig. 12. Probability of insignificant vectors in wavelet images. The IPU of Fig. 9 has the same architecture as Fig. 7(a). However, we propose a new architecture for the DCU, which modifies a conventional winner-take-all (WTA) architecture [14] by using a status register (SR) with a tri-state buffer (/OE). It causes the proposed DCU to have a smaller gate count than the sequential logic in [14]. In addition, the new DCU architecture compares distance values not only by 1 bit, but also by bits, where the distance values have the bit width of

484 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 10, NO. 3, APRIL 2000 Fig. 13. Insignificant vectors in wavelet-decomposed Lena image.. Therefore, the new DCU executes the real-time distance comparison, although the vector dimension is smaller than the bit-width, which the conventional WTA cannot support. The proposed DCU is depicted in Figs. 10 and 11. In the DCU of Fig. 10, the from a IPU is converted to a bit-serial input, which means the th bit of, and then the input bits from all the IPU's are compared with each other. The th status register contains 0 state when the from the th IPU has the largest value among the up to the current th bit comparison stage. Otherwise, contains 1 state. The distance comparison in the DCU starts from the most significant bit. Therefore, when the previous state of at the th bit-comparison stage was 0, the current state of at the th bit-comparison stage should be 0 whatever the input bit has. On the other hand, the current state is updated according to the input bit when the previous state is 1. At the final stage of, there is only one SR with state 1 and the input distance to the SR is decided as the minimum distance, which is stored in the register. Ifthe vector dimension is smaller than the bit-width, the DCU with 1-bit serial input in Fig. 10 cannot compare the distance values in real time, because 1-bit serial conversion requires cycles but the IPU requires cycles to calculate the Euclidean distance for an input vector. Thus, it should be required for the real-time comparison to convert the output of the IPU to the -bit serial input of the DCU. Figure 11 shows the -bit comparison DCU in case of. The proposed -bit comparison DCU converts the distance value to -bit serial input and then compares these -bit serial input values from the IPU's with each other every clock cycle. The -bit comparison DCU consists of 1-bit serial comparison units as shown in Fig. 10 and additional logics for the comparison of the outputs of 1-bit serial comparison. The additional comparison logics begin to compare the most significant bits (MSB's). If there is only one SR with state 1 among all the SR's for the MSB's, they produce the states of those SR's. However, if those SR's have the same states, they continue to compare the next MSB's. These comparison steps are continued upto the comparison of the LSB's. IV. SIMULATION RESULTS Experiments are performed on various 512 512 grey-scale images to test the proposed zerotree WVQ algorithm at several

PAEK AND KIM: REAL-TIME WAVELET VECTOR QUANTIZATION ALGORITHM 485 Fig. 14. Conditional probability of zero-vector trees. bit rates. We use the 7 9 biorthogonal set of [2] in all our experiments, and a 3-scale wavelet decomposition with the coarsest low-pass band having dimension 64 64. This lowest frequency band is coded by MSVQ. The other remaining frequency bands are coded by CVQ using the noniterative search of zero-vector trees. For the CVQ, we classify the remaining frequency bands except the coarsest lowpass band into nine classes according to their orientation and scale. We classify their orientation into horizontal, vertical, and diagonal and also classify their scale into three levels. If we define the class as the set of its orientation and scale, the classes from class 0 to 8 are denoted as, respectively. For decoding, bands are scanned from coarse to fine scale, so that no descendant vectors are output before their parent vector. First, we illustrate several statistical characteristics of the set of wavelet coefficients, which are closely related with the proposed noniterative algorithm to search for the zero-vector trees. Second, we compare the coding performance of the our WVQ algorithm with some of the conventional wavelet image-compression algorithms by Xiong [6], Shapiro [4], Antonini [2], and Lewis [3].Then, we show how our cost-effective WVQ VLSI architecture is proved to execute with the only one MDC module the CVQ coupled with the -stage MSVQ in real time. Finally, we compare the H/W performance of the proposed MDC VLSI architecture with the conventional MDC architectures. A. Coding Performance The proposed zerotree WVQ algorithm exploits intrinsic characteristics of the self-similarity of wavelet coefficients. The occurrence probabilities of insignificant wavelet coefficients and insignificant vectors are illustrated in Fig. 12. In the case of Fig. 12(a), where threshold value at and is 6, 10, and 16, respectively, the probability of insignificant coefficients in each class is about 0.42 0.98 and the probability of insignificant vectors with dimension of 2 2 is about 0.14 0.96. In contrast, the probability of insignificant vectors with dimension of 4 4 is about 0.03 0.92. Also, as shown in Fig. 12(b), the probabilities of insignificant vectors decrease as decreases. Therefore, we can see the probability of insignificant vectors increases as the frequency scale is finer, the vector dimension decreases, and threshold values increase. However, small vector dimension produces large number of bits to be coded, and large threshold value causes degradation of picture quality. Consequently, there should be a tradeoff to determine the vector dimension and threshold values. Fig, 13 shows the distribution of insignificant vectors in wavelet decomposed Lena image when the vector dimension is 2 2 and at, and is 6, 10, and 16, respectively. The dark region represents the insignificant vector. We can see most of the vectors in the finest scale subbands are insignificant. The proposed zerotree WVQ algorithm exploits the tree-structured set of insignificant vectors to reduce the number

486 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 10, NO. 3, APRIL 2000 Fig. 15. Comparison of coding performance. of bits to be transmitted. The probability of zero-vector trees are depicted in Fig. 14. The probability of zerotree root vector means the conditional probability that a vector is a zerotree root vector assuming the vector is insignificant. From Fig. 14, most of the insignificant vectors in class 0, 1, 2 where are proved to formulate the zero-vector trees. So, it is reasonable that our algorithm decides noniteratively the residue tree of an insignificant vector as zero-vector tree for real-time computation of the WVQ. But actually, there are insignificant parent vectors, whose descendant vectors are not insignificant. It leads to misprediction of zero-vector trees, which are depicted in Fig. 14, too. As the scale becomes finer, the probability of misprediction becomes lower, and hence, is negligible. The probability of descendant zero-vectors means the conditional probability that a vector is a descendant zero-vector whose parent vectors are also insignificant, assuming the vector is insignificant. As shown in Fig. 14, the probability of descendant zero-vectors in class 6, 7, and 8 is very high. Thus, most of the insignificant vectors in the fine scale are descendant zero-vectors which are included in zero-vector trees. Moreover, as shown in Fig. 12(a), most of the vectors in the fine scale are insignificant. Therefore, our algorithm significantly reduces the number of bits to be coded, because our algorithm does not transmit any information for the descendant zero-vectors and just transmits the escape code for the zerotree root vectors. The zerotree WVQ algorithm consists of the -stage MSVQ and the CVQ using the noniterative search of zero-vector trees. However, we apply in the simulation 1-stage MSVQ, i.e., a typical VQ for -stage MSVQ to compare the coding performance of our zerotree WVQ algorithm more objectively with the conventional wavelet image-compression algorithms. We use various vector dimensions and codebook sizes for various bit rates, and a fixed set of threshold values, which is at. We also apply entropy coding to the codebook indices and escape codes, which are produced by the outputs of our zerotree WVQ algorithm. For the vector quantization, we made various codebooks from ten training images. We compared the performance of our algorithm with some of conventional wavelet image compression algorithms by Xiong [6], Shapiro [4], Antonini [2], and Lewis [3]. The SFQ algorithm by Xiong [6] and the EZW algorithm by Shapiro [4] are known to be some of the best wavelet image-compression algorithms in the literature, but they use iterative methods to search the zerotree map. Antonini [2] applied classified VQ to the wavelet images and Lewis [3] proposed an wavelet image-compression algorithm suitable to VLSI implementation. Fig. 15 shows the comparison of coding performance of wavelet image-compression algorithms. Simulation shows that our algorithm has slightly lower coding performance than the SFQ and the EZW using iterative methods, but is competitive with Antonini's and Lewis's algorithms. Therefore, our algorithm makes the best tradeoff between the coding performance and the real-time computation of wavelet image compression. Examples of simulation results of our algorithm are shown in Figs. 16 and 17.

PAEK AND KIM: REAL-TIME WAVELET VECTOR QUANTIZATION ALGORITHM 487 (a) (a) Fig. 16. Example of simulation result. (a) Original Lena image (512 2 512). (b) PSNR = 33.12 db at 0.255 bits/pixel. (b) B. VLSI Architecture Comparison The proposed cost-effective WVQ VLSI architecture in Fig. 6 has only one MDC unit to compute, in real time, the proposed zerotree WVQ algorithm, which combines the -stage MSVQ and the CVQ. Only if more than the vectors to be quantized by the CVQ are decided as the skipped insignificant vectors, the cycles stalled by the -stage MSVQ are compensated by the CVQ. Thus, the WVQ algorithm with -stage MSVQ and the CVQ is executed in real time by the cost-effective WVQ VLSI architecture. Actually, as shown in Figs. 12(a) and 14, the probability of insignificant vectors at is at least 0.9 and the conditional probability of descendant insignificant vectors at is at least 0.6. So, the total number of skipped insignificant vectors in class 6 8isat least 0.6 0.9 (the total number of vectors in class 6 8. In the case of 3-scale wavelet decomposition of a 512 512 image and the vector dimension 2 2, the total number of vectors in class 6 8is. On the other hand, is. Therefore, the simulation results shows (b) Fig. 17. Example of simulation result. (a) Original Glodhill image (512 2 512). (b) PSNR = 32.00 db at 0.478 bits/pixel. the total number of skipped insignificant vectors in class 6 8 is much greater than. It means that the cost-effective WVQ VLSI architecture executes the zerotree WVQ algorithm in real-time with only one MDC unit. The cost-effective VLSI architecture with a MDC unit reduces the number of functional units such as multipliers and adders to, compared with the conventional VLSI architecture with MDC units. We compared the H/W complexity and the processing speed of the 2-D systolic architecture [12], the modified systolic architecture [13], and the proposed MDC architecture in Fig. 9, all of which supports the real-time vector quantization. The comparison results are shown in Table I. The proposed MDC architecture has processing elements of IPU and each of the IPU has a multiplier, an adder and a register. Thus, the proposed MDC architecture has the minimum H/W complexity of the IPU among the three architectures. The 2-D and the modified systolic architectures have the same DCU architecture, and each of the DCU has an -bit comparator and two registers. So, the DCU in [12] or [13] has about 14 equivalent gates, where 6 gates

488 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 10, NO. 3, APRIL 2000 TABLE I COMPARISON OF MDC VLSI ARCHITECTURE are for the comparators and 8 gates are for the registers. On the other hand, the DCU in the proposed MDC architecture has about 7 gates, where 4 gates are for the bit-serial conversion, 3 gates are for the modified WTA in Fig. 10(b), and 5 gates are for the final state encoding. In the case of, and, the DCU in [12] or [13] has about 22 000 gates, whereas the the DCU in the proposed MDC has about 11 500 gates. Furthermore, the proposed MDC architecture has the shortest latency of among the three architectures. V. CONCLUSION This paper describes a new image-compression scheme combining the wavelet transform and vector quantization, which is suitable to wireless image communication. The proposed WVQ algorithm and its VLSI architecture are satisfied with some requirements such high coding efficiency, real-time computation, and small-size H/W for wireless image communication. The proposed zerotree WVQ algorithm focused on the problem of how to reduce the computation time to encode images with high coding efficiency. The conventional wavelet image-compression algorithm exploits the tree structure of wavelet coefficients coupled with scalar quantization. Their coding performance is competitive with all known techniques. However, they could not provide the real-time computation. In contrast, the zerotree WVQ algorithm provide the best compromise between the coding performance and the computation time. These advantages of coding performance and real-time computation in the proposed algorithm are attributed to the use of zero-vector trees, which are formulated by the noniterative decision rule. The proposed noniterative decision rule was extracted by the simulation results which are based on the statistical characteristics of various wavelet images. In addition, the zerotree WVQ algorithm exploited two different VQ techniques. The MSVQ has been known to be robust to wireless channel errors, and the CVQ using zero-vector trees was proved to have the good coding efficiency comparable to coding algorithms in the literature. With respect to VLSI architectures, this paper focused on the minimization of H/W resources for the proposed zerotree WVQ algorithm. In the vector quantization, most of computational budget and H/W resources are required in the minimum distance calculation. Generally, -stage MSVQ combined with the CVQ requires MDC units. However, the proposed cost-effective VLSI architecture for the zerotree WVQ algorithm executes real-time computation of -stage MSVQ combined with the CVQ with only one MDC unit. Compared with conventional architectures, the cost-effective WVQ VLSI architecture has remarkably reduced the number of functional units such as multipliers and adders to. Furthermore, the proposed MDC VLSI architecture has the minimum H/W size and shortest latency and provides real-time computation of the minimum distance. In conclusion, the proposed zerotree WVQ algorithm and its VLSI architectures are satisfied with high coding efficiency, real-time computation, and small-size H/W. Therefore, the proposed WVQ algorithm and its VLSI architectures are suitable to wireless image communication. REFERENCES [1] J. W. Woods and S. O'Neil, Subband coding of images, IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 1278 1288, Oct. 1986. [2] P. Mathieu, M. Antonini, M. Barlaud, and I. Daubechies, Image coding using wavelet transform, IEEE Trans. Image Processing, vol. 1, pp. 205 220, Apr. 1992. [3] A. Lewis and G. Knowles, Image compression using the 2-D wavelet transform., IEEE Trans. Image Processing, vol. 1, pp. 244 250, Apr. 1992. [4] J. M. Shapiro, Embedded image coding using zerotrees of wavelet coefficients, IEEE Trans. Signal Processing, vol. 41, pp. 3445 3462, Dec. 1993. [5] A. Said and W. A. Pearlman, A new, fast, and efficient image codec based on set partitioning in hierarchical trees, IEEE Trans. Circuits Syst. Video Tech., vol. 6, pp. 243 250, June 1996. [6] K. Ramchandran, Z. Xiong, and M. T. Ochard, Space-frequency quantization for wavelet image coding, IEEE Trans. Image Processing, vol. 6, pp. 677 693, May 1997. [7] E. S. Jang and N. M. Nasrabadi, Subband coding with multistage VQ for wireless image communication, IEEE Trans. Circuits Syst. Video Technol., vol. 5, pp. 247 253, June 1995. [8] S. K. Paek and L. S. Kim, 2-D DWT VLSI architecture for wavelet image processing, Electron. Lett., vol. 34, pp. 537 538, Mar. 1998. [9] H. K. Jeon, S. K. Paek, and L. S. Kim, Semi-recursive VLSI architecture for two dimensional discrete wavelet transform, in IEEE Int. Symp. Circuits and Systems, May 1998, pp. 469 472. [10] L. Hanzo, R. Stedman, H. Gharavi, and R. Steele, Transmission of subband-coded images via mobile channels, IEEE Trans. Circuits Syst. Video Technol., vol. 26, pp. 15 26, Feb. 1993.

PAEK AND KIM: REAL-TIME WAVELET VECTOR QUANTIZATION ALGORITHM 489 [11] P. Fletcher, DECT-standard demo puts full-motion video over cordlesstelephone link, Electron. Design, vol. 40, p. 34, Sept. 1992. [12] P. R. Cappello, G. A. Davison, and A. Gersho, Systolic architecture for vector quantization, IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, pp. 1651 1664, Oct. 1988. [13] D. S. Kim, Minimum-distance classified vector quantizer and its systolic array architecture, J. Korea Inst. Tele. Electron., vol. 32, pp. 77 86, May 1995. [14] T. Shibata and A. Nakata, A fully-parallel vector quantization processor for real-time motion picture compression, IEEE Int. Solid-State Circuits Conf., vol. 40, pp. 270 271, Feb. 1997. Seung-Kwon Paek received the B.S. and M.S. degrees in control and instrumentation engineering from Seoul National University, Seoul, Korea, in 1987 and 1989. He received the Ph.D. degree in electrical engineering from Korea Advanced Institute of Science and Technology, Taejon, Korea, in 1999. Since 1989, he has been with Samsung Electronics Company, Kiheung, Korea. His research interests include VLSI architectures and algorithms, multimedia VLSI design, and video data compression. Lee-Sup Kim received the B.S. degree in electronics engineering from Seoul National University, Korea, in 1982, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1986 and 1990, respectively. He was a post-doctoral Fellow with Toshiba Corporation, Kawasaki, Japan, during 1990 1993, where he was involved in the design of the high-performance DSP and single-chip MPEG2 decoder. Since March 1993, he has been with Korea Advanced Institute of Science and Technology, Taejon, Korea, where he is an Associate Professor. His research interests are multimedia VLSI design, hardware implementation of signal processing and 3-D graphics algorithms, and low-power IC design.