RESOLUTION SCALABLE AND RANDOM ACCESS DECODABLE IMAGE CODING WITH LOW TIME COMPLEXITY


RESOLUTION SCALABLE AND RANDOM ACCESS DECODABLE IMAGE CODING WITH LOW TIME COMPLEXITY

By
Yushin Cho

A Thesis Submitted to the Graduate Faculty of Rensselaer Polytechnic Institute in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY

Major Subject: Computer Science

Approved by the Examining Committee:
William A. Pearlman, Thesis Adviser
Daniel Freedman, Thesis Adviser
Mukkai Krishnamoorthy, Member
John W. Woods, Member

Rensselaer Polytechnic Institute
Troy, New York

July 2005 (For Graduation August 2005)

RESOLUTION SCALABLE AND RANDOM ACCESS DECODABLE IMAGE CODING WITH LOW TIME COMPLEXITY

By
Yushin Cho

An Abstract of a Thesis Submitted to the Graduate Faculty of Rensselaer Polytechnic Institute in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY

Major Subject: Computer Science

The original of the complete thesis is on file in the Rensselaer Polytechnic Institute Library

Examining Committee:
William A. Pearlman, Thesis Adviser
Daniel Freedman, Thesis Adviser
Mukkai Krishnamoorthy, Member
John W. Woods, Member

Rensselaer Polytechnic Institute
Troy, New York

July 2005 (For Graduation August 2005)

© Copyright 2005 by Yushin Cho
All Rights Reserved

CONTENTS

LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS
ABSTRACT
1. INTRODUCTION
   Outline of the Thesis
2. EMBEDDED IMAGE CODING
   EZW
      Discrete Wavelet Transform
      Wavelet Transform as a Linear Transformation
      An Example of Hierarchical Wavelet Transformation
      Progressive Image Transmission
      Spatial Orientation Tree
      Coding a Significance Map Using Zerotrees
      Coding Algorithm (EZW)
      Analysis of EZW Algorithm
   SPIHT
      Introduction
      Set Partitioning in Trees
      Coding Algorithm (SPIHT)
      Analysis of SPIHT Algorithm
   Time Complexity of Hierarchical Wavelet Transform
3. QUANTIFYING THE CODING POWER OF A ZEROTREE OF WAVELET COEFFICIENTS
   Introduction
   Analysis of EZW and SPIHT Algorithms
   Observation of Zerotrees in the Bitplanes of Wavelet Coefficients
   Degree-k Zerotree
   Experimental Analysis
      The Effectiveness of a Higher Degree Zerotree Coder: Existence of Higher Degree Zerotrees
      Location of Zerotree Root vs. Significance of Bitplane
      Frequency of Zerotree (especially degree-1) Occurrences
   Conclusion
4. AGP (ALPHABET AND GROUP PARTITIONING)
   Introduction
   Analysis of Alphabet Partitioning
   Analysis of Sample-Set Partitioning
   Example Algorithms and Their Assumptions
      Example 1: Groups of 2x2 pixels
      Example 2: Groups of 2^n x 2^n pixels
      Example 3: Groups of hierarchically transformed wavelet coefficients
5. LOW-COMPLEXITY IMAGE CODER: PROGRES
   Introduction
   Previous Work and Overview
   Coefficient Dynamic Ranges
      Representing the Dynamic Range of Coefficients
      Coding of Energy Ranges in a Partitioned Set
   Coding Algorithm
      Unary Coding
      The Extended Idea of Dynamic Range Coding
      Algorithm Description
      An Example of PROGRES Coding
   Experimental Results
   Analysis of PROGRES Algorithm
      Differences from SPIHT
      Similarities to SPIHT
      Entropy Coding of Dynamic Ranges in PROGRES
   Resolution Scalability and Random Access Decoding
      Random Access
      Resolution Scalable Decoding
   Conclusion
6. VOLUMETRIC IMAGE CODING BY 3D-PROGRES
   3D-PROGRES
      Compression Results by 3D-PROGRES (no tile, no fast random access version)
      Analysis
      A Max-Tree Construction
   Coding a Very Large Image with 3D-PROGRES: A Tiled Version
      Tile and Block Based Coding
      Tile and Block
      Tile and Block Addressing Scheme
      ROI (Region of Interest) Over Tiles
      Wavelet Transform in Tiled Coding
   Resolution Progressive and Random Access Decoding in 3D-PROGRES
      Resolution Progressive Decoding
      Random Access Decoding of 3D ROI (Region of Interest)
7. FAST RANDOM ACCESS DECODING
   Introduction
   Conventional Random Access Decoding: Linear and Slow
   Random Access Decoding Based on Image Blocks: Three Methods
      Full Decoding Seek
      Linear Random Access Decoding
      Bi-sectional Random Access Decoding
   Comparison of Three Random Access Decoding Methods
      Size of a Link Field
      Example of Bisection Links in Bitstream
   Performance Analysis by Experiments
      Comparison of Random Access Decoding Time
      No Overhead from Encoding the Bi-sectional Links
      Small Overhead from the Link Information
      Experimental Results Justifying the Worst Case Performance
   Conclusion
8. CONCLUSION
   Contributions of the Thesis
   Further Study
LITERATURE CITED

LIST OF TABLES

2.1 The frequency bandwidth for each subband after the first wavelet decomposition
2.2 The frequency bandwidth for each subband after the second wavelet decomposition
Source shown as digitized numbers
Wavelet transformed image
Wavelet transformed image with quantization
Reconstructed image shown as digitized numbers
Original and reconstructed signal shown in image
Distribution of wavelet coefficient magnitudes
Example of coded symbols by EZW and SPIHT for degree-1 and degree-2 zerotree
Example of coded bitstream by degree-0, 1, 2 zerotree coders
Binary decisions generated from a d1 zerotree coded by a d1 zerotree coder
Binary decisions generated from a d2 zerotree coded by a d1 zerotree coder
Binary decisions generated from a d2 zerotree coded by a d2 zerotree coder
Distribution of both degree-0 and -1 zerotrees in Lena coded by SPIHT
Distribution of degree-2 zerotrees in Lena coded by SPIHT
Distribution of degree-3 zerotrees in Lena coded by SPIHT
Distribution of both degree-0 and -1 zerotrees in Goldhill coded by SPIHT
Distribution of degree-2 zerotrees in Goldhill coded by SPIHT
Distribution of degree-3 zerotrees in Goldhill coded by SPIHT
Saturation of coding efficiency over decomposition levels
3.13 Distribution of both degree-0 and -1 zerotrees in Lena coded by SPIHT
Distribution of degree-2 zerotrees in Lena coded by SPIHT
Distribution of degree-3 zerotrees in Lena coded by SPIHT
Distribution of both degree-0 and -1 zerotrees in Goldhill coded by SPIHT
Distribution of degree-2 zerotrees in Goldhill coded by SPIHT
Distribution of degree-3 zerotrees in Goldhill coded by SPIHT
Dynamic range of coefficients
Step by step demonstration of PROGRES coding (resolution 0 through 3)
Step by step demonstration of PROGRES coding (resolution 4)
The comparison of coding time among SPIHT, LTW, and PROGRES
Decoded qualities of Lena image by SPIHT and PROGRES at progressive resolutions
Decoding time of progressive resolutions, coded at 0.5 bpp
Entropy rate of d_base at each prediction of dynamic range
The comparison of coding time between 3D-SPIHT and 3D-PROGRES
Comparison of max-tree construction time during encoding
The comparison of 3D-PROGRES decoding time for various resolutions: Susie
The comparison of 3D-PROGRES decoding time for various resolutions: Chest
Best and worst case random access decoding performance
Comparison of random access decoding performances

LIST OF FIGURES

2.1 Source image and wavelet transformed image
2.2 Hierarchical wavelet transform
Source image
Parent-child relationship in a spatial-orientation tree
Parent-child relationship in SPIHT
Definition of sets
Set partitioning engine
Zerotrees on bitplanes
A height h, t-ary tree
A zerotree height on the bitplane
Degree-1 and degree-2 zerotrees
Relationship of coding powers among degree-0, 1, 2 zerotree coders
A degree-k2 zerotree coded by degree-k1 zerotree coders
Wavelet coefficient image of Lena, five level decomposition (magnitudes unscaled)
Bitplanes 11, 10, 9, 8 for wavelet coefficients of Lena
Bitplanes 7, 6, 5, 4 for wavelet coefficients of Lena
Bitplanes 3, 2, 1, 0 for wavelet coefficients of Lena
Magnified view of the lowest four frequency subbands of wavelet coefficients, Lena
Magnified view of the lowest four frequency subbands of bitplanes 11, 10, 9, 8, Lena
Overall diagram of alphabet partitioning
Dynamic ranges in a spatial orientation tree
Coding of dynamic ranges
5.3 Encoding algorithm
Extended dynamic ranges in a spatial orientation tree
Extended idea of dynamic range coding
Extended encoding algorithm
Quantized wavelet transformed image
Coefficients scanning order in PROGRES algorithm
Reconstructed Lena by SPIHT
Reconstructed Lena by PROGRES
Reconstructed Goldhill by SPIHT
Reconstructed Goldhill by PROGRES
Bitstream structure for simultaneous random access decoding and resolution scalable coding
Rearrangement of wavelet coefficients for random access coding
Resolution progressive decoding
Bitstream structure for resolution scalability of block
Random access and resolution scalable decoding
3D set partitioning
The comparison plotting of coding time between 3D-SPIHT and 3D-PROGRES: Football
The comparison plotting of coding time between 3D-SPIHT and 3D-PROGRES: Susie
Tiled image encoding and decoding
Tiles in the bitstream
Block numbering across adjacent tiles
ROI (Region of Interest) over tiles
ROI and its corresponding blocks over tiles
Extension of pixels around decoded blocks in each tile
3D volumetric view of 3D image source: Susie (720x480x128)
6.11 Susie ROI decoded at quarter resolution (50x25x16)
Susie ROI decoded at half resolution (100x50x32)
Susie ROI decoded at full resolution (200x100x64), view
Susie ROI decoded at full resolution (200x100x64), view
3D volumetric view of 3D image source: Chest (256x256x64), view
3D volumetric view of 3D image source: Chest (256x256x64), view
Chest ROI decoded at quarter resolution (16x16x16)
Chest ROI decoded at half resolution (32x32x32)
Chest ROI decoded at full resolution (64x64x64)
Encoding of an image volume and its coded bitstream
Full decoding seek in the bitstream of image blocks
Linear block seek in the bitstream of image blocks
Bi-sectional block seek in the bitstream of image blocks
Adding the jump targets
Size of link fields in bi-sectional method
Bi-sectional links in the bitstream containing eight blocks
Comparison of target block seek time

ACKNOWLEDGEMENTS

I would like to thank my thesis adviser, Professor William A. Pearlman. Without his trust in me, I could not have accomplished this work. I also want to thank Professors John W. Woods, Daniel Freedman, and Mukkai Krishnamoorthy for serving on my doctoral committee and for their assistance and valuable suggestions. I am fortunate to have been involved in the Center for Image Processing Research (CIPR) at Rensselaer and to have experienced the state of the art in image and video coding. I owe special thanks to Dr. Amir Said for his initial work on the PROGRES algorithm, on which my work is based. I thank my colleague students in the CIPR Lab; we have encouraged and motivated each other in our research. Most of all, I want to express my gratitude to my family. My lovely wife Jungheoyn, my father Seongkyung Cho, my mother Seongmi Yoon, my brother Deockshin, and my parents-in-law have always encouraged me with love. This thesis is a tribute to their support.

ABSTRACT

Modern image compression methods such as JPEG 2000 are based on the wavelet transform. They provide not only high compression performance but also support for various features, such as quality (SNR) scalability, resolution scalability, and region-of-interest encoding and decoding. Quality scalability is commonly achieved via bit-plane coding, which also helps compression, since neighboring bits provide convenient and powerful contexts for entropy coding. In many important applications (e.g., digital cameras), however, the images always need to have a pre-defined high quality, and any extra effort required for quality scalability is wasted. Furthermore, when compressing a very large image source, low time complexity is often the most desirable characteristic of an image coding algorithm.

In this thesis, we consider fast coding methods that support resolution scalability and efficient decoding of a region of interest by random access to the codestream. A resolution scalable and random accessible image coding algorithm, PROGRES (Progressive Resolution Decompression), is designed based on predictive dynamic range coding of wavelet coefficients, without bit-plane coding. Avoiding bit-plane coding leads to considerable speed improvement without compromising coding efficiency. The dynamic range of a set of wavelet coefficients is represented by a dynamic range number, which gives the number of bits required to represent every coefficient magnitude in the set. Under the assumption of decaying power spectral density, the dynamic ranges of the children sets are smaller than that of the parent set, so we code only the amount of this decrease in dynamic range number. In addition, since neighboring wavelet coefficients have similar local statistics, the partitioned subsets can share the information about the decrease in dynamic range. This procedure is applied hierarchically, resolution by resolution. Because a decrease in dynamic range between parent and child coefficients affects not just those children (direct descendants) but all of their descendants as well, the presented dynamic range coding method efficiently represents the hierarchy of dynamic ranges over a spatial orientation tree. The PROGRES algorithm is designed and implemented for both 2D and 3D image sources. Experiments show that our suggested coding model lessens the computational burden of bit-plane based image coding, in both encoding and decoding time.

For faster random access to a target image block in a very large compressed bitstream, an intuitive idea is applied to generate efficient block indices, based on a popular problem solving strategy: search space reduction. Given a target block index, a conventional random access decoding method simply configures sequential links, so that an average of n/2 links must be followed to reach the target among n blocks in the image bitstream. The presented method instead applies the bi-section idea to narrow the search space exponentially. This method has a worst case seek time complexity of O(log₂ n), a great improvement over the O(n) of the standard linear seek method. The PROGRES algorithm combined with the fast random access decoding method is well suited to browsing a very large image bitstream: it can locate the requested part of the bitstream very quickly and then decode it up to the desired resolution at high speed.

In related work, we introduce the concept of higher degree zerotrees in modern wavelet-based coders and quantify their relative coding power. By analyzing two famous zerotree-based image coders, EZW and SPIHT, we explain the superior coding efficiency of SPIHT through its ability to code higher degree zerotrees than EZW. We also calculate the bit savings of SPIHT compared to EZW within this framework.

CHAPTER 1
INTRODUCTION

Modern image compression technology has shifted its basis to wavelets [1][2]. Wavelet transforms nearly de-correlate many images across frequency subbands. However, significant statistical dependencies still exist in the wavelet domain, especially for natural images; inter-subband and intra-subband dependencies are among them [3][4]. The inter-subband dependency can be briefly explained by the similarity of the wavelet filter response across the subband spaces. The intra-subband correlation means that neighboring wavelet coefficients show very similar statistics, since they come from adjacent source signals [5]. A hierarchical wavelet transform enables a compact representation of the source energy, in the sense that most of the source energy is concentrated in relatively few wavelet coefficients of the low frequency subband. This energy compaction property makes efficient low bit rate image compression successful [6][7][8]. Depending on the choice of wavelet filter, either lossy or lossless image compression is possible [9][10]. The most popular wavelet filter set for lossy image compression is the Daubechies 9/7 filter [10]. Usually, a lossy image compressor quantizes the wavelet transformed image and then codes the bins of quantized wavelet coefficients. Each subband can be quantized with a different step size depending on its importance to the reconstructed image quality. Alternatively, each bitplane of the wavelet transformed image is coded successively [11]. A lossless image compressor uses reversible filters, such as the 5/3 filter [10], to perfectly reconstruct the source image. The hierarchical wavelet transform can also be applied to sources of more than two dimensions; 3D subband coding [12] and multispectral coding are example applications.
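The per-subband quantization just described can be sketched as follows. This is a hedged illustration, not the thesis's coder: the subband values and step sizes are invented, and `quantize` simply keeps the signed bin index of each coefficient.

```python
import numpy as np

def quantize(band, step):
    """Uniform scalar quantization of one subband: keep signed bin indices."""
    return np.sign(band) * (np.abs(band) // step)

# Hypothetical subbands: low frequency (important) vs highest frequency.
LL = np.array([812.0, -640.0, 455.0, 390.0])
HH = np.array([6.0, -11.0, 3.0, -2.0])

# A smaller step preserves the visually important low frequency subband;
# a larger step zeroes out most high frequency coefficients.
print(quantize(LL, step=8))    # mostly nonzero bins
print(quantize(HH, step=16))   # all-zero bins, which compress very well
```

Coding the mostly-zero high frequency bins is where such a scheme gains its compression.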
Among the many research works on wavelet-based image coding, EZW (Embedded Zerotree Wavelet), introduced by Jerome Shapiro in the 1992 and 1993 articles "An Embedded Wavelet Hierarchical Image Coder" [13] and "Embedded Image Coding Using Zerotrees of Wavelet Coefficients" [14], was the first stepping stone toward modern wavelet-based image coding. It outperformed JPEG at low bit rates and produced

naturally progressive quality encoding and decoding, which is a very important feature in many applications. As mentioned in the article [14], the idea of EZW is based upon four key concepts: 1) hierarchical subband decomposition by a discrete wavelet transform, 2) prediction of significance across the scales by exploiting the self-similarity inherent in images, 3) entropy-coded successive-approximation quantization, and 4) adaptive arithmetic coding of symbols. Much research effort has gone into improving the performance of EZW. The most remarkable improvement among them is SPIHT, invented by Amir Said and William Pearlman in the 1993 and 1996 articles "Image Compression Using the Spatial-Orientation Tree" [15] and "Set Partitioning In Hierarchical Trees (SPIHT)" [16]. In SPIHT, more zerotrees are efficiently found and represented by separating the tree root from the tree, i.e., by allowing a zerotree whose root coefficient is itself significant. Independent of the significance of the root coefficient, the significance of its entire set of descendants is represented by one symbol (or one bit in non-entropy coding mode). The idea of SPIHT has been extended in many ways. Danyali and Mertins [17] presented FS-SPIHT (Fully Scalable SPIHT), which supports both resolution and SNR scalability. It provides a bitstream that can be easily adapted (reordered) to given bandwidth and resolution requirements by a simple parser, adding spatial scalability without sacrificing the SNR (rate, quality) embeddedness property. A three-dimensional implementation of SPIHT, 3D-SPIHT, was presented by Kim and Pearlman [18]. He et al. [19] presented an asymmetric 3D tree structure to better define the 3D zerotrees in a decoupled 3D wavelet transformed image; their idea was demonstrated with 3D-SPIHT and showed improvements in coding efficiency. The concept of the zerotree has been studied and adopted in many places. MPEG-4 VTC (Visual Texture Coding) uses Multiscale Zerotree Entropy wavelet coding (MZTE). Lee et al. [20] presented an R-D (rate-distortion) based bit-allocation method for zerotree coding in MPEG-4 VTC. A data dependent zerotree scheme was proposed by Effros [21], who introduced the Weighted Universal Zerotree Code (WUZC) to better represent the given data with modified zerotrees and further improve the

compression rate. The idea of EZW has also been revisited recently by Dilmaghani et al. [22] for progressive medical image transmission and compression. While EZW and SPIHT represent the original zerotree-based image coders, the SPECK (Set Partitioning Embedded bloCK) algorithm proposed by Islam and Pearlman [23][24] represents another branch of wavelet-based image compression. SPECK is a zeroblock-based image coder, since it codes groups of zeros (usually on a bitplane) very efficiently. Variants of the SPECK algorithm have been implemented, such as SBHP (Subband Hierarchical Block Partitioning) by Chrysafis et al. [25] and EZBC (Embedded ZeroBlocks Coding) by Hsiang and Woods [26][27]. SBHP is a form of SPECK that was incorporated into JPEG 2000 during its development. EZBC exploits the dependence among quadtree representations of subbands and sophisticated context-based arithmetic coding to improve coding efficiency. Most recently, Xie et al. [28] gave SPECK full scalability, based on a quality layer formation similar to PCRD (Post-Compression Rate-Distortion) optimization in JPEG 2000 [29][30]. In this thesis, three ideas are presented in the context of modern wavelet-based image coding. First, we address why different zerotree coding schemes give different compression performance. The reason for the different zerotree coding performance of the two most popular wavelet-based image coding schemes, EZW and SPIHT, has not been clearly explained in the literature. By classifying zerotrees and defining a model of zerotrees, we formulate the entropy coding power of a zerotree. The difference in zerotree coding power between EZW and SPIHT is discussed within this new paradigm. Secondly, we reduce the compression time of modern wavelet-based image coders. Popular embedded image coding algorithms, such as EZW, SPIHT, SPECK, EZBC, and JPEG 2000 (EBCOT) [29][30], are based on bit-plane coding.
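The computational contrast at issue can be sketched with a toy count of memory visits; this illustration is my own (the function names and the sample coefficients are not from the thesis). A bit-plane coder revisits every coefficient once per plane, while one bit-count query per coefficient recovers the same magnitude information in a single pass.

```python
def bitplane_tests(coeffs):
    """Bit-plane style: one significance test per coefficient per plane."""
    planes = max(abs(c) for c in coeffs).bit_length()
    tests = 0
    for b in range(planes - 1, -1, -1):   # scan planes from the MSB down
        for c in coeffs:
            _ = (abs(c) >> b) & 1         # inspect a single bit
            tests += 1
    return tests

def direct_tests(coeffs):
    """Direct approach: one bit-count query per coefficient."""
    ranges = [abs(c).bit_length() for c in coeffs]
    return len(ranges)

coeffs = [113, -97, 64, 80, 12, -9, 3, 1]
print(bitplane_tests(coeffs))   # 8 coefficients x 7 planes = 56 visits
print(direct_tests(coeffs))     # 8 visits
```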
While bit-plane coding enables precise rate control of image compression, it causes computational overhead, since only one bit of each coefficient being coded is actually needed for its significance decision. Thus, on modern byte-addressable CPUs [31], all the remaining bits of a coefficient amount to wasted computing resources. We present a method to reduce the time complexity of set-partitioning based

image coding algorithms by not using the bitplane based approach. A two-stage dynamic range coding method, which encodes the decrease in dynamic range across two consecutive frequency subbands, is invented. The presented algorithm has resolution-scalable, random-access-decodable features and is named PROGRES (Progressive Resolution Decompression). We also extend the PROGRES algorithm from 2D source images to the 3D case. Finally, we present a fast random access decoding method that uses the idea of bi-section. Random access decoding is a scheme for extracting the target image information from the bitstream with minimum decoding work. It is usually required in interactive image browsing systems, where users first browse coarse resolution images and then look into details of some parts of the images according to their interests. However, not much work has been done on improving random access decoding. Conventional random access decoding methods in JPEG 2000 [32, 33] simply use a map of indices, or links, to the blocks ("code-blocks" in JPEG 2000 or EBCOT notation). To find the code-block of interest (or the larger precinct or tile), the decoder must look up all the indices up to the target block. Consequently, its block seek time depends entirely on the location of the block within the bitstream. Coupled with ROI (Region of Interest) decoding, the proposed fast random access decoding capability gives efficient image retrieval from a very large image bitstream.

1.1 Outline of the Thesis

In Chapter 2, the fundamentals of the hierarchical wavelet-based image compression scheme are introduced, and then two representative algorithms, EZW and SPIHT, are reviewed and analyzed. The EZW algorithm, which achieved a breakthrough in wavelet-based image compression, is explained and analyzed as to why it gives an efficient representation of hierarchically transformed wavelet coefficients.
The SPIHT algorithm, popular as an image compression benchmark, is then analyzed with emphasis on its differences from EZW. In Chapter 3, a new framework to explain the coding power of various zerotree-based image coding algorithms is established. The degree-k zerotree model indicates

up to which level the zero symbols are filled, from the bottom level of the tree to be coded. The difference in coding power between the well-known EZW and SPIHT is explained based on this degree-k zerotree framework. It also explains why the list processing for insignificant sets and pixels in SPIHT is not very simple. In Chapter 4, the Alphabet and Group Partitioning (AGP) algorithm is reviewed and analyzed. Its two parts, alphabet partitioning and sample-set partitioning, are explained, and the strategy for choosing the best alphabet partitions is stated from the viewpoint of information theory. The set number and set index in AGP closely relate to the idea of the dynamic range number in the following chapter. In Chapter 5, a very fast, low complexity algorithm for resolution scalable and random access decoding, PROGRES (Progressive Resolution Decompression), is presented. The algorithm is based on set partitioning and non-bit-plane coding. Our main goal is to design a high-speed coder by reducing the time complexity of the compression algorithm. Here, a two-stage prediction of the dynamic range of coefficient magnitudes is performed, and the dynamic range of coefficient magnitudes in a spatial orientation tree is represented hierarchically. The hierarchical dynamic range coding naturally enables a resolution scalable representation of the wavelet transformed coefficients, and a zerotree representation is implicitly exploited by the dynamic range coding scheme. Experiments show that our suggested coding model lessens the computational burden of bit-plane based image coding, by three times in encoding and six times in decoding. We also give a statistical analysis of its superiority based on the known distribution of wavelet coefficients across the bit-planes. In Chapter 6, the 2D PROGRES algorithm of Chapter 5 is extended to the 3D case. Its coding speed is compared to that of 3D-SPIHT and results are shown. Random access decoding of ROIs (Regions of Interest) with resolution scalability is demonstrated. Also, for applications with very large scale images, a tiled coding scheme for 3D images is discussed. In Chapter 7, a fast random access decoding method is presented. For faster random access to a target image block, a bi-section idea is applied to link the image blocks. Conventional methods configure the blocks in a linearly

linked way, where the block seek time depends entirely on the location of the block in the compressed bitstream. The block linkage information is instead configured so that binary search is possible, giving a worst case block seek time of log₂ n for n blocks. Experimental results with 3D-SPIHT on video sequences show that the presented idea gives a substantial speed improvement with minimal bit overhead. Finally, Chapter 8 concludes this thesis by discussing the overall conclusions and further work.
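The linear versus bi-sectional seek contrast described above can be modeled with a small sketch. This is my own illustration, not the thesis's implementation: each function simply counts how many links must be followed to reach a target block.

```python
def linear_hops(n, target):
    """Sequential links: block i is reached only after following i links."""
    return target

def bisection_hops(n, target):
    """Bi-sectional links: every hop lands on the midpoint of the
    remaining range, halving the search space like binary search."""
    lo, hi, hops = 0, n, 0
    while True:
        mid = (lo + hi) // 2
        hops += 1
        if mid == target:
            return hops
        if mid < target:
            lo = mid + 1
        else:
            hi = mid

n = 1024
print(max(linear_hops(n, t) for t in range(n)))      # O(n) worst case: 1023
print(max(bisection_hops(n, t) for t in range(n)))   # O(log2 n) worst case: 11
```

For 1024 blocks the worst case drops from 1023 link traversals to floor(log₂ 1024) + 1 = 11.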

CHAPTER 2
EMBEDDED IMAGE CODING

In this chapter, the fundamentals of the modern hierarchical wavelet-based image compression scheme are briefly introduced, and then the EZW and SPIHT algorithms are reviewed and analyzed. Image coders such as JPEG 2000 and SPECK use intra-subband correlation, and EZBC (Embedded ZeroBlocks Coding) uses both inter- and intra-subband correlations. In JPEG 2000 (or EBCOT), each subband is partitioned into fixed size code-blocks, and each code-block splits again into sub-blocks, followed by sample-by-sample coding. In SPECK and EZBC, each whole subband is efficiently mapped onto a quadtree, locating the positions of significant wavelet coefficients at a given bitplane or threshold. Meanwhile, EZW and SPIHT are inter-subband coders; context-based entropy coding exploits the existing intra-subband correlation in both.

2.1 EZW

EZW stands for Embedded Zerotree Wavelet, abbreviated from the title of Jerome Shapiro's 1993 article [14], "Embedded Image Coding Using Zerotrees of Wavelet Coefficients." The EZW algorithm views the wavelet coefficients as a collection of spatial orientation trees. Each tree consists of coefficients from all subbands (all frequencies and orientations) that correspond to the same spatial area in the image. The algorithm codes the wavelet coefficients with the largest magnitudes first. Given a threshold, a coefficient is classified (sorted) as significant if its magnitude is greater than or equal to the threshold; otherwise, it is insignificant. A tree is significant if the largest coefficient magnitude in the tree is greater than or equal to the threshold; otherwise, the tree is insignificant. For each pass of the algorithm, the threshold is halved from the previous pass, which enables the larger (and thus significant) coefficients to be transmitted first. In each pass, the significance of the sets in the lower frequency subbands is tested first. If

Figure 2.1: a) Original Lena image in grey level, size 512x512, and b) a three level 2-D wavelet transform of Lena

the set is insignificant, a zerotree symbol is used to represent that all coefficients in the set are zero. Otherwise, the set is partitioned into four subsets for further significance tests in the same pass. After all sets and coefficients have been tested, the current pass ends. EZW coding is based on the hypothesis that most natural images have a fast decaying power spectral density. That is, if a wavelet coefficient in a lower frequency subband is small, it is very likely that its descendant coefficients in higher frequency subbands are also small. In other words, from the viewpoint of thresholding the coefficients, if a parent wavelet coefficient is insignificant, it is very likely that its descendants are also insignificant. The zerotree symbol then efficiently represents that all coefficients in a spatial orientation tree are insignificant. EZW coding can be thought of as bit-plane coding if the thresholds are powers of two: it encodes one bit-plane at a time, starting from the MSB (most significant bit). With successive bit-plane coding, scanning the trees from lower to higher frequency subbands on each bit-plane, EZW achieves embedded coding. The bit-mapped position information of the significant coefficients at each threshold is called the significance map, and the successive passes of sorting (dominant) and refinement (subordinate) with decreasing threshold are called successive approximation quantization.
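One sorting pass of the scheme just described can be sketched as follows. This is a deliberately simplified illustration of my own, not Shapiro's coder: sets are quarters of a flat list rather than true spatial orientation trees, and only three symbols are emitted (S = significant coefficient, Z = insignificant coefficient, ZT = zerotree).

```python
def significant(s, T):
    """A set (or single coefficient) is significant if any magnitude >= T."""
    return max(abs(c) for c in s) >= T

def sort_pass(sets, T, out):
    """One EZW-style pass: an insignificant set is covered by one zerotree
    symbol; a significant set is split into four subsets and re-tested."""
    for s in sets:
        if significant(s, T):
            if len(s) == 1:
                out.append('S')
            else:
                q = len(s) // 4
                sort_pass([s[i * q:(i + 1) * q] for i in range(4)], T, out)
        else:
            out.append('ZT' if len(s) > 1 else 'Z')

coeffs = [34, 2, -1, 0, 3, 1, 0, -2, -40, 5, 2, 1, 0, 1, -1, 0]
out = []
sort_pass([coeffs], T=32, out=out)
print(out)   # each all-small 4-coefficient group collapses to a single 'ZT'
```

Halving T and rerunning would make more coefficients significant, mirroring the successive passes of the algorithm.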

As mentioned in the article [14], the idea of EZW is based upon four key concepts: 1) hierarchical subband decomposition by a discrete wavelet transform, 2) prediction of significance across the scales by exploiting the self-similarity inherent in images, 3) entropy-coded successive-approximation quantization, and 4) adaptive arithmetic coding of symbols.

2.1.1 Discrete Wavelet Transform

The hierarchical wavelet transform used in EZW and SPIHT is equivalent to a hierarchical subband decomposition system in which the subbands are logarithmically spaced in frequency. An example of a two-level wavelet decomposition of a 2D source image is shown in Figure 2.2. The image is first divided into four subbands by applying horizontal filtering (Figure 2.2 (b)) and then vertical filtering, with subsampling, to obtain the subband layout in Figure 2.2 (c).

Figure 2.2: Hierarchical wavelet transform: (a) Source image, (b) After horizontal filtering, (c) After vertical filtering, (d) After 2nd decomposition

In Figure 2.2 (c), each coefficient corresponds to a spatial area of approximately 2x2 pixels in the original input image. After the first decomposition, as in Figure 2.2 (c), the three subbands HL1, LH1, and HH1 are considered high frequency subbands having three different orientations: vertical, horizontal, and diagonal, respectively. Let ω_h and ω_v be the horizontal and vertical frequencies, respectively. The frequency bandwidth of each subband after the first wavelet decomposition is described in Table 2.1. The HL1, LH1, and HH1 subbands contain the finest scale wavelet coefficients [34].
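The filter-and-subsample procedure above can be sketched with the simplest filter pair. This is an illustrative stand-in of my own: it uses an (unnormalized) Haar average/difference pair rather than the 9/7 filters discussed later, but it produces the same LL/HL/LH/HH layout.

```python
import numpy as np

def analyze_level(img):
    """One 2D decomposition level: horizontal low/high-pass filtering with
    subsampling, then the same vertically, yielding four quarter-size
    subbands LL, HL, LH, HH."""
    lo = (img[:, 0::2] + img[:, 1::2]) / 2   # horizontal average (low-pass)
    hi = (img[:, 0::2] - img[:, 1::2]) / 2   # horizontal difference (high-pass)
    LL = (lo[0::2, :] + lo[1::2, :]) / 2
    LH = (lo[0::2, :] - lo[1::2, :]) / 2
    HL = (hi[0::2, :] + hi[1::2, :]) / 2
    HH = (hi[0::2, :] - hi[1::2, :]) / 2
    return LL, HL, LH, HH

img = np.arange(16, dtype=float).reshape(4, 4)
LL, HL, LH, HH = analyze_level(img)
print(LL.shape)   # (2, 2): each LL sample covers ~2x2 source pixels
print(HH)         # all zeros for this smooth ramp image
```

Applying `analyze_level` again to `LL` gives the second, coarser decomposition level of Figure 2.2 (d).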

Table 2.1: The frequency bandwidth for each subband after the first wavelet decomposition, Figure 2.2 (c)

  subband | horizontal frequency | vertical frequency
  LL_1    | 0 <= ω_h < π/2       | 0 <= ω_v < π/2
  HL_1    | π/2 <= ω_h < π       | 0 <= ω_v < π/2
  LH_1    | 0 <= ω_h < π/2       | π/2 <= ω_v < π
  HH_1    | π/2 <= ω_h < π       | π/2 <= ω_v < π

Using the same low-pass and high-pass filters, the subband LL_1 (Figure 2.2 (c)) obtained from the first decomposition is decomposed again and subsampled to obtain the next coarser scale of wavelet coefficients, as shown in Figure 2.2 (d). The frequency bandwidth for each subband after the second wavelet decomposition is given in Table 2.2.

Table 2.2: The frequency bandwidth for each subband after the second wavelet decomposition, Figure 2.2 (d)

  subband | horizontal frequency | vertical frequency
  LL_2    | 0 <= ω_h < π/4       | 0 <= ω_v < π/4
  HL_2    | π/4 <= ω_h < π/2     | 0 <= ω_v < π/4
  LH_2    | 0 <= ω_h < π/4       | π/4 <= ω_v < π/2
  HH_2    | π/4 <= ω_h < π/2     | π/4 <= ω_v < π/2
  HL_1    | π/2 <= ω_h < π       | 0 <= ω_v < π/2
  LH_1    | 0 <= ω_h < π/2       | π/2 <= ω_v < π
  HH_1    | π/2 <= ω_h < π       | π/2 <= ω_v < π

Wavelet Transform as a Linear Transformation

We can represent the above wavelet transform as a linear transformation [4]. Let p be a column vector whose elements are a scan of the image pixels, and let c be a column vector whose elements are the wavelet coefficients obtained by applying the discrete wavelet transform to p. Then, viewing the wavelet transform as a matrix W whose rows are the basis functions of the transformation, c is a linear transformation of p by the matrix W:

  c = W p.

The vector p can be recovered by the inverse wavelet transformation:

  p = W^{-1} c.

If the transform W is orthogonal, then W^{-1} = W^T, and thus p = W^T c. In reality, the wavelet transform W is biorthogonal, and thus only approximately orthogonal.

An Example of Hierarchical Wavelet Transformation

An example of hierarchical wavelet transformation is demonstrated in this section. The source image and its corresponding pixel values are shown in Figure 2.3 and Table 2.3, respectively. A four-level wavelet decomposition is applied to the source image, using the Daubechies biorthogonal 9/7 wavelet filter set [10]. Table 2.4 shows the resulting transform coefficients, without quantization but truncated to integers. Note that most of the large magnitude coefficients are concentrated in the low frequency subbands and many coefficients have small magnitudes; the energy compaction property of the wavelet transform is well observed in this example.

Table 2.5 shows the quantized transform coefficients, where quantization is applied only at the first level of decomposition. A scaling factor of 0.25 is multiplied into every wavelet filter coefficient before the low-pass and high-pass filter sets are applied to the source image; the corresponding quantization step size is 16 in this case. After quantization, most of the coefficients in the highest frequency subbands become zero, which is where compression is obtained. The reconstructed image and its pixel values after inverse wavelet transformation are shown in Table 2.7 (b) and Table 2.6, respectively. Because of the quantization, the reconstruction is lossy.
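The matrix view of the transform can be checked numerically with a small orthonormal example; the 2x2 Haar matrix below is a stand-in for the (only approximately orthogonal) biorthogonal 9/7 transform:

```python
import numpy as np

# c = W p and p = W^T c for an orthogonal W (W^{-1} = W^T).
s = 1 / np.sqrt(2)
W = s * np.array([[1.0, 1.0],
                  [1.0, -1.0]])   # orthonormal Haar basis as rows

p = np.array([9.0, 5.0])          # "pixels"
c = W @ p                          # forward transform
p_rec = W.T @ c                    # inverse via the transpose

print(np.allclose(W.T @ W, np.eye(2)))  # True: W is orthogonal
print(np.allclose(p, p_rec))            # True: perfect reconstruction
print(np.isclose(p @ p, c @ c))         # True: Euclidean norm preserved
```

The last check illustrates the norm preservation used in the distortion analysis for progressive transmission below.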

Figure 2.3: Source image, magnified; extracted from location (256,256) of the Lena image

Table 2.3: Source image shown as digitized numbers

Progressive Image Transmission

Let Ω(·) be an orthogonal hierarchical subband transformation, let P be the matrix of image pixels p_{i,j}, where (i, j) is the coordinate of a pixel, and let C be the matrix of transformed coefficients. Then C = Ω(P), and C is what we want to code.

Table 2.4: Wavelet transformed image (four-level decomposition, no quantization, truncated to integers)

Table 2.5: Wavelet transformed image quantized (with step size 16) at the first level of decomposition, truncated to integers

In fully embedded coding, the decoder initially sets the reconstruction matrix Ĉ to zeros and incrementally updates its elements as each bit is received. The decoder obtains a reconstructed image:

  P̂ = Ω^{-1}(Ĉ).

The constraint in progressive transmission is to transmit the most important information first, i.e., the information that yields the largest distortion reduction. Thus, the major

Table 2.6: Reconstructed image shown as digitized numbers, obtained by the inverse hierarchical wavelet transform

Table 2.7: (a) Original and (b) reconstructed image, magnified; corresponds to location (256,256) of the Lena image. Compressed size = 44 bytes; compression rate = (44 bytes x 8 bits)/(16 x 16 pixels) = 1.375 bpp.

issue is to choose the most important information in the transformed source C. The distortion metric MSE (mean squared error) is used for measuring importance:

  D_MSE(P, P̂) = ||P - P̂||^2 / N = (1/N) Σ_{i,j} (p_{i,j} - p̂_{i,j})^2,

where N is the number of source image pixels.

Now, we can use the fact that the Euclidean norm is preserved under the orthogonal transformation Ω, giving:

  D_MSE(P, P̂) = D_MSE(C, Ĉ) = (1/N) Σ_{i,j} (c_{i,j} - ĉ_{i,j})^2.

From the above equation, it is clear that the more accurately the decoder knows c_{i,j}, the less distortion the reconstructed image will have. When a coefficient value ĉ_{i,j} is received, the distortion D_MSE(C, Ĉ) decreases by the amount (1/N) |ĉ_{i,j}|^2. Consequently, this urges us to transmit the larger coefficients first, since they contribute more to decreasing the distortion in the reconstructed image [8]. Furthermore, if we view c_{i,j} in binary representation, the information in c_{i,j} can itself be transmitted progressively: the MSB (most significant bit) is the most important and is transmitted first, and the LSB (least significant bit) is the least important and is transmitted last.

Spatial Orientation Tree

The parent-child relationships in the spatial orientation trees are shown in Figure 2.4. Except in the LL subband and the highest subbands (HL_1, LH_1, and HH_1), each coefficient has four children. A coefficient in the LL subband has three children, one in each of HL_1, LH_1, and HH_1, respectively; the three dashed lines in Figure 2.4 illustrate this three-children relationship. The spatial orientation tree is a group of wavelet coefficients having the same frequency orientation, where the coefficients are arranged in order of frequency bandwidth. One of the most important characteristics of a spatial orientation tree is that it corresponds to the same spatial area of the wavelet-transformed source image.

Coding a Significance Map Using Zerotrees

In the EZW algorithm, a significance map is defined as the bitmap representing whether each wavelet coefficient is greater than or equal to a given threshold T, i.e., whether its quantized value with quantization step size T is zero or nonzero. The EZW encoder's task is to represent the significance map corresponding to a

Figure 2.4: Parent-child relationship in a spatial orientation tree

threshold. This encoding job is equivalent to coding the positions of the significant bits in each bitplane, if we assume that the initial bitplane is filled with zeros. The work is repeated for each bitplane, from MSB to LSB. Viewing this as entropy coding, the best strategy is to find and exploit the underlying structure in the given data, i.e., the positions of the significant bits.

Basically, the decision at every branch of the spatial orientation tree is coded: given the bit in a subband k, we need to decide how to code its corresponding children bits in subband k-1. As discussed before, there is an inter-dependence between adjacent subbands of the same spatial orientation, i.e., the same spatial location and orientation. If a coefficient in the parent subband is insignificant, it is very likely that its children coefficients are also insignificant. This is well supported by the (mostly true) hypothesis that wavelet transformed natural images have decaying spectral power density.

This idea of the parent-child relationship is applied to every bitplane of the coefficients. In EZW, the significance of a coefficient is determined with respect to a given threshold, which is decreased for each coding pass. The thresholded coefficients are viewed as a bitplane of binary values, 0 indicating insignificant and 1 indicating significant.

We define a zerotree as a tree consisting of all insignificant coefficients of the same orientation in the same spatial location, with respect to a certain threshold. Thus, viewing the thresholded coefficients as a bitplane, a zerotree is simply a tree consisting of all zero bits. Supported by the above hypothesis, which is almost always true, many zerotrees are found in each bitplane. Note that a zerotree is found on a bitplane. Given the coded (known) position in the bitplane, once a zerotree rooted at that position is found, only one bit is required to represent it, since we already know the position of every descendant of the root.

Coding Algorithm (EZW)

Two lists, the dominant list DL and the subordinate list SL, are used to keep track of the coordinates of the coefficients to be coded and of the coefficients already coded as significant in one of the previous dominant passes, respectively. Let c_max be the maximum absolute value of the wavelet coefficients.

1. Initialization:
   (a) Append all wavelet coefficients to the dominant list DL.
   (b) Set the initial threshold as T = 2^{floor(log2 c_max)}.

2. Dominant Pass:
   (a) In scanning order, test each coefficient c in DL against the current threshold T and assign one of four symbols to it:
       i. Positive Significant (PS): the coefficient c is significant with respect to the current threshold T and positive.

       ii. Negative Significant (NS): the coefficient c is significant with respect to the current threshold T and negative.
       iii. Isolated Zero (IZ): the coefficient c is insignificant with respect to the current threshold T, but one or more of its descendants are significant.
       iv. Zerotree Root (ZTR): the coefficient c and all of its descendants are insignificant (zero) with respect to the current threshold T.
   (b) Any coefficient already known to be a descendant of a coefficient coded as a zerotree root in the current pass is not coded.
   (c) The significant coefficients (PS, NS) are appended to SL and their values in the bitplane are set to zero.
   (d) Output the symbol assigned to the coefficient c.

3. Subordinate Pass:
   (a) Output 0 or 1 for each coefficient c in SL, depending on the significance of c with respect to the current threshold T.

4. Quantization:
   (a) Halve the current threshold T, i.e., T = T/2.
   (b) Go to Step 2.

The dominant pass and subordinate pass are repeated until the target bitrate or desired image quality is reached. During either pass, the coding process can be stopped as soon as the desired condition is met.

Analysis of EZW algorithm

In each dominant pass of the EZW algorithm, once a coefficient is tested and found significant with respect to a certain threshold, it will not be visited again in further dominant passes, since it is appended to the subordinate list.

Table 2.8 shows the distribution of thresholded wavelet coefficients of Lena with five levels of decomposition. We can observe two interesting facts. First, most

of the coefficients found in the lower frequency subbands have large magnitudes, while most of the coefficients found in the higher frequency subbands have relatively small magnitudes; in particular, a lot of zero coefficients are found in the highest frequency subbands. From the viewpoint of bitplane coding, where the thresholds are powers of two, this means that the bits on the higher bitplanes are mostly coded in the lower frequency subbands and the bits on the lower bitplanes are mostly coded in the higher frequency subbands. Second, far fewer coefficients are significant at the higher thresholds than at the lower thresholds. As an example, only two coefficients are significant at threshold 2^11, but 12,791 coefficients are significant at threshold 2^0.

Table 2.8: Distribution of wavelet coefficient magnitudes along thresholds 2^n and frequency level (5: the lowest, 0: the highest): Lena [23]. The peak value in each frequency level is marked in bold. The rows cover the magnitude ranges [2^11, 2^12 - 1] down to [2^3, 2^4 - 1], then [4, 5, 6, 7], [2, 3], [1], and [0] (all zeros).
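The dominant-pass symbol assignment described above can be sketched on a toy coefficient tree (a minimal illustration; the small hand-made tree below stands in for a genuine quadtree of wavelet coefficients):

```python
# Toy sketch of the EZW dominant-pass symbol assignment (PS/NS/IZ/ZTR)
# on a small coefficient tree; the tree layout is illustrative.

class Node:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def all_insignificant(node, T):
    """True if the node and every descendant are below threshold T."""
    return abs(node.value) < T and all(all_insignificant(c, T)
                                       for c in node.children)

def dominant_pass(node, T, out):
    """Emit PS/NS/IZ/ZTR symbols in root-first scan order."""
    if abs(node.value) >= T:
        out.append('PS' if node.value > 0 else 'NS')
    elif all_insignificant(node, T):
        out.append('ZTR')
        return            # descendants of a zerotree root are not coded
    else:
        out.append('IZ')  # insignificant, but some descendant is significant
    for c in node.children:
        dominant_pass(c, T, out)

# Root 40 with one significant child; threshold 32 (bitplane 5).
tree = Node(40, [Node(-34), Node(3), Node(2, [Node(1)]), Node(0)])
symbols = []
dominant_pass(tree, 32, symbols)
print(symbols)  # ['PS', 'NS', 'ZTR', 'ZTR', 'ZTR']
```

Note how the child holding 1 is never visited: its parent is a zerotree root, so its position is implied.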

SPIHT

Introduction

Further improvements over EZW were achieved by SPIHT (Set Partitioning In Hierarchical Trees), introduced by Amir Said and William Pearlman in a 1996 article [16]. In this method, more (wide-sense) zerotrees are efficiently found and represented by separating the tree root from the tree, i.e., a zerotree whose root coefficient is significant. Separately from the significance of a given coefficient, the significance of its descendants is represented by one symbol (or one bit in non-entropy-coding mode). SPIHT is computationally very fast and is among the best image compression algorithms known today. The set partitioning in trees performed on each bitplane effectively plays the role of the entropy coder, so very little coding efficiency is lost even without the popular (adaptive) arithmetic coding. SPIHT is frequently chosen as a performance benchmark in the evaluation of state-of-the-art image compression algorithms.

Set Partitioning in Trees

In order to reduce the number of decisions in bit comparisons, the set partitioning rule is defined using an expected ordering in the hierarchy implied by the subband pyramid. The natural objective here is to derive new partitions such that the sets expected to be insignificant contain a large number of elements (i.e., bits in a zerotree), while the sets expected to be significant contain only one element.

The parent-child relationships of SPIHT in the spatial orientation trees are shown in Figure 2.5. Note that 2 x 2 adjacent coefficients are processed together to exploit the local statistics (usually by entropy coding), which differs from EZW. Another difference from EZW is that each coefficient has four children except the coefficients marked * in the LL subband and the coefficients in the highest subbands (HL_1, LH_1, HH_1). The following sets of coefficient coordinates are used to describe the set partitioning method of the SPIHT algorithm.
The location of a coefficient is denoted by (i, j), where i and j are the row and column indices, respectively.

Figure 2.5: Parent-child relationship in SPIHT

  H : roots of all the spatial orientation trees,
  O(i, j) : set of offspring of the coefficient (i, j),
  D(i, j) : set of all descendants of the coefficient (i, j),
  L(i, j) : D(i, j) - O(i, j).

Figure 2.6 shows the definition of these sets in a spatial orientation tree. A significance function S_n(τ), which decides the significance of a set of coordinates τ with respect to the threshold 2^n, is defined by:

  S_n(τ) = { 1, if max_{(i,j) in τ} |c_{i,j}| >= 2^n
           { 0, otherwise.                               (2.1)

The offspring set O(i, j) is defined as:

  O(i, j) = {(2i, 2j), (2i, 2j+1), (2i+1, 2j), (2i+1, 2j+1)},

except when (i, j) is in the LL

Figure 2.6: Definition of D(i, j), O(i, j), and L(i, j) in the set partitioning algorithm

subband or in the highest frequency subbands HL_1, LH_1, and HH_1, where there are no offspring. When (i, j) is in the LL subband, O(i, j) is defined as:

  O(i, j) = {(i, j + w_LL), (i + h_LL, j), (i + h_LL, j + w_LL)},

where w_LL and h_LL are the width and height of the LL subband, respectively. The set D(i, j) is recursively defined as:

  D(i, j) = O(i, j) ∪ D(2i, 2j) ∪ D(2i, 2j+1) ∪ D(2i+1, 2j) ∪ D(2i+1, 2j+1)
          = {(2i, 2j), (2i, 2j+1), (2i+1, 2j), (2i+1, 2j+1)}
            ∪ D(2i, 2j) ∪ D(2i, 2j+1) ∪ D(2i+1, 2j) ∪ D(2i+1, 2j+1).

The set L(i, j) is defined as:

  L(i, j) = D(i, j) - O(i, j)
          = D(2i, 2j) ∪ D(2i, 2j+1) ∪ D(2i+1, 2j) ∪ D(2i+1, 2j+1).

The set partitioning rules are defined as follows:

1. The initial partition is formed with the sets {(i, j) : (i, j) ∈ H} and {D(i, j) : D(i, j) ≠ ∅, (i, j) ∈ H}.
2. If D(i, j) is significant, it is partitioned into two parts: 1) four single-element sets indexed by (k, l) ∈ O(i, j), and 2) L(i, j).
3. If L(i, j) is significant, it is partitioned into the four sets D(k, l), (k, l) ∈ O(i, j).

Coding Algorithm (SPIHT)

In the algorithm, three ordered lists are used to store the significance information during set partitioning: the list of insignificant sets (LIS), the list of insignificant pixels (LIP), and the list of significant pixels (LSP). Note that the term pixel actually refers to a wavelet coefficient when the set partitioning algorithm is applied to a wavelet transformed image. Each entry in the three lists is identified by a coordinate (i, j), which represents an individual coefficient in LIP and LSP, and a set of coefficients in LIS. A set in LIS is either D(i, j) or L(i, j), identified as type A or type B, respectively.

The algorithm consists of four parts. First, find the maximum threshold in log2 scale and initialize both LIP and LIS with all coordinates (i, j) of coefficients in H. Note that each entry in LIP indicates a single coefficient, while each entry in LIS indicates a type-A set D(i, j), i.e., the set of all descendants indexed by the root coordinate (i, j). Second, in the sorting pass, the coefficients in LIP (which are there because they were classified as insignificant in a previous pass) are tested for significance.

Those found significant are moved to LSP. Then, the sets in LIS are tested similarly. In this case, if a set D(i, j) is found significant, it is partitioned into subsets and removed from the list. If a new subset has more than one element (i.e., coefficient), it is appended to the end of LIS; otherwise, if the subset has one element, it is appended to the end of LIP or LSP depending on its significance. Note that one bit is output whenever a significance test is performed or a coefficient is added to LSP. Third, in the refinement pass, the significance (0 or 1 in the current bitplane) of every coefficient indexed in LSP at the current threshold is output.

The second and third stages are repeated with decreasing threshold. This can be understood as decreasing the quantization step size for each succeeding bitplane. In fact, with the threshold values as powers of two, each bitplane from MSB to LSB is coded sequentially. The detailed SPIHT algorithm is as follows.

Algorithm: SPIHT

1) Initialization:
   1. output n = floor(log2(max_{(i,j)} |c_{i,j}|));
   2. set LSP = ∅;
   3. set LIP = {(i, j) : (i, j) ∈ H};
   4. set LIS = {(i, j) : (i, j) ∈ H, D(i, j) ≠ ∅}, and set each entry in LIS as type A;

2) Sorting Pass:
   1. for each (i, j) ∈ LIP do:
      (a) output S_n(i, j)
      (b) if S_n(i, j) = 1, then move (i, j) to LSP and output sign(c_{i,j})
   2. for each (i, j) ∈ LIS do:
      (a) if (i, j) is type A then

         i. output S_n(D(i, j))
         ii. if S_n(D(i, j)) = 1 then
             A. for each (k, l) ∈ O(i, j):
                output S_n(k, l)
                if S_n(k, l) = 1, then append (k, l) to LSP, output sign(c_{k,l}), and set c_{k,l} = c_{k,l} - 2^n sign(c_{k,l})
                else append (k, l) to LIP
             B. move (i, j) to the end of LIS as type B
      (b) if (i, j) is type B then
         i. output S_n(L(i, j))
         ii. if S_n(L(i, j)) = 1 then
             append each (k, l) ∈ O(i, j) to the end of LIS as type A
             remove (i, j) from LIS

3) Refinement Pass:
   1. for each (i, j) in LSP, except those included in the last sorting pass, output the n-th MSB of |c_{i,j}|;

4) Quantization Pass:
   1. decrement n by 1
   2. go to step 2)

A compact illustration of the set partitioning loop is shown in Figure 2.7. Note that the input of the set partitioning engine comes only from the LIS, while the output of the set partitioning engine goes to all three lists, LIS, LIP, and LSP. The other data flow is from LIP to LSP, which is not managed by the set partitioning engine.
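The coordinate sets O, D, L and the significance test S_n used by the algorithm can be sketched directly; the 8x8 array of coefficient magnitudes below is made up, and the special LL offspring rule is omitted for simplicity:

```python
import numpy as np

# Sketch of the sets O(i,j), D(i,j), L(i,j) and the significance test
# S_n for a coefficient outside the LL subband; the 8x8 magnitude
# array is an illustrative assumption.

def O(i, j, size):
    """Offspring: the 2x2 block at the next finer scale, inside the image."""
    kids = [(2*i, 2*j), (2*i, 2*j+1), (2*i+1, 2*j), (2*i+1, 2*j+1)]
    return [(r, c) for r, c in kids if r < size and c < size]

def D(i, j, size):
    """All descendants: offspring plus their descendants, recursively."""
    out = []
    for (r, c) in O(i, j, size):
        out.append((r, c))
        out.extend(D(r, c, size))
    return out

def L(i, j, size):
    """Descendants excluding the direct offspring: L = D - O."""
    offspring = set(O(i, j, size))
    return [p for p in D(i, j, size) if p not in offspring]

def S_n(coords, coeffs, n):
    """1 if any coefficient in the set has magnitude >= 2^n, else 0."""
    return int(any(abs(coeffs[r, c]) >= 2**n for (r, c) in coords))

coeffs = np.zeros((8, 8))
coeffs[2, 2] = 35.0   # one significant offspring of the root (1, 1)
print(S_n(D(1, 1, 8), coeffs, 5))   # 1: D(1,1) contains (2,2) and |35| >= 32
print(S_n(L(1, 1, 8), coeffs, 5))   # 0: nothing significant below the offspring
```

For this configuration the sorting pass would output 1 for D(1, 1), then the four offspring significance bits, then 0 for L(1, 1).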

Figure 2.7: Set partitioning engine: partitioned subsets are moved to LIS again, or added to LSP or LIP if they are significant or insignificant coefficients, respectively. (Courtesy of Dr. James Fowler, [2])

Analysis of SPIHT algorithm

In EZW, the two symbols PS and NS (Positive Significant and Negative Significant, respectively) represent that a certain coefficient is significant, i.e., above the given threshold. Once a coefficient is coded by one of these symbols, its four branched descendants must each be probed separately for significance. Thus, four symbols necessarily follow, even when all four branched descendants are insignificant. SPIHT has a special syntax to represent this context: all four branched descendants are insignificant, conditioned on the root node being significant. This syntax requires only one symbol, or one bit (output 0 for D(i, j) in the original article). Another, more elaborate syntax represents a tree in which all descendants except the offspring and the root node are zeros; this also takes only one bit, 0 (output 0 for L(i, j) in the original article). This is the room that EZW leaves for further compression, which was first exploited by the SPIHT algorithm.

Time Complexity of Hierarchical Wavelet Transform

In this section, we formulate the time complexity of the hierarchical wavelet transform with respect to the input size, i.e., the number of pixels. Assume a square image with n pixels, so that the image is √n x √n. A one-level dyadic wavelet transform of the image requires Mn operations, excluding load and store operations, where M is the total number of multiplications performed per pixel by the low-pass and high-pass filters. For example, M would be 16 (9 multiplications for the low-pass and 7 for the high-pass filter) if the Daubechies 9/7 filter set were chosen. If we perform L levels of decomposition, each level operating on one quarter of the previous level's samples, the total number of operations WT_n is:

  WT_n = Σ_{k=1}^{L} Mn / 4^{k-1} = Mn Σ_{k=1}^{L} (1/4)^{k-1} = (4/3)(1 - (1/4)^L) Mn.

As L → ∞, WT_n → (4/3) Mn = O(n), which is linear in the number of pixels n. (In practice, the largest possible L is simply floor(log2(√n)).) In fact, once the same wavelet filter and decomposition scheme are used, the amount of computation for the hierarchical wavelet transform is the same for a given source image. Thus, separating it from the wavelet transform, we can compare the performance of the coding algorithms alone.
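The operation count above can be checked numerically; M = 16 matches the 9/7 filter example in the text:

```python
# Numerical check of the operation count: the L-level total approaches
# (4/3)*M*n, so the hierarchical transform is O(n) in the pixel count.
M = 16          # multiplications per pixel for the 9/7 filter example
n = 512 * 512   # number of pixels in a 512 x 512 image

def wt_ops(n, M, L):
    """Total operations for L decomposition levels: sum of M*n/4^(k-1)."""
    return sum(M * n / 4**(k - 1) for k in range(1, L + 1))

for L in (1, 3, 6, 9):
    print(L, wt_ops(n, M, L) / (M * n))   # ratio climbs toward 4/3
```

Even at the maximum depth for a 512 x 512 image (L = 9), the total stays below (4/3) M n.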

CHAPTER 3
QUANTIFYING THE CODING POWER OF A ZEROTREE OF WAVELET COEFFICIENTS

In this chapter, a degree-k zerotree model is presented in order to quantify the coding power of zerotrees in wavelet-based image coding. Based on the model, the coding behavior of modern zerotree-based image coders is clearly explained. We also explain why the well-known SPIHT algorithm can code a wider range of zerotrees than EZW can. An experimental result supports our idea that a higher-degree zerotree coder has more coding power.

3.1 Introduction

We analyze the popular wavelet image coders EZW [14] and SPIHT [16]. While the reason for the different zerotree coding performance of the two schemes has not been clearly stated in the literature, we establish a framework to explain it formally. The most representative zerotree-based works are EZW and SPIHT, but there are many variants, such as the work by Davis [35], and the zerotree concept has been applied to video coders as well as image coders, such as the work by Martucci et al. [36], whose 2-D zerotrees code each I-frame or P-frame (residue frame). Ramaswamy et al. [37] presented a cumulative zerotree count criterion to analyze the performance of the SPIHT coder. However, its purpose was to evaluate wavelet filters within the SPIHT algorithm; moreover, only the number of zerotrees was measured, and the height of the zerotrees was not considered.

Both EZW and SPIHT use the ideas of decaying spectral power density and successive-approximation quantization. In each coding pass of these algorithms, a significance map is constructed, which contains the significance information of every coefficient for a given threshold. The threshold decreases successively with each new pass, enabling the more important coefficients to be coded first.

Zerotrees on the bitplanes are shown in Figure 3.1. The bitplanes here are those of the wavelet coefficients obtained from the wavelet decomposition; each bitplane i is equivalent to the significance map with threshold value 2^i. In the figure, 13 bitplanes are shown, where bitplane 0 is the least significant bitplane (LSB) and bitplane 12 is the most significant bitplane (MSB). The two least significant bitplanes, 0 and 1, are not coded in this case. The darker parts indicate zeros, and the brighter parts indicate non-zeros. Note that each zerotree is always defined on one bitplane; in other words, a zerotree is not defined across two or more bitplanes. Since most of the energy is concentrated in the lower frequency subbands, large magnitude wavelet coefficients are found almost exclusively in the lower frequency subbands. Thus, only a few significant bitplanes have significant bits (i.e., 1's) in them, and furthermore these bits are found in the lower frequency subbands, as in the brighter part of bitplane 12 in Figure 3.1.

Figure 3.1: Zerotrees on bitplanes

A zerotree is defined on the assumption that if a coefficient is insignificant, it is very likely that its descendant coefficients in higher frequency subbands are also insignificant. If a coefficient and all of its descendant coefficients are insignificant (i.e., zero in a bitplane), a zerotree is found in EZW. The zerotree in EZW is simply a tree consisting of all zero values. We denote this zerotree as a degree-0

zerotree. The zerotrees in SPIHT, on the other hand, are defined in a wider sense, covering two more classes of zerotrees. The SPIHT algorithm treats a root coefficient and its descendants separately, so a tree with a significant root coefficient and insignificant descendant coefficients can be coded by a zerotree symbol; the significant root coefficient is, of course, coded separately. We denote this class of zerotree as a degree-1 zerotree, since every coefficient except at the top level is zero. SPIHT can also treat the indirect descendant coefficients separately from the root and children coefficients. Thus, a tree with significant root and children coefficients but insignificant indirect descendant coefficients can be coded by a zerotree symbol; here too, the significant root and children coefficients are coded separately. We denote this class of zerotree as a degree-2 zerotree, since every coefficient except at the top two levels is zero. Our models of zerotrees are thus based on their zeroness.

At present, no image coder that can code zerotrees of degree higher than 2 has been reported. The degree-2 zerotree is the most complex zerotree discovered so far and is used by the SPIHT image compression algorithm. From the viewpoint of a block entropy coder, the entropy coding power of a zerotree can be defined and explained simply. Based on the suggested framework, the possibility of further improvement of SPIHT or any other zerotree-based algorithm is discussed [38].

3.2 Analysis of EZW and SPIHT algorithm

As discussed in Chapter 2, once a wavelet coefficient is coded by one of the EZW symbols PS or NS (Positive Significant and Negative Significant, respectively), each of the four subtrees branching from this coefficient must be tested for significance. Thus, four symbols (one for each subtree) necessarily follow, regardless of the significance of the four branched descendants.
Meanwhile, SPIHT handles this situation differently. It codes parent coefficients and children coefficients separately, using a special syntax that represents whether there is any significant information among the children coefficients. If there are

no significant child coefficients, a 0 is coded; otherwise, a 1 is coded, followed by four bits giving the significance of each child coefficient. The syntax thus requires only one symbol, or one bit (output 0 for D(i, j) in the original article), when there is no significant child coefficient. SPIHT also defines a more advanced syntax to represent a tree having zeros for all descendant coefficients of the four children. This again takes only one bit, 0 (output 0 for L(i, j) in the original article), if all those descendant coefficients are zeros. These are the most important reasons why the SPIHT algorithm improves on the EZW algorithm in compression efficiency.

3.3 Observation of Zerotrees in the Bitplanes of Wavelet Coefficients

A zerotree is defined on a bitplane, and the zerotree root and its body are found in a spatial orientation tree. If a zerotree root is located in the LL subband, each of its three child trees is a different spatial orientation tree. For the wavelet transformed Lena image (see Figure 3.7), the bitplanes 11 (MSB) through 0 (LSB) are shown in Figures 3.8, 3.9, and 3.10. Since each bitplane image is a binary image, each pixel has only two values, zero or nonzero (one). In an EZW context, a tree with all zero values is said to be a zerotree. Note that most of the binary image is filled with zeros in bitplanes 11 through 8, the highest four bitplanes. Many zerotrees are found on each bitplane of the figures, and the height and size of the zerotrees vary with the bitplane number. Figures 3.11 and 3.12 give magnified views of the transformed image and its first four significant bitplanes (11 through 8), respectively. It is observed that zerotrees rooted in lower frequency subbands are taller than those rooted in higher frequency subbands. Taller zerotrees give higher compression efficiency in zerotree coding, since they represent more zero-valued wavelet coefficients.
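Counting degree-0 zerotrees on a single bitplane, as observed in this section, can be sketched as follows; the 2x2 child mapping and the toy bitplane are illustrative assumptions, and the special LL parent rule is omitted, so the scan skips position (0, 0):

```python
import numpy as np

# Count degree-0 zerotree roots on one bitplane: positions whose value
# and entire quadtree of descendants are all zero.  The child mapping
# is (i,j) -> (2i,2j), (2i,2j+1), (2i+1,2j), (2i+1,2j+1).

def children(i, j, size):
    kids = [(2*i, 2*j), (2*i, 2*j+1), (2*i+1, 2*j), (2*i+1, 2*j+1)]
    return [(r, c) for r, c in kids if r < size and c < size]

def is_zerotree(bitplane, i, j):
    """True if (i, j) and all of its descendants are zero bits."""
    if bitplane[i, j]:
        return False
    size = bitplane.shape[0]
    return all(is_zerotree(bitplane, r, c) for r, c in children(i, j, size))

bitplane = np.zeros((8, 8), dtype=int)
bitplane[1, 1] = 1                      # a single significant bit
roots = [(i, j) for i in range(4) for j in range(4)
         if (i, j) != (0, 0) and is_zerotree(bitplane, i, j)]
print(len(roots))   # 14: every scanned position except (1, 1)
```

On real bitplanes, the same scan finds many such roots in the most significant bitplanes, where almost all bits are zero.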

3.4 Degree-k Zerotree

We establish definitions and theorems regarding the efficacy of zerotrees, which formally explain the difference in zerotree coding power between the popular EZW and SPIHT algorithms, and possibly other zerotree-based algorithms. Viewing a zerotree as an entropy coding scheme, we classify it into different cases depending on the fullness of zeros at each level. These also indicate the possibility of further improvement in compression methods that use zerotrees of wavelet coefficients, as EZW and SPIHT do.

Throughout the following definitions, theorems, and proofs, assume that each zerotree is a height-h, t-ary (i.e., having t branches), complete (i.e., full leaves) tree (see Figure 3.2), and call this tree a source tree or source zerotree. Level 0 of the tree is the root node (the top) and level h contains the leaf nodes (the bottom); note that levels are numbered from the top (root) level, so a height-h tree has h+1 levels, level 0 to level h. At level i there are t^i nodes, and the total number of nodes in the tree is simply

  T = Σ_{i=0}^{h} t^i = (t^{h+1} - 1)/(t - 1).

Figure 3.2: A height h, t-ary tree

Each node of the tree is assumed to hold a binary value (0 or 1) and corresponds to one symbol (alphabet = {0, 1}); the value is the significance of a wavelet coefficient for a given threshold. Representing each node as a random variable X with zero-order statistics, i.e., 0 and 1 equally probable, we need N bits to encode a sequence of N nodes, X_0, X_1, ..., X_{N-1} [39][40]. Thus, for a height-α, t-ary subtree, the number of bits required to code it equals the total number of nodes in the subtree, i.e., (t^{α+1} - 1)/(t - 1) bits.
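The node count T of a complete t-ary tree of height h can be verified against its closed form:

```python
# Verify T = sum_{i=0}^{h} t^i = (t^(h+1) - 1)/(t - 1) for complete t-ary trees.
def nodes_by_sum(t, h):
    """Count nodes level by level: t^i nodes at level i."""
    return sum(t**i for i in range(h + 1))

def nodes_closed_form(t, h):
    """Closed-form geometric sum."""
    return (t**(h + 1) - 1) // (t - 1)

for t, h in [(2, 3), (4, 3), (3, 4)]:
    assert nodes_by_sum(t, h) == nodes_closed_form(t, h)

print(nodes_by_sum(4, 3))  # 85: a height-3 quadtree has 1+4+16+64 nodes
```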

Figure 3.3: A zerotree height on the bitplane

Definition 1. For any complete t-ary tree of height h, with level 0 the root node and level h the leaf nodes, if all nodes from the bottom level (level h) up to level k have zero values, we call the tree a degree-k zerotree. In other words, all nodes except those in the top k levels have zero values in a degree-k zerotree.

Figure 3.4: Degree-0, degree-1 and degree-2 zerotrees

So, a degree-0 zerotree is a tree having all zeros, as shown in Figure 3.4 (a). A degree-1 zerotree (Figure 3.4 (b)) has all zeros except the root node, and a degree-2 zerotree (Figure 3.4 (c)) has all zeros except the root node and the children of the root node.

Table 3.1 shows how EZW and SPIHT code the degree-0, degree-1, and degree-2 zerotrees differently. As discussed in [14], the EZW coder has four symbols: PS, NS (Positive and Negative Significant), ZTR (Zerotree Root), and IZ (Isolated Zero). The second symbol 0 (bold faced) in SPIHT's code for both the degree-0 and degree-1

examples signals that the root's descendant set contains no significant node, i.e., that it forms a zerotree. The second symbol 1 (boldfaced) in SPIHT's code for the degree-2 example signals that there does not exist a degree-1 zerotree, and the last symbol 0 (boldfaced) signals that there does exist a degree-2 zerotree. Note that 2 bits are required to code each symbol of EZW without entropy coding.

Table 3.1: Coded symbols generated by EZW and SPIHT for the degree-0, degree-1, and degree-2 zerotrees

  source zerotree              EZW                                         SPIHT
  degree-0 (Fig. 3.4 (a))      ZTR                                         0,0
  degree-1 (Fig. 3.4 (b))      PS,ZTR,ZTR,ZTR,ZTR                          1,0
  degree-2 (Fig. 3.4 (c))      PS,ZTR,PS,PS,ZTR,IZ,IZ,IZ,IZ,IZ,IZ,IZ,IZ    1,1,0,1,1,0,0

Now we derive a rule that applies generally to any image coding algorithm using zerotrees of wavelet coefficients. Without a zerotree symbol, a degree-$k$ zerotree in a height-$h$, $t$-ary source tree is coded in two parts:

1. Non-zero part: code all symbols from the top (root) level, level 0, to level $k - 1$, which are not all zeros (for a degree-0 zerotree there is no non-zero part). The number of these symbols is $\sum_{i=0}^{k-1} t^i$. By the definition of a degree-$k$ zerotree, at least one node at level $k - 1$ is non-zero (i.e., one). These symbols can be modeled as a sequence of random variables $X_0, X_1, \ldots, X_{(\sum_{i=0}^{k-1} t^i) - 1}$; since 0 and 1 are assumed equally probable, $\sum_{i=0}^{k-1} t^i$ bits are required to represent this sequence.

2. Zero part: code all symbols from level $k$ to the bottom level, level $h$, which are all zeros. The number of these symbols is $\sum_{i=k}^{h} t^i$; since 0 and 1 are assumed equally probable, $\sum_{i=k}^{h} t^i$ bits are required to represent this sequence of zeros.

However, if we use a zerotree symbol, the zero part can be coded with only one symbol. The number of bits saved by the use of a degree-$k$ zerotree symbol is the number of nodes from level $k$ to the bottom level of the zerotree, minus one for the

zerotree root symbol.

Definition 2. In representing a degree-$k$ zerotree of height $h$ with $t$-ary branches, the bit savings $S_k$ from using a degree-$k$ zerotree symbol is simply
$$S_k = \sum_{i=k}^{h} t^i - 1 \ \text{(bits)}.$$

For example, consider a degree-1 zerotree of height 3 containing $T = 10$ nodes in total: one root at level 0 and nine nodes at levels 1 through 3. We use one bit to represent the root node at level 0 and another bit to represent the degree-1 zerotree. The bit savings is then $S_1 = 9 - 1 = 8$ bits, since without the degree-1 zerotree symbol it takes 9 bits to represent the 9 nodes from level 1 to level 3 (the bottom). The bit savings $S_k$ is larger for a taller zerotree, i.e., a larger height $h$.

Definition 3. Following the above definition, the coding fraction $F$ of a degree-$k$ zerotree in a height-$h$, $t$-ary tree is
$$F = \frac{T - S_k}{T} = 1 - \frac{S_k}{T}.$$

For example, with the bit savings $S_1 = 8$ bits from the above example, the coding fraction is $F = (10 - 8)/10 = 0.2$, meaning that only 20% of the original nodes in the source tree must be coded, by exploiting the degree-1 zerotree symbol, to represent the tree. The coding fraction $F$ therefore decreases as the bit savings obtained by the zerotree increases. The range of the coding fraction is $0 < F \le 1$.

Theorem 1. For $k_1 < k_2$, the difference in bit savings between a degree-$k_1$ and a degree-$k_2$ zerotree, $D_{k_1,k_2} = S_{k_1} - S_{k_2}$, is
$$D_{k_1,k_2} = \sum_{i=k_1}^{k_2-1} t^i \ \text{(bits)},$$
and the difference in coding fraction is $D_{k_1,k_2}/T$.

Proof. The difference in bit savings is
$$D_{k_1,k_2} = S_{k_1} - S_{k_2} = \left( \sum_{i=k_1}^{h} t^i - 1 \right) - \left( \sum_{i=k_2}^{h} t^i - 1 \right) = \sum_{i=k_1}^{k_2-1} t^i \ \text{(bits)}.$$
The difference in coding fraction is then
$$\left( 1 - \frac{\sum_{i=k_2}^{h} t^i - 1}{T} \right) - \left( 1 - \frac{\sum_{i=k_1}^{h} t^i - 1}{T} \right) = \frac{\sum_{i=k_1}^{h} t^i - \sum_{i=k_2}^{h} t^i}{T} = \frac{D_{k_1,k_2}}{T}.$$

Corollary. The difference in bit savings between a degree-0 zerotree and a degree-1 zerotree is only one bit.

Proof. In the case of a height-$h$, 4-ary tree (i.e., a quadtree, $t = 4$), a degree-0 zerotree has bit savings $\sum_{i=0}^{h} 4^i - 1$ bits and a degree-1 zerotree has bit savings $\sum_{i=1}^{h} 4^i - 1$ bits; thus the difference in savings is only one bit. It is easy to show that this holds for a general $t$-ary tree of height $h$:
$$\left( \sum_{i=0}^{h} t^i - 1 \right) - \left( \sum_{i=1}^{h} t^i - 1 \right) = t^0 = 1 \ \text{(bit)}.$$
This implies that the difference in coding fraction is $1/T$, which is very small, so the coding powers of the degree-0 and degree-1 zerotrees are very close.

Definition 4. A degree-$k$ zerotree coder is a zerotree coder which can represent all zerotrees of degree $i$, $0 \le i \le k$.

By this definition, the degree-2 zerotree coder, for example, can code all degree-0, degree-1, and degree-2 zerotrees. Hence it is a more powerful coder than both the degree-0 and degree-1 zerotree coders and has a lower coding fraction than either (see Figure 3.5). Note that SPIHT represents a degree-0 zerotree source with the two symbols 0,0.
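Definitions 2–3 and Theorem 1 lend themselves to a direct numerical check; the following sketch (ours — the helper names are illustrative) brute-forces the identities over small trees:

```python
def bit_savings(t, h, k):
    """S_k = (sum of t^i for i = k..h) - 1, as in Definition 2."""
    return sum(t ** i for i in range(k, h + 1)) - 1

def coding_fraction(t, h, k):
    """F = (T - S_k) / T = 1 - S_k / T, as in Definition 3."""
    T = sum(t ** i for i in range(h + 1))
    return (T - bit_savings(t, h, k)) / T

def savings_difference(t, k1, k2):
    """D_{k1,k2} = sum of t^i for i = k1..k2-1, as in Theorem 1."""
    return sum(t ** i for i in range(k1, k2))

for t in (2, 4):
    for h in range(1, 7):
        # Corollary: degree-0 and degree-1 savings differ by exactly one bit.
        assert bit_savings(t, h, 0) - bit_savings(t, h, 1) == 1
        for k1 in range(h):
            for k2 in range(k1 + 1, h + 1):
                assert (bit_savings(t, h, k1) - bit_savings(t, h, k2)
                        == savings_difference(t, k1, k2))
print("Definitions 2-3 and Theorem 1 check out")
```

For the height-3 quadtree, for instance, `bit_savings(4, 3, 1)` returns 83 and `bit_savings(4, 3, 0)` returns 84, one bit apart as the corollary states.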

Figure 3.5: Relationship of coding powers among degree-0, 1, 2 zerotree coders (nested sets: the degree-2 coder, exemplified by SPIHT, subsumes the degree-1 coder, which subsumes the degree-0 coder, exemplified by EZW)

Table 3.2: Example of coded bitstreams for degree-0, 1, 2 zerotree sources produced by degree-0, 1, 2 zerotree coders ($d_i$ means degree-$i$; the symbol $0_i$ or $1_i$ indicates the occurrence of a degree-$i$ zerotree, $0_i$ for existing and $1_i$ for not existing; the $d_0$, $d_1$, and $d_2$ source zerotrees correspond to Figures 3.4 (a), (b), and (c), respectively)

Examples of the coded bitstreams produced for degree-0, 1, 2 zerotree sources by degree-0, 1, 2 zerotree coders are demonstrated in Table 3.2. Assume that we use only the two symbols 0 and 1 to code each binary decision of the zerotree coder. We can see that, for all three kinds of zerotree source, a shorter bitstream is generated by a higher-degree zerotree coder. For the degree-0 zerotree of Figure 3.4 (a), all three zerotree coders generate the same bitstream, $0_0$, meaning that a degree-0 zerotree exists. To encode the degree-1 zerotree of Figure 3.4 (b) with a $d_1$ zerotree coder, the sequence of binary decisions $1_0, 1, 0_1$ is executed; the meaning of each symbol is explained in Table 3.3.

Theorem 2. For coding a degree-$k_2$ zerotree source, the maximum bit savings of a degree-$k_2$ zerotree coder over a degree-$k_1$ zerotree coder with $k_1 < k_2$ is
$$t^{k_2 - k_1} - 1 \ \text{(bits)}.$$

Table 3.3: Binary decisions $1_0, 1, 0_1$, generated from a $d_1$ zerotree (Figure 3.4 (b)) coded by a $d_1$ zerotree coder

  symbol   syntactic meaning                          corresponding tree levels (root = 0)
  $1_0$    no degree-0 zerotree                       all
  $1$      the value at level 0 of the source tree    0
  $0_1$    degree-1 zerotree                          1 to $h$

Table 3.4: Binary decisions $1_0, 1, 0_0, 1_0, 1, 0_1, 1_0, 1, 0_1, 0_0$, generated from a $d_2$ zerotree (Figure 3.4 (c)) coded by a $d_1$ zerotree coder

  symbol           syntactic meaning                                   corresponding tree levels (root = 0)
  $1_0$            no degree-0 zerotree                                all
  $1$              the value at level 0 of the source tree             0
  $0_0$            degree-0 zerotree (first level-1 subtree)           1 to $h$
  $1_0, 1, 0_1$    no degree-0 zerotree, value 1, degree-1 zerotree    1 to $h$, 1, 2 to $h$
  $1_0, 1, 0_1$    no degree-0 zerotree, value 1, degree-1 zerotree    1 to $h$, 1, 2 to $h$
  $0_0$            degree-0 zerotree (last level-1 subtree)            1 to $h$

Proof. A degree-$k_2$ zerotree coder can represent a degree-$k_2$ zerotree source with a single symbol. However, since $k_1 < k_2$, a degree-$k_1$ zerotree coder rooted at the top level 0 of a degree-$k_2$ source zerotree cannot represent the source zerotree. Instead, by placing the roots of degree-$k_1$ zerotree coders at level $k_2 - k_1$, a degree-$k_2$ zerotree can be represented by multiple degree-$k_1$ zerotree coders; the number of these degree-$k_1$ zerotree coders minus one equals the bit savings. If $k_1 = 0$, degree-0 zerotrees rooted at level $k_2$ are coded by degree-0 zerotree coders rooted at level $k_2$. Similarly, if $k_1 = 1$, degree-1 zerotrees rooted at level $k_2 - 1$ are coded by degree-1 zerotree coders rooted at level $k_2 - 1$. In this way, degree-$k_1$ zerotrees rooted at level $k_2 - k_1$ are coded by degree-$k_1$ zerotree coders rooted at level $k_2 - k_1$. The numbers of zerotree coders rooted at level $k_2 - i$, for $i = 0, 1, \ldots, k_1$, are $t^{k_2}, t^{k_2-1}, \ldots, t^{k_2-k_1}$, respectively. Figure 3.6 shows a degree-$k_2$ zerotree coded by $t^{k_2-k_1}$ degree-$k_1$ zerotree coders (shaded part).

From the above definitions and theorems, our analysis of the performance difference between EZW and SPIHT is: the EZW algorithm uses only degree-0 zerotrees, while SPIHT uses degree-1 and

Table 3.5: Binary decisions $1_0, 1, 1_1, 0_2, 0, 1, 1, 0$, generated from a $d_2$ zerotree (Figure 3.4 (c)) coded by a $d_2$ zerotree coder

  symbol          syntactic meaning                          corresponding tree levels (root = 0)
  $1_0$           no degree-0 zerotree                       all
  $1$             the value at level 0 of the source tree    0
  $1_1$           no degree-1 zerotree                       all
  $0_2$           degree-2 zerotree                          all
  $0, 1, 1, 0$    values 0, 1, 1, 0 at level 1               1

Figure 3.6: A degree-$k_2$ source zerotree coded by $t^{k_2-k_1}$ degree-$k_1$ zerotree coders rooted at level $k_2 - k_1$ (shaded)

degree-2 zerotrees as well. The type-A entries (sets $D(i, j)$) and type-B entries (sets $L(i, j)$) in SPIHT correspond to degree-1 and degree-2 zerotrees, respectively. Ideally, if we found a higher-degree zerotree coder, i.e., degree-$m$ with $m > 2$, better coding performance would be expected. The hurdle, however, is the increased complexity in the implementation of the set-partitioning engine. Also, since the number of wavelet decompositions, or equivalently the height of a spatial orientation tree, is usually not more than 5–7, zerotrees of degree greater than 2 do not occur frequently, as will be shown subsequently. An experimental analysis of the frequency of degree-1, 2, 3 zerotrees is presented in the next section. In our experiments with a degree-3 zerotree SPIHT coder, no coding improvement was achieved. For each source tree, if it is not a degree-2 zerotree, we test whether it is a degree-3 zerotree, and

finally we code a degree-3 zerotree symbol to represent the test result. If the tree is a degree-3 zerotree, there is an apparent coding gain; if it is not, the coded degree-3 zerotree symbol acts as overhead. In most wavelet-transformed images the frequency of degree-3 zerotrees is very low, which means there is no coding improvement with degree-3 zerotree coders; the same is expected to be true for higher-degree zerotree coders.

3.5 Experimental Analysis

We show the effectiveness of a higher-degree zerotree coder by showing the actual occurrences of higher-degree zerotrees in experiments. We also examine the behavior of zerotrees depending on the significance of the bitplane in which they are located.

Tables 3.6 and 3.7 show the distributions of degree-0, degree-1, and degree-2 zerotrees coded by SPIHT, for the Lena image at 1.0 bpp with an 8-level wavelet decomposition. Each entry indicates the number of zerotrees for the specific bitplane and zerotree height. The bottom level of every zerotree is located in the highest-resolution subbands, i.e., resolution level 0, which is not shown in Tables 3.6 and 3.7. Note that bitplane 12 is the MSB and bitplane 0 is the LSB; bitplane $i$ represents the significance map with threshold $2^i$. In fact, the height of a zerotree equals the resolution level of the zerotree root, with 0 being the highest resolution and 8 the lowest. For the Lena image, the minimum threshold for the bit rate 1.0 bpp is $2^2 = 4$, which corresponds to bitplane 2; thus the two least significant bitplanes, 0 and 1, are not coded.

The Effectiveness of a Higher Degree Zerotree Coder: Existence of Higher Degree Zerotrees

An important observation is that degree-2 zerotrees occur about as frequently as degree-1 zerotrees, as shown in Tables 3.6 and 3.7.
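The kind of tally reported in Tables 3.6–3.8 can be reproduced on any coefficient tree by scanning significance maps bitplane by bitplane. The sketch below is our own simplified version: it classifies every subtree by its minimal zerotree degree rather than replaying SPIHT's stateful list updates, but the bookkeeping is the same.

```python
def sig_map(mag_levels, b):
    """Significance map at bitplane b: a coefficient is significant
    when its magnitude reaches the threshold 2^b."""
    return [[int(v >= (1 << b)) for v in level] for level in mag_levels]

def count_zerotrees(levels, t=4):
    """Tally degree-0/1/2 zerotrees among all subtrees of one complete
    t-ary significance tree, keyed by subtree height. levels[i] holds
    the t^i bits of level i in breadth-first order."""
    h = len(levels) - 1
    counts = {0: {}, 1: {}, 2: {}}
    for l in range(h):                      # only roots that have descendants
        height = h - l
        for j in range(t ** l):
            # all_zero[r]: is relative level r of this subtree entirely zero?
            all_zero = []
            for lev in range(l, h + 1):
                w = t ** (lev - l)
                all_zero.append(not any(levels[lev][j * w:(j + 1) * w]))
            k = height + 1                  # minimal k with levels k..height zero
            for r in range(height, -1, -1):
                if not all_zero[r]:
                    break
                k = r
            if k <= 2:
                counts[k][height] = counts[k].get(height, 0) + 1
    return counts

# Height-2 quadtree of magnitudes: root 9, one child 5, all else 0.
mags = [[9], [5, 0, 0, 0], [0] * 16]
c = count_zerotrees(sig_map(mags, 3))   # threshold 8: only the root significant
assert c[1] == {2: 1} and c[0] == {1: 4}
c = count_zerotrees(sig_map(mags, 2))   # threshold 4: root and one child significant
assert c[2] == {2: 1} and c[1] == {1: 1} and c[0] == {1: 3}
```

On real wavelet data, running this over every spatial orientation tree produces tables of the same shape as Tables 3.6–3.8, with rows indexed by bitplane and columns by the resolution level of the zerotree root.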
This confirms that the degree-2 zerotree coder is superior to the degree-1 zerotree coder: the degree-2 coder can directly encode a degree-2 zerotree with just one symbol, where the degree-1 zerotree coder needs three more symbols to represent a degree-2

Table 3.6: Distribution of both degree-0 and -1 zerotrees in Lena coded by SPIHT, decomposition level = 8 (rows: bitplanes 12 down to 0; columns: zerotree height, i.e., resolution level of the zerotree root)

zerotree for a 4-ary source tree (i.e., a quadtree). From this example it is certain that the higher-degree zerotree coder is more powerful, since higher-degree zerotree sources evidently exist. The trends of zerotree behavior at lower bit rates (i.e., < 1.0 bpp) will be very similar, since the less significant bitplanes are simply not coded at lower rates.

Location of Zerotree Root vs. Significance of Bitplane

The next observation concerns the location of zerotree roots depending on the significance of the bitplane. Is there any dependency between the significance of a bitplane and the location of zerotree roots in the various frequency subbands? As discussed in the Introduction of this chapter, the zerotree roots found at significant bitplanes (such as bitplanes 12, 11, 10) are located in the lowest-frequency subbands (such as resolution levels 8, 7, 6). Meanwhile, the zerotree roots found at insignificant bitplanes (such as bitplanes 2, 3, 4) are located in the highest-frequency subbands (such as resolution levels 1, 2, 3). The fact that most of the energy is concentrated in the lowest-frequency subbands explains this result well. The significant bits at significant bitplanes can be found only in the lowest-frequency subbands, with very small

Table 3.7: Distribution of degree-2 zerotrees in Lena coded by SPIHT, decomposition level = 8 (rows: bitplanes 12 down to 0; columns: zerotree height, i.e., resolution level of the zerotree root)

exceptions, such as regions of extremely high texture or high-contrast detail.

Frequency of Zerotree (especially degree-1) Occurrences

The other observation concerns the frequency of zerotree (especially degree-1) occurrences in a specific frequency subband (especially the LL subband) depending on the number of wavelet decompositions. In both Tables 3.6 and 3.7, where the wavelet decomposition level is 8, it is observed that the tallest zerotrees, with height 8, are very few, while there are many short zerotrees. However, the result is somewhat different for a smaller number of decomposition levels; see Tables 3.13 and 3.14, where the wavelet decomposition level is 5. In Table 3.13, we see 256 occurrences of degree-1 zerotree symbols at bitplane 11 (note that bitplane 12 is not coded when the level is 5). Since the total number of coefficients in the LL subband is 256 for a 5-level wavelet decomposition of the image, every spatial orientation tree rooted at LL is coded as a degree-1 zerotree. This means that, regardless of the root coefficients residing in the LL subband, it happens that their descendant coefficients are entirely zero on

Table 3.8: Distribution of degree-3 zerotrees in Lena coded by SPIHT, decomposition level = 8 (rows: bitplanes 12 down to 0; columns: zerotree height, i.e., resolution level of the zerotree root)

bitplane 11. This observation suggests a way to improve coding efficiency at low bit rates: adding a new syntax element that represents this situation (all descendants of the LL subband being zero in a bitplane) would save bit resources. The 3-D Virtual SPIHT of Danyali and Mertins [41] has exploited exactly this trend in the SPIHT algorithm. If the coding efficiency saturates around a certain decomposition level and we can use this new symbol, the improvement can be obtained. The saturation of coding efficiency is shown in Table 3.12 for 512x512 grey Lena at 1.0 bpp: the boldfaced numbers show that there is a very small increase in Mean Squared Error (MSE) between decomposition levels 5 and 8, especially for the entropy-coded case. In fact, even when the decomposition level is 8, a majority of the spatial orientation trees in the highest bitplane are coded by degree-1 zerotree symbols: three degree-1 zerotrees are coded in bitplane 12 of Table 3.6 (Lena) and four in bitplane 13 of Table 3.9 (Goldhill). Note that the maximum possible number of degree-1 zerotrees rooted at the LL subband is four for an 8

Table 3.9: Distribution of both degree-0 and -1 zerotrees in Goldhill coded by SPIHT, decomposition level = 8 (rows: bitplanes 13 down to 0; columns: zerotree height, i.e., resolution level of the zerotree root)

level wavelet decomposition, since the size of the LL subband (or baseband) is $2 \times 2$. The frequency of degree-3 zerotrees is much lower than that of degree-1 or degree-2 zerotrees. The frequency of degree-3 zerotrees for the Lena and Goldhill images is illustrated in Tables 3.8 and 3.15 (Lena, 8- and 5-level wavelet decomposition) and Tables 3.11 and 3.18 (Goldhill, 8- and 5-level wavelet decomposition), respectively.

3.6 Conclusion

We have sought to quantify the coding power of zerotrees of wavelet coefficients. A degree-$k$ zerotree is a tree with all zero values except in the top $k$ levels, and a degree-$k$ zerotree coder is a source tree coder that can encode degree-$i$ zerotrees for $0 \le i \le k$; thus a higher-degree zerotree coder has more coding power. Based on this model, we classify the popular image coders EZW and SPIHT as examples, and this leads to an answer to the question of why SPIHT is better

Table 3.10: Distribution of degree-2 zerotrees in Goldhill coded by SPIHT, decomposition level = 8 (rows: bitplanes 13 down to 0; columns: zerotree height, i.e., resolution level of the zerotree root)

than EZW: EZW is a degree-0 zerotree coder, while SPIHT is a degree-2 zerotree coder. SPIHT can encode a degree-1 or degree-2 zerotree with one symbol each, where EZW needs three more symbols for each than does SPIHT. SPIHT can thus code a wider range of zerotrees than can EZW.
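As a concrete check of this comparison, the SPIHT column of Table 3.1 can be reproduced by simulating the significance decisions SPIHT emits for a single spatial orientation tree at one threshold. This is a sketch of ours: real SPIHT interleaves these decisions across many trees through its LIP/LIS lists, which this single-tree version ignores.

```python
def spiht_tree_bits(levels, t=4):
    """Significance decisions for one spatial orientation tree at one
    threshold. levels[i] holds the t^i significance bits of level i in
    breadth-first order."""
    h = len(levels) - 1
    out = []

    def desc_sig(l, j, lo):
        """1 if the subtree of node (l, j) has a significant node at
        any absolute level >= lo."""
        for lev in range(max(lo, l + 1), h + 1):
            w = t ** (lev - l)
            if any(levels[lev][j * w:(j + 1) * w]):
                return 1
        return 0

    def code_D(l, j):
        """D(l, j) is known significant: code each child, then test L(l, j)."""
        kids = range(t * j, t * j + t)
        for c in kids:
            out.append(levels[l + 1][c])        # child significance
        if l + 2 <= h:
            sL = desc_sig(l, j, l + 2)
            out.append(sL)                      # type-B test on L(l, j)
            if sL:
                for c in kids:
                    sD = desc_sig(l + 1, c, l + 2)
                    out.append(sD)              # type-A test on D(child)
                    if sD:
                        code_D(l + 1, c)

    out.append(levels[0][0])                    # root significance
    sD = desc_sig(0, 0, 1)
    out.append(sD)                              # type-A test on D(root)
    if sD:
        code_D(0, 0)
    return out

# The three sources of Figure 3.4 on a height-2 quadtree:
assert spiht_tree_bits([[0], [0] * 4, [0] * 16]) == [0, 0]              # degree-0
assert spiht_tree_bits([[1], [0] * 4, [0] * 16]) == [1, 0]              # degree-1
assert spiht_tree_bits([[1], [0, 1, 1, 0], [0] * 16]) == [1, 1, 0, 1, 1, 0, 0]  # degree-2
```

The three outputs match the SPIHT column of Table 3.1: two symbols for the degree-0 and degree-1 sources, and seven for the degree-2 source.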

Table 3.11: Distribution of degree-3 zerotrees in Goldhill coded by SPIHT, decomposition level = 8 (rows: bitplanes 13 down to 0; columns: zerotree height, i.e., resolution level of the zerotree root)

Table 3.12: Saturation of coding efficiency over decomposition levels. Image source: 512x512 grey Lena, compression rate 1.0 bpp (rows: decomposition levels 2 through 8; columns: MSE not entropy coded, MSE entropy coded, baseband (LL) size)

Table 3.13: Distribution of both degree-0 and -1 zerotrees in Lena coded by SPIHT, decomposition level = 5 (rows: bitplanes 11 down to 0; columns: zerotree height, i.e., resolution level of the zerotree root)

Table 3.14: Distribution of degree-2 zerotrees in Lena coded by SPIHT, decomposition level = 5 (rows: bitplanes 11 down to 0; columns: zerotree height, i.e., resolution level of the zerotree root)

Table 3.15: Distribution of degree-3 zerotrees in Lena coded by SPIHT, decomposition level = 5 (rows: bitplanes 11 down to 0; columns: zerotree height, i.e., resolution level of the zerotree root)

Table 3.16: Distribution of both degree-0 and -1 zerotrees in Goldhill coded by SPIHT, decomposition level = 5 (rows: bitplanes 11 down to 0; columns: zerotree height, i.e., resolution level of the zerotree root)

Table 3.17: Distribution of degree-2 zerotrees in Goldhill coded by SPIHT, decomposition level = 5 (rows: bitplanes 11 down to 0; columns: zerotree height, i.e., resolution level of the zerotree root)

Table 3.18: Distribution of degree-3 zerotrees in Goldhill coded by SPIHT, decomposition level = 5 (rows: bitplanes 11 down to 0; columns: zerotree height, i.e., resolution level of the zerotree root)

Figure 3.7: Wavelet coefficient image of Lena, five-level decomposition (magnitudes unscaled)

Figure 3.8: Bitplanes 11, 10, 9, 8 for wavelet coefficients of Lena, five-level decomposition (white pixel: 0, black pixel: 1)

Figure 3.9: Bitplanes 7, 6, 5, 4 for wavelet coefficients of Lena, five-level decomposition (white pixel: 0, black pixel: 1)

Figure 3.10: Bitplanes 3, 2, 1, 0 for wavelet coefficients of Lena, five-level decomposition (white pixel: 0, black pixel: 1)

Figure 3.11: Magnified view of the lowest four frequency subbands, i.e., the highest three levels only (size … out of …) of wavelet coefficients, Lena, five-level decomposition

Figure 3.12: Magnified view of the lowest four frequency subbands (size … out of …) of bitplanes 11, 10, 9, 8, Lena, five-level decomposition (white pixel: 0, black pixel: 1)

CHAPTER 4
AGP (ALPHABET AND GROUP PARTITIONING)

In this chapter, the AGP algorithm is reviewed and analyzed. Its two parts, alphabet partitioning and sample-set partitioning, together known as Amplitude and Group Partitioning, are explained, and the strategy for choosing the best alphabet partitions is shown from the viewpoint of information theory.

4.1 Introduction

The emphasis of this work is to reduce the amount of computation in encoding symbols and to represent the sample signals efficiently [42][43][44]. AGP consists of two parts: alphabet partitioning and sample-set partitioning. AGP combines complexity reduction with efficient entropy coding, assuming that the statistics of nearby source symbols are very similar.

First, alphabet partitioning means that the symbols of the alphabet are grouped such that a symbol is coded in two parts: a set number and an index within the set. The goal of alphabet partitioning is to reduce the computational complexity of entropy coding. The set number is coded by a powerful entropy coder, while the set index is coded by a low-complexity coder such as a simple binary representation. The high-complexity coder does not take long, since the number of sets is small; thus the overall time complexity of entropy coding the symbols is reduced.

The waveform data to be entropy coded is usually obtained after transformation (DCT, wavelet, etc.) and quantization, and its distribution is both highly peaked and very long-tailed. This means that the entropy coder would need a large alphabet, which inevitably entails high complexity, even though a high compression ratio is possible. In practical applications, time complexity often becomes an issue at the design stage of a source coder, which may prohibit choosing a powerful entropy coder.

The feasibility of alphabet partitioning comes from the fact that a simple algorithm is available to find near-optimal partitions. This algorithm allows a large reduction in computational complexity with a very small loss of compression efficiency. In fact, various methods that fit this general strategy have long been in popular use, for example in the image and video compression standards JPEG, MPEG, and H.26x.

Second, sample-set partitioning is a scheme to compactly represent the samples in a sample space by using the dependence of adjacent samples, so that it can achieve a rate (bits/symbol) lower than the first-order entropy rate. The samples used here are the set numbers of the amplitude partition of the transformed coefficients. A group of samples, such as image pixels or coefficients, is partitioned into sub-groups; during this process, the maximum set number of the samples is entropy coded, and a binary mask is used to indicate whether or not a sub-group maximum equals the global maximum.

4.2 Analysis of Alphabet Partitioning

The overall diagram of alphabet partitioning is shown in Figure 4.1. The goal of partitioning is to reduce the complexity of entropy coding with minimal loss of coding efficiency. Here we derive the coding rate of the combined method, i.e., coding a set number and then a symbol member of that set, and finally show how to choose the partitions so that the loss of coding efficiency is minimized.

Figure 4.1: Overall diagram of alphabet partitioning (encoder: waveform source → transformation and quantization → alphabet partitioning → entropy encoder, producing set numbers and symbol indices for the channel; decoder: entropy decoder → alphabet recovery → reconstruction of the waveform)

Assume we have an $M$-symbol alphabet $A = \{a_1, a_2, \ldots, a_M\}$ with probability

of symbol $a_k$ being $p_k$, i.e., $P(a_k) = p_k$ for $k = 1, \ldots, M$ and $\sum_{k=1}^{M} p_k = 1$. Its entropy rate is defined by
$$H = \sum_{k=1}^{M} p_k \log_2 \frac{1}{p_k}.$$
We partition the source symbols into $N$ nonempty and disjoint sets $G_j$, $j = 1, \ldots, N$, and denote the number of elements in set $G_j$ by $|G_j|$. The probability that a symbol is in set $G_j$ is $P_j$, i.e., $P(\text{symbol} \in G_j) = P_j$. Now consider the ideal entropy coding method which first codes the number of the set to which the symbol belongs, and then codes the symbol's index inside that set. The ideal combined coding rate $H_1$ is

$$H_1 = \sum_{j=1}^{N} P_j \log_2 \frac{1}{P_j} + \sum_{j=1}^{N} P_j \sum_{i \in G_j} \frac{p_i}{P_j} \log_2 \frac{P_j}{p_i} \quad (4.1)$$
$$= \sum_{j=1}^{N} P_j \log_2 \frac{1}{P_j} + \sum_{j=1}^{N} \sum_{i \in G_j} p_i \log_2 \frac{P_j}{p_i} \ \text{(bits/symbol)} \quad (4.2)$$

Since $P_j = \sum_{i \in G_j} p_i$, the first term of equation (4.2) can be written as
$$\sum_{j=1}^{N} P_j \log_2 \frac{1}{P_j} = \sum_{j=1}^{N} \Big\{ \sum_{i \in G_j} p_i \Big\} \log_2 \frac{1}{P_j} = \sum_{j=1}^{N} \sum_{i \in G_j} p_i \log_2 \frac{1}{P_j} \quad (4.3)$$

Then, substituting (4.3) for the first term of (4.2), we see that $H_1$ is equal to $H$:
$$H_1 = \sum_{j=1}^{N} \sum_{i \in G_j} p_i \log_2 \frac{1}{P_j} + \sum_{j=1}^{N} \sum_{i \in G_j} p_i \log_2 \frac{P_j}{p_i} = \sum_{j=1}^{N} \sum_{i \in G_j} p_i \log_2 \frac{1}{p_i} = H.$$

Obviously, there is a penalty for assigning the maximum of $\log_2 |G_j|$ bits to every symbol in a set $G_j$, since the minimal number of bits used by symbol $i$ in set $G_j$ is $\log_2 \frac{P_j}{p_i}$. The combined entropy rate when $\log_2 |G_j|$ bits are assigned to the elements

of set $G_j$ is

$$H_2 = \sum_{j=1}^{N} P_j \left[ \log_2 \frac{1}{P_j} + \log_2 |G_j| \right] \quad (4.4)$$
$$= \sum_{j=1}^{N} P_j \log_2 \frac{1}{P_j} + \sum_{j=1}^{N} \sum_{i \in G_j} p_i \log_2 |G_j| \quad (4.5)$$

From equations (4.2) and (4.5), the difference between $H_1$ and $H_2$ is

$$\Delta H = H_2 - H_1 = \sum_{j=1}^{N} \sum_{i \in G_j} \left[ p_i \log_2 |G_j| - p_i \log_2 \frac{P_j}{p_i} \right] \quad (4.6)$$
$$= \sum_{j=1}^{N} \sum_{i \in G_j} p_i \log_2 \frac{|G_j| \, p_i}{P_j} \quad (4.7)$$

From equation (4.7), we see that the difference $\Delta H = H_2 - H_1$ will be small if we partition the alphabet in such a way that:

1. $\sum_{i \in G_j} p_i \log_2 |G_j| = P_j \log_2 |G_j|$ is very small, i.e., the relative contribution of group $G_j$ to $H_2$ is very small; or

2. $p_i \approx P_j / |G_j|$ for $i \in G_j$, i.e., the distribution within each group $G_j$ is very close to uniform.

4.3 Analysis of Sample-Set Partitioning

The sample-set is defined over a sample space or signal space, which may be transformed or not, such as a 1-D audio or 2-D image signal. The objective of the partitioning is to obtain a locally adaptive representation of the samples. Since adjacent (in time or space) samples are likely to have similar statistics, groups of samples are coded separately depending on their statistical distributions. Briefly, the set-partitioning rule or condition is applied whenever the partitions have different statistics or use different ranges of symbols in the alphabet.

For 2-D image pixels, partitioning usually starts from the entire image region. As an example, a size $2^n \times 2^n$ image will be partitioned into four

subimages of size $2^{n-1} \times 2^{n-1}$ each, and each of these subimages is recursively partitioned until the subimage size is $1 \times 1$.

The partitioning of a wavelet-transformed image, on the other hand, is different. It could work within each subband separately, as for a 2-D image. However, there is a dependence among coefficients across the frequency subbands in the same spatial region. To exploit this dependence, the partitioning is performed over each spatial orientation tree, which is obtained by the wavelet transform and corresponds to a certain spatial region of the source image; the wavelet-transformed image is viewed as a collection of spatial orientation trees. The partitioning starts from each node (coefficient) in the lowest-frequency subband, LL. Note that each node in the LL subband has three child subtrees in the three different subbands HL, LH, and HH, each of which has a different orientation. Each node in the HL, LH, and HH subbands is then partitioned into four child subtrees, and each of these child subtrees is recursively partitioned until the children belong to the highest-frequency subband. The two cases of partitioning above are best understood through examples. Note that the terminology "set" used here is completely different from the "set" in alphabet partitioning, where a set is a part of an alphabet.

Example algorithms and their assumptions

We assume that alphabet partitioning is already done and stays fixed during sample-set partitioning, so the symbols produced by sample-set partitioning are coded as pairs of a set number and a set index. Note, however, that only the set numbers are coded by these example algorithms; each corresponding set index is coded in its natural binary representation. Thus, each pixel value in the following example algorithms denotes the set number of the symbol for that pixel, not including the set index. We also assume that the set numbers are ordered by their probability.
Thus, set number 0 is the most probable and set number 1 the second most probable. In the algorithms, the variable $v_m$ stands for the maximum value in the group.
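Before the examples, it helps to make the (set number, set index) pair concrete. The sketch below is ours, using the dyadic magnitude-category partition familiar from JPEG-style coders; the specific partition is an illustrative choice, not one prescribed by the text.

```python
def partition(v):
    """Split a nonnegative value into (set_number, set_index, index_bits).
    Set n holds the values [2^(n-1), 2^n): the set number (the magnitude
    category) is entropy coded, while the index is sent raw in n - 1 bits."""
    if v == 0:
        return (0, 0, 0)            # set 0 contains only the value 0
    n = v.bit_length()              # magnitude category
    return (n, v - (1 << (n - 1)), n - 1)

def unpartition(n, idx):
    """Inverse mapping used by the decoder."""
    return 0 if n == 0 else (1 << (n - 1)) + idx

# Round-trip check: every value is recovered, and the index fits its budget.
for v in range(1024):
    n, idx, bits = partition(v)
    assert unpartition(n, idx) == v and idx < (1 << bits)
```

With a peaked source, the values inside each dyadic set are roughly equiprobable, which is exactly condition 2 of the analysis in Section 4.2, so sending the index raw costs little.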

Example 1: Groups of 2 × 2 pixels

1. Find the maximum value $v_m$ in the group of $2 \times 2$ pixels.
2. Entropy-code $v_m$; if $v_m = 0$ then stop.
3. Create a binary mask $\mu$ with 4 bits, each bit corresponding to a pixel in the $2 \times 2$ group. Each bit is set to 1 if its pixel value equals $v_m$ and to 0 otherwise.
4. Entropy-code $\mu$ using a 15-symbol alphabet ($\mu = 0$ never happens). If $\mu = 15$, i.e., 1111, or $v_m = 1$, then stop.
5. Let $r$ be the number of pixels with values smaller than $v_m$. If $v_m^r$ is sufficiently small, aggregate the $r$ symbols together and entropy-code them with a $v_m^r$-symbol alphabet; otherwise, entropy-code each of the $r$ values with a $v_m$-symbol alphabet.
6. Stop.

Example 2: Groups of 2^n × 2^n pixels

To code a group of $2^n \times 2^n$ pixels, the algorithm for $2 \times 2$ pixels is applied hierarchically and recursively, with the maximum value defined over the group, which is larger than $2 \times 2$ when $n > 1$. First, the algorithm codes the maximum value (again, the value means the set number of the sample in the partitioned alphabet) in the group of $2^n \times 2^n$ pixels. Then a 4-bit binary mask is coded to indicate which of the four sub-groups, each of size $2^{n-1} \times 2^{n-1}$, attains the maximum value. The same procedure is then applied to each $2^{n-1} \times 2^{n-1}$ sub-group until all the $2 \times 2$ sub-groups are coded. Set-partitioning of a group of $2^n \times 2^n$ pixels is started by calling the function below with the group as its argument, i.e., Set-partitioning($2^n \times 2^n$ pixels).

function Set-partitioning (a group)
1. Find the maximum value $v_m$ in the group.

2. Entropy-code $v_m$; if $v_m = 0$ then stop.
3. Create a binary mask $\mu$ with 4 bits, each bit corresponding to one of the four sub-groups. Each bit is set to 1 if the maximum value of the sub-group equals $v_m$ and to 0 otherwise.
4. Entropy-code $\mu$ using a 15-symbol alphabet ($\mu = 0$ never happens). If $\mu = 15$, i.e., 1111, or $v_m = 1$, then stop.
5. Let $r$ be the number of sub-groups with maxima smaller than $v_m$. If $v_m^r$ is sufficiently small, aggregate the $r$ symbols together and entropy-code them with a $v_m^r$-symbol alphabet; otherwise, entropy-code each of the $r$ values with a $v_m$-symbol alphabet.
6. For each sub-group $S$, Set-partitioning($S$).
7. Stop.

The advantages of this algorithm are:

1. It exploits the similar statistics shared by spatially adjacent pixels (or coefficients) over a large group.
2. Extended alphabets (the $v_m^r$-symbol alphabet of Example 1) with a small number of symbols ($v_m^r$) are automatically constructed in large groups where the values are uniformly small, giving the desirable adaptive alphabet selection.
3. A single symbol can represent a large group whose values are all zero, giving extremely efficient compression akin to the zerotrees of EZW and SPIHT.

Example 3: Groups of hierarchically transformed wavelet coefficients

The difference between applying set-partitioning to a spatial orientation tree and applying it to image pixels lies in the definition of the partitioned sub-groups: in a wavelet spatial orientation tree, the children are defined in the next higher-frequency subband.
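Ignoring entropy coding and step 5's aggregation, the recursion of Example 2 can be sketched as follows. This is our code, and it takes one liberty: a quadrant's maximum is coded only when it differs from its parent's (quadrants flagged in the mask inherit it), so the mask is never redundant.

```python
def quadrants(block):
    """Split a 2^n x 2^n block (list of lists) into its four quadrants."""
    h = len(block) // 2
    return [
        [r[:h] for r in block[:h]], [r[h:] for r in block[:h]],
        [r[:h] for r in block[h:]], [r[h:] for r in block[h:]],
    ]

def code_known_max(block, vm, out):
    """Code a block whose maximum vm the decoder already knows."""
    if vm == 0 or len(block) == 1:
        return                              # all zero, or the pixel equals vm
    subs = quadrants(block)
    maxes = [max(max(r) for r in s) for s in subs]
    mask = tuple(int(m == vm) for m in maxes)
    out.append(("mask", mask))              # 4-bit mask; 0000 never happens
    for s, m, bit in zip(subs, maxes, mask):
        if not bit:
            out.append(("max", m))          # sub-maximum, to be entropy coded
        code_known_max(s, m, out)

def set_partition(block):
    """Example 2 for a 2^n x 2^n block of set numbers; `out` lists the
    symbols a real coder would entropy-code."""
    out = []
    vm = max(max(r) for r in block)
    out.append(("max", vm))                 # global maximum first
    code_known_max(block, vm, out)
    return out

# A 4x4 block with a single nonzero set number:
blk = [[5, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
syms = set_partition(blk)
assert syms[0] == ("max", 5)
assert syms.count(("max", 0)) == 6          # each all-zero quadrant costs one symbol
assert len(syms) == 9
```

Note how every all-zero quadrant collapses to a single ("max", 0) symbol, which is advantage 3 above, the analogue of a zerotree symbol.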

The similarity is that set-partitioning of image pixels and of a spatial-orientation tree both exploit the statistical dependence among spatially adjacent samples. The spatial-orientation tree, obtained by the hierarchical wavelet transform, offers an extra dependence to exploit: the property of decreasing magnitude as frequency increases.

Note that the choice of partitioning rules, the number of partitioned subsets, and the entropy-coding scheme are all very flexible. For example, the binary mask can be used when the number of subsets is relatively small. And sometimes a non-recursive implementation of the algorithm is needed, to give a BFS (Breadth First Search) like traversal or to fit practical computing environments (e.g. small stacks).
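As a concrete illustration of Examples 1 and 2, the recursion can be sketched in Python. This is a minimal sketch under stated assumptions: the entropy-coding steps are replaced by appending plain symbols to an output list, step 5's aggregation choice is omitted, each sub-group's maximum is simply re-coded on recursion rather than coded once at the parent level, and the function and symbol names are hypothetical.

```python
def set_partition(block, out):
    """Recursively code a 2^n x 2^n block of non-negative set numbers."""
    n = len(block)
    vmax = max(max(row) for row in block)
    out.append(('max', vmax))          # steps 1-2: code the group maximum
    if vmax == 0:
        return                         # an all-zero group costs one symbol
    if n == 2:
        vals = [block[0][0], block[0][1], block[1][0], block[1][1]]
        mu = sum(1 << k for k, v in enumerate(vals) if v == vmax)
        out.append(('mask', mu))       # steps 3-4: which pixels equal vmax
        for v in vals:
            if v < vmax:
                out.append(('val', v))  # step 5: code the r smaller values
        return
    h = n // 2                         # split into four 2^(n-1) sub-groups
    subs = [[row[c:c + h] for row in block[r:r + h]]
            for r in (0, h) for c in (0, h)]
    mu = sum(1 << k for k, s in enumerate(subs)
             if max(max(row) for row in s) == vmax)
    out.append(('mask', mu))           # which sub-groups contain vmax
    for s in subs:
        set_partition(s, out)          # step 6: recurse into each sub-group

out = []
set_partition([[3, 0, 0, 0],
               [1, 2, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0]], out)
print(out)
```

Note how the three all-zero 2×2 sub-groups each cost a single symbol, the zerotree-like behavior mentioned in advantage 3 above.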

CHAPTER 5
LOW-COMPLEXITY IMAGE CODER: PROGRES

In this chapter, a very fast, low-complexity algorithm for resolution scalable and random access decoding is presented. The dynamic range of coefficient magnitudes in a spatial orientation tree is hierarchically represented. The hierarchical dynamic range coding naturally enables a resolution scalable representation of the wavelet transformed coefficients.

5.1 Introduction

Modern image coding methods, like JPEG2000's EBCOT [45], are able to support simultaneous sub-image decompression (ROI) along with quality (SNR), resolution, and spectral scalability. Unfortunately, while the loss in compression incurred by supporting these features can be quite small, they may increase computational complexity significantly. Quality scalability is commonly achieved via bit-plane coding, which also helps to improve compression, since neighboring bits provide convenient and powerful contexts for entropy coding. However, in many important applications the images always need to have a pre-defined high quality, and any extra effort spent on quality scalability is wasted. A good example of such an application is an image codec used in a digital camera. If quality embedded decoding is not required, quality embedded encoding is not necessarily needed either, because most users require a certain level of quality rather than an exact size of the compressed bitstream. Different scenes have different degrees of image content complexity, so setting the target bit rate low may cause the loss of important image content for images of high complexity.

We consider fast coding methods that support only resolution scalability and efficient decompression of sub-images by random access decoding. We focus on the entropy coding effort, which becomes the most important for high-quality images since its complexity grows with bit rate. Our solution addresses the challenge of

avoiding compression loss while at the same time reducing complexity, by using neither bit-plane coding (and its contexts) nor standard entropy coding.

The original EZW and SPIHT do not support resolution scalability, since they do not code the resolution boundaries. Random access decoding is not supported by them either. The presented algorithm, PROGRES (Progressive Resolution Decompression), is a method that exploits the same image properties as SPIHT, but adapted to support resolution scalability with great speed [46]. For a pre-defined quality, it can very efficiently decompress any image region at several resolutions. It is an excellent choice for remote sensing and GIS applications, where rapid browsing of large images is necessary.

5.2 Previous Work and Overview

Speed improvements were observed in hybrid forms of bit-plane coding, where once an image transform coefficient is classified as significant during a bit-plane pass, its sign and all its less significant bits are encoded together, so that refinement passes are not needed [47]. Oliver and Malumbres [48] presented LTW (Lower-Tree Wavelet), another solution for resolution scalable wavelet image coding with low complexity, based on non-embedded coding. The lower-trees are equivalent to the zerotrees of pre-quantized wavelet coefficients, where the quantization step size is 2^rplane (rplane is the number of lowest bit-planes to drop). Arithmetic-coded symbols for zerotree, isolated zero, and the number of bits to represent the magnitude of coefficients are used.

Similar to other wavelet based image coding methods using intra- and inter-band coding contexts, our method is based on two properties of natural images: (a) energy in each subband normally decreases with frequency; (b) statistics in a local neighborhood are similar.
Thus, we also use the strategy of coding wavelet coefficients in the order of expected importance, i.e., from low- to high-resolution subbands, and from the most significant bits downward. However, to reduce the computational burden we do not follow a plane-by-plane scan. Each coefficient, represented by sign and magnitude, is processed only once.

Since we want to avoid using standard entropy coding methods like arithmetic or Huffman codes, we can code only the sign bit and the bits below the most significant non-zero bit, so the position of that bit (the dynamic range number) must be known in advance. We code that value by coding its difference from similar values at the same position in the corresponding subband of lower resolution. Coefficients in a spatial-orientation tree are coded independently. This way we sacrifice SNR scalability for faster coding, but preserve both resolution scalability and the ability to decode sub-images.

5.3 Coefficient Dynamic Ranges

5.3.1 Representing the Dynamic Range of Coefficients

We use c_i,j and s_i,j to represent, respectively, a wavelet coefficient at location (i, j), and the spatial orientation tree (set of coefficients) with root at location (i, j). As mentioned above, to represent the magnitude compactly, the number of required bits should be known in advance. When the dynamic range of a coefficient magnitude is represented by the number of bits, k, the magnitude varies in the range [0, 1, ..., 2^k − 1]. Thus, the dynamic range is notated in two ways: 1) a range between two integers, 0 and 2^k − 1 inclusive, and 2) the minimum number of bits k needed to represent all integers in the range. We will mostly use notation 2) to represent a dynamic range in this study, and call that minimum number of bits the dynamic range number, which is closely analogous to the set number in AGP described in Chapter 4. For example, if the dynamic range number of a coefficient is 3, it can take values from −7 to +7, with one additional bit for the sign information.

Each set (a spatial orientation tree) will contain a different dynamic range of magnitudes, based on the activity of its coefficients. We define the dynamic range number r_i,j of the set s_i,j as:

r_i,j = ⌈log_2( max_{c_p,q ∈ s_i,j} |c_p,q| + 1 )⌉,

which accounts for how many bits are required to represent every coefficient magnitude in the set.
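This definition can be sketched in a few lines of Python (the function name is ours); for non-negative integers, ⌈log_2(m + 1)⌉ equals the bit length of m:

```python
def range_number(max_magnitude):
    """Dynamic range number: ceil(log2(m + 1)), i.e. the bit length of m."""
    return max_magnitude.bit_length()

print(range_number(7))     # 3: magnitudes fit in [-7, 7], plus one sign bit
print(range_number(0))     # 0: nothing to code
print(range_number(1023))  # 10: the last row of Table 5.1
```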
For example, if the maximum magnitude is 7 in the set s_0,1, then

the dynamic range number r_0,1 is 3. If the maximum magnitude is 0 in the set s_1,1, then the dynamic range number r_1,1 is 0. Table 5.1 shows the dynamic range numbers, their corresponding dynamic ranges of coefficients, the number of bits for a symbol index in each range, and the corresponding sign information.

Table 5.1: Dynamic range of coefficients

Dynamic range number | Dynamic range of coefficient | # bits for symbol index | # bits for sign
0  | [0]                |  0 | 0
1  | [-1, 0, 1]         |  1 | 1
2  | [-3, ..., 3]       |  2 | 1
3  | [-7, ..., 7]       |  3 | 1
4  | [-15, ..., 15]     |  4 | 1
5  | [-31, ..., 31]     |  5 | 1
6  | [-63, ..., 63]     |  6 | 1
7  | [-127, ..., 127]   |  7 | 1
8  | [-255, ..., 255]   |  8 | 1
9  | [-511, ..., 511]   |  9 | 1
10 | [-1023, ..., 1023] | 10 | 1

Note that a pair of (dynamic range number, symbol index within a range) here corresponds to the pair of (set number, set index) in AGP (Alphabet and Group Partitioning) discussed in Chapter 4. However, the amplitude sets in AGP are disjoint, while the dynamic ranges in PROGRES are nested. In AGP, an alphabet is partitioned into sets such that the distribution of set indices in each set is close to uniform, i.e. zero-order statistics. Thus, set indices can simply be binary coded without entropy coding, which leads to a gain in compression speed. With the same motivation, the symbol indices contained in each group are binary coded without entropy coding in the PROGRES algorithm.

5.3.2 Coding of Energy Ranges in a Partitioned Set

When a set is partitioned into its subsets, each subset will have a different dynamic range, probably a decreased one, because the root coefficient of the set is likely to have the largest magnitude in the set. Thus, child sets (i.e. subsets) are likely to have smaller dynamic ranges than that of their parent set.

Therefore, it is a good idea to predict the dynamic range of energy in each

subset based on the dynamic range of energy of the parent set, as shown in Fig. 5.1.

Figure 5.1: Dynamic ranges in a spatial orientation tree: a spatial orientation tree in the wavelet transformed image and its corresponding dynamic ranges of coefficients are shown. (The diagram depicts dynamic ranges, not a bit-plane.)

Assume that a parent set s_i,j is partitioned into four subsets, s_2i,2j, s_2i,2j+1, s_2i+1,2j, s_2i+1,2j+1; then r_2i,2j, r_2i,2j+1, r_2i+1,2j, r_2i+1,2j+1 are the dynamic range numbers for each subset, respectively. The variable r in this chapter always represents a dynamic range in bits. Thus, r_x indicates the number of bits required to represent the dynamic range of x, where x is a wavelet coefficient or a group of wavelet coefficients. Let I(i, j) = {(2i, 2j), (2i, 2j+1), (2i+1, 2j), (2i+1, 2j+1)} denote the set of position indices of the children of set s_i,j. Then the range r_m,n is defined for each subset s_m,n, (m, n) ∈ I(i, j).

Now, to represent the dynamic range number of each subset s_m,n, we encode d_base = r_parent − r_children, where r_parent is the dynamic range number r_i,j of the parent set s_i,j and r_children is the dynamic range number of the children sets of s_i,j, i.e.

r_children = max_{(m,n) ∈ I(i,j)} (r_m,n).

Figure 5.2: Coding of dynamic ranges: the dynamic range number for each subset s_m,n is reconstructed by r_children = r_parent − d_base, where the information of r_children is common to every subset s_m,n.

Note that one dynamic range number, r_children, is used to represent the magnitudes in all children sets. The encoding algorithm is described in Section 5.4. Then, on the decoder side, given the information of r_parent and d_base, r_children can be reconstructed, and we use this value as the dynamic range number for the child sets s_m,n. Note that the information r_parent − d_base is common to every subset s_m,n.

Now, the coded information for the tree s_i,j with two resolution scales will be:

r_i,j, c_i,j, d_base, c_2i,2j, c_2i,2j+1, c_2i+1,2j, c_2i+1,2j+1,

where c_2i,2j, c_2i,2j+1, c_2i+1,2j, c_2i+1,2j+1 are the root coefficients of each child subset. The c_i,j and c_m,n, (m, n) ∈ I(i, j), contain sign information. The c_i,j comprises r_i,j + 1 bits of information including a sign. The c_m,n, (m, n) ∈ I(i, j), comprise r_i,j − d_base + 1 bits of information including a sign.

There is a reason why we choose to code d_base rather than r_children, where r_children = r_parent − d_base. From our experience, it is more probable that d_base ≤ r_children; in other words, it can be observed that P(d_base ≤ r_children) > 0.5 in any wavelet transformed image, and P(d_base ≤ r_children) gets closer to 1 for lower bit rates. This explains why coding d_base will cost fewer bits than coding r_children.

The above coding scheme of dynamic ranges is applied to every two adjacent resolution scales, k and k+1, for k = 0 to M − 1, where M is the highest resolution. Note that the number of parent-children relationships increases four times with each additional resolution scale. By coding the decrease in dynamic range number, d_base, once for each group of four subsets, the amount of bit savings is simply 3 · d_base bits, since we would need a separate decrease in dynamic range number for each subtree if we did not use the common d_base information.

5.4 Coding Algorithm

The encoding algorithm of PROGRES is described here. For simplicity, we assume that the LL subband has one wavelet coefficient. Thus, the algorithm works on 2^M × 2^M wavelet coefficients if M levels of wavelet decomposition are performed. The list L contains the sets to be coded. The set s_0,0 rooted in the LL subband has three subsets s_0,1, s_1,0, s_1,1, corresponding to subbands HL_M, LH_M, HH_M. Except for the root and leaf sets, every set s_i,j has four subsets, s_2i,2j, s_2i,2j+1, s_2i+1,2j, s_2i+1,2j+1. Fig. 5.3 shows the encoding algorithm. Note that // marks a comment on the corresponding statement. As seen in Statement 5
in the algorithm, the PROGRES coder encodes the wavelet coefficient information resolution by resolution, from lower to higher. This enables progressive resolution decoding. Also, when a block of wavelet coefficients corresponding to the same sub-image is coded together, each sub-image is both random access decodable and progressive resolution decodable. In this way, the target sub-image can be decoded by random access with progressive resolution. If the LL subband has more than one coefficient, each of those coefficients becomes the root of a spatial orientation tree. Each tree is coded by the above algorithm, independently of the other trees' coefficients.

1. Find the maximum dynamic range number r_parent and binary encode it;
2. if r_parent = 0 return; // no coefficients to encode?
3. Initialize a list L ← the set in the lowest resolution (i.e. LL subband);
4. Binary encode the root coefficient in the list L using r_parent bits;
5. for each resolution level k (from the lowest to the highest)
   (a) for each set j in current resolution level k
       i. Enumerate subsets of the current set j;
       ii. r_parent ← maximum dynamic range number of current set j;
       iii. r_children ← maximum dynamic range number of subsets in current set j;
       iv. d_base ← r_parent − r_children;
       v. Unary encode d_base;
       vi. if r_children = 0, goto (a)
       vii. for each subset i
            A. Binary encode the root coefficient of the subset i using r_children bits and encode its sign information using one bit;
            B. if subset i has descendants, then append subset i to the end of the list L for next resolution coding;
       viii. Remove the current set j from the list L;

Figure 5.3: Encoding algorithm.

5.4.1 Unary Coding

Unary coding is used for coding the decrease in dynamic ranges, i.e. d_base. Unary coding is a prefix code that is the optimal Huffman code for the exponential probability distribution. For example, 0, 10, 110, ... represent the codewords C for the events x_1, x_2, ..., x_n with the probabilities p(x_1) = 1/2, p(x_2) = 1/4, ..., p(x_n) = 1/2^n, respectively. The most probable event x_1 is assigned the length-one codeword, and the length of the codeword increases by one for each next most probable event. With n

approaching infinity, the average length (coding rate) of the binary codewords, R(C), tends to 2 bits:

R(C) = lim_{n→∞} Σ_{k=1}^{n} k (1/2)^k = (1/2) / (1 − 1/2)^2 = 2 (bits),

by Gabriel's staircase,

Σ_{k=1}^{∞} k r^k = Σ_{k=1}^{∞} Σ_{i=k}^{∞} r^i = r / (1 − r)^2, 0 < r < 1.

Thus, when the source distribution follows the exponential probability, the coding rate of unary coding is close to 2 bits/symbol.

5.5 The Extended Idea of Dynamic Range Coding

The PROGRES image coder is built on an extended idea of dynamic range coding. Instead of sharing the d_base value among four children coefficients, it is shared by sixteen children in practice, whose parents are in the same tree level. In other words, these sixteen children have the same grandparent, as shown in Fig. 5.4. We assume (m, n) ∈ I(i, j) as before. Then, in Fig. 5.5, C(s_m,n) at resolution k+2 indicates the children sets of each set s_m,n at resolution k+1. Our goal here is to code the root coefficients in C(s_m,n) at resolution k+2, i.e. the grandchildren coefficients of the set s_i,j.

The information of r_parents is available to every child s_m,n at resolution k+1, since every root coefficient c_m,n is coded using r_parents bits. Now, the dynamic range for each C(s_m,n) at resolution k+2 can be predicted in two stages. First, r_children is predicted via d_base, and second, d_local,C(s_m,n) is further used to predict the dynamic range for each C(s_m,n). Thus, each set C(s_m,n) has the dynamic range number r_parents − d_base − d_local,C(s_m,n), where r_parents − d_base = r_children. As a result, the sixteen root coefficients from the sets C(s_m,n) share the information of d_base, which enables the PROGRES algorithm to code the dynamic ranges efficiently.
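The unary code used for d_base and d_local, and its 2-bit average rate under the geometric distribution above, can be checked with a short sketch (the helper names are ours, not part of the PROGRES specification):

```python
def unary_encode(n):
    """Unary codeword for a non-negative integer: n ones then a zero."""
    return '1' * n + '0'

def unary_decode(bits):
    """Read one unary codeword from the front; return (value, remainder)."""
    n = bits.index('0')
    return n, bits[n + 1:]

# A decrease of 3 in dynamic range is sent as 1110.
assert unary_encode(3) == '1110'
assert unary_decode('1110' + '01') == (3, '01')

# Average length when a codeword of length n has probability 2^-n:
avg = sum(n * 2.0 ** -n for n in range(1, 60))
print(round(avg, 6))  # approaches 2 bits/symbol
```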

Figure 5.4: Extended dynamic ranges in a spatial orientation tree: a spatial orientation tree in the wavelet transformed image and its corresponding dynamic ranges of coefficients are shown. (The diagram depicts dynamic ranges, not a bit-plane.)

5.5.1 Algorithm Description

The description of the algorithm with this extended idea is given in Fig. 5.6. Assume we have M + 1 resolutions, 0, ..., M, with 0 for the lowest resolution and M for the highest. At the beginning of the algorithm, the decrease in dynamic range between the root coefficient in the LL subband and its corresponding three 2×2 children in each of the HL, LH, and HH subbands is coded by one value. This value for the decrease in dynamic range number first appears as d_base ← r_parents − r_children in the algorithm, where r_parents and r_children represent the dynamic range numbers for the root coefficient in the LL subband and for the children coefficients in each of the HL, LH, and HH subbands, respectively.

After the above initial steps, the procedure for dynamic range coding resolution by

resolution is briefly as follows. In each loop for resolution k, first the difference (d_base) between the two dynamic range numbers for resolutions k+1 and k+2 is calculated and coded. Then, based on the dynamic range number for resolution k+2, the decrease (d_local,C(s_m,n)) in dynamic range number for each four coefficients (C(s_m,n)) at resolution k+2 having the same parent (s_m,n) is calculated and coded. Finally, each four coefficients at resolution k+2 is coded using r_parents − d_base − d_local,C(s_m,n) bits.

Figure 5.5: Extended idea of dynamic range coding: the dynamic range for each set C(s_m,n) is reconstructed by r_parents − d_base − d_local,C(s_m,n), where the information of r_children is common to every set C(s_m,n).

In statement 10 of the algorithm, i.e. the for loop over each resolution k, the actual coefficient values being coded belong to resolution k+2. The dynamic ranges of these coefficients at resolution k+2 are differentially predicted from the dynamic range numbers for resolutions k and k+1 in two stages. The two statements 10.(a).i and 10.(a).ii in the algorithm find the dynamic range numbers for resolutions k+1 and k+2, respectively. The two dynamic range numbers r_parents and r_children, for resolutions k+1 and k+2, are also shown in Fig. 5.5. While d_base represents the difference in dynamic range numbers between resolutions k+1 and k+2, d_local,C(s_m,n) represents the decrease in dynamic range numbers inside resolution k+2. The statement d_base ← r_parents − r_children at

10.(a).iii and the statement d_local ← r_children − r_subset at 10.(a).vi.B correspond to these, respectively.

1. Find the maximum dynamic range number r_parents and binary encode it;
2. if r_parents = 0 return; // no coefficients to encode?
3. Initialize a list L ← the set in the lowest resolution (i.e. LL subband);
4. Binary encode the root coefficient in the list L using r_parents bits;
5. r_children ← the maximum dynamic range number of the children coefficients of the root coefficient;
6. d_base ← r_parents − r_children;
7. Unary encode d_base;
8. if r_children = 0 exit; // means, nothing to encode (i.e. zerotree)
9. Binary encode the values of the children coefficients of the set in LL using r_children bits;
10. for each resolution level k = 0 to M − 2
    (a) for each set j in current resolution level
        i. r_parents ← the dynamic range number by which the children coefficients in current set j were coded;
        ii. r_children ← the maximum dynamic range number of the children coefficients of the subsets;
        iii. d_base ← r_parents − r_children;
        iv. Unary encode d_base;
        v. if r_children = 0 continue; // means, nothing to encode (i.e. zerotree), goto (a)
        vi. for each subset i, if subset i has children coefficients:
            A. r_subset ← maximum dynamic range number of the children of the current subset i;
            B. d_local ← r_children − r_subset;
            C. Unary encode d_local;
            D. if r_subset = 0 continue; // means, no more descendants, goto vii.
            E. Binary encode the children coefficients of subset i using r_subset bits;
            F. Append subset i to the list L for next resolution coding;
        vii. Remove the current set j from the list L;

Figure 5.6: Extended encoding algorithm.
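One dynamic-range coding step of this scheme (unary d_base followed by fixed-length magnitudes plus sign bits for four child root coefficients) can be sketched as follows. This is a simplified, hypothetical helper of our own, not the full coder: the real algorithm applies it hierarchically and additionally shares d_base among sixteen grandchildren via d_local.

```python
def code_children(r_parent, children):
    """Code four child roots: unary d_base, then r_children-bit
    magnitudes with one sign bit each. Returns (d_base, bitstring)."""
    r_children = max(abs(c).bit_length() for c in children)
    d_base = r_parent - r_children
    bits = '1' * d_base + '0'                 # unary-coded d_base
    if r_children == 0:
        return d_base, bits                   # zerotree-like: nothing more
    for c in children:
        bits += format(abs(c), '0%db' % r_children)  # magnitude
        bits += '1' if c < 0 else '0'                # sign bit
    return d_base, bits

# r_parent = 7 and children whose largest magnitude (25) needs 5 bits:
d_base, bits = code_children(7, [-6, -25, -8, 0])
print(d_base, len(bits))  # 2, and 3 + 4*(5+1) = 27 bits
```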

5.6 An Example of PROGRES Coding

For the wavelet coefficients of Fig. 5.7, a step-by-step demonstration of PROGRES coding is given in Tables 5.2 and 5.3. Each row of the table shows one coding step, in sequence. The basic processing order of the source wavelet coefficients is resolution by resolution, as stated in the beginning of this chapter. Within each resolution, the coefficients are visited according to the numbering policy shown in Figure 5.8. Note that this policy is the same as the BFS (Breadth First Search) [49] traversal order. The coefficients in the next higher resolution level are never processed until those in the current resolution level are all finished. In the last column of Tables 5.2 and 5.3, a parenthesized number paired with a value, such as (0) 96 and (1) -6, indicates the scanning order of the current wavelet coefficient and the coefficient itself (i.e. the coefficient 96 is processed first and the coefficient -6 second).

First, the initial dynamic range number r_parents for the wavelet coefficient block (representing the image block) is 7. This means that the maximum coefficient magnitude can be 2^7 − 1 = 127; the coefficient range with sign is [−127, 127]. All the coefficients in this block can be represented by 7 bits, although the dynamic range prediction scheme of PROGRES will further reduce the dynamic range through increasing resolutions. The number of bits required to represent the dynamic range can be viewed as the set number in the AGP method discussed in Chapter 4.

The actual maximum coefficient is 96, as seen in Figure 5.7, located in the LL subband at resolution level 0. Thus, the magnitude 96 and its sign + are coded with 7 + 1 bits. Now, each of the three coefficients (1) -6, (2) -25, and (3) -8 at resolution level 1 (see Figure 5.7) is coded with 5 bits, to accommodate the range maximum for (2) -25.
Note that the root coefficient (0) has three children coefficients, (1), (2), and (3), unlike the other coefficients, which have four children each. In Tables 5.2 and 5.3, observe that if the current dynamic range number becomes 0, there is nothing to code, since all the coefficients in the group are zeros.
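The bit counts in this example can be verified directly; a small sanity check, using Python's bit_length for the dynamic range number:

```python
# The block maximum 96 needs r_parents = 7 bits (2^7 - 1 = 127 >= 96).
assert (96).bit_length() == 7

# The resolution-1 children -6, -25, -8 all fit in the 5 bits required
# by the largest magnitude, 25.
assert max(abs(c) for c in (-6, -25, -8)).bit_length() == 5

# So the drop in dynamic range coded between the two levels is 7 - 5 = 2.
print(7 - 5)  # 2
```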

Figure 5.7: Quantized wavelet transformed image, four levels of decomposition, truncated to integer.

5.7 Experimental Results

Tests were performed using an Intel 2.0 GHz Xeon processor, MS-Windows 2000, and the Visual C++ compiler with speed optimization. The coding times of two 8 bpp gray scale images, Lena and Woman, at the rates of 0.125, 0.25, 0.5, and 1.0 bpp are shown in Table 5.4. The coding time is measured in CPU cycles of the processor. Since the elapsed time for the discrete wavelet transform is exactly the same when the same wavelet filter and decomposition method are used, the time complexity is measured for the coding (encoding and decoding) time only in this experiment. The binary uncoded version of 2D-SPIHT from RPI is chosen for comparison. Note that wavelet transformation times are not included. Six and eight levels of wavelet decomposition with Daubechies 9/7 filters are used for Lena and Woman, respectively. The PROGRES scheme performs lossless coding of quantizer bin numbers on the pre-quantized wavelet transformed image. Note that neither SPIHT nor PROGRES, as used here, applies subsequent entropy coding to the code streams.

Table 5.4 shows that the encoding time of PROGRES increases very slowly with increasing bit rate and reveals greater speed improvement over SPIHT

at higher bit rates, two times at 1.0 bpp. The speed improvement in decoding is achieved over all bit rates, four times on average. The loss of decoded quality (in PSNR) is very small, as shown in Table 5.5. In fact, the small loss of decoded quality is the cost of the random access decoding capability, because: 1) the maximum dynamic range number (in bits) of the transformed image block is encoded in each code-block, 2) the block length (in bytes) of each code-block is encoded in each code-block, and 3) the block size should be a whole number of bytes for random access, so unused bits in the last byte of a code-block can be wasted. The overhead usually decreases towards high bit rates. Table 5.4 also shows that PROGRES outperforms the coding speed of LTW in [48], up to two times in encoding and up to seven times in decoding; LTW uses arithmetic coding of the bit ranges of coefficients in subbands.

Figure 5.8: Coefficient scanning order in the PROGRES algorithm for an image block.

Table 5.5 shows the comparison of decoded qualities and numbers of decoded bits between PROGRES and SPIHT for progressive resolutions. In general, the loss of quality of PROGRES relative to SPIHT decreases towards higher bit rates. However, at 1.0 bpp at full resolution, the loss increases because of the increase in block length information for each code-block. For half and quarter resolutions, the difference in decoded qualities at various resolutions is small, since the details of the image are stored in the higher frequency subbands (resolutions). Notice that the loss of quality (in SNR) is very small at quarter resolution, from a minimum of ... dB at 1.0 bpp to ... dB at ... bpp.

Table 5.2: Step by step demonstration of PROGRES coding (Resolution 0 through 3; the parenthesized numbers indicate the order of coefficient encoding). Columns: resolution level, r_parents, d_base, r_children, d_local, current dynamic range number, and coded coefficient values in scan order: (0) 96; (1) -6 (2) -25 (3) -8; (4) 1 (5) -7 (6) 2 (7) 1; (8) -3 (9) -2 (10) 3 (11) -3; (12) 0 (13) 3 (14) -3 (15) 2; (16) 1 (17) 0 (18) 0 (19) 0; (20) 2 (21) -3 (22) 0 (23) -3; (24) 0 (25) 3 (26) 3 (27) -1; (28) -1 (29) -7 (30) 0 (31) 0; (32) -1 (33) 0 (34) 0 (35) 0; (36) -1 (37) 1 (38) 1 (39) 0; (40) 0 (41) 1 (42) -1 (43) 0; (44) -1 (45) 1 (46) -5 (47) 2; (52) 0 (53) 1 (54) 1 (55) 1; (56) -1 (57) 2 (58) 0 (59) 1; (60) -2 (61) -1 (62) 0 (63) 3.

One of the benefits of progressive resolution encoding/decoding is that a reduced number of bits can be decoded to reconstruct a reduced scale. In PROGRES, as shown in Table 5.5, fewer bits are required to decode at lower resolutions, especially at higher bit rates. In SPIHT, the full bitstream must be decoded even for lower resolutions. As an example, for half resolution decoding at 1.0 bpp, only 149,936 bits are decoded by PROGRES, while the full 262,144 bits are decoded by SPIHT. Similarly, for quarter resolution decoding at 1.0 bpp, only 65,400 bits are decoded by PROGRES, but the full 262,144 bits are decoded by SPIHT.

Figures 5.9 and 5.10 show the reconstructed Lena images at four different bit rates by 2D-SPIHT and PROGRES, respectively. Figures 5.11 and 5.12 show the reconstructed Goldhill images by 2D-SPIHT and PROGRES, respectively. The small differences in quality are almost unnoticeable, especially at higher bit rates.

Table 5.6 shows that the decoding times of Lena and Woman at 0.5 bpp increase with progressively increasing resolution. For the Lena image, the decoding time increases by less than 1.5 times whenever the resolution increases.
Meanwhile, the decoding time increases two times for the next higher resolution in the Woman

image.

Figure 5.9: Reconstructed Lena by SPIHT, from left to right, top to bottom: (a) 0.125 bpp, (b) 0.25 bpp, (c) 0.5 bpp, and (d) 1.0 bpp.

5.8 Analysis of PROGRES Algorithm

5.8.1 Differences from SPIHT

The biggest difference between the PROGRES and SPIHT algorithms is the presence of dynamic range coding in PROGRES. In PROGRES, for each coefficient bin, the dynamic range is represented by a unary code. SPIHT, however, does not code the dynamic range of each coefficient. Instead, it codes the maximum dynamic range (i.e. the number of bit planes) using a bitplane-by-bitplane coding

97 82 Figure 5.10: Reconstructed Lena by PROGRES, from left to right, top to bottom: (a) bpp, (b) 0.25 bpp, (c) 0.5 bpp, and (d) 1.0 bpp scheme. The bitplne itself works as a sort of a synchronization method. Whenever each bit is decoded, the decoder knows which significance map (i.e. bitplane) the bit information corresponds to. Each coefficient magnitude is represented by two parts : the number of bits to represent it and the magnitude. The number of bits is understood as the dynamic range in the thesis. The set consists of its root value (a wavelet coefficient) and subsets. Each subset is recursively defined in the same way as its parent. The representation of the dynamic range of a set is represented by the difference d base, from that of its parent set.

98 83 Figure 5.11: Reconstructed Goldhill by SPIHT, from left to right, top to bottom: (a) bpp (b) 0.25 bpp, (c) 0.5 bpp, and (d) 1.0 bpp Without entropy coding, this difference information is coded as a unary number ending with 0. The end mark 0 can be understood as significance testing bit in SPIHT. Because, for example, a decrease of dynamic range 3 in PROGRES which is 1110 in unary form can be viewed in SPIHT s context that there is no significant bits for succeeding three bitplanes and first significant bits will be found at fourth bitplane from current bitplane. Meanwhile, in EZW and SPIHT, both without entropy coding, the magnitude of a significant coefficient can be coded using m bits in m bit-planes. Assume a certain significant coefficient c i,j. If the first significant bit of coefficient c i,j is at bit-plane k, where m bit-planes are defined from 0 (LSB) to m 1 (MSB), the

99 84 Figure 5.12: Reconstructed Goldhill by PROGRES, from left to right, top to bottom: (a) bpp (b) 0.25 bpp, (c) 0.5 bpp, and (d) 1.0 bpp sorting passes will output (m 1 k) of 0 s for bit-plane m 1 down to k + 1 and one of 1 for bit-plane k and the refinement passes will output k of either 0 s or 1 s for bit-plane k 1 through 0 depending on the magnitude of the coefficient. The total number of bits is : (m k 1) k = m. We don t include the set partitioning information to locate the coefficient c i,j since it represents the location information, not the coefficient value itself. In SPIHT, a significant coefficient is coded by its position in the significance map, i.e. the bitplane corresponding to current threshold. The position is represented by a sequence of binary decisions of partitions. For the significance bit information in remaining bitplanes at the same position, only the significance for

the corresponding threshold is coded, without position.

Similarities to SPIHT

While there is an apparent difference in bitplane management between PROGRES and SPIHT, similarities can also be found if we interpret the meaning of each bit, especially for d_base and d_local of PROGRES.

First, consider the similarity between the dynamic range coding of PROGRES and the sorting pass of SPIHT. The unary value of d_base and d_local is very similar to the sequence of coefficient significance tests in SPIHT. That is, each additional 1 of a unary value in PROGRES indicates that the dynamic range of the coefficient drops by half, which corresponds to a 0 representing insignificance for the given threshold in SPIHT. As an example, if d_base = 1110 in PROGRES, it means that the coefficient is insignificant for the following three less significant bitplanes in SPIHT.

Secondly, consider the similarity between the actual coding of coefficient values in PROGRES and the refinement pass of SPIHT. In SPIHT, once the first significance bit of a coefficient appears in a certain bitplane, the refinement pass takes over for all subsequent bits of the coefficient. Each bit coded in the refinement pass of SPIHT corresponds exactly to a bit of the coefficient in binary form, starting from the bit after the first nonzero bit of the coefficient, which is similar to the actual coefficient value coding steps in PROGRES.

The last similarity, which is more involved, is between the zerotree coding scheme in SPIHT and the hierarchical dynamic range coding scheme in PROGRES. In SPIHT, two kinds of significance tests exist: one for coefficients and another for sets. The significance test mentioned in the first similarity above is a test for a coefficient. For a set, a 0 in SPIHT indicates "do not split the set," since the set does not have any significant coefficient in it, and a 1 indicates "split the set," since the set has at least one significant coefficient in it.
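The bit accounting stated earlier (a coefficient whose first significant bit is at plane k costs m - 1 - k sorting zeros, one significance 1, and k refinement bits, i.e. m bits in total) can be checked with a small sketch; the function name is illustrative, not from the thesis:

```python
def spiht_value_bits(c, m):
    """Count the value bits (ignoring location and sign) spent on a coefficient
    whose magnitude c fits in m bitplanes, numbered 0 (LSB) to m-1 (MSB)."""
    k = c.bit_length() - 1          # bitplane of the first significant bit
    sorting_zeros = (m - 1) - k     # one '0' per plane from m-1 down to k+1
    significance_one = 1            # the '1' emitted at plane k
    refinement = k                  # one bit per plane k-1 .. 0
    return sorting_zeros + significance_one + refinement

# Any significant magnitude costs exactly m bits in total.
assert spiht_value_bits(37, 8) == 8
assert spiht_value_bits(1, 8) == 8
```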
Thus, many coefficients grouped in a set are represented together with respect to their significance for the current threshold or bitplane, saving many significance bits. Analogous processing is done in PROGRES by way of hierarchical dynamic range coding. Briefly, the decrease in dynamic range between two adjacent

resolution levels means that the bits on the bitplanes corresponding to the decreased dynamic range do not need to be coded in the higher resolution levels, because they are implicitly all zeros. Say two resolution levels k and k+1 have dynamic ranges p and q (p > q), respectively. Once q is decoded from the information d_base = p - q, all the bits on bitplanes p through q+1 in resolution levels 0 (i.e., the lowest frequency) through k+1 are known to be insignificant; they are never coded in the encoding step. This procedure is very similar to the zerotree coding of coefficients in SPIHT, though in SPIHT it is performed for each bitplane. Since PROGRES also employs a spatial orientation tree for hierarchical dynamic range coding, the tree structure used to save significance bit information is identical in nature. The d_local information in PROGRES, which codes the decrease of dynamic range between a node r's children and its grandchildren, can be interpreted in a similar manner.

5.9 Entropy coding of dynamic ranges in PROGRES

Here, we examine whether there is any possibility of reducing the overhead of unary coding in PROGRES. For the Woman image encoded at 0.5 bpp, we observed the distribution of d_base information at each resolution level and measured the ideal entropy rate and the actual coding rates obtained by unary coding. As shown in Table 5.7, a coding gain can be obtained for the d_base used in predicting the dynamic range of the second lowest resolution. However, the prediction for the second resolution occurs only once for each image block to which the PROGRES algorithm is applied, so the expected overall improvement is not substantial. The proportion of d_base information in the entire compressed bitstream is about 7.6% for the Woman image encoded at 1.0 bpp, with 4096 independent spatial orientation trees or image blocks in it (i.e., 5 levels of decomposition).
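The comparison in Table 5.7 between the ideal entropy rate and the unary coding rate can be reproduced for any d_base histogram. The sketch below uses made-up counts, not the thesis data:

```python
from math import log2

def entropy_bits(counts):
    """Ideal average code length (bits/symbol) for a symbol histogram."""
    total = sum(counts.values())
    return -sum(n / total * log2(n / total) for n in counts.values() if n)

def unary_bits(counts):
    """Average unary code length: a value d costs d + 1 bits (d ones plus the 0)."""
    total = sum(counts.values())
    return sum(n * (d + 1) for d, n in counts.items()) / total

# Hypothetical d_base histogram for one resolution level.
hist = {0: 10, 1: 50, 2: 30, 3: 10}
print(f"entropy {entropy_bits(hist):.2f} bits vs unary {unary_bits(hist):.2f} bits")
```

The gap between the two averages is the coding gain an entropy coder could recover over plain unary coding.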
The remaining part of the bitstream corresponds to the magnitude and sign bit information. The proportion of d_base information increases at lower bit rates and decreases at higher rates.

5.10 Resolution Scalability and Random Access Decoding

A bitstream structure for simultaneous random access and resolution scalable decoding is shown in Figure 5.13. Let us define an image block as a non-overlapping square portion of a source image, so that the source image consists of image blocks, and assume that each image block is encoded independently. Let us also define a bitstream block as the portion of the coded bitstream associated with a given image block. Each image block i of the source image is thus independently coded into bitstream block i, and moreover each resolution is coded in a scalable fashion. A more detailed view of each block bitstream, with its length information, is shown in Figure 5.16.

Figure 5.13: Bitstream structure for a simultaneously random access decodable and resolution scalable image. Each resolution γ in block β is notated as b_{β,γ}; within each block, the sub-bitstreams run from the lowest resolution to the highest (b_{0,0}, b_{0,1}, ..., b_{0,M-1}, b_{1,0}, b_{1,1}, ..., b_{1,M-1}, ...)

Random Access

The purpose of random access decoding is to enable fast access to an area of interest with minimal decoding work. To this end, the portions of the wavelet transform of an input image corresponding to a local region are encoded independently, so that each bitstream block can be randomly accessed (see Figure 5.13). To encode a portion of the wavelet transform independently, we rearrange the wavelet coefficients so that a block of wavelet coefficients corresponds to a local region in the image, as shown in Figure 5.14. Since each image block must be independently coded, the maximum dynamic range of each block must be coded separately; this is another overhead of the random access capability, in addition to the block length information.
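The role of the per-block length information can be sketched as follows: to reach a target bitstream block, the decoder sums the lengths of the preceding blocks and skips them without decoding them. A minimal byte-offset model with a hypothetical helper:

```python
def seek_block(lengths, target):
    """Byte offset of bitstream block `target`, given each block's coded
    length; earlier blocks are skipped, not decoded."""
    return sum(lengths[:target])

# Hypothetical per-block lengths in bytes.
lengths = [120, 85, 200, 40, 90]
assert seek_block(lengths, 0) == 0
assert seek_block(lengths, 3) == 405
```

Skipping is far cheaper than decoding, which is what makes the length headers worth their overhead.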

Figure 5.14: Rearrangement of wavelet coefficients for random access coding: each block of rearranged wavelet coefficients (drawn from the HL, LH, and HH subbands at each level) corresponds to a local region in the source image

The beginning of the target bitstream block can be found by using the length information of every block coded before the target block, and this leads to minimal decoding. The length information stored in each block causes an inevitable overhead and a loss of coding efficiency. The average block seek time is MpN/2, where M is the number of resolution scales, p is the average skip time, and N is the total number of image blocks. The average seek time can be improved by a factor on the order of O(n/log2 n) by following the method in Chapter 7 of this study, which is also described in [50].

Resolution Scalable Decoding

Resolution scalability is the feature that enables an increase in resolution whenever more succeeding bits are decoded. Figure 5.15 shows an example of five different resolution scales, sequentially decoded from one encoded bitstream. Each bitstream for image block i has M resolution scales; Figure 5.16 shows the bitstream structure of block i with M resolutions. If a user wants to decode up to resolution k, the PROGRES decoder fetches the required bits from the bitstream for

Figure 5.15: Resolution progressive decoding: five resolution scales (at 0.5 bpp), from left to right: (a) 1/16, (b) 1/8, (c) 1/4, (d) 1/2, and (e) full resolution

the block i sequentially. However, the length of the sub-bitstream for each resolution is not coded, because the decoder knows where to stop when decoding each resolution. When a user wants to see more detail of block i, the remaining undecoded resolutions can be decoded resolution by resolution.

Figure 5.16: Bitstream structure for resolution scalability of block i: sub-bitstreams b_{i,0}, b_{i,1}, ..., b_{i,M-1} of lengths l_{i,0}, l_{i,1}, ..., l_{i,M-1}, where l_{i,0} + l_{i,1} + ... + l_{i,M-1} is the length of the bitstream for block i

Combined random access and resolution scalable decoding is demonstrated in Figure 5.17. Once the beginning of the target block is found, each bit

Figure 5.17: Random access and resolution scalable decoding (query: "Decode the 32x32 block at (256,256)" of the 512x512 Lena image; the target block is randomly accessed in the encoded bitstream and then progressively decoded in resolution)

after that point in the bitstream is decoded until the requested resolution is achieved. In Figure 5.17, a 32x32 target image block located at (row, column) = (256,256) of the Lena image is randomly accessed from the compressed bitstream, and then four resolutions are progressively decoded.

Conclusion

The low time complexity 2D image coding algorithm PROGRES (Progressive Resolution Decompression) has been presented. A non-bitplane coding scheme is applied to reduce the coding time. The dynamic ranges of wavelet coefficients are efficiently coded by sharing information about the decrease in energy along subbands of increasing frequency. The presented method is faster than the original 2D SPIHT, by two times in encoding and four times in decoding at 1.0 bpp. With only a small loss of quality, this scheme achieves very low time complexity together with resolution scalable and random access decodable features.

Table 5.3: Step-by-step demonstration of PROGRES coding (resolution 4); columns: resolution level, r parents' range, d_base, r children's range, d_local, current dynamic range, and the coded coefficient values in scan order. (Continued from Table 5.2; the parenthesized numbers indicate the order of coefficient encoding.)

Table 5.4: Comparison of coding time among the original RPI SPIHT, LTW, and the presented PROGRES (Lena 8 bpp, Woman 8 bpp); encoding and decoding times in cycles x 10^6 at each bitrate (bpp); wavelet transform times are not included

Table 5.5: Decoded qualities of the Lena image by SPIHT and PROGRES at progressive resolutions (full, half, and quarter); for each bit rate (bpp), the decoded quality (dB) and decoded size (bits) of each coder, and the loss of quality (dB)

Table 5.6: Decoding time of progressive resolutions for Lena and Woman, coded at 0.5 bpp; decoding times in cycles x 10^6 (inverse wavelet transform times are not included)

Table 5.7: Entropy rate of d_base at each prediction of dynamic range for Woman at 0.5 bpp (the lowest resolution level is 0 and the highest resolution level is 5); for each predicted resolution level: the entropy rate (bits), the rate by unary coding (bits), and the number of occurrences

CHAPTER 6
VOLUMETRIC IMAGE CODING BY 3D-PROGRES

In this chapter, 3D PROGRES, which codes volumetric images, is introduced. The overall design of the 3D PROGRES algorithm is briefly explained, extending the idea of the previously shown 2D version of PROGRES. The biggest difference from the 2D version is the use of a 3D spatial orientation tree, which is constructed from the 3D subband/wavelet transform coefficients.

6.1 3D-PROGRES

The design and implementation of 3D PROGRES is a straightforward extension of the 2D version of PROGRES. For wavelet decomposition, the asymmetric (decoupling) decomposition is chosen, since it is reported to show better performance when the correlation along the depth (z-axis) direction is stronger than that along the horizontal or vertical direction [19]. The decoupling wavelet decomposition first performs in-depth filtering and then repeatedly performs horizontal and vertical filtering in turn.

The volumetric image source can be represented by a three-dimensional array, on which the 3D subband/wavelet transform is performed. The transformed wavelet coefficients are then organized into an octree (i.e., an 8-ary tree) to form a 3D spatial orientation tree. While a 2D spatial orientation tree represents the parent-child relationship in three orientations, a 3D spatial orientation tree represents it in seven orientations: three correspond to in-plane and the remaining four to in-depth. A 3D spatial orientation tree is shown in Figure 6.1.

One distinguishing point of the 3D spatial orientation tree is that the growth rate of energy toward the low frequency subbands is larger than in the 2D tree. This makes the average decrease in dynamic range over two consecutive resolution levels larger than in the 2D case. The decrease in dynamic range is represented by two kinds of variables, d_base and d_local, in the 2D PROGRES algorithm, as seen in Figures

Figure 6.1: 3D spatial orientation tree

and 5.5 of Chapter 5. In 3D PROGRES, the average amount of d_base and d_local at a certain resolution level is larger than that of 2D PROGRES at the same resolution level, especially in the prediction of dynamic ranges for the lower resolution levels.

As discussed in Chapter 5, the number of coefficients in a spatial orientation tree increases exponentially. Even for a quadtree the growth is very fast, resulting in 256 and 1024 coefficients for four and five levels of decomposition, respectively. The growth is far more dramatic for an octree, where the number of coefficients increases by a factor of eight at each level, resulting in 4096 and 32,768 coefficients for four and five levels of decomposition, respectively. Finally, since the dynamic range (or energy range) rarely increases toward higher frequency subbands but rather decreases, the prediction works effectively.
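The level counts quoted above (256 and 1024 for a quadtree, 4096 and 32,768 for an octree, at four and five levels) follow directly from the branching factor, and can be checked with a one-line sketch (illustrative helper name):

```python
def coeffs_at_level(branching, level):
    """Number of coefficients reached at a given tree depth when every node
    has `branching` children: branching ** level."""
    return branching ** level

# Quadtree (2D spatial orientation tree) vs. octree (3D) growth per level.
assert coeffs_at_level(4, 4) == 256 and coeffs_at_level(4, 5) == 1024
assert coeffs_at_level(8, 4) == 4096 and coeffs_at_level(8, 5) == 32768
```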

Compression results by 3D PROGRES (no tile, no fast random access version)

Analysis

A comparison of coding speed between 3D PROGRES and 3D SPIHT is performed in this section. The trend is quite similar to that of the 2D versions of the algorithms: 3D PROGRES is faster than 3D SPIHT in every case. The performance comparisons are plotted in Figures 6.2 and 6.3, and Table 6.1 shows the comparison results for the Football and Susie video sequences. The two 3D image sources used for the experiments are Football (8 bpp, SIF format) and Susie (8 bpp, ITU 601 format). The numbers of wavelet decomposition levels are three and five for 3D-SPIHT and 3D-PROGRES, respectively. The running platform was an Intel Pentium Centrino 1.5 GHz processor with the Windows XP Pro OS; the MS Visual C compiler with no speed optimization was used to generate the executable programs.

The 3D-PROGRES algorithm shows faster coding speed than 3D-SPIHT in every case. Note that the increase of encoding time with increasing bit rate in 3D-PROGRES is very low compared to that of 3D-SPIHT, while the decoding time clearly increases at higher bit rates. For decoding, 3D-PROGRES is 5 to 10 times faster than 3D-SPIHT.

A max-tree construction

Both in PROGRES and SPIHT, a preprocessing step constructing max-tree information is performed to improve the compression speed. Implementations of SPIHT for non-commercial or academic purposes usually do not consider speed improvement, as found in the SPIHT of QccPack [2], since their main concern is just a reproduction of the exactly working algorithm. SPIHT constructs the max-tree such that the maximum magnitude among descendants is available at every node of the spatial orientation tree. Similarly, PROGRES calculates the maximum dynamic range of each subtree in advance. The goal of this analysis step is to avoid duplicate searches for coefficient magnitudes larger

Figure 6.2: Comparison of coding time between 3D-SPIHT and 3D-PROGRES for Football (SIF format); top: encoding time (sec) vs. bitrate for 3D PROGRES, 3D SPIHT, and the DWT; bottom: decoding time (sec) vs. bitrate for 3D PROGRES, 3D SPIHT, and the IDWT

Figure 6.3: Comparison of coding time between 3D-SPIHT and 3D-PROGRES for Susie (ITU 601 format); top: encoding time (sec) vs. bitrate for 3D PROGRES, 3D SPIHT, and the DWT; bottom: decoding time (sec) vs. bitrate for 3D PROGRES, 3D SPIHT, and the IDWT

Table 6.1: Comparison of coding time between RPI 3D-SPIHT and the presented 3D-PROGRES (3D image sources: Football, 8 bpp, SIF format; Susie, 8 bpp, ITU 601 format); encoding and decoding times in seconds for 3D-SPIHT, 3D-PROGRES, and the 3D-DWT/IDWT at each bitrate (bpp)

than a given threshold in SPIHT, and to avoid duplicate calculations of the maximum dynamic ranges of subtrees.

Encoding time = DWT + max-tree construction + PROGRES encoding
Decoding time = PROGRES decoding + Inverse DWT

Intuitively, the encoding time minus the max-tree construction time is expected to equal the decoding time in both algorithms, since a decoding procedure follows exactly the same execution path as the corresponding encoding procedure. Table 6.1 shows the comparison of coding speeds between 3D-SPIHT and 3D-PROGRES. The results are very similar to the two-dimensional cases. As the target bitrate increases, the pure coding time (i.e., excluding the max-tree construction in encoding) also grows in both algorithms. The overall encoding time (without the wavelet transform) grows very slowly with increasing bitrate, since the proportion of max-tree construction time in the encoding time is substantial. The elapsed times for max-tree construction are 0.69 sec and sec for 3D-SPIHT on Football and Susie, and 0.45 sec and 7.14 sec for 3D-PROGRES on Football and Susie (Table 6.2). This indicates that the actual amount of computation in

max-tree construction is larger in 3D-SPIHT.

Table 6.2: Comparison of max-tree construction times during encoding (in seconds) for 3D-SPIHT and 3D-PROGRES on Football and Susie

Coding a Very Large Image with 3D-PROGRES: A Tiled Version

Suppose we try to compress a very large three-dimensional volumetric image, say ten gigabytes in size. The first expected problem is whether the working memory space of the encoder is sufficient. Today's popular 32-bit microprocessors support a 2^32-byte physical memory space, i.e., 4 gigabytes of byte-addressable space. With the aid of a virtual memory system, the microprocessor supports a much larger logical memory space than the physical space. However, when the actual size of the source image is larger than the size of physical memory, page faults from the paging system (virtual memory system) are inevitable [51], causing overhead in image encoding. Thus, it is a good idea to split the large image into tiles and then encode each tile separately.

Tile and Block Based Coding

The 3D PROGRES is designed to encode very large volumetric images, where each 3D image block can be randomly accessed and then decoded. Also, each block can be decoded progressively in resolution, namely, from low resolution to high resolution. The tiled encoding scheme is necessary when the entire source image cannot be loaded into the encoder's main memory; the JPEG 2000 image coder, for example, has an option for tiled encoding. The block is the unit of image data for encoding and decoding, chosen with random access in mind. Meanwhile, the tile is the unit of image data for both the wavelet transform and image I/O (i.e., loading the source image into memory and storing the reconstructed image to the output device). Only the data

loaded in main memory can be processed by the CPU, driven by the source data itself and the instructions of the encoder or decoder program. Therefore, the available main memory size determines the possible tile size: the tile size cannot exceed the main memory size.

Figure 6.4: Tiled image encoding and decoding (each source tile is wavelet transformed; its blocks 0 through n-1 are encoded by the block encoder into a coded bitstream per block; the block decoder and the inverse wavelet transform reconstruct the tile image)

The random access decoding feature requires that we keep track of the order in which blocks are coded. Detailed explanations of the above statements appear in the following sections.

Tile and Block

We have two kinds of units, a tile and a block. Both the tile size and the block size are determined during the encoding stage (the tile size is a multiple of the block size), depending on the available main memory size and the desired granularity (i.e., the minimum size) of random access. A larger tile size improves the overall compression/decompression speed, since image loading/storing can be done more efficiently. A larger tile size also slightly improves the coding efficiency at low bit rates, since it reduces the occurrence of boundary artifacts. In order to get the best performance in compression and decompression, the trade-off between tile and block size should

be well considered for each application and system.

Tile and Block Addressing Scheme

The tiles in the source image are coded into the bitstream in raster scanning order, i.e., left to right, top to bottom, and front to rear (in the 3D case). Figure 6.5 shows the bitstream for sequentially coded tiles.

Figure 6.5: Tiles in the bitstream

However, the blocks in the source image are not coded in a raster scanning order; they are ordered only inside each tile. An example is shown below. Suppose each tile contains four rows of blocks, where each row has four blocks. Two adjacent image tiles are shown in Figure 6.6. The first rows of horizontally adjacent tiles are not consecutively encoded, since the next image block to be coded after block 4 is block 5, not block 17. Note that only the image information in a tile is available for each instance of encoding and decoding; that is, when tile n is being processed, tile n+1 is not available, either in encoding or in decoding. The shape of the tile is chosen as a square to best decorrelate the source by a 2D wavelet transform. In forming the link information of each block for random access decoding, a block numbering policy across the tiles should be considered.

ROI (Region of Interest) Over Tiles

The ROI is usually chosen by user interaction. Since the minimum unit of random access is a block, the ROI is defined at the block level. The ROI can overlap multiple tiles, as shown in Figure 6.7.
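The addressing scheme just described (tiles in raster order, blocks in raster order inside each tile) can be sketched for the 2D case as follows. The function and its 0-based coordinates are illustrative; the thesis numbers blocks from 1:

```python
def block_coding_order(tiles_x, tiles_y, blocks_per_tile_x, blocks_per_tile_y):
    """Global (row, col) block coordinates listed in coding order: tiles in
    raster order, and blocks in raster order inside each tile."""
    order = []
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            for by in range(blocks_per_tile_y):
                for bx in range(blocks_per_tile_x):
                    order.append((ty * blocks_per_tile_y + by,
                                  tx * blocks_per_tile_x + bx))
    return order

# Two horizontally adjacent 4x4-block tiles: the block coded after the first
# row of tile 0 starts tile 0's second row, not tile 1's first row.
order = block_coding_order(2, 1, 4, 4)
assert order[4] == (1, 0)   # 5th coded block: second row of tile 0
assert order[16] == (0, 4)  # tile 1 starts only after all of tile 0
```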

Figure 6.6: Block numbering across adjacent tiles

Figure 6.7: ROI (Region of Interest) over tiles

For decoding the ROI, each block included in the ROI is randomly accessed and decoded. Since the order of block coding is defined within each tile, the blocks in the ROI are decoded only in that order. Figure 6.8 shows an example of block decoding for an ROI. The number (1 through 64) in each block indicates when it is encoded or when it can later be decoded, i.e., the order of block coding. For decoding the ROI shown as a dotted rectangle, four tiles must be processed. For each tile, all the required blocks are decoded, and then the inverse wavelet transform is performed over those decoded blocks with appropriate extension around the boundary of the ROI. In the first tile, blocks 7, 8, 11, 12, 15, and 16 are decoded; in the second tile, blocks 21, 25, and 29; in the third tile, blocks 35 and 36; and in the fourth tile, block 49. Again, note that the inverse wavelet transform is performed for each tile just after all ROI-involved blocks in the tile are decoded.
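Gathering the ROI-involved blocks tile by tile, as in the example above, can be sketched as follows. The helper is hypothetical; a 2D slice with square blocks and square tiles is assumed:

```python
def roi_blocks_by_tile(roi, block, tile_blocks):
    """Map an ROI rectangle (x0, y0, x1, y1) in pixels to block coordinates,
    grouped by the tile containing them; a tile is tile_blocks x tile_blocks
    blocks of block x block pixels each."""
    x0, y0, x1, y1 = roi
    by_tile = {}
    for by in range(y0 // block, (y1 - 1) // block + 1):
        for bx in range(x0 // block, (x1 - 1) // block + 1):
            tile = (by // tile_blocks, bx // tile_blocks)
            by_tile.setdefault(tile, []).append((by, bx))
    return by_tile

# 32x32 blocks, 4x4 blocks per tile: an ROI straddling a tile corner touches
# four tiles, and only those tiles' ROI blocks need to be decoded.
sel = roi_blocks_by_tile((96, 96, 160, 160), 32, 4)
assert len(sel) == 4
```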

Figure 6.8: ROI and its corresponding blocks over tiles

Wavelet Transform in Tiled Coding

In encoding, the wavelet transform is performed on each tile image. This means that all blocks in the tile image are wavelet transformed together, and only the boundary pixels of the tile are extended to reduce boundary artifacts. Similarly, for decoding, if a tile image is requested, the inverse wavelet transform is performed over the entire tile (containing wavelet coefficients), with all blocks in it together. For ROI decoding, however, since not all blocks in the tile are decoded, the inverse wavelet transform over all blocks in the tile is impossible. Thus, the extension is done at every block boundary that is not adjacent to another decoded block. Figure 6.9 shows how the extension is performed for the wavelet transform of each tile.

6.4 Resolution Progressive and Random Access Decoding in 3D-PROGRES

The idea of progressive random access decoding in PROGRES is explained in Section 5.10 of Chapter 5. In this section, experimental results of simultaneous progressive resolution and random access decoding are presented for volume images. Figures 6.10 through 6.14 and 6.15 through 6.19 show randomly accessed and progressively decoded ROIs (regions of interest) of the Susie video sequence and the Chest volume image, both coded at 0.5 bpp. The ROIs selected for the experiments

Figure 6.9: Extension of pixels around decoded blocks in each tile (showing decoded blocks for the ROI, blocks that are not decoded, and the extension region)

are a region of size 200x100x64 at location (x, y, z) = (200,100,0) for the Susie sequence and a region of size 64x64x64 at location (x, y, z) = (100,25,0) for the Chest image. Tables 6.3 and 6.4 show the experimental results of resolution scalable decoding with random access. Whenever a user-selected ROI is to be decoded, every block involved with the ROI is randomly accessed and then decoded, which leads to substantial time savings by avoiding a full-frame decoding. More detailed explanations of the results in the tables are given in the following sections.

Resolution Progressive Decoding

The selected 3D ROIs (regions of interest) are decoded in a resolution progressive manner. That is, the bitstream parts corresponding to the ROIs can be decoded from lower resolution to higher resolution, progressively; if a user wants to see more detail of the ROI, the information for the next resolution is decoded. Figures 6.11 through 6.14 show the resolution progressive ROI decoding of the encoded Susie bitstream, from quarter to full resolution. Figures 6.17 through 6.19 show the resolution progressive ROI decoding for Chest. Note that, using the random access decoding scheme, only the blocks corresponding to the requested ROI are decoded progressively in resolution.

As seen in Tables 6.3 and 6.4, the decoding time of a full frame clearly decreases at lower resolutions. For the 720x480x128 Susie sequence, half resolution decoding is done about nine times faster than full resolution decoding. For the smaller 3D image source, the 256x256x64 Chest, it is about six times faster at half resolution. Note that the total decoding times in the tables include the disk writing time of the reconstructed image as well as the PROGRES decoding and inverse DWT (discrete wavelet transform) time.

Table 6.3: Comparison of 3D-PROGRES decoding times at various resolutions for Susie (8 bpp) coded at 0.5 bpp; for each decoded region (full frame or ROI) and resolution (1/4, 1/2, full): the PROGRES decoding time, the IDWT + I/O time, and the total time, in seconds (the selected ROI is at location (x, y, z) = (200,100,0))

Table 6.4: Comparison of 3D-PROGRES decoding times at various resolutions for Chest (8 bpp) coded at 0.5 bpp; same columns as Table 6.3 (the selected ROI is at location (x, y, z) = (100,25,0))

Random Access Decoding of 3D ROI (Region of Interest)

The granularity of random access is one image block in both cases. An image bitstream file consists of independent parts, each of which corresponds to a randomly accessible image block. In order to randomly access a target time-efficiently, minimal decoding is desirable. Three different approaches to random access are presented in Chapter 7; one of the main goals of studies related to random access of images or video is to reduce the block seek time. The easiest way to implement random access is to put the length of each block in its header. The block length information is used to skip the blocks ahead of the target block in the bitstream. Because the average block decoding time is far greater than the time needed to skip a block or a group of blocks, the block skipping technique prevents unnecessary blocks from being fully decoded. When the block length is used as a pointer to the beginning of the next block, we call it a link or index. The time efficiency of random access decoding for a coded 3D image is demonstrated in Chapter 7; in this section, we show the time savings of ROI decoding achieved with the help of random access decoding. For both the Susie and Chest sources, the block skipping method is applied. The experimental results showing the time savings from random access decoding and ROI decoding are seen in Tables 6.3 and 6.4. The difference in decoding time between full-frame decoding and ROI decoding is clear. For the Susie bitstream, ROI decodings are four, seven, and eleven times faster than full-frame decoding at quarter, half, and full resolution, respectively. For the smaller Chest bitstream, ROI decoding is about 2.5 times faster than full-frame decoding at all three resolutions. The decoding speed gain obtained from ROI and random access decoding depends on the size of the ROI: a smaller ROI gives more gain in decoding speed.

Figure 6.10: 3D volumetric view of 3D image source: Susie (720x480x128)

Figure 6.11: Susie ROI decoded at quarter resolution (50x25x16)

Figure 6.12: Susie ROI decoded at half resolution (100x50x32)

Figure 6.13: Susie ROI decoded at full resolution (200x100x64), view 1

Figure 6.14: Susie ROI decoded at full resolution (200x100x64), view 2

Figure 6.15: 3D volumetric view of 3D image source: Chest (256x256x64), view 1

Figure 6.16: 3D volumetric view of 3D image source: Chest (256x256x64), view 2

Figure 6.17: Chest ROI decoded at quarter resolution (16x16x16)

Figure 6.18: Chest ROI decoded at half resolution (32x32x32)

Figure 6.19: Chest ROI decoded at full resolution (64x64x64)

CHAPTER 7
FAST RANDOM ACCESS DECODING

7.1 Introduction

With the enormously increasing quantity of image volumes in recent years, most image servers need to store volume image data in compressed form, and they also need the ability to search and randomly access image data on the compressed bitstream. In addition, for huge images of hundreds of gigabytes or more, the volume image should be compressed in a tiled fashion to avoid thrashing caused by excessive context switching in modern operating systems based on paging [51]. For this reason, a tiled version of 3D-SPIHT (Set Partitioning In Hierarchical Trees) [18] is a good choice for our experiments.

Random access decoding is a scheme for extracting target information from the bitstream with minimum decoding work. It is usually required in interactive image browsing systems, where users first browse coarse resolution images and then look into the details of some parts of the images according to their interests. In content-based image retrieval systems, the spatial location of the target image object is often one of the key search features, where the location is the index of an image block in the bitstream. In image server systems, such as image databases or digital video broadcasting systems, users search for and request parts of an image or video; on the server side, there is a ceaseless stream of such access requests, each of which leads the server to fetch the corresponding portions of the image or video from the coded bitstream. Thus, the access time should be minimized. To this end, it is desirable that the access time to a given part (or block) of the image be well predicted, and it can be best predicted if each access time for an image block is constant. Here we find the motivation for our research.

Figure 7.1: Encoding of an image volume and its coded bitstream. An image volume with 8 blocks, each of size w x w x w, is encoded into a compressed bitstream of 8 block bitstreams, b_0 through b_7.

Conventional Random Access Decoding: Linear and Slow

Conventional random access decoding methods in JPEG 2000 [32, 33] simply use a map of indices, or links, to the blocks ("code-blocks" in JPEG 2000 or EBCOT notation). To find the code-block (or the larger precinct or tile) of interest, the decoder must look up all the indices up to the target block. Consequently, its block seek time depends entirely on the location of the block within the bitstream. This is not a problem for images of several megabytes. However, for huge volume images comprising gigabytes, such as the Visible Human Project [52], the overall performance of random access decoding in an interactive imaging system will depend heavily on block seek time. Rodler [53] presented a representation of wavelet compressed volume data for fast random access to a voxel. However, it was designed from a graphics viewpoint, and therefore its coding efficiency was not competitive.

7.2 Random Access Decoding Based on Image Blocks: Three Methods

An image volume is encoded as n independently decodable block bitstreams, b_i, 0 ≤ i ≤ n−1. An example for n = 8 is shown in Figure 7.1. Each block bitstream b_i contains the information of a given image block and consists of a positive integer number of bytes, which may differ for each image block. Three different random access decoding methods are described. Figures 7.2, 7.3, and 7.4 show the three block seek methods: full, linear, and bi-sectional, respectively.

Full Decoding Seek

Figure 7.2: Full decoding seek in the bitstream of image blocks. With n = 8 blocks, the average block seek time is q·n/2, where q is the average time for decoding a block (q >> p, with p the link following time to jump or skip a block).

In full decoding seek, i blocks must be decoded to seek block i, b_i. Thus, the average block seek time is q·n/2, where q is the average decoding time of a block and n is the total number of blocks in the bitstream. In Figure 7.2, if b_7 is requested, 7 blocks (i.e., b_0 through b_6) must be decoded to reach b_7. The block seek time depends entirely on where the target block is located in the bitstream. In the worst case, the seek time for the last block (i.e., the (n−1)-th block, b_{n−1}) is q·(n−1). The complexity of the average seek time for the full decoding method is O(n) for n blocks, and the constant factor q is relatively large.
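As a minimal sketch of this procedure (in Python, with a toy decode_block standing in for the real 3D-SPIHT block decoder; all names here are illustrative, not part of the actual codec):

```python
def full_decoding_seek(blocks, target, decode_block):
    # Without stored length links, reaching block `target` requires
    # decoding every block from b_0 up to and including b_target,
    # so one request costs (target + 1) block decodes (~ q*n/2 on average).
    result = None
    for i in range(target + 1):
        result = decode_block(blocks[i])
    return result

# Toy "codec": decoding strips a marker, and we count invocations.
calls = []
def decode_block(b):
    calls.append(b)
    return b.replace("enc:", "")

blocks = [f"enc:block{i}" for i in range(8)]  # n = 8 as in Figure 7.2
print(full_decoding_seek(blocks, 7, decode_block))  # prints block7
print(len(calls))  # 8: every block was decoded to reach the last one
```

The call counter makes the O(n) cost with large constant q visible: each seek of block i pays i + 1 full block decodes.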

Figure 7.3: Linear block seek in the bitstream of image blocks. With n = 8 blocks, the average block seek time is p·n/2, where p is the time to jump to the next block by following a link; each link field b_i^0 sits at the beginning of block b_i.

Linear Random Access Decoding

From the beginning of a block bitstream b_i, we can jump directly to the next block bitstream b_{i+1} without actually decoding block b_i if we know the length of b_i. This saves time for random access. In the linear method, the length information of each block bitstream b_i (except the last block, b_{n−1}) is stored in the header of block b_i. Specifically, a link field is reserved at the beginning of each block in the bitstream, except the last block. Each link field contains the size of the block in bytes. The size of the link field is determined by the maximum allowable size of a block in bytes: assuming 2^m bytes is the maximum size of every block, the size of every link field is m bits. Recall that m bits can represent 2^m different events (i.e., lengths in this context). In Figure 7.3, each link is represented by the link field b_i^0 comprising m bits. Since the last block has no link field, for a bitstream with n blocks there are a total of n−1 link fields, and the space reserved for link fields is m·(n−1) bits in total. We assume that the link fields are not entropy coded. Figure 7.3 shows b_7 being requested, where 7 links are followed to jump to b_7. Note that the subscript i in b_i^0 indicates the block number or index, while the superscript 0 indicates the range of the jump taken by following the link, which is fixed at one block in the linear method. The link from block b_i to b_{i+1}, 0 ≤ i ≤ n−2, is represented by the link field b_i^0 as shown in Figure 7.3. Since the average number of jumps to reach the target block is approximately

n/2, the average seek time of the linear method is p·n/2, where p is the jump time to the next block by following a link. This is far less than that of the full decoding method, since p << q. The complexity of the average seek time for the linear method is O(n) for n blocks, and the constant factor p is very small. In the worst case, all the links are followed to reach the last block, so the worst case seek time is p·(n−1).

Bi-sectional Random Access Decoding

Figure 7.4: Bi-sectional block seek in the bitstream of image blocks. With n = 8 blocks, the average block seek time is p·log_2 n, where p is the time to jump to another block by following a link; block b_0 holds the link fields b_0^2, b_0^1, and b_0^0.

The suggested bi-sectional random access decoding gives a large improvement over the linear method. Figure 7.4 shows b_7 being requested, where only 3 links (from b_0 to b_4, from b_4 to b_6, and from b_6 to b_7) are followed. This is possible because the seek to the block is performed similarly to a binary search. Three link fields, b_0^0, b_0^1, and b_0^2, are stored in block b_0, and two link fields, b_4^0 and b_4^1, are stored in b_4. The blocks b_2 and b_6 each have one link field. The blocks b_1, b_3, b_5, and b_7 have no link fields. The superscript 0 in the link field b_i^0 indicates that the link it represents has a jump range of 2^0 = 1 block. In general, for the link field b_i^j of block i in the bitstream, the superscript j indicates that the link is used to jump over 2^j blocks. Unlike in linear access decoding, the block lengths stored in the link fields of the bi-sectional method are not trivially obtained in the encoding phase. The procedure is as follows. We use a stack to keep the most recently generated

link information, which helps to efficiently calculate the lengths of multiple blocks and generate the link field information. As an example, in Figure 7.4, the length of block b_0 is obtained after the block is encoded and is then written to the link field b_0^0. This block length (link) information b_0^0 is stored in the stack for further use in calculating the length of the two blocks b_0 and b_1. Once b_1 is encoded, its block length is available. Then, to generate the length of the two blocks b_0 and b_1, i.e., b_0^1 in the figure, the block length of b_1 is added to the block length of b_0 (i.e., b_0^0) stored in the stack. Note that the block length of b_0 is popped off the stack for this, so the stack is now empty. The newly generated length is written to the link field b_0^1 and then stored in the stack; at this point, there is one element in the stack. This procedure is repeated until all the link field information is obtained. Note that, at any time, the maximum number of elements in the stack is just log_2 n − 1 for n image blocks in the bitstream, which is two for a bitstream of eight blocks as in Figure 7.4.

If we view the target block index in the binary number system, the decision steps for jumping can be understood more easily. Given n blocks in the bitstream and the index range [0, n−1], convert a target index j into a binary number with log_2 n digits. As an example, block index j = 7 is 111 in binary. The first 1 indicates that we take the farthest jump (from b_0^2). After taking the jump to b_4, two links, b_4^0 and b_4^1, are available. The second 1 indicates that we take the jump from link field b_4^1, which is the farthest jump available in block b_4. The last 1 likewise indicates that we take the farthest jump at the current position. As another example, if the target index j were 5, which is 101 in binary, the second bit 0 would indicate not to take the jump from block b_4 to block b_6.
The last bit 1 then indicates taking the jump from b_4 to block b_5, which is the target block we seek. The general procedure starts by deciding whether to take the farthest jump from the first block, b_0. The farthest jump, following the link from b_0^2, jumps over four blocks to reach b_4. In this way, the first four blocks b_0 through b_3 are discarded from the search space by one link-following jump. If this link is not taken, the four blocks b_4 through b_7 are discarded from the search space instead. After making the first decision on jumping and its resulting block

position, we decide again on the next jump, which discards half of the remaining subspace. These decision steps continue iteratively until we arrive at the target block, and the maximum number of decision steps is log_2 n.

Figure 7.5: Adding the jump targets, from the longest jump target (block n/2), through the second longest (blocks n/4 and 3n/4), down to the shortest jump targets (every block from 1 to n−1).

Assuming n = 2^m, m ≥ 0, the jump target locations are: {n/2} after the longest link (i.e., jumping over n/2 blocks) is added; {n/4, n/2, 3n/4} after the next longest links (there are two of them, each jumping over n/4 blocks) are added; ...; {2i : 1 ≤ i ≤ n/2 − 1} after the second shortest links (i.e., jumping over two blocks) are added; and finally {i : 1 ≤ i ≤ n−1} after the shortest links (i.e., jumping over one block) are added. This procedure is shown in Figure 7.5. Counting the total number of jump target locations gives the total number of links, 1 + 2 + 4 + ... + n/2 = n − 1. In this sum, the first 1 is the number of longest links, the 2 is the number of second longest links, ..., and the final n/2 is the number of shortest links. Note that the bi-sectional method requires the same number of links, n − 1, as the linear method uses. If we assume the maximum size of each block is 2^m bytes, the total number of

bits to represent all links is:

1·(m + (log_2 n − 1)) + 2·(m + (log_2 n − 2)) + 4·(m + (log_2 n − 3)) + ... + (n/2)·m
= Σ_{i=0}^{log_2 n − 1} 2^i · (m + log_2 n − 1 − i)

for n blocks. In the equation, the index i indicates the jump range, with i = 0 for the longest jump and i = log_2 n − 1 for the shortest jump. The term 2^i is the number of links for each jump range. The numbers of link field bits for the different jump ranges, from the longest to the shortest, are m + (log_2 n − 1), m + (log_2 n − 2), ..., m + 1, m, decreasing successively by one bit. This follows from the fact that whenever the jump range in bytes is halved, one bit fewer is required to represent the range. In other words, enumerating from the shortest links, the numbers of bits to represent the jump links, m, m + 1, m + 2, ..., m + (log_2 n − 1), increase by one bit whenever the jump range is doubled. As before, we assume the link fields are not entropy coded.

The block indices are treated as the sorted keys of a binary search algorithm. The algorithm works on these keys by testing the middle of the sorted keys, eliminating the half of the indices in whose interval the key cannot lie, and repeating the procedure iteratively. This search method can also be viewed as a binary tree; in particular, for the bi-sectional method presented here, a complete binary tree is constructed. The complexity of the binary search algorithm is O(log_2 n). Whenever the algorithm follows a link, half of the block indices are eliminated from the search space. This makes the worst case seek time equal to p·log_2 n.

7.3 Comparison of Three Random Access Decoding Methods

While link information is configured sequentially in conventional methods [32, 33], the suggested method configures the link information hierarchically, so that a binary search is possible. It runs in a seek time of O(log_2 n) and guarantees

constant seek time even in the worst case, while the conventional linear (next block linking) search method gives O(n/2) average time, with a best case of O(1) and a worst case of O(n). Best and worst case random access decoding (seek and decode) performance is shown in Table 7.1.

Table 7.1: Best and worst case random access decoding performance
(p: link following time (sec/link), q: block decoding time (sec/block), p << q)

Method          Worst case         Average case         Best case
full decoding   q·(n−1) + q        q·n/2 + q            q
linear          p·(n−1) + q        p·n/2 + q            q
bi-sectional    p·log_2 n + q      p·O(log_2 n) + q     q

Assume that requests for block indices are evenly distributed and that we have n = 2^m blocks, m ≥ 0. Then the actual average number of jumps for the bi-sectional method, denoted O(log_2 n) in Table 7.1, can be derived using the following idea.

1. Target block 0 can be reached without any jump.
2. Set the beginning point as block 0.
3. The i-th block can be reached by finding the longest jump to the j-th block, where j is the largest power-of-two index less than or equal to i.
4. If j ≠ i, set i ← (i − j), set the beginning point as the j-th block, and then repeat step 3 until j = i.

Size of a Link Field

In the linear method, the size in bits of a link field is fixed at m bits, i.e., log_2(max[bitstream size in bytes of an image block]) bits. That is, the maximum range of a jump to the next block is 2^m bytes, since the maximum size of a block bitstream is 2^m bytes. Thus, each link field b_i^0, 0 ≤ i ≤ n−2, is represented by m bits. In the bi-sectional method, the links have link fields of different sizes: some links just jump to the next image block (in fact, half of the n−1 links are of this kind), and some links jump over as many as n/2 blocks. Thus, the size of each link field in the bi-sectional method is determined by the jump range of the link. In Figure 7.6, a link

(such as the link from b_0^0) jumping to the next block is represented by m bits, and a link (such as the link from b_0^1) jumping over two blocks is represented by m + 1 bits (where 2^m bytes is the maximum bitstream size of an image block). Thus, the sizes of the link fields b_0^0, b_0^1, and b_0^2 are m, m + 1, and m + 2 bits, respectively.

Figure 7.6: Size of link fields in the bi-sectional method. With n = 8 blocks and a maximum block size of 2^m bytes, the link field sizes are m, 1 + m, and 2 + m bits for jumps over one, two, and four blocks, respectively.

Example of Bisection Links in Bitstream

An example of bisection links in a bitstream is shown in Figure 7.7. Under the bitstream, the byte addresses of the blocks are marked as 0, 235, ..., 481, ... Block 0 has three link fields, each holding the length in bytes of the block(s) to jump over: 877, 481, and 235 bytes, respectively. The first value, 877 bytes, is for the longest jump, to the beginning of block 4, which bisects the eight blocks into blocks 0 through 3 and blocks 4 through 7 as shown in the figure; to jump there from the beginning, the first link field, holding 877 bytes, is used. The second value, 481 bytes, is for the jump to the beginning of block 2. The third value, 235 bytes, simply skips block 0 itself to locate the beginning of block 1. After one of the three links in the header of block 0 is taken, depending on the target index, further jumps may be needed. For example, if the target is block 6, the length information of 411 bytes stored in the header of block 4 is used to jump to the beginning of block 6. Recall that the size of the link field in bits differs depending on the range

of the jump, as discussed in the previous subsection. In Figure 7.7, the shortest jump is represented with m = 8 bits, which means that the maximum size of each block is 2^8 = 256 bytes, and the second shortest jump is represented with m + 1 = 9 bits. The longest jump in this figure is represented with m + 2 = 10 bits.

Figure 7.7: Bi-sectional links in the bitstream containing eight blocks. The header information gives the length of the blocks to skip in bytes: 877 bytes (10 bits), 481 bytes (9 bits), and 235 bytes (8 bits) in block 0; 411 bytes (9 bits) in block 4; and 181, 218, and 222 bytes (8 bits each) for the remaining single-block skips.

7.4 Performance Analysis by Experiments

Three modes of decoding are tested experimentally: full decoding seek, linear seek, and bi-sectional seek. For encoding, 3D-SPIHT is used with a target bit rate around 0.2 bpp and image quality of 29 dB. Recall that a block is the unit of random access decoding. The block size is set as , which is perhaps smaller than in actual practice, but we purposely used a small block size in order to have many blocks. In the 3D-SPIHT coder, the size of each block is a multiple of bytes, and the arithmetic coder is initialized for every block, since we need to access a randomly chosen block. The computational platform for the experiments is an Intel Xeon CPU at 2.00 GHz with the Windows 2000 OS.

Comparison of Random Access Decoding Time

A performance comparison is shown in Table 7.2. The last column of the table shows the average random access seek time. Since the block decoding time is the same for all three cases in the table, the actual random access decoding time can be obtained

by simply adding the average block decoding time, which is ms for the given bitstream.

Table 7.2: Comparison of random access decoding performances of the full decoding, linear, and bi-sectional methods on the Susie sequence (coded at 29 dB)

Method          Encoding time   Bitstream size   Bit rate   Average block
                (secs)          (bytes)          (bpp)      seek time (secs)
full decoding                   1,252,...                   38.4
linear                          1,276,509
bi-sectional                    1,300,821                   0.0058

For the video sequence Susie (ITU601, 720x480x144), 12,600 3D-blocks are encoded, where each block size is . The last column of Table 7.2, "Average block seek time", shows that the average block seek times for the three methods are 38.4 sec, sec, and 5.8 ms ( sec) for full, linear, and bi-sectional, respectively. Theoretically, the predicted speed improvement of the bi-sectional method over the linear one is O(n / log_2 n), since the linear method has O(n) seek time and the bi-sectional one has O(log_2 n). For example, for a bitstream containing 10,000 image blocks, the suggested method is approximately 100 times faster than the linear method (n = 10,000, log_2 n ≈ 13.3). In practice, as shown in the experiment (i.e., sec versus sec), the bi-sectional method showed around a 100-times speed improvement over the linear method for 12,600 blocks.

No Overhead from Encoding the Bi-sectional Links

The computational burden of organizing the hierarchical links in the encoder for the suggested fast bi-sectional method is as light as that of the linear method, as illustrated in the second column (encoding time) of Table 7.2. At encoding time, the link information (i.e., how many bytes to skip to jump to the next block) for each block is available only after the image block is encoded. The difference between the linear and bi-sectional methods is that the latter accumulates the jump sizes of previous blocks based on the current block index. This size accumulation is managed by a stack, which has a maximum depth of

log_2 n at any time. Compared with forming the link sizes in the linear method, the additional overhead from pushing and popping the stack is very small. Thus, the overall encoding time for all blocks does not show any noticeable increase.

Small Overhead from the Link Information

The generated bitstream sizes are 1.21 MB (1,276,509 bytes) for the linear method and 1.24 MB (1,300,821 bytes) for our bi-sectional method. The difference between the two bitstreams is 24,312 bytes ( bpp), less than 2% of the whole bitstream. Thus, the overhead from the link information is relatively small (see Table 7.2). The number of links is actually the same in both methods, but the sizes of some links in our method are larger than those in the conventional linear method. In this experiment, the size of each link field is fixed at 2 bytes and 4 bytes for the linear and bi-sectional methods, respectively. Thus, the maximum allowable bitstream size in the bi-sectional method is 2^32 bytes = 4 gigabytes. For longer bitstreams, we must define a larger size for the link fields.

Experimental Results

Figure 7.8 shows the speed performance of the two methods, linear and bi-sectional. The horizontal axis indicates a block index and the vertical axis indicates the seek time to find the indexed block. The seek times for 1,260 different block indices are measured: every tenth block out of the total 12,600 indexed blocks (index 0 through index 12,599). As plotted in the figure, the difference in speed sharply increases with increasing block index number. Note that the maximum (i.e., worst case) seek time of the bi-sectional method never exceeds 0.02 seconds, while the seek time of the linear method increases linearly with increasing index number. The performance of the conventional linear method is not competitive with the new bi-sectional method. In the figure, the seek time curve for full decoding is not plotted because of its low performance.
The fluctuations of the two curves are due to the characteristics of the memory hierarchy and other tasks running on the computer. The overall experimental results show that the suggested method is superior to the conventional linear seek method. The advantage of the suggested bi-sectional seek method is that substantial improvements in seek speed can be obtained with

very small bit overhead and without any extra encoding time over the linear method. Executable decoders for random access decoding and test bitstreams are downloadable from .

Figure 7.8: Comparison of target block seek time between the linear and suggested methods. The horizontal axis shows the target block index that the user wants to decode, and the vertical axis shows the block seek time in seconds, not including the block decoding time. Here n is the total number of blocks coded in the bitstream, which is 12,600 in this plot.

Justifying the Worst Case Performance

Assuming the number of blocks is n = 2^m, m ≥ 0, the maximum number of link-following steps is required for the block b_{n/2 + n/4 + ... + 1}, which is the last block, b_{n−1}, in the bitstream. Thus, the worst case seek time is p (sec/link) × log_2 n links, where p is the time for one jump by following a link. Actual block decoding adds an average block decoding time of q (sec/block) to this quantity. If n ≠ 2^m, m ≥ 0, the worst case seek time will be no more than p (sec/link) × ⌈log_2 n⌉ steps.
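The bi-sectional decision procedure can be sketched as follows. This is an illustrative Python model of the jump decisions only (it works on block indices rather than byte offsets, and all names are ours, not part of the actual decoder); the number of link-following jumps for a target j equals the number of 1-bits in the binary representation of j, which is at most log_2 n:

```python
def bisectional_seek(n, target):
    # Model of the bi-sectional block seek, assuming n = 2**m blocks.
    # Returns the sequence of block positions visited by following links;
    # len(result) - 1 is the number of jumps, at most log2(n).
    assert n > 0 and n & (n - 1) == 0 and 0 <= target < n
    pos, span, path = 0, n, [0]
    while span > 1:
        span //= 2                    # halve the remaining search space
        if target >= pos + span:      # current binary digit of target is 1:
            pos += span               # follow the link jumping `span` blocks
            path.append(pos)
    return path

print(bisectional_seek(8, 7))  # prints [0, 4, 6, 7]: 3 jumps, as in Figure 7.4
print(bisectional_seek(8, 5))  # prints [0, 4, 5]: the second bit 0 takes no jump
```

Over all eight targets, the worst case is indeed three jumps (target 7 = 111 in binary), matching the p × log_2 n worst case bound.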


More information

Digital Image Processing Lectures 25 & 26

Digital Image Processing Lectures 25 & 26 Lectures 25 & 26, Professor Department of Electrical and Computer Engineering Colorado State University Spring 2015 Area 4: Image Encoding and Compression Goal: To exploit the redundancies in the image

More information

Multimedia Networking ECE 599

Multimedia Networking ECE 599 Multimedia Networking ECE 599 Prof. Thinh Nguyen School of Electrical Engineering and Computer Science Based on lectures from B. Lee, B. Girod, and A. Mukherjee 1 Outline Digital Signal Representation

More information

Enhanced Stochastic Bit Reshuffling for Fine Granular Scalable Video Coding

Enhanced Stochastic Bit Reshuffling for Fine Granular Scalable Video Coding Enhanced Stochastic Bit Reshuffling for Fine Granular Scalable Video Coding Wen-Hsiao Peng, Tihao Chiang, Hsueh-Ming Hang, and Chen-Yi Lee National Chiao-Tung University 1001 Ta-Hsueh Rd., HsinChu 30010,

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 5 Other Coding Techniques Instructional Objectives At the end of this lesson, the students should be able to:. Convert a gray-scale image into bit-plane

More information

Introduction p. 1 Compression Techniques p. 3 Lossless Compression p. 4 Lossy Compression p. 5 Measures of Performance p. 5 Modeling and Coding p.

Introduction p. 1 Compression Techniques p. 3 Lossless Compression p. 4 Lossy Compression p. 5 Measures of Performance p. 5 Modeling and Coding p. Preface p. xvii Introduction p. 1 Compression Techniques p. 3 Lossless Compression p. 4 Lossy Compression p. 5 Measures of Performance p. 5 Modeling and Coding p. 6 Summary p. 10 Projects and Problems

More information

Vector Quantization Encoder Decoder Original Form image Minimize distortion Table Channel Image Vectors Look-up (X, X i ) X may be a block of l

Vector Quantization Encoder Decoder Original Form image Minimize distortion Table Channel Image Vectors Look-up (X, X i ) X may be a block of l Vector Quantization Encoder Decoder Original Image Form image Vectors X Minimize distortion k k Table X^ k Channel d(x, X^ Look-up i ) X may be a block of l m image or X=( r, g, b ), or a block of DCT

More information

Proyecto final de carrera

Proyecto final de carrera UPC-ETSETB Proyecto final de carrera A comparison of scalar and vector quantization of wavelet decomposed images Author : Albane Delos Adviser: Luis Torres 2 P a g e Table of contents Table of figures...

More information

Waveform-Based Coding: Outline

Waveform-Based Coding: Outline Waveform-Based Coding: Transform and Predictive Coding Yao Wang Polytechnic University, Brooklyn, NY11201 http://eeweb.poly.edu/~yao Based on: Y. Wang, J. Ostermann, and Y.-Q. Zhang, Video Processing and

More information

Wavelets, Filter Banks and Multiresolution Signal Processing

Wavelets, Filter Banks and Multiresolution Signal Processing Wavelets, Filter Banks and Multiresolution Signal Processing It is with logic that one proves; it is with intuition that one invents. Henri Poincaré Introduction - 1 A bit of history: from Fourier to Haar

More information

SCALABLE 3-D WAVELET VIDEO CODING

SCALABLE 3-D WAVELET VIDEO CODING SCALABLE 3-D WAVELET VIDEO CODING ZONG WENBO School of Electrical and Electronic Engineering A thesis submitted to the Nanyang Technological University in fulfillment of the requirement for the degree

More information

JPEG2000 High-Speed SNR Progressive Decoding Scheme

JPEG2000 High-Speed SNR Progressive Decoding Scheme 62 JPEG2000 High-Speed SNR Progressive Decoding Scheme Takahiko Masuzaki Hiroshi Tsutsui Quang Minh Vu Takao Onoye Yukihiro Nakamura Department of Communications and Computer Engineering Graduate School

More information

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC9/WG11 CODING OF MOVING PICTURES AND AUDIO ISO/IEC JTC1/SC9/WG11 MPEG 98/M3833 July 1998 Source:

More information

An Investigation of 3D Dual-Tree Wavelet Transform for Video Coding

An Investigation of 3D Dual-Tree Wavelet Transform for Video Coding MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com An Investigation of 3D Dual-Tree Wavelet Transform for Video Coding Beibei Wang, Yao Wang, Ivan Selesnick and Anthony Vetro TR2004-132 December

More information

Chapter 2: Source coding

Chapter 2: Source coding Chapter 2: meghdadi@ensil.unilim.fr University of Limoges Chapter 2: Entropy of Markov Source Chapter 2: Entropy of Markov Source Markov model for information sources Given the present, the future is independent

More information

Image Compression - JPEG

Image Compression - JPEG Overview of JPEG CpSc 86: Multimedia Systems and Applications Image Compression - JPEG What is JPEG? "Joint Photographic Expert Group". Voted as international standard in 99. Works with colour and greyscale

More information

Directionlets. Anisotropic Multi-directional Representation of Images with Separable Filtering. Vladan Velisavljević Deutsche Telekom, Laboratories

Directionlets. Anisotropic Multi-directional Representation of Images with Separable Filtering. Vladan Velisavljević Deutsche Telekom, Laboratories Directionlets Anisotropic Multi-directional Representation of Images with Separable Filtering Vladan Velisavljević Deutsche Telekom, Laboratories Google Inc. Mountain View, CA October 2006 Collaborators

More information

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments Dr. Jian Zhang Conjoint Associate Professor NICTA & CSE UNSW COMP9519 Multimedia Systems S2 2006 jzhang@cse.unsw.edu.au

More information

UNIT I INFORMATION THEORY. I k log 2

UNIT I INFORMATION THEORY. I k log 2 UNIT I INFORMATION THEORY Claude Shannon 1916-2001 Creator of Information Theory, lays the foundation for implementing logic in digital circuits as part of his Masters Thesis! (1939) and published a paper

More information

Wavelets and Multiresolution Processing

Wavelets and Multiresolution Processing Wavelets and Multiresolution Processing Wavelets Fourier transform has it basis functions in sinusoids Wavelets based on small waves of varying frequency and limited duration In addition to frequency,

More information

Module 4 MULTI- RESOLUTION ANALYSIS. Version 2 ECE IIT, Kharagpur

Module 4 MULTI- RESOLUTION ANALYSIS. Version 2 ECE IIT, Kharagpur Module MULTI- RESOLUTION ANALYSIS Version ECE IIT, Kharagpur Lesson Multi-resolution Analysis: Theory of Subband Coding Version ECE IIT, Kharagpur Instructional Objectives At the end of this lesson, the

More information

Half-Pel Accurate Motion-Compensated Orthogonal Video Transforms

Half-Pel Accurate Motion-Compensated Orthogonal Video Transforms Flierl and Girod: Half-Pel Accurate Motion-Compensated Orthogonal Video Transforms, IEEE DCC, Mar. 007. Half-Pel Accurate Motion-Compensated Orthogonal Video Transforms Markus Flierl and Bernd Girod Max

More information

Reduce the amount of data required to represent a given quantity of information Data vs information R = 1 1 C

Reduce the amount of data required to represent a given quantity of information Data vs information R = 1 1 C Image Compression Background Reduce the amount of data to represent a digital image Storage and transmission Consider the live streaming of a movie at standard definition video A color frame is 720 480

More information

Implementation of Lossless Huffman Coding: Image compression using K-Means algorithm and comparison vs. Random numbers and Message source

Implementation of Lossless Huffman Coding: Image compression using K-Means algorithm and comparison vs. Random numbers and Message source Implementation of Lossless Huffman Coding: Image compression using K-Means algorithm and comparison vs. Random numbers and Message source Ali Tariq Bhatti 1, Dr. Jung Kim 2 1,2 Department of Electrical

More information

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments. Tutorial 1. Acknowledgement and References for lectures 1 to 5

Lecture 2: Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments. Tutorial 1. Acknowledgement and References for lectures 1 to 5 Lecture : Introduction to Audio, Video & Image Coding Techniques (I) -- Fundaments Dr. Jian Zhang Conjoint Associate Professor NICTA & CSE UNSW COMP959 Multimedia Systems S 006 jzhang@cse.unsw.edu.au Acknowledgement

More information

Efficient Alphabet Partitioning Algorithms for Low-complexity Entropy Coding

Efficient Alphabet Partitioning Algorithms for Low-complexity Entropy Coding Efficient Alphabet Partitioning Algorithms for Low-complexity Entropy Coding Amir Said (said@ieee.org) Hewlett Packard Labs, Palo Alto, CA, USA Abstract We analyze the technique for reducing the complexity

More information

c 2010 Melody I. Bonham

c 2010 Melody I. Bonham c 2010 Melody I. Bonham A NEAR-OPTIMAL WAVELET-BASED ESTIMATION TECHNIQUE FOR VIDEO SEQUENCES BY MELODY I. BONHAM THESIS Submitted in partial fulfillment of the requirements for the degree of Master of

More information

Introduction to Wavelet. Based on A. Mukherjee s lecture notes

Introduction to Wavelet. Based on A. Mukherjee s lecture notes Introduction to Wavelet Based on A. Mukherjee s lecture notes Contents History of Wavelet Problems of Fourier Transform Uncertainty Principle The Short-time Fourier Transform Continuous Wavelet Transform

More information

Embedded Lossless Wavelet-Based Image Coder Based on Successive Partition and Hybrid Bit Scanning

Embedded Lossless Wavelet-Based Image Coder Based on Successive Partition and Hybrid Bit Scanning IEICE TRANS. FUNDAMENTALS, VOL.E84 A, NO.8 AUGUST 2001 1863 PAPER Special Section on Digital Signal Processing Embedded Lossless Wavelet-Based Image Coder Based on Successive Partition and Hybrid Bit Scanning

More information

INTEGER WAVELET TRANSFORM BASED LOSSLESS AUDIO COMPRESSION

INTEGER WAVELET TRANSFORM BASED LOSSLESS AUDIO COMPRESSION INTEGER WAVELET TRANSFORM BASED LOSSLESS AUDIO COMPRESSION Ciprian Doru Giurcăneanu, Ioan Tăbuş and Jaakko Astola Signal Processing Laboratory, Tampere University of Technology P.O. Box 553, FIN-33101

More information

A WAVELET BASED CODING SCHEME VIA ATOMIC APPROXIMATION AND ADAPTIVE SAMPLING OF THE LOWEST FREQUENCY BAND

A WAVELET BASED CODING SCHEME VIA ATOMIC APPROXIMATION AND ADAPTIVE SAMPLING OF THE LOWEST FREQUENCY BAND A WAVELET BASED CODING SCHEME VIA ATOMIC APPROXIMATION AND ADAPTIVE SAMPLING OF THE LOWEST FREQUENCY BAND V. Bruni, D. Vitulano Istituto per le Applicazioni del Calcolo M. Picone, C. N. R. Viale del Policlinico

More information

Image Compression. 1. Introduction. Greg Ames Dec 07, 2002

Image Compression. 1. Introduction. Greg Ames Dec 07, 2002 Image Compression Greg Ames Dec 07, 2002 Abstract Digital images require large amounts of memory to store and, when retrieved from the internet, can take a considerable amount of time to download. The

More information

Objective: Reduction of data redundancy. Coding redundancy Interpixel redundancy Psychovisual redundancy Fall LIST 2

Objective: Reduction of data redundancy. Coding redundancy Interpixel redundancy Psychovisual redundancy Fall LIST 2 Image Compression Objective: Reduction of data redundancy Coding redundancy Interpixel redundancy Psychovisual redundancy 20-Fall LIST 2 Method: Coding Redundancy Variable-Length Coding Interpixel Redundancy

More information

QUANTIZATION [1] is a signal processing technique

QUANTIZATION [1] is a signal processing technique 1 General Embedded Quantization for Wavelet-Based Lossy Image Coding Francesc Aulí-Llinàs, Member, IEEE Abstract Embedded quantization is a mechanism employed by many lossy image codecs to progressively

More information

Scalable color image coding with Matching Pursuit

Scalable color image coding with Matching Pursuit SCHOOL OF ENGINEERING - STI SIGNAL PROCESSING INSTITUTE Rosa M. Figueras i Ventura CH-115 LAUSANNE Telephone: +4121 6935646 Telefax: +4121 69376 e-mail: rosa.figueras@epfl.ch ÉCOLE POLYTECHNIQUE FÉDÉRALE

More information

Intraframe Prediction with Intraframe Update Step for Motion-Compensated Lifted Wavelet Video Coding

Intraframe Prediction with Intraframe Update Step for Motion-Compensated Lifted Wavelet Video Coding Intraframe Prediction with Intraframe Update Step for Motion-Compensated Lifted Wavelet Video Coding Aditya Mavlankar, Chuo-Ling Chang, and Bernd Girod Information Systems Laboratory, Department of Electrical

More information

Optimization of Selective Enhancement for MPEG-4 Fine Granularity Scalability

Optimization of Selective Enhancement for MPEG-4 Fine Granularity Scalability Optimization of Selective Enhancement for MPEG-4 Fine Granularity Scalability Wen-Shiaw Peng, H.C. Huang and Tihao Chiang Dept. of Electronics Engineering, National Chiao Tung University, 1001, University

More information

Multimedia. Multimedia Data Compression (Lossless Compression Algorithms)

Multimedia. Multimedia Data Compression (Lossless Compression Algorithms) Course Code 005636 (Fall 2017) Multimedia Multimedia Data Compression (Lossless Compression Algorithms) Prof. S. M. Riazul Islam, Dept. of Computer Engineering, Sejong University, Korea E-mail: riaz@sejong.ac.kr

More information

Fault Tolerance Technique in Huffman Coding applies to Baseline JPEG

Fault Tolerance Technique in Huffman Coding applies to Baseline JPEG Fault Tolerance Technique in Huffman Coding applies to Baseline JPEG Cung Nguyen and Robert G. Redinbo Department of Electrical and Computer Engineering University of California, Davis, CA email: cunguyen,

More information

BASICS OF COMPRESSION THEORY

BASICS OF COMPRESSION THEORY BASICS OF COMPRESSION THEORY Why Compression? Task: storage and transport of multimedia information. E.g.: non-interlaced HDTV: 0x0x0x = Mb/s!! Solutions: Develop technologies for higher bandwidth Find

More information

at Some sort of quantization is necessary to represent continuous signals in digital form

at Some sort of quantization is necessary to represent continuous signals in digital form Quantization at Some sort of quantization is necessary to represent continuous signals in digital form x(n 1,n ) x(t 1,tt ) D Sampler Quantizer x q (n 1,nn ) Digitizer (A/D) Quantization is also used for

More information

Study of Wavelet Functions of Discrete Wavelet Transformation in Image Watermarking

Study of Wavelet Functions of Discrete Wavelet Transformation in Image Watermarking Study of Wavelet Functions of Discrete Wavelet Transformation in Image Watermarking Navdeep Goel 1,a, Gurwinder Singh 2,b 1ECE Section, Yadavindra College of Engineering, Talwandi Sabo 2Research Scholar,

More information

Lecture 7 Predictive Coding & Quantization

Lecture 7 Predictive Coding & Quantization Shujun LI (李树钧): INF-10845-20091 Multimedia Coding Lecture 7 Predictive Coding & Quantization June 3, 2009 Outline Predictive Coding Motion Estimation and Compensation Context-Based Coding Quantization

More information

Rounding Transform. and Its Application for Lossless Pyramid Structured Coding ABSTRACT

Rounding Transform. and Its Application for Lossless Pyramid Structured Coding ABSTRACT Rounding Transform and Its Application for Lossless Pyramid Structured Coding ABSTRACT A new transform, called the rounding transform (RT), is introduced in this paper. This transform maps an integer vector

More information

LATTICE VECTOR QUANTIZATION FOR IMAGE CODING USING EXPANSION OF CODEBOOK

LATTICE VECTOR QUANTIZATION FOR IMAGE CODING USING EXPANSION OF CODEBOOK LATTICE VECTOR QUANTIZATION FOR IMAGE CODING USING EXPANSION OF CODEBOOK R. R. Khandelwal 1, P. K. Purohit 2 and S. K. Shriwastava 3 1 Shri Ramdeobaba College Of Engineering and Management, Nagpur richareema@rediffmail.com

More information

Information and Entropy

Information and Entropy Information and Entropy Shannon s Separation Principle Source Coding Principles Entropy Variable Length Codes Huffman Codes Joint Sources Arithmetic Codes Adaptive Codes Thomas Wiegand: Digital Image Communication

More information

wavelet scalar by run length and Huæman encoding.

wavelet scalar by run length and Huæman encoding. Dead Zone Quantization in Wavelet Image Compression Mini Project in ECE 253a Jacob Stríom May 12, 1996 Abstract A common quantizer implementation of wavelet image compression systems is scalar quantization

More information

Wavelets and Image Compression. Bradley J. Lucier

Wavelets and Image Compression. Bradley J. Lucier Wavelets and Image Compression Bradley J. Lucier Abstract. In this paper we present certain results about the compression of images using wavelets. We concentrate on the simplest case of the Haar decomposition

More information

On Common Information and the Encoding of Sources that are Not Successively Refinable

On Common Information and the Encoding of Sources that are Not Successively Refinable On Common Information and the Encoding of Sources that are Not Successively Refinable Kumar Viswanatha, Emrah Akyol, Tejaswi Nanjundaswamy and Kenneth Rose ECE Department, University of California - Santa

More information

Analysis of Rate-distortion Functions and Congestion Control in Scalable Internet Video Streaming

Analysis of Rate-distortion Functions and Congestion Control in Scalable Internet Video Streaming Analysis of Rate-distortion Functions and Congestion Control in Scalable Internet Video Streaming Min Dai Electrical Engineering, Texas A&M University Dmitri Loguinov Computer Science, Texas A&M University

More information

This is a repository copy of The effect of quality scalable image compression on robust watermarking.

This is a repository copy of The effect of quality scalable image compression on robust watermarking. This is a repository copy of The effect of quality scalable image compression on robust watermarking. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/7613/ Conference or Workshop

More information

Learning goals: students learn to use the SVD to find good approximations to matrices and to compute the pseudoinverse.

Learning goals: students learn to use the SVD to find good approximations to matrices and to compute the pseudoinverse. Application of the SVD: Compression and Pseudoinverse Learning goals: students learn to use the SVD to find good approximations to matrices and to compute the pseudoinverse. Low rank approximation One

More information

ON SCALABLE CODING OF HIDDEN MARKOV SOURCES. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

ON SCALABLE CODING OF HIDDEN MARKOV SOURCES. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose ON SCALABLE CODING OF HIDDEN MARKOV SOURCES Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California, Santa Barbara, CA, 93106

More information

Perceptual Image Coding: Introduction of Φ SET Image Compression System

Perceptual Image Coding: Introduction of Φ SET Image Compression System Perceptual Image Coding: Introduction of Φ SET Image Compression System Jesús Jaime Moreno Escobar Communication and Electronics Engineering Department Superior School of Mechanical and Electrical Engineers

More information

Constructing Polar Codes Using Iterative Bit-Channel Upgrading. Arash Ghayoori. B.Sc., Isfahan University of Technology, 2011

Constructing Polar Codes Using Iterative Bit-Channel Upgrading. Arash Ghayoori. B.Sc., Isfahan University of Technology, 2011 Constructing Polar Codes Using Iterative Bit-Channel Upgrading by Arash Ghayoori B.Sc., Isfahan University of Technology, 011 A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree

More information

ECE472/572 - Lecture 11. Roadmap. Roadmap. Image Compression Fundamentals and Lossless Compression Techniques 11/03/11.

ECE472/572 - Lecture 11. Roadmap. Roadmap. Image Compression Fundamentals and Lossless Compression Techniques 11/03/11. ECE47/57 - Lecture Image Compression Fundamentals and Lossless Compression Techniques /03/ Roadmap Preprocessing low level Image Enhancement Image Restoration Image Segmentation Image Acquisition Image

More information

Wavelet Packet Based Digital Image Watermarking

Wavelet Packet Based Digital Image Watermarking Wavelet Packet Based Digital Image ing A.Adhipathi Reddy, B.N.Chatterji Department of Electronics and Electrical Communication Engg. Indian Institute of Technology, Kharagpur 72 32 {aar, bnc}@ece.iitkgp.ernet.in

More information

2018/5/3. YU Xiangyu

2018/5/3. YU Xiangyu 2018/5/3 YU Xiangyu yuxy@scut.edu.cn Entropy Huffman Code Entropy of Discrete Source Definition of entropy: If an information source X can generate n different messages x 1, x 2,, x i,, x n, then the

More information

Milestones and Trends in Image Compression

Milestones and Trends in Image Compression Milestones and Trends in Image Compression William A. Pearlman Center for Image Processing Research Rensselaer Polytechnic Institute VCIP 2010 Huang Shan, PRC 07/12/10 Outline Key historical developments

More information

! Where are we on course map? ! What we did in lab last week. " How it relates to this week. ! Compression. " What is it, examples, classifications

! Where are we on course map? ! What we did in lab last week.  How it relates to this week. ! Compression.  What is it, examples, classifications Lecture #3 Compression! Where are we on course map?! What we did in lab last week " How it relates to this week! Compression " What is it, examples, classifications " Probability based compression # Huffman

More information

Vector Quantizers for Reduced Bit-Rate Coding of Correlated Sources

Vector Quantizers for Reduced Bit-Rate Coding of Correlated Sources Vector Quantizers for Reduced Bit-Rate Coding of Correlated Sources Russell M. Mersereau Center for Signal and Image Processing Georgia Institute of Technology Outline Cache vector quantization Lossless

More information

Summary of Last Lectures

Summary of Last Lectures Lossless Coding IV a k p k b k a 0.16 111 b 0.04 0001 c 0.04 0000 d 0.16 110 e 0.23 01 f 0.07 1001 g 0.06 1000 h 0.09 001 i 0.15 101 100 root 1 60 1 0 0 1 40 0 32 28 23 e 17 1 0 1 0 1 0 16 a 16 d 15 i

More information

Autumn Coping with NP-completeness (Conclusion) Introduction to Data Compression

Autumn Coping with NP-completeness (Conclusion) Introduction to Data Compression Autumn Coping with NP-completeness (Conclusion) Introduction to Data Compression Kirkpatrick (984) Analogy from thermodynamics. The best crystals are found by annealing. First heat up the material to let

More information

On Compression Encrypted Data part 2. Prof. Ja-Ling Wu The Graduate Institute of Networking and Multimedia National Taiwan University

On Compression Encrypted Data part 2. Prof. Ja-Ling Wu The Graduate Institute of Networking and Multimedia National Taiwan University On Compression Encrypted Data part 2 Prof. Ja-Ling Wu The Graduate Institute of Networking and Multimedia National Taiwan University 1 Brief Summary of Information-theoretic Prescription At a functional

More information

SCALABLE AUDIO CODING USING WATERMARKING

SCALABLE AUDIO CODING USING WATERMARKING SCALABLE AUDIO CODING USING WATERMARKING Mahmood Movassagh Peter Kabal Department of Electrical and Computer Engineering McGill University, Montreal, Canada Email: {mahmood.movassagh@mail.mcgill.ca, peter.kabal@mcgill.ca}

More information

THE currently prevalent video coding framework (e.g. A Novel Video Coding Framework using Self-adaptive Dictionary

THE currently prevalent video coding framework (e.g. A Novel Video Coding Framework using Self-adaptive Dictionary JOURNAL OF L A TEX CLASS FILES, VOL. 14, NO., AUGUST 20XX 1 A Novel Video Coding Framework using Self-adaptive Dictionary Yuanyi Xue, Student Member, IEEE, and Yao Wang, Fellow, IEEE Abstract In this paper,

More information

Fuzzy quantization of Bandlet coefficients for image compression

Fuzzy quantization of Bandlet coefficients for image compression Available online at www.pelagiaresearchlibrary.com Advances in Applied Science Research, 2013, 4(2):140-146 Fuzzy quantization of Bandlet coefficients for image compression R. Rajeswari and R. Rajesh ISSN:

More information

Information and Entropy. Professor Kevin Gold

Information and Entropy. Professor Kevin Gold Information and Entropy Professor Kevin Gold What s Information? Informally, when I communicate a message to you, that s information. Your grade is 100/100 Information can be encoded as a signal. Words

More information