Vector Quantization and Subband Coding

18-796 Multimedia Communications: Coding, Systems, and Networking
Prof. Tsuhan Chen
tsuhan@ece.cmu.edu

Vector Quantization
Vector Quantization (VQ)

- Each image block has N pels.
- Consider each image block as an N-D vector x.

[Figure: the (x_1, x_2) plane partitioned into regions R_1, R_2, R_3, with codewords y_1, y_2, y_3]

- Quantization: $x \to y_k$ if $x \in R_k$
- $y_k$: codewords or code vectors
- The set of $y_k$ is called a codebook.

Rate and Distortion

- If the number of codewords is K, then the number of bits required to send one vector is $\log_2 K$.
- Rate R: at R bits per pixel, one vector takes NR bits, so $\log_2 K = NR$, i.e., $K = 2^{NR}$.
- Distortion D: given the probability density function p(x) and the distortion measure d(x, y), the average distortion is

  $$D = \sum_{k=1}^{K} \int_{R_k} d(x, y_k)\, p(x)\, dx$$
- Given $y_k$, $R_k$ should be chosen such that

  $$R_k = \{\, x : d(x, y_k) \le d(x, y_j) \ \ \forall j \ne k \,\}$$

  = the set of x for which $y_k$ is the nearest point.
- For the L2 norm, i.e., $d(x, y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2$, we get a convex polytope (Voronoi cell).
- Q: How about L1 or $L_\infty$?
- Given $R_k$, $y_k$ should be chosen such that $\int_{R_k} d(x, y_k)\, p(x)\, dx$ is minimum.
- With the L2 norm, we get

  $$y_k = \text{centroid of } R_k = \frac{\int_{R_k} x\, p(x)\, dx}{\int_{R_k} p(x)\, dx}$$

- In the discrete case, the optimal $y_k$ is the average of the vectors in $R_k$.
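The two optimality conditions above (nearest codeword for a fixed codebook, centroid for a fixed region) can be sketched in the discrete case as follows. This is a minimal illustration, not code from the lecture: the function names `sq_dist`, `nearest_index`, and `centroid`, and the toy codebook and training vectors, are assumptions made for the example.

```python
def sq_dist(x, y):
    """L2 distortion: mean squared difference between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def nearest_index(x, codebook):
    """Index k of the codeword y_k nearest to x, i.e. the region R_k containing x."""
    return min(range(len(codebook)), key=lambda k: sq_dist(x, codebook[k]))

def centroid(vectors):
    """Component-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical toy data: a 2-codeword codebook and four 2-D training vectors.
codebook = [[0.0, 0.0], [4.0, 4.0]]
training = [[0.0, 1.0], [1.0, 0.0], [3.0, 5.0], [5.0, 3.0]]

# Condition 1: partition the training set into regions R_k.
regions = {k: [x for x in training if nearest_index(x, codebook) == k]
           for k in range(len(codebook))}

# Condition 2: replace each codeword by the centroid of its region.
new_codebook = [centroid(regions[k]) for k in range(len(codebook))]
print(new_codebook)  # [[0.5, 0.5], [4.0, 4.0]]
```

Ties between equally near codewords are broken here by taking the lowest index, which is an arbitrary choice.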
Generalized Lloyd Algorithm (LBG Algorithm, K-means Algorithm)

Linde, Buzo, and Gray, 1980

Given p(x), or given a set of training vectors:
(1) Start with an initial set of $y_k$, i.e., an initial codebook.
(2) With the current $y_k$, calculate the regions $R_k$.
(3) Replace each $y_k$ with the centroid of $R_k$.
(4) If the overall distortion D is lower than a threshold, stop. Otherwise, go to (2).

- Only gives a local optimum. A proper choice of the initial codebook is important.

Choice of initial codebook
- A representative subset of the training vectors
- Scalar quantization in each dimension
- Splitting
- Nearest Neighbor (NN) algorithm [Equitz, 1984]
  - Start with the entire training set.
  - Merge the two vectors that are closest into one vector equal to their mean.
  - Repeat until the desired number of vectors is reached, or the distortion exceeds a certain threshold.
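Steps (1) to (4) can be sketched for a finite training set as below. This is a minimal sketch under stated assumptions, not the reference LBG implementation: the names `assign` and `lloyd`, the parameters `tol` and `max_iter`, and the toy data are hypothetical. The stopping rule is also a swapped-in variant: the loop stops when the distortion no longer improves by more than `tol`, rather than when D falls below an absolute threshold. Empty regions keep their old codeword, which is one of several possible conventions.

```python
def sq_dist(x, y):
    """L2 distortion: mean squared difference between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def assign(training, codebook):
    """Step (2): split the training set into regions R_k; also return the average distortion D."""
    cells = [[] for _ in codebook]
    total = 0.0
    for x in training:
        k = min(range(len(codebook)), key=lambda j: sq_dist(x, codebook[j]))
        cells[k].append(x)
        total += sq_dist(x, codebook[k])
    return cells, total / len(training)

def lloyd(training, codebook, tol=1e-9, max_iter=100):
    """Iterate steps (2) and (3); return (codebook, average distortion)."""
    codebook = [list(y) for y in codebook]          # step (1): initial codebook
    cells, distortion = assign(training, codebook)
    for _ in range(max_iter):
        # Step (3): replace each codeword by the centroid of its region.
        for k, cell in enumerate(cells):
            if cell:                                # empty region: keep old codeword
                codebook[k] = [sum(col) / len(cell) for col in zip(*cell)]
        cells, new_distortion = assign(training, codebook)
        improvement = distortion - new_distortion
        distortion = new_distortion
        # Step (4), variant: stop once the distortion stops improving.
        if improvement <= tol:
            break
    return codebook, distortion

# Four training vectors at the corners of a wide rectangle.
training = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]

good_start = lloyd(training, [[0.0, 0.0], [10.0, 0.0]])
poor_start = lloyd(training, [[0.0, 0.0], [0.0, 2.0]])
print(good_start)  # ([[0.0, 1.0], [10.0, 1.0]], 0.5)
print(poor_start)  # ([[5.0, 0.0], [5.0, 2.0]], 12.5)
```

Both initial codebooks are subsets of the training vectors, yet the second run settles at a distortion 25 times higher, which illustrates why the choice of initial codebook matters.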
Image Coding

[Figure: each block of the image is compared against codeword 1, codeword 2, ..., codeword K of the codebook to find the best match]

[Figure: encoder: x -> best match against the codebook -> index k; decoder: index k -> table lookup in the codebook -> $y_k$]

Properties of VQ
- Codebook design is very complex.
  - 4×4 blocks at 1 bpp: $2^{16}$ codewords
  - 16 images of size 256×256: $2^{16}$ training vectors (4×4 each)
  - Codebook size: $2^{16}$ × 4 × 4 × 8 bits = $2^{23}$ bits ≈ 8.4 Mbits
- More useful for low bitrate.
  - 4×4 blocks at 0.5 bpp: $2^8$ = 256 codewords
  - One 256×256 image: 4096 training vectors
  - Codebook size: 256 × 4 × 4 × 8 bits = 32.8 Kbits
- Simple decoder, complex encoder
- Very good for image retrieval
- Trade-off: poor performance on images not in the training set vs. the overhead of sending the codebook
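The codebook and training-set figures above follow directly from $\log_2 K = NR$. The arithmetic can be checked with a short sketch; the helper names `vq_budget` and `training_vectors` are hypothetical, and 8 bits per codeword component is taken from the slide's own calculation.

```python
def vq_budget(block_w, block_h, bits_per_pel, bits_per_component=8):
    """Return (index_bits, num_codewords, codebook_bits) for one VQ configuration."""
    n = block_w * block_h                      # N pels per vector
    index_bits = n * bits_per_pel              # log2 K = N * R
    if index_bits != int(index_bits):
        raise ValueError("N * R must be a whole number of bits")
    index_bits = int(index_bits)
    num_codewords = 2 ** index_bits            # K = 2^(N R)
    codebook_bits = num_codewords * n * bits_per_component
    return index_bits, num_codewords, codebook_bits

def training_vectors(image_w, image_h, block_w, block_h, num_images=1):
    """Number of non-overlapping blocks available as training vectors."""
    return num_images * (image_w // block_w) * (image_h // block_h)

print(vq_budget(4, 4, 1))                                # (16, 65536, 8388608)
print(vq_budget(4, 4, 0.5))                              # (8, 256, 32768)
print(training_vectors(256, 256, 4, 4, num_images=16))   # 65536
print(training_vectors(256, 256, 4, 4))                  # 4096
```

At 1 bpp the training set is no larger than the codebook itself, which is one way to see why VQ is more practical at low bit rates.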
VQ Variants and Improvements
- Multistage VQ
- Product Codes
  - Send mean and variance separately
- Classified VQ
  - Edges, texture areas, flat areas
- Predictive VQ
- VQ for color images
  - Exploit correlation among color components, e.g. R, G, B
  - YUV components are practically uncorrelated

Subband Coding
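Subband Coding

[Figure: two-channel filter bank. The lowpass branch runs through the analysis filter $H_1(z)$, CODEC 1, and the lowpass synthesis filter $F_1(z)$; the highpass branch runs through the analysis filter $H_2(z)$, CODEC 2, and the highpass synthesis filter $F_2(z)$.]

- Decompose the signal in the frequency domain.
- Critical downsampling (maximal decimation) maintains the number of samples in the subbands.
- Wavelet coding: recursively apply subband decomposition to the low-frequency band.
- 2-D: separable filtering to get 4 bands: LL, LH, HL, HH

Subband Coding vs. Transform Coding

[Figure: an M-channel filter bank with analysis filters $H_1(z), H_2(z), \dots, H_M(z)$ and synthesis filters $F_1(z), F_2(z), \dots, F_M(z)$, next to its polyphase representation built from delay elements ($z^{-1}$), the analysis matrix E(z), and the synthesis matrix R(z)]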
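To make the two-channel decomposition concrete, here is a minimal sketch using the Haar filter pair. The slides do not name a particular filter, so Haar is an assumption chosen because it is the simplest critically sampled pair; the function names are hypothetical, and the codecs between analysis and synthesis are omitted.

```python
import math

S = 1 / math.sqrt(2)  # Haar normalization factor

def haar_analysis(x):
    """Split an even-length signal into lowpass and highpass subbands, each half as long."""
    low = [S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high

def haar_synthesis(low, high):
    """Rebuild the signal from its two subbands."""
    x = []
    for l, h in zip(low, high):
        x.append(S * (l + h))
        x.append(S * (l - h))
    return x

def wavelet_analysis(x, levels):
    """Re-split the lowpass band `levels` times.

    Returns (final_low, [high_1, ..., high_levels]); len(x) must be
    divisible by 2**levels.
    """
    highs = []
    low = list(x)
    for _ in range(levels):
        low, high = haar_analysis(low)
        highs.append(high)
    return low, highs

x = [4.0, 2.0, 5.0, 7.0, 1.0, 1.0, 3.0, 9.0]
low, high = haar_analysis(x)
x_rec = haar_synthesis(low, high)

# Critical sampling: the subbands together hold as many samples as the input.
print(len(low) + len(high) == len(x))
# Reconstruction error is at floating-point rounding level.
print(max(abs(a - b) for a, b in zip(x, x_rec)))
```

Because each Haar output depends on only one pair of input samples, this particular filter bank is also a 2-point block transform; longer filters would overlap neighbouring blocks.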
- Perfect reconstruction (PR) is obtained when $R(z) = E^{-1}(z)$.
- When E(z) and R(z) are constant matrices, subband coding degenerates to a block-based operation, i.e., transform coding.
- In particular, if E(z) is a DCT matrix and R(z) is the IDCT matrix, this becomes DCT coding.
- Subband coding can be viewed as transform coding with overlapped blocks, so it can exploit the correlation of pixels at longer range.
- Coding artifacts:
  - Transform coding: blocking
  - Subband coding: ringing, contouring

Optimal Bit Allocation
- We can allocate different bit rates to the subbands based on their properties.
- Assume that we apply scalar quantization with bitrate $b_k$ to the subbands $x_k$, $k = 1, \dots, M$. Then the quantization error is

  $$\sigma_{q_k}^2 = c\, 2^{-2 b_k}\, \sigma_{x_k}^2$$

- The overall bitrate is

  $$b = \frac{1}{M} \sum_{k=1}^{M} b_k$$

- The overall quantization error is

  $$\sigma_q^2 = \frac{1}{M} \sum_{k=1}^{M} \sigma_{q_k}^2$$
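$$\sigma_q^2 = \frac{1}{M} \sum_{k=1}^{M} \sigma_{q_k}^2 \ \ge\ \left( \prod_{k=1}^{M} \sigma_{q_k}^2 \right)^{1/M} \quad \text{(AM-GM inequality)}$$

$$= \left( \prod_{k=1}^{M} c\, 2^{-2 b_k}\, \sigma_{x_k}^2 \right)^{1/M} = c\, 2^{-2 \cdot \frac{1}{M} \sum_{k=1}^{M} b_k} \left( \prod_{k=1}^{M} \sigma_{x_k}^2 \right)^{1/M} = c\, 2^{-2b} \left( \prod_{k=1}^{M} \sigma_{x_k}^2 \right)^{1/M}$$

(a constant for a given signal and filter bank)

- Equality holds if and only if $\sigma_{q_k}^2 = \sigma_q^2$ for all k.
- Optimal bit allocation:

  $$b_k = \frac{1}{2} \log_2 \frac{c\, \sigma_{x_k}^2}{\sigma_q^2}, \qquad \sigma_q^2 = c\, 2^{-2b} \left( \prod_{k=1}^{M} \sigma_{x_k}^2 \right)^{1/M}$$

- Gain:

  $$\text{Gain} = \frac{\frac{1}{M} \sum_{k=1}^{M} \sigma_{x_k}^2}{\left( \prod_{k=1}^{M} \sigma_{x_k}^2 \right)^{1/M}}$$

- No gain if the $\sigma_{x_k}^2$ are identical.

Pyramid Coding

[Figure: pyramid coding. The original $b_1$ is passed through the lowpass filter L repeatedly to give $b_2, b_3, \dots, b_K$. Each coarser level is interpolated (Int) to form an estimate $b_{2,e}, b_{3,e}, \dots, b_{K,e}$ of the level above it, and the estimate is subtracted from that level to give the detail signals $L_1, L_2, \dots, L_{K-1}$. A frequency-axis sketch shows the bands occupied by $b_K$, $L_{K-1}$, ..., $L_2$, $L_1$.]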
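Returning to the optimal bit allocation result, substituting $\sigma_q^2 = c\, 2^{-2b} (\prod_k \sigma_{x_k}^2)^{1/M}$ into the formula for $b_k$ gives $b_k = b + \frac{1}{2} \log_2 (\sigma_{x_k}^2 / \text{GM})$, where GM is the geometric mean of the subband variances, so the constant c cancels. The sketch below evaluates that expression and the gain. The function names and the example variances are hypothetical, and the sketch does not handle the practical constraints that $b_k$ be non-negative and an integer.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive numbers, computed in the log2 domain."""
    return 2 ** (sum(math.log2(v) for v in values) / len(values))

def allocate_bits(variances, avg_bits):
    """Optimal b_k = b + 0.5 * log2(var_k / GM) for average rate b = avg_bits."""
    gm = geometric_mean(variances)
    return [avg_bits + 0.5 * math.log2(v / gm) for v in variances]

def coding_gain(variances):
    """Arithmetic mean over geometric mean of the subband variances."""
    return (sum(variances) / len(variances)) / geometric_mean(variances)

# Hypothetical subband variances for M = 4 subbands, average rate b = 2 bits.
variances = [16.0, 4.0, 1.0, 0.25]
bits = allocate_bits(variances, avg_bits=2.0)

print(bits)                     # more bits go to the high-variance subbands
print(sum(bits) / len(bits))    # the average stays at b
print(coding_gain(variances))   # greater than 1 unless all variances are equal
```

With these variances the quantization error $c\, 2^{-2 b_k} \sigma_{x_k}^2$ comes out the same in every subband, which is the equality condition of the AM-GM step.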
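Consider the 2-D case

[Figure: the pyramid. $b_1$ is the original; $b_2$ and $b_3$ are the intermediate levels; $L_1$, $L_2$, $L_3$, and $b_4$ are transmitted.]

- Number of samples is 33% more:

  $$N^2 + \frac{N^2}{4} + \frac{N^2}{16} + \cdots \approx \frac{4}{3} N^2$$

- Non-critical sampling
- PR is always possible, no matter how L and Int are designed.
- Progressive transmission is possible.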
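The claim that PR holds no matter how L and Int are designed follows from the structure: the decoder adds back exactly the estimate that the encoder subtracted. A minimal 2-D sketch, assuming deliberately crude choices for both operators (2×2 block averaging for L followed by decimation, pixel replication for Int) and no quantization; all function names and the test image are hypothetical.

```python
def reduce2d(img):
    """Stand-in for L plus decimation: average each 2x2 block."""
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
             for c in range(0, len(img[0]), 2)]
            for r in range(0, len(img), 2)]

def expand2d(img):
    """Stand-in for Int: repeat every pel twice horizontally and vertically."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def build_pyramid(img, levels):
    """Return ([L_1, ..., L_levels], coarsest image)."""
    details, current = [], img
    for _ in range(levels):
        coarse = reduce2d(current)
        estimate = expand2d(coarse)
        details.append([[a - e for a, e in zip(ra, re)]
                        for ra, re in zip(current, estimate)])
        current = coarse
    return details, current

def reconstruct(details, coarse):
    """Add each detail signal back to the interpolated coarser level."""
    current = coarse
    for detail in reversed(details):
        estimate = expand2d(current)
        current = [[d + e for d, e in zip(rd, re)]
                   for rd, re in zip(detail, estimate)]
    return current

def sample_count(details, coarse):
    """Total number of transmitted samples."""
    return sum(len(d) * len(d[0]) for d in details) + len(coarse) * len(coarse[0])

# Hypothetical 8x8 test image; three levels give L_1, L_2, L_3 and a 1x1 b_4.
img = [[float((3 * r + 5 * c) % 11) for c in range(8)] for r in range(8)]
details, coarse = build_pyramid(img, levels=3)
rebuilt = reconstruct(details, coarse)

print(sample_count(details, coarse))  # 85, versus 64 pels in the original
```

For N = 8 the count is 64 + 16 + 4 + 1 = 85, just under $\frac{4}{3} N^2 \approx 85.3$. Decoding only `coarse` and then adding detail levels one at a time is what makes progressive transmission possible.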
References

VQ
- Allen Gersho and Robert M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992

Subband
- John Woods, ed., Subband Image Coding, Kluwer Academic Publishers, 1991
- P.P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice Hall, 1993
- N.S. Jayant and Peter Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, Prentice Hall, 1984