Linear, Worst-Case Estimators for Denoising Quantization Noise in Transform Coded Images

Onur G. Guleryuz
DoCoMo Communications Laboratories USA, Inc.
181 Metro Drive, Suite 300, San Jose, CA 95110
guleryuz@docomolabs-usa.com

Abstract

Transform coded images exhibit distortions that fall outside of the assumptions of traditional denoising techniques. In this paper we use tools from robust signal processing to construct linear, worst-case estimators for the denoising of transform compressed images. We show that while standard denoising is fundamentally determined by statistical models for images alone, the distortions induced by transform coding are heavily dependent on the structure of the transform used. Our method thus uses simple models for the image and for the quantization error, with the latter capturing the transform dependency. Based on these models we derive linear estimators of the original image that are optimal in the mean squared error sense for the worst-case cross-correlation between the original and the quantization error. Our construction is transform agnostic and is applicable to transforms from block DCTs to wavelets. Furthermore, our approach is applicable to different types of image statistics and can also serve as an optimization tool for the design of transforms/quantizers. Through the interaction of the source and quantizer models, our work provides useful insights and is instrumental in identifying and removing quantization artifacts from general signals coded with general transforms. As we decouple the modeling and processing steps, we allow for the construction of many different types of estimators depending on the desired sophistication and available computational complexity. At the low end of this spectrum, our lookup table based estimator, which can be deployed in low complexity environments, provides competitive PSNR values with some of the best results in the literature.

EDICS: 2-LFLT Linear Filtering and Enhancement, 2-ANAL Analysis, 2-MODL Modeling

Keywords: Robust, worst-case, cross-correlation, quantization, post-processing, transform coding, deblocking, artifacts, denoising

LIST OF FIGURES

1 (a) The transform coding scenario examined in this paper. A signal x is transformed with a linear transform H, the transform coefficients are scalar quantized, and finally the quantized coefficients are inverse transformed to obtain the decoded signal. (b) The equivalent compound or overall quantization of the signal. The distinction between the utilized scalar quantizers and the compound quantization becomes important from the perspective of Propositions 2.3 and 2.4. When viewed individually, the scalar quantizers do not result in distortion that exceeds the particular coefficient's energy. However, the overall process does lead to subspaces where the projected signal energy is less than the projected quantization error energy, making worst-case analysis relevant.

2 Normalized canonical uniform and deadzone quantizer distortion for generalized Gaussian random variables as a function of σ and shape parameter ν (stepsize Δ = 1).

3 Improvements in SNR for a smooth Gaussian process transform coded with 8 × 1 block DCTs (top row) and 2-level, biorthogonal D7-9 wavelets (bottom row). In (a) and (c), all transform coefficients are scalar quantized with the same uniform quantizer of stepsize Δ. In (b) and (d), a deadzone scalar quantizer is used. The signal satisfies a first order covariance structure with K_{i,j} = ρ^{|i−j|}, i, j ∈ {1, ..., 16} and ρ = 0.9. For comparison, the performance of the optimal linear estimate of the original signal is also included. This estimator has perfect knowledge of K, Q, and C (actual, not worst-case), whereas the estimators A_w and A∘ only know K.

4 Improvements in SNR for a piecewise smooth Gaussian process containing discontinuities transform coded with 8 × 1 block DCTs (top row) and 2-level, biorthogonal D7-9 wavelets (bottom row). Same scenario as Figure 3 but using a piecewise first order Markov process (Equation (29)) instead of a signal-wide Markov process (Equation (28)).

5 Calculated subspaces for a one dimensional (16 × 1) signal using 8 × 1 block DCTs and a uniform quantizer. (a): ρ = 0.9, Δ = 1, d = 1; (b): ρ = 0.9, Δ = 2, d = 3. Only the subspace with the largest absolute eigenvalue is shown in (b). The subspaces are automatically concentrated at the block boundaries.

6 Calculated subspaces for a two dimensional (16 × 16) signal using 8 × 8 block DCTs and a uniform quantizer. ρ = 0.9, Δ = 1, d = 62. Only the first six subspaces with the largest absolute eigenvalues are shown. The subspaces are automatically concentrated at the block boundaries.

7 Location of constrained spatial extent subspaces for a two dimensional signal using block DCTs. With our simple quantizer model, K and Q can be determined in a localized fashion, without requiring large spatial extent covariances.

8 Calculated, constrained spatial extent subspaces for a two dimensional signal using 8 × 8 block DCTs and a uniform quantizer. Same signal covariance scenario as in Figure 6, ρ = 0.9, Δ = 1, d = 15. The spatial extent of the subspaces is limited to 8 × 8, located over the 8 × 8 region where four blocks meet.

9 Improvements in PSNR for the image Lena (luminance, 512 × 512) transform coded with 8 × 8 DCTs (top row) and 5-level D7-9 wavelets (bottom row). In (a) and (c), all transform coefficients are scalar quantized with the same uniform quantizer of stepsize Δ. In (b) and (d), a deadzone scalar quantizer is used.

10 Improvements in PSNR for the image Barbara (luminance, 512 × 512) transform coded with 8 × 8 DCTs (top row) and 5-level D7-9 wavelets (bottom row). In (a) and (c), all transform coefficients are scalar quantized with the same uniform quantizer of stepsize Δ. In (b) and (d), a deadzone scalar quantizer is used.

11 Improvements in PSNR for the image Boat (luminance, 512 × 512) transform coded with 5-level D7-9 wavelets. In (a) all transform coefficients are scalar quantized with the same uniform quantizer of stepsize Δ. In (b) a deadzone scalar quantizer is used.

12 Decoded (baseline), Ā (LUT) post-processed (27.43 dB), and Ā (TI) post-processed (27.4 dB) image Lena (from the first column of results in Table I).

13 Decoded (baseline), Ā (ADP) post-processed (28.3 dB), and Ā (TI) post-processed (28.84 dB) image Barbara (deadzone quantized with DCTs).

14 Top row: original and decoded (baseline). Bottom row: Ā (LUT) post-processed (28.83 dB), Ā (ADP) post-processed (28.92 dB), and Ā (TI) post-processed (29.0 dB) image Boat (from the second column of results in Table III).

15 Original, decoded (baseline), and Ā (TI) post-processed (29.46 dB) image Boat (deadzone quantized with 5-level D7-9 wavelets and Δ = 4). PSNR inside the shown region is improved from 28.42 dB to 28.76 dB.

16 Original, decoded (baseline), and Ā (TI) post-processed (28.97 dB) image Barbara (deadzone quantized with 5-level D7-9 wavelets and Δ = 4). PSNR inside the shown region is improved from 27.22 dB to 27.8 dB.

LIST OF TABLES

I Results on JPEG compressed Lena using the quantizer tables in [31]. The utilized version of the Lena image is the same as the one in [3]. The cited results are obtained from [3].

II Results on JPEG compressed images using the quantizer tables in [19]. The cited results are obtained from [19]. (Results on Barbara are not reported since we could not obtain the non-standard version of Barbara used in [19].)

III Results on JPEG compressed Boat using the quantizer tables in [31] and in [19].

IV Results on JPEG2000 compressed images at rates of 0.5, 0.25, and 0.05 bits per pixel. (As we were unable to obtain the quantizer parameters, we report post-processing results assuming all wavelet coefficients were deadzone quantized using a single guessed quantizer stepsize.)

I. INTRODUCTION

Transform coding is the most popular technique used in image compression applications. By virtue of good compaction performance over a wide class of images, linear transforms based on block DCTs, wavelets, and filter banks have captured the interest of researchers, leading to high performance image/video coders and compression standards (see, e.g., [1], [4], [13], [18]). While very good visual quality can be achieved at high bit rates, it is well known that at moderate to low bit rates the reconstructed images that have undergone transform compression suffer from quantization artifacts. A significant body of work in the literature is thus concerned with post-processing algorithms that attempt to reduce or remove these artifacts from transform coded images (see, e.g., [31], [3], [29], [17], [2], [21], [28], [19], and references therein).

The main difficulty involved in designing effective quantization artifact removal algorithms is the formulation of nontrivial statistical relationships that specify how the original uncompressed image relates to the transform compressed one. For example, if images are modeled as random vectors, then exact knowledge of the image probability distribution function (pdf) can be used to formulate such a relationship, and one can estimate the original image given its transform coded version and the quantizer parameters. However, in the absence of an exact pdf one has to resort to certain assumptions about the structure of images and utilize these as regularization. Researchers have typically formulated these assumptions in an ad hoc fashion, for example by requiring similar pixel values at block boundaries for post-processing block transform coded images [2] (which is not valid for non-block transforms or non-smooth images), by imposing stationarity on assumed cyclostationary decompressed images [21] (which is not valid for spatially varying statistics, transforms, or quantizers), by using singularity processing and overcomplete transforms [12], [28] (which is not valid for general image statistics and non-block transforms), etc. Unfortunately, these assumptions are typically restricted to specific transforms, image models, and algorithms. They hence do not provide a robust approach that is applicable to more general transform coding scenarios. The main aim of this paper is to address this shortcoming by providing a simple statistical formalism that can be broadly utilized in post-processing transform coded data.

Another important issue that this paper will try to address is the computational complexity involved in the post-processing of transform coded images. In compression applications it is very desirable to maintain a low computational complexity decoder (transform decoding and post-processing combined) to the extent possible. Yet some of the most effective post-processing algorithms reported in the literature, those that result in the best mean squared error or PSNR improvements, are computationally intensive, requiring iterative operations such as alternating projections, repeated optimizations, redundant transformations, or many encoding/decoding operations [31], [29], [22], [28], [21], [19], [23].
If the computational complexity of post-processing is attributed to two factors, namely the complexity required in adaptively determining a model for the data out of a range of possibilities, and the complexity required in processing the data with respect to that model, it is clear that both of these factors can become quite significant on their own depending on the desired sophistication. For example, even if one had exact knowledge of the image pdf, i.e., zero modeling error and complexity (we will assume that the quantizer parameters are known), the construction of the optimal estimate can still be computationally prohibitive or even infeasible due to the large dimensions of image vectors. It is therefore very desirable to have techniques that decouple the modeling and processing operations so that specific applications can be tuned based on the target set of images and the overall computational complexity requirements. The work presented in this paper is geared toward providing this flexibility.

In this paper we use tools from robust statistics and signal processing to construct linear, worst-case estimates of the original image given the decompressed image. Our method is based on simple local covariance models for the original image pixels, with each such local covariance canonically represented by an N × N matrix K. Based on K, we derive a corresponding simple covariance model for the pixel domain quantization error, an N × N matrix Q, using the transform and quantizer parameters. The formulation of statistical relationships between the original and decompressed images is thus converted to the determination of the linear cross-correlations between the original image and the quantization error. We avoid the pitfalls associated with this critical and difficult cross-correlation determination by utilizing worst-case statistics, i.e., we derive the optimal linear estimate of the original based on the worst-case cross-correlation. The estimators we construct are linear and optimal in the mean squared error sense for the worst-case cross-correlation. In the proposed post-processing algorithm we limit modeling complexity to the determination of K over localized image regions (Q is approximated using K and the transform/quantizer parameters). Our general solution is a convex minimization problem which can be tackled numerically using well-known techniques. However, instead of this general solution, we derive an intuitive and computationally simple estimator/projection in order to curb processing complexity. The processing complexity of this simple estimator is an eigen decomposition of Z = K − Q followed by the removal of linear subspaces from the data where the expected quantization distortion exceeds the expected signal energy, i.e., the subspaces formed by the eigenvectors of Z that have negative eigenvalues. We will term these subspaces the subspaces of quantization artifacts. For simple covariance models (such as Markov models), or for cases where one expects to encounter a small number of distinct covariance models, our approach can be reduced to subspace removal alone with the aid of lookup tables. This results in a very fast quantization artifact removal algorithm that provides competitive PSNR values with some of the best results in the literature.¹

¹Our early results were reported in [9].

After the discussion in Section I-A, Section II introduces the main ideas and the simple but generic worst-case algebra. These results are specialized to the quantization problem as defined in Section III, with further simple but quantization specific results provided in Section III-A. In order to obtain concrete results and to add further perspective to our algebraic formulation, Section III-B discusses some relevant properties of scalar uniform and deadzone quantizers. Section IV is devoted to simulation results and related analysis. In Section IV-A we consider simulated random vectors where the source model is known in order to see the performance and limitations of our estimators. Section IV-B contains visualizations of the optimum denoising coordinate systems, which we use in Section IV-C to define two critically sampled, constrained spatial extent, adaptive estimators on images. For completeness we also introduce a translation invariant estimator.
After comparing results to prior work, we conclude the paper in Section V with a discussion of future work.

A. Discussion of Properties, Contributions, and Generalizations

The approach presented in this paper results in some useful insights due to the interplay between K and Q in the formation of the estimates. In particular, our formulation is transform agnostic and is applicable to transforms from DCTs to wavelets. It is applicable to different types of image statistics, to regions that are locally smooth, locally high frequency, edges, etc. The presented method can accommodate spatially varying statistics and quantization parameters (the latter often encountered due to rate-control based compression in video coders [13], [18], due to truncation of bitstreams generated by embedded transform coders [4], [24], etc.). Most importantly, the presented approach can be deployed as a useful analysis and optimization tool in the design of quantizers and transforms. Much of our work is very intuitive, and it leads to useful comparisons among different transforms and to visualizations of the types of quantization artifacts they are likely to produce over various image regions.² The results of this paper can even be used to define a quantization figure of merit for different transforms, similar to energy compaction or coding gain performance numbers. While not examined in this paper, our formulation also opens a mathematical optimization avenue for joint transform and quantizer design to reduce artifacts (so that they are less visible without any post-processing), and for the design of loop/deblocking filters for video. Furthermore, since our approach is algebraic and general, it can be directly applied to other types of signals (audio, speech, etc.) and compression scenarios in a straightforward fashion.

From a purely mathematical standpoint it is worth pointing out that, given two random quantities and their respective marginal pdfs, one can utilize well-known techniques to construct a joint pdf using worst-case analysis and proceed with estimation problems. While such methods are popular and are often cited as examples of convex optimization [2], note that the problem we are investigating involves the second quantity (the quantization error signal) being a deterministic function of the first (the original signal) via the known quantizer map. Hence, if we know the pdf of the original, then other than problems due to dimensions/complexity, it is straightforward to obtain the minimum mean squared error estimate without resorting to worst-case tactics. The issues that motivate worst-case cross-correlations here are the imperfect information of the pdf and our desire to solve a computationally simple problem, i.e., we do not know what the pdf is, but we would still like to have a robust solution when we can determine approximate, local, second order statistics for this large dimensional problem. In comparison to some established work in robust signal processing (see, e.g., [16]), which assumes wide-sense stationary signals and hence an orthonormal transform that simultaneously diagonalizes K and Q in our notation, our analysis indicates that the main flavor of the quantization problem examined here is due to the non-simultaneous diagonalization of the two matrices K and Q by an orthonormal transform. As we will see, the relationships between K and Q in the general case lead to useful insights and an effective algorithm.

An important issue with worst-case analysis is how relevant it remains in typical cases. Beyond simulation results that show this relevance, our subspace removal algorithm also offers an intuitive reassurance, since it removes subspaces from the data where the expected quantization error energy exceeds the expected original signal energy, i.e., assuming K and Q are correctly modeled, the subspace removal algorithm is guaranteed to improve the mean squared error or leave it unaltered.
Our worst-case estimator only does better (see Proposition 2.5), and one expects the techniques proposed in this paper to improve mean squared error in the typical regimes of quantization, with boundaries identified in Propositions 2.3 and 2.4.

²The definition of an artifact is of course subjective. In this paper we will say that artifacts are due to portions of the decoded signal where the quantization error energy exceeds the signal energy (please see Section II and Definition 2).

Finally, the subspace removal algorithm proposed in this paper can also be thought of as a method for denoising transform coded images. In standard denoising, one tries to identify coordinates where the SNR is lowest and removes these from the noisy data using per-coordinate operations, arriving at denoised estimates under white noise assumptions [7]. In our notation, assumptions that involve white noise that is cross-correlated with the data correspond to the subcase where there is an orthonormal transform that simultaneously diagonalizes K and Q (Proposition 2.6). Then the proposed subspaces of quantization artifacts reduce to the eigenvectors of K that correspond to the small eigenvalues, i.e., the Karhunen-Loeve transform (KLT), which by definition diagonalizes K, is the best coordinate system for denoising under white noise assumptions. In the general case, however, the noise due to quantization is neither white nor i.i.d., and the statistical requirements of popular denoising techniques are not satisfied: over a wide regime of quantization, Q is such that the simultaneous diagonalization requirement does not hold. Hence one cannot rely on sparse decompositions of the original data alone; one must incorporate Q in order to obtain the optimum coordinate system for denoising, which in turn will yield the coordinates of lowest SNR. In this sense, the paper can be thought of as deriving a set of basis vectors that can be used in denoising transform coded images. As we will see, over a wide range of quantization, through the interaction of K and Q, the optimal denoising coordinate system as derived in this work is quite different from the KLT or other popular coordinate systems. Furthermore, the optimal denoising strategy is even more different, since in general it cannot be written as per-coordinate denoising, regardless of the coordinate system.

II. MAIN IDEAS

The transform coding scenario examined in this paper is shown in Figure 1 (a). At the encoder a signal is transformed, the transform coefficients are scalar quantized and losslessly encoded using an entropy coder. At the decoder the reverse entropy decoding takes place to determine the quantized coefficients, which are inverse transformed to obtain the decoded signal. For our purposes the entropy coding particulars are not important, and they are thus not included in the figure. Throughout, bold capital letters will denote matrices and bold lowercase letters will denote vectors. While our examples use scalar quantizers, the paper can tolerate more sophisticated quantization provided one can determine/model Q as defined in Equation (3) below. Of course, since we will be estimating the original from the decoded signal, it is important that the minimum mean squared error estimate of the original signal using the decoded signal be nontrivial. For example, in the case of joint vector quantization (VQ) of all coefficients using a minimum mean squared error VQ codebook [8], it is clear that the optimal estimate is the decoded signal itself, and any post-processing is redundant. On the other hand, in cases of block based VQ, lattice quantization, Lloyd-Max scalar quantization, etc., residual correlations (such as interblock correlations for block transforms/quantizers) remain that may allow nontrivial estimates to be constructed. The techniques we will propose can be extended to such scenarios in a straightforward fashion.
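As a concrete illustration of Figure 1 (a), the following minimal Python sketch runs the pipeline for a one dimensional signal, assuming 8 × 1 block DCTs and a single uniform scalar quantizer of stepsize Δ for all coefficients; the function names and defaults are ours, and entropy coding is omitted since it does not affect the analysis.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import block_diag

def block_dct_matrix(N, B=8):
    # H: orthonormal 8x1 block DCTs arranged as an N x N matrix, c = H x.
    M = dct(np.eye(B), axis=0, norm='ortho')   # B x B DCT-II matrix
    return block_diag(*([M] * (N // B)))

def transform_code(x, H, delta):
    # Figure 1 (a): transform, scalar quantize, inverse transform.
    c = H @ x                                  # transform coefficients
    c_hat = delta * np.round(c / delta)        # uniform quantizer, stepsize delta
    x_hat = np.linalg.inv(H) @ c_hat           # decoded signal
    return x_hat, x - x_hat                    # decoded signal and q, per (1)

# Example: x_hat, q = transform_code(np.random.randn(16), block_dct_matrix(16), 1.0)
```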
For the purposes of this paper it is important to distinguish between direct quantization, which is the application of the individual scalar quantizer maps to individual coefficients, and compound quantization (Figure 1 (b)), which is the equivalent high dimensional quantizer in signal domain. This independent vs. joint consideration of the data allows one to construct trivial vs. nontrivial estimators. For example, if we assume that minimum mean squared error Lloyd-Max scalar quantization is applied to each coefficient, then it is well-known that the optimal estimate of each coefficient using only the corresponding decoded coefficient is trivial, i.e., each decoded coefficient itself is the best estimate in the scalar problem. However, the optimal estimate of the original signal (or the entire set of coefficients) using all of the decoded signal (or all of the decoded coefficients) will likely be nontrivial. This paper is geared toward obtaining the latter type of estimates.

Fig. 1. (a) The transform coding scenario examined in this paper. A signal x is transformed with a linear transform H, the transform coefficients are scalar quantized, and finally the quantized coefficients are inverse transformed to obtain the decoded signal. (b) The equivalent compound or overall quantization of the signal. The distinction between the utilized scalar quantizers and the compound quantization becomes important from the perspective of Propositions 2.3 and 2.4. When viewed individually, the scalar quantizers do not result in distortion that exceeds the particular coefficient's energy. However, the overall process does lead to subspaces where the projected signal energy is less than the projected quantization error energy, making worst-case analysis relevant.

We start our development by deriving simple worst-case results for given positive semidefinite matrices K and Q. In the post-processing algorithm we will propose, K will be determined adaptively from localized image regions and Q will be determined using K, the transform, and the quantizer parameters. We now leave these modeling and determination issues aside and concentrate on the simple min-max problem we define below.

A. Main results for general K and Q

Consider the random vector x (N × 1) such that

x = x̂ + q, (1)

where x̂ is the decoded/quantized (observable) version of x, and q is the quantization error. Assume zero-mean quantities. Let E[...] denote expectation and define the (N × N) covariances

K = E[xx^T], (2)
Q = E[qq^T], (3)

where (...)^T denotes transpose. Let C denote the cross-covariance

C = E[xq^T]. (4)

We assume that K and Q are known, but C is unknown. Regardless, we would still like to obtain a linear estimate of x via

y = Ax̂, (5)
where A (N × N) is the linear estimation matrix. In doing so we would like to choose A so that it yields the smallest mean squared error for the worst-case cross-covariance, i.e., we would like to solve the min-max problem determined by

Definition 1 (Linear, worst-case estimation): For a given C and arbitrary A, let MSE(A, C) be the mean squared error obtained via

MSE(A, C) = E[(x − Ax̂)^T (x − Ax̂)]. (6)

We define the optimal, linear, worst-case estimation matrix A_w as

A_w = arg min_A {max_C MSE(A, C)}. (7)

Substituting x̂ = x − q in Equation (6), we obtain A_w as

A_w = arg min_A {max_C E[||(1 − A)x + Aq||²]}, (8)

where 1 is the (N × N) identity matrix. Observe that while there are no constraints on A in the min portion, the cross-covariance matrix C of the max portion must be such that the augmented covariance matrix

E[ [x; q][x^T q^T] ] = [K, C; C^T, Q] (9)

is positive semidefinite. As we will not be able to get an analytical solution for A_w, it is initially useful to see that this problem is convex and that the optimal solution can be found using convex optimization techniques.

Proposition 2.1: The linear worst-case estimation problem of Definition 1 is convex over linear estimation matrices A.

Proof: Let A_0 and A_1 determine two linear estimation matrices with corresponding worst-case cross-covariances given by C_0 and C_1. Let 0 ≤ λ ≤ 1, A_λ = (1 − λ)A_0 + λA_1, and let C_λ be the worst-case cross-covariance for A_λ. Since MSE(A, C) is convex for a given C, we have

MSE((1 − λ)A_0 + λA_1, C_λ) ≤ (1 − λ)MSE(A_0, C_λ) + λMSE(A_1, C_λ) (10)
≤ (1 − λ)MSE(A_0, C_0) + λMSE(A_1, C_1), (11)

where the second inequality follows since C_0 and C_1 are the worst-case cross-covariances for A_0 and A_1.

The next proposition resolves the max step and obtains A_w through a single minimization which can be solved through convex programming techniques. Let Tr[...] denote trace.

Proposition 2.2: The linear worst-case estimation problem of Definition 1 is solved by an A_w that satisfies

A_w = arg min_A {Tr[(1 − A)K(1 − A)^T] + Tr[AQA^T] + 2Tr[(Q^{1/2}A^T(1 − A)K(1 − A)^T AQ^{1/2})^{1/2}]}. (12)

Proof: See Appendix I.

We were not able to obtain an analytical solution for this form of A_w due to the singular value decomposition implicit in the last term.
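Although no closed form is available, the objective in Equation (12) is straightforward to evaluate numerically. The sketch below, with scipy-based helpers of our own naming, mirrors the generic derivative-free minimization used for the toy examples of Section IV-A; it is only practical for small N.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import minimize

def worst_case_mse(A, K, Q):
    # The objective of Equation (12); the last term realizes the max over C.
    I = np.eye(K.shape[0])
    Qh = np.real(sqrtm(Q))
    M = Qh @ A.T @ (I - A) @ K @ (I - A).T @ A @ Qh   # PSD by construction
    return (np.trace((I - A) @ K @ (I - A).T)
            + np.trace(A @ Q @ A.T)
            + 2.0 * np.real(np.trace(sqrtm(M))))

def solve_Aw(K, Q):
    # Derivative-free (simplex) minimization over the N^2 entries of A.
    N = K.shape[0]
    f = lambda a: worst_case_mse(a.reshape(N, N), K, Q)
    res = minimize(f, np.eye(N).ravel(), method='Nelder-Mead',
                   options={'maxiter': 50000, 'fatol': 1e-9})
    return res.x.reshape(N, N)
```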

While we will be able to extract some useful properties based on this form, we can get significant intuition about the flavor of the problem by considering the following simple example.

Example 1: Consider the quantization problem in Equation (1) in one dimension, i.e., let N = 1. Let σ_x² = K, σ_q² = Q, and let c = E[xq] = C. Equation (8) becomes

a_w = arg min_a {max_c E[|(1 − a)x + aq|²]}
    = arg min_a {max_c {(1 − a)²σ_x² + a²σ_q² + 2a(1 − a)c}}. (13)

Using the Schwarz inequality we have |E[xq]| ≤ σ_x σ_q, and it is clear that the last term in Equation (13) is maximized by choosing c = sign(a(1 − a)) σ_x σ_q. (Such a value for c is realizable by choosing x and q to be scaled versions of the same random variable, i.e., the cross-correlation E[xq] = sign(a(1 − a)) σ_x σ_q satisfies the constraint of Equation (9).) We thus have

a_w = arg min_a {(1 − a)²σ_x² + a²σ_q² + 2|a(1 − a)| σ_x σ_q}.

Considering the three second order polynomials for a < 0, 0 ≤ a ≤ 1, and a > 1, we obtain

a_w = { 1, σ_x² ≥ σ_q²,
      { 0, otherwise. (14)

Hence the optimal linear, worst-case estimator provides nontrivial estimates only when σ_q² > σ_x², i.e., when the quantizer distortion exceeds the signal energy.

Remark: At first glance this result points to a potential redundancy of our worst-case analysis. Indeed, as we will see in Section III, under reasonable assumptions, when applied judiciously the scalar quantizers used in transform coding do not result in distortion that exceeds each quantized coefficient's energy. Yet, unlike this limited direct quantization viewpoint, we will also see that if one views the transform coding of an entire image as a compound quantization operation (Figure 1 (b)), it is possible to find subspaces where the projected signal energy is less than the projected quantization error energy. Our simulation results will show on actual data that not only do such subspaces exist, but their removal as we propose below results in significant improvements in mean squared error.

The following two propositions generalize Example 1 and show that the worst-case analysis will only provide nontrivial estimates if Z = K − Q is not positive semidefinite.

Proposition 2.3 (Analogue of the σ_x² ≥ σ_q² case in Equation (14)): Suppose Z = K − Q is positive semidefinite. Then the mean squared error max_C {MSE(A, C)} ≥ MSE(1, ·) for any A, and setting A_w = 1 minimizes the worst-case linear estimation problem.

Proof: See Appendix II.

Proposition 2.4 (Analogue of the σ_x² < σ_q² case in Equation (14)): Suppose K − Q is negative semidefinite. Then the mean squared error max_C {MSE(A, C)} ≥ MSE(0, ·) for any A, and setting A_w = 0 minimizes the worst-case linear estimation problem.

Proof: Similar to the proof of Proposition 2.3.

The following intuitive but suboptimal estimator/projection is influenced by Example 1 and extends it to accommodate the compound quantization operation.

Definition 2 (Subspaces of Quantization Artifacts - SQA): Consider the eigen decomposition of Z = K − Q. Let d determine the number of negative eigenvalues and let S be the matrix that contains the orthogonal eigenvectors of Z that correspond to the negative eigenvalues (up to rotations to accommodate duplicate eigenvalues). If d = 0, set S = 0. We have

ZS = SΛ, (15)

where Λ (d × d) is the diagonal matrix of negative eigenvalues of Z. Define the SQA estimator/projection as

A∘ = 1 − SS^T. (16)

We immediately have

Proposition 2.5: MSE(A∘, C) is independent of C, with

MSE(A_w, C) ≤ MSE(A∘, ·) = Tr[Q + Λ] ≤ MSE(1, ·) = Tr[Q]. (17)

Proof: We only show MSE(A∘, ·) = Tr[Q + Λ], since the other parts follow from the definitions. Noting that A∘ is an orthonormal projection, i.e., (1 − A∘)A∘ = 0, we have

MSE(A∘, C) = E[||(1 − A∘)x + A∘q||²]
= E[||(1 − A∘)x||²] + E[||A∘q||²]
= Tr[(1 − A∘)(K − Q + Q)(1 − A∘)] + Tr[A∘QA∘]
= Tr[(1 − A∘)Z(1 − A∘)] + Tr[(1 − A∘)Q(1 − A∘)] + Tr[A∘QA∘]
= Tr[Λ + Q], (18)

since Tr[(1 − A∘)QA∘] = Tr[A∘(1 − A∘)Q] = 0.

Remark: Observe that, as expected, the SQA estimator strictly reduces the MSE if d > 0, since Tr[Λ] is then less than zero. Simulations using random positive semidefinite matrices for K and Q indicate substantial mean squared error differences between estimates formed with A_w and A∘ for the worst-case C. However, in cases where Q is determined from K through our quantizer model of Definition 3, the differences were significantly reduced. Most importantly, in simulations involving actual quantization (where the worst-case C is not necessarily involved) the differences in mean squared error were marginal (see Section IV-A). We will use such observations in Section IV to argue for the preference of A∘ over A_w due to computational complexity reasons.
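In code, Definition 2 and the bound of Proposition 2.5 amount to a single eigen decomposition. A minimal sketch (helper name ours), assuming K and Q are given:

```python
import numpy as np

def sqa_estimator(K, Q):
    # Definition 2: eigen-decompose Z = K - Q; S collects the eigenvectors
    # with negative eigenvalues (the subspaces of quantization artifacts).
    Z = K - Q
    evals, evecs = np.linalg.eigh((Z + Z.T) / 2.0)    # symmetrize numerically
    S = evecs[:, evals < 0]                           # N x d
    A = np.eye(K.shape[0]) - S @ S.T                  # Equation (16)
    worst_mse = np.trace(Q) + evals[evals < 0].sum()  # Tr[Q + Lambda], Eq. (17)
    return A, worst_mse

# The estimate of Equation (5) is then y = A @ x_hat for a decoded vector x_hat.
```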

We conclude this section by considering the case where K and Q can be simultaneously diagonalized by an orthonormal transform, similar to the scenario used by classical results on robust signal processing and denoising.

Proposition 2.6: Suppose there exists an orthonormal transform that simultaneously diagonalizes both K and Q. Then A_w = A∘, and if d > 0 in Definition 2, then S coincides with d eigenvectors of K.

Proof: See Appendix III.

Remark: While this case bears similarities to classical work in robust statistics, its implications for the transform used in compression are unrealistic, and the regime of quantization that it represents is not very interesting: In the transform coding scenario, and under the quantizer model of Definition 3, K and Q can be simultaneously diagonalized if one uses the Karhunen-Loeve transform (to be examined in Proposition 3.1 below) or if one is in a high resolution regime where Q becomes a scalar multiple of the identity (Proposition 3.2). However, neither of these scenarios is representative of typical transform coding. For example, it is well known that the block DCTs used in JPEG provide good decorrelation properties within each block, but coefficients from different blocks are typically very correlated (see, e.g., [], which uses interblock correlations to design a better reconstruction basis), i.e., popular transforms used in transform coding such as block DCTs, wavelet transforms, etc., cannot be said to form the KLT for typical images. Similarly, as we will see in Section III, for a wide range of bitrates, transform coefficients tend to incur varying mean squared quantization errors, which casts significant doubt on the validity of white quantization noise assumptions. Our simulation results in Section IV will indicate that for a wide regime of quantization with popular transforms, the estimators A_w and A∘ will be substantially different from those of Proposition 2.6.

III. QUANTIZER MODEL

Since the results in Section II operate on two general positive semidefinite matrices, our general analysis is invariant to how Q is obtained. Regardless, it is interesting to see the following simple results for the case where Q is approximately modeled after the actual quantization process undergone by the transform coefficients. As far as the main theme of the paper is concerned, the basic result this section will motivate is that unless one is in some high resolution regime, the structure of the utilized transform heavily influences the structure of Q. As far as the practical aspects of the work are concerned, this section will show a way to approximate Q directly from K using transform and quantizer parameters alone, for reasons of computational complexity and simplicity of modeling. The proposed model is only intended as a conduit for the understanding of the simple results below, and it is lacking in many ways when compared to more rigorous quantizer models. We note, however, that Q can also be obtained through more sophisticated means, through simulations, statistical sampling, or other modeling procedures, without affecting our main results.

As in Figure 1 (a), suppose that the invertible transform utilized in transform coding is arranged in an N × N matrix H so that

c = Hx (19)

gives the transform coefficients of x. The N transform coefficients in c are each scalar quantized to yield

c_i = ĉ_i + e_i, i = 1, ..., N, (20)

where c_i is the i-th coefficient, ĉ_i its quantized version, and e_i is the quantization error. Arranging the quantized coefficients into a vector ĉ, we obtain the decoded/quantized vector in signal domain as

x̂ = H⁻¹ĉ. (21)

Similarly, grouping the quantization errors e_i into a vector e and inverse transforming yields

q = H⁻¹e, (22)
x = x̂ + q. (23)

Main Quantizer Assumption: Our main assumption in forming Q will be that the quantization error e in transform domain is decorrelated, i.e., that E[ee^T] is a diagonal matrix. This assumption is justified on two grounds:

1. Except for very coarse quantization, it is well-known that the quantization error produced by typical scalar quantizers, even for very heavily correlated random variables, is approximately decorrelated [8].
2. Transform coding with typical transforms produces mostly decorrelated transform coefficients. In particular, the coarsely quantized transform coefficients tend to be small (usually high frequency) coefficients which show little correlation with other coefficients. This further inhibits nonnegligible cross-correlations in E[ee^T], as it dampens the violations of the first assumption above.

Note, however, that we do not assume that the transform coefficients themselves are decorrelated in general, and most importantly,

Q = H⁻¹ E[ee^T] H⁻T (24)

is in general non-diagonal.

Definition 3 (Simple Quantizer Model): Assume that the quantization error e_i for the i-th transform coefficient c_i is given via Equation (20). The utilized quantizer satisfies the Simple Quantizer Model if

E[e_i e_j] = ε_i δ_{i,j}, i = 1, ..., N, j = 1, ..., N, (25)

where ε_i is the mean squared quantization error incurred by the i-th transform coefficient and δ_{i,j} is the Kronecker delta.

Recipe for Q: In our actual post-processing algorithm we will determine K adaptively for localized image regions and then obtain Q with the aid of the given scalar quantizer characteristics. Since the actual quantization happens in coefficient domain, we first estimate the variances of the transform coefficients, σ_i², i = 1, ..., N, either by averaging or by using the diagonal elements of the coefficient covariance HKH^T. We then form E[ee^T] through Equation (25) by setting

ε_i = D(σ_i², ν_i), (26)

where D(σ_i², ν_i) is the nonlinear quantizer map that determines the average quantization distortion for the given scalar quantizer when quantizing a generalized Gaussian random variable of variance σ_i² and shape parameter ν_i. Finally, we inverse transform E[ee^T] to obtain Q via Equation (24). The actual form of D(σ², ν) for the uniform and deadzone scalar quantizers used in this paper will be discussed below (see Figure 2), where we will also argue for the convenience of setting ν_i = 2, which makes Q directly a function of K, H, and the quantizer parameters.
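A minimal sketch of this recipe for the uniform quantizer case, assuming the ν = 2 (Gaussian) convenience choice discussed in Section III-B; the numerical integration below stands in for the D(σ², ν) curves of Figure 2, and the helper names are ours:

```python
import numpy as np

def gaussian_uniform_distortion(sigma, delta, grid=20001, span=10.0):
    # Mean squared error of a uniform quantizer (codewords at integer
    # multiples of delta) applied to a zero-mean Gaussian of std sigma.
    if sigma <= 0.0:
        return 0.0
    x = np.linspace(-span * sigma, span * sigma, grid)
    pdf = np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    err = x - delta * np.round(x / delta)
    return np.trapz(err ** 2 * pdf, x)

def model_Q(K, H, delta):
    # Recipe for Q: coefficient variances from diag(H K H^T), per-coefficient
    # distortions eps_i via Equation (26), then Equation (24).
    sig2 = np.diag(H @ K @ H.T)
    eps = np.array([gaussian_uniform_distortion(np.sqrt(s), delta) for s in sig2])
    Hinv = np.linalg.inv(H)
    return Hinv @ np.diag(eps) @ Hinv.T
```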

A. Simple Properties

It can be shown that most reasonable scalar quantizers, such as the uniform/deadzone quantizers depicted in Figure 2, result in direct quantization distortion ε_i = D(σ_i², ν_i) ≤ σ_i² for the zero-mean generalized Gaussian family. With this basic observation, we can show the following simple statements with respect to the simple quantizer model of Definition 3.

Proposition 3.1 (Karhunen-Loeve Transform): Suppose the rows of the utilized transform are given by the eigenvectors of K, i.e., suppose H is the Karhunen-Loeve transform (KLT). Using the quantizer model of Definition 3, assume that the quantizer is such that ε_i ≤ σ_i². Then A_w = A∘ = 1.

Proof: Follows from Proposition 2.3 and Definition 2, since K and Q become simultaneously diagonalizable and K − Q is positive semidefinite.

Remark: If we use the Karhunen-Loeve transform in transform coding, then quantization artifacts in the sense of this paper will not be present. Hence, in practice we expect improvements from the algorithms we will propose in cases where the utilized transforms deviate from the KLT.

Proposition 3.2 (High Resolution): Using the quantizer model of Definition 3, assume that the quantizer is such that ε_i = ε for all i = 1, ..., N. Then max_C {MSE(A_w, C)} < MSE(1, ·) if and only if K has eigenvalues with magnitudes less than ε.

Proof: Follows from Proposition 2.3.

Remark: Using uniform/deadzone quantizers, it is well known that as Δ → 0, the mean squared quantization error tends to a constant ε = Δ²/12 for many types of random variables [8]. Hence, under the common transform coding scenario which uses the same scalar quantizer for all coefficients, unless K has very small eigenvalues we expect to see no improvements with worst-case estimation in the very high resolution limit. In the regime where ε = Δ²/12 holds but Δ is nonnegligible, using Proposition 2.6 we have A_w = A∘. Hence A_w is formed as subspace projections using the eigenvectors of K that correspond to the eigenvalues with magnitudes less than ε. Standard denoising schemes can be seen to be operating in this regime with a fixed coordinate system [7]. In this regime the actual transform used in compression, H, does not play a dominant role, and modeling of the source through the KLT (or another representation such as wavelets/curvelets, etc., assumed to be close to the KLT) is sufficient, i.e., standard results of robust statistics and denoising can be applied. This regime can be reached, for example, by doing pixel domain quantization with moderate values of Δ, or with dithered quantizers, etc. As we will see in Section IV, however, below this regime the structure of the transform H heavily influences the estimators. Solutions for the estimators become interesting, and optimal solutions deviate significantly from those that can be obtained using the KLT alone.

B. Quantizer Specifics

From the perspective of the remarks surrounding Propositions 2.6 and 3.2, it is important to see that typical transform coders that utilize orthonormal and near orthonormal transforms have an interesting regime which leads to a Q that does not satisfy the conditions of the propositions, i.e., not all of the diagonal elements of E[ee^T], the ε_i in Equation (26), should have the same value at all bitrates. Some transform coders, such as JPEG [1], have the capability to utilize different scalar quantizers for different transform coefficients, which automatically tends to generate such variation. However, even if the same scalar quantizer is used for all of the coefficients, variation results at typical bitrates simply due to differences in the coefficient standard deviations and the impact this has through the quantizer characteristics. As is commonly done in the literature [1], in this paper we will assume that the transform coefficients have a generalized Gaussian distribution with zero mean, variance σ_i², and shape parameter ν_i. In order to make Q concrete for these cases, assume that we are interested in uniform and deadzone scalar quantizers that are parameterized only with a stepsize Δ. In this paper a uniform quantizer is one whose codewords are at integer multiples of Δ. The closely related deadzone quantizer has a bin of width 2Δ at zero, but otherwise has uniformly spaced bins of size Δ.
Hence, if a canonical uniform/deadzone (u/d) quantizer with Δ = 1 results in mean squared quantization error D_{u/d}(σ, ν) for a generalized Gaussian random variable, it is clear that for an arbitrary Δ the mean squared quantization error ε becomes

ε = Δ² D_{u/d}(σ/Δ, ν). (27)

Fig. 2. Normalized canonical uniform and deadzone quantizer distortion for generalized Gaussian random variables as a function of σ and shape parameter ν (stepsize Δ = 1).

The forms of the functions D_{u/d}(σ, ν) are illustrated in Figure 2. Note that for all values of ν the functions show variation as a function of σ and, as expected, they tend toward the steady-state value of 1/12 ≈ 0.083 as σ gets large. Using these curves it is clear that even if all the coefficients are quantized with the same quantizer using the same stepsize, small transform coefficients tend to incur different quantization distortion when compared to larger coefficients. This will lead to uneven ε_i, a nondiagonal Q, and, as we will see, an interesting coordinate system for denoising quantization artifacts. Since this main behavior is sufficiently captured by the Gaussian (ν = 2) case, and since we did not observe significant differences in our simulations, we will make the convenience assumption of ν = 2. This allows Q to be a function of K, H, and the quantizer parameters. The only required statistics for the construction of Q are the variances of the transform coefficients, which can be determined using K and H, or by a direct estimation procedure if more convenient.

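For reference, a sketch of the canonical deadzone curve and of the stepsize scaling in Equation (27); the midpoint reconstruction below is our assumption about the canonical quantizer, not a detail specified here:

```python
import numpy as np

def gaussian_deadzone_distortion(sigma, grid=20001, span=10.0):
    # Canonical deadzone quantizer (stepsize 1): a zero bin of width 2,
    # otherwise unit bins; reconstruction at bin midpoints; Gaussian input.
    x = np.linspace(-span * sigma, span * sigma, grid)
    pdf = np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    mag = np.abs(x)
    rec = np.where(mag < 1.0, 0.0, np.sign(x) * (np.floor(mag) + 0.5))
    return np.trapz((x - rec) ** 2 * pdf, x)

# Equation (27): for an arbitrary stepsize, eps = delta**2 * D(sigma/delta, nu).
# As sigma grows, both canonical curves approach the steady state 1/12 = 0.083...
```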
IV. SIMULATION RESULTS

The simulation results of this paper are divided into two parts. The first set of simulations, in Section IV-A, is on toy examples where we can calculate the statistics directly (software for these simulations can be found in [32]). This allows us to measure the performance when there are no modeling errors, compare results to the optimal linear estimator (which knows the actual cross-covariance), and briefly examine the impact of the simple quantizer model. The results on toy examples will motivate us to base our post-processing algorithm for general images on A∘, which avoids the convex minimization required for A_w. The visualizations of subspaces of quantization artifacts provided in Section IV-B will show how to tile an image for adaptive processing so that one gets the most performance out of critically decimated estimators of limited spatial extent. The results on images and comparisons to earlier work are provided in Section IV-C.

A. Postprocessing Transform Coded Random Processes

In this section we show the performance of our estimators for the case where K is known, i.e., where the modeling of the original source data does not introduce a performance penalty. Without loss of generality, we use two sets of examples to demonstrate performance over smooth and piecewise smooth processes with sharp discontinuities.

Fig. 3. Improvements in SNR for a smooth Gaussian process transform coded with 8 × 1 block DCTs (top row) and 2-level, biorthogonal D7-9 wavelets (bottom row). In (a) and (c), all transform coefficients are scalar quantized with the same uniform quantizer of stepsize Δ. In (b) and (d), a deadzone scalar quantizer is used. The signal satisfies a first order covariance structure with K_{i,j} = ρ^{|i−j|}, i, j ∈ {1, ..., 16} and ρ = 0.9. For comparison, the performance of the optimal linear estimate of the original signal is also included. This estimator has perfect knowledge of K, Q, and C (actual, not worst-case), whereas the estimators A_w and A∘ only know K.

For each set we further subdivide by considering transform coding scenarios with two popular transforms, block DCTs and D7-9 biorthogonal wavelets. 3,000 Gaussian random vectors are used in each simulation. We first generated Gaussian random vectors having the first order Markov covariance structure

K_{i,j} = ρ^{|i−j|}, i, j ∈ {1, ..., N}, (28)

with ρ = 0.9 and N = 16. The random vectors were transform coded using the pipeline of Figure 1 (a), using 8 × 1 block DCTs, resulting in 2 blocks of coefficients. All transform coefficients were scalar quantized using the same uniform/deadzone quantizer of stepsize Δ, and the results were inverse transformed to yield the transform coded random vectors. The estimators A_w and A∘ were applied to these vectors using Equation (5) to obtain the post-processed results. Both estimators use the known K but obtain Q through the simple quantizer model of Section III, i.e., using Equation (27), with the variance of each coefficient determined from the coefficient covariance HKH^T, ν set to 2, the quantizer stepsize as given, and D_{u/d}(...) obtained from the quantizer curves in Figure 2. The estimators multiply the transform coded vectors directly. The estimation is single pass. There is no iterative processing, no application of a quantizer constraint set, no alternating projections, etc. (See [31] for the use of quantizer constraint sets in an alternating projections setting.) The worst-case estimator A_w is obtained using a simplex search algorithm that minimizes Equation (12). The resulting SNR as a function of the utilized stepsize Δ is illustrated in Figure 3 (a) and (b) for each type of quantizer. In the figure we also show the performance of the optimal linear estimator, which has perfect information of all statistics. Observe that for general sources, constructing the optimal linear estimate is very difficult since one does not know the required statistics. The optimal linear estimate results are included since it is useful to get an idea of the amount of performance lost due to the lack of knowledge of all statistics. It can be seen that both of our estimators improve SNR significantly, especially for the case of a uniform quantizer. Furthermore, the performance of A_w and A∘ is almost identical, which can also be observed in all of the remaining examples in this section. Since the estimators A_w and A∘ are obtained with perfect information of K, the performance difference with respect to the optimal linear estimator can only be due to errors in modeling Q and due to the utilized worst-case cross-covariance, rather than the actual C. Our simulations in which we allowed the design of A_w and A∘ using the experimentally observed Q made only marginal differences in SNR performance (these results are not included to preserve space and clarity). Hence the primary performance loss is due to switching to the worst-case cross-covariance. Of course, what is gained in return is the robustness that allows the algorithm to be applied to much more general types of sources in Section IV-C, where determining the actual C from compressed data is very difficult. In Figure 3 (c) and (d), the same results are shown for the case where the transform coding is done using 2-level biorthogonal D7-9 wavelets instead of DCTs. As advertised, the change of the transform used in transform coding does not affect our core methodology, which is transform agnostic, and we again observe performance improvements similar to the DCT case.

In order to show performance on a process with discontinuities, we generated a piecewise smooth process as follows. First, the three segments L_1 = {1, ..., 5}, L_2 = {6, ..., 11}, and L_3 = {12, ..., 16} were created. These segments were then used to generate random vectors that have correlated samples inside each segment while samples between segments have no correlation, i.e., Gaussian random vectors were obtained with the covariance

K_{i,j} = { ρ^{|i−j|}, i ∈ L_m and j ∈ L_m, m = 1, 2, 3,
          { 0, elsewhere, (29)

with ρ = 0.9. Such covariance models are used in the literature to analyze piecewise smooth processes with step discontinuities (see, e.g., []). Performance on DCT and wavelet coded data is illustrated in Figures 4 (a, b) and 4 (c, d), respectively. We again observe performance improvements due to post-processing with our estimators, even though there is a significant change in the source model. As our results indicate, as long as the utilized transform is not the KLT for the underlying source (Proposition 3.1), the post-processing of this paper will improve mean squared error regardless of the actual source or transform type.
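A compact sketch of this toy experiment, reusing the model_Q and sqa_estimator helpers sketched earlier; the sample count, seed, and Δ below are illustrative:

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import block_diag

N, B, rho, delta = 16, 8, 0.9, 1.0
idx = np.arange(N)
K = rho ** np.abs(idx[:, None] - idx[None, :])        # Equation (28)
# For the piecewise process of Equation (29), zero out K between segments.

M = dct(np.eye(B), axis=0, norm='ortho')              # 8x8 DCT matrix
H = block_diag(M, M)                                  # two 8x1 blocks

Q = model_Q(K, H, delta)                              # Section III recipe
A, _ = sqa_estimator(K, Q)                            # Definition 2

rng = np.random.default_rng(0)
L = np.linalg.cholesky(K + 1e-10 * np.eye(N))
x = L @ rng.standard_normal((N, 3000))                # Gaussian source vectors
x_hat = H.T @ (delta * np.round((H @ x) / delta))     # H orthonormal: H^-1 = H^T
y = A @ x_hat                                         # Equation (5)

snr = lambda err: 10.0 * np.log10((x ** 2).sum() / (err ** 2).sum())
print(snr(x - x_hat), snr(x - y))                     # decoded vs. post-processed
```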
B. Subspace Visualizations and Localization

Since A∘ obtains estimates by removing the components of the decoded data in the subspaces determined by the columns of S (see Definition 2), it is interesting to consider what these subspaces look like in signal domain. Figure 5 shows the calculated subspaces for the scenario of Figure 3 using a uniform quantizer and DCTs. For Δ = 1, a single subspace is found, and for Δ = 2 the subspace dimension increases to d = 3. Note the concentration around the block boundary in Figure 5 (a), between sample points 8 and 9.

Fig. 4. Improvements in SNR for a piecewise smooth Gaussian process containing discontinuities transform coded with 8 × 1 block DCTs (top row) and 2-level, biorthogonal D7-9 wavelets (bottom row). Same scenario as Figure 3 but using a piecewise first order Markov process (Equation (29)) instead of a signal-wide Markov process (Equation (28)).

Since the original process is first order Markov with a positive correlation coefficient, the algebraically obtained subspace is such that A∘ removes differences at the block boundary. This is due to the algebraic properties of K (nearby pixels are correlated) and Q (as determined by the block transform); it is not high-level knowledge that is built into the algorithm. Unlike transform specific methods, our approach automatically determines the relevant portions for estimation through our statistical methodology. In Figure 5 (b), we again observe that the subspace corresponding to the most negative eigenvalue is such that the resulting estimator reduces the difference at the block boundary. But this time quantization is coarser and the extent of the boundary is automatically deemed larger. While other post-processing techniques have a priori steps determined by human experts to build specific constraints for block continuity, boundary extent, etc., such constraints come out as a simple by-product of our optimization. Most importantly, as we will see in Section IV-C, our algorithm can also successfully operate over image regions where smoothness assumptions do not hold, and where enforcing block continuity would generate incorrect estimates. In our simulations we will successfully deblock textures with the same algorithm that deblocks smooth regions. Note also that the eigenvectors of the first order process defined via Equation (28)³ are substantially different from the calculated subspaces of Figure 5.

³For ρ close to 1, it is well known that the eigenvectors of K defined via Equation (28) are close to N × 1 DCTs [14].
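Plots in the style of Figure 5 can be generated directly from the same quantities (matplotlib usage is our choice), continuing from the 16-point scenario above:

```python
import numpy as np
import matplotlib.pyplot as plt

# Eigenvectors of Z = K - Q with negative eigenvalues are the
# signal-domain artifact subspaces.
evals, evecs = np.linalg.eigh(K - model_Q(K, H, delta))
for k in np.where(evals < 0)[0]:
    plt.plot(evecs[:, k], label=f'eigenvalue {evals[k]:.2f}')
plt.axvline(7.5, linestyle=':')   # boundary between sample points 8 and 9
plt.legend()
plt.show()
```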

Fig. 5. Calculated subspaces for a one dimensional (16 × 1) signal using 8 × 1 block DCTs and a uniform quantizer. (a): ρ = 0.9, Δ = 1, d = 1; (b): ρ = 0.9, Δ = 2, d = 3. Only the subspace with the largest absolute eigenvalue is shown in (b). The subspaces are automatically concentrated at the block boundaries.

Fig. 6. Calculated subspaces for a two dimensional (16 × 16) signal using 8 × 8 block DCTs and a uniform quantizer. ρ = 0.9, Δ = 1, d = 62. Only the first six subspaces with the largest absolute eigenvalues are shown. The subspaces are automatically concentrated at the block boundaries.

Considering the SNR values for Δ = 1 and Δ = 2 in the performance plots of Figure 3 (a), we can see that, even at relatively high SNR settings, our estimators are constructing estimates outside of the regimes indicated in Propositions 2.6 and 3.2. Since the results correspond to the case of a single scalar quantizer being used for all transform coefficients, it is clear that the disparity in coefficient standard deviations, as mapped through the quantizer curve D_u(σ, 2), is responsible for this behavior. This can be observed further in Figure 6, which shows the first six subspaces calculated for a two dimensional Gaussian random field, transform coded with 8 × 8 DCTs and a uniform quantizer (the covariance of two pixels at (i, j) and (k, l) is given by ρ^√((i−k)² + (j−l)²), with ρ = 0.9). Again the subspaces are concentrated around block boundaries, and in particular, the subspace with the most negative eigenvalue is where four DCT blocks meet. Unlike established work, which tries to formulate a small number of constraints to derive estimates, depending on the coarseness of quantization the techniques of this paper can generate many subspaces (d = 62 for this example). These can of course be used in an application specific way, for example by using only the first few subspaces that correspond to the most negative eigenvalues in order to save computation. In order to accommodate spatially varying image statistics and in order to curb processing complexity, our post-processing algorithm will estimate K and Q adaptively from localized image regions. With each such pair determining its own A∘, it becomes important to ensure that the calculated subspaces from different localized regions remain orthonormal with one another, so that an overall estimator can be constructed for the image in a straightforward way.
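The two dimensional computation behind Figure 6 follows the same pattern. A sketch, with block-ordered pixel indexing as an implementation convenience and the model_Q helper from Section III:

```python
import numpy as np
from scipy.fft import dct

n, B, rho, delta = 16, 8, 0.9, 1.0
M = dct(np.eye(B), axis=0, norm='ortho')

# Pixels ordered block by block (raster scan inside each 8x8 block).
pos = np.array([(by * B + r, bx * B + c)
                for by in range(n // B) for bx in range(n // B)
                for r in range(B) for c in range(B)], dtype=float)
dist = np.sqrt(((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1))
K2 = rho ** dist                                    # isotropic covariance

H2 = np.kron(np.eye((n // B) ** 2), np.kron(M, M))  # 2-D 8x8 DCT per block
Q2 = model_Q(K2, H2, delta)
evals = np.linalg.eigvalsh(K2 - Q2)
print((evals < 0).sum())                            # d, cf. d = 62 in Figure 6
```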

As can be seen from the subspace visualizations, how this localization is achieved with respect to the transform used in compression is important. For example, for post-processing DCT compressed images, the results of Figure 6 encourage the localized regions to tile the places where four DCT blocks meet in order to achieve the best performance for limited spatial extent. How this localization can be achieved for the example case of block DCTs is shown in Figure 7. In the figure, the dark gray area shows the desired localization, whose block translations can be used to tile the entire image. Since the tiling is obtained through nonoverlapping regions, one can process each region separately without subspace orthogonalization concerns. For each tile the needed K can be obtained using the covariance of pixels that are inside the given tile. For the needed Q, however, one must have information about all four blocks, since the tile overlaps four coded DCT blocks. Hence, conceptually, one needs to form a large Q for the entire region and then project this to the tile of interest to determine estimators. With our simple quantizer model, however, it suffices to have individual DCT coefficient variance estimates for blocks 1 through 4, which can then be used to obtain the needed Q.

[Figure 7 diagram: four coded DCT blocks (Blocks 1-4) with the constrained spatial extent tile (dark gray) straddling the point where they meet. Z = K − Q, where K only requires the covariance of pixels in the constrained spatial extent, and Q requires the covariance of pixels in each block (1, 2, 3, 4), but not the interblock covariances.]

Fig. 7. Location of constrained spatial extent subspaces for a two dimensional signal using block DCTs. With our simple quantizer model, K and Q can be determined in a localized fashion, without requiring large spatial extent covariances.

It is straightforward to arrive at overlapping tiles by windowing, where one obtains covariances K̄ and Q̄ describing a larger spatial extent than desired and uses a windowing matrix W to generate the covariances of interest via K = W K̄ W^T and Q = W Q̄ W^T. Again, for the example case of 8 × 8 block DCTs, any orthonormal filter bank that remains orthonormal to its horizontal/vertical block translations can be used to construct windows to obtain overlapping tiles over regions where four DCT blocks meet. In order to preserve space, we leave further issues about window selection and optimization aside and concentrate mainly on simple nonoverlapping tiles. (For comparison, Section IV-C will include several examples where this nonoverlapping scheme is extended to a translation invariant subspace estimator.) Using the scenario of Figure 6, Figure 8 shows all the subspaces for the constrained spatial extent, 8 × 8 tile shown in Figure 7.
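As a rough sketch of this localization (again our own illustration; the per-coefficient distortions ε are assumed given, e.g., from the quantizer model, and the 0.1 values below are placeholders), each block's pixel-domain error covariance is H^T diag(ε) H under the simple quantizer model, the four-block Q̄ is block diagonal because there are no interblock error correlations, and a rectangular windowing matrix W cuts the tile covariance out via Q = W Q̄ W^T (likewise K = W K̄ W^T for the signal):

```python
import numpy as np

# Sketch: assembling the tile's Q from per-block DCT coefficient error
# variances, then windowing the 16x16 four-block region down to the
# central 8x8 tile where the blocks meet (Figure 7).

def dct_matrix(n):
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    D[0, :] /= np.sqrt(2.0)
    return D

D8 = dct_matrix(8)
H2d = np.kron(D8, D8)            # separable 2D DCT on a vectorized block

def block_Q(eps):
    # Pixel-domain error covariance of one coded block; eps holds its 64
    # per-coefficient distortions (e.g., from the quantizer model).
    return H2d.T @ np.diag(eps) @ H2d

# Hypothetical distortions for blocks 1-4; no interblock error
# correlations, so the 256x256 Qbar of the region is block diagonal.
eps_blocks = [np.full(64, 0.1) for _ in range(4)]
idx16 = np.arange(256).reshape(16, 16)
corners = [(0, 0), (0, 8), (8, 0), (8, 8)]
Qbar = np.zeros((256, 256))
for (r, c), eps in zip(corners, eps_blocks):
    ids = idx16[r:r + 8, c:c + 8].ravel()
    Qbar[np.ix_(ids, ids)] = block_Q(eps)

# Rectangular window W selecting the central 8x8 tile (rows/cols 4..11).
tile = idx16[4:12, 4:12].ravel()
W = np.zeros((64, 256))
W[np.arange(64), tile] = 1.0
Q_tile = W @ Qbar @ W.T          # Q = W Qbar W^T; likewise for K
```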

[Figure 8: visualizations of the calculated constrained spatial extent subspaces, each labeled with its eigenvalue λ.]

Fig. 8. Calculated, constrained spatial extent subspaces for a two dimensional signal using 8 × 8 block DCTs and a uniform quantizer. Same signal covariance scenario as in Figure 6, ρ = 0.9, Δ = 1, d = 15. The spatial extent of the subspaces is limited to be 8 × 8, located over the 8 × 8 region where four blocks meet.

C. Postprocessing Transform Coded Images

This section is intended to show the results of directly applying the presented ideas to transform coded images. We first show results on 8 × 8 DCT compressed images, using the same uniform quantizer on all coefficients (Figures 9 (a) and 10 (a)), using the same deadzone quantizer on all coefficients (Figures 9 (b) and 10 (b)), using a set of JPEG quantizer tables (Table I), and finally using another set of JPEG quantizer tables (Table II). We then provide results on wavelet compressed images (5-level, D7-9 biorthogonal wavelets) using the same uniform quantizer on all coefficients (Figures 9 (c), 10 (c), and 11 (a)), using the same deadzone quantizer on all coefficients (Figures 9 (d), 10 (d), and 11 (b)), and finally using JPEG2000 compressed images (Table IV).⁴

⁴ As we were unable to obtain the JPEG2000 quantizer parameters, we report post-processing results assuming all wavelet coefficients were deadzone quantized using a single guessed quantizer stepsize. The value was guessed by matching the PSNR of the baseline reconstructions with the wavelet PSNR curves of the corresponding image in Figures 9 (d), 10 (d), or 11 (b). For each rate R, JPEG2000 results are generated using the software in [33] with the command line kdu_compress -i image.pgm -o image.jp2 -rate R -record image.rec.

Through experimentation we found that it is beneficial to not completely remove signal components in the subspaces defined in Definition 2, but to scale them down via the estimator

A_α = 1 − α SS^T,   (30)

where α is a scalar (α = 1 corresponds to A_α = A). The A_α estimator corresponds to the case where the subspaces are determined in a cross correlation independent way, but where the subsequent coordinatewise denoising allows for slightly more favorable conditions than those of the worst-case cross correlations (Figures 9 (a), (b) and 10 (a), (b) show how the two estimators compare). The value of α can be motivated as follows. Assume d > 0 in Definition 2. In the SQA coordinate system of Definition 2, the original value of each coordinate (say s_i) can be written in terms of the noisy version, ŝ_i, as s_i = ŝ_i − τ_i, where τ_i is the quantization error incurred by this coordinate. By the definition we have

σ²_{s_i} − σ²_{τ_i} = 2 E[s_i ŝ_i] − σ²_{ŝ_i} ≤ 0,   (31)

from which we have E[s_i ŝ_i]/σ²_{ŝ_i} ≤ 1/2. Hence, if we write the optimal linear estimate of s_i using ŝ_i, given by (E[s_i ŝ_i]/σ²_{ŝ_i}) ŝ_i, as (1 − α) ŝ_i, we have α ≥ 1/2. Setting α = 0.75 for the critically decimated results below tends to improve PSNR results by 0.1 to 0.2 dB compared to α = 1. All A_α results in this section use α = 0.75. For the noncritically decimated A (TI) results, α = 1, as we have not observed any significant benefits from setting α ≠ 1.
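A minimal sketch of applying the scaled estimator of Equation (30) to a decoded tile; S is assumed to hold the orthonormal subspace vectors obtained from the eigendecomposition of Z = K − Q:

```python
import numpy as np

def denoise_tile(x_hat, S, alpha=0.75):
    # A_alpha x_hat = x_hat - alpha * S S^T x_hat (Equation (30)):
    # subtract alpha times the projection onto the subspace; alpha = 1
    # removes the subspace component entirely.
    return x_hat - alpha * (S @ (S.T @ x_hat))
```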

Beyond the definition of A_α, we obtained two different estimators based on the covariance model K used in their calculation. For both cases, we carried out the naive subspace localization with the abrupt rectangular window shown in Figure 7, i.e., the calculated subspaces are restricted to be 8 × 8, constrained to regions where four DCT blocks meet. Estimation is single pass, with no quantizer constraints, thresholding, etc., carried out.

A (LUT): This estimator uses the circular covariance model, so that two pixels at (i, j) and (k, l) have the covariance σ_p² ρ^√((i−k)²+(j−l)²), where σ_p² is the variance of pixel values and ρ < 1. For this case we construct a lookup table (LUT) by sweeping the required parameters that will generate all possible A and A_α (up to discretization of the parameters) in a range of interest, as follows. A general image region can be fit into the covariance model by estimating ρ and σ_p from the region. Using Equation (27), the quantization distortion for the i-th transform coefficient becomes

ε_i = Δ_i² D_{u/d}(σ_p σ_i / Δ_i, ν = 2),   (32)

where σ_i is the coefficient standard deviation for the case of a unit variance process, and Δ_i is the stepsize. Since the calculated subspaces do not depend on an overall normalization, by dividing the signal and quantization error covariances by σ_p² we are reduced to the case of a unit variance process, where the i-th coefficient is quantized with the stepsize Δ_i/σ_p. In the case where all transform coefficients use the same stepsize Δ, it is clear that the canonical Z = K − Q becomes a function of ρ and Δ/σ_p. Sweeping these two parameters on a grid gives the desired set of estimators for the given quantizer. When a quantizer table is involved, one can write Δ_i = Δ δ_i, where the δ_i denote suitably normalized table stepsizes, and again sweep with ρ and Δ/σ_p. The post-processing in this case corresponds to estimating ρ and σ_p from the given tile (one representative tile is shown shaded in Figure 7), finding the closest point on the LUT grid, and applying the corresponding estimator from the LUT, as sketched below. This technique is very fast and can easily be done at video framerates and beyond on modern processors.

A (ADAPTIVE, ADP): This estimator is more adaptive but requires an eigendecomposition for each tile (only the eigenvectors corresponding to negative eigenvalues, if any, are required). It is based on estimating the variance of DCT coefficients, and it does so for the DCT coefficients in blocks 1 through 4 in Figure 7, in order to determine Q using quantizer parameters as outlined in Section IV-A. The variance estimation is done by evaluating spurious forward DCTs in the ±4 pixel vicinity of each block and averaging the square of each obtained coefficient to form an estimate of the variance of the corresponding coefficient. This version also estimates the DCT coefficient variances in the tile shown in Figure 7 and, by assuming that the DCT basis approximates the KLT for the 8 × 8 region in the tile, forms the K that governs the pixel covariance in the tile. This estimator allows for better processing in texture regions and, in its K estimation step, reuses the computations of the Q estimation step.

A (TRANSLATION INVARIANT, TI): This final estimator is similar to A (ADP) except that α = 1, and the estimation is done in a way that is invariant to the translations of the tilings, i.e., the estimation operation is repeated for all possible shifts of the 8 × 8 tilings and the results are averaged to obtain the final post-processed result. This estimator is included to demonstrate the limitations of the critically decimated estimators.
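The LUT construction referenced above can be sketched as follows (our own illustration; subspace_estimator stands for the normalized Z = K − Q computation shown earlier, and the grid ranges and spacings are placeholders, not the paper's values):

```python
import numpy as np

# Sketch: precompute estimators on a (rho, delta/sigma_p) grid, then
# post-process a tile by snapping its estimated parameters to the grid.
rhos = np.linspace(0.5, 0.99, 25)      # placeholder grid
d_over_s = np.linspace(0.1, 4.0, 40)   # normalized stepsizes delta/sigma_p

def build_lut(subspace_estimator):
    # subspace_estimator(rho, d) returns the estimator computed from the
    # unit-variance Z = K - Q for those normalized parameters.
    return {(r, d): subspace_estimator(r, d) for r in rhos for d in d_over_s}

def lut_estimator(lut, rho_hat, sigma_p_hat, delta):
    # rho_hat and sigma_p_hat are estimated from the tile elsewhere;
    # here we only snap them to the nearest grid point and look up.
    r = rhos[np.argmin(np.abs(rhos - rho_hat))]
    d = d_over_s[np.argmin(np.abs(d_over_s - delta / sigma_p_hat))]
    return lut[(r, d)]
```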
DCT-JPEG Results: As can be seen in Figures 9 (a) and (b), both A estimators perform similarly on the standard image Lena, and they are followed closely by A (LUT). On the standard image Barbara (Figures 10 (a) and (b)) the situation changes, and the ADP estimator gains an edge over the LUT estimators. This is expected, since the high frequency content of this image does not fit the smooth covariance model. In Tables I and II, we compare our

results to early work using JPEG quantizer tables. Observe that even the A (LUT) estimator provides results which are competitive with or better than some of the best results in the literature, which require significantly more computational power in order to establish alternating projections, overcomplete transforms, singularity processing, etc. The visual quality that can be expected from this version of our post-processing can be seen in Figures 12, 13, and 14. Since localization is done via rectangular windowing, some residual blockiness remains, especially since the pixels at the window boundaries are effectively estimated in a single-sided fashion. This is reduced when one switches to larger, overlapping windows (larger windows also help improve the performance beyond tabulated values in images like Peppers, where correlations are over larger spatial extents) or, as shown in the figures, to translation invariant estimators. Observe in Figure 13 that A (ADP) is effective in deblocking textures and regions over high frequency structures. Better modeling and windowing are expected to improve visual quality and PSNR.

[Figure 9 panels: (a) results on Lena compressed with 8 × 8 DCTs and a uniform quantizer; (b) results on Lena compressed with 8 × 8 DCTs and a deadzone quantizer; (c) results on Lena compressed with 5-level D7-9 wavelets and a uniform quantizer; (d) results on Lena compressed with 5-level D7-9 wavelets and a deadzone quantizer. Each panel plots PSNR (dB); the DCT panels compare the A (LUT), A_α (LUT), and A (ADAPTIVE) estimators, and the wavelet panels compare A (ADAPTIVE) and A (OVC).]

Fig. 9. Improvements in PSNR for the image Lena (luminance, 512 × 512) transform coded with 8 × 8 DCTs (top row) and 5-level D7-9 wavelets (bottom row). In (a) and (c), all transform coefficients are scalar quantized with the same uniform quantizer of stepsize Δ. In (b) and (d), a deadzone scalar quantizer is used.

Wavelet-JPEG2000 Results: We report results for wavelet transforms using the A (ADP) and A (TI) estimators. Both estimators remain as designed for the DCT case, except for the change in Q due to the wavelet transform. As illustrated in Figures 9 (c, d), 10 (c, d), and 11 (a, b), post-processing gains are more modest, especially for the deadzone quantizer. The deblocking gains offered over smooth portions of images are now absorbed by the 5-level wavelet transform, which matches the KLT much better than block DCTs in such regions.
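The translation invariant estimator used in these comparisons can be sketched by cycling the tiling over all shifts and averaging (our illustration; denoise_tiles stands for one critically decimated pass with α = 1):

```python
import numpy as np

def denoise_ti(img, denoise_tiles, B=8):
    # Average the critically decimated estimator over all BxB tiling
    # shifts; denoise_tiles(img) is one tile-by-tile pass (alpha = 1).
    acc = np.zeros_like(img, dtype=float)
    for dr in range(B):
        for dc in range(B):
            shifted = np.roll(np.roll(img, -dr, axis=0), -dc, axis=1)
            out = denoise_tiles(shifted)
            acc += np.roll(np.roll(out, dr, axis=0), dc, axis=1)
    return acc / (B * B)
```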

25 Fig.. PSNR (db) PSNR (db) Results on Barbara compressed with 8 8 DCTs and a uniform quantizer A (LUT) A (LUT) A (ADAPTIVE) (a) Results on Barbara compressed with level D79 wavelets and a uniform quantizer 4 39 A (ADAPTIVE) A (OVC) (c) PSNR (db) Results on Barbara compressed with 8 8 DCTs and a deadzone quantizer A (LUT) A (LUT) A (ADAPTIVE) (b) Results on Barbara compressed with level D79 wavelets and a deadzone quantizer A (ADAPTIVE) 36 3 A (OVC) PSNR (db) Improvements in PSNR for the image Barbara (luminance, 12 12) transform coded with 8 8 DCTs (top row) and level D7 9 wavelets (bottom row). In (a) and (c), all transform coefficients are scalar quantized with the same uniform quantizer of stepsize. In (b) and (d), a deadzone scalar quantizer is used. (d) Figures 1 and 16, post-processing must be able to extract any improvements around edges and texture regions where wavelets too deviate significantly from the KLT. Better covariance modeling is required for further improved results. Since JPEG2 adaptively terminates bistreams in coded tiles, the stepsize of the effective quantizer seen by each coefficient is typically different. Unfortunately, we were not able to obtain these quantizer parameters. We thus provide a rudimentary set of results in Table IV. (The post-processing results in Figures 9 (c, d), (c, d), and 11 (a, b) should be considered more representative.) V. CONCLUSIONS AND FUTURE WORK We have presented an algebraic method for reducing quantization noise in transform coded images by parameterizing transform compression with two canonical matrices, K and Q. The linear, worst-case estimators we have constructed based on this parameterization allow for a robust approach to denoising quantization noise. The presented approach is general and can easily be extended to other compression scenarios and to signals beyond images and video. As we have seen, the interaction of K and Q leads to artifacts along block boundaries for block transforms encoding smooth processes, but in general one expects different types of artifacts depending on the source statistics and utilized transforms. Our formulation automatically discovers and zooms into these problem areas through source and transform agnostic means. 22

[Table I: baseline PSNR (dB) for JPEG compressed Lena and the improvements obtained by A (LUT), A (ADP), A (TI), and the cited methods.]

TABLE I
RESULTS ON JPEG COMPRESSED LENA USING THE QUANTIZER TABLES IN [31]. THE UTILIZED VERSION OF THE LENA IMAGE IS THE SAME AS THE ONE IN [3]. THE CITED RESULTS ARE OBTAINED FROM [3].

[Figure 11 panels: (a) results on Boat compressed with 5-level D7-9 wavelets and a uniform quantizer; (b) results on Boat compressed with 5-level D7-9 wavelets and a deadzone quantizer. Each panel plots PSNR (dB), comparing A (ADAPTIVE) and A (OVC).]

Fig. 11. Improvements in PSNR for the image Boat (luminance, 512 × 512) transform coded with 5-level D7-9 wavelets. In (a) all transform coefficients are scalar quantized with the same uniform quantizer of stepsize Δ. In (b) a deadzone scalar quantizer is used.

Pointing out the differences of the quantization noise denoising problem from white noise denoising, we designed subspaces that can be used in coordinate-wise denoising. Our results, even using a very simple implementation of the proposed ideas, are competitive with some of the best results in the literature, which require significant computational complexity and a priori constraints that must be built in by human experts. These simple results can be improved by allowing for better source covariance modeling, larger spatial sizes, overlapping windows, and translation invariant subspace formulations. In a similar vein, well-known filter design techniques can be applied to obtain separable subspaces, per-pixel adaptive denoising filters, etc.

Unlike traditional denoising, where a single denoising coordinate system can be used regardless of the amount of noise (the optimal coordinate system depends on K alone), in denoising quantization noise the optimal coordinate systems must vary with both K and Q. Hence, the transform used in compression plays a vital role in coordinate design through its impact on Q. While visualizations of subspaces show similarities as compression gets coarser, the associated sequence of optimal coordinate systems is difficult to approximate using a single basis. More sophisticated basis design and approximation methods are required to convert the quantization noise problem into a scenario that can be handled through a possibly overcomplete set of bases, thresholding, and nonlinear approximation

ideas similar to [7], [6].

[Table II: baseline PSNR (dB) for JPEG compressed Lena, Peppers, and Baboon and the improvements obtained by A (LUT), A (ADP), A (TI), and the methods cited in [28], [12], [2], [23], [3], and [19].]

TABLE II
RESULTS ON JPEG COMPRESSED IMAGES USING THE QUANTIZER TABLES IN [19]. THE CITED RESULTS ARE OBTAINED FROM [19]. (RESULTS ON BARBARA ARE NOT REPORTED SINCE WE COULDN'T OBTAIN THE NON-STANDARD VERSION OF BARBARA USED IN [19].)

APPENDIX I
PROOF OF PROPOSITION 2.2

In order to show Equation (12), we find the worst-case cross-covariance. For convenience, let u = Aq and v = (1 − A)x. Without loss of generality, assume that the rank of u's covariance matrix is greater than or equal to the rank of v's covariance matrix, i.e., rank(E[uu^T]) ≥ rank(E[vv^T]). The max portion of Equation (8) reduces to

max_C { E[u^T u] + 2 E[u^T v] + E[v^T v] } = E[u^T u] + max_C { 2 E[u^T v] } + E[v^T v],   (33)

since the cross-covariance only affects the middle term. Using minimum mean squared linear estimation theory [26], we know that v can be written in terms of u as

v = Pu + δ,   (34)

where P is the minimum mean squared linear estimator and δ is the estimation error that is statistically orthogonal to u, i.e., E[uδ^T] = 0. As such, note that E[v^T u] = E[u^T Pu] = Tr[P E[uu^T]]. Since P E[uu^T] P^T and E[uu^T] are both real, symmetric, positive semidefinite matrices, they can each be factorized as

P E[uu^T] P^T = E[vv^T] − E[δδ^T] = F F^T,   E[uu^T] = G G^T,   (35)

and PG can be related to F via an orthonormal matrix H [11], i.e.,

PG = FH.   (36)

The maximization in Equation (33) becomes

max_C { 2 E[u^T v] } = 2 max_C { Tr[F H G^T] } = 2 max_C { Tr[G^T F H] } = 2 max_C { Tr[B^T H] },   (37)

where B^T = G^T F. It is well known that the orthonormal H that maximizes Tr[B^T H] results in this trace being the sum of the singular values of B [11]. We provide a short proof to preserve continuity. Let us refer to the columns of B and H by b_i and h_i respectively, with i = 1, ..., N. The orthonormal H that maximizes Tr[B^T H] = Σ_{i=1}^N b_i^T h_i can be found using maximization with Lagrange multipliers (that enforce orthonormality) by writing

Σ_{i=1}^N b_i^T h_i − Σ_{i=1}^N Σ_{j=i}^N λ_{i,j} h_i^T h_j.   (38)

Taking derivatives with respect to the h_i, we obtain b_i = Σ_{j=1}^N γ_{i,j} h_j with γ_{i,j} = γ_{j,i}, or

B^T = Γ H*^T,   (39)

where Γ = Γ^T and H* is the maximizing H. It follows then that Tr[B^T H*] = Tr[Γ].

[Table III: baseline PSNR (dB) and the A (LUT), A (ADP), and A (TI) improvements for JPEG compressed Boat under the quantizer tables in [31] and in [19].]

TABLE III
RESULTS ON JPEG COMPRESSED BOAT USING THE QUANTIZER TABLES IN [31] AND IN [19].

Fig. 12. Decoded (baseline dB), A (LUT) post-processed (27.43 dB), and A (TI) post-processed (27.4 dB) image Lena (from the first column of results in Table I).
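The trace fact proved in (37)-(39), that over orthonormal H the trace Tr[B^T H] is maximized at the sum of the singular values of B, is attained at H = U V^T for the SVD B = U Σ V^T and is easy to check numerically (our own sketch, assuming numpy):

```python
import numpy as np

# Numerical check: for any orthonormal H, Tr[B^T H] <= sum of singular
# values of B, with equality at H* = U V^T where B = U diag(s) V^T.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
U, s, Vt = np.linalg.svd(B)
H_star = U @ Vt
assert np.isclose(np.trace(B.T @ H_star), s.sum())

Hr, _ = np.linalg.qr(rng.standard_normal((6, 6)))  # a random orthonormal H
assert np.trace(B.T @ Hr) <= s.sum() + 1e-9
```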

Fig. 13. Decoded (baseline dB), A (ADP) post-processed (28.3 dB), and A (TI) post-processed (28.84 dB) image Barbara (deadzone quantized with 8 × 8 DCTs and stepsize Δ).

[Table IV: baseline PSNR (dB) and A (TI) improvements for JPEG2000 compressed Lena, Barbara, and Boat.]

TABLE IV
RESULTS ON JPEG2000 COMPRESSED IMAGES COMPRESSED AT RATES ., .2, AND . BITS PER PIXEL. (AS WE WERE UNABLE TO OBTAIN THE QUANTIZER PARAMETERS, WE REPORT POST-PROCESSING RESULTS ASSUMING ALL WAVELET COEFFICIENTS WERE DEADZONE QUANTIZED USING A SINGLE GUESSED QUANTIZER STEPSIZE.)

Observe that we can write the symmetric positive semidefinite matrix B^T B = Γ² as the product of a symmetric positive semidefinite matrix R = (B^T B)^{1/2} with itself. R can be related to Γ by an orthonormal transform, and in particular the eigenvalues of R are the absolute values of the eigenvalues of Γ. Hence, all eigenvalues of Γ must also be nonnegative, since H* maximizes the trace, which is the sum of Γ's eigenvalues. We thus arrive at

Tr[B^T H] ≤ Tr[B^T H*]   (40)
= Tr[Γ] = Tr[(Γ²)^{1/2}] = Tr[(B^T B)^{1/2}] = Tr[(G^T F F^T G)^{1/2}]
= Tr[(G^T P E[uu^T] P^T G)^{1/2}] = Tr[(G^T (E[vv^T] − E[δδ^T]) G)^{1/2}]

≤ Tr[(G^T E[vv^T] G)^{1/2}],   (41)

where inequality (40) follows since H* is maximizing, and (41) follows since E[δδ^T] is positive semidefinite. Note that the final expression is independent of C, and it is the maximum we are looking for provided that we can meet the inequalities with equality. It is clear that we can meet the first inequality by choosing H = H*, which forces Equation (36) into

PG = FH*.   (42)

In order to meet inequality (41) we must have E[δδ^T] = 0, which requires v = Pu and

E[vv^T] = P E[uu^T] P^T.   (43)

Fig. 14. Top row: original and decoded (baseline dB). Bottom row: A (LUT) post-processed (28.83 dB), A (ADP) post-processed (28.92 dB), and A (TI) post-processed (29. dB) image Boat (from the second column of results in Table III).

Fig. 15. Original, decoded (baseline dB), and A (TI) post-processed (29.46 dB) image Boat (deadzone quantized with 5-level D7-9 wavelets and Δ = 4). PSNR inside the shown region is improved from 28.42 dB to 28.76 dB.

Fig. 16. Original, decoded (baseline dB), and A (TI) post-processed (28.97 dB) image Barbara (deadzone quantized with 5-level D7-9 wavelets and Δ = 4). PSNR inside the shown region is improved from 27.22 dB to 27.8 dB.

Since rank(E[uu^T]) ≥ rank(E[vv^T]), Equation (43) can be satisfied by a P that simultaneously satisfies Equation (42). The resulting random process [u^T v^T]^T can be realized by randomly generating u with covariance E[uu^T] and obtaining v = Pu. Hence there exists a C that enables the upper bound of (41). Letting G = A Q^{1/2}, Equation (33) thus becomes

max_C { E[u^T u] + 2 E[u^T v] + E[v^T v] }
= Tr[E[uu^T]] + Tr[E[vv^T]] + 2 Tr[(G^T E[vv^T] G)^{1/2}]
= Tr[A Q A^T] + Tr[(1 − A) K (1 − A)^T] + 2 Tr[(Q^{1/2} A^T (1 − A) K (1 − A)^T A Q^{1/2})^{1/2}],   (44)

which is the expression we are looking for.

APPENDIX II
PROOF OF PROPOSITION 2.3

In order to show the proposition, we consider the mean squared error for using the worst-case estimation matrix A_w, i.e., MSE(A_w, C), and compare it to that of using the identity matrix as the estimation matrix, i.e., MSE(1, C). Observe that the latter term is independent of C, i.e., MSE(1, C) = MSE(1, 0) = E[q^T q] = Tr[E[qq^T]] = Tr[Q]. We have

max_C { MSE(A_w, C) − MSE(1, C) } = max_C { E[ ||(1 − A_w)x + A_w q||² ] − Tr[Q] },   (45)

where we have used the max portion of Equation (8). Expanding Equation (45) in terms of the defined matrices, we get

max_C { Tr[(1 − A_w) K (1 − A_w)^T] + 2 Tr[(1 − A_w) C A_w^T] + Tr[A_w Q A_w^T] − Tr[Q] }
= max_C { Tr[(1 − A_w)(K − Q)(1 − A_w)^T] + 2 Tr[(1 − A_w)(C − Q) A_w^T] },   (46)

where in Equation (46) we have added and subtracted Tr[(1 − A_w) Q (1 − A_w)^T] = Tr[Q − 2 Q A_w^T + A_w Q A_w^T] to the previous equation.
