A Dual Formulation of the TV-Stokes Algorithm for Image Denoising

Christoffer A. Elo, Alexander Malyshev, and Talal Rahman

Department of Mathematics, University of Bergen, Johannes Bruns gate 12, 5007 Bergen, Norway
christoffer.elo@gmail.com, alexander.malyshev@math.uib.no, talal.rahman@math.uib.no

Abstract. We propose a fast algorithm for image denoising, based on a dual formulation of a recent denoising model involving the total variation minimization of the tangential vector field under the incompressibility condition stating that the tangential vector field should be divergence free. The model turns noisy images into smooth and visually pleasant ones and preserves the edges quite well. While the original TV-Stokes algorithm, based on the primal formulation, is extremely slow, our new dual algorithm drastically improves the computational speed and possesses the same denoising quality. Numerical experiments are provided to demonstrate the practical efficiency of our algorithm.

1 Introduction

We suppose that the observed image d_0(x,y), (x,y) \in \Omega \subset R^2, is an original image d(x,y) perturbed by an additive noise \eta,

    d_0 = d + \eta.    (1)

The problem of recovering the image d from the noisy image d_0 is an inverse problem that is often solved by variational methods using total variation (TV) minimization. The corresponding Euler equation, a set of nonlinear partial differential equations, is typically solved by applying a gradient-descent method to a finite difference approximation of these equations.

A classical total variation denoising model is the primal formulation due to Rudin, Osher and Fatemi [1] (the ROF model):

    \min_d \|\nabla d\|_{L^1} + \frac{\lambda}{2} \|d - d_0\|_{L^2}^2.    (2)

The parameter \lambda > 0 can be chosen, e.g., to approximately fulfill the condition \|d - d_0\|_{L^2} \approx \sigma, where \sigma is an estimate of \|\eta\|_{L^2}. The Euler equation
    -\mathrm{div}\left( \nabla d / |\nabla d| \right) + \lambda (d - d_0) = 0

is usually replaced by a regularized one,

    -\mathrm{div}\left( \frac{\nabla d}{|\nabla d|_\beta} \right) + \lambda (d - d_0) = 0,    (3)

where |\nabla d|_\beta = \sqrt{|\nabla d|^2 + \beta^2} is a necessary regularization, since images contain flat areas where |\nabla d| = \sqrt{d_x^2 + d_y^2} \approx 0. When solving (3) numerically, an explicit time-marching scheme with an artificial time variable, t, is typically used. However, such an algorithm is rather slow because convergence requires severely restricted, small time steps.

It is well known that the ROF model suffers from the so-called staircase effect, which is a disadvantage when denoising images with affine regions. To overcome this defect, we advocate a two-step approach, in which the fourth-order model studied in [2-4] is decoupled into two second-order problems. Such methods are known to overcome the staircase effect, but tend to have computational difficulties due to very large condition numbers.

The authors of [5, 6] used the same two-step approach as in [7], but, adopting ideas from [8, 9], they proposed to preserve the divergence-free condition on the tangential vector field. Recall that the tangential vector field \tau is orthogonal to the normal (gradient) vector field n of the image d:

    n = \nabla d = (d_x, d_y)^T,    \tau = \nabla^\perp d = (-d_y, d_x)^T.    (4)

Hence \mathrm{div}\,\tau = 0. The first step of the TV-Stokes algorithm smoothes the tangential vector field \tau_0 = \nabla^\perp d_0 of a given noisy image d_0 by solving the minimization problem

    \min_\tau \left\{ \|\nabla \tau\|_{L^1} + \frac{1}{2\delta} \|\tau - \tau_0\|_{L^2}^2 \right\}  subject to  \mathrm{div}\,\tau = 0,    (5)

where \delta > 0 is some carefully chosen parameter. Once a smoothed tangential vector field \tau is obtained, the second step reconstructs the image d by fitting it to the normal vector field, solving the minimization problem

    \min_d \left\{ \|\nabla d\|_{L^1} - \left( \nabla d, \frac{n}{|n|} \right)_{L^2} \right\}  subject to  \|d - d_0\|_{L^2} = \sigma,    (6)

where \sigma is an estimate of \|\eta\|_{L^2}. In [5] the minimization problems (5) and (6) are solved numerically by means of an explicit time-marching scheme, while existence and uniqueness are proven for the modified TV-Stokes model in [6].
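To make the cost of the primal approach concrete, here is a minimal NumPy sketch of the explicit time-marching scheme for the regularized equation (3). The original experiments use MATLAB; the function name, the unit grid spacing, the mirrored boundaries and the parameter values are our illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rof_explicit(d0, lam=0.1, beta=0.1, dt=0.01, iters=200):
    """Explicit time marching for the regularized ROF equation (3):
    d_t = div(grad(d)/|grad(d)|_beta) - lam*(d - d0), mirrored boundaries."""
    d = d0.astype(float).copy()
    for _ in range(iters):
        # forward differences (zero across the mirrored far boundary)
        dx = np.diff(d, axis=1, append=d[:, -1:])
        dy = np.diff(d, axis=0, append=d[-1:, :])
        mag = np.sqrt(dx**2 + dy**2 + beta**2)        # |grad d|_beta
        px, py = dx / mag, dy / mag
        # backward differences give the (negative adjoint) divergence
        zx = np.zeros_like(px[:, :1])
        zy = np.zeros_like(py[:1, :])
        divp = (np.diff(px, axis=1, prepend=zx)
                + np.diff(py, axis=0, prepend=zy))
        d += dt * (divp - lam * (d - d0))
    return d
```

The stability restriction couples dt to beta (roughly dt of order beta/4 on a unit grid), which is precisely why a small regularization beta forces very small time steps and slow convergence.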
The TV-Stokes approach results in an algorithm which does not suffer from the staircase effect, preserves the edges, and produces denoised images that look visually pleasant. However, the TV-Stokes algorithm from [5] converges extremely slowly and is therefore practically unusable, as demonstrated in the last section of the present paper.
We adopt the TV-Stokes denoising model but reduce the primal formulation presented above to the so-called dual formulation, which is then solved numerically by a variant of Chambolle's fast iteration [10]. The reduction exploits the orthogonal projector \Pi_K onto the subspace K = \{\tau : \mathrm{div}\,\tau = 0\} for elimination of the divergence-free constraint.

2 The TV-Stokes denoising algorithm in dual formulation

To overcome the difficulties with non-differentiability in the primal formulation, Carter [11], Chambolle [10] and Chan, Golub and Mulet [12] have proposed dual formulations of the ROF model, where a dual variable p = (p_1(x,y), p_2(x,y)) is used to express the total variation:

    \|\nabla d\|_{L^1} = \max_p \left\{ (d, \mathrm{div}\,p)_{L^2} : |p_j(x,y)| \le 1 \ \forall (x,y) \in \Omega,\ j = 1,2 \right\}.    (7)

For instance, a variant of the dual formulation from [10] consists in minimization of the distance \|\mathrm{div}\,p - \lambda d_0\|_{L^2}. In [10] Chambolle also proposed a fast iteration for solving this minimization problem, which produces a denoised image after only a few steps. Below we show how to reduce the TV-Stokes model to a dual formulation.

2.1 Step 1

To derive a dual formulation of the first step we take advantage of the following analog of (7) for the total variation of the tangential vector field \tau = (\tau_1, \tau_2)^T:

    \|\nabla \tau\|_{L^1} = \max_p \left\{ (\tau, \mathrm{div}\,p)_{L^2} : |p_i(x,y)| \le 1 \ \forall (x,y) \in \Omega,\ i = 1,2 \right\},    (8)

where the dual variable p is a pair of two rows, p_1 = (p_{11}, p_{12}) and p_2 = (p_{21}, p_{22}). The divergence is defined as follows:

    \mathrm{div}\,p = (\mathrm{div}\,p_1, \mathrm{div}\,p_2)^T,  where  \mathrm{div}\,p_i = \frac{\partial p_{i1}}{\partial x} + \frac{\partial p_{i2}}{\partial y},  i = 1,2.    (9)

This definition is similar to the vectorial dual norm from [13] for vectorial images, e.g., color images. Plugging (8) into (5) yields

    \min_{\mathrm{div}\,\tau = 0} \max_{|p_i| \le 1} \left\{ (\tau, \mathrm{div}\,p)_{L^2} + \frac{1}{2\delta} (\tau - \tau_0, \tau - \tau_0)_{L^2} \right\}.    (10)

Results from convex analysis, see for instance Theorem 9.3-1 in [14], allow us to exchange the order of max and min in (10) and obtain an equivalent optimization problem

    \max_{|p_i| \le 1} \min_{\mathrm{div}\,\tau = 0} \left\{ (\tau, \mathrm{div}\,p)_{L^2} + \frac{1}{2\delta} (\tau - \tau_0, \tau - \tau_0)_{L^2} \right\}.    (11)
Now comes a trick. Let us introduce the orthogonal projection \Pi_K onto the constraint subspace K = \{\tau : \mathrm{div}\,\tau = 0\}. Note that \tau_0 \in K. By means of the pseudoinverse (\Delta)^+ we may write

    \Pi_K \begin{pmatrix} \tau_1 \\ \tau_2 \end{pmatrix} = \begin{pmatrix} \tau_1 \\ \tau_2 \end{pmatrix} - \nabla (\Delta)^+ \mathrm{div} \begin{pmatrix} \tau_1 \\ \tau_2 \end{pmatrix}.    (12)

The constraint \mathrm{div}\,\tau = 0 means that \Pi_K \tau = \tau, and the latter implies the equalities (\tau, \mathrm{div}\,p) = (\Pi_K \tau, \mathrm{div}\,p) = (\tau, \Pi_K \mathrm{div}\,p). Hence (11) is equivalent to

    \max_{|p_i| \le 1} \min_{\mathrm{div}\,\tau = 0} \left\{ (\tau, \Pi_K \mathrm{div}\,p)_{L^2} + \frac{1}{2\delta} (\tau - \tau_0, \tau - \tau_0)_{L^2} \right\}.    (13)

The solution to the minimization problem (without the constraint \mathrm{div}\,\tau = 0!)

    \min_\tau \left\{ (\tau, \Pi_K \mathrm{div}\,p)_{L^2} + \frac{1}{2\delta} (\tau - \tau_0, \tau - \tau_0)_{L^2} \right\}

is

    \tau = \tau_0 - \delta \Pi_K \mathrm{div}\,p    (14)

and satisfies the constraint \mathrm{div}\,\tau = 0. Owing to (14) we have the equality

    (\tau, \Pi_K \mathrm{div}\,p) + \frac{1}{2\delta}(\tau - \tau_0, \tau - \tau_0) = \frac{1}{2\delta}\left[ (\tau_0, \tau_0) - (\delta \Pi_K \mathrm{div}\,p - \tau_0,\ \delta \Pi_K \mathrm{div}\,p - \tau_0) \right],

which together with (13) gives our dual formulation:

    \min_p \left\{ \|\Pi_K \mathrm{div}\,p - \delta^{-1} \tau_0\|_{L^2} : |p_i| \le 1,\ i = 1,2 \right\}.    (15)

The numerical solution of (15) is computed by Chambolle's iteration from [10]:

    p^0 = 0,    p^{n+1} = \frac{p^n + t \nabla\left[ \Pi_K \mathrm{div}\,p^n - \delta^{-1} \tau_0 \right]}{1 + t \left| \nabla\left( \Pi_K \mathrm{div}\,p^n - \delta^{-1} \tau_0 \right) \right|}.    (16)

The iteration converges rapidly when t \le 1/4. The smoothed tangential field after n iterations is given by \tau^n = \tau_0 - \delta \Pi_K \mathrm{div}\,p^n.

2.2 Step 2

The image d is reconstructed in the second step by fitting it to the normal vector field built from the tangential vector field computed in step 1, (n_1, n_2) = (\tau_2, -\tau_1). Again we introduce a dual variable r = (r_1(x,y), r_2(x,y)) and use the formula \|\nabla d\|_{L^1} = \max_{|r| \le 1} (d, \mathrm{div}\,r)_{L^2}. Then the minimization problem (6) is equivalent to the problem

    \min_d \max_{|r| \le 1} \left\{ \left( d, \mathrm{div}\left( r + \frac{n}{|n|} \right) \right)_{L^2} + \frac{1}{2\mu} \|d - d_0\|_{L^2}^2 \right\},    (17)

where \mu > 0 is a Lagrange multiplier. After interchanging min and max in (17) we find the condition for attaining the minimum:

    d = d_0 - \mu\, \mathrm{div}\left( r + \frac{n}{|n|} \right).    (18)
By analogy with (15) we can derive the dual formulation for step 2:

    \min_r \left\{ \left\| \mathrm{div}\left( r + \frac{n}{|n|} \right) - \frac{d_0}{\mu} \right\|_{L^2} : |r| \le 1 \right\}.    (19)

Chambolle's iteration for (19) is as follows:

    r^{n+1} = \frac{r^n + t \nabla\left[ \mathrm{div}\left( r^n + \frac{n}{|n|} \right) - \mu^{-1} d_0 \right]}{1 + t \left| \nabla\left( \mathrm{div}\left( r^n + \frac{n}{|n|} \right) - \mu^{-1} d_0 \right) \right|}.    (20)

2.3 The discrete algorithm

A staggered grid is used for the discretization, as in [5]. For convenience we introduce the differentiation matrix

    B = \frac{1}{h} \begin{pmatrix} -1 & 1 & & \\ & -1 & 1 & \\ & & \ddots & \ddots \\ & & & -1 \quad 1 \end{pmatrix},    (21)

where B is the forward difference operator and its transpose B^T acts as the backward difference operator. The discrete gradient operator applied to a matrix d is then defined as

    \nabla_h d = (d B_x^T, B_y d),    (22)

where B_x (B_y) stands for differentiation in the x (resp. y) direction. The discrete divergence operator is given by

    \mathrm{div}_h(p_1, p_2) = -p_1 B_x - B_y^T p_2.    (23)

The discrete analog of the projection operator \Pi_K has the form

    \Pi_K^h = I - \nabla_h (\Delta_h)^+ \mathrm{div}_h,    (24)

where the gradient and divergence are applied in a slightly different manner:

    \mathrm{div}_h \begin{pmatrix} \tau_1 \\ \tau_2 \end{pmatrix} = -\tau_1 B_x - B_y^T \tau_2,    \nabla_h d = \begin{pmatrix} d B_x^T \\ B_y d \end{pmatrix}.    (25)

To complete the definition (24) we need a description of the pseudoinverse operator (\Delta_h)^+ for the discrete Laplacian

    \Delta_h d = -d B_x^T B_x - B_y^T B_y d.    (26)

Let us introduce the orthogonal N \times N matrix of the Discrete Cosine Transform, C, which is defined by dct(eye(N)) in MATLAB. The symmetric matrix
of the Discrete Sine Transform, \tilde S, defined in MATLAB by dst(eye(N-1)), satisfies the equation \tilde S^T \tilde S = (N/2) I, where I is the identity matrix. We prefer to use the orthogonal symmetric matrix S = \tilde S / \sqrt{N/2} of order N-1. The singular value decomposition of B has the form

    B = S [0, \Sigma] C,    \Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_{N-1}),    (27)

where the diagonal matrix \Sigma has the diagonal entries

    \sigma_k = \frac{2}{h} \sin\frac{\pi k}{2N},    k = 1, 2, \ldots, N-1.    (28)

By the aid of (27), equation (26) can be rewritten as

    f = \Delta_h d = -d\, C^T \begin{pmatrix} 0 & 0 \\ 0 & \Sigma_x^2 \end{pmatrix} C - C^T \begin{pmatrix} 0 & 0 \\ 0 & \Sigma_y^2 \end{pmatrix} C\, d.    (29)

Denoting \hat f = C f C^T and \hat d = C d C^T we arrive at the equation

    \hat f = -\hat d \begin{pmatrix} 0 & 0 \\ 0 & \Sigma_x^2 \end{pmatrix} - \begin{pmatrix} 0 & 0 \\ 0 & \Sigma_y^2 \end{pmatrix} \hat d.    (30)

This equation is easily solved with respect to \hat d. Suppose that the matrices \hat f and \hat d have the entries \hat f_{ij} and \hat d_{ij} for i,j = 0,1,\ldots. Note that in our case \hat f_{00} = 0. Then the solution \hat d = G(\hat f) is as follows:

    \hat d_{00} = 0,
    \hat d_{i0} = -\hat f_{i0} / \sigma_{i,y}^2,    i = 1,2,\ldots,
    \hat d_{0j} = -\hat f_{0j} / \sigma_{j,x}^2,    j = 1,2,\ldots,
    \hat d_{ij} = -\hat f_{ij} / (\sigma_{i,y}^2 + \sigma_{j,x}^2),    i,j = 1,2,\ldots.    (31)

Thus the pseudoinverse operator (\Delta_h)^+ can be computed efficiently with the help of the Discrete Cosine Transform:

    (\Delta_h)^+ f = C^T G(C f C^T) C,    (32)

where the function G is defined in (31). In conclusion we recall that multiplication of an N \times N matrix by C or C^T = C^{-1} is typically implemented by the aid of the fast Fourier transform and requires only O(N^2 \log_2 N) arithmetical operations. All other computations have the cost O(N^2).
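As a sanity check of (26)-(32), the following NumPy/SciPy sketch (our illustration; the paper itself uses MATLAB) builds the forward-difference matrix B, verifies the singular values (28), and realizes the pseudoinverse (\Delta_h)^+ through the orthonormal DCT-II, which here plays the role of the matrix C:

```python
import numpy as np
from scipy.fft import dctn, idctn

N, h = 16, 1.0

# forward-difference matrix B of size (N-1) x N, cf. (21)
B = (np.eye(N - 1, N, k=1) - np.eye(N - 1, N)) / h

# its singular values match (28): sigma_k = (2/h) sin(pi*k/(2N)), k = 1..N-1
sigma = (2 / h) * np.sin(np.pi * np.arange(1, N) / (2 * N))
assert np.allclose(np.sort(np.linalg.svd(B, compute_uv=False)), np.sort(sigma))

def lap(d):
    """Discrete Laplacian (26) with B_x = B_y = B: -d B^T B - B^T B d."""
    return -(d @ B.T @ B + B.T @ B @ d)

def lap_pinv(f):
    """Pseudoinverse (32): the DCT-II basis diagonalizes B^T B."""
    fh = dctn(f, norm='ortho')                        # hat f = C f C^T
    s2 = ((2 / h) * np.sin(np.pi * np.arange(N) / (2 * N))) ** 2
    denom = s2[:, None] + s2[None, :]                 # sigma_{i,y}^2 + sigma_{j,x}^2
    denom[0, 0] = 1.0                                 # placeholder, mode zeroed below
    dh = -fh / denom                                  # cf. (31)
    dh[0, 0] = 0.0
    return idctn(dh, norm='ortho')                    # C^T G(hat f) C
```

On the orthogonal complement of constants, lap_pinv inverts lap exactly: for a zero-mean d, lap_pinv(lap(d)) returns d to machine precision, which is the defining property of the pseudoinverse here.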
(a) Lena, 200×200 (b) Cameraman, 256×256 (c) Barbara, 512×512
Fig. 1. Original images.

Algorithm 1: Dual TV-Stokes algorithm for image denoising

Given d_0, k, \delta and \mu.

Step one:
  Let p^0 = 0 and q^0 = 0.
  Calculate \tau_0 = (v_0, u_0): v_0 = -B_y d_0 and u_0 = d_0 B_x^T.
  Initialize counter: n = 0.
  while not converged do
    Calculate projections:
        (\pi_p, \pi_q) = \Pi_K^h (\mathrm{div}_h\, p^n, \mathrm{div}_h\, q^n).    (33)
    Update the dual variables:
        p^{n+1} = \frac{p^n + k \nabla_h (\pi_p - \delta^{-1} v_0)}{1 + k |\nabla_h (\pi_p - \delta^{-1} v_0)|},    (34)
        q^{n+1} = \frac{q^n + k \nabla_h (\pi_q - \delta^{-1} u_0)}{1 + k |\nabla_h (\pi_q - \delta^{-1} u_0)|}.    (35)
    Update counter: n = n + 1.
  end
  Calculate \tau:
        \tau = \tau_0 - \Pi_K^h (\delta\, \mathrm{div}_h\, p^{n+1}, \delta\, \mathrm{div}_h\, q^{n+1}).    (36)

Step two:
  Let r^0 = 0 and calculate the normal field n = (n_1, n_2), n_1 = u (v^2 + u^2)^{-1/2} and n_2 = -v (v^2 + u^2)^{-1/2}.
  Initialize counter: n = 0.
  while not converged do
    Update the dual variable:
        r^{n+1} = \frac{r^n + k \nabla_h (\mathrm{div}_h (r^n + n) - \mu^{-1} d_0)}{1 + k |\nabla_h (\mathrm{div}_h (r^n + n) - \mu^{-1} d_0)|}.    (37)
    Update counter: n = n + 1.
  end
  Recover the image d:
        d = d_0 - \mu\, \mathrm{div}_h (r^{n+1} + n).    (38)
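The first step of Algorithm 1 can be prototyped compactly. The sketch below is ours and deliberately simplified: it uses co-located forward/backward differences with mirrored boundaries instead of the paper's staggered grid, unit spacing h = 1, a fixed iteration count instead of a convergence test, and illustrative names and parameter values; the projection \Pi_K^h is realized exactly as in (24), with a DCT-based inverse Laplacian.

```python
import numpy as np
from scipy.fft import dctn, idctn

def grad(u):
    """Forward differences with mirrored far boundaries (h = 1)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    """Negative adjoint of grad; div(grad(u)) is the Neumann Laplacian."""
    dx = np.zeros_like(px)
    dx[:, 0] = px[:, 0]
    dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]
    dx[:, -1] = -px[:, -2]
    dy = np.zeros_like(py)
    dy[0, :] = py[0, :]
    dy[1:-1, :] = py[1:-1, :] - py[:-2, :]
    dy[-1, :] = -py[-2, :]
    return dx + dy

def lap_pinv(f):
    """Zero-mean solution of div(grad(u)) = f via the DCT, cf. (31)-(32)."""
    n, m = f.shape
    fh = dctn(f, norm='ortho')
    ly = 4.0 * np.sin(np.pi * np.arange(n) / (2 * n)) ** 2
    lx = 4.0 * np.sin(np.pi * np.arange(m) / (2 * m)) ** 2
    denom = ly[:, None] + lx[None, :]
    denom[0, 0] = 1.0                 # avoid 0/0; constant mode set below
    uh = -fh / denom
    uh[0, 0] = 0.0
    return idctn(uh, norm='ortho')

def proj_K(w1, w2):
    """Discrete projection (24): Pi_K w = w - grad((Delta_h)^+ div(w))."""
    u = lap_pinv(div(w1, w2))
    gx, gy = grad(u)
    return w1 - gx, w2 - gy

def tv_stokes_step1(d0, delta=0.1, k=0.25, iters=50):
    """Step one of Algorithm 1: smooth tau_0 = (-d_y, d_x) of the image d0."""
    gx, gy = grad(d0)
    v0, u0 = -gy, gx                  # tau_0 = (v_0, u_0)
    p = [np.zeros_like(d0) for _ in range(4)]   # rows p = (p11, p12), q = (p21, p22)
    for _ in range(iters):
        pi_p, pi_q = proj_K(div(p[0], p[1]), div(p[2], p[3]))   # (33)
        for i, w in ((0, pi_p - v0 / delta), (2, pi_q - u0 / delta)):
            gwx, gwy = grad(w)                                  # (34)-(35)
            mag = np.sqrt(gwx ** 2 + gwy ** 2)
            p[i] = (p[i] + k * gwx) / (1.0 + k * mag)
            p[i + 1] = (p[i + 1] + k * gwy) / (1.0 + k * mag)
    cv, cu = proj_K(delta * div(p[0], p[1]), delta * div(p[2], p[3]))
    return v0 - cv, u0 - cu           # smoothed tangential field, eq. (36)
```

A useful invariant for testing such a prototype is that the output of proj_K is discretely divergence-free and that proj_K is idempotent, which is exactly what keeps the constraint satisfied at every iteration.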
2.4 Numerical experiments

(a) Energy vs. iterations for the dual TV-Stokes algorithm. (b) Energy vs. iterations for the TV-Stokes algorithm from [5].
Fig. 2. Energy plots for the first step.

In what follows we present several examples to show how the TV-Stokes method works for different images. All the images we have tested are normalized into gray-scale values, ranging from 0 (black) to 1 (white). In the experiments we start with a clean image, shown in Figure 1, and then add random noise with zero mean. This is done by the imnoise MATLAB command, where the variance parameter is set to 0.001 for the Barbara image and 0.005 for the Lena image. The Cameraman image is taken directly from the paper [5], so we compare the results with the same noisy image as input. In [5] this model is further compared to the two-step LOT method [7] and the famous ROF model.

The signal-to-noise ratio is measured in decibels before denoising:

    \mathrm{SNR} = 20 \log_{10} \left( \frac{\int_\Omega (d - \bar d)^2 \, dx}{\int_\Omega (\eta - \bar\eta)^2 \, dx} \right)^{1/2},    (39)

where

    \bar d = \frac{1}{|\Omega|} \int_\Omega d \, dx,    \bar\eta = \frac{1}{|\Omega|} \int_\Omega \eta \, dx.    (40)

The numerical procedures used in [5] were based on explicit finite difference schemes. This process is very slow, as the constraint converges slowly. In the proposed dual method, by contrast, the constraint is satisfied at each step by the orthogonal projection. The energy and the number of iterations required for convergence of step one are shown in Figure 2. The figure clearly illustrates that the dual TV-Stokes algorithm requires fewer iterations to reach a stable energy than the primal TV-Stokes algorithm. Although each iteration of the dual TV-Stokes algorithm requires more computational effort, it is much faster than using sparse linear solvers.
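In NumPy, the discrete counterpart of (39)-(40) is a one-liner (the function name is our choice; pixel sums replace the integrals):

```python
import numpy as np

def snr_db(d, eta):
    """SNR in decibels, eqs. (39)-(40): mean-corrected signal over noise."""
    ds = d - d.mean()          # d - d_bar
    es = eta - eta.mean()      # eta - eta_bar
    return 20 * np.log10(np.linalg.norm(ds) / np.linalg.norm(es))
```

For instance, a noise field whose mean-corrected norm is one tenth of the signal's gives 20 dB.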
Inverting the Laplacian for the orthogonal projection in each iteration is a bottleneck for very large images. In all these examples the projection was applied by the aid of the Fast Fourier Transform, which needs O(N^2 \log N) operations in each iteration. For very large images, one should consider applying the projection with a multigrid solver, which would reduce the cost to O(N^2).

All methods were coded in MATLAB, and in Table 1 the CPU time is given in seconds for each test image, for the dual TV-Stokes algorithm vs. the primal TV-Stokes algorithm from [5]. As a stopping criterion we measure the L^2-norm of the energy in (15) and (19), and stop the iteration when the difference of the energy is below 10^{-3}. For the primal TV-Stokes algorithm we used the same stopping criterion as in [5], where the tolerance on the L^2-norm of the constraint is equal to 5 \cdot 10^{-3} and the tolerance on the energy difference is equal to 10^{-3}. The time steps were set to 10^{-3} and 5 \cdot 10^{-3}, respectively, for the first and second steps of the TV-Stokes algorithm.

Our first test is the well-known Lena image, which we recover from heavily added noise. We have cropped the image to show the face, which consists of smooth areas and edges that are important to preserve. The denoised image in Figure 3 shows that the dual TV-Stokes method has recovered the smooth areas without inducing any staircase effect. The smoothing parameter \delta is equal to 0.0835 and \mu is equal to 0.17. Since this is a highly noisy image, the ROF model fails to give a visually pleasant result, because the smooth surfaces become piecewise constant. The TV-Stokes algorithm, however, has nearly the same quality as the dual TV-Stokes algorithm; for it, \delta was equal to 0.045.

The next test is the Cameraman image, which consists of a smooth skyline and some low-intensity buildings in the background. The buildings are difficult to recover, as they get smeared out by the denoising.
The results are shown in Figure 4, with \delta equal to 0.055 and \mu equal to 0.08. The TV-Stokes result is taken from [5], where the SNR is the same as the one we report, 20 \log_{10}(8.21) \approx 18.28. Figure 4(e) shows the TV-Stokes reconstruction for the same noisy image, where the parameter \delta is equal to 0.06.

The last example is the Barbara image, which is quite detailed, with high- and low-intensity textures. The high-intensity textures and the smooth areas are preserved quite well, but the low-intensity textures disappear in the same way as for the Cameraman. This image is 512×512 in size, which makes the algorithm slower because of the rather large number of matrix operations per iteration. Nevertheless, finding the optimal parameters remains feasible, since the method produces a denoised image after only a few steps; thus, one can run the method multiple times to search for the optimal parameters. For this image we used \delta equal to 0.05 and \mu equal to 0.15. We do not report an optimal result for the primal TV-Stokes algorithm in this particular case, due to the page limitation and the amount of running time required.

Clearly, using the dual formulation is more effective than solving the model with the explicit gradient-descent method. The CPU time is found for only one
(a) Noisy image, SNR 14.0. (b) Denoised using the dual TV-Stokes algorithm. (c) Contour plot, dual TV-Stokes image. (d) Difference image, dual TV-Stokes. (e) Denoised using ROF [1]. (f) Difference image, ROF. (g) Denoised using the TV-Stokes algorithm [5]. (h) Difference image, TV-Stokes.
Fig. 3. Lena image (200×200), denoised using the dual TV-Stokes, TV-Stokes and ROF algorithms.
(a) Noisy image, SNR 18.28. (b) Denoised using the dual TV-Stokes algorithm. (c) Contour plot, dual TV-Stokes image. (d) Difference image, dual TV-Stokes. (e) Denoised using the TV-Stokes algorithm [5]. (f) Difference image, TV-Stokes.
Fig. 4. Cameraman (256×256), denoised using the dual and the primal formulations of the TV-Stokes algorithm.
(a) Noisy image, SNR 20.0. (b) Denoised image. (c) Contour plot. (d) Difference image.
Fig. 5. Barbara (512×512), denoised using the dual formulation of the TV-Stokes algorithm.

              Dual TV-Stokes algorithm     TV-Stokes algorithm [5]
Image         First step   Second step     First step   Second step
Lena                 9.8          1.12        9083.2        1992.5
Cameraman           17.4          2.2        11189.0        2259.4
Barbara            128.2         20.7        80602.5       14926.3

Table 1. Runtimes (in seconds) of the dual TV-Stokes algorithm compared to the TV-Stokes algorithm [5]. The test system has two dual-core 64-bit Opteron 270 processors and 8 GB RAM. Both steps of the dual TV-Stokes algorithm are computed with 150 iterations, while the first step of the primal TV-Stokes algorithm is calculated with 75000 iterations and the second step with 25000 iterations.
runtime, since computing an average of many runtimes is very time consuming for the TV-Stokes method. Although the times shown are for a single runtime, they clearly indicate that our method is much faster and equally stable. The comparison with the primal method also shows that the proposed dual method has the same denoising quality.

References

1. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60(1-4) (1992) 259-268
2. Chan, T., Marquina, A., Mulet, P.: High-order total variation-based image restoration. SIAM J. Sci. Comput. 22(2) (2000) 503-516
3. Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems. Numer. Math. 76 (1997) 167-188
4. Lysaker, M., Lundervold, A., Tai, X.C.: Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Trans. Image Process. 12 (2003) 1579-1590
5. Rahman, T., Tai, X.C., Osher, S.: A TV-Stokes denoising algorithm. In Sgallari, F., Murli, A., Paragios, N., eds.: SSVM. Volume 4485 of Lecture Notes in Computer Science, Springer (2007) 473-483
6. Litvinov, W., Rahman, T., Tai, X.C.: A modified TV-Stokes model for image processing. (Submitted 2008)
7. Lysaker, O.M., Osher, S., Tai, X.C.: Noise removal using smoothed normals and surface fitting. IEEE Trans. Image Process. 13(10) (2004) 1345-1357
8. Bertalmio, M., Bertozzi, A., Sapiro, G.: Navier-Stokes, fluid dynamics, and image and video inpainting. In: Proc. IEEE Computer Vision and Pattern Recognition (CVPR) (2001)
9. Tai, X.C., Osher, S., Holm, R.: Image inpainting using a TV-Stokes equation. In: Image Processing Based on Partial Differential Equations (2006)
10. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20(1-2) (2004) 89-97
11. Carter, J.: Dual methods for total variation-based image restoration. PhD thesis, UCLA (2001)
12.
Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput. 20(6) (1999) 1964-1977
13. Bresson, X., Chan, T.F.: Fast minimization of the vectorial total variation norm and applications to color image processing. CAM Report 07-25, UCLA (2007)
14. Ciarlet, P.G. (with the assistance of Miara, B., Thomas, J.M.): Introduction to numerical linear algebra and optimisation. Cambridge University Press (1989)