Scaled gradient projection methods in image deblurring and denoising

Mario Bertero (1), Patrizia Boccacci (1), Silvia Bonettini (2), Riccardo Zanella (3), Luca Zanni (3)

(1) Dipartimento di Matematica, Università di Genova
(2) Dipartimento di Matematica, Università di Ferrara
(3) Dipartimento di Matematica, Università di Modena e Reggio Emilia

Conference on Applied Inverse Problems, Vienna, July 20-24, 2009

Zanella (UniMoRe) Gradient projection methods in imaging AIP 2009 1 / 26

Outline

1. Examples of imaging problems
2. Optimization problem
3. Gradient methods and step-length selections
4. Scaled Gradient Projection (SGP) method
5. Test results
6. Conclusions and future works

Image Deblurring example

Image acquisition model:

$$y = Hx + b + n,$$

where:
- $y \in \mathbb{R}^n$: observed image,
- $H \in \mathbb{R}^{n \times n}$: blurring operator,
- $b \in \mathbb{R}^n$: background radiation,
- $n \in \mathbb{R}^n$: unknown noise.

Goal: find an approximation of the true image $x \in \mathbb{R}^n$.

Maximum Likelihood approach (and early stopping):

$$\min L_y(x) \quad \text{sub. to } x \in \Omega$$

Image Denoising example

Image acquisition model:

$$y = x + n,$$

where:
- $y \in \mathbb{R}^n$: observed image,
- $n \in \mathbb{R}^n$: unknown noise.

Goal: remove noise from $y \in \mathbb{R}^n$ while preserving some features.

Regularized approach:

$$\min J_y^{(0)}(x) + \mu J_R(x) \quad \text{sub. to } x \in \Omega,$$

where $J_R(x)$ is (for example):
- $\|x\|_2^2$: Tikhonov,
- $\|x\|_1$: sparsity inducing,
- $\|\nabla x\|$: Total Variation.

Problem setting

Both examples lead to a constrained optimization problem:

$$\min f(x) \quad \text{sub. to } x \in \Omega,$$

where:
- $\Omega$ is a convex and closed set,
- $f(x)$ is continuously differentiable in $\Omega$.

Why gradient-type methods?

Gradient methods are first-order optimization methods.

Pros:
- Simplicity of implementation: first-order iterative method.
- Low memory requirements: suitable for high-dimensional problems.
- Ability to provide medium-accuracy solutions: semiconvergence observed in numerical practice.

Cons:
- Low convergence rate: hundreds or thousands of iterations.

The Barzilai-Borwein (BB) step-length selection rules

Consider the gradient method:

$$x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)}, \quad k = 0, 1, \dots,$$

with $g(x) = \nabla f(x)$.

Problem: how can the step-length $\alpha_k > 0$ be chosen to improve the convergence rate?

Solution:
- Regard the matrix $B(\alpha_k) = (\alpha_k I)^{-1}$ as an approximation of the Hessian $\nabla^2 f(x^{(k)})$.
- Determine $\alpha_k$ by forcing a quasi-Newton property on $B(\alpha_k)$:

$$\alpha_k^{BB1} = \operatorname*{argmin}_{\alpha \in \mathbb{R}} \|B(\alpha) s^{(k-1)} - z^{(k-1)}\| \quad \text{or} \quad \alpha_k^{BB2} = \operatorname*{argmin}_{\alpha \in \mathbb{R}} \|s^{(k-1)} - B(\alpha)^{-1} z^{(k-1)}\|,$$

where $s^{(k-1)} = x^{(k)} - x^{(k-1)}$ and $z^{(k-1)} = g^{(k)} - g^{(k-1)}$.
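
The closed forms of the two rules can be recovered in two lines. The derivation below is a sketch that assumes the norm in both argmin problems is the Euclidean norm; it writes $s = s^{(k-1)}$, $z = z^{(k-1)}$ and introduces the auxiliary variable $\mu = \alpha^{-1}$, which does not appear on the slide.

```latex
\begin{aligned}
\text{BB1:}\quad & B(\alpha) s - z = \mu s - z, \qquad
  \frac{d}{d\mu}\,\|\mu s - z\|_2^2 = 2\, s^T(\mu s - z) = 0
  \;\Rightarrow\; \mu = \frac{s^T z}{s^T s}
  \;\Rightarrow\; \alpha_k^{BB1} = \frac{s^T s}{s^T z}, \\[4pt]
\text{BB2:}\quad & s - B(\alpha)^{-1} z = s - \alpha z, \qquad
  \frac{d}{d\alpha}\,\|s - \alpha z\|_2^2 = -2\, z^T(s - \alpha z) = 0
  \;\Rightarrow\; \alpha_k^{BB2} = \frac{s^T z}{z^T z}.
\end{aligned}
```

By the Cauchy-Schwarz inequality, $\alpha_k^{BB2} \le \alpha_k^{BB1}$ whenever $s^T z > 0$.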

The BB step-length selection rules (cont.)

It follows that:

$$\alpha_k^{BB1} = \frac{s^{(k-1)T} s^{(k-1)}}{s^{(k-1)T} z^{(k-1)}} \quad \text{or} \quad \alpha_k^{BB2} = \frac{s^{(k-1)T} z^{(k-1)}}{z^{(k-1)T} z^{(k-1)}},$$

where $s^{(k-1)} = x^{(k)} - x^{(k-1)}$ and $z^{(k-1)} = g^{(k)} - g^{(k-1)}$.

Remarkable improvements in comparison with the steepest descent method are observed:
- [Barzilai-Borwein, IMA J. Num. Anal. 1988]
- [Raydan, IMA J. Num. Anal. 1993]
- [Friedlander et al., SIAM J. Num. Anal. 1999]
- [Raydan, SIAM J. Optim. 1997]
- [Fletcher, Tech. Rep. 207, 2001]
- [Dai-Liao, IMA J. Num. Anal. 2002]
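
As an illustration, the two formulas can be evaluated directly from two consecutive iterates and gradients. The sketch below uses plain Python lists to stay dependency-free; the function names `dot` and `bb_steps` are hypothetical and not taken from the talk.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def bb_steps(x_prev, x_curr, g_prev, g_curr):
    """Return (alpha_BB1, alpha_BB2) for the unscaled gradient method."""
    s = [a - b for a, b in zip(x_curr, x_prev)]  # s^(k-1) = x^(k) - x^(k-1)
    z = [a - b for a, b in zip(g_curr, g_prev)]  # z^(k-1) = g^(k) - g^(k-1)
    sz = dot(s, z)
    if sz <= 0:
        # Assumption of this sketch: reject non-positive curvature outright.
        raise ValueError("s^T z must be positive for the BB rules")
    return dot(s, s) / sz, sz / dot(z, z)


# Example: f(x) = 0.5 * (x_1^2 + 10 x_2^2), so g(x) = (x_1, 10 x_2).
bb1, bb2 = bb_steps([1.0, 1.0], [0.9, 0.0], [1.0, 10.0], [0.9, 0.0])
```

For this quadratic both values lie between the reciprocals of the Hessian eigenvalues (0.1 and 1), which is the sense in which $\alpha_k I$ mimics the inverse Hessian.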

Effective use of the BB rules

Further improvements are obtained by using adaptive alternations of the two BB rules; for example:

$$\alpha_k = \begin{cases} \alpha_k^{BB2} & \text{if } \alpha_k^{BB2} / \alpha_k^{BB1} < \tau, \\ \alpha_k^{BB1} & \text{otherwise.} \end{cases}$$

Many suggestions for the alternation are available:
- [Dai, Optim., 2003]
- [Dai-Fletcher, Math. Prog. 2005]
- [Serafini et al., Opt. Meth. Soft. 2005]
- [Dai et al., IMA J. Num. Anal. 2006]
- [Zhuo et al., Comput. Opt. Appl., 2006]
- [Frassoldati et al., J. Ind. Manag. Opt. 2008]

The BB step-lengths and Scaled Gradient Methods

Consider the scaled gradient method:

$$x^{(k+1)} = x^{(k)} - \alpha_k D_k g^{(k)}, \quad k = 0, 1, \dots,$$

where $D_k$ is the symmetric positive definite scaling matrix.

By forcing the quasi-Newton properties on $B(\alpha_k) = (\alpha_k D_k)^{-1}$ we have

$$\alpha_k^{BB1} = \frac{s^{(k-1)T} D_k^{-1} D_k^{-1} s^{(k-1)}}{s^{(k-1)T} D_k^{-1} z^{(k-1)}} \quad \text{and} \quad \alpha_k^{BB2} = \frac{s^{(k-1)T} D_k z^{(k-1)}}{z^{(k-1)T} D_k D_k z^{(k-1)}},$$

where $s^{(k-1)} = x^{(k)} - x^{(k-1)}$ and $z^{(k-1)} = g^{(k)} - g^{(k-1)}$.
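
A small sketch of the scaled rules, assuming a diagonal scaling matrix $D_k = \operatorname{diag}(d)$ stored as a list of positive entries. The name `scaled_bb_steps` is hypothetical, and no safeguard for $s^T D_k^{-1} z \le 0$ is included here.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def scaled_bb_steps(s, z, d):
    """Scaled BB1/BB2 step lengths for a diagonal scaling D_k = diag(d), d > 0."""
    s_dinv = [si / di for si, di in zip(s, d)]   # D_k^{-1} s
    z_d = [zi * di for zi, di in zip(z, d)]      # D_k z
    bb1 = dot(s_dinv, s_dinv) / dot(s_dinv, z)   # s^T D^-1 D^-1 s / s^T D^-1 z
    bb2 = dot(s, z_d) / dot(z_d, z_d)            # s^T D z / z^T D D z
    return bb1, bb2


# Quadratic with Hessian diag(1, 10): z = A s. Taking D = A^{-1} gives 1 for both rules.
s, z = [-0.1, -1.0], [-0.1, -10.0]
ideal = scaled_bb_steps(s, z, [1.0, 0.1])
plain = scaled_bb_steps(s, z, [1.0, 1.0])
```

With $D_k = I$ the function reduces to the unscaled rules; with a scaling equal to the inverse Hessian of a quadratic, both rules return 1, so the scaled step coincides with a Newton step.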

Scaled Gradient Projection (SGP) method: basic notations

[Bonettini et al., Inv. Prob. 2009]

Scaling matrix:

$$D_k \in \mathcal{D}_L = \{D \text{ s.p.d.} \in \mathbb{R}^{n \times n} : \|D\| \le L, \ \|D^{-1}\| \le L\}, \quad L > 1.$$

If $D_k$ is diagonal, the requirement leads to:

$$L^{-1} \le (D_k)_{ii} \le L.$$

Projection operator:

$$P_{\Omega,D}(x) \equiv \operatorname*{argmin}_{y \in \Omega} \|x - y\|_D, \quad \text{where } \|x\|_D = \sqrt{x^T D x}.$$
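
For a diagonal scaling and the non-negativity constraint $\Omega = \{x : x_i \ge 0\}$, the minimization defining the projection separates into independent scalar problems, so the projection reduces to a componentwise clip regardless of the diagonal entries. The sketch below illustrates this special case only; the helper names are hypothetical.

```python
import math


def d_norm(x, d):
    """||x||_D = sqrt(x^T D x) for the diagonal matrix D = diag(d)."""
    return math.sqrt(sum(di * xi * xi for di, xi in zip(d, x)))


def clip_scaling(d, L):
    """Force L^-1 <= (D_k)_ii <= L on the diagonal entries d."""
    return [min(L, max(1.0 / L, di)) for di in d]


def project_nonneg(x):
    """P_{Omega,D}(x) for Omega = {x >= 0} and any diagonal D in D_L."""
    return [max(xi, 0.0) for xi in x]


d = clip_scaling([1e-9, 2.0, 1e12], 1e3)   # entries forced into [1e-3, 1e3]
x = [1.5, -0.5, -2.0]
p = project_nonneg(x)                      # nearest feasible point in the D-norm
dist = d_norm([a - b for a, b in zip(x, p)], d)
```

For a non-diagonal $D$ or a more general $\Omega$ the projection does not decouple in this way and needs its own solver.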

Scaled Gradient Projection (SGP) method

Given $0 < \alpha_{min} < \alpha_{max}$, line-search parameters $\beta, \gamma \in (0, 1)$, and a fixed positive integer $M$.

1. Initialization. Set $x^{(0)} \in \Omega$, $D_0 \in \mathcal{D}_L$, $\alpha_0 \in [\alpha_{min}, \alpha_{max}]$.

For $k = 0, 1, 2, \dots$

2. Projection. $y^{(k)} = P_{\Omega, D_k^{-1}}\big(x^{(k)} - \alpha_k D_k \nabla f(x^{(k)})\big)$. If $y^{(k)} = x^{(k)}$ then stop.

3. Descent direction. $d^{(k)} = y^{(k)} - x^{(k)}$.

4. Line-search. Set $\lambda_k = 1$ and $f_{max} = \max_{0 \le j \le \min\{k, M-1\}} f(x^{(k-j)})$.
   While $f(x^{(k)} + \lambda_k d^{(k)}) > f_{max} + \gamma \lambda_k \nabla f(x^{(k)})^T d^{(k)}$
       $\lambda_k = \beta \lambda_k$
   end.
   Set $x^{(k+1)} = x^{(k)} + \lambda_k d^{(k)}$.

5. Update. Define $D_{k+1} \in \mathcal{D}_L$ and $\alpha_{k+1} \in [\alpha_{min}, \alpha_{max}]$.

end
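
The loop above can be sketched in a few lines for one simple setting. The code below is an illustrative sketch, not the authors' implementation: it assumes $\Omega = \{x : x_i \ge 0\}$ and a diagonal $D_k$ (so the projection is a componentwise clip), it keeps $\alpha_k$ fixed instead of updating it, and the `scaling` callback and all default values are hypothetical.

```python
def sgp(f, grad, x0, scaling, alpha=1.0, beta=0.4, gamma=1e-4, M=1,
        max_iter=200, tol=1e-10):
    """Sketch of the SGP loop for Omega = {x >= 0} with a diagonal scaling.

    f, grad : objective and gradient, both acting on lists of floats
    scaling : hypothetical callback x -> diagonal entries of D_k
    alpha   : fixed step length (adaptive step-length rules are omitted)
    """
    x = [max(xi, 0.0) for xi in x0]              # make x^(0) feasible
    history = [f(x)]
    for _ in range(max_iter):
        g, D = grad(x), scaling(x)
        # Projection step: componentwise clip for diagonal D_k and x >= 0.
        y = [max(xi - alpha * Di * gi, 0.0) for xi, Di, gi in zip(x, D, g)]
        d = [yi - xi for yi, xi in zip(y, x)]    # descent direction
        if max(abs(di) for di in d) <= tol:      # y^(k) = x^(k), up to tol
            break
        f_max = max(history[-M:])                # nonmonotone reference value
        slope = sum(gi * di for gi, di in zip(g, d))
        lam = 1.0
        while f([xi + lam * di for xi, di in zip(x, d)]) > f_max + gamma * lam * slope:
            lam *= beta                          # backtracking
        x = [xi + lam * di for xi, di in zip(x, d)]
        history.append(f(x))
    return x


# Toy problem: min 0.5*((x_1 - 1)^2 + 10*(x_2 + 2)^2) subject to x >= 0.
f = lambda x: 0.5 * ((x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2)
grad = lambda x: [x[0] - 1.0, 10.0 * (x[1] + 2.0)]
x_scaled = sgp(f, grad, [3.0, 3.0], scaling=lambda x: [1.0, 0.1])
x_plain = sgp(f, grad, [3.0, 3.0], scaling=lambda x: [1.0, 1.0], alpha=0.1)
```

On this toy problem the constrained minimizer is $(1, 0)$. The run with the inverse-Hessian scaling reaches it after one step, while the unscaled run with a small fixed step approaches it only linearly.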

SGP acceleration techniques

The acceleration technique involves:
- selection of the step-length $\alpha_k$: general algorithm;
- definition of the scaling matrix $D_k$: problem dependent (see the experiment section).

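SGP step-length selection

Let $\alpha_{min} = 10^{-3}$, $\alpha_{max} = 10^{5}$, $M_\alpha = 3$, $\tau = 0.5$.

First rule:

if $s^{(k-1)T} D_k^{-1} z^{(k-1)} \le 0$
    $\alpha_k^{BB1} = \alpha_{max}$
else
    $\alpha = \dfrac{s^{(k-1)T} D_k^{-1} D_k^{-1} s^{(k-1)}}{s^{(k-1)T} D_k^{-1} z^{(k-1)}}$
    $\alpha_k^{BB1} = \min\{\alpha_{max}, \max\{\alpha_{min}, \alpha\}\}$
end

Second rule:

if $s^{(k-1)T} D_k z^{(k-1)} \le 0$
    $\alpha_k^{BB2} = \alpha_{max}$
else
    $\alpha = \dfrac{s^{(k-1)T} D_k z^{(k-1)}}{z^{(k-1)T} D_k D_k z^{(k-1)}}$
    $\alpha_k^{BB2} = \min\{\alpha_{max}, \max\{\alpha_{min}, \alpha\}\}$
end

Alternation:

if $\alpha_k^{BB2} / \alpha_k^{BB1} < \tau$
    $\alpha_k = \min\{\alpha_{k-j}^{BB2}, \ j = 0, \dots, M_\alpha - 1\}$
    $\tau = \tau \cdot 0.9$
else
    $\alpha_k = \alpha_k^{BB1}$
    $\tau = \tau \cdot 1.1$
end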
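
The selection rule can be sketched as a single function that carries the threshold $\tau$ and the history of BB2 values between calls. This is a minimal sketch assuming a diagonal $D_k$; the constants are the slide's values as read here ($\alpha_{min} = 10^{-3}$, $\alpha_{max} = 10^{5}$, $M_\alpha = 3$), and the function and argument names are hypothetical.

```python
ALPHA_MIN, ALPHA_MAX, M_ALPHA = 1e-3, 1e5, 3   # values as read from the slide


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def clamp(alpha):
    """Restrict a tentative step length to [ALPHA_MIN, ALPHA_MAX]."""
    return min(ALPHA_MAX, max(ALPHA_MIN, alpha))


def select_step(s, z, d, bb2_history, tau):
    """Return (alpha_k, updated tau); bb2_history is extended in place.

    s, z : s^(k-1) and z^(k-1);  d : diagonal entries of D_k (all > 0)
    """
    s_dinv = [si / di for si, di in zip(s, d)]   # D_k^{-1} s
    z_d = [zi * di for zi, di in zip(z, d)]      # D_k z
    c1 = dot(s_dinv, z)                          # s^T D^-1 z
    bb1 = ALPHA_MAX if c1 <= 0 else clamp(dot(s_dinv, s_dinv) / c1)
    c2 = dot(s, z_d)                             # s^T D z
    bb2 = ALPHA_MAX if c2 <= 0 else clamp(c2 / dot(z_d, z_d))
    bb2_history.append(bb2)
    if bb2 / bb1 < tau:
        # Smallest of the last M_ALPHA BB2 values; make tau stricter.
        return min(bb2_history[-M_ALPHA:]), tau * 0.9
    return bb1, tau * 1.1


hist = [0.05]
alpha_a, tau_a = select_step([1.0, 0.0], [1.0, 2.0], [1.0, 1.0], hist, 0.5)
alpha_b, tau_b = select_step([-0.1, -1.0], [-0.1, -10.0], [1.0, 1.0], [], 0.5)
alpha_c, _ = select_step([1.0, 0.0], [-1.0, 0.0], [1.0, 1.0], [], 0.5)
```

In the first call the ratio BB2/BB1 is 0.2, so the smallest recent BB2 value is used and $\tau$ shrinks; in the second the two rules nearly agree, so BB1 is used and $\tau$ grows; the third shows the fallback to $\alpha_{max}$ when the curvature term is not positive.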

Convergence of SGP

$$\min f(x) \quad \text{sub. to } x \in \Omega \qquad (1)$$

where:
- $\Omega$ is a convex and closed set,
- $f(x)$ is continuously differentiable in $\Omega$.

Theorem. Assume that the level set $\Omega_0 = \{x \in \Omega : f(x) \le f(x^{(0)})\}$ is bounded. Every accumulation point of the sequence $\{x^{(k)}\}$ generated by the algorithm SGP is a stationary point of (1).

Image Deblurring: Poisson noise

[Figure panels: Object; Blurred noisy image]

$$f(x) = D_{KL}(Hx + b, y) = \sum_{i=1}^{n} \left( \sum_{j=1}^{n} H_{ij} x_j + b_i - y_i - y_i \log \frac{\sum_{j=1}^{n} H_{ij} x_j + b_i}{y_i} \right)$$

$$\Omega = \{x \in \mathbb{R}^n : x_i \ge 0, \ i = 1, \dots, n\}$$

A suitable reconstruction is obtained by stopping the SGP iterations early.
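
To make the objective concrete, the sketch below evaluates $f(x)$ and its gradient for a small dense $H$ stored as a list of rows. The gradient $H^T\big(1 - y/(Hx + b)\big)$ is not stated on the slide; it follows from differentiating the sum term by term. The function names are hypothetical, and the convention $0 \cdot \log 0 = 0$ is an assumption of this sketch.

```python
import math


def forward(H, x, b):
    """Return the model Hx + b for a dense H given as a list of rows."""
    return [sum(h * xj for h, xj in zip(row, x)) + bi for row, bi in zip(H, b)]


def kl_objective(H, x, b, y):
    """f(x) = sum_i [ (Hx+b)_i - y_i - y_i * log((Hx+b)_i / y_i) ]."""
    total = 0.0
    for mi, yi in zip(forward(H, x, b), y):
        total += mi - yi
        if yi > 0:                       # convention: 0 * log(0) = 0
            total -= yi * math.log(mi / yi)
    return total


def kl_gradient(H, x, b, y):
    """Gradient H^T (1 - y / (Hx + b)), obtained term by term from f."""
    r = [1.0 - yi / mi for mi, yi in zip(forward(H, x, b), y)]
    return [sum(H[i][j] * r[i] for i in range(len(H))) for j in range(len(x))]


H = [[0.8, 0.2], [0.2, 0.8]]
b = [0.1, 0.1]
x = [1.0, 2.0]
f_exact = kl_objective(H, x, b, forward(H, x, b))   # data equal to the model
g = kl_gradient(H, x, b, [2.0, 1.0])                # gradient for noisy data
```

The objective vanishes when the data equal the model $Hx + b$ and is positive otherwise, which is why it serves as a data-fidelity term for Poisson noise.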

Image Deblurring: Poisson noise (II)

Algorithms:
- SGP: adaptive selection of $\alpha_k$, scaling matrix $D_k = \min\{L, \max\{\operatorname{diag}(x^{(k)}), L^{-1}\}\}$.
- EM: Richardson-Lucy or Expectation Maximization algorithm.
- EM_MATLAB: deconvlucy function, Matlab image toolbox.
- WMRNSD: Weighted Minimum Residual Norm Steepest Descent [Bardsley-Nagy].

Algorithm    it. number   l2 rel. err.   time [s]
SGP                  26          0.069       1.61
EM                  500          0.069      21.69
EM_MATLAB            44          0.070       2.64
WMRNSD               80          0.069       4.26

Test environment: Matlab 7.5.0 on an AMD Opteron Dual Core 2.4 GHz processor.

Image Deblurring: SGP reconstruction

[Figure panels: Object; Blurred noisy image; SGP reconstruction]

Image Denoising: Poisson noise

[Figure panels: Object; Blurred noisy image]

$$f(x) = D_{KL}(x, y) + \beta \, TV(x)$$

$$\Omega = \{x \in \mathbb{R}^n : x_i \ge \eta, \ i = 1, \dots, n\}$$

Image Denoising: Poisson noise (II)

Algorithms:
- SGP: adaptive selection of $\alpha_k$, scaling matrix $D_k = x^{(k)} / (1 + \beta V)$, with $\nabla f(x^{(k)}) = V - U$, $V_i \ge 0$ and $U_i \ge 0$. [Lanteri et al., Inv. Prob. 2002]
- GP: adaptive selection of $\alpha_k$, scaling matrix $D_k = I$.
- GP-BB: only $\alpha_k^{BB1}$, scaling matrix $D_k = I$.

Algorithm    it. number   l2 rel. err.   time [s]
SGP                 148          0.025      14.30
GP                  280          0.025      23.23
GP-BB               735          0.025      70.62

Test environment: Matlab 7.5.0 on an AMD Opteron Dual Core 2.4 GHz processor.

[Zanella et al., Inv. Prob. 2009]

Image Denoising: SGP reconstruction

[Figure panels: Object; Noisy image; SGP reconstruction]

An application in medical imaging

[Figure panels: Object; Noisy image; SGP reconstruction]

- Image size: 512 × 512; parameters: $\beta = 0.3$.
- Noisy image relative error: 17.9%.
- Reconstructed image relative error: 2.9%.
- Computational time: 20.95 seconds (Matlab 7.5.0 on an AMD Opteron Dual Core 2.4 GHz processor).

GPU Implementation: Deblurring

             CPU                              GPU
N = n × n    it.   l2 rel. err.   time [s]    it.   l2 rel. err.   time [s]    Speedup
256^2         29          0.070       0.72     29          0.071       0.05       14.7
512^2         29          0.065       2.69     29          0.065       0.16       16.8
1024^2        29          0.064      10.66     29          0.064       0.58       18.4
2048^2        29          0.064      49.81     29          0.063       2.69       18.5

C implementation: Microsoft Visual Studio 2005, AMD Athlon X2 Dual-Core at 3.11 GHz.
C-CUDA implementation: CUDA 2.0, NVIDIA GTX 280, AMD Athlon X2 Dual-Core at 3.11 GHz.

[Ruggiero et al., J. Global Optim. 2009]

GPU Implementation: Denoising

             CPU                              GPU
N = n × n    it.   l2 rel. err.   time [s]    it.   l2 rel. err.   time [s]    Speedup
256^2        154          0.025       1.97    224          0.025       0.20        9.9
512^2        161          0.018       8.23    235          0.018       0.42       19.6
1024^2       166          0.014      33.51    201          0.014       1.09       30.7
2048^2       146          0.011     120.89    121          0.011       2.56       47.2

C implementation: Microsoft Visual Studio 2005, AMD Athlon X2 Dual-Core at 3.11 GHz.
C-CUDA implementation: CUDA 2.0, NVIDIA GTX 280, AMD Athlon X2 Dual-Core at 3.11 GHz.

[Serafini et al., ParCo 2009]

Other examples of applications

- Least-squares minimization: F. Benvenuto, "Iterative methods for constrained and regularized least-square problems", M20, 23 July, 15:15-17:15, C2.
- Sparsity constraints: C. De Mol, "Iterative Algorithms for Sparse Recovery", M19, 21 July, 15:15-17:15, C2.

Conclusions and Future Works

Conclusions:
- By exploiting both the scaling matrix and the Barzilai-Borwein step-length rules, the SGP method is able to achieve a satisfactory reconstruction in a reasonable time.
- Easy to implement.
- Remarkable results on massively parallel architectures (GPU).

Works in progress: comparative analysis
- TV image reconstruction [S. Wright, M39, 23 July, 10:30-12:30, D]
  - Duality-based algorithms [Zhu-Wright, COAP 2008]
  - Primal-dual approach [Zhu-Chan, CAM Rep. UCLA 2008], [Lee-Wright, 2008]
- Regularized deblurring [G. Landi, M39, 23 July, 10:30-12:30, D]
  - Quasi-Newton approaches [Landi-Loli-Piccolomini, Num. Alg. 2008]