A Majorize-Minimize subspace approach for ℓ2-ℓ0 regularization with applications to image processing

A Majorize-Minimize subspace approach for ℓ2-ℓ0 regularization with applications to image processing
Emilie Chouzenoux (emilie.chouzenoux@univ-mlv.fr)
Université Paris-Est, Lab. d'Informatique Gaspard Monge, UMR CNRS 8049
Séminaire Image, GREYC, 2 February 2012
Emilie Chouzenoux Séminaire Image, GREYC 1 / 39

Outline
1. General context
2. ℓ2-ℓ0 regularization functions: existence of minimizers; epi-convergence property
3. Minimization of F_δ: proposed algorithm; convergence results
4. Application to image processing: image denoising; image segmentation; texture+geometry decomposition; image reconstruction
5. Conclusion
Emilie Chouzenoux Séminaire Image, GREYC 2 / 39

Context: Image restoration
We observe data $y \in \mathbb{R}^Q$, related to the original image $\bar x \in \mathbb{R}^N$ through
    $y = H\bar x + w, \qquad H \in \mathbb{R}^{Q \times N}$
Objective: restore the unknown original image $\bar x$ from $H$ and $y$.
[Figure: degraded data y (left) and original image x̄ (right)]
Emilie Chouzenoux Séminaire Image, GREYC 3 / 39

Context: Penalized optimization problem
Find
    $\min_{x \in \mathbb{R}^N} \ \big( F(x) = \Phi(Hx - y) + \Psi(x) \big)$
where
- $\Phi$: data-fidelity term, related to the noise;
- $\Psi$: regularization term, related to some a priori assumptions.
Assumption: there exist $V = [V_1^\top \cdots V_S^\top]^\top$ and $c = [c_1^\top, \dots, c_S^\top]^\top$, with $V_s \in \mathbb{R}^{P_s \times N}$ and $c_s \in \mathbb{R}^{P_s}$, such that $Vx - c$ is block-sparse.
Ideal formulation (group ℓ0 penalty [Eldar10]):
    $F_0(x) = \Phi(Hx - y) + \sum_{s=1}^S \lambda\,\chi_{\mathbb{R}\setminus\{0\}}(\|V_s x - c_s\|)$, where $\chi_{\mathbb{R}\setminus\{0\}}(t) = 0$ if $t = 0$ and $1$ otherwise.
Smoothed formulation:
    $F_\delta(x) = \Phi(Hx - y) + \sum_{s=1}^S \psi_{s,\delta}(\|V_s x - c_s\|)$, where $\psi_{s,\delta}$ is an approximation of $\lambda\chi_{\mathbb{R}\setminus\{0\}}$ depending on $\delta > 0$.
Emilie Chouzenoux Séminaire Image, GREYC 5 / 39
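To make the block-sparsity assumption concrete, here is a tiny sketch (my own illustration, not from the talk; plain NumPy, c = 0, scalar blocks given by first-order differences): for a piecewise-constant signal, only the blocks straddling a jump are nonzero.

```python
# Tiny illustration (my own) of the assumption that V*x - c is block-sparse:
# for a piecewise-constant signal and V built from first-order differences
# (c = 0, scalar blocks P_s = 1), only the blocks at the jumps are nonzero.
import numpy as np

x = np.repeat([2.0, -1.0, 3.0], 10)        # piecewise-constant signal, N = 30
V = np.diff(np.eye(x.size), axis=0)        # S = 29 rows, V_s x = x[s+1] - x[s]
block_norms = np.abs(V @ x)                # |V_s x - c_s| with c = 0
print("nonzero blocks:", np.count_nonzero(block_norms), "out of", block_norms.size)
```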

Examples of regularization functions
ℓ2-ℓ1 functions: asymptotically linear, with a quadratic behavior near 0.
Example: $(\forall s \in \{1,\dots,S\})(\forall t \in \mathbb{R})\quad \psi_{s,\delta}(t) = \lambda\big(\sqrt{1 + t^2/\delta^2} - 1\big)$
Limit case: when $\delta \to 0$, $\psi_\delta(t) = \lambda|t|$ (ℓ1 penalty).
→ Convex functions → Majorize-Minimize algorithms [Allain06, Chouzenoux11], proximal algorithms [Combettes11, Condat11, Raguet11].
ℓ2-ℓ0 functions: asymptotically constant, with a quadratic behavior near 0.
Example: $(\forall s \in \{1,\dots,S\})(\forall t \in \mathbb{R})\quad \psi_{s,\delta}(t) = \lambda\min(t^2/(2\delta^2), 1)$
Limit case: when $\delta \to 0$, $\psi_\delta(t) = \lambda\chi_{\mathbb{R}\setminus\{0\}}(t)$ (ℓ0 penalty).
→ Non-convex functions → which algorithms?
Emilie Chouzenoux Séminaire Image, GREYC 6 / 39

ℓ2-ℓ0 regularization functions
We consider the following class of potential functions:
1. $(\forall s \in \{1,\dots,S\})(\forall \delta \in (0,+\infty))\quad \lim_{t\to\infty} \psi_{s,\delta}(t) = \lambda$.
2. $(\forall s \in \{1,\dots,S\})(\forall \delta \in (0,+\infty))\quad \psi_{s,\delta}(t) = O(t^2)$ for small $t$.
Examples:
- $\psi_\delta(t) = \min\big(t^2/(2\delta^2),\, 1\big)$
- $\psi_\delta(t) = t^2/(2\delta^2 + t^2)$
- $\psi_\delta(t) = 1 - \exp\big(-t^2/(2\delta^2)\big)$
- $\psi_\delta(t) = \tanh\big(t^2/(2\delta^2)\big)$
[Plot: the four example potentials over t ∈ [−5, 5]]
Emilie Chouzenoux Séminaire Image, GREYC 9 / 39
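For reference, a minimal NumPy sketch of the four example potentials above (my own code and naming; λ is normalized to 1, matching the plot):

```python
# Minimal sketch (my own; lambda normalized to 1) of the four example
# l2-l0 potentials: quadratic near 0, tending to a constant as |t| grows.
import numpy as np

def psi_trunc(t, delta):     # min(t^2 / (2*delta^2), 1)
    return np.minimum(t**2 / (2 * delta**2), 1.0)

def psi_rational(t, delta):  # t^2 / (2*delta^2 + t^2)
    return t**2 / (2 * delta**2 + t**2)

def psi_exp(t, delta):       # 1 - exp(-t^2 / (2*delta^2))
    return 1.0 - np.exp(-t**2 / (2 * delta**2))

def psi_tanh(t, delta):      # tanh(t^2 / (2*delta^2))
    return np.tanh(t**2 / (2 * delta**2))

if __name__ == "__main__":
    t = np.linspace(-5.0, 5.0, 11)
    for psi in (psi_trunc, psi_rational, psi_exp, psi_tanh):
        print(f"{psi.__name__:13s}", np.round(psi(t, delta=1.0), 3))
```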

Existence of minimizers (I)
    $F_\delta(x) = \Phi(Hx - y) + \sum_{s=1}^S \psi_{s,\delta}(\|V_s x - c_s\|)$
Difficulty: $F_\delta$ is a nonconvex, non-coercive function.
Proposition 1. Assume that
(i) $\Phi$ is continuous and coercive, i.e. $\lim_{\|x\|\to+\infty} \Phi(x) = +\infty$;
(ii) for every $\delta > 0$ and $s \in \{1,\dots,S\}$, $\psi_{s,\delta}$ is continuous and takes nonnegative values;
(iii) $\mathrm{Ker}\,H = \{0\}$.
Then, for every $\delta > 0$, $F_\delta$ has a minimizer.
Emilie Chouzenoux Séminaire Image, GREYC 10 / 39

Existence of minimizers (II)
    $F_\delta(x) = \Phi(Hx - y) + \sum_{s=1}^S \psi_{s,\delta}(\|V_s x - c_s\|) + \|V_0 x\|^2$
Difficulty: $F_\delta$ is a nonconvex, non-coercive function.
Proposition 1. Assume that
(i) $\Phi$ is continuous and coercive, i.e. $\lim_{\|x\|\to+\infty} \Phi(x) = +\infty$;
(ii) for every $\delta > 0$ and $s \in \{1,\dots,S\}$, $\psi_{s,\delta}$ is continuous and takes nonnegative values;
(iii) $\mathrm{Ker}\,H \cap \mathrm{Ker}\,V_0 = \{0\}$.
Then, for every $\delta > 0$, $F_\delta$ has a minimizer.
Emilie Chouzenoux Séminaire Image, GREYC 10 / 39

Existence of minimizers (III)
    $F_\delta(x) = \Phi(Hx - y) + \sum_{s=1}^S \psi_{s,\delta}(\|V_s x - c_s\|)$
Difficulty: $F_\delta$ is a nonconvex, non-coercive function.
Proposition 1. Assume that
(i) $\Phi$ is continuous and coercive, i.e. $\lim_{\|x\|\to+\infty} \Phi(x) = +\infty$;
(ii) for every $\delta > 0$ and $s \in \{1,\dots,S\}$, $\psi_{s,\delta}$ is continuous, takes nonnegative values, and $\psi_{s,\delta}^{-1}(0)$ is a nonempty bounded set;
(iii) $\mathrm{Ker}\,H \cap \big(\bigcap_{s=1}^S \mathrm{Ker}\,V_s\big) = \{0\}$.
Then, for every $\delta > 0$, $F_\delta$ has a minimizer.
Emilie Chouzenoux Séminaire Image, GREYC 10 / 39

Epi-convergence to the group ℓ0-penalized objective function
Assumptions:
1. $(\forall s \in \{1,\dots,S\})(\forall (\delta_1,\delta_2) \in (0,+\infty)^2)\quad \delta_1 \le \delta_2 \ \Rightarrow\ (\forall t \in \mathbb{R})\ \psi_{s,\delta_1}(t) \ge \psi_{s,\delta_2}(t)$;
2. $(\forall s \in \{1,\dots,S\})(\forall t \in \mathbb{R})\quad \lim_{\delta \to 0} \psi_{s,\delta}(t) = \lambda\,\chi_{\mathbb{R}\setminus\{0\}}(t)$;
3. the assumptions of Proposition 1 hold for every $\delta > 0$.
Proposition 2. Let $(\delta_n)_{n\in\mathbb{N}}$ be a decreasing sequence of positive real numbers converging to 0. Under the above assumptions,
    $\inf F_{\delta_n} \to \inf F_0$ as $n \to +\infty$.
In addition, if for every $n \in \mathbb{N}$, $\hat x_n$ is a minimizer of $F_{\delta_n}$, then the sequence $(\hat x_n)_{n\in\mathbb{N}}$ is bounded and all its cluster points are minimizers of $F_0$.
Emilie Chouzenoux Séminaire Image, GREYC 11 / 39
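A toy one-dimensional check of Proposition 2 (my own illustration, with Φ(t) = t²/2 and the rational potential ψ_δ(t) = λt²/(2δ² + t²)); the grid-search infimum of F_δ increases towards inf F_0 as δ decreases:

```python
# Toy 1-D check (my own illustration) that inf F_delta -> inf F_0 as delta -> 0,
# with Phi(t) = t^2/2, psi_delta(t) = lam*t^2/(2*delta^2 + t^2) and the limit
# penalty psi_0(t) = lam * (t != 0). Infima are approximated by grid search.
import numpy as np

lam, y = 0.5, 0.8
x = np.append(np.linspace(-3.0, 3.0, 20001), 0.0)   # grid that contains x = 0

F0 = 0.5 * (x - y) ** 2 + lam * (x != 0)             # group-l0 objective (scalar)
print("inf F_0      ~", round(F0.min(), 4))           # = min(y^2/2, lam) = 0.32
for delta in (1.0, 0.3, 0.1, 0.03, 0.01):
    F_delta = 0.5 * (x - y) ** 2 + lam * x**2 / (2 * delta**2 + x**2)
    print(f"delta = {delta:<5}  inf F_delta ~ {F_delta.min():.4f}")
```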

Iterative minimization of $F_\delta$
In the sequel, we assume that $F_\delta$ is differentiable.
Descent algorithm:
    $x_{k+1} = x_k + \alpha_k d_k \quad (\forall k \ge 0)$
- $d_k$: search direction satisfying $g_k^\top d_k < 0$, where $g_k = \nabla F_\delta(x_k)$. Examples: gradient, conjugate gradient, Newton, truncated Newton, ...
- stepsize $\alpha_k$: approximate minimizer of $f_{k,\delta}: \alpha \mapsto F_\delta(x_k + \alpha d_k)$.
Generalization: subspace algorithm [Zibulevsky10]
    $x_{k+1} = x_k + \sum_{m=1}^M u_{m,k}\, d_k^m \quad (\forall k \ge 0)$
- $[d_k^1, \dots, d_k^M] = D_k$: set of search directions. Example: super-memory gradient $D_k = [-g_k, d_{k-1}, \dots, d_{k-l}]$.
- stepsize $u_k$: approximate minimizer of $f_{k,\delta}: u \mapsto F_\delta(x_k + D_k u)$.
Emilie Chouzenoux Séminaire Image, GREYC 13 / 39

Majorize-Minimize principle [Hunter04]
Objective: find $\hat x \in \mathrm{Argmin}_x\, F_\delta(x)$.
For all $x'$, let $Q(\cdot, x')$ be a tangent majorant of $F_\delta$ at $x'$, i.e.
    $(\forall x)\ Q(x, x') \ge F_\delta(x), \qquad Q(x', x') = F_\delta(x')$.
MM algorithm: $(\forall j \in \{0,\dots,J\})\quad x_{j+1} \in \mathrm{Argmin}_x\, Q(x, x_j)$.
[Figure: the majorant Q(·, x_j) lies above F_δ and touches it at x_j; its minimizer gives x_{j+1}]
Emilie Chouzenoux Séminaire Image, GREYC 14 / 39

Quadratic tangent majorant function
Assumptions:
(i) $\Phi$ is differentiable with an $L$-Lipschitzian gradient;
(ii) for every $s \in \{1,\dots,S\}$, $\psi_{s,\delta}$ is a differentiable function;
(iii) for every $s \in \{1,\dots,S\}$, $\psi_{s,\delta}(\sqrt{\cdot})$ is concave on $[0,+\infty)$;
(iv) for every $s \in \{1,\dots,S\}$, there exists $\omega_s \in [0,+\infty)$ such that $(\forall t \in (0,+\infty))\ 0 \le \dot\psi_{s,\delta}(t) \le \omega_s t$, where $\dot\psi_{s,\delta}$ is the derivative of $\psi_{s,\delta}$. In addition, $\lim_{t \to 0,\, t > 0}\ \omega_{s,\delta}(t) \in \mathbb{R}$, where $\omega_{s,\delta}(t) = \dot\psi_{s,\delta}(t)/t$.
Lemma 1 [Allain06]. For every $x' \in \mathbb{R}^N$, let
    $A(x') = \mu H^\top H + V^\top \mathrm{Diag}\{b(x')\}\,V + 2 V_0^\top V_0$
where $\mu \in [L,+\infty)$ and $b(x') \in \mathbb{R}^{SP}$ with $b_{s,p}(x') = \omega_{s,\delta}(\|V_s x' - c_s\|)$. Then
    $Q(x, x') = F_\delta(x') + \nabla F_\delta(x')^\top (x - x') + \tfrac{1}{2}(x - x')^\top A(x')(x - x')$
is a convex quadratic tangent majorant of $F_\delta$ at $x'$.
Emilie Chouzenoux Séminaire Image, GREYC 15 / 39
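To make Lemma 1 concrete, here is a small sketch (my own simplifications: Φ = ½‖·‖² so L = 1, scalar blocks P_s = 1, c = 0, no V_0 term, and the rational penalty ψ_δ(t) = λt²/(2δ² + t²)) of the curvature weights ω_δ and the majorant matrix A(x):

```python
# Sketch of Lemma 1 under my simplifying assumptions (Phi = 0.5*||.||^2 so
# mu = L = 1 works, scalar blocks, c = 0, no V0 term, rational penalty):
#   A(x) = mu * H'H + V' * Diag{b(x)} * V,   b_s(x) = omega_delta((Vx)_s).
import numpy as np

def omega(t, lam, delta):
    # omega_delta(t) = psi'_delta(t)/t = 4*lam*delta^2 / (2*delta^2 + t^2)^2,
    # finite at t = 0 (value lam/delta^2), as required by assumption (iv).
    return 4.0 * lam * delta**2 / (2.0 * delta**2 + t**2) ** 2

def majorant_matrix(x, H, V, lam, delta, mu=1.0):
    b = omega(V @ x, lam, delta)                   # one weight per block (row of V)
    return mu * (H.T @ H) + V.T @ (b[:, None] * V)
```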

Majorize-Minimize multivariate stepsize [Chouzenoux11]
    $x_{k+1} = x_k + D_k u_k \quad (\forall k \ge 0)$
- $D_k$: set of directions;
- $u_k$: obtained by MM minimization of $f_{k,\delta}: u \mapsto F_\delta(x_k + D_k u)$;
- $q_k(u, u_k^j)$: quadratic tangent majorant of $f_{k,\delta}$ at $u_k^j$, with Hessian $B_{k,u_k^j} = D_k^\top A(x_k + D_k u_k^j)\, D_k$.
MM minimization in the subspace:
    $u_k^0 = 0, \qquad u_k^{j+1} \in \mathrm{Argmin}_u\, q_k(u, u_k^j) \quad (\forall j \in \{0,\dots,J-1\}), \qquad u_k = u_k^J$.
Emilie Chouzenoux Séminaire Image, GREYC 16 / 39

Proposed algorithm
Majorize-Minimize subspace algorithm. For all $k \ge 0$:
1. compute the set of directions $D_k = [d_k^1, \dots, d_k^M]$;
2. set $u_k^0 = 0$;
3. for $j \in \{0,\dots,J-1\}$:
       $B_{k,u_k^j} = D_k^\top A(x_k + D_k u_k^j)\, D_k$,
       $u_k^{j+1} = u_k^j - B_{k,u_k^j}^{-1}\, \nabla f_{k,\delta}(u_k^j)$;
4. set $u_k = u_k^J$;
5. update $x_{k+1} = x_k + D_k u_k$.
Emilie Chouzenoux Séminaire Image, GREYC 17 / 39
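A self-contained sketch of the full iteration for the memory-gradient subspace D_k = [−g_k, x_k − x_{k−1}] (MM-MG), under the same simplifying assumptions as above (quadratic fidelity, scalar difference blocks, rational penalty, J inner MM steps, fixed iteration budget); all names and parameter values are mine:

```python
# Sketch (my assumptions: F_delta(x) = 0.5*||Hx - y||^2 + sum_s psi((Vx)_s),
# psi(t) = lam*t^2/(2*delta^2 + t^2)) of the MM subspace algorithm with the
# memory-gradient direction set D_k = [-g_k, x_k - x_{k-1}].
import numpy as np

def omega(t, lam, delta):
    return 4.0 * lam * delta**2 / (2.0 * delta**2 + t**2) ** 2

def grad_F(x, H, y, V, lam, delta):
    t = V @ x
    return H.T @ (H @ x - y) + V.T @ (omega(t, lam, delta) * t)

def mm_mg(H, y, V, lam, delta, n_iter=200, J=1, mu=1.0):
    x, x_prev = np.zeros(H.shape[1]), None
    for _ in range(n_iter):
        g = grad_F(x, H, y, V, lam, delta)
        # memory-gradient subspace: steepest-descent direction + previous step
        D = -g[:, None] if x_prev is None else np.column_stack((-g, x - x_prev))
        u = np.zeros(D.shape[1])
        for _ in range(J):                      # MM steps on the multivariate stepsize u
            z = x + D @ u
            HD, VD = H @ D, V @ D
            B = mu * HD.T @ HD + VD.T @ (omega(V @ z, lam, delta)[:, None] * VD)
            grad_u = D.T @ grad_F(z, H, y, V, lam, delta)
            u = u - np.linalg.lstsq(B, grad_u, rcond=None)[0]
        x_prev, x = x, x + D @ u
    return x

if __name__ == "__main__":
    # toy usage: 1-D piecewise-constant signal denoising (H = I, V = differences)
    rng = np.random.default_rng(0)
    x_true = np.repeat(rng.normal(size=5), 20)
    y = x_true + 0.1 * rng.normal(size=x_true.size)
    H = np.eye(x_true.size)
    V = np.diff(np.eye(x_true.size), axis=0)
    x_hat = mm_mg(H, y, V, lam=0.2, delta=0.1)
    print("noisy MSE:", np.mean((y - x_true) ** 2),
          " restored MSE:", np.mean((x_hat - x_true) ** 2))
```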

Convergence results
Assumptions:
① Assumptions of Proposition 1.
② Assumptions of Lemma 1.
③ For every $k \in \mathbb{N}$, the matrix of directions $D_k$ is of size $N \times M$ with $1 \le M \le N$, and the first subspace direction $d_k^1$ is gradient-related, i.e. there exist $\gamma_0 > 0$ and $\gamma_1 > 0$ such that, for every $k \in \mathbb{N}$,
    $-g_k^\top d_k^1 \ge \gamma_0 \|g_k\|^2, \qquad \|d_k^1\| \le \gamma_1 \|g_k\|$.
④ $F_\delta$ satisfies the Łojasiewicz inequality [Attouch10a, Attouch10b]: for every $\bar x \in \mathbb{R}^N$ and every bounded neighborhood $E$ of $\bar x$, there exist constants $\kappa > 0$, $\zeta > 0$ and $\theta \in [0,1)$ such that
    $\|\nabla F_\delta(x)\| \ge \kappa\, |F_\delta(x) - F_\delta(\bar x)|^\theta$
for every $x \in E$ such that $|F_\delta(x) - F_\delta(\bar x)| < \zeta$.
Emilie Chouzenoux Séminaire Image, GREYC 18 / 39
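For the memory-gradient set D_k = [−g_k, x_k − x_{k−1}] used in the experiments, the first direction is d_k¹ = −g_k, so the gradient-related condition in ③ holds trivially (my own one-line check):

```latex
% With d_k^1 = -g_k, Assumption (3) holds with gamma_0 = gamma_1 = 1:
-g_k^\top d_k^1 = \|g_k\|^2 \;\ge\; 1\cdot\|g_k\|^2,
\qquad
\|d_k^1\| = \|g_k\| \;\le\; 1\cdot\|g_k\|.
```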

Convergence results
Theorem. Under Assumptions ①, ② and ③, for all $J \ge 1$, the MM subspace algorithm is such that
    $\lim_{k\to\infty} \|\nabla F_\delta(x_k)\| = 0$.
Furthermore, if Assumption ④ is fulfilled, then
- the MM subspace algorithm generates a sequence converging to a critical point $\hat x$ of $F_\delta$;
- the sequence $(x_k)_{k\in\mathbb{N}}$ has a finite length, in the sense that $\sum_{k=0}^{+\infty} \|x_{k+1} - x_k\| < +\infty$.
Emilie Chouzenoux Séminaire Image, GREYC 19 / 39

Simulation settings
Considered penalization functions:
- SC: $\psi_{s,\delta}(t) = \lambda\big(\sqrt{1 + t^2/\delta^2} - 1\big)$
- SNC-(i): $\psi_{s,\delta}(t) = \lambda\, t^2/(2\delta^2 + t^2)$
- SNC-(ii): $\psi_{s,\delta}(t) = \lambda\big(1 - \exp(-t^2/(2\delta^2))\big)$
- SNC-(iii): $\psi_{s,\delta}(t) = \lambda\tanh\big(t^2/(2\delta^2)\big)$
- NSNC: $\psi_{s,\delta}(t) = \lambda\min\big(t^2/(2\delta^2),\, 1\big)$
Optimization algorithms:
- smooth penalties (SC, SNC): MM subspace algorithm with $D_k = [-g_k,\ x_k - x_{k-1}]$ (MM-MG); NLCG [Hager06], L-BFGS [Liu89] and HQ [Allain06] algorithms.
- NSNC: four state-of-the-art combinatorial optimization algorithms: α-exp [Boykov01], QCSM [Jezierska11], TRW [Kolmogorov06] and BP [Felzenszwalb10].
Emilie Chouzenoux Séminaire Image, GREYC 21 / 39

Image denoising
[Figure: original image x̄ with 128×128 pixels (left) and noisy image y, degraded by i.i.d. Gaussian noise, SNR = 15 dB (right)]
    $F_\delta(x) = \frac{1}{2}\|x - y\|^2 + \sum_{s=1}^S \psi_{s,\delta}(\|V_s x\|) + \beta\, d_B^2(x)$
- $d_B^2$ is the quadratic (squared) distance to the closed convex interval $B = [0, 255]$;
- anisotropic penalization on the gradients of $x$.
Emilie Chouzenoux Séminaire Image, GREYC 22 / 39
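A short sketch of how this denoising objective can be evaluated (my own assumptions: anisotropic horizontal/vertical forward differences as the V_s, the rational SNC-(i) penalty, and the distance to B = [0, 255] computed with a clip):

```python
# Sketch (my assumptions: anisotropic forward differences, rational penalty,
# box handled by clipping) of the denoising objective
#   F_delta(x) = 0.5*||x - y||^2 + sum_s psi_delta(|differences|) + beta*d_B(x)^2.
import numpy as np

def psi(t, lam, delta):
    return lam * t**2 / (2.0 * delta**2 + t**2)

def F_delta(x, y, lam, delta, beta):
    dh = x[:, 1:] - x[:, :-1]                    # horizontal differences
    dv = x[1:, :] - x[:-1, :]                    # vertical differences
    d_B = x - np.clip(x, 0.0, 255.0)             # componentwise distance to [0, 255]
    return (0.5 * np.sum((x - y) ** 2)
            + np.sum(psi(dh, lam, delta)) + np.sum(psi(dv, lam, delta))
            + beta * np.sum(d_B ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x_bar = np.full((128, 128), 128.0)                    # toy "original" image
    y = x_bar + 20.0 * rng.normal(size=x_bar.shape)       # noisy observation
    print("F_delta(y):    ", F_delta(y, y, lam=25.0, delta=5.0, beta=1.0))
    print("F_delta(x_bar):", F_delta(x_bar, y, lam=25.0, delta=5.0, beta=1.0))
```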

Results
[Figure: denoising result (left) and absolute reconstruction error (right) with the SC penalty using MM-MG; SNR = 20.41 dB, MSSIM = 0.89]
Emilie Chouzenoux Séminaire Image, GREYC 23 / 39

Results
[Figure: denoising result (left) and absolute reconstruction error (right) with the SNC-(i) penalty using MM-MG; SNR = 22.74 dB, MSSIM = 0.92]
Emilie Chouzenoux Séminaire Image, GREYC 23 / 39

Results
[Figure: denoising result (left) and absolute reconstruction error (right) with the NSNC penalty using TRW; SNR = 22.8 dB, MSSIM = 0.93]
Emilie Chouzenoux Séminaire Image, GREYC 23 / 39

Results

Penalty    Algorithm  Iterations  Time   F_δ           SNR (dB)
SC         MM-MG      122         0.22   2.7 × 10^6    20.41
           NLCG       138         0.35   2.7 × 10^6    20.41
           L-BFGS     209         0.73   2.7 × 10^6    20.41
           HQ         670         3.03   2.7 × 10^6    20.41
SNC-(i)    MM-MG      270         0.35   1.54 × 10^6   22.74
           NLCG       1250        2.34   1.54 × 10^6   22.74
           L-BFGS     332         0.96   1.54 × 10^6   22.73
           HQ         1025        3.84   1.54 × 10^6   22.74
NSNC       α-exp      4           4.67   1.31 × 10^6   22.69
           QCSM       2           1.25   1.31 × 10^6   22.60
           TRW        5           1.65   1.31 × 10^6   22.80
           BP         18          5.33   1.31 × 10^6   22.73

Emilie Chouzenoux Séminaire Image, GREYC 24 / 39

Image segmentation
[Figure: original image x̄ with 256×256 pixels]
    $F_\delta(x) = \frac{1}{2}\|x - \bar x\|^2 + \sum_{s=1}^S \psi_{s,\delta}(\|V_s x\|)$
Anisotropic penalization on the gradients of $x$.
Emilie Chouzenoux Séminaire Image, GREYC 25 / 39

Results
[Figure: segmented image (left) and its gradient (right) with the SC penalty using MM-MG]
Emilie Chouzenoux Séminaire Image, GREYC 26 / 39

Results
[Figure: segmented image (left) and its gradient (right) with the SNC-(ii) penalty using MM-MG]
Emilie Chouzenoux Séminaire Image, GREYC 26 / 39

Results
[Figure: segmented image (left) and its gradient (right) with the NSNC penalty using TRW]
Emilie Chouzenoux Séminaire Image, GREYC 26 / 39

Results
[Figure: detail of the segmented image (left) and its gradient (right) with the SC penalty using MM-MG]
Emilie Chouzenoux Séminaire Image, GREYC 27 / 39

Results
[Figure: detail of the segmented image (left) and its gradient (right) with the SNC-(ii) penalty using MM-MG]
Emilie Chouzenoux Séminaire Image, GREYC 27 / 39

Results
[Figure: detail of the segmented image (left) and its gradient (right) with the NSNC penalty using TRW]
Emilie Chouzenoux Séminaire Image, GREYC 27 / 39

Results
[Plot: comparison of the 50th line of the segmented images obtained with the NSNC, SNC-(ii) and SC potential functions; the 50th line of the original image is indicated as a dotted curve]
Emilie Chouzenoux Séminaire Image, GREYC 28 / 39

Results

Penalty    Algorithm  Iterations  Time    F_δ
SC         MM-MG      132         0.99    6.69 × 10^6
           NLCG       144         1.49    6.69 × 10^6
           L-BFGS     215         3.44    6.69 × 10^6
           HQ         898         18.19   6.69 × 10^6
SNC-(ii)   MM-MG      622         4.35    1.59 × 10^7
           NLCG       1578        14.93   1.59 × 10^7
           L-BFGS     632         9.57    1.59 × 10^7
           HQ         3553        65.2    1.59 × 10^7
NSNC       α-exp      9           57.97   5.58 × 10^6
           QCSM       1           7.05    5.52 × 10^6
           TRW        5           6.71    5.52 × 10^6
           BP         50          61.83   5.52 × 10^6

Emilie Chouzenoux Séminaire Image, GREYC 29 / 39

Texture+Geometry decomposition
[Figure: original image x̄ with 256×256 pixels (left) and noisy image y, degraded by i.i.d. Gaussian noise, SNR = 15 dB (right)]
Decomposition $y = \hat x + \check x$, where $\hat x$ is the geometry part, $\check x$ is the texture + noise part, and $\hat x$ minimizes ([Osher03])
    $F_\delta(x) = \frac{1}{2}\|x - y\|_{-1}^2 + \lambda \sum_{s=1}^S \psi_{s,\delta}(\|V_s x\|)$
with $\|\cdot\|_{-1}$ the negative-order (smoothing) norm of [Osher03]. Isotropic penalization on the gradients of $x$.
Emilie Chouzenoux Séminaire Image, GREYC 30 / 39

Results
[Figure: recovered geometry part x̂ (left) and texture+noise part x̌ (right) with the SC penalty using MM-MG]
Emilie Chouzenoux Séminaire Image, GREYC 31 / 39

Results
[Figure: recovered geometry part x̂ (left) and texture+noise part x̌ (right) with the SNC-(iii) penalty using MM-MG]
Emilie Chouzenoux Séminaire Image, GREYC 31 / 39

Results

Penalty     Algorithm  Iterations  Time    F_δ
SC          MM-MG      633         20.8    2.99 × 10^12
            NLCG       591         17.5    2.99 × 10^12
            L-BFGS     674         22.9    2.99 × 10^12
            HQ         6067        306     2.99 × 10^12
SNC-(iii)   MM-MG      448         11.84   2.14 × 10^12
            NLCG       1058        25.1    2.14 × 10^12
            L-BFGS     457         13.2    2.14 × 10^12
            HQ         3882        168     2.17 × 10^12

Emilie Chouzenoux Séminaire Image, GREYC 32 / 39

Image reconstruction
[Figure: original image x̄ with 128×128 pixels (left) and noisy sinogram y with 181×256 measurements, degraded by i.i.d. Laplacian noise, SNR = 23.5 dB (right)]
    $F_\delta(x) = \sum_{q=1}^Q \varphi_{q,\rho}\big((Rx)_q - y_q\big) + \sum_{s=1}^S \psi_{s,\delta}(\|V_s x\|) + \beta\, d_B^2(x) + \tau\|x\|^2$
- $R$ is the Radon projection matrix and $\varphi_{q,\rho}$ is the SC function;
- isotropic penalization on the gradients of $x$.
Emilie Chouzenoux Séminaire Image, GREYC 33 / 39

Results
[Figure: reconstructed image (left) and detail (right) with the SC penalty using MM-MG; SNR = 18.05 dB, MSSIM = 0.81]
Emilie Chouzenoux Séminaire Image, GREYC 34 / 39

Results
[Figure: reconstructed image (left) and detail (right) with the SNC-(i) penalty using MM-MG; SNR = 21.13 dB, MSSIM = 0.92]
Emilie Chouzenoux Séminaire Image, GREYC 34 / 39

Results

Penalty    Algorithm  Iterations  Time    F_δ              SNR (dB)
SC         MM-MG      253         59.3    1.1 × 10^6       18.05
           NLCG       358         84.1    1.1 × 10^6       18.05
           L-BFGS     349         82.3    1.1 × 10^6       18.05
           HQ         728         337     1.1 × 10^6       18.05
SNC-(i)    MM-MG      516         119.8   8.6214 × 10^6    21.13
           NLCG       618         143     8.6228 × 10^6    20.89
           L-BFGS     870         203     8.6225 × 10^6    21.17
           HQ         1152        530     8.6236 × 10^6    20.85

Emilie Chouzenoux Séminaire Image, GREYC 35 / 39

Conclusion
- Majorize-Minimize subspace algorithm for ℓ2-ℓ0 minimization;
- faster than state-of-the-art combinatorial optimization techniques;
- simple to implement.
Future work:
- constrained case;
- non-differentiable case;
- application to a wider class of problems (e.g. MRI).
Emilie Chouzenoux Séminaire Image, GREYC 37 / 39

Bibliography
M. Allain, J. Idier and Y. Goussard. On global and local convergence of half-quadratic algorithms. IEEE Transactions on Image Processing, 15(5):1130-1142, 2006.
H. Attouch, J. Bolte and B. F. Svaiter. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. http://www.optimization-online.org, 2010.
H. Attouch, J. Bolte, P. Redont and A. Soubeyran. Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Mathematics of Operations Research, 35(2):438-457, 2010.
E. Chouzenoux, J. Idier and S. Moussaoui. A Majorize-Minimize strategy for subspace optimization applied to image restoration. IEEE Transactions on Image Processing, 20(6):1517-1528, 2011.
E. Chouzenoux, A. Jezierska, J.-C. Pesquet and H. Talbot. A Majorize-Minimize subspace approach for ℓ2-ℓ0 image regularization. http://arxiv.org/abs/1112.6272, 2011.
Emilie Chouzenoux Séminaire Image, GREYC 38 / 39

Thanks for your attention! Emilie Chouzenoux Séminaire Image, GREYC 39 / 39
