Combining multiresolution analysis and non-smooth optimization for texture segmentation


Combining multiresolution analysis and non-smooth optimization for texture segmentation
Nelly Pustelnik, CNRS, Laboratoire de Physique de l'ENS de Lyon

Motivation: geometric textures are periodic; stochastic textures are scale-free.

[Figure: three example signals, each shown in time and as log power vs. log frequency: a sinusoidal signal (periodic), a sinusoidal signal plus noise (periodic), and a monofractal signal (scale-free).]

Texture segmentation

[Figure: mask (Ω₁, Ω₂), synthetic image, real texture.]

Segmentation: estimate the boundary between Ω₁ and Ω₂.
- Contribution 1: Discrete Mumford-Shah.
- Contribution 2: Chan-Vese model.

Texture = local dependence = local regularity.
- Contribution 3: Joint estimation and segmentation.

SIROCCO project (start), Projet Jeunes Chercheur.e.s GdR ISIS 2013-2015, Défi Imag'In CNRS 2017.

Joint work with: B. Pascal, M. Foare, P. Abry, V. Vidal, J.-C. Géminard (LPENSL), L. Condat (GIPSA-Lab), H. Wendt, N. Dobigeon (IRIT).

Difficulties: large-size data (> 2 million pixels), accurate transitions, avoiding irregular contours.

Summary
1. Basics: wavelets and proximal tools
2. Segmentation by means of proximal tools
3. Two-step texture segmentation relying on a scale-free descriptor
4. Joint texture segmentation

Wavelet transform and sparsity + prox

Wavelets: sparse representations of most natural signals. Dyadic wavelet transform, denoted F ∈ R^{|Ω|×|Ω|}: filterbank implementation, orthonormal transform: FF^⊤ = F^⊤F = I. For g ∈ R^Ω, ζ = Fg.

Soft-thresholding, applied componentwise:

soft_λ(ζ) = ( max{|ζ_i| − λ, 0} sign(ζ_i) )_{i∈Ω} = prox_{λ‖·‖₁}(ζ),

i.e., soft_λ(ζ) = arg min_ν ½‖ν − ζ‖₂² + λ‖ν‖₁.

Wavelet denoising:

û = arg min_u ½‖u − g‖₂² + λ‖Fu‖₁ = F^⊤ soft_λ(Fg) = prox_{λ‖F·‖₁}(g).

[Figure: pipeline g → ζ = Fg → soft_λ(Fg) → û = F^⊤ soft_λ(Fg); plot of the soft-thresholding nonlinearity vs. the identity, with dead zone (−λ, λ).]
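As a minimal numerical sketch of the slide above (not the speaker's code): soft-thresholding of wavelet coefficients in Python, assuming NumPy and PyWavelets are available; the wavelet ('db4'), the noise level, and λ are arbitrary choices, and pywt's multilevel transform stands in for the orthonormal F.

```python
import numpy as np
import pywt  # PyWavelets, assumed available

def soft(z, lam):
    """Soft-thresholding: prox of lam * ||.||_1, applied entrywise."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Noisy piecewise-regular 1-D signal g
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1024)
g = np.sign(np.sin(6 * np.pi * t)) + 0.3 * rng.standard_normal(t.size)

# zeta = F g: dyadic (here orthogonal) wavelet transform
coeffs = pywt.wavedec(g, "db4", mode="periodization")

# u_hat = F^T soft_lambda(F g): threshold detail coefficients, then invert
lam = 0.2
u_hat = pywt.waverec([coeffs[0]] + [soft(c, lam) for c in coeffs[1:]],
                     "db4", mode="periodization")
```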

Non-smooth optimization

û ∈ Argmin_{u∈R^Ω} Σ_{s=1}^S f_s(F_s u), with F_s: linear operators, f_s: proper, convex, l.s.c. functions.

Since 2004, numerous proximal algorithms [Bauschke-Combettes, 2017]:
- Forward-Backward: S = 2, f₁ with Lipschitz gradient, and F₂ = Id
- Douglas-Rachford: S = 2 and F₁ = F₂ = Id
- PPXA: F₁ = ... = F_S = Id
- ADMM: requires inverting Σ_{s=1}^S F_s^⊤F_s
- Primal-dual: ...

Flexibility in the design of objective functions.

Non-smooth optimization

û ∈ Argmin_{u∈R^Ω} Σ_{s=1}^S f_s(F_s u), with F_s: linear operators, f_s: proper, convex, l.s.c. functions.

Handling large-size problems:
- Closed-form expressions of the proximity operators: prox_{f_s}(u) = arg min_ν ½‖ν − u‖₂² + f_s(ν).
- Avoid splitting: compute prox_{Σ_s f_s} directly.
- Exploit properties of f_s (strong convexity) and of F_s.
- Block-coordinate approaches.
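To make the closed-form-prox point concrete, here is a hedged forward-backward (ISTA) sketch for min_u ½‖Au − g‖₂² + λ‖u‖₁, i.e., S = 2 with f₁ smooth and f₂ = λ‖·‖₁; the matrix A, λ, and the iteration count are placeholder choices.

```python
import numpy as np

def soft(z, lam):
    # Closed-form prox of lam * ||.||_1: no inner splitting needed
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def forward_backward(A, g, lam, n_iter=200):
    """ISTA for min_u 0.5 * ||A u - g||_2^2 + lam * ||u||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of grad f1
    u = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ u - g)               # forward (gradient) step on f1
        u = soft(u - step * grad, step * lam)  # backward (prox) step on f2
    return u

rng = np.random.default_rng(1)
A = rng.standard_normal((64, 128))
u_true = np.zeros(128)
u_true[[5, 40, 90]] = [1.0, -2.0, 1.5]
g = A @ u_true + 0.01 * rng.standard_normal(64)
u_hat = forward_backward(A, g, lam=0.05)
```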

Summary
1. Basics: wavelets and proximal tools
2. Segmentation by means of proximal tools
3. Two-step texture segmentation relying on a scale-free descriptor
4. Joint texture segmentation

Mumford-Shah [Mumford-Shah, 1989]

minimize_{u,K} ½∫_Ω (u − g)² dxdy + β∫_{Ω∖K} |∇u|² dxdy + λH¹(K ∩ Ω)
(fidelity) (smoothness) (length)

- Ω: image domain,
- g ∈ L^∞(Ω): input (possibly noisy),
- u ∈ W^{1,2}(Ω): piecewise smooth approximation of g, where W^{1,2}(Ω) = {u ∈ L²(Ω) : ∇u ∈ L²(Ω)} and ∇ is the weak derivative operator,
- K: set of discontinuities,
- H¹: Hausdorff measure.

[Figure: g → (û, K̂).]

Total variation model

minimize_{u,K} ½∫_Ω (u − g)² dxdy + β∫_{Ω∖K} |∇u|² dxdy + λH¹(K ∩ Ω)

Discrete piecewise constant relaxation:

minimize_u ½‖u − g‖₂² + λ TV(u)

+ Convex.
+ Fast implementation due to strong convexity.

TV denotes some form of the 2-D discrete total variation, i.e.,

(∀u ∈ R^Ω) TV(u) = Σ_{i₁=1}^{N₁} Σ_{i₂=1}^{N₂} √( |u_{i₁+1,i₂} − u_{i₁,i₂}|² + |u_{i₁,i₂+1} − u_{i₁,i₂}|² ) = ‖Du‖_{2,1}.
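As a quick illustration (an off-the-shelf substitute, not the talk's implementation), the discrete model above can be solved with scikit-image's Chambolle-type TV denoiser; the test image and the weight (which plays the role of λ up to the library's scaling) are arbitrary.

```python
import numpy as np
from skimage import data
from skimage.restoration import denoise_tv_chambolle

# Noisy image g
rng = np.random.default_rng(2)
g = data.camera().astype(float) / 255.0
g = g + 0.1 * rng.standard_normal(g.shape)

# Piecewise constant estimate: approx. argmin_u 0.5*||u - g||^2 + lam*TV(u)
u_tv = denoise_tv_chambolle(g, weight=0.1)
```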

Total variation model

[Figure: g; û_TV with λ = 100; û_TV with λ = 500.]

Proposed Discrete Mumford-Shah [Foare-Pustelnik-Condat, 2018]

minimize_{u,e} ½‖u − g‖₂² + β‖(1 − e) ⊙ Du‖₂² + λR(e)

- Ω = {1,...,N₁} × {1,...,N₂},
- g ∈ R^Ω: input (possibly noisy),
- u ∈ R^Ω: piecewise smooth approximation of g,
- D ∈ R^{|E|×|Ω|}: models a finite difference operator,
- e ∈ R^{|E|}: edge variable between nodes, whose value is 1 when a contour change is detected and 0 otherwise,
- R: non-smooth, to favor sparse solutions (i.e., a short K).

Solved with a hybrid linearized proximal alternating minimization (an alternative to PALM [Bolte et al., 2014]).

Segmentation methods: summary [Pustelnik-Condat, 2017]

Total Variation:  + fast; + piecewise constant;  − inaccurate contours.
Discrete MS:      + extracts contours; + identifies smooth variations; + piecewise smooth → piecewise constant;  − time consuming; − parameters to tune.
Chan-Vese:        + good segmentation results;  − time consuming; − parameters to tune: number of labels, mean values μ_q.

Summary
1. Basics: wavelets and proximal tools
2. Segmentation by means of proximal tools
3. Two-step texture segmentation relying on a scale-free descriptor
4. Joint texture segmentation

Local regularity (1D)

[Figure: sample 1-D signal.]

f is α-regular at y ⟺ |f(x) − f(y)| ≤ χ|x − y|^α. Examples: α = 1/10, α = 1/2.

Local regularity (1D)

Definition: (∀y) h(y) = sup{α such that f is α-regular at y}.

Compute h(y) at every point?

Pointwise regularity and wavelet transform modulus [extracted from Mallat, 1998]

log₂ |Wf(u, s)| ≤ log₂ A + (α + 1/2) log₂ s.

To extract α at each location: compute the slope. But the continuous wavelet transform is not adapted to large-size images.

Local regularity and wavelet leaders

Discrete wavelet coefficients:
- Coefficients at scale j ∈ {1,...,J} and subband m ∈ {1,2,3}: ζ_{j,m} = H_{j,m} g.
- Orthonormal transform: F = [H_{1,1}^⊤, ..., H_{J,3}^⊤, L_{J,4}^⊤]^⊤ and ζ = Fg, where g ∈ R^N, H_{j,m} ∈ R^{(N/4^j)×N}, and L_{J,4} ∈ R^{(N/4^J)×N}.

Local regularity and wavelet leaders

Wavelet leader at scale j and location k: local supremum of all wavelet coefficients taken within a spatial neighborhood, across all finer scales j′ ≤ j:

L_{j,k} = sup_{m∈{1,2,3}, λ_{j′,k′} ⊂ Λ_{j,k}} |ζ_{j′,m,k′}|, where λ_{j,k} = [k2^j, (k+1)2^j) and Λ_{j,k} = ∪_{p∈{−1,0,1}²} λ_{j,k+p}.
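A rough NumPy sketch of the leaders computation, under simplifying assumptions: the three detail subbands per scale come from PyWavelets' orthogonal 2-D transform (an assumption, not the slides' filterbank), suprema are propagated up from finer scales with 2×2 block maxima, and the 3×3 spatial neighborhood Λ_{j,k} is handled with a maximum filter.

```python
import numpy as np
import pywt
from scipy.ndimage import maximum_filter

def wavelet_leaders(g, wavelet="db3", J=4):
    """leaders[j-1][k] = sup of |coefficients| over the 3x3 neighborhood
    of k at scale j, including all finer scales j' <= j."""
    coeffs = pywt.wavedec2(g, wavelet, mode="periodization", level=J)
    details = coeffs[1:][::-1]          # reorder: finest scale (j = 1) first
    leaders, S = [], None
    for (cH, cV, cD) in details:
        s = np.maximum(np.abs(cH), np.maximum(np.abs(cV), np.abs(cD)))
        if S is not None:
            # carry suprema up from the finer scale: 2x2 block maximum
            h, w = S.shape
            S = S[: h - h % 2, : w - w % 2]
            S = S.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
            s = np.maximum(s, S[: s.shape[0], : s.shape[1]])
        S = s
        leaders.append(maximum_filter(s, size=3, mode="wrap"))
    return leaders
```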

Multiresolution + nonlinearity → local regularity

Behavior through the scales [Jaffard, 2004]: L_{j,k} ≃ s_n 2^{j h_n} when 2^j → 0 (where k = 2^j n).

Linear regression across scales [Wendt et al., 2009]: ĥ_n = Σ_j w_{j,k} log₂ L_{j,k}, unbiased when Σ_j w_{j,k} = 0 and Σ_j j·w_{j,k} = 1.

[Figure: mask (Ω₁, Ω₂), original g, estimate ĥ.]
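A sketch of the regression step; the closed-form weights below are the ordinary-least-squares choice satisfying the two unbiasedness constraints (an assumption, not necessarily the exact weighting of [Wendt et al., 2009]), and replication is used to align the per-scale grids of the previous sketch (finest scale first).

```python
import numpy as np

def regression_weights(js):
    """Weights w_j with sum_j w_j = 0 and sum_j j * w_j = 1 (OLS slope)."""
    js = np.asarray(js, float)
    S0, S1, S2 = len(js), js.sum(), (js ** 2).sum()
    return (S0 * js - S1) / (S0 * S2 - S1 ** 2)

def local_regularity(leaders, js):
    """h_hat = sum_j w_j log2 L_j, all scales upsampled to the finest grid."""
    w = regression_weights(js)
    h = np.zeros(leaders[0].shape)
    for wj, Lj in zip(w, leaders):
        r0 = h.shape[0] // Lj.shape[0]
        r1 = h.shape[1] // Lj.shape[1]
        h += wj * np.log2(np.kron(Lj, np.ones((r0, r1))))  # replication
    return h
```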

Multiresolution + nonlinearity + nonsmooth

Total variation: piecewise constant estimate

ĥ_TV = arg min_u ½‖u − Σ_j w_j log₂ L_j‖₂² + λ‖Du‖₁

[Diagram: linear transform (wavelet) → nonlinear transform (log₂ of leaders) → linear transform (regression across scales) → ĥ, followed by nonlinear ℓ₁ minimization.]

Multiresolution + nonlinearity + nonsmooth

[Figure: mask (Ω₁, Ω₂), original g, estimate ĥ, estimate ĥ_TV.]

Summary
1. Basics: wavelets and proximal tools
2. Segmentation by means of proximal tools
3. Two-step texture segmentation relying on a scale-free descriptor
4. Joint texture segmentation

Multiresolution + nonlinearity + nonsmooth

Total variation: joint estimation and segmentation [Pustelnik et al., 2016]

(ĥ_TVW, ŵ) = arg min_{u,w} ½‖u − Σ_j w_j log₂ L_j‖₂² + λ‖Du‖₁ + d_C(w)

Relaxed unbiasedness constraint: C = {w ∈ R^{J×|Ω|} : (∀k) Σ_j w_{j,k} = 0 and Σ_j j·w_{j,k} = 1},

d_C(w) = ‖w − P_C(w)‖₂, with P_C(w) = arg min_{ν∈C} ½‖ν − w‖₂².
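C acts independently on each pixel k as the intersection of two hyperplanes, so P_C has a closed form: the affine projection w − A^⊤(AA^⊤)^{−1}(Aw − b) with A = [[1,...,1],[1,2,...,J]] and b = (0,1). A hedged NumPy sketch of this derivation (not the talk's code):

```python
import numpy as np

def project_C(w):
    """P_C for C = {w : sum_j w_j = 0, sum_j j * w_j = 1}, pixelwise.
    w has shape (J, n_pixels)."""
    J = w.shape[0]
    A = np.vstack([np.ones(J), np.arange(1, J + 1, dtype=float)])  # (2, J)
    b = np.array([0.0, 1.0])
    M = np.linalg.inv(A @ A.T)                                     # (2, 2)
    return w - A.T @ (M @ (A @ w - b[:, None]))

def dist_C(w):
    """d_C(w) = ||w - P_C(w)||_2."""
    return np.linalg.norm(w - project_C(w))
```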

Multiresolution + nonlinearity + nonsmooth

[Figure: mask (Ω₁, Ω₂), original g, estimates ĥ, ĥ_TV, ĥ_TVW.]

Multiresolution + nonlinearity + nonsmooth

[Figures: original g; estimates ĥ, ĥ_TV, ĥ_TVW; comparison with [Yuan et al., 2015] and [Arbelaez et al., 2011].]

Multiresolution + nonlinearity + nonsmooth

(ĥ_TVW, ŵ) = arg min_{(u,w)} ½‖u − Σ_j w_j log₂ L_j‖₂² + λ‖Du‖₁ + d_C(w)

+ Good texture segmentation performance.
+ Convex minimization formulation.
+ Combined estimation and segmentation (contrary to ĥ_TV).
− Computational cost: not adapted to large-scale data.

Multiresolution + nonlinearity → local regularity

Behavior through the scales [Jaffard, 2004]: L_{j,k} ≃ s_n 2^{j h_n} as 2^j → 0 (where k = 2^j n), hence

log₂ L_{j,k} ≈ log₂ s_n + j h_n as 2^j → 0.

PLOVER: Piecewise constant LOcal VariancE and Regularity estimation [Pascal et al., 2018]

Find (v̂, ĥ) ∈ Argmin_{v,h} Σ_j ‖log₂ L_j − v − jh‖₂² + η‖Dh‖₁ + ζ‖Dv‖₁, with ŝ = 2^v̂.

+ Strongly convex → computationally efficient.
+ Combines estimation and segmentation.
+ Joint estimation of the local variance and local regularity.
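Before the TV terms enter, the data-fit part of PLOVER decouples pixelwise into a 2×2 least-squares problem in (v_n, h_n). The sketch below solves these normal equations as an initialization; it is an illustration of the model, not the PLOVER solver itself.

```python
import numpy as np

def plover_init(log_leaders, js):
    """Pixelwise least squares for log2 L_j ~ v + j * h.
    log_leaders: shape (J, n1, n2), log2-leaders on a common grid."""
    js = np.asarray(js, float)
    S0, S1, S2 = len(js), js.sum(), (js ** 2).sum()
    y0 = log_leaders.sum(axis=0)                        # sum_j y_j
    y1 = (js[:, None, None] * log_leaders).sum(axis=0)  # sum_j j * y_j
    det = S0 * S2 - S1 ** 2
    v = (S2 * y0 - S1 * y1) / det
    h = (S0 * y1 - S1 * y0) / det
    return v, h                                         # s_hat = 2 ** v
```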

Multiresolution + nonlinearity + nonsmooth + fast

[Figure: (a) synthetic texture x, (b) s mask, (c) h mask; estimated local variance and local regularity maps for each method.]

SNR of the estimates:

                      Linear regr.  Disjoint TV  PLOVER   Disjoint re-est.  PLOVER re-est.
Local variance           2.7496        9.9722    10.2854      8.0758           8.0241
Local regularity        -5.3411       -4.2591    -4.1325      0.14181          0.24025

Multiresolution + nonlinearity + nonsmooth + fast

[Figure: image g ∈ R^N with zoom; PLOVER estimates ŝ and ĥ; comparison with [Arbelaez, 2011], [Yuan, 2015], and disjoint TV.]

Conclusions
- HL-PAM for a fast discrete Mumford-Shah: several applications, ranging from image restoration to graph analysis.
- Proximity operator of a sum of two functions: application to segmentation and depth-map estimation.
- Scale-free descriptors in a variational framework: a large-scale texture segmentation procedure.

Perspectives
- TV denoising / Chan-Vese / D-MS: a procedure offering experts accurate estimation and segmentation. D-MS allows going from piecewise smooth to piecewise constant; both are of interest in applications.
- HL-PAM and strong convexity?
- Quantify the dead zone w.r.t. scale.
- Regularization parameter selection.
- Integrate anisotropy.

References
- N. Pustelnik, H. Wendt, P. Abry, N. Dobigeon, Combining local regularity estimation and total variation optimization for scale-free texture segmentation, IEEE Trans. on Computational Imaging, vol. 2, no. 4, pp. 468-479, Dec. 2016.
- N. Pustelnik, L. Condat, Proximity operator of a sum of functions; application to depth map estimation, IEEE Signal Processing Letters, 2017.
- M. Foare, N. Pustelnik, L. Condat, A new proximal method for joint image restoration and edge detection with the Mumford-Shah model, accepted, ICASSP 2018.
- B. Pascal, N. Pustelnik, P. Abry, M. Serres, V. Vidal, Joint estimation of local variance and local regularity for texture segmentation. Application to multiphase flow characterization, submitted to IEEE ICIP 2018.
- J. Frecon, N. Pustelnik, N. Dobigeon, H. Wendt, and P. Abry, Bayesian selection for the regularization parameter in TV-ℓ0 denoising problems, IEEE Trans. on Signal Processing, 2017.

Proposed Discrete Mumford-Shah

minimize_{u,e} ½‖u − g‖₂² + β‖(1 − e) ⊙ Du‖₂² + λR(e)

[Figure: g, û, ê.]

Proposed Discrete Mumford-Shah

minimize_{u,e} ½‖u − g‖₂² + β‖(1 − e) ⊙ Du‖₂² + λR(e)

R: favors binary (i.e., {0,1}^|E|) and sparse solutions (i.e., a short K):
1. Ambrosio-Tortorelli approximation [Ambrosio-Tortorelli, 1990], [Foare-Lachaud-Talbot, 2016]: R(e) = ε‖De‖₂² + (1/(4ε))‖e‖₂², with ε > 0.
2. ℓ₁-norm: R(e) = ‖e‖₁.
3. Quadratic-ℓ₁ [Foare-Pustelnik-Condat, 2017]: R(e) = Σ_{i=1}^{|E|} max{|e_i|, e_i²/(4ε)}.

Proposed Discrete Mumford-Shah

minimize_{u,e} Ψ(u, e) := ½‖u − g‖₂² + S(e, Du) + λR(e), with S(e, Du) = β‖(1 − e) ⊙ Du‖₂².

PALM [Bolte et al., 2014]
Set u^[0] ∈ R^Ω and e^[0] ∈ R^|E|. For l ∈ N:
  Set γ > 1 and c_l = γ χ(e^[l]).
  u^[l+1] = prox_{(1/c_l)·½‖·−g‖₂²}( u^[l] − (1/c_l) ∇_u S(e^[l], Du^[l]) )
  Set δ > 1 and d_l = δ ν(u^[l+1]).
  e^[l+1] = prox_{(1/d_l)·λR}( e^[l] − (1/d_l) ∇_e S(e^[l], Du^[l+1]) )

Under technical assumptions, the sequence (u^[l], e^[l])_{l∈N} converges to a critical point (u*, e*) of Ψ.

Proposed Discrete Mumford-Shah

minimize_{u,e} Ψ(u, e) := ½‖u − g‖₂² + S(e, Du) + λR(e), with S(e, Du) = β‖(1 − e) ⊙ Du‖₂².

Proposed HL-PAM [Foare-Pustelnik-Condat, 2017]
Set u^[0] ∈ R^Ω and e^[0] ∈ R^|E|. For l ∈ N:
  Set γ > 1 and c_l = γ χ(e^[l]).
  u^[l+1] = prox_{(1/c_l)·½‖·−g‖₂²}( u^[l] − (1/c_l) ∇_u S(e^[l], Du^[l]) )
  Set d_l > 0.
  e^[l+1] = prox_{(1/d_l)(λR + S(·, Du^[l+1]))}( e^[l] )

Under technical assumptions, the sequence (u^[l], e^[l])_{l∈N} converges to a critical point (u*, e*) of Ψ.
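A compact NumPy sketch of HL-PAM for the choice R = ‖·‖₁ (so each σ_i = |·|), with forward differences for D, the bound ‖D‖² ≤ 8, and the separable e-update given by the proposition below; the step rules (fixed γ and d_l), boundary handling, and iteration count are simplifying assumptions, not the paper's exact settings.

```python
import numpy as np

def grad(u):
    """D: vertical/horizontal forward differences (Neumann boundary)."""
    gy = np.zeros_like(u); gy[:-1, :] = u[1:, :] - u[:-1, :]
    gx = np.zeros_like(u); gx[:, :-1] = u[:, 1:] - u[:, :-1]
    return np.stack([gy, gx])

def grad_adj(p):
    """D^T: adjoint of grad (a negative divergence)."""
    out = np.zeros_like(p[0])
    out[1:, :] += p[0][:-1, :]; out[:-1, :] -= p[0][:-1, :]
    out[:, 1:] += p[1][:, :-1]; out[:, :-1] -= p[1][:, :-1]
    return out

def hl_pam_l1(g, beta=20.0, lam=0.01, gamma=1.01, d=1.0, n_iter=300):
    """Sketch: min_{u,e} 0.5||u-g||^2 + beta||(1-e) o Du||^2 + lam||e||_1."""
    u, e = g.copy(), np.zeros((2,) + g.shape)
    for _ in range(n_iter):
        # u-step: gradient step on S(e, D.), then prox of (1/c)*0.5||.-g||^2
        c = gamma * 2.0 * beta * 8.0 * max(np.max((1 - e) ** 2), 1e-12)
        v = u - (2.0 * beta / c) * grad_adj((1 - e) ** 2 * grad(u))
        u = (c * v + g) / (c + 1.0)
        # e-step: exact separable prox of (1/d)(lam||.||_1 + S(., Du)) at e
        q = 2.0 * beta * grad(u) ** 2
        t = (q + d * e) / (q + d)       # quadratic part of S absorbed
        e = np.sign(t) * np.maximum(np.abs(t) - lam / (q + d), 0.0)
    return u, e
```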

Proposed Discrete Mumford-Shah

Assumptions
1. The updates of u^[l+1] and e^[l+1] have closed-form expressions;
2. ∇_u S(e^[l], ·) is globally Lipschitz with modulus χ(e^[l]) for every l ∈ N, and there exist χ₋, χ₊ > 0 such that χ₋ ≤ χ(e^[l]) ≤ χ₊;
3. (d_l)_{l∈N} is a positive sequence such that the stepsizes d_l belong to (d₋, d₊) for some positive d₋ ≤ d₊.

Proposed Discrete Mumford-Shah

Proposition [Foare-Pustelnik-Condat, 2017]
Assume that R is separable, i.e., (∀e = (e_i)_{1≤i≤|E|}) R(e) = Σ_i σ_i(e_i), where each σ_i : R → ]−∞, +∞] has a closed-form proximity operator. Let d_l > 0. Then

prox_{(1/d_l)(λR + S(·, Du^[l+1]))}(e^[l]) = ( prox_{λσ_i / (2β(Du^[l+1])_i² + d_l)}( (2β(Du^[l+1])_i² + d_l e_i^[l]) / (2β(Du^[l+1])_i² + d_l) ) )_{1≤i≤|E|}.

Proposed Discrete Mumford-Shah

Proposition [Foare-Pustelnik-Condat, 2017]
For every η ∈ R and τ, ε > 0,

prox_{τ max{|·|, (·)²/(4ε)}}(η) = sign(η) max{ 0, min( |η| − τ, max( 4ε, |η| / (τ/(2ε) + 1) ) ) }.
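A direct NumPy transcription of this closed form, with a brute-force grid check; the test values of η, τ, ε are arbitrary.

```python
import numpy as np

def prox_quad_l1(eta, tau, eps):
    """prox of tau * max{|.|, (.)^2 / (4 eps)}, evaluated elementwise."""
    eta = np.asarray(eta, float)
    inner = np.maximum(4 * eps, np.abs(eta) / (tau / (2 * eps) + 1))
    return np.sign(eta) * np.maximum(0.0, np.minimum(np.abs(eta) - tau, inner))

# Sanity check: grid minimization of tau * f(x) + 0.5 * (x - eta)^2
eta, tau, eps = 1.7, 0.4, 0.2
x = np.linspace(-5, 5, 200001)
obj = tau * np.maximum(np.abs(x), x ** 2 / (4 * eps)) + 0.5 * (x - eta) ** 2
assert abs(x[np.argmin(obj)] - prox_quad_l1(eta, tau, eps)) < 1e-3
```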

Proposed Discrete Mumford-Shah

Convergence, PALM versus HL-PAM: Ψ(u^[l], e^[l]) w.r.t. iterations l.

[Figure: objective decay over 300 iterations for PALM with d_l = 0.5/β and HL-PAM with d_l ∈ {0.5/β, 5/β, 50/β, 500/β}.]

Proposed Discrete Mumford-Shah

[Figures, four examples: data g; TV; [Strekalovskiy-Cremers, 2014]; [Foare-Lachaud-Talbot, 2016]; ℓ₁; quadratic-ℓ₁.]

Proposed Discrete Mumford-Shah

Convergence speed: time until |Ψ(u^[l+1], e^[l+1]) − Ψ(u^[l], e^[l])| < 10⁻⁴

Image                   TV    [Foare-Lachaud-Talbot, 2016]    ℓ₁     quadratic-ℓ₁
dots (|Ω| = 128²)       0.4        43.6                        2.2       2.1
dots (|Ω| = 256²)       2.2       231.3                        6.2       5.5
dots (|Ω| = 512²)      30.8      1446.5                      116.3      90.3
ellipse (|Ω| = 128²)    0.7        55.3                        7.5       4.3
ellipse (|Ω| = 256²)    4.4       507.2                       34.8      17.2
ellipse (|Ω| = 512²)   48.8      5038.6                      535.7     385.6
peppers (|Ω| = 128²)    1.1       167.7                       22.3      19.9
peppers (|Ω| = 256²)    8.8      1014.4                       78.6      81.3
peppers (|Ω| = 512²)   61.8     10038.6                      647.5     650.8

Chan-Vese model

minimize_{u,K} ½∫_Ω (u − g)² dxdy + β∫_{Ω∖K} |∇u|² dxdy + λH¹(K ∩ Ω)

Discrete piecewise constant relaxation with fixed label number [Chan-Vese, 2001]:

minimize_{(θ^(q))_{1≤q≤Q−1}} Σ_{q=1}^Q ⟨θ^(q−1) − θ^(q), (μ_q − g)²⟩ + λ Σ_{q=1}^Q TV(θ^(q−1) − θ^(q))
s.t. 1 ≡ θ^(0) ≥ θ^(1) ≥ ... ≥ θ^(Q−1) ≥ θ^(Q) ≡ 0.

[Figure: a three-region image g (Ω₁, Ω₂, Ω₃) and its level functions θ^(0) ≥ θ^(1) ≥ θ^(2) ≥ θ^(3).]

Chan-Vese model

minimize_{Θ=(θ^(q))_{1≤q≤Q−1}} Σ_{q=1}^{Q−1} ⟨β^(q), θ^(q)⟩ + λ Σ_{q=1}^Q ‖DH_qΘ‖_{2,1} + ι_{[0,1]^{Q×|Ω|}}(Θ) + ι_E(Θ)

- β^(q) = (μ_{q+1} − g)² − (μ_q − g)²,
- H_q : R^{Q×|Ω|} → R^{|Ω|} : Θ ↦ θ^(q−1) − θ^(q),
- E = {Θ ∈ R^{Q×|Ω|} : θ^(1) ≥ ... ≥ θ^(Q−1)}.

Use of splitting proximal algorithms to deal with a sum of convex but non-smooth functions.

Chan-Vese model

Three-term splitting: minimize_Θ [Σ_{q=1}^{Q−1} ⟨β^(q), θ^(q)⟩ + λ Σ_{q=1}^Q ‖DH_qΘ‖_{2,1}] + ι_{[0,1]^{Q×|Ω|}}(Θ) + ι_E(Θ)

Two-term splitting: minimize_Θ [Σ_{q=1}^{Q−1} ⟨β^(q), θ^(q)⟩ + λ Σ_{q=1}^Q ‖DH_qΘ‖_{2,1}] + [ι_{[0,1]^{Q×|Ω|}}(Θ) + ι_E(Θ)]

Question: when is it possible to compute the proximity operator of a sum of functions rather than splitting it? Would it be more efficient?

Chan-Vese model

Proposition [Pustelnik-Condat, 2017]
(i) h is separable: for some h₀ ∈ Γ₀(R), (∀x = (x_i)_{i∈Ω}) h(x) = Σ_{i∈Ω} h₀(x_i).
(ii) g has the form (∀x = (x_i)_{i∈Ω}) g(x) = Σ_{(m,m′)∈Υ⊂Ω²} σ_{C_{m,m′}}(x_m − x_{m′}), where σ_{C_{m,m′}} : t ∈ R ↦ sup{tp : p ∈ C_{m,m′}} is the support function of a closed real interval C_{m,m′} with inf C_{m,m′} = a_{m,m′} and sup C_{m,m′} = b_{m,m′}, for some a_{m,m′} ∈ R ∪ {−∞} and b_{m,m′} ∈ R ∪ {+∞}, a_{m,m′} ≤ b_{m,m′}:

(∀t ∈ R) σ_{C_{m,m′}}(t) = a_{m,m′} t if t < 0; 0 if t = 0; b_{m,m′} t if t > 0.

Under assumptions (i) and (ii), prox_{g+h} = prox_h ∘ prox_g.

Chan-Vese model

Particular cases:
- Fused Lasso: Ω = {1,...,N} and Υ = {(1,2), (2,3), ..., (N−1,N)}, with b_{n,n+1} = −a_{n,n+1} = ω_n ≥ 0 and h₀ = λ|·|, so that g(x) = Σ_{n=1}^{N−1} ω_n |x_{n+1} − x_n|.
- Chan-Vese: Ω = {1,...,Q} and Υ = {(1,2), (2,3), ..., (Q−1,Q)}, with a_{n,n+1} = 0, b_{n,n+1} = +∞, and h₀ = ι_{[0,1]}, so that g = ι_E.

Compute P_E with the Pool Adjacent Violators Algorithm (PAVA) [Ayer et al., 1955]; see the sketch below.
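A hedged sketch of the Chan-Vese case: a standard stack-based PAVA (my implementation, not the reference's pseudocode) computes P_E; since E asks for a nonincreasing sequence, the input is reversed around a nondecreasing projection, and composing with clipping to [0,1] gives prox_{ι_[0,1]+ι_E} = P_{[0,1]} ∘ P_E per the proposition above.

```python
import numpy as np

def pava_increasing(y):
    """Euclidean projection onto nondecreasing sequences (stack-based PAVA)."""
    blocks = []                              # (mean, size) of merged blocks
    for v in y:
        m, w = float(v), 1
        while blocks and blocks[-1][0] > m:  # violator: pool the blocks
            pm, pw = blocks.pop()
            m = (pm * pw + m * w) / (pw + w)
            w += pw
        blocks.append((m, w))
    out = np.empty(len(y))
    i = 0
    for m, w in blocks:
        out[i : i + w] = m
        i += w
    return out

def prox_chanvese(theta):
    """prox of iota_[0,1] + iota_E: nonincreasing projection, then clipping."""
    p_E = pava_increasing(theta[::-1])[::-1]
    return np.clip(p_E, 0.0, 1.0)

# e.g. prox_chanvese(np.array([0.2, 1.4, 0.9, -0.1])) -> nonincreasing in [0,1]
```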

Chan-Vese model

[Figure: g; results for λ = 10³ and λ = 10⁴.]

Chan-Vese model

[Figure: convergence comparison on a log scale (10⁰ to 10¹⁰) over 15000 iterations: minimal splitting (proposed method), intermediate splitting, full splitting.]