Mathematical methods for Image Processing

1 Mathematical methods for Image Processing
François Malgouyres, Institut de Mathématiques de Toulouse, France
Invitation by Jidesh P., NITK Surathkal. Funding: Global Initiative on Academic Network. Oct
François Malgouyres (IMT) Mathematics for Image Processing Oct / 16

2 Plan
1 Non-smooth optimization: the proximal gradient algorithm

3 The non-smooth problem
We consider the minimization problem
$$ w^* \in \operatorname{Argmin}_{w \in W} \mathbf{E}(w) $$
with
$$ \mathbf{E}(w) = E(w) + R(w), \quad \text{for all } w \in W, $$
where
$W$ is a Euclidean space,
$E$ is convex, coercive, differentiable with a Lipschitz gradient,
$R$ is lower semi-continuous, proper, convex and simple.
Here $\mathbf{E}$ denotes the full objective and $E$ its smooth part.
Definition (proximal operator and simple) We say $R$ is simple if there is a simple way to compute
$$ \operatorname{prox}^t_R(w') = \operatorname{Argmin}_{w \in W} \frac{t}{2} \|w - w'\|_2^2 + R(w). $$
(e.g. it is given in a closed form expression or computed by a fast algorithm.)

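4 Example 1 : R is a characteristic function
Let $C \subset W$ be a non-empty closed set (convex, so that $R$ is convex) and
$$ R(w) = \chi_C(w) = \begin{cases} 0, & \text{if } w \in C, \\ +\infty, & \text{otherwise.} \end{cases} $$
Then
$$ \operatorname{prox}^t_R(w') = \operatorname{Argmin}_{w \in W} \frac{t}{2} \|w - w'\|_2^2 + R(w) = \operatorname{Argmin}_{w \in C} \|w - w'\|_2^2, $$
i.e. $\operatorname{prox}^t_R(w')$ is the projection of $w'$ onto $C$.
It is usually easy to compute when (for instance) $C$ is an $\ell^1$, $\ell^2$ or $\ell^\infty$ ball, or an affine space.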
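The projection interpretation can be illustrated with a minimal NumPy sketch. The function names, the radius parameter and the test vector are illustrative assumptions, not part of the slides; only the $\ell^2$ and $\ell^\infty$ balls are covered, since their projections have a one-line closed form.

```python
import numpy as np

def project_l2_ball(w_prime, radius=1.0):
    """Euclidean projection onto {v : ||v||_2 <= radius} (hypothetical helper)."""
    norm = np.linalg.norm(w_prime)
    if norm <= radius:
        return w_prime.copy()          # already inside C: the projection is the point itself
    return w_prime * (radius / norm)   # otherwise rescale onto the sphere

def project_linf_ball(w_prime, radius=1.0):
    """Euclidean projection onto {v : max_i |v_i| <= radius} (hypothetical helper)."""
    return np.clip(w_prime, -radius, radius)  # entrywise clipping

# Illustrative data (assumption).
w_prime = np.array([3.0, -4.0, 0.5])
p2 = project_l2_ball(w_prime, radius=1.0)
pinf = project_linf_ball(w_prime, radius=1.0)
```

Note that the parameter $t$ plays no role here: multiplying the squared distance by $t/2$ does not change the closest point of $C$.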

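5 Example 2 : R is $\|.\|_1$
If $R(w) = \|w\|_1 = \sum_i |w_i|$, we have
$$ \operatorname{prox}^L_R(w') = \operatorname{Argmin}_{w \in W} \frac{L}{2} \|w - w'\|_2^2 + \|w\|_1 = \operatorname{Argmin}_{w \in W} \sum_i \left( \frac{L}{2} (w_i - w'_i)^2 + |w_i| \right). \quad (1) $$
The $i$-th entry of $\operatorname{prox}^L_R(w')$ is
$$ \operatorname{prox}^L_R(w')_i = \operatorname{Argmin}_{t \in \mathbb{R}} \frac{L}{2} (t - w'_i)^2 + |t|. $$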

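6 Example 2 : R is $\|.\|_1$
The $i$-th entry of $\operatorname{prox}^L_R(w')$ is
$$ \operatorname{prox}^L_R(w')_i = \operatorname{Argmin}_{t \in \mathbb{R}} \frac{L}{2} (t - w'_i)^2 + |t|. $$
Proof: Let $v_i = \operatorname{Argmin}_{t \in \mathbb{R}} \frac{L}{2} (t - w'_i)^2 + |t|$ and $v = (v_i)_i$. Since for every $i$ and every $w \in W$
$$ \frac{L}{2} (v_i - w'_i)^2 + |v_i| \le \frac{L}{2} (w_i - w'_i)^2 + |w_i|, $$
we have, summing over $i$,
$$ \frac{L}{2} \|v - w'\|_2^2 + \|v\|_1 = \sum_i \left( \frac{L}{2} (v_i - w'_i)^2 + |v_i| \right) \le \sum_i \left( \frac{L}{2} (w_i - w'_i)^2 + |w_i| \right) = \frac{L}{2} \|w - w'\|_2^2 + \|w\|_1. $$
Therefore $\operatorname{prox}^L_R(w') = v$.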

7 Example 2 : R is $\|.\|_1$
$\operatorname{prox}^L_R(w')_i$ is obtained by soft thresholding $w'_i$:
$$ \operatorname{prox}^L_R(w')_i = \begin{cases} w'_i - \frac{1}{L}, & \text{if } w'_i > \frac{1}{L}, \\ 0, & \text{if } -\frac{1}{L} \le w'_i \le \frac{1}{L}, \\ w'_i + \frac{1}{L}, & \text{if } w'_i < -\frac{1}{L}. \end{cases} $$
Proof: We recall that $p_i := \operatorname{prox}^L_R(w')_i = \operatorname{Argmin}_{t \in \mathbb{R}} \frac{L}{2} (t - w'_i)^2 + |t|$, write the first-order optimality condition and distinguish three cases.
Case 1 : $p_i > 0$
$$ p_i > 0 \iff p_i > 0 \text{ and } L (p_i - w'_i) + 1 = 0 \iff p_i = w'_i - \tfrac{1}{L} \text{ and } w'_i > \tfrac{1}{L}. $$
Case 2 : $p_i = 0$
$$ p_i = 0 \iff p_i = 0 \text{ and } -L (p_i - w'_i) \in [-1, 1] \iff p_i = 0 \text{ and } -\tfrac{1}{L} \le w'_i \le \tfrac{1}{L}. $$
Case 3 : $p_i < 0$
$$ p_i < 0 \iff p_i < 0 \text{ and } L (p_i - w'_i) - 1 = 0 \iff p_i = w'_i + \tfrac{1}{L} \text{ and } w'_i < -\tfrac{1}{L}. $$
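The three cases collapse into a single vectorized expression. The sketch below is one possible NumPy transcription, assuming the threshold $1/L$ that results from the weight $L/2$ on the quadratic term; the function name and the sample values are illustrative assumptions.

```python
import numpy as np

def soft_threshold(w_prime, L):
    """Entrywise prox of ||.||_1 with weight L/2 on the quadratic term.

    Shrinks every entry towards 0 by 1/L and sets entries with |w'_i| <= 1/L to 0.
    """
    return np.sign(w_prime) * np.maximum(np.abs(w_prime) - 1.0 / L, 0.0)

# Illustrative data (assumption): with L = 2 the threshold is 0.5.
w_prime = np.array([2.0, 0.3, -0.1, -1.5])
p = soft_threshold(w_prime, L=2.0)

# Brute-force check of one entry against the scalar problem it solves.
grid = np.linspace(-3.0, 3.0, 60001)
scalar_objective = 0.5 * 2.0 * (grid - w_prime[0]) ** 2 + np.abs(grid)
brute_force_minimizer = grid[np.argmin(scalar_objective)]
```

The grid search is only a sanity check of the closed form on a single coordinate; it is not how the prox would be computed in practice.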

10 Example 3 : smooth case
If $R$ is continuously differentiable,
$$ \operatorname{prox}^t_R(w') = \operatorname{Argmin}_{w \in W} \frac{t}{2} \|w - w'\|_2^2 + R(w) $$
satisfies
$$ t \left( \operatorname{prox}^t_R(w') - w' \right) + \nabla R(\operatorname{prox}^t_R(w')) = 0, $$
therefore
$$ \operatorname{prox}^t_R(w') = w' - \frac{1}{t} \nabla R(\operatorname{prox}^t_R(w')). $$
$\operatorname{prox}^t_R(w')$ is an implicit gradient step with step-size $\frac{1}{t}$.
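As an illustration of the implicit step, take the assumed example $R(w) = \frac{\lambda}{2}\|w\|_2^2$ (not from the slides). The optimality condition $t(p - w') + \lambda p = 0$ gives the closed form $p = \frac{t}{t + \lambda} w'$, and the sketch below checks numerically that this $p$ satisfies the implicit gradient equation. All names and values are hypothetical.

```python
import numpy as np

def prox_quadratic(w_prime, t, lam):
    """prox^t_R for the assumed example R(w) = lam/2 * ||w||_2^2,
    with the slide convention of a weight t/2 on ||w - w'||_2^2."""
    return t * w_prime / (t + lam)

# Illustrative data (assumption).
w_prime = np.array([1.0, -2.0, 4.0])
t, lam = 2.0, 0.5

p = prox_quadratic(w_prime, t, lam)
grad_R_at_p = lam * p                        # gradient of R evaluated at the prox itself
implicit_step = w_prime - grad_R_at_p / t    # w' - (1/t) * grad R(prox)
residual = p - implicit_step                 # should vanish
```

The gradient is evaluated at the output $p$, not at the input $w'$, which is what makes the step implicit.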

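11 The proximal gradient algorithm
Also known as forward-backward algorithm, implicit-explicit, ISTA, PALM...
Algorithm 2 Proximal gradient algorithm
Input: Input needed for computing $\mathbf{E}$, $\nabla E$ and $\operatorname{prox}^t_R(.)$
Output: Approximation of a minimizer of $\mathbf{E}$ : $w$
Initialize $w$
While Not converged Do
  Compute $d = \nabla E(w)$
  Compute a step-size $t > 0$
  Update : $w \leftarrow \operatorname{prox}^{1/t}_R(w - t\,d)$
End while
With the convention used in the definition of $\operatorname{prox}^t_R$ (weight $t/2$ on the quadratic term), a gradient step of size $t$ is paired with the proximal parameter $1/t$; the convergence proof uses $t = 1/L$, i.e. $w \leftarrow \operatorname{prox}^L_R(w - \frac{1}{L} d)$.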
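The loop above can be sketched in NumPy on an assumed test problem that is not in the slides: $E(w) = \frac{1}{2}\|Aw - b\|_2^2$ and $R(w) = \lambda \|w\|_1$. For this choice $\nabla E(w) = A^T(Aw - b)$, the Lipschitz constant of $\nabla E$ is the squared spectral norm of $A$, and the proximal step is a soft thresholding with threshold $t\lambda$. The matrix `A`, the vector `b`, `lam` and the fixed iteration count are all hypothetical.

```python
import numpy as np

def soft_threshold(v, threshold):
    """Entrywise soft thresholding: prox of threshold * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - threshold, 0.0)

def proximal_gradient_lasso(A, b, lam, n_iter=500):
    """Proximal gradient iterations for 0.5*||Aw - b||^2 + lam*||w||_1 (assumed problem)."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of grad E (squared spectral norm)
    t = 1.0 / L                        # constant step-size
    w = np.zeros(A.shape[1])
    objective = []
    for _ in range(n_iter):            # fixed budget stands in for "While Not converged"
        d = A.T @ (A @ w - b)          # d = grad E(w)
        w = soft_threshold(w - t * d, t * lam)   # proximal step on R
        objective.append(0.5 * np.sum((A @ w - b) ** 2) + lam * np.sum(np.abs(w)))
    return w, np.array(objective)

# Illustrative random data (assumption).
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
w_hat, objective = proximal_gradient_lasso(A, b, lam=0.5)
```

A practical implementation would replace the fixed iteration count with a stopping test, for instance on the change between successive iterates.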

12 Convergence of the Proximal Gradient Algorithm
Theorem (Convergence of the Proximal Gradient algorithm) We consider $\mathbf{E} = E + R$, where
$E : W \to \mathbb{R}$ is convex, coercive, differentiable, with a Lipschitz gradient (a) of constant $L > 0$,
$R$ is lower semi-continuous, proper, convex and coercive.
The sequence $(w_k)_{k \in \mathbb{N}}$ generated by the Proximal gradient Algorithm for the step-size $t = \frac{1}{L}$ (the value used in the proof) is such that
$(\mathbf{E}(w_k))_{k \in \mathbb{N}}$ is non-increasing,
for any minimizer $w^*$ of $\mathbf{E}$,
$$ \mathbf{E}(w_k) - \mathbf{E}(w^*) \le \frac{L}{2k} \|w_0 - w^*\|_2^2. $$
(a) $\forall w, w' \in W$, $\|\nabla E(w') - \nabla E(w)\|_2 \le L \|w' - w\|_2$.
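Both conclusions of the theorem can be checked numerically on a toy problem whose minimizer is known in closed form. The sketch below assumes a separable instance that is not in the slides: $E(w) = \frac{1}{2}\sum_i a_i (w_i - c_i)^2$ with $a_i > 0$ and $R = \|.\|_1$, so that $L = \max_i a_i$ and $w^*_i$ is the soft thresholding of $c_i$ with threshold $1/a_i$. The vectors `a` and `c` are arbitrary illustrative values.

```python
import numpy as np

def soft_threshold(v, threshold):
    """Entrywise soft thresholding (threshold may be a scalar or a vector)."""
    return np.sign(v) * np.maximum(np.abs(v) - threshold, 0.0)

# Assumed separable toy problem.
a = np.array([0.5, 1.0, 2.0, 4.0])
c = np.array([3.0, -0.5, 1.0, -2.0])
L = a.max()                                   # Lipschitz constant of grad E

def full_objective(w):
    return 0.5 * np.sum(a * (w - c) ** 2) + np.sum(np.abs(w))

w_star = soft_threshold(c, 1.0 / a)           # closed-form minimizer, coordinate by coordinate
w0 = np.zeros_like(c)

w = w0.copy()
gaps, bounds = [], []
for k in range(1, 51):
    grad = a * (w - c)                        # grad E(w)
    w = soft_threshold(w - grad / L, 1.0 / L) # step 1/L, prox parameter L
    gaps.append(full_objective(w) - full_objective(w_star))
    bounds.append(L / (2 * k) * np.sum((w0 - w_star) ** 2))
gaps, bounds = np.array(gaps), np.array(bounds)
```

On this instance the observed gap decreases much faster than the $L/(2k)$ bound, which is consistent with the bound being a worst-case guarantee.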

13 Proof (Majorize-Minorize)
Lemma (A quadratic majorant) We have for any $w, w' \in W$
$$ E(w') \le E(w) + \langle \nabla E(w), w' - w \rangle + \frac{L}{2} \|w' - w\|_2^2. $$
Proof: Using the second fundamental theorem of calculus, we have
$$ E(w') = E(w) + \int_0^1 \langle \nabla E(tw + (1-t)w'), w' - w \rangle \, dt. $$
Therefore
$$ \begin{aligned} E(w') - E(w) - \langle \nabla E(w), w' - w \rangle &= \int_0^1 \langle \nabla E(tw + (1-t)w') - \nabla E(w), w' - w \rangle \, dt, \\ &\le \int_0^1 \|\nabla E(tw + (1-t)w') - \nabla E(w)\|_2 \, \|w' - w\|_2 \, dt, \\ &\le L \int_0^1 \|tw + (1-t)w' - w\|_2 \, \|w' - w\|_2 \, dt, \\ &= L \|w' - w\|_2^2 \int_0^1 (1-t) \, dt = \frac{L}{2} \|w' - w\|_2^2. \end{aligned} $$
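The majorant inequality can be sanity-checked numerically. The sketch below assumes the least-squares example $E(w) = \frac{1}{2}\|Aw - b\|_2^2$ (not from the slides), for which $L$ is the squared spectral norm of $A$, and evaluates the gap between the quadratic upper bound and $E(w')$ at random pairs of points. `A`, `b` and the sampling are illustrative assumptions.

```python
import numpy as np

# Assumed least-squares example.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
b = rng.standard_normal(8)
L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of grad E

def E(w):
    return 0.5 * np.sum((A @ w - b) ** 2)

def grad_E(w):
    return A.T @ (A @ w - b)

def majorant_gap(w, w_prime):
    """Quadratic upper bound built at w, evaluated at w_prime, minus E(w_prime)."""
    step = w_prime - w
    upper = E(w) + grad_E(w) @ step + 0.5 * L * np.sum(step ** 2)
    return upper - E(w_prime)

gaps = np.array([
    majorant_gap(rng.standard_normal(5), rng.standard_normal(5))
    for _ in range(1000)
])
```

For this particular $E$ the gap equals $\frac{L}{2}\|w' - w\|_2^2 - \frac{1}{2}\|A(w' - w)\|_2^2$, so it is non-negative and vanishes only along the top singular direction of $A$.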

15 Proof (Majorize-Minorize)
We denote, for $k \ge 1$ and $w \in W$,
$$ F_k(w) = E(w_{k-1}) + \langle \nabla E(w_{k-1}), w - w_{k-1} \rangle + \frac{L}{2} \|w - w_{k-1}\|_2^2. $$
We have (using the previous Lemma)
$$ E(w) \le F_k(w). \quad (1) $$
Lemma (Minorize) We have
$$ w_k = \operatorname{Argmin}_{w \in W} F_k(w) + R(w). \quad (2) $$
We also have, for all $w \in W$,
$$ F_k(w) + R(w) \ge F_k(w_k) + R(w_k) + \frac{L}{2} \|w - w_k\|_2^2. \quad (3) $$

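16 Proof (Majorize-Minorize)
Proof of (2): $w_k = \operatorname{Argmin}_{w \in W} F_k(w) + R(w)$
$$ \begin{aligned} w_k &= \operatorname{prox}^L_R\left( w_{k-1} - \frac{1}{L} \nabla E(w_{k-1}) \right), \\ &= \operatorname{Argmin}_{w \in W} \frac{L}{2} \left\| w - w_{k-1} + \frac{1}{L} \nabla E(w_{k-1}) \right\|_2^2 + R(w), \\ &= \operatorname{Argmin}_{w \in W} \frac{1}{2L} \|\nabla E(w_{k-1})\|_2^2 + \langle w - w_{k-1}, \nabla E(w_{k-1}) \rangle + \frac{L}{2} \|w - w_{k-1}\|_2^2 + R(w), \\ &= \operatorname{Argmin}_{w \in W} F_k(w) + R(w). \end{aligned} $$
The last equality holds because the two objectives differ only by terms that do not depend on $w$.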

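17 Proof (Majorize-Minorize)
Proof of (3) : $F_k(w) + R(w) \ge F_k(w_k) + R(w_k) + \frac{L}{2} \|w - w_k\|_2^2$
First notice that, for all $w \in W$,
$$ \begin{aligned} F_k(w) - F_k(w_k) &= E(w_{k-1}) + \langle \nabla E(w_{k-1}), w - w_{k-1} \rangle + \frac{L}{2} \|w - w_{k-1}\|_2^2 \\ &\quad - \left( E(w_{k-1}) + \langle \nabla E(w_{k-1}), w_k - w_{k-1} \rangle + \frac{L}{2} \|w_k - w_{k-1}\|_2^2 \right), \\ &= \langle \nabla E(w_{k-1}), w - w_k \rangle + \frac{L}{2} \left( \|w - w_k\|_2^2 + 2 \langle w - w_k, w_k - w_{k-1} \rangle \right), \\ &= \frac{L}{2} \|w - w_k\|_2^2 + \langle \nabla E(w_{k-1}) + L (w_k - w_{k-1}), w - w_k \rangle, \\ &= \frac{L}{2} \|w - w_k\|_2^2 + \langle \nabla F_k(w_k), w - w_k \rangle. \end{aligned} $$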

18 Proof (Majorize-Minorize)
End of the proof of (3) : $F_k(w) + R(w) \ge F_k(w_k) + R(w_k) + \frac{L}{2} \|w - w_k\|_2^2$
$$ F_k(w) + R(w) - F_k(w_k) - R(w_k) = \frac{L}{2} \|w - w_k\|_2^2 + \langle \nabla F_k(w_k), w - w_k \rangle + R(w) - R(w_k). $$
Moreover, since $w_k = \operatorname{Argmin}_{w \in W} F_k(w) + R(w)$,
$$ 0 \in \partial (F_k + R)(w_k) = \nabla F_k(w_k) + \partial R(w_k), $$
we have
$$ w_k \in \operatorname{Argmin}_{w \in W} \langle \nabla F_k(w_k), w - w_k \rangle + R(w). $$
Therefore
$$ \langle \nabla F_k(w_k), w - w_k \rangle + R(w) \ge R(w_k) $$
and
$$ F_k(w) + R(w) - F_k(w_k) - R(w_k) \ge \frac{L}{2} \|w - w_k\|_2^2. $$

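19 Proof (Majorize-Minorize)
Let us return to the proof of the main Theorem. Let $w^* \in \operatorname{Argmin}_{w \in W} \mathbf{E}(w)$. Using (1) ($E(w) \le F_k(w)$), inequality (3) of the Minorize Lemma, and the convexity of $E$, we have
$$ \begin{aligned} \mathbf{E}(w_k) &\le F_k(w_k) + R(w_k), \\ &\le F_k(w^*) + R(w^*) - \frac{L}{2} \|w^* - w_k\|_2^2, \\ &= E(w_{k-1}) + \langle \nabla E(w_{k-1}), w^* - w_{k-1} \rangle + R(w^*) + \frac{L}{2} \|w^* - w_{k-1}\|_2^2 - \frac{L}{2} \|w^* - w_k\|_2^2, \\ &\le \mathbf{E}(w^*) + \frac{L}{2} \left( \|w^* - w_{k-1}\|_2^2 - \|w^* - w_k\|_2^2 \right). \end{aligned} $$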

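20 Proof (Majorize-Minorize)
Using $E(w_k) \le F_k(w_k)$ and $w_k = \operatorname{Argmin}_{w \in W} F_k(w) + R(w)$, we have
$$ \mathbf{E}(w_k) \le F_k(w_k) + R(w_k) \le F_k(w_{k-1}) + R(w_{k-1}) = \mathbf{E}(w_{k-1}). $$
In words, $(\mathbf{E}(w_k))_{k \in \mathbb{N}}$ is non-increasing. We therefore have, for all $k' \le k$,
$$ \mathbf{E}(w_k) - \mathbf{E}(w^*) \le \mathbf{E}(w_{k'}) - \mathbf{E}(w^*) $$
and therefore
$$ \begin{aligned} \mathbf{E}(w_k) - \mathbf{E}(w^*) &\le \frac{1}{k} \sum_{k'=1}^{k} \left( \mathbf{E}(w_{k'}) - \mathbf{E}(w^*) \right), \\ &\le \frac{L}{2k} \sum_{k'=1}^{k} \left( \|w^* - w_{k'-1}\|_2^2 - \|w^* - w_{k'}\|_2^2 \right), \\ &= \frac{L}{2k} \left( \|w^* - w_0\|_2^2 - \|w^* - w_k\|_2^2 \right), \\ &\le \frac{L}{2k} \|w^* - w_0\|_2^2. \end{aligned} $$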

21 To go further
An accelerated version exists (convergence in $O(\frac{1}{k^2})$): FISTA (Beck-Teboulle)
For other algorithms using the prox, see: Chambolle-Pock algorithm, Douglas-Rachford algorithm, Proximal Point Algorithm
Convergence proof including non-convex settings: PALM (Bolte-Sabach-Teboulle)
Including a stochastic setting: Chouzenoux-Pesquet-Repetti
