Signal Processing and Networks Optimization Part VI: Duality


Signal Processing and Networks Optimization, Part VI: Duality
Pierre Borgnat (1), Jean-Christophe Pesquet (2), Nelly Pustelnik (1)
(1) ENS Lyon, Laboratoire de Physique, CNRS UMR 5672, pierre.borgnat@ens-lyon.fr, nelly.pustelnik@ens-lyon.fr
(2) LIGM, Univ. Paris-Est, CNRS UMR 8049, jean-christophe.pesquet@univ-paris-est.fr

2/29 Fenchel-Rockafellar duality

Primal problem. Let $\mathcal{H}$ and $\mathcal{G}$ be two real Hilbert spaces, let $f\colon\mathcal{H}\to\left]-\infty,+\infty\right]$, $g\colon\mathcal{G}\to\left]-\infty,+\infty\right]$, and let $L\in\mathcal{B}(\mathcal{H},\mathcal{G})$. We want to
minimize over $x\in\mathcal{H}$: $f(x)+g(Lx)$.

Dual problem. With the same data, the dual problem is to
minimize over $v\in\mathcal{G}$: $f^*(-L^*v)+g^*(v)$.

3/29 Fenchel-Rockafellar duality

Weak duality. Let $\mathcal{H}$ and $\mathcal{G}$ be two real Hilbert spaces, let $f$ be a proper function from $\mathcal{H}$ to $\left]-\infty,+\infty\right]$, let $g$ be a proper function from $\mathcal{G}$ to $\left]-\infty,+\infty\right]$, and let $L\in\mathcal{B}(\mathcal{H},\mathcal{G})$. Let
$\mu=\inf_{x\in\mathcal{H}} f(x)+g(Lx)$ and $\mu^*=\inf_{v\in\mathcal{G}} f^*(-L^*v)+g^*(v)$.
We have $\mu\geq-\mu^*$. If $\mu\in\mathbb{R}$, $\mu+\mu^*$ is called the duality gap.

Proof: according to the Fenchel-Young inequality, for every $x\in\mathcal{H}$ and $v\in\mathcal{G}$,
$f(x)+g(Lx)+f^*(-L^*v)+g^*(v)\geq\langle x\mid -L^*v\rangle+\langle Lx\mid v\rangle=0$.
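As a minimal numerical sanity check of this inequality (a Python sketch with arbitrary data, assuming the illustrative choices $f(x)=\frac{1}{2}\|x-a\|^2$ and $g=\lambda\|\cdot\|_1$, whose conjugates are explicit):

import numpy as np

rng = np.random.default_rng(0)
K, N, lam = 5, 8, 0.7
L = rng.standard_normal((K, N))
a = rng.standard_normal(N)

f = lambda x: 0.5 * np.sum((x - a) ** 2)                 # f(x) = 0.5 ||x - a||^2
f_star = lambda u: u @ a + 0.5 * np.sum(u ** 2)          # f*(u) = <u, a> + 0.5 ||u||^2
g = lambda y: lam * np.sum(np.abs(y))                    # g(y) = lam ||y||_1
g_star = lambda v: 0.0 if np.max(np.abs(v)) <= lam else np.inf   # indicator of ||v||_inf <= lam

# Fenchel-Young: f(x) + g(Lx) + f*(-L^T v) + g*(v) >= <x | -L^T v> + <Lx | v> = 0
for _ in range(1000):
    x = rng.standard_normal(N)
    v = rng.uniform(-lam, lam, K)                        # kept inside dom g* so the sum is finite
    assert f(x) + g(L @ x) + f_star(-L.T @ v) + g_star(v) >= -1e-10

In particular, every dual value $-(f^*(-L^*v)+g^*(v))$ lower-bounds every primal value $f(x)+g(Lx)$, which is exactly weak duality.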

4/29 Fenchel-Rockafellar duality

Strong duality. Let $\mathcal{H}$ and $\mathcal{G}$ be two real Hilbert spaces, $f\in\Gamma_0(\mathcal{H})$, $g\in\Gamma_0(\mathcal{G})$, and $L\in\mathcal{B}(\mathcal{H},\mathcal{G})$. If $\operatorname{int}(\operatorname{dom}g)\cap L(\operatorname{dom}f)\neq\varnothing$ or $\operatorname{dom}g\cap\operatorname{int}\big(L(\operatorname{dom}f)\big)\neq\varnothing$, then
$\mu=\inf_{x\in\mathcal{H}} f(x)+g(Lx)=-\min_{v\in\mathcal{G}} f^*(-L^*v)+g^*(v)=-\mu^*$.

5/29 Example 1: Linear programming

Let $L\in\mathbb{R}^{K\times N}$, $b\in\mathbb{R}^K$, and $c\in\mathbb{R}^N$. The primal problem
Primal-LP: minimize $c^\top x$ subject to $Lx\geq b$, $x\in[0,+\infty[^N$,
is associated with the dual problem
Dual-LP: maximize $b^\top y$ subject to $L^\top y\leq c$, $y\in[0,+\infty[^K$.
In addition, if the primal problem has a solution, then strong duality holds.

Proof: set, for every $x\in\mathcal{H}=\mathbb{R}^N$, $f(x)=c^\top x+\iota_{[0,+\infty[^N}(x)$, and, for every $z\in\mathcal{G}=\mathbb{R}^K$, $g(z)=\iota_{[0,+\infty[^K}(z-b)$, with $y=-v$.
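To see the Primal-LP/Dual-LP pair in action, here is a short Python check (an illustration with random feasible data, not from the slides) that solves both problems with scipy and compares optimal values:

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
K, N = 4, 6
L = rng.standard_normal((K, N))
c = rng.random(N)                  # c >= 0 keeps the primal bounded below on x >= 0
x0 = rng.random(N)
b = L @ x0 - 0.1                   # L x0 >= b, so the primal is feasible

# Primal-LP: minimize c^T x  s.t.  Lx >= b, x >= 0   (Lx >= b rewritten as -Lx <= -b)
primal = linprog(c, A_ub=-L, b_ub=-b, bounds=(0, None))
# Dual-LP:   maximize b^T y  s.t.  L^T y <= c, y >= 0
dual = linprog(-b, A_ub=L.T, b_ub=c, bounds=(0, None))

print(primal.fun, -dual.fun)       # equal up to solver tolerance: strong duality holds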

6/29 Example 2: Consensus and sharing

Let $\mathcal{H}$ be a real Hilbert space. For every $i\in\{1,\dots,m\}$, let $g_i\colon\mathcal{H}\to\left]-\infty,+\infty\right]$ and $h_i\colon\mathcal{H}\to\left]-\infty,+\infty\right]$.
The consensus problem is: minimize $\sum_{i=1}^m g_i(x_i)$ over $(x_1,\dots,x_m)\in\mathcal{H}^m$ subject to $x_1=\dots=x_m$.
The sharing problem is: maximize $\sum_{i=1}^m h_i(u_i)$ over $(u_1,\dots,u_m)\in\mathcal{H}^m$ subject to $u_1+\dots+u_m=u$, where $u\in\mathcal{H}$.
If, for every $i\in\{1,\dots,m\}$, $h_i=-g_i^*(\cdot-u/m)$, then sharing is the dual problem of consensus.

Proof: set $L=\operatorname{Id}$ and, for every $x=(x_1,\dots,x_m)\in\mathcal{H}^m$, $f(x)=\iota_{\Lambda_m}(x)$ and $g(x)=\sum_{i=1}^m g_i(x_i)$, where $\Lambda_m=\{(x_1,\dots,x_m)\in\mathcal{H}^m\mid x_1=\dots=x_m\}$.

7/29 Fenchel-Rockafellar duality

Let $\mathcal{H}$ and $\mathcal{G}$ be two real Hilbert spaces, $f\colon\mathcal{H}\to\left]-\infty,+\infty\right]$, $g\colon\mathcal{G}\to\left]-\infty,+\infty\right]$, and $L\in\mathcal{B}(\mathcal{H},\mathcal{G})$. If $\operatorname{dom}g\cap L(\operatorname{dom}f)\neq\varnothing$, then
$(\forall x\in\mathcal{H})\quad \partial f(x)+L^*\partial g(Lx)\subset\partial(f+g\circ L)(x)$.

Let $\mathcal{H}$ and $\mathcal{G}$ be two real Hilbert spaces, $f\in\Gamma_0(\mathcal{H})$, $g\in\Gamma_0(\mathcal{G})$, and $L\in\mathcal{B}(\mathcal{H},\mathcal{G})$. If $\operatorname{int}(\operatorname{dom}g)\cap L(\operatorname{dom}f)\neq\varnothing$ or $\operatorname{dom}g\cap\operatorname{int}\big(L(\operatorname{dom}f)\big)\neq\varnothing$, then
$\partial f+L^*\partial g(L\cdot)=\partial(f+g\circ L)$.

8/29 Fenchel-Rockafellar duality

Duality theorem (1). Let $\mathcal{H}$ and $\mathcal{G}$ be two real Hilbert spaces, $f\in\Gamma_0(\mathcal{H})$, $g\in\Gamma_0(\mathcal{G})$, and $L\in\mathcal{B}(\mathcal{H},\mathcal{G})$. Then
$\operatorname{zer}\big(\partial f+L^*\partial g(L\cdot)\big)\neq\varnothing \;\Leftrightarrow\; \operatorname{zer}\big((-L)\partial f^*(-L^*\cdot)+\partial g^*\big)\neq\varnothing$.

Proof:
$(\exists x\in\mathcal{H})\; 0\in\partial f(x)+L^*\partial g(Lx)$
$\Leftrightarrow (\exists x\in\mathcal{H})(\exists v\in\mathcal{G})\; -L^*v\in\partial f(x)$ and $v\in\partial g(Lx)$
$\Leftrightarrow (\exists x\in\mathcal{H})(\exists v\in\mathcal{G})\; x\in\partial f^*(-L^*v)$ and $Lx\in\partial g^*(v)$
$\Leftrightarrow (\exists v\in\mathcal{G})\; 0\in -L\partial f^*(-L^*v)+\partial g^*(v)$.

9/29 Fenchel-Rockafellar duality

Duality theorem (2). Let $\mathcal{H}$ and $\mathcal{G}$ be two real Hilbert spaces, $f\in\Gamma_0(\mathcal{H})$, $g\in\Gamma_0(\mathcal{G})$, and $L\in\mathcal{B}(\mathcal{H},\mathcal{G})$.
If there exists $\hat{x}\in\mathcal{H}$ such that $0\in\partial f(\hat{x})+L^*\partial g(L\hat{x})$, then $\hat{x}$ is a solution to the primal problem. Moreover, there exists a solution $\hat{v}$ to the dual problem such that $-L^*\hat{v}\in\partial f(\hat{x})$ and $L\hat{x}\in\partial g^*(\hat{v})$.
If there exists $(\hat{x},\hat{v})\in\mathcal{H}\times\mathcal{G}$ such that $-L^*\hat{v}\in\partial f(\hat{x})$ and $L\hat{x}\in\partial g^*(\hat{v})$, then $\hat{x}$ (resp. $\hat{v}$) is a solution to the primal (resp. dual) problem.
If $(\hat{x},\hat{v})\in\mathcal{H}\times\mathcal{G}$ is such that $-L^*\hat{v}\in\partial f(\hat{x})$ and $L\hat{x}\in\partial g^*(\hat{v})$, then $(\hat{x},\hat{v})$ is called a Kuhn-Tucker point.

10/29 Fenchel-Rockafellar duality

Proof: $0\in\partial f(\hat{x})+L^*\partial g(L\hat{x})\subset\partial(f+g\circ L)(\hat{x})$. Then, according to Fermat's rule, $\hat{x}$ is a solution to the primal problem. In addition, there exists $\hat{v}\in\mathcal{G}$ such that
$0\in\partial f(\hat{x})+L^*\hat{v}$ and $\hat{v}\in\partial g(L\hat{x})$, i.e. $-L^*\hat{v}\in\partial f(\hat{x})$ and $L\hat{x}\in\partial g^*(\hat{v})$.
We also have $\hat{x}\in\partial f^*(-L^*\hat{v})$, which implies that $0\in -L\partial f^*(-L^*\hat{v})+\partial g^*(\hat{v})$.
On the other hand, $0\in -L\partial f^*(-L^*\hat{v})+\partial g^*(\hat{v})\subset\partial\big(f^*(-L^*\cdot)+g^*\big)(\hat{v})$, hence $\hat{v}$ is a solution to the dual problem.
The second assertion is shown in a similar manner.

10/29 Fenchel-Rockafellar duality

Particular case: if $f=\varphi+\frac{1}{2}\|\cdot-z\|^2$ where $\varphi\in\Gamma_0(\mathcal{H})$ and $z\in\mathcal{H}$, then
$-L^*\hat{v}\in\partial f(\hat{x}) \;\Leftrightarrow\; -L^*\hat{v}\in\partial\varphi(\hat{x})+\hat{x}-z \;\Leftrightarrow\; 0\in\hat{x}+L^*\hat{v}-z+\partial\varphi(\hat{x})$.
Hence, $\hat{x}=\operatorname{prox}_\varphi(-L^*\hat{v}+z)$.
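For instance, if $\varphi=\lambda\|\cdot\|_1$, the proximity operator in this recovery formula is the componentwise soft-thresholding; a small Python sketch (with made-up $L$, $z$ and a dual point $\hat{v}$) of how $\hat{x}$ would be evaluated:

import numpy as np

def prox_l1(u, lam):
    # prox of phi = lam * ||.||_1: componentwise soft-thresholding
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

rng = np.random.default_rng(2)
L = rng.standard_normal((3, 5))
z = rng.standard_normal(5)
v_hat = rng.standard_normal(3)                 # a (hypothetical) dual solution
x_hat = prox_l1(-L.T @ v_hat + z, lam=0.5)     # x_hat = prox_phi(-L* v_hat + z)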

11/29 Link with Lagrange duality

Minimax problem. Let $\mathcal{H}$ and $\mathcal{G}$ be two real Hilbert spaces, $f\colon\mathcal{H}\to\left]-\infty,+\infty\right]$, $g\colon\mathcal{G}\to\left]-\infty,+\infty\right]$, and $L\in\mathcal{B}(\mathcal{H},\mathcal{G})$. The primal problem is equivalent to finding
$\mu=\inf_{(x,y)\in\mathcal{H}\times\mathcal{G}}\;\sup_{v\in\mathcal{G}}\;\mathcal{L}(x,y,v)$
where $\mathcal{L}$ is the Lagrange function defined as
$\big(\forall(x,y,v)\in\mathcal{H}\times\mathcal{G}^2\big)\quad \mathcal{L}(x,y,v)=f(x)+g(y)+\langle v\mid Lx-y\rangle$.

Proof:
$\mu=\inf_{x\in\mathcal{H}} f(x)+g(Lx)=\inf_{(x,y)\in\mathcal{H}\times\mathcal{G}} f(x)+g(y)+\iota_{\{0\}}(Lx-y)$
$=\inf_{(x,y)\in\mathcal{H}\times\mathcal{G}} f(x)+g(y)+\sup_{v\in\mathcal{G}}\langle v\mid Lx-y\rangle$
$=\inf_{(x,y)\in\mathcal{H}\times\mathcal{G}}\sup_{v\in\mathcal{G}} f(x)+g(y)+\langle v\mid Lx-y\rangle$.

11/29 Link with Lagrange duality

Maximin problem. Let $\mathcal{H}$ and $\mathcal{G}$ be two real Hilbert spaces, $f\colon\mathcal{H}\to\left]-\infty,+\infty\right]$, $g\colon\mathcal{G}\to\left]-\infty,+\infty\right]$, and $L\in\mathcal{B}(\mathcal{H},\mathcal{G})$. The dual problem is equivalent to finding
$-\mu^*=\sup_{v\in\mathcal{G}}\;\inf_{(x,y)\in\mathcal{H}\times\mathcal{G}}\;\mathcal{L}(x,y,v)$
where $\mathcal{L}$ is the Lagrange function defined as
$\big(\forall(x,y,v)\in\mathcal{H}\times\mathcal{G}^2\big)\quad \mathcal{L}(x,y,v)=f(x)+g(y)+\langle v\mid Lx-y\rangle$.

Proof:
$-\mu^*=-\inf_{v\in\mathcal{G}} f^*(-L^*v)+g^*(v)$
$=-\inf_{v\in\mathcal{G}}\Big(\sup_{x\in\mathcal{H}}\langle x\mid -L^*v\rangle-f(x)\Big)+\Big(\sup_{y\in\mathcal{G}}\langle y\mid v\rangle-g(y)\Big)$
$=-\inf_{v\in\mathcal{G}}\Big(-\inf_{(x,y)\in\mathcal{H}\times\mathcal{G}} f(x)+g(y)+\langle v\mid Lx-y\rangle\Big)$
$=\sup_{v\in\mathcal{G}}\inf_{(x,y)\in\mathcal{H}\times\mathcal{G}} f(x)+g(y)+\langle v\mid Lx-y\rangle$.

Remark: $v$ is called the Lagrange multiplier associated with the constraint $Lx=y$.

12/29 Link with Lagrange duality

Let $(\hat{x},\hat{y},\hat{v})\in\mathcal{H}\times\mathcal{G}^2$. $(\hat{x},\hat{y},\hat{v})$ is a saddle point of the Lagrange function $\mathcal{L}$ if
$\big(\forall(x,y,v)\in\mathcal{H}\times\mathcal{G}^2\big)\quad \mathcal{L}(\hat{x},\hat{y},v)\leq\mathcal{L}(\hat{x},\hat{y},\hat{v})\leq\mathcal{L}(x,y,\hat{v})$.

Let $\mathcal{H}$ and $\mathcal{G}$ be two real Hilbert spaces, $f\in\Gamma_0(\mathcal{H})$, $g\in\Gamma_0(\mathcal{G})$, and $L\in\mathcal{B}(\mathcal{H},\mathcal{G})$. Let $(\hat{x},\hat{y},\hat{v})\in\mathcal{H}\times\mathcal{G}^2$. Assume that $\operatorname{int}(\operatorname{dom}g)\cap L(\operatorname{dom}f)\neq\varnothing$ or $\operatorname{dom}g\cap\operatorname{int}\big(L(\operatorname{dom}f)\big)\neq\varnothing$. Then
$(\hat{x},\hat{y},\hat{v})$ is a saddle point of the Lagrange function $\Leftrightarrow$ $(\hat{x},\hat{v})$ is a Kuhn-Tucker point and $\hat{y}=L\hat{x}$.

13/29 Link with Lagrange duality

Proof ($\Rightarrow$): if $(\hat{x},\hat{y},\hat{v})$ is a saddle point of $\mathcal{L}$, then it is a critical point of $\mathcal{L}$, that is,
$0\in\partial_x\mathcal{L}(\hat{x},\hat{y},\hat{v})=\partial f(\hat{x})+L^*\hat{v}$,
$0\in\partial_y\mathcal{L}(\hat{x},\hat{y},\hat{v})=\partial g(\hat{y})-\hat{v}$,
$0=\nabla_v\mathcal{L}(\hat{x},\hat{y},\hat{v})=L\hat{x}-\hat{y}$,
which is equivalent to $-L^*\hat{v}\in\partial f(\hat{x})$, $\hat{v}\in\partial g(\hat{y})$, $\hat{y}=L\hat{x}$, i.e. $-L^*\hat{v}\in\partial f(\hat{x})$, $L\hat{x}\in\partial g^*(\hat{v})$, $\hat{y}=L\hat{x}$.

13/29 Link with Lagrange duality

Proof ($\Leftarrow$): conversely, assume that $(\hat{x},\hat{v})$ is a Kuhn-Tucker point and $\hat{y}=L\hat{x}$. Since $\hat{x}$ (resp. $\hat{v}$) is a solution to the primal (resp. dual) problem,
$\mu=\inf_{(x,y)\in\mathcal{H}\times\mathcal{G}}\sup_{v\in\mathcal{G}}\mathcal{L}(x,y,v)=\sup_{v\in\mathcal{G}}\mathcal{L}(\hat{x},\hat{y},v)$,
$-\mu^*=\sup_{v\in\mathcal{G}}\inf_{(x,y)\in\mathcal{H}\times\mathcal{G}}\mathcal{L}(x,y,v)=\inf_{(x,y)\in\mathcal{H}\times\mathcal{G}}\mathcal{L}(x,y,\hat{v})$.
By strong duality, $\sup_{v\in\mathcal{G}}\mathcal{L}(\hat{x},\hat{y},v)=\inf_{(x,y)\in\mathcal{H}\times\mathcal{G}}\mathcal{L}(x,y,\hat{v})$, which can be rewritten as
$\big(\forall(x,y,v)\in\mathcal{H}\times\mathcal{G}^2\big)\quad \mathcal{L}(\hat{x},\hat{y},v)\leq\mathcal{L}(x,y,\hat{v})$,
or equivalently
$\big(\forall(x,y,v)\in\mathcal{H}\times\mathcal{G}^2\big)\quad \mathcal{L}(\hat{x},\hat{y},v)\leq\mathcal{L}(\hat{x},\hat{y},\hat{v})\leq\mathcal{L}(x,y,\hat{v})$.

14/29 Alternating-direction method of multipliers

Idea: iterations for finding a saddle point $(\hat{x},\hat{y},\hat{v})$:
$(\forall n\in\mathbb{N})\quad x_n\in\operatorname{Argmin}\mathcal{L}(\cdot,y_n,v_n)$, $\;y_{n+1}\in\operatorname{Argmin}\mathcal{L}(x_n,\cdot,v_n)$, $\;v_{n+1}$ such that $\mathcal{L}(x_n,y_{n+1},v_{n+1})\geq\mathcal{L}(x_n,y_{n+1},v_n)$.
But convergence is not guaranteed in general!

Solution: introduce an augmented Lagrange function. Let $\gamma\in\left]0,+\infty\right[$ and define
$\big(\forall(x,y,z)\in\mathcal{H}\times\mathcal{G}^2\big)\quad \widetilde{\mathcal{L}}(x,y,z)=f(x)+g(y)+\gamma\langle z\mid Lx-y\rangle+\frac{\gamma}{2}\|Lx-y\|^2$.
The Lagrange multiplier is $v=\gamma z$.

15/29 Alternating-direction method of multipliers

Let $\mathcal{H}$ and $\mathcal{G}$ be two real Hilbert spaces, $f\in\Gamma_0(\mathcal{H})$, $g\in\Gamma_0(\mathcal{G})$, and $L\in\mathcal{B}(\mathcal{H},\mathcal{G})$. Let $(\hat{x},\hat{y},\hat{z})\in\mathcal{H}\times\mathcal{G}^2$. Assume that $\operatorname{int}(\operatorname{dom}g)\cap L(\operatorname{dom}f)\neq\varnothing$ or $\operatorname{dom}g\cap\operatorname{int}\big(L(\operatorname{dom}f)\big)\neq\varnothing$. Then
$(\hat{x},\hat{y},\hat{z})$ is a saddle point of the augmented Lagrange function $\widetilde{\mathcal{L}}$ $\Leftrightarrow$ $(\hat{x},\gamma\hat{z})$ is a Kuhn-Tucker point and $\hat{y}=L\hat{x}$.

16/29 Alternating-direction method of multipliers

Proof ($\Rightarrow$): if $(\hat{x},\hat{y},\hat{z})$ is a saddle point of $\widetilde{\mathcal{L}}$, then
$0\in\partial_x\widetilde{\mathcal{L}}(\hat{x},\hat{y},\hat{z})=\partial f(\hat{x})+\gamma L^*\hat{z}+\gamma L^*(L\hat{x}-\hat{y})$,
$0\in\partial_y\widetilde{\mathcal{L}}(\hat{x},\hat{y},\hat{z})=\partial g(\hat{y})-\gamma\hat{z}+\gamma(\hat{y}-L\hat{x})$,
$0=\nabla_z\widetilde{\mathcal{L}}(\hat{x},\hat{y},\hat{z})=\gamma(L\hat{x}-\hat{y})$,
which is equivalent to
$0\in\partial_x\mathcal{L}(\hat{x},\hat{y},\gamma\hat{z})$, $\;0\in\partial_y\mathcal{L}(\hat{x},\hat{y},\gamma\hat{z})$, $\;0=\nabla_v\mathcal{L}(\hat{x},\hat{y},\gamma\hat{z})$.

16/29 Alternating-direction method of multipliers

Proof ($\Leftarrow$): conversely, if $(\hat{x},\gamma\hat{z})$ is a Kuhn-Tucker point and $\hat{y}=L\hat{x}$, then $(\hat{x},\hat{y},\gamma\hat{z})$ is a saddle point of $\mathcal{L}$. In addition,
$\big(\forall(x,y,z)\in\mathcal{H}\times\mathcal{G}^2\big)\quad \widetilde{\mathcal{L}}(x,y,z)=\mathcal{L}(x,y,\gamma z)+\frac{\gamma}{2}\|Lx-y\|^2$.
It can be deduced that
$\widetilde{\mathcal{L}}(\hat{x},\hat{y},z)=\mathcal{L}(\hat{x},\hat{y},\gamma z)\leq\mathcal{L}(\hat{x},\hat{y},\gamma\hat{z})=\widetilde{\mathcal{L}}(\hat{x},\hat{y},\hat{z})\leq\mathcal{L}(x,y,\gamma\hat{z})\leq\widetilde{\mathcal{L}}(x,y,\hat{z})$.

17/29 Alternating-direction method of multipliers

Algorithm for finding a saddle point:
$(\forall n\in\mathbb{N})\quad x_n=\operatorname*{argmin}_{x\in\mathcal{H}}\widetilde{\mathcal{L}}(x,y_n,z_n)$, $\;y_{n+1}=\operatorname*{argmin}_{y\in\mathcal{G}}\widetilde{\mathcal{L}}(x_n,y,z_n)$, $\;z_{n+1}$ such that $\widetilde{\mathcal{L}}(x_n,y_{n+1},z_{n+1})\geq\widetilde{\mathcal{L}}(x_n,y_{n+1},z_n)$.
By performing a gradient ascent on the Lagrange multiplier,
$(\forall n\in\mathbb{N})\quad x_n=\operatorname*{argmin}_{x\in\mathcal{H}} f(x)+\gamma\langle z_n\mid Lx-y_n\rangle+\frac{\gamma}{2}\|Lx-y_n\|^2$, $\;y_{n+1}=\operatorname*{argmin}_{y\in\mathcal{G}} g(y)+\gamma\langle z_n\mid Lx_n-y\rangle+\frac{\gamma}{2}\|Lx_n-y\|^2$, $\;z_{n+1}=z_n+\frac{1}{\gamma}\nabla_z\widetilde{\mathcal{L}}(x_n,y_{n+1},z_n)$,
which simplifies to
$(\forall n\in\mathbb{N})\quad x_n=\operatorname*{argmin}_{x\in\mathcal{H}}\frac{1}{2}\|Lx-y_n+z_n\|^2+\frac{1}{\gamma}f(x)$, $\;y_{n+1}=\operatorname{prox}_{g/\gamma}(z_n+Lx_n)$, $\;z_{n+1}=z_n+Lx_n-y_{n+1}$.

18/29 Augmented Lagrangian method

ADMM algorithm (Alternating-direction method of multipliers). Let $\mathcal{H}$ and $\mathcal{G}$ be two Hilbert spaces, $f\in\Gamma_0(\mathcal{H})$, $g\in\Gamma_0(\mathcal{G})$, and let $L\in\mathcal{B}(\mathcal{H},\mathcal{G})$ be such that $L^*L$ is an isomorphism. Let $\gamma\in\left]0,+\infty\right[$. We assume that $\operatorname{int}(\operatorname{dom}g)\cap L(\operatorname{dom}f)\neq\varnothing$ or $\operatorname{dom}g\cap\operatorname{int}\big(L(\operatorname{dom}f)\big)\neq\varnothing$, and that $\operatorname{Argmin}(f+g\circ L)\neq\varnothing$. Let $(y_0,z_0)\in\mathcal{G}^2$ and
$(\forall n\in\mathbb{N})\quad x_n=\operatorname*{argmin}_{x\in\mathcal{H}}\frac{1}{2}\|Lx-y_n+z_n\|^2+\frac{1}{\gamma}f(x)$, $\;s_n=Lx_n$, $\;y_{n+1}=\operatorname{prox}_{g/\gamma}(z_n+s_n)$, $\;z_{n+1}=z_n+s_n-y_{n+1}$.
We have:
$x_n\to\hat{x}$ where $\hat{x}\in\operatorname{Argmin}(f+g\circ L)$,
$\gamma z_n\to\hat{v}$ where $\hat{v}\in\operatorname{Argmin}\big(f^*(-L^*\cdot)+g^*\big)$.

Sketch of the proof: ADMM can be shown to be equivalent to the Douglas-Rachford algorithm applied to the dual problem.
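To make the iteration concrete, here is a compact Python sketch of ADMM for one illustrative instance of minimize $f(x)+g(Lx)$ (my own choices, not the slides'): $f(x)=\frac{1}{2}\|x-a\|^2$ so the x-step is a linear solve, $g=\lambda\|\cdot\|_1$ so $\operatorname{prox}_{g/\gamma}$ is a soft-thresholding, and $L$ a random full-column-rank matrix so that $L^*L$ is invertible.

import numpy as np

rng = np.random.default_rng(3)
K, N, lam, gamma = 30, 10, 0.5, 1.0
L = rng.standard_normal((K, N))           # full column rank with probability 1, so L^T L is invertible
a = rng.standard_normal(N)

def soft(u, t):                           # prox of t * ||.||_1
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

y = np.zeros(K)
z = np.zeros(K)
A = L.T @ L + np.eye(N) / gamma           # normal equations of the x-step
for n in range(200):
    # x_n = argmin_x 0.5 ||Lx - y_n + z_n||^2 + f(x)/gamma, with f = 0.5 ||. - a||^2
    x = np.linalg.solve(A, L.T @ (y - z) + a / gamma)
    s = L @ x
    y = soft(z + s, lam / gamma)          # y_{n+1} = prox_{g/gamma}(z_n + s_n)
    z = z + s - y                         # multiplier update (v_n = gamma * z_n)

print(0.5 * np.sum((x - a) ** 2) + lam * np.sum(np.abs(L @ x)))   # primal objective value

In line with the convergence statement above, $\gamma z_n$ then approximates a dual solution.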

19/29 Augmented Lagrangian method

Extension: splitting more than 2 functions. We want to
minimize over $x\in\mathcal{H}$: $\sum_{i=1}^m f_i(x)$,
where $\mathcal{H}$ is a real Hilbert space and, for every $i\in\{1,\dots,m\}$, $f_i\in\Gamma_0(\mathcal{H})$.
Idea: reformulate the problem as a consensus one in the product space $\boldsymbol{\mathcal{H}}=\mathcal{H}^m$:
minimize over $\boldsymbol{x}=(x_1,\dots,x_m)\in\boldsymbol{\mathcal{H}}$: $\sum_{i=1}^m f_i(x_i)+\iota_{\Lambda_m}(\boldsymbol{x})$,
where $\Lambda_m=\{(x_1,\dots,x_m)\in\boldsymbol{\mathcal{H}}\mid x_1=\dots=x_m\}$.

20/29 Augmented Lagrangian method

Use of ADMM: set $g=\iota_{\Lambda_m}$, $L=\operatorname{Id}$, and, for every $\boldsymbol{x}=(x_1,\dots,x_m)\in\boldsymbol{\mathcal{H}}$, $f(\boldsymbol{x})=\sum_{i=1}^m f_i(x_i)$.
Resulting SDMM (Simultaneous Direction Method of Multipliers):
$(\forall n\in\mathbb{N})\quad \boldsymbol{x}_n=\operatorname*{argmin}_{\boldsymbol{x}\in\boldsymbol{\mathcal{H}}}\frac{1}{2}\|\boldsymbol{x}-\boldsymbol{y}_n+\boldsymbol{z}_n\|^2+\frac{1}{\gamma}f(\boldsymbol{x})$, $\;\boldsymbol{y}_{n+1}=\operatorname{prox}_{g/\gamma}(\boldsymbol{z}_n+\boldsymbol{x}_n)$, $\;\boldsymbol{z}_{n+1}=\boldsymbol{z}_n+\boldsymbol{x}_n-\boldsymbol{y}_{n+1}$.

20/29 Augmented Lagrangian method

Since $g=\iota_{\Lambda_m}$ and $L=\operatorname{Id}$, the SDMM iteration reads
$(\forall n\in\mathbb{N})\quad \boldsymbol{x}_n=\operatorname{prox}_{f/\gamma}(\boldsymbol{y}_n-\boldsymbol{z}_n)$, $\;\boldsymbol{y}_{n+1}=P_{\Lambda_m}(\boldsymbol{z}_n+\boldsymbol{x}_n)$, $\;\boldsymbol{z}_{n+1}=\boldsymbol{z}_n+\boldsymbol{x}_n-\boldsymbol{y}_{n+1}$.
Note that $P_{\Lambda_m}\boldsymbol{z}_{n+1}=P_{\Lambda_m}(\boldsymbol{z}_n+\boldsymbol{x}_n)-P_{\Lambda_m}\boldsymbol{y}_{n+1}=\boldsymbol{y}_{n+1}-\boldsymbol{y}_{n+1}=0$.

20/29 Augmented Lagrangian method

Since $P_{\Lambda_m}\boldsymbol{z}_n=0$ for every $n\geq1$, the iteration simplifies to
$(\forall n\in\mathbb{N})\quad \boldsymbol{x}_n=\operatorname{prox}_{f/\gamma}(\boldsymbol{y}_n-\boldsymbol{z}_n)$, $\;\boldsymbol{y}_{n+1}=P_{\Lambda_m}\boldsymbol{x}_n$, $\;\boldsymbol{z}_{n+1}=\boldsymbol{z}_n+\boldsymbol{x}_n-\boldsymbol{y}_{n+1}$,
provided that $P_{\Lambda_m}\boldsymbol{z}_0=0$.

20/29 Augmented Lagrangian method

Projection onto $\Lambda_m$: for every $\boldsymbol{u}=(u_1,\dots,u_m)\in\boldsymbol{\mathcal{H}}$, $P_{\Lambda_m}(\boldsymbol{u})=(\bar{u},\dots,\bar{u})$ where $\bar{u}=\frac{1}{m}\sum_{i=1}^m u_i$.
Resulting SDMM:
$(\forall n\in\mathbb{N})\quad \boldsymbol{x}_n=\operatorname{prox}_{f/\gamma}(\boldsymbol{y}_n-\boldsymbol{z}_n)$, $\;\boldsymbol{y}_{n+1}=P_{\Lambda_m}\boldsymbol{x}_n$, $\;\boldsymbol{z}_{n+1}=\boldsymbol{z}_n+\boldsymbol{x}_n-\boldsymbol{y}_{n+1}$,
provided that $P_{\Lambda_m}\boldsymbol{z}_0=0$.

20/29 Augmented Lagrangian method

Written componentwise (with $\bar{x}_{n-1}$ denoting the common value of the entries of $\boldsymbol{y}_n$), SDMM becomes
$(\forall n\in\mathbb{N})\quad x_{n,i}=\operatorname{prox}_{f_i/\gamma}(\bar{x}_{n-1}-z_{n,i})$ for $i\in\{1,\dots,m\}$, $\;\bar{x}_n=\frac{1}{m}\sum_{i=1}^m x_{n,i}$, $\;z_{n+1,i}=z_{n,i}+x_{n,i}-\bar{x}_n$ for $i\in\{1,\dots,m\}$,
provided that $\sum_{i=1}^m z_{0,i}=0$.
Parallel computation of the proximity operators, but centralized computation of the average.
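A short Python sketch of this SDMM iteration, for the illustrative case $f_i(x)=\frac{1}{2}\|x-a_i\|^2$ (so that $\operatorname{prox}_{f_i/\gamma}$ has a closed form and the consensus solution is simply the average of the $a_i$):

import numpy as np

rng = np.random.default_rng(4)
m, d, gamma = 5, 3, 1.0
a = rng.standard_normal((m, d))                 # f_i(x) = 0.5 ||x - a_i||^2

def prox_fi(u, ai):                             # prox_{f_i/gamma}(u) = (gamma u + a_i) / (gamma + 1)
    return (gamma * u + ai) / (gamma + 1.0)

xbar = np.zeros(d)                              # common value of the entries of y_0
z = np.zeros((m, d))                            # sum_i z_{0,i} = 0
for n in range(100):
    x = np.array([prox_fi(xbar - z[i], a[i]) for i in range(m)])   # parallel proximal steps
    xbar = x.mean(axis=0)                                          # centralized averaging
    z = z + x - xbar                                               # componentwise multiplier update

print(xbar, a.mean(axis=0))                     # both close to the consensus minimizer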

21/29 Distributed optimization

$V=\{1,\dots,m\}$: set of indices of the vertices of an undirected nonreflexive graph; $\boldsymbol{x}\in\boldsymbol{\mathcal{H}}$: vector of node weights; $E$: set of edges $e_{i,j}$, indexed by pairs $(i,j)$ with $i<j$.

The graph is assumed to be connected:
$\boldsymbol{x}\in\Lambda_m \;\Leftrightarrow\; \big(\forall(i,j)\in E\big)\;(x_i,x_j)\in\Lambda_2$.

For every $(i,j)\in E$, we define
decimation operators: $L_{i,j}\colon\boldsymbol{\mathcal{H}}\to\mathcal{H}^2\colon\boldsymbol{x}\mapsto(x_i,x_j)$;
interpolation operators: $L_{i,j}^*\colon\mathcal{H}^2\to\boldsymbol{\mathcal{H}}\colon(y_1,y_2)\mapsto\boldsymbol{x}=(x_{i'})_{1\leq i'\leq m}$ where, for every $i'\in\{1,\dots,m\}$, $x_{i'}=y_1$ if $i'=i$, $x_{i'}=y_2$ if $i'=j$, and $x_{i'}=0$ otherwise.

The concatenated decimation operator $\boldsymbol{L}=(L_{i,j})_{(i,j)\in E}$ is such that
$\boldsymbol{L}^*\boldsymbol{L}=\sum_{(i,j)\in E}L_{i,j}^*L_{i,j}=\operatorname{Diag}(d_1\operatorname{Id},\dots,d_m\operatorname{Id})$,
where, for every $i\in\{1,\dots,m\}$, $d_i$ is the degree of the vertex of index $i$.
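The identity $\boldsymbol{L}^*\boldsymbol{L}=\operatorname{Diag}(d_1\operatorname{Id},\dots,d_m\operatorname{Id})$ is easy to check numerically; a tiny Python sketch with $\mathcal{H}=\mathbb{R}$ and an arbitrary 4-vertex graph (illustrative only):

import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]       # undirected graph on 4 vertices, stored with i < j
m = 4

# stack the decimation operators L_{i,j}: x -> (x_i, x_j) into one matrix
L = np.zeros((2 * len(edges), m))
for e, (i, j) in enumerate(edges):
    L[2 * e, i] = 1.0
    L[2 * e + 1, j] = 1.0

deg = np.zeros(m)
for i, j in edges:
    deg[i] += 1
    deg[j] += 1

print(np.allclose(L.T @ L, np.diag(deg)))      # True: L^* L = Diag(d_1, ..., d_m)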

22/29 Distributed optimization

Distributed consensus:
minimize over $\boldsymbol{x}=(x_1,\dots,x_m)\in\boldsymbol{\mathcal{H}}$: $\sum_{i=1}^m f_i(x_i)+\underbrace{\iota_{\Lambda_m}(\boldsymbol{x})}_{\sum_{(i,j)\in E}\iota_{\Lambda_2}(x_i,x_j)}$,
that is,
minimize over $\boldsymbol{x}\in\boldsymbol{\mathcal{H}}$: $\sum_{i=1}^m f_i(x_i)+\sum_{(i,j)\in E}\iota_{\Lambda_2}(L_{i,j}\boldsymbol{x})$.

Use of ADMM: set, for every $\boldsymbol{x}=(x_1,\dots,x_m)\in\boldsymbol{\mathcal{H}}$, $f(\boldsymbol{x})=\sum_{i=1}^m f_i(x_i)$, and, for every $\boldsymbol{y}=(y_{i,j})_{(i,j)\in E}$ with $y_{i,j}\in\mathcal{H}^2$, $g(\boldsymbol{y})=\sum_{(i,j)\in E}\iota_{\Lambda_2}(y_{i,j})$.

22/29 Distributed optimization

Distributed ADMM (1):
$(\forall n\in\mathbb{N})\quad \boldsymbol{x}_n=\operatorname*{argmin}_{\boldsymbol{x}\in\boldsymbol{\mathcal{H}}}\frac{1}{2}\|\boldsymbol{L}\boldsymbol{x}-\boldsymbol{y}_n+\boldsymbol{z}_n\|^2+\frac{1}{\gamma}f(\boldsymbol{x})$, $\;\boldsymbol{s}_n=\boldsymbol{L}\boldsymbol{x}_n$, $\;\boldsymbol{y}_{n+1}=\operatorname{prox}_{g/\gamma}(\boldsymbol{z}_n+\boldsymbol{s}_n)$, $\;\boldsymbol{z}_{n+1}=\boldsymbol{z}_n+\boldsymbol{s}_n-\boldsymbol{y}_{n+1}$.

Distributed ADMM (2): provided that $\big(\forall(i,j)\in E\big)\;P_{\Lambda_2}z_{0,i,j}=0$,
$(\forall n\in\mathbb{N})\quad 0\in\boldsymbol{L}^*(\boldsymbol{L}\boldsymbol{x}_n-\boldsymbol{y}_n+\boldsymbol{z}_n)+\frac{1}{\gamma}\partial f(\boldsymbol{x}_n)$,
$\;s_{n,i,j}=(x_{n,i},x_{n,j})$ for $(i,j)\in E$,
$\;y_{n+1,i,j}=P_{\Lambda_2}(z_{n,i,j}+s_{n,i,j})=P_{\Lambda_2}s_{n,i,j}$ for $(i,j)\in E$,
$\;z_{n+1,i,j}=z_{n,i,j}+s_{n,i,j}-y_{n+1,i,j}$ for $(i,j)\in E$.

22/29 Distributed optimization

Distributed ADMM (3):
$(\forall n\in\mathbb{N})\quad 0\in d_i x_{n,i}-\big(\boldsymbol{L}^*(\boldsymbol{y}_n-\boldsymbol{z}_n)\big)_i+\frac{1}{\gamma}\partial f_i(x_{n,i})$ for $i\in V$,
$\;s_{n,i,j}=(x_{n,i},x_{n,j})$ for $(i,j)\in E$,
$\;y_{n+1,i,j}=P_{\Lambda_2}s_{n,i,j}$ for $(i,j)\in E$,
$\;z_{n+1,i,j}=z_{n,i,j}+s_{n,i,j}-y_{n+1,i,j}$ for $(i,j)\in E$.

22/29 Distributed optimization

Distributed ADMM (4):
$(\forall n\in\mathbb{N})\quad x_{n,i}=\operatorname{prox}_{f_i/(\gamma d_i)}\Big(d_i^{-1}\big(\boldsymbol{L}^*(\boldsymbol{y}_n-\boldsymbol{z}_n)\big)_i\Big)$ for $i\in V$,
$\;\bar{y}_{n+1,i,j}=\frac{1}{2}(x_{n,i}+x_{n,j})$ for $(i,j)\in E$,
$\;y_{n+1,i,j}=(\bar{y}_{n+1,i,j},\bar{y}_{n+1,i,j})$,
$\;z_{n+1,i,j,1}=z_{n,i,j,1}+x_{n,i}-\bar{y}_{n+1,i,j}$ for $(i,j)\in E$,
$\;z_{n+1,i,j,2}=z_{n,i,j,2}+x_{n,j}-\bar{y}_{n+1,i,j}$ for $(i,j)\in E$.

22/29 Distributed optimization

Distributed ADMM (5):
$(\forall n\in\mathbb{N})\quad x_{n,i}=\operatorname{prox}_{f_i/(\gamma d_i)}\Big(d_i^{-1}\Big(\sum_{j:(i,j)\in E}(\bar{y}_{n,i,j}-z_{n,i,j,1})+\sum_{j:(j,i)\in E}(\bar{y}_{n,j,i}-z_{n,j,i,2})\Big)\Big)$ for $i\in V$,
$\;\bar{y}_{n+1,i,j}=\frac{1}{2}(x_{n,i}+x_{n,j})$ for $(i,j)\in E$,
$\;z_{n+1,i,j,1}=z_{n,i,j,1}+x_{n,i}-\bar{y}_{n+1,i,j}$ for $(i,j)\in E$,
$\;z_{n+1,j,i,2}=z_{n,j,i,2}+x_{n,i}-\bar{y}_{n+1,j,i}$ for $(j,i)\in E$.
Note that
$\bar{y}_{n+1,i,j}-z_{n+1,i,j,1}=2\bar{y}_{n+1,i,j}-x_{n,i}-z_{n,i,j,1}=x_{n,j}-z_{n,i,j,1}$ and $\bar{y}_{n+1,j,i}-z_{n+1,j,i,2}=x_{n,j}-z_{n,j,i,2}$.

Distributed ADMM (6):
$(\forall n\in\mathbb{N})\quad z_{n+1,i,j,1}=z_{n,i,j,1}+\frac{1}{2}(x_{n,i}-x_{n,j})$ for $(i,j)\in E$,
$\;z_{n+1,j,i,2}=z_{n,j,i,2}+\frac{1}{2}(x_{n,i}-x_{n,j})$ for $(j,i)\in E$,
$\;\bar{x}_{n,i}=d_i^{-1}\Big(\sum_{j:(i,j)\in E}x_{n,j}+\sum_{j:(j,i)\in E}x_{n,j}\Big)$ for $i\in V$,
$\;\bar{z}_{n,i}=d_i^{-1}\Big(\sum_{j:(i,j)\in E}z_{n,i,j,1}+\sum_{j:(j,i)\in E}z_{n,j,i,2}\Big)$ for $i\in V$,
$\;x_{n+1,i}=\operatorname{prox}_{f_i/(\gamma d_i)}(\bar{x}_{n,i}-\bar{z}_{n,i})$ for $i\in V$.

22/29 Distributed optimization

Distributed ADMM (final form):
$(\forall n\in\mathbb{N})\quad \bar{x}_{n,i}=d_i^{-1}\Big(\sum_{j:(i,j)\in E}x_{n,j}+\sum_{j:(j,i)\in E}x_{n,j}\Big)$ for $i\in V$,
$\;\bar{z}_{n+1,i}=\bar{z}_{n,i}+\frac{1}{2}(x_{n,i}-\bar{x}_{n,i})$ for $i\in V$,
$\;x_{n+1,i}=\operatorname{prox}_{f_i/(\gamma d_i)}(\bar{x}_{n,i}-\bar{z}_{n,i})$ for $i\in V$.
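A minimal Python simulation of this final form on a small connected graph, again with the illustrative choice $f_i(x)=\frac{1}{2}\|x-a_i\|^2$ so that $\operatorname{prox}_{f_i/(\gamma d_i)}$ is explicit (a sketch under these assumptions, not the authors' implementation):

import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]             # connected graph, i < j
m, d, gamma = 4, 2, 1.0
rng = np.random.default_rng(5)
a = rng.standard_normal((m, d))                      # f_i(x) = 0.5 ||x - a_i||^2

neighbors = [[] for _ in range(m)]
for i, j in edges:
    neighbors[i].append(j)
    neighbors[j].append(i)
deg = np.array([len(nb) for nb in neighbors], dtype=float)

def prox_fi(u, ai, di):                              # prox_{f_i/(gamma d_i)}(u)
    return (gamma * di * u + ai) / (gamma * di + 1.0)

x = np.zeros((m, d))
zbar = np.zeros((m, d))
for n in range(300):
    xbar = np.array([x[nb].mean(axis=0) for nb in neighbors])               # local neighbor averages
    x_new = np.array([prox_fi(xbar[i] - zbar[i], a[i], deg[i]) for i in range(m)])
    zbar = zbar + 0.5 * (x - xbar)                                          # local multiplier update
    x = x_new

print(x[0], a.mean(axis=0))                          # every node approaches the consensus solution

Each node only needs its own data and its neighbors' current values, so every step is local.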

23/29 Primal-dual methods

Problem. Let $\mathcal{H}$ and $\mathcal{G}$ be two real Hilbert spaces, $f\in\Gamma_0(\mathcal{H})$, $h\in\Gamma_0(\mathcal{H})$, $g\in\Gamma_0(\mathcal{G})$, and $L\in\mathcal{B}(\mathcal{H},\mathcal{G})$. It is assumed that $h$ is differentiable and has a $\beta$-Lipschitzian gradient with $\beta\in\left]0,+\infty\right[$. We want to
minimize over $x\in\mathcal{H}$: $f(x)+g(Lx)+h(x)$.

24/29 Primal-dual methods

ADMM algorithm: let $\gamma\in\left]0,+\infty\right[$ and
$(\forall n\in\mathbb{N})\quad x_n=\operatorname*{argmin}_{x\in\mathcal{H}}\frac{1}{2}\|Lx-y_n+z_n\|^2+\frac{1}{\gamma}\big(f(x)+h(x)\big)$, $\;y_{n+1}=\operatorname{prox}_{g/\gamma}(z_n+Lx_n)$, $\;z_{n+1}=z_n+Lx_n-y_{n+1}$.
Limitations:
the computation of $x_n$ at iteration $n\in\mathbb{N}$ may be complicated;
convergence requires $L^*L$ to be invertible;
the smoothness of $h$ is not exploited.

25/29 Primal-dual methods

Idea 1: the optimization problem is reformulated as finding
$\inf_{x\in\mathcal{H}}\sup_{v\in\mathcal{G}}\; f(x)+h(x)+\langle v\mid Lx\rangle-g^*(v)$.
Arrow-Hurwitz method: let $(\tau_n)_{n\in\mathbb{N}}$ and $(\sigma_n)_{n\in\mathbb{N}}$ be sequences in $\left]0,+\infty\right[$ and
$(\forall n\in\mathbb{N})\quad t_n\in\partial f(x_n)$, $\;x_{n+1}=x_n-\tau_n\big(t_n+\nabla h(x_n)+L^*v_n\big)$, $\;s_n\in\partial g^*(v_n)$, $\;v_{n+1}=v_n-\sigma_n(s_n-Lx_{n+1})$.
This requires stringent conditions on the choice of the step-sizes (e.g. decaying to zero).

26/29 Primal-dual methods

Idea 2: use implicit updates
$(\forall n\in\mathbb{N})\quad t_n\in\partial f(x_{n+1})$, $\;x_{n+1}=x_n-\tau_n\big(t_n+\nabla h(x_n)+L^*v_n\big)$, $\;s_n\in\partial g^*(v_{n+1})$, $\;v_{n+1}=v_n-\sigma_n(s_n-Lx_{n+1})$,
i.e.
$0\in x_{n+1}-x_n+\tau_n\big(\nabla h(x_n)+L^*v_n\big)+\tau_n\partial f(x_{n+1})$ and $0\in v_{n+1}-v_n-\sigma_n Lx_{n+1}+\sigma_n\partial g^*(v_{n+1})$,
i.e.
$x_{n+1}=\operatorname{prox}_{\tau_n f}\big(x_n-\tau_n(\nabla h(x_n)+L^*v_n)\big)$ and $v_{n+1}=\operatorname{prox}_{\sigma_n g^*}\big(v_n+\sigma_n Lx_{n+1}\big)$.
This still does not converge for constant values of the step-sizes.

27/29 Primal-dual methods

Idea 3: in the dual update, replace $x_{n+1}$ by the extrapolated point $2x_{n+1}-x_n$:
$(\forall n\in\mathbb{N})\quad x_{n+1}=\operatorname{prox}_{\tau_n f}\big(x_n-\tau_n(\nabla h(x_n)+L^*v_n)\big)$, $\;v_{n+1}=\operatorname{prox}_{\sigma_n g^*}\big(v_n+\sigma_n L(2x_{n+1}-x_n)\big)$.

28/29 Primal-dual optimization algorithm

Convergence of the PD algorithm. Let $\mathcal{H}$ and $\mathcal{G}$ be two Hilbert spaces, $L\in\mathcal{B}(\mathcal{H},\mathcal{G})$, $f\in\Gamma_0(\mathcal{H})$, $g\in\Gamma_0(\mathcal{G})$, and let $h\in\Gamma_0(\mathcal{H})$ have a $\beta$-Lipschitzian gradient. Let $\tau\in\left]0,+\infty\right[$ and $\sigma\in\left]0,+\infty\right[$. We assume that $\tau^{-1}-\sigma\|L\|^2\geq\beta/2$ and $\operatorname{zer}\big(\partial f+\nabla h+L^*\partial g(L\cdot)\big)\neq\varnothing$. Let $x_0\in\mathcal{H}$, $v_0\in\mathcal{G}$, and
$(\forall n\in\mathbb{N})\quad x_{n+1}=\operatorname{prox}_{\tau f}\big(x_n-\tau(\nabla h(x_n)+L^*v_n)\big)$, $\;v_{n+1}=\operatorname{prox}_{\sigma g^*}\big(v_n+\sigma L(2x_{n+1}-x_n)\big)$.
We have:
$x_n\to\hat{x}\in\operatorname{Argmin}(f+h+g\circ L)$,
$v_n\to\hat{v}\in\operatorname{Argmin}\big((f+h)^*(-L^*\cdot)+g^*\big)$.

Sketch of the proof: rewrite the algorithm as a forward-backward iteration for solving a saddle point problem.
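A compact Python sketch of this primal-dual iteration for one illustrative instance (my own choices): $h(x)=\frac{1}{2}\|Ax-b\|^2$ (so $\beta=\|A\|^2$), $f=\mu\|\cdot\|_1$, $g=\lambda\|\cdot\|_1$, and $L$ a first-difference operator; the step-sizes are chosen so that $\tau^{-1}-\sigma\|L\|^2=\beta/2$.

import numpy as np

rng = np.random.default_rng(6)
K, N, mu, lam = 20, 15, 0.1, 0.5
A = rng.standard_normal((K, N))
b = rng.standard_normal(K)
L = np.diff(np.eye(N), axis=0)                     # first-difference operator, shape (N-1, N)

beta = np.linalg.norm(A, 2) ** 2                   # Lipschitz constant of grad h, h = 0.5 ||Ax - b||^2
sigma = 1.0
tau = 1.0 / (sigma * np.linalg.norm(L, 2) ** 2 + beta / 2)   # 1/tau - sigma ||L||^2 = beta/2

def soft(u, t):                                    # prox of t * ||.||_1
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

x = np.zeros(N)
v = np.zeros(N - 1)
for n in range(500):
    grad_h = A.T @ (A @ x - b)
    x_new = soft(x - tau * (grad_h + L.T @ v), tau * mu)        # prox_{tau f}, f = mu ||.||_1
    v = np.clip(v + sigma * L @ (2 * x_new - x), -lam, lam)     # prox_{sigma g*}, g = lam ||.||_1
    x = x_new

print(0.5 * np.sum((A @ x - b) ** 2) + mu * np.sum(np.abs(x)) + lam * np.sum(np.abs(L @ x)))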

29/29 Exercise

Let $\mathcal{H}$ and $(\mathcal{G}_i)_{1\leq i\leq m}$ be real Hilbert spaces. Let $f\in\Gamma_0(\mathcal{H})$, let $h\in\Gamma_0(\mathcal{H})$, and, for every $i\in\{1,\dots,m\}$, let $g_i\in\Gamma_0(\mathcal{G}_i)$ and $L_i\in\mathcal{B}(\mathcal{H},\mathcal{G}_i)$. It is assumed that $h$ is differentiable and has a $\beta$-Lipschitzian gradient with $\beta\in\left]0,+\infty\right[$. Propose a primal-dual algorithm to solve
minimize over $x\in\mathcal{H}$: $f(x)+\sum_{i=1}^m g_i(L_ix)+h(x)$.
