Primal-dual algorithms for the sum of two and three functions


Ming Yan, Michigan State University, CMSE/Mathematics. This work is partially supported by NSF.

optimization problems for primal-dual algorithms

$$\min_x\; f(x) + g(x) + h(Ax)$$

$f$, $g$, and $h$ are convex. $X$ and $Y$ are two Hilbert spaces (e.g., $\mathbb{R}^m$, $\mathbb{R}^n$). $f : X \to \mathbb{R}$ is differentiable with a $1/\beta$-Lipschitz continuous gradient for some $\beta \in (0, +\infty)$. $A : X \to Y$ is a bounded linear operator.

applications: statistics

Elastic net regularization (Zou-Hastie 05):
$$\min_x\; \tfrac{\mu_2}{2}\|x\|_2^2 + \mu_1\|x\|_1 + l(Ax, b),$$
where $x \in \mathbb{R}^p$, $A \in \mathbb{R}^{n \times p}$, $b \in \mathbb{R}^n$, and $l$ is the loss function, which may be nondifferentiable.

Fused lasso (Tibshirani et al. 05):
$$\min_x\; \tfrac{1}{2}\|Ax - b\|_2^2 + \mu_1\|x\|_1 + \mu_2\|Dx\|_1,$$
where $x \in \mathbb{R}^p$, $A \in \mathbb{R}^{n \times p}$, $b \in \mathbb{R}^n$, and
$$D = \begin{bmatrix} -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{bmatrix} \in \mathbb{R}^{(p-1) \times p}$$
is the discrete difference matrix.
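As a small illustration of the difference operator $D$ above, a minimal NumPy/SciPy sketch (function and variable names are ours, not from the slides) that builds $D$ as a sparse matrix:

```python
import numpy as np
from scipy import sparse

def difference_matrix(p):
    """Sparse (p-1) x p first-order difference matrix with rows [-1, 1],
    so that (D @ x)[i] = x[i+1] - x[i]."""
    main = -np.ones(p - 1)   # -1 on the main diagonal
    upper = np.ones(p - 1)   # +1 on the first superdiagonal
    return sparse.diags([main, upper], offsets=[0, 1], shape=(p - 1, p))

D = difference_matrix(6)
x = np.arange(6, dtype=float)
print(D @ x)   # consecutive differences of 0,1,...,5 -> all ones
```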

applications: decentralized optimization

$$\min_x\; \sum_{i=1}^n f_i(x) + g_i(x)$$

$f_i$ and $g_i$ are known at node $i$ only. Nodes $1, \dots, n$ are connected in an undirected graph. Each $f_i$ is differentiable with a Lipschitz continuous gradient.

Introduce a copy $x_i$ at node $i$:
$$\min_{\mathbf{x}}\; f(\mathbf{x}) + g(\mathbf{x}) := \sum_{i=1}^n f_i(x_i) + g_i(x_i) \quad \text{s.t. } W\mathbf{x} = \mathbf{x},$$
where $x_i \in \mathbb{R}^p$, $\mathbf{x} = [x_1\; x_2\; \cdots\; x_n] \in \mathbb{R}^{n \times p}$, and $W$ is a symmetric doubly stochastic mixing matrix.

The sum of three functions:
$$\min_{\mathbf{x}}\; f(\mathbf{x}) + g(\mathbf{x}) + \iota_0\big((I - W)^{1/2}\mathbf{x}\big)$$
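The slides do not prescribe a particular mixing matrix; one standard choice that produces a symmetric doubly stochastic $W$ from an undirected graph is the Metropolis-Hastings rule. A minimal NumPy sketch under that assumption (names are ours):

```python
import numpy as np

def metropolis_weights(adj):
    """Symmetric doubly stochastic mixing matrix from a 0/1 adjacency matrix:
    W[i, j] = 1 / (1 + max(deg_i, deg_j)) for every edge (i, j),
    W[i, i] = 1 - sum_{j != i} W[i, j], and W[i, j] = 0 otherwise."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    n = adj.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j] > 0:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
    W += np.diag(1.0 - W.sum(axis=1))   # each row (hence each column) sums to 1
    return W

# 4-node path graph 0 - 1 - 2 - 3
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
W = metropolis_weights(adj)
print(np.allclose(W.sum(axis=0), 1.0), np.allclose(W, W.T))   # True True
```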

applications: imaging

Image restoration with two regularizations:
$$\min_x\; \tfrac{1}{2}\|Ax - b\|_2^2 + \iota_C(x) + \mu\|Dx\|_1,$$
where $x \in \mathbb{R}^n$ is the image to be reconstructed, $A \in \mathbb{R}^{m \times n}$ is the forward projection matrix, $b \in \mathbb{R}^m$ is the measured data with noise, $D$ is a discrete gradient operator, and $\iota_C$ is the indicator function that returns zero if $x \in C$ (here, $C$ is the set of nonnegative vectors in $\mathbb{R}^n$) and $+\infty$ otherwise.

Other problems:
$f$: data fitting term (infimal convolution for mixed noise)
$h \circ A$: total variation; other transforms
$g$: nonnegativity; box constraint

primal-dual formulation

Original problem:
$$\min_x\; f(x) + g(x) + h(Ax)$$

Introduce a dual variable $s$:
$$\min_x \max_s\; f(x) + g(x) + \langle Ax, s\rangle - h^*(s)$$
Here $h^*$ is the conjugate function of $h$, defined as $h^*(s) = \max_t\; \langle s, t\rangle - h(t)$.

It is equivalent to (using $s^* \in \partial h(Ax^*) \Leftrightarrow Ax^* \in \partial h^*(s^*)$):
$$\begin{cases} 0 \in \nabla f(x^*) + \partial g(x^*) + A^\top s^* \\ 0 \in \partial h^*(s^*) - Ax^* \end{cases}$$
All primal-dual algorithms try to find $(x^*, s^*)$.
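As a concrete example of the conjugate (a standard computation, not taken from the slides): for $h = \mu\|\cdot\|_1$,
$$h^*(s) = \max_t\; \langle s, t\rangle - \mu\|t\|_1 = \begin{cases} 0 & \text{if } \|s\|_\infty \le \mu,\\ +\infty & \text{otherwise,}\end{cases}$$
so the dual variable $s$ is constrained to the box $\{s : \|s\|_\infty \le \mu\}$ and the resolvent of $\partial h^*$ reduces to the projection onto that box.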

existing algorithms: Condat-Vu, AFBA, and PDFP

Condat-Vu (Condat 13, Vu 13):
Convergence conditions: $\lambda\|AA^\top\| + \gamma/(2\beta) \le 1$
Per-iteration computations: $A$, $A^\top$, $\nabla f$, one $(I + \gamma\partial g)^{-1}$, one $(I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}$

AFBA (Latafat-Patrinos 16):
Convergence conditions: $\lambda\|AA^\top\|/2 + \sqrt{\lambda\|AA^\top\|}/2 + \gamma/(2\beta) \le 1$
Per-iteration computations: $A$, $A^\top$, $\nabla f$, one $(I + \gamma\partial g)^{-1}$, one $(I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}$

PDFP (Chen-Huang-Zhang 16):
Convergence conditions: $\lambda\|AA^\top\| < 1$; $\gamma/(2\beta) < 1$
Per-iteration computations: $A$, $A^\top$, $\nabla f$, two $(I + \gamma\partial g)^{-1}$, one $(I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}$

Here $(I + \gamma\partial g)^{-1}(\bar{x}) = \arg\min_x \gamma g(x) + \frac{1}{2}\|x - \bar{x}\|^2$. This is a backward step (or implicit step) because $(I + \gamma\partial g)^{-1}(\bar{x}) \in \bar{x} - \gamma\partial g\big((I + \gamma\partial g)^{-1}(\bar{x})\big)$.
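To make the backward step concrete, here is a minimal NumPy sketch (ours, not from the slides) of the two resolvents used repeatedly below, for $g = \mu\|\cdot\|_1$: the resolvent of $\gamma\partial g$ (soft-thresholding) and the resolvent of $\tfrac{\lambda}{\gamma}\partial h^*$ obtained through the Moreau decomposition $\mathrm{prox}_{\tau h^*}(s) = s - \tau\,\mathrm{prox}_{h/\tau}(s/\tau)$:

```python
import numpy as np

def prox_l1(x, t):
    """(I + t * subdifferential of ||.||_1)^{-1}(x): soft-thresholding with threshold t >= 0."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_conj_l1(s, tau, mu):
    """Resolvent of tau * dh^* for h = mu*||.||_1, via the Moreau decomposition
    prox_{tau h^*}(s) = s - tau * prox_{(1/tau) h}(s / tau); for the l1 norm this is
    the projection onto the box {s : |s_i| <= mu}."""
    return s - tau * prox_l1(s / tau, mu / tau)

# Backward (implicit) step: x_plus = x_bar - t * u with u a subgradient at x_plus.
x_bar, t, mu = np.array([3.0, -0.2, 1.0]), 0.5, 1.0
x_plus = prox_l1(x_bar, t * mu)          # resolvent of t * d(mu*||.||_1)
u = (x_bar - x_plus) / t                 # subgradient selected by the resolvent
print(x_plus, u)                         # u[i] = mu*sign(x_plus[i]) wherever x_plus[i] != 0

s = np.array([-5.0, -1.0, 0.3, 4.0])
print(prox_conj_l1(s, tau=0.5, mu=2.0))  # [-2. -1. 0.3 2.], i.e. clipping to [-2, 2]
```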

PDHG (Zhu-Chan 08)

When $f = 0$, we have
$$0 \in \begin{bmatrix} \partial g & A^\top \\ -A & \partial h^* \end{bmatrix}\begin{bmatrix} x^* \\ s^* \end{bmatrix}$$

It is equivalent to
$$\begin{bmatrix} \partial g & 0 \\ 0 & \partial h^* \end{bmatrix}\begin{bmatrix} x^* \\ s^* \end{bmatrix} \ni \begin{bmatrix} 0 & -A^\top \\ A & 0 \end{bmatrix}\begin{bmatrix} x^* \\ s^* \end{bmatrix}$$
and to
$$\begin{bmatrix} \frac{1}{\gamma}I + \partial g & 0 \\ -A & \frac{\gamma}{\lambda}I + \partial h^* \end{bmatrix}\begin{bmatrix} x^* \\ s^* \end{bmatrix} \ni \begin{bmatrix} \frac{1}{\gamma}I & -A^\top \\ 0 & \frac{\gamma}{\lambda}I \end{bmatrix}\begin{bmatrix} x^* \\ s^* \end{bmatrix}$$

Primal-dual hybrid gradient (PDHG):
$$x^+ = (I + \gamma\partial g)^{-1}(x - \gamma A^\top s)$$
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}(s + \tfrac{\lambda}{\gamma}Ax^+)$$

Chambolle-Pock (Chambolle et al. 09, Esser-Zhang-Chan 10):
$$x^+ = (I + \gamma\partial g)^{-1}(x - \gamma A^\top s)$$
$$\bar{x}^+ = 2x^+ - x$$
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}(s + \tfrac{\lambda}{\gamma}A\bar{x}^+)$$
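A minimal NumPy sketch (our own toy instance and stepsize choices, not from the slides) of the Chambolle-Pock iteration above, applied to 1D total-variation denoising $\min_x \tfrac12\|x - c\|^2 + \mu\|Dx\|_1$, which fits the $f = 0$ setting with $g(x) = \tfrac12\|x - c\|^2$ and $h = \mu\|\cdot\|_1$:

```python
import numpy as np

def chambolle_pock(prox_g, prox_hstar, A, x0, s0, gamma, lam, iters=500):
    """Chambolle-Pock / PDHG for min_x g(x) + h(Ax) (the f = 0 case),
    with the slides' stepsizes: gamma (primal) and lam/gamma (dual)."""
    x, s = x0.copy(), s0.copy()
    for _ in range(iters):
        x_new = prox_g(x - gamma * (A.T @ s), gamma)           # backward step on g
        x_bar = 2 * x_new - x                                  # extrapolation
        s = prox_hstar(s + (lam / gamma) * (A @ x_bar), lam / gamma)
        x = x_new
    return x, s

# Toy instance: denoise a noisy step signal with a TV penalty.
np.random.seed(0)
n, mu = 50, 2.0
c = np.concatenate([np.zeros(25), 5 * np.ones(25)]) + 0.3 * np.random.randn(n)
D = np.diff(np.eye(n), axis=0)                                 # (n-1) x n difference matrix
prox_g = lambda z, t: (z + t * c) / (1 + t)                    # resolvent of t * d(0.5*||.-c||^2)
prox_hstar = lambda z, t: np.clip(z, -mu, mu)                  # resolvent of t * dh^*, h = mu*||.||_1
lam = 0.99 / np.linalg.norm(D @ D.T, 2)                        # enforce lam*||D D^T|| <= 1
x, _ = chambolle_pock(prox_g, prox_hstar, D, c.copy(), np.zeros(n - 1), gamma=1.0, lam=lam)
print(x[:3].round(2), x[25:28].round(2))                       # roughly piecewise constant
```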

Chambolle-Pock 11 as proximal point

Chambolle-Pock ($x$-$s$ order):
$$x^+ = (I + \gamma\partial g)^{-1}(x - \gamma A^\top s)$$
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}\big(s + \tfrac{\lambda}{\gamma}A(2x^+ - x)\big)$$

CP is equivalent to the backward operator applied on the KKT system:
$$\left(\begin{bmatrix} \frac{1}{\gamma}I & -A^\top \\ -A & \frac{\gamma}{\lambda}I \end{bmatrix} + \begin{bmatrix} \partial g & A^\top \\ -A & \partial h^* \end{bmatrix}\right)\begin{bmatrix} x^+ \\ s^+ \end{bmatrix} \ni \begin{bmatrix} \frac{1}{\gamma}I & -A^\top \\ -A & \frac{\gamma}{\lambda}I \end{bmatrix}\begin{bmatrix} x \\ s \end{bmatrix}$$

CP is 1/2-averaged under the metric induced by the matrix if $\lambda$ satisfies the condition $\lambda\|AA^\top\| \le 1$.

Condat-Vu (Condat 13, Vu 13)

The optimality condition:
$$0 \in \begin{bmatrix} \partial g & A^\top \\ -A & \partial h^* \end{bmatrix}\begin{bmatrix} x^* \\ s^* \end{bmatrix} + \begin{bmatrix} \nabla f(x^*) \\ 0 \end{bmatrix}$$

CV is equivalent to the forward-backward splitting applied on the KKT system:
$$\left(\begin{bmatrix} \frac{1}{\gamma}I & -A^\top \\ -A & \frac{\gamma}{\lambda}I \end{bmatrix} + \begin{bmatrix} \partial g & A^\top \\ -A & \partial h^* \end{bmatrix}\right)\begin{bmatrix} x^+ \\ s^+ \end{bmatrix} \ni \begin{bmatrix} \frac{1}{\gamma}I & -A^\top \\ -A & \frac{\gamma}{\lambda}I \end{bmatrix}\begin{bmatrix} x \\ s \end{bmatrix} - \begin{bmatrix} \nabla f(x) \\ 0 \end{bmatrix}$$

That is:
$$x^+ = (I + \gamma\partial g)^{-1}(x - \gamma\nabla f(x) - \gamma A^\top s)$$
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}\big(s + \tfrac{\lambda}{\gamma}A(2x^+ - x)\big)$$

It is equivalent to (by changing the update order):
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}(s + \tfrac{\lambda}{\gamma}A\bar{x})$$
$$x^+ = (I + \gamma\partial g)^{-1}(x - \gamma\nabla f(x) - \gamma A^\top s^+)$$
$$\bar{x}^+ = 2x^+ - x$$

CV is non-expansive (forward-backward) under the metric induced by the matrix if $\gamma$ and $\lambda$ satisfy $\lambda\|AA^\top\| + \gamma/(2\beta) \le 1$.
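A minimal NumPy sketch (our own toy instance, not from the slides) of the reordered Condat-Vu iteration above, applied to $\min_x \tfrac12\|x - c\|^2 + \iota_{\{x \ge 0\}}(x) + \mu\|Dx\|_1$, i.e. the imaging-style problem with all three terms present:

```python
import numpy as np

def condat_vu(grad_f, prox_g, prox_hstar, A, x0, s0, gamma, lam, iters=500):
    """Condat-Vu for min_x f(x) + g(x) + h(Ax), in the reordered (s then x) form;
    convergence needs lam*||A A^T|| + gamma/(2*beta) <= 1."""
    x, s = x0.copy(), s0.copy()
    x_bar = x.copy()
    for _ in range(iters):
        s = prox_hstar(s + (lam / gamma) * (A @ x_bar), lam / gamma)
        x_new = prox_g(x - gamma * grad_f(x) - gamma * (A.T @ s), gamma)
        x_bar = 2 * x_new - x                                   # extrapolation for the next dual step
        x = x_new
    return x, s

np.random.seed(0)
n, mu, beta = 50, 2.0, 1.0                                      # grad f is 1-Lipschitz, so beta = 1
c = np.concatenate([np.zeros(25), 5 * np.ones(25)]) + 0.3 * np.random.randn(n)
D = np.diff(np.eye(n), axis=0)
grad_f = lambda x: x - c                                        # f = 0.5*||x - c||^2
prox_g = lambda z, t: np.maximum(z, 0.0)                        # projection onto the nonnegative orthant
prox_hstar = lambda z, t: np.clip(z, -mu, mu)                   # h = mu*||.||_1
gamma = beta                                                    # gamma/(2*beta) = 1/2
lam = 0.5 / np.linalg.norm(D @ D.T, 2)                          # lam*||D D^T|| = 1/2, so the sum is 1
x, _ = condat_vu(grad_f, prox_g, prox_hstar, D, np.zeros(n), np.zeros(n - 1), gamma, lam)
print(x.min().round(3), x[:3].round(2), x[25:28].round(2))      # nonnegative, roughly piecewise constant
```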

PDFP$^2$O/PAPC (Loris-Verhoeven 11, Chen-Huang-Zhang 13, Drori-Sabach-Teboulle 15)

When $g = 0$, the optimality condition becomes:
$$0 \in \begin{bmatrix} 0 & A^\top \\ -A & \partial h^* \end{bmatrix}\begin{bmatrix} x^* \\ s^* \end{bmatrix} + \begin{bmatrix} \nabla f(x^*) \\ 0 \end{bmatrix}$$

PAPC is equivalent to the forward-backward splitting applied on the KKT system:
$$\begin{bmatrix} \frac{1}{\gamma}I & A^\top \\ -A & \frac{\gamma}{\lambda}I - \gamma AA^\top + \partial h^* \end{bmatrix}\begin{bmatrix} x^+ \\ s^+ \end{bmatrix} \ni \begin{bmatrix} \frac{1}{\gamma}I & 0 \\ 0 & \frac{\gamma}{\lambda}I - \gamma AA^\top \end{bmatrix}\begin{bmatrix} x \\ s \end{bmatrix} - \begin{bmatrix} \nabla f(x) \\ 0 \end{bmatrix}$$

Equivalently,
$$\begin{bmatrix} \frac{1}{\gamma}I & A^\top \\ 0 & \frac{\gamma}{\lambda}I + \partial h^* \end{bmatrix}\begin{bmatrix} x^+ \\ s^+ \end{bmatrix} \ni \begin{bmatrix} \frac{1}{\gamma}I & 0 \\ A & \frac{\gamma}{\lambda}I - \gamma AA^\top \end{bmatrix}\begin{bmatrix} x \\ s \end{bmatrix} - \begin{bmatrix} \nabla f(x) \\ \gamma A\nabla f(x) \end{bmatrix}$$

PAPC is non-expansive (forward-backward) under the metric induced by the matrix if $\gamma$ and $\lambda$ satisfy the conditions $\lambda\|AA^\top\| \le 1$ and $\gamma/(2\beta) \le 1$.

PAPC

PAPC can be expressed as
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}\big((I - \lambda AA^\top)s + \tfrac{\lambda}{\gamma}A(x - \gamma\nabla f(x))\big)$$
$$x^+ = x - \gamma\nabla f(x) - \gamma A^\top s^+$$

It is equivalent to
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}(s + \tfrac{\lambda}{\gamma}A\bar{x})$$
$$x^+ = x - \gamma\nabla f(x) - \gamma A^\top s^+$$
$$\bar{x}^+ = x^+ - \gamma\nabla f(x^+) - \gamma A^\top s^+$$

PAPC is $\alpha$-averaged under the metric induced by the matrix. PAPC converges if $\gamma$ and $\lambda$ satisfy the conditions $\lambda\|AA^\top\| < 4/3$ and $\gamma/(2\beta) < 1$ (Li-Yan 17).
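A minimal NumPy sketch (our own toy instance, not from the slides) of the PAPC iteration in the $\bar{x}$ form above, applied to $\min_x \tfrac12\|x - c\|^2 + \mu\|Dx\|_1$ (the $g = 0$ case). The stepsizes deliberately use the relaxed dual bound $\lambda\|AA^\top\| < 4/3$ from Li-Yan 17:

```python
import numpy as np

def papc(grad_f, prox_hstar, A, x0, s0, gamma, lam, iters=500):
    """PAPC / PDFP2O for min_x f(x) + h(Ax) (the g = 0 case), in the x-bar form:
    dual step, gradient step with the new dual, then the extrapolated gradient step."""
    x, s = x0.copy(), s0.copy()
    x_bar = x - gamma * grad_f(x) - gamma * (A.T @ s)
    for _ in range(iters):
        s = prox_hstar(s + (lam / gamma) * (A @ x_bar), lam / gamma)
        x = x - gamma * grad_f(x) - gamma * (A.T @ s)
        x_bar = x - gamma * grad_f(x) - gamma * (A.T @ s)
    return x, s

np.random.seed(0)
n, mu = 50, 2.0
c = np.concatenate([np.zeros(25), 5 * np.ones(25)]) + 0.3 * np.random.randn(n)
D = np.diff(np.eye(n), axis=0)
grad_f = lambda x: x - c                                        # beta = 1
prox_hstar = lambda z, t: np.clip(z, -mu, mu)                   # h = mu*||.||_1
gamma = 1.5                                                     # gamma/(2*beta) < 1
lam = 1.2 / np.linalg.norm(D @ D.T, 2)                          # lam*||D D^T|| = 1.2 < 4/3
x, _ = papc(grad_f, prox_hstar, D, np.zeros(n), np.zeros(n - 1), gamma, lam)
print(x[:3].round(2), x[25:28].round(2))
```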

PDFP (Chen-Huang-Zhang 16)

Rewrite PDFP$^2$O as
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}(s + \tfrac{\lambda}{\gamma}A\bar{x})$$
$$x^+ = x - \gamma\nabla f(x) - \gamma A^\top s^+$$
$$\bar{x}^+ = x^+ - \gamma\nabla f(x^+) - \gamma A^\top s^+$$

PDFP, as a generalization of PDFP$^2$O, is
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}(s + \tfrac{\lambda}{\gamma}A\bar{x})$$
$$x^+ = (I + \gamma\partial g)^{-1}(x - \gamma\nabla f(x) - \gamma A^\top s^+)$$
$$\bar{x}^+ = (I + \gamma\partial g)^{-1}(x^+ - \gamma\nabla f(x^+) - \gamma A^\top s^+)$$

When $g$ is an indicator function, PDFP reduces to the Preconditioned Alternating Projection Algorithm (PAPA) (Krol-Li-Shen-Xu 12).

AFBA (Latafat-Patrinos 16)

Rewrite PAPC as
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}(s + \tfrac{\lambda}{\gamma}A\bar{x})$$
$$x^+ = \bar{x} - \gamma A^\top(s^+ - s)$$
$$\bar{x}^+ = x^+ - \gamma\nabla f(x^+) - \gamma A^\top s^+$$

AFBA, as a generalization of PAPC, is
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}(s + \tfrac{\lambda}{\gamma}A\bar{x})$$
$$x^+ = \bar{x} - \gamma A^\top(s^+ - s)$$
$$\bar{x}^+ = (I + \gamma\partial g)^{-1}(x^+ - \gamma\nabla f(x^+) - \gamma A^\top s^+)$$

Convergence conditions: $\lambda\|AA^\top\|/2 + \sqrt{\lambda\|AA^\top\|}/2 + \gamma/(2\beta) \le 1$

Chambolle-Pock and PAPC

Chambolle-Pock:
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}(s + \tfrac{\lambda}{\gamma}A\bar{x})$$
$$x^+ = (I + \gamma\partial g)^{-1}(x - \gamma A^\top s^+)$$
$$\bar{x}^+ = 2x^+ - x$$

PAPC:
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}(s + \tfrac{\lambda}{\gamma}A\bar{x})$$
$$x^+ = x - \gamma\nabla f(x) - \gamma A^\top s^+$$
$$\bar{x}^+ = x^+ - \gamma\nabla f(x^+) - \gamma A^\top s^+ = 2x^+ - x - \gamma\nabla f(x^+) + \gamma\nabla f(x)$$

PD3O (Yan 16):
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}(s + \tfrac{\lambda}{\gamma}A\bar{x})$$
$$x^+ = (I + \gamma\partial g)^{-1}(x - \gamma\nabla f(x) - \gamma A^\top s^+)$$
$$\bar{x}^+ = 2x^+ - x - \gamma\nabla f(x^+) + \gamma\nabla f(x)$$
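A minimal NumPy sketch of PD3O (our own toy instance, the same one used in the Condat-Vu sketch above). Note how the decoupled stepsize condition $\lambda\|AA^\top\| \le 1$, $\gamma < 2\beta$ lets the primal stepsize approach $2\beta$:

```python
import numpy as np

def pd3o(grad_f, prox_g, prox_hstar, A, x0, s0, gamma, lam, iters=500):
    """PD3O (Yan 16) for min_x f(x) + g(x) + h(Ax);
    stepsizes gamma (primal) and lam/gamma (dual), with lam*||A A^T|| <= 1 and gamma < 2*beta."""
    x, s = x0.copy(), s0.copy()
    gx = grad_f(x)
    x_bar = x.copy()
    for _ in range(iters):
        s = prox_hstar(s + (lam / gamma) * (A @ x_bar), lam / gamma)
        x_new = prox_g(x - gamma * gx - gamma * (A.T @ s), gamma)
        gx_new = grad_f(x_new)
        x_bar = 2 * x_new - x - gamma * gx_new + gamma * gx     # PD3O's corrected extrapolation
        x, gx = x_new, gx_new
    return x, s

np.random.seed(0)
n, mu, beta = 50, 2.0, 1.0
c = np.concatenate([np.zeros(25), 5 * np.ones(25)]) + 0.3 * np.random.randn(n)
D = np.diff(np.eye(n), axis=0)
grad_f = lambda x: x - c
prox_g = lambda z, t: np.maximum(z, 0.0)
prox_hstar = lambda z, t: np.clip(z, -mu, mu)
gamma = 1.9 * beta                                              # larger than Condat-Vu would allow here
lam = 1.0 / np.linalg.norm(D @ D.T, 2)                          # lam*||D D^T|| = 1
x, _ = pd3o(grad_f, prox_g, prox_hstar, D, np.zeros(n), np.zeros(n - 1), gamma, lam)
print(x.min().round(3), x[:3].round(2), x[25:28].round(2))
```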

Chambolle-Pock

Chambolle-Pock (updating $s$ before $x$):
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}(s + \tfrac{\lambda}{\gamma}A\bar{x})$$
$$x^+ = (I + \gamma\partial g)^{-1}(x - \gamma A^\top s^+)$$
$$\bar{x}^+ = 2x^+ - x$$

In terms of $z = x - \gamma A^\top s$:
$$x^+ = (I + \gamma\partial g)^{-1}(z)$$
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}\big((I - \lambda AA^\top)s + \tfrac{\lambda}{\gamma}A(2x^+ - z)\big)$$
$$z^+ = z + (2x^+ - z - \gamma A^\top s^+) - x^+$$

[Diagram on the slide: one iteration moves through the points $(z, s)$, $(x^+, s)$, $(2x^+ - z, s)$, $(2x^+ - z - \gamma A^\top s^+, s^+)$, and $(z^+, s^+)$.]

C-P and PAPC

PAPC in terms of $z$ (here $g = 0$, so $x^+ = z$):
$$x^+ = z = x - \gamma\nabla f(x) - \gamma A^\top s$$
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}\big((I - \lambda AA^\top)s + \tfrac{\lambda}{\gamma}A(2x^+ - z - \gamma\nabla f(x^+))\big)$$
$$z^+ = z + (2x^+ - z - \gamma\nabla f(x^+) - \gamma A^\top s^+) - x^+$$

PD3O

PD3O in terms of $z$:
$$z = x - \gamma\nabla f(x) - \gamma A^\top s$$
$$x^+ = (I + \gamma\partial g)^{-1}(z)$$
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}\big((I - \lambda AA^\top)s + \tfrac{\lambda}{\gamma}A(2x^+ - z - \gamma\nabla f(x^+))\big)$$
$$z^+ = z + (2x^+ - z - \gamma\nabla f(x^+) - \gamma A^\top s^+) - x^+$$

[Diagram on the slides: one iteration moves through the points $(z, s)$, $(x^+, s)$, $(2x^+ - z, s)$, $(2x^+ - z - \gamma\nabla f(x^+), s)$, $(2x^+ - z - \gamma\nabla f(x^+) - \gamma A^\top s^+, s^+)$, and $(z^+, s^+)$.]

PD3O vs Condat-Vu vs AFBA vs PDFP

Algorithms (common first two steps):
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}(s + \tfrac{\lambda}{\gamma}A\bar{x})$$
$$x^+ = (I + \gamma\partial g)^{-1}(x - \gamma\nabla f(x) - \gamma A^\top s^+)$$
followed by
PDFP: $\bar{x}^+ = (I + \gamma\partial g)^{-1}(x^+ - \gamma\nabla f(x^+) - \gamma A^\top s^+)$
Condat-Vu: $\bar{x}^+ = 2x^+ - x$
PD3O: $\bar{x}^+ = 2x^+ - x + \gamma\nabla f(x) - \gamma\nabla f(x^+)$

Parameters (and what each algorithm reduces to when $f = 0$ or $g = 0$):
PDFP: $\lambda\|AA^\top\| < 1$, $\gamma/(2\beta) < 1$; reduces to PAPC when $g = 0$
Condat-Vu: $\lambda\|AA^\top\| + \gamma/(2\beta) \le 1$; reduces to C-P when $f = 0$
AFBA: $\lambda\|AA^\top\|/2 + \sqrt{\lambda\|AA^\top\|}/2 + \gamma/(2\beta) \le 1$; reduces to PAPC when $g = 0$
PD3O: $\lambda\|AA^\top\| \le 1$, $\gamma/(2\beta) < 1$; reduces to C-P when $f = 0$ and to PAPC when $g = 0$

convergence results: summary

Rewrite the iteration in terms of $(z, s)$: let $z = x - \gamma\nabla f(x) - \gamma A^\top s$ and write $x$ for $x^+$; then
$$x = (I + \gamma\partial g)^{-1}(z)$$
$$s^+ = (I + \tfrac{\lambda}{\gamma}\partial h^*)^{-1}\big((I - \lambda AA^\top)s + \tfrac{\lambda}{\gamma}A(2x - z - \gamma\nabla f(x))\big)$$
$$z^+ = x - \gamma\nabla f(x) - \gamma A^\top s^+$$

$\|(z^{k+1}, s^{k+1}) - (z^k, s^k)\|_M^2 = o\big(\tfrac{1}{k+1}\big)$, and $(z^k, s^k)$ weakly converges to a fixed point $(z^*, s^*)$.

Let $L(x, s) = f(x) + g(x) + \langle Ax, s\rangle - h^*(s)$; then
$$L(\bar{x}^k, s) - L(x, \bar{s}^{k+1}) \le \frac{\|(z^1, s^1) - (z, s)\|^2}{k},$$
where $(\bar{x}^k, \bar{s}^{k+1}) = \frac{1}{k}\sum_{i=1}^k (x^i, s^{i+1})$ and $z = x - \gamma\nabla f(x) - \gamma A^\top s$.

Linear convergence holds with additional assumptions on $f$, $g$, and $h$.

convergence analysis: the general case

Let $M = \frac{\gamma^2}{\lambda}(I - \lambda AA^\top)$ be positive definite. Define $\|s\|_M = \sqrt{\langle s, Ms\rangle}$ and $\|(z, s)\|_M^2 = \|z\|^2 + \|s\|_M^2$.

Lemma. The iteration $T$ mapping $(z, s)$ to $(z^+, s^+)$ is a nonexpansive operator under the metric defined by $M$ if $\gamma \le 2\beta$. Furthermore, it is $\alpha$-averaged with $\alpha = \frac{2\beta}{4\beta - \gamma}$.

Chambolle-Pock is firmly nonexpansive under this new metric, which is different from the previous metric.

convergence analysis: the general case

Theorem.
1) Let $(z^*, s^*)$ be any fixed point of $T$. Then $\big(\|(z^k, s^k) - (z^*, s^*)\|_M\big)_{k \ge 0}$ is monotonically nonincreasing.
2) The sequence $\big(\|T(z^k, s^k) - (z^k, s^k)\|_M\big)_{k \ge 0}$ is monotonically nonincreasing and converges to 0.
3) We have the convergence rate $\|T(z^k, s^k) - (z^k, s^k)\|_M^2 = o\big(\tfrac{1}{k+1}\big)$.
4) $(z^k, s^k)$ weakly converges to a fixed point of $T$; if $X$ has finite dimension (e.g., $\mathbb{R}^m$), the convergence is strong.

convergence analysis: linear convergence

Denote:
$$u_h = \tfrac{\gamma}{\lambda}(I - \lambda AA^\top)s + A(2x - z - \gamma\nabla f(x)) - \tfrac{\gamma}{\lambda}s^+ \in \partial h^*(s^+), \qquad u_g = \tfrac{1}{\gamma}(z - x) \in \partial g(x),$$
$$u_h^* = Ax^* \in \partial h^*(s^*), \qquad u_g^* = \tfrac{1}{\gamma}(z^* - x^*) \in \partial g(x^*),$$
and assume
$$g(x) - g(y) \le L_g\|x - y\|, \quad \langle s^+ - s^*, u_h - u_h^*\rangle \ge \tau_h\|s^+ - s^*\|_M^2,$$
$$\langle x - x^*, u_g - u_g^*\rangle \ge \tau_g\|x - x^*\|^2, \quad \langle x - x^*, \nabla f(x) - \nabla f(x^*)\rangle \ge \tau_f\|x - x^*\|^2.$$

convergence analysis: linear convergence

Theorem. We have
$$\|z^+ - z^*\|^2 + (1 + 2\gamma\tau_h)\|s^+ - s^*\|_M^2 \le \rho\left(\|z - z^*\|^2 + (1 + 2\gamma\tau_h)\|s - s^*\|_M^2\right),$$
where
$$\rho = \max\left\{\frac{1}{1 + 2\gamma\tau_h},\; 1 - \frac{\big(2\gamma - \frac{\gamma^2}{\beta}\big)\tau_f + 2\gamma\tau_g}{1 + \gamma L_g}\right\}.$$
When, in addition, $\gamma < 2\beta$, $\tau_h > 0$, and $\tau_f + \tau_g > 0$, we have $\rho < 1$ and the algorithm converges linearly.

numerical experiment: fused lasso

$$\min_x\; \tfrac{1}{2}\|Ax - b\|^2 + \mu_1\|x\|_1 + \mu_2\sum_{i=1}^{p-1}|x_{i+1} - x_i|, \qquad x = (x_1, \dots, x_p) \in \mathbb{R}^p,\; A \in \mathbb{R}^{n \times p},\; b \in \mathbb{R}^n$$

Figure: The true sparse signal and the reconstructed results using PD3O, PDFP, and Condat-Vu. The right figure is a zoom-in of the signal in [3000, 5500].

numerical experiment: fused lasso

[Plots of the relative objective error $(f - f^*)/f^*$ versus iteration for PD3O, PDFP, and Condat-Vu under different stepsizes.]

Figure: In the left figure, we fix $\lambda = 1/8$ and let $\gamma = \beta, 1.5\beta, 1.9\beta$. In the right figure, we fix $\gamma = 1.9\beta$ and let $\lambda = 1/80, 1/8, 1/4$.

applications: decentralized optimization (recap)

$$\min_{\mathbf{x}}\; f(\mathbf{x}) + g(\mathbf{x}) + \iota_0\big((I - W)^{1/2}\mathbf{x}\big), \qquad f(\mathbf{x}) + g(\mathbf{x}) := \sum_{i=1}^n f_i(x_i) + g_i(x_i),$$

where $f_i$ and $g_i$ are known at node $i$ only, the nodes $1, \dots, n$ are connected in an undirected graph, each $f_i$ is differentiable with a Lipschitz continuous gradient, $x_i \in \mathbb{R}^p$ is the copy at node $i$, and $W$ is a symmetric doubly stochastic mixing matrix.

comparing PG-EXTRA and NIDS (Li-Shi-Yan 17)

$$\min_{\mathbf{x}}\; \frac{1}{2}\sum_{i=1}^n\|A_ix_i - b_i\|^2 + \mu_1\sum_{i=1}^n\|x_i\|_1 + \iota_0\big((I - W)^{1/2}\mathbf{x}\big)$$

[Plot: convergence of NIDS (stepsizes 1/L, 1.5/L, 1.9/L) and PG-EXTRA (stepsizes 1/L, 1.2/L, 1.3/L, 1.4/L) versus the number of iterations.]

conclusion

a new primal-dual algorithm (PD3O) for minimizing the sum of three functions.
a new interpretation of Chambolle-Pock: Douglas-Rachford splitting on the KKT system under a new metric induced by a block diagonal matrix.
PAPC is forward-backward splitting applied on the KKT system under the same metric; we proved the optimal bound for the parameters (dual stepsize).
PD3O is a generalization of both Chambolle-Pock and PAPC, and it has the advantages of Condat-Vu (a generalization of Chambolle-Pock) and of AFBA and PDFP (two generalizations of PAPC).
In decentralized consensus optimization, we derive a fast method (NIDS) whose stepsize does not depend on the network structure; we provide an optimal bound for the stepsize in PG-EXTRA (Shi et al. 15).

Thank You!

Paper 1: M. Yan, A new primal-dual method for minimizing the sum of three functions with a linear operator, arXiv:1611.09805.
Paper 2: Z. Li, W. Shi, and M. Yan, A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates, arXiv preprint.
Paper 3: Z. Li and M. Yan, A primal-dual algorithm with optimal stepsizes and its application in decentralized consensus optimization, arXiv preprint.
