BISTA: a Bregmanian proximal gradient method without the global Lipschitz continuity assumption


Daniel Reem (joint work with Simeon Reich and Alvaro De Pierro)
Department of Mathematics, The Technion, Haifa, Israel
dream@technion.ac.il
The 2018 annual meeting of the Israel Mathematical Union (IMU 2018), The Technion, Haifa, Israel, 24 May 2018 (20-25 minutes)

Minimization of a separable (decomposable) function

Goal: to estimate
$$\inf\{F(x) : x \in C\},$$
where:
- $F = f + g$,
- $C$ is a closed and convex subset of some space $X$, say $\mathbb{R}^n$,
- $f$ is smooth, $g$ possibly not; both are convex.

Motivation

Such a minimization problem appears in the solution process of many theoretical and practical problems, including:
- inverse problems
- image processing
- compressed sensing
- machine learning
- and more

A typical scenario: $\ell_1$ regularization in signal processing

Here
$$\inf_{x \in \mathbb{R}^n} \Big( \underbrace{\|Ax - b\|^2}_{f} + \underbrace{\lambda \|x\|_1}_{g} \Big)$$
- Now $g$ is not smooth.
- $A$ is an $m \times n$ matrix.
- $m$ and $n$ are large positive integers.
- $b \in \mathbb{R}^m$ is known.
- $\lambda > 0$ is a regularization parameter.
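For this particular model the common smoothness assumption of the next slides does hold; a short computation (added for context, not on the slides) makes this explicit. The gradient of $f(x) = \|Ax - b\|^2$ is $\nabla f(x) = 2A^{\top}(Ax - b)$, hence
$$\|\nabla f(x) - \nabla f(y)\| = 2\|A^{\top}A(x - y)\| \le 2\|A\|^2 \|x - y\|,$$
so $\nabla f$ is globally Lipschitz continuous with $L(f) = 2\|A\|^2$. The $\ell_p$-$\ell_1$ example near the end of the talk shows what goes wrong when the exponent 2 is replaced by $p > 2$.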

The proximal gradient method

Among the methods used for solving the minimization problem. Basic form:
- $x_1 \in C$ is given
- $x_k := \operatorname{argmin}_{x \in C} \big( F(x) + c_k \|x - x_{k-1}\|^2 \big), \quad k \ge 2$

A more general form:
- $x_k := \operatorname{argmin}_{x \in C} \big( F_k(x) + c_k B(x, x_{k-1}) \big), \quad k \ge 2.$

Here $c_k > 0$ and $F_k$ is an approximation to $F = f + g$, e.g.,
$$F_k(x) := f(x_{k-1}) + \langle \nabla f(x_{k-1}), x - x_{k-1} \rangle + g(x).$$
$B$ is a Bregman divergence induced by some function $b$ (more on that: later).
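To make this concrete, here is a minimal runnable sketch (an illustration, not code from the talk) of the iteration for the $\ell_1$ problem of the previous slide, with $B(x, y) = \frac{1}{2}\|x - y\|^2$ and the linearized $F_k$ above; in that case the update reduces to a gradient step followed by componentwise soft-thresholding (this is classical ISTA). The step size $1/L$ with $L = 2\|A\|^2$ and the toy data are assumptions of the sketch.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: componentwise shrinkage."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, num_iters=500):
    """Classical proximal gradient (ISTA) for min ||Ax - b||^2 + lam*||x||_1."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2       # Lipschitz constant of grad f
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = 2.0 * A.T @ (A @ x - b)        # gradient of the smooth part f
        x = soft_threshold(x - grad / L, lam / L)  # gradient step + prox of g
    return x

# Toy usage (hypothetical data):
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100); x_true[:5] = 1.0
b = A @ x_true
print(ista(A, b, lam=0.1)[:8])
```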

Proximal algorithms: a very common assumption

$\nabla f$ is globally Lipschitz continuous on $C$ (or on $X$), namely: there exists $L(f) \ge 0$ such that
$$\|\nabla f(x) - \nabla f(y)\| \le L(f) \|x - y\|, \qquad x, y \in C.$$
This assumption does not always hold, thus casting a limitation on many prox algorithms.
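A standard one-dimensional illustration of this failure (added here, not on the slides): take $f(x) = \frac{1}{4}x^4$ on $\mathbb{R}$, so $f'(x) = x^3$ and
$$\frac{|f'(x) - f'(0)|}{|x - 0|} = x^2 \to \infty \quad (x \to \infty);$$
hence no global constant $L(f)$ exists, although $f'$ is Lipschitz continuous on every bounded set. This bounded-set observation is exactly what BISTA exploits below.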

Our results: schematic description

Introducing BISTA:
- a new proximal gradient method
- the prox operator is induced by a Bregman divergence
- a core new contribution: we decompose $C = \bigcup_{k=1}^{\infty} S_k$
- the proximal iteration is performed on $S_k$ and not globally (on the whole $C$ or $X$);
- hence more flexibility and a better chance that good properties are satisfied, e.g., that $\nabla f$ will be Lipschitz continuous on $S_k$ (frequently all the $S_k$ are bounded)

The origin of BISTA

- ISTA: a method suggested by Beck-Teboulle (2009)
- BISTA significantly extends ISTA
- BISTA = Bregmanian ISTA [or Brazilian ISTA :-) ]
- Preliminary version: during the postdoc in Brazil (I worked with Alvaro De Pierro)

Our results: schematic description (Cont.)

- Relatively general setting: real reflexive Banach space $X$ (Hilbertian in practice)
- Constrained minimization: $C \subseteq X$
- $\nabla f$ is not necessarily globally Lipschitz continuous (it should be Lipschitz continuous only on a subset of $S_k$)
- Several convergence results (under some assumptions), for example:
  - Non-asymptotic convergence rate (in the function values) of $O(1/k)$, or a rate arbitrarily close to $O(1/k)$
  - Weak convergence

Our results: schematic description (Cont.)

Some auxiliary results of independent interest:
- Generalization of a key lemma in Beck-Teboulle 2009
- Sufficient conditions for the minimizer which appears in the proximal operation to be an interior point
- A principle about the stability of the extreme values of a uniformly continuous function in a general metric space: a small perturbation of the set on which these values are computed yields a small change of these values (more details: in Slide 22)
- This stability principle suggests a general scheme for tackling a wide class of non-convex and non-smooth optimization problems (details: in the paper)

Bregman divergences (Bregman distances)

Let $b : X \to (-\infty, \infty]$. Assume that:
- $U := \operatorname{Int}(\operatorname{dom}(b)) \neq \emptyset$, where $\operatorname{dom}(b) := \{x \in X : b(x) < \infty\}$
- $b$ is Gâteaux differentiable in $U$
- $b$ is convex and lower semicontinuous on $X$ and strictly convex on $\operatorname{dom}(b)$

Then the Bregman divergence associated with $b$ is:
$$B(x, y) := \begin{cases} b(x) - b(y) - \langle b'(y), x - y \rangle, & (x, y) \in \operatorname{dom}(b) \times U, \\ \infty, & \text{otherwise.} \end{cases}$$
Here $b'(y) \in X^*$ and $\langle b'(y), x - y \rangle := b'(y)(x - y)$.
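Two standard instances, in a short sketch added for illustration (not from the slides): $b(x) = \frac{1}{2}\|x\|_2^2$ gives $B(x, y) = \frac{1}{2}\|x - y\|_2^2$, and the negative Boltzmann-Gibbs-Shannon entropy gives a Kullback-Leibler-type divergence.

```python
import numpy as np

def bregman(b, grad_b, x, y):
    """Bregman divergence B(x, y) = b(x) - b(y) - <b'(y), x - y>."""
    return b(x) - b(y) - grad_b(y) @ (x - y)

# b(x) = 0.5*||x||^2  =>  B(x, y) = 0.5*||x - y||^2 (the Euclidean case)
sq = lambda x: 0.5 * x @ x
sq_grad = lambda x: x

# Negative entropy b(x) = sum x_k*log(x_k)  =>  KL-type divergence
ent = lambda x: np.sum(x * np.log(x))
ent_grad = lambda x: np.log(x) + 1.0

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])
print(bregman(sq, sq_grad, x, y))    # equals 0.5*||x - y||^2
print(bregman(ent, ent_grad, x, y))  # equals sum x*log(x/y) - sum(x) + sum(y)
```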

Strongly convex functions: reminder

Suppose that:
- $C$ is a convex subset
- $S \subseteq C$ ($S$ is not necessarily convex)

$b : C \to \mathbb{R}$ is called strongly convex on $S$ if there exists $\mu > 0$ such that for each $(x, y) \in S^2$ and each $\lambda \in (0, 1)$,
$$b(\lambda x + (1 - \lambda) y) \le \lambda b(x) + (1 - \lambda) b(y) - 0.5\,\mu \lambda (1 - \lambda) \|x - y\|^2.$$
In other words, $b$ satisfies a stronger condition than convexity. Many well-known Bregman functions are strongly convex on bounded subsets of their effective domain, e.g., the negative Boltzmann-Gibbs-Shannon entropy $b(x) = \sum_{k=1}^{n} x_k \log(x_k)$.
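A quick sanity check (added here, not on the slides): for $b(x) = \frac{1}{2}\|x\|_2^2$ the definition holds with $\mu = 1$, and even with equality, because of the identity
$$\lambda \cdot \tfrac{1}{2}\|x\|^2 + (1 - \lambda) \cdot \tfrac{1}{2}\|y\|^2 - \tfrac{1}{2}\|\lambda x + (1 - \lambda) y\|^2 = \tfrac{1}{2}\lambda(1 - \lambda)\|x - y\|^2,$$
which follows by expanding the squared norm on the left-hand side.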

A few assumptions

- $C \subseteq \operatorname{dom}(b)$
- $f : \operatorname{dom}(b) \to \mathbb{R}$: convex on $\operatorname{dom}(b)$, Gâteaux differentiable in $U$
- $g : C \to (-\infty, \infty]$: convex, proper, lower semicontinuous
- $$F(x) := \begin{cases} f(x) + g(x), & x \in C, \\ \infty, & x \notin C. \end{cases}$$
- $\operatorname{MIN}(F)$ (the set of minimizers of $F$) is nonempty and contained in $U := \operatorname{Int}(\operatorname{dom}(b))$

A few assumptions (Cont.)

$C = \bigcup_{k=1}^{\infty} S_k$ (a core new contribution). For each $k$:
- $S_k$ is closed and convex, $S_k \cap U \neq \emptyset$, $S_k \subseteq S_{k+1}$
- $b$ is strongly convex on $S_k$ with parameter $\mu_k > 0$, and $\mu_k \ge \mu_{k+1}$
- $g$ is proper on $S_k$
- $\nabla f$ is Lipschitz continuous on $S_k \cap U$ with a constant $L(f, S_k \cap U)$

A few assumptions (Cont.)

Given $L_k > 0$, let
$$Q_{L_k, \mu_k, S_k}(x, y) := f(y) + \langle \nabla f(y), x - y \rangle + \frac{L_k}{\mu_k} B(x, y) + g(x), \qquad x \in S_k,\ y \in U,$$
and
$$p_{L_k, \mu_k, S_k}(y) := \operatorname{argmin}\{Q_{L_k, \mu_k, S_k}(x, y) : x \in S_k\}, \qquad y \in S_k \cap U.$$

Lemma: $p_{L_k, \mu_k, S_k}(y)$ exists and is unique for each $y \in S_k \cap U$.

Assumption: $p_{L_k, \mu_k, S_k}(y) \in S_k \cap U$ for each $k$.
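For orientation (a standard reduction, not spelled out on the slides): taking $b(x) = \frac{1}{2}\|x\|_2^2$, so that $B(x, y) = \frac{1}{2}\|x - y\|_2^2$ and $\mu_k = 1$, completing the square in $Q_{L_k, 1, S_k}(\cdot, y)$ gives
$$p_{L_k, 1, S_k}(y) = \operatorname{argmin}_{x \in S_k}\Big( g(x) + \frac{L_k}{2}\Big\|x - \Big(y - \frac{1}{L_k}\nabla f(y)\Big)\Big\|^2 \Big),$$
i.e., a gradient step on $f$ followed by a proximal step on $g$ restricted to $S_k$; with $S_k = X$ this is the classical ISTA update.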

Remark on the assumptions

These assumptions occur frequently in applications (details: in the paper).

BISTA (the Lipschitz step size version)

Input: a positive number $L_1 \ge L(f, S_1 \cap U)$.
Step 1 (Initialization): arbitrary $x_1 \in S_1 \cap U$.
Step $k$, $k \ge 2$: $L_k$ is arbitrary such that $L_k \ge \max\{L_{k-1}, L(f, S_k \cap U)\}$. Given $\mu_k > 0$ (a parameter of strong convexity of $b$ on $S_k$), let
$$x_k := p_{L_k, \mu_k, S_k}(x_{k-1}).$$
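A minimal runnable sketch of this scheme (an illustration under extra assumptions, not the authors' code): take $b(x) = \frac{1}{2}\|x\|_2^2$ (so $\mu_k = 1$), $g = \lambda\|\cdot\|_1$, and $S_k$ a Euclidean ball of radius $r_k$ centered at $0$. For this specific pair the inner minimization $p_{L_k, 1, S_k}$ has a closed form, namely soft-thresholding followed by projection onto the ball; that composition identity is assumed here rather than taken from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    # prox of t*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def project_ball(v, r):
    # Euclidean projection onto the ball of radius r centered at 0
    nv = np.linalg.norm(v)
    return v if nv <= r else (r / nv) * v

def bista(grad_f, lam, L_of_k, r_of_k, x1, num_iters=200):
    """BISTA sketch with b = 0.5*||.||^2, g = lam*||.||_1, S_k = ball of radius r_of_k(k).

    L_of_k(k) should be nondecreasing in k and dominate the Lipschitz
    constant of grad_f on S_k, as in the algorithm above."""
    x = x1
    for k in range(2, num_iters + 2):
        L_k = L_of_k(k)
        v = x - grad_f(x) / L_k                 # forward (gradient) step
        x = project_ball(soft_threshold(v, lam / L_k), r_of_k(k))  # prox over S_k
    return x
```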

A few relevant works

Beck-Teboulle 2009, Cohen 1980, Tseng 2008 (preprint), Bello Cruz and Nghia 2016, Nguyen 2017, Bauschke-Bolte-Teboulle 2017, Bolte-Sabach-Teboulle-Vaisbourd 2017/8.

All of them are significantly different from our work, either in:
- the setting, and/or
- the method, and/or
- the results

More details: in the paper.

The convergence results

Omitted, due to lack of time (but short details about them can be found in Slide 9). Details about them: in the paper. An illustrative example: next slide.

Example: $\ell_p$-$\ell_1$ minimization

Goal: to estimate
$$\inf_{x \in X} \Big( \underbrace{\tfrac{1}{p} \|Ax - z\|_p^p}_{f} + \underbrace{\lambda \|x\|_1}_{g} \Big),$$
where:
- $n \in \mathbb{N}$, $m \in \mathbb{N}$, $p \in [2, \infty)$,
- $X := \mathbb{R}^n$ with the $p$-norm $\|x\|_p := \big( \sum_{k=1}^{n} |x_k|^p \big)^{1/p}$, $x \in X$,
- $C = X$,
- $A : X \to \mathbb{R}^m$ is a given linear operator,
- $z \in \mathbb{R}^m$ is given,
- $\lambda > 0$ is given.

Example: $\ell_p$-$\ell_1$ minimization (Cont.)

- $\nabla f$ is not globally Lipschitz continuous when $p > 2$
- Hence we cannot use most of the prox-grad methods
- But we can use BISTA, e.g., with $b(x) := \frac{1}{2}\|x\|_2^2$
- Fix some $\gamma \in (0, 1/(p-2))$ (if $p = 2$, then $\gamma$ can be an arbitrary positive number)
- Let $S_k$ be the ball of radius $r_k := k^{\gamma}$ and center $0$
- We can take $L_k := (p-1) 2^{p-2} \|A\|^2 k^{\gamma(p-2)}$
- The convergence theorem ensures that $x_k$ converges to an optimal solution
- The non-asymptotic rate of convergence is $O\big( 1/k^{1 - \gamma(p-2)} \big)$, namely, it can be arbitrarily close to $O(1/k)$ since $\gamma$ can be arbitrarily small
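A self-contained sketch of this instantiation (again an illustration, not the authors' code): the gradient formula $\nabla f(x) = A^{\top}\big(|Ax - z|^{p-1}\operatorname{sign}(Ax - z)\big)$ is a standard componentwise computation, the prox step over the ball is the closed form assumed in the earlier sketch, and the spectral norm is used below as a stand-in for the operator norm $\|A\|$ on the slide.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def project_ball(v, r):
    nv = np.linalg.norm(v)
    return v if nv <= r else (r / nv) * v

def bista_lp_l1(A, z, lam, p, gamma, num_iters=300):
    """BISTA for min (1/p)*||Ax - z||_p^p + lam*||x||_1 with b = 0.5*||.||_2^2,
    S_k = Euclidean ball of radius k**gamma, and L_k as on the slide."""
    norm_A = np.linalg.norm(A, 2)     # spectral norm, standing in for ||A||
    x = np.zeros(A.shape[1])          # x_1 = 0 lies in S_1
    for k in range(2, num_iters + 2):
        u = A @ x - z
        grad = A.T @ (np.abs(u) ** (p - 1) * np.sign(u))  # gradient of f
        L_k = (p - 1) * 2 ** (p - 2) * norm_A ** 2 * k ** (gamma * (p - 2))
        v = x - grad / L_k
        x = project_ball(soft_threshold(v, lam / L_k), k ** gamma)
    return x

# Toy usage (hypothetical data), with p = 3 and gamma in (0, 1/(p-2)) = (0, 1):
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 60))
z = rng.standard_normal(30)
print(np.count_nonzero(bista_lp_l1(A, z, lam=0.5, p=3.0, gamma=0.5)))
```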

Stability of the extreme values (intuitive formulation)

Principle: Consider an arbitrary uniformly continuous function $f : X \to \mathbb{R}$ defined on a metric space $X$. Then its extreme (optimal) values depend continuously on the subset over which they are computed. In other words, if we compute the extreme values of $f$ over some subset $C \subseteq X$ and we slightly change $C$ to $C'$, then the extreme values change slightly, namely
$$\inf_{x' \in C'} f(x') \approx \inf_{x \in C} f(x) \quad \text{and} \quad \sup_{x' \in C'} f(x') \approx \sup_{x \in C} f(x).$$

Remark: Uniform continuity is essential: there are counterexamples to the stability principle when the considered function is not uniformly continuous.
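A toy numerical illustration (added here): minimizing a uniformly continuous function over an interval and over a slightly shifted interval yields nearby minimal values.

```python
import numpy as np

f = np.cos  # uniformly continuous on all of R

# C = [0, 2] and a small perturbation C' = [0.05, 2.05]
C  = np.linspace(0.00, 2.00, 200001)
Cp = np.linspace(0.05, 2.05, 200001)
print(f(C).min())   # about cos(2)    ~ -0.4161
print(f(Cp).min())  # about cos(2.05) ~ -0.4611, a correspondingly small change
```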

The End

P.S. The slides and the paper can be found online:
Slides:
Paper:


More information

Least squares regularized or constrained by L0: relationship between their global minimizers. Mila Nikolova

Least squares regularized or constrained by L0: relationship between their global minimizers. Mila Nikolova Least squares regularized or constrained by L0: relationship between their global minimizers Mila Nikolova CMLA, CNRS, ENS Cachan, Université Paris-Saclay, France nikolova@cmla.ens-cachan.fr SIAM Minisymposium

More information

Penalized Barycenters in the Wasserstein space

Penalized Barycenters in the Wasserstein space Penalized Barycenters in the Wasserstein space Elsa Cazelles, joint work with Jérémie Bigot & Nicolas Papadakis Université de Bordeaux & CNRS Journées IOP - Du 5 au 8 Juillet 2017 Bordeaux Elsa Cazelles

More information

Learning with stochastic proximal gradient

Learning with stochastic proximal gradient Learning with stochastic proximal gradient Lorenzo Rosasco DIBRIS, Università di Genova Via Dodecaneso, 35 16146 Genova, Italy lrosasco@mit.edu Silvia Villa, Băng Công Vũ Laboratory for Computational and

More information

On the split equality common fixed point problem for quasi-nonexpansive multi-valued mappings in Banach spaces

On the split equality common fixed point problem for quasi-nonexpansive multi-valued mappings in Banach spaces Available online at www.tjnsa.com J. Nonlinear Sci. Appl. 9 (06), 5536 5543 Research Article On the split equality common fixed point problem for quasi-nonexpansive multi-valued mappings in Banach spaces

More information

Bregman Divergence and Mirror Descent

Bregman Divergence and Mirror Descent Bregman Divergence and Mirror Descent Bregman Divergence Motivation Generalize squared Euclidean distance to a class of distances that all share similar properties Lots of applications in machine learning,

More information

Splitting methods for decomposing separable convex programs

Splitting methods for decomposing separable convex programs Splitting methods for decomposing separable convex programs Philippe Mahey LIMOS - ISIMA - Université Blaise Pascal PGMO, ENSTA 2013 October 4, 2013 1 / 30 Plan 1 Max Monotone Operators Proximal techniques

More information

An Optimal Affine Invariant Smooth Minimization Algorithm.

An Optimal Affine Invariant Smooth Minimization Algorithm. An Optimal Affine Invariant Smooth Minimization Algorithm. Alexandre d Aspremont, CNRS & École Polytechnique. Joint work with Martin Jaggi. Support from ERC SIPA. A. d Aspremont IWSL, Moscow, June 2013,

More information

Primal and dual predicted decrease approximation methods

Primal and dual predicted decrease approximation methods Math. Program., Ser. B DOI 10.1007/s10107-017-1108-9 FULL LENGTH PAPER Primal and dual predicted decrease approximation methods Amir Beck 1 Edouard Pauwels 2 Shoham Sabach 1 Received: 11 November 2015

More information

On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems

On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems Radu Ioan Boţ Ernö Robert Csetnek August 5, 014 Abstract. In this paper we analyze the

More information

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013 Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for

More information

On robustness of the regularity property of maps

On robustness of the regularity property of maps Control and Cybernetics vol. 32 (2003) No. 3 On robustness of the regularity property of maps by Alexander D. Ioffe Department of Mathematics, Technion, Haifa 32000, Israel Abstract: The problem considered

More information

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. 1. Prediction with expert advice. 2. With perfect

More information

ITERATIVE SCHEMES FOR APPROXIMATING SOLUTIONS OF ACCRETIVE OPERATORS IN BANACH SPACES SHOJI KAMIMURA AND WATARU TAKAHASHI. Received December 14, 1999

ITERATIVE SCHEMES FOR APPROXIMATING SOLUTIONS OF ACCRETIVE OPERATORS IN BANACH SPACES SHOJI KAMIMURA AND WATARU TAKAHASHI. Received December 14, 1999 Scientiae Mathematicae Vol. 3, No. 1(2000), 107 115 107 ITERATIVE SCHEMES FOR APPROXIMATING SOLUTIONS OF ACCRETIVE OPERATORS IN BANACH SPACES SHOJI KAMIMURA AND WATARU TAKAHASHI Received December 14, 1999

More information

A FIXED POINT THEOREM FOR GENERALIZED NONEXPANSIVE MULTIVALUED MAPPINGS

A FIXED POINT THEOREM FOR GENERALIZED NONEXPANSIVE MULTIVALUED MAPPINGS Fixed Point Theory, (0), No., 4-46 http://www.math.ubbcluj.ro/ nodeacj/sfptcj.html A FIXED POINT THEOREM FOR GENERALIZED NONEXPANSIVE MULTIVALUED MAPPINGS A. ABKAR AND M. ESLAMIAN Department of Mathematics,

More information

Compact operators on Banach spaces

Compact operators on Banach spaces Compact operators on Banach spaces Jordan Bell jordan.bell@gmail.com Department of Mathematics, University of Toronto November 12, 2017 1 Introduction In this note I prove several things about compact

More information

The resolvent average of monotone operators: dominant and recessive properties

The resolvent average of monotone operators: dominant and recessive properties The resolvent average of monotone operators: dominant and recessive properties Sedi Bartz, Heinz H. Bauschke, Sarah M. Moffat, and Xianfu Wang September 30, 2015 (first revision) December 22, 2015 (second

More information

Derivatives. Differentiability problems in Banach spaces. Existence of derivatives. Sharpness of Lebesgue s result

Derivatives. Differentiability problems in Banach spaces. Existence of derivatives. Sharpness of Lebesgue s result Differentiability problems in Banach spaces David Preiss 1 Expanded notes of a talk based on a nearly finished research monograph Fréchet differentiability of Lipschitz functions and porous sets in Banach

More information