BISTA: a Bregmanian proximal gradient method without the global Lipschitz continuity assumption


Daniel Reem (joint work with Simeon Reich and Alvaro De Pierro)
Department of Mathematics, The Technion, Haifa, Israel
dream@technion.ac.il
The 2018 annual meeting of the Israel Mathematical Union (IMU 2018), The Technion, Haifa, Israel, 24 May 2018 (20-25 minutes)

Minimization of a separable (decomposable) function

Goal: to estimate
$$\inf\{F(x) : x \in C\},$$
where:
- $F = f + g$,
- $C$ is a closed and convex subset of some space $X$, say $\mathbb{R}^n$,
- $f$ is smooth, $g$ possibly not; both are convex.

Motivation

Such a minimization problem appears in the solution process of many theoretical and practical problems, including:
- inverse problems
- image processing
- compressed sensing
- machine learning
- and more

A typical scenario: $\ell_1$ regularization in signal processing

Here
$$\inf_{x \in \mathbb{R}^n} \Big( \underbrace{\|Ax - b\|^2}_{f} + \underbrace{\lambda \|x\|_1}_{g} \Big)$$
- Now $g$ is not smooth.
- $A$ is an $m \times n$ matrix.
- $m$ and $n$ are large positive integers.
- $b \in \mathbb{R}^m$ is known.
- $\lambda > 0$ is a regularization parameter.
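For this particular model the common smoothness assumption of the next slides does hold; a short computation (added for context, not on the slides) makes this explicit. The gradient of $f(x) = \|Ax - b\|^2$ is $\nabla f(x) = 2A^{\top}(Ax - b)$, hence
$$\|\nabla f(x) - \nabla f(y)\| = 2\|A^{\top}A(x - y)\| \le 2\|A\|^2 \|x - y\|,$$
so $\nabla f$ is globally Lipschitz continuous with $L(f) = 2\|A\|^2$. The $\ell_p$-$\ell_1$ example near the end of the talk shows what goes wrong when the exponent 2 is replaced by $p > 2$.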

The proximal gradient method

Among the methods used for solving the minimization problem. Basic form:
- $x_1 \in C$ is given
- $x_k := \operatorname{argmin}_{x \in C} \big( F(x) + c_k \|x - x_{k-1}\|^2 \big), \quad k \ge 2$

A more general form:
- $x_k := \operatorname{argmin}_{x \in C} \big( F_k(x) + c_k B(x, x_{k-1}) \big), \quad k \ge 2.$

Here $c_k > 0$ and $F_k$ is an approximation to $F = f + g$, e.g.,
$$F_k(x) := f(x_{k-1}) + \langle \nabla f(x_{k-1}), x - x_{k-1} \rangle + g(x).$$
$B$ is a Bregman divergence induced by some function $b$ (more on that: later).
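To make this concrete, here is a minimal runnable sketch (an illustration, not code from the talk) of the iteration for the $\ell_1$ problem of the previous slide, with $B(x, y) = \frac{1}{2}\|x - y\|^2$ and the linearized $F_k$ above; in that case the update reduces to a gradient step followed by componentwise soft-thresholding (this is classical ISTA). The step size $1/L$ with $L = 2\|A\|^2$ and the toy data are assumptions of the sketch.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: componentwise shrinkage."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, num_iters=500):
    """Classical proximal gradient (ISTA) for min ||Ax - b||^2 + lam*||x||_1."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2       # Lipschitz constant of grad f
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = 2.0 * A.T @ (A @ x - b)        # gradient of the smooth part f
        x = soft_threshold(x - grad / L, lam / L)  # gradient step + prox of g
    return x

# Toy usage (hypothetical data):
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100); x_true[:5] = 1.0
b = A @ x_true
print(ista(A, b, lam=0.1)[:8])
```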

Proximal algorithms: a very common assumption

$\nabla f$ is globally Lipschitz continuous on $C$ (or on $X$), namely: there exists $L(f) \ge 0$ such that
$$\|\nabla f(x) - \nabla f(y)\| \le L(f) \|x - y\|, \qquad x, y \in C.$$
This assumption does not always hold, thus casting a limitation on many prox algorithms.
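A standard one-dimensional illustration of this failure (added here, not on the slides): take $f(x) = \frac{1}{4}x^4$ on $\mathbb{R}$, so $f'(x) = x^3$ and
$$\frac{|f'(x) - f'(0)|}{|x - 0|} = x^2 \to \infty \quad (x \to \infty);$$
hence no global constant $L(f)$ exists, although $f'$ is Lipschitz continuous on every bounded set. This bounded-set observation is exactly what BISTA exploits below.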

Our results: schematic description

Introducing BISTA:
- a new proximal gradient method
- the prox operator is induced by a Bregman divergence
- a core new contribution: we decompose $C = \bigcup_{k=1}^{\infty} S_k$
- the proximal iteration is performed on $S_k$ and not globally (on the whole $C$ or $X$);
- hence more flexibility and a better chance that good properties are satisfied, e.g., that $\nabla f$ will be Lipschitz continuous on $S_k$ (frequently all the $S_k$ are bounded)

The origin of BISTA

- ISTA: a method suggested by Beck-Teboulle (2009)
- BISTA significantly extends ISTA
- BISTA = Bregmanian ISTA [or Brazilian ISTA :-) ]
- Preliminary version: during the postdoc in Brazil (I worked with Alvaro De Pierro)

Our results: schematic description (Cont.)

- Relatively general setting: real reflexive Banach space $X$ (Hilbertian in practice)
- Constrained minimization: $C \subseteq X$
- $\nabla f$ is not necessarily globally Lipschitz continuous (it should be Lipschitz continuous only on a subset of $S_k$)
- Several convergence results (under some assumptions), for example:
  - Non-asymptotic convergence rate (in the function values) of $O(1/k)$, or a rate arbitrarily close to $O(1/k)$
  - Weak convergence

Our results: schematic description (Cont.)

Some auxiliary results of independent interest:
- Generalization of a key lemma in Beck-Teboulle 2009
- Sufficient conditions for the minimizer which appears in the proximal operation to be an interior point
- A principle about the stability of the extreme values of a uniformly continuous function in a general metric space: a small perturbation of the set on which these values are computed yields a small change of these values (more details: in Slide 22)
- This stability principle suggests a general scheme for tackling a wide class of non-convex and non-smooth optimization problems (details: in the paper)

Bregman divergences (Bregman distances)

Let $b : X \to (-\infty, \infty]$. Assume that:
- $U := \operatorname{Int}(\operatorname{dom}(b)) \neq \emptyset$, where $\operatorname{dom}(b) := \{x \in X : b(x) < \infty\}$
- $b$ is Gâteaux differentiable in $U$
- $b$ is convex and lower semicontinuous on $X$ and strictly convex on $\operatorname{dom}(b)$

Then the Bregman divergence associated with $b$ is:
$$B(x, y) := \begin{cases} b(x) - b(y) - \langle b'(y), x - y \rangle, & (x, y) \in \operatorname{dom}(b) \times U, \\ \infty, & \text{otherwise.} \end{cases}$$
Here $b'(y) \in X^*$ and $\langle b'(y), x - y \rangle := b'(y)(x - y)$.
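Two standard instances, in a short sketch added for illustration (not from the slides): $b(x) = \frac{1}{2}\|x\|_2^2$ gives $B(x, y) = \frac{1}{2}\|x - y\|_2^2$, and the negative Boltzmann-Gibbs-Shannon entropy gives a Kullback-Leibler-type divergence.

```python
import numpy as np

def bregman(b, grad_b, x, y):
    """Bregman divergence B(x, y) = b(x) - b(y) - <b'(y), x - y>."""
    return b(x) - b(y) - grad_b(y) @ (x - y)

# b(x) = 0.5*||x||^2  =>  B(x, y) = 0.5*||x - y||^2 (the Euclidean case)
sq = lambda x: 0.5 * x @ x
sq_grad = lambda x: x

# Negative entropy b(x) = sum x_k*log(x_k)  =>  KL-type divergence
ent = lambda x: np.sum(x * np.log(x))
ent_grad = lambda x: np.log(x) + 1.0

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])
print(bregman(sq, sq_grad, x, y))    # equals 0.5*||x - y||^2
print(bregman(ent, ent_grad, x, y))  # equals sum x*log(x/y) - sum(x) + sum(y)
```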

Strongly convex functions: reminder

Suppose that:
- $C$ is a convex subset
- $S \subseteq C$ ($S$ is not necessarily convex)

$b : C \to \mathbb{R}$ is called strongly convex on $S$ if there exists $\mu > 0$ such that for each $(x, y) \in S^2$ and each $\lambda \in (0, 1)$,
$$b(\lambda x + (1 - \lambda) y) \le \lambda b(x) + (1 - \lambda) b(y) - 0.5\,\mu \lambda (1 - \lambda) \|x - y\|^2.$$
In other words, $b$ satisfies a stronger condition than convexity. Many well-known Bregman functions are strongly convex on bounded subsets of their effective domain, e.g., the negative Boltzmann-Gibbs-Shannon entropy $b(x) = \sum_{k=1}^{n} x_k \log(x_k)$.
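A quick sanity check (added here, not on the slides): for $b(x) = \frac{1}{2}\|x\|_2^2$ the definition holds with $\mu = 1$, and even with equality, because of the identity
$$\lambda \cdot \tfrac{1}{2}\|x\|^2 + (1 - \lambda) \cdot \tfrac{1}{2}\|y\|^2 - \tfrac{1}{2}\|\lambda x + (1 - \lambda) y\|^2 = \tfrac{1}{2}\lambda(1 - \lambda)\|x - y\|^2,$$
which follows by expanding the squared norm on the left-hand side.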

A few assumptions

- $C \subseteq \operatorname{dom}(b)$
- $f : \operatorname{dom}(b) \to \mathbb{R}$: convex on $\operatorname{dom}(b)$, Gâteaux differentiable in $U$
- $g : C \to (-\infty, \infty]$: convex, proper, lower semicontinuous
- $$F(x) := \begin{cases} f(x) + g(x), & x \in C, \\ \infty, & x \notin C. \end{cases}$$
- $\operatorname{MIN}(F)$ (the set of minimizers of $F$) is nonempty and contained in $U := \operatorname{Int}(\operatorname{dom}(b))$

A few assumptions (Cont.)

$C = \bigcup_{k=1}^{\infty} S_k$ (a core new contribution). For each $k$:
- $S_k$ is closed and convex, $S_k \cap U \neq \emptyset$, $S_k \subseteq S_{k+1}$
- $b$ is strongly convex on $S_k$ with parameter $\mu_k > 0$, and $\mu_k \ge \mu_{k+1}$
- $g$ is proper on $S_k$
- $\nabla f$ is Lipschitz continuous on $S_k \cap U$ with a constant $L(f, S_k \cap U)$

A few assumptions (Cont.)

Given $L_k > 0$, let
$$Q_{L_k, \mu_k, S_k}(x, y) := f(y) + \langle \nabla f(y), x - y \rangle + \frac{L_k}{\mu_k} B(x, y) + g(x), \qquad x \in S_k,\ y \in U,$$
and
$$p_{L_k, \mu_k, S_k}(y) := \operatorname{argmin}\{Q_{L_k, \mu_k, S_k}(x, y) : x \in S_k\}, \qquad y \in S_k \cap U.$$

Lemma: $p_{L_k, \mu_k, S_k}(y)$ exists and is unique for each $y \in S_k \cap U$.

Assumption: $p_{L_k, \mu_k, S_k}(y) \in S_k \cap U$ for each $k$.
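For orientation (a standard reduction, not spelled out on the slides): taking $b(x) = \frac{1}{2}\|x\|_2^2$, so that $B(x, y) = \frac{1}{2}\|x - y\|_2^2$ and $\mu_k = 1$, completing the square in $Q_{L_k, 1, S_k}(\cdot, y)$ gives
$$p_{L_k, 1, S_k}(y) = \operatorname{argmin}_{x \in S_k}\Big( g(x) + \frac{L_k}{2}\Big\|x - \Big(y - \frac{1}{L_k}\nabla f(y)\Big)\Big\|^2 \Big),$$
i.e., a gradient step on $f$ followed by a proximal step on $g$ restricted to $S_k$; with $S_k = X$ this is the classical ISTA update.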

Remark on the assumptions

These assumptions occur frequently in applications (details: in the paper).

BISTA (the Lipschitz step size version)

Input: a positive number $L_1 \ge L(f, S_1 \cap U)$.
Step 1 (Initialization): arbitrary $x_1 \in S_1 \cap U$.
Step $k$, $k \ge 2$: $L_k$ is arbitrary such that $L_k \ge \max\{L_{k-1}, L(f, S_k \cap U)\}$. Given $\mu_k > 0$ (a parameter of strong convexity of $b$ on $S_k$), let
$$x_k := p_{L_k, \mu_k, S_k}(x_{k-1}).$$
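A minimal runnable sketch of this scheme (an illustration under extra assumptions, not the authors' code): take $b(x) = \frac{1}{2}\|x\|_2^2$ (so $\mu_k = 1$), $g = \lambda\|\cdot\|_1$, and $S_k$ a Euclidean ball of radius $r_k$ centered at $0$. For this specific pair the inner minimization $p_{L_k, 1, S_k}$ has a closed form, namely soft-thresholding followed by projection onto the ball; that composition identity is assumed here rather than taken from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    # prox of t*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def project_ball(v, r):
    # Euclidean projection onto the ball of radius r centered at 0
    nv = np.linalg.norm(v)
    return v if nv <= r else (r / nv) * v

def bista(grad_f, lam, L_of_k, r_of_k, x1, num_iters=200):
    """BISTA sketch with b = 0.5*||.||^2, g = lam*||.||_1, S_k = ball of radius r_of_k(k).

    L_of_k(k) should be nondecreasing in k and dominate the Lipschitz
    constant of grad_f on S_k, as in the algorithm above."""
    x = x1
    for k in range(2, num_iters + 2):
        L_k = L_of_k(k)
        v = x - grad_f(x) / L_k                 # forward (gradient) step
        x = project_ball(soft_threshold(v, lam / L_k), r_of_k(k))  # prox over S_k
    return x
```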

A few relevant works

Beck-Teboulle 2009, Cohen 1980, Tseng 2008 (preprint), Bello Cruz and Nghia 2016, Nguyen 2017, Bauschke-Bolte-Teboulle 2017, Bolte-Sabach-Teboulle-Vaisbourd 2017/8.

All of them are significantly different from our work, either in:
- the setting, and/or
- the method, and/or
- the results

More details: in the paper.

The convergence results

Omitted, due to lack of time (but short details about them can be found in Slide 9). Details about them: in the paper. An illustrative example: next slide.

Example: $\ell_p$-$\ell_1$ minimization

Goal: to estimate
$$\inf_{x \in X} \Big( \underbrace{\tfrac{1}{p} \|Ax - z\|_p^p}_{f} + \underbrace{\lambda \|x\|_1}_{g} \Big),$$
where:
- $n \in \mathbb{N}$, $m \in \mathbb{N}$, $p \in [2, \infty)$,
- $X := \mathbb{R}^n$ with the $p$-norm $\|x\|_p := \big( \sum_{k=1}^{n} |x_k|^p \big)^{1/p}$, $x \in X$,
- $C = X$,
- $A : X \to \mathbb{R}^m$ is a given linear operator,
- $z \in \mathbb{R}^m$ is given,
- $\lambda > 0$ is given.

Example: $\ell_p$-$\ell_1$ minimization (Cont.)

- $\nabla f$ is not globally Lipschitz continuous when $p > 2$
- Hence we cannot use most of the prox-grad methods
- But we can use BISTA, e.g., with $b(x) := \frac{1}{2}\|x\|_2^2$
- Fix some $\gamma \in (0, 1/(p-2))$ (if $p = 2$, then $\gamma$ can be an arbitrary positive number)
- Let $S_k$ be the ball of radius $r_k := k^{\gamma}$ and center $0$
- We can take $L_k := (p-1) 2^{p-2} \|A\|^2 k^{\gamma(p-2)}$
- The convergence theorem ensures that $x_k$ converges to an optimal solution
- The non-asymptotic rate of convergence is $O\big( 1/k^{1 - \gamma(p-2)} \big)$, namely, it can be arbitrarily close to $O(1/k)$ since $\gamma$ can be arbitrarily small
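A self-contained sketch of this instantiation (again an illustration, not the authors' code): the gradient formula $\nabla f(x) = A^{\top}\big(|Ax - z|^{p-1}\operatorname{sign}(Ax - z)\big)$ is a standard componentwise computation, the prox step over the ball is the closed form assumed in the earlier sketch, and the spectral norm is used below as a stand-in for the operator norm $\|A\|$ on the slide.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def project_ball(v, r):
    nv = np.linalg.norm(v)
    return v if nv <= r else (r / nv) * v

def bista_lp_l1(A, z, lam, p, gamma, num_iters=300):
    """BISTA for min (1/p)*||Ax - z||_p^p + lam*||x||_1 with b = 0.5*||.||_2^2,
    S_k = Euclidean ball of radius k**gamma, and L_k as on the slide."""
    norm_A = np.linalg.norm(A, 2)     # spectral norm, standing in for ||A||
    x = np.zeros(A.shape[1])          # x_1 = 0 lies in S_1
    for k in range(2, num_iters + 2):
        u = A @ x - z
        grad = A.T @ (np.abs(u) ** (p - 1) * np.sign(u))  # gradient of f
        L_k = (p - 1) * 2 ** (p - 2) * norm_A ** 2 * k ** (gamma * (p - 2))
        v = x - grad / L_k
        x = project_ball(soft_threshold(v, lam / L_k), k ** gamma)
    return x

# Toy usage (hypothetical data), with p = 3 and gamma in (0, 1/(p-2)) = (0, 1):
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 60))
z = rng.standard_normal(30)
print(np.count_nonzero(bista_lp_l1(A, z, lam=0.5, p=3.0, gamma=0.5)))
```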

Stability of the extreme values (intuitive formulation)

Principle: Consider an arbitrary uniformly continuous function $f : X \to \mathbb{R}$ defined on a metric space $X$. Then its extreme (optimal) values depend continuously on the subset over which they are computed. In other words, if we compute the extreme values of $f$ over some subset $C \subseteq X$ and we slightly change $C$ to $C'$, then the extreme values change slightly, namely
$$\inf_{x' \in C'} f(x') \approx \inf_{x \in C} f(x) \quad \text{and} \quad \sup_{x' \in C'} f(x') \approx \sup_{x \in C} f(x).$$

Remark: Uniform continuity is essential: there are counterexamples to the stability principle when the considered function is not uniformly continuous.
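A toy numerical illustration (added here): minimizing a uniformly continuous function over an interval and over a slightly shifted interval yields nearby minimal values.

```python
import numpy as np

f = np.cos  # uniformly continuous on all of R

# C = [0, 2] and a small perturbation C' = [0.05, 2.05]
C  = np.linspace(0.00, 2.00, 200001)
Cp = np.linspace(0.05, 2.05, 200001)
print(f(C).min())   # about cos(2)    ~ -0.4161
print(f(Cp).min())  # about cos(2.05) ~ -0.4611, a correspondingly small change
```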

The End

P.S. The slides and the paper can be found online:
Slides:
Paper:


More information

Least squares regularized or constrained by L0: relationship between their global minimizers. Mila Nikolova

Least squares regularized or constrained by L0: relationship between their global minimizers. Mila Nikolova Least squares regularized or constrained by L0: relationship between their global minimizers Mila Nikolova CMLA, CNRS, ENS Cachan, Université Paris-Saclay, France nikolova@cmla.ens-cachan.fr SIAM Minisymposium

More information

Penalized Barycenters in the Wasserstein space

Penalized Barycenters in the Wasserstein space Penalized Barycenters in the Wasserstein space Elsa Cazelles, joint work with Jérémie Bigot & Nicolas Papadakis Université de Bordeaux & CNRS Journées IOP - Du 5 au 8 Juillet 2017 Bordeaux Elsa Cazelles

More information

Learning with stochastic proximal gradient

Learning with stochastic proximal gradient Learning with stochastic proximal gradient Lorenzo Rosasco DIBRIS, Università di Genova Via Dodecaneso, 35 16146 Genova, Italy lrosasco@mit.edu Silvia Villa, Băng Công Vũ Laboratory for Computational and

More information

On the split equality common fixed point problem for quasi-nonexpansive multi-valued mappings in Banach spaces

On the split equality common fixed point problem for quasi-nonexpansive multi-valued mappings in Banach spaces Available online at www.tjnsa.com J. Nonlinear Sci. Appl. 9 (06), 5536 5543 Research Article On the split equality common fixed point problem for quasi-nonexpansive multi-valued mappings in Banach spaces

More information

Bregman Divergence and Mirror Descent

Bregman Divergence and Mirror Descent Bregman Divergence and Mirror Descent Bregman Divergence Motivation Generalize squared Euclidean distance to a class of distances that all share similar properties Lots of applications in machine learning,

More information

Splitting methods for decomposing separable convex programs

Splitting methods for decomposing separable convex programs Splitting methods for decomposing separable convex programs Philippe Mahey LIMOS - ISIMA - Université Blaise Pascal PGMO, ENSTA 2013 October 4, 2013 1 / 30 Plan 1 Max Monotone Operators Proximal techniques

More information

An Optimal Affine Invariant Smooth Minimization Algorithm.

An Optimal Affine Invariant Smooth Minimization Algorithm. An Optimal Affine Invariant Smooth Minimization Algorithm. Alexandre d Aspremont, CNRS & École Polytechnique. Joint work with Martin Jaggi. Support from ERC SIPA. A. d Aspremont IWSL, Moscow, June 2013,

More information

Primal and dual predicted decrease approximation methods

Primal and dual predicted decrease approximation methods Math. Program., Ser. B DOI 10.1007/s10107-017-1108-9 FULL LENGTH PAPER Primal and dual predicted decrease approximation methods Amir Beck 1 Edouard Pauwels 2 Shoham Sabach 1 Received: 11 November 2015

More information

On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems

On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems Radu Ioan Boţ Ernö Robert Csetnek August 5, 014 Abstract. In this paper we analyze the

More information

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013 Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for

More information

On robustness of the regularity property of maps

On robustness of the regularity property of maps Control and Cybernetics vol. 32 (2003) No. 3 On robustness of the regularity property of maps by Alexander D. Ioffe Department of Mathematics, Technion, Haifa 32000, Israel Abstract: The problem considered

More information

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. 1. Prediction with expert advice. 2. With perfect

More information

ITERATIVE SCHEMES FOR APPROXIMATING SOLUTIONS OF ACCRETIVE OPERATORS IN BANACH SPACES SHOJI KAMIMURA AND WATARU TAKAHASHI. Received December 14, 1999

ITERATIVE SCHEMES FOR APPROXIMATING SOLUTIONS OF ACCRETIVE OPERATORS IN BANACH SPACES SHOJI KAMIMURA AND WATARU TAKAHASHI. Received December 14, 1999 Scientiae Mathematicae Vol. 3, No. 1(2000), 107 115 107 ITERATIVE SCHEMES FOR APPROXIMATING SOLUTIONS OF ACCRETIVE OPERATORS IN BANACH SPACES SHOJI KAMIMURA AND WATARU TAKAHASHI Received December 14, 1999

More information

A FIXED POINT THEOREM FOR GENERALIZED NONEXPANSIVE MULTIVALUED MAPPINGS

A FIXED POINT THEOREM FOR GENERALIZED NONEXPANSIVE MULTIVALUED MAPPINGS Fixed Point Theory, (0), No., 4-46 http://www.math.ubbcluj.ro/ nodeacj/sfptcj.html A FIXED POINT THEOREM FOR GENERALIZED NONEXPANSIVE MULTIVALUED MAPPINGS A. ABKAR AND M. ESLAMIAN Department of Mathematics,

More information

Compact operators on Banach spaces

Compact operators on Banach spaces Compact operators on Banach spaces Jordan Bell jordan.bell@gmail.com Department of Mathematics, University of Toronto November 12, 2017 1 Introduction In this note I prove several things about compact

More information

The resolvent average of monotone operators: dominant and recessive properties

The resolvent average of monotone operators: dominant and recessive properties The resolvent average of monotone operators: dominant and recessive properties Sedi Bartz, Heinz H. Bauschke, Sarah M. Moffat, and Xianfu Wang September 30, 2015 (first revision) December 22, 2015 (second

More information

Derivatives. Differentiability problems in Banach spaces. Existence of derivatives. Sharpness of Lebesgue s result

Derivatives. Differentiability problems in Banach spaces. Existence of derivatives. Sharpness of Lebesgue s result Differentiability problems in Banach spaces David Preiss 1 Expanded notes of a talk based on a nearly finished research monograph Fréchet differentiability of Lipschitz functions and porous sets in Banach

More information