BISTA: a Bregmanian proximal gradient method without the global Lipschitz continuity assumption
Daniel Reem (joint work with Simeon Reich and Alvaro De Pierro)
Department of Mathematics, The Technion, Haifa, Israel
dream@technion.ac.il

The 2018 annual meeting of the Israel Mathematical Union (IMU 2018), The Technion, Haifa, Israel
24 May 2018 (20-25 minutes)

BISTA 24 May / 23
Minimization of a separable (decomposable) function

Goal: to estimate

    \inf \{ F(x) : x \in C \}

where
- F = f + g
- C is a closed and convex subset of some space X, say \mathbb{R}^n
- f is smooth, g possibly not; both are convex.
Motivation

Such a minimization problem appears in the solution process of many theoretical and practical problems, including:
- inverse problems
- image processing
- compressed sensing
- machine learning
- more
A typical scenario: l_1 regularization in signal processing

Here

    \inf_{x \in \mathbb{R}^n} \Big( \underbrace{\|Ax - b\|^2}_{f} + \underbrace{\lambda \|x\|_1}_{g} \Big)

- Now g is not smooth.
- A is an m \times n matrix.
- m and n are large positive integers.
- b \in \mathbb{R}^m is known.
- \lambda > 0 is a regularization parameter.
The proximal gradient method

Among the methods used for solving the minimization problem.

Basic form: x_1 \in C is given, and

    x_k := \operatorname{argmin}_{x \in C} \big( F(x) + c_k \|x - x_{k-1}\|^2 \big), \quad k \geq 2.

A more general form:

    x_k := \operatorname{argmin}_{x \in C} \big( F_k(x) + c_k B(x, x_{k-1}) \big), \quad k \geq 2.

Here c_k > 0 and F_k is an approximation to F = f + g, e.g.,

    F_k(x) := f(x_{k-1}) + \langle f'(x_{k-1}), x - x_{k-1} \rangle + g(x).

B is a Bregman divergence induced by some function b (more on that later).
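As a concrete illustration (an addition, not part of the talk): with B(x, y) = \|x - y\|^2 and the linearized F_k above, the iteration for the l_1-regularized least-squares problem of the previous slide reduces to a gradient step on f followed by componentwise soft-thresholding (the proximal map of \lambda \|\cdot\|_1). The sketch below assumes the Euclidean setting; the function names, step size, and iteration count are illustrative choices, and `step` should not exceed 1/L(f') = 1/(2\|A\|^2).

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1: componentwise shrinkage."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_grad_l1(A, b, lam, step, iters=500):
    """Proximal gradient for min ||Ax - b||^2 + lam * ||x||_1.

    With F_k(x) = f(x_{k-1}) + <f'(x_{k-1}), x - x_{k-1}> + g(x) and
    B(x, y) = ||x - y||^2, c_k = 1/(2*step), the update is a gradient
    step on f followed by the prox of g."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = 2.0 * A.T @ (A @ x - b)      # gradient of f(x) = ||Ax - b||^2
        x = soft_threshold(x - step * grad, step * lam)
    return x
```

For a diagonal A the iteration decouples coordinatewise, which makes it easy to check against the closed-form minimizer by hand.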
Proximal algorithms: a very common assumption

f' is globally Lipschitz continuous on C (or on X), namely: there exists L(f) \geq 0 such that

    \|f'(x) - f'(y)\| \leq L(f) \|x - y\|, \quad x, y \in C.

This assumption does not always hold, thus casting a limitation on many prox algorithms.
Our results: schematic description

Introducing BISTA:
- a new proximal gradient method
- the prox operator is induced by a Bregman divergence
- a core new contribution: we decompose C = \bigcup_{k=1}^{\infty} S_k
- proximal iteration: on S_k and not globally (on the whole C or X)
- hence more flexibility and a better chance that good properties are satisfied, e.g., that f' will be Lipschitz continuous on S_k (frequently all the S_k are bounded)
The origin of BISTA

- ISTA: a method suggested by Beck-Teboulle (2009)
- BISTA significantly extends ISTA
- BISTA = Bregmanian ISTA [or Brazilian ISTA :-) ]
- Preliminary version: during the postdoc in Brazil (I worked with Alvaro De Pierro)
Our results: schematic description (Cont.)

- Relatively general setting: real reflexive Banach space X (Hilbertian in practice)
- Constrained minimization: C \subseteq X
- f' is not necessarily globally Lipschitz continuous (it should be Lipschitz continuous only on a subset of S_k)
- Several convergence results (under some assumptions), for example:
  - non-asymptotic convergence rate (in the function values) of O(1/k), or a rate arbitrarily close to O(1/k)
  - weak convergence
Our results: schematic description (Cont.)

Some auxiliary results of independent interest:
- Generalization of a key lemma in Beck-Teboulle 2009
- Sufficient conditions for the minimizer which appears in the proximal operation to be an interior point
- A principle about the stability of the extreme values of a uniformly continuous function in a general metric space: a small perturbation of the set on which these values are computed yields a small change of these values (more details: in Slide 22)
- This stability principle suggests a general scheme for tackling a wide class of non-convex and non-smooth optimization problems (details: in the paper)
Bregman divergences (Bregman distances)

Let b : X \to (-\infty, \infty]. Assume that:
- U := \operatorname{Int}(\operatorname{dom}(b)), where \operatorname{dom}(b) := \{x \in X : b(x) < \infty\}
- b is Gâteaux differentiable in U
- b is convex and lower semicontinuous on X, and strictly convex on \operatorname{dom}(b)

Then the Bregman divergence associated with b is:

    B(x, y) := \begin{cases} b(x) - b(y) - \langle b'(y), x - y \rangle, & (x, y) \in \operatorname{dom}(b) \times U, \\ \infty, & \text{otherwise.} \end{cases}

Here b'(y) \in X^* and \langle b'(y), x - y \rangle := b'(y)(x - y).
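To make the definition concrete, here is a finite-dimensional sketch (an addition, not from the talk): for b(x) = \frac{1}{2}\|x\|^2 the divergence reduces to \frac{1}{2}\|x - y\|^2, and for the negative Boltzmann-Gibbs-Shannon entropy it becomes the generalized Kullback-Leibler divergence. The helper names are illustrative.

```python
import numpy as np

def bregman(b, grad_b, x, y):
    """Bregman divergence B(x, y) = b(x) - b(y) - <b'(y), x - y>."""
    return b(x) - b(y) - grad_b(y) @ (x - y)

x, y = np.array([1.0, 2.0]), np.array([0.5, 1.0])

# b(x) = 0.5*||x||^2  gives  B(x, y) = 0.5*||x - y||^2
half_sq = lambda v: 0.5 * v @ v
d = bregman(half_sq, lambda v: v, x, y)

# negative Boltzmann-Gibbs-Shannon entropy b(x) = sum_k x_k log(x_k)
# gives the generalized Kullback-Leibler divergence sum(x*log(x/y) - x + y)
neg_ent = lambda v: np.sum(v * np.log(v))
grad_ne = lambda v: np.log(v) + 1.0
kl = bregman(neg_ent, grad_ne, x, y)
```

Both instances are nonnegative and vanish exactly when x = y, as the strict convexity of b guarantees.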
Strongly convex functions: reminder

Suppose that:
- C is a convex subset
- S \subseteq C (S is not necessarily convex)

b : C \to \mathbb{R} is called strongly convex on S if there exists \mu > 0 such that for each (x, y) \in S^2 and each \lambda \in (0, 1),

    b(\lambda x + (1 - \lambda) y) \leq \lambda b(x) + (1 - \lambda) b(y) - 0.5 \mu \lambda (1 - \lambda) \|x - y\|^2.

In other words, b satisfies a stronger condition than convexity.

Many well-known Bregman functions are strongly convex on bounded subsets of their effective domain, e.g., the negative Boltzmann-Gibbs-Shannon entropy b(x) = \sum_{k=1}^{n} x_k \log(x_k).
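As a quick numerical sanity check (an illustration, not from the talk): on a box where every coordinate is at most M, the Hessian of the negative entropy is \operatorname{diag}(1/x_k) \succeq (1/M) I, so \mu = 1/M should satisfy the inequality above. The sketch samples random points and convex combinations and tests the inequality directly; M and the sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
neg_ent = lambda v: np.sum(v * np.log(v))   # negative entropy

M = 10.0        # upper bound on the coordinates of the test set S
mu = 1.0 / M    # candidate strong-convexity parameter on S

ok = True
for _ in range(1000):
    x = rng.uniform(0.01, M, size=5)
    y = rng.uniform(0.01, M, size=5)
    lam = rng.uniform(0.01, 0.99)
    lhs = neg_ent(lam * x + (1 - lam) * y)
    rhs = (lam * neg_ent(x) + (1 - lam) * neg_ent(y)
           - 0.5 * mu * lam * (1 - lam) * np.sum((x - y) ** 2))
    ok = ok and (lhs <= rhs + 1e-9)
```

The check passes because b - (\mu/2)\|\cdot\|^2 is convex on the box, which is equivalent to the displayed inequality with that \mu.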
A few assumptions

- C \subseteq \operatorname{dom}(b)
- f : \operatorname{dom}(b) \to \mathbb{R} is convex on \operatorname{dom}(b) and Gâteaux differentiable in U
- g : C \to (-\infty, \infty] is convex, proper, and lower semicontinuous
- F(x) := \begin{cases} f(x) + g(x), & x \in C, \\ \infty, & x \notin C. \end{cases}
- \operatorname{MIN}(F) (the set of minimizers of F) is nonempty and contained in U := \operatorname{Int}(\operatorname{dom}(b))
A few assumptions (Cont.)

C = \bigcup_{k=1}^{\infty} S_k (a core new contribution). For each k:
- S_k is closed and convex, S_k \cap U \neq \emptyset, and S_k \subseteq S_{k+1}
- b is strongly convex on S_k with a parameter \mu_k > 0, and \mu_k \geq \mu_{k+1}
- g is proper on S_k
- f' is Lipschitz continuous on S_k \cap U with a constant L(f', S_k \cap U)
A few assumptions (Cont.)

Given L_k > 0, let

    Q_{L_k, \mu_k, S_k}(x, y) := f(y) + \langle f'(y), x - y \rangle + \frac{L_k}{\mu_k} B(x, y) + g(x), \quad x \in S_k, \; y \in U,

and

    p_{L_k, \mu_k, S_k}(y) := \operatorname{argmin} \{ Q_{L_k, \mu_k, S_k}(x, y) : x \in S_k \}, \quad y \in S_k \cap U.

Lemma: p_{L_k, \mu_k, S_k}(y) exists and is unique for each y \in S_k \cap U.

Assumption: p_{L_k, \mu_k, S_k}(y) \in S_k \cap U for each k.
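In the Euclidean case b = \frac{1}{2}\|\cdot\|^2 (so \mu_k = 1 and B(x, y) = \frac{1}{2}\|x - y\|^2), with g = \lambda \|\cdot\|_1 and when the constraint x \in S_k is inactive at the minimizer, p_{L_k, \mu_k, S_k}(y) has a familiar closed form: a gradient step followed by soft-thresholding. A sketch under these assumptions (the function names are illustrative, not from the paper):

```python
import numpy as np

def Q(x, y, f, grad_f, L, lam):
    """Q_{L,mu,S}(x, y) with b = 0.5*||.||^2 (mu = 1), g = lam*||.||_1:
    f(y) + <f'(y), x - y> + L * 0.5*||x - y||^2 + lam*||x||_1."""
    return (f(y) + grad_f(y) @ (x - y)
            + L * 0.5 * np.sum((x - y) ** 2) + lam * np.sum(np.abs(x)))

def p(y, grad_f, L, lam):
    """argmin_x Q(x, y) when the ball constraint x in S_k is inactive:
    a gradient step on f followed by componentwise soft-thresholding."""
    v = y - grad_f(y) / L
    return np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)
```

Since Q(., y) is strongly convex, the minimizer is unique, matching the lemma; one can verify numerically that random perturbations of p(y) never decrease Q.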
Remark on the assumptions

These assumptions occur frequently in applications (details: in the paper).
BISTA (the Lipschitz step size version)

Input: a positive number L_1 \geq L(f', S_1 \cap U).

Step 1 (Initialization): arbitrary x_1 \in S_1 \cap U.

Step k, k \geq 2:
- L_k is arbitrary such that L_k \geq \max\{ L_{k-1}, L(f', S_k \cap U) \}
- given \mu_k > 0 (a parameter of strong convexity of b on S_k), let

      x_k := p_{L_k, \mu_k, S_k}(x_{k-1})
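The whole method can be sketched in a few lines (an illustration, not the paper's pseudocode), again with b = \frac{1}{2}\|\cdot\|^2 (so \mu_k = 1) and g = \lambda \|\cdot\|_1, and with S_k taken as Euclidean balls. One simplification is labeled in the comments: the subproblem is solved by the closed-form soft-threshold step as if the ball constraint were inactive, with a radial pull-back as a heuristic fallback, rather than the exact constrained argmin.

```python
import numpy as np

def bista(grad_f, lam, L_of_k, radius_of_k, x1, iters=200):
    """Sketch of BISTA with b = 0.5*||.||^2 (mu_k = 1), g = lam*||.||_1.

    S_k is the Euclidean ball of radius radius_of_k(k); L_of_k(k) is meant
    to dominate L(f', S_k), and taking the max with the previous value
    keeps L_k nondecreasing, as the algorithm requires."""
    x = np.asarray(x1, dtype=float).copy()
    L = 0.0
    for k in range(2, iters + 2):
        L = max(L, L_of_k(k))                # L_k >= max{L_{k-1}, L(f', S_k)}
        v = x - grad_f(x) / L                # gradient step on the model
        x = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)  # prox of g
        r = radius_of_k(k)
        nx = np.linalg.norm(x)
        if nx > r:
            x *= r / nx                      # heuristic pull-back into S_k
    return x
```

With a constant L_of_k and balls large enough that the pull-back never fires, this reduces to the plain proximal gradient iteration from the earlier slide.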
A few relevant works

Beck-Teboulle 2009, Cohen 1980, Tseng 2008 (preprint), Bello Cruz and Nghia 2016, Nguyen 2017, Bauschke-Bolte-Teboulle 2017, Bolte-Sabach-Teboulle-Vaisbourd 2017/8.

All of them are significantly different from our work, either in the setting, and/or the method, and/or the results. More details: in the paper.
The convergence results

Omitted, due to lack of time (but short details about them can be found in Slide 9). Details about them: in the paper. An illustrative example: next slide.
Example: l_p-l_1 minimization

Goal: to estimate

    \inf_{x \in X} \; \underbrace{\frac{1}{p} \|Ax - z\|_p^p}_{f} + \underbrace{\lambda \|x\|_1}_{g},

where:
- n \in \mathbb{N}, m \in \mathbb{N}, p \in [2, \infty)
- X := \mathbb{R}^n with the p-norm \|x\|_p := \big( \sum_{k=1}^{n} |x_k|^p \big)^{1/p}, x \in X
- C = X
- A : X \to \mathbb{R}^m is a given linear operator
- z \in \mathbb{R}^m is given
- \lambda > 0 is given
Example: l_p-l_1 minimization (Cont.)

- f' is not globally Lipschitz continuous when p > 2
- Hence we cannot use most of the prox-grad methods
- But we can use BISTA, e.g., with b(x) := \frac{1}{2} \|x\|_2^2
- Fix some \gamma \in (0, 1/(p-2)) (if p = 2, then \gamma can be an arbitrary positive number)
- Let S_k be the ball of radius r_k := k^{\gamma} centered at 0
- We can take L_k := (p-1) 2^{p-2} \|A\|^2 k^{\gamma(p-2)}
- The convergence theorem ensures that x_k converges to an optimal solution
- The non-asymptotic rate of convergence is O\big( 1 / k^{1 - \gamma(p-2)} \big); namely, it can be arbitrarily close to O(1/k) since \gamma can be arbitrarily small
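Putting the pieces together for this example (an illustrative sketch, not code from the talk): below, \|A\| is taken to be the spectral norm, which is an assumption about the slide's notation; p, \gamma, \lambda, the data, and the iteration count are arbitrary illustrative choices, and the subproblem again uses the closed-form soft-threshold step with a heuristic pull-back into S_k.

```python
import numpy as np

p, lam, gamma = 4.0, 0.1, 0.3          # p > 2, gamma in (0, 1/(p-2)) = (0, 0.5)
A = np.array([[1.0, 0.5], [0.0, 2.0]])
z = np.array([1.0, -1.0])
opA = np.linalg.norm(A, 2)             # spectral norm, used as ||A|| here

def grad_f(x):
    """Gradient of f(x) = (1/p) * ||Ax - z||_p^p."""
    r = A @ x - z
    return A.T @ (np.sign(r) * np.abs(r) ** (p - 1))

x = np.zeros(2)                        # x_1 in S_1
L = 0.0
for k in range(2, 400):
    r_k = k ** gamma                                        # S_k = ball of radius k^gamma
    L = max(L, (p - 1) * 2 ** (p - 2) * opA ** 2 * k ** (gamma * (p - 2)))
    v = x - grad_f(x) / L                                   # gradient step
    x = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)   # soft-threshold step
    nx = np.linalg.norm(x)
    if nx > r_k:
        x *= r_k / nx                                       # heuristic pull-back
```

Since each L_k dominates the local Lipschitz constant of f' near the (small) iterates, the objective decreases monotonically, so the final value sits well below the starting value F(0).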
Stability of the extreme values (intuitive formulation)

Principle. Consider an arbitrary uniformly continuous function f : X \to \mathbb{R} defined on a metric space X. Then its extreme (optimal) values depend continuously on the subset over which they are computed.

In other words, if we compute the extreme values of f over some subset C \subseteq X and we slightly change C to C', then the extreme values change slightly, namely

    \inf_{x' \in C'} f(x') \approx \inf_{x \in C} f(x) \quad \text{and} \quad \sup_{x' \in C'} f(x') \approx \sup_{x \in C} f(x).

Remark. Uniform continuity is essential: there are counterexamples to the stability principle when the considered function is not uniformly continuous.
The End

P.S. The slides and the paper can be found online:
- Slides:
- Paper:
More informationMathematical methods for Image Processing
Mathematical methods for Image Processing François Malgouyres Institut de Mathématiques de Toulouse, France invitation by Jidesh P., NITK Surathkal funding Global Initiative on Academic Network Oct. 23
More informationarxiv: v3 [math.oc] 17 Dec 2017 Received: date / Accepted: date
Noname manuscript No. (will be inserted by the editor) A Simple Convergence Analysis of Bregman Proximal Gradient Algorithm Yi Zhou Yingbin Liang Lixin Shen arxiv:1503.05601v3 [math.oc] 17 Dec 2017 Received:
More informationAgenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples
Agenda Fast proximal gradient methods 1 Accelerated first-order methods 2 Auxiliary sequences 3 Convergence analysis 4 Numerical examples 5 Optimality of Nesterov s scheme Last time Proximal gradient method
More informationFrom error bounds to the complexity of first-order descent methods for convex functions
From error bounds to the complexity of first-order descent methods for convex functions Nguyen Trong Phong-TSE Joint work with Jérôme Bolte, Juan Peypouquet, Bruce Suter. Toulouse, 23-25, March, 2016 Journées
More informationNon-smooth Non-convex Bregman Minimization: Unification and new Algorithms
Non-smooth Non-convex Bregman Minimization: Unification and new Algorithms Peter Ochs, Jalal Fadili, and Thomas Brox Saarland University, Saarbrücken, Germany Normandie Univ, ENSICAEN, CNRS, GREYC, France
More informationNon-smooth Non-convex Bregman Minimization: Unification and New Algorithms
JOTA manuscript No. (will be inserted by the editor) Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms Peter Ochs Jalal Fadili Thomas Brox Received: date / Accepted: date Abstract
More informationOn Total Convexity, Bregman Projections and Stability in Banach Spaces
Journal of Convex Analysis Volume 11 (2004), No. 1, 1 16 On Total Convexity, Bregman Projections and Stability in Banach Spaces Elena Resmerita Department of Mathematics, University of Haifa, 31905 Haifa,
More informationConvergence Theorems for Bregman Strongly Nonexpansive Mappings in Reflexive Banach Spaces
Filomat 28:7 (2014), 1525 1536 DOI 10.2298/FIL1407525Z Published by Faculty of Sciences and Mathematics, University of Niš, Serbia Available at: http://www.pmf.ni.ac.rs/filomat Convergence Theorems for
More informationBREGMAN DISTANCES, TOTALLY
BREGMAN DISTANCES, TOTALLY CONVEX FUNCTIONS AND A METHOD FOR SOLVING OPERATOR EQUATIONS IN BANACH SPACES DAN BUTNARIU AND ELENA RESMERITA January 18, 2005 Abstract The aim of this paper is twofold. First,
More informationAccelerated Proximal Gradient Methods for Convex Optimization
Accelerated Proximal Gradient Methods for Convex Optimization Paul Tseng Mathematics, University of Washington Seattle MOPTA, University of Guelph August 18, 2008 ACCELERATED PROXIMAL GRADIENT METHODS
More informationALGORITHMS FOR MINIMIZING DIFFERENCES OF CONVEX FUNCTIONS AND APPLICATIONS
ALGORITHMS FOR MINIMIZING DIFFERENCES OF CONVEX FUNCTIONS AND APPLICATIONS Mau Nam Nguyen (joint work with D. Giles and R. B. Rector) Fariborz Maseeh Department of Mathematics and Statistics Portland State
More informationAuxiliary-Function Methods in Iterative Optimization
Auxiliary-Function Methods in Iterative Optimization Charles L. Byrne April 6, 2015 Abstract Let C X be a nonempty subset of an arbitrary set X and f : X R. The problem is to minimize f over C. In auxiliary-function
More informationA convergence result for an Outer Approximation Scheme
A convergence result for an Outer Approximation Scheme R. S. Burachik Engenharia de Sistemas e Computação, COPPE-UFRJ, CP 68511, Rio de Janeiro, RJ, CEP 21941-972, Brazil regi@cos.ufrj.br J. O. Lopes Departamento
More informationSequential convex programming,: value function and convergence
Sequential convex programming,: value function and convergence Edouard Pauwels joint work with Jérôme Bolte Journées MODE Toulouse March 23 2016 1 / 16 Introduction Local search methods for finite dimensional
More informationLECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE
LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization
More informationSequential Unconstrained Minimization: A Survey
Sequential Unconstrained Minimization: A Survey Charles L. Byrne February 21, 2013 Abstract The problem is to minimize a function f : X (, ], over a non-empty subset C of X, where X is an arbitrary set.
More informationOn the Minimization Over Sparse Symmetric Sets: Projections, O. Projections, Optimality Conditions and Algorithms
On the Minimization Over Sparse Symmetric Sets: Projections, Optimality Conditions and Algorithms Amir Beck Technion - Israel Institute of Technology Haifa, Israel Based on joint work with Nadav Hallak
More informationOne Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties
One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties Fedor S. Stonyakin 1 and Alexander A. Titov 1 V. I. Vernadsky Crimean Federal University, Simferopol,
More informationConvergence Rates in Regularization for Nonlinear Ill-Posed Equations Involving m-accretive Mappings in Banach Spaces
Applied Mathematical Sciences, Vol. 6, 212, no. 63, 319-3117 Convergence Rates in Regularization for Nonlinear Ill-Posed Equations Involving m-accretive Mappings in Banach Spaces Nguyen Buong Vietnamese
More information18.657: Mathematics of Machine Learning
8.657: Mathematics of Machine Learning Lecturer: Philippe Rigollet Lecture 3 Scribe: Mina Karzand Oct., 05 Previously, we analyzed the convergence of the projected gradient descent algorithm. We proved
More informationShih-sen Chang, Yeol Je Cho, and Haiyun Zhou
J. Korean Math. Soc. 38 (2001), No. 6, pp. 1245 1260 DEMI-CLOSED PRINCIPLE AND WEAK CONVERGENCE PROBLEMS FOR ASYMPTOTICALLY NONEXPANSIVE MAPPINGS Shih-sen Chang, Yeol Je Cho, and Haiyun Zhou Abstract.
More informationProximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725
Proximal Gradient Descent and Acceleration Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: subgradient method Consider the problem min f(x) with f convex, and dom(f) = R n. Subgradient method:
More informationA Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization
A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence
Introduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations
More informationIntroduction to Real Analysis Alternative Chapter 1
Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces
More informationA Dykstra-like algorithm for two monotone operators
A Dykstra-like algorithm for two monotone operators Heinz H. Bauschke and Patrick L. Combettes Abstract Dykstra s algorithm employs the projectors onto two closed convex sets in a Hilbert space to construct
More informationRelative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent
Relative-Continuity for Non-Lipschitz Non-Smooth Convex Optimization using Stochastic (or Deterministic) Mirror Descent Haihao Lu August 3, 08 Abstract The usual approach to developing and analyzing first-order
More informationON A HYBRID PROXIMAL POINT ALGORITHM IN BANACH SPACES
U.P.B. Sci. Bull., Series A, Vol. 80, Iss. 3, 2018 ISSN 1223-7027 ON A HYBRID PROXIMAL POINT ALGORITHM IN BANACH SPACES Vahid Dadashi 1 In this paper, we introduce a hybrid projection algorithm for a countable
More informationA First Order Method for Finding Minimal Norm-Like Solutions of Convex Optimization Problems
A First Order Method for Finding Minimal Norm-Like Solutions of Convex Optimization Problems Amir Beck and Shoham Sabach July 6, 2011 Abstract We consider a general class of convex optimization problems
More informationFirst Order Methods beyond Convexity and Lipschitz Gradient Continuity with Applications to Quadratic Inverse Problems
First Order Methods beyond Convexity and Lipschitz Gradient Continuity with Applications to Quadratic Inverse Problems Jérôme Bolte Shoham Sabach Marc Teboulle Yakov Vaisbourd June 20, 2017 Abstract We
More information9. Dual decomposition and dual algorithms
EE 546, Univ of Washington, Spring 2016 9. Dual decomposition and dual algorithms dual gradient ascent example: network rate control dual decomposition and the proximal gradient method examples with simple
More informationSmoothing Proximal Gradient Method. General Structured Sparse Regression
for General Structured Sparse Regression Xi Chen, Qihang Lin, Seyoung Kim, Jaime G. Carbonell, Eric P. Xing (Annals of Applied Statistics, 2012) Gatsby Unit, Tea Talk October 25, 2013 Outline Motivation:
More informationContinuity of convex functions in normed spaces
Continuity of convex functions in normed spaces In this chapter, we consider continuity properties of real-valued convex functions defined on open convex sets in normed spaces. Recall that every infinitedimensional
More information6. Proximal gradient method
L. Vandenberghe EE236C (Spring 2013-14) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping
More informationOn the acceleration of the double smoothing technique for unconstrained convex optimization problems
On the acceleration of the double smoothing technique for unconstrained convex optimization problems Radu Ioan Boţ Christopher Hendrich October 10, 01 Abstract. In this article we investigate the possibilities
More informationMOSCO STABILITY OF PROXIMAL MAPPINGS IN REFLEXIVE BANACH SPACES
MOSCO STABILITY OF PROXIMAL MAPPINGS IN REFLEXIVE BANACH SPACES Dan Butnariu and Elena Resmerita Abstract. In this paper we establish criteria for the stability of the proximal mapping Prox f ϕ =( ϕ+ f)
More informationPrimal and Dual Predicted Decrease Approximation Methods
Primal and Dual Predicted Decrease Approximation Methods Amir Beck Edouard Pauwels Shoham Sabach March 22, 2017 Abstract We introduce the notion of predicted decrease approximation (PDA) for constrained
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationOn nonexpansive and accretive operators in Banach spaces
Available online at www.isr-publications.com/jnsa J. Nonlinear Sci. Appl., 10 (2017), 3437 3446 Research Article Journal Homepage: www.tjnsa.com - www.isr-publications.com/jnsa On nonexpansive and accretive
More informationOptimization for Learning and Big Data
Optimization for Learning and Big Data Donald Goldfarb Department of IEOR Columbia University Department of Mathematics Distinguished Lecture Series May 17-19, 2016. Lecture 1. First-Order Methods for
More informationTHROUGHOUT this paper, we let C be a nonempty
Strong Convergence Theorems of Multivalued Nonexpansive Mappings and Maximal Monotone Operators in Banach Spaces Kriengsak Wattanawitoon, Uamporn Witthayarat and Poom Kumam Abstract In this paper, we prove
More informationThe Proximal Gradient Method
Chapter 10 The Proximal Gradient Method Underlying Space: In this chapter, with the exception of Section 10.9, E is a Euclidean space, meaning a finite dimensional space endowed with an inner product,
More informationAlgorithms for Nonsmooth Optimization
Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization
More information1 Sparsity and l 1 relaxation
6.883 Learning with Combinatorial Structure Note for Lecture 2 Author: Chiyuan Zhang Sparsity and l relaxation Last time we talked about sparsity and characterized when an l relaxation could recover the
More informationBelief Propagation, Information Projections, and Dykstra s Algorithm
Belief Propagation, Information Projections, and Dykstra s Algorithm John MacLaren Walsh, PhD Department of Electrical and Computer Engineering Drexel University Philadelphia, PA jwalsh@ece.drexel.edu
More informationConvex Sets. Prof. Dan A. Simovici UMB
Convex Sets Prof. Dan A. Simovici UMB 1 / 57 Outline 1 Closures, Interiors, Borders of Sets in R n 2 Segments and Convex Sets 3 Properties of the Class of Convex Sets 4 Closure and Interior Points of Convex
More informationOn the Iteration Complexity of Some Projection Methods for Monotone Linear Variational Inequalities
On the Iteration Complexity of Some Projection Methods for Monotone Linear Variational Inequalities Caihua Chen Xiaoling Fu Bingsheng He Xiaoming Yuan January 13, 2015 Abstract. Projection type methods
More informationAnalysis Comprehensive Exam Questions Fall F(x) = 1 x. f(t)dt. t 1 2. tf 2 (t)dt. and g(t, x) = 2 t. 2 t
Analysis Comprehensive Exam Questions Fall 2. Let f L 2 (, ) be given. (a) Prove that ( x 2 f(t) dt) 2 x x t f(t) 2 dt. (b) Given part (a), prove that F L 2 (, ) 2 f L 2 (, ), where F(x) = x (a) Using
More information6. Proximal gradient method
L. Vandenberghe EE236C (Spring 2016) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping
More informationOptimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison
Optimization Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison optimization () cost constraints might be too much to cover in 3 hours optimization (for big
More informationFrank-Wolfe Method. Ryan Tibshirani Convex Optimization
Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)
More informationGeneralized Monotonicities and Its Applications to the System of General Variational Inequalities
Generalized Monotonicities and Its Applications to the System of General Variational Inequalities Khushbu 1, Zubair Khan 2 Research Scholar, Department of Mathematics, Integral University, Lucknow, Uttar
More informationOslo Class 6 Sparsity based regularization
RegML2017@SIMULA Oslo Class 6 Sparsity based regularization Lorenzo Rosasco UNIGE-MIT-IIT May 4, 2017 Learning from data Possible only under assumptions regularization min Ê(w) + λr(w) w Smoothness Sparsity
More informationA proximal-like algorithm for a class of nonconvex programming
Pacific Journal of Optimization, vol. 4, pp. 319-333, 2008 A proximal-like algorithm for a class of nonconvex programming Jein-Shan Chen 1 Department of Mathematics National Taiwan Normal University Taipei,
More informationMath 273a: Optimization Convex Conjugacy
Math 273a: Optimization Convex Conjugacy Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Convex conjugate (the Legendre transform) Let f be a closed proper
More informationViscosity Iterative Approximating the Common Fixed Points of Non-expansive Semigroups in Banach Spaces
Viscosity Iterative Approximating the Common Fixed Points of Non-expansive Semigroups in Banach Spaces YUAN-HENG WANG Zhejiang Normal University Department of Mathematics Yingbing Road 688, 321004 Jinhua
More informationConvergence of Fixed-Point Iterations
Convergence of Fixed-Point Iterations Instructor: Wotao Yin (UCLA Math) July 2016 1 / 30 Why study fixed-point iterations? Abstract many existing algorithms in optimization, numerical linear algebra, and
More informationAn inertial forward-backward method for solving vector optimization problems
An inertial forward-backward method for solving vector optimization problems Sorin-Mihai Grad Chemnitz University of Technology www.tu-chemnitz.de/ gsor research supported by the DFG project GR 3367/4-1
More informationCONVERGENCE TO FIXED POINTS OF INEXACT ORBITS FOR BREGMAN-MONOTONE AND FOR NONEXPANSIVE OPERATORS IN BANACH SPACES
CONVERGENCE TO FIXED POINTS OF INEXACT ORBITS FOR BREGMAN-MONOTONE AND FOR NONEXPANSIVE OPERATORS IN BANACH SPACES Dan Butnariu, Simeon Reich, and Alexander J. Zaslavski Abstract. Fixed points of Bregman-monotone
More informationTwo-Step Iteration Scheme for Nonexpansive Mappings in Banach Space
Mathematica Moravica Vol. 19-1 (2015), 95 105 Two-Step Iteration Scheme for Nonexpansive Mappings in Banach Space M.R. Yadav Abstract. In this paper, we introduce a new two-step iteration process to approximate
More informationProblem Set 6: Solutions Math 201A: Fall a n x n,
Problem Set 6: Solutions Math 201A: Fall 2016 Problem 1. Is (x n ) n=0 a Schauder basis of C([0, 1])? No. If f(x) = a n x n, n=0 where the series converges uniformly on [0, 1], then f has a power series
More informationA characterization of essentially strictly convex functions on reflexive Banach spaces
A characterization of essentially strictly convex functions on reflexive Banach spaces Michel Volle Département de Mathématiques Université d Avignon et des Pays de Vaucluse 74, rue Louis Pasteur 84029
More informationLecture Notes on Iterative Optimization Algorithms
Charles L. Byrne Department of Mathematical Sciences University of Massachusetts Lowell December 8, 2014 Lecture Notes on Iterative Optimization Algorithms Contents Preface vii 1 Overview and Examples
More informationA Note on the Class of Superreflexive Almost Transitive Banach Spaces
E extracta mathematicae Vol. 23, Núm. 1, 1 6 (2008) A Note on the Class of Superreflexive Almost Transitive Banach Spaces Jarno Talponen University of Helsinki, Department of Mathematics and Statistics,
More informationResearch Article Hybrid Algorithm of Fixed Point for Weak Relatively Nonexpansive Multivalued Mappings and Applications
Abstract and Applied Analysis Volume 2012, Article ID 479438, 13 pages doi:10.1155/2012/479438 Research Article Hybrid Algorithm of Fixed Point for Weak Relatively Nonexpansive Multivalued Mappings and
More informationAn accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex concave saddle-point problems
Optimization Methods and Software ISSN: 1055-6788 (Print) 1029-4937 (Online) Journal homepage: http://www.tandfonline.com/loi/goms20 An accelerated non-euclidean hybrid proximal extragradient-type algorithm
More informationAn Infeasible Interior Proximal Method for Convex Programming Problems with Linear Constraints 1
An Infeasible Interior Proximal Method for Convex Programming Problems with Linear Constraints 1 Nobuo Yamashita 2, Christian Kanzow 3, Tomoyui Morimoto 2, and Masao Fuushima 2 2 Department of Applied
More informationProjection and proximal point methods: convergence results and counterexamples
Projection and proximal point methods: convergence results and counterexamples Heinz H. Bauschke, Eva Matoušková, and Simeon Reich June 14, 2003 Version 1.15 Abstract Recently, Hundal has constructed a
More informationProximal methods. S. Villa. October 7, 2014
Proximal methods S. Villa October 7, 2014 1 Review of the basics Often machine learning problems require the solution of minimization problems. For instance, the ERM algorithm requires to solve a problem
More informationLeast squares regularized or constrained by L0: relationship between their global minimizers. Mila Nikolova
Least squares regularized or constrained by L0: relationship between their global minimizers Mila Nikolova CMLA, CNRS, ENS Cachan, Université Paris-Saclay, France nikolova@cmla.ens-cachan.fr SIAM Minisymposium
More informationPenalized Barycenters in the Wasserstein space
Penalized Barycenters in the Wasserstein space Elsa Cazelles, joint work with Jérémie Bigot & Nicolas Papadakis Université de Bordeaux & CNRS Journées IOP - Du 5 au 8 Juillet 2017 Bordeaux Elsa Cazelles
More informationLearning with stochastic proximal gradient
Learning with stochastic proximal gradient Lorenzo Rosasco DIBRIS, Università di Genova Via Dodecaneso, 35 16146 Genova, Italy lrosasco@mit.edu Silvia Villa, Băng Công Vũ Laboratory for Computational and
More informationOn the split equality common fixed point problem for quasi-nonexpansive multi-valued mappings in Banach spaces
Available online at www.tjnsa.com J. Nonlinear Sci. Appl. 9 (06), 5536 5543 Research Article On the split equality common fixed point problem for quasi-nonexpansive multi-valued mappings in Banach spaces
More informationBregman Divergence and Mirror Descent
Bregman Divergence and Mirror Descent Bregman Divergence Motivation Generalize squared Euclidean distance to a class of distances that all share similar properties Lots of applications in machine learning,
More informationSplitting methods for decomposing separable convex programs
Splitting methods for decomposing separable convex programs Philippe Mahey LIMOS - ISIMA - Université Blaise Pascal PGMO, ENSTA 2013 October 4, 2013 1 / 30 Plan 1 Max Monotone Operators Proximal techniques
More informationAn Optimal Affine Invariant Smooth Minimization Algorithm.
An Optimal Affine Invariant Smooth Minimization Algorithm. Alexandre d Aspremont, CNRS & École Polytechnique. Joint work with Martin Jaggi. Support from ERC SIPA. A. d Aspremont IWSL, Moscow, June 2013,
More informationPrimal and dual predicted decrease approximation methods
Math. Program., Ser. B DOI 10.1007/s10107-017-1108-9 FULL LENGTH PAPER Primal and dual predicted decrease approximation methods Amir Beck 1 Edouard Pauwels 2 Shoham Sabach 1 Received: 11 November 2015
More informationOn the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems
On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems Radu Ioan Boţ Ernö Robert Csetnek August 5, 014 Abstract. In this paper we analyze the
More informationConvex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013
Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for
More informationOn robustness of the regularity property of maps
Control and Cybernetics vol. 32 (2003) No. 3 On robustness of the regularity property of maps by Alexander D. Ioffe Department of Mathematics, Technion, Haifa 32000, Israel Abstract: The problem considered
More informationLearning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley
Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. 1. Prediction with expert advice. 2. With perfect
More informationITERATIVE SCHEMES FOR APPROXIMATING SOLUTIONS OF ACCRETIVE OPERATORS IN BANACH SPACES SHOJI KAMIMURA AND WATARU TAKAHASHI. Received December 14, 1999
Scientiae Mathematicae Vol. 3, No. 1(2000), 107 115 107 ITERATIVE SCHEMES FOR APPROXIMATING SOLUTIONS OF ACCRETIVE OPERATORS IN BANACH SPACES SHOJI KAMIMURA AND WATARU TAKAHASHI Received December 14, 1999
More informationA FIXED POINT THEOREM FOR GENERALIZED NONEXPANSIVE MULTIVALUED MAPPINGS
Fixed Point Theory, (0), No., 4-46 http://www.math.ubbcluj.ro/ nodeacj/sfptcj.html A FIXED POINT THEOREM FOR GENERALIZED NONEXPANSIVE MULTIVALUED MAPPINGS A. ABKAR AND M. ESLAMIAN Department of Mathematics,
More informationCompact operators on Banach spaces
Compact operators on Banach spaces Jordan Bell jordan.bell@gmail.com Department of Mathematics, University of Toronto November 12, 2017 1 Introduction In this note I prove several things about compact
More informationThe resolvent average of monotone operators: dominant and recessive properties
The resolvent average of monotone operators: dominant and recessive properties Sedi Bartz, Heinz H. Bauschke, Sarah M. Moffat, and Xianfu Wang September 30, 2015 (first revision) December 22, 2015 (second
More informationDerivatives. Differentiability problems in Banach spaces. Existence of derivatives. Sharpness of Lebesgue s result
Differentiability problems in Banach spaces David Preiss 1 Expanded notes of a talk based on a nearly finished research monograph Fréchet differentiability of Lipschitz functions and porous sets in Banach
More information