Convergence rates for an inertial algorithm of gradient type associated to a smooth nonconvex minimization


Szilárd Csaba László*

November 2018

* Technical University of Cluj-Napoca, Department of Mathematics, Str. Memorandumului nr. 8, 4004 Cluj-Napoca, Romania, laszlosziszi@yahoo.com. This work was supported by a grant of the Ministry of Research and Innovation, CNCS - UEFISCDI, project number PN-III-P-.-TE, and by a grant of the Ministry of Research and Innovation, CNCS - UEFISCDI, project number PN-III-P4-ID-PCE, within PNCDI III.

Abstract. We investigate an inertial algorithm of gradient type in connection with the minimization of a nonconvex differentiable function. The algorithm is formulated in the spirit of Nesterov's accelerated convex gradient method. We show that the generated sequences converge to a critical point of the objective function, if a regularization of the objective function satisfies the Kurdyka-Łojasiewicz property. Further, we provide convergence rates for the generated sequences and the objective function values, formulated in terms of the Łojasiewicz exponent.

Key Words. inertial algorithm, nonconvex optimization, Kurdyka-Łojasiewicz inequality, convergence rate

AMS subject classification. 90C26, 90C30, 65K10

1 Introduction

Let g : R^m → R be a (not necessarily convex) Fréchet differentiable function with L_g-Lipschitz continuous gradient, i.e. there exists L_g ≥ 0 such that ‖∇g(x) − ∇g(y)‖ ≤ L_g ‖x − y‖ for all x, y ∈ R^m. We deal with the optimization problem

(P)   inf_{x ∈ R^m} g(x).

We associate to (P) the following inertial algorithm of gradient type. Consider the starting points x_0 = y_0 ∈ R^m and, for all n ∈ N,

x_{n+1} = y_n − s ∇g(y_n),                                    (1)
y_n = x_n + (βn/(n + α)) (x_n − x_{n−1}),                      (2)

where α > 0, β ∈ (0, 1) and 0 < s < 2(1 − β)/L_g.
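For concreteness, the scheme (1)-(2) can be implemented in a few lines. The following Python sketch is only an illustration: the test function, its local Lipschitz estimate and all parameter values are assumptions made for this example, they are not taken from the paper.

```python
import numpy as np

def inertial_gradient(grad_g, x0, alpha=3.0, beta=0.5, L_g=1.0, s=None, n_iter=500):
    """Minimal sketch of scheme (1)-(2):
       y_n     = x_n + (beta*n/(n + alpha)) * (x_n - x_{n-1})
       x_{n+1} = y_n - s * grad_g(y_n),     with 0 < s < 2*(1 - beta)/L_g."""
    if s is None:
        s = 0.99 * 2.0 * (1.0 - beta) / L_g     # just below the admissible upper bound
    x_prev = np.asarray(x0, dtype=float)        # plays the role of x_{n-1}
    x = x_prev.copy()                           # x_0 (= y_0)
    for n in range(n_iter):
        y = x + beta * n / (n + alpha) * (x - x_prev)
        x_prev, x = x, y - s * grad_g(y)
    return x

# Illustrative nonconvex (but semi-algebraic, hence KL) objective:
# g(x) = (x1^2 - 1)^2 + x2^2, whose critical points are (+-1, 0) and (0, 0).
def grad_g(x):
    return np.array([4.0 * x[0] * (x[0] ** 2 - 1.0), 2.0 * x[1]])

# L_g below is a rough Lipschitz estimate for grad g on the region visited by the iterates.
print(inertial_gradient(grad_g, x0=[0.5, 1.0], beta=0.5, L_g=12.0))   # approaches (1, 0)
```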

Note that (1)-(2) is a nonconvex descendant of the methods of Polyak [23] and Nesterov [21]. Indeed, in [23], Polyak introduced a modified gradient method for minimizing a smooth convex function g. His two-step iterative method, the so-called heavy ball method, takes the following form:

x_{n+1} = y_n − λ_n ∇g(x_n),
y_n = x_n + α_n (x_n − x_{n−1}),                               (3)

where α_n ∈ [0, 1) and λ_n > 0 is a step-size parameter. In a seminal paper [21], Nesterov proposed a modification of the heavy ball method in order to obtain optimal convergence rates for smooth convex functions. More precisely, Nesterov used α_n = (t_n − 1)/t_{n+1}, where t_n satisfies the recursion t_{n+1} = (1 + √(1 + 4 t_n²))/2, t_1 = 1, and put y_n also for evaluating the gradient. Additionally, λ_n is chosen in such a way that λ_n ≤ 1/L_g. His scheme in its simplest form is given by:

x_{n+1} = y_n − s ∇g(y_n),
y_n = x_n + ((t_n − 1)/t_{n+1}) (x_n − x_{n−1}),               (4)

where s ≤ 1/L_g. This scheme leads to the convergence rate g(x_n) − g(x*) = O(1/n²), where x* is a minimizer of the convex function g, and this rate is optimal among all methods having only information about the gradient of g and consecutive iterates [22]. By taking t_n = (n + a − 1)/a, a ≥ 2, in (4) we obtain an algorithm that is asymptotically equivalent to the original Nesterov method and leads to the same rate of convergence O(1/n²), see [15, 25]. This case has been considered by Chambolle and Dossal [15] in order to prove the convergence of the iterates of the modified FISTA algorithm (see [7]). We emphasize that Algorithm (1)-(2) has a similar form as the algorithm studied by Chambolle and Dossal (see [15] and also [8]), but we allow the function g to be nonconvex. Unfortunately, our analysis does not cover the case β = 1.

Su, Boyd and Candès [25] showed that in case t_n = (n + 1)/2 the algorithm (4) has as exact limit the second order differential equation

ẍ(t) + (α/t) ẋ(t) + ∇g(x(t)) = 0,                              (5)

with α = 3. Recently, Attouch and his co-authors [4, 6] proved that, if α > 3 in (5), then the generated trajectory x(t) converges to a minimizer of g as t → +∞, while the convergence rate of the objective function along the trajectory is o(1/t²). Further, in [5], some results concerning the convergence rate of the objective function g along the trajectory generated by (5) have been obtained in the subcritical case α ≤ 3. However, the convergence of the trajectories generated by (5) in case g is nonconvex is still an open question. Some important steps in this direction have been made in [14] (see also [12]), where convergence of the trajectories of a system that can be viewed as a perturbation of (5) has been obtained in a nonconvex setting. More precisely, in [14] the system

ẍ(t) + (γ + α/t) ẋ(t) + ∇g(x(t)) = 0                           (6)

is considered, and it is shown that the generated trajectory converges to a critical point of g, if a regularization of g satisfies the Kurdyka-Łojasiewicz property.
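The relation between the inertial coefficients mentioned above can be checked numerically. The short script below, with illustrative parameter values, compares Nesterov's coefficient (t_n − 1)/t_{n+1}, the Chambolle-Dossal choice t_n = (n + a − 1)/a, and the coefficient βn/(n + α) of (1)-(2), which stays bounded away from 1 since β < 1 (the case β = 1 is not covered by our analysis).

```python
import numpy as np

# Numerical comparison of the inertial coefficients (illustrative values only):
#   Nesterov:          (t_n - 1)/t_{n+1} with t_{n+1} = (1 + sqrt(1 + 4 t_n^2))/2, t_1 = 1
#   Chambolle-Dossal:  t_n = (n + a - 1)/a, which gives (t_n - 1)/t_{n+1} = (n - 1)/(n + a)
#   scheme (1)-(2):    beta*n/(n + alpha), bounded away from 1 because beta < 1
N = 100000
nesterov = np.empty(N)
t = 1.0
for n in range(N):
    t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    nesterov[n] = (t - 1.0) / t_next
    t = t_next

n_arr = np.arange(1, N + 1)
a = 3.0                                      # Chambolle-Dossal parameter, a >= 2
chambolle_dossal = (n_arr - 1.0) / (n_arr + a)
beta, alpha = 0.9, 3.0                       # illustrative parameters of (1)-(2)
coeff_12 = beta * n_arr / (n_arr + alpha)

print(nesterov[-1], chambolle_dossal[-1], coeff_12[-1])   # ~1, ~1 and ~beta, respectively
```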

In what follows we show that, by choosing appropriate values of β, the numerical scheme (1)-(2) has as exact limit the continuous second order dynamical system (5) studied in [4-6, 25], and also the continuous dynamical system (6) studied in [14]. To this end we take small step sizes in (1)-(2) and follow the same approach as Su, Boyd and Candès in [25], see also [14]. For this purpose we rewrite (1)-(2) in the form

(x_{n+1} − x_n)/√s = (βn/(n + α)) · (x_n − x_{n−1})/√s − √s ∇g(y_n),        (7)

and introduce the Ansatz x_n ≈ x(n√s) for some twice continuously differentiable function x : [0, +∞) → R^m. We let n = t/√s and get x(t) ≈ x_n, x(t + √s) ≈ x_{n+1}, x(t − √s) ≈ x_{n−1}. Then, as the step size s goes to zero, from the Taylor expansion of x we obtain

(x_{n+1} − x_n)/√s = ẋ(t) + (1/2) ẍ(t) √s + o(√s)   and   (x_n − x_{n−1})/√s = ẋ(t) − (1/2) ẍ(t) √s + o(√s).

Further, since

√s ‖∇g(y_n) − ∇g(x_n)‖ ≤ √s L_g ‖y_n − x_n‖ = √s L_g (βn/(n + α)) ‖x_n − x_{n−1}‖ = o(√s),

it follows that √s ∇g(y_n) = √s ∇g(x_n) + o(√s). Consequently, (7) can be written as

ẋ(t) + (1/2) ẍ(t) √s + o(√s) = (βt/(t + α√s)) (ẋ(t) − (1/2) ẍ(t) √s + o(√s)) − √s ∇g(x(t)) + o(√s),

or, equivalently,

(t + α√s) (ẋ(t) + (1/2) ẍ(t) √s + o(√s)) = βt (ẋ(t) − (1/2) ẍ(t) √s + o(√s)) − √s (t + α√s) ∇g(x(t)) + o(√s).

Hence,

((α√s + (1 + β)t)/2) ẍ(t) √s + ((1 − β)t + α√s) ẋ(t) + √s (t + α√s) ∇g(x(t)) = o(√s).        (8)

Now, if we take β = 1 − γs < 1 in (8), for some 1/s > γ > 0, we obtain

((α√s + (2 − γs)t)/2) ẍ(t) √s + (γst + α√s) ẋ(t) + √s (t + α√s) ∇g(x(t)) = o(√s).

After dividing by √s and letting s → 0, we obtain

t ẍ(t) + α ẋ(t) + t ∇g(x(t)) = 0,

which, after division by t, gives (5), that is,

ẍ(t) + (α/t) ẋ(t) + ∇g(x(t)) = 0.

Similarly, by taking β = 1 − γ√s < 1 in (8), for some 1/√s > γ > 0, we obtain

((α√s + (2 − γ√s)t)/2) ẍ(t) √s + (γ√s t + α√s) ẋ(t) + √s (t + α√s) ∇g(x(t)) = o(√s).

After dividing by √s and letting s → 0, we get

t ẍ(t) + (γt + α) ẋ(t) + t ∇g(x(t)) = 0,

which, after division by t, gives (6), that is,

ẍ(t) + (γ + α/t) ẋ(t) + ∇g(x(t)) = 0.
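The formal derivation above can be probed numerically: for a small step size s and β = 1 − γ√s, the iterates x_n of (1)-(2) are expected to roughly track the solution x(t) of (6) at the times t = n√s. The one-dimensional test function, all parameter values and the crude Euler integrator in the sketch below are illustrative assumptions, not part of the paper.

```python
import numpy as np

def dg(x):                                   # g(x) = (x^2 - 1)^2 / 4, a toy double well
    return x * (x * x - 1.0)

s, gamma, alpha, N = 1e-4, 1.0, 4.0, 5000
h = np.sqrt(s)
beta = 1.0 - gamma * h                       # the scaling that leads to (6)

# discrete scheme (1)-(2)
x_prev = x = 0.5                             # x_0 = y_0
traj = [x]
for n in range(N):
    y = x + beta * n / (n + alpha) * (x - x_prev)
    x_prev, x = x, y - s * dg(y)
    traj.append(x)

# ODE (6): x'' + (gamma + alpha/t) x' + dg(x) = 0, integrated by small explicit Euler steps
dt, t, z, v = 1e-3, h, 0.5, 0.0              # start at t = sqrt(s) to avoid the singularity at 0
ode_at_n = [z]
for k in range(1, N + 1):
    while t < k * h:
        acc = -(gamma + alpha / t) * v - dg(z)
        z, v, t = z + dt * v, v + dt * acc, t + dt
    ode_at_n.append(z)

# discrepancy between x_n and x(n*sqrt(s)); it shrinks as s is taken smaller
print(np.max(np.abs(np.array(traj) - np.array(ode_at_n))))
```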

4 Consequently, our numerical scheme can be seen as the discrete counterpart of the continuous dynamical systems 5 and 6, in a full nonconvex setting. The techniques for proving the convergence of use the same main ingredients as other algorithms for nonconvex optimization problems involving KL functions. More precisely, in the next section, we show a sufficient decrease property for the iterates, which also ensures that the iterates gap belongs to l, further we show that the set of cluster points of the iterates is included in the set of critical points of the objective function, and, finally, we use the KL property of an appropriate regularization of the objective function in order to obtain that the iterates gap belongs to l, which implies the convergence of the iterates, see also [3, 8, 3]. Moreover, in section 3, we obtain several convergence rates both for the sequences x n n N, y n n N generated by the numerical scheme, as well as for the function values gx n, gy n in the terms of the Lojasiewicz exponent of g and a regularization of g, respectively for some general results see [6]. The convergence of the generated sequences In this section we investigate the convergence of the proposed algorithm. We show that the sequences generated by the numerical scheme converge to a critical point of the objective function g, provided the regularization of g, Hx, y = gx + y x, is a KL function. The main tool in our forthcoming analysis is the so called descent lemma, see []. Lemma Let g : R m R be Frèchet differentiable with L g Lipschitz continuous gradient. Then gy gx + gx, y x + L g y x, x, y R m. Now we are able to obtain a decrease property for the iterates generated by. Theorem In the settings of problem, for some starting points x 0 = y 0 R m let x n n N, y n n N be the sequences generated by the numerical scheme. Consider the sequences and A n = sl g s C n = sl g s for all n N, n. Then, there exists N N such that + βn + α n + α β + βn + α n + α n + α δ n = A n C n + βn + α sn + α, β s n + α n + α i The sequence gy n + δ n x n x n n N is decreasing and δ n > 0 for all n N. Assume that g is bounded from below. Then, the following statements hold. ii The sequence gy n + δ n x n x n n N is convergent; iii n x n x n < +. 4

5 Proof. From we have gy n = s y n x n+, hence Now, from Lemma we obtain gy n, y n+ y n = s y n x n+, y n+ y n. consequently we have gy n+ gy n + gy n, y n+ y n + L g y n+ y n, and hence Further, Since we have, and gy n+ L g y n+ y n gy n + s y n x n+, y n+ y n. 9 y n x n+, y n+ y n = y n+ y n + y n+ x n+, y n+ y n, y n+ x n+ = βn + n + α + x n+ x n, gy n+ + s L g y n+ y n gy n + y n+ y n = βn+ n+α+ s x n+ x n, y n+ y n. 0 + βn + α + β + x n+ x n n + α + n + α x n x n, y n+ y n = + βn + α + β + x n+ x n n + α + n + α x n x n + βn + α + β + x n+ x n + x n x n n + α + n + α x n+ x n, y n+ y n = + βn + α + β + n + α + x n+ x n, n + α x n+ x n, x n x n, + βn + α + β + x n+ x n n + α + n + α x n x n = + βn + α + β + x n+ x n n + α + n + α x n+ x n, x n x n. Replacing the above equalities in 0, we obtain sl g +βn+α+β+ n+α+ βn++βn+α+β+ n+α+ gy n+ + x n+ x n s sl g n+α gy n x n x n + s sl g +βn+α+β+ n+α n+α+ n+α s 5 βn+ n+α+ x n+ x n, x n x n. =

6 For simplicity let for all n N. Hence we have B n = sl g n+α s gy n+ + A n x n+ x n C n x n+ x n, x n x n gy n B n x n x n. By using the equality we obtain x n+ x n, x n x n = x n+ + x n x n x n+ x n x n x n gy n+ + A n C n x n+ x n + C n x n+ + x n x n gy n + C n B n x n x n. Note that A n C n = δ n+ and let us denote n = B n +A n C n C n. Consequently the following inequality holds. C n x n+ +x n x n + n x n x n gy n +δ n x n x n gy n+ +δ n+ x n+ x n. and Since 0 < β < and s < β L g, we have lim A n = sl gβ + β β s lim B n = sl gβ s > 0, lim C n = sl gβ + β β s lim n = sl g β > 0, s > 0, > 0, lim δ n = β sl g β + > 0. s Hence, there exists N N and C > 0, D > 0 such that for all n N one has C n C, n D and δ n > 0 which, in the view of, shows i, that is, the sequence gy n +δ n x n x n is decreasing for n N. Assume now that g is bounded from below. By using again, we obtain 0 C x n+ + x n x n + D x n x n gy n + δ n x n x n gy n+ + δ n+ x n+ x n, for all n N, or more convenient, that 0 D x n x n gy n + δ n x n x n gy n+ + δ n+ x n+ x n, 3 for all n N. Let r > N. By summing up the latter relation we have D r x n x n gy N + δ N x N x N gy r+ + δ r+ x r+ x r n=n 6

7 which leads to gy r+ + D r x n x n gy N + δ N x N x N. 4 n=n Now, taking into account that g is bounded from below, by letting r + we obtain which proves iii. The latter relation also shows that hence x n x n + n=n lim x n x n = 0, lim δ n x n x n = 0. But then, from the fact that g is bounded from below we obtain that the sequence gy n +δ n x n x n is bounded from below. On the other hand, from i we have that the sequence gy n + δ n x n x n is decreasing for n N, hence there exists lim gy n + δ n x n x n R. Remark 3 Observe that conclusion iii in the hypotheses of Theorem assures that the sequence x n x n n N l, in particular that lim x n x n = 0. 5 Let us denote by ωx n n N the set of cluster points of the sequence x n n N, and denote by critg = {x R m : gx = 0} the set of critical points of g. In the following result we use the distance function to a set, defined for A R n as distx, A = inf y A x y for all x R n. Lemma 4 In the settings of problem, for some starting points x 0 = y 0 R m consider the sequences x n n N, y n n N generated by Algorithm. Assume that g is bounded from below and consider the function Consider further the sequence H : R m R m R, Hx, y = gx + y x. u n = δ n x n x n + y n, for all n N, where δ n was defined in Theorem. Then, the following statements hold true. i ωu n n N = ωy n n N = ωx n n N crit g; ii There exists and is finite the limit lim Hy n, u n ; iii ωy n, u n n N crit H = {x, x R m R m : x crit g}; iv Hy n, u n s x n+ x n + sn+α + δ n x n x n for all n N; 7

8 v Hy n, u n x s n+ x n + sn+α δ n + δn x n x n for all n N; vi H is finite and constant on ωy n, u n n N. Assume that x n n N is bounded. Then, vii ωy n, u n n N is nonempty and compact; viii lim disty n, u n, ωy n, u n n N = 0. Proof. i Let x ωx n n N. Then, there exists a subsequence x nk k N of x n n N such that lim x n k = x. k + Since by 5 lim x n x n = 0 and the sequences δ n n N, that which shows that lim y n k = k + lim u n k = k + lim x n k = x, k + n+α n N ωx n n N ωu n n N and ωx n n N ωy n n N. Further from, the continuity of g and 5, we obtain that gx = lim gy n k = k + s s lim k + lim y n k x nk + = k + [ x nk x nk + + k n k + α x n k x nk ] = 0. converge, we obtain Hence ωx n n N crit g. Conversely, if u ωu n n N then, from 5 results that u ωy n n N and u ωx n n N. Hence, ωu n n N = ωy n n N = ωx n n N crit g. ii is nothing else than ii in Theorem. For iii observe that Hx, y = gx + x y, y x, hence, Hx, y = 0 leads to x = y and gx = 0. Consequently crit H = {x, x R m R m : x crit g}. Further, consider y, u ωy n, u n n N. Then, there exists y nk, u nk k N y n, u n n N such that y, u = lim y n k, u nk = lim x n k, x nk = x, x. k + k + Hence, u = y = x ωx n n N crit g and x, x crit H. iv By using the -norm of R m R m and, for every n N we have Hy n, u n Hy n, u n = gy n + y n u n, u n y n = gy n + y n u n + u n y n gy n + δ n x n x n = s x n + n + α x n x n x n+ + δ n x n x n s x n+ x n + sn + α + δ n x n x n. 8

9 v We use the euclidian norm of R m R m, that is x, y = x + y for all x, y R m R m. We have: Hy n, u n = gy n + y n u n, u n y n = gy n + y n u n + u n y n = s x n x n+ + sn + α δ n x n x n + δ n x n x n s x n+ x n + sn + α δ n + δ n x n x n for all n N. vi follows directly from ii. Assume now that x n n N is bounded and let us prove vii, see also [4]. Obviously y n, u n n N is bounded, hence according to Weierstrass Theorem ωy n, u n n N, and also ωx n n N, is nonempty. It remains to show that ωy n, u n n N is closed. From i and the proof of iii we have ωy n, u n n N = {x, x R m R m : x ωx n n N }. 6 Hence, it is enough to show that ωx n n N is closed. Let be x p p N ωx n n N and assume that lim p + x p = x. We show that x ωx n n N. Obviously, for every p N there exists a sequence of natural numbers n p k +, k +, such that lim k + x n p k = x p. Let be ϵ > 0. Since lim p + x p = x, there exists P ϵ N such that for every p P ϵ it holds x p x < ϵ. Let p N be fixed. Since lim k + x n p = x p, there exists kp, ϵ N such that for every k kp, ϵ it k holds x n p x p < ϵ k. Let be k p kp, ε such that n p k p > p. Obviously n p k p as p + and for every p P ϵ x n p x < ϵ. kp Hence thus x ωx n n N. viii By using 6 we have lim p + x n p kp = x, lim disty n, u n, ωy n, u n n N = lim inf y n, u n x, x. x ωx n n N Since there exists the subsequences y nk k N and u nk k N such that lim k y nk = lim k u nk = x 0 ωx n n N it is straightforward that lim disty n, u n, ωy n, u n n N = 0. 9

Remark 5 We emphasize that if g is coercive, that is, lim_{‖x‖→+∞} g(x) = +∞, then g is bounded from below and the sequences (x_n)_{n∈N}, (y_n)_{n∈N} generated by (1)-(2) are bounded. Indeed, notice that g is bounded from below, being a continuous and coercive function (see [24]). Note that according to Theorem 2 the sequence (D Σ_{n=N}^{r} ‖x_n − x_{n−1}‖²)_{r>N} is convergent, hence bounded. Consequently, from (14) it follows that, for every r > N (where N is defined in the hypothesis of Theorem 2), y_r is contained in a lower level set of g, which is bounded. Since (y_n)_{n∈N} is bounded, taking into account (15), it follows that (x_n)_{n∈N} is also bounded.

In order to continue our analysis we need the concept of a KL function. For η ∈ (0, +∞], we denote by Θ_η the class of concave and continuous functions φ : [0, η) → [0, +∞) such that φ(0) = 0, φ is continuously differentiable on (0, η), continuous at 0, and φ'(s) > 0 for all s ∈ (0, η).

Definition (Kurdyka-Łojasiewicz property) Let f : R^n → R be a differentiable function. We say that f satisfies the Kurdyka-Łojasiewicz (KL) property at x̄ ∈ R^n if there exist η ∈ (0, +∞], a neighborhood U of x̄ and a function φ ∈ Θ_η such that for all x in the intersection

U ∩ {x ∈ R^n : f(x̄) < f(x) < f(x̄) + η}

the following inequality holds:

φ'(f(x) − f(x̄)) ‖∇f(x)‖ ≥ 1.

If f satisfies the KL property at each point in R^n, then f is called a KL function.

The origins of this notion go back to the pioneering work of Łojasiewicz [19], where it is proved that for a real-analytic function f : R^n → R and a critical point x̄ ∈ R^n (that is, ∇f(x̄) = 0) there exists θ ∈ [1/2, 1) such that the function |f − f(x̄)|^θ ‖∇f‖^{−1} is bounded around x̄. This corresponds to the situation when φ(s) = C s^{1−θ}. The result of Łojasiewicz allows the interpretation of the KL property as a re-parametrization of the function values in order to avoid flatness around the critical points. Kurdyka [17] extended this property to differentiable functions definable in an o-minimal structure. Further extensions to the nonsmooth setting can be found in [1, 9-11]. To the class of KL functions belong semi-algebraic, real sub-analytic, semiconvex, uniformly convex and convex functions satisfying a growth condition. We refer the reader to [1-3, 8-11] and the references therein for more details regarding all the classes mentioned above and for illustrating examples.

An important role in our convergence analysis will be played by the following uniformized KL property given in [8, Lemma 6].

Lemma 6 Let Ω ⊆ R^n be a compact set and let f : R^n → R be a differentiable function. Assume that f is constant on Ω and that f satisfies the KL property at each point of Ω. Then there exist ε > 0, η > 0 and φ ∈ Θ_η such that for all x̄ ∈ Ω and for all x in the intersection

{x ∈ R^n : dist(x, Ω) < ε} ∩ {x ∈ R^n : f(x̄) < f(x) < f(x̄) + η}        (17)

the following inequality holds:

φ'(f(x) − f(x̄)) ‖∇f(x)‖ ≥ 1.        (18)

The following convergence result is the first main result of the paper.

Theorem 7 In the settings of problem (P), for some starting points x_0 = y_0 ∈ R^m, consider the sequences (x_n)_{n∈N}, (y_n)_{n∈N} generated by Algorithm (1)-(2). Assume that g is bounded from below and consider the function

H : R^m × R^m → R,   H(x, y) = g(x) + (1/2) ‖y − x‖².

Assume that (x_n)_{n∈N} is bounded and that H is a KL function. Then the following statements are true:

11 a n x n x n < + ; b there exists x critg such that lim x n = x. Proof. Consider the sequence u n = δ n x n x n + y n, for all n N, that was defined in the hypotheses of Lemma 4. Furthermore, consider x, x ωy n, u n n N. Then, according to Lemma 4, the sequence Hy n, u n is decreasing for all n N, where N was defined in Theorem, further x crit g and lim Hy n, u n = Hx, x. We divide the proof into two cases. Case I. There exists n N, n N, such that Hy n, u n = Hx, x. Then, since Hy n, u n is decreasing for all n N and lim Hy n, u n = Hx, x we obtain that The latter relation combined with 3 leads to Hy n, u n = Hx, x for all n n. 0 D x n x n Hy n, u n Hy n+, u n+ = Hx, x Hx, x = 0 for all n n. Hence x n n n is constant and the conclusion follows. Case II. For every n N one has that Hy n, u n > Hx, x. Let Ω = ωy n, u n n N. Then according to Lemma 4, Ω is nonempty and compact and H is constant on Ω. Since H is KL, according to Lemma 6 there exist ε, η > 0 and φ Θ η such that for all z, w belonging to the intersection one has {z, w R n R n : distz, w, Ω < ε} {z, w R n R n : Hx, x < Hz, w < Hx, x + η} φ Hz, w Hx, x Hz, w. Since lim disty n, u n, Ω = 0, there exists n N such that disty n, u n, Ω < ϵ, n n. and Since there exists n N such that lim Hy n, u n = Hx, x Hy n, u n > Hx, x for all n N, Hx, x < Hy n, u n < Hx, x + η, n n. Hence, for all n n = maxn, n we have φ Hy n, u n Hx, x Hy n, u n. Since φ is concave, for all n N we have φhy n, u n Hx, x φhy n+, u n+ Hx, x

12 φ Hy n, u n Hx, x Hy n, u n Hy n+, u n+, hence, φhy n, u n Hx, x φhy n+, u n+ Hx, x Hy n, u n Hy n+, u n+ Hy n, u n for all n n. Now, from 3 and Lemma 4 iv we obtain φhy n, u n Hx, x φhy n+, u n+ Hx, x 9 D x n x n, s x n+ x n + sn+α + δ n x n x n for all n n. Since lim δ n = β sl gβ+ s > 0 and lim sn+α = β s N N, N n and M > 0 such that max s, sn + α + δ n M, 0 there exists for all n N. Hence, 9 becomes φhy n, u n Hx, x φhy n+, u n+ Hx, x 0 for all n N. Consequently, D x n x n M x n+ x n + x n x n, x n x n M D φhy n, u n Hx, x φhy n+, u n+ Hx, x x n+ x n + x n x n, for all n N. By using the arithmetical-geometrical mean inequality we have M D φhy n, u n Hx, x φhy n+, u n+ Hx, x x n+ x n + x n x n x n+ x n + x n x n 3 for all n N. Hence x n+ x n + x n x n 3 for all n N which leads to + 3M 4D φhy n, u n Hx, x φhy n+, u n+ Hx, x x n x n + 3M 4D φhy n, u n Hx, x φhy n+, u n+ Hx, x x n x n x n+ x n 9M 4D φhy n, u n Hx, x φhy n+, u n+ Hx, x

13 for all n N. Let P > N. By summing up from N to P we obtain P x n x n n=n x N x N + x P + x P + 9M 4D φhy N, u N Hx, x φhy P +, u P + Hx, x. Now, by letting P + and using the fact that φ0 = 0 and 5 we obtain that hence n=n x n x n x N x N + 9M 4D φhy N, u N Hx, x < +, x n x n < + n which is exactly a. Obviously the sequence S n = n k= x k x k is Cauchy, hence, for all ϵ > 0 there exists N ϵ N such that for all n N ϵ and for all p N one has But S n+p S n = n+p k=n+ S n+p S n ϵ. x k x k n+p k=n+ hence the sequence x n n N is Cauchy, consequently is convergent. Let Now, according to Lemma 4 i one has lim x n = x. {x} = ωx n n N crit g x k x k = x n+p x n which proves b. Remark 8 Since the class of semi-algebraic functions is closed under addition see for example [8] and x, y x y is semi-algebraic, the conclusion of the previous theorem holds if the condition H is a KL function is replaced by the assumption that g is semi-algebraic. Remark 9 Note that, according to Remark 5, the conclusion of Theorem 7 remains valid if we replace in its hypotheses the conditions that g is bounded from below and x n n N is bounded by the condition that g is coercive. Remark 0 Note that under the assumptions of Theorem 7 we have lim y n = x and lim gx n = lim gy n = gx. 3
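Before passing to the convergence rates of the next section, the Łojasiewicz form of the KL inequality used throughout can be checked numerically on simple examples. The toy script below verifies |f(x) − f(x̄)|^θ ≤ K ‖∇f(x)‖, equivalently the KL inequality with the desingularizing function φ(s) = (K/(1 − θ)) s^{1−θ}, near the critical point x̄ = 0; the functions, exponents and constants are chosen only for this illustration and are not taken from the paper.

```python
import numpy as np

# Toy check of the Lojasiewicz inequality |f(x) - f(0)|^theta <= K * |f'(x)| near x = 0
# (equivalently, the KL inequality with phi(s) = K/(1-theta) * s^(1-theta)):
#   f(x) = x^2 : exponent theta = 1/2 with K = 1/2
#   f(x) = x^4 : exponent theta = 3/4 with K = 1/4
x = np.linspace(-1.0, 1.0, 2001)
x = x[x != 0.0]                                   # exclude the critical point itself

for f, df, theta, K in [(x**2, 2.0 * x, 0.5, 0.5), (x**4, 4.0 * x**3, 0.75, 0.25)]:
    lojasiewicz_ok = np.all(np.abs(f) ** theta <= K * np.abs(df) + 1e-12)
    kl_ok = np.all(K * np.abs(f) ** (-theta) * np.abs(df) >= 1.0 - 1e-9)
    print(lojasiewicz_ok, kl_ok)                  # both print True for each function
```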

14 3 Convergence rates In this section we will assume that the regularized function H satisfies the Lojasiewicz property, which, as noted in the previous section, corresponds to a particular choice of the desingularizing function φ see [, 9, 9]. Definition Let f : R n R be a differentiable function. The function f is said to fulfill the Lojasiewicz property, if for every x crit f there exist K, ϵ > 0 and θ 0, such that fx fx θ K fx for every x fulfilling x x < ϵ. The number θ is called the Lojasiewicz exponent of f at the critical point x. This corresponds to the case when the desingularizing function φ has the form φt = K θ t θ. In the following theorems we provide convergence rates for the sequence generated by, but also for the function values, in terms of the Lojasiewicz exponent of H see, also, [, 9]. Note that the forthcoming results remain valid if one replace in their hypotheses the conditions that g is bounded from below and x n n N is bounded by the condition that g is coercive. Theorem In the settings of problem consider the sequences x n n N, y n n N generated by Algorithm. Assume that g is bounded from below and that x n n N is bounded, let x critg be such that lim x n = x and suppose that H : R n R n R, Hx, y = gx + x y fulfills the Lojasiewicz property at x, x crit H with Lojasiewicz exponent θ 0, ]. Then, for every p > 0 there exist a, a, a 3, a 4 > 0 and k N such that the following statements hold true: a gy n gx a n p for every n > k, a gx n gx a n p for every n > k, a 3 x n x a 3 n p a 4 y n x a 4 n p for every n > k, for all n > k. Proof. As we have seen in the proof of Theorem 7, if there exists n N, n N, where N was defined in Theorem, such that Hy n, u n = Hx, x, then, Hy n, u n = Hx, x for all n n and x n n n is constant. Consequently y n n n is also constant and the conclusion of the theorem is straightforward. Hence, in what follows we assume that Hy n, u n > Hx, x, for all n N. Let us fix p > 0 and let us prove a. For simplicity let us denote r n = Hy n, u n Hx, x > 0 for all n N. From we have From Lemma 4 v we have Hy n, u n s x n+ x n + n x n x n r n r n+ for all n N. sn + α δ n + δ n x n x n 4

15 for all n N. Let S n = sn+α δ n + δn, for all n N. It follows that, for all n N one has x n x n S n Hy n, u n s S n x n+ x n Hy n, u n S n s r n+ r n+. S n n+ Now by using the Lojasiewicz property of H at x, x crit H, and the fact that lim y n, u n = x, x, we obtain that there exists K, ϵ > 0 and N N, such that for all n N one has y n, u n x, x < ϵ, consequently r n r n+ n Hy n, u n n S n s r n+ r n+ S n n+ n K r θ n n S n s r n+ r n+ S n n+ n K r θ n n+ S n s r n+ r n+ = α n rn+ θ β n r n+ r n+, S n n+ where α n = n K S n and β n = n s S n n+. It is obvious that the sequences α n n N and β n n N are convergent, further lim α n > 0 and lim β n > 0. Now, since 0 < θ and r n+ 0, there exists N N, N N, such that r θ n+ r n+ for all n N. Note that this implies that 0 r n for all n N. Hence, r n α n β n + r n+ + β n r n+, for all n N. Let us define for every n > N the sequence Ξ n = +. Since lim α n β n + + β n+ Ξ n+ there exists k N, k N such that for all n k one has np n+ p n. Then, since p > 0 one has lim p Ξ n = β n + Ξ n α n β n + + β n+ Ξ n+ β n + Ξ n. = lim α n > 0, Consequently, or, equivalently r n r n + + β n+ Ξ n+ β n + Ξ n r n+ β n + Ξ n r n+ + β n r n+, for all n k, + β n+ r n+ + Ξ n+ β n + β n+ Ξ n+ r n+, 3 5

16 for all n k. Now 3 leads to n r k + k=k β k hence after simplifying we get But, β n+ Ξ n+ By denoting Hence, r k + + β k Ξ k r k+ β k = n+p n+ p, hence r k + β k + β k Ξ k r k+ + β r k k+ Ξ k k=k n k=k n k=k + β n k+ r k+ + Ξ k+ k=k β k + β k+ Ξ k+ r k+, β n + β r n+ + k+ + β r n+ n+. 4 Ξ k+ Ξ n+ n n k + p k + p + β = = k+ k + p n + p. Ξ k+ k=k k + p = a, we have a n + p r β n n+ + + n+ β r n+. n+ a n p a n + p r n = gy n gx + δ n x n x n gy n gx 5 which is a. For a we start from Lemma and and we have gx n gy n gy n, x n y n + L g x n y n = x n x n+ + s n + α x n x n, sl g x n x n + n + α s s By using the inequality X, Y x n+ x n, n + α x n x n n + α x n x n x n+ x n, + L g x n x n = n + α. n + α x n x n a X + Y for all X, Y R m, a R \ {0}, we obtain a sl g x n+ x n + sl g x n x n, n + α consequently From 3 we have gx n gy n s sl g x n+ x n. x n x n D gy n + δ n x n x n gy n+ + δ n+ x n+ x n 6

17 and since the sequence gy n + δ n x n+ x n n k is decreasing and has the limit gx, we obtain that gy n+ + δ n+ x n+ x n gx, consequently Hence, for all n k one has x n x n D r n. 6 gx n gy n sd sl g r n+. 7 Now, the identity gx n gx = gx n gy n + gy n gx and a lead to gx n gx for every n > k, which combined with 5 give sd sl g r n+ + a n p gx n gx sd sl g a n + p + a n p a + sd sl g n p = a n p, for every n > k. For a 3 observe, that by summing up from n k to P > n and using the triangle inequality we obtain P x P x n x k x k k=n x n x n + x P + x P + 9M 4D φhy n, u n Hx, x φhy P +, u P + Hx, x. By letting P + we get x n x x n x n + 9M 4D φhy n, u n Hx, x 9M 4D φhy n, u n Hx, x. But, φt = K θ t θ, hence x n x 9MK 4D θ Hy n, u n Hx, x θ = M r θ n, 8 where M = 9MK 4D θ. But assures that 0 r n which combined with θ 0, ] leads to r θ n we have The conclusion follow by 5, since we have x n x M rn. r n, consequently x n x M a n p = a 3 n p for every n > k. Finally, for n > k we have y n x = x n + n + α x n x n x 7

18 Let a 4 = + βa 3. Then for all n > k, which proves a 4. + n + α + n + α x n x + a 3 n p + n + α n + α x n x + n + α a 3 n p a 3. n p y n x a 4, n p Remark In the previous theorem we obtained convergence rates with order p, for every p > 0. This happened when we took in 4 β n+ n + p = Ξ n+ n + p. But actually we have shown more. If one takes β n+ Ξ n+ = ρ n+ > 0 where lim ρ n = 0 then one obtains that there exits k N and A > 0 such that for all n k one has hence 4 becomes α n β n + + ρ n+ β n + ρ n gy n gx A n k=k+ From here, as in the proof of Theorem, one can derive that and x n x = O gx n gx A n k=k+ n k=k+ + ρ k. + ρ k for some A > 0, and y n x = O + ρ k n k=k+. + ρ k Having in mind this general result, and taking into account that in [4], for the dynamical system 6 which, as it is shown in Introduction, can be viewed as the continuous counterpart of the numerical scheme, it was obtained finite time convergence of the generated trajectories for θ 0, and exponential convergence rate for θ =, it seems a valid question whether we can obtain exponential convergence rate for the sequences generated by, by choosing an appropriate sequence ρ n. We show in what follow that this is not possible. We have n k=k+ = e n k=k+ ln+ρk. + ρ k Obviously ln + ρ k > 0, for all k > k and lim k + ln + ρ k = 0. Now, by using the Cesàro-Stolz theorem we obtain that n k=k+ lim ln + ρ k = lim n ln + ρ n+ = 0, 8

19 hence n k=k+ ln + ρ k = on, which shows that O n k=k+ > O e n. + ρ k Remark 3 According to [8], H is KL with Lojasiewicz exponent θ [,, whenever g is KL with Lojasiewicz exponent θ [,. Therefore, we have the following corollary. Corollary 4 In the settings of problem consider the sequences x n n N, y n n N generated by Algorithm. Assume that g is bounded from below and that x n n N is bounded, let x critg be such that lim x n = x and suppose that g fulfills the Lojasiewicz property at x with Lojasiewicz exponent θ =. Then, for every p > 0 there exist a, a, a 3, a 4 > 0 and k N such that the following statements hold true: a gy n gx a n p for every n > k, a gx n gx a n p for every n > k, a 3 x n x a 3 n p a 4 y n x a 4 n p for every n > k, for all n > k. In case the Lojasiewicz exponent of the regularization function H is θ, we have the following result concerning the convergence rates of the sequences generated by. Theorem 5 In the settings of problem consider the sequences x n n N, y n n N generated by Algorithm. Assume that g is bounded from below and that x n n N is bounded, let x critg be such that lim x n = x and suppose that H : R n R n R, Hx, y = gx + x y fulfills the Lojasiewicz property at x, x crit H with Lojasiewicz exponent θ,. Then, there exist b, b, b 3, b 4 > 0 such that the following statements hold true: b gy n gx b, for all n N n θ + ; b gx n gx b, for all n N n θ + ; b 3 x n x b 3, for all n N n θ θ + ; b 4 y n x b 4, for all n > N n θ θ +, where N N was defined in the proof of Theorem. Proof. Also here, to avoid triviality, in what follows we assume that Hy n, u n > Hx, x, for all n N. From we have that for every n N it holds where α n = n K S n and β n = n s S n n+. r n r n+ α n r θ n+ β n r n+ r n+, 9

20 Hence, r n r n+ r θ n+ + β nr n+ r n+ r θ n+ α n, for all n N. Consider the function ϕt = K θ t θ where K is the constant defined at the Lojasiewicz property of H. Then ϕ t = Kt θ and we have Analogously, hence, ϕr n+ ϕr n = rn+ r n ϕ tdt = K rn r n+ t θ dt Kr n r n+ r θ n. ϕr n+ ϕr n+ Kr n+ r n+ r θ n+. Assume that for some n N it holds that rn θ Then r θ n+. ϕr n+ ϕr n + β n ϕr n+ ϕr n+ K r n r n+ r θ n+ + Kβ nr n+ r n+ r θ n+ 9 Conversely, if r θ n ϕr n+ ϕr n = Consequently, Let C = inf n N C +β n α n K r n r n+ r θ n+ + K β nr n+ r n+ r θ n+ K α n. < r θ n+ for some n N, then θ θ r θ n < rn+ θ, K θ r θ n+ r θ n K θ θ rn θ K θ θ r θ = C θ θ N. ϕr n+ ϕr n + β n ϕr n+ ϕr n+ C + β n. > 0. Then, ϕr n+ ϕr n + β n ϕr n+ ϕr n+ C α n. 30 From 9 and 30 we get that there exists C > 0 such that ϕr n+ ϕr n + β n ϕr n+ ϕr n+ Cα n, for all n N. Let β = sup n N β n. Then the latter relation becomes ϕr n+ ϕr n + βϕr n+ ϕr n+ Cα n, for all n N, which leads to Consequently, n k=n ϕr k+ ϕr k + βϕr k+ ϕr k+ C ϕr n+ ϕr N + βϕr n+ ϕr N + C n k=n α k. n k=n α k 0

21 and by using the fact that the sequence r n n N is decreasing and ϕ is also decreasing, we obtain Hence, In other words rn θ Cθ K + β + βϕr n+ C n n Cθ θ r n α k K + β k=n n k=n α k. k=n α k, for all n N +. θ, for all n N +. Since n α k=n k αn N, where 0 < α = inf k N α k we have that there exists M > 0 such that n k=n α k Therefore, we have θ α θ n N θ Cθ θ r n α θ Mn θ K + β But, r n = gy n gx + δ n x n x n, consequently α θ Mn θ, for all n N +. = b n θ, for all n N +. gy n gx b n θ, for all n N + and b is proved. For b observe that 7 holds for all n N, hence for all n N one has gx n gy n sd sl g r n+ Thus, there exists M > 0 such that gx n gx = gx n gy n + gy n gx sd sl g b n + θ. sd sl g b M + b n θ = b n θ, for all n N +. For proving b 3 we use 8. Note that the relation x n x M rn θ holds for all n N. Hence, x n x M b n θ θ, for all n N +. Consequently, x n x b 3 n θ θ, for all n N +, where b 3 = M b θ and this proves b 3. For b 4 observe that for n N + 3 we have y n x = x n + n + α x n x n x

≤ ‖x_n − x̄‖ + (βn/(n + α)) ‖x_n − x_{n−1}‖
≤ ‖x_n − x̄‖ + (βn/(n + α)) (‖x_n − x̄‖ + ‖x_{n−1} − x̄‖)
≤ b_3/n^{(1−θ)/(2θ−1)} + (βn/(n + α)) (b_3/n^{(1−θ)/(2θ−1)} + b_3/(n − 1)^{(1−θ)/(2θ−1)})
≤ b_4/n^{(1−θ)/(2θ−1)},

for a suitable constant b_4 > 0; such a constant exists since βn/(n + α) ≤ β and (n/(n − 1))^{(1−θ)/(2θ−1)} is bounded for n ≥ N + 3. This proves (b4).

According to Remark 13 we have the following corollary.

Corollary 16 In the settings of problem (P), consider the sequences (x_n)_{n∈N}, (y_n)_{n∈N} generated by Algorithm (1)-(2). Assume that g is bounded from below and that (x_n)_{n∈N} is bounded, let x̄ ∈ crit(g) be such that lim_{n→+∞} x_n = x̄, and suppose that g fulfills the Łojasiewicz property at x̄ with Łojasiewicz exponent θ ∈ (1/2, 1). Then, there exist b_1, b_2, b_3, b_4 > 0 such that the following statements hold true:

(b1) g(y_n) − g(x̄) ≤ b_1/n^{1/(2θ−1)} for all n ≥ N + 2;
(b2) g(x_n) − g(x̄) ≤ b_2/n^{1/(2θ−1)} for all n ≥ N + 2;
(b3) ‖x_n − x̄‖ ≤ b_3/n^{(1−θ)/(2θ−1)} for all n ≥ N + 2;
(b4) ‖y_n − x̄‖ ≤ b_4/n^{(1−θ)/(2θ−1)} for all n > N + 2,

where N ∈ N was defined in the proof of Theorem 11.

4 Conclusions

In this paper we show the convergence of a Nesterov type algorithm in a full nonconvex setting, assuming that a regularization of the objective function satisfies the Kurdyka-Łojasiewicz property. For this purpose, as a starting point, we show a sufficient decrease property for the iterates generated by our algorithm. Though our algorithm is asymptotically equivalent to Nesterov's accelerated gradient method, we cannot obtain full equivalence, due to the fact that, in order to obtain the above mentioned decrease property, we cannot allow the inertial parameter, more precisely the parameter β, to attain the value 1. Nevertheless, we obtain convergence rates of order 1/n^p for every p > 0, for the sequences generated by our numerical scheme and also for the function values along these sequences, provided the objective function, or a regularization of the objective function, satisfies the Łojasiewicz property with Łojasiewicz exponent θ ∈ (0, 1/2]. We also show that, at least with our techniques, exponential convergence rates cannot be obtained. In case the Łojasiewicz exponent of the objective function, or of a regularization of the objective function, is θ ∈ (1/2, 1), we obtain polynomial convergence rates.

A related future research direction is the study of a modified FISTA algorithm in a nonconvex setting. Indeed, let f : R^m → R ∪ {+∞} be a proper, convex and lower semicontinuous function and let g : R^m → R be a possibly nonconvex smooth function with L_g-Lipschitz continuous gradient. Consider the optimization problem

inf_{x ∈ R^m} f(x) + g(x).

We associate to this optimization problem the following proximal-gradient algorithm: for x_0, y_0 ∈ R^m, consider

x_{n+1} = prox_{sf}(y_n − s ∇g(y_n)),
y_n = x_n + (βn/(n + α)) (x_n − x_{n−1}),

where α > 0, β ∈ (0, 1) and 0 < s < 2(1 − β)/L_g. Obviously, when f ≡ 0, this scheme becomes the numerical scheme (1)-(2) studied in the present paper. We emphasize that it has a similar formulation as the modified FISTA algorithm studied by Chambolle and Dossal in [15], and the convergence of the generated sequences to a critical point of the objective function f + g would open the gate for the study of FISTA type algorithms in a nonconvex setting.
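For the composite problem above, a minimal numerical sketch of this proximal-gradient scheme is given below. The choice f(x) = λ‖x‖₁ (whose proximal operator is soft-thresholding), the quadratic smooth part g, and all parameter values are illustrative assumptions made for this example, not part of the paper.

```python
import numpy as np

def soft_threshold(z, tau):
    # proximal operator of tau*||.||_1 (classical soft-thresholding)
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def inertial_proximal_gradient(grad_g, prox_f, x0, alpha=3.0, beta=0.5, s=0.05, n_iter=1000):
    """Sketch of the proximal-gradient scheme above:
       y_n     = x_n + (beta*n/(n + alpha)) * (x_n - x_{n-1})
       x_{n+1} = prox_{s f}( y_n - s*grad_g(y_n) ),   with 0 < s < 2*(1 - beta)/L_g."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for n in range(n_iter):
        y = x + beta * n / (n + alpha) * (x - x_prev)
        x_prev, x = x, prox_f(y - s * grad_g(y), s)
    return x

# Illustrative composite objective: f(x) = lam*||x||_1 and g(x) = 0.5*||A x - b||^2
# (g is convex here only for simplicity; the scheme itself allows a nonconvex smooth g).
rng = np.random.default_rng(1)
A, b, lam = rng.normal(size=(20, 50)), rng.normal(size=20), 0.1
grad_g = lambda x: A.T @ (A @ x - b)
L_g = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of grad g
x_hat = inertial_proximal_gradient(grad_g, lambda z, t: soft_threshold(z, lam * t),
                                   x0=np.zeros(50), beta=0.5, s=0.9 * (1 - 0.5) * 2 / L_g)
print(np.count_nonzero(np.round(x_hat, 6)))  # a sparse approximate critical point
```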

References

[1] H. Attouch, J. Bolte, On the convergence of the proximal algorithm for nonsmooth functions involving analytic features, Mathematical Programming 116(1-2), Series B, 5-16, 2009

[2] H. Attouch, J. Bolte, P. Redont, A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality, Mathematics of Operations Research 35(2), 438-457, 2010

[3] H. Attouch, J. Bolte, B.F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Mathematical Programming 137(1-2), Series A, 91-129, 2013

[4] H. Attouch, Z. Chbani, J. Peypouquet, P. Redont, Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity, Mathematical Programming 168(1-2), Series B, 123-175, 2018

[5] H. Attouch, Z. Chbani, H. Riahi, Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α ≤ 3, ESAIM: COCV, doi:10.1051/cocv/2017083, 2017

[6] H. Attouch, J. Peypouquet, P. Redont, Fast convex optimization via inertial dynamics with Hessian driven damping, Journal of Differential Equations 261(10), 2016

[7] A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences 2(1), 183-202, 2009

[8] J. Bolte, S. Sabach, M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Mathematical Programming 146(1-2), Series A, 459-494, 2014

[9] J. Bolte, A. Daniilidis, A. Lewis, The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems, SIAM Journal on Optimization 17(4), 1205-1223, 2006

[10] J. Bolte, A. Daniilidis, A. Lewis, M. Shiota, Clarke subgradients of stratifiable functions, SIAM Journal on Optimization 18(2), 556-572, 2007

[11] J. Bolte, A. Daniilidis, O. Ley, L. Mazet, Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity, Transactions of the American Mathematical Society 362(6), 3319-3363, 2010

[12] R.I. Boţ, E.R. Csetnek, S.C. László, Approaching nonsmooth nonconvex minimization through second-order proximal-gradient dynamical systems, Journal of Evolution Equations 18(3), 1291-1318, 2018

[13] R.I. Boţ, E.R. Csetnek, S.C. László, An inertial forward-backward algorithm for minimizing the sum of two non-convex functions, EURO Journal on Computational Optimization 4(1), 3-25, 2016

[14] R.I. Boţ, E.R. Csetnek, S.C. László, A second order dynamical approach with variable damping to nonconvex smooth minimization, Applicable Analysis, 2018

[15] A. Chambolle, Ch. Dossal, On the convergence of the iterates of the fast iterative shrinkage/thresholding algorithm, Journal of Optimization Theory and Applications 166(3), 968-982, 2015

[16] P. Frankel, G. Garrigos, J. Peypouquet, Splitting methods with variable metric for Kurdyka-Łojasiewicz functions and general convergence rates, Journal of Optimization Theory and Applications 165(3), 874-900, 2015

[17] K. Kurdyka, On gradients of functions definable in o-minimal structures, Annales de l'Institut Fourier (Grenoble) 48(3), 769-783, 1998

[18] G. Li, T.K. Pong, Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods, Foundations of Computational Mathematics, 2018

[19] S. Łojasiewicz, Une propriété topologique des sous-ensembles analytiques réels, Les Équations aux Dérivées Partielles, Éditions du Centre National de la Recherche Scientifique, Paris, 87-89, 1963

[20] D.A. Lorenz, T. Pock, An inertial forward-backward algorithm for monotone inclusions, Journal of Mathematical Imaging and Vision 51(2), 311-325, 2015

[21] Y.E. Nesterov, A method for solving the convex programming problem with convergence rate O(1/k²), (in Russian) Dokl. Akad. Nauk SSSR 269(3), 543-547, 1983

[22] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Kluwer Academic Publishers, Dordrecht, 2004

[23] B.T. Polyak, Some methods of speeding up the convergence of iteration methods, U.S.S.R. Computational Mathematics and Mathematical Physics 4(5), 1-17, 1964

[24] R.T. Rockafellar, R.J.-B. Wets, Variational Analysis, Fundamental Principles of Mathematical Sciences 317, Springer-Verlag, Berlin, 1998

[25] W. Su, S. Boyd, E.J. Candès, A differential equation for modeling Nesterov's accelerated gradient method: theory and insights, Journal of Machine Learning Research 17, 1-43, 2016


Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Charles Byrne (Charles Byrne@uml.edu) http://faculty.uml.edu/cbyrne/cbyrne.html Department of Mathematical Sciences

More information

A Majorize-Minimize subspace approach for l 2 -l 0 regularization with applications to image processing

A Majorize-Minimize subspace approach for l 2 -l 0 regularization with applications to image processing A Majorize-Minimize subspace approach for l 2 -l 0 regularization with applications to image processing Emilie Chouzenoux emilie.chouzenoux@univ-mlv.fr Université Paris-Est Lab. d Informatique Gaspard

More information

Subdifferential representation of convex functions: refinements and applications

Subdifferential representation of convex functions: refinements and applications Subdifferential representation of convex functions: refinements and applications Joël Benoist & Aris Daniilidis Abstract Every lower semicontinuous convex function can be represented through its subdifferential

More information

Laplace s Equation. Chapter Mean Value Formulas

Laplace s Equation. Chapter Mean Value Formulas Chapter 1 Laplace s Equation Let be an open set in R n. A function u C 2 () is called harmonic in if it satisfies Laplace s equation n (1.1) u := D ii u = 0 in. i=1 A function u C 2 () is called subharmonic

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Kantorovich s Majorants Principle for Newton s Method

Kantorovich s Majorants Principle for Newton s Method Kantorovich s Majorants Principle for Newton s Method O. P. Ferreira B. F. Svaiter January 17, 2006 Abstract We prove Kantorovich s theorem on Newton s method using a convergence analysis which makes clear,

More information

Inertial Douglas-Rachford splitting for monotone inclusion problems

Inertial Douglas-Rachford splitting for monotone inclusion problems Inertial Douglas-Rachford splitting for monotone inclusion problems Radu Ioan Boţ Ernö Robert Csetnek Christopher Hendrich January 5, 2015 Abstract. We propose an inertial Douglas-Rachford splitting algorithm

More information

Local strong convexity and local Lipschitz continuity of the gradient of convex functions

Local strong convexity and local Lipschitz continuity of the gradient of convex functions Local strong convexity and local Lipschitz continuity of the gradient of convex functions R. Goebel and R.T. Rockafellar May 23, 2007 Abstract. Given a pair of convex conjugate functions f and f, we investigate

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

Continuous Functions on Metric Spaces

Continuous Functions on Metric Spaces Continuous Functions on Metric Spaces Math 201A, Fall 2016 1 Continuous functions Definition 1. Let (X, d X ) and (Y, d Y ) be metric spaces. A function f : X Y is continuous at a X if for every ɛ > 0

More information

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 30 Notation f : H R { } is a closed proper convex function domf := {x R n

More information

Convergence of descent methods for semi-algebraic and tame problems.

Convergence of descent methods for semi-algebraic and tame problems. OPTIMIZATION, GAMES, AND DYNAMICS Institut Henri Poincaré November 28-29, 2011 Convergence of descent methods for semi-algebraic and tame problems. Hedy ATTOUCH Institut de Mathématiques et Modélisation

More information

Optimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison

Optimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison Optimization Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison optimization () cost constraints might be too much to cover in 3 hours optimization (for big

More information

Une méthode proximale pour les inclusions monotones dans les espaces de Hilbert, avec complexité O(1/k 2 ).

Une méthode proximale pour les inclusions monotones dans les espaces de Hilbert, avec complexité O(1/k 2 ). Une méthode proximale pour les inclusions monotones dans les espaces de Hilbert, avec complexité O(1/k 2 ). Hedy ATTOUCH Université Montpellier 2 ACSIOM, I3M UMR CNRS 5149 Travail en collaboration avec

More information

Variational Analysis and Tame Optimization

Variational Analysis and Tame Optimization Variational Analysis and Tame Optimization Aris Daniilidis http://mat.uab.cat/~arisd Universitat Autònoma de Barcelona April 15 17, 2010 PLAN OF THE TALK Nonsmooth analysis Genericity of pathological situations

More information

2 Statement of the problem and assumptions

2 Statement of the problem and assumptions Mathematical Notes, 25, vol. 78, no. 4, pp. 466 48. Existence Theorem for Optimal Control Problems on an Infinite Time Interval A.V. Dmitruk and N.V. Kuz kina We consider an optimal control problem on

More information

arxiv: v1 [math.na] 25 Sep 2012

arxiv: v1 [math.na] 25 Sep 2012 Kantorovich s Theorem on Newton s Method arxiv:1209.5704v1 [math.na] 25 Sep 2012 O. P. Ferreira B. F. Svaiter March 09, 2007 Abstract In this work we present a simplifyed proof of Kantorovich s Theorem

More information

Spring School on Variational Analysis Paseky nad Jizerou, Czech Republic (April 20 24, 2009)

Spring School on Variational Analysis Paseky nad Jizerou, Czech Republic (April 20 24, 2009) Spring School on Variational Analysis Paseky nad Jizerou, Czech Republic (April 20 24, 2009) GRADIENT DYNAMICAL SYSTEMS, TAME OPTIMIZATION AND APPLICATIONS Aris DANIILIDIS Departament de Matemàtiques Universitat

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of

More information

d(x n, x) d(x n, x nk ) + d(x nk, x) where we chose any fixed k > N

d(x n, x) d(x n, x nk ) + d(x nk, x) where we chose any fixed k > N Problem 1. Let f : A R R have the property that for every x A, there exists ɛ > 0 such that f(t) > ɛ if t (x ɛ, x + ɛ) A. If the set A is compact, prove there exists c > 0 such that f(x) > c for all x

More information

Convergence of generalized entropy minimizers in sequences of convex problems

Convergence of generalized entropy minimizers in sequences of convex problems Proceedings IEEE ISIT 206, Barcelona, Spain, 2609 263 Convergence of generalized entropy minimizers in sequences of convex problems Imre Csiszár A Rényi Institute of Mathematics Hungarian Academy of Sciences

More information

1 Introduction and preliminaries

1 Introduction and preliminaries Proximal Methods for a Class of Relaxed Nonlinear Variational Inclusions Abdellatif Moudafi Université des Antilles et de la Guyane, Grimaag B.P. 7209, 97275 Schoelcher, Martinique abdellatif.moudafi@martinique.univ-ag.fr

More information

REVIEW OF ESSENTIAL MATH 346 TOPICS

REVIEW OF ESSENTIAL MATH 346 TOPICS REVIEW OF ESSENTIAL MATH 346 TOPICS 1. AXIOMATIC STRUCTURE OF R Doğan Çömez The real number system is a complete ordered field, i.e., it is a set R which is endowed with addition and multiplication operations

More information

On the Iteration Complexity of Some Projection Methods for Monotone Linear Variational Inequalities

On the Iteration Complexity of Some Projection Methods for Monotone Linear Variational Inequalities On the Iteration Complexity of Some Projection Methods for Monotone Linear Variational Inequalities Caihua Chen Xiaoling Fu Bingsheng He Xiaoming Yuan January 13, 2015 Abstract. Projection type methods

More information

Weak sharp minima on Riemannian manifolds 1

Weak sharp minima on Riemannian manifolds 1 1 Chong Li Department of Mathematics Zhejiang University Hangzhou, 310027, P R China cli@zju.edu.cn April. 2010 Outline 1 2 Extensions of some results for optimization problems on Banach spaces 3 4 Some

More information

McMaster University. Advanced Optimization Laboratory. Title: A Proximal Method for Identifying Active Manifolds. Authors: Warren L.

McMaster University. Advanced Optimization Laboratory. Title: A Proximal Method for Identifying Active Manifolds. Authors: Warren L. McMaster University Advanced Optimization Laboratory Title: A Proximal Method for Identifying Active Manifolds Authors: Warren L. Hare AdvOl-Report No. 2006/07 April 2006, Hamilton, Ontario, Canada A Proximal

More information

Regularity estimates for fully non linear elliptic equations which are asymptotically convex

Regularity estimates for fully non linear elliptic equations which are asymptotically convex Regularity estimates for fully non linear elliptic equations which are asymptotically convex Luis Silvestre and Eduardo V. Teixeira Abstract In this paper we deliver improved C 1,α regularity estimates

More information

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Convex Analysis and Optimization Chapter 2 Solutions

Convex Analysis and Optimization Chapter 2 Solutions Convex Analysis and Optimization Chapter 2 Solutions Dimitri P. Bertsekas with Angelia Nedić and Asuman E. Ozdaglar Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

More information

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties Fedor S. Stonyakin 1 and Alexander A. Titov 1 V. I. Vernadsky Crimean Federal University, Simferopol,

More information

Optimality Conditions for Nonsmooth Convex Optimization

Optimality Conditions for Nonsmooth Convex Optimization Optimality Conditions for Nonsmooth Convex Optimization Sangkyun Lee Oct 22, 2014 Let us consider a convex function f : R n R, where R is the extended real field, R := R {, + }, which is proper (f never

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This

More information

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi Real Analysis Math 3AH Rudin, Chapter # Dominique Abdi.. If r is rational (r 0) and x is irrational, prove that r + x and rx are irrational. Solution. Assume the contrary, that r+x and rx are rational.

More information

Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches

Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches Patrick L. Combettes joint work with J.-C. Pesquet) Laboratoire Jacques-Louis Lions Faculté de Mathématiques

More information

Geometric Descent Method for Convex Composite Minimization

Geometric Descent Method for Convex Composite Minimization Geometric Descent Method for Convex Composite Minimization Shixiang Chen 1, Shiqian Ma 1, and Wei Liu 2 1 Department of SEEM, The Chinese University of Hong Kong, Hong Kong 2 Tencent AI Lab, Shenzhen,

More information

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms

Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms JOTA manuscript No. (will be inserted by the editor) Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms Peter Ochs Jalal Fadili Thomas Brox Received: date / Accepted: date Abstract

More information

Agenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples

Agenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples Agenda Fast proximal gradient methods 1 Accelerated first-order methods 2 Auxiliary sequences 3 Convergence analysis 4 Numerical examples 5 Optimality of Nesterov s scheme Last time Proximal gradient method

More information

Dual and primal-dual methods

Dual and primal-dual methods ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method

More information

On the acceleration of the double smoothing technique for unconstrained convex optimization problems

On the acceleration of the double smoothing technique for unconstrained convex optimization problems On the acceleration of the double smoothing technique for unconstrained convex optimization problems Radu Ioan Boţ Christopher Hendrich October 10, 01 Abstract. In this article we investigate the possibilities

More information