A gradient type algorithm with backward inertial steps for a nonconvex minimization


A gradient type algorithm with backward inertial steps for a nonconvex minimization

Cristian Daniel Alecsa, Szilárd Csaba László, Adrian Viorel

November 22, 2018

Cristian Daniel Alecsa: Tiberiu Popoviciu Institute of Numerical Analysis, Romanian Academy, Cluj-Napoca, Romania, and Department of Mathematics, Babes-Bolyai University, Cluj-Napoca, Romania, cristian.alecsa@math.ubbcluj.ro. Szilárd Csaba László: Technical University of Cluj-Napoca, Department of Mathematics, Str. Memorandumului nr. 28, 400114 Cluj-Napoca, Romania, laszlosziszi@yahoo.com. Adrian Viorel: Technical University of Cluj-Napoca, Department of Mathematics, Str. Memorandumului nr. 28, 400114 Cluj-Napoca, Romania, oviorel@mail.utcluj.ro. The work of all three authors was supported by a grant of the Ministry of Research and Innovation, CNCS - UEFISCDI, project number PN-III-P1-1.1-TE, within PNCDI III.

Abstract. We investigate an algorithm of gradient type with a backward inertial step in connection with the minimization of a nonconvex differentiable function. We show that the generated sequences converge to a critical point of the objective function if a regularization of the objective function satisfies the Kurdyka-Lojasiewicz property. Further, we provide convergence rates for the generated sequences and for the objective function values, formulated in terms of the Lojasiewicz exponent. Finally, some numerical experiments are presented in order to compare our numerical scheme with some algorithms well known in the literature.

Key Words. inertial algorithm, nonconvex optimization, Kurdyka-Lojasiewicz inequality, convergence rate

AMS subject classification. 90C26, 90C30, 65K10

1 Introduction and Preliminaries

Gradient-type algorithms have a long history, going back at least to Cauchy (1847), and also a wealth of applications. Solving linear systems, Cauchy's original motivation, is perhaps the most obvious application, but many of today's hot topics in machine learning or image processing also deal with optimization problems from an algorithmic perspective and rely on gradient-type algorithms. The original gradient descent algorithm
\[
x_{n+1} = x_n - s\nabla g(x_n),
\]
which is precisely an explicit Euler method applied to the gradient flow $\dot{x}(t) = -\nabla g(x(t))$, does not achieve very good convergence rates, and much research has been dedicated to accelerating convergence. Based on the analogy to mechanical systems, e.g. to the movement, with friction, of a heavy ball in a potential well defined by the objective function $g$, Polyak [20] was able to provide the seminal idea

for achieving acceleration, namely the addition of inertial (momentum) terms to the gradient algorithm. Probably the most acclaimed inertial algorithm is Nesterov's accelerated gradient method, which in its particular form
\[
x_{n+1} = y_n - s\nabla g(y_n), \qquad y_n = x_n + \frac{n}{n+3}(x_n - x_{n-1}),
\]
for a convex $g$ with Lipschitz continuous gradient, exhibits an improved convergence rate of $\mathcal{O}(1/n^2)$ and which, as highlighted by Su, Boyd and Candès [22], can be seen as the discrete counterpart of the second order differential equation
\[
\ddot{x}(t) + \frac{3}{t}\dot{x}(t) + \nabla g(x(t)) = 0.
\]
In this paper, we deal with the optimization problem
\[
\inf_{x \in \mathbb{R}^m} g(x), \tag{P}
\]
where $g : \mathbb{R}^m \to \mathbb{R}$ is a Fréchet differentiable function with $L_g$-Lipschitz continuous gradient, i.e. there exists $L_g \ge 0$ such that $\|\nabla g(x) - \nabla g(y)\| \le L_g\|x - y\|$ for all $x, y \in \mathbb{R}^m$, and we associate to (P) the following inertial algorithm of gradient type. Consider the starting points $x_0 = y_0 \in \mathbb{R}^m$, and for every $n \in \mathbb{N}$ let
\[
x_{n+1} = y_n - \beta_n \nabla g(y_n), \qquad y_n = x_n + \alpha_n (x_n - x_{n-1}), \tag{1}
\]
where we assume that
\[
\lim_{n \to +\infty} \alpha_n = \alpha \in \left(\frac{-10 + \sqrt{68}}{8},\, 0\right), \qquad \lim_{n \to +\infty} \beta_n = \beta \quad \text{and} \quad 0 < \beta < \frac{4\alpha^2 + 10\alpha + 2}{L_g(2\alpha + 1)^2}.
\]
Observe that the inertial parameter $\alpha_n$ becomes negative after a number of iterations, and this can be viewed as taking a backward inertial step in our algorithm (see the illustrative sketch below). We emphasize that the analysis of the proposed algorithm is intimately related to the properties of the following regularization of the objective function $g$ (see also [10, 11, 12, 15, 16]), that is, $H : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$ defined by
\[
H(x, y) = g(x) + M\|y - x\|^2, \qquad M > 0. \tag{2}
\]
In the remainder of this section we introduce the necessary apparatus of notions and results that we will need in our forthcoming analysis. For a differentiable function $f : \mathbb{R}^m \to \mathbb{R}$ we denote by $\operatorname{crit}(f) = \{x \in \mathbb{R}^m : \nabla f(x) = 0\}$ the set of critical points of $f$. The following so-called descent lemma, see [19], will play an essential role in our forthcoming analysis.

Lemma 1 Let $f : \mathbb{R}^m \to \mathbb{R}$ be Fréchet differentiable with $L$-Lipschitz continuous gradient. Then
\[
f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2}\|y - x\|^2, \qquad \forall x, y \in \mathbb{R}^m.
\]
Furthermore, the set of cluster points of a given sequence $(x_n)_{n \in \mathbb{N}}$ will be denoted by $\omega((x_n)_{n \in \mathbb{N}})$. At the same time, the distance function to a set $A \subseteq \mathbb{R}^m$ is defined as $\operatorname{dist}(x, A) = \inf_{y \in A}\|x - y\|$ for all $x \in \mathbb{R}^m$. Our convergence result relies on the concept of a KL function. For $\eta \in (0, +\infty]$, we denote by $\Theta_\eta$ the class of concave and continuous functions $\varphi : [0, \eta) \to [0, +\infty)$ such that $\varphi(0) = 0$, $\varphi$ is continuously differentiable on $(0, \eta)$, continuous at $0$, and $\varphi'(s) > 0$ for all $s \in (0, \eta)$.
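The backward inertial scheme (1) is straightforward to implement. The following minimal sketch is added here only for illustration and is not part of the original paper: it assumes, for simplicity, constant parameter sequences $\alpha_n \equiv \alpha$ and $\beta_n \equiv \beta$ satisfying the conditions above, and the test function and all numerical values are assumptions chosen purely for demonstration.

```python
import numpy as np

def backward_inertial_gradient(grad, x0, alpha=-0.1, beta=0.4, n_iter=200):
    """Sketch of scheme (1): y_n = x_n + alpha*(x_n - x_{n-1}),
    x_{n+1} = y_n - beta*grad g(y_n), here with constant alpha_n and beta_n."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()                      # x_0 = y_0, so the first inertial term vanishes
    for _ in range(n_iter):
        y = x + alpha * (x - x_prev)       # backward (negative) inertial step
        x_prev, x = x, y - beta * grad(y)  # gradient step from the extrapolated point
    return x

# Illustration on g(x) = ||x||^2 (so L_g = 2): for alpha = -0.1 the bound on beta reads
# beta < (4*0.01 - 1 + 2) / (2 * 0.8**2) = 0.8125, hence beta = 0.4 is admissible.
print(backward_inertial_gradient(lambda x: 2 * x, np.array([3.0, -1.0])))
```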

Definition 1 (Kurdyka-Lojasiewicz property) Let $f : \mathbb{R}^m \to \mathbb{R}$ be a differentiable function. We say that $f$ satisfies the Kurdyka-Lojasiewicz (KL) property at $\overline{x} \in \mathbb{R}^m$ if there exist $\eta \in (0, +\infty]$, a neighborhood $U$ of $\overline{x}$ and a function $\varphi \in \Theta_\eta$ such that for all $x$ in the intersection
\[
U \cap \{x \in \mathbb{R}^m : f(\overline{x}) < f(x) < f(\overline{x}) + \eta\}
\]
the following inequality holds
\[
\varphi'(f(x) - f(\overline{x}))\,\|\nabla f(x)\| \ge 1.
\]
If $f$ satisfies the KL property at each point in $\mathbb{R}^m$, then $f$ is called a KL function. The function $\varphi$ is called a desingularizing function (see for instance [5]).

The origins of this notion go back to the pioneering work of Lojasiewicz [17], where it is proved that for a real-analytic function $f : \mathbb{R}^m \to \mathbb{R}$ and a critical point $\overline{x} \in \mathbb{R}^m$ (that is, $\nabla f(\overline{x}) = 0$), there exists $\theta \in [1/2, 1)$ such that the function $|f - f(\overline{x})|^\theta \|\nabla f\|^{-1}$ is bounded around $\overline{x}$. This corresponds to the situation when $\varphi(s) = \frac{C}{1-\theta}\, s^{1-\theta}$, where $C > 0$ is a given constant, and leads to the following definition.

Definition 2 (see for instance [17, 7, 1]) A differentiable function $f : \mathbb{R}^m \to \mathbb{R}$ has the Lojasiewicz property with exponent $\theta \in (0, 1)$ at $\overline{x} \in \operatorname{crit}(f)$ if there exist $K, \epsilon > 0$ such that
\[
|f(x) - f(\overline{x})|^\theta \le K \|\nabla f(x)\|, \tag{3}
\]
for every $x \in \mathbb{R}^m$ with $\|x - \overline{x}\| < \epsilon$ (a simple worked example is given at the end of this section).

The result of Lojasiewicz allows the interpretation of the KL property as a re-parametrization of the function values in order to avoid flatness around the critical points. Kurdyka [14] extended this property to differentiable functions definable in an o-minimal structure. Further extensions to the nonsmooth setting can be found in [7, 2, 8, 9]. To the class of KL functions belong semi-algebraic, real sub-analytic, semiconvex, uniformly convex and convex functions satisfying a growth condition. We refer the reader to [7, 2, 9, 6, 8, 3, 1] and the references therein for more details regarding all the classes mentioned above and for illustrating examples.

Finally, an important role in our convergence analysis will be played by the following uniformized KL property given in [6, Lemma 6].

Lemma 2 Let $\Omega \subseteq \mathbb{R}^m$ be a compact set and let $f : \mathbb{R}^m \to \mathbb{R}$ be a differentiable function. Assume that $f$ is constant on $\Omega$ and that $f$ satisfies the KL property at each point of $\Omega$. Then there exist $\varepsilon, \eta > 0$ and $\varphi \in \Theta_\eta$ such that for all $\overline{x} \in \Omega$ and for all $x$ in the intersection
\[
\{x \in \mathbb{R}^m : \operatorname{dist}(x, \Omega) < \varepsilon\} \cap \{x \in \mathbb{R}^m : f(\overline{x}) < f(x) < f(\overline{x}) + \eta\} \tag{4}
\]
the following inequality holds
\[
\varphi'(f(x) - f(\overline{x}))\,\|\nabla f(x)\| \ge 1. \tag{5}
\]
The outline of the paper is the following. In Section 2 we give a sufficient condition that ensures the decrease property of the regularization $H$ along the iterates, and which also ensures that the iterates gap belongs to $\ell^2$. Further, using these results, we show that the set of cluster points of the iterates is included in the set of critical points of the objective function. Finally, by means of the KL property of $H$, we obtain that the iterates gap belongs to $\ell^1$. This implies the convergence of the iterates, see also [3, 6, 11, 15]. In Section 3 we obtain several convergence rates both for the sequences $(x_n)_{n \in \mathbb{N}}$, $(y_n)_{n \in \mathbb{N}}$ generated by the numerical scheme (1), and for the function values $g(x_n)$, $g(y_n)$, in terms of the Lojasiewicz exponent of $g$ and of $H$, respectively, see also [13, 15]. Finally, in Section 4 we present some numerical experiments which show that our algorithm, in many cases, has better properties than some algorithms well known in the literature.
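To make Definition 2 concrete, here is a small worked example, added for illustration and not taken from the paper: the one-dimensional function $g(x) = x^2$ satisfies the Lojasiewicz inequality (3) at its unique critical point $\overline{x} = 0$ with exponent $\theta = 1/2$.

```latex
% Worked example for Definition 2: g(x) = x^2, critical point \bar{x} = 0, g(\bar{x}) = 0.
% With \theta = 1/2 and K = 1/2, inequality (3) holds for every x (so any \epsilon > 0 works):
\[
  |g(x) - g(\bar{x})|^{1/2} = |x^2|^{1/2} = |x|
  = \tfrac{1}{2}\,|2x| = \tfrac{1}{2}\,\|\nabla g(x)\| .
\]
% Any flatter exponent \theta \in (1/2, 1) also works near \bar{x},
% since |x|^{2\theta} \le |x| whenever |x| \le 1.
```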

4 2 Convergence results We start to investigate the convergence of the proposed algorithm by showing that H is decreasing along certain sequences built upon the iterates generated by (). Theorem 3 Assume that g is bounded from below and consider the sequences (x n ) n N, (y n ) n N generated by the numerical scheme () together with u n = where δ n will be explicitly specified later. Then, there exists N N such that δn M (x n x n ) + y n, (6) (i) The sequence (H(y n, u n )) n N = ( g(y n ) + δ n x n x n 2) n N is decreasing and δ n > 0 for all n N. (ii) The sequence (H(y n, u n )) n N is convergent; (iii) n x n x n 2 < +. Remark 4 An interesting fact is that for the sequence (H(y n, u n )) n N to be decreasing one does not need the boundedness of the objective function g, but only its regularity, as can bee seen below, in the proof of Theorem 3. The energy decay is thus a structural property of the algorithm (). Proof. By applying the descent Lemma to g(y n+ ) we have g(y n+ ) g(y n ) + g(y n ), y n+ y n + L g 2 y n+ y n 2. However, after rewriting the first equation in () as g(y n ) = (y n x n+ ) and taking the inner product with y n+ y n to obtain the descent inequality becomes g(y n ), y n+ y n = y n x n+, y n+ y n, g(y n+ ) L g 2 y n+ y n 2 g(y n ) + y n x n+, y n+ y n. (7) Further, y n x n+, y n+ y n = y n+ y n 2 + y n+ x n+, y n+ y n, y n+ y n = ( + α n+ )(x n+ x n ) α n (x n x n ) and hence y n+ x n+ = α n+ ((x n+ x n ), ( g(y n+ ) + L ) g y n+ y n 2 g(y n ) + α n+ x n+ x n, y n+ y n. (8) 2 Thus we have y n+ y n 2 = ( + α n+ )(x n+ x n ) α n (x n x n ) 2 = 4

5 and and ( + α n+ ) 2 x n+ x n 2 + α 2 n x n x n 2 2α n ( + α n+ ) x n+ x n, x n x n, x n+ x n, y n+ y n = x n+ x n, ( + α n+ )(x n+ x n ) α n (x n x n ) = Replacing the above equalities in (8) gives ( + α n+ ) x n+ x n 2 α n x n+ x n, x n x n. g(y n+ ) + (2 L g )( + α n+ ) 2 2α n+ ( + α n+ ) x n+ x n 2 2 g(y n ) (2 L g )α 2 n 2 x n x n 2 + (2 L g )α n ( + α n+ ) α n α n+ x n+ x n, x n x n. The above inequality motivates the introduction of the following notations A n = (2 L g )( + α n+ ) 2 2α n+ ( + α n+ ) 2, B n = (2 L g )α 2 n 2, C n = (2 L g )α n ( + α n+ ) α n α n+ 2 δ n := A n + C n and n = A n + B n + C n + C n (9) for all n N, n. In terms of these notations, after using the equality we can write 2 x n+ x n, x n x n = x n+ x n 2 x n+ x n 2 x n x n 2, C n x n+ x n 2 + g(y n+ ) + (A n + C n ) x n+ x n 2 g(y n ) + ( C n B n ) x n x n 2. (0) Consequently, we have C n x n+ x n 2 + n x n x n 2 (g(y n ) + δ n x n x n 2 ) (g(y n+ ) + δ n+ x n+ x n 2 ). () ( ) Now, since α n α, β as n + and α , 0, 0 < β < 4α2 +0α+2 L g(2α+) we have 2 lim A n = (2 βl g)(α + ) 2 + 2α 2α 2, n + 2β lim B n = (2 βl g)α 2, n + 2β lim C n = (2 βl g)α( + α) α 2 < 0, n + 2β lim n = (2 βl g)(2α + ) 2 + 2α 4α 2 > 0, n + 2β lim δ n = (2 βl g)(2α 2 + 3α + ) + 2α 3α 2 > 0. n + 2β 5

6 Hence, there exists N N and C > 0, D > 0 such that for all n N one has C n C, n D and δ n > 0 which, in the view of (), shows (i), that is, the sequence g(y n )+δ n x n x n 2 is decreasing for n N. By using () again, we obtain 0 < C x n+ x n 2 + D x n x n 2 (g(y n ) + δ n x n x n 2 ) (g(y n+ ) + δ n+ x n+ x n 2 ), for all n N, or, more convenient, that 0 < D x n x n 2 (g(y n ) + δ n x n x n 2 ) (g(y n+ ) + δ n+ x n+ x n 2 ), (2) for all n N. Let r > N. Summing up the latter relations gives D which leads to r x n x n 2 (g(y N ) + δ N x N x N 2 ) (g(y r+ ) + δ r+ x r+ x r 2 ) n=n g(y r+ ) + D r x n x n 2 g(y N ) + δ N x N x N 2. (3) n=n Now, taking into account that g is bounded from below, after letting r + we obtain which proves (iii). This also shows that hence x n x n 2 < + n=n lim x n x n 2 = 0, n + lim δ n x n x n 2 = 0. n + But then, using again the fact that g is bounded from below we have that the sequence g(y n ) + δ n x n x n 2 is bounded from below and also decreasing (see (i)) for n N, hence there exists lim g(y n) + δ n x n x n 2 R. n + Remark 5 Observe that conclusion (iii) in the hypotheses of Theorem 3 assures that the sequence (x n x n ) n N l 2, in particular that lim (x n x n ) = 0. (4) n + Lemma 6 In the framework of the optimization problem (P) assume that the objective function g is bounded from below and consider the sequences (x n ) n N and (y n ) n N generated by the numerical algorithm () and let (u n ) n N defined by (6). Then, the following statements are valid : (i) the sets of cluster points of (x n ) n N, (y n ) n N and (u n ) n N coincide and are contained in the set o critical points of g, i.e., ω((x n ) n N ) = ω((y n ) n N ) = ω((u n ) n N ) crit(g); 6

7 (ii) ω((y n, u n ) n N ) crit(h) = {(x, x) R m R m : x crit(g)}. Proof. (i) We start by proving ω((x n ) n N ) ω((u n ) n N ) and ω((x n ) n N ) ω((y n ) n N ). Bearing in mind that lim (x n x n ) = 0 and that the sequences (δ n ) n N, (α n ) n N and ( ) n N are convergent, n the conclusion is quite straightforward. Indeed, if x ω((x n ) n N ) and (x nk ) k N is a subsequence such that lim x n k = x then k and lim y n k = k lim u n k = k lim x n k + lim α n k lim (x n k x nk ) k k k lim δ n k lim (x n k x nk ) + k k lim k y n k imply that the sequences (x nk ) k N, (y nk ) k N and (u nk ) k N all converge to the same element x R m. In order to prove that ω((x n ) n N ) crit(g) we use the fact that g is a continuous operator. So, passing to the limit in g(y nk ) = (y nk x nk +) gives k g(x) = lim n k ) k = lim lim n k x nk +) n k k k and finally, as y nk x nk + = (x nk x nk +) + α nk (x nk x nk ), to g(x) = 0. The reverse inclusions follow in a very similar manner from the definitions of u n and y n. For proving the statement (ii), we rely on a direct computation yielding H(x, y) = ( g (x) + 2M (x y), 2M (y x)), (5) which, in turn, gives crit(h) = { (x, x) R m R m } : x crit(g) and allows us to apply (i) to obtain the desired conclusion. Some direct consequences of Theorem 3 (ii) and Lemma 6 are the following. Fact 7 In the setting of Lemma 6, let (x, x) ω((y n, u n ) n N ). It follows that x crit(g) and lim H(y n, u n ) = H(x, x). n Consequently, H is finite and constant on the set ω((y n, u n ) n N ). The arguments behind the proofs of the following facts are the same as those in Lemma 4 from [5]. Fact 8 If the assumptions from Lemma 6 hold true and if the sequence (x n ) n N is bounded, then the following conclusions hold up : (i) ω((y n, u n ) n N ) is nonempty and compact, (ii) lim dist((y n, u n ), ω((y n, u n ) n N )) = 0. n + 7

8 Remark 9 We emphasize that if g is coercive, that is lim x + g(x) = +, then g is bounded from below and (x n ) n N, (y n ) n N, the sequences generated by (), are bounded. Indeed, notice that g is bounded from below, being a continuous and coercive function (see [2]). Note that according to Theorem 3 the sequence D r n=n x n x n 2 is convergent hence is bounded. Consequently, from (3) it follows that y r is contained for every r > N, (N is defined in the hypothesis of Theorem 3), in a lower level set of g, which is bounded. Since (y n ) n N is bounded, taking into account (4), it follows that (x n ) n N is also bounded. Now, based on the conclusions of Lemma 6, we present a result which will be crucial latter on. For our next result, will denote the -norm and 2 will represent the 2-norm on the linear space R m R m. Lemma 0 Let H, (x n ) n N, (y n ) n N and (u n ) n N be as in all the previous results, with the mapping g bounded from below. Then, the following gradient inequalities hold true [ H(y n, u n ) 2 H(y n, u n ) (αn ) ] δn x n+ x n + + 4M x n x n (6) M and H(y n, u n ) β 2 n ( x n+ x n α n 2M ) 2 δn + 2δ n M x n x n 2. (7) M Proof. First of all note that from our numerical scheme () we have g(y n ) = ((x n x n+ )+α n (x n x n )). In terms of the on R m R m, we have H(y n, u n ) = ( g(y n ) + 2M (y n u n ), 2M (u n y n )) = g(y n ) + 2M (y n u n ) + 2M u n y n x n+ x n + α n x n x n + 4M which proves the desired inequality. Now, with respect to the Euclidean norm, similar arguments yield δn M x n x n, H(y n, u n ) 2 2 = g(y n ) + 2M(y n u n ) 2 + 2M(u n y n ) 2 δn ( ) 2 = g(y n ) 2M M (x n x n ) 2 + 4M 2 δn x n x n 2 M ( 2 ) βn 2 x n+ x n 2 α n δn M (x n x n ) + 4Mδ M n x n x n 2, completing the proof. Our main result concerning the convergence of the sequence (x n ) n N generated by the algorithm () to a critical point of the objective function g is the following. Theorem Consider the sequences (x) n N, (y n ) n N generated by the algorithm () and let the objective function g be bounded from below. If the sequence (x n ) n N is bounded and H is a KL function, then x n x n < + (8) n= and there exists an element x crit(g) such that lim x n = x. n + 8

9 Proof. Consider ( x, x) from the set ω((y n, u n ) n N ) under the assumptions of Lemma (6). It follows that x crit(g). Also, using Fact (7), we get that lim H(y n, u n ) = H( x, x). Furthermore, we consider n two cases: I. By using N from Theorem (3), assume that there exists n N, with n N, such that H(y n, u n ) = H( x, x). Then, since (H(y n, u n )) n N is a decreasing sequence, it follows that H(y n, u n ) = H(x, x), for every n n, Now, using (2) we get that for every n n we have the following inequality: 0 D x n x n 2 H(y n, u n ) H(y n+, u n+ ) = H( x, x) H( x, x) = 0. So, the sequence (x n ) n n is constant and the conclusion holds true. II. Now, we deal with the case when H(y n, u n ) > H(x, x), for every n N. So, consider the set Ω := ω((y n, u n ) n N ). From Fact (8) we have that the set Ω is nonempty and compact. Also, Fact (7) assures that mapping H is constant on Ω. From the hypotheses of the theorem we have that H is a KL function. So, according to Lemma 2, there exists ε > 0, η > 0 and a function φ Θ η, such that for all the points (z, w) from the set {(z, w) R m R m : dist((z, w), Ω) < ε} {(z, w) R m R m : H( x, x) < H(z, w) < η + H( x, x)} one has that On the other hand, using Fact (8), we obtain that exists an index n N, for which it is valid Because and since φ (H(z, w) H( x, x)) H(z, w). lim dist((y n, u n ), Ω) = 0. This means that there n + dist((y n, u n ), Ω) < ε, for all n n. lim H(y n, u n ) = H( x, x) n + H(y n, u n ) > H( x, x), then there exists another index n 2 N, such that for all n N H( x, x) < H(y n, u n ) < H( x, x) + η, for every n n 2. Taking n := max(n, n 2 ) we get that for each n n it follows Since the function φ is concave, we have φ (H(y n, u n ) H( x, x)) H(y n, u n ). φ(h(y n, u n ) H( x, x)) φ(h(y n+, u n+ ) H( x, x)) φ (H(y n, u n ) H( x, x)) (H(y n, u n ) H(y n+, u n+ )). Thus, the following relation takes place for each n n: φ(h(y n, u n ) H( x, x)) φ(h(y n+, u n+ ) H( x, x)) H(y n, u n ) H(y n+, u n+ ). H(y n, u n ) 9

10 On one hand, combining the inequality (2) and (6), it follows that for every n n φ(h(y n, u n ) H( x, x)) φ(h(y n+, u n+ ) H( x, x)) x n x n+ + D x n x n [ 2 α n + 4M δn M ] x n x n. (9) On the other hand, we know that the sequences (α n ) n N, ( ) n N and (δ n ) n N are convergent, and ( ) ( ) α n δn lim n + = β > 0, hence and + 4M are bounded. This shows that there M n N exists N N, N n and there exists M > 0, such that { α n, + 4M Thus, the inequality (9) becomes sup n N δn M } n N M. φ(h(y n, u n ) H( x, x)) φ(h(y n+, u n+ ) H( x, x)) D x n x n 2 M ( x n x n+ + x n x n ), (20) for every n N. This implies that for each n N, the following inequality holds : M x n x n D (φ(h(y n, u n ) H( x, x)) φ(h(y n+, u n+ ) H( x, x))) ( x n x n+ + x n x n ). From the well-known arithmetical-geometrical inequality, it follows that M D (φ(h(y n, u n ) H( x, x)) φ(h(y n+, u n+ ) H( x, x))) ( x n x n+ + x n x n ) x n+ x n + x n x n 3 Therefore, we obtain x n+ x n + x n x n 3 Consequently, we have + 3M 4D (φ(h(y n, u n ) H( x, x)) φ(h(y n+, u n+ ) H( x, x))). x n x n + 3M 4D (φ(h(y n, u n ) H( x, x)) φ(h(y n+, u n+ ) H( x, x))). 2 x n x n x n x n+ (2) 9M 4D (φ(h(y n, u n ) H( x, x)) φ(h(y n+, u n+ ) H( x, x))), for every n N, with n N. Now, by summing up the latter inequality from N to P N, we get that P x n x n x P + x P x N x N n= N + 9M 4D (φ(h(y N, u N ) H( x, x)) φ(h(y P +, u P + ) H( x, x))). 0

11 Now, it is time to use the fact that φ(0) = 0. In this setting, by letting P + and by using (4) we obtain It implies that x n x n x N x N + 9M φ(h(y N, u N ) H( x, x)) < +. 4D n= N x n x n < +, n= so the first part of the proof is done. At the same time, the sequence (S n ) n N, defined by S n = n x i x i i= is Cauchy. Thus, for every ε > 0, there exists a positive integer number N ε, such that for each n N ε and for all p N, one has S n+p S n ε. Furthermore, S n+p S n = n+p i=n+ x i x i n+p i=n+ (x i x i ) = x n+p x n. So, the sequence (x n ) n N is Cauchy, hence is convergent, i.e. there exists x R m, such that Thus, by using (i) of Lemma (6), it follows that lim x n = x. n + {x} = ω((x n ) n N ) crit(g), which leads to the conclusion of the second part of the present theorem. Remark 2 Since the class of semi-algebraic functions is closed under addition (see for example [6]) and (x, y) M x y 2 is semi-algebraic, the conclusion of the previous theorem holds if the condition H is a KL function is replaced by the assumption that g is semi-algebraic. Note that, according to Remark 9, the conclusion of Theorem remains valid if we replace in its hypotheses the conditions that g is bounded from below and (x n ) n N is bounded by the condition that g is coercive. Note that under the assumptions of Theorem we have lim n + y n = x and 3 Convergence rates lim g(x n) = lim g(y n) = g(x). n + n + In the following theorems we provide convergence rates for the sequence generated by (), but also for the function values, in terms of the Lojasiewicz exponent of H (see, also, [, 7, 5]). Note that the forthcoming results remain valid if one replace in their hypotheses the conditions that g is bounded from below and (x n ) n N is bounded by the condition that g is coercive.

12 Theorem 3 In the settings of problem (P) consider the sequences (x n ) n N, (y n ) n N generated by Algorithm (). Assume that g is bounded from below and that (x n ) n N is bounded, let x crit(g) be such that lim n + x n = x and suppose that H : R m R m R, H(x, y) = g(x) + M x y 2 fulfills the Lojasiewicz property at (x, x) crit H with Lojasiewicz exponent θ ( 0, 2]. Then, for every p > 0 there exist a, a 2, a 3, a 4 > 0 and k N such that the following statements hold true: (a ) g(y n ) g(x) a n p for every n k, (a 2 ) g(x n ) g(x) a 2 n p for every n k, (a 3 ) x n x a 3 n p 2 (a 4 ) y n x a 4 n p 2 for every n k, for all n k. Proof. We start by employing the ideas from the proof of Theorem 3, namely if there exists n N, with n N, for which one has that H(y n, u n ) = H(x, x), then it follows that the sequence (x n ) n n is constant. This leads to the fact that the sequence (y n ) n n is also constant. Furthermore, H(y n, u n ) = H(x, x) for all n n. That is, if the regularized energy is constant after a certain number of iterations, one can see that the conclusion follow in a straightforward way. Now, we can easily assume that In order to simplify notations we will use H(y n, u n ) > H(x, x) for all n N. H n := H(y n, u n ), H := H(x, x) and H n := H(y n, u n ). Our analysis aims at deriving a difference inequality for r n := H n H = H(y n, u n ) H(x, x) > 0 based on three previously established fundamental relations: a) the energy decay relation (2), for every n N H n H n+ D x n x n 2, b) the energy-gradient estimate (7), for every n N H n 2 2 βn 2 x n+ x n 2 + S n x n x n 2 where ( S n = 2 α n 2M ) 2 δn + 2δ n M, M 2

13 c) the Lojasiewicz inequality (3), for every n N (H n H) 2θ K 2 H n 2, where N is defined by the Lojasiewitz property of H at (x, x) such that for ϵ > 0 one has (y n, u n ) (x, x) < ϵ for all n N. By combining these three inequalities, one reaches (H n H) 2θ 2K2 β 2 nd (H n+ H n+2 ) + K2 S n D (H n H n+ ) and we are led to a nonlinear second order difference inequality r 2θ n 2K2 β 2 nd (r n+ r n+2 ) + K2 S n D (r n r n+ ), (22) for every n N. Using the fact that the positive sequence (r n ) n N is decreasing and converging to zero we have that r n+ r n and that r n < for n large enough. These observations lead, in the view θ (0, /2], to the existence of N 2 N, N 2 N such that rn 2θ θ-independent difference inequality for every n N 2, with the shorthand notations r 2θ n+ r n+ for all n N 2 and (22) reduces to the linear, r n µ n r n+2 + ( + λ n µ n )r n+, (23) λ n := D K 2 and µ n := 2 S n βns 2. n In order to derive our first convergence rate we proceed as follows: for any p > 0 we consider the diverging sequence (ξ n ) n N 2, ξ n +, defined by Since ξ n := lim ( + λ n µ n ) µ n+ n ξ n+ µ n n p (n + ) p n p. µ n + µ n ξ n + = lim λ n > 0, n there exists a large enough index k N, k N 2, such that for each n k, one has + λ n µ n µ n+ µ n ξ n+ + µ + (24) n ξ n Combining the above inequality with the difference inequality for r n yields after some computations r n + µ ( n + µ r n n+ + µ ) n+ µ n r n+ + ξ n+ + µ r n+ n+2 ξ n ξ n+ 3

14 for each n k, and after multiplying these inequalities from k to n and simplifying we have µ r k + r k k+ + µ n k + µ µ n r i+ n+ + r n+2 + µ. (25) n+ ξ i=k k ξ i+ ξ n+ ) p, which implies Now, using the fact that µ n+ ξ n+ = we obtain µ k where a = r k + r k+ + µ k ξk a ( n i=k ( n + 2 n + n n + µ = i+ ξ i=k i+ finally yields the desired estimate ( ) i + p = i + 2 ( ) p k +, n + 2 ( ) p µ n a r n+ + r n+2 n µ n+ ξ n+ (k + ) p. Now, from the definition of r n ) p a ( n + ) p r n = g(y n ) g( x) + δ n x n x n 2, g(y n ) g( x) r n a n p. (26) In order to give an upper bound for the difference g(x n ) g( x), we consider the following chain of inequalities based upon Lemma : g(x n ) g(y n ) g(y n ), x n y n + L g 2 x n y n 2 = (y n x n+ ), α n (x n x n ) + L g 2 x n y n 2 = x n+ x n, α n (x n x n ) α 2 2 L g n x n x n 2. 2 Here, using the inequality x n+ x n, α n (x n x n ) [ ] x n+ x n 2 + (2 L g )α L n x n x n 2, g leads, after some simplifications, to g(x n ) g(y n ) 2 (2 L g ) x n+ x n 2. By combining the inequality (2) with the fact that the sequence (g(y n )+δ n x n x n 2 ) n k is decreasing and converges to g(x), one obtains g(x n ) g(y n ) 2D (2 L g ) r n+, for all n k. (27) 4

15 From (26) we have r n+ a (n + ) p a n p, hence g(x n ) g(y n ) 2D (2 L g ) a, for all n k. np This means that for every n > k one has [ ] g(x n ) g( x) = (g(x n ) g(y n )) + (g(y n ) g(x)) a 2D (2 L g ) + n p. Since the sequence ( ) n N is convergent to β > 0, we can choose and we have a 2 = a sup n k 2D (2 L g ) + a g(x n ) g( x) a 2, for every n k. (28) np We continue the proof by establishing an estimate for x n x. By the triangle inequality and by summing up (2) from n k to P > n one has x P x n so, letting P gives P x k x k k=n x n x n + x P + x P + 9M [ φ(hn H) φ(h P + H) ], 4D x n x 9M 4D φ(h n H). Recall, however, that the desingularizing function is φ(t) = K θ t θ hence, x n x M r θ n, (29) with M = 9M K 4D( θ). Further, due to 0 < r n < combined with rn θ r n which holds for θ (0, /2], we conclude that where a 3 := a M. x n x M rn+ M rn a 3 ( n) p/2, for every n k, (30) Finally, we conclude this part of the proof by deducing an upper bound for y n x. The following inequalities hold true y n x = x n + α n (x n x n ) x + α n x n x + α n x n x ( + α n )a 3 + α n p n a 3 n p ( + 2 α n ) a 3 n p. 5

16 Let a 4 = sup n k ( + 2 α n )a 3 > 0. Then, y n x a 4, for all n k. (3) n p In case the Lojasiewicz exponent of the regularization function H is θ ( 2, ) we have the following result concerning the convergence rates of the sequences generated by (). Theorem 4 In the settings of problem (P) consider the sequences (x n ) n N, (y n ) n N generated by Algorithm (). Assume that g is bounded from below and that (x n ) n N is bounded, let x crit(g) be such that lim n + x n = x and suppose that H : R m R m R, H(x, y) = g(x) + M x y 2 fulfills the Lojasiewicz property at (x, x) crit H with Lojasiewicz exponent θ ( 2, ). Then, there exist b, b 2, b 3, b 4 > 0 such that the following statements hold true: (b ) g(y n ) g(x) b, for all n N n 2θ + 2; (b 2 ) g(x n ) g(x) b 2, for all n N n 2θ + 2; (b 3 ) x n x b 3, for all n N n 2θ θ + 2; (b 4 ) y n x b 4, for all n > N n 2θ θ + 2, where N N was defined in the proof of Theorem 3. Proof. To avoid triviality also here we assume that H n > H for all n N. We return to (22) which, after using r n+ r n, we rewrite as for all n N. Now, considering the real-valued function ϕ(t) = has that and by similar arguments (r n r n+ )r 2θ n+ + µ n(r n+ r n+2 )r 2θ n+ λ n, (32) ϕ(r n+ ) ϕ(r n ) = Assume that for some n N one has rn+ r n K 2θ t 2θ whose derivative is ϕ (t) = Kt 2θ, one ϕ (t)dt K(r n r n+ )r 2θ n ϕ(r n+2 ) ϕ(r n+ ) K(r n+ r n+2 )r 2θ n+. r 2θ n This leads to the following chain of inequalities : 2 r 2θ n+. ϕ(r n+ ) ϕ(r n ) + µ n (ϕ (r n+2 ) ϕ (r n+ )) K 2 (r n r n+ )r 2θ n+ + Kµ n(r n+ r n+2 )r 2θ n+ K 2 (r n r n+ )r 2θ n+ + K 2 µ n(r n+ r n+2 )r 2θ n+ (33) K 2 λ n. 6

17 On the other hand, if 2rn 2θ < rn+ 2θ, for some n N, then we obtain 2 2θ 2θ r 2θ n < rn+ 2θ. As a consequence, it follows that ϕ(r n+ ) ϕ(r n ) = K 2θ (r 2θ n+ r 2θ n ) K ) (2 2θ 2θ rn 2θ K ) (2 2θ 2θ r 2θ = C 2θ 2θ N. This implies that Now, let us denote ϕ(r n+ ) ϕ(r n ) + µ n (ϕ(r n+2 ) ϕ(r n+ )) C ( + µ n ). C ( + µ n ) inf n N λ n by C 2. Then ϕ(r n+ ) ϕ(r n ) + µ n (ϕ(r n+2 ) ϕ(r n+ )) C 2 λ n. (34) From the inequalities (33) and (34), it follows that there exist C > 0, such that ϕ(r n+ ) ϕ(r n ) + µ n (ϕ(r n+2 ) ϕ(r n+ )) Cλ n, for every n N. By taking µ = sup n N µ n and summing over the indices we reach Consequently n However, recall that both (r n ) n N and consequently for all n N + 2. Therefore, k=n ((ϕ(r k+ ) ϕ(r k )) + µ(ϕ(r k+2 ) ϕ(r k+ ))) C r n ϕ(r n+ ) ϕ(r N ) + µ(ϕ(r n+2 ) ϕ(r N k+ )) C n k=n λ k. n k=n λ k. and the function ϕ are decreasing, so we have ( + µ)ϕ(r n+2 ) C rn 2θ C(2θ ) K( + µ) ( ) C(2θ ) n 2 2θ λ k K( + µ) k=n n k=n λ k n 2 k=n λ k, 2θ for all n N + 2. The desired convergence rate, now follows from r n = g(y n ) g(x) + δ n x n x n 2 and n 2 k=n λ k λ(n N ), where 0 < λ = inf k N λ k. In fact, we have n 2 k=n λ k 2θ λ 2θ (n N ) 2θ λ 2θ Mn 2θ 7

for some $M_1 > 0$ and for all $n \ge N_1 + 2$. Let $b_1 = \left(\frac{C(2\theta - 1)}{K(1 + \mu)}\right)^{\frac{-1}{2\theta - 1}} \lambda^{\frac{-1}{2\theta - 1}} M_1$. Then,
\[
g(y_n) - g(\overline{x}) \le \frac{b_1}{n^{\frac{1}{2\theta - 1}}}, \qquad \text{for all } n \ge N_1 + 2.
\]
The other claims now follow quite easily. Indeed, note that (27) holds for every $n \ge N_1$, hence
\[
g(x_n) - g(y_n) \le \frac{1}{2D\beta_n(2 - \beta_n L_g)}\, r_{n+1}.
\]
Therefore, one obtains
\[
g(x_n) - g(\overline{x}) = (g(x_n) - g(y_n)) + (g(y_n) - g(\overline{x})) \le \left(\frac{b_1}{2D\beta_n(2 - \beta_n L_g)} + b_1\right) \frac{1}{n^{\frac{1}{2\theta - 1}}} = \frac{b_2}{n^{\frac{1}{2\theta - 1}}}
\]
for every $n \ge N_1 + 2$. For proving (b$_3$), we use (29) again, and we have that for all $n \ge N_1 + 2$ it holds
\[
\|x_n - \overline{x}\| \le \overline{M}\, r_n^{1 - \theta} \le \overline{M} \left(\frac{b_1}{n^{\frac{1}{2\theta - 1}}}\right)^{1 - \theta} = \frac{b_3}{n^{\frac{1 - \theta}{2\theta - 1}}}.
\]
The final estimate is a straightforward consequence of the definition of $y_n$ and the above estimates. Indeed, for all $n > N_1 + 2$ one has
\[
\|y_n - \overline{x}\| = \|x_n + \alpha_n(x_n - x_{n-1}) - \overline{x}\| \le (1 + |\alpha_n|)\|x_n - \overline{x}\| + |\alpha_n|\,\|x_{n-1} - \overline{x}\| \le (1 + 2|\alpha_n|)\, \frac{b_3}{(n-1)^{\frac{1 - \theta}{2\theta - 1}}}.
\]
Let $b_4 = \sup_{n \ge N_1 + 3} (1 + 2|\alpha_n|)\, b_3 \left(\frac{n}{n-1}\right)^{\frac{1 - \theta}{2\theta - 1}} > 0$. Then,
\[
\|y_n - \overline{x}\| \le \frac{b_4}{n^{\frac{1 - \theta}{2\theta - 1}}}, \qquad \text{for all } n > N_1 + 2. \tag{35}
\]
Remark 5 According to [16], $H$ is KL with Lojasiewicz exponent $\theta \in [1/2, 1)$ whenever $g$ is KL with Lojasiewicz exponent $\theta \in [1/2, 1)$. Therefore, in the hypotheses of Theorem 3 and Theorem 4 it is enough to assume that $g$ is KL with Lojasiewicz exponent $\theta \in [1/2, 1)$.

4 Numerical experiments

The aim of this section is to support the analytic results of the previous sections by numerical experiments and to highlight some interesting features of the generic algorithm (1). To this end, let us consider

Algorithm 1
\[
x_{n+1} = y_n - s\nabla g(y_n), \qquad y_n = x_n - \frac{0.1\, n}{n + 3}(x_n - x_{n-1}), \tag{36}
\]
which is a specific version of the generic algorithm (1) with a constant, small enough step-size and an always negative inertial coefficient, i.e., $\beta_n = s$ and $\alpha_n = -\frac{0.1\, n}{n+3}$, with
\[
\lim_{n \to +\infty} \alpha_n = -\frac{1}{10} \in \left(\frac{-10 + \sqrt{68}}{8},\, 0\right).
\]
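The following short sketch, added here for illustration and not the authors' original code, implements Algorithm 1 as written in (36); the test function, step size and iteration count are assumptions chosen only for demonstration.

```python
import numpy as np

def algorithm_1(grad, x0, s=0.1, n_iter=300):
    """Sketch of Algorithm 1 (36): constant step-size s and the always negative
    inertial coefficient alpha_n = -0.1 * n / (n + 3)."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()                       # x_0 = y_0
    history = [x.copy()]
    for n in range(1, n_iter + 1):
        alpha_n = -0.1 * n / (n + 3)        # backward inertial coefficient, tends to -1/10
        y = x + alpha_n * (x - x_prev)
        x_prev, x = x, y - s * grad(y)
        history.append(x.copy())
    return np.array(history)

# Illustration on the quadratic g(x) = x^2 from Section 4 (gradient 2x), started at x_0 = 3.
iterates = algorithm_1(lambda x: 2.0 * x, np.array([3.0]), s=0.1)
print(iterates[-1])                         # should be close to the minimizer x* = 0
```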

In order to give a better perspective on its advantages and disadvantages, we compare the proposed algorithm with: a) the standard Nesterov algorithm, in the case of convex objective functions; b) a Nesterov-like algorithm (see [15]), in the nonconvex case. In this respect, it may be useful to recall that until recently little was known about the efficiency of Nesterov's accelerated gradient method outside a convex setting. However, in [15], a Nesterov-like method, differing from the original only by a multiplicative coefficient, has been studied and convergence rates have been provided for the very general case of nonconvex coercive objective functions that have the KL property. The algorithms considered in comparison to Algorithm 1 are:

A form of Nesterov's accelerated gradient method
\[
x_{n+1} = y_n - s\nabla g(y_n), \qquad y_n = x_n + \frac{n}{n + 3}(x_n - x_{n-1}), \tag{37}
\]
A Nesterov-like algorithm for nonconvex optimization
\[
x_{n+1} = y_n - s\nabla g(y_n), \qquad y_n = x_n + \frac{\gamma\, n}{n + 3}(x_n - x_{n-1}), \tag{38}
\]
with $\gamma \in (0, 1)$.

We start our experiments by testing Algorithm 1 in the simplest case of a one-dimensional quadratic function $g(x) = x^2$. Here, the comparison between Algorithm 1 and Nesterov's accelerated gradient method (see [18]), with both methods sharing the same step-size $s = 0.1$, shows that, although initially slower, at some point Algorithm 1 surpasses Nesterov's method, as can be observed in Fig. 1 a). The same behaviour can also be observed in a higher-dimensional setting, e.g. for the two-dimensional quadratic function $g(x_1, x_2) = 0.02 x_1^2 + x_2^2$, with the decay of both the energy error and the error in terms of iterates presented in Fig. 1 c) and d), respectively. The difference of order between the decay in terms of the objective function and that of the norm of the iterates proved in Theorem 3 is confirmed by these two pictures. On the other hand, Fig. 1 b) shows a comparison between the trajectories generated by the two algorithms in the $x_1 x_2$-plane.

A second set of numerical experiments is related to the minimization of the nonconvex, coercive function $g : \mathbb{R} \to \mathbb{R}$, $g(x) = \ln(1 + (x^2 - 1)^2)$, depicted in Fig. 2 a). When the starting point lies within the convexity domain of $g$ (e.g., $x_0 = y_0 = 3$, in Fig. 2 b)), Algorithm 1 converges faster than the Nesterov-like algorithm after a certain number of iterates. However, outside the convexity domain of $g$, e.g., for $x_0 = y_0 = 0.01$, the Nesterov-like algorithm clearly outperforms Algorithm 1 (see Fig. 2 c)). Nevertheless, the very general structure of the generic algorithm (1) allows for much flexibility, as only the limit of the sequence $(\alpha_n)$ is prescribed. So, one can profit by switching between coefficients of different nature, as is the case with the following modified version of Algorithm 1 (a numerical sketch is given below):

Algorithm 2
\[
x_{n+1} = y_n - s\nabla g(y_n), \qquad y_n = x_n + \left((1 - \lambda_n)\left(-\frac{0.1\, n}{n + 3}\right) + \lambda_n\, \frac{n}{n + 3}\right)(x_n - x_{n-1}), \tag{39}
\]
where $\lambda_n = \dfrac{1}{1 + \exp(5n - 20)}$, which first copies the Nesterov-like algorithm and then, after a number of iterates, switches to Algorithm 1, thus outperforming the Nesterov-like algorithm (see Fig. 2 d)).
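As an illustration of the schemes (37) and (38) and of the coefficient-switching idea behind Algorithm 2, here is a short, hedged sketch: it is not the authors' code, and the sigmoidal switch, the value of $\gamma$, the step size and the test setup are assumptions chosen only for demonstration.

```python
import numpy as np

def inertial_gradient(grad, x0, coeff, s=0.1, n_iter=300):
    """Generic inertial scheme: y_n = x_n + coeff(n) * (x_n - x_{n-1}),
    x_{n+1} = y_n - s * grad(y_n).  coeff(n) = n/(n+3) gives Nesterov (37),
    coeff(n) = gamma*n/(n+3) gives the Nesterov-like scheme (38)."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for n in range(1, n_iter + 1):
        y = x + coeff(n) * (x - x_prev)
        x_prev, x = x, y - s * grad(y)
    return x

# Nonconvex test function of Section 4, g(x) = ln(1 + (x^2 - 1)^2), and its gradient.
def dg(x):
    return 4.0 * x * (x**2 - 1.0) / (1.0 + (x**2 - 1.0)**2)

def nesterov_like(n, gamma=0.5):                      # coefficient of (38), gamma chosen illustratively
    return gamma * n / (n + 3)

def switched(n):                                      # coefficient switch in the spirit of Algorithm 2:
    lam = 1.0 / (1.0 + np.exp(np.clip(5.0 * n - 20.0, -50.0, 50.0)))   # lam goes from ~1 to ~0,
    return lam * nesterov_like(n) + (1.0 - lam) * (-0.1 * n / (n + 3))  # i.e. from (38) to Algorithm 1

print(inertial_gradient(dg, np.array([0.01]), nesterov_like))   # start outside the convexity domain
print(inertial_gradient(dg, np.array([0.01]), switched))
```

The `algorithm_1` sketch given after (36) can be run on the same function and starting points for a side-by-side comparison in the spirit of Fig. 2.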

Figure 1: Minimizing quadratic functions in dimension one (in a)) and two (in b)-d)); comparison of Algorithm 1 with Nesterov's method.

Figure 2: Minimizing the nonconvex function $g(x) = \ln(1 + (x^2 - 1)^2)$ by using different algorithms and different starting points.

References

[1] H. Attouch, J. Bolte, On the convergence of the proximal algorithm for nonsmooth functions involving analytic features, Mathematical Programming 116(1-2) Series B, 5-16, 2009

[2] H. Attouch, J. Bolte, P. Redont, A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Lojasiewicz inequality, Mathematics of Operations Research 35(2), 438-457, 2010

[3] H. Attouch, J. Bolte, B.F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Mathematical Programming 137(1-2) Series A, 91-129, 2013

[4] H. Attouch, Z. Chbani, J. Peypouquet, P. Redont, Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity, Mathematical Programming 168(1-2) Series B, 123-175, 2018

[5] P. Bégout, J. Bolte, M.A. Jendoubi, On damped second-order gradient systems, Journal of Differential Equations 259, 3115-3143, 2015

[6] J. Bolte, S. Sabach, M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Mathematical Programming 146(1-2) Series A, 459-494, 2014

[7] J. Bolte, A. Daniilidis, A. Lewis, The Lojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems, SIAM Journal on Optimization 17(4), 1205-1223, 2006

[8] J. Bolte, A. Daniilidis, A. Lewis, M. Shiota, Clarke subgradients of stratifiable functions, SIAM Journal on Optimization 18(2), 556-572, 2007

[9] J. Bolte, A. Daniilidis, O. Ley, L. Mazet, Characterizations of Lojasiewicz inequalities: subgradient flows, talweg, convexity, Transactions of the American Mathematical Society 362(6), 3319-3363, 2010

[10] R.I. Boţ, E.R. Csetnek, S.C. László, Approaching nonsmooth nonconvex minimization through second-order proximal-gradient dynamical systems, Journal of Evolution Equations 18(3), 1291-1318, 2018

[11] R.I. Boţ, E.R. Csetnek, S.C. László, An inertial forward-backward algorithm for minimizing the sum of two non-convex functions, EURO Journal on Computational Optimization 4(1), 3-25, 2016

[12] R.I. Boţ, E.R. Csetnek, S.C. László, A second order dynamical approach with variable damping to nonconvex smooth minimization, Applicable Analysis, 2018

[13] P. Frankel, G. Garrigos, J. Peypouquet, Splitting methods with variable metric for Kurdyka-Lojasiewicz functions and general convergence rates, Journal of Optimization Theory and Applications 165(3), 874-900, 2015

[14] K. Kurdyka, On gradients of functions definable in o-minimal structures, Annales de l'institut Fourier (Grenoble) 48(3), 769-783, 1998

[15] S.C. László, Convergence rates for an inertial algorithm of gradient type associated to a smooth nonconvex minimization, preprint

[16] G. Li, T.K. Pong, Calculus of the exponent of Kurdyka-Lojasiewicz inequality and its applications to linear convergence of first-order methods, Foundations of Computational Mathematics, 1-34, 2018

[17] S. Lojasiewicz, Une propriété topologique des sous-ensembles analytiques réels, Les Équations aux Dérivées Partielles, Éditions du Centre National de la Recherche Scientifique, Paris, 87-89, 1963

[18] Y.E. Nesterov, A method for solving the convex programming problem with convergence rate O(1/k²), (Russian) Dokl. Akad. Nauk SSSR 269(3), 543-547, 1983

[19] Y. Nesterov, Introductory lectures on convex optimization: a basic course, Kluwer Academic Publishers, Dordrecht, 2004

[20] B.T. Polyak, Some methods of speeding up the convergence of iteration methods, U.S.S.R. Comput. Math. Math. Phys. 4(5), 1-17, 1964

[21] R.T. Rockafellar, R.J.-B. Wets, Variational Analysis, Fundamental Principles of Mathematical Sciences 317, Springer-Verlag, Berlin, 1998

[22] W. Su, S. Boyd, E.J. Candès, A differential equation for modeling Nesterov's accelerated gradient method: theory and insights, Journal of Machine Learning Research 17, 1-43, 2016


More information

INERTIAL ACCELERATED ALGORITHMS FOR SOLVING SPLIT FEASIBILITY PROBLEMS. Yazheng Dang. Jie Sun. Honglei Xu

INERTIAL ACCELERATED ALGORITHMS FOR SOLVING SPLIT FEASIBILITY PROBLEMS. Yazheng Dang. Jie Sun. Honglei Xu Manuscript submitted to AIMS Journals Volume X, Number 0X, XX 200X doi:10.3934/xx.xx.xx.xx pp. X XX INERTIAL ACCELERATED ALGORITHMS FOR SOLVING SPLIT FEASIBILITY PROBLEMS Yazheng Dang School of Management

More information

EXISTENCE OF STRONG SOLUTIONS OF FULLY NONLINEAR ELLIPTIC EQUATIONS

EXISTENCE OF STRONG SOLUTIONS OF FULLY NONLINEAR ELLIPTIC EQUATIONS EXISTENCE OF STRONG SOLUTIONS OF FULLY NONLINEAR ELLIPTIC EQUATIONS Adriana Buică Department of Applied Mathematics Babeş-Bolyai University of Cluj-Napoca, 1 Kogalniceanu str., RO-3400 Romania abuica@math.ubbcluj.ro

More information

A First Order Method for Finding Minimal Norm-Like Solutions of Convex Optimization Problems

A First Order Method for Finding Minimal Norm-Like Solutions of Convex Optimization Problems A First Order Method for Finding Minimal Norm-Like Solutions of Convex Optimization Problems Amir Beck and Shoham Sabach July 6, 2011 Abstract We consider a general class of convex optimization problems

More information

A forward-backward dynamical approach to the minimization of the sum of a nonsmooth convex with a smooth nonconvex function

A forward-backward dynamical approach to the minimization of the sum of a nonsmooth convex with a smooth nonconvex function A forwar-backwar ynamical approach to the minimization of the sum of a nonsmooth convex with a smooth nonconvex function Rau Ioan Boţ Ernö Robert Csetnek February 28, 2017 Abstract. We aress the minimization

More information

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions Angelia Nedić and Asuman Ozdaglar April 16, 2006 Abstract In this paper, we study a unifying framework

More information

A Second-Order Path-Following Algorithm for Unconstrained Convex Optimization

A Second-Order Path-Following Algorithm for Unconstrained Convex Optimization A Second-Order Path-Following Algorithm for Unconstrained Convex Optimization Yinyu Ye Department is Management Science & Engineering and Institute of Computational & Mathematical Engineering Stanford

More information

Second order forward-backward dynamical systems for monotone inclusion problems

Second order forward-backward dynamical systems for monotone inclusion problems Second order forward-backward dynamical systems for monotone inclusion problems Radu Ioan Boţ Ernö Robert Csetnek March 6, 25 Abstract. We begin by considering second order dynamical systems of the from

More information

A convergence result for an Outer Approximation Scheme

A convergence result for an Outer Approximation Scheme A convergence result for an Outer Approximation Scheme R. S. Burachik Engenharia de Sistemas e Computação, COPPE-UFRJ, CP 68511, Rio de Janeiro, RJ, CEP 21941-972, Brazil regi@cos.ufrj.br J. O. Lopes Departamento

More information

Subgradient Method. Ryan Tibshirani Convex Optimization

Subgradient Method. Ryan Tibshirani Convex Optimization Subgradient Method Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last last time: gradient descent min x f(x) for f convex and differentiable, dom(f) = R n. Gradient descent: choose initial

More information

6. Proximal gradient method

6. Proximal gradient method L. Vandenberghe EE236C (Spring 2013-14) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping

More information

ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE. Sangho Kum and Gue Myung Lee. 1. Introduction

ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE. Sangho Kum and Gue Myung Lee. 1. Introduction J. Korean Math. Soc. 38 (2001), No. 3, pp. 683 695 ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE Sangho Kum and Gue Myung Lee Abstract. In this paper we are concerned with theoretical properties

More information

Analysis of the convergence rate for the cyclic projection algorithm applied to semi-algebraic convex sets

Analysis of the convergence rate for the cyclic projection algorithm applied to semi-algebraic convex sets Analysis of the convergence rate for the cyclic projection algorithm applied to semi-algebraic convex sets Liangjin Yao University of Newcastle June 3rd, 2013 The University of Newcastle liangjin.yao@newcastle.edu.au

More information

Convergence of generalized entropy minimizers in sequences of convex problems

Convergence of generalized entropy minimizers in sequences of convex problems Proceedings IEEE ISIT 206, Barcelona, Spain, 2609 263 Convergence of generalized entropy minimizers in sequences of convex problems Imre Csiszár A Rényi Institute of Mathematics Hungarian Academy of Sciences

More information

A double projection method for solving variational inequalities without monotonicity

A double projection method for solving variational inequalities without monotonicity A double projection method for solving variational inequalities without monotonicity Minglu Ye Yiran He Accepted by Computational Optimization and Applications, DOI: 10.1007/s10589-014-9659-7,Apr 05, 2014

More information

Convergence of descent methods for semi-algebraic and tame problems.

Convergence of descent methods for semi-algebraic and tame problems. OPTIMIZATION, GAMES, AND DYNAMICS Institut Henri Poincaré November 28-29, 2011 Convergence of descent methods for semi-algebraic and tame problems. Hedy ATTOUCH Institut de Mathématiques et Modélisation

More information

OPTIMAL CONTROL AND STRANGE TERM FOR A STOKES PROBLEM IN PERFORATED DOMAINS

OPTIMAL CONTROL AND STRANGE TERM FOR A STOKES PROBLEM IN PERFORATED DOMAINS PORTUGALIAE MATHEMATICA Vol. 59 Fasc. 2 2002 Nova Série OPTIMAL CONTROL AND STRANGE TERM FOR A STOKES PROBLEM IN PERFORATED DOMAINS J. Saint Jean Paulin and H. Zoubairi Abstract: We study a problem of

More information

Received: 15 December 2010 / Accepted: 30 July 2011 / Published online: 20 August 2011 Springer and Mathematical Optimization Society 2011

Received: 15 December 2010 / Accepted: 30 July 2011 / Published online: 20 August 2011 Springer and Mathematical Optimization Society 2011 Math. Program., Ser. A (2013) 137:91 129 DOI 10.1007/s10107-011-0484-9 FULL LENGTH PAPER Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward backward splitting,

More information

Remarks on multifunction-based. dynamical systems. Bela Vizvari c. RRR , July, 2001

Remarks on multifunction-based. dynamical systems. Bela Vizvari c. RRR , July, 2001 R u t c o r Research R e p o r t Remarks on multifunction-based dynamical systems Gergely KOV ACS a Bela Vizvari c Marian MURESAN b RRR 43-2001, July, 2001 RUTCOR Rutgers Center for Operations Research

More information

Spring School on Variational Analysis Paseky nad Jizerou, Czech Republic (April 20 24, 2009)

Spring School on Variational Analysis Paseky nad Jizerou, Czech Republic (April 20 24, 2009) Spring School on Variational Analysis Paseky nad Jizerou, Czech Republic (April 20 24, 2009) GRADIENT DYNAMICAL SYSTEMS, TAME OPTIMIZATION AND APPLICATIONS Aris DANIILIDIS Departament de Matemàtiques Universitat

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information

The Boosted DC Algorithm for nonsmooth functions

The Boosted DC Algorithm for nonsmooth functions The Boosted DC Algorithm for nonsmooth functions Francisco J. Aragón Artacho *, Phan T. Vuong December 17, 218 arxiv:1812.67v1 [math.oc] 14 Dec 218 Abstract The Boosted Difference of Convex functions Algorithm

More information

On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems

On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems Radu Ioan Boţ Ernö Robert Csetnek August 5, 014 Abstract. In this paper we analyze the

More information

1 The Observability Canonical Form

1 The Observability Canonical Form NONLINEAR OBSERVERS AND SEPARATION PRINCIPLE 1 The Observability Canonical Form In this Chapter we discuss the design of observers for nonlinear systems modelled by equations of the form ẋ = f(x, u) (1)

More information

Asymptotic Convergence of the Steepest Descent Method for the Exponential Penalty in Linear Programming

Asymptotic Convergence of the Steepest Descent Method for the Exponential Penalty in Linear Programming Journal of Convex Analysis Volume 2 (1995), No.1/2, 145 152 Asymptotic Convergence of the Steepest Descent Method for the Exponential Penalty in Linear Programming R. Cominetti 1 Universidad de Chile,

More information

An introduction to Mathematical Theory of Control

An introduction to Mathematical Theory of Control An introduction to Mathematical Theory of Control Vasile Staicu University of Aveiro UNICA, May 2018 Vasile Staicu (University of Aveiro) An introduction to Mathematical Theory of Control UNICA, May 2018

More information

Observer design for a general class of triangular systems

Observer design for a general class of triangular systems 1st International Symposium on Mathematical Theory of Networks and Systems July 7-11, 014. Observer design for a general class of triangular systems Dimitris Boskos 1 John Tsinias Abstract The paper deals

More information

On the Convergence and O(1/N) Complexity of a Class of Nonlinear Proximal Point Algorithms for Monotonic Variational Inequalities

On the Convergence and O(1/N) Complexity of a Class of Nonlinear Proximal Point Algorithms for Monotonic Variational Inequalities STATISTICS,OPTIMIZATION AND INFORMATION COMPUTING Stat., Optim. Inf. Comput., Vol. 2, June 204, pp 05 3. Published online in International Academic Press (www.iapress.org) On the Convergence and O(/N)

More information

SOME MEASURABILITY AND CONTINUITY PROPERTIES OF ARBITRARY REAL FUNCTIONS

SOME MEASURABILITY AND CONTINUITY PROPERTIES OF ARBITRARY REAL FUNCTIONS LE MATEMATICHE Vol. LVII (2002) Fasc. I, pp. 6382 SOME MEASURABILITY AND CONTINUITY PROPERTIES OF ARBITRARY REAL FUNCTIONS VITTORINO PATA - ALFONSO VILLANI Given an arbitrary real function f, the set D

More information

NONTRIVIAL SOLUTIONS TO INTEGRAL AND DIFFERENTIAL EQUATIONS

NONTRIVIAL SOLUTIONS TO INTEGRAL AND DIFFERENTIAL EQUATIONS Fixed Point Theory, Volume 9, No. 1, 28, 3-16 http://www.math.ubbcluj.ro/ nodeacj/sfptcj.html NONTRIVIAL SOLUTIONS TO INTEGRAL AND DIFFERENTIAL EQUATIONS GIOVANNI ANELLO Department of Mathematics University

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

Helly's Theorem and its Equivalences via Convex Analysis

Helly's Theorem and its Equivalences via Convex Analysis Portland State University PDXScholar University Honors Theses University Honors College 2014 Helly's Theorem and its Equivalences via Convex Analysis Adam Robinson Portland State University Let us know

More information

l(y j ) = 0 for all y j (1)

l(y j ) = 0 for all y j (1) Problem 1. The closed linear span of a subset {y j } of a normed vector space is defined as the intersection of all closed subspaces containing all y j and thus the smallest such subspace. 1 Show that

More information

McMaster University. Advanced Optimization Laboratory. Title: A Proximal Method for Identifying Active Manifolds. Authors: Warren L.

McMaster University. Advanced Optimization Laboratory. Title: A Proximal Method for Identifying Active Manifolds. Authors: Warren L. McMaster University Advanced Optimization Laboratory Title: A Proximal Method for Identifying Active Manifolds Authors: Warren L. Hare AdvOl-Report No. 2006/07 April 2006, Hamilton, Ontario, Canada A Proximal

More information

APPLICATIONS OF DIFFERENTIABILITY IN R n.

APPLICATIONS OF DIFFERENTIABILITY IN R n. APPLICATIONS OF DIFFERENTIABILITY IN R n. MATANIA BEN-ARTZI April 2015 Functions here are defined on a subset T R n and take values in R m, where m can be smaller, equal or greater than n. The (open) ball

More information

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 30 Notation f : H R { } is a closed proper convex function domf := {x R n

More information

A proximal minimization algorithm for structured nonconvex and nonsmooth problems

A proximal minimization algorithm for structured nonconvex and nonsmooth problems A proximal minimization algorithm for structured nonconvex and nonsmooth problems Radu Ioan Boţ Ernö Robert Csetnek Dang-Khoa Nguyen May 8, 08 Abstract. We propose a proximal algorithm for minimizing objective

More information

Majorization-minimization procedures and convergence of SQP methods for semi-algebraic and tame programs

Majorization-minimization procedures and convergence of SQP methods for semi-algebraic and tame programs Majorization-minimization procedures and convergence of SQP methods for semi-algebraic and tame programs Jérôme Bolte and Edouard Pauwels September 29, 2014 Abstract In view of solving nonsmooth and nonconvex

More information

On the acceleration of the double smoothing technique for unconstrained convex optimization problems

On the acceleration of the double smoothing technique for unconstrained convex optimization problems On the acceleration of the double smoothing technique for unconstrained convex optimization problems Radu Ioan Boţ Christopher Hendrich October 10, 01 Abstract. In this article we investigate the possibilities

More information

Optimality Conditions for Nonsmooth Convex Optimization

Optimality Conditions for Nonsmooth Convex Optimization Optimality Conditions for Nonsmooth Convex Optimization Sangkyun Lee Oct 22, 2014 Let us consider a convex function f : R n R, where R is the extended real field, R := R {, + }, which is proper (f never

More information

The Generalized Viscosity Implicit Rules of Asymptotically Nonexpansive Mappings in Hilbert Spaces

The Generalized Viscosity Implicit Rules of Asymptotically Nonexpansive Mappings in Hilbert Spaces Applied Mathematical Sciences, Vol. 11, 2017, no. 12, 549-560 HIKARI Ltd, www.m-hikari.com https://doi.org/10.12988/ams.2017.718 The Generalized Viscosity Implicit Rules of Asymptotically Nonexpansive

More information

Numerical Sequences and Series

Numerical Sequences and Series Numerical Sequences and Series Written by Men-Gen Tsai email: b89902089@ntu.edu.tw. Prove that the convergence of {s n } implies convergence of { s n }. Is the converse true? Solution: Since {s n } is

More information