The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates
The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates

Radu Ioan Boţ, Dang-Khoa Nguyen

January 6, 2018

Abstract. We propose two numerical algorithms for minimizing the sum of a smooth function and the composition of a nonsmooth function with a linear operator in the fully nonconvex setting. The iterative schemes are formulated in the spirit of the proximal and, respectively, proximal linearized alternating direction method of multipliers. The proximal terms are introduced through variable metrics, which facilitates the derivation of proximal splitting algorithms for nonconvex complexly structured optimization problems as particular instances of the general schemes. Convergence of the iterates to a KKT point of the objective function is proved under mild conditions on the sequence of variable metrics and by assuming that a regularization of the associated augmented Lagrangian has the Kurdyka-Łojasiewicz property. If the augmented Lagrangian has the Łojasiewicz property, then convergence rates of both augmented Lagrangian and iterates are derived.

Keywords. nonconvex complexly structured optimization problems, alternating direction method of multipliers, proximal splitting algorithms, variable metric, convergence analysis, convergence rates, Kurdyka-Łojasiewicz property, Łojasiewicz exponent

AMS subject classification. 47H05, 65K05, 90C26

1 Introduction

1.1 Problem formulation and motivation

In this paper, we address the solving of the optimization problem

$$\min_{x \in \mathbb{R}^n} \{ g(Ax) + h(x) \}, \qquad (1)$$

where $g \colon \mathbb{R}^m \to \mathbb{R} \cup \{+\infty\}$ is a proper and lower semicontinuous function, $h \colon \mathbb{R}^n \to \mathbb{R}$ is a Fréchet differentiable function with $L$-Lipschitz continuous gradient and $A \colon \mathbb{R}^n \to \mathbb{R}^m$ is a linear operator. The spaces $\mathbb{R}^n$ and $\mathbb{R}^m$ are equipped with Euclidean inner products $\langle \cdot, \cdot \rangle$ and associated norms $\|\cdot\| = \sqrt{\langle \cdot, \cdot \rangle}$, which are both denoted in the same way, as there is no risk of confusion.
We start by briefly describing the Alternating Direction Method of Multipliers (ADMM) in the context of solving the more general problem

$$\min_{x \in \mathbb{R}^n} \{ f(x) + g(Ax) + h(x) \}, \qquad (2)$$

where $g$ and $h$ are assumed to be also convex and $f \colon \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is another proper, convex and lower semicontinuous function. We rewrite the problem (2), by introducing an auxiliary variable, as

$$\min_{\substack{(x,z) \in \mathbb{R}^n \times \mathbb{R}^m \\ Ax - z = 0}} \{ f(x) + g(z) + h(x) \}. \qquad (3)$$

Faculty of Mathematics, University of Vienna, Oskar-Morgenstern-Platz 1, 1090 Vienna, Austria, radu.bot@univie.ac.at. Research partially supported by FWF (Austrian Science Fund), project I 2419-N32. Invited Associate Professor, Babeş-Bolyai University, Faculty of Mathematics and Computer Sciences, str. Mihail Kogălniceanu, Cluj-Napoca, Romania.

Faculty of Mathematics, University of Vienna, Oskar-Morgenstern-Platz 1, 1090 Vienna, Austria, dang-khoa.nguyen@univie.ac.at. The author gratefully acknowledges the financial support of the Doctoral Programme Vienna Graduate School on Computational Optimization (VGSCO), which is funded by the Austrian Science Fund (FWF, project W1260-N35).
For a fixed real number $r > 0$, the augmented Lagrangian associated with problem (3) reads

$$L_r \colon \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R} \cup \{+\infty\}, \quad L_r(x, z, y) = f(x) + g(z) + h(x) + \langle y, Ax - z \rangle + \frac{r}{2}\|Ax - z\|^2.$$

Given a starting vector $(x^0, z^0, y^0) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m$ and $\{M_1^k\}_{k \ge 0} \subseteq \mathbb{R}^{n \times n}$, $\{M_2^k\}_{k \ge 0} \subseteq \mathbb{R}^{m \times m}$, two sequences of symmetric and positive semidefinite matrices, the following proximal ADMM algorithm, formulated in the presence of a smooth function and involving variable metrics, has been proposed and investigated in [5]: for all $k \ge 0$ generate the sequence $\{(x^k, z^k, y^k)\}_{k \ge 0}$ by

$$x^{k+1} \in \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \Big\{ f(x) + \langle x - x^k, \nabla h(x^k) \rangle + \frac{r}{2}\Big\|Ax - z^k + \frac{1}{r}y^k\Big\|^2 + \frac{1}{2}\|x - x^k\|^2_{M_1^k} \Big\}, \qquad (4a)$$

$$z^{k+1} = \operatorname*{arg\,min}_{z \in \mathbb{R}^m} \Big\{ g(z) + \frac{r}{2}\Big\|Ax^{k+1} - z + \frac{1}{r}y^k\Big\|^2 + \frac{1}{2}\|z - z^k\|^2_{M_2^k} \Big\}, \qquad (4b)$$

$$y^{k+1} = y^k + \rho r\big(Ax^{k+1} - z^{k+1}\big). \qquad (4c)$$

In case $\rho = 1$, it has been proved in [5] that, when the set of saddle points of the Lagrangian associated with problem (3) (which is nothing else than $L_r$ for $r = 0$) is nonempty and the two matrix sequences and $A$ fulfill mild additional assumptions, the sequence $\{(x^k, z^k, y^k)\}_{k \ge 0}$ converges to such a saddle point and provides in this way both an optimal solution of (2) and an optimal solution of its Fenchel dual problem. Furthermore, an ergodic primal-dual gap convergence rate result expressed in terms of the Lagrangian has been shown. In case $h = 0$, the above iterative scheme encompasses different numerical algorithms considered in the literature. When $M_1^k = M_2^k = 0$ for all $k \ge 0$, (4a)-(4c) becomes the classical ADMM algorithm ([5,, 5, 6]), which enjoys a huge popularity in the optimization community, and this despite its poor implementation properties caused by the fact that, in general, the calculation of the sequence of primal variables $\{x^k\}_{k \ge 0}$ does not correspond to a proximal step. For an inertial version of the classical ADMM algorithm we refer to [0]. When $M_1^k = M_1$ and $M_2^k = M_2$ for all $k \ge 0$, (4a)-(4c) recovers the proximal ADMM algorithm investigated by Shefi and Teboulle in [39] (see also [0, ]).
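In case $f(x) = \frac{1}{2}\|x - b\|^2$, $g = \lambda\|\cdot\|_1$ and $h = 0$, every step of the classical scheme ($M_1^k = M_2^k = 0$, $\rho = 1$) admits a closed form. The following sketch is a hypothetical toy instance chosen only to illustrate the update pattern of (4a)-(4c); the data $A$, $b$ and the parameter names are assumptions, not an implementation from the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (closed form for the z-update)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def classical_admm(A, b, lam=0.1, r=1.0, iters=500):
    """Classical ADMM (M1^k = M2^k = 0, rho = 1) on the toy splitting
       min_x 0.5*||x - b||^2 + lam*||Ax||_1, rewritten as
       min_{x,z} f(x) + g(z) subject to Ax - z = 0."""
    m, n = A.shape
    x, z, y = np.zeros(n), np.zeros(m), np.zeros(m)
    H = np.eye(n) + r * A.T @ A  # x-update system matrix, fixed along the run
    for _ in range(iters):
        # x-step: argmin_x f(x) + (r/2)||Ax - z + y/r||^2 (exact solve)
        x = np.linalg.solve(H, b + r * A.T @ (z - y / r))
        # z-step: prox of g/r at Ax + y/r
        z = soft_threshold(A @ x + y / r, lam / r)
        # multiplier update with rho = 1
        y = y + r * (A @ x - z)
    return x, z, y
```

The feasibility residual $\|Ax^k - z^k\|$ tends to $0$ along the iterations, mirroring the role of the constraint in (3).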
It has been pointed out in [39] that, for suitable choices of the matrices $M_1$ and $M_2$, this proximal ADMM algorithm becomes a primal-dual splitting algorithm in the sense of those considered in [3, 6, 9, 4], which, due to their full splitting character, overcome the drawbacks of the classical ADMM algorithm. Recently, in [] it has been shown that, when $f$ is strongly convex, suitable choices of the non-constant sequences $\{M_1^k\}_{k \ge 0}$ and $\{M_2^k\}_{k \ge 0}$ may lead to a rate of convergence for the sequence of primal iterates of $\mathcal{O}(1/k)$. The reason why we address in this paper the slightly less general optimization problem (1) is exclusively given by the fact that in this setting we can provide sufficient conditions which guarantee that the sequence generated by the ADMM algorithm is bounded. In the nonconvex setting, the boundedness of the sequence $\{(x^k, z^k, y^k)\}_{k \ge 0}$ plays a central role in the convergence analysis. The contributions of the paper are as follows:

1. We propose a proximal ADMM (P-ADMM) algorithm and a proximal linearized ADMM (PL-ADMM) algorithm for solving (1) and carry out a convergence analysis in parallel for both algorithms. We first prove, under certain assumptions on the matrix sequences, boundedness of the sequence of generated iterates $\{(x^k, z^k, y^k)\}_{k \ge 0}$. Under these premises, we show that the cluster points of $\{(x^k, z^k, y^k)\}_{k \ge 0}$ are KKT points of the problem (1). Global convergence of the sequence is shown provided that a regularization of the augmented Lagrangian satisfies the Kurdyka-Łojasiewicz property. In case this regularization of the augmented Lagrangian has the Łojasiewicz property, we derive rates of convergence for the sequence of iterates. To the best of our knowledge, these are the first results in the literature addressing convergence rates for the nonconvex ADMM.

2. The two ADMM algorithms under investigation are of relaxed type, namely, we allow $\rho \in (0, 2)$. We notice that $\rho = 1$ is the standard choice in the literature ([, 5,, 30, 39, 4]).
Gabay and Mercier proved in [6] in the convex setting that $\rho$ may be chosen in $(0, 2)$; however, the majority of the extensions of the convex relaxed ADMM algorithm assume that $\rho \in \big(0, \frac{1+\sqrt{5}}{2}\big)$ in (4c), see [0,, 5, 40, 43, 44], or ask for a particular choice of $\rho$, which is interpreted as a step size, see [7]. The only work in the nonconvex setting dealing with an alternating minimization algorithm, however for the minimization of the sum of a simple nonsmooth function with a smooth function, and which allows a relaxation parameter $\rho$ different from $1$, is [44].
3. Particular outcomes of the proposed algorithms will be full splitting algorithms for solving the nonconvex complexly structured optimization problem (1), which we will obtain by an appropriate choice of the matrix sequences. (P-ADMM) will give rise to an iterative scheme formulated only in terms of proximal steps for the functions $g$ and $h$ and of forward evaluations of the matrix $A$, while (PL-ADMM) will give rise to an iterative scheme in which the evaluation of the function $h$ will be performed via a gradient step. Exact formulas for proximal operators are available not only for large classes of convex functions ([8]), but also for many nonconvex functions ([3, 3, 9]). The fruitful idea to linearize the step involving the smooth term has been used in the past in the context of ADMM algorithms, mostly in the convex setting [3, 36, 37, 43, 45], the paper [3] being the only exception in the nonconvex setting. For previous works addressing the ADMM algorithm in the nonconvex setting we mention: [30], where (1) is studied by assuming that $h$ is twice continuously differentiable with bounded Hessian; [4], where the convergence is studied in the context of solving very particular nonconvex consensus and sharing problems; and [], where the ADMM algorithm is used in penalized zero-variance discriminant analysis. In [4] and [3], the investigations of the ADMM algorithm are carried out in very restrictive settings generated by strong assumptions on the nonsmooth functions and linear operators.

1.2 Notations and preliminaries

Let $N$ be a strictly positive integer. We denote $\mathbf{1} := (1, \dots, 1) \in \mathbb{R}^N$ and write, for $x := (x_1, \dots, x_N), y := (y_1, \dots, y_N) \in \mathbb{R}^N$, $x < y$ if and only if $x_i < y_i$ for $i = 1, \dots, N$. The Cartesian product $\mathbb{R}^{N_1} \times \mathbb{R}^{N_2} \times \dots \times \mathbb{R}^{N_p}$, with $N_1, \dots, N_p$ strictly positive integers, will be endowed with the inner product and associated norm defined for $u := (u_1, \dots, u_p), u' := (u'_1, \dots, u'_p) \in \mathbb{R}^{N_1} \times \mathbb{R}^{N_2} \times \dots \times \mathbb{R}^{N_p}$ by

$$\langle u, u' \rangle = \sum_{i=1}^{p} \langle u_i, u'_i \rangle \quad \text{and} \quad \|u\| = \sqrt{\sum_{i=1}^{p} \|u_i\|^2},$$

respectively. Moreover, for every $u := (u_1, \dots, u_p) \in \mathbb{R}^{N_1} \times \mathbb{R}^{N_2} \times \dots$
$\times \mathbb{R}^{N_p}$ we have

$$\frac{1}{\sqrt{p}} \sum_{i=1}^{p} \|u_i\| \le \sqrt{\sum_{i=1}^{p} \|u_i\|^2} \le \sum_{i=1}^{p} \|u_i\|. \qquad (5)$$

We denote by $\mathcal{S}^N_+$ the family of symmetric and positive semidefinite matrices $M \in \mathbb{R}^{N \times N}$. Every $M \in \mathcal{S}^N_+$ induces a semi-norm defined by $\|x\|^2_M := \langle Mx, x \rangle$ for $x \in \mathbb{R}^N$. The Loewner partial ordering on $\mathcal{S}^N_+$ is defined for $M, M' \in \mathcal{S}^N_+$ by

$$M \succcurlyeq M' \iff \|x\|^2_M \ge \|x\|^2_{M'} \quad \forall x \in \mathbb{R}^N.$$

Thus $M \in \mathcal{S}^N_+$ is nothing else than $M \succcurlyeq 0$. For $\alpha > 0$ we set $\mathcal{P}^N_\alpha := \{M \in \mathcal{S}^N_+ : M \succcurlyeq \alpha\,\mathrm{Id}\}$, where $\mathrm{Id}$ denotes the identity matrix. If $M \in \mathcal{P}^N_\alpha$, then the semi-norm $\|\cdot\|_M$ obviously becomes a norm. The linear operator $A$ is surjective if and only if its associated matrix has full row rank. This assumption is further equivalent to the fact that the matrix associated with $AA^*$ is positive definite. Since

$$\lambda_{\min}(AA^*)\,\|y\|^2 \le \langle AA^* y, y \rangle \le \|AA^*\|\,\|y\|^2 \quad \forall y \in \mathbb{R}^m, \qquad (6)$$

this is further equivalent to $\lambda_{\min}(AA^*) > 0$ (and $AA^* \in \mathcal{P}^m_{\lambda_{\min}(AA^*)}$), where $\lambda_{\min}(\cdot)$ denotes the smallest eigenvalue of a matrix. Similarly, $A$ is injective if and only if $\lambda_{\min}(A^*A) > 0$ (and $A^*A \in \mathcal{P}^n_{\lambda_{\min}(A^*A)}$).

Proposition 1. Let $\Psi \colon \mathbb{R}^N \to \mathbb{R}$ be Fréchet differentiable such that its gradient is Lipschitz continuous with constant $L > 0$. Then the following statements are true:
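The rank characterizations via $\lambda_{\min}(AA^*)$ and $\lambda_{\min}(A^*A)$ are easy to check numerically. The sketch below is an illustrative helper (the function names are our own, not from the paper):

```python
import numpy as np

def lambda_min_AAt(A):
    """Smallest eigenvalue of A A^T: positive iff A has full row rank,
       i.e. iff the linear operator A is surjective."""
    return float(np.linalg.eigvalsh(A @ A.T).min())

def lambda_min_AtA(A):
    """Smallest eigenvalue of A^T A: positive iff A is injective."""
    return float(np.linalg.eigvalsh(A.T @ A).min())
```

For $A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \end{pmatrix}$ the rows are independent, so $\lambda_{\min}(AA^*) = 1 > 0$, while $A^*A$ is singular ($\lambda_{\min} = 0$): this $A$ is surjective but not injective, and inequality (6) holds for every $y$.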
1. For every $x, y \in \mathbb{R}^N$ and every $z \in [x, y] := \{(1-t)x + ty : t \in [0, 1]\}$ it holds

$$\Psi(y) \le \Psi(x) + \langle \nabla\Psi(z), y - x \rangle + \frac{L}{2}\|y - x\|^2; \qquad (7)$$

2. If $\Psi$ is bounded from below, then for every $\sigma > 0$ it holds

$$\inf_{x \in \mathbb{R}^N} \Big\{ \Psi(x) - \Big(\frac{1}{\sigma} - \frac{L}{2\sigma^2}\Big)\|\nabla\Psi(x)\|^2 \Big\} > -\infty. \qquad (8)$$

Proof. 1. Let $x, y \in \mathbb{R}^N$ and $z := (1-t)x + ty$ for some $t \in [0, 1]$. By the fundamental theorem for line integrals we get

$$\Psi(y) - \Psi(x) = \int_0^1 \langle \nabla\Psi((1-s)x + sy), y - x \rangle \, ds = \int_0^1 \langle \nabla\Psi((1-s)x + sy) - \nabla\Psi(z), y - x \rangle \, ds + \langle \nabla\Psi(z), y - x \rangle. \qquad (9)$$

Since

$$\int_0^1 \langle \nabla\Psi((1-s)x + sy) - \nabla\Psi(z), y - x \rangle \, ds \le \int_0^1 \|\nabla\Psi((1-s)x + sy) - \nabla\Psi(z)\| \, \|y - x\| \, ds \le L\|x - y\|^2 \int_0^1 |s - t| \, ds = L\|x - y\|^2 \Big( \int_0^t (t - s) \, ds + \int_t^1 (s - t) \, ds \Big) = \frac{L}{2}\big(t^2 + (1 - t)^2\big)\|x - y\|^2, \qquad (10)$$

the inequality (7) is obtained by combining (9) and (10) and by using that $t^2 + (1-t)^2 \le 1$ for $0 \le t \le 1$.

2. The inequality (7) gives for every $x \in \mathbb{R}^N$

$$-\infty < \inf_{y \in \mathbb{R}^N} \Psi(y) \le \Psi\Big(x - \frac{1}{\sigma}\nabla\Psi(x)\Big) \le \Psi(x) + \Big\langle \nabla\Psi(x), -\frac{1}{\sigma}\nabla\Psi(x) \Big\rangle + \frac{L}{2}\Big\|\frac{1}{\sigma}\nabla\Psi(x)\Big\|^2 = \Psi(x) - \Big(\frac{1}{\sigma} - \frac{L}{2\sigma^2}\Big)\|\nabla\Psi(x)\|^2,$$

which leads to the desired conclusion.

Remark 1. The so-called Descent Lemma, which says that for a Fréchet differentiable function $\Psi \colon \mathbb{R}^N \to \mathbb{R}$ having Lipschitz continuous gradient with constant $L > 0$ it holds

$$\Psi(y) \le \Psi(x) + \langle \nabla\Psi(x), y - x \rangle + \frac{L}{2}\|y - x\|^2 \quad \forall x, y \in \mathbb{R}^N, \qquad (11)$$

follows from statement 1 of the above proposition for $z := x$. Moreover, for $z := y$ we have that

$$\Psi(x) \ge \Psi(y) + \langle \nabla\Psi(y), x - y \rangle - \frac{L}{2}\|x - y\|^2 \quad \forall x, y \in \mathbb{R}^N, \qquad (12)$$

which is equivalent to the fact that $\Psi + \frac{L}{2}\|\cdot\|^2$ is a convex function; in other words, $\Psi$ is an $L$-semiconvex function ([8]). It follows from the previous result that a Fréchet differentiable function with $L$-Lipschitz continuous gradient is $L$-semiconvex. Further, we will recall the definition and some properties of the limiting subdifferential, a notion which will play an important role in the convergence analysis we are going to carry out for the nonconvex
ADMM algorithm. Let $\Psi \colon \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ be a proper and lower semicontinuous function. For any $x \in \operatorname{dom}\Psi := \{x \in \mathbb{R}^N : \Psi(x) < +\infty\}$, the Fréchet (viscosity) subdifferential of $\Psi$ at $x$ is

$$\hat{\partial}\Psi(x) := \Big\{ d \in \mathbb{R}^N : \liminf_{y \to x} \frac{\Psi(y) - \Psi(x) - \langle d, y - x \rangle}{\|y - x\|} \ge 0 \Big\}$$

and the limiting (Mordukhovich) subdifferential of $\Psi$ at $x$ is

$$\partial\Psi(x) := \big\{ d \in \mathbb{R}^N : \text{there exist sequences } x^k \to x \text{ and } d^k \to d \text{ as } k \to +\infty \text{ such that } \Psi(x^k) \to \Psi(x) \text{ as } k \to +\infty \text{ and } d^k \in \hat{\partial}\Psi(x^k) \text{ for all } k \ge 0 \big\}.$$

For $x \notin \operatorname{dom}\Psi$, we set $\hat{\partial}\Psi(x) = \partial\Psi(x) := \emptyset$. The inclusion $\hat{\partial}\Psi(x) \subseteq \partial\Psi(x)$ holds for each $x \in \mathbb{R}^N$ in general. In case $\Psi$ is convex, these two subdifferential notions coincide with the convex subdifferential, thus

$$\hat{\partial}\Psi(x) = \partial\Psi(x) = \{d \in \mathbb{R}^N : \Psi(y) \ge \Psi(x) + \langle d, y - x \rangle \ \forall y \in \mathbb{R}^N\} \quad \text{for all } x \in \operatorname{dom}\Psi.$$

If $x \in \mathbb{R}^N$ is a local minimum of $\Psi$, then $0 \in \partial\Psi(x)$. We denote by $\operatorname{crit}(\Psi) = \{x \in \mathbb{R}^N : 0 \in \partial\Psi(x)\}$ the set of critical points of $\Psi$. The limiting subdifferential fulfills the following closedness criterion: if $\{x^k\}_{k \ge 0}$ and $\{d^k\}_{k \ge 0}$ are sequences in $\mathbb{R}^N$ such that $d^k \in \partial\Psi(x^k)$ for all $k \ge 0$, and $(x^k, d^k) \to (x, d)$ and $\Psi(x^k) \to \Psi(x)$ as $k \to +\infty$, then $d \in \partial\Psi(x)$. We also have the following subdifferential sum rule ([34, Proposition 1.107], [38, Exercise 8.8]): if $\Phi \colon \mathbb{R}^N \to \mathbb{R}$ is a continuously differentiable function, then $\partial(\Psi + \Phi)(x) = \partial\Psi(x) + \nabla\Phi(x)$ for all $x \in \mathbb{R}^N$; and the following formula for the subdifferential of the composition with a linear operator $A$ ([34, Proposition 1.112], [38, Exercise 10.7]): if $x \in \operatorname{dom}(\Psi \circ A)$ and $A$ is injective, then $\partial(\Psi \circ A)(x) = A^* \partial\Psi(Ax)$.

We close this section by presenting some convergence results for real sequences that will be used in the sequel in the convergence analysis. The next lemma is often used in the literature when proving convergence of numerical algorithms relying on Fejér monotonicity techniques (see, for instance, [, Lemma.], [4, Lemma ]).

Lemma 2. Let $\{b_k\}_{k \ge 0}$ be a sequence in $\mathbb{R}$ and $\{\xi_k\}_{k \ge 0}$ a sequence in $\mathbb{R}_+$. Assume that $\{b_k\}_{k \ge 0}$ is bounded from below and that for every $k \ge 0$

$$b_{k+1} + \xi_k \le b_k.$$

Then the following statements hold:

1. the sequence $\{\xi_k\}_{k \ge 0}$ is summable, namely $\sum_{k \ge 0} \xi_k < +\infty$;

2.
the sequence tb u is monotonically decreasing and convergent. The following lemma, which is an extension of [, Lemma.3] (see, also [4, Lemma 3]), is of interest by its own. Lemma 3. Let a : `a (, a,..., a N be a sequence in RǸ and tδ u a sequence in R D, a ` c 0, a D a D a D ` ě, (3) where c 0 : pc 0,, c 0,,..., c 0,N q P R N, c : pc,, c,,..., c,n q P R Ǹ and c : pc,, c,,..., c,n q P R Ǹ fulfill c 0 ` c ` c ă. Assume further that there exists δ s ě 0 such that for every K ě K ě Then for every i,..., N we have K δ ď s δ. ÿ a i ă `8. In particular, for every i,..., N and every K ě K ě, it holds K a i ď Nÿ j p c 0,j c,j q a K j ` p c 0,jq a K` j ı ` a K` j ` sδ c 0,i c,i c,i. (4) 5
6 Proof. Fix K ě K ě. If K K or K K ` then (4) holds automatically. Consider now the case when K ě K `. Summing up the inequality(3) for K `,, K, we obtain C Since, K` a ` G ď C K` c 0, K` K` K` K` a ` a a a a G ` K` ÿ K`3 K K ÿ K` K ÿ K the inequality (5) can be rewritten as C `, C K c, C a c, K K` a `a K ` a K` a a K K a G ` C c, K` a G ` a ` a K` a K a K` a K` a a K K ` a a a K K ` a, G C E a ` A, a K` a K a K` a K` ď c 0, K which further implies» Nÿ p c 0,j c,j c,j q j G C E a Ac, a K ` a K ` c, K a j fi fl Hence, for every i,..., N it holds C K c 0 c c, K a G G E a Ac, a K ` a K ` K a G 0, a K ` a K`D K` c 0 c, a KD c 0, a K`D a K`D ` Nÿ j p c 0,j c,j q a K j ` p c 0,jq a K` j ı ` a K` j δ, δ. (5) K` ` δ K` δ. p c 0,i c,i c,i q K a i ď Nÿ j p c 0,j c,j q a K j ` p c 0,jq a K` j ı ` a K` j ` sδ and the conclusion follows by taing into consideration that c 0 ` c ` c ă. A proximal ADMM and a proximal linearized ADMM algorithm in the nonconvex setting In this section we will propose two proximal ADMM algorithms for solving the optimization problem () and we will study their convergence behaviour. In this context, a central role will be played by the augmented Lagrangian associated with problem (), which is defined for every r ą 0 as L r : R n ˆ R m ˆ R m Ñ R Y t`8u, L r px, z, yq g pzq ` h pxq ` xy, Ax zy ` r Ax z. 6
2.1 General formulations and particular instances written in the spirit of full proximal splitting algorithms

Algorithm 1. Let $\{M_1^k\}_{k \ge 0} \subseteq \mathcal{S}^n_+$ and $\{M_2^k\}_{k \ge 0} \subseteq \mathcal{S}^m_+$ be matrix sequences, $r > 0$ and $0 < \rho < 2$. For a given starting vector $(x^0, z^0, y^0) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m$, generate the sequence $\{(x^k, z^k, y^k)\}_{k \ge 0}$ for every $k \ge 0$ as:

$$z^{k+1} \in \operatorname*{arg\,min}_{z \in \mathbb{R}^m} \Big\{ g(z) + \langle y^k, Ax^k - z \rangle + \frac{r}{2}\|Ax^k - z\|^2 + \frac{1}{2}\|z - z^k\|^2_{M_2^k} \Big\}, \qquad (16a)$$

$$x^{k+1} \in \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \Big\{ h(x) + \langle y^k, Ax - z^{k+1} \rangle + \frac{r}{2}\|Ax - z^{k+1}\|^2 + \frac{1}{2}\|x - x^k\|^2_{M_1^k} \Big\}, \qquad (16b)$$

$$y^{k+1} := y^k + \rho r\big(Ax^{k+1} - z^{k+1}\big). \qquad (16c)$$

Let $\{t_k\}_{k \ge 0}$ be a sequence of positive real numbers such that $t_k r \|A\|^2 \le 1$, and take $M_1^k := \frac{1}{t_k}\mathrm{Id} - rA^*A$ and $M_2^k := 0$ for every $k \ge 0$. Algorithm 1 becomes an iterative scheme which generates a sequence $\{(x^k, z^k, y^k)\}_{k \ge 0}$ for every $k \ge 0$ as:

$$z^{k+1} \in \operatorname*{arg\,min}_{z \in \mathbb{R}^m} \Big\{ g(z) + \frac{r}{2}\Big\|z - Ax^k - \frac{1}{r}y^k\Big\|^2 \Big\},$$

$$x^{k+1} \in \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \Big\{ h(x) + \frac{1}{2t_k}\Big\|x - x^k + t_k A^*\big(y^k + r(Ax^k - z^{k+1})\big)\Big\|^2 \Big\},$$

$$y^{k+1} := y^k + \rho r\big(Ax^{k+1} - z^{k+1}\big).$$

Recall that the proximal point operator with parameter $\gamma > 0$ of a proper and lower semicontinuous function $\Psi \colon \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ is the set-valued operator defined as ([35])

$$\operatorname{prox}_{\gamma\Psi} \colon \mathbb{R}^N \rightrightarrows \mathbb{R}^N, \quad \operatorname{prox}_{\gamma\Psi}(x) = \operatorname*{arg\,min}_{y \in \mathbb{R}^N} \Big\{ \Psi(y) + \frac{1}{2\gamma}\|x - y\|^2 \Big\}.$$

The above particular instance of Algorithm 1 is an iterative scheme formulated in the spirit of full splitting numerical methods, namely, the functions $g$ and $h$ are evaluated by their proximal operators, while the linear operator $A$ and its adjoint are evaluated by simple forward steps. Exact formulas for the proximal operator are available not only for large classes of convex functions ([8]), but also for many nonconvex functions appearing in applications ([3, 3, 9]). The second algorithm that we propose in this paper replaces $h$ in the definition of $x^{k+1}$ by its linearization at $x^k$ for every $k \ge 0$.

Algorithm 2. Let $\{M_1^k\}_{k \ge 0} \subseteq \mathcal{S}^n_+$ and $\{M_2^k\}_{k \ge 0} \subseteq \mathcal{S}^m_+$ be matrix sequences, $r > 0$ and $0 < \rho < 2$. For a given starting vector $(x^0, z^0, y^0) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m$, generate the sequence $\{(x^k, z^k, y^k)\}_{k \ge 0}$ for every $k \ge 0$ as:
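With $M_1^k = \frac{1}{t}\mathrm{Id} - rA^*A$ and $M_2^k = 0$ the scheme above is a genuine full-splitting method. The sketch below instantiates it on a hypothetical toy problem with $g = \lambda\|\cdot\|_1$ (so $\operatorname{prox}_{g/r}$ is soft-thresholding) and $h(x) = \frac{1}{2}\|x - b\|^2$ (so $\operatorname{prox}_{th}(v) = (v + tb)/(1 + t)$); these concrete $g$, $h$ and all parameter values are assumptions for illustration only, not the paper's experiments.

```python
import numpy as np

def prox_l1(v, tau):
    """prox of tau*||.||_1: componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def p_admm(A, b, lam=0.1, r=2.0, rho=1.0, iters=2000):
    """Full-splitting instance of Algorithm 1: M1^k = (1/t)Id - r A^T A
       with t*r*||A||^2 <= 1, M2^k = 0, on toy data
       g = lam*||.||_1 and h = 0.5*||. - b||^2 (illustrative choices)."""
    m, n = A.shape
    t = 1.0 / (r * np.linalg.norm(A, 2) ** 2)  # enforces t*r*||A||^2 = 1
    x, z, y = np.zeros(n), np.zeros(m), np.zeros(m)
    for _ in range(iters):
        z = prox_l1(A @ x + y / r, lam / r)            # prox step on g
        v = x - t * (A.T @ (y + r * (A @ x - z)))      # forward step on A, A^*
        x = (v + t * b) / (1.0 + t)                    # prox_{t h}(v) in closed form
        y = y + rho * r * (A @ x - z)                  # relaxed multiplier update
    return x, z, y
```

Note that only proximal steps for $g$ and $h$ and matrix-vector products with $A$ and $A^*$ appear, exactly the full-splitting character described above.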
$$z^{k+1} \in \operatorname*{arg\,min}_{z \in \mathbb{R}^m} \Big\{ g(z) + \langle y^k, Ax^k - z \rangle + \frac{r}{2}\|Ax^k - z\|^2 + \frac{1}{2}\|z - z^k\|^2_{M_2^k} \Big\}, \qquad (17a)$$

$$x^{k+1} \in \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \Big\{ \langle x - x^k, \nabla h(x^k) \rangle + \langle y^k, Ax - z^{k+1} \rangle + \frac{r}{2}\|Ax - z^{k+1}\|^2 + \frac{1}{2}\|x - x^k\|^2_{M_1^k} \Big\}, \qquad (17b)$$

$$y^{k+1} := y^k + \rho r\big(Ax^{k+1} - z^{k+1}\big). \qquad (17c)$$

Due to the presence of the variable metric inducing matrix sequences we can thus provide a unifying scheme for several linearized ADMM algorithms discussed in the literature (see [3, 3, 36, 37, 43, 45]), which can be recovered for specific choices of the variable metrics. When taking, as for Algorithm 1, $M_1^k := \frac{1}{t_k}\mathrm{Id} - rA^*A$, where $t_k r \|A\|^2 \le 1$, and $M_2^k := 0$ for every $k \ge 0$, Algorithm 2 translates for every $k \ge 0$ into:

$$z^{k+1} \in \operatorname*{arg\,min}_{z \in \mathbb{R}^m} \Big\{ g(z) + \frac{r}{2}\Big\|z - Ax^k - \frac{1}{r}y^k\Big\|^2 \Big\},$$

$$x^{k+1} := x^k - t_k\Big(\nabla h(x^k) + A^*\big(y^k + r(Ax^k - z^{k+1})\big)\Big),$$

$$y^{k+1} := y^k + \rho r\big(Ax^{k+1} - z^{k+1}\big).$$
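Under the same metric choice the linearized scheme needs only $\nabla h$. With the same hypothetical toy data as above ($g = \lambda\|\cdot\|_1$, $h(x) = \frac{1}{2}\|x - b\|^2$, hence $\nabla h(x) = x - b$; all illustrative assumptions), the three updates become:

```python
import numpy as np

def prox_l1(v, tau):
    """prox of tau*||.||_1: componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def pl_admm(A, b, lam=0.1, r=2.0, rho=1.0, iters=2000):
    """Instance of Algorithm 2 with M1^k = (1/t)Id - r A^T A, M2^k = 0:
       the smooth term h enters only through its gradient, here
       grad h(x) = x - b (toy choice for illustration)."""
    m, n = A.shape
    t = 1.0 / (r * np.linalg.norm(A, 2) ** 2)  # stepsize with t*r*||A||^2 = 1
    x, z, y = np.zeros(n), np.zeros(m), np.zeros(m)
    for _ in range(iters):
        z = prox_l1(A @ x + y / r, lam / r)
        # explicit gradient step on h instead of a prox/inner subproblem
        x = x - t * ((x - b) + A.T @ (y + r * (A @ x - z)))
        y = y + rho * r * (A @ x - z)
    return x, z, y
```

The contrast with the previous sketch is exactly the point of (PL-ADMM): the $x$-update is a single explicit formula, with no minimization subproblem involving $h$.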
This iterative scheme has the remarkable property that the smooth term is evaluated via a gradient step. This is an improvement with respect to other nonconvex ADMM algorithms, such as [4, 44], where the smooth function is involved in a subproblem which can be in general difficult to solve, unless it can be reformulated as a proximal step (see [30]). We will carry out a parallel convergence analysis for Algorithm 1 and Algorithm 2 and work to this end in the following setting.

Assumption 1. Assume that $A$ is surjective and that $r > 0$, $\rho \in (0, 2)$, $\mu_1 := \sup_{k \ge 0} \|M_1^k\| < +\infty$ and $\mu_2 := \sup_{k \ge 0} \|M_2^k\| < +\infty$ are such that there exists $\gamma > 1$ with

$$r \ge (1 + \gamma) T_1 L > 0 \qquad (18)$$

and

$$M_3^k := M_1^k + rA^*A - C_1\,\mathrm{Id} \succcurlyeq \frac{3}{2}C_0\,\mathrm{Id} \succcurlyeq 0 \quad \forall k \ge 0, \qquad (19)$$

where

$$T_0 := \begin{cases} \dfrac{2 - \rho}{\lambda_{\min}(AA^*)\rho^2 r}, & \text{if } 0 < \rho \le 1, \\[1ex] \dfrac{\rho - 1}{\lambda_{\min}(AA^*)(2 - \rho) r}, & \text{if } 1 < \rho < 2, \end{cases} \qquad T_1 := \begin{cases} \dfrac{2}{\lambda_{\min}(AA^*)\rho^2}, & \text{if } 0 < \rho \le 1, \\[1ex] \dfrac{\rho}{\lambda_{\min}(AA^*)(2 - \rho)}, & \text{if } 1 < \rho < 2, \end{cases}$$

and

$$C_0 := \begin{cases} \dfrac{4 T_1 \mu_1^2}{r}, & \text{for Algorithm 1}, \\[1ex] \dfrac{4 T_1 (L + \mu_1)^2}{r}, & \text{for Algorithm 2}, \end{cases} \qquad C_1 := \begin{cases} L + \dfrac{4 T_1 (L + \mu_1)^2}{r}, & \text{for Algorithm 1}, \\[1ex] L + \dfrac{4 T_1 \mu_1^2}{r}, & \text{for Algorithm 2}. \end{cases}$$

Remark 2. Notice that (19) can be equivalently written as

$$M_1^k + rA^*A - \Big(L + \frac{1}{r}C_M\Big)\mathrm{Id} \succcurlyeq 0 \quad \forall k \ge 0, \quad \text{where } C_M := \begin{cases} \big(6\mu_1^2 + 4(L + \mu_1)^2\big)T_1, & \text{for Algorithm 1}, \\ \big(4\mu_1^2 + 6(L + \mu_1)^2\big)T_1, & \text{for Algorithm 2}. \end{cases} \qquad (20)$$

In the following we present some possible choices of the matrix sequences $\{M_1^k\}_{k \ge 0}$ and $\{M_2^k\}_{k \ge 0}$ which fulfill Assumption 1.

1. Since $rA^*A \in \mathcal{S}^n_+$, when $\mu_1 := \sup_{k \ge 0}\|M_1^k\| > L$, by choosing

$$r \ge \max\Big\{ (1 + \gamma) T_1 L, \ \frac{C_M}{\mu_1 - L} \Big\} > 0,$$

there exists $\alpha_1 > 0$ such that $\mu_1 \ge \alpha_1 \ge L + \frac{1}{r}C_M > 0$. Thus (18) is verified, while (20) is ensured when choosing $M_1^k$ such that $\mu_1 \mathrm{Id} \succcurlyeq M_1^k \succcurlyeq \alpha_1 \mathrm{Id}$ for every $k \ge 0$.

2. Let $M_1^k := \frac{1}{t}\mathrm{Id} - rA^*A$ for every $k \ge 0$, where $0 < t < \min\big\{\frac{1}{r\|A\|^2}, \frac{1}{L}\big\}$. Then the relation (20) becomes $\frac{1}{t}\mathrm{Id} - \big(L + \frac{1}{r}C_M\big)\mathrm{Id} \succcurlyeq 0$, which automatically holds (as also (18) does), if

$$r \ge \max\Big\{ (1 + \gamma) T_1 L, \ \frac{t C_M}{1 - tL} \Big\} > 0.$$
9 3. If A is assumed to be also injective, then ra A ě rλ min pa Aq ą 0. By choosing # + it follows that r ě max p ` γq T L, L ` al ` 4λ min pa Aq C M λ min pa Aq ra A `L ` r C M Id ě 0, thus, (8) and (0) hold for an arbitrary sequence of symmetric and positive semidefinite matrices M (. A possible choice is M 0 and M 0 for every ě 0, which allows us to recover the classical ADMM. When proving convergence for variable metric algorithms designed for convex optimization problems one usually assumes monotonicity for the matrix sequences inducing the variable metrics (see, for instance, [7, 5]). It is worth to mention that in this paper we manage to perform the convergence analysis for both Algorithm and Algorithm without any monotonicity assumption on ( M and ( M.. Preliminaries of the convergence analysis The following result of Fejér monotonicity type will play a fundamental role in our convergence analysis. Lemma 4. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm. Then for every ě it holds: L `x` r, z `, y ` ` T `y A ` 0 y ` ď L r `x, z, y ` T 0 A `y y ` C0 ą 0, x ` x ` M z ` z 3 M x x. () Proof. Let ě be fixed. In both cases the proof builds on showing that the following inequality L `x` r, z `, y ` ` x ` x L M x ` x `ra A ` z ` z M ď L `x r, z, y ` y ` y () is true and on providing afterwards an upper bound for. For Algorithm : From (6a) we have y ` y. g `z ` Ax z `D ` r Ax z ` ` z ` z M ď g `z Ax z D ` r Ax z. (3) The optimality criterion of (6b) is h `x ` A y `Ax ra ` z ` ` M `x x `. (4) From (7) (applied for z : x ` ) we get h `x ` ď h `x Ax Ax `D ` Ax ` z `, Ax Ax `D x ` x M ` L x ` x. (5) By combining (3), (5) and (6c), after some rearrangements, we obtain (). By using the notation u l : h `x l ` M l `xl x ě (6) and by taing into consideration (6c), we can rewrite (4) as A y l` ρu l` ` p ρq A y ě 0. (7) 9
10 The case 0 ă ρ ď. We have A `y ` y ρ Since 0 ă ρ ď, the convexity of gives and from here we get `u ` u ` p ρq A `y y. `y A ` y ď ρ u ` u ` p ρq `y A y. λ min paa qρ y ` y ď ρ `y A ` y ď ρ u ` u ` p ρq `y A y p ρq `y A ` y, (8) By using the Lipschitz continuity of h, we have u ` u ď pl ` µ q x ` x ` µ x x, (9) thus u ` u ď pl ` µ q x ` x ` µ x x. (30) After plugging (30) into (8), we get y ` y ď pl ` µ q λ min paa q ` p ρq λ min paa qρ r which, combined with (), provides (). x ` x ` µ λ min paa q A `y y The case ă ρ ă. This time we have from (7) that A `y ` y p ρq As ă ρ ă, the convexity of gives x x p ρq λ min paa qρ r ρ `u` u ρ `y ` pρ q A y. `y A ` y ď ρ u ` u ` pρ q `y A y. ρ and from here it follows A `y ` y, λ min paa q p ρq y ` y ď p ρq `y A ` y ď ρ u ` u ` pρ q `y A y pρ q `y A ` y, (3) ρ After plugging (30) into (3), we get y ` y ď ρ pl ` µ q λ min paa q p ρq r pρ q ` λ min paa q p ρq pρ q λ min paa q p ρq which, combined with (), provides ().. For Algorithm : The optimality criterion of (7b) is x ` x ` A `y y (3) ρµ x λ min paa q p ρq x r A `y ` y, (33) h `x A y `Ax ra ` z ` ` M `x x `. (34) 0
11 From (7) (applied for z : x ) we get h `x ` ď h `x Ax Ax `D ` Ax ` z `, Ax Ax `D x ` x M ` L x ` x. (35) Since the definition of z ` in (7a) leads also to (3), by combining this inequality with (35) and (7c), after some rearrangments, () follows. By using this time the notation u l : h `x l ` M l `xl x ě (36) and by taing into consideration (7c), we can rewrite (34) as The case 0 ă ρ ď. As in (8), we obtain A y l` ρu l` ` p ρq A y ě 0. (37) λ min paa qρ y ` y ď ρ `y A ` y ď ρ u ` u ` p ρq `y A y p ρq `y A ` y. (38) By using the Lipschitz continuity of h, we have u ` u ď µ x ` x ` pl ` µ q x x, (39) thus u ` u ď µ x ` x ` pl ` µ q x x. (40) After plugging (40) into (38), it follows y ` y ď µ λ min paa q p ρq ` λ min paa qρ r which, combined with (), provides (). The case ă ρ ă. As in (3), we obtain x ` x ` pl ` µ q λ min paa q A `y y x x p ρq λ min paa qρ r A `y ` y, λ min paa q p ρq y ` y ď p ρq `y A ` y ď ρ u ` u ` pρ q `y A y pρ q `y A ` y. (4) ρ After plugging (40) into (4), it follows y ` y ď ρµ λ min paa q p ρq r pρ q ` λ min paa q p ρq pρ q λ min paa q p ρq which, combined with (), provides (). This concludes the proof. x ` x ` The following three estimates will be useful in the sequel. A `y y ρ pl ` µ q λ min paa q p ρq r (4) x x A `y ` y. (43) Lemma 5. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm. Then the following statements are true:
12 (i) z ` z ď A x ` x ` Ax ` z ` ` Ax z A x ` x ` y ` y ` y ě ; (44) (ii) (iii) y ` ď T 0 `y A ` y ` T `x` h ` C0 x ` ě 0; (45) r r 4 y ` y ď C3 x ` x ` C4 x x ` T ` A `y y A `y ` ě, (46) where $ ρ pl ` µ q a, for Algorithm, & λmin paa q p ρ q C 3 : ρµ % a, for Algorithm, λmin paa q p ρ q $ ρµ a, for Algorithm, & λmin paa q p ρ q C 4 : ρ pl ` µ q % a, for Algorithm, λmin paa q p ρ q T : ρ a λmin paa q p ρ q. (47) Proof. The statement in (44) is straightforward. From (7) and (37) we have for every ě 0 or, equivalently A y ` ρu ` ` p ρq A y ρa y ` ρu ` ` p ρq A `y y `, where u ` is defined as being equal to u ` in (6), for Algorithm, and, respectively, to u ` in (36), for Algorithm. For 0 ă ρ ď we have λ min paa qρ y ` ď ρ A y ` ď ρ u ` ` p ρq A `y ` y, (48) while when ă ρ ă we have λ min paa qρ y ` ď ρ A y ` ď Notice further that when ă ρ ă we have {ρ ă and ă ρ{ p ρq. When u ` is defined as in (6), it holds ρ u ` ` pρ q `y A ` y. (49) ρ u ` u ` ď `x` h ` µ x ` ě 0, (50) while, when u ` is defined as in (36), it holds u ` u ` ď h `x ` ` pl ` µ q x ` ě 0. (5) We divide (48) and (49) by λ min paa qρ r ą 0 and plug (50) and, respectively, (5) into the resulting inequalities. This gives us (45).
13 Finally, in order to prove (46), we notice that for every ě it holds A `y ` y ď ρ u ` u ` ρ A `y y, so, a λmin paa q p ρ q y ` y ď p ρ q A `y ` y ď ρ u ` u ` ρ A `y y ρ A `y ` y. (5) We plug into (5) the estimates for u ` u derived in (9) and, respectively, (39) and divide the resulting inequality by a λ min paa q p ρ q ą 0. This furnishes the desired statement. The following regularization of the augmented Lagrangian will play an important role in the convergence analysis of the nonconvex proximal ADMM algorithms: F r : R n ˆ R m ˆ R m ˆ R n ˆ R m Ñ R Y t`8u, F r px, z, y, x, y q L r px, z, yq ` T 0 A `y y ` C0 where T 0 and C 0 are defined in Assumption. For every ě, we denote F : F r `x, z, y, x, y L r `x, z, y ` T 0 A `y y ` C0 x x, (53) x x. (54) `x Since the convergence analysis will rely on the fact that the set of cluster points of the sequence, z, y ( is nonempty, we will present first two situations which guarantee that this sequence is bounded. They mae use of standard coercivity assumptions for the functions g and h, respectively. Recall that a function Ψ : R N Ñ R Y t`8u is called coercive, if lim x Ñ`8 Ψ pxq `8. Theorem 6. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm. Suppose that one of the following conditions holds: (B-I) The operator A is invertible, g is coercive and h is bounded from below; (B-II) The function h is coercive and g and h are bounded from below. Then the sequence `x, z, y ( is bounded. Proof. From Lemma 4 we have that for every ě F ` ` x ` x ` z ` z ď F M 3 C0Id M (55) which shows, according to (9), that tf u ě is monotonically decreasing. Consequently, for every ě we have F ě F ` ` x ` x ` M 3 C0Id h `x ` ` g `z ` r ` T 0 A `y ` y ` which, thans to (45), leads to F ě h `x ` ` g `z ` T r ` T0 `y A ` y ` y ` ` r z ` z M Ax` z ` ` x ` x M 3 C0Id ` h `x ` ` r r y` Ax` z ` ` x ` x M 3 C0Id ` z ` z M ` C0 }x` x }, r y` z ` z M ` C0 4 }x` x }. 
(56) Next we will prove the boundedness of $\{(x^k, z^k, y^k)\}_{k \ge 0}$ under each of the two scenarios.
14 (B-I) Since r ě p ` γq T L ą T L ą 0, there exists σ ą 0 such that σ L σ T r. From Proposition and (56) we see that for every ě g `z ` ` r Ax` z ` ` y` r ` C0 }x` x } 4 " * ď F inf h pxq T h pxq ă `8. xpr n r Since g is coercive, it follows that the sequences z (, Ax z ` r y ( and x ` x ( are bounded. This implies that Apx ` x q pz ` z q ( is bounded, from which we obtain the boundedness of r py ` y q (. According to the third update in the iterative scheme, we obtain that Ax z ( and thus y ( are also bounded. This implies the boundedness of Ax ( and, finally, since A is invertible, the boundedness of x (. (B-II) Again thans to (8) there exists σ ą 0 such that σ L σ p ` γq T r. We assume first that ρ or, equivalently, T 0 0. From Proposition and (56) we see that for every ě ˆ h `x ` ` T h `x ` ` r γ γr Ax` z ` ` y` r ` T0 }A py ` y q} ď F g `z ` γ inf xpr n " h pxq p ` γq T h pxq r * ă `8. Since h is coercive, we obtain that x (, Ax z ` r y ( and A `y ` y ( are bounded. For every ě 0 we have that λ min pa Aqρ r }Ax ` z ` } λ min pa Aqρ r }y ` y } ď }A py ` y q}, thus Ax z ( is bounded. Consequently, y ( and z ( are bounded. In case ρ or, equivalently, T 0 0, we have that for every ě ˆ h `x ` ` T `x` h ` r γ γr Ax` z ` ` ď F g `z ` " γ inf h pxq p ` γq T * h pxq ă `8, xpr n r r y` from which we deduce that x ( and Ax z ` r y ( are bounded. From Lemma 5 (iii) it yields that y ` y ( is bounded, thus, Ax z ( is bounded. Consequently, y ( ( and z are bounded. Both considered scenarios lead to the conclusion that the sequence `x, z, y ( is bounded. We state now the first convergence result of this paper. Theorem 7. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm, which is assumed to be bounded. The following statements are true: (i) For every ě it holds F ` ` C0 4 x ` x ` z ` z ď F M. (57) (ii) The sequence tf u is bounded from below and convergent. Moreover, x ` x Ñ 0, z ` z Ñ 0 and y ` y Ñ 0 as Ñ `8. (58) 4
15 (iii) The sequences tf u, L `x r, z, y ( and h `x ` g `z ( have the same limit, which we denote by F P R. Proof. (i) According to (9) we have that M 3 C 0 Id P P n C 0 and thus (55) implies (57). (ii) We will show that L `x r, z, y ( is bounded from below, which will imply that tf u is bounded from below as well. Assuming the contrary, as `x, z, y ( is bounded, there exists a subsequence `xq, z q, y q (qě0 converging to an element ppx, pz, pyq P Rn ˆ R m ˆ R m such that Lr `x q, z q, y q ( converges to 8 as q Ñ `8. However, using the lower semicontinuity of g qě0 and the continuity of h, we obtain lim inf L r `x q, z q, y q r ě h ppxq ` g ppzq ` xpy, Apx pzy ` qñ`8 Apx pz, which leads to a contradiction. From Lemma we conclude that tf u ě is convergent and ÿ x ` x ă `8, thus x ` x Ñ 0 as Ñ `8. We proved in (3), (33), (4) and (43) that for every ě y ` y ď C L x ` x ` C0 x x ` T 0 A `y y T 0 A `y ` y. Summing up the above inequality for,..., K, for K ą, we get y ` y ď C L x ` x ` C0 x x ` T 0 A `y y 0 T 0 A `y K` y K ď C L We let K converge to `8 and conclude ÿ x ` x ` C0 Ax ` z ` ÿ x x ` T 0 A `y y 0. y ` y ă `8, thus Ax ` z ` Ñ 0 and y ` y Ñ 0 as Ñ `8. Since x ` x Ñ 0 as Ñ `8, it follows that z ` z Ñ 0 as Ñ `8. (iii) By using (58) and the fact that y ( is bounded, it follows F lim F lim L `x r, z, y `x h ` g `z (. Ñ`8 Ñ`8 lim Ñ`8 The following lemmas provides upper estimates in terms of the iterates for limiting subgradients of the augmented Lagrangian and the regularized augmented Lagrangian F r, respectively. Lemma 8. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm. For every ě 0 we have d ` ` : `d x, d ` z, d ` `x` y P BLr, z `, y `, (59) where `x` d ` x : C ` h h `x ` `y A ` y ` M `x x `, d ` z : y y ` ` ra `x x ` ` M `z z `, d ` y : `y` y. (60a) (60b) (60c) 5
and
$$C := \begin{cases} 1, & \text{for Algorithm 2}, \\ 0, & \text{for Algorithm 1}. \end{cases}$$
Moreover, for every $k \geq 0$ it holds
$$\|d^{k+1}\| \leq C_5\|x^{k+1} - x^k\| + C_6\|z^{k+1} - z^k\| + C_7\|y^{k+1} - y^k\|, \quad (61)$$
where
$$C_5 := CL + \mu_1 + r\|A\|, \qquad C_6 := \mu_2, \qquad C_7 := 1 + \|A\| + \frac{1}{\rho r}. \quad (62)$$

Proof. Let $k \geq 0$ be fixed. Applying the calculus rules of the limiting subdifferential, we obtain
$$\nabla_x \mathcal{L}_r(x^{k+1}, z^{k+1}, y^{k+1}) = \nabla h(x^{k+1}) + A^*y^{k+1} + rA^*(Ax^{k+1} - z^{k+1}), \quad (63a)$$
$$\partial_z \mathcal{L}_r(x^{k+1}, z^{k+1}, y^{k+1}) = \partial g(z^{k+1}) - y^{k+1} - r(Ax^{k+1} - z^{k+1}), \quad (63b)$$
$$\nabla_y \mathcal{L}_r(x^{k+1}, z^{k+1}, y^{k+1}) = Ax^{k+1} - z^{k+1}. \quad (63c)$$
Then (60c) follows directly from (63c) and (6c), respectively (7c), while (60b) follows from
$$y^k + r(Ax^k - z^{k+1}) + M_2^k(z^k - z^{k+1}) \in \partial g(z^{k+1}),$$
which is a consequence of the optimality criterion for (6a) and (7a), respectively. In order to derive (60a), let us notice that for Algorithm 1 we have (see (4))
$$\nabla h(x^{k+1}) + rA^*(Ax^{k+1} - z^{k+1}) = -A^*y^k + M_1^k(x^k - x^{k+1}), \quad (64)$$
while for Algorithm 2 we have (see (34))
$$\nabla h(x^k) + rA^*(Ax^{k+1} - z^{k+1}) = -A^*y^k + M_1^k(x^k - x^{k+1}). \quad (65)$$
By using (63a) we get the desired statement. Relation (61) follows by combining the inequalities
$$\|d_x^{k+1}\| \leq (CL + \mu_1)\|x^{k+1} - x^k\| + \|A\|\|y^{k+1} - y^k\|,$$
$$\|d_z^{k+1}\| \leq \|y^{k+1} - y^k\| + r\|A\|\|x^{k+1} - x^k\| + \mu_2\|z^{k+1} - z^k\|$$
with (5).

Lemma 9. Let Assumption 1 be satisfied and $\{(x^k, z^k, y^k)\}$ be a sequence generated by Algorithm 1 or Algorithm 2. For every $k \geq 0$ we have
$$D^{k+1} := \big(D_x^{k+1}, D_z^{k+1}, D_y^{k+1}, D_{x'}^{k+1}, D_{y'}^{k+1}\big) \in \partial F_r\big(x^{k+1}, z^{k+1}, y^{k+1}, x^k, y^k\big), \quad (66)$$
where
$$D_x^{k+1} := d_x^{k+1} + 2C_0(x^{k+1} - x^k), \qquad D_z^{k+1} := d_z^{k+1}, \qquad D_y^{k+1} := d_y^{k+1} + 2T_0AA^*(y^{k+1} - y^k),$$
$$D_{x'}^{k+1} := -2C_0(x^{k+1} - x^k), \qquad D_{y'}^{k+1} := -2T_0AA^*(y^{k+1} - y^k). \quad (67)$$
Moreover, for every $k \geq 0$ it holds
$$\|D^{k+1}\| \leq C_8\|x^{k+1} - x^k\| + C_9\|z^{k+1} - z^k\| + C_{10}\|y^{k+1} - y^k\|, \quad (68)$$
where
$$C_8 := C_5 + 4C_0, \qquad C_9 := C_6, \qquad C_{10} := C_7 + 4T_0\|A\|^2. \quad (69)$$

Proof. Let $k \geq 0$ be fixed. Applying the calculus rules of the limiting subdifferential, it follows
$$\nabla_x F_r(x^{k+1}, z^{k+1}, y^{k+1}, x^k, y^k) = \nabla_x \mathcal{L}_r(x^{k+1}, z^{k+1}, y^{k+1}) + 2C_0(x^{k+1} - x^k), \quad (70a)$$
$$\partial_z F_r(x^{k+1}, z^{k+1}, y^{k+1}, x^k, y^k) = \partial_z \mathcal{L}_r(x^{k+1}, z^{k+1}, y^{k+1}), \quad (70b)$$
$$\nabla_y F_r(x^{k+1}, z^{k+1}, y^{k+1}, x^k, y^k) = \nabla_y \mathcal{L}_r(x^{k+1}, z^{k+1}, y^{k+1}) + 2T_0AA^*(y^{k+1} - y^k), \quad (70c)$$
$$\nabla_{x'} F_r(x^{k+1}, z^{k+1}, y^{k+1}, x^k, y^k) = -2C_0(x^{k+1} - x^k), \quad (70d)$$
$$\nabla_{y'} F_r(x^{k+1}, z^{k+1}, y^{k+1}, x^k, y^k) = -2T_0AA^*(y^{k+1} - y^k). \quad (70e)$$
Then (66) follows directly from the above relations and (59). Inequality (68) follows by combining
$$\|D_x^{k+1}\| \leq \|d_x^{k+1}\| + 2C_0\|x^{k+1} - x^k\|, \qquad \|D_y^{k+1}\| \leq \|d_y^{k+1}\| + 2T_0\|A\|^2\|y^{k+1} - y^k\|$$
with (5).

The following result is a straightforward consequence of Lemma 5 and Lemma 9.

Corollary 10. Let Assumption 1 be satisfied and $\{(x^k, z^k, y^k)\}$ be a sequence generated by Algorithm 1 or Algorithm 2. Then the norm of the element $D^{k+1} \in \partial F_r(x^{k+1}, z^{k+1}, y^{k+1}, x^k, y^k)$ defined in the previous lemma verifies for every $k \geq 2$ the following estimate:
$$\|D^{k+1}\| \leq C_{11}\big(\|x^{k+1} - x^k\| + \|x^k - x^{k-1}\| + \|x^{k-1} - x^{k-2}\|\big) + C_{12}\big(\|A^*(y^k - y^{k-1})\|^2 - \|A^*(y^{k+1} - y^k)\|^2\big) + C_{13}\big(\|A^*(y^{k-1} - y^{k-2})\|^2 - \|A^*(y^k - y^{k-1})\|^2\big), \quad (71)$$
where $C_{11}$, $C_{12}$ and $C_{13}$ are positive constants which can be expressed explicitly in terms of $C_8$, $C_9$, $C_{10}$ and $T_0$. \quad (72)

In the following, we denote by $\omega(\{u^k\})$ the set of cluster points of a sequence $\{u^k\}_{k \geq 0} \subseteq \mathbb{R}^N$.

Lemma. Let Assumption 1 be satisfied and $\{(x^k, z^k, y^k)\}$ be a sequence generated by Algorithm 1 or Algorithm 2, which is assumed to be bounded. The following statements are true:

(i) if $\{(x^{k_q}, z^{k_q}, y^{k_q})\}_{q \geq 0}$ is a subsequence of $\{(x^k, z^k, y^k)\}$ which converges to $(\hat{x}, \hat{z}, \hat{y})$ as $q \to +\infty$, then
$$\lim_{q \to +\infty} \mathcal{L}_r(x^{k_q}, z^{k_q}, y^{k_q}) = \mathcal{L}_r(\hat{x}, \hat{z}, \hat{y});$$

(ii) it holds
$$\omega\big(\{(x^k, z^k, y^k)\}\big) \subseteq \operatorname{crit}(\mathcal{L}_r) \subseteq \big\{(\hat{x}, \hat{z}, \hat{y}) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m : -A^*\hat{y} = \nabla h(\hat{x}),\ \hat{y} \in \partial g(\hat{z}),\ \hat{z} = A\hat{x}\big\};$$

(iii) we have $\lim_{k \to +\infty} \operatorname{dist}\big((x^k, z^k, y^k), \omega(\{(x^k, z^k, y^k)\})\big) = 0$;

(iv) the set $\omega(\{(x^k, z^k, y^k)\})$ is nonempty, connected and compact;

(v) the function $\mathcal{L}_r$ takes on $\omega(\{(x^k, z^k, y^k)\})$ the value $F_* = \lim_{k \to +\infty} \mathcal{L}_r(x^k, z^k, y^k)$, as the objective function $g \circ A + h$ does on $\operatorname{Pr}_{\mathbb{R}^n}\big[\omega(\{(x^k, z^k, y^k)\})\big]$.

Proof. Let $(\hat{x}, \hat{z}, \hat{y}) \in \omega(\{(x^k, z^k, y^k)\})$ and $\{(x^{k_q}, z^{k_q}, y^{k_q})\}_{q \geq 0}$ be a subsequence of $\{(x^k, z^k, y^k)\}$ converging to $(\hat{x}, \hat{z}, \hat{y})$ as $q \to +\infty$.

(i) From either (6a) or (7a) we obtain for all $q \geq 1$
$$g(z^{k_q}) + \big\langle y^{k_q-1}, Ax^{k_q-1} - z^{k_q}\big\rangle + \frac{r}{2}\|Ax^{k_q-1} - z^{k_q}\|^2 + \frac{1}{2}\|z^{k_q} - z^{k_q-1}\|^2_{M_2^{k_q-1}} \leq g(\hat{z}) + \big\langle y^{k_q-1}, Ax^{k_q-1} - \hat{z}\big\rangle + \frac{r}{2}\|Ax^{k_q-1} - \hat{z}\|^2 + \frac{1}{2}\|\hat{z} - z^{k_q-1}\|^2_{M_2^{k_q-1}}.$$
Taking the limit superior on both sides of the above inequality, we get
$$\limsup_{q \to +\infty} g(z^{k_q}) \leq g(\hat{z}),$$
which, combined with the lower semicontinuity of $g$, leads to
$$\lim_{q \to +\infty} g(z^{k_q}) = g(\hat{z}).$$
Since $h$ is continuous, we further obtain
$$\lim_{q \to +\infty} \mathcal{L}_r(x^{k_q}, z^{k_q}, y^{k_q}) = \lim_{q \to +\infty}\Big[g(z^{k_q}) + h(x^{k_q}) + \big\langle y^{k_q}, Ax^{k_q} - z^{k_q}\big\rangle + \frac{r}{2}\|Ax^{k_q} - z^{k_q}\|^2\Big] = g(\hat{z}) + h(\hat{x}) + \langle\hat{y}, A\hat{x} - \hat{z}\rangle + \frac{r}{2}\|A\hat{x} - \hat{z}\|^2 = \mathcal{L}_r(\hat{x}, \hat{z}, \hat{y}).$$

(ii) For the sequence $\{d^k\}$ defined in (60a)-(60c), we have that $d^{k_q} \in \partial\mathcal{L}_r(x^{k_q}, z^{k_q}, y^{k_q})$ for every $q \geq 1$ and $d^{k_q} \to 0$ as $q \to +\infty$, while $(x^{k_q}, z^{k_q}, y^{k_q}) \to (\hat{x}, \hat{z}, \hat{y})$ and $\mathcal{L}_r(x^{k_q}, z^{k_q}, y^{k_q}) \to \mathcal{L}_r(\hat{x}, \hat{z}, \hat{y})$ as $q \to +\infty$. The closedness criterion of the limiting subdifferential guarantees that $0 \in \partial\mathcal{L}_r(\hat{x}, \hat{z}, \hat{y})$ or, in other words, $(\hat{x}, \hat{z}, \hat{y}) \in \operatorname{crit}(\mathcal{L}_r)$. Choosing now an element $(\hat{x}, \hat{z}, \hat{y}) \in \operatorname{crit}(\mathcal{L}_r)$, it holds
$$0 = \nabla h(\hat{x}) + A^*\hat{y} + rA^*(A\hat{x} - \hat{z}), \qquad 0 \in \partial g(\hat{z}) - \hat{y} - r(A\hat{x} - \hat{z}), \qquad 0 = A\hat{x} - \hat{z},$$
which is further equivalent to
$$-A^*\hat{y} = \nabla h(\hat{x}), \qquad \hat{y} \in \partial g(\hat{z}), \qquad \hat{z} = A\hat{x}.$$

(iii)-(iv) The proof follows in the lines of the proof of Theorem 5 (ii)-(iii) in [9], also by taking into consideration [9, Remark 5], according to which the properties in (iii) and (iv) are generic for sequences satisfying $(x^{k+1}, z^{k+1}, y^{k+1}) - (x^k, z^k, y^k) \to 0$ as $k \to +\infty$, which is indeed the case due to (58).

(v) The conclusion follows according to the first two statements of this lemma and the third statement of Theorem 7.

Remark 3. An element $(\hat{x}, \hat{z}, \hat{y}) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m$ fulfilling
$$-A^*\hat{y} = \nabla h(\hat{x}), \qquad \hat{y} \in \partial g(\hat{z}), \qquad \hat{z} = A\hat{x}$$
is a so-called KKT point of the optimization problem (1). For such a KKT point we have
$$0 \in A^*\partial g(A\hat{x}) + \nabla h(\hat{x}). \quad (73)$$
When $A$ is injective this is further equivalent to
$$0 \in \partial(g \circ A)(\hat{x}) + \nabla h(\hat{x}) = \partial(g \circ A + h)(\hat{x}), \quad (74)$$
in other words, $\hat{x}$ is a critical point of the optimization problem (1). On the other hand, when the functions $g$ and $h$ are convex, then (73) and (74) are equivalent, which means that $\hat{x}$ is a global optimal solution of the optimization problem (1). In this case, $\hat{y}$ is a global optimal solution of the Fenchel dual problem of (1).

By combining Lemma 9, Theorem 7 and the lemma above, one obtains the following result.

Lemma.
Let Assumption 1 be satisfied and $\{(x^k, z^k, y^k)\}$ be a sequence generated by Algorithm 1 or Algorithm 2, which is assumed to be bounded. Denote by $\Omega := \omega\big(\{(x^k, z^k, y^k, x^{k-1}, y^{k-1})\}_{k \geq 1}\big)$. The following statements are true:

(i) it holds $\Omega \subseteq \{(\hat{x}, \hat{z}, \hat{y}, \hat{x}, \hat{y}) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m \times \mathbb{R}^n \times \mathbb{R}^m : (\hat{x}, \hat{z}, \hat{y}) \in \operatorname{crit}(\mathcal{L}_r)\}$;

(ii) we have $\lim_{k \to +\infty} \operatorname{dist}\big((x^k, z^k, y^k, x^{k-1}, y^{k-1}), \Omega\big) = 0$;

(iii) the set $\Omega$ is nonempty, connected and compact;

(iv) the regularized augmented Lagrangian $F_r$ takes on $\Omega$ the value $F_* = \lim_{k \to +\infty} F_k$, as the objective function $g \circ A + h$ does on $\operatorname{Pr}_{\mathbb{R}^n}\Omega$.
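As an aside, the KKT system $-A^*\hat{y} = \nabla h(\hat{x})$, $\hat{y} \in \partial g(\hat{z})$, $\hat{z} = A\hat{x}$ can be verified numerically on concrete data. The following sketch is not part of the paper: the instance $g = \|\cdot\|_1$, $h(x) = \frac{1}{2}\|x - b\|^2$, $A = \mathrm{Id}$ and all function names are our own illustrative choices. It evaluates the three residuals at the closed-form soft-thresholding solution; for the $\ell_1$-norm the subdifferential inclusion reduces to a componentwise sign condition.

```python
import numpy as np

def kkt_residuals(A, grad_h, x_hat, z_hat, y_hat):
    """Residuals of the KKT system -A^T y = grad_h(x), z = A x, together with
    a violation measure for y in subdiff g(z) in the special case g = ||.||_1."""
    r_stat = np.linalg.norm(A.T @ y_hat + grad_h(x_hat))   # stationarity
    r_feas = np.linalg.norm(A @ x_hat - z_hat)             # feasibility z = Ax
    # y in subdiff ||.||_1(z): y_i = sign(z_i) where z_i != 0, |y_i| <= 1 otherwise
    on = z_hat != 0
    r_sign = np.max(np.abs(y_hat[on] - np.sign(z_hat[on])), initial=0.0)
    r_box = max(np.max(np.abs(y_hat[~on]), initial=0.0) - 1.0, 0.0)
    return r_stat, r_feas, max(float(r_sign), float(r_box))

# Hypothetical instance: min ||x||_1 + 0.5 * ||x - b||^2, i.e. A = Id, whose
# solution is given componentwise by soft-thresholding of b at level 1.
b = np.array([3.0, 0.5])
x_hat = np.sign(b) * np.maximum(np.abs(b) - 1.0, 0.0)   # soft-thresholded point
z_hat = x_hat.copy()                                    # z = A x with A = Id
y_hat = b - x_hat                                       # y = -grad_h(x) = b - x
print(kkt_residuals(np.eye(2), lambda x: x - b, x_hat, z_hat, y_hat))
```

For this instance all three residuals vanish up to machine precision, confirming that the soft-thresholded point is a KKT point in the sense above.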
2.3 Convergence analysis under Kurdyka-Łojasiewicz assumptions

In this subsection we prove global convergence for the sequence $\{(x^k, z^k, y^k)\}$ generated by the two nonconvex proximal ADMM algorithms in the context of the KŁ property. The origins of this notion go back to the pioneering work of Kurdyka, who introduced in [8] a general form of the Łojasiewicz inequality ([33]). A further extension to the nonsmooth setting has been proposed and studied in [6, 7, 8].

We recall that the distance function to a given set $\Omega \subseteq \mathbb{R}^N$ is defined for every $x$ by $\operatorname{dist}(x, \Omega) := \inf\{\|x - y\| : y \in \Omega\}$. If $\Omega = \emptyset$, then $\operatorname{dist}(x, \Omega) = +\infty$.

Definition 1. Let $\eta \in (0, +\infty]$. We denote by $\Phi_\eta$ the set of all concave and continuous functions $\varphi \colon [0, \eta) \to [0, +\infty)$ which satisfy the following conditions:
1. $\varphi(0) = 0$;
2. $\varphi$ is $C^1$ on $(0, \eta)$ and continuous at 0;
3. $\varphi'(s) > 0$ for all $s \in (0, \eta)$.

Definition 2. Let $\Psi \colon \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ be proper and lower semicontinuous.
1. The function $\Psi$ is said to have the Kurdyka-Łojasiewicz (KŁ) property at a point $\hat{u} \in \operatorname{dom}\partial\Psi := \{u \in \mathbb{R}^N : \partial\Psi(u) \neq \emptyset\}$ if there exist $\eta \in (0, +\infty]$, a neighborhood $U$ of $\hat{u}$ and a function $\varphi \in \Phi_\eta$ such that for every
$$u \in U \cap [\Psi(\hat{u}) < \Psi(u) < \Psi(\hat{u}) + \eta]$$
the following inequality holds:
$$\varphi'\big(\Psi(u) - \Psi(\hat{u})\big)\operatorname{dist}\big(0, \partial\Psi(u)\big) \geq 1.$$
2. If $\Psi$ satisfies the KŁ property at each point of $\operatorname{dom}\partial\Psi$, then $\Psi$ is called a KŁ function.

The functions $\varphi$ belonging to the set $\Phi_\eta$ for $\eta \in (0, +\infty]$ are called desingularization functions. The KŁ property reveals the possibility to reparameterize the values of $\Psi$ in order to avoid flatness around the critical points. To the class of KŁ functions belong semialgebraic, real subanalytic and uniformly convex functions, as well as convex functions satisfying a growth condition. We refer the reader to [1, 3, 4, 6, 7, 8, 9] and the references therein for more properties of KŁ functions and illustrative examples.

The following result, taken from [9, Lemma 6], will be crucial in our convergence analysis.

Lemma 3
(Uniformized KŁ property). Let $\Omega$ be a compact set and $\Psi \colon \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ be a proper and lower semicontinuous function. Assume that $\Psi$ is constant on $\Omega$ and satisfies the KŁ property at each point of $\Omega$. Then there exist $\varepsilon > 0$, $\eta > 0$ and $\varphi \in \Phi_\eta$ such that for every $\hat{u} \in \Omega$ and every element $u$ in the intersection
$$\{u \in \mathbb{R}^N : \operatorname{dist}(u, \Omega) < \varepsilon\} \cap [\Psi(\hat{u}) < \Psi(u) < \Psi(\hat{u}) + \eta]$$
it holds
$$\varphi'\big(\Psi(u) - \Psi(\hat{u})\big)\operatorname{dist}\big(0, \partial\Psi(u)\big) \geq 1.$$

Working in the hypotheses of the lemma above, we define for every $k \geq 1$
$$E_k := F_r\big(x^k, z^k, y^k, x^{k-1}, y^{k-1}\big) - F_* = F_k - F_* \geq 0, \quad (75)$$
where $F_*$ is the limit of $\{F_k\}_{k \geq 1}$ as $k \to +\infty$. The sequence $\{E_k\}_{k \geq 1}$ is monotonically decreasing and converges to 0 as $k \to +\infty$.

The next result shows that, when the regularization $F_r$ of the augmented Lagrangian is a KŁ function, the sequence $\{(x^k, z^k, y^k)\}$ converges to a KKT point of the optimization problem (1).

Theorem 4. Let Assumption 1 be satisfied and $\{(x^k, z^k, y^k)\}$ be a sequence generated by Algorithm 1 or Algorithm 2, which is assumed to be bounded. If $F_r$ is a KŁ function, then the following statements are true:

(i) the sequence $\{(x^k, z^k, y^k)\}$ has finite length, namely,
$$\sum_{k \geq 0}\|x^{k+1} - x^k\| < +\infty, \qquad \sum_{k \geq 0}\|z^{k+1} - z^k\| < +\infty, \qquad \sum_{k \geq 0}\|y^{k+1} - y^k\| < +\infty; \quad (76)$$
(ii) the sequence $\{(x^k, z^k, y^k)\}$ converges to a KKT point of the optimization problem (1).

Proof. As in the lemma above, we denote $\Omega := \omega\big(\{(x^k, z^k, y^k, x^{k-1}, y^{k-1})\}_{k \geq 1}\big)$, which is a nonempty set. Let $(\hat{x}, \hat{z}, \hat{y}, \hat{x}, \hat{y}) \in \Omega$, thus $F_r(\hat{x}, \hat{z}, \hat{y}, \hat{x}, \hat{y}) = F_*$. We have seen that $\{E_k = F_k - F_*\}_{k \geq 1}$ converges to 0 as $k \to +\infty$ and will consider, consequently, two cases.

First we assume that there exists an integer $\bar{k} \geq 1$ such that $E_{\bar{k}} = 0$ or, equivalently, $F_{\bar{k}} = F_*$. Due to the monotonicity of $\{E_k\}_{k \geq 1}$, it follows that $E_k = 0$ or, equivalently, $F_k = F_*$ for all $k \geq \bar{k}$. Combining inequality (57) with Lemma 5, it yields that $x^{k+1} = x^k$ for all $k \geq \bar{k} + 1$. Using Lemma 5 (iii) and telescoping sum arguments, it yields $\sum_{k \geq 1}\|y^{k+1} - y^k\| < +\infty$. Finally, by using Lemma 5 (i), we get that $\sum_{k \geq 1}\|z^{k+1} - z^k\| < +\infty$.

Consider now the case when $E_k > 0$ or, equivalently, $F_k > F_*$ for every $k \geq k_1$. According to Lemma 3, there exist $\varepsilon > 0$, $\eta > 0$ and a desingularization function $\varphi$ such that for every element $u$ in the intersection
$$\{u \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m \times \mathbb{R}^n \times \mathbb{R}^m : \operatorname{dist}(u, \Omega) < \varepsilon\} \cap \{u : F_* < F_r(u) < F_* + \eta\} \quad (77)$$
it holds $\varphi'(F_r(u) - F_*)\operatorname{dist}(0, \partial F_r(u)) \geq 1$. Let $k_2 \geq 1$ be such that for all $k \geq k_2$
$$F_* < F_k < F_* + \eta.$$
Since $\lim_{k \to +\infty}\operatorname{dist}\big((x^k, z^k, y^k, x^{k-1}, y^{k-1}), \Omega\big) = 0$, there exists $k_3 \geq 1$ such that for all $k \geq k_3$
$$\operatorname{dist}\big((x^k, z^k, y^k, x^{k-1}, y^{k-1}), \Omega\big) < \varepsilon.$$
Thus $(x^k, z^k, y^k, x^{k-1}, y^{k-1})$ belongs to the intersection in (77) for all $k \geq k_0 := \max\{k_1, k_2, k_3\}$, which further implies
$$\varphi'(F_k - F_*)\operatorname{dist}\big(0, \partial F_r(x^k, z^k, y^k, x^{k-1}, y^{k-1})\big) = \varphi'(E_k)\operatorname{dist}\big(0, \partial F_r(x^k, z^k, y^k, x^{k-1}, y^{k-1})\big) \geq 1. \quad (78)$$

Define for two arbitrary nonnegative integers $p$ and $q$
$$\Delta_{p,q} := \varphi(F_p - F_*) - \varphi(F_q - F_*) = \varphi(E_p) - \varphi(E_q).$$
Then for all $K \geq k_0 \geq 1$ it holds
$$\sum_{k=k_0}^{K}\Delta_{k,k+1} = \Delta_{k_0, K+1} = \varphi(E_{k_0}) - \varphi(E_{K+1}) \leq \varphi(E_{k_0}),$$
from which we get $\sum_{k \geq 1}\Delta_{k,k+1} < +\infty$.

By combining Theorem 7 (i) with the concavity of $\varphi$ we obtain for all $k \geq k_0$
$$\Delta_{k,k+1} = \varphi(E_k) - \varphi(E_{k+1}) \geq \varphi'(E_k)[E_k - E_{k+1}] = \varphi'(E_k)[F_k - F_{k+1}] \geq \varphi'(E_k)\frac{C_0}{4}\|x^{k+1} - x^k\|^2. \quad (79)$$
The last relation combined with (78) implies for all $k \geq k_0$
$$\|x^{k+1} - x^k\|^2 \leq \frac{4}{C_0}\Delta_{k,k+1}\operatorname{dist}\big(0, \partial F_r(x^k, z^k, y^k, x^{k-1}, y^{k-1})\big).$$
By the arithmetic mean-geometric mean inequality and Corollary 10 we have that for every $k \geq k_0$ and every $\beta > 0$
$$\|x^{k+1} - x^k\| \leq \sqrt{\frac{4}{C_0}\Delta_{k,k+1}\operatorname{dist}\big(0, \partial F_r(x^k, z^k, y^k, x^{k-1}, y^{k-1})\big)} \leq \beta\Delta_{k,k+1} + \frac{1}{C_0\beta}\operatorname{dist}\big(0, \partial F_r(x^k, z^k, y^k, x^{k-1}, y^{k-1})\big)$$
$$\leq \beta\Delta_{k,k+1} + \frac{C_{11}}{C_0\beta}\big(\|x^k - x^{k-1}\| + \|x^{k-1} - x^{k-2}\| + \|x^{k-2} - x^{k-3}\|\big) + \frac{C_{12}}{\beta}\big(\|A^*(y^{k-1} - y^{k-2})\|^2 - \|A^*(y^k - y^{k-1})\|^2\big) + \frac{C_{13}}{\beta}\big(\|A^*(y^{k-2} - y^{k-3})\|^2 - \|A^*(y^{k-1} - y^{k-2})\|^2\big). \quad (80)$$

We denote for every $k \geq 3$
$$a_k := \|x^k - x^{k-1}\| \geq 0,$$
$$\delta_k := \beta\Delta_{k,k+1} + \frac{C_{12}}{\beta}\big(\|A^*(y^{k-1} - y^{k-2})\|^2 - \|A^*(y^k - y^{k-1})\|^2\big) + \frac{C_{13}}{\beta}\big(\|A^*(y^{k-2} - y^{k-3})\|^2 - \|A^*(y^{k-1} - y^{k-2})\|^2\big).$$
The inequality (80) is nothing other than (3) with $c_0 = c_1 = c_2 := \frac{C_{11}}{C_0\beta}$. Observe that for every $K \geq k_0$ we have
$$\sum_{k=k_0}^{K}\delta_k \leq \beta\varphi(E_{k_0}) + \frac{C_{12}}{\beta}\|A^*(y^{k_0-1} - y^{k_0-2})\|^2 + \frac{C_{13}}{\beta}\|A^*(y^{k_0-2} - y^{k_0-3})\|^2,$$
and thus, by choosing $\beta > \frac{3C_{11}}{C_0}$, we can use Lemma 3 to conclude that $\sum_{k \geq 1}\|x^{k+1} - x^k\| < +\infty$. The other two statements in (76) follow from Lemma 5. This means that the sequence $\{(x^k, z^k, y^k)\}$ is a Cauchy sequence, thus it converges to an element $(\hat{x}, \hat{z}, \hat{y})$ which is, according to the lemma above, a KKT point of the optimization problem (1).

Remark 4. The function $F_r$ is a KŁ function if, for instance, the objective function of (1) is semialgebraic, which is the case when the functions $g$ and $h$ are semialgebraic.

3 Convergence rates under Łojasiewicz assumptions

In this section we derive convergence rates for the sequence $\{(x^k, z^k, y^k)\}$ generated by Algorithm 1 or Algorithm 2, as well as for the regularized augmented Lagrangian function $F_r$ along this sequence, provided that the latter satisfies the Łojasiewicz property.

3.1 Łojasiewicz property and a technical lemma

We recall the following definition from [1] (see also [33]).

Definition 3. Let $\Psi \colon \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ be proper and lower semicontinuous. Then $\Psi$ satisfies the Łojasiewicz property if for any critical point $\hat{u}$ of $\Psi$ there exist $C_L > 0$, $\theta \in [0, 1)$ and $\varepsilon > 0$ such that
$$|\Psi(u) - \Psi(\hat{u})|^{\theta} \leq C_L\operatorname{dist}\big(0, \partial\Psi(u)\big) \quad \forall u \in \operatorname{Ball}(\hat{u}, \varepsilon), \quad (81)$$
where $\operatorname{Ball}(\hat{u}, \varepsilon)$ denotes the open ball with center $\hat{u}$ and radius $\varepsilon$.
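The definition can be made concrete on a model function. The sketch below is our own illustration, not taken from the paper: for $\Psi(u) = |u|^p$ with $p = 4$, the exponent $\theta = 1 - 1/p = 3/4$ and constant $C_L = 1/p$ satisfy the Łojasiewicz inequality at the critical point $\hat{u} = 0$, which the code checks on a grid of the ball $\operatorname{Ball}(0, 1)$.

```python
import numpy as np

# Model function Psi(u) = |u|**p with critical point u_hat = 0, Psi(0) = 0.
p = 4.0
theta = 1.0 - 1.0 / p   # candidate Lojasiewicz exponent
C_L = 1.0 / p           # then |u|**(p*theta) = |u|**(p-1) = C_L * |Psi'(u)| exactly

u = np.linspace(-1.0, 1.0, 2001)             # grid of Ball(0, 1)
lhs = np.abs(np.abs(u) ** p - 0.0) ** theta  # |Psi(u) - Psi(u_hat)|**theta
rhs = C_L * np.abs(p * np.sign(u) * np.abs(u) ** (p - 1))  # C_L * dist(0, dPsi(u))
print(bool(np.all(lhs <= rhs + 1e-12)))      # prints True
```

The corresponding desingularization function is $\varphi(s) = \frac{C_L}{1-\theta}s^{1-\theta}$; smaller exponents $\theta$ make $\varphi$ steeper near 0 and, as the rate lemma below quantifies, yield faster convergence.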
Provided that Assumption 1 is fulfilled and $\{(x^k, z^k, y^k)\}$ is the sequence generated by Algorithm 1 or Algorithm 2, which is assumed to be bounded, we have seen above that the set of cluster points $\Omega = \omega\big(\{(x^k, z^k, y^k, x^{k-1}, y^{k-1})\}\big)$ is nonempty, compact and connected, and that $F_r$ takes on $\Omega$ the value $F_*$; moreover, for any $(\hat{x}, \hat{z}, \hat{y}, \hat{x}, \hat{y}) \in \Omega$, $(\hat{x}, \hat{z}, \hat{y})$ belongs to $\operatorname{crit}(\mathcal{L}_r)$. According to [1, Lemma 1], if $F_r$ has the Łojasiewicz property, then there exist $C_L > 0$, $\theta \in [0, 1)$ and $\varepsilon > 0$ such that for any $(x, z, y, x', y') \in \{u : \operatorname{dist}(u, \Omega) < \varepsilon\}$ it holds
$$|F_r(x, z, y, x', y') - F_*|^{\theta} \leq C_L\operatorname{dist}\big(0, \partial F_r(x, z, y, x', y')\big).$$
Obviously, $F_r$ is then a KŁ function with desingularization function $\varphi \colon [0, +\infty) \to [0, +\infty)$, $\varphi(s) := \frac{C_L}{1-\theta}s^{1-\theta}$, which, according to Theorem 4, means that $\Omega$ contains a single element $(\hat{x}, \hat{z}, \hat{y}, \hat{x}, \hat{y})$, namely, the limit of $\{(x^k, z^k, y^k, x^{k-1}, y^{k-1})\}$ as $k \to +\infty$. In other words, if $F_r$ has the Łojasiewicz property, then there exist $C_L > 0$, $\theta \in [0, 1)$ and $\varepsilon > 0$ such that
$$|F_r(x, z, y, x', y') - F_*|^{\theta} \leq C_L\operatorname{dist}\big(0, \partial F_r(x, z, y, x', y')\big) \quad \forall (x, z, y, x', y') \in \operatorname{Ball}\big((\hat{x}, \hat{z}, \hat{y}, \hat{x}, \hat{y}), \varepsilon\big). \quad (82)$$
In this case, $F_r$ is said to satisfy the Łojasiewicz property with Łojasiewicz constant $C_L > 0$ and Łojasiewicz exponent $\theta \in [0, 1)$.

The following lemma provides convergence rates for a particular class of monotonically decreasing sequences converging to 0.

Lemma 5. Let $\{e_k\}_{k \geq 0}$ be a monotonically decreasing sequence in $\mathbb{R}_+$ converging to 0. Assume further that there exist natural numbers $k_0 \geq l_0 \geq 1$ such that for every $k \geq k_0$
$$e_{k-l_0} - e_k \geq C_e e_k^{2\theta}, \quad (83)$$
where $C_e > 0$ is some constant and $\theta \in [0, 1)$. Then the following statements are true:

(i) if $\theta = 0$, then $\{e_k\}$ converges in finite time;

(ii) if $\theta \in (0, 1/2]$, then there exist $C_{e,0} > 0$ and $Q \in [0, 1)$ such that for every $k \geq k_0$
$$0 \leq e_k \leq C_{e,0}\,Q^k;$$

(iii) if $\theta \in (1/2, 1)$, then there exists $C_{e,1} > 0$ such that for every $k \geq k_0 + l_0$
$$0 \leq e_k \leq C_{e,1}(k - l_0 + 1)^{-\frac{1}{2\theta-1}}.$$

Proof. Fix an integer $k \geq k_0$. Since $k_0 \geq l_0 \geq 1$, the recurrence inequality (83) is well defined for every $k \geq k_0$.
(i) The case $\theta = 0$. We assume that $e_k > 0$ for every $k \geq k_0$. From (83) we get
$$e_{k-l_0} - e_k \geq C_e > 0 \quad \text{for every } k \geq k_0,$$
which leads to a contradiction with the fact that $\{e_k\}$ converges to 0 as $k \to +\infty$. Consequently, there exists $k_1 \geq k_0$ such that $e_k = 0$ for every $k \geq k_1$, and thus the conclusion follows.

For the proof of (ii) and (iii) we can assume that $e_k > 0$ for every $k \geq k_0$. Otherwise, as $\{e_k\}$ is monotonically decreasing and converges to 0, the sequence is constant beginning with a given index, which means that both statements are true.

(ii) The case $\theta \in (0, 1/2]$. We have $e_k \leq e_0$, thus $e_0^{2\theta-1}e_k \leq e_k^{2\theta}$, which leads to
$$e_{k-l_0} - e_k \geq C_e e_k^{2\theta} \geq C_e e_0^{2\theta-1}e_k \geq 0.$$
Therefore
$$e_k \leq \frac{1}{C_e e_0^{2\theta-1} + 1}\,e_{k-l_0} \leq \cdots \leq \Big(\frac{1}{C_e e_0^{2\theta-1} + 1}\Big)^{\left\lfloor\frac{k-k_0}{l_0}\right\rfloor}\max\{e_{k_0+j} : j = 0, \ldots, l_0 - 1\} \leq C_{e,0}\,Q^k, \quad \text{with } Q := \sqrt[l_0]{\frac{1}{C_e e_0^{2\theta-1} + 1}} \text{ and } C_{e,0} := Q^{-k_0-l_0+1}\max\{e_{k_0+j} : j = 0, \ldots, l_0 - 1\},$$
where $\lfloor p \rfloor$ denotes the greatest integer that is less than or equal to the real number $p$. This provides the linear convergence rate, as $Q = \big(C_e e_0^{2\theta-1} + 1\big)^{-1/l_0} \in [0, 1)$.

(iii) The case $\theta \in (1/2, 1)$. From (83) we get
$$C_e \leq (e_{k-l_0} - e_k)\,e_k^{-2\theta}. \quad (84)$$
Define $\zeta \colon (0, +\infty) \to \mathbb{R}$, $\zeta(s) := s^{-2\theta}$. We have that
$$\frac{d}{ds}\Big(\frac{s^{1-2\theta}}{1-2\theta}\Big) = s^{-2\theta} = \zeta(s) \quad \text{and} \quad \zeta'(s) = -2\theta s^{-2\theta-1} < 0 \quad \forall s \in (0, +\infty).$$
Consequently, $\zeta(e_{k-l_0}) \leq \zeta(s)$ for all $s \in [e_k, e_{k-l_0}]$.

Assume that $\zeta(e_k) \leq 2\zeta(e_{k-l_0})$. Then (84) gives
$$C_e \leq (e_{k-l_0} - e_k)\,\zeta(e_k) \leq 2(e_{k-l_0} - e_k)\,\zeta(e_{k-l_0}) \leq 2\int_{e_k}^{e_{k-l_0}}\zeta(s)\,ds = \frac{2}{1-2\theta}\big(e_{k-l_0}^{1-2\theta} - e_k^{1-2\theta}\big),$$
or, equivalently,
$$e_k^{1-2\theta} - e_{k-l_0}^{1-2\theta} \geq C_1, \quad \text{where } C_1 := \frac{(2\theta-1)C_e}{2} > 0. \quad (85)$$

Assume now that $\zeta(e_k) > 2\zeta(e_{k-l_0})$, in other words, $e_k^{-2\theta} > 2e_{k-l_0}^{-2\theta}$. For $\nu := \big(\tfrac{1}{2}\big)^{\frac{1}{2\theta}} \in (0, 1)$ this is equivalent to
$$\nu e_{k-l_0} \geq e_k \iff \nu^{1-2\theta}e_{k-l_0}^{1-2\theta} \leq e_k^{1-2\theta} \iff \big(\nu^{1-2\theta} - 1\big)e_{k-l_0}^{1-2\theta} \leq e_k^{1-2\theta} - e_{k-l_0}^{1-2\theta}.$$
Recall that $\nu^{1-2\theta} - 1 > 0$, since $1 - 2\theta < 0$, and $e_0^{1-2\theta} \leq e_{k-l_0}^{1-2\theta}$, since $\{e_k\}$ is monotonically decreasing, and thus
$$e_k^{1-2\theta} - e_{k-l_0}^{1-2\theta} \geq \big(\nu^{1-2\theta} - 1\big)e_{k-l_0}^{1-2\theta} \geq C_2, \quad \text{where } C_2 := \big(\nu^{1-2\theta} - 1\big)e_0^{1-2\theta} > 0. \quad (86)$$

In both situations we get for every $i \geq k_0$
$$e_i^{1-2\theta} - e_{i-l_0}^{1-2\theta} \geq C := \min\{C_1, C_2\} > 0, \quad (87)$$
where $C_1$ and $C_2$ are defined as in (85) and (86), respectively.

For every $k \geq k_0 + l_0$, by summing up the inequalities (87) for $i = k_0 + l_0, \ldots, k$, we get
$$\sum_{j=0}^{l_0-1}\big(e_{k-j}^{1-2\theta} - e_{k_0+j}^{1-2\theta}\big) \geq C(k - k_0 - l_0 + 1) > 0.$$
Using the fact that $1 - 2\theta < 0$ and the monotonicity of $\{e_i\}_{i \geq 0}$, it yields
$$e_{k_0+l_0-1} \leq \cdots \leq e_{k_0} \iff e_{k_0+l_0-1}^{1-2\theta} \geq \cdots \geq e_{k_0}^{1-2\theta},$$
and thus
$$l_0\big(e_k^{1-2\theta} - e_{k_0}^{1-2\theta}\big) \geq \sum_{j=0}^{l_0-1}\big(e_{k-j}^{1-2\theta} - e_{k_0+j}^{1-2\theta}\big) \geq C(k - k_0 - l_0 + 1),$$
which gives
$$e_k^{1-2\theta} \geq e_{k_0}^{1-2\theta} + \frac{k - k_0 - l_0 + 1}{l_0}\,C. \quad (88)$$
Moreover, we obtain from (87) that
$$e_{k_0}^{1-2\theta} \geq \Big\lfloor\frac{k_0 + l_0}{l_0}\Big\rfloor C \geq \frac{k_0}{l_0}\,C. \quad (89)$$
By plugging (89) into (88) we obtain
$$e_k^{1-2\theta} \geq \frac{k - l_0 + 1}{l_0}\,C,$$
which implies
$$e_k \leq \Big(\frac{C}{l_0}\Big)^{-\frac{1}{2\theta-1}}(k - l_0 + 1)^{-\frac{1}{2\theta-1}}. \quad (90)$$
This concludes the proof.

Remark 5. The inequality in Lemma 5 (iii) can be written in terms of $k$ instead of $k - l_0 + 1$ when $k$ is large enough. For instance, when $k \geq \frac{\gamma}{\gamma-1}(l_0 - 1)$ for some $\gamma > 1$, then we have that $k - l_0 + 1 \geq \frac{k}{\gamma}$, and thus from (90) we get
$$e_k \leq \Big(\frac{C}{l_0}\Big)^{-\frac{1}{2\theta-1}}(k - l_0 + 1)^{-\frac{1}{2\theta-1}} \leq \Big(\frac{C}{l_0\gamma}\Big)^{-\frac{1}{2\theta-1}}k^{-\frac{1}{2\theta-1}}.$$

3.2 Convergence rates

In this subsection we study the convergence rates of Algorithm 1 and Algorithm 2 in the context of an assumption which is slightly more restrictive than Assumption 1.

Assumption 2. We work in the hypotheses of Assumption 1, except for (9), which is replaced by
$$M_3^k := M_1^k + rA^*A - \frac{L + C_1}{2}\operatorname{Id} \succeq \frac{5C_0}{2}\operatorname{Id} \succ 0. \quad (91)$$

Notice that (91) can be written as
$$M_1^k + rA^*A - \Big(L + \frac{\bar{C}_M}{r}\Big)\operatorname{Id} \succeq 0,$$
where $\bar{C}_M$ is an explicit constant, depending only on $L$, $\mu_1$ and $\|T\|$, whose expression differs slightly between the two algorithms (92). Therefore (91) is nothing else than (10) after replacing $C_M$ by the bigger constant $\bar{C}_M$. So, all the examples discussed in connection with Assumption 1 can be adapted to the new setting and provide frameworks which guarantee Assumption 2. The scenarios which ensure Assumption 2 evidently satisfy Assumption 1, therefore the results of the previous section remain valid in this setting. In what follows we provide improvements of the statements used in the convergence analysis, which can be obtained thanks to Assumption 2 by using similar techniques.

Firstly, by the same arguments as in Lemma 4, we have that for every $k \geq 1$
$$\mathcal{L}_r(x^{k+1}, z^{k+1}, y^{k+1}) + \|x^{k+1} - x^k\|^2_{M_1^k + rA^*A - \frac{L}{2}\operatorname{Id}} + \|z^{k+1} - z^k\|^2_{M_2^k} \leq \mathcal{L}_r(x^k, z^k, y^k) + \frac{1}{\rho r}\|y^{k+1} - y^k\|^2 \quad (93)$$
and (see (3), (33), (4) and (43))
$$\|y^{k+1} - y^k\|^2 \leq C_1L^2\|x^{k+1} - x^k\|^2 + C_0\|x^k - x^{k-1}\|^2 + T_0\|A^*(y^k - y^{k-1})\|^2 - T_0\|A^*(y^{k+1} - y^k)\|^2. \quad (94)$$
By multiplying (94) by a suitable positive constant and adding the resulting inequality to (93), we obtain for every $k \geq 1$
$$\mathcal{L}_r(x^{k+1}, z^{k+1}, y^{k+1}) + \|x^{k+1} - x^k\|^2_{M_3^k} + T_0\|A^*(y^{k+1} - y^k)\|^2 + \|z^{k+1} - z^k\|^2_{M_2^k} \leq \mathcal{L}_r(x^k, z^k, y^k) + T_0\|A^*(y^k - y^{k-1})\|^2 + C_0\|x^k - x^{k-1}\|^2. \quad (95)$$
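The trichotomy of Lemma 5 can also be observed numerically. The sketch below is our own construction, not from the paper: it builds a sequence that satisfies (83) with equality for $l_0 = 1$ and $C_e = 1$, by solving $e_k + C_e e_k^{2\theta} = e_{k-1}$ with bisection, and then reads off the predicted rates, a geometric rate for $\theta = 1/2$ and the sublinear rate $e_k \sim k^{-1/(2\theta-1)}$ for $\theta = 3/4$.

```python
import numpy as np

def next_term(e_prev, C_e, theta):
    # Solve e + C_e * e**(2*theta) = e_prev for e in (0, e_prev) by bisection,
    # so that the resulting sequence satisfies (83) with equality and l_0 = 1.
    lo, hi = 0.0, e_prev
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if mid + C_e * mid ** (2.0 * theta) > e_prev:
            hi = mid
        else:
            lo = mid
    return lo

def run(theta, n, e0=1.0, C_e=1.0):
    e = [e0]
    for _ in range(n):
        e.append(next_term(e[-1], C_e, theta))
    return e

# Case (ii), theta = 1/2: the recursion reduces to e_k = e_{k-1} / (1 + C_e),
# a geometric (linear) rate with factor Q = 1 / (1 + C_e) = 0.5 here.
lin = run(0.5, 50)
lin_ratio = lin[-1] / lin[-2]

# Case (iii), theta = 3/4: the predicted rate is e_k ~ k**(-1/(2*theta-1)),
# i.e. k**(-2); the log-log slope between k = 10**4 and 2 * 10**4 approaches -2.
sub = run(0.75, 20000)
slope = np.log(sub[20000] / sub[10000]) / np.log(2.0)
print(lin_ratio, slope)   # lin_ratio ≈ 0.5, slope ≈ -2
```

The measured geometric factor and log-log slope match the rates stated in Lemma 5 (ii) and (iii); the constants $e_0 = 1$ and $C_e = 1$ are arbitrary choices for the experiment.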
A proximal minimization algorithm for structured nonconvex and nonsmooth problems
A proximal minimization algorithm for structured nonconvex and nonsmooth problems Radu Ioan Boţ Ernö Robert Csetnek Dang-Khoa Nguyen May 8, 08 Abstract. We propose a proximal algorithm for minimizing objective
More informationSecond order forward-backward dynamical systems for monotone inclusion problems
Second order forward-backward dynamical systems for monotone inclusion problems Radu Ioan Boţ Ernö Robert Csetnek March 6, 25 Abstract. We begin by considering second order dynamical systems of the from
More informationADMM for monotone operators: convergence analysis and rates
ADMM for monotone operators: convergence analysis and rates Radu Ioan Boţ Ernö Robert Csetne May 4, 07 Abstract. We propose in this paper a unifying scheme for several algorithms from the literature dedicated
More informationDouglas-Rachford splitting for nonconvex feasibility problems
Douglas-Rachford splitting for nonconvex feasibility problems Guoyin Li Ting Kei Pong Jan 3, 015 Abstract We adapt the Douglas-Rachford DR) splitting method to solve nonconvex feasibility problems by studying
More informationAn inexact subgradient algorithm for Equilibrium Problems
Volume 30, N. 1, pp. 91 107, 2011 Copyright 2011 SBMAC ISSN 0101-8205 www.scielo.br/cam An inexact subgradient algorithm for Equilibrium Problems PAULO SANTOS 1 and SUSANA SCHEIMBERG 2 1 DM, UFPI, Teresina,
More informationIntroduction and Preliminaries
Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis
More informationarxiv: v2 [math.oc] 21 Nov 2017
Unifying abstract inexact convergence theorems and block coordinate variable metric ipiano arxiv:1602.07283v2 [math.oc] 21 Nov 2017 Peter Ochs Mathematical Optimization Group Saarland University Germany
More informationStructural and Multidisciplinary Optimization. P. Duysinx and P. Tossings
Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be
More informationErdinç Dündar, Celal Çakan
DEMONSTRATIO MATHEMATICA Vol. XLVII No 3 2014 Erdinç Dündar, Celal Çakan ROUGH I-CONVERGENCE Abstract. In this work, using the concept of I-convergence and using the concept of rough convergence, we introduced
More informationADVANCE TOPICS IN ANALYSIS - REAL. 8 September September 2011
ADVANCE TOPICS IN ANALYSIS - REAL NOTES COMPILED BY KATO LA Introductions 8 September 011 15 September 011 Nested Interval Theorem: If A 1 ra 1, b 1 s, A ra, b s,, A n ra n, b n s, and A 1 Ě A Ě Ě A n
More informationComputational Statistics and Optimisation. Joseph Salmon Télécom Paristech, Institut Mines-Télécom
Computational Statistics and Optimisation Joseph Salmon http://josephsalmon.eu Télécom Paristech, Institut Mines-Télécom Plan Duality gap and stopping criterion Back to gradient descent analysis Forward-backward
More informationApproaching monotone inclusion problems via second order dynamical systems with linear and anisotropic damping
March 0, 206 3:4 WSPC Proceedings - 9in x 6in secondorderanisotropicdamping206030 page Approaching monotone inclusion problems via second order dynamical systems with linear and anisotropic damping Radu
More informationAn inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions
An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions Radu Ioan Boţ Ernö Robert Csetnek Szilárd Csaba László October, 1 Abstract. We propose a forward-backward
More informationIteration-complexity of first-order penalty methods for convex programming
Iteration-complexity of first-order penalty methods for convex programming Guanghui Lan Renato D.C. Monteiro July 24, 2008 Abstract This paper considers a special but broad class of convex programing CP)
More informationFrom error bounds to the complexity of first-order descent methods for convex functions
From error bounds to the complexity of first-order descent methods for convex functions Nguyen Trong Phong-TSE Joint work with Jérôme Bolte, Juan Peypouquet, Bruce Suter. Toulouse, 23-25, March, 2016 Journées
More informationMathematical Finance
ETH Zürich, HS 2017 Prof. Josef Teichmann Matti Kiiski Mathematical Finance Solution sheet 14 Solution 14.1 Denote by Z pz t q tpr0,t s the density process process of Q with respect to P. (a) The second
More informationProximal Alternating Linearized Minimization for Nonconvex and Nonsmooth Problems
Proximal Alternating Linearized Minimization for Nonconvex and Nonsmooth Problems Jérôme Bolte Shoham Sabach Marc Teboulle Abstract We introduce a proximal alternating linearized minimization PALM) algorithm
More informationProximal Operator and Proximal Algorithms (Lecture notes of UCLA 285J Fall 2016)
Proximal Operator and Proximal Algorithms (Lecture notes of UCLA 285J Fall 206) Instructor: Wotao Yin April 29, 207 Given a function f, the proximal operator maps an input point x to the minimizer of f
More informationOptimization and Optimal Control in Banach Spaces
Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,
More informationKaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization
Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä New Proximal Bundle Method for Nonsmooth DC Optimization TUCS Technical Report No 1130, February 2015 New Proximal Bundle Method for Nonsmooth
More informationON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS
MATHEMATICS OF OPERATIONS RESEARCH Vol. 28, No. 4, November 2003, pp. 677 692 Printed in U.S.A. ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS ALEXANDER SHAPIRO We discuss in this paper a class of nonsmooth
More informationEntropy and Ergodic Theory Lecture 27: Sinai s factor theorem
Entropy and Ergodic Theory Lecture 27: Sinai s factor theorem What is special about Bernoulli shifts? Our main result in Lecture 26 is weak containment with retention of entropy. If ra Z, µs and rb Z,
More informationSome Properties of the Augmented Lagrangian in Cone Constrained Optimization
MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented
More informationSubdifferential representation of convex functions: refinements and applications
Subdifferential representation of convex functions: refinements and applications Joël Benoist & Aris Daniilidis Abstract Every lower semicontinuous convex function can be represented through its subdifferential
More informationChapter 2 Convex Analysis
Chapter 2 Convex Analysis The theory of nonsmooth analysis is based on convex analysis. Thus, we start this chapter by giving basic concepts and results of convexity (for further readings see also [202,
More informationOn proximal-like methods for equilibrium programming
On proximal-lie methods for equilibrium programming Nils Langenberg Department of Mathematics, University of Trier 54286 Trier, Germany, langenberg@uni-trier.de Abstract In [?] Flam and Antipin discussed
More informationBASICS OF CONVEX ANALYSIS
BASICS OF CONVEX ANALYSIS MARKUS GRASMAIR 1. Main Definitions We start with providing the central definitions of convex functions and convex sets. Definition 1. A function f : R n R + } is called convex,
More informationMAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9
MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended
More informationReal Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi
Real Analysis Math 3AH Rudin, Chapter # Dominique Abdi.. If r is rational (r 0) and x is irrational, prove that r + x and rx are irrational. Solution. Assume the contrary, that r+x and rx are rational.
More informationOptimality Conditions for Constrained Optimization
72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)
More informationA Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions
A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions Angelia Nedić and Asuman Ozdaglar April 16, 2006 Abstract In this paper, we study a unifying framework
More informationREAL ANALYSIS II TAKE HOME EXAM. T. Tao s Lecture Notes Set 5
REAL ANALYSIS II TAKE HOME EXAM CİHAN BAHRAN T. Tao s Lecture Notes Set 5 1. Suppose that te 1, e 2, e 3,... u is a countable orthonormal system in a complex Hilbert space H, and c 1, c 2,... is a sequence
More informationA projection-type method for generalized variational inequalities with dual solutions
Available online at www.isr-publications.com/jnsa J. Nonlinear Sci. Appl., 10 (2017), 4812 4821 Research Article Journal Homepage: www.tjnsa.com - www.isr-publications.com/jnsa A projection-type method
More informationDS-GA 1002: PREREQUISITES REVIEW SOLUTIONS VLADIMIR KOBZAR
DS-GA 2: PEEQUISIES EVIEW SOLUIONS VLADIMI KOBZA he following is a selection of questions (drawn from Mr. Bernstein s notes) for reviewing the prerequisites for DS-GA 2. Questions from Ch, 8, 9 and 2 of
More informationOptimization Theory. A Concise Introduction. Jiongmin Yong
October 11, 017 16:5 ws-book9x6 Book Title Optimization Theory 017-08-Lecture Notes page 1 1 Optimization Theory A Concise Introduction Jiongmin Yong Optimization Theory 017-08-Lecture Notes page Optimization
More informationWE consider an undirected, connected network of n
On Nonconvex Decentralized Gradient Descent Jinshan Zeng and Wotao Yin Abstract Consensus optimization has received considerable attention in recent years. A number of decentralized algorithms have been
More informationA Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions
A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions Angelia Nedić and Asuman Ozdaglar April 15, 2006 Abstract We provide a unifying geometric framework for the
More informationOn the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems
On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems Radu Ioan Boţ Ernö Robert Csetnek August 5, 014 Abstract. In this paper we analyze the