The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates

Size: px
Start display at page:

Download "The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates"

Transcription

1 The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates Radu Ioan Boţ Dang-Khoa Nguyen January 6, 08 Abstract. We propose two numerical algorithms for minimizing the sum of a smooth function and the composition of a nonsmooth function with a linear operator in the fully nonconvex setting. The iterative schemes are formulated in the spirit of the proximal and, respectively, proximal linearized alternating direction method of multipliers. The proximal terms are introduced through variable metrics, which facilitates the derivation of proximal splitting algorithms for nonconvex complexly structured optimization problems as particular instances of the general schemes. Convergence of the iterates to a KKT point of the objective function is proved under mild conditions on the sequence of variable metrics and by assuming that a regularization of the associated augmented Lagrangian has the Kurdya- Lojasiewicz property. If the augmented Lagrangian has the Lojasiewicz property, then convergence rates of both augmented Lagrangian and iterates are derived. Keywords. nonconvex complexly structured optimization problems, alternating direction method of multipliers, proximal splitting algorithms, variable metric, convergence analysis, convergence rates, Kurdya- Lojasiewicz property, Lojasiewicz exponent AMS subject classification. 47H05, 65K05, 90C6 Introduction. Problem formulation and motivation In this paper, we address the solving of the optimization problem min tg paxq ` h pxqu, () xprn where g : R m Ñ R Y t`8u is a proper and lower semicontinuous function, h: R n Ñ R is a Fréchet differentiable function with L-Lipschitz continuous gradient and A: R n Ñ R m is a linear operator. The spaces R n and R m are equipped with Euclidean inner products x, y and associated norms a x, y, which are both denoted in the same way, as there is no ris of confusion. We start by briefly describing the Alternating Direction Method of Multipliers (ADMM) in the context of solving the more general problem min tf pxq ` g paxq ` h pxqu, () xprn where g and h are assumed to be also convex and f : R n Ñ R Y t`8u is another proper, convex and lower semicontinuous function. We rewrite the problem (), by introducing an auxiliary variable, as min px,zqpr nˆr m Ax z 0 tf pxq ` g pzq ` h pxqu. (3) Faculty of Mathematics, University of Vienna, Osar-Morgenstern-Platz, 090 Vienna, Austria, radu.bot@ univie.ac.at. Research partially supported by FWF (Austrian Science Fund), project I 49-N3. Invited Associate Professor, Babeş-Bolyai University, Faculty of Mathematics and Computer Sciences, str. Mihail Kogălniceanu, Cluj-Napoca, Romania Faculty of Mathematics, University of Vienna, Osar-Morgenstern-Platz, 090 Vienna, Austria, dang-hoa. nguyen@univie.ac.at. The author gratefully acnowledges the financial support of the Doctoral Programme Vienna Graduate School on Computational Optimization (VGSCO) which is funded by Austrian Science Fund (FWF, project W60-N35).

2 For a fixed real number r ą 0, the augmented Lagrangian associated with problem (3) reads L r : R n ˆ R m ˆ R m Ñ R Y t`8u, L r px, z, yq fpxq ` g pzq ` h pxq ` xy, Ax zy ` r Ax z. Given a starting vector `x 0, z 0, y 0 P R n ˆ R m ˆ R m and tm u Ď R nˆn, ( M Ď Rmˆm, two sequences of symmetric and positive semidefinite matrices, the following proximal ADMM algorithm formulated in the presence of a smooth function and involving variable metrics has been proposed and investigated in [5]: for all ě 0 generate the sequence tpx, z, y qu by x ` P arg min xpr n z ` arg min zpr m # f pxq ` xx x, hpx qy ` r Ax z ` # g pzq ` r Ax` z ` r y y ` y ` `Ax ` z `. r y ` z z M ` + + x x, (4a) M, (4b) In case ρ, it has been proved in [5] that when the set of the Lagrangian associated with (3) (which is nothing else than L r when r 0) is nonempty and the two matrix sequences and A fulfill mild additional assumptions, then the sequence tpx, z, y qu converges to a saddle point of the Lagrangian associated with problem (3) (which is nothing else than L r when r 0) and provides in this way both an optimal solution of () and an optimal solution of its Fenchel dual problem. Furthermore, an ergodic primal-dual gap convergence rate result expressed in terms of the Lagrangian has been shown. In case h 0, the above iterative scheme encompasses different numerical algorithms considered in the literature. When M M 0 for all ě 0, (4a)-(4c) becomes the classical ADMM algorithm ([5,, 5, 6]), which has a huge popularity in the optimization community. And this despite its poor implementation properties caused by the fact that, in general, the calculation of the sequence of primal variables x ( does not correspond to a proximal step. For an inertial version of the classical ADMM algorithm we refer to [0]. When M M and M M for all ě 0, (4a)-(4c) recovers the proximal ADMM algorithm investigated by Shefi and Teboulle in [39] (see also [0, ]). It has been pointed out in [39] that, for suitable choices of the matrices M and M, this proximal ADMM algorithm becomes a primal-dual splitting algorithm in the sense of those considered in [3, 6, 9, 4], and which, due to their full splitting character, overcome the drawbacs of the classical ADMM algorithm. Recently, in [] it ( has been shown that, when f is strongly convex, suitable choices of the non-constant sequences M and ( M may lead to a rate of convergence for the sequence of primal iterates of O p{q. The reason why we address in this paper the slightly less general optimization problem () is exclusively given by the fact that in this setting we can provide sufficient conditions which guarantee that the sequence generated by the ADMM algorithm is bounded. In the nonconvex setting, the boundedness of the sequence tpx, z, y qu plays a central role the convergence analysis. The contributions of the paper are as follows:. We propose a proximal ADMM (P-ADMM) algorithm and a proximal linearized ADMM (PL-ADMM) algorithm for solving () and carry out a convergence analysis in parallel for both algorithms. We first prove under certain assumptions on the matrix sequences boundedness for the sequence of generated iterates tpx, z, y qu. Under these premises, we show that the cluster points of tpx, z, y qu are KKT points of the problem (). Global convergence of the sequence is shown provide that a regularization of the augmented Lagrangian satisfies the Kurdya- Lojasiewicz property. 
In case this regularization of the augmented Lagrangian has the Lojasiewicz property, we derive rates of convergence for the sequence of iterates. To the best of our nowledge, these are the first results in the literature addressing convergence rates for the nonconvex ADMM.. The two ADMM algorithms under investigation are of relaxed type, namely, we allow ρ P p0, q. We notice that ρ is the standard choice in the literature ([, 5,, 30, 39, 4]). Gabay and Mercier proved in [6] in the convex setting that ρ may be chosen in p0, q, however, the majority of the extensions of the convex relaxed ADMM algorithm assume that ρ P 0, `?5 43, 44] or as for a particular choice of ρ, which is interpreted as a step size, see [7]. (4c), see [0,, 5, 40, The only wor in the nonconvex setting dealing with an alternating minimization algorithm, however for the minimization of the sum of a simple nonsmooth with a smooth function, and which allows a relaxed parameter ρ different from is [44].

3 3. Particular outcomes of the proposed algorithms will be full splitting algorithms for solving the nonconvex complexly structured optimization (), which we will obtain by an appropriate choice of the matrix sequences. (P-ADMM) will give rise to an iterative scheme formulated only in terms of proximal steps for the function g and h and of forward evaluations of the matrix A, while (PL-ADMM) will give rise to an iterative scheme in which the function h will be performed via a gradient step. Exact formulas for proximal operators are available not only for large classes of convex ([8]), but also of nonconvex functions ([3, 3, 9]). The fruitful idea to linearize the step involving the smooth term has been used in the past in the context of ADMM algorithms mostly in the convex setting [3, 36, 37, 43, 45]; the paper [3] being the only exception in the nonconvex setting. For previous wors addressing the ADMM algorithm in the nonconvex setting we mention: [30], where () is studied by assuming that h is twice continuously differentiable with bounded Hessian; [4], where the convergence is studied in the context of solving a very particular nonconvex consensus and sharing problems; and [], where the ADMM algorithm is used in the penalized zero-variance discriminant analysis. In [4] and [3], the investigations of the ADMM algorithm are carried out in very restrictive settings generated by the strong assumptions on the nonsmooth functions and linear operators.. Notations and preliminaries Let N be a strictly positive integer. We denote by : p,..., q P R N and write for x : px,..., x N q, y : py,..., y N q P R N x ă y if and only if x i ă y N. The Cartesian product R N ˆ R N ˆ... ˆ R Np with some strictly positive integer p will be endowed with inner product and associated norm defined for u : pu,..., u p q, u : `u,..., up P R N ˆ R N ˆ... ˆ R Np by g fÿ u, u ui, uid and u e p u i, i respectively. Moreover, for every u : pu,..., u p q, u : `u,..., u p P R N ˆ R N ˆ... ˆ R Np we have ÿ p? p i g fÿ u i ď u e p u i ď i i pÿ u i. (5) We denote by S Ǹ the family of symmetric and positive semidefinite matrices M P R NˆN. Every M P S Ǹ induces a semi-norm defined by x M : xmx, P RN. The Loewner partial ordering on S Ǹ is defined for M, M P S Ǹ as i M ě M ô x M ě x P RN. Thus M P S Ǹ is nothing else than M ě 0. For α ą 0 we set P N α : M P S Ǹ : M ě αid (, where Id denotes the identity matrix. If M P Pα N, then the semi-norm M obviously becomes a norm. The linear operator A is surjective if and only if its associated matrix has full row ran. This assumption is further equivalent to the fact that the matrix associated to AA is positively definite. Since λ min paa q y ď y xaa y, yy A P R m, (6) AA this is further equivalent to λ min paa q ą 0 (and AA P P n λ minpaa q ), where λ minp q denotes the smallest eigenvalue of a matrix. Similarly, A is injective if and only if λ min pa Aq ą 0 (and A A P P n λ minpa Aq ). Proposition. Let Ψ: R N Ñ R be Fréchet differentiable such that its gradient is Lipschitz continuous with constant L ą 0. Then the following statements are true: 3

4 . For every x, y P R N and every z P rx, ys tp tqx ` ty : t P r0, su it holds Ψ pyq ď Ψ pxq ` x Ψ pzq, y xy ` L y x ; (7). If Ψ is bounded from below, then for every σ ą 0 it holds " ˆ inf Ψ pxq xpr N σ L * σ Ψ pxq ą 8. (8) Proof.. Let be x, y P R N and z : p tqx ` ty for t P r0, s. By the fundamental theorem for line integrals we get Since Ψ pyq Ψ pxq ď ż 0 ż 0 ż 0 ż 0 x Ψ pp sqx ` syq, y xy ds x Ψ pp sqx ` syq Ψ pzq, y xy ds ` x Ψ pzq, y xy. (9) x Ψ pp sqx ` syq Ψ pzq, y xy ds ż Ψ pp sqx ` syq Ψ pzq y x ds ď L x y s t ds ˆż t L x y p s ` tq ds ` 0 ż t ˆ ps tq ds L t p tq x y. (0) 0 The inequality (7) is obtained by combining (9) and (0) and by using that 0 ď t ď.. The inequality (7) gives for every x P R N ˆ 8 ă inf Ψ pyq ď Ψ ypr N x σ Bˆ ď Ψ pxq ` Ψ pxq Ψ pxq ˆ σ which leads to the desired conclusion. x σ L σ Ψ pxq Ψ pxq, F x, Ψ pxq ` L ˆ x Ψ pxq σ x Remar. The so-called Descent Lemma, which says that for a Fréchet differentiable function Ψ: R N Ñ R having Lipschitz continuous gradient with constant L ą 0 it holds Ψ pyq ď Ψ pxq ` x Ψ pxq, y xy ` L y y P R N, () follows from statement (i) of the above proposition for z : x. Moreover, for z : y we have that Ψ pxq ě Ψ pyq ` x Ψ pyq, x yy L x y P R N, () which is equivalent to the fact that Ψ ` L is a convex function, in other words, Ψ is a L-semiconvex function ([8]). It follows from the previous result that a Fréchet differentiable function with L-Lipschitz continuous gradient is L-semiconvex. Further, we will recall the definition and some properties of the limiting subdifferential, a notion which will play an important role in the convergence analysis we are going to carry out for the nonconvex 4

5 ADMM algorithm. Let Ψ: R N Ñ R Y t`8u be a proper and lower semicontinuous function. For any x P domψ : x P R N : Ψ pxq ă `8 (, the Fréchet (viscosity) subdifferential of Ψ at x is * p BΨ pxq : "d P R N Ψ pyq Ψ pxq xd, y xy : lim inf ě 0 yñx y x and the limiting (Morduhovich) subdifferential of Ψ at x is BΨ pxq : td P R N : exist sequences x Ñ x and d Ñ d as Ñ `8 such that Ψ `x Ñ Ψ pxq as Ñ `8 and d P p BΨ `x for all ě 0u. For x R dom pψq, we set p BΨ pxq BΨ pxq : H. The inclusion p BΨ pxq Ď Ψ pxq holds for each x P R N in general. In case Ψ is convex, these two subdifferential notions coincide with the convex subdifferential, thus p BΨ pxq BΨ pxq d P R N : Ψ pyq ě Ψ pxq ` xd, y P R N( for all x P domψ. If x P R N is a local minimum of Ψ, then 0 P BΨ pxq. We denote by critpψq tx P R N : 0 P BΨ pxqu the set of critical points of Ψ. The limiting subdifferential fulfills the closedness criterion: if x ( and td u are sequence in R N such that d P Ψ `x for all ě 0 and `x, d Ñ px, dq and Ψ `x Ñ Ψ pxq as Ñ `8, then d P Ψ pxq. We also have the following subdifferential sum rule ([34, Proposition.07], [38, Exercise 8.8]): if Φ: R N Ñ R is a continuously differentiable function, then B pψ ` Φq pxq BΨ pxq ` Φ pxq for all x P R N ; and the following formula for the subdifferential of the composition with a linear operator A: R N Ñ R N ([34, Proposition.], [38, Exercise 0.7]): if x P domψ and A is injective, then B pψ Aq pxq A BΨ paxq. We close this section by presenting some convergence results for real sequences that will be used in the sequel in the convergence analysis. The next lemma is often used in the literature when proving convergence of numerical algorithms relying on Fejér monotonicity techniques (see, for instance, [, Lemma.], [4, Lemma ]). Lemma. Let tb u be a sequence in R and tξ u a sequence in R`. Assume that tb u is bounded from below and that for every ě 0 Then the following statements hold: b ` ` ξ ď b.. the sequence tξ u is summable, namely ÿ ξ ă `8.. the sequence tb u is monotonically decreasing and convergent. The following lemma, which is an extension of [, Lemma.3] (see, also [4, Lemma 3]), is of interest by its own. Lemma 3. Let a : `a (, a,..., a N be a sequence in RǸ and tδ u a sequence in R D, a ` c 0, a D a D a D ` ě, (3) where c 0 : pc 0,, c 0,,..., c 0,N q P R N, c : pc,, c,,..., c,n q P R Ǹ and c : pc,, c,,..., c,n q P R Ǹ fulfill c 0 ` c ` c ă. Assume further that there exists δ s ě 0 such that for every K ě K ě Then for every i,..., N we have K δ ď s δ. ÿ a i ă `8. In particular, for every i,..., N and every K ě K ě, it holds K a i ď Nÿ j p c 0,j c,j q a K j ` p c 0,jq a K` j ı ` a K` j ` sδ c 0,i c,i c,i. (4) 5

6 Proof. Fix K ě K ě. If K K or K K ` then (4) holds automatically. Consider now the case when K ě K `. Summing up the inequality(3) for K `,, K, we obtain C Since, K` a ` G ď C K` c 0, K` K` K` K` a ` a a a a G ` K` ÿ K`3 K K ÿ K` K ÿ K the inequality (5) can be rewritten as C `, C K c, C a c, K K` a `a K ` a K` a a K K a G ` C c, K` a G ` a ` a K` a K a K` a K` a a K K ` a a a K K ` a, G C E a ` A, a K` a K a K` a K` ď c 0, K which further implies» Nÿ p c 0,j c,j c,j q j G C E a Ac, a K ` a K ` c, K a j fi fl Hence, for every i,..., N it holds C K c 0 c c, K a G G E a Ac, a K ` a K ` K a G 0, a K ` a K`D K` c 0 c, a KD c 0, a K`D a K`D ` Nÿ j p c 0,j c,j q a K j ` p c 0,jq a K` j ı ` a K` j δ, δ. (5) K` ` δ K` δ. p c 0,i c,i c,i q K a i ď Nÿ j p c 0,j c,j q a K j ` p c 0,jq a K` j ı ` a K` j ` sδ and the conclusion follows by taing into consideration that c 0 ` c ` c ă. A proximal ADMM and a proximal linearized ADMM algorithm in the nonconvex setting In this section we will propose two proximal ADMM algorithms for solving the optimization problem () and we will study their convergence behaviour. In this context, a central role will be played by the augmented Lagrangian associated with problem (), which is defined for every r ą 0 as L r : R n ˆ R m ˆ R m Ñ R Y t`8u, L r px, z, yq g pzq ` h pxq ` xy, Ax zy ` r Ax z. 6

7 . General formulations and particular instances written in the spirit of full proximal splitting algorithms Algorithm. Let be the matrix sequences ( M P Sǹ, ( M P Sm`, r ą 0 and 0 ă ρ ă. For a given starting vector `x 0, z 0, y 0 P R n ˆ R m ˆ R m, generate the sequence `x, z, y ( for every ě 0 as: " z ` P arg min g pzq Ax z D ` r Ax z ` * z z, (6a) zpr m M " x ` P arg min h pxq Ax z `D ` r Ax z ` ` * x x, (6b) xpr n M y ` : y ` `Ax ` z `. (6c) Let tt u be a sequence of positive real numbers such that t r A ď, M : t Id ra A and M `x : 0 for every ě 0. Algorithm becomes an iterative scheme which generates a sequence, z, y ( for every ě 0 as: # z ` P arg min zpr m x ` P arg min xpr n g pzq ` r z Ax + r y, " h pxq ` x x ` t y A ` r `Ax z ` t y ` : y ` `Ax ` z `. Recall that the proximal point operator with parameter γ ą 0 of a proper and lower semicontinuous function Ψ: R N Ñ R Y t`8u is the set-valued operator defined as ([35]) " prox γψ : R N ÞÑ RN, prox γψ pxq arg min Ψ pyq ` * x y. ypr N γ The above particular instance of Algorithm is an iterative scheme formulated in the spirit of full splitting numerical methods, namely, the functions g and h are evaluated by their proximal operators, while the linear operator A and its adjoint are evaluated by simple forward steps. Exact formulas for the proximal operator are available not only for large classes of convex functions ([8]), but also for many nonconvex functions appearing in applications ([3, 3, 9]). The second algorithm that we propose in this paper replaces h in the definition of x ` by its linearization at x for every ě 0. Algorithm. Let be the matrix sequences ( M P Sǹ, ( M P Sm`, r ą 0 and 0 ă ρ ă. For a given starting vector `x 0, z 0, y 0 P R n ˆ R m ˆ R m, generate the sequence `x, z, y ( for every ě 0 as: " z ` P arg min g pzq Ax z D ` r zpr m x ` P arg min xpr n Ax z ` x, h `x D Ax z `D ` r y ` : y ` `Ax ` z `. * z z M Ax z ` ` *,, (7a) *, (7b) x x M Due to the presence of the variable metric inducing matrix sequences we can thus provide a unifying scheme for several linearized ADMM algorithms discussed in the literature (see [3, 3, 36, 37, 43, 45]), which can be recovered for specific choices of the variable metrics. When taing as for Algorithm M : t Id ra A, where t r A ď, and M : 0, for every ě 0, then Algorithm translates for every ě 0 into: z ` P arg min zpr m # g pzq ` r z Ax r y +, x ` : x t ` h `x ` A y ` r `Ax z `, y ` : y ` `Ax ` z `. (7c) 7

8 This iterative scheme has the remarable property that the smooth term is evaluated via a gradient step. This is an improvement with respect to other nonconvex ADMM algorithms, such as [4, 44], where the smooth function is involved in a subproblem, which can be in general difficult to solve, unless it can be reformulated as a proximal step (see [30]). We will carry out a parallel convergence analysis for Algorithm and Algorithm and wor to this end in the following setting. Assumption. Assume that A is surjective and r ą 0, ρ P p0, q, µ : sup M ă `8 and µ : sup M ă `8 are such that there exists γ ą with r ě p ` γq T L ą 0 (8) and where $ ρ & T 0 : λ min paa qρ r ρ % λ min paa q p ρq and M 3 : M ` ra A C Id ě 3 C ě 0, (9) if 0 ă ρ ď, if ă ρ ă, $ 4T & µ, for Algorithm, C 0 : r % 4T pl ` µ q, for Algorithm, r $ &, if 0 ă ρ ď, T : λ min paa qρ ρ %, if ă ρ ă, λ min paa q p ρq $ & L ` 4T pl ` µ q, for Algorithm, C : r % L ` 4T µ, for Algorithm. r Remar. Notice that (9) can be equivalently written as $ & 6µ M `ra A `L ` r C M ` 4 pl ` µ q T, for Algorithm, Id ě ě 0, where CM : % 4µ ` 6 pl ` µ q T, for Algorithm. (0) In the following we present some possible choices of the matrix sequences ( M and ( M which fulfill Assumption.. Since ra A P S ǹ, when sup M µ ą L, by choosing there exists α ą 0 such that " r ě max p ` γq T L, µ ě α ě * C M ą 0, µ L ˆ L ` CM ą 0. r Thus (8) is verified, while (0) is ensured when choosing M such that µ Id ě M ě α Id for every ě 0. # +. Let M : t Id ra A for every ě 0, where 0 ă t ă min r A,. Then the relation (0) L becomes t Id ra A `L ` r C M Id ě 0, which automatically holds (as also (8) does), if " * tc M r ě max p ` γq T L, ą 0. tl 8

9 3. If A is assumed to be also injective, then ra A ě rλ min pa Aq ą 0. By choosing # + it follows that r ě max p ` γq T L, L ` al ` 4λ min pa Aq C M λ min pa Aq ra A `L ` r C M Id ě 0, thus, (8) and (0) hold for an arbitrary sequence of symmetric and positive semidefinite matrices M (. A possible choice is M 0 and M 0 for every ě 0, which allows us to recover the classical ADMM. When proving convergence for variable metric algorithms designed for convex optimization problems one usually assumes monotonicity for the matrix sequences inducing the variable metrics (see, for instance, [7, 5]). It is worth to mention that in this paper we manage to perform the convergence analysis for both Algorithm and Algorithm without any monotonicity assumption on ( M and ( M.. Preliminaries of the convergence analysis The following result of Fejér monotonicity type will play a fundamental role in our convergence analysis. Lemma 4. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm. Then for every ě it holds: L `x` r, z `, y ` ` T `y A ` 0 y ` ď L r `x, z, y ` T 0 A `y y ` C0 ą 0, x ` x ` M z ` z 3 M x x. () Proof. Let ě be fixed. In both cases the proof builds on showing that the following inequality L `x` r, z `, y ` ` x ` x L M x ` x `ra A ` z ` z M ď L `x r, z, y ` y ` y () is true and on providing afterwards an upper bound for. For Algorithm : From (6a) we have y ` y. g `z ` Ax z `D ` r Ax z ` ` z ` z M ď g `z Ax z D ` r Ax z. (3) The optimality criterion of (6b) is h `x ` A y `Ax ra ` z ` ` M `x x `. (4) From (7) (applied for z : x ` ) we get h `x ` ď h `x Ax Ax `D ` Ax ` z `, Ax Ax `D x ` x M ` L x ` x. (5) By combining (3), (5) and (6c), after some rearrangements, we obtain (). By using the notation u l : h `x l ` M l `xl x ě (6) and by taing into consideration (6c), we can rewrite (4) as A y l` ρu l` ` p ρq A y ě 0. (7) 9

10 The case 0 ă ρ ď. We have A `y ` y ρ Since 0 ă ρ ď, the convexity of gives and from here we get `u ` u ` p ρq A `y y. `y A ` y ď ρ u ` u ` p ρq `y A y. λ min paa qρ y ` y ď ρ `y A ` y ď ρ u ` u ` p ρq `y A y p ρq `y A ` y, (8) By using the Lipschitz continuity of h, we have u ` u ď pl ` µ q x ` x ` µ x x, (9) thus u ` u ď pl ` µ q x ` x ` µ x x. (30) After plugging (30) into (8), we get y ` y ď pl ` µ q λ min paa q ` p ρq λ min paa qρ r which, combined with (), provides (). x ` x ` µ λ min paa q A `y y The case ă ρ ă. This time we have from (7) that A `y ` y p ρq As ă ρ ă, the convexity of gives x x p ρq λ min paa qρ r ρ `u` u ρ `y ` pρ q A y. `y A ` y ď ρ u ` u ` pρ q `y A y. ρ and from here it follows A `y ` y, λ min paa q p ρq y ` y ď p ρq `y A ` y ď ρ u ` u ` pρ q `y A y pρ q `y A ` y, (3) ρ After plugging (30) into (3), we get y ` y ď ρ pl ` µ q λ min paa q p ρq r pρ q ` λ min paa q p ρq pρ q λ min paa q p ρq which, combined with (), provides ().. For Algorithm : The optimality criterion of (7b) is x ` x ` A `y y (3) ρµ x λ min paa q p ρq x r A `y ` y, (33) h `x A y `Ax ra ` z ` ` M `x x `. (34) 0

11 From (7) (applied for z : x ) we get h `x ` ď h `x Ax Ax `D ` Ax ` z `, Ax Ax `D x ` x M ` L x ` x. (35) Since the definition of z ` in (7a) leads also to (3), by combining this inequality with (35) and (7c), after some rearrangments, () follows. By using this time the notation u l : h `x l ` M l `xl x ě (36) and by taing into consideration (7c), we can rewrite (34) as The case 0 ă ρ ď. As in (8), we obtain A y l` ρu l` ` p ρq A y ě 0. (37) λ min paa qρ y ` y ď ρ `y A ` y ď ρ u ` u ` p ρq `y A y p ρq `y A ` y. (38) By using the Lipschitz continuity of h, we have u ` u ď µ x ` x ` pl ` µ q x x, (39) thus u ` u ď µ x ` x ` pl ` µ q x x. (40) After plugging (40) into (38), it follows y ` y ď µ λ min paa q p ρq ` λ min paa qρ r which, combined with (), provides (). The case ă ρ ă. As in (3), we obtain x ` x ` pl ` µ q λ min paa q A `y y x x p ρq λ min paa qρ r A `y ` y, λ min paa q p ρq y ` y ď p ρq `y A ` y ď ρ u ` u ` pρ q `y A y pρ q `y A ` y. (4) ρ After plugging (40) into (4), it follows y ` y ď ρµ λ min paa q p ρq r pρ q ` λ min paa q p ρq pρ q λ min paa q p ρq which, combined with (), provides (). This concludes the proof. x ` x ` The following three estimates will be useful in the sequel. A `y y ρ pl ` µ q λ min paa q p ρq r (4) x x A `y ` y. (43) Lemma 5. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm. Then the following statements are true:

12 (i) z ` z ď A x ` x ` Ax ` z ` ` Ax z A x ` x ` y ` y ` y ě ; (44) (ii) (iii) y ` ď T 0 `y A ` y ` T `x` h ` C0 x ` ě 0; (45) r r 4 y ` y ď C3 x ` x ` C4 x x ` T ` A `y y A `y ` ě, (46) where $ ρ pl ` µ q a, for Algorithm, & λmin paa q p ρ q C 3 : ρµ % a, for Algorithm, λmin paa q p ρ q $ ρµ a, for Algorithm, & λmin paa q p ρ q C 4 : ρ pl ` µ q % a, for Algorithm, λmin paa q p ρ q T : ρ a λmin paa q p ρ q. (47) Proof. The statement in (44) is straightforward. From (7) and (37) we have for every ě 0 or, equivalently A y ` ρu ` ` p ρq A y ρa y ` ρu ` ` p ρq A `y y `, where u ` is defined as being equal to u ` in (6), for Algorithm, and, respectively, to u ` in (36), for Algorithm. For 0 ă ρ ď we have λ min paa qρ y ` ď ρ A y ` ď ρ u ` ` p ρq A `y ` y, (48) while when ă ρ ă we have λ min paa qρ y ` ď ρ A y ` ď Notice further that when ă ρ ă we have {ρ ă and ă ρ{ p ρq. When u ` is defined as in (6), it holds ρ u ` ` pρ q `y A ` y. (49) ρ u ` u ` ď `x` h ` µ x ` ě 0, (50) while, when u ` is defined as in (36), it holds u ` u ` ď h `x ` ` pl ` µ q x ` ě 0. (5) We divide (48) and (49) by λ min paa qρ r ą 0 and plug (50) and, respectively, (5) into the resulting inequalities. This gives us (45).

13 Finally, in order to prove (46), we notice that for every ě it holds A `y ` y ď ρ u ` u ` ρ A `y y, so, a λmin paa q p ρ q y ` y ď p ρ q A `y ` y ď ρ u ` u ` ρ A `y y ρ A `y ` y. (5) We plug into (5) the estimates for u ` u derived in (9) and, respectively, (39) and divide the resulting inequality by a λ min paa q p ρ q ą 0. This furnishes the desired statement. The following regularization of the augmented Lagrangian will play an important role in the convergence analysis of the nonconvex proximal ADMM algorithms: F r : R n ˆ R m ˆ R m ˆ R n ˆ R m Ñ R Y t`8u, F r px, z, y, x, y q L r px, z, yq ` T 0 A `y y ` C0 where T 0 and C 0 are defined in Assumption. For every ě, we denote F : F r `x, z, y, x, y L r `x, z, y ` T 0 A `y y ` C0 x x, (53) x x. (54) `x Since the convergence analysis will rely on the fact that the set of cluster points of the sequence, z, y ( is nonempty, we will present first two situations which guarantee that this sequence is bounded. They mae use of standard coercivity assumptions for the functions g and h, respectively. Recall that a function Ψ : R N Ñ R Y t`8u is called coercive, if lim x Ñ`8 Ψ pxq `8. Theorem 6. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm. Suppose that one of the following conditions holds: (B-I) The operator A is invertible, g is coercive and h is bounded from below; (B-II) The function h is coercive and g and h are bounded from below. Then the sequence `x, z, y ( is bounded. Proof. From Lemma 4 we have that for every ě F ` ` x ` x ` z ` z ď F M 3 C0Id M (55) which shows, according to (9), that tf u ě is monotonically decreasing. Consequently, for every ě we have F ě F ` ` x ` x ` M 3 C0Id h `x ` ` g `z ` r ` T 0 A `y ` y ` which, thans to (45), leads to F ě h `x ` ` g `z ` T r ` T0 `y A ` y ` y ` ` r z ` z M Ax` z ` ` x ` x M 3 C0Id ` h `x ` ` r r y` Ax` z ` ` x ` x M 3 C0Id ` z ` z M ` C0 }x` x }, r y` z ` z M ` C0 4 }x` x }. (56) Next we will prove the boundedness of `x, z, y ( under each of the two scenarios. 3

14 (B-I) Since r ě p ` γq T L ą T L ą 0, there exists σ ą 0 such that σ L σ T r. From Proposition and (56) we see that for every ě g `z ` ` r Ax` z ` ` y` r ` C0 }x` x } 4 " * ď F inf h pxq T h pxq ă `8. xpr n r Since g is coercive, it follows that the sequences z (, Ax z ` r y ( and x ` x ( are bounded. This implies that Apx ` x q pz ` z q ( is bounded, from which we obtain the boundedness of r py ` y q (. According to the third update in the iterative scheme, we obtain that Ax z ( and thus y ( are also bounded. This implies the boundedness of Ax ( and, finally, since A is invertible, the boundedness of x (. (B-II) Again thans to (8) there exists σ ą 0 such that σ L σ p ` γq T r. We assume first that ρ or, equivalently, T 0 0. From Proposition and (56) we see that for every ě ˆ h `x ` ` T h `x ` ` r γ γr Ax` z ` ` y` r ` T0 }A py ` y q} ď F g `z ` γ inf xpr n " h pxq p ` γq T h pxq r * ă `8. Since h is coercive, we obtain that x (, Ax z ` r y ( and A `y ` y ( are bounded. For every ě 0 we have that λ min pa Aqρ r }Ax ` z ` } λ min pa Aqρ r }y ` y } ď }A py ` y q}, thus Ax z ( is bounded. Consequently, y ( and z ( are bounded. In case ρ or, equivalently, T 0 0, we have that for every ě ˆ h `x ` ` T `x` h ` r γ γr Ax` z ` ` ď F g `z ` " γ inf h pxq p ` γq T * h pxq ă `8, xpr n r r y` from which we deduce that x ( and Ax z ` r y ( are bounded. From Lemma 5 (iii) it yields that y ` y ( is bounded, thus, Ax z ( is bounded. Consequently, y ( ( and z are bounded. Both considered scenarios lead to the conclusion that the sequence `x, z, y ( is bounded. We state now the first convergence result of this paper. Theorem 7. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm, which is assumed to be bounded. The following statements are true: (i) For every ě it holds F ` ` C0 4 x ` x ` z ` z ď F M. (57) (ii) The sequence tf u is bounded from below and convergent. Moreover, x ` x Ñ 0, z ` z Ñ 0 and y ` y Ñ 0 as Ñ `8. (58) 4

15 (iii) The sequences tf u, L `x r, z, y ( and h `x ` g `z ( have the same limit, which we denote by F P R. Proof. (i) According to (9) we have that M 3 C 0 Id P P n C 0 and thus (55) implies (57). (ii) We will show that L `x r, z, y ( is bounded from below, which will imply that tf u is bounded from below as well. Assuming the contrary, as `x, z, y ( is bounded, there exists a subsequence `xq, z q, y q (qě0 converging to an element ppx, pz, pyq P Rn ˆ R m ˆ R m such that Lr `x q, z q, y q ( converges to 8 as q Ñ `8. However, using the lower semicontinuity of g qě0 and the continuity of h, we obtain lim inf L r `x q, z q, y q r ě h ppxq ` g ppzq ` xpy, Apx pzy ` qñ`8 Apx pz, which leads to a contradiction. From Lemma we conclude that tf u ě is convergent and ÿ x ` x ă `8, thus x ` x Ñ 0 as Ñ `8. We proved in (3), (33), (4) and (43) that for every ě y ` y ď C L x ` x ` C0 x x ` T 0 A `y y T 0 A `y ` y. Summing up the above inequality for,..., K, for K ą, we get y ` y ď C L x ` x ` C0 x x ` T 0 A `y y 0 T 0 A `y K` y K ď C L We let K converge to `8 and conclude ÿ x ` x ` C0 Ax ` z ` ÿ x x ` T 0 A `y y 0. y ` y ă `8, thus Ax ` z ` Ñ 0 and y ` y Ñ 0 as Ñ `8. Since x ` x Ñ 0 as Ñ `8, it follows that z ` z Ñ 0 as Ñ `8. (iii) By using (58) and the fact that y ( is bounded, it follows F lim F lim L `x r, z, y `x h ` g `z (. Ñ`8 Ñ`8 lim Ñ`8 The following lemmas provides upper estimates in terms of the iterates for limiting subgradients of the augmented Lagrangian and the regularized augmented Lagrangian F r, respectively. Lemma 8. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm. For every ě 0 we have d ` ` : `d x, d ` z, d ` `x` y P BLr, z `, y `, (59) where `x` d ` x : C ` h h `x ` `y A ` y ` M `x x `, d ` z : y y ` ` ra `x x ` ` M `z z `, d ` y : `y` y. (60a) (60b) (60c) 5

16 and C : #, for Algorithm, 0, for Algorithm. where Moreover, for every ě 0 it holds d ` ď C 5 x ` x ` C 6 z ` z ` C 7 y ` y, (6) C 5 : C L ` µ ` r A, C 6 : µ, C 7 : ` A `. (6) Proof. Let ě 0 be fixed. Applying the calculus rules of the limiting subdifferential, we obtain x L r `x`, z `, y ` h `x ` ` A y ` ` ra `Ax ` z `, B z L r `x`, z `, y ` Bg `z ` y ` r `Ax ` z `, y L r `x`, z `, y ` Ax ` z `. (63a) (63b) (63c) Then (60c) follows directly from (63c) and (6c), respectively, (7c), while (60b) follows from y ` rpax z ` q ` M `z z ` P Bg `z `, which is a consequence of the optimality criterion of (6a) and (7a), respectively. In order to derive (60a), let us notice that for Algorithm we have (see (4)) A y ` M `x x ` h `x ` ` `Ax ra ` z `, (64) while for Algorithm we have (see (34)) h `x A y ` M `x x ` `Ax ra ` z `. (65) By using (63a) we get the desired statement. Relation (6) follows by combining the inequalities d ` x ď pc L ` µ q x ` x ` A y ` y, d ` ď y y ` ` r A x ` x ` µ z ` z. with (5). z Lemma 9. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm. For every ě 0 we have D ` :, D `, D `, D `, D` P BF `x` r, z `, y `, x, y (66) where D ` x D ` x : d ` `x` x ` C 0 x, Dz ` D ` x : C `x` 0 x, Moreover, for every ě 0 it holds where z y x y : d ` z, Dy ` : d ` y ` T `y 0 AA ` y, D ` y : T 0AA `y ` y. (67) D ` ď C 8 x ` x ` C9 z ` z ` C0 y ` y, (68) C 8 : C 5 ` C 0, C 9 : C 6, C 0 : C 7 ` 4T A. (69) Proof. Let ě 0 be fixed. Applying the calculus rules of the limiting subdifferential it follows x F `x` r, z `, y `, x, y : x L `x` r, z `, y ` `x` ` C 0 x, B z F `x` r, z `, y `, x, y : B z L `x` r, z `, y ` y F `x` r, z `, y `, x, y : y L `x` r, z `, y ` ` T `y 0 AA ` y, x F `x` r, z `, y `, x, y `x` : C 0 x, y F `x` r, z `, y `, x, y : T `y 0 AA ` y, (70a) (70b) (70c) (70d) (70e) 6

17 Then (66) follows directly from the above relations and (59). Inequality (68) follows by combining D ` x ď d ` x ` C0 x ` x, D ` ď d ` ` T A y ` y. with (5). y y The following result is a straightforward consequence of Lemma 5 and Lemma 9. Corollary 0. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm. Then the norm of the element D ` P BF `x` r, z `, y `, x, y defined in the previous lemma verifies for every ě the following estimate ` D ` ď C x ` x ` x x ` x x where ` C ` A `y y A `y ` y ` C 3 ` A `y y A `y y, (7) " C : max C 8 ` C 9 A ` C 3 C 0 ` C3C 9 C : ˆ C 0 ` C9 T, C 3 : C 9T In the following, we denote by ω u (, C 4C 0 ` C3C 9, C 4C 9 *,. (7) the set of cluster points of a sequence u ( Ď RN. Lemma. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm, which is assumed to be bounded. The following statements are true: (i) if px q, z q, y q q ( is a subsequence of `x, z, y ( which converges to ppx, pz, pyq as q Ñ `8, qě0 then lim L r `x q, z q, y q Lr ppx, pz, pyq ; qñ8 (ii) it holds `x ω, z, y ( Ď crit pl r q Ď tppx, pz, pyq P R n ˆ R m ˆ R m : A py h ppxq, py P Bg ppzq, pz Apxu ; `x (iii) we have lim dist, z, y `x, ω, z, y ( ı 0; Ñ`8 `x (iv) the set ω, z, y ( is nonempty, connected and compact; (v) the function L r taes on ω `x, z, y ( objective function g A ` h does on Pr R n the value F `x ω, z, y ( lim Ñ`8 L `x r, z, y, as the ı. `x Proof. Let ppx, pz, pyq P ω, z, y ( and `xq, z q, y q (qě0 be a subsequence of x, z, y ( converging to ppx, pz, pyq as q Ñ `8. (i) From either (6a) or (7a) we obtain for all q ě g `z D ` y q, Ax q r z q ` ď g ppzq q, Ax q pz D ` r Ax q z q ` Ax q pz ` Taing the limit superior on both sides of the above inequalities, we get lim sup g `z q ď g ppzq, qñ8 z q z q q M pz z q q M. 7

18 which, combined with the lower semicontinuity of g, leads to lim g `z q g ppzq. qñ8 Since h is continuous, we further obtain lim L r `x q, z q, y q lim g `z D ` h `x q ` y q r, Ax q z q ` ı Ax q z q qñ8 qñ8 g ppzq ` h ppxq ` xpy, Apx pzy ` r Apx pz L r ppx, pz, pyq. (ii) For the sequence d ( defined in (60a) - (60c), we have that dq P BL r px q, z q, y q q for every q ě and d q Ñ 0 as q Ñ `8, while `x q, z q, y q Ñ ppx, pz, pyq and Lr `x q, z q, y q Ñ Lr ppx, pz, pyq as q Ñ `8. The closedness criterion of the limiting subdifferential guarantees that 0 P BL r ppx, pz, pyq or, in other words, ppx, pz, pyq P crit pl r q. Choosing now an element ppx, pz, pyq P crit pl r q, it holds which is further equivalent to 0 h ppxq ` A py ` ra papx pzq, 0 P Bg ppzq py r papx pzq, 0 Apx pz, A py h ppxq, py P Bg ppzq, pz Apx. (iii)-(iv) The proof follows in the lines of the proof of Theorem 5 (ii)-(iii) in [9], also by taing into consideration [9, Remar 5], according to which the properties in (iii) and (iv) are generic for sequences satisfying `x `, z `, y ` `x, z, y Ñ 0 as Ñ `8, which is indeed the case due to (58). (v) The conclusion follows according to the first two statements of this theorem and of the third statement of Theorem 7. Remar 3. An element ppx, pz, pyq P R n ˆ R m ˆ R m fulfilling A py h ppxq, py P Bg ppzq, pz Apx is a so-called KKT point of the optimization problem (). For such a KKT point we have When A is injective this is further equivalent to 0 A Bg papxq ` h ppxq. (73) 0 P Bpg Aqppxq ` h ppxq B pg A ` hq ppxq, (74) in other words, px is a critical point of the optimization problem (). On the other hand, when the functions g and h are convex, then (73) and (74) are equivalent, which means that px is a global optimal solution of the optimization problem (). In this case, py is a global optimal solution of the Fenchel dual problem of (). By combining Lemma 9, Theorem 7 and Lemma, one obtains the following result. Lemma. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm `x or Algorithm, which is assumed to be bounded. Denote by Ω : ω, z, y, x, y (. ě The following statements are true: (i) it holds Ω Ď tppx, pz, py, px, pyq P R n ˆ R m ˆ R m ˆ R n ˆ R m : ppx, pz, pyq P crit pl r qu ; (ii) we have lim dist `x, z, y, x, y, Ω 0; Ñ`8 (iii) the set Ω is nonempty, connected and compact; (iv) the regularized augmented Lagrangian F r taes on Ω the value F lim Ñ`8 F, as the objective function g A ` h does on Pr R nω. 8

19 .3 Convergence analysis under Kurdya- Lojasiewicz assumptions In this subsection we will prove global convergence for the sequence `x, z, y ( generated by the two nonconvex proximal ADMM algorithms in the context of K L property. The origins of this notion go bac to the pioneering wor of Kurdya who introduced in [8] a general form of the Lojasiewicz inequality ([33]). A further extension to the nonsmooth setting has been proposed and studied in [6, 7, 8]. We recall that the distance function of a given set Ω Ď R N is defined for every x by dist px, Ωq : inf t x y : y P Ωu. If Ω H, then dist px, Ωq `8. Definition. Let η P p0, `8s. We denote by Φ η the set of all concave and continuous functions ϕ: r0, ηq Ñ r0, `8q which satisfy the following conditions:. ϕ p0q 0;. ϕ is C on p0, ηq and continuous at 0; 3. for all s P p0, ηq : ϕ psq ą 0. Definition. Let Ψ: R N Ñ R Y t`8u be proper and lower semicontinuous.. The function Ψ is said to have the Kurdya- Lojasiewicz (K L) property at a point pu P dombψ : u P R N : BΨ puq H (, if there exists η P p0, `8s, a neighborhood U of pu and a function ϕ P Φ η such that for every the following inequality holds u P U X rψ ppuq ă Ψ puq ă Ψ ppuq ` ηs ϕ pψ puq Ψ ppuqq dist p0, BΨ puqq ě.. If Ψ satisfies the K L property at each point of dombψ, then Ψ is called K L function. The functions ϕ belonging to the set Φ η for η P p0, `8s are called desingularization functions. The K L property reveals the possibility to reparameterize the values of Ψ in order to avoid flatness around the critical points. To the class of K L functions belong semialgebraic, real subanalytic, uniformly convex functions and convex functions satisfying a growth condition. We refer the reader to [, 3, 4, 6, 7, 8, 9] and to the references therein for more properties of K L functions and illustrating examples. The following result, taen from [9, Lemma 6], will be crucial in our convergence analysis. Lemma 3. (Uniformized K L property) Let Ω be a compact set and Ψ: R N Ñ RYt`8u be a proper and lower semicontinuous function. Assume that Ψ is constant on Ω and satisfies the K L property at each point of Ω. Then there exist ε ą 0, η ą 0 and ϕ P Φ η such that for every pu P Ω and every element u in the intersection u P R N : dist pu, Ωq ă ε ( X rψ ppuq ă Ψ puq ă Ψ ppuq ` ηs it holds ϕ pψ puq Ψ ppuqq dist p0, BΨ puqq ě. Woring in the hypotheses of Lemma, we define for every ě E : F `x, z, y, x, y F F F ě 0, (75) where F is the limit of tf u ě as Ñ `8. The sequence te u ě is monotonically decreasing and it converges to 0 as Ñ `8. The next result shows that when the regularization of the augmented Lagrangian F r is a K L function, then the sequence `x, z, y ( converges to a KKT point of the optimization problem (). Theorem 4. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm, which is assumed to be bounded. If F r is a K L function, then the following statements are true: (i) the sequence `x, z, y ( has finite length, namely, ÿ x ` x ÿ ă `8, z ` z ă `8, 9 ÿ y ` y ă `8; (76)

20 (ii) the sequence `x, z, y ( converges to a KKT point of the optimization problem (). `x Proof. As in Lemma, we denote by Ω : ω, z, y, x, y (, which is a nonempty set. ě Let be ppx, pz, py, px, pyq P Ω, thus F r ppx, pz, py, px, pyq F. We have seen that te F F u ě converges to 0 as Ñ `8 and will consider, consequently, two cases. First we assume that there exists an integer ě 0 such that E 0 or, equivalently, F F. Due to the monotonicity of te u ě, it follows that E 0 or, equivalently, F F for all ě. Combining inequality (57) with Lemma 5, it yields that x ` x 0 for all ě `. Using Lemma 5 (iii) and telescoping sum arguments, it yields ÿ y ` y ă `8. Finally, by using Lemma 5 (i), we get that ÿ z ` z ă `8. Consider now the case when E ą 0 or, equivalently, F ą F for every ě. According to Lemma 3, there exist ε ą 0, η ą 0 and a desingularization function ϕ such that for every element u in the intersection tu P R n ˆ R m ˆ R m ˆ R n ˆ R m : dist pu, Ωq ă εu X tu P R n ˆ R m ˆ R m ˆ R n ˆ R m : F ă F r puq ă F ` ηu (77) it holds ϕ pf r puq F q dist p0, BF r puqq ě. Let be ě such that for all ě F ă F ă F ` η. Since lim dist `x, z, y, x, y, Ω 0, see Lemma (ii), there exists ě such that for all Ñ`8 ě dist `x, z, y, x, y, Ω ă ε. Thus, `x, z, y, x, y belongs to the intersection in (77) for all ě 0 : max t,, 3u, which further implies ϕ pf F q dist `0, BF r `x, z, y, x, y ϕ pe q dist `0, BF r `x, z, y, x, y ě. (78) Define for two arbitrary nonnegative integers p and q Then for all K ě 0 ě it holds p,q : ϕ pf p F q ϕ pf q F q ϕ pe p q ϕ pe q q. from which we get ÿ ě,` ă `8. 0,` 0,K` ϕ pe 0 q ϕ pe K` q ď ϕ pe 0 q, By combining Theorem 7 (i) with the concavity of ϕ we obtain for all ě,` ϕ pe q ϕ pe ` q ě ϕ pe q re E ` s ϕ pe q rf F ` s ě ϕ pe q C 0 4 The last relation combined with (78) imply for all ě 0 x ` x ď ϕ pe q dist `0, BF r `x, z, y, x, y x ` x ď 4 C 0,` dist `0, BF r `x, z, y, x, y. x ` x. (79) 0

21 By the arithmetic mean-geometric mean inequality and Corollary 0 we have that for every ě 0 and every β ą 0 x ` x ď c 4 C 0,` dist p0, BF r px, z, y, x, y qq We denote for every ě 3 ď β,` ` C 0 β dist `0, BF `x r, z, y, x, y ď β,` ` C ` x x ` x x ` x x 3 C 0 β ` C ` `y A y `y A y β ` C3 ` `y A y 3 `y A y. (80) β a : x x ě 0, δ : β,` ` C ` `y A y `y A y C 0 β ` C3 ` `y A y 3 A `y y. β The inequality (80) is nothing than (3) with c 0 c c : C β. Observe that for every K ě 0 we have δ ď β ϕ pe 0 q ` C C 0 β 0 `y A 0 y 0 C 3 ` `y A 0 y 0 3 β and thus, by choosing β ą 3C, we can use Lemma 3 to conclude that ÿ x ` x ă `8. The other two statements in (76) follow from Lemma 5. This means that the sequence `x, z, y ( is Cauchy, thus it converges to an element ppx, pz, pyq which is, according to Lemmas, a KKT point of the optimization problem (). Remar 4. The function F r is a K L function if, for instance, the objective function of () is semialgebraic, which is the case when the functions g and h are semi-algebraic. 3 Convergence rates under Lojasiewicz assumptions In this section we derive convergence rates for the sequence `x, z, y ( generated by Algorithm or Algorithm as well as for the regularized augmented Lagrangian function F r along this sequence, provided that the latter satisfies the Lojasiewicz property. 3. Lojasiewicz property and a technical lemma We recall the following definition from [] (see, also, [33]). Definition 3. Let Ψ: R N Ñ R Y t`8u be proper and lower semicontinuous. Then Ψ satisfies the Lojasiewicz property if for any critical point pu of Ψ, there exists C L ą 0, θ P r0, q and ε ą 0 such that Ψ puq Ψ ppuq θ ď C L distp0, P Ball ppu, εq, (8) where Ball ppu, εq denotes the open ball with center pu and radius ε.

22 Providing that the Assumption is fulfilled and `x, z, y ( is the sequence generated by Algorithm or Algorithm `x, which is assumed to be bounded, we have seen in Lemma that the set of cluster points Ω ω, z, y, x, y ( is nonempty, compact and connected and F r taes on Ω the value F ; moreover for any ppx, pz, py, px, pyq P Ω, ppx, pz, pyq belongs to crit pl r q. According to [, Lemma ], if F r has the Lojasiewicz property, then there exist C L ą 0, θ P r0, q and ε ą 0 such that for any `x, z, y, x, y P tu P R n ˆ R m ˆ R m : dist pu, Ωq ă εu, it holds Fr `x, z, y, x, y F θ ď C L dist `0, BF r `x, z, y, x, y. Obviously, F r is a K L function with desingularization function ϕ : r0, `8q Ñ r0, `8q such that ϕ psq : θ C Ls θ, which, according to Theorem 4, means that Ω contains a single element ppx, pz, py, px, pyq, namely, the limit of `x, z, y, x, y ( as Ñ `8. In other words, if F r has the Lojasiewicz property, then there exist C L ą 0, θ P r0, q and ε ą 0 such that Fr `x, z, y, x, y F θ ď C L dist `0, BF r `x, z, y, x, `x, z, y, x, y P Ball pppx, pz, py, px, pyq, εq. (8) In this case, F r is said to satisfy the Lojasiewicz property with Lojasiewicz constant C L ą 0 and Lojasiewicz exponent θ P r0, q. The following lemma will provides convergence rates for a particular class of monotonically decreasing sequences converging to 0. Lemma 5. Let te u be a monotonically decreasing sequence in R` converging 0. Assume further that there exists natural numbers 0 ě l 0 ě such that for every ě 0 e l0 e ě C e e θ, (83) where C e ą 0 is some constant and θ P r0, q. Then following statements are true: (i) if θ 0, then te u converges in finite time; (ii) if θ P p0, {s, then there exists C e,0 ą 0 and Q P r0, q such that for every ě 0 0 ď e ď C e,0 Q ; (iii) if θ P p{, q, then there exists C e, ą 0 such that for every ě 0 ` l 0 0 ď e ď C e, p l 0 ` q θ. Proof. Fix an integer ě 0. Since 0 ě l 0 ě 0, the recurrence inequality (83) is well defined for every ě 0. (i) The case when θ 0. We assume that e ą 0 for every ě 0. From (83) we get e l0 e ě C e ą 0, for every ě 0, which actually leads to contradiction to the fact that te u converges to 0 as Ñ `8. Consequently, there exists ě 0 such that e 0 for every ě and thus the conclusion follows. For the proof of (ii) and (iii) we can assume that e ą 0 for every ě 0. Otherwise, as te u is monotonically decreasing and converges to 0, the sequence is constant beginning with a given index, which means that both statements are true. (ii) The case when θ P p0, {s. We have e ď e 0, thus e0 θ e ď e θ, which leads to Therefore e ď C e e θ 0 ` e l ď ď ˆ ď C e e θ 0 ` e l0 e ě C e e θ ě C e e θ 0 ě 0. ˆ C e e θ 0 ` ] Y 0 l 0 l 0 0 l0 e 0 e 0 `Ce e θ 0 ` 0 max te 0`j : j 0,..., l 0 u l0 ` l bc 0 e e0 θ `,

23 where tpu denotes the greatest integer that is less than or equal to the real number p. This provides the linear convergence rate, as P r0, q. l bc 0 e e θ 0 ` (iii) The case when θ P p{, q. From (83) we get C e ď pe l0 e q e θ. (84) Define ζ : p0, `8q Ñ R, ζpsq s θ. We have that ˆ d s θ s θ ζ psq and ζ psq θs θ ă P p0, `8q. ds θ Consequently, ζ pe l0 q ď ζ psq for all s P re, e l0 s. Assume that ζ pe q ď ζ pe l0 q. Then (84) gives or, equivalently, Assume that ζ pe q ą ζ pe l0 q. equivalent to C e ď pe l0 e q ζ pe q ď pe l0 e q ζ pe l q ż e l0 ζ pe l0 q ds ď e `e θ e θ l θ 0 ż e l0 e ζ psq ds e θ e θ l 0 ě C, where C : pθ q C e ą 0. (85) In other words, eθ l 0 ą e θ. For ν : θ νe l0 ě e ô ν θ e θ l 0 ď e θ ô `ν θ e θ l 0 ď e θ e θ l 0. P p0, q this is Recall that ν θ ą 0, since θ ă 0, and e θ 0 ď e θ l 0, since te u is monotonically decreasing, and thus e θ e θ l 0 ě `ν θ e θ l 0 ě C, where C : `ν θ e θ 0 ą 0. (86) In both situations we get for every i ě 0 e θ i e θ i l 0 ě C : min C, C ( ą 0, (87) where C and C are defined as in (85) and (86), respectively. For every ě 0 ` l 0, by summing up the inequalities (87) for i 0 ` l 0,,, we get l 0 ÿ j 0 e θ j e θ 0`j ě C p 0 l 0 ` q ą 0. Using the fact that θ ă 0 and the monotonicity of te i u iě0, it yields and thus which gives e 0`l 0 ď ď e 0 ô e θ 0`l 0 ě ě e θ 0 ô e θ 0 ě ě e θ 0`l 0 `e θ l 0 e θ 0 ě l 0 ÿ j 0 Moreover, we obtain from (87) that Z 0 ` l 0 ě e θ 0 e θ j e θ 0`j ě C p 0 l 0 ` q, e θ ě e θ 0 ` 0 l 0 ` l 0 C. (88) l 0 ^ ˆ0 C ` l 0 ě C 0 C. (89) l 0 l 0 3

24 By plugging (89) into (88) we obtain e θ ě l 0 ` l 0 C, which implies This concludes the proof. θ ˆC e ď p l0 ` q θ. (90) l 0 Remar 5. The inequality in Lemma 5 (iii) can be writen in term of instead of l 0 ` when large enough. For instance, when ě and thus from (90) we get 3. Convergence rates γ γ pl 0 ` q for some γ ą then we have that l 0 ` ě γ ˆ θ ˆC e ď p l0 ` q C θ θ ď l 0 γ θ. l 0 In this subsection we will study the convergence rates of Algorithm and in the context of an assumption which is slightly more restricitve than Assumption. Assumption. We wor in the hypotheses of Assumption except for (9) which is replaced by M 3 : M ` ra A ` pl C q Id ě 5 C ě 0, (9) Notice that (9) can be written as $ & 0µ M `ra A `L ` r CM Id ě ě 0, where C ` 8 pl ` µ q T, for Algorithm, M : % 8µ ` 0 pl ` µ q T, for Algorithm. (9) Therefore (9) is nothing else than (0) after replacing C M by the bigger constant CM. So, all the examples in Remar can be adapted to the new setting and provide framewors which guarantee Assumption. The scenarios which ensure Assumption evidently satisfy Assumption, therefore the results investigated in Section remain valid in this setting. As follows we will provide improvements of the statements used in the convergence analysis which can be obtained thans to Assumptions by using similar techniques. Firstly, by the same arguments as in Lemma 4, we have that for every ě (see ()) L `x` r, z `, y ` ` x ` x L M x ` `ra A x ` z ` z M ď L `x r, z, y ` y ` y (93) and (see (3), (33), (4) and (43)) y ` y ď C L x ` x ` C0 x x ` T 0 A `y y T 0 A `y ` y. (94) By multiplying (94) by and by adding the resulting inequality to (93), we obtain for every ě L `x` r, z `, y ` ` x ` x ` M 3 T 0 A `y ` y ` y ` y z ` z M ` ď L r `x, z, y ` T 0 A `y y ` C 0 x x. (95) 4

A proximal minimization algorithm for structured nonconvex and nonsmooth problems

A proximal minimization algorithm for structured nonconvex and nonsmooth problems A proximal minimization algorithm for structured nonconvex and nonsmooth problems Radu Ioan Boţ Ernö Robert Csetnek Dang-Khoa Nguyen May 8, 08 Abstract. We propose a proximal algorithm for minimizing objective

More information

Second order forward-backward dynamical systems for monotone inclusion problems

Second order forward-backward dynamical systems for monotone inclusion problems Second order forward-backward dynamical systems for monotone inclusion problems Radu Ioan Boţ Ernö Robert Csetnek March 6, 25 Abstract. We begin by considering second order dynamical systems of the from

More information

ADMM for monotone operators: convergence analysis and rates

ADMM for monotone operators: convergence analysis and rates ADMM for monotone operators: convergence analysis and rates Radu Ioan Boţ Ernö Robert Csetne May 4, 07 Abstract. We propose in this paper a unifying scheme for several algorithms from the literature dedicated

More information

Douglas-Rachford splitting for nonconvex feasibility problems

Douglas-Rachford splitting for nonconvex feasibility problems Douglas-Rachford splitting for nonconvex feasibility problems Guoyin Li Ting Kei Pong Jan 3, 015 Abstract We adapt the Douglas-Rachford DR) splitting method to solve nonconvex feasibility problems by studying

More information

An inexact subgradient algorithm for Equilibrium Problems

An inexact subgradient algorithm for Equilibrium Problems Volume 30, N. 1, pp. 91 107, 2011 Copyright 2011 SBMAC ISSN 0101-8205 www.scielo.br/cam An inexact subgradient algorithm for Equilibrium Problems PAULO SANTOS 1 and SUSANA SCHEIMBERG 2 1 DM, UFPI, Teresina,

More information

Introduction and Preliminaries

Introduction and Preliminaries Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis

More information

arxiv: v2 [math.oc] 21 Nov 2017
