The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates
The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates

Radu Ioan Boţ, Dang-Khoa Nguyen

January 6, 2018

Abstract. We propose two numerical algorithms for minimizing the sum of a smooth function and the composition of a nonsmooth function with a linear operator in the fully nonconvex setting. The iterative schemes are formulated in the spirit of the proximal and, respectively, proximal linearized alternating direction method of multipliers. The proximal terms are introduced through variable metrics, which facilitates the derivation of proximal splitting algorithms for nonconvex complexly structured optimization problems as particular instances of the general schemes. Convergence of the iterates to a KKT point of the objective function is proved under mild conditions on the sequence of variable metrics and by assuming that a regularization of the associated augmented Lagrangian has the Kurdyka-Łojasiewicz property. If the augmented Lagrangian has the Łojasiewicz property, then convergence rates of both augmented Lagrangian and iterates are derived.

Keywords. nonconvex complexly structured optimization problems, alternating direction method of multipliers, proximal splitting algorithms, variable metric, convergence analysis, convergence rates, Kurdyka-Łojasiewicz property, Łojasiewicz exponent

AMS subject classification. 47H05, 65K05, 90C26

1 Introduction

1.1 Problem formulation and motivation

In this paper, we address the solving of the optimization problem

$$\min_{x \in \mathbb{R}^n} \{ g(Ax) + h(x) \}, \qquad (1)$$

where $g \colon \mathbb{R}^m \to \mathbb{R} \cup \{+\infty\}$ is a proper and lower semicontinuous function, $h \colon \mathbb{R}^n \to \mathbb{R}$ is a Fréchet differentiable function with $L$-Lipschitz continuous gradient and $A \colon \mathbb{R}^n \to \mathbb{R}^m$ is a linear operator. The spaces $\mathbb{R}^n$ and $\mathbb{R}^m$ are equipped with Euclidean inner products $\langle \cdot, \cdot \rangle$ and associated norms $\|\cdot\| = \sqrt{\langle \cdot, \cdot \rangle}$, which are both denoted in the same way, as there is no risk of confusion.
We start by briefly describing the Alternating Direction Method of Multipliers (ADMM) in the context of solving the more general problem

$$\min_{x \in \mathbb{R}^n} \{ f(x) + g(Ax) + h(x) \}, \qquad (2)$$

where $g$ and $h$ are assumed to be also convex and $f \colon \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is another proper, convex and lower semicontinuous function. We rewrite the problem (2), by introducing an auxiliary variable, as

$$\min_{\substack{(x,z) \in \mathbb{R}^n \times \mathbb{R}^m \\ Ax - z = 0}} \{ f(x) + g(z) + h(x) \}. \qquad (3)$$

Faculty of Mathematics, University of Vienna, Oskar-Morgenstern-Platz 1, 1090 Vienna, Austria, radu.bot@univie.ac.at. Research partially supported by FWF (Austrian Science Fund), project I 2419-N32. Invited Associate Professor, Babeş-Bolyai University, Faculty of Mathematics and Computer Sciences, str. Mihail Kogălniceanu, Cluj-Napoca, Romania.

Faculty of Mathematics, University of Vienna, Oskar-Morgenstern-Platz 1, 1090 Vienna, Austria, dang-khoa.nguyen@univie.ac.at. The author gratefully acknowledges the financial support of the Doctoral Programme Vienna Graduate School on Computational Optimization (VGSCO), which is funded by the Austrian Science Fund (FWF, project W1260-N35).
For a fixed real number $r > 0$, the augmented Lagrangian associated with problem (3) reads

$$L_r \colon \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R} \cup \{+\infty\}, \quad L_r(x, z, y) = f(x) + g(z) + h(x) + \langle y, Ax - z \rangle + \frac{r}{2}\|Ax - z\|^2.$$

Given a starting vector $(x^0, z^0, y^0) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m$ and $\{M_1^k\}_{k \ge 0} \subseteq \mathbb{R}^{n \times n}$, $\{M_2^k\}_{k \ge 0} \subseteq \mathbb{R}^{m \times m}$, two sequences of symmetric and positive semidefinite matrices, the following proximal ADMM algorithm, formulated in the presence of a smooth function and involving variable metrics, has been proposed and investigated in [5]: for all $k \ge 0$ generate the sequence $\{(x^k, z^k, y^k)\}_{k \ge 0}$ by

$$x^{k+1} \in \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \Big\{ f(x) + \langle x - x^k, \nabla h(x^k) \rangle + \frac{r}{2}\Big\|Ax - z^k + \frac{1}{r}y^k\Big\|^2 + \frac{1}{2}\|x - x^k\|^2_{M_1^k} \Big\}, \qquad (4a)$$

$$z^{k+1} = \operatorname*{arg\,min}_{z \in \mathbb{R}^m} \Big\{ g(z) + \frac{r}{2}\Big\|Ax^{k+1} - z + \frac{1}{r}y^k\Big\|^2 + \frac{1}{2}\|z - z^k\|^2_{M_2^k} \Big\}, \qquad (4b)$$

$$y^{k+1} = y^k + \rho r\big(Ax^{k+1} - z^{k+1}\big). \qquad (4c)$$

In case $\rho = 1$, it has been proved in [5] that, when the set of saddle points of the Lagrangian associated with problem (3) (which is nothing else than $L_r$ for $r = 0$) is nonempty and the two matrix sequences and $A$ fulfill mild additional assumptions, the sequence $\{(x^k, z^k, y^k)\}_{k \ge 0}$ converges to such a saddle point and provides in this way both an optimal solution of (2) and an optimal solution of its Fenchel dual problem. Furthermore, an ergodic primal-dual gap convergence rate result expressed in terms of the Lagrangian has been shown. In case $h = 0$, the above iterative scheme encompasses different numerical algorithms considered in the literature. When $M_1^k = M_2^k = 0$ for all $k \ge 0$, (4a)-(4c) becomes the classical ADMM algorithm ([5,, 5, 6]), which enjoys a huge popularity in the optimization community, and this despite its poor implementation properties caused by the fact that, in general, the calculation of the sequence of primal variables $\{x^k\}_{k \ge 0}$ does not correspond to a proximal step. For an inertial version of the classical ADMM algorithm we refer to [0]. When $M_1^k = M_1$ and $M_2^k = M_2$ for all $k \ge 0$, (4a)-(4c) recovers the proximal ADMM algorithm investigated by Shefi and Teboulle in [39] (see also [0, ]).
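In case $f(x) = \frac{1}{2}\|x - b\|^2$, $g = \lambda\|\cdot\|_1$ and $h = 0$, every step of the classical scheme ($M_1^k = M_2^k = 0$, $\rho = 1$) admits a closed form. The following sketch is a hypothetical toy instance chosen only to illustrate the update pattern of (4a)-(4c); the data $A$, $b$ and the parameter names are assumptions, not an implementation from the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (closed form for the z-update)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def classical_admm(A, b, lam=0.1, r=1.0, iters=500):
    """Classical ADMM (M1^k = M2^k = 0, rho = 1) on the toy splitting
       min_x 0.5*||x - b||^2 + lam*||Ax||_1, rewritten as
       min_{x,z} f(x) + g(z) subject to Ax - z = 0."""
    m, n = A.shape
    x, z, y = np.zeros(n), np.zeros(m), np.zeros(m)
    H = np.eye(n) + r * A.T @ A  # x-update system matrix, fixed along the run
    for _ in range(iters):
        # x-step: argmin_x f(x) + (r/2)||Ax - z + y/r||^2 (exact solve)
        x = np.linalg.solve(H, b + r * A.T @ (z - y / r))
        # z-step: prox of g/r at Ax + y/r
        z = soft_threshold(A @ x + y / r, lam / r)
        # multiplier update with rho = 1
        y = y + r * (A @ x - z)
    return x, z, y
```

The feasibility residual $\|Ax^k - z^k\|$ tends to $0$ along the iterations, mirroring the role of the constraint in (3).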
It has been pointed out in [39] that, for suitable choices of the matrices $M_1$ and $M_2$, this proximal ADMM algorithm becomes a primal-dual splitting algorithm in the sense of those considered in [3, 6, 9, 4], which, due to their full splitting character, overcome the drawbacks of the classical ADMM algorithm. Recently, in [] it has been shown that, when $f$ is strongly convex, suitable choices of the non-constant sequences $\{M_1^k\}_{k \ge 0}$ and $\{M_2^k\}_{k \ge 0}$ may lead to a rate of convergence for the sequence of primal iterates of $\mathcal{O}(1/k)$. The reason why we address in this paper the slightly less general optimization problem (1) is exclusively given by the fact that in this setting we can provide sufficient conditions which guarantee that the sequence generated by the ADMM algorithm is bounded. In the nonconvex setting, the boundedness of the sequence $\{(x^k, z^k, y^k)\}_{k \ge 0}$ plays a central role in the convergence analysis. The contributions of the paper are as follows:

1. We propose a proximal ADMM (P-ADMM) algorithm and a proximal linearized ADMM (PL-ADMM) algorithm for solving (1) and carry out a convergence analysis in parallel for both algorithms. We first prove, under certain assumptions on the matrix sequences, boundedness of the sequence of generated iterates $\{(x^k, z^k, y^k)\}_{k \ge 0}$. Under these premises, we show that the cluster points of $\{(x^k, z^k, y^k)\}_{k \ge 0}$ are KKT points of the problem (1). Global convergence of the sequence is shown provided that a regularization of the augmented Lagrangian satisfies the Kurdyka-Łojasiewicz property. In case this regularization of the augmented Lagrangian has the Łojasiewicz property, we derive rates of convergence for the sequence of iterates. To the best of our knowledge, these are the first results in the literature addressing convergence rates for the nonconvex ADMM.

2. The two ADMM algorithms under investigation are of relaxed type, namely, we allow $\rho \in (0, 2)$. We notice that $\rho = 1$ is the standard choice in the literature ([, 5,, 30, 39, 4]).
Gabay and Mercier proved in [6] in the convex setting that $\rho$ may be chosen in $(0, 2)$; however, the majority of the extensions of the convex relaxed ADMM algorithm assume that $\rho \in \big(0, \frac{1+\sqrt{5}}{2}\big)$ in (4c), see [0,, 5, 40, 43, 44], or ask for a particular choice of $\rho$, which is interpreted as a step size, see [7]. The only work in the nonconvex setting dealing with an alternating minimization algorithm, however for the minimization of the sum of a simple nonsmooth function with a smooth function, and which allows a relaxation parameter $\rho$ different from $1$, is [44].
3. Particular outcomes of the proposed algorithms will be full splitting algorithms for solving the nonconvex complexly structured optimization problem (1), which we will obtain by an appropriate choice of the matrix sequences. (P-ADMM) will give rise to an iterative scheme formulated only in terms of proximal steps for the functions $g$ and $h$ and of forward evaluations of the matrix $A$, while (PL-ADMM) will give rise to an iterative scheme in which the evaluation of the function $h$ will be performed via a gradient step. Exact formulas for proximal operators are available not only for large classes of convex functions ([8]), but also for many nonconvex functions ([3, 3, 9]). The fruitful idea to linearize the step involving the smooth term has been used in the past in the context of ADMM algorithms, mostly in the convex setting [3, 36, 37, 43, 45], the paper [3] being the only exception in the nonconvex setting. For previous works addressing the ADMM algorithm in the nonconvex setting we mention: [30], where (1) is studied by assuming that $h$ is twice continuously differentiable with bounded Hessian; [4], where the convergence is studied in the context of solving very particular nonconvex consensus and sharing problems; and [], where the ADMM algorithm is used in penalized zero-variance discriminant analysis. In [4] and [3], the investigations of the ADMM algorithm are carried out in very restrictive settings generated by strong assumptions on the nonsmooth functions and linear operators.

1.2 Notations and preliminaries

Let $N$ be a strictly positive integer. We denote $\mathbf{1} := (1, \dots, 1) \in \mathbb{R}^N$ and write, for $x := (x_1, \dots, x_N), y := (y_1, \dots, y_N) \in \mathbb{R}^N$, $x < y$ if and only if $x_i < y_i$ for $i = 1, \dots, N$. The Cartesian product $\mathbb{R}^{N_1} \times \mathbb{R}^{N_2} \times \dots \times \mathbb{R}^{N_p}$, with $N_1, \dots, N_p$ strictly positive integers, will be endowed with the inner product and associated norm defined for $u := (u_1, \dots, u_p), u' := (u'_1, \dots, u'_p) \in \mathbb{R}^{N_1} \times \mathbb{R}^{N_2} \times \dots \times \mathbb{R}^{N_p}$ by

$$\langle u, u' \rangle = \sum_{i=1}^{p} \langle u_i, u'_i \rangle \quad \text{and} \quad \|u\| = \sqrt{\sum_{i=1}^{p} \|u_i\|^2},$$

respectively. Moreover, for every $u := (u_1, \dots, u_p) \in \mathbb{R}^{N_1} \times \mathbb{R}^{N_2} \times \dots$
$\times \mathbb{R}^{N_p}$ we have

$$\frac{1}{\sqrt{p}} \sum_{i=1}^{p} \|u_i\| \le \sqrt{\sum_{i=1}^{p} \|u_i\|^2} \le \sum_{i=1}^{p} \|u_i\|. \qquad (5)$$

We denote by $\mathcal{S}^N_+$ the family of symmetric and positive semidefinite matrices $M \in \mathbb{R}^{N \times N}$. Every $M \in \mathcal{S}^N_+$ induces a semi-norm defined by $\|x\|^2_M := \langle Mx, x \rangle$ for $x \in \mathbb{R}^N$. The Loewner partial ordering on $\mathcal{S}^N_+$ is defined for $M, M' \in \mathcal{S}^N_+$ by

$$M \succcurlyeq M' \iff \|x\|^2_M \ge \|x\|^2_{M'} \quad \forall x \in \mathbb{R}^N.$$

Thus $M \in \mathcal{S}^N_+$ is nothing else than $M \succcurlyeq 0$. For $\alpha > 0$ we set $\mathcal{P}^N_\alpha := \{M \in \mathcal{S}^N_+ : M \succcurlyeq \alpha\,\mathrm{Id}\}$, where $\mathrm{Id}$ denotes the identity matrix. If $M \in \mathcal{P}^N_\alpha$, then the semi-norm $\|\cdot\|_M$ obviously becomes a norm. The linear operator $A$ is surjective if and only if its associated matrix has full row rank. This assumption is further equivalent to the fact that the matrix associated with $AA^*$ is positive definite. Since

$$\lambda_{\min}(AA^*)\,\|y\|^2 \le \langle AA^* y, y \rangle \le \|AA^*\|\,\|y\|^2 \quad \forall y \in \mathbb{R}^m, \qquad (6)$$

this is further equivalent to $\lambda_{\min}(AA^*) > 0$ (and $AA^* \in \mathcal{P}^m_{\lambda_{\min}(AA^*)}$), where $\lambda_{\min}(\cdot)$ denotes the smallest eigenvalue of a matrix. Similarly, $A$ is injective if and only if $\lambda_{\min}(A^*A) > 0$ (and $A^*A \in \mathcal{P}^n_{\lambda_{\min}(A^*A)}$).

Proposition 1. Let $\Psi \colon \mathbb{R}^N \to \mathbb{R}$ be Fréchet differentiable such that its gradient is Lipschitz continuous with constant $L > 0$. Then the following statements are true:
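The rank characterizations via $\lambda_{\min}(AA^*)$ and $\lambda_{\min}(A^*A)$ are easy to check numerically. The sketch below is an illustrative helper (the function names are our own, not from the paper):

```python
import numpy as np

def lambda_min_AAt(A):
    """Smallest eigenvalue of A A^T: positive iff A has full row rank,
       i.e. iff the linear operator A is surjective."""
    return float(np.linalg.eigvalsh(A @ A.T).min())

def lambda_min_AtA(A):
    """Smallest eigenvalue of A^T A: positive iff A is injective."""
    return float(np.linalg.eigvalsh(A.T @ A).min())
```

For $A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \end{pmatrix}$ the rows are independent, so $\lambda_{\min}(AA^*) = 1 > 0$, while $A^*A$ is singular ($\lambda_{\min} = 0$): this $A$ is surjective but not injective, and inequality (6) holds for every $y$.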
1. For every $x, y \in \mathbb{R}^N$ and every $z \in [x, y] := \{(1-t)x + ty : t \in [0, 1]\}$ it holds

$$\Psi(y) \le \Psi(x) + \langle \nabla\Psi(z), y - x \rangle + \frac{L}{2}\|y - x\|^2; \qquad (7)$$

2. If $\Psi$ is bounded from below, then for every $\sigma > 0$ it holds

$$\inf_{x \in \mathbb{R}^N} \Big\{ \Psi(x) - \Big(\frac{1}{\sigma} - \frac{L}{2\sigma^2}\Big)\|\nabla\Psi(x)\|^2 \Big\} > -\infty. \qquad (8)$$

Proof. 1. Let $x, y \in \mathbb{R}^N$ and $z := (1-t)x + ty$ for some $t \in [0, 1]$. By the fundamental theorem for line integrals we get

$$\Psi(y) - \Psi(x) = \int_0^1 \langle \nabla\Psi((1-s)x + sy), y - x \rangle \, ds = \int_0^1 \langle \nabla\Psi((1-s)x + sy) - \nabla\Psi(z), y - x \rangle \, ds + \langle \nabla\Psi(z), y - x \rangle. \qquad (9)$$

Since

$$\int_0^1 \langle \nabla\Psi((1-s)x + sy) - \nabla\Psi(z), y - x \rangle \, ds \le \int_0^1 \|\nabla\Psi((1-s)x + sy) - \nabla\Psi(z)\| \, \|y - x\| \, ds \le L\|x - y\|^2 \int_0^1 |s - t| \, ds = L\|x - y\|^2 \Big( \int_0^t (t - s) \, ds + \int_t^1 (s - t) \, ds \Big) = \frac{L}{2}\big(t^2 + (1 - t)^2\big)\|x - y\|^2, \qquad (10)$$

the inequality (7) is obtained by combining (9) and (10) and by using that $t^2 + (1-t)^2 \le 1$ for $0 \le t \le 1$.

2. The inequality (7) gives for every $x \in \mathbb{R}^N$

$$-\infty < \inf_{y \in \mathbb{R}^N} \Psi(y) \le \Psi\Big(x - \frac{1}{\sigma}\nabla\Psi(x)\Big) \le \Psi(x) + \Big\langle \nabla\Psi(x), -\frac{1}{\sigma}\nabla\Psi(x) \Big\rangle + \frac{L}{2}\Big\|\frac{1}{\sigma}\nabla\Psi(x)\Big\|^2 = \Psi(x) - \Big(\frac{1}{\sigma} - \frac{L}{2\sigma^2}\Big)\|\nabla\Psi(x)\|^2,$$

which leads to the desired conclusion.

Remark 1. The so-called Descent Lemma, which says that for a Fréchet differentiable function $\Psi \colon \mathbb{R}^N \to \mathbb{R}$ having Lipschitz continuous gradient with constant $L > 0$ it holds

$$\Psi(y) \le \Psi(x) + \langle \nabla\Psi(x), y - x \rangle + \frac{L}{2}\|y - x\|^2 \quad \forall x, y \in \mathbb{R}^N, \qquad (11)$$

follows from statement 1 of the above proposition for $z := x$. Moreover, for $z := y$ we have that

$$\Psi(x) \ge \Psi(y) + \langle \nabla\Psi(y), x - y \rangle - \frac{L}{2}\|x - y\|^2 \quad \forall x, y \in \mathbb{R}^N, \qquad (12)$$

which is equivalent to the fact that $\Psi + \frac{L}{2}\|\cdot\|^2$ is a convex function; in other words, $\Psi$ is an $L$-semiconvex function ([8]). It follows from the previous result that a Fréchet differentiable function with $L$-Lipschitz continuous gradient is $L$-semiconvex. Further, we will recall the definition and some properties of the limiting subdifferential, a notion which will play an important role in the convergence analysis we are going to carry out for the nonconvex
ADMM algorithm. Let $\Psi \colon \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ be a proper and lower semicontinuous function. For any $x \in \operatorname{dom}\Psi := \{x \in \mathbb{R}^N : \Psi(x) < +\infty\}$, the Fréchet (viscosity) subdifferential of $\Psi$ at $x$ is

$$\hat{\partial}\Psi(x) := \Big\{ d \in \mathbb{R}^N : \liminf_{y \to x} \frac{\Psi(y) - \Psi(x) - \langle d, y - x \rangle}{\|y - x\|} \ge 0 \Big\}$$

and the limiting (Mordukhovich) subdifferential of $\Psi$ at $x$ is

$$\partial\Psi(x) := \big\{ d \in \mathbb{R}^N : \text{there exist sequences } x^k \to x \text{ and } d^k \to d \text{ as } k \to +\infty \text{ such that } \Psi(x^k) \to \Psi(x) \text{ as } k \to +\infty \text{ and } d^k \in \hat{\partial}\Psi(x^k) \text{ for all } k \ge 0 \big\}.$$

For $x \notin \operatorname{dom}\Psi$, we set $\hat{\partial}\Psi(x) = \partial\Psi(x) := \emptyset$. The inclusion $\hat{\partial}\Psi(x) \subseteq \partial\Psi(x)$ holds for each $x \in \mathbb{R}^N$ in general. In case $\Psi$ is convex, these two subdifferential notions coincide with the convex subdifferential, thus

$$\hat{\partial}\Psi(x) = \partial\Psi(x) = \{d \in \mathbb{R}^N : \Psi(y) \ge \Psi(x) + \langle d, y - x \rangle \ \forall y \in \mathbb{R}^N\} \quad \text{for all } x \in \operatorname{dom}\Psi.$$

If $x \in \mathbb{R}^N$ is a local minimum of $\Psi$, then $0 \in \partial\Psi(x)$. We denote by $\operatorname{crit}(\Psi) = \{x \in \mathbb{R}^N : 0 \in \partial\Psi(x)\}$ the set of critical points of $\Psi$. The limiting subdifferential fulfills the following closedness criterion: if $\{x^k\}_{k \ge 0}$ and $\{d^k\}_{k \ge 0}$ are sequences in $\mathbb{R}^N$ such that $d^k \in \partial\Psi(x^k)$ for all $k \ge 0$, and $(x^k, d^k) \to (x, d)$ and $\Psi(x^k) \to \Psi(x)$ as $k \to +\infty$, then $d \in \partial\Psi(x)$. We also have the following subdifferential sum rule ([34, Proposition 1.107], [38, Exercise 8.8]): if $\Phi \colon \mathbb{R}^N \to \mathbb{R}$ is a continuously differentiable function, then $\partial(\Psi + \Phi)(x) = \partial\Psi(x) + \nabla\Phi(x)$ for all $x \in \mathbb{R}^N$; and the following formula for the subdifferential of the composition with a linear operator $A$ ([34, Proposition 1.112], [38, Exercise 10.7]): if $x \in \operatorname{dom}(\Psi \circ A)$ and $A$ is injective, then $\partial(\Psi \circ A)(x) = A^* \partial\Psi(Ax)$.

We close this section by presenting some convergence results for real sequences that will be used in the sequel in the convergence analysis. The next lemma is often used in the literature when proving convergence of numerical algorithms relying on Fejér monotonicity techniques (see, for instance, [, Lemma.], [4, Lemma ]).

Lemma 2. Let $\{b_k\}_{k \ge 0}$ be a sequence in $\mathbb{R}$ and $\{\xi_k\}_{k \ge 0}$ a sequence in $\mathbb{R}_+$. Assume that $\{b_k\}_{k \ge 0}$ is bounded from below and that for every $k \ge 0$

$$b_{k+1} + \xi_k \le b_k.$$

Then the following statements hold:

1. the sequence $\{\xi_k\}_{k \ge 0}$ is summable, namely $\sum_{k \ge 0} \xi_k < +\infty$;

2.
the sequence tb u is monotonically decreasing and convergent. The following lemma, which is an extension of [, Lemma.3] (see, also [4, Lemma 3]), is of interest by its own. Lemma 3. Let a : `a (, a,..., a N be a sequence in RǸ and tδ u a sequence in R D, a ` c 0, a D a D a D ` ě, (3) where c 0 : pc 0,, c 0,,..., c 0,N q P R N, c : pc,, c,,..., c,n q P R Ǹ and c : pc,, c,,..., c,n q P R Ǹ fulfill c 0 ` c ` c ă. Assume further that there exists δ s ě 0 such that for every K ě K ě Then for every i,..., N we have K δ ď s δ. ÿ a i ă `8. In particular, for every i,..., N and every K ě K ě, it holds K a i ď Nÿ j p c 0,j c,j q a K j ` p c 0,jq a K` j ı ` a K` j ` sδ c 0,i c,i c,i. (4) 5
6 Proof. Fix K ě K ě. If K K or K K ` then (4) holds automatically. Consider now the case when K ě K `. Summing up the inequality(3) for K `,, K, we obtain C Since, K` a ` G ď C K` c 0, K` K` K` K` a ` a a a a G ` K` ÿ K`3 K K ÿ K` K ÿ K the inequality (5) can be rewritten as C `, C K c, C a c, K K` a `a K ` a K` a a K K a G ` C c, K` a G ` a ` a K` a K a K` a K` a a K K ` a a a K K ` a, G C E a ` A, a K` a K a K` a K` ď c 0, K which further implies» Nÿ p c 0,j c,j c,j q j G C E a Ac, a K ` a K ` c, K a j fi fl Hence, for every i,..., N it holds C K c 0 c c, K a G G E a Ac, a K ` a K ` K a G 0, a K ` a K`D K` c 0 c, a KD c 0, a K`D a K`D ` Nÿ j p c 0,j c,j q a K j ` p c 0,jq a K` j ı ` a K` j δ, δ. (5) K` ` δ K` δ. p c 0,i c,i c,i q K a i ď Nÿ j p c 0,j c,j q a K j ` p c 0,jq a K` j ı ` a K` j ` sδ and the conclusion follows by taing into consideration that c 0 ` c ` c ă. A proximal ADMM and a proximal linearized ADMM algorithm in the nonconvex setting In this section we will propose two proximal ADMM algorithms for solving the optimization problem () and we will study their convergence behaviour. In this context, a central role will be played by the augmented Lagrangian associated with problem (), which is defined for every r ą 0 as L r : R n ˆ R m ˆ R m Ñ R Y t`8u, L r px, z, yq g pzq ` h pxq ` xy, Ax zy ` r Ax z. 6
2.1 General formulations and particular instances written in the spirit of full proximal splitting algorithms

Algorithm 1. Let $\{M_1^k\}_{k \ge 0} \subseteq \mathcal{S}^n_+$ and $\{M_2^k\}_{k \ge 0} \subseteq \mathcal{S}^m_+$ be matrix sequences, $r > 0$ and $0 < \rho < 2$. For a given starting vector $(x^0, z^0, y^0) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m$, generate the sequence $\{(x^k, z^k, y^k)\}_{k \ge 0}$ for every $k \ge 0$ as:

$$z^{k+1} \in \operatorname*{arg\,min}_{z \in \mathbb{R}^m} \Big\{ g(z) + \langle y^k, Ax^k - z \rangle + \frac{r}{2}\|Ax^k - z\|^2 + \frac{1}{2}\|z - z^k\|^2_{M_2^k} \Big\}, \qquad (16a)$$

$$x^{k+1} \in \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \Big\{ h(x) + \langle y^k, Ax - z^{k+1} \rangle + \frac{r}{2}\|Ax - z^{k+1}\|^2 + \frac{1}{2}\|x - x^k\|^2_{M_1^k} \Big\}, \qquad (16b)$$

$$y^{k+1} := y^k + \rho r\big(Ax^{k+1} - z^{k+1}\big). \qquad (16c)$$

Let $\{t_k\}_{k \ge 0}$ be a sequence of positive real numbers such that $t_k r \|A\|^2 \le 1$, and take $M_1^k := \frac{1}{t_k}\mathrm{Id} - rA^*A$ and $M_2^k := 0$ for every $k \ge 0$. Algorithm 1 becomes an iterative scheme which generates a sequence $\{(x^k, z^k, y^k)\}_{k \ge 0}$ for every $k \ge 0$ as:

$$z^{k+1} \in \operatorname*{arg\,min}_{z \in \mathbb{R}^m} \Big\{ g(z) + \frac{r}{2}\Big\|z - Ax^k - \frac{1}{r}y^k\Big\|^2 \Big\},$$

$$x^{k+1} \in \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \Big\{ h(x) + \frac{1}{2t_k}\Big\|x - x^k + t_k A^*\big(y^k + r(Ax^k - z^{k+1})\big)\Big\|^2 \Big\},$$

$$y^{k+1} := y^k + \rho r\big(Ax^{k+1} - z^{k+1}\big).$$

Recall that the proximal point operator with parameter $\gamma > 0$ of a proper and lower semicontinuous function $\Psi \colon \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ is the set-valued operator defined as ([35])

$$\operatorname{prox}_{\gamma\Psi} \colon \mathbb{R}^N \rightrightarrows \mathbb{R}^N, \quad \operatorname{prox}_{\gamma\Psi}(x) = \operatorname*{arg\,min}_{y \in \mathbb{R}^N} \Big\{ \Psi(y) + \frac{1}{2\gamma}\|x - y\|^2 \Big\}.$$

The above particular instance of Algorithm 1 is an iterative scheme formulated in the spirit of full splitting numerical methods, namely, the functions $g$ and $h$ are evaluated by their proximal operators, while the linear operator $A$ and its adjoint are evaluated by simple forward steps. Exact formulas for the proximal operator are available not only for large classes of convex functions ([8]), but also for many nonconvex functions appearing in applications ([3, 3, 9]). The second algorithm that we propose in this paper replaces $h$ in the definition of $x^{k+1}$ by its linearization at $x^k$ for every $k \ge 0$.

Algorithm 2. Let $\{M_1^k\}_{k \ge 0} \subseteq \mathcal{S}^n_+$ and $\{M_2^k\}_{k \ge 0} \subseteq \mathcal{S}^m_+$ be matrix sequences, $r > 0$ and $0 < \rho < 2$. For a given starting vector $(x^0, z^0, y^0) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m$, generate the sequence $\{(x^k, z^k, y^k)\}_{k \ge 0}$ for every $k \ge 0$ as:
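With $M_1^k = \frac{1}{t}\mathrm{Id} - rA^*A$ and $M_2^k = 0$ the scheme above is a genuine full-splitting method. The sketch below instantiates it on a hypothetical toy problem with $g = \lambda\|\cdot\|_1$ (so $\operatorname{prox}_{g/r}$ is soft-thresholding) and $h(x) = \frac{1}{2}\|x - b\|^2$ (so $\operatorname{prox}_{th}(v) = (v + tb)/(1 + t)$); these concrete $g$, $h$ and all parameter values are assumptions for illustration only, not the paper's experiments.

```python
import numpy as np

def prox_l1(v, tau):
    """prox of tau*||.||_1: componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def p_admm(A, b, lam=0.1, r=2.0, rho=1.0, iters=2000):
    """Full-splitting instance of Algorithm 1: M1^k = (1/t)Id - r A^T A
       with t*r*||A||^2 <= 1, M2^k = 0, on toy data
       g = lam*||.||_1 and h = 0.5*||. - b||^2 (illustrative choices)."""
    m, n = A.shape
    t = 1.0 / (r * np.linalg.norm(A, 2) ** 2)  # enforces t*r*||A||^2 = 1
    x, z, y = np.zeros(n), np.zeros(m), np.zeros(m)
    for _ in range(iters):
        z = prox_l1(A @ x + y / r, lam / r)            # prox step on g
        v = x - t * (A.T @ (y + r * (A @ x - z)))      # forward step on A, A^*
        x = (v + t * b) / (1.0 + t)                    # prox_{t h}(v) in closed form
        y = y + rho * r * (A @ x - z)                  # relaxed multiplier update
    return x, z, y
```

Note that only proximal steps for $g$ and $h$ and matrix-vector products with $A$ and $A^*$ appear, exactly the full-splitting character described above.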
$$z^{k+1} \in \operatorname*{arg\,min}_{z \in \mathbb{R}^m} \Big\{ g(z) + \langle y^k, Ax^k - z \rangle + \frac{r}{2}\|Ax^k - z\|^2 + \frac{1}{2}\|z - z^k\|^2_{M_2^k} \Big\}, \qquad (17a)$$

$$x^{k+1} \in \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \Big\{ \langle x - x^k, \nabla h(x^k) \rangle + \langle y^k, Ax - z^{k+1} \rangle + \frac{r}{2}\|Ax - z^{k+1}\|^2 + \frac{1}{2}\|x - x^k\|^2_{M_1^k} \Big\}, \qquad (17b)$$

$$y^{k+1} := y^k + \rho r\big(Ax^{k+1} - z^{k+1}\big). \qquad (17c)$$

Due to the presence of the variable metric inducing matrix sequences we can thus provide a unifying scheme for several linearized ADMM algorithms discussed in the literature (see [3, 3, 36, 37, 43, 45]), which can be recovered for specific choices of the variable metrics. When taking, as for Algorithm 1, $M_1^k := \frac{1}{t_k}\mathrm{Id} - rA^*A$, where $t_k r \|A\|^2 \le 1$, and $M_2^k := 0$ for every $k \ge 0$, Algorithm 2 translates for every $k \ge 0$ into:

$$z^{k+1} \in \operatorname*{arg\,min}_{z \in \mathbb{R}^m} \Big\{ g(z) + \frac{r}{2}\Big\|z - Ax^k - \frac{1}{r}y^k\Big\|^2 \Big\},$$

$$x^{k+1} := x^k - t_k\Big(\nabla h(x^k) + A^*\big(y^k + r(Ax^k - z^{k+1})\big)\Big),$$

$$y^{k+1} := y^k + \rho r\big(Ax^{k+1} - z^{k+1}\big).$$
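Under the same metric choice the linearized scheme needs only $\nabla h$. With the same hypothetical toy data as above ($g = \lambda\|\cdot\|_1$, $h(x) = \frac{1}{2}\|x - b\|^2$, hence $\nabla h(x) = x - b$; all illustrative assumptions), the three updates become:

```python
import numpy as np

def prox_l1(v, tau):
    """prox of tau*||.||_1: componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def pl_admm(A, b, lam=0.1, r=2.0, rho=1.0, iters=2000):
    """Instance of Algorithm 2 with M1^k = (1/t)Id - r A^T A, M2^k = 0:
       the smooth term h enters only through its gradient, here
       grad h(x) = x - b (toy choice for illustration)."""
    m, n = A.shape
    t = 1.0 / (r * np.linalg.norm(A, 2) ** 2)  # stepsize with t*r*||A||^2 = 1
    x, z, y = np.zeros(n), np.zeros(m), np.zeros(m)
    for _ in range(iters):
        z = prox_l1(A @ x + y / r, lam / r)
        # explicit gradient step on h instead of a prox/inner subproblem
        x = x - t * ((x - b) + A.T @ (y + r * (A @ x - z)))
        y = y + rho * r * (A @ x - z)
    return x, z, y
```

The contrast with the previous sketch is exactly the point of (PL-ADMM): the $x$-update is a single explicit formula, with no minimization subproblem involving $h$.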
This iterative scheme has the remarkable property that the smooth term is evaluated via a gradient step. This is an improvement with respect to other nonconvex ADMM algorithms, such as [4, 44], where the smooth function is involved in a subproblem which can be in general difficult to solve, unless it can be reformulated as a proximal step (see [30]). We will carry out a parallel convergence analysis for Algorithm 1 and Algorithm 2 and work to this end in the following setting.

Assumption 1. Assume that $A$ is surjective and that $r > 0$, $\rho \in (0, 2)$, $\mu_1 := \sup_{k \ge 0} \|M_1^k\| < +\infty$ and $\mu_2 := \sup_{k \ge 0} \|M_2^k\| < +\infty$ are such that there exists $\gamma > 1$ with

$$r \ge (1 + \gamma) T_1 L > 0 \qquad (18)$$

and

$$M_3^k := M_1^k + rA^*A - C_1\,\mathrm{Id} \succcurlyeq \frac{3}{2}C_0\,\mathrm{Id} \succcurlyeq 0 \quad \forall k \ge 0, \qquad (19)$$

where

$$T_0 := \begin{cases} \dfrac{2 - \rho}{\lambda_{\min}(AA^*)\rho^2 r}, & \text{if } 0 < \rho \le 1, \\[1ex] \dfrac{\rho - 1}{\lambda_{\min}(AA^*)(2 - \rho) r}, & \text{if } 1 < \rho < 2, \end{cases} \qquad T_1 := \begin{cases} \dfrac{2}{\lambda_{\min}(AA^*)\rho^2}, & \text{if } 0 < \rho \le 1, \\[1ex] \dfrac{\rho}{\lambda_{\min}(AA^*)(2 - \rho)}, & \text{if } 1 < \rho < 2, \end{cases}$$

and

$$C_0 := \begin{cases} \dfrac{4 T_1 \mu_1^2}{r}, & \text{for Algorithm 1}, \\[1ex] \dfrac{4 T_1 (L + \mu_1)^2}{r}, & \text{for Algorithm 2}, \end{cases} \qquad C_1 := \begin{cases} L + \dfrac{4 T_1 (L + \mu_1)^2}{r}, & \text{for Algorithm 1}, \\[1ex] L + \dfrac{4 T_1 \mu_1^2}{r}, & \text{for Algorithm 2}. \end{cases}$$

Remark 2. Notice that (19) can be equivalently written as

$$M_1^k + rA^*A - \Big(L + \frac{1}{r}C_M\Big)\mathrm{Id} \succcurlyeq 0 \quad \forall k \ge 0, \quad \text{where } C_M := \begin{cases} \big(6\mu_1^2 + 4(L + \mu_1)^2\big)T_1, & \text{for Algorithm 1}, \\ \big(4\mu_1^2 + 6(L + \mu_1)^2\big)T_1, & \text{for Algorithm 2}. \end{cases} \qquad (20)$$

In the following we present some possible choices of the matrix sequences $\{M_1^k\}_{k \ge 0}$ and $\{M_2^k\}_{k \ge 0}$ which fulfill Assumption 1.

1. Since $rA^*A \in \mathcal{S}^n_+$, when $\mu_1 := \sup_{k \ge 0}\|M_1^k\| > L$, by choosing

$$r \ge \max\Big\{ (1 + \gamma) T_1 L, \ \frac{C_M}{\mu_1 - L} \Big\} > 0,$$

there exists $\alpha_1 > 0$ such that $\mu_1 \ge \alpha_1 \ge L + \frac{1}{r}C_M > 0$. Thus (18) is verified, while (20) is ensured when choosing $M_1^k$ such that $\mu_1 \mathrm{Id} \succcurlyeq M_1^k \succcurlyeq \alpha_1 \mathrm{Id}$ for every $k \ge 0$.

2. Let $M_1^k := \frac{1}{t}\mathrm{Id} - rA^*A$ for every $k \ge 0$, where $0 < t < \min\big\{\frac{1}{r\|A\|^2}, \frac{1}{L}\big\}$. Then the relation (20) becomes $\frac{1}{t}\mathrm{Id} - \big(L + \frac{1}{r}C_M\big)\mathrm{Id} \succcurlyeq 0$, which automatically holds (as also (18) does), if

$$r \ge \max\Big\{ (1 + \gamma) T_1 L, \ \frac{t C_M}{1 - tL} \Big\} > 0.$$
9 3. If A is assumed to be also injective, then ra A ě rλ min pa Aq ą 0. By choosing # + it follows that r ě max p ` γq T L, L ` al ` 4λ min pa Aq C M λ min pa Aq ra A `L ` r C M Id ě 0, thus, (8) and (0) hold for an arbitrary sequence of symmetric and positive semidefinite matrices M (. A possible choice is M 0 and M 0 for every ě 0, which allows us to recover the classical ADMM. When proving convergence for variable metric algorithms designed for convex optimization problems one usually assumes monotonicity for the matrix sequences inducing the variable metrics (see, for instance, [7, 5]). It is worth to mention that in this paper we manage to perform the convergence analysis for both Algorithm and Algorithm without any monotonicity assumption on ( M and ( M.. Preliminaries of the convergence analysis The following result of Fejér monotonicity type will play a fundamental role in our convergence analysis. Lemma 4. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm. Then for every ě it holds: L `x` r, z `, y ` ` T `y A ` 0 y ` ď L r `x, z, y ` T 0 A `y y ` C0 ą 0, x ` x ` M z ` z 3 M x x. () Proof. Let ě be fixed. In both cases the proof builds on showing that the following inequality L `x` r, z `, y ` ` x ` x L M x ` x `ra A ` z ` z M ď L `x r, z, y ` y ` y () is true and on providing afterwards an upper bound for. For Algorithm : From (6a) we have y ` y. g `z ` Ax z `D ` r Ax z ` ` z ` z M ď g `z Ax z D ` r Ax z. (3) The optimality criterion of (6b) is h `x ` A y `Ax ra ` z ` ` M `x x `. (4) From (7) (applied for z : x ` ) we get h `x ` ď h `x Ax Ax `D ` Ax ` z `, Ax Ax `D x ` x M ` L x ` x. (5) By combining (3), (5) and (6c), after some rearrangements, we obtain (). By using the notation u l : h `x l ` M l `xl x ě (6) and by taing into consideration (6c), we can rewrite (4) as A y l` ρu l` ` p ρq A y ě 0. (7) 9
10 The case 0 ă ρ ď. We have A `y ` y ρ Since 0 ă ρ ď, the convexity of gives and from here we get `u ` u ` p ρq A `y y. `y A ` y ď ρ u ` u ` p ρq `y A y. λ min paa qρ y ` y ď ρ `y A ` y ď ρ u ` u ` p ρq `y A y p ρq `y A ` y, (8) By using the Lipschitz continuity of h, we have u ` u ď pl ` µ q x ` x ` µ x x, (9) thus u ` u ď pl ` µ q x ` x ` µ x x. (30) After plugging (30) into (8), we get y ` y ď pl ` µ q λ min paa q ` p ρq λ min paa qρ r which, combined with (), provides (). x ` x ` µ λ min paa q A `y y The case ă ρ ă. This time we have from (7) that A `y ` y p ρq As ă ρ ă, the convexity of gives x x p ρq λ min paa qρ r ρ `u` u ρ `y ` pρ q A y. `y A ` y ď ρ u ` u ` pρ q `y A y. ρ and from here it follows A `y ` y, λ min paa q p ρq y ` y ď p ρq `y A ` y ď ρ u ` u ` pρ q `y A y pρ q `y A ` y, (3) ρ After plugging (30) into (3), we get y ` y ď ρ pl ` µ q λ min paa q p ρq r pρ q ` λ min paa q p ρq pρ q λ min paa q p ρq which, combined with (), provides ().. For Algorithm : The optimality criterion of (7b) is x ` x ` A `y y (3) ρµ x λ min paa q p ρq x r A `y ` y, (33) h `x A y `Ax ra ` z ` ` M `x x `. (34) 0
11 From (7) (applied for z : x ) we get h `x ` ď h `x Ax Ax `D ` Ax ` z `, Ax Ax `D x ` x M ` L x ` x. (35) Since the definition of z ` in (7a) leads also to (3), by combining this inequality with (35) and (7c), after some rearrangments, () follows. By using this time the notation u l : h `x l ` M l `xl x ě (36) and by taing into consideration (7c), we can rewrite (34) as The case 0 ă ρ ď. As in (8), we obtain A y l` ρu l` ` p ρq A y ě 0. (37) λ min paa qρ y ` y ď ρ `y A ` y ď ρ u ` u ` p ρq `y A y p ρq `y A ` y. (38) By using the Lipschitz continuity of h, we have u ` u ď µ x ` x ` pl ` µ q x x, (39) thus u ` u ď µ x ` x ` pl ` µ q x x. (40) After plugging (40) into (38), it follows y ` y ď µ λ min paa q p ρq ` λ min paa qρ r which, combined with (), provides (). The case ă ρ ă. As in (3), we obtain x ` x ` pl ` µ q λ min paa q A `y y x x p ρq λ min paa qρ r A `y ` y, λ min paa q p ρq y ` y ď p ρq `y A ` y ď ρ u ` u ` pρ q `y A y pρ q `y A ` y. (4) ρ After plugging (40) into (4), it follows y ` y ď ρµ λ min paa q p ρq r pρ q ` λ min paa q p ρq pρ q λ min paa q p ρq which, combined with (), provides (). This concludes the proof. x ` x ` The following three estimates will be useful in the sequel. A `y y ρ pl ` µ q λ min paa q p ρq r (4) x x A `y ` y. (43) Lemma 5. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm. Then the following statements are true:
12 (i) z ` z ď A x ` x ` Ax ` z ` ` Ax z A x ` x ` y ` y ` y ě ; (44) (ii) (iii) y ` ď T 0 `y A ` y ` T `x` h ` C0 x ` ě 0; (45) r r 4 y ` y ď C3 x ` x ` C4 x x ` T ` A `y y A `y ` ě, (46) where $ ρ pl ` µ q a, for Algorithm, & λmin paa q p ρ q C 3 : ρµ % a, for Algorithm, λmin paa q p ρ q $ ρµ a, for Algorithm, & λmin paa q p ρ q C 4 : ρ pl ` µ q % a, for Algorithm, λmin paa q p ρ q T : ρ a λmin paa q p ρ q. (47) Proof. The statement in (44) is straightforward. From (7) and (37) we have for every ě 0 or, equivalently A y ` ρu ` ` p ρq A y ρa y ` ρu ` ` p ρq A `y y `, where u ` is defined as being equal to u ` in (6), for Algorithm, and, respectively, to u ` in (36), for Algorithm. For 0 ă ρ ď we have λ min paa qρ y ` ď ρ A y ` ď ρ u ` ` p ρq A `y ` y, (48) while when ă ρ ă we have λ min paa qρ y ` ď ρ A y ` ď Notice further that when ă ρ ă we have {ρ ă and ă ρ{ p ρq. When u ` is defined as in (6), it holds ρ u ` ` pρ q `y A ` y. (49) ρ u ` u ` ď `x` h ` µ x ` ě 0, (50) while, when u ` is defined as in (36), it holds u ` u ` ď h `x ` ` pl ` µ q x ` ě 0. (5) We divide (48) and (49) by λ min paa qρ r ą 0 and plug (50) and, respectively, (5) into the resulting inequalities. This gives us (45).
13 Finally, in order to prove (46), we notice that for every ě it holds A `y ` y ď ρ u ` u ` ρ A `y y, so, a λmin paa q p ρ q y ` y ď p ρ q A `y ` y ď ρ u ` u ` ρ A `y y ρ A `y ` y. (5) We plug into (5) the estimates for u ` u derived in (9) and, respectively, (39) and divide the resulting inequality by a λ min paa q p ρ q ą 0. This furnishes the desired statement. The following regularization of the augmented Lagrangian will play an important role in the convergence analysis of the nonconvex proximal ADMM algorithms: F r : R n ˆ R m ˆ R m ˆ R n ˆ R m Ñ R Y t`8u, F r px, z, y, x, y q L r px, z, yq ` T 0 A `y y ` C0 where T 0 and C 0 are defined in Assumption. For every ě, we denote F : F r `x, z, y, x, y L r `x, z, y ` T 0 A `y y ` C0 x x, (53) x x. (54) `x Since the convergence analysis will rely on the fact that the set of cluster points of the sequence, z, y ( is nonempty, we will present first two situations which guarantee that this sequence is bounded. They mae use of standard coercivity assumptions for the functions g and h, respectively. Recall that a function Ψ : R N Ñ R Y t`8u is called coercive, if lim x Ñ`8 Ψ pxq `8. Theorem 6. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm. Suppose that one of the following conditions holds: (B-I) The operator A is invertible, g is coercive and h is bounded from below; (B-II) The function h is coercive and g and h are bounded from below. Then the sequence `x, z, y ( is bounded. Proof. From Lemma 4 we have that for every ě F ` ` x ` x ` z ` z ď F M 3 C0Id M (55) which shows, according to (9), that tf u ě is monotonically decreasing. Consequently, for every ě we have F ě F ` ` x ` x ` M 3 C0Id h `x ` ` g `z ` r ` T 0 A `y ` y ` which, thans to (45), leads to F ě h `x ` ` g `z ` T r ` T0 `y A ` y ` y ` ` r z ` z M Ax` z ` ` x ` x M 3 C0Id ` h `x ` ` r r y` Ax` z ` ` x ` x M 3 C0Id ` z ` z M ` C0 }x` x }, r y` z ` z M ` C0 4 }x` x }. 
(56) Next we will prove the boundedness of $\{(x^k, z^k, y^k)\}_{k \ge 0}$ under each of the two scenarios.
14 (B-I) Since r ě p ` γq T L ą T L ą 0, there exists σ ą 0 such that σ L σ T r. From Proposition and (56) we see that for every ě g `z ` ` r Ax` z ` ` y` r ` C0 }x` x } 4 " * ď F inf h pxq T h pxq ă `8. xpr n r Since g is coercive, it follows that the sequences z (, Ax z ` r y ( and x ` x ( are bounded. This implies that Apx ` x q pz ` z q ( is bounded, from which we obtain the boundedness of r py ` y q (. According to the third update in the iterative scheme, we obtain that Ax z ( and thus y ( are also bounded. This implies the boundedness of Ax ( and, finally, since A is invertible, the boundedness of x (. (B-II) Again thans to (8) there exists σ ą 0 such that σ L σ p ` γq T r. We assume first that ρ or, equivalently, T 0 0. From Proposition and (56) we see that for every ě ˆ h `x ` ` T h `x ` ` r γ γr Ax` z ` ` y` r ` T0 }A py ` y q} ď F g `z ` γ inf xpr n " h pxq p ` γq T h pxq r * ă `8. Since h is coercive, we obtain that x (, Ax z ` r y ( and A `y ` y ( are bounded. For every ě 0 we have that λ min pa Aqρ r }Ax ` z ` } λ min pa Aqρ r }y ` y } ď }A py ` y q}, thus Ax z ( is bounded. Consequently, y ( and z ( are bounded. In case ρ or, equivalently, T 0 0, we have that for every ě ˆ h `x ` ` T `x` h ` r γ γr Ax` z ` ` ď F g `z ` " γ inf h pxq p ` γq T * h pxq ă `8, xpr n r r y` from which we deduce that x ( and Ax z ` r y ( are bounded. From Lemma 5 (iii) it yields that y ` y ( is bounded, thus, Ax z ( is bounded. Consequently, y ( ( and z are bounded. Both considered scenarios lead to the conclusion that the sequence `x, z, y ( is bounded. We state now the first convergence result of this paper. Theorem 7. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm, which is assumed to be bounded. The following statements are true: (i) For every ě it holds F ` ` C0 4 x ` x ` z ` z ď F M. (57) (ii) The sequence tf u is bounded from below and convergent. Moreover, x ` x Ñ 0, z ` z Ñ 0 and y ` y Ñ 0 as Ñ `8. (58) 4
15 (iii) The sequences tf u, L `x r, z, y ( and h `x ` g `z ( have the same limit, which we denote by F P R. Proof. (i) According to (9) we have that M 3 C 0 Id P P n C 0 and thus (55) implies (57). (ii) We will show that L `x r, z, y ( is bounded from below, which will imply that tf u is bounded from below as well. Assuming the contrary, as `x, z, y ( is bounded, there exists a subsequence `xq, z q, y q (qě0 converging to an element ppx, pz, pyq P Rn ˆ R m ˆ R m such that Lr `x q, z q, y q ( converges to 8 as q Ñ `8. However, using the lower semicontinuity of g qě0 and the continuity of h, we obtain lim inf L r `x q, z q, y q r ě h ppxq ` g ppzq ` xpy, Apx pzy ` qñ`8 Apx pz, which leads to a contradiction. From Lemma we conclude that tf u ě is convergent and ÿ x ` x ă `8, thus x ` x Ñ 0 as Ñ `8. We proved in (3), (33), (4) and (43) that for every ě y ` y ď C L x ` x ` C0 x x ` T 0 A `y y T 0 A `y ` y. Summing up the above inequality for,..., K, for K ą, we get y ` y ď C L x ` x ` C0 x x ` T 0 A `y y 0 T 0 A `y K` y K ď C L We let K converge to `8 and conclude ÿ x ` x ` C0 Ax ` z ` ÿ x x ` T 0 A `y y 0. y ` y ă `8, thus Ax ` z ` Ñ 0 and y ` y Ñ 0 as Ñ `8. Since x ` x Ñ 0 as Ñ `8, it follows that z ` z Ñ 0 as Ñ `8. (iii) By using (58) and the fact that y ( is bounded, it follows F lim F lim L `x r, z, y `x h ` g `z (. Ñ`8 Ñ`8 lim Ñ`8 The following lemmas provides upper estimates in terms of the iterates for limiting subgradients of the augmented Lagrangian and the regularized augmented Lagrangian F r, respectively. Lemma 8. Let Assumption be satisfied and `x, z, y ( be a sequence generated by Algorithm or Algorithm. For every ě 0 we have d ` ` : `d x, d ` z, d ` `x` y P BLr, z `, y `, (59) where `x` d ` x : C ` h h `x ` `y A ` y ` M `x x `, d ` z : y y ` ` ra `x x ` ` M `z z `, d ` y : `y` y. (60a) (60b) (60c) 5
and
$$C := \begin{cases} 1, & \text{for Algorithm 2}, \\ 0, & \text{for Algorithm 1}. \end{cases}$$
Moreover, for every $k \geq 0$ it holds
$$\|d^{k+1}\| \leq C_5\|x^{k+1} - x^k\| + C_6\|z^{k+1} - z^k\| + C_7\|y^{k+1} - y^k\|, \quad (61)$$
where
$$C_5 := CL + \mu_1 + r\|A\|, \qquad C_6 := \mu_2, \qquad C_7 := 1 + \|A\| + \frac{1}{\rho r}. \quad (62)$$

Proof. Let $k \geq 0$ be fixed. Applying the calculus rules of the limiting subdifferential, we obtain
$$\nabla_x \mathcal{L}_r(x^{k+1}, z^{k+1}, y^{k+1}) = \nabla h(x^{k+1}) + A^*y^{k+1} + rA^*(Ax^{k+1} - z^{k+1}), \quad (63a)$$
$$\partial_z \mathcal{L}_r(x^{k+1}, z^{k+1}, y^{k+1}) = \partial g(z^{k+1}) - y^{k+1} - r(Ax^{k+1} - z^{k+1}), \quad (63b)$$
$$\nabla_y \mathcal{L}_r(x^{k+1}, z^{k+1}, y^{k+1}) = Ax^{k+1} - z^{k+1}. \quad (63c)$$
Then (60c) follows directly from (63c) and (6c), respectively (7c), while (60b) follows from
$$y^k + r(Ax^k - z^{k+1}) + M_2^k(z^k - z^{k+1}) \in \partial g(z^{k+1}),$$
which is a consequence of the optimality criterion for (6a) and (7a), respectively. In order to derive (60a), let us notice that for Algorithm 1 we have (see (4))
$$\nabla h(x^{k+1}) + rA^*(Ax^{k+1} - z^{k+1}) = -A^*y^k + M_1^k(x^k - x^{k+1}), \quad (64)$$
while for Algorithm 2 we have (see (34))
$$\nabla h(x^k) + rA^*(Ax^{k+1} - z^{k+1}) = -A^*y^k + M_1^k(x^k - x^{k+1}). \quad (65)$$
By using (63a) we get the desired statement. Relation (61) follows by combining the inequalities
$$\|d_x^{k+1}\| \leq (CL + \mu_1)\|x^{k+1} - x^k\| + \|A\|\|y^{k+1} - y^k\|,$$
$$\|d_z^{k+1}\| \leq \|y^{k+1} - y^k\| + r\|A\|\|x^{k+1} - x^k\| + \mu_2\|z^{k+1} - z^k\|$$
with (5).

Lemma 9. Let Assumption 1 be satisfied and $\{(x^k, z^k, y^k)\}$ be a sequence generated by Algorithm 1 or Algorithm 2. For every $k \geq 0$ we have
$$D^{k+1} := \big(D_x^{k+1}, D_z^{k+1}, D_y^{k+1}, D_{x'}^{k+1}, D_{y'}^{k+1}\big) \in \partial F_r\big(x^{k+1}, z^{k+1}, y^{k+1}, x^k, y^k\big), \quad (66)$$
where
$$D_x^{k+1} := d_x^{k+1} + 2C_0(x^{k+1} - x^k), \qquad D_z^{k+1} := d_z^{k+1}, \qquad D_y^{k+1} := d_y^{k+1} + 2T_0AA^*(y^{k+1} - y^k),$$
$$D_{x'}^{k+1} := -2C_0(x^{k+1} - x^k), \qquad D_{y'}^{k+1} := -2T_0AA^*(y^{k+1} - y^k). \quad (67)$$
Moreover, for every $k \geq 0$ it holds
$$\|D^{k+1}\| \leq C_8\|x^{k+1} - x^k\| + C_9\|z^{k+1} - z^k\| + C_{10}\|y^{k+1} - y^k\|, \quad (68)$$
where
$$C_8 := C_5 + 4C_0, \qquad C_9 := C_6, \qquad C_{10} := C_7 + 4T_0\|A\|^2. \quad (69)$$

Proof. Let $k \geq 0$ be fixed. Applying the calculus rules of the limiting subdifferential, it follows
$$\nabla_x F_r(x^{k+1}, z^{k+1}, y^{k+1}, x^k, y^k) = \nabla_x \mathcal{L}_r(x^{k+1}, z^{k+1}, y^{k+1}) + 2C_0(x^{k+1} - x^k), \quad (70a)$$
$$\partial_z F_r(x^{k+1}, z^{k+1}, y^{k+1}, x^k, y^k) = \partial_z \mathcal{L}_r(x^{k+1}, z^{k+1}, y^{k+1}), \quad (70b)$$
$$\nabla_y F_r(x^{k+1}, z^{k+1}, y^{k+1}, x^k, y^k) = \nabla_y \mathcal{L}_r(x^{k+1}, z^{k+1}, y^{k+1}) + 2T_0AA^*(y^{k+1} - y^k), \quad (70c)$$
$$\nabla_{x'} F_r(x^{k+1}, z^{k+1}, y^{k+1}, x^k, y^k) = -2C_0(x^{k+1} - x^k), \quad (70d)$$
$$\nabla_{y'} F_r(x^{k+1}, z^{k+1}, y^{k+1}, x^k, y^k) = -2T_0AA^*(y^{k+1} - y^k). \quad (70e)$$
Then (66) follows directly from the above relations and (59). Inequality (68) follows by combining
$$\|D_x^{k+1}\| \leq \|d_x^{k+1}\| + 2C_0\|x^{k+1} - x^k\|, \qquad \|D_y^{k+1}\| \leq \|d_y^{k+1}\| + 2T_0\|A\|^2\|y^{k+1} - y^k\|$$
with (5).

The following result is a straightforward consequence of Lemma 5 and Lemma 9.

Corollary 10. Let Assumption 1 be satisfied and $\{(x^k, z^k, y^k)\}$ be a sequence generated by Algorithm 1 or Algorithm 2. Then the norm of the element $D^{k+1} \in \partial F_r(x^{k+1}, z^{k+1}, y^{k+1}, x^k, y^k)$ defined in the previous lemma verifies for every $k \geq 2$ the following estimate:
$$\|D^{k+1}\| \leq C_{11}\big(\|x^{k+1} - x^k\| + \|x^k - x^{k-1}\| + \|x^{k-1} - x^{k-2}\|\big) + C_{12}\big(\|A^*(y^k - y^{k-1})\|^2 - \|A^*(y^{k+1} - y^k)\|^2\big) + C_{13}\big(\|A^*(y^{k-1} - y^{k-2})\|^2 - \|A^*(y^k - y^{k-1})\|^2\big), \quad (71)$$
where $C_{11}$, $C_{12}$ and $C_{13}$ are positive constants which can be expressed explicitly in terms of $C_8$, $C_9$, $C_{10}$ and $T_0$. \quad (72)

In the following, we denote by $\omega(\{u^k\})$ the set of cluster points of a sequence $\{u^k\}_{k \geq 0} \subseteq \mathbb{R}^N$.

Lemma. Let Assumption 1 be satisfied and $\{(x^k, z^k, y^k)\}$ be a sequence generated by Algorithm 1 or Algorithm 2, which is assumed to be bounded. The following statements are true:

(i) if $\{(x^{k_q}, z^{k_q}, y^{k_q})\}_{q \geq 0}$ is a subsequence of $\{(x^k, z^k, y^k)\}$ which converges to $(\hat{x}, \hat{z}, \hat{y})$ as $q \to +\infty$, then
$$\lim_{q \to +\infty} \mathcal{L}_r(x^{k_q}, z^{k_q}, y^{k_q}) = \mathcal{L}_r(\hat{x}, \hat{z}, \hat{y});$$

(ii) it holds
$$\omega\big(\{(x^k, z^k, y^k)\}\big) \subseteq \operatorname{crit}(\mathcal{L}_r) \subseteq \big\{(\hat{x}, \hat{z}, \hat{y}) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m : -A^*\hat{y} = \nabla h(\hat{x}),\ \hat{y} \in \partial g(\hat{z}),\ \hat{z} = A\hat{x}\big\};$$

(iii) we have $\lim_{k \to +\infty} \operatorname{dist}\big((x^k, z^k, y^k), \omega(\{(x^k, z^k, y^k)\})\big) = 0$;

(iv) the set $\omega(\{(x^k, z^k, y^k)\})$ is nonempty, connected and compact;

(v) the function $\mathcal{L}_r$ takes on $\omega(\{(x^k, z^k, y^k)\})$ the value $F_* = \lim_{k \to +\infty} \mathcal{L}_r(x^k, z^k, y^k)$, as the objective function $g \circ A + h$ does on $\operatorname{Pr}_{\mathbb{R}^n}\big[\omega(\{(x^k, z^k, y^k)\})\big]$.

Proof. Let $(\hat{x}, \hat{z}, \hat{y}) \in \omega(\{(x^k, z^k, y^k)\})$ and $\{(x^{k_q}, z^{k_q}, y^{k_q})\}_{q \geq 0}$ be a subsequence of $\{(x^k, z^k, y^k)\}$ converging to $(\hat{x}, \hat{z}, \hat{y})$ as $q \to +\infty$.

(i) From either (6a) or (7a) we obtain for all $q \geq 1$
$$g(z^{k_q}) + \big\langle y^{k_q-1}, Ax^{k_q-1} - z^{k_q}\big\rangle + \frac{r}{2}\|Ax^{k_q-1} - z^{k_q}\|^2 + \frac{1}{2}\|z^{k_q} - z^{k_q-1}\|^2_{M_2^{k_q-1}} \leq g(\hat{z}) + \big\langle y^{k_q-1}, Ax^{k_q-1} - \hat{z}\big\rangle + \frac{r}{2}\|Ax^{k_q-1} - \hat{z}\|^2 + \frac{1}{2}\|\hat{z} - z^{k_q-1}\|^2_{M_2^{k_q-1}}.$$
Taking the limit superior on both sides of the above inequality, we get
$$\limsup_{q \to +\infty} g(z^{k_q}) \leq g(\hat{z}),$$
which, combined with the lower semicontinuity of $g$, leads to
$$\lim_{q \to +\infty} g(z^{k_q}) = g(\hat{z}).$$
Since $h$ is continuous, we further obtain
$$\lim_{q \to +\infty} \mathcal{L}_r(x^{k_q}, z^{k_q}, y^{k_q}) = \lim_{q \to +\infty}\Big[g(z^{k_q}) + h(x^{k_q}) + \big\langle y^{k_q}, Ax^{k_q} - z^{k_q}\big\rangle + \frac{r}{2}\|Ax^{k_q} - z^{k_q}\|^2\Big] = g(\hat{z}) + h(\hat{x}) + \langle\hat{y}, A\hat{x} - \hat{z}\rangle + \frac{r}{2}\|A\hat{x} - \hat{z}\|^2 = \mathcal{L}_r(\hat{x}, \hat{z}, \hat{y}).$$

(ii) For the sequence $\{d^k\}$ defined in (60a)-(60c), we have that $d^{k_q} \in \partial\mathcal{L}_r(x^{k_q}, z^{k_q}, y^{k_q})$ for every $q \geq 1$ and $d^{k_q} \to 0$ as $q \to +\infty$, while $(x^{k_q}, z^{k_q}, y^{k_q}) \to (\hat{x}, \hat{z}, \hat{y})$ and $\mathcal{L}_r(x^{k_q}, z^{k_q}, y^{k_q}) \to \mathcal{L}_r(\hat{x}, \hat{z}, \hat{y})$ as $q \to +\infty$. The closedness criterion of the limiting subdifferential guarantees that $0 \in \partial\mathcal{L}_r(\hat{x}, \hat{z}, \hat{y})$ or, in other words, $(\hat{x}, \hat{z}, \hat{y}) \in \operatorname{crit}(\mathcal{L}_r)$. Choosing now an element $(\hat{x}, \hat{z}, \hat{y}) \in \operatorname{crit}(\mathcal{L}_r)$, it holds
$$0 = \nabla h(\hat{x}) + A^*\hat{y} + rA^*(A\hat{x} - \hat{z}), \qquad 0 \in \partial g(\hat{z}) - \hat{y} - r(A\hat{x} - \hat{z}), \qquad 0 = A\hat{x} - \hat{z},$$
which is further equivalent to
$$-A^*\hat{y} = \nabla h(\hat{x}), \qquad \hat{y} \in \partial g(\hat{z}), \qquad \hat{z} = A\hat{x}.$$

(iii)-(iv) The proof follows in the lines of the proof of Theorem 5 (ii)-(iii) in [9], also by taking into consideration [9, Remark 5], according to which the properties in (iii) and (iv) are generic for sequences satisfying $(x^{k+1}, z^{k+1}, y^{k+1}) - (x^k, z^k, y^k) \to 0$ as $k \to +\infty$, which is indeed the case due to (58).

(v) The conclusion follows according to the first two statements of this lemma and the third statement of Theorem 7.

Remark 3. An element $(\hat{x}, \hat{z}, \hat{y}) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m$ fulfilling
$$-A^*\hat{y} = \nabla h(\hat{x}), \qquad \hat{y} \in \partial g(\hat{z}), \qquad \hat{z} = A\hat{x}$$
is a so-called KKT point of the optimization problem (1). For such a KKT point we have
$$0 \in A^*\partial g(A\hat{x}) + \nabla h(\hat{x}). \quad (73)$$
When $A$ is injective this is further equivalent to
$$0 \in \partial(g \circ A)(\hat{x}) + \nabla h(\hat{x}) = \partial(g \circ A + h)(\hat{x}), \quad (74)$$
in other words, $\hat{x}$ is a critical point of the optimization problem (1). On the other hand, when the functions $g$ and $h$ are convex, then (73) and (74) are equivalent, which means that $\hat{x}$ is a global optimal solution of the optimization problem (1). In this case, $\hat{y}$ is a global optimal solution of the Fenchel dual problem of (1).

By combining Lemma 9, Theorem 7 and the lemma above, one obtains the following result.

Lemma.
Let Assumption 1 be satisfied and $\{(x^k, z^k, y^k)\}$ be a sequence generated by Algorithm 1 or Algorithm 2, which is assumed to be bounded. Denote by $\Omega := \omega\big(\{(x^k, z^k, y^k, x^{k-1}, y^{k-1})\}_{k \geq 1}\big)$. The following statements are true:

(i) it holds $\Omega \subseteq \{(\hat{x}, \hat{z}, \hat{y}, \hat{x}, \hat{y}) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m \times \mathbb{R}^n \times \mathbb{R}^m : (\hat{x}, \hat{z}, \hat{y}) \in \operatorname{crit}(\mathcal{L}_r)\}$;

(ii) we have $\lim_{k \to +\infty} \operatorname{dist}\big((x^k, z^k, y^k, x^{k-1}, y^{k-1}), \Omega\big) = 0$;

(iii) the set $\Omega$ is nonempty, connected and compact;

(iv) the regularized augmented Lagrangian $F_r$ takes on $\Omega$ the value $F_* = \lim_{k \to +\infty} F_k$, as the objective function $g \circ A + h$ does on $\operatorname{Pr}_{\mathbb{R}^n}\Omega$.
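As an aside, the KKT system $-A^*\hat{y} = \nabla h(\hat{x})$, $\hat{y} \in \partial g(\hat{z})$, $\hat{z} = A\hat{x}$ can be verified numerically on concrete data. The following sketch is not part of the paper: the instance $g = \|\cdot\|_1$, $h(x) = \frac{1}{2}\|x - b\|^2$, $A = \mathrm{Id}$ and all function names are our own illustrative choices. It evaluates the three residuals at the closed-form soft-thresholding solution; for the $\ell_1$-norm the subdifferential inclusion reduces to a componentwise sign condition.

```python
import numpy as np

def kkt_residuals(A, grad_h, x_hat, z_hat, y_hat):
    """Residuals of the KKT system -A^T y = grad_h(x), z = A x, together with
    a violation measure for y in subdiff g(z) in the special case g = ||.||_1."""
    r_stat = np.linalg.norm(A.T @ y_hat + grad_h(x_hat))   # stationarity
    r_feas = np.linalg.norm(A @ x_hat - z_hat)             # feasibility z = Ax
    # y in subdiff ||.||_1(z): y_i = sign(z_i) where z_i != 0, |y_i| <= 1 otherwise
    on = z_hat != 0
    r_sign = np.max(np.abs(y_hat[on] - np.sign(z_hat[on])), initial=0.0)
    r_box = max(np.max(np.abs(y_hat[~on]), initial=0.0) - 1.0, 0.0)
    return r_stat, r_feas, max(float(r_sign), float(r_box))

# Hypothetical instance: min ||x||_1 + 0.5 * ||x - b||^2, i.e. A = Id, whose
# solution is given componentwise by soft-thresholding of b at level 1.
b = np.array([3.0, 0.5])
x_hat = np.sign(b) * np.maximum(np.abs(b) - 1.0, 0.0)   # soft-thresholded point
z_hat = x_hat.copy()                                    # z = A x with A = Id
y_hat = b - x_hat                                       # y = -grad_h(x) = b - x
print(kkt_residuals(np.eye(2), lambda x: x - b, x_hat, z_hat, y_hat))
```

For this instance all three residuals vanish up to machine precision, confirming that the soft-thresholded point is a KKT point in the sense above.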
2.3 Convergence analysis under Kurdyka-Łojasiewicz assumptions

In this subsection we prove global convergence for the sequence $\{(x^k, z^k, y^k)\}$ generated by the two nonconvex proximal ADMM algorithms in the context of the KŁ property. The origins of this notion go back to the pioneering work of Kurdyka, who introduced in [8] a general form of the Łojasiewicz inequality ([33]). A further extension to the nonsmooth setting has been proposed and studied in [6, 7, 8].

We recall that the distance function to a given set $\Omega \subseteq \mathbb{R}^N$ is defined for every $x$ by $\operatorname{dist}(x, \Omega) := \inf\{\|x - y\| : y \in \Omega\}$. If $\Omega = \emptyset$, then $\operatorname{dist}(x, \Omega) = +\infty$.

Definition 1. Let $\eta \in (0, +\infty]$. We denote by $\Phi_\eta$ the set of all concave and continuous functions $\varphi \colon [0, \eta) \to [0, +\infty)$ which satisfy the following conditions:
1. $\varphi(0) = 0$;
2. $\varphi$ is $C^1$ on $(0, \eta)$ and continuous at 0;
3. $\varphi'(s) > 0$ for all $s \in (0, \eta)$.

Definition 2. Let $\Psi \colon \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ be proper and lower semicontinuous.
1. The function $\Psi$ is said to have the Kurdyka-Łojasiewicz (KŁ) property at a point $\hat{u} \in \operatorname{dom}\partial\Psi := \{u \in \mathbb{R}^N : \partial\Psi(u) \neq \emptyset\}$ if there exist $\eta \in (0, +\infty]$, a neighborhood $U$ of $\hat{u}$ and a function $\varphi \in \Phi_\eta$ such that for every
$$u \in U \cap [\Psi(\hat{u}) < \Psi(u) < \Psi(\hat{u}) + \eta]$$
the following inequality holds:
$$\varphi'\big(\Psi(u) - \Psi(\hat{u})\big)\operatorname{dist}\big(0, \partial\Psi(u)\big) \geq 1.$$
2. If $\Psi$ satisfies the KŁ property at each point of $\operatorname{dom}\partial\Psi$, then $\Psi$ is called a KŁ function.

The functions $\varphi$ belonging to the set $\Phi_\eta$ for $\eta \in (0, +\infty]$ are called desingularization functions. The KŁ property reveals the possibility to reparameterize the values of $\Psi$ in order to avoid flatness around the critical points. To the class of KŁ functions belong semialgebraic, real subanalytic and uniformly convex functions, as well as convex functions satisfying a growth condition. We refer the reader to [1, 3, 4, 6, 7, 8, 9] and the references therein for more properties of KŁ functions and illustrative examples.

The following result, taken from [9, Lemma 6], will be crucial in our convergence analysis.

Lemma 3
(Uniformized KŁ property). Let $\Omega$ be a compact set and $\Psi \colon \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ be a proper and lower semicontinuous function. Assume that $\Psi$ is constant on $\Omega$ and satisfies the KŁ property at each point of $\Omega$. Then there exist $\varepsilon > 0$, $\eta > 0$ and $\varphi \in \Phi_\eta$ such that for every $\hat{u} \in \Omega$ and every element $u$ in the intersection
$$\{u \in \mathbb{R}^N : \operatorname{dist}(u, \Omega) < \varepsilon\} \cap [\Psi(\hat{u}) < \Psi(u) < \Psi(\hat{u}) + \eta]$$
it holds
$$\varphi'\big(\Psi(u) - \Psi(\hat{u})\big)\operatorname{dist}\big(0, \partial\Psi(u)\big) \geq 1.$$

Working in the hypotheses of the lemma above, we define for every $k \geq 1$
$$E_k := F_r\big(x^k, z^k, y^k, x^{k-1}, y^{k-1}\big) - F_* = F_k - F_* \geq 0, \quad (75)$$
where $F_*$ is the limit of $\{F_k\}_{k \geq 1}$ as $k \to +\infty$. The sequence $\{E_k\}_{k \geq 1}$ is monotonically decreasing and converges to 0 as $k \to +\infty$.

The next result shows that, when the regularization $F_r$ of the augmented Lagrangian is a KŁ function, the sequence $\{(x^k, z^k, y^k)\}$ converges to a KKT point of the optimization problem (1).

Theorem 4. Let Assumption 1 be satisfied and $\{(x^k, z^k, y^k)\}$ be a sequence generated by Algorithm 1 or Algorithm 2, which is assumed to be bounded. If $F_r$ is a KŁ function, then the following statements are true:

(i) the sequence $\{(x^k, z^k, y^k)\}$ has finite length, namely,
$$\sum_{k \geq 0}\|x^{k+1} - x^k\| < +\infty, \qquad \sum_{k \geq 0}\|z^{k+1} - z^k\| < +\infty, \qquad \sum_{k \geq 0}\|y^{k+1} - y^k\| < +\infty; \quad (76)$$
(ii) the sequence $\{(x^k, z^k, y^k)\}$ converges to a KKT point of the optimization problem (1).

Proof. As in the lemma above, we denote $\Omega := \omega\big(\{(x^k, z^k, y^k, x^{k-1}, y^{k-1})\}_{k \geq 1}\big)$, which is a nonempty set. Let $(\hat{x}, \hat{z}, \hat{y}, \hat{x}, \hat{y}) \in \Omega$, thus $F_r(\hat{x}, \hat{z}, \hat{y}, \hat{x}, \hat{y}) = F_*$. We have seen that $\{E_k = F_k - F_*\}_{k \geq 1}$ converges to 0 as $k \to +\infty$ and will consider, consequently, two cases.

First we assume that there exists an integer $\bar{k} \geq 1$ such that $E_{\bar{k}} = 0$ or, equivalently, $F_{\bar{k}} = F_*$. Due to the monotonicity of $\{E_k\}_{k \geq 1}$, it follows that $E_k = 0$ or, equivalently, $F_k = F_*$ for all $k \geq \bar{k}$. Combining inequality (57) with Lemma 5, it yields that $x^{k+1} = x^k$ for all $k \geq \bar{k} + 1$. Using Lemma 5 (iii) and telescoping sum arguments, it yields $\sum_{k \geq 1}\|y^{k+1} - y^k\| < +\infty$. Finally, by using Lemma 5 (i), we get that $\sum_{k \geq 1}\|z^{k+1} - z^k\| < +\infty$.

Consider now the case when $E_k > 0$ or, equivalently, $F_k > F_*$ for every $k \geq k_1$. According to Lemma 3, there exist $\varepsilon > 0$, $\eta > 0$ and a desingularization function $\varphi$ such that for every element $u$ in the intersection
$$\{u \in \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^m \times \mathbb{R}^n \times \mathbb{R}^m : \operatorname{dist}(u, \Omega) < \varepsilon\} \cap \{u : F_* < F_r(u) < F_* + \eta\} \quad (77)$$
it holds $\varphi'(F_r(u) - F_*)\operatorname{dist}(0, \partial F_r(u)) \geq 1$. Let $k_2 \geq 1$ be such that for all $k \geq k_2$
$$F_* < F_k < F_* + \eta.$$
Since $\lim_{k \to +\infty}\operatorname{dist}\big((x^k, z^k, y^k, x^{k-1}, y^{k-1}), \Omega\big) = 0$, there exists $k_3 \geq 1$ such that for all $k \geq k_3$
$$\operatorname{dist}\big((x^k, z^k, y^k, x^{k-1}, y^{k-1}), \Omega\big) < \varepsilon.$$
Thus $(x^k, z^k, y^k, x^{k-1}, y^{k-1})$ belongs to the intersection in (77) for all $k \geq k_0 := \max\{k_1, k_2, k_3\}$, which further implies
$$\varphi'(F_k - F_*)\operatorname{dist}\big(0, \partial F_r(x^k, z^k, y^k, x^{k-1}, y^{k-1})\big) = \varphi'(E_k)\operatorname{dist}\big(0, \partial F_r(x^k, z^k, y^k, x^{k-1}, y^{k-1})\big) \geq 1. \quad (78)$$

Define for two arbitrary nonnegative integers $p$ and $q$
$$\Delta_{p,q} := \varphi(F_p - F_*) - \varphi(F_q - F_*) = \varphi(E_p) - \varphi(E_q).$$
Then for all $K \geq k_0 \geq 1$ it holds
$$\sum_{k=k_0}^{K}\Delta_{k,k+1} = \Delta_{k_0, K+1} = \varphi(E_{k_0}) - \varphi(E_{K+1}) \leq \varphi(E_{k_0}),$$
from which we get $\sum_{k \geq 1}\Delta_{k,k+1} < +\infty$.

By combining Theorem 7 (i) with the concavity of $\varphi$ we obtain for all $k \geq k_0$
$$\Delta_{k,k+1} = \varphi(E_k) - \varphi(E_{k+1}) \geq \varphi'(E_k)[E_k - E_{k+1}] = \varphi'(E_k)[F_k - F_{k+1}] \geq \varphi'(E_k)\frac{C_0}{4}\|x^{k+1} - x^k\|^2. \quad (79)$$
The last relation combined with (78) implies for all $k \geq k_0$
$$\|x^{k+1} - x^k\|^2 \leq \frac{4}{C_0}\Delta_{k,k+1}\operatorname{dist}\big(0, \partial F_r(x^k, z^k, y^k, x^{k-1}, y^{k-1})\big).$$
By the arithmetic mean-geometric mean inequality and Corollary 10 we have that for every $k \geq k_0$ and every $\beta > 0$
$$\|x^{k+1} - x^k\| \leq \sqrt{\frac{4}{C_0}\Delta_{k,k+1}\operatorname{dist}\big(0, \partial F_r(x^k, z^k, y^k, x^{k-1}, y^{k-1})\big)} \leq \beta\Delta_{k,k+1} + \frac{1}{C_0\beta}\operatorname{dist}\big(0, \partial F_r(x^k, z^k, y^k, x^{k-1}, y^{k-1})\big)$$
$$\leq \beta\Delta_{k,k+1} + \frac{C_{11}}{C_0\beta}\big(\|x^k - x^{k-1}\| + \|x^{k-1} - x^{k-2}\| + \|x^{k-2} - x^{k-3}\|\big) + \frac{C_{12}}{\beta}\big(\|A^*(y^{k-1} - y^{k-2})\|^2 - \|A^*(y^k - y^{k-1})\|^2\big) + \frac{C_{13}}{\beta}\big(\|A^*(y^{k-2} - y^{k-3})\|^2 - \|A^*(y^{k-1} - y^{k-2})\|^2\big). \quad (80)$$

We denote for every $k \geq 3$
$$a_k := \|x^k - x^{k-1}\| \geq 0,$$
$$\delta_k := \beta\Delta_{k,k+1} + \frac{C_{12}}{\beta}\big(\|A^*(y^{k-1} - y^{k-2})\|^2 - \|A^*(y^k - y^{k-1})\|^2\big) + \frac{C_{13}}{\beta}\big(\|A^*(y^{k-2} - y^{k-3})\|^2 - \|A^*(y^{k-1} - y^{k-2})\|^2\big).$$
The inequality (80) is nothing other than (3) with $c_0 = c_1 = c_2 := \frac{C_{11}}{C_0\beta}$. Observe that for every $K \geq k_0$ we have
$$\sum_{k=k_0}^{K}\delta_k \leq \beta\varphi(E_{k_0}) + \frac{C_{12}}{\beta}\|A^*(y^{k_0-1} - y^{k_0-2})\|^2 + \frac{C_{13}}{\beta}\|A^*(y^{k_0-2} - y^{k_0-3})\|^2,$$
and thus, by choosing $\beta > \frac{3C_{11}}{C_0}$, we can use Lemma 3 to conclude that $\sum_{k \geq 1}\|x^{k+1} - x^k\| < +\infty$. The other two statements in (76) follow from Lemma 5. This means that the sequence $\{(x^k, z^k, y^k)\}$ is a Cauchy sequence, thus it converges to an element $(\hat{x}, \hat{z}, \hat{y})$ which is, according to the lemma above, a KKT point of the optimization problem (1).

Remark 4. The function $F_r$ is a KŁ function if, for instance, the objective function of (1) is semialgebraic, which is the case when the functions $g$ and $h$ are semialgebraic.

3 Convergence rates under Łojasiewicz assumptions

In this section we derive convergence rates for the sequence $\{(x^k, z^k, y^k)\}$ generated by Algorithm 1 or Algorithm 2, as well as for the regularized augmented Lagrangian function $F_r$ along this sequence, provided that the latter satisfies the Łojasiewicz property.

3.1 Łojasiewicz property and a technical lemma

We recall the following definition from [1] (see also [33]).

Definition 3. Let $\Psi \colon \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ be proper and lower semicontinuous. Then $\Psi$ satisfies the Łojasiewicz property if for any critical point $\hat{u}$ of $\Psi$ there exist $C_L > 0$, $\theta \in [0, 1)$ and $\varepsilon > 0$ such that
$$|\Psi(u) - \Psi(\hat{u})|^{\theta} \leq C_L\operatorname{dist}\big(0, \partial\Psi(u)\big) \quad \forall u \in \operatorname{Ball}(\hat{u}, \varepsilon), \quad (81)$$
where $\operatorname{Ball}(\hat{u}, \varepsilon)$ denotes the open ball with center $\hat{u}$ and radius $\varepsilon$.
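The definition can be made concrete on a model function. The sketch below is our own illustration, not taken from the paper: for $\Psi(u) = |u|^p$ with $p = 4$, the exponent $\theta = 1 - 1/p = 3/4$ and constant $C_L = 1/p$ satisfy the Łojasiewicz inequality at the critical point $\hat{u} = 0$, which the code checks on a grid of the ball $\operatorname{Ball}(0, 1)$.

```python
import numpy as np

# Model function Psi(u) = |u|**p with critical point u_hat = 0, Psi(0) = 0.
p = 4.0
theta = 1.0 - 1.0 / p   # candidate Lojasiewicz exponent
C_L = 1.0 / p           # then |u|**(p*theta) = |u|**(p-1) = C_L * |Psi'(u)| exactly

u = np.linspace(-1.0, 1.0, 2001)             # grid of Ball(0, 1)
lhs = np.abs(np.abs(u) ** p - 0.0) ** theta  # |Psi(u) - Psi(u_hat)|**theta
rhs = C_L * np.abs(p * np.sign(u) * np.abs(u) ** (p - 1))  # C_L * dist(0, dPsi(u))
print(bool(np.all(lhs <= rhs + 1e-12)))      # prints True
```

The corresponding desingularization function is $\varphi(s) = \frac{C_L}{1-\theta}s^{1-\theta}$; smaller exponents $\theta$ make $\varphi$ steeper near 0 and, as the rate lemma below quantifies, yield faster convergence.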
Provided that Assumption 1 is fulfilled and $\{(x^k, z^k, y^k)\}$ is the sequence generated by Algorithm 1 or Algorithm 2, which is assumed to be bounded, we have seen above that the set of cluster points $\Omega = \omega\big(\{(x^k, z^k, y^k, x^{k-1}, y^{k-1})\}\big)$ is nonempty, compact and connected, and that $F_r$ takes on $\Omega$ the value $F_*$; moreover, for any $(\hat{x}, \hat{z}, \hat{y}, \hat{x}, \hat{y}) \in \Omega$, $(\hat{x}, \hat{z}, \hat{y})$ belongs to $\operatorname{crit}(\mathcal{L}_r)$. According to [1, Lemma 1], if $F_r$ has the Łojasiewicz property, then there exist $C_L > 0$, $\theta \in [0, 1)$ and $\varepsilon > 0$ such that for any $(x, z, y, x', y') \in \{u : \operatorname{dist}(u, \Omega) < \varepsilon\}$ it holds
$$|F_r(x, z, y, x', y') - F_*|^{\theta} \leq C_L\operatorname{dist}\big(0, \partial F_r(x, z, y, x', y')\big).$$
Obviously, $F_r$ is then a KŁ function with desingularization function $\varphi \colon [0, +\infty) \to [0, +\infty)$, $\varphi(s) := \frac{C_L}{1-\theta}s^{1-\theta}$, which, according to Theorem 4, means that $\Omega$ contains a single element $(\hat{x}, \hat{z}, \hat{y}, \hat{x}, \hat{y})$, namely, the limit of $\{(x^k, z^k, y^k, x^{k-1}, y^{k-1})\}$ as $k \to +\infty$. In other words, if $F_r$ has the Łojasiewicz property, then there exist $C_L > 0$, $\theta \in [0, 1)$ and $\varepsilon > 0$ such that
$$|F_r(x, z, y, x', y') - F_*|^{\theta} \leq C_L\operatorname{dist}\big(0, \partial F_r(x, z, y, x', y')\big) \quad \forall (x, z, y, x', y') \in \operatorname{Ball}\big((\hat{x}, \hat{z}, \hat{y}, \hat{x}, \hat{y}), \varepsilon\big). \quad (82)$$
In this case, $F_r$ is said to satisfy the Łojasiewicz property with Łojasiewicz constant $C_L > 0$ and Łojasiewicz exponent $\theta \in [0, 1)$.

The following lemma provides convergence rates for a particular class of monotonically decreasing sequences converging to 0.

Lemma 5. Let $\{e_k\}_{k \geq 0}$ be a monotonically decreasing sequence in $\mathbb{R}_+$ converging to 0. Assume further that there exist natural numbers $k_0 \geq l_0 \geq 1$ such that for every $k \geq k_0$
$$e_{k-l_0} - e_k \geq C_e e_k^{2\theta}, \quad (83)$$
where $C_e > 0$ is some constant and $\theta \in [0, 1)$. Then the following statements are true:

(i) if $\theta = 0$, then $\{e_k\}$ converges in finite time;

(ii) if $\theta \in (0, 1/2]$, then there exist $C_{e,0} > 0$ and $Q \in [0, 1)$ such that for every $k \geq k_0$
$$0 \leq e_k \leq C_{e,0}\,Q^k;$$

(iii) if $\theta \in (1/2, 1)$, then there exists $C_{e,1} > 0$ such that for every $k \geq k_0 + l_0$
$$0 \leq e_k \leq C_{e,1}(k - l_0 + 1)^{-\frac{1}{2\theta-1}}.$$

Proof. Fix an integer $k \geq k_0$. Since $k_0 \geq l_0 \geq 1$, the recurrence inequality (83) is well defined for every $k \geq k_0$.
(i) The case $\theta = 0$. We assume that $e_k > 0$ for every $k \geq k_0$. From (83) we get
$$e_{k-l_0} - e_k \geq C_e > 0 \quad \text{for every } k \geq k_0,$$
which leads to a contradiction with the fact that $\{e_k\}$ converges to 0 as $k \to +\infty$. Consequently, there exists $k_1 \geq k_0$ such that $e_k = 0$ for every $k \geq k_1$, and thus the conclusion follows.

For the proof of (ii) and (iii) we can assume that $e_k > 0$ for every $k \geq k_0$. Otherwise, as $\{e_k\}$ is monotonically decreasing and converges to 0, the sequence is constant beginning with a given index, which means that both statements are true.

(ii) The case $\theta \in (0, 1/2]$. We have $e_k \leq e_0$, thus $e_0^{2\theta-1}e_k \leq e_k^{2\theta}$, which leads to
$$e_{k-l_0} - e_k \geq C_e e_k^{2\theta} \geq C_e e_0^{2\theta-1}e_k \geq 0.$$
Therefore
$$e_k \leq \frac{1}{C_e e_0^{2\theta-1} + 1}\,e_{k-l_0} \leq \cdots \leq \Big(\frac{1}{C_e e_0^{2\theta-1} + 1}\Big)^{\left\lfloor\frac{k-k_0}{l_0}\right\rfloor}\max\{e_{k_0+j} : j = 0, \ldots, l_0 - 1\} \leq C_{e,0}\,Q^k, \quad \text{with } Q := \sqrt[l_0]{\frac{1}{C_e e_0^{2\theta-1} + 1}} \text{ and } C_{e,0} := Q^{-k_0-l_0+1}\max\{e_{k_0+j} : j = 0, \ldots, l_0 - 1\},$$
where $\lfloor p \rfloor$ denotes the greatest integer that is less than or equal to the real number $p$. This provides the linear convergence rate, as $Q = \big(C_e e_0^{2\theta-1} + 1\big)^{-1/l_0} \in [0, 1)$.

(iii) The case $\theta \in (1/2, 1)$. From (83) we get
$$C_e \leq (e_{k-l_0} - e_k)\,e_k^{-2\theta}. \quad (84)$$
Define $\zeta \colon (0, +\infty) \to \mathbb{R}$, $\zeta(s) := s^{-2\theta}$. We have that
$$\frac{d}{ds}\Big(\frac{s^{1-2\theta}}{1-2\theta}\Big) = s^{-2\theta} = \zeta(s) \quad \text{and} \quad \zeta'(s) = -2\theta s^{-2\theta-1} < 0 \quad \forall s \in (0, +\infty).$$
Consequently, $\zeta(e_{k-l_0}) \leq \zeta(s)$ for all $s \in [e_k, e_{k-l_0}]$.

Assume that $\zeta(e_k) \leq 2\zeta(e_{k-l_0})$. Then (84) gives
$$C_e \leq (e_{k-l_0} - e_k)\,\zeta(e_k) \leq 2(e_{k-l_0} - e_k)\,\zeta(e_{k-l_0}) \leq 2\int_{e_k}^{e_{k-l_0}}\zeta(s)\,ds = \frac{2}{1-2\theta}\big(e_{k-l_0}^{1-2\theta} - e_k^{1-2\theta}\big),$$
or, equivalently,
$$e_k^{1-2\theta} - e_{k-l_0}^{1-2\theta} \geq C_1, \quad \text{where } C_1 := \frac{(2\theta-1)C_e}{2} > 0. \quad (85)$$

Assume now that $\zeta(e_k) > 2\zeta(e_{k-l_0})$, in other words, $e_k^{-2\theta} > 2e_{k-l_0}^{-2\theta}$. For $\nu := \big(\tfrac{1}{2}\big)^{\frac{1}{2\theta}} \in (0, 1)$ this is equivalent to
$$\nu e_{k-l_0} \geq e_k \iff \nu^{1-2\theta}e_{k-l_0}^{1-2\theta} \leq e_k^{1-2\theta} \iff \big(\nu^{1-2\theta} - 1\big)e_{k-l_0}^{1-2\theta} \leq e_k^{1-2\theta} - e_{k-l_0}^{1-2\theta}.$$
Recall that $\nu^{1-2\theta} - 1 > 0$, since $1 - 2\theta < 0$, and $e_0^{1-2\theta} \leq e_{k-l_0}^{1-2\theta}$, since $\{e_k\}$ is monotonically decreasing, and thus
$$e_k^{1-2\theta} - e_{k-l_0}^{1-2\theta} \geq \big(\nu^{1-2\theta} - 1\big)e_{k-l_0}^{1-2\theta} \geq C_2, \quad \text{where } C_2 := \big(\nu^{1-2\theta} - 1\big)e_0^{1-2\theta} > 0. \quad (86)$$

In both situations we get for every $i \geq k_0$
$$e_i^{1-2\theta} - e_{i-l_0}^{1-2\theta} \geq C := \min\{C_1, C_2\} > 0, \quad (87)$$
where $C_1$ and $C_2$ are defined as in (85) and (86), respectively.

For every $k \geq k_0 + l_0$, by summing up the inequalities (87) for $i = k_0 + l_0, \ldots, k$, we get
$$\sum_{j=0}^{l_0-1}\big(e_{k-j}^{1-2\theta} - e_{k_0+j}^{1-2\theta}\big) \geq C(k - k_0 - l_0 + 1) > 0.$$
Using the fact that $1 - 2\theta < 0$ and the monotonicity of $\{e_i\}_{i \geq 0}$, it yields
$$e_{k_0+l_0-1} \leq \cdots \leq e_{k_0} \iff e_{k_0+l_0-1}^{1-2\theta} \geq \cdots \geq e_{k_0}^{1-2\theta},$$
and thus
$$l_0\big(e_k^{1-2\theta} - e_{k_0}^{1-2\theta}\big) \geq \sum_{j=0}^{l_0-1}\big(e_{k-j}^{1-2\theta} - e_{k_0+j}^{1-2\theta}\big) \geq C(k - k_0 - l_0 + 1),$$
which gives
$$e_k^{1-2\theta} \geq e_{k_0}^{1-2\theta} + \frac{k - k_0 - l_0 + 1}{l_0}\,C. \quad (88)$$
Moreover, we obtain from (87) that
$$e_{k_0}^{1-2\theta} \geq \Big\lfloor\frac{k_0 + l_0}{l_0}\Big\rfloor C \geq \frac{k_0}{l_0}\,C. \quad (89)$$
By plugging (89) into (88) we obtain
$$e_k^{1-2\theta} \geq \frac{k - l_0 + 1}{l_0}\,C,$$
which implies
$$e_k \leq \Big(\frac{C}{l_0}\Big)^{-\frac{1}{2\theta-1}}(k - l_0 + 1)^{-\frac{1}{2\theta-1}}. \quad (90)$$
This concludes the proof.

Remark 5. The inequality in Lemma 5 (iii) can be written in terms of $k$ instead of $k - l_0 + 1$ when $k$ is large enough. For instance, when $k \geq \frac{\gamma}{\gamma-1}(l_0 - 1)$ for some $\gamma > 1$, then we have that $k - l_0 + 1 \geq \frac{k}{\gamma}$, and thus from (90) we get
$$e_k \leq \Big(\frac{C}{l_0}\Big)^{-\frac{1}{2\theta-1}}(k - l_0 + 1)^{-\frac{1}{2\theta-1}} \leq \Big(\frac{C}{l_0\gamma}\Big)^{-\frac{1}{2\theta-1}}k^{-\frac{1}{2\theta-1}}.$$

3.2 Convergence rates

In this subsection we study the convergence rates of Algorithm 1 and Algorithm 2 in the context of an assumption which is slightly more restrictive than Assumption 1.

Assumption 2. We work in the hypotheses of Assumption 1, except for (9), which is replaced by
$$M_3^k := M_1^k + rA^*A - \frac{L + C_1}{2}\operatorname{Id} \succeq \frac{5C_0}{2}\operatorname{Id} \succ 0. \quad (91)$$

Notice that (91) can be written as
$$M_1^k + rA^*A - \Big(L + \frac{\bar{C}_M}{r}\Big)\operatorname{Id} \succeq 0,$$
where $\bar{C}_M$ is an explicit constant, depending only on $L$, $\mu_1$ and $\|T\|$, whose expression differs slightly between the two algorithms (92). Therefore (91) is nothing else than (10) after replacing $C_M$ by the bigger constant $\bar{C}_M$. So, all the examples discussed in connection with Assumption 1 can be adapted to the new setting and provide frameworks which guarantee Assumption 2. The scenarios which ensure Assumption 2 evidently satisfy Assumption 1, therefore the results of the previous section remain valid in this setting. In what follows we provide improvements of the statements used in the convergence analysis, which can be obtained thanks to Assumption 2 by using similar techniques.

Firstly, by the same arguments as in Lemma 4, we have that for every $k \geq 1$
$$\mathcal{L}_r(x^{k+1}, z^{k+1}, y^{k+1}) + \|x^{k+1} - x^k\|^2_{M_1^k + rA^*A - \frac{L}{2}\operatorname{Id}} + \|z^{k+1} - z^k\|^2_{M_2^k} \leq \mathcal{L}_r(x^k, z^k, y^k) + \frac{1}{\rho r}\|y^{k+1} - y^k\|^2 \quad (93)$$
and (see (3), (33), (4) and (43))
$$\|y^{k+1} - y^k\|^2 \leq C_1L^2\|x^{k+1} - x^k\|^2 + C_0\|x^k - x^{k-1}\|^2 + T_0\|A^*(y^k - y^{k-1})\|^2 - T_0\|A^*(y^{k+1} - y^k)\|^2. \quad (94)$$
By multiplying (94) by a suitable positive constant and adding the resulting inequality to (93), we obtain for every $k \geq 1$
$$\mathcal{L}_r(x^{k+1}, z^{k+1}, y^{k+1}) + \|x^{k+1} - x^k\|^2_{M_3^k} + T_0\|A^*(y^{k+1} - y^k)\|^2 + \|z^{k+1} - z^k\|^2_{M_2^k} \leq \mathcal{L}_r(x^k, z^k, y^k) + T_0\|A^*(y^k - y^{k-1})\|^2 + C_0\|x^k - x^{k-1}\|^2. \quad (95)$$
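The trichotomy of Lemma 5 can also be observed numerically. The sketch below is our own construction, not from the paper: it builds a sequence that satisfies (83) with equality for $l_0 = 1$ and $C_e = 1$, by solving $e_k + C_e e_k^{2\theta} = e_{k-1}$ with bisection, and then reads off the predicted rates, a geometric rate for $\theta = 1/2$ and the sublinear rate $e_k \sim k^{-1/(2\theta-1)}$ for $\theta = 3/4$.

```python
import numpy as np

def next_term(e_prev, C_e, theta):
    # Solve e + C_e * e**(2*theta) = e_prev for e in (0, e_prev) by bisection,
    # so that the resulting sequence satisfies (83) with equality and l_0 = 1.
    lo, hi = 0.0, e_prev
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if mid + C_e * mid ** (2.0 * theta) > e_prev:
            hi = mid
        else:
            lo = mid
    return lo

def run(theta, n, e0=1.0, C_e=1.0):
    e = [e0]
    for _ in range(n):
        e.append(next_term(e[-1], C_e, theta))
    return e

# Case (ii), theta = 1/2: the recursion reduces to e_k = e_{k-1} / (1 + C_e),
# a geometric (linear) rate with factor Q = 1 / (1 + C_e) = 0.5 here.
lin = run(0.5, 50)
lin_ratio = lin[-1] / lin[-2]

# Case (iii), theta = 3/4: the predicted rate is e_k ~ k**(-1/(2*theta-1)),
# i.e. k**(-2); the log-log slope between k = 10**4 and 2 * 10**4 approaches -2.
sub = run(0.75, 20000)
slope = np.log(sub[20000] / sub[10000]) / np.log(2.0)
print(lin_ratio, slope)   # lin_ratio ≈ 0.5, slope ≈ -2
```

The measured geometric factor and log-log slope match the rates stated in Lemma 5 (ii) and (iii); the constants $e_0 = 1$ and $C_e = 1$ are arbitrary choices for the experiment.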
A proximal minimization algorithm for structured nonconvex and nonsmooth problems
A proximal minimization algorithm for structured nonconvex and nonsmooth problems Radu Ioan Boţ Ernö Robert Csetnek Dang-Khoa Nguyen May 8, 08 Abstract. We propose a proximal algorithm for minimizing objective
More informationSecond order forward-backward dynamical systems for monotone inclusion problems
Second order forward-backward dynamical systems for monotone inclusion problems Radu Ioan Boţ Ernö Robert Csetnek March 6, 25 Abstract. We begin by considering second order dynamical systems of the from
More informationADMM for monotone operators: convergence analysis and rates
ADMM for monotone operators: convergence analysis and rates Radu Ioan Boţ Ernö Robert Csetne May 4, 07 Abstract. We propose in this paper a unifying scheme for several algorithms from the literature dedicated
More informationDouglas-Rachford splitting for nonconvex feasibility problems
Douglas-Rachford splitting for nonconvex feasibility problems Guoyin Li Ting Kei Pong Jan 3, 015 Abstract We adapt the Douglas-Rachford DR) splitting method to solve nonconvex feasibility problems by studying
More informationAn inexact subgradient algorithm for Equilibrium Problems
Volume 30, N. 1, pp. 91 107, 2011 Copyright 2011 SBMAC ISSN 0101-8205 www.scielo.br/cam An inexact subgradient algorithm for Equilibrium Problems PAULO SANTOS 1 and SUSANA SCHEIMBERG 2 1 DM, UFPI, Teresina,
More informationIntroduction and Preliminaries
Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis
More informationarxiv: v2 [math.oc] 21 Nov 2017
Unifying abstract inexact convergence theorems and block coordinate variable metric ipiano arxiv:1602.07283v2 [math.oc] 21 Nov 2017 Peter Ochs Mathematical Optimization Group Saarland University Germany
More informationStructural and Multidisciplinary Optimization. P. Duysinx and P. Tossings
Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be
More informationErdinç Dündar, Celal Çakan
DEMONSTRATIO MATHEMATICA Vol. XLVII No 3 2014 Erdinç Dündar, Celal Çakan ROUGH I-CONVERGENCE Abstract. In this work, using the concept of I-convergence and using the concept of rough convergence, we introduced
More informationADVANCE TOPICS IN ANALYSIS - REAL. 8 September September 2011
ADVANCE TOPICS IN ANALYSIS - REAL NOTES COMPILED BY KATO LA Introductions 8 September 011 15 September 011 Nested Interval Theorem: If A 1 ra 1, b 1 s, A ra, b s,, A n ra n, b n s, and A 1 Ě A Ě Ě A n
More informationComputational Statistics and Optimisation. Joseph Salmon Télécom Paristech, Institut Mines-Télécom
Computational Statistics and Optimisation Joseph Salmon http://josephsalmon.eu Télécom Paristech, Institut Mines-Télécom Plan Duality gap and stopping criterion Back to gradient descent analysis Forward-backward
More informationApproaching monotone inclusion problems via second order dynamical systems with linear and anisotropic damping
March 0, 206 3:4 WSPC Proceedings - 9in x 6in secondorderanisotropicdamping206030 page Approaching monotone inclusion problems via second order dynamical systems with linear and anisotropic damping Radu
More informationAn inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions
An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions Radu Ioan Boţ Ernö Robert Csetnek Szilárd Csaba László October, 1 Abstract. We propose a forward-backward
More informationIteration-complexity of first-order penalty methods for convex programming
Iteration-complexity of first-order penalty methods for convex programming Guanghui Lan Renato D.C. Monteiro July 24, 2008 Abstract This paper considers a special but broad class of convex programing CP)
More informationFrom error bounds to the complexity of first-order descent methods for convex functions
From error bounds to the complexity of first-order descent methods for convex functions Nguyen Trong Phong-TSE Joint work with Jérôme Bolte, Juan Peypouquet, Bruce Suter. Toulouse, 23-25, March, 2016 Journées
More informationMathematical Finance
ETH Zürich, HS 2017 Prof. Josef Teichmann Matti Kiiski Mathematical Finance Solution sheet 14 Solution 14.1 Denote by Z pz t q tpr0,t s the density process process of Q with respect to P. (a) The second
More informationProximal Alternating Linearized Minimization for Nonconvex and Nonsmooth Problems
Proximal Alternating Linearized Minimization for Nonconvex and Nonsmooth Problems Jérôme Bolte Shoham Sabach Marc Teboulle Abstract We introduce a proximal alternating linearized minimization PALM) algorithm
More informationProximal Operator and Proximal Algorithms (Lecture notes of UCLA 285J Fall 2016)
Proximal Operator and Proximal Algorithms (Lecture notes of UCLA 285J Fall 206) Instructor: Wotao Yin April 29, 207 Given a function f, the proximal operator maps an input point x to the minimizer of f
More informationOptimization and Optimal Control in Banach Spaces
Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,
More informationKaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization
Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä New Proximal Bundle Method for Nonsmooth DC Optimization TUCS Technical Report No 1130, February 2015 New Proximal Bundle Method for Nonsmooth
More informationON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS
MATHEMATICS OF OPERATIONS RESEARCH Vol. 28, No. 4, November 2003, pp. 677 692 Printed in U.S.A. ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS ALEXANDER SHAPIRO We discuss in this paper a class of nonsmooth
More informationEntropy and Ergodic Theory Lecture 27: Sinai s factor theorem
Entropy and Ergodic Theory Lecture 27: Sinai s factor theorem What is special about Bernoulli shifts? Our main result in Lecture 26 is weak containment with retention of entropy. If ra Z, µs and rb Z,
More informationSome Properties of the Augmented Lagrangian in Cone Constrained Optimization
MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented
More informationSubdifferential representation of convex functions: refinements and applications
Subdifferential representation of convex functions: refinements and applications Joël Benoist & Aris Daniilidis Abstract Every lower semicontinuous convex function can be represented through its subdifferential
More informationChapter 2 Convex Analysis
Chapter 2 Convex Analysis The theory of nonsmooth analysis is based on convex analysis. Thus, we start this chapter by giving basic concepts and results of convexity (for further readings see also [202,
More informationOn proximal-like methods for equilibrium programming
On proximal-lie methods for equilibrium programming Nils Langenberg Department of Mathematics, University of Trier 54286 Trier, Germany, langenberg@uni-trier.de Abstract In [?] Flam and Antipin discussed
More informationBASICS OF CONVEX ANALYSIS
BASICS OF CONVEX ANALYSIS MARKUS GRASMAIR 1. Main Definitions We start with providing the central definitions of convex functions and convex sets. Definition 1. A function f : R n R + } is called convex,
More informationMAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9
MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended
More informationReal Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi
Real Analysis Math 3AH Rudin, Chapter # Dominique Abdi.. If r is rational (r 0) and x is irrational, prove that r + x and rx are irrational. Solution. Assume the contrary, that r+x and rx are rational.
More informationOptimality Conditions for Constrained Optimization
72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)
More informationA Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions
A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions Angelia Nedić and Asuman Ozdaglar April 16, 2006 Abstract In this paper, we study a unifying framework
More informationREAL ANALYSIS II TAKE HOME EXAM. T. Tao s Lecture Notes Set 5
REAL ANALYSIS II TAKE HOME EXAM CİHAN BAHRAN T. Tao s Lecture Notes Set 5 1. Suppose that te 1, e 2, e 3,... u is a countable orthonormal system in a complex Hilbert space H, and c 1, c 2,... is a sequence
More informationA projection-type method for generalized variational inequalities with dual solutions
Available online at www.isr-publications.com/jnsa J. Nonlinear Sci. Appl., 10 (2017), 4812 4821 Research Article Journal Homepage: www.tjnsa.com - www.isr-publications.com/jnsa A projection-type method
More informationDS-GA 1002: PREREQUISITES REVIEW SOLUTIONS VLADIMIR KOBZAR
DS-GA 2: PEEQUISIES EVIEW SOLUIONS VLADIMI KOBZA he following is a selection of questions (drawn from Mr. Bernstein s notes) for reviewing the prerequisites for DS-GA 2. Questions from Ch, 8, 9 and 2 of
More informationOptimization Theory. A Concise Introduction. Jiongmin Yong
October 11, 017 16:5 ws-book9x6 Book Title Optimization Theory 017-08-Lecture Notes page 1 1 Optimization Theory A Concise Introduction Jiongmin Yong Optimization Theory 017-08-Lecture Notes page Optimization
More informationWE consider an undirected, connected network of n
On Nonconvex Decentralized Gradient Descent Jinshan Zeng and Wotao Yin Abstract Consensus optimization has received considerable attention in recent years. A number of decentralized algorithms have been
More informationA Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions
A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions Angelia Nedić and Asuman Ozdaglar April 15, 2006 Abstract We provide a unifying geometric framework for the
More informationOn the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems
On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems Radu Ioan Boţ Ernö Robert Csetnek August 5, 014 Abstract. In this paper we analyze the