A proximal minimization algorithm for structured nonconvex and nonsmooth problems

Size: px
Start display at page:

Download "A proximal minimization algorithm for structured nonconvex and nonsmooth problems"

Transcription

1 A proximal minimization algorithm for structured nonconvex and nonsmooth problems Radu Ioan Boţ Ernö Robert Csetnek Dang-Khoa Nguyen May 8, 08 Abstract. We propose a proximal algorithm for minimizing objective functions consisting of three summands: the composition of a nonsmooth function with a linear operator, another nonsmooth function, each of the nonsmooth summands depending on an independent block variable, and a smooth function which couples the two block variables. The algorithm is a full splitting method, which means that the nonsmooth functions are processed via their proximal operators, the smooth function via gradient steps, and the linear operator via matrix times vector multiplication. We provide sufficient conditions for the boundedness of the generated sequence and prove that any cluster point of the latter is a KKT point of the minimization problem. In the setting of the Kurdyka- Lojasiewicz property we show global convergence, and derive convergence rates for the iterates in terms of the Lojasiewicz exponent. Key Words. structured nonconvex and nonsmooth optimization, proximal algorithm, full splitting scheme, Kurdyka- Lojasiewicz property, limiting subdifferential AMS subject classification. 65K0, 90C6, 90C30 Introduction. Problem formulation and motivation In this paper we propose a full splitting algorithm for solving nonconvex and nonsmooth problems of the form min tf paxq ` G pyq ` H px, yqu, (.) px,yqprmˆrq where F : R p Ñ R Y t`8u and G: R q Ñ R Y t`8u are proper and lower semicontinuous functions, H : R mˆr q Ñ R is a Fréchet differentiable function with Lipschitz continuous gradient, and A: R m Ñ R p is a linear operator. It is noticeable that neither for the nonsmooth nor for the smooth functions convexity is assumed. In case m p and A is the identity operator, Bolte, Sabach and Teboulle formulated in [9], also in the nonconvex setting, a proximal alternating linearization method (PALM) for solving (.). PALM is a proximally regularized variant of the Gauss-Seidel alternating minimization scheme and basically consists of two proximal-gradient steps. It had a significant impact in the optimization community, as it can be used to solve a large variety of nonconvex and nonsmooth problems arising in applications such as: matrix factorization, image deblurring and denoising, the feasibility problem, compressed sensing, etc. An inertial version of PALM has been proposed by Pock and Sabach in [3]. A naive approach of PALM for solving (.) would require the calculation of the proximal operator of the function F A, for which, in general, even in the convex case, a closed formula is not available. In the last decade, an impressive progress can be noticed in the field of primal-dual/proximal ADMM algorithms, designed to solve convex optimization problems involving compositions with linear operators in the spirit of the full splitting paradigm. One of the pillars of this development is the conjugate duality theory which is available for convex optimization problems. Faculty of Mathematics, University of Vienna, Oskar-Morgenstern-Platz, 090 Vienna, Austria, radu.bot@ univie.ac.at. Research partially supported by FWF (Austrian Science Fund), project I 49-N3. Faculty of Mathematics, University of Vienna, Oskar-Morgenstern-Platz, 090 Vienna, Austria, ernoe. robert.csetnek@univie.ac.at. Research supported by FWF (Austrian Science Fund), project P 9809-N3. Faculty of Mathematics, University of Vienna, Oskar-Morgenstern-Platz, 090 Vienna, Austria, dang-khoa. nguyen@univie.ac.at. 
Research supported by the Doctoral Programme Vienna Graduate School on Computational Optimization (VGSCO) which is funded by FWF (Austrian Science Fund), project W60-N35.

2 The algorithm which we propose in this paper for solving the nonconvex and nonsmooth problem(.) is a full splitting scheme, too; the nonsmooth functions are processed via their proximal operators, the smooth function via gradient steps, and the linear operator via matrix times vector multiplication. In case Gpyq 0 and Hpx, yq Hpxq for any px, yq P R mˆr q, where H : R m Ñ R is a Fréchet differentiable function with Lipschitz continuous gradient, it furnishes a full splitting iterative scheme for solving the nonsmooth and nonconvex optimization problem min tf paxq ` H pxqu. (.) xprm Splitting algorithms for solving problems of the form (.) have been considered in [9], under the assumption that H is twice continuously differentiable with bounded Hessian, in [5], under the assumption that one of the summands is convex and continuous on its effective domain, and in [3], as a particular case of a general nonconvex proximal ADMM algorithm. We would like to mention in this context also [0] for the case when A is nonlinear. The convergence analysis we will carry out in this paper relies on a descent inequality, which we prove for a regularization of the augmented Lagrangian L β : R m ˆ R q ˆ R p ˆ R p Ñ R Y t`8u L β px, y, z, uq F pzq ` G pyq ` H px, yq ` xu, Ax zy ` β Ax z, β ą 0, associated with problem (.). This is obtained by an appropriate tuning of the parameters involved in the description of the algorithm. In addition, we provide sufficient conditions in terms of the input functions F, G and H for the boundedness of the generated sequence of iterates. We also show that any cluster point of this sequence is a KKT point of the optimization problem (.). By assuming that the above-mentioned regularization of the augmented Lagrangian satisfies the Kurdyka- Lojasiewicz property, we prove global convergence. If this function satisfies the Lojasiewicz property, then we can even derive convergence rates for the sequence of iterates formulated in terms of the Lojasiewicz exponent. For similar approaches relying on the use of the Kurdyka- Lojasiewicz property in the proof of the global convergence of nonconvex optimization algorithms we refer to the papers of Attouch and Bolte [], Attouch, Bolte and Svaiter [3], and Bolte, Sabach and Teboulle [9].. Notations and preliminaries Every space R d, where d is a positive integer, is assumed to be equipped with the Euclidean inner product x, y and associated norm a x, y. The Cartesian product R d ˆ R d ˆ... ˆ R d k of the Euclidean spaces R di, i,..., k, will be endowed with inner product and associated norm defined for x : px,..., x k q, y : py,..., y k q P R d ˆ R d ˆ... ˆ R d k by g kÿ fÿ x, y xx i, y i y and x e k x i, i respectively. For every x : px,..., x k q P R d ˆ R d ˆ... ˆ R d k we have g k ÿ fÿ? x i ď x e k kÿ x i ď x i. (.3) k i Let ψ : R d Ñ R Y t`8u be a proper and lower semicontinuous function and x an element of its effective domain domψ : y P R d : ψ pyq ă `8 (. The Fréchet (viscosity) subdifferential of ψ at x is p Bψ pxq : "d P R d : lim inf yñx and the limiting (Mordukhovich) subdifferential of ψ at x is i i i ψ pyq ψ pxq xd, y xy y x Bψ pxq : td P R d : exist sequences x n Ñ x and d n Ñ d as n Ñ `8 For x R domψ, we set p Bψ pxq Bψ pxq : H. * ě 0 such that ψ px n q Ñ ψ pxq as n Ñ `8 and d n P p Bψ px n q for any n ě 0u.

3 The inclusion p Bψ pxq Ď ψ pxq holds for each x P R d. If ψ is convex, then the two subdifferentials coincide with the convex subdifferential of ψ, thus p Bψ pxq Bψ pxq d P R d : ψ pyq ě ψ pxq ` xd, y P R d( for any x P R d. If x P R d is a local minimum of ψ, then 0 P Bψ pxq. We denote by crit pψq : x P R d : 0 P Bψ pxq ( the set of critical points of ψ. The limiting subdifferential fulfils the following closedness criterion: if tx n u ně0 and td n u ně0 are sequence in R d such that d n P Bψ px n q for any n ě 0 and px n, d n q Ñ px, dq and ψ px n q Ñ ψ pxq as n Ñ `8, then d P Bψ pxq. We also have the following subdifferential sum formula (see [, Proposition.07], [4, Exercise 8.8]): if Φ: R d Ñ R is a continuously differentiable function, then B pψ ` φq pxq Bψ pxq ` φ pxq for any x P R d ; and a formula for the subdifferential of the composition of ψ with a linear operator A: R k Ñ R d (see [, Proposition.], [4, Exercise 0.7]): if A is injective, then B pψ Aq pxq A T Bψ paxq for any x P R k. The following proposition collects some important properties of a (not necessarily convex) Fréchet differentiable function with Lipschitz continuous gradient. For the proof of this result we refer to [3, Proposition ]. Proposition. Let ψ : R d Ñ R be Fréchet differentiable such that its gradient is Lipschitz continuous with constant l ą 0. Then the following statements are true: piq For every x, y P R d and every z P rx, ys tp tqx ` ty : t P r0, su it holds piiq For any γ P Rz t0u it holds inf xpr d ψ pyq ď ψ pxq ` x ψ pzq, y xy ` l y x ; (.4) " ˆ ψ pxq γ l * γ ψ pxq ě inf ψ pxq. (.5) xpr d The Descent Lemma, which says that for a Fréchet differentiable function ψ : R d Ñ R having a Lipschitz continuous gradient with constant l ą 0 it holds ψ pyq ď ψ pxq ` x ψ pxq, y xy ` l y y P R d, follows from (.4) for z : x. In addition, by taking in (.4) z : y we obtain ψ pxq ě ψ pyq ` x ψ pyq, x yy l x y P R d. This is equivalent to the fact that ψ` l is a convex function, which is the same with ψ is l-semiconvex ([8]). In other words, a consequence of Proposition () is, that a Fréchet differentiable function with l-lipschitz continuous gradient is l-semiconvex. We close ths introductory section by presenting two convergence results for real sequences that will be used in the sequel in the convergence analysis. The following lemma is useful when proving convergence of numerical algorithms relying on Fejér monotonicity techniques (see, for instance, [, Lemma.], [, Lemma ]). Lemma. Let tξ n u ně0 be a sequence of real numbers and tω n u ně0 a sequence of real nonnegative numbers. Assume that tξ n u ně0 is bounded from below and that for any n ě 0 Then the following statements hold: ξ n` ` ω n ď ξ n. piq the sequence tω n u ně0 is summable, namely ÿ ně0 ω n ă `8; piiq the sequence tξ n u ně0 is monotonically decreasing and convergent. The following lemma can be found in [, Lemma.3] (see, also [, Lemma 3]). Lemma 3. Let ta n u ně0 and tb n u ně be sequences of real nonnegative numbers such that for any n ě where χ 0 P R and χ ě 0 fulfill χ 0 ` χ ă, and ÿ ně a n` ď χ 0 a n ` χ a n ` b n, (.6) b n ă `8. Then ÿ ně0 a n ă `8. 3

4 The algorithm The numerical algorithm we propose for solving (.) has the following formulation. Algorithm. Let µ, β, τ ą 0 and 0 ă σ ď. For a given starting point px 0, y 0, z 0, u 0 q P R m ˆ R q ˆ R p ˆ R p generate the sequence tpx n, y n, z n, u n qu ně0 for any n ě 0 as follows y n` P arg min G pyq ` x y H px n, y n q, yy ` µ y y n ) (.a) " z n` P arg min F pzq ` xu n, Ax n zy ` β * zpr p Ax n z (.b) x n` : x n τ ` x H px n, y n` q ` A T u n ` βa T pax n z n` q (.c) ypr q! u n` : u n ` σβ pax n` z n` q. (.d) The proximal point operator with parameter γ ą 0 (see []) of a proper and lower semicontinuous function ψ : R d Ñ R Y t`8u is the set-valued operator defined as " prox γψ : R d Ñ Rd, prox γψ pxq arg min ψ pyq ` * x y. ypr d γ Exact formulas for the proximal operator are available not only for large classes of convex functions ([4, 5, 4]), but also for various nonconvex functions ([, 5, 8]). In view of the above definition, the iterative scheme (.a) - (.d) reads for every n ě 0 y n` P prox µ G `yn µ y H px n, y n q z n` P prox β F `Axn ` β u n x n` : x n τ ` x H px n, y n` q ` A T u n ` βa T pax n z n` q u n` : u n ` σβ pax n` z n` q. One can notice the full splitting character of Algorithm and also that the first two steps can be performed in parallel. Remark. piq In case Gpyq 0 and Hpx, yq Hpxq for any px, yq P R m ˆ R q, where H : R m Ñ R is a Fréchet differentiable function with Lipschitz continuous gradient, Algorithm gives rise to an iterative scheme which has been proposed in [3] for solving the optimization problem (.). This reads for any n ě 0 z n` P prox β F `Axn ` β u n x n` : x n τ ` H px n q ` A T u n ` βa T pax n z n` q u n` : u n ` σβ pax n` z n` q. piiq In case m p and A Id is the identity operator on R m, Algorithm gives rise to an iterative scheme for solving min tf pxq ` G pyq ` H px, yqu, (.) px,yqprmˆrq which reads for any n ě 0 y n` P prox µ G `yn µ y H px n, y n q z n` P prox β F `xn ` β u n x n` : x n τ p x H px n, y n` q ` u n ` β px n z n` qq u n` : u n ` σβ px n` z n` q. This algorithm provides an alternative to PALM ([9]) for solving optimization problems of the form (.). piiiq In case m p, A Id, F pxq 0 and Hpx, yq Hpyq for any px, yq P R m ˆ R q, where H : R q Ñ R is a Fréchet differentiable function with Lipschitz continuous gradient, Algorithm gives rise to an iterative scheme for solving min tgpyq ` H pyqu, (.3) yprq 4

5 which reads for any n ě 0 y n` P prox µ G `yn µ Hpy n q, and is nothing else than the proximal-gradient method. An inertial version of the proximal-gradient method for solving (.3) in the fully nonconvex setting has been considered in [].. A descent inequality We will start with the convergence analysis of Algorithm () by proving a descent inequality, which will play a fundamental role in our investigations. We will analyse Algorithm () under the following assumptions, which we will be later even weakened. Assumption. piq the functions F, G and H are bounded from below; piiq the linear operator A is surjective; piiiq for any fixed y P R q there exists l pyq ě 0 such that x H px, yq x H `x, y ď l pyq x x P R m, (.4a) and for any fixed x P R m there exist l pxq, l 3 pxq ě 0 such that y H px, yq y H `x, y ď l pxq y y P R q, (.4b) x H px, yq x H `x, y ď l3 pxq y y P R q ; (.4c) pivq there exist l i,` ą 0, i,, 3, such that sup l py n q ď l,`, ně0 sup l px n q ď l,`, ně0 sup l 3 px n q ď l 3,`. (.5) ně0 Remark. Some comments on Assumption are in order. piq Assumption piq ensures that the sequence generated by Algorithm is well-defined. It has also as consequence that Ψ : inf tf pzq ` G pyq ` H px, yqu ą 8. (.6) px,y,zqˆrmˆrqˆrp piiq Comparing the assumptions in (iii) and (iv) to the ones in [9], one can notice the presence of the additional condition (.4c), which is essential in particular when proving the boundedness of the sequence of generated iterates. Notice that in iterative schemes of gradient type, proximal-gradient type or forward-backward-forward type (see [9,, ]) the boundedness of the iterates follow by combining a descent inequality expressed in terms of the objective function with coercivity assumptions on the later. In our setting this undertaken is less simple, since the descent inequality which we obtain below is in terms of the augmented Lagrangian associated with problem (.). piiiq The linear operator A is surjective if and only if its associated matrix has full row rank, which is the same with the fact that the matrix associated to AA T is positively definite. Since λ min `AA T z ď xaa T z, zy A T P R p, this is further equivalent to λ min `AA T ą 0, where λ min pmq denotes the minimal eigenvalue of a square matrix M. We also denote by κpmq the condition number, namely the ratio between the maximal eigenvalue λ max pmq and the minimal eigenvalue of the square matrix M, κ pmq : λ max pmq λ min pmq M λ min pmq ě. The convergence analysis will make use of the following regularized augmented Lagrangian function Ψ: R m ˆ R q ˆ R p ˆ R p ˆ R m ˆ R p Ñ R Y t`8u, 5

6 defined as `x, y, z, u, x, u ÞÑ F pzq ` G pyq ` H px, yq ` xu, Ax zy ` β Ax z ` C 0 A T `u u ` σb `x x ` C x x, where Notice that B : τid βa T A, C 0 : 4 p σq σ βλ min paa T q ě 0 and C : 8 pστ ` l,`q σβλ min paa T q ą 0. B ď τ, whenever τ ě β A. Indeed, this is a consequence of the relation }Bx} τ }x} τβ}ax} ` β }A T Ax} ď τ }x} ` βpβ}a} P R m. For simplification, we introduce the following notations R : R m ˆ R q ˆ R p ˆ R p ˆ R m ˆ R p X : `x, y, z, u, x, u X n : px n, y n, z n, u n, x n, u n ě Ψ n : Ψ px n ě. The next result provides the announced descent inequality. Lemma 4. Let Assumption be satisfied, τ ě β A and tpx n, y n, z n, u n qu ně0 be a sequence generated by Algorithm. Then for any n ě it holds where Ψ n` ` C x n` x n ` C 3 y n` y n ` C 4 u n` u n ď Ψ n, (.7) C : τ l,` ` β A 4στ βλ min paa T q 8 pστ ` l,`q σβλ min paa T q, C 3 : µ l,` C 4 : σβ. 8l 3,` σβλ min paa T q, (.8a) (.8b) (.8c) Proof. Let n ě be fixed. We will show first that F pz n` q ` G py n` q ` H px n`, y n` q ` xu n`, Ax n` z n` y ` β Ax n` z n` ` τ l,` ` β A x n` x n ` µ l,` y n` y n ` σβ u n` u n ď F pz n q ` G py n q ` H px n, y n q ` xu n, Ax n z n y ` β Ax n z n ` σβ u n` u n (.9) and provide afterwards an upper estimate for the term u n` u n on the right-hand side of (.9). From (.a) and (.b) we obtain and G py n` q ` x y H px n, y n q, y n` y n y ` µ y n` y n ď G py n q (.0) F pz n` q ` xu n, Ax n z n` y ` β Ax n z n` ď F pz n q ` xu n, Ax n z n y ` β Ax n z n (.) 6

7 respectively. On the other hand, according to the Descent Lemma we have H px n, y n` q ď H px n, y n q ` x y H px n, y n q, y n` y n y ` l px n q and, further, by taking into consideration (.c), y n` y n ď H px n, y n q ` x y H px n, y n q, y n` y n y ` l,` y n` y n H px n`, y n` q ď H px n, y n` q ` x x H px n, y n` q, x n` x n y ` l py n` q x n` x n H px n, y n` q xu n, Ax n` Ax n y β xax n z n`, Ax n` Ax n y ˆ τ l py n` q x n` x n ď H px n, y n` q xu n, Ax n` Ax n y ` β Ax n z n` β Ax n` z n` τ l,` ` β A x n` x n. Summing this inequality with (.0) and (.) gives (.9). Next we will focus on estimating u n` u n. Combining (.c) and (.d), we obtain and A T u n` ` σb px n` x n q p σq A T u n σ x H px n, y n` q A T u n ` σb px n x n q p σq A T u n σ x H px n, y n q. Subtracting these relations and making use of the notations it yields w n : A T pu n u n q ` σb px n x n q v n : σb px n x n q ` x H px n, y n q x H px n, y n` q, w n` p σq w n ` σv n. The convexity of guarantees that (notice that 0 ă σ ď ) w n` ď p σq w n ` σ v n. (.) In addition, from the definitions of w n and v n, we obtain A T pu n` u n q ď w n` ` σ B x n` x n ď w n` ` στ x n` x n (.3) and v n ď σ B x n x n ` x H px n, y n q x H px n, y n` q ď στ x n x n ` x H px n, y n q x H px n, y n q ` x H px n, y n q x H px n, y n` q ď pστ ` l,`q x n x n ` l 3,` y n` y n (.4) respectively. Using the Cauchy-Schwarz inequality, (.3) yields T λ min `AA u n` u n ď A T pu n` u n q ď w n` ` σ τ x n` x n and (.4) yields v n ď pστ ` l,`q x n x n ` l 3,` y n` y n. After combining these two inequalities with (.), we get T σλ min `AA u n` u n ` p σq w n` ď p σq w n ` σ 3 τ x n` x n ` σ pστ ` l,`q x n x n ` σl 3,` y n` y n. The desired statement follows after we multiply the above relation by the resulting inequality with (.9). 4 σ βλ min paa T ą 0 and combine q 7

8 The following result provides one possibility to choose the parameters in Algorithm, such that all three constants C, C 3 and C 4 that appear in (.7) are positive. Lemma 5. Let 4ν β ą 4σκ paa T q # β A max, βλ min `AA T 4σ 0 ă σ ă 4κ paa T q ˆ 4 ` 3σ ` ˆ 6ν β a τ b 4 ` 4σ ` 9σ 9σκ paa T q + ą 0 ă τ ă βλ T min `AA ˆ 6ν 4σ β ` a τ (.5a) (.5b) (.5c) where ν : Then we have µ ą l,` ` 6l 3,` σβλ min paa T q ą 0, (.5d) l,` λ min paa T q ą 0 and τ : 3ν β 8ν β 4νσ 4σκ `AA T ą 0. (.5e) β Furthermore, there exist γ, γ P Rz t0u such that Proof. We will prove first that l,` γ γ βλ min paa T q C ą 0 ô 4στ βλ min paa T q ˆ min tc, C 3, C 4 u ą 0. and 6l,` βλ min paa T τ ` q l,` γ γ βλ min paa T q. (.6) 6l,` σβλ min paa T q ` l,` ` β A ă 0. (.7) The reduced discriminant of the quadratic function in τ in the above relation reads ˆ 6l,` τ : ˆ 6ν β 3ν β βλ min paa T q 384ν 8ν β β 4νσ β 384l,` β λ min paat q 4νσ β 4σκ `AA T 4l,`σ T βλ min paa T 4σκ `AA q 4σκ `AA T ą 0, (.8) if σ and β are being chosen as in (.5a) and (.5b), respectively. Therefore, for T βλ min `AA ˆ 6ν 4σ β a τ ă τ ă βλ T min `AA 4σ ˆ 6ν β ` a τ (.7) is satisfied. It remains to verify the feasibility of τ in (.5c), in other words, to prove that β A ă βλ min `AA T ˆ 6ν 4σ β ` a τ. This is easy to see, as, according to (.8), we have β A ă βλ min `AA T ˆ 6ν ô 6ν 4σ β β σκ `AA T ą 0. The positivity of C 3 follows from the choice of µ in (.5d), while, obviously, C 4 ą 0. Finally, as 4ν β ą 4σκ paa T q ą 4ν, it follows that each of the two quadratic equations in (.6) (in γ and, respectively, γ ) has a nonzero real solution., 8

9 Remark 3. Hong and Luo proved recently in [6] linear convergence for the iterates generated by a Lagrangian-based algorithm in the convex setting, without any strong convexity assumption. To this end a certain error bound condition must hold true and the step size of the dual update, which is also assumed to depend on the error bound constants, must be taken small. The authors also mention that this choice of the dual step size may be too conservative and cumbersome to compute unless the objective function is strongly convex. As shown in previous lemma, the step size of the dual update in our algorithm can be computed without assuming strong convexity and indeed it depends only on the linear operator A. Theorem 6. Let Assumption be satisfied and the parameters in Algorithm be such that τ ě β A and the constants defined in Lemma 4 fulfil mintc, C 3, C 4 u ą 0. If tpx n, y n, z n, u n qu ně0 is a sequence generated by Algorithm, then the following statements are true: piq the sequence tψ n u ně is bounded from below and convergent; piiq x n` x n Ñ 0, y n` y n Ñ 0, z n` z n Ñ 0 and u n` u n Ñ 0 as n Ñ `8. (.9) Proof. First, we show that Ψ defined in (.6) is a lower bound of tψ n u ně. Suppose the contrary, namely that there exists n 0 ě such that Ψ n0 Ψ ă 0. According to Lemma 4, tψ n u ně is a nonincreasing sequence and thus for any N ě n 0 which implies that Nÿ pψ n Ψq ď n nÿ 0 n On the other hand, for any n ě it holds lim NÑ`8 n pψ n Ψq ` pn n 0 ` q pψ n0 Ψq, Nÿ pψ n Ψq 8. Ψ n Ψ ě F pz n q ` G py n q ` H px n, y n q ` xu n, Ax n z n y Ψ ě xu n, Ax n z n y σβ xu n, u n u n y σβ u n ` σβ u n u n σβ u n. Therefore, for any N ě, we have Nÿ n pψ n Ψq ě σβ Nÿ n u n u n ` σβ u N σβ u 0 ě σβ u 0, which leads to a contradiction. As tψ n u ně is bounded from below, we obtain from Lemma statement piq and also that Since for any n ě it holds x n` x n Ñ 0, y n` y n Ñ 0 and u n` u n Ñ 0 as n Ñ `8. z n` z n ď A x n` x n ` Ax n` z n` ` Ax n z n A x n` x n ` σβ u n` u n ` σβ u n u n, (.0) it follows that z n` z n Ñ 0 as n Ñ `8. Usually, for nonconvex algorithms, the fact that the sequences of differences of consecutive iterates converge to zero is shown by assuming that the generated sequences are bounded (see [3, 9, 5]). In our analysis the only ingredients for obtaining statement (ii) in Theorem 6 are the descent property and Lemma. 9

10 . General conditions for the boundedness of tpx n, y n, z n, u n qu ně0 In the following we will formulate general conditions in terms of the input data of the optimization problem (.) which guarantee the boundedness of the sequence tpx n, y n, z n, u n qu ně0. Working in the setting of Theorem 6, thanks to (.9), we have that the sequences tx n` x n u ně0, ty n` y n u ně0, tz n` z n u ně0 and tu n` u n u ně0 are bounded. Denote s : sup t x n` x n, y n` y n, z n` z n, u n` u n u ă `8. ně0 Even though this observation does not imply immediately that tpx n, y n, z n, u n qu ně0 is bounded, this will follow under standard coercivity assumptions. Recall that a function ψ : R d Ñ R Y t`8u is called coercive, if lim x Ñ`8 ψ pxq `8. Theorem 7. Let Assumption be satisfied and the parameters in Algorithm be such that τ ě β A, the constants defined in Lemma 4 fulfil mintc, C 3, C 4 u ą 0 and there exist γ, γ P Rzt0u such that (.6) holds. Suppose that one of the following conditions hold: piq the function H is coercive; piiq the operator A is invertible, and F and G are coercive. Then every sequence tpx n, y n, z n, u n qu ně0 generated by Algorithm is bounded. Proof. Let n ě be fixed. According to Lemma 4 we have that Ψ ě... ě Ψ n ě Ψ n` ě F pz n` q ` G py n` q ` H px n`, y n` q β u n` ` β Ax n` z n` ` β u n`. (.) Combine (.c) and (.d) we get ˆ A T u n` A T pu n` u n q ` B px n x n` q σ ` x H px n`, y n` q x H px n, y n` q x H px n`, y n` q, (.) which implies ˆ A T u n` ď σ A u n` u n ` pτ ` l,`q x n` x n ` x H px n`, y n` q ˆˆ ď σ A ` τ ` l,` s ` x H px n`, y n` q. By using the Cauchy-Schwarz inequality we further obtain T λ min `AA u n` ď ˆˆ A T u n` ď σ A ` τ ` l,` s ` x H px n`, y n` q. Multiplying the above relation by βλ min paa T q and combining it with (.), we get Ψ ě F pz n` q ` G py n` q ` H px n`, y n` q βλ min paa T q xh px n`, y n` q ˆˆ βλ min paa T q σ A ` τ ` l,` s ` β Ax n` z n` ` β u n`. (.3) We will prove the boundedness of tpx n, y n, z n, u n qu ně0 in each of the two scenarios. 0

11 piq According to (.3) and Proposition, we have that for any n ě H px n`, y n` q ` β Ax n` z n` ` β u n` ˆˆ ď Ψ ` βλ min paa T q σ A ` τ ` l,` s inf F pzq inf G pyq zprp ypr m " ˆ * inf H px n`, y n` q l,` ně γ γ x H px n`, y n` q ˆˆ ď Ψ ` βλ min paa T q σ A ` τ ` l,` s inf F pzq inf G pyq zprp ypr q ă ` 8. px,yqpr " Since H is coercive and bounded from below, it follows that tpx n, y n qu ně0 and Ax n z n ` inf H px, yq mˆrq are bounded. As, according to (.d), tax n z n u ně0 is bounded, it follows that tu n u ně0 and tz n u ně0 are also bounded. piiq According to (.3) and Proposition, we have this time that for any n ě F pz n` q ` G py n` q ` β Ax n` z n` ` β u n` ˆˆ ď Ψ ` βλ min paa T q σ A ` τ ` l,` s " ˆ * inf H px n`, y n` q l,` ně γ γ x H px n`, y n` q ˆˆ ď Ψ ` βλ min paa T q σ A ` τ ` l,` s inf H px, yq ă `8. px,yqprmˆrq Since F and G are coercive and bounded from below, it follows that the sequences tpy n, z n qu " ně0 and Ax n z n ` * β u n are bounded. As, according to (.d), tax n z n u ně0 is bounded, it ně0 follows that tu n u ně0 and tax n u ně0 are bounded. The fact that A is invertible implies that tx n u ně0 is bounded..3 The cluster points of tpx n, y n, z n, u n qu ně0 are KKT points We will close this section dedicated to the convergence analysis of the sequence generated by Algorithm in a general framework by proving that any cluster point of tpx n, y n, z n, u n qu ně0 is a KKT point of the optimization problem (.). We provided above general conditions which guarantee both the descent inequality (.7), with positive constants C, C 3 and C 4, and the boundedness of the generated iterates. Lemma 5 and Theorem 7 provide one possible setting that ensures these two fundamental properties of the convergence analysis. We do not want to restrict ourselves to this particular setting and, therefore, we will work, from now on, under the following assumptions. Assumption. piq the functions F, G and H are bounded from below; piiq the linear operator A is surjective; piiiq every sequence tpx n, y n, z n, u n qu ně0 generated by the Algorithm is bounded: pivq H is Lipschitz continuous with constant L ą 0 on a convex bounded subset B ˆ B Ď R m ˆ R q containing tpx n, y n qu ně0. In other words, for any px, yq, px, y q P B ˆ B it holds ` x H px, yq x H `x, y, y H px, yq y H `x, y ď L px, yq `x, y ; (.4) β u n * ně0

12 pvq the parameters µ, β, τ ą 0 and 0 ă σ ď are such that τ ě β}a} and mintc, C 3, C 4 u ą 0, where C : τ L? ` β A 4στ βλ min paa T q 8? `στ ` L σβλ min paa T q, C 3 : µ L? C 4 : σβ. 6L σβλ min paa T q, (.5a) (.5b) (.5c) Remark 4. Being facilitated by the boundedness of the generated sequence, Assumption pivq not only guarantee the fulfilment of Assumption piiiq and pivq on a convex bounded set, but it also arises in a more natural way (see also [9]). Assumption pivq holds, for instance, if H is twice continuously differentiable. In addition, as (.4) implies for any px, yq, px, y q P B ˆ B that x H px, yq x H `x, y ` y H px, yq y H `x, y ď L? ` x x ` y y, we can take l,` l,` l 3,` : L?. (.6) As (.4a) - (.4c) are valid also on a convex bounded set, the descent inequality Ψ n` ` C x n` x n ` C 3 y n` y n ` C 4 u n` u n ď Ψ ě (.7) remains true, where the constants on the left-hand sided are given in (.5) and follow from (.8) under the consideration of (.6). A possible choice of the parameters of the algorithm such that min tc, C 3, C 4 u ą 0 can be obtained also from Lemma 5. The next result provide upper estimates for the limiting subgradients of the regularized function Ψ at px n, y n, z n, u n q for every n ě. Lemma 8. Let Assumption be satisfied and tpx n, y n, z n, u n qu ně0 be a sequence generated by Algorithm. Then for any n ě it holds where D n : `d n x, d n y, d n z, d n u, d n x, dn u P BΨ pxn q, (.8) d n x : x H px n, y n q ` A T u n ` βa T pax n z n q ` C px n x n q ` σc 0 B T `A T pu n u n q ` σb px n x n q, d n y : y H px n, y n q y H px n, y n q ` µ py n y n q, d n z : u n u n ` βa px n x n q, d n u : Ax n z n ` C 0 A `A T pu n u n q ` σb px n x n q, (.9a) (.9b) (.9c) (.9d) d n x : σc 0B T `A T pu n u n q ` σb px n x n q C px n x n q, (.9e) d n u : C 0A `A T pu n u n q ` σb px n x n q. (.9f) In addition, for any n ě it holds where D n ď C 5 x n x n ` C 6 y n y n ` C 7 u n u n, (.30) C 5 :? L ` τ ` β A ` 4 pστ ` A q στc 0 ` 4C, C 6 : L? ` µ, C 7 : ` σβ ` ˆ σ A ` 4 pστ ` A q C 0 A. (.3a) (.3b) (.3c)

13 Proof. Let n ě be fixed. Applying the calculus rules of the limiting subdifferential we get x Ψ px n q x H px n, y n q ` A T u n ` βa T pax n z n q ` C px n x n q ` σc 0 B `A T T pu n u n q ` σb px n x n q, B y Ψ px n q BG py n q ` y H px n, y n q, B z Ψ px n q BF pz n q u n β pax n z n q, u Ψ px n q Ax n z n ` C 0 A `A T pu n u n q ` σb px n x n q, x Ψ px n q σc 0 B `A T T pu n u n q ` σb px n x n q C px n x n q, u Ψ px n q C 0 A `A T pu n u n q ` σb px n x n q. (.3a) (.3b) (.3c) (.3d) (.3e) (.3f) Then (.9a) and (.9d) - (.9f) follow directly from (.3a) and (.3d) - (.3f), respectively. By combining (.3b) with the optimality criterion for (.a) 0 P G py n q ` y H px n, y n q ` µ py n y n q, we obtain (.9b). Similarly, by combining (.3c) with the optimality criterion for (.b) 0 P F pz n q u n β pax n z n q, we get (.9c). In the following we will derive the upper estimates for the components of the limiting subgradient. From (.) it follows d n x ď x H px n, y n q ` A T u n ` β A Ax n z n ` `C ` σ τ C 0 xn x n In addition, we have ` στc 0 A u n u n ď L? ˆ ` τ ` C ` σ τ C 0 x n x n ` σ ` στc 0 A u n u n. d n y ď L? xn x n ` L? ` µ y n y n, d n z ď β A x n x n ` u n u n, ˆ d n u ď στc 0 A x n x n ` σβ ` C 0 A u n u n, d n x ď `σ τ C 0 ` C xn x n ` στc 0 A u n u n, d n u ď στc 0 A x n x n ` C 0 A u n u n. The inequality (.30) follows by combining the above relations with (.3). We denote by Ω : Ω `tx n u ně the set of cluster points of the sequence txn u ně Ď R, which is nonempty thanks to the boundedness of tx n u ně. The distance function of the set Ω is defined for any X P R by dist px, Ωq : inf t X Y : Y P Ωu. The main result of this section follows. Theorem 9. Let Assumption be satisfied and tpx n, y n, z n, u n qu ně0 be a sequence generated by Algorithm. The following statements are true: piq if tpx nk, y nk, z nk, u nk qu kě0 is a subsequence of tpx n, y n, z n, u n qu ně0 which converges to px, y, z, u q as k Ñ `8, then lim kñ`8 Ψ n k Ψ px, y, z, u, x, u q ; piiq it holds Ω Ď crit pψq Ď tx P R : A T u x H px, y q, 0 P BG py q ` y H px, y q, u P BF pz q, z Ax u, (.33) where X : px, y, z, u, x, u q; 3

14 piiiq it holds lim dist px n, Ωq 0; nñ`8 pivq the set Ω is nonempty, connected and compact; pvq the function Ψ takes on Ω the value Ψ lim Ψ n lim tf pz nq ` G py n q ` H px n, y n qu. nñ`8 nñ`8 Proof. Let px, y, z, u q P R m ˆ R q ˆ R p ˆ R p be such that the subsequence tx nk of tx n u ně converges to X : px, y, z, u, x, u q. (i) From (.a) and (.b) we have for any k ě and : px nk, y nk, z nk, u nk, x nk, u nk qu kě G py nk q ` x y H px nk, y nk q, y nk y nk y ` µ y n k y nk ď G py q ` x y H px nk, y nk q, y y nk y ` µ y y nk F pz nk q ` xu nk, Ax nk z nk y ` β Ax n k z nk ď F pz q ` xu nk, Ax nk z y ` β Axnk z, respectively. From (.d) and Theorem 6 follows Ax z. Taking the limit superior as k Ñ `8 on both sides of the above inequalities, we get lim sup kñ`8 F pz nk q ď F pz q and lim sup G py nk q ď G py q kñ`8 which, combined with the lower semicontinuity of F and G, lead to lim F pz n k q F pz q and lim G py n k q G py q. kñ`8 kñ`8 The desired statement follows thanks to the continuity of H. (ii) For the sequence td n u ně0 defined in (.8) - (.9), we have that D nk P BΨ px nk q for any k ě and D nk Ñ 0 as k Ñ `8, while X nk Ñ X and Ψ nk Ñ ΨpX q as k Ñ `8. The closedness criterion of the limiting subdifferential guarantees that 0 P BΨpX q or, in other words, X P crit pψq. Choosing now an element X P crit pψq, it holds which is further equivalent to (.33). $ 0 x H px, y q ` A T u ` βa T pax z q, & 0 P BG py q ` y H px, y q, 0 P BF pz q u β pax z q, % 0 Ax z, (iii)-(iv) The proof follows in the lines of the proof of Theorem 5 (ii)-(iii) in [9], also by taking into consideration [9, Remark 5], according to which the properties in (iii) and (iv) are generic for sequences satisfying X n X n Ñ 0 as n Ñ `8, which is indeed the case due to (.9). (v) Due to (.9) and the fact that tu n u ně0 is bounded, the sequences tf pz n q ` G py n q ` H px n, y n qu ně0 and tψ n u ně0 have the same limit Ψ lim Ψ n lim tf pz nq ` G py n q ` H px n, y n qu. nñ`8 nñ`8 The conclusion follows by taking into consideration the first two statements of this theorem. Remark 5. An element px, y, z, u q fulfilling (.33) is a so-called KKT point of the optimization problem (.). Such a KKT point obviously fulfils 0 P A T BF pax q ` x H px, y q, 0 P BG py q ` y H px, y q. (.34) 4

15 If A is injective, then this system of inclusions is further equivalent to 0 P B pf Aq px q ` x H px, y q B x pf A ` Hq, 0 P BG py q ` y H px, y q B y pg ` Hq, (.35) in other words, px, y q is a critical point of the optimization problem (.). On the other hand, if the functions F, G and H are convex, then, even without asking A to be injective, (.34) and (.35) are equivalent, which means that px, y q is a global minimum of the optimization problem (.). 3 Global convergence and rates In this section we will prove global convergence for the sequence tpx n, y n, z n, u n qu ně0 generated by Algorithm in the context of the Kurdyka- Lojasiewicz property and provide convergence rates for it in the context of the Lojasiewicz property. 3. Global convergence under Kurdyka- Lojasiewicz assumptions The origins of this notion go back to the pioneering work of Kurdyka who introduced in [7] a general form of the Lojasiewicz inequality [0]. An extension to the nonsmooth setting has been proposed and studied in [6, 7, 8]. Definition. Let η P p0, `8s. We denote by Φ η the set of all concave and continuous functions ϕ: r0, ηq Ñ r0, `8q which satisfy the following conditions: piq ϕ p0q 0; piiq ϕ is C on p0, ηq and continuous at 0; piiiq for any s P p0, ηq : ϕ psq ą 0. Definition. Let Ψ: R d Ñ R Y t`8u be proper and lower semicontinuous. piq The function Ψ is said to have the Kurdyka- Lojasiewicz (K L) property at a point pv P dombψ : v P R d : BΨ pvq H (, if there exists η P p0, `8s, a neighborhood V of pv and a function ϕ P Φ η such that for any the following inequality holds v P V X rψ ppvq ă Ψ pvq ă Ψ ppvq ` ηs ϕ pψ pvq Ψ ppvqq dist p0, BΨ pvqq ě. piiq If Ψ satisfies the K L property at each point of dombψ, then Ψ is called K L function. The functions ϕ belonging to the set Φ η for η P p0, `8s are called desingularization functions. The K L property reveals the possibility to reparametrize the values of Ψ in order to avoid flatness around the critical points. To the class of K L functions belong semialgebraic, real subanalytic, uniformly convex functions and convex functions satisfying a growth condition. We refer to [,, 3, 6, 7, 8, 9] for more properties of K L functions and illustrating examples. The following result, the proof of which can be found in [9, Lemma 6], will play an essential role in our convergence analysis. Lemma 0. (Uniformized K L property) Let Ω be a compact set and Ψ: R d Ñ RYt`8u be a proper and lower semicontinuous function. Assume that Ψ is constant on Ω and satisfies the K L property at each point of Ω. Then there exist ε ą 0, η ą 0 and ϕ P Φ η such that for any pv P Ω and every element u in the intersection v P R d : dist pv, Ωq ă ε ( X rψ ppvq ă Ψ pvq ă Ψ ppvq ` ηs it holds ϕ pψ pvq Ψ ppvqq dist p0, BΨ pvqq ě. 5

16 From now on we will use the following notations C 8 : min tc, C 3, C 4 u, C 9 : max tc 5, C 6, C 7 u and E n : Ψ n ě, where Ψ lim Ψ n. nñ`8 The next result shows that if Ψ is a K L function, then the sequence tpx n, y n, z n, u n qu ně0 converges to a KKT point of the optimization problem (.). This hypothesis is fulfilled if, for instance, F, G and H are semi-algebraic functions. Theorem. Let Assumption be satisfied and tpx n, y n, z n, u n qu ně0 be a sequence generated by Algorithm. If Ψ is a K L function, then the following statements are true: piq the sequence tpx n, y n, z n, u n qu ně0 has finite length, namely, ÿ ně0 x n` x n ă `8, ÿ ně0 y n` y n ă `8, ÿ ně0 z n` z n ă `8, ÿ ně0 u n` u n ă `8; piiq the sequencetpx n, y n, z n, u n qu ně0 converges to a KKT point of the optimization problem (.). Proof. Let be X P Ω, thus Ψ px q Ψ. Recall that te n u ně is monotonically decreasing and converges to 0 as n Ñ `8. We consider two cases. Case. Assume that there exists an integer n ě such that E n 0 or, equivalently, Ψ n Ψ. Due to the monotonicity of te n u ně, it follows that E n 0 or, equivalently, Ψ n Ψ for any n ě n. The inequality (.7) yields for any n ě n ` x n` x n 0, y n` y n 0 and u n` u n 0. The inequality (.0) gives us further z n` z n 0 for any n ě n `. This proves (3.). Case. Consider now the case when E n ą 0 or, equivalently, Ψ n ą Ψ for any n ě. According to Lemma 0, there exist ε ą 0, η ą 0 and a desingularization function ϕ such that for any element X in the intersection tz P R: dist pz, Ωq ă εu X tz P R: Ψ ă Ψ pzq ă Ψ ` ηu (3.) it holds Let be n ě such that for any n ě n Since ϕ pψ pxq Ψ q dist p0, BΨ pxqq ě. Ψ ă Ψ n ă Ψ ` η. lim nñ`8 dist px n, Ωq 0 (see Lemma 9 piiiq), there exists n ě such that for any n ě n dist px n, Ωq ă ε. Consequently, X n px n, y n, z n, u n, x n, u n q belongs to the intersection in (3.) for any n ě n 0 : max tn, n u, which further implies (3.) ϕ pψ n Ψ q dist p0, BΨ px n qq ϕ pe n q dist p0, BΨ px n qq ě. (3.3) Define for two arbitrary nonnegative integers i and j i,j : ϕ pψ i Ψ q ϕ pψ j Ψ q ϕ pe i q ϕ pe j q. The monotonicity of the sequence tψ n u ně0 and of the function ϕ implies that i,j ě 0 for any ď i ď j. In addition, for any N ě n 0 ě it holds Nÿ from which we get ÿ ně n,n` ă `8. n n 0 n,n` n0,n` ϕ pe n0 q ϕ pe N` q ď ϕ pe n0 q, 6

17 By combining Lemma 4 with the concavity of ϕ we obtain for any n ě n,n` ϕ pe n q ϕ pe n` q ě ϕ pe n q pe n E n` q ϕ pe n q pψ n Ψ n` q ě min tc, C 3, C 4 u ϕ pe n q x n` x n ` y n` y n ` u n` u n. Thus, (3.3) implies for any n ě n 0 x n` x n ` y n` y n ` u n` u n ď dist p0, BΨ px n qq ϕ pe n q x n` x n ` y n` y n ` u n` u n ď C 8 dist p0, BΨ px n qq n,n`. By the Cauchy-Schwarz inequality, the arithmetic mean-geometric mean inequality and Lemma 8, we have that for any n ě n 0 and every α ą 0 If we denote for any n ě 0 x n` x n ` y n` y n ` u n` u n ď? b 3 x n` x n ` y n` y n ` u n` u n ď a b 3C 8 dist p0, BΨ px n qq n,n` ď α dist p0, BΨ px n qq ` 3C 8 4α n,n` ď αc 9 p x n x n ` y n y n ` u n u n q ` 3C 8 4α n,n`. (3.4) a n : x n x n ` y n y n ` u n u n and b n : 3C 8 4α n,n`, (3.5) then the above inequality is nothing else than (.6) with χ 0 : αc 9 and χ : 0. Since ÿ ně b n ă `8, by choosing α ă {C 9, we can apply Lemma 3 to conclude that ÿ ně0 x n` x n ` y n` y n ` u n` u n ă `8. The proof of (3.) is completed by taking into account once again (.0). From (i) it follows that the sequence tpx n, y n, z n, u n qu ně0 is Cauchy, thus it converges to an element px, y, z, u q which is, according to Lemmas 9, a KKT point of the optimization problem (.). 3. Convergence rates In this section we derive convergence rates for the sequence tpx n, y n, z n, u n qu ně0 generated by Algorithm as well as for tψ n u ně0, if the regularized augmented Lagrangian Ψ satisfies the Lojasiewicz property. The following definition is from [] (see also [0]). Definition 3. Let Ψ: R d Ñ R Y t`8u be proper and lower semicontinuous. Then Ψ satisfies the Lojasiewicz property, if for any critical point pv of Ψ there exists C L ą 0, θ P r0, q and ε ą 0 such that Ψ pvq Ψ ppvq θ ď C L dist p0, P Ball ppv, εq, where Ball ppv, εq denotes the open ball with center pv and radius ε. If Assumption is fulfilled and tpx n, y n, z n, u n qu ně0 is the sequence generated by Algorithm, then, according to Theorem 9, the set of cluster points Ω is nonempty, compact and connected and Ψ takes on Ω the value Ψ ; in addition, Ω Ď crit pψq. According to [, Lemma ], if Ψ has the Lojasiewicz property, then there exist C L ą 0, θ P r0, q and ε ą 0 such that for any X P tz P R: dist pz, Ωq ă εu, 7

18 it holds Ψ pxq Ψ θ ď C L dist p0, BΨ pxqq. Obviously, Ψ is a K L function with desingularization function ϕ : r0, `8q Ñ r0, `8q, ϕ psq : θ C Ls θ, which, according to Theorem, means that Ω contains a single element X, which is the limit of tx n u ně as n Ñ `8. In other words, if Ψ has the Lojasiewicz property, then there exist C L ą 0, θ P r0, q and ε ą 0 such that for any X P Ball px, εq Ψ pxq Ψ θ ď C L dist p0, BΨ pxqq. (3.6) In this case, Ψ is said to satisfy the Lojasiewicz property with Lojasiewicz constant C L ą 0 and Lojasiewicz exponent θ P r0, q. The following lemma will provide convergence rates for a particular class of monotonically decreasing real sequences converging to 0. Its proof can be found in [3, Lemma 5]. Lemma. Let te n u ně0 be a monotonically decreasing sequence of nonnegative numbers converging 0. Assume further that there exists natural numbers n 0 ě such that for any n ě n 0 e n e n ě C e e θ n, where C e ą 0 is some constant and θ P r0, q. The following statements are true: piq if θ 0, then te n u ně0 converges in finite time; piiq if θ P p0, {s, then there exist C e,0 ą 0 and Q P r0, q such that for any n ě n 0 0 ď e n ď C e,0 Q n ; piiiq if θ P p{, q, then there exists C e, ą 0 such that for any n ě n 0 ` 0 ď e n ď C e, n θ. We prove a recurrence inequality for the sequence te n u ně0. Lemma 3. Let Assumption be satisfied and tpx n, y n, z n, u n qu ně0 be a sequence generated by Algorithm. If Ψ satisfies the Lojasiewicz property with Lojasiewicz constant C L ą 0 and Lojasiewicz exponent θ P r0, q, then there exists n 0 ě such that the following estimate holds for any n ě n 0 E n E n ě C 0 En θ C 8, where C 0 : 3 pc L C 9 q. (3.7) Proof. For every n ě we obtain from Lemma 4 E n E n Ψ n Ψ n ě C 8 x n x n ` y n y n ` u n u n ě 3 C 8 p x n x n ` y n y n ` u n u n q ě C 0 C L D n, where D n P BΨpX n q. Let ε ą 0 be such that (3.6) is fulfilled and choose n 0 ě with the property that for any n ě n 0, X n belongs to BallpX, εq. Relation (3.6) implies (3.7) for any n ě n 0. The following result follows by combining Lemma with Lemma 3. Theorem 4. Let Assumption be satisfied and tpx n, y n, z n, u n qu ně0 be a sequence generated by Algorithm. If Ψ satisfies the Lojasiewicz property with Lojasiewicz constant C L ą 0 and Lojasiewicz exponent θ P r0, q, then the following statements are true: piq if θ 0, then tψ n u ně converges in finite time; 8

19 piiq if θ P p0, {s, then there exist n 0 ě, p C 0 ą 0 and Q P r0, q such that for any n ě n 0 0 ď Ψ n Ψ ď p C 0 Q n ; piiiq if θ P p{, q, then there exist n 0 ě and p C ą 0 such that for any n ě n 0 ` 0 ď Ψ n Ψ ď p C n θ. The next lemma will play an important role when transferring the convergence rates for tψ n u ně0 to the sequence of iterates tpx n, y n, z n, u n qu ně0. Lemma 5. Let Assumption be satisfied and tpx n, y n, z n, u n qu ně0 be a sequence generated by Algorithm. Let px, y, z, u q be the KKT point of the optimization problem (.) to which tpx n, y n, z n, u n qu ně0 converges as n Ñ `8. Then there exists n 0 ě such that the following estimates hold for any n ě n 0!a )!a ) x n x ď C max En, ϕ pe n q, y n y ď C max En, ϕ pe n q,!a )!a ) z n z ď C max En, ϕ pe n q, u n u ď C max En, ϕ pe n q, (3.8) where C : a 3C 8 ` 3C 8 C 9 and C : ˆ A ` C. σβ Proof. We assume that E n ą 0 for any n ě 0. Otherwise, the sequence tpx n, y n, z n, u n qu ně0 becomes identical to px, y, z, u q beginning with a given index and the conclusion follows automatically (see the proof of Theorem ). Let ε ą 0 be such that (3.6) is fulfilled and n 0 ě be such that X n belongs to BallpX, εq for any n ě n 0. We fix n ě n 0 now. One can easily notice that x n x ď x n` x n ` x n` x ď ď ÿ x k` x k. kěn Similarly, we derive y n y ď ÿ kěn y k` y k, z n z ď ÿ z k` z k, kěn u n u ď ÿ u k` u k. kěn On the other hand, in view of (3.5) and by taking α : C 9 the inequality (3.4) can be written as a n` ď a n ` b ě n 0. Let us fix now an integer N ě n. Summing up the above inequality for k n,..., N, we have Nÿ a k` ď k n ď Nÿ Nÿ a k ` b k k n Nÿ k n k n Nÿ Nÿ a k` ` a n a N` ` k n a k` ` a n ` 3C 8C 9 ϕ pe n q. By passing N Ñ `8, we obtain ÿ a k` ÿ p x k` x k ` y k` y k ` u k` u k q kěn kěn which gives the desired statement. ď p x n` x n ` y n` y n ` u n` u n q ` 3C 8 C 9 ϕ pe n q ď? b 3 x n` x n ` y n` y n ` u n` u n ` 3C 8 C 9 ϕ pe n q ď a 3C 8 ae n E n` ` 3C 8 C 9 ϕ pe n q, k n b k 9

20 We can now formulate convergence rates for the sequence of generated iterates. Theorem 6. Let Assumption be satisfied and tpx n, y n, z n, u n qu ně0 be a sequence generated by Algorithm. Suppose further that Ψ satisfies the Lojasiewicz property with Lojasiewicz constant C L ą 0 and Lojasiewicz exponent θ P r0, q. Let px, y, z, u q be the KKT point of the optimization problem (.) to which tpx n, y n, z n, u n qu ně0 converges as n Ñ `8. Then the following statements are true: piq if θ 0, then the algorithm converges in finite time; piiq if θ P p0, {s, then there exist n 0 ě, p C0,, p C 0,, p C 0,3, p C 0,4 ą 0 and p Q P r0, q such that for any n ě n 0 x n x ď p C 0, p Q k, y n y ď p C 0, p Q k, z n z ď p C 0,3 p Q k, u n u ď p C 0,4 p Q k ; piiiq if θ P p{, q, then there exist n 0 ě and p C,, p C,, p C,3, p C,4 ą 0 such that for any n ě n 0 ` Proof. Let be the desingularization function. x n x ď p C, n θ θ, yn y ď p C, n θ θ, z n z ď p C,3 n θ θ, un u ď p C,4 n θ θ. ϕ : r0, `8q Ñ r0, `8q, s ÞÑ θ C Ls θ, (i) If θ 0, then tψ n u ně converges in finite time. As seen in the proof of Theorem, the sequence tpx n, y n, z n, u n qu ně0 becomes identical to px, y, z, u q starting from a given index. In other words, the sequence tpx n, y n, z n, u n qu ně0 converges also in finite time and the conclusion follows. Let be θ and n 0 ě such that for any n ě n 0 the inequalities (3.8) in Lemma 5 and hold. E n ď ˆ θ θ C L (ii) If θ P p0, {q, then θ ă 0 and thus for any n ě n 0 θ C LE θ n ď a E n, which implies that If θ {, then thus In both cases we have!a ) max En, ϕ pe n q a E n. ϕ pe n q C L a En,!a ) max En, ϕ pe n q max t, C L u ae ě.!a ) max En, ϕ pe n q ď max t, C L u ae ě n 0. By Theorem 4, there exist n 0 ě, C0 p ą 0 and Q P r0, q such that for Q p :? Q and every n ě n 0 it holds a b b En ď pc 0 Q n{ pc 0Q p n. The conclusion follows from Lemma 5 for n 0 : max tn 0, n 0u. 0

21 (iii) If θ P p{, q, then θ ą 0 and thus for any n ě n 0 a En ď θ C LE θ n, which implies that!a ) max En, ϕ pe n q ϕ pe n q θ C LEn θ. By Theorem 4, there exist n 0 ě and p C ą 0 such that for any n ě n 0 θ C LE θ n ď θ C L p C θ pn q θ θ. The conclusion follows again for n 0 : max tn 0, n 0u from Lemma 5. References [] H. Attouch, J. Bolte. On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Mathematical Programming 6(), 5 6 (009) [] H. Attouch, J. Bolte, P. Redont, A. Soubeyran. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka Lojasiewicz inequality. Mathematics of Operations Research 35(), (00) [3] H. Attouch, J. Bolte, B. F. Svaiter. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Mathematical Programming 37( ), 9 9 (03) [4] H.H. Bauschke, P.L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, New York (0) [5] A. Beck. First-Order Methods in Optimization. MOS-SIAM Series on Optimization. SIAM, Philadelphia (07) [6] J. Bolte, A. Daniilidis, A. Lewis. The Lojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM Journal on Optimization 7(4), 05 3 (006) [7] J. Bolte, A. Daniilidis, A. Lewis, M. Shiota. Clarke subgradients of stratifiable functions. SIAM Journal on Optimization 8(), (007) [8] J. Bolte, A. Daniilidis, O. Ley, L. Mazet. Characterizations of Lojasiewicz inequalities: subgradient flows, talweg, convexity. Transactions of the American Mathematical Society 36(6), (00) [9] J. Bolte, S. Sabach, M. Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 46(): (04) [0] J. Bolte, S. Sabach, M. Teboulle. Nonconvex Lagrangian-based optimization: monitoring schemes and global convergence. Mathematics of Operations Research, to appear. org/0.87/moor [] R. I. Boţ, E. R. Csetnek. An inertial Tseng s type proximal algorithm for nonsmooth and nonconvex optimization problems. Journal of Optimization Theory and Applications 7(), (06) [] R. I. Boţ, E. R. Csetnek, S. C. László. An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions. EURO Journal on Computational Optimization 4(), 3 5 (06) [3] R. I. Boţ, D.-K. Nguyen. The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates. arxiv:

22 [4] P. L. Combettes, V. R. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation 4(4), (005) [5] W. Hare, C. Sagastizábal. Computing proximal points of nonconvex functions. Mathematical Programming 6(-), 58 (009) [6] M. Hong, Z.-Q. Luo. On the linear convergence of the alternating direction method of multipliers. Mathematica Programming 6, (07) [7] K. Kurdyka. On gradients of functions definable in o-minimal structures. Annales de l Institut Fourier 48, (998) [8] A. Lewis, J. Malick. Alternating projection on manifolds. Mathematics of Operations Research 33(), 6 34 (008) [9] G. Li, T. K. Pong. Global convergence of splitting methods for nonconvex composite optimization. SIAM Journal on Optimization 5(4), (05) [0] S. Lojasiewicz. Une propriété topologique des sous-ensembles analytiques réels, Les Équations aux Dérivées Partielles. Éditions du Centre National de la Recherche Scientifique, Paris, 8 89 (963) [] B. Mordukhovich. Variational Analysis and Generalized Differentiation, I: Basic Theory, II: Applications. Springer, Berlin (006) [] J. Moreau. Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes Rendus de l Académie des Sciences (Paris), Série A 55, (96) [3] T. Pock, S. Sabach. Inertial proximal alternating linearized minimization (ipalm) for nonconvex and nonsmooth problems, SIAM Journal Imaging Sciences 9(4), (06) [4] R. T. Rockafellar, R. J.-B. Wets. Variational Analysis. Fundamental Principles of Mathematical Sciences 37. Springer, Berlin (998) [5] L. Yang, T. K. Pong and X. Chen. Alternating Direction Method of Multipliers for a class of nonconvex and nonsmooth problems with applications to background/foreground extraction. SIAM Journal on Imaging Sciences, 0(), 74 0 (07)

The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates

The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates Radu Ioan Boţ Dang-Khoa Nguyen January 6, 08 Abstract. We propose two numerical algorithms

More information

An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions

An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions Radu Ioan Boţ Ernö Robert Csetnek Szilárd Csaba László October, 1 Abstract. We propose a forward-backward

More information

A gradient type algorithm with backward inertial steps for a nonconvex minimization

A gradient type algorithm with backward inertial steps for a nonconvex minimization A gradient type algorithm with backward inertial steps for a nonconvex minimization Cristian Daniel Alecsa Szilárd Csaba László Adrian Viorel November 22, 208 Abstract. We investigate an algorithm of gradient

More information

Convergence rates for an inertial algorithm of gradient type associated to a smooth nonconvex minimization

Convergence rates for an inertial algorithm of gradient type associated to a smooth nonconvex minimization Convergence rates for an inertial algorithm of gradient type associated to a smooth nonconvex minimization Szilárd Csaba László November, 08 Abstract. We investigate an inertial algorithm of gradient type

More information

I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION
