Structured Nonconvex and Nonsmooth Optimization: Algorithms and Iteration Complexity Analysis


Bo Jiang, Tianyi Lin, Shiqian Ma, Shuzhong Zhang

November 13, 2017

Abstract. Nonconvex and nonsmooth optimization problems are frequently encountered in much of statistics, business, science and engineering, but they are not yet widely recognized as a technology in the sense of scalability. A reason for this relatively low degree of popularity is the lack of a well-developed system of theory and algorithms to support the applications, as is the case for its convex counterpart. This paper aims to take one step in the direction of disciplined nonconvex and nonsmooth optimization. In particular, we consider in this paper some constrained nonconvex optimization models in block decision variables, with or without coupled affine constraints. In the case without coupled constraints, we show a sublinear rate of convergence to an $\epsilon$-stationary solution in the form of a variational inequality for a generalized conditional gradient method, where the convergence rate is shown to depend on the Hölderian continuity of the gradient of the smooth part of the objective. For the model with coupled affine constraints, we introduce corresponding $\epsilon$-stationarity conditions, and apply two proximal-type variants of the ADMM to solve such a model, assuming the proximal ADMM updates can be implemented for all the block variables except for the last block, for which either a gradient step or a majorization-minimization step is implemented. We show an iteration complexity bound of $O(1/\epsilon^2)$ to reach an $\epsilon$-stationary solution for both algorithms. Moreover, we show that the same iteration complexity of a proximal BCD method follows immediately. Numerical results are provided to illustrate the efficacy of the proposed algorithms for tensor robust PCA.

Keywords: Structured Nonconvex Optimization, $\epsilon$-Stationary Solution, Iteration Complexity, Conditional Gradient Method, Alternating Direction Method of Multipliers, Block Coordinate Descent Method

Mathematics Subject Classification: 90C26, 90C06, 90C60.

Affiliations: Research Center for Management Science and Data Analytics, School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China (research supported in part by a National Natural Science Foundation of China grant); Department of Industrial Engineering and Operations Research, UC Berkeley, Berkeley, CA 94720, USA; Department of Mathematics, UC Davis, Davis, CA 95616, USA (research supported in part by a startup package of the Department of Mathematics at UC Davis); Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, MN 55455, USA (research supported in part by a National Science Foundation grant, CMMI).

1 Introduction

In this paper, we consider the following nonconvex and nonsmooth optimization problem with multiple block variables:

$$\min\; f(x_1, x_2, \ldots, x_N) + \sum_{i=1}^{N-1} r_i(x_i) \quad \mathrm{s.t.}\; \sum_{i=1}^{N} A_i x_i = b,\; x_i \in X_i,\; i = 1, \ldots, N-1, \qquad (1.1)$$

where $f$ is differentiable and possibly nonconvex, and each $r_i$ is possibly nonsmooth and nonconvex, $i = 1, \ldots, N-1$; $A_i \in \mathbb{R}^{m \times n_i}$, $b \in \mathbb{R}^m$, $x_i \in \mathbb{R}^{n_i}$; and $X_i \subseteq \mathbb{R}^{n_i}$ are convex sets, $i = 1, 2, \ldots, N-1$. One restriction of model (1.1) is that the objective function is required to be smooth with respect to the last block variable $x_N$. However, in Section 4 we shall extend the result to cover the general case where $r_N(x_N)$ may be present and $x_N$ may be constrained as well. A special case of (1.1) arises when the affine constraints are absent and there is no block structure of the variables (i.e., $x = x_1$, and the other block variables do not show up in (1.1)), which leads to the following more compact form:

$$\min\; \Phi(x) := f(x) + r(x), \quad \mathrm{s.t.}\; x \in S \subseteq \mathbb{R}^n, \qquad (1.2)$$

where $S$ is a convex and compact set. In this paper, we propose several first-order algorithms for computing an $\epsilon$-stationary solution (to be defined later) for (1.1) and (1.2), and analyze their iteration complexities. Throughout, we assume the following condition.

Assumption 1.1 The sets of stationary solutions for (1.1) and (1.2) are non-empty.

Problem (1.1) arises from a variety of interesting applications. For example, one of the nonconvex models for matrix robust PCA can be cast as follows (see, e.g., [51]); it seeks to decompose a given matrix $M \in \mathbb{R}^{m \times n}$ into a superposition of a low-rank matrix $Z$, a sparse matrix $E$ and a noise matrix $B$:

$$\min_{X, Y, Z, E, B}\; \|Z - XY^\top\|_F^2 + \alpha R(E), \quad \mathrm{s.t.}\; M = Z + E + B,\; \|B\|_F \le \eta, \qquad (1.3)$$

where $X \in \mathbb{R}^{m \times r}$, $Y \in \mathbb{R}^{n \times r}$, with $r < \min(m, n)$ being the estimated rank of $Z$; $\eta > 0$ is the noise level, and $\alpha > 0$ is a weighting parameter; $R(E)$ is a regularization function that can improve the sparsity of $E$. One of the widely used regularization functions is the $\ell_1$ norm, which is convex and nonsmooth. However, there are also many nonconvex regularization functions that are widely used in statistical learning and information theory, such as the smoothly clipped absolute deviation (SCAD) [23], the log-sum penalty (LSP) [15], the minimax concave penalty (MCP) [57], and the capped-$\ell_1$ penalty [58, 59]; these are nonsmooth at the point $0$ if composed with the absolute value function, which is usually the case in statistical learning. Clearly (1.3) is in the form of (1.1). Another example of the form (1.1) is the following nonconvex tensor robust PCA model (see, e.g., [54]), which seeks to decompose a given tensor $\mathcal{T} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$ into a superposition of a low-rank tensor $\mathcal{Z}$, a sparse tensor $\mathcal{E}$ and a noise tensor $\mathcal{B}$:

$$\min_{X_i, \mathcal{C}, \mathcal{Z}, \mathcal{E}, \mathcal{B}}\; \|\mathcal{Z} - \mathcal{C} \times_1 X_1 \times_2 X_2 \times_3 \cdots \times_d X_d\|_F^2 + \alpha R(\mathcal{E}), \quad \mathrm{s.t.}\; \mathcal{T} = \mathcal{Z} + \mathcal{E} + \mathcal{B},\; \|\mathcal{B}\|_F \le \eta,$$

where $\mathcal{C}$ is a core tensor of smaller size than $\mathcal{Z}$, and the $X_i$ are matrices of appropriate sizes, $i = 1, \ldots, d$. In fact, the low-rank tensor in the above model corresponds to a tensor with a small core; however, a recent work [35] demonstrates that the CP-rank of the core, regardless of its size, could be as large as that of the original tensor.
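For concreteness, the following sketch collects one common parameterization of the four nonconvex regularizers named above, applied componentwise and composed with the absolute value. The shape parameters $\lambda$ and $\theta$ (and their default values) are our own illustrative assumptions, not notation fixed by this paper.

    import numpy as np

    def scad(x, lam, theta=3.7):
        # Smoothly clipped absolute deviation penalty (one common parameterization).
        a = np.abs(x)
        return np.where(a <= lam, lam * a,
               np.where(a <= theta * lam,
                        (-a**2 + 2 * theta * lam * a - lam**2) / (2 * (theta - 1)),
                        (theta + 1) * lam**2 / 2))

    def lsp(x, lam, theta=1.0):
        # Log-sum penalty: lam * log(1 + |x| / theta).
        return lam * np.log(1.0 + np.abs(x) / theta)

    def mcp(x, lam, theta=2.0):
        # Minimax concave penalty.
        a = np.abs(x)
        return np.where(a <= theta * lam, lam * a - a**2 / (2 * theta), theta * lam**2 / 2)

    def capped_l1(x, lam, theta=1.0):
        # Capped-l1 penalty: lam * min(|x|, theta).
        return lam * np.minimum(np.abs(x), theta)

All four are nonsmooth at the origin, which is what makes them useful sparsity surrogates and what places the resulting models in the nonsmooth setting of (1.1).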

Therefore, if one wants to find a low CP-rank decomposition, then the following model is preferred:

$$\min_{X_i, \mathcal{Z}, \mathcal{E}, \mathcal{B}}\; \|\mathcal{Z} - [\![X_1, X_2, \ldots, X_d]\!]\|_F^2 + \alpha R(\mathcal{E}) + \|\mathcal{B}\|_F^2, \quad \mathrm{s.t.}\; \mathcal{T} = \mathcal{Z} + \mathcal{E} + \mathcal{B},$$

for $X_i = [a_{i,1}, a_{i,2}, \ldots, a_{i,R}] \in \mathbb{R}^{n_i \times R}$, $1 \le i \le d$, and $[\![X_1, X_2, \ldots, X_d]\!] := \sum_{r=1}^{R} a_{1,r} \otimes a_{2,r} \otimes \cdots \otimes a_{d,r}$, where $\otimes$ denotes the outer product of vectors, and $R$ is an estimate of the CP-rank. In addition, the so-called sparse tensor PCA problem [1], which seeks the best sparse rank-one approximation of a given $d$-th order tensor $\mathcal{T}$, can also be formulated in the form of (1.1):

$$\min\; -\mathcal{T}(x_1, x_2, \ldots, x_d) + \alpha \sum_{i=1}^{d} R(x_i), \quad \mathrm{s.t.}\; x_i \in S_i = \{x : \|x\|_2 \le 1\},\; i = 1, 2, \ldots, d,$$

where $\mathcal{T}(x_1, x_2, \ldots, x_d) = \sum_{i_1, \ldots, i_d} \mathcal{T}_{i_1, \ldots, i_d} (x_1)_{i_1} \cdots (x_d)_{i_d}$.

The convergence and iteration complexity of various nonconvex and nonsmooth optimization methods have recently attracted considerable research attention; see, e.g., [3, 6-8, 10, 11, 19, 20, 26, 27, 29, 41]. In this paper, we study several solution methods that use only first-order information of the objective function, including a generalized conditional gradient method, variants of the alternating direction method of multipliers, and a proximal block coordinate descent method, for solving (1.1) and (1.2). Specifically, we apply a generalized conditional gradient (GCG) method to solve (1.2). We prove that the GCG can find an $\epsilon$-stationary solution for (1.2) in $O(\epsilon^{-q})$ iterations under certain mild conditions, where $q$ is a parameter of the Hölder condition that characterizes the degree of smoothness of $f$. In other words, the convergence rate of the algorithm depends on the degree of smoothness of the objective function. It should be noted that a similar iteration bound depending on the parameter $q$ was reported for convex problems in [13]; for general nonconvex problems, [14] analyzed convergence, but without an iteration complexity result. Furthermore, we show that if $f$ is concave, then GCG finds an $\epsilon$-stationary solution for (1.2) in $O(1/\epsilon)$ iterations. For the affinely constrained problem (1.1), we propose two algorithms, called proximal ADMM-g and proximal ADMM-m in this paper, both of which can be viewed as variants of the alternating direction method of multipliers (ADMM). Recently, there has been an emerging research interest in the ADMM for nonconvex problems (see, e.g., [2, 32, 33, 38, 52, 53, 55]). However, the results in [38, 52, 53, 55] only show that the iterates produced by the ADMM converge to a stationary solution, without providing an iteration complexity analysis. Moreover, the objective function is required to satisfy the so-called Kurdyka-Lojasiewicz (KL) property [9, 36, 42, 45] to enable those convergence results. In [33], Hong, Luo and Razaviyayn analyzed the convergence of the ADMM for solving nonconvex consensus and sharing problems. Note that they also analyzed the iteration complexity of the ADMM for the consensus problem. However, they require the nonconvex part of the objective function to be smooth, and the nonsmooth part to be convex. In contrast, $r_i$ in our model (1.1) can be nonconvex and nonsmooth at the same time. Moreover, we allow general constraints $x_i \in X_i$, $i = 1, \ldots, N-1$, while the consensus problem in [33] only allows such a constraint for one block variable. A very recent work of Hong [32] discussed the iteration complexity of an augmented Lagrangian method for finding an $\epsilon$-stationary solution for the following problem:

$$\min\; f(x), \quad \mathrm{s.t.}\; Ax = b,\; x \in \mathbb{R}^n, \qquad (1.4)$$

under the assumption that $f$ is differentiable. We will compare our results with [32] in more detail in Section 3. Throughout this paper, we make the following assumption.

Assumption 1.2 All subproblems in our algorithms, though possibly nonconvex, can be solved to global optimality.

We shall show later that solving our subproblems usually corresponds to computing the proximal mapping of the nonsmooth part of the objective function. Besides, the proximal mappings of the aforementioned nonsmooth regularization functions, including the $\ell_1$ norm, SCAD, LSP, MCP and capped-$\ell_1$ penalty, all admit closed-form solutions, and the explicit formulae can be found in [8].
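As a minimal illustration of such a closed-form proximal mapping, the $\ell_1$ case is the familiar soft-shrinkage operator (the other penalties above admit similar piecewise closed forms); the function name and interface below are our own:

    import numpy as np

    def prox_l1(v, tau):
        # Proximal mapping of tau * ||.||_1: the soft-shrinkage operator
        # S_tau(v) = sign(v) * max(|v| - tau, 0), applied componentwise.
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)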

Before proceeding, let us first summarize our contributions.

(i) We provide definitions of an $\epsilon$-stationary solution for (1.1) and (1.2) using variational inequalities. For (1.1), our definition of the $\epsilon$-stationary solution allows each $r_i$ to be nonsmooth and nonconvex.

(ii) We study a generalized conditional gradient method with a suitable line search rule for solving (1.2). We assume that the gradient of $f$ satisfies a Hölder condition, and analyze the iteration complexity for obtaining an $\epsilon$-stationary solution for (1.2). After we released the first version of this paper, we noticed several recent works that study the iteration complexity of conditional gradient methods for nonconvex problems. However, our results are different from these. For example, the convergence rate given in [56] is worse than ours, and [43, 44] only consider smooth nonconvex problems with Lipschitz continuous gradient, while our results cover nonsmooth models.

(iii) We study two ADMM variants (proximal ADMM-g and proximal ADMM-m) for solving (1.1), and analyze their iteration complexities for obtaining an $\epsilon$-stationary solution of the nonconvex problem (1.1). In addition, the setup and the assumptions of our model are different from those in other recent works. For instance, [38] considers a two-block nonconvex problem with an identity coefficient matrix for one block variable in the linear constraint, and requires coerciveness of the objective or boundedness of the domain; [53] assumes that the objective function is coercive over the feasible set and that the nonsmooth part is restricted prox-regular or piecewise linear. In contrast, our algorithms only assume that the gradient of the smooth part of the objective is Lipschitz continuous and that the nonsmooth part does not involve the last block variable, which is weaker than the assumptions on the objective functions in [38, 53].

(iv) As an extension, we also show how to use proximal ADMM-g and proximal ADMM-m to find an $\epsilon$-stationary solution for (1.1) without assuming any condition on $A_N$.

(v) When the affine constraints are absent in model (1.1), as a by-product, we demonstrate that the iteration complexity of the proximal block coordinate descent (BCD) method with cyclic order can be obtained directly from that of proximal ADMM-g and proximal ADMM-m. Although [11] gives an iteration complexity result for nonconvex BCD, it requires the KL property, and the complexity depends on a parameter of the KL condition, which is typically unknown.

Notation. $\|x\|$ denotes the Euclidean norm of a vector $x$, and $\|x\|_H^2$ denotes $x^\top H x$ for a positive definite matrix $H$. For a set $S$ and a scalar $p > 1$, we denote $\mathrm{diam}_p(S) := \max_{x, y \in S} \|x - y\|_p$, where $\|x\|_p = (\sum_{i=1}^n |x_i|^p)^{1/p}$. Without specification, we write $\|x\| = \|x\|_2$ and $\mathrm{diam}(S) = \mathrm{diam}_2(S)$ for short. We use $\mathrm{dist}(x, S)$ to denote the Euclidean distance from a vector $x$ to a set $S$. Given a matrix $A$, its spectral norm and smallest singular value are denoted by $\|A\|_2$ and $\sigma_{\min}(A)$, respectively. We use $\lceil a \rceil$ to denote the ceiling of $a$.

Organization. The rest of this paper is organized as follows. In Section 2 we introduce the notion of $\epsilon$-stationary solution for (1.2), apply a generalized conditional gradient method to solve (1.2), and analyze its iteration complexity for obtaining an $\epsilon$-stationary solution. In Section 3 we give two definitions of $\epsilon$-stationarity for (1.1) under different settings, propose two ADMM variants that solve (1.1), and analyze their iteration complexities for reaching an $\epsilon$-stationary solution for (1.1). In Section 4 we provide some extensions of the results in Section 3: we first show how to remove some of the conditions assumed in Section 3, and then we apply a proximal BCD method to solve (1.1) without affine constraints and provide an iteration complexity analysis. In Section 5, we present numerical results to illustrate the practical efficiency of the proposed algorithms.

2 A generalized conditional gradient method

In this section, we study a GCG method for solving (1.2) and analyze its iteration complexity. The conditional gradient (CG) method, also known as the Frank-Wolfe method, was originally proposed in [24], and regained a lot of popularity recently due to its capability of solving large-scale problems (see [4, 5, 25, 30, 34, 37, 47]). However, these works focus on solving convex problems. Bredies et al. [14] proved the convergence of a generalized conditional gradient method for solving nonconvex problems in Hilbert space. In this section, by introducing a suitable line search rule, we provide an iteration complexity analysis for this algorithm. Throughout this section, we make the following assumption regarding (1.2).

Assumption 2.1 In (1.2), $r(x)$ is convex and nonsmooth, and the constraint set $S$ is convex and compact. Moreover, $f$ is differentiable and there exist some $p > 1$ and $\rho > 0$ such that

$$f(y) \le f(x) + \nabla f(x)^\top (y - x) + \rho\, \|y - x\|_p^p, \quad \forall\, x, y \in S. \qquad (2.1)$$

The inequality (2.1) is also known as a Hölder condition and was used in other works on first-order algorithms (e.g., [1]). It can be shown that (2.1) holds for a variety of functions. For instance, (2.1) holds for any $p$ when $f$ is concave, and it is valid with $p = 2$ when $\nabla f$ is Lipschitz continuous.

2.1 An $\epsilon$-stationary solution for problem (1.2)

For the smooth unconstrained problem $\min_x f(x)$, it is natural to define an $\epsilon$-stationary solution by the criterion $\|\nabla f(x)\| \le \epsilon$. Nesterov [49] and Cartis et al. [17] showed that gradient descent type methods with properly chosen step sizes need $O(1/\epsilon^2)$ iterations to find such a solution. Moreover, Cartis et al. [16] constructed an example showing that the $O(1/\epsilon^2)$ iteration complexity is tight for steepest descent type algorithms. However, the case of constrained nonsmooth nonconvex optimization is subtler. There exist some works on how to define an $\epsilon$-optimality condition for local minimizers of various constrained nonconvex problems [18, 22, 27, 32, 48]. Cartis et al. [18] proposed an approximate measure for smooth problems with a convex set constraint; [48] discussed general nonsmooth nonconvex problems in Banach space using the tool of the limiting Fréchet $\epsilon$-subdifferential; [22] showed that under certain conditions $\epsilon$-stationary solutions converge to a stationary solution as $\epsilon \to 0$. Ghadimi et al. [27] considered the following notion of $\epsilon$-stationary solution for (1.2):

$$P_S(x, \gamma) := \frac{1}{\gamma}(x - x^+), \quad \text{where } x^+ = \arg\min_{y \in S}\; \nabla f(x)^\top y + \frac{1}{\gamma} V(y, x) + r(y), \qquad (2.2)$$

where $\gamma > 0$ and $V$ is a prox-function. They proposed a projected gradient algorithm to solve (1.2) and proved that it takes no more than $O(1/\epsilon^2)$ iterations to find an $x$ satisfying

$$\|P_S(x, \gamma)\|^2 \le \epsilon. \qquad (2.3)$$
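To make the measure (2.2) concrete, here is a minimal sketch in the special case $r \equiv 0$, $V(y, x) = \|y - x\|^2/2$ and $S$ a Euclidean ball, in which case $x^+$ is simply the projected gradient point; the function names are our own:

    import numpy as np

    def proj_ball(y, radius=1.0):
        # Euclidean projection onto {y : ||y|| <= radius}.
        nrm = np.linalg.norm(y)
        return y if nrm <= radius else (radius / nrm) * y

    def grad_mapping(x, grad_f, gamma, radius=1.0):
        # P_S(x, gamma) = (x - x_plus) / gamma with V(y, x) = ||y - x||^2 / 2 and r = 0;
        # then x_plus = proj_S(x - gamma * grad f(x)), the projected gradient point.
        x_plus = proj_ball(x - gamma * grad_f(x), radius)
        return (x - x_plus) / gamma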

Our definition of an $\epsilon$-stationary solution for (1.2) is as follows.

Definition 2.2 We call $x$ an $\epsilon$-stationary solution ($\epsilon \ge 0$) for (1.2) if the following holds:

$$\psi_S(x) := \inf_{y \in S}\; \{\nabla f(x)^\top (y - x) + r(y) - r(x)\} \ge -\epsilon. \qquad (2.4)$$

If $\epsilon = 0$, then $x$ is called a stationary solution for (1.2).

Observe that if $r$ is continuous, then any cluster point of $\epsilon$-stationary solutions defined above is a stationary solution for (1.2) as $\epsilon \to 0$. Moreover, this stationarity condition is weaker than the usual KKT optimality condition. To see this, we first rewrite (1.2) as the equivalent unconstrained problem

$$\min_x\; f(x) + r(x) + \iota_S(x),$$

where $\iota_S(x)$ is the indicator function of $S$. Suppose that $x^*$ is any local minimizer of this problem, and thus also a local minimizer of (1.2). Since $f$ is differentiable and $r$ and $\iota_S$ are convex, Fermat's rule [50] yields

$$0 \in \partial (f + r + \iota_S)(x^*) = \nabla f(x^*) + \partial r(x^*) + \partial \iota_S(x^*), \qquad (2.5)$$

which further implies that there exists some $z \in \partial r(x^*)$ such that

$$(\nabla f(x^*) + z)^\top (y - x^*) \ge 0, \quad \forall\, y \in S.$$

Using the convexity of $r$, it follows that

$$\nabla f(x^*)^\top (y - x^*) + r(y) - r(x^*) \ge 0, \quad \forall\, y \in S. \qquad (2.6)$$

Thus, (2.6) is weaker than (2.5), and it is a necessary condition for a local minimum of (1.2) as well. Furthermore, we claim that $\psi_S(x) \ge -\epsilon$ implies $\|P_S(x, \gamma)\|^2 \le \epsilon/\gamma$ with the prox-function $V(y, x) = \|y - x\|^2/2$. In fact, (2.2) guarantees that

$$\left(\nabla f(x) + \frac{1}{\gamma}(x^+ - x) + z\right)^\top (y - x^+) \ge 0, \quad \forall\, y \in S, \qquad (2.7)$$

for some $z \in \partial r(x^+)$. By choosing $y = x$ in (2.7) one obtains

$$\nabla f(x)^\top (x - x^+) + r(x) - r(x^+) \ge (\nabla f(x) + z)^\top (x - x^+) \ge \frac{1}{\gamma}\|x^+ - x\|^2. \qquad (2.8)$$

Therefore, if $\psi_S(x) \ge -\epsilon$, then $\|P_S(x, \gamma)\|^2 \le \epsilon/\gamma$ holds.

2.2 The algorithm

For a given point $x$, we define the following approximation of the objective function of (1.2):

$$\ell(y; x) := f(x) + \nabla f(x)^\top (y - x) + r(y), \qquad (2.9)$$

which is obtained by linearizing the smooth part $f$ of $\Phi$ in (1.2). Our GCG method for solving (1.2) is described in Algorithm 1, where $\rho$ and $p$ are from Assumption 2.1.

Algorithm 1 Generalized Conditional Gradient Algorithm (GCG) for solving (1.2)
Require: Given $x^0 \in S$
for $k = 0, 1, \ldots$ do
  [Step 1] $y^k = \arg\min_{y \in S} \ell(y; x^k)$, and let $d^k = y^k - x^k$;
  [Step 2] $\alpha_k = \arg\min_{\alpha \in [0,1]}\; \alpha \nabla f(x^k)^\top d^k + \alpha^p \rho \|d^k\|_p^p + (1 - \alpha) r(x^k) + \alpha r(y^k)$;
  [Step 3] Set $x^{k+1} = (1 - \alpha_k) x^k + \alpha_k y^k$.
end for

In each iteration of Algorithm 1, we first perform an exact minimization of the approximate objective $\ell(y; x)$ to form a direction $d^k$. Then the step size $\alpha_k$ is obtained by an exact line search along the direction $d^k$ (which differentiates the GCG from a standard CG method), in which $f$ is approximated by the $p$-th power term from (2.1) and the nonsmooth part is replaced by its convex upper bound. Finally, the iterate is updated by moving along the direction $d^k$ with step size $\alpha_k$.
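The following is a minimal runnable sketch of Algorithm 1 for the illustrative instance $r(x) = \lambda\|x\|_1$ and $S = \{y : \|y\|_\infty \le 1\}$, with $p = 2$ in the Hölder condition so the Step 2 line search is a one-dimensional quadratic with a closed-form minimizer; all names and the choice of instance are our own assumptions:

    import numpy as np

    def gcg(x0, grad_f, lam=0.1, rho=1.0, iters=100):
        # Sketch of Algorithm 1 with r(x) = lam * ||x||_1, S the unit l_inf box, p = 2.
        r = lambda z: lam * np.abs(z).sum()
        x = x0.copy()
        for _ in range(iters):
            g = grad_f(x)
            # Step 1: y = argmin_{y in S} g'y + r(y), which separates coordinatewise:
            # y_i = -sign(g_i) if |g_i| > lam, else 0.
            y = np.where(np.abs(g) > lam, -np.sign(g), 0.0)
            d = y - x
            # Step 2: minimize alpha*c1 + alpha^2*c2 over [0, 1] (p = 2 case).
            c1 = g @ d + r(y) - r(x)
            c2 = rho * (d @ d)
            alpha = np.clip(-c1 / (2 * c2), 0.0, 1.0) if c2 > 0 else float(c1 < 0)
            # Step 3: convex combination update.
            x = (1 - alpha) * x + alpha * y
        return x

Note that Step 1 only requires minimizing a linear-plus-separable function over $S$, which is exactly the kind of subproblem Assumption 1.2 posits can be solved globally.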

7 Remark.3 Accordng to Assumpton 1., we assume that the subproblem n Step 1 of Algorthm 1 can be solved to global optmalty. See [46] for problems arsng from sparse PCA that satsfy ths assumpton. Remark.4 It s easy to see that the sequence {Φx k } generated by GCG s monotoncally nonncreasng [14], whch mples that any cluster pont of {x k } cannot be a strct local maxmzer..3 An teraton complexty analyss Before we proceed to the man result on teraton complexty of GCG, we need the followng lemma that gves a suffcent condton for an ɛ-statonary soluton for 1.. Ths lemma s nspred by [6], and t ndcates that f the progress ganed by mnmzng.9 s small, then z must already be close to a statonary soluton for 1.. Lemma.5 Defne z l := argmn x S lx; z. The mprovement of the lnearzaton at pont z s defned as l z := lz; z lz l ; z = fz z l z + rz rz l. Gven ɛ 0, for any z S, f l z ɛ, then z s an ɛ-statonary soluton for 1. as defned n Defnton.. Proof. From the defnton of z l, we have whch mples that ly; z lz l ; z = fz y z l + ry rz l 0, y S, fz y z + ry rz = fz y z l + ry rz l + fz z l z + rz l rz fz z l z + rz l rz, y S. It then follows mmedately that f l z ɛ, then fz y z + ry rz l z ɛ. Denotng Φ to be the optmal value of 1., we are now ready to gve the man result of the teraton complexty of GCG Algorthm 1 for obtanng an ɛ-statonary soluton for 1.. Theorem.6 For any ɛ 0, dam p psρ, GCG fnds an ɛ-statonary soluton for 1. wthn Φx 0 Φ dam p psρ q 1 teratons, where 1 p + 1 q = 1. ɛ q Proof. For ease of presentaton, we denote D := dam p S and l k := l x k. By Assumpton.1, usng the fact that ɛ D p ρ < 1, and by the defnton of α k n Algorthm 1, we have ɛ/d p ρ 1 p 1 l k 1 p ɛ/d ρ1/p 1 p 1.10 ɛ/d p ρ 1 p 1 fx k y k x k + ry k rx k ρ ɛ/dp ρ p p 1 y k x k p p α k fx k y k x k + ry k rx k ραp k yk x k p p fx k x k+1 x k + rx k rx k+1 ρ xk+1 x k p p fx k fx k+1 + rx k rx k+1 = Φx k Φx k+1, 7

Finally, if $f$ is concave, then the iteration complexity can be improved to $O(1/\epsilon)$.

Proposition 2.7 Suppose that $f$ is a concave function. If we set $\alpha_k = 1$ for all $k$ in GCG (Algorithm 1), then it returns an $\epsilon$-stationary solution for (1.2) within $\lceil (\Phi(x^0) - \Phi^*)/\epsilon \rceil$ iterations.

Proof. By setting $\alpha_k = 1$ in Algorithm 1 we have $x^{k+1} = y^k$ for all $k$. Since $f$ is concave, it holds that

$$\triangle \ell_k = -\nabla f(x^k)^\top (x^{k+1} - x^k) + r(x^k) - r(x^{k+1}) \le \Phi(x^k) - \Phi(x^{k+1}).$$

Summing this inequality over $k = 0, 1, \ldots, K-1$ yields $K \min_{k \in \{0, 1, \ldots, K-1\}} \triangle \ell_k \le \Phi(x^0) - \Phi^*$, which leads to the desired result immediately. $\Box$

3 Variants of ADMM for solving nonconvex problems with affine constraints

In this section, we study two variants of the ADMM (alternating direction method of multipliers) for solving the general problem (1.1), and analyze their iteration complexities for obtaining an $\epsilon$-stationary solution (to be defined later) under certain conditions. Throughout this section, the following two assumptions regarding problem (1.1) are in force.

Assumption 3.1 The gradient of the function $f$ is Lipschitz continuous with Lipschitz constant $L > 0$, i.e., for any $(x_1^1, \ldots, x_N^1)$ and $(x_1^2, \ldots, x_N^2) \in X_1 \times \cdots \times X_{N-1} \times \mathbb{R}^{n_N}$, it holds that

$$\|\nabla f(x_1^1, x_2^1, \ldots, x_N^1) - \nabla f(x_1^2, x_2^2, \ldots, x_N^2)\| \le L\, \|(x_1^1 - x_1^2,\, x_2^1 - x_2^2,\, \ldots,\, x_N^1 - x_N^2)\|, \qquad (3.1)$$

which implies that for any $(x_1, \ldots, x_{N-1}) \in X_1 \times \cdots \times X_{N-1}$ and any $x_N, \hat{x}_N \in \mathbb{R}^{n_N}$, we have

$$f(x_1, \ldots, x_{N-1}, x_N) \le f(x_1, \ldots, x_{N-1}, \hat{x}_N) + (x_N - \hat{x}_N)^\top \nabla_N f(x_1, \ldots, x_{N-1}, \hat{x}_N) + \frac{L}{2}\|x_N - \hat{x}_N\|^2. \qquad (3.2)$$

Assumption 3.2 $f$ and $r_i$, $i = 1, \ldots, N-1$, are all lower bounded over the appropriate domains defined via the sets $X_1, X_2, \ldots, X_{N-1}, \mathbb{R}^{n_N}$, and we denote

$$f^* = \inf\{f(x_1, x_2, \ldots, x_N) : x_i \in X_i,\; i = 1, \ldots, N-1;\; x_N \in \mathbb{R}^{n_N}\}$$

and $r_i^* = \inf_{x_i \in X_i} r_i(x_i)$ for $i = 1, 2, \ldots, N-1$.

3.1 Preliminaries

To characterize the optimality conditions for (1.1) when $r_i$ is nonsmooth and nonconvex, we need to recall the notion of the generalized gradient (see, e.g., [50]).

Definition 3.3 Let $h : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper lower semi-continuous function. Suppose $h(\bar{x})$ is finite for a given $\bar{x}$. For $v \in \mathbb{R}^n$, we say that

(i) $v$ is a regular subgradient (also called a Fréchet subdifferential) of $h$ at $\bar{x}$, written $v \in \hat{\partial} h(\bar{x})$, if

$$\liminf_{x \to \bar{x},\; x \ne \bar{x}}\; \frac{h(x) - h(\bar{x}) - \langle v,\, x - \bar{x}\rangle}{\|x - \bar{x}\|} \ge 0;$$

(ii) $v$ is a general subgradient of $h$ at $\bar{x}$, written $v \in \partial h(\bar{x})$, if there exist sequences $\{x^k\}$ and $\{v^k\}$ such that $x^k \to \bar{x}$ with $h(x^k) \to h(\bar{x})$, and $v^k \in \hat{\partial} h(x^k)$ with $v^k \to v$ as $k \to \infty$.

The following proposition lists some well-known facts about lower semi-continuous functions.

Proposition 3.4 Let $h : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ and $g : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be proper lower semi-continuous functions. Then it holds that:

(i) (Theorem 10.1 in [50]) Fermat's rule remains true: if $\bar{x}$ is a local minimum of $h$, then $0 \in \partial h(\bar{x})$.

(ii) If $h$ is continuously differentiable at $x$, then $\partial(h + g)(x) = \nabla h(x) + \partial g(x)$.

(iii) (Exercise in [50]) If $h$ is locally Lipschitz continuous at $x$, then $\partial(h + g)(x) \subseteq \partial h(x) + \partial g(x)$.

(iv) Suppose $h(x)$ is locally Lipschitz continuous, $X$ is a closed and convex set, and $\bar{x}$ is a local minimum of $h$ on $X$. Then there exists $v \in \partial h(\bar{x})$ such that $(x - \bar{x})^\top v \ge 0$ for all $x \in X$.

In our analysis, we frequently use the following identity, which holds for any vectors $a, b, c, d$:

$$(a - b)^\top (c - d) = \frac{1}{2}\left(\|a - d\|^2 - \|a - c\|^2 + \|b - c\|^2 - \|b - d\|^2\right). \qquad (3.3)$$

3.2 An $\epsilon$-stationary solution for problem (1.1)

We now introduce notions of $\epsilon$-stationarity for (1.1) under the following two settings:

Setting 1: $r_i$ is Lipschitz continuous, and $X_i$ is a compact set, for $i = 1, \ldots, N-1$;

Setting 2: $r_i$ is lower semi-continuous, and $X_i = \mathbb{R}^{n_i}$, for $i = 1, \ldots, N-1$.

Definition 3.5 ($\epsilon$-stationary solution for (1.1) in Setting 1) Under the conditions of Setting 1, for $\epsilon \ge 0$, we call $(x_1^*, \ldots, x_N^*)$ an $\epsilon$-stationary solution for (1.1) if there exists a Lagrange multiplier $\lambda^*$ such that the following holds for any $(x_1, \ldots, x_N) \in X_1 \times \cdots \times X_{N-1} \times \mathbb{R}^{n_N}$:

$$(x_i - x_i^*)^\top \left[g_i^* + \nabla_i f(x_1^*, \ldots, x_N^*) - A_i^\top \lambda^*\right] \ge -\epsilon, \quad i = 1, \ldots, N-1, \qquad (3.4)$$

$$\|\nabla_N f(x_1^*, \ldots, x_{N-1}^*, x_N^*) - A_N^\top \lambda^*\| \le \epsilon, \qquad (3.5)$$

$$\Big\|\sum_{i=1}^{N} A_i x_i^* - b\Big\| \le \epsilon, \qquad (3.6)$$

where $g_i^*$ is a general subgradient of $r_i$ at the point $x_i^*$. If $\epsilon = 0$, we call $(x_1^*, \ldots, x_N^*)$ a stationary solution for (1.1).
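As a quick sanity check of conditions (3.5) and (3.6), the following sketch computes the two residual norms from candidate iterates; names and interface are our own, and condition (3.4), which additionally requires a subgradient of $r_i$ and a minimization over $X_i$, is omitted here:

    import numpy as np

    def stationarity_residuals(grads, A_list, x_list, lam, b):
        # grads[i] holds grad_i f(x); returns the residuals of (3.5) and (3.6):
        # res_N  = || grad_N f(x) - A_N' lam ||
        # res_eq = || sum_i A_i x_i - b ||
        res_N = np.linalg.norm(grads[-1] - A_list[-1].T @ lam)
        res_eq = np.linalg.norm(sum(A @ x for A, x in zip(A_list, x_list)) - b)
        return res_N, res_eq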

If $X_i = \mathbb{R}^{n_i}$ for $i = 1, \ldots, N-1$, then the VI-style conditions in Definition 3.5 reduce to the following.

Definition 3.6 ($\epsilon$-stationary solution for (1.1) in Setting 2) Under the conditions of Setting 2, for $\epsilon \ge 0$, we call $(x_1^*, \ldots, x_N^*)$ an $\epsilon$-stationary solution for (1.1) if there exists a Lagrange multiplier $\lambda^*$ such that (3.5), (3.6) and the following hold:

$$\mathrm{dist}\left(-\nabla_i f(x_1^*, \ldots, x_N^*) + A_i^\top \lambda^*,\; \partial r_i(x_i^*)\right) \le \epsilon, \quad i = 1, \ldots, N-1, \qquad (3.7)$$

where $\partial r_i(x_i^*)$ is the set of general subgradients of $r_i$ at $x_i^*$, $i = 1, 2, \ldots, N-1$. If $\epsilon = 0$, we call $(x_1^*, \ldots, x_N^*)$ a stationary solution for (1.1).

The two settings of problem (1.1) considered in this section, and their corresponding definitions of an $\epsilon$-stationary solution, are summarized in Table 1.

Table 1: $\epsilon$-stationary solution of (1.1) in two settings

            | $r_i$, $i = 1, \ldots, N-1$ | $X_i$, $i = 1, \ldots, N-1$              | $\epsilon$-stationary solution
  Setting 1 | Lipschitz continuous        | $X_i \subseteq \mathbb{R}^{n_i}$ compact | Definition 3.5
  Setting 2 | lower semi-continuous       | $X_i = \mathbb{R}^{n_i}$                 | Definition 3.6

A very recent work of Hong [32] proposes a definition of an $\epsilon$-stationary solution for problem (1.4), and analyzes the iteration complexity of a proximal augmented Lagrangian method for obtaining such a solution. Specifically, $(x, \lambda)$ is called an $\epsilon$-stationary solution for (1.4) in [32] if $Q(x, \lambda) \le \epsilon$, where

$$Q(x, \lambda) := \|\nabla_x L_\beta(x, \lambda)\|^2 + \|Ax - b\|^2,$$

and $L_\beta(x, \lambda) := f(x) - \lambda^\top(Ax - b) + \frac{\beta}{2}\|Ax - b\|^2$ is the augmented Lagrangian function of (1.4). Note that [32] assumes that $f$ is differentiable and has a bounded gradient in (1.4). It is easy to show that an $\epsilon$-stationary solution in [32] is equivalent to an $O(\sqrt{\epsilon})$-stationary solution for (1.1) according to Definition 3.6 with $r_i = 0$ and $f$ differentiable. Note that there is no set constraint in (1.4), and so the notion of $\epsilon$-stationarity in [32] is not applicable in the case of Definition 3.5.

Proposition 3.7 Consider the $\epsilon$-stationary solution in Definition 3.6 applied to problem (1.4), i.e., one block variable and $r_N(x_N) = 0$. If $x^*$ is a $\gamma_1\sqrt{\epsilon}$-stationary solution in the sense of Definition 3.6 with Lagrange multiplier $\lambda^*$, where $\gamma_1 = 1/\sqrt{2\beta^2\|A\|_2^2 + 3}$, then $Q(x^*, \lambda^*) \le \epsilon$. Conversely, if $Q(x^*, \lambda^*) \le \epsilon$, then $x^*$ is a $\gamma_2\sqrt{\epsilon}$-stationary solution in the sense of Definition 3.6 with Lagrange multiplier $\lambda^*$, where $\gamma_2 = 1 + \beta\|A\|_2$.

Proof. Suppose $x^*$ is a $\gamma_1\sqrt{\epsilon}$-stationary solution as defined in Definition 3.6. We have $\|\nabla f(x^*) - A^\top \lambda^*\| \le \gamma_1\sqrt{\epsilon}$ and $\|Ax^* - b\| \le \gamma_1\sqrt{\epsilon}$, which implies that

$$Q(x^*, \lambda^*) = \|\nabla f(x^*) - A^\top \lambda^* + \beta A^\top(Ax^* - b)\|^2 + \|Ax^* - b\|^2 \\ \le 2\|\nabla f(x^*) - A^\top \lambda^*\|^2 + 2\beta^2\|A\|_2^2\|Ax^* - b\|^2 + \|Ax^* - b\|^2 \le 2\gamma_1^2\epsilon + (2\beta^2\|A\|_2^2 + 1)\gamma_1^2\epsilon = \epsilon.$$

On the other hand, if $Q(x^*, \lambda^*) \le \epsilon$, then $\|\nabla f(x^*) - A^\top \lambda^* + \beta A^\top(Ax^* - b)\| \le \sqrt{\epsilon}$ and $\|Ax^* - b\| \le \sqrt{\epsilon}$. Therefore,

$$\|\nabla f(x^*) - A^\top \lambda^*\| \le \|\nabla f(x^*) - A^\top \lambda^* + \beta A^\top(Ax^* - b)\| + \beta\|A\|_2\|Ax^* - b\| \le (1 + \beta\|A\|_2)\sqrt{\epsilon}.$$

The desired result then follows immediately. $\Box$

In the following, we introduce two variants of the ADMM, called proximal ADMM-g and proximal ADMM-m, that solve (1.1) under some additional assumptions on $A_N$. In particular, proximal ADMM-g assumes $A_N = I$, and proximal ADMM-m assumes $A_N$ to have full row rank.

3.3 Proximal gradient-based ADMM (proximal ADMM-g)

Our proximal ADMM-g solves (1.1) under the condition that $A_N = I$. Note that when $A_N = I$, the problem is usually referred to as the sharing problem in the literature, and it has a variety of applications (see, e.g., [12, 33, 39, 40]). Our proximal ADMM-g for solving (1.1) with $A_N = I$ is described in Algorithm 2. It can be seen that proximal ADMM-g is based on the framework of the augmented Lagrangian method, and can be viewed as a variant of the ADMM. The augmented Lagrangian function of (1.1) is defined as

$$L_\beta(x_1, \ldots, x_N, \lambda) := f(x_1, \ldots, x_N) + \sum_{i=1}^{N-1} r_i(x_i) - \Big\langle \lambda,\, \sum_{i=1}^{N} A_i x_i - b \Big\rangle + \frac{\beta}{2}\Big\|\sum_{i=1}^{N} A_i x_i - b\Big\|^2,$$

where $\lambda$ is the Lagrange multiplier associated with the affine constraint, and $\beta > 0$ is a penalty parameter. In each iteration, proximal ADMM-g minimizes the augmented Lagrangian function plus a proximal term for each of the block variables $x_1, \ldots, x_{N-1}$, with the other variables fixed; then a gradient descent step is conducted for $x_N$, and finally the Lagrange multiplier $\lambda$ is updated.

Algorithm 2 Proximal Gradient-based ADMM (proximal ADMM-g) for solving (1.1) with $A_N = I$
Require: Given $(x_1^0, x_2^0, \ldots, x_N^0) \in X_1 \times \cdots \times X_{N-1} \times \mathbb{R}^{n_N}$, $\lambda^0 \in \mathbb{R}^m$
for $k = 0, 1, \ldots$ do
  [Step 1] $x_i^{k+1} := \arg\min_{x_i \in X_i}\; L_\beta(x_1^{k+1}, \ldots, x_{i-1}^{k+1}, x_i, x_{i+1}^k, \ldots, x_N^k, \lambda^k) + \frac{1}{2}\|x_i - x_i^k\|_{H_i}^2$, for some positive definite matrix $H_i$, $i = 1, \ldots, N-1$;
  [Step 2] $x_N^{k+1} := x_N^k - \gamma\, \nabla_N L_\beta(x_1^{k+1}, \ldots, x_{N-1}^{k+1}, x_N^k, \lambda^k)$;
  [Step 3] $\lambda^{k+1} := \lambda^k - \beta\left(\sum_{i=1}^{N} A_i x_i^{k+1} - b\right)$.
end for

Remark 3.8 According to Assumption 1.2, we assume that the subproblems in Step 1 of Algorithm 2 can be solved to global optimality. In fact, when the coupled objective is absent or can be linearized, after choosing a proper matrix $H_i$, the solution of the corresponding subproblem is given by the proximal mapping of $r_i$. As mentioned earlier, many nonconvex regularization functions such as SCAD, LSP, MCP and capped-$\ell_1$ admit closed-form proximal mappings.

Moreover, in Algorithm 2 we choose the parameters such that

$$\beta > \max\left\{\frac{(6 + 18\sqrt{3})L}{13},\; \max_{i=1,2,\ldots,N-1} \frac{6L^2}{\sigma_{\min}(H_i)}\right\} \qquad (3.8)$$

(in particular, $13\beta^2 - 12\beta L - 72L^2 > 0$), and

$$\gamma \in \left(\frac{12}{13\beta + \sqrt{13\beta^2 - 12\beta L - 72L^2}},\; \frac{12}{13\beta - \sqrt{13\beta^2 - 12\beta L - 72L^2}}\right), \qquad (3.9)$$

which guarantee the convergence rate of the algorithm, as shown in Lemma 3.9 and Theorem 3.12. Before presenting the main result on the iteration complexity of proximal ADMM-g, we need some lemmas.
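A minimal sketch of the main loop of Algorithm 2, assuming $A_N = I$ and a user-supplied oracle `subprob(i, x, lam)` that returns the global minimizer of the Step 1 subproblem for block $i$ (as posited by Assumption 1.2, e.g., a proximal mapping of $r_i$ when $f$ is linearized as in Remark 3.14 below); all names here are illustrative:

    import numpy as np

    def proximal_admm_g(x, lam, A_list, b, grad_f_N, subprob, beta, gamma, iters=100):
        # x: list of block variables, A_list[-1] is the identity (A_N = I).
        N = len(x)
        for _ in range(iters):
            for i in range(N - 1):                      # Step 1: proximal block updates
                x[i] = subprob(i, x, lam)
            resid = sum(A @ xi for A, xi in zip(A_list, x)) - b
            # Step 2: gradient step on x_N; grad of L_beta w.r.t. x_N is
            # grad_N f(x) - lam + beta * resid when A_N = I.
            x[N - 1] = x[N - 1] - gamma * (grad_f_N(x) - lam + beta * resid)
            resid = sum(A @ xi for A, xi in zip(A_list, x)) - b
            lam = lam - beta * resid                    # Step 3: multiplier update
        return x, lam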

Lemma 3.9 Suppose the sequence $\{(x_1^k, \ldots, x_N^k, \lambda^k)\}$ is generated by Algorithm 2. The following inequality holds:

$$\|\lambda^{k+1} - \lambda^k\|^2 \le 3\Big(\beta - \frac{1}{\gamma}\Big)^2 \|x_N^k - x_N^{k+1}\|^2 + 3\Big[\Big(\beta - \frac{1}{\gamma}\Big)^2 + L^2\Big] \|x_N^{k-1} - x_N^k\|^2 + 3L^2 \sum_{i=1}^{N-1} \|x_i^k - x_i^{k+1}\|^2. \qquad (3.10)$$

Proof. Note that Steps 2 and 3 of Algorithm 2 yield that

$$\lambda^{k+1} = \Big(\beta - \frac{1}{\gamma}\Big)(x_N^k - x_N^{k+1}) + \nabla_N f(x_1^{k+1}, \ldots, x_{N-1}^{k+1}, x_N^k). \qquad (3.11)$$

Writing (3.11) at iterations $k+1$ and $k$ and subtracting, we obtain

$$\|\lambda^{k+1} - \lambda^k\|^2 \le 3\big\|\nabla_N f(x_1^{k+1}, \ldots, x_{N-1}^{k+1}, x_N^k) - \nabla_N f(x_1^k, \ldots, x_{N-1}^k, x_N^{k-1})\big\|^2 + 3\Big(\beta - \frac{1}{\gamma}\Big)^2 \|x_N^k - x_N^{k+1}\|^2 + 3\Big(\beta - \frac{1}{\gamma}\Big)^2 \|x_N^{k-1} - x_N^k\|^2,$$

and (3.10) follows by applying the Lipschitz condition (3.1) to the first term. $\Box$

We now define the following function, which will play a crucial role in our analysis:

$$\Psi_G(x_1, x_2, \ldots, x_N, \lambda, \bar{x}) := L_\beta(x_1, x_2, \ldots, x_N, \lambda) + \frac{3}{\beta}\Big[\Big(\beta - \frac{1}{\gamma}\Big)^2 + L^2\Big] \|x_N - \bar{x}\|^2. \qquad (3.12)$$
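For monitoring purposes, the potential (3.12) is cheap to evaluate along the iterates; the following sketch (with the weight taken from our reconstruction of (3.12) above) can be used to verify empirically the monotone decrease asserted in Lemma 3.10 below:

    import numpy as np

    def psi_g(L_beta_value, x_N, x_N_prev, beta, gamma, L):
        # Psi_G = L_beta + (3/beta) * ((beta - 1/gamma)^2 + L^2) * ||x_N - x_N_prev||^2
        w = (3.0 / beta) * ((beta - 1.0 / gamma) ** 2 + L ** 2)
        return L_beta_value + w * np.linalg.norm(x_N - x_N_prev) ** 2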

Lemma 3.10 Suppose the sequence $\{(x_1^k, \ldots, x_N^k, \lambda^k)\}$ is generated by Algorithm 2, where the parameters $\beta$ and $\gamma$ are taken according to (3.8) and (3.9), respectively. Then $\Psi_G(x_1^{k+1}, \ldots, x_N^{k+1}, \lambda^{k+1}, x_N^k)$ monotonically decreases over $k \ge 0$.

Proof. From Step 1 of Algorithm 2 it is easy to see that

$$L_\beta(x_1^{k+1}, \ldots, x_{N-1}^{k+1}, x_N^k, \lambda^k) \le L_\beta(x_1^k, \ldots, x_N^k, \lambda^k) - \sum_{i=1}^{N-1} \frac{1}{2}\|x_i^k - x_i^{k+1}\|_{H_i}^2. \qquad (3.13)$$

From Step 2 of Algorithm 2 we get

$$0 = (x_N^k - x_N^{k+1})^\top \Big[\nabla_N f(x_1^{k+1}, \ldots, x_{N-1}^{k+1}, x_N^k) - \lambda^k + \beta\Big(\sum_{i=1}^{N-1} A_i x_i^{k+1} + x_N^k - b\Big) - \frac{1}{\gamma}(x_N^k - x_N^{k+1})\Big],$$

which, together with (3.2) and (3.3), implies

$$L_\beta(x_1^{k+1}, \ldots, x_N^{k+1}, \lambda^k) \le L_\beta(x_1^{k+1}, \ldots, x_{N-1}^{k+1}, x_N^k, \lambda^k) + \Big(\frac{L + \beta}{2} - \frac{1}{\gamma}\Big)\|x_N^k - x_N^{k+1}\|^2. \qquad (3.14)$$

Moreover, the following equality holds trivially:

$$L_\beta(x_1^{k+1}, \ldots, x_N^{k+1}, \lambda^{k+1}) = L_\beta(x_1^{k+1}, \ldots, x_N^{k+1}, \lambda^k) + \frac{1}{\beta}\|\lambda^k - \lambda^{k+1}\|^2. \qquad (3.15)$$

Combining (3.13), (3.14), (3.15) and (3.10) yields

$$\Psi_G(x_1^{k+1}, \ldots, x_N^{k+1}, \lambda^{k+1}, x_N^k) - \Psi_G(x_1^k, \ldots, x_N^k, \lambda^k, x_N^{k-1}) \\ \le \Big(\frac{L + \beta}{2} - \frac{1}{\gamma} + \frac{6}{\beta}\Big(\beta - \frac{1}{\gamma}\Big)^2 + \frac{3L^2}{\beta}\Big)\|x_N^k - x_N^{k+1}\|^2 - \sum_{i=1}^{N-1} \|x_i^k - x_i^{k+1}\|_{\frac{1}{2}H_i - \frac{3L^2}{\beta}I}^2. \qquad (3.16)$$

It is easy to verify that the choice (3.9) ensures $\gamma > 0$ and

$$\frac{L + \beta}{2} - \frac{1}{\gamma} + \frac{6}{\beta}\Big(\beta - \frac{1}{\gamma}\Big)^2 + \frac{3L^2}{\beta} < 0. \qquad (3.17)$$

In fact, (3.17) can be verified as follows. Denoting $z = \beta - \frac{1}{\gamma}$, (3.17) is equivalent to $12z^2 + 2\beta z + 6L^2 + \beta L - \beta^2 < 0$, which holds when

$$\frac{-\beta - \sqrt{13\beta^2 - 12\beta L - 72L^2}}{12} < z < \frac{-\beta + \sqrt{13\beta^2 - 12\beta L - 72L^2}}{12},$$

i.e.,

$$\frac{13\beta - \sqrt{13\beta^2 - 12\beta L - 72L^2}}{12} < \frac{1}{\gamma} < \frac{13\beta + \sqrt{13\beta^2 - 12\beta L - 72L^2}}{12},$$

which is exactly the choice of $\gamma$ in (3.9) (note that (3.8) guarantees the discriminant is positive). Therefore, choosing $\beta$ as in (3.8) and $\gamma$ as in (3.9) also makes $\frac{1}{2}H_i - \frac{3L^2}{\beta}I \succ 0$, and hence $\Psi_G(x_1^{k+1}, \ldots, x_N^{k+1}, \lambda^{k+1}, x_N^k)$ monotonically decreases over $k \ge 0$. $\Box$

Lemma 3.11 Suppose the sequence $\{(x_1^k, \ldots, x_N^k, \lambda^k)\}$ is generated by Algorithm 2. Under the same conditions as in Lemma 3.10, for any $k \ge 0$ we have

$$\Psi_G(x_1^{k+1}, \ldots, x_N^{k+1}, \lambda^{k+1}, x_N^k) \ge \sum_{i=1}^{N-1} r_i^* + f^*,$$

where $r_i^*$ and $f^*$ are defined in Assumption 3.2.

Proof. Using (3.11) to substitute for $\lambda^{k+1}$ in $L_\beta(x_1^{k+1}, \ldots, x_N^{k+1}, \lambda^{k+1})$, and then applying Young's inequality to the resulting cross term, we obtain

$$L_\beta(x_1^{k+1}, \ldots, x_N^{k+1}, \lambda^{k+1}) \ge \sum_{i=1}^{N-1} r_i(x_i^{k+1}) + f(x_1^{k+1}, \ldots, x_N^{k+1}) - \frac{3}{\beta}\Big[\Big(\beta - \frac{1}{\gamma}\Big)^2 + L^2\Big]\|x_N^k - x_N^{k+1}\|^2 + \frac{2\beta - 3L}{6}\Big\|\sum_{i=1}^{N} A_i x_i^{k+1} - b\Big\|^2 \\ \ge \sum_{i=1}^{N-1} r_i^* + f^* - \frac{3}{\beta}\Big[\Big(\beta - \frac{1}{\gamma}\Big)^2 + L^2\Big]\|x_N^k - x_N^{k+1}\|^2,$$

where the first inequality follows from (3.2), and the second inequality is due to $\beta \ge 3L/2$ (implied by (3.8)). The desired result then follows from the definition of $\Psi_G$ in (3.12). $\Box$

Now we are ready to give the iteration complexity of Algorithm 2 for finding an $\epsilon$-stationary solution of (1.1).

Theorem 3.12 Suppose the sequence $\{(x_1^k, \ldots, x_N^k, \lambda^k)\}$ is generated by Algorithm 2, where $\beta$ satisfies (3.8) and $\gamma$ satisfies (3.9). Denote

$$\kappa_1 := \frac{3}{\beta^2}\Big[\Big(\beta - \frac{1}{\gamma}\Big)^2 + L^2\Big], \quad \kappa_2 := 2\Big[\Big(\beta - \frac{1}{\gamma}\Big)^2 + L^2\Big], \quad \kappa_3 := \max_{1 \le i \le N-1} \mathrm{diam}(X_i)^2,$$

$$\kappa_4 := \Big(L + \sqrt{N}\,\beta \max_{1 \le i \le N} \|A_i\|_2^2 + \max_{1 \le i \le N-1} \|H_i\|_2\Big)^2,$$

and

$$\tau := \min\Big\{\frac{1}{\gamma} - \frac{L + \beta}{2} - \frac{6}{\beta}\Big(\beta - \frac{1}{\gamma}\Big)^2 - \frac{3L^2}{\beta},\; \min_{i=1,\ldots,N-1}\Big\{\frac{1}{2}\sigma_{\min}(H_i) - \frac{3L^2}{\beta}\Big\}\Big\} > 0. \qquad (3.18)$$

Then the number of iterations that the algorithm runs can be upper bounded by

$$K := \begin{cases} \Big\lceil \dfrac{2\max\{\kappa_1, \kappa_2, \kappa_4 \kappa_3\}}{\tau\,\epsilon^2}\Big(\Psi_G(x_1^1, \ldots, x_N^1, \lambda^1, x_N^0) - \sum_{i=1}^{N-1} r_i^* - f^*\Big) \Big\rceil & \text{for Setting 1}, \\ \Big\lceil \dfrac{2\max\{\kappa_1, \kappa_2, \kappa_4\}}{\tau\,\epsilon^2}\Big(\Psi_G(x_1^1, \ldots, x_N^1, \lambda^1, x_N^0) - \sum_{i=1}^{N-1} r_i^* - f^*\Big) \Big\rceil & \text{for Setting 2}, \end{cases} \qquad (3.19)$$

and we can further identify one iteration $\hat{k} \in \arg\min_{2 \le k \le K+1} \sum_{i=1}^{N}\big(\|x_i^k - x_i^{k+1}\|^2 + \|x_N^{k-1} - x_N^k\|^2\big)$ such that $(x_1^{\hat{k}}, \ldots, x_N^{\hat{k}})$ is an $\epsilon$-stationary solution for (1.1) with Lagrange multiplier $\lambda^{\hat{k}}$ and $A_N = I$, for Settings 1 and 2, respectively.

Proof. For ease of presentation, denote

$$\theta_k := \sum_{i=1}^{N} \|x_i^k - x_i^{k+1}\|^2 + \|x_N^{k-1} - x_N^k\|^2. \qquad (3.20)$$

Summing (3.16) over $k = 1, \ldots, K$ yields

$$\Psi_G(x_1^{K+1}, \ldots, x_N^{K+1}, \lambda^{K+1}, x_N^K) \le \Psi_G(x_1^1, \ldots, x_N^1, \lambda^1, x_N^0) - \tau \sum_{k=1}^{K} \sum_{i=1}^{N} \|x_i^k - x_i^{k+1}\|^2, \qquad (3.21)$$

where $\tau$ is defined in (3.18). By invoking Lemmas 3.10 and 3.11 (and noting that each term $\|x_N^k - x_N^{k+1}\|^2$ appears in at most two consecutive $\theta_k$'s), we get

$$\min_{2 \le k \le K+1} \theta_k \le \frac{2}{\tau K}\Big[\Psi_G(x_1^1, \ldots, x_N^1, \lambda^1, x_N^0) - \sum_{i=1}^{N-1} r_i^* - f^*\Big].$$

We now derive upper bounds on the terms in (3.5) and (3.6) through $\theta_k$. Note that (3.11) implies that

$$\|\lambda^{k+1} - \nabla_N f(x_1^{k+1}, \ldots, x_N^{k+1})\| \le \Big|\beta - \frac{1}{\gamma}\Big|\,\|x_N^k - x_N^{k+1}\| + \big\|\nabla_N f(x_1^{k+1}, \ldots, x_{N-1}^{k+1}, x_N^k) - \nabla_N f(x_1^{k+1}, \ldots, x_N^{k+1})\big\|,$$

which yields

$$\|\lambda^{k+1} - \nabla_N f(x_1^{k+1}, \ldots, x_N^{k+1})\|^2 \le 2\Big[\Big(\beta - \frac{1}{\gamma}\Big)^2 + L^2\Big]\,\theta_k = \kappa_2\,\theta_k. \qquad (3.22)$$

From Step 3 of Algorithm 2 and (3.10) it is easy to see that

$$\Big\|\sum_{i=1}^{N} A_i x_i^{k+1} - b\Big\|^2 = \frac{1}{\beta^2}\|\lambda^{k+1} - \lambda^k\|^2 \le \frac{3}{\beta^2}\Big[\Big(\beta - \frac{1}{\gamma}\Big)^2 + L^2\Big]\,\theta_k = \kappa_1\,\theta_k. \qquad (3.23)$$

We now derive upper bounds on the terms in (3.4) and (3.7) under the two settings of Table 1, respectively.

Setting 2. Because $r_i$ is lower semi-continuous and $X_i = \mathbb{R}^{n_i}$, $i = 1, \ldots, N-1$, it follows from Step 1 of Algorithm 2 that there exists a general subgradient $g_i \in \partial r_i(x_i^{k+1})$ such that

$$0 = g_i + \nabla_i f(x_1^{k+1}, \ldots, x_i^{k+1}, x_{i+1}^k, \ldots, x_N^k) - A_i^\top \lambda^{k+1} - \beta A_i^\top \sum_{j=i+1}^{N} A_j (x_j^{k+1} - x_j^k) + H_i(x_i^{k+1} - x_i^k).$$

Hence, using (3.1) and the Cauchy-Schwarz inequality,

$$\mathrm{dist}\Big(-\nabla_i f(x_1^{k+1}, \ldots, x_N^{k+1}) + A_i^\top \lambda^{k+1},\; \partial r_i(x_i^{k+1})\Big) \le \Big(L + \sqrt{N}\,\beta\max_{1 \le j \le N}\|A_j\|_2^2 + \max_{1 \le i \le N-1}\|H_i\|_2\Big)\sqrt{\theta_k} = \sqrt{\kappa_4\,\theta_k}. \qquad (3.24)$$

By combining (3.24), (3.22) and (3.23), we conclude that Algorithm 2 returns an $\epsilon$-stationary solution for (1.1) according to Definition 3.6, under the conditions of Setting 2 in Table 1.

Setting 1. Under this setting, $r_i$ is Lipschitz continuous and $X_i \subseteq \mathbb{R}^{n_i}$ is convex and compact. Because $f(x_1, \ldots, x_N)$ is differentiable, $r_i(x_i) + f(x_1, \ldots, x_N)$ is locally Lipschitz continuous with respect to $x_i$ for $i = 1, 2, \ldots, N-1$. Similarly to (3.24), for any $x_i \in X_i$,

Step 1 of Algorithm 2 yields a general subgradient $g_i \in \partial r_i(x_i^{k+1})$ such that

$$(x_i - x_i^{k+1})^\top \Big[g_i + \nabla_i f(x_1^{k+1}, \ldots, x_N^{k+1}) - A_i^\top \lambda^{k+1}\Big] \ge -\Big(L + \sqrt{N}\,\beta\max_{1 \le j \le N}\|A_j\|_2^2 + \max_{1 \le i \le N-1}\|H_i\|_2\Big)\max_{1 \le i \le N-1}\mathrm{diam}(X_i)\,\sqrt{\theta_k} \ge -\sqrt{\kappa_4\kappa_3\,\theta_k}. \qquad (3.25)$$

By combining (3.25), (3.22) and (3.23), we conclude that Algorithm 2 returns an $\epsilon$-stationary solution for (1.1) according to Definition 3.5, under the conditions of Setting 1 in Table 1. $\Box$

Remark 3.13 Note that the potential function $\Psi_G$ defined in (3.12) is closely related to the augmented Lagrangian function. The augmented Lagrangian function has been used as a potential function in analyzing the convergence of nonconvex splitting and ADMM methods in [2, 31-33, 38]; see [32] for a more detailed discussion of this point.

Remark 3.14 In Step 1 of Algorithm 2, we can also replace the function $f$ by its linearization

$$f(x_1^{k+1}, \ldots, x_{i-1}^{k+1}, x_i^k, x_{i+1}^k, \ldots, x_N^k) + (x_i - x_i^k)^\top \nabla_i f(x_1^{k+1}, \ldots, x_{i-1}^{k+1}, x_i^k, x_{i+1}^k, \ldots, x_N^k),$$

so that the subproblem can be solved by computing the proximal mapping of $r_i$. With properly chosen matrices $H_i$ satisfying $H_i \succ L I$ for $i = 1, \ldots, N-1$, the same iteration bound still holds.

3.4 Proximal majorization ADMM (proximal ADMM-m)

Our proximal ADMM-m solves (1.1) under the condition that $A_N$ has full row rank. In this section, we use $\sigma_N$ to denote the smallest eigenvalue of $A_N A_N^\top$. Note that $\sigma_N > 0$ because $A_N$ has full row rank. Our proximal ADMM-m is described in Algorithm 3, in which the function $U$ is defined as

$$U(x_1, \ldots, x_{N-1}, x_N, \lambda, \bar{x}) := f(x_1, \ldots, x_{N-1}, \bar{x}) + (x_N - \bar{x})^\top \nabla_N f(x_1, \ldots, x_{N-1}, \bar{x}) + \frac{L}{2}\|x_N - \bar{x}\|^2 - \Big\langle \lambda,\, \sum_{i=1}^{N} A_i x_i - b \Big\rangle + \frac{\beta}{2}\Big\|\sum_{i=1}^{N} A_i x_i - b\Big\|^2.$$

Moreover, $\beta$ can be chosen as

$$\beta > \max\left\{\frac{18L}{\sigma_N},\; \max_{i=1,\ldots,N-1} \frac{6L^2}{\sigma_N\, \sigma_{\min}(H_i)}\right\} \qquad (3.26)$$

Algorithm 3 Proximal majorization ADMM (proximal ADMM-m) for solving (1.1) with $A_N$ of full row rank
Require: Given $(x_1^0, x_2^0, \ldots, x_N^0) \in X_1 \times \cdots \times X_{N-1} \times \mathbb{R}^{n_N}$, $\lambda^0 \in \mathbb{R}^m$
for $k = 0, 1, \ldots$ do
  [Step 1] $x_i^{k+1} := \arg\min_{x_i \in X_i}\; L_\beta(x_1^{k+1}, \ldots, x_{i-1}^{k+1}, x_i, x_{i+1}^k, \ldots, x_N^k, \lambda^k) + \frac{1}{2}\|x_i - x_i^k\|_{H_i}^2$, for some positive definite matrix $H_i$, $i = 1, \ldots, N-1$;
  [Step 2] $x_N^{k+1} := \arg\min_{x_N}\; U(x_1^{k+1}, \ldots, x_{N-1}^{k+1}, x_N, \lambda^k, x_N^k)$;
  [Step 3] $\lambda^{k+1} := \lambda^k - \beta\left(\sum_{i=1}^{N} A_i x_i^{k+1} - b\right)$.
end for

The choice (3.26) guarantees the convergence rate of the algorithm, as shown in Lemma 3.16 and Theorem 3.18. It is worth noting that proximal ADMM-m and proximal ADMM-g differ only in Step 2: Step 2 of proximal ADMM-g takes a gradient step of the augmented Lagrangian function with respect to $x_N$, while Step 2 of proximal ADMM-m minimizes the quadratic majorization $U$ in $x_N$.
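Since $U$ is a strongly convex quadratic in $x_N$, Step 2 amounts to solving one linear system. A sketch, with our own variable names and with `partial_resid` denoting $\sum_{i<N} A_i x_i^{k+1} - b$:

    import numpy as np

    def admm_m_last_block(x_bar, grad_N, A_N, lam, partial_resid, L, beta):
        # First-order condition of Step 2 of Algorithm 3:
        #   (L*I + beta*A_N'A_N) x = L*x_bar - grad_N + A_N'lam - beta*A_N'partial_resid,
        # where grad_N = grad_N f(x_1^{k+1}, ..., x_{N-1}^{k+1}, x_bar).
        n = x_bar.size
        M = L * np.eye(n) + beta * A_N.T @ A_N
        rhs = L * x_bar - grad_N + A_N.T @ lam - beta * A_N.T @ partial_resid
        return np.linalg.solve(M, rhs)

Because the coefficient matrix $L I + \beta A_N^\top A_N$ is fixed across iterations, its factorization can be computed once and reused.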

We now provide some lemmas that are useful in analyzing the iteration complexity of proximal ADMM-m for solving (1.1).

Lemma 3.15 Suppose the sequence $\{(x_1^k, \ldots, x_N^k, \lambda^k)\}$ is generated by Algorithm 3. The following inequality holds:

$$\|\lambda^{k+1} - \lambda^k\|^2 \le \frac{3L^2}{\sigma_N}\|x_N^k - x_N^{k+1}\|^2 + \frac{6L^2}{\sigma_N}\|x_N^{k-1} - x_N^k\|^2 + \frac{3L^2}{\sigma_N}\sum_{i=1}^{N-1}\|x_i^k - x_i^{k+1}\|^2. \qquad (3.27)$$

Proof. From the optimality condition of Step 2 of Algorithm 3, we have

$$0 = \nabla_N f(x_1^{k+1}, \ldots, x_{N-1}^{k+1}, x_N^k) - A_N^\top \lambda^k + \beta A_N^\top\Big(\sum_{i=1}^{N} A_i x_i^{k+1} - b\Big) + L(x_N^{k+1} - x_N^k) = \nabla_N f(x_1^{k+1}, \ldots, x_{N-1}^{k+1}, x_N^k) - A_N^\top \lambda^{k+1} + L(x_N^{k+1} - x_N^k), \qquad (3.28)$$

where the second equality is due to Step 3 of Algorithm 3. Therefore,

$$\|\lambda^{k+1} - \lambda^k\|^2 \le \sigma_N^{-1}\|A_N^\top \lambda^{k+1} - A_N^\top \lambda^k\|^2 \\ \le \frac{3}{\sigma_N}\Big(\big\|\nabla_N f(x_1^{k+1}, \ldots, x_{N-1}^{k+1}, x_N^k) - \nabla_N f(x_1^k, \ldots, x_{N-1}^k, x_N^{k-1})\big\|^2 + L^2\|x_N^k - x_N^{k+1}\|^2 + L^2\|x_N^{k-1} - x_N^k\|^2\Big),$$

and (3.27) follows by applying (3.1) to the first term. $\Box$

We define the following function that will be used in the analysis of proximal ADMM-m:

$$\Psi_L(x_1, \ldots, x_N, \lambda, \bar{x}) := L_\beta(x_1, \ldots, x_N, \lambda) + \frac{6L^2}{\beta \sigma_N}\|x_N - \bar{x}\|^2. \qquad (3.29)$$

Similarly to the function used for proximal ADMM-g, we can prove the monotonicity and boundedness of $\Psi_L$.

Lemma 3.16 Suppose the sequence $\{(x_1^k, \ldots, x_N^k, \lambda^k)\}$ is generated by Algorithm 3, where $\beta$ is chosen according to (3.26). Then $\Psi_L(x_1^{k+1}, \ldots, x_N^{k+1}, \lambda^{k+1}, x_N^k)$ monotonically decreases over $k \ge 0$.

Proof. By Step 1 of Algorithm 3 one observes that

$$L_\beta(x_1^{k+1}, \ldots, x_{N-1}^{k+1}, x_N^k, \lambda^k) \le L_\beta(x_1^k, \ldots, x_N^k, \lambda^k) - \sum_{i=1}^{N-1}\frac{1}{2}\|x_i^k - x_i^{k+1}\|_{H_i}^2, \qquad (3.30)$$

while by Step 2 of Algorithm 3, together with (3.2) and the $L$-strong convexity of $U$ in $x_N$, we have

$$L_\beta(x_1^{k+1}, \ldots, x_N^{k+1}, \lambda^k) \le L_\beta(x_1^{k+1}, \ldots, x_{N-1}^{k+1}, x_N^k, \lambda^k) - \frac{L}{2}\|x_N^k - x_N^{k+1}\|^2.$$

Moreover, from (3.27),

$$L_\beta(x^{k+1}, \lambda^{k+1}) - L_\beta(x^{k+1}, \lambda^k) = \frac{1}{\beta}\|\lambda^k - \lambda^{k+1}\|^2 \le \frac{3L^2}{\beta\sigma_N}\|x_N^k - x_N^{k+1}\|^2 + \frac{6L^2}{\beta\sigma_N}\|x_N^{k-1} - x_N^k\|^2 + \frac{3L^2}{\beta\sigma_N}\sum_{i=1}^{N-1}\|x_i^k - x_i^{k+1}\|^2.$$

Combining the three relations above yields

$$\Psi_L(x_1^{k+1}, \ldots, x_N^{k+1}, \lambda^{k+1}, x_N^k) - \Psi_L(x_1^k, \ldots, x_N^k, \lambda^k, x_N^{k-1}) \le -\Big(\frac{L}{2} - \frac{9L^2}{\beta\sigma_N}\Big)\|x_N^k - x_N^{k+1}\|^2 - \sum_{i=1}^{N-1}\|x_i^k - x_i^{k+1}\|_{\frac{1}{2}H_i - \frac{3L^2}{\beta\sigma_N}I}^2 < 0, \qquad (3.31)$$

where the last inequality is due to (3.26). This completes the proof. $\Box$

The following lemma shows that $\Psi_L$ is bounded from below.

Lemma 3.17 Suppose the sequence $\{(x_1^k, \ldots, x_N^k, \lambda^k)\}$ is generated by Algorithm 3. Under the same conditions as in Lemma 3.16, the sequence $\{\Psi_L(x_1^{k+1}, \ldots, x_N^{k+1}, \lambda^{k+1}, x_N^k)\}$ is bounded from below.

Proof. From Step 3 of Algorithm 3 we have

$$\Psi_L(x_1^{k+1}, \ldots, x_N^{k+1}, \lambda^{k+1}, x_N^k) \ge L_\beta(x_1^{k+1}, \ldots, x_N^{k+1}, \lambda^{k+1}) \\ = \sum_{i=1}^{N-1} r_i(x_i^{k+1}) + f(x_1^{k+1}, \ldots, x_N^{k+1}) - \frac{1}{\beta}\langle \lambda^{k+1},\, \lambda^k - \lambda^{k+1}\rangle + \frac{1}{2\beta}\|\lambda^k - \lambda^{k+1}\|^2 \\ = \sum_{i=1}^{N-1} r_i(x_i^{k+1}) + f(x_1^{k+1}, \ldots, x_N^{k+1}) - \frac{1}{2\beta}\|\lambda^k\|^2 + \frac{1}{2\beta}\|\lambda^{k+1}\|^2 + \frac{1}{\beta}\|\lambda^k - \lambda^{k+1}\|^2 \\ \ge \sum_{i=1}^{N-1} r_i^* + f^* - \frac{1}{2\beta}\|\lambda^k\|^2 + \frac{1}{2\beta}\|\lambda^{k+1}\|^2,$$

where the second equality follows from (3.3). Summing this inequality over $k = 0, 1, \ldots, K-1$ for any integer $K \ge 1$ yields

$$\frac{1}{K}\sum_{k=0}^{K-1} \Psi_L(x_1^{k+1}, \ldots, x_N^{k+1}, \lambda^{k+1}, x_N^k) \ge \sum_{i=1}^{N-1} r_i^* + f^* - \frac{1}{2\beta K}\|\lambda^0\|^2. \qquad (3.32)$$

Lemma 3.16 stipulates that $\{\Psi_L(x_1^{k+1}, \ldots, x_N^{k+1}, \lambda^{k+1}, x_N^k)\}$ is a monotonically decreasing sequence; the above inequality thus further implies that the entire sequence is bounded from below. $\Box$

We are now ready to give the iteration complexity of proximal ADMM-m; the proof is similar to that of Theorem 3.12.

Theorem 3.18 Suppose the sequence $\{(x_1^k, \ldots, x_N^k, \lambda^k)\}$ is generated by proximal ADMM-m (Algorithm 3), where $\beta$ satisfies (3.26). Denote

$$\kappa_1 := \frac{6L^2}{\beta^2 \sigma_N}, \quad \kappa_2 := 4L^2, \quad \kappa_3 := \max_{1 \le i \le N-1} \mathrm{diam}(X_i)^2, \quad \kappa_4 := \Big(L + \sqrt{N}\,\beta \max_{1 \le i \le N}\|A_i\|_2^2 + \max_{1 \le i \le N-1}\|H_i\|_2\Big)^2,$$

and

$$\tau := \min\Big\{\frac{L}{2} - \frac{9L^2}{\beta\sigma_N},\; \min_{i=1,\ldots,N-1}\Big\{\frac{1}{2}\sigma_{\min}(H_i) - \frac{3L^2}{\beta\sigma_N}\Big\}\Big\} > 0. \qquad (3.33)$$

Then the number of iterations that the algorithm should run can be determined as

$$K := \begin{cases} \Big\lceil \dfrac{2\max\{\kappa_1, \kappa_2, \kappa_4\kappa_3\}}{\tau\,\epsilon^2}\Big(\Psi_L(x_1^1, \ldots, x_N^1, \lambda^1, x_N^0) - \sum_{i=1}^{N-1} r_i^* - f^*\Big)\Big\rceil & \text{for Setting 1}, \\ \Big\lceil \dfrac{2\max\{\kappa_1, \kappa_2, \kappa_4\}}{\tau\,\epsilon^2}\Big(\Psi_L(x_1^1, \ldots, x_N^1, \lambda^1, x_N^0) - \sum_{i=1}^{N-1} r_i^* - f^*\Big)\Big\rceil & \text{for Setting 2}, \end{cases} \qquad (3.34)$$

and we can further identify one iteration $\hat{k} \in \arg\min_{2 \le k \le K+1} \sum_{i=1}^{N}\big(\|x_i^k - x_i^{k+1}\|^2 + \|x_N^{k-1} - x_N^k\|^2\big)$ such that $(x_1^{\hat{k}}, \ldots, x_N^{\hat{k}})$ is an $\epsilon$-stationary solution for (1.1) with Lagrange multiplier $\lambda^{\hat{k}}$ and $A_N$ of full row rank, for Settings 1 and 2, respectively.

Proof. Summing (3.31) over $k = 1, \ldots, K$ yields

$$\Psi_L(x_1^{K+1}, \ldots, x_N^{K+1}, \lambda^{K+1}, x_N^K) \le \Psi_L(x_1^1, \ldots, x_N^1, \lambda^1, x_N^0) - \tau \sum_{k=1}^{K}\sum_{i=1}^{N}\|x_i^k - x_i^{k+1}\|^2, \qquad (3.35)$$

where $\tau$ is defined in (3.33). From Lemma 3.17 we know that there exists a constant $\Psi_L^*$ such that $\Psi_L(x_1^{k+1}, \ldots, x_N^{k+1}, \lambda^{k+1}, x_N^k) \ge \Psi_L^*$ for any $k \ge 1$. Therefore,

$$\min_{2 \le k \le K+1} \theta_k \le \frac{2}{\tau K}\Big[\Psi_L(x_1^1, \ldots, x_N^1, \lambda^1, x_N^0) - \Psi_L^*\Big], \qquad (3.36)$$

where $\theta_k$ is defined in (3.20); i.e., for $K$ defined as in (3.34), $\theta_{\hat{k}} = O(\epsilon^2)$. We now bound the terms in (3.5) and (3.6) through $\theta_k$. Note that (3.28) implies that

$$\|A_N^\top \lambda^{k+1} - \nabla_N f(x_1^{k+1}, \ldots, x_N^{k+1})\| \le L\|x_N^k - x_N^{k+1}\| + \big\|\nabla_N f(x_1^{k+1}, \ldots, x_{N-1}^{k+1}, x_N^k) - \nabla_N f(x_1^{k+1}, \ldots, x_N^{k+1})\big\| \le 2L\|x_N^k - x_N^{k+1}\|,$$

which implies that

$$\|A_N^\top \lambda^{k+1} - \nabla_N f(x_1^{k+1}, \ldots, x_N^{k+1})\|^2 \le 4L^2\,\theta_k = \kappa_2\,\theta_k. \qquad (3.37)$$

By Step 3 of Algorithm 3 and (3.27) we have

$$\Big\|\sum_{i=1}^{N} A_i x_i^{k+1} - b\Big\|^2 = \frac{1}{\beta^2}\|\lambda^{k+1} - \lambda^k\|^2 \le \frac{6L^2}{\beta^2\sigma_N}\,\theta_k = \kappa_1\,\theta_k. \qquad (3.38)$$

It remains to bound the terms in (3.4) and (3.7). Since the steps are almost identical to those in the proof of Theorem 3.12, we only provide the key inequalities below.

Setting 2. Under the conditions of Setting 2 in Table 1, the inequality (3.24) becomes

$$\mathrm{dist}\Big(-\nabla_i f(x_1^{k+1}, \ldots, x_N^{k+1}) + A_i^\top \lambda^{k+1},\; \partial r_i(x_i^{k+1})\Big) \le \Big(L + \sqrt{N}\,\beta\max_{1 \le j \le N}\|A_j\|_2^2 + \max_{1 \le i \le N-1}\|H_i\|_2\Big)\sqrt{\theta_k} = \sqrt{\kappa_4\,\theta_k}. \qquad (3.39)$$

By combining (3.39), (3.37) and (3.38), we conclude that Algorithm 3 returns an $\epsilon$-stationary solution for (1.1) according to Definition 3.6 under the conditions of Setting 2 in Table 1.

Setting 1. Under the conditions of Setting 1 in Table 1, the inequality (3.25) becomes

$$(x_i - x_i^{k+1})^\top\Big[g_i + \nabla_i f(x_1^{k+1}, \ldots, x_N^{k+1}) - A_i^\top\lambda^{k+1}\Big] \ge -\sqrt{\kappa_4\kappa_3\,\theta_k}. \qquad (3.40)$$

By combining (3.40), (3.37) and (3.38), we conclude that Algorithm 3 returns an $\epsilon$-stationary solution for (1.1) according to Definition 3.5 under the conditions of Setting 1 in Table 1. $\Box$

Remark 3.19 In Step 1 of Algorithm 3, we can replace the function $f$ by its linearization $f(x_1^{k+1}, \ldots, x_{i-1}^{k+1}, x_i^k, x_{i+1}^k, \ldots, x_N^k) + (x_i - x_i^k)^\top \nabla_i f(x_1^{k+1}, \ldots, x_{i-1}^{k+1}, x_i^k, x_{i+1}^k, \ldots, x_N^k)$. Under the same conditions as in Remark 3.14, the same iteration bound follows by slightly modifying the analysis above.

4 Extensions

4.1 Relaxing the assumptions on the last block variable $x_N$

Note that in (1.1) we imposed some restrictions on the last block variable $x_N$, namely $r_N \equiv 0$ and $A_N = I$ or $A_N$ of full row rank. In this subsection, we show how to remove these restrictions and consider the more general problem

$$\min\; f(x_1, x_2, \ldots, x_N) + \sum_{i=1}^{N} r_i(x_i) \quad \mathrm{s.t.}\; \sum_{i=1}^{N} A_i x_i = b,\; x_i \in X_i,\; i = 1, \ldots, N, \qquad (4.1)$$

where $x_i \in \mathbb{R}^{n_i}$ and $A_i \in \mathbb{R}^{m \times n_i}$, $i = 1, \ldots, N$. Before proceeding, we make the following assumption on (4.1).

Assumption 4.1 Denote $n = n_1 + \cdots + n_N$. For any compact set $S \subseteq \mathbb{R}^n$ and any sequence $\{\lambda^j\} \subset \mathbb{R}^m$ with $\|\lambda^j\| \to \infty$ as $j \to \infty$, the divergence

$$\mathrm{dist}\left(-\nabla f(x_1, \ldots, x_N) + A^\top \lambda^j,\; \partial r(x)\right) \to \infty, \quad j \to \infty,$$

holds uniformly for all $(x_1, \ldots, x_N) \in S$, where $A = [A_1, \ldots, A_N]$ and $\partial r(x) = \partial r_1(x_1) \times \cdots \times \partial r_N(x_N)$.

Remark that the above assumption implies that $A$ has full row rank. Furthermore, if $f$ is continuously differentiable, $\partial r$ maps a compact set to a compact set, and $A$ has full row rank, then Assumption 4.1 trivially holds. On the other hand, for popular nonconvex regularization functions, such as SCAD, MCP and the capped-$\ell_1$ norm, it can be shown that the corresponding $\partial r_i$ indeed maps any compact set to a compact set, and so Assumption 4.1 holds in all these cases.

We introduce the following problem, which is closely related to (4.1):

$$\min\; f(x_1, x_2, \ldots, x_N) + \sum_{i=1}^{N} r_i(x_i) + \frac{\mu(\epsilon)}{2}\|y\|^2 \quad \mathrm{s.t.}\; \sum_{i=1}^{N} A_i x_i + y = b, \qquad (4.2)$$

where $\epsilon > 0$ is the target tolerance, and $\mu(\epsilon)$ is a function of $\epsilon$ which will be specified later. Now proximal ADMM-m is ready to be used for solving (4.2), because the coefficient matrix of the last block variable $y$ is the identity and $y$ is unconstrained. We have the following iteration complexity result for proximal ADMM-m to obtain an $\epsilon$-stationary solution of (4.1); proximal ADMM-g can be analyzed similarly.

Theorem 4.2 Consider problem (4.1) under Setting 2 in Table 1. Suppose that Assumption 4.1 holds, and that the objective of (4.1), i.e., $f + \sum_{i=1}^{N} r_i$, has bounded level sets. Furthermore, suppose that $\nabla f$ is Lipschitz continuous with constant $L$, and that $A$ has full row rank. Let the sequence $\{(x_1^k, \ldots, x_N^k, y^k, \lambda^k)\}$ be generated by proximal ADMM-m applied to (4.2), with initial iterates $y^0 = \lambda^0 = 0$ and $(x_1^0, \ldots, x_N^0)$ such that $\sum_{i=1}^{N} A_i x_i^0 = b$. Assume that the target tolerance $\epsilon$ satisfies

$$0 < \epsilon < \min\left\{\frac{1}{L},\, \frac{1}{\tau}\right\}, \quad \text{where } \tau = \frac{1}{6}\min_{i=1,\ldots,N}\{\sigma_{\min}(H_i)\}. \qquad (4.3)$$

Then in no more than $O(1/\epsilon^4)$ iterations we will reach an iterate $(x_1^{\hat{K}+1}, \ldots, x_N^{\hat{K}+1}, y^{\hat{K}+1})$ that is an $\epsilon$-stationary solution for (4.2) with Lagrange multiplier $\lambda^{\hat{K}+1}$. Moreover, $(x_1^{\hat{K}+1}, \ldots, x_N^{\hat{K}+1})$ is an $\epsilon$-stationary solution for (4.1) with Lagrange multiplier $\lambda^{\hat{K}+1}$.

Proof. Denote the penalty parameter by $\beta(\epsilon)$. The augmented Lagrangian function of (4.2) is given by

$$L_{\beta(\epsilon)}(x_1, \ldots, x_N, y, \lambda) := f(x_1, \ldots, x_N) + \sum_{i=1}^{N} r_i(x_i) + \frac{\mu(\epsilon)}{2}\|y\|^2 - \Big\langle \lambda,\, \sum_{i=1}^{N} A_i x_i + y - b \Big\rangle + \frac{\beta(\epsilon)}{2}\Big\|\sum_{i=1}^{N} A_i x_i + y - b\Big\|^2.$$

Now we set

$$\mu(\epsilon) = 1/\epsilon \quad \text{and} \quad \beta(\epsilon) = 3/\epsilon. \qquad (4.4)$$

From (4.3) we have $\mu(\epsilon) > L$. This implies that the Lipschitz constant of the smooth part of the objective of (4.2) equals $\mu(\epsilon)$. Then, from the optimality condition of Step 2 of Algorithm 3, we have $\mu(\epsilon) y^k = \lambda^k$ for all $k \ge 1$. As in Lemma 3.16, we can prove that $L_{\beta(\epsilon)}(x_1^k, \ldots, x_N^k, y^k, \lambda^k)$ monotonically decreases. Specifically, since $\mu(\epsilon) y^k = \lambda^k$, combining (3.30), the Step 2 descent, and the multiplier-update identity yields

$$L_{\beta(\epsilon)}(x_1^{k+1}, \ldots, x_N^{k+1}, y^{k+1}, \lambda^{k+1}) - L_{\beta(\epsilon)}(x_1^k, \ldots, x_N^k, y^k, \lambda^k) \le -\sum_{i=1}^{N} \frac{1}{2}\|x_i^k - x_i^{k+1}\|_{H_i}^2 - \Big(\frac{\mu(\epsilon)}{2} - \frac{\mu(\epsilon)^2}{\beta(\epsilon)}\Big)\|y^k - y^{k+1}\|^2 < 0, \qquad (4.5)$$

where the last inequality is due to (4.4).

Similarly to Lemma 3.17, we can prove that $L_{\beta(\epsilon)}(x_1^k, \ldots, x_N^k, y^k, \lambda^k)$ is bounded from below; that is, with the constant $L^* := f^* + \sum_{i=1}^{N} r_i^*$, we have $L_{\beta(\epsilon)}(x_1^k, \ldots, x_N^k, y^k, \lambda^k) \ge L^*$ for all $k$. In fact, using $\mu(\epsilon) y^k = \lambda^k$ and completing the square,

$$L_{\beta(\epsilon)}(x_1^k, \ldots, x_N^k, y^k, \lambda^k) = f(x_1^k, \ldots, x_N^k) + \sum_{i=1}^{N} r_i(x_i^k) + \frac{\mu(\epsilon)}{2}\Big\|\sum_{i=1}^{N} A_i x_i^k - b\Big\|^2 + \frac{\beta(\epsilon) - \mu(\epsilon)}{2}\Big\|\sum_{i=1}^{N} A_i x_i^k + y^k - b\Big\|^2 \ge L^*, \qquad (4.6)$$

where the inequality uses $\beta(\epsilon) > \mu(\epsilon)$ from (4.4). Moreover, denote $L^0 := L_{\beta(\epsilon)}(x_1^0, \ldots, x_N^0, y^0, \lambda^0)$; by the choice of the initial point, $L^0 = f(x_1^0, \ldots, x_N^0) + \sum_{i=1}^N r_i(x_i^0)$, a constant independent of $\epsilon$. Furthermore, for any integer $K \ge 1$, summing (4.5) over $k = 0, \ldots, K$ yields

$$L_{\beta(\epsilon)}(x_1^{K+1}, \ldots, x_N^{K+1}, y^{K+1}, \lambda^{K+1}) \le L^0 - \tau \sum_{k=0}^{K} \theta_k, \qquad (4.7)$$

where $\theta_k := \sum_{i=1}^{N}\|x_i^k - x_i^{k+1}\|^2 + \|y^k - y^{k+1}\|^2$. Note that (4.7) and (4.6) imply that

$$\min_{0 \le k \le K} \theta_k \le \frac{1}{\tau K}\left(L^0 - L^*\right). \qquad (4.8)$$

Similarly to (3.24), it can be shown that for $i = 1, \ldots, N$,

$$\mathrm{dist}\Big(-\nabla_i f(x_1^{k+1}, \ldots, x_N^{k+1}) + A_i^\top \lambda^{k+1},\; \partial r_i(x_i^{k+1})\Big) \le \Big(L + \sqrt{N}\,\beta(\epsilon)\max_{1 \le j \le N}\|A_j\|_2^2 + \max_{1 \le i \le N}\|H_i\|_2\Big)\sqrt{\theta_k}. \qquad (4.9)$$

Set $K = \lceil 1/\epsilon^4 \rceil$ and denote $\hat{K} := \arg\min_{0 \le k \le K} \theta_k$; then $\theta_{\hat{K}} = O(\epsilon^4)$, so the right-hand side of (4.9) is $O(\epsilon)$. As a result,

$$\Big\|\sum_{i=1}^{N} A_i x_i^{\hat{K}+1} + y^{\hat{K}+1} - b\Big\| = \frac{1}{\beta(\epsilon)}\|\lambda^{\hat{K}+1} - \lambda^{\hat{K}}\| = \frac{\mu(\epsilon)}{\beta(\epsilon)}\|y^{\hat{K}+1} - y^{\hat{K}}\| \le \frac{1}{3}\sqrt{\theta_{\hat{K}}} = O(\epsilon^2). \qquad (4.10)$$

Note that (4.6) also implies that $f(x_1^k, \ldots, x_N^k) + \sum_{i=1}^{N} r_i(x_i^k)$ is bounded above by a constant. Thus, from the assumption that the level sets of the objective are bounded, we know $(x_1^k, \ldots, x_N^k)$ is bounded. Then Assumption 4.1 implies that $\{\lambda^k\}$ is bounded, which yields $\|y^k\| = \|\lambda^k\|/\mu(\epsilon) = O(\epsilon)$. Therefore, from (4.10) we have

$$\Big\|\sum_{i=1}^{N} A_i x_i^{\hat{K}+1} - b\Big\| \le \Big\|\sum_{i=1}^{N} A_i x_i^{\hat{K}+1} + y^{\hat{K}+1} - b\Big\| + \|y^{\hat{K}+1}\| = O(\epsilon),$$

which, combined with (4.9), yields that $(x_1^{\hat{K}+1}, \ldots, x_N^{\hat{K}+1})$ is an $\epsilon$-stationary solution for (4.1) with Lagrange multiplier $\lambda^{\hat{K}+1}$, according to Definition 3.6. $\Box$

Remark 4.3 Without Assumption 4.1, we can still provide an iteration complexity result for proximal ADMM-m, but the complexity bound is worse than $O(1/\epsilon^4)$. To see this, note that because $L_{\beta(\epsilon)}(x_1^k, \ldots, x_N^k, y^k, \lambda^k)$ monotonically decreases, the first part of (4.6) implies that

$$\frac{\mu(\epsilon)}{2}\Big\|\sum_{i=1}^{N} A_i x_i^k - b\Big\|^2 \le L^0 - L^*, \quad \forall\, k. \qquad (4.11)$$

Therefore, by setting $K = \lceil 1/\epsilon^6 \rceil$, $\mu(\epsilon) = 1/\epsilon^2$ and $\beta(\epsilon) = 3/\epsilon^2$ instead of (4.4), and combining (4.9) and (4.11), we conclude that $(x_1^{\hat{K}+1}, \ldots, x_N^{\hat{K}+1})$ is an $\epsilon$-stationary solution for (4.1) with Lagrange multiplier $\lambda^{\hat{K}+1}$, according to Definition 3.6.

4.2 Proximal BCD (block coordinate descent)

In this section, we apply a proximal block coordinate descent method to the following variant of (1.1), and present its iteration complexity:

$$\min\; F(x_1, x_2, \ldots, x_N) := f(x_1, x_2, \ldots, x_N) + \sum_{i=1}^{N} r_i(x_i) \quad \mathrm{s.t.}\; x_i \in X_i,\; i = 1, \ldots, N, \qquad (4.12)$$

where $f$ is differentiable, the $r_i$ are nonsmooth, and $X_i \subseteq \mathbb{R}^{n_i}$ is a closed convex set, $i = 1, 2, \ldots, N$. Note that $f$ and the $r_i$ can be nonconvex. Our proximal BCD method for solving (4.12) is described in Algorithm 4.

Algorithm 4 A proximal BCD method for solving (4.12)
Require: Given $(x_1^0, x_2^0, \ldots, x_N^0) \in X_1 \times \cdots \times X_N$
for $k = 0, 1, \ldots$ do
  Update the blocks $x_i$ in cyclic order, i.e., for $i = 1, \ldots, N$ ($H_i$ positive definite):
  $$x_i^{k+1} := \arg\min_{x_i \in X_i}\; F(x_1^{k+1}, \ldots, x_{i-1}^{k+1}, x_i, x_{i+1}^k, \ldots, x_N^k) + \frac{1}{2}\|x_i - x_i^k\|_{H_i}^2 \qquad (4.13)$$
end for

Similarly to the settings in Table 1, and depending on the properties of $r_i$ and $X_i$, the $\epsilon$-stationary solution for (4.12) is defined as follows.

Definition 4.4 $(x_1^*, \ldots, x_N^*)$ is called an $\epsilon$-stationary solution for (4.12) if: when $r_i$ is Lipschitz continuous and $X_i$ is convex and compact, for any $x_i \in X_i$, $i = 1, \ldots, N$, it holds that

$$(x_i - x_i^*)^\top\left[\nabla_i f(x_1^*, \ldots, x_N^*) + g_i^*\right] \ge -\epsilon,$$

where $g_i^* \in \partial r_i(x_i^*)$ denotes a general subgradient of $r_i$; or, when $r_i$ is lower semi-continuous and $X_i = \mathbb{R}^{n_i}$, $i = 1, \ldots, N$, it holds that

$$\mathrm{dist}\left(-\nabla_i f(x_1^*, \ldots, x_N^*),\; \partial r_i(x_i^*)\right) \le \epsilon.$$
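A minimal sketch of Algorithm 4, assuming a user-supplied oracle `block_min(i, x)` that returns the global minimizer of the subproblem (4.13) with all blocks $j \ne i$ held fixed (per Assumption 1.2); the names are ours:

    def proximal_bcd(x, block_min, iters=100):
        # x: list of block variables, updated in cyclic order.
        for _ in range(iters):
            for i in range(len(x)):
                x[i] = block_min(i, x)
        return x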

We now show that the iteration complexity of Algorithm 4 can be obtained from that of proximal ADMM-g. By introducing an auxiliary variable $x_{N+1}$ and an arbitrary vector $b \in \mathbb{R}^m$, problem (4.12) can be equivalently rewritten as

$$\min\; f(x_1, x_2, \ldots, x_N) + \sum_{i=1}^{N} r_i(x_i) \quad \mathrm{s.t.}\; x_{N+1} = b,\; x_i \in X_i,\; i = 1, \ldots, N. \qquad (4.14)$$

It is easy to see that applying proximal ADMM-g to solve (4.14), with $x_{N+1}$ being the last block variable, reduces exactly to Algorithm 4. Hence, we have the following iteration complexity result of Algorithm 4 for obtaining an $\epsilon$-stationary solution of (4.12).

Theorem 4.5 Suppose the sequence $\{(x_1^k, \ldots, x_N^k)\}$ is generated by proximal BCD (Algorithm 4). Denote

$$\kappa_5 := \Big(L + \max_{1 \le i \le N}\|H_i\|_2\Big)^2, \quad \kappa_6 := \max_{1 \le i \le N}\mathrm{diam}(X_i)^2.$$

Letting

$$K := \begin{cases} \Big\lceil \dfrac{2\kappa_5\kappa_6}{\tau\epsilon^2}\Big(\Psi_G(x_1^1, \ldots, x_N^1, \lambda^1, x_{N+1}^0) - \sum_{i=1}^{N} r_i^* - f^*\Big)\Big\rceil & \text{for Setting 1}, \\ \Big\lceil \dfrac{2\kappa_5}{\tau\epsilon^2}\Big(\Psi_G(x_1^1, \ldots, x_N^1, \lambda^1, x_{N+1}^0) - \sum_{i=1}^{N} r_i^* - f^*\Big)\Big\rceil & \text{for Setting 2}, \end{cases}$$

with $\tau$ defined in (3.18), and $\hat{K} := \arg\min_{1 \le k \le K} \sum_{i=1}^{N}\|x_i^k - x_i^{k+1}\|^2$, we have that $(x_1^{\hat{K}}, \ldots, x_N^{\hat{K}})$ is an $\epsilon$-stationary solution for problem (4.12).

Proof. Note that $A_1 = \cdots = A_N = 0$ and $A_{N+1} = I$ in problem (4.14). By applying proximal ADMM-g with $\beta > \max\{18L,\, \max_{i=1,\ldots,N}\{6L^2/\sigma_{\min}(H_i)\}\}$, Theorem 3.12 holds. In particular, (3.24) and (3.25) remain valid in the respective settings, and since $A_j = 0$ for $j \le N$, the terms involving $\beta \max_j \|A_j\|_2^2$ vanish, which leads to the choices of $\kappa_5$ and $\kappa_6$ above. Moreover, we do not need to consider the optimality with respect to $x_{N+1}$ or the violation of the affine constraint; thus $\kappa_1$ and $\kappa_2$ in Theorem 3.12 are excluded from the expression of $K$, and the conclusion follows. $\Box$

5 Numerical Experiments

5.1 Robust tensor PCA problem

We consider the following nonconvex and nonsmooth model of robust tensor PCA with $\ell_1$-norm regularization. For a third-order tensor of dimension $I_1 \times I_2 \times I_3$ and a given estimate $R$ of its CP-rank, we aim to solve

$$\min_{A, B, C, \mathcal{Z}, \mathcal{E}, \mathcal{B}}\; \|\mathcal{Z} - [\![A, B, C]\!]\|_F^2 + \alpha\|\mathcal{E}\|_1 + \|\mathcal{B}\|_F^2 \quad \mathrm{s.t.}\; \mathcal{Z} + \mathcal{E} + \mathcal{B} = \mathcal{T}, \qquad (5.1)$$

where $A \in \mathbb{R}^{I_1 \times R}$, $B \in \mathbb{R}^{I_2 \times R}$, $C \in \mathbb{R}^{I_3 \times R}$. The augmented Lagrangian function of (5.1) is given by

$$L_\beta(A, B, C, \mathcal{Z}, \mathcal{E}, \mathcal{B}, \Lambda) = \|\mathcal{Z} - [\![A, B, C]\!]\|_F^2 + \alpha\|\mathcal{E}\|_1 + \|\mathcal{B}\|_F^2 - \langle \Lambda,\, \mathcal{Z} + \mathcal{E} + \mathcal{B} - \mathcal{T}\rangle + \frac{\beta}{2}\|\mathcal{Z} + \mathcal{E} + \mathcal{B} - \mathcal{T}\|_F^2.$$

The following identities are useful for our presentation:

$$\|\mathcal{Z} - [\![A, B, C]\!]\|_F^2 = \|Z_{(1)} - A(C \odot B)^\top\|_F^2 = \|Z_{(2)} - B(C \odot A)^\top\|_F^2 = \|Z_{(3)} - C(B \odot A)^\top\|_F^2,$$

where $Z_{(i)}$ stands for the mode-$i$ unfolding of the tensor $\mathcal{Z}$ and $\odot$ stands for the Khatri-Rao product of matrices.

Note that there are six block variables in (5.1), and we choose $\mathcal{B}$ as the last block variable. A typical iteration of proximal ADMM-g for solving (5.1) can be described as follows (we choose $H_i = \delta_i I$ with $\delta_i > 0$, $i = 1, \ldots, 5$):

$$A^{k+1} = \big(2 Z_{(1)}^k (C^k \odot B^k) + \delta_1 A^k\big)\big(2 (C^{k\top} C^k) \ast (B^{k\top} B^k) + \delta_1 I_{R \times R}\big)^{-1},$$

$$B^{k+1} = \big(2 Z_{(2)}^k (C^k \odot A^{k+1}) + \delta_2 B^k\big)\big(2 (C^{k\top} C^k) \ast (A^{k+1\top} A^{k+1}) + \delta_2 I_{R \times R}\big)^{-1},$$

$$C^{k+1} = \big(2 Z_{(3)}^k (B^{k+1} \odot A^{k+1}) + \delta_3 C^k\big)\big(2 (B^{k+1\top} B^{k+1}) \ast (A^{k+1\top} A^{k+1}) + \delta_3 I_{R \times R}\big)^{-1},$$

$$\mathcal{E}^{k+1} = \mathcal{S}_{\frac{\alpha}{\beta + \delta_4}}\Big(\frac{\beta}{\beta + \delta_4}\Big(\mathcal{T} - \mathcal{Z}^k - \mathcal{B}^k + \frac{1}{\beta}\Lambda^k\Big) + \frac{\delta_4}{\beta + \delta_4}\mathcal{E}^k\Big),$$

$$\mathcal{Z}^{k+1} = \frac{1}{2 + \delta_5 + \beta}\Big(2\,[\![A^{k+1}, B^{k+1}, C^{k+1}]\!] + \delta_5 \mathcal{Z}^k + \Lambda^k - \beta(\mathcal{E}^{k+1} + \mathcal{B}^k - \mathcal{T})\Big),$$

$$\mathcal{B}^{k+1} = \mathcal{B}^k - \gamma\Big(2\mathcal{B}^k - \Lambda^k + \beta(\mathcal{E}^{k+1} + \mathcal{Z}^{k+1} + \mathcal{B}^k - \mathcal{T})\Big),$$

$$\Lambda^{k+1} = \Lambda^k - \beta\big(\mathcal{Z}^{k+1} + \mathcal{E}^{k+1} + \mathcal{B}^{k+1} - \mathcal{T}\big),$$

where $\ast$ is the matrix Hadamard product and $\mathcal{S}$ stands for the soft-shrinkage operator. The updates in proximal ADMM-m are almost the same as in proximal ADMM-g, except that $\mathcal{B}$ is updated as

$$\mathcal{B}^{k+1} = \frac{1}{L + \beta}\Big((L - 2)\mathcal{B}^k + \Lambda^k - \beta(\mathcal{E}^{k+1} + \mathcal{Z}^{k+1} - \mathcal{T})\Big).$$

On the other hand, note that (5.1) can be equivalently written as

$$\min_{A, B, C, \mathcal{Z}, \mathcal{E}}\; \|\mathcal{Z} - [\![A, B, C]\!]\|_F^2 + \alpha\|\mathcal{E}\|_1 + \|\mathcal{Z} + \mathcal{E} - \mathcal{T}\|_F^2, \qquad (5.2)$$

which can be solved by the classical BCD method as well as our proximal BCD (Algorithm 4). In the following we compare the numerical performance of BCD, proximal BCD, proximal ADMM-g and proximal ADMM-m for solving (5.1). We let $\alpha = 2/\sqrt{\max\{I_1, I_2, I_3\}}$ in model (5.1). We apply proximal ADMM-g and proximal ADMM-m to solve (5.1), and apply BCD and proximal BCD to solve (5.2). In all four algorithms we set the maximum iteration number to 2000, and the algorithms are terminated either when the maximum iteration number is reached or when $\theta_k$, as defined in (3.20), falls below a given tolerance. The parameters used in the two ADMM variants are specified in Table 2.

Table 2: Choices of parameters in the two ADMM variants.

                  | $H_i$, $i = 1, \ldots, 5$ | $\gamma$
  proximal ADMM-g | $2\beta I$                | $\frac{1}{2\beta}$
  proximal ADMM-m | $2.5\beta I$              | --

In the experiments, we randomly generate 20 instances for each fixed tensor dimension and CP-rank. Suppose the low-rank part $\mathcal{Z}^0$ is of rank $R_{CP}$; it is generated as $\mathcal{Z}^0 = \sum_{r=1}^{R_{CP}} a_{1,r} \otimes a_{2,r} \otimes a_{3,r}$, where the vectors $a_{i,r}$ are drawn from the standard Gaussian distribution for $i = 1, 2, 3$, $r = 1, \ldots, R_{CP}$. Moreover, a sparse tensor $\mathcal{E}^0$ is generated with a prescribed fraction of $I_1 I_2 I_3$ nonzeros, each nonzero component following the standard Gaussian distribution. Finally, we generate the noise $\mathcal{B}^0$ as a small multiple of a standard Gaussian tensor $\hat{\mathcal{B}}$. We then set $\mathcal{T} = \mathcal{Z}^0 + \mathcal{E}^0 + \mathcal{B}^0$ as the observed data in (5.1). We report the average performance over the 20 instances of the four algorithms with initial rank guess $R = R_{CP}$ and $R = R_{CP} + 1$ in Tables 3 and 4, respectively.
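As an illustration of the factor updates above, the following sketch implements the $A^{k+1}$ step, using the identity $(C \odot B)^\top (C \odot B) = (C^\top C) \ast (B^\top B)$; it assumes SciPy's column-wise Khatri-Rao product and an unfolding convention with $Z_{(1)} = A(C \odot B)^\top$, and the variable names are ours:

    import numpy as np
    from scipy.linalg import khatri_rao  # column-wise Khatri-Rao product

    def update_A(Z1, A, B, C, delta1):
        # Solve A_new * (2*(C'C)*(B'B) + delta1*I) = 2*Z1*(C (.) B) + delta1*A,
        # where * on the left of I is the Hadamard product of the R-by-R Grams.
        R = A.shape[1]
        G = 2.0 * (C.T @ C) * (B.T @ B) + delta1 * np.eye(R)
        M = 2.0 * Z1 @ khatri_rao(C, B) + delta1 * A
        return np.linalg.solve(G, M.T).T  # G is symmetric, so solving G x = M' works

The $B$ and $C$ updates are identical in structure, cycling the roles of the three factors.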


More information

Inexact Newton Methods for Inverse Eigenvalue Problems

Inexact Newton Methods for Inverse Eigenvalue Problems Inexact Newton Methods for Inverse Egenvalue Problems Zheng-jan Ba Abstract In ths paper, we survey some of the latest development n usng nexact Newton-lke methods for solvng nverse egenvalue problems.

More information

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016 U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and

More information

General viscosity iterative method for a sequence of quasi-nonexpansive mappings

General viscosity iterative method for a sequence of quasi-nonexpansive mappings Avalable onlne at www.tjnsa.com J. Nonlnear Sc. Appl. 9 (2016), 5672 5682 Research Artcle General vscosty teratve method for a sequence of quas-nonexpansve mappngs Cuje Zhang, Ynan Wang College of Scence,

More information

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso Supplement: Proofs and Techncal Detals for The Soluton Path of the Generalzed Lasso Ryan J. Tbshran Jonathan Taylor In ths document we gve supplementary detals to the paper The Soluton Path of the Generalzed

More information

The Minimum Universal Cost Flow in an Infeasible Flow Network

The Minimum Universal Cost Flow in an Infeasible Flow Network Journal of Scences, Islamc Republc of Iran 17(2): 175-180 (2006) Unversty of Tehran, ISSN 1016-1104 http://jscencesutacr The Mnmum Unversal Cost Flow n an Infeasble Flow Network H Saleh Fathabad * M Bagheran

More information

COS 521: Advanced Algorithms Game Theory and Linear Programming

COS 521: Advanced Algorithms Game Theory and Linear Programming COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Lecture 20: November 7

Lecture 20: November 7 0-725/36-725: Convex Optmzaton Fall 205 Lecturer: Ryan Tbshran Lecture 20: November 7 Scrbes: Varsha Chnnaobreddy, Joon Sk Km, Lngyao Zhang Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer:

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

Randomized block proximal damped Newton method for composite self-concordant minimization

Randomized block proximal damped Newton method for composite self-concordant minimization Randomzed block proxmal damped Newton method for composte self-concordant mnmzaton Zhaosong Lu June 30, 2016 Revsed: March 28, 2017 Abstract In ths paper we consder the composte self-concordant CSC mnmzaton

More information

Matrix Approximation via Sampling, Subspace Embedding. 1 Solving Linear Systems Using SVD

Matrix Approximation via Sampling, Subspace Embedding. 1 Solving Linear Systems Using SVD Matrx Approxmaton va Samplng, Subspace Embeddng Lecturer: Anup Rao Scrbe: Rashth Sharma, Peng Zhang 0/01/016 1 Solvng Lnear Systems Usng SVD Two applcatons of SVD have been covered so far. Today we loo

More information

ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EQUATION

ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EQUATION Advanced Mathematcal Models & Applcatons Vol.3, No.3, 2018, pp.215-222 ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EUATION

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

SELECTED SOLUTIONS, SECTION (Weak duality) Prove that the primal and dual values p and d defined by equations (4.3.2) and (4.3.3) satisfy p d.

SELECTED SOLUTIONS, SECTION (Weak duality) Prove that the primal and dual values p and d defined by equations (4.3.2) and (4.3.3) satisfy p d. SELECTED SOLUTIONS, SECTION 4.3 1. Weak dualty Prove that the prmal and dual values p and d defned by equatons 4.3. and 4.3.3 satsfy p d. We consder an optmzaton problem of the form The Lagrangan for ths

More information

Notes on Frequency Estimation in Data Streams

Notes on Frequency Estimation in Data Streams Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to

More information

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017 U.C. Berkeley CS94: Beyond Worst-Case Analyss Handout 4s Luca Trevsan September 5, 07 Summary of Lecture 4 In whch we ntroduce semdefnte programmng and apply t to Max Cut. Semdefnte Programmng Recall that

More information

Lecture 20: Lift and Project, SDP Duality. Today we will study the Lift and Project method. Then we will prove the SDP duality theorem.

Lecture 20: Lift and Project, SDP Duality. Today we will study the Lift and Project method. Then we will prove the SDP duality theorem. prnceton u. sp 02 cos 598B: algorthms and complexty Lecture 20: Lft and Project, SDP Dualty Lecturer: Sanjeev Arora Scrbe:Yury Makarychev Today we wll study the Lft and Project method. Then we wll prove

More information

Online Classification: Perceptron and Winnow

Online Classification: Perceptron and Winnow E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng

More information

10-801: Advanced Optimization and Randomized Methods Lecture 2: Convex functions (Jan 15, 2014)

10-801: Advanced Optimization and Randomized Methods Lecture 2: Convex functions (Jan 15, 2014) 0-80: Advanced Optmzaton and Randomzed Methods Lecture : Convex functons (Jan 5, 04) Lecturer: Suvrt Sra Addr: Carnege Mellon Unversty, Sprng 04 Scrbes: Avnava Dubey, Ahmed Hefny Dsclamer: These notes

More information

Salmon: Lectures on partial differential equations. Consider the general linear, second-order PDE in the form. ,x 2

Salmon: Lectures on partial differential equations. Consider the general linear, second-order PDE in the form. ,x 2 Salmon: Lectures on partal dfferental equatons 5. Classfcaton of second-order equatons There are general methods for classfyng hgher-order partal dfferental equatons. One s very general (applyng even to

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpenCourseWare http://ocw.mt.edu 6.854J / 18.415J Advanced Algorthms Fall 2008 For nformaton about ctng these materals or our Terms of Use, vst: http://ocw.mt.edu/terms. 18.415/6.854 Advanced Algorthms

More information

Convergence rates of proximal gradient methods via the convex conjugate

Convergence rates of proximal gradient methods via the convex conjugate Convergence rates of proxmal gradent methods va the convex conjugate Davd H Gutman Javer F Peña January 8, 018 Abstract We gve a novel proof of the O(1/ and O(1/ convergence rates of the proxmal gradent

More information

8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS

8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS SECTION 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS 493 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS All the vector spaces you have studed thus far n the text are real vector spaces because the scalars

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.

More information

Formulas for the Determinant

Formulas for the Determinant page 224 224 CHAPTER 3 Determnants e t te t e 2t 38 A = e t 2te t e 2t e t te t 2e 2t 39 If 123 A = 345, 456 compute the matrx product A adj(a) What can you conclude about det(a)? For Problems 40 43, use

More information

Assortment Optimization under MNL

Assortment Optimization under MNL Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.

More information

Affine transformations and convexity

Affine transformations and convexity Affne transformatons and convexty The purpose of ths document s to prove some basc propertes of affne transformatons nvolvng convex sets. Here are a few onlne references for background nformaton: http://math.ucr.edu/

More information

PHYS 705: Classical Mechanics. Calculus of Variations II

PHYS 705: Classical Mechanics. Calculus of Variations II 1 PHYS 705: Classcal Mechancs Calculus of Varatons II 2 Calculus of Varatons: Generalzaton (no constrant yet) Suppose now that F depends on several dependent varables : We need to fnd such that has a statonary

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

Another converse of Jensen s inequality

Another converse of Jensen s inequality Another converse of Jensen s nequalty Slavko Smc Abstract. We gve the best possble global bounds for a form of dscrete Jensen s nequalty. By some examples ts frutfulness s shown. 1. Introducton Throughout

More information

Iteration-complexity of a Jacobi-type non-euclidean ADMM for multi-block linearly constrained nonconvex programs

Iteration-complexity of a Jacobi-type non-euclidean ADMM for multi-block linearly constrained nonconvex programs Iteraton-complexty of a Jacob-type non-eucldean ADMM for mult-block lnearly constraned nonconvex programs Jefferson G. Melo Renato D.C. Montero May 13, 017 Abstract Ths paper establshes the teraton-complexty

More information

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011 Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

APPENDIX A Some Linear Algebra

APPENDIX A Some Linear Algebra APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,

More information

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty Addtonal Codes usng Fnte Dfference Method Benamn Moll 1 HJB Equaton for Consumpton-Savng Problem Wthout Uncertanty Before consderng the case wth stochastc ncome n http://www.prnceton.edu/~moll/ HACTproect/HACT_Numercal_Appendx.pdf,

More information

CIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M

CIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M CIS56: achne Learnng Lecture 3 (Sept 6, 003) Preparaton help: Xaoyng Huang Lnear Regresson Lnear regresson can be represented by a functonal form: f(; θ) = θ 0 0 +θ + + θ = θ = 0 ote: 0 s a dummy attrbute

More information

BOUNDEDNESS OF THE RIESZ TRANSFORM WITH MATRIX A 2 WEIGHTS

BOUNDEDNESS OF THE RIESZ TRANSFORM WITH MATRIX A 2 WEIGHTS BOUNDEDNESS OF THE IESZ TANSFOM WITH MATIX A WEIGHTS Introducton Let L = L ( n, be the functon space wth norm (ˆ f L = f(x C dx d < For a d d matrx valued functon W : wth W (x postve sem-defnte for all

More information

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0 MODULE 2 Topcs: Lnear ndependence, bass and dmenson We have seen that f n a set of vectors one vector s a lnear combnaton of the remanng vectors n the set then the span of the set s unchanged f that vector

More information

Perfect Competition and the Nash Bargaining Solution

Perfect Competition and the Nash Bargaining Solution Perfect Competton and the Nash Barganng Soluton Renhard John Department of Economcs Unversty of Bonn Adenauerallee 24-42 53113 Bonn, Germany emal: rohn@un-bonn.de May 2005 Abstract For a lnear exchange

More information

Convexity preserving interpolation by splines of arbitrary degree

Convexity preserving interpolation by splines of arbitrary degree Computer Scence Journal of Moldova, vol.18, no.1(52), 2010 Convexty preservng nterpolaton by splnes of arbtrary degree Igor Verlan Abstract In the present paper an algorthm of C 2 nterpolaton of dscrete

More information

Research Article Global Sufficient Optimality Conditions for a Special Cubic Minimization Problem

Research Article Global Sufficient Optimality Conditions for a Special Cubic Minimization Problem Mathematcal Problems n Engneerng Volume 2012, Artcle ID 871741, 16 pages do:10.1155/2012/871741 Research Artcle Global Suffcent Optmalty Condtons for a Specal Cubc Mnmzaton Problem Xaome Zhang, 1 Yanjun

More information

Appendix B. The Finite Difference Scheme

Appendix B. The Finite Difference Scheme 140 APPENDIXES Appendx B. The Fnte Dfference Scheme In ths appendx we present numercal technques whch are used to approxmate solutons of system 3.1 3.3. A comprehensve treatment of theoretcal and mplementaton

More information

arxiv: v1 [quant-ph] 6 Sep 2007

arxiv: v1 [quant-ph] 6 Sep 2007 An Explct Constructon of Quantum Expanders Avraham Ben-Aroya Oded Schwartz Amnon Ta-Shma arxv:0709.0911v1 [quant-ph] 6 Sep 2007 Abstract Quantum expanders are a natural generalzaton of classcal expanders.

More information

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA 4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected

More information

The lower and upper bounds on Perron root of nonnegative irreducible matrices

The lower and upper bounds on Perron root of nonnegative irreducible matrices Journal of Computatonal Appled Mathematcs 217 (2008) 259 267 wwwelsevercom/locate/cam The lower upper bounds on Perron root of nonnegatve rreducble matrces Guang-Xn Huang a,, Feng Yn b,keguo a a College

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 31 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 6. Rdge regresson The OLSE s the best lnear unbased

More information

Lecture 14: Bandits with Budget Constraints

Lecture 14: Bandits with Budget Constraints IEOR 8100-001: Learnng and Optmzaton for Sequental Decson Makng 03/07/16 Lecture 14: andts wth udget Constrants Instructor: Shpra Agrawal Scrbed by: Zhpeng Lu 1 Problem defnton In the regular Mult-armed

More information

find (x): given element x, return the canonical element of the set containing x;

find (x): given element x, return the canonical element of the set containing x; COS 43 Sprng, 009 Dsjont Set Unon Problem: Mantan a collecton of dsjont sets. Two operatons: fnd the set contanng a gven element; unte two sets nto one (destructvely). Approach: Canoncal element method:

More information

The Second Anti-Mathima on Game Theory

The Second Anti-Mathima on Game Theory The Second Ant-Mathma on Game Theory Ath. Kehagas December 1 2006 1 Introducton In ths note we wll examne the noton of game equlbrum for three types of games 1. 2-player 2-acton zero-sum games 2. 2-player

More information

1 Convex Optimization

1 Convex Optimization Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,

More information

Deriving the X-Z Identity from Auxiliary Space Method

Deriving the X-Z Identity from Auxiliary Space Method Dervng the X-Z Identty from Auxlary Space Method Long Chen Department of Mathematcs, Unversty of Calforna at Irvne, Irvne, CA 92697 chenlong@math.uc.edu 1 Iteratve Methods In ths paper we dscuss teratve

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

2.3 Nilpotent endomorphisms

2.3 Nilpotent endomorphisms s a block dagonal matrx, wth A Mat dm U (C) In fact, we can assume that B = B 1 B k, wth B an ordered bass of U, and that A = [f U ] B, where f U : U U s the restrcton of f to U 40 23 Nlpotent endomorphsms

More information

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1 P. Guterrez Physcs 5153 Classcal Mechancs D Alembert s Prncple and The Lagrangan 1 Introducton The prncple of vrtual work provdes a method of solvng problems of statc equlbrum wthout havng to consder the

More information

arxiv: v1 [math.oc] 6 Jan 2016

arxiv: v1 [math.oc] 6 Jan 2016 arxv:1601.01174v1 [math.oc] 6 Jan 2016 THE SUPPORTING HALFSPACE - QUADRATIC PROGRAMMING STRATEGY FOR THE DUAL OF THE BEST APPROXIMATION PROBLEM C.H. JEFFREY PANG Abstract. We consder the best approxmaton

More information

Appendix B. Criterion of Riemann-Stieltjes Integrability

Appendix B. Criterion of Riemann-Stieltjes Integrability Appendx B. Crteron of Remann-Steltes Integrablty Ths note s complementary to [R, Ch. 6] and [T, Sec. 3.5]. The man result of ths note s Theorem B.3, whch provdes the necessary and suffcent condtons for

More information

Composite Hypotheses testing

Composite Hypotheses testing Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter

More information

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin Proceedngs of the 007 Wnter Smulaton Conference S G Henderson, B Bller, M-H Hseh, J Shortle, J D Tew, and R R Barton, eds LOW BIAS INTEGRATED PATH ESTIMATORS James M Calvn Department of Computer Scence

More information

Estimation: Part 2. Chapter GREG estimation

Estimation: Part 2. Chapter GREG estimation Chapter 9 Estmaton: Part 2 9. GREG estmaton In Chapter 8, we have seen that the regresson estmator s an effcent estmator when there s a lnear relatonshp between y and x. In ths chapter, we generalzed the

More information

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009 College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

Classification as a Regression Problem

Classification as a Regression Problem Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class

More information

Economics 101. Lecture 4 - Equilibrium and Efficiency

Economics 101. Lecture 4 - Equilibrium and Efficiency Economcs 0 Lecture 4 - Equlbrum and Effcency Intro As dscussed n the prevous lecture, we wll now move from an envronment where we looed at consumers mang decsons n solaton to analyzng economes full of

More information

e - c o m p a n i o n

e - c o m p a n i o n OPERATIONS RESEARCH http://dxdoorg/0287/opre007ec e - c o m p a n o n ONLY AVAILABLE IN ELECTRONIC FORM 202 INFORMS Electronc Companon Generalzed Quantty Competton for Multple Products and Loss of Effcency

More information

Power law and dimension of the maximum value for belief distribution with the max Deng entropy

Power law and dimension of the maximum value for belief distribution with the max Deng entropy Power law and dmenson of the maxmum value for belef dstrbuton wth the max Deng entropy Bngy Kang a, a College of Informaton Engneerng, Northwest A&F Unversty, Yanglng, Shaanx, 712100, Chna. Abstract Deng

More information

Online Appendix. t=1 (p t w)q t. Then the first order condition shows that

Online Appendix. t=1 (p t w)q t. Then the first order condition shows that Artcle forthcomng to ; manuscrpt no (Please, provde the manuscrpt number!) 1 Onlne Appendx Appendx E: Proofs Proof of Proposton 1 Frst we derve the equlbrum when the manufacturer does not vertcally ntegrate

More information

Some basic inequalities. Definition. Let V be a vector space over the complex numbers. An inner product is given by a function, V V C

Some basic inequalities. Definition. Let V be a vector space over the complex numbers. An inner product is given by a function, V V C Some basc nequaltes Defnton. Let V be a vector space over the complex numbers. An nner product s gven by a functon, V V C (x, y) x, y satsfyng the followng propertes (for all x V, y V and c C) (1) x +

More information

Canonical transformations

Canonical transformations Canoncal transformatons November 23, 2014 Recall that we have defned a symplectc transformaton to be any lnear transformaton M A B leavng the symplectc form nvarant, Ω AB M A CM B DΩ CD Coordnate transformatons,

More information

On the Multicriteria Integer Network Flow Problem

On the Multicriteria Integer Network Flow Problem BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 5, No 2 Sofa 2005 On the Multcrtera Integer Network Flow Problem Vassl Vasslev, Marana Nkolova, Maryana Vassleva Insttute of

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.65/15.070J Fall 013 Lecture 1 10/1/013 Martngale Concentraton Inequaltes and Applcatons Content. 1. Exponental concentraton for martngales wth bounded ncrements.

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

Appendix for Causal Interaction in Factorial Experiments: Application to Conjoint Analysis

Appendix for Causal Interaction in Factorial Experiments: Application to Conjoint Analysis A Appendx for Causal Interacton n Factoral Experments: Applcaton to Conjont Analyss Mathematcal Appendx: Proofs of Theorems A. Lemmas Below, we descrbe all the lemmas, whch are used to prove the man theorems

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have

More information

Math 217 Fall 2013 Homework 2 Solutions

Math 217 Fall 2013 Homework 2 Solutions Math 17 Fall 013 Homework Solutons Due Thursday Sept. 6, 013 5pm Ths homework conssts of 6 problems of 5 ponts each. The total s 30. You need to fully justfy your answer prove that your functon ndeed has

More information

Maximizing the number of nonnegative subsets

Maximizing the number of nonnegative subsets Maxmzng the number of nonnegatve subsets Noga Alon Hao Huang December 1, 213 Abstract Gven a set of n real numbers, f the sum of elements of every subset of sze larger than k s negatve, what s the maxmum

More information

Lagrange Multipliers Kernel Trick

Lagrange Multipliers Kernel Trick Lagrange Multplers Kernel Trck Ncholas Ruozz Unversty of Texas at Dallas Based roughly on the sldes of Davd Sontag General Optmzaton A mathematcal detour, we ll come back to SVMs soon! subject to: f x

More information