
To appear in Optimization Vol. 00, No. 00, Month 20XX, 1–27

Research Article

Almost Sure Convergence of Random Projected Proximal and Subgradient Algorithms for Distributed Nonsmooth Convex Optimization

Hideaki Iiduka^a

^a Department of Computer Science, Meiji University, Higashimita, Tama-ku, Kawasaki-shi, Kanagawa, Japan; iiduka@cs.meiji.ac.jp

(Received 00 Month 20XX; accepted 00 Month 20XX)

Two distributed algorithms are described that enable all users connected over a network to cooperatively solve the problem of minimizing the sum of all users' objective functions over the intersection of all users' constraint sets, where each user has its own private nonsmooth convex objective function and closed convex constraint set, which is the intersection of a number of simple, closed convex sets. One algorithm enables each user to adjust its estimate by using the proximity operator of its objective function and the metric projection onto one constraint set randomly selected from a number of simple, closed convex sets. The other determines each user's estimate by using the subdifferential of its objective function instead of the proximity operator. Investigation of the two algorithms' convergence properties for a diminishing step-size rule revealed that, under certain assumptions, the sequences of all users generated by each of the two algorithms converge almost surely to the same solution. It also showed that the rate of convergence depends on the step size and that a smaller step size results in quicker convergence. The results of numerical evaluation using a nonsmooth convex optimization problem support the convergence analysis and demonstrate the effectiveness of the two algorithms.

Keywords: almost sure convergence; distributed nonsmooth convex optimization; metric projection; proximity operator; random projection algorithm; subgradient

AMS Subject Classification: 90C15; 90C25; 90C30

1. Introduction

Future network models have attracted a great deal of attention. The concept of the network model considered here differs completely from that of a conventional client-server network model. While a conventional client-server network model explicitly distinguishes hosts providing services (servers) from hosts receiving services (clients), the network model considered here does not assign fixed roles to hosts. Hosts composing the network, referred to here as users, can be both servers and clients. Hence, the network can function as an autonomous, distributed, and cooperative system. Although there are several forms of networks in which some operations are intentionally centralized (e.g., hybrid peer-to-peer networks), here the focus is on networks that do not have centralized operations. Therefore, distributed mechanisms need to be used that can work in cooperation with each user and neighboring users to control the network.

This work was supported by the Japan Society for the Promotion of Science through a Grant-in-Aid for Scientific Research (C) (15K04763).

Distributed optimization (see, e.g., [2], [4], [9], [11, Part II], [12, Part II], [27], [28], [33] and references therein) plays a crucial role in making future networks [10, 14, 23, 30], such as wireless, sensor, and peer-to-peer networks, stable and highly reliable. One way [23, Chapter 5], [25], [33, Chapter 2] to achieve this optimization is to model the utility and strategy of each user respectively as a concave utility function and convex constraint set and solve the problem of maximizing the overall utility of all users over the intersection of all users' constraint sets. This paper focuses on a constrained convex minimization problem in which each user has its own nonsmooth convex objective function (i.e., a minus nonsmooth concave utility function) and closed convex constraint set that is the intersection of a number of simple, closed convex sets (e.g., the intersection of affine subspaces, half-spaces, and hyperslabs). The constrained nonsmooth convex optimization problem covers the important situations in which each user's objective function is differentiable with a non-Lipschitz continuous gradient or not differentiable (e.g., the L1-norm) and includes, for instance, the problem of minimizing the total variation of a signal over a convex set, Tikhonov-like problems with L1-norms [13, I. Introduction], the classifier ensemble problem with sparsity and diversity learning [39, Subsection 2.2.3], [40, Subsection 3.2.4], which is expressed as L1-norm minimization, and the minimal antenna-subset selection problem [38, Subsection 17.4]. The main objective of the present paper is to show that the constrained nonsmooth convex optimization problem for many real-world applications can be solved by using distributed optimization techniques.

Two distributed optimization algorithms are presented for solving the constrained convex minimization problem described above. At each iteration of the first algorithm, each user calculates the weighted average of its estimate and the estimates received from its neighboring users and then updates its estimate by using the weighted average, the proximity operator of its own private nonsmooth convex function, and the metric projection onto one constraint set randomly selected from a number of simple, closed convex sets. The second algorithm is obtained by replacing the proximity operator in the first algorithm with the subdifferential of each user's nonsmooth convex function. The two algorithms are performed on the basis of a framework [24, (2a)] for local user communications and random observations [24, (2b)], [26, (2)] of the local constraints, which ensures that each user can observe one simple, closed convex set onto which the metric projection can be efficiently calculated. Accordingly, the two algorithms can be applied to two complicated cases. In Case 1, each user does not know the full form of its private constraint set in advance and can observe only one simple, closed convex set at each instance. In Case 2, each user knows the full form of its private constraint set in advance, and the constraint set is the intersection of a huge number of simple, closed convex sets, which means that a metric projection onto the constraint set cannot be calculated easily. See [24, Section I] and [26, Section 1] and references therein for details on applications of the two cases, including collaborative filtering for recommender systems and text classification problems.

Proximal point methods and subgradient methods with randomized order [4, Section 4], [27, Section 3] are useful for constrained nonsmooth convex optimization.
These methods are implemented under the condition that each user uses a randomly chosen component function at each iteration, while the proposed algorithms are performed under the condition that each user uses a randomly chosen closed convex set onto which the metric projection can be efficiently calculated (Subsections 3.2 and 4.2 explicitly compare the methods in [4, Section 4] and [27,

Section 3] with the proposed algorithms). Related random projection algorithms have been proposed for convex optimization over the intersection of a number of closed convex sets. The algorithm most relevant to the work reported here is the first distributed random projected gradient algorithm [24] that was proposed for solving a constrained smooth convex minimization problem when each user's objective function is convex with a Lipschitz continuous gradient. The centralized random projected gradient and subgradient algorithms [26] were proposed for minimizing a single objective function over the intersection of an arbitrary family of convex inequalities. The incremental constraint projection-proximal algorithm [36] uses both random subgradient updates and random constraint updates. While there have been no reports on distributed random projection algorithms for nonsmooth convex optimization, thanks to the useful ideas in [24, 36], a distributed random projected proximal algorithm (Algorithm 3.1) can be devised. Moreover, on the basis of [24, 26], a distributed random projected subgradient algorithm (Algorithm 4.1) can be devised that is a generalization of the first distributed random projected gradient algorithm [24, (2a), (2b)]. Furthermore, there has been much work [35] on random projection of vectors in a higher dimensional space onto a randomly chosen lower dimensional subspace, and convex programming problems [29] and affine variational inequalities [31] have been analyzed using random projections defined in the same way [35]. Since, with the proposed algorithms, each user is assumed to use a randomly chosen projection at each iteration that projects a vector in the whole space onto a closed convex subset of the whole space, the definition of random projection in [29, 31, 35] clearly differs from the one used here.

This paper makes two contributions that build on previously reported results. First, it presents two novel distributed random projection algorithms for constrained nonsmooth convex optimization that are based on each user's local communications. This means that they can be implemented independently of the network topology and that each user can calculate the weighted average of its estimate and the neighboring users' estimates. The algorithms proceed by performing a proximal or subgradient step for each user's objective function at the weighted average and projecting onto one simple, closed convex set that is randomly selected from each user's local constraint sets. Since the metric projection is a special case of nonexpansive mapping, the algorithms are related to previous fixed point optimization algorithms [15, 16, 18, 19, 21] for convex optimization over fixed point sets of nonexpansive mappings. Reference [18] presents a gradient method that accelerates previously reported optimization algorithms for minimizing one smooth convex objective function over a fixed point set of a nonexpansive mapping. The algorithms in [16, 21] are synchronous decentralized algorithms that can be applied to smooth convex optimization over fixed point sets of nonexpansive mappings. Since these algorithms [16, 18, 21] work for only smooth convex optimization, they cannot be applied to the constrained nonsmooth convex optimization problem considered here. Parallel and incremental subgradient algorithms [15] have been proposed to solve the problem of minimizing the sum of nonsmooth, convex objective functions over fixed point sets, and incremental proximal point algorithms [19] have been proposed for solving the same problem. These algorithms [15, 19] work for nonsmooth convex optimization over the intersection of convex constraint sets that are not always simple.
However, since they work only when each user makes the best use of its own private information, they cannot be applied to the case in which each user does not know the explicit form of its constraint set in advance. Moreover, since these algorithms [15, 19] can be applied only to deterministic optimization, they cannot be applied to the case in which each user uses one constraint set selected

randomly at each iteration. In contrast, the proposed algorithms work even when each user randomly sets one projection selected from many projections (see the Case 1 description above). Therefore, the proposed algorithms have wider application than previous fixed point optimization algorithms [15, 16, 18, 19, 21].

The second contribution is an analysis of the proposed proximal and subgradient algorithms. In contrast to the convergence analysis of the first distributed random projected gradient algorithm [24], smooth convex analysis, which has tractable properties due to the use of Lipschitz continuous gradients, cannot be applied to the convergence analyses of the two proposed algorithms, which optimize nonsmooth convex functions. However, convergence analyses of the two algorithms can be performed by using useful properties [1, Propositions 12.16, 12.27, and 16.14] (Proposition 2.1) of the proximity operators and the subgradients of nonsmooth convex objective functions. Thanks to the supermartingale convergence theorem [5, Proposition ] (Proposition 2.2) and the portmanteau lemma [6, Theorem 16], [34, Lemma 2.2] (Proposition 2.3), it is guaranteed that, under certain assumptions, the sequences of all users generated by each of the two algorithms converge almost surely to the same solution to the constrained nonsmooth convex optimization problem considered in this paper (Theorems 3.1 and 4.1). Moreover, the rates of convergence of the two algorithms (Proposition 3.1, (11), Proposition 4.1, and (22)) are provided to illustrate their efficiencies. The convergence rate analysis leads to selection of a step size such that the two algorithms converge quickly. Numerical results for the two algorithms are provided to support the convergence and convergence rate analyses.

This paper is organized as follows. Section 2 gives the mathematical preliminaries and states the main problem. Section 3 presents the proposed random projected proximal algorithm for solving the main problem and describes its convergence properties for a diminishing step size. Section 4 presents the proposed random projected subgradient algorithm for solving the main problem and describes its convergence properties for a diminishing step size. Section 5 presents a numerical evaluation using a nonsmooth convex optimization problem and compares the behaviors of the two algorithms. Section 6 concludes the paper with a brief summary and mentions future directions for improving the proposed algorithms.

2. Preliminaries

2.1. Definitions and propositions

Let R^d be a d-dimensional Euclidean space with inner product ⟨·,·⟩ and its induced norm ‖·‖, and let R^d_+ := {(x_1, x_2, ..., x_d) ∈ R^d : x_i ≥ 0 (i = 1, 2, ..., d)}. Let [W]_{ij} and W^T denote the (i, j)th entry and the transpose of a matrix W. Let Pr{X} and E[X] denote the probability and the expectation of a random variable X. The probability space considered here is denoted by (Ω, F, Pr). The metric projection onto a nonempty, closed convex set C ⊂ R^d is denoted by P_C, and it is defined for all x ∈ R^d by P_C(x) ∈ C and ‖x − P_C(x)‖ = d(x, C) := inf{‖x − y‖ : y ∈ C}. The mapping P_C satisfies the firm nonexpansivity condition [1, Proposition 4.8]; i.e., ‖P_C(x) − P_C(y)‖² + ‖(x − P_C(x)) − (y − P_C(y))‖² ≤ ‖x − y‖² (x, y ∈ R^d). This means that ‖P_C(x) − P_C(y)‖ ≤ ‖x − y‖ (x, y ∈ R^d); i.e., P_C is nonexpansive. The subdifferential [1, Definition 16.1] of f : R^d → R is the set-valued operator defined for all x ∈ R^d by ∂f(x) := {u ∈ R^d : f(y) ≥ f(x) + ⟨y − x, u⟩ (y ∈ R^d)}. The proximity operator [1, Definition 12.23] of a convex function f : R^d → R, denoted by prox_f, maps every x ∈ R^d to the unique minimizer of f(·) + (1/2)‖x − ·‖²;

i.e., prox_f(x) ∈ Argmin_{y∈R^d} [f(y) + (1/2)‖x − y‖²] (x ∈ R^d). The uniqueness and existence of prox_f(x) are guaranteed for all x ∈ R^d [1, Definition 12.23].

Proposition 2.1 Let f : R^d → R be convex. Then the following hold:
(i) ∂f(x) is nonempty for all x ∈ R^d.
(ii) Let x, p ∈ R^d. Then p = prox_f(x) if and only if x − p ∈ ∂f(p) (i.e., ⟨y − p, x − p⟩ + f(p) ≤ f(y) for all y ∈ R^d).
(iii) prox_f is firmly nonexpansive.
(iv) Let L > 0. Then f is Lipschitz continuous with a Lipschitz constant L if and only if ‖u‖ ≤ L for all x ∈ R^d and for all u ∈ ∂f(x).

Proof: (i)–(iii) follow from Propositions 12.16, 12.27, and 16.14 in [1]. (iv) Theorem 6.2.2, Corollary 6.1.2, and Exercise 6.1.9(c) in [7] lead to (iv).

The following propositions are needed to prove the main theorems.

Proposition 2.2 [The supermartingale convergence theorem [5, Proposition ]] Let (Y_k)_{k≥0}, (Z_k)_{k≥0}, and (W_k)_{k≥0} be sequences of nonnegative random variables, and let F_k (k ≥ 0) denote the collection Y_0, Y_1, ..., Y_k, Z_0, Z_1, ..., Z_k, and W_0, W_1, ..., W_k. Suppose that Σ_{k=0}^∞ W_k < ∞ almost surely and that, almost surely, for all k ≥ 0, E[Y_{k+1} | F_k] ≤ Y_k − Z_k + W_k. Then Σ_{k=0}^∞ Z_k < ∞ almost surely and (Y_k)_{k≥0} converges almost surely to a nonnegative random variable Y.

Proposition 2.3 [The portmanteau lemma [6, Theorem 16], [34, Lemma 2.2]] Let (Y_k)_{k≥0} be a sequence of random variables that converges in law to a random variable Y. Then lim sup_{k→∞} Pr{Y_k ∈ F} ≤ Pr{Y ∈ F} for every closed set F.

A directed graph G := (V, E) is a finite nonempty set V of nodes (users) and a collection E of ordered pairs of distinct nodes from V [3, p. 394]. A directed graph is said to be strongly connected if, for each pair of nodes i and l, there exists a directed path from i to l [3, p. 394].
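The two operators introduced above both have closed forms for many simple functions and sets. The following minimal NumPy sketch (illustrative only; the choices f = λ‖·‖₁ and C a Euclidean ball are not taken from the paper) computes prox_f by soft-thresholding and P_C for a closed ball, and numerically checks the characterization of the proximity operator in Proposition 2.1(ii) and the nonexpansivity of P_C.

```python
import numpy as np

def prox_l1(x, lam):
    """Proximity operator of f(x) = lam * ||x||_1 (componentwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def proj_ball(x, center, radius):
    """Metric projection onto the closed ball {y : ||y - center|| <= radius}."""
    d = x - center
    n = np.linalg.norm(d)
    return x if n <= radius else center + radius * d / n

rng = np.random.default_rng(0)
x, lam = rng.normal(size=5), 0.3
p = prox_l1(x, lam)

# Proposition 2.1(ii): p = prox_f(x) iff x - p is a subgradient of f at p.
# For f = lam*||.||_1 this means x_j - p_j = lam*sign(p_j) when p_j != 0
# and |x_j - p_j| <= lam when p_j = 0.
for xj, pj in zip(x, p):
    if pj != 0.0:
        assert np.isclose(xj - pj, lam * np.sign(pj))
    else:
        assert abs(xj - pj) <= lam + 1e-12

# The metric projection onto a ball is nonexpansive.
y, c, r = rng.normal(size=5), np.zeros(5), 1.0
assert np.linalg.norm(proj_ball(x, c, r) - proj_ball(y, c, r)) <= np.linalg.norm(x - y) + 1e-12
```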

2.2. Main problem and assumptions

Let us consider a constrained nonsmooth convex optimization problem that is distributed over a network of m users, indexed by V := {1, 2, ..., m}. User i (i ∈ V) has its own private function f_i : R^d → R and constraint set X_i ⊂ R^d. On the basis of [24, Section II], let us define the constraint set X := ∩_{i=1}^m X_i for the whole network. Suppose that X is the intersection of n closed convex sets. Let I := {1, 2, ..., n}, and let I_i (i ∈ V) be the partition of I such that I = ∪_{i=1}^m I_i and I_i ∩ I_l = ∅ for all i, l ∈ V with i ≠ l. Then X_i can be defined by the intersection of closed convex sets X_i^j (j ∈ I_i); i.e., X_i := ∩_{j∈I_i} X_i^j (i ∈ V) and X := ∩_{i∈V} X_i = ∩_{i=1}^m ∩_{j∈I_i} X_i^j.

The following is assumed hereafter.

Assumption 2.1
(A1) X_i^j ⊂ R^d (i ∈ V, j ∈ I_i) is a closed convex set onto which the metric projection P_{X_i^j} can be efficiently computed, and X := ∩_{i∈V} X_i ≠ ∅;
(A2) f_i : R^d → R (i ∈ V) is convex;
(A3) For all i ∈ V, there exists M_i ∈ R such that sup{‖g‖ : x ∈ X_i, g ∈ ∂f_i(x)} ≤ M_i.

Assumption (A3) is satisfied if f_i (i ∈ V) is polyhedral on X_i or X_i (i ∈ V) is bounded [5, p. 471]. Proposition 2.1(iv) guarantees that, if f_i (i ∈ V) is Lipschitz continuous on X_i, Assumption (A3) holds.

The following is the main problem discussed here.

Problem 2.1 Under Assumption 2.1,

minimize f(x) := Σ_{i∈V} f_i(x) subject to x ∈ X := ∩_{i∈V} X_i,

where one assumes that Problem 2.1 has a solution. The solution set of Problem 2.1 is denoted by X* := {x* ∈ X : f(x*) = f* := inf_{x∈X} f(x)}.

The condition X* ≠ ∅ holds, for example, when one of the X_i^j s is bounded [1, Corollary 8.31, Proposition 11.14]. The main objective here is to present distributed optimization algorithms that enable each user to solve Problem 2.1 without using other users' private information. This goal is addressed by assuming that each user and its neighboring users form a network in which each user can transmit its estimate to its neighboring users. The network topology at time k is expressed as a directed graph G(k) := (V, E(k)), where E(k) ⊂ V × V, and (i, j) ∈ E(k) stands for a link such that user i receives information from user j at time k. Let N_i(k) ⊂ V be the set of users that send information to user i; i.e., N_i(k) := {j ∈ V : (i, j) ∈ E(k)} and i ∈ N_i(k) (i ∈ V, k ≥ 0).

To consider Problem 2.1, the following assumptions are needed [24, Assumptions 4 and 5], which lead to Lemma 3.3 (see [32, Theorem 4.2]) used to prove the main theorems.

Assumption 2.2 There exists Q ≥ 1 such that the graph (V, ∪_{l=0}^{Q−1} E(k + l)) is strongly connected for all k ≥ 0.

Assumption 2.3 For k ≥ 0, user i (i ∈ V) has the weighted parameters w_{ij}(k) (j ∈ V) satisfying the following:
(i) w_{ij}(k) := [W(k)]_{ij} ≥ 0 for all j ∈ V, and w_{ij}(k) = 0 when j ∉ N_i(k);
(ii) There exists w ∈ (0, 1) such that w_{ij}(k) ≥ w for all j ∈ N_i(k);
(iii) Σ_{j∈V} [W(k)]_{ij} = 1 for all i ∈ V and Σ_{i∈V} [W(k)]_{ij} = 1 for all j ∈ V.
Moreover, user i (i ∈ V) has the step size (α_k)_{k≥0} ⊂ (0, ∞) satisfying (C1) Σ_{k=0}^∞ α_k = ∞ and (C2) Σ_{k=0}^∞ α_k² < ∞.

The matrix W(k) (k ≥ 0) satisfying Assumption 2.3(i)–(iii) is said to be doubly stochastic [24, Assumption 5]. Theorem 1 in [37] indicates that the spectral radius ρ of the doubly stochastic matrix W(k) (k ≥ 0) is less than 1 if and only if, for all k ≥ 0,

lim_{s→∞} (W(k))^s = ee^T / m,  (1)

where e ∈ R^m stands for the column vector in which all entries are equal to 1. Moreover, a theorem in [8] and the remark in [37, Theorem 1] show that (1) means the network is strongly connected. Accordingly, Assumption 2.2 holds if ρ < 1 (e.g., ρ < 1 holds when one is a simple eigenvalue of W(k) and all other eigenvalues are strictly less than one in magnitude [37, p. 67]). See [17, Examples 2.6 and 2.7] for examples of W(k) satisfying Assumption 2.3(i)–(iii).
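One standard way to build weight matrices of this kind on an undirected communication graph is the Metropolis rule; the sketch below (an illustrative construction, not necessarily the one used in the paper or in [17]) produces a symmetric, and therefore doubly stochastic, matrix with positive weights on every link, as required by Assumption 2.3.

```python
import numpy as np

def metropolis_weights(adj):
    """Doubly stochastic weights from an undirected adjacency matrix (Metropolis rule).

    adj[i, j] = 1 if users i and j are neighbors (i != j), 0 otherwise.
    """
    adj = np.asarray(adj)
    m = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            if i != j and adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()          # put the remaining mass on the diagonal
    return W

# Example: a ring of 4 users.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]])
W = metropolis_weights(adj)
assert np.allclose(W.sum(axis=0), 1.0) and np.allclose(W.sum(axis=1), 1.0)  # doubly stochastic
assert np.all(W >= 0)
```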

Assumptions (A1) and (A2) imply that, for all k ≥ 0, user i (i ∈ V) can determine its estimate by using a subdifferential or proximity operator of f_i and the metric projection onto a certain constraint set selected from its own constraint sets X_i^j (j ∈ I_i). Here it is assumed that user i (i ∈ V) forms the metric projection on the basis of random observations of the local constraints; i.e., user i observes a local constraint set at time k, X_i^{Ω_i(k)}, where Ω_i(k) ∈ I_i is a random variable, and thus uses P_{X_i^{Ω_i(k)}}. Convergence analyses are performed by assuming the following [24, Assumptions 2 and 3]:

Assumption 2.4 The random sequences (Ω_i(k))_{k≥0} (i ∈ V) are independent and identically distributed and independent of the initial points x_i(0) (i ∈ V) in the algorithms presented here. Moreover, Pr{Ω_i(k) = j} > 0 holds for i ∈ V and for j ∈ I_i.

Assumption 2.5 There exists c > 0 such that, for all i ∈ V, for all x ∈ R^d, and for all k ≥ 0, d(x, X)² ≤ c E[d(x, X_i^{Ω_i(k)})²].

The sequences (Ω_i(k))_{k≥0} (i ∈ V) satisfy Assumption 2.4 when the sample constraints are generated through state transitions of a Markov chain [36, Subsection 4.4]. Assumption 2.4 is also satisfied when the constraints are sampled in a cyclic manner by random shuffling or in accordance with a deterministic cyclic order [36, Subsection 4.3]. Other examples of (Ω_i(k))_{k≥0} satisfying Assumption 2.4 are described in Subsections 4.1 and 4.2 in [36]. Assumption 2.5 holds when X_i^j (i ∈ V, j ∈ I_i) is a linear inequality or an equality constraint, or when X has an interior point (see [24, p. 223] and references therein).

Even if X_i^j (i ∈ V, j ∈ I_i) is a restricted constraint such as a linear inequality or equality constraint, X_i := ∩_{j∈I_i} X_i^j (i ∈ V) is complicated in the sense that metric projection onto X_i is not necessarily easy. Accordingly, the metric projection onto the whole constraint set X := ∩_{i∈V} X_i cannot be efficiently computed, and hence, X := ∩_{i∈V} ∩_{j∈I_i} X_i^j has a complicated form even when X_i^j is simple enough to have a closed form expression of the metric projection onto X_i^j. The proposed algorithms (Algorithms 3.1 and 4.1) using the metric projection onto X_i^j can be applied to Problem 2.1 with such a complicated constraint X. In contrast, the previously reported algorithms [4, Section 4], [27, Section 3] need to use metric projection onto the whole constraint set X (see Subsections 3.2 and 4.2 for detailed comparisons of the proposed algorithms with the previous ones [4, Section 4], [27, Section 3]).

Let x_i(k) ∈ R^d be the estimate of user i at time k (see (4) and (16) in Algorithms 3.1 and 4.1 for details of the definition of (x_i(k))_{k≥0} (i ∈ V)). The proposed algorithms are analyzed by using the expectation taken with respect to the past history of the algorithms defined as follows [24, p. 224]. Let F_k be the σ-algebra generated by the entire history of the algorithms up to time (k − 1) inclusively; i.e., F_0 := σ(x_i(0) (i ∈ V)), and for all k ≥ 1,

F_k := σ(x_i(0) (i ∈ V), Ω_i(l) (l ∈ [0, k − 1], i ∈ V)).  (2)

3. Distributed random projected proximal algorithm

This section presents the proposed proximal algorithm using random projections for solving Problem 2.1 under Assumptions 2.1–2.5.

Algorithm 3.1

Step 0. User i (i ∈ V) sets x_i(0) arbitrarily.
Step 1. User i (i ∈ V) receives x_j(k) from its neighboring users j ∈ N_i(k) and computes the weighted average

v_i(k) := Σ_{j∈N_i(k)} w_{ij}(k) x_j(k).  (3)

User i updates its estimate x_i(k + 1) by using

x_i(k + 1) := P_{X_i^{Ω_i(k)}}(prox_{α_k f_i}(v_i(k))).  (4)

The algorithm sets k := k + 1 and returns to Step 1.

Definition (2) implies that, given F_k (k ≥ 0), the collection x_i(0), x_i(1), ..., x_i(k) and v_i(0), v_i(1), ..., v_i(k) generated by Algorithm 3.1 is fully determined. Algorithm 3.1 enables user i (i ∈ V) to determine x_i(k + 1) by using its proximity operator for the weighted average v_i(k) of the received x_j(k) (j ∈ N_i(k)) and the metric projection onto a constraint set X_i^{Ω_i(k)} randomly selected from the X_i^j. The convergence analysis of Algorithm 3.1 is as follows.

Theorem 3.1 Under Assumptions 2.1–2.5, the sequence (x_i(k))_{k≥0} (i ∈ V) generated by Algorithm 3.1 converges almost surely to a random point x* ∈ X*.
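Before turning to the proof, the per-user update (3)–(4) can be written down directly. The following sketch is illustrative only; the function name algorithm31_step and the arguments prox_i (a callable computing prox_{α f_i}) and projections_i (one metric projection P_{X_i^j} per j ∈ I_i) are hypothetical, user-supplied objects and are not part of the paper.

```python
def algorithm31_step(x, neighbors, weights, prox_i, projections_i, alpha, rng):
    """One iteration of Algorithm 3.1 for user i.

    x             : dict mapping user index j to its current estimate x_j(k) (NumPy arrays)
    neighbors     : list N_i(k) of users sending information to user i
    weights       : dict mapping j to w_ij(k) (one row of a doubly stochastic matrix)
    prox_i        : callable (v, alpha) -> prox_{alpha f_i}(v)
    projections_i : list of callables, the metric projections P_{X_i^j}, j in I_i
    alpha         : step size alpha_k
    rng           : NumPy random generator used to draw Omega_i(k)
    """
    # (3): weighted average of the estimates received from the neighbors.
    v_i = sum(weights[j] * x[j] for j in neighbors)
    # (4): proximal step followed by projection onto one randomly chosen simple set.
    omega = rng.integers(len(projections_i))            # Omega_i(k)
    return projections_i[omega](prox_i(v_i, alpha))     # x_i(k+1)
```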

3.1. Proof of Theorem 3.1

The following lemma can be proven by using Proposition 2.1 and the firm nonexpansivity condition of P_{X_i^{Ω_i(k)}} and by referring to the proof of [26, Lemma 2]. Hence, the proof of Lemma 3.1 can be omitted.

Lemma 3.1 Suppose that Assumption 2.1 holds. Then for all x ∈ X, for all i ∈ V, for all z ∈ X, for all k ≥ 0, and for all τ, η, μ > 0,

‖x_i(k + 1) − x‖² ≤ ‖v_i(k) − x‖² − 2α_k(f_i(z) − f_i(x)) + N(τ, η)α_k² + τ‖v_i(k) − z‖² + (η + μ − 2)‖v_i(k) − prox_{α_k f_i}(v_i(k))‖² − (1 − 1/μ)‖x_i(k + 1) − v_i(k)‖²,

where N(τ, η) := (1/τ + 1/η) max_{i∈V} M_i² < ∞ and M_i (i ∈ V) is as in Assumption (A3).

The following lemma is shown.

Lemma 3.2 Suppose that Assumptions 2.1, 2.3, and 2.5 hold. Then Σ_{k=0}^∞ ‖v_i(k) − prox_{α_k f_i}(v_i(k))‖² < ∞ and Σ_{k=0}^∞ d(v_i(k), X)² < ∞ almost surely for all i ∈ V.

Proof: The definition of d means that d(v_i(k), X) = ‖v_i(k) − P_X(v_i(k))‖ and d(x_i(k + 1), X) ≤ ‖x_i(k + 1) − P_X(v_i(k))‖ (i ∈ V, k ≥ 0). The condition x_i(k + 1) ∈ X_i^{Ω_i(k)} (i ∈ V, k ≥ 0) means that d(v_i(k), X_i^{Ω_i(k)}) ≤ ‖v_i(k) − x_i(k + 1)‖ (i ∈ V, k ≥ 0). Then Lemma 3.1 with x = z := P_X(v_i(k)) (i ∈ V, k ≥ 0) guarantees that, for all i ∈ V, for all k ≥ 0, for all τ, η > 0, and for all μ > 1,

d(x_i(k + 1), X)² ≤ d(v_i(k), X)² + τ d(v_i(k), X)² + (η + μ − 2)‖v_i(k) − prox_{α_k f_i}(v_i(k))‖² + N(τ, η)α_k² − (1 − 1/μ) d(v_i(k), X_i^{Ω_i(k)})².

By taking the expectation in this inequality conditioned on F_k defined in (2), Assumption 2.5 leads to the finding that, almost surely, for all i ∈ V, for all k ≥ 0, for all τ, η > 0, and for all μ > 1,

E[d(x_i(k + 1), X)² | F_k] ≤ d(v_i(k), X)² + (η + μ − 2)‖v_i(k) − prox_{α_k f_i}(v_i(k))‖² + {τ − (1/c)(1 − 1/μ)} d(v_i(k), X)² + N(τ, η)α_k².

Let us take τ := 1/(6c), η := 1/3, and μ := 3/2. From η + μ − 2 = −1/6, τ − (1/c)(1 − 1/μ) = −1/(6c), and the convexity of d(·, X)², almost surely, for all i ∈ V and for all k ≥ 0,

E[d(x_i(k + 1), X)² | F_k] ≤ Σ_{j=1}^m [W(k)]_{ij} d(x_j(k), X)² − (1/6)‖v_i(k) − prox_{α_k f_i}(v_i(k))‖² − (1/(6c)) d(v_i(k), X)² + N(1/(6c), 1/3)α_k²,

where N(1/(6c), 1/3) < ∞ is guaranteed from Assumption (A3). Hence, Assumption 2.3 ensures that, almost surely, for all k ≥ 0,

E[Σ_{i=1}^m d(x_i(k + 1), X)² | F_k] ≤ Σ_{j=1}^m d(x_j(k), X)² − (1/6) Σ_{i=1}^m ‖v_i(k) − prox_{α_k f_i}(v_i(k))‖² − (1/(6c)) Σ_{i=1}^m d(v_i(k), X)² + m N(1/(6c), 1/3)α_k².  (5)

Proposition 2.2 and (C2) lead to Σ_{k=0}^∞ Σ_{i=1}^m ‖v_i(k) − prox_{α_k f_i}(v_i(k))‖² < ∞ and Σ_{k=0}^∞ Σ_{i=1}^m d(v_i(k), X)² < ∞ almost surely. This completes the proof.

The following lemma can be proven by using Lemma 3.2 and by referring to the proof of [24, Lemma 7].

Lemma 3.3 Suppose that Assumptions 2.1, 2.2, and 2.3 hold and define v̄(k) := (1/m) Σ_{l=1}^m v_l(k) for all k ≥ 0. Then Σ_{k=0}^∞ ‖e_i(k)‖² := Σ_{k=0}^∞ ‖x_i(k + 1) − v_i(k)‖² < ∞ and Σ_{k=0}^∞ α_k‖v_i(k) − v̄(k)‖ < ∞ almost surely for all i ∈ V.

Next, let us show the following lemma.

Lemma 3.4 Suppose that the assumptions in Theorem 3.1 are satisfied, and define z_i(k) := P_X(v_i(k)) for all i ∈ V and for all k ≥ 0 and z̄(k) := (1/m) Σ_{i=1}^m z_i(k) for all k ≥ 0. Then the sequence (‖x_i(k) − x*‖)_{k≥0} converges almost surely for all i ∈ V and for all x* ∈ X*, and lim inf_{k→∞} f(z̄(k)) = f* almost surely.

Proof: Choose x* ∈ X* arbitrarily. The convexity of ‖·‖² and Assumption 2.3 imply that Σ_{i=1}^m ‖v_i(k) − x*‖² ≤ Σ_{j=1}^m ‖x_j(k) − x*‖² (k ≥ 0). Lemma 3.1 implies that, for all k ≥ 0 and for all τ, η, μ > 0,

Σ_{i=1}^m ‖x_i(k + 1) − x*‖² ≤ Σ_{i=1}^m ‖x_i(k) − x*‖² − 2α_k Σ_{i=1}^m (f_i(z_i(k)) − f_i(x*)) + τ Σ_{i=1}^m ‖v_i(k) − z_i(k)‖² + (η + μ − 2) Σ_{i=1}^m ‖v_i(k) − prox_{α_k f_i}(v_i(k))‖² − (1 − 1/μ) Σ_{i=1}^m ‖x_i(k + 1) − v_i(k)‖² + m N(τ, η)α_k².

From z_i(k) ∈ X (i ∈ V, k ≥ 0), the convexity of X ensures that z̄(k) ∈ X ⊂ X_i (i ∈ V). Accordingly, (A3) means that ‖ḡ_i(k)‖ ≤ M_i for all ḡ_i(k) ∈ ∂f_i(z̄(k)) (i ∈ V, k ≥ 0). The definition of ∂f_i (i ∈ V) and the Cauchy–Schwarz inequality thus guarantee that, for all i ∈ V and for all k ≥ 0,

f_i(z_i(k)) − f_i(x*) = f_i(z_i(k)) − f_i(z̄(k)) + f_i(z̄(k)) − f_i(x*) ≥ ⟨z_i(k) − z̄(k), ḡ_i(k)⟩ + f_i(z̄(k)) − f_i(x*) ≥ −M̄‖z_i(k) − z̄(k)‖ + f_i(z̄(k)) − f_i(x*),

where M̄ := max_{i∈V} M_i < ∞. Moreover, the convexity of ‖·‖ and the nonexpansivity of P_X imply that, for all i ∈ V and for all k ≥ 0, ‖z_i(k) − z̄(k)‖ = ‖(1/m) Σ_{l=1}^m (P_X(v_i(k)) − z_l(k))‖ ≤ (1/m) Σ_{l=1}^m ‖P_X(v_i(k)) − P_X(v_l(k))‖ ≤ (1/m) Σ_{l=1}^m ‖v_i(k) − v_l(k)‖, which, together with the triangle inequality, implies that, for all i ∈ V and for all k ≥ 0, ‖z_i(k) − z̄(k)‖ ≤ ‖v_i(k) − v̄(k)‖ + (1/m) Σ_{l=1}^m ‖v_l(k) − v̄(k)‖, where v̄(k) := (1/m) Σ_{l=1}^m v_l(k) (k ≥ 0). Accordingly, for all i ∈ V and for all k ≥ 0,

f_i(z_i(k)) − f_i(x*) ≥ −M̄‖v_i(k) − v̄(k)‖ − (M̄/m) Σ_{l=1}^m ‖v_l(k) − v̄(k)‖ + f_i(z̄(k)) − f_i(x*).  (6)

Hence, the definitions of f and f* imply that, for all k ≥ 0 and for all τ, η, μ > 0,

Σ_{i=1}^m ‖x_i(k + 1) − x*‖² ≤ Σ_{i=1}^m ‖x_i(k) − x*‖² + 2M̄α_k Σ_{i=1}^m ‖v_i(k) − v̄(k)‖ + 2M̄α_k Σ_{l=1}^m ‖v_l(k) − v̄(k)‖ − 2α_k(f(z̄(k)) − f*) + τ Σ_{i=1}^m ‖v_i(k) − z_i(k)‖² + (η + μ − 2) Σ_{i=1}^m ‖v_i(k) − prox_{α_k f_i}(v_i(k))‖² − (1 − 1/μ) Σ_{i=1}^m ‖x_i(k + 1) − v_i(k)‖² + m N(τ, η)α_k²,

which, together with d(v_i(k), X) = ‖v_i(k) − z_i(k)‖ and d(v_i(k), X_i^{Ω_i(k)}) ≤ ‖v_i(k) − x_i(k + 1)‖ (i ∈ V, k ≥ 0), implies that, for all k ≥ 0, for all τ, η > 0, and for all

μ > 1,

Σ_{i=1}^m ‖x_i(k + 1) − x*‖² ≤ Σ_{i=1}^m ‖x_i(k) − x*‖² + 4M̄α_k Σ_{i=1}^m ‖v_i(k) − v̄(k)‖ − 2α_k(f(z̄(k)) − f*) + τ Σ_{i=1}^m d(v_i(k), X)² − (1 − 1/μ) Σ_{i=1}^m d(v_i(k), X_i^{Ω_i(k)})² + (η + μ − 2) Σ_{i=1}^m ‖v_i(k) − prox_{α_k f_i}(v_i(k))‖² + m N(τ, η)α_k².

Accordingly, Assumption 2.5 guarantees that, almost surely, for all k ≥ 0, for all τ, η > 0, and for all μ > 1,

E[Σ_{i=1}^m ‖x_i(k + 1) − x*‖² | F_k] ≤ Σ_{i=1}^m ‖x_i(k) − x*‖² + 4M̄α_k Σ_{i=1}^m ‖v_i(k) − v̄(k)‖ − 2α_k(f(z̄(k)) − f*) + {τ − (1/c)(1 − 1/μ)} Σ_{i=1}^m d(v_i(k), X)² + (η + μ − 2) Σ_{i=1}^m ‖v_i(k) − prox_{α_k f_i}(v_i(k))‖² + m N(τ, η)α_k².  (7)

Setting τ := 1/(6c), η := 1/3, and μ := 3/2 (also see the proof of Lemma 3.2) in the inequality above leads to the finding that, almost surely, for all k ≥ 0,

E[Σ_{i=1}^m ‖x_i(k + 1) − x*‖² | F_k] ≤ Σ_{i=1}^m ‖x_i(k) − x*‖² − 2α_k(f(z̄(k)) − f*) + 4M̄α_k Σ_{i=1}^m ‖v_i(k) − v̄(k)‖ + m N(1/(6c), 1/3)α_k².  (8)

Therefore, since z̄(k) ∈ X implies f(z̄(k)) − f* ≥ 0 (k ≥ 0), Proposition 2.2, (C2), and Lemma 3.3 ensure that (‖x_i(k) − x*‖)_{k≥0} converges almost surely for all i ∈ V. Moreover,

Σ_{k=0}^∞ α_k(f(z̄(k)) − f*) < ∞  (9)

is almost surely satisfied. Now, let us assume that lim inf_{k→∞} f(z̄(k)) ≤ f* does not hold almost surely. Then, for all Ω̄ ∈ F with Pr{Ω̄} = 1, there exists ω ∈ Ω̄ such that lim inf_{k→∞} f(z̄(k)(ω)) − f* > 0. Hence, k_1 ≥ 0 and γ > 0 can be chosen

such that f(z̄(k)(ω)) − f* ≥ γ for all k ≥ k_1. Accordingly, (9) and (C1) mean that

∞ = γ Σ_{k=k_1}^∞ α_k ≤ Σ_{k=k_1}^∞ α_k(f(z̄(k)(ω)) − f*) < ∞,

which is a contradiction. Therefore, lim inf_{k→∞} f(z̄(k)) ≤ f* almost surely. From f(z̄(k)) − f* ≥ 0 (k ≥ 0), lim inf_{k→∞} f(z̄(k)) = f* almost surely. This completes the proof.

It is now possible to prove Theorem 3.1.

Proof: Lemma 3.4 guarantees the almost sure convergence of (x_i(k))_{k≥0} (i ∈ V). From (3), (v_i(k))_{k≥0} (i ∈ V) also converges almost surely. The definition of v̄(k) (k ≥ 0) implies the almost sure convergence of (v̄(k))_{k≥0}. Moreover, Lemma 3.2 implies that, for all i ∈ V, Σ_{k=0}^∞ d(v_i(k), X)² = Σ_{k=0}^∞ ‖v_i(k) − z_i(k)‖² < ∞ almost surely; i.e., lim_{k→∞} ‖v_i(k) − z_i(k)‖ = 0 almost surely. Accordingly, (z_i(k))_{k≥0} (i ∈ V) converges almost surely. This implies that (z̄(k))_{k≥0} converges almost surely to a point x*; i.e., (z̄(k))_{k≥0} converges in law to x*. Hence, the closedness of X and Proposition 2.3 guarantee that lim sup_{k→∞} Pr{z̄(k) ∈ X} ≤ Pr{x* ∈ X}. Since the definition of z̄(k) (k ≥ 0) implies that Pr{z̄(k) ∈ X} = 1 (k ≥ 0), Pr{x* ∈ X} = 1. Moreover, Lemma 3.4 and the continuity of f ensure that, almost surely, f(x*) = lim_{k→∞} f(z̄(k)) = lim inf_{k→∞} f(z̄(k)) = f*; i.e., x* ∈ X*.

The definitions of v̄(k) and z̄(k) (k ≥ 0) mean that, for all k ≥ 0, ‖v̄(k) − z̄(k)‖ ≤ (1/m) Σ_{i=1}^m ‖z_i(k) − v_i(k)‖, which, together with lim_{k→∞} ‖v_i(k) − z_i(k)‖ = 0 almost surely for all i ∈ V, implies that lim_{k→∞} ‖v̄(k) − z̄(k)‖ = 0 almost surely. Accordingly, the almost sure convergence of (z̄(k))_{k≥0} to x* ∈ X* guarantees that (v̄(k))_{k≥0} also converges almost surely to the same x* ∈ X*. Since Lemma 3.3 implies that Σ_{k=0}^∞ α_k‖v_i(k) − v̄(k)‖ < ∞ almost surely for all i ∈ V, a discussion similar to the one for obtaining lim inf_{k→∞} f(z̄(k)) ≤ f* almost surely (see the proof of Lemma 3.4) and (C1) guarantee that lim inf_{k→∞} ‖v_i(k) − v̄(k)‖ = 0 almost surely for all i ∈ V. Moreover, the triangle inequality implies that ‖v_i(k) − x*‖ ≤ ‖v_i(k) − v̄(k)‖ + ‖v̄(k) − x*‖ (i ∈ V, k ≥ 0). Hence, from lim_{k→∞} ‖v̄(k) − x*‖ = 0 and lim inf_{k→∞} ‖v_i(k) − v̄(k)‖ = 0 (i ∈ V) almost surely, lim inf_{k→∞} ‖v_i(k) − x*‖ = 0 almost surely for all i ∈ V. Therefore, the almost sure convergence of (v_i(k))_{k≥0} (i ∈ V) leads to the finding that, for all i ∈ V,

lim_{k→∞} ‖v_i(k) − x*‖ = 0 almost surely.  (10)

Since Lemma 3.3 ensures that, for all i ∈ V, lim_{k→∞} ‖e_i(k)‖² = lim_{k→∞} ‖x_i(k + 1) − v_i(k)‖² = 0 almost surely, (x_i(k))_{k≥0} (i ∈ V) converges almost surely to x* ∈ X*. This completes the proof.

3.2. Convergence rate analysis for Algorithm 3.1

The discussion in Subsection 3.1 leads to the finding that the sequence of the feasibility error (d(x_i(k), X)²)_{k≥0} and the sequence of the iteration error (‖x_i(k) − x*‖²)_{k≥0} are stochastically decreasing in the sense of the inequalities in the following proposition.

Proposition 3.1 Suppose that the assumptions in Theorem 3.1 hold, that x* ∈ X* is a solution to Problem 2.1, and that (x_i(k))_{k≥0} (i ∈ V) is the sequence generated by Algorithm 3.1. Then there exist β^(j) > 0 (j = 1, 2, 3, 4, 5) such that, almost surely, for all k ≥ 0,

E[Σ_{i=1}^m d(x_i(k + 1), X)² | F_k] ≤ Σ_{i=1}^m d(x_i(k), X)² − Σ_{j=1,2} β^(j) γ_k^(j) + β^(3) γ_k^(3),

E[Σ_{i=1}^m ‖x_i(k + 1) − x*‖² | F_k] ≤ Σ_{i=1}^m ‖x_i(k) − x*‖² − Σ_{j=1,2,5} β^(j) γ_k^(j) + Σ_{j=3,4} β^(j) γ_k^(j),

where γ_k^(1) := Σ_{i=1}^m d(v_i(k), X)², γ_k^(2) := Σ_{i=1}^m ‖v_i(k) − prox_{α_k f_i}(v_i(k))‖², γ_k^(3) := α_k², γ_k^(4) := α_k Σ_{i=1}^m ‖v_i(k) − v̄(k)‖, and γ_k^(5) := α_k(f(z̄(k)) − f*) (k ≥ 0) satisfy Σ_{k=0}^∞ γ_k^(j) < ∞ (j = 1, 2, 3, 4, 5).

Proof: Lemma 3.2 guarantees that γ_k^(1) := Σ_{i=1}^m d(v_i(k), X)² and γ_k^(2) := Σ_{i=1}^m ‖v_i(k) − prox_{α_k f_i}(v_i(k))‖² (k ≥ 0) satisfy Σ_{k=0}^∞ γ_k^(j) < ∞ almost surely for j = 1, 2. Condition (C2) implies that γ_k^(3) := α_k² (k ≥ 0) satisfies Σ_{k=0}^∞ γ_k^(3) < ∞. From Lemma 3.3 and (9), γ_k^(4) := α_k Σ_{i=1}^m ‖v_i(k) − v̄(k)‖ and γ_k^(5) := α_k(f(z̄(k)) − f*) (k ≥ 0) also satisfy Σ_{k=0}^∞ γ_k^(j) < ∞ almost surely for j = 4, 5. Set τ := 1/(6c), η := 1/3, and μ := 3/2, and put β^(1) := −(τ − (1/c)(1 − 1/μ)) = 1/(6c), β^(2) := 2 − η − μ = 1/6, β^(3) := m N(τ, η), β^(4) := 4M̄, and β^(5) := 2, where β^(3), β^(4) < ∞ hold from (A3) and M̄ := max_{i∈V} M_i. Accordingly, (5) and (7) ensure that Proposition 3.1 holds.

From −Σ_{j=1,2} β^(j) γ_k^(j) + β^(3) γ_k^(3) ≤ β^(3) γ_k^(3) = β^(3) α_k² and −Σ_{j=1,2,5} β^(j) γ_k^(j) + Σ_{j=3,4} β^(j) γ_k^(j) ≤ Σ_{j=3,4} β^(j) γ_k^(j) = (β^(3) α_k + β^(4) Σ_{i=1}^m ‖v_i(k) − v̄(k)‖)α_k (k ≥ 0), Proposition 3.1 and Theorem 3.1 (see (10) for the almost sure boundedness of (v_i(k))_{k≥0} (i ∈ V)) indicate that (x_i(k))_{k≥0} (i ∈ V) in Algorithm 3.1 converges almost surely to a solution to Problem 2.1 under the following convergence rate conditions: for all k ≥ 0,

E[Σ_{i=1}^m d(x_i(k + 1), X)² | F_k] ≤ Σ_{i=1}^m d(x_i(k), X)² + O(α_k²),

E[Σ_{i=1}^m ‖x_i(k + 1) − x*‖² | F_k] ≤ Σ_{i=1}^m ‖x_i(k) − x*‖² + O(α_k).  (11)

Moreover, under the condition that ‖x_i(k + 1) − x*‖ ≤ ‖x_i(k) − x*‖ holds for all i ∈ V and for a large enough k, (8) means that, almost surely,

f(z̄(k)) ≤ f* + 2M̄ Σ_{i=1}^m ‖v_i(k) − v̄(k)‖ + O(α_k),  (12)

where it is guaranteed from Theorem 3.1 that (‖v_i(k) − v̄(k)‖)_{k≥0} (i ∈ V) converges almost surely to 0.

From Proposition 3.1, the convergence rate of Algorithm 3.1 depends on β^(j) and (γ_k^(j))_{k≥0} (j = 1, 2, 3, 4, 5); i.e., the number of users m and the step size (α_k)_{k≥0}

(see (11) and (12)). When m is fixed, it is desirable to set (α_k)_{k≥0} so that, for all k ≥ 0, E[Σ_{i=1}^m d(x_i(k + 1), X)² | F_k] < Σ_{i=1}^m d(x_i(k), X)² and E[Σ_{i=1}^m ‖x_i(k + 1) − x*‖² | F_k] < Σ_{i=1}^m ‖x_i(k) − x*‖² are almost surely satisfied. Accordingly, Algorithm 3.1 converges quickly if (α_k)_{k≥0} is chosen so as to satisfy −Σ_{j=1,2} β^(j) γ_k^(j) + β^(3) γ_k^(3) < 0 and −Σ_{j=1,2,5} β^(j) γ_k^(j) + Σ_{j=3,4} β^(j) γ_k^(j) < 0 (k ≥ 0). Hence, it would be desirable to set (α_k)_{k≥0} so as to satisfy Σ_{j=3,4} β^(j) γ_k^(j) = β^(3) α_k² + β^(4) α_k Σ_{i=1}^m ‖v_i(k) − v̄(k)‖ ≈ 0 as much as possible, e.g., to set α_k := α/(k + 1) with a small positive constant α. Section 5 gives numerical examples showing that Algorithm 3.1 with α_k := 10⁻³/(k + 1) converges more quickly than Algorithm 3.1 with α_k := 1/(k + 1).

Here, let us compare the convergence analysis of Algorithm 3.1 (Theorem 3.1 and Proposition 3.1) with the convergence analysis of the incremental subgradient-proximal methods in [4, Section 4]. The problem considered in [4] was

minimize F(x) := Σ_{i∈V} F_i(x) := Σ_{i∈V} (f_i(x) + h_i(x)) subject to x ∈ X,  (13)

where f_i, h_i : R^d → R (i ∈ V) are convex and X ⊂ R^d is nonempty, closed, and convex. Under the conditions that the f_i (i ∈ V) are suitable for a proximal iteration and the subgradients of the h_i (i ∈ V) are efficiently computed, the following method [4, Section 4, (4.2)] was proposed for solving Problem (13):

z(k) := prox_{α_k f_{ω_k}}(x(k)), g_{ω_k} ∈ ∂h_{ω_k}(z(k)), x(k + 1) := P_X(z(k) − α_k g_{ω_k}),  (14)

where (ω_k)_{k≥0} ⊂ V := {1, 2, ..., m} is a sequence of random variables. Algorithm (14) is an incremental optimization algorithm that uses the metric projection onto the whole constraint set X and the proximity operator and subdifferential of one function selected randomly from {F_i}_{i∈V}, while Algorithm 3.1 is a distributed optimization algorithm that uses the proximity operator of each user's objective function and the metric projection onto one closed convex set selected randomly from each user's constraint sets. Proposition 4.3 in [4] shows that, under certain assumptions, Algorithm (14) with (α_k)_{k≥0} satisfying (C1) and (C2) converges almost surely to some random point in the solution set of Problem (13). Theorem 3.1 guarantees the almost sure convergence of Algorithm 3.1 under the conditions (C1) and (C2) to a random point in X*. Proposition 4.2 in [4] indicates that, under certain assumptions, Algorithm (14) with α_k := α > 0 (k ≥ 0) satisfies, for all ϵ > 0, almost surely,

min_{0≤k≤N} F(x(k)) ≤ F* + (5αmc² + ϵ)/2,  (15)

where F* stands for the optimal value of Problem (13), c ∈ R is a constant, X* is the solution set of Problem (13), and N is a random variable with E[N] ≤ m d(x(0), X*)²/(αϵ). From (15) and Proposition 3.1, the convergence rates of Algorithms 3.1 and (14) depend on the number of elements in V and the step size (α_k)_{k≥0}.
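For reference, one pass of the incremental update (14) can be sketched directly from the three assignments above. The sketch is illustrative only: incremental_step, prox_f, subgrad_h, and proj_X are hypothetical names standing for prox_{α f_i}, a selection of ∂h_i, and P_X, which algorithm (14) assumes are available for the whole constraint set X.

```python
def incremental_step(x, alpha, omega, prox_f, subgrad_h, proj_X):
    """One iteration of the incremental subgradient-proximal method (14).

    omega is the index of the randomly selected component function at time k;
    x is the current iterate x(k) as a NumPy array.
    """
    z = prox_f(omega, x, alpha)       # z(k) = prox_{alpha_k f_omega}(x(k))
    g = subgrad_h(omega, z)           # g in the subdifferential of h_omega at z(k)
    return proj_X(z - alpha * g)      # x(k+1) = P_X(z(k) - alpha_k g)
```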

4. Distributed random projected subgradient algorithm

This section presents the subgradient algorithm with random projections for solving Problem 2.1.

Algorithm 4.1
Step 0. User i (i ∈ V) sets x_i(0) arbitrarily.
Step 1. User i (i ∈ V) receives x_j(k) from its neighboring users j ∈ N_i(k) and computes the weighted average v_i(k) defined as in (3) and the subgradient g_i(k) ∈ ∂f_i(v_i(k)). User i updates its estimate x_i(k + 1) by

x_i(k + 1) := P_{X_i^{Ω_i(k)}}(v_i(k) − α_k g_i(k)).  (16)

The algorithm sets k := k + 1 and returns to Step 1.

In this section, Assumption (A3) is replaced with

(A3)' For all i ∈ V, there exists M_i ∈ R such that sup{‖g‖ : x ∈ X_i, g ∈ ∂f_i(x)} ≤ M_i. For all i ∈ V, there exists C_i ∈ R such that sup{‖g_i(k)‖ : g_i(k) ∈ ∂f_i(v_i(k)), k ≥ 0} ≤ C_i.

It is obvious that Assumption (A3)' is stronger than Assumption (A3). Assumption (A3)' holds when f_i (i ∈ V) is Lipschitz continuous on R^d (Proposition 2.1(iv)). The boundedness of (g_i(k))_{k≥0} (i ∈ V) is needed to show that Algorithm 4.1 satisfies Σ_{k=0}^∞ d(v_i(k), X)² < ∞ almost surely for all i ∈ V, which is the essential part of the convergence analysis of Algorithm 4.1 (see Lemmas 4.1 and 4.2 for details).

Let us do a convergence analysis of Algorithm 4.1.

Theorem 4.1 Under Assumptions 2.1–2.5, with (A3) replaced by (A3)', the sequence (x_i(k))_{k≥0} (i ∈ V) generated by Algorithm 4.1 converges almost surely to a random point x* ∈ X*.

Let us compare the distributed random projected gradient algorithm [24, (2a) and (2b)] with Algorithm 4.1. The algorithm [24, (2a) and (2b)] is the pioneering distributed optimization algorithm that is based on local communications of users' estimates in a network and a gradient descent with random projections. It can be applied to Problem 2.1 when f_i (i ∈ V) is convex and differentiable with the Lipschitz continuous gradient ∇f_i [24, Assumption 1 c)]. The algorithm [24, (2a) and (2b)] is

x_i(k + 1) := P_{X_i^{Ω_i(k)}}(v_i(k) − α_k ∇f_i(v_i(k))),  (17)

where v_i(k) is defined as in (3). Proposition 1 in [24] indicates that, under the assumptions in Theorem 3.1, the sequence (x_i(k))_{k≥0} (i ∈ V) generated by Algorithm (17) converges almost surely to x* ∈ X*. In contrast to Algorithm (17), Algorithm 4.1 can be applied to nonsmooth convex optimization (see Assumption 2.1(A2)) by using the subgradients g_i(k) ∈ ∂f_i(v_i(k)) (i ∈ V, k ≥ 0) and enables all users to arrive at the same solution to Problem 2.1 under the assumptions in Theorem 4.1, which are stronger than the ones in Theorem 3.1. In Algorithm 3.1, each user sets x_i(k + 1) by using its proximity operator (see (4) for the definition of x_i(k) (k ≥ 0) in Algorithm 3.1) and can solve Problem 2.1 under the assumptions in Theorem 3.1 (see Theorem 3.1).
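In code, the only change from the Algorithm 3.1 sketch given in Section 3 is that the proximal step is replaced by an explicit subgradient step. Again, this is an illustrative sketch with hypothetical names; subgrad_i is a user-supplied callable returning one element of ∂f_i, and projections_i collects the metric projections P_{X_i^j}.

```python
def algorithm41_step(x, neighbors, weights, subgrad_i, projections_i, alpha, rng):
    """One iteration of Algorithm 4.1 for user i (cf. (16))."""
    # Weighted average (3) of the estimates received from the neighbors.
    v_i = sum(weights[j] * x[j] for j in neighbors)
    # Subgradient step followed by projection onto one randomly chosen simple set.
    g_i = subgrad_i(v_i)                              # g_i(k) in the subdifferential of f_i at v_i(k)
    omega = rng.integers(len(projections_i))          # Omega_i(k)
    return projections_i[omega](v_i - alpha * g_i)    # x_i(k+1)
```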

4.1. Proof of Theorem 4.1

The proof starts with the following lemma, which can be proven by referring to the proof of [26, Lemma 2].

Lemma 4.1 Suppose that Assumption 2.1 holds. Then for all x ∈ X, for all i ∈ V, for all z ∈ X, for all k ≥ 0, and for all τ, η > 0,

‖x_i(k + 1) − x‖² ≤ ‖v_i(k) − x‖² − 2α_k(f_i(z) − f_i(x)) + M(τ, η)α_k² + τ‖v_i(k) − z‖² + (η − 1)‖v_i(k) − x_i(k + 1)‖²,

where M(τ, η) := max_{i∈V}(M_i²/τ + C_i²/η) < ∞, and M_i and C_i (i ∈ V) are as in Assumption (A3)'.

The following lemma can also be proven by referring to the proof of Lemma 3.2 and by using Lemma 4.1 (see [20] for the details of the proof of Lemma 4.2).

Lemma 4.2 Suppose that Assumptions 2.1, 2.5, and 2.3 hold. Then Σ_{k=0}^∞ d(v_i(k), X)² < ∞ almost surely for all i ∈ V.

Proof: A discussion similar to the one for obtaining (5) implies that, almost surely, for all k ≥ 0,

E[Σ_{i=1}^m d(x_i(k + 1), X)² | F_k] ≤ Σ_{j=1}^m d(x_j(k), X)² − (1/(4c)) Σ_{i=1}^m d(v_i(k), X)² + m M(1/(2c), 1/4)α_k².  (18)

Proposition 2.2 and (C2) lead to Σ_{k=0}^∞ Σ_{i=1}^m d(v_i(k), X)² < ∞ almost surely; i.e., Σ_{k=0}^∞ d(v_i(k), X)² < ∞ almost surely for all i ∈ V. This completes the proof.

A discussion similar to the one for proving Lemma 3.3 leads to the following (the proof of Lemma 4.3 is also described in [20]).

Lemma 4.3 Suppose that Assumptions 2.1, 2.2, and 2.3 hold, and define v̄(k) := (1/m) Σ_{l=1}^m v_l(k) and e_i(k) := x_i(k + 1) − v_i(k) for all k ≥ 0 and for all i ∈ V. Then Σ_{k=0}^∞ ‖e_i(k)‖² < ∞ and Σ_{k=0}^∞ α_k‖v_i(k) − v̄(k)‖ < ∞ almost surely for all i ∈ V.

The same discussion as for the proof of Lemma 3.4 leads to the following (see [20] for the details of the proof of Lemma 4.4).

Lemma 4.4 Suppose that the assumptions in Theorem 4.1 are satisfied and define z_i(k) := P_X(v_i(k)) for all i ∈ V and for all k ≥ 0 and define z̄(k) := (1/m) Σ_{i=1}^m z_i(k) for all k ≥ 0. Then the sequence (‖x_i(k) − x*‖)_{k≥0} converges almost surely for all i ∈ V and for all x* ∈ X*, and lim inf_{k→∞} f(z̄(k)) = f* almost surely.

Proof: Choose x* ∈ X* arbitrarily. It can be observed that (6) holds for Algorithm 4.1 because the definitions of z̄(k) and v̄(k) (k ≥ 0) in the proof of Theorem 3.1 are the same as the definitions of z̄(k) and v̄(k) (k ≥ 0) in Lemma 4.4. A discussion similar to the one for proving (7) means that, almost surely, for all k ≥ 0, for all

τ > 0, and for all η ∈ (0, 1),

E[Σ_{i=1}^m ‖x_i(k + 1) − x*‖² | F_k] ≤ Σ_{i=1}^m ‖x_i(k) − x*‖² + 4M̄α_k Σ_{i=1}^m ‖v_i(k) − v̄(k)‖ − 2α_k(f(z̄(k)) − f*) + {τ + (η − 1)/c} Σ_{i=1}^m d(v_i(k), X)² + m M(τ, η)α_k²,  (19)

which, together with τ := 1/(2c), η := 1/4, and τ + (η − 1)/c = −1/(4c), implies that, almost surely, for all k ≥ 0,

E[Σ_{i=1}^m ‖x_i(k + 1) − x*‖² | F_k] ≤ Σ_{i=1}^m ‖x_i(k) − x*‖² − 2α_k(f(z̄(k)) − f*) + 4M̄α_k Σ_{i=1}^m ‖v_i(k) − v̄(k)‖ + m M(1/(2c), 1/4)α_k²,  (20)

where M(1/(2c), 1/4) < ∞ holds from (A3)'. Therefore, Proposition 2.2, (C2), and Lemma 4.3 ensure that (Σ_{i=1}^m ‖x_i(k) − x*‖²)_{k≥0} converges almost surely; i.e., (‖x_i(k) − x*‖)_{k≥0} converges almost surely for all i ∈ V. Moreover, since z̄(k) ∈ X implies f(z̄(k)) − f* ≥ 0 (k ≥ 0), there is also another finding: almost surely,

Σ_{k=0}^∞ α_k(f(z̄(k)) − f*) < ∞.  (21)

Hence, a discussion similar to the one for proving Lemma 3.4 leads to lim inf_{k→∞} f(z̄(k)) = f* almost surely. This completes the proof.

Theorem 4.1 can be proven by referring to the proof of Theorem 3.1.

Proof: By referring to the proof of Theorem 3.1, the almost sure convergence of (x_i(k))_{k≥0} (i ∈ V) and Lemma 4.2 lead to the conclusion that (v_i(k))_{k≥0}, (z_i(k))_{k≥0} (i ∈ V), (v̄(k))_{k≥0}, and (z̄(k))_{k≥0} converge almost surely. Moreover, Lemma 4.4 and the continuity of f ensure that (v̄(k))_{k≥0} and (z̄(k))_{k≥0} converge almost surely to x* ∈ X*. By referring to the proof of Theorem 3.1, Lemma 4.3, (C1), and the triangle inequality guarantee that lim_{k→∞} ‖v_i(k) − x*‖ = 0 almost surely for all i ∈ V. Since Lemma 4.3 ensures that, for all i ∈ V, lim_{k→∞} ‖e_i(k)‖² = lim_{k→∞} ‖x_i(k + 1) − v_i(k)‖² = 0 almost surely, (x_i(k))_{k≥0} (i ∈ V) converges almost surely to x* ∈ X*. This completes the proof.

4.2. Convergence rate analysis for Algorithm 4.1

The following proposition is proven by referring to the discussion in Subsection 4.1.

Proposition 4.1 Suppose that the assumptions in Theorem 4.1 hold, that x* ∈ X* is a solution to Problem 2.1, and that (x_i(k))_{k≥0} (i ∈ V) is the sequence generated by Algorithm 4.1. Then there exist β^(j) > 0 (j = 1, 2, 3, 4) such that, almost surely,

for all k ≥ 0,

E[Σ_{i=1}^m d(x_i(k + 1), X)² | F_k] ≤ Σ_{i=1}^m d(x_i(k), X)² − β^(1) γ_k^(1) + β^(2) γ_k^(2),

E[Σ_{i=1}^m ‖x_i(k + 1) − x*‖² | F_k] ≤ Σ_{i=1}^m ‖x_i(k) − x*‖² − Σ_{j=1,4} β^(j) γ_k^(j) + Σ_{j=2,3} β^(j) γ_k^(j),

where γ_k^(1) := Σ_{i=1}^m d(v_i(k), X)², γ_k^(2) := α_k², γ_k^(3) := α_k Σ_{i=1}^m ‖v_i(k) − v̄(k)‖, and γ_k^(4) := α_k(f(z̄(k)) − f*) (k ≥ 0) satisfy Σ_{k=0}^∞ γ_k^(j) < ∞ (j = 1, 2, 3, 4).

Proof: From Lemma 4.2 and (C2), γ_k^(1) := Σ_{i=1}^m d(v_i(k), X)² and γ_k^(2) := α_k² (k ≥ 0) satisfy Σ_{k=0}^∞ γ_k^(1) < ∞ almost surely and Σ_{k=0}^∞ γ_k^(2) < ∞. Lemma 4.3 and (21) ensure that γ_k^(3) := α_k Σ_{i=1}^m ‖v_i(k) − v̄(k)‖ and γ_k^(4) := α_k(f(z̄(k)) − f*) (k ≥ 0) also satisfy Σ_{k=0}^∞ γ_k^(j) < ∞ almost surely for j = 3, 4. Set τ := 1/(2c) and η := 1/4, and put β^(1) := −(τ + (η − 1)/c) = 1/(4c), β^(2) := m M(τ, η), β^(3) := 4M̄, and β^(4) := 2, where β^(2), β^(3) < ∞ hold from (A3)' and M̄ := max_{i∈V} M_i. Accordingly, (18) and (19) ensure that Proposition 4.1 holds.

A discussion similar to the one for obtaining (11), Proposition 4.1, and Theorem 4.1 indicate that (x_i(k))_{k≥0} (i ∈ V) in Algorithm 4.1 converges almost surely to a solution to Problem 2.1 with the following convergence rates: for all k ≥ 0,

E[Σ_{i=1}^m d(x_i(k + 1), X)² | F_k] ≤ Σ_{i=1}^m d(x_i(k), X)² + O(α_k²),

E[Σ_{i=1}^m ‖x_i(k + 1) − x*‖² | F_k] ≤ Σ_{i=1}^m ‖x_i(k) − x*‖² + O(α_k).  (22)

Under the condition that ‖x_i(k + 1) − x*‖ ≤ ‖x_i(k) − x*‖ for all i ∈ V and for a large enough k, (20) implies that, almost surely,

f(z̄(k)) ≤ f* + 2M̄ Σ_{i=1}^m ‖v_i(k) − v̄(k)‖ + O(α_k),  (23)

where (‖v_i(k) − v̄(k)‖)_{k≥0} is an almost surely convergent sequence (see the proof of Theorem 4.1). It can be observed from (11), (12), (22), and (23) that Algorithms 3.1 and 4.1 have almost the same convergence rate. Section 5 gives numerical examples showing that Algorithms 3.1 and 4.1 have almost the same convergence rate when they have the same step size.

Next, let us consider the case in which f_i (i ∈ V) is convex and differentiable and ∇f_i (i ∈ V) satisfies the Lipschitz continuity condition [24, Assumption 1 c)]. Then Algorithm 4.1 coincides with the first distributed random projection algorithm [24, (2a) and (2b)] (see (17)) defined as follows for all k ≥ 0 and for all i ∈ V:

x_i(k + 1) := P_{X_i^{Ω_i(k)}}(v_i(k) − α_k ∇f_i(v_i(k))),  (24)

where v_i(k) (i ∈ V, k ≥ 0) is as in (3) and (α_k)_{k≥0} satisfies (C1) and (C2). The proof of Lemma 5 and (18) in [24] indicate that algorithm (24), under (A3), almost

surely satisfies the following convergence rate conditions for a large enough k:

E[Σ_{i=1}^m d(x_i(k + 1), X)² | F_k] ≤ (1 + O(α_k²)) Σ_{i=1}^m d(x_i(k), X)² + O(α_k²),

E[Σ_{i=1}^m ‖x_i(k + 1) − x*‖² | F_k] ≤ (1 + O(α_k²)) Σ_{i=1}^m ‖x_i(k) − x*‖² + O(α_k).  (25)

Proposition 4.1 implies that, if the stronger assumption (A3)' is satisfied, Algorithm (24) satisfies (22), which are better convergence rate properties than (25).

Let us compare the convergence analysis used here (Theorem 4.1 and Proposition 4.1) with that of the incremental subgradient methods in [27]. The following incremental subgradient method with randomization can optimize the sum of nonsmooth, convex functions Σ_{i∈V} f_i over a nonempty, closed convex set X [27, (3.1)]:

g_{ω_k} ∈ ∂f_{ω_k}(x(k)), x(k + 1) := P_X(x(k) − α_k g_{ω_k}),  (26)

where (ω_k)_{k≥0} ⊂ V is a sequence of random variables. Algorithm (26) is an incremental optimization algorithm that uses the metric projection onto the whole constraint set X and the subdifferential of one function selected randomly from {f_i}_{i∈V}, while Algorithm 4.1 is a distributed optimization algorithm that uses the subdifferential of each user's objective function and the metric projection onto one closed convex set selected randomly from each user's constraint sets. Under certain assumptions, Algorithm (26) with (α_k)_{k≥0} satisfying (C1) and (C2) converges almost surely to some minimizer of Σ_{i∈V} f_i over X [27, Proposition 3.2]. Theorem 4.1 with conditions (C1) and (C2) indicates the almost sure convergence of Algorithm 4.1 to a random point in X*. Proposition 3.1 in [27] indicates that, under certain assumptions, Algorithm (26) with α_k := α > 0 (k ≥ 0) satisfies the following relationship almost surely:

inf_{k≥0} Σ_{i∈V} f_i(x(k)) ≤ Σ_{i∈V} f_i(x*) + αmc²/2,  (27)

where x* ∈ X represents the minimizer of Σ_{i∈V} f_i over X and c ∈ R is a constant. From (27) and Proposition 4.1, the convergence rates of Algorithms 4.1 and (26) depend on the number of elements in V and the step size (α_k)_{k≥0}, as seen for Algorithms 3.1 and (14) (see Subsection 3.2).

5. Numerical evaluation

Let us apply Algorithms 3.1 and 4.1 to Problem 2.1 with f_i : R^d → R and X_i ⊂ R^d (i ∈ V := {1, 2, ..., m}) defined by

f_i(x) := Σ_{j=1}^d a_{ij} |x_j − b_{ij}| and X_i := ∩_{j=1}^d {x ∈ R^d : ‖x − c_{ij}‖ ≤ r_{ij}},

where a_i := (a_{ij})_{j=1}^d, b_i := (b_{ij})_{j=1}^d, r_i := (r_{ij})_{j=1}^d ∈ R^d_+ (i ∈ V) and c_{ij} ∈ R^d (i ∈ V, j = 1, 2, ..., d), meaning that this is the problem of minimizing the sum of the weighted L1-norms f(x) = Σ_{i=1}^m f_i(x) over the intersection of closed balls X = ∩_{i=1}^m X_i. The metric projection onto X_i^j := {x ∈ R^d : ‖x − c_{ij}‖ ≤ r_{ij}} (i ∈ V, j = 1, 2, ..., d) can be computed within a finite number of arithmetic operations [1, Chapter 28]. Function f_i (i ∈ V) satisfies the Lipschitz continuity condition. Hence, Assumptions 2.1 and 2.5 hold. The set X_i^{Ω_i(k)} (i ∈ V, k ≥ 0) used in the numerical evaluation was chosen randomly from the sets X_i^j so as to satisfy Assumption 2.4. The subdifferential ∂f_i and the proximity operator prox_{αf_i} (i ∈ V, α > 0) can be calculated explicitly [13, Lemma 10, (30), (35)].

Figure 1: Network model used in the numerical evaluation, where users within each marked area can communicate with each other.

In the evaluation, a network with 48 users (i.e., m := 48) and 16 subnetworks was used, as illustrated in Figure 1. It was assumed that users within each marked area (i.e., all users in each subnetwork) could communicate with each other. For example, user 2 could communicate with users 1, 3, and 4 (i.e., N_2(k) = {1, 2, 3, 4}), while user 1 could communicate with not only users 2, 3, and 4 but also users 46, 47, and 48 (i.e., N_1(k) = {1, 2, 3, 4, 46, 47, 48}). The weighted parameters w_{ij}(k) (i ∈ V, j ∈ N_i(k)) were set to satisfy Assumption 2.3 (e.g., user 2 had w_{2,j}(k) (j ∈ N_2(k)) such that w_{2,j}(k) = w_{j,2}(k) = 3/8 (j = 2, 3) and w_{2,j}(k) = w_{j,2}(k) = 1/8 (j = 1, 4), and user 1 had w_{1,j}(k) (j ∈ N_1(k)) such that w_{1,1}(k) = 2/8 and w_{1,j}(k) = w_{j,1}(k) = 1/8 (j = 2, 3, 4, 46, 47, 48)). The point v_i(k) (i ∈ V) defined in (3) (e.g., v_2(k) = (3/8)(x_2(k) + x_3(k)) + (1/8)(x_1(k) + x_4(k))) was computed by passing along x_j(k) (j ∈ N_i(k)) in a prearranged cyclic order.

The computer used in the evaluation had two Intel Xeon E v3 (2.60 GHz) CPUs. Each had 8 physical cores and 16 threads; i.e., the total number of cores was 16, and the total number of threads was 32. The computer had 64 GB of DDR4 memory and ran the Ubuntu (Linux kernel: generic, 64 bit) operating system. The evaluation programs were run in Python 3.4.0; NumPy was used to compute the linear algebra operations.

We set d := 100 and m := 48 and used a_i := (a_{ij})_{j=1}^d ∈ (0, 1]^d, b_i := (b_{ij})_{j=1}^d ∈ [0, 1)^d, r_i := (r_{ij})_{j=1}^d ∈ [3, 4)^d (i ∈ V), and c_{ij} ∈ [−(3/4)d, (3/4)d)^d (i ∈ V, j = 1, 2, ..., d) to satisfy X ≠ ∅, generated randomly by numpy.random. One hundred samplings were performed, each starting from different random initial points x_i(0) (i ∈ V) in the range [−2, 2]^d, and the results were averaged.
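Both building blocks needed by Algorithms 3.1 and 4.1 have closed forms for this test problem. The sketch below is a reconstruction from the problem data above, not the author's evaluation code: since f_i is separable, prox_{αf_i} reduces to componentwise soft-thresholding around b_i; the projection onto a closed ball is a radial rescaling; and a_i·sign(v − b_i) is one valid subgradient of f_i at v.

```python
import numpy as np

def prox_weighted_l1(v, a, b, alpha):
    """prox of f_i(x) = sum_j a_ij |x_j - b_ij|: soft-thresholding of v - b with thresholds alpha*a."""
    u = v - b
    return b + np.sign(u) * np.maximum(np.abs(u) - alpha * a, 0.0)

def proj_ball(v, c, r):
    """Metric projection onto the closed ball X_i^j = {x : ||x - c|| <= r}."""
    d = v - c
    n = np.linalg.norm(d)
    return v if n <= r else c + r * d / n

def subgrad_weighted_l1(v, a, b):
    """One subgradient of f_i at v (sign(0) = 0 is a valid choice at the kinks)."""
    return a * np.sign(v - b)
```

With these three functions, the updates (4) and (16) for this problem are exactly the per-user sketches given in Sections 3 and 4.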

From the discussion in Subsections 3.2 and 4.2, it can be expected that Algorithms 3.1 and 4.1 with small step sizes converge quickly. To see how the step size affects their convergence rates, we compared their rates for α_k = 1/(k + 1) with those for α_k = 10⁻³/(k + 1). Step size α_k = 1/(k + 1) is the simplest sequence satisfying (C1) and (C2) in Assumption 2.3. The selection of step size α_k = 10⁻³/(k + 1) was based on the numerical results in [16, 22], which presented fixed point optimization algorithms that minimize the sum of smooth, convex functions over the intersection of simple, closed convex sets. The previous results [16, 22] indicated that fixed point optimization algorithms with a small step size such as α_k = 10⁻³/(k + 1) converge more quickly than ones with the standard step size α_k = 1/(k + 1). Accordingly, the numerical evaluation described here used step sizes α_k = 1/(k + 1) and 10⁻³/(k + 1) and investigated which one resulted in quicker convergence for Algorithms 3.1 and 4.1.

Let us define two performance measures for each i ∈ V and for all k ≥ 0,

D_i(k) := ‖x_i(k) − (1/d) Σ_{j=1}^d P_{X_i^j}(x_i(k))‖ and F_i(k) := f_i(x_i(k)),  (28)

and observe the behaviors of D_i(k) and F_i(k) for Algorithms 3.1 and 4.1 with α_k = 1/(k + 1) and 10⁻³/(k + 1). If (D_i(k))_{k≥0} (i ∈ V) converges to 0, (x_i(k))_{k≥0} converges to a fixed point of (1/d) Σ_{j=1}^d P_{X_i^j}, i.e., to a point in X_i = ∩_{j=1}^d X_i^j [1, Corollary 4.37].

First, let us observe the behaviors of (x_1(k))_{k=0}^{1000} calculated for user 1, which belongs to two subnetworks. Figures 2 and 3 show that Algorithm 3.1 converged more quickly to a point in X_1 with α_k = 10⁻³/(k + 1) than with α_k = 1/(k + 1). From Figures 2 and 3, it can be seen that Algorithm 3.1 with α_k = 10⁻³/(k + 1) optimizes f_1 over X_1 in the early stages.

Figure 2: Behaviors of D_1(k) for Algorithm 3.1 with α_k = 1/(k + 1) and 10⁻³/(k + 1): (a) D_1(k) vs. number of iterations; (b) D_1(k) vs. elapsed time.

Figure 3: Behaviors of F_1(k) for Algorithm 3.1 with α_k = 1/(k + 1) and 10⁻³/(k + 1): (a) F_1(k) vs. number of iterations; (b) F_1(k) vs. elapsed time.


More information

APPENDIX A Some Linear Algebra

APPENDIX A Some Linear Algebra APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,

More information

Finding Dense Subgraphs in G(n, 1/2)

Finding Dense Subgraphs in G(n, 1/2) Fndng Dense Subgraphs n Gn, 1/ Atsh Das Sarma 1, Amt Deshpande, and Rav Kannan 1 Georga Insttute of Technology,atsh@cc.gatech.edu Mcrosoft Research-Bangalore,amtdesh,annan@mcrosoft.com Abstract. Fndng

More information

Perfect Competition and the Nash Bargaining Solution

Perfect Competition and the Nash Bargaining Solution Perfect Competton and the Nash Barganng Soluton Renhard John Department of Economcs Unversty of Bonn Adenauerallee 24-42 53113 Bonn, Germany emal: rohn@un-bonn.de May 2005 Abstract For a lnear exchange

More information

Complete subgraphs in multipartite graphs

Complete subgraphs in multipartite graphs Complete subgraphs n multpartte graphs FLORIAN PFENDER Unverstät Rostock, Insttut für Mathematk D-18057 Rostock, Germany Floran.Pfender@un-rostock.de Abstract Turán s Theorem states that every graph G

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

FACTORIZATION IN KRULL MONOIDS WITH INFINITE CLASS GROUP

FACTORIZATION IN KRULL MONOIDS WITH INFINITE CLASS GROUP C O L L O Q U I U M M A T H E M A T I C U M VOL. 80 1999 NO. 1 FACTORIZATION IN KRULL MONOIDS WITH INFINITE CLASS GROUP BY FLORIAN K A I N R A T H (GRAZ) Abstract. Let H be a Krull monod wth nfnte class

More information

1 Convex Optimization

1 Convex Optimization Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS

A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS HCMC Unversty of Pedagogy Thong Nguyen Huu et al. A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS Thong Nguyen Huu and Hao Tran Van Department of mathematcs-nformaton,

More information

EEE 241: Linear Systems

EEE 241: Linear Systems EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they

More information

A note on almost sure behavior of randomly weighted sums of φ-mixing random variables with φ-mixing weights

A note on almost sure behavior of randomly weighted sums of φ-mixing random variables with φ-mixing weights ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA Volume 7, Number 2, December 203 Avalable onlne at http://acutm.math.ut.ee A note on almost sure behavor of randomly weghted sums of φ-mxng

More information

ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EQUATION

ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EQUATION Advanced Mathematcal Models & Applcatons Vol.3, No.3, 2018, pp.215-222 ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EUATION

More information

A 2D Bounded Linear Program (H,c) 2D Linear Programming

A 2D Bounded Linear Program (H,c) 2D Linear Programming A 2D Bounded Lnear Program (H,c) h 3 v h 8 h 5 c h 4 h h 6 h 7 h 2 2D Lnear Programmng C s a polygonal regon, the ntersecton of n halfplanes. (H, c) s nfeasble, as C s empty. Feasble regon C s unbounded

More information

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso Supplement: Proofs and Techncal Detals for The Soluton Path of the Generalzed Lasso Ryan J. Tbshran Jonathan Taylor In ths document we gve supplementary detals to the paper The Soluton Path of the Generalzed

More information

Math 217 Fall 2013 Homework 2 Solutions

Math 217 Fall 2013 Homework 2 Solutions Math 17 Fall 013 Homework Solutons Due Thursday Sept. 6, 013 5pm Ths homework conssts of 6 problems of 5 ponts each. The total s 30. You need to fully justfy your answer prove that your functon ndeed has

More information

Lecture 4: November 17, Part 1 Single Buffer Management

Lecture 4: November 17, Part 1 Single Buffer Management Lecturer: Ad Rosén Algorthms for the anagement of Networs Fall 2003-2004 Lecture 4: November 7, 2003 Scrbe: Guy Grebla Part Sngle Buffer anagement In the prevous lecture we taled about the Combned Input

More information

Matrix Approximation via Sampling, Subspace Embedding. 1 Solving Linear Systems Using SVD

Matrix Approximation via Sampling, Subspace Embedding. 1 Solving Linear Systems Using SVD Matrx Approxmaton va Samplng, Subspace Embeddng Lecturer: Anup Rao Scrbe: Rashth Sharma, Peng Zhang 0/01/016 1 Solvng Lnear Systems Usng SVD Two applcatons of SVD have been covered so far. Today we loo

More information

The lower and upper bounds on Perron root of nonnegative irreducible matrices

The lower and upper bounds on Perron root of nonnegative irreducible matrices Journal of Computatonal Appled Mathematcs 217 (2008) 259 267 wwwelsevercom/locate/cam The lower upper bounds on Perron root of nonnegatve rreducble matrces Guang-Xn Huang a,, Feng Yn b,keguo a a College

More information

Economics 101. Lecture 4 - Equilibrium and Efficiency

Economics 101. Lecture 4 - Equilibrium and Efficiency Economcs 0 Lecture 4 - Equlbrum and Effcency Intro As dscussed n the prevous lecture, we wll now move from an envronment where we looed at consumers mang decsons n solaton to analyzng economes full of

More information

Some modelling aspects for the Matlab implementation of MMA

Some modelling aspects for the Matlab implementation of MMA Some modellng aspects for the Matlab mplementaton of MMA Krster Svanberg krlle@math.kth.se Optmzaton and Systems Theory Department of Mathematcs KTH, SE 10044 Stockholm September 2004 1. Consdered optmzaton

More information

NP-Completeness : Proofs

NP-Completeness : Proofs NP-Completeness : Proofs Proof Methods A method to show a decson problem Π NP-complete s as follows. (1) Show Π NP. (2) Choose an NP-complete problem Π. (3) Show Π Π. A method to show an optmzaton problem

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

On the Multicriteria Integer Network Flow Problem

On the Multicriteria Integer Network Flow Problem BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 5, No 2 Sofa 2005 On the Multcrtera Integer Network Flow Problem Vassl Vasslev, Marana Nkolova, Maryana Vassleva Insttute of

More information

Inexact Alternating Minimization Algorithm for Distributed Optimization with an Application to Distributed MPC

Inexact Alternating Minimization Algorithm for Distributed Optimization with an Application to Distributed MPC Inexact Alternatng Mnmzaton Algorthm for Dstrbuted Optmzaton wth an Applcaton to Dstrbuted MPC Ye Pu, Coln N. Jones and Melane N. Zelnger arxv:608.0043v [math.oc] Aug 206 Abstract In ths paper, we propose

More information

find (x): given element x, return the canonical element of the set containing x;

find (x): given element x, return the canonical element of the set containing x; COS 43 Sprng, 009 Dsjont Set Unon Problem: Mantan a collecton of dsjont sets. Two operatons: fnd the set contanng a gven element; unte two sets nto one (destructvely). Approach: Canoncal element method:

More information

18.1 Introduction and Recap

18.1 Introduction and Recap CS787: Advanced Algorthms Scrbe: Pryananda Shenoy and Shjn Kong Lecturer: Shuch Chawla Topc: Streamng Algorthmscontnued) Date: 0/26/2007 We contnue talng about streamng algorthms n ths lecture, ncludng

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

REAL ANALYSIS I HOMEWORK 1

REAL ANALYSIS I HOMEWORK 1 REAL ANALYSIS I HOMEWORK CİHAN BAHRAN The questons are from Tao s text. Exercse 0.0.. If (x α ) α A s a collecton of numbers x α [0, + ] such that x α

More information

Edge Isoperimetric Inequalities

Edge Isoperimetric Inequalities November 7, 2005 Ross M. Rchardson Edge Isopermetrc Inequaltes 1 Four Questons Recall that n the last lecture we looked at the problem of sopermetrc nequaltes n the hypercube, Q n. Our noton of boundary

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

CSCI B609: Foundations of Data Science

CSCI B609: Foundations of Data Science CSCI B609: Foundatons of Data Scence Lecture 13/14: Gradent Descent, Boostng and Learnng from Experts Sldes at http://grgory.us/data-scence-class.html Grgory Yaroslavtsev http://grgory.us Constraned Convex

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

On the Interval Zoro Symmetric Single-step Procedure for Simultaneous Finding of Polynomial Zeros

On the Interval Zoro Symmetric Single-step Procedure for Simultaneous Finding of Polynomial Zeros Appled Mathematcal Scences, Vol. 5, 2011, no. 75, 3693-3706 On the Interval Zoro Symmetrc Sngle-step Procedure for Smultaneous Fndng of Polynomal Zeros S. F. M. Rusl, M. Mons, M. A. Hassan and W. J. Leong

More information

COS 521: Advanced Algorithms Game Theory and Linear Programming

COS 521: Advanced Algorithms Game Theory and Linear Programming COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton

More information

THE CHINESE REMAINDER THEOREM. We should thank the Chinese for their wonderful remainder theorem. Glenn Stevens

THE CHINESE REMAINDER THEOREM. We should thank the Chinese for their wonderful remainder theorem. Glenn Stevens THE CHINESE REMAINDER THEOREM KEITH CONRAD We should thank the Chnese for ther wonderful remander theorem. Glenn Stevens 1. Introducton The Chnese remander theorem says we can unquely solve any par of

More information

Singular Value Decomposition: Theory and Applications

Singular Value Decomposition: Theory and Applications Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real

More information

Convexity preserving interpolation by splines of arbitrary degree

Convexity preserving interpolation by splines of arbitrary degree Computer Scence Journal of Moldova, vol.18, no.1(52), 2010 Convexty preservng nterpolaton by splnes of arbtrary degree Igor Verlan Abstract In the present paper an algorthm of C 2 nterpolaton of dscrete

More information

VQ widely used in coding speech, image, and video

VQ widely used in coding speech, image, and video at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng

More information

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2)

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2) 1/16 MATH 829: Introducton to Data Mnng and Analyss The EM algorthm (part 2) Domnque Gullot Departments of Mathematcal Scences Unversty of Delaware Aprl 20, 2016 Recall 2/16 We are gven ndependent observatons

More information

Affine transformations and convexity

Affine transformations and convexity Affne transformatons and convexty The purpose of ths document s to prove some basc propertes of affne transformatons nvolvng convex sets. Here are a few onlne references for background nformaton: http://math.ucr.edu/

More information

Vector Norms. Chapter 7 Iterative Techniques in Matrix Algebra. Cauchy-Bunyakovsky-Schwarz Inequality for Sums. Distances. Convergence.

Vector Norms. Chapter 7 Iterative Techniques in Matrix Algebra. Cauchy-Bunyakovsky-Schwarz Inequality for Sums. Distances. Convergence. Vector Norms Chapter 7 Iteratve Technques n Matrx Algebra Per-Olof Persson persson@berkeley.edu Department of Mathematcs Unversty of Calforna, Berkeley Math 128B Numercal Analyss Defnton A vector norm

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpenCourseWare http://ocw.mt.edu 6.854J / 18.415J Advanced Algorthms Fall 2008 For nformaton about ctng these materals or our Terms of Use, vst: http://ocw.mt.edu/terms. 18.415/6.854 Advanced Algorthms

More information

Exercise Solutions to Real Analysis

Exercise Solutions to Real Analysis xercse Solutons to Real Analyss Note: References refer to H. L. Royden, Real Analyss xersze 1. Gven any set A any ɛ > 0, there s an open set O such that A O m O m A + ɛ. Soluton 1. If m A =, then there

More information

Infinitely Split Nash Equilibrium Problems in Repeated Games

Infinitely Split Nash Equilibrium Problems in Repeated Games Infntely Splt ash Equlbrum Problems n Repeated Games Jnlu L Department of Mathematcs Shawnee State Unversty Portsmouth, Oho 4566 USA Abstract In ths paper, we ntroduce the concept of nfntely splt ash equlbrum

More information

8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS

8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS SECTION 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS 493 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS All the vector spaces you have studed thus far n the text are real vector spaces because the scalars

More information

Grover s Algorithm + Quantum Zeno Effect + Vaidman

Grover s Algorithm + Quantum Zeno Effect + Vaidman Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways

More information

n ). This is tight for all admissible values of t, k and n. k t + + n t

n ). This is tight for all admissible values of t, k and n. k t + + n t MAXIMIZING THE NUMBER OF NONNEGATIVE SUBSETS NOGA ALON, HAROUT AYDINIAN, AND HAO HUANG Abstract. Gven a set of n real numbers, f the sum of elements of every subset of sze larger than k s negatve, what

More information

SELECTED SOLUTIONS, SECTION (Weak duality) Prove that the primal and dual values p and d defined by equations (4.3.2) and (4.3.3) satisfy p d.

SELECTED SOLUTIONS, SECTION (Weak duality) Prove that the primal and dual values p and d defined by equations (4.3.2) and (4.3.3) satisfy p d. SELECTED SOLUTIONS, SECTION 4.3 1. Weak dualty Prove that the prmal and dual values p and d defned by equatons 4.3. and 4.3.3 satsfy p d. We consder an optmzaton problem of the form The Lagrangan for ths

More information

STEINHAUS PROPERTY IN BANACH LATTICES

STEINHAUS PROPERTY IN BANACH LATTICES DEPARTMENT OF MATHEMATICS TECHNICAL REPORT STEINHAUS PROPERTY IN BANACH LATTICES DAMIAN KUBIAK AND DAVID TIDWELL SPRING 2015 No. 2015-1 TENNESSEE TECHNOLOGICAL UNIVERSITY Cookevlle, TN 38505 STEINHAUS

More information

On a direct solver for linear least squares problems

On a direct solver for linear least squares problems ISSN 2066-6594 Ann. Acad. Rom. Sc. Ser. Math. Appl. Vol. 8, No. 2/2016 On a drect solver for lnear least squares problems Constantn Popa Abstract The Null Space (NS) algorthm s a drect solver for lnear

More information

SUCCESSIVE MINIMA AND LATTICE POINTS (AFTER HENK, GILLET AND SOULÉ) M(B) := # ( B Z N)

SUCCESSIVE MINIMA AND LATTICE POINTS (AFTER HENK, GILLET AND SOULÉ) M(B) := # ( B Z N) SUCCESSIVE MINIMA AND LATTICE POINTS (AFTER HENK, GILLET AND SOULÉ) S.BOUCKSOM Abstract. The goal of ths note s to present a remarably smple proof, due to Hen, of a result prevously obtaned by Gllet-Soulé,

More information

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 3: Large deviations bounds and applications Lecturer: Sanjeev Arora

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 3: Large deviations bounds and applications Lecturer: Sanjeev Arora prnceton unv. F 13 cos 521: Advanced Algorthm Desgn Lecture 3: Large devatons bounds and applcatons Lecturer: Sanjeev Arora Scrbe: Today s topc s devaton bounds: what s the probablty that a random varable

More information

Resource Allocation with a Budget Constraint for Computing Independent Tasks in the Cloud

Resource Allocation with a Budget Constraint for Computing Independent Tasks in the Cloud Resource Allocaton wth a Budget Constrant for Computng Independent Tasks n the Cloud Wemng Sh and Bo Hong School of Electrcal and Computer Engneerng Georga Insttute of Technology, USA 2nd IEEE Internatonal

More information

Distributed Non-Autonomous Power Control through Distributed Convex Optimization

Distributed Non-Autonomous Power Control through Distributed Convex Optimization Dstrbuted Non-Autonomous Power Control through Dstrbuted Convex Optmzaton S. Sundhar Ram and V. V. Veeravall ECE Department and Coordnated Scence Lab Unversty of Illnos at Urbana-Champagn Emal: {ssrnv5,vvv}@llnos.edu

More information

Appendix B: Resampling Algorithms

Appendix B: Resampling Algorithms 407 Appendx B: Resamplng Algorthms A common problem of all partcle flters s the degeneracy of weghts, whch conssts of the unbounded ncrease of the varance of the mportance weghts ω [ ] of the partcles

More information

FREQUENCY DISTRIBUTIONS Page 1 of The idea of a frequency distribution for sets of observations will be introduced,

FREQUENCY DISTRIBUTIONS Page 1 of The idea of a frequency distribution for sets of observations will be introduced, FREQUENCY DISTRIBUTIONS Page 1 of 6 I. Introducton 1. The dea of a frequency dstrbuton for sets of observatons wll be ntroduced, together wth some of the mechancs for constructng dstrbutons of data. Then

More information

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016 U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and

More information

MATH 241B FUNCTIONAL ANALYSIS - NOTES EXAMPLES OF C ALGEBRAS

MATH 241B FUNCTIONAL ANALYSIS - NOTES EXAMPLES OF C ALGEBRAS MATH 241B FUNCTIONAL ANALYSIS - NOTES EXAMPLES OF C ALGEBRAS These are nformal notes whch cover some of the materal whch s not n the course book. The man purpose s to gve a number of nontrval examples

More information

Week 2. This week, we covered operations on sets and cardinality.

Week 2. This week, we covered operations on sets and cardinality. Week 2 Ths week, we covered operatons on sets and cardnalty. Defnton 0.1 (Correspondence). A correspondence between two sets A and B s a set S contaned n A B = {(a, b) a A, b B}. A correspondence from

More information

Linear Regression Analysis: Terminology and Notation

Linear Regression Analysis: Terminology and Notation ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented

More information

Neural networks. Nuno Vasconcelos ECE Department, UCSD

Neural networks. Nuno Vasconcelos ECE Department, UCSD Neural networs Nuno Vasconcelos ECE Department, UCSD Classfcaton a classfcaton problem has two types of varables e.g. X - vector of observatons (features) n the world Y - state (class) of the world x X

More information

System of implicit nonconvex variationl inequality problems: A projection method approach

System of implicit nonconvex variationl inequality problems: A projection method approach Avalable onlne at www.tjnsa.com J. Nonlnear Sc. Appl. 6 (203), 70 80 Research Artcle System of mplct nonconvex varatonl nequalty problems: A projecton method approach K.R. Kazm a,, N. Ahmad b, S.H. Rzv

More information

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty Addtonal Codes usng Fnte Dfference Method Benamn Moll 1 HJB Equaton for Consumpton-Savng Problem Wthout Uncertanty Before consderng the case wth stochastc ncome n http://www.prnceton.edu/~moll/ HACTproect/HACT_Numercal_Appendx.pdf,

More information

Feb 14: Spatial analysis of data fields

Feb 14: Spatial analysis of data fields Feb 4: Spatal analyss of data felds Mappng rregularly sampled data onto a regular grd Many analyss technques for geophyscal data requre the data be located at regular ntervals n space and/or tme. hs s

More information

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,

More information

Communication-Efficient Algorithms for Decentralized and Stochastic Optimization

Communication-Efficient Algorithms for Decentralized and Stochastic Optimization oname manuscrpt o. (wll be nserted by the edtor) Communcaton-Effcent Algorthms for Decentralzed and Stochastc Optmzaton Guanghu Lan Soomn Lee Y Zhou the date of recept and acceptance should be nserted

More information

Google PageRank with Stochastic Matrix

Google PageRank with Stochastic Matrix Google PageRank wth Stochastc Matrx Md. Sharq, Puranjt Sanyal, Samk Mtra (M.Sc. Applcatons of Mathematcs) Dscrete Tme Markov Chan Let S be a countable set (usually S s a subset of Z or Z d or R or R d

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

CHAPTER III Neural Networks as Associative Memory

CHAPTER III Neural Networks as Associative Memory CHAPTER III Neural Networs as Assocatve Memory Introducton One of the prmary functons of the bran s assocatve memory. We assocate the faces wth names, letters wth sounds, or we can recognze the people

More information

Supplementary material: Margin based PU Learning. Matrix Concentration Inequalities

Supplementary material: Margin based PU Learning. Matrix Concentration Inequalities Supplementary materal: Margn based PU Learnng We gve the complete proofs of Theorem and n Secton We frst ntroduce the well-known concentraton nequalty, so the covarance estmator can be bounded Then we

More information