Parallel Multi-splitting Proximal Method for Star Networks

Ermin Wei
Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL
ermin.wei@northwestern.edu

Abstract—We develop a parallel algorithm based on the proximal method to solve the problem of minimizing a summation of convex (not necessarily smooth) functions over a star network. We show that this method converges to an optimal solution for any choice of constant stepsize for convex objective functions. Under the further assumptions of Lipschitz gradients and strong convexity of the objective functions, the method converges linearly.

I. INTRODUCTION

We consider the following class of optimization problems,

  min_x Σ_{i=1}^n f_i(x),

which has gained much research attention recently. It captures many important applications such as distributed control for a team of autonomous robots/UAVs pursuing/aiming at a common target, sensor networks constructing an estimate of the entire surroundings, communication systems maximizing system throughput, and machine learning applications [15], [10], [16], [5], [9], [22], [7]. Most of the existing literature for solving this problem either does not explore the parallel potential [16], [5], [9], [22] or requires a careful selection of stepsize to guarantee convergence [17], [12]. The requirement of stepsize tuning can be computationally expensive, and undermines the robustness of the entire system in providing an optimal solution. The only line of distributed algorithms that does not suffer from the drawbacks of stepsize selection is Alternating Direction Method of Multipliers (ADMM) based algorithms [3], [4], [8], [10], [11], [15], [20], [21], [23], which have gained much popularity due to great numerical performance. A closer look at the standard ADMM reveals that it is a two-way splitting proximal algorithm [6], where a two-way splitting of the dual function is formed and the proximal method is applied iteratively to both parts. However, as observed in recent work [4], while the standard two-way splitting ADMM (corresponding to a two-agent setting in a multi-agent setup) can converge for any stepsize choice, a three-way splitting of the dual function may result in an algorithm that diverges. Hence, in order to use ADMM in a distributed setting with more than two agents, complex reformulations of the problem and the introduction of auxiliary (primal and dual) variables are required [11], [3], [20]. Despite the two promising features that proximal-based methods do not require stepsize selection and that multi-splitting arises naturally from a multi-agent setup, the question of whether we can design a convergent algorithm based on a more than two-way splitting proximal method remains open.

In this paper, we combine ideas from the proximal method and projection to develop a multi-splitting proximal algorithm that works with non-smooth convex objective functions, takes advantage of parallel processing power and guarantees asymptotic convergence for any positive stepsize. We also analyze its rate of convergence under the stronger assumptions of Lipschitz gradients and strong convexity and show that the algorithm converges linearly.

Our paper is related to the large literature on distributed/parallel computation, building upon the seminal works [2] and [19]. Particularly relevant are the distributed gradient descent method [12] and the EXTRA method [17]. The distributed gradient method can be applied to non-smooth objective functions; however, a constant stepsize only guarantees convergence to an error neighborhood of the optimal solution. The recently proposed distributed first-order method EXTRA uses a constant stepsize and converges to an optimal point. However, the algorithm does require a careful selection of stepsize to guarantee convergence, as well as smoothness of the objective functions, which limits its applicability to important problems with a non-smooth regularization term, such as the LASSO. The most closely related literature for our algorithm is [18] from 1983, which was later generalized in [7].
These authors combine the multi-splitting proximal method and projection to form a new algorithm. Spingarn's algorithm is a special case of our algorithm with a unit stepsize. We also note that these papers do not provide a rate of convergence analysis (under Lipschitz gradient and strong convexity assumptions). While the proposed algorithm shares the same rate of convergence as some existing algorithms, such as EXTRA and ADMM, its main advantage is its robustness against stepsize selection and its simple implementation in a distributed setting. While we focus on the star network in this work, it serves as a building block to develop distributed methods for general network topologies. In the rest of the paper, we first present the algorithm along with some preliminary simulation results, and then the convergence and rate of convergence analysis. (Due to space limits, some proofs are omitted here. Interested readers are referred to the author's website for a manuscript including all the proofs.)

II. ALGORITHM

Fig. 1. Parallel architecture to implement the proposed algorithm: an aggregator at the center connected to n workers, each holding one component function f_i(x_i).
Fig. 2. Preliminary numerical result. Y-axis: relative error (f(x^t) − f^*)/(f(x^0) − f^*); x-axis: iteration count. Curves: Multi-Splitting (c = 0.1), DGD (α = 0.05), EXTRA (α = 0.05).

We present the proposed algorithm in this section. First, we note that the original problem can be equivalently expressed as

  min_{x_1, ..., x_n} Σ_{i=1}^n f_i(x_i)   s.t.  x_1 = ... = x_n.   (1)

We adopt a general setup where each function f_i : R^m → R is convex but not necessarily differentiable. We aim at developing an algorithm to solve this reformulated problem under the following standard assumption.

Assumption 1: Problem (1) has a non-empty optimal solution set, denoted by X^*.

This condition does not require uniqueness of the optimal solution. The parallel algorithm is implemented on n + 1 machines, connected in a star graph as shown in Figure 1. We call the one in the center the aggregator, and the rest of them workers, labeled i ∈ {1, ..., n}. Each worker i has information about one function f_i and specializes in computing the proximal operator related to f_i. Collectively, the workers and the aggregator solve problem (1). Our algorithm is an iterative method, where the updates are related to the decision variable and first order information. At each iteration, the workers in parallel perform a proximal point update for their respective f_i using the current state information received from the aggregator (related to the decision variable and the corresponding first order information) and send the updated information to the aggregator. The aggregator then averages the information according to a specific rule and sends the averaged information as the new state back to each worker. (The number of machines can be reduced via mini-batching and/or by requiring the aggregator to also process information about one of the functions f_i; we chose to present the setting with maximum parallelism.) In particular, we use the following set of notation to describe our algorithm: a superscript indicates the iteration count and a subscript indicates the worker associated with the variable. The positive parameter c is the stepsize and is a constant throughout the algorithm. Our algorithm is presented in Algorithm 1.

Algorithm 1 Parallel Multi-splitting Proximal Method
Initialization: The aggregator starts from an arbitrary x^0 and ṽ_i^0 ∈ R^m for i = 1, ..., n, computes x_i^0 = x^0 for all i = 1, ..., n and v_i^0 = ṽ_i^0 − (1/n) Σ_{j=1}^n ṽ_j^0, and sends the information x_i^0 + c v_i^0 (∈ R^m) to each worker i.
Iteration: for k = 0, 1, ...
  Each worker i = 1, ..., n computes in parallel
    y_i^{k+1} ∈ argmin_p f_i(p) + (1/(2c)) ||p − x_i^k − c v_i^k||^2,   (2a)
    w_i^{k+1} = (1/c)(x_i^k + c v_i^k − y_i^{k+1}),   (2b)
  and reports y_i^{k+1} and w_i^{k+1} (each in R^m) back to the aggregator.
  After receiving y_i^{k+1} and w_i^{k+1} from all workers, the aggregator generates
    x_i^{k+1} = (1/n) Σ_{j=1}^n y_j^{k+1},   v_i^{k+1} = w_i^{k+1} − (1/n) Σ_{j=1}^n w_j^{k+1},   for i = 1, ..., n,   (3)
  and then sends the information x_i^{k+1} + c v_i^{k+1} (∈ R^m) to each worker i.

The {y_i^k}_k sequences can be viewed as local estimates of x. At each time instant k, x_i^k is the same for all i and equals the average of all local estimates. The {w_i^k} sequence, as shown later in Lemma 3.1, represents a local subgradient of the function f_i. The variable v_i^k captures the difference between the local subgradient and the average of all subgradients. This algorithm is well suited for problems where step (2a), a minimization involving one component of the objective function, can be implemented in an efficient way. Examples include SVM, quadratic objective functions, and the Lasso (Least Absolute Shrinkage and Selection Operator); see [3], [14] for more examples. When analyzing the convergence speed of this algorithm, we focus on the iteration count k, not counting the time needed to solve step (2a).
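To make the iteration concrete, the following is a minimal Python/NumPy sketch of Algorithm 1 (not the paper's implementation; variable names are illustrative) for scalar quadratic components f_i(x) = 0.5 a_i (x − b_i)^2, for which the proximal step (2a) has a closed form. The serial loop over workers stands in for the parallel proximal updates.

    import numpy as np

    # f_i(x) = 0.5 * a_i * (x - b_i)^2, so the proximal step (2a) is closed form:
    # argmin_p f_i(p) + (1/(2c)) (p - z)^2  =  (c * a_i * b_i + z) / (c * a_i + 1).
    a = np.array([1.0, 1.5, 2.0, 1.2])      # curvatures (mu = min a, L = max a)
    b = np.array([4.0, -1.0, 2.5, 0.0])     # minimizers of the individual f_i
    n, c = len(a), 0.5                       # any c > 0 works; c only affects speed

    x = np.zeros(n)                          # x_i^k (identical entries: consensus estimate)
    v = np.zeros(n)                          # v_i^k (entries sum to zero)
    for k in range(300):
        z = x + c * v
        y = (c * a * b + z) / (c * a + 1.0)  # workers: proximal step (2a), in parallel
        w = (z - y) / c                      # workers: subgradient recovery (2b)
        x = np.full(n, y.mean())             # aggregator: projection onto A (averaging)
        v = w - w.mean()                     # aggregator: projection onto B (centering)

    x_star = (a * b).sum() / a.sum()         # minimizer of sum_i f_i(x)
    print(x[0], x_star)                      # the two values should agree to many digits

For these well-conditioned quadratics the iterates reach the centralized minimizer after a few hundred iterations; changing c only changes the speed, in line with the convergence claim below.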
We have performed some initial numerical studies to compare our method against distributed gradient descent (DGD) [12] and EXTRA [17] with n = 4 and quadratic objective functions. We plot the relative error in the objective function in Figure 2. We used a stepsize of 0.1 for the proposed method and 0.05 for DGD and EXTRA, as they both diverge for the stepsize choice of 0.1 and needed a smaller stepsize.

III. CONVERGENCE ANALYSIS

In this section, we analyze the convergence and the speed of convergence of the proposed algorithm. For concise representation, we introduce the following notation.

The vector x^k = [x_i^k]_i ∈ R^{nm} is a long vector formed by stacking the x_i^k, i.e., x^k = [x_1^k; ...; x_n^k]. Similarly, we form the vectors y^k = [y_i^k]_i, v^k = [v_i^k]_i and w^k = [w_i^k]_i, all in R^{nm}. Unless otherwise specified, vectors with sub-indices, such as x_i, lie in R^m and those without sub-indices, such as x^k, are in R^{nm}. We denote by F : R^{nm} → R and ∂F : R^{nm} → R^{nm} the mappings

  F([x_1; ...; x_n]) = Σ_{i=1}^n f_i(x_i),   ∂F([x_1; ...; x_n]) = { [v_1; ...; v_n] : v_i ∈ ∂f_i(x_i) },

where the notation ∂f_i(x) denotes the subdifferential set, i.e., the set consisting of all subgradients of f_i at the point x. We use ⟨x, y⟩ to denote the inner product between two vectors x and y. We next show that our algorithm has two components, a proximal method and a projection, which serve as the basis for the convergence analysis.

A. Proximal method

We start by analyzing the sequences y_i^k and w_i^k. Step (2a) can be equivalently expressed as y_i^{k+1} ∈ prox_{c f_i}(x_i^k + c v_i^k) using the definition of the proximal operator. We next give a characterization of w_i^{k+1}.

Lemma 3.1: For each iteration k, w_i^{k+1} is in the set ∂f_i(y_i^{k+1}).

The preceding lemma illustrates that at each iteration, at each worker, we have a pair of a primal decision variable and an associated subgradient (y_i^{k+1}, w_i^{k+1}) obtained based on a proximal step. Hence the nm-dimensional vectors y^{k+1}, w^{k+1} also correspond to a decision variable and subgradient pair generated based on a proximal step at x^k + c v^k.

B. Projection

We next study the sequences x^k, v^k. Motivated by the optimality conditions of problem (1), we introduce the following two subspaces:

  A = { [x_1; ...; x_n] : x_1 = ... = x_n },   B = { [x_1; ...; x_n] : Σ_{i=1}^n x_i = 0 }.

We use z(A) and z(B) to denote the projections of a vector z onto the subspaces A and B respectively. We observe that for any optimal solution x^* of (1), the first order optimality conditions imply that x^* is in A and there exists a subgradient v^* ∈ ∂F(x^*) with v^* in B. The next lemma qualifies the connection between the spaces A and B.

Lemma 3.2: The spaces A and B are orthogonal complements.

We now note that in our algorithm x^k is a projection of the decision variable y^k onto the space A and v^k is a projection of the subgradient w^k onto the space B. These projections are performed to guide the decision variables and subgradients towards the appropriate subspaces where the optimal solutions live.
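As a concrete illustration of these two projections (a minimal NumPy sketch; the helper names proj_A and proj_B are mine, not the paper's): for a stacked vector z in R^{nm}, projecting onto A replicates the block average, projecting onto B removes it, and the two pieces add back up to z, consistent with Lemma 3.2.

    import numpy as np

    def proj_A(z, n, m):
        # Projection onto A = {[x_1; ...; x_n] : x_1 = ... = x_n}: replicate the block average.
        blocks = z.reshape(n, m)
        return np.tile(blocks.mean(axis=0), n)

    def proj_B(z, n, m):
        # Projection onto B = {[x_1; ...; x_n] : sum_i x_i = 0}: subtract the block average.
        blocks = z.reshape(n, m)
        return (blocks - blocks.mean(axis=0)).reshape(-1)

    n, m = 4, 3
    z = np.random.default_rng(1).standard_normal(n * m)
    zA, zB = proj_A(z, n, m), proj_B(z, n, m)
    print(np.allclose(zA + zB, z))       # orthogonal decomposition: z = z(A) + z(B)
    print(abs(np.dot(zA, zB)) < 1e-12)   # A and B are orthogonal (Lemma 3.2)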
C. Convergence

Based on the previous two subsections, we conclude that our algorithm is a combination of the proximal method and an orthogonal projection method. The convergence analysis is also motivated by the nonexpansive properties of these methods. Before we proceed to the analysis, we first observe that by the definition of w_i^{k+1} in (2b), we have y^{k+1} + c w^{k+1} = x^k + c v^k. Therefore, one iteration of the algorithm can be represented as follows:

  x^k + c v^k = y^{k+1} + c w^{k+1},   (4)
  x^{k+1} = y^{k+1}(A),   v^{k+1} = w^{k+1}(B),   (5)

where w^{k+1} is in ∂F(y^{k+1}) by Lemma 3.1. Since the two sequences x^k and v^k lie in two orthogonal spaces, their sum has a unique orthogonal decomposition, and convergence of the sum automatically implies convergence of x^k and v^k. We will therefore focus on the convergence of the sum x^k + c v^k. We first show that the fixed points of the above iteration and the set of optimal solutions to problem (1) are equivalent.

Lemma 3.3: The vector x + cv, where x ∈ A and v ∈ B, is a fixed point of iteration (5) if and only if x is an optimal solution of problem (1) and v is in ∂F(x).

Proof: We first assume that (x, v) is a fixed point of iteration (5). We use y_i, w_i, x_i^+, v_i^+ to denote the updates starting from x^k = x, v^k = v. Since x + cv is a fixed point, we have x^+ + c v^+ = x + cv. Since x, x^+ are both in A and v, v^+ are both in B, by the orthogonality of A and B, we have x = x^+ and v = v^+. Since x + cv is a fixed point, we also have

  x_i^+ + c v_i^+ = y_i(A) + c w_i(B) = y_i + c w_i = x_i + c v_i,   (6)

for i = 1, ..., n, where y_i(A) and w_i(B) denote the i-th blocks of y(A) and w(B). We can then sum over i and have Σ_{i=1}^n [y_i(A) + c w_i(B)] = Σ_{i=1}^n [y_i + c w_i]. By construction of x^+, the projection onto A preserves the sum, so Σ_{i=1}^n y_i(A) = Σ_{i=1}^n y_i. Therefore Σ_{i=1}^n w_i(B) = Σ_{i=1}^n w_i. Since w(B) is in B, we have Σ_{i=1}^n w_i(B) = 0, which implies that Σ_{i=1}^n w_i = 0 and w lies in the subspace B. Hence v^+ = w(B) = w = v. We combine this with Eq. (6) and obtain x_i = y_i for each i, i.e., x = y. Therefore, v ∈ B is also in ∂F(x) with x ∈ A. This shows that the first order optimality condition is satisfied and therefore the pair (x, v) is an optimal solution and subgradient pair.

Next we start from an optimal solution and subgradient pair (x, v). We have x in A and v in B. Since v is in ∂F(x), the optimality condition of the proximal step (2a) gives w = v and y = x, and the projections return the original pair. Thus (x, v) is a pair with components lying in orthogonal subspaces that is mapped to itself, which implies that x + cv is a fixed point of the iteration.
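As a quick numerical check of Lemma 3.3 (an illustrative sketch reusing the scalar quadratic setup from the earlier snippet, not part of the paper): starting from the consensus optimum x^* with v set to the (centered) gradients at x^*, one pass of steps (2a)-(2b) and the two projections returns the same pair.

    import numpy as np

    a = np.array([1.0, 1.5, 2.0, 1.2])
    b = np.array([4.0, -1.0, 2.5, 0.0])
    n, c = len(a), 0.7

    x_star = (a * b).sum() / a.sum()          # optimal solution of (1)
    x = np.full(n, x_star)                    # x in A (consensus at the optimum)
    grads = a * (x - b)                       # gradients of f_i at x^*
    v = grads - grads.mean()                  # v in B; grads already sum to zero at the optimum

    z = x + c * v
    y = (c * a * b + z) / (c * a + 1.0)       # proximal step (2a)
    w = (z - y) / c                           # step (2b)
    x_new = np.full(n, y.mean())              # projection onto A
    v_new = w - w.mean()                      # projection onto B
    print(np.allclose(x_new, x), np.allclose(v_new, v))   # both True: a fixed point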

By Assumpto, we have that the set of fxed pots of terato (5) s oempty. For the rest of the paper, we use x, v to deote a fxed pot of the terato (5). We ext show that the mappg from x k + cv o x k+ + cv k+ s oexpasve, whch s a key for covergece aalyss. Theorem 3.4: Let x A deote a optmal soluto of problem () ad v B be a subgradet of F (x ). The ay sequece of x k, v k, y k, w k geerated by Algorthm, we have for all k x k+ x + cv k+ cv (7) = x k x + cv k cv y k+ x k+ cw k+ cv k+ (y k+ x ) (cw k+ cv ), (y k x ) (cw k cv ) = (y k x ) (w k v ) 0. (8) = The sequece { x k x + cv k cv } k s mootocally ocreasg. Proof: We frst apply the equalty of the form z k+ z + z k z k+ +(z k z k+ ) (z k+ z ) = z k z, where z = x + cv wth the correspodg superscrpt, to x k+ + cv k+ x cv ad obta x k+ + cv k+ x cv (9) = x k + cv k x cv x k + cv k x k+ cv k+ (x k + cv k x k+ cv k+ ) (x k+ + cv k+ x cv ). The rest of the proof reles o the fact that A ad B are orthogoal complemets ad the er products betwee ay elemets of these sets are zero. For the secod term o the rght had sde of Eq. (9), we use Eq. (5) ad have x k + cv k x k+ cv k+ = y k+ + cw k+ x k+ cv k+. Sce x k+ s the projecto of y k+ A we have that y k+ x k+ les B ad smlarly cw k+ cv k+ s A. Ther er product s zero ad thus the term ca be further decomposed to x k + cv k x k+ cv k+ = y k+ x k+ + cw k+ cv k+. We ext aalyze the er product term o the rght had of Eq. (9). By Eq. (5), we have x k + cv k = y k+ + cw k+, ad thus we have x k + cv k x k+ cv k+ = yk + + cw k+ x k+ cv k+. We recall that y k+ x k+ les B ad cw k+ cv k+ s A. We also observe that x k+ x s A ad cv k+ cv s B. Combe these observatos together, we have (x k + cv k x k+ cv k+ ) (x k+ + cv k+ x cv ) = (y k+ x k+ ) (cv k+ cv ) + (cw k+ cv k+ ) (x k+ x ), = (y k+ x k+ + x k+ x ) (cv k+ cv ) + (cw k+ cv k+ ) (x k+ x + y k+ x k+ ), where the last equalty we add terms (x k+ x ) (cv k+ cv ) to the frst term ad (cw k+ cv k+ ) (y k+ x k+ ) to the secod term of the secod equalty, both of whch are zero due to the orthogoalty of A ad B. We ca ow combe the terms ad have (x k + cv k x k+ cv k+ ) (x k+ + cv k+ x cv ) (0) = (y k+ x ) (cw k+ cv ). We ca ow combe Eqs. (9)-(0) ad coclude x k+ + cv k+ x cv x k + cv k x cv = y k+ x k+ cw k+ cv k+ (y k+ x ) (cw k+ cv ). For the terms o the left had sde, we oce aga use the orthogoalty of A ad B, alog wth the fact that all x related terms are A ad v related terms are B to break dow the orm, ad have x k+ x + cv k+ cv x k x cv k cv = y k+ x k+ cw k+ cv k+ (y k+ x ) (cw k+ cv ), whch shows Eq. (7). To see that the sequece { x k x + cv k cv } k s mootocally ocreasg, we eed to show that the er product term the above le satsfes (y k+ x ) (cw k+ cv ) 0,.e., Eq. (8). We ote that sce w k+ s F (y k+ ), we have by covexty of f, (y k+ x ) (w k+ v ) 0 for =,...,. Ths establshes Eq. (8). The prevous theorem establshes that the sequece { x k x + cv k cv } k s mootocally ocreasg, we are ow equpped to show covergece of the sequece {x k } to a optmal soluto. Theorem 3.5: Let {x k } be a sequece geerated by Algorthm. The the sequece coverges to a optmal soluto of problem (). Proof: The mootocty results from prevous theorem mples that the sequece { x k x + cv k cv } k s bouded. Hece sequece {x k, v k } k has subsequet coverget sequece. We ow focus o a coverget subsequet {x kt, v kt } t ad deote ts lmt pot as x, ṽ, ad the correspodg {y kt, w kt } also coverge ad ts lmt pot as (ỹ, w). Eq. (7) suggests that x + x + cv + cv = x x + cv cv y k+ x k+ cw k+ cv k+ k= + (y k+ x ) (cw k+ cv ).

By Eq. (8), the inner product terms are nonnegative, and thus

  ||x^1 − x^*||^2 + ||c v^1 − c v^*||^2 ≥ ||x^{k_t+1} − x^*||^2 + ||c v^{k_t+1} − c v^*||^2 + Σ_{k=1}^{k_t} [ ||y^{k+1} − x^{k+1}||^2 + ||c w^{k+1} − c v^{k+1}||^2 ].

We then take the limit as t → ∞ on both sides and have

  ||x^1 − x^*||^2 + ||c v^1 − c v^*||^2 ≥ lim_{t→∞} [ ||x^{k_t+1} − x^*||^2 + ||c v^{k_t+1} − c v^*||^2 ] + lim_{t→∞} Σ_{k=1}^{k_t} [ ||y^{k+1} − x^{k+1}||^2 + ||c w^{k+1} − c v^{k+1}||^2 ].

Since the sequence {x^{k_t}, v^{k_t}} converges to (x̃, ṽ), we have

  ||x^1 − x^*||^2 + ||c v^1 − c v^*||^2 − ||x̃ − x^*||^2 − ||c ṽ − c v^*||^2 ≥ lim_{t→∞} Σ_{k=1}^{k_t} [ ||y^{k+1} − x^{k+1}||^2 + ||c w^{k+1} − c v^{k+1}||^2 ].

The left hand side is finite and each summand on the right hand side is nonnegative; therefore lim_{t→∞} ||y^{k_t} − x^{k_t}|| = 0 and lim_{t→∞} ||c w^{k_t} − c v^{k_t}|| = 0, i.e., ỹ = x̃ and w̃ = ṽ. Hence the point (x̃, ṽ) is a fixed point of iteration (5). We can then use x^* = x̃ and v^* = ṽ in Eq. (7). Since the value of ||x^{k_t} − x̃||^2 + ||c v^{k_t} − c ṽ||^2 goes to 0 along the subsequence and the original sequence is monotone, we have lim_{k→∞} ||x^k − x̃||^2 + ||c v^k − c ṽ||^2 = 0. Therefore the sequence {x^k, v^k} converges. Its limit point (x̃, ṽ) is a fixed point of iteration (5) and thus, by Lemma 3.3, x̃ is an optimal solution of problem (1).

We remark that the above theorem guarantees convergence of the algorithm for any stepsize choice c > 0.

D. Rate of Convergence

We next show that under additional assumptions on the objective functions, we can establish a linear rate of convergence of the algorithm, and the stepsize choice c becomes a parameter in the rate of convergence. For this section, we assume our objective functions f_i are continuously differentiable and satisfy the following assumption.

Assumption 2: Each component f_i of the objective function has Lipschitz gradient with Lipschitz constant L and is µ-strongly convex.

Note that in the event where precise values of L and µ are missing, an upper bound on L and a lower bound on µ can be used in place of L and µ for the rest of the analysis. The following lemma relates ||y^{k+1} − x^{k+1}||^2 + ||c w^{k+1} − c v^{k+1}||^2 + 2⟨y^{k+1} − x^*, c w^{k+1} − c v^*⟩ to ||x^{k+1} − x^*||^2 + ||c v^{k+1} − c v^*||^2. We later combine this lemma with Theorem 3.4 to show the linear convergence rate.

Lemma 3.6: For any sequence of x^k, v^k, y^k, w^k generated by Algorithm 1, we have

  ||y^{k+1} − x^{k+1}||^2 + ||c w^{k+1} − c v^{k+1}||^2 + 2⟨y^{k+1} − x^*, c w^{k+1} − c v^*⟩ ≥ (1/2) min{1, cµβ} ||x^{k+1} − x^*||^2 + (1/2) min{1, µ(1−β)/(cL^2)} ||c v^{k+1} − c v^*||^2   (11)

for any β ∈ (0, 1).

Proof: We first focus on the inner product term on the left hand side, and later combine it with the remaining norm terms; since this inner product is nonnegative by Eq. (8), it suffices to retain only one of its two copies. Since each f_i is differentiable, the vector w^{k+1} is composed of gradient vectors, i.e., w^{k+1} = [∇f_i(y_i^{k+1})]_i and v^* = [∇f_i(x_i^*)]_i. By strong convexity of f_i in Assumption 2, we have by [13]

  ⟨y_i − x_i^*, ∇f_i(y_i) − ∇f_i(x_i^*)⟩ ≥ µ ||y_i − x_i^*||^2   for any y_i ∈ R^m.

For the long vectors in R^{nm}, we therefore have

  ⟨y^{k+1} − x^*, c w^{k+1} − c v^*⟩ = c Σ_{i=1}^n ⟨y_i^{k+1} − x_i^*, ∇f_i(y_i^{k+1}) − ∇f_i(x_i^*)⟩ ≥ cµ Σ_{i=1}^n ||y_i^{k+1} − x_i^*||^2 = cµ ||y^{k+1} − x^*||^2.   (12)

We also note that, since each f_i has Lipschitz gradient with Lipschitz constant L, we have by results in [13] that ||∇f_i(y_i) − ∇f_i(x_i^*)|| ≤ L ||y_i − x_i^*|| for any y_i ∈ R^m. Therefore,

  ||y^{k+1} − x^*||^2 = Σ_{i=1}^n ||y_i^{k+1} − x_i^*||^2 ≥ (1/L^2) Σ_{i=1}^n ||∇f_i(y_i^{k+1}) − ∇f_i(x_i^*)||^2 = (1/L^2) ||w^{k+1} − v^*||^2.

Thus we can introduce a factor β ∈ (0, 1) into Eq. (12) and have

  ⟨y^{k+1} − x^*, c w^{k+1} − c v^*⟩ ≥ cµβ ||y^{k+1} − x^*||^2 + cµ(1−β) ||y^{k+1} − x^*||^2 ≥ cµβ ||y^{k+1} − x^*||^2 + (µ(1−β)/(cL^2)) ||c w^{k+1} − c v^*||^2.

We can now bring in the norm terms and have

  ||y^{k+1} − x^{k+1}||^2 + ||c w^{k+1} − c v^{k+1}||^2 + 2⟨y^{k+1} − x^*, c w^{k+1} − c v^*⟩ ≥ ||y^{k+1} − x^{k+1}||^2 + cµβ ||y^{k+1} − x^*||^2 + ||c w^{k+1} − c v^{k+1}||^2 + (µ(1−β)/(cL^2)) ||c w^{k+1} − c v^*||^2.

We next use the inequality that ||a + b||^2 ≤ 2||a||^2 + 2||b||^2 (to see why this is true, note that ||a||^2 + ||b||^2 − 2⟨a, b⟩ = ||a − b||^2 ≥ 0, so 2⟨a, b⟩ ≤ ||a||^2 + ||b||^2, which implies ||a + b||^2 = ||a||^2 + ||b||^2 + 2⟨a, b⟩ ≤ 2||a||^2 + 2||b||^2), and have

  ||y^{k+1} − x^{k+1}||^2 + cµβ ||y^{k+1} − x^*||^2 ≥ (1/2) min{1, cµβ} ||x^{k+1} − x^*||^2,
  ||c w^{k+1} − c v^{k+1}||^2 + (µ(1−β)/(cL^2)) ||c w^{k+1} − c v^*||^2 ≥ (1/2) min{1, µ(1−β)/(cL^2)} ||c v^{k+1} − c v^*||^2.

By combining the previous three inequalities, we obtain Eq. (11).

We next show the linear rate of convergence.

Theorem 3.7: For any sequence of x^k, v^k generated by Algorithm 1 and any β ∈ (0, 1), we have

  (1 + (1/2) min{1, cµβ}) ||x^{k+1} − x^*||^2 + (1 + (1/2) min{1, µ(1−β)/(cL^2)}) ||c v^{k+1} − c v^*||^2 ≤ ||x^k − x^*||^2 + ||c v^k − c v^*||^2.

Proof: Recall Eqs. (7) and (11):

  ||x^k − x^*||^2 + ||c v^k − c v^*||^2 − ||x^{k+1} − x^*||^2 − ||c v^{k+1} − c v^*||^2 = ||y^{k+1} − x^{k+1}||^2 + ||c w^{k+1} − c v^{k+1}||^2 + 2⟨y^{k+1} − x^*, c w^{k+1} − c v^*⟩,

and

  ||y^{k+1} − x^{k+1}||^2 + ||c w^{k+1} − c v^{k+1}||^2 + 2⟨y^{k+1} − x^*, c w^{k+1} − c v^*⟩ ≥ (1/2) min{1, cµβ} ||x^{k+1} − x^*||^2 + (1/2) min{1, µ(1−β)/(cL^2)} ||c v^{k+1} − c v^*||^2.

Hence, we can combine the previous two lines and establish the desired relation.

The above theorem establishes a linear convergence rate for the algorithm. To match the two constants, we can set cµβ = µ(1−β)/(cL^2), which gives c = √((1−β)/β)/L. This choice of c gives cµβ = µ(1−β)/(cL^2) = √((1−β)β) µ/L. This value is maximized at β = 1/2, i.e., c = 1/L. We can then have

  (1 + min{1/2, µ/(4L)}) [ ||x^{k+1} − x^*||^2 + ||c v^{k+1} − c v^*||^2 ] ≤ ||x^k − x^*||^2 + ||c v^k − c v^*||^2.

For problems with µ/L ≥ 2, we have ||x^{k+1} − x^*||^2 + ||c v^{k+1} − c v^*||^2 ≤ (2/3) [ ||x^k − x^*||^2 + ||c v^k − c v^*||^2 ]. For problems with µ/L < 2, we have ||x^{k+1} − x^*||^2 + ||c v^{k+1} − c v^*||^2 ≤ [4/(4 + κ)] [ ||x^k − x^*||^2 + ||c v^k − c v^*||^2 ], where κ = µ/L. We conclude that the rate of linear convergence depends on the condition number of the objective functions.
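To make the final bound concrete, here is a small worked computation (a sketch with illustrative constants µ = 1 and L = 10, not taken from the paper): with the matched choice c = 1/L and β = 1/2, the guaranteed per-iteration contraction of the squared distance is 1/(1 + min{1/2, µ/(4L)}).

    import math

    mu, L = 1.0, 10.0                  # example strong convexity and Lipschitz constants
    kappa = mu / L                     # kappa = mu / L, as defined above
    c = 1.0 / L                        # matched stepsize choice with beta = 1/2
    rho = 1.0 / (1.0 + min(0.5, mu / (4.0 * L)))
    print(rho)                                            # 4 / (4 + kappa) = 4 / 4.1 ≈ 0.9756
    print(math.ceil(math.log(1e-6) / math.log(rho)))      # iterations to cut the squared error by 1e-6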
IV. CONCLUSIONS

In this paper, we propose a parallel multi-splitting proximal method and show that it converges for any positive stepsize. When the objective functions have Lipschitz gradients and are strongly convex, the algorithm converges linearly. Future work includes extending this algorithm to the stochastic setting where delays and errors are involved.

REFERENCES

[1] D. P. Bertsekas. Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey. LIDS Report 2848, 2010.
[2] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Athena Scientific, Belmont, MA, 1997.
[3] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 3(1):1-122, 2011.
[4] Caihua Chen, Bingsheng He, Yinyu Ye, and Xiaoming Yuan. The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Mathematical Programming, 155(1-2):57-79, 2016.
[5] Aaron Defazio, Francis Bach, and Simon Lacoste-Julien. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems, pages 1646-1654, 2014.
[6] J. Eckstein. Augmented Lagrangian and Alternating Direction Methods for Convex Optimization: A Tutorial and Some Illustrative Computational Results. RUTCOR Research Report, 2012.
[7] Jonathan Eckstein and Benar Fux Svaiter. General projective splitting methods for sums of maximal monotone operators. SIAM Journal on Control and Optimization, 48(2):787-811, 2009.
[8] Pontus Giselsson and Stephen Boyd. Diagonal scaling in Douglas-Rachford splitting and ADMM. In Decision and Control (CDC), 2014 IEEE 53rd Annual Conference on, pages 5033-5039. IEEE, 2014.
[9] Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems, pages 315-323, 2013.
[10] J. Mota, J. Xavier, P. Aguiar, and M. Püschel. ADMM for Consensus on Colored Networks. Proceedings of IEEE Conference on Decision and Control (CDC), 2012.
[11] J. Mota, J. Xavier, P. Aguiar, and M. Püschel. D-ADMM: A Communication-Efficient Distributed Algorithm for Separable Optimization. IEEE Transactions on Signal Processing, 61(10):2718-2723, 2013.
[12] A. Nedić and A. Ozdaglar. Distributed subgradient methods for multi-agent optimization. IEEE Transactions on Automatic Control, 54(1):48-61, Jan 2009.
[13] Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course, volume 87. Springer Science & Business Media, 2013.
[14] Neal Parikh and Stephen P. Boyd. Proximal algorithms. Foundations and Trends in Optimization, 1(3):127-239, 2014.
[15] I. D. Schizas, A. Ribeiro, and G. B. Giannakis. Consensus in Ad Hoc WSNs with Noisy Links - Part I: Distributed Estimation of Deterministic Signals. IEEE Transactions on Signal Processing, 56:350-364, 2008.
[16] Mark Schmidt, Nicolas Le Roux, and Francis Bach. Minimizing finite sums with the stochastic average gradient. arXiv preprint arXiv:1309.2388, 2013.
[17] Wei Shi, Qing Ling, Gang Wu, and Wotao Yin. EXTRA: An exact first-order algorithm for decentralized consensus optimization. SIAM Journal on Optimization, 25(2):944-966, 2015.
[18] Jonathan E. Spingarn. Partial inverse of a monotone operator. Applied Mathematics and Optimization, 10(1):247-265, 1983.
[19] J. N. Tsitsiklis. Problems in Decentralized Decision Making and Computation. PhD thesis, Massachusetts Institute of Technology, 1984.
[20] E. Wei and A. Ozdaglar. Distributed Alternating Direction Method of Multipliers. Proceedings of IEEE Conference on Decision and Control (CDC), 2012.
[21] E. Wei and A. Ozdaglar. On the O(1/k) convergence of asynchronous distributed Alternating Direction Method of Multipliers. In Global Conference on Signal and Information Processing (GlobalSIP), 2013 IEEE, pages 551-554. IEEE, 2013.
[22] Lin Xiao and Tong Zhang. A proximal stochastic gradient method with progressive variance reduction. SIAM Journal on Optimization, 24(4):2057-2075, 2014.
[23] H. Zhu, A. Cano, and G. B. Giannakis. In-Network Channel Decoding Using Consensus on Log-Likelihood Ratio Averages. Proceedings of Conference on Information Sciences and Systems (CISS), 2008.