A Decentralized Second-Order Method with Exact Linear Convergence Rate for Consensus Optimization
Aryan Mokhtari, Wei Shi, Qing Ling, and Alejandro Ribeiro

Abstract—This paper considers decentralized consensus optimization problems where different summands of a global objective function are available at nodes of a network that can communicate with neighbors only. The proximal method of multipliers is considered as a powerful tool that relies on proximal primal descent and dual ascent updates on a suitably defined augmented Lagrangian. The structure of the augmented Lagrangian makes this problem non-decomposable, which precludes distributed implementations. This problem is regularly addressed by the use of the alternating direction method of multipliers. The exact second order method (ESOM) is introduced here as an alternative that relies on: (i) The use of a separable quadratic approximation of the augmented Lagrangian. (ii) A truncated Taylor's series to estimate the solution of the first order condition imposed on the minimization of the quadratic approximation of the augmented Lagrangian. The sequences of primal and dual variables generated by ESOM are shown to converge linearly to their optimal arguments when the aggregate cost function is strongly convex and its gradients are Lipschitz continuous. Numerical results demonstrate advantages of ESOM relative to decentralized alternatives in solving least squares and logistic regression problems.

Index Terms—Multi-agent networks, decentralized optimization, method of multipliers, linear convergence, second-order methods

I. INTRODUCTION

In decentralized consensus optimization problems, components of a global objective function that is to be minimized are available at different nodes of a network. Formally, consider a decision variable x̃ ∈ R^p and a connected network containing n nodes where each node i has access to a local objective function f_i : R^p → R. Nodes can exchange information with neighbors only and try to minimize the global cost function Σ_{i=1}^n f_i(x̃),

    x̃* := argmin_{x̃ ∈ R^p} Σ_{i=1}^n f_i(x̃).    (1)
We assume that the local objective functions f_i(x̃) are strongly convex. The global objective function Σ_{i=1}^n f_i(x̃), which is the sum of a set of strongly convex functions, is also strongly convex. Problems like (1) arise in decentralized control [1]–[3], wireless communication [4], [5], sensor networks [6]–[8], and large scale machine learning [9]–[11]. Decentralized methods for solving (1) can be divided into two classes: primal domain methods and dual domain methods.

(Work supported by NSF CAREER CCF, ONR N, and NSFC. A. Mokhtari and A. Ribeiro are with the Dept. of Electrical and Systems Engineering, University of Pennsylvania, 200 S 33rd St., Philadelphia, PA. {aryanm, aribeiro}@seas.upenn.edu. W. Shi is with the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, 1308 W Main St, Urbana, IL. wilburs@illinois.edu. Q. Ling is with the Dept. of Automation, University of Science and Technology of China, 96 Jinzhao Rd., Hefei, Anhui, China. qingling@mail.ustc.edu.cn.)

Decentralized gradient descent (DGD) is a well-established primal method that implements gradient descent on a penalized version of (1) whose gradient can be separated into per-node components. Network Newton (NN) is a more recent alternative that accelerates convergence of DGD by incorporating second order information of the penalized objective [12], [13]. Both DGD and NN converge to a neighborhood of the optimal argument x̃* when using a constant stepsize and converge sublinearly to the exact optimal argument if using a diminishing stepsize. Dual domain methods build on the fact that the dual function of (1) has a gradient with separable structure. The use of plain dual gradient descent is possible but generally slow to converge [14]–[16]. In centralized optimization, better convergence speeds are attained by the method of multipliers (MM) that adds a quadratic augmentation term to the Lagrangian [17], [18], or the proximal (P)MM that adds an additional term to keep iterates close.
In either case, the quadratic term that is added to construct the augmented Lagrangian makes distributed computation of primal gradients impossible. This issue is most often overcome with the use of decentralized (D) versions of the alternating direction method of multipliers (ADMM) [6], [19], [20]. Besides the ADMM, other methods that use different alternatives to approximate the gradients of the dual function have also been proposed [21]–[27]. The convergence rates of these methods have not been studied except for the DADMM and its variants, which are known to converge linearly to the optimal argument when the local functions are strongly convex and their gradients are Lipschitz continuous [20], [28], [29]. An important observation here is that while all of these methods try to approximate the MM or the PMM, the performance penalty entailed by the approximation has not been studied.

This paper introduces the exact second order method (ESOM), which uses quadratic approximations of the augmented Lagrangians of (1) and leads to a set of separable subproblems. Similar to other second order methods, implementation of ESOM requires computation of Hessian inverses. Distributed implementation of this operation is infeasible because while the Hessian of the proximal augmented Lagrangian is neighbor sparse, its inverse is not. ESOM resolves this issue by using the Hessian inverse approximation technique introduced in [12], [13], [30]. This technique consists of truncating the Taylor's series of the Hessian inverse at order K to obtain the family of methods ESOM-K. Implementation of this expansion in terms of local operations is possible. A remarkable property of all ESOM-K methods is that they can be shown to pay a performance penalty relative to (centralized) PMM that vanishes with increasing iterations.

We begin the paper by reformulating (1) in a form more suitable for decentralized implementation (Proposition 1) and
proceed to describe the PMM (Section II). ESOM is a variation of PMM that substitutes the proximal augmented Lagrangian with its quadratic approximation (Section III). Implementation of ESOM requires computing the inverse of the Hessian of the proximal augmented Lagrangian. Since this inversion cannot be computed using local and neighboring information, ESOM-K approximates the Hessian inverse with the K-th order truncation of the Taylor's series expansion of the Hessian inverse. This expansion can be carried out using an inner loop of local operations. This and other details required for decentralized implementation of ESOM-K are discussed in Section III-A, along with a discussion of how ESOM can be interpreted as a saddle point generalization of the Network Newton methods proposed in [12], [13] (Remark 1) or a second order version of the EXTRA method proposed in [31] (Remark 2). Convergence analyses of PMM and ESOM are then presented (Section IV). Linear convergence of PMM is established (Section IV-A) and linear convergence factors are explicitly derived to use as benchmarks (Theorem 1). In the ESOM analysis (Section IV-B) we provide an upper bound for the error of the proximal augmented Lagrangian approximation (Lemma 3). We leverage this result to prove linear convergence of ESOM (Theorem 2) and to show that ESOM's linear convergence factor approaches the corresponding PMM factor as time grows (Section IV-C). This indicates that the convergence paths of (distributed) ESOM-K and (centralized) PMM are very close. We also study the dependency of the convergence constant on the algorithm's order K. ESOM tradeoffs and comparisons with other decentralized methods for solving consensus optimization problems are illustrated in numerical experiments (Section V) for a decentralized least squares problem (Section V-A) and a decentralized logistic regression classification problem (Section V-B). Numerical results in both settings verify that larger K leads to faster convergence in terms of number of iterations.
However, we observe that all versions of ESOM-K exhibit similar convergence rates in terms of the number of communication exchanges. This implies that ESOM-0 is preferable with respect to the latter metric and that larger K is justified when computational cost is of interest. Faster convergence relative to EXTRA, Network Newton, and DADMM is observed. We close the paper with concluding remarks (Section VI).

Notation. Vectors are written as x ∈ R^n and matrices as A ∈ R^{n×n}. Given n vectors x_i, the vector x = [x_1; ...; x_n] represents a stacking of the elements of each individual x_i. We use ‖x‖ and ‖A‖ to denote the Euclidean norm of vector x and matrix A, respectively. The norm of vector x with respect to a positive definite matrix A is ‖x‖_A := (x^T A x)^{1/2}. Given a function f, its gradient at x is denoted as ∇f(x) and its Hessian as ∇²f(x).

II. PROXIMAL METHOD OF MULTIPLIERS

Let x_i ∈ R^p be a copy of the decision variable x̃ kept at node i and define N_i as the neighborhood of node i. Assuming the network is bidirectionally connected, the optimization problem in (1) is equivalent to the program

    {x_i*}_{i=1}^n := argmin_{ {x_i}_{i=1}^n } Σ_{i=1}^n f_i(x_i),   s.t.  x_i = x_j, for all i, j ∈ N_i.    (2)

Indeed, the constraint in (2) enforces the consensus condition x_1 = ... = x_n for any feasible point of (2). With this condition satisfied, the objective in (2) is equal to the objective function in (1), from where it follows that the optimal local variables x_i* are all equal to the optimal argument x̃* of (1), i.e., x_1* = ... = x_n* = x̃*.

To derive ESOM, define x := [x_1; ...; x_n] ∈ R^{np} as the concatenation of the local decision variables x_i and the aggregate function f : R^{np} → R as f(x) = f(x_1, ..., x_n) := Σ_{i=1}^n f_i(x_i), the sum of all the local functions f_i(x_i). Introduce the matrix W ∈ R^{n×n} with elements w_ij ≥ 0 representing a weight that node i assigns to variables of node j. The weight w_ij = 0 if and only if j ∉ N_i ∪ {i}. The matrix W is further required to satisfy

    W^T = W,   W1 = 1,   null(I − W) = span(1).    (3)

The first condition implies that the weights are symmetric, i.e., w_ij = w_ji.
The second condition ensures that the weights of a given node sum up to 1, i.e., Σ_{j=1}^n w_ij = 1 for all i. Since W1 = 1 we have that I − W is rank deficient. The last condition null(I − W) = span(1) makes the rank of I − W exactly equal to n − 1 [32]. The matrix W can be used to reformulate (2) as we show in the following proposition.

Proposition 1. Define the matrix Z := W ⊗ I_p ∈ R^{np×np} as the Kronecker product of the weight matrix W and the identity matrix I_p, and consider the definitions of the global vector x := [x_1; ...; x_n] and aggregate function f(x) := Σ_{i=1}^n f_i(x_i). The optimization problem in (2) is equivalent to

    x* = argmin_{x ∈ R^{np}} f(x)   s.t.   (I − Z)^{1/2} x = 0.    (4)

I.e., x* = [x_1*; ...; x_n*] with {x_i*}_{i=1}^n the solution of (2).

Proof: We just show that the constraint ((I_n − W) ⊗ I_p)x = (I_{np} − Z)x = 0 is also a consensus constraint. To do so, begin by noticing that since I − W is positive semidefinite, I − Z = (I − W) ⊗ I_p is also positive semidefinite. Therefore, the null space of the square root matrix (I − Z)^{1/2} is equal to the null space of I − Z and we conclude that satisfying the condition (I − Z)^{1/2} x = 0 is equivalent to the consensus condition x_1 = ... = x_n. This observation in conjunction with the definition of the aggregate function f(x) = Σ_{i=1}^n f_i(x_i) shows that the programs in (4) and (2) are equivalent. In particular, the optimal solution of (4) is x* = [x_1*; ...; x_n*] with {x_i*}_{i=1}^n the solution of (2). ∎

The formulation in (4) is used to define the proximal method of multipliers (PMM) that we consider in this paper. To do so, introduce dual variables v ∈ R^{np} to define the augmented Lagrangian L(x, v) of (4) as

    L(x, v) = f(x) + v^T (I − Z)^{1/2} x + (α/2) x^T (I − Z) x,    (5)

where α is a positive constant. Given the properties of the matrix Z, the augmentation term (α/2) x^T (I − Z) x is null when the
variable x is a feasible solution of (4). Otherwise, the inner product is positive and behaves as a penalty for the violation of the consensus constraint.

Introduce now a time index t ∈ N and define x_t and v_t as primal and dual iterates at step t. The primal variable x_{t+1} is updated by minimizing the sum of the augmented Lagrangian in (5) and the proximal term (ε/2)‖x − x_t‖². We then have that

    x_{t+1} = argmin_{x ∈ R^{np}} { L(x, v_t) + (ε/2) ‖x − x_t‖² },    (6)

where the proximal coefficient ε > 0 is a strictly positive constant. The dual variable v is updated by ascending through the gradient of the augmented Lagrangian with respect to the dual variable, ∇_v L(x_{t+1}, v_t), with stepsize α,

    v_{t+1} = v_t + α (I − Z)^{1/2} x_{t+1}.    (7)

The updates in (6) and (7) for PMM can be considered as a generalization of the method of multipliers (MM), because setting the proximal coefficient ε = 0 recovers the updates of MM. The proximal term (ε/2)‖x − x_t‖² is added to keep the updated variable x_{t+1} close to the previous iterate x_t. This does not affect convergence guarantees but improves computational stability.

The primal update in (6) may be computationally costly because it requires solving a convex program, and it cannot be implemented in a decentralized manner because the augmentation term (α/2) x^T (I − Z) x in (5) is not separable. In the following section we propose an approximation of PMM that makes the minimization in (6) computationally economic and separable over nodes of the network. This leads to the set of decentralized updates that define the ESOM algorithm.

III. ESOM: EXACT SECOND-ORDER METHOD

To reduce the computational complexity of (6) and obtain a separable update we introduce a second order approximation of the augmented Lagrangian in (5). Consider then the second order Taylor's expansion L(x, v_t) ≈ L(x_t, v_t) + ∇_x L(x_t, v_t)^T (x − x_t) + (1/2)(x − x_t)^T ∇²_x L(x_t, v_t)(x − x_t) of the augmented Lagrangian with respect to x centered around (x_t, v_t). Using this approximation in lieu of L(x, v_t) in (6) leads to the primal update

    x_{t+1} = argmin_{x ∈ R^{np}} { L(x_t, v_t) + ∇_x L(x_t, v_t)^T (x − x_t) + (1/2)(x − x_t)^T ( ∇²_x L(x_t, v_t) + εI ) (x − x_t) }.    (8)
The minimization in the right hand side of (8) is of a positive definite quadratic form. Thus, upon defining the Hessian matrix H_t ∈ R^{np×np} as

    H_t := ∇²f(x_t) + α(I − Z) + εI,    (9)

and considering the explicit form of the augmented Lagrangian gradient ∇_x L(x_t, v_t) [cf. (5)], it follows that the variable x_{t+1} in (8) is given by

    x_{t+1} = x_t − H_t^{-1} [ ∇f(x_t) + (I − Z)^{1/2} v_t + α(I − Z) x_t ].    (10)

A fundamental observation here is that the matrix H_t, which is the Hessian of the objective function in (8), is block neighbor sparse. By block neighbor sparse we mean that the (i, j)-th block is non-zero if and only if j ∈ N_i or j = i. To confirm this claim, observe that ∇²f(x_t) ∈ R^{np×np} is a block diagonal matrix where its i-th diagonal block is the Hessian of the i-th local function, ∇²f_i(x_{i,t}) ∈ R^{p×p}. Additionally, the matrix εI_{np} is a diagonal matrix, which implies that the term ∇²f(x_t) + εI_{np} is a block diagonal matrix with blocks ∇²f_i(x_{i,t}) + εI_p. Further, it follows from the definition of the matrix Z that the matrix I − Z is neighbor sparse. Therefore, the Hessian H_t is also neighbor sparse.

Although the Hessian H_t is neighbor sparse, its inverse H_t^{-1} is not. This observation leads to the conclusion that the update in (10) is not implementable in a decentralized manner, i.e., nodes cannot implement (10) by exchanging information only with their neighbors. To resolve this issue, we use a Hessian inverse approximation that is built on truncating the Taylor's series of the Hessian inverse H_t^{-1} as in [12]. To do so, we decompose the Hessian as H_t = D_t − B, where D_t is a block diagonal positive definite matrix and B is a neighbor sparse positive semidefinite matrix. In particular, define D_t as

    D_t := ∇²f(x_t) + εI + 2α(I − Z_d),    (11)

where Z_d := diag(Z). Observing the definitions of the matrices H_t and D_t and considering the relation B = D_t − H_t, we conclude that B is given by

    B := α (I − 2Z_d + Z).    (12)

Notice that using the decomposition H_t = D_t − B and by factoring D_t^{1/2}, the Hessian inverse can be written as H_t^{-1} = D_t^{-1/2} ( I − D_t^{-1/2} B D_t^{-1/2} )^{-1} D_t^{-1/2}.
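As a sanity check on the splitting above, the identity H_t^{-1} = D_t^{-1/2}(I − D_t^{-1/2} B D_t^{-1/2})^{-1} D_t^{-1/2} can be verified numerically. The sketch below is not from the paper's experiments: the 4-node cycle, lazy Metropolis weights, and the values of α and ε are illustrative assumptions, with p = 1 and quadratic local costs so that ∇²f = I.

```python
import numpy as np

# Hypothetical 4-node cycle with lazy Metropolis weights:
# w_ii = 1/2 and w_ij = 1/4 for each of the two neighbors of node i.
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
W = 0.5 * np.eye(4) + 0.25 * A             # symmetric, doubly stochastic, satisfies (3)
Z, Zd = W, np.diag(np.diag(W))             # p = 1, so Z = W

alpha, eps = 1.0, 0.1                      # illustrative parameter choices
hess_f = np.eye(4)                         # Hessian of f for f_i(x) = (x - a_i)^2 / 2

H = hess_f + alpha * (np.eye(4) - Z) + eps * np.eye(4)         # Eq. (9)
D = hess_f + eps * np.eye(4) + 2 * alpha * (np.eye(4) - Zd)    # Eq. (11)
B = alpha * (np.eye(4) - 2 * Zd + Z)                           # Eq. (12)
assert np.allclose(H, D - B)               # the splitting H_t = D_t - B

# D is (block) diagonal, so D^{-1/2} is cheap and locally computable.
D_is = np.diag(1.0 / np.sqrt(np.diag(D)))
E = D_is @ B @ D_is
assert np.linalg.norm(E, 2) < 1            # so the Neumann series of (I - E)^{-1} converges
assert np.allclose(np.linalg.inv(H),
                   D_is @ np.linalg.inv(np.eye(4) - E) @ D_is)
```

Because ‖D_t^{-1/2} B D_t^{-1/2}‖ < 1 in this setup, the inverse in the middle factor admits a convergent series expansion, which is what the truncation used by ESOM exploits.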
Observe that the inverse matrix (I − D_t^{-1/2} B D_t^{-1/2})^{-1} can be substituted by its Taylor's series Σ_{u=0}^∞ (D_t^{-1/2} B D_t^{-1/2})^u; however, computation of the series requires global communication, which is not affordable in decentralized settings. Thus, we approximate the Hessian inverse H_t^{-1} by truncating to the first K + 1 terms of its Taylor's series, which leads to the Hessian inverse approximation H̃_t^{-1}(K),

    H̃_t^{-1}(K) := D_t^{-1/2} Σ_{u=0}^{K} ( D_t^{-1/2} B D_t^{-1/2} )^u D_t^{-1/2}.    (13)

Notice that the approximate Hessian inverse H̃_t^{-1}(K) is K-hop block neighbor sparse, i.e., the (i, j)-th block is nonzero if and only if there is at least one path between nodes i and j with length K or smaller.

We introduce the Exact Second-Order Method (ESOM) as a second order method for solving decentralized optimization problems which substitutes the Hessian inverse in update (10) by its K-hop block neighbor sparse approximation H̃_t^{-1}(K) defined in (13). Therefore, the primal update of ESOM is

    x_{t+1} = x_t − H̃_t^{-1}(K) [ ∇f(x_t) + (I − Z)^{1/2} v_t + α(I − Z) x_t ].    (14)

The ESOM dual update is identical to the update in (7),

    v_{t+1} = v_t + α (I − Z)^{1/2} x_{t+1}.    (15)

Notice that ESOM differs from PMM in approximating the augmented Lagrangian in the primal update of PMM by a second order approximation. Further, ESOM approximates the Hessian
inverse of the augmented Lagrangian by truncating the Taylor's series of the Hessian inverse, which is not necessarily neighbor sparse. In the following subsection we study the implementation details of the updates in (14) and (15).

A. Decentralized implementation of ESOM

The updates in (14) and (15) show that ESOM is a second order approximation of PMM. Although these updates are necessary for understanding the rationale behind ESOM, they are not implementable in a decentralized fashion since the matrix (I − Z)^{1/2} is not neighbor sparse. To resolve this issue, define the sequence of variables q_t as q_t := (I − Z)^{1/2} v_t. Considering the definition of q_t, the primal update in (14) can be written as

    x_{t+1} = x_t − H̃_t^{-1}(K) ( ∇f(x_t) + q_t + α(I − Z) x_t ).    (16)

By multiplying the dual update in (15) by (I − Z)^{1/2} from the left hand side and using the definition q_t := (I − Z)^{1/2} v_t, we obtain that

    q_{t+1} = q_t + α (I − Z) x_{t+1}.    (17)

Notice that the system of updates in (16) and (17) is equivalent to the updates in (14) and (15), i.e., the sequences of variables x_t generated by them are identical. Nodes can implement the primal-dual updates in (16) and (17) in a decentralized manner, since the square root matrix (I − Z)^{1/2} is eliminated from the updates and nodes can compute the products (I − Z) x_t and (I − Z) x_{t+1} by exchanging information with their neighbors.

To characterize the local update of each node for implementing the updates in (16) and (17), define

    g_t := ∇_x L(x_t, v_t) = ∇f(x_t) + q_t + α(I − Z) x_t,    (18)

as the gradient of the augmented Lagrangian in (5). Further, define the primal descent direction d_t(K) with K levels of approximation as

    d_t(K) := −H̃_t^{-1}(K) g_t,    (19)

which implies that the update in (16) can be written as x_{t+1} = x_t + d_t(K). Based on the mechanism of the Hessian inverse approximation in (13), the descent directions d_t(k) and d_t(k+1) satisfy the condition

    d_t(k+1) = D_t^{-1} B d_t(k) − D_t^{-1} g_t.    (20)

Define d_{i,t}(k) as the descent direction of node i at step t, which is the i-th element of the global descent direction d_t(k) = [d_{1,t}(k); ...; d_{n,t}(k)].
Therefore, the localized version of the relation in (20) at node i is given by

    d_{i,t}(k+1) = D_{ii,t}^{-1} [ Σ_{j ∈ N_i ∪ {i}} B_ij d_{j,t}(k) − g_{i,t} ].    (21)

The update in (21) shows that node i can compute its (k+1)-th descent direction d_{i,t}(k+1) if it has access to the k-th descent direction d_{i,t}(k) of itself and its neighbors' directions d_{j,t}(k) for j ∈ N_i. Thus, if nodes initialize with the ESOM-0 descent direction d_{i,t}(0) = −D_{ii,t}^{-1} g_{i,t}, exchange their descent directions with their neighbors for K rounds, and use the update in (21), they can compute their local ESOM-K descent direction d_{i,t}(K).

Algorithm 1 ESOM-K method at node i
Require: Initial iterates x_{i,0} = x_{j,0} = 0 for all j ∈ N_i and q_{i,0} = 0.
 1: B blocks: B_ii = α(1 − w_ii) I and B_ij = α w_ij I
 2: for t = 0, 1, 2, ... do
 3:   D block: D_{ii,t} = ∇²f_i(x_{i,t}) + (2α(1 − w_ii) + ε) I
 4:   Compute g_{i,t} = ∇f_i(x_{i,t}) + q_{i,t} + α(1 − w_ii) x_{i,t} − α Σ_{j ∈ N_i} w_ij x_{j,t}
 5:   Compute ESOM-0 descent direction d_{i,t}(0) = −D_{ii,t}^{-1} g_{i,t}
 6:   for k = 0, ..., K − 1 do
 7:     Exchange d_{i,t}(k) with neighbors j ∈ N_i
 8:     Compute d_{i,t}(k+1) = D_{ii,t}^{-1} [ Σ_{j ∈ N_i ∪ {i}} B_ij d_{j,t}(k) − g_{i,t} ]
 9:   end for
10:   Update primal iterate: x_{i,t+1} = x_{i,t} + d_{i,t}(K).
11:   Exchange iterates x_{i,t+1} with neighbors j ∈ N_i.
12:   Update dual iterate: q_{i,t+1} = q_{i,t} + α(1 − w_ii) x_{i,t+1} − α Σ_{j ∈ N_i} w_ij x_{j,t+1}.
13: end for

Notice that the i-th diagonal block of D_t is given by D_{ii,t} := ∇²f_i(x_{i,t}) + (2α(1 − w_ii) + ε) I, where x_{i,t} is the primal variable of node i at step t. Thus, the block D_{ii,t} is locally available at node i. Moreover, node i can evaluate the blocks B_ii = α(1 − w_ii) I and B_ij = α w_ij I without extra communication. In addition, nodes can compute the gradient g_t by communicating with their neighbors. To confirm this claim, observe that the i-th element of g_t = [g_{1,t}; ...; g_{n,t}] associated with node i is given by

    g_{i,t} := ∇f_i(x_{i,t}) + q_{i,t} + α(1 − w_ii) x_{i,t} − α Σ_{j ∈ N_i} w_ij x_{j,t},    (22)

where q_{i,t} ∈ R^p is the i-th element of q_t = [q_{1,t}; ...; q_{n,t}] and x_{i,t} is the primal variable of node i at step t, and they are both available at node i. Hence, the update in (16) can be implemented in a decentralized manner.
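The recursion in (21) is nothing more than a local evaluation of the truncated series in (13): starting from d_t(0) = −D_t^{-1} g_t, K applications of (20) reproduce d_t(K) = −H̃_t^{-1}(K) g_t. The following minimal check works at the network level; the 4-node cycle, lazy Metropolis weights, parameter values, and the stand-in gradient vector are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Illustrative setup: 4-node cycle, lazy Metropolis weights, p = 1,
# quadratic local costs so the aggregate Hessian is the identity.
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
W = 0.5 * np.eye(4) + 0.25 * A
Zd = np.diag(np.diag(W))
alpha, eps, K = 1.0, 0.1, 3

D = np.eye(4) + eps * np.eye(4) + 2 * alpha * (np.eye(4) - Zd)   # Eq. (11)
B = alpha * (np.eye(4) - 2 * Zd + W)                             # Eq. (12)
g = np.array([1.0, -2.0, 0.5, 0.5])       # stand-in for the gradient g_t of Eq. (18)

# Truncated series (13): H~_t^{-1}(K) = D^{-1/2} sum_{u<=K} (D^{-1/2} B D^{-1/2})^u D^{-1/2}
D_is = np.diag(1.0 / np.sqrt(np.diag(D)))
E = D_is @ B @ D_is
Hinv_K = D_is @ sum(np.linalg.matrix_power(E, u) for u in range(K + 1)) @ D_is

# Recursion (20)/(21): d(0) = -D^{-1} g,  d(k+1) = D^{-1} B d(k) - D^{-1} g
D_inv = np.diag(1.0 / np.diag(D))
d = -D_inv @ g
for _ in range(K):
    d = D_inv @ (B @ d) - D_inv @ g

assert np.allclose(d, -Hinv_K @ g)        # K exchanges reproduce the ESOM-K direction
```

In a真 distributed run each node would only evaluate its own row of the recursion, which requires the d_{j,t}(k) of its neighbors; the dense matrix products here simply simulate all nodes at once.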
Likewise, nodes can implement the dual update in (17) using the local update

    q_{i,t+1} = q_{i,t} + α(1 − w_ii) x_{i,t+1} − α Σ_{j ∈ N_i} w_ij x_{j,t+1},    (23)

which requires access to the local primal variables x_{j,t+1} of the neighboring nodes j ∈ N_i. The steps of ESOM-K are summarized in Algorithm 1. The core steps are Steps 5-9, which correspond to computing the ESOM-K primal descent direction d_{i,t}(K). In Step 5, each node computes its initial descent direction d_{i,t}(0) using the block D_{ii,t} and the local gradient g_{i,t} computed in Steps 3 and 4, respectively. Steps 7 and 8 correspond to the recursion in (21). In Step 7, nodes exchange their k-th level descent direction d_{i,t}(k) with their neighboring nodes to compute the (k+1)-th descent direction d_{i,t}(k+1) in Step 8. The outcome of this recursion is the K-th level descent direction d_{i,t}(K), which is required for the update of the primal variable x_{i,t} in Step 10. Notice that the blocks of the neighbor sparse matrix B, which are required for Step 8, are computed and stored in Step 1. After updating the primal variables in Step 10, nodes exchange their updated variables x_{i,t+1} with their neighbors j ∈ N_i in Step 11. By having access to the decision variables of neighboring nodes, nodes update their local dual variables q_{i,t} in Step 12.
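To see the updates (16)-(17) in action end to end, the sketch below simulates them at the network level for a toy decentralized averaging problem with f_i(x) = (x − a_i)²/2, whose exact solution is the mean of the a_i. The network, weights, and parameter choices are illustrative assumptions and not the paper's experimental setup.

```python
import numpy as np

# Toy problem: f_i(x) = (x - a_i)^2 / 2 on a 4-node cycle, so x* = mean(a) = 2.5.
a = np.array([1.0, 2.0, 3.0, 4.0])
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
W = 0.5 * np.eye(4) + 0.25 * A            # lazy Metropolis weights, satisfies (3)
Zd = np.diag(np.diag(W))
alpha, eps, K = 1.0, 0.1, 3               # illustrative parameter choices

I = np.eye(4)
D = I + eps * I + 2 * alpha * (I - Zd)    # Eq. (11); here grad^2 f_i = 1, so D is constant
B = alpha * (I - 2 * Zd + W)              # Eq. (12)
D_inv = np.diag(1.0 / np.diag(D))

x = np.zeros(4)                           # local copies x_i (Algorithm 1 initialization)
q = np.zeros(4)                           # local dual variables q_i

for t in range(300):
    g = (x - a) + q + alpha * (I - W) @ x       # Eq. (18): grad f_i(x_i) = x_i - a_i
    d = -D_inv @ g                              # ESOM-0 direction (Step 5)
    for _ in range(K):                          # K neighbor exchanges (Steps 6-9)
        d = D_inv @ (B @ d) - D_inv @ g
    x = x + d                                   # primal update (Step 10), cf. Eq. (16)
    q = q + alpha * (I - W) @ x                 # dual update (Step 12), cf. Eq. (17)

print(x)   # all entries converge to 2.5, the exact minimizer
```

Unlike DGD with a constant stepsize, the iterates here reach the exact optimizer rather than a neighborhood of it, reflecting the exact convergence property established in Section IV.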
Remark 1. The proposed ESOM algorithm solves problem (4) in the dual domain by defining the proximal augmented Lagrangian. It is also possible to solve problem (4) in the primal domain by solving a penalty version of (4). In particular, by using the quadratic penalty function (1/2)‖·‖² for the constraint (I − Z)^{1/2} x = 0 with penalty coefficient α, we obtain the penalized version of (4),

    x̂* := argmin_{x ∈ R^{np}} f(x) + (α/2) x^T (I − Z) x,    (24)

where x̂* is the optimal argument of the penalized objective function. Notice that x̂* is not equal to the optimal argument x* and the distance ‖x* − x̂*‖ is of the order of O(1/α). The objective function in (24) can be minimized by descending through the gradient descent direction, which leads to the update of decentralized gradient descent (DGD) [33]. The convergence of DGD can be improved by using Newton's method. Notice that the Hessian of the objective function in (24) is given by

    Ĥ := ∇²f(x) + α(I − Z).    (25)

The Hessian Ĥ in (25) is identical to the Hessian H_t in (9) except for the term εI. Therefore, the same technique for approximating the Hessian inverse Ĥ^{-1} can be used to approximate the Newton direction of the penalized objective function in (24), which leads to the update of the Network Newton (NN) methods [12], [13]. Thus, ESOM and NN use an approximate decentralized variation of Newton's method for solving two different problems. In other words, ESOM uses the approximate Newton direction for minimizing the augmented Lagrangian of (4), while NN solves a penalized version of (4) using this approximation. This difference justifies why the sequence of iterates generated by ESOM converges to the optimal argument x* (Section IV), while NN converges to a neighborhood of x*.

Remark 2. ESOM approximates the augmented Lagrangian L(x, v) in (6) by its second order approximation. If we substitute the augmented Lagrangian by its first order approximation we can recover the update of EXTRA proposed in [31]. To be more precise, we can substitute L(x, v_t) by its first order approximation L(x_t, v_t) + ∇_x L(x_t, v_t)^T (x − x_t) near the point (x_t, v_t) to update the primal variable x.
Considering this substitution and the definition of the augmented Lagrangian in (5), it follows that the update for the primal variable x can be written as

    x_{t+1} = x_t − (1/ε) [ ∇f(x_t) + (I − Z)^{1/2} v_t + α(I − Z) x_t ].    (26)

By subtracting the update at step t − 1 from the update at step t and using the dual variables relation v_t = v_{t−1} + α(I − Z)^{1/2} x_t, we obtain the update

    x_{t+1} = ( 2I − (2α/ε)(I − Z) ) x_t − ( I − (α/ε)(I − Z) ) x_{t−1} − (1/ε) ( ∇f(x_t) − ∇f(x_{t−1}) ).    (27)

The update in (27) shows a first-order approximation of the PMM. It is not hard to show that for specific choices of α and ε, the update in (27) is equivalent to the update of EXTRA in [31]. Thus, we expect to observe faster convergence for ESOM relative to EXTRA as it incorporates second-order information. This advantage is studied in Section V.

IV. CONVERGENCE ANALYSIS

In this section we study convergence rates of PMM and ESOM. First, we show that the sequence of iterates x_t generated by PMM converges linearly to the optimal argument x*. Although PMM cannot be implemented in a decentralized fashion, its convergence rate can be used as a benchmark for evaluating the performance of ESOM. We then follow the section by analyzing convergence properties of ESOM. We show that ESOM exhibits a linear convergence rate and compare its factor of linear convergence with the linear convergence factor of PMM. In proving these results we consider the following assumptions.

Assumption 1. The local objective functions f_i(x) are twice differentiable and the eigenvalues of the local objective function Hessians ∇²f_i(x_i) are bounded by positive constants 0 < m ≤ M < ∞, i.e.,

    mI ⪯ ∇²f_i(x_i) ⪯ MI,    (28)

for all x_i ∈ R^p and i = 1, ..., n.

The lower bound in (28) is equivalent to the condition that the local objective functions f_i are strongly convex with constant m > 0. The upper bound for the eigenvalues of the Hessians ∇²f_i implies that the gradients of the local objective functions ∇f_i are Lipschitz continuous with constant M. Notice that the global objective function Hessian ∇²f(x) is a block diagonal matrix where its i-th diagonal block is ∇²f_i(x_i).
Therefore, the bounds on the eigenvalues of the local Hessians ∇²f_i(x_i) in (28) also hold for the global objective function Hessian ∇²f(x). I.e.,

    mI ⪯ ∇²f(x) ⪯ MI,    (29)

for all x ∈ R^{np}. Thus, the global objective function f is also strongly convex with constant m and its gradients ∇f are Lipschitz continuous with constant M.

A. Convergence of Proximal Method of Multipliers (PMM)

The convergence rate of PMM can be considered as a benchmark for the convergence rate of ESOM. To establish linear convergence of PMM, we first study the relationship between the primal x_t and dual v_t iterates generated by PMM and the optimal arguments x* and v* in the following lemma.

Lemma 1. Consider the updates for the proximal method of multipliers in (6) and (7). The sequences of primal and dual iterates generated by PMM satisfy

    v_{t+1} − v_t − α(I − Z)^{1/2} (x_{t+1} − x*) = 0,    (30)

and

    ∇f(x_{t+1}) − ∇f(x*) + (I − Z)^{1/2} (v_{t+1} − v*) + ε(x_{t+1} − x_t) = 0.    (31)

Proof: See Appendix A. ∎

Considering the preliminary results in (30) and (31), we can state convergence results of PMM. To do so, we prove linear convergence of a Lyapunov function of the primal ‖x_t − x*‖²
and dual ‖v_t − v*‖² errors. To be more precise, we define the vector u ∈ R^{2np} and matrix G ∈ R^{2np×2np} as

    u = [ v ; x ],    G = [ I  0 ; 0  αεI ].    (32)

Notice that the sequence u_t is the concatenation of the dual variable v_t and primal variable x_t. Likewise, we can define u* as the concatenation of the optimal arguments v* and x*. We proceed to prove that the sequence ‖u_t − u*‖²_G converges linearly to zero. Observe that ‖u_t − u*‖²_G can be simplified as ‖v_t − v*‖² + αε ‖x_t − x*‖². This observation shows that ‖u_t − u*‖²_G is a Lyapunov function of the primal ‖x_t − x*‖² and dual ‖v_t − v*‖² errors. Therefore, linear convergence of the sequence ‖u_t − u*‖²_G implies linear convergence of the sequence ‖x_t − x*‖². In the following theorem, we show that the sequence ‖u_t − u*‖²_G converges to zero at a linear rate.

Theorem 1. Consider the proximal method of multipliers as introduced in (6) and (7). Consider β > 1 as an arbitrary constant strictly larger than 1 and define λ̂_min(I − Z) as the smallest nonzero eigenvalue of the matrix I − Z. Further, recall the definitions of the vector u and matrix G in (32). If Assumption 1 holds, then the sequence of Lyapunov functions ‖u_t − u*‖²_G generated by PMM satisfies

    ‖u_{t+1} − u*‖²_G ≤ (1/(1 + δ)) ‖u_t − u*‖²_G,    (33)

where the constant δ is given by

    δ = min { λ̂_min(I − Z) / (β(m + M)) , 2mM / (ε(m + M)) , (β − 1) α λ̂_min(I − Z) / (βε) }.    (34)

Proof: See Appendix B. ∎

The result in Theorem 1 shows linear convergence of the sequence ‖u_t − u*‖²_G generated by PMM, where the factor of linear convergence is 1/(1 + δ). Observe that larger δ implies a smaller linear convergence factor 1/(1 + δ) and faster convergence. Notice that all the terms in the minimization in (34) are positive and therefore the constant δ is strictly larger than 0. In addition, the result in Theorem 1 holds for any feasible set of parameters β > 1, ε > 0, and α > 0; however, maximizing the parameter δ requires properly choosing the set of parameters β, ε, and α. Observe that when the first positive eigenvalue λ̂_min(I − Z) of the matrix I − Z, which is the second smallest eigenvalue of I − Z, is small, the constant δ becomes close to zero and convergence becomes slow.
Notice that a small λ̂_min(I − Z) indicates that the graph is not highly connected. This observation matches the intuition that when the graph has fewer edges the speed of convergence is slower. Additionally, the upper bounds in (34) show that when the condition number M/m of the global objective function f is large, δ becomes small and the linear convergence becomes slow. Although PMM enjoys a fast linear convergence rate, each iteration of PMM requires infinite rounds of communication, which makes it infeasible. In the following section, we study convergence properties of ESOM as a second order approximation of PMM that is implementable in decentralized settings.

B. Convergence of ESOM

We proceed to show that the sequence of iterates x_t generated by ESOM converges linearly to the optimal argument x* = [x̃*; ...; x̃*]. To do so, we first prove linear convergence of the Lyapunov function ‖u_t − u*‖²_G as defined in (32). Moreover, we show that by increasing the Hessian inverse approximation accuracy, ESOM's factor of linear convergence can be arbitrarily close to the linear convergence factor of PMM in Theorem 1.

Notice that ESOM is built on a second order approximation of the proximal augmented Lagrangian used in the update of PMM. To guarantee that the second order approximation suggested in ESOM is feasible, the local objective functions f_i are required to be twice differentiable as assumed in Assumption 1. The twice differentiability of the local objective functions f_i implies that the aggregate function f, which is the sum of a set of twice differentiable functions, is also twice differentiable. This observation shows that the global objective function Hessian ∇²f(x) is definable. Considering this observation, we prove some preliminary results for the iterates generated by ESOM in the following lemma.

Lemma 2. Consider the updates of ESOM in (14) and (15). Recall the definitions of the augmented Lagrangian Hessian H_t in (9) and the approximate Hessian inverse H̃_t^{-1}(K) in (13). If Assumption 1 holds, then the primal and dual iterates generated by ESOM satisfy

    v_{t+1} − v_t − α(I − Z)^{1/2} (x_{t+1} − x*) = 0.
(35)

Moreover, we can show that

    ∇f(x_{t+1}) − ∇f(x*) + (I − Z)^{1/2} (v_{t+1} − v*) + ε(x_{t+1} − x_t) + e_t = 0,    (36)

where the error vector e_t is defined as

    e_t := ∇f(x_t) + ∇²f(x_t)(x_{t+1} − x_t) − ∇f(x_{t+1}) + ( H̃_t(K) − H_t ) (x_{t+1} − x_t),    (37)

with H̃_t(K) := ( H̃_t^{-1}(K) )^{-1}.

Proof: See Appendix C. ∎

The results in Lemma 2 show the relationships between the primal x_t and dual v_t iterates generated by ESOM and the optimal arguments x* and v*. The first result in (35) is identical to the convergence property of PMM in (30), while the second result in (36) differs from (31) in having the extra summand e_t. The vector e_t can be interpreted as the error of the second order approximation of ESOM at step t. To be more precise, the optimality condition of the primal update of PMM is given by ∇f(x_{t+1}) + (I − Z)^{1/2} v_t + α(I − Z) x_{t+1} + ε(x_{t+1} − x_t) = 0, as shown in (31). Notice that the second order approximation of this condition is equivalent to ∇f(x_t) + ∇²f(x_t)(x_{t+1} − x_t) + (I − Z)^{1/2} v_t + α(I − Z) x_{t+1} + ε(x_{t+1} − x_t) = 0. However, the exact Hessian inverse H_t^{-1} = ( ∇²f(x_t) + εI + α(I − Z) )^{-1} cannot be computed in a distributed manner to solve this optimality condition. Thus, it is approximated by the approximate Hessian inverse matrix H̃_t^{-1}(K) as introduced in (13). This shows that the approximate optimality condition in ESOM is ∇f(x_t) + (I − Z)^{1/2} v_t + α(I − Z) x_t + H̃_t(K)(x_{t+1} − x_t) = 0. Hence, the difference between the optimality conditions of PMM and ESOM is e_t = ∇f(x_t) − ∇f(x_{t+1}) + α(I − Z)(x_t − x_{t+1})
+ H̃_t(K)(x_{t+1} − x_t) − ε(x_{t+1} − x_t). By adding and subtracting the term H_t(x_{t+1} − x_t), the definition of the error vector e_t in (37) follows.

The observation that the vector e_t characterizes the error of the second order approximation in ESOM motivates analyzing an upper bound for the error vector norm ‖e_t‖. To prove that the norm ‖e_t‖ is bounded above, we assume the following condition is satisfied.

Assumption 2. The global objective function Hessian ∇²f(x) is Lipschitz continuous with constant L, i.e.,

    ‖∇²f(x) − ∇²f(x̂)‖ ≤ L ‖x − x̂‖.    (38)

The condition imposed by Assumption 2 is customary in the analysis of second order methods; see, e.g., [29]. In the following lemma we use the assumption in (38) to prove an upper bound for the error norm ‖e_t‖ in terms of the distance ‖x_{t+1} − x_t‖.

Lemma 3. Consider ESOM as introduced in (8)-(15) and recall the definition of the error vector e_t in (37). Further, define c > 0 as a lower bound for the local weights w_ii. If Assumptions 1-2 hold, then the error vector norm ‖e_t‖ is bounded above by

    ‖e_t‖ ≤ Γ_t ‖x_{t+1} − x_t‖,    (39)

where Γ_t is defined as

    Γ_t := min { 2M , (L/2) ‖x_{t+1} − x_t‖ } + ( M + ε + 2α(1 − c) ) ρ^{K+1},    (40)

and ρ := 2α(1 − c) / ( 2α(1 − c) + m + ε ).

Proof: See Appendix D. ∎

First, note that the lower bound c > 0 on the local weights w_ii is implied by the fact that all the local weights are positive. In particular, we can define the lower bound c as c := min_i w_ii. The result in (39) shows that the error of the second order approximation in ESOM vanishes as the sequence of iterates x_t approaches the optimal argument x*. We will show in Theorem 2 that ‖x_t − x*‖ converges to zero, which implies that the limit of the sequence ‖x_{t+1} − x_t‖ is zero. To understand the definition of Γ_t in (40), we have to decompose the error vector e_t in (37) into two parts. The first part is ∇f(x_t) + ∇²f(x_t)(x_{t+1} − x_t) − ∇f(x_{t+1}), which comes from the fact that ESOM minimizes a second order approximation of the proximal augmented Lagrangian instead of the exact proximal augmented Lagrangian. This term can be bounded by min{2M, (L/2)‖x_{t+1} − x_t‖} ‖x_{t+1} − x_t‖, as shown in Lemma 3. The second part of the error vector e_t is (H̃_t(K) − H_t)(x_{t+1} − x_t), which shows the error of the Hessian inverse approximation.
Notice that computation of the exact Hessian inverse H_t^{−1} is not possible in a distributed manner, and ESOM approximates the exact Hessian inverse by the approximation H̃_t^{(K)}. According to the results in [12], the difference ‖(H̃_t^{(K)})^{−1} − H_t‖ can be upper bounded by (M + ε + 2α(1 − c))ρ^{K+1}, which justifies the second term of the expression for Γ_t in (40).

In the following theorem, we use the result in Lemma 3 to show that the sequence of Lyapunov functions ‖u_t − u*‖²_G generated by ESOM converges to zero linearly.

Theorem 2. Consider ESOM as introduced in (8)-(15). Consider β > 1 and φ > 1 as arbitrary constants that are strictly larger than 1, and ζ as a positive constant that is chosen from the interval ζ ∈ ((m + M)/(2mM), ε/Γ_t²). Further, recall the definitions of the vector u and matrix G in (32), and consider λ̂_min(I − Z) as the smallest non-zero eigenvalue of the matrix I − Z. If Assumptions 1-2 hold, then the sequence of Lyapunov functions ‖u_t − u*‖²_G generated by ESOM satisfies

‖u_{t+1} − u*‖²_G ≤ (1/(1 + δ'_t)) ‖u_t − u*‖²_G, (41)

where the sequence δ'_t is given by

δ'_t = min{ 2αλ̂_min(I − Z)/(φβ(m + M)), [2mM/(ε(m + M)) − 1/(ζε)][1 − φΓ_t²(β − 1)/((φ − 1)ε²)], [(β − 1)αλ̂_min(I − Z)/(βε)][1 − ζΓ_t²/ε] }. (42)

Proof: See Appendix E.

The result in Theorem 2 shows linear convergence of the sequence ‖u_t − u*‖²_G generated by ESOM, where the factor of linear convergence is 1/(1 + δ'_t). Notice that the positive constant ζ is chosen from the interval ((m + M)/(2mM), ε/Γ_t²). This interval is non-empty if and only if the proximal parameter ε satisfies the condition ε > Γ_t²(m + M)/(2mM). It follows from the result in Theorem 2 that the sequence of primal variables x_t converges to the optimal argument x* defined in (4).

Corollary 1. Under the assumptions in Theorem 2, the sequence of squared errors ‖x_t − x*‖² generated by ESOM converges to zero at a linear rate, i.e.,

‖x_t − x*‖² ≤ (1/(1 + min_s{δ'_s}))^t (1/(αε)) ‖u_0 − u*‖²_G. (43)

Proof: According to the definitions of the sequence u_t and matrix G, we can write ‖u_t − u*‖²_G = ‖v_t − v*‖² + αε‖x_t − x*‖², which implies that ‖x_t − x*‖² ≤ (1/(αε))‖u_t − u*‖²_G. Considering this result and the linear convergence of the sequence ‖u_t − u*‖²_G in (41), the claim in (43) follows.

C. Convergence rates comparison

The expression for δ'_t in (42) verifies the intuition that the convergence rate of ESOM is slower than that of PMM. This is true since the upper bounds for δ in PMM are larger than their equivalent upper bounds for δ'_t in ESOM. We obtain that δ'_t is smaller than δ, which implies that the linear convergence factor 1/(1 + δ) of PMM is smaller than 1/(1 + δ'_t) for ESOM. Therefore, at every step, the linear convergence of PMM is faster than that of ESOM. Although the linear convergence factor 1/(1 + δ'_t) of ESOM is larger than 1/(1 + δ) for PMM, as time passes the gap between these two constants becomes smaller. In particular, notice that after a number of iterations (L/2)‖x_{t+1} − x_t‖ becomes smaller than 2M, and Γ_t can be simplified as

Γ_t = (L/2)‖x_{t+1} − x_t‖ + (M + ε + 2α(1 − c)) ρ^{K+1}. (44)
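Under the reconstruction of Γ_t in (40), the Hessian-approximation summand decays geometrically in K. A small numeric illustration with placeholder constants (the values of M, m, ε, α, and c below are arbitrary choices for illustration, not values taken from the paper):

```python
# Illustrative only: placeholder values for M, m, epsilon, alpha, c.
M, m, eps, alpha, c = 10.0, 1.0, 1.0, 0.1, 0.25
rho = 2 * alpha * (1 - c) / (2 * alpha * (1 - c) + m + eps)   # always in (0, 1)
# second summand of Gamma_t in (40) for K = 0, 1, ..., 4
terms = [(M + eps + 2 * alpha * (1 - c)) * rho ** (K + 1) for K in range(5)]
assert 0 < rho < 1
assert all(a > b for a, b in zip(terms, terms[1:]))  # geometric decay in K
```

Since ρ < 1 regardless of the constants, increasing K makes this term, and hence Γ_t, arbitrarily small, which is what drives δ'_t toward the PMM constant δ.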
The term (L/2)‖x_{t+1} − x_t‖ eventually approaches zero, while the second term (M + ε + 2α(1 − c))ρ^{K+1} is constant. Although the second term does not approach zero, by proper choice of ρ and K this term can be made arbitrarily close to zero. Notice that when Γ_t approaches zero, if we set ζ = 1/Γ_t the upper bounds in (42) for δ'_t approach the upper bounds for δ of PMM in (34). Therefore, as time passes, Γ_t becomes smaller and the factor of linear convergence 1/(1 + δ'_t) for ESOM becomes closer to the linear convergence factor 1/(1 + δ) of PMM.

V. NUMERICAL EXPERIMENTS

In this section we compare the performances of ESOM, EXTRA, Decentralized ADMM (DADMM), and Network Newton (NN). First we consider a linear least squares problem and then we use the mentioned methods to solve a logistic regression problem.

A. Decentralized linear least squares

Consider a decentralized linear least squares problem where each agent i ∈ {1, ..., n} holds its private measurement equation, y_i = M_i x̃ + ν_i, where y_i ∈ R^{m_i} and M_i ∈ R^{m_i × p} are measured data, x̃ ∈ R^p is the unknown variable, and ν_i ∈ R^{m_i} is some unknown noise. The decentralized linear least squares problem estimates x̃ by solving the optimization problem

x̃* = argmin_{x̃} Σ_{i=1}^n ‖M_i x̃ − y_i‖₂². (45)

The network in this experiment is randomly generated with connectivity ratio r = 3/n, where r is defined as the number of edges divided by the number of all possible ones, n(n − 1)/2. We set n = 20, p = 5, and m_i = 5 for all i = 1, ..., n. The vectors y_i and matrices M_i, as well as the noise vectors ν_i, are generated following the standard normal distribution. We precondition the aggregated data matrices M_i so that the condition number of the problem is 10. The decision variables x_i are initialized as x_{i,0} = 0 for all nodes i = 1, ..., n, and the initial distance to the optimum is ‖x_{i,0} − x̃*‖ = 100. We use the Metropolis constant edge weight matrix as the mixing matrix W in all experiments. We run PMM, EXTRA, and ESOM-K with fixed hand-optimized stepsizes α. The best choices of α for ESOM-0, ESOM-1, and ESOM-2 are α = 0.03, α = 0.04, and α = 0.05, respectively.
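As a reproducibility aid, the following sketch builds a connected random network and a Metropolis edge weight matrix of the kind used as the mixing matrix W. The graph here is a random spanning tree for simplicity (the paper's connectivity ratio r = 3/n adds extra edges), and the rule w_ij = 1/(1 + max(d_i, d_j)) is the standard Metropolis weight choice, assumed rather than quoted from the paper.

```python
import numpy as np

def metropolis_weights(adj):
    # Metropolis weights: w_ij = 1 / (1 + max(d_i, d_j)) for each edge (i, j),
    # w_ii = 1 - sum_{j != i} w_ij. The result is symmetric and doubly stochastic
    # with strictly positive diagonal entries w_ii.
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

rng = np.random.default_rng(1)
n = 20
adj = np.zeros((n, n), dtype=bool)
for i in range(1, n):                     # random spanning tree keeps the graph connected
    j = rng.integers(0, i)
    adj[i, j] = adj[j, i] = True
W = metropolis_weights(adj)
assert np.allclose(W, W.T)                # symmetric
assert np.allclose(W.sum(axis=1), 1.0)    # rows sum to one
assert (np.diag(W) > 0).all()             # positive local weights w_ii, so c > 0 exists
```

The positive diagonal entries are exactly the property that guarantees the lower bound c > 0 used in Lemma 3.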
The stepsize α = 0.1 leads to the best performance for EXTRA, and this value is used in the numerical experiments. Notice that for the variations of NN-K there is no optimal choice of stepsize: a smaller stepsize leads to more accurate but slower convergence, while a larger stepsize accelerates the convergence but to a less accurate neighborhood of the optimal solution. Therefore, for NN-0, NN-1, and NN-2 we set α = 0.001, α = 0.008, and α = 0.02, respectively. Although the PMM algorithm is not implementable in a decentralized fashion, we use its convergence path, which is generated in a centralized manner, as our benchmark. The choice of stepsize for PMM is α = 2.

Fig. 1 illustrates the relative error ‖x_t − x*‖/‖x_0 − x*‖ versus the number of iterations. Notice that the vector x_t is the concatenation of the local vectors x_{i,t}, and the optimal vector x* is defined as x* = [x̃*; ...; x̃*] ∈ R^{np}. Observe that all the variations of NN-K fail to converge to the optimal argument; they converge linearly to a neighborhood of the optimal solution x*.

Fig. 1: Relative error ‖x_t − x*‖/‖x_0 − x*‖ of EXTRA, ESOM-K, NN-K, and PMM versus number of iterations for the least squares problem. Using a larger value of K for ESOM-K leads to faster convergence and makes the convergence path closer to the one for PMM.

Fig. 2: Relative error ‖x_t − x*‖/‖x_0 − x*‖ of EXTRA, ESOM-K, NN-K, and PMM versus rounds of communications with neighboring nodes for the least squares problem. ESOM-0 is the most efficient algorithm in terms of communication cost among all the methods.

Among the decentralized algorithms with exact linear convergence rate, EXTRA has the worst performance and all the variations of ESOM-K outperform EXTRA. Recall that the problem condition number is 10 in our experiment; the difference between EXTRA and ESOM-K is more significant for problems with larger condition numbers.
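The relative error plotted in Figs. 1 and 2 can be computed as follows. The helper is illustrative (the inputs are synthetic stand-ins); it stacks n copies of the optimal local vector x̃* to form x*, as described above.

```python
import numpy as np

def relative_error(x_t, x_0, x_opt_local, n):
    # x* is the concatenation of n copies of the optimal local vector x~*
    x_star = np.tile(x_opt_local, n)
    return np.linalg.norm(x_t - x_star) / np.linalg.norm(x_0 - x_star)

x_tilde = np.array([1.0, -2.0])           # hypothetical optimal local vector
x0 = np.zeros(3 * 2)                      # n = 3 nodes initialized at zero
assert np.isclose(relative_error(x0, x0, x_tilde, 3), 1.0)   # error is 1 at t = 0
x_mid = 0.5 * np.tile(x_tilde, 3)         # iterate halfway to consensus optimum
assert np.isclose(relative_error(x_mid, x0, x_tilde, 3), 0.5)
```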
Further, choosing a larger value of K for ESOM-K leads to faster convergence, and as we increase K the convergence path of ESOM-K approaches the convergence path of PMM. EXTRA requires one round of communications per iteration, while NN-K and ESOM-K require K + 1 rounds of local communications per iteration. Thus, the convergence paths of these methods in terms of rounds of communications might be different from the ones in Fig. 1. The convergence paths of NN, ESOM, and EXTRA in terms of rounds of local communications are shown in Fig. 2. In this plot we ignore PMM, since it requires infinite rounds of communications per iteration. The main difference between Figs. 1 and 2 is in the performances of ESOM-0, ESOM-1, and ESOM-2. All of the variations of ESOM outperform EXTRA in terms of rounds of communications, while the best performance belongs to ESOM-0. This observation shows that increasing the approximation level K does not necessarily improve the performance of ESOM-K in terms of communication cost.

B. Decentralized logistic regression

We consider the application of ESOM to solving a logistic regression problem of the form

x̃* := argmin_{x̃ ∈ R^p} (λ/2)‖x̃‖² + Σ_{i=1}^n Σ_{j=1}^{m_i} ln(1 + exp(−(s_ij^T x̃) y_ij)), (46)

where every agent i has access to m_i training samples (s_ij, y_ij) ∈ R^p × {−1, 1}, j = 1, ..., m_i, including explanatory/feature variables s_ij and binary outputs/outcomes y_ij. The regularization term (λ/2)‖x̃‖² is added to avoid overfitting, where λ is a positive constant. Hence, in the decentralized setting the local objective function f_i of node i is given by

f_i(x̃) = (λ/(2n))‖x̃‖² + Σ_{j=1}^{m_i} ln(1 + exp(−(s_ij^T x̃) y_ij)). (47)

The settings are as follows. The connected network is randomly generated with n = 20 agents and connectivity ratio r = 3/n. Each agent holds 3 samples, i.e., m_i = 3 for all i. The dimension of the sample vectors s_ij is p = 3. The samples are randomly generated, and the optimal logistic classifier x̃* is pre-computed through a centralized adaptive gradient method. We use the Metropolis constant edge weight matrix as the mixing matrix W in ESOM-K. The stepsizes α for ESOM-0, ESOM-1, ESOM-2, EXTRA, and DADMM are hand-optimized and the best of each is used for the comparison.

Figs. 3 and 4 showcase the convergence paths of ESOM-0, ESOM-1, ESOM-2, EXTRA, and DADMM versus number of iterations and rounds of communications, respectively. The results match the observations for the least squares problem in Figs. 1 and 2. Different versions of ESOM-K converge faster than EXTRA both in terms of communication cost and number of iterations. Moreover, ESOM-2 converges faster than ESOM-1 and ESOM-0 in terms of number of iterations, while ESOM-0 has the best performance in terms of communication cost for achieving a target accuracy.
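The local objective (47) and its gradient can be sketched as below. This is an illustrative stand-in for node i's local computation (the sample values are synthetic placeholders), with a finite-difference check of the gradient formula.

```python
import numpy as np

def f_i(x, S, y, lam, n):
    # local objective (47): (lam / 2n) ||x||^2 + sum_j ln(1 + exp(-y_j s_j^T x))
    z = y * (S @ x)
    return 0.5 * lam / n * (x @ x) + np.log1p(np.exp(-z)).sum()

def grad_f_i(x, S, y, lam, n):
    # gradient: (lam / n) x - sum_j y_j s_j * sigma(-y_j s_j^T x)
    z = y * (S @ x)
    sig = 1.0 / (1.0 + np.exp(z))          # sigma(-z)
    return lam / n * x - S.T @ (y * sig)

rng = np.random.default_rng(2)
m_i, p, lam, n = 3, 3, 1.0, 20             # matches the experiment's dimensions
S = rng.standard_normal((m_i, p))          # rows are the feature vectors s_ij
y = rng.choice([-1.0, 1.0], size=m_i)
x = rng.standard_normal(p)
g = grad_f_i(x, S, y, lam, n)
eps = 1e-6                                 # central finite-difference check
g_fd = np.array([(f_i(x + eps * e, S, y, lam, n) - f_i(x - eps * e, S, y, lam, n)) / (2 * eps)
                 for e in np.eye(p)])
assert np.allclose(g, g_fd, atol=1e-5)
```

Note the (λ/2n) scaling in f_i: summing the n local regularizers recovers the global (λ/2)‖x̃‖² term of (46).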
Comparing the convergence paths of ESOM-0, ESOM-1, and ESOM-2 with DADMM shows that the number of iterations required for the convergence of DADMM is larger than the number required for ESOM-0, ESOM-1, and ESOM-2. In terms of communication cost, DADMM has a better performance relative to ESOM-1 and ESOM-2, while ESOM-0 is the most efficient algorithm.

Fig. 3: Relative error ‖x_t − x*‖/‖x_0 − x*‖ of EXTRA, ESOM-K, and DADMM versus number of iterations for the logistic regression problem. EXTRA is significantly slower than the ESOM methods. The proposed methods (ESOM-K) outperform DADMM.

Fig. 4: Relative error ‖x_t − x*‖/‖x_0 − x*‖ of EXTRA, ESOM-K, and DADMM versus rounds of communications for the logistic regression problem. ESOM-0 has the best performance in terms of rounds of communications and it outperforms DADMM.

VI. CONCLUSIONS

We studied the consensus optimization problem where the components of a global objective function are available at different nodes of a network. We proposed an Exact Second-Order Method (ESOM) that converges to the optimal argument of the global objective function at a linear rate. We developed the update of ESOM by substituting the primal update of the Proximal Method of Multipliers (PMM) with its second-order approximation. Moreover, we approximated the Hessian inverse of the proximal augmented Lagrangian by truncating its Taylor series. This approximation leads to a class of algorithms ESOM-K, where K + 1 indicates the number of Taylor series terms that are used for the Hessian inverse approximation. Convergence analysis of ESOM-K shows that the sequence of iterates converges to the optimal argument linearly irrespective of the choice of K. We showed that the linear convergence factor of ESOM-K is a function of time and of the choice of K. The linear convergence factor of ESOM approaches the linear convergence factor of PMM as time passes. Moreover, a larger choice of K makes the factor of linear convergence for ESOM closer to the one for PMM.
Numerical results verify the theoretical linear convergence and the relation between the linear convergence factors of ESOM-K and PMM. Further, we observed that a larger choice of K for ESOM-K leads to faster convergence in terms of number of iterations, while the most efficient version of ESOM-K in terms of communication cost is ESOM-0.
APPENDIX A
PROOF OF LEMMA 1

Consider the updates of PMM in (6) and (7). According to (4), the optimal argument x* satisfies the condition (I − Z)^{1/2} x* = 0. This observation in conjunction with the dual variable update in (7) yields the claim in (30). To prove the claim in (31), note that the optimality condition of (6) implies ∇_x L(x_{t+1}, v_t) + ε(x_{t+1} − x_t) = 0. Based on the definition of the Lagrangian L(x, v) in (5), the optimality condition for the primal update of PMM can be written as

∇f(x_{t+1}) + (I − Z)^{1/2} v_t + α(I − Z) x_{t+1} + ε(x_{t+1} − x_t) = 0. (48)

Further, notice that one of the KKT conditions of the optimization problem in (4) is

∇f(x*) + (I − Z)^{1/2} v* = 0. (49)

Moreover, the optimal solution x* = [x̃*; ...; x̃*] of (4) lies in null{I − Z}. Therefore, we obtain

α(I − Z) x* = 0. (50)

Subtracting the equalities in (49) and (50) from (48) yields

∇f(x_{t+1}) − ∇f(x*) + (I − Z)^{1/2}(v_t − v*) + α(I − Z)(x_{t+1} − x*) + ε(x_{t+1} − x_t) = 0. (51)

Regrouping the terms in (30) implies that v_t is equivalent to

v_t = v_{t+1} − α(I − Z)^{1/2}(x_{t+1} − x*). (52)

Substituting v_t in (51) by the expression on the right-hand side of (52) yields the claim in (31).

APPENDIX B
PROOF OF THEOREM 1

According to Assumption 1, the global objective function f is strongly convex with constant m and its gradients ∇f are Lipschitz continuous with constant M. Considering these assumptions, we obtain that the inner product (x_{t+1} − x*)^T(∇f(x_{t+1}) − ∇f(x*)) is lower bounded by

(mM/(m + M))‖x_{t+1} − x*‖² + (1/(m + M))‖∇f(x_{t+1}) − ∇f(x*)‖² ≤ (x_{t+1} − x*)^T(∇f(x_{t+1}) − ∇f(x*)). (53)

The result in (31) shows that the difference ∇f(x_{t+1}) − ∇f(x*) is equal to −(I − Z)^{1/2}(v_{t+1} − v*) − ε(x_{t+1} − x_t). Apply this substitution into (53) and multiply both sides of the resulting inequality by 2 to obtain

(2mM/(m + M))‖x_{t+1} − x*‖² + (2/(m + M))‖∇f(x_{t+1}) − ∇f(x*)‖² ≤ −2(x_{t+1} − x*)^T(I − Z)^{1/2}(v_{t+1} − v*) − 2ε(x_{t+1} − x*)^T(x_{t+1} − x_t). (54)

Based on the result in (30), we can substitute (x_{t+1} − x*)^T(I − Z)^{1/2} by (1/α)(v_{t+1} − v_t)^T. Thus, we can rewrite (54) as

(2mM/(m + M))‖x_{t+1} − x*‖² + (2/(m + M))‖∇f(x_{t+1}) − ∇f(x*)‖² ≤ −(2/α)(v_{t+1} − v_t)^T(v_{t+1} − v*) − 2ε(x_{t+1} − x_t)^T(x_{t+1} − x*). (55)

Notice that for any vectors a, b, and c we can write

2(a − b)^T(a − c) = ‖a − b‖² + ‖a − c‖² − ‖b − c‖². (56)

By setting a = v_{t+1}, b = v_t, and c = v* we obtain that the inner product 2(v_{t+1} − v_t)^T(v_{t+1} − v*) in (55) can be written as ‖v_{t+1} − v_t‖² + ‖v_{t+1} − v*‖² − ‖v_t − v*‖². Likewise, setting a = x_{t+1}, b = x_t, and c = x* implies that the inner product 2(x_{t+1} − x_t)^T(x_{t+1} − x*) in (55) is equal to ‖x_{t+1} − x_t‖² + ‖x_{t+1} − x*‖² − ‖x_t − x*‖². Applying these simplifications to (55), and multiplying both sides by α, yields

(2αmM/(m + M))‖x_{t+1} − x*‖² + (2α/(m + M))‖∇f(x_{t+1}) − ∇f(x*)‖² ≤ ‖v_t − v*‖² − ‖v_{t+1} − v*‖² − ‖v_{t+1} − v_t‖² + αε‖x_t − x*‖² − αε‖x_{t+1} − x*‖² − αε‖x_{t+1} − x_t‖². (57)

Now using the definitions of the variable u and matrix G in (32), we can substitute ‖v_t − v*‖² − ‖v_{t+1} − v*‖² + αε‖x_t − x*‖² − αε‖x_{t+1} − x*‖² by ‖u_t − u*‖²_G − ‖u_{t+1} − u*‖²_G. Moreover, the squared norm ‖v_{t+1} − v_t‖² is equivalent to ‖x_{t+1} − x*‖²_{α²(I−Z)} based on the result in (30). By applying these substitutions we can rewrite (57) as

(2αmM/(m + M))‖x_{t+1} − x*‖² + (2α/(m + M))‖∇f(x_{t+1}) − ∇f(x*)‖² ≤ ‖u_t − u*‖²_G − ‖u_{t+1} − u*‖²_G − αε‖x_{t+1} − x_t‖² − ‖x_{t+1} − x*‖²_{α²(I−Z)}. (58)

Regrouping the terms in (58) leads to the following lower bound for the difference ‖u_t − u*‖²_G − ‖u_{t+1} − u*‖²_G:

‖u_t − u*‖²_G − ‖u_{t+1} − u*‖²_G ≥ (2α/(m + M))‖∇f(x_{t+1}) − ∇f(x*)‖² + αε‖x_{t+1} − x_t‖² + ‖x_{t+1} − x*‖²_{(2αmM/(m+M))I + α²(I−Z)}. (59)

Observe that the result in (59) provides a lower bound for the decrement ‖u_t − u*‖²_G − ‖u_{t+1} − u*‖²_G. To prove the claim in (33), we need to show that for a positive constant δ we have ‖u_t − u*‖²_G − ‖u_{t+1} − u*‖²_G ≥ δ‖u_{t+1} − u*‖²_G. Therefore, the inequality in (33) is satisfied if we can show that the lower bound in (59) is greater than δ‖u_{t+1} − u*‖²_G, or equivalently

δ‖v_{t+1} − v*‖² + δαε‖x_{t+1} − x*‖² ≤ (2α/(m + M))‖∇f(x_{t+1}) − ∇f(x*)‖² + αε‖x_{t+1} − x_t‖² + ‖x_{t+1} − x*‖²_{(2αmM/(m+M))I + α²(I−Z)}. (60)

To prove that the inequality in (60) holds for some δ > 0, we first find an upper bound for the squared norm ‖v_{t+1} − v*‖² in terms of the summands on the right-hand side of (60). To do so, consider the relation (31) along with the fact that v_{t+1} and v* both lie in the column space of (I − Z)^{1/2}. It follows that ‖v_{t+1} − v*‖² is bounded above by

‖v_{t+1} − v*‖² ≤ (βε²/((β − 1)λ̂_min(I − Z)))‖x_{t+1} − x_t‖² + (β/λ̂_min(I − Z))‖∇f(x_{t+1}) − ∇f(x*)‖², (61)
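The vector identity (56) used above can be spot-checked numerically (the vectors are arbitrary random draws):

```python
import numpy as np

rng = np.random.default_rng(3)
a, b, c = rng.standard_normal((3, 5))     # three arbitrary vectors in R^5
lhs = 2 * (a - b) @ (a - c)
rhs = np.dot(a - b, a - b) + np.dot(a - c, a - c) - np.dot(b - c, b - c)
assert np.isclose(lhs, rhs)               # identity (56) holds for any a, b, c
```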
where β > 1 is a tunable free parameter and λ̂_min(I − Z) is the smallest non-zero eigenvalue of I − Z. Considering the result in (61), to satisfy the inequality in (60), which is a sufficient condition for the claim in (33), it remains to show that

(2α/(m + M))‖∇f(x_{t+1}) − ∇f(x*)‖² + αε‖x_{t+1} − x_t‖² + ‖x_{t+1} − x*‖²_{(2αmM/(m+M))I + α²(I−Z)} ≥ (δβε²/((β − 1)λ̂_min(I − Z)))‖x_{t+1} − x_t‖² + δεα‖x_{t+1} − x*‖² + (δβ/λ̂_min(I − Z))‖∇f(x_{t+1}) − ∇f(x*)‖². (62)

To enable (62), and consequently (60), we only need to verify that there exists δ > 0 such that

(2αmM/(m + M))I + α²(I − Z) ⪰ δαεI,  2α/(m + M) ≥ δβ/λ̂_min(I − Z),  αε ≥ δβε²/((β − 1)λ̂_min(I − Z)). (63)

The conditions in (63) are satisfied if the constant δ is chosen as in (34). Therefore, for δ in (34) the claim in (60) holds, which implies the claim in (33).

APPENDIX C
PROOF OF LEMMA 2

Consider the primal update of ESOM in (14). By regrouping the terms we obtain that

∇f(x_t) + (I − Z)^{1/2} v_t + α(I − Z) x_t + (H̃_t^{(K)})^{−1}(x_{t+1} − x_t) = 0, (64)

where (H̃_t^{(K)})^{−1} is the inverse of the Hessian inverse approximation H̃_t^{(K)}. Recall the definition of the exact Hessian H_t in (9). Adding and subtracting the term H_t(x_{t+1} − x_t) in (64) yields

∇f(x_t) + ∇²f(x_t)(x_{t+1} − x_t) + (I − Z)^{1/2} v_t + α(I − Z) x_{t+1} + ε(x_{t+1} − x_t) + ((H̃_t^{(K)})^{−1} − H_t)(x_{t+1} − x_t) = 0. (65)

Now using the definition of the error vector e_t in (37), we can rewrite (65) as

∇f(x_{t+1}) + (I − Z)^{1/2} v_t + α(I − Z) x_{t+1} + ε(x_{t+1} − x_t) + e_t = 0. (66)

Notice that the result in (66) is identical to the expression for PMM in (48) except for the error term e_t. To prove the claim in (36) from (66), it remains to follow the steps in (49)-(52).

APPENDIX D
PROOF OF LEMMA 3

To prove the result in (39), we first use the result in Proposition 2 of [29]. It shows that when the eigenvalues of the Hessian ∇²f(x) are bounded above by M and the Hessian is Lipschitz continuous with constant L, we can write

‖∇f(x_t) + ∇²f(x_t)(x_{t+1} − x_t) − ∇f(x_{t+1})‖ ≤ min{ 2M, (L/2)‖x_{t+1} − x_t‖ } ‖x_{t+1} − x_t‖. (67)

Considering the result in (67), it remains to find an upper bound for the second term of the error vector e_t, which is ((H̃_t^{(K)})^{−1} − H_t)(x_{t+1} − x_t). To do so, we first develop an upper bound for the norm ‖(H̃_t^{(K)})^{−1} − H_t‖.
Notice that by factoring the term (H̃_t^{(K)})^{−1} and using the Cauchy-Schwarz inequality we obtain that

‖(H̃_t^{(K)})^{−1} − H_t‖ ≤ ‖(H̃_t^{(K)})^{−1}‖ ‖I − H̃_t^{(K)} H_t‖. (68)

According to Lemma 3 in [12], we can simplify I − H̃_t^{(K)} H_t as (B D_t^{−1})^{K+1}. This simplification implies that

‖I − H̃_t^{(K)} H_t‖ = ‖(B D_t^{−1})^{K+1}‖. (69)

Observe that the matrices B and D_t in this paper are different from the ones in [12], but the analyses of them are very similar. Similar to the proof of Proposition 2 in [12], we define D̂ := 2α(I − Z_d). Notice that the matrix D̂ is block diagonal, where its ith diagonal block is 2α(1 − w_ii)I_p. Thus, D̂ is positive definite and invertible. Hence, we are allowed to write the product B D_t^{−1} as

B D_t^{−1} = (B D̂^{−1})(D̂ D_t^{−1}). (70)

The next step is to find an upper bound for the eigenvalues of B D̂^{−1} in (70). Based on the definitions of the matrices B and D̂, the product B D̂^{−1} is given by

B D̂^{−1} = (I − 2Z_d + Z)(2(I − Z_d))^{−1}. (71)

According to the result in Proposition 2 of [12], the eigenvalues of the matrix (I − 2Z_d + Z)(2(I − Z_d))^{−1} are uniformly bounded by 0 and 1. Thus, we obtain that

‖B D̂^{−1}‖ ≤ 1. (72)

According to the definitions of the matrices D̂ and D_t, the product D̂^{−1/2} D_t D̂^{−1/2} is block diagonal and its ith diagonal block is given by

[D̂^{−1/2} D_t D̂^{−1/2}]_{ii} = (∇²f_i(x_{i,t}) + εI)/(2α(1 − w_ii)) + I. (73)

Based on Assumption 1, the eigenvalues of the local Hessians ∇²f_i(x_i) are bounded by m and M. Further, notice that the diagonal elements w_ii of the weight matrix W are bounded below by c. Considering these bounds, we can show that the eigenvalues of the matrices (1/(2α(1 − w_ii)))(∇²f_i(x_{i,t}) + εI) + I for all i = 1, ..., n are bounded below by

[(m + ε)/(2α(1 − c)) + 1] I ⪯ (∇²f_i(x_{i,t}) + εI)/(2α(1 − w_ii)) + I. (74)

By considering the bounds in (74), the eigenvalues of each diagonal block of the matrix D̂^{1/2} D_t^{−1} D̂^{1/2}, introduced via (73), are bounded above as

((∇²f_i(x_{i,t}) + εI)/(2α(1 − w_ii)) + I)^{−1} ⪯ [(m + ε)/(2α(1 − c)) + 1]^{−1} I. (75)

The upper bound in (75) for the eigenvalues of each diagonal block implies that the matrix norm ‖D̂^{1/2} D_t^{−1} D̂^{1/2}‖ is bounded above by

‖D̂^{1/2} D_t^{−1} D̂^{1/2}‖ ≤ ρ := 2α(1 − c)/(2α(1 − c) + m + ε). (76)

Considering the upper bounds in (72) and (76) and the relation
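The block bound (73)-(76), as reconstructed above, can be spot-checked numerically for a single diagonal block. The values of m, M, ε, α, w_ii, and c below are synthetic placeholders (chosen so that the local Hessian spectrum lies in [m, M] and w_ii ≥ c), not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
p, m, M, eps, alpha, w_ii, c = 4, 1.0, 10.0, 0.5, 0.2, 0.6, 0.5   # w_ii >= c
# random symmetric local Hessian with spectrum inside [m, M]
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
H_i = Q @ np.diag(rng.uniform(m, M, p)) @ Q.T
# ith diagonal block of D^{-1/2} D_t D^{-1/2} as in (73)
block = (H_i + eps * np.eye(p)) / (2 * alpha * (1 - w_ii)) + np.eye(p)
rho = 2 * alpha * (1 - c) / (2 * alpha * (1 - c) + m + eps)        # (76)
eig_inv = np.linalg.eigvalsh(np.linalg.inv(block))
assert eig_inv.max() <= rho + 1e-12        # inverse-block eigenvalues below rho
```

Since every eigenvalue of the block in (73) is at least (m + ε)/(2α(1 − c)) + 1, every eigenvalue of its inverse is at most ρ, which is the bound in (76).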
IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, VOL. 2, NO. 4, DECEMBER 2016
SZG Macro 2011 Lecure 3: Dynamic Programming SZG macro 2011 lecure 3 1 Background Our previous discussion of opimal consumpion over ime and of opimal capial accumulaion sugges sudying he general decision
More informationZürich. ETH Master Course: L Autonomous Mobile Robots Localization II
Roland Siegwar Margaria Chli Paul Furgale Marco Huer Marin Rufli Davide Scaramuzza ETH Maser Course: 151-0854-00L Auonomous Mobile Robos Localizaion II ACT and SEE For all do, (predicion updae / ACT),
More informationMath 10B: Mock Mid II. April 13, 2016
Name: Soluions Mah 10B: Mock Mid II April 13, 016 1. ( poins) Sae, wih jusificaion, wheher he following saemens are rue or false. (a) If a 3 3 marix A saisfies A 3 A = 0, hen i canno be inverible. True.
More informationTwo Coupled Oscillators / Normal Modes
Lecure 3 Phys 3750 Two Coupled Oscillaors / Normal Modes Overview and Moivaion: Today we ake a small, bu significan, sep owards wave moion. We will no ye observe waves, bu his sep is imporan in is own
More informationOnline Appendix to Solution Methods for Models with Rare Disasters
Online Appendix o Soluion Mehods for Models wih Rare Disasers Jesús Fernández-Villaverde and Oren Levinal In his Online Appendix, we presen he Euler condiions of he model, we develop he pricing Calvo block,
More informationLecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still.
Lecure - Kinemaics in One Dimension Displacemen, Velociy and Acceleraion Everyhing in he world is moving. Nohing says sill. Moion occurs a all scales of he universe, saring from he moion of elecrons in
More informationTHE BERNOULLI NUMBERS. t k. = lim. = lim = 1, d t B 1 = lim. 1+e t te t = lim t 0 (e t 1) 2. = lim = 1 2.
THE BERNOULLI NUMBERS The Bernoulli numbers are defined here by he exponenial generaing funcion ( e The firs one is easy o compue: (2 and (3 B 0 lim 0 e lim, 0 e ( d B lim 0 d e +e e lim 0 (e 2 lim 0 2(e
More informationOscillation of an Euler Cauchy Dynamic Equation S. Huff, G. Olumolode, N. Pennington, and A. Peterson
PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON DYNAMICAL SYSTEMS AND DIFFERENTIAL EQUATIONS May 4 7, 00, Wilmingon, NC, USA pp 0 Oscillaion of an Euler Cauchy Dynamic Equaion S Huff, G Olumolode,
More informationMean-square Stability Control for Networked Systems with Stochastic Time Delay
JOURNAL OF SIMULAION VOL. 5 NO. May 7 Mean-square Sabiliy Conrol for Newored Sysems wih Sochasic ime Delay YAO Hejun YUAN Fushun School of Mahemaics and Saisics Anyang Normal Universiy Anyang Henan. 455
More information20. Applications of the Genetic-Drift Model
0. Applicaions of he Geneic-Drif Model 1) Deermining he probabiliy of forming any paricular combinaion of genoypes in he nex generaion: Example: If he parenal allele frequencies are p 0 = 0.35 and q 0
More informationHamilton- J acobi Equation: Weak S olution We continue the study of the Hamilton-Jacobi equation:
M ah 5 7 Fall 9 L ecure O c. 4, 9 ) Hamilon- J acobi Equaion: Weak S oluion We coninue he sudy of he Hamilon-Jacobi equaion: We have shown ha u + H D u) = R n, ) ; u = g R n { = }. ). In general we canno
More informationEXERCISES FOR SECTION 1.5
1.5 Exisence and Uniqueness of Soluions 43 20. 1 v c 21. 1 v c 1 2 4 6 8 10 1 2 2 4 6 8 10 Graph of approximae soluion obained using Euler s mehod wih = 0.1. Graph of approximae soluion obained using Euler
More informationLogic in computer science
Logic in compuer science Logic plays an imporan role in compuer science Logic is ofen called he calculus of compuer science Logic plays a similar role in compuer science o ha played by calculus in he physical
More informationArticle from. Predictive Analytics and Futurism. July 2016 Issue 13
Aricle from Predicive Analyics and Fuurism July 6 Issue An Inroducion o Incremenal Learning By Qiang Wu and Dave Snell Machine learning provides useful ools for predicive analyics The ypical machine learning
More informationACE 562 Fall Lecture 4: Simple Linear Regression Model: Specification and Estimation. by Professor Scott H. Irwin
ACE 56 Fall 005 Lecure 4: Simple Linear Regression Model: Specificaion and Esimaion by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Simple Regression: Economic and Saisical Model
More informationGlobal Convergence of Online Limited Memory BFGS
Journal of Machine Learning Research 16 (2015 3151-3181 Submied 9/14; Revised 7/15; Published 12/15 Global Convergence of Online Limied Memory BFGS Aryan Mokhari Alejandro Ribeiro Deparmen of Elecrical
More informationEssential Microeconomics : OPTIMAL CONTROL 1. Consider the following class of optimization problems
Essenial Microeconomics -- 6.5: OPIMAL CONROL Consider he following class of opimizaion problems Max{ U( k, x) + U+ ( k+ ) k+ k F( k, x)}. { x, k+ } = In he language of conrol heory, he vecor k is he vecor
More informationOn Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature
On Measuring Pro-Poor Growh 1. On Various Ways of Measuring Pro-Poor Growh: A Shor eview of he Lieraure During he pas en years or so here have been various suggesions concerning he way one should check
More informationSupplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence
Supplemen for Sochasic Convex Opimizaion: Faser Local Growh Implies Faser Global Convergence Yi Xu Qihang Lin ianbao Yang Proof of heorem heorem Suppose Assumpion holds and F (w) obeys he LGC (6) Given
More informationELE 538B: Large-Scale Optimization for Data Science. Quasi-Newton methods. Yuxin Chen Princeton University, Spring 2018
ELE 538B: Large-Scale Opimizaion for Daa Science Quasi-Newon mehods Yuxin Chen Princeon Universiy, Spring 208 00 op ff(x (x)(k)) f p 2 L µ f 05 k f (xk ) k f (xk ) =) f op ieraions converges in only 5
More informationChristos Papadimitriou & Luca Trevisan November 22, 2016
U.C. Bereley CS170: Algorihms Handou LN-11-22 Chrisos Papadimiriou & Luca Trevisan November 22, 2016 Sreaming algorihms In his lecure and he nex one we sudy memory-efficien algorihms ha process a sream
More informationEcon107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8)
I. Definiions and Problems A. Perfec Mulicollineariy Econ7 Applied Economerics Topic 7: Mulicollineariy (Sudenmund, Chaper 8) Definiion: Perfec mulicollineariy exiss in a following K-variable regression
More informationMacroeconomic Theory Ph.D. Qualifying Examination Fall 2005 ANSWER EACH PART IN A SEPARATE BLUE BOOK. PART ONE: ANSWER IN BOOK 1 WEIGHT 1/3
Macroeconomic Theory Ph.D. Qualifying Examinaion Fall 2005 Comprehensive Examinaion UCLA Dep. of Economics You have 4 hours o complee he exam. There are hree pars o he exam. Answer all pars. Each par has
More information15. Vector Valued Functions
1. Vecor Valued Funcions Up o his poin, we have presened vecors wih consan componens, for example, 1, and,,4. However, we can allow he componens of a vecor o be funcions of a common variable. For example,
More informationFinal Spring 2007
.615 Final Spring 7 Overview The purpose of he final exam is o calculae he MHD β limi in a high-bea oroidal okamak agains he dangerous n = 1 exernal ballooning-kink mode. Effecively, his corresponds o
More information2.160 System Identification, Estimation, and Learning. Lecture Notes No. 8. March 6, 2006
2.160 Sysem Idenificaion, Esimaion, and Learning Lecure Noes No. 8 March 6, 2006 4.9 Eended Kalman Filer In many pracical problems, he process dynamics are nonlinear. w Process Dynamics v y u Model (Linearized)
More informationMath 334 Fall 2011 Homework 11 Solutions
Dec. 2, 2 Mah 334 Fall 2 Homework Soluions Basic Problem. Transform he following iniial value problem ino an iniial value problem for a sysem: u + p()u + q() u g(), u() u, u () v. () Soluion. Le v u. Then
More informationSPECTRAL EVOLUTION OF A ONE PARAMETER EXTENSION OF A REAL SYMMETRIC TOEPLITZ MATRIX* William F. Trench. SIAM J. Matrix Anal. Appl. 11 (1990),
SPECTRAL EVOLUTION OF A ONE PARAMETER EXTENSION OF A REAL SYMMETRIC TOEPLITZ MATRIX* William F Trench SIAM J Marix Anal Appl 11 (1990), 601-611 Absrac Le T n = ( i j ) n i,j=1 (n 3) be a real symmeric
More informationLinear Response Theory: The connection between QFT and experiments
Phys540.nb 39 3 Linear Response Theory: The connecion beween QFT and experimens 3.1. Basic conceps and ideas Q: How do we measure he conduciviy of a meal? A: we firs inroduce a weak elecric field E, and
More informationSome Ramsey results for the n-cube
Some Ramsey resuls for he n-cube Ron Graham Universiy of California, San Diego Jozsef Solymosi Universiy of Briish Columbia, Vancouver, Canada Absrac In his noe we esablish a Ramsey-ype resul for cerain
More informationDistributed Fictitious Play for Optimal Behavior of Multi-Agent Systems with Incomplete Information
Disribued Ficiious Play for Opimal Behavior of Muli-Agen Sysems wih Incomplee Informaion Ceyhun Eksin and Alejandro Ribeiro arxiv:602.02066v [cs.g] 5 Feb 206 Absrac A muli-agen sysem operaes in an uncerain
More informationUnsteady Flow Problems
School of Mechanical Aerospace and Civil Engineering Unseady Flow Problems T. J. Craf George Begg Building, C41 TPFE MSc CFD-1 Reading: J. Ferziger, M. Peric, Compuaional Mehods for Fluid Dynamics H.K.
More informationTime series model fitting via Kalman smoothing and EM estimation in TimeModels.jl
Time series model fiing via Kalman smoohing and EM esimaion in TimeModels.jl Gord Sephen Las updaed: January 206 Conens Inroducion 2. Moivaion and Acknowledgemens....................... 2.2 Noaion......................................
More informationNavneet Saini, Mayank Goyal, Vishal Bansal (2013); Term Project AML310; Indian Institute of Technology Delhi
Creep in Viscoelasic Subsances Numerical mehods o calculae he coefficiens of he Prony equaion using creep es daa and Herediary Inegrals Mehod Navnee Saini, Mayank Goyal, Vishal Bansal (23); Term Projec
More informationLecture 2 October ε-approximation of 2-player zero-sum games
Opimizaion II Winer 009/10 Lecurer: Khaled Elbassioni Lecure Ocober 19 1 ε-approximaion of -player zero-sum games In his lecure we give a randomized ficiious play algorihm for obaining an approximae soluion
More informationdy dx = xey (a) y(0) = 2 (b) y(1) = 2.5 SOLUTION: See next page
Assignmen 1 MATH 2270 SOLUTION Please wrie ou complee soluions for each of he following 6 problems (one more will sill be added). You may, of course, consul wih your classmaes, he exbook or oher resources,
More informationOptimal Path Planning for Flexible Redundant Robot Manipulators
25 WSEAS In. Conf. on DYNAMICAL SYSEMS and CONROL, Venice, Ialy, November 2-4, 25 (pp363-368) Opimal Pah Planning for Flexible Redundan Robo Manipulaors H. HOMAEI, M. KESHMIRI Deparmen of Mechanical Engineering
More informationGuest Lectures for Dr. MacFarlane s EE3350 Part Deux
Gues Lecures for Dr. MacFarlane s EE3350 Par Deux Michael Plane Mon., 08-30-2010 Wrie name in corner. Poin ou his is a review, so I will go faser. Remind hem o go lisen o online lecure abou geing an A
More informationExponential Weighted Moving Average (EWMA) Chart Under The Assumption of Moderateness And Its 3 Control Limits
DOI: 0.545/mjis.07.5009 Exponenial Weighed Moving Average (EWMA) Char Under The Assumpion of Moderaeness And Is 3 Conrol Limis KALPESH S TAILOR Assisan Professor, Deparmen of Saisics, M. K. Bhavnagar Universiy,
More informationAccelerated Distributed Nesterov Gradient Descent for Convex and Smooth Functions
07 IEEE 56h Annual Conference on Decision and Conrol (CDC) December -5, 07, Melbourne, Ausralia Acceleraed Disribued Neserov Gradien Descen for Convex and Smooh Funcions Guannan Qu, Na Li Absrac This paper
More informationODEs II, Lecture 1: Homogeneous Linear Systems - I. Mike Raugh 1. March 8, 2004
ODEs II, Lecure : Homogeneous Linear Sysems - I Mike Raugh March 8, 4 Inroducion. In he firs lecure we discussed a sysem of linear ODEs for modeling he excreion of lead from he human body, saw how o ransform
More informationA primal-dual Laplacian gradient flow dynamics for distributed resource allocation problems
2018 Annual American Conrol Conference (ACC) June 27 29, 2018. Wisconsin Cener, Milwaukee, USA A primal-dual Laplacian gradien flow dynamics for disribued resource allocaion problems Dongsheng Ding and
More informationStationary Distribution. Design and Analysis of Algorithms Andrei Bulatov
Saionary Disribuion Design and Analysis of Algorihms Andrei Bulaov Algorihms Markov Chains 34-2 Classificaion of Saes k By P we denoe he (i,j)-enry of i, j Sae is accessible from sae if 0 for some k 0
More informationAverage Number of Lattice Points in a Disk
Average Number of Laice Poins in a Disk Sujay Jayakar Rober S. Sricharz Absrac The difference beween he number of laice poins in a disk of radius /π and he area of he disk /4π is equal o he error in he
More informationOn the probabilistic stability of the monomial functional equation
Available online a www.jnsa.com J. Nonlinear Sci. Appl. 6 (013), 51 59 Research Aricle On he probabilisic sabiliy of he monomial funcional equaion Claudia Zaharia Wes Universiy of Timişoara, Deparmen of
More informationt dt t SCLP Bellman (1953) CLP (Dantzig, Tyndall, Grinold, Perold, Anstreicher 60's-80's) Anderson (1978) SCLP
Coninuous Linear Programming. Separaed Coninuous Linear Programming Bellman (1953) max c () u() d H () u () + Gsusds (,) () a () u (), < < CLP (Danzig, yndall, Grinold, Perold, Ansreicher 6's-8's) Anderson
More informationSolutions from Chapter 9.1 and 9.2
Soluions from Chaper 9 and 92 Secion 9 Problem # This basically boils down o an exercise in he chain rule from calculus We are looking for soluions of he form: u( x) = f( k x c) where k x R 3 and k is
More information