Network Newton Distributed Optimization Methods

Size: px
Start display at page:

Download "Network Newton Distributed Optimization Methods"

Transcription

1 Nework Newon Disribued Opimizaion Mehods Aryan Mokhari, Qing Ling, and Alejandro Ribeiro Absrac We sudy he problem of minimizing a sum of convex objecive funcions where he componens of he objecive are available a differen nodes of a nework and nodes are allowed o only communicae wih heir neighbors. The use of disribued gradien mehods is a common approach o solve his problem. Their populariy nowihsanding, hese mehods exhibi slow convergence and a consequen large number of communicaions beween nodes o approach he opimal argumen because hey rely on firs order informaion only. This paper proposes he nework Newon (NN) mehod as a disribued algorihm ha incorporaes second order informaion. This is done via disribued implemenaion of approximaions of a suiably chosen Newon sep. The approximaions are obained by runcaion of he Newon sep s Taylor expansion. This leads o a family of mehods defined by he number K of Taylor series erms kep in he approximaion. When keeping K erms of he Taylor series, he mehod is called NN-K and can be implemened hrough he aggregaion of informaion in K-hop neighborhoods. Convergence o a poin close o he opimal argumen a a rae ha is a leas linear is proven and he exisence of a radeoff beween convergence ime and he disance o he opimal argumen is shown. The numerical experimens corroborae reducions in he number of ieraions and he communicaion cos ha are necessary o achieve convergence relaive o firs-order alernaives. Index Terms Muli-agen nework, disribued opimizaion, Newon s mehod. I. INTRODUCTION Disribued opimizaion algorihms are used o solve he problem of minimizing a global cos funcion over a se of nodes in siuaions where he objecive funcion is defined as a sum of local funcions. To be more precise, consider a variable x R p and a conneced nework conaining n agens each of which has access o a local funcion f i : R p R. The agens cooperae in minimizing he aggregae cos funcion f : R p R aking values f(x) := n i=1 f i(x). I.e., agens cooperae in solving he problem x := argmin x R p f(x) = argmin x R p n f i (x). (1) i=1 Problems of his form arise ofen in, e.g., decenralized conrol sysems [3], [4], wireless sysems [5], [6], sensor neworks [7] [9], and large scale machine learning [10] [1]. There are differen algorihms o solve (1) in a disribued manner. The mos popular choices are decenralized gradien Work in his paper is suppored by NSF CAREER CCF , ONR N , and NSFC Aryan Mokhari and Alejandro Ribeiro are wih he Deparmen of Elecrical and Sysems Engineering, Universiy of Pennsylvania, 00 Souh 33rd Sree, Philadelphia, PA 19104, USA. {aryanm, aribeiro}@seas.upenn.edu. Qing Ling is wih he Deparmen of Auomaion, Universiy of Science and Technology of China, 96 Jinzhao Road, Hefei, Anhui, 3006, China. qingling@mail.usc.edu.cn. Par of he resuls in his paper appeared in [1] and []. This paper expands he resuls and presens convergence proofs ha are referenced in [1] and []. descen (DGD) [13] [16], disribued implemenaions of he alernaing direcion mehod of mulipliers [7], [17] [0], and decenralized dual averaging [1], []. Alhough here are subsanial differences beween hem, hese mehods can be generically absraced as combinaions of local descen seps followed by variable exchanges and averaging of informaion among neighbors. A feaure common o all of hese algorihms is he slow convergence rae in ill-condiioned problems since hey operae on firs order informaion only. This is no surprising because gradien descen mehods in cenralized seings where he aggregae funcion gradien is available a a single server have he same difficulies in problems wih skewed curvaure [see Chaper 9 of [3]]. This issue is addressed in cenralized opimizaion by Newon s mehod ha uses second order informaion o deermine a descen direcion adaped o he objecive s curvaure [see Chaper 9 of [3]]. In general, second order mehods are no available in disribued seings because disribued approximaions of Newon seps are difficul o devise. In he paricular case of flow opimizaion problems, hese approximaions are possible when operaing in he dual domain and have led o he developmen of he acceleraed dual descen mehods [4], [5]. As would be expeced, hese mehods resul in large reducions of convergence imes. Our goal is o develop approximae Newon s mehods o solve (1) in disribued seings where agens have access o heir local funcions only and exchange variables wih neighboring agens. We do so by inroducing Nework Newon (NN), a mehod ha relies on disribued approximaions of Newon seps for he global cos funcion f o accelerae convergence of DGD. We begin he paper wih an alernaive formulaion of (1) and a brief discussion of DGD (Secion II). We hen inroduce a reinerpreaion of DGD as an algorihm ha uilizes gradien descen o solve a penalized version of (1) in lieu of he original opimizaion problem (Secion II-A). This reinerpreaion explains convergence of DGD o a neighborhood of x. The volume of his neighborhood is given by he relaive weigh of he penaly funcion and he original objecive which is conrolled by a penaly coefficien. If gradien descen on he penalized funcion finds an approximae soluion o he original problem, he same soluion can be found wih a much smaller number of ieraions by using Newon s mehod. Alas, disribued compuaion of Newon seps requires global communicaion beween all nodes in he nework and is herefore impracical (Secion III). To resolve his issue we approximae he Newon sep of he penalized objecive funcion by runcaing he Taylor series of he exac Newon sep (Secion III-A). This approximaion resuls in a family of mehods indexed by he number of erms of he Taylor expansion ha are kep in he approximaion. The mehod ha resuls from keeping K of hese erms is ermed

2 NN-K. A fundamenal observaion here is ha he Hessian of he penalized funcion has a sparsiy srucure ha is he same sparsiy paern of he graph. Thus, when compuing erms in he Hessian inverse expansion, he firs order erm is as sparse as he graph, he second erm is as sparse as he wo hop neighborhood, and, in general, he k-h erm is as sparse as he k-hop neighborhood of he graph. Thus, implemenaion of he NN-K mehod requires aggregaing informaion from K hops away. Increasing K makes NN-K arbirarily close o Newon s mehod a he cos of increasing he communicaion overhead of each ieraion. We poin ou ha he same Taylor series is used in he developmen of he ADD algorihms, bu his is done o solve a nework uiliy maximizaion problem in he dual domain [4]. The Taylor expansion is uilized here o solve a consensus opimizaion problem in he primal domain. Convergence of NN-K o he opimal argumen of he penalized objecive is esablished (Secion IV). We do so by esablishing several auxiliary bounds on he eigenvalues of he marices involved in he definiion of he mehod (Proposiions 1-3 and Lemma ). We show ha a measure of he error beween he Hessian inverse approximaion uilized by NN-K and he acual inverse Hessian decays exponenially wih he mehod index K. This exponenial decrease hins ha using a small value of K should suffice in pracice. Convergence is formally claimed in Theorem 1 ha shows he convergence rae is a leas linear. I follows from his convergence analysis ha larger penaly coefficiens resul in faser convergence ha comes a he cos of increasing he disance beween he opimal soluions of he original and penalized objecives. We also sudy he convergence rae of he NN mehod as an approximaion of Newon s mehod (Secion IV-A). We show ha for all ieraions excep he firs few, a weighed gradien norm associaed wih NN-K ieraes follows a decreasing pah akin o he pah ha would be followed by Newon ieraes (Lemma 3). The only difference beween hese residual pahs is ha he NN-K pah conains a erm ha capures he error of he Hessian inverse approximaion. Leveraging his similariy, i is possible o show ha he rae of convergence is quadraic in a specific inerval whose lengh depends on he order K of he seleced nework Newon mehod (Theorem ). Exisence of his quadraic convergence phase explains why NN-K mehods converge faser han DGD as we observe in experimens. I is also worh remarking ha he error in he Hessian inverse approximaion can be made arbirarily small by increasing he mehod s order K and, as a consequence, he quadraic phase can be made arbirarily large. We wrap up he paper wih numerical analyses (Secion V). We firs demonsrae he advanages of NN-K relaive o alernaive primal and dual mehods for he minimizaion of a family of quadraic objecive funcions (Secion V-A). Then, we sudy he effec of objecive funcion condiion number and show ha he NN mehod ouperforms firs-order alernaives significanly in ill-condiioned problems (Secion V-B). Furher, we sudy he effec of nework opology on he performance of NN (Secion V-C). Moreover, we compare he convergence rae of NN in heory and pracice o show he ighness of he bounds in his paper (Secion V-D). The paper closes wih concluding remarks (Secion VI). Noaion. Vecors are wrien as x R n and marices as A R n n. The null space of marix A is denoed by null(a) and he span of a vecor by span(x). We use x and A o denoe he Euclidean norm of vecor x and marix A, respecively. The gradien of a funcion f(x) is denoed as f(x) and he Hessian marix is denoed as f(x). The i-h larges eigenvalue of marix A is denoed by µ i (A). II. DISTRIBUTED GRADIENT DESCENT The nework ha connecs he n agens is assumed conneced, symmeric, and specified by he neighborhoods N i ha conain he lis of nodes ha can communicae wih i for i = 1,..., n. In problem (1) agen i has access o he local cos f i (x) and agens cooperae o minimize he global cos f(x). This specificaion is more naurally formulaed by an alernaive represenaion of (1) in which node i selecs a local decision vecor x i R p. Nodes hen ry o achieve he minimum of heir local objecive funcions f i (x i ), while keeping heir variables equal o he variables x j of neighbors j N i. This alernaive formulaion can be wrien as n {x i } n i=1 := argmin f i (x i ), {x i} n i=1 i=1 s.. x i = x j, for all i, j N i. () Since he nework is conneced, he consrains x i = x j for all i and j N i imply ha (1) and () are equivalen and we have x i = x for all i. This mus be he case because for a conneced nework he consrains x i = x j for all i and j N i collapse he feasible space of () o a hyperplane in which all local variables are equal. When all variables are equal, he objecives in (1) and () coincide and so do heir opima. DGD is an esablished disribued mehod o solve () which relies on he inroducion of nonnegaive weighs w ij 0 ha are null if and only if j / N i {i} he use of ime varying weighs w ij is common in DGD implemenaions bu no done here; see, e.g., [13]. Leing N be a discree ime index and α a given sepsize, DGD is defined by he recursion n x i,+1 = w ij x j, α f i (x i, ), i = 1,..., n. (3) j=1 Since w ij = 0 when j i and j / N i, i follows from (3) ha each agen i updaes is variable x i by performing an average over he esimaes x j, of is neighbors j N i and is own esimae x i,, and descending hrough he negaive local gradien f i (x i, ). The weighs in (3) canno be arbirary. To express condiions on he se of allowable weighs define he marix W R n n wih enries w ij. We require he weighs o be symmeric, i.e., w ij = w ji for all i, j, and such ha he weighs of a given node sum up o 1, i.e., n j=1 w ij = 1 for all i. If he weighs sum up o 1 we mus have W1 = 1 which implies ha I W is rank deficien. I is also cusomary o require he rank of I W o be exacly equal o n 1 so ha he null space of I W is null(i W) = span(1). We herefore have he following hree resricions on he marix W, W T = W, W1 = 1, null(i W) = span(1). (4)

3 3 If he condiions in (4) are rue, i is possible o show ha (3) approaches he soluion of (1) in he sense ha x i, x for all i and large, [13]. The acceped inerpreaion of why (3) converges is ha nodes are gradien descending owards heir local minima because of he erm α f i (x i, ) bu also perform an average of neighboring variables n j=1 w ijx j,. This laer consensus operaion drives he agens o agreemen. In he following secion we show ha (3) can be alernaively inerpreed as a penaly mehod. A. Penaly mehod inerpreaion I is illuminaing o define marices and vecors so as o rewrie (3) as a single equaion. To do so define he vecors y := [x 1 ;... ; x n ] and h(y) := [ f 1 (x 1 );... ; f n (x n )]. Vecor y R np concaenaes he local vecors x i, and he vecor h(y) R np concaenaes he gradiens of he local funcions f i aken wih respec o he local variable x i. Noice ha h(y) is no he gradien of f(x) and ha a vecor y wih h(y) = 0 does no necessarily solve (1). To solve (1) we need o have x i = x j for all i and j wih n i=1 f i(x i ) = 0. In any even, o rewrie (3) we also define he marix Z := W I R np np as he Kronecker produc of he weigh marix W R n n and he ideniy marix I R p p. I is hen ready o see ha (3) is equivalen o y +1 = Zy αh(y ) = y [ (I Z)y + αh(y ) ], (5) where in he second equaliy we added and subraced y and regrouped erms. Inspecion of (5) reveals ha he DGD updae formula a sep is equivalen o a (regular) gradien descen algorihm being used o solve he program y := argmin F (y) := min 1 n yt (I Z)y+α f i (x i ). (6) i=1 This inerpreaion has been previously used in [14], [6] o design a Neserov ype acceleraion of DGD. Indeed, given he definiion of he funcion F (y) := (1/)y T (I Z) y + α n i=1 f i(x i ) i follows ha he gradien F (y ) is given by g := F (y ) = (I Z)y + αh(y ). (7) Using (7) we rewrie (5) as y +1 = y g and conclude ha DGD descends along he negaive gradien of F (y) wih uni sepsize. The expression in (3) is jus a disribued implemenaion of gradien descen ha uses he gradien in (7). To confirm ha his is rue, observe ha he ih elemen of he gradien g = [g 1, ;... ; g n, ] is given by g i, = (1 w ii )x i, j N i w ij x j, + α f i (x i, ). (8) The gradien descen ieraion y +1 = y g is hen equivalen o (3) if we enrus node i wih he implemenaion of he descen x i,+1 = x i, g i,, where, we recall, x i, and x i,+1 are he ih componens of he vecors y and y +1. Observe ha he local gradien componen g i, can be compued using local informaion and he x j, ieraes of is neighbors j N i. This is as i should be, because he descen x i,+1 = x i, g i, is equivalen o (3). Is i a good idea o descend on F (y) o solve (1)? To some exen. Since we know ha he null space of I W is null(i W) = span(1) and ha Z = W I we know ha he null space of I Z is he se of consensus vecors, i.e., null(i Z) = { y = [x 1 ;... ; x n ] } x1 = = x n. Thus, (I Z)y = 0 holds if and only if x 1 = = x n. Since he marix I Z is posiive semidefinie and symmeric, he same is rue of he square roo marix (I Z) 1/. Therefore, he opimizaion problem in () is equivalen o he opimizaion problem n ỹ := argmin f i (x i ), s.. (I Z) 1/ y = 0. (9) x i=1 Indeed, for y = [x 1 ;... ; x n ] o be feasible in (9) we mus have x 1 = = x n. This is he same consrain imposed in () from where i follows ha we mus have ỹ = [x 1;... ; x n] wih x i = x for all i. The unconsrained minimizaion in (6) is a penaly version of (9). The penaly funcion associaed wih he consrain (I Z) 1/ y = 0 is he squared norm (1/) (I Z) 1/ y and he corresponding penaly coefficien is 1/α. Inasmuch as he penaly coefficien 1/α is sufficienly large, he opimal argumens y and ỹ are no oo far apar. The reinerpreaion of (3) as a penaly mehod demonsraes ha DGD is an algorihm ha finds he opimal soluion of (6), no (9) or is equivalen original formulaions in (1) and (). Using a fixed α he disance beween y and ỹ is of order O(α), [15]. To solve (9) we need o inroduce a rule o progressively decrease α. In he following secion we exploi he reinerpreaion of (5) as a mehod o minimize (6) o propose an approximae Newon algorihm ha can be implemened in a disribued manner. III. NETWORK NEWTON Insead of solving (6) wih a gradien descen mehod as in DGD, we can solve (6) using Newon s mehod. To implemen Newon s mehod we need o compue he Hessian H := F (y ) of F evaluaed a y so as o deermine he Newon sep d := H 1 g. Sar by differeniaing wice in (6) in order o wrie H as H := F (y ) = I Z + αg, (10) where G R np np is a block diagonal marix formed by blocks G ii, R p p defined as G ii, = f i (x i, ). (11) I follows from (10) and (11) ha he Hessian H is block sparse wih blocks H ij, R p p having he sparsiy paern of Z, which is he sparsiy paern of he graph. The diagonal blocks are of he form H ii, = (1 w ii )I + α f i (x i, ) and he off diagonal blocks are no null only when j N i in which case H ij, = w ij I. While he Hessian H is sparse, he inverse H is no. I is he laer ha we need o compue he Newon sep d := g. To overcome his problem we spli he diagonal and off diagonal blocks of H and rely on a Taylor s expansion of he inverse This spliing echnique is inspired from he Taylor s expansion used in [4]. To be precise, wrie H = H 1

4 4 D B where he marix D is defined as D := αg + (I diag(z)) := αg + (I Z d ), (1) where in he second equaliy we defined Z d := diag(z) for fuure reference. Since he diagonal weighs mus be w ii < 1, he marix I Z d is posiive definie. The same is rue of he block diagonal marix G because he local funcions are assumed srongly convex. Therefore, he marix D is block diagonal and posiive definie. The ih diagonal block D ii, R p of D can be compued and sored by node i as D ii, = α f i (x i, ) + (1 w ii )I. To have H = D B we mus define B := D H. Considering he definiions of H and D in (10) and (1), i follows ha B = I Z d + Z. (13) Noe ha B is ime-invarian and depends on he weigh marix Z only. As in he case of he Hessian H, he marix B is block sparse wih blocks B ij R p p having he sparsiy paern of Z, which is he sparsiy paern of he graph. Node i can compue he diagonal blocks B ii = (1 w ii )I and he off diagonal blocks B ij = w ij I using informaion abou is own and neighbors weighs. Proceed now o facor D 1/ from boh sides of he spliing relaionship o wrie H = D 1/ (I D 1/ BD 1/ )D 1/. When we consider he Hessian inverse H 1, we can use he Taylor series (I X) 1 = D 1/ BD 1/ o wrie H 1 = D 1/ k=0 j=0 Xj wih X = ( ) k D 1/ BD 1/ 1/ D. (14) The sum in (14) converges if he absolue value of all he eigenvalues of he marix D 1/ BD 1/ are sricly less han 1. For he ime being we assume his o be he case bu we will prove ha his is rue in Secion IV. When he series converge, we can use runcaions of his series o define approximaions o he Newon sep as we explain in he following secion. Remark 1 The Hessian decomposiion H = D B wih he marices D and B in (1) and (13), respecively, is no he only valid decomposiion ha we can use for Nework Newon. Any decomposiion of he form H = D ± B is valid if D is posiive definie and he eigenvalues of he marix D 1/ B D 1/ are in he inerval ( 1, 1). An example alernaive decomposiion is given by he marices D = αg and B = I Z. This decomposiion has he advanage of separaing he effecs of he funcion in D and he effecs of he nework in B. The decomposiion in (1) and (13) exhibis faser convergence of he series in (14) because he marix D in (1) accumulaes more weigh in he diagonal han he marix D = αg. The sudy of alernaive decomposiions is beyond he scope of his paper. A. Disribued approximaions of he Newon sep Nework Newon (NN) is defined as a family of algorihms ha rely on runcaions of he series in (14). The Kh member of his family, NN-K, considers he firs K + 1 erms of he series o define he approximae Hessian inverse Ĥ (K) 1 := D 1/ K k=0 ( ) k D 1/ BD 1/ 1/ D. (15) NN-K uses he approximae Hessian Ĥ(K) 1 as a curvaure correcion marix ha is used in lieu of he exac Hessian inverse H 1 descending along he Newon sep d := H 1 along he NN-K sep d (K) := Ĥ(K) 1 o esimae he Newon sep. I.e., insead of g we descend g as an approximaion of d. Using he explici expression for Ĥ(K) 1 in (15) we wrie he NN-K sep as d (K) = D 1/ K k=0 ( ) k D 1/ BD 1/ 1/ D g, (16) where, we recall, g as he gradien of he funcion F (y) defined in (7). The NN-K updae can hen be wrien as y +1 = y + ɛ d (K), (17) where ɛ is a properly seleced sepsize see Theorem 1 for specific condiions. The algorihm defined by recursive applicaion of (17) can be implemened in a disribued manner because he runcaed series in (15) has a local srucure conrolled by he parameer K. To explain his saemen beer define he componens d (K) i, d (K) = [d (K) 1, ;... ; d(k) (17) requires ha node i compues d (K) i, he local descen x i,+1 = x i, + ɛd (K) i, R p of he NN-K sep n, ]. A disribued implemenaion of so as o implemen. The key observaion here is ha he sep componen d (K) i, can indeed be compued hrough local operaions. Specificially, begin by noing ha as per he definiion of he NN-K descen direcion in (16) he sequence of NN descen direcions saisfies d (k+1) = D 1 Bd (k) D 1 g = D 1 ( Bd (k) g ). (18) Since he marix B has he sparsiy paern of he graph, his recursion can be decomposed ino local componens ( ) d (k+1) i, = D 1 ii, B ij d (k) j, g i,, (19) j N i {i} The marix D ii, = α f i (x i, ) + (1 w ii )I is sored and compued a node i. The gradien componen g i, = (1 w ii )x i, j N i w ij x j, + α f i (x i, ) is also sored and compued a i. Node i can also evaluae he values of he marix blocks B ii = (1 w ii )I and B ij = w ij I. Thus, if he NN-k sep componens d (k) j, are available a neighbors j, node i can deermine he NN-(k + 1) sep componen d (k+1) i, upon being communicaed ha informaion. The expression in (19) represens an ieraive compuaion embedded inside he NN-K recursion in (17). A ime index, we compue he local componen of he NN-0 sep = D 1 ii, g i,. Upon exchanging his informaion wih neighbors we use (19) o deermine he NN-1 sep d (1) i,. These d (0) i, can be exchanged o compuer d () i, as in (19). Repeaing his procedure K imes, nodes ends up having deermined heir

5 5 Algorihm 1 Nework Newon-K mehod a node i Require: Iniial ierae x i,0. Weighs w ij. Penaly coefficien α. 1: B marix blocks: B ii = (1 w ii)i and B ij = w iji : for = 0, 1,,... do 3: D marix block: D ii, = α f i(x i,) + (1 w ii)i 4: Exchange ieraes x i, wih neighbors j N i. 5: Gradien: g i, = (1 w ii)x i, w ijx j, + α f i(x i,) j N i 6: Compue NN-0 descen direcion d (0) i, = D 1 ii, gi, 7: for k = 0,..., K 1 do 8: Exchange elemens d (k) i, of he NN-k [ sep wih neighbors ] 9: NN-(k+1) sep: d (k+1) i, = D 1 ii, B ijd (k) j, gi, j N i,j=i 10: end for 11: Updae local ierae: x i,+1 = x i, + ɛ d (K) i,. 1: end for NN-K sep componen d (K) i,. The resuling NN-K mehod is summarized in Algorihm 1. The descen ieraion in (17) is implemened in Sep 11. Implemenaion of his descen requires access o he NN-K descen direcion d (K) i, which is compued by he loop in seps Sep 6 iniializes he loop by compuing he NN-0 sep d (0) i, = D 1 ii, g i,. The core of he loop is in Sep 9 which corresponds o he recursion in (19). Sep 8 sands for he variable exchange ha is required o implemen Sep 9. Afer K ieraions hrough his loop, he NN-K descen direcion is compued and can be used in Sep 11. Boh, Seps 6 and 9, require access o he local gradien componen g i,. This is evaluaed in Sep 5 afer receiving he prerequisie informaion from neighbors in Sep 4. Seps 1 and 3 compue he blocks B ii,, B ij,, and D ii, required in seps 6 and 9. d (K) i, Remark By rying o approximae he Newon sep, NN- K ends up reducing he number of ieraions required for convergence. Furhermore, he larger K is, he closer ha he NN-K sep ges o he Newon sep, and he faser NN- K converges. We will jusify hese asserions boh, analyically in Secion IV, and numerically in Secion V. I is imporan o observe, however, ha reducing he number of ieraions reduces he compuaional cos bu no necessarily he communicaion cos. In DGD, each node i shares is vecor x i, R p wih each of is neighbors j N i. In NN-K, node i exchanges no only he vecor x i, R p wih is neighboring nodes, bu i also communicaes ieraively he local componens of he descen direcions {d (k) i, }K 1 k=0 Rp so as o compue he descen direcion d (K) i,. Hence, a each ieraion, node i sends N i vecors of size p o is neighbors in DGD, while in NN-K i sends (K+1) N i vecors of he same size. Unless he original problem is well condiioned, NN-K also reduces oal communicaion cos unil convergence, even hough he cos of each individual ieraion is larger. However, he use of large K is unwarraned because he added benefi of beer approximaing he Newon sep does no compensae he increase in communicaion cos. IV. CONVERGENCE ANALYSIS In his secion we show ha as ime progresses he sequence of objecive funcion values F (y ) [cf. (6)] approaches he opimal objecive funcion value F (y ). In proving his claim we make he following assumpions. Assumpion 1 There exis consans 0 δ < 1 ha lower and upper bound he diagonal weighs for all i, 0 < δ w ii < 1, i = 1,..., n. (0) Assumpion The local objecive funcions f i (x) are wice differeniable and he eigenvalues of he local Hessians are bounded wih posiive consans 0 < m M <, i.e. mi f i (x) MI. (1) Assumpion 3 The local objecive funcion Hessians f i (x) are Lipschiz coninuous wih respec o he Euclidian norm wih parameer L. I.e., for all x, ˆx R p, i holds f i (x) f i (ˆx) L x ˆx. () The lower bound in Assumpion 1 is more a definiion han a consrain. To be more precise, he weighs w ij are posiive if and only if j N i or j = i. This observaion verifies exisence of a lower bound for he local weighs w ii ha is defined as δ > 0 in Assumpion 1. The upper bound < 1 on he weighs w ii is rue for all conneced neworks as long as neighbors j N i are assigned nonzero weighs w ij > 0. This is because he marix W is doubly sochasic [cf. (4)], which implies ha w ii = 1 j N i w ij < 1 as long as w ij > 0. The lower bound m for he eigenvalues of local objecive funcion Hessians f i (x) is equivalen o he srong convexiy of local objecive funcions f i (x) wih parameer m. The srong convexiy assumpion for he local objecive funcions f i (x) saed in Assumpion is cusomary in Newon-based mehods, since he Hessian of objecive funcion should be inverible o esablish Newon s mehod [Chaper 9 of [3]]. The upper bound M for he eigenvalues of local objecive funcion Hessians f i (x) is similar o he condiion ha gradiens f i (x) are Lipschiz coninuous wih parameer M for he case ha funcions are wice differeniable. The resricion imposed by Assumpion 3 is cusomary in he analysis of second order mehods, see Secion of [3], which guaranees ha he Hessians F (y) are also Lipschiz coninuous as we show in he following lemma. Lemma 1 Consider he definiion of objecive funcion F (y) in (6). If Assumpion 3 holds hen he objecive funcion Hessian H(y) =: F (y) is Lipschiz coninuous wih parameer αl, i.e., for all y, ŷ R np we have Proof: See Appendix A. H(y) H(ŷ) αl y ŷ. (3) Lemma 1 saes ha he penaly objecive funcion inroduced in (6) has he propery ha he Hessians are Lipschiz coninuous, while he Lipschiz consan is a funcion of he penaly coefficien 1/α. Thus, if we increase he penaly coefficien 1/α, or, equivalenly, decrease α, he objecive funcion F (y) approaches a quadraic form because he curvaure becomes consan.

6 6 To prove convergence properies of NN we need bounds for he eigenvalues of he block diagonal marix D, he block sparse marix B, and he Hessian H. These eigenvalue bounds are esablished in he following proposiion using he condiions imposed by Assumpions 1 and. Proposiion 1 Consider he definiions of marices H, D, and B in (10), (1), and (13), respecively. If Assumpions 1 and hold rue, hen he eigenvalues of marices H, D, and B are uniformly bounded as αmi H ((1 δ) + αm)i, (4) ((1 ) + αm)i D ((1 δ) + αm)i, (5) Proof: See Appendix B. 0 B (1 δ)i. (6) Proposiion 1 saes ha Hessian marix H and block diagonal marix D are posiive definie, while marix B is posiive semidefinie. As we noed in Secion III, for he expansion in (14) o be valid he eigenvalues of he marix D 1/ BD 1/ mus be nonnegaive and sricly smaller han 1. The following proposiion saes ha his is rue for all imes. Proposiion Consider he definiions of he marices D in (1) and B in (13). If Assumpions 1 and hold rue, he marix D 1/ BD 1/ is posiive semidefinie and is eigenvalues are bounded above by a consan ρ < 1 0 D 1/ BD 1/ ρi, (7) where ρ := (1 δ)/((1 δ) + αm). Proof: See Appendix C. The resuls in Proposiion 1 would lead o he rivial upper bound (1 δ)/(αm + (1 )) for he eigenvalues of D 1/ BD 1/. The upper bound in Proposiion is igher and follows from he srucure of he marix D 1/ BD 1/. The bounds for he eigenvalues of D 1/ BD 1/ in (7) guaranee convergence of he Taylor series in (14). As menioned in Secion III, NN-K runcaes he firs K summands of he Hessian inverse Taylor series in (14) o approximae he Hessian inverse of he objecive funcion in opimizaion problem (6). To evaluae he performance of NN-K we sudy he error of he Hessian inverse approximaion by defining he error marix E R np np as E := I Ĥ(K) 1/ H Ĥ (K) 1/. (8) The error marix E measures closeness of he Hessian inverse approximaion marix Ĥ(K) 1 and he exac Hessian inverse H 1 a ime. Based on he definiion of he error marix E, if he Hessian inverse approximaion Ĥ(K) 1 approaches he exac Hessian inverse H 1 he error marix E approaches he zero marix 0. We herefore bound he error of he Hessian inverse approximaion by developing a bound for he eigenvalues of E. This bound is provided in he following proposiion. Proposiion 3 Consider he NN-K mehod in (1)-(17) and he definiion of error marix E in (8). Furher, recall he definiion of he consan ρ := (1 δ)/(αm + (1 δ)) < 1 in Proposiion. The error marix E is posiive semidefinie and all is eigenvalues are upper bounded by ρ K+1, Proof: See Appendix D. 0 E ρ K+1 I. (9) Proposiion 3 assers ha he error in he approximaion of he Hessian inverse, hereby on he approximaion of he Newon sep, is bounded by ρ K+1. This resul corroboraes he inuiion ha he larger K is, he closer ha d (K) i, approximaes he Newon sep. This closer approximaion comes a he cos of increasing he communicaion cos of each descen ieraion. The decrease of his error being proporional o ρ K+1 hins ha using a small value of K should suffice in pracice. Furher o decrease ρ we can increase δ or increase α. Increasing δ calls for assigning subsanial weigh o w ii. Increasing α comes a he cos of moving he soluion of (6) away from he soluion of (9) and is equivalen (1). Bounds on he eigenvalues of he objecive funcion Hessian H are cenral o he convergence analysis of Newon s mehod [Chaper 9 of [3]]. Lower bounds for he Hessian eigenvalues guaranee ha he marix is nonsingular. Upper bounds imply ha he minimum eigenvalue of he Hessian inverse H 1 is sricly larger han zero, which, in urn, implies a sric decremen in each Newon sep. Analogous bounds for he eigenvalues of he NN approximae Hessian inverses Ĥ(K) 1 are required. These bounds are sudied in he following lemma. Lemma Consider he NN-K mehod as defined in (1)-(17). If Assumpions 1 and hold rue, we have λi where consans λ and Λ are defined as Ĥ(K) 1 ΛI, (30) 1 λ:= (1 δ)+αm and Λ:= 1 ρ K+1 (1 ρ)((1 )+αm). (31) Proof: See Appendix E. According o he resul in Lemma, he NN-K approximae Hessian inverses Ĥ(K) 1 are sricly posiive definie and have all of heir eigenvalues bounded beween he posiive and finie consans λ and Λ. This is rue for all K and uniform across all ieraion indexes. Considering hese eigenvalue bounds and he fac ha g is a descen direcion, he approximae Newon sep Ĥ(K) 1 g enforces convergence of he ierae y o he opimal argumen y of he penalized objecive funcion F (y) in (6). In he following heorem we show ha if he sepsize ɛ is properly chosen, he sequence of objecive funcion values F (y ) converges a leas linearly o he opimal objecive funcion value F (y ). Theorem 1 Consider he NN-K mehod as defined in (1)- (17) and he objecive funcion F (y) as inroduced in (6). Furher, recall he definiions of he lower and upper bounds

7 7 λ and Λ, respecively, for he eigenvalues of he approximae Hessian inverse Ĥ(K) 1 in (31). If he sepsize ɛ is chosen as [ ] ɛ min 1, 3mλ 5 1 LΛ 3 (F (y 0 ) F (y )) 1, (3) and Assumpions 1-3 hold, he sequence F (y ) converges o he opimal argumen F (y ) a leas linearly as F (y ) F (y ) (1 ζ) (F (y 0 ) F (y )), (33) where he consan 0 < ζ < 1 is explicily given by ζ := ( ɛ)ɛαmλ αɛ3 LΛ 3 (F (y 0 ) F (y )) 1 6λ 3 Proof: See Appendix F.. (34) Theorem 1 shows ha he objecive funcion error sequence F (y ) F (y ) asympoicly converges o zero and ha he rae of convergence is a leas linear. Noe ha according o he definiion of he convergence parameer ζ in Theorem 1 and he definiions of λ and Λ in (31), increasing α leads o faser convergence. This observaion verifies exisence of a radeoff beween rae and accuracy of convergence. For large values of α he sequence generaed by nework Newon converges faser o he opimal soluion of (6). These faser convergence comes a he cos of increasing he disance beween he opimal soluions of (6) and (1). Conversely, smaller α implies smaller gap beween he opimal soluions of (6) and (1), bu he convergence rae of NN-K is slower. In he following secion, we illusrae he connecion beween nework Newon and he cenralized Newon s mehod. A. Analysis of nework Newon as a Newon-like mehod To connec he proposed NN mehod wih he classic Newon s mehod, we firs sudy he difference beween hese mehods. In paricular, he following lemma shows ha he convergence of he norm of he weighed gradien 1 g in NN-K is akin o he convergence of Newon s mehod wih consan sepsize. The difference is he appearance of a erm associaed wih he error of he Hessian inverse approximaion as we formally sae nex. Lemma 3 Consider he NN-K mehod as defined in (1)-(17). If Assumpions 1-3 hold, he sequence of weighed gradiens D 1/ g +1 saisfies g +1 ( 1 ɛ + ɛρ K+1)[ ] 1 + Γ 1 (1 ζ) ( 1) 4 1 g + ɛ Γ 1 g, (35) where he consans Γ 1 and Γ are defined as Γ 1 := (αɛlλ) 1 (F (y 0 ) F (y )) 1 4 λ 3 4 ((1 ) + αm) Proof: See Appendix G. αlλ, Γ :=. λ((1 )+αm) 1 (36) As per Lemma 3 he weighed gradien norm g +1 is upper bounded by erms ha are linear and quadraic on he weighed norm 1 g associaed wih he previous ierae. This is akin o he gradien norm decrease of Newon s mehod wih consan sepsize. Noe ha if he error of Hessian inverse approximaion which is characerized by ρ K+1 becomes zero, by seing ɛ = 1 we can simplify (35) as g +1 Γ 1 g. This resul shows quadraic convergence when Γ 1 g < 1. However, he erm ρ K+1 is no zero in general. Alhough, he error of Hessian inverse approximaion is no zero, he resul in (35) is very similar o he one for he classic Newon s mehod. To make his connecion clearer, furher noe ha for all excep he firs few ieraions he erm Γ 1 (1 ζ) ( 1)/4 0 is close o 0 and he relaion in (35) can be simplified o g +1 (1 ɛ+ɛρ K+1 ) 1 g + ɛ Γ 1 g. (37) In (37), he coefficien in he linear erm is reduced o (1 ɛ + ɛρ K+1 ) and he coefficien in he quadraic erm says a ɛ Γ. If, for discussion purposes, we se ɛ = 1 as in Newon s quadraic phase, he upper bound in (37) is furher reduced o g +1 ρ K+1 1 g + Γ 1 g. (38) The equaion in (38) makes he connecion beween NN and Newon s clear, because he exac same resul would hold for Newon s mehod if we se ρ = 0. The NN mehod can no have a quadraic convergence phase for he res of he ieraions like he one for Newon s mehod because of he erm ρ K+1 1 g. However, since he consan ρ (cf. Proposiion ) is smaller han 1 he erm ρ K+1 can be made arbirarily small by increasing he approximaion order K. Equivalenly, his means ha by selecing K o be large enough, we can make he quadraic erm in (38) dominan and observe a quadraic convergence phase. The boundaries of his quadraic convergence phase are formally deermined in he following Theorem using he resul in (35). Theorem Consider he NN-K mehod as defined in (1)- (17). Define he sequence η := [(1 ɛ + ɛρ K+1 )(1 + Γ 1 (1 ζ) ( 1)/4 )] and he ime 0 as he firs ime a which sequence η is smaller han 1, i.e. 0 := argmin { η < 1}. If Assumpions 1-3 hold, hen for all 0 when he sequence 1 g saisfies η (1 η ) ɛ 1 Γ g < 1 η ɛ, (39) Γ he sequence of scaled gradien norms is such ha g +1 ɛ Γ 1 η 1 g. (40) Proof: Based on he definiion of η, we can rewrie (35) as g +1 η 1 g + ɛ Γ 1 g. (41) We use his expression o prove he inequaliy in (40). To do so, rearrange erms in he firs inequaliy in (39) and wrie η ɛ Γ 1 η 1 g. (4)

8 8 Muliplying boh sides of (4) by η 1 η 1 g g yields η ɛ Γ 1 η 1 g. (43) Subsiuing η 1 g in (41) by is upper bound in (43) implies ha g +1 η ɛ Γ 1 η 1 g + ɛ Γ 1 g = ɛ Γ 1 η 1 g. (44) To verify quadraic convergence, i is necessary o prove ha he sequence i 1 g i of weighed gradien norms is decreasing. For his o be rue we mus have ɛ Γ D 1 1/ 1 η g < 1. (45) Bu (45) is rue because we are looking a a range of gradiens ha saisfy he second inequaliy in (39). As per Theorem 1 y is converging o y a a rae ha is a leas linear. Thus, he gradiens g will be such ha a some poin in ime hey saisfy he righmos inequaliy in (39). A ha poin in ime, progress owards y proceeds a a quadraic rae as indicaed by (40). This quadraic rae of progress is mainained unil he lefmos inequaliy in (39) is saisfied, a which poin he linear erm in (35) dominaes and he convergence rae goes back o linear. Furhermore, making K sufficienly large i is possible o reduce η arbirarily and make he quadraic convergence region las longer. In pracice, his calls for making K large enough so ha η is close o he desired gradien norm accuracy. Remark 3 For a quadraic funcion F, he Lipschiz consan for he Hessian is L = 0. Then, he opimal choice of sepsize for NN-K is ɛ = 1 as a resul of sepsize rule in (3). Moreover, he consans for he linear and quadraic erms in (35) are Γ 1 = Γ = 0 as i follows from heir definiions in (36). For quadraic funcions we also have ha he Hessian of he objecive funcion H = H and he block diagonal marix D = D are ime-invarian. Thus, we can rewrie (35) as g +1 ρ K+1 g. (46) Noe ha Newon s mehod converges in a single sep in quadraic programming. This propery follows from (46) because Newon s mehod is equivalen o NN-K as K. The expression in (46) saes ha NN-K converges linearly wih a consan decrease facor of ρ K+1 per ieraion. This in conras wih firs order mehods like DGD ha converge wih a linear rae ha depends on he problem condiion number. V. NUMERICAL ANALYSIS In his secion, we sudy he performance of NN-K in he minimizaion of a disribued quadraic objecive. For each agen i we consider a posiive definie diagonal marix A i S ++ p and a vecor b i R p o define he local objecive funcion f i (x) := (1/)x T A i x + b T i x. Therefore, he global cos funcion f(x) is wrien as n 1 f(x) := xt A i x + b T i x. (47) i=1 The difficuly of solving (47) is given by he condiion number of he marices A i. To une condiion numbers we generae diagonal marices A i wih random diagonal elemens a ii. The firs p/ diagonal elemens a ii are drawn uniformly a random from he discree se {1, 10 1,..., 10 ξ } and he nex p/ are uniformly and randomly chosen from he se {1, 10 1,..., 10 ξ }. This choice of coefficiens yields local marices A i wih eigenvalues in he inerval [10 ξ, 10 ξ ] and global marices n i=1 A i wih eigenvalues in he inerval [n10 ξ, n10 ξ ]. The linear erms b T i x are added so ha he differen local funcions have differen minima. The vecors b i are chosen uniformly a random from he box [0, 1] p. The graph is graph is d-regular and generaed by creaing a cycle and hen connecing each node wih he d/ nodes ha are closes in each direcion. The diagonal weighs in he marix W are se o w ii = 1/ + 1/(d + 1) and he off diagonal weighs o w ij = 1/(d + 1) when j N i. A. Comparison wih exising mehods In his secion we compare he performance of he proposed NN mehod wih primal mehods such as DGD in [13] and he acceleraed version of DGD (Acc. DGD) in [14]. For he Acc. DGD mehod, we assume ha he sepsize parameer and he momenum coefficiens are consan as in he case for he cenralized acceleraed gradien descen. This makes he comparison beween Acc. DGD, DGD, and NN fair, since our aim is o compare heir performances in solving he penalized objecive funcion. Moreover, we consider he convergence pahs of he disribued ADMM (DADMM) in [18] and he exac firs order mehod EXTRA in [16]. Alhough EXTRA operaes in he primal domain, i has been shown ha i can be inerpreed as a saddle-poin mehod [7]. Thus, we consider EXTRA in he caegory of dual mehods which has a linear convergence rae as DADMM. We compare hese mehods in solving (47) for he case ha here are n = 100 nodes in he nework and he dimension of he vecor x is p = 0. We assume ha he graph is 4-regular. Furher, we se he condiion number parameer o ξ = and he penaly parameer o α = The momenum coefficien for he acceleraed DGD is 0.9. Noe ha among he values {0.1, 0.,..., 0.9, 1}, he bes performance belongs o he momenum coefficien 0.9 which we use in he experimens. As he condiion number of he problem is relaively large, i.e., , he NN mehod performs beer han DGD and Acc. DGD in erms of he number of ieraions and oal number of local informaion exchanges as hey are illusraed in Fig. 1 and Fig., respecively. In he case ha he condiion number of he objecive funcion is no significanly large wih respec o he dimension of he problem, he acceleraed DGD would be a beer choice relaive o NN. The comparison wih dual mehods shows ha in erms of ieraions and rounds of communicaions DADMM and

9 9 Relaive error x x x0 x DGD Acc. DGD ADMM EXTRA NN-0 NN-1 NN- Relaive error x x x0 x DGD Acc.DGD NN-0 NN-1 NN Number of ieraions Fig. 1: Comparison of DGD, Acc. DGD, DADMM, EXTRA, NN-0, NN-1, and NN- in erms of number of ieraions Number of local informaion exchanges Fig. 3: Relaive error of DGD, Acc. DGD, NN-0, NN-1, and NN- vs number of local info. exchanges for a well-condiioned problem. Relaive error x x x0 x DGD Acc. DGD ADMM EXTRA NN-0 NN-1 NN- Relaive error x x x0 x DGD Acc. DGD NN-0 NN-1 NN Number of local informaion exchanges Fig. : Comparison of DGD, Acc. DGD, DADMM, EXTRA, NN-0, NN-1, and NN- in erms of rounds of local informaion exchanges Number of local informaion exchanges Fig. 4: Relaive error of DGD, Acc. DGD, NN-0, NN-1, and NN- vs number of local info. exchanges for an ill-condiioned problem differen varians of NN perform relaively well and afer some poin DADMM ouperform NN and oher primal mehods because i converges o he opimal argumen of he original problem insead of he penalized funcion. On he oher hand, each sep of DADMM requires solving a convex program which can be compuaionally cosly. We observe ha EXTRA also has a linear convergence rae o he exac opimal soluion, and is accuracy becomes beer han all primal mehods. However, EXTRA is a firs-order mehod and is convergence a he beginning is relaively slower han NN. This advanage of NN resuls from incorporaion of he curvaure informaion of he objecive funcion. These observaions show ha by incorporaing he idea of NN and EXTRA we should be able o come up wih a second-order mehod ha has a linear convergence rae o he exac soluion of (47) while i can perform well in ill-condiioned problems. B. Effec of objecive funcion condiion number We sudy he effec of condiion number on he convergence rae of NN and show ha NN is less sensiive o he objecive funcion condiion number wih respec o primal firs-order mehods, e.g., DGD in [13] and acceleraed DGD in [14]. To do so, we compare he performances of he menioned mehods in solving he problem in (47) for small and large condiion numbers. The parameers are he same as he parameers in Fig. 1 excep he choice of he condiion number parameer ξ. We firs consider he case ha ξ = 1 which leads o condiion number The convergence pahs of DGD, acceleraed DGD, NN-0, NN-1, and NN- in erms of he num- ber of local informaion exchanges are shown in Fig. 3. The performance of variaions of NN are no significanly beer han DGD and acceleraed DGD. In paricular, DGD and Acc. DGD boh ouperform NN-1 and NN- in erms of he oal communicaions unil convergence. Thus, acceleraed DGD is he bes opion among he primal mehods for problems wih small condiion number. To explore he performance of hese mehods for an illcondiioned problem we se he condiion number parameer ξ = 3 which leads o he condiion number for he considered realizaion. Fig. 4 illusrae he convergence pahs of he considered primal mehods in erms of he number of local informaion exchanges. As we observe, he advanage of he nework Newon mehods is subsanial in his seing and hey ouperform DGD and acceleraed DGD in erms of communicaion cos. C. Effec of nework opology We proceed o compare he performance of NN in differen nework opologies. In paricular, we consider five differen opologies which are random graphs wih conneciviy probabiliies p c = 0.5 and p c = 0.35, complee graph, cycle, and line. Noe ha in random graphs, we generae he edges beween nodes wih probabiliy p c. The complee graph is a graph ha all nodes are conneced o each oher direcly. A cycle graph is a conneced graph ha each node has degree. A line graph is a cycle graph ha is missing an edge. The parameers are he same as he parameers in Fig. 1 excep he nework graph and he way ha we generae

10 10 Relaive error x x x0 x NN- pc = 0.5 NN- pc = 0.35 NN- complee graph NN- cycle NN- line Number of ieraions Fig. 5: Relaive error of NN- vs num. of ieraions for random graphs wih p c = {0.5, 0.35}, complee graph, cycle graph, and line graph. Weighed gradien norm 1 g NN-0 T.B. NN-1 T.B. NN- T.B. NN-0 NN-1 NN Number of ieraions Fig. 7: Comparison of he heoreical bound (T.B.) in (46) wih he empirical resul for a quadraic programming. Relaive error x x x0 x NN- pc = 0.5 NN- pc = 0.35 NN- complee graph NN- cycle NN- line observaions imply ha for he graphs ha δ is larger we expec faser linear convergence. The convergence pahs in Fig 5 reinforce his claim. Noe ha δ for he considered graphs are δ pc=0.5 = , δ pc=0.35 = , δ com = 0.51, δ cycle = 0.75, δ line = These numbers jusify he similariy of he convergence pahs of line and cycle graphs and he slow convergence rae of he complee graph Toal communicaions beween nodes 10 5 Fig. 6: Relaive error of NN- vs num. of communicaions for random graphs wih p c ={0.5, 0.35}, complee graph, cycle graph, and line graph. he weigh marix W. We generae he weigh marix W using he formula W = I L/τ where L is he Laplacian marix of he graph and τ/ is he larges eigenvalue of he Laplacian L. We compare he performance of NN- for all hese neworks in erms of he number of ieraions and he oal number of communicaions beween nodes. Noice ha in his secion we use oal communicaions beween node insead of he number of local informaion exchanges (rounds of local communicaions) since he degrees of nodes in he differen neworks are no equal. The convergence pahs of NN- for he considered opologies in erms of he number of ieraions and he oal number of communicaions are demonsraed in Fig. 5 and Fig 6, respecively. The firs imporan observaion is he accuracy of convergence. According o he resuls in [15], if we define β < 1 as he second larges magniude of he eigenvalues of W, hen he accuracy of convergence is proporional o 1/(1 β). Thus, he graphs wih smaller β converge o a smaller neighborhood of he opimal argumen. In paricular, he parameer β for he complee graph which has he mos accurae convergence is β = 0.5, while for he line graph ha has he leas accurae convergence pah β = The second imporan observaion is he rae of convergence for NN- in hese nework opologies. I follows from he resul in Theorem 1 ha for a quadraic objecive funcion he consan of linear convergence becomes 1 αmλ. Therefore, for larger values of λ we expec faser convergence. Noe ha λ is large when δ = min i w ii is large and close o 1. These D. Tighness of he bounds In his secion, we sudy he ighness of he heoreical bounds in he paper. To do so, we compare he empirical convergence raes of NN-0, NN-1, and NN- wih he heoreical resul in Lemma 3. As we discussed in Remark 3, for a quadraic objecive funcion he sequence of weighed gradiens of NN-K saisfies he inequaliy g +1 ρ K+1 g. We refer o his rae as T.B. which sands for heoreical bound. Figure 7 illusraes he heoreical bounds and empirical convergence pahs of NN-0, NN-1, and NN- for he quadraic problem in (47). As we observe, he convergence raes of all mehods are faser han heir heoreical bounds a he beginning, bu afer almos 10 ieraions heir convergence rae becomes similar o he heoreical bound in (46). To be clearer, he slopes of he acual convergence pahs and heir corresponding heoreical bounds become equal afer almos 10 ieraions. This observaion shows ha he bound in (46) is reasonably igh and he sequence of weighed gradiens for NN-K diminishes wih facor ρ K+1. VI. CONCLUSIONS We developed he nework Newon mehod as an approximae Newon mehod for solving consensus opimizaion problems. The algorihm builds on a reinerpreaion of disribued gradien descen as a penaly mehod and relies on an approximaion of he Newon sep of he corresponding penalized objecive funcion. To approximae he Newon direcion we runcae he Taylor series of he exac Newon sep. This leads o a family of mehods defined by he number K of Taylor series erms kep in he approximaion. When we keep K erms of he Taylor series, he mehod is called NN-K and can be implemened hrough he aggregaion of informaion in K-hop neighborhoods. We showed ha NN converges a leas linearly o he soluion of he penalized objecive, and, consequenly,

11 11 o a neighborhood of he opimal argumen for he original opimizaion problem. We compleed he convergence analysis of NN-K by showing ha he sequence of ieraes generaed by NN-K converges a a quadraic rae in a specific inerval. Numerical analyses compared he performances of NN-K wih differen choices of K for minimizing quadraic objecives. We observed ha all NN-K mehods work faser han disribued gradien descen in erms of number of ieraions and number of communicaions. APPENDIX A PROOF OF LEMMA 1 Consider wo vecors y := [x 1 ;... ; x n ] R np and ŷ := [ˆx 1 ;... ; ˆx n ] R np. Based on he Hessian expression in (10), we simplify he Euclidean norm H(y) H(ŷ) as H(y) H(ŷ) = α G(y) G(ŷ) = α max f i (x i ) f i.(ˆx i ). (48) i=1,...,n By using 3 and (48) we obain ha H(y) H(ŷ) αl max x i ˆx i αl y ŷ. (49) i Therefore, he claim in (3) follows. APPENDIX B PROOF OF PROPOSITION 1 The Gershgorin circle heorem saes ha each eigenvalue of a marix A lies wihin a leas one of he Gershgorin discs D(a ii, R ii ) where he cener a ii is he ih diagonal elemen of A and he radius R ii := j i a ij is he sum of he absolue values of all he non-diagonal elemens of he ih row. Hence, Gershgorin discs can be considered as inervals of widh [a ii R ii, a ii + R ii ] for I W, where a ii = 1 w ii and R ii = j i w ij = j i w ij. Therefore, all he eigenvalues of I W are in a leas one of he inervals [1 w ii j i w ij, 1 w ii + j i w ij]. Since j w ij = 1, i can be derived ha 1 w ii = n j i w ij. Thus, he Gershgorin inervals can be simplified as [0, (1 w ii )] for i = 1,..., n. This observaion in associaion wih he fac ha (1 w ii ) (1 δ) implies ha he eigenvalues of I W are in he inerval [0, (1 δ)] and consequenly he eigenvalues of I Z are bounded as 0 I Z (1 δ)i. (50) Since marix G is block diagonal and he eigenvalues of each diagonal block G ii, = f i (x i, ) are bounded by consans 0 < m M < as menioned in (1), we obain mi G MI. (51) Considering he definiion of he Hessian H := I Z + αg and he bounds in (50) and (51), he firs claim follows. The definiion of he marix D in (1) yields D = αg + (I n W d ) I p, (5) where W d is defined as W d := diag(w). Noe ha marix I n W d is diagonal and he i-h diagonal componen is 1 w ii. Since he local weighs saisfy δ w ii, we obain ha he eigenvalues of I n W d are bounded below and above by 1 and 1 δ, respecively. Since he eigenvalues of (I n W d ) and (I n W d ) I p are idenical we obain (1 )I np (I n W d ) I p (1 δ)i np (53) Considering he relaion in (5) and bounds in (51) and (53), he second claim follows. Based on he definiion of B in (13), we can wrie B = (I W d + W) I. (54) Noe ha in he i-h row of marix I W d +W, he diagonal componen is 1 w ii and he jh componen is w ij for all j i. Using Gershgorin heorem and he same argumen ha we esablished for he eigenvalues of I Z, we can wrie 0 I W d + W (1 δ)i. (55) Based on (55) and (54), he las claim follows. APPENDIX C PROOF OF PROPOSITION According o he resul of Proposiion 1, D is posiive definie and B is posiive semidefinie which immediaely implies ha D 1/ BD 1/ is posiive semidefinie. Recall he definiion of D in (1) and define he marix ˆD as a special case of marix D for α = 0. I.e., ˆD := (I Z d ). Noice ha ˆD is diagonal, ime invarian, and only depends on he srucure of he nework. Since ˆD is diagonal and each diagonal componen 1 w ii is sricly larger han 0, ˆD is posiive definie and inverible. Hence, we can wrie D 1 BD 1 = (D 1 ˆD 1 )( ˆD 1 B ˆD 1 1 )( ˆD D 1 ). (56) We proceed o find an upper bound for he eigenvalues of he marix ˆD 1/ B ˆD 1/ in (56). Observing he fac ha marices ˆD 1/ B ˆD 1/ and B ˆD 1 are similar, eigenvalues of hese marices are idenical. Hence, we proceed o characerize an upper bound for he eigenvalues of marix B ˆD 1. Based on he definiions of B and ˆD, he produc B ˆD 1 is given by B ˆD 1 = (I Z d + Z) ((I Z d )) 1. Therefore, he blocks of he marix B ˆD 1 are given by [B ˆD 1 ] ii = 1 I and [B ˆD 1 ] ij = w ij I. (57) (1 w jj ) Thus, each diagonal componen of he marix B ˆD 1 is 1/ and ha he sum of non-diagonal componens of column i is np j=1,j i B ˆD 1 [ji] = 1 np j=1,j i w ji 1 w ii = 1. (58) Consider (58) and apply Gershgorin heorem o obain 0 µ i (B ˆD 1 ) 1 i = 1,..., n, (59) where µ i (B ˆD 1 ) indicaes he i-h eigenvalue of he marix B ˆD 1. The bounds in (59) and similariy of he marices B ˆD 1 and ˆD 1/ B ˆD 1/ show ha he eigenvalues of he marix ˆD 1/ B ˆD 1/ are uniformly bounded in he inerval 0 µ i ( ˆD 1/ B ˆD 1/ ) 1. (60) Based on (56), o characerize he bounds for he eigen-

12 1 values of D 1/ BD 1/, he bounds for he eigenvalues of he marix ˆD 1/ D 1/ should be sudied as well. Noice ha according o he definiions of ˆD and D, he produc ˆD 1/ D 1/ is block diagonal and he i-h diagonal block is [ ] ( ˆD1/ D 1/ α ) 1/ f i (x i, ) = ii (1 w ii ) + I. (61) Observe ha according o Assumpion 1, he eigenvalues of local Hessian marices f i (x i ) are bounded by m and M. Furher noice ha he diagonal elemens of weigh marix w ii are bounded by δ and, i.e. δ w ii. Considering hese bounds we can show ha he eigenvalues of marices (α/(1 w ii )) f i (x i, ) + I are lower and upper bounded as [ αm (1 δ) + 1 ] I α f i (x i, ) (1 w ii ) + I [ αm (1 ) + 1 ] I. (6) By considering he bounds in (6) and he expression in (61), he eigenvalues of he marix ˆD 1/ D 1/ are bounded as [ ] 1 (1 ) ( ) [ ] 1 1 µi ˆD D 1 (1 δ), (1 )+αm (1 δ)+αm (63) for i = 1,..., n. Observing he decomposiion in (56), he norm of he marix D 1/ BD 1/ is upper bounded as BD 1 ˆD 1/ ˆD 1 B ˆD 1. (64) Considering he symmery of marices ˆD1/ D 1/ and ˆD 1/ B ˆD 1/, and he upper bounds for heir eigenvalues in (60) and (63), respecively, we can subsiue he norm of hese wo marices by he upper bounds of heir eigenvalues and simplify he upper bound in (64) o BD 1/ (1 δ) (1 δ) + αm. (65) Since D 1/ BD 1/ is posiive semidefinie and symmeric, he resul in (7) follows. APPENDIX D PROOF OF PROPOSITION 3 In his proof and he res of he proofs we denoe he Hessian approximaion as Ĥ 1 insead of Ĥ(K) 1 for simplificaion of equaions. To prove lower and upper bounds for he eigenvalues of he error marix E we firs develop a simplificaion for he marix I H Ĥ 1 in he following lemma. Lemma 4 Consider he NN-K mehod as defined in (1)-(17). The marix I H Ĥ 1 can be simplified as I H Ĥ 1 = ( BD 1 ) K+1. (66) Proof: Check Lemma in [4]. Proof of Proposiion 3: Recall he resul in Proposiion. Since he marices D 1/ BD 1/ and B D 1 are similar (conjugae) he ses of eigenvalues of hese wo marices are idenical. Thus, he eigenvalues of BD 1 are bounded as 0 µ i (BD 1 ) ρ, (67) for i = 1,,..., np. This resul in associaion wih (66) yields 0 µ i (I H Ĥ 1 ) ρ K+1. (68) Observe ha he error marix E = I Ĥ 1/ H Ĥ 1/ is he conjugae of marix I H Ĥ 1. Hence, he bounds for he eigenvalues of marix I H Ĥ 1 also hold for he eigenvalues of error marix E and he claim in (9) follows. APPENDIX E PROOF OF LEMMA Based on he Cauchy-Schwarz inequaliy, he produc of he norms is larger han norm of he producs. This observaion and he definiion of Ĥ 1 in (15) lead o Ĥ 1 I+D 1 BD [D 1 BD 1 ] K. (69) As a resul of Proposiion 1 he eigenvalues of D are bounded below by (1 ) + αm. Thus, he maximum eigenvalue of is inverse D 1 is smaller han 1/((1 ) + αm), and, herefore, he norm of he marix D 1/ is bounded above as [(1 ) + αm] 1/. (70) Based on he resul in Proposiion, he eigenvalues of D 1/ BD 1/ are smaller han ρ. Furher, using he symmery and posiive definieness of D 1/ BD 1/ we obain BD 1/ ρ. (71) Using he riangle inequaliy in (69) o claim ha he norm of he sum is smaller han he sum of he norms and subsiuing he bounds in (70) and (71) ino he resuling expression yield 1 Ĥ 1 (1 ) + αm K ρ k. (7) k=0 Since ρ < 1, he sum K k=0 ρk can be simplified o (1 ρ K+1 )/(1 ρ). Considering his simplificaion for he sum in (7), he upper bound in (30) for he eigenvalues of he approximae Hessian inverse Ĥ 1 follows. In expression (15), all he summands excep he firs one, D 1, are posiive semidefinie. Hence, he approximae Hessian inverse Ĥ 1 is he sum of he marix D 1 and K posiive semidefinie marices and as a resul we can conclude ha D 1 Ĥ 1. (73) Proposiion 1 shows ha he eigenvalues of D are bounded above by (1 δ) + αm which leads o he conclusion ha here exiss a lower bound for he eigenvalues of D 1, ((1 δ) + αm) 1 I D 1. (74) The claim in (30) follows from he resuls in (73) and (74). APPENDIX F PROOF OF THEOREM 1 To prove global convergence of he Nework Newon mehod we firs inroduce wo echnical lemmas. In he firs lemma, we develop an upper bound for he objecive funcion

13 13 value F (y) using he firs hree erms of is Taylor expansion. In he second lemma, we consruc an upper bound for he error F (y +1 ) F (y ) in erms of F (y ) F (y ). Lemma 5 Consider he funcion F (y) defined in (6). If Assumpions and 3 hold, hen for any y, ŷ R np F (ŷ) F (y) + F (y) T (ŷ y) (75) + 1 (ŷ y)t F (y)(ŷ y) + αl 6 ŷ y 3. Proof : The claim follows from he Lipschiz coninuiy of he Hessian wih consan αl and Theorem 7.7 in [8] which characerizes he error of aylor s expansion. In he following lemma, we use he resul in Lemma 5 o esablish an upper bound for he error F (y +1 ) F (y ). Lemma 6 Consider he NN-K mehod as defined in (1)-(17). Furher, recall he definiion of y as he opimal argumen of he objecive funcion F (y). If Assumpions 1-3 hold, hen F (y +1 ) F (y ) [ 1 ( ɛ ɛ ) αmλ ] [F (y ) F (y )] + αlɛ3 Λ 3 [F (y 6λ 3 ) F (y )] 3. (76) Proof: By seing ŷ := y +1 and y := y in (75) we obain F (y +1 ) F (y ) + g T (y +1 y ) (77) + 1 (y +1 y ) T H (y +1 y )+ αl 6 y +1 y 3, where g := F (y ) and H := F (y ). From he definiion of he NN-K updae in (16) we can wrie he difference of wo consecuive variables as y +1 y = ɛĥ 1 g. Making his subsiuion ino (77) implies F (y +1 ) F (y ) ɛg T g + ɛ gt Ĥ 1 H Ĥ 1 Ĥ 1 + αlɛ3 6 Ĥ 1 g 3. (78) According o (8), we can subsiue Ĥ 1/ H Ĥ 1/ in (78) by I E which leads o F (y +1 ) F (y ) ɛg T g Ĥ 1 g + ɛ 1 gt Ĥ (I E 1 )Ĥ + αlɛ3 6 Ĥ 1 g 3. (79) Proposiion 3 shows ha E is posiive semidefinie, and, herefore, he quadraic form g T Ĥ 1/ E Ĥ 1/ g is nonnegaive. Considering his lower bound we can simplify (79) o ( ) ɛ ɛ F (y +1 ) F (y ) g T Ĥ 1 g + αlɛ3 6 Ĥ 1 g 3. (80) Since ɛ < 1, we obain ha ɛ ɛ is posiive. Moreover, recall he resul of Lemma ha all he eigenvalues of he Hessian inverse approximaion Ĥ 1 are lower and upper bounded by λ and Λ, respecively. These wo observaions imply ha we can replace he erm g T Ĥ 1 g by is lower bound λ g. Moreover, exisence of upper bound Λ for he eigenvalues of Hessian inverse approximaion Ĥ 1 implies ha he erm g Ĥ 1 g 3 is upper bounded by Λ 3 g 3. Subsiuing hese bounds for he second and hird erms of (80) and subracing F (y ) from boh sides of inequaliy (80) leads o ( ) ɛ ɛ λ F (y +1 ) F (y ) F (y ) F (y ) g + αlɛ3 Λ 3 g 3. 6 (81) Since he funcion F is srongly convex wih consan αm we can wrie [see Eq. (9.9) in [3]], F (y ) F (y ) 1 αm F (y ). (8) Rearrange erms in (8) o obain αm(f (y ) F (y )) as a lower bound for F (y ) = g. Now subsiue he lower bound αm(f (y ) F (y )) for squared norm of gradien g in he second summand of (81) o obain F (y +1 ) F (y ) [ 1 ( ɛ ɛ ) αmλ ] (F (y ) F (y )) + αlɛ3 Λ 3 g 3. (83) 6 Since he eigenvalues of he Hessian are upper bounded by (1 δ) + αm, for any vecors ŷ and y in R np we can wrie F (y) F (ŷ) + F (ŷ) T (1 δ) + αm (y ŷ) + y ŷ. (84) According o he definiion of λ in (31), we can subsiue (1 δ) + αm by 1/λ. Implemening his subsiuion and minimizing boh sides of he equaliy wih respec o y yields F (y ) F (ŷ) λ F (ŷ). (85) Seing ŷ = y, replacing F (y ) by g, and aking he square roo of boh sides of he resuling inequaliy yields g [ λ 1 [F (y ) F (y )] ] 1/. (86) Replace he upper bound in (14) for he norm of he gradien g in he las erm of (83) o obain (76). Proof of Theorem 1: To simplify upcoming derivaions define he sequence β as β :=( ɛ)ɛαmλ ɛ3 αlλ 3 [F (y ) F (y )] 1. (87) 6λ 3 Recall he resul of Lemma 6. Facorizing F (y ) F (y ) from he erms of he righ hand side of (76) in associaion wih he definiion of β in (87) implies ha we can simplify (76) as F (y +1 ) F (y ) (1 β )(F (y ) F (y )). (88) I remains o show ha for all ime seps, he consans β saisfy 0 < β < 1. We firs show ha β < 1 for all 0. Based on (87) we can wrie β ( ɛ)ɛαmλ. (89) Considering (ɛ 1) 0 we have ɛ( ɛ) 1. Furher, by inequaliies m < M and 1 δ > 0, we obain αm < αm + (1 δ). Thus, αm/(αm + (1 δ)) < 1 which is

14 14 equivalen o αmλ < 1. I follows from hese inequaliies ha ( ɛ)ɛαmλ < 1. (90) Tha β < 1 follows by combining (89) wih (90). To prove ha 0 < β for all 0 we prove ha his is rue for = 0 and hen prove ha he β sequence is increasing. According o (3), we can wrie [ ] 3mλ 5 1 ɛ, (91) LΛ 3 (F (y 0 ) F (y )) 1 By compuing he squares of boh sides of (91), muliplying he righ hand side of he resuling inequaliy by o make he inequaliy sric, and facorizing αmλ we obain 6λ 3 ɛ < αmλ. (9) αlλ 3 [F (y 0 ) F (y )] 1 If we now divide boh sides of he inequaliy in (9) by he firs muliplicand in he righ hand side of (9) we obain ɛ αlλ 3 [F (y 0 ) F (y )] 1 6λ 3 < αmλ. (93) Observe ha based on he hypohesis in (3) he sep size ɛ is smaller han 1 and i is hen rivially rue ha ɛ 1. This observaion shows ha if we muliply he righ hand side of (93) by (1 ɛ/) he inequaliy sill holds, ɛ αlλ 3 (F (y 0 ) F (y )) 1 6λ 3 < αm( ɛ)λ. (94) Muliply boh sides of (94) by ɛ and rearrange erms o obain αmɛ( ɛ)λ ɛ3 αlλ 3 [F (y 0 ) F (y )] 1 >0. (95) 6λ 3 Based on (87), he resul in (95) yields β 0 > 0. Observing ha β 0 is posiive, o show ha for all he sequence of β is posiive i is sufficien o prove ha he sequence β is increasing. We use srong inducion o prove β < β +1 for all 0. By seing = 0 in (88) we obain F (y 1 ) F (y ) (1 β 0 )(F (y 0 ) F (y )). (96) Considering he resul in (96) and he fac ha 0 < β 0 < 1, we obain ha he objecive funcion error a ime = 1 is sricly smaller han he error a ime = 0, i.e. F (y 1 ) F (y ) < F (y 0 ) F (y ). (97) According o (87), a smaller objecive funcion error F (y ) F (y ) leads o a larger coefficien β. This observaion combined wih he resul in (97) leads o β 0 < β 1. (98) To complee he srong inducion argumen assume now ha β 0 < β 1 < < β 1 < β and proceed o prove ha if his is rue we mus have β < β +1. Begin by observing ha since 0 < β 0 he inducion hypohesis implies ha for all u {0,..., } he consan β u is also posiive, i.e., 0 < β u. Furher recall ha for all he sequence β is also smaller han 1 as already proved. Combining hese wo observaions we have 0 < β u < 1 for all u {0,..., }. Consider now he inequaliy in (88) and uilize he fac ha 0 < β u < 1 for all u {0,..., } o conclude ha F (y u+1 ) F (y ) < F (y u ) F (y ), (99) for all u {0,..., }. Seing u = in (99) we conclude ha F (y +1 ) F (y ) < F (y ) F (y ). By furher repeaing he argumen leading from (98) o (97) we can conclude ha β < β +1. (100) The srong inducion proof is complee and we can claim ha 0 < β 0 < β 1 < < β < 1, (101) for all imes. The resuls in (88) and (101) imply lim F (y ) F (y ) = 0. To conclude ha he rae is a leas linear simply observe ha if he sequence β is increasing as per (101), he sequence 1 β is decreasing and saisfies 0 < 1 β < 1 β 0 < 1, (10) for all ime seps. Applying he inequaliy in (88) recursively and considering he inequaliy in (10) yields F (y ) F (y ) (1 β 0 ) (F (y 0 ) F (y )). (103) Considering ζ = β 0, he claim in (33) follows. APPENDIX G PROOF OF LEMMA 3 To simplify noaion we use Ĥ 1 o indicae he approximae Hessian inverse Ĥ(K) 1. Based on Lemma 1..3 in [9], he Lipschiz coninuiy of Hessians wih consan αl yields g +1 g + ɛh Ĥ 1 g ɛ αl Ĥ 1 g, (104) where we have used y +1 y = ɛĥ 1 g. Based on he definiion of marix norm we can wrie [g +1 g + ɛh Ĥ 1 g ] g +1 g + ɛh Ĥ 1 g. (105) Subsiuing g +1 g + ɛh Ĥ 1 g in he righ hand side of (105) by he upper bound in (104) leads o [g +1 g +ɛh Ĥ 1 g ] ɛ αl 1 D Ĥ 1 g. (106) Based on he riangle inequaliy, for any vecors a and b, and a posiive consan C, if he relaion a b C holds, hen a b + C. Thus, we can use he resul in (106) o wrie g +1 [g ɛh Ĥ 1 g ] + ɛ αl 1 D Ĥ 1 g. (107) Wrie D 1/ g as he sum (1 ɛ)(d 1/ g ) + ɛ(d 1/ g ) and use he riangle inequaliy o obain g +1 (1 ɛ) g + ɛ [I H Ĥ 1 ]g + ɛ αl 1 D Ĥ 1 g. (108)

15 15 Use he resul in Lemma 4 o wrie [I H Ĥ 1 ]g = [D 1 BD 1 ] K+1 D 1 g. (109) The resul in Proposiion implies ha [D 1/ BD 1/ ] K+1 ρ K+1. Considering his upper bound and he simplificaion in (109) we can wrie [I H Ĥ 1 ]g ρ K+1 g. (110) Subsiue he upper bound in (110) ino (108) and use he inequaliy Ĥ 1 g Ĥ 1 g o wrie g +1 (1 ɛ + ɛρ K+1 ) g + αɛ L Ĥ 1 g. (111) D 1/ Noe ha D 1 D 1 1 is bounded above as D 1 D 1 D 1 D D 1 D 1. (11) 1 1 The eigenvalues of D and D 1 are bounded below by αm+ (1 ). Thus, he eigenvalues of D 1 and D 1 1 are bounded above by 1/(αm + (1 )). Hence, D 1 D 1 ((1 ) + αm) D D 1. (113) 1 The difference D D 1 can be simplified as α(g G 1 ). Moreover, H H 1 = α(g G 1 ). Thus, D D 1 = H H 1. This observaion in conjuncion wih he Lipschiz coninuiy of he Hessians wih parameer αl implies ha D D 1 αl y y 1. (114) Replace D D 1 in (113) by he bound in (114) o obain D 1 D 1 αl 1 ((1 ) + αm) y y 1. (115) Noe ha g T (D 1 D 1 1 )g is bounded above by D 1 D 1 1 g. Considering he upper bound for D 1 D 1 1 in (115), he erm g T (D 1 D 1 1 )g is bounded above by g T (D 1 D 1 1 )g αl y y 1 g ((1 ) + αm). (116) Using he resul in (116), and simplifacions g T D 1 1 g = g and g T D 1 g = g, we can wrie 1 g 1 g + αl y y 1 g ((1 ) + αm). (117) For any consans a, b, and c if a b + c holds, hen a b + c holds. Using his resul and (117) we obain g 1 g + (αl y y 1 ) 1 g. (118) (1 ) + αm Considering he updae in (17) we can subsiue y y 1 by ɛĥ 1 1 g 1. Applying his subsiuion ino (118) yields g 1 g + [αɛl Ĥ 1 1 g 1 ] 1 g. (119) (1 ) + αm If we subsiue g by he upper bound in (119) and subsiue Ĥ 1 1 g 1 by he upper bound Ĥ 1 1 g 1, he inequaliy in (111) can be wrien as g +1 ( 1 ɛ + ɛρ K+1) 1 g ( ) 1 ɛ + ɛρ K+1 + [αɛl Ĥ 1 1 g 1 ] 1/ g (1 ) + αm + αɛ L Ĥ 1 g. (10) D 1/ Noe ha µ min (D 1/ 1 ) g 1 g. Considering his inequaliy and he lower bound ((1 δ) + αm) 1/ for he eigenvalues of D 1/ 1 we can wrie g ((1 δ) + αm) 1/ 1 g. (11) Subsiue g by he upper bound in (11), use he definiion λ := 1/((1 δ) + αm), replace he norms he norms Ĥ 1 and Ĥ 1 1 by heir upper bound Λ, and use he fac ha is bounded above by 1/((1 )+αm) 1/ o rewrie he righ hand side of (10) as g +1 (1 ɛ+ɛρ K+1 )[1 + C 1 g 1 1 ] 1 g αɛ LΛ + λ((1 ) + αm) 1 1 g, (1) where C 1 := [ αɛlλ/λ((1 ) + αm) ] 1/. According o (31), we can subsiue 1/((1 δ) + αm) by λ. Applying his subsiuion ino (84) and minimizing he boh sides of (84) wih respec o y yields F (y ) F (ŷ) λ F (ŷ). (13) Since (13) holds for any ŷ, we se ŷ := y 1. By rearranging he erms and aking heir square roos, we obain an upper bound for he gradien norm F (y 1 ) = g 1 as g 1 [ λ 1 [F (y 1 ) F (y )] ] 1. (14) The resul in Theorem 1 and he relaion in (14) allow us o show ha g 1 1/ is upper bounded by g 1 1 [ λ 1 (1 ζ) 1 (F (y 0 ) F (y )) ] 1 4. (15) Consider he definiion of Γ in (36) and subsiue he upper bound in (15) for g 1 1/ o updae (1) as g +1 ( 1 ɛ+ɛρ K+1)[ ] 1 + C (1 ζ) g + ɛ Γ 1 g, (16) where C := C 1 [(F (y 0 ) F (y ))/λ] 1/4. Based on he definiions of C and Γ 1 we obain ha C = Γ 1. This observaion in associaion wih (16) leads o he claim in (35). REFERENCES [1] A. Mokhari, Q. Ling, and A. Ribeiro, Nework newon, in Signals, Sysems and Compuers, h Asilomar Conference on. IEEE, 014, pp [], An approximae newon mehod for disribued opimizaion, in Acousics, Speech and Signal Processing (ICASSP), 015 IEEE Inernaional Conference on. IEEE, 015, pp [3] Y. Cao, W. Yu, W. Ren, and G. Chen, An overview of recen progress in he sudy of disribued muli-agen coordinaion, IEEE Transacions on Indusrial Informaics, vol. 9, pp , 013.

16 16 [4] C. G. Lopes and A. H. Sayed, Diffusion leas-mean squares over adapive neworks: Formulaion and performance analysis, Signal Processing, IEEE Transacions on, vol. 56, no. 7, pp , 008. [5] A. Ribeiro, Ergodic sochasic opimizaion algorihms for wireless communicaion and neworking, Signal Processing, IEEE Transacions on, vol. 58, no. 1, pp , 010. [6] M. G. Rabba and R. D. Nowak, Decenralized source localizaion and racking [wireless sensor neworks], in Acousics, Speech, and Signal Processing, 004. Proceedings.(ICASSP 04). IEEE Inernaional Conference on, vol. 3. IEEE, 004, pp. iii 91. [7] I. D. Schizas, A. Ribeiro, and G. B. Giannakis, Consensus in ad hoc wsns wih noisy links par i: Disribued esimaion of deerminisic signals, Signal Processing, IEEE Transacions on, vol. 56, no. 1, pp , 008. [8] U. A. Khan, S. Kar, and J. M. Moura, Diland: An algorihm for disribued sensor localizaion wih noisy disance measuremens, Signal Processing, IEEE Transacions on, vol. 58, no. 3, pp , 010. [9] M. Rabba and R. Nowak, Disribued opimizaion in sensor neworks, in Proceedings of he 3rd inernaional symposium on Informaion processing in sensor neworks. ACM, 004, pp [10] R. Bekkerman, M. Bilenko, and J. Langford, Scaling up machine learning: Parallel and disribued approaches. Cambridge Universiy Press, 011. [11] K. I. Tsianos, S. Lawlor, and M. G. Rabba, Consensus-based disribued opimizaion: Pracical issues and applicaions in large-scale machine learning, Communicaion, Conrol, and Compuing (Alleron), 01 50h Annual Alleron Conference on, pp , 01. [1] Y. Low, D. Bickson, J. Gonzalez, C. Guesrin, A. Kyrola, and J. M. Hellersein, Disribued graphlab: a framework for machine learning and daa mining in he cloud, Proceedings of he VLDB Endowmen, vol. 5, no. 8, pp , 01. [13] A. Nedic and A. Ozdaglar, Disribued subgradien mehods for muliagen opimizaion, Auomaic Conrol, IEEE Transacions on, vol. 54, no. 1, pp , 009. [14] D. Jakoveic, J. Xavier, and J. M. Moura, Fas disribued gradien mehods, Auomaic Conrol, IEEE Transacions on, vol. 59, no. 5, pp , 014. [15] K. Yuan, Q. Ling, and W. Yin, On he convergence of decenralized gradien descen, arxiv preprin arxiv: , 013. [16] W. Shi, Q. Ling, G. Wu, and W. Yin, Exra: An exac firs-order algorihm for decenralized consensus opimizaion, arxiv preprin arxiv: , 014. [17] S. Boyd, N. Parikh, E. Chu, B. Peleao, and J. Ecksein, Disribued opimizaion and saisical learning via he alernaing direcion mehod R in Machine Learning, vol. 3, of mulipliers, Foundaions and Trends no. 1, pp. 1 1, 011. [18] W. Shi, Q. Ling, K. Yuan, G. Wu, and W. Yin, On he linear convergence of he admm in decenralized consensus opimizaion, IEEE Trans. on Signal Processing, vol. 6, no. 7, pp , 014. [19] T.-H. Chang, M. Hong, and X. Wang, Muli-agen disribued opimizaion via inexac consensus admm, Signal Processing, IEEE Transacions on, vol. 63, no., pp , 015. [0] A. Mokhari, W. Shi, Q. Ling, and A. Ribeiro, Dqm: Decenralized quadraically approximaed alernaing direcion mehod of mulipliers, IEEE Transacions on Signal Processing, vol. 64, no. 19, pp , Oc 016. [1] J. C. Duchi, A. Agarwal, and M. J. Wainwrigh, Dual averaging for disribued opimizaion: convergence analysis and nework scaling, Auomaic Conrol, IEEE Trans. on, vol. 57, no. 3, pp , 01. [] K. I. Tsianos, S. Lawlor, and M. G. Rabba, Push-sum disribued dual averaging for convex opimizaion. CDC, pp , 01. [3] S. Boyd and L. Vandenberghe, Convex opimizaion. Cambridge universiy press, 004. [4] M. Zargham, A. Ribeiro, A. Ozdaglar, and A. Jadbabaie, Acceleraed dual descen for nework flow opimizaion, Auomaic Conrol, IEEE Transacions on, vol. 59, no. 4, pp , 014. [5] E. Wei, A. Ozdaglar, and A. Jadbabaie, A disribued newon mehod for nework uiliy maximizaion i: algorihm, Auomaic Conrol, IEEE Transacions on, vol. 58, no. 9, pp , 013. [6] D. Jakoveic, J. M. Moura, and J. Xavier, Disribued neserov-like gradien algorihms, in Decision and Conrol (CDC), 01 IEEE 51s Annual Conference on. IEEE, 01, pp [7] A. Mokhari and A. Ribeiro, Dsa: decenralized double sochasic averaging gradien algorihm, Journal of Machine Learning Research, vol. 17, no. 61, pp. 1 35, 016. [8] T. M. Aposol, Calculus, volume I. John Wiley & Sons, 007, vol. 1. [9] Y. Neserov, Inroducory lecures on convex opimizaion: A basic course. Springer Science & Business Media, 013, vol. 87. Aryan Mokhari received he B. Sc. degree in elecrical engineering from Sharif Universiy of Technology, Tehran, Iran, in 011, and he M.S. degree in elecrical engineering from he Universiy of Pennsylvania, Philadelphia, PA, in 014. Since 01, he has been working owards he Ph.D. degree in he Deparmen of Elecrical and Sysems Engineering, Universiy of Pennsylvania, Philadelphia, PA. From June o Augus 010, he was an inern a he Advanced Digial Sciences Cener, Singapore, Singapore. He was a research inern wih he Bigdaa Machine Learning group a Yahoo!, Sunnyvale, CA, from June o Augus 016. His research ineress lie in he areas of opimizaion, machine learning, conrol, and signal processing. His curren research focuses on developing mehods for large-scale opimizaion problems. Qing Ling received he B.E. degree in auomaion and he Ph.D. degree in conrol heory and conrol engineering from Universiy of Science and Technology of China in 001 and 006, respecively. From 006 o 009, he was a Pos-Docoral Research Fellow in he Deparmen of Elecrical and Compuer Engineering, Michigan Technological Universiy. Since 009, he has been an Associae Professor in he Deparmen of Auomaion, Universiy of Science and Technology of China. His curren research focuses on decenralized opimizaion of neworked muli-agen sysems. Alejandro Ribeiro received he B.Sc. degree in elecrical engineering from he Universidad de la Republica Orienal del Uruguay, Monevideo, in 1998 and he M.Sc. and Ph.D. degree in elecrical engineering from he Deparmen of Elecrical and Compuer Engineering, he Universiy of Minnesoa, Minneapolis in 005 and 007. From 1998 o 003, he was a member of he echnical saff a Bellsouh Monevideo. Afer his M.Sc. and Ph.D. sudies, in 008 he joined he Universiy of Pennsylvania, Philadelphia, where he is currenly he Rosenbluh Associae Professor a he Deparmen of Elecrical and Sysems Engineering. His research ineress are in he applicaions of saisical signal processing o he sudy of neworks and neworked phenomena. His focus is on srucured represenaions of neworked daa srucures, graph signal processing, nework opimizaion, robo eams, and neworked conrol. Dr. Ribeiro received he 014 O. Hugo Schuck bes paper award, he 01 S. Reid Warren, Jr. Award presened by Penn s undergraduae suden body for ousanding eaching, he NSF CAREER Award in 010, and paper awards a he 016 SSP Workshop, 016 SAM Workshop, 015 Asilomar SSC Conference, ACC 013, ICASSP 006, and ICASSP 005. Dr. Ribeiro is a Fulbrigh scholar and a Penn Fellow.

A Decentralized Second-Order Method with Exact Linear Convergence Rate for Consensus Optimization

A Decentralized Second-Order Method with Exact Linear Convergence Rate for Consensus Optimization 1 A Decenralized Second-Order Mehod wih Exac Linear Convergence Rae for Consensus Opimizaion Aryan Mokhari, Wei Shi, Qing Ling, and Alejandro Ribeiro Absrac This paper considers decenralized consensus

More information

Aryan Mokhtari, Wei Shi, Qing Ling, and Alejandro Ribeiro. cost function n

Aryan Mokhtari, Wei Shi, Qing Ling, and Alejandro Ribeiro. cost function n IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, VOL. 2, NO. 4, DECEMBER 2016 507 A Decenralized Second-Order Mehod wih Exac Linear Convergence Rae for Consensus Opimizaion Aryan Mokhari,

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Noes for EE7C Spring 018: Convex Opimizaion and Approximaion Insrucor: Moriz Hard Email: hard+ee7c@berkeley.edu Graduae Insrucor: Max Simchowiz Email: msimchow+ee7c@berkeley.edu Ocober 15, 018 3

More information

A Primal-Dual Type Algorithm with the O(1/t) Convergence Rate for Large Scale Constrained Convex Programs

A Primal-Dual Type Algorithm with the O(1/t) Convergence Rate for Large Scale Constrained Convex Programs PROC. IEEE CONFERENCE ON DECISION AND CONTROL, 06 A Primal-Dual Type Algorihm wih he O(/) Convergence Rae for Large Scale Consrained Convex Programs Hao Yu and Michael J. Neely Absrac This paper considers

More information

Vehicle Arrival Models : Headway

Vehicle Arrival Models : Headway Chaper 12 Vehicle Arrival Models : Headway 12.1 Inroducion Modelling arrival of vehicle a secion of road is an imporan sep in raffic flow modelling. I has imporan applicaion in raffic flow simulaion where

More information

Lecture 20: Riccati Equations and Least Squares Feedback Control

Lecture 20: Riccati Equations and Least Squares Feedback Control 34-5 LINEAR SYSTEMS Lecure : Riccai Equaions and Leas Squares Feedback Conrol 5.6.4 Sae Feedback via Riccai Equaions A recursive approach in generaing he marix-valued funcion W ( ) equaion for i for he

More information

Technical Report Doc ID: TR March-2013 (Last revision: 23-February-2016) On formulating quadratic functions in optimization models.

Technical Report Doc ID: TR March-2013 (Last revision: 23-February-2016) On formulating quadratic functions in optimization models. Technical Repor Doc ID: TR--203 06-March-203 (Las revision: 23-Februar-206) On formulaing quadraic funcions in opimizaion models. Auhor: Erling D. Andersen Convex quadraic consrains quie frequenl appear

More information

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018 MATH 5720: Gradien Mehods Hung Phan, UMass Lowell Ocober 4, 208 Descen Direcion Mehods Consider he problem min { f(x) x R n}. The general descen direcions mehod is x k+ = x k + k d k where x k is he curren

More information

Lecture 9: September 25

Lecture 9: September 25 0-725: Opimizaion Fall 202 Lecure 9: Sepember 25 Lecurer: Geoff Gordon/Ryan Tibshirani Scribes: Xuezhi Wang, Subhodeep Moira, Abhimanu Kumar Noe: LaTeX emplae couresy of UC Berkeley EECS dep. Disclaimer:

More information

Chapter 2. First Order Scalar Equations

Chapter 2. First Order Scalar Equations Chaper. Firs Order Scalar Equaions We sar our sudy of differenial equaions in he same way he pioneers in his field did. We show paricular echniques o solve paricular ypes of firs order differenial equaions.

More information

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t...

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t... Mah 228- Fri Mar 24 5.6 Marix exponenials and linear sysems: The analogy beween firs order sysems of linear differenial equaions (Chaper 5) and scalar linear differenial equaions (Chaper ) is much sronger

More information

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions Muli-Period Sochasic Models: Opimali of (s, S) Polic for -Convex Objecive Funcions Consider a seing similar o he N-sage newsvendor problem excep ha now here is a fixed re-ordering cos (> 0) for each (re-)order.

More information

10. State Space Methods

10. State Space Methods . Sae Space Mehods. Inroducion Sae space modelling was briefly inroduced in chaper. Here more coverage is provided of sae space mehods before some of heir uses in conrol sysem design are covered in he

More information

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD HAN XIAO 1. Penalized Leas Squares Lasso solves he following opimizaion problem, ˆβ lasso = arg max β R p+1 1 N y i β 0 N x ij β j β j (1.1) for some 0.

More information

An introduction to the theory of SDDP algorithm

An introduction to the theory of SDDP algorithm An inroducion o he heory of SDDP algorihm V. Leclère (ENPC) Augus 1, 2014 V. Leclère Inroducion o SDDP Augus 1, 2014 1 / 21 Inroducion Large scale sochasic problem are hard o solve. Two ways of aacking

More information

Optimality Conditions for Unconstrained Problems

Optimality Conditions for Unconstrained Problems 62 CHAPTER 6 Opimaliy Condiions for Unconsrained Problems 1 Unconsrained Opimizaion 11 Exisence Consider he problem of minimizing he funcion f : R n R where f is coninuous on all of R n : P min f(x) x

More information

STATE-SPACE MODELLING. A mass balance across the tank gives:

STATE-SPACE MODELLING. A mass balance across the tank gives: B. Lennox and N.F. Thornhill, 9, Sae Space Modelling, IChemE Process Managemen and Conrol Subjec Group Newsleer STE-SPACE MODELLING Inroducion: Over he pas decade or so here has been an ever increasing

More information

Chapter 3 Boundary Value Problem

Chapter 3 Boundary Value Problem Chaper 3 Boundary Value Problem A boundary value problem (BVP) is a problem, ypically an ODE or a PDE, which has values assigned on he physical boundary of he domain in which he problem is specified. Le

More information

Random Walk with Anti-Correlated Steps

Random Walk with Anti-Correlated Steps Random Walk wih Ani-Correlaed Seps John Noga Dirk Wagner 2 Absrac We conjecure he expeced value of random walks wih ani-correlaed seps o be exacly. We suppor his conjecure wih 2 plausibiliy argumens and

More information

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB

T L. t=1. Proof of Lemma 1. Using the marginal cost accounting in Equation(4) and standard arguments. t )+Π RB. t )+K 1(Q RB Elecronic Companion EC.1. Proofs of Technical Lemmas and Theorems LEMMA 1. Le C(RB) be he oal cos incurred by he RB policy. Then we have, T L E[C(RB)] 3 E[Z RB ]. (EC.1) Proof of Lemma 1. Using he marginal

More information

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle Chaper 2 Newonian Mechanics Single Paricle In his Chaper we will review wha Newon s laws of mechanics ell us abou he moion of a single paricle. Newon s laws are only valid in suiable reference frames,

More information

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes Represening Periodic Funcions by Fourier Series 3. Inroducion In his Secion we show how a periodic funcion can be expressed as a series of sines and cosines. We begin by obaining some sandard inegrals

More information

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details!

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details! MAT 257, Handou 6: Ocober 7-2, 20. I. Assignmen. Finish reading Chaper 2 of Spiva, rereading earlier secions as necessary. handou and fill in some missing deails! II. Higher derivaives. Also, read his

More information

Chapter 6. Systems of First Order Linear Differential Equations

Chapter 6. Systems of First Order Linear Differential Equations Chaper 6 Sysems of Firs Order Linear Differenial Equaions We will only discuss firs order sysems However higher order sysems may be made ino firs order sysems by a rick shown below We will have a sligh

More information

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon 3..3 INRODUCION O DYNAMIC OPIMIZAION: DISCREE IME PROBLEMS A. he Hamilonian and Firs-Order Condiions in a Finie ime Horizon Define a new funcion, he Hamilonian funcion, H. H he change in he oal value of

More information

Some Basic Information about M-S-D Systems

Some Basic Information about M-S-D Systems Some Basic Informaion abou M-S-D Sysems 1 Inroducion We wan o give some summary of he facs concerning unforced (homogeneous) and forced (non-homogeneous) models for linear oscillaors governed by second-order,

More information

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality Marix Versions of Some Refinemens of he Arihmeic-Geomeric Mean Inequaliy Bao Qi Feng and Andrew Tonge Absrac. We esablish marix versions of refinemens due o Alzer ], Carwrigh and Field 4], and Mercer 5]

More information

Online Appendix to Solution Methods for Models with Rare Disasters

Online Appendix to Solution Methods for Models with Rare Disasters Online Appendix o Soluion Mehods for Models wih Rare Disasers Jesús Fernández-Villaverde and Oren Levinal In his Online Appendix, we presen he Euler condiions of he model, we develop he pricing Calvo block,

More information

An Introduction to Malliavin calculus and its applications

An Introduction to Malliavin calculus and its applications An Inroducion o Malliavin calculus and is applicaions Lecure 5: Smoohness of he densiy and Hörmander s heorem David Nualar Deparmen of Mahemaics Kansas Universiy Universiy of Wyoming Summer School 214

More information

Expert Advice for Amateurs

Expert Advice for Amateurs Exper Advice for Amaeurs Ernes K. Lai Online Appendix - Exisence of Equilibria The analysis in his secion is performed under more general payoff funcions. Wihou aking an explici form, he payoffs of he

More information

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK

CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 175 CHAPTER 10 VALIDATION OF TEST WITH ARTIFICAL NEURAL NETWORK 10.1 INTRODUCTION Amongs he research work performed, he bes resuls of experimenal work are validaed wih Arificial Neural Nework. From he

More information

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Simulaion-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Week Descripion Reading Maerial 2 Compuer Simulaion of Dynamic Models Finie Difference, coninuous saes, discree ime Simple Mehods Euler Trapezoid

More information

MATH 128A, SUMMER 2009, FINAL EXAM SOLUTION

MATH 128A, SUMMER 2009, FINAL EXAM SOLUTION MATH 28A, SUMME 2009, FINAL EXAM SOLUTION BENJAMIN JOHNSON () (8 poins) [Lagrange Inerpolaion] (a) (4 poins) Le f be a funcion defined a some real numbers x 0,..., x n. Give a defining equaion for he Lagrange

More information

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still.

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still. Lecure - Kinemaics in One Dimension Displacemen, Velociy and Acceleraion Everyhing in he world is moving. Nohing says sill. Moion occurs a all scales of he universe, saring from he moion of elecrons in

More information

Notes for Lecture 17-18

Notes for Lecture 17-18 U.C. Berkeley CS278: Compuaional Complexiy Handou N7-8 Professor Luca Trevisan April 3-8, 2008 Noes for Lecure 7-8 In hese wo lecures we prove he firs half of he PCP Theorem, he Amplificaion Lemma, up

More information

15. Vector Valued Functions

15. Vector Valued Functions 1. Vecor Valued Funcions Up o his poin, we have presened vecors wih consan componens, for example, 1, and,,4. However, we can allow he componens of a vecor o be funcions of a common variable. For example,

More information

Appendix to Online l 1 -Dictionary Learning with Application to Novel Document Detection

Appendix to Online l 1 -Dictionary Learning with Application to Novel Document Detection Appendix o Online l -Dicionary Learning wih Applicaion o Novel Documen Deecion Shiva Prasad Kasiviswanahan Huahua Wang Arindam Banerjee Prem Melville A Background abou ADMM In his secion, we give a brief

More information

Lecture 2 October ε-approximation of 2-player zero-sum games

Lecture 2 October ε-approximation of 2-player zero-sum games Opimizaion II Winer 009/10 Lecurer: Khaled Elbassioni Lecure Ocober 19 1 ε-approximaion of -player zero-sum games In his lecure we give a randomized ficiious play algorihm for obaining an approximae soluion

More information

Matlab and Python programming: how to get started

Matlab and Python programming: how to get started Malab and Pyhon programming: how o ge sared Equipping readers he skills o wrie programs o explore complex sysems and discover ineresing paerns from big daa is one of he main goals of his book. In his chaper,

More information

Robust estimation based on the first- and third-moment restrictions of the power transformation model

Robust estimation based on the first- and third-moment restrictions of the power transformation model h Inernaional Congress on Modelling and Simulaion, Adelaide, Ausralia, 6 December 3 www.mssanz.org.au/modsim3 Robus esimaion based on he firs- and hird-momen resricions of he power ransformaion Nawaa,

More information

1 Review of Zero-Sum Games

1 Review of Zero-Sum Games COS 5: heoreical Machine Learning Lecurer: Rob Schapire Lecure #23 Scribe: Eugene Brevdo April 30, 2008 Review of Zero-Sum Games Las ime we inroduced a mahemaical model for wo player zero-sum games. Any

More information

Online Convex Optimization Example And Follow-The-Leader

Online Convex Optimization Example And Follow-The-Leader CSE599s, Spring 2014, Online Learning Lecure 2-04/03/2014 Online Convex Opimizaion Example And Follow-The-Leader Lecurer: Brendan McMahan Scribe: Sephen Joe Jonany 1 Review of Online Convex Opimizaion

More information

Hamilton- J acobi Equation: Weak S olution We continue the study of the Hamilton-Jacobi equation:

Hamilton- J acobi Equation: Weak S olution We continue the study of the Hamilton-Jacobi equation: M ah 5 7 Fall 9 L ecure O c. 4, 9 ) Hamilon- J acobi Equaion: Weak S oluion We coninue he sudy of he Hamilon-Jacobi equaion: We have shown ha u + H D u) = R n, ) ; u = g R n { = }. ). In general we canno

More information

Let us start with a two dimensional case. We consider a vector ( x,

Let us start with a two dimensional case. We consider a vector ( x, Roaion marices We consider now roaion marices in wo and hree dimensions. We sar wih wo dimensions since wo dimensions are easier han hree o undersand, and one dimension is a lile oo simple. However, our

More information

Then. 1 The eigenvalues of A are inside R = n i=1 R i. 2 Union of any k circles not intersecting the other (n k)

Then. 1 The eigenvalues of A are inside R = n i=1 R i. 2 Union of any k circles not intersecting the other (n k) Ger sgorin Circle Chaper 9 Approimaing Eigenvalues Per-Olof Persson persson@berkeley.edu Deparmen of Mahemaics Universiy of California, Berkeley Mah 128B Numerical Analysis (Ger sgorin Circle) Le A be

More information

Notes on Kalman Filtering

Notes on Kalman Filtering Noes on Kalman Filering Brian Borchers and Rick Aser November 7, Inroducion Daa Assimilaion is he problem of merging model predicions wih acual measuremens of a sysem o produce an opimal esimae of he curren

More information

EXERCISES FOR SECTION 1.5

EXERCISES FOR SECTION 1.5 1.5 Exisence and Uniqueness of Soluions 43 20. 1 v c 21. 1 v c 1 2 4 6 8 10 1 2 2 4 6 8 10 Graph of approximae soluion obained using Euler s mehod wih = 0.1. Graph of approximae soluion obained using Euler

More information

RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY

RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY ECO 504 Spring 2006 Chris Sims RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY 1. INTRODUCTION Lagrange muliplier mehods are sandard fare in elemenary calculus courses, and hey play a cenral role in economic

More information

KINEMATICS IN ONE DIMENSION

KINEMATICS IN ONE DIMENSION KINEMATICS IN ONE DIMENSION PREVIEW Kinemaics is he sudy of how hings move how far (disance and displacemen), how fas (speed and velociy), and how fas ha how fas changes (acceleraion). We say ha an objec

More information

Module 2 F c i k c s la l w a s o s f dif di fusi s o i n

Module 2 F c i k c s la l w a s o s f dif di fusi s o i n Module Fick s laws of diffusion Fick s laws of diffusion and hin film soluion Adolf Fick (1855) proposed: d J α d d d J (mole/m s) flu (m /s) diffusion coefficien and (mole/m 3 ) concenraion of ions, aoms

More information

Rapid Termination Evaluation for Recursive Subdivision of Bezier Curves

Rapid Termination Evaluation for Recursive Subdivision of Bezier Curves Rapid Terminaion Evaluaion for Recursive Subdivision of Bezier Curves Thomas F. Hain School of Compuer and Informaion Sciences, Universiy of Souh Alabama, Mobile, AL, U.S.A. Absrac Bézier curve flaening

More information

WEEK-3 Recitation PHYS 131. of the projectile s velocity remains constant throughout the motion, since the acceleration a x

WEEK-3 Recitation PHYS 131. of the projectile s velocity remains constant throughout the motion, since the acceleration a x WEEK-3 Reciaion PHYS 131 Ch. 3: FOC 1, 3, 4, 6, 14. Problems 9, 37, 41 & 71 and Ch. 4: FOC 1, 3, 5, 8. Problems 3, 5 & 16. Feb 8, 018 Ch. 3: FOC 1, 3, 4, 6, 14. 1. (a) The horizonal componen of he projecile

More information

Hamilton- J acobi Equation: Explicit Formulas In this lecture we try to apply the method of characteristics to the Hamilton-Jacobi equation: u t

Hamilton- J acobi Equation: Explicit Formulas In this lecture we try to apply the method of characteristics to the Hamilton-Jacobi equation: u t M ah 5 2 7 Fall 2 0 0 9 L ecure 1 0 O c. 7, 2 0 0 9 Hamilon- J acobi Equaion: Explici Formulas In his lecure we ry o apply he mehod of characerisics o he Hamilon-Jacobi equaion: u + H D u, x = 0 in R n

More information

Distributed Fictitious Play for Optimal Behavior of Multi-Agent Systems with Incomplete Information

Distributed Fictitious Play for Optimal Behavior of Multi-Agent Systems with Incomplete Information Disribued Ficiious Play for Opimal Behavior of Muli-Agen Sysems wih Incomplee Informaion Ceyhun Eksin and Alejandro Ribeiro arxiv:602.02066v [cs.g] 5 Feb 206 Absrac A muli-agen sysem operaes in an uncerain

More information

Global Convergence of Online Limited Memory BFGS

Global Convergence of Online Limited Memory BFGS Journal of Machine Learning Research 16 (2015 3151-3181 Submied 9/14; Revised 7/15; Published 12/15 Global Convergence of Online Limied Memory BFGS Aryan Mokhari Alejandro Ribeiro Deparmen of Elecrical

More information

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature On Measuring Pro-Poor Growh 1. On Various Ways of Measuring Pro-Poor Growh: A Shor eview of he Lieraure During he pas en years or so here have been various suggesions concerning he way one should check

More information

Multi-scale 2D acoustic full waveform inversion with high frequency impulsive source

Multi-scale 2D acoustic full waveform inversion with high frequency impulsive source Muli-scale D acousic full waveform inversion wih high frequency impulsive source Vladimir N Zubov*, Universiy of Calgary, Calgary AB vzubov@ucalgaryca and Michael P Lamoureux, Universiy of Calgary, Calgary

More information

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach 1 Decenralized Sochasic Conrol wih Parial Hisory Sharing: A Common Informaion Approach Ashuosh Nayyar, Adiya Mahajan and Demoshenis Tenekezis arxiv:1209.1695v1 [cs.sy] 8 Sep 2012 Absrac A general model

More information

Guest Lectures for Dr. MacFarlane s EE3350 Part Deux

Guest Lectures for Dr. MacFarlane s EE3350 Part Deux Gues Lecures for Dr. MacFarlane s EE3350 Par Deux Michael Plane Mon., 08-30-2010 Wrie name in corner. Poin ou his is a review, so I will go faser. Remind hem o go lisen o online lecure abou geing an A

More information

Convergence of the Neumann series in higher norms

Convergence of the Neumann series in higher norms Convergence of he Neumann series in higher norms Charles L. Epsein Deparmen of Mahemaics, Universiy of Pennsylvania Version 1.0 Augus 1, 003 Absrac Naural condiions on an operaor A are given so ha he Neumann

More information

Time series model fitting via Kalman smoothing and EM estimation in TimeModels.jl

Time series model fitting via Kalman smoothing and EM estimation in TimeModels.jl Time series model fiing via Kalman smoohing and EM esimaion in TimeModels.jl Gord Sephen Las updaed: January 206 Conens Inroducion 2. Moivaion and Acknowledgemens....................... 2.2 Noaion......................................

More information

ODEs II, Lecture 1: Homogeneous Linear Systems - I. Mike Raugh 1. March 8, 2004

ODEs II, Lecture 1: Homogeneous Linear Systems - I. Mike Raugh 1. March 8, 2004 ODEs II, Lecure : Homogeneous Linear Sysems - I Mike Raugh March 8, 4 Inroducion. In he firs lecure we discussed a sysem of linear ODEs for modeling he excreion of lead from he human body, saw how o ransform

More information

Model Reduction for Dynamical Systems Lecture 6

Model Reduction for Dynamical Systems Lecture 6 Oo-von-Guericke Universiä Magdeburg Faculy of Mahemaics Summer erm 07 Model Reducion for Dynamical Sysems ecure 6 v eer enner and ihong Feng Max lanck Insiue for Dynamics of Complex echnical Sysems Compuaional

More information

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence Supplemen for Sochasic Convex Opimizaion: Faser Local Growh Implies Faser Global Convergence Yi Xu Qihang Lin ianbao Yang Proof of heorem heorem Suppose Assumpion holds and F (w) obeys he LGC (6) Given

More information

2. Nonlinear Conservation Law Equations

2. Nonlinear Conservation Law Equations . Nonlinear Conservaion Law Equaions One of he clear lessons learned over recen years in sudying nonlinear parial differenial equaions is ha i is generally no wise o ry o aack a general class of nonlinear

More information

4.6 One Dimensional Kinematics and Integration

4.6 One Dimensional Kinematics and Integration 4.6 One Dimensional Kinemaics and Inegraion When he acceleraion a( of an objec is a non-consan funcion of ime, we would like o deermine he ime dependence of he posiion funcion x( and he x -componen of

More information

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model Modal idenificaion of srucures from roving inpu daa by means of maximum likelihood esimaion of he sae space model J. Cara, J. Juan, E. Alarcón Absrac The usual way o perform a forced vibraion es is o fix

More information

Unsteady Flow Problems

Unsteady Flow Problems School of Mechanical Aerospace and Civil Engineering Unseady Flow Problems T. J. Craf George Begg Building, C41 TPFE MSc CFD-1 Reading: J. Ferziger, M. Peric, Compuaional Mehods for Fluid Dynamics H.K.

More information

Essential Microeconomics : OPTIMAL CONTROL 1. Consider the following class of optimization problems

Essential Microeconomics : OPTIMAL CONTROL 1. Consider the following class of optimization problems Essenial Microeconomics -- 6.5: OPIMAL CONROL Consider he following class of opimizaion problems Max{ U( k, x) + U+ ( k+ ) k+ k F( k, x)}. { x, k+ } = In he language of conrol heory, he vecor k is he vecor

More information

14 Autoregressive Moving Average Models

14 Autoregressive Moving Average Models 14 Auoregressive Moving Average Models In his chaper an imporan parameric family of saionary ime series is inroduced, he family of he auoregressive moving average, or ARMA, processes. For a large class

More information

Cash Flow Valuation Mode Lin Discrete Time

Cash Flow Valuation Mode Lin Discrete Time IOSR Journal of Mahemaics (IOSR-JM) e-issn: 2278-5728,p-ISSN: 2319-765X, 6, Issue 6 (May. - Jun. 2013), PP 35-41 Cash Flow Valuaion Mode Lin Discree Time Olayiwola. M. A. and Oni, N. O. Deparmen of Mahemaics

More information

. Now define y j = log x j, and solve the iteration.

. Now define y j = log x j, and solve the iteration. Problem 1: (Disribued Resource Allocaion (ALOHA!)) (Adaped from M& U, Problem 5.11) In his problem, we sudy a simple disribued proocol for allocaing agens o shared resources, wherein agens conend for resources

More information

Christos Papadimitriou & Luca Trevisan November 22, 2016

Christos Papadimitriou & Luca Trevisan November 22, 2016 U.C. Bereley CS170: Algorihms Handou LN-11-22 Chrisos Papadimiriou & Luca Trevisan November 22, 2016 Sreaming algorihms In his lecure and he nex one we sudy memory-efficien algorihms ha process a sream

More information

Ordinary Differential Equations

Ordinary Differential Equations Ordinary Differenial Equaions 5. Examples of linear differenial equaions and heir applicaions We consider some examples of sysems of linear differenial equaions wih consan coefficiens y = a y +... + a

More information

d 1 = c 1 b 2 - b 1 c 2 d 2 = c 1 b 3 - b 1 c 3

d 1 = c 1 b 2 - b 1 c 2 d 2 = c 1 b 3 - b 1 c 3 and d = c b - b c c d = c b - b c c This process is coninued unil he nh row has been compleed. The complee array of coefficiens is riangular. Noe ha in developing he array an enire row may be divided or

More information

Linear Response Theory: The connection between QFT and experiments

Linear Response Theory: The connection between QFT and experiments Phys540.nb 39 3 Linear Response Theory: The connecion beween QFT and experimens 3.1. Basic conceps and ideas Q: How do we measure he conduciviy of a meal? A: we firs inroduce a weak elecric field E, and

More information

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems MATHEMATICS OF OPERATIONS RESEARCH Vol. 38, No. 2, May 2013, pp. 209 227 ISSN 0364-765X (prin) ISSN 1526-5471 (online) hp://dx.doi.org/10.1287/moor.1120.0562 2013 INFORMS On Boundedness of Q-Learning Ieraes

More information

IMPLICIT AND INVERSE FUNCTION THEOREMS PAUL SCHRIMPF 1 OCTOBER 25, 2013

IMPLICIT AND INVERSE FUNCTION THEOREMS PAUL SCHRIMPF 1 OCTOBER 25, 2013 IMPLICI AND INVERSE FUNCION HEOREMS PAUL SCHRIMPF 1 OCOBER 25, 213 UNIVERSIY OF BRIISH COLUMBIA ECONOMICS 526 We have exensively sudied how o solve sysems of linear equaions. We know how o check wheher

More information

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients Secion 3.5 Nonhomogeneous Equaions; Mehod of Undeermined Coefficiens Key Terms/Ideas: Linear Differenial operaor Nonlinear operaor Second order homogeneous DE Second order nonhomogeneous DE Soluion o homogeneous

More information

A Forward-Backward Splitting Method with Component-wise Lazy Evaluation for Online Structured Convex Optimization

A Forward-Backward Splitting Method with Component-wise Lazy Evaluation for Online Structured Convex Optimization A Forward-Backward Spliing Mehod wih Componen-wise Lazy Evaluaion for Online Srucured Convex Opimizaion Yukihiro Togari and Nobuo Yamashia March 28, 2016 Absrac: We consider large-scale opimizaion problems

More information

Two Coupled Oscillators / Normal Modes

Two Coupled Oscillators / Normal Modes Lecure 3 Phys 3750 Two Coupled Oscillaors / Normal Modes Overview and Moivaion: Today we ake a small, bu significan, sep owards wave moion. We will no ye observe waves, bu his sep is imporan in is own

More information

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing Applicaion of a Sochasic-Fuzzy Approach o Modeling Opimal Discree Time Dynamical Sysems by Using Large Scale Daa Processing AA WALASZE-BABISZEWSA Deparmen of Compuer Engineering Opole Universiy of Technology

More information

Lecture Notes 2. The Hilbert Space Approach to Time Series

Lecture Notes 2. The Hilbert Space Approach to Time Series Time Series Seven N. Durlauf Universiy of Wisconsin. Basic ideas Lecure Noes. The Hilber Space Approach o Time Series The Hilber space framework provides a very powerful language for discussing he relaionship

More information

Introduction D P. r = constant discount rate, g = Gordon Model (1962): constant dividend growth rate.

Introduction D P. r = constant discount rate, g = Gordon Model (1962): constant dividend growth rate. Inroducion Gordon Model (1962): D P = r g r = consan discoun rae, g = consan dividend growh rae. If raional expecaions of fuure discoun raes and dividend growh vary over ime, so should he D/P raio. Since

More information

Primal-Dual Splitting: Recent Improvements and Variants

Primal-Dual Splitting: Recent Improvements and Variants Primal-Dual Spliing: Recen Improvemens and Varians 1 Thomas Pock and 2 Anonin Chambolle 1 Insiue for Compuer Graphics and Vision, TU Graz, Ausria 2 CMAP & CNRS École Polyechnique, France The proximal poin

More information

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8)

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8) I. Definiions and Problems A. Perfec Mulicollineariy Econ7 Applied Economerics Topic 7: Mulicollineariy (Sudenmund, Chaper 8) Definiion: Perfec mulicollineariy exiss in a following K-variable regression

More information

EECE251. Circuit Analysis I. Set 4: Capacitors, Inductors, and First-Order Linear Circuits

EECE251. Circuit Analysis I. Set 4: Capacitors, Inductors, and First-Order Linear Circuits EEE25 ircui Analysis I Se 4: apaciors, Inducors, and Firs-Order inear ircuis Shahriar Mirabbasi Deparmen of Elecrical and ompuer Engineering Universiy of Briish olumbia shahriar@ece.ubc.ca Overview Passive

More information

Ordinary dierential equations

Ordinary dierential equations Chaper 5 Ordinary dierenial equaions Conens 5.1 Iniial value problem........................... 31 5. Forward Euler's mehod......................... 3 5.3 Runge-Kua mehods.......................... 36

More information

4 Sequences of measurable functions

4 Sequences of measurable functions 4 Sequences of measurable funcions 1. Le (Ω, A, µ) be a measure space (complee, afer a possible applicaion of he compleion heorem). In his chaper we invesigae relaions beween various (nonequivalen) convergences

More information

CHAPTER 2 Signals And Spectra

CHAPTER 2 Signals And Spectra CHAPER Signals And Specra Properies of Signals and Noise In communicaion sysems he received waveform is usually caegorized ino he desired par conaining he informaion, and he undesired par. he desired par

More information

Class Meeting # 10: Introduction to the Wave Equation

Class Meeting # 10: Introduction to the Wave Equation MATH 8.5 COURSE NOTES - CLASS MEETING # 0 8.5 Inroducion o PDEs, Fall 0 Professor: Jared Speck Class Meeing # 0: Inroducion o he Wave Equaion. Wha is he wave equaion? The sandard wave equaion for a funcion

More information

R t. C t P t. + u t. C t = αp t + βr t + v t. + β + w t

R t. C t P t. + u t. C t = αp t + βr t + v t. + β + w t Exercise 7 C P = α + β R P + u C = αp + βr + v (a) (b) C R = α P R + β + w (c) Assumpions abou he disurbances u, v, w : Classical assumions on he disurbance of one of he equaions, eg. on (b): E(v v s P,

More information

Solutions from Chapter 9.1 and 9.2

Solutions from Chapter 9.1 and 9.2 Soluions from Chaper 9 and 92 Secion 9 Problem # This basically boils down o an exercise in he chain rule from calculus We are looking for soluions of he form: u( x) = f( k x c) where k x R 3 and k is

More information

On-line Adaptive Optimal Timing Control of Switched Systems

On-line Adaptive Optimal Timing Control of Switched Systems On-line Adapive Opimal Timing Conrol of Swiched Sysems X.C. Ding, Y. Wardi and M. Egersed Absrac In his paper we consider he problem of opimizing over he swiching imes for a muli-modal dynamic sysem when

More information

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17 EES 16A Designing Informaion Devices and Sysems I Spring 019 Lecure Noes Noe 17 17.1 apaciive ouchscreen In he las noe, we saw ha a capacior consiss of wo pieces on conducive maerial separaed by a nonconducive

More information

LECTURE 1: GENERALIZED RAY KNIGHT THEOREM FOR FINITE MARKOV CHAINS

LECTURE 1: GENERALIZED RAY KNIGHT THEOREM FOR FINITE MARKOV CHAINS LECTURE : GENERALIZED RAY KNIGHT THEOREM FOR FINITE MARKOV CHAINS We will work wih a coninuous ime reversible Markov chain X on a finie conneced sae space, wih generaor Lf(x = y q x,yf(y. (Recall ha q

More information

MODULE 3 FUNCTION OF A RANDOM VARIABLE AND ITS DISTRIBUTION LECTURES PROBABILITY DISTRIBUTION OF A FUNCTION OF A RANDOM VARIABLE

MODULE 3 FUNCTION OF A RANDOM VARIABLE AND ITS DISTRIBUTION LECTURES PROBABILITY DISTRIBUTION OF A FUNCTION OF A RANDOM VARIABLE Topics MODULE 3 FUNCTION OF A RANDOM VARIABLE AND ITS DISTRIBUTION LECTURES 2-6 3. FUNCTION OF A RANDOM VARIABLE 3.2 PROBABILITY DISTRIBUTION OF A FUNCTION OF A RANDOM VARIABLE 3.3 EXPECTATION AND MOMENTS

More information

CHERNOFF DISTANCE AND AFFINITY FOR TRUNCATED DISTRIBUTIONS *

CHERNOFF DISTANCE AND AFFINITY FOR TRUNCATED DISTRIBUTIONS * haper 5 HERNOFF DISTANE AND AFFINITY FOR TRUNATED DISTRIBUTIONS * 5. Inroducion In he case of disribuions ha saisfy he regulariy condiions, he ramer- Rao inequaliy holds and he maximum likelihood esimaor

More information

Computer-Aided Analysis of Electronic Circuits Course Notes 3

Computer-Aided Analysis of Electronic Circuits Course Notes 3 Gheorghe Asachi Technical Universiy of Iasi Faculy of Elecronics, Telecommunicaions and Informaion Technologies Compuer-Aided Analysis of Elecronic Circuis Course Noes 3 Bachelor: Telecommunicaion Technologies

More information

System of Linear Differential Equations

System of Linear Differential Equations Sysem of Linear Differenial Equaions In "Ordinary Differenial Equaions" we've learned how o solve a differenial equaion for a variable, such as: y'k5$e K2$x =0 solve DE yx = K 5 2 ek2 x C_C1 2$y''C7$y

More information