arXiv: v1 [math.OC] 11 Dec 2014

Network Newton

Aryan Mokhtari, Qing Ling and Alejandro Ribeiro
Dept. of Electrical and Systems Engineering, University of Pennsylvania
Dept. of Automation, University of Science and Technology of China

arXiv:1412.3740v1 [math.OC] 11 Dec 2014

Abstract

We consider minimization of a sum of convex objective functions where the components of the objective are available at different nodes of a network and nodes are allowed to only communicate with their neighbors. The use of distributed subgradient or gradient methods is widespread but they often suffer from slow convergence since they rely on first order information, which leads to a large number of local communications between nodes in the network. In this paper we propose the Network Newton (NN) method as a distributed algorithm that incorporates second order information via distributed evaluation of approximations to Newton steps. We also introduce adaptive (A)NN in order to establish exact convergence. Numerical analyses show significant improvement in both convergence time and number of communications for NN relative to existing (first order) alternatives.

I. INTRODUCTION

Distributed optimization algorithms are used to minimize a global cost function over a set of nodes in situations where the objective function is defined as a sum of a set of local functions. Consider a variable $x \in \mathbb{R}^p$ and a connected network containing $n$ agents each of which has access to a local function $f_i : \mathbb{R}^p \to \mathbb{R}$. The agents cooperate in minimizing the aggregate cost function $f : \mathbb{R}^p \to \mathbb{R}$ taking values $f(x) := \sum_{i=1}^{n} f_i(x)$. I.e., agents cooperate in solving the global optimization problem

$$x^* := \operatorname*{argmin}_{x} f(x) = \operatorname*{argmin}_{x} \sum_{i=1}^{n} f_i(x). \tag{1}$$

Problems of this form arise often in, e.g., wireless systems [1], [2], sensor networks [3], [4], and large scale machine learning [5].

There are different algorithms to solve (1) in a distributed manner. The most popular alternatives are decentralized gradient descent (DGD) [6]-[9], distributed implementations of the alternating direction method of multipliers [3], [10]-[12], and decentralized dual averaging (DDA) [13]. A feature common to all of these algorithms is the slow convergence rate in ill-conditioned problems since they operate on first order information only.
This paper considers Network Newton (NN), a method that relies on distributed approximations of Newton steps for the global cost function $f$ to accelerate convergence of the DGD algorithm. We begin this paper by introducing the idea that DGD solves a penalized version of (1) using gradient descent in lieu of solving the original optimization problem. To accelerate the convergence of the gradient descent method for solving the penalty version of (1) we advocate the use of the NN algorithm. This algorithm relies on approximations to the Newton step of the penalized objective function by truncating the Taylor series of the exact Newton step (Section II-A). These approximations to the Newton step can be computed in a distributed manner with a level of locality controlled by the number $K$ of elements that are retained in the Taylor series. When we retain $K$ elements in the series we say that we implement NN-K. We prove that for a fixed penalty coefficient, lower and upper bounds on the Hessians of the local objective functions $f_i$ are sufficient to guarantee at least linear convergence of NN-K to the optimal arguments of the penalized optimization problem (Theorem 1). Further, we introduce an adaptive version of NN-K (ANN-K) that uses an increasing penalty coefficient to achieve exact convergence to the optimal solution of (1) (Section II-B). We study the advantages of NN-K relative to DGD, both in terms of number of iterations and communications for convergence, for solving a family of quadratic objective problems (Section IV).

Work in this paper is supported by ARO W911NF-10-1-0388, NSF CAREER CCF-0952867, and ONR N00014-12-1-0997.

II. PROBLEM FORMULATION AND ALGORITHM DEFINITION

The network that connects the agents is assumed symmetric and specified by the neighborhoods $\mathcal{N}_i$ that contain the list of nodes that can communicate with $i$ for $i = 1, \ldots, n$. DGD is an established distributed method to solve (1) which relies on the introduction of local variables $x_i \in \mathbb{R}^p$ and nonnegative weights $w_{ij} \ge 0$ that are not null if and only if $j = i$ or if $j \in \mathcal{N}_i$. Letting $t \in \mathbb{N}$ be a discrete time index and $\alpha$ a given stepsize, DGD is defined by the recursion

$$x_{i,t+1} = \sum_{j=1}^{n} w_{ij} x_{j,t} - \alpha \nabla f_i(x_{i,t}), \qquad i = 1, \ldots, n. \tag{2}$$

Since $w_{ij} = 0$ when $j \ne i$ and $j \notin \mathcal{N}_i$, it follows from (2) that each agent $i$ updates its estimate $x_i$ of the optimal vector $x^*$ by performing an average over the estimates $x_{j,t}$ of its neighbors $j \in \mathcal{N}_i$ and its own estimate $x_{i,t}$, and descending through the negative local gradient $-\nabla f_i(x_{i,t})$. Note that the weights $w_{ij}$ that nodes assign to each other form a weight matrix $W \in \mathbb{R}^{n \times n}$ that is symmetric and row stochastic. It is also customary to require the rank of $I - W$ to be $n - 1$ so that $\text{null}(I - W) = \text{span}(\mathbf{1})$. If the two assumptions $W^T = W$ and $\text{null}(I - W) = \text{span}(\mathbf{1})$ are true, it is possible to show that (2) approaches the solution of (1) in the sense that $x_{i,t} \approx x^*$ for all $i$ and large $t$ [6].

To rewrite (2) define the matrix $Z := W \otimes I \in \mathbb{R}^{np \times np}$ as the Kronecker product of the weight matrix $W \in \mathbb{R}^{n \times n}$ and the identity matrix $I \in \mathbb{R}^{p \times p}$. Further, we introduce the vector $y := [x_1; \ldots; x_n] \in \mathbb{R}^{np}$ that concatenates the local vectors $x_i$, and the vector $h(y) := [\nabla f_1(x_1); \ldots; \nabla f_n(x_n)] \in \mathbb{R}^{np}$ which concatenates the gradients of the local functions $f_i$ taken with respect to the local variable $x_i$. It is then ready to see that (2) is equivalent to

$$y_{t+1} = Z y_t - \alpha h(y_t) = y_t - \left[ (I - Z) y_t + \alpha h(y_t) \right], \tag{3}$$

where in the second equality we added and subtracted $y_t$ and regrouped terms. Inspection of (3) reveals that the DGD update formula at step $t$ is equivalent to a (regular) gradient descent algorithm being used to solve the program

$$y^* := \operatorname*{argmin}_{y} F(y), \qquad F(y) := \frac{1}{2} y^T (I - Z) y + \alpha \sum_{i=1}^{n} f_i(x_i). \tag{4}$$

Observe that it is possible to write the gradient of $F(y)$ as

$$g_t := \nabla F(y_t) = (I - Z) y_t + \alpha h(y_t), \tag{5}$$

in order to write (3) as $y_{t+1} = y_t - g_t$ and conclude that DGD descends along the negative gradient of $F(y)$ with unit stepsize. The expression in (2) is just a local implementation of (5) where node $i$ implements the descent $x_{i,t+1} = x_{i,t} - g_{i,t}$, where $g_{i,t}$ is the $i$th element of the gradient $g_t = [g_{1,t}; \ldots; g_{n,t}]$. Node $i$ can compute the local gradient

$$g_{i,t} = (1 - w_{ii}) x_{i,t} - \sum_{j \in \mathcal{N}_i} w_{ij} x_{j,t} + \alpha \nabla f_i(x_{i,t}). \tag{6}$$

Notice that since we know that the null space of $I - W$ is $\text{null}(I - W) = \text{span}(\mathbf{1})$ and that $Z = W \otimes I$, we obtain that the null space of $I - Z$ is $\text{null}(I - Z) = \text{span}(\mathbf{1} \otimes I)$. Thus, we have that $(I - Z) y = 0$ holds if and only if $x_1 = \cdots = x_n$. Since the matrix $I - Z$ is positive semidefinite (because $Z$ is stochastic and symmetric), the same is true of the square root matrix $(I - Z)^{1/2}$. Therefore, we have that the optimization problem in (1) is equivalent to the optimization problem

$$\tilde{y} := \operatorname*{argmin}_{y} \sum_{i=1}^{n} f_i(x_i), \qquad \text{s.t.} \quad (I - Z)^{1/2} y = 0. \tag{7}$$

Indeed, for $y = [x_1; \ldots; x_n]$ to be feasible in (7) we must have $x_1 = \cdots = x_n$ because $\text{null}[(I - Z)^{1/2}] = \text{span}(\mathbf{1} \otimes I)$ as already argued. When restricted to this feasible set the objective $\sum_{i=1}^{n} f_i(x_i)$ of (7) is the same as the objective of (1), from where it follows that a solution $\tilde{y} = [\tilde{x}_1; \ldots; \tilde{x}_n]$ of (7) is such that $\tilde{x}_i = x^*$ for all $i$, i.e., $\tilde{y} = [x^*; \ldots; x^*]$.

The unconstrained minimization in (4) is a penalty version of (7). The penalty function associated with the constraint $(I - Z)^{1/2} y = 0$ is the squared norm $(1/2) \| (I - Z)^{1/2} y \|^2$ and the corresponding penalty coefficient is $1/\alpha$. Inasmuch as the penalty coefficient $1/\alpha$ is sufficiently large, the optimal arguments $y^*$ and $\tilde{y}$ are not too far apart. In this paper we exploit the reinterpretation of (3) as a method to minimize (4) to propose an approximate Newton algorithm that can be implemented in a distributed manner. We explain this algorithm in the following section.

A. Network Newton

Instead of solving (4) with a gradient descent algorithm as in DGD, we can solve (4) using Newton's method. To implement Newton's method we need to compute the Hessian $H_t := \nabla^2 F(y_t)$ of $F$ evaluated at $y_t$ so as to determine the Newton step $d_t := -H_t^{-1} g_t$. Start by differentiating twice in (4) in order to write $H_t$ as

$$H_t := \nabla^2 F(y_t) = I - Z + \alpha G_t, \tag{8}$$

where the matrix $G_t \in \mathbb{R}^{np \times np}$ is a block diagonal matrix formed by blocks $G_{ii,t} \in \mathbb{R}^{p \times p}$ containing the Hessian of the $i$th local function,

$$G_{ii,t} = \nabla^2 f_i(x_{i,t}). \tag{9}$$

It follows from (8) and (9) that the Hessian $H_t$ is block sparse with blocks $H_{ij,t} \in \mathbb{R}^{p \times p}$ having the sparsity pattern of $Z$, which is the sparsity pattern of the graph. The diagonal blocks are of the form $H_{ii,t} = (1 - w_{ii}) I + \alpha \nabla^2 f_i(x_{i,t})$ and the off diagonal blocks are not null only when $j \in \mathcal{N}_i$, in which case $H_{ij,t} = -w_{ij} I$.

While the Hessian $H_t$ is sparse, the inverse $H_t^{-1}$ is not. It is the latter that we need to compute the Newton step $d_t := -H_t^{-1} g_t$. To overcome this problem we split the diagonal and off diagonal blocks of $H_t$ and rely on a Taylor expansion of the inverse. To be precise, write $H_t = D_t - B$ where the matrix $D_t$ is defined as

$$D_t := \alpha G_t + 2 (I - \text{diag}(Z)) := \alpha G_t + 2 (I - Z_d), \tag{10}$$

where in the second equality we defined $Z_d := \text{diag}(Z)$ for future reference. Since the diagonal weights must be $w_{ii} < 1$, the matrix $I - Z_d$ is positive definite.
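The block diagonal matrix $G_t$ is also positive definite because the local functions are assumed strongly convex. Therefore, the matrix $D_t$ is block diagonal and positive definite. The $i$th diagonal block $D_{ii,t} \in \mathbb{R}^{p \times p}$ of $D_t$ can be computed and stored by node $i$ as $D_{ii,t} = \alpha \nabla^2 f_i(x_{i,t}) + 2 (1 - w_{ii}) I$. To have $H_t = D_t - B$ we must define $B := D_t - H_t$. Considering the definitions of $H_t$ and $D_t$ in (8) and (10), it follows that

$$B = I - 2 Z_d + Z. \tag{11}$$

Observe that $B$ is independent of time and depends on the weight matrix $Z$ only. As in the case of the Hessian $H_t$, the matrix $B$ is block sparse with blocks $B_{ij} \in \mathbb{R}^{p \times p}$ having the sparsity pattern of $Z$, which is the sparsity pattern of the graph. Node $i$ can compute the diagonal blocks $B_{ii} = (1 - w_{ii}) I$ and the off diagonal blocks $B_{ij} = w_{ij} I$ using the local information about its own weights only.

Proceed now to factor $D_t^{1/2}$ from both sides of the splitting relationship to write $H_t = D_t^{1/2} \left( I - D_t^{-1/2} B D_t^{-1/2} \right) D_t^{1/2}$. When we consider the Hessian inverse $H_t^{-1}$, we can use the Taylor series $(I - X)^{-1} = \sum_{j=0}^{\infty} X^j$ with $X = D_t^{-1/2} B D_t^{-1/2}$ to write

$$H_t^{-1} = D_t^{-1/2} \sum_{k=0}^{\infty} \left( D_t^{-1/2} B D_t^{-1/2} \right)^k D_t^{-1/2}. \tag{12}$$

Observe that the sum in (12) converges if the absolute values of all the eigenvalues of the matrix $D_t^{-1/2} B D_t^{-1/2}$ are strictly less than 1. This result is proven in [14].

Algorithm 1 Network Newton-K method at node $i$
Require: Initial iterate $x_{i,0}$.
 1: for $t = 0, 1, 2, \ldots$ do
 2:   Exchange iterates $x_{i,t}$ with neighbors $j \in \mathcal{N}_i$.
 3:   Gradient: $g_{i,t} = (1 - w_{ii}) x_{i,t} - \sum_{j \in \mathcal{N}_i} w_{ij} x_{j,t} + \alpha \nabla f_i(x_{i,t})$.
 4:   Compute NN-0 descent direction $d_{i,t}^{(0)} = -D_{ii,t}^{-1} g_{i,t}$
 5:   for $k = 0, \ldots, K - 1$ do
 6:     Exchange local elements $d_{i,t}^{(k)}$ of the NN-$k$ step with neighbors
 7:     NN-$(k+1)$ step: $d_{i,t}^{(k+1)} = D_{ii,t}^{-1} \left[ \sum_{j \in \mathcal{N}_i \cup \{i\}} B_{ij} d_{j,t}^{(k)} - g_{i,t} \right]$.
 8:   end for
 9:   Update local iterate: $x_{i,t+1} = x_{i,t} + \epsilon\, d_{i,t}^{(K)}$.
10: end for

Network Newton (NN) is defined as a family of algorithms that rely on truncations of the series in (12). The $K$th member of this family, NN-K, considers the first $K + 1$ terms of the series to define the approximate Hessian inverse

$$\hat{H}_t^{(K)^{-1}} := D_t^{-1/2} \sum_{k=0}^{K} \left( D_t^{-1/2} B D_t^{-1/2} \right)^k D_t^{-1/2}. \tag{13}$$

NN-K uses the approximate Hessian inverse $\hat{H}_t^{(K)^{-1}}$ as a curvature correction matrix that is used in lieu of the exact Hessian inverse $H_t^{-1}$ to estimate the Newton step. I.e., instead of descending along the Newton step $d_t := -H_t^{-1} g_t$ we descend along the NN-K step $d_t^{(K)} := -\hat{H}_t^{(K)^{-1}} g_t$, which we intend as an approximation of $d_t$.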
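The truncation in (13) can be checked numerically on a toy instance. The following Python sketch is not from the paper: it assumes scalar local variables ($p = 1$), a ring network with hypothetical weights `ring_weights`, and made-up local curvatures `hess` and gradient `g`. It accumulates the first $K + 1$ terms of the series, using the identity $D^{-1/2} (D^{-1/2} B D^{-1/2})^k D^{-1/2} = (D^{-1} B)^k D^{-1}$, and compares the result with a very long truncation that serves as a stand-in for the exact Newton step.

```python
# Illustrative sketch only; names and data below are assumptions, not the paper's.

def ring_weights(n, self_weight=0.5):
    """Symmetric, row-stochastic W for a ring where each node has two neighbors."""
    off = (1.0 - self_weight) / 2.0
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = self_weight
        W[i][(i - 1) % n] = off
        W[i][(i + 1) % n] = off
    return W

def nn_step(W, hess, alpha, g, K):
    """NN-K step: minus the sum of the first K + 1 terms (D^-1 B)^k D^-1 g."""
    n = len(W)
    # Diagonal of D = alpha * G + 2 (I - Z_d) for p = 1.
    D = [alpha * hess[i] + 2.0 * (1.0 - W[i][i]) for i in range(n)]
    term = [g[i] / D[i] for i in range(n)]  # k = 0 term
    total = list(term)
    for _ in range(K):
        # B = I - 2 Z_d + Z, so (B v)_i = (1 - 2 w_ii) v_i + sum_j w_ij v_j.
        Bt = [(1.0 - 2.0 * W[i][i]) * term[i]
              + sum(W[i][j] * term[j] for j in range(n)) for i in range(n)]
        term = [Bt[i] / D[i] for i in range(n)]
        total = [total[i] + term[i] for i in range(n)]
    return [-v for v in total]

def newton_residual(W, hess, alpha, g, d):
    """Max-norm of H d + g with H = I - W + alpha * diag(hess); zero for the exact step."""
    n = len(W)
    return max(abs(d[i] - sum(W[i][j] * d[j] for j in range(n))
                   + alpha * hess[i] * d[i] + g[i]) for i in range(n))

n, alpha = 6, 0.1
W = ring_weights(n)
hess = [1.0, 2.0, 5.0, 10.0, 3.0, 4.0]   # hypothetical local Hessians
g = [1.0, -2.0, 0.5, 3.0, -1.0, 2.0]     # hypothetical gradient
d_ref = nn_step(W, hess, alpha, g, 300)  # long series, proxy for the Newton step
errors = []
for K in range(5):
    d_K = nn_step(W, hess, alpha, g, K)
    errors.append(max(abs(u - v) for u, v in zip(d_K, d_ref)))
print(errors)
```

In this toy setting the printed errors shrink as $K$ grows, at the price of one extra neighbor exchange per retained term, which is the trade-off the parameter $K$ controls.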
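Using the explicit expression for $\hat{H}_t^{(K)^{-1}}$ in (13) we write the NN-K step as

$$d_t^{(K)} = -D_t^{-1/2} \sum_{k=0}^{K} \left( D_t^{-1/2} B D_t^{-1/2} \right)^k D_t^{-1/2} g_t, \tag{14}$$

where, we recall, the vector $g_t$ is the gradient of the objective function $F(y)$ defined in (5). The NN-K update formula can then be written as

$$y_{t+1} = y_t + \epsilon\, d_t^{(K)}. \tag{15}$$

The algorithm defined by recursive application of (15) can be implemented in a distributed manner because the truncated series in (13) has a local structure controlled by the parameter $K$. To explain this statement better define the components $d_{i,t}^{(K)} \in \mathbb{R}^p$ of the NN-K step $d_t^{(K)} = [d_{1,t}^{(K)}; \ldots; d_{n,t}^{(K)}]$. A distributed implementation of (15) requires that node $i$ computes $d_{i,t}^{(K)}$ so as to implement the local descent $x_{i,t+1} = x_{i,t} + \epsilon\, d_{i,t}^{(K)}$. The step components $d_{i,t}^{(K)}$ can be computed through local computations. To see that this is true first note that, considering the definition of the NN-K descent direction in (14), the sequence of NN descent directions satisfies

$$d_t^{(k+1)} = D_t^{-1} B d_t^{(k)} - D_t^{-1} g_t = D_t^{-1} \left( B d_t^{(k)} - g_t \right). \tag{16}$$

Then observe that since the matrix $B$ has the sparsity pattern of the graph, this recursion can be decomposed into local components

$$d_{i,t}^{(k+1)} = D_{ii,t}^{-1} \left( \sum_{j \in \mathcal{N}_i \cup \{i\}} B_{ij} d_{j,t}^{(k)} - g_{i,t} \right). \tag{17}$$

The matrix $D_{ii,t} = \alpha \nabla^2 f_i(x_{i,t}) + 2 (1 - w_{ii}) I$ is stored and computed at node $i$. The gradient component $g_{i,t} = (1 - w_{ii}) x_{i,t} - \sum_{j \in \mathcal{N}_i} w_{ij} x_{j,t} + \alpha \nabla f_i(x_{i,t})$ is also stored and computed at $i$. Node $i$ can also evaluate the values of the matrix blocks $B_{ii} = (1 - w_{ii}) I$ and $B_{ij} = w_{ij} I$. Thus, if the NN-$k$ step components $d_{j,t}^{(k)}$ are available at neighboring nodes $j$, node $i$ can then determine the NN-$(k+1)$ step component $d_{i,t}^{(k+1)}$ upon being communicated that information.

The expression in (17) represents an iterative computation embedded inside the NN-K recursion in (15). For each time index $t$, we compute the local component of the NN-0 step $d_{i,t}^{(0)} = -D_{ii,t}^{-1} g_{i,t}$. Upon exchanging this information with neighbors we use (17) to determine the NN-1 step components $d_{i,t}^{(1)}$. These can be exchanged and plugged in (17) to compute $d_{i,t}^{(2)}$. Repeating this procedure $K$ times, nodes end up having determined their NN-K step component $d_{i,t}^{(K)}$.

The NN-K method is summarized in Algorithm 1. The descent iteration in (15) is implemented in Step 9. Implementation of this descent requires access to the NN-K descent direction $d_{i,t}^{(K)}$, which is computed by the loop in Steps 4-8. Step 4 initializes the loop by computing the NN-0 step $d_{i,t}^{(0)} = -D_{ii,t}^{-1} g_{i,t}$. The core of the loop is in Step 7, which corresponds to the recursion in (17). Step 6 stands for the variable exchange that is necessary to implement Step 7. After $K$ iterations through this loop the NN-K descent direction $d_{i,t}^{(K)}$ is computed and can be used in Step 9. Both Steps 4 and 7 require access to the local gradient component $g_{i,t}$. This is evaluated in Step 3 after receiving the prerequisite information in Step 2.

Algorithm 2 Computation of NN-K step at node $i$.
 1: function $x_i$ = NN-K($\alpha$, $x_i$, tol)
 2: while $\| g_i \| >$ tol do
 3:   $B$ matrix blocks: $B_{ii} = (1 - w_{ii}) I$ and $B_{ij} = w_{ij} I$
 4:   $D$ matrix block: $D_{ii} = \alpha \nabla^2 f_i(x_i) + 2 (1 - w_{ii}) I$
 5:   Exchange iterates $x_i$ with neighbors $j \in \mathcal{N}_i$.
 6:   Gradient: $g_i = (1 - w_{ii}) x_i - \sum_{j \in \mathcal{N}_i} w_{ij} x_j + \alpha \nabla f_i(x_i)$.
 7:   Compute NN-0 descent direction $d_i^{(0)} = -D_{ii}^{-1} g_i$
 8:   for $k = 0, \ldots, K - 1$ do
 9:     Exchange elements $d_i^{(k)}$ of the NN-$k$ step with neighbors
10:     NN-$(k+1)$ step: $d_i^{(k+1)} = D_{ii}^{-1} \left[ \sum_{j \in \mathcal{N}_i \cup \{i\}} B_{ij} d_j^{(k)} - g_i \right]$.
11:   end for
12:   Update local iterate: $x_i = x_i + \epsilon\, d_i^{(K)}$.
13: end while

B. Adaptive Network Newton

As mentioned in Section II, the NN-K algorithm, instead of solving (1) or its equivalent (7), solves a penalty version of (7) as introduced in (4).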
The optimal solutions of optimization problems (7) and (4) are different and the gap between them is upper bounded by $O(\alpha)$ [8]. This observation implies that by setting a decreasing policy for $\alpha$, or equivalently an increasing policy for the penalty coefficient $1/\alpha$, the minimizer of (4) approaches the solution of (7), i.e., $y^* \to \tilde{y}$ for $\alpha \to 0$. We introduce Adaptive Network Newton-K (ANN-K) as a version of NN-K that uses a decreasing sequence of $\alpha_t$ to achieve exact convergence to the optimal solution of (1). The idea of ANN-K is to decrease the parameter $\alpha_t$ by multiplying it by $\eta < 1$, i.e., $\alpha_{t+1} = \eta \alpha_t$, when the sequence generated by NN-K has converged for a specific value of $\alpha$.

To be more precise, each node $i$ has a signal vector $s_i = [s_{i1}; \ldots; s_{in}] \in \{0, 1\}^n$ where each component is a binary variable. Note that $s_{ij}$ corresponds to the occurrence of receiving a signal at node $i$ from node $j$. Hence, nodes initialize their signaling components to 0 for all the nodes in the network. At iteration $t$ node $i$ computes its local gradient norm $\| g_{i,t} \|$. If the norm of the gradient is smaller than a specific value called tol, i.e., $\| g_{i,t} \| \le$ tol, it sets the local signal component to $s_{ii} = 1$ and sends a signal to all the nodes in the network. The receiver nodes set the corresponding component of node $i$ in their local signal vectors to 1, i.e., $s_{ji} = 1$ for $j \ne i$. This procedure implies that the signal vectors of all nodes in the network are always synchronous. The update for the parameter $\alpha_t$ occurs when all the components of the signal vector are 1, which is equivalent to achieving the required accuracy for all nodes in the network. Since the number of times that $\alpha_t$ should be updated is small, the cost of communication for updating $\alpha_t$ is affordable.

The ANN-K method is summarized in Algorithm 3.

Algorithm 3 Adaptive Network Newton-K method at node $i$
Require: Initial iterate $x_{i,0}$, initial penalty parameter $\alpha_0$ and initial sequence of bits $s_i = [s_{i1}; \ldots; s_{in}] = [0; \ldots; 0]$.
 1: for $t = 0, 1, 2, \ldots$ do
 2:   Call NN-K function: $x_{i,t+1}$ = NN-K($\alpha_t$, $x_{i,t}$, tol)
 3:   Set $s_{ii} = 1$ and broadcast scalar $s_{ii}$ to all nodes.
 4:   Set $s_{ij} = 1$ for all nodes $j$ that sent a signal.
 5:   if $s_{ij} = 1$ for all $j = 1, \ldots, n$ then
 6:     Update penalty parameter $\alpha_{t+1} = \eta \alpha_t$.
 7:     Set $s_{ij} = 0$ for all $j = 1, \ldots, n$.
 8:   end if
 9: end for
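The adaptive schedule can be sketched in a centralized simulation. The Python code below is an illustration under stated assumptions, not the authors' implementation: local variables are scalar ($p = 1$), the local functions are hypothetical quadratics $f_i(x) = a_i x^2 / 2 + b_i x$, the network is a ring with made-up weights, the stepsize is $\epsilon = 1$, and the signaling exchange is replaced by a global check that every local gradient is below `tol`. The function names `ann`, `gradient` and `nn_direction` are invented for this sketch.

```python
# Illustrative sketch only; a centralized stand-in for the distributed signaling.

def ring_weights(n, self_weight=0.5):
    """Symmetric, row-stochastic W for a ring where each node has two neighbors."""
    off = (1.0 - self_weight) / 2.0
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = self_weight
        W[i][(i - 1) % n] = off
        W[i][(i + 1) % n] = off
    return W

def gradient(W, x, a, b, alpha):
    """Local gradients g_i = x_i - sum_j w_ij x_j + alpha * f_i'(x_i)."""
    n = len(x)
    return [x[i] - sum(W[i][j] * x[j] for j in range(n))
            + alpha * (a[i] * x[i] + b[i]) for i in range(n)]

def nn_direction(W, a, alpha, g, K):
    """NN-K direction via d^(k+1) = D^-1 (B d^(k) - g), starting from -D^-1 g."""
    n = len(g)
    D = [alpha * a[i] + 2.0 * (1.0 - W[i][i]) for i in range(n)]
    d = [-g[i] / D[i] for i in range(n)]
    for _ in range(K):
        Bd = [(1.0 - 2.0 * W[i][i]) * d[i]
              + sum(W[i][j] * d[j] for j in range(n)) for i in range(n)]
        d = [(Bd[i] - g[i]) / D[i] for i in range(n)]
    return d

def ann(W, a, b, alpha0, eta, tol, K, rounds, max_inner=20000):
    """Run NN-K until all |g_i| <= tol, then shrink alpha by eta; repeat."""
    n = len(a)
    x, alpha, history = [0.0] * n, alpha0, []
    for _ in range(rounds):
        for _ in range(max_inner):
            g = gradient(W, x, a, b, alpha)
            if all(abs(gi) <= tol for gi in g):  # every node would have signaled
                break
            d = nn_direction(W, a, alpha, g, K)
            x = [x[i] + d[i] for i in range(n)]  # unit stepsize, quadratic case
        history.append((alpha, list(x)))
        alpha *= eta                              # alpha_{t+1} = eta * alpha_t
    return history

n = 6
W = ring_weights(n)
a = [1.0, 2.0, 5.0, 10.0, 3.0, 4.0]   # hypothetical curvatures
b = [1.0, -2.0, 0.5, 3.0, -1.0, 2.0]  # hypothetical linear terms
x_star = -sum(b) / sum(a)              # minimizer of sum_i f_i
history = ann(W, a, b, alpha0=0.1, eta=0.1, tol=1e-9, K=1, rounds=3)
errors = [max(abs(xi - x_star) for xi in x) for _, x in history]
print(errors)
```

Each round warm-starts from the previous iterate, so the distance of the local variables to `x_star` is expected to shrink as $\alpha$ decreases, in line with the $O(\alpha)$ gap mentioned above.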
At each iteration of the ANN-K algorithm, in Step 2 the function NN-K is called to update the variable $x_{i,t}$ for node $i$. Note that the function NN-K, which is introduced in Algorithm 2, runs the NN-K step until the norm of the local gradient is smaller than a threshold, $\| g_i \| \le$ tol. After achieving this accuracy, in Step 3 node $i$ updates its local signal component $s_{ii}$ to 1 and sends it to the other nodes. In Step 4 each node updates the signal vector components of the other nodes in the network. Then, in Step 6 the nodes update the penalty parameter for the next iteration as $\alpha_{t+1} = \eta \alpha_t$ if all the components of the signal vector are 1; otherwise they use the previous value $\alpha_{t+1} = \alpha_t$. In order to reset the system after updating $\alpha$, all signal vectors are set to 0, i.e., $s_i = 0$ for $i = 1, \ldots, n$, as in Step 7.

III. CONVERGENCE ANALYSIS

In this section we show that as time progresses the sequence of objective function values $F(y_t)$ defined in (4) approaches the optimal objective function value $F(y^*)$ by considering the following assumptions.

Assumption 1 There exist constants $0 \le \delta < \Delta < 1$ that lower and upper bound the diagonal weights for all $i$,

$$\delta \le w_{ii} \le \Delta < 1, \qquad i = 1, \ldots, n. \tag{18}$$

Assumption 2 The eigenvalues of the local objective function Hessians $\nabla^2 f_i(x)$ are bounded with positive constants $0 < m \le M < \infty$, i.e.,

$$m I \preceq \nabla^2 f_i(x) \preceq M I. \tag{19}$$

Assumption 3 The local objective function Hessians $\nabla^2 f_i(x)$ are Lipschitz continuous with parameter $L$ with respect to the Euclidean norm,

$$\| \nabla^2 f_i(x) - \nabla^2 f_i(\hat{x}) \| \le L \| x - \hat{x} \|. \tag{20}$$

Linear convergence of the objective function $F(y_t)$ to the optimal objective function value $F(y^*)$ is shown in [14], which we mention as a reference.

Theorem 1 Consider the NN-K method as defined in (10)-(15) and the objective function $F(y)$ as introduced in (4). If the stepsize $\epsilon$ is chosen as $\epsilon = \min \{ 1, \epsilon_0 \}$ where $\epsilon_0$ is a constant that depends on problem parameters, and Assumptions 1, 2, and 3 hold true, the sequence $F(y_t)$ converges to the optimal value $F(y^*)$ at least linearly with constant $0 < 1 - \zeta < 1$. I.e.,

$$F(y_t) - F(y^*) \le (1 - \zeta)^t \left( F(y_0) - F(y^*) \right). \tag{21}$$

Theorem 1 shows linear convergence of the sequence of objective function values $F(y_t)$. In the following section we study the performance of the NN and ANN methods via different numerical experiments.
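The linear decrease stated in Theorem 1 can be observed on a small instance. The sketch below is only an illustration: it assumes scalar local variables, hypothetical quadratic local functions $f_i(x) = a_i x^2 / 2 + b_i x$, a ring network with made-up weights, stepsize $\epsilon = 1$, and it approximates $F(y^*)$ by running the same iteration for many steps. It does not compute the constant $\zeta$ of the theorem; it only reports the observed per-iteration contraction of the optimality gap.

```python
# Illustrative sketch only; data and helper names are assumptions.

def ring_weights(n, self_weight=0.5):
    """Symmetric, row-stochastic W for a ring where each node has two neighbors."""
    off = (1.0 - self_weight) / 2.0
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = self_weight
        W[i][(i - 1) % n] = off
        W[i][(i + 1) % n] = off
    return W

def objective(W, x, a, b, alpha):
    """F(y) = 1/2 y^T (I - W) y + alpha * sum_i (a_i x_i^2 / 2 + b_i x_i)."""
    n = len(x)
    quad = sum(x[i] * (x[i] - sum(W[i][j] * x[j] for j in range(n)))
               for i in range(n))
    return 0.5 * quad + alpha * sum(0.5 * a[i] * x[i] ** 2 + b[i] * x[i]
                                    for i in range(n))

def nn_iteration(W, x, a, b, alpha, K, eps=1.0):
    """One NN-K update x <- x + eps * d^(K) for the quadratic toy problem."""
    n = len(x)
    g = [x[i] - sum(W[i][j] * x[j] for j in range(n))
         + alpha * (a[i] * x[i] + b[i]) for i in range(n)]
    D = [alpha * a[i] + 2.0 * (1.0 - W[i][i]) for i in range(n)]
    d = [-g[i] / D[i] for i in range(n)]
    for _ in range(K):
        Bd = [(1.0 - 2.0 * W[i][i]) * d[i]
              + sum(W[i][j] * d[j] for j in range(n)) for i in range(n)]
        d = [(Bd[i] - g[i]) / D[i] for i in range(n)]
    return [x[i] + eps * d[i] for i in range(n)]

n, alpha, K = 6, 0.1, 1
W = ring_weights(n)
a = [1.0, 2.0, 5.0, 10.0, 3.0, 4.0]
b = [1.0, -2.0, 0.5, 3.0, -1.0, 2.0]

x_ref = [0.0] * n
for _ in range(2000):                 # long run used as a proxy for y*
    x_ref = nn_iteration(W, x_ref, a, b, alpha, K)
F_star = objective(W, x_ref, a, b, alpha)

x, gaps = [0.0] * n, []
for _ in range(6):
    gaps.append(objective(W, x, a, b, alpha) - F_star)
    x = nn_iteration(W, x, a, b, alpha, K)
ratios = [gaps[t + 1] / gaps[t] for t in range(len(gaps) - 1)]
print(ratios)
```

On this example every ratio stays below 1, which is the behavior a bound of the form (21) describes; other problems, weights or stepsizes may give different contraction factors.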

[Figure 1; horizontal axis: number of iterations.]
Fig. 1: Convergence of DGD, NN-0, NN-1, and NN-2 in terms of number of iterations. The NN methods converge faster than DGD. Furthermore, the larger K is, the faster NN-K converges.

[Figure 2; horizontal axis: number of local information exchanges.]
Fig. 2: Convergence of DGD, NN-0, NN-1, and NN-2 in terms of number of communication exchanges. The NN-K methods retain the advantage over DGD but increasing K may not result in faster convergence. For this particular instance it is actually NN-1 that converges fastest in terms of number of communication exchanges.

IV. NUMERICAL ANALYSIS

We compare the performance of DGD and different versions of NN in the minimization of a distributed quadratic objective. The comparison is done in terms of both number of iterations and number of information exchanges. Specifically, for each agent $i$ we consider a positive definite diagonal matrix $A_i \in \mathbb{S}_{++}^p$ and a vector $b_i \in \mathbb{R}^p$ to define the local objective function $f_i(x) := (1/2) x^T A_i x + b_i^T x$. Therefore, the global cost function $f(x)$ is written as

$$f(x) := \sum_{i=1}^{n} \frac{1}{2} x^T A_i x + b_i^T x. \tag{22}$$

The difficulty of solving (22) is given by the condition number of the matrices $A_i$. To adjust condition numbers we generate diagonal matrices $A_i$ with random diagonal elements $a_{ii}$. The first $p/2$ diagonal elements $a_{ii}$ are drawn uniformly at random from the discrete set $\{1, 10^{1}, \ldots, 10^{\xi}\}$ and the next $p/2$ are uniformly and randomly chosen from the set $\{1, 10^{-1}, \ldots, 10^{-\xi}\}$. This choice of coefficients yields local matrices $A_i$ with eigenvalues in the interval $[10^{-\xi}, 10^{\xi}]$ and global matrices $\sum_{i=1}^{n} A_i$ with eigenvalues in the interval $[n 10^{-\xi}, n 10^{\xi}]$. The condition numbers are typically $10^{2\xi}$ for the local functions and $10^{\xi}$ for the global objectives. The linear terms $b_i^T x$ are added so that the different local functions have different minima. The vectors $b_i$ are chosen uniformly at random from the box $[0, 1]^p$.

For the quadratic objective in (22) we can compute the optimal argument $x^*$ in closed form. We then evaluate convergence through the relative error that we define as the average normalized squared distance between the local vectors $x_{i,t}$ and the optimal decision vector $x^*$,

$$e_t := \frac{1}{n} \sum_{i=1}^{n} \frac{\| x_{i,t} - x^* \|^2}{\| x^* \|^2}. \tag{23}$$

The network connecting the nodes is a $d$-regular cycle where each node is connected to exactly $d$ neighbors and $d$ is assumed even.
The graph is generated by creating a cycle and then connecting each node with the $d/2$ nodes that are closest in each direction. The diagonal weights in the matrix $W$ are set to $w_{ii} = 1/2 + 1/(2(d+1))$ and the off diagonal weights to $w_{ij} = 1/(2(d+1))$ when $j \in \mathcal{N}_i$.

In the subsequent experiments we set the network size to $n = 100$, the dimension of the decision vectors to $p = 4$, the condition number parameter to $\xi = 2$, the penalty coefficient inverse to $\alpha = 10^{-2}$, and the network degree to $d = 4$. The NN step size is set to $\epsilon = 1$, which is always possible when we have quadratic objectives. Figure 1 illustrates a sample convergence path for DGD, NN-0, NN-1, and NN-2 by measuring the relative error $e_t$ in (23) with respect to the number of iterations $t$. As expected for a problem that does not have a small condition number (in this particular instantiation of the function in (22) the condition number is 95.2), the different versions of NN are much faster than DGD. E.g., after $t = 1.5 \times 10^{3}$ iterations the error associated with the DGD iterates is $e_t \approx 1.9 \times 10^{-1}$. Comparable or better accuracy $e_t < 1.9 \times 10^{-1}$ is achieved in $t = 132$, $t = 63$, and $t = 43$ iterations for NN-0, NN-1, and NN-2, respectively.

[Figure 3; four histogram panels; vertical axis: empirical distribution; horizontal axis: number of information exchanges.]
Fig. 3: Histograms of the number of information exchanges required to achieve accuracy $e_t < 10^{-2}$. The qualitative observations made in Figures 1 and 2 hold over a range of random problem realizations.

Further recall that $\alpha$ controls the difference between the actual optimal argument $\tilde{y} = [x^*; \ldots; x^*]$ [cf. (7)] and the argument $y^*$ [cf. (4)] to which DGD and NN converge. Since we have $\alpha = 10^{-2}$ and the difference between these two vectors is of order $O(\alpha)$, we expect the error in (23) to settle at $e_t \approx 10^{-2}$. The error actually settles at $e_t \approx 6.3 \times 10^{-3}$ and it takes all three versions of NN less than $t = 400$ iterations to do so. It takes DGD more than $t = 10^{4}$ iterations to reach this value. This relative performance difference decreases if the problem has better conditioning but can be made arbitrarily large by increasing the condition number of the matrix $\sum_{i=1}^{n} A_i$.
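The experimental setup described above can be reproduced in miniature. The following Python sketch is an assumption-laden illustration and not the authors' code: it uses a smaller network ($n = 20$ instead of the paper's size) so that it runs quickly in pure Python, a fixed random seed, and hypothetical helper names (`cycle_neighbors`, `make_problem`, `run`). It builds the $d$-regular cycle, the weights $w_{ii}$ and $w_{ij}$, the diagonal quadratic objectives, the closed-form $x^*$, and the relative error (23), then runs DGD and NN-2 for the same number of iterations.

```python
# Illustrative sketch only; sizes, seed and helper names are assumptions.
import random

def cycle_neighbors(n, d):
    """d-regular cycle: node i is linked to the d/2 closest nodes on each side."""
    return [[(i + s) % n for s in range(-(d // 2), d // 2 + 1) if s != 0]
            for i in range(n)]

def make_problem(n, p, xi, rng):
    """Diagonal A_i (stored as lists of diagonal entries) and vectors b_i."""
    big = [10.0 ** k for k in range(xi + 1)]     # {1, 10, ..., 10^xi}
    small = [10.0 ** -k for k in range(xi + 1)]  # {1, 10^-1, ..., 10^-xi}
    A = [[rng.choice(big) for _ in range(p // 2)]
         + [rng.choice(small) for _ in range(p - p // 2)] for _ in range(n)]
    b = [[rng.random() for _ in range(p)] for _ in range(n)]
    return A, b

def gradients(x, A, b, nbrs, w_self, w_nb, alpha):
    """Local gradients of the penalized objective, one list per node."""
    return [[(1 - w_self) * x[i][c] - w_nb * sum(x[j][c] for j in nbrs[i])
             + alpha * (A[i][c] * x[i][c] + b[i][c]) for c in range(len(x[i]))]
            for i in range(len(x))]

def nn_direction(g, A, nbrs, w_self, w_nb, alpha, K):
    """NN-K direction from the local recursion; D_ii is diagonal here."""
    n, p = len(g), len(g[0])
    D = [[alpha * A[i][c] + 2 * (1 - w_self) for c in range(p)] for i in range(n)]
    d = [[-g[i][c] / D[i][c] for c in range(p)] for i in range(n)]
    for _ in range(K):
        d = [[((1 - w_self) * d[i][c] + w_nb * sum(d[j][c] for j in nbrs[i])
               - g[i][c]) / D[i][c] for c in range(p)] for i in range(n)]
    return d

def relative_error(x, x_star):
    """Average normalized squared distance to x_star."""
    norm2 = sum(v * v for v in x_star)
    return sum(sum((u - s) ** 2 for u, s in zip(row, x_star))
               for row in x) / (len(x) * norm2)

def run(method, T, A, b, nbrs, w_self, w_nb, alpha, K=0):
    n, p = len(A), len(A[0])
    x = [[0.0] * p for _ in range(n)]
    for _ in range(T):
        g = gradients(x, A, b, nbrs, w_self, w_nb, alpha)
        if method == "dgd":
            d = [[-v for v in row] for row in g]   # unit-stepsize gradient step
        else:
            d = nn_direction(g, A, nbrs, w_self, w_nb, alpha, K)
        x = [[x[i][c] + d[i][c] for c in range(p)] for i in range(n)]
    return x

rng = random.Random(0)
n, p, xi, alpha, d = 20, 4, 2, 1e-2, 4
nbrs = cycle_neighbors(n, d)
w_nb = 1.0 / (2 * (d + 1))
w_self = 0.5 + w_nb
A, b = make_problem(n, p, xi, rng)
x_star = [-sum(b[i][c] for i in range(n)) / sum(A[i][c] for i in range(n))
          for c in range(p)]
err_dgd = relative_error(run("dgd", 100, A, b, nbrs, w_self, w_nb, alpha), x_star)
err_nn2 = relative_error(run("nn", 100, A, b, nbrs, w_self, w_nb, alpha, K=2), x_star)
print(err_dgd, err_nn2)
```

The comparison is per iteration; each NN-2 iteration involves more neighbor exchanges than a DGD iteration, so a comparison per communication would have to rescale the iteration count accordingly.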
The number of iterations required for convergence can be further decreased by considering higher order approximations in (14). The advantages would be misleading because they come at the cost of increasing the number of communications required to approximate the Newton step. To study this latter effect we consider the relative performance of DGD and different versions of NN in terms of the number of local information exchanges. Note that each iteration in NN-K requires a total of $K + 1$ information exchanges with each neighbor, as opposed to the single variable exchange required by DGD. After $t$ iterations the number of variable exchanges between each pair of neighbors is $t$ for DGD and $(K + 1) t$ for NN-K. Thus, we can translate Figure 1 into a path in terms of number of communications by scaling the time axis by $(K + 1)$. The result of this scaling is shown in Figure 2. The different versions of NN retain a significant, albeit smaller, advantage with respect to DGD. Error $e_t < 10^{-2}$ is achieved by NN-0, NN-1, and NN-2 after $(K + 1) t = 3.7 \times 10^{2}$, $(K + 1) t = 3.1 \times 10^{2}$, and $(K + 1) t = 3.4 \times 10^{2}$ variable exchanges, respectively. When measured in this metric it is no longer true that increasing $K$ results in faster convergence. For this particular problem instance it is actually NN-1 that converges fastest in terms of number of communication exchanges.

[Figure 4; horizontal axis: number of iterations.]
Fig. 4: Convergence of adaptive DGD, ANN-0, ANN-1, and ANN-2 for $\alpha_0 = 10^{-2}$.

[Figure 5; horizontal axis: number of iterations.]
Fig. 5: Convergence of adaptive DGD, ANN-0, ANN-1, and ANN-2 for $\alpha_0 = 10^{-1}$.

For a more comprehensive evaluation we consider $10^{3}$ different random realizations of (22) where we also randomize the degree $d$ of the $d$-regular graph that we choose from the even numbers in the set $[2, 10]$. The remaining parameters are the same used to generate Figures 1 and 2. For each joint random realization of network and objective we run DGD, NN-0, NN-1, and NN-2 until achieving error $e_t < 10^{-2}$ and record the number of communication exchanges that have elapsed, which amount to simply $t$ for DGD and $(K + 1) t$ for NN. The resulting histograms are shown in Figure 3. The mean times required to reduce the error to $e_t < 10^{-2}$ are $4.3 \times 10^{3}$ for DGD and $4.0 \times 10^{2}$, $3.5 \times 10^{2}$, and $3.7 \times 10^{2}$ for NN-0, NN-1, and NN-2. As in the particular case shown in Figures 1 and 2, NN-1 performs best in terms of communication exchanges. Observe, however, that the number of communication exchanges required by NN-2 is not much larger and that NN-2 requires less computational effort than NN-1 because the number of iterations $t$ is smaller.

A. Adaptive Network Newton

Given that DGD and NN are penalty methods it is of interest to consider their behavior when the inverse penalty parameter $\alpha$ is decreased recursively. The adaptation of $\alpha$ for NN-K is discussed in Section II-B where it is termed adaptive (A)NN-K. The same adaptation strategy is considered here for DGD.
The parameter $\alpha$ is kept constant until the local gradient components $g_{i,t}$ become smaller than a given tolerance tol, i.e., until $\| g_{i,t} \| \le$ tol for all $i$. When this tolerance is achieved, the parameter $\alpha$ is scaled by a factor $\eta < 1$, i.e., $\alpha$ is decreased from its current value to $\eta \alpha$. This requires the use of a signaling method like the one summarized in Algorithm 3 for ANN-K.

We consider the objective in (22) and nodes connected by a $d$-regular cycle. We use the same parameters used to generate Figures 1 and 2. The adaptive gradient tolerance is set to tol $= 10^{-3}$ and the scaling parameter to $\eta = 0.1$. We consider two different scenarios where the initial penalty parameters are $\alpha = \alpha_0 = 10^{-1}$ and $\alpha = \alpha_0 = 10^{-2}$. The respective error trajectories $e_t$ with respect to the number of iterations are shown in Figure 4, where $\alpha_0 = 10^{-2}$, and Figure 5, where $\alpha_0 = 10^{-1}$. In each figure we show $e_t$ for adaptive DGD, ANN-0, ANN-1, and ANN-2. Both figures show that the ANN methods outperform adaptive DGD and that larger $K$ reduces the number of iterations that it takes ANN-K to achieve a target error. These results are consistent with the findings summarized in Figures 1-3.

More interesting conclusions follow from a comparison across Figures 4 and 5. We can see that it is better to start with the (larger) value $\alpha_0 = 10^{-1}$ even if the method initially converges to a point farther from the actual optimum. This happens because problems with larger $\alpha$ are better conditioned and thus easier to minimize.

REFERENCES

[1] A. Ribeiro, "Ergodic stochastic optimization algorithms for wireless communication and networking," IEEE Trans. Signal Process., vol. 58, no. 12, pp. 6369-6386, December 2010.
[2] A. Ribeiro, "Optimal resource allocation in wireless communication and networking," EURASIP J. Wireless Commun., vol. 2012, no. 272, pp. 3727-3741, August 2012.
[3] I. Schizas, A. Ribeiro, and G. Giannakis, "Consensus in ad hoc WSNs with noisy links - part I: Distributed estimation of deterministic signals," IEEE Transactions on Signal Processing, vol. 56, pp. 350-364, 2008.
[4] M. Rabbat and R. Nowak, "Distributed optimization in sensor networks," Proceedings of the 3rd International Symposium on Information Processing in Sensor Networks, pp. 20-27, ACM, 2004.
[5] V. Cevher, S. Becker, and M. Schmidt, "Convex optimization for big data: Scalable, randomized, and parallel algorithms for big data analytics," IEEE Signal Processing Magazine, vol. 31, pp. 32-43, 2014.
[6] A. Nedic and A. Ozdaglar, "Distributed subgradient methods for multiagent optimization," IEEE Transactions on Automatic Control, vol. 54, pp. 48-61, 2009.
[7] D. Jakovetic, J. Xavier, and J. Moura, "Fast distributed gradient methods," IEEE Transactions on Automatic Control, vol. 59, pp. 1131-1146, 2014.
[8] K. Yuan, Q. Ling, and W. Yin, "On the convergence of decentralized gradient descent," arXiv preprint arXiv:1310.7063, 2013.
[9] W. Shi, Q. Ling, G. Wu, and W. Yin, "EXTRA: An exact first-order algorithm for decentralized consensus optimization," arXiv preprint arXiv:1404.6264, 2014.
[10] Q. Ling and A. Ribeiro, "Decentralized linearized alternating direction method of multipliers," Proc. Int. Conf. Acoustics Speech Signal Process., pp. 5447-5451, 2014.
[11] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1-122, 2011.
[12] W. Shi, Q. Ling, G. Wu, and W. Yin, "On the linear convergence of the ADMM in decentralized consensus optimization," IEEE Transactions on Signal Processing, vol. 62, pp. 1750-1761, 2014.
[13] J. Duchi, A. Agarwal, and M. Wainwright, "Dual averaging for distributed optimization: Convergence analysis and network scaling," IEEE Transactions on Automatic Control, vol. 57, pp. 592-606, 2012.
[14] A. Mokhtari, Q. Ling, and A. Ribeiro, "An approximate Newton method for distributed optimization," 2014, available at http://www.seas.upenn.edu/~aryanm/wiki/nn-icassp.pdf.