A Distributed Newton Method for Network Optimization


Ali Jadbabaie, Asuman Ozdaglar, and Michael Zargham

Abstract: Most existing work uses dual decomposition and subgradient methods to solve network optimization problems in a distributed manner, which suffer from slow convergence rate properties. This paper proposes an alternative distributed approach based on a Newton-type method for solving minimum cost network optimization problems. The key component of the method is to represent the dual Newton direction as the solution of a discrete Poisson equation involving the graph Laplacian. This representation enables using an iterative consensus-based local averaging scheme (with an additional input term) to compute the Newton direction based only on local information. We show that even when the iterative schemes used for computing the Newton direction and the stepsize in our method are truncated, the resulting iterates converge superlinearly within an explicitly characterized error neighborhood. Simulation results illustrate the significant performance gains of this method relative to subgradient methods based on dual decomposition.

A. Ozdaglar's research was supported by the National Science Foundation under CAREER grant DMI-5459 and the DARPA ITMANET program. A. Jadbabaie's research was supported by the Army Research Lab's MAST Collaborative Technology Alliance, DARPA/DSO SToMP, and ONR MURI. Ali Jadbabaie and Michael Zargham are with the Department of Electrical and Systems Engineering and GRASP Laboratory, University of Pennsylvania. Asuman Ozdaglar is with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology.

I. INTRODUCTION

The standard approach to distributed optimization in networks is to use dual decomposition and subgradient (or first-order) methods, which for some classes of problems yields iterative algorithms that operate on the basis of local information (see [13], [14], [21], and [15]). However, a major shortcoming of this approach, particularly relevant in today's large-scale networks, is the slow convergence rate of the resulting algorithms. In this paper, we propose an alternative approach based on using Newton-type (or second-order) methods for minimum cost network optimization problems. We show that the proposed method can be implemented in a distributed manner and has faster convergence properties.

Consider a network represented by a directed graph G = (N, E). Each edge e in the network has a convex cost function φ_e(x_e), which captures the cost due to congestion effects as a function of the flow x_e on this edge. The total cost of a flow vector x = [x_e]_{e∈E} is given by the sum of the edge costs, i.e., ∑_{e∈E} φ_e(x_e). Given an external supply b_i for each node i ∈ N, the minimum cost network optimization problem is to find a minimum cost flow allocation vector that satisfies the flow conservation constraint at each node. We focus on feasible problems, i.e., we assume that the total in-flow to the network is equal to the total out-flow, ∑_{i∈N} b_i = 0. This problem can be formulated as a convex optimization problem with linear equality constraints. The application of dual decomposition together with a dual subgradient algorithm then yields a distributed iterative solution method. Instead, we propose a distributed primal-dual Newton-type method that achieves a superlinear convergence rate (to an error neighborhood). The challenges in using Newton-type methods in this context are twofold.
First, the superlinear convergence rate of this type of method is achieved by using a backtracking stepsize rule, which relies on global information (in the computation of the norm of a residual function used in defining the stepsize). We solve this problem by using a consensus-based local averaging scheme for estimating the norm of the residual function. (Consensus-based schemes have been used extensively in the recent literature as distributed mechanisms for aligning, or computing the average of, the values held by multiple agents; see [10], [17], [18], and [19].) Second, the computation of the dual Newton step involves a matrix inversion, which requires global information. Our main contribution in this regard is to develop a distributed iterative scheme for the computation of the dual Newton step. The key idea is to recognize that the dual Newton step is defined through a discrete Poisson equation (with constant input), which involves the Laplacian of the graph, and therefore can be solved using a consensus-based scheme in a distributed manner. We show that the convergence rate of this scheme is governed by the spectral properties of the underlying graph. Hence, for fast-mixing graphs (i.e., those with large spectral gaps), the dual Newton step can be computed efficiently using only local information.

Since our method uses consensus-based schemes to compute the stepsize and the Newton direction in each iteration, exact computation is not feasible. Another major contribution of our paper is to consider truncated versions of these consensus schemes at each iteration and present a convergence rate analysis of the constrained Newton method when the stepsize and the direction are estimated with some error. We show that when these errors are sufficiently small, the value of the residual function converges superlinearly to a neighborhood of the origin, whose size is explicitly quantified as a function of the errors and the parameters of the objective function and the constraints of the minimum cost network optimization problem.

Our analysis relates to work on the convergence rate analysis of inexact Newton methods ([8], [12]). These works focus on providing conditions on the amount of error at each iteration, relative to the norm of the gradient of the current iterate, that ensure superlinear convergence to the exact
optimal solution (essentially requiring the error to vanish in the limit). Hence, our focus on providing convergence rate results to an error neighborhood may be of independent interest.

The rest of the paper is organized as follows: Section II defines the minimum cost network optimization problem and shows that the dual decomposition and subgradient method can be implemented in a distributed manner. This section also presents the constrained primal-dual Newton method for this problem and introduces a distributed iterative scheme for computing the dual Newton step. Section III presents a convergence rate analysis for an inexact Newton method, for which there are errors associated with the computation of the step and the stepsize. In Section IV, simulations demonstrate that the Newton method outperforms the subgradient method with respect to runtime. Section V contains our concluding remarks.

Basic Notation and Notions: A vector is viewed as a column vector, unless clearly stated otherwise. We denote by x_i the i-th component of a vector x. When x_i ≥ 0 for all components i of a vector x, we write x ≥ 0. For a matrix A, we write A_ij or [A]_ij to denote the matrix entry in the i-th row and j-th column. We write x' to denote the transpose of a vector x. The scalar product of two vectors x, y ∈ R^m is denoted by x'y. We use ‖x‖ to denote the standard Euclidean norm, ‖x‖ = √(x'x). For a vector-valued function f : R^n → R^m, the gradient matrix of f at x ∈ R^n is denoted by ∇f(x). A vector a ∈ R^m is said to be a stochastic vector when its components a_i, i = 1, ..., m, are nonnegative and their sum is equal to 1, i.e., ∑_{i=1}^m a_i = 1. A square m × m matrix A is said to be a stochastic matrix when each row of A is a stochastic vector. A stochastic matrix is called irreducible and aperiodic (also known as primitive) if all eigenvalues, except the trivial eigenvalue at 1, are subunit (strictly smaller than 1 in modulus). One can associate a discrete-time Markov chain with a stochastic matrix and a graph G as follows: the state of the chain at time k ∈ {0, 1, 2, ...}, denoted by X(k), is a node in N (the node set of the graph), and the weight associated to each edge in the graph is the probability with which X(k) makes a transition between two adjacent nodes. In other words, the transition from state i to state j happens with probability p_ij, the weight of edge (i, j). If π(k), with elements defined as π_i(k) = P(X(k) = i), is the probability distribution of the state at time k, the state distribution satisfies the recursion π(k+1)' = π(k)'P. If the chain is irreducible and aperiodic, then for all initial distributions, π(k) converges to the unique stationary distribution π [2].

II. MINIMUM COST NETWORK OPTIMIZATION PROBLEM

We consider a network represented by a directed graph G = (N, E) with node set N = {1, ..., N} and edge set E = {1, ..., E}. We denote the flow vector by x = [x_e]_{e∈E}, where x_e denotes the flow on edge e. The flow conservation conditions at the nodes can be compactly expressed as Ax = b, where A is the N × E node-edge incidence matrix of the graph, i.e.,

A_ij = 1 if edge j leaves node i,  A_ij = −1 if edge j enters node i,  and A_ij = 0 otherwise,

and the vector b denotes the external sources, i.e., b_i > 0 (or b_i < 0) indicates that b_i units of external flow enter (or exit) node i. We associate a cost function φ_e : R → R with each edge e, i.e., φ_e(x_e) denotes the cost on edge e as a function of the edge flow x_e. We assume that the cost functions φ_e are strictly convex and twice continuously differentiable. The minimum cost network optimization problem can be written as

minimize ∑_{e=1}^{E} φ_e(x_e)     (1)
subject to Ax = b.
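To make the problem data in (1) concrete, the following Python sketch (purely illustrative, not part of the original development; all names are ours) builds the node-edge incidence matrix for a small directed graph and checks the feasibility condition ∑_{i∈N} b_i = 0. The same toy instance is reused in the later snippets.

```python
import numpy as np

def incidence_matrix(num_nodes, edges):
    """Node-edge incidence matrix A: A[i, e] = 1 if edge e leaves node i,
    A[i, e] = -1 if edge e enters node i, and 0 otherwise."""
    A = np.zeros((num_nodes, len(edges)))
    for e, (i, j) in enumerate(edges):      # edge e is directed from node i to node j
        A[i, e] = 1.0
        A[j, e] = -1.0
    return A

# Toy instance: 4 nodes, 5 directed edges, one unit of flow from node 0 to node 3.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
A = incidence_matrix(4, edges)
b = np.array([1.0, 0.0, 0.0, -1.0])
assert abs(b.sum()) < 1e-12                 # feasibility requires sum_i b_i = 0
```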
In this paper, our goal is to investigate iterative distributed methods for solving problem (1). In particular, we focus on two methods: the first relies on solving the dual of problem (1) using a subgradient method; the second uses a constrained Newton method, where, at each iteration, the Newton direction is computed iteratively using an averaging method.

A. Dual Subgradient Method

We first consider solving problem (1) using a dual subgradient method. To define the dual problem, we form the Lagrangian function of problem (1), L : R^E × R^N → R, given by

L(x, λ) = ∑_{e=1}^{E} φ_e(x_e) − λ'(Ax − b).

The dual function q(λ) is then given by

q(λ) = inf_{x∈R^E} L(x, λ)
     = inf_{x∈R^E} ( ∑_{e=1}^{E} φ_e(x_e) − λ'Ax ) + λ'b
     = ∑_{e=1}^{E} inf_{x_e∈R} ( φ_e(x_e) − (λ'A)_e x_e ) + λ'b.

Hence, in view of the fact that the objective function and the constraints of problem (1) are separable in the decision variables x_e, the evaluation of the dual function decomposes into one-dimensional optimization problems. We assume that each of these optimization problems has an optimal solution, which is unique by the strict convexity of the functions φ_e and is denoted by x_e(λ). Using the first-order optimality conditions, it can be seen that for each e, x_e(λ) is given by

x_e(λ) = (φ_e')^{-1}(λ_i − λ_j),     (2)

where i, j ∈ N denote the end nodes of edge e. Thus, for each edge e, the evaluation of x_e(λ) can be done based on local information about the edge cost function φ_e and the dual variables of the incident nodes i and j.
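As a rough illustration of this decomposition (ours, not from the paper), the sketch below evaluates Eq. (2) edge by edge for the toy instance above with quadratic costs φ_e(x_e) = x_e², for which (φ_e')^{-1}(u) = u/2, and combines it with the dual update described in the continuation of this subsection; the function name and stepsize are arbitrary choices.

```python
def dual_subgradient(A, b, edges, step=0.05, iters=2000):
    """Dual subgradient method for problem (1) with phi_e(x) = x^2."""
    lam = np.zeros(A.shape[0])
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        for e, (i, j) in enumerate(edges):
            x[e] = (lam[i] - lam[j]) / 2.0   # Eq. (2): needs only the incident dual variables
        g = A @ x - b                        # primal infeasibility; node i only needs g[i]
        lam = lam - step * g                 # dual update (see the next paragraphs)
    return x, lam

x_sg, lam_sg = dual_subgradient(A, b, edges)
print("flow conservation violation:", np.linalg.norm(A @ x_sg - b))
```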

We can write the dual problem as

maximize_{λ∈R^N} q(λ).

The dual problem can be solved by using a subgradient method: given an initial vector λ_0, the iterates are generated by

λ_{k+1} = λ_k − α_k g_k   for all k ≥ 0,

where α_k is a positive stepsize and g_k = Ax(λ_k) − b, with x(λ_k) = argmin_{x∈R^E} L(x, λ_k); i.e., for all e ∈ E, x_e(λ_k) is given by Eq. (2) with λ = λ_k. (The vector −g_k is a subgradient of the dual function q(λ) at λ = λ_k.) This method naturally lends itself to a distributed implementation: each node i updates its dual variable λ_i using the local (subgradient) information g_{k,i} obtained from the edges incident to that node, and each edge e in turn updates its primal variable x_e(λ) using the dual variables of its incident nodes. Despite its simplicity and distributed nature, however, it is well known that the dual subgradient method suffers from a slow rate of convergence (see [16] and [15] for rate analysis and construction of primal solutions for dual subgradient methods), which motivates us to consider a Newton method for solving problem (1).

B. Equality-Constrained Newton Method

We next consider solving problem (1) using an (infeasible start) equality-constrained Newton method (see [4], Chapter 10). We let f(x) = ∑_{e=1}^{E} φ_e(x_e) for notational simplicity. Given an initial primal vector x_0, the iterates are generated by

x_{k+1} = x_k + α_k v_k,

where v_k is the Newton step given, together with a dual vector w_k, as the solution of the following system of linear equations:

[ ∇²f(x_k)   A' ] [ v_k ]      [ ∇f(x_k)  ]
[    A        0 ] [ w_k ]  = − [ Ax_k − b ].

(This is essentially a primal-dual method, with the vectors v_k and w_k acting as primal and dual steps; see Section III.) We let H_k = ∇²f(x_k) and h_k = Ax_k − b for notational convenience. Solving for v_k and w_k in the preceding system yields

v_k = −H_k^{-1}(∇f(x_k) + A'w_k),   and   (A H_k^{-1} A') w_k = h_k − A H_k^{-1} ∇f(x_k).     (3)

Since the matrix H_k is a diagonal matrix with entries [H_k]_ee = φ_e''(x_e^k), given the vector w_k, the Newton step v_k can be computed using local information. However, the computation of the vector w_k at a given primal vector x_k cannot be implemented in a decentralized manner, in view of the fact that solving equation (3) (which involves inverting the matrix A H_k^{-1} A') requires global information. The following subsection provides an iterative scheme to compute the vector w_k using local information.

1) Distributed Computation of the Newton Direction: Consider the vector w_k defined in Eq. (3). The key step in developing a decentralized iterative scheme for the computation of w_k is to recognize that the matrix A H_k^{-1} A' is the weighted Laplacian of the underlying graph G = (N, E), denoted by L_k. Hence, L_k can be written as

L_k = A H_k^{-1} A' = D_k − B_k.

Here B_k is an N × N matrix with entries

(B_k)_ij = (φ_e''(x_e^k))^{-1} if e = (i, j) ∈ E, and (B_k)_ij = 0 otherwise,

and D_k is an N × N diagonal matrix with entries

(D_k)_ii = ∑_{j∈N_i} (B_k)_ij,

where N_i denotes the set of neighbors of node i, i.e., N_i = {j ∈ N | (i, j) ∈ E}. Letting s_k = h_k − A H_k^{-1} ∇f(x_k) for notational convenience, Eq. (3) can then be rewritten as

(I − (D_k + I)^{-1}(B_k + I)) w_k = (D_k + I)^{-1} s_k.

This motivates the following iterative scheme (known as a matrix splitting) to solve for w_k. For any t ≥ 0, the iterates are generated by

w(t+1) = (D + I)^{-1}(B + I) w(t) + (D + I)^{-1} s,

where we suppressed the iteration indices k for notational convenience. Note that the matrix P := (D + I)^{-1}(B + I) is row stochastic, and when the graph of the network is connected, it is irreducible and aperiodic (i.e., a primitive matrix) [9]. Furthermore, P is diagonally similar to a symmetric matrix; therefore all of its eigenvalues are real and, by the Perron-Frobenius theorem [2], they satisfy

1 = λ_1(P) > λ_2(P) ≥ ... ≥ λ_n(P) > −1.
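The splitting iteration above is easy to prototype centrally. The following sketch (ours, for illustration; it forms the dense Laplacian explicitly, whereas in the distributed setting each node would only exchange values with its neighbors) runs the iteration on the toy instance and reports the residual of the Poisson equation L_k w = s_k.

```python
def dual_newton_w(A, grad_f, hess_diag, b, x, t_max=500):
    """Splitting iteration w(t+1) = (D+I)^{-1}(B+I) w(t) + (D+I)^{-1} s for Eq. (3)."""
    Hinv = 1.0 / hess_diag                       # H_k is diagonal with entries phi_e''(x_e)
    Lap = A @ np.diag(Hinv) @ A.T                # weighted Laplacian L_k = A H_k^{-1} A'
    s = (A @ x - b) - A @ (Hinv * grad_f)        # s_k = h_k - A H_k^{-1} grad f(x_k)
    D = np.diag(Lap)                             # (D_k)_ii = sum of neighbor weights
    B_mat = np.diag(D) - Lap                     # (B_k)_ij = 1/phi_e''(x_e) on edges, else 0
    w = np.zeros(A.shape[0])
    for _ in range(t_max):
        # each node combines its own and its neighbors' values plus a local input term
        w = ((B_mat + np.eye(len(w))) @ w + s) / (D + 1.0)
    return w, np.linalg.norm(Lap @ w - s)

x0 = np.zeros(A.shape[1])                        # current primal iterate, quadratic edge costs
w, res = dual_newton_w(A, grad_f=2 * x0, hess_diag=2 * np.ones(A.shape[1]), b=b, x=x0)
print("Poisson-equation residual after truncation:", res)
```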
Aside from one eigenvalue at 1 (corresponding to the all-one eigenvector 1), all other eigenvalues are subunit [9]. As a result, the projection of the dynamics onto the orthogonal complement of the span of 1 is stable. Let V be the n × (n−1) dimensional matrix whose orthonormal (n−1) columns span 1⊥, the subspace orthogonal to 1. Denote w̃(t) = V'w(t) and s̃ = V's, where V'V = I_{n−1} and VV' = I_n − (1/n)11'. The projected dynamics can now be written as

w̃(t+1) = V'PV w̃(t) + V'(D + I)^{-1}V s̃.

The above equation depicts the dynamics of a stable linear system with a constant (step) input, which from basic linear systems theory is known to converge to the solution of equation (3). The exponential convergence rate is determined by the convergence rate of the random walk w(t+1) = Pw(t) to its stationary distribution. More precisely, the speed of convergence of w(t) to the stationary distribution depends on the eigenstructure of the probability transition (weight) matrix P.
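The dependence on the eigenstructure of P can be checked numerically. The helper below (illustrative only; it reuses the toy problem data) computes the second largest eigenvalue modulus of P, the quantity μ(P) made precise in the next subsection, whose distance from 1 controls how many inner iterations the splitting scheme needs.

```python
def slem(A, hess_diag):
    """Second largest eigenvalue modulus of P = (D+I)^{-1}(B+I)."""
    Lap = A @ np.diag(1.0 / hess_diag) @ A.T
    D = np.diag(Lap)
    P = (np.diag(D) - Lap + np.eye(len(D))) / (D + 1.0)[:, None]
    mods = np.sort(np.abs(np.linalg.eigvals(P)))
    return mods[-2]                              # the largest modulus is the trivial eigenvalue 1

mu = slem(A, 2 * np.ones(A.shape[1]))
print("SLEM mu(P) =", round(mu, 4), " spectral gap =", round(1 - mu, 4))
```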

2) Convergence Rates: It is well known that the rate of convergence to the stationary distribution of a Markov chain is governed by the second largest eigenvalue modulus of the matrix P, defined as

μ(P) = max_{i=2,...,n} |λ_i(P)| = max{λ_2(P), |λ_n(P)|}.

To make this statement more precise, let i be the initial state and define the total variation distance between the distribution at time k and the stationary distribution π as

Δ_i(k) = ∑_{j∈N} |P^k_ij − π_j|.

The rate of convergence to the stationary distribution is measured using the following quantity, known as the mixing time:

T_mix = max_i min{ k : Δ_i(k') ≤ 1/e for all k' ≥ k }.

The following theorem indicates the relationship between the mixing time of a Markov chain and the second largest eigenvalue modulus of its probability transition matrix [1], [20].

Theorem 1: The mixing time of a reversible Markov chain with transition probability matrix W and second largest eigenvalue modulus μ satisfies

μ ln(2) / (2(1 − μ)) ≤ T_mix ≤ (1 + log n) / (1 − μ).

Therefore, the speed of convergence of the Markov chain to its stationary distribution is determined by the value of 1 − μ, known as the spectral gap; the larger the spectral gap, the faster the convergence. This suggests that, for each iteration k, the dual Newton step w_k can be computed faster on graphs with a large spectral gap.

The above convergence results indicate that the dual Newton step can be computed in a decentralized fashion with a diffusion scheme or, more precisely, by solving a discrete Poisson equation. Since our distributed solution involves an iterative scheme, exact computation is not feasible. In what follows, we show that the Newton method has desirable convergence properties even when the Newton direction is computed with some error, provided that the errors are sufficiently small.

III. INEXACT NEWTON METHOD

In this section, we consider the following convex optimization problem with equality constraints:

minimize f(x)     (4)
subject to Ax = b,

where f : R^n → R is a twice continuously differentiable convex function, and A is an m × n matrix. The minimum cost network optimization problem (1) is a special case of this problem with f(x) = ∑_{e=1}^{E} φ_e(x_e) and A the N × E node-edge incidence matrix. We denote the optimal value of this problem by f*. Throughout this section, we assume that the value f* is finite and problem (4) has an optimal solution, which we denote by x*.

We consider an inexact (infeasible start) Newton method for solving problem (4) (see [4]). In particular, we let y = (x, ν) ∈ R^n × R^m, where x is the primal variable and ν is the dual variable, and study a primal-dual method which updates the vector y_k at iteration k as follows:

y_{k+1} = y_k + α_k d_k,     (5)

where α_k is a positive stepsize, and the vector d_k is an approximate constrained Newton direction given by

Dr(y_k) d_k = −r(y_k) + ε_k.     (6)

Here, the residual function r : R^n × R^m → R^n × R^m is defined as

r(x, ν) = (r_dual(x, ν), r_pri(x, ν)),     (7)

where

r_dual(x, ν) = ∇f(x) + A'ν,     (8)

and

r_pri(x, ν) = Ax − b.     (9)

Moreover, Dr(y) ∈ R^{(n+m)×(n+m)} is the gradient matrix of r evaluated at y, and the vector ε_k is an error vector at iteration k. We assume that the error sequence {ε_k} is uniformly bounded from above, i.e., there exists a scalar ε such that

‖ε_k‖ ≤ ε   for all k ≥ 0.     (10)

We adopt the following standard assumption:

Assumption 1: Let r : R^n × R^m → R^n × R^m be the residual function defined in Eqs. (7)-(9). Then, we have:
(a) (Lipschitz Condition) There exists some constant L > 0 such that ‖Dr(y) − Dr(ȳ)‖ ≤ L‖y − ȳ‖ for all y, ȳ ∈ R^n × R^m.
(b) There exists some constant M > 0 such that ‖Dr(y)^{-1}‖ ≤ M for all y ∈ R^n × R^m.

A. Basic Relation

We use the norm of the residual vector r(y) to measure the progress of the algorithm.
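For reference, the residual of Eqs. (7)-(9) is straightforward to evaluate. The sketch below (ours) does so for the toy network problem, with the gradient ∇f supplied as a function.

```python
def residual(A, b, grad_f, x, nu):
    """Residual r(x, nu) of Eqs. (7)-(9): (grad f(x) + A' nu, Ax - b)."""
    return np.concatenate([grad_f(x) + A.T @ nu, A @ x - b])

grad_f = lambda x: 2 * x                         # gradient for phi_e(x) = x^2
x0, nu0 = np.zeros(A.shape[1]), np.zeros(A.shape[0])
print("initial residual norm:", np.linalg.norm(residual(A, b, grad_f, x0, nu0)))
```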
In the next proposition, we present a relation between the residual norms at successive iterates, which holds for any stepsize rule. The proposition follows from a multidimensional extension of the descent lemma (see [3]).

Proposition 1: Let Assumption 1 hold. Let {y_k} be a sequence generated by the method (5). For any stepsize rule α_k, we have

‖r(y_{k+1})‖ ≤ (1 − α_k)‖r(y_k)‖ + M²Lα_k²‖r(y_k)‖² + α_k‖ε_k‖ + M²Lα_k²‖ε_k‖².

Proof: We consider two vectors w ∈ R^n × R^m and z ∈ R^n × R^m. We let ξ be a scalar parameter and define the function g(ξ) = r(w + ξz). From the chain rule, it follows that ∇g(ξ) = Dr(w + ξz)z. Using the Lipschitz continuity of the
residual function gradient [cf. Assumption 1(a)], we obtain

r(w + z) − r(w) = g(1) − g(0) = ∫_0^1 ∇g(ξ) dξ = ∫_0^1 Dr(w + ξz)z dξ
                = ∫_0^1 (Dr(w + ξz) − Dr(w))z dξ + ∫_0^1 Dr(w)z dξ,

and therefore

‖r(w + z)‖ ≤ ∫_0^1 ‖Dr(w + ξz) − Dr(w)‖ ‖z‖ dξ + ‖r(w) + Dr(w)z‖
           ≤ ∫_0^1 Lξ‖z‖² dξ + ‖r(w) + Dr(w)z‖ = (L/2)‖z‖² + ‖r(w) + Dr(w)z‖.

We apply the preceding relation with w = y_k and z = α_k d_k and obtain

‖r(y_k + α_k d_k)‖ ≤ ‖r(y_k) + α_k Dr(y_k)d_k‖ + (L/2)α_k²‖d_k‖².

By Eq. (6), we have Dr(y_k)d_k = −r(y_k) + ε_k. Substituting this in the previous relation yields

‖r(y_k + α_k d_k)‖ ≤ (1 − α_k)‖r(y_k)‖ + α_k‖ε_k‖ + (L/2)α_k²‖d_k‖².

Moreover, using Assumption 1(b), we have

‖d_k‖ = ‖Dr(y_k)^{-1}(−r(y_k) + ε_k)‖ ≤ ‖Dr(y_k)^{-1}‖(‖r(y_k)‖ + ‖ε_k‖) ≤ M(‖r(y_k)‖ + ‖ε_k‖).

Combining the above relations, we obtain

‖r(y_{k+1})‖ ≤ (1 − α_k)‖r(y_k)‖ + M²Lα_k²‖r(y_k)‖² + α_k‖ε_k‖ + M²Lα_k²‖ε_k‖²,

establishing the desired relation.

B. Inexact Backtracking Stepsize Rule

We use a backtracking stepsize rule in our method to achieve the superlinear local convergence properties of the Newton method. However, this requires computation of the norm of the residual function r(y). In view of the distributed nature of the residual vector r(y) [cf. Eq. (7)], this norm can be computed using a distributed consensus-based scheme. Since this scheme is iterative, in practice the residual norm can only be estimated with some error.

Let {y_k} be a sequence generated by the inexact Newton method (5). At each iteration k, we assume that we can compute the norm ‖r(y_k)‖ with some error, i.e., we can compute a scalar n_k that satisfies

|n_k − ‖r(y_k)‖| ≤ γ/2,     (11)

for some constant γ ≥ 0. Hence, n_k is an approximate version of ‖r(y_k)‖, which can be computed, for example, using distributed iterative methods. For fixed scalars σ ∈ (0, 1/2) and β ∈ (0, 1), we set the stepsize α_k equal to α_k = β^{m_k}, where m_k is the smallest nonnegative integer that satisfies

n_{k+1} ≤ (1 − σβ^{m_k}) n_k + B + γ.     (12)

Here, γ is the maximum error in the residual function norm [cf. Eq. (11)], and B is a constant given by

B = ε + M²Lε²,     (13)

where ε is the upper bound on the error sequence in the constrained Newton direction [cf. Eq. (10)] and M and L are the constants in Assumption 1.

C. Global Convergence of the Inexact Newton Method

The next two subsections provide convergence rate estimates for the damped Newton phase and the local convergence phase of the inexact Newton method when there are errors in the Newton direction and the backtracking stepsize.

1) Convergence Rate for Damped Newton Phase: We first show a strict decrease in the norm of the residual function if the errors ε and γ are sufficiently small, as quantified in the following assumption.

Assumption 2: The errors B and γ [cf. Eqs. (13) and (11)] satisfy

B + γ ≤ σβ/(16M²L),

where σ and β are the constants used in the inexact backtracking stepsize rule, and M and L are the constants defined in Assumption 1.

Under this assumption, the next proposition establishes a strict decrease in the norm of the residual function as long as ‖r(y_k)‖ > 1/(2M²L).

Proposition 2: Let Assumptions 1 and 2 hold. Let {y_k} be a sequence generated by the method (5) when the stepsize sequence {α_k} is selected using the inexact backtracking stepsize rule [cf. Eq. (12)]. Assume that ‖r(y_k)‖ > 1/(2M²L). Then, we have

‖r(y_{k+1})‖ ≤ ‖r(y_k)‖ − σβ/(16M²L).

Proof: For any k, we define ᾱ_k = 1/(2M²L(n_k + γ/2)). In view of the condition on n_k [cf. Eq. (11)], we have

1/(2M²L(‖r(y_k)‖ + γ)) ≤ ᾱ_k ≤ 1/(2M²L‖r(y_k)‖) < 1,     (14)

where the last inequality follows by the assumption ‖r(y_k)‖ > 1/(2M²L). Using the preceding relation and substituting α_k = ᾱ_k in the basic relation in Proposition 1, we obtain

‖r(y_{k+1})‖ ≤ ‖r(y_k)‖ + ᾱ_k‖ε_k‖ + M²Lᾱ_k²‖ε_k‖² − ᾱ_k‖r(y_k)‖(1 − M²Lᾱ_k‖r(y_k)‖)
            ≤ ‖r(y_k)‖ + ᾱ_k‖ε_k‖ + M²Lᾱ_k²‖ε_k‖² − (ᾱ_k/2)‖r(y_k)‖
            ≤ B + (1 − ᾱ_k/2)‖r(y_k)‖,
where the second inequality follows from the definition of ᾱ_k and the third inequality follows by combining the facts ᾱ_k < 1, ‖ε_k‖ ≤ ε for all k, and the definition of B. The constant σ used in the definition of the inexact backtracking line search satisfies σ ∈ (0, 1/2); therefore, it follows from the preceding relation that

‖r(y_{k+1})‖ ≤ (1 − σᾱ_k)‖r(y_k)‖ + B.

Using condition (11) once again, this implies

n_{k+1} ≤ (1 − σᾱ_k)n_k + B + γ,

showing that the steplength α_k selected by the inexact backtracking line search satisfies α_k ≥ βᾱ_k. From condition (12), we have

n_{k+1} ≤ (1 − σα_k)n_k + B + γ,

which implies

‖r(y_{k+1})‖ ≤ (1 − σβᾱ_k)‖r(y_k)‖ + B + γ.

Combined with Eq. (14), this yields

‖r(y_{k+1})‖ ≤ (1 − σβ/(2M²L(‖r(y_k)‖ + γ)))‖r(y_k)‖ + B + γ.

By Assumption 2, we also have

γ ≤ B + γ ≤ σβ/(16M²L),

which, in view of the assumption ‖r(y_k)‖ > 1/(2M²L), implies that γ ≤ ‖r(y_k)‖. Substituting this in the preceding relation, we obtain

‖r(y_{k+1})‖ ≤ ‖r(y_k)‖ − σβ/(8M²L) + B + γ.

Combined with Assumption 2, this yields the desired result.

The preceding proposition shows that, under the assumptions on the size of the errors, at each iteration we obtain a minimum decrease (in the norm of the residual function) of σβ/(16M²L), as long as ‖r(y_k)‖ > 1/(2M²L). This establishes that we need at most

16M²L‖r(y_0)‖/(σβ)

iterations until we obtain ‖r(y_k)‖ ≤ 1/(2M²L).

2) Convergence Rate for Local Convergence Phase: In this section, we show that when ‖r(y_k)‖ ≤ 1/(2M²L), the inexact backtracking stepsize rule selects a full step α_k = 1 and the norm of the residual function ‖r(y_k)‖ converges quadratically within an error neighborhood, which is a function of the parameters of the problem (as given in Assumption 1) and the error level in the constrained Newton direction.

Proposition 3: Let Assumption 1 hold. Let {y_k} be a sequence generated by the method (5) when the stepsize sequence {α_k} is selected using the inexact backtracking stepsize rule [cf. Eq. (12)]. Assume that there exists some k̄ such that ‖r(y_k̄)‖ ≤ 1/(2M²L). Then, the inexact backtracking stepsize rule selects α_k̄ = 1. We further assume that, for some δ ∈ (0, 1/2),

B + M²LB² ≤ δ/(4M²L),

where B is the constant defined in Eq. (13). Then, we have

‖r(y_{k̄+m})‖ ≤ 1/(2^{m+1}M²L) + B + δ(2^m − 2)/(2^{m+2}M²L)     (15)

for all m > 0. As a particular consequence, we obtain

lim sup_{m→∞} ‖r(y_{k̄+m})‖ ≤ B + δ/(4M²L).

Proof: We first show that if ‖r(y_k)‖ ≤ 1/(2M²L) for some k ≥ 0, then the inexact backtracking stepsize rule selects α_k = 1. Replacing α_k = 1 in the basic relation of Proposition 1 and using the definition of the constant B, we obtain

‖r(y_{k+1})‖ ≤ M²L‖r(y_k)‖² + B ≤ (1/2)‖r(y_k)‖ + B ≤ (1 − σ)‖r(y_k)‖ + B,

where, to get the last inequality, we used the fact that the constant σ used in the inexact backtracking stepsize rule satisfies σ ∈ (0, 1/2). Using the condition on n_k [cf. Eq. (11)], this yields

n_{k+1} ≤ (1 − σ)n_k + B + γ,

showing that the steplength α_k = 1 satisfies condition (12) in the inexact backtracking stepsize rule.

We next show Eq. (15) using induction on m. Using α_k̄ = 1 in the basic relation of Proposition 1, we obtain

‖r(y_{k̄+1})‖ ≤ M²L‖r(y_k̄)‖² + B ≤ 1/(4M²L) + B,

where the second inequality follows from the assumption ‖r(y_k̄)‖ ≤ 1/(2M²L). This establishes relation (15) for m = 1. We next assume that (15) holds for some m > 0, and show that it also holds for m + 1. Eq. (15) implies that

‖r(y_{k̄+m})‖ ≤ 1/(4M²L) + B + δ/(4M²L).

Using the assumption B + M²LB² ≤ δ/(4M²L), this yields

‖r(y_{k̄+m})‖ ≤ 1/(4M²L) + δ/(4M²L) + δ/(4M²L) < 1/(2M²L),

where the strict inequality follows from δ ∈ (0, 1/2). Hence, the inexact backtracking stepsize rule selects α_{k̄+m} = 1. Using α_{k̄+m} = 1 in the basic relation, we obtain

M²L‖r(y_{k̄+m+1})‖ ≤ (M²L‖r(y_{k̄+m})‖)² + M²LB.

Using Eq. (15), this implies that

M²L‖r(y_{k̄+m+1})‖ ≤ (1/2^{m+1} + M²LB + δ(2^m − 2)/2^{m+2})² + M²LB.

Using algebraic manipulations and the assumption B + M²LB² ≤ δ/(4M²L), this yields

‖r(y_{k̄+m+1})‖ ≤ 1/(2^{m+2}M²L) + B + δ(2^{m+1} − 2)/(2^{m+3}M²L),

completing the induction and therefore the proof of relation (15). Taking the limit superior in Eq. (15) establishes the final result.
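To summarize the method analyzed in this section, the following centralized sketch (ours; the direction and the residual norm are computed exactly here, standing in for the truncated consensus computations, so ε_k = 0 and γ = 0 unless set otherwise) implements the update (5)-(6) with the backtracking rule (12) on the toy instance.

```python
def inexact_newton(A, b, grad_f, hess_f, iters=10, sigma=0.25, beta=0.5, B=0.0, gamma=0.0):
    """Infeasible-start primal-dual Newton method (5)-(6) with backtracking rule (12)."""
    n_nodes, n_edges = A.shape
    x, nu = np.zeros(n_edges), np.zeros(n_nodes)
    for _ in range(iters):
        r = residual(A, b, grad_f, x, nu)
        Dr = np.block([[np.diag(hess_f(x)), A.T],
                       [A, np.zeros((n_nodes, n_nodes))]])
        d = np.linalg.solve(Dr, -r)              # exact direction, i.e. Eq. (6) with eps_k = 0
        n_k = np.linalg.norm(r)                  # exact stand-in for the estimate n_k
        alpha = 1.0
        for _ in range(30):                      # backtracking: smallest m with condition (12)
            x_try, nu_try = x + alpha * d[:n_edges], nu + alpha * d[n_edges:]
            n_next = np.linalg.norm(residual(A, b, grad_f, x_try, nu_try))
            if n_next <= (1 - sigma * alpha) * n_k + B + gamma:
                break
            alpha *= beta
        x, nu = x + alpha * d[:n_edges], nu + alpha * d[n_edges:]
    return x, nu

x_nt, nu_nt = inexact_newton(A, b, grad_f, hess_f=lambda x: 2 * np.ones_like(x))
print("final residual norm:", np.linalg.norm(residual(A, b, grad_f, x_nt, nu_nt)))
```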

IV. SIMULATION RESULTS

Our simulation results demonstrate that the decentralized Newton method significantly outperforms the dual subgradient algorithm in terms of runtime. Simulations were conducted as follows: network flow problems with conservation constraints and the cost function Φ(x) = ∑_{e=1}^{E} φ_e(x_e), where φ_e(x_e) = (x_e)^4, were generated on Erdős–Rényi random graphs of several sizes with an expected node degree np = 5. (The choice of cost function is motivated by the Kuramoto model of coupled nonlinear oscillators [11].) We limit the simulations to optimization problems which are well behaved in the sense that the Hessian matrix remains well conditioned, i.e., the ratio λ_max/λ_min stays bounded. For this subclass of problems, the runtime of the Newton's method algorithm is significantly less than that of the subgradient method for all trials. Note that the stopping criterion is also tested in a distributed manner for both algorithms.

[Fig. 1. Runtime histogram for the Newton and subgradient algorithms on random graphs with expected node degree 5.]
[Fig. 2. Sample convergence trajectory (f(x), ‖Ax − b‖, and ‖x − x*‖ versus iteration) for a random graph with expected node degree 5, comparing the subgradient and Newton algorithms.]
[Fig. 3. Average runtime of the two algorithms as a function of the number of nodes.]

In any particular experiment, the Newton's method algorithm discovers the feasible direction in one iteration and proceeds to find the optimal solution in two to four additional iterations. One such experiment is shown in Figure 2. A sample runtime distribution over repeated trials is presented in Figure 1. That Newton's method outperforms the subgradient scheme in iteration count is, of course, not surprising. Perhaps the surprising fact is that Newton's method also outperforms the subgradient scheme in runtime, despite the fact that the computation of the dual Newton step is itself based on solving a discrete Poisson equation (essentially a diffusion with input). On average, the Newton's method terminates in less than half of the subgradient runtime and exhibits a tighter variance. This behavior is representative across the different numbers of nodes: as shown in Figure 3, the Newton's method algorithm completes in significantly less time on average for all of the graphs evaluated.

We also tested the performance of the proposed method on graphs with different connectivity properties. In particular, we considered a complete (fully connected) graph and a sparse graph.
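As a rough numerical companion to this comparison (ours, with unit edge weights purely for illustration, reusing the incidence_matrix and slem helpers sketched in Section II), the snippet below contrasts the second largest eigenvalue modulus of P for a complete graph and a sparse ring graph: the complete graph has a much larger spectral gap, consistent with the faster computation of the dual Newton step observed in the experiments.

```python
def complete_graph_edges(n):
    return [(i, j) for i in range(n) for j in range(i + 1, n)]

def ring_graph_edges(n):
    return [(i, (i + 1) % n) for i in range(n)]

n = 20
for name, edge_list in [("complete", complete_graph_edges(n)), ("ring (sparse)", ring_graph_edges(n))]:
    A_g = incidence_matrix(n, edge_list)
    mu = slem(A_g, np.ones(len(edge_list)))      # unit weights: hess_diag = 1 on every edge
    print(f"{name:14s}  mu(P) = {mu:.3f}  spectral gap = {1 - mu:.3f}")
```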

[Fig. 4. Sample convergence trajectory for a complete graph (subgradient vs. Newton).]
[Fig. 5. Sample convergence trajectory for a sparse graph (subgradient vs. Newton).]

Figures 4 and 5 compare the performance of the subgradient scheme with the Newton method on these two graphs. The Newton method outperforms the subgradient scheme in both cases. As expected, the performance gains are more significant for the complete graph, since the dual Newton step can be computed efficiently on graphs with a large spectral gap.

V. CONCLUSIONS

This paper develops a distributed Newton-type method for solving minimum cost network optimization problems that can achieve a superlinear convergence rate within some error neighborhood. We show that, due to the sparsity structure of the incidence matrix of a network, the computation of the dual Newton step can be performed by solving a discrete Poisson equation. This enables using distributed consensus schemes to compute the dual Newton direction. We show that even when the Newton direction and stepsize are computed with some error, the method achieves a superlinear convergence rate to an error neighborhood. Our simulation experiments on different graphs suggest the superiority of the proposed Newton scheme over standard dual subgradient methods. In ongoing work, we extend this approach to Network Utility Maximization (NUM) problems. Furthermore, we believe the new results in [7] will enable us to solve the discrete Poisson equation for the dual Newton step using modified PageRank algorithms in a more scalable fashion [6].

REFERENCES

[1] D. Aldous, "Some inequalities for reversible Markov chains," Journal of the London Mathematical Society 25 (1982).
[2] A. Berman and R. J. Plemmons, Nonnegative Matrices in the Mathematical Sciences, Academic Press, New York, 1979.
[3] D. P. Bertsekas, A. Nedić, and A. E. Ozdaglar, Convex Analysis and Optimization, Athena Scientific, Cambridge, Massachusetts, 2003.
[4] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, UK, 2004.
[5] M. Chiang, S. H. Low, A. R. Calderbank, and J. C. Doyle, "Layering as optimization decomposition: A mathematical theory of network architectures," Proceedings of the IEEE 95 (2007), no. 1.
[6] F. Chung, "PageRank as a discrete Green's function," preprint.
[7] F. Chung, "The heat kernel as the PageRank of a graph," Proceedings of the National Academy of Sciences 104 (2007), no. 50.
[8] R. Dembo, S. Eisenstat, and T. Steihaug, "Inexact Newton methods," SIAM Journal on Numerical Analysis 19 (1982).
[9] R. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, New York, 1985.
[10] A. Jadbabaie, J. Lin, and A. S. Morse, "Coordination of groups of mobile autonomous agents using nearest neighbor rules," IEEE Transactions on Automatic Control 48 (2003), no. 6, 988-1001.
[11] A. Jadbabaie, N. Motee, and M. Barahona, "On the stability of the Kuramoto model of coupled nonlinear oscillators," Proceedings of the American Control Conference, June 2004.
[12] C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations, SIAM, Philadelphia, PA, 1995.
[13] F. P. Kelly, A. K. Maulloo, and D. K. Tan, "Rate control for communication networks: shadow prices, proportional fairness, and stability," Journal of the Operational Research Society 49 (1998), 237-252.
[14] S. Low and D. E. Lapsley, "Optimization flow control, I: Basic algorithm and convergence," IEEE/ACM Transactions on Networking 7 (1999), no. 6.
[15] A. Nedić and A.
Ozdaglar, "Approximate primal solutions and rate analysis for dual subgradient methods," SIAM Journal on Optimization, forthcoming, 2008.
[16] A. Nedić and A. Ozdaglar, "Subgradient methods in network resource allocation: Rate analysis," Proceedings of CISS, 2008.
[17] R. Olfati-Saber and R. M. Murray, "Consensus problems in networks of agents with switching topology and time-delays," IEEE Transactions on Automatic Control 49 (2004), no. 9, 1520-1533.
[18] A. Olshevsky and J. N. Tsitsiklis, "Convergence rates in distributed consensus averaging," Proceedings of IEEE CDC, 2006.
[19] A. Olshevsky and J. N. Tsitsiklis, "Convergence speed in distributed consensus and averaging," SIAM Journal on Control and Optimization, forthcoming, 2006.
[20] A. Sinclair, "Improved bounds for mixing rates of Markov chains and multicommodity flow," Combinatorics, Probability and Computing 1 (1992).
[21] R. Srikant, The Mathematics of Internet Congestion Control, Birkhäuser, 2004.


1520 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 49, NO. 9, SEPTEMBER Reza Olfati-Saber, Member, IEEE, and Richard M. Murray, Member, IEEE 1520 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 49, NO. 9, SEPTEMBER 2004 Consensus Problems in Networks of Agents With Switching Topology and Time-Delays Reza Olfati-Saber, Member, IEEE, and Richard

More information

Randomized Gossip Algorithms

Randomized Gossip Algorithms 2508 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 52, NO 6, JUNE 2006 Randomized Gossip Algorithms Stephen Boyd, Fellow, IEEE, Arpita Ghosh, Student Member, IEEE, Balaji Prabhakar, Member, IEEE, and Devavrat

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

WAITING FOR A BAT TO FLY BY (IN POLYNOMIAL TIME)

WAITING FOR A BAT TO FLY BY (IN POLYNOMIAL TIME) WAITING FOR A BAT TO FLY BY (IN POLYNOMIAL TIME ITAI BENJAMINI, GADY KOZMA, LÁSZLÓ LOVÁSZ, DAN ROMIK, AND GÁBOR TARDOS Abstract. We observe returns of a simple random wal on a finite graph to a fixed node,

More information

11. Equality constrained minimization

11. Equality constrained minimization Convex Optimization Boyd & Vandenberghe 11. Equality constrained minimization equality constrained minimization eliminating equality constraints Newton s method with equality constraints infeasible start

More information

On Quantized Consensus by Means of Gossip Algorithm Part I: Convergence Proof

On Quantized Consensus by Means of Gossip Algorithm Part I: Convergence Proof Submitted, 9 American Control Conference (ACC) http://www.cds.caltech.edu/~murray/papers/8q_lm9a-acc.html On Quantized Consensus by Means of Gossip Algorithm Part I: Convergence Proof Javad Lavaei and

More information

3.10 Lagrangian relaxation

3.10 Lagrangian relaxation 3.10 Lagrangian relaxation Consider a generic ILP problem min {c t x : Ax b, Dx d, x Z n } with integer coefficients. Suppose Dx d are the complicating constraints. Often the linear relaxation and the

More information

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss

More information

Computational Finance

Computational Finance Department of Mathematics at University of California, San Diego Computational Finance Optimization Techniques [Lecture 2] Michael Holst January 9, 2017 Contents 1 Optimization Techniques 3 1.1 Examples

More information

Newton s Method for Constrained Norm Minimization and Its Application to Weighted Graph Problems

Newton s Method for Constrained Norm Minimization and Its Application to Weighted Graph Problems Newton s Method for Constrained Norm Minimization and Its Application to Weighted Graph Problems Mahmoud El Chamie 1 Giovanni Neglia 1 Abstract Due to increasing computer processing power, Newton s method

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

Markov Chains and Spectral Clustering

Markov Chains and Spectral Clustering Markov Chains and Spectral Clustering Ning Liu 1,2 and William J. Stewart 1,3 1 Department of Computer Science North Carolina State University, Raleigh, NC 27695-8206, USA. 2 nliu@ncsu.edu, 3 billy@ncsu.edu

More information

Convergence rates for distributed stochastic optimization over random networks

Convergence rates for distributed stochastic optimization over random networks Convergence rates for distributed stochastic optimization over random networs Dusan Jaovetic, Dragana Bajovic, Anit Kumar Sahu and Soummya Kar Abstract We establish the O ) convergence rate for distributed

More information

On Quantized Consensus by Means of Gossip Algorithm Part II: Convergence Time

On Quantized Consensus by Means of Gossip Algorithm Part II: Convergence Time Submitted, 009 American Control Conference (ACC) http://www.cds.caltech.edu/~murray/papers/008q_lm09b-acc.html On Quantized Consensus by Means of Gossip Algorithm Part II: Convergence Time Javad Lavaei

More information

IPAM Summer School Optimization methods for machine learning. Jorge Nocedal

IPAM Summer School Optimization methods for machine learning. Jorge Nocedal IPAM Summer School 2012 Tutorial on Optimization methods for machine learning Jorge Nocedal Northwestern University Overview 1. We discuss some characteristics of optimization problems arising in deep

More information

Dual Proximal Gradient Method

Dual Proximal Gradient Method Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method

More information

Primal-dual Subgradient Method for Convex Problems with Functional Constraints

Primal-dual Subgradient Method for Convex Problems with Functional Constraints Primal-dual Subgradient Method for Convex Problems with Functional Constraints Yurii Nesterov, CORE/INMA (UCL) Workshop on embedded optimization EMBOPT2014 September 9, 2014 (Lucca) Yu. Nesterov Primal-dual

More information

Lecture Notes: Geometric Considerations in Unconstrained Optimization

Lecture Notes: Geometric Considerations in Unconstrained Optimization Lecture Notes: Geometric Considerations in Unconstrained Optimization James T. Allison February 15, 2006 The primary objectives of this lecture on unconstrained optimization are to: Establish connections

More information

Consensus-Based Distributed Optimization with Malicious Nodes

Consensus-Based Distributed Optimization with Malicious Nodes Consensus-Based Distributed Optimization with Malicious Nodes Shreyas Sundaram Bahman Gharesifard Abstract We investigate the vulnerabilities of consensusbased distributed optimization protocols to nodes

More information

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth

More information

WE consider an undirected, connected network of n

WE consider an undirected, connected network of n On Nonconvex Decentralized Gradient Descent Jinshan Zeng and Wotao Yin Abstract Consensus optimization has received considerable attention in recent years. A number of decentralized algorithms have been

More information

A DECOMPOSITION PROCEDURE BASED ON APPROXIMATE NEWTON DIRECTIONS

A DECOMPOSITION PROCEDURE BASED ON APPROXIMATE NEWTON DIRECTIONS Working Paper 01 09 Departamento de Estadística y Econometría Statistics and Econometrics Series 06 Universidad Carlos III de Madrid January 2001 Calle Madrid, 126 28903 Getafe (Spain) Fax (34) 91 624

More information

Maria Cameron. f(x) = 1 n

Maria Cameron. f(x) = 1 n Maria Cameron 1. Local algorithms for solving nonlinear equations Here we discuss local methods for nonlinear equations r(x) =. These methods are Newton, inexact Newton and quasi-newton. We will show that

More information

On Optimal Frame Conditioners

On Optimal Frame Conditioners On Optimal Frame Conditioners Chae A. Clark Department of Mathematics University of Maryland, College Park Email: cclark18@math.umd.edu Kasso A. Okoudjou Department of Mathematics University of Maryland,

More information

Hegselmann-Krause Dynamics: An Upper Bound on Termination Time

Hegselmann-Krause Dynamics: An Upper Bound on Termination Time Hegselmann-Krause Dynamics: An Upper Bound on Termination Time B. Touri Coordinated Science Laboratory University of Illinois Urbana, IL 680 touri@illinois.edu A. Nedić Industrial and Enterprise Systems

More information