A SIMPLE PARALLEL ALGORITHM WITH AN O(1/T) CONVERGENCE RATE FOR GENERAL CONVEX PROGRAMS


HAO YU AND MICHAEL J. NEELY

Abstract. This paper considers convex programs with a general (possibly non-differentiable) convex objective function and Lipschitz continuous convex inequality constraint functions. A simple algorithm is developed and achieves an $O(1/t)$ convergence rate. Similar to the classical dual subgradient algorithm and the ADMM algorithm, the new algorithm has a parallel implementation when the objective and constraint functions are separable. However, the new algorithm has a faster $O(1/t)$ convergence rate compared with the best known $O(1/\sqrt{t})$ convergence rate for the dual subgradient algorithm with primal averaging. Further, it can solve convex programs with nonlinear constraints, which cannot be handled by the ADMM algorithm. The new algorithm is applied to a multipath network utility maximization problem and yields a decentralized flow control algorithm with the fast $O(1/t)$ convergence rate.

Key words. convex programs, parallel algorithms, convergence rates

AMS subject classifications. 90C25, 90C30

1. Introduction. Fix positive integers $n$ and $m$. Consider the general convex program:

(1)   minimize    $f(x)$
(2)   subject to  $g_k(x) \le 0, \quad \forall k \in \{1, 2, \dots, m\}$,
(3)               $x \in \mathcal{X}$,

where the set $\mathcal{X} \subseteq \mathbb{R}^n$ is a closed convex set; the function $f(x)$ is continuous and convex on $\mathcal{X}$; and the functions $g_k(x), k \in \{1, 2, \dots, m\}$, are convex and Lipschitz continuous on $\mathcal{X}$. Note that the functions $f(x), g_1(x), \dots, g_m(x)$ are not necessarily differentiable. Denote the stacked vector of the functions $g_1(x), g_2(x), \dots, g_m(x)$ as $\mathbf{g}(x) = [g_1(x), g_2(x), \dots, g_m(x)]^T$. The Lipschitz continuity of each $g_k(x)$ implies that $\mathbf{g}(x)$ is Lipschitz continuous on $\mathcal{X}$. Throughout this paper, $\|\cdot\|$ denotes the vector Euclidean norm, and Lipschitz continuity is defined with respect to the Euclidean norm. The following assumptions are imposed on the convex program (1)-(3):

Assumption 1 (Basic Assumptions).
There exists a (possibly non-unique) optimal solution $x^* \in \mathcal{X}$ that solves the convex program (1)-(3). There exists a constant $\beta$ such that $\|\mathbf{g}(x_1) - \mathbf{g}(x_2)\| \le \beta \|x_1 - x_2\|$ for all $x_1, x_2 \in \mathcal{X}$, i.e., the Lipschitz continuous function $\mathbf{g}(x)$ has modulus $\beta$.

Assumption 2 (Existence of Lagrange Multipliers). The convex program (1)-(3) has Lagrange multipliers attaining strong duality. That is, there exists a Lagrange multiplier vector $\lambda^* = [\lambda_1^*, \lambda_2^*, \dots, \lambda_m^*]^T \ge \mathbf{0}$ such that

$q(\lambda^*) = \min_{x \in \mathcal{X}} \{ f(x) : g_k(x) \le 0, \forall k \in \{1, 2, \dots, m\} \}$,

where $q(\lambda) = \min_{x \in \mathcal{X}} \{ f(x) + \sum_{k=1}^m \lambda_k g_k(x) \}$ is the Lagrangian dual function of the problem (1)-(3).

[Footnote: Using the methodology initiated in the current paper, we further developed a different primal-dual type algorithm with the same $O(1/t)$ convergence rate in the extended work [4].]

Department of Electrical Engineering, University of Southern California, Los Angeles, CA (yuhao@usc.edu, mjneely@usc.edu).
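Assumption 2 can be made concrete on a small instance. The sketch below (a hypothetical toy example, not from the paper) takes $f(x) = x^2$, a single constraint $g(x) = 1 - x \le 0$, and $\mathcal{X} = [0, 10]$, so that $x^* = 1$ and $f(x^*) = 1$; it evaluates the dual function $q(\lambda)$ on a grid and checks that its maximum equals $f(x^*)$, i.e., that a Lagrange multiplier ($\lambda^* = 2$ here) attains strong duality.

```python
# Numerical check of Assumption 2 (strong duality) on a hypothetical toy
# instance: minimize f(x) = x^2 subject to g(x) = 1 - x <= 0 over X = [0, 10].
# The dual function q(lam) = min_{x in [0,10]} { x^2 + lam*(1 - x) } is
# attained at x = lam/2 (clipped to the interval [0, 10]).

def q(lam):
    x = min(max(lam / 2.0, 0.0), 10.0)
    return x * x + lam * (1.0 - x)

# Maximize q over a grid of multipliers; the maximum sits at lambda* = 2
# with q(lambda*) = 1 = f(x*), so strong duality holds for this instance.
grid = [4.0 * i / 4000 for i in range(4001)]
q_max = max(q(lam) for lam in grid)
lam_best = max(grid, key=q)
```

By weak duality $q(\lambda) \le f(x^*)$ for every $\lambda \ge 0$, so the grid maximum touching $f(x^*)$ certifies that this instance satisfies Assumption 2.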

Assumption 2 is a mild assumption. For convex programs, Assumption 2 is implied by the existence of a vector $s \in \mathcal{X}$ such that $g_k(s) < 0$ for all $k \in \{1, \dots, m\}$, called the Slater condition [1, 5]. However, there are convex programs where Assumption 2 holds but the Slater condition does not hold...

1.1. New Algorithm. Consider the following algorithm described in Algorithm 1. The algorithm computes vectors $x(t) \in \mathcal{X}$ for iterations $t \in \{0, 1, 2, \dots\}$. Define the average over the first $t > 0$ iterations as $\bar{x}(t) = \frac{1}{t}\sum_{\tau=0}^{t-1} x(\tau)$. The algorithm uses an initial guess vector that is represented as $x(-1)$ and that is chosen as any vector in $\mathcal{X}$. The algorithm also uses vector variables $Q(t) = [Q_1(t), \dots, Q_m(t)]^T$ in the computations. The main result of this paper is that, whenever the parameter $\alpha$ is chosen to satisfy $\alpha \ge \beta^2/2$, the vector $\bar{x}(t)$ closely approximates a solution to the convex program and has an approximation error that decays like $O(1/t)$.

Algorithm 1. Let $\alpha > 0$ be a constant parameter. Choose any $x(-1) \in \mathcal{X}$. Initialize $Q_k(0) = \max\{0, -g_k(x(-1))\}, \forall k \in \{1, 2, \dots, m\}$. At each iteration $t \in \{0, 1, 2, \dots\}$, observe $x(t-1)$ and $Q(t)$ and do the following:
- Choose $x(t)$ as
  $x(t) = \operatorname{argmin}_{x \in \mathcal{X}} \big\{ f(x) + [Q(t) + \mathbf{g}(x(t-1))]^T \mathbf{g}(x) + \alpha \|x - x(t-1)\|^2 \big\}$.
- Update the virtual queues via
  $Q_k(t+1) = \max\{ -g_k(x(t)), Q_k(t) + g_k(x(t)) \}, \quad \forall k \in \{1, 2, \dots, m\}$.
- Update the averages $\bar{x}(t)$ via
  $\bar{x}(t+1) = \frac{1}{t+1}\sum_{\tau=0}^{t} x(\tau) = \bar{x}(t)\frac{t}{t+1} + x(t)\frac{1}{t+1}$.

The variables $x(t)$ are called primal variables. The variables $Q(t)$ can be viewed as dual variables because they have a close connection to Lagrange multipliers. The variables $Q(t)$ shall be called virtual queues because their update rule resembles a queueing equation. The virtual queue update used by Algorithm 1 is related to traditional virtual queue and dual variable update rules. However, there is an important difference. A traditional update rule is $Q_k(t+1) = \max\{Q_k(t) + g_k(x(t)), 0\}$ [3, 6]. In contrast, the new algorithm takes a max with $-g_k(x(t))$, rather than a max with $0$.
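The three steps above can be sketched numerically. The following is a minimal toy instance (hypothetical, not an experiment from the paper): minimize $f(x) = x^2$ subject to $g(x) = 1 - x \le 0$ over $\mathcal{X} = [0, 10]$, whose optimum is $x^* = 1$. Here $g$ has Lipschitz modulus $\beta = 1$, so any $\alpha \ge \beta^2/2 = 0.5$ qualifies; the primal argmin has a closed form for this quadratic instance.

```python
# Minimal numerical sketch of Algorithm 1 on a hypothetical toy instance:
# minimize f(x) = x^2 subject to g(x) = 1 - x <= 0 over X = [0, 10].

def f(x):
    return x * x

def g(x):
    return 1.0 - x

alpha = 1.0                      # any alpha >= beta^2/2 = 0.5 works here
x_prev = 0.0                     # initial guess x(-1) in X
Q = max(0.0, -g(x_prev))         # Q(0) = max{0, -g(x(-1))}
xbar, T = 0.0, 2000

for t in range(T):
    # Primal update: argmin_{x in [0,10]} x^2 + w*(1 - x) + alpha*(x - x_prev)^2
    # with weight w = Q(t) + g(x(t-1)); closed form for this instance:
    w = Q + g(x_prev)
    x = (w + 2.0 * alpha * x_prev) / (2.0 + 2.0 * alpha)
    x = min(max(x, 0.0), 10.0)
    # Virtual queue update: Q(t+1) = max{-g(x(t)), Q(t) + g(x(t))}.
    Q = max(-g(x), Q + g(x))
    # Running average: xbar(t+1) = xbar(t)*t/(t+1) + x(t)/(t+1).
    xbar = xbar * t / (t + 1) + x / (t + 1)
    x_prev = x
```

On this instance the average $\bar{x}(t)$ approaches $x^* = 1$, with objective error and constraint violation both decaying like $O(1/t)$, consistent with the main result stated above.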
The primal update rule for $x(t)$ is also new and is discussed in more detail in the next subsection. Algorithm 1 has the following desirable property: if the functions $f(x)$ and $\mathbf{g}(x)$ are separable with respect to components or blocks of $x$, then the primal update for $x(t)$ can be decomposed into several smaller independent subproblems, each of which only involves a component or block of $x(t)$...

1.2. The Dual Subgradient Algorithm and the Drift-Plus-Penalty Algorithm. The dual subgradient algorithm is a well-known iterative technique that approaches optimality for certain strictly convex problems [3]. A modification of the dual subgradient algorithm that averages the resulting sequence of primal estimates

is known to solve general convex programs (without requiring strict convexity) and provides an $O(1/\sqrt{t})$ convergence rate [5, 3, 7], where $t$ is the number of iterations. Work [] improves the convergence rate of the dual subgradient algorithm to $O(1/t)$ in the special case when the objective function $f(x)$ is strongly convex and second-order differentiable and the constraint functions $g_k(x)$ are second-order differentiable and have bounded Jacobians. A recent work [3] shows that the convergence rate of the dual subgradient algorithm with primal averaging is $O(1/t)$ without requiring the differentiability of $f(x)$ and $g_k(x)$, but still requiring the strong convexity of $f(x)$. (Further improvements are also possible under more restrictive assumptions; see [3].) The dual subgradient algorithm with primal averaging is also called the Drift-Plus-Penalty (DPP) algorithm. This is because it is a special case of a stochastic optimization procedure that minimizes a drift expression for a quadratic Lyapunov function [6]. One advantage of these dual subgradient and drift approaches is that the computations required at every iteration are simple and can yield parallel algorithms when $f(x)$ and $g_k(x)$ are separable. Algorithm 1 developed in the current paper maintains this simplicity on every iteration, but provides fast $O(1/t)$ convergence for general convex programs, without requiring strict convexity or strong convexity. For example, the algorithm has the $O(1/t)$ convergence rate in the special case of linear $f(x)$. Algorithm 1 is similar to a DPP algorithm, or equivalently, a classic dual subgradient algorithm with primal averaging, with the following distinctions:
1. The Lagrange multiplier ("virtual queue") update equation for $Q_k(t)$ is modified to take a max with $-g_k(x(t))$, rather than simply project onto the nonnegative real numbers.
2. The minimization step augments the $Q_k(t)$ weights with the $g_k(x(t-1))$ values obtained on the previous step.
These $g_k(x(t-1))$ quantities, when multiplied by the constraint functions $g_k(x)$, yield a cross-product term in the primal update. This cross term, together with another newly introduced quadratic term in the primal update, can cancel a quadratic term in an upper bound of the Lyapunov drift, so that a finer analysis of the drift-plus-penalty expression leads to the fast $O(1/t)$ convergence rate.
3. A quadratic term, which is similar to a term used in proximal algorithms [9], is introduced. This provides a strong convexity "pushback". The pushback is not sufficient to alone cancel the main drift components, but it cancels residual components introduced by the new $g_k(x(t-1))$ weights.

1.3. The ADMM Algorithm. The Alternating Direction Method of Multipliers (ADMM) is an algorithm used to solve linear equality constrained convex programs in the following form:

(4)   minimize    $f_1(x) + f_2(y)$
(5)   subject to  $Ax + By = c$,
(6)               $x \in \mathcal{X}, \; y \in \mathcal{Y}$.

[Footnote: In [5, 3, 7], the dual subgradient algorithm with primal averaging is proven to achieve an $\epsilon$-approximate solution with $O(1/\epsilon^2)$ iterations by using an $O(\epsilon)$ step size. We say that the dual subgradient algorithm with primal averaging has an $O(1/\sqrt{t})$ convergence rate because an algorithm with $O(1/\sqrt{t})$ convergence requires the same $O(1/\epsilon^2)$ iterations to yield an $\epsilon$-approximate solution. However, the dual subgradient algorithm in [5, 3, 7] does not have vanishing errors as does an algorithm with $O(1/\sqrt{t})$ convergence.]
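For contrast with Algorithm 1, the classical dual subgradient algorithm with primal averaging can be sketched on the same kind of toy instance (a hypothetical example, not from the paper): minimize $x^2$ subject to $1 - x \le 0$ over $[0, 10]$, with a small constant step size standing in for the $O(\epsilon)$ step-size rule mentioned in the footnote.

```python
# Dual subgradient with primal averaging (DPP-style) on a hypothetical toy
# instance: minimize f(x) = x^2 subject to g(x) = 1 - x <= 0 over X = [0, 10].

c = 0.05          # constant step size (assumed small, O(eps)-style)
lam = 0.0         # dual variable lambda(0)
xbar = 0.0        # running average of primal iterates
T = 5000

for t in range(T):
    # Primal step: x(t) = argmin_{x in [0,10]} x^2 + lam*(1 - x) = lam/2, clipped.
    x = min(max(lam / 2.0, 0.0), 10.0)
    # Dual step: traditional rule, projecting onto the nonnegative reals.
    lam = max(0.0, lam + c * (1.0 - x))
    # Primal averaging.
    xbar = xbar * t / (t + 1) + x / (t + 1)
```

The averaged iterate approaches $x^* = 1$, but only at the slower rate tied to the step size, which is the behavior the comparison above attributes to this family of methods.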

Define the augmented Lagrangian as

$L_\rho(x, y, \lambda) = f_1(x) + f_2(y) + \lambda^T (Ax + By - c) + \frac{\rho}{2} \|Ax + By - c\|^2$.

At each iteration $t$, the ADMM algorithm consists of the following steps:
- Update $x(t) = \operatorname{argmin}_{x \in \mathcal{X}} L_\rho(x, y(t-1), \lambda(t-1))$.
- Update $y(t) = \operatorname{argmin}_{y \in \mathcal{Y}} L_\rho(x(t), y, \lambda(t-1))$.
- Update $\lambda(t) = \lambda(t-1) + \rho (Ax(t) + By(t) - c)$.

Thus, the ADMM algorithm yields a distributed algorithm in which the updates of $x$ and $y$ only involve local subproblems, and it is suitable for solving large scale convex programs in machine learning, network scheduling, computational biology and finance [4]. The best known convergence rate of the ADMM algorithm for convex programs with general convex $f_1(\cdot)$ and $f_2(\cdot)$ was recently shown to be $O(1/t)$ [6, 9]. An asynchronous ADMM algorithm with the same $O(1/t)$ convergence rate is studied in []. Note that we can apply Algorithm 1 to solve the problem (4)-(6) after replacing the equality constraint $Ax + By = c$ by the two linear inequality constraints $Ax + By \le c$ and $-(Ax + By) \le -c$. It can be observed that the algorithm yielded by Algorithm 1 is also separable for $x$ and $y$. In fact, the updates of $x$ and $y$ in Algorithm 1 are fully parallel, while the ADMM algorithm updates $x$ and $y$ sequentially. The remainder of this paper shows that the convergence rate of Algorithm 1 is also $O(1/t)$. However, a significant limitation of the ADMM algorithm is that it can only solve problems with linear constraints. In contrast, Algorithm 1 proposed in this paper can solve general convex programs with nonlinear constraints.

1.4. Decentralized Multipath Network Flow Control Problems. Section 4 presents an example application to multipath network flow control problems. The algorithm has a queue-based interpretation that is natural for networks. Prior work on distributed optimization for networks is in [, 5,, 3, ]. It is known that the DPP algorithm achieves $O(1/\sqrt{t})$ convergence for general networks [5], and several algorithms show faster $O(1/t)$ convergence for special classes of strongly convex problems [, 3].
However, multipath network flow problems fundamentally fail to satisfy the strong convexity property because they have routing variables that participate in the constraints but not in the objective function. The algorithm of the current paper does not require strong convexity. It easily solves this multipath scenario with fast $O(1/t)$ convergence. The algorithm also has a simple distributed implementation and allows for general convex but nonlinear constraint functions.

2. Preliminaries and Basic Analysis. This section presents useful preliminaries in convex analysis and important facts about Algorithm 1.

2.1. Preliminaries.

Definition 1 (Lipschitz Continuity). Let $\mathcal{X} \subseteq \mathbb{R}^n$ be a convex set. A function $h : \mathcal{X} \to \mathbb{R}^m$ is said to be Lipschitz continuous on $\mathcal{X}$ with modulus $L$ if there exists $L > 0$ such that $\|h(y) - h(x)\| \le L \|y - x\|$ for all $x, y \in \mathcal{X}$.

Definition 2 (Strongly Convex Functions). Let $\mathcal{X} \subseteq \mathbb{R}^n$ be a convex set. A function $h$ is said to be strongly convex on $\mathcal{X}$ with modulus $\alpha$ if there exists a constant $\alpha > 0$ such that $h(x) - \frac{\alpha}{2}\|x\|^2$ is convex on $\mathcal{X}$.

[Footnote: In fact, we do not need to replace the linear equality constraint with two inequality constraints. We can apply Algorithm 1 to the problem (4)-(6) directly by modifying the virtual queue update equations as $Q_k(t+1) = Q_k(t) + g_k(x(t)), \forall k \in \{1, 2, \dots, m\}$. In this case, a simple adaptation of the convergence rate analysis in this paper can establish the same $O(1/t)$ convergence rate. However, to simplify the presentation, this paper just considers the general convex program in the form (1)-(3), since any linear equality can be equivalently represented by two linear inequalities.]
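For comparison with the parallel updates of Algorithm 1, the sequential ADMM steps from Section 1.3 can be sketched on a minimal scalar instance (hypothetical, not from the paper): minimize $\frac{1}{2}(x-a)^2 + \frac{1}{2}(y-b)^2$ subject to $x - y = 0$ (so $A = 1$, $B = -1$, $c = 0$), whose solution is $x = y = (a+b)/2$. Each argmin of the augmented Lagrangian has a closed form here.

```python
# ADMM on a hypothetical scalar equality-constrained problem:
# minimize 0.5*(x-a)^2 + 0.5*(y-b)^2 subject to x - y = 0.

a, b, rho = 0.0, 4.0, 1.0
x, y, lam = 0.0, 0.0, 0.0

for _ in range(50):
    # x-update: argmin_x 0.5*(x-a)^2 + lam*(x-y) + (rho/2)*(x-y)^2
    x = (a - lam + rho * y) / (1.0 + rho)
    # y-update: argmin_y 0.5*(y-b)^2 - lam*y + (rho/2)*(x-y)^2 (uses the NEW x)
    y = (b + lam + rho * x) / (1.0 + rho)
    # Multiplier update: lam += rho * (A x + B y - c)
    lam = lam + rho * (x - y)
```

Note that the $y$-update consumes the freshly computed $x$, which is exactly the sequential dependence the text contrasts with the fully parallel primal update of Algorithm 1.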

By the definition of strongly convex functions, it is easy to show that if $h(x)$ is convex and $\alpha > 0$, then $h(x) + \alpha\|x - x_0\|^2$ is strongly convex with modulus $2\alpha$ for any constant $x_0$.

Lemma 1 (Theorem 6.1.2 in [7]). Let $h(x)$ be strongly convex on $\mathcal{X}$ with modulus $\alpha$. Let $\partial h(x)$ be the set of all subgradients of $h$ at point $x$. Then $h(y) \ge h(x) + d^T(y - x) + \frac{\alpha}{2}\|y - x\|^2$ for all $x, y \in \mathcal{X}$ and all $d \in \partial h(x)$.

Lemma 2 (Proposition B.24(f) in [3]). Let $\mathcal{X} \subseteq \mathbb{R}^n$ be a convex set. Let the function $h$ be convex on $\mathcal{X}$ and let $x^{opt}$ be a global minimum of $h$ on $\mathcal{X}$. Let $\partial h(x)$ be the set of all subgradients of $h$ at point $x$. Then, there exists $d \in \partial h(x^{opt})$ such that $d^T(x - x^{opt}) \ge 0$ for all $x \in \mathcal{X}$.

Corollary 1. Let $\mathcal{X} \subseteq \mathbb{R}^n$ be a convex set. Let the function $h$ be strongly convex on $\mathcal{X}$ with modulus $\alpha$ and let $x^{opt}$ be a global minimum of $h$ on $\mathcal{X}$. Then, $h(x^{opt}) \le h(x) - \frac{\alpha}{2}\|x^{opt} - x\|^2$ for all $x \in \mathcal{X}$.

Proof. A special case, when $h$ is differentiable and $\mathcal{X} = \mathbb{R}^n$, is Theorem 2.1.8 in [8]. The proof for general $h$ and $\mathcal{X}$ is as follows. Fix $x \in \mathcal{X}$. By Lemma 2, there exists $d \in \partial h(x^{opt})$ such that $d^T(x - x^{opt}) \ge 0$. By Lemma 1, we also have

$h(x) \ge h(x^{opt}) + d^T(x - x^{opt}) + \frac{\alpha}{2}\|x - x^{opt}\|^2 \overset{(a)}{\ge} h(x^{opt}) + \frac{\alpha}{2}\|x - x^{opt}\|^2$,

where (a) follows from the fact that $d^T(x - x^{opt}) \ge 0$.

2.2. Properties of the Virtual Queues.

Lemma 3. In Algorithm 1, we have:
1. At each iteration $t \in \{0, 1, 2, \dots\}$, $Q_k(t) \ge 0$ for all $k \in \{1, 2, \dots, m\}$.
2. At each iteration $t \in \{0, 1, 2, \dots\}$, $Q_k(t) + g_k(x(t-1)) \ge 0$ for all $k \in \{1, \dots, m\}$.
3. At iteration $t = 0$, $\|Q(0)\| \le \|\mathbf{g}(x(-1))\|$. At each iteration $t \in \{1, 2, \dots\}$, $\|Q(t)\| \ge \|\mathbf{g}(x(t-1))\|$.

Proof.
1. Fix $k \in \{1, 2, \dots, m\}$. Note that $Q_k(0) \ge 0$ by the initialization rule $Q_k(0) = \max\{0, -g_k(x(-1))\}$. Assume $Q_k(t) \ge 0$ and consider iteration $t+1$. If $g_k(x(t)) \ge 0$, then $Q_k(t+1) = \max\{-g_k(x(t)), Q_k(t) + g_k(x(t))\} \ge Q_k(t) + g_k(x(t)) \ge 0$. If $g_k(x(t)) < 0$, then $Q_k(t+1) = \max\{-g_k(x(t)), Q_k(t) + g_k(x(t))\} \ge -g_k(x(t)) > 0$. Thus, $Q_k(t+1) \ge 0$. The result follows by induction.
2. Fix $k \in \{1, 2, \dots, m\}$. Note that $Q_k(0) + g_k(x(-1)) \ge 0$ by the initialization rule $Q_k(0) = \max\{0, -g_k(x(-1))\} \ge -g_k(x(-1))$.
For t, by the virtual queue update equation, we have Q k (t) = max{ g k (x(t )), Q k (t ) + g k (x(t ))} g k (x(t )), which implies that Q k (t) + g k (x(t )) For t = 0. Fix k {,,..., m}. Consider the cases g k (x( )) 0 and g k (x( )) < 0 separately. If g k (x( )) 0, then Q k (0) = max{0, g k (x( ))} = 0 and so Q k (0) g k (x( )). If g k (x( )) < 0, then Q k (0) = max{0, g k (x( ))} = g k (x( )). Thus, in both cases, we have Q k (0) g k (x( )). Squaring both sides and summing over k {,,..., m} yields Q(0) g(x( )).

6 For t. Fix k {,,..., m}. Consider the cases g k (x(t )) 0 and g k (x(t )) < 0 separately. If g k (x(t )) 0, then Q k (t) = max{ g k (x(t )), Q k (t ) + g k (x(t ))} Q k (t ) + g k (x(t )) (a) g k (x(t )) = g k (x(t )), where (a) follows from part. If g k (x(t )) < 0, then Q k (t) = max{ g k (x(t )), Q k (t ) + g k (x(t ))} g k (x(t )) = g k (x(t )). Thus, in both cases, we have Q k (t) g k (x(t )). Squaring both sides and summing over k {,,..., m} yields Q(t) g(x(t ))..3. Properties of the Drift. Recall that Q(t) = [ Q (t),..., Q m (t) ] T is the vector of virtual queue backlogs. Define L(t) = Q(t). The function L(t) shall be called a Lyapunov function. Define the Lyapunov drift as (7) (t) = L(t + ) L(t) = [ Q(t + ) Q(t) ]. Lemma 4. At each iteration t {0,,,...} in Algorithm, an upper bound of the Lyapunov drift is given by (8) (t) Q T (t)g(x(t)) + g(x(t)). Proof. The virtual queue update equations Q k (t + ) = max{ g k (x(t)), Q k (t) + g k (x(t))}, k {,,..., m} can be rewritten as (9) where { g k (x(t)) = Q k (t + ) = Q k (t) + g k (x(t)), k {,,..., m}, g k (x(t)), Q k (t) g k (x(t)), if Q k (t) + g k (x(t)) g k (x(t)) else Fix k {,,..., m}. Squaring both sides of (9) and dividing by yields: (Q k(t + )) = (Q k(t)) + ( gk (x(t)) ) + Qk (t) g k (x(t)) = (Q k(t)) + ( gk (x(t)) ) + Qk (t)g k (x(t)) + Q k (t) ( g k (x(t)) g k (x(t)) ) (a) = (Q k(t)) + (x(t)) ) + Qk (t)g k (x(t)) ( gk ( g k (x(t)) + g k (x(t)) )( g k (x(t)) g k (x(t)) ) = (Q k(t)) ( gk (x(t)) ) + Qk (t)g k (x(t)) + ( g k (x(t)) ) (Q k(t)) + Q k (t)g k (x(t)) + ( g k (x(t)) ), k.

where (a) follows from the identity $Q_k(t)\big(\tilde{g}_k(x(t)) - g_k(x(t))\big) = -\big(\tilde{g}_k(x(t)) + g_k(x(t))\big)\big(\tilde{g}_k(x(t)) - g_k(x(t))\big)$, which can be shown by considering the two cases $\tilde{g}_k(x(t)) = g_k(x(t))$ and $\tilde{g}_k(x(t)) = -Q_k(t) - g_k(x(t))$ separately. Summing over $k \in \{1, 2, \dots, m\}$ yields

$\frac{1}{2}\|Q(t+1)\|^2 \le \frac{1}{2}\|Q(t)\|^2 + Q^T(t)\mathbf{g}(x(t)) + \|\mathbf{g}(x(t))\|^2$.

Rearranging the terms yields the desired result.

3. Convergence Rate Analysis of Algorithm 1. This section analyzes the convergence rate of Algorithm 1 for the problem (1)-(3).

3.1. An Upper Bound of the Drift-Plus-Penalty Expression.

Lemma 5. Let $x^*$ be an optimal solution of the problem (1)-(3). If $\alpha \ge \frac{\beta^2}{2}$ in Algorithm 1, then for all $t \ge 0$, we have

$\Delta(t) + f(x(t)) \le f(x^*) + \alpha\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big] + \frac{1}{2}\big[\|\mathbf{g}(x(t))\|^2 - \|\mathbf{g}(x(t-1))\|^2\big]$,

where $\beta$ is defined in Assumption 1.

Proof. Fix $t \ge 0$. Note that Lemma 3 implies that $Q(t) + \mathbf{g}(x(t-1))$ is component-wise nonnegative. Hence, the function $f(x) + [Q(t) + \mathbf{g}(x(t-1))]^T \mathbf{g}(x)$ is convex with respect to $x$ on $\mathcal{X}$. Since $\alpha\|x - x(t-1)\|^2$ is strongly convex with respect to $x$ with modulus $2\alpha$, it follows that $f(x) + [Q(t) + \mathbf{g}(x(t-1))]^T \mathbf{g}(x) + \alpha\|x - x(t-1)\|^2$ is strongly convex with respect to $x$ with modulus $2\alpha$. Since $x(t)$ is chosen to minimize this strongly convex function, by Corollary 1 we have

(10)  $f(x(t)) + [Q(t) + \mathbf{g}(x(t-1))]^T \mathbf{g}(x(t)) + \alpha\|x(t) - x(t-1)\|^2$
$\le f(x^*) + \underbrace{[Q(t) + \mathbf{g}(x(t-1))]^T \mathbf{g}(x^*)}_{\le 0} + \alpha\|x^* - x(t-1)\|^2 - \alpha\|x^* - x(t)\|^2$
$\overset{(a)}{\le} f(x^*) + \alpha\|x^* - x(t-1)\|^2 - \alpha\|x^* - x(t)\|^2$,

where (a) follows by using the facts that $g_k(x^*) \le 0$ for all $k \in \{1, 2, \dots, m\}$ and $Q_k(t) + g_k(x(t-1)) \ge 0$ (i.e., part 2 of Lemma 3) to eliminate the term marked by the underbrace. Note that $u_1^T u_2 = \frac{1}{2}\big[\|u_1\|^2 + \|u_2\|^2 - \|u_1 - u_2\|^2\big]$ for any $u_1, u_2 \in \mathbb{R}^m$. Thus, we have

(11)  $\mathbf{g}(x(t-1))^T \mathbf{g}(x(t)) = \frac{1}{2}\big[\|\mathbf{g}(x(t-1))\|^2 + \|\mathbf{g}(x(t))\|^2 - \|\mathbf{g}(x(t-1)) - \mathbf{g}(x(t))\|^2\big]$.

Substituting (11) into (10) and rearranging terms yields

$f(x(t)) + Q^T(t)\mathbf{g}(x(t)) \le f(x^*) + \alpha\|x^* - x(t-1)\|^2 - \alpha\|x^* - x(t)\|^2 - \alpha\|x(t) - x(t-1)\|^2 - \frac{1}{2}\|\mathbf{g}(x(t-1))\|^2 - \frac{1}{2}\|\mathbf{g}(x(t))\|^2 + \frac{1}{2}\|\mathbf{g}(x(t-1)) - \mathbf{g}(x(t))\|^2$
$\overset{(a)}{\le} f(x^*) + \alpha\|x^* - x(t-1)\|^2 - \alpha\|x^* - x(t)\|^2 + \big(\frac{\beta^2}{2} - \alpha\big)\|x(t) - x(t-1)\|^2 - \frac{1}{2}\|\mathbf{g}(x(t-1))\|^2 - \frac{1}{2}\|\mathbf{g}(x(t))\|^2$
$\overset{(b)}{\le} f(x^*) + \alpha\|x^* - x(t-1)\|^2 - \alpha\|x^* - x(t)\|^2 - \frac{1}{2}\|\mathbf{g}(x(t-1))\|^2 - \frac{1}{2}\|\mathbf{g}(x(t))\|^2$,

where (a) follows from $\|\mathbf{g}(x(t-1)) - \mathbf{g}(x(t))\| \le \beta\|x(t) - x(t-1)\|$, which in turn follows from the assumption that $\mathbf{g}(x)$ is Lipschitz continuous with modulus $\beta$; and (b) follows from $\alpha \ge \frac{\beta^2}{2}$. Summing (8) with the above inequality yields

$\Delta(t) + f(x(t)) \le f(x^*) + \alpha\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big] + \frac{1}{2}\big[\|\mathbf{g}(x(t))\|^2 - \|\mathbf{g}(x(t-1))\|^2\big]$.

3.2. Objective Value Violations.

Lemma 6. Let $x^*$ be an optimal solution of the problem (1)-(3) and let $\beta$ be defined in Assumption 1.
1. If $\alpha \ge \frac{\beta^2}{2}$ in Algorithm 1, then for all $t \ge 1$, we have $\sum_{\tau=0}^{t-1} f(x(\tau)) \le t f(x^*) + \alpha\|x^* - x(-1)\|^2$.
2. If $\alpha > \frac{\beta^2}{2}$ in Algorithm 1, then for all $t \ge 1$, we have $\sum_{\tau=0}^{t-1} f(x(\tau)) \le t f(x^*) + \alpha\|x^* - x(-1)\|^2 + \frac{\alpha}{2\alpha - \beta^2}\|\mathbf{g}(x^*)\|^2 - \frac{1}{2}\|Q(t)\|^2$.

Proof. By Lemma 5, we have

$\Delta(\tau) + f(x(\tau)) \le f(x^*) + \alpha\big[\|x^* - x(\tau-1)\|^2 - \|x^* - x(\tau)\|^2\big] + \frac{1}{2}\big[\|\mathbf{g}(x(\tau))\|^2 - \|\mathbf{g}(x(\tau-1))\|^2\big]$

for all $\tau \in \{0, 1, 2, \dots\}$. Summing over $\tau \in \{0, 1, \dots, t-1\}$ yields

$\sum_{\tau=0}^{t-1}\Delta(\tau) + \sum_{\tau=0}^{t-1} f(x(\tau)) \le t f(x^*) + \alpha\sum_{\tau=0}^{t-1}\big[\|x^* - x(\tau-1)\|^2 - \|x^* - x(\tau)\|^2\big] + \frac{1}{2}\sum_{\tau=0}^{t-1}\big[\|\mathbf{g}(x(\tau))\|^2 - \|\mathbf{g}(x(\tau-1))\|^2\big]$.

Recalling that $\Delta(\tau) = L(\tau+1) - L(\tau)$ and simplifying the telescoping sums yields

$L(t) - L(0) + \sum_{\tau=0}^{t-1} f(x(\tau)) \le t f(x^*) + \alpha\|x^* - x(-1)\|^2 - \alpha\|x^* - x(t-1)\|^2 + \frac{1}{2}\|\mathbf{g}(x(t-1))\|^2 - \frac{1}{2}\|\mathbf{g}(x(-1))\|^2$.

Rearranging terms and substituting $L(0) = \frac{1}{2}\|Q(0)\|^2$ and $L(t) = \frac{1}{2}\|Q(t)\|^2$ yields

(12)  $\sum_{\tau=0}^{t-1} f(x(\tau)) \le t f(x^*) + \alpha\|x^* - x(-1)\|^2 - \alpha\|x^* - x(t-1)\|^2 + \frac{1}{2}\|\mathbf{g}(x(t-1))\|^2 - \frac{1}{2}\|\mathbf{g}(x(-1))\|^2 + \frac{1}{2}\|Q(0)\|^2 - \frac{1}{2}\|Q(t)\|^2$
$\overset{(a)}{\le} t f(x^*) + \alpha\|x^* - x(-1)\|^2 - \alpha\|x^* - x(t-1)\|^2 + \frac{1}{2}\|\mathbf{g}(x(t-1))\|^2 - \frac{1}{2}\|Q(t)\|^2$,

where (a) follows from the fact that $\|Q(0)\| \le \|\mathbf{g}(x(-1))\|$, i.e., part 3 of Lemma 3. Next, we present the proof of both parts:
1. This part follows from the observation that (12) can be further simplified as
$\sum_{\tau=0}^{t-1} f(x(\tau)) \overset{(a)}{\le} t f(x^*) + \alpha\|x^* - x(-1)\|^2 + \frac{1}{2}\|\mathbf{g}(x(t-1))\|^2 - \frac{1}{2}\|Q(t)\|^2 \overset{(b)}{\le} t f(x^*) + \alpha\|x^* - x(-1)\|^2$,
where (a) follows by ignoring the non-positive term $-\alpha\|x^* - x(t-1)\|^2$ on the right-hand side, and (b) follows from the fact that $\|Q(t)\| \ge \|\mathbf{g}(x(t-1))\|$, i.e., part 3 of Lemma 3.
2. This part follows by rewriting (12) as
$\sum_{\tau=0}^{t-1} f(x(\tau)) \le t f(x^*) + \alpha\|x^* - x(-1)\|^2 - \alpha\|x^* - x(t-1)\|^2 + \frac{1}{2}\|\mathbf{g}(x(t-1)) - \mathbf{g}(x^*) + \mathbf{g}(x^*)\|^2 - \frac{1}{2}\|Q(t)\|^2$
$= t f(x^*) + \alpha\|x^* - x(-1)\|^2 - \alpha\|x^* - x(t-1)\|^2 + \frac{1}{2}\|\mathbf{g}(x(t-1)) - \mathbf{g}(x^*)\|^2 + \mathbf{g}^T(x^*)\big[\mathbf{g}(x(t-1)) - \mathbf{g}(x^*)\big] + \frac{1}{2}\|\mathbf{g}(x^*)\|^2 - \frac{1}{2}\|Q(t)\|^2$
$\overset{(a)}{\le} t f(x^*) + \alpha\|x^* - x(-1)\|^2 - \alpha\|x^* - x(t-1)\|^2 + \frac{1}{2}\|\mathbf{g}(x(t-1)) - \mathbf{g}(x^*)\|^2 + \|\mathbf{g}(x^*)\|\,\|\mathbf{g}(x(t-1)) - \mathbf{g}(x^*)\| + \frac{1}{2}\|\mathbf{g}(x^*)\|^2 - \frac{1}{2}\|Q(t)\|^2$
$\overset{(b)}{\le} t f(x^*) + \alpha\|x^* - x(-1)\|^2 - \alpha\|x^* - x(t-1)\|^2 + \frac{\beta^2}{2}\|x^* - x(t-1)\|^2 + \beta\|\mathbf{g}(x^*)\|\,\|x^* - x(t-1)\| + \frac{1}{2}\|\mathbf{g}(x^*)\|^2 - \frac{1}{2}\|Q(t)\|^2$
$= t f(x^*) + \alpha\|x^* - x(-1)\|^2 - \big(\alpha - \frac{\beta^2}{2}\big)\Big[\|x^* - x(t-1)\| - \frac{\beta\|\mathbf{g}(x^*)\|}{2\alpha - \beta^2}\Big]^2 + \frac{\alpha}{2\alpha - \beta^2}\|\mathbf{g}(x^*)\|^2 - \frac{1}{2}\|Q(t)\|^2$

$\overset{(c)}{\le} t f(x^*) + \alpha\|x^* - x(-1)\|^2 + \frac{\alpha}{2\alpha - \beta^2}\|\mathbf{g}(x^*)\|^2 - \frac{1}{2}\|Q(t)\|^2$,

where (a) follows from the Cauchy-Schwarz inequality; (b) follows from $\|\mathbf{g}(x(t-1)) - \mathbf{g}(x^*)\| \le \beta\|x^* - x(t-1)\|$, which in turn follows from the assumption that $\mathbf{g}(x)$ is Lipschitz continuous with modulus $\beta$; and (c) follows from $\alpha > \frac{\beta^2}{2}$, which makes the dropped square term non-positive.

Theorem 1 (Objective Value Violations). Let $x^*$ be an optimal solution of the problem (1)-(3). If $\alpha \ge \frac{\beta^2}{2}$ in Algorithm 1, then for all $t \ge 1$, we have

$f(\bar{x}(t)) \le f(x^*) + \frac{\alpha}{t}\|x^* - x(-1)\|^2$,

where $\beta$ is defined in Assumption 1.

Proof. Fix $t \ge 1$. By part 1 of Lemma 6, we have $\sum_{\tau=0}^{t-1} f(x(\tau)) \le t f(x^*) + \alpha\|x^* - x(-1)\|^2$, and hence $\frac{1}{t}\sum_{\tau=0}^{t-1} f(x(\tau)) \le f(x^*) + \frac{\alpha}{t}\|x^* - x(-1)\|^2$. Since $\bar{x}(t) = \frac{1}{t}\sum_{\tau=0}^{t-1} x(\tau)$ and $f(x)$ is convex, by Jensen's inequality it follows that $f(\bar{x}(t)) \le \frac{1}{t}\sum_{\tau=0}^{t-1} f(x(\tau))$.

The above theorem shows that the error gap between $f(\bar{x}(t))$ and the optimal value $f(x^*)$ is at most $O(1/t)$. This holds for any initial guess vector $x(-1) \in \mathcal{X}$. Of course, choosing $x(-1)$ close to $x^*$ is desirable because it reduces the coefficient $\alpha\|x^* - x(-1)\|^2$.

3.3. Constraint Violations.

Lemma 7. Let $Q(t), t \in \{0, 1, \dots\}$, be the sequence generated by Algorithm 1. For any $t \ge 1$,

$Q_k(t) \ge \sum_{\tau=0}^{t-1} g_k(x(\tau)), \quad \forall k \in \{1, 2, \dots, m\}$.

Proof. Fix $k \in \{1, 2, \dots, m\}$ and $t \ge 1$. For any $\tau \in \{0, \dots, t-1\}$, the update rule of Algorithm 1 gives $Q_k(\tau+1) = \max\{-g_k(x(\tau)), Q_k(\tau) + g_k(x(\tau))\} \ge Q_k(\tau) + g_k(x(\tau))$. Hence, $Q_k(\tau+1) - Q_k(\tau) \ge g_k(x(\tau))$. Summing over $\tau \in \{0, \dots, t-1\}$ and using $Q_k(0) \ge 0$ gives the result.

Lemma 8. Let $x^*$ be an optimal solution of the problem (1)-(3) and let $\lambda^*$ be a Lagrange multiplier vector satisfying Assumption 2. Let $x(t), Q(t), t \in \{0, 1, \dots\}$, be sequences generated by Algorithm 1. Then,

$\sum_{\tau=0}^{t-1} f(x(\tau)) \ge t f(x^*) - \|\lambda^*\|\,\|Q(t)\|, \quad \forall t \ge 1$.

Proof. The proof is similar to a related result in [3] for the DPP algorithm. Define the Lagrangian dual function $q(\lambda) = \min_{x \in \mathcal{X}}\{f(x) + \sum_{k=1}^m \lambda_k g_k(x)\}$. For all $\tau \in \{0, 1, \dots\}$, by Assumption 2, we have

$f(x^*) = q(\lambda^*) \overset{(a)}{\le} f(x(\tau)) + \sum_{k=1}^m \lambda_k^* g_k(x(\tau))$,

where (a) follows from the definition of $q(\lambda^*)$. Thus, we have $f(x(\tau)) \ge f(x^*) - \sum_{k=1}^m \lambda_k^* g_k(x(\tau)), \forall \tau \in \{0, 1, \dots\}$. Summing over $\tau \in \{0, 1, \dots, t-1\}$ yields

$\sum_{\tau=0}^{t-1} f(x(\tau)) \ge t f(x^*) - \sum_{\tau=0}^{t-1}\sum_{k=1}^m \lambda_k^* g_k(x(\tau)) = t f(x^*) - \sum_{k=1}^m \lambda_k^* \Big[\sum_{\tau=0}^{t-1} g_k(x(\tau))\Big] \overset{(a)}{\ge} t f(x^*) - \sum_{k=1}^m \lambda_k^* Q_k(t) \overset{(b)}{\ge} t f(x^*) - \|\lambda^*\|\,\|Q(t)\|$,

where (a) follows from Lemma 7 and the fact that $\lambda_k^* \ge 0, \forall k \in \{1, 2, \dots, m\}$; and (b) follows from the Cauchy-Schwarz inequality.

Lemma 9. Let $x^*$ be an optimal solution of the problem (1)-(3) and let $\lambda^*$ be a Lagrange multiplier vector satisfying Assumption 2. If $\alpha > \frac{\beta^2}{2}$ in Algorithm 1, then for all $t \ge 1$, the virtual queue vector satisfies

$\|Q(t)\| \le 2\|\lambda^*\| + \sqrt{2\alpha}\,\|x^* - x(-1)\| + \sqrt{\tfrac{2\alpha}{2\alpha - \beta^2}}\,\|\mathbf{g}(x^*)\|$,

where $\beta$ is defined in Assumption 1.

Proof. Fix $t \ge 1$. By part 2 of Lemma 6, we have

$\sum_{\tau=0}^{t-1} f(x(\tau)) \le t f(x^*) + \alpha\|x^* - x(-1)\|^2 + \frac{\alpha}{2\alpha - \beta^2}\|\mathbf{g}(x^*)\|^2 - \frac{1}{2}\|Q(t)\|^2$.

By Lemma 8, we have

$\sum_{\tau=0}^{t-1} f(x(\tau)) \ge t f(x^*) - \|\lambda^*\|\,\|Q(t)\|$.

Combining the last two inequalities and cancelling the common term $t f(x^*)$ on both

sides yields

$\frac{1}{2}\|Q(t)\|^2 - \|\lambda^*\|\,\|Q(t)\| \le \alpha\|x^* - x(-1)\|^2 + \frac{\alpha}{2\alpha - \beta^2}\|\mathbf{g}(x^*)\|^2$
$\Rightarrow \big(\|Q(t)\| - \|\lambda^*\|\big)^2 \le \|\lambda^*\|^2 + 2\alpha\|x^* - x(-1)\|^2 + \frac{2\alpha}{2\alpha - \beta^2}\|\mathbf{g}(x^*)\|^2$
$\Rightarrow \|Q(t)\| \le \|\lambda^*\| + \sqrt{\|\lambda^*\|^2 + 2\alpha\|x^* - x(-1)\|^2 + \tfrac{2\alpha}{2\alpha - \beta^2}\|\mathbf{g}(x^*)\|^2}$
$\overset{(a)}{\le} 2\|\lambda^*\| + \sqrt{2\alpha}\,\|x^* - x(-1)\| + \sqrt{\tfrac{2\alpha}{2\alpha - \beta^2}}\,\|\mathbf{g}(x^*)\|$,

where (a) follows from the basic inequality $\sqrt{a + b + c} \le \sqrt{a} + \sqrt{b} + \sqrt{c}$ for any $a, b, c \ge 0$.

Theorem 2 (Constraint Violations). Let $x^*$ be an optimal solution of the problem (1)-(3) and let $\lambda^*$ be a Lagrange multiplier vector satisfying Assumption 2. If $\alpha > \frac{\beta^2}{2}$ in Algorithm 1, then for all $t \ge 1$, the constraint functions satisfy

$g_k(\bar{x}(t)) \le \frac{1}{t}\Big(2\|\lambda^*\| + \sqrt{2\alpha}\,\|x^* - x(-1)\| + \sqrt{\tfrac{2\alpha}{2\alpha - \beta^2}}\,\|\mathbf{g}(x^*)\|\Big), \quad \forall k \in \{1, 2, \dots, m\}$,

where $\beta$ is defined in Assumption 1.

Proof. Fix $t \ge 1$ and $k \in \{1, 2, \dots, m\}$. Recall that $\bar{x}(t) = \frac{1}{t}\sum_{\tau=0}^{t-1} x(\tau)$. Thus,

$g_k(\bar{x}(t)) \overset{(a)}{\le} \frac{1}{t}\sum_{\tau=0}^{t-1} g_k(x(\tau)) \overset{(b)}{\le} \frac{Q_k(t)}{t} \le \frac{\|Q(t)\|}{t} \overset{(c)}{\le} \frac{1}{t}\Big(2\|\lambda^*\| + \sqrt{2\alpha}\,\|x^* - x(-1)\| + \sqrt{\tfrac{2\alpha}{2\alpha - \beta^2}}\,\|\mathbf{g}(x^*)\|\Big)$,

where (a) follows from the convexity of $g_k(x), k \in \{1, 2, \dots, m\}$, and Jensen's inequality; (b) follows from Lemma 7; and (c) follows from Lemma 9.

3.4. Convergence Rate of Algorithm 1. The next theorem summarizes the last two subsections.

Theorem 3. Let $x^*$ be an optimal solution of the problem (1)-(3) and let $\lambda^*$ be a Lagrange multiplier vector satisfying Assumption 2. If $\alpha > \frac{\beta^2}{2}$ in Algorithm 1, then for all $t \ge 1$, we have

$f(\bar{x}(t)) \le f(x^*) + \frac{\alpha}{t}\|x^* - x(-1)\|^2$,
$g_k(\bar{x}(t)) \le \frac{1}{t}\Big(2\|\lambda^*\| + \sqrt{2\alpha}\,\|x^* - x(-1)\| + \sqrt{\tfrac{2\alpha}{2\alpha - \beta^2}}\,\|\mathbf{g}(x^*)\|\Big), \quad \forall k \in \{1, 2, \dots, m\}$,

where $\beta$ is defined in Assumption 1. In summary, Algorithm 1 ensures that error decays like $O(1/t)$ and provides an $\epsilon$-approximate solution with convergence time $O(1/\epsilon)$.
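The queue and drift properties behind these results (Lemmas 3, 4, and 7) can be sanity-checked numerically by driving the virtual queue update with an arbitrary sequence of constraint values. In the sketch below, random numbers stand in for $g_k(x(t))$ (a hypothetical stress test, not an experiment from the paper); the checks hold for any such sequence because the lemmas only use the update rule itself.

```python
import random

random.seed(0)
m, T = 4, 200
# g_seq[t][k] stands in for g_k(x(t)); arbitrary values in [-1, 1].
g_seq = [[random.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(T)]

g_init = [random.uniform(-1.0, 1.0) for _ in range(m)]  # plays the role of g(x(-1))
Q = [max(0.0, -gk) for gk in g_init]                    # Q_k(0) = max{0, -g_k(x(-1))}
running_sum = [0.0] * m                                 # sum_{tau <= t} g_k(x(tau))

for t in range(T):
    g = g_seq[t]
    L_before = 0.5 * sum(q * q for q in Q)
    Q_next = [max(-g[k], Q[k] + g[k]) for k in range(m)]
    L_after = 0.5 * sum(q * q for q in Q_next)
    # Lemma 4: drift <= Q(t)^T g(x(t)) + ||g(x(t))||^2.
    drift_bound = sum(Q[k] * g[k] for k in range(m)) + sum(gk * gk for gk in g)
    assert L_after - L_before <= drift_bound + 1e-12
    # Lemma 3, parts 1 and 2: Q_k(t+1) >= 0 and Q_k(t+1) + g_k(x(t)) >= 0.
    assert all(q >= 0.0 for q in Q_next)
    assert all(Q_next[k] + g[k] >= -1e-12 for k in range(m))
    # Lemma 7: Q_k(t+1) >= sum of g_k values seen so far.
    for k in range(m):
        running_sum[k] += g[k]
    assert all(Q_next[k] >= running_sum[k] - 1e-12 for k in range(m))
    Q = Q_next
```

All assertions pass for every random seed, reflecting that these properties are structural consequences of the update $Q_k(t+1) = \max\{-g_k(x(t)), Q_k(t) + g_k(x(t))\}$ rather than of any particular primal sequence.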

4. Application: Decentralized Network Utility Maximization. This section considers the application of Algorithm 1 to decentralized multipath network utility maximization problems.

4.1. Decentralized Multipath Flow Control. Network flow control can be formulated as the convex optimization of maximizing network utility subject to link capacity constraints [8]. In this view, many existing TCP protocols can be interpreted as distributed solutions to network utility maximization (NUM) problems [0]. In single path network flow control problems, if the utility functions are strictly convex, work [] shows that the dual subgradient algorithm (with convergence rate $O(1/\sqrt{t})$) can yield a distributed flow control algorithm. If the utility functions are only convex but not necessarily strictly convex, the DPP algorithm (or dual subgradient algorithm with primal averaging) can yield a distributed flow control algorithm with an $O(1/\sqrt{t})$ convergence rate [5, 3, 7]. If the utility functions are strongly convex, a faster network flow control algorithm with an $O(1/t)$ convergence rate is proposed in []. A recent work [3] shows that distributed network flow control based on the DPP algorithm also has convergence rate $O(1/t)$ if the utility functions are strongly convex. Other Newton-method-based distributed algorithms for network flow control with strictly convex utility functions are considered in []. However, in multipath network flow control problems, even if the utility function is strictly or strongly convex with respect to the source rate, it is no longer strictly or strongly convex with respect to the path rates. Thus, many of the above algorithms requiring strict or strong convexity can no longer be applied. The DPP algorithm can still be applied, but the convergence rate is only $O(1/\sqrt{t})$. Distributed algorithms based on the primal-dual subgradient method, also known as the Arrow-Hurwicz-Uzawa subgradient method, have been considered in [0].
However, the convergence rate³ of the primal-dual subgradient method for general convex programs without strong convexity is known to be $O(1/\sqrt{t})$ [4]. As shown in the previous sections, Algorithm 1 has an $O(1/t)$ convergence rate for general convex programs and yields a distributed algorithm if the objective function and constraint functions are separable. The next subsection applies Algorithm 1 to the multipath network flow control problem. The resulting algorithm has structural properties and implementation characteristics similar to the subgradient-based algorithm of [0] and to the DPP algorithm of [3], but has additional decision variables due to the multipath formulation. (The algorithms in [0, 3] rely on strict and strong convexity of the objective function, respectively, and apply only to single path situations.)

4.2. Decentralized Multipath Flow Control Based on Algorithm 1. Suppose there are $S$ sources enumerated by $\mathcal{S} = \{1, 2, \dots, S\}$ and $L$ links enumerated by $\mathcal{L} = \{1, 2, \dots, L\}$. Each link $l \in \mathcal{L}$ has a link capacity of $c_l$ bits/slot. Each source sends data from a specific origin to a specific destination, and has multiple path options. The paths for each source can use overlapping links, and they can also overlap with paths of other sources. Further, two distinct sources can have identical paths. However, it is useful to give distinct path indices to the paths associated with each source (even if their corresponding paths are physically the same). Specifically, for each source $s$, define $\mathcal{P}_s$ as the set of path indices used by source $s$. The index sets

[Footnote 3: Similar to the dual subgradient method, the primal-dual subgradient method has an $O(1/\sqrt{t})$ convergence rate for general convex programs in the sense that $O(1/\epsilon^2)$ iterations are required to obtain an $\epsilon$-approximate solution. However, the primal-dual subgradient method does not have vanishing errors.]

14 P,..., P S are disjoint and P P P S = K = {,,..., K}, where K is the number of path indices. For each source s, let U s (y s ) be a real-valued, concave, continuous and nondecreasing utility function defined for y s 0. This represents the satisfaction source s receives by communicating with a total rate of y s, where the total rate sums over all paths it used. Note that different paths can share links in common. Define D l K as the set of paths that use link l and E k L as the set of links used by path k. Let x = [x,..., x K ] T be the vector that specifies the flow rate on each path; and y = [y,..., y S ] T be the vector that specifies the rate of each source. The goal is to allocate flow rates on each path so that no link is overloaded and network utility is maximized. This multipath network utility maximization problem can be formulated as follows: S (3) maximize U s (y s ) (4) (5) (6) (7) subject to s= x k c l, l L, k D l y s = x k, s S, k P s 0 x k x max k, k K, 0 y s ys max, s S, where x max and y max are the allowed maximum path rate and maximum source rate, respectively. The expression (3) represents the network utility; inequality (4) specifies the link capacity constraints; and equality (5) enforces the definition of y s. The linear equality constraints (5) can be formally treated by writing each one as two linear inequality constraints. However, since the utility functions U s ( ) are nondecreasing and seek to maximize the y s values, it is clear that the above problem is equivalent to the following: (8) (9) (0) minimize subject to S U s (y s ) s= x k c l, l L, k D l y s x k, s S, k P s () 0 x k x max k, k K, () 0 y s ys max, s S. [ ] x Note that the constraints (9)-(0) are linear and can be written as A [ ] y c, where A is an (L+S) (K +S) matrix of which each entry is in {0, ±}. This 0 [ ] [ ] x c inequality constraint A can be treated as the inequality constraint y 0 [ ] [ ] x c () defined as g(z) = Az b 0 with z = and b = in the general y 0

convex program (1)-(3). The box constraints (21)-(22) can be treated as the set constraint (3) in the general convex program (1)-(3). Thus, the multipath network utility maximization problem (18)-(22) is a special case of the general convex program (1)-(3). The Lipschitz continuity of $\mathbf{g}(z)$ is summarized in the next lemma.

Lemma 10. If the constraints (19)-(20) are written as $\mathbf{g}(z) \le \mathbf{0}$, then we have:
1. The function $\mathbf{g}(z)$ is Lipschitz continuous with modulus $\beta \le \sqrt{S + K + \sum_{k \in \mathcal{K}} d_k}$, where $d_k$ is the length, i.e., the number of hops, of path $k \in \mathcal{K}$.
2. The function $\mathbf{g}(z)$ is Lipschitz continuous with modulus $\beta \le \sqrt{(L+1)K + S}$.

Proof.
1. Recall that $\mathbf{g}(z) = Az - b$ is Lipschitz continuous with modulus $\beta = \sigma_{\max}(A)$, where $\sigma_{\max}(A)$ is the maximum singular value of the matrix $A$, and a simple upper bound of $\sigma_{\max}(A)$ is the Frobenius norm, given by $\|A\|_F = \sqrt{\operatorname{tr}(A^T A)} = \sqrt{\sum_{i,j} A_{ij}^2}$. Note that $A$ can be written as

$A = \begin{bmatrix} R & \mathbf{0}_{L \times S} \\ -T & I_S \end{bmatrix}$,

where $R$ is a $\{0, 1\}$ matrix of size $L \times K$, $\mathbf{0}_{L \times S}$ is an $L \times S$ zero matrix, $T$ is a $\{0, 1\}$ matrix of size $S \times K$, and $I_S$ is the $S \times S$ identity matrix. The $(l, k)$-th entry of $R$ is $1$ if and only if path $k$ uses link $l$; the $(s, k)$-th entry of $T$ is $1$ if and only if path $k$ is a path for source $s$, i.e., $k \in \mathcal{P}_s$. Matrix $R$ has $\sum_{k \in \mathcal{K}} d_k$ non-zero entries, since each column $k$ has exactly $d_k$ non-zero entries. Matrix $T$ has $K$ non-zero entries, since there are in total $K$ network paths. Thus, matrix $A$ has in total $S + K + \sum_{k \in \mathcal{K}} d_k$ non-zero entries, each of absolute value $1$. It follows that $\beta \le \sqrt{S + K + \sum_{k \in \mathcal{K}} d_k}$.
2. This part follows from part 1 and the fact that the length of each path is at most $L$, so that $\sum_{k \in \mathcal{K}} d_k \le LK$.

Note that the second bound in the above lemma is looser than the first one, but it holds regardless of the flow path configurations in the network. A straightforward application of Algorithm 1 yields the decentralized network flow control algorithm described in Algorithm 2.
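The entry-counting argument in the proof of Lemma 10 can be checked on a small hypothetical topology (3 links, 2 sources, 3 paths; all numbers below are illustrative, not from the paper). The sketch builds the block matrix $A = [[R, \mathbf{0}], [-T, I_S]]$ and verifies that its number of non-zero entries equals $S + K + \sum_k d_k$, which also equals $\|A\|_F^2$ since every non-zero entry has absolute value 1.

```python
from math import sqrt

# Hypothetical toy topology: L=3 links, S=2 sources, K=3 paths.
# Source 0 owns paths {0, 1}; source 1 owns path {2}.
# Path 0 uses links {0, 1}; path 1 uses {2}; path 2 uses {1, 2}.
L, S, K = 3, 2, 3
paths_of_source = [[0, 1], [2]]
links_of_path = [[0, 1], [2], [1, 2]]

# R[l][k] = 1 iff path k uses link l; T[s][k] = 1 iff k belongs to source s.
R = [[1 if l in links_of_path[k] else 0 for k in range(K)] for l in range(L)]
T = [[1 if k in paths_of_source[s] else 0 for k in range(K)] for s in range(S)]

# A = [[R, 0], [-T, I_S]] as in the proof of Lemma 10.
A = [R[l] + [0] * S for l in range(L)] + \
    [[-T[s][k] for k in range(K)] + [1 if j == s else 0 for j in range(S)]
     for s in range(S)]

d = [len(links_of_path[k]) for k in range(K)]      # path lengths (hops)
nonzeros = sum(1 for row in A for a in row if a != 0)
frob = sqrt(sum(a * a for row in A for a in row))  # ||A||_F >= sigma_max(A)
```

Here `nonzeros` is $10 = S + K + \sum_k d_k = 2 + 3 + 5$, and the looser bound of part 2, $(L+1)K + S = 14$, indeed dominates it.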
Similar to the flow control based on the dual subgradient algorithm, Algorithm 2 is decentralized and can be easily implemented within the current TCP protocols [10]. By Theorem 3, if we choose α ≥ ½(S + K + Σ_{k∈K} d_k), then for any t ≥ 1,

(23)  Σ_{s=1}^S U_s(y_s(t)) ≥ Σ_{s=1}^S U_s(y_s*) − O(1/t),
(24)  Σ_{k∈D_l} x_k(t) ≤ c_l + O(1/t),  ∀l ∈ L,
(25)  y_s(t) ≤ Σ_{k∈P_s} x_k(t) + O(1/t),  ∀s ∈ S,

where (x*, y*) is an optimal solution of the problem (18)-(22). Note that if Algorithm 2 has been run for a sufficiently long time and we are satisfied with the current

Algorithm 2

Initialization: Let x_k(−1) ∈ [0, x_k^max], ∀k ∈ K be arbitrary; y_s(−1) ∈ [0, y_s^max], ∀s ∈ S be arbitrary; Q_l(0) = max{0, −Σ_{k∈D_l} x_k(−1) + c_l}, ∀l ∈ L; Y_l(0) = Q_l(0) + Σ_{k∈D_l} x_k(−1) − c_l, ∀l ∈ L; R_s(0) = max{0, Σ_{k∈P_s} x_k(−1) − y_s(−1)}, ∀s ∈ S; and Z_s(0) = R_s(0) + y_s(−1) − Σ_{k∈P_s} x_k(−1), ∀s ∈ S.

Each link l's algorithm: At each time t ∈ {0, 1, ...}, link l does the following:
1. Receive the path rates x_k(t) for all paths k that use link l. Update Q_l(t) via
   Q_l(t+1) = max{ −Σ_{k∈D_l} x_k(t) + c_l,  Q_l(t) + Σ_{k∈D_l} x_k(t) − c_l }.
2. The price of this link is given by Y_l(t+1) = Q_l(t+1) + Σ_{k∈D_l} x_k(t) − c_l.
3. Communicate the link price Y_l(t+1) to the sources that use link l.

Each source s's algorithm: At each time t ∈ {0, 1, ...}, source s does the following:
1. Receive from the network the link prices Y_l(t) for all links l that are used by any path of source s.
2. Update the path rates x_k, k ∈ P_s by
   x_k(t) = argmin_{0 ≤ x_k ≤ x_k^max} { [Σ_{l∈E_k} Y_l(t) − Z_s(t)] x_k + α(x_k − x_k(t−1))² }
          = [ x_k(t−1) − (1/(2α))(Σ_{l∈E_k} Y_l(t) − Z_s(t)) ]_0^{x_k^max},
   where [z]_a^b = min{max{z, a}, b}.
3. Communicate the path rates x_k(t), k ∈ P_s to all links that are used by path k.
4. Update the source rate y_s(t) by
   y_s(t) = argmin_{0 ≤ y_s ≤ y_s^max} { −U_s(y_s) + Z_s(t) y_s + α(y_s − y_s(t−1))² },
   which usually has a closed-form solution for differentiable utilities by taking derivatives.
5. Update the virtual queue R_s(t) and the source price Z_s(t) locally by
   R_s(t+1) = max{ −y_s(t) + Σ_{k∈P_s} x_k(t),  R_s(t) + y_s(t) − Σ_{k∈P_s} x_k(t) },
   Z_s(t+1) = R_s(t+1) + y_s(t) − Σ_{k∈P_s} x_k(t).

performance, then we can fix x̄ = (1/t) Σ_{τ=0}^{t−1} x(τ) and ȳ = (1/t) Σ_{τ=0}^{t−1} y(τ) such that the performance is still within O(1/t) sub-optimality for all future time.

Decentralized Joint Flow and Power Control. Since Algorithm 1 allows for general nonlinear convex constraint functions, it can also be applied to solve the joint flow and power control problem. In this case, the capacity of each link l is not fixed but depends concavely on a power allocation variable p_l.
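Stepping back for a moment, the per-link and per-source updates at the heart of Algorithm 2 are simple enough to sketch directly; the numbers below (rates, capacity, α) are made up for illustration:

```python
# Sketch of one iteration of Algorithm 2's updates for a single link and path.

def link_update(Q, x_on_link, c):
    # Q_l(t+1) = max{-sum_k x_k(t) + c_l, Q_l(t) + sum_k x_k(t) - c_l}
    g = sum(x_on_link) - c          # link-capacity constraint value
    Q_next = max(-g, Q + g)
    Y_next = Q_next + g             # link price Y_l(t+1)
    return Q_next, Y_next

def path_rate_update(x_prev, Y_sum, Z, alpha, x_max):
    # x_k(t) = [x_k(t-1) - (sum_{l in E_k} Y_l(t) - Z_s(t)) / (2*alpha)]_0^{x_max}
    z = x_prev - (Y_sum - Z) / (2.0 * alpha)
    return min(max(z, 0.0), x_max)  # projection [z]_0^{x_max}

Q1, Y1 = link_update(Q=0.0, x_on_link=[0.6, 0.7], c=1.0)  # overloaded link
x1 = path_rate_update(x_prev=0.5, Y_sum=Y1, Z=0.0, alpha=10.0, x_max=1.0)
# The positive price Y1 pushes the path rate down from 0.5 toward feasibility.
```

Iterating `link_update` and `path_rate_update` over all links and paths, together with the y_s and (R_s, Z_s) updates of Algorithm 2, reproduces one round of the message passing described above.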
Assuming each

link capacity is logarithmic in p_l results in the following problem:

(26)  maximize   Σ_{s=1}^S U_s(y_s) − Σ_{l=1}^L V_l(p_l)
(27)  subject to Σ_{k∈D_l} x_k ≤ log(1 + p_l),  ∀l ∈ L,
(28)             y_s ≤ Σ_{k∈P_s} x_k,  ∀s ∈ S,
(29)             0 ≤ x_k ≤ x_k^max,  ∀k ∈ K,
(30)             0 ≤ y_s ≤ y_s^max,  ∀s ∈ S,
(31)             0 ≤ p_l ≤ p_l^max,  ∀l ∈ L,

where p_l is the power allocated at link l, log(1 + p_l) is the corresponding link capacity as a function of p_l, and V_l(p_l) is the associated power cost (assumed to be a convex function of p_l). A decentralized joint flow and power control algorithm for this problem can be developed by applying Algorithm 1 in a similar way.

5. Numerical Results. This section considers numerical experiments to verify the convergence rate results shown in this paper.

5.1. Decentralized Multipath Flow Control. Consider the simple multipath network flow problem described in Figure 1. Assume each link has capacity 1. Let y_1, y_2 and y_3 be the data rates of sources 1, 2 and 3; let x_1, x_2, ..., x_7 be the data rates of the paths indicated in the figure; and let the network utility be log(y_1) + log(y_2) + log(y_3). The NUM problem can be formulated as follows:

maximize   log(y_1) + log(y_2) + log(y_3)
subject to Rx ≤ c,  y ≤ Tx,
           0 ≤ x_i ≤ x_i^max,  ∀i ∈ {1, 2, ..., 7},
           0 ≤ y_i ≤ y_i^max,  ∀i ∈ {1, 2, 3},

where R is the link-path incidence matrix of the network in Figure 1, T is its source-path incidence matrix, and c is the all-ones vector of link capacities. Let f* denote the optimal value of this NUM problem. To verify the convergence of Algorithm 1, Figure 2 shows the values of the objective and constraint functions yielded by Algorithm 1 with α = ½(K + S + Σ_{k∈K} d_k) + 1 = 10 and x(−1) = 0. (By writing the constraints Rx ≤ c and y ≤ Tx in the compact form Az ≤ b, the exact modulus β = σ_max(A) can be computed; choosing a smaller α based on this exact β makes Algorithm 1 converge even faster. In this simulation, we choose a loose α whose value can be easily estimated from Lemma 10 without

knowing the detailed network topology.) We also compare our algorithm with the dual subgradient algorithm (with primal averaging) with step size 0.01 in [13]. (Or equivalently, the DPP algorithm with V = 100 in [15, 17, 23].) Recall that the dual subgradient algorithm in [15, 13, 17, 23] does not converge to an exact optimal solution but only to an approximate solution with an error level determined by the step size. In contrast, our algorithm eventually converges to the exact optimum. Figure 2 shows that Algorithm 1 converges faster than the dual subgradient algorithm with primal averaging. To verify the convergence rate of Algorithm 1, Figure 3 plots f(x(t)) − f*, all constraint values, the function 1/t, and the bounds from Theorem 3, with both the x-axis and y-axis in log₁₀ scale. It can be observed that the curves of f(x(t)) − f* and of all the source rate constraint values are parallel to the curve of 1/t for large t. Note that all the link capacity constraints are satisfied early (i.e., negative) and hence are not drawn in log₁₀ scale. Figure 3 verifies that the error of Algorithm 1 decays like O(1/t) and suggests that it is in fact Θ(1/t) for this multipath NUM problem.

Fig. 1. A simple multipath NUM problem with 3 sources and 7 paths.

5.2. Decentralized Joint Flow and Power Control. Consider joint flow and power control over the same network described in Figure 1. We assume the power cost of each link is given by V_l(p_l) = 0.5p_l. Let f* denote the optimal value of this joint flow and power control problem. To verify the convergence of Algorithm 1, Figure 4 shows the values of the objective and constraint functions yielded by Algorithm 1 with α = 10, x(−1) = 0, y(−1) = 0 and p(−1) = 0. (In fact, by writing the constraints Rx ≤ log(1 + p) and y ≤ Tx in the compact form g(z) ≤ 0, it can be checked that β = 1.59. If we choose a smaller α, e.g., α = 4.86, then Algorithm 1 converges even faster.)
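As a quick illustration (ours, with made-up numbers), the nonlinear link constraint Σ_{k∈D_l} x_k − log(1 + p_l) ≤ 0 from (27) is jointly convex in (x, p) because log(1 + p) is concave in p, which is what lets Algorithm 1 handle it; a midpoint convexity check:

```python
import math

# Constraint (27) for one link: g_l(x, p) = sum of path rates - log(1 + p).
def g_link(x_on_link, p):
    return sum(x_on_link) - math.log(1.0 + p)

# Two arbitrary points (x, p) and their midpoint; the numbers are made up.
x0, p0 = [0.2, 0.3], 0.5
x1, p1 = [0.6, 0.1], 2.0
x_mid = [(a + b) / 2 for a, b in zip(x0, x1)]
p_mid = (p0 + p1) / 2

lhs = g_link(x_mid, p_mid)                          # g at the midpoint
rhs = 0.5 * g_link(x0, p0) + 0.5 * g_link(x1, p1)   # average of endpoints
assert lhs <= rhs + 1e-12                           # convexity: g(mid) <= average
```

The same check fails for a non-concave capacity model, which is why the ADMM-style methods restricted to linear constraints do not apply here.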
We also compare our algorithm with the dual subgradient algorithm (with primal averaging) with step size 0.01 in [15, 13, 17, 23]. Figure 4 shows that Algorithm 1 converges faster than the dual subgradient algorithm with primal averaging. To verify the convergence rate of Algorithm 1, Figure 5 plots f(x(t)) − f*, all constraint values, the function 1/t, and the bounds from Theorem 3, with both the x-axis and y-axis in log₁₀ scale. It can be observed that the curves of f(x(t)) − f* and of all the source rate constraint values are parallel to the curve of 1/t for large t.

Fig. 2. The convergence of Algorithm 1 and the dual subgradient algorithm with primal averaging for a multipath flow control problem.

5.3. Quadratic Programs. Consider the following quadratic program:

minimize   x^T P x + c^T x
subject to x^T Q x + d^T x ≤ e,
           x_min ≤ x ≤ x_max,

where P and Q are positive semidefinite to ensure the convexity of the quadratic program. We randomly generate a large-scale example where x ∈ R^100; P ∈ R^{100×100} is diagonal with entries drawn uniformly from [0, 4]; c ∈ R^100 has entries drawn uniformly from [−5, 0]; Q ∈ R^{100×100} is diagonal with entries drawn uniformly from [0, 1]; d ∈ R^100 has entries drawn uniformly from [−1, 1]; e is a scalar drawn uniformly from [4, 5]; x_min = 0 and x_max = 1. Note that Algorithm 1 and the dual subgradient algorithm (with primal averaging) can deal with general positive semidefinite matrices P and Q. However, if P and Q are diagonal or block diagonal, then the primal update in both algorithms decomposes into independent smaller problems and hence has extremely low complexity. To verify the convergence of Algorithm 1, Figure 6 shows the values of the objective and constraint functions yielded by Algorithm 1 with α = β²/2 + 1, where β is the Lipschitz modulus of the constraint function, and x(−1) = x_min. We also compare our algorithm with the dual subgradient algorithm (with primal averaging) with step

size 0.01 in [15, 13, 17, 23]. Figure 6 shows that Algorithm 1 converges faster than the dual subgradient algorithm with primal averaging. To verify the convergence rate of Algorithm 1, Figure 7 plots f(x(t)) − f*, the function 1/t, and the bound from Theorem 3, with both the x-axis and y-axis in log₁₀ scale. It can be observed that the curve of f(x(t)) − f* is parallel to the curve of 1/t for large t. Note that the constraint violation is not plotted since the constraint function is satisfied at all iterations, as observed in Figure 6.

Fig. 3. The convergence rate of Algorithm 1 for a multipath flow control problem.

6. Conclusions. This paper proposes a novel but simple algorithm to solve convex programs with a possibly non-differentiable objective function and Lipschitz continuous constraint functions. The new algorithm has a parallel implementation when the objective function and constraint functions are separable. The convergence rate of the proposed algorithm is shown to be O(1/t), which is faster than the O(1/√t) convergence rate of the dual subgradient algorithm with primal averaging. The ADMM algorithm has the same O(1/t) convergence rate but can only deal with linear equality constraints. The new algorithm is further applied to solve multipath network flow control problems and yields a decentralized flow control algorithm which converges faster than existing dual subgradient or primal-dual subgradient based flow control algorithms. The O(1/t) convergence rate of the proposed algorithm is also verified by numerical experiments.

Fig. 4. The convergence of Algorithm 1 and the dual subgradient algorithm with primal averaging for a multipath joint flow and power control problem.

REFERENCES

[1] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty, Nonlinear Programming: Theory and Algorithms, Wiley-Interscience, 2006.
[2] A. Beck, A. Nedić, A. Ozdaglar, and M. Teboulle, An O(1/k) gradient method for network resource allocation problems, IEEE Transactions on Control of Network Systems, 1 (2014).
[3] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, second ed., 1999.
[4] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends in Machine Learning, 3 (2011).
[5] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
[6] B. He and X. Yuan, On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method, SIAM Journal on Numerical Analysis, 50 (2012).
[7] J.-B. Hiriart-Urruty and C. Lemaréchal, Fundamentals of Convex Analysis, Springer, 2001.
[8] F. P. Kelly, A. K. Maulloo, and D. K. Tan, Rate control for communication networks: Shadow prices, proportional fairness and stability, Journal of the Operational Research Society, 49 (1998).
[9] T.-Y. Lin, S.-Q. Ma, and S.-Z. Zhang, On the sublinear convergence rate of multi-block ADMM, Journal of the Operations Research Society of China, 3 (2015).
[10] S. H.
Low, A duality model of TCP and queue management algorithms, IEEE/ACM Transactions on Networking, 11 (2003).
[11] S. H. Low and D. E. Lapsley, Optimization flow control I: basic algorithm and convergence, IEEE/ACM Transactions on Networking, 7 (1999).

Fig. 5. The convergence rate of Algorithm 1 for a multipath joint flow and power control problem.

[12] I. Necoara and V. Nedelcu, Rate analysis of inexact dual first-order methods: application to dual decomposition, IEEE Transactions on Automatic Control, 59 (2014).
[13] A. Nedić and A. Ozdaglar, Approximate primal solutions and rate analysis for dual subgradient methods, SIAM Journal on Optimization, 19 (2009).
[14] A. Nedić and A. Ozdaglar, Subgradient methods for saddle-point problems, Journal of Optimization Theory and Applications, 142 (2009).
[15] M. J. Neely, Distributed and secure computation of convex programs over a network of connected processors, in DCDIS Conference, Guelph, July 2005.
[16] M. J. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems, Morgan & Claypool Publishers, 2010.
[17] M. J. Neely, A simple convergence time analysis of drift-plus-penalty for stochastic optimization and convex programs, arXiv:1412.0791, (2014).
[18] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Springer Science & Business Media, 2004.
[19] N. Parikh and S. Boyd, Proximal algorithms, Foundations and Trends in Optimization, 1 (2013).
[20] W.-H. Wang, M. Palaniswami, and S. H. Low, Optimal flow control and routing in multi-path networks, Performance Evaluation, 52 (2003).
[21] E. Wei and A. Ozdaglar, On the O(1/k) convergence of asynchronous distributed alternating direction method of multipliers, in Proceedings of IEEE Global Conference on Signal and Information Processing, 2013.
[22] E. Wei, A. Ozdaglar, and A. Jadbabaie, A distributed Newton method for network utility maximization I: algorithm, IEEE Transactions on Automatic Control, 58 (2013).
[23] H. Yu and M. J. Neely, On the convergence time of the drift-plus-penalty algorithm for

strongly convex programs, in Proceedings of IEEE Conference on Decision and Control (CDC), 2015.
[24] H. Yu and M. J. Neely, A primal-dual type algorithm with the O(1/t) convergence rate for large scale constrained convex programs, in Proceedings of IEEE Conference on Decision and Control (CDC), 2016.

Fig. 6. The convergence of Algorithm 1 and the dual subgradient algorithm with primal averaging for a quadratic program.

Fig. 7. The convergence rate of Algorithm 1 for a quadratic program.
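The "extremely low complexity" of the primal update for diagonal P and Q in Section 5.3 comes from separability: the per-iteration subproblem splits into independent scalar quadratics over an interval, each solved by clipping the unconstrained minimizer. A generic sketch (ours, not the paper's exact update rule):

```python
import numpy as np

# Minimize sum_i (a_i x_i^2 + b_i x_i) over the box [xmin, xmax] with a_i > 0:
# each coordinate is independent, so clip the unconstrained minimizer -b_i/(2 a_i).
def box_qp_separable(a, b, xmin, xmax):
    return np.clip(-b / (2.0 * a), xmin, xmax)

a = np.array([1.0, 2.0, 0.5])   # diagonal quadratic coefficients (made up)
b = np.array([-4.0, 1.0, -0.1])
x_star = box_qp_separable(a, b, 0.0, 1.0)  # -> [1.0, 0.0, 0.1]
```

For the 100-dimensional example of Section 5.3, this costs O(n) per iteration instead of solving a coupled 100-dimensional quadratic program.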


One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties Fedor S. Stonyakin 1 and Alexander A. Titov 1 V. I. Vernadsky Crimean Federal University, Simferopol,

More information

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1,

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1, Math 30 Winter 05 Solution to Homework 3. Recognizing the convexity of g(x) := x log x, from Jensen s inequality we get d(x) n x + + x n n log x + + x n n where the equality is attained only at x = (/n,...,

More information

Network Newton. Aryan Mokhtari, Qing Ling and Alejandro Ribeiro. University of Pennsylvania, University of Science and Technology (China)

Network Newton. Aryan Mokhtari, Qing Ling and Alejandro Ribeiro. University of Pennsylvania, University of Science and Technology (China) Network Newton Aryan Mokhtari, Qing Ling and Alejandro Ribeiro University of Pennsylvania, University of Science and Technology (China) aryanm@seas.upenn.edu, qingling@mail.ustc.edu.cn, aribeiro@seas.upenn.edu

More information

Algorithms for constrained local optimization

Algorithms for constrained local optimization Algorithms for constrained local optimization Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Algorithms for constrained local optimization p. Feasible direction methods Algorithms for constrained

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

Interior-Point Methods for Linear Optimization

Interior-Point Methods for Linear Optimization Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function

More information

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version Convex Optimization Theory Chapter 5 Exercises and Solutions: Extended Version Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com

More information

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008.

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008. 1 ECONOMICS 594: LECTURE NOTES CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS W. Erwin Diewert January 31, 2008. 1. Introduction Many economic problems have the following structure: (i) a linear function

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

Contraction Methods for Convex Optimization and monotone variational inequalities No.12

Contraction Methods for Convex Optimization and monotone variational inequalities No.12 XII - 1 Contraction Methods for Convex Optimization and monotone variational inequalities No.12 Linearized alternating direction methods of multipliers for separable convex programming Bingsheng He Department

More information

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Robert M. Freund March, 2004 2004 Massachusetts Institute of Technology. The Problem The logarithmic barrier approach

More information

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research Introduction to Machine Learning Lecture 7 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Convex Optimization Differentiation Definition: let f : X R N R be a differentiable function,

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

Decentralized Quadratically Approximated Alternating Direction Method of Multipliers

Decentralized Quadratically Approximated Alternating Direction Method of Multipliers Decentralized Quadratically Approimated Alternating Direction Method of Multipliers Aryan Mokhtari Wei Shi Qing Ling Alejandro Ribeiro Department of Electrical and Systems Engineering, University of Pennsylvania

More information

Sequential Unconstrained Minimization: A Survey

Sequential Unconstrained Minimization: A Survey Sequential Unconstrained Minimization: A Survey Charles L. Byrne February 21, 2013 Abstract The problem is to minimize a function f : X (, ], over a non-empty subset C of X, where X is an arbitrary set.

More information

Tight Rates and Equivalence Results of Operator Splitting Schemes

Tight Rates and Equivalence Results of Operator Splitting Schemes Tight Rates and Equivalence Results of Operator Splitting Schemes Wotao Yin (UCLA Math) Workshop on Optimization for Modern Computing Joint w Damek Davis and Ming Yan UCLA CAM 14-51, 14-58, and 14-59 1

More information

On convergence rate of the Douglas-Rachford operator splitting method

On convergence rate of the Douglas-Rachford operator splitting method On convergence rate of the Douglas-Rachford operator splitting method Bingsheng He and Xiaoming Yuan 2 Abstract. This note provides a simple proof on a O(/k) convergence rate for the Douglas- Rachford

More information

Dual Proximal Gradient Method

Dual Proximal Gradient Method Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method

More information

On the acceleration of augmented Lagrangian method for linearly constrained optimization

On the acceleration of augmented Lagrangian method for linearly constrained optimization On the acceleration of augmented Lagrangian method for linearly constrained optimization Bingsheng He and Xiaoming Yuan October, 2 Abstract. The classical augmented Lagrangian method (ALM plays a fundamental

More information

4. Algebra and Duality

4. Algebra and Duality 4-1 Algebra and Duality P. Parrilo and S. Lall, CDC 2003 2003.12.07.01 4. Algebra and Duality Example: non-convex polynomial optimization Weak duality and duality gap The dual is not intrinsic The cone

More information

Iteration-complexity of first-order penalty methods for convex programming

Iteration-complexity of first-order penalty methods for convex programming Iteration-complexity of first-order penalty methods for convex programming Guanghui Lan Renato D.C. Monteiro July 24, 2008 Abstract This paper considers a special but broad class of convex programing CP)

More information

Lecture 3. Optimization Problems and Iterative Algorithms

Lecture 3. Optimization Problems and Iterative Algorithms Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex

More information

Penalty and Barrier Methods. So we again build on our unconstrained algorithms, but in a different way.

Penalty and Barrier Methods. So we again build on our unconstrained algorithms, but in a different way. AMSC 607 / CMSC 878o Advanced Numerical Optimization Fall 2008 UNIT 3: Constrained Optimization PART 3: Penalty and Barrier Methods Dianne P. O Leary c 2008 Reference: N&S Chapter 16 Penalty and Barrier

More information

Lasso: Algorithms and Extensions

Lasso: Algorithms and Extensions ELE 538B: Sparsity, Structure and Inference Lasso: Algorithms and Extensions Yuxin Chen Princeton University, Spring 2017 Outline Proximal operators Proximal gradient methods for lasso and its extensions

More information

Network Optimization: Notes and Exercises

Network Optimization: Notes and Exercises SPRING 2016 1 Network Optimization: Notes and Exercises Michael J. Neely University of Southern California http://www-bcf.usc.edu/ mjneely Abstract These notes provide a tutorial treatment of topics of

More information

Newton s Method. Javier Peña Convex Optimization /36-725

Newton s Method. Javier Peña Convex Optimization /36-725 Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

More information

Approximate Primal Solutions and Rate Analysis for Dual Subgradient Methods

Approximate Primal Solutions and Rate Analysis for Dual Subgradient Methods Approximate Primal Solutions and Rate Analysis for Dual Subgradient Methods Angelia Nedich Department of Industrial and Enterprise Systems Engineering University of Illinois at Urbana-Champaign 117 Transportation

More information

Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725

Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725 Karush-Kuhn-Tucker Conditions Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given a minimization problem Last time: duality min x subject to f(x) h i (x) 0, i = 1,... m l j (x) = 0, j =

More information

Stability of Primal-Dual Gradient Dynamics and Applications to Network Optimization

Stability of Primal-Dual Gradient Dynamics and Applications to Network Optimization Stability of Primal-Dual Gradient Dynamics and Applications to Network Optimization Diego Feijer Fernando Paganini Department of Electrical Engineering, Universidad ORT Cuareim 1451, 11.100 Montevideo,

More information

The Alternating Direction Method of Multipliers

The Alternating Direction Method of Multipliers The Alternating Direction Method of Multipliers With Adaptive Step Size Selection Peter Sutor, Jr. Project Advisor: Professor Tom Goldstein December 2, 2015 1 / 25 Background The Dual Problem Consider

More information

Distributed Resource Allocation Using One-Way Communication with Applications to Power Networks

Distributed Resource Allocation Using One-Way Communication with Applications to Power Networks Distributed Resource Allocation Using One-Way Communication with Applications to Power Networks Sindri Magnússon, Chinwendu Enyioha, Kathryn Heal, Na Li, Carlo Fischione, and Vahid Tarokh Abstract Typical

More information

Duality Theory of Constrained Optimization

Duality Theory of Constrained Optimization Duality Theory of Constrained Optimization Robert M. Freund April, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 The Practical Importance of Duality Duality is pervasive

More information

Lecture 8: February 9

Lecture 8: February 9 0-725/36-725: Convex Optimiation Spring 205 Lecturer: Ryan Tibshirani Lecture 8: February 9 Scribes: Kartikeya Bhardwaj, Sangwon Hyun, Irina Caan 8 Proximal Gradient Descent In the previous lecture, we

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

Gradient Descent. Ryan Tibshirani Convex Optimization /36-725

Gradient Descent. Ryan Tibshirani Convex Optimization /36-725 Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like

More information

A Distributed Newton Method for Network Optimization

A Distributed Newton Method for Network Optimization A Distributed Newton Method for Networ Optimization Ali Jadbabaie, Asuman Ozdaglar, and Michael Zargham Abstract Most existing wor uses dual decomposition and subgradient methods to solve networ optimization

More information

Lecture 2: Convex Sets and Functions

Lecture 2: Convex Sets and Functions Lecture 2: Convex Sets and Functions Hyang-Won Lee Dept. of Internet & Multimedia Eng. Konkuk University Lecture 2 Network Optimization, Fall 2015 1 / 22 Optimization Problems Optimization problems are

More information

Nonlinear Programming

Nonlinear Programming Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week

More information

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING YANGYANG XU Abstract. Motivated by big data applications, first-order methods have been extremely

More information

Rate Control in Communication Networks

Rate Control in Communication Networks From Models to Algorithms Department of Computer Science & Engineering The Chinese University of Hong Kong February 29, 2008 Outline Preliminaries 1 Preliminaries Convex Optimization TCP Congestion Control

More information

Subgradient Method. Ryan Tibshirani Convex Optimization

Subgradient Method. Ryan Tibshirani Convex Optimization Subgradient Method Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last last time: gradient descent min x f(x) for f convex and differentiable, dom(f) = R n. Gradient descent: choose initial

More information

A. Derivation of regularized ERM duality

A. Derivation of regularized ERM duality A. Derivation of regularized ERM dualit For completeness, in this section we derive the dual 5 to the problem of computing proximal operator for the ERM objective 3. We can rewrite the primal problem as

More information

Penalty and Barrier Methods General classical constrained minimization problem minimize f(x) subject to g(x) 0 h(x) =0 Penalty methods are motivated by the desire to use unconstrained optimization techniques

More information

Algorithms for Constrained Optimization

Algorithms for Constrained Optimization 1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic

More information

Convex Optimization of Graph Laplacian Eigenvalues

Convex Optimization of Graph Laplacian Eigenvalues Convex Optimization of Graph Laplacian Eigenvalues Stephen Boyd Abstract. We consider the problem of choosing the edge weights of an undirected graph so as to maximize or minimize some function of the

More information

Sequential Quadratic Programming Method for Nonlinear Second-Order Cone Programming Problems. Hirokazu KATO

Sequential Quadratic Programming Method for Nonlinear Second-Order Cone Programming Problems. Hirokazu KATO Sequential Quadratic Programming Method for Nonlinear Second-Order Cone Programming Problems Guidance Professor Masao FUKUSHIMA Hirokazu KATO 2004 Graduate Course in Department of Applied Mathematics and

More information

Splitting methods for decomposing separable convex programs

Splitting methods for decomposing separable convex programs Splitting methods for decomposing separable convex programs Philippe Mahey LIMOS - ISIMA - Université Blaise Pascal PGMO, ENSTA 2013 October 4, 2013 1 / 30 Plan 1 Max Monotone Operators Proximal techniques

More information