Primal Solutions and Rate Analysis for Subgradient Methods


Primal Solutions and Rate Analysis for Subgradient Methods
Asu Ozdaglar, joint work with Angelia Nedić (UIUC)
Conference on Information Sciences and Systems (CISS), March 2008
Department of Electrical Engineering & Computer Science, Massachusetts Institute of Technology

Introduction

- Lagrangian relaxation and duality are effective tools for solving large-scale convex optimization problems, systematically providing lower bounds on the optimal value.
- Subgradient methods provide an efficient computational means to solve the dual problem, yielding:
  - near-optimal dual solutions,
  - bounds on the primal optimal value.
- Most remarkably, in networking applications, subgradient methods have been used to design decentralized resource allocation mechanisms [Kelly 1997; Low and Lapsley 1999; Srikant 2003; Chiang et al. 2007].

Issues with this approach

- Subgradient methods operate in the dual space, while in most problems the interest is in primal solutions.
- Convergence analysis mostly focuses on diminishing stepsizes, with no convergence rate analysis.

Question of interest: Can we use the subgradient information to produce near-feasible and near-optimal primal solutions?

Our Work

Primal solution generation from subgradient algorithms. Main results:

- Development of algorithms that use the subgradient information and an averaging scheme to generate approximate primal optimal solutions.
- Convergence rate analysis for the approximation error of the primal solutions, including:
  - the amount of feasibility violation,
  - the primal optimal cost approximation error.
- Stopping criteria for our algorithms.

This talk has two parts:

- Dual subgradient algorithms (a subgradient of the dual function is available)
- Primal-dual subgradient algorithms

Prior Work

Subgradient methods producing primal solutions by averaging:

- Nemirovskii and Yudin 1978
- Shor 1985; Sherali and Choi 1996 [linear primal]
- Larsson, Patriksson, and Strömberg 1995, 1998, 1999 [convex primal]
- Kiwiel, Larsson, and Lindberg 2007

In all of the existing literature:

- the interest is in generating primal optimal solutions in the limit;
- the focus is on subgradient algorithms using a diminishing stepsize;
- there is no convergence rate analysis.

(Primal) subgradient methods that use averaging to generate solutions: Nesterov 2005; Ruszczynski 2007.

Primal and Dual Problem

We consider the primal problem

    f^* = \min \{ f(x) : g(x) \le 0, \; x \in X \},

where g(x) = (g_1(x), ..., g_m(x)) and f^* is finite. The functions f : R^n -> R and g_i : R^n -> R, i = 1, ..., m, are convex, and the set X ⊆ R^n is nonempty and convex.

We are interested in solving the primal problem by considering the Lagrangian dual problem

    q^* = \max_{\mu \ge 0, \; \mu \in R^m} q(\mu), \qquad q(\mu) = \inf_{x \in X} \{ f(x) + \mu^T g(x) \}.

Dual Subgradient Method

The dual iterates are generated by the update rule

    \mu_{k+1} = [\mu_k + \alpha_k g_k]^+   for k \ge 0,

where
- \mu_0 is an initial iterate with \mu_0 \ge 0,
- [\cdot]^+ denotes the projection onto the nonnegative orthant,
- \alpha_k > 0 is a stepsize,
- g_k is a subgradient of q(\mu) at \mu_k, i.e., g_k = g(x_k) with x_k \in X and q(\mu_k) = f(x_k) + \mu_k^T g(x_k).

We assume that:
- the set of optimal solutions \arg\min_{x \in X} \{ f(x) + \mu^T g(x) \} is nonempty for all \mu \ge 0;
- the subgradient of the dual function is easy to compute.
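As a concrete illustration, the update above can be sketched in a few lines of Python. The toy instance (minimize x^2 subject to 1 - x <= 0, with X = R), the stepsize, and the iteration count are our hypothetical choices, not from the talk; for this instance the Lagrangian minimizer is x = mu/2 and the dual optimum is mu^* = 2.

```python
import numpy as np

def dual_subgradient(g, argmin_lagrangian, mu0, alpha, iters):
    """Dual subgradient method: mu_{k+1} = [mu_k + alpha * g(x_k)]^+,
    where x_k minimizes the Lagrangian f(x) + mu_k^T g(x) over X,
    so g(x_k) is a subgradient of the dual function q at mu_k."""
    mu = np.maximum(np.asarray(mu0, dtype=float), 0.0)
    for _ in range(iters):
        x = argmin_lagrangian(mu)                 # x_k attaining q(mu_k)
        mu = np.maximum(mu + alpha * g(x), 0.0)   # project on nonnegative orthant
    return mu

# Toy instance: minimize x^2 s.t. 1 - x <= 0; Lagrangian minimizer is x = mu/2.
mu_final = dual_subgradient(
    g=lambda x: np.array([1.0 - x]),
    argmin_lagrangian=lambda mu: mu[0] / 2.0,
    mu0=[0.0], alpha=0.5, iters=200)
# mu_final approaches the dual optimum mu^* = 2
```

With a constant stepsize the dual iterates converge to a neighborhood of the dual optimum; the interesting question, addressed next, is what to report as a primal solution.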

Dual Set Boundedness under the Slater Assumption

(Slater condition) There is a vector \bar{x} \in R^n such that g_j(\bar{x}) < 0 for j = 1, ..., m.

Under the Slater condition:
- The dual optimal set is nonempty and bounded.
- For any dual optimal solution \mu^* \ge 0,

    \sum_{j=1}^{m} \mu_j^* \le \frac{f(\bar{x}) - q^*}{\min_{1 \le j \le m} \{-g_j(\bar{x})\}}.   [Uzawa 1958]

We extend this result as follows.

Proposition: Let the Slater condition hold. Then, for every c \in R, the set Q_c = \{\mu \ge 0 : q(\mu) \ge c\} is bounded:

    \|\mu\| \le \frac{f(\bar{x}) - c}{\min_{1 \le j \le m} \{-g_j(\bar{x})\}}   for all \mu \in Q_c,

where \bar{x} is a Slater vector.
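The Uzawa bound follows directly from weak duality evaluated at the Slater vector; a one-line sketch of the standard argument:

```latex
q^* = q(\mu^*) \;\le\; f(\bar{x}) + \sum_{j=1}^{m} \mu_j^* \, g_j(\bar{x})
      \;\le\; f(\bar{x}) - \Big(\min_{1\le j\le m}\{-g_j(\bar{x})\}\Big)\sum_{j=1}^{m} \mu_j^*
\quad\Longrightarrow\quad
\sum_{j=1}^{m} \mu_j^* \;\le\; \frac{f(\bar{x}) - q^*}{\min_{1\le j\le m}\{-g_j(\bar{x})\}}.
```

The same rearrangement with q(\mu) \ge c in place of q^* = q(\mu^*) gives the Proposition.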

Analysis of the Subgradient Method

Consider the algorithm with a constant stepsize \alpha > 0, i.e.,

    \mu_{k+1} = [\mu_k + \alpha g_k]^+   for k \ge 0.

Assumption (Bounded Subgradients): The subgradient sequence \{g_k\} is bounded, i.e., there exists a scalar L > 0 such that \|g_k\| \le L for all k \ge 0.

This assumption is satisfied when the primal constraint set X is compact: by the convexity of the g_j over R^n, \max_{x \in X} \|g(x)\| is finite and provides an upper bound on the norms of the subgradients.

Bounded Multipliers

Proposition: Let the Slater condition hold and let the subgradients g_k be bounded. Let \{\mu_k\} be the multiplier sequence generated by the subgradient algorithm. Then, the sequence \{\mu_k\} is bounded. In particular, for all k,

    \|\mu_k\| \le \frac{2}{\gamma}\,[f(\bar{x}) - q^*] + \max\left\{ \|\mu_0\|, \; \frac{1}{\gamma}\,[f(\bar{x}) - q^*] + \frac{\alpha L^2}{2\gamma} + \alpha L \right\},

where \alpha is the stepsize, \bar{x} is a Slater vector, \gamma = \min_{1 \le j \le m} \{-g_j(\bar{x})\}, and L is a subgradient norm bound.

Subgradient Algorithm and Primal Averages

Subgradient method: generates multipliers in the dual space,

    \mu_{k+1} = [\mu_k + \alpha g_k]^+   for k \ge 0,

with g_k = g(x_k), x_k \in X, and q(\mu_k) = f(x_k) + \mu_k^T g(x_k).

Primal averaging: generates the averages of x_0, ..., x_{k-1},

    \hat{x}_k = \frac{1}{k} \sum_{i=0}^{k-1} x_i   for k \ge 1.

Each \hat{x}_k belongs to X by the convexity of X and the fact that x_i \in X for all i. The vectors \hat{x}_k need not be feasible; we consider \hat{x}_k as an approximate primal solution.
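The averaging step is simply a running mean of the Lagrangian minimizers; a minimal sketch (the function name is ours):

```python
import numpy as np

def primal_averages(xs):
    """Given x_0, ..., x_{K-1}, return hat_x_k = (1/k) * sum_{i=0}^{k-1} x_i
    for every k = 1, ..., K (one average per row)."""
    xs = np.asarray(xs, dtype=float)
    cumsum = np.cumsum(xs, axis=0)
    counts = np.arange(1, len(xs) + 1).reshape(-1, *([1] * (xs.ndim - 1)))
    return cumsum / counts

averages = primal_averages([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
# averages[k-1] holds hat_x_k; e.g. hat_x_2 is the mean of the first two rows
```

Since each hat_x_k is a convex combination of points of X, it stays in X, exactly as the slide notes; feasibility with respect to g, however, is not guaranteed.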

Proposition (Basic Estimates for the Primal Averages): Let \{\mu_k\} be generated by the subgradient method with a stepsize \alpha. Let \hat{x}_k be the primal averages of the subgradient-defining vectors x_k \in X. Then, for all k \ge 1:

- The amount of feasibility violation at \hat{x}_k is bounded by

    \|g(\hat{x}_k)^+\| \le \frac{\|\mu_k\|}{k\alpha}.

- The primal cost at \hat{x}_k is bounded above by

    f(\hat{x}_k) \le q^* + \frac{\|\mu_0\|^2}{2k\alpha} + \frac{\alpha}{2k} \sum_{i=0}^{k-1} \|g(x_i)\|^2.

- The primal cost at \hat{x}_k is bounded below by

    f(\hat{x}_k) \ge q^* - \|\mu^*\| \, \|g(\hat{x}_k)^+\|,

where \mu^* is a dual optimal solution and q^* is the dual optimal value.

Estimates under Slater

Proposition: Let the Slater condition hold and the subgradients be bounded. Then, the estimates for \hat{x}_k can be strengthened as follows: for all k \ge 1,

- The amount of feasibility violation is bounded by

    \|g(\hat{x}_k)^+\| \le \frac{B_{\mu_0}}{k\alpha}.

- The primal cost is bounded above by

    f(\hat{x}_k) \le f^* + \frac{\|\mu_0\|^2}{2k\alpha} + \frac{\alpha L^2}{2}.

- The primal cost is bounded below by

    f(\hat{x}_k) \ge f^* - \frac{1}{\gamma}\,[f(\bar{x}) - q^*] \, \|g(\hat{x}_k)^+\|,

where L is a subgradient norm bound, \gamma = \min_{1 \le j \le m} \{-g_j(\bar{x})\}, and

    B_{\mu_0} = \frac{2}{\gamma}\,[f(\bar{x}) - q^*] + \max\left\{ \|\mu_0\|, \; \frac{1}{\gamma}\,[f(\bar{x}) - q^*] + \frac{\alpha L^2}{2\gamma} + \alpha L \right\}.

Analyzing the Results

Choosing \mu_0 = 0 yields:

    \|g(\hat{x}_k)^+\| \le \frac{B_0}{k\alpha},   with B_0 = \frac{3}{\gamma}\,[f(\bar{x}) - q^*] + \frac{\alpha L^2}{2\gamma} + \alpha L,

    f(\hat{x}_k) \le f^* + \frac{\alpha L^2}{2},

    f(\hat{x}_k) \ge f^* - \frac{1}{\gamma}\,[f(\bar{x}) - q^*] \, \|g(\hat{x}_k)^+\|.

Remarks:
- The rate of convergence to the primal near-optimal value is driven by the rate at which the infeasibility decreases.
- The bound B_0 on the feasibility violation involves the dual optimal value q^*. We can use \max_{0 \le i \le k} q(\mu_i) \le q^* to obtain an alternative, computable bound.
- Stopping criteria are readily available from these estimates.
- The estimates capture the trade-offs between a desired accuracy and the computations required to achieve it.
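The stopping-criterion remark can be made concrete: track the primal average, the best dual value seen so far (a computable lower bound on f^*), and stop once both certificates are small. A sketch on a hypothetical toy instance (minimize x^2 subject to 1 - x <= 0, so f^* = 1; the stepsize and tolerance are illustrative choices of ours):

```python
import numpy as np

alpha, eps = 0.05, 1e-2
mu, xsum, q_best, k = 0.0, 0.0, -np.inf, 0
while True:
    k += 1
    x = mu / 2.0                                  # minimizer of x^2 + mu*(1 - x)
    q_best = max(q_best, x**2 + mu * (1.0 - x))   # best dual value: lower bound on f^*
    xsum += x
    x_hat = xsum / k                              # primal average hat_x_k
    viol = max(1.0 - x_hat, 0.0)                  # feasibility violation ||g(hat_x_k)^+||
    gap = x_hat**2 - q_best                       # f(hat_x_k) minus the dual lower bound
    if viol <= eps and gap <= eps:
        break                                     # hat_x_k is eps-feasible and eps-optimal
    mu = max(mu + alpha * (1.0 - x), 0.0)         # dual subgradient step
```

Note that gap alone is not a certificate: an infeasible hat_x_k can have cost below f^*, which is why the test also requires viol <= eps.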

Example

Rate allocation using network utility maximization:
- A simple network with 2 serial links and 3 sources, source i sending at rate x_i.
- Link capacities are c_1 = 1 and c_2 = 2.
- Each user has utility function u_i(x_i) = x_i.

We allocate rates as the optimal solution of

    maximize \sum_{i=1}^{3} x_i
    subject to x_1 + x_2 \le 1, \; x_1 + x_3 \le 2, \; x_i \ge 0, \; i = 1, 2, 3.

We use a dual subgradient method and averaging to generate primal solutions.
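A runnable sketch of this example, under one extra assumption of ours: we take X = [0, 3]^3 (any box containing the optimum works) so that the Lagrangian minimizers exist and the subgradients are bounded. With linear utilities the Lagrangian is linear in x, so each minimizer is a corner of the box.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],    # link 1: x1 + x2 <= 1
              [1.0, 0.0, 1.0]])   # link 2: x1 + x3 <= 2
c = np.array([1.0, 2.0])
UB = 3.0                          # box bound defining X (our assumption)

def lagrangian_argmin(mu):
    """Minimize -(x1 + x2 + x3) + mu^T (A x - c) over [0, UB]^3.
    The objective is separable and linear, so each coordinate goes
    to 0 or UB depending on the sign of its coefficient."""
    coef = -1.0 + A.T @ mu
    return np.where(coef < 0.0, UB, 0.0)

alpha, K = 0.01, 20000
mu, xsum = np.zeros(2), np.zeros(3)
for _ in range(K):
    x = lagrangian_argmin(mu)
    xsum += x
    mu = np.maximum(mu + alpha * (A @ x - c), 0.0)  # dual subgradient step
x_hat = xsum / K   # primal average; approaches the optimal allocation (0, 1, 2)
```

The individual iterates x_k jump between box corners and never settle, while the averages hat_x_k approach the optimal allocation; this is exactly the contrast shown in the talk's figures.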

Performance

[Figures] The convergence behavior of the primal sequence \{x_k\} (left) and the averages \{\hat{x}_k\} (right); the convergence behavior of the constraint violation (left) and the primal objective function values (right) for the two primal sequences, plotted against the iteration count k.

Primal-Dual Subgradient Method

Assume the subgradient of the dual function cannot be computed efficiently. We consider methods for computing a saddle point of the Lagrangian

    L(x, \mu) = f(x) + \mu^T g(x)   for all x \in X, \; \mu \ge 0.

Primal-dual subgradient method:

    x_{k+1} = P_X[x_k - \alpha L_x(x_k, \mu_k)]      for k = 0, 1, ...,
    \mu_{k+1} = P_D[\mu_k + \alpha L_\mu(x_k, \mu_k)]   for k = 0, 1, ...,

where
- D is a closed convex set containing the set of dual optimal solutions,
- L_x(x_k, \mu) denotes a subgradient of L(x, \mu) with respect to x at x_k,
- L_\mu(x, \mu_k) denotes a subgradient of L(x, \mu) with respect to \mu at \mu_k:

    L_x(x_k, \mu) = s_f(x_k) + \sum_{i=1}^{m} \mu_i \, s_{g_i}(x_k),   L_\mu(x, \mu_k) = g(x),

where s_f(x_k) and s_{g_i}(x_k) are subgradients of f and g_i at x_k.

The method builds on the seminal Arrow-Hurwicz-Uzawa gradient method (1958).
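A minimal sketch of the primal-dual iteration with primal averaging, on a toy instance of the same kind (minimize x^2 subject to 1 - x <= 0; the box bounds for X and D, the stepsize, and the iteration count are illustrative choices of ours):

```python
import numpy as np

def primal_dual_subgradient(grad_f, jac_g, g, proj_X, proj_D, x0, mu0, alpha, iters):
    """x_{k+1}  = P_X[x_k - alpha * L_x(x_k, mu_k)],
       mu_{k+1} = P_D[mu_k + alpha * L_mu(x_k, mu_k)],
    with L_x = grad f + sum_i mu_i * grad g_i and L_mu = g(x).
    Returns the primal average of the iterates."""
    x, mu = np.array(x0, dtype=float), np.array(mu0, dtype=float)
    xsum = np.zeros_like(x)
    for _ in range(iters):
        Lx = grad_f(x) + jac_g(x).T @ mu   # subgradient of L w.r.t. x
        Lmu = g(x)                          # subgradient of L w.r.t. mu
        xsum += x
        x = proj_X(x - alpha * Lx)
        mu = proj_D(mu + alpha * Lmu)
    return xsum / iters

# Toy instance: f(x) = x^2, g(x) = 1 - x; saddle point at (x, mu) = (1, 2).
x_hat = primal_dual_subgradient(
    grad_f=lambda x: 2.0 * x,
    jac_g=lambda x: np.array([[-1.0]]),
    g=lambda x: np.array([1.0]) - x,
    proj_X=lambda v: np.clip(v, -5.0, 5.0),   # X = [-5, 5]
    proj_D=lambda v: np.clip(v, 0.0, 5.0),    # D = [0, 5], contains mu^* = 2
    x0=[0.0], mu0=[0.0], alpha=0.05, iters=20000)
# x_hat approaches the primal optimum x^* = 1
```

Projecting the multipliers onto a bounded set D, rather than onto the whole nonnegative orthant, is what keeps the subgradients bounded in the analysis; the choice of D is discussed next.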

The Set D under the Slater Assumption

Under Slater, the dual optimal set M^* is nonempty and bounded. This motivates the following choice for the set D:

    D = \left\{ \mu \ge 0 : \|\mu\| \le \frac{f(\bar{x}) - q^*}{\gamma} + r \right\},

where r > 0 is a scalar parameter.

Assumption (Compactness): The set X is compact, i.e., \|x\| \le B for all x \in X.

Under these assumptions and the definition of the method, the subgradients are bounded:

    \max_{k \ge 0} \max\left\{ \|L_x(x_k, \mu_k)\|, \; \|L_\mu(x_k, \mu_k)\| \right\} \le L.

Subgradient boundedness was assumed in previous analyses (Gol'shtein 1972, Korpelevich 1976).

Estimates for the Primal-Dual Method

Proposition: Let the Slater and compactness assumptions hold. Let \{\hat{x}_k\} be the primal average sequence. Then, for all k \ge 1:

- The amount of feasibility violation is bounded by

    \|g(\hat{x}_k)^+\| \le \frac{2}{k\alpha r} \left( \frac{f(\bar{x}) - q^*}{\gamma} + r \right)^2 + \frac{\|x_0 - x^*\|^2}{2k\alpha r} + \frac{\alpha L^2}{2r}.

- The primal cost is bounded above by

    f(\hat{x}_k) \le f^* + \frac{\|\mu_0\|^2}{2k\alpha} + \frac{\|x_0 - x^*\|^2}{2k\alpha} + \alpha L^2.

- The primal cost is bounded below by

    f(\hat{x}_k) \ge f^* - \frac{f(\bar{x}) - q^*}{\gamma} \, \|g(\hat{x}_k)^+\|.

Optimal Choice for r and the Resulting Estimate

By minimizing the bound on the feasibility violation with respect to the parameter r > 0, we obtain an optimal r that depends on the iteration index k:

    r^*(k) = \sqrt{ \left( \frac{f(\bar{x}) - q^*}{\gamma} \right)^2 + \frac{\|x_0 - x^*\|^2}{4} + \frac{k \alpha^2 L^2}{4} }   for k \ge 1.

Given some k, consider an algorithm where the dual iterates are obtained by

    \mu_{i+1} = P_{D_k}[\mu_i + \alpha L_\mu(x_i, \mu_i)],   D_k = \left\{ \mu \ge 0 : \|\mu\| \le \frac{f(\bar{x}) - q^*}{\gamma} + r^*(k) \right\}.

The resulting feasibility violation estimate at the primal average \hat{x}_k is

    \|g(\hat{x}_k)^+\| \le \frac{8}{k\alpha} \cdot \frac{f(\bar{x}) - q^*}{\gamma} + \frac{2\|x_0 - x^*\|}{k\alpha} + \frac{2L}{\sqrt{k}}.

Conclusions

- We considered dual and primal-dual subgradient methods with primal averaging to generate near-feasible and near-optimal primal solutions.
- The Slater assumption plays a key role in our analysis.
- We provided estimates for the feasibility violation and the primal cost.
- Our estimates capture the trade-offs between the desired accuracy and the computations required to achieve it.
- Our analysis shows that:
  - the scheme using the dual subgradient method converges at rate 1/k;
  - the scheme using the primal-dual subgradient method converges at rate 1/\sqrt{k}.