A Primal-Dual Type Algorithm with the O(1/t) Convergence Rate for Large Scale Constrained Convex Programs

PROC. IEEE CONFERENCE ON DECISION AND CONTROL, 2016

Hao Yu and Michael J. Neely
(The authors are with the Electrical Engineering department at the University of Southern California, Los Angeles, CA.)

Abstract— This paper considers large scale constrained convex programs. These are often difficult to solve by interior point methods or other Newton-type methods due to the prohibitive computation and storage complexity for Hessians or matrix inversions. Instead, large scale constrained convex programs are often solved by gradient based methods or decomposition based methods. The conventional primal-dual subgradient method, also known as the Arrow-Hurwicz-Uzawa subgradient method, is a low complexity algorithm with an $O(1/\sqrt{t})$ convergence rate, where $t$ is the number of iterations. If the objective and constraint functions are separable, the Lagrangian dual type method can decompose a large scale convex program into multiple parallel small scale convex programs. The classical dual gradient algorithm is an example of Lagrangian dual type methods and has convergence rate $O(1/\sqrt{t})$. Recently, the authors of the current paper proposed a new Lagrangian dual type algorithm with faster $O(1/t)$ convergence. However, if the objective or constraint functions are not separable, each iteration requires solving a large scale convex subproblem, which can have huge complexity. This paper proposes a new primal-dual type algorithm, which only involves simple gradient updates at each iteration and has $O(1/t)$ convergence.

I. INTRODUCTION

Fix positive integers $n$ and $m$, which are typically large. Consider the general constrained convex program:

minimize: $f(x)$   (1)
such that: $g_k(x) \le 0, \quad \forall k \in \{1,2,\dots,m\}$   (2)
$x \in \mathcal{X}$   (3)

where the set $\mathcal{X} \subseteq \mathbb{R}^n$ is a compact convex set; the function $f(x)$ is convex and smooth on $\mathcal{X}$; and the functions $g_k(x), k \in \{1,2,\dots,m\}$ are convex, smooth and Lipschitz continuous on $\mathcal{X}$. Denote the stacked vector of the functions $g_1(x), g_2(x), \dots, g_m(x)$ as $g(x) = [g_1(x), g_2(x), \dots, g_m(x)]^T$. The Lipschitz continuity of each $g_k(x)$ implies that $g(x)$ is Lipschitz continuous on $\mathcal{X}$. Throughout this paper, we use $\|\cdot\|$ to represent the Euclidean norm and make the following assumptions on convex program (1)-(3):

Assumption 1 (Basic Assumptions):
- There exists a (possibly non-unique) optimal solution $x^* \in \mathcal{X}$ that solves convex program (1)-(3).
- There exists $L_f \ge 0$ such that $\|\nabla f(x) - \nabla f(y)\| \le L_f \|x - y\|$ for all $x, y \in \mathcal{X}$, i.e., $f(x)$ is smooth with modulus $L_f$.
- For each $k \in \{1,2,\dots,m\}$, there exists $L_{g_k} \ge 0$ such that $\|\nabla g_k(x) - \nabla g_k(y)\| \le L_{g_k}\|x - y\|$ for all $x, y \in \mathcal{X}$, i.e., $g_k(x)$ is smooth with modulus $L_{g_k}$. Denote $L_g = [L_{g_1}, \dots, L_{g_m}]^T$.
- There exists $\beta \ge 0$ such that $\|g(x) - g(y)\| \le \beta\|x - y\|$ for all $x, y \in \mathcal{X}$, i.e., $g(x)$ is Lipschitz continuous with modulus $\beta$.
- There exists $C \ge 0$ such that $\|g(x)\| \le C$ for all $x \in \mathcal{X}$.
- There exists $R \ge 0$ such that $\|x - y\| \le R$ for all $x, y \in \mathcal{X}$.

Note that the existence of $C$ follows from the continuity of $g(x)$ and the compactness of the set $\mathcal{X}$. The existence of $R$ follows from the compactness of the set $\mathcal{X}$.

Assumption 2 (Existence of Lagrange multipliers): There exists a Lagrange multiplier vector $\lambda^* = [\lambda_1^*, \lambda_2^*, \dots, \lambda_m^*]^T \ge 0$ attaining strong duality for problem (1)-(3), i.e., $q(\lambda^*) = \min_{x\in\mathcal{X}}\{f(x) : g_k(x) \le 0, \forall k \in \{1,2,\dots,m\}\}$, where $q(\lambda) = \min_{x\in\mathcal{X}}\{f(x) + \sum_{k=1}^m \lambda_k g_k(x)\}$ is the Lagrangian dual function of problem (1)-(3).

Assumption 2 is a mild condition. For example, it is implied by the Slater condition for convex programs.
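For concreteness, here is a small instance of (1)-(3) (our illustration, not from the paper) together with the corresponding Assumption 1 constants. Take $f(x) = \|Ax - b\|^2$, $g(x) = Mx - d$, and $\mathcal{X} = \{x : \|x\| \le r\}$. Then $\nabla f(x) = 2A^T(Ax - b)$, so $L_f = 2\|A^T A\|_2 = 2\sigma_{\max}(A)^2$; each $g_k$ is linear, so $L_{g_k} = 0$; $g(x) - g(y) = M(x - y)$, so one may take $\beta = \|M\|_2$; $\|g(x)\| \le \|M\|_2 r + \|d\|$, so $C = \|M\|_2 r + \|d\|$ works; and the diameter bound $R = 2r$ suffices.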
A. Large Scale Convex Programs

In general, convex program (1)-(3) can be solved via interior point methods (or other Newton type methods), which involve the computation of Hessians and matrix inversions at each iteration. The associated computation complexity and memory space complexity at each iteration is between $O(n^2)$ and $O(n^3)$, which is prohibitive when $n$ is extremely large. For example, if $n = 10^5$ and each floating point number uses 4 bytes, then 40 GBytes of memory is required even to save the Hessian at each iteration. Thus, large scale convex programs are usually solved by gradient based methods or decomposition based methods.

B. The Primal-Dual Subgradient Method

The primal-dual subgradient method, also known as the Arrow-Hurwicz-Uzawa subgradient method, applied to convex program (1)-(3) is described in Algorithm 1. The updates of $x(t)$ and $\lambda(t)$ only involve the computation of gradients and simple projection operations, which are much simpler than the computation of Hessians and matrix inversions for extremely large $n$. Thus, compared with interior point methods, the primal-dual subgradient algorithm has lower complexity computations at each iteration and hence is more suitable for large scale convex programs. However, the convergence rate of Algorithm 1 is only $O(1/\sqrt{t})$, where $t$ is the number of iterations. (In this paper, we say that the primal-dual subgradient algorithm and the dual subgradient algorithm have an $O(1/\sqrt{t})$ convergence rate in the sense that they achieve an $\epsilon$-approximate solution with $O(1/\epsilon^2)$ iterations by using an $O(\epsilon)$ step size. The error of those algorithms does not necessarily continue to decay after the $\epsilon$-approximate solution is reached. In contrast, the algorithm in the current paper has a faster $O(1/t)$ convergence, and this holds for all time, so that the error goes to zero as the number of iterations increases.)

Algorithm 1 The Primal-Dual Subgradient Algorithm
Let $c > 0$ be a constant step size. Choose any $x(0) \in \mathcal{X}$. Initialize the Lagrangian multipliers $\lambda_k(0) = 0, \forall k \in \{1,2,\dots,m\}$. At each iteration $t \in \{1,2,\dots\}$, observe $x(t-1)$ and $\lambda(t-1)$ and do the following:
- Choose $x(t) = \mathcal{P}_{\mathcal{X}}\big[x(t-1) - c\big(\nabla f(x(t-1)) + \sum_{k=1}^m \lambda_k(t-1)\nabla g_k(x(t-1))\big)\big]$, where $\mathcal{P}_{\mathcal{X}}[\cdot]$ is the projection onto the convex set $\mathcal{X}$.
- Update the Lagrangian multipliers $\lambda_k(t) = \big[\lambda_k(t-1) + c\, g_k(x(t-1))\big]_0^{\lambda_k^{\max}}, \forall k \in \{1,2,\dots,m\}$, where $\lambda_k^{\max} > \lambda_k^*$ and $[\cdot]_0^{\lambda_k^{\max}}$ is the projection onto the interval $[0, \lambda_k^{\max}]$.
- Update the running average $\bar{x}(t+1) = \frac{1}{t+1}\sum_{\tau=0}^{t} x(\tau) = \bar{x}(t)\,\frac{t}{t+1} + x(t)\,\frac{1}{t+1}$.
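To make the update rules concrete, the following is a minimal Python sketch of Algorithm 1 (a sketch of ours, not code from the paper). The oracles `f_grad`, `g`, `g_jac`, `project_X` and the cap vector `lam_max` are illustrative assumptions supplied by the user for the problem at hand.

```python
import numpy as np

def primal_dual_subgradient(f_grad, g, g_jac, project_X, x0, c, lam_max, T):
    """Arrow-Hurwicz-Uzawa primal-dual subgradient sketch (Algorithm 1).

    f_grad(x): gradient of f; g(x): vector of constraint values g_k(x);
    g_jac(x): Jacobian of g (row k is the gradient of g_k);
    project_X(x): Euclidean projection onto X; c: constant step size;
    lam_max: per-constraint multiplier caps; T: number of iterations."""
    x = x0.astype(float).copy()
    lam = np.zeros(len(lam_max))
    x_bar = x.copy()                                   # running average of the primal iterates
    for t in range(1, T + 1):
        grad_L = f_grad(x) + g_jac(x).T @ lam          # gradient of the Lagrangian in x at (x(t-1), lam(t-1))
        x_new = project_X(x - c * grad_L)              # projected primal step
        lam = np.clip(lam + c * g(x), 0.0, lam_max)    # dual step, clipped to [0, lam_max]
        x = x_new
        x_bar = x_bar * (t / (t + 1)) + x * (1 / (t + 1))   # running average update
    return x_bar, lam
```

Consistent with the footnote above, the step size `c` would typically be chosen on the order of the target accuracy $\epsilon$ for this method.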

C. Lagrangian Dual Type Methods

The classical dual subgradient algorithm is a Lagrangian dual type iterative method that approaches optimality for strictly convex programs [3]. A modification of the classical dual subgradient algorithm that averages the resulting sequence of primal estimates can solve general convex programs and has an $O(1/\sqrt{t})$ convergence rate [4], [5], [6]. The dual subgradient algorithm with primal averaging is suitable for large scale convex programs because the updates of each component $x_i(t)$ are independent and parallel if the functions $f(x)$ and $g_k(x)$ in convex program (1)-(3) are separable with respect to each component (or block) of $x$, e.g., $f(x) = \sum_{i=1}^n f_i(x_i)$ and $g_k(x) = \sum_{i=1}^n g_{k,i}(x_i)$.

Recently, a new Lagrangian dual type algorithm with convergence rate $O(1/t)$ for general convex programs was proposed in [7]. This algorithm can solve convex program (1)-(3) by following the steps described in Algorithm 2. Similar to the dual subgradient algorithm with primal averaging, Algorithm 2 can decompose the updates of $x(t)$ into smaller independent subproblems if the functions $f(x)$ and $g_k(x)$ are separable. Moreover, Algorithm 2 has $O(1/t)$ convergence, which is faster than the primal-dual subgradient or the dual subgradient algorithm with primal averaging. However, if $f(x)$ or the $g_k(x)$ are not separable, each update of $x(t)$ requires solving a set constrained convex program. If the dimension $n$ is large, such a set constrained convex program should be solved via a gradient based method instead of a Newton method. However, the gradient based method for set constrained convex programs is an iterative technique and involves at least one projection operation at each iteration.

Algorithm 2 (Algorithm 1 in [7])
Let $\alpha > 0$ be a constant parameter. Choose any $x(-1) \in \mathcal{X}$. Initialize the virtual queues $Q_k(0) = \max\{0, -g_k(x(-1))\}, \forall k \in \{1,2,\dots,m\}$. At each iteration $t \in \{0,1,2,\dots\}$, observe $x(t-1)$ and $Q(t)$ and do the following:
- Choose $x(t) = \operatorname{argmin}_{x\in\mathcal{X}}\big\{f(x) + [Q(t) + g(x(t-1))]^T g(x) + \alpha\|x - x(t-1)\|^2\big\}$.
- Update the virtual queue vector $Q(t)$ via $Q_k(t+1) = \max\{-g_k(x(t)), Q_k(t) + g_k(x(t))\}, \forall k \in \{1,2,\dots,m\}$.
- Update the running average $\bar{x}(t+1) = \frac{1}{t+1}\sum_{\tau=0}^{t} x(\tau) = \bar{x}(t)\,\frac{t}{t+1} + x(t)\,\frac{1}{t+1}$.
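The following sketch (ours, not from the paper) illustrates how the $x(t)$ update of Algorithm 2 decomposes in one hypothetical separable instance: a coordinate-wise quadratic objective, linear constraints $g(x) = Mx - d$, and a box set $\mathcal{X}$. Under these assumptions the argmin splits into $n$ independent one-dimensional problems with closed-form solutions.

```python
import numpy as np

def algorithm2_x_update_separable(a, b, M, d, Q, x_prev, alpha, lo, hi):
    """One x(t) update of Algorithm 2 for a hypothetical separable instance:
        f(x) = sum_i (a_i/2) x_i^2 + b_i x_i   (a_i >= 0),
        g(x) = M x - d   (each g_k linear),     X = [lo, hi]^n (a box).
    The argmin over X splits into n independent one-dimensional quadratics."""
    w = Q + (M @ x_prev - d)          # weights Q_k(t) + g_k(x(t-1))
    lin = b + M.T @ w                 # linear coefficient seen by each coordinate
    # minimize (a_i/2) x_i^2 + lin_i x_i + alpha (x_i - x_prev_i)^2 over [lo_i, hi_i]
    x_unc = (2 * alpha * x_prev - lin) / (a + 2 * alpha)   # unconstrained 1-D minimizer
    return np.clip(x_unc, lo, hi)     # clipping is exact for a 1-D convex quadratic on an interval
```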
D. New Algorithm

Consider large scale convex programs with non-separable $f(x)$ or $g_k(x)$, e.g., $f(x) = \|Ax - b\|^2$. In this case, Algorithm 1 has convergence rate $O(1/\sqrt{t})$ using low complexity iterations, while Algorithm 2 has convergence rate $O(1/t)$ using high complexity iterations. This paper proposes a new algorithm, described in Algorithm 3, which combines the advantages of Algorithm 1 and Algorithm 2. The new algorithm modifies Algorithm 2 by changing the update of $x(t)$ from a minimization problem to a simple projection. Meanwhile, the $O(1/t)$ convergence rate of Algorithm 2 is preserved in the new algorithm.

Algorithm 3 New Algorithm
Let $\gamma > 0$ be a constant step size. Choose any $x(-1) \in \mathcal{X}$. Initialize the virtual queues $Q_k(0) = \max\{0, -g_k(x(-1))\}, \forall k \in \{1,2,\dots,m\}$. At each iteration $t \in \{0,1,2,\dots\}$, observe $x(t-1)$ and $Q(t)$ and do the following:
- Define $d(t) = \nabla f(x(t-1)) + \sum_{k=1}^m [Q_k(t) + g_k(x(t-1))]\nabla g_k(x(t-1))$, which is the gradient of the function $\phi(x) = f(x) + [Q(t) + g(x(t-1))]^T g(x)$ at the point $x = x(t-1)$.
- Choose $x(t) = \mathcal{P}_{\mathcal{X}}[x(t-1) - \gamma d(t)]$, where $\mathcal{P}_{\mathcal{X}}[\cdot]$ is the projection onto the convex set $\mathcal{X}$.
- Update the virtual queue vector $Q(t)$ via $Q_k(t+1) = \max\{-g_k(x(t)), Q_k(t) + g_k(x(t))\}, \forall k \in \{1,2,\dots,m\}$.
- Update the running average $\bar{x}(t+1) = \frac{1}{t+1}\sum_{\tau=0}^{t} x(\tau) = \bar{x}(t)\,\frac{t}{t+1} + x(t)\,\frac{1}{t+1}$.
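A minimal Python sketch of Algorithm 3 (ours, with illustrative names; the problem oracles `f_grad`, `g`, `g_jac`, `project_X` and the step size `gamma` are supplied by the user, with `gamma` chosen as in (12) below):

```python
import numpy as np

def new_primal_dual(f_grad, g, g_jac, project_X, x_init, gamma, T):
    """Sketch of Algorithm 3 (virtual-queue based primal-dual gradient method).

    f_grad(x): gradient of f; g(x): vector of constraint values; g_jac(x): Jacobian
    of g (row k is the gradient of g_k); project_X: Euclidean projection onto X;
    x_init: the point x(-1); gamma: constant step size; T: number of iterations."""
    x = x_init.astype(float).copy()
    Q = np.maximum(0.0, -g(x))                 # Q_k(0) = max{0, -g_k(x(-1))}
    x_bar = np.zeros_like(x)
    for t in range(T):
        w = Q + g(x)                           # Q(t) + g(x(t-1)), componentwise nonnegative
        d = f_grad(x) + g_jac(x).T @ w         # d(t): gradient of phi(x) at x(t-1)
        x = project_X(x - gamma * d)           # simple projected gradient step
        Q = np.maximum(-g(x), Q + g(x))        # virtual queue update with g(x(t))
        x_bar = x_bar * (t / (t + 1)) + x * (1 / (t + 1))   # running average of x(0..t)
    return x_bar, Q
```

Each iteration costs one gradient evaluation and one projection, exactly like Algorithm 1, while the averaged iterate `x_bar` enjoys the $O(1/t)$ guarantees established in Section III.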

II. PRELIMINARIES AND BASIC ANALYSIS

This section presents useful preliminaries on convex analysis and important facts about Algorithm 3.

A. Preliminaries

Definition 1 (Lipschitz Continuity): Let $\mathcal{X} \subseteq \mathbb{R}^n$ be a convex set. A function $h : \mathcal{X} \to \mathbb{R}^m$ is said to be Lipschitz continuous on $\mathcal{X}$ with modulus $L$ if there exists $L > 0$ such that $\|h(y) - h(x)\| \le L\|y - x\|$ for all $x, y \in \mathcal{X}$.

Definition 2 (Smooth Functions): Let $\mathcal{X} \subseteq \mathbb{R}^n$ and let the function $h(x)$ be continuously differentiable on $\mathcal{X}$. The function $h(x)$ is said to be smooth on $\mathcal{X}$ with modulus $L$ if $\nabla h(x)$ is Lipschitz continuous on $\mathcal{X}$ with modulus $L$.

Note that the linear function $h(x) = a^T x$ is smooth with modulus $0$. If a function $h(x)$ is smooth with modulus $L$, then $ch(x)$ is smooth with modulus $cL$ for any $c > 0$.

Lemma 1 (Descent Lemma, Proposition A.24 in [3]): If $h$ is smooth on $\mathcal{X}$ with modulus $L$, then $h(y) \le h(x) + \nabla h(x)^T(y - x) + \frac{L}{2}\|y - x\|^2$ for all $x, y \in \mathcal{X}$.

Definition 3 (Strongly Convex Functions): Let $\mathcal{X} \subseteq \mathbb{R}^n$ be a convex set. A function $h$ is said to be strongly convex on $\mathcal{X}$ with modulus $\alpha$ if there exists a constant $\alpha > 0$ such that $h(x) - \frac{\alpha}{2}\|x\|^2$ is convex on $\mathcal{X}$.

If $h(x)$ is convex and $\alpha > 0$, then $h(x) + \alpha\|x - x_0\|^2$ is strongly convex with modulus $2\alpha$ for any constant $x_0$.

Lemma 2: Let $\mathcal{X} \subseteq \mathbb{R}^n$ be a convex set. Let the function $h$ be strongly convex on $\mathcal{X}$ with modulus $\alpha$ and let $x^{opt}$ be a global minimum of $h$ on $\mathcal{X}$. Then $h(x^{opt}) \le h(x) - \frac{\alpha}{2}\|x^{opt} - x\|^2$ for all $x \in \mathcal{X}$.

Proof: A special case when $h$ is differentiable and $\mathcal{X} = \mathbb{R}^n$ is Theorem 2.1.8 in [8]. The proof for a general strongly convex function $h$ and a general convex set $\mathcal{X}$ is given in [7].

B. Basic Properties

This subsection presents preliminary results related to the virtual queue update (Lemmas 3-6) that are proven in [7].

Lemma 3 (Lemma 3 in [7]): In Algorithm 3, we have:
1) At each iteration $t \in \{0,1,2,\dots\}$, $Q_k(t) \ge 0$ for all $k \in \{1,2,\dots,m\}$.
2) At each iteration $t \in \{0,1,2,\dots\}$, $Q_k(t) + g_k(x(t-1)) \ge 0$ for all $k \in \{1,\dots,m\}$.
3) At iteration $t = 0$, $\|Q(0)\| \le \|g(x(-1))\|$. At each iteration $t \in \{1,2,\dots\}$, $\|Q(t)\| \ge \|g(x(t-1))\|$.

Lemma 4 (Lemma 7 in [7]): Let $Q(t), t \in \{0,1,\dots\}$ be the sequence generated by Algorithm 3. For any $t \ge 1$, $Q_k(t) \ge \sum_{\tau=0}^{t-1} g_k(x(\tau))$ for all $k \in \{1,2,\dots,m\}$.

Let $Q(t) = [Q_1(t), \dots, Q_m(t)]^T$ be the vector of virtual queue backlogs. Define $L(t) = \frac{1}{2}\|Q(t)\|^2$. The function $L(t)$ shall be called a Lyapunov function. Define the Lyapunov drift as $\Delta(t) = L(t+1) - L(t) = \frac{1}{2}\big[\|Q(t+1)\|^2 - \|Q(t)\|^2\big]$.

Lemma 5 (Lemma 4 in [7]): At each iteration $t \in \{0,1,2,\dots\}$ in Algorithm 3, an upper bound of the Lyapunov drift is given by

$\Delta(t) \le Q^T(t) g(x(t)) + \|g(x(t))\|^2$.   (4)

Lemma 6 (Lemma 8 in [7]): Let $x^*$ be an optimal solution and let $\lambda^*$ be defined in Assumption 2. Let $x(t), Q(t), t \in \{0,1,\dots\}$ be the sequences generated by Algorithm 3. Then $\sum_{\tau=0}^{t-1} f(x(\tau)) \ge t f(x^*) - \|\lambda^*\|\,\|Q(t)\|$ for all $t \ge 1$.
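For intuition, here is a short derivation (ours, not reproduced from [7]) of why the virtual queue update implies the drift bound (4). Since $Q_k(t+1) = \max\{-g_k(x(t)), Q_k(t) + g_k(x(t))\}$ and $\max\{a, b\} \le a + b$ whenever $a, b \ge 0$, we have

$Q_k(t+1)^2 = \max\big\{g_k(x(t))^2,\ [Q_k(t) + g_k(x(t))]^2\big\} \le g_k(x(t))^2 + [Q_k(t) + g_k(x(t))]^2 = Q_k(t)^2 + 2Q_k(t)g_k(x(t)) + 2g_k(x(t))^2$.

Summing over $k \in \{1,2,\dots,m\}$ and dividing by 2 gives $\Delta(t) \le Q^T(t) g(x(t)) + \|g(x(t))\|^2$, which is exactly (4).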
III. CONVERGENCE RATE ANALYSIS OF ALGORITHM 3

This section analyzes the convergence rate of Algorithm 3 for problem (1)-(3).

A. Upper Bounds of the Drift-Plus-Penalty Expression

Lemma 7: Let $x^*$ be an optimal solution. For all $t \ge 0$ in Algorithm 3, we have

$\Delta(t) + f(x(t)) \le f(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big] + \frac{1}{2}\big[\|g(x(t))\|^2 - \|g(x(t-1))\|^2\big] + \frac{1}{2}\big[\beta^2 + L_f + \|Q(t)\|\|L_g\| + C\|L_g\| - \frac{1}{\gamma}\big]\|x(t) - x(t-1)\|^2$,

where $\beta$, $L_f$, $L_g$ and $C$ are defined in Assumption 1.

Proof: Fix $t \ge 0$. Recall that $\phi(x) = f(x) + [Q(t) + g(x(t-1))]^T g(x)$ as defined in Algorithm 3. Note that part 2 in Lemma 3 implies that $Q(t) + g(x(t-1))$ is component-wise nonnegative. Hence, $\phi(x)$ is convex. Since $d(t) = \nabla\phi(x(t-1))$, the projection operator in Algorithm 3 can be reinterpreted as an optimization problem:

$x(t) = \mathcal{P}_{\mathcal{X}}[x(t-1) - \gamma d(t)] \overset{(a)}{=} \operatorname{argmin}_{x\in\mathcal{X}}\Big[\phi(x(t-1)) + \nabla^T\phi(x(t-1))[x - x(t-1)] + \frac{1}{2\gamma}\|x - x(t-1)\|^2\Big]$,   (5)

where (a) follows by removing the constant term $\phi(x(t-1))$ in the minimization, completing the square, and using the fact that the projection of a point onto a set is equivalent to the minimization of the Euclidean distance to this point over the same set. (See [9] for the detailed proof.)

Since $\frac{1}{2\gamma}\|x - x(t-1)\|^2$ is strongly convex with respect to $x$ with modulus $\frac{1}{\gamma}$, it follows that $\phi(x(t-1)) + \nabla^T\phi(x(t-1))[x - x(t-1)] + \frac{1}{2\gamma}\|x - x(t-1)\|^2$ is strongly convex with respect to $x$ with modulus $\frac{1}{\gamma}$.

Since $x(t)$ is chosen to minimize the above strongly convex function, by Lemma 2 we have

$\phi(x(t-1)) + \nabla^T\phi(x(t-1))[x(t) - x(t-1)] + \frac{1}{2\gamma}\|x(t) - x(t-1)\|^2$
$\le \phi(x(t-1)) + \nabla^T\phi(x(t-1))[x^* - x(t-1)] + \frac{1}{2\gamma}\|x^* - x(t-1)\|^2 - \frac{1}{2\gamma}\|x^* - x(t)\|^2$
$\overset{(a)}{\le} \phi(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big]$
$\overset{(b)}{=} f(x^*) + \underbrace{[Q(t) + g(x(t-1))]^T g(x^*)}_{\le 0} + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big]$
$\overset{(c)}{\le} f(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big]$,   (6)

where (a) follows from the convexity of $\phi(x)$; (b) follows from the definition of $\phi(x)$; and (c) follows by using the facts that $g_k(x^*) \le 0$ and $Q_k(t) + g_k(x(t-1)) \ge 0$ (i.e., part 2 in Lemma 3) for all $k \in \{1,2,\dots,m\}$ to eliminate the term marked by an underbrace.

Recall that $f(x)$ is smooth on $\mathcal{X}$ with modulus $L_f$ by Assumption 1. By Lemma 1, we have

$f(x(t)) \le f(x(t-1)) + \nabla^T f(x(t-1))[x(t) - x(t-1)] + \frac{L_f}{2}\|x(t) - x(t-1)\|^2$.   (7)

Recall that each $g_k(x)$ is smooth on $\mathcal{X}$ with modulus $L_{g_k}$ by Assumption 1. Thus, $[Q_k(t) + g_k(x(t-1))]\, g_k(x)$ is smooth with modulus $[Q_k(t) + g_k(x(t-1))]\, L_{g_k}$. By Lemma 1, we have

$[Q_k(t) + g_k(x(t-1))]\, g_k(x(t)) \le [Q_k(t) + g_k(x(t-1))]\, g_k(x(t-1)) + [Q_k(t) + g_k(x(t-1))]\,\nabla^T g_k(x(t-1))[x(t) - x(t-1)] + \frac{[Q_k(t) + g_k(x(t-1))]\, L_{g_k}}{2}\|x(t) - x(t-1)\|^2$.

Summing this inequality over $k \in \{1,2,\dots,m\}$ yields

$[Q(t) + g(x(t-1))]^T g(x(t)) \le [Q(t) + g(x(t-1))]^T g(x(t-1)) + \sum_{k=1}^m [Q_k(t) + g_k(x(t-1))]\,\nabla^T g_k(x(t-1))[x(t) - x(t-1)] + \frac{[Q(t) + g(x(t-1))]^T L_g}{2}\|x(t) - x(t-1)\|^2$.   (8)

Summing (7) and (8) together yields

$f(x(t)) + [Q(t) + g(x(t-1))]^T g(x(t)) \overset{(a)}{\le} \phi(x(t-1)) + \nabla^T\phi(x(t-1))[x(t) - x(t-1)] + \frac{L_f + [Q(t) + g(x(t-1))]^T L_g}{2}\|x(t) - x(t-1)\|^2$,   (9)

where (a) follows from the definition of $\phi(x)$.

Substituting (6) into (9) yields

$f(x(t)) + [Q(t) + g(x(t-1))]^T g(x(t)) \le f(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big] + \frac{L_f + [Q(t) + g(x(t-1))]^T L_g - \frac{1}{\gamma}}{2}\|x(t) - x(t-1)\|^2$.   (10)

Note that $u_1^T u_2 = \frac{1}{2}\big[\|u_1\|^2 + \|u_2\|^2 - \|u_1 - u_2\|^2\big]$ for any $u_1, u_2 \in \mathbb{R}^m$. Thus, we have $g(x(t-1))^T g(x(t)) = \frac{1}{2}\big[\|g(x(t-1))\|^2 + \|g(x(t))\|^2 - \|g(x(t-1)) - g(x(t))\|^2\big]$. Substituting this into (10) and rearranging terms yields

$f(x(t)) + Q^T(t) g(x(t)) \le f(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big] + \frac{L_f + [Q(t) + g(x(t-1))]^T L_g - \frac{1}{\gamma}}{2}\|x(t) - x(t-1)\|^2 + \frac{1}{2}\|g(x(t-1)) - g(x(t))\|^2 - \frac{1}{2}\|g(x(t-1))\|^2 - \frac{1}{2}\|g(x(t))\|^2$
$\overset{(a)}{\le} f(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big] + \frac{\beta^2 + L_f + [Q(t) + g(x(t-1))]^T L_g - \frac{1}{\gamma}}{2}\|x(t) - x(t-1)\|^2 - \frac{1}{2}\|g(x(t-1))\|^2 - \frac{1}{2}\|g(x(t))\|^2$,

where (a) follows from $\|g(x(t-1)) - g(x(t))\| \le \beta\|x(t) - x(t-1)\|$, which in turn follows from the assumption that $g(x)$ is Lipschitz continuous with modulus $\beta$.

Summing (4) with this inequality yields

$\Delta(t) + f(x(t)) \le f(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big] + \frac{1}{2}\big[\|g(x(t))\|^2 - \|g(x(t-1))\|^2\big] + \frac{\beta^2 + L_f + [Q(t) + g(x(t-1))]^T L_g - \frac{1}{\gamma}}{2}\|x(t) - x(t-1)\|^2$
$\overset{(a)}{\le} f(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big] + \frac{1}{2}\big[\|g(x(t))\|^2 - \|g(x(t-1))\|^2\big] + \frac{\beta^2 + L_f + \|Q(t)\|\|L_g\| + C\|L_g\| - \frac{1}{\gamma}}{2}\|x(t) - x(t-1)\|^2$,

where (a) follows from $[Q(t) + g(x(t-1))]^T L_g \le \|Q(t) + g(x(t-1))\|\,\|L_g\| \le \big(\|Q(t)\| + \|g(x(t-1))\|\big)\|L_g\| \le \|Q(t)\|\|L_g\| + C\|L_g\|$; the first step follows from the Cauchy-Schwarz inequality, the second from the triangle inequality, and the third from $\|g(x)\| \le C$ for all $x \in \mathcal{X}$, i.e., Assumption 1. This proves the lemma.

Lemma 8: Let $x^*$ be an optimal solution and let $\lambda^*$ be defined in Assumption 2. Define $D = \beta^2 + L_f + 2\|\lambda^*\|\|L_g\| + 2C\|L_g\|$, where $\beta$, $L_f$, $L_g$ and $C$ are defined in Assumption 1. If $\gamma > 0$ in Algorithm 3 satisfies

$D + \frac{\|L_g\| R}{\sqrt{\gamma}} - \frac{1}{\gamma} \le 0$,   (11)

where $R$ is defined in Assumption 1, e.g., if

$0 < \gamma \le \frac{1}{(\|L_g\| R + \sqrt{D})^2}$,   (12)

then at each iteration $t \in \{0,1,2,\dots\}$, we have
1) $\|Q(t)\| \le 2\|\lambda^*\| + \frac{R}{\sqrt{\gamma}} + C$;
2) $\Delta(t) + f(x(t)) \le f(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big] + \frac{1}{2}\big[\|g(x(t))\|^2 - \|g(x(t-1))\|^2\big]$.

Proof: Before the main proof, we verify that any $\gamma$ given by (12) satisfies (11). We need to choose $\gamma > 0$ such that

$D + \frac{\|L_g\| R}{\sqrt{\gamma}} - \frac{1}{\gamma} \le 0 \iff D\gamma + \|L_g\| R \sqrt{\gamma} - 1 \le 0 \iff \sqrt{\gamma} \le \frac{-\|L_g\| R + \sqrt{\|L_g\|^2 R^2 + 4D}}{2D} = \frac{2}{\|L_g\| R + \sqrt{\|L_g\|^2 R^2 + 4D}}$.

Note that $\frac{2}{\|L_g\| R + \sqrt{\|L_g\|^2 R^2 + 4D}} \overset{(a)}{\ge} \frac{2}{\|L_g\| R + \|L_g\| R + 2\sqrt{D}} = \frac{1}{\|L_g\| R + \sqrt{D}}$, where (a) follows from $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$ for all $a, b \ge 0$. Thus, if $\sqrt{\gamma} \le \frac{1}{\|L_g\| R + \sqrt{D}}$, i.e., $0 < \gamma \le \frac{1}{(\|L_g\| R + \sqrt{D})^2}$, then inequality (11) holds.

Next, we prove the lemma by induction. Consider $t = 0$. The bound $\|Q(0)\| \le 2\|\lambda^*\| + \frac{R}{\sqrt{\gamma}} + C$ follows from $\|Q(0)\| \overset{(a)}{\le} \|g(x(-1))\| \overset{(b)}{\le} C$, where (a) follows from part 3 in Lemma 3 and (b) follows from Assumption 1.

PROC. IEEE CONFERENCE ON DECISION AND CONTROL, 06 from Assumpion. Thus, he firs par in his lemma holds a ieraion = 0. Noe ha β + L f + Q(0) L g + C L g /γ β + L f + ( λ + R γ + C ) L g + C L g γ =D + L g R γ γ 0, (3) where follows from Q(0) λ + R γ +C; follows from he definiion of D; and follows from (), i.e., he selecion rule of γ. Applying Lemma 7 a ieraion = 0 yields (0) + f(x(0)) f(x ) + γ x x( ) x x(0) + g(x(0)) g(x( )) + β + L f x(0) x( ) + Q(0) L g + C L g γ f(x ) + γ x x( ) x x(0) + g(x(0)) g(x( )), where follows from (3). Thus, he second par in his lemma holds a ieraion = 0. Assume (τ) + f(x(τ)) f(x ) + γ x x(τ ) x x(τ) + g(x(τ)) g(x(τ )) holds for all 0 τ and consider ieraion +. Summing his inequaliy over τ {0,,..., } yields (τ) + f(x(τ)) ( + )f(x ) + γ x x(τ ) x x(τ) + g(x(τ)) g(x(τ )). Recalling ha (τ) = L(τ + ) L(τ) and simplifying he summaions yields L( + ) L(0) + f(x(τ)) ( + )f(x ) + γ x x( ) γ x x() + g(x()) g(x( )) ( + )f(x ) + γ x x( ) + g(x()) g(x( )). Rearranging erms yields f(x(τ)) ( + )f(x ) + γ x x( ) + g(x()) g(x( )) + L(0) L( + ) = ( + )f(x ) + γ x x( ) + g(x()) g(x( )) + Q(0) Q( + ) ( + )f(x ) + R γ + C Q( + ), (4) where follows from L(0) = Q(0) and L( + ) = Q( + ) ; follows from x y R for all x, y X, i.e., Assumpion, g(x()) C, i.e., Assumpion, and Q(0) g(x( )), i.e., par 3 in Lemma 3. Applying Lemma 6 a ieraion + yields f(x(τ)) ( + )f(x ) λ Q( + ). Combining his inequaliy wih (4) and cancelling he common erm ( + )f(x ) on boh sides yields Q( + ) λ Q( + ) R γ C 0 ( Q( + ) λ ) λ + R γ + C Q( + ) λ + λ + R /γ + C Q( + ) λ + R/ γ + C, where follows from he basic inequaliy a + b + c a + b + c for any a, b, c 0. Thus, he firs par in his lemma holds a ieraion +. Noe ha β + L f + Q( + ) L g + C L g γ β + L f + ( λ + R γ + C) L g + C L g γ =D + L g R γ γ 0, (5) where follows from Q(+) λ + R γ +C; follows from he definiion of D; and follows from (), i.e., he selecion rule of γ. Applying Lemma 7 a ieraion + yields ( + ) + f(x( + )) f(x ) + γ x x() x x( + ) + g(x( + )) g(x()) + β + L f + x( + ) x() Q( + ) L g + C L g γ f(x ) + γ x x() x x( + ) + g(x( + )) g(x()), where follows from (5). Thus, he second par in his lemma holds a ieraion +. Thus, boh pars in his lemma follow by inducion. Remark : Recall ha if each g k (x) is a linear funcion, hen L gk = 0 for all k {,,..., m}. In his case, equaion () reduces o 0 < γ /(β + L f ). B. Objecive Value Violaions Theorem (Objecive Value Violaions): Le x be an opimal soluion. If we choose γ according o () in Algorihm 3, hen for all, we have f(x()) f(x )+ R γ, where R is defined in Assumpion. Proof: Fix. By par in Lemma 8, we have (τ) + f(x(τ)) f(x ) + γ x x(τ )

B. Objective Value Violations

Theorem 1 (Objective Value Violations): Let $x^*$ be an optimal solution. If we choose $\gamma$ according to (12) in Algorithm 3, then for all $t \ge 1$ we have $f(\bar{x}(t)) \le f(x^*) + \frac{R^2}{2\gamma t}$, where $R$ is defined in Assumption 1.

Proof: Fix $t \ge 1$. By part 2 in Lemma 8, we have

$\Delta(\tau) + f(x(\tau)) \le f(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(\tau-1)\|^2 - \|x^* - x(\tau)\|^2\big] + \frac{1}{2}\big[\|g(x(\tau))\|^2 - \|g(x(\tau-1))\|^2\big]$

for all $\tau \in \{0,1,2,\dots\}$. Summing over $\tau \in \{0,1,\dots,t-1\}$ yields

$\sum_{\tau=0}^{t-1}\Delta(\tau) + \sum_{\tau=0}^{t-1} f(x(\tau)) \le t f(x^*) + \frac{1}{2\gamma}\sum_{\tau=0}^{t-1}\big[\|x^* - x(\tau-1)\|^2 - \|x^* - x(\tau)\|^2\big] + \frac{1}{2}\sum_{\tau=0}^{t-1}\big[\|g(x(\tau))\|^2 - \|g(x(\tau-1))\|^2\big]$.

Recalling that $\Delta(\tau) = L(\tau+1) - L(\tau)$ and simplifying the telescoping summations yields

$L(t) - L(0) + \sum_{\tau=0}^{t-1} f(x(\tau)) \le t f(x^*) + \frac{1}{2\gamma}\|x^* - x(-1)\|^2 - \frac{1}{2\gamma}\|x^* - x(t-1)\|^2 + \frac{1}{2}\big[\|g(x(t-1))\|^2 - \|g(x(-1))\|^2\big] \le t f(x^*) + \frac{1}{2\gamma}\|x^* - x(-1)\|^2 + \frac{1}{2}\big[\|g(x(t-1))\|^2 - \|g(x(-1))\|^2\big]$.

Rearranging terms yields

$\sum_{\tau=0}^{t-1} f(x(\tau)) \le t f(x^*) + \frac{1}{2\gamma}\|x^* - x(-1)\|^2 + \frac{1}{2}\big[\|g(x(t-1))\|^2 - \|g(x(-1))\|^2\big] + L(0) - L(t)$
$\overset{(a)}{=} t f(x^*) + \frac{1}{2\gamma}\|x^* - x(-1)\|^2 + \frac{1}{2}\big[\|g(x(t-1))\|^2 - \|g(x(-1))\|^2\big] + \frac{1}{2}\|Q(0)\|^2 - \frac{1}{2}\|Q(t)\|^2$
$\overset{(b)}{\le} t f(x^*) + \frac{1}{2\gamma}\|x^* - x(-1)\|^2 \overset{(c)}{\le} t f(x^*) + \frac{R^2}{2\gamma}$,

where (a) follows from the definitions $L(0) = \frac{1}{2}\|Q(0)\|^2$ and $L(t) = \frac{1}{2}\|Q(t)\|^2$; (b) follows from the facts that $\|Q(0)\| \le \|g(x(-1))\|$ and $\|Q(t)\| \ge \|g(x(t-1))\|$ for $t \ge 1$, i.e., part 3 in Lemma 3; and (c) follows from the fact that $\|x^* - x(-1)\| \le R$, i.e., Assumption 1.

Dividing both sides by the factor $t$ yields $\frac{1}{t}\sum_{\tau=0}^{t-1} f(x(\tau)) \le f(x^*) + \frac{R^2}{2\gamma t}$. Finally, since $\bar{x}(t) = \frac{1}{t}\sum_{\tau=0}^{t-1} x(\tau)$ and $f(x)$ is convex, by Jensen's inequality it follows that $f(\bar{x}(t)) \le \frac{1}{t}\sum_{\tau=0}^{t-1} f(x(\tau))$.

C. Constraint Violations

Theorem 2 (Constraint Violations): Let $x^*$ be an optimal solution and let $\lambda^*$ be defined in Assumption 2. If we choose $\gamma$ according to (12) in Algorithm 3, then for all $t \ge 1$ the constraints satisfy $g_k(\bar{x}(t)) \le \frac{1}{t}\big(2\|\lambda^*\| + \frac{R}{\sqrt{\gamma}} + C\big), \forall k \in \{1,2,\dots,m\}$, where $R$ and $C$ are defined in Assumption 1.

Proof: Fix $t \ge 1$ and $k \in \{1,2,\dots,m\}$. Recall that $\bar{x}(t) = \frac{1}{t}\sum_{\tau=0}^{t-1} x(\tau)$. Thus,

$g_k(\bar{x}(t)) \overset{(a)}{\le} \frac{1}{t}\sum_{\tau=0}^{t-1} g_k(x(\tau)) \overset{(b)}{\le} \frac{Q_k(t)}{t} \le \frac{\|Q(t)\|}{t} \overset{(c)}{\le} \frac{1}{t}\Big(2\|\lambda^*\| + \frac{R}{\sqrt{\gamma}} + C\Big)$,

where (a) follows from the convexity of $g_k(x)$ and Jensen's inequality; (b) follows from Lemma 4; and (c) follows from part 1 in Lemma 8.

Theorems 1 and 2 show that Algorithm 3 ensures the error decays like $O(1/t)$ and provides an $\epsilon$-approximate solution with convergence time $O(1/\epsilon)$.

D. Practical Implementations

By Theorems 1 and 2, it suffices to choose $\gamma$ according to (12) to guarantee the $O(1/t)$ convergence rate of Algorithm 3. If all constraint functions are linear, then (12) is independent of $\lambda^*$ by Remark 1. For general constraint functions, we need to know the value of $\|\lambda^*\|$, which is typically unknown, to select $\gamma$ according to (12). However, it is easy to observe that an upper bound of $\|\lambda^*\|$ is sufficient for us to choose $\gamma$ satisfying (12). To obtain an upper bound of $\|\lambda^*\|$, the next lemma is useful if problem (1)-(3) has an interior feasible point, i.e., the Slater condition is satisfied.

Lemma 9 (Lemma 1 in [5]): Consider the convex program $\min f(x)$ s.t. $g_k(x) \le 0, k \in \{1,2,\dots,m\}$, $x \in \mathcal{X} \subseteq \mathbb{R}^n$, and define the Lagrangian dual function $q(\lambda) = \inf_{x\in\mathcal{X}}\{f(x) + \lambda^T g(x)\}$. If the Slater condition holds, i.e., there exists $\hat{x} \in \mathcal{X}$ such that $g_j(\hat{x}) < 0$ for all $j \in \{1,2,\dots,m\}$, then the level sets $V_{\hat{\lambda}} = \{\lambda \ge 0 : q(\lambda) \ge q(\hat{\lambda})\}$ are bounded for any $\hat{\lambda} \ge 0$. In particular, we have

$\max_{\lambda \in V_{\hat{\lambda}}} \|\lambda\| \le \frac{1}{\min_{1\le j \le m}\{-g_j(\hat{x})\}}\big(f(\hat{x}) - q(\hat{\lambda})\big)$.

By Lemma 9, if convex program (1)-(3) has a feasible point $\hat{x} \in \mathcal{X}$ such that $g_k(\hat{x}) < 0, \forall k \in \{1,2,\dots,m\}$, then we can take an arbitrary $\hat{\lambda} \ge 0$, compute the value $q(\hat{\lambda}) = \inf_{x\in\mathcal{X}}\{f(x) + \hat{\lambda}^T g(x)\}$, and conclude that $\|\lambda^*\| \le \frac{1}{\min_{1\le j\le m}\{-g_j(\hat{x})\}}\big(f(\hat{x}) - q(\hat{\lambda})\big)$. Since $f(x)$ is continuous and $\mathcal{X}$ is a compact set, there exists a constant $F > 0$ such that $|f(x)| \le F$ for all $x \in \mathcal{X}$. Thus, we can take $\hat{\lambda} = 0$ such that $q(\hat{\lambda}) = \min_{x\in\mathcal{X}}\{f(x)\} \ge -F$. It follows from Lemma 9 that

$\|\lambda^*\| \le \frac{1}{\min_{1\le j\le m}\{-g_j(\hat{x})\}}\big(f(\hat{x}) - q(\hat{\lambda})\big) \le \frac{2F}{\min_{1\le j\le m}\{-g_j(\hat{x})\}}$.
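A small helper (ours, not from the paper) that evaluates this bound; its output can be used in place of $\|\lambda^*\|$ when computing $D$ and the step size in (12), for example via the `step_size_gamma` sketch above:

```python
def lambda_star_upper_bound(g_at_x_hat, F):
    """Upper bound on ||lambda*|| via Lemma 9 with the choice lambda_hat = 0:
        ||lambda*|| <= (f(x_hat) - q(0)) / min_j{-g_j(x_hat)} <= 2F / min_j{-g_j(x_hat)}.
    g_at_x_hat: the values g_j(x_hat) at a Slater point (all strictly negative);
    F: a constant with |f(x)| <= F on X. Both inputs are illustrative assumptions."""
    slack = min(-gj for gj in g_at_x_hat)
    assert slack > 0, "x_hat must satisfy g_j(x_hat) < 0 for every j"
    return 2.0 * F / slack
```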
REFERENCES

[1] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[2] A. Nedić and A. Ozdaglar, "Subgradient methods for saddle-point problems," Journal of Optimization Theory and Applications, vol. 142, no. 1, pp. 205-228, 2009.
[3] D. P. Bertsekas, Nonlinear Programming, 2nd ed. Athena Scientific, 1999.
[4] M. J. Neely, "Distributed and secure computation of convex programs over a network of connected processors," in DCDIS Conference, Guelph, July 2005.
[5] A. Nedić and A. Ozdaglar, "Approximate primal solutions and rate analysis for dual subgradient methods," SIAM Journal on Optimization, vol. 19, no. 4, pp. 1757-1780, 2009.
[6] M. J. Neely, "A simple convergence time analysis of drift-plus-penalty for stochastic optimization and convex programs," arXiv:1412.0791, 2014.
[7] H. Yu and M. J. Neely, "A simple parallel algorithm with an O(1/t) convergence rate for general convex programs," arXiv:1512.08370, 2015.
[8] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course. Springer Science & Business Media, 2004.
[9] H. Yu and M. J. Neely, "A primal-dual type algorithm with the O(1/t) convergence rate for large scale constrained convex programs," arXiv:1604.02216, 2016.