A Primal-Dual Type Algorithm with the O(1/t) Convergence Rate for Large Scale Constrained Convex Programs

PROC. IEEE CONFERENCE ON DECISION AND CONTROL, 2016

Hao Yu and Michael J. Neely
(The authors are with the Electrical Engineering department at the University of Southern California, Los Angeles, CA.)

Abstract— This paper considers large scale constrained convex programs. These are often difficult to solve by interior point methods or other Newton-type methods due to the prohibitive computation and storage complexity for Hessians or matrix inversions. Instead, large scale constrained convex programs are often solved by gradient based methods or decomposition based methods. The conventional primal-dual subgradient method, also known as the Arrow-Hurwicz-Uzawa subgradient method, is a low complexity algorithm with an $O(1/\sqrt{t})$ convergence rate, where $t$ is the number of iterations. If the objective and constraint functions are separable, the Lagrangian dual type method can decompose a large scale convex program into multiple parallel small scale convex programs. The classical dual gradient algorithm is an example of Lagrangian dual type methods and has convergence rate $O(1/\sqrt{t})$. Recently, the authors of the current paper proposed a new Lagrangian dual type algorithm with faster $O(1/t)$ convergence. However, if the objective or constraint functions are not separable, each iteration requires solving a large scale convex subproblem, which can have huge complexity. This paper proposes a new primal-dual type algorithm, which only involves simple gradient updates at each iteration and has $O(1/t)$ convergence.

I. INTRODUCTION

Fix positive integers $n$ and $m$, which are typically large. Consider the general constrained convex program:

minimize: $f(x)$   (1)
such that: $g_k(x) \le 0, \quad \forall k \in \{1,2,\dots,m\}$   (2)
$x \in \mathcal{X}$   (3)

where the set $\mathcal{X} \subseteq \mathbb{R}^n$ is a compact convex set; the function $f(x)$ is convex and smooth on $\mathcal{X}$; and the functions $g_k(x), k \in \{1,2,\dots,m\}$ are convex, smooth and Lipschitz continuous on $\mathcal{X}$. Denote the stacked vector of the functions $g_1(x), g_2(x), \dots, g_m(x)$ as $g(x) = [g_1(x), g_2(x), \dots, g_m(x)]^T$. The Lipschitz continuity of each $g_k(x)$ implies that $g(x)$ is Lipschitz continuous on $\mathcal{X}$. Throughout this paper, we use $\|\cdot\|$ to represent the Euclidean norm and make the following assumptions on convex program (1)-(3):

Assumption 1 (Basic Assumptions):
- There exists a (possibly non-unique) optimal solution $x^* \in \mathcal{X}$ that solves convex program (1)-(3).
- There exists $L_f \ge 0$ such that $\|\nabla f(x) - \nabla f(y)\| \le L_f \|x - y\|$ for all $x, y \in \mathcal{X}$, i.e., $f(x)$ is smooth with modulus $L_f$.
- For each $k \in \{1,2,\dots,m\}$, there exists $L_{g_k} \ge 0$ such that $\|\nabla g_k(x) - \nabla g_k(y)\| \le L_{g_k}\|x - y\|$ for all $x, y \in \mathcal{X}$, i.e., $g_k(x)$ is smooth with modulus $L_{g_k}$. Denote $L_g = [L_{g_1}, \dots, L_{g_m}]^T$.
- There exists $\beta \ge 0$ such that $\|g(x) - g(y)\| \le \beta\|x - y\|$ for all $x, y \in \mathcal{X}$, i.e., $g(x)$ is Lipschitz continuous with modulus $\beta$.
- There exists $C \ge 0$ such that $\|g(x)\| \le C$ for all $x \in \mathcal{X}$.
- There exists $R \ge 0$ such that $\|x - y\| \le R$ for all $x, y \in \mathcal{X}$.

Note that the existence of $C$ follows from the continuity of $g(x)$ and the compactness of the set $\mathcal{X}$. The existence of $R$ follows from the compactness of the set $\mathcal{X}$.

Assumption 2 (Existence of Lagrange multipliers): There exists a Lagrange multiplier vector $\lambda^* = [\lambda_1^*, \lambda_2^*, \dots, \lambda_m^*]^T \ge 0$ attaining strong duality for problem (1)-(3), i.e., $q(\lambda^*) = \min_{x\in\mathcal{X}}\{f(x) : g_k(x) \le 0, \forall k \in \{1,2,\dots,m\}\}$, where $q(\lambda) = \min_{x\in\mathcal{X}}\{f(x) + \sum_{k=1}^m \lambda_k g_k(x)\}$ is the Lagrangian dual function of problem (1)-(3).

Assumption 2 is a mild condition. For example, it is implied by the Slater condition for convex programs.
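For concreteness, here is a small instance of (1)-(3) (our illustration, not from the paper) together with the corresponding Assumption 1 constants. Take $f(x) = \|Ax - b\|^2$, $g(x) = Mx - d$, and $\mathcal{X} = \{x : \|x\| \le r\}$. Then $\nabla f(x) = 2A^T(Ax - b)$, so $L_f = 2\|A^T A\|_2 = 2\sigma_{\max}(A)^2$; each $g_k$ is linear, so $L_{g_k} = 0$; $g(x) - g(y) = M(x - y)$, so one may take $\beta = \|M\|_2$; $\|g(x)\| \le \|M\|_2 r + \|d\|$, so $C = \|M\|_2 r + \|d\|$ works; and the diameter bound $R = 2r$ suffices.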
A. Large Scale Convex Programs

In general, convex program (1)-(3) can be solved via interior point methods (or other Newton type methods), which involve the computation of Hessians and matrix inversions at each iteration. The associated computation complexity and memory space complexity at each iteration is between $O(n^2)$ and $O(n^3)$, which is prohibitive when $n$ is extremely large. For example, if $n = 10^5$ and each floating point number uses 4 bytes, then 40 GBytes of memory is required even to save the Hessian at each iteration. Thus, large scale convex programs are usually solved by gradient based methods or decomposition based methods.

B. The Primal-Dual Subgradient Method

The primal-dual subgradient method, also known as the Arrow-Hurwicz-Uzawa subgradient method, applied to convex program (1)-(3) is described in Algorithm 1. The updates of $x(t)$ and $\lambda(t)$ only involve the computation of gradients and simple projection operations, which are much simpler than the computation of Hessians and matrix inversions for extremely large $n$. Thus, compared with interior point methods, the primal-dual subgradient algorithm has lower complexity computations at each iteration and hence is more suitable for large scale convex programs. However, the convergence rate of Algorithm 1 is only $O(1/\sqrt{t})$, where $t$ is the number of iterations. (In this paper, we say that the primal-dual subgradient algorithm and the dual subgradient algorithm have an $O(1/\sqrt{t})$ convergence rate in the sense that they achieve an $\epsilon$-approximate solution with $O(1/\epsilon^2)$ iterations by using an $O(\epsilon)$ step size. The error of those algorithms does not necessarily continue to decay after the $\epsilon$-approximate solution is reached. In contrast, the algorithm in the current paper has a faster $O(1/t)$ convergence, and this holds for all time, so that the error goes to zero as the number of iterations increases.)

Algorithm 1 The Primal-Dual Subgradient Algorithm
Let $c > 0$ be a constant step size. Choose any $x(0) \in \mathcal{X}$. Initialize the Lagrangian multipliers $\lambda_k(0) = 0, \forall k \in \{1,2,\dots,m\}$. At each iteration $t \in \{1,2,\dots\}$, observe $x(t-1)$ and $\lambda(t-1)$ and do the following:
- Choose $x(t) = \mathcal{P}_{\mathcal{X}}\big[x(t-1) - c\big(\nabla f(x(t-1)) + \sum_{k=1}^m \lambda_k(t-1)\nabla g_k(x(t-1))\big)\big]$, where $\mathcal{P}_{\mathcal{X}}[\cdot]$ is the projection onto the convex set $\mathcal{X}$.
- Update the Lagrangian multipliers $\lambda_k(t) = \big[\lambda_k(t-1) + c\, g_k(x(t-1))\big]_0^{\lambda_k^{\max}}, \forall k \in \{1,2,\dots,m\}$, where $\lambda_k^{\max} > \lambda_k^*$ and $[\cdot]_0^{\lambda_k^{\max}}$ is the projection onto the interval $[0, \lambda_k^{\max}]$.
- Update the running average $\bar{x}(t+1) = \frac{1}{t+1}\sum_{\tau=0}^{t} x(\tau) = \bar{x}(t)\,\frac{t}{t+1} + x(t)\,\frac{1}{t+1}$.
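To make the update rules concrete, the following is a minimal Python sketch of Algorithm 1 (a sketch of ours, not code from the paper). The oracles `f_grad`, `g`, `g_jac`, `project_X` and the cap vector `lam_max` are illustrative assumptions supplied by the user for the problem at hand.

```python
import numpy as np

def primal_dual_subgradient(f_grad, g, g_jac, project_X, x0, c, lam_max, T):
    """Arrow-Hurwicz-Uzawa primal-dual subgradient sketch (Algorithm 1).

    f_grad(x): gradient of f; g(x): vector of constraint values g_k(x);
    g_jac(x): Jacobian of g (row k is the gradient of g_k);
    project_X(x): Euclidean projection onto X; c: constant step size;
    lam_max: per-constraint multiplier caps; T: number of iterations."""
    x = x0.astype(float).copy()
    lam = np.zeros(len(lam_max))
    x_bar = x.copy()                                   # running average of the primal iterates
    for t in range(1, T + 1):
        grad_L = f_grad(x) + g_jac(x).T @ lam          # gradient of the Lagrangian in x at (x(t-1), lam(t-1))
        x_new = project_X(x - c * grad_L)              # projected primal step
        lam = np.clip(lam + c * g(x), 0.0, lam_max)    # dual step, clipped to [0, lam_max]
        x = x_new
        x_bar = x_bar * (t / (t + 1)) + x * (1 / (t + 1))   # running average update
    return x_bar, lam
```

Consistent with the footnote above, the step size `c` would typically be chosen on the order of the target accuracy $\epsilon$ for this method.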

C. Lagrangian Dual Type Methods

The classical dual subgradient algorithm is a Lagrangian dual type iterative method that approaches optimality for strictly convex programs [3]. A modification of the classical dual subgradient algorithm that averages the resulting sequence of primal estimates can solve general convex programs and has an $O(1/\sqrt{t})$ convergence rate [4], [5], [6]. The dual subgradient algorithm with primal averaging is suitable for large scale convex programs because the updates of each component $x_i(t)$ are independent and parallel if the functions $f(x)$ and $g_k(x)$ in convex program (1)-(3) are separable with respect to each component (or block) of $x$, e.g., $f(x) = \sum_{i=1}^n f_i(x_i)$ and $g_k(x) = \sum_{i=1}^n g_{k,i}(x_i)$.

Recently, a new Lagrangian dual type algorithm with convergence rate $O(1/t)$ for general convex programs was proposed in [7]. This algorithm can solve convex program (1)-(3) by following the steps described in Algorithm 2. Similar to the dual subgradient algorithm with primal averaging, Algorithm 2 can decompose the updates of $x(t)$ into smaller independent subproblems if the functions $f(x)$ and $g_k(x)$ are separable. Moreover, Algorithm 2 has $O(1/t)$ convergence, which is faster than the primal-dual subgradient or the dual subgradient algorithm with primal averaging. However, if $f(x)$ or the $g_k(x)$ are not separable, each update of $x(t)$ requires solving a set constrained convex program. If the dimension $n$ is large, such a set constrained convex program should be solved via a gradient based method instead of a Newton method. However, the gradient based method for set constrained convex programs is an iterative technique and involves at least one projection operation at each iteration.

Algorithm 2 (Algorithm 1 in [7])
Let $\alpha > 0$ be a constant parameter. Choose any $x(-1) \in \mathcal{X}$. Initialize the virtual queues $Q_k(0) = \max\{0, -g_k(x(-1))\}, \forall k \in \{1,2,\dots,m\}$. At each iteration $t \in \{0,1,2,\dots\}$, observe $x(t-1)$ and $Q(t)$ and do the following:
- Choose $x(t) = \operatorname{argmin}_{x\in\mathcal{X}}\big\{f(x) + [Q(t) + g(x(t-1))]^T g(x) + \alpha\|x - x(t-1)\|^2\big\}$.
- Update the virtual queue vector $Q(t)$ via $Q_k(t+1) = \max\{-g_k(x(t)), Q_k(t) + g_k(x(t))\}, \forall k \in \{1,2,\dots,m\}$.
- Update the running average $\bar{x}(t+1) = \frac{1}{t+1}\sum_{\tau=0}^{t} x(\tau) = \bar{x}(t)\,\frac{t}{t+1} + x(t)\,\frac{1}{t+1}$.
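The following sketch (ours, not from the paper) illustrates how the $x(t)$ update of Algorithm 2 decomposes in one hypothetical separable instance: a coordinate-wise quadratic objective, linear constraints $g(x) = Mx - d$, and a box set $\mathcal{X}$. Under these assumptions the argmin splits into $n$ independent one-dimensional problems with closed-form solutions.

```python
import numpy as np

def algorithm2_x_update_separable(a, b, M, d, Q, x_prev, alpha, lo, hi):
    """One x(t) update of Algorithm 2 for a hypothetical separable instance:
        f(x) = sum_i (a_i/2) x_i^2 + b_i x_i   (a_i >= 0),
        g(x) = M x - d   (each g_k linear),     X = [lo, hi]^n (a box).
    The argmin over X splits into n independent one-dimensional quadratics."""
    w = Q + (M @ x_prev - d)          # weights Q_k(t) + g_k(x(t-1))
    lin = b + M.T @ w                 # linear coefficient seen by each coordinate
    # minimize (a_i/2) x_i^2 + lin_i x_i + alpha (x_i - x_prev_i)^2 over [lo_i, hi_i]
    x_unc = (2 * alpha * x_prev - lin) / (a + 2 * alpha)   # unconstrained 1-D minimizer
    return np.clip(x_unc, lo, hi)     # clipping is exact for a 1-D convex quadratic on an interval
```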
D. New Algorithm

Consider large scale convex programs with non-separable $f(x)$ or $g_k(x)$, e.g., $f(x) = \|Ax - b\|^2$. In this case, Algorithm 1 has convergence rate $O(1/\sqrt{t})$ using low complexity iterations, while Algorithm 2 has convergence rate $O(1/t)$ using high complexity iterations. This paper proposes a new algorithm, described in Algorithm 3, which combines the advantages of Algorithm 1 and Algorithm 2. The new algorithm modifies Algorithm 2 by changing the update of $x(t)$ from a minimization problem to a simple projection. Meanwhile, the $O(1/t)$ convergence rate of Algorithm 2 is preserved in the new algorithm.

Algorithm 3 New Algorithm
Let $\gamma > 0$ be a constant step size. Choose any $x(-1) \in \mathcal{X}$. Initialize the virtual queues $Q_k(0) = \max\{0, -g_k(x(-1))\}, \forall k \in \{1,2,\dots,m\}$. At each iteration $t \in \{0,1,2,\dots\}$, observe $x(t-1)$ and $Q(t)$ and do the following:
- Define $d(t) = \nabla f(x(t-1)) + \sum_{k=1}^m [Q_k(t) + g_k(x(t-1))]\nabla g_k(x(t-1))$, which is the gradient of the function $\phi(x) = f(x) + [Q(t) + g(x(t-1))]^T g(x)$ at the point $x = x(t-1)$.
- Choose $x(t) = \mathcal{P}_{\mathcal{X}}[x(t-1) - \gamma d(t)]$, where $\mathcal{P}_{\mathcal{X}}[\cdot]$ is the projection onto the convex set $\mathcal{X}$.
- Update the virtual queue vector $Q(t)$ via $Q_k(t+1) = \max\{-g_k(x(t)), Q_k(t) + g_k(x(t))\}, \forall k \in \{1,2,\dots,m\}$.
- Update the running average $\bar{x}(t+1) = \frac{1}{t+1}\sum_{\tau=0}^{t} x(\tau) = \bar{x}(t)\,\frac{t}{t+1} + x(t)\,\frac{1}{t+1}$.
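A minimal Python sketch of Algorithm 3 (ours, with illustrative names; the problem oracles `f_grad`, `g`, `g_jac`, `project_X` and the step size `gamma` are supplied by the user, with `gamma` chosen as in (12) below):

```python
import numpy as np

def new_primal_dual(f_grad, g, g_jac, project_X, x_init, gamma, T):
    """Sketch of Algorithm 3 (virtual-queue based primal-dual gradient method).

    f_grad(x): gradient of f; g(x): vector of constraint values; g_jac(x): Jacobian
    of g (row k is the gradient of g_k); project_X: Euclidean projection onto X;
    x_init: the point x(-1); gamma: constant step size; T: number of iterations."""
    x = x_init.astype(float).copy()
    Q = np.maximum(0.0, -g(x))                 # Q_k(0) = max{0, -g_k(x(-1))}
    x_bar = np.zeros_like(x)
    for t in range(T):
        w = Q + g(x)                           # Q(t) + g(x(t-1)), componentwise nonnegative
        d = f_grad(x) + g_jac(x).T @ w         # d(t): gradient of phi(x) at x(t-1)
        x = project_X(x - gamma * d)           # simple projected gradient step
        Q = np.maximum(-g(x), Q + g(x))        # virtual queue update with g(x(t))
        x_bar = x_bar * (t / (t + 1)) + x * (1 / (t + 1))   # running average of x(0..t)
    return x_bar, Q
```

Each iteration costs one gradient evaluation and one projection, exactly like Algorithm 1, while the averaged iterate `x_bar` enjoys the $O(1/t)$ guarantees established in Section III.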

II. PRELIMINARIES AND BASIC ANALYSIS

This section presents useful preliminaries on convex analysis and important facts about Algorithm 3.

A. Preliminaries

Definition 1 (Lipschitz Continuity): Let $\mathcal{X} \subseteq \mathbb{R}^n$ be a convex set. A function $h : \mathcal{X} \to \mathbb{R}^m$ is said to be Lipschitz continuous on $\mathcal{X}$ with modulus $L$ if there exists $L > 0$ such that $\|h(y) - h(x)\| \le L\|y - x\|$ for all $x, y \in \mathcal{X}$.

Definition 2 (Smooth Functions): Let $\mathcal{X} \subseteq \mathbb{R}^n$ and let the function $h(x)$ be continuously differentiable on $\mathcal{X}$. The function $h(x)$ is said to be smooth on $\mathcal{X}$ with modulus $L$ if $\nabla h(x)$ is Lipschitz continuous on $\mathcal{X}$ with modulus $L$.

Note that the linear function $h(x) = a^T x$ is smooth with modulus $0$. If a function $h(x)$ is smooth with modulus $L$, then $ch(x)$ is smooth with modulus $cL$ for any $c > 0$.

Lemma 1 (Descent Lemma, Proposition A.24 in [3]): If $h$ is smooth on $\mathcal{X}$ with modulus $L$, then $h(y) \le h(x) + \nabla h(x)^T(y - x) + \frac{L}{2}\|y - x\|^2$ for all $x, y \in \mathcal{X}$.

Definition 3 (Strongly Convex Functions): Let $\mathcal{X} \subseteq \mathbb{R}^n$ be a convex set. A function $h$ is said to be strongly convex on $\mathcal{X}$ with modulus $\alpha$ if there exists a constant $\alpha > 0$ such that $h(x) - \frac{\alpha}{2}\|x\|^2$ is convex on $\mathcal{X}$.

If $h(x)$ is convex and $\alpha > 0$, then $h(x) + \alpha\|x - x_0\|^2$ is strongly convex with modulus $2\alpha$ for any constant $x_0$.

Lemma 2: Let $\mathcal{X} \subseteq \mathbb{R}^n$ be a convex set. Let the function $h$ be strongly convex on $\mathcal{X}$ with modulus $\alpha$ and let $x^{opt}$ be a global minimum of $h$ on $\mathcal{X}$. Then $h(x^{opt}) \le h(x) - \frac{\alpha}{2}\|x^{opt} - x\|^2$ for all $x \in \mathcal{X}$.

Proof: A special case when $h$ is differentiable and $\mathcal{X} = \mathbb{R}^n$ is Theorem 2.1.8 in [8]. The proof for a general strongly convex function $h$ and a general convex set $\mathcal{X}$ is given in [7].

B. Basic Properties

This subsection presents preliminary results related to the virtual queue update (Lemmas 3-6) that are proven in [7].

Lemma 3 (Lemma 3 in [7]): In Algorithm 3, we have:
1) At each iteration $t \in \{0,1,2,\dots\}$, $Q_k(t) \ge 0$ for all $k \in \{1,2,\dots,m\}$.
2) At each iteration $t \in \{0,1,2,\dots\}$, $Q_k(t) + g_k(x(t-1)) \ge 0$ for all $k \in \{1,\dots,m\}$.
3) At iteration $t = 0$, $\|Q(0)\| \le \|g(x(-1))\|$. At each iteration $t \in \{1,2,\dots\}$, $\|Q(t)\| \ge \|g(x(t-1))\|$.

Lemma 4 (Lemma 7 in [7]): Let $Q(t), t \in \{0,1,\dots\}$ be the sequence generated by Algorithm 3. For any $t \ge 1$, $Q_k(t) \ge \sum_{\tau=0}^{t-1} g_k(x(\tau))$ for all $k \in \{1,2,\dots,m\}$.

Let $Q(t) = [Q_1(t), \dots, Q_m(t)]^T$ be the vector of virtual queue backlogs. Define $L(t) = \frac{1}{2}\|Q(t)\|^2$. The function $L(t)$ shall be called a Lyapunov function. Define the Lyapunov drift as $\Delta(t) = L(t+1) - L(t) = \frac{1}{2}\big[\|Q(t+1)\|^2 - \|Q(t)\|^2\big]$.

Lemma 5 (Lemma 4 in [7]): At each iteration $t \in \{0,1,2,\dots\}$ in Algorithm 3, an upper bound of the Lyapunov drift is given by

$\Delta(t) \le Q^T(t) g(x(t)) + \|g(x(t))\|^2$.   (4)

Lemma 6 (Lemma 8 in [7]): Let $x^*$ be an optimal solution and let $\lambda^*$ be defined in Assumption 2. Let $x(t), Q(t), t \in \{0,1,\dots\}$ be the sequences generated by Algorithm 3. Then $\sum_{\tau=0}^{t-1} f(x(\tau)) \ge t f(x^*) - \|\lambda^*\|\,\|Q(t)\|$ for all $t \ge 1$.
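For intuition, here is a short derivation (ours, not reproduced from [7]) of why the virtual queue update implies the drift bound (4). Since $Q_k(t+1) = \max\{-g_k(x(t)), Q_k(t) + g_k(x(t))\}$ and $\max\{a, b\} \le a + b$ whenever $a, b \ge 0$, we have

$Q_k(t+1)^2 = \max\big\{g_k(x(t))^2,\ [Q_k(t) + g_k(x(t))]^2\big\} \le g_k(x(t))^2 + [Q_k(t) + g_k(x(t))]^2 = Q_k(t)^2 + 2Q_k(t)g_k(x(t)) + 2g_k(x(t))^2$.

Summing over $k \in \{1,2,\dots,m\}$ and dividing by 2 gives $\Delta(t) \le Q^T(t) g(x(t)) + \|g(x(t))\|^2$, which is exactly (4).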
III. CONVERGENCE RATE ANALYSIS OF ALGORITHM 3

This section analyzes the convergence rate of Algorithm 3 for problem (1)-(3).

A. Upper Bounds of the Drift-Plus-Penalty Expression

Lemma 7: Let $x^*$ be an optimal solution. For all $t \ge 0$ in Algorithm 3, we have

$\Delta(t) + f(x(t)) \le f(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big] + \frac{1}{2}\big[\|g(x(t))\|^2 - \|g(x(t-1))\|^2\big] + \frac{1}{2}\big[\beta^2 + L_f + \|Q(t)\|\|L_g\| + C\|L_g\| - \frac{1}{\gamma}\big]\|x(t) - x(t-1)\|^2$,

where $\beta$, $L_f$, $L_g$ and $C$ are defined in Assumption 1.

Proof: Fix $t \ge 0$. Recall that $\phi(x) = f(x) + [Q(t) + g(x(t-1))]^T g(x)$ as defined in Algorithm 3. Note that part 2 in Lemma 3 implies that $Q(t) + g(x(t-1))$ is component-wise nonnegative. Hence, $\phi(x)$ is convex. Since $d(t) = \nabla\phi(x(t-1))$, the projection operator in Algorithm 3 can be reinterpreted as an optimization problem:

$x(t) = \mathcal{P}_{\mathcal{X}}[x(t-1) - \gamma d(t)] \overset{(a)}{=} \operatorname{argmin}_{x\in\mathcal{X}}\Big[\phi(x(t-1)) + \nabla^T\phi(x(t-1))[x - x(t-1)] + \frac{1}{2\gamma}\|x - x(t-1)\|^2\Big]$,   (5)

where (a) follows by removing the constant term $\phi(x(t-1))$ in the minimization, completing the square, and using the fact that the projection of a point onto a set is equivalent to the minimization of the Euclidean distance to this point over the same set. (See [9] for the detailed proof.)

Since $\frac{1}{2\gamma}\|x - x(t-1)\|^2$ is strongly convex with respect to $x$ with modulus $\frac{1}{\gamma}$, it follows that $\phi(x(t-1)) + \nabla^T\phi(x(t-1))[x - x(t-1)] + \frac{1}{2\gamma}\|x - x(t-1)\|^2$ is strongly convex with respect to $x$ with modulus $\frac{1}{\gamma}$.

Since $x(t)$ is chosen to minimize the above strongly convex function, by Lemma 2 we have

$\phi(x(t-1)) + \nabla^T\phi(x(t-1))[x(t) - x(t-1)] + \frac{1}{2\gamma}\|x(t) - x(t-1)\|^2$
$\le \phi(x(t-1)) + \nabla^T\phi(x(t-1))[x^* - x(t-1)] + \frac{1}{2\gamma}\|x^* - x(t-1)\|^2 - \frac{1}{2\gamma}\|x^* - x(t)\|^2$
$\overset{(a)}{\le} \phi(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big]$
$\overset{(b)}{=} f(x^*) + \underbrace{[Q(t) + g(x(t-1))]^T g(x^*)}_{\le 0} + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big]$
$\overset{(c)}{\le} f(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big]$,   (6)

where (a) follows from the convexity of $\phi(x)$; (b) follows from the definition of $\phi(x)$; and (c) follows by using the facts that $g_k(x^*) \le 0$ and $Q_k(t) + g_k(x(t-1)) \ge 0$ (i.e., part 2 in Lemma 3) for all $k \in \{1,2,\dots,m\}$ to eliminate the term marked by an underbrace.

Recall that $f(x)$ is smooth on $\mathcal{X}$ with modulus $L_f$ by Assumption 1. By Lemma 1, we have

$f(x(t)) \le f(x(t-1)) + \nabla^T f(x(t-1))[x(t) - x(t-1)] + \frac{L_f}{2}\|x(t) - x(t-1)\|^2$.   (7)

Recall that each $g_k(x)$ is smooth on $\mathcal{X}$ with modulus $L_{g_k}$ by Assumption 1. Thus, $[Q_k(t) + g_k(x(t-1))]\, g_k(x)$ is smooth with modulus $[Q_k(t) + g_k(x(t-1))]\, L_{g_k}$. By Lemma 1, we have

$[Q_k(t) + g_k(x(t-1))]\, g_k(x(t)) \le [Q_k(t) + g_k(x(t-1))]\, g_k(x(t-1)) + [Q_k(t) + g_k(x(t-1))]\,\nabla^T g_k(x(t-1))[x(t) - x(t-1)] + \frac{[Q_k(t) + g_k(x(t-1))]\, L_{g_k}}{2}\|x(t) - x(t-1)\|^2$.

Summing this inequality over $k \in \{1,2,\dots,m\}$ yields

$[Q(t) + g(x(t-1))]^T g(x(t)) \le [Q(t) + g(x(t-1))]^T g(x(t-1)) + \sum_{k=1}^m [Q_k(t) + g_k(x(t-1))]\,\nabla^T g_k(x(t-1))[x(t) - x(t-1)] + \frac{[Q(t) + g(x(t-1))]^T L_g}{2}\|x(t) - x(t-1)\|^2$.   (8)

Summing (7) and (8) together yields

$f(x(t)) + [Q(t) + g(x(t-1))]^T g(x(t)) \overset{(a)}{\le} \phi(x(t-1)) + \nabla^T\phi(x(t-1))[x(t) - x(t-1)] + \frac{L_f + [Q(t) + g(x(t-1))]^T L_g}{2}\|x(t) - x(t-1)\|^2$,   (9)

where (a) follows from the definition of $\phi(x)$.

Substituting (6) into (9) yields

$f(x(t)) + [Q(t) + g(x(t-1))]^T g(x(t)) \le f(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big] + \frac{L_f + [Q(t) + g(x(t-1))]^T L_g - \frac{1}{\gamma}}{2}\|x(t) - x(t-1)\|^2$.   (10)

Note that $u_1^T u_2 = \frac{1}{2}\big[\|u_1\|^2 + \|u_2\|^2 - \|u_1 - u_2\|^2\big]$ for any $u_1, u_2 \in \mathbb{R}^m$. Thus, we have $g(x(t-1))^T g(x(t)) = \frac{1}{2}\big[\|g(x(t-1))\|^2 + \|g(x(t))\|^2 - \|g(x(t-1)) - g(x(t))\|^2\big]$. Substituting this into (10) and rearranging terms yields

$f(x(t)) + Q^T(t) g(x(t)) \le f(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big] + \frac{L_f + [Q(t) + g(x(t-1))]^T L_g - \frac{1}{\gamma}}{2}\|x(t) - x(t-1)\|^2 + \frac{1}{2}\|g(x(t-1)) - g(x(t))\|^2 - \frac{1}{2}\|g(x(t-1))\|^2 - \frac{1}{2}\|g(x(t))\|^2$
$\overset{(a)}{\le} f(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big] + \frac{\beta^2 + L_f + [Q(t) + g(x(t-1))]^T L_g - \frac{1}{\gamma}}{2}\|x(t) - x(t-1)\|^2 - \frac{1}{2}\|g(x(t-1))\|^2 - \frac{1}{2}\|g(x(t))\|^2$,

where (a) follows from $\|g(x(t-1)) - g(x(t))\| \le \beta\|x(t) - x(t-1)\|$, which in turn follows from the assumption that $g(x)$ is Lipschitz continuous with modulus $\beta$.

Summing (4) with this inequality yields

$\Delta(t) + f(x(t)) \le f(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big] + \frac{1}{2}\big[\|g(x(t))\|^2 - \|g(x(t-1))\|^2\big] + \frac{\beta^2 + L_f + [Q(t) + g(x(t-1))]^T L_g - \frac{1}{\gamma}}{2}\|x(t) - x(t-1)\|^2$
$\overset{(a)}{\le} f(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big] + \frac{1}{2}\big[\|g(x(t))\|^2 - \|g(x(t-1))\|^2\big] + \frac{\beta^2 + L_f + \|Q(t)\|\|L_g\| + C\|L_g\| - \frac{1}{\gamma}}{2}\|x(t) - x(t-1)\|^2$,

where (a) follows from $[Q(t) + g(x(t-1))]^T L_g \le \|Q(t) + g(x(t-1))\|\,\|L_g\| \le \big(\|Q(t)\| + \|g(x(t-1))\|\big)\|L_g\| \le \|Q(t)\|\|L_g\| + C\|L_g\|$; the first step follows from the Cauchy-Schwarz inequality, the second from the triangle inequality, and the third from $\|g(x)\| \le C$ for all $x \in \mathcal{X}$, i.e., Assumption 1. This proves the lemma.

Lemma 8: Let $x^*$ be an optimal solution and let $\lambda^*$ be defined in Assumption 2. Define $D = \beta^2 + L_f + 2\|\lambda^*\|\|L_g\| + 2C\|L_g\|$, where $\beta$, $L_f$, $L_g$ and $C$ are defined in Assumption 1. If $\gamma > 0$ in Algorithm 3 satisfies

$D + \frac{\|L_g\| R}{\sqrt{\gamma}} - \frac{1}{\gamma} \le 0$,   (11)

where $R$ is defined in Assumption 1, e.g., if

$0 < \gamma \le \frac{1}{(\|L_g\| R + \sqrt{D})^2}$,   (12)

then at each iteration $t \in \{0,1,2,\dots\}$, we have
1) $\|Q(t)\| \le 2\|\lambda^*\| + \frac{R}{\sqrt{\gamma}} + C$;
2) $\Delta(t) + f(x(t)) \le f(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(t-1)\|^2 - \|x^* - x(t)\|^2\big] + \frac{1}{2}\big[\|g(x(t))\|^2 - \|g(x(t-1))\|^2\big]$.

Proof: Before the main proof, we verify that any $\gamma$ given by (12) satisfies (11). We need to choose $\gamma > 0$ such that

$D + \frac{\|L_g\| R}{\sqrt{\gamma}} - \frac{1}{\gamma} \le 0 \iff D\gamma + \|L_g\| R \sqrt{\gamma} - 1 \le 0 \iff \sqrt{\gamma} \le \frac{-\|L_g\| R + \sqrt{\|L_g\|^2 R^2 + 4D}}{2D} = \frac{2}{\|L_g\| R + \sqrt{\|L_g\|^2 R^2 + 4D}}$.

Note that $\frac{2}{\|L_g\| R + \sqrt{\|L_g\|^2 R^2 + 4D}} \overset{(a)}{\ge} \frac{2}{\|L_g\| R + \|L_g\| R + 2\sqrt{D}} = \frac{1}{\|L_g\| R + \sqrt{D}}$, where (a) follows from $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$ for all $a, b \ge 0$. Thus, if $\sqrt{\gamma} \le \frac{1}{\|L_g\| R + \sqrt{D}}$, i.e., $0 < \gamma \le \frac{1}{(\|L_g\| R + \sqrt{D})^2}$, then inequality (11) holds.

Next, we prove the lemma by induction. Consider $t = 0$. The bound $\|Q(0)\| \le 2\|\lambda^*\| + \frac{R}{\sqrt{\gamma}} + C$ follows from $\|Q(0)\| \overset{(a)}{\le} \|g(x(-1))\| \overset{(b)}{\le} C$, where (a) follows from part 3 in Lemma 3 and (b) follows from Assumption 1.

PROC. IEEE CONFERENCE ON DECISION AND CONTROL, 06 from Assumpion. Thus, he firs par in his lemma holds a ieraion = 0. Noe ha β + L f + Q(0) L g + C L g /γ β + L f + ( λ + R γ + C ) L g + C L g γ =D + L g R γ γ 0, (3) where follows from Q(0) λ + R γ +C; follows from he definiion of D; and follows from (), i.e., he selecion rule of γ. Applying Lemma 7 a ieraion = 0 yields (0) + f(x(0)) f(x ) + γ x x( ) x x(0) + g(x(0)) g(x( )) + β + L f x(0) x( ) + Q(0) L g + C L g γ f(x ) + γ x x( ) x x(0) + g(x(0)) g(x( )), where follows from (3). Thus, he second par in his lemma holds a ieraion = 0. Assume (τ) + f(x(τ)) f(x ) + γ x x(τ ) x x(τ) + g(x(τ)) g(x(τ )) holds for all 0 τ and consider ieraion +. Summing his inequaliy over τ {0,,..., } yields (τ) + f(x(τ)) ( + )f(x ) + γ x x(τ ) x x(τ) + g(x(τ)) g(x(τ )). Recalling ha (τ) = L(τ + ) L(τ) and simplifying he summaions yields L( + ) L(0) + f(x(τ)) ( + )f(x ) + γ x x( ) γ x x() + g(x()) g(x( )) ( + )f(x ) + γ x x( ) + g(x()) g(x( )). Rearranging erms yields f(x(τ)) ( + )f(x ) + γ x x( ) + g(x()) g(x( )) + L(0) L( + ) = ( + )f(x ) + γ x x( ) + g(x()) g(x( )) + Q(0) Q( + ) ( + )f(x ) + R γ + C Q( + ), (4) where follows from L(0) = Q(0) and L( + ) = Q( + ) ; follows from x y R for all x, y X, i.e., Assumpion, g(x()) C, i.e., Assumpion, and Q(0) g(x( )), i.e., par 3 in Lemma 3. Applying Lemma 6 a ieraion + yields f(x(τ)) ( + )f(x ) λ Q( + ). Combining his inequaliy wih (4) and cancelling he common erm ( + )f(x ) on boh sides yields Q( + ) λ Q( + ) R γ C 0 ( Q( + ) λ ) λ + R γ + C Q( + ) λ + λ + R /γ + C Q( + ) λ + R/ γ + C, where follows from he basic inequaliy a + b + c a + b + c for any a, b, c 0. Thus, he firs par in his lemma holds a ieraion +. Noe ha β + L f + Q( + ) L g + C L g γ β + L f + ( λ + R γ + C) L g + C L g γ =D + L g R γ γ 0, (5) where follows from Q(+) λ + R γ +C; follows from he definiion of D; and follows from (), i.e., he selecion rule of γ. Applying Lemma 7 a ieraion + yields ( + ) + f(x( + )) f(x ) + γ x x() x x( + ) + g(x( + )) g(x()) + β + L f + x( + ) x() Q( + ) L g + C L g γ f(x ) + γ x x() x x( + ) + g(x( + )) g(x()), where follows from (5). Thus, he second par in his lemma holds a ieraion +. Thus, boh pars in his lemma follow by inducion. Remark : Recall ha if each g k (x) is a linear funcion, hen L gk = 0 for all k {,,..., m}. In his case, equaion () reduces o 0 < γ /(β + L f ). B. Objecive Value Violaions Theorem (Objecive Value Violaions): Le x be an opimal soluion. If we choose γ according o () in Algorihm 3, hen for all, we have f(x()) f(x )+ R γ, where R is defined in Assumpion. Proof: Fix. By par in Lemma 8, we have (τ) + f(x(τ)) f(x ) + γ x x(τ )

B. Objective Value Violations

Theorem 1 (Objective Value Violations): Let $x^*$ be an optimal solution. If we choose $\gamma$ according to (12) in Algorithm 3, then for all $t \ge 1$ we have $f(\bar{x}(t)) \le f(x^*) + \frac{R^2}{2\gamma t}$, where $R$ is defined in Assumption 1.

Proof: Fix $t \ge 1$. By part 2 in Lemma 8, we have

$\Delta(\tau) + f(x(\tau)) \le f(x^*) + \frac{1}{2\gamma}\big[\|x^* - x(\tau-1)\|^2 - \|x^* - x(\tau)\|^2\big] + \frac{1}{2}\big[\|g(x(\tau))\|^2 - \|g(x(\tau-1))\|^2\big]$

for all $\tau \in \{0,1,2,\dots\}$. Summing over $\tau \in \{0,1,\dots,t-1\}$ yields

$\sum_{\tau=0}^{t-1}\Delta(\tau) + \sum_{\tau=0}^{t-1} f(x(\tau)) \le t f(x^*) + \frac{1}{2\gamma}\sum_{\tau=0}^{t-1}\big[\|x^* - x(\tau-1)\|^2 - \|x^* - x(\tau)\|^2\big] + \frac{1}{2}\sum_{\tau=0}^{t-1}\big[\|g(x(\tau))\|^2 - \|g(x(\tau-1))\|^2\big]$.

Recalling that $\Delta(\tau) = L(\tau+1) - L(\tau)$ and simplifying the telescoping summations yields

$L(t) - L(0) + \sum_{\tau=0}^{t-1} f(x(\tau)) \le t f(x^*) + \frac{1}{2\gamma}\|x^* - x(-1)\|^2 - \frac{1}{2\gamma}\|x^* - x(t-1)\|^2 + \frac{1}{2}\big[\|g(x(t-1))\|^2 - \|g(x(-1))\|^2\big] \le t f(x^*) + \frac{1}{2\gamma}\|x^* - x(-1)\|^2 + \frac{1}{2}\big[\|g(x(t-1))\|^2 - \|g(x(-1))\|^2\big]$.

Rearranging terms yields

$\sum_{\tau=0}^{t-1} f(x(\tau)) \le t f(x^*) + \frac{1}{2\gamma}\|x^* - x(-1)\|^2 + \frac{1}{2}\big[\|g(x(t-1))\|^2 - \|g(x(-1))\|^2\big] + L(0) - L(t)$
$\overset{(a)}{=} t f(x^*) + \frac{1}{2\gamma}\|x^* - x(-1)\|^2 + \frac{1}{2}\big[\|g(x(t-1))\|^2 - \|g(x(-1))\|^2\big] + \frac{1}{2}\|Q(0)\|^2 - \frac{1}{2}\|Q(t)\|^2$
$\overset{(b)}{\le} t f(x^*) + \frac{1}{2\gamma}\|x^* - x(-1)\|^2 \overset{(c)}{\le} t f(x^*) + \frac{R^2}{2\gamma}$,

where (a) follows from the definitions $L(0) = \frac{1}{2}\|Q(0)\|^2$ and $L(t) = \frac{1}{2}\|Q(t)\|^2$; (b) follows from the facts that $\|Q(0)\| \le \|g(x(-1))\|$ and $\|Q(t)\| \ge \|g(x(t-1))\|$ for $t \ge 1$, i.e., part 3 in Lemma 3; and (c) follows from the fact that $\|x^* - x(-1)\| \le R$, i.e., Assumption 1.

Dividing both sides by the factor $t$ yields $\frac{1}{t}\sum_{\tau=0}^{t-1} f(x(\tau)) \le f(x^*) + \frac{R^2}{2\gamma t}$. Finally, since $\bar{x}(t) = \frac{1}{t}\sum_{\tau=0}^{t-1} x(\tau)$ and $f(x)$ is convex, by Jensen's inequality it follows that $f(\bar{x}(t)) \le \frac{1}{t}\sum_{\tau=0}^{t-1} f(x(\tau))$.

C. Constraint Violations

Theorem 2 (Constraint Violations): Let $x^*$ be an optimal solution and let $\lambda^*$ be defined in Assumption 2. If we choose $\gamma$ according to (12) in Algorithm 3, then for all $t \ge 1$ the constraints satisfy $g_k(\bar{x}(t)) \le \frac{1}{t}\big(2\|\lambda^*\| + \frac{R}{\sqrt{\gamma}} + C\big), \forall k \in \{1,2,\dots,m\}$, where $R$ and $C$ are defined in Assumption 1.

Proof: Fix $t \ge 1$ and $k \in \{1,2,\dots,m\}$. Recall that $\bar{x}(t) = \frac{1}{t}\sum_{\tau=0}^{t-1} x(\tau)$. Thus,

$g_k(\bar{x}(t)) \overset{(a)}{\le} \frac{1}{t}\sum_{\tau=0}^{t-1} g_k(x(\tau)) \overset{(b)}{\le} \frac{Q_k(t)}{t} \le \frac{\|Q(t)\|}{t} \overset{(c)}{\le} \frac{1}{t}\Big(2\|\lambda^*\| + \frac{R}{\sqrt{\gamma}} + C\Big)$,

where (a) follows from the convexity of $g_k(x)$ and Jensen's inequality; (b) follows from Lemma 4; and (c) follows from part 1 in Lemma 8.

Theorems 1 and 2 show that Algorithm 3 ensures the error decays like $O(1/t)$ and provides an $\epsilon$-approximate solution with convergence time $O(1/\epsilon)$.

D. Practical Implementations

By Theorems 1 and 2, it suffices to choose $\gamma$ according to (12) to guarantee the $O(1/t)$ convergence rate of Algorithm 3. If all constraint functions are linear, then (12) is independent of $\lambda^*$ by Remark 1. For general constraint functions, we need to know the value of $\|\lambda^*\|$, which is typically unknown, to select $\gamma$ according to (12). However, it is easy to observe that an upper bound of $\|\lambda^*\|$ is sufficient for us to choose $\gamma$ satisfying (12). To obtain an upper bound of $\|\lambda^*\|$, the next lemma is useful if problem (1)-(3) has an interior feasible point, i.e., the Slater condition is satisfied.

Lemma 9 (Lemma 1 in [5]): Consider the convex program $\min f(x)$ s.t. $g_k(x) \le 0, k \in \{1,2,\dots,m\}$, $x \in \mathcal{X} \subseteq \mathbb{R}^n$, and define the Lagrangian dual function $q(\lambda) = \inf_{x\in\mathcal{X}}\{f(x) + \lambda^T g(x)\}$. If the Slater condition holds, i.e., there exists $\hat{x} \in \mathcal{X}$ such that $g_j(\hat{x}) < 0$ for all $j \in \{1,2,\dots,m\}$, then the level sets $V_{\hat{\lambda}} = \{\lambda \ge 0 : q(\lambda) \ge q(\hat{\lambda})\}$ are bounded for any $\hat{\lambda} \ge 0$. In particular, we have

$\max_{\lambda \in V_{\hat{\lambda}}} \|\lambda\| \le \frac{1}{\min_{1\le j \le m}\{-g_j(\hat{x})\}}\big(f(\hat{x}) - q(\hat{\lambda})\big)$.

By Lemma 9, if convex program (1)-(3) has a feasible point $\hat{x} \in \mathcal{X}$ such that $g_k(\hat{x}) < 0, \forall k \in \{1,2,\dots,m\}$, then we can take an arbitrary $\hat{\lambda} \ge 0$, compute the value $q(\hat{\lambda}) = \inf_{x\in\mathcal{X}}\{f(x) + \hat{\lambda}^T g(x)\}$, and conclude that $\|\lambda^*\| \le \frac{1}{\min_{1\le j\le m}\{-g_j(\hat{x})\}}\big(f(\hat{x}) - q(\hat{\lambda})\big)$. Since $f(x)$ is continuous and $\mathcal{X}$ is a compact set, there exists a constant $F > 0$ such that $|f(x)| \le F$ for all $x \in \mathcal{X}$. Thus, we can take $\hat{\lambda} = 0$ such that $q(\hat{\lambda}) = \min_{x\in\mathcal{X}}\{f(x)\} \ge -F$. It follows from Lemma 9 that

$\|\lambda^*\| \le \frac{1}{\min_{1\le j\le m}\{-g_j(\hat{x})\}}\big(f(\hat{x}) - q(\hat{\lambda})\big) \le \frac{2F}{\min_{1\le j\le m}\{-g_j(\hat{x})\}}$.
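A small helper (ours, not from the paper) that evaluates this bound; its output can be used in place of $\|\lambda^*\|$ when computing $D$ and the step size in (12), for example via the `step_size_gamma` sketch above:

```python
def lambda_star_upper_bound(g_at_x_hat, F):
    """Upper bound on ||lambda*|| via Lemma 9 with the choice lambda_hat = 0:
        ||lambda*|| <= (f(x_hat) - q(0)) / min_j{-g_j(x_hat)} <= 2F / min_j{-g_j(x_hat)}.
    g_at_x_hat: the values g_j(x_hat) at a Slater point (all strictly negative);
    F: a constant with |f(x)| <= F on X. Both inputs are illustrative assumptions."""
    slack = min(-gj for gj in g_at_x_hat)
    assert slack > 0, "x_hat must satisfy g_j(x_hat) < 0 for every j"
    return 2.0 * F / slack
```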
REFERENCES

[1] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[2] A. Nedić and A. Ozdaglar, "Subgradient methods for saddle-point problems," Journal of Optimization Theory and Applications, vol. 142, no. 1, pp. 205-228, 2009.
[3] D. P. Bertsekas, Nonlinear Programming, 2nd ed. Athena Scientific, 1999.
[4] M. J. Neely, "Distributed and secure computation of convex programs over a network of connected processors," in DCDIS Conference, Guelph, July 2005.
[5] A. Nedić and A. Ozdaglar, "Approximate primal solutions and rate analysis for dual subgradient methods," SIAM Journal on Optimization, vol. 19, no. 4, pp. 1757-1780, 2009.
[6] M. J. Neely, "A simple convergence time analysis of drift-plus-penalty for stochastic optimization and convex programs," arXiv:1412.0791, 2014.
[7] H. Yu and M. J. Neely, "A simple parallel algorithm with an O(1/t) convergence rate for general convex programs," arXiv:1512.08370, 2015.
[8] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course. Springer Science & Business Media, 2004.
[9] H. Yu and M. J. Neely, "A primal-dual type algorithm with the O(1/t) convergence rate for large scale constrained convex programs," arXiv:1604.02216, 2016.