Log-Sigmoid Multipliers Method in Constrained Optimization

Annals of Operations Research 101, Kluwer Academic Publishers. Manufactured in The Netherlands.

Log-Sigmoid Multipliers Method in Constrained Optimization

ROMAN A. POLYAK
Systems Engineering and Operations Research Department and Mathematical Sciences Department, George Mason University, Fairfax, VA 22030, USA

Abstract. In this paper we introduce and analyze the Log-Sigmoid (LS) multipliers method for constrained optimization. The LS method is to the recently developed smoothing technique what the augmented Lagrangian is to the penalty method, or the modified barrier to classical barrier methods. At the same time the LS method has some specific properties which make it substantially different from other nonquadratic augmented Lagrangian techniques. We establish convergence of the LS-type penalty method under very mild assumptions on the input data and estimate the rate of convergence of the LS multipliers method under the standard second order optimality condition for both exact and nonexact minimization. Some important properties of the dual function and the dual problem, which are based on the LS Lagrangian, are established, and the primal-dual LS method is introduced.

Keywords: log-sigmoid, multipliers method, duality, smoothing technique

1. Introduction

Recently Chen and Mangasarian used the integral of the scaled sigmoid function $S(t,k) = (1 + \exp(-kt))^{-1}$ as an approximation of $x_+ = \max\{0,x\}$ to develop the smoothing technique for solving convex systems of inequalities and linear complementarity problems [6]. Later Auslender et al. analyzed the smoothing technique for constrained optimization [1].

The smoothing method for constrained optimization employs a smooth approximation of $x_+$ to transform a constrained optimization problem into a sequence of unconstrained optimization problems. The convergence of the corresponding sequence of unconstrained minimizers to the primal solution is due to the unbounded increase of the scaling parameter. So the smoothing technique is in fact a penalty-type method with a smooth penalty function and can be considered as a particular case of SUMT [7]. There are a few well-known difficulties associated with the penalty-type approach: rather slow convergence, the Hessian of the penalty function becomes ill-conditioned, and the area where the Newton method is well defined shrinks to a point as the scaling parameter $k \to \infty$.

This paper is dedicated to Professor Anthony V. Fiacco on the occasion of his 70th birthday. Partially supported by NSF under Grant DMS.
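To make the smoothing idea concrete, the sketch below (illustrative Python, not from the paper; the function name and sample points are our own choices) compares $x_+ = \max\{0,x\}$ with the integral of the scaled sigmoid, $p(x,k) = k^{-1}\ln(1+e^{kx})$, which approaches $x_+$ from above as $k$ grows.

```python
import numpy as np

def sigmoid_int(x, k):
    """Integral of the scaled sigmoid S(t,k) = (1 + exp(-k t))^(-1):
    p(x,k) = k^(-1) * ln(1 + exp(k x)), a smooth upper approximation of max(0, x).
    np.logaddexp is used for numerical stability at large |k x|."""
    return np.logaddexp(0.0, k * x) / k

xs = np.linspace(-2.0, 2.0, 9)
for k in (1.0, 10.0, 100.0):
    gap = np.max(sigmoid_int(xs, k) - np.maximum(xs, 0.0))
    print(f"k = {k:6.1f}   max gap to x_+ = {gap:.2e}")   # gap = ln(2)/k, attained at x = 0
```

The maximal gap is $\ln 2/k$, so the approximation error vanishes only as $k \to \infty$, which is the source of the ill-conditioning discussed above.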

It motivates an alternative approach. We use the Log-Sigmoid (LS) function $\psi(t) = 2\ln(2S(t,1))$ to transform the constraints of a given constrained optimization problem into an equivalent one. The transformation $\psi(t)$ is parametrized by a positive scaling parameter. Simultaneously we transform the objective function with a log-sigmoid type transformation. The classical Lagrangian for the equivalent problem, the Log-Sigmoid Lagrangian (LSL), is our basic instrument.

There are three basic reasons for using the LS transformation and the corresponding Lagrangian:
1) $\psi \in C^\infty$ on $(-\infty,\infty)$;
2) the LSL is as smooth as the initial functions in the entire primal space;
3) $\psi'$ and $\psi''$ are bounded on $(-\infty,\infty)$.

Sequential unconstrained minimization of the LSL in primal space followed by an explicit formula for the Lagrange multipliers update forms the LS multipliers method.

Our first contribution is the convergence proof of the LS multipliers method. It is proven that for an inequality constrained optimization problem which satisfies the standard second order optimality conditions, the LS method converges with a Q-linear rate for any fixed but large enough scaling parameter. If one changes the scaling parameter from step to step, as is done in the smoothing methods, then the rate of convergence is Q-superlinear. It is worth mentioning that such a substantial improvement of the rate of convergence is achieved without increasing the computational effort per step as compared with the smoothing technique.

Our second contribution is the proof that a particular modification of the LS method retains the up to Q-superlinear rate of convergence if instead of the exact primal minimizer one uses its approximation. This makes the LS multipliers method practical and, together with the properties 1)-3) of the transformation $\psi$, increases the efficiency of the Newton method for constrained optimization.

We also discovered that the dual function and the dual problem which are based on the LSL have some extra important properties on top of those which are typical for the classical dual function and the corresponding dual problem. The new properties of the dual function allow one to use Newton-type methods for solving the dual problem, which leads to second order multipliers methods with up to quadratic rate of convergence.

Finally we introduce the primal-dual LS method, which has been tested numerically on a number of LP and NLP problems. The numerical results obtained clearly indicate that the primal-dual LS method can be very efficient in the final phase of the computational process.

The paper is organized as follows. The problem formulation and the basic assumptions are given in the next section. In section 3 we consider the LS transformation and its properties. In section 4 we consider the equivalent problem and the corresponding

Lagrangian. The LS multipliers method is introduced in section 5. In section 6 we establish the convergence and estimate the rate of convergence of the LS multipliers method. In section 7 we consider the modification of the LS method and show that the rate of convergence of the LS method can be retained for inexact minimization. The primal-dual LS method is introduced in section 8. Duality issues related to the LSL are considered in section 9. We conclude the paper with some remarks related to future research.

2. Statement of the problem and basic assumptions

Let $f : \mathbb{R}^n \to \mathbb{R}^1$ be convex and all $c_i : \mathbb{R}^n \to \mathbb{R}^1$, $i = 1,\dots,p$, be concave and smooth functions. We consider the convex set $\Omega = \{x \in \mathbb{R}^n_+ : c_i(x) \ge 0,\ i = 1,\dots,p\}$ and the following convex optimization problem

$X^* = \operatorname{Arg\,min}\{f(x) \mid x \in \Omega\}.$  (P)

We will assume that:

A) The optimal set $X^*$ is not empty and bounded.

B) Slater's condition holds, i.e., there exists $\hat{x} \in \mathbb{R}^n_{++}$ such that $c_i(\hat{x}) > 0$, $i = 1,\dots,p$.

To simplify considerations we will include the nonnegativity constraints $x_i \ge 0$, $i = 1,\dots,n$, into the set $c_i(x) \ge 0$, i.e.,

$\Omega = \{x \in \mathbb{R}^n : c_i(x) \ge 0,\ i = 1,\dots,p,\ c_{p+1}(x) = x_1 \ge 0,\dots,c_{p+n}(x) = x_n \ge 0\} = \{x \in \mathbb{R}^n : c_i(x) \ge 0,\ i = 1,\dots,m\},\quad m = p + n.$

If B) holds and $f(x)$, $c_i(x)$, $i = 1,\dots,m$, are smooth, then the Karush-Kuhn-Tucker (KKT) conditions hold true, i.e., there exists a nonnegative vector $\lambda^* = (\lambda^*_1,\dots,\lambda^*_m)$ such that

$\nabla_x L(x^*,\lambda^*) = \nabla f(x^*) - \sum_{i=1}^m \lambda^*_i \nabla c_i(x^*) = 0,$  (2.1)

$\lambda^*_i c_i(x^*) = 0,\quad i = 1,\dots,m,$  (2.2)

where $L(x,\lambda) = f(x) - \sum_{i=1}^m \lambda_i c_i(x)$ is the Lagrangian for the primal problem (P). Also due to B) the optimal dual set

$L^* = \{\lambda \in \mathbb{R}^m_+ : \nabla f(x^*) - \sum_{i=1}^m \lambda_i\nabla c_i(x^*) = 0,\ x^* \in X^*\}$  (2.3)

is bounded. Along with the primal problem (P) we consider the dual problem

$L^* = \operatorname{Arg\,max}\{d(\lambda) \mid \lambda \in \mathbb{R}^m_+\},$  (D)

where $d(\lambda) = \inf_x L(x,\lambda)$ is the dual function.

Later we will use the standard second order optimality conditions. Let us assume that the active constraint set at $x^*$ is $I^* = \{i : c_i(x^*) = 0\} = \{1,\dots,r\}$. We consider the vector-functions $c^T(x) = (c_1(x),\dots,c_m(x))$, $c^T_{(r)}(x) = (c_1(x),\dots,c_r(x))$ and their Jacobians

$\nabla c(x) = J(c(x)) = \begin{pmatrix}\nabla c_1(x)\\ \vdots\\ \nabla c_m(x)\end{pmatrix},\qquad \nabla c_{(r)}(x) = J(c_{(r)}(x)) = \begin{pmatrix}\nabla c_1(x)\\ \vdots\\ \nabla c_r(x)\end{pmatrix}.$

The sufficient regularity conditions

$\operatorname{rank}\nabla c_{(r)}(x^*) = r,\qquad \lambda^*_i > 0,\ i \in I^*,$  (2.4)

together with the sufficient condition for the minimum $x^*$ to be isolated

$\big(\nabla^2_{xx}L(x^*,\lambda^*)y,y\big) \ge \rho(y,y),\quad \rho > 0,\ \forall y \ne 0 :\ \nabla c_{(r)}(x^*)y = 0$  (2.5)

comprise the standard second order optimality sufficient conditions.

We conclude the section with an assertion which will be used later. The following assertion is a slight modification of the Debreu theorem (see, for example, [11]).

Assertion 2.1. Let $A$ be a symmetric $n \times n$ matrix, $B$ an $r \times n$ matrix, $\Lambda = \operatorname{diag}(\lambda_i)_{i=1}^r$ with $\lambda_i > 0$. If

$(Ay,y) \ge \rho(y,y),\quad \rho > 0,\ \forall y :\ By = 0,$  (2.6)

then there exists $k_0 > 0$ large enough such that for any $0 < \mu < \rho$ we have

$\big((A + kB^T\Lambda B)x,x\big) \ge \mu(x,x),\quad \forall x \in \mathbb{R}^n,$  (2.7)

whenever $k \ge k_0$.

3. Log-sigmoid transformation

The Log-Sigmoid Transformation (LST) $\psi : \mathbb{R} \to (-\infty,2\ln 2)$ we define by the formula

$\psi(t) = 2\ln(2S(t,1)) = 2\ln\big(2(1+e^{-t})^{-1}\big) = 2\big(\ln 2 + t - \ln(1+e^t)\big).$  (3.1)

For the scaled log-sigmoid transformation we have

$k^{-1}\psi(kt) = 2k^{-1}\ln(2S(t,k)) = 2k^{-1}\big(\ln 2 - \ln(1+e^{-kt})\big),\quad k > 0.$

Let us consider the following function:

$v(t,k) = \begin{cases}2t + 2k^{-1}\ln 2, & t \le 0,\\ 2k^{-1}\ln 2, & t \ge 0.\end{cases}$

It is easy to see that

$v(t,k) - k^{-1}\psi(kt) = \begin{cases}2k^{-1}\ln(1+e^{kt}), & t \le 0,\\ 2k^{-1}\ln(1+e^{-kt}), & t \ge 0.\end{cases}$

Therefore the following estimate takes place:

$0 \le v(t,k) - k^{-1}\psi(kt) \le 2k^{-1}\ln 2,\quad -\infty < t < \infty.$

The assertion below states the basic LST properties.

Assertion 3.1. The LST $\psi$ has the following properties:

A1) $\psi(0) = 0$;
A2) $\psi'(t) = 2(1+e^t)^{-1} > 0$, $t \in (-\infty,+\infty)$, and $\psi'(0) = 1$;
A3) $\psi''(t) = -2e^t(1+e^t)^{-2} < 0$, $t \in (-\infty,+\infty)$, and $\psi''(0) = -1/2$;
A4) $\lim_{t\to\infty}\psi'(t) = \lim_{t\to\infty}2(1+e^t)^{-1} = 0$;
A5) a) $0 < \psi'(t) < 2$; b) $-0.5 \le \psi''(t) < 0$, $-\infty < t < \infty$.

One can check properties A1)-A5) directly.

The substantial difference between $\psi(t)$ and the shifted log-barrier function, which leads to the MBF theory and methods [11], is that $\psi(t)$ is defined on $(-\infty,+\infty)$ together with its derivatives of any order. The properties A5) distinguish $\psi(t)$ not only from the shifted barrier and the exponential transformation (see [9,17]), but also from the classes of nonquadratic augmented Lagrangians $P_I$ and $\bar P_I$ (see [4, p. 309]) as well as from transformations which have been considered lately (see [3,8,13,16]).

The properties A5) have a substantial impact on both the global and the local behavior of the LS multipliers method as well as on its dual equivalent, the interior prox method with entropy-like $\varphi$-divergence distance. The entropy-like $\varphi$-divergence distance function and the corresponding interior prox method for the dual problem have been considered in [15]. The LS transformation and the corresponding LS multipliers method, which we consider in section 5, is equivalent to a prox method with an entropy-like $\varphi$-divergence distance for the dual problem. The $\varphi$-divergence distance is based on a Fermi-Dirac kernel, because the Fenchel conjugate of the LS transformation,

$\psi^*(s) = \inf\{st - \psi(t) \mid t \in \mathbb{R}\} = (s-2)\ln(2-s) - s\ln s,$

is in fact a Fermi-Dirac entropy type function. The issues related to the LS multipliers method and its dual equivalent we are going to consider in an upcoming paper.
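As a quick sanity check on Assertion 3.1, the short script below (an illustrative sketch, not from the paper; the helper names are ours) evaluates $\psi$, $\psi'$, $\psi''$ from the closed forms above and confirms A1)-A5) on a grid.

```python
import numpy as np

def psi(t):
    # psi(t) = 2(ln 2 + t - ln(1 + e^t)), written via logaddexp for stability
    return 2.0 * (np.log(2.0) + t - np.logaddexp(0.0, t))

def dpsi(t):      # psi'(t) = 2 / (1 + e^t)
    return 2.0 / (1.0 + np.exp(t))

def d2psi(t):     # psi''(t) = -2 e^t / (1 + e^t)^2
    return -2.0 * np.exp(t) / (1.0 + np.exp(t)) ** 2

t = np.linspace(-30.0, 30.0, 2001)
assert abs(psi(0.0)) < 1e-12 and abs(dpsi(0.0) - 1.0) < 1e-12      # A1), A2)
assert abs(d2psi(0.0) + 0.5) < 1e-12                               # A3)
assert np.all((dpsi(t) > 0.0) & (dpsi(t) < 2.0))                   # A5a)
assert np.all((d2psi(t) >= -0.5) & (d2psi(t) < 0.0))               # A5b)
print("psi'(30) =", dpsi(30.0), " -> tends to 0 (A4)")
```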

4. Equivalent problem and log-sigmoid Lagrangian

We use $\ln(1+e^t)$ to transform the objective function and $\psi(t)$ to transform the constraints. For the objective function we obtain

$f(x) := \ln\big(1+e^{f(x)}\big) > 0.$  (4.1)

The constraints transformation is scaled by the parameter $k > 0$, i.e.,

$c_i(x) \ge 0 \;\Leftrightarrow\; 2k^{-1}\big(\ln 2 - \ln(1+e^{-kc_i(x)})\big) \ge 0,\quad i = 1,2,\dots,m.$

Therefore for any given $k > 0$ the problem

$X^* = \operatorname{Arg\,min}\big\{f(x) \mid 2k^{-1}\big(\ln 2 - \ln(1+e^{-kc_i(x)})\big) \ge 0,\ i = 1,\dots,m\big\}$  (4.2)

is equivalent to the original problem (P). The boundedness of $f(x)$ from below is important for our further considerations.

The Lagrangian for the equivalent problem (4.2), the log-sigmoid Lagrangian, is the main tool in our analysis:

$L(x,\lambda,k) = f(x) + 2k^{-1}\sum_{i=1}^m\lambda_i\ln\big(1+e^{-kc_i(x)}\big) - 2k^{-1}\Big(\sum_{i=1}^m\lambda_i\Big)\ln 2.$  (4.3)

The LSL can be rewritten as follows:

$L(x,\lambda,k) = f(x) - \sum_{i=1}^m\lambda_i c_i(x) + 2k^{-1}\sum_{i=1}^m\lambda_i\ln\frac{e^{kc_i(x)/2}+e^{-kc_i(x)/2}}{2} = L(x,\lambda) + 2k^{-1}\sum_{i=1}^m\lambda_i\ln\cosh\frac{kc_i(x)}{2}.$

The following lemma establishes the basic LSL properties at any KKT pair $(x^*,\lambda^*)$.

Lemma 4.1. For any KKT pair $(x^*,\lambda^*)$ the following LSL properties take place for any $k > 0$:

1°) $L(x^*,\lambda^*,k) = f(x^*)$;
2°) $\nabla_x L(x^*,\lambda^*,k) = \nabla_x L(x^*,\lambda^*) = \nabla f(x^*) - \sum_{i=1}^m\lambda^*_i\nabla c_i(x^*) = 0$;
3°) $\nabla^2_{xx}L(x^*,\lambda^*,k) = \nabla^2_{xx}L(x^*,\lambda^*) + 0.5k\,\nabla c(x^*)^T\Lambda^*\nabla c(x^*)$.
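The rewriting of (4.3) via $\ln\cosh$ is easy to verify numerically; the following sketch (illustrative, with made-up data) checks the identity $L(x,\lambda,k) = L(x,\lambda) + 2k^{-1}\sum_i\lambda_i\ln\cosh(kc_i(x)/2)$ for random constraint values.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 5.0
lam = rng.uniform(0.1, 2.0, size=6)       # positive multipliers (made up)
c = rng.uniform(-1.0, 1.0, size=6)        # constraint values c_i(x) (made up)
f_bar = 3.7                               # value of the transformed objective (made up)

# LSL as defined in (4.3)
lsl = f_bar + (2.0 / k) * np.sum(lam * (np.logaddexp(0.0, -k * c) - np.log(2.0)))

# classical Lagrangian value plus the ln cosh correction term
lagr = f_bar - np.dot(lam, c)
lsl_cosh = lagr + (2.0 / k) * np.sum(lam * np.log(np.cosh(k * c / 2.0)))

print(abs(lsl - lsl_cosh))   # agrees up to round-off
```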

Proof. In view of the complementarity condition (2.2) we have

$L(x^*,\lambda^*,k) = f(x^*) - 2k^{-1}\sum_{i=1}^m\lambda^*_i\big(\ln 2 - \ln(1+e^{-kc_i(x^*)})\big) = f(x^*)$

for any $k > 0$. For the LSL gradient in $x$ we have

$\nabla_x L(x,\lambda,k) = \nabla f(x) - \sum_{i=1}^m 2\lambda_i\big(1+e^{kc_i(x)}\big)^{-1}\nabla c_i(x).$

Again due to (2.2) we obtain

$\nabla_x L(x^*,\lambda^*,k) = \nabla_x L(x^*,\lambda^*) = \nabla f(x^*) - \sum_{i=1}^m\lambda^*_i\nabla c_i(x^*) = 0.$

For the LSL Hessian in $x$ we obtain

$\nabla^2_{xx}L(x,\lambda,k) = \nabla^2 f(x) - \sum_{i=1}^m 2\lambda_i\big(1+e^{kc_i(x)}\big)^{-1}\nabla^2 c_i(x) + 2k\,\nabla c(x)^T\Lambda\,e^{-kc(x)}\big(I+e^{-kc(x)}\big)^{-2}\nabla c(x),$

where $e^{-kc(x)} = \operatorname{diag}(e^{-kc_i(x)})_{i=1}^m$, $\Lambda = \operatorname{diag}(\lambda_i)_{i=1}^m$ and $I$ is the identity matrix in $\mathbb{R}^m$. Again due to (2.2), for any $k > 0$ we have

$\nabla^2_{xx}L(x^*,\lambda^*,k) = \nabla^2_{xx}L(x^*,\lambda^*) + 0.5k\,\nabla c(x^*)^T\Lambda^*\nabla c(x^*).$  (4.4)

The local LSL properties 1°)-3°) are similar to those of the logarithmic MBF function [11]. Globally the LSL has some extra important features due to the properties A5). These features substantially affect the behavior of the corresponding multipliers method, which we consider in the next section.

The following lemma characterizes the convexity properties of the LSL.

Lemma 4.2. If $f(x)$ and all $c_i(x) \in C^2$, then for any fixed $\lambda \in \mathbb{R}^m_{++}$ and $k > 0$ the LSL Hessian is positive definite for any $x \in \mathbb{R}^n$, i.e., $L(x,\lambda,k)$ is strictly convex in $x \in \mathbb{R}^n$ and strongly convex on any bounded set in $\mathbb{R}^n$.

Proof. The proof follows directly from the formula for the LSL Hessian

$\nabla^2_{xx}L(x,\lambda,k) = \nabla^2 f(x) - \sum_{i=1}^m 2\lambda_i\big(1+e^{kc_i(x)}\big)^{-1}\nabla^2 c_i(x) + 2k\,\nabla c_{(p)}(x)^T\Lambda_{(p)}\,e^{-kc_{(p)}(x)}\big(I_p+e^{-kc_{(p)}(x)}\big)^{-2}\nabla c_{(p)}(x) + 2k\,\Lambda_{(n)}\,e^{-kx}\big(I_n+e^{-kx}\big)^{-2},$

the convexity of $f(x)$ and the concavity of all $c_i(x)$.

The following lemma 4.3 is a consequence of property 3°) and assertion 2.1.

Lemma 4.3. If conditions (2.4), (2.5) are satisfied then there exist $k_0 > 0$ and $M_0 > \mu_0 > 0$ such that the following estimate

$M_0 k(y,y) \ge \big(\nabla^2_{xx}L(x^*,\lambda^*,k)y,y\big) \ge \mu_0(y,y),\quad \forall y \in \mathbb{R}^n,$  (4.5)

takes place for any fixed $k \ge k_0$.

Proof. We obtain the right inequality in (4.5) as a consequence of (2.5), (2.7) and 3°) by taking $A = \nabla^2_{xx}L(x^*,\lambda^*)$ and $B = \nabla c_{(r)}(x^*)$. The left inequality follows from formula (4.4) for $\nabla^2_{xx}L(x^*,\lambda^*,k)$ if $k \ge k_0$ and $k_0 > 0$ is large enough.

Corollary. If $f(x)$ and all $c_i(x)$ are twice continuously differentiable and $\varepsilon > 0$ is small enough, then for any fixed $k \ge k_0$ there exists a pair $M > \mu > 0$ such that for any primal-dual pair $w = (x,\lambda) \in S(w^*,\varepsilon) = \{w : \|w - w^*\| \le \varepsilon\}$ the following inequalities hold true:

$\mu(y,y) \le \big(\nabla^2_{xx}L(x,\lambda,k)y,y\big) \le M(y,y),\quad \forall y \in \mathbb{R}^n.$  (4.6)

In other words, in the neighborhood of the KKT pair $(x^*,\lambda^*)$ the condition number $\operatorname{cond}\nabla^2_{xx}L(x,\lambda,k) \ge \mu M^{-1}$ is stable for any fixed $k \ge k_0$.

Remark 4.1. Lemma 4.3 is true whether $f(x)$ and all $c_i(x)$, $i = 1,\dots,m$, are convex or not.

Lemma 4.4. If $X^*$ is bounded, then for any $\lambda \in \mathbb{R}^m_{++}$ and $k > 0$ there exists

$\hat x = \hat x(\lambda,k) = \arg\min\{L(x,\lambda,k) \mid x \in \mathbb{R}^n\}.$

Proof. If $X^*$ is bounded then by adding one extra constraint $c_{m+1}(x) = -f(x) + M \ge 0$, where $M > 0$ is large enough, we obtain, due to corollary 20 (see [7, p. 94]), that the feasible set $\Omega$ is bounded. Therefore without restriction of generality we assume from the very beginning that $\Omega$ is bounded.

We start by establishing that the LSL $L(x,\lambda,k)$ has no direction of recession in $x$, i.e., for any nontrivial direction $z \in \mathbb{R}^n$

$\lim_{t\to\infty}L(x+tz,\lambda,k) = \infty,\quad \forall\lambda \in \mathbb{R}^m_{++},\ k > 0.$

Let $x \in \operatorname{int}\Omega$, i.e., $c_i(x) > 0$. Due to the boundedness of $\Omega$, for any $z \ne 0$ one can find $i_0$ and $\bar t > 0$ such that $c_{i_0}(x+\bar t z) = 0$; in fact, if $c_i(x+tz) > 0$ for all $t > 0$, $i = 1,\dots,m$, then $\Omega$ is unbounded.

Let $\bar x = x + \bar t z$; using the concavity of $c_{i_0}(x)$ we obtain

$c_{i_0}(x) \le c_{i_0}(\bar x) + \big(\nabla c_{i_0}(\bar x), x - \bar x\big),$

i.e.,

$0 < \alpha = c_{i_0}(x) \le -\big(\nabla c_{i_0}(\bar x), z\big)\bar t,\qquad \big(\nabla c_{i_0}(\bar x), z\big) \le -\alpha\bar t^{-1} = \beta < 0.$  (4.7)

Again using the concavity of $c_{i_0}(x)$ we obtain

$c_{i_0}(x+tz) \le c_{i_0}(\bar x) + \big(\nabla c_{i_0}(\bar x), z\big)(t-\bar t),$

hence

$c_{i_0}(x+tz) \le \beta(t-\bar t),\quad \forall t > \bar t.$  (4.8)

Hence, in view of (4.8), we obtain

$L(x+tz,\lambda,k) = f(x+tz) + 2k^{-1}\sum_{i=1}^m\lambda_i\ln\big(1+e^{-kc_i(x+tz)}\big) - 2k^{-1}\sum_{i=1}^m\lambda_i\ln 2$
$\ge f(x+tz) - 2k^{-1}\sum_{i=1}^m\lambda_i\ln 2 + 2k^{-1}\lambda_{i_0}\ln\big(1+e^{-kc_{i_0}(x+tz)}\big)$
$\ge f(x+tz) - 2k^{-1}\sum_{i=1}^m\lambda_i\ln 2 - 2\lambda_{i_0}c_{i_0}(x+tz)$
$\ge f(x+tz) - 2k^{-1}\sum_{i=1}^m\lambda_i\ln 2 - 2\beta\lambda_{i_0}(t-\bar t).$

Taking into account (4.1) and (4.8) we obtain

$\lim_{t\to\infty}L(x+tz,\lambda,k) = +\infty,\quad \forall z \in \mathbb{R}^n,$

so the set

$X(\lambda,k) = \big\{\hat x : L(\hat x,\lambda,k) = \inf_{x\in\mathbb{R}^n}L(x,\lambda,k)\big\}$

is not empty and bounded (Rockafellar [4, theorem 27.1d]). Moreover, for $(\lambda,k) \in \mathbb{R}^{m+1}_{++}$, due to lemma 4.2 the set $X(\lambda,k)$ contains only one point $\hat x(\lambda,k) = \arg\min\{L(x,\lambda,k) \mid x \in \mathbb{R}^n\}$.

The uniqueness of $\hat x(\lambda,k)$ means that, in contrast to the dual function $d(\lambda) = \inf\{L(x,\lambda) \mid x \in \mathbb{R}^n\}$, which is based on the Lagrangian for the initial problem (P), the dual function $d_k(\lambda) = \min\{L(x,\lambda,k) \mid x \in \mathbb{R}^n\}$, which is based on the LSL, is as smooth as the initial functions for any $\lambda \in \mathbb{R}^m_{++}$.

Remark 4.2. Due to A5a) we have $\lim_{t\to-\infty}\psi'(t) = 2 < \infty$; therefore the existence of $\hat x(\lambda,k)$ does not follow from standard considerations [4, p. 329], see also [1]. In fact, let us consider the following LP: $\min\{3x \mid x \ge 0\}$. We have $X^* = \{0\}$ and $L^* = \{3\}$. For $\lambda = 1$ and $k = 1$ the LSL built on the original objective is $L(x,1,1) = 3x + 2\ln(1+e^{-x}) - 2\ln 2$ and $\inf_x L(x,1,1) = -\infty$. The transformation of the objective function is critical for the existence of $\hat x(\lambda,k)$.
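Remark 4.2 can be seen numerically: for the LP $\min\{3x \mid x \ge 0\}$ the LSL built on the untransformed objective decreases without bound as $x \to -\infty$, while the version with the transformed objective $\ln(1+e^{3x})$ from (4.1) stays bounded below. The sketch below is illustrative only.

```python
import numpy as np

k, lam = 1.0, 1.0
xs = np.array([-1.0, -10.0, -100.0, -1000.0])

# LSL for min{3x | x >= 0} with the ORIGINAL objective 3x: unbounded below
L_plain = 3.0 * xs + (2.0 / k) * lam * (np.logaddexp(0.0, -k * xs) - np.log(2.0))

# LSL with the transformed objective ln(1 + e^{3x}) as in (4.1): bounded below
L_transf = np.logaddexp(0.0, 3.0 * xs) + (2.0 / k) * lam * (np.logaddexp(0.0, -k * xs) - np.log(2.0))

for x, a, b in zip(xs, L_plain, L_transf):
    print(f"x = {x:8.1f}   plain objective: {a:12.3f}   transformed: {b:10.3f}")
```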

Remark 4.3. The LSL $L(x,\lambda,k)$, convex in $x \in \mathbb{R}^n$, has a bounded level set in $x$ for any $\lambda \in \mathbb{R}^m_{++}$ and $k > 0$. This does not follow from lemma 12 (see [7, p. 95]), because $\psi(t)$ does not satisfy the assumption (a) there.

5. Log-sigmoid multipliers method

We consider the following method. For a chosen $\lambda^0 \in \mathbb{R}^m_{++}$ and $k > 0$ we generate iteratively the sequences $\{x^s\}$ and $\{\lambda^s\}$ according to the following formulas:

$x^{s+1} = \arg\min\{L(x,\lambda^s,k) \mid x \in \mathbb{R}^n\},$  (5.1)

$\lambda^{s+1}_i = \lambda^s_i\,\psi'\big(kc_i(x^{s+1})\big) = 2\lambda^s_i\big(1+e^{kc_i(x^{s+1})}\big)^{-1},\quad i = 1,\dots,m.$  (5.2)

Along with the multipliers method (5.1), (5.2) we consider a version of this method in which the parameter $k > 0$ is not fixed but changes from step to step. For a given positive sequence $\{k_s\}$ with $k_{s+1} > k_s$ and $\lim_{s\to\infty}k_s = \infty$, we find the primal sequence $\{x^s\}$ and the dual sequence $\{\lambda^s\}$ by the formulas

$x^{s+1} = \arg\min\{L(x,\lambda^s,k_s) \mid x \in \mathbb{R}^n\},$  (5.3)

$\lambda^{s+1}_i = \lambda^s_i\,\psi'\big(k_s c_i(x^{s+1})\big) = 2\lambda^s_i\big(1+e^{k_s c_i(x^{s+1})}\big)^{-1},\quad i = 1,\dots,m.$  (5.4)

First of all we have to guarantee that the multipliers method (5.1), (5.2) is well defined, i.e., that $x^{s+1}$ exists for any given $\lambda^s \in \mathbb{R}^m_{++}$ and $k > 0$. Due to lemma 4.4, for any $\lambda^s \in \mathbb{R}^m_{++}$ there exists $\hat x(\lambda^s,k) = \arg\min\{L(x,\lambda^s,k) \mid x \in \mathbb{R}^n\}$, and due to the formulas (5.2) and (5.4) we have $\lambda^s \in \mathbb{R}^m_{++} \Rightarrow \lambda^{s+1} \in \mathbb{R}^m_{++}$. Therefore, if the starting vector of Lagrange multipliers $\lambda^0 \in \mathbb{R}^m_{++}$, then all vectors $\lambda^s$, $s = 1,2,\dots$, remain positive, so the LS method is executable.

The critical part of any multipliers method is the formula for the Lagrange multipliers update. It follows from (5.2) and (5.4) that $\lambda^{s+1}_i > \lambda^s_i$ if $c_i(x^{s+1}) < 0$ and $\lambda^{s+1}_i < \lambda^s_i$ if $c_i(x^{s+1}) > 0$. In this respect the LS method is similar to other multipliers methods; however, due to A5a) the LS method has some very specific properties. In particular, a Lagrange multiplier cannot be increased more than twice, independent of the constraint violation and the value of the scaling parameter $k > 0$. It means, for instance, that if $\lambda^0 = e = (1,\dots,1) \in \mathbb{R}^m$ is the starting Lagrange multipliers vector, then for any $k > 0$ large enough and any constraint violation the new Lagrange multipliers cannot exceed two. Therefore, in contrast to the exponential [17] or MBF [11] methods, it is impossible to find an approximation close enough to $\lambda^*$ by using

$L(x,e,k) = f(x) + 2k^{-1}\sum_{i=1}^m\big(\ln(1+e^{-kc_i(x)}) - \ln 2\big),$

no matter how large a $k > 0$ we are ready to use. Therefore, to guarantee convergence when $k \to \infty$, we modify $L(x,e,k)$. The convergence and the rate of convergence of the LS multipliers method we consider in the next section.
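The following sketch shows one way to organize the LS multipliers method (5.1)-(5.2) for a small smooth convex problem, using scipy for the inner unconstrained minimization; the test problem, scaling parameter and helper names are our own illustrative choices, not from the paper.

```python
import numpy as np
from scipy.optimize import minimize

# illustrative problem: min (x1-1)^2 + (x2-2)^2  s.t.  c1(x) = 1 - x1 - x2 >= 0
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2
c = lambda x: np.array([1.0 - x[0] - x[1]])

def lsl(x, lam, k):
    """Log-sigmoid Lagrangian (4.3); the objective transformation (4.1) keeps it bounded below."""
    f_bar = np.logaddexp(0.0, f(x))
    return f_bar + (2.0 / k) * np.sum(lam * (np.logaddexp(0.0, -k * c(x)) - np.log(2.0)))

def ls_multipliers(x, lam, k=10.0, iters=20):
    for _ in range(iters):
        x = minimize(lsl, x, args=(lam, k), method="BFGS").x       # step (5.1)
        lam = 2.0 * lam / (1.0 + np.exp(k * c(x)))                 # step (5.2): lam_i * psi'(k c_i)
    return x, lam

x, lam = ls_multipliers(np.zeros(2), np.ones(1))
print("x ≈", x, "  lambda ≈", lam)   # expect x ≈ (0, 1); lam is the multiplier of the transformed problem (4.1)-(4.2)
```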

6. Convergence and rate of convergence

We start with a modification of $L(x,e,k)$. For a chosen $0 < \alpha < 1$ we define the penalty LS function $P : \mathbb{R}^n\times\mathbb{R}_{++} \to \mathbb{R}_{++}$ by the formula

$P(x,k) = f(x) + 2k^{-1+\alpha}\sum_{i=1}^m\big(\ln(1+e^{-kc_i(x)}) - \ln 2\big).$  (6.1)

The existence and uniqueness of the minimizer

$x(\cdot) = x(k) = \arg\min\{P(x,k) \mid x \in \mathbb{R}^n\}$

follows from lemmas 4.2 and 4.4. For the minimizer $x(\cdot)$ we have

$\nabla_x P(x(\cdot),\cdot) = \nabla f(x(\cdot)) - \sum_{i=1}^m 2k^{\alpha}\big(1+e^{kc_i(x(\cdot))}\big)^{-1}\nabla c_i(x(\cdot)) = 0.$  (6.2)

By introducing

$\lambda_i(\cdot) = \lambda_i(k) = 2k^{\alpha}\big(1+e^{kc_i(x(\cdot))}\big)^{-1},\quad i = 1,\dots,m,$  (6.3)

we obtain

$\nabla_x P(x(\cdot),\cdot) = \nabla f(x(\cdot)) - \sum_{i=1}^m\lambda_i(\cdot)\nabla c_i(x(\cdot)) = \nabla_x L(x(\cdot),\lambda(\cdot)) = 0.$  (6.4)

The following theorem establishes the convergence of $\{x(k)\}_{k>0}$ and $\{\lambda(k)\}_{k>0}$ to $X^*$ and $L^*$.

Theorem 6.1.
1) If conditions A) and B) hold then the primal and dual trajectories $\{x(k)\}_{k>0}$ and $\{\lambda(k)\}_{k>0}$ are bounded and their limit points belong to $X^*$ and $L^*$.
2) If the standard second order optimality conditions (2.4), (2.5) are satisfied then $\lim_{k\to\infty}x(k) = x^*$ and $\lim_{k\to\infty}\lambda(k) = \lambda^*$. If, in addition, $f(x)$ and all $c_i(x) \in C^2$, then the following bound

$0 \le f(x^*) - f(x(k)) \le \big(0.5k^{-\alpha}f(x^*) + k^{-1}m\ln 2\big)\sum_{i=1}^m\lambda^*_i$  (6.5)

holds true for any $k \ge k_0$, where $k_0 > 0$ is large enough.

Proof. 1) As we mentioned in the proof of lemma 4.4, if $X^*$ is bounded then by adding an extra constraint we can assume that the feasible set $\Omega$ is bounded. Also, due to lemma 4.2 the minimizer $x(k)$ is unique, therefore $\lambda(k)$ is uniquely defined by (6.3).
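A minimal path-following sketch based on the penalty LS function (6.1) and the multiplier estimate (6.3) might look as follows (illustrative only; the problem data, the value of $\alpha$ and the $k$-schedule are our own choices).

```python
import numpy as np
from scipy.optimize import minimize

# same toy problem as before: min (x1-1)^2 + (x2-2)^2  s.t.  1 - x1 - x2 >= 0
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2
c = lambda x: np.array([1.0 - x[0] - x[1]])
alpha = 0.5

def P(x, k):
    # penalty LS function (6.1) with the transformed objective (4.1)
    f_bar = np.logaddexp(0.0, f(x))
    return f_bar + 2.0 * k ** (alpha - 1.0) * np.sum(np.logaddexp(0.0, -k * c(x)) - np.log(2.0))

x = np.zeros(2)
for k in (10.0, 100.0, 1000.0):
    x = minimize(P, x, args=(k,), method="BFGS").x           # x(k), the minimizer of (6.1)
    lam = 2.0 * k ** alpha / (1.0 + np.exp(k * c(x)))         # lambda(k) from (6.3)
    print(f"k = {k:7.1f}   x = {x}   lambda = {lam}")
```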

For the vector $x(k)$ we define two sets of indices: $I^+ = I^+(k) = \{i : c_i(x(k)) \ge 0\}$ and $I^- = I^-(k) = \{i : c_i(x(k)) < 0\}$. Then

$P(x(k),k) = f(x(k)) + 2k^{-1+\alpha}\Big[\sum_{i\in I^+}\ln\big(1+e^{-kc_i(x(k))}\big) + \sum_{i\in I^-}\ln\big(1+e^{-kc_i(x(k))}\big) - m\ln 2\Big].$

Therefore

$P(x(k),k) \ge f(x(k)) + 2k^{\alpha}\sum_{i\in I^-}\big|c_i(x(k))\big| - 2k^{-1+\alpha}m\ln 2.$

On the other hand,

$P(x(k),k) \le P(x^*,k) = f(x^*) + 2k^{-1+\alpha}\sum_{i=1}^m\ln\frac{1+e^{-kc_i(x^*)}}{2}.$

In view of $(1+e^{-kc_i(x^*)})/2 \le 1$, $i = 1,\dots,m$, we have $P(x^*,k) \le f(x^*)$. Therefore, keeping in mind $f(x(k)) > 0$, we obtain

$2k^{\alpha}\sum_{i\in I^-}\big|c_i(x(k))\big| \le f(x^*) + 2k^{-1+\alpha}m\ln 2,\qquad \sum_{i\in I^-}\big|c_i(x(k))\big| \le 0.5k^{-\alpha}f(x^*) + k^{-1}m\ln 2.$

In other words, for the maximum constraint violation at $x(k)$ we have

$\max_{i\in I^-}\big|c_i(x(k))\big| \le 0.5k^{-\alpha}f(x^*) + k^{-1}m\ln 2 = v(k).$  (6.6)

Therefore, due to corollary 20 (see [7, p. 94]), the boundedness of $\{x(k)\}_{k>0}$ follows from the boundedness of $\Omega$. The boundedness of the dual trajectory $\{\lambda(k)\}_{k>0}$ follows from Slater's condition B), (6.4) and the boundedness of the primal trajectory $\{x(k)\}_{k>0}$.

Let $\{x(k_s)\}_{s=1}^\infty$ and $\{\lambda(k_s)\}_{s=1}^\infty$ be converging primal and dual subsequences and $\bar x = \lim_{s\to\infty}x(k_s)$, $\bar\lambda = \lim_{s\to\infty}\lambda(k_s)$; then, by passing to the limit in (6.4), we obtain

$\nabla_x L(\bar x,\bar\lambda) = \nabla f(\bar x) - \sum_{i=1}^m\bar\lambda_i\nabla c_i(\bar x) = 0,$

and from (6.3) and (6.6) we have $\bar\lambda \in \mathbb{R}^m_+$ and

$c_i(\bar x) \ge 0,\quad i = 1,\dots,m,\qquad \bar\lambda_i = 0,\ i \notin I(\bar x) = \{i : c_i(\bar x) = 0\};$

hence $(\bar x,\bar\lambda)$ is a KKT pair, i.e., $\bar x \in X^*$, $\bar\lambda \in L^*$.

2) If the standard second order optimality conditions (2.4), (2.5) are satisfied, then the pair $(x^*,\lambda^*)$ is unique, therefore $x^* = \lim_{k\to\infty}x(k)$ and $\lambda^* = \lim_{k\to\infty}\lambda(k)$.

To obtain the bound (6.5) we consider the enlarged feasible set $\Omega(k) = \{x : c_i(x) \ge -v(k),\ i = 1,\dots,m\}$, which is bounded because $\Omega$ is bounded. Therefore $x_k = \arg\min\{f(x) \mid x \in \Omega(k)\}$ exists and $f(x_k) \le f(x(k))$, hence $f(x^*) - f(x(k)) \le f(x^*) - f(x_k)$. Due to the conditions (2.4), (2.5), and keeping in mind that $f(x), c_i(x) \in C^2$, we can use theorem 6 (see [7, p. 34]) to estimate $f(x^*) - f(x_k)$: for any $k \ge k_0$ with $k_0 > 0$ large enough we obtain

$f(x^*) - f(x(k)) \le f(x^*) - f(x_k) \le v(k)\sum_{i=1}^m\lambda^*_i.$

Using (6.6) we obtain the bound (6.5).

Convergence results for general classes of smoothing methods have been considered in [1]. Before we establish the rate of convergence for the LS multipliers method we would like to discuss one intrinsic property of the smoothing methods. Let us consider the Hessian of the penalty LS function. We have

$H(x(\cdot),\cdot) = \nabla^2_{xx}P(x(\cdot),\cdot) = \nabla^2_{xx}L(x(\cdot),\lambda(\cdot)) + 2k\,\nabla c(x(\cdot))^T\Lambda(\cdot)\,e^{-kc(x(\cdot))}\big(I+e^{-kc(x(\cdot))}\big)^{-2}\nabla c(x(\cdot)),$

where $\Lambda(\cdot) = \operatorname{diag}(\lambda_i(\cdot))_{i=1}^m$ and $e^{-kc(x(\cdot))} = \operatorname{diag}(e^{-kc_i(x(\cdot))})_{i=1}^m$. In view of (6.5), for $k > 0$ large enough the pair $(x(\cdot),\lambda(\cdot))$ is close to $(x^*,\lambda^*)$, therefore

$H(x(\cdot),\cdot) \approx \nabla^2_{xx}L(x^*,\lambda^*) + 0.5k\,\nabla c(x^*)^T\Lambda^*\nabla c(x^*).$

Due to assertion 2.1, for $k > 0$ large enough the minimal eigenvalue of $H(x(\cdot),\cdot)$ is $\mu > 0$, while the maximal eigenvalue of $H(x(\cdot),\cdot)$ is $Mk$, $M > 0$. Therefore

$\operatorname{cond}H(x(\cdot),\cdot) = \mu(Mk)^{-1} = O(k^{-1}).$

Hence $\operatorname{cond}H(x(\cdot),\cdot)$ converges to zero faster than $f(x(k))$ converges to $f(x^*)$. The infinite increase of the scaling parameter $k > 0$ is the only way to ensure the convergence of the smoothing method. Therefore from some point on the smooth unconstrained minimization methods, and in particular the Newton method, might lose their efficiency.

The method (5.1), (5.2) allows one to speed up the rate of convergence substantially and at the same time keeps the condition number of the LSL Hessian stable. Now we will prove that under the standard second order optimality conditions the primal-dual sequence $\{x^s,\lambda^s\}$ generated by the LS multipliers method (5.1), (5.2) converges to the primal-dual solution with a Q-linear rate for a fixed but large enough scaling parameter $k > 0$.

In our analysis we follow the scheme of [11], in which the quadratic augmented Lagrangian proof (see [4, p. 109]) for equality constraints has been generalized for nonquadratic augmented Lagrangians applied to inequality constrained optimization. In the course of the analysis we will estimate the threshold $k_0 > 0$ for the scaling parameter at which the Q-linear rate occurs.

First, we specify the extended dual feasible domain in $\mathbb{R}^m_+\times[k_0,\infty)$ where the Q-linear convergence takes place. Let $\|x\| = \|x\|_\infty = \max_{1\le i\le n}|x_i|$; we choose a small enough $0 < \delta < \min_{1\le i\le r}\lambda^*_i$. We split the dual optimal vector $\lambda^*$ into the active $\lambda^*_{(r)} = (\lambda^*_1,\dots,\lambda^*_r) \in \mathbb{R}^r_{++}$ and passive $\lambda^*_{(m-r)} = (\lambda^*_{r+1},\dots,\lambda^*_m) = 0$ parts. The neighborhood of $\lambda^*$ we define as follows:

$D(\cdot) \equiv D(\lambda^*,k_0,\delta) = \big\{(\lambda,k) \in \mathbb{R}^{m+1}_{++} : \lambda_i \ge \delta > 0,\ |\lambda_i - \lambda^*_i| \le \delta k,\ i = 1,\dots,r;\ 0 \le \lambda_i \le \delta k,\ i = r+1,\dots,m;\ k \ge k_0\big\}.$

By introducing the vector $t = (t_i,\ i = 1,\dots,m) = (t_{(r)},t_{(m-r)})$ with $t_i = (\lambda_i - \lambda^*_i)k^{-1}$, $k \ge k_0$, $i = 1,\dots,m$, we transform the dual set $D(\cdot)$ into the neighborhood of the origin in the extended dual space

$S(\delta,k_0) = \big\{(t,k) = (t_1,\dots,t_m;k) : t_i \ge (\delta-\lambda^*_i)k^{-1},\ i = 1,\dots,r;\ t_i \ge 0,\ i = r+1,\dots,m;\ \|t\| \le \delta,\ k \ge k_0\big\}.$

Then for the LSL we obtain

$L(x,t,k) = f(x) + 2k^{-1}\sum_{i=1}^m\big(kt_i + \lambda^*_i\big)\Big(\ln\big(1+e^{-kc_i(x)}\big) - \ln 2\Big).$

For each $\lambda \in D(\cdot)$ and $k \ge k_0$ we can find the corresponding $(t,k) \in S(\delta,k_0)$, the minimizer

$\hat x = \hat x(t,k) = \arg\min\{L(x,t,k) \mid x \in \mathbb{R}^n\}$

and the new vector of Lagrange multipliers

$\hat\lambda = \hat\lambda(t,k) = \big(\hat\lambda_i(t,k) = 2(kt_i+\lambda^*_i)\big(1+e^{kc_i(\hat x)}\big)^{-1},\ i = 1,\dots,m\big).$

Let us split the vector $\hat\lambda = (\hat\lambda_{(r)},\hat\lambda_{(m-r)})$ into the active $\hat\lambda_{(r)} = (\hat\lambda_i,\ i = 1,\dots,r)$ and the passive

$\hat\lambda_{(m-r)} = \hat\lambda_{(m-r)}(\hat x,t,k) = \big(\hat\lambda_i(\hat x,t,k) = 2kt_i\big(1+e^{kc_i(\hat x)}\big)^{-1},\ i = r+1,\dots,m\big)$

parts. We consider the vector-function

$h(x,t,k) = \sum_{i=r+1}^m\hat\lambda_i(x,t,k)\,\nabla c_i(x),$

which corresponds to the passive set of constraints.

Let $a \in \mathbb{R}^n$, $b \in \mathbb{R}^n$, $\theta(\tau) : \mathbb{R}\to\mathbb{R}$, $\theta(b) = (\theta(b_1),\dots,\theta(b_n))$, $a\theta(b) = (a_1\theta(b_1),\dots,a_n\theta(b_n))$, $a+\theta(b) = (a_1+\theta(b_1),\dots,a_n+\theta(b_n))$.

Now we are ready for the basic statement of this section. We would like to emphasize that the results of the following theorem remain true even if neither $f(x)$ nor $c_i(x)$, $i = 1,\dots,m$, are convex.

Theorem 6.2. If $f(x)$ and all $c_i(x) \in C^2$ and the conditions (2.4) and (2.5) hold, then there exist a small enough $\delta > 0$ and a large enough $k_0 > 0$ such that for any $\lambda \in D(\cdot)$ and $k \ge k_0$:

1) there exists $\hat x = \hat x(\lambda,k) = \arg\min\{L(x,\lambda,k) \mid x \in \mathbb{R}^n\}$: $\nabla_x L(\hat x,\lambda,k) = 0$ and $\hat\lambda = \hat\lambda(\lambda,k) = 2\lambda\big(1+e^{kc(\hat x)}\big)^{-1}$;

2) for the pair $(\hat x,\hat\lambda)$ the estimate

$\max\big\{\|\hat x - x^*\|,\ \|\hat\lambda - \lambda^*\|\big\} \le ck^{-1}\|\lambda - \lambda^*\|$  (6.7)

holds and $c > 0$ is independent of $k \ge k_0$;

3) the LS function $L(x,\lambda,k)$ is strongly convex in a neighborhood of $\hat x$.

Proof. For any $x \in \mathbb{R}^n$, any $k > 0$ and $t \in \mathbb{R}^m$ the vector-function $h(x,t,k)$ is smooth in $x$; also

$h(x,0,k) = 0 \in \mathbb{R}^n,\qquad \nabla_x h(x,0,k) = 0_{n,n},\qquad \nabla_{\lambda_{(r)}}h(x,0,k) = 0_{n,r},$

where $0_{p,q}$ is a $p\times q$ matrix with zero elements. Let $\sigma = \min\{c_i(x^*) \mid i = r+1,\dots,m\} > 0$. We consider the following map $\Phi : \mathbb{R}^{n+r+m+1}\to\mathbb{R}^{n+r}$, which is defined by

$\Phi\big(x,\hat\lambda_{(r)},t,k\big) = \begin{bmatrix}\nabla f(x) - \sum_{i=1}^r\hat\lambda_i\nabla c_i(x) - h(x,t,k)\\[2pt] 2k^{-1}\big(kt_{(r)}+\lambda^*_{(r)}\big)\big(1+e^{kc_{(r)}(x)}\big)^{-1} - k^{-1}\hat\lambda_{(r)}\end{bmatrix}.$  (6.8)

Taking into account (2.1), (2.2) we obtain

$\Phi\big(x^*,\lambda^*_{(r)},0,k\big) = 0 \in \mathbb{R}^{n+r},\quad \forall k > 0.$  (6.9)

Let $\Phi_k = \nabla_{(x,\hat\lambda_{(r)})}\Phi(x^*,\lambda^*_{(r)},0,k)$, $I_r$ the identity matrix in $\mathbb{R}^r$, $\Lambda^*_{(r)} = \operatorname{diag}(\lambda^*_i)_{i=1}^r$, $\nabla^2L^* = \nabla^2_{xx}L(x^*,\lambda^*)$, $\nabla c^* = \nabla c(x^*)$, $\nabla c^*_{(r)} = \nabla c_{(r)}(x^*)$. In view of $\nabla_x h(x^*,0,k) = 0_{n,n}$ and $\nabla_{\hat\lambda_{(r)}}h(x^*,0,k) = 0_{n,r}$ we obtain

$\Phi_k = \begin{bmatrix}\nabla^2L^* & -\nabla c^{*T}_{(r)}\\[2pt] -\tfrac12\Lambda^*_{(r)}\nabla c^*_{(r)} & -k^{-1}I_r\end{bmatrix}.$

Using reasoning similar to that in [11] we find that $\Phi_k^{-1}$ exists and there is a number $\varkappa > 0$, independent of $k \ge k_0$, such that

$\big\|\Phi_k^{-1}\big\| \le \varkappa.$  (6.10)

By applying the second implicit function theorem (see [3, p. 12]) to the map (6.8) we find that on the set

$S(\delta,k_0,k_1) = \big\{(t,k) : t_i \ge (\delta-\lambda^*_i)k^{-1},\ i = 1,\dots,r;\ t_i \ge 0,\ i = r+1,\dots,m;\ \|t\| \le \delta,\ k_0 \le k \le k_1\big\}$

there exist two vector-functions

$x(\cdot) = x(t,k) = \big(x_1(t,k),\dots,x_n(t,k)\big)\quad\text{and}\quad \hat\lambda_{(r)}(\cdot) = \hat\lambda_{(r)}(t,k) = \big(\hat\lambda_1(t,k),\dots,\hat\lambda_r(t,k)\big)$

such that

$\Phi\big(x(t,k),\hat\lambda_{(r)}(t,k),t,k\big) \equiv 0.$  (6.11)

One can rewrite the system (6.11) as follows:

$\nabla f\big(x(\cdot)\big) - \sum_{i=1}^r\hat\lambda_i(\cdot)\nabla c_i\big(x(\cdot)\big) - h\big(x(\cdot),\cdot\big) = 0,$  (6.12)

$\hat\lambda_i(\cdot) = 2\big(kt_i+\lambda^*_i\big)\big(1+e^{kc_i(x(\cdot))}\big)^{-1},\quad i = 1,\dots,r.$  (6.13)

We also have $\hat\lambda_i(\cdot) = 2kt_i\big(1+e^{kc_i(x(\cdot))}\big)^{-1}$, $i = r+1,\dots,m$.

Recalling that $\lambda^*_{(m-r)} = (\lambda^*_{r+1},\dots,\lambda^*_m) = 0 \in \mathbb{R}^{m-r}$, we first estimate $\|\hat\lambda_{(m-r)} - \lambda^*_{(m-r)}\|$. For any small $\varepsilon > 0$ we can find $\delta > 0$ small enough that $\|x(t,k) - x(0,k)\| = \|x(\cdot) - x^*\| \le \varepsilon$ for any $(t,k) \in S(\delta,k_0)$. Taking into account $c_i(x^*) \ge \sigma > 0$ we obtain $c_i(x(t,k)) \ge \sigma/2$, $i = r+1,\dots,m$, for $(t,k) \in S(\delta,k_0)$. Therefore, in view of $e^x \ge x+1$, we obtain

$0 < \hat\lambda_i(t,k) \le 2\lambda_i\big(1+e^{0.5k\sigma}\big)^{-1} \le \frac{4}{\sigma}\lambda_i k^{-1}.$  (6.14)

Therefore

$\big\|\hat\lambda_{(m-r)} - \lambda^*_{(m-r)}\big\| \le \frac{4}{\sigma}k^{-1}\big\|\lambda_{(m-r)} - \lambda^*_{(m-r)}\big\|.$  (6.15)

Now we consider the vector-functions $x(t,k) = x(\cdot)$ and $\hat\lambda_{(r)}(t,k) = \hat\lambda_{(r)}(\cdot)$. By differentiating (6.12) and (6.13) in $t$ we find the Jacobians $\nabla_t x(\cdot) = \nabla_t x(t,k)$ and $\nabla_t\hat\lambda_{(r)}(\cdot) = \nabla_t\hat\lambda_{(r)}(t,k)$ from the following system:

$\begin{bmatrix}\nabla_t x(\cdot)\\ \nabla_t\hat\lambda_{(r)}(\cdot)\end{bmatrix} = \Phi^{-1}(\cdot)\begin{bmatrix}\nabla_t h\big(x(\cdot),\cdot\big)\\ -2\operatorname{diag}\big((1+e^{kc_i(x(\cdot))})^{-1}\big)_{i=1}^r\,[I_r\,;\,0_{r,m-r}]\end{bmatrix}.$  (6.16)

Considering the system (6.16) for $t = 0 \in \mathbb{R}^m$, we obtain

$\begin{bmatrix}\nabla_t x(0,k)\\ \nabla_t\hat\lambda_{(r)}(0,k)\end{bmatrix} = \Phi_k^{-1}\begin{bmatrix}\nabla_t h\big(x(0,k),0,k\big)\\ -[I_r\,;\,0_{r,m-r}]\end{bmatrix}.$

In view of (6.10) and the estimate

$\big\|\nabla_t h(x,0,k)\big\| \le 2k\big(1+e^{k\sigma/2}\big)^{-1}\big\|\nabla c_{(m-r)}(x)\big\| \le 4\sigma^{-1}\big\|\nabla c_{(m-r)}(x)\big\|,$

which holds true for any $k \ge k_0$, we obtain

$\max\big\{\big\|\nabla_t x(0,k)\big\|,\ \big\|\nabla_t\hat\lambda_{(r)}(0,k)\big\|\big\} \le \varkappa\big(4\sigma^{-1}\big\|\nabla c_{(m-r)}(x^*)\big\| + 1\big) = c_0.$  (6.17)

Therefore

$\max\big\{\big\|\nabla_t x(t,k)\big\|,\ \big\|\nabla_t\hat\lambda_{(r)}(t,k)\big\|\big\} \le 2c_0$  (6.18)

for any $(t,k) \in S(\delta,k_0)$ and $\delta > 0$ small enough. Keeping in mind that $x(0,k) = x^*$ and $\hat\lambda_{(r)}(0,k) = \lambda^*_{(r)}$ and using arguments similar to those in [11], we obtain

$\max\big\{\big\|x(t,k) - x^*\big\|,\ \big\|\hat\lambda_{(r)}(t,k) - \lambda^*_{(r)}\big\|\big\} \le 2c_0 k^{-1}\|\lambda - \lambda^*\|.$  (6.19)

Let

$\hat x(\lambda,k) = x\Big(\frac{\lambda-\lambda^*}{k},k\Big),\qquad \hat\lambda(\lambda,k) = \Big(\hat\lambda_{(r)}\Big(\frac{\lambda-\lambda^*}{k},k\Big),\ \hat\lambda_{(m-r)}\Big(\frac{\lambda-\lambda^*}{k},k\Big)\Big);$

then, taking $c = \max\{2c_0,4/\sigma\}$, from (6.15) and (6.19) we obtain (6.7).

To prove the final part of the theorem we consider the LSL Hessian $\nabla^2_{xx}L(x,\lambda,k)$ at the point $\hat x = \hat x(\lambda,k)$. We have

$\nabla_x L(x,\lambda,k) = \nabla f(x) - \sum_{i=1}^m 2\lambda_i\big(1+e^{kc_i(x)}\big)^{-1}\nabla c_i(x),$

and for the Hessian $\nabla^2_{xx}L(x,\lambda,k)$ we obtain

$\nabla^2_{xx}L(x,\lambda,k) = \nabla^2 f(x) - \sum_{i=1}^m 2\lambda_i\big(1+e^{kc_i(x)}\big)^{-1}\nabla^2 c_i(x) + 2k\,\nabla c(x)^T\Lambda\,e^{kc(x)}\big(I+e^{kc(x)}\big)^{-2}\nabla c(x),$

where $I$ is the identity matrix in $\mathbb{R}^m$, $e^{kc(x)} = \operatorname{diag}(e^{kc_i(x)})_{i=1}^m$ and $\Lambda = \operatorname{diag}(\lambda_i)_{i=1}^m$. Therefore

$\nabla^2_{xx}L(\hat x,\lambda,k) = \nabla^2_{xx}L(\hat x,\hat\lambda) + k\,\nabla c(\hat x)^T\hat\Lambda\big(I+e^{-kc(\hat x)}\big)^{-1}\nabla c(\hat x).$

Using the estimate (6.7), for any $(\lambda,k) \in D(\cdot)$ we obtain

$\nabla^2_{xx}L(\hat x,\lambda,k) \approx \nabla^2_{xx}L(x^*,\lambda^*) + k\,\nabla c(x^*)^T\Lambda^*\big(I+e^{-kc(x^*)}\big)^{-1}\nabla c(x^*) = \nabla^2_{xx}L(x^*,\lambda^*) + 0.5k\,\nabla c(x^*)^T\Lambda^*\nabla c(x^*).$

The strong convexity of the LSL $L(x,\lambda,k)$ in $x$ in the neighborhood of $\hat x$ follows from the continuity of $\nabla^2_{xx}L(x,\lambda,k)$ in $x$ and assertion 2.1. The proof of the theorem is completed.

Corollary 6.1. The Q-linear rate of convergence of the method (5.1), (5.2) and the Q-superlinear convergence of (5.3), (5.4) follow directly from the estimate (6.7), because $c > 0$ is independent of $k \ge k_0$.

7. Modification of the LS method

The LS method (5.1), (5.2) requires solving an unconstrained optimization problem at each step. To make the method practical we have to replace the unconstrained minimizer by an approximation that retains the convergence and the rate of convergence of the LS method. In this section we establish the conditions for the approximation and prove that such an approximation allows the modified LS method to retain the rate of convergence (6.7).

For a given positive Lagrange multipliers vector $\lambda \in \mathbb{R}^m_{++}$, a large enough penalty parameter $k > 0$ and a positive scalar $\tau > 0$ we find an approximation $\bar x$ for the primal minimizer $\hat x$ from the inequality

$\bar x \in \mathbb{R}^n :\quad \big\|\nabla_x L(\bar x,\lambda,k)\big\| \le \tau k^{-1}\big\|2\big(I+e^{kc(\bar x)}\big)^{-1}\lambda - \lambda\big\|$  (7.1)

and the approximation for the Lagrange multipliers by the formula

$\bar\lambda = 2\big(I+e^{kc(\bar x)}\big)^{-1}\lambda.$  (7.2)

This leads to the following modification of the LS multipliers method (5.1), (5.2). We define the modified primal-dual sequence by the formulas

$x^{s+1} \in \mathbb{R}^n :\quad \big\|\nabla_x L(x^{s+1},\lambda^s,k)\big\| \le \tau k^{-1}\big\|2\big(I+e^{kc(x^{s+1})}\big)^{-1}\lambda^s - \lambda^s\big\|,$  (7.3)

$\lambda^{s+1} = 2\big(I+e^{kc(x^{s+1})}\big)^{-1}\lambda^s.$  (7.4)

It turns out that the modification (7.3), (7.4) of the LS method (5.1), (5.2) keeps the basic property of the LS method, namely the Q-linear rate of convergence, as soon as the second order optimality conditions hold and the functions $f(x)$ and $c_i(x)$, $i = 1,\dots,m$, are smooth enough.

Theorem 7.1. If the standard second order optimality conditions (2.4), (2.5) hold and the Hessians $\nabla^2 f(x)$ and $\nabla^2 c_i(x)$, $i = 1,\dots,m$, satisfy the Lipschitz conditions

$\big\|\nabla^2 f(x_1) - \nabla^2 f(x_2)\big\| \le L_0\|x_1-x_2\|,\qquad \big\|\nabla^2 c_i(x_1) - \nabla^2 c_i(x_2)\big\| \le L_i\|x_1-x_2\|,$  (7.5)

then there is $k_0 > 0$ such that for any $\lambda \in D(\cdot)$ and $k \ge k_0$ the following bound holds true:

$\max\big\{\|\bar x - x^*\|,\ \|\bar\lambda - \lambda^*\|\big\} \le c(5+\tau)k^{-1}\|\lambda - \lambda^*\|$  (7.6)

and $c > 0$ is independent of $k \ge k_0$.

Proof. Let us assume that $\varepsilon > 0$ is small enough and

$\bar x \in S(x^*,\varepsilon) = \{x \in \mathbb{R}^n : \|x - x^*\| \le \varepsilon\},\qquad \bar\lambda = 2\big(I+e^{kc(\bar x)}\big)^{-1}\lambda \in S(\lambda^*,\varepsilon) = \{\lambda \in \mathbb{R}^m_{++} : \|\lambda - \lambda^*\| \le \varepsilon\}.$

We consider the vectors $\Delta x = \bar x - x^*$, $\Delta\lambda = \bar\lambda - \lambda^* = (\Delta\lambda_{(r)},\Delta\lambda_{(m-r)})$, $\Delta\lambda_{(r)} = \bar\lambda_{(r)} - \lambda^*_{(r)}$, $\Delta\lambda_{(m-r)} = \bar\lambda_{(m-r)} - \lambda^*_{(m-r)} = \bar\lambda_{(m-r)}$, $y_{(r)} = (\Delta x,\Delta\lambda_{(r)})$ and $y = (\Delta x,\Delta\lambda)$. Due to (7.5) we have

$\nabla f(\bar x) = \nabla f(x^*) + \nabla^2 f(x^*)\Delta x + r_0(\Delta x),$  (7.7)

$\nabla c_i(\bar x) = \nabla c_i(x^*) + \nabla^2 c_i(x^*)\Delta x + r_i(\Delta x),\quad i = 1,\dots,m,$  (7.8)

with $r_0(0) = 0$, $r_i(0) = 0$ and $\|r_0(\Delta x)\| \le L_0\|\Delta x\|^2$, $\|r_i(\Delta x)\| \le L_i\|\Delta x\|^2$. Then

$\nabla_x L(\bar x,\lambda,k) = \nabla f(\bar x) - \sum_{i=1}^m 2\lambda_i\big(1+e^{kc_i(\bar x)}\big)^{-1}\nabla c_i(\bar x) = \nabla f(\bar x) - \sum_{i=1}^r\big(\lambda^*_i + \Delta\lambda_i\big)\nabla c_i(\bar x) - h\big(\bar x,\lambda_{(m-r)},k\big),$

where $h(\bar x,\lambda_{(m-r)},k) = \sum_{i=r+1}^m 2\lambda_i\big(1+e^{kc_i(\bar x)}\big)^{-1}\nabla c_i(\bar x)$. Using (7.7), (7.8) we obtain

$\nabla_x L(\bar x,\lambda,k) = \nabla f(x^*) + \nabla^2 f(x^*)\Delta x + r_0(\Delta x) - \sum_{i=1}^r\big(\lambda^*_i + \Delta\lambda_i\big)\big(\nabla c_i(x^*) + \nabla^2 c_i(x^*)\Delta x + r_i(\Delta x)\big) - h\big(\bar x,\lambda_{(m-r)},k\big)$

$= \nabla f(x^*) - \sum_{i=1}^r\lambda^*_i\nabla c_i(x^*) + \Big(\nabla^2 f(x^*) - \sum_{i=1}^r\lambda^*_i\nabla^2 c_i(x^*)\Big)\Delta x - \nabla c_{(r)}(x^*)^T\Delta\lambda_{(r)} + r_0(\Delta x) - \sum_{i=1}^r\Delta\lambda_i\nabla^2 c_i(x^*)\Delta x - \sum_{i=1}^r\big(\lambda^*_i+\Delta\lambda_i\big)r_i(\Delta x) - h\big(\bar x,\lambda_{(m-r)},k\big).$

Let

$r^{(1)}(y) = r_0(\Delta x) - \sum_{i=1}^r\Delta\lambda_i\nabla^2 c_i(x^*)\Delta x - \sum_{i=1}^r\big(\lambda^*_i+\Delta\lambda_i\big)r_i(\Delta x);$

then, keeping in mind the KKT condition (2.1), we can rewrite the expression above as

$\nabla_x L(\bar x,\lambda,k) = \nabla^2_{xx}L(x^*,\lambda^*)\Delta x - \nabla c_{(r)}(x^*)^T\Delta\lambda_{(r)} - h\big(\bar x,\lambda_{(m-r)},k\big) + r^{(1)}(y),$  (7.9)

where $r^{(1)}(0) = 0$ and there is $L^{(1)} > 0$ such that $\|r^{(1)}(y)\| \le L^{(1)}\|y\|^2$.

Further, $\lambda_i - \lambda^*_i = (\lambda_i - \bar\lambda_i) + (\bar\lambda_i - \lambda^*_i)$, i.e., $\lambda_i e_i(\bar x,k) + \Delta\lambda_i = \lambda_i - \lambda^*_i$, $i = 1,\dots,r$, or

$\Lambda^*_{(r)}e_{(r)}(\bar x,k) + \Delta\lambda_{(r)} = \big(I_r - E_{(r)}(\bar x,k)\big)\big(\lambda_{(r)} - \lambda^*_{(r)}\big),$  (7.10)

where $e^T_{(r)}(x,k) = (e_1(x,k),\dots,e_r(x,k))$, $e_i(x,k) = \big(1-e^{-kc_i(x)}\big)\big(1+e^{-kc_i(x)}\big)^{-1}$, $i = 1,\dots,r$, and $E_{(r)}(\bar x,k) = \operatorname{diag}(e_i(\bar x,k))_{i=1}^r$. Further,

$e_i(\bar x,k) = e_i(x^*,k) + \big(\nabla e_i(x^*,k),\Delta x\big) + r^e_i(\Delta x),\quad i = 1,\dots,r,$

with $r^e_i(0) = 0$, $\|r^e_i(\Delta x)\| \le L^e_i\|\Delta x\|^2$. In view of $e_i(x^*,k) = 0$ and

$\nabla e_i(x^*,k) = 2k\,e^{-kc_i(x^*)}\big(1+e^{-kc_i(x^*)}\big)^{-2}\nabla c_i(x^*) = \tfrac{k}{2}\nabla c_i(x^*),\quad i = 1,\dots,r,$

we have

$e_i(\bar x,k) = \tfrac12 k\big(\nabla c_i(x^*),\Delta x\big) + r^e_i(\Delta x),\quad i = 1,\dots,r.$  (7.11)

Using (7.11) we obtain

$\tfrac12\Lambda^*_{(r)}\nabla c_{(r)}(x^*)\Delta x + k^{-1}\Delta\lambda_{(r)} = k^{-1}\big(I_r - E_{(r)}(\bar x,k)\big)\big(\lambda_{(r)} - \lambda^*_{(r)}\big) - k^{-1}\Lambda^*_{(r)}r^e_{(r)}(\Delta x)$

or

$\Lambda^*_{(r)}\nabla c_{(r)}(x^*)\Delta x + 2k^{-1}\Delta\lambda_{(r)} = 2k^{-1}\big(I_r - E_{(r)}(\bar x,k)\big)\big(\lambda_{(r)} - \lambda^*_{(r)}\big) - r^{(2)}(\Delta x),$  (7.12)

where $r^{(2)}(\Delta x) = 2k^{-1}\Lambda^*_{(r)}r^e_{(r)}(\Delta x)$, $r^e_{(r)}(\Delta x)^T = (r^e_1(\Delta x),\dots,r^e_r(\Delta x))$, $r^{(2)}(0) = 0$, and there is $L^{(2)} > 0$ such that $\|r^{(2)}(\Delta x)\| \le L^{(2)}\|\Delta x\|^2$. Combining (7.9) and (7.12) we obtain

$\nabla^2L^*\Delta x - \nabla c^{*T}_{(r)}\Delta\lambda_{(r)} = \nabla_x L(\bar x,\lambda,k) + h\big(\bar x,\lambda_{(m-r)},k\big) - r^{(1)}(y),$
$\Lambda^*_{(r)}\nabla c^*_{(r)}\Delta x + 2k^{-1}\Delta\lambda_{(r)} = 2k^{-1}\big(I_r - E_{(r)}(\bar x,k)\big)\big(\lambda_{(r)} - \lambda^*_{(r)}\big) - r^{(2)}(\Delta x),$

or

$\varphi_k\,y_{(r)} = a(\bar x,\lambda,k) + b\big(\bar x,\lambda_{(r)},k\big) + r(y_{(r)}),$  (7.13)

where

$\varphi_k = \begin{bmatrix}\nabla^2L^* & -\nabla c^{*T}_{(r)}\\[2pt] \Lambda^*_{(r)}\nabla c^*_{(r)} & 2k^{-1}I_r\end{bmatrix},\qquad a(\bar x,\lambda,k) = \begin{bmatrix}\nabla_x L(\bar x,\lambda,k) + h(\bar x,\lambda_{(m-r)},k)\\ 0\end{bmatrix},$

$b\big(\bar x,\lambda_{(r)},k\big) = \begin{bmatrix}0\\ 2k^{-1}\big(I_r - E_{(r)}(\bar x,k)\big)\big(\lambda_{(r)} - \lambda^*_{(r)}\big)\end{bmatrix},\qquad r(y_{(r)}) = \begin{bmatrix}-r^{(1)}(y)\\ -r^{(2)}(\Delta x)\end{bmatrix},$

with $r(0) = 0$ and $L > 0$ such that $\|r(y_{(r)})\| \le L\|y_{(r)}\|^2$.

As we already know, for $k_0 > 0$ large enough and any $k \ge k_0$ the inverse matrix $\varphi_k^{-1}$ exists and there is $\bar\varkappa > 0$ independent of $k \ge k_0$ such that $\|\varphi_k^{-1}\| \le \bar\varkappa$. Therefore we can solve the system (7.13) for $y_{(r)}$:

$y_{(r)} = \varphi_k^{-1}\big[a(\bar x,\lambda,k) + b(\bar x,\lambda_{(r)},k) + r(y_{(r)})\big] = \varphi_k^{-1}\big[a(\cdot) + b(\cdot) + r(y_{(r)})\big] = C(y_{(r)}).$  (7.14)

Therefore

$\nabla C(y_{(r)}) = \varphi_k^{-1}\nabla r(y_{(r)})\quad\text{and}\quad \big\|\nabla C(y_{(r)})\big\| \le \big\|\varphi_k^{-1}\big\|\,\big\|\nabla r(y_{(r)})\big\| \le \bar\varkappa L\|y_{(r)}\|.$

So, for $\|y_{(r)}\|$ small enough we have $\|\nabla C(y_{(r)})\| \le q < 1$. In other words, the operator $C(y_{(r)})$ is a contractive operator for $\|y_{(r)}\|$ small enough.

Let us examine the contractibility of the operator $C(y_{(r)})$ in more detail. First of all we estimate $\|a(\cdot)\|$ and $\|b(\cdot)\|$. We have

$\|a(\cdot)\| \le \big\|\nabla_x L(\bar x,\lambda,k)\big\| + \big\|h(\bar x,\lambda_{(m-r)},k)\big\|.$

Note that for $\varepsilon > 0$ small enough and $\bar x \in S(x^*,\varepsilon)$ we obtain

$\bar\lambda_i = 2\lambda_i\big(1+e^{kc_i(\bar x)}\big)^{-1} \le 4\lambda_i\big(1+e^{k\sigma/2}\big)^{-1},\quad i = r+1,\dots,m.$

Hence for $k_0 > 0$ large enough and $k \ge k_0$ one can find a small enough $\eta > 0$ such that

$\bar\lambda_i \le \eta k^{-1}\lambda_i = \eta k^{-1}\big(\lambda_i - \lambda^*_i\big),\quad i = r+1,\dots,m,$

i.e.,

$\big\|\bar\lambda_{(m-r)} - \lambda^*_{(m-r)}\big\| = \big\|\bar\lambda_{(m-r)}\big\| \le \eta k^{-1}\big\|\lambda_{(m-r)} - \lambda^*_{(m-r)}\big\|,$  (7.15)

and

$\big\|h(\bar x,\lambda_{(m-r)},k)\big\| \le \sum_{i=r+1}^m 2\lambda_i\big(1+e^{kc_i(\bar x)}\big)^{-1}\big\|\nabla c_i(\bar x)\big\| \le \sum_{i=r+1}^m 8\lambda_i\big(1+e^{k\sigma/2}\big)^{-1}\big\|\nabla c_i(\bar x)\big\|.$

For $k_0 > 0$ large enough and any $k \ge k_0$ we have $4\big(1+e^{k\sigma/2}\big)^{-1}\|\nabla c_i(\bar x)\| \le 2k^{-1}$, $i = r+1,\dots,m$. Therefore

$\big\|h(\bar x,\lambda_{(m-r)},k)\big\| \le 2k^{-1}\big\|\lambda_{(m-r)} - \lambda^*_{(m-r)}\big\| \le 2k^{-1}\|\lambda - \lambda^*\|.$

Further, from (7.1) we obtain

$\big\|\nabla_x L(\bar x,\lambda,k)\big\| \le \tau k^{-1}\|\bar\lambda - \lambda\| \le \tau k^{-1}\|\bar\lambda - \lambda^*\| + \tau k^{-1}\|\lambda - \lambda^*\|.$  (7.16)

Therefore

$\|a(\cdot)\| \le \tau k^{-1}\|\bar\lambda - \lambda^*\| + (2+\tau)k^{-1}\|\lambda - \lambda^*\|.$  (7.17)

Further,

$I_r - E_{(r)}(\bar x,k) = \operatorname{diag}\big(1 - e_i(\bar x,k)\big)_{i=1}^r = \operatorname{diag}\big(2e^{-kc_i(\bar x)}(1+e^{-kc_i(\bar x)})^{-1}\big)_{i=1}^r.$

Hence

$\|b(\cdot)\| \le 2k^{-1}\big\|\lambda_{(r)} - \lambda^*_{(r)}\big\| \le 2k^{-1}\|\lambda - \lambda^*\|.$  (7.18)

From (7.14), (7.15) we obtain

$\|y_{(r)}\| \le \big\|\varphi_k^{-1}\big\|\big[\|a(\cdot)\| + \|b(\cdot)\| + \|r(y_{(r)})\|\big] \le \bar\varkappa\Big(\tau k^{-1}\big\|\bar\lambda_{(r)} - \lambda^*_{(r)}\big\| + \tau k^{-1}\big\|\bar\lambda_{(m-r)}\big\| + (4+\tau)k^{-1}\|\lambda - \lambda^*\| + \big\|r(y_{(r)})\big\|\Big).$

Then, in view of $\|\bar\lambda_{(r)} - \lambda^*_{(r)}\| \le \|y_{(r)}\|$, $\|\bar\lambda_{(m-r)}\| \le \eta k^{-1}\|\lambda_{(m-r)} - \lambda^*_{(m-r)}\| \le \eta k^{-1}\|\lambda - \lambda^*\|$ and $\|r(y_{(r)})\| \le \frac{L}{2}\|y_{(r)}\|^2$, we obtain

$\|y_{(r)}\| \le \bar\varkappa\Big(\tau k^{-1}\|y_{(r)}\| + (5+\tau)k^{-1}\|\lambda - \lambda^*\| + \frac{L}{2}\|y_{(r)}\|^2\Big),$

or

$\frac{\bar\varkappa L}{2}\|y_{(r)}\|^2 - \big(1 - \bar\varkappa\tau k^{-1}\big)\|y_{(r)}\| + (5+\tau)\bar\varkappa k^{-1}\|\lambda - \lambda^*\| \ge 0,$

and

$\|y_{(r)}\| \le \frac{1}{\bar\varkappa L}\bigg[\Big(1 - \frac{\bar\varkappa\tau}{k}\Big) - \Big(\Big(1 - \frac{\bar\varkappa\tau}{k}\Big)^2 - \frac{2L\bar\varkappa^2(5+\tau)\|\lambda - \lambda^*\|}{k}\Big)^{1/2}\bigg].$

If $k_0 > 0$ is large enough then for any $k \ge k_0$ we have

$\Big(\Big(1 - \frac{\bar\varkappa\tau}{k}\Big)^2 - \frac{2L\bar\varkappa^2(5+\tau)\|\lambda - \lambda^*\|}{k}\Big)^{1/2} \ge \Big(1 - \frac{\bar\varkappa\tau}{k}\Big) - \frac{2L\bar\varkappa^2(5+\tau)\|\lambda - \lambda^*\|}{k}.$

Therefore

$\|y_{(r)}\| \le \frac{2\bar\varkappa(5+\tau)}{k}\|\lambda - \lambda^*\|.$

So, in view of (7.15), for $c = \max\{2\bar\varkappa,\eta\}$ we have

$\max\big\{\|\bar x - x^*\|,\ \|\bar\lambda - \lambda^*\|\big\} \le \frac{c(5+\tau)}{k}\|\lambda - \lambda^*\|.$

The proof is completed.

Remark 7.1. The results of theorem 7.1 remain true whether $f(x)$ and all $c_i(x)$ are convex or not.

8. Primal-dual LS method

The numerical realization of the LS method (5.1), (5.2) leads to finding an approximation $\bar x$ from (7.1) and updating the Lagrange multipliers by formula (7.2). To find $\bar x$ one can use the Newton method. The Newton LS method has been described in [12].

In this section we consider another approach to the numerical realization of the LS multipliers method (5.1), (5.2). Instead of using the Newton method to find $\bar x$ and then updating the Lagrange multipliers, we will use the Newton method for solving the following primal-dual system:

$\nabla_x L(\hat x,\hat\lambda) = \nabla f(\hat x) - \sum_{i=1}^m\hat\lambda_i\nabla c_i(\hat x) = 0,$  (8.1)

$\hat\lambda = \psi'\big(kc(\hat x)\big)\lambda$  (8.2)

for $\hat x$ and $\hat\lambda$ under the fixed $k > 0$ and $\lambda \in \mathbb{R}^m_{++}$, where $\psi'(kc(\hat x)) = \operatorname{diag}\big(\psi'(kc_i(\hat x))\big)_{i=1}^m$. After finding an approximation $(\bar x,\bar\lambda)$ for the primal-dual pair $(\hat x,\hat\lambda)$ we replace $\lambda$ by $\bar\lambda$ and take $(\bar x,\bar\lambda)$ as a starting point for the new system. We apply the Newton method for solving (8.1), (8.2) using $(\bar x,\bar\lambda)$ as a starting point. By linearizing (8.1), (8.2) we obtain the following system for the Newton direction $(\Delta x,\Delta\lambda)$:

$\nabla f(\bar x) + \nabla^2 f(\bar x)\Delta x - \sum_{i=1}^m\big(\bar\lambda_i + \Delta\lambda_i\big)\big(\nabla c_i(\bar x) + \nabla^2 c_i(\bar x)\Delta x\big) = 0,$  (8.3)

$\bar\lambda + \Delta\lambda = \psi'\big(k(c(\bar x) + \nabla c(\bar x)\Delta x)\big)\lambda.$  (8.4)

The system (8.3) we can rewrite as follows:

$\nabla^2_{xx}L(\bar x,\bar\lambda)\Delta x - \nabla c(\bar x)^T\Delta\lambda + \nabla_x L(\bar x,\bar\lambda) - \sum_{i=1}^m\Delta\lambda_i\nabla^2 c_i(\bar x)\Delta x = 0.$  (8.5)

By ignoring the last term we obtain

$\nabla^2_{xx}L(\bar x,\bar\lambda)\Delta x - \nabla c(\bar x)^T\Delta\lambda = -\nabla_x L(\bar x,\bar\lambda).$  (8.6)

By ignoring the second order components in the representation of $\psi'(k(c(\bar x) + \nabla c(\bar x)\Delta x))$ we obtain

$\psi'\big(k(c(\bar x) + \nabla c(\bar x)\Delta x)\big) = \psi'\big(kc(\bar x)\big) + k\,\psi''\big(kc(\bar x)\big)\nabla c(\bar x)\Delta x.$

So we can rewrite the system (8.4) as follows:

$-k\,\psi''\big(kc(\bar x)\big)\Lambda\,\nabla c(\bar x)\Delta x + \Delta\lambda = \tilde\lambda - \bar\lambda,$  (8.7)

where $\psi''(kc(\bar x)) = \operatorname{diag}\big(\psi''(kc_i(\bar x))\big)_{i=1}^m$, $\Lambda = \operatorname{diag}(\lambda_i)_{i=1}^m$ and $\tilde\lambda = \psi'(kc(\bar x))\lambda$. Combining (8.6), (8.7) we obtain

$\nabla^2_{xx}L(\bar x,\bar\lambda)\Delta x - \nabla c(\bar x)^T\Delta\lambda = -\nabla_x L(\bar x,\bar\lambda),$  (8.8)

$\nabla c(\bar x)\Delta x - \big(k\,\psi''(kc(\bar x))\Lambda\big)^{-1}\Delta\lambda = -\big(k\,\psi''(kc(\bar x))\Lambda\big)^{-1}\big(\tilde\lambda - \bar\lambda\big).$  (8.9)

We find $\Delta\lambda$ from (8.9) and substitute it into (8.8). We obtain the following system for $\Delta x$:

$M(\bar x,\bar\lambda)\Delta x = -\nabla_x L(\bar x,\tilde\lambda),$  (8.10)

where

$M(\bar x,\bar\lambda) = \nabla^2_{xx}L(\bar x,\bar\lambda) - k\,\nabla c(\bar x)^T\psi''\big(kc(\bar x)\big)\Lambda\,\nabla c(\bar x)$

is a symmetric positive definite matrix. Moreover, $\operatorname{cond}M(\bar x,\bar\lambda,k)$ is stable in the neighborhood of $(x^*,\lambda^*)$ for any fixed $k \ge k_0$, see lemma 4.3. By solving the system (8.10) for $\Delta x$ we find the primal direction. The next primal approximation for $\hat x$ we obtain as

$\hat{\bar x} = \bar x + \Delta x.$  (8.11)
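A compact sketch of one primal-dual Newton step based on (8.6)-(8.11) and the dual update described next might look as follows; gradients and Hessians are supplied by the caller, and all names and signatures are our own assumptions, not the paper's code.

```python
import numpy as np

def pd_ls_step(x, lam_bar, lam, k, grad_f, hess_f, c, jac_c, hess_c):
    """One primal-dual LS Newton step.
    grad_f(x)->(n,), hess_f(x)->(n,n), c(x)->(m,), jac_c(x)->(m,n),
    hess_c(x)->(m,n,n); lam is the 'outer' multiplier vector from (8.2)."""
    J, cv = jac_c(x), c(x)
    dpsi = 2.0 / (1.0 + np.exp(k * cv))                              # psi'(k c_i)
    d2psi = -2.0 * np.exp(k * cv) / (1.0 + np.exp(k * cv)) ** 2      # psi''(k c_i)
    lam_tilde = dpsi * lam                                           # dual predictor psi'(k c(x)) * lam
    H = hess_f(x) - np.einsum("i,ijk->jk", lam_bar, hess_c(x))       # Hessian of L(x, lam_bar)
    M = H - k * J.T @ np.diag(d2psi * lam) @ J                       # matrix in (8.10); -k psi'' Lam >= 0
    grad_L_tilde = grad_f(x) - J.T @ lam_tilde
    dx = np.linalg.solve(M, -grad_L_tilde)                           # primal direction from (8.10)
    dlam = lam_tilde - lam_bar + k * (d2psi * lam) * (J @ dx)        # dual corrector from (8.7)
    return x + dx, lam_bar + dlam                                    # primal and dual updates
```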

The next dual approximation for $\hat\lambda$ we find by the formula

$\hat{\bar\lambda} = \bar\lambda + \Delta\lambda = \tilde\lambda + k\,\psi''\big(kc(\bar x)\big)\Lambda\,\nabla c(\bar x)\Delta x.$  (8.12)

So the method (8.11), (8.12) can be viewed as a dual-primal predictor-corrector. The vector $\tilde\lambda = \psi'(kc(\bar x))\lambda$ is the predictor for $\hat\lambda$. Using this dual predictor we find the primal direction $\Delta x$ from (8.10) and use it to find the dual corrector $\Delta\lambda$ by the formula $\Delta\lambda = \tilde\lambda - \bar\lambda + k\,\psi''(kc(\bar x))\Lambda\,\nabla c(\bar x)\Delta x$. The next dual approximation $\hat{\bar\lambda}$ for $\hat\lambda$ we find by (8.12).

The primal-dual LS method is fast and numerically stable in the neighborhood of $(x^*,\lambda^*)$. To make the LS method converge globally one can combine the primal-dual method with the Newton LS method using the scheme of [10]. Such an approach produced very encouraging results on a number of LP and NLP problems [12]. We will show some recently obtained results in section 10.

9. Log-sigmoid Lagrangian and duality

We have already seen that the LS Lagrangian $L(x,\lambda,k)$ has some important properties which the classical Lagrangian $L(x,\lambda)$ does not possess. Therefore one can expect that the dual function

$d_k(\lambda) = \inf\{L(x,\lambda,k) \mid x \in \mathbb{R}^n\}$  (9.1)

and the dual problem

$\lambda^* = \arg\max\{d_k(\lambda) \mid \lambda \in \mathbb{R}^m_+\}$  (9.2)

might have some extra properties as compared to the dual function $d(\lambda)$ and the dual problem (D), which are based on $L(x,\lambda)$.

First of all, due to lemmas 4.2 and 4.4, for any $\lambda \in \mathbb{R}^m_{++}$ and any $k > 0$ there exists a unique minimizer

$\hat x = \hat x(\lambda,k) = \arg\min\{L(x,\lambda,k) \mid x \in \mathbb{R}^n\}$

for any convex programming problem with a bounded optimal set. The uniqueness of the minimizer $\hat x(\lambda,k)$, together with the smoothness of $f(x)$ and $c_i(x)$, provides smoothness of the dual function $d_k(\lambda)$, which is always concave whether the primal problem (P) is convex or not. So the dual function $d_k(\lambda)$ is smooth under reasonable assumptions about the primal problem (P). Also the dual problem (9.2) is always convex.

Let us consider the properties of the dual function and the dual problem (9.2) in more detail. Assuming smoothness of $f(x)$ and $c_i(x)$ and uniqueness of $\hat x(\lambda,k)$ we can compute the gradient $\nabla d_k(\lambda)$ and the Hessian $\nabla^2 d_k(\lambda)$. For the gradient $\nabla d_k(\lambda)$ we obtain

$\nabla d_k(\lambda) = \nabla_x L\big(\hat x,\lambda,k\big)\,\nabla_\lambda\hat x(\lambda,k) + \nabla_\lambda L\big(\hat x,\lambda,k\big),$

where $\nabla_\lambda\hat x(\lambda,k) = J_\lambda(\hat x(\lambda,k))$ is the Jacobian of the vector-function $\hat x(\lambda,k)$.

In view of $\nabla_x L(\hat x,\lambda,k) = 0$ we have

$\nabla d_k(\lambda) = \nabla_\lambda L\big(\hat x(\lambda,k),\lambda,k\big) = \nabla_\lambda L\big(\hat x(\cdot),\cdot\big) = 2k^{-1}\Big(\ln\frac{1+e^{-kc_1(\hat x)}}{2},\dots,\ln\frac{1+e^{-kc_m(\hat x)}}{2}\Big)^T = 2k^{-1}\ln\frac{1+e^{-kc(\hat x)}}{2},$  (9.3)

where $\ln\frac{1+e^{-kc(x)}}{2}$ is a column vector with components $\ln\frac{1+e^{-kc_i(x)}}{2}$, $i = 1,\dots,m$.

Since $\nabla^2_{xx}L(\hat x(\lambda,k),\lambda,k)$ is positive definite, the system $\nabla_x L(x,\lambda,k) = 0$ yields a unique vector-function $\hat x(\lambda,k)$ such that $\hat x(\lambda,k) = \hat x$ and

$\nabla_x L\big(\hat x(\lambda,k),\lambda,k\big) \equiv \nabla_x L\big(\hat x(\cdot),\cdot\big) \equiv 0.$  (9.4)

By differentiating (9.4) in $\lambda$ we obtain

$\nabla^2_{xx}L\big(\hat x(\cdot),\cdot\big)\nabla_\lambda\hat x(\cdot) + \nabla^2_{x\lambda}L\big(\hat x(\cdot),\cdot\big) = 0,$

therefore

$\nabla_\lambda\hat x(\lambda,k) = \nabla_\lambda\hat x(\cdot) = -\big(\nabla^2_{xx}L(\hat x(\cdot),\cdot)\big)^{-1}\nabla^2_{x\lambda}L\big(\hat x(\cdot),\cdot\big).$  (9.5)

Let us consider the Hessian of the dual function. Using (9.3) and (9.5) we obtain

$\nabla^2 d_k(\lambda) = \nabla_\lambda\big(\nabla_\lambda d_k(\lambda)\big) = 2k^{-1}\nabla_\lambda\ln\frac{1+e^{-kc(\hat x(\lambda,k))}}{2} = \nabla^2_{\lambda x}L\big(\hat x,\lambda,k\big)\nabla_\lambda\hat x(\lambda,k) = -\nabla^2_{\lambda x}L\big(\hat x(\cdot),\cdot\big)\big(\nabla^2_{xx}L(\hat x(\cdot),\cdot)\big)^{-1}\nabla^2_{x\lambda}L\big(\hat x(\cdot),\cdot\big).$  (9.6)

To compute $\nabla^2_{\lambda x}L(\hat x(\cdot),\cdot)$ let us consider

$\nabla_\lambda L\big(\hat x(\cdot),\cdot\big) = 2k^{-1}\begin{pmatrix}\ln\big(1+e^{-kc_1(\hat x(\cdot))}\big) - \ln 2\\ \vdots\\ \ln\big(1+e^{-kc_m(\hat x(\cdot))}\big) - \ln 2\end{pmatrix}.$

Then for the Jacobian $\nabla_x\big(\nabla_\lambda L(\hat x(\cdot),\cdot)\big) = \nabla^2_{\lambda x}L(\hat x(\cdot),\cdot)$ we obtain

$\nabla^2_{\lambda x}L\big(\hat x(\cdot),\cdot\big) = -2\begin{pmatrix}\big(1+e^{kc_1(\hat x(\cdot))}\big)^{-1}\nabla c_1\big(\hat x(\cdot)\big)\\ \vdots\\ \big(1+e^{kc_m(\hat x(\cdot))}\big)^{-1}\nabla c_m\big(\hat x(\cdot)\big)\end{pmatrix} = -\psi'\big(kc(\hat x(\cdot))\big)\nabla c\big(\hat x(\cdot)\big),$

where $\psi'(kc(\hat x(\cdot))) = 2\operatorname{diag}\big[(1+e^{kc_i(\hat x(\cdot))})^{-1}\big]_{i=1}^m$. Therefore

$\nabla^2_{x\lambda}L\big(\hat x(\cdot),\cdot\big) = \big(\nabla^2_{\lambda x}L(\hat x(\cdot),\cdot)\big)^T = -\nabla c\big(\hat x(\cdot)\big)^T\psi'\big(kc(\hat x(\cdot))\big).$

By substituting $\nabla^2_{\lambda x}L(\hat x(\cdot),\cdot)$ and $\nabla^2_{x\lambda}L(\hat x(\cdot),\cdot)$ into (9.6) we obtain the following formula for the Hessian of the dual function:

$\nabla^2 d_k(\lambda) = -\psi'\big(kc(\hat x(\cdot))\big)\nabla c\big(\hat x(\cdot)\big)\big(\nabla^2_{xx}L(\hat x(\cdot),\cdot)\big)^{-1}\nabla c\big(\hat x(\cdot)\big)^T\psi'\big(kc(\hat x(\cdot))\big).$  (9.7)
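The gradient formula (9.3) is easy to check numerically against finite differences of $d_k$ on a toy problem; the sketch below (illustrative assumptions throughout) reuses the toy problem from section 5.

```python
import numpy as np
from scipy.optimize import minimize, approx_fprime

f = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2
c = lambda x: np.array([1.0 - x[0] - x[1]])
k = 10.0

def lsl(x, lam):
    # log-sigmoid Lagrangian (4.3) with the transformed objective (4.1)
    return np.logaddexp(0.0, f(x)) + (2.0 / k) * np.sum(lam * (np.logaddexp(0.0, -k * c(x)) - np.log(2.0)))

def d_k(lam):
    # dual function (9.1): value of the unconstrained minimum of the LSL in x
    return minimize(lsl, np.zeros(2), args=(lam,), method="BFGS").fun

lam = np.array([0.7])
x_hat = minimize(lsl, np.zeros(2), args=(lam,), method="BFGS").x
grad_formula = (2.0 / k) * (np.logaddexp(0.0, -k * c(x_hat)) - np.log(2.0))   # formula (9.3)
grad_fd = approx_fprime(lam, d_k, 1e-6)
print("formula:", grad_formula, "  finite difference:", grad_fd)   # should agree to a few digits
```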

Note that

$\nabla^2 d_k(\lambda^*) = -\psi'\big(kc(x^*)\big)\nabla c(x^*)\big(\nabla^2_{xx}L(x^*,\lambda^*,k)\big)^{-1}\nabla c(x^*)^T\psi'\big(kc(x^*)\big).$  (9.8)

We proved the following lemma.

Lemma 9.1. If $f(x)$ and all $c_i(x) \in C^2$ then:
1) if (P) is a convex programming problem and A) is true, then the dual function $d_k(\lambda) \in C^2$ for any $\lambda \in \mathbb{R}^m_{++}$ and any $k > 0$;
2) if (P) is a convex programming problem and $f(x)$ is strongly convex, then the dual function $d_k(\lambda) \in C^2$ for any $\lambda \in \mathbb{R}^m_+$ and any $k > 0$;
3) if the standard second order optimality conditions (2.4), (2.5) are satisfied, then $d_k(\lambda) \in C^2$ for any pair $(\lambda,k) \in D(\cdot)$, $k \ge k_0$, whether the problem (P) is convex or not.
The gradient $\nabla d_k(\lambda)$ is given by (9.3) and the Hessian $\nabla^2 d_k(\lambda)$ is given by (9.7).

Theorem 9.1 (Duality).
1) If the Slater condition B) holds, then the existence of the primal solution implies the existence of the dual solution and

$f(x^*) = d_k(\lambda^*)$  (9.9)

for any $k > 0$.
2) If $f(x)$ is strictly convex, $f(x)$ and all $c_i(x)$ are smooth and the dual solution exists, then the primal solution exists and (9.9) holds for any $k > 0$.
3) If $f(x)$ and all $c_i(x) \in C^2$ and (2.4), (2.5) are satisfied, then the second order optimality conditions hold true for the dual problem for any $k \ge k_0$ if $k_0$ is large enough.

Proof. 1) The primal solution $x^*$ is at the same time a solution of the equivalent problem (4.2). Therefore, keeping in mind the Slater condition B), we obtain a vector $\lambda^* \in \mathbb{R}^m_+$ such that

$\lambda^*_i c_i(x^*) = 0,\quad i = 1,\dots,m,\qquad L(x^*,\lambda^*,k) \le L(x,\lambda^*,k),\quad \forall x \in \mathbb{R}^n,\ k > 0.$

Therefore

$d_k(\lambda^*) = \min_{x\in\mathbb{R}^n}L(x,\lambda^*,k) = L(x^*,\lambda^*,k) = f(x^*) \ge L(x^*,\lambda,k) \ge \min_{x\in\mathbb{R}^n}L(x,\lambda,k) = d_k(\lambda),\quad \forall\lambda \in \mathbb{R}^m_+.$

Hence $\lambda^* \in \mathbb{R}^m_+$ is the solution of the dual problem and (9.9) holds.

2) Let us assume that $\bar\lambda \in \mathbb{R}^m_+$ is the solution of the dual problem. If $f(x)$ is strictly convex then the function $L(x,\bar\lambda,k)$ is strictly convex in $x \in \mathbb{R}^n$ too. Therefore the gradient $\nabla d_k(\bar\lambda)$ exists. Consider the optimality conditions for the dual problem:

$\bar\lambda_i = 0 \;\Rightarrow\; \nabla_{\lambda_i}d_k(\bar\lambda) \le 0,$  (9.10)

$\bar\lambda_i > 0 \;\Rightarrow\; \nabla_{\lambda_i}d_k(\bar\lambda) = 0.$  (9.11)

Let $\bar x = \arg\min\{L(x,\bar\lambda,k) \mid x \in \mathbb{R}^n\}$; then

$\nabla_{\lambda_i}d_k(\bar\lambda) = 2k^{-1}\ln\frac{1+e^{-kc_i(\bar x)}}{2}.$

From (9.10) we obtain

$\bar\lambda_i = 0 \;\Rightarrow\; 2k^{-1}\ln\frac{1+e^{-kc_i(\bar x)}}{2} \le 0 \;\Rightarrow\; e^{-kc_i(\bar x)} \le 1 \;\Rightarrow\; c_i(\bar x) \ge 0.$

From (9.11) we have

$\bar\lambda_i > 0 \;\Rightarrow\; \ln\frac{1+e^{-kc_i(\bar x)}}{2} = 0 \;\Rightarrow\; e^{-kc_i(\bar x)} = 1 \;\Rightarrow\; c_i(\bar x) = 0.$

Therefore $(\bar x,\bar\lambda)$ is a primal-dual feasible pair for which the complementarity conditions hold, i.e., $\bar x = x^*$, $\bar\lambda = \lambda^*$.

3) To prove that for the dual problem the standard second order optimality conditions hold true, we consider the Lagrangian for the dual problem

$\lambda^* = \arg\max\{d_k(\lambda) \mid \lambda_i \ge 0,\ i = 1,\dots,m\}:\qquad \mathcal{L}(\lambda,v,k) = d_k(\lambda) + \sum_{i=1}^m v_i\lambda_i.$

We have

$\nabla^2_{\lambda\lambda}\mathcal{L}(\lambda,v,k) = \nabla^2_{\lambda\lambda}d_k(\lambda).$  (9.12)

The active dual constraints are $\lambda_i = 0$, $i = r+1,\dots,m$, and the vectors $e_i = (0,\dots,0,1,0,\dots,0)$, $i = r+1,\dots,m$, are the gradients of the active dual constraints. Therefore the tangent subspace to the dual active set at the point $\lambda^*$ is

$Y = \{y \in \mathbb{R}^m : (y,e_i) = 0,\ i = r+1,\dots,m\} = \{y \in \mathbb{R}^m : y = (y_1,\dots,y_r,0,\dots,0)\}.$

It is clear that the gradients $e_i$, $i = r+1,\dots,m$, of the active dual constraints are linearly independent. So, to prove that the second order optimality conditions hold true for the dual problem we have to show

$-\big(\nabla^2_{\lambda\lambda}\mathcal{L}(\lambda^*,v^*,k)y,y\big) \ge \bar\mu\|y\|^2_2,\quad \bar\mu > 0,\ \forall y \in Y.$

Using (9.8) and (9.12) we obtain

$-\big(\nabla^2_{\lambda\lambda}\mathcal{L}(\lambda^*,v^*,k)y,y\big) = -\big(\nabla^2_{\lambda\lambda}d_k(\lambda^*)y,y\big) = \Big(\psi'\big(kc(x^*)\big)\nabla c(x^*)\big(\nabla^2_{xx}L(x^*,\lambda^*,k)\big)^{-1}\nabla c(x^*)^T\psi'\big(kc(x^*)\big)y,\ y\Big) = \Big(\big(\nabla^2_{xx}L(x^*,\lambda^*,k)\big)^{-1}\nabla c(x^*)^T\bar y,\ \nabla c(x^*)^T\bar y\Big),$

where

$\bar y = \psi'\big(kc(x^*)\big)y = \big(\psi'(kc_1(x^*))y_1,\dots,\psi'(kc_r(x^*))y_r,0,\dots,0\big) = (y_1,\dots,y_r,0,\dots,0) = (y_{(r)},0) = y,$

because $\psi'(kc_i(x^*)) = \psi'(0) = 1$ for $i = 1,\dots,r$. In other words,

$-\big(\nabla^2_{\lambda\lambda}\mathcal{L}(\lambda^*,v^*,k)y,y\big) = \Big(\big(\nabla^2_{xx}L(x^*,\lambda^*,k)\big)^{-1}\nabla c(x^*)^Ty,\ \nabla c(x^*)^Ty\Big).$

Using (4.4) we obtain

$(M_0k)^{-1}(y,y) \le \Big(\big(\nabla^2_{xx}L(x^*,\lambda^*,k)\big)^{-1}y,y\Big) \le \mu_0^{-1}(y,y),$

hence

$-\big(\nabla^2_{\lambda\lambda}\mathcal{L}(\lambda^*,v^*,k)y,y\big) \ge (M_0k)^{-1}\big(\nabla c_{(r)}(x^*)^Ty_{(r)},\ \nabla c_{(r)}(x^*)^Ty_{(r)}\big) = (M_0k)^{-1}\big(\nabla c_{(r)}(x^*)\nabla c_{(r)}(x^*)^Ty_{(r)},\ y_{(r)}\big).$  (9.13)

Due to (2.4) the Gram matrix $\nabla c_{(r)}(x^*)\nabla c_{(r)}(x^*)^T$ is positive definite, therefore there is $\mu^* > 0$ such that

$\big(\nabla c_{(r)}(x^*)\nabla c_{(r)}(x^*)^Ty_{(r)},\ y_{(r)}\big) \ge \mu^*\|y_{(r)}\|^2_2.$

Therefore, in view of (9.13), we obtain

$-\big(\nabla^2_{\lambda\lambda}\mathcal{L}(\lambda^*,v^*,k)y,y\big) \ge \bar\mu\|y\|^2_2,\quad \forall y \in Y,$  (9.14)

where $\bar\mu = (M_0k)^{-1}\mu^*$. So the standard second order optimality conditions hold true for the dual problem.

Corollary 9.1. If (2.4), (2.5) hold and $k_0 > 0$ is large enough, then for any $k \ge k_0$ the restriction

$d_k\big(\lambda_{(r)}\big) = d_k(\lambda)\big|_{\lambda_{r+1}=0,\dots,\lambda_m=0}$

of the dual function to the manifold of the dual active constraints is strongly concave.

The properties of $d_k(\lambda_{(r)})$ allow one to use smooth unconstrained optimization techniques, in particular the Newton method, for solving the dual problem. This leads to the second order LS multipliers method.

Remark 9.1. Part 3) of theorem 9.1 holds true even for nonconvex optimization problems. It is not true if instead of $L(x,\lambda,k)$ one uses the classical Lagrangian $L(x,\lambda)$.

10. Numerical results

The primal-dual LS method we described in section 8, generally speaking, does not converge globally. However, locally it converges very fast. Therefore in the first stage of the computation we used a path-following type approach with the LS penalty function (6.1), i.e., we find an approximation for $x(k)$ and increase $k > 0$ from step to step. For the unconstrained minimization of $P(x,k)$ in $x$ we used the Newton method with step-length. When the duality gap becomes reasonably small we use the primal approximation for $x(k)$ to compute the approximation $\lambda$ for the Lagrange multipliers $\lambda(k)$, and then the primal-dual vector $(x,\lambda)$ is used as a starting point in the primal-dual method (8.11), (8.12). The first stage consumes most of the computational time, while the primal-dual method (8.11), (8.12) requires only a few steps to reduce the duality gap and the infeasibility. For all problems which have been solved we observed the "hot start" phenomenon (see [10,11]), when few, and from some point on only one, Newton step is required for finding a high-accuracy approximation to the solution of the primal-dual system (8.1), (8.2).

In the following tables we show numerical results obtained by using the NR multipliers method for several problems which we downloaded from Dr. R. Vanderbei's webpage. The columns report the iteration count, the objective value f, the gradient norm g, the duality gap, the infeasibility and the step-length (it, f, g, gap, inf, step), first for the path-following stage and then for the primal-dual algorithm.

Name: catenary; n = 198, m = 298, p = 100.
Objective: linear. Constraints: convex quadratic. Feasible set: convex. This model finds the shape of a hanging chain; the solution is known to be y = cosh(a*x) + b for appropriate a and b.

Path following... (it, f, g, gap, inf, step)
Primal-Dual algorithm...

Name: esfl_socp; n = 1002, m = 2002, p = 1000.

Path following... (it, f, g, gap, inf, step)
Primal-Dual algorithm...

Name: fekete; n = 150, m = 200, p = 50.
Objective: nonconvex nonlinear. Constraints: convex quadratic.

Path following... (it, f, g, gap, inf, step)
Primal-Dual algorithm...

Name: fir_socp; n = 12, m = 319, p = 307.

Path following... (it, f, g, gap, inf, step)
Primal-Dual algorithm...

Name: hydrothermal; n = 46, m = 55, p = 9.
Objective: nonconvex nonlinear. Constraints: nonconvex nonlinear.

Path following... (it, f, g, gap, inf, step)
Primal-Dual algorithm...


More information

The Kuhn-Tucker and Envelope Theorems

The Kuhn-Tucker and Envelope Theorems The Kuhn-Tucker and Envelope Theorems Peter Ireland ECON 77200 - Math for Economists Boston College, Department of Economics Fall 207 The Kuhn-Tucker and envelope theorems can be used to characterize the

More information

Duality Uses and Correspondences. Ryan Tibshirani Convex Optimization

Duality Uses and Correspondences. Ryan Tibshirani Convex Optimization Duality Uses and Correspondences Ryan Tibshirani Conve Optimization 10-725 Recall that for the problem Last time: KKT conditions subject to f() h i () 0, i = 1,... m l j () = 0, j = 1,... r the KKT conditions

More information

Support Vector Machines: Maximum Margin Classifiers

Support Vector Machines: Maximum Margin Classifiers Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind

More information

Optimisation in Higher Dimensions

Optimisation in Higher Dimensions CHAPTER 6 Optimisation in Higher Dimensions Beyond optimisation in 1D, we will study two directions. First, the equivalent in nth dimension, x R n such that f(x ) f(x) for all x R n. Second, constrained

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

A SHIFTED PRIMAL-DUAL PENALTY-BARRIER METHOD FOR NONLINEAR OPTIMIZATION

A SHIFTED PRIMAL-DUAL PENALTY-BARRIER METHOD FOR NONLINEAR OPTIMIZATION A SHIFTED PRIMAL-DUAL PENALTY-BARRIER METHOD FOR NONLINEAR OPTIMIZATION Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-19-3 March

More information

2.3 Linear Programming

2.3 Linear Programming 2.3 Linear Programming Linear Programming (LP) is the term used to define a wide range of optimization problems in which the objective function is linear in the unknown variables and the constraints are

More information

Review of Optimization Basics

Review of Optimization Basics Review of Optimization Basics. Introduction Electricity markets throughout the US are said to have a two-settlement structure. The reason for this is that the structure includes two different markets:

More information

UNCONSTRAINED OPTIMIZATION PAUL SCHRIMPF OCTOBER 24, 2013

UNCONSTRAINED OPTIMIZATION PAUL SCHRIMPF OCTOBER 24, 2013 PAUL SCHRIMPF OCTOBER 24, 213 UNIVERSITY OF BRITISH COLUMBIA ECONOMICS 26 Today s lecture is about unconstrained optimization. If you re following along in the syllabus, you ll notice that we ve skipped

More information

Lecture 13: Constrained optimization

Lecture 13: Constrained optimization 2010-12-03 Basic ideas A nonlinearly constrained problem must somehow be converted relaxed into a problem which we can solve (a linear/quadratic or unconstrained problem) We solve a sequence of such problems

More information

The Kuhn-Tucker and Envelope Theorems

The Kuhn-Tucker and Envelope Theorems The Kuhn-Tucker and Envelope Theorems Peter Ireland EC720.01 - Math for Economists Boston College, Department of Economics Fall 2010 The Kuhn-Tucker and envelope theorems can be used to characterize the

More information

Lecture 5. Theorems of Alternatives and Self-Dual Embedding

Lecture 5. Theorems of Alternatives and Self-Dual Embedding IE 8534 1 Lecture 5. Theorems of Alternatives and Self-Dual Embedding IE 8534 2 A system of linear equations may not have a solution. It is well known that either Ax = c has a solution, or A T y = 0, c

More information

Optimization Problems with Constraints - introduction to theory, numerical Methods and applications

Optimization Problems with Constraints - introduction to theory, numerical Methods and applications Optimization Problems with Constraints - introduction to theory, numerical Methods and applications Dr. Abebe Geletu Ilmenau University of Technology Department of Simulation and Optimal Processes (SOP)

More information

Lagrangian Duality for Dummies

Lagrangian Duality for Dummies Lagrangian Duality for Dummies David Knowles November 13, 2010 We want to solve the following optimisation problem: f 0 () (1) such that f i () 0 i 1,..., m (2) For now we do not need to assume conveity.

More information

Algorithms for Constrained Optimization

Algorithms for Constrained Optimization 1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic

More information

too, of course, but is perhaps overkill here.

too, of course, but is perhaps overkill here. LUNDS TEKNISKA HÖGSKOLA MATEMATIK LÖSNINGAR OPTIMERING 018-01-11 kl 08-13 1. a) CQ points are shown as A and B below. Graphically: they are the only points that share the same tangent line for both active

More information

Duality revisited. Javier Peña Convex Optimization /36-725

Duality revisited. Javier Peña Convex Optimization /36-725 Duality revisited Javier Peña Conve Optimization 10-725/36-725 1 Last time: barrier method Main idea: approimate the problem f() + I C () with the barrier problem f() + 1 t φ() tf() + φ() where t > 0 and

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

Lecture 23: November 19

Lecture 23: November 19 10-725/36-725: Conve Optimization Fall 2018 Lecturer: Ryan Tibshirani Lecture 23: November 19 Scribes: Charvi Rastogi, George Stoica, Shuo Li Charvi Rastogi: 23.1-23.4.2, George Stoica: 23.4.3-23.8, Shuo

More information

Lectures 9 and 10: Constrained optimization problems and their optimality conditions

Lectures 9 and 10: Constrained optimization problems and their optimality conditions Lectures 9 and 10: Constrained optimization problems and their optimality conditions Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lectures 9 and 10: Constrained

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

CONVERGENCE ANALYSIS OF AN INTERIOR-POINT METHOD FOR NONCONVEX NONLINEAR PROGRAMMING

CONVERGENCE ANALYSIS OF AN INTERIOR-POINT METHOD FOR NONCONVEX NONLINEAR PROGRAMMING CONVERGENCE ANALYSIS OF AN INTERIOR-POINT METHOD FOR NONCONVEX NONLINEAR PROGRAMMING HANDE Y. BENSON, ARUN SEN, AND DAVID F. SHANNO Abstract. In this paper, we present global and local convergence results

More information

Solving Dual Problems

Solving Dual Problems Lecture 20 Solving Dual Problems We consider a constrained problem where, in addition to the constraint set X, there are also inequality and linear equality constraints. Specifically the minimization problem

More information

CHAPTER 1-2: SHADOW PRICES

CHAPTER 1-2: SHADOW PRICES Essential Microeconomics -- CHAPTER -: SHADOW PRICES An intuitive approach: profit maimizing firm with a fied supply of an input Shadow prices 5 Concave maimization problem 7 Constraint qualifications

More information

MODIFYING SQP FOR DEGENERATE PROBLEMS

MODIFYING SQP FOR DEGENERATE PROBLEMS PREPRINT ANL/MCS-P699-1097, OCTOBER, 1997, (REVISED JUNE, 2000; MARCH, 2002), MATHEMATICS AND COMPUTER SCIENCE DIVISION, ARGONNE NATIONAL LABORATORY MODIFYING SQP FOR DEGENERATE PROBLEMS STEPHEN J. WRIGHT

More information

Lecture: Duality of LP, SOCP and SDP

Lecture: Duality of LP, SOCP and SDP 1/33 Lecture: Duality of LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html wenzw@pku.edu.cn Acknowledgement:

More information

Lagrangian Transformation and Interior Ellipsoid Methods in Convex Optimization

Lagrangian Transformation and Interior Ellipsoid Methods in Convex Optimization 1 Lagrangian Transformation and Interior Ellipsoid Methods in Convex Optimization ROMAN A. POLYAK Department of SEOR and Mathematical Sciences Department George Mason University 4400 University Dr, Fairfax

More information

Optimization and Root Finding. Kurt Hornik

Optimization and Root Finding. Kurt Hornik Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding

More information

9. Interpretations, Lifting, SOS and Moments

9. Interpretations, Lifting, SOS and Moments 9-1 Interpretations, Lifting, SOS and Moments P. Parrilo and S. Lall, CDC 2003 2003.12.07.04 9. Interpretations, Lifting, SOS and Moments Polynomial nonnegativity Sum of squares (SOS) decomposition Eample

More information

CONSTRAINED NONLINEAR PROGRAMMING

CONSTRAINED NONLINEAR PROGRAMMING 149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach

More information

A null-space primal-dual interior-point algorithm for nonlinear optimization with nice convergence properties

A null-space primal-dual interior-point algorithm for nonlinear optimization with nice convergence properties A null-space primal-dual interior-point algorithm for nonlinear optimization with nice convergence properties Xinwei Liu and Yaxiang Yuan Abstract. We present a null-space primal-dual interior-point algorithm

More information

Nonlinear Optimization: What s important?

Nonlinear Optimization: What s important? Nonlinear Optimization: What s important? Julian Hall 10th May 2012 Convexity: convex problems A local minimizer is a global minimizer A solution of f (x) = 0 (stationary point) is a minimizer A global

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee227c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee227c@berkeley.edu

More information

Numerical Optimization

Numerical Optimization Constrained Optimization - Algorithms Computer Science and Automation Indian Institute of Science Bangalore 560 012, India. NPTEL Course on Consider the problem: Barrier and Penalty Methods x X where X

More information

Part 2: NLP Constrained Optimization

Part 2: NLP Constrained Optimization Part 2: NLP Constrained Optimization James G. Shanahan 2 Independent Consultant and Lecturer UC Santa Cruz EMAIL: James_DOT_Shanahan_AT_gmail_DOT_com WIFI: SSID Student USERname ucsc-guest Password EnrollNow!

More information

Interior Point Algorithms for Constrained Convex Optimization

Interior Point Algorithms for Constrained Convex Optimization Interior Point Algorithms for Constrained Convex Optimization Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Inequality constrained minimization problems

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions Angelia Nedić and Asuman Ozdaglar April 16, 2006 Abstract In this paper, we study a unifying framework

More information

Constrained Optimization

Constrained Optimization 1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange

More information

In view of (31), the second of these is equal to the identity I on E m, while this, in view of (30), implies that the first can be written

In view of (31), the second of these is equal to the identity I on E m, while this, in view of (30), implies that the first can be written 11.8 Inequality Constraints 341 Because by assumption x is a regular point and L x is positive definite on M, it follows that this matrix is nonsingular (see Exercise 11). Thus, by the Implicit Function

More information

Inexact Solution of NLP Subproblems in MINLP

Inexact Solution of NLP Subproblems in MINLP Ineact Solution of NLP Subproblems in MINLP M. Li L. N. Vicente April 4, 2011 Abstract In the contet of conve mied-integer nonlinear programming (MINLP, we investigate how the outer approimation method

More information

Duality Theory of Constrained Optimization

Duality Theory of Constrained Optimization Duality Theory of Constrained Optimization Robert M. Freund April, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 The Practical Importance of Duality Duality is pervasive

More information

Interior Point Methods. We ll discuss linear programming first, followed by three nonlinear problems. Algorithms for Linear Programming Problems

Interior Point Methods. We ll discuss linear programming first, followed by three nonlinear problems. Algorithms for Linear Programming Problems AMSC 607 / CMSC 764 Advanced Numerical Optimization Fall 2008 UNIT 3: Constrained Optimization PART 4: Introduction to Interior Point Methods Dianne P. O Leary c 2008 Interior Point Methods We ll discuss

More information

Lecture 18: Optimization Programming

Lecture 18: Optimization Programming Fall, 2016 Outline Unconstrained Optimization 1 Unconstrained Optimization 2 Equality-constrained Optimization Inequality-constrained Optimization Mixture-constrained Optimization 3 Quadratic Programming

More information

On well definedness of the Central Path

On well definedness of the Central Path On well definedness of the Central Path L.M.Graña Drummond B. F. Svaiter IMPA-Instituto de Matemática Pura e Aplicada Estrada Dona Castorina 110, Jardim Botânico, Rio de Janeiro-RJ CEP 22460-320 Brasil

More information

4TE3/6TE3. Algorithms for. Continuous Optimization

4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Algorithms for Constrained Nonlinear Optimization Problems) Tamás TERLAKY Computing and Software McMaster University Hamilton, November 2005 terlaky@mcmaster.ca

More information

INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE

INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE INTERIOR-POINT METHODS FOR NONCONVEX NONLINEAR PROGRAMMING: CONVERGENCE ANALYSIS AND COMPUTATIONAL PERFORMANCE HANDE Y. BENSON, ARUN SEN, AND DAVID F. SHANNO Abstract. In this paper, we present global

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

Math 273a: Optimization Lagrange Duality

Math 273a: Optimization Lagrange Duality Math 273a: Optimization Lagrange Duality Instructor: Wotao Yin Department of Mathematics, UCLA Winter 2015 online discussions on piazza.com Gradient descent / forward Euler assume function f is proper

More information

4TE3/6TE3. Algorithms for. Continuous Optimization

4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Duality in Nonlinear Optimization ) Tamás TERLAKY Computing and Software McMaster University Hamilton, January 2004 terlaky@mcmaster.ca Tel: 27780 Optimality

More information

Primal/Dual Decomposition Methods

Primal/Dual Decomposition Methods Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients

More information

1.4 FOUNDATIONS OF CONSTRAINED OPTIMIZATION

1.4 FOUNDATIONS OF CONSTRAINED OPTIMIZATION Essential Microeconomics -- 4 FOUNDATIONS OF CONSTRAINED OPTIMIZATION Fundamental Theorem of linear Programming 3 Non-linear optimization problems 6 Kuhn-Tucker necessary conditions Sufficient conditions

More information

Continuous Optimisation, Chpt 6: Solution methods for Constrained Optimisation

Continuous Optimisation, Chpt 6: Solution methods for Constrained Optimisation Continuous Optimisation, Chpt 6: Solution methods for Constrained Optimisation Peter J.C. Dickinson DMMP, University of Twente p.j.c.dickinson@utwente.nl http://dickinson.website/teaching/2017co.html version:

More information

Penalty, Barrier and Augmented Lagrangian Methods

Penalty, Barrier and Augmented Lagrangian Methods Penalty, Barrier and Augmented Lagrangian Methods Jesús Omar Ocegueda González Abstract Infeasible-Interior-Point methods shown in previous homeworks are well behaved when the number of constraints are

More information

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

More information

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010 I.3. LMI DUALITY Didier HENRION henrion@laas.fr EECI Graduate School on Control Supélec - Spring 2010 Primal and dual For primal problem p = inf x g 0 (x) s.t. g i (x) 0 define Lagrangian L(x, z) = g 0

More information

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL)

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL) Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization Nick Gould (RAL) x IR n f(x) subject to c(x) = Part C course on continuoue optimization CONSTRAINED MINIMIZATION x

More information

Lecture V. Numerical Optimization

Lecture V. Numerical Optimization Lecture V Numerical Optimization Gianluca Violante New York University Quantitative Macroeconomics G. Violante, Numerical Optimization p. 1 /19 Isomorphism I We describe minimization problems: to maximize

More information

CS-E4830 Kernel Methods in Machine Learning

CS-E4830 Kernel Methods in Machine Learning CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This

More information

Applications of Linear Programming

Applications of Linear Programming Applications of Linear Programming lecturer: András London University of Szeged Institute of Informatics Department of Computational Optimization Lecture 9 Non-linear programming In case of LP, the goal

More information

5.6 Penalty method and augmented Lagrangian method

5.6 Penalty method and augmented Lagrangian method 5.6 Penalty method and augmented Lagrangian method Consider a generic NLP problem min f (x) s.t. c i (x) 0 i I c i (x) = 0 i E (1) x R n where f and the c i s are of class C 1 or C 2, and I and E are the

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

Pessimistic Bi-Level Optimization

Pessimistic Bi-Level Optimization Pessimistic Bi-Level Optimization Wolfram Wiesemann 1, Angelos Tsoukalas,2, Polyeni-Margarita Kleniati 3, and Berç Rustem 1 1 Department of Computing, Imperial College London, UK 2 Department of Mechanical

More information