Nonmonotonic back-tracking trust region interior point algorithm for linear constrained optimization


Journal of Computational and Applied Mathematics 155 (2003) 285-305 www.elsevier.com/locate/cam

Nonmonotonic back-tracking trust region interior point algorithm for linear constrained optimization

Detong Zhu
Department of Mathematics, Shanghai Normal University, Shanghai 200234, China

Received 1 February 2002; received in revised form 12 October 2002

Abstract

In this paper, we modify the trust region interior point algorithm proposed by Bonnans and Pola in (SIAM J. Optim. 7(3) (1997) 717) for linear constrained optimization. A mixed strategy using both trust region and line-search techniques is adopted, which switches to back-tracking steps when a trial step produced by the trust region subproblem is unacceptable. The global convergence and local convergence rate of the improved algorithm are established under some reasonable conditions. A nonmonotonic criterion is used to speed up convergence in some ill-conditioned cases. The results of numerical experiments are reported to show the effectiveness of the proposed algorithm. (c) 2003 Elsevier Science B.V. All rights reserved.

MSC: 90C30; 65K05; 49M40

Keywords: Trust region methods; Back-tracking; Nonmonotonic technique; Interior points

0377-0427/03/$ - see front matter (c) 2003 Elsevier Science B.V. All rights reserved. doi:10.1016/S0377-0427(02)00870-1

1. Introduction

In this paper, we analyze the trust region interior point algorithm for solving the linear equality constrained optimization problem

  min f(x)  s.t.  Ax = b,  x ≥ 0,   (1.1)

where f: R^n → R is a smooth nonlinear function, not necessarily convex, A ∈ R^{m×n} is a matrix and b ∈ R^m is a vector. There are quite a few articles proposing sequential convex quadratic programming methods based on the trust region idea. These methods generate sequences of points in the interior of the feasible set. Recently, Bonnans and Pola in [2] presented a trust region interior point algorithm for linear constrained optimization. The algorithm uses two matrices at each iteration. The first is the scaling matrix X_k = diag{x_k^1, ..., x_k^n}, where x_k^i is the ith component of the current interior feasible point x_k > 0. The second matrix is B_k, a symmetric approximation of the Hessian ∇²f(x_k) of the objective function; B_k is assumed to be positive semidefinite in [2]. A search direction at x_k is obtained by solving the trust region convex quadratic program

  min ψ_k(d) = f_k + g_k^T d + (1/2) d^T B_k d
  s.t. Ad = 0,  d^T X_k^{-2} d ≤ Δ_k²,  x_k + d > 0,   (1.2)

where g_k = ∇f(x_k), d = x − x_k, ψ_k(d) is the local quadratic approximation of f, and Δ_k is the trust region radius. Let d_k be the solution of this subproblem. Bonnans and Pola in [2] then used a line search to compute θ_k = ω^l, with l the smallest nonnegative integer such that

  f(x_k + ω^l d_k) ≤ f(x_k) + β ω^l (ψ_k(d_k) − f_k),   (1.3)

where β ∈ (0, 1/2) and ω ∈ (0, 1), and obtained the next iterate

  x_{k+1} = x_k + θ_k d_k.   (1.4)

However, Bonnans and Pola in [2] assumed that ψ_k(d) is a convex function in order to obtain global convergence of the proposed algorithm.

The trust region method is a well-accepted technique in nonlinear optimization for assuring global convergence. One of its advantages is that it does not require the objective function to be convex. A trial step is obtained by minimizing the model function within the trust region. If the actual reduction achieved in the objective function f at this point is satisfactory compared with the reduction predicted by the quadratic model, then the point is accepted as the new iterate, the trust region radius is adjusted and the procedure is repeated. Otherwise, the trust region radius must be reduced and a new trial point determined.
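The back-tracking rule (1.3) above can be sketched as a small routine; the helper name and defaults below are illustrative, not the paper's implementation:

```python
import numpy as np

def backtracking_theta(f, x, d, model_decrease, beta=0.25, omega=0.5, max_halvings=60):
    """Armijo-type back-tracking in the spirit of rule (1.3):
    return theta = omega**l for the smallest integer l >= 0 with
        f(x + theta*d) <= f(x) + beta*theta*model_decrease,
    where model_decrease = psi(d) - f(x) < 0 is the reduction
    predicted by the quadratic model."""
    fx = f(x)
    theta = 1.0
    for _ in range(max_halvings):
        if f(x + theta * d) <= fx + beta * theta * model_decrease:
            break
        theta *= omega
    return theta
```

For f(x) = x², starting from x = 1 with a step d = −2 that overshoots the minimizer, the rule halves the step once before the sufficient-decrease test passes.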
It is possible that the trust region subproblem needs to be resolved many times before an acceptable step is obtained, and hence the total computation for completing one iteration might be expensive. Nocedal and Yuan [9] suggested a combination of the trust region and line-search methods for unconstrained optimization. This plausible remedy motivates switching to the line-search technique, employing back-tracking steps at an unsuccessful trial step. Of course, the prerequisite for being able to make this shift is that, although the trial step is unacceptable as the next iterate, it should still provide a direction of sufficient descent. More recently, a nonmonotonic line-search technique for solving unconstrained optimization was proposed by Grippo et al. in [6]. Furthermore, the nonmonotone technique has been extended to trust region algorithms for unconstrained optimization (see [3], for instance). The nonmonotonic idea motivates further study of the back-tracking trust region interior point algorithm, because monotonicity may cause a series of very small steps if the contours of the objective function f are a family of curves with large curvature.

The paper is organized as follows. In Section 2, we describe the algorithm, which combines the techniques of trust region interior point, back-tracking step, scaling matrix and nonmonotonic search. In Section 3, global convergence of the proposed algorithm is established. Some further convergence properties, such as strong global convergence and the local convergence rate, are discussed in Section 4. Finally, the results of numerical experiments with the proposed algorithm are reported in Section 5.

2. Algorithm

In this section, we describe a method which combines the nonmonotonic line-search technique with a trust region interior point algorithm.

2.1. Initialization step

Choose parameters β ∈ (0, 1/2), ω ∈ (0, 1), 0 < η₁ < η₂ < 1, 0 < γ₁ < γ₂ < 1 ≤ γ₃, ε > 0 and a positive integer M. Let m(0) = 0. Choose a symmetric matrix B₀. Select an initial trust region radius Δ₀ > 0 and a maximal trust region radius Δ_max ≥ Δ₀, and give a starting strictly feasible point x₀ ∈ int(Ω) = {x | Ax = b, x > 0}. Set k = 0 and go to the main step.

2.2. Main step

1. Evaluate f_k = f(x_k), g_k = ∇f(x_k).
2. If ‖P̂_k ĝ_k‖ ≤ ε, stop with the approximate solution x_k; here P̂_k = I − Â_k^T(Â_k Â_k^T)^{-1} Â_k is the projection onto the null subspace N(Â_k), with Â_k = A X_k and ĝ_k = X_k g_k.
3. Solve the subproblem

  (S_k)  min ψ_k(d) = f_k + g_k^T d + (1/2) d^T B_k d
         s.t. Ad = 0,  ‖X_k^{-1} d‖ ≤ Δ_k,  x_k + d > 0.

Denote by d_k the solution of the subproblem (S_k).
4. Choose θ_k = 1, ω, ω², ..., until the following inequality is satisfied:

  f(x_k + θ_k d_k) ≤ f(x_{l(k)}) + θ_k β g_k^T d_k,   (2.1)

where f(x_{l(k)}) = max_{0≤j≤m(k)} {f(x_{k−j})}.
5. Set

  h_k = θ_k d_k,   (2.2)
  x_{k+1} = x_k + h_k.   (2.3)

Calculate

  Pred(h_k) = f_k − ψ_k(h_k),   (2.4)
  Ared(h_k) = f(x_{l(k)}) − f(x_k + h_k),   (2.5)
  ρ̂_k = Ared(h_k) / Pred(h_k)   (2.6)

and take

  Δ_{k+1} ∈ [γ₁Δ_k, γ₂Δ_k]            if ρ̂_k ≤ η₁;
  Δ_{k+1} ∈ (γ₂Δ_k, Δ_k]              if η₁ < ρ̂_k < η₂;
  Δ_{k+1} ∈ (Δ_k, min{γ₃Δ_k, Δ_max}]  if ρ̂_k ≥ η₂.

Calculate f(x_{k+1}) and g_{k+1}.
6. Take m(k+1) = min{m(k)+1, M}, and update B_k to obtain B_{k+1}. Then set k ← k+1 and go to step 2.

Remark 1. In the subproblem (S_k), ψ_k(d) is a local quadratic model of the objective function f around x_k. A candidate iterative direction d_k is generated by minimizing ψ_k(d) over the interior points of the feasible set within the ellipsoidal ball centered at x_k with radius Δ_k.

Remark 2. A key property of the transformation in the trust region subproblem (S_k) is that X_k^{-1}d is at least unit distance from all bounds in the scaled coordinates; i.e., a step d to the point x_k + d does not violate any bound if d^T X_k^{-2} d < 1. To see this, first observe that

  d^T X_k^{-2} d = Σ_{i=1}^n (d^i / x_k^i)² < 1

implies |d^i| < x_k^i for i = 1, ..., n, where x_k^i and d^i are the ith components of x_k and d, respectively. Then, no matter what the sign of d^i is, the inequality x_k^i + d^i > 0 holds. One typical method for solving the subproblem (S_k) was presented in detail in [2] (see also [5,8,11]).

Remark 3. Note that in each iteration the algorithm solves only one trust region subproblem. If the solution d_k fails to meet the acceptance criterion (2.1) with θ_k = 1, then we turn to the line search, i.e., retreat from x_k + h_k until the criterion is satisfied.

Remark 4. We improve the trust region interior algorithm in [2] by using the back-tracking trust region technique, and the trust region radius is adjusted according to the traditional trust region criterion. In the line search, we use g_k^T d_k in (2.1) instead of ψ_k(d_k) − f_k in (1.3). The line-search criterion (2.1) is satisfied more easily than criterion (1.3), because if B_k is positive semidefinite, then g_k^T d_k ≤ ψ_k(d_k) − f_k.

Remark 5. Comparing the usual monotone technique with the nonmonotonic technique: when M ≥ 1, the accepted step h_k only guarantees that f(x_k + h_k) is smaller than f(x_{l(k)}).
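The main step above can be condensed into a minimal runnable sketch. For illustration it replaces the exact solution of subproblem (S_k) by a scaled projected steepest-descent (Cauchy-type) step with B_k = 0 in the model, so it is an assumption-laden simplification rather than the paper's subproblem solver; all helper names are invented here:

```python
import numpy as np
from collections import deque

def nm_tr_interior_point(f, grad, A, x0, M=5, beta=0.1, omega=0.5,
                         gamma1=0.25, gamma3=2.0, eta1=0.25, eta2=0.75,
                         delta0=1.0, delta_max=100.0, eps=1e-8, max_iter=500):
    """Sketch of the nonmonotonic back-tracking trust region interior
    point loop (steps 1-6).  x0 must be strictly feasible: A x0 = b, x0 > 0."""
    x = x0.astype(float).copy()
    delta = delta0
    recent = deque([f(x)], maxlen=M + 1)      # window giving f(x_{l(k)})
    for _ in range(max_iter):
        g = grad(x)
        X = np.diag(x)                        # X_k = diag{x_k^1, ..., x_k^n}
        Ah, gh = A @ X, X @ g                 # \hat A_k = A X_k, \hat g_k = X_k g_k
        # projected scaled gradient \hat P_k \hat g_k
        p = gh - Ah.T @ np.linalg.solve(Ah @ Ah.T, Ah @ gh)
        if np.linalg.norm(p) <= eps:          # stopping test of step 2
            break
        d = -X @ p                            # then ||X_k^{-1} d|| = ||p|| and A d = 0
        if np.linalg.norm(p) > delta:
            d *= delta / np.linalg.norm(p)
        while np.any(x + d <= 0):             # keep strict feasibility x + d > 0
            d *= 0.5
        # nonmonotonic back-tracking acceptance rule (2.1)
        f_ref, gtd, theta = max(recent), g @ d, 1.0
        while f(x + theta * d) > f_ref + theta * beta * gtd and theta > 1e-14:
            theta *= omega
        h = theta * d
        pred = -theta * gtd                   # model decrease with B_k = 0
        rho = (f_ref - f(x + h)) / pred
        x = x + h
        recent.append(f(x))
        # trust region update of step 5
        if rho <= eta1:
            delta *= gamma1
        elif rho >= eta2:
            delta = min(gamma3 * delta, delta_max)
    return x
```

The deque of length M + 1 plays the role of the nonmonotone reference value f(x_{l(k)}); with M = 0 the loop reduces to the usual monotone acceptance test.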
It is easy to see that the usual monotone algorithm can be viewed as a special case of the proposed algorithm with M = 0.

3. Global convergence

Throughout this section we assume that f: R^n → R is twice continuously differentiable and bounded from below. Given x₀ ∈ R^n, the algorithm generates a sequence {x_k} ⊂ R^n. In our analysis, we denote the level set of f by

  L(x₀) = {x ∈ R^n | f(x) ≤ f(x₀), Ax = b, x ≥ 0}.

The following assumption is commonly used in the convergence analysis of most methods for linear equality constrained optimization.

Assumption 1. The sequence {x_k} generated by the algorithm is contained in a compact set L(x₀) in R^n. The matrix A has full row rank m.

Based on solving the above trust region subproblem, we give the following lemma.

Lemma 3.1. d_k is a solution of subproblem (S_k) if and only if there exist μ_k ≥ 0 and λ_k ∈ R^m such that

  (B_k + μ_k X_k^{-2}) d_k = −g_k + A^T λ_k,  A d_k = 0,  x_k + d_k > 0,
  μ_k (Δ_k² − d_k^T X_k^{-2} d_k) = 0   (3.1)

holds and B_k + μ_k X_k^{-2} is positive semidefinite on N(A).

Proof. Suppose d_k is the solution of (S_k). It is a straightforward conclusion of the first-order necessary conditions that there exist μ_k ≥ 0 and λ_k ∈ R^m such that (3.1) holds. We only need to show that B_k + μ_k X_k^{-2} is positive semidefinite on N(A). Suppose that d_k ≠ 0. Since d_k solves (S_k), it also solves

  min{φ_k(d) = g_k^T d + (1/2) d^T B_k d | Ad = 0, x_k + d > 0, ‖X_k^{-1} d‖ = ‖X_k^{-1} d_k‖}.

It follows that φ_k(d) ≥ φ_k(d_k) for all d such that ‖X_k^{-1} d‖ = ‖X_k^{-1} d_k‖ and Ad = 0, that is,

  g_k^T d + (1/2) d^T B_k d ≥ g_k^T d_k + (1/2) d_k^T B_k d_k.   (3.2)

Since (B_k + μ_k X_k^{-2}) d_k = −g_k + A^T λ_k and Ad = 0, we have

  g_k^T d = −d^T (B_k + μ_k X_k^{-2}) d_k  and  g_k^T d_k = −d_k^T (B_k + μ_k X_k^{-2}) d_k.   (3.3)

Substituting these two equalities into (3.2) and rearranging terms gives

  (d − d_k)^T (B_k + μ_k X_k^{-2})(d − d_k) ≥ μ_k (d^T X_k^{-2} d − d_k^T X_k^{-2} d_k) = 0;

because d_k ≠ 0, we have the conclusion. If d_k = 0, by (3.1) we have g_k = A^T λ_k, so d_k = 0 solves

  min{(1/2) d^T B_k d | Ad = 0, d^T X_k^{-2} d ≤ Δ_k²}

and we must conclude that B_k is positive semidefinite on N(A). Since μ_k ≥ 0, B_k + μ_k X_k^{-2} is positive semidefinite on N(A).

Conversely, let μ_k ≥ 0, λ_k ∈ R^m and d_k ∈ R^n satisfy (3.1), and let B_k + μ_k X_k^{-2} be positive semidefinite on N(A). Then for all d ∈ N(A) with x_k + d > 0 and ‖X_k^{-1} d‖ ≤ Δ_k, we have

  g_k^T d + (1/2) d^T (B_k + μ_k X_k^{-2}) d ≥ g_k^T d_k + (1/2) d_k^T (B_k + μ_k X_k^{-2}) d_k.

It means

  ψ_k(d) − f_k = g_k^T d + (1/2) d^T B_k d
    = g_k^T d + (1/2) d^T (B_k + μ_k X_k^{-2}) d − (1/2) μ_k d^T X_k^{-2} d
    ≥ g_k^T d_k + (1/2) d_k^T (B_k + μ_k X_k^{-2}) d_k − (1/2) μ_k d^T X_k^{-2} d
    = ψ_k(d_k) − f_k + (1/2) μ_k (d_k^T X_k^{-2} d_k − d^T X_k^{-2} d)
    ≥ ψ_k(d_k) − f_k,

which proves that d_k solves (S_k). □

Lemma 3.1 establishes the necessary and sufficient condition relating μ_k, λ_k and d_k when d_k solves (S_k). It is well known from trust region theory that, in order to assure global convergence of the proposed algorithm, it is sufficient to bound from below the predicted reduction at the kth iteration, defined by Pred(d_k) = f_k − ψ_k(d_k), obtained by the step d_k of the trust region subproblem.

Lemma 3.2. Let the step d_k be the solution of the trust region subproblem. Then there exists σ > 0 such that d_k satisfies the following sufficient descent condition:

  Pred(d_k) ≥ σ ‖P̂_k ĝ_k‖ min{1, Δ_k, ‖P̂_k ĝ_k‖ / ‖X_k B_k X_k‖}   (3.4)

for all g_k, B_k and Δ_k, where P̂_k = I − Â_k^T(Â_k Â_k^T)^{-1} Â_k with Â_k = A X_k and ĝ_k = X_k g_k.

Proof. From Remark 2, we know that the inequality x_k + d > 0 must hold if Δ_k ≤ 1. Assume first that Δ_k ≤ 1. The affine scaling transformation is the one-to-one mapping

  d̂ = X_k^{-1} d,   (3.5)

which is extended to the equality constrained trust region subproblem (S_k) by defining the quantities

  ĝ_k = X_k g_k,   (3.6)
  Â_k = A X_k,   (3.7)
  B̂_k = X_k B_k X_k.   (3.8)
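The scaled quantities (3.6)-(3.8), the projection P̂ and the first-order least-squares multiplier estimate used later in Lemma 3.3 can be formed numerically as in this short sketch (the helper name is an assumption, not the paper's code):

```python
import numpy as np

def scaled_quantities(A, B, x, g):
    """Form \hat A = A X, \hat g = X g, \hat B = X B X for the scaling
    X = diag{x^1,...,x^n} (x > 0), the projection
    \hat P = I - \hat A^T (\hat A \hat A^T)^{-1} \hat A onto N(\hat A),
    and the multiplier estimate solving (\hat A \hat A^T) lam = \hat A \hat g."""
    X = np.diag(x)
    Ah = A @ X
    gh = X @ g
    Bh = X @ B @ X
    M = Ah @ Ah.T
    lam = np.linalg.solve(M, Ah @ gh)
    P = np.eye(x.size) - Ah.T @ np.linalg.solve(M, Ah)
    return Ah, gh, Bh, P, lam
```

By construction P̂ĝ = ĝ − Â^T λ, and Â(P̂ĝ) = 0, which is exactly the relation exploited in the proofs below.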

Thus, the basic affinely scaled subproblem at the kth iteration is

  (Ŝ_k)  min ψ̂_k(d̂) = ĝ_k^T d̂ + (1/2) d̂^T B̂_k d̂
         s.t. Â_k d̂ = 0,  ‖d̂‖ ≤ Δ_k.

Denote by d̂_k the solution of the subproblem (Ŝ_k). Let w_k = P̂_k ĝ_k; it is clear that Â_k w_k = 0. If P̂_k ĝ_k ≠ 0, consider the following problem:

  min φ(t) = −t ‖P̂_k ĝ_k‖² + (1/2) t² (P̂_k ĝ_k)^T B̂_k (P̂_k ĝ_k)
  s.t. 0 ≤ t ≤ Δ_k / ‖P̂_k ĝ_k‖.   (3.9)

Let t_k be the optimal solution of the above problem and ψ*_k be the optimal value of the subproblem (Ŝ_k). Consider two cases:

(1) w_k^T B̂_k w_k > 0. Set t̄_k = ‖w_k‖² / (w_k^T B̂_k w_k). If t̄_k ‖w_k‖ ≤ Δ_k, then t_k = t̄_k is the solution of problem (3.9), and we have

  ψ*_k ≤ φ(t_k) = −‖w_k‖⁴/(w_k^T B̂_k w_k) + (1/2)(‖w_k‖²/(w_k^T B̂_k w_k))²(w_k^T B̂_k w_k)
       = −(1/2) ‖w_k‖⁴/(w_k^T B̂_k w_k) ≤ −(1/2) ‖w_k‖²/‖B̂_k‖.   (3.10)

On the other hand, if t̄_k ‖w_k‖ > Δ_k, i.e., Δ_k (w_k^T B̂_k w_k) < ‖w_k‖³, then set t_k = Δ_k/‖w_k‖; we have

  ψ*_k ≤ φ(t_k) = −Δ_k ‖w_k‖ + (1/2)(Δ_k/‖w_k‖)²(w_k^T B̂_k w_k) ≤ −(1/2) Δ_k ‖w_k‖.   (3.11)

(2) w_k^T B̂_k w_k ≤ 0. Set t_k = Δ_k/‖w_k‖; we have

  ψ*_k ≤ φ(t_k) = −Δ_k ‖w_k‖ + (1/2)(Δ_k/‖w_k‖)²(w_k^T B̂_k w_k) ≤ −(1/2) Δ_k ‖w_k‖.   (3.12)

Together, (3.10)-(3.12) mean that ψ*_k ≤ −(1/2) ‖w_k‖ min{Δ_k, ‖w_k‖/‖B̂_k‖}. Since Pred(d_k) = f_k − ψ_k(d_k) = −ψ*_k, we have

  Pred(d_k) ≥ σ ‖P̂_k ĝ_k‖ min{Δ_k, ‖P̂_k ĝ_k‖ / ‖X_k B_k X_k‖},   (3.13)

where σ = 1/2.

If Δ_k > 1, let d̄_k be the solution of the following subproblem:

  (S̄_k)  min ψ̂_k(d̂) = ĝ_k^T d̂ + (1/2) d̂^T B̂_k d̂
         s.t. Â_k d̂ = 0,  ‖d̂‖ ≤ 1;

then d̄_k is a feasible point of subproblem (Ŝ_k) and x_k + X_k d̄_k > 0, so X_k d̄_k is also a feasible point of subproblem (S_k). Hence, similarly to the proof of (3.13), we have

  ψ*_k ≤ ψ̂_k(d̂_k) ≤ ψ̂_k(d̄_k) ≤ −σ ‖P̂_k ĝ_k‖ min{1, ‖P̂_k ĝ_k‖ / ‖X_k B_k X_k‖}.   (3.14)

From (3.13) and (3.14), the conclusion (3.4) of the lemma holds. □

The following lemma shows the relation between the gradient g_k of the objective function and the step d_k generated by the proposed algorithm. We can see from the lemma that the direction of the trial step is a sufficiently descent direction.

Lemma 3.3. At the kth iteration, let d_k be generated by the trust region subproblem (S_k), with Lagrange multiplier estimate λ_k = (Â_k Â_k^T)^{-1} Â_k ĝ_k. Then

  g_k^T d_k ≤ −σ₁ ‖P̂_k ĝ_k‖ min{1, Δ_k, ‖P̂_k ĝ_k‖ / ‖X_k B_k X_k‖},   (3.15)

where σ₁ > 0 is a constant.

Remark. For the Lagrange multiplier estimate λ_k = (Â_k Â_k^T)^{-1} Â_k ĝ_k, we can use the standard first-order least-squares estimate obtained by solving (Â_k Â_k^T) λ = Â_k ĝ_k.

Proof. The inequality x_k + d > 0 must hold if Δ_k ≤ 1. Without loss of generality, assume that Δ_k ≤ 1. Since d_k is generated by the trust region subproblem (S_k), Lemma 3.1 ensures that

  (B_k + μ_k X_k^{-2}) d_k = −g_k + A^T λ_k.

Noting that X_k is a diagonal matrix, and using λ_k = (Â_k Â_k^T)^{-1} Â_k ĝ_k, we get

  μ_k X_k^{-1} d_k = −P̂_k ĝ_k − (X_k B_k X_k) X_k^{-1} d_k.

Taking norms in the above equation, we obtain

  μ_k ‖X_k^{-1} d_k‖ ≤ ‖P̂_k ĝ_k‖ + ‖X_k B_k X_k‖ ‖X_k^{-1} d_k‖.   (3.16)

Hence, noting ‖X_k^{-1} d_k‖ ≤ Δ_k,

  0 ≤ μ_k ≤ ‖P̂_k ĝ_k‖ / ‖X_k^{-1} d_k‖ + ‖X_k B_k X_k‖ ≤ ‖P̂_k ĝ_k‖ / Δ_k + ‖X_k B_k X_k‖.   (3.17)

From (3.1), we have

  X_k^{-1} d_k = −(X_k B_k X_k + μ_k I)^+ P̂_k ĝ_k,   (3.18)

where (X_k B_k X_k + μ_k I)^+ is the generalized pseudo-inverse of X_k B_k X_k + μ_k I.

Then, using Â_k d̂_k = 0, we have

  g_k^T d_k = (X_k g_k)^T X_k^{-1} d_k = (P̂_k ĝ_k)^T X_k^{-1} d_k
    = −(P̂_k ĝ_k)^T (X_k B_k X_k + μ_k I)^+ P̂_k ĝ_k
    ≤ −‖P̂_k ĝ_k‖² / (‖X_k B_k X_k‖ + μ_k)
    ≤ −‖P̂_k ĝ_k‖² / (2‖X_k B_k X_k‖ + ‖P̂_k ĝ_k‖/Δ_k)
    ≤ −‖P̂_k ĝ_k‖² / (2 max{2‖X_k B_k X_k‖, ‖P̂_k ĝ_k‖/Δ_k})
    ≤ −(1/4) ‖P̂_k ĝ_k‖ min{‖P̂_k ĝ_k‖/‖X_k B_k X_k‖, Δ_k}.   (3.19)

From (3.19), taking σ₁ = 1/4 and noting Δ_k ≤ 1, we have that (3.15) holds. □

Assumption 2. ‖X_k B_k X_k‖ and ‖X ∇²f(x) X‖ are bounded; i.e., there exist b, b̂ > 0 such that ‖X_k B_k X_k‖ ≤ b for all k and ‖X ∇²f(x) X‖ ≤ b̂ for all x ∈ L(x₀), where the scaling matrix X = diag{x¹, ..., xⁿ} and xⁱ is the ith component of x > 0.

Similarly to the proofs of Lemma 4.1 and Theorem 4.2 in [13], we can also obtain the following main results.

Lemma 3.4. Assume that Assumptions 1-2 hold. If there exists ε > 0 such that

  ‖P̂_k ĝ_k‖ ≥ ε for all k,   (3.20)

then there is τ > 0 such that

  Δ_k ≥ τ for all k.   (3.21)

Theorem 3.5. Assume that Assumptions 1-2 hold, and let {x_k} ⊂ R^n be a sequence generated by the algorithm. Then

  lim inf_{k→∞} ‖P̂_k ĝ_k‖ = 0.   (3.22)

Theorem 3.6. Assume that Assumptions 1-2 hold. Let {x_k} ⊂ R^n be a sequence generated by the proposed algorithm with B_k = ∇²f(x_k), and assume d_k^T (B_k + (1/2) μ_k X_k^{-2}) d_k ≥ 0, where μ_k is the Lagrange multiplier given in (3.1) for the trust region constraint. Then X_* ∇²f(x_*) X_* is positive semidefinite on N(A), where x_* is a limit point of {x_k}.

Proof. By (3.1), we have

  g_k^T d_k = −d_k^T (B_k + μ_k X_k^{-2}) d_k + λ_k^T A d_k
    = −d_k^T (B_k + (1/2) μ_k X_k^{-2}) d_k − (1/2) μ_k d_k^T X_k^{-2} d_k
    ≤ −(1/2) μ_k (d_k^T X_k^{-2} d_k) = −(1/2) μ_k Δ_k².   (3.23)

First we consider the case lim inf_{k→∞} μ_k = 0. Let λ_min(X_k B_k X_k) denote the minimum eigenvalue of X_k B_k X_k on N(Â_k). By Lemma 3.1, B_k + μ_k X_k^{-2} being positive semidefinite on N(A) is equivalent to X_k B_k X_k + μ_k I being positive semidefinite on N(Â_k), so μ_k ≥ max{−λ_min(X_k B_k X_k), 0}. It is clear that when lim inf_{k→∞} μ_k = 0 there must exist a limit point x_* at which X_* ∇²f(x_*) X_* is positive semidefinite on N(A).

Now we prove by contradiction that lim inf_{k→∞} μ_k = 0. Assume that μ_k ≥ 2ε̄ > 0 for all sufficiently large k. First we show that {Δ_k} converges to zero. According to the acceptance rule in step 4 and (3.23), we have

  f(x_{l(k)}) − f(x_k + θ_k d_k) ≥ −θ_k β g_k^T d_k ≥ (1/2) θ_k β μ_k Δ_k² ≥ θ_k β ε̄ Δ_k².   (3.24)

Similarly to the proof of the theorem in [4], the sequence {f(x_{l(k)})} is nonincreasing for all k, and hence {f(x_{l(k)})} is convergent. (3.24) means that

  f(x_{l(k)}) ≤ f(x_{l(l(k)−1)}) − θ_{l(k)−1} β ε̄ Δ_{l(k)−1}²,   (3.25)

and the convergence of {f(x_{l(k)})} yields

  lim_{k→∞} θ_{l(k)−1} Δ_{l(k)−1}² = 0,   (3.26)

which implies that either

  lim_{k→∞} θ_{l(k)−1} = 0   (3.27)

or

  lim inf_{k→∞} Δ_{l(k)−1} = 0.   (3.28)

If (3.28) holds, by the updating formula for Δ_k we have, for all j, γ₁^j Δ_k ≤ Δ_{k+j} ≤ γ₃^j Δ_k, so that γ₁^{M+1} Δ_{l(k)−1} ≤ Δ_k ≤ γ₃^{M+1} Δ_{l(k)−1}. It means that

  lim inf_{k→∞} Δ_k = 0.   (3.29)

If (3.27) holds, similarly to the proof of the theorem in [6], we can also obtain that

  lim_{k→∞} f(x_{l(k)}) = lim_{k→∞} f(x_k).   (3.30)

(3.24) and (3.30) mean that, if (3.28) is not true, then lim_{k→∞} θ_k = 0. The acceptance rule (2.1) then means that, for k large enough,

  f(x_k + (θ_k/ω) d_k) − f(x_k) > (θ_k/ω) β g_k^T d_k.   (3.31)

Since

  f(x_k + (θ_k/ω) d_k) − f(x_k) = (θ_k/ω) g_k^T d_k + o((θ_k/ω) ‖d_k‖),

we have that

  (1 − β)(θ_k/ω) g_k^T d_k + o((θ_k/ω) ‖d_k‖) > 0.   (3.32)

Dividing (3.32) by (θ_k/ω) ‖d_k‖ and noting that 1 − β > 0, we have from (3.23), with μ_k ≥ 2ε̄ and ‖d_k‖ ≤ ‖X_k‖ Δ_k,

  0 ≤ lim_{k→∞} g_k^T d_k / ‖d_k‖ ≤ lim_{k→∞} −(1/2) μ_k Δ_k² / ‖d_k‖ ≤ −ε̄ c lim_{k→∞} Δ_k ≤ 0   (3.33)

for some constant c > 0. Thus (3.29) and (3.33) imply also that

  lim_{k→∞} Δ_k = 0.   (3.34)

Since f(x) is twice continuously differentiable, we have

  f(x_k + d_k) = f_k + g_k^T d_k + (1/2) d_k^T ∇²f(x_k) d_k + o(‖d_k‖²)
    ≤ f(x_{l(k)}) + β g_k^T d_k + ((1/2) − β) g_k^T d_k + (1/2)(g_k^T d_k + d_k^T ∇²f(x_k) d_k) + o(‖d_k‖²).

From (3.1) with B_k = ∇²f(x_k), we obtain

  g_k^T d_k + d_k^T ∇²f(x_k) d_k = −μ_k d_k^T X_k^{-2} d_k + λ_k^T A d_k = −μ_k d_k^T X_k^{-2} d_k ≤ 0.

Therefore, by (3.23), we have that for k large enough

  f(x_k + d_k) ≤ f(x_{l(k)}) + β g_k^T d_k,

which means that the step size θ_k = 1, i.e., h_k = d_k, for k large enough. If x_k is close enough to x_* and Δ_k is close to 0, we have

  f(x_k + d_k) − ψ_k(d_k) = o(‖d_k‖²).   (3.35)

Again by (3.1) and B_k + μ_k X_k^{-2} being positive semidefinite on N(A),

  Pred(d_k) = −(g_k^T d_k + (1/2) d_k^T B_k d_k)
    = d_k^T (B_k + μ_k X_k^{-2}) d_k − (1/2) d_k^T B_k d_k
    = (1/2) d_k^T (B_k + μ_k X_k^{-2}) d_k + (1/2) μ_k d_k^T X_k^{-2} d_k
    ≥ (1/2) μ_k Δ_k² ≥ ε̄ Δ_k².   (3.36)

By (3.35) and (3.36), setting ρ_k = Ared(d_k)/Pred(d_k) with Ared(d_k) = f_k − f(x_k + d_k), we have

  ρ_k − 1 = (Ared(d_k) − Pred(d_k)) / Pred(d_k) = o(‖d_k‖²) / (ε̄ Δ_k²) → 0 as Δ_k → 0,

where ε̄ > 0 is the lower bound on μ_k/2 assumed above. We conclude that the entire sequence {ρ_k} converges to unity. Since ρ̂_k ≥ ρ_k, ρ̂_k → 1 implies that Δ_k is not decreased for sufficiently large k and hence is bounded away from zero. Thus, {Δ_k} cannot converge to zero, contradicting (3.34). □

4. Local convergence

Theorem 3.5 indicates that at least one limit point of {x_k} is a stationary point. In this section, we shall first extend this theorem to a stronger result and study the local convergence rate, but this requires more assumptions. We denote the set of active constraints by

  I(x) = {i | x^i = 0, i = 1, ..., n}.   (4.1)

To any I ⊂ {1, ..., n} we associate the optimization subproblem

  (P)_I  min f(x)  s.t.  Ax = b,  x_I = 0.   (4.2)

Assumption 3. For all I ⊂ {1, ..., n}, the first-order optimality system associated with (P)_I has no nonisolated solutions.

Assumption 4. The constraints of (1.1) are qualified in the sense that (A^T λ)^i = 0 for all i ∉ I(x) implies that λ = 0.

Assume that (λ_*, s_*) is the multiplier pair associated with a solution x_* satisfying Assumption 3. Define the set of strictly active constraints as

  J(x_*) = {i | s_*^i > 0, i = 1, ..., n}   (4.3)

and the extended critical cone as

  T = {d ∈ R^n | Ad = 0, d^i = 0, i ∈ J(x_*)}.   (4.4)

Assumption 5. The solution x_* of problem (1.1) satisfies the strong second-order condition; that is, there exists ϱ > 0 such that

  p^T ∇²f(x_*) p ≥ ϱ ‖p‖²  for all p ∈ T.   (4.5)

This is a sufficient condition for strong regularity (see [2] or [12]). Given d ∈ N(A), the null space of A, we define d^Y, d^N as the orthogonal projections of d onto T and N, where N is the orthogonal complement of T in N(A), i.e.,

  N = {z ∈ N(A) | z^T d = 0 for all d ∈ T},

which means that d = d^Y + d^N and ‖d‖² = ‖d^Y‖² + ‖d^N‖².
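The decomposition d = d^Y + d^N can be computed numerically; here is a small sketch (the helper name and the SVD-based null-space construction are illustrative choices, not from the paper), where J is the index set of strictly active bounds:

```python
import numpy as np

def split_along_critical_cone(A, J, d, tol=1e-12):
    """Decompose d in N(A) as d = d_Y + d_N, where d_Y is the orthogonal
    projection onto T = {d : A d = 0, d^i = 0 for i in J} and d_N lies in
    the orthogonal complement of T inside N(A)."""
    n = d.size
    # rows enforcing A d = 0 and d^i = 0 for i in J
    C = np.vstack([A] + [np.eye(n)[i:i + 1] for i in J])
    # orthonormal basis of T = null(C) via SVD
    _, s, Vt = np.linalg.svd(C)
    rank = int((s > tol).sum())
    basis = Vt[rank:].T                    # columns span T
    d_Y = basis @ (basis.T @ d)            # orthogonal projection onto T
    d_N = d - d_Y
    return d_Y, d_N
```

The Pythagorean identity ‖d‖² = ‖d^Y‖² + ‖d^N‖² stated above holds by orthogonality of the two components.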

The following lemma is actually Lemma 3.5 in [2].

Lemma 4.1. Assume that Assumptions 3-5 hold. Given ε₀ > 0, if x_k is sufficiently close to x_*, then

  d^T (B_k + 2μ_k X_k^{-2}) d ≥ (ϱ/2) ‖d‖² + μ_k ‖d^N‖²,   (4.6)

where ϱ is given in (4.5) and the multiplier μ_k is given in (3.1). Similarly to the proof of the above lemma, we can also obtain that, for some μ₁ > 0,

  d^T (B_k + μ_k X_k^{-2}) d ≥ (ϱ/2) ‖d‖² + μ₁ ‖d^N‖².   (4.7)

Theorem 4.2. Assume that Assumptions 3-5 hold, and let {x_k} be a sequence generated by the algorithm. Then d_k → 0. Furthermore, if x_k is close enough to x_* and x_* is a strict local minimum of problem (1.1), then x_k → x_*.

Proof. By (3.1) and (4.7), we get

  g_k^T d_k = −d_k^T (B_k + μ_k X_k^{-2}) d_k + λ_k^T A d_k = −d_k^T (B_k + μ_k X_k^{-2}) d_k
    ≤ −(ϱ/2) ‖d_k‖² − μ₁ ‖d_k^N‖².   (4.8)

According to the acceptance rule in step 4, we have

  f(x_{l(k)}) − f(x_k + θ_k d_k) ≥ −θ_k β g_k^T d_k ≥ (ϱ/2) θ_k β ‖d_k‖².   (4.9)

Similarly to the proof of the theorem in [6], the sequence {f(x_{l(k)})} is nonincreasing for all k and therefore convergent. (4.8) and (4.9) mean that

  f(x_{l(k)}) ≤ f(x_{l(l(k)−1)}) − (ϱ/2) θ_{l(k)−1} β ‖d_{l(k)−1}‖².   (4.10)

The convergence of {f(x_{l(k)})} gives

  lim_{k→∞} θ_{l(k)−1} ‖d_{l(k)−1}‖² = 0.   (4.11)

Similarly to the proof of the theorem in [6], we can also obtain that

  lim_{k→∞} f(x_{l(k)}) = lim_{k→∞} f(x_k).   (4.12)

(4.9) and (4.12) imply that

  lim_{k→∞} θ_k ‖d_k‖² = 0.   (4.13)

Assume that there exists a subsequence K ⊂ {k} such that

  lim inf_{k→∞, k∈K} ‖d_k‖ > 0.   (4.14)

Then (4.13) and (4.14) mean that

  lim_{k→∞, k∈K} θ_k = 0.   (4.15)

Similarly to the derivation of (3.32), the acceptance rule (2.1) means that, for k large enough,

  (1 − β)(θ_k/ω) g_k^T d_k + o((θ_k/ω) ‖d_k‖) > 0.   (4.16)

Dividing (4.16) by (θ_k/ω) ‖d_k‖ and noting that 1 − β > 0 and g_k^T d_k ≤ 0, we obtain from (4.8)

  0 ≤ lim_{k→∞, k∈K} g_k^T d_k / ‖d_k‖ ≤ lim_{k→∞, k∈K} −(ϱ/2) ‖d_k‖² / ‖d_k‖ = −(ϱ/2) lim_{k→∞, k∈K} ‖d_k‖ ≤ 0.   (4.17)

From (4.17), we have lim_{k→∞, k∈K} ‖d_k‖ = 0, which contradicts (4.14). Therefore, we have that

  lim_{k→∞} ‖d_k‖ = 0.   (4.18)

Assume that there exists a limit point x_* which is a local minimum of f, and let {x_k}_K be a subsequence of {x_k} converging to x_*. Since k − M ≤ l(k) ≤ k, for any k the point x_k differs from x_{l(k+M+1)} by at most M + 1 steps θ_j d_j; hence, from (4.18),

  lim_{k→∞} ‖x_{l(k)} − x_k‖ = 0,   (4.19)

so that we can obtain

  lim_{k→∞, k∈K} ‖x_{l(k)} − x_*‖ ≤ lim_{k→∞, k∈K} ‖x_k − x_*‖ + lim_{k→∞, k∈K} ‖x_{l(k)} − x_k‖ = 0.   (4.20)

This means that the subsequence {x_{l(k)}}_K also converges to x_*. As Assumption 3 necessarily holds in a neighborhood of x_*, x_* is the only limit point of {x_k} in some neighborhood N(x_*, ϵ̄) of x_*, where ϵ̄ > 0 is a constant. On the other hand, we know that the sequence {f(x_{l(k)})} is nonincreasing for all large k, that is, f(x_{l(k)}) ≥ f(x_{l(k+1)}). Define

  f̂ = inf{f(x) | x ∈ N(x_*, ϵ̄) \ N(x_*, ϵ̄/2M)}.

Because x_* is a strict local minimum of problem (1.1), we may assume f̂ > f(x_*). Now, assuming that f(x_*) ≤ f(x_{l(k)}) ≤ f̂ and x_{l(k)} ∈ N(x_*, ϵ̄/2M), it follows that f(x_{l(k+1)}) ≤ f̂ and x_{l(k+1)} ∈ N(x_*, ϵ̄); using the definition of f̂, we find that x_{l(k+1)} ∈ N(x_*, ϵ̄/2M), and hence x_{l(k+2)}, ..., x_{l(k+M+1)} ∈ N(x_*, ϵ̄/2M) again. This implies that the sequence {x_{l(k)}} remains in N(x_*, ϵ̄/2M). By

  ‖x_{k+1} − x_*‖ ≤ ‖x_* − x_{l(k+1)}‖ + ‖x_{l(k+1)} − x_{k+1}‖ ≤ ϵ̄/2,

we get that x_k → x_*, which means that the conclusion of the theorem is true. □

Theorem 4.3. Assume that Assumptions 3-5 hold, and let {x_k} be a sequence generated by the algorithm. Then

  lim_{k→∞} ‖P̂_k ĝ_k‖ = 0.   (4.21)

Proof. Assume that there are an ε₁ ∈ (0, 1) and a subsequence {‖P̂_{m_i} ĝ_{m_i}‖} of {‖P̂_k ĝ_k‖} such that, for all m_i, i = 1, 2, ...,

  ‖P̂_{m_i} ĝ_{m_i}‖ ≥ ε₁.   (4.22)

Theorem 3.5 guarantees the existence of another subsequence {‖P̂_{l_i} ĝ_{l_i}‖} such that

  ‖P̂_k ĝ_k‖ ≥ ε₂  for m_i ≤ k < l_i   (4.23)

and

  ‖P̂_{l_i} ĝ_{l_i}‖ ≤ ε₂   (4.24)

for an ε₂ ∈ (0, ε₁). From Theorem 4.2, we know that

  lim_{k→∞} ‖d_k‖ = 0.   (4.25)

Because f(x) is twice continuously differentiable, we have from the above

  f(x_k + d_k) = f(x_k) + g_k^T d_k + (1/2) d_k^T ∇²f(x_k) d_k + o(‖d_k‖²)
    ≤ f(x_{l(k)}) + β g_k^T d_k + ((1/2) − β) g_k^T d_k + (1/2)[g_k^T d_k + d_k^T B_k d_k]
      + (1/2) d_k^T (∇²f(x_k) − B_k) d_k + o(‖d_k‖²).   (4.26)

From (3.1), we can obtain

  g_k^T d_k + d_k^T B_k d_k = −μ_k d_k^T X_k^{-2} d_k + λ_k^T A d_k = −μ_k d_k^T X_k^{-2} d_k ≤ 0.

Similarly to the proof of Theorem 3.1 in [2], we can obtain

  g_k^T d_k ≤ −(1/(2 − β)) (d_k^Y)^T ∇²f(x_k) d_k^Y,   (4.27)

  (1/2) d_k^T (B_k − ∇²f(x_k)) d_k ≤ (1/(2(2 − β))) (d_k^Y)^T ∇²f(x_k) d_k^Y + ν ‖d_k^N‖² + (ν₀/4) ‖d_k^Y‖²,   (4.28)

where ν = sup_k ‖∇²f(x_k) − B_k‖ and ν₀ = sup_k {d_k^T (∇²f(x_k) − ∇²f(x_k + ξ d_k)) d_k / ‖d_k‖²}, ξ ∈ (0, 1). Noting that (1/2 − β)/(2 − β) − 1/(2(2 − β)) = −β/(2 − β) < 0, we conclude

  ((1/2) − β) g_k^T d_k + (1/2) d_k^T (∇²f(x_k) − B_k) d_k + o(‖d_k‖²) ≤ 0.

So, we have that for large enough i and m_i ≤ k < l_i,

  f(x_k + d_k) ≤ f(x_{l(k)}) + η g_k^T d_k,  (4.29)

which means that the step size α_k = 1, i.e., h_k = d_k, for large enough i and m_i ≤ k < l_i. If x_k is close enough to x*, then d_k is close to 0, and hence B_k is close to ∇²f(x_k). We deduce that

  f(x_k + d_k) − f(x_k) − ψ_k(d_k)
    = g_k^T d_k + ½ d_k^T ∇²f(x_k) d_k + o(‖d_k‖²) − (g_k^T d_k + ½ d_k^T B_k d_k)
    = o(‖d_k‖²).  (4.30)

From (2.4) and (4.6), for large enough i and m_i ≤ k < l_i,

  Pred(d_k) = −d_k^T (½ B_k + X_k^{−2}) d_k + λ_k^T A d_k = −d_k^T (½ B_k + X_k^{−2}) d_k
            ≥ (σ/2) ‖d_k‖² + β ‖d_k^N‖²,  (4.31)

where β is given in (4.6). As d_k = h_k, for large i and m_i ≤ k < l_i, we obtain that

  ρ̂_k = (f_k − f(x_k + h_k))/Pred(h_k) = 1 + (f_k − f(x_k + d_k) + ψ_k(d_k))/Pred(h_k)
       = 1 + o(‖d_k‖²)/((σ/2) ‖d_k‖² + β ‖d_k^N‖²) ≥ η₂.  (4.32)

This means that for large i and m_i ≤ k < l_i,

  f_k − f(x_k + h_k) ≥ η₂ Pred(h_k) ≥ (η₂ σ/2) ‖d_k‖².

Therefore, we can deduce that, for large i,

  ‖x_{m_i} − x_{l_i}‖² ≤ Σ_{k=m_i}^{l_i−1} ‖x_{k+1} − x_k‖² = Σ_{k=m_i}^{l_i−1} ‖h_k‖²
    ≤ (2/(η₂ σ)) Σ_{k=m_i}^{l_i−1} [f(x_k) − f(x_k + h_k)] = (2/(η₂ σ)) (f_{m_i} − f_{l_i}).  (4.33)

(4.33) and (4.12) mean that for large i, we have

  ‖x_{m_i} − x_{l_i}‖ ≤ ε₁/(2 L̂ (L + 1)),

where L̂ is the Lipschitz constant of g(x) in L(x₀) and L bounds X and g(x) in L(x₀), that is, ‖g(x)‖ ≤ L and ‖X‖ ≤ L. We then use the triangle inequality to show

  ‖P_{m_i} ĝ_{m_i}‖ ≤ ‖P_{m_i} ĝ_{m_i} − P_{m_i} ĝ_{l_i}‖ + ‖P_{m_i} ĝ_{l_i} − P_{l_i} ĝ_{l_i}‖ + ‖P_{l_i} ĝ_{l_i}‖
    ≤ L̂ L ‖x_{m_i} − x_{l_i}‖ + L ‖P_{m_i} − P_{l_i}‖ + ε₂
    ≤ L̂ (L + 1) ‖x_{m_i} − x_{l_i}‖ + ε₂.  (4.34)

We choose ε₂ = ε₁/4; then (4.34) gives ‖P_{m_i} ĝ_{m_i}‖ ≤ ε₁/2 + ε₁/4 < ε₁, which contradicts (4.22). This implies that (4.22) is not true, and hence the conclusion of the theorem holds.

Theorem 4.4. Let {x_k} be generated by the algorithm. Assume that {B_k} is bounded. Then

(a) any limit point x* of {x_k} is a solution of the first-order optimality system associated with problem (P)_{I(x*)}, that is,

  ∇f(x*) − A^T λ* − μ* = 0,  Ax* = b,  (x*)_{I(x*)} = 0,  μ*_i = 0, i ∉ I(x*),

where μ*_i is the ith component of μ*.

(b) If Assumptions 3–4 and (4.7) hold, then x* satisfies the first-order optimality system of (1.1); i.e., there exist λ* ∈ ℝ^m, μ* ∈ ℝ^n such that

  ∇f(x*) − A^T λ* − μ* = 0,  Ax* = b,  x* ≥ 0,  μ* ≥ 0,  (x*)^T μ* = 0.

Proof. Following Lemma 2.3 in [1], we only need to prove that the sequence {x_k} satisfies the following conditions: (i) μ_k ≥ 0, (ii) d_k → 0, (iii) X_k[∇f(x_k) − A^T λ_k] → 0.
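Condition (iii) involves the scaled quantity X_k[∇f(x_k) − A^T λ_k], which coincides with the projected scaled gradient P_k ĝ_k once Â_k = A X_k, ĝ_k = X_k g_k and λ_k = (Â_k Â_k^T)^{−1} Â_k ĝ_k. The following small numerical check of this identity uses a single equality constraint (m = 1), so the matrix inverse reduces to a scalar; the data are illustrative, not taken from the paper:

```python
# X = diag(x) with x strictly interior; one constraint row a of A.
x = [2.0, 0.5, 1.0]          # current iterate, x > 0
g = [1.0, -3.0, 2.0]         # gradient g = grad f(x)
a = [1.0, 1.0, 1.0]          # single row of A

ahat = [ai * xi for ai, xi in zip(a, x)]    # A_hat = A X
ghat = [gi * xi for gi, xi in zip(g, x)]    # g_hat = X g

# least-squares multiplier: lambda = (A_hat A_hat^T)^{-1} A_hat g_hat (scalar here)
lam = sum(ai * gi for ai, gi in zip(ahat, ghat)) / sum(ai * ai for ai in ahat)

# projected scaled gradient P g_hat = g_hat - A_hat^T lambda
Pg = [gi - ai * lam for gi, ai in zip(ghat, ahat)]

# the same vector computed as X [g - A^T lambda]
Pg2 = [xi * (gi - ai * lam) for xi, gi, ai in zip(x, g, a)]

assert all(abs(u - v) < 1e-12 for u, v in zip(Pg, Pg2))
```

Driving this quantity to zero gives the stationarity part of the first-order system in scaled form: the unscaled residual ∇f − A^T λ may remain nonzero in components where x*_i = 0, but the factor X damps exactly those components.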

In fact, we have proved that (i) holds, similar to the proof of Theorem 3.6, and from Theorem 4.2 we obtain that (ii) holds. Theorem 4.3 means P_k ĝ_k → 0, which implies

  P_k ĝ_k = [I − Â_k^T (Â_k Â_k^T)^{−1} Â_k] ĝ_k = X_k [∇f(x_k) − A^T λ_k] → 0,

because Â_k = A X_k, ĝ_k = X_k g_k and λ_k := (Â_k Â_k^T)^{−1} Â_k ĝ_k. By conditions (i), (ii), (iii), similar to the proofs of Theorem 2.2 in [2], we can also obtain that the conclusions of the theorem are true.

The conclusions of Theorem 4.4 are the same as those of Theorem 2.2 in [2], under the same assumptions as in [2]. We now discuss the convergence rate of the proposed algorithm. For this purpose, we show that for k large enough, the step size α_k ≡ 1 and there exists Δ̂ > 0 such that Δ_k ≥ Δ̂.

Theorem 4.5. Assume that Assumptions 3–5 hold. Then, for sufficiently large k, the step size α_k ≡ 1 and the trust region constraint is inactive; that is, there exists Δ̂ > 0 such that Δ_k ≥ Δ̂ for k ≥ K̄, where K̄ is a large enough index.

Proof. Similar to the proof of (4.29), we can also obtain that at a sufficiently large kth iteration,

  f(x_k + d_k) ≤ f(x_{l(k)}) + η g_k^T d_k,  (4.35)

which means that the step size α_k ≡ 1, i.e., h_k = d_k, for k large enough. By the above inequality, we know that x_{k+1} = x_k + d_k. By Assumptions 3–5, we can obtain from (4.30) that

  ρ_k − 1 = (Ared(h_k) − Pred(h_k))/Pred(h_k)
    = [(g_k^T h_k + ½ h_k^T B_k h_k) − (g_k^T h_k + ½ h_k^T ∇²f(x_k) h_k + o(‖h_k‖²))]/Pred(h_k)
    = o(‖h_k‖²)/Pred(h_k).  (4.36)

Similar to the proof of (4.31), for k large enough,

  Pred(d_k) ≥ (σ/2) ‖d_k‖² + β ‖d_k^N‖²,  (4.37)

where β is given in (4.6).

Similar to the proof of Theorem 4.2, we can also obtain that d_k → 0. Hence, (4.36) and (4.37) mean that ρ_k → 1. Hence there exists Δ̂ > 0 such that ρ̂_k ≥ η₂ whenever ‖d_k‖ ≤ Δ̂, and therefore Δ_{k+1} ≥ Δ_k. As h_k → 0, there exists an index K̄ such that ‖d_k‖ ≤ Δ̂ whenever k ≥ K̄. Thus Δ_k ≥ Δ_{K̄} for k ≥ K̄. The conclusion of the theorem holds.

Theorem 4.5 means that the local convergence rate of the proposed algorithm depends on the Hessian of the objective function at x* and on the local convergence rate of the step d_k, since the step size α_k ≡ 1 and the trust region constraint is inactive for sufficiently large k.

5. Numerical experiments

Numerical experiments with the nonmonotonic back-tracking trust region interior point algorithm given in this paper have been performed on an IBM 586 personal computer. In this section, we present the numerical results. We compare the proposed algorithm with the nonmonotonic parameter set to M = 0, 4 and 8, respectively; a monotonic algorithm is realized by taking M = 0. In order to check the effectiveness of the back-tracking technique, we select the same parameters as used in [4]. The selected parameter values are: ε̂ = 0.01, η₁ = 0.001, η₂ = 0.75, γ₁ = 0.2, γ₂ = 0.5, γ₃ = 2, ω = 0.5, Δ_max = 10, θ = 0.2, and initially Δ₀ = 5. The computation terminates when one of the following stopping criteria is satisfied: ‖P_k ĝ_k‖ ≤ 10⁻⁶, or |f_k − f_{k+1}| ≤ 10⁻⁸ max{1, |f_k|}. The experiments are carried out on 15 standard test problems quoted from [10,7] (HS: the problems from Hock and Schittkowski [7]; SC: from Schittkowski [10]). Besides the recommended starting points in [10,7], denoted by x_{0a}, we also test these methods with another set of starting points x_{0b}. The computational results for B_k = ∇²f(x_k), the real Hessian, are presented in the following table, where ALG denotes the variation proposed in this paper with nonmonotonic decreasing and back-tracking techniques. NF and NG stand for the numbers of function evaluations and gradient evaluations, respectively.
NO stands for the number of iterations in which the nonmonotonic decreasing situation occurs, that is, the number of times f_k − f_{k+1} < 0. The number of iterations is not presented in the table because it always equals NG. The results under ALG (M = 0) represent the mixture of trust region and line-search techniques via interior points of the feasible set considered in this paper. Our type of approximate trust region method makes it very easy to re-solve the subproblem (S_k) with a reduced radius via interior points, and the back-tracking steps can outperform the traditional method in which the trust region subproblem is solved accurately over the whole hyperball. The last three parts of the table, under the headings M = 0, 4 and 8, respectively, show that for most test problems the nonmonotonic technique does bring in some noticeable improvement (Table 1).
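The two stopping criteria and the NO counter described above amount to simple driver-loop bookkeeping, sketched below; the function name and the toy histories are illustrative assumptions, not the paper's code:

```python
def run_bookkeeping(f_history, pg_history, tol_g=1e-6, tol_f=1e-8):
    """Count nonmonotonic iterations (f_k - f_{k+1} < 0) and report which
    stopping criterion fires first, mirroring the tests in the text."""
    NO = 0
    for k in range(len(f_history) - 1):
        fk, fk1 = f_history[k], f_history[k + 1]
        if fk - fk1 < 0:                    # objective increased: nonmonotonic step
            NO += 1
        if pg_history[k + 1] <= tol_g:      # ||P g_hat|| <= 1e-6
            return "gradient", k + 1, NO
        if abs(fk - fk1) <= tol_f * max(1.0, abs(fk)):   # |f_k - f_{k+1}| small
            return "stagnation", k + 1, NO
    return "max_iter", len(f_history) - 1, NO

# toy run: the objective rises once (one nonmonotonic step) before converging
fs = [10.0, 4.0, 4.5, 1.0, 0.5]
pgs = [1.0, 0.5, 0.4, 0.1, 1e-7]
reason, iters, NO = run_bookkeeping(fs, pgs)
```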

Table 1
Experimental results

                                          ALG
                                M = 0      M = 4           M = 8
Problem name       Initial pt.  NF   NG    NF   NG   NO    NF   NG   NO
Rosenbrock         x_0a         25   21    16   14    5    13   12    4
 (C = 100)         x_0b         14   12     7    7    1     7    7    1
Rosenbrock         x_0a         92   60    16   16    5    16   14    5
 (C = 10 000)      x_0b         30   25     8    8    1     8    8    1
Rosenbrock         x_0a        249  214    26   24    5    16   14    5
 (C = 1 000 000)   x_0b         45   35    15   15    4    15   15    4
Freudenstein       x_0a          6    6     6    6    0     6    6    0
                   x_0b         12   10    11   11    1    11   11    1
Huang (HS048)      x_0a         14   14    12   12    1    12   12    1
                   x_0b         18   18    16   16    1    16   16    1
Huang (HS049)      x_0a          9    9     9    9    0     9    9    0
                   x_0b         15   15    15   15    0    15   15    0
Huang (HS050)      x_0a         23   17    20   14    1    20   14    1
                   x_0b         38   27    31   19    1    31   19    1
Hsia (HS055)       x_0a          6    6     6    6    0     6    6    0
                   x_0b         10   10    10   10    0    10   10    0
Betts (HS062)      x_0          24   16    20   14    1    17   12    2
Sheela (HS063)     x_0          35   35    30   30    1    35   30    1
While              x_0         231  201   209  183    9   171  156   10
SC265              x_0           6    4     6    4    0     6    4    0
SC347              x_0           4    4     4    4    0     4    4    0
Banana (n = 6)     x_0a         27   20    19   18    1    16   16    2
                   x_0b         32   26    24   23    3    23   23    4
Banana (n = 16)    x_0a         45   35    45   35    0    45   35    0
                   x_0b         33   30    32   30    0    32   30    0
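As a quick sanity check on the table, the size of the improvement can be quantified directly; for instance, on the hardest Rosenbrock instance (C = 1 000 000, starting point x_0a) the nonmonotonic variants cut the function-evaluation count NF by roughly 90%:

```python
def nf_saving(nf_monotone, nf_nonmonotone):
    """Relative NF reduction (percent) of a nonmonotonic run vs. M = 0."""
    return 100.0 * (nf_monotone - nf_nonmonotone) / nf_monotone

# Rosenbrock (C = 1 000 000), x_0a: NF = 249 (M = 0), 26 (M = 4), 16 (M = 8)
s4 = nf_saving(249, 26)
s8 = nf_saving(249, 16)
```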

Acknowledgements

The author would like to thank an anonymous referee for his helpful comments and suggestions. The author gratefully acknowledges the partial support of the National Science Foundation Grant (10071050) of China, the Science Foundation Grant (02ZA14070) of the Shanghai Technical Sciences Committee and the Science Foundation Grant (02DK06) of the Shanghai Education Committee.

References

[1] J.F. Bonnans, T. Coleman, Combining trust region and affine scaling for linearly constrained nonconvex minimization, in: Y. Yuan (Ed.), Advances in Nonlinear Programming, Kluwer Academic Publishers, Dordrecht, 1998, pp. 219–250.
[2] J.F. Bonnans, C. Pola, A trust region interior point algorithm for linearly constrained optimization, SIAM J. Optim. 7 (3) (1997) 717–731.
[3] N.Y. Deng, Y. Xiao, F.J. Zhou, A nonmonotonic trust region algorithm, J. Optim. Theory Appl. 76 (1993) 259–285.
[4] J.E. Dennis Jr., R.B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, NJ, 1983.
[5] J.E. Dennis, L.N. Vicente, On the convergence theory of trust-region-based algorithms for equality-constrained optimization, SIAM J. Optim. 7 (1997) 527–550.
[6] L. Grippo, F. Lampariello, S. Lucidi, A nonmonotone line search technique for Newton's method, SIAM J. Numer. Anal. 23 (1986) 707–716.
[7] W. Hock, K. Schittkowski, Test Examples for Nonlinear Programming Codes, Lecture Notes in Economics and Mathematical Systems, Vol. 187, Springer, Berlin, 1981.
[8] J.J. Moré, D.C. Sorensen, Computing a trust region step, SIAM J. Sci. Statist. Comput. 4 (1983) 553–572.
[9] J. Nocedal, Y. Yuan, Combining trust region and line-search techniques, in: Y. Yuan (Ed.), Advances in Nonlinear Programming, Kluwer, Dordrecht, 1998, pp. 153–175.
[10] K. Schittkowski, More Test Examples for Nonlinear Programming Codes, Lecture Notes in Economics and Mathematical Systems, Vol. 282, Springer, Berlin, 1987.
[11] D.C. Sorensen, Newton's method with a model trust region modification, SIAM J. Numer. Anal. 19 (1982) 409–426.
[12] Y. Ye, E. Tse, An extension of Karmarkar's projective algorithm for convex quadratic programming, Math. Programming 44 (1989) 157–179.
[13] D. Zhu, Curvilinear paths and trust region methods with nonmonotonic back-tracking technique for unconstrained optimization, J. Comput. Math. 19 (2001) 241–258.