Global Convergence of Trust-Region Interior-Point Algorithms for Infinite-Dimensional Nonconvex Minimization Subject to Pointwise Bounds


Global Convergence of Trust-Region Interior-Point Algorithms for Infinite-Dimensional Nonconvex Minimization Subject to Pointwise Bounds

Michael Ulbrich, Stefan Ulbrich, Matthias Heinkenschloss

CRPC-TR97692, March 1997

Center for Research on Parallel Computation, Rice University, 6100 South Main Street, CRPC - MS 41, Houston, TX

Submitted April 1997

GLOBAL CONVERGENCE OF TRUST-REGION INTERIOR-POINT ALGORITHMS FOR INFINITE-DIMENSIONAL NONCONVEX MINIMIZATION SUBJECT TO POINTWISE BOUNDS

MICHAEL ULBRICH, STEFAN ULBRICH, AND MATTHIAS HEINKENSCHLOSS

Abstract. A class of interior-point trust-region algorithms for infinite-dimensional nonlinear optimization subject to pointwise bounds in $L^p$-Banach spaces, $2 \le p \le \infty$, is formulated and analyzed. The problem formulation is motivated by optimal control problems with $L^p$-controls and pointwise control constraints. The interior-point trust-region algorithms are generalizations of those recently introduced by Coleman and Li (SIAM J. Optim., 6 (1996), pp. 418-445) for finite-dimensional problems. Many of the generalizations derived in this paper are also important in the finite-dimensional context. They lead to a better understanding of the method and to considerable improvements in their performance. All first- and second-order global convergence results known for trust-region methods in the finite-dimensional setting are extended to the infinite-dimensional framework of this paper.

Key words. Infinite-dimensional optimization, bound constraints, affine scaling, interior-point algorithms, trust-region methods, global convergence, optimal control, nonlinear programming.

AMS subject classifications. 49M37, 65K05, 90C30, 90C48

1. Introduction. This paper is concerned with the development and analysis of a class of interior-point trust-region algorithms for the solution of the following infinite-dimensional nonlinear programming problem:

(P)   minimize $f(u)$ subject to $u \in B := \{\, u \in U : a(x) \le u(x) \le b(x),\ x \in \Omega \,\}$.

Here $\Omega \subset \mathbb{R}^n$ is a domain with positive and finite Lebesgue measure, $0 < \mu(\Omega) < \infty$. Moreover, $U := L^p(\Omega)$, $2 \le p \le \infty$, denotes the usual Banach space of real-valued measurable functions, and the objective function $f : D \to \mathbb{R}$ is continuous on an open neighborhood $D \subset U$ of $B$. All pointwise statements on measurable functions are meant to hold $\mu$-almost everywhere. The lower and upper bound functions $a, b \in V$, $V := L^\infty(\Omega)$, are assumed to have a distance of at least $\nu > 0$ from each other. More precisely, $b(x) - a(x) \ge \nu$ on $\Omega$.

This version was generated June 2.

Michael Ulbrich: Institut für Angewandte Mathematik und Statistik, Technische Universität München, München, Germany, mulbrich@statistik.tu-muenchen.de. This author was supported by the DFG under Grant Ul157/1-1 and by NATO under Grant CRG.
Stefan Ulbrich: Institut für Angewandte Mathematik und Statistik, Technische Universität München, München, Germany, sulbrich@statistik.tu-muenchen.de. This author was supported by the DFG under Grant Ul158/1-1 and by NATO under Grant CRG.
Matthias Heinkenschloss: Department of Computational and Applied Mathematics, Rice University, Houston, Texas 77005-1892, USA, heinken@rice.edu. This author was supported by the NSF under Grant DMS, by the DoE under Grant DE-FG03-95ER25257, the AFOSR under Grant F49620-96-1-0329, and by NATO under Grant CRG.

Problems of type (P) arise for instance when the black-box approach is applied to optimal control problems with bound-constrained $L^p$-control. See, e.g., the problems studied by Burger, Pogu [3], Kelley, Sachs [14], and Tian, Dunn [20].

The algorithms in this paper are extensions of the interior-point trust-region algorithms for bound constrained problems in $\mathbb{R}^N$ introduced by Coleman and Li [6]. Algorithmic enhancements of these methods have been proposed and analyzed in the finite-dimensional context in Branch, Coleman, Li [2], Coleman, Li [5], and Dennis, Vicente [11]. Dennis, Heinkenschloss, Vicente [10], and Heinkenschloss, Vicente [13] extend these methods to solve a class of finite-dimensional constrained optimization problems with bound constraints on parts of the variables. See also Vicente [23]. The interior-point trust-region methods in [6] are based on the reformulation of the Karush-Kuhn-Tucker (KKT) necessary optimality conditions as a system of nonlinear equations using a diagonal matrix $D$. This affine scaling matrix is computed using the sign of the gradient components and the distance of the variables to the bounds. See §2. The nonlinear system is then solved by an affine-scaling interior-point method in which the trust region is scaled by $D^{-1/2}$. These methods enjoy strong theoretical convergence properties as well as good numerical behavior. The latter is documented in [2], [6], [10], [11], where these algorithms have been applied to various standard finite-dimensional test problems and to some discretized optimal control problems.

The present work is motivated by the application of interior-point trust-region algorithms to optimal control problems with bounds on the controls. Even though the numerical solution of these problems requires a discretization and allows the application of the previously mentioned algorithms to the resulting finite-dimensional problems, it is known that the infinite-dimensional setting dominates the convergence behavior if the discretization becomes sufficiently fine. If the algorithm can be applied to the infinite-dimensional problem and convergence can be proven in the infinite-dimensional setting, asymptotically the same convergence behavior can be expected if the algorithm is applied to the finite-dimensional discretized problems. Otherwise, the convergence behavior might, and usually does, deteriorate rapidly as the discretization is refined.

In the present context, the formulation of the interior-point trust-region algorithms for the solution of the infinite-dimensional problem (P) requires a careful statement of the problem and of the requirements on the function $f$. This will be done in §3. The infinite-dimensional problem setting in this paper is similar to the ones in [12], [14], [15], [20]. The general structure of the interior-point trust-region algorithms presented here is closely related to the finite-dimensional algorithms in [6]. However, the statement and analysis of the algorithm in the infinite-dimensional context is more delicate and has motivated generalizations and extensions which are also relevant in the finite-dimensional context. The analysis performed in this paper allows for a greater variety of choices for the affine scaling matrix and the scaling of the trust region than those presented previously in [6], [11]. Our convergence analysis is more comprehensive than the ones in [5], [6], [11], [23].
In particular, we adapt techniques proposed in Shultz, Schnabel, and Byrd [18] to prove that under mild assumptions every accumulation point satisfies the second-order necessary optimality conditions. Moreover, the convergence results proven in this paper extend all the finite-dimensional ones stated in [17], [18], [19] to our infinite-dimensional context with bound constraints. In the follow-up paper [22] we present a local convergence analysis of a superlinearly convergent affine-scaling interior-point Newton method which is based on equation (13), and we prove under appropriate assumptions that in a neighborhood of the solution the generated trial steps are accepted by our trust-region algorithms.

There a projection onto the set $B$ will be used in the computation of trial steps. This extension to the finite-dimensional method, which was originally motivated by the function space framework, has also led to significant improvements of the finite-dimensional algorithm applied to some standard test problems not obtained from the discretization of optimal control problems. See [22].

Trust-region methods for infinite-dimensional problems like (P) have also been investigated by Kelley, Sachs [15] and Toint [21]. In both papers the constraints are handled by projections. The paper [21] considers trust-region algorithms for minimization on closed convex bounded sets in Hilbert space. They are extensions of the finite-dimensional algorithms by Conn, Gould, Toint [7]. It is proven that the projected gradient converges to zero. A comprehensive finite-dimensional analysis of trust-region methods closely related to those introduced by Toint can be found in Burke, Moré, Toraldo [4]. In contrast to the results in [21], our convergence analysis is also applicable to objective functions that are merely differentiable on a Banach space $L^p(\Omega)$, $p \in (2, \infty]$, which reduces the differentiability requirements substantially compared to the $L^2$-Hilbert space framework. Furthermore, for the problem class under consideration our convergence results are more comprehensive than the ones in [21]. The infinite-dimensional setting used in [15] fits into the framework of this paper, but is more restrictive. The formulation of their algorithm depends on the presence of a penalty term $\int_\Omega u^2(x)\,dx$ in the objective function $f$, and they assume that $\Omega \subset \mathbb{R}$ is an interval. Their algorithm also includes a 'post smoothing' step, which is performed after the trust-region step is computed. The presence of the post smoothing step ensures that existing local convergence results can be applied. Such a 'post smoothing' is not needed in the global analysis of this paper.

We introduce the following notations. $\mathcal{L}(X, Y)$ is the space of linear bounded operators from a Banach space $X$ into a Banach space $Y$. By $\|\cdot\|_q$ we denote the norm of the Lebesgue space $L^q(\Omega)$, $1 \le q \le \infty$, and we write $(\cdot,\cdot)_2$ for the inner product of the Hilbert space $H := L^2(\Omega)$. For $(v, w) \in (L^q(\Omega), L^q(\Omega)^*)$, with $L^q(\Omega)^*$ denoting the dual space of $L^q(\Omega)$, we use the canonical dual pairing $\langle v, w \rangle := \int_\Omega v(x) w(x)\,dx$, for which, if $q < \infty$, the dual space $L^q(\Omega)^*$ is given by $L^{q'}(\Omega)$, $1/q + 1/q' = 1$ (in the case $q = 1$ this means $q' = \infty$). Especially, if $q = 2$, we have $L^2(\Omega)^* = L^2(\Omega)$ and $\langle \cdot, \cdot \rangle$ coincides with $(\cdot, \cdot)_2$. Finally, we set $U' := L^{p'}(\Omega)$, $1/p + 1/p' = 1$, which is the same as $U^*$ if $p < \infty$. Moreover, it is easily seen that $w \mapsto \langle \cdot, w \rangle$ defines a linear norm-preserving injection from $L^1(\Omega)$ into $L^\infty(\Omega)^*$. Therefore, we may always interpret $U'$ as a subspace of $U^*$. As a consequence of Lemma 5.1 we get the following chain of continuous imbeddings:

$V \hookrightarrow U \hookrightarrow H = H^* \hookrightarrow U' \hookrightarrow U^* \hookrightarrow V^*$.

Throughout we will work with differentiability in the Fréchet sense. We write $g(u) := \nabla f(u) \in U^*$ for the gradient and $\nabla^2 f(u) \in \mathcal{L}(U, U^*)$ for the second derivative of $f$ at $u \in B$ if they exist. The $L^\infty$-interior of $B$ is denoted by $B^0$:

$B^0 := \bigcup_{\epsilon > 0} B_\epsilon$,   $B_\epsilon := \{\, u \in U : a(x) + \epsilon \le u(x) \le b(x) - \epsilon,\ x \in \Omega \,\}$.

We often write $f_k, g_k, \ldots$ for $f(u_k), g(u_k), \ldots$

This paper is organized as follows. In the next section we review the basics of the finite-dimensional interior-point trust-region algorithms in [6] and use this to motivate the infinite-dimensional setting applied in this paper. In §3 we formulate the necessary optimality conditions in the framework needed for the interior-point trust-region algorithms. The interior-point trust-region algorithms are introduced in §4. Some basic technical results are collected in §5. The main convergence results are given in §6, which concerns the global convergence to points satisfying the first-order necessary optimality conditions, and in §7, which concerns the global convergence to points satisfying the second-order necessary optimality conditions. These convergence results extend all the known convergence results for trust-region methods in finite dimensions to the infinite-dimensional setting of this paper. The local convergence analysis of these algorithms is given in the follow-up paper [22], which also contains numerical examples illustrating the theoretical findings of this paper.

2. Review of the finite-dimensional algorithm and infinite-dimensional problem setting. We briefly review the main ingredients of the affine-scaling interior-point trust-region method introduced in [6]. We refer to that paper for more details. The algorithm solves finite-dimensional problems of the form

(P_N)   minimize $f(u)$ subject to $u \in B_N := \{\, u \in \mathbb{R}^N : a \le u \le b \,\}$,

where $f : \mathbb{R}^N \to \mathbb{R}$ is a twice continuously differentiable function and $a < b$ are given vectors in $\mathbb{R}^N$. (One can allow components of $a$ and $b$ to be $-\infty$ or $\infty$, respectively. This is excluded here to simplify the presentation.) Inequalities are understood componentwise. The necessary optimality conditions for (P_N) are given by

$\nabla f(u) - \lambda_a + \lambda_b = 0, \quad a \le u \le b, \quad (u - a)^T \lambda_a + (b - u)^T \lambda_b = 0, \quad \lambda_a \ge 0,\ \lambda_b \ge 0.$

With the diagonal matrix defined by

(1)   $D(u)_{ii} = \begin{cases} (b - u)_i & \text{if } (\nabla f(u))_i < 0, \\ (u - a)_i & \text{if } (\nabla f(u))_i \ge 0, \end{cases} \qquad i = 1, \ldots, N,$

the necessary optimality conditions can be rewritten as

(2)   $D(u)^r\, \nabla f(u) = 0, \quad a \le u \le b,$

where the power $r > 0$ is applied to the diagonal elements. This form of the necessary optimality conditions (we choose $r = 1$) can now be solved using Newton's method. The $i$-th component of the function $D(u)$ is differentiable except at points where $(\nabla f(u))_i = 0$. However, this lack of smoothness is benign since $D(u)$ is multiplied by $\nabla f(u)$. One can use

(3)   $D(u)\,\nabla^2 f(u) + \mathrm{diag}(\nabla f(u))\, J(u),$

where $J(u)$ is the diagonal matrix

$J(u)_{ii} = \begin{cases} -1 & \text{if } (\nabla f(u))_i < 0, \\ 1 & \text{if } (\nabla f(u))_i > 0, \\ 0 & \text{else}, \end{cases}$

as the approximate derivative of $D(u)\nabla f(u)$. After symmetrization, one obtains

(4)   $\hat M(u) = D(u)^{1/2}\, \nabla^2 f(u)\, D(u)^{1/2} + \mathrm{diag}(\nabla f(u))\, J(u).$

One can show that the standard second-order necessary optimality conditions are equivalent to (2) and the positive semi-definiteness of $\hat M(u)$. The standard second-order sufficient optimality conditions are equivalent to (2) and the positive definiteness of $\hat M(u)$.

A point satisfying the necessary optimality conditions (2) is now computed using the iteration $u_{k+1} = u_k + s_k$, where for a given $u_k$ with $a < u_k < b$, the trial step $s_k = D_k^{1/2}\hat s_k$ satisfies $a < u_k + s_k < b$, and $\hat s_k$ is an approximate solution of

(5)   $\min \hat\psi_k(\hat s)$ subject to $\|\hat s\|_2 \le \Delta_k$, $u_k + D_k^{1/2}\hat s \in B_N$,

with $\hat\psi_k(\hat s) := \hat g_k^T \hat s + \frac12 \hat s^T \hat M_k \hat s$, $\hat g_k := D_k^{1/2}\nabla f(u_k)$. The trust-region radius $\Delta_k$ is updated from iteration to iteration in the usual fashion. In (5) the Hessian $\nabla^2 f(u_k)$ might be replaced by a symmetric approximation $B_k$. If the approximate solution $\hat s_k$ of (5) satisfies a fraction of Cauchy decrease condition

(6)   $\hat\psi_k(\hat s_k) \le \beta \min\{\, \hat\psi_k(\hat s) : \hat s = -t\hat g_k,\ t \ge 0,\ \|\hat s\|_2 \le \Delta_k,\ u_k + D_k^{1/2}\hat s \in B_N \,\}, \qquad \|\hat s_k\|_2 \le \beta_0\Delta_k,$

then under appropriate, standard conditions one can show the basic trust-region convergence result

$\liminf_{k\to\infty} \|D(u_k)^{1/2}\nabla f(u_k)\| = 0.$

Stronger convergence results can be proven if the assumptions on the function $f$ and on the step computation $\hat s_k$ are strengthened appropriately. See [6] and [5], [11].

Coleman and Li [6] show that close to nondegenerate KKT points one obtains trial steps $\hat s_k$ which meet these requirements if one first computes an approximate solution of (5) ignoring the bound constraints and then satisfies the interior-point condition $a < u_k + s_k < b$ by a step-size rule. A careful analysis of the proofs in [6] reveals that the same holds true for nearly arbitrary trust-region scalings. It becomes apparent that the crucial role of the affine scaling does not consist in the scaling of the trust region, but rather in leading to the additional term $\mathrm{diag}(\nabla f_k)J_k$ in the Hessian $\hat M_k$ of $\hat\psi_k$. Near nondegenerate KKT points this positive semi-definite diagonal matrix shapes the level sets of $\hat\psi_k$ in such a way that all 'bad' directions $\hat s$ which allow only for small step-sizes to the boundary of the box cannot minimize $\hat\psi_k$ on any reasonable trust region. The trust-region scaling in (5), (6) tends to equilibrate the distance of the origin to the bounding box constraints $\{\hat s : u_k + D_k^{1/2}\hat s \in B_N\}$. However, for this feature the equivalence of the 2- and $\infty$-norm is indispensable, and thus it does not carry over to our infinite-dimensional framework.

In fact, in the infinite-dimensional setting the affine-scaled trust region $\{\|\hat s\|_p \le \Delta\}$ no longer enjoys the property of reflecting the distance to the bounding box constraints. Therefore we will allow for a very general class of trust-region scalings in our analysis. See also [11]. Since, as mentioned above, the term $\mathrm{diag}(\nabla f_k)J_k$ in the Hessian $\hat M_k$ plays the crucial role in this affine-scaling interior-point method, all convergence results in [6] remain valid. It is also worth mentioning that in our context an approximate solution $\hat s_k$ of (5) satisfying (6) can be easily obtained by applying any descent method which starts minimization at $\hat s = 0$ along the steepest descent direction $-\hat g_k$. Moreover, we show in [22] that near an optimizer satisfying suitable sufficiency conditions admissible trial steps can be obtained from unconstrained minimizers of $\hat\psi_k$ by pointwise projection onto $B$. Here our flexibility in the choice of the trust-region scaling will prove to be valuable.

The finite-dimensional convergence analysis heavily relies on the equivalence of norms in $\mathbb{R}^N$. This is for example used to obtain pointwise ($\ell^\infty$) estimates from $\ell^2$ estimates. In the infinite-dimensional context the formulation of the algorithm and the proof of its convergence is more delicate. We will make use of the following assumptions:

(A1) $f : D \to \mathbb{R}$ is differentiable on $D$ with $g$ mapping $B \subset U$ continuously into $U'$.
(A2) The gradient $g$ satisfies $g(B) \subset V$.
(A3) There exists $c_1 > 0$ such that $\|g(u)\|_\infty \le c_1$ for all $u \in B$.
(A4) $f$ is twice continuously differentiable on $D$. If $p = \infty$ then $\nabla^2 f(u) \in \mathcal{L}(U, U')$ for all $u \in B$, and if $(h_k) \subset V$ converges to zero in all spaces $L^q(\Omega)$, $1 \le q < \infty$, then $\nabla^2 f(u)h_k$ tends to zero in $U'$.

For $p \in [2, \infty)$ the assumptions (A1) and (A4) simply say that $f$ is continuously Fréchet-differentiable or that $f$ is twice continuously Fréchet-differentiable, respectively. If $p = \infty$, then the requirement that $g(u), \nabla^2 f(u)h \in U' = L^1(\Omega) \ne U^*$ for $u \in B$, $h \in V$, is a further condition. It allows us to use estimates like $\langle v, g(u)\rangle \le \|g(u)\|_{p'}\|v\|_p$ for $p \in [2, \infty)$ and $p = \infty$. Moreover, since on $L^1(\Omega)$ the $L^1$- and $(L^\infty)^*$-norms coincide, assumption (A4) implies that $\nabla^2 f : B \subset U \to \mathcal{L}(U, U')$ is continuous also for $p = \infty$. Finally, (A1) ensures that the gradient $g(u)$ is always at least an $L^1$-function, which will be essential for many reasons, e.g. to allow the definition of a function space analogue of the scaling matrix $D$.

The assumption (A2) is motivated by the choice of the scaling matrix $D$ and the fraction of Cauchy decrease condition (6) in the finite-dimensional case. The infinite-dimensional analogue $d(u)$ of the diagonal scaling matrix $D(u)$ will be a function. Given the definition (1) of $D(u)$ it is to be expected that $d(u) \in L^\infty(\Omega)$ if $a < u < b$. If $g(u) \in L^{p'}(\Omega)$, then $d(u)^{1/2}g(u) \in L^{p'}(\Omega)$. Hence, the candidate for the Cauchy step satisfies $\hat s^c = -t\,d(u)^{1/2}g(u) \in L^{p'}(\Omega)$. Since $p' \ne \infty$, one will in general not be able to find a scaling $\sigma > 0$ so that $a < u + \sigma d(u)^{1/2}\hat s^c < b$. The assumption (A2) assures that $s^c = -t\,d(u)g(u) \in V$. The uniform boundedness assumption (A3) is, e.g., used to derive the important estimate (26). We point out that in (A3) the uniform bound on $\|g(u)\|_\infty$ has to hold only for $u \in B$, which is a bounded set in $L^\infty(\Omega)$.

The conditions (A1)-(A4) limit the optimal control problems that fit into this framework. However, a large and important class of optimal control problems with $L^p$-controls satisfy these conditions. For example, the conditions imposed in [12, p. 1270], [20, p. 517] to study the convergence of the gradient projection method imply our assumptions (A1), (A2), and (A4).

The assumption (A3) can be enforced by additional requirements on the functions and $S$ used in [12], [20]. The boundary control problems for a nonlinear heat equation in [3] and in [14], [15] also satisfy the assumptions (A1)-(A4). See [22].

3. Necessary optimality conditions and affine scaling. The problem under consideration belongs to the class of cone constrained optimization problems in Banach space for which optimality conditions are available (cf. [16]). But we believe that for our particular problem an elementary derivation of the necessary optimality conditions for problem (P) not only is simpler but also more transparent than the application of the general theory. This derivation also helps us to motivate the choice of the affine scaling which is used to reformulate the optimality condition and which is the basis for the interior-point method.

3.1. First-order necessary conditions. The first-order necessary optimality conditions in Theorem 3.1 are completely analogous to those for finite-dimensional problems with simple bounds (cf. §2, [6]). We only have to replace coordinatewise by pointwise statements and to ensure that the gradient $g(u)$ is a measurable function.

Theorem 3.1 (First-order necessary optimality conditions). Let $u$ be a local minimizer of problem (P) and assume that $f$ is differentiable at $u$ with $g(u) \in U'$. Then

(O1)   $u \in B$,

(O2)   $g(u)(x) \begin{cases} = 0 & \text{for } x \in \Omega \text{ with } a(x) < u(x) < b(x), \\ \ge 0 & \text{for } x \in \Omega \text{ with } u(x) = a(x), \\ \le 0 & \text{for } x \in \Omega \text{ with } u(x) = b(x) \end{cases}$

are satisfied.

Proof. Condition (O1) is trivially satisfied. To verify (O2), define

$A_- = \{ x \in \Omega : u(x) = a(x),\ g(u)(x) < 0 \}, \qquad A_-^l = \{ x \in A_- : g(u)(x) \le -1/l \},$

and assume that $A_-$ has positive measure $\mu(A_-) > \varepsilon > 0$. Since $\mu$ is continuous from below and $A_-^l \uparrow A_-$, there exists $l > 0$ with $\mu(A_-^l) \ge \varepsilon$. This yields a contradiction, because $u + \tau s \in B$, $s = \chi_{A_-^l}(b - a)$, for $0 \le \tau \le 1$, and

$\frac{d}{d\tau} f(u + \tau s)\Big|_{\tau=0} = \langle s, g(u)\rangle \le -\frac{\varepsilon\nu}{l} < 0.$

Hence we must have $\mu(A_-) = 0$. In the same way we can show that $\mu(A_+) = 0$ for $A_+ = \{ x \in \Omega : u(x) = b(x),\ g(u)(x) > 0 \}$. Finally, we look at

$I = \{ x \in \Omega : a(x) < u(x) < b(x),\ g(u)(x) \ne 0 \}.$

Assume that $\mu(I) > \varepsilon > 0$. Since $I^l \uparrow I$ with

$I^l = \{ x \in \Omega : a(x) + 1/l \le u(x) \le b(x) - 1/l,\ |g(u)(x)| \ge 1/l \},$

we can find $l > 0$ with $\mu(I^l) \ge \varepsilon$ and obtain for

$s = -\chi_{I^l}\,\frac{g(u)}{|g(u)|}$

that $u + \tau s \in B$, $0 \le \tau \le 1/l$, and

$\frac{d}{d\tau} f(u + \tau s)\Big|_{\tau=0} = \langle s, g(u)\rangle \le -\frac{\varepsilon}{l} < 0,$

a contradiction to the local optimality of $u$. Hence $\mu(I) = \mu(A_-) = \mu(A_+) = 0$, which means that (O2) holds.

3.2. Affine scaling. Let assumption (A1) hold. Our algorithm will be based on the following equivalent affine-scaling formulation of (O2):

(7)   $d^r(u)\, g(u) = 0,$

where $r > 0$ is arbitrary and $d(u) \in V$, $u \in B$, is a scaling function which is assumed to satisfy

(8)   $d(u)(x) \begin{cases} = 0 & \text{if } u(x) = a(x) \text{ and } g(u)(x) \ge 0, \\ = 0 & \text{if } u(x) = b(x) \text{ and } g(u)(x) \le 0, \\ > 0 & \text{else}, \end{cases}$

for all $x \in \Omega$. The equivalence of (O2) and (7) will be stated and proved in Lemma 3.2. Before we do this, we give two examples of proper choices for $d$. The first choice $d = d_I$ is motivated by the scaling matrices used in [6] (see (1)). Except for points $x$ with $g(u)(x) = 0$ it equals those used in [6] and [11]:

(9)   $d_I(u)(x) := \begin{cases} u(x) - a(x) & \text{if } g(u)(x) > 0, \text{ or } g(u)(x) = 0 \text{ and } u(x) - a(x) \le b(x) - u(x), \\ b(x) - u(x) & \text{if } g(u)(x) < 0, \text{ or } g(u)(x) = 0 \text{ and } b(x) - u(x) < u(x) - a(x). \end{cases}$

The slight modification in comparison to (1) will enable us to establish the valuable relation (16) without a nondegeneracy assumption. While the global analysis could be carried out entirely with this choice, the discontinuous response of $d(u)(x)$ to sign changes of $g(u)(x)$ raises difficulties for the design of superlinearly convergent algorithms in infinite dimensions. These can be circumvented by the choice $d = d_{II}$, where

(10)   $d_{II}(u)(x) := \begin{cases} \min\{|g(u)(x)|,\, c(x)\} & \text{if } -g(u)(x) > u(x) - a(x) \text{ and } u(x) - a(x) \le b(x) - u(x), \\ \min\{|g(u)(x)|,\, c(x)\} & \text{if } g(u)(x) > b(x) - u(x) \text{ and } b(x) - u(x) \le u(x) - a(x), \\ \min\{u(x) - a(x),\, b(x) - u(x),\, c(x)\} & \text{else}. \end{cases}$

Here $c : x \in \Omega \mapsto \min\{\gamma(b(x) - a(x)),\, \kappa\}$ with $\gamma \in (0, 1/2]$ and $\kappa \ge 1$. It is easily seen that $d = d_I$ and $d = d_{II}$ both satisfy (8). An illustrative example for the improved smoothness of the scaling function $d_{II}(u)$ will be given in §4.1.

Lemma 3.2. Let (A1) hold and $u \in B$. Then (O2) is equivalent to (7) for all $r > 0$ and all $d$ satisfying (8).

Proof. Since $d^r$, $r > 0$, also satisfies (8), we may restrict ourselves to the case $r = 1$. First assume that (O2) holds. For all $x \in \Omega$ with $g(u)(x) = 0$ we also have $d(u)(x)g(u)(x) = 0$.

If $g(u)(x) > 0$, then by (O2) $u(x) = a(x)$, and if $g(u)(x) < 0$ then $u(x) = b(x)$. In both cases $d(u)(x) = 0$ and hence $d(u)(x)g(u)(x) = 0$. On the other hand, let $d(u)g(u) = 0$ hold. For all $x \in \Omega$ with $a(x) < u(x) < b(x)$ we have $d(u)(x) > 0$, which implies $g(u)(x) = 0$. For all $x \in \Omega$ with $u(x) = a(x)$ we obtain $g(u)(x) \ge 0$, since $g(u)(x) < 0$ would yield the contradiction $d(u)(x) > 0$. Analogously, we see that $g(u)(x) \le 0$ for all $x \in \Omega$ with $u(x) = b(x)$. Therefore, (O2) holds.

3.3. Second-order conditions. If assumption (A4) holds, we can derive second-order conditions which are satisfied at all local solutions of (P). These are also analogous to the well known conditions for finite-dimensional problems.

Theorem 3.3 (Second-order necessary optimality conditions). Let (A4) be satisfied and $g(u) \in U'$ hold at the local minimizer $u$ of problem (P). Then (O1), (O2), and

(O3)   $\langle s, \nabla^2 f(u)s\rangle \ge 0$ for all $s \in T(B, u)$

are satisfied, where

$T(B, u) := \{ s \in V : s(x) = 0 \text{ for all } x \in \Omega \text{ with } u(x) \in \{a(x), b(x)\} \}$

denotes the tangent space of the active constraints.

Proof. Let the assumptions hold. As shown in Theorem 3.1, (O1) and (O2) are satisfied. In particular, we have that $s\,g(u) = 0$ for all $s \in T(B, u)$. Now assume the existence of $s \in T(B, u)$ and $\varepsilon > 0$ with $\langle s, \nabla^2 f(u)s\rangle < -\varepsilon$. Let $I = \{x \in \Omega : a(x) < u(x) < b(x)\}$,

(11)   $I^l = \{ x \in \Omega : a(x) + 1/l \le u(x) \le b(x) - 1/l \},$

and define restrictions $s^l = \chi_{I^l} s \in V$. Since $I^l \uparrow I$ and $s = \chi_I s$, we get $\|s - s^l\|_q^q \le \mu(I \setminus I^l)\,\|s\|_\infty^q$. Hence, the restrictions $s^l$ converge to $s$ in all spaces $L^q(\Omega)$, $1 \le q < \infty$. Therefore, $\nabla^2 f(u)(s - s^l)$ tends to zero in $U'$ by (A4) and, using the symmetry of $\nabla^2 f(u)$,

$\langle s^l, \nabla^2 f(u)s^l\rangle = \langle s, \nabla^2 f(u)s\rangle - \langle s + s^l, \nabla^2 f(u)(s - s^l)\rangle < 2\,\|\nabla^2 f(u)(s - s^l)\|_{p'}\,\|s\|_p - \varepsilon \le -\varepsilon/2$

for all sufficiently large $l$. Let $l > 0$ be such that $\langle s^l, \nabla^2 f(u)s^l\rangle \le -\varepsilon/2$. The observations that $s^l \in T(B, u)$ and $u + \tau s^l \in B$ for $0 \le \tau \le 1/(l\|s\|_\infty)$ now yield the desired contradiction:

$\frac{d}{d\tau} f(u + \tau s^l)\Big|_{\tau=0} = \langle s^l, g(u)\rangle = 0, \qquad \frac{d^2}{d\tau^2} f(u + \tau s^l)\Big|_{\tau=0} = \langle s^l, \nabla^2 f(u)s^l\rangle \le -\varepsilon/2 < 0.$

This readily shows that (O3) holds.
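Once $u$, $a$, $b$, and $g(u)$ are discretized on a grid, the reformulated optimality condition (7) with the scaling (9) can be evaluated pointwise. The following Python sketch is only an illustration of this reformulation on a finite grid; the function names and the use of NumPy are choices made here, not part of the paper.

```python
import numpy as np

def d_I(u, a, b, g):
    """Affine scaling d_I from (9): distance to the bound the gradient points
    away from; ties at g = 0 are broken toward the nearer bound."""
    lower = (g > 0) | ((g == 0) & (u - a <= b - u))
    return np.where(lower, u - a, b - u)

def first_order_residual(u, a, b, g, r=1.0, p=np.inf):
    """Discrete analogue of ||d(u)^r g(u)||: zero exactly when (O2) holds pointwise."""
    return np.linalg.norm(d_I(u, a, b, g) ** r * g, ord=p)

# toy check: where u sits at the lower bound with g >= 0, the scaled
# gradient vanishes even though g itself does not
a, b = np.zeros(4), np.ones(4)
u = np.array([0.0, 0.0, 0.5, 1.0])
g = np.array([2.0, 0.0, 0.0, -1.0])       # satisfies (O2) pointwise
print(first_order_residual(u, a, b, g))    # 0.0
```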

4. The algorithm.

4.1. A Newton-like iteration. The key idea of the method to be developed consists in solving the equation $d(u)g(u) = 0$ by means of a Newton-like method augmented by a trust-region globalization. The bound constraints on $u$ are enforced by, e.g., a scaling of the Newton-like step. In particular, all iterates will be strictly feasible with respect to the bounds: $u_k \in B^0$.

In general it is not possible to find a function $d$ satisfying (8) that depends smoothly on $u$. For an efficient method, however, we need a suitable substitute for the derivative of $dg$. Formal application of the product rule suggests to choose an approximate derivative of the form

$D(u)\nabla^2 f(u) + d_u(u)g(u), \qquad u \in B^0,$

with $d_u(u)w \in \mathcal{L}(U, U')$, $w \in U'$, replacing the in general non-existing derivative of $u \in B^0 \mapsto d(u)w \in U'$ at $u$. Here and in the sequel the linear operator $D^r(u)$, $r \ge 0$, denotes the pointwise multiplication operator associated with $d^r(u)$, i.e.

$D^r(u) : v \mapsto d(u)^r v.$

Since $d^r(u) \in V$, $D^r(u)$ maps $L^q(\Omega)$, $1 \le q \le \infty$, continuously into itself. Moreover, if the assumption (D2) below is satisfied and $u \in B^0$, then $D^r(u)$ defines an automorphism of $L^q(\Omega)$, $1 \le q \le \infty$, with inverse $D^{-r}(u)$. In fact, for all $u \in B^0$ there exists $0 < \epsilon \le \epsilon_d$ such that $u \in B_\epsilon$, and thus $d(u)(x) \ge \varepsilon_d(\epsilon)$ on $\Omega$ by (D2).

If we look at the special case $d = d_I$, the choice $d_u(u)w = d'_I(u)w$ with

(12)   $d'_I(u)(x) := \begin{cases} 1 & \text{if } g(u)(x) > 0, \text{ or } g(u)(x) = 0 \text{ and } u(x) - a(x) \le b(x) - u(x), \\ -1 & \text{if } g(u)(x) < 0, \text{ or } g(u)(x) = 0 \text{ and } b(x) - u(x) < u(x) - a(x), \end{cases}$

for $u \in B^0$, $x \in \Omega$, seems to be the most natural. For the general case this suggests the choice

$D(u)\nabla^2 f(u) + E(u),$

where $E(u) : v \mapsto e(u)v$ is a multiplication operator, $e(u) \in V$, which approximates $d_u(u)g(u)$. Properties of $E$ will be specified below. We are now able to formulate the following Newton-like iteration for the solution of $d(u)g(u) = 0$:

Given $u_k \in B^0$, compute the new iterate $u_{k+1} := u_k + s_k \in B^0$, where $s_k \in U$ solves

(13)   $(D_k B_k + E_k)\,s_k = -d_k g_k,$

and $B_k$ denotes a symmetric approximation of (or replacement for) $\nabla^2 f(u_k)$, i.e. $\langle v, B_k w\rangle = \langle w, B_k v\rangle$ for all $v, w \in U$. We assume that $B_k$ satisfies the following condition:

(A5) The norms $\|B_k\|_{U,U'}$ are uniformly bounded by a constant $c_2 > 0$.

In the following, we will not restrict our investigations to special choices of $d$ and $e$. Rather, we will develop an algorithm that is globally convergent for all affine scalings $d$ and corresponding $e$ satisfying the assumptions (D1)-(D5):

(D1) The scaling $d$ satisfies (8) for all $u \in B$.
(D2) There exists $\epsilon_d > 0$ such that for all $\epsilon \in (0, \epsilon_d]$ there is $\varepsilon_d = \varepsilon_d(\epsilon) > 0$ such that $d(u)(x) \ge \varepsilon_d$ for all $u \in B$ and all $x \in \Omega$ with $a(x) + \epsilon \le u(x) \le b(x) - \epsilon$.
(D3) The scaling satisfies $d(u)(x) \le d_I(u)(x)$ for all $u \in B$, $x \in \Omega$, with $d_I$ given by (9). In particular, $d(u)(x) \le c_d$ for some $c_d > 0$.
(D4) For all $u \in B$ the function $e(u)$ satisfies $0 \le e(u)(x) \le c_e$ for all $x \in \Omega$, and $g(u)(x) = 0$ implies $e(u)(x) = 0$.
(D5) The function $e(u)$ is given by $e(u) = d'(u)g(u)$, where $d'(u)$ satisfies $|d'(u)(x)| \le c_{d'}$ for all $u \in B$ and $x \in \Omega$.

We have seen that assumption (D1) is essential for the reformulation of the first-order necessary optimality conditions and that (D2) ensures the continuous invertibility of the scaling operator $D(u)$ for $u \in B^0$. Furthermore, assumption (D2) will be used in the second-order convergence analysis. The assumption (D4) together with (A5) is needed to ensure uniform boundedness of the Hessian approximations $\hat M_k$ to be defined later (see Remark 4.2). The assumption (D5) is needed to prove second-order convergence results. Obviously, (D1)-(D3) hold for both $d = d_I$ and $d = d_{II}$. The assumption (D4) is satisfied for $e(u) = d'_I(u)g(u)$, where $d'_I(u)$ is given by (12), provided that $\|g(u)\|_\infty$ is uniformly bounded on $B$, i.e. provided that (A3) holds.

The following example illustrates that the relaxed requirements on the scaling function $d$ can be used to improve the smoothness of $d$ and the scaled gradient $dg$ substantially.¹

Example 4.1. The quadratic function

$f(u) = \frac12\|u\|_2^2 - \frac14\Big(\int_0^1 u(x)\,dx\Big)^2$

is smooth on $L^2([0,1])$. The gradient and the (strictly positive) Hessian are given by

$g(u) = u - \frac12\int_0^1 u(x)\,dx, \qquad \nabla^2 f(u) : v \mapsto v - \frac12\int_0^1 v(x)\,dx.$

$f$ assumes its strict global minimum on the box $B = \{x + \tfrac12 \le u(x) \le 2\}$ at the lower bound $\bar u$, $\bar u(x) = x + \tfrac12$. At $u_\varepsilon = \bar u + \varepsilon(x + \tfrac1{100})$, $\varepsilon > 0$, the gradient $g(u_\varepsilon) = x + \varepsilon(x - \tfrac{49}{200})$ becomes negative for small $x$. Plot (a) in Figure 1 shows $d_I(u_\varepsilon)$ (dashed), $d_{II}(u_\varepsilon)$ (solid) and $|g(u_\varepsilon)|$ (dotted) for $\varepsilon = 0.001$. Note that $d_{II}$ is continuous at the sign change of $g(u_\varepsilon)$ and retains its order of magnitude, in contrast to $d_I$. Plot (b) depicts the remainder terms

$\varepsilon^{-1}\big|\, d_i(u)g(u) - d_i(u_\varepsilon)g(u_\varepsilon) - \big(d'_i(u_\varepsilon)g(u_\varepsilon) + d_i(u_\varepsilon)\nabla^2 f(u_\varepsilon)\big)(u - u_\varepsilon) \,\big|$

for $i = I$ (dashed) and $i = II$ (solid), while $|g(u_\varepsilon)|$ is again dotted. Here $d'_i$ is as in (12). Note that the remainder term for $d_I g$ does not tend to zero near the sign change of $g(u_\varepsilon)$, in contrast to $d_{II} g$. In fact, it follows from our investigations in [22] that

$\big\| d_{II}(u)g(u) - d_{II}(\bar u)g(\bar u) - \big(d'_{II}(\bar u)g(\bar u) + d_{II}(\bar u)\nabla^2 f(\bar u)\big)(u - \bar u) \big\|_q = o(\|u - \bar u\|_\infty)$

for $u \in B$, $1 \le q \le \infty$, and $\bar u$ satisfying (O1), (O2), if $g : B \subset V \to V$ is locally Lipschitz at $\bar u$ and $g : B \subset V \to L^q(\Omega)$ is continuously differentiable in a neighborhood of $\bar u$. Our example admits the choice $q = \infty$.

¹ The advantages of the improved smoothness will be seen in the local convergence analysis [22].
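Example 4.1 can be reproduced numerically. The sketch below discretizes $u_\varepsilon$ and $g(u_\varepsilon)$ on $[0,1]$ and compares $d_I$ and $d_{II}$ across the sign change of the gradient; the case layout of $d_{II}$ follows the reconstruction of (10) given above, and the parameter values $\gamma = 1/2$, $\kappa = 1$ are assumptions made only for this illustration.

```python
import numpy as np

def d_II(u, a, b, g, gamma=0.5, kappa=1.0):
    """Scaling d_II as in (10) (reconstructed case layout): continuous across
    sign changes of g, in contrast to d_I."""
    c = np.minimum(gamma * (b - a), kappa)
    near_a = (-g > u - a) & (u - a <= b - u)
    near_b = (g > b - u) & (b - u <= u - a)
    val = np.minimum(np.minimum(u - a, b - u), c)       # 'else' branch
    return np.where(near_a | near_b, np.minimum(np.abs(g), c), val)

x = np.linspace(0.0, 1.0, 2001)
a, b = x + 0.5, 2.0 * np.ones_like(x)
eps = 1e-3
u_eps = (x + 0.5) + eps * (x + 0.01)
g_eps = x + eps * (x - 49.0 / 200.0)        # gradient from Example 4.1

dI = np.where(g_eps >= 0, u_eps - a, b - u_eps)   # d_I; ties fall toward u - a here
dII = d_II(u_eps, a, b, g_eps)
i = np.argmax(g_eps >= 0)                   # first grid point past the sign change
print(dI[i - 1], dI[i])                     # O(1) jump of d_I
print(dII[i - 1], dII[i])                   # d_II stays of order eps
```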

Fig. 1. Smoothness properties of the scaling functions $d_I(u)$ and $d_{II}(u)$ (panels (a) and (b)).

4.2. New coordinates and symmetrization. Since neither the well-definedness nor the global convergence of the Newton-like iteration (13) can be ensured, we intend to safeguard and globalize it by means of a closely related trust-region method. To this end we have to transform (13) into an equivalent quadratic programming problem. While the iterates are required to stay strictly feasible with respect to the bound constraints, we want to use an affine-scaling interior-point approach to reduce the effect of the interfering bound constraints in the quadratic subproblem as far as possible. The affine scaling can be expressed by a change of coordinates $s \leftrightarrow \hat s$ and has to be performed in such a way that we get enough distance from the boundary of the box $B$ to be able to impose a useful fraction of Cauchy decrease condition on the trial step. An appropriate change of coordinates is given by $\hat s := d_k^{-r}s$. Here $r \ge 1/2$ is arbitrary but fixed throughout the iteration. Performing this transformation and applying $D_k^{r-1}$, the multiplication operator associated with $d_k^{r-1}$, from the left to (13) leads to the equivalent equation

(14)   $\hat M_k \hat s = -\hat g_k$

with $\hat g(u) := d^r(u)g(u)$, $\hat M_k := \hat B_k + \hat C_k$, where $\hat B_k := D_k^r B_k D_k^r$ and $\hat C_k := E_k D_k^{2r-1}$.

Remark 4.2. Assumptions (D4) and (A5) imply that the norms $\|\hat M_k\|_{U,U'}$ are uniformly bounded by a constant $c_3 > 0$.

Since $\hat M_k$ is symmetric, $\hat s$ is a solution of (14) if and only if it is a stationary point of the quadratic function

$\hat\psi_k(\hat s) := \langle \hat s, \hat g_k\rangle + \frac12\langle \hat s, \hat M_k \hat s\rangle.$

We will return to this issue later.

4.3. Second-order necessary conditions revisited. If $B_k = \nabla^2 f(u_k)$, then the operator

(15)   $\hat M(u) := D(u)^r\,\nabla^2 f(u)\,D(u)^r + E(u)\,D(u)^{2r-1}$

also plays an important role in the second-order necessary optimality conditions. In fact, we will show that if conditions (O1), (O2) hold at $u$, then (O3) can be equivalently replaced by

(O3')   $\langle s, \hat M(u)s\rangle \ge 0$ for all $s \in T(B, u)$,

or even

(O3'')   $\langle s, \hat M(u)s\rangle \ge 0$ for all $s \in V$.

The proof requires the following two lemmas.

Lemma 4.3. Let (D1) be satisfied, let $g(u) \in U'$, and suppose that (O1), (O2) hold at $u$. Then

(16)   $I_d := \{ x \in \Omega : d(u)(x) > 0 \} = \{ x \in \Omega : a(x) < u(x) < b(x) \} =: I.$

Proof. The inclusion $I \subset I_d$ is obvious from (8). Now let $x \in I_d$ be given. Then $g(u)(x) = 0$ by (O2) and Lemma 3.2. From (8) we conclude $u(x) \notin \{a(x), b(x)\}$, i.e. $x \in I$.

Lemma 4.4. Let (D1) and (D4) be satisfied, let $g(u) \in U'$, and suppose that (O1), (O2) hold at $u$. Moreover, assume that $f$ is twice continuously differentiable at $u$ with $\nabla^2 f(u) \in \mathcal{L}(U, U')$. Then the statements (O3') and (O3'') are equivalent.

Proof. Obviously (O3'') implies (O3'). To show the opposite direction, assume that (O3') holds. Set $A = \Omega \setminus I$, where $I$ is the set defined in (16). For arbitrary $s \in V$ we perform the splitting $s = s_I + s_A$, $s_I = \chi_I s \in T(B, u)$, $s_A = \chi_A s$. Lemma 4.3 implies that $d^r(u)s_A = 0$ and we obtain

$\langle s, \hat M(u)s\rangle = \langle s_I, \hat M(u)s_I\rangle + 2\langle s_A, e(u)d^{2r-1}(u)s_I\rangle + 2\langle d^r(u)s_A, \nabla^2 f(u)d^r(u)s_I\rangle + \langle d^r(u)s_A, \nabla^2 f(u)d^r(u)s_A\rangle + \langle s_A, e(u)d^{2r-1}(u)s_A\rangle = \langle s_I, \hat M(u)s_I\rangle + \langle s_A, e(u)d^{2r-1}(u)s_A\rangle \ge \langle s_I, \hat M(u)s_I\rangle \ge 0.$

This completes the proof.

Theorem 4.5. Let (D1), (D2), and (D4) be satisfied. Then in Theorem 3.3 condition (O3) can be equivalently replaced by (O3') or (O3'').

Proof. Since the conditions of Theorem 3.3 and Lemma 4.4 guarantee that (O3') and (O3'') are equivalent, we only need to show that (O3) can be replaced by (O3'). Let (O1), (O2) be satisfied. Then for all $s \in T(B, u)$ we have $s\,g(u) = 0$. To show that (O3) implies (O3'), let $s \in T(B, u)$ be arbitrary. Then $h = d^r(u)s$ is also contained in $T(B, u)$. Therefore, $\langle s, \hat M(u)s\rangle \ge \langle h, \nabla^2 f(u)h\rangle \ge 0$. To prove the opposite direction, assume that there exist $s \in T(B, u)$ and $\varepsilon > 0$ with $\langle s, \nabla^2 f(u)s\rangle < -\varepsilon$. As carried out in the proof of Theorem 3.3, we can find $l > 0$ such that $s^l = \chi_{I^l}s \in T(B, u)$, with $I^l$ as defined in (11), satisfies $\langle s^l, \nabla^2 f(u)s^l\rangle \le -\varepsilon/2$. Since $d(u)$ is bounded away from zero on $I^l$ by assumption (D2), we obtain that $h = \chi_{I^l}d^{-r}(u)s$ is an element of $T(B, u)$ that satisfies $\langle h, \hat M(u)h\rangle = \langle s^l, \nabla^2 f(u)s^l\rangle \le -\varepsilon/2$ (note that $e(u)h = 0$ by (D4)). This contradicts (O3').

Define

$\hat\psi[u](\hat s) := \langle \hat s, \hat g(u)\rangle + \frac12\langle \hat s, \hat M(u)\hat s\rangle$

with $\hat M(u)$ given by (15) and $\hat g(u) := d^r(u)g(u)$. Note that $\hat\psi[u_k] = \hat\psi_k$ if $B_k = \nabla^2 f(u_k)$. The previous results show that $\hat\psi[u](\hat s)$ is convex and admits a global minimum at $\hat s = 0$ if $u$ is a local solution of (P).
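In a discretized setting the operator (15) and the model $\hat\psi[u]$ reduce to plain matrix-vector algebra, with the multiplication operators $D(u)^r$ and $E(u)$ becoming diagonal matrices. The following sketch is a finite-dimensional illustration under that assumption; the helper names and the sample data are hypothetical.

```python
import numpy as np

def assemble_M_hat(d, e, hess, r=1.0):
    """Discrete analogue of (15): M_hat = D^r Hess D^r + E D^(2r-1),
    with D, E the diagonal matrices built from d(u) and e(u)."""
    Dr = np.diag(d ** r)
    return Dr @ hess @ Dr + np.diag(e * d ** (2 * r - 1))

def psi_hat(s_hat, g, d, M_hat, r=1.0):
    """Quadratic model psi_hat[u](s_hat) = <s_hat, d^r g> + 0.5 <s_hat, M_hat s_hat>."""
    g_hat = d ** r * g
    return g_hat @ s_hat + 0.5 * s_hat @ (M_hat @ s_hat)

# tiny example: d vanishes where a bound is active; per (D5) with d' = +/-1
# (as for d_I) one has e = |g|, so e = 0 wherever g = 0, consistent with (D4)
d = np.array([0.2, 0.0, 1.0])
g = np.array([1.0, -2.0, 0.0])
e = np.abs(g)
H = np.eye(3)                                # stand-in for a Hessian approximation
M_hat = assemble_M_hat(d, e, H, r=0.5)
print(psi_hat(np.array([0.1, 0.0, -0.2]), g, d, M_hat, r=0.5))
```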

4.4. Trust-region globalization. The results on the second-order conditions in the previous section indicate that the Newton-like iteration (14) can be used locally under appropriate conditions on $B_k$. To globalize the iteration, we minimize $\hat\psi_k(\hat s)$ over the intersection of the ball $\|\hat w_k \hat s\|_p \le \Delta_k$ and the box $B$, which leads to the following trust-region subproblem:

Compute an approximate solution $\hat s_k$ with $u_k + d_k^r \hat s_k \in B$ of

(17)   $\min \hat\psi_k(\hat s)$ subject to $\|\hat w_k \hat s\|_p \le \Delta_k$, $u_k + d_k^r \hat s \in B$.

Here $\hat w_k \in V$ is a positive scaling function for the trust region, see assumption (W) below. As noted in §2, the crucial contributions of the affine scaling are the term $E(u)D(u)^{2r-1}$ in the Hessian $\hat M(u)$ and the scaling $\hat g$ of the gradient. The trust region serves as a tool for globalization. Therefore, more general trust-region scalings can be admitted, as long as they satisfy (W) below. This freedom in the scaling of the trust region will be important for the infinite-dimensional local convergence analysis of this method. See [22].

We will work with the original variables, in terms of which the above problem reads:

Compute $s_k$ with $u_k + s_k \in B$ as an approximate solution of

(18)   $\min \psi_k(s)$ subject to $\|w_k s\|_p \le \Delta_k$, $u_k + s \in B$,

with $\psi_k(s) = \langle s, g_k\rangle + \frac12\langle s, M_k s\rangle$, $M_k = B_k + C_k$, $C_k = E_k D_k^{-1}$, and $w_k = d_k^{-r}\hat w_k$. The only restriction on the trust-region scaling is that $w_k^{-1}$ as well as $\hat w_k = d_k^r w_k$ are pointwise bounded uniformly in $k$:

(W) There exist $c_w > 0$ and $c_{w'} > 0$ such that $\|d_k^r w_k\|_\infty \le c_w$ and $\|w_k^{-1}\|_\infty \le c_{w'}$ for all $k$.

Examples for $w_k$ are $w_k = d_k^{-r}$, which yields a ball in the $\hat s$-variables, and $w_k = 1$, which leads to a ball in the $s$-variables. Both choices satisfy (W) if (D3) holds. See also [11]. The functions $d_k^{-r}$ and $d_k^{-1}$ are only well defined if $u_k \in B^0$. Therefore, the condition $u_k + d_k^r \hat s \in B$ on the trial iterate is essential. However, it is important to remark that the bound constraints do not need to be strictly enforced when computing $\hat s_k$. For example, in the finite-dimensional algorithms in [6], [11], an approximate solution of $\min \hat\psi_k(\hat s)$ subject to $\|\hat w_k \hat s\|_p \le \Delta_k$ is computed and then scaled by $\sigma_k > 0$ so that $u_k + \sigma_k d_k^r \hat s \in B$. Similar techniques also apply in the infinite-dimensional framework. Practical choices for the infinite-dimensional algorithm will be discussed in [22].

4.5. Cauchy decrease for the trial steps. An algorithm which is based on the iterative approximate solution of subproblem (18) can be expected to converge to a local solution of (P) only if the trial steps $s_k$ produce a sufficiently large decrease of $\psi_k$. A well established way to impose such a condition is the requirement that the decrease provided by $s_k$ should be at least a fraction of the Cauchy decrease.

Here the Cauchy decrease denotes the maximum possible decrease along the steepest descent direction of $\hat\psi_k$ at $\hat s = 0$ with respect to an appropriate norm (or, equivalently, appropriate coordinates) inside the feasible region of the subproblem. We will see in Lemma 6.1 that the new coordinates $\hat s = d_k^{-r}s$ indeed provide enough distance to the boundary of $B$ to allow the implementation of a useful Cauchy decrease strategy. Unless we are in the Hilbert space case $p = 2$, the steepest descent direction of $\hat\psi_k$ at $\hat s = 0$ is not given by the negative gradient $-\hat g_k$ but rather by any $\hat s_d \ne 0$ satisfying $\langle \hat s_d, \hat g_k\rangle = -\|\hat s_d\|_p\|\hat g_k\|_{p'}$. On the other hand, if $\hat g_k \in H$ then $-\nabla\hat\psi_k(0) = -\hat g_k$ is the $\|\cdot\|_2$-steepest descent direction of $\hat\psi_k$ at $\hat s = 0$. This is a strong argument for choosing this direction as basis for the Cauchy decrease condition. Of course this approach is only useful if we ensure that $u_k - \tau d_k^r \hat g_k \in B$ for all $\tau > 0$ sufficiently small, which can be done by imposing condition (A2) on $g$, which is not very restrictive. Assuming this, we may take $-d_k^r \hat g_k = -d_k^{2r}g_k$ as Cauchy decrease direction of $\psi_k$, and therefore define the following fraction of Cauchy decrease condition:

There exist $\beta, \beta_0 > 0$ (fixed for all $k$) such that $s_k$ is an approximate solution of (18) in the following sense:

(19a)   $\|w_k s_k\|_p \le \beta_0\Delta_k$, $u_k + s_k \in B$, and $\psi_k(s_k) \le \beta\,\psi_k(s_k^c)$,

where $s_k^c$ is a solution of the one-dimensional problem

(19b)   $\min \psi_k(s)$ subject to $s = -t\,d_k^{2r}g_k$, $t \ge 0$, $u_k + s \in B$, $\|w_k s\|_p \le \Delta_k$.

4.6. Formulation of the algorithm. For the update of the trust-region radius and the acceptance of the step we use a very common strategy. It is based on the demand that the actual decrease

(20)   $\mathrm{ared}_k(s_k) := f_k - f(u_k + s_k)$

should be a sufficiently large fraction of the predicted decrease

(21)   $\mathrm{pred}_k(s_k) := -\langle s_k, g_k\rangle - \frac12\langle s_k, B_k s_k\rangle = -\psi_k(s_k) + \frac12\langle s_k, C_k s_k\rangle$

promised by the quadratic model. Since the model error is at most $O(\|s_k\|_p^2)$, the decrease ratio

(22)   $\rho_k := \frac{\mathrm{ared}_k(s_k)}{\mathrm{pred}_k(s_k)}$

will tend to one for $s_k \to 0$. This suggests the following strategy for the update of the trust-region radius:

Algorithm 4.6 (Update of the trust-region radius $\Delta_k$). Let $0 < \eta_1 < \eta_2 < \eta_3 < 1$ and $0 < \gamma_1 < 1 < \gamma_2 < \gamma_3$.
1. If $\rho_k \le \eta_1$ then choose $\Delta_{k+1} \in (0, \gamma_1\Delta_k]$.
2. If $\rho_k \in (\eta_1, \eta_2)$ then choose $\Delta_{k+1} \in [\gamma_1\Delta_k, \Delta_k]$.
3. If $\rho_k \in [\eta_2, \eta_3)$ then choose $\Delta_{k+1} \in [\Delta_k, \gamma_2\Delta_k]$.
4. If $\rho_k \ge \eta_3$ then choose $\Delta_{k+1} \in [\gamma_2\Delta_k, \gamma_3\Delta_k]$.
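Algorithm 4.6 maps directly to a small function. In the sketch below the constants $\eta_i$ and $\gamma_i$ are example values satisfying the stated orderings (the algorithm prescribes only the orderings and intervals, not specific numbers), and each branch returns one representative radius from the allowed interval.

```python
def update_radius(delta, rho, eta=(0.1, 0.25, 0.75), gamma=(0.5, 2.0, 4.0)):
    """Algorithm 4.6 with 0 < eta1 < eta2 < eta3 < 1 and 0 < gamma1 < 1 < gamma2 < gamma3."""
    eta1, eta2, eta3 = eta
    gamma1, gamma2, gamma3 = gamma
    if rho <= eta1:            # poor agreement: Delta_{k+1} in (0, gamma1*Delta_k]
        return gamma1 * delta
    if rho < eta2:             # Delta_{k+1} in [gamma1*Delta_k, Delta_k]
        return delta
    if rho < eta3:             # Delta_{k+1} in [Delta_k, gamma2*Delta_k]
        return gamma2 * delta
    return gamma3 * delta      # very good agreement: Delta_{k+1} in [gamma2*Delta_k, gamma3*Delta_k]
```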

Remark 4.7. The forms of predicted and actual decrease follow the choices used in [11], [23] (and [10] for the constrained case). In [6] the decreases and the ratio are computed as follows:

$\mathrm{pred}^1_k(s_k) := -\psi_k(s_k), \qquad \mathrm{ared}^1_k(s_k) := \mathrm{ared}_k(s_k) - \frac12\langle s_k, C_k s_k\rangle, \qquad \rho^1_k := \frac{\mathrm{ared}^1_k(s_k)}{\mathrm{pred}^1_k(s_k)}.$

Since the crucial estimates (25) and (38) also are true for $\mathrm{pred}^1_k(s_k)$, and under (D4) the relations

$\mathrm{pred}^1_k(s_k)(\rho^1_k - 1) = \mathrm{pred}_k(s_k)(\rho_k - 1), \qquad \mathrm{ared}^1_k(s_k) \le \mathrm{ared}_k(s_k)$

hold, all convergence results presented in this paper remain valid if $\rho_k$ is replaced by $\rho^1_k$. We restrict the presentation to the choice (20), (21).

The algorithm iteratively computes a trial step $s_k$ satisfying the fraction of Cauchy decrease condition. Depending on the decrease ratio the trial step is accepted or rejected, and the trust-region radius is adjusted.

Algorithm 4.8 (Trust-Region Interior-Point Algorithm). Let $\eta_1 > 0$ be as in Algorithm 4.6.
1. Choose $u_0 \in B^0$ and $\Delta_0 > 0$.
2. For $k = 0, 1, \ldots$:
2.1. If $\hat g_k = 0$ then STOP with result $u_k$.
2.2. Compute $s_k$ satisfying (19).
2.3. Compute $\rho_k$ as defined in (22).
2.4. If $\rho_k > \eta_1$ then set $u_{k+1} = u_k + s_k$, else set $u_{k+1} = u_k$.
2.5. Compute $\Delta_{k+1}$ using Algorithm 4.6.
(An illustrative code sketch of this loop is given below, after Lemma 5.2.)

5. Norm estimates. In this section we collect several useful norm estimates for $L^q$-spaces. The first lemma states that $\|\cdot\|_{q_1}$ is majorizable by a multiple of $\|\cdot\|_{q_2}$ if $q_2 \ge q_1$.

Lemma 5.1. For all $1 \le q_1 \le q_2 \le \infty$ and $v \in L^{q_2}(\Omega)$ we have $\|v\|_{q_1} \le m_{q_1,q_2}\|v\|_{q_2}$ with $m_{q_1,q_2} = \mu(\Omega)^{1/q_1 - 1/q_2}$. Here $1/\infty$ is to be interpreted as zero.

Proof. See e.g. [1, Thm. 2.8].

As a consequence of Hölder's inequality we obtain the following result, which allows us to apply the principle of boundedness in the high-norm and convergence in the low-norm.

Lemma 5.2 (Interpolation inequality). Given $1 \le q_1 \le q_2 \le \infty$ and $0 \le \theta \le 1$, let $1 \le q \le \infty$ satisfy $1/q = \theta/q_1 + (1-\theta)/q_2$. Then for all $v \in L^{q_2}(\Omega)$ the following is true:

(23)   $\|v\|_q \le \|v\|_{q_1}^\theta\,\|v\|_{q_2}^{1-\theta}.$

Proof. In the nontrivial cases $0 < \theta < 1$ and $q < \infty$ observe that $[q_1/(\theta q)]^{-1} + [q_2/((1-\theta)q)]^{-1} = 1$ and apply Hölder's inequality:

$\|v\|_q^q = \big\|\,|v|^{\theta q}\,|v|^{(1-\theta)q}\,\big\|_1 \le \big\|\,|v|^{\theta q}\,\big\|_{q_1/(\theta q)}\,\big\|\,|v|^{(1-\theta)q}\,\big\|_{q_2/((1-\theta)q)} = \|v\|_{q_1}^{\theta q}\,\|v\|_{q_2}^{(1-\theta)q}.$
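Combining the pieces, Algorithm 4.8 is a short driver around the subproblem solver. The sketch below is a hypothetical finite-dimensional version: `solve_subproblem` stands for any routine returning a trial step and predicted decrease satisfying the fraction-of-Cauchy-decrease condition (19), `d_fun` for a scaling such as $d_I$, and `update_radius` is the function from the sketch after Algorithm 4.6; none of these names are prescribed by the paper.

```python
import numpy as np

def trip(f, grad, solve_subproblem, u0, delta0, d_fun, r=1.0,
         eta1=0.1, tol=1e-8, max_iter=100):
    """Trust-Region Interior-Point loop of Algorithm 4.8 (discretized sketch)."""
    u, delta = u0.copy(), delta0
    for k in range(max_iter):
        g = grad(u)
        g_hat = d_fun(u, g) ** r * g               # scaled gradient d^r g
        if np.linalg.norm(g_hat, np.inf) <= tol:   # step 2.1: (approximate) stop
            break
        s, pred = solve_subproblem(u, g, delta)    # step 2.2: step satisfying (19)
        ared = f(u) - f(u + s)                     # actual decrease (20)
        rho = ared / pred                          # decrease ratio (22)
        if rho > eta1:                             # step 2.4: accept or reject
            u = u + s
        delta = update_radius(delta, rho)          # step 2.5: Algorithm 4.6
    return u
```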

The next lemma will be used in the proof of Lemma 7.1.

Lemma 5.3. For $v \in L^q(\Omega)$, $1 \le q < \infty$, and all $\delta > 0$ there holds

$\mu(\{ x \in \Omega : |v(x)| \ge \delta \}) \le \delta^{-q}\,\|v\|_q^q.$

Proof. $\|v\|_q^q = \|\,|v|^q\,\|_1 \ge \|\chi_{\{|v|\ge\delta\}}\,|v|^q\|_1 \ge \mu(\{|v| \ge \delta\})\,\delta^q.$

6. Convergence to first-order optimal points. The convergence of the algorithm is mainly achieved by two ingredients: a lower bound for the predicted decrease for trial steps satisfying the fraction of Cauchy decrease condition, and the relation $\mathrm{ared}_k(s_k) > \eta_1\,\mathrm{pred}_k(s_k)$, which is always satisfied for successful steps $s_k$. The lower bound on the predicted decrease is established in the following lemma:

Lemma 6.1. Let the assumptions (A1), (A2), (D1)-(D4), and (W) hold. Then there exists $c_4 > 0$ such that for all $u_k \in B^0$ with $\hat g_k \ne 0$ and all $s_k$ satisfying (19) the following holds:

(24)   $\mathrm{pred}_k(s_k) \ge -\psi_k(s_k) \ge \frac{\beta}{2}\,\|\hat g_k\|_2^2\, \min\left\{ \frac{\Delta_k}{c_w\|\hat g_k\|_p},\ \frac{\|\hat g_k\|_2^2}{\|\hat M_k\|_{U,U'}\|\hat g_k\|_p^2},\ \frac{c_d^{1-2r}}{\|g_k\|_\infty} \right\}$

(25)   $\phantom{\mathrm{pred}_k(s_k)} \ge c_4\,\|\hat g_k\|_{p'}^2\, \min\left\{ \frac{\Delta_k}{c_w\|\hat g_k\|_p},\ \frac{\|\hat g_k\|_{p'}^2}{\|\hat M_k\|_{U,U'}\|\hat g_k\|_p^2},\ \frac{c_d^{1-2r}}{\|g_k\|_\infty} \right\}.$

Proof. Since $C_k$ is obviously positive by (D4), we have

$\mathrm{pred}_k(s_k) = -\psi_k(s_k) + \frac12\langle s_k, C_k s_k\rangle \ge -\psi_k(s_k).$

Now we will derive an upper bound for the minimum of $\phi(\tau) = \psi_k(-\tau d_k^{2r}g_k)$ on $[0, \tau^+]$ with $\tau^+ = \min\{\tau_B, \tau_\Delta\}$, where

$\tau_B = \max\{ \tau : b(x) - u_k(x) + \tau d_k^{2r}(x)g_k(x) \ge 0 \text{ and } u_k(x) - a(x) - \tau d_k^{2r}(x)g_k(x) \ge 0\ \ \forall x \in \Omega \}$

and

$\tau_\Delta = \frac{\Delta_k}{\|w_k d_k^{2r}g_k\|_p} = \frac{\Delta_k}{\|\hat w_k \hat g_k\|_p} \ge \frac{\Delta_k}{c_w\|\hat g_k\|_p}.$

Therefore, using (D3),

$\tau_B = \min\left\{ \inf_{\{g_k(x)>0\}} \frac{u_k(x) - a(x)}{d_k^{2r}(x)g_k(x)},\ \inf_{\{g_k(x)<0\}} \frac{b(x) - u_k(x)}{-d_k^{2r}(x)g_k(x)} \right\} \ge \inf_{\{g_k(x)\ne0\}} \frac{d_I(u_k)(x)}{d_k^{2r}(x)|g_k(x)|} \ge \inf_{\{g_k(x)\ne0\}} \frac{d_k^{1-2r}(x)}{|g_k(x)|} \ge \frac{c_d^{1-2r}}{\|g_k\|_\infty}.$

We have $\phi(\tau) = -\tau\phi_1 + \frac{\tau^2}{2}\phi_2$ with

$\phi_1 = \langle d_k^{2r}g_k, g_k\rangle = \|\hat g_k\|_2^2, \qquad \phi_2 = \langle d_k^{2r}g_k, M_k d_k^{2r}g_k\rangle = \langle \hat g_k, \hat M_k \hat g_k\rangle,$

and observe $|\phi_2| \le \|\hat M_k\|_{U,U'}\|\hat g_k\|_p^2$.

Let $\tau^*$ be a minimizer of $\phi$ on $[0, \tau^+]$. If $\tau^* < \tau^+$ then $\phi_2 > 0$, $\tau^* = \phi_1/\phi_2$, and

$\phi(\tau^*) = -\frac12\,\frac{\phi_1^2}{\phi_2} \le -\frac12\,\frac{\|\hat g_k\|_2^4}{\|\hat M_k\|_{U,U'}\|\hat g_k\|_p^2}.$

If $\tau^* = \tau_\Delta$ and $\phi_2 > 0$ then $\tau_\Delta \le \phi_1/\phi_2$ and

$\phi(\tau^*) = -\tau_\Delta\phi_1 + \frac12\tau_\Delta^2\phi_2 \le -\frac12\tau_\Delta\phi_1 \le -\frac12\,\frac{\Delta_k}{c_w\|\hat g_k\|_p}\,\|\hat g_k\|_2^2.$

If $\tau^* = \tau_\Delta$ and $\phi_2 \le 0$ then even $\phi(\tau^*) \le -\tau_\Delta\phi_1$. For $\tau^* = \tau_B$ the same arguments show

$\phi(\tau^*) \le -\frac12\tau_B\phi_1 \le -\frac12\,\|\hat g_k\|_2^2\,\frac{c_d^{1-2r}}{\|g_k\|_\infty}.$

The first inequality (24) now follows from these estimates and (19). The second inequality (25) follows from (24) and the application $\|\hat g_k\|_{p'} \le m_{p',2}\|\hat g_k\|_2$ of Lemma 5.1. Note that $p \ge 2$ and $1/p + 1/p' = 1$ yield $p' \le 2$.

Remark 6.2. The sequence of inequalities for the estimation of $\tau_B$ uses the inequality $d_k^{2r-1} \le c_d^{2r-1}$. This is where we need the restriction to $r \ge 1/2$.

Let the assumptions of Lemma 6.1 hold. If the $k$-th iteration of Algorithm 4.8 is successful, i.e. $\rho_k > \eta_1$ (or equivalently $u_{k+1} \ne u_k$), then Lemma 6.1 provides an estimate for the actual decrease:

$f_k - f_{k+1} > \eta_1 c_4\,\|\hat g_k\|_{p'}^2\, \min\left\{ \frac{\Delta_k}{c_w\|\hat g_k\|_p},\ \frac{\|\hat g_k\|_{p'}^2}{\|\hat M_k\|_{U,U'}\|\hat g_k\|_p^2},\ \frac{c_d^{1-2r}}{\|g_k\|_\infty} \right\}.$

If in addition the assumptions (A3) and (A5) hold, Remark 4.2 and the previous inequality imply the existence of $c_5 > 0$ with

(26)   $f_k - f_{k+1} > c_5\,\|\hat g_k\|_{p'}^2\, \min\{\, \Delta_k,\ \|\hat g_k\|_{p'}^2,\ c_d^{1-2r} \,\}.$

The next statement is trivial:

Lemma 6.3. Let $(\rho_k)$ and $(\Delta_k)$ be generated by Algorithm 4.8. If $\rho_k \ge \eta_2$ for all sufficiently large $k$, then $(\Delta_k)$ is bounded away from zero.

Now we can prove a first global convergence result.

Theorem 6.4. Let assumptions (A1)-(A3), (A5), (D1)-(D4), and (W) hold. Let the sequence $(u_k)$ be generated by Algorithm 4.8. Then

$\liminf_{k\to\infty} \|d_k^r g_k\|_{p'} = 0.$

Even more:

$\liminf_{k\to\infty} \|d_k^r g_k\|_q = 0 \quad \text{for all } 1 \le q < \infty.$

Proof. Assume that there are $K > 0$ and $\varepsilon > 0$ with $\|\hat g_k\|_{p'} \ge \varepsilon$ for all $k \ge K$. First we will show that this implies $\sum_{k=0}^\infty \Delta_k < \infty$. If there is only a finite number of successful steps then $\Delta_{k+1} \le \gamma_1\Delta_k$ for large $k$ and we are done. Otherwise, if the sequence $(k_i)$ of successful steps does not terminate, we conclude from $f_k \downarrow$ and the boundedness of $f$ that $\sum_{k=0}^\infty (f_k - f_{k+1}) < \infty$. For all $k = k_i$ we may use (26) and obtain, since $\|\hat g_{k_i}\|_{p'} \ge \varepsilon$ for $k_i \ge K$, that $\Delta_{k_i}$ tends to zero and, moreover, obeys the inequality

$\Delta_{k_i} \le \frac{1}{c_5\varepsilon^2}\,(f_{k_i} - f_{k_i+1})$

for all $i$ sufficiently large. This shows $\sum_{i=0}^\infty \Delta_{k_i} < \infty$. Since for all successful steps $k \in \{k_i\}$ we have $\Delta_{k+1} \le \gamma_3\Delta_k$ and for all others $\Delta_{k+1} \le \gamma_1\Delta_k$, we conclude

(27)   $\sum_{k=0}^\infty \Delta_k \le \frac{1}{1-\gamma_1}\Big(\Delta_0 + \gamma_3\sum_{i=0}^\infty \Delta_{k_i}\Big) < \infty.$

In a second step we will show that $|\rho_k - 1| \to 0$. Due to

(28)   $\|u_{k+1} - u_k\|_p \le \|s_k\|_p \le \beta_0\Delta_k\|w_k^{-1}\|_\infty \le \beta_0 c_{w'}\Delta_k$

and (27), $(u_k)$ is a Cauchy sequence in $U$. Furthermore,

$\big| 2\psi_k(s_k) - 2\langle s_k, g_k\rangle - \langle s_k, C_k s_k\rangle \big| = |\langle s_k, B_k s_k\rangle| \le \|B_k\|_{U,U'}\|s_k\|_p^2 \le c_2\beta_0^2 c_{w'}^2\Delta_k^2.$

The mean value theorem yields $f(u_k + s_k) - f_k = \langle s_k, g_k^\theta\rangle$ for some $\theta \in [0,1]$ and $g_k^\theta = g(u_k + \theta s_k)$, and hence

$|\mathrm{pred}_k(s_k)|\,|\rho_k - 1| = |\mathrm{ared}_k(s_k) - \mathrm{pred}_k(s_k)| \le \frac12|\langle s_k, B_k s_k\rangle| + |\langle s_k, g_k^\theta - g_k\rangle| \le \Big(\frac12 c_2\beta_0 c_{w'}\Delta_k + \|g_k^\theta - g_k\|_{p'}\Big)\,\beta_0 c_{w'}\Delta_k.$

Since $(u_k)$ converges in the closed set $B$, $g$ is continuous, and $(\Delta_k)$ as well as $(\|s_k\|_p)$ (see (28)) tend to zero, the first factor in the last expression converges to zero, too. Lemma 6.1 guarantees that $|\mathrm{pred}_k(s_k)|/\Delta_k$ is uniformly bounded away from zero for $k \ge K$, since by assumption $\|\hat g_k\|_{p'} \ge \varepsilon$. This shows $|\rho_k - 1| \to 0$. But now Lemma 6.3 yields a contradiction to $\Delta_k \to 0$. Therefore, the assumption is wrong and the first part of the assertion holds. The second part follows from Lemma 5.1 for $1 \le q \le p'$ and from (A3) and the interpolation inequality (23) for $p' < q < \infty$.

Now we will show that if $\hat g$ is uniformly continuous the limites inferiores in Theorem 6.4 can be replaced by limits. We introduce the following assumptions:

(A6) The scaled gradient $\hat g = d^r g : B \subset U \to U'$ is uniformly continuous.

(A6') The gradient $g : B \subset U \to U'$ is uniformly continuous and $d = d_I$ or $d = d_{II}$.

Condition (A6) is not so easy to verify for most choices of $d$. With Lemma 6.5, however, we provide a very helpful tool to check the validity of (A6). Moreover, we show in Lemma 6.6 that (A6') implies (A6). The proofs of both lemmas can be found in the appendix. As a by-product of our investigations we get the valuable result that $\hat g$ inherits the continuity of $g$ if we choose $d = d_I$ or $d = d_{II}$. We will derive the results concerning continuity and uniform continuity of $\hat g$ simultaneously. Additional requirements for the uniform continuity are written in parentheses.

Lemma 6.5. Let (A1)-(A3), (D3) hold and let $g : B \subset U \to U'$ be (uniformly) continuous. Assume that $\|\chi_{\{g(u)g(\tilde u)>0\}}(d(u) - d(\tilde u))\|_{p'}$ tends to zero (uniformly in $u \in B \subset U$) for $\tilde u \to u$ in $B \subset U$. Then $\hat g = d^r g : B \subset U \to U'$ is (uniformly) continuous.

Proof. See appendix.

The previous lemma is now applicable to the choices $d = d_I$ and $d = d_{II}$:

Lemma 6.6. Let (A1)-(A3) hold and $d = d_I$ or $d = d_{II}$. Then $\hat g = d^r g : B \subset U \to U'$ is continuous. If, in addition, $g$ is uniformly continuous, then the same is true for $\hat g$.

Proof. See appendix.

Now we state the promised variant of Theorem 6.4.

Theorem 6.7. Let assumptions (A1)-(A3), (A5), (D1)-(D4), (W), and (A6) or (A6') hold. Then the sequence $(u_k)$ generated by Algorithm 4.8 satisfies

(29)   $\lim_{k\to\infty} \|d_k^r g_k\|_{p'} = 0.$

Even more:

(30)   $\lim_{k\to\infty} \|d_k^r g_k\|_q = 0 \quad \text{for all } 1 \le q < \infty.$

Proof. Since, due to Lemma 6.6, $\hat g = d^r g$ is uniformly continuous, it suffices to show that under the assumption $\|\hat g_k\|_{p'} \ge \varepsilon_1 > 0$ for an infinite number of iterations there exists a sequence of index pairs $(m_i, l_i)$ with $\|\hat g_{m_i} - \hat g_{l_i}\|_{p'}$ bounded away from zero but $\|u_{m_i} - u_{l_i}\|_p \to 0$, which is a contradiction to the uniform continuity of $\hat g$. So let us assume that (29) does not hold. Then there is $\varepsilon_1 > 0$ and a sequence $(m_i)$ with $\|\hat g_{m_i}\|_{p'} \ge \varepsilon_1$. Theorem 6.4 yields a sequence $(k_i)$ with $\|\hat g_{k_i}\|_{p'} \to 0$. For arbitrary $0 < \varepsilon_2 < \varepsilon_1$ we can thus find a sequence $(l_i)$ such that

$\|\hat g_k\|_{p'} \ge \varepsilon_2 \text{ for } m_i \le k < l_i, \qquad \|\hat g_{l_i}\|_{p'} < \varepsilon_2.$

Since $\hat g_{l_i} \ne \hat g_{l_i-1}$, iteration $l_i - 1$ is successful, and one has for all successful iterations $k$, $m_i \le k < l_i$, by Lemma 6.1 and (26)

(31)   $f_k - f_{k+1} > c_5\,\varepsilon_2^2\, \min\{\, \Delta_k,\ \varepsilon_2^2,\ c_d^{1-2r} \,\}.$

The left-hand side converges to zero, because $(f_k)$ is nonincreasing and bounded from below, i.e. $(f_k)$ is a Cauchy sequence. We conclude that $\Delta_k$ tends to zero for successful steps $m_i \le k < l_i$ and get with (28) that

$f_k - f_{k+1} \ge c_5\varepsilon_2^2\,\Delta_k \ge \frac{c_5\varepsilon_2^2}{\beta_0 c_{w'}}\,\|u_{k+1} - u_k\|_p =: c_6\,\|u_{k+1} - u_k\|_p$


More information

ARE202A, Fall Contents

ARE202A, Fall Contents ARE202A, Fall 2005 LECTURE #2: WED, NOV 6, 2005 PRINT DATE: NOVEMBER 2, 2005 (NPP2) Contents 5. Nonlinear Programming Problems and the Kuhn Tucker conditions (cont) 5.2. Necessary and sucient conditions

More information

Projected Gradient Methods for NCP 57. Complementarity Problems via Normal Maps

Projected Gradient Methods for NCP 57. Complementarity Problems via Normal Maps Projected Gradient Methods for NCP 57 Recent Advances in Nonsmooth Optimization, pp. 57-86 Eds..-Z. u, L. Qi and R.S. Womersley c1995 World Scientic Publishers Projected Gradient Methods for Nonlinear

More information

1. Introduction In this paper we study a new type of trust region method for solving the unconstrained optimization problem, min f(x): (1.1) x2< n Our

1. Introduction In this paper we study a new type of trust region method for solving the unconstrained optimization problem, min f(x): (1.1) x2< n Our Combining Trust Region and Line Search Techniques Jorge Nocedal y and Ya-xiang Yuan z March 14, 1998 Report OTC 98/04 Optimization Technology Center Abstract We propose an algorithm for nonlinear optimization

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

Vector Space Basics. 1 Abstract Vector Spaces. 1. (commutativity of vector addition) u + v = v + u. 2. (associativity of vector addition)

Vector Space Basics. 1 Abstract Vector Spaces. 1. (commutativity of vector addition) u + v = v + u. 2. (associativity of vector addition) Vector Space Basics (Remark: these notes are highly formal and may be a useful reference to some students however I am also posting Ray Heitmann's notes to Canvas for students interested in a direct computational

More information

Rearrangements and polar factorisation of countably degenerate functions G.R. Burton, School of Mathematical Sciences, University of Bath, Claverton D

Rearrangements and polar factorisation of countably degenerate functions G.R. Burton, School of Mathematical Sciences, University of Bath, Claverton D Rearrangements and polar factorisation of countably degenerate functions G.R. Burton, School of Mathematical Sciences, University of Bath, Claverton Down, Bath BA2 7AY, U.K. R.J. Douglas, Isaac Newton

More information

Introduction to Real Analysis Alternative Chapter 1

Introduction to Real Analysis Alternative Chapter 1 Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces

More information

Higher-Order Methods

Higher-Order Methods Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth

More information

4TE3/6TE3. Algorithms for. Continuous Optimization

4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Algorithms for Constrained Nonlinear Optimization Problems) Tamás TERLAKY Computing and Software McMaster University Hamilton, November 2005 terlaky@mcmaster.ca

More information

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS

THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS THE INVERSE FUNCTION THEOREM FOR LIPSCHITZ MAPS RALPH HOWARD DEPARTMENT OF MATHEMATICS UNIVERSITY OF SOUTH CAROLINA COLUMBIA, S.C. 29208, USA HOWARD@MATH.SC.EDU Abstract. This is an edited version of a

More information

Applied Mathematics &Optimization

Applied Mathematics &Optimization Appl Math Optim 29: 211-222 (1994) Applied Mathematics &Optimization c 1994 Springer-Verlag New Yor Inc. An Algorithm for Finding the Chebyshev Center of a Convex Polyhedron 1 N.D.Botin and V.L.Turova-Botina

More information

1. Introduction The nonlinear complementarity problem (NCP) is to nd a point x 2 IR n such that hx; F (x)i = ; x 2 IR n + ; F (x) 2 IRn + ; where F is

1. Introduction The nonlinear complementarity problem (NCP) is to nd a point x 2 IR n such that hx; F (x)i = ; x 2 IR n + ; F (x) 2 IRn + ; where F is New NCP-Functions and Their Properties 3 by Christian Kanzow y, Nobuo Yamashita z and Masao Fukushima z y University of Hamburg, Institute of Applied Mathematics, Bundesstrasse 55, D-2146 Hamburg, Germany,

More information

2 JOSE HERSKOVITS The rst stage to get an optimal design is to dene the Optimization Model. That is, to select appropriate design variables, an object

2 JOSE HERSKOVITS The rst stage to get an optimal design is to dene the Optimization Model. That is, to select appropriate design variables, an object A VIEW ON NONLINEAR OPTIMIZATION JOSE HERSKOVITS Mechanical Engineering Program COPPE / Federal University of Rio de Janeiro 1 Caixa Postal 68503, 21945-970 Rio de Janeiro, BRAZIL 1. Introduction Once

More information

SHARP BOUNDARY TRACE INEQUALITIES. 1. Introduction

SHARP BOUNDARY TRACE INEQUALITIES. 1. Introduction SHARP BOUNDARY TRACE INEQUALITIES GILES AUCHMUTY Abstract. This paper describes sharp inequalities for the trace of Sobolev functions on the boundary of a bounded region R N. The inequalities bound (semi-)norms

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

Part III. 10 Topological Space Basics. Topological Spaces

Part III. 10 Topological Space Basics. Topological Spaces Part III 10 Topological Space Basics Topological Spaces Using the metric space results above as motivation we will axiomatize the notion of being an open set to more general settings. Definition 10.1.

More information

Constrained Optimization

Constrained Optimization 1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange

More information

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf.

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf. Maria Cameron 1. Trust Region Methods At every iteration the trust region methods generate a model m k (p), choose a trust region, and solve the constraint optimization problem of finding the minimum of

More information

1. Introduction Let the least value of an objective function F (x), x2r n, be required, where F (x) can be calculated for any vector of variables x2r

1. Introduction Let the least value of an objective function F (x), x2r n, be required, where F (x) can be calculated for any vector of variables x2r DAMTP 2002/NA08 Least Frobenius norm updating of quadratic models that satisfy interpolation conditions 1 M.J.D. Powell Abstract: Quadratic models of objective functions are highly useful in many optimization

More information

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

Cubic regularization of Newton s method for convex problems with constraints

Cubic regularization of Newton s method for convex problems with constraints CORE DISCUSSION PAPER 006/39 Cubic regularization of Newton s method for convex problems with constraints Yu. Nesterov March 31, 006 Abstract In this paper we derive efficiency estimates of the regularized

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

Math Tune-Up Louisiana State University August, Lectures on Partial Differential Equations and Hilbert Space

Math Tune-Up Louisiana State University August, Lectures on Partial Differential Equations and Hilbert Space Math Tune-Up Louisiana State University August, 2008 Lectures on Partial Differential Equations and Hilbert Space 1. A linear partial differential equation of physics We begin by considering the simplest

More information

Constrained Optimization Theory

Constrained Optimization Theory Constrained Optimization Theory Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Constrained Optimization Theory IMA, August

More information

Lecture 3. Optimization Problems and Iterative Algorithms

Lecture 3. Optimization Problems and Iterative Algorithms Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex

More information

Calculus of Variations. Final Examination

Calculus of Variations. Final Examination Université Paris-Saclay M AMS and Optimization January 18th, 018 Calculus of Variations Final Examination Duration : 3h ; all kind of paper documents (notes, books...) are authorized. The total score of

More information

IE 5531: Engineering Optimization I

IE 5531: Engineering Optimization I IE 5531: Engineering Optimization I Lecture 15: Nonlinear optimization Prof. John Gunnar Carlsson November 1, 2010 Prof. John Gunnar Carlsson IE 5531: Engineering Optimization I November 1, 2010 1 / 24

More information

An interior-point trust-region polynomial algorithm for convex programming

An interior-point trust-region polynomial algorithm for convex programming An interior-point trust-region polynomial algorithm for convex programming Ye LU and Ya-xiang YUAN Abstract. An interior-point trust-region algorithm is proposed for minimization of a convex quadratic

More information

IE 5531: Engineering Optimization I

IE 5531: Engineering Optimization I IE 5531: Engineering Optimization I Lecture 14: Unconstrained optimization Prof. John Gunnar Carlsson October 27, 2010 Prof. John Gunnar Carlsson IE 5531: Engineering Optimization I October 27, 2010 1

More information

INTRODUCTION TO NETS. limits to coincide, since it can be deduced: i.e. x

INTRODUCTION TO NETS. limits to coincide, since it can be deduced: i.e. x INTRODUCTION TO NETS TOMMASO RUSSO 1. Sequences do not describe the topology The goal of this rst part is to justify via some examples the fact that sequences are not sucient to describe a topological

More information

16 Chapter 3. Separation Properties, Principal Pivot Transforms, Classes... for all j 2 J is said to be a subcomplementary vector of variables for (3.

16 Chapter 3. Separation Properties, Principal Pivot Transforms, Classes... for all j 2 J is said to be a subcomplementary vector of variables for (3. Chapter 3 SEPARATION PROPERTIES, PRINCIPAL PIVOT TRANSFORMS, CLASSES OF MATRICES In this chapter we present the basic mathematical results on the LCP. Many of these results are used in later chapters to

More information

system of equations. In particular, we give a complete characterization of the Q-superlinear

system of equations. In particular, we give a complete characterization of the Q-superlinear INEXACT NEWTON METHODS FOR SEMISMOOTH EQUATIONS WITH APPLICATIONS TO VARIATIONAL INEQUALITY PROBLEMS Francisco Facchinei 1, Andreas Fischer 2 and Christian Kanzow 3 1 Dipartimento di Informatica e Sistemistica

More information

Numerical Comparisons of. Path-Following Strategies for a. Basic Interior-Point Method for. Revised August Rice University

Numerical Comparisons of. Path-Following Strategies for a. Basic Interior-Point Method for. Revised August Rice University Numerical Comparisons of Path-Following Strategies for a Basic Interior-Point Method for Nonlinear Programming M. A rg a e z, R.A. T a p ia, a n d L. V e l a z q u e z CRPC-TR97777-S Revised August 1998

More information

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss

More information

Some Background Material

Some Background Material Chapter 1 Some Background Material In the first chapter, we present a quick review of elementary - but important - material as a way of dipping our toes in the water. This chapter also introduces important

More information

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS MATHEMATICS OF OPERATIONS RESEARCH Vol. 28, No. 4, November 2003, pp. 677 692 Printed in U.S.A. ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS ALEXANDER SHAPIRO We discuss in this paper a class of nonsmooth

More information

A REDUCED HESSIAN METHOD FOR LARGE-SCALE CONSTRAINED OPTIMIZATION by Lorenz T. Biegler, Jorge Nocedal, and Claudia Schmid ABSTRACT We propose a quasi-

A REDUCED HESSIAN METHOD FOR LARGE-SCALE CONSTRAINED OPTIMIZATION by Lorenz T. Biegler, Jorge Nocedal, and Claudia Schmid ABSTRACT We propose a quasi- A REDUCED HESSIAN METHOD FOR LARGE-SCALE CONSTRAINED OPTIMIZATION by Lorenz T. Biegler, 1 Jorge Nocedal, 2 and Claudia Schmid 1 1 Chemical Engineering Department, Carnegie Mellon University, Pittsburgh,

More information

An example of slow convergence for Newton s method on a function with globally Lipschitz continuous Hessian

An example of slow convergence for Newton s method on a function with globally Lipschitz continuous Hessian An example of slow convergence for Newton s method on a function with globally Lipschitz continuous Hessian C. Cartis, N. I. M. Gould and Ph. L. Toint 3 May 23 Abstract An example is presented where Newton

More information

460 HOLGER DETTE AND WILLIAM J STUDDEN order to examine how a given design behaves in the model g` with respect to the D-optimality criterion one uses

460 HOLGER DETTE AND WILLIAM J STUDDEN order to examine how a given design behaves in the model g` with respect to the D-optimality criterion one uses Statistica Sinica 5(1995), 459-473 OPTIMAL DESIGNS FOR POLYNOMIAL REGRESSION WHEN THE DEGREE IS NOT KNOWN Holger Dette and William J Studden Technische Universitat Dresden and Purdue University Abstract:

More information

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell DAMTP 2014/NA02 On fast trust region methods for quadratic models with linear constraints M.J.D. Powell Abstract: Quadratic models Q k (x), x R n, of the objective function F (x), x R n, are used by many

More information

A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties

A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties André L. Tits Andreas Wächter Sasan Bahtiari Thomas J. Urban Craig T. Lawrence ISR Technical

More information

Abstract. Previous characterizations of iss-stability are shown to generalize without change to the

Abstract. Previous characterizations of iss-stability are shown to generalize without change to the On Characterizations of Input-to-State Stability with Respect to Compact Sets Eduardo D. Sontag and Yuan Wang Department of Mathematics, Rutgers University, New Brunswick, NJ 08903, USA Department of Mathematics,

More information

ON STATISTICAL INFERENCE UNDER ASYMMETRIC LOSS. Abstract. We introduce a wide class of asymmetric loss functions and show how to obtain

ON STATISTICAL INFERENCE UNDER ASYMMETRIC LOSS. Abstract. We introduce a wide class of asymmetric loss functions and show how to obtain ON STATISTICAL INFERENCE UNDER ASYMMETRIC LOSS FUNCTIONS Michael Baron Received: Abstract We introduce a wide class of asymmetric loss functions and show how to obtain asymmetric-type optimal decision

More information

Unconstrained Optimization

Unconstrained Optimization 1 / 36 Unconstrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University February 2, 2015 2 / 36 3 / 36 4 / 36 5 / 36 1. preliminaries 1.1 local approximation

More information

Examination paper for TMA4180 Optimization I

Examination paper for TMA4180 Optimization I Department of Mathematical Sciences Examination paper for TMA4180 Optimization I Academic contact during examination: Phone: Examination date: 26th May 2016 Examination time (from to): 09:00 13:00 Permitted

More information

Multiple integrals: Sufficient conditions for a local minimum, Jacobi and Weierstrass-type conditions

Multiple integrals: Sufficient conditions for a local minimum, Jacobi and Weierstrass-type conditions Multiple integrals: Sufficient conditions for a local minimum, Jacobi and Weierstrass-type conditions March 6, 2013 Contents 1 Wea second variation 2 1.1 Formulas for variation........................

More information

A Simple Primal-Dual Feasible Interior-Point Method for Nonlinear Programming with Monotone Descent

A Simple Primal-Dual Feasible Interior-Point Method for Nonlinear Programming with Monotone Descent A Simple Primal-Dual Feasible Interior-Point Method for Nonlinear Programming with Monotone Descent Sasan Bahtiari André L. Tits Department of Electrical and Computer Engineering and Institute for Systems

More information

An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization

An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with Travis Johnson, Northwestern University Daniel P. Robinson, Johns

More information

Problem 1: Compactness (12 points, 2 points each)

Problem 1: Compactness (12 points, 2 points each) Final exam Selected Solutions APPM 5440 Fall 2014 Applied Analysis Date: Tuesday, Dec. 15 2014, 10:30 AM to 1 PM You may assume all vector spaces are over the real field unless otherwise specified. Your

More information

Second Order Elliptic PDE

Second Order Elliptic PDE Second Order Elliptic PDE T. Muthukumar tmk@iitk.ac.in December 16, 2014 Contents 1 A Quick Introduction to PDE 1 2 Classification of Second Order PDE 3 3 Linear Second Order Elliptic Operators 4 4 Periodic

More information

Iteration-complexity of first-order penalty methods for convex programming

Iteration-complexity of first-order penalty methods for convex programming Iteration-complexity of first-order penalty methods for convex programming Guanghui Lan Renato D.C. Monteiro July 24, 2008 Abstract This paper considers a special but broad class of convex programing CP)

More information

Algorithms for Constrained Optimization

Algorithms for Constrained Optimization 1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic

More information

2 jian l. zhou and andre l. tits The diculties in solving (SI), and in particular (CMM), stem mostly from the facts that (i) the accurate evaluation o

2 jian l. zhou and andre l. tits The diculties in solving (SI), and in particular (CMM), stem mostly from the facts that (i) the accurate evaluation o SIAM J. Optimization Vol. x, No. x, pp. x{xx, xxx 19xx 000 AN SQP ALGORITHM FOR FINELY DISCRETIZED CONTINUOUS MINIMAX PROBLEMS AND OTHER MINIMAX PROBLEMS WITH MANY OBJECTIVE FUNCTIONS* JIAN L. ZHOUy AND

More information

Introduction to Nonlinear Optimization Paul J. Atzberger

Introduction to Nonlinear Optimization Paul J. Atzberger Introduction to Nonlinear Optimization Paul J. Atzberger Comments should be sent to: atzberg@math.ucsb.edu Introduction We shall discuss in these notes a brief introduction to nonlinear optimization concepts,

More information

Course Summary Math 211

Course Summary Math 211 Course Summary Math 211 table of contents I. Functions of several variables. II. R n. III. Derivatives. IV. Taylor s Theorem. V. Differential Geometry. VI. Applications. 1. Best affine approximations.

More information

An Enhanced Spatial Branch-and-Bound Method in Global Optimization with Nonconvex Constraints

An Enhanced Spatial Branch-and-Bound Method in Global Optimization with Nonconvex Constraints An Enhanced Spatial Branch-and-Bound Method in Global Optimization with Nonconvex Constraints Oliver Stein Peter Kirst # Paul Steuermann March 22, 2013 Abstract We discuss some difficulties in determining

More information

Penalty and Barrier Methods General classical constrained minimization problem minimize f(x) subject to g(x) 0 h(x) =0 Penalty methods are motivated by the desire to use unconstrained optimization techniques

More information

REGULARITY FOR INFINITY HARMONIC FUNCTIONS IN TWO DIMENSIONS

REGULARITY FOR INFINITY HARMONIC FUNCTIONS IN TWO DIMENSIONS C,α REGULARITY FOR INFINITY HARMONIC FUNCTIONS IN TWO DIMENSIONS LAWRENCE C. EVANS AND OVIDIU SAVIN Abstract. We propose a new method for showing C,α regularity for solutions of the infinity Laplacian

More information

A SHIFTED PRIMAL-DUAL INTERIOR METHOD FOR NONLINEAR OPTIMIZATION

A SHIFTED PRIMAL-DUAL INTERIOR METHOD FOR NONLINEAR OPTIMIZATION A SHIFTED RIMAL-DUAL INTERIOR METHOD FOR NONLINEAR OTIMIZATION hilip E. Gill Vyacheslav Kungurtsev Daniel. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-18-1 February 1, 2018

More information

The Erwin Schrodinger International Boltzmanngasse 9. Institute for Mathematical Physics A-1090 Wien, Austria

The Erwin Schrodinger International Boltzmanngasse 9. Institute for Mathematical Physics A-1090 Wien, Austria ESI The Erwin Schrodinger International Boltzmanngasse 9 Institute for Mathematical Physics A-1090 Wien, Austria On the Foundations of Nonlinear Generalized Functions I E. Farkas M. Grosser M. Kunzinger

More information

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical

More information

A SHIFTED PRIMAL-DUAL PENALTY-BARRIER METHOD FOR NONLINEAR OPTIMIZATION

A SHIFTED PRIMAL-DUAL PENALTY-BARRIER METHOD FOR NONLINEAR OPTIMIZATION A SHIFTED PRIMAL-DUAL PENALTY-BARRIER METHOD FOR NONLINEAR OPTIMIZATION Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-19-3 March

More information

Global convergence of trust-region algorithms for constrained minimization without derivatives

Global convergence of trust-region algorithms for constrained minimization without derivatives Global convergence of trust-region algorithms for constrained minimization without derivatives P.D. Conejo E.W. Karas A.A. Ribeiro L.G. Pedroso M. Sachine September 27, 2012 Abstract In this work we propose

More information

CONSTRAINED NONLINEAR PROGRAMMING

CONSTRAINED NONLINEAR PROGRAMMING 149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach

More information

Optimization Methods. Lecture 18: Optimality Conditions and. Gradient Methods. for Unconstrained Optimization

Optimization Methods. Lecture 18: Optimality Conditions and. Gradient Methods. for Unconstrained Optimization 5.93 Optimization Methods Lecture 8: Optimality Conditions and Gradient Methods for Unconstrained Optimization Outline. Necessary and sucient optimality conditions Slide. Gradient m e t h o d s 3. The

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

R. Schaback. numerical method is proposed which rst minimizes each f j separately. and then applies a penalty strategy to gradually force the

R. Schaback. numerical method is proposed which rst minimizes each f j separately. and then applies a penalty strategy to gradually force the A Multi{Parameter Method for Nonlinear Least{Squares Approximation R Schaback Abstract P For discrete nonlinear least-squares approximation problems f 2 (x)! min for m smooth functions f : IR n! IR a m

More information

An improved convergence theorem for the Newton method under relaxed continuity assumptions

An improved convergence theorem for the Newton method under relaxed continuity assumptions An improved convergence theorem for the Newton method under relaxed continuity assumptions Andrei Dubin ITEP, 117218, BCheremushinsaya 25, Moscow, Russia Abstract In the framewor of the majorization technique,

More information

Interior-Point Methods for Linear Optimization

Interior-Point Methods for Linear Optimization Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function

More information

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Zhaosong Lu October 5, 2012 (Revised: June 3, 2013; September 17, 2013) Abstract In this paper we study

More information

Chapter 2 Metric Spaces

Chapter 2 Metric Spaces Chapter 2 Metric Spaces The purpose of this chapter is to present a summary of some basic properties of metric and topological spaces that play an important role in the main body of the book. 2.1 Metrics

More information

i=1 α i. Given an m-times continuously

i=1 α i. Given an m-times continuously 1 Fundamentals 1.1 Classification and characteristics Let Ω R d, d N, d 2, be an open set and α = (α 1,, α d ) T N d 0, N 0 := N {0}, a multiindex with α := d i=1 α i. Given an m-times continuously differentiable

More information

An introduction to some aspects of functional analysis

An introduction to some aspects of functional analysis An introduction to some aspects of functional analysis Stephen Semmes Rice University Abstract These informal notes deal with some very basic objects in functional analysis, including norms and seminorms

More information

March 8, 2010 MATH 408 FINAL EXAM SAMPLE

March 8, 2010 MATH 408 FINAL EXAM SAMPLE March 8, 200 MATH 408 FINAL EXAM SAMPLE EXAM OUTLINE The final exam for this course takes place in the regular course classroom (MEB 238) on Monday, March 2, 8:30-0:20 am. You may bring two-sided 8 page

More information

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL)

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL) Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization Nick Gould (RAL) x IR n f(x) subject to c(x) = Part C course on continuoue optimization CONSTRAINED MINIMIZATION x

More information

Midterm 1. Every element of the set of functions is continuous

Midterm 1. Every element of the set of functions is continuous Econ 200 Mathematics for Economists Midterm Question.- Consider the set of functions F C(0, ) dened by { } F = f C(0, ) f(x) = ax b, a A R and b B R That is, F is a subset of the set of continuous functions

More information