RICE UNIVERSITY

Trust-Region Interior-Point Algorithms for a Class of Nonlinear Programming Problems

by

Luís Nunes Vicente

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

Approved, Thesis Committee:

John E. Dennis, Chairman, Noah Harding Professor of Computational and Applied Mathematics
Thomas A. Badgwell, Professor of Chemical Engineering
Mahmoud El-Alem, Professor of Mathematics, Alexandria University, Egypt
Danny C. Sorensen, Professor of Computational and Applied Mathematics
Richard A. Tapia, Noah Harding Professor of Computational and Applied Mathematics

Houston, Texas
March, 1996


Trust-Region Interior-Point Algorithms for a Class of Nonlinear Programming Problems

Luís Nunes Vicente

Abstract

This thesis introduces and analyzes a family of trust-region interior-point (TRIP) reduced sequential quadratic programming (SQP) algorithms for the solution of minimization problems with nonlinear equality constraints and simple bounds on some of the variables. These nonlinear programming problems appear in applications in control, design, parameter identification, and inversion. In particular, they often arise in the discretization of optimal control problems. The TRIP reduced SQP algorithms treat states and controls as independent variables. They are designed to take advantage of the structure of the problem. In particular, they do not rely on matrix factorizations of the linearized constraints, but use solutions of the linearized state and adjoint equations. These algorithms result from a successful combination of a reduced SQP algorithm, a trust-region globalization, and a primal-dual affine scaling interior-point method.

The TRIP reduced SQP algorithms have very strong theoretical properties. It is shown in this thesis that they converge globally to points satisfying first- and second-order necessary optimality conditions, and in a neighborhood of a local minimizer the rate of convergence is quadratic. Our algorithms and convergence results reduce to those of Coleman and Li for box-constrained optimization. An inexact analysis is presented to provide a practical way of controlling residuals of linear systems and directional derivatives. Complementing this theory, numerical experiments for two nonlinear optimal control problems are included, showing the robustness and effectiveness of these algorithms.

Another topic of this dissertation is a specialized analysis of these algorithms for equality-constrained optimization problems. The important feature of the way this family of algorithms specializes for these problems is that they do not require the computation of normal components for the step or an orthogonal basis for the null space of the Jacobian of the equality constraints. An extension of Moré and Sorensen's result for unconstrained optimization is presented, showing global convergence for these algorithms to a point satisfying the second-order necessary optimality conditions.

Acknowledgments

First, I would like to dedicate this dissertation to my parents. I could never have written this thesis without the love, care, and support of my wife, Inês. I would like to thank her for all the joy and happiness she brought into my life. My daughter, Laura, and my son, António, have also inspired me greatly. I thank them very much and hope they will understand one day why I spent so little time playing with them. I thank my family in Portugal, and in particular my dear and supportive brother Pedro, for all the sincere encouragement and generous assistance I have received during the years in Houston.

My profound gratitude goes to my adviser, Professor John Dennis. John has always shared his ideas with me about mathematics and science, and his thoughts had a great influence upon me. He also set an example of generosity, friendship, and integrity that I will always keep in mind.

During my stay at Rice I worked closely with Professor Matthias Heinkenschloss. He introduced me to optimal control problems, and much of the work reported in this thesis has resulted from this collaboration. I thank Matthias for being such a close friend and co-worker.

I would like to thank all members of my committee for their attention and support. I owe a special debt to Professors Danny Sorensen and Richard Tapia. I will never forget how much I learned from the classes I took from them. I thank Professor Mahmoud El-Alem for the stimulating discussions on trust regions and for his careful proofreading. I thank Professor Thomas Badgwell for his helpful comments and his interesting suggestions. I also thank David Andrews and Zeferino Parada for proofreading parts of my thesis and thesis proposal.

Many thanks to all my friends and colleagues at Rice. I enjoyed the many conversations I had with David Andrews, Martin Bergman, Hector Klie, and Richard Lehoucq.

Finally, I would like to thank Professors José Alberto Fernandes de Carvalho and Joaquim João Júdice. In 1992 and 1993 they encouraged me to study abroad, and without their help and support my studies in the United States would not have been possible.

Financial support from the following institutions is gratefully acknowledged: Comissão Permanente Invotan, Fundação Luso Americana para o Desenvolvimento, Comissão Cultural Luso Americana (Fulbright Program), and Departamento de Matemática da Universidade de Coimbra. Financial support for this work was also provided by the National Science Foundation cooperative agreement CCR-928 and by the Department of Energy contract DOE-FG3-93ER2578.

Contents

Abstract
Acknowledgments
List of Illustrations
List of Tables

1 Introduction
  1.1 The Class of Nonlinear Programming Problems
  1.2 Algorithms and Convergence Theory
  1.3 Inexact Analysis and Implementation
  1.4 Other Contributions
  1.5 Organization of the Thesis
  1.6 Notation

2 Globalization Schemes for Nonlinear Optimization
  2.1 Basics of Unconstrained Optimization
  2.2 Line Searches
  2.3 The Trust-Region Technique
    2.3.1 How to Compute a Step
    2.3.2 The Trust-Region Algorithm
    2.3.3 Global Convergence Results
    2.3.4 Tikhonov Regularization
  2.4 More about Line Searches and Trust Regions

3 Trust-Region SQP Algorithms for Equality-Constrained Optimization
  3.1 Basics of Equality-Constrained Optimization
  3.2 SQP Algorithms
  3.3 Trust-Region Globalizations
  3.4 A General Trust-Region Globalization of the Reduced SQP Algorithm
    3.4.1 The Quasi-Normal Component
    3.4.2 The Tangential Component
    3.4.3 Outline of the Algorithm
    3.4.4 General Assumptions
  3.5 Intermediate Results
  3.6 Global Convergence Results
  3.7 The Use of the Normal Decomposition with the Least-Squares Multipliers
  3.8 Analysis of the Trust-Region Subproblem for the Linearized Constraints

4 A Class of Nonlinear Programming Problems
  4.1 Structure of the Minimization Problem
  4.2 All-At-Once rather than Black Box
  4.3 The Oblique Projection
  4.4 Optimality Conditions
  4.5 Optimal Control Examples
    4.5.1 Boundary Control of a Nonlinear Heat Equation
    4.5.2 Distributed Control of a Semi-Linear Elliptic Equation
  4.6 Problem Scaling

5 Trust-Region Interior-Point Reduced SQP Algorithms for a Class of Nonlinear Programming Problems
  5.1 Application of Newton's Method
  5.2 Trust-Region Interior-Point Reduced SQP Algorithms
    5.2.1 The Quasi-Normal Component
    5.2.2 The Tangential Component
    5.2.3 Reduced and Full Hessians
    5.2.4 Outline of the Algorithms
    5.2.5 General Assumptions
  5.3 Intermediate Results
  5.4 Global Convergence to a First-Order Point
  5.5 Global Convergence to a Second-Order Point
  5.6 Local Rate of Convergence
  5.7 Computation of Steps and Multiplier Estimates
    5.7.1 Computation of the Tangential Component
    5.7.2 Computation of Multiplier Estimates
  5.8 Numerical Example

6 Analysis of Inexact Trust-Region Interior-Point Reduced SQP Algorithms
  6.1 Sources and Representation of Inexactness
  6.2 Inexact Analysis
    6.2.1 Global Convergence to a First-Order Point
    6.2.2 Inexact Directional Derivatives
  6.3 Inexact Calculation of the Quasi-Normal Component
    6.3.1 Methods that Use the Transpose
    6.3.2 Methods that Are Transpose Free
    6.3.3 Scaled Approximate Solutions
  6.4 Inexact Calculation of the Tangential Component
    6.4.1 Reduced Gradient
    6.4.2 Use of Conjugate Gradients to Compute the Tangential Component
    6.4.3 Distance to the Null Space of the Linearized Constraints
    6.4.4 Fraction of Cauchy Decrease Condition
    6.4.5 Inexact Calculation of Lagrange Multipliers
  6.5 Numerical Experiments
    6.5.1 Boundary Control Problem
    6.5.2 Distributed Control Problem

7 Conclusions and Open Questions
  7.1 Conclusions
  7.2 Open Questions

Bibliography


Illustrations

1.1 Global convergence to a point that satisfies the first-order necessary optimality conditions: our result for problem (1.1) generalizes those obtained by the indicated authors for simpler problem classes.
1.2 Global convergence to a point that satisfies the second-order necessary optimality conditions: our result for problem (1.1) generalizes those obtained by the indicated authors for simpler problem classes.
2.1 A dogleg (at the left) and a conjugate-gradient (at the right) step inside a trust region. To better illustrate the conjugate-gradient algorithm, the number of iterations is set to three, which of course exceeds the number of iterations for finite termination.
3.1 The quasi-normal and tangential components of the step for the coupled approach.
4.1 The normal and the quasi-normal components and the action of the orthogonal and oblique projectors.
5.1 Plots of D(x)^2 and W(x)^T ∇f(x) for W(x)^T ∇f(x) = -x + 1 and x in [0, 4].
5.2 Plot of D(x)^2 W(x)^T ∇f(x) for W(x)^T ∇f(x) = -x + 1 and x in [0, 4].
5.3 The quasi-normal and tangential components of the step for the decoupled approach. We assume for simplicity that D_k = I.
5.4 The quasi-normal and tangential components of the step for the coupled approach. We assume for simplicity that D_k = I.
5.5 Control plot using the Coleman-Li affine scaling.
5.6 Control plot using the Dikin-Karmarkar affine scaling.
6.1 Performance of the inexact TRIP reduced SQP algorithms applied to the boundary control problem. Here ln f_k (dotted line), ln ||C_k|| (dashed line), and ln ||D_k W_k^T ∇f_k|| (solid line) are plotted as functions of k.
6.2 Illustration of the performance of the inexact TRIP reduced SQP algorithms applied to the boundary control problem. These plots show the residuals ln ||J_k s_k^q|| (dashed line) and ln ||J_k (s_k^q + s_k^t)|| (solid line).
6.3 Performance of the inexact TRIP reduced SQP algorithms applied to the distributed control problem. Here ln f_k (dotted line), ln ||C_k|| (dashed line), and ln ||D_k W_k^T ∇f_k|| (solid line) are plotted as functions of k.
6.4 Illustration of the performance of the inexact TRIP reduced SQP algorithms applied to the distributed control problem. These plots show the residuals ln ||J_k s_k^q|| (dashed line) and ln ||J_k (s_k^q + s_k^t)|| (solid line).

Tables

5.1 Number of linearized state and adjoint solves needed to compute the tangential component (I(·) denotes the number of conjugate-gradient iterations).
5.2 Numerical results for the boundary control problem, case 10^{-2}.
5.3 Numerical results for the boundary control problem, case 10^{-3}.
6.1 Number of iterations to solve the optimal control problems.
6.2 Number of iterations to solve large distributed semi-linear control problems.


Chapter 1

Introduction

Optimization, or mathematical programming, has developed enormously in the last fifty years and has reached a point where researchers often concentrate on a specific class of problems. Existing algorithmic ideas can be tailored to the characteristics of the class. These problem classes usually come from an application in industry or science. This is the case of the class of problems addressed in this thesis. Moreover, the structure of the problems in the class considered here is fundamental in taking advantage of recent advances in computer technology. The resulting algorithms are more robust and efficient, and their implementations fit more conveniently the purposes of the application.

1.1 The Class of Nonlinear Programming Problems

In this dissertation, we focus on a particular class of nonlinear programming problems that have many applications in engineering and science. The formulation of these problems is the following:

    minimize    f(y, u)
    subject to  C(y, u) = 0,                                        (1.1)
                a <= u <= b,

where f : R^n -> R and C : R^n -> R^m are smooth functions, y in R^m, u in R^{n-m}, and m and n are positive integers satisfying m < n. In this class of problems the variables x = (y, u) are split into two groups: state variables y and control variables u. These are coupled through a set of nonlinear equality constraints C(y, u) = 0, the so-called (discretized) state equation. We also consider lower and upper bounds on the control variables u. However, bounds on the state variables y are not considered in this dissertation. The presence of such bounds would add another layer of difficulty to problem (1.1) and would possibly require a different algorithmic approach.

These optimization problems often arise in the discretization of optimal control problems that are governed by partial differential equations. We address the optimal control problems in finite dimensions after the discretization has taken place, but we do not neglect the physics and the structure that such problems have when posed naturally in infinite dimensions. These nonlinear programming problems also appear in parameter identification, inversion, and optimal design. This class of problems is rich, and we continue to find new applications on a regular basis.

The linearization of the nonlinear state equation yields the (discretized) linearized state equation and the corresponding adjoint equation. Efficient solvers for the linear systems corresponding to these equations exist for many applications [22], [75], [49], and the optimization algorithm ought to take advantage of them. This linearization also offers a tremendous amount of structure. In particular, we use it to obtain a matrix whose columns form a nonorthogonal basis for the null space of the Jacobian matrix of the nonlinear equality constraints. Matrix-vector products with this matrix involve solutions of the linearized state and adjoint equations. Furthermore, a solution of the linearized state equation is naturally decomposed into two components, a quasi-normal component and a tangential component.

The algorithms that we propose and analyze in this thesis are based on an all-at-once approach (see [3]), where states y and controls u are treated as independent variables.

1.2 Algorithms and Convergence Theory

Although there are algorithms available for the solution of nonlinear programming problems that are more general than (1.1), the family of algorithms presented in this thesis is unique in its consistent use of the structure inherent in many optimal control problems, its use of optimization techniques successfully applied in other contexts of nonlinear programming, and its rigorous theoretical justification. We call our algorithms trust-region interior-point (TRIP) reduced sequential quadratic programming (SQP) algorithms since they combine:

1. SQP techniques to approximate the nonlinear programming problem by a sequence of quadratic programming subproblems. (We chose a reduced SQP algorithm because the reduction given by the null-space representation mentioned above appears naturally from the linearization of the nonlinear equality constraints. Both the quasi-normal and the tangential components are associated with solutions of unconstrained quadratic programming subproblems.)

2. Trust regions to guarantee global convergence, i.e., that convergence is attained from any starting point. (A trust region is imposed appropriately on the quasi-normal and tangential components, constraining the respective quadratic programming subproblems. The trust-region technique we use is similar to those that Byrd and Omojokun [5], Dennis, El-Alem, and Maciel [35], and Dennis and Vicente [42] proposed for equality-constrained optimization. Besides assuring global convergence, trust regions regularize ill-conditioned second-order derivatives of the quadratic subproblems. This is very important since many problems in this class are ill-conditioned.)

3. An interior-point strategy to handle the bounds on the control variables u. (We adapt to our context a primal-dual affine scaling algorithm proposed by Coleman and Li [23] for optimization problems with simple bounds. We accomplish this by taking advantage of the structure of our class of problems. The interior-point scheme requires no more information than is needed for the solution of these problems with no bounds on the control variables u.)

The TRIP reduced SQP algorithms have very powerful convergence properties, as we show in this thesis. We prove:

1. Global convergence to a point satisfying the first-order necessary optimality conditions if first-order derivatives are used.

2. Global convergence to a point satisfying the second-order necessary optimality conditions if second-order derivatives are used.

3. Boundedness of the sequence of penalty parameters and boundedness away from zero of the sequence of trust radii if second-order derivatives are used.

The q-quadratic rate of local convergence for these algorithms is a consequence of the combination of this nice global-to-local behavior with a Newton-type iteration. The assumptions we use to prove these results reduce to the weakest assumptions used to establish similar results in the special cases of unconstrained, equality-constrained, and box-constrained optimization. Our theoretical results, also reported in Dennis, Heinkenschloss, and Vicente [36], generalize similar ones obtained for these simpler problem classes. This is schematized in Figures 1.1 and 1.2.

1.3 Inexact Analysis and Implementation

Neither the analysis of the TRIP reduced SQP algorithms nor their implementation would be complete without studying their behavior under the presence of inexactness. In practice, a very large linear system is solved inexactly, yielding a certain residual. Depending on the iterative method chosen for its solution, there is the possibility of measuring and controlling the size of the residual vector. If the solution of the linear system is required at a given iteration of an optimization algorithm, the size of this residual should tighten with a measure of how feasible and optimal the current point is. An inexact analysis should provide a practical algorithmic way of accomplishing this tightening. We present an inexact analysis for the TRIP reduced SQP algorithms that relates the size of the residual vectors of the linearized state and adjoint equations with the trust radius and the size of the constraint residual, the latter being quantities at hand at the beginning of each iteration. We provide practical rules for implementing this relationship that assure global convergence. To our knowledge, inexactness for SQP algorithms with trust-region globalizations has not been studied in the literature.

In practice the TRIP reduced SQP algorithms are robust and efficient techniques for a variety of problems. The implementation of these algorithms is currently being beta-tested with the intent of electronic distribution [76]. The current implementation provides the user with a number of alternatives to compute the steps and to approximate second-order derivatives. There are two versions, one in Fortran 77 and one in Matlab. The implementation addresses the problem scaling, the computation of mass and stiffness matrices, and the setting of tolerances for inexact solvers. These issues arise frequently in optimal control problems governed by partial differential equations. In this thesis, we present numerical results for two medium to large discretized optimal control problems: a boundary nonlinear parabolic control problem and a distributed nonlinear elliptic control problem. These numerical results are very satisfactory and indicate the effectiveness of our algorithms. Our implementation has been used successfully to solve control problems in fluid flow [22], [75].

Figure 1.1 Global convergence to a point that satisfies the first-order necessary optimality conditions: our result for problem (1.1) (state equality constraints with bounds on controls; Dennis, Heinkenschloss, and Vicente 95) generalizes those obtained by the indicated authors for simpler problem classes: Coleman and Li 93 (simple bounds), Dennis, El-Alem, and Maciel 92 (equality constraints), and Powell 75 (no constraints).

1.4 Other Contributions

We present a brief survey of trust regions for unconstrained optimization that covers only the most important trust-region ideas used in our algorithms. In this framework, we compare line searches and trust regions from the point of view of regularization of ill-conditioned second-order approximations.

The ability to converge globally to points satisfying the second-order necessary optimality conditions is natural for trust regions, and it has been shown in the literature for different classes of problems and different trust-region algorithms. We prove this property also for a family of general trust-region algorithms [35], [42] for equality-constrained optimization that use nonorthogonal null-space bases and quasi-normal components. This analysis, of value by itself, motivates all the convergence theory for the TRIP reduced SQP algorithms.

Figure 1.2 Global convergence to a point that satisfies the second-order necessary optimality conditions: our result for problem (1.1) (state equality constraints with bounds on controls; Dennis, Heinkenschloss, and Vicente 95) generalizes those obtained by the indicated authors for simpler problem classes: Coleman and Li 93 (simple bounds), Dennis and Vicente 95 (equality constraints), and Moré and Sorensen 82, 83 (no constraints).

1.5 Organization of the Thesis

Chapters 2 and 3 review basic material on unconstrained and equality-constrained optimization that is used in the other chapters. The reader familiar with these basic concepts might want to skip many of the sections in these two chapters. In Chapter 2, we discuss and compare the regularization of ill-conditioned second-order approximations for line searches and trust regions. In Chapter 3, we derive global convergence to a point satisfying the second-order necessary optimality conditions for a family of trust-region reduced SQP algorithms for equality-constrained optimization, and present an analysis of the trust-region subproblem for the linearized constraints.

The class of problems (1.1) is described in great detail in Chapter 4, where we establish optimality conditions and comment on the use of structure.

Chapters 5 and 6 are the two main chapters of this thesis. They describe the TRIP reduced SQP algorithms for our class of problems and prove their convergence properties. Chapter 5 focuses on the exact version of these algorithms and includes both global and local convergence results. In Chapter 6, we study the global behavior of the TRIP reduced SQP algorithms under the presence of inexactness. Sections 5.8 and 6.5 contain numerical experiments.

The most important conclusions and open questions are summarized in Chapter 7. A short introduction and a summary of contents are given at the beginning of every chapter. There we cite related work and justify our algorithmic choices.

1.6 Notation

We list below some of the notation and abbreviations used in this thesis.

- ℓ(x, λ) = f(x) + λ^T C(x) is the Lagrangian function associated with the problem: minimize f(x) subject to C(x) = 0, where λ is the Lagrange multiplier vector.

- ∇f(x) is the gradient of the real-valued function f(x), and J(x) is the Jacobian of the vector-valued function C(x) = (c_1(x), ..., c_m(x))^T.

- ∇²f(x), ∇²c_i(x), and ∇²_xx ℓ(x, λ) = ∇²f(x) + Σ_{i=1}^m λ_i ∇²c_i(x) are the Hessian matrices with respect to x of f(x), c_i(x), and ℓ(x, λ), respectively.

- N(A) represents the null space of the matrix A.

- W(x) (resp. Z(x)) is a matrix whose columns form a basis (resp. an orthogonal basis) for the null space of J(x).

- Subscripted indices are used to represent the evaluation of a function at a particular point of the sequences {x_k} and {λ_k}. For instance, f_k represents f(x_k) and ℓ_k is the same as ℓ(x_k, λ_k).

- The vector and matrix norms used are the ℓ2 norms.

- The sequence {x_k} is bounded if there exists c > 0 independent of k such that ||x_k|| <= c for all k. In this case we say that the element x_k of the sequence {x_k} is uniformly bounded.

- I_p represents the identity matrix of order p, with columns e_1, ..., e_p.

- λ_min(A) denotes the smallest eigenvalue of the symmetric matrix A.

- κ(A) represents the ℓ2 condition number of the matrix A with respect to inversion. For nonsingular square matrices, κ(A) = ||A|| ||A^{-1}||. In general, we have κ(A) = σ_1(A)/σ_r(A), where r is the rank of A, and σ_1(A) and σ_r(A) are the largest and smallest singular values of A, respectively.

- The element x_k of the sequence {x_k} is O(y_k) if there exists a positive constant c > 0 independent of k such that ||x_k|| <= c ||y_k|| for all k.

- SQP algorithms: sequential quadratic programming algorithms.

- TRIP reduced SQP algorithms: trust-region interior-point reduced SQP algorithms.

Chapter 2

Globalization Schemes for Nonlinear Optimization

Consider the unconstrained optimization problem

    minimize f(x),                                                  (2.1)

where x in R^n and f : R^n -> R is at least twice continuously differentiable. One purpose of this chapter is to use this problem to provide necessary background for this thesis on fundamental concepts of nonlinear optimization like line-search and trust-region globalization schemes. We support the claim that the trust-region technique has a built-in regularization of ill-conditioned second-order approximations.

The organization of this chapter is the following. The optimality conditions and other basic concepts of unconstrained optimization are reviewed in Section 2.1. In Section 2.2, we give a very brief introduction to line searches. Trust regions are presented in more detail in Section 2.3. In Section 2.4, we compare these two globalization strategies, focusing on their regularization properties.

2.1 Basics of Unconstrained Optimization

The optimality conditions for the unconstrained optimization problem (2.1) are given in the following proposition.

Proposition 2.1.1 Let f be continuously differentiable. If the point x_* is a local minimizer for problem (2.1), then ∇f(x_*) = 0. In this case x_* is called a stationary point or a point that satisfies the first-order necessary optimality conditions.

Now let us assume that f is twice continuously differentiable. The second-order necessary (resp. sufficient) optimality conditions for x_* to be a local minimizer for (2.1) are ∇f(x_*) = 0 and ∇²f(x_*) positive semi-definite (resp. positive definite).

The proofs of these basic results can be found in many textbooks like [39], [6].

A quasi-Newton method for the solution of (2.1) generates a sequence of iterates {x_k} and steps {s_k} such that x_{k+1} = x_k + s_k. At x_k, a quadratic model of f(x_k + s),

    q_k(s) = f(x_k) + g_k^T s + (1/2) s^T H_k s,

is formed, where g_k = ∇f(x_k) and H_k is a symmetric matrix of order n that approximates the Hessian ∇²f(x_k) and introduces curvature into the model. The quasi-Newton step s_k is computed using the quadratic model q_k(s).

Algorithm 2.1.1 (Basic Quasi-Newton Algorithm)

1. Choose x_0.
2. For k = 0, 1, 2, ... do
   2.1 Stop if x_k satisfies the stopping criterion.
   2.2 Compute s_k as an approximate solution of
           minimize f(x_k) + g_k^T s + (1/2) s^T H_k s.
   2.3 Set x_{k+1} = x_k + s_k and compute H_{k+1}, possibly by updating H_k.

A possible stopping criterion is ||g_k|| <= tol for some tol > 0. If H_k is nonsingular, a typical quasi-Newton step s_k is given by s_k = -H_k^{-1} g_k. If in addition H_k is positive definite, then this quasi-Newton step s_k = -H_k^{-1} g_k is the unconstrained minimizer of q_k(s). In Newton's method, we have H_k = ∇²f(x_k).

Newton's method is credited to Newton (see [43]), who in the 1660's found a root of a nonlinear equation with one variable using a technique similar to Newton's method, but where the calculations are organized differently. Raphson [24] plays an important role in this discovery by rederiving Newton's technique in a way that is very close to what is used nowadays. The multidimensional version of Newton's method is due to Simpson [3] in 1740. See the survey paper by Ypma [5].
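
To make the mechanics of Algorithm 2.1.1 concrete, here is a minimal sketch in Python (not part of the thesis) that takes H_k to be the exact Hessian, i.e., Newton's method, and uses the unguarded step s_k = -H_k^{-1} g_k; the test function and tolerance are illustrative assumptions.

```python
import numpy as np

def basic_quasi_newton(f, grad, hess, x0, tol=1e-8, max_iter=50):
    """Algorithm 2.1.1 with H_k taken as the exact Hessian (Newton's method).

    No globalization is used, so convergence from a poor x0 is not guaranteed,
    which is exactly the issue the line-search and trust-region schemes address.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:          # stopping criterion ||g_k|| <= tol
            break
        H = hess(x)
        s = np.linalg.solve(H, -g)            # quasi-Newton step s_k = -H_k^{-1} g_k
        x = x + s                             # x_{k+1} = x_k + s_k
    return x

# Illustrative use on a smooth convex function (an assumption, not from the thesis):
f    = lambda x: (x[0] - 1)**2 + 10*(x[1] + 2)**2
grad = lambda x: np.array([2*(x[0] - 1), 20*(x[1] + 2)])
hess = lambda x: np.diag([2.0, 20.0])
print(basic_quasi_newton(f, grad, hess, np.zeros(2)))   # -> approximately [1, -2]
```

Without a globalization scheme this iteration is only locally reliable, which is the point developed in the next two sections.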

It is well known that the basic quasi-Newton algorithm is not globally convergent to a stationary point [39, Figure 6.3.2]. If we want to start with any choice of x_0 and still guarantee convergence, then we need a globalization strategy. The most often used globalization strategies for quasi-Newton algorithms are line searches and trust regions.

A line-search strategy requires a direction d_k from which a step is obtained. The step s_k is of the form λ_k d_k, where the step length λ_k is chosen in an appropriate way and d_k is a descent direction, i.e., d_k^T g_k < 0. If H_k is nonsingular, d_k = -H_k^{-1} g_k might be a reasonable choice.

The trust-region technique does not necessarily choose a specific pattern of directions. Here a step s_k is a sufficiently good approximate solution of the trust-region subproblem

    minimize   q_k(s)
    subject to ||s|| <= Δ_k,                                        (2.2)

where Δ_k is the trust radius. We will be more precise later. More general forms of this simple trust-region subproblem are considered in the papers [73], [], [3], [5], [36], [4], [53].

2.2 Line Searches

If a line search is used, one might ask the step s_k = λ_k d_k to satisfy the Armijo-Goldstein-Wolfe conditions:

    f(x_k + s_k) <= f(x_k) + η_1 g_k^T s_k,                          (2.3)
    ∇f(x_k + s_k)^T s_k >= η_2 g_k^T s_k,                            (2.4)

where η_1 and η_2 are constants fixed for all k and satisfying 0 < η_1 < η_2 < 1. Let θ_k denote the angle between d_k and -g_k, defined through

    cos(θ_k) = - d_k^T g_k / (||d_k|| ||g_k||),   θ_k in [0, π/2].

We now present the basic line-search algorithm and its classical convergence result.

Algorithm 2.2.1 (Basic Line-Search Algorithm)

1. Choose x_0, η_1, and η_2 such that 0 < η_1 < η_2 < 1.
2. For k = 0, 1, 2, ... do
   2.1 Stop if x_k satisfies the stopping criterion.
   2.2 Compute a direction d_k based on q_k(s).
   2.3 Compute s_k = λ_k d_k to satisfy (2.3) and (2.4), and set x_{k+1} = x_k + s_k.

A possible stopping criterion is ||g_k|| <= tol for some tol > 0.

Theorem 2.2.1 Let f be bounded below and ∇f be uniformly continuous. If, for all k, s_k = λ_k d_k satisfies (2.3)-(2.4) and the direction d_k is descent, then

    lim_{k -> +∞} cos(θ_k) ||g_k|| = 0.

Some of the ground work that led to this result was provided by Armijo [2] and Goldstein [65]. It was established by Wolfe [44], [45] and Zoutendijk [58], under the assumption that the gradient is Lipschitz continuous. However, this condition can be relaxed, and one can see that uniform continuity is enough (see Fletcher [53, Theorem 25]). Some practical line-search algorithms are described by Moré and Thuente [7]. For more references see also the books [39], [2], [6] and the review papers [4], [3].

From Theorem 2.2.1, a key ingredient to obtain global convergence to a stationary point is to keep the angle θ_k between -g_k and d_k uniformly bounded away from π/2. Now let us consider the case where H_k is nonsingular and d_k = -H_k^{-1} g_k. If the condition number κ(H_k) of the matrix H_k is uniformly bounded, i.e., if there exists a κ > 0 such that κ(H_k) <= κ for every k, then we have

    cos(θ_k) = g_k^T H_k^{-1} g_k / (||g_k|| ||H_k^{-1} g_k||) >= 1/κ.           (2.5)

One way of assuring that the direction -H_k^{-1} g_k is descent is to force H_k to be positive definite. The following corollary of Theorem 2.2.1 is a result of these considerations.

Corollary 2.2.1 Let f be bounded below and ∇f be uniformly continuous. If, for all k, H_k is positive definite, s_k = -λ_k H_k^{-1} g_k satisfies (2.3)-(2.4), and the condition number κ(H_k) of H_k is uniformly bounded, then {x_k} satisfies

    lim_{k -> +∞} ||g_k|| = 0.
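
As an illustration of the line-search idea, the following sketch (an assumption, not the thesis implementation) enforces only the sufficient-decrease condition (2.3) by backtracking; a production line search, such as the one of Moré and Thuente, would also enforce the curvature condition (2.4).

```python
import numpy as np

def armijo_backtracking(f, g, x, d, eta1=1e-4, shrink=0.5, max_halvings=50):
    """Pick a step length lam along the descent direction d so that
    f(x + lam*d) <= f(x) + eta1 * lam * g.dot(d), i.e. condition (2.3)."""
    assert g.dot(d) < 0, "d must be a descent direction"
    lam, fx, slope = 1.0, f(x), g.dot(d)
    for _ in range(max_halvings):
        if f(x + lam * d) <= fx + eta1 * lam * slope:
            return lam
        lam *= shrink                        # backtrack: shrink the trial step
    return lam                               # fallback: last (very small) trial step

# Illustrative use with the quasi-Newton direction d = -H^{-1} g (assumed data):
H = np.array([[4.0, 1.0], [1.0, 3.0]])
x = np.array([2.0, -1.0])
f = lambda z: 0.5 * z @ H @ z
g = H @ x
d = np.linalg.solve(H, -g)
lam = armijo_backtracking(f, g, x, d)
print(lam, f(x + lam * d) < f(x))            # step length and confirmation of decrease
```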

2.3 The Trust-Region Technique

The development of trust regions started with the work of Levenberg [93] (1944), Marquardt [97] (1963), and Goldfeld, Quandt, and Trotter [64] (1966). A few years later Powell [2], [2] (1970, 1975), Hebden [7] (1973), and Moré [2] (1978) opened the field of research in this area. Trust-region algorithms are efficient and robust techniques to solve unconstrained optimization problems. An excellent survey in this area was written by Moré [3] in 1983.

Let us describe how the trust-region technique works. A step s_k has to decrease the quadratic model q_k(s) from s = 0 to s = s_k. The way s_k is computed determines the magnitude of the predicted decrease q_k(0) - q_k(s_k) and influences the type of global convergence of the trust-region algorithm. One can ask s_k to satisfy two classical conditions, either fraction of Cauchy decrease (simple decrease) or fraction of optimal decrease. The first condition forces the predicted decrease to be at least as large as a fraction of the decrease given for q_k(s) by the Cauchy step c_k. This step is defined as the solution of the one-dimensional problem

    minimize   q_k(c)
    subject to ||c|| <= Δ_k,  c in span{-g_k},

and it is given by

    c_k = -(||g_k||² / (g_k^T H_k g_k)) g_k    if g_k^T H_k g_k > 0 and ||g_k||³/(g_k^T H_k g_k) <= Δ_k,
          -(Δ_k / ||g_k||) g_k                  otherwise.                           (2.6)

The primitive form of a steepest-descent algorithm was discovered by Cauchy [2] in 1847. The step c_k is called the Cauchy step because the direction -g_k is the steepest-descent direction for q_k(s) at s = 0 in the ℓ2 norm, i.e., -g_k/||g_k|| is the solution of

    minimize   g_k^T d
    subject to ||d|| = 1.

The step s_k is said to satisfy a fraction of Cauchy decrease for the trust-region subproblem (2.2) if

    q_k(0) - q_k(s_k) >= β_1 (q_k(0) - q_k(c_k)),   ||s_k|| <= Δ_k,               (2.7)

where β_1 is positive and fixed across all iterations. The following lemma expresses this decrease condition in a way that is very convenient to prove global convergence to a stationary point.

Lemma 2.3.1 (Powell [2]) If s_k satisfies the fraction of Cauchy decrease (2.7), then

    q_k(0) - q_k(s_k) >= (β_1/2) ||g_k|| min{ ||g_k||/||H_k||, Δ_k }.

Proof. Define ψ : R⁺ -> R as ψ(t) = q_k(-t g_k/||g_k||) - q_k(0). Then ψ(t) = -||g_k|| t + (r_k/2) t², where r_k = g_k^T H_k g_k/||g_k||². Let t_* be the minimizer of ψ in [0, Δ_k]. If t_* is in (0, Δ_k), then

    ψ(t_*) = -||g_k||²/(2 r_k) <= -||g_k||²/(2 ||H_k||).                           (2.8)

If t_* = Δ_k, then either r_k > 0, in which case ||g_k||/r_k >= Δ_k, or r_k <= 0. In either event,

    ψ(t_*) = ψ(Δ_k) = -||g_k|| Δ_k + (r_k/2) Δ_k² <= -(1/2) ||g_k|| Δ_k.          (2.9)

We can combine (2.8) and (2.9) with

    q_k(0) - q_k(s_k) >= β_1 (q_k(0) - q_k(c_k)) = -β_1 ψ(t_*)

to get the desired result.
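
The Cauchy step (2.6) and the bound of Lemma 2.3.1 are easy to check numerically. The following sketch (illustrative data, not from the thesis) computes c_k and verifies the predicted-decrease bound with β_1 = 1.

```python
import numpy as np

def cauchy_step(g, H, delta):
    """Cauchy step (2.6): minimizer of the quadratic model along -g,
    subject to the trust-region constraint ||c|| <= delta."""
    gHg = g @ H @ g
    gnorm = np.linalg.norm(g)
    if gHg > 0 and gnorm**3 / gHg <= delta:
        return -(gnorm**2 / gHg) * g          # interior minimizer along -g
    return -(delta / gnorm) * g               # step to the trust-region boundary

def model_decrease(g, H, s):
    """q_k(0) - q_k(s) for the model q_k(s) = f_k + g^T s + 0.5 s^T H s."""
    return -(g @ s + 0.5 * s @ H @ s)

# Illustrative data (an assumption, not from the thesis):
g = np.array([1.0, -2.0])
H = np.array([[2.0, 0.0], [0.0, -1.0]])       # indefinite model Hessian
delta = 0.5
c = cauchy_step(g, H, delta)
lhs = model_decrease(g, H, c)
rhs = 0.5 * np.linalg.norm(g) * min(np.linalg.norm(g) / np.linalg.norm(H, 2), delta)
print(lhs >= rhs - 1e-12)                     # Lemma 2.3.1 bound with beta_1 = 1
```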

The second condition is more stringent and relates the predicted decrease to the decrease given on q_k(s) by the optimal solution o_k of the trust-region subproblem (2.2). The step s_k is said to satisfy a fraction of optimal decrease for the trust-region subproblem (2.2) if

    q_k(0) - q_k(s_k) >= β_2 (q_k(0) - q_k(o_k)),   ||s_k|| <= β_3 Δ_k,           (2.10)

where β_2 and β_3 are positive and fixed across all iterations. The condition ||s_k|| <= β_3 Δ_k replaces the condition ||s_k|| <= Δ_k in (2.7). There is no need to have a parameter like β_3 in (2.7) since the algorithms that compute steps satisfying only a fraction of Cauchy decrease do not cross the boundary of the trust region. An important point here is that if one sets out in practice to exactly solve (2.2), one will satisfy (2.10).

2.3.1 How to Compute a Step

Several algorithms were proposed to compute a step s_k that satisfies the fraction of Cauchy decrease (2.7). The first is due to Powell [2], and it is called the dogleg algorithm. The idea behind this algorithm is very simple and is described below.

Algorithm 2.3.1 (Dogleg Algorithm (H_k Positive Definite))

1. Compute the Cauchy step c_k. If ||c_k|| = Δ_k, then set s_k = c_k. Otherwise compute the quasi-Newton step -H_k^{-1} g_k, and if it is inside the trust region, set s_k = -H_k^{-1} g_k. If not, consider the convex combination s_k(τ) = (1 - τ) c_k - τ H_k^{-1} g_k, τ in [0, 1], and pick τ_k such that ||s_k(τ_k)|| = Δ_k. Set s_k = s_k(τ_k).

A dogleg step is depicted in Figure 2.1 for a value of τ_k strictly between zero and one. The dogleg algorithm is well defined for H_k positive definite (see for instance [39]) and can be extended to the case where H_k is indefinite. A possible way to accomplish this is to generalize the use of the classical conjugate-gradient algorithm of Hestenes and Stiefel [78] for the solution of the linear system H_k s = -g_k with H_k positive definite. Steihaug [34] and Toint [39] adapted this algorithm for the solution of the trust-region subproblem (2.2). Here two new situations have to be considered. First, H_k might not be positive definite. This can be fixed by stopping the conjugate-gradient loop when the first direction of nonpositive curvature is found and using this direction to move to the boundary of the trust region. The other situation happens when an iterate of the conjugate-gradient algorithm passes the boundary of the trust region. Here the dogleg idea can be used to stop at the boundary of the trust region. This latter situation is illustrated in Figure 2.1. The conjugate-gradient algorithm is given below.

Figure 2.1 A dogleg (at the left) and a conjugate-gradient (at the right) step inside a trust region. To better illustrate the conjugate-gradient algorithm, the number of iterations is set to three, which of course exceeds the number of iterations for finite termination.

Algorithm 2.3.2 (Conjugate-Gradient Algorithm for Trust Regions)

1. Set s_k^0 = 0, r_0 = -g_k, and d_0 = r_0; pick ε > 0.
2. For i = 0, 1, 2, ... do
   2.1 Compute γ_i = (r_i^T r_i) / (d_i^T H_k d_i).
   2.2 Compute τ_i such that ||s_k^i + τ_i d_i|| = Δ_k.
   2.3 If γ_i <= 0, or if γ_i > τ_i, then set s_k = s_k^i + τ_i d_i and stop; otherwise set s_k^{i+1} = s_k^i + γ_i d_i.
   2.4 Update the residual: r_{i+1} = r_i - γ_i H_k d_i.
   2.5 Check the truncation criterion: if ||r_{i+1}|| <= ε ||r_0||, set s_k = s_k^{i+1} and stop.
   2.6 Compute α_i = (r_{i+1}^T r_{i+1}) / (r_i^T r_i) and the new direction d_{i+1} = r_{i+1} + α_i d_i.

The following proposition characterizes the type of step computed by these two algorithms.

Proposition 2.3.1 The Dogleg Algorithm 2.3.1 and the Conjugate-Gradient Algorithm 2.3.2 compute steps s_k that satisfy the Cauchy decrease condition (2.7) with β_1 = 1.
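
A compact Python sketch of Algorithm 2.3.2 is given below (again an illustration, not the thesis code); it handles the two extra situations mentioned above, nonpositive curvature and crossing of the trust-region boundary.

```python
import numpy as np

def steihaug_cg(H, g, delta, eps=1e-10, max_iter=None):
    """Conjugate-gradient (Steihaug-Toint) solver for the trust-region
    subproblem (2.2): minimize g^T s + 0.5 s^T H s subject to ||s|| <= delta."""
    n = len(g)
    max_iter = max_iter or 2 * n
    s, r = np.zeros(n), -g.copy()
    d = r.copy()
    r0_norm = np.linalg.norm(r)
    for _ in range(max_iter):
        Hd = H @ d
        curv = d @ Hd
        # tau > 0 with ||s + tau*d|| = delta (positive root of a quadratic in tau)
        a, b, c = d @ d, 2 * s @ d, s @ s - delta**2
        tau = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
        if curv <= 0:                      # nonpositive curvature: go to the boundary
            return s + tau * d
        gamma = (r @ r) / curv
        if gamma > tau:                    # full CG step would leave the region
            return s + tau * d
        s = s + gamma * d
        r_new = r - gamma * Hd
        if np.linalg.norm(r_new) <= eps * r0_norm:
            return s                       # truncation criterion met
        alpha = (r_new @ r_new) / (r @ r)
        d = r_new + alpha * d
        r = r_new
    return s

# Illustrative use (assumed data): an indefinite H forces a boundary step.
H = np.array([[1.0, 0.0], [0.0, -2.0]])
g = np.array([1.0, 1.0])
s = steihaug_cg(H, g, delta=1.0)
print(s, np.linalg.norm(s))                # step on the trust-region boundary
```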

For both algorithms the proof relies on the fact that they start by minimizing the quadratic model q_k(s) along the steepest-descent direction -g_k. The proof for the dogleg algorithm depends strongly on the positive definiteness of H_k and can be found in [39]. The proof for conjugate gradients is given in [34] and uses the fact that s_k^{i+1} is the optimal solution of the quadratic q_k(s) in the Krylov subspace

    K_i(H_k, -g_k) = span{ -g_k, -H_k g_k, ..., -(H_k)^{i-1} g_k }.

Other generalizations of the dogleg idea were suggested in the literature. Dennis and Mei [37] proposed the so-called double dogleg algorithm. Byrd, Schnabel, and Shultz [8], [3] introduced indefinite dogleg algorithms using two-dimensional subspaces.

Now we turn our attention to algorithms for computing steps s_k that satisfy the fraction of optimal decrease (2.10). Typically these algorithms are based on Newton type iterations and rely on the following propositions.

Proposition 2.3.2 The trust-region subproblem (2.2) has no solutions at the boundary {s : ||s|| = Δ_k} if and only if H_k is positive definite and ||H_k^{-1} g_k|| < Δ_k.

A proof of this simple fact can be found in [6].

Proposition 2.3.3 (Gay [56] and Sorensen [32]) The step o_k is an optimal solution of the trust-region subproblem (2.2) if and only if ||o_k|| <= Δ_k and there exists μ_k >= 0 such that

    H_k + μ_k I_n is positive semi-definite,                        (2.11)
    (H_k + μ_k I_n) o_k = -g_k,   and                               (2.12)
    μ_k (Δ_k - ||o_k||) = 0.                                        (2.13)

The optimal solution o_k is unique if H_k + μ_k I_n is positive definite.

The necessary part of these conditions can be seen as an application of a powerful tool of Lagrange multiplier theory, the so-called Karush-Kuhn-Tucker optimality conditions, to the trust-region subproblem (2.2). These conditions are stated in Propositions 4.4.1 and 4.4.2. The parameter μ_k is the Lagrange multiplier associated with the trust-region constraint ||s||² <= Δ_k².

The gradient with respect to s of the Lagrangian function

    ℓ(s, μ) = q_k(s) - (μ/2)(Δ_k² - ||s||²)

is zero if and only if (2.12) holds. Condition (2.13) is the complementarity condition. Conditions (2.12), (2.13), μ_k >= 0, and ||o_k|| <= Δ_k are the first-order necessary optimality conditions. If we add (2.11) we get the second-order necessary optimality conditions. Of course Proposition 2.3.3 says that these conditions are also sufficient, but this part does not follow from the Karush-Kuhn-Tucker theory.

As a consequence of Proposition 2.3.3 we can write

    q_k(0) - q_k(o_k) = (1/2) ||R_k o_k||² + (μ_k/2) Δ_k²,

where H_k + μ_k I_n = R_k^T R_k. From this we have the following lemma.

Lemma 2.3.2 If s_k satisfies the fraction of optimal decrease (2.10), then

    q_k(0) - q_k(s_k) >= (β_2/2) ( ||R_k o_k||² + μ_k Δ_k² ).

One can compare Lemmas 2.3.1 and 2.3.2 and see how the two decrease conditions (2.7) and (2.10) influence the accuracy of the predicted decrease q_k(0) - q_k(s_k). Both lemmas are critical for proving global convergence results.

It follows from Propositions 2.3.2 and 2.3.3 that finding the optimal solution of the trust-region subproblem (2.2) is equivalent, in all cases but one, to finding μ >= 0 such that H_k + μ I_n is positive semi-definite and

    φ_1(μ) = ||s(μ)|| - Δ_k = 0,                                    (2.14)

where s(μ) satisfies (H_k + μ I_n) s(μ) = -g_k. The root finding problem (2.14) is usually solved by applying Newton's method to the equation

    φ_2(μ) = 1/Δ_k - 1/||s(μ)|| = 0.                                (2.15)

It can be shown that both functions φ_1 and φ_2 are convex and strictly decreasing in (-λ_min(H_k), +∞), where λ_min(H_k) denotes the smallest eigenvalue of H_k. Reinsch [25] and Hebden [7] were the first to observe that Newton's method performs better when applied to (2.15). The reason is that φ_1 has a pole at -λ_min(H_k), whereas φ_2 is nearly linear in (-λ_min(H_k), +∞).
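
The following sketch illustrates this Newton iteration on φ_2 in the easy case, using plain dense solves in place of the Cholesky factorization R_k^T R_k = H_k + μ I_n that a Moré-Sorensen type implementation would reuse; the data are assumed for illustration.

```python
import numpy as np

def newton_on_secular(H, g, delta, mu0=0.0, tol=1e-10, max_iter=30):
    """Newton's method applied to phi_2(mu) = 1/delta - 1/||s(mu)|| (easy case only:
    H + mu*I is assumed to stay positive definite along the iteration)."""
    n = len(g)
    mu = mu0
    for _ in range(max_iter):
        A = H + mu * np.eye(n)
        s = np.linalg.solve(A, -g)                 # s(mu) = -(H + mu*I)^{-1} g
        ns = np.linalg.norm(s)
        if abs(ns - delta) <= tol * delta:
            break
        qsq = s @ np.linalg.solve(A, s)            # ||q||^2 where R^T q = s
        # Newton update for phi_2: increases mu while ||s(mu)|| > delta
        mu = max(mu + (ns**2 / qsq) * (ns - delta) / delta, 0.0)
    return mu, s

# Illustrative data (an assumption): the unconstrained Newton step lies outside the region.
H = np.array([[2.0, 0.0], [0.0, 10.0]])
g = np.array([1.0, 1.0])
mu, s = newton_on_secular(H, g, delta=0.1)
print(mu, np.linalg.norm(s))                       # ||s(mu)|| is approximately delta
```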

A Newton iteration for these root finding equations faces numerical problems if μ is very close to -λ_min(H_k) or if the so-called hard case occurs. The hard case is characterized by the following two conditions: (a) g_k is orthogonal to the eigenspace associated with λ_min(H_k), and (b) ||(H_k + μ I_n)^{-1} g_k|| < Δ_k for all μ > -λ_min(H_k). If the hard case occurs, the rightmost root of (2.15) is such that H_k + μ I_n is indefinite. Hence Newton's iteration has to be modified if one wants to compute a μ_k such that conditions (2.11)-(2.13) hold. In the hard case, a solution o_k for the trust-region subproblem (2.2) is given by

    o_k = p_k + θ_k q_k,                                            (2.16)

where p_k solves (H_k - λ_min(H_k) I_n) p_k = -g_k, the vector q_k is an eigenvector corresponding to λ_min(H_k), and θ_k is such that ||p_k + θ_k q_k|| = Δ_k.

Moré and Sorensen [6] proposed an algorithm that combines the application of Newton's method to (2.15) for the easy case with (2.16) for the hard case. They showed that the algorithm computes a step s_k satisfying the fraction of optimal decrease (2.10). Their algorithm and the corresponding Fortran implementation GQTPAR are based on previous work done by Gay [56] and Sorensen [32].

To compute φ_2(μ) and φ_2'(μ), algorithms of the Moré and Sorensen type require a Cholesky factorization R^T R of H_k + μ I_n whenever this matrix is positive definite. In fact, if we solve R^T R s = -g_k and R^T q = s, we have

    φ_2(μ) = 1/Δ_k - 1/||s||   and   φ_2'(μ) = -||q||²/||s||³.

In large problems the computation of the Cholesky factorization might not be practical. Recent new algorithms to compute a step that satisfies a fraction of optimal decrease, which are very promising for large problems, have been proposed by Rendl and Wolkowicz [26], Sorensen [33], and Santos and Sorensen [29]. They rely on different parametrizations of the trust-region subproblem (2.2). Instead of a Cholesky factorization, these algorithms require only matrix-vector products. The material in the following paragraph follows the exposition in [29], [33].

The motivation for the new parametrization is that

    α/2 + g_k^T s + (1/2) s^T H_k s = (1/2) (1, s^T) B_k(α) (1, s^T)^T,
        where B_k(α) = [ α     g_k^T ]
                       [ g_k   H_k   ].                             (2.17)

The new one-dimensional function is φ_3(λ) = g_k^T (H_k - λ I_n)^{-1} g_k, evaluated at an eigenvalue of the bordered matrix B_k(α) and depending on the parameter α. Let λ_1(α) be the smallest eigenvalue of the bordered matrix given in (2.17). The hard case occurs when the eigenvectors of the bordered matrix associated with λ_1(α) have zero in their first component. If this is not the case, i.e., if there exists an s such that

    B_k(α) (1, s^T)^T = λ_1(α) (1, s^T)^T,

then we have

    H_k - λ_1(α) I_n is positive semi-definite,
    (H_k - λ_1(α) I_n) s = -g_k,
    φ_3(λ_1(α)) = -g_k^T s,   and   (d/dλ) φ_3(λ_1(α)) = ||s||².    (2.18)

From (2.18) we can see that solving the trust-region subproblem (2.2) is equivalent to finding α such that (d/dλ) φ_3(λ_1(α)) = ||s||² = Δ_k². If the corresponding multiplier -λ_1(α) is nonnegative, then the corresponding s is the optimal solution of the trust-region subproblem (2.2). The parameter α can be found by using interpolating schemes. If the trust-region subproblem (2.2) has an unconstrained minimizer, then during the process of choosing α a positive λ_1(α) is found such that ||s|| < Δ_k. In this case H_k is positive definite, -H_k^{-1} g_k is inside the trust region, and the conjugate-gradient algorithm can be used to solve H_k s = -g_k.

2.3.2 The Trust-Region Algorithm

The predicted decrease pred(s_k) given by s_k is defined as q_k(0) - q_k(s_k). The actual decrease ared(s_k) is given by f(x_k) - f(x_k + s_k). The trust-region strategy relates the acceptance of s_k with the ratio

    ratio(s_k) = ared(s_k) / pred(s_k).

We have the following basic trust-region algorithm.

Algorithm 2.3.3 (Basic Trust-Region Algorithm)

1. Choose x_0, Δ_0, α_1, and η such that Δ_0 > 0, 0 < α_1 < 1, and 0 < η < 1.
2. For k = 0, 1, 2, ... do
   2.1 Stop if x_k satisfies the stopping criterion.
   2.2 Compute a step s_k based on the subproblem (2.2).
   2.3 If ratio(s_k) < η, reject s_k, set Δ_{k+1} = α_1 ||s_k|| and x_{k+1} = x_k. If ratio(s_k) >= η, accept s_k, choose Δ_{k+1} >= Δ_k, and set x_{k+1} = x_k + s_k.

Of course the rules to update the trust radius can be much more involved to enhance efficiency, but the above suffices to prove convergence results and to understand the trust-region mechanism. Two reasonable stopping criteria are ||g_k|| <= tol and ||g_k|| + μ_k <= tol for a given tol > 0, where μ_k is the Lagrange multiplier associated with the trust-region constraint ||s|| <= Δ_k, as described in Proposition 2.3.3. The former criterion forces global convergence to a stationary point (see Theorem 2.3.1), and the latter forces global convergence to a point satisfying the second-order necessary optimality conditions (see Theorem 2.3.3).
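
A minimal sketch of Algorithm 2.3.3 follows (not the thesis implementation); it uses the Cauchy step (2.6) as the inner solver, which is enough to satisfy the fraction of Cauchy decrease (2.7) and hence to fall under Theorem 2.3.1 below. The radius-update rules and the test function are illustrative assumptions.

```python
import numpy as np

def trust_region(f, grad, hess, x0, delta0=1.0, eta=0.1, alpha1=0.5,
                 tol=1e-8, max_iter=200):
    """Algorithm 2.3.3 with the Cauchy step (2.6) as the inner solver."""
    x, delta = np.asarray(x0, float), delta0
    for k in range(max_iter):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) <= tol:
            break
        # Cauchy step (2.6)
        gHg, gn = g @ H @ g, np.linalg.norm(g)
        if gHg > 0 and gn**3 / gHg <= delta:
            s = -(gn**2 / gHg) * g
        else:
            s = -(delta / gn) * g
        pred = -(g @ s + 0.5 * s @ H @ s)          # q_k(0) - q_k(s_k)
        ared = f(x) - f(x + s)                     # f(x_k) - f(x_k + s_k)
        if ared / pred >= eta:                     # accept the step
            x = x + s
            delta = max(delta, 2 * np.linalg.norm(s))
        else:                                      # reject the step and shrink the region
            delta = alpha1 * np.linalg.norm(s)
    return x

# Illustrative use on a convex quadratic (an assumption, not from the thesis):
f    = lambda x: (x[0] - 3)**2 + 5*(x[1] + 1)**2
grad = lambda x: np.array([2*(x[0] - 3), 10*(x[1] + 1)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 10.0]])
print(trust_region(f, grad, hess, np.zeros(2)))    # -> approximately [3, -1]
```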

2.3.3 Global Convergence Results

Global convergence of trust-region algorithms to stationary points for unconstrained optimization is summarized in Theorems 2.3.1 and 2.3.2.

Theorem 2.3.1 (Powell [2]) Let {x_k} be a sequence generated by the Trust-Region Algorithm 2.3.3, where s_k satisfies the fraction of Cauchy decrease (2.7). Let f be continuously differentiable and bounded below in L(x_0) = {x in R^n : f(x) <= f(x_0)}. If {H_k} is bounded, then

    lim inf_{k -> +∞} ||g_k|| = 0.                                  (2.19)

Theorem 2.3.2 (Thomas [37]) If, in addition to the assumptions of Theorem 2.3.1, ∇f is uniformly continuous in L(x_0), then

    lim_{k -> +∞} ||g_k|| = 0.

The proofs of these theorems can be found in [3]. We remark that Powell in [2] proved (2.19) for a slightly different update of the trust radius. The assumption on the Hessian approximation H_k can be weakened. Powell [22] proved a convergence result in the case where there is a bound on the second-order approximation H_k that depends linearly on the iteration counter k. Carter [9] established analogous results for the case where the gradients g_k = ∇f(x_k) are approximated rather than computed exactly.

If H_k = ∇²f(x_k) and s_k satisfies the fraction of optimal decrease (2.10) for every k, then it is also possible to analyze the global convergence of the Trust-Region Algorithm 2.3.3 to a point satisfying the second-order necessary optimality conditions.

Theorem 2.3.3 (Moré and Sorensen [6], [32]) Let {x_k} be a sequence generated by the Trust-Region Algorithm 2.3.3 with H_k = ∇²f(x_k), where s_k satisfies the fraction of optimal decrease (2.10). Let f be twice continuously differentiable and bounded below in the level set L(x_0). If the sequences {x_k} and {H_k} are bounded, then

    lim inf_{k -> +∞} ( ||g_k|| + μ_k ) = 0

and {x_k} has a limit point x_* such that ∇²f(x_*) is positive semi-definite.

Moré [3] showed how to generalize these theorems for trust-region constraints of the form ||S_k s|| <= Δ_k, where {S_k} is a sequence of nonsingular scaling matrices. Related results can be found in references [56], [6], [3], [32].

2.3.4 Tikhonov Regularization

In this section we show how the Tikhonov regularization [38] for ill-conditioned linear least-squares problems is related to a particular trust-region subproblem. This is one of many arguments that justify the use of trust regions as a regularization technique. A different argument is given in the next section.

In many applications, like reconstruction and parameter identification problems, the objective function in (2.1) comes from the discretization of infinite dimensional problems of the form

    minimize ||Ax - b||²_Y,                                         (2.20)

where x in X, b in Y, and A in L(X, Y) is a linear bounded operator mapping the real Hilbert space X into the real Hilbert space Y. There are situations where, due to the lack of an inverse or a continuous inverse for A, the solution of (2.20) does not depend continuously on b (see for instance [69]). When a discretization is introduced, this type of problem leads to finite dimensional problems of the form (2.1), where f(x) = ||Ax - b||² and A is ill-conditioned. (Here A in R^{m x n} and b in R^m, with m > n.)

A common technique to overcome this ill-posedness is the Tikhonov regularization. This regularization consists of solving a perturbed problem of the form

    minimize ||Ax - b||²_Y + μ ||Lx||²_X,                           (2.21)

where μ is a positive regularization parameter and L is in L(X, X). To ensure the existence and uniqueness of the solution for (2.21), it is assumed that L is such that for every μ > 0 there exists a c_μ > 0 that satisfies

    ||Ax||²_Y + μ ||Lx||²_X >= c_μ ||x||²_X    for all x in X.

See [72]. One can see, by looking at the gradient of ||Ax - b||²_Y + μ ||Lx||²_X, that the Tikhonov regularization is strongly related to the trust-region subproblem in infinite dimensions:

    minimize   ||Ax - b||²_Y
    subject to ||Lx||_X <= δ,                                       (2.22)

where δ > 0. In fact, if x_* is the solution of (2.22) with ||Lx_*||_X = δ, then x_* is the solution of (2.21) with μ = μ_*, where μ_* is the positive Lagrange multiplier for (2.22) associated with x_*. On the other hand, if x_* is the solution of (2.21) with μ = μ_* > 0, then x_* is the solution of (2.22) with δ = ||Lx_*||_X and Lagrange multiplier μ_*.

2.4 More about Line Searches and Trust Regions

We now point out interesting relationships between line searches and trust regions. A major difference between the global convergence results given in Corollary 2.2.1 and Theorem 2.3.2 is that a uniform bound on ||H_k^{-1}|| is required for line searches but


More information

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) Contents 1 Vector Spaces 1 1.1 The Formal Denition of a Vector Space.................................. 1 1.2 Subspaces...................................................

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

Optimal Newton-type methods for nonconvex smooth optimization problems

Optimal Newton-type methods for nonconvex smooth optimization problems Optimal Newton-type methods for nonconvex smooth optimization problems Coralia Cartis, Nicholas I. M. Gould and Philippe L. Toint June 9, 20 Abstract We consider a general class of second-order iterations

More information

Optimization Methods. Lecture 18: Optimality Conditions and. Gradient Methods. for Unconstrained Optimization

Optimization Methods. Lecture 18: Optimality Conditions and. Gradient Methods. for Unconstrained Optimization 5.93 Optimization Methods Lecture 8: Optimality Conditions and Gradient Methods for Unconstrained Optimization Outline. Necessary and sucient optimality conditions Slide. Gradient m e t h o d s 3. The

More information

MS&E 318 (CME 338) Large-Scale Numerical Optimization

MS&E 318 (CME 338) Large-Scale Numerical Optimization Stanford University, Management Science & Engineering (and ICME) MS&E 318 (CME 338) Large-Scale Numerical Optimization 1 Origins Instructor: Michael Saunders Spring 2015 Notes 9: Augmented Lagrangian Methods

More information

Nonlinear Optimization: What s important?

Nonlinear Optimization: What s important? Nonlinear Optimization: What s important? Julian Hall 10th May 2012 Convexity: convex problems A local minimizer is a global minimizer A solution of f (x) = 0 (stationary point) is a minimizer A global

More information

A REDUCED HESSIAN METHOD FOR LARGE-SCALE CONSTRAINED OPTIMIZATION by Lorenz T. Biegler, Jorge Nocedal, and Claudia Schmid ABSTRACT We propose a quasi-

A REDUCED HESSIAN METHOD FOR LARGE-SCALE CONSTRAINED OPTIMIZATION by Lorenz T. Biegler, Jorge Nocedal, and Claudia Schmid ABSTRACT We propose a quasi- A REDUCED HESSIAN METHOD FOR LARGE-SCALE CONSTRAINED OPTIMIZATION by Lorenz T. Biegler, 1 Jorge Nocedal, 2 and Claudia Schmid 1 1 Chemical Engineering Department, Carnegie Mellon University, Pittsburgh,

More information

INDEFINITE TRUST REGION SUBPROBLEMS AND NONSYMMETRIC EIGENVALUE PERTURBATIONS. Ronald J. Stern. Concordia University

INDEFINITE TRUST REGION SUBPROBLEMS AND NONSYMMETRIC EIGENVALUE PERTURBATIONS. Ronald J. Stern. Concordia University INDEFINITE TRUST REGION SUBPROBLEMS AND NONSYMMETRIC EIGENVALUE PERTURBATIONS Ronald J. Stern Concordia University Department of Mathematics and Statistics Montreal, Quebec H4B 1R6, Canada and Henry Wolkowicz

More information

118 Cores, D. and Tapia, R. A. A Robust Choice of the Lagrange Multiplier... For example: chemical equilibrium and process control oil extraction, ble

118 Cores, D. and Tapia, R. A. A Robust Choice of the Lagrange Multiplier... For example: chemical equilibrium and process control oil extraction, ble A Robust Choice of the Lagrange Multiplier in the SQP Newton Method Debora Cores 1 Richard A. Tapia 2 1 INTEVEP, S.A. (Research Center of the Venezuelan Oil Company) Apartado 76343, Caracas 1070-A, Venezuela

More information

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING XIAO WANG AND HONGCHAO ZHANG Abstract. In this paper, we propose an Augmented Lagrangian Affine Scaling (ALAS) algorithm for general

More information

Implementation of an Interior Point Multidimensional Filter Line Search Method for Constrained Optimization

Implementation of an Interior Point Multidimensional Filter Line Search Method for Constrained Optimization Proceedings of the 5th WSEAS Int. Conf. on System Science and Simulation in Engineering, Tenerife, Canary Islands, Spain, December 16-18, 2006 391 Implementation of an Interior Point Multidimensional Filter

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

17 Solution of Nonlinear Systems

17 Solution of Nonlinear Systems 17 Solution of Nonlinear Systems We now discuss the solution of systems of nonlinear equations. An important ingredient will be the multivariate Taylor theorem. Theorem 17.1 Let D = {x 1, x 2,..., x m

More information

Lecture 15: SQP methods for equality constrained optimization

Lecture 15: SQP methods for equality constrained optimization Lecture 15: SQP methods for equality constrained optimization Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lecture 15: SQP methods for equality constrained

More information

On the Convergence of Newton Iterations to Non-Stationary Points Richard H. Byrd Marcelo Marazzi y Jorge Nocedal z April 23, 2001 Report OTC 2001/01 Optimization Technology Center Northwestern University,

More information

An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization

An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with Travis Johnson, Northwestern University Daniel P. Robinson, Johns

More information

A globally convergent Levenberg Marquardt method for equality-constrained optimization

A globally convergent Levenberg Marquardt method for equality-constrained optimization Computational Optimization and Applications manuscript No. (will be inserted by the editor) A globally convergent Levenberg Marquardt method for equality-constrained optimization A. F. Izmailov M. V. Solodov

More information

Infeasibility Detection and an Inexact Active-Set Method for Large-Scale Nonlinear Optimization

Infeasibility Detection and an Inexact Active-Set Method for Large-Scale Nonlinear Optimization Infeasibility Detection and an Inexact Active-Set Method for Large-Scale Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with James V. Burke, University of Washington Daniel

More information

Projected Gradient Methods for NCP 57. Complementarity Problems via Normal Maps

Projected Gradient Methods for NCP 57. Complementarity Problems via Normal Maps Projected Gradient Methods for NCP 57 Recent Advances in Nonsmooth Optimization, pp. 57-86 Eds..-Z. u, L. Qi and R.S. Womersley c1995 World Scientic Publishers Projected Gradient Methods for Nonlinear

More information

j=1 r 1 x 1 x n. r m r j (x) r j r j (x) r j (x). r j x k

j=1 r 1 x 1 x n. r m r j (x) r j r j (x) r j (x). r j x k Maria Cameron Nonlinear Least Squares Problem The nonlinear least squares problem arises when one needs to find optimal set of parameters for a nonlinear model given a large set of data The variables x,,

More information

Chapter 3 Transformations

Chapter 3 Transformations Chapter 3 Transformations An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Linear Transformations A function is called a linear transformation if 1. for every and 2. for every If we fix the bases

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

8 Numerical methods for unconstrained problems

8 Numerical methods for unconstrained problems 8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields

More information

Numerical Optimization: Basic Concepts and Algorithms

Numerical Optimization: Basic Concepts and Algorithms May 27th 2015 Numerical Optimization: Basic Concepts and Algorithms R. Duvigneau R. Duvigneau - Numerical Optimization: Basic Concepts and Algorithms 1 Outline Some basic concepts in optimization Some

More information

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems 1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

A trust region method based on interior point techniques for nonlinear programming

A trust region method based on interior point techniques for nonlinear programming Math. Program., Ser. A 89: 149 185 2000 Digital Object Identifier DOI 10.1007/s101070000189 Richard H. Byrd Jean Charles Gilbert Jorge Nocedal A trust region method based on interior point techniques for

More information

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical

More information

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality

More information

IE 5531: Engineering Optimization I

IE 5531: Engineering Optimization I IE 5531: Engineering Optimization I Lecture 15: Nonlinear optimization Prof. John Gunnar Carlsson November 1, 2010 Prof. John Gunnar Carlsson IE 5531: Engineering Optimization I November 1, 2010 1 / 24

More information

system of equations. In particular, we give a complete characterization of the Q-superlinear

system of equations. In particular, we give a complete characterization of the Q-superlinear INEXACT NEWTON METHODS FOR SEMISMOOTH EQUATIONS WITH APPLICATIONS TO VARIATIONAL INEQUALITY PROBLEMS Francisco Facchinei 1, Andreas Fischer 2 and Christian Kanzow 3 1 Dipartimento di Informatica e Sistemistica

More information

University of Maryland at College Park. limited amount of computer memory, thereby allowing problems with a very large number

University of Maryland at College Park. limited amount of computer memory, thereby allowing problems with a very large number Limited-Memory Matrix Methods with Applications 1 Tamara Gibson Kolda 2 Applied Mathematics Program University of Maryland at College Park Abstract. The focus of this dissertation is on matrix decompositions

More information

Lecture Notes: Geometric Considerations in Unconstrained Optimization

Lecture Notes: Geometric Considerations in Unconstrained Optimization Lecture Notes: Geometric Considerations in Unconstrained Optimization James T. Allison February 15, 2006 The primary objectives of this lecture on unconstrained optimization are to: Establish connections

More information

Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization

Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization Clément Royer - University of Wisconsin-Madison Joint work with Stephen J. Wright MOPTA, Bethlehem,

More information

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α

More information

Numerical optimization

Numerical optimization Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf.

Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then (1) if η = 0 in the algorithm Trust Region, then lim inf. Maria Cameron 1. Trust Region Methods At every iteration the trust region methods generate a model m k (p), choose a trust region, and solve the constraint optimization problem of finding the minimum of

More information

Optimization and Root Finding. Kurt Hornik

Optimization and Root Finding. Kurt Hornik Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding

More information

Numerical Methods for PDE-Constrained Optimization

Numerical Methods for PDE-Constrained Optimization Numerical Methods for PDE-Constrained Optimization Richard H. Byrd 1 Frank E. Curtis 2 Jorge Nocedal 2 1 University of Colorado at Boulder 2 Northwestern University Courant Institute of Mathematical Sciences,

More information

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares Robert Bridson October 29, 2008 1 Hessian Problems in Newton Last time we fixed one of plain Newton s problems by introducing line search

More information

Constrained optimization: direct methods (cont.)

Constrained optimization: direct methods (cont.) Constrained optimization: direct methods (cont.) Jussi Hakanen Post-doctoral researcher jussi.hakanen@jyu.fi Direct methods Also known as methods of feasible directions Idea in a point x h, generate a

More information

Algorithms for Constrained Optimization

Algorithms for Constrained Optimization 1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic

More information

Arc Search Algorithms

Arc Search Algorithms Arc Search Algorithms Nick Henderson and Walter Murray Stanford University Institute for Computational and Mathematical Engineering November 10, 2011 Unconstrained Optimization minimize x D F (x) where

More information

National Institute of Standards and Technology USA. Jon W. Tolle. Departments of Mathematics and Operations Research. University of North Carolina USA

National Institute of Standards and Technology USA. Jon W. Tolle. Departments of Mathematics and Operations Research. University of North Carolina USA Acta Numerica (1996), pp. 1{000 Sequential Quadratic Programming Paul T. Boggs Applied and Computational Mathematics Division National Institute of Standards and Technology Gaithersburg, Maryland 20899

More information

An Inexact Newton Method for Optimization

An Inexact Newton Method for Optimization New York University Brown Applied Mathematics Seminar, February 10, 2009 Brief biography New York State College of William and Mary (B.S.) Northwestern University (M.S. & Ph.D.) Courant Institute (Postdoc)

More information

Programming, numerics and optimization

Programming, numerics and optimization Programming, numerics and optimization Lecture C-3: Unconstrained optimization II Łukasz Jankowski ljank@ippt.pan.pl Institute of Fundamental Technological Research Room 4.32, Phone +22.8261281 ext. 428

More information

minimize x subject to (x 2)(x 4) u,

minimize x subject to (x 2)(x 4) u, Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for

More information

S. Lucidi F. Rochetich M. Roma. Curvilinear Stabilization Techniques for. Truncated Newton Methods in. the complete results 1

S. Lucidi F. Rochetich M. Roma. Curvilinear Stabilization Techniques for. Truncated Newton Methods in. the complete results 1 Universita degli Studi di Roma \La Sapienza" Dipartimento di Informatica e Sistemistica S. Lucidi F. Rochetich M. Roma Curvilinear Stabilization Techniques for Truncated Newton Methods in Large Scale Unconstrained

More information

A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity

A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity A Trust Funnel Algorithm for Nonconvex Equality Constrained Optimization with O(ɛ 3/2 ) Complexity Mohammadreza Samadi, Lehigh University joint work with Frank E. Curtis (stand-in presenter), Lehigh University

More information

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09 Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods

More information

A PRIMAL-DUAL TRUST REGION ALGORITHM FOR NONLINEAR OPTIMIZATION

A PRIMAL-DUAL TRUST REGION ALGORITHM FOR NONLINEAR OPTIMIZATION Optimization Technical Report 02-09, October 2002, UW-Madison Computer Sciences Department. E. Michael Gertz 1 Philip E. Gill 2 A PRIMAL-DUAL TRUST REGION ALGORITHM FOR NONLINEAR OPTIMIZATION 7 October

More information

Constrained Nonlinear Optimization Algorithms

Constrained Nonlinear Optimization Algorithms Department of Industrial Engineering and Management Sciences Northwestern University waechter@iems.northwestern.edu Institute for Mathematics and its Applications University of Minnesota August 4, 2016

More information

R. Schaback. numerical method is proposed which rst minimizes each f j separately. and then applies a penalty strategy to gradually force the

R. Schaback. numerical method is proposed which rst minimizes each f j separately. and then applies a penalty strategy to gradually force the A Multi{Parameter Method for Nonlinear Least{Squares Approximation R Schaback Abstract P For discrete nonlinear least-squares approximation problems f 2 (x)! min for m smooth functions f : IR n! IR a m

More information

Numerical Optimization of Partial Differential Equations

Numerical Optimization of Partial Differential Equations Numerical Optimization of Partial Differential Equations Part I: basic optimization concepts in R n Bartosz Protas Department of Mathematics & Statistics McMaster University, Hamilton, Ontario, Canada

More information

1 Computing with constraints

1 Computing with constraints Notes for 2017-04-26 1 Computing with constraints Recall that our basic problem is minimize φ(x) s.t. x Ω where the feasible set Ω is defined by equality and inequality conditions Ω = {x R n : c i (x)

More information

IPAM Summer School Optimization methods for machine learning. Jorge Nocedal

IPAM Summer School Optimization methods for machine learning. Jorge Nocedal IPAM Summer School 2012 Tutorial on Optimization methods for machine learning Jorge Nocedal Northwestern University Overview 1. We discuss some characteristics of optimization problems arising in deep

More information

SF2822 Applied Nonlinear Optimization. Preparatory question. Lecture 9: Sequential quadratic programming. Anders Forsgren

SF2822 Applied Nonlinear Optimization. Preparatory question. Lecture 9: Sequential quadratic programming. Anders Forsgren SF2822 Applied Nonlinear Optimization Lecture 9: Sequential quadratic programming Anders Forsgren SF2822 Applied Nonlinear Optimization, KTH / 24 Lecture 9, 207/208 Preparatory question. Try to solve theory

More information

4TE3/6TE3. Algorithms for. Continuous Optimization

4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Algorithms for Constrained Nonlinear Optimization Problems) Tamás TERLAKY Computing and Software McMaster University Hamilton, November 2005 terlaky@mcmaster.ca

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

x 2. x 1

x 2. x 1 Feasibility Control in Nonlinear Optimization y M. Marazzi z Jorge Nocedal x March 9, 2000 Report OTC 2000/04 Optimization Technology Center Abstract We analyze the properties that optimization algorithms

More information

Worst Case Complexity of Direct Search

Worst Case Complexity of Direct Search Worst Case Complexity of Direct Search L. N. Vicente May 3, 200 Abstract In this paper we prove that direct search of directional type shares the worst case complexity bound of steepest descent when sufficient

More information

Algorithms for constrained local optimization

Algorithms for constrained local optimization Algorithms for constrained local optimization Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Algorithms for constrained local optimization p. Feasible direction methods Algorithms for constrained

More information

Nonmonotone Trust Region Methods for Nonlinear Equality Constrained Optimization without a Penalty Function

Nonmonotone Trust Region Methods for Nonlinear Equality Constrained Optimization without a Penalty Function Nonmonotone Trust Region Methods for Nonlinear Equality Constrained Optimization without a Penalty Function Michael Ulbrich and Stefan Ulbrich Zentrum Mathematik Technische Universität München München,

More information

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS

SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss

More information

Outline Introduction: Problem Description Diculties Algebraic Structure: Algebraic Varieties Rank Decient Toeplitz Matrices Constructing Lower Rank St

Outline Introduction: Problem Description Diculties Algebraic Structure: Algebraic Varieties Rank Decient Toeplitz Matrices Constructing Lower Rank St Structured Lower Rank Approximation by Moody T. Chu (NCSU) joint with Robert E. Funderlic (NCSU) and Robert J. Plemmons (Wake Forest) March 5, 1998 Outline Introduction: Problem Description Diculties Algebraic

More information

BFGS WITH UPDATE SKIPPING AND VARYING MEMORY. July 9, 1996

BFGS WITH UPDATE SKIPPING AND VARYING MEMORY. July 9, 1996 BFGS WITH UPDATE SKIPPING AND VARYING MEMORY TAMARA GIBSON y, DIANNE P. O'LEARY z, AND LARRY NAZARETH x July 9, 1996 Abstract. We give conditions under which limited-memory quasi-newton methods with exact

More information

Complexity of gradient descent for multiobjective optimization

Complexity of gradient descent for multiobjective optimization Complexity of gradient descent for multiobjective optimization J. Fliege A. I. F. Vaz L. N. Vicente July 18, 2018 Abstract A number of first-order methods have been proposed for smooth multiobjective optimization

More information

A SUFFICIENTLY EXACT INEXACT NEWTON STEP BASED ON REUSING MATRIX INFORMATION

A SUFFICIENTLY EXACT INEXACT NEWTON STEP BASED ON REUSING MATRIX INFORMATION A SUFFICIENTLY EXACT INEXACT NEWTON STEP BASED ON REUSING MATRIX INFORMATION Anders FORSGREN Technical Report TRITA-MAT-2009-OS7 Department of Mathematics Royal Institute of Technology November 2009 Abstract

More information

Interior-Point Methods as Inexact Newton Methods. Silvia Bonettini Università di Modena e Reggio Emilia Italy

Interior-Point Methods as Inexact Newton Methods. Silvia Bonettini Università di Modena e Reggio Emilia Italy InteriorPoint Methods as Inexact Newton Methods Silvia Bonettini Università di Modena e Reggio Emilia Italy Valeria Ruggiero Università di Ferrara Emanuele Galligani Università di Modena e Reggio Emilia

More information

New hybrid conjugate gradient methods with the generalized Wolfe line search

New hybrid conjugate gradient methods with the generalized Wolfe line search Xu and Kong SpringerPlus (016)5:881 DOI 10.1186/s40064-016-5-9 METHODOLOGY New hybrid conjugate gradient methods with the generalized Wolfe line search Open Access Xiao Xu * and Fan yu Kong *Correspondence:

More information

A Finite Element Method for an Ill-Posed Problem. Martin-Luther-Universitat, Fachbereich Mathematik/Informatik,Postfach 8, D Halle, Abstract

A Finite Element Method for an Ill-Posed Problem. Martin-Luther-Universitat, Fachbereich Mathematik/Informatik,Postfach 8, D Halle, Abstract A Finite Element Method for an Ill-Posed Problem W. Lucht Martin-Luther-Universitat, Fachbereich Mathematik/Informatik,Postfach 8, D-699 Halle, Germany Abstract For an ill-posed problem which has its origin

More information

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL)

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL) Part 4: Active-set methods for linearly constrained optimization Nick Gould RAL fx subject to Ax b Part C course on continuoue optimization LINEARLY CONSTRAINED MINIMIZATION fx subject to Ax { } b where

More information

Quasi-Newton Methods

Quasi-Newton Methods Quasi-Newton Methods Werner C. Rheinboldt These are excerpts of material relating to the boos [OR00 and [Rhe98 and of write-ups prepared for courses held at the University of Pittsburgh. Some further references

More information

Outline. Scientific Computing: An Introductory Survey. Optimization. Optimization Problems. Examples: Optimization Problems

Outline. Scientific Computing: An Introductory Survey. Optimization. Optimization Problems. Examples: Optimization Problems Outline Scientific Computing: An Introductory Survey Chapter 6 Optimization 1 Prof. Michael. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

More information

Inexact Newton Methods and Nonlinear Constrained Optimization

Inexact Newton Methods and Nonlinear Constrained Optimization Inexact Newton Methods and Nonlinear Constrained Optimization Frank E. Curtis EPSRC Symposium Capstone Conference Warwick Mathematics Institute July 2, 2009 Outline PDE-Constrained Optimization Newton

More information

Some new facts about sequential quadratic programming methods employing second derivatives

Some new facts about sequential quadratic programming methods employing second derivatives To appear in Optimization Methods and Software Vol. 00, No. 00, Month 20XX, 1 24 Some new facts about sequential quadratic programming methods employing second derivatives A.F. Izmailov a and M.V. Solodov

More information

Notes on Numerical Optimization

Notes on Numerical Optimization Notes on Numerical Optimization University of Chicago, 2014 Viva Patel October 18, 2014 1 Contents Contents 2 List of Algorithms 4 I Fundamentals of Optimization 5 1 Overview of Numerical Optimization

More information

Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms: A Comparative Study

Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms: A Comparative Study International Journal of Mathematics And Its Applications Vol.2 No.4 (2014), pp.47-56. ISSN: 2347-1557(online) Determination of Feasible Directions by Successive Quadratic Programming and Zoutendijk Algorithms:

More information