
Inexact Newton Methods Applied to Under-Determined Systems

by

Joseph P. Simonis

A Dissertation Submitted to the Faculty of
WORCESTER POLYTECHNIC INSTITUTE
in Partial Fulfillment of the Requirements for the
Degree of Doctor of Philosophy in Mathematics

May 2006

APPROVED:

Professor Homer F. Walker, Dissertation Advisor, Mathematical Sciences Department
Professor Joseph D. Fehribach, Committee Member, Mathematical Sciences Department
Professor Arthur C. Heinricher, Committee Member, Mathematical Sciences Department
Dr. John N. Shadid, Committee Member, Sandia National Laboratories
Professor Suzanne L. Weekes, Committee Member, Mathematical Sciences Department

Contents

1 Introduction
2 Overview
  2.1 Preliminaries
  2.2 Newton-like Methods
    2.2.1 Newton's Method
    2.2.2 Inexact Newton Methods
    2.2.3 Globalized Inexact Newton Methods
    2.2.4 Trust Region Methods
  2.3 Normal Flow Method
3 Methods and Theories
  3.1 Inexact Newton Methods for Under-Determined Systems
  3.2 A Globalized Inexact Newton Method for Under-Determined Systems
  3.3 Backtracking Methods
    3.3.1 Choosing the Scaling Factor
  3.4 Trust Region Methods for Under-Determined Systems
    3.4.1 Under-Determined Dogleg Method
4 Numerical Experiments
  4.1 Test Problems
    4.1.1 The Bratu Problem
    4.1.2 The Chan Problem
    4.1.3 The Lid Driven Cavity Problem
    4.1.4 The 1D Brusselator Problem
    4.1.5 The 2D Brusselator Problem
  4.2 The Solution Algorithms
  4.3 Results
5 Summary
  5.1 Summary
  5.2 Additional Applications
  5.3 Future Work
Appendices
A Adaptation of GMRES
B Matlab Files
C Raw Data
  C.1 Chan Problem Results
  C.2 Bratu Problem Results
  C.3 1D Brusselator Problem Results
  C.4 2D Brusselator Problem Results
  C.5 Driven Cavity Problem Results

List of Figures

4.1 The Bratu Solution Space w/ Solution
4.2 Chan Solution
4.3 Lid Driven Cavity
4.4 1D Brusselator Solution Space w/ Solution
4.5 2D Brusselator Solution
4.6 Table of Compute Times
4.7 Table of Normalized Compute Times
4.8 Residual Norms of Nonlinear Functions

Abstract

Consider an under-determined system of nonlinear equations F(x) = 0, F : IR^m → IR^n, where F is continuously differentiable and m > n. Systems of this form arise in a variety of applications, including parameter-dependent systems, dynamical systems with periodic solutions, and nonlinear eigenvalue problems. Robust, efficient numerical methods are often required for their solution.

Newton's method is an iterative scheme for solving the nonlinear system of equations F(x) = 0, F : IR^n → IR^n. Although simple to implement and theoretically sound, it is often impractical in its pure form. Inexact Newton methods and globalized inexact Newton methods are computationally efficient variations of Newton's method commonly used on large-scale problems, and they are frequently more robust than Newton's method. Trust region methods, viewed here as globalized exact Newton methods, are not as computationally efficient in the large-scale case, yet they are notably more robust than Newton's method in practice.

The normal flow method is a generalization of Newton's method for solving the system F(x) = 0, F : IR^m → IR^n, m > n. Easy to implement, it has a simple and useful local convergence theory; in its pure form, however, it is not well suited to large-scale problems. This dissertation presents new methods that improve the efficiency and robustness of the normal flow method in the large-scale case. They are developed in direct analogy with inexact Newton, globalized inexact Newton, and trust region methods, with particular attention to the associated convergence theory. Numerical experiments on selected problems of interest, implemented in MATLAB, are included.

Chapter 1

Introduction

This dissertation presents newly developed methods for solving the under-determined nonlinear system of equations F(x) = 0, F : IR^m → IR^n, with m > n ≥ 1. These methods are shown to be globally robust, locally fast, and computationally efficient on large-scale systems of equations. We define an under-determined system of nonlinear equations to be any system of nonlinear equations with more unknowns than equations, regardless of the uniqueness or existence of its solutions.

These systems appear in a variety of applications. After discretization, certain nonlinear partial differential equation (PDE) eigenvalue problems (e.g., the Bratu problem) and some parameter-dependent PDEs (e.g., the driven cavity problem) take the form F(x, λ) = 0 with x ∈ IR^n and λ ∈ IR. Under-determined systems also sometimes appear when calculating periodic orbits of dynamical systems. Here, one seeks x(0) and T satisfying ∫_0^T f(x(t), t) dt = 0, where dx(t)/dt = f(x(t), t). The function f(x(t), t) is assumed to be nonlinear.

This dissertation is divided into three main sections following the introduction. The first of these presents background material. Here, general notation is discussed along with useful lemmas and definitions. This section also includes descriptions of Newton's method and relevant Newton-like methods for solving the nonlinear system F(x) = 0, F : IR^n → IR^n; these are important since they will be used later as models for the new methods.

Relevant Newton-like methods include inexact Newton methods, globalized inexact Newton methods, and trust region methods. Inexact Newton methods only approximately solve the Newton system at each iteration; details about all of these methods can be found in [2, 21, 31]. Globalized inexact Newton methods impose additional requirements on the generated iterates to improve the robustness of the algorithms; see [5, 6, 22] and the references therein. Trust region methods for nonlinear systems of equations stem from methods in unconstrained optimization. For an understanding of both the methods and their subsequent adaptation to nonlinear systems, see [4, 18, 32, 20] and the references therein. The normal flow method, an adaptation of Newton's method for solving under-determined systems, is also presented here. Further material about the adaptation of Newton's method to under-determined systems can be found in [16, 35] and the references therein.

The second section presents new methods for solving the under-determined system of nonlinear equations. Here, the methods from the first section are used to motivate generalizations of the normal flow method. The discussion and development of the new methods closely parallel the development of methods in [2, 5]. These generalizations are shown to have fast local convergence properties and to be globally robust. Additionally, if suitably implemented, they are computationally efficient for solving large-scale systems of equations.

The final section discusses numerical experiments. Several specific methods are coded in MATLAB and applied to model problems of interest. The test problems include nonlinear eigenvalue problems (the Bratu problem [13] and the Chan problem [1]), a parameter-dependent fluid flow problem (the driven cavity problem [7, 30]), and periodic orbit calculations (the Brusselator problem in one and two dimensions [15, 10, 11]).

Chapter 2

Overview

2.1 Preliminaries

We begin with a brief overview of assumptions, notation, and a few definitions and lemmas used throughout the text. The norm ‖·‖ is assumed to be the Euclidean norm on vectors or the induced norm on matrices throughout. Most results can be extended to the case of an arbitrary inner-product vector norm. The function F is from IR^m to IR^n and is continuously differentiable. It has a Jacobian matrix denoted by F′, the n × m matrix with entries

    F′(x)_ij ≡ ∂F_i(x)/∂x_j,  1 ≤ i ≤ n, 1 ≤ j ≤ m.

The δ-neighborhood of a point x ∈ IR^n is the set N_δ(x) ≡ {y ∈ IR^n : ‖y − x‖ < δ}. A stationary point of ‖F‖ is a point x ∈ IR^m for which there does not exist an s ∈ IR^m such that ‖F(x) + F′(x)s‖ < ‖F(x)‖. The stationary points include the local minimizers of ‖F‖.

Definition 1 ([4]). Let x* ∈ IR^m and x_k ∈ IR^m, k = 1, 2, .... Then {x_k} is said to converge to x* if lim_{k→∞} ‖x_k − x*‖ = 0. If there exist a constant c ∈ [0, 1) and an integer k̂ ≥ 0 such that for all k ≥ k̂, ‖x_{k+1} − x*‖ ≤ c‖x_k − x*‖, then {x_k} is said to be q-linearly convergent to x*. If, for some sequence {c_k} that converges to 0, ‖x_{k+1} − x*‖ ≤ c_k‖x_k − x*‖ for each k, then {x_k} is said to converge q-superlinearly to x*. If {x_k} converges to x* and there exist constants p > 1 and c ≥ 0 such that ‖x_{k+1} − x*‖ ≤ c‖x_k − x*‖^p for each k, then {x_k} is said to converge to x* with q-order at least p. If p = 2 or p = 3, the convergence is said to be q-quadratic or q-cubic, respectively.

Throughout, we use q-convergence as opposed to r-convergence. The q stands for quotient, and the r stands for root. A sequence {x_k} converges to x* with r-order p if {‖x_k − x*‖} is bounded above by a sequence in IR that converges to zero with q-order p.

Definition 2 ([4]). A function g is Hölder continuous with exponent p ∈ (0, 1] and constant γ in a set Ω ⊂ IR^m if, for every x, y ∈ Ω, ‖g(x) − g(y)‖ ≤ γ‖x − y‖^p.

Definition 3 ([4]). A function g is Lipschitz continuous with constant γ in a set Ω ⊂ IR^m, written g ∈ Lip_γ(Ω), if, for every x, y ∈ Ω, ‖g(x) − g(y)‖ ≤ γ‖x − y‖.

Definition 4 ([3]). Given F : IR^m → IR^n continuously differentiable and x ∈ IR^m, F′(x)^+ is the pseudo-inverse of F′(x) if, given b ∈ IR^n, F′(x)^+ b ∈ IR^m is the solution of F′(x)s = b having minimal Euclidean norm. When F′(x) is of full rank, the pseudo-inverse has the form F′(x)^+ = F′(x)^T (F′(x)F′(x)^T)^{-1} and is called the Moore-Penrose pseudo-inverse.

Lemma 1 ([21]). Let F : IR^m → IR^n be a continuously differentiable function. For any x ∈ IR^m and ε > 0, there exists a δ > 0 such that

    ‖F(z) − F(y) − F′(y)(z − y)‖ ≤ ε‖z − y‖    (2.1)

whenever y, z ∈ N_δ(x).

Lemma 2 ([21]). Let F : IR^m → IR^n be continuously differentiable in the open convex set Ω ⊂ IR^m, let x ∈ Ω, and let F′ be Hölder continuous with exponent p and constant γ at x in the neighborhood Ω. Then, for any x + s ∈ Ω,

    ‖F(x + s) − F(x) − F′(x)s‖ ≤ (γ/(1 + p)) ‖s‖^{1+p}.    (2.2)

Proof. By Lemma (4.1.9) in [4],

    F(x + s) − F(x) − F′(x)s = ∫_0^1 F′(x + ts)s dt − F′(x)s = ∫_0^1 [F′(x + ts) − F′(x)]s dt.

We then obtain

    ‖F(x + s) − F(x) − F′(x)s‖ ≤ ∫_0^1 ‖F′(x + ts) − F′(x)‖ ‖s‖ dt ≤ ∫_0^1 γ‖ts‖^p ‖s‖ dt = γ‖s‖^{1+p} ∫_0^1 t^p dt = (γ/(1 + p)) ‖s‖^{1+p}.

Lemma 3. Assume F′(x) ∈ IR^{n×m} is a continuous function of x and is of full rank. Then F′(x)^+ is a continuous function of x.

Proof. Because F′(x) is of full rank, the Moore-Penrose pseudo-inverse can be written F′(x)^+ = F′(x)^T (F′(x)F′(x)^T)^{-1}. Since F′(x)^T is a continuous function of x, we have that F′(x)F′(x)^T is a continuous function of x, and it follows that (F′(x)F′(x)^T)^{-1} and, hence, F′(x)^+ are continuous functions of x.

2.2 Newton-like Methods

This section presents three classes of Newton-like methods designed to solve F(x) = 0 with F : IR^n → IR^n. We begin with a description of Newton's method and follow with descriptions of inexact Newton methods, globalized inexact Newton methods, and trust region methods.

2.2.1 Newton's Method

Consider the problem: find x ∈ IR^n such that

    F(x) = 0,    (2.3)

where F : IR^n → IR^n is continuously differentiable. Newton's method begins by assuming an initial guess, x_0, and generates a sequence of iterates via x_{k+1} = x_k − F′(x_k)^{-1} F(x_k). In practice, this involves solving the linear system

    F′(x_k)s_k = −F(x_k)    (2.4)

for the Newton step s_k, and then defining x_{k+1} = x_k + s_k.

Algorithm NM: Newton's Method
  Let x_0 be given.
  For k = 0 step 1 until ∞ do:
    Solve F′(x_k)s_k = −F(x_k).
    Set x_{k+1} = x_k + s_k.

Under mild assumptions the sequence will approach a root of F provided x_0 is sufficiently near the root.

Theorem 1 ([34, 4]). Suppose F is Lipschitz continuously differentiable at x*, F(x*) = 0, and F′(x*) is nonsingular. Then for x_0 sufficiently near x*, the sequence {x_k} produced by Newton's method is well defined and converges to x* with ‖x_{k+1} − x*‖ ≤ c‖x_k − x*‖^2 for a constant c independent of k.

The method is simple to implement and theoretically sound, but, in its pure form, it is not often used to solve large-scale problems. The exact linear solve at each iteration makes the method computationally inefficient.

2.2.2 Inexact Newton Methods

Inexact Newton methods [2] are variations of Newton's method designed to be computationally efficient on large-scale problems, and they are commonly used in the large-scale case. Recall that the general idea of Newton's method is to linearize F around a current guess, x_k, in the hope that the root of the linear model, x_{k+1}, is a better approximation of the root of the nonlinear problem than was x_k.

There are two drawbacks with this method. First, solving for a root of the linear model may, in practice, be computationally time consuming. Second, when far from a solution, the root of the linear model may not be a good approximation of the root of the nonlinear problem.

We replace the Newton step with an inexact Newton step. We no longer require s_k to exactly solve F′(x_k)s_k = −F(x_k), but only that s_k be a point at which the norm of the local linear model has been reduced. Precisely, we find some η_k ∈ [0, 1) and require s_k to satisfy

    ‖F(x_k) + F′(x_k)s_k‖ ≤ η_k‖F(x_k)‖.    (2.5)

Notice that as η_k approaches zero, s_k approaches the Newton step. This replacement allows s_k to be calculated cheaply. Often, an efficient iterative linear solver such as the generalized minimal residual method (GMRES; see Appendix A) is used for this calculation. The following algorithm is the inexact Newton method (INM).

Algorithm INM:
  Let x_0 be given.
  For k = 0 step 1 until ∞ do:
    Find some η_k ∈ [0, 1) and s_k that satisfy
      ‖F(x_k) + F′(x_k)s_k‖ ≤ η_k‖F(x_k)‖.
    Set x_{k+1} = x_k + s_k.

The scalar η_k is called the forcing term, and its choice affects both the local convergence properties and the robustness of the method [2, 6]. Assume x* is a solution of (2.3) at which the Jacobian is of full rank. If x_0 is sufficiently close to x* and 0 ≤ η_k ≤ η_max < 1 for each k, then {x_k} converges to x* q-linearly in some norm. Furthermore, q-superlinear convergence is obtained if lim_{k→∞} η_k = 0. Finally, if η_k = O(‖F(x_k)‖), then the convergence is q-quadratic. See [2] for further details.
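For concreteness, the following is a minimal MATLAB sketch of Algorithm INM with a fixed forcing term. It is only an illustration, not the implementation used in the numerical experiments of Chapter 4; the function handles and parameter values are assumptions made for the sketch.

  % Minimal sketch of Algorithm INM (inexact Newton with a fixed forcing term).
  % F : handle returning the residual vector F(x); J : handle returning F'(x).
  function x = inm_sketch(F, J, x, eta, tol, maxit)
    for k = 1:maxit
        Fx = F(x);
        if norm(Fx) <= tol, break; end
        % Inexact Newton condition (2.5): ||F(x) + F'(x)s|| <= eta*||F(x)||.
        % GMRES's relative-residual stopping test matches this bound.
        [s, flag] = gmres(J(x), -Fx, [], eta, 50);
        x = x + s;   % accept the step (no globalization in this sketch)
    end
  end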

2.2.3 Globalized Inexact Newton Methods

The robustness of an inexact Newton method is often enhanced by globalizations, i.e., augmentations of the basic method that test and modify steps to ensure adequate progress toward a solution [5, 22]. A step satisfying the inexact Newton condition (2.5) yields a decrease in the local linear model norm, yet this decrease is not always reflected in the nonlinear residual norm. In other words, the chosen step may not actually reduce ‖F‖. To ensure a reduction of ‖F‖, an additional step-selection criterion is added: the step should reduce ‖F‖ by at least some fraction of the reduction predicted by the local linear model of F. More precisely, given a t ∈ (0, 1), s_k should be chosen to satisfy (2.5) and a sufficient decrease condition:

    ‖F(x_k + s_k)‖ ≤ [1 − t(1 − η_k)]‖F(x_k)‖.    (2.6)

The resulting algorithm is a globalized inexact Newton method (GINM).

Algorithm GINM:
  Let x_0 and t ∈ (0, 1) be given.
  For k = 0 step 1 until ∞ do:
    Find some η_k ∈ [0, 1) and s_k that satisfy
      ‖F(x_k) + F′(x_k)s_k‖ ≤ η_k‖F(x_k)‖
    and
      ‖F(x_k + s_k)‖ ≤ [1 − t(1 − η_k)]‖F(x_k)‖.
    Set x_{k+1} = x_k + s_k.

The following is a global convergence theorem for Algorithm GINM.

Theorem 2 ([5]). Assume that Algorithm GINM does not break down. If Σ_{k≥0} (1 − η_k) is divergent, then ‖F(x_k)‖ → 0. If, in addition, x* is a limit point of {x_k} such that F′(x*) is invertible, then F(x*) = 0 and x_k → x*.

2.2.4 Trust Region Methods

A general trust region method produces a sequence of iterates using the following procedure: at each iteration we assume the local linear model is an accurate representation of the nonlinear function within some closed δ-ball around the current iterate. We choose a step, s, to minimize ‖F(x) + F′(x)s‖ over all s satisfying ‖s‖ ≤ δ. Then, we check to see whether s is acceptable. If it is not acceptable, this indicates that the local linear model is not a good representation of F in the δ-ball; δ is decreased and a new s is chosen. This is repeated until an acceptable step is found. The value δ is a measure of our trust in the local linear model.

One commonly used test for step acceptability is the ared/pred condition [5]. Let ared be the actual reduction in function norm obtained by taking a step s:

    ared(s) ≡ ‖F(x)‖ − ‖F(x + s)‖.    (2.7)

The predicted reduction, pred, is the reduction predicted by the local linear model:

    pred(s) ≡ ‖F(x)‖ − ‖F(x) + F′(x)s‖.    (2.8)

The ared/pred condition for step acceptability requires the actual reduction in function norm to be at least some fraction of the predicted reduction:

    ared(s) ≥ t·pred(s),  t ∈ [0, 1).    (2.9)

The following general trust region method (TR) from [5] is similar in spirit to the method in [18].

Algorithm TR: Trust Region Method
  Let x_0, δ̄_0 > 0, 0 < t ≤ u < 1, and 0 < θ_min < θ_max < 1 be given.
  For k = 0 step 1 until ∞ do:
    Set δ_k = δ̄_k and choose s_k ∈ arg min_{‖s‖ ≤ δ_k} ‖F(x_k) + F′(x_k)s‖.
    While ared_k(s_k) < t·pred_k(s_k) do:
      Choose θ ∈ [θ_min, θ_max].
      Update δ_k ← θδ_k and choose s_k ∈ arg min_{‖s‖ ≤ δ_k} ‖F(x_k) + F′(x_k)s‖.
    Set x_{k+1} = x_k + s_k.
    If ared_k(s_k) ≥ u·pred_k(s_k), choose δ̄_{k+1} ≥ δ_k; else choose δ̄_{k+1} ≥ θ_min·δ_k.

Theorem 3 ([5]). Assume that Algorithm TR does not break down. Then every limit point of {x_k} is a stationary point of ‖F‖. If x* is a limit point of {x_k} such that F′(x*) is invertible, then F(x*) = 0 and x_k → x*; furthermore, s_k = −F′(x_k)^{-1}F(x_k), the full Newton step, whenever k is sufficiently large.

2.3 Normal Flow Method

Now, consider an under-determined root-finding problem: find x ∈ IR^m such that

    F(x) = 0,    (2.10)

where F : IR^m → IR^n is a continuously differentiable function with m > n. Here, the linear system (2.4) is under-determined, i.e., it may have an infinite number of solutions. In order to develop a well-defined algorithm, an additional constraint must be imposed so that a unique step, s_k, can be defined. Choosing s_k to be the solution of the linear system (2.4) with minimum Euclidean norm gives the normal flow method [35]. The pseudo-inverse solution of the linear system is a natural choice for a Newton step because it is the shortest step from the current iterate to a root of the linear problem and, therefore, the linear model is likely to be a better representation of the nonlinear function at that step than at other solutions of (2.4).

Hereafter, the normal flow algorithm will be called Newton's method for under-determined systems (NMU).

Algorithm NMU:
  Let x_0 be given.
  For k = 0 step 1 until ∞ do:
    Let s_k = −F′(x_k)^+ F(x_k).
    Set x_{k+1} = x_k + s_k.

Mathematically, we have F′(x_k)s_k + F(x_k) = 0 and s_k ⊥ Null(F′(x_k)). When F′(x_k) is of full rank, s_k will hereafter be referred to as the Moore-Penrose step. A local convergence theory for Algorithm NMU is given in [35] and generalized in [16]. The central result from [35] with respect to this method follows.

Hypothesis 1. F is differentiable and F′ is of full rank n in an open convex set Ω, and the following hold:
(i) There exist γ ≥ 0 and p ∈ (0, 1] such that ‖F′(y) − F′(x)‖ ≤ γ‖y − x‖^p for all x, y ∈ Ω.
(ii) There is a constant μ for which ‖F′(x)^+‖ ≤ μ for all x ∈ Ω.

Definition 5. For ρ > 0, let Ω_ρ = {x ∈ Ω : ‖y − x‖ < ρ ⟹ y ∈ Ω}.

Theorem 4 ([35]). Let F satisfy Hypothesis 1 and suppose Ω_η is given by Definition 5 for some η > 0. Then there is an ε > 0 which depends only on γ, p, μ, and η such that if x_0 ∈ Ω_η and ‖F(x_0)‖ < ε, then the iterates {x_k}_{k=0,1,...} determined by Algorithm NMU are well defined and converge to a point x* ∈ Ω such that F(x*) = 0. Furthermore, there is a constant β for which

    ‖x_{k+1} − x*‖ ≤ β‖x_k − x*‖^{p+1},  k = 0, 1, ...    (2.11)

If F′(x) is Lipschitz continuous in Ω, then p = 1 and the iterates produced by Algorithm NMU converge q-quadratically.
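For small, dense problems the Moore-Penrose step of Algorithm NMU can be formed directly from the pseudo-inverse, as in the following illustrative MATLAB sketch; the function handles and parameters are assumptions made for the sketch, and large-scale problems instead call for an iterative solver such as the adaptation of GMRES discussed in Appendix A.

  % Minimal sketch of Algorithm NMU (the normal flow method).
  % F : handle returning F(x) in IR^n; J : handle returning the n-by-m Jacobian F'(x).
  function x = nmu_sketch(F, J, x, tol, maxit)
    for k = 1:maxit
        Fx = F(x);
        if norm(Fx) <= tol, break; end
        s = -pinv(J(x)) * Fx;   % Moore-Penrose step: minimum-norm solution of F'(x)s = -F(x)
        x = x + s;
    end
  end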

Chapter 3

Methods and Theories

3.1 Inexact Newton Methods for Under-Determined Systems

The previous subsection introduced a variation of Newton's method for solving the under-determined system F(x) = 0, F : IR^m → IR^n, and briefly discussed its local convergence theory. This section presents a class of inexact Newton methods for application to the under-determined system. A convergence theory is developed for these new methods.

Each iteration of NMU requires the step, s_k, satisfying F(x_k) + F′(x_k)s_k = 0 with s_k ⊥ Null(F′(x_k)). Calculation of s_k requires solving a linear system of equations. When n is large, this may be computationally expensive. To improve the computational efficiency, we allow for an approximate solution of the linear system. We seek an s_k satisfying

    ‖F(x_k) + F′(x_k)s_k‖ ≤ η_k‖F(x_k)‖, where η_k ∈ [0, 1),    (3.1)

and

    s_k ⊥ Null(F′(x_k)).    (3.2)

Constraint (3.1) will henceforth be called the inexact Newton condition. The inexact Newton method for under-determined systems (INMU) follows:

Algorithm INMU:
  Let x_0 be given.
  For k = 0 step 1 until ∞ do:
    Find some η_k ∈ [0, 1) and s_k that satisfy
      ‖F(x_k) + F′(x_k)s_k‖ ≤ η_k‖F(x_k)‖,
      s_k ⊥ Null(F′(x_k)).
    Set x_{k+1} = x_k + s_k.

The remainder of this section presents a theoretical foundation for this algorithm.

Lemma 4. Assume s ⊥ Null(F′(x)); then s = F′(x)^+ F′(x)s.

Proof. Define s̄ = F′(x)^+ F′(x)s. Then s̄ is the pseudo-inverse solution of the linear system

    F′(x)s̄ = F′(x)s.    (3.3)

Additionally, s̄ ⊥ Null(F′(x)) because s̄ is the minimum-norm solution of the linear problem. Rearranging equation (3.3) gives F′(x)(s̄ − s) = 0; therefore, the vector (s̄ − s) ∈ Null(F′(x)). However, because both s̄ and s are orthogonal to the null space of the Jacobian, it is also true that (s̄ − s) ⊥ Null(F′(x)). Therefore, (s̄ − s) ∈ Null(F′(x)) ∩ Null(F′(x))^⊥ = {0}. We conclude that s̄ = s. Then, s = s̄ = F′(x)^+ F′(x)s.
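As an aside on implementation, one simple way to produce a step satisfying both (3.1) and (3.2) when F′(x_k) has full rank is to solve the n × n system F′(x_k)F′(x_k)^T y = F(x_k) inexactly and set s_k = −F′(x_k)^T y; the step then lies in Range(F′(x_k)^T) = Null(F′(x_k))^⊥ by construction. The MATLAB fragment below is a sketch of this idea only, with assumed variables Jx, Fx, and eta; Appendix A describes an adaptation of GMRES for this setting.

  % Sketch: an inexact step satisfying (3.1)-(3.2) via the normal equations.
  % Jx = F'(x_k) (n-by-m, full rank), Fx = F(x_k), eta = forcing term in [0,1).
  [y, flag] = gmres(Jx * Jx', Fx, [], eta, 50);  % stops when ||Fx - (Jx*Jx')*y|| <= eta*||Fx||
  s = -Jx' * y;                                  % s lies in Range(Jx'), hence is orthogonal
                                                 % to Null(Jx), and ||Fx + Jx*s|| <= eta*||Fx||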

Theorem 5. Let F satisfy Hypothesis 1 and suppose ρ > 0. Assume that η_k ≤ η_max < 1 for k = 0, 1, .... Then there is an ε > 0 depending only on γ, p, μ, ρ, and η_max such that if x_0 ∈ Ω_ρ and ‖F(x_0)‖ ≤ ε, then the iterates {x_k} determined by Algorithm INMU are well defined and converge to a point x* ∈ Ω such that F(x*) = 0.

Proof. Suppose x ∈ Ω and s is such that s ⊥ Null(F′(x)) and ‖F(x) + F′(x)s‖ ≤ η_max‖F(x)‖. Define x_+ ≡ x + s and suppose x_+ ∈ Ω. We can write s = F′(x)^+ F′(x)s because s ⊥ Null(F′(x)). Then

    ‖s‖ = ‖F′(x)^+ F′(x)s‖ = ‖F′(x)^+ (−F(x) + F(x) + F′(x)s)‖
        ≤ ‖F′(x)^+‖ (‖F(x)‖ + ‖F(x) + F′(x)s‖)
        ≤ μ(‖F(x)‖ + η_max‖F(x)‖) = μ(1 + η_max)‖F(x)‖

and

    ‖F(x_+)‖ ≤ ‖F(x_+) − F(x) − F′(x)s‖ + ‖F(x) + F′(x)s‖
             ≤ (γ/(1+p))‖s‖^{1+p} + η_max‖F(x)‖
             ≤ (γ/(1+p))[μ(1 + η_max)]^{1+p}‖F(x)‖^{1+p} + η_max‖F(x)‖.

Choose ε > 0 sufficiently small that

    τ ≡ (γ/(1+p))[μ(1 + η_max)]^{1+p} ε^p + η_max < 1  and  εμ(1 + η_max)/(1 − τ) < ρ.

If ‖F(x)‖ ≤ ε, then ‖F(x_+)‖ ≤ τ‖F(x)‖.

We argue by induction that if x_0 ∈ Ω_ρ and ‖F(x_0)‖ ≤ ε, then ‖F(x_{k+1})‖ ≤ τ‖F(x_k)‖ < ε and x_k ∈ Ω for all k. First we show that ‖F(x_1)‖ ≤ τ‖F(x_0)‖ < ε and x_1 ∈ Ω. We have that s_0 ⊥ Null(F′(x_0)) and ‖F(x_0) + F′(x_0)s_0‖ ≤ η_max‖F(x_0)‖, so

    ‖s_0‖ ≤ μ(1 + η_max)‖F(x_0)‖ ≤ μ(1 + η_max)ε < μ(1 + η_max)ε/(1 − τ) < ρ.

Therefore we have x_1 ∈ Ω because x_0 ∈ Ω_ρ. By the above argument, ‖F(x_1)‖ ≤ τ‖F(x_0)‖ < ε.

Now assume ‖F(x_{j+1})‖ ≤ τ‖F(x_j)‖ < ε and x_j ∈ Ω for all j ≤ k. We show that this is true for j = k + 1. As before, x_j ∈ Ω, s_j satisfies s_j ⊥ Null(F′(x_j)), and ‖F(x_j) + F′(x_j)s_j‖ ≤ η_max‖F(x_j)‖. Then

    ‖x_{j+1} − x_0‖ ≤ Σ_{l=0}^{j} ‖s_l‖ ≤ Σ_{l=0}^{j} μ(1 + η_max)‖F(x_l)‖
                    ≤ μ(1 + η_max) Σ_{l=0}^{j} τ^l ε < μ(1 + η_max) Σ_{l=0}^{∞} τ^l ε = μ(1 + η_max)ε/(1 − τ) < ρ,

so x_0 ∈ Ω_ρ implies x_{j+1} ∈ Ω. Again, using the earlier argument, ‖F(x_{j+1})‖ ≤ τ‖F(x_j)‖ ≤ τ^{j+1}‖F(x_0)‖ < ε.

Now, ‖F(x_{k+1})‖ ≤ τ‖F(x_k)‖ implies that the sequence {‖F(x_k)‖} converges to zero, and since ‖s_k‖ ≤ μ(1 + η_max)‖F(x_k)‖, it must be that ‖s_k‖ → 0 as k → ∞. Note

    ‖x_{k+l} − x_k‖ ≤ Σ_{j=0}^{l−1} ‖s_{k+j}‖ ≤ Σ_{j=0}^{l−1} μ(1 + η_max)‖F(x_{k+j})‖
                    ≤ μ(1 + η_max) Σ_{j=0}^{l−1} τ^j ‖F(x_k)‖ < μ(1 + η_max) Σ_{j=0}^{∞} τ^j ‖F(x_k)‖
                    = μ(1 + η_max)‖F(x_k)‖/(1 − τ) ≤ μ(1 + η_max)τ^k‖F(x_0)‖/(1 − τ) ≤ μ(1 + η_max)τ^k ε/(1 − τ).

Therefore, {x_k} is a Cauchy sequence. It has a limit x* ∈ Ω. The continuity of F yields F(x*) = 0.

Theorem 6. Let F satisfy Hypothesis 1 and suppose ρ > 0. Assume that 0 ≤ η_k ≤ η_max < 1 for k = 0, 1, ... and that the sequence {x_k} produced by Algorithm INMU converges to x* ∈ Ω such that F(x*) = 0. Let M ≡ ‖F′(x*)‖ and assume ‖F′(x_k)‖ ≤ 2M for all x_k ∈ Ω. If ε > 0 and η_max in the proof above are chosen sufficiently small that

    τ ≡ (γ/(1+p))(μ(1 + η_max))^{1+p} ε^p + η_max < 1/((1 + η_max)2Mμ + 1)  and  εμ(1 + η_max)/(1 − τ) < ρ,

then {x_k} converges to x* q-linearly.

Proof. Start with

    ‖x_{k+1} − x*‖ ≤ Σ_{j=k+1}^{∞} ‖s_j‖ ≤ Σ_{j=k+1}^{∞} μ(1 + η_max)‖F(x_j)‖ = μ(1 + η_max) Σ_{j=k+1}^{∞} ‖F(x_j)‖
                   ≤ μ(1 + η_max) Σ_{j=0}^{∞} τ^j ‖F(x_{k+1})‖ = μ(1 + η_max)‖F(x_{k+1})‖/(1 − τ)
                   ≤ μ(1 + η_max)τ‖F(x_k)‖/(1 − τ) = μ(1 + η_max)τ‖F(x_k) − F(x*)‖/(1 − τ)
                   ≤ 2Mμ(1 + η_max)τ‖x_k − x*‖/(1 − τ).

By the choice of ε, the term 2Mμ(1 + η_max)τ/(1 − τ) is less than 1. Therefore {x_k} is q-linearly convergent.

Theorem 7. Let F satisfy Hypothesis 1 and suppose ρ > 0. Assume that 0 ≤ η_k ≤ η_max < 1 for k = 0, 1, ... and that the sequence {x_k} produced by Algorithm INMU converges to x* ∈ Ω such that F(x*) = 0. If η_k → 0, then {x_k} converges to x* q-superlinearly. If η_k = O(‖F(x_k)‖^p), then {x_k} → x* with q-order 1 + p.

Proof. Let M ≡ ‖F′(x*)‖. There exists a δ > 0 such that ‖F′(x)‖ ≤ 2M and ‖F(x + s) − F(x) − F′(x)s‖ < (γ/(1+p))‖s‖^{1+p} whenever ‖x − x*‖ ≤ δ. Assume that k is sufficiently large that ‖x_k − x*‖ ≤ δ. We first show superlinear convergence. We have

    ‖F(x_{k+1})‖ ≤ ‖F(x_{k+1}) − F(x_k) − F′(x_k)s_k‖ + ‖F(x_k) + F′(x_k)s_k‖
                 ≤ (γ/(1+p))‖s_k‖^{1+p} + η_k‖F(x_k)‖
                 ≤ (γ/(1+p))[μ(1 + η_k)‖F(x_k)‖]^{1+p} + η_k‖F(x_k)‖
                 ≤ (γ/(1+p))[2Mμ(1 + η_k)‖x_k − x*‖]^{1+p} + η_k·2M‖x_k − x*‖.

Previous calculations give

    ‖x_{k+1} − x*‖ ≤ (μ(1 + η_max)/(1 − τ))‖F(x_{k+1})‖
                   ≤ (μ(1 + η_max)/(1 − τ)) [ (γ/(1+p))[2Mμ(1 + η_k)‖x_k − x*‖]^{1+p} + η_k·2M‖x_k − x*‖ ]
                   = (μ(1 + η_max)/(1 − τ)) [ (γ/(1+p))[2Mμ(1 + η_k)]^{1+p}‖x_k − x*‖^p + η_k·2M ] ‖x_k − x*‖.

Now, let

    c_k ≡ (μ(1 + η_max)/(1 − τ)) [ (γ/(1+p))[2Mμ(1 + η_k)]^{1+p}‖x_k − x*‖^p + η_k·2M ].

Combined, lim_{k→∞} ‖x_k − x*‖ = 0 and lim_{k→∞} η_k = 0 imply lim_{k→∞} c_k = 0. Thus, {x_k} is q-superlinearly convergent.

Now assume η_k = O(‖F(x_k)‖^p). Because η_k is on the order of ‖F(x_k)‖^p, there exists a constant C independent of k such that η_k ≤ C‖F(x_k)‖^p ≤ C(2M)^p‖x_k − x*‖^p for all sufficiently large k. Then

    ‖x_{k+1} − x*‖ ≤ (μ(1 + η_max)/(1 − τ)) [ (γ/(1+p))[2Mμ(1 + η_k)]^{1+p}‖x_k − x*‖^p + η_k·2M ] ‖x_k − x*‖
                   ≤ (μ(1 + η_max)/(1 − τ)) [ (γ/(1+p))[2Mμ(1 + η_k)]^{1+p} + C(2M)^{1+p} ] ‖x_k − x*‖^{1+p}
                   ≤ (μ(1 + η_max)/(1 − τ)) [ (γ/(1+p))[2Mμ(1 + η_max)]^{1+p} + C(2M)^{1+p} ] ‖x_k − x*‖^{1+p},

which gives q-order 1 + p convergence. It follows that, if F′ is Lipschitz continuous, then p = 1, and we have q-quadratic convergence.
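The theory above leaves the forcing terms free, subject only to η_k ≤ η_max < 1. The following MATLAB fragment sketches choices matching the three convergence rates; the particular constants and the safeguard value are illustrative assumptions, not prescriptions from the text.

  % Illustrative forcing-term choices for Algorithm INMU.
  % normFx : current ||F(x_k)||,  k : iteration index,
  % p      : Holder exponent of F',  eta_max : safeguard in [0,1).
  function eta = choose_eta(normFx, k, p, eta_max)
    % eta = min(eta_max, 0.5);        % fixed eta < 1         -> q-linear
    % eta = min(eta_max, 1/(k + 2));  % eta_k -> 0            -> q-superlinear
    eta = min(eta_max, normFx^p);     % eta_k = O(||F(x_k)||^p) -> q-order 1 + p
  end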

3.2 A Globalized Inexact Newton Method for Under-Determined Systems

Inexact Newton methods for under-determined systems can achieve fast local convergence rates. Under mild assumptions, including an x_0 such that ‖F(x_0)‖ is sufficiently small, the sequence generated by Algorithm INMU converges to a solution of problem (2.10). However, if an acceptable x_0 cannot be found, the sequence may fail to converge. Here, the goal is to augment the step-selection criteria of Algorithm INMU with a sufficient decrease condition. In analogy with GINM, the additional requirement on the chosen steps is meant to increase the likelihood that the iterates converge to a solution, given an arbitrary x_0. Additionally, we seek a modification that retains the fast rates of convergence.

Each step, s_k, must still satisfy the inexact Newton conditions (3.1) and (3.2). We now also require that the step reduce the norm of F by at least some fraction of the reduction predicted by the local linear model. Given some t ∈ (0, 1), s_k should be chosen such that

    ‖F(x_k + s_k)‖ ≤ [1 − t(1 − η_k)]‖F(x_k)‖,    (3.4)

the same criterion chosen by Eisenstat and Walker in [5]. They note that the step criteria (3.1) and (3.4) are similar to acceptability tests used in certain minimization algorithms [18, 32] and in methods for solving nonlinear equations [12, 26]. Imposing this additional constraint yields our globalized inexact Newton method for under-determined systems (GINMU).

Algorithm GINMU: Global Inexact Newton Method for Under-Determined Systems
  Let x_0 and t ∈ (0, 1) be given.
  For k = 0 step 1 until ∞ do:

Find some η k [0, 1) and s k that satisfy F(x k ) + F (x k )s k η k F(x k ) s k Null(F (x k )) and F(x k + s k ) [1 t(1 η k )] F(x k ) Set x k+1 = x k + s k. The first lemma shows that an s k satisfying (3.1), (3.2) and (3.4) can be found for each k so long as F(x k ) 0 and x k is not a stationary point of F. Lemma 5. Let x and t (0, 1) be given and assume that there exists an s satisfying F(x) + F (x) s < F(x) and s Null(F (x)). Then there exists η min [0, 1) such that, for any η [η min, 1), there is an s satisfying F(x) + F (x)s η F(x) F(x + s) [1 t(1 η)] F(x) s Null(F (x)). Proof. Clearly F(x) 0 and s 0. Set η F(x)+F (x) s F(x), ǫ { η min max where δ > 0 is sufficiently small that (1 t)(1 η) F(x), s η, 1 (1 η)δ s }, F(x + s) F(x) F (x)s ǫ s whenever s δ. For any η [η min, 1), let s 1 η s. Then 1 η 23

F(x) + F (x)s = η η 1 η (F(x)) + (F(x) + F (x) s) 1 η 1 η η η 1 η F(x) + F(x) + F (x) s 1 η 1 η = η η 1 η F(x) + η F(x) 1 η 1 η = η F(x), and, since s = 1 η s 1 η min s δ, 1 η 1 η it follows that F(x + s) F(x + s) F(x) F (x)s + F(x) + F (x)s ǫ s + η F(x) = ǫ 1 η s + η F(x) 1 η = (1 t)(1 η) F(x) + η F(x) = [1 t(1 η)] F(x). Assume y is an element of the null-space of F (x). Then Thus s Null(F (x)). s T y = ( 1 η 1 η s)t y = 1 η 1 η ( s)t y = 0. Theorem 8. Assume that {x k } is generated by Algorithm GINMU. If k 0 (1 η k) is divergent, then F(x k ) 0. If, in addition, x is a limit point of {x k } such that F (x ) is of full rank, and there exists a Γ independent of k for which s k Γ(1 η k ) F(x k ) (3.5) whenever x k is sufficiently near x and k is sufficiently large, then F(x ) = 0 and x k x. 24

Proof. From equation (3.4), F(x k ) [1 t(1 η k 1 )] F(x k 1 ) F(x 0 ) [1 t(1 η j )] 0j<k [ F(x 0 ) exp t ] (1 η j ). 0j<k Since t > 0 and 1 η j > 0, the divergence of k 0 (1 η k) implies F(x k ) 0. Suppose that x is a limit point of {x k } such that F (x ) is of full rank and that {x k } does not converge to x. Let δ > 0 be such that there exist infinitely many k for which x k / N δ (x ) and sufficiently small that (3.5) holds whenever x k N δ (x ) and k is sufficiently large. Since x is a limit point of {x k }, there exist {k j } and {l j } such that, for each j, x kj N δ/j (x ), x kj +i N δ (x ), i = 0,...,l j 1 x kj +l j / N δ (x ), k j + l j < k j+1. Then for j sufficiently large, δ/2 x kj +l j x kj = Γ t k j +l j 1 k=k j k j +l j 1 k=k j k j +l j 1 Γ k=k j Γ t s k Γ(1 η k ) F(x k ) { F(x t k) F(x k+1 ) } { F(xkj ) F(x kj +l j ) } { F(xkj ) F(x kj+1 ) }. But the last right-hand side converges to zero since x kj x ; hence, this inequality cannot hold for large j. An alternate proof of the first half of the theorem follows: 25

Proof. (Walker, private communication) From equation (3.4), t(1 η k 1 ) F(x k 1 ) F(x k 1 ) F(x k ) and F(x 0 ) F(x k ) = k ( F(x i 1 ) F(x i ) ) i=1 This implies = F(x 0 ) = k 1 ( F(x i 1 ) F(x i ) ) + F(x k 1 ) F(x k ). i=1 k 1 ( F(x i 1 ) F(x i ) ) + F(x k 1 ) i=1 k 1 ( F(x i 1 ) F(x i ) ) i=1 k 1 (t(1 η i 1 ) F(x i 1 ) ) i=1 k 1 = t (1 η i 1 ) F(x i 1 ). i=1 Since t > 0 and 1 η j > 0, the divergence of k 0 (1 η k) implies F(x k ) 0. 3.3 Backtracking Methods The global inexact Newton method for an under-determined system presented above generates a sequence of steps satisfying the inexact Newton and sufficient decrease conditions. This section discusses methods for determining satisfactory steps. Assume that an initial step satisfying (3.1) and (3.2) can be found, i.e., an s k approximating the Moore Penrose step and a forcing term, η k, are computed. Furthermore, assume this step does not satisfy the sufficient decrease condition. Backtracking methods 26

systematically scale s_k and η_k to find an s_k and η_k satisfying all three conditions (3.1), (3.2), and (3.4). This leads to the under-determined backtracking method (BINMU):

Algorithm BINMU:
  Let x_0, t ∈ (0, 1), η_max ∈ [0, 1), and 0 < θ_min < θ_max < 1 be given.
  For k = 0 step 1 until ∞ do:
    Find some η̄_k ∈ [0, η_max] and s̄_k that satisfy
      ‖F(x_k) + F′(x_k)s̄_k‖ ≤ η̄_k‖F(x_k)‖,
      s̄_k ⊥ Null(F′(x_k)).
    Evaluate F(x_k + s̄_k).
    Set η_k = η̄_k and s_k = s̄_k.
    While ‖F(x_k + s_k)‖ > [1 − t(1 − η_k)]‖F(x_k)‖ do:
      Choose θ ∈ [θ_min, θ_max].
      Update s_k ← θs_k and η_k ← 1 − θ(1 − η_k).
      Evaluate F(x_k + s_k).
    Set x_{k+1} = x_k + s_k.

If a step satisfying the original inexact Newton condition is found, then properly scaling the step will yield a step satisfying both a modified inexact Newton condition and the associated sufficient decrease condition.

Theorem 9. Assume that at the kth step of Algorithm BINMU there exist an η̄_k ∈ [0, η_max] and s̄_k satisfying

    ‖F(x_k) + F′(x_k)s̄_k‖ ≤ η̄_k‖F(x_k)‖,
    s̄_k ⊥ Null(F′(x_k)).

Also, assume F′(x_k) is of full rank. Then the while loop will terminate in a finite

number of steps with an s k and η k satisfying F(x k ) + F (x k )s k η k F(x k ) s k Null(F (x k )) F(x k+1 ) [1 t(1 η k )] F(x k ). Proof. The ith iteration of the while loop scales s k by some θ i [θ min,θ max ]. At the m th step of the while loop s k = m i=1 θ i s k and η k = 1 m i=1 θ i(1 η k ). Notice m i=1 θ i m i=1 θ max = θ m max. Given any ǫ > 0, an m can be found such that Θ m m i=1 θ i < ǫ. Choose m large enough such that F(x k + Θ m s k ) F(x k ) F (x k )Θ m s k C Θ m s k, where C = 1 F (x k ) + ((1 t)(1 ηmax) (1+ η max) ). We claim that s k = Θ m s k and η k = 1 Θ m (1 η k ) are satisfactory. Indeed, F(x k ) + F (x k )s k = (1 Θ m )F(x k ) + Θ m F(x k ) + Θ m F (x k ) s k (1 Θ m ) F(x k ) + Θ m F(x k ) + F (x k ) s k (1 Θ m ) F(x k ) + Θ m η k F(x k ) = [1 Θ m + Θ m η k ] F(x k ) = [1 Θ m (1 η k )] F(x k ) = η k F(x k ). Further, assume y is an element of the null-space of F (x k ); 28

s T k y = (Θ m s k ) T y = Θ m ( s k ) T y = 0. Finally, F(x k + s k ) F(x k ) + F (x k )s k + F(x k + s k ) F(x k ) F (x k )s k η k F(x k ) + F(x k + s k ) F(x k ) F (x k )s k η k F(x k ) + C s k η k F(x k ) + CΘ m F (x k ) + F (x k ) s k η k F(x k ) + CΘ m F (x k ) + ( F(x k ) + F(x k ) + F (x k ) s k ) η k F(x k ) + CΘ m F (x k ) + (1 + η k ) F(x k ) = [ η k + CΘ m F (x k ) + (1 + η k ) ] F(x k ) [ = η k + Θ ] m (1 t)(1 η max )(1 + η k ) F(x k ) 1 + η max [ η k + Θ m 1 + η max (1 t)(1 η max )(1 + η max ) ] F(x k ) [η k + Θ m (1 t)(1 η k )] F(x k ) = [η k + 1 1 + Θ m (1 η k ) t + t tθ m (1 η k )] F(x k ) = [η k + 1 (1 Θ m (1 η k )) t + t(1 Θ m (1 η k ))] F(x k ) = [η k + 1 η k t + tη k ] F(x k ) = [1 t + tη k ] F(x k ) = [1 t(1 η k )] F(x k ). Therefore, satisfactory s k and η k are found in at most m steps, so the while loop always terminates in a finite number of steps. 29

Theorem 10. Assume that {x k } is generated by Algorithm BINMU. Assume x is a limit point of {x k } such that F (x ) is of full rank. Then F(x ) = 0 and x k x. Furthermore, η k = η k and s k = s k for all sufficiently large k. Proof. Set K = F (x ) + and let δ > 0 be sufficiently small that F (x) + 2K whenever x N δ(x ). Let ǫ = 1 ((1 t)(1 ηmax) 2K (1+ η max) ). There exists a ˆδ > 0 such that F(z) F(y) F (y)(z y) ǫ z y for all z,y Nˆδ(x ). Let δ = min{ δ, ˆδ/2}. Suppose that x k N δ (x ). Let m be the smallest integer such that θmax2k(1 m + η max ) F(x 0 ) < δ. Then, with Θ m as in the proof of Theorem 9, Θ m s k θmax s m k = θmax s m k θ m max2k(1 + η k ) F(x k ) θ m max2k(1 + η max ) F(x k ) θ m max2k(1 + η max ) F(x 0 ) < δ. This implies F(x k + Θ m s k ) F(x k ) F (x k )Θ m s k ǫ Θ m s k, which, as in the proof of Theorem 9, guarantees that the while loop terminates in at most m iterations. Therefore, when x k N δ (x ) we have 1 η k = Θ m (1 η k ) Θ m (1 η max ) θmin(1 m η max ). 30

It is clear that θ m min(1 η max ) > 0. It is given that x is a limit point of {x k }, so there are an infinite number of x k N δ (x ). Therefore, the sum k 0 (1 η k) diverges. We claim that s k Γ(1 η k ) F(x k ) for some Γ independent of k when x k N δ (x ). Indeed, s k F (x k ) + F (x k )s k 2K ( F(x k ) + F(x k ) + F (x k )s k ) 2K(1 + η k ) F(x k ) 2K(1 + η max ) F(x k ) 2K(1+ηmax)(1 η k) F(x θmin m (1 ηmax) k ) = Γ(1 η k ) F(x k ) with Γ = 2K(1 + η max) θ m min (1 η max). Therefore, by Theorem 8 we have that F(x ) = 0 and x k x. To show η k = η k for all sufficiently large k, it is sufficient to show that F(x k + s k ) F(x k ) F (x k ) s k ǫ s k (3.6) for all sufficiently large k. Equation (3.6) is true if s k < δ. Note that x k N δ (x ) for all sufficiently large k, and, therefore F (x k ) + 2K. Now s k F (x k ) + F (x k ) s k 2K( F(x k ) + F(x k ) + F (x k ) s k ) 2K(1 + η k ) F(x k ) 2K(1 + η max ) F(x k ). It is clear that F(x k ) 0, so there exists some k such that for all k > k we have 2K(1 + η max ) F(x k ) < δ. Therefore, for k > k, s k < δ, which implies (3.6). 31
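Before turning to the choice of θ, the following MATLAB fragment sketches one outer iteration of Algorithm BINMU, assuming an initial pair (s̄_k, η̄_k) satisfying (3.1) and (3.2) has already been computed. The fixed scaling factor and the cap on the number of reductions are illustrative simplifications only: Theorem 9 guarantees termination without the cap, and Section 3.3.1 below describes better ways to choose θ.

  % Sketch of one outer iteration of Algorithm BINMU.
  % F : handle returning F(x); s_bar, eta_bar : initial step and forcing term.
  function [x, s, eta] = binmu_step(F, x, s_bar, eta_bar, t, theta, max_red)
    s = s_bar;  eta = eta_bar;
    normFx = norm(F(x));
    for i = 1:max_red
        if norm(F(x + s)) <= (1 - t*(1 - eta))*normFx
            break;                      % sufficient decrease condition (3.4) holds
        end
        s   = theta*s;                  % scale the step ...
        eta = 1 - theta*(1 - eta);      % ... and the forcing term accordingly
    end
    x = x + s;                          % accept the (possibly scaled) step
  end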

3.3.1 Choosing the Scaling Factor

This section does not contribute to the mathematical literature, but it is included here for completeness. In-depth descriptions of the following methods can be found in [4] and [34].

A scaling factor θ ∈ [θ_min, θ_max] must be chosen at each iteration of the while loop in Algorithm BINMU. The goal is to choose θ such that x_{k+1} = x_k + θs_k is an acceptable next iterate. Ideally, θ minimizes ‖F(x_k + θs_k)‖, or equivalently ‖F(x_k + θs_k)‖^2. However, this one-dimensional minimization problem may be computationally expensive. An alternative is to find an easy-to-minimize approximation to ‖F(x_k + θs_k)‖^2. Two of the most popular schemes for choosing θ are described below.

The quadratic backtracking method chooses θ to be the minimizer of a quadratic polynomial approximating the function g(θ) = ‖F(x_k + θs_k)‖^2. Let p(θ) denote this quadratic polynomial. The polynomial can be defined using three pieces of information: g(0) = ‖F(x_k)‖^2, g(1) = ‖F(x_k + s_k)‖^2, and g′(0) = 2F(x_k)^T F′(x_k)s_k. Notice that the first two are already known, and the third is relatively inexpensive to calculate. Using these three values, p(θ) is determined, and its minimizer can be calculated. The quadratic polynomial is given by

    p(θ) = [g(1) − g(0) − g′(0)]θ^2 + g′(0)θ + g(0).    (3.7)

The derivatives are

    p′(θ) = 2[g(1) − g(0) − g′(0)]θ + g′(0)  and  p″(θ) = 2[g(1) − g(0) − g′(0)].

If p″(θ) ≤ 0, then the quadratic function is concave down, so choose θ = θ_max. If p″(θ) > 0, then find θ such that p′(θ) = 0:

    0 = 2[g(1) − g(0) − g′(0)]θ + g′(0)  ⟹  θ = −g′(0) / (2[g(1) − g(0) − g′(0)]).

Correcting to ensure θ ∈ [θ_min, θ_max], one obtains θ. The method updates θs_k → s_k and 1 − θ(1 − η_k) → η_k and then checks to see whether the updated s_k and η_k satisfy the step-selection criterion.

The cubic backtracking method uses a cubic polynomial to approximate g(θ) = ‖F(x_k + θs_k)‖^2 after the first step-length reduction. The cubic polynomial, p(θ), is constructed using four interpolation values. Again, θ is chosen to be the minimizer of p(θ) over [θ_min, θ_max]. On the first step reduction there is no clear way to choose a fourth value, so just three are chosen, and a quadratic polynomial is minimized, yielding θ_1. If subsequent reductions are necessary, four values are available. The two values g(0) and g′(0) are used, along with the values of g at the two previous θ values. For example, θ_2 is found using g(0), g′(0), g(θ_1), and g(1). Generalizing, θ_i uses g(0), g′(0), g(θ_{i−1}), and g(θ_{i−2}). As in [4], denote the two previous θ values by θ_prev and θ_2prev. The cubic polynomial approximation of the function ‖F(x_k + θs_k)‖^2 is p(θ) = aθ^3 + bθ^2 + g′(0)θ + g(0) with

    [a]          1          [ 1/θ_prev^2           −1/θ_2prev^2       ] [ g(θ_prev)  − g(0) − g′(0)θ_prev  ]
    [b]  =  ――――――――――――――  [ −θ_2prev/θ_prev^2     θ_prev/θ_2prev^2  ] [ g(θ_2prev) − g(0) − g′(0)θ_2prev ].
            θ_prev − θ_2prev

The local minimizer of the model is given by θ_+ = (−b + sqrt(b^2 − 3ag′(0))) / (3a). As before, update the step and forcing term by θs_k → s_k and 1 − θ(1 − η_k) → η_k.
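As an illustration of the quadratic strategy (the cubic variant is analogous), the following MATLAB function computes the safeguarded minimizer of (3.7); the function name and variable names are assumptions made for the sketch.

  % Scaling factor from the quadratic backtracking model (3.7).
  % g0 = ||F(x_k)||^2, g1 = ||F(x_k + s_k)||^2, gp0 = 2*F(x_k)'*(J(x_k)*s_k).
  function theta = quad_backtrack_theta(g0, g1, gp0, theta_min, theta_max)
    a = g1 - g0 - gp0;                 % half of p''(theta)
    if a <= 0
        theta = theta_max;             % model is concave down: take the largest allowed factor
    else
        theta = -gp0 / (2*a);          % unconstrained minimizer of p(theta)
    end
    theta = min(max(theta, theta_min), theta_max);   % safeguard into [theta_min, theta_max]
  end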

3.4 Trust Region Methods for Under-Determined Systems

Trust region methods for an under-determined system of equations are very similar to the methods for the fully determined system. That is, we define a region in which the local linear model is expected to be an accurate representation of the nonlinear function. A step is chosen to minimize the local linear model norm within this region and is tested to see whether it satisfies the ared/pred condition. If it does not, the trust region is shrunk and a new minimizer is calculated. To ensure locally fast convergence, the steps must approach the Moore-Penrose steps as {x_k} approaches a root of F(x). This condition suggests that each step be chosen such that it is orthogonal to the null space of the Jacobian. The trust region method for under-determined systems (UTR) becomes:

Algorithm UTR:
  Let x_0, δ̄_0 > 0, 0 < t ≤ u < 1, and 0 < θ_min < θ_max < 1 be given.
  For k = 0 step 1 until ∞ do:
    Set δ_k = δ̄_k and choose s_k ∈ arg min_{‖s‖ ≤ δ_k} ‖F(x_k) + F′(x_k)s‖ with s_k ⊥ Null(F′(x_k)).
    While ared_k(s_k) < t·pred_k(s_k) do:
      Choose θ ∈ [θ_min, θ_max].
      Update δ_k ← θδ_k, and choose s_k ∈ arg min_{‖s‖ ≤ δ_k} ‖F(x_k) + F′(x_k)s‖ with s_k ⊥ Null(F′(x_k)).
    Set x_{k+1} = x_k + s_k.
    If ared_k(s_k) ≥ u·pred_k(s_k), choose δ̄_{k+1} ≥ δ_k; else choose δ̄_{k+1} ≥ θ_min·δ_k.

The analogy between UTR and TR is manifest in this section. Parallels between

the theorems presented here and in [5] reflect this close relationship. The following Lemma was shown in [5] in the fully determined case and is extended here to the under determined case. Lemma 6. Assume that {x k } is such that pred k (s k ) (1 η k ) F(x k ) and ared k (s k ) t (1 η k ) F(x k ) for each k, where t (0, 1) is independent of k. If x is a limit point of {x k } such that there exists a Γ independent of k for which s k Γ pred k (s k ) whenever x k is sufficiently near x and k is sufficiently large, then x k x. Proof. Assume that {x k } does not converge to x. Let δ > 0 be such that there exist infinitely many k for which x k / N δ (x ) and sufficiently small that s k Γ pred k (s k ) holds whenever x k N δ (x ) and k is sufficiently large. Since x is a limit point of {x k }, there exist {k j } and {l j } such that, for each j, Then for j sufficiently large, x kj N δ/j (x ), x kj +i N δ (x ), i = 0,...,l j 1 x kj +l j / N δ (x ), k j + l j < k j+1. δ/2 x kj +l j x kj k j +l j 1 k=k j k j +l j 1 s k k=k j Γ pred k (s k ) k j +l j 1 k=k j Γ ared t k(s k ) = Γ t { F(x k j ) F(x kj +l j ) } Γ t { F(x k j ) F(x kj+1 ) }. But the last right-hand side converges to zero because F is continuous and x kj x ; hence, this inequality cannot hold for large j. 35

Lemma 7. Assume that {x k } is a sequence generated by Algorithm UTR. Suppose that x is a limit point of {x k } such that there exists a Γ independent of k for which s k Γ pred k (s k ) (3.8) whenever x k is sufficiently near x and k is sufficiently large. Then x k x and lim inf k δ k > 0. Proof. It is clear that {x k } satisfies the hypotheses of Lemma 6, and it follows immediately that x k x. Choose δ > 0 such that (3.8) holds whenever x k N δ (x ) and k is sufficiently large and also such that F(y) F(x) F (x)(y x) 1 u y x (3.9) Γ whenever x,y N δ (x ). Let k 0 be such that if k k 0, then x k N δ/2 (x ) and (3.8) holds. We claim if k k 0, then the while-loop in Algorithm UTR terminates with δ k min{ δ k0,θ min δ/2}. Note that if k k 0 and if s k is a trial step for which s k δ/2, then (3.8) and (3.9) give ared k (s k ) F(x k ) F(x k + s k ) F(x k ) F(x k ) + F (x k )s k F(x k + s k ) F(x k ) F (x k )s k pred k (s k ) 1 u s Γ k pred k (s k ) (1 u)pred k (s k ) u pred k (s k ) From this, we see that the while loop terminates when δ k δ/2. So, if the while loop never reduces the radius, we have δ k = δ k. If reductions occur, the loop terminates 36

on or before the iteration that first brings δ k δ/2, which implies δ k θ min δ/2. So δ k min{ δ k,θ min δ/2}. (3.10) Furthermore, if δ k δ/2 on termination, then δ k+1 δ k ; whereas, if δ k > δ/2 on termination, then δ k+1 θ min δ k > θ min δ/2. Thus δ k+1 min{ δ k,θ min δ/2}. (3.11) Induction on (3.10) and (3.11) gives that the while loop terminates with δ k min{ δ k0,θ min δ/2}. Finally, δ k min{ δ k0,θ min δ/2} implies lim inf k δ k min{ δ k0,θ min δ/2} > 0. Lemma 8. If x is such that F (x ) is of full rank, then there exist Γ and ǫ > 0 such that for any δ > 0, s arg min s δ F(x) + F (x) s (3.12) and s Null(F (x k )) (3.13) satisfies s Γ{ F(x) F(x) + F (x)s } (3.14) whenever x N ǫ (x ). Proof. Set K F(x ) + and let ǫ > 0 be sufficiently small that F (x) is of full rank and F(x) + 2K whenever x N ǫ (x ). Suppose that x N ǫ (x ) and s is given by (3.12) and (3.14) for an arbitrary δ > 0. Denote s MP F (x) + F(x). s MP is the unique global minimizer of the norm of the local linear model orthogonal to the null space of F (x). We claim that s s MP. Indeed, if s MP δ then 37

s = s MP, otherwise the radius is too small to include s MP and s δ < s MP. If s MP = 0, then s = 0 and (3.14) holds trivially for any Γ. If s MP 0 then we know that, since s minimizes F(x) + F (x) s over all s δ with s Null(F (x)), it must be that F(x) + F (x)s F(x) + F (x) s s MP smp ; therefore, F(x) F(x) + F (x)s F(x) F(x) + F (x) s s MP smp = F(x) F(x) + F (x) s s MP ( F (x) + F(x)) = F(x) F(x) s s MP F(x) = F(x) + s smp F(x) s MP = s s MP F(x). Since s MP = F (x) + F(x) and therefore, s MP F (x) + F(x) 2K F(x) we have Hence, 1 2K F(x) s MP. F(x) F(x) + F (x)s s 2K, and (3.14) holds with Γ 2K for all x N ǫ (x ). Lemma 9. If x is not a stationary point of F, then there exist Γ, δ > 0 and ǫ > 0 such that s given by (3.12) satisfies (3.14) whenever x N ǫ (x ) and 0 < δ < δ. Proof. Let ǫ > 0 be such that if x N ǫ (x ), then F(x) 1 2 F(x ). Let s be such that F(x ) + F (x )s < F(x ). Choose η such that F(x ) + F (x )s / F(x ) < η < 1. 38

Since F and F are continuous, there exists ǫ (0,ǫ] such that F(x) + F (x)s η F(x) whenever x N ǫ (x ). Choose δ (0, s ). Suppose that x N ǫ (x ) and 0 < δ δ. For s given by (3.12), we have F(x) F(x) + F (x)s F(x) F(x) + F (x) s F(x) ( 1 s s s s F(x) + F (x)s (1 η ) F(x) s s (1 η ) F(x ) 2 s s, s s ) F(x) and (3.14) holds with Γ 2 s (1 η ) F(x ). Theorem 11. Assume that {x k } is a sequence produced by Algorithm UTR. Then every limit point of {x k } is a stationary point of F. If x is a limit point of {x k } such that F (x ) is of full rank, then F(x ) = 0 and x k x ; furthermore, s k = F (x k ) + F(x k ) whenever k is sufficiently large. Proof. Assume that x is a limit point of {x k } that is not a stationary point of F. We claim that, for any δ > 0, there exists an ǫ > 0 such that if x k N ǫ (x ) and k is sufficiently large, then δ k δ. If this were not true, then there would exist some 39

δ > 0 and x kj x k such that x kj x and δ kj > δ for each j. Then 0 = lim j { F(x kj ) F(x kj+1 ) } lim { F(x kj ) F(x kj j +1) } = lim j ared kj (s kj ) t lim j pred kj (s kj ) = t lim j { F(x kj ) F(x kj ) + F (x kj )s kj } = t lim j { F(x kj ) min s δ kj F(x kj ) + F (x kj )s } t lim j { F(x kj ) min s δ F(x k j ) + F (x kj )s } = t { F(x ) min s δ F(x ) + F (x )s }. However, the last right hand side must be positive since x is not a stationary point. Now let Γ, δ and ǫ be as in Lemma 9. By the above claim, there exists ǫ (0,ǫ ] such that if x k N ǫ (x ) and k is sufficiently large, then δ k δ. By Lemma 9, we have s k Γpred k (s k ) for Γ independent of k whenever x k N ǫ (x ) and k is sufficiently large. Then Lemma 7 implies that x k x and lim inf k δ k > 0. But since the claim implies that δ k 0, this is a contradiction. Hence, x must be a stationary point. Suppose that x is a limit point of {x k } such that F (x ) is of full rank. Since x must be a stationary point, we must have F(x ) = 0. It follows from Lemma 8 that there exists a Γ independent of k for which s k Γpred k (s k ) whenever x k is sufficiently near x. Then Lemma 7 implies that x k x, and there exists a δ > 0 such that δ k δ for sufficiently large k. Since x k x and F(x k ) F(x ) = 0, we have that lim F (x k ) + F(x k ) lim F (x k ) + F(x k ) = F (x k ) + F(x k ) = 0 k k 40

implies that, for sufficiently large k, the Moore-Penrose step will be accepted. Thus s k = F (x k ) + F(x k ) whenever k is sufficiently large. 3.4.1 Under Determined Dogleg Method At each iteration, calculating the trust region step requires the minimization of F(x) + F (x)s over all s such that s δ. Calculating the trust region step to a high degree of accuracy is usually prohibitively expensive. Many methods of approximating the step have been developed. Examples include the locally constrained optimal hook step method, the dogleg step method and the double dogleg step method, see [4] and [34]. The dogleg method ([25, 24]) is the focus of this section. The method builds a piecewise linear curve, Γ DL, which approximates the curve minimizing the local linear model in the trust region. In the fully determined case, the dogleg curve connects the current point to the Cauchy point and subsequently the Newton point. The Cauchy point is defined to be the minimizer of l(s) 1 F(x) + F (x)s 2 2 2 in the steepest descent direction l(0) = F (x) T F(x), the steepest descent point, [34] s CP k = F (x) T F(x) 2 2 F (x) T F(x). F (x)f (x) T F(x) 2 2 Here, we replace the Newton point with the Moore Penrose point. We denote the Cauchy point by s CP k and the Moore Penrose point by s MP k. In the fully determined case, the dogleg curve, Γ DL, intersects the trust region boundary at a single point [4]. This result is easily extended to the under determined case. The dogleg step is the step from the current point to the intersection point. We introduce the underdetermined Dogleg method (UDL): 41