Maria Cameron

1. Trust Region Methods

At every iteration, a trust region method generates a model m_k(p), chooses a trust region, and solves the constrained optimization problem of minimizing m_k(p) within the trust region. Typically the trust region is a ball of radius Δ_k around x_k that is updated at every iteration. For poorly scaled problems, ellipsoidal trust regions can be chosen. The model m_k(p) is typically quadratic and given by

m_k(p) = f_k + ∇f_k^T p + ½ p^T B_k p,   f_k := f(x_k),   ∇f_k := ∇f(x_k),

where B_k is some symmetric matrix. When B_k = ∇²f(x_k) we have a trust region Newton method. In the rest of this section, we will discuss the outline of the trust region algorithm and its convergence, and exact and approximate techniques for solving the constrained optimization problem

(1)   min_p m_k(p) = f_k + ∇f_k^T p + ½ p^T B_k p,   subject to ‖p‖ ≤ Δ_k.

1.1. Outline of the algorithm and convergence. The agreement between the model m_k and the objective function within the trust region is quantified by the ratio

(2)   ρ_k := (f(x_k) − f(x_k + p_k)) / (m_k(0) − m_k(p_k)).

The numerator is called the actual reduction, and the denominator is called the predicted reduction. The predicted reduction is always nonnegative. If ρ_k is close to 1, the model is quite accurate, and the trust region can be increased. If ρ_k is close to zero or negative, the model makes a poor prediction; then the trust region needs to be decreased and the step needs to be rejected. The algorithm implementing these ideas is given below.

Algorithm Trust Region
Input: Δ_max > 0, Δ_0 ∈ (0, Δ_max], η ∈ [0, 1/4).
for k = 0, 1, 2, ...
    Obtain p_k by solving Eq. (1) exactly or approximately;
    Calculate ρ_k from Eq. (2);
    if ρ_k < 1/4, set Δ_{k+1} = (1/4)‖p_k‖;
    else if ρ_k > 3/4 and ‖p_k‖ = Δ_k, set Δ_{k+1} = min{2Δ_k, Δ_max};
    else set Δ_{k+1} = Δ_k;
    if ρ_k > η, accept the step: x_{k+1} = x_k + p_k;
    else reject the step: x_{k+1} = x_k;
end

The convergence properties of this algorithm depend on the parameter η and on whether some sufficient decrease is achieved at every iteration.
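The loop above can be sketched in code. This is a minimal sketch assuming NumPy; for concreteness the subproblem is solved with the Cauchy step of Section 1.4.1, and names such as `trust_region` and `cauchy_step` are illustrative, not from [1].

```python
import numpy as np

def cauchy_step(g, B, Delta):
    """Minimizer of the model along -g within the trust region (Section 1.4.1)."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    tau = 1.0 if gBg <= 0 else min(gnorm ** 3 / (Delta * gBg), 1.0)
    return -tau * (Delta / gnorm) * g

def trust_region(f, grad, hess, x0, Delta_max=1.0, Delta0=0.5, eta=0.1,
                 tol=1e-8, max_iter=200):
    x, Delta = np.asarray(x0, dtype=float), Delta0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        p = cauchy_step(g, B, Delta)
        pred = -(g @ p + 0.5 * p @ B @ p)            # m_k(0) - m_k(p_k) >= 0
        rho = (f(x) - f(x + p)) / pred               # Eq. (2)
        if rho < 0.25:
            Delta = 0.25 * np.linalg.norm(p)
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), Delta):
            Delta = min(2.0 * Delta, Delta_max)
        if rho > eta:                                # accept or reject the step
            x = x + p
    return x
```

For example, on the convex quadratic f(x) = ½x^T A x − b^T x with A = diag(2, 1), b = (1, 1), the iterates converge to the minimizer A⁻¹b = (0.5, 1).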
The sufficient decrease condition is given by the inequality

(3)   m_k(0) − m_k(p_k) ≥ c_1 ‖∇f_k‖ min{Δ_k, ‖∇f_k‖/‖B_k‖},   c_1 ∈ (0, 1].

Theorem 1. Suppose ‖B_k‖ ≤ β for some constant β, and f is continuously differentiable and bounded from below on the set {x : f(x) ≤ f(x_0)}. Suppose that the approximate solutions of Eq. (1) satisfy the condition (3). Then

(1) if η = 0 in the algorithm Trust Region, then

lim inf_{k→∞} ‖∇f_k‖ := lim_{k→∞} ( inf_{m≥k} ‖∇f_m‖ ) = 0,

i.e., one can extract a subsequence from {‖∇f_k‖} converging to zero;

(2) if η ∈ (0, 1/4) in the algorithm Trust Region and, in addition, ∇f is Lipschitz continuous on the set {x : f(x) ≤ f(x_0)}, then

lim_{k→∞} ‖∇f_k‖ = 0.

1.2. Characterization of the exact solution of the trust region problem.

Theorem 2. The vector p* is a global solution of the trust-region problem

(4)   min_{p∈R^n} m(p) = f + g^T p + ½ p^T B p,   subject to ‖p‖ ≤ Δ,

if and only if p* is feasible and there is a scalar λ ≥ 0 such that the following conditions are satisfied:

(5)   (B + λI) p* = −g,
(6)   λ(Δ − ‖p*‖) = 0,
(7)   (B + λI) is positive semidefinite.

Condition (6) shows that at least one of the following holds: λ = 0 or ‖p*‖ = Δ. This means that either p* is an unconstrained global minimizer of m(p) or, if not, ‖p*‖ = Δ, i.e., the constrained minimum is achieved on the boundary of the region. Condition (5) implies that if λ > 0, then

λ p* = −(B p* + g) = −∇m(p*),

i.e., p* points in the steepest descent direction of m and hence is orthogonal to the level sets of m(p). Condition (7) tells us that if λ_1 ≤ λ_2 ≤ ... ≤ λ_n are the eigenvalues of B, then λ ∈ [−λ_1, ∞). The proof of this theorem relies on the following lemma.

Lemma 1. Let m be the quadratic function defined by

m(p) = g^T p + ½ p^T B p,

where B is any symmetric matrix. Then

(1) m attains a minimum if and only if B is positive semidefinite and g is in the range of B;
(2) m has a unique minimizer if and only if B is positive definite;
(3) if B is positive semidefinite, then every p satisfying Bp = −g is a global minimizer of m.

Note that if g is not in the range of B, then m(p) does not attain a minimum. For example, let m(x, y) = ½x² + y. Here

B = [1 0; 0 0],   g = [0; 1],

and g is not in the range of B. Obviously, inf m(x, y) = −∞: m is unbounded below along the line x = 0.

Proof. (1) (⟸): Since g is in the range of B, one can find p such that Bp = −g. Then for all w ∈ R^n we have

m(p + w) = g^T(p + w) + ½(p + w)^T B(p + w)
         = (g^T p + ½ p^T B p) + g^T w + (Bp)^T w + ½ w^T B w
         = m(p) + ½ w^T B w ≥ m(p),

since (Bp)^T w = −g^T w and B is positive semidefinite.
(⟹): Let p be a minimizer of m. Since ∇m(p) = Bp + g = 0, g is in the range of B. Also, ∇²m(p) = B is positive semidefinite.
(2) (⟸): Since B is positive definite and hence invertible, one can find p such that Bp = −g. Repeating the calculation from the previous item and taking into account that ½ w^T B w > 0 for all nonzero w, we obtain that the minimizer is unique.
(⟹): Let p be a minimizer of m. From the proof of the previous item, B must be positive semidefinite. If B is not positive definite, one can find w ≠ 0 such that w^T B w = 0, which for positive semidefinite B implies Bw = 0. Then m(p) = m(p + w); hence the minimizer is not unique, a contradiction.
(3) The proof of the last item follows from the proof of the first item. □

Now we will prove Theorem 2.

Proof. (⟸): Suppose p* is feasible and there is λ ≥ 0 such that Eqs. (5)-(7) are satisfied. Lemma 1 (3) implies that p* is a global minimizer of the quadratic function

m̂(p) = g^T p + ½ p^T (B + λI) p = m(p) + (λ/2) p^T p.

Since m̂(p) ≥ m̂(p*), we have

m(p) ≥ m(p*) + (λ/2)(p*^T p* − p^T p).

Since λ(Δ − ‖p*‖) = 0 and therefore λ(Δ² − p*^T p*) = 0, we have

m(p) ≥ m(p*) + (λ/2)(Δ² − p^T p).

Since λ ≥ 0 and p^T p ≤ Δ² for feasible p, we have m(p) ≥ m(p*) for all p such that ‖p‖ ≤ Δ. Therefore, p* is a global solution of Eq. (4).

(⟹): Suppose p* is a global solution of Eq. (4). First consider the case where ‖p*‖ < Δ. Then p* is an unconstrained minimizer of m(p). Hence

∇m(p*) = g + Bp* = 0,   ∇²m(p*) = B is positive semidefinite.

Hence Eqs. (5)-(7) hold with λ = 0.

Now we assume that ‖p*‖ = Δ. Then Eq. (6) is satisfied automatically. Since p* is the minimizer of m subject to the constraint ‖p‖ = Δ, the Lagrangian function

L(p, λ) = m(p) + (λ/2)(p^T p − Δ²)

has a stationary point at p* satisfying

∇_p L(p*, λ) = Bp* + g + λp* = (B + λI)p* + g = 0.

Hence Eq. (5) holds. Since m(p) ≥ m(p*) for all p such that ‖p‖ = Δ, we have

m(p) ≥ m(p*) + (λ/2)(p*^T p* − p^T p).

Substituting the expression g = −(B + λI)p* into the last inequality, we get

½ (p − p*)^T (B + λI)(p − p*) ≥ 0.

Since the set of directions

{w : w = ±(p − p*)/‖p − p*‖, ‖p‖ = Δ}

is dense in the unit sphere, we conclude that (B + λI) is positive semidefinite.

It remains to show that λ ≥ 0. Since we have proven that (B + λI)p* = −g and B + λI is positive semidefinite, we have by Lemma 1 (3) that p* is a global minimizer of

m̂(p) = g^T p + ½ p^T (B + λI) p.

Hence m̂(p) ≥ m̂(p*), i.e.,

m(p) ≥ m(p*) + (λ/2)(p*^T p* − p^T p).

Now suppose that only some negative λ satisfies Eqs. (5)-(7). Then the last inequality shows that m(p) ≥ m(p*) whenever ‖p‖ ≥ ‖p*‖ = Δ. Since p* also minimizes m over all p with ‖p‖ ≤ Δ, we conclude that p* is an unconstrained global minimizer of m. From Lemma 1 (1) it then follows that Bp* = −g and B is positive semidefinite. Hence Eqs. (5)-(7) are satisfied by λ = 0, which contradicts the assumption that the λ satisfying these conditions is negative. Thus, there exists λ ≥ 0 satisfying Eqs. (5)-(7). □
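The conditions of Theorem 2 can be checked numerically on a small example. The following NumPy sketch (not part of the original notes) takes B = diag(1, 2), g = (−4, 0), Δ = 1; solving ‖(B + λI)⁻¹g‖ = 4/(1 + λ) = Δ by hand gives λ = 3.

```python
import numpy as np

# Example: B = diag(1, 2), g = (-4, 0), Delta = 1, lambda = 3 (computed by hand).
B = np.diag([1.0, 2.0])
g = np.array([-4.0, 0.0])
Delta, lam = 1.0, 3.0

p = np.linalg.solve(B + lam * np.eye(2), -g)                 # condition (5)
assert np.isclose(np.linalg.norm(p), Delta)                  # condition (6): lam > 0 forces ||p|| = Delta
assert np.all(np.linalg.eigvalsh(B + lam * np.eye(2)) >= 0)  # condition (7)

# p should do at least as well as every feasible point
m = lambda q: g @ q + 0.5 * q @ B @ q
rng = np.random.default_rng(1)
for _ in range(100):
    q = rng.standard_normal(2)
    q *= min(1.0, Delta / np.linalg.norm(q))                 # project into the trust region
    assert m(q) >= m(p) - 1e-9
```

Here the solution is p* = (1, 0) on the boundary, with model value m(p*) = −3.5.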

1.3. Calculation of a nearly exact solution. We start solving the trust region problem (4) by checking whether B is positive definite, and if it is, checking whether p = −B^{-1}g satisfies ‖p‖ ≤ Δ. If B is positive semidefinite (but singular) and g is in the range of B, one can find the minimum norm solution p of the underdetermined system Bp = −g and check whether ‖p‖ ≤ Δ.

Now suppose that either B is not positive semidefinite or the global minimizer of m satisfies ‖p‖ > Δ. Then we define

p(λ) := −(B + λI)^{-1} g,   λ > max{0, −λ_1},

where λ_1 is the smallest eigenvalue of B, and look for λ such that ‖p(λ)‖ = Δ. Let B = QΛQ^T, where

Λ = diag{λ_1, ..., λ_n},   λ_1 ≤ ... ≤ λ_n,   Q = [q_1 ... q_n],   q_j^T q_k = δ_jk.

Then

p(λ) = −Q(Λ + λI)^{-1} Q^T g = −Σ_{j=1}^n (q_j^T g)/(λ_j + λ) q_j,

and hence

(8)   ‖p(λ)‖² = Σ_{j=1}^n (q_j^T g)²/(λ_j + λ)².

Therefore, the problem of solving (4) is reduced to the 1D root-finding problem ‖p(λ)‖ = Δ with ‖p(λ)‖² given by Eq. (8). Note that if B is positive definite and ‖B^{-1}g‖ > Δ, then there is exactly one solution λ of this equation on the interval [0, ∞), since ‖p(λ)‖ is monotonically decreasing there and

lim_{λ→∞} ‖p(λ)‖ = 0.

Read [1] for details.

1.4. Approximate solution of the trust region problem. Three approaches for the approximate solution of the trust region problem (4) are considered in [1]: the dogleg approach; the 2D subspace approach, where p ∈ span{g, B^{-1}g}; and Steihaug's approach, which is good for large and sparse B_k = ∇²f(x_k) and is based on the Conjugate Gradient method. We will consider the dogleg approach and the 2D subspace minimization approach. We will start with the concept of the Cauchy point, which is used for reference: an approximate solution must reduce the model at least as much as the Cauchy point does.
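The reduction to a 1D root-finding problem can be sketched as follows. This is a NumPy sketch using plain bisection; it ignores the "hard case" q_1^T g = 0 discussed in [1], and the function name `nearly_exact_step` is illustrative.

```python
import numpy as np

def nearly_exact_step(B, g, Delta, tol=1e-10):
    """Solve min g^T p + 0.5 p^T B p subject to ||p|| <= Delta via Eq. (8)."""
    lam_eig, Q = np.linalg.eigh(B)           # eigenvalues ascending: lam_eig[0] = lambda_1
    c = Q.T @ g                              # coefficients q_j^T g
    p_norm = lambda lam: np.sqrt(np.sum((c / (lam_eig + lam)) ** 2))

    # Interior case: B positive definite and ||B^{-1} g|| <= Delta.
    if lam_eig[0] > 0 and p_norm(0.0) <= Delta:
        return np.linalg.solve(B, -g)

    # Boundary case: bisect for lambda > max(0, -lambda_1) with ||p(lambda)|| = Delta.
    lo = max(0.0, -lam_eig[0]) + 1e-12
    hi = lo + 1.0
    while p_norm(hi) > Delta:                # ||p(lambda)|| decreases to 0 as lambda -> infinity
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if p_norm(mid) > Delta else (lo, mid)
    lam = 0.5 * (lo + hi)
    return -Q @ (c / (lam_eig + lam))
```

For an indefinite B such as diag(−1, 2), the returned step lies on the boundary ‖p‖ = Δ, as Theorem 2 requires.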

1.4.1. Cauchy point. The Cauchy point is the minimizer of the model

m_k(p) = f_k + ∇f_k^T p + ½ p^T B_k p,   ‖p‖ ≤ Δ_k,

along the steepest descent direction −∇f_k. It is readily found in explicit form. A vector of length Δ_k in the steepest descent direction is

p_k^s := −(∇f_k/‖∇f_k‖) Δ_k.

We will look for the Cauchy point in the form p_k^c = τ_k p_k^s, τ_k ∈ [0, 1]. We need to consider two cases: ∇f_k^T B_k ∇f_k ≤ 0 and ∇f_k^T B_k ∇f_k > 0.

If ∇f_k^T B_k ∇f_k ≤ 0, the function

M(τ) := m_k(τ p_k^s) = f_k − τ ‖∇f_k‖ Δ_k + (τ²/2) (∇f_k^T B_k ∇f_k) Δ_k²/‖∇f_k‖²,   τ ∈ [0, 1],

decreases monotonically as τ grows whenever ∇f_k ≠ 0. Hence we need to pick the largest admissible τ_k, i.e., τ_k = 1.

If ∇f_k^T B_k ∇f_k > 0, the global minimum of M(τ) is achieved at

τ_min = ‖∇f_k‖³/(Δ_k ∇f_k^T B_k ∇f_k).

Hence if τ_min ≤ 1, the global minimum of M is achieved within the interval [0, 1]. Otherwise we need to pick the largest admissible τ, i.e., τ_k = 1. To summarize, we have found that the Cauchy point is given by

p_k^c = −τ_k (Δ_k/‖∇f_k‖) ∇f_k,

where

τ_k = 1 if ∇f_k^T B_k ∇f_k ≤ 0,   τ_k = min{ ‖∇f_k‖³/(Δ_k ∇f_k^T B_k ∇f_k), 1 } otherwise.

The Cauchy point provides a reduction of the model sufficient for global convergence. However, if we implement the Cauchy point at every step, we are simply using the steepest descent algorithm with a particular choice of step length, and it is well known that steepest descent performs poorly even if the optimal step length is chosen at every iteration. This consideration motivates us to find a better approximate solution of the trust region problem than the Cauchy point.
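The closed-form expressions above are easy to check numerically. In this NumPy sketch (`cauchy_point` is an illustrative name), the closed-form Cauchy point is compared against a brute-force scan of M(τ) over [0, 1] for random symmetric B, including indefinite ones.

```python
import numpy as np

def cauchy_point(g, B, Delta):
    """Closed-form Cauchy point for m(p) = g^T p + 0.5 p^T B p, ||p|| <= Delta."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    tau = 1.0 if gBg <= 0 else min(gnorm ** 3 / (Delta * gBg), 1.0)
    return -tau * (Delta / gnorm) * g

# Brute-force check: the closed form should match the minimum of M(tau) on [0, 1].
rng = np.random.default_rng(0)
for _ in range(5):
    A = rng.standard_normal((3, 3))
    B = (A + A.T) / 2                        # random symmetric, possibly indefinite
    g = rng.standard_normal(3)
    Delta = 1.0
    m = lambda p: g @ p + 0.5 * p @ B @ p
    pc = cauchy_point(g, B, Delta)
    brute = min(m(-t * Delta * g / np.linalg.norm(g)) for t in np.linspace(0, 1, 2001))
    assert m(pc) <= brute + 1e-9
```

For B = I the Cauchy point coincides with the unconstrained minimizer −g whenever ‖g‖ ≤ Δ.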

1.4.2. The dogleg method. The dogleg method is suitable for the case where B_k is positive definite. Its name is motivated by the fact that the solution of the trust region problem is sought along a path consisting of two line segments: from x_k to the unconstrained minimizer of m_k along the steepest descent direction, and from there toward the unconstrained minimizer of the quadratic model. We observe that if Δ is small, the quadratic term in m_k has little influence on the direction of the step: the direction is approximately −∇f_k. If Δ is large, the solution of the trust region problem is the global minimizer of the quadratic model.

The unconstrained minimizer of m_k along the steepest descent direction is given by

p^U = −(g^T g/(g^T B_k g)) g,   g := ∇f_k.

The global minimizer of the quadratic model is given by

p^B = −B_k^{-1} g.

The dogleg path p̃(τ), τ ∈ [0, 2], is defined by

p̃(τ) = τ p^U for 0 ≤ τ ≤ 1,   p̃(τ) = p^U + (τ − 1)(p^B − p^U) for 1 ≤ τ ≤ 2.

The following lemma shows that the dogleg path intersects the trust region boundary at most once, and the intersection point can be computed analytically.

Lemma 2. Let B_k be positive definite. Then
(1) ‖p̃(τ)‖ is an increasing function of τ;
(2) m(p̃(τ)) is a decreasing function of τ.

The proof can be found in [1]. The dogleg step is calculated as follows. If ‖p^B‖ ≤ Δ_k, then p = p^B. If ‖p^B‖ > Δ_k while ‖p^U‖ < Δ_k, we find τ by solving the quadratic equation

‖p^U + (τ − 1)(p^B − p^U)‖² = Δ_k².

If ‖p^U‖ ≥ Δ_k, we set

p = (Δ_k/‖p^U‖) p^U.

1.4.3. Two-dimensional subspace minimization. This approach is an extension of the dogleg approach. Suppose B is positive definite. Then we solve the following constrained minimization problem:

(9)   min_p m(p) = f + g^T p + ½ p^T B p,   ‖p‖ ≤ Δ,   p ∈ span{g, B^{-1} g}.

If B has negative eigenvalues, we look for p in another subspace, defined by

p ∈ span{g, (B + αI)^{-1} g},   α ∈ (−λ_1, −2λ_1],

where λ_1 is the most negative eigenvalue of B. If B has zero eigenvalues but no negative eigenvalues, we use the Cauchy point as an approximate solution.
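The dogleg computation of Section 1.4.2 admits a short implementation. This is a NumPy sketch assuming B_k is positive definite; the name `dogleg_step` is illustrative.

```python
import numpy as np

def dogleg_step(g, B, Delta):
    """Dogleg approximation to the trust-region step; assumes B positive definite."""
    pB = np.linalg.solve(B, -g)                      # full step: global minimizer of the model
    if np.linalg.norm(pB) <= Delta:
        return pB
    pU = -(g @ g) / (g @ B @ g) * g                  # minimizer along the steepest descent direction
    if np.linalg.norm(pU) >= Delta:
        return (Delta / np.linalg.norm(pU)) * pU     # boundary hit on the first segment
    # Second segment: solve ||pU + s (pB - pU)||^2 = Delta^2 for s in [0, 1].
    d = pB - pU
    a, b, c = d @ d, 2.0 * (pU @ d), pU @ pU - Delta ** 2
    s = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)  # positive root
    return pU + s * d
```

By Lemma 2 the quadratic in s has exactly one root in [0, 1], so taking the positive root of the quadratic formula suffices.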

References

[1] J. Nocedal and S. J. Wright, Numerical Optimization, Springer, 1999.