Lecture 7 Unconstrained nonlinear programming


Lecture 7: Unconstrained nonlinear programming. Weinan E (1,2) and Tiejun Li (2). (1) Department of Mathematics, Princeton University, weinan@princeton.edu. (2) School of Mathematical Sciences, Peking University, tieli@pku.edu.cn. No.1 Science Building, 1575

Outline: Application examples

Energy minimization: virtual drug design. Virtual drug design seeks the best position of a ligand (a small molecule) interacting with a large target protein molecule. It is equivalent to an energy minimization problem.

Energy minimization: protein folding. Protein folding is the problem of finding the minimal-energy state of a protein molecule from its sequence. It is an outstanding open problem for global optimization in molecular mechanics.

Energy minimization: mathematical formulation. The molecular force field has the form

V_total = Σ_i (k_{r_i}/2)(r_i − r_{i0})² + Σ_i (k_{θ_i}/2)(θ_i − θ_{i0})² + Σ_i (V_{n_i}/2)(1 + cos(n φ_i − γ_i)) + Σ_{ij} 4ε_{ij} [ (σ_{ij}/r_{ij})¹² − (σ_{ij}/r_{ij})⁶ ] + Σ_{ij} q_i q_j / (ε r_{ij})

(see the webpage for an explanation of the force field). The energy minimization problem with respect to the configuration of all the atoms is

min V_total(x_1, ..., x_N)

Nonlinear least squares. Suppose we have a series of experimental data (t_i, y_i), i = 1, ..., m. We wish to find a parameter vector x ∈ Rⁿ such that the residuals r_i(x) = y_i − f(t_i, x), i = 1, ..., m, are minimized. Mathematically, define the error function φ(x) = (1/2) r(x)ᵀ r(x), where r = (r_1, ..., r_m), and solve min_x φ(x). Because the function f is nonlinear in x, this is called a nonlinear least squares problem.

Optimal control problem. The classical optimal control problem is

min_u ∫_0^T f(x, u) dt

such that the constraint dx/dt = g(x, u), x(0) = x_0, x(T) = x_T is satisfied. Here u(t) is the control function and x(t) is the output. It is a nonlinear optimization problem in function space.

Optimal control problem. Example: the isoperimetric problem.

max_u ∫_0^1 x_1(t) dt
dx_1/dt = u, dx_2/dt = √(1 + u²)
x_1(0) = x_1(1) = 0, x_2(0) = 0, x_2(1) = π/3

(Figure: the curve x_1(t) over t ∈ [0, 1], with dx_1 = u dt and arc-length element dx_2.)

Outline: Iterative methods

Iterations. Iterative methods. Objective: construct a sequence {x_k}_{k≥1} such that x_k converges to a fixed vector x*, where x* is the solution of the underlying problem. General iteration idea: if we want to solve the equation g(x) = 0, and the fixed-point equation x = f(x) has the same solution, then construct the iteration x_{k+1} = f(x_k). If x_k → x*, then x* = f(x*), and thus a root of g(x) is obtained.

Convergence order. Suppose an iterative sequence satisfies lim x_n = x* and |x_n − x*| ≤ ε_n, where ε_n is called the error bound. If lim ε_{n+1}/ε_n = C, then:
1. 0 < C < 1: x_n converges linearly, e.g. q, q², q³, ..., qⁿ, ... (q < 1);
2. C = 1: x_n converges sublinearly, e.g. 1, 1/2, 1/3, ..., 1/n, ...;
3. C = 0: x_n converges superlinearly, e.g. 1, 1/2!, 1/3!, ..., 1/n!, ...

Convergence order. If lim ε_{n+1}/ε_nᵖ = C with C > 0 and p > 1, then x_n is said to converge with order p, e.g. q, qᵖ, q^{p²}, ..., q^{pⁿ}, ... (q < 1). Numerical examples illustrate the different convergence orders.

Remark on p-th order convergence. If p = 1 (linear convergence), the number of significant digits increases linearly, e.g. 2, 3, 4, 5, ... If p > 1, the number of significant digits increases exponentially, as O(pⁿ). For p = 2 it grows as 2, 4, 8, 16, ..., so a very accurate result is obtained after only 4 or 5 iterations.
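To make the convergence orders concrete, here is a minimal numerical sketch; the model equation x² = 2 and the particular fixed-point map are assumptions chosen only for illustration. It prints the errors of a linearly convergent fixed-point iteration next to those of the quadratically convergent Newton iteration.

```python
import math

# Compare linear vs quadratic convergence by tracking the error of two
# iterations that both converge to sqrt(2) (assumed illustrative example).

target = math.sqrt(2.0)

# Linearly convergent fixed-point iteration: x <- x - 0.25*(x**2 - 2)
x = 1.0
print("fixed-point errors:")
for _ in range(6):
    x = x - 0.25 * (x * x - 2.0)
    print(f"  {abs(x - target):.2e}")

# Quadratically convergent Newton iteration: x <- x - (x**2 - 2)/(2x)
x = 1.0
print("Newton errors:")
for _ in range(6):
    x = x - (x * x - 2.0) / (2.0 * x)
    print(f"  {abs(x - target):.2e}")
```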

Golden section method. Suppose we have a triplet (a, x_k, c) with f(x_k) < f(a) and f(x_k) < f(c), and we want to choose a new point x_{k+1} in (a, c) to perform a section. Suppose x_{k+1} lies in (a, x_k). (Figure: points a, x_{k+1}, x_k, c, with segment fractions w, z and 1 − w.) If f(x_{k+1}) > f(x_k), the new search interval is (x_{k+1}, c); if f(x_{k+1}) < f(x_k), the new search interval is (a, x_k).

Golden section method. Define w = (x_k − a)/(c − a), 1 − w = (c − x_k)/(c − a), and z = (x_k − x_{k+1})/(c − a). To minimize the worst-case interval length over the two cases, we must make w = z + (1 − w) (so w > 1/2). Note that w was itself obtained by applying the same strategy at the previous stage. This scale similarity implies z/w = 1 − w, hence w = (√5 − 1)/2 ≈ 0.618. This is called the golden section method.

Golden section method. The golden section method finds a local minimum of a function f. It is a linearly convergent method with contraction coefficient C ≈ 0.618. Example: golden section method for min φ(x) = 0.5 − x e^{−x²} with a = 0, c = 2.
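A minimal sketch of golden section search applied to the example above; the objective φ(x) = 0.5 − x e^{−x²} on [0, 2] is taken as stated, and the implementation details (two interior points reused across iterations) are a standard choice rather than the slide's own pseudocode.

```python
import math

# Golden section search: keep two interior points at the 0.618 ratio and
# shrink the bracket [a, c] toward the minimizer.

def golden_section(f, a, c, tol=1e-8):
    w = (math.sqrt(5.0) - 1.0) / 2.0          # golden ratio fraction 0.618...
    x1 = c - w * (c - a)                       # interior points
    x2 = a + w * (c - a)
    f1, f2 = f(x1), f(x2)
    while c - a > tol:
        if f1 < f2:                            # minimum lies in [a, x2]
            c, x2, f2 = x2, x1, f1
            x1 = c - w * (c - a)
            f1 = f(x1)
        else:                                  # minimum lies in [x1, c]
            a, x1, f1 = x1, x2, f2
            x2 = a + w * (c - a)
            f2 = f(x2)
    return 0.5 * (a + c)

phi = lambda x: 0.5 - x * math.exp(-x * x)
print(golden_section(phi, 0.0, 2.0))           # approx 1/sqrt(2) = 0.7071
```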

One dimensional Newton's method. Suppose we want to minimize φ(x): min_x φ(x). Taylor expansion at the current iteration point x_0:
φ(x) = φ(x_0) + φ'(x_0)(x − x_0) + (1/2) φ''(x_0)(x − x_0)² + ...
Local quadratic approximation:
φ(x) ≈ g(x) = φ(x_0) + φ'(x_0)(x − x_0) + (1/2) φ''(x_0)(x − x_0)².
Minimizing g(x) by setting g'(x) = 0 gives x_1 = x_0 − φ'(x_0)/φ''(x_0), and Newton's method reads
x_{k+1} = x_k − φ'(x_k)/φ''(x_k).

One dimensional Newton's method. Graphical explanation: at each step the local parabolic approximation at x_k is minimized to obtain x_{k+1}. (Figure: parabola fitted at x_k with minimizer x_{k+1}.) Example: min φ(x) = 0.5 − x e^{−x²} with x_0 = 0.5.

One dimensional Newton's method. Theorem: if φ''(x*) ≠ 0, then Newton's method converges with second order provided x_0 is sufficiently close to x*. Drawbacks of Newton's method: 1. One needs to compute the second-order derivative, which is a huge cost (especially in high dimensions). 2. The initial point x_0 must be very close to x*.
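A minimal sketch of the 1-D Newton iteration on the example above, with φ'(x) and φ''(x) worked out by hand for the assumed objective φ(x) = 0.5 − x e^{−x²}.

```python
import math

# 1-D Newton iteration x_{k+1} = x_k - phi'(x_k)/phi''(x_k), started at x0 = 0.5.

def dphi(x):    # phi'(x)  = (2x^2 - 1) * exp(-x^2)
    return (2.0 * x * x - 1.0) * math.exp(-x * x)

def ddphi(x):   # phi''(x) = (6x - 4x^3) * exp(-x^2)
    return (6.0 * x - 4.0 * x ** 3) * math.exp(-x * x)

x = 0.5
for k in range(10):
    step = dphi(x) / ddphi(x)
    x -= step
    if abs(step) < 1e-12:
        break
print(x)   # approx 1/sqrt(2)
```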

High dimensional Newton's method. Suppose we want to minimize f(x), x ∈ Rⁿ: min_x f(x). Taylor expansion at the current iteration point x_0:
f(x) = f(x_0) + ∇f(x_0)ᵀ(x − x_0) + (1/2)(x − x_0)ᵀ ∇²f(x_0)(x − x_0) + ...
Local quadratic approximation:
f(x) ≈ g(x) = f(x_0) + ∇f(x_0)ᵀ(x − x_0) + (1/2)(x − x_0)ᵀ H_f(x_0)(x − x_0),
where H_f is the Hessian matrix defined by (H_f)_{ij} = ∂²f/∂x_i∂x_j. Minimizing g(x) by setting ∇g(x) = 0 gives x_1 = x_0 − H_f(x_0)⁻¹ ∇f(x_0), and Newton's method reads
x_{k+1} = x_k − H_f(x_k)⁻¹ ∇f(x_k).

High dimensional Newton's method. Example: min f(x_1, x_2) = 100(x_2 − x_1²)² + (x_1 − 1)², initial state x_0 = (−1.2, 1).
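A minimal sketch of Newton's method on this example (the Rosenbrock function); the gradient and Hessian are written out by hand for this objective.

```python
import numpy as np

# Newton's method on f(x1, x2) = 100*(x2 - x1^2)^2 + (x1 - 1)^2, x0 = (-1.2, 1).

def grad(x):
    x1, x2 = x
    return np.array([-400.0 * x1 * (x2 - x1**2) + 2.0 * (x1 - 1.0),
                      200.0 * (x2 - x1**2)])

def hess(x):
    x1, x2 = x
    return np.array([[1200.0 * x1**2 - 400.0 * x2 + 2.0, -400.0 * x1],
                     [-400.0 * x1,                         200.0]])

x = np.array([-1.2, 1.0])
for k in range(50):
    p = np.linalg.solve(hess(x), -grad(x))   # Newton step solves H p = -grad
    x = x + p
    if np.linalg.norm(p) < 1e-10:
        break
print(k, x)   # converges to (1, 1)
```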

Steepest descent method. Basic idea: find a sequence of descent directions p_k and corresponding step sizes α_k such that the iterates x_{k+1} = x_k + α_k p_k satisfy f(x_{k+1}) ≤ f(x_k). The negative gradient direction −∇f is the steepest descent direction, so choose p_k := −∇f(x_k) and choose α_k from min_α f(x_k + α p_k).

Inexact line search. Finding α that solves min_α f(x_k + α p_k) exactly amounts to a one-dimensional minimization. In practice it is enough to find an approximate α by an inexact line search, which only requires that a descent criterion of the type f(x_k) − f(x_{k+1}) ≥ ε > 0 is satisfied.

Inexact line search. An example of an inexact line search strategy is the halving (bisection) method: [a_0, b_0] = [0, +∞), α_0 = 1; [a_1, b_1] = [0, 1], α_1 = 1/2; [a_2, b_2] = [0, 1/2], α_2 = 1/4; [a_3, b_3] = [1/4, 1/2], α_3 = 3/8; ... (Figure: trial points 1/4, 3/8, 1/2 inside the interval a = 0, b = 1.)

Steepest descent method. Steepest descent method for the example min f(x_1, x_2) = 100(x_2 − x_1²)² + (x_1 − 1)², initial state x_0 = (−1.2, 1).
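A minimal sketch of steepest descent on this example with a backtracking (halving) inexact line search; the Armijo-style acceptance test used here is one concrete way to realize the descent criterion above, not the exact strategy from the slides.

```python
import numpy as np

# Steepest descent with step halving on the Rosenbrock example.

def f(x):
    return 100.0 * (x[1] - x[0]**2)**2 + (x[0] - 1.0)**2

def grad(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0]**2) + 2.0 * (x[0] - 1.0),
                      200.0 * (x[1] - x[0]**2)])

x = np.array([-1.2, 1.0])
for k in range(20000):
    g = grad(x)
    if np.linalg.norm(g) < 1e-6:
        break
    p = -g                               # steepest descent direction
    alpha = 1.0
    while f(x + alpha * p) > f(x) - 1e-4 * alpha * np.dot(g, g):
        alpha *= 0.5                     # halve the step until sufficient descent
    x = x + alpha * p
print(k, x)   # slow linear convergence toward (1, 1)
```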

Damped Newton's method. If the initial value for Newton's method is not near the minimum point, one strategy is the damped Newton's method: choose the descent direction to be the Newton direction p_k := −H_f⁻¹(x_k) ∇f(x_k) and perform an inexact line search for min_α f(x_k + α p_k).
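A minimal sketch of damped Newton under the same assumptions, reusing the f, grad and hess helpers from the previous sketches and the same backtracking rule for the step length.

```python
import numpy as np

# Damped Newton: Newton direction + backtracking line search.
# f, grad, hess refer to the Rosenbrock helpers defined in the sketches above.

def damped_newton(f, grad, hess, x0, tol=1e-10, max_iter=100):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(hess(x), -g)          # Newton direction
        if np.dot(g, p) >= 0:                     # safeguard if H is not positive definite
            p = -g
        alpha = 1.0
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * np.dot(g, p):
            alpha *= 0.5                          # damp the step
        x = x + alpha * p
    return x
```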

Conjugate gradient method. Recall the conjugate gradient method for the quadratic function φ(x) = (1/2) xᵀAx − bᵀx:
1. Initial step: x_0, p_0 = r_0 = b − Ax_0.
2. Given x_k, r_k, p_k, the CGM step is:
2.1 Search for the optimal α_k along p_k: α_k = (r_kᵀ p_k)/(p_kᵀ A p_k);
2.2 Update x_k and the residual r_k: x_{k+1} = x_k + α_k p_k, r_{k+1} = b − A x_{k+1};
2.3 Using the quantities just computed, form the new search direction p_{k+1}: β_k = −(r_{k+1}ᵀ A p_k)/(p_kᵀ A p_k), p_{k+1} = r_{k+1} + β_k p_k.

Conjugate gradient method. Local quadratic approximation of a general nonlinear objective:
f(x) ≈ f(x_0) + ∇f(x_0)ᵀ(x − x_0) + (1/2)(x − x_0)ᵀ H_f(x_0)(x − x_0),
where H_f(x_0) is the Hessian of f at x_0. Apply the conjugate gradient method to this quadratic approximation successively. However, computing β_k requires forming the Hessian matrix H_f(x_0), which is a formidable task! In the quadratic case there is the equivalent transformation
β_k = −(r_{k+1}ᵀ A p_k)/(p_kᵀ A p_k) = ‖∇φ(x_{k+1})‖² / ‖∇φ(x_k)‖².
This formula does NOT need the computation of the Hessian matrix.

Conjugate gradient method for nonlinear optimization. Formally generalize CGM to nonlinear optimization:
1. Given an initial x_0 and ε > 0;
2. Compute g_0 = ∇f(x_0), set p_0 = −g_0 and k = 0;
3. Compute λ_k from min_λ f(x_k + λ p_k), and set x_{k+1} = x_k + λ_k p_k, g_{k+1} = ∇f(x_{k+1});
4. If ‖g_{k+1}‖ ≤ ε, the iteration is over. Otherwise compute μ_{k+1} = ‖g_{k+1}‖² / ‖g_k‖², p_{k+1} = −g_{k+1} + μ_{k+1} p_k, set k = k + 1, and iterate until convergence.

Conjugate gradient method. In realistic computations, because there are only n conjugate directions for an n-dimensional problem, the method is often restarted from the current point after every n iterations, as in the sketch below.
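A minimal sketch of this Fletcher-Reeves iteration with periodic restarts; the exact one-dimensional minimization for λ_k is replaced by a backtracking search, which is an assumed simplification, and a steepest-descent fallback is added as a safeguard.

```python
import numpy as np

# Nonlinear CG (Fletcher-Reeves) with restart every n iterations.
# Usage example: nonlinear_cg(f, grad, [-1.2, 1.0]) with the Rosenbrock helpers above.

def nonlinear_cg(f, grad, x0, eps=1e-6, max_iter=10000):
    x = np.asarray(x0, dtype=float)
    n = x.size
    g = grad(x)
    p = -g
    for k in range(max_iter):
        if np.linalg.norm(g) <= eps:
            break
        if np.dot(g, p) >= 0:            # safeguard: fall back to steepest descent
            p = -g
        lam = 1.0
        while f(x + lam * p) > f(x) + 1e-4 * lam * np.dot(g, p):
            lam *= 0.5                   # backtracking replaces the exact 1-D search
        x = x + lam * p
        g_new = grad(x)
        if (k + 1) % n == 0:
            p = -g_new                                   # periodic restart
        else:
            mu = np.dot(g_new, g_new) / np.dot(g, g)     # Fletcher-Reeves coefficient
            p = -g_new + mu * p
        g = g_new
    return x
```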

Conjugate gradient method. CGM for the example min f(x_1, x_2) = 100(x_2 − x_1²)² + (x_1 − 1)², initial state x_0 = (−1.2, 1).

Variable metric method. A general form of the iteration is x_{k+1} = x_k − λ_k H_k ∇f(x_k).
1. If H_k = I, it is the steepest descent method;
2. If H_k = [∇²f(x_k)]⁻¹, it is the damped Newton's method.
In order to keep the fast convergence of Newton's method, we want to approximate [∇²f(x_k)]⁻¹ by H_k with reduced computational effort, via the update H_{k+1} = H_k + C_k, where C_k is a correction matrix that is easily computed.

Variable metric method. First consider the quadratic function f(x) = a + bᵀx + (1/2) xᵀGx, for which ∇f(x) = b + Gx. Define g(x) = ∇f(x) and g_k = g(x_k); then g_{k+1} − g_k = G(x_{k+1} − x_k). With Δx_k = x_{k+1} − x_k and Δg_k = g_{k+1} − g_k, we have G Δx_k = Δg_k.

Variable metric method. For a general nonlinear function,
f(x) ≈ f(x_{k+1}) + ∇f(x_{k+1})ᵀ(x − x_{k+1}) + (1/2)(x − x_{k+1})ᵀ H_f(x_{k+1})(x − x_{k+1}).
By a similar procedure as above we get [H_f(x_{k+1})]⁻¹ Δg_k = Δx_k. Since H_{k+1} is an approximation of [H_f(x_{k+1})]⁻¹, it must satisfy H_{k+1} Δg_k = Δx_k.

DFP method. Davidon-Fletcher-Powell method: choose C_k as a rank-2 correction matrix C_k = α_k u uᵀ + β_k v vᵀ, where α_k, β_k, u, v are to be determined. From H_{k+1} = H_k + C_k and H_{k+1} Δg_k = Δx_k we have
α_k u (uᵀ Δg_k) + β_k v (vᵀ Δg_k) = Δx_k − H_k Δg_k.
Take u = H_k Δg_k, v = Δx_k and α_k = −1/(uᵀ Δg_k), β_k = 1/(vᵀ Δg_k). We obtain the famous DFP method:
H_{k+1} = H_k − (H_k Δg_k Δg_kᵀ H_k)/(Δg_kᵀ H_k Δg_k) + (Δx_k Δx_kᵀ)/(Δx_kᵀ Δg_k).

Remarks on the DFP method. If f(x) is quadratic and H_0 = I, the method converges in n steps in theory. If f(x) is strictly convex, the DFP method is globally convergent. If H_k is SPD and Δg_k ≠ 0, then H_{k+1} is SPD as well.
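A minimal sketch of a single DFP update of the inverse-Hessian approximation, written directly from the formula above.

```python
import numpy as np

# One DFP update of H given the displacement dx = x_{k+1} - x_k
# and the gradient change dg = g_{k+1} - g_k.

def dfp_update(H, dx, dg):
    Hdg = H @ dg
    return (H
            - np.outer(Hdg, Hdg) / (dg @ Hdg)     # - H dg dg^T H / (dg^T H dg)
            + np.outer(dx, dx) / (dx @ dg))       # + dx dx^T / (dx^T dg)
```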

DFP method. DFP method for the example min f(x_1, x_2) = 100(x_2 − x_1²)² + (x_1 − 1)², initial state x_0 = (−1.2, 1).

BFGS method. The most popular variable metric method is the BFGS (Broyden-Fletcher-Goldfarb-Shanno) method:
H_{k+1} = H_k − (H_k Δg_k Δg_kᵀ H_k)/(Δg_kᵀ H_k Δg_k) + (Δx_k Δx_kᵀ)/(Δx_kᵀ Δg_k) + (Δg_kᵀ H_k Δg_k) v_k v_kᵀ,
where
v_k = Δx_k/(Δx_kᵀ Δg_k) − (H_k Δg_k)/(Δg_kᵀ H_k Δg_k).
BFGS is more stable than the DFP method; BFGS is also a rank-2 correction method for H_k.
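A minimal sketch of the corresponding BFGS update, again transcribing the formula above as the DFP update plus the extra (Δgᵀ H Δg) v vᵀ term.

```python
import numpy as np

# One BFGS update of the inverse-Hessian approximation H.

def bfgs_update(H, dx, dg):
    Hdg = H @ dg
    dgHdg = dg @ Hdg
    dxdg = dx @ dg
    v = dx / dxdg - Hdg / dgHdg
    return (H
            - np.outer(Hdg, Hdg) / dgHdg          # DFP part
            + np.outer(dx, dx) / dxdg
            + dgHdg * np.outer(v, v))             # extra BFGS correction
```

Plugging either update into the iteration x_{k+1} = x_k − λ_k H_k ∇f(x_k), with λ_k from an inexact line search, Δx_k = x_{k+1} − x_k and Δg_k = ∇f(x_{k+1}) − ∇f(x_k), gives the corresponding quasi-Newton method.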

Nonlinear least squares. Mathematically, nonlinear least squares is to minimize φ(x) = (1/2) r(x)ᵀ r(x). We have
∇φ(x) = Jᵀ(x) r(x), H_φ(x) = Jᵀ(x) J(x) + Σ_{i=1}^m r_i(x) H_{r_i}(x),
where J(x) is the Jacobian matrix of r(x). The direct Newton's method for the increment s_k in nonlinear least squares solves
H_φ(x_k) s_k = −∇φ(x_k).

Gauss-Newton method. If we assume that the residuals r_i(x) are very small, we can drop the term Σ_{i=1}^m r_i(x) H_{r_i}(x) in Newton's method, which yields the Gauss-Newton method:
(Jᵀ(x_k) J(x_k)) s_k = −∇φ(x_k).
The Gauss-Newton method is equivalent to solving a sequence of linear least squares problems that approximate the nonlinear least squares problem.

Levenberg-Marquardt method. If the Jacobian J(x) is ill-conditioned, one may use the Levenberg-Marquardt method:
(Jᵀ(x_k) J(x_k) + μ_k I) s_k = −∇φ(x_k),
where μ_k is a nonnegative parameter chosen by some strategy. The L-M method may be viewed as a regularization of the Gauss-Newton method.
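A minimal sketch of a single Gauss-Newton / Levenberg-Marquardt step; the callables r and J are placeholders for the user-supplied residual vector and its Jacobian.

```python
import numpy as np

# One step of Gauss-Newton (mu = 0) or Levenberg-Marquardt (mu > 0)
# for phi(x) = 0.5 * r(x)^T r(x).

def lm_step(r, J, x, mu=0.0):
    Jx = J(x)
    rx = r(x)
    A = Jx.T @ Jx + mu * np.eye(x.size)       # J^T J + mu I
    s = np.linalg.solve(A, -Jx.T @ rx)        # solve for the increment s_k
    return x + s
```

With mu = 0 this reduces to the Gauss-Newton step; increasing mu shifts the step toward a short steepest descent step, which is why L-M can be viewed as a regularization.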

Homework assignment. Newton's method and the BFGS method for the example min f(x_1, x_2) = 100(x_2 − x_1²)² + (x_1 − 1)², initial state x_0 = (−1.2, 1).
