Numerical Optimization


Numerical Optimization
Unit 2: Multivariable optimization problems

Che-Rung Lee

Scribe:

February 28, 2011

Partial derivative of a two-variable function

Given a two-variable function $f(x_1, x_2)$, the partial derivative of $f$ with respect to $x_i$ is
$$\frac{\partial f}{\partial x_1} = \lim_{h \to 0} \frac{f(x_1 + h, x_2) - f(x_1, x_2)}{h}, \qquad
\frac{\partial f}{\partial x_2} = \lim_{h \to 0} \frac{f(x_1, x_2 + h) - f(x_1, x_2)}{h}.$$

The meaning of the partial derivative: let $F(x_1) = f(x_1, v)$ and $G(x_2) = f(u, x_2)$. Then
$$\frac{\partial f}{\partial x_1}(x_1, v) = F'(x_1), \qquad \frac{\partial f}{\partial x_2}(u, x_2) = G'(x_2).$$
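As a concrete illustration of these limit definitions, here is a minimal Python sketch that approximates both partial derivatives by forward differences. The sample function f(x1, x2) = x1^2 + 3 x1 x2 and the step size h are assumptions, not part of the lecture.

    # Forward-difference approximation of the two partial derivatives.
    def f(x1, x2):
        return x1**2 + 3.0 * x1 * x2          # made-up example function

    def partial_x1(f, x1, x2, h=1e-6):
        return (f(x1 + h, x2) - f(x1, x2)) / h

    def partial_x2(f, x1, x2, h=1e-6):
        return (f(x1, x2 + h) - f(x1, x2)) / h

    # At (1, 2) the exact values are df/dx1 = 2*1 + 3*2 = 8 and df/dx2 = 3*1 = 3.
    print(partial_x1(f, 1.0, 2.0))            # approximately 8.0
    print(partial_x2(f, 1.0, 2.0))            # approximately 3.0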

Gradient

Definition. The gradient of a function $f: \mathbb{R}^n \to \mathbb{R}$ is a vector in $\mathbb{R}^n$ defined as
$$\vec{g} = \nabla f(\vec{x}) = \begin{pmatrix} \partial f/\partial x_1 \\ \vdots \\ \partial f/\partial x_n \end{pmatrix},
\qquad \text{where} \qquad
\vec{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$

Directional derivative

Definition. The directional derivative of a function $f: \mathbb{R}^n \to \mathbb{R}$ in the direction $\vec{p}$ is defined as
$$D(f(\vec{x}), \vec{p}) = \lim_{h \to 0} \frac{f(\vec{x} + h\vec{p}) - f(\vec{x})}{h}.$$

Remark. If $f: \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable in a neighborhood of $\vec{x}$, then for any vector $\vec{p}$,
$$D(f(\vec{x}), \vec{p}) = \nabla f(\vec{x})^T \vec{p}.$$
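The remark can be checked numerically: the limit definition with a small h should agree with $\nabla f(\vec{x})^T\vec{p}$. The function, point, and direction below are assumed for the check.

    # Check that the finite-difference directional derivative matches grad(f)^T p.
    import numpy as np

    def f(x):
        return x[0]**2 + 3.0 * x[0] * x[1]    # made-up example function

    def grad_f(x):                            # its analytic gradient
        return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

    x = np.array([1.0, 2.0])
    p = np.array([0.6, -0.8])
    h = 1e-6

    fd = (f(x + h * p) - f(x)) / h            # limit definition with small h
    exact = grad_f(x) @ p                     # gradient formula
    print(fd, exact)                          # both are about 2.4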

The descent directions

A direction $\vec{p}$ is called a descent direction of $f$ at $\vec{x}_0$ if $D(f(\vec{x}_0), \vec{p}) < 0$. If $f$ is smooth enough, $\vec{p}$ is a descent direction if $\nabla f(\vec{x}_0)^T \vec{p} < 0$.

Which direction $\vec{p}$ makes $f(\vec{x}_0 + \vec{p})$ decrease the most? By the mean value theorem,
$$f(\vec{x}_0 + \vec{p}) = f(\vec{x}_0) + \nabla f(\vec{x}_0 + \alpha\vec{p})^T \vec{p} \quad \text{for some } \alpha \in (0, 1).$$
By the Cauchy-Schwarz inequality, among directions $\vec{p}$ of a fixed length the inner product $\nabla f(\vec{x}_0)^T\vec{p}$ is most negative when $\vec{p}$ points opposite to the gradient. Hence $\vec{p} = -\nabla f(\vec{x}_0)$ is called the steepest descent direction of $f$ at $\vec{x}_0$, and with this choice
$$f(\vec{x}_0 + \vec{p}) = f(\vec{x}_0) + \nabla f(\vec{x}_0 + \alpha\vec{p})^T \vec{p} \approx f(\vec{x}_0) - \nabla f(\vec{x}_0)^T \nabla f(\vec{x}_0).$$

The steepest descent algorithm

For k = 1, 2, ... until convergence:
  1. Compute $\vec{p}_k = -\nabla f(\vec{x}_k)$.
  2. Find $\alpha_k \in (0, 1)$ such that $F(\alpha_k) = f(\vec{x}_k + \alpha_k\vec{p}_k)$ is minimized.
  3. Set $\vec{x}_{k+1} = \vec{x}_k + \alpha_k\vec{p}_k$.

You can use any single-variable optimization technique to compute $\alpha_k$. If $F(\alpha) = f(\vec{x}_k + \alpha\vec{p}_k)$ is a quadratic function, $\alpha_k$ has a closed-form formula (derived on the next slides). If $F(\alpha)$ is more than a quadratic function, we may approximate it by a quadratic model and use that formula to solve for $\alpha_k$. Higher-order polynomial approximations will be mentioned in the line search algorithm.
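A minimal Python sketch of this loop, with a simple backtracking rule standing in for the single-variable minimization of $F(\alpha)$; the test function, starting point, and all parameter values are assumptions rather than part of the lecture.

    # Steepest descent with a simple backtracking line search (a stand-in
    # for "any single-variable optimization technique").
    import numpy as np

    def f(x):
        return x[0]**2 + 3.0 * x[1]**2               # made-up test function

    def grad_f(x):
        return np.array([2.0 * x[0], 6.0 * x[1]])

    def steepest_descent(f, grad_f, x0, tol=1e-8, max_iter=1000):
        x = np.array(x0, dtype=float)
        for _ in range(max_iter):
            g = grad_f(x)
            if np.linalg.norm(g) < tol:               # convergence test
                break
            p = -g                                    # steepest descent direction
            alpha = 1.0
            while f(x + alpha * p) > f(x) - 1e-4 * alpha * (g @ g):
                alpha *= 0.5                          # shrink until f decreases enough
            x = x + alpha * p                         # x_{k+1} = x_k + alpha_k p_k
        return x

    print(steepest_descent(f, grad_f, [2.0, 1.0]))    # converges to (0, 0)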

Quadratic model

If $f(\vec{x})$ is a quadratic function, we can write it as
$$f(x, y) = ax^2 + bxy + cy^2 + dx + ey + f(0, 0).$$
If $f$ is smooth, the derivatives of $f$ are
$$\frac{\partial f}{\partial x} = 2ax + by + d, \qquad \frac{\partial f}{\partial y} = 2cy + bx + e,$$
$$\frac{\partial^2 f}{\partial x^2} = 2a, \qquad \frac{\partial^2 f}{\partial y^2} = 2c, \qquad \frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x} = b.$$
Let $\vec{x} = \begin{pmatrix} x \\ y \end{pmatrix}$. Then $f(\vec{x})$ can be expressed as
$$f(\vec{x}) = \frac{1}{2}\,\vec{x}^T \begin{pmatrix} 2a & b \\ b & 2c \end{pmatrix} \vec{x} + \vec{x}^T \begin{pmatrix} d \\ e \end{pmatrix} + f(\vec{0}).$$
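A quick numerical check that the matrix form reproduces the scalar form; the coefficient values and the evaluation point are arbitrary assumptions.

    # Verify 1/2 x^T H x + x^T g + f(0) == a x^2 + b x y + c y^2 + d x + e y + f(0,0).
    import numpy as np

    a, b, c, d, e, f0 = 1.0, 2.0, 3.0, -1.0, 0.5, 4.0
    H = np.array([[2*a, b], [b, 2*c]])
    g = np.array([d, e])

    x, y = 0.7, -1.3
    v = np.array([x, y])

    scalar_form = a*x**2 + b*x*y + c*y**2 + d*x + e*y + f0
    matrix_form = 0.5 * v @ H @ v + v @ g + f0
    print(scalar_form, matrix_form)                   # identical up to rounding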

Gradient and Hessian

The gradient of $f$, as defined before, is
$$\vec{g}(\vec{x}) = \nabla f(\vec{x}) = \begin{pmatrix} \partial f/\partial x \\ \partial f/\partial y \end{pmatrix} = \begin{pmatrix} 2a & b \\ b & 2c \end{pmatrix} \vec{x} + \begin{pmatrix} d \\ e \end{pmatrix}.$$
The second derivative, which is a matrix called the Hessian, is
$$\nabla^2 f(\vec{x}) = H(\vec{x}) = \begin{pmatrix} \partial^2 f/\partial x^2 & \partial^2 f/\partial x\,\partial y \\ \partial^2 f/\partial y\,\partial x & \partial^2 f/\partial y^2 \end{pmatrix} = \begin{pmatrix} 2a & b \\ b & 2c \end{pmatrix}.$$
Therefore
$$f(\vec{x}) = \frac{1}{2}\vec{x}^T H(\vec{0})\vec{x} + \vec{g}(\vec{0})^T\vec{x} + f(\vec{0}), \qquad \nabla f(\vec{x}) = H\vec{x} + \vec{g}, \qquad \nabla^2 f = H.$$
In the following lectures, we assume $H$ is symmetric; thus $H = H^T$.

Optimal $\alpha_k$ for the quadratic model

We denote $H_k = H(\vec{x}_k)$, $\vec{g}_k = \vec{g}(\vec{x}_k)$, and $f_k = f(\vec{x}_k)$. Also, $H = H(\vec{0})$, $\vec{g} = \vec{g}(\vec{0})$, and $f = f(\vec{0})$.
$$\begin{aligned}
F(\alpha) = f(\vec{x}_k + \alpha\vec{p}_k)
&= \frac{1}{2}(\vec{x}_k + \alpha\vec{p}_k)^T H (\vec{x}_k + \alpha\vec{p}_k) + \vec{g}^T(\vec{x}_k + \alpha\vec{p}_k) + f(\vec{0}) \\
&= \frac{1}{2}\vec{x}_k^T H\vec{x}_k + \vec{g}^T\vec{x}_k + f(\vec{0}) + \alpha(H\vec{x}_k + \vec{g})^T\vec{p}_k + \frac{\alpha^2}{2}\vec{p}_k^T H\vec{p}_k \\
&= f_k + \alpha\,\vec{g}_k^T\vec{p}_k + \frac{\alpha^2}{2}\vec{p}_k^T H\vec{p}_k.
\end{aligned}$$
$$F'(\alpha) = \vec{g}_k^T\vec{p}_k + \alpha\,\vec{p}_k^T H\vec{p}_k.$$
The optimal $\alpha_k$ satisfies $F'(\alpha_k) = 0$, which gives
$$\alpha_k = -\frac{\vec{g}_k^T\vec{p}_k}{\vec{p}_k^T H\vec{p}_k}.$$
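The closed-form step can be sanity-checked against a brute-force one-dimensional search. The quadratic's $H$, $\vec{g}$, and the point $\vec{x}_k$ below are assumed values.

    # Exact step length alpha_k = -(g_k^T p_k) / (p_k^T H p_k) on a quadratic.
    import numpy as np

    H = np.array([[2.0, 0.0], [0.0, 6.0]])            # symmetric positive definite
    g = np.array([-1.0, 2.0])

    def f(x):
        return 0.5 * x @ H @ x + g @ x

    x_k = np.array([1.0, 1.0])
    g_k = H @ x_k + g                                 # gradient at x_k
    p_k = -g_k                                        # steepest descent direction

    alpha_k = -(g_k @ p_k) / (p_k @ H @ p_k)          # closed-form optimal step

    # The closed-form step is at least as good as a fine grid of candidates in (0, 1).
    best_grid = min(f(x_k + a * p_k) for a in np.linspace(0.0, 1.0, 1001))
    print(f(x_k + alpha_k * p_k), best_grid)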

Optimality conditions

Theorem (necessary and sufficient conditions of optimality). Let $f: \mathbb{R}^n \to \mathbb{R}$ be twice continuously differentiable in $D$. If $\vec{x}^* \in D$ is a local minimizer, then $\nabla f(\vec{x}^*) = 0$ and $\nabla^2 f(\vec{x}^*)$ is positive semidefinite. Conversely, if $\nabla f(\vec{x}^*) = 0$ and $\nabla^2 f(\vec{x}^*)$ is positive definite, then $\vec{x}^*$ is a local minimizer.

Definition. A matrix $H$ is called positive definite if $\vec{v}^T H\vec{v} > 0$ for every nonzero vector $\vec{v} \in \mathbb{R}^n$. $H$ is called positive semidefinite if $\vec{v}^T H\vec{v} \ge 0$ for all $\vec{v} \in \mathbb{R}^n$. $H$ is negative definite (negative semidefinite) if $-H$ is positive definite (positive semidefinite). $H$ is indefinite if it is neither positive semidefinite nor negative semidefinite.
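Definiteness of a symmetric matrix is usually checked through its eigenvalues (as a later slide notes, a symmetric $H$ is positive definite exactly when all its eigenvalues are positive). A small sketch with assumed example matrices:

    # Classify a symmetric matrix by the signs of its eigenvalues.
    import numpy as np

    def classify(H):
        evals = np.linalg.eigvalsh(H)                 # real eigenvalues of symmetric H
        if np.all(evals > 0):
            return "positive definite"
        if np.all(evals >= 0):
            return "positive semidefinite"
        if np.all(evals < 0):
            return "negative definite"
        if np.all(evals <= 0):
            return "negative semidefinite"
        return "indefinite"

    print(classify(np.array([[2.0, 1.0], [1.0, 2.0]])))   # positive definite
    print(classify(np.array([[1.0, 2.0], [2.0, 1.0]])))   # indefinite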

Convergence of the steepest descent method

Theorem (convergence of the steepest descent method). If the steepest descent method converges to a local minimizer $\vec{x}^*$ where $\nabla^2 f(\vec{x}^*)$ is positive definite, and $e_{\max}$ and $e_{\min}$ are the largest and the smallest eigenvalues of $\nabla^2 f(\vec{x}^*)$, then
$$\lim_{k \to \infty} \frac{\|\vec{x}_{k+1} - \vec{x}^*\|}{\|\vec{x}_k - \vec{x}^*\|} \le \frac{e_{\max} - e_{\min}}{e_{\max} + e_{\min}}.$$

Definition. For a scalar $\lambda$ and a unit vector $\vec{v}$, $(\lambda, \vec{v})$ is an eigenpair of a matrix $H$ if $H\vec{v} = \lambda\vec{v}$. The scalar $\lambda$ is called an eigenvalue of $H$, and $\vec{v}$ is called an eigenvector.
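The ratio can be observed on a small quadratic whose minimizer is the origin; the matrix and starting point below are assumptions chosen so the bound $(e_{\max} - e_{\min})/(e_{\max} + e_{\min}) = 0.5$ is actually attained at every step.

    # Steepest descent with exact line search on f(x) = 1/2 x^T H x (minimizer x* = 0).
    import numpy as np

    H = np.diag([1.0, 3.0])                           # e_min = 1, e_max = 3
    bound = (3.0 - 1.0) / (3.0 + 1.0)                 # = 0.5

    x = np.array([3.0, 1.0])
    for k in range(5):
        g = H @ x                                     # gradient of 1/2 x^T H x
        alpha = (g @ g) / (g @ H @ g)                 # exact line search step
        x_new = x - alpha * g
        print(k, np.linalg.norm(x_new) / np.linalg.norm(x), "bound:", bound)
        x = x_new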

Newton's method

We used the quadratic model to find the step length $\alpha_k$. Can we use the quadratic model to find the search direction $\vec{p}_k$? Yes, we can. Recall the quadratic model (now $\vec{p}$ is the variable):
$$f(\vec{x}_k + \vec{p}) \approx \frac{1}{2}\vec{p}^T H_k\vec{p} + \vec{p}^T\vec{g}_k + f_k.$$
Compute the gradient with respect to $\vec{p}$:
$$\nabla_{\vec{p}} f(\vec{x}_k + \vec{p}) = H_k\vec{p} + \vec{g}_k.$$
The solution of $\nabla_{\vec{p}} f(\vec{x}_k + \vec{p}) = 0$ is $\vec{p}_k = -H_k^{-1}\vec{g}_k$. Newton's method uses this $\vec{p}_k$ as the search direction.

Newton's method
1. Given an initial guess $\vec{x}_0$.
2. For k = 0, 1, 2, ... until convergence: $\vec{x}_{k+1} = \vec{x}_k - H_k^{-1}\vec{g}_k$.
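A minimal sketch of the pure Newton iteration. The Rosenbrock-style test function, its hand-coded derivatives, and the starting point are assumptions; the lecture does not fix a test problem.

    # Newton's iteration x_{k+1} = x_k - H_k^{-1} g_k.
    import numpy as np

    def grad_f(x):    # gradient of f(x, y) = (1 - x)^2 + 100 (y - x^2)^2
        return np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                         200*(x[1] - x[0]**2)])

    def hess_f(x):    # Hessian of the same function
        return np.array([[2 - 400*(x[1] - 3*x[0]**2), -400*x[0]],
                         [-400*x[0], 200.0]])

    def newton(grad_f, hess_f, x0, tol=1e-10, max_iter=50):
        x = np.array(x0, dtype=float)
        for _ in range(max_iter):
            g = grad_f(x)
            if np.linalg.norm(g) < tol:
                break
            p = np.linalg.solve(hess_f(x), -g)        # solve H_k p = -g_k, no explicit inverse
            x = x + p
        return x

    print(newton(grad_f, hess_f, [1.2, 1.0]))         # converges to (1, 1)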

Descent direction

The direction $\vec{p}_k = -H_k^{-1}\vec{g}_k$ is called Newton's direction. Is $\vec{p}_k$ a descent direction? (What is the definition of a descent direction?) We only need to check whether $\vec{g}_k^T\vec{p}_k < 0$:
$$\vec{g}_k^T\vec{p}_k = -\vec{g}_k^T H_k^{-1}\vec{g}_k.$$
Thus, $\vec{p}_k$ is a descent direction if $H_k^{-1}$ is positive definite. For a symmetric matrix $H$, the following conditions are equivalent:
- $H$ is positive definite.
- $H^{-1}$ is positive definite.
- All the eigenvalues of $H$ are positive.

Some properties of eigenvalues/eigenvectors

A symmetric matrix $H$ of order $n$ has $n$ real eigenvalues and $n$ real, linearly independent (orthogonal) eigenvectors:
$$H\vec{v}_1 = \lambda_1\vec{v}_1, \quad H\vec{v}_2 = \lambda_2\vec{v}_2, \quad \ldots, \quad H\vec{v}_n = \lambda_n\vec{v}_n.$$
Let $V = [\vec{v}_1\ \vec{v}_2\ \cdots\ \vec{v}_n]$ and $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. Then $HV = V\Lambda$.
If $\lambda_1, \lambda_2, \ldots, \lambda_n$ are nonzero, then since $H = V\Lambda V^{-1}$,
$$H^{-1} = V\Lambda^{-1}V^{-1}, \qquad \Lambda^{-1} = \operatorname{diag}(1/\lambda_1, 1/\lambda_2, \ldots, 1/\lambda_n).$$
The eigenvalues of $H^{-1}$ are $1/\lambda_1, 1/\lambda_2, \ldots, 1/\lambda_n$.
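These facts can be illustrated numerically with NumPy's symmetric eigensolver; the matrix below is an assumed example.

    # H = V Lambda V^T with orthonormal V, and eig(H^{-1}) = 1/eig(H).
    import numpy as np

    H = np.array([[2.0, 1.0],
                  [1.0, 3.0]])                        # symmetric example matrix

    lam, V = np.linalg.eigh(H)                        # eigenvalues (ascending) and eigenvectors
    print(np.allclose(V @ np.diag(lam) @ V.T, H))     # True: H = V Lambda V^T
    print(np.allclose(V.T @ V, np.eye(2)))            # True: eigenvectors are orthonormal
    print(np.sort(np.linalg.eigvalsh(np.linalg.inv(H))),
          np.sort(1.0 / lam))                         # eigenvalues of H^{-1} are 1/lambda_i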

How to solve $H\vec{p} = -\vec{g}$?

For a symmetric positive definite matrix $H$, the system $H\vec{p} = -\vec{g}$ can be solved by the Cholesky decomposition, which is similar to the LU decomposition but has only half its computational cost.
Let
$$H = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix}, \quad \text{where } h_{12} = h_{21},\ h_{13} = h_{31},\ h_{23} = h_{32}.$$
The Cholesky decomposition factors $H = LL^T$, where $L$ is a lower triangular matrix,
$$L = \begin{pmatrix} l_{11} & & \\ l_{21} & l_{22} & \\ l_{31} & l_{32} & l_{33} \end{pmatrix}.$$
Using the Cholesky decomposition, $H\vec{p} = -\vec{g}$ can be solved by
1. Compute $H = LL^T$.
2. $\vec{p} = -(L^T)^{-1}L^{-1}\vec{g}$ (two triangular solves).
In MATLAB, use p = -H \ g. Don't use inv(H).
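A sketch of the two-triangular-solve recipe in Python/NumPy, where SciPy's solve_triangular performs the forward and back substitutions; the matrix $H$ and vector $\vec{g}$ are assumed example values.

    # Solve H p = -g via H = L L^T, then L y = -g and L^T p = y.
    import numpy as np
    from scipy.linalg import solve_triangular

    H = np.array([[4.0, 2.0, 0.0],
                  [2.0, 5.0, 1.0],
                  [0.0, 1.0, 3.0]])                   # symmetric positive definite
    g = np.array([1.0, -2.0, 0.5])

    L = np.linalg.cholesky(H)                         # lower triangular, H = L @ L.T
    y = solve_triangular(L, -g, lower=True)           # forward substitution
    p = solve_triangular(L.T, y, lower=False)         # back substitution

    print(np.allclose(H @ p, -g))                     # True: p solves H p = -g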

The Cholesky decomposition

For i = 1, 2, ..., n:
    $l_{ii} = \sqrt{h_{ii}}$
    For j = i+1, i+2, ..., n:
        $l_{ji} = h_{ji} / l_{ii}$
        For k = i+1, i+2, ..., j:
            $h_{jk} = h_{jk} - l_{ji} l_{ki}$

For the 3-by-3 case,
$$\begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix}
= LL^T
= \begin{pmatrix} l_{11}^2 & l_{11}l_{21} & l_{11}l_{31} \\ l_{11}l_{21} & l_{21}^2 + l_{22}^2 & l_{21}l_{31} + l_{22}l_{32} \\ l_{11}l_{31} & l_{21}l_{31} + l_{22}l_{32} & l_{31}^2 + l_{32}^2 + l_{33}^2 \end{pmatrix},$$
and the algorithm computes, in order,
$$l_{11} = \sqrt{h_{11}}, \quad l_{21} = h_{21}/l_{11}, \quad l_{31} = h_{31}/l_{11},$$
$$h_{22}^{(2)} = h_{22} - l_{21}l_{21}, \quad h_{32}^{(2)} = h_{32} - l_{21}l_{31}, \quad h_{33}^{(2)} = h_{33} - l_{31}l_{31},$$
$$l_{22} = \sqrt{h_{22}^{(2)}}, \quad l_{32} = h_{32}^{(2)}/l_{22}, \quad l_{33} = \sqrt{h_{33}^{(2)} - l_{32}l_{32}}.$$
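A direct translation of the triple loop into Python with 0-based indexing, compared against NumPy's built-in factorization; the test matrix is an assumed example.

    # Unblocked Cholesky: overwrite a copy of H while building the lower factor L.
    import numpy as np

    def cholesky_lower(H):
        H = np.array(H, dtype=float)                  # work on a copy
        n = H.shape[0]
        L = np.zeros((n, n))
        for i in range(n):
            L[i, i] = np.sqrt(H[i, i])                # l_ii = sqrt(h_ii)
            for j in range(i + 1, n):
                L[j, i] = H[j, i] / L[i, i]           # l_ji = h_ji / l_ii
                for k in range(i + 1, j + 1):
                    H[j, k] -= L[j, i] * L[k, i]      # h_jk = h_jk - l_ji * l_ki
        return L

    H = np.array([[4.0, 2.0, 0.0],
                  [2.0, 5.0, 1.0],
                  [0.0, 1.0, 3.0]])
    L = cholesky_lower(H)
    print(np.allclose(L @ L.T, H))                    # True
    print(np.allclose(L, np.linalg.cholesky(H)))      # True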

Convergence of Newton's method

Theorem. Suppose $f$ is twice differentiable, $\nabla f(\vec{x}^*) = 0$, $\nabla^2 f$ is Lipschitz continuous in a neighborhood of $\vec{x}^*$, and $\nabla^2 f(\vec{x}^*)$ is positive definite. If $\vec{x}_0$ is sufficiently close to $\vec{x}^*$, the sequence generated by Newton's method converges to $\vec{x}^*$ quadratically.

Three problems of Newton's method:
1. $H$ may not be positive definite. Remedy: modified Newton's method + line search.
2. $H$ is expensive to compute. Remedy: quasi-Newton methods.
3. $H^{-1}$ is expensive to compute. Remedy: conjugate gradient.