Continuous Optimization

Continuous Optimization
Sanzheng Qiao
Department of Computing and Software, McMaster University
March 2009

Outline
1 Introduction
2 Golden Section Search
3 Multivariate Functions: Steepest Descent Method
4 Linear Least Squares Problem
5 Nonlinear Least Squares: Newton's Method, Gauss-Newton Method
6 Software Packages

Problem setting
Single-variable functions. Minimization:
    min f(x),  x ∈ S
f(x): the objective function, single-variable and real-valued
S: the support (the set over which we minimize)

Golden section search
Assumption: f(x) has a unique global minimum in [a, b]. If x* is the minimizer, then f(x) monotonically decreases on [a, x*] and monotonically increases on [x*, b].
Algorithm: choose interior points c and d,
    c = a + r(b − a)
    d = a + (1 − r)(b − a),    0 < r < 0.5
    if f(c) ≤ f(d)
        b = d
    else
        a = c
    end
At each step, the length of the interval is reduced by a factor of (1 − r).

Golden section search (cont.)
The choice of r:
When f(c) ≤ f(d), d_+ = c (the next d is the current c).
When f(c) > f(d), c_+ = d (the next c is the current d).
Why? To reduce the number of function evaluations: the retained interior point (and its function value) is reused, so only one new evaluation is needed per step.

Choice of r
When f(c) ≤ f(d): b_+ = d and d_+ = a + (1 − r)(b_+ − a) = a + (1 − r)(d − a), so d_+ = c means
    a + (1 − r)(d − a) = a + r(b − a),
which implies (1 − r)² = r.
When f(c) > f(d): a_+ = c, so c_+ = d means
    c_+ = c + r(b − c) = a + (1 − r)(b − a),
which also implies (1 − r)² = r.
Thus r = (3 − √5)/2 ≈ 0.382.

Algorithm
    c = a + r*(b - a);       fc = f(c);
    d = a + (1 - r)*(b - a); fd = f(d);
    if fc <= fd
        b = d; fb = fd;
        d = c; fd = fc;
        c = a + r*(b - a); fc = f(c);
    else
        a = c; fa = fc;
        c = d; fc = fd;
        d = a + (1 - r)*(b - a); fd = f(d);
    end
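
The step above can be wrapped into a complete routine. The following is a minimal Octave/MATLAB sketch; the function name goldensection, the loop, the tolerance argument, and the test function in the comment are illustrative additions, not from the original slides.

    function xmin = goldensection(f, a, b, tol)
    % Minimize a unimodal function f on [a, b] by golden section search.
        r = (3 - sqrt(5))/2;                    % so 1 - r is about 0.618
        c = a + r*(b - a);       fc = f(c);
        d = a + (1 - r)*(b - a); fd = f(d);
        while (d - c) > tol*max(abs(c), abs(d))
            if fc <= fd
                b = d;                          % keep [a, d]
                d = c; fd = fc;                 % reuse c as the new d
                c = a + r*(b - a); fc = f(c);   % one new evaluation
            else
                a = c;                          % keep [c, b]
                c = d; fc = fd;                 % reuse d as the new c
                d = a + (1 - r)*(b - a); fd = f(d);
            end
        end
        xmin = (c + d)/2;
    end

    % Example: goldensection(@(x) (x - 2)^2, 0, 5, 1e-8) returns approximately 2.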

Convergence and termination
Convergence rate: each step reduces the length of the interval by a factor of
    1 − r = 1 − (3 − √5)/2 = (√5 − 1)/2 ≈ 0.618,
i.e., convergence is linear.
Termination criterion: stop when (d − c) ≤ u·max(|c|, |d|) (u: the unit roundoff) or when the interval length falls below a given tolerance.

Problem setting
    min f(x),  where x is a vector of variables (x_1, x_2, ..., x_n).
Gradient:
    ∇f(x_c) = [∂f(x_c)/∂x_1, ..., ∂f(x_c)/∂x_n]ᵀ
−∇f(x_c): the direction of greatest decrease of f from x_c.

Steepest descent method
Idea:
    steepest descent direction: s_c = −∇f(x_c);
    find λ_c such that f(x_c + λ_c s_c) ≤ f(x_c + λ s_c) for all λ ∈ ℝ (a single-variable minimization problem);
    update x_+ = x_c + λ_c s_c.
Remark. Conjugate gradient method: replace the gradient direction by a conjugate gradient direction.
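
A minimal Octave/MATLAB sketch of the iteration. Here the exact one-dimensional minimization over λ is replaced by a simple backtracking (Armijo) line search; that substitution, the function name steepdesc, and the stopping rule are my own illustrative choices, not from the slides.

    function x = steepdesc(f, grad, x, tol, maxit)
    % Minimize f by steepest descent with a backtracking line search.
    % f(x): scalar objective; grad(x): gradient column vector.
        for k = 1:maxit
            g = grad(x);
            if norm(g) <= tol, break; end
            s = -g;                                  % steepest descent direction
            lambda = 1;
            while f(x + lambda*s) > f(x) - 1e-4*lambda*(g'*g)
                lambda = lambda/2;                   % backtrack until sufficient decrease
                if lambda < 1e-12, break; end
            end
            x = x + lambda*s;
        end
    end

    % Example: f = @(x) x(1)^2 + 10*x(2)^2;  g = @(x) [2*x(1); 20*x(2)];
    %          steepdesc(f, g, [1; 1], 1e-6, 1000) returns approximately [0; 0].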

Problem setting
Given an m-by-n matrix A (m ≥ n) and an m-vector b, find the n-vector x minimizing ‖Ax − b‖₂².
Example (square root problem revisited). Find a_1 and a_2 in y(x) = a_1 x + a_2 such that
    (y(0.25) − 0.25)² + (y(0.5) − 0.5)² + (y(1.0) − 1.0)²
is minimized. In matrix-vector form:
    A = [0.25 1; 0.5 1; 1.0 1],    x = [a_1; a_2],    b = [0.25; 0.5; 1.0].

Method
Transform A into triangular form:
    PA = [R; 0],
where R is n-by-n upper triangular. Then the problem becomes
    ‖Ax − b‖₂² = ‖P⁻¹(R̃x − Pb)‖₂²,    where R̃ = [R; 0].

Method (cont.)
Desirable properties of P:
    P⁻¹ is easy to compute;
    ‖P⁻¹z‖₂² = ‖z‖₂² for any z.
Partitioning Pb = [b_1; b_2] (b_1 of length n), we get ‖Ax − b‖₂² = ‖Rx − b_1‖₂² + ‖b_2‖₂², and the second term does not depend on x. Thus the LS solution is the solution of the triangular system
    Rx = b_1.
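
Since R is upper triangular, Rx = b_1 is solved by back substitution. A minimal Octave/MATLAB sketch (the function name backsub is illustrative, not from the slides):

    function x = backsub(R, b1)
    % Solve the upper triangular system R*x = b1 by back substitution.
        n = length(b1);
        x = zeros(n, 1);
        for i = n:-1:1
            x(i) = (b1(i) - R(i, i+1:n)*x(i+1:n)) / R(i, i);
        end
    end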

Choice of P
Orthogonal matrix (transformation) Q: Q⁻¹ = Qᵀ.
Example. Givens rotation
    G = [cos θ  sin θ; −sin θ  cos θ]
Introducing a zero into a 2-vector:
    G [x_1; x_2] = [∗; 0],
i.e., rotate x onto the x_1-axis.

Givens rotation
    cos θ = x_1 / √(x_1² + x_2²),    sin θ = x_2 / √(x_1² + x_2²)
Algorithm (computes c = cos θ and s = sin θ from the ratio of the smaller component to the larger one, to avoid overflow):
    if x(2) == 0
        c = 1.0; s = 0.0;
    elseif abs(x(2)) >= abs(x(1))
        ct = x(1)/x(2);                 % cotangent of theta
        s = 1/sqrt(1 + ct*ct); c = s*ct;
    else
        t = x(2)/x(1);                  % tangent of theta
        c = 1/sqrt(1 + t*t);   s = c*t;
    end
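
A small usage sketch, packaging the branch above as a helper (the function name givens2 is my own, not from the slides):

    function [c, s] = givens2(x)
    % Given a 2-vector x, return c, s such that [c s; -s c]*x = [r; 0].
        if x(2) == 0
            c = 1.0; s = 0.0;
        elseif abs(x(2)) >= abs(x(1))
            ct = x(1)/x(2);
            s = 1/sqrt(1 + ct*ct); c = s*ct;
        else
            t = x(2)/x(1);
            c = 1/sqrt(1 + t*t);   s = c*t;
        end
    end

    % Example: [c, s] = givens2([3; 4]); [c s; -s c]*[3; 4] gives [5; 0].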

Givens rotation (cont.)
In general, the rotation is embedded in an identity matrix and acts on two components; e.g., for a 4-vector,
    G_13 = [c 0 s 0; 0 1 0 0; −s 0 c 0; 0 0 0 1],
    G_13 [x_1; x_2; x_3; x_4] = [∗; x_2; 0; x_4].
Select a pair (x_i, x_j) and find a rotation G_ij to eliminate x_j.

QR factorization
Applying a sequence of embedded Givens rotations (the slide illustrates the zero pattern for a matrix with four rows) introduces zeros below the diagonal one entry at a time:
    G_34 G_24 G_23 G_14 G_13 G_12 A = [R; 0]
Since each rotation is orthogonal,
    Q = G_12ᵀ G_13ᵀ G_14ᵀ G_23ᵀ G_24ᵀ G_34ᵀ,    A = Q R̃,
the QR factorization of A (with R̃ = [R; 0] as before).
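
Putting the rotations together gives a QR factorization by Givens rotations. A minimal Octave/MATLAB sketch following the elimination order above (givens_qr is an illustrative name; givens2 is the helper from the previous sketch):

    function [Q, R] = givens_qr(A)
    % QR factorization of an m-by-n matrix (m >= n) by Givens rotations.
        [m, n] = size(A);
        Q = eye(m); R = A;
        for j = 1:n
            for i = j+1:m                        % eliminate R(i,j) against R(j,j)
                [c, s] = givens2([R(j,j); R(i,j)]);
                G = [c s; -s c];                 % acts on rows/columns j and i only
                R([j i], :) = G*R([j i], :);
                Q(:, [j i]) = Q(:, [j i])*G';    % accumulate Q = G12'*G13'* ...
            end
        end
    end

    % Check: A = [0.25 1; 0.5 1; 1.0 1]; [Q, R] = givens_qr(A); norm(Q*R - A) is tiny.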

Householder transformation
Basically, in the QR decomposition, we introduce zeros below the main diagonal of A using orthogonal transformations. Another example: the Householder transformation
    H = I − 2uuᵀ,    with uᵀu = 1.
H is symmetric and orthogonal (H² = I). Goal: Ha = αe_1. Choose u in the direction of a ± ‖a‖₂ e_1 (normalized; see the next slide).
Geometric interpretation (figure omitted from the transcription; panels (a) and (b) show the two sign choices): H reflects a about the hyperplane orthogonal to u, onto the e_1-axis.

Householder transformation (cont.)
Normalize u using ‖u‖₂² = 2(‖a‖₂² ± a_1 ‖a‖₂) for efficiency.
Algorithm. Given an n-vector x, this algorithm returns σ, α, and u such that (I − σ⁻¹uuᵀ)x = −αe_1:
    m = max(abs(x));                 % scale to avoid overflow/underflow
    u = x/m;
    alpha = sign(u(1))*norm(u);      % sign chosen to avoid cancellation in u(1) + alpha
    u(1) = u(1) + alpha;
    sigma = alpha*u(1);              % sigma = ||u||^2 / 2
    alpha = m*alpha;
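
A small usage sketch, packaging the lines above as a helper and applying the resulting reflector (the function name house is my own, not from the slides):

    function [u, sigma, alpha] = house(x)
    % Householder data for x: (I - u*u'/sigma)*x = -alpha*e1.
    % (As on the slide; assumes x(1) ~= 0 so that sign(u(1)) is nonzero.)
        m = max(abs(x));
        u = x/m;
        alpha = sign(u(1))*norm(u);
        u(1) = u(1) + alpha;
        sigma = alpha*u(1);
        alpha = m*alpha;
    end

    % Example: x = [3; 4; 0]; [u, sigma, alpha] = house(x);
    %          x - (u'*x/sigma)*u returns [-5; 0; 0], i.e., -alpha*e1 with alpha = 5.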

Framework
A framework of the QR decomposition method for solving the linear least squares problem min ‖Ax − b‖₂:
    1. Use orthogonal transformations to triangularize A, applying the same transformations to b simultaneously;
    2. Solve the resulting triangular system.
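
A minimal Octave/MATLAB sketch of this framework using the built-in qr; the economy-size call and the example data (taken from the earlier slide) are my choices:

    A = [0.25 1; 0.5 1; 1.0 1];
    b = [0.25; 0.5; 1.0];
    [Q, R] = qr(A, 0);      % economy-size QR: A = Q*R with R 2-by-2 upper triangular
    x = R \ (Q'*b);         % solve the triangular system R*x = Q'*b
    % x agrees with A\b, which also solves the least squares problem via QR.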

Problem setting
Multivariate vector-valued function f: ℝⁿ → ℝᵐ,
    f(x) = [f_1(x), ..., f_m(x)]ᵀ.
Find the minimizer of
    ρ(x) = (1/2) Σ_{i=1}^{m} f_i(x)²,    x ∈ ℝⁿ.
Application: model fitting problems.

Newton's Method
Idea: solve ∇ρ(x) = 0 (a root-finding problem). At each step, find the correction s_c (so that x_+ = x_c + s_c) satisfying
    ∇²ρ(x_c) s_c = −∇ρ(x_c).
Note. This is Newton's method for solving nonlinear systems.

Newton's method (cont.)
What is the gradient ∇ρ(x_c)?
    ∇ρ(x_c) = J(x_c)ᵀ f(x_c),
where the Jacobian J(x_c) = [∂f_i(x_c)/∂x_j].
How to get ∇²ρ(x_c)?
    ∇²ρ(x_c) = J(x_c)ᵀ J(x_c) + Σ_{i=1}^{m} f_i(x_c) ∇²f_i(x_c).
If x* fits the model well (f_i(x*) ≈ 0) and x_c is close to x*, then f_i(x_c) ≈ 0, so
    ∇²ρ(x_c) ≈ J(x_c)ᵀ J(x_c).

Gauss-Newton Method
    Evaluate f_c = f(x_c) and compute the Jacobian J_c = J(x_c);
    Solve (J_cᵀ J_c) s_c = −J_cᵀ f_c for s_c;
    Update x_+ = x_c + s_c.
Note. s_c is the solution of the normal equations for the linear least squares problem
    min_s ‖J_c s + f_c‖₂,
so reliable methods such as the QR decomposition method can be used to solve for s_c.
Remark. The Gauss-Newton method works well on small-residual problems (f_i(x*) ≈ 0).
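
To illustrate, here is a minimal Octave/MATLAB Gauss-Newton sketch for a model fitting problem. The function name gauss_newton, the exponential model, the synthetic data, and the stopping rule are my own illustrative choices, not from the slides; each step solves the linear least squares subproblem min_s ‖J_c s + f_c‖₂ by QR (backslash), as recommended above.

    function x = gauss_newton(fun, jac, x, maxit, tol)
    % fun(x): m-vector of residuals f(x); jac(x): m-by-n Jacobian J(x).
        for k = 1:maxit
            fc = fun(x);  Jc = jac(x);
            s = -(Jc \ fc);              % LS solution of min ||Jc*s + fc||_2 via QR
            x = x + s;
            if norm(s) <= tol*max(1, norm(x)), break; end
        end
    end

    % Example: fit y = c1*exp(c2*t) to data (t_i, y_i).
    % t = (0:0.5:3)';  y = 2*exp(-0.7*t);              % synthetic, noise-free data
    % fun = @(c) c(1)*exp(c(2)*t) - y;                 % residuals f_i(c)
    % jac = @(c) [exp(c(2)*t), c(1)*t.*exp(c(2)*t)];   % partial derivatives
    % c = gauss_newton(fun, jac, [1; -1], 50, 1e-10);  % should converge to about [2; -0.7]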

Software packages
IMSL: uvmif, uminf, umiah, unlsf, flprs, nconf, ncong
MATLAB: fmin, fmins, leastsq, lp, constr
NAG: e04abf, e04jaf, e04laf, e04fdf, e04mbf, e04vdf
MINPACK: lmdif1
NETLIB: varpro, dqed
Octave: sqp, ols, gls

Summary
Problem setting: real-valued objective function
Golden section search: convergence rate
Direction of descent: steepest descent
Linear least squares: data fitting, QR decomposition, i.e., triangularization of a matrix by orthogonal transformations (Givens rotations, Householder transformations)
Nonlinear least squares: Newton's method (its relation to solving nonlinear systems), Gauss-Newton method (its relation to solving linear least squares)