Continuous Optimization
Sanzheng Qiao
Department of Computing and Software, McMaster University
March 2009
Outline
1 Introduction
2 Golden Section Search
3 Multivariate Functions
    Steepest Descent Method
4 Linear Least Squares Problem
5 Nonlinear Least Squares
    Newton's Method
    Gauss-Newton Method
6 Software Packages
Problem setting
Single-variable functions. Minimization:
    min_{x ∈ S} f(x)
f(x): objective function, single-variable and real-valued
S: support
Golden section search
Assumption: f(x) has a unique global minimum in [a, b].
If x* is the minimizer, then f(x) monotonically decreases in [a, x*] and monotonically increases in [x*, b].
Algorithm
Choose interior points c, d:
    c = a + r(b − a)
    d = a + (1 − r)(b − a),  0 < r < 0.5
if f(c) ≤ f(d)
    b = d
else
    a = c
end
Each step, the length of the interval is reduced by a factor of (1 − r).
Golden section search (cont.)
The choice of r:
When f(c) ≤ f(d), d_+ = c (the next d is c)
When f(c) > f(d), c_+ = d (the next c is d)
Why? Reduce the number of function evaluations.
Choice of r
When f(c) ≤ f(d), b_+ = d, so
    d_+ = a + (1 − r)(b_+ − a) = a + (1 − r)(d − a);
then d_+ = c means
    a + (1 − r)(d − a) = a + r(b − a),
which implies (1 − r)^2 = r.
When f(c) > f(d), a_+ = c; then c_+ = d means
    c_+ = c + r(b − c) = a + (1 − r)(b − a),
which also implies (1 − r)^2 = r.
Thus we have
    r = (3 − √5)/2 ≈ 0.382.
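As a quick sanity check of this value, a minimal MATLAB/Octave snippet (not from the slides):

r = (3 - sqrt(5))/2;          % approximately 0.381966
residual = (1 - r)^2 - r;     % zero up to roundoff
disp([r, residual])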
Algorithm
c = a + r*(b - a); fc = f(c);
d = a + (1-r)*(b - a); fd = f(d);
if fc <= fd
    b = d; fb = fd;
    d = c; fd = fc;
    c = a + r*(b-a); fc = f(c);
else
    a = c; fa = fc;
    c = d; fc = fd;
    d = a + (1-r)*(b-a); fd = f(d);
end
Convergence and termination
Convergence rate: each step reduces the length of the interval by a factor of
    1 − r = 1 − (3 − √5)/2 ≈ 0.618
Termination criteria: (d − c) ≤ u · max(|c|, |d|), or a tolerance.
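Putting the algorithm and the termination criterion together, here is a minimal MATLAB/Octave sketch; the function name goldensection and the relative tolerance argument tol are this note's choices, not the slides'.

function [xmin, fxmin] = goldensection(f, a, b, tol)
% Golden section search for the minimizer of a unimodal f on [a, b].
r = (3 - sqrt(5))/2;
c = a + r*(b - a);     fc = f(c);
d = a + (1-r)*(b - a); fd = f(d);
while (d - c) > tol*max(abs(c), abs(d))
    if fc <= fd
        b = d;
        d = c; fd = fc;
        c = a + r*(b - a); fc = f(c);
    else
        a = c;
        c = d; fc = fd;
        d = a + (1-r)*(b - a); fd = f(d);
    end
end
xmin = (c + d)/2; fxmin = f(xmin);
end

For example, goldensection(@(t) (t - 2).^2, 0, 5, 1e-8) returns a point near 2.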
Problem setting
    min f(x)
where x is a vector (of variables x_1, x_2, ..., x_n).
Gradient:
    ∇f(x_c) = [∂f(x_c)/∂x_1, ..., ∂f(x_c)/∂x_n]^T
−∇f(x_c): the direction of greatest decrease from x_c.
Steepest descent method
Idea:
Steepest descent direction: s_c = −∇f(x_c);
Find λ_c such that f(x_c + λ_c s_c) ≤ f(x_c + λ s_c) for all λ ∈ R (a single-variable minimization problem);
x_+ = x_c + λ_c s_c.
Remark. Conjugate gradient method: use the conjugate gradient direction in place of the gradient.
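A minimal MATLAB/Octave sketch of the iteration, assuming the objective f and its gradient gradf are supplied as function handles, and reusing the goldensection routine sketched earlier for the line search; restricting λ to [0, 1] is a simplification (in practice one would bracket the minimum first), and none of these names come from the slides.

function x = steepest_descent(f, gradf, x, maxit, tol)
% Steepest descent with a golden-section line search along s = -grad f(x).
for k = 1:maxit
    s = -gradf(x);                       % steepest descent direction
    if norm(s) <= tol, break; end        % (near-)stationary point
    phi = @(lambda) f(x + lambda*s);     % single-variable function of the step length
    lambda = goldensection(phi, 0, 1, 1e-6);
    x = x + lambda*s;                    % x_+ = x_c + lambda_c * s_c
end
end

For example, x = steepest_descent(@(z) z(1)^2 + 10*z(2)^2, @(z) [2*z(1); 20*z(2)], [1; 1], 200, 1e-8) approaches the minimizer [0; 0].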
Problem setting
Given a matrix A (m-by-n, m ≥ n) and b (m-by-1), find x (n-by-1) minimizing ‖Ax − b‖_2^2.
Example. Square root problem revisited. Find a_1 and a_2 in y(x) = a_1 x + a_2 such that
    (y(0.25) − 0.25)^2 + (y(0.5) − 0.5)^2 + (y(1.0) − 1.0)^2
is minimized. In matrix-vector form:
    A = [0.25 1; 0.5 1; 1.0 1],  x = [a_1; a_2],  b = [0.25; 0.5; 1.0].
Method
Transform A into a triangular matrix:
    PA = [R; 0]
where R is upper triangular (the semicolon stacks R on top of a zero block). Then the problem becomes
    ‖Ax − b‖_2^2 = ‖P^{-1}([R; 0]x − Pb)‖_2^2.
Method (cont.)
Desirable properties of P:
    P^{-1} is easy to compute;
    ‖P^{-1}z‖_2^2 = ‖z‖_2^2 for any z.
Partitioning Pb = [b_1; b_2], the LS solution is the solution of the triangular system
    Rx = b_1.
Choice of P
Orthogonal matrix (transformation) Q: Q^{-1} = Q^T.
Example. Givens rotation
    G = [cos θ, sin θ; −sin θ, cos θ]
Introducing a zero into a 2-vector:
    G [x_1; x_2] = [√(x_1^2 + x_2^2); 0]
i.e., rotate x onto the x_1-axis.
Givens rotation
    cos θ = x_1/√(x_1^2 + x_2^2),  sin θ = x_2/√(x_1^2 + x_2^2)
Algorithm.
if x(2) == 0
    c = 1.0; s = 0.0;
elseif abs(x(2)) >= abs(x(1))
    ct = x(1)/x(2); s = 1/sqrt(1 + ct*ct); c = s*ct;
else
    t = x(2)/x(1); c = 1/sqrt(1 + t*t); s = c*t;
end
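A small usage example (a sketch; the concrete vector x = [3; 4] is this note's, and c, s are computed as in the branch abs(x(2)) >= abs(x(1)) of the algorithm above):

x = [3; 4];
ct = x(1)/x(2); s = 1/sqrt(1 + ct*ct); c = s*ct;   % c = 0.6, s = 0.8
G = [c s; -s c];
y = G*x                        % returns [5; 0]: x rotated onto the x1-axis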
Givens rotation (cont.)
In general,
    G_13 = [ c  0  s  0
             0  1  0  0
            −s  0  c  0
             0  0  0  1 ]
    G_13 [x_1; x_2; x_3; x_4] = [√(x_1^2 + x_3^2); x_2; 0; x_4]
Select a pair (x_i, x_j); find a rotation G_ij to eliminate x_j.
QR factorization
Applying a sequence of Givens rotations eliminates the entries below the main diagonal of A, one at a time:
    G_34 G_24 G_23 G_14 G_13 G_12 A = [R; 0]
    Q = G_12^T G_13^T G_14^T G_23^T G_24^T G_34^T
    A = QR
(Diagram omitted: the zero pattern of A as the rotations are applied.)
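A minimal MATLAB/Octave sketch of this elimination process (the function name givens_qr is illustrative; the rotations are applied column by column, zeroing one subdiagonal entry at a time as on the slide):

function [Q, R] = givens_qr(A)
% QR factorization of an m-by-n matrix (m >= n) by Givens rotations.
[m, n] = size(A);
Q = eye(m);
for j = 1:n                                % eliminate column j below the diagonal
    for i = j+1:m
        if A(i,j) ~= 0
            r = hypot(A(j,j), A(i,j));
            c = A(j,j)/r; s = A(i,j)/r;
            G = [c s; -s c];               % 2-by-2 rotation acting on rows j and i
            A([j i], :) = G*A([j i], :);   % zeros out A(i,j)
            Q(:, [j i]) = Q(:, [j i])*G';  % accumulate Q = G12'*G13'*...
        end
    end
end
R = A;                                     % [R; 0] in the notation of the slide
end

For example, [Q, R] = givens_qr([0.25 1; 0.5 1; 1.0 1]) gives norm(Q*R - [0.25 1; 0.5 1; 1.0 1]) of the order of roundoff.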
Householder transformation
Basically, in the QR decomposition, we introduce zeros below the main diagonal of A using orthogonal transformations.
Another example: Householder transformation
    H = I − 2uu^T,  with u^T u = 1
H is symmetric and orthogonal (H^2 = I).
Goal: Ha = αe_1. Choose u = a ± ‖a‖_2 e_1.
A geometric interpretation: (figure omitted: two panels (a) and (b) showing a, u, and e_1; H reflects a onto the e_1-axis.)
Householder transformation (cont.)
Normalize u using ‖u‖_2^2 = 2(‖a‖_2^2 ± a_1 ‖a‖_2) for efficiency.
Algorithm. Given an n-vector x, this algorithm returns sigma, alpha, and u such that (I − sigma^{-1} u u^T) x = alpha e_1.
m = max(abs(x));
u = x/m;
alpha = sign(u(1))*norm(u);
u(1) = u(1) + alpha;
sigma = alpha*u(1);
alpha = -m*alpha;
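A small illustration of the algorithm on a concrete vector (the vector x = [3; 4; 0] and the final check, which replays the definition (I − sigma^{-1} u u^T)x, are this note's, not the slides'):

x = [3; 4; 0];
m = max(abs(x)); u = x/m;
alpha = sign(u(1))*norm(u);
u(1) = u(1) + alpha;
sigma = alpha*u(1);
alpha = -m*alpha;
y = x - (u'*x/sigma)*u          % equals alpha*e1, here [-5; 0; 0]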
Framework
A framework of the QR decomposition method for solving the linear least squares problem min ‖Ax − b‖_2:
    Using orthogonal transformations to triangularize A, applying the transformations to b simultaneously;
    Solving the resulting triangular system.
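A minimal MATLAB/Octave sketch of this framework using the built-in qr on the earlier example (in practice x = A\b performs an equivalent computation):

A = [0.25 1; 0.5 1; 1.0 1];   % example from the earlier slide
b = [0.25; 0.5; 1.0];
[m, n] = size(A);
[Q, R] = qr(A);               % orthogonal triangularization: A = Q*[R1; 0]
c = Q'*b;                     % apply the same transformation to b
x = R(1:n, 1:n) \ c(1:n)      % solve the triangular system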
Problem setting
Multivariate vector-valued function
    f(x) = [f_1(x), ..., f_m(x)]^T,  f: R^n → R^m;
find the solution of
    min_{x ∈ R^n} ρ(x),  where ρ(x) = (1/2) Σ_{i=1}^m f_i(x)^2.
Application: Model fitting problem.
Newton's Method
Idea: Solve ∇ρ(x) = 0 (a root-finding problem).
At each step, find the correction s_c (x_+ = x_c + s_c) satisfying
    ∇^2 ρ(x_c) s_c = −∇ρ(x_c)
Note. This is Newton's method for solving nonlinear systems.
Newton's method (cont.)
What is the gradient ∇ρ(x_c)?
    ∇ρ(x_c) = J(x_c)^T f(x_c)
where the Jacobian
    J(x_c) = [∂f_i(x_c)/∂x_j]
How to get ∇^2 ρ(x_c)?
    ∇^2 ρ(x_c) = J(x_c)^T J(x_c) + Σ_{i=1}^m f_i(x_c) ∇^2 f_i(x_c)
If x* fits the model well (f_i(x*) ≈ 0) and x_c is close to x*, then f_i(x_c) ≈ 0. Then
    ∇^2 ρ(x_c) ≈ J(x_c)^T J(x_c).
Gauss-Newton Method
Evaluate f_c = f(x_c) and compute the Jacobian J_c = J(x_c);
Solve (J_c^T J_c) s_c = −J_c^T f_c for s_c;
Update x_+ = x_c + s_c.
Note. s_c is the solution of the normal equations for the linear least squares problem
    min_s ‖J_c s + f_c‖_2
Reliable methods such as the QR decomposition method can be used to solve for s_c.
Remark. The Gauss-Newton method works well on small-residual (f_i(x*) ≈ 0) problems.
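A minimal MATLAB/Octave sketch of the iteration, assuming handles fun and jac return f(x) and J(x); the correction is obtained from the linear least squares problem min_s ‖J_c s + f_c‖_2 via backslash (which uses a QR decomposition) rather than by forming the normal equations. The function name and stopping rule are this note's choices.

function x = gauss_newton(fun, jac, x, maxit, tol)
% Gauss-Newton iteration for min (1/2)*sum_i f_i(x)^2.
for k = 1:maxit
    fc = fun(x);               % residual vector f(x_c)
    Jc = jac(x);               % Jacobian J(x_c)
    s  = -(Jc \ fc);           % LS solution of min ||Jc*s + fc||_2
    x  = x + s;                % x_+ = x_c + s_c
    if norm(s) <= tol*(1 + norm(x)), break; end
end
end

For a model-fitting problem y ≈ x_1 e^{x_2 t}, for instance, f_i(x) = x_1 e^{x_2 t_i} − y_i and the Jacobian columns are ∂f_i/∂x_1 = e^{x_2 t_i} and ∂f_i/∂x_2 = x_1 t_i e^{x_2 t_i}.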
Software packages
IMSL: uvmif, uminf, umiah, unlsf, flprs, nconf, ncong
MATLAB: fmin, fmins, leastsq, lp, constr
NAG: e04abf, e04jaf, e04laf, e04fdf, e04mbf, e04vdf
MINPACK: lmdif1
NETLIB: varpro, dqed
Octave: sqp, ols, gls
Summary
Problem setting: real-valued objective function
Golden section search: convergence rate
Direction of descent: steepest descent
Linear least squares: data fitting; QR decomposition, i.e., triangularization of a matrix using orthogonal transformations (Givens rotation, Householder transformation)
Nonlinear least squares: Newton's method (relation with solving nonlinear systems), Gauss-Newton method (relation with solving linear least squares)