Nonlinear Optimization: What's important?
Julian Hall, 10th May 2012

Convexity: convex problems

- A local minimizer is a global minimizer
- A solution of ∇f(x) = 0 (stationary point) is a minimizer
- A global minimizer can be found by searching downhill
- For a convex quadratic function q(x) = gᵀx + ½xᵀHx
  - the matrix H is positive semi-definite
  - the minimizer is x* = −H⁻¹g
- For a convex nonlinear function f(x), at any point x the Hessian matrix H(x) = ∇²f(x) is positive semi-definite

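The closed form for the convex quadratic can be checked numerically. A minimal numpy sketch with hypothetical data (g and H chosen here for illustration; H is symmetric positive definite):

```python
import numpy as np

# Hypothetical data: H symmetric positive definite makes q convex
g = np.array([1.0, -2.0])
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# Minimizer of q(x) = g^T x + (1/2) x^T H x solves grad q = g + H x = 0,
# i.e. x* = -H^{-1} g (computed via a solve, not an explicit inverse)
x_star = -np.linalg.solve(H, g)

def q(x):
    return g @ x + 0.5 * x @ H @ x

grad_at_min = g + H @ x_star  # should vanish at the minimizer
```

At x* the gradient vanishes and q is no smaller at nearby points, as convexity predicts.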
Convexity: non-convex problems

- A local minimizer may not be a global minimizer
- A solution of ∇f(x) = 0 (stationary point) may be
  - a maximizer
  - a saddle point
  - a non-global minimizer
- Finding a global minimizer can be very hard
- For a non-convex quadratic function q(x) = gᵀx + ½xᵀHx
  - the matrix H is indefinite or negative (semi-)definite
  - the unconstrained minimization of q(x) is unbounded
  - q(x) on [0, 1]ⁿ may have 2ⁿ local minimizers

Matrix definiteness

A (square) symmetric matrix A is one of the following:
- positive definite: xᵀAx > 0 for x ≠ 0; all eigenvalues of A are positive
- positive semi-definite: xᵀAx ≥ 0; all eigenvalues of A are non-negative
- indefinite: xᵀAx takes both signs; A has both positive and negative eigenvalues
- negative semi-definite: xᵀAx ≤ 0; all eigenvalues of A are non-positive
- negative definite: xᵀAx < 0 for x ≠ 0; all eigenvalues of A are negative

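The eigenvalue characterisation gives a direct numerical test. A minimal sketch (the function name and tolerance are illustrative choices, not from the notes):

```python
import numpy as np

def definiteness(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    w = np.linalg.eigvalsh(A)  # eigenvalues of a symmetric matrix, ascending
    if np.all(w > tol):
        return "positive definite"
    if np.all(w > -tol):
        return "positive semi-definite"
    if np.all(w < -tol):
        return "negative definite"
    if np.all(w < tol):
        return "negative semi-definite"
    return "indefinite"
```

The tolerance guards against rounding: a theoretically zero eigenvalue may come back as a tiny negative number in floating point.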
Results

- For any matrix A, AᵀA is symmetric and positive semi-definite
- For any nonsingular matrix A, AᵀA is symmetric and positive definite
- For any positive definite matrix A
  - A⁻¹ is positive definite
  - A² is positive definite
- For any positive definite matrices A and B, A + B is positive definite

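The first result follows from xᵀ(AᵀA)x = ‖Ax‖² ≥ 0, which is easy to confirm numerically. A sketch with randomly generated (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # a generic (almost surely nonsingular) matrix
M = A.T @ A                       # symmetric, positive (semi-)definite

x = rng.standard_normal(4)
quad = x @ M @ x                  # equals ||Ax||^2, hence never negative
```

The same identity explains why the normal-equations matrix in least squares is always symmetric positive semi-definite.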
Unconstrained optimization

The problem: minimize f(x) subject to x ∈ ℝⁿ
- Necessary conditions at a minimizer x*:
  - First order: ∇f = 0
  - Second order: ∇²f positive semi-definite
- Sufficient conditions for x* to be a minimizer:
  - First order: ∇f = 0
  - Second order: ∇²f positive definite
- Proofs not examinable

Unconstrained optimization

The problem: minimize f(x) subject to x ∈ ℝⁿ
Framework for a line search method: at iterate x_k
- Find a search direction s_k
- Perform a line search of f(x_k + αs_k) to identify a step length α_k
- Move to the new iterate x_{k+1} = x_k + α_k s_k
An alternative approach is the trust region method:
- Find a satisfactory (approximate) solution s of: minimize f(x_k + s) subject to ‖s‖ ≤ ρ (the trust-region radius)
- Move to the new iterate x_{k+1} = x_k + s
- Worth studying as a tricky constrained optimization problem
- Valuable practical technique
- Not examinable

Line search methods

Find an (approximate) minimizer α_k of f(x_k + αs_k)
- For a quadratic function q(x) = gᵀx + ½xᵀHx, at x_k with search direction s_k the exact solution is

  α_k = −g_kᵀs_k / s_kᵀHs_k

- For a nonlinear function, techniques to find α_k are:
  - Interpolation of f(x_k + αs_k) and its derivative by quadratic or cubic polynomials. Valuable practical technique. Examinable
  - Armijo line search, considering values of f(x_k + α_j s_k) for step lengths α_j in a geometric sequence. Valuable for theoretical purposes. Not examinable

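The exact-step formula for a quadratic can be verified by checking that the directional derivative vanishes at the new point. A minimal sketch with hypothetical data:

```python
import numpy as np

# Hypothetical convex quadratic q(x) = g^T x + (1/2) x^T H x
g = np.array([1.0, 1.0])
H = np.array([[2.0, 0.0],
              [0.0, 4.0]])

x_k = np.array([1.0, 1.0])
g_k = g + H @ x_k            # gradient of q at x_k
s_k = -g_k                   # steepest-descent direction (a descent direction)

# Exact line search step: alpha_k = -g_k^T s_k / (s_k^T H s_k)
alpha_k = -(g_k @ s_k) / (s_k @ H @ s_k)
x_next = x_k + alpha_k * s_k
```

Along a descent direction the numerator −g_kᵀs_k is positive, so the step length is positive, and the slope s_kᵀ∇q at x_next is zero.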
Search directions

- Steepest descent: s_k = −g_k
- Newton: s_k = −H_k⁻¹ g_k
- Quasi-Newton: s_k = −B_k g_k
- Conjugate gradient: s_k = −g_k + β_{k−1} s_{k−1}

Steepest descent vs Newton's method

Steepest descent (SD):
- First order: needs only ∇f(x), so inexpensive
- Global convergence under weak assumptions
- Scale-dependent
- Rapid reduction in f when far from x*
- Linear (slow) local convergence to x*
Newton's method (N):
- Second order: also needs ∇²f(x), so expensive
- Global convergence only under strong assumptions
- Scale-independent
- Can be unreliable when far from x*
- Quadratic (fast) local convergence to x*

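The contrast shows up even on a simple quadratic: Newton's method solves it in one step, while steepest descent zig-zags and converges only linearly. A sketch on a hypothetical diagonal quadratic f(x) = ½xᵀHx with minimizer x* = 0:

```python
import numpy as np

H = np.diag([1.0, 10.0])       # mild ill-conditioning makes SD zig-zag

def grad(x):
    return H @ x               # gradient of f(x) = (1/2) x^T H x

# Steepest descent with exact line search: many iterations
x_sd = np.array([10.0, 1.0])
sd_iters = 0
for _ in range(500):
    g = grad(x_sd)
    if np.linalg.norm(g) < 1e-13:
        break
    s = -g
    alpha = -(g @ s) / (s @ H @ s)   # exact step for a quadratic
    x_sd = x_sd + alpha * s
    sd_iters += 1

# Newton: a single step minimizes a quadratic exactly
x_n = np.array([10.0, 1.0])
x_n = x_n - np.linalg.solve(H, grad(x_n))
```

The iteration counter makes the linear-vs-quadratic contrast concrete: SD needs many steps to reach the tolerance that Newton reaches immediately.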
Quasi-Newton methods (QN)

s_k = −B_k g_k
Desired properties of B_k:
- Nonsingular
- Symmetric (since it is the inverse of a Hessian)
- Positive definite (so s_k = −B_k g_k is a descent direction)
- Like I when far from x* (so behaves like SD)
- Like (H_k)⁻¹ when close to x* (so behaves like N)

Quasi-Newton methods

s_k = −B_k g_k
Methods:
- Start with B_0 = I, so s_0 = −g_0 (SD)
- B_{k+1} relates the change δ_k in x_k and the change γ_k in ∇f(x_k) according to the quasi-Newton condition B_{k+1} γ_k = δ_k
- The symmetric rank-one formula uses B_{k+1} = B_k + a u uᵀ
- Rank-two formulae (DFP, BFGS and the rest of the Broyden family) use B_{k+1} = B_k + a u uᵀ + b v vᵀ

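As a concrete instance of a rank-two formula, here is a sketch of the DFP update in its inverse-Hessian form; by construction the updated matrix satisfies the quasi-Newton condition B_{k+1} γ_k = δ_k and stays symmetric:

```python
import numpy as np

def dfp_update(B, delta, gamma):
    """Rank-two DFP update of the inverse-Hessian approximation B.

    delta = x_{k+1} - x_k, gamma = grad_{k+1} - grad_k.
    The result satisfies the quasi-Newton condition B_new @ gamma = delta.
    Assumes delta @ gamma > 0 (the curvature condition).
    """
    Bg = B @ gamma
    return (B
            + np.outer(delta, delta) / (delta @ gamma)
            - np.outer(Bg, Bg) / (gamma @ Bg))
```

Substituting γ into the formula shows the condition holds exactly: the second term contributes δ and the third cancels Bγ.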
Quasi-Newton methods: properties

- First order, and O(n²) cost per iteration (cheaper than N)
- Rapid reduction in f when far from x* (like SD)
- Super-linear (fast) local convergence to x* (similar to N)

The conjugate gradient method (CG)

For quadratic functions with H positive definite:
- Let s_1 = −g_1
- Repeat, for k = 1, …, n:
  - α_k = −g_kᵀs_k / s_kᵀHs_k
  - x_{k+1} = x_k + α_k s_k
  - g_{k+1} = g + H x_{k+1}
  - β_k = g_{k+1}ᵀg_{k+1} / g_kᵀg_k
  - s_{k+1} = −g_{k+1} + β_k s_k
- Until g_{m+1} = 0 for some m ≤ n

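The iteration above translates almost line-for-line into code. A minimal sketch (function name and test data are illustrative), which for a quadratic terminates in at most n steps up to rounding:

```python
import numpy as np

def cg_quadratic(g, H, x0, tol=1e-12):
    """Minimize q(x) = g^T x + (1/2) x^T H x with H positive definite."""
    x = x0.copy()
    gk = g + H @ x               # gradient at the current iterate
    s = -gk                      # initial direction: steepest descent
    for _ in range(len(g)):      # at most n iterations
        if np.linalg.norm(gk) < tol:
            break
        alpha = -(gk @ s) / (s @ H @ s)          # exact line search
        x = x + alpha * s
        g_new = g + H @ x
        beta = (g_new @ g_new) / (gk @ gk)       # Fletcher-Reeves beta
        s = -g_new + beta * s
        gk = g_new
    return x
```

The returned point satisfies g + Hx = 0, i.e. it is the unconstrained minimizer.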
CG for nonlinear f(x) and linear systems Ax = b

For nonlinear f(x):
- perform an approximate line search
- use g_k = ∇f(x_k)
For Ax = b:
- If A is symmetric and positive definite: the solution of Ax = b is the minimizer of q(x) = −bᵀx + ½xᵀAx
- If A is unsymmetric and nonsingular: the solution of Ax = b is the solution of AᵀAx = Aᵀb
  - Can solve AᵀAx = Aᵀb using CG by minimizing q(x) = −(Aᵀb)ᵀx + ½xᵀAᵀAx, since AᵀA is symmetric and positive definite
- CG is now more useful for symmetric linear systems than for optimization

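The normal-equations trick can be demonstrated directly: run the CG recurrence on AᵀAx = Aᵀb for an unsymmetric A. A sketch with hypothetical data:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])     # unsymmetric, nonsingular (hypothetical data)
b = np.array([5.0, 4.0])

M = A.T @ A                    # symmetric positive definite
g = -(A.T @ b)                 # so q(x) = g^T x + (1/2) x^T M x

# CG on the normal equations M x = -g, i.e. A^T A x = A^T b
x = np.zeros(2)
r = g + M @ x                  # gradient of q
s = -r
for _ in range(len(b)):        # at most n = 2 iterations for a quadratic
    alpha = -(r @ s) / (s @ M @ s)
    x = x + alpha * s
    r_new = g + M @ x
    beta = (r_new @ r_new) / (r @ r)
    s = -r_new + beta * s
    r = r_new
```

After at most n steps, x solves the original unsymmetric system Ax = b.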
Least-squares problems

Linear least squares: find the best solution of Ax = b when A ∈ ℝ^{m×n}, m > n
- Minimize q(x) = ‖Ax − b‖²
- Form and solve AᵀAx = Aᵀb
Nonlinear least squares: find the best solution of r(x) = 0 when r : ℝⁿ → ℝᵐ, m > n
- Minimize f(x) = ‖r(x)‖²
- Use

  ∇f(x) = 2A(x)ᵀr(x)
  ∇²f(x) = 2A(x)ᵀA(x) + 2 Σ_{i=1}^{m} r_i(x) ∇²r_i(x)

  where A(x) is the Jacobian matrix of r(x)
- Using the full Hessian gives Newton's method; using 2A(x)ᵀA(x) ≈ ∇²f(x) gives Gauss-Newton

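For the linear case, the normal equations can be solved and checked against numpy's built-in least-squares routine. A sketch with hypothetical data (a small straight-line fit):

```python
import numpy as np

# Overdetermined system: m = 4 equations, n = 2 unknowns (hypothetical data)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.1, 1.9, 3.2, 3.9])

# Normal equations: A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)

# At the minimizer the residual is orthogonal to the columns of A
r = A @ x - b
```

Note that forming AᵀA squares the condition number, which is why library routines such as `np.linalg.lstsq` prefer orthogonal factorizations; the normal equations remain the method stated in these notes.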
Constrained optimization

The problem: minimize f(x) subject to x ∈ S ⊆ ℝⁿ
- Convexity of the feasible region S is important
- If S is convex and f is convex on S then it is a convex programming problem (CPP)
- If S is non-convex then there may be multiple local minimizers
The most general nonlinear optimization problem in this course is

  minimize f(x) subject to c_i(x) = 0, i ∈ E, and c_i(x) ≥ 0, i ∈ I

Kuhn-Tucker (KT) conditions

For the nonlinear optimization problem

  minimize f(x) subject to c_i(x) = 0, i ∈ E, and c_i(x) ≥ 0, i ∈ I

the Kuhn-Tucker (KT) conditions at a point x are that there exist Lagrange multipliers λ such that

  Σ_i λ_i ∇c_i(x) = ∇f(x)   (KT1)
  c_i(x) = 0, i ∈ E          (KT2)
  c_i(x) ≥ 0, i ∈ I          (KT3)
  λ_i ≥ 0, i ∈ I             (KT4)
  λ_i = 0, i ∉ A             (KT5)

where the active set A contains the indices of the active constraints at x

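The conditions can be checked by hand on a small problem. A sketch using a hypothetical example (minimize x₁² + x₂² subject to x₁ + x₂ − 2 ≥ 0, whose minimizer (1, 1) has the constraint active):

```python
import numpy as np

# Hypothetical problem: minimize x1^2 + x2^2  s.t.  c(x) = x1 + x2 - 2 >= 0
x_star = np.array([1.0, 1.0])     # claimed minimizer; constraint is active
grad_f = 2.0 * x_star             # gradient of the objective at x_star
grad_c = np.array([1.0, 1.0])     # gradient of the single constraint

# KT1 with one active constraint: lambda * grad_c = grad_f.
# Recover the multiplier by projecting grad_f onto grad_c.
lam = (grad_c @ grad_f) / (grad_c @ grad_c)
c_val = x_star.sum() - 2.0
```

Here λ = 2 ≥ 0 and the constraint holds with equality, so KT1 to KT5 are all satisfied at (1, 1).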
Convex programming problems (CPP)

Definition:

  minimize f(x) subject to c_i(x) ≥ 0, i = 1, 2, …, m   (CPP)

where each c_i is concave and f is convex on S
Theorems: for a CPP
- Every local minimizer x* is a global minimizer
- The set of global minimizers is convex and, if f(x) is strictly convex, then there is a unique global minimizer
- If the KT conditions hold at x* then it is a global minimizer
Proofs examinable

Convex duality

The Wolfe dual: for the primal CPP (P), the Wolfe dual problem is

  maximize L(x, λ) subject to ∇ₓL(x, λ) = 0, λ ≥ 0   (D)

where

  L(x, λ) = f(x) − Σ_{i=1}^{m} λ_i c_i(x)

Theorem: if x* is a minimizer of (P), then x* and λ* are a maximizer of (D), and the optimal objective values of (P) and (D) are equal
Proof examinable

Convex duality: examples

LP problems: for the primal LP problem

  minimize cᵀx subject to Ax ≥ b, x ≥ 0   (P)

the dual LP problem is

  maximize bᵀλ subject to Aᵀλ ≤ c, λ ≥ 0   (D)

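Equality of the optimal objective values can be seen on a tiny LP whose solution is obvious by inspection. A sketch with hypothetical data (the optimal points below are stated by hand, not computed by a solver):

```python
import numpy as np

# Hypothetical primal LP: minimize c^T x  s.t.  Ax >= b, x >= 0
c = np.array([1.0, 1.0])
A = np.eye(2)
b = np.array([2.0, 3.0])

# By inspection: x1 >= 2 and x2 >= 3, so the primal optimum is x = (2, 3).
x_opt = np.array([2.0, 3.0])
# Dual: maximize b^T lam s.t. lam <= c, lam >= 0, so lam = (1, 1) is optimal.
lam_opt = np.array([1.0, 1.0])

primal_obj = c @ x_opt
dual_obj = b @ lam_opt
```

Both points are feasible for their respective problems and the two objective values agree (both equal 5), as the duality theorem asserts.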
Linearly constrained optimization

For the linearly constrained optimization problem

  minimize f(x) subject to a_iᵀx = b_i, i ∈ E, and a_iᵀx ≥ b_i, i ∈ I

- The feasible region S is convex
- If f is convex on S then it is a CPP
The KT conditions at a point x are

  Σ_i λ_i a_i = ∇f(x)     (KT1)
  a_iᵀx = b_i, i ∈ E       (KT2)
  a_iᵀx ≥ b_i, i ∈ I       (KT3)
  λ_i ≥ 0, i ∈ I           (KT4)
  λ_i = 0, i ∉ A           (KT5)

Necessary and sufficient conditions for a minimizer

Necessary conditions at a minimizer x*:
- First order: the KT conditions are satisfied
- Second order: the curvature of f is non-negative in directions of zero slope
Sufficient conditions for x* to be a minimizer:
- First order: the KT conditions are satisfied
- Second order: the curvature of f is positive in directions of zero slope
Proofs examinable

Nonlinear constrained optimization

For the nonlinear constrained optimization problem

  minimize f(x) subject to c_i(x) = 0, i ∈ E, and c_i(x) ≥ 0, i ∈ I

necessary and sufficient conditions for a minimizer are as for linearly constrained optimization problems, but the curvature requirement applies to the Lagrangian

  L(x, λ) = f(x) − Σ_i λ_i c_i(x)

Proofs not examinable
