MATHEMATICS FOR COMPUTER VISION
WEEK 8 OPTIMISATION PART 2
Dr Fabio Cuzzolin
MSc in Computer Vision, Oxford Brookes University
Year 2013-14
OUTLINE OF WEEK 8
topics: quadratic optimisation, least squares, iterative algorithms for nonlinear optimisation
Least squares methods
Linear least squares
Quadratic Programming (QP)
Integer Programming (IP) and LP relaxation
Iterative methods
Newton-Raphson
Quasi-Newton
Conjugate gradient
Gradient descent
LEAST SQUARES OPTIMISATION
LEAST SQUARES
LINEAR LEAST SQUARES
COMPUTATION AND INTERPRETATION
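As a minimal illustration of linear least squares, the sketch below fits a line y = m*x + q to a few hypothetical data points by solving the normal equations (A' A) w = A' y, and checks the result against numpy's built-in least-squares routine; numpy is assumed and the data values are made up.

import numpy as np

# Hypothetical noisy samples of a line
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix A: each row is [x_i, 1], unknowns are (m, q)
A = np.column_stack([xs, np.ones_like(xs)])

# Normal equations: (A' A) w = A' y
w_normal = np.linalg.solve(A.T @ A, A.T @ ys)

# Same solution via numpy's least-squares routine (numerically more stable)
w_lstsq, *_ = np.linalg.lstsq(A, ys, rcond=None)

print("normal equations:", w_normal)
print("np.linalg.lstsq :", w_lstsq)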
QUADRATIC PROGRAMMING
FORMULATION
DUAL PROBLEM AND COMPUTATION
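A minimal sketch of a quadratic program: the standard QP minimises 1/2 x'Qx + c'x subject to linear constraints, and in the special case of equality constraints Ax = b the optimum can be read off the KKT linear system, as below. Q, c, A and b are hypothetical, and numpy is assumed.

import numpy as np

# Equality-constrained QP:  minimise 1/2 x'Qx + c'x  subject to  Ax = b
Q = np.array([[4.0, 1.0],
              [1.0, 2.0]])        # symmetric positive definite
c = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])        # single constraint: x1 + x2 = 1
b = np.array([1.0])

# KKT conditions:  Q x + A' lam = -c,   A x = b
n, m = Q.shape[0], A.shape[0]
KKT = np.block([[Q, A.T],
                [A, np.zeros((m, m))]])
rhs = np.concatenate([-c, b])

sol = np.linalg.solve(KKT, rhs)
x, lam = sol[:n], sol[n:]
print("optimal x:", x, " multiplier:", lam)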
INTEGER PROGRAMMING
INTEGER PROGRAMMING (IP)
optimisation problem in which some of the variables are required to be integers
it is NP-hard
canonical form: maximise c'x subject to Ax <= b, x >= 0, x integer, where all entries of A, b and c are integers
many problems can be formulated as IP: travelling salesman, vertex cover, Boolean satisfiability
EXAMPLE
example problem: feasible integer points in red, constraints after LP relaxation in blue
clearly, the LP relaxation optimum is neither feasible nor optimal for the IP problem
LP RELAXATION
the idea is to relax the constraint that x is integer, solve the resulting LP problem, then round
in general, the solution after relaxation is not feasible
however, if A is totally unimodular, every basic feasible solution (vertex of the polytope determined by the linear constraints) is integer!
(a square integer matrix is unimodular when det A = +/-1; A is totally unimodular when every square nonsingular submatrix is unimodular)
in that case we can just apply the simplex algorithm, and we are sure to get the optimal integer solution
if A is not totally unimodular, there are exact algorithms
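A minimal sketch of LP relaxation, assuming scipy is available: the integrality constraint of a small hypothetical IP is dropped, the resulting LP is solved with scipy.optimize.linprog, and the relaxed optimum is naively rounded; as noted above, the rounded point need not be feasible or optimal for the original IP.

import numpy as np
from scipy.optimize import linprog

# Hypothetical IP: maximise x1 + x2  s.t.  2*x1 + x2 <= 4,  x1 + 3*x2 <= 6,  x >= 0, x integer
c = np.array([-1.0, -1.0])           # linprog minimises, so negate the objective
A_ub = np.array([[2.0, 1.0],
                 [1.0, 3.0]])
b_ub = np.array([4.0, 6.0])

# LP relaxation: drop the integrality constraint and solve the plain LP
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)], method="highs")
x_relaxed = res.x
x_rounded = np.round(x_relaxed)      # naive rounding; here (1, 2) violates x1 + 3*x2 <= 6

print("LP relaxation optimum:", x_relaxed)
print("rounded candidate    :", x_rounded)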
ITERATIVE METHODS
NONLINEAR PROGRAMMING
some of the constraints or the objective function are nonlinear
an issue arises when the problem is non-convex
under differentiability of the functions involved, the Kuhn-Tucker conditions provide necessary conditions for optimality (see Week 7)
example: nonlinear feasibility space (blue sector)
useful tools: numerical iterative methods
ITERATIVE METHODS
solve nonlinear programming problems by evaluating Hessians, gradients and/or function values
for smooth functions, derivative calculations improve the rate of convergence, but increase the computational load
performance criterion: number of function evaluations
order n+1 for gradients
order n^2 for Hessians
ultimately, what is best depends on the problem
NEWTON'S METHOD
GEOMETRIC INTERPRETATION
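A minimal one-dimensional sketch of Newton's method for finding a stationary point: each iterate applies x_{k+1} = x_k - f'(x_k)/f''(x_k). The test function is the same f(x) = x^4 - 3x^3 + 2 used in the gradient descent example later in these slides; the starting point and tolerance are chosen for illustration only.

# Newton's method for a stationary point of f(x) = x**4 - 3*x**3 + 2
def f_prime(x):
    return 4 * x**3 - 9 * x**2       # first derivative

def f_second(x):
    return 12 * x**2 - 18 * x        # second derivative (1-D "Hessian")

x = 6.0                              # starting guess
for _ in range(50):
    step = f_prime(x) / f_second(x)  # Newton step: f'(x) / f''(x)
    x = x - step
    if abs(step) < 1e-10:
        break

print("stationary point found at x =", x)   # should be close to 2.25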
QUASI-NEWTON
derives from Newton's method
looks for the stationary points of a function, using a second-order Taylor approximation
the Hessian does not need to be computed: an approximation B of it is used, constrained by the secant equation B_{k+1} (x_{k+1} - x_k) = grad f(x_{k+1}) - grad f(x_k) (a Taylor expansion of the gradient itself)
the secant equation is underdetermined: additional constraints are needed, with various options, e.g. symmetry (B = B'), or minimal distance from the previous approximation
QUASI-NEWTON
update steps: Newton steps using the current approximation B_k of the Hessian
various methods exist to update B_k, e.g. DFP and BFGS
Matlab optimization toolbox implementation: BFGS is one option of fminunc.m
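A minimal sketch of a quasi-Newton iteration with the BFGS update of the Hessian approximation B_k, assuming numpy; the test problem is a hypothetical convex quadratic and no line search is used, so this only illustrates the update steps described above rather than a production solver.

import numpy as np

# Quasi-Newton (BFGS-style) iteration on a convex quadratic f(x) = 1/2 x'Qx - b'x
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return Q @ x - b                     # gradient of f (the true Hessian Q is never used by the update)

x = np.zeros(2)
B = np.eye(2)                            # initial Hessian approximation B_0
g = grad(x)

for _ in range(50):
    p = np.linalg.solve(B, -g)           # quasi-Newton step: solve B_k p = -grad f(x_k)
    x_new = x + p                        # full step (no line search in this sketch)
    g_new = grad(x_new)

    s = x_new - x                        # secant pair
    y = g_new - g
    # BFGS update of B_k; by construction B_{k+1} satisfies the secant equation B_{k+1} s = y
    B = B + np.outer(y, y) / (y @ s) - (B @ np.outer(s, s) @ B) / (s @ B @ s)

    x, g = x_new, g_new
    if np.linalg.norm(g) < 1e-10:
        break

print("minimiser found:", x, " exact solution:", np.linalg.solve(Q, b))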
GRADIENT DESCENT (STEEPEST DESCENT)
first-order optimisation algorithm
takes steps proportional to the negative of the gradient to find a local minimum (the opposite for local maxima)
also known as steepest descent
starts with a guess x_0 and updates using x_{k+1} = x_k - gamma * grad F(x_k)
if the step gamma is small enough, F(x_{k+1}) <= F(x_k), and the sequence x_0, x_1, x_2, ... should converge to a local minimum
BEHAVIOR OF GRADIENT DESCENT
it tends to zig-zag, with slow convergence near the minimum
the gradient points away from the actual direction of the sought minimum
it can be used to solve linear systems Ax = b in a least squares sense, by minimising ||Ax - b||^2
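A minimal sketch of the least-squares use just mentioned, assuming numpy: gradient descent is applied to F(x) = 1/2 ||Ax - b||^2, whose gradient is A'(Ax - b); the system and the step-size choice are hypothetical and for illustration only.

import numpy as np

# Gradient descent on the least-squares objective F(x) = 1/2 * ||Ax - b||^2
A = np.array([[2.0, 0.0],
              [1.0, 3.0],
              [0.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

x = np.zeros(2)
gamma = 1.0 / np.linalg.norm(A.T @ A, 2)   # step size below 2 / lambda_max ensures convergence

for _ in range(5000):
    grad = A.T @ (A @ x - b)               # gradient of the least-squares objective
    x = x - gamma * grad
    if np.linalg.norm(grad) < 1e-10:
        break

print("gradient descent solution:", x)
print("direct least squares     :", np.linalg.lstsq(A, b, rcond=None)[0])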
GRADIENT DESCENT PYTHON IMPLEMENTATION
a piece of code which finds the local minimum of the function f(x) = x**4 - 3*x**3 + 2, with derivative f'(x) = 4*x**3 - 9*x**2

x_old = 0
x_new = 6            # the algorithm starts at x = 6
eps = 0.01           # step size
precision = 0.00001  # stopping tolerance

def f_prime(x):
    return 4 * x**3 - 9 * x**2

while abs(x_new - x_old) > precision:
    x_old = x_new
    x_new = x_old - eps * f_prime(x_old)

print("Local minimum occurs at", x_new)
GRADIENT DESCENT VS NEWTON
figure: comparison of Newton's method (red) and gradient descent (green)
Newton uses curvature (second order) information to take a more direct route
CONJUGATE GRADIENT
used to solve linear systems whose matrix A is positive definite
iterative method, so it can be applied to large systems for which Cholesky decomposition is not feasible
it can also be used in energy minimisation
two vectors are conjugate if their inner product w.r.t. A is zero (they are orthogonal in the norm associated with A)
idea: the solution of the system is also the unique minimiser of the quadratic function f(x) = 1/2 x'Ax - b'x
CONJUGATE GRADIENT - ALGORITHM
start with an initial guess x_0
at each step we take the negative of the gradient of the quadratic function, and we move in the direction p_0 = b - A x_0
the residual is r_k = b - A x_k
gradient descent would move in the direction of r_k
instead, we want the successive search directions p_k to be conjugate w.r.t. A (a procedure similar to Gram-Schmidt)
update equations:
alpha_k = (r_k' r_k) / (p_k' A p_k)
x_{k+1} = x_k + alpha_k p_k
r_{k+1} = r_k - alpha_k A p_k
beta_k = (r_{k+1}' r_{k+1}) / (r_k' r_k)
p_{k+1} = r_{k+1} + beta_k p_k
EXAMPLE MATLAB CODE
can be easily implemented

function x = conjgrad(A, b, x)
    r = b - A*x;
    p = r;
    rsold = r'*r;
    for i = 1:10^6
        Ap = A*p;
        alpha = rsold / (p'*Ap);
        x = x + alpha*p;
        r = r - alpha*Ap;
        rsnew = r'*r;
        if sqrt(rsnew) < 1e-10
            break;
        end
        p = r + (rsnew/rsold)*p;
        rsold = rsnew;
    end
end
CONJUGATE GRADIENT VS GRADIENT DESCENT
figure: comparison of conjugate gradient (red) and gradient descent (green)
conjugate gradient converges in at most n steps
SUMMARY
SUMMARY OF WEEK 8
Nonlinear optimisation topics:
Least squares (linear in particular)
Quadratic Programming (in brief)
Integer Programming
Nonlinear Programming
Iterative methods: Newton-Raphson, Quasi-Newton, gradient descent, conjugate gradient