Optimization
Last time
- Root finding: definition, motivation
- Algorithms: bisection, false position, secant, Newton-Raphson
- Convergence & tradeoffs
- Example applications of Newton's method
- Root finding in > 1 dimension
Today
- Introduction to optimization: definition and motivation
- 1-dimensional methods: golden section, discussion of error, Newton's method
- Multi-dimensional methods: Newton's method, steepest descent, conjugate gradient
- General strategies, value-only methods
Ingredients
- Objective function
- Variables
- Constraints
Find values of the variables that minimize or maximize the objective function while satisfying the constraints.
Different Kinds of Optimization
[Figure from: Optimization Technology Center, http://www-fp.mcs.anl.gov/otc/guide/optweb/]
Different Optimization Techniques
- Algorithms have very different flavor depending on the specific problem
  - Closed form vs. numerical vs. discrete
  - Local vs. global minima
  - Running times ranging from O(1) to NP-hard
- Today: focus on continuous numerical methods
Optimization in 1-D
- Look for analogies to bracketing in root finding
- What does it mean to bracket a minimum? Three points (x_left, f(x_left)), (x_mid, f(x_mid)), (x_right, f(x_right)) with:
  - x_left < x_mid < x_right
  - f(x_mid) < f(x_left)
  - f(x_mid) < f(x_right)
Optimization in 1-D
- Once we have these properties, there is at least one local minimum between x_left and x_right
- Establishing the bracket initially (see the sketch below):
  - Given x_initial, increment
  - Evaluate f(x_initial), f(x_initial + increment)
  - If decreasing, step until you find an increase
  - Else, step in the opposite direction until you find an increase
  - Grow the increment (by a constant factor) at each step
- For maximization: substitute −f for f
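A minimal Python sketch of this bracketing procedure (the function name, initial step, and growth factor are illustrative choices, not from the slides):

```python
def bracket_minimum(f, x0, step=1e-2, grow=1.6, max_iter=100):
    """Grow a bracket (a, b, c) with a < b < c, f(b) < f(a), f(b) < f(c)."""
    a, b = x0, x0 + step
    fa, fb = f(a), f(b)
    if fb > fa:                 # f increasing: search downhill the other way
        a, b = b, a
        fa, fb = fb, fa
        step = -step
    for _ in range(max_iter):
        step *= grow            # grow the increment by a constant factor
        c = b + step
        fc = f(c)
        if fc > fb:             # found an increase: (a, b, c) brackets a minimum
            return (a, b, c) if a < c else (c, b, a)
        a, b, fb = b, c, fc
    raise RuntimeError("no bracket found (f may be monotonic)")
```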
Optimization in 1-D
- Strategy: evaluate the function at some new point x_new inside the bracket
[Figure: bracket points (x_left, f(x_left)), (x_mid, f(x_mid)), (x_new, f(x_new)), (x_right, f(x_right))]
Optimization in 1-D
- Strategy: evaluate the function at some new point x_new
- Here, the new bracket points are x_new, x_mid, x_right
Optimization in 1-D
- Strategy: evaluate the function at some new point x_new
- Here, the new bracket points are x_left, x_new, x_mid
Optimization in 1-D
- Unlike with root finding, we can't always guarantee that the interval will be reduced by a factor of 2
- Let's find the optimal placement of x_mid, relative to x_left and x_right, that guarantees the same factor of reduction regardless of the outcome
Optimization in 1-D
[Figure: unit-width bracket split into segments of length α and α², placed so that 1 − α² = α]
- If f(x_new) < f(x_mid): new interval = α
- Else: new interval = 1 − α²
Golden Section Search
- To assure the same interval size either way, want α = 1 − α²
- So α = (√5 − 1) / 2 ≈ 0.618
- This is the reciprocal of the golden ratio
- So the interval shrinks to 0.618 of its width (a ~38% reduction) per iteration: linear convergence
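A sketch of golden section search in Python, assuming a valid bracketing interval [a, b] (names and the tolerance default are illustrative; per the stopping criterion discussed below, tol should not be pushed much under √ε_mach ≈ 1e-8 in double precision):

```python
import math

def golden_section(f, a, b, tol=1e-8):
    """Minimize f over a bracketing interval [a, b] by golden section search."""
    alpha = (math.sqrt(5) - 1) / 2      # 0.618..., reciprocal of the golden ratio
    x1 = b - alpha * (b - a)            # two interior points in golden proportion
    x2 = a + alpha * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                     # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - alpha * (b - a)
            f1 = f(x1)
        else:                           # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + alpha * (b - a)
            f2 = f(x2)
    return (a + b) / 2
```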
Sources of Error
When we find a minimum value f(x), why is it different from the true minimum f(x_min)?
1. Obvious: width of the bracket: |x − x_min| ≤ (x_right − x_left)
2. Less obvious: floating point representation: f(x) is indistinguishable from f(x_min) whenever |f(x) − f(x_min)| < ε_mach |f(x_min)|
Stopping Criterion for Golden Section
- Q: When is (x_right − x_left) small enough that the discrepancy between x and x_min is limited by rounding error in f(x_min)?
- Use a Taylor series, knowing that f′(x_min) ≈ 0:

  f(x) ≈ f(x_min) + f′(x_min)(x − x_min) + ½ f″(x_min)(x − x_min)²
       ≈ f(x_min) + ½ f″(x_min)(x − x_min)²

- So the condition |f(x) − f(x_min)| < ε_mach |f(x_min)| holds whenever

  |x − x_min| < √ε_mach · √(2 f(x_min) / f″(x_min))
Implications
- Rule of thumb: pointless to ask for more accuracy than √ε_mach
- Q: what happens to the number of accurate digits in the result when you switch from single precision (~7 digits) to double (~16 digits) for x, f(x)?
- A: You gain only ~4 more accurate digits.
Faster 1-D Optimization
- Trade off super-linear convergence for worse robustness
  - Combine with golden section search for safety
- Usual bag of tricks:
  - Fit a parabola through 3 points, find its minimum (see the sketch below)
  - Compute derivatives as well as positions, fit a cubic
  - Use second derivatives: Newton's method
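For the parabola trick, a sketch of the interpolation step: given three bracketing points, jump to the vertex of the parabola through them. A robust routine (e.g., Brent's method) would also guard against a degenerate, nearly collinear fit and fall back to golden section:

```python
def parabolic_step(a, b, c, fa, fb, fc):
    """Abscissa of the vertex of the parabola through (a,fa), (b,fb), (c,fc)."""
    p = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    q = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * p / q   # assumes q != 0, i.e. the points are not collinear
```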
Newton's Method
- At each step: x_{k+1} = x_k − f′(x_k) / f″(x_k)
- Requires 1st and 2nd derivatives
- Quadratic convergence
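A sketch of the 1-D Newton iteration, assuming callables for the first and second derivatives (names and defaults are illustrative):

```python
def newton_min_1d(df, d2f, x0, tol=1e-10, max_iter=50):
    """Minimize by Newton's method: x_{k+1} = x_k - f'(x_k) / f''(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:     # converged: Newton step negligible
            break
    return x
```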
Questions?
Multidimensional Optimization
Multi-Dimensional Optimization
- Important in many areas
  - Finding the best design in some parameter space
  - Fitting a model to measured data
- Hard in general
  - Multiple extrema, saddles, curved/elongated valleys, etc.
  - Can't bracket (but there are trust region methods)
- In general, easier than root finding
  - Can always walk downhill
  - Minimizing one scalar function, not simultaneously satisfying multiple functions
Problem with Saddle
Newton's Method in Multiple Dimensions
- Replace the 1st derivative with the gradient, the 2nd derivative with the Hessian:

  ∇f = (∂f/∂x, ∂f/∂y)

  H(x, y) = [ ∂²f/∂x²    ∂²f/∂x∂y ]
            [ ∂²f/∂x∂y   ∂²f/∂y²  ]
Newton's Method in Multiple Dimensions
- In 1 dimension: x_{k+1} = x_k − f′(x_k) / f″(x_k)
- Replacing the 1st derivative with the gradient and the 2nd derivative with the Hessian gives:

  x_{k+1} = x_k − H⁻¹(x_k) ∇f(x_k)

- Can be fragile unless the function is smooth and the start is close to the minimum
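A NumPy sketch of the multi-dimensional Newton step, assuming callables that return the gradient (n-vector) and Hessian (n×n matrix); solving the linear system H·step = ∇f is cheaper and more stable than forming H⁻¹ explicitly:

```python
import numpy as np

def newton_min(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton's method in n dimensions: x_{k+1} = x_k - H^{-1}(x_k) grad f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess(x), grad(x))    # solve H step = grad f
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x
```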
Other Methods
- What if you can't / don't want to use the 2nd derivative?
- Quasi-Newton methods estimate the Hessian
- Alternative: walk along the (negative of the) gradient
  - Perform a 1-D minimization along the line passing through the current point in the direction of the gradient
  - Once done, re-compute the gradient and iterate (see the sketch below)
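A sketch of steepest descent along these lines; for brevity, the 1-D line minimization is delegated to SciPy's scalar minimizer (Brent's method):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, tol=1e-8, max_iter=1000):
    """Repeatedly minimize f along the downhill direction -grad f."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:     # gradient ~ 0: done
            break
        # 1-D minimization of f along the line through x in direction -g
        t = minimize_scalar(lambda t: f(x - t * g)).x
        x = x - t * g
    return x
```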
Steepest Descent
Problem With Steepest Descent
Conjugate Gradient Methods
- Idea: avoid undoing minimization that's already been done
- Walk along direction d_{k+1} = −g_{k+1} + β_k d_k, where g is the gradient
- Polak and Ribière formula:

  β_k = g_{k+1}ᵀ (g_{k+1} − g_k) / (g_kᵀ g_k)
Conjugate Gradient Methods
- Conjugate gradient implicitly obtains information about the Hessian
- For a quadratic function in n dimensions, it gets the exact solution in n steps (ignoring roundoff error)
- Works well in practice
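A sketch of nonlinear conjugate gradient with the Polak-Ribière β; the clamp of β at 0 (the common "PR+" restart safeguard) is an addition, not from the slides:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_gradient(f, grad, x0, tol=1e-8, max_iter=1000):
    """Nonlinear CG: d_{k+1} = -g_{k+1} + beta_k d_k, Polak-Ribiere beta."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        t = minimize_scalar(lambda t: f(x + t * d)).x   # line minimization
        x = x + t * d
        g_new = grad(x)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))  # Polak-Ribiere (PR+)
        d = -g_new + beta * d
        g = g_new
    return x
```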
Value-Only Methods in Multi-Dimensions
- If you can't evaluate gradients, life is hard
- Can use approximate (numerically evaluated) gradients:

  ∇f(x) ≈ ( (f(x + δe₁) − f(x)) / δ, (f(x + δe₂) − f(x)) / δ, (f(x + δe₃) − f(x)) / δ, … )
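A sketch of this one-sided finite-difference gradient; δ is a tuning knob (too small and roundoff dominates, too large and truncation error dominates):

```python
import numpy as np

def fd_gradient(f, x, delta=1e-6):
    """Approximate grad f(x) by one-sided finite differences."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = delta                    # unit vector e_i, scaled by delta
        g[i] = (f(x + e) - fx) / delta
    return g
```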
Generic Optimization Strategies
- Uniform sampling:
  - Cost rises exponentially with the number of dimensions
- Heuristic: compass search
  - Try a step along each coordinate in turn
  - If you can't find a lower value, halve the step size
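A value-only sketch of compass search as just described (the initial step and caps are illustrative):

```python
import numpy as np

def compass_search(f, x0, step=0.5, min_step=1e-8, max_iter=10000):
    """Try +/- step along each coordinate in turn; halve step on failure."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        improved = False
        for i in range(x.size):
            for s in (step, -step):
                trial = x.copy()
                trial[i] += s
                ft = f(trial)
                if ft < fx:             # keep any move that lowers the value
                    x, fx, improved = trial, ft, True
        if not improved:
            step *= 0.5                 # no lower value found: halve step size
            if step < min_step:
                break
    return x
```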
Generic Optimization Strategies
- Simulated annealing:
  - Maintain a "temperature" T
  - Pick a random direction d, and try a step whose size depends on T
  - If the value is lower than the current one, accept
  - If the value is higher, accept with probability ~ exp((f(x_current) − f(x_new)) / T)
  - Annealing schedule: how fast does T decrease?
- Slow but robust: can avoid non-global minima
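A 1-D simulated annealing sketch following this acceptance rule; the initial temperature, geometric cooling schedule, and Gaussian step scaling are illustrative choices:

```python
import math
import random

def simulated_annealing(f, x0, T0=1.0, cooling=0.999, n_steps=100_000):
    """Random-step search that accepts uphill moves with probability exp(-df/T)."""
    x, fx, T = x0, f(x0), T0
    best, fbest = x, fx
    for _ in range(n_steps):
        x_new = x + random.gauss(0.0, T)    # step size depends on temperature
        f_new = f(x_new)
        # always accept downhill; uphill with probability exp((fx - f_new) / T)
        if f_new < fx or random.random() < math.exp((fx - f_new) / T):
            x, fx = x_new, f_new
            if fx < fbest:
                best, fbest = x, fx
        T *= cooling                        # annealing schedule
    return best
```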
Downhill Simplex Method (Nelder-Mead)
- Keep track of n+1 points in n dimensions
  - Vertices of a simplex (triangle in 2D, tetrahedron in 3D, etc.)
- At each iteration: the simplex can move, expand, or contract
  - Sometimes known as the "amoeba method": the simplex oozes along the function
Downhill Simplex Method (Nelder-Mead)
- Basic operation: reflection
[Figure: worst point (highest function value); location probed by reflection step]
Downhill Simplex Method (Nelder-Mead)
- If the reflection resulted in the best (lowest) value so far, try an expansion
[Figure: location probed by expansion step]
- Else, if the reflection helped at all, keep it
Downhill Simplex Method (Nelder-Mead)
- If the reflection didn't help (reflected point still worst), try a contraction
[Figure: location probed by contraction step]
Downhill Simplex Method (Nelder-Mead)
- If all else fails: shrink the simplex around the best point
Downhill Simplex Method (Nelder-Mead)
- The method is fairly efficient at each iteration (typically 1–2 function evaluations)
- Can take lots of iterations
- Somewhat flaky: sometimes needs a restart after the simplex collapses on itself, etc.
- Benefits: simple to implement, doesn't need derivatives, doesn't care about function smoothness, etc.
Rosenbrock's Function

  f(x, y) = 100 (y − x²)² + (1 − x)²

- Designed specifically for testing optimization techniques
- Curved, narrow valley
Demo
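A minimal demo sketch (an assumption about what such a demo might look like, not the course's actual demo): minimizing Rosenbrock's function with SciPy's Nelder-Mead implementation from the classic starting point (−1.2, 1):

```python
from scipy.optimize import minimize

def rosenbrock(p):
    x, y = p
    return 100.0 * (y - x**2) ** 2 + (1.0 - x) ** 2

result = minimize(rosenbrock, x0=[-1.2, 1.0], method="Nelder-Mead")
print(result.x)     # converges to the true minimum at (1, 1)
```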
Global Optimization
- In general, you can't guarantee that you've found the global (rather than a local) minimum
- Some heuristics:
  - Multi-start: try local optimization from several starting positions (see the sketch below)
  - Very slow simulated annealing
  - Use analytical methods (or graphing) to determine behavior, and guide methods to the correct neighborhoods
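A sketch of the multi-start heuristic: keep the best of several local (here Nelder-Mead) runs from random starting points inside a box; all names and defaults are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def multi_start(f, bounds, n_starts=20, seed=0):
    """Run local optimization from several random starts; return the best result."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T      # bounds: [(lo, hi), ...]
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(lo, hi)                    # random start inside the box
        res = minimize(f, x0, method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    return best
```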
Software notes
Software
- Matlab:
  - fminbnd: for a function of 1 variable with bound constraints
    - Based on golden section & parabolic interpolation
    - f(x) doesn't need to be defined at the endpoints
  - fminsearch: simplex method (i.e., no derivative needed)
  - Optimization Toolbox (available free @ Princeton)
  - meshgrid, surf for visualization
- Excel: Solver