Computational Optimization
Mathematical Programming Fundamentals 1/5 (revised)
"If you don't know where you are going, you probably won't get there." - from some book I read in eighth grade
"If you do get there, you won't know it." - Dr. Bennett's amendment
Mathematical Programming Theory tells us:
- How to formulate a model.
- Strategies for solving the model.
- How to know when we have found an optimal solution.
- How hard it is to solve the model.
Let's start with the basics.
Line Segment
Let x ∈ Rⁿ and y ∈ Rⁿ. The points on the line segment joining x and y are
{ z : z = λx + (1−λ)y, 0 ≤ λ ≤ 1 }.
Convex Sets
A set S is convex if the line segment joining any two points in the set is also in the set, i.e., for any x, y ∈ S, λx + (1−λ)y ∈ S for all 0 ≤ λ ≤ 1.
(figure: examples of convex and nonconvex sets)
Favorite Convex Sets
- Circle (ball) with center c and radius r: { x : ‖x − c‖ ≤ r }
- Linear equalities (planes): { x : Ax = b }, where A ∈ R^(m×n), b ∈ Rᵐ, x ∈ Rⁿ
- Linear inequalities (polyhedra): { x : Ax ≤ b }, where A ∈ R^(m×n), b ∈ Rᵐ, x ∈ Rⁿ
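As a sketch of how these sets are used computationally, the two membership tests below check whether a point lies in a ball ‖x − c‖ ≤ r or a polyhedron Ax ≤ b. The function names and the unit-box example are illustrative, not from the slides.

```python
import numpy as np

def in_ball(x, c, r):
    """True if x lies in the ball { x : ||x - c|| <= r }."""
    return np.linalg.norm(np.asarray(x) - np.asarray(c)) <= r

def in_polyhedron(x, A, b):
    """True if x satisfies Ax <= b componentwise."""
    return bool(np.all(np.asarray(A) @ np.asarray(x) <= np.asarray(b)))

# The unit box { x : 0 <= x_i <= 1 } written in the form Ax <= b
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 0.0, 0.0])
inside = in_polyhedron([0.5, 0.5], A, b)
outside = in_polyhedron([1.5, 0.5], A, b)
```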
Convex Sets
Is the intersection of two convex sets convex? Yes.
Is the union of two convex sets convex? No (e.g., the union of two disjoint intervals is not convex).
Convex Functions
A function f is (strictly) convex on a convex set S if and only if for any x, y ∈ S,
f(λx + (1−λ)y) ≤ (<) λf(x) + (1−λ)f(y) for all 0 ≤ λ ≤ 1
(with 0 < λ < 1 and x ≠ y in the strict case).
(figure: the chord λf(x) + (1−λ)f(y) lies above the graph between x and y)
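The defining inequality can be checked numerically by sampling λ on [0, 1]. This is a minimal sketch (a sampled check, not a proof of convexity); the helper name is my own.

```python
import numpy as np

def convexity_gap(f, x, y, lam):
    """Chord value minus function value at the blend point:
    lam*f(x) + (1-lam)*f(y) - f(lam*x + (1-lam)*y).
    Nonnegative for every x, y, lam when f is convex."""
    return lam * f(x) + (1 - lam) * f(y) - f(lam * x + (1 - lam) * y)

f = lambda x: x**2      # convex
g = lambda x: -x**2     # concave
gaps_f = [convexity_gap(f, -1.0, 2.0, lam) for lam in np.linspace(0, 1, 11)]
gaps_g = [convexity_gap(g, -1.0, 2.0, lam) for lam in np.linspace(0, 1, 11)]
```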
Concave Functions
A function f is (strictly) concave on a convex set S if and only if −f is (strictly) convex on S.
(figure: f and −f)
(Strictly) Convex, Concave, or None of the Above?
(figure: five plotted functions; answers in order: none of the above, concave, convex, concave, strictly convex)
Favorite Convex Functions
- Linear functions: f(x) = w′x = Σᵢ wᵢxᵢ, where x ∈ Rⁿ; e.g., f(x1, x2) = x1 + x2
- Certain quadratic functions: f(x) = x′Qx + w′x + c, depending on the choice of Q (the Hessian matrix); e.g., f(x1, x2) = x1² + x2²
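Whether a quadratic x′Qx + w′x + c is convex is decided by Q alone: it is convex exactly when the symmetric part of Q is positive semidefinite. A small sketch of that eigenvalue test (the function name is my own):

```python
import numpy as np

def is_convex_quadratic(Q):
    """x'Qx + w'x + c is convex iff the symmetrized Q is positive
    semidefinite, i.e., all eigenvalues are >= 0."""
    Qs = (np.asarray(Q) + np.asarray(Q).T) / 2
    return bool(np.all(np.linalg.eigvalsh(Qs) >= -1e-12))

Q_convex = np.array([[1.0, 0.0], [0.0, 1.0]])    # gives x1^2 + x2^2: convex
Q_saddle = np.array([[1.0, 0.0], [0.0, -1.0]])   # indefinite: a saddle, not convex
```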
Convexity of the objective function affects the optimization algorithm.
Convexity of the constraints affects the optimization algorithm.
min f(x) subject to x ∈ S
(figure: direction of steepest descent within a convex S versus a nonconvex S)
Convex Program
min f(x) subject to x ∈ S, where f and S are convex
- Convexity makes optimization nice.
- Many practical problems are convex programs.
- Convex programs are used as subproblems for nonconvex programs.
Theorem: Global Solution of a Convex Program
If x* is a local minimizer of a convex programming problem, then x* is also a global minimizer. Furthermore, if the objective is strictly convex, then x* is the unique global minimizer.
Proof: by contradiction. (figure: x* and a point y with f(y) < f(x*))
Proof by Contradiction
Suppose x* is a local but not a global minimizer, i.e., there exists y such that f(y) < f(x*). Then for all 0 < ε < 1,
f(εx* + (1−ε)y) ≤ εf(x*) + (1−ε)f(y) < εf(x*) + (1−ε)f(x*) = f(x*).
Taking ε close to 1 gives points arbitrarily near x* with smaller objective value, contradicting that x* is a local minimizer. You try the uniqueness argument in the strict case.
Problems with a Nonconvex Objective
min f(x) subject to x ∈ [a, b]
- If f is strictly convex, the problem has a unique global minimum x*.
- If f is not convex, the problem may have several local minima.
(figure: a strictly convex f with one minimizer x*, and a nonconvex f with two local minima)
Problems with a Nonconvex Set
min f(x) subject to x ∈ [a, b] ∪ [c, d]
(figure: a local minimum can occur in each interval; the global minimizer x* lies in [c, d])
Multivariate Calculus
For x ∈ Rⁿ, f(x) = f(x1, x2, x3, x4, …, xn).
The gradient of f:
∇f(x) = ( ∂f(x)/∂x1, ∂f(x)/∂x2, …, ∂f(x)/∂xn )
The Hessian of f:
∇²f(x) =
[ ∂²f(x)/∂x1²     ∂²f(x)/∂x1∂x2   …   ∂²f(x)/∂x1∂xn ]
[ ⋮                               ⋱   ⋮              ]
[ ∂²f(x)/∂xn∂x1   ∂²f(x)/∂xn∂x2   …   ∂²f(x)/∂xn²    ]
For example
f(x) = x1⁴ + x2³ + e^(3x1) + 4x1x2
∇f(x) = [ 4x1³ + 3e^(3x1) + 4x2,  3x2² + 4x1 ]′
∇²f(x) =
[ 12x1² + 9e^(3x1)   4   ]
[ 4                  6x2 ]
At x = [0, 1]′:
∇f(x) = [7, 3]′
∇²f(x) =
[ 9   4 ]
[ 4   6 ]
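A quick way to catch errors in hand-computed gradients is a central-difference check. The sketch below encodes this slide's example (as reconstructed above) and compares the analytic gradient at x = [0, 1]′ against finite differences; `fd_grad` is my own helper name.

```python
import numpy as np

def f(x):
    x1, x2 = x
    return x1**4 + x2**3 + np.exp(3 * x1) + 4 * x1 * x2

def grad(x):
    # Analytic gradient of f above
    x1, x2 = x
    return np.array([4 * x1**3 + 3 * np.exp(3 * x1) + 4 * x2,
                     3 * x2**2 + 4 * x1])

def fd_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x0 = np.array([0.0, 1.0])
analytic = grad(x0)        # [7, 3] for this example
numeric = fd_grad(f, x0)
```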
Quadratic Functions
Form: for x ∈ Rⁿ, Q ∈ R^(n×n), b ∈ Rⁿ,
f(x) = (1/2) x′Qx − b′x = (1/2) Σᵢ Σⱼ Qᵢⱼ xᵢxⱼ − Σⱼ bⱼxⱼ
Gradient (k-th component):
∂f(x)/∂xₖ = Qₖₖxₖ + (1/2) Σ_{i≠k} Qᵢₖxᵢ + (1/2) Σ_{j≠k} Qₖⱼxⱼ − bₖ
          = Σⱼ Qₖⱼxⱼ − bₖ, assuming Q symmetric.
So ∇f(x) = Qx − b and ∇²f(x) = Q.
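The identity ∇f(x) = Qx − b for f(x) = (1/2)x′Qx − b′x (Q symmetric) can be verified numerically. A minimal sketch with an arbitrary symmetric Q of my choosing:

```python
import numpy as np

def f(x, Q, b):
    """Quadratic f(x) = (1/2) x'Qx - b'x."""
    return 0.5 * x @ Q @ x - b @ x

def fd_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric
b = np.array([1.0, 2.0])
x0 = np.array([0.5, -1.0])
analytic = Q @ x0 - b                    # the slide's formula: grad f = Qx - b
numeric = fd_grad(lambda z: f(z, Q, b), x0)
```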
Taylor Series Expansion about x* (1-D Case)
Let x = x* + p. Then
f(x) = f(x*+p) = f(x*) + p f′(x*) + (1/2)p² f″(x*) + (1/3!)p³ f‴(x*) + … + (1/n!)pⁿ f⁽ⁿ⁾(x*) + …
Equivalently,
f(x) = f(x*) + (x−x*) f′(x*) + (1/2)(x−x*)² f″(x*) + (1/3!)(x−x*)³ f‴(x*) + … + (1/n!)(x−x*)ⁿ f⁽ⁿ⁾(x*) + …
Taylor Series Example
Let f(x) = exp(−x). Compute the Taylor series expansion about x* = 0:
f(x) = f(x*) + (x−x*) f′(x*) + (1/2)(x−x*)² f″(x*) + (1/3!)(x−x*)³ f‴(x*) + … + (1/n!)(x−x*)ⁿ f⁽ⁿ⁾(x*) + …
     = e^(−x*) − (x−x*)e^(−x*) + (1/2)(x−x*)²e^(−x*) − (1/3!)(x−x*)³e^(−x*) + … + (−1)ⁿ(1/n!)(x−x*)ⁿe^(−x*) + …
     = 1 − x + x²/2 − x³/3! + … + (−1)ⁿ xⁿ/n! + …
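The series above can be checked by summing its leading terms. A short sketch (the function name is my own) comparing a 10-term partial sum of 1 − x + x²/2! − … against exp(−x):

```python
import math

def exp_neg_taylor(x, n_terms):
    """Partial sum of the expansion of exp(-x) about x* = 0:
    sum over k of (-x)^k / k!."""
    return sum((-x)**k / math.factorial(k) for k in range(n_terms))

approx = exp_neg_taylor(0.5, 10)
exact = math.exp(-0.5)
```

Since the series alternates, the error is bounded by the first omitted term, here 0.5¹⁰/10!, which is far below 1e-8.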
First Order Taylor Series Approximation
Let x = x* + p. Then
f(x) = f(x*+p) = f(x*) + p′∇f(x*) + ‖p‖ α(x*, p), where lim_{p→0} α(x*, p) = 0.
Says that a linear approximation of a function works well locally:
f(x) ≈ f(x*) + (x−x*)′∇f(x*)
(figure: f and its tangent-line approximation at x*)
Second Order Taylor Series Approximation
Let x = x* + p. Then
f(x) = f(x*+p) = f(x*) + p′∇f(x*) + (1/2) p′∇²f(x*)p + ‖p‖² α(x*, p), where lim_{p→0} α(x*, p) = 0.
Says that a quadratic approximation of a function works even better locally:
f(x) ≈ f(x*) + (x−x*)′∇f(x*) + (1/2)(x−x*)′∇²f(x*)(x−x*)
(figure: f and its quadratic approximation at x*)
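To see "works even better locally" concretely, compare the first- and second-order approximation errors of f(x) = exp(−x) about x* = 0 at a nearby point (my choice of p = 0.1):

```python
import math

f = lambda x: math.exp(-x)
first = lambda p: 1.0 - p               # f(x*) + p f'(x*)
second = lambda p: 1.0 - p + p**2 / 2   # adds (1/2) p^2 f''(x*)

p = 0.1
err1 = abs(f(p) - first(p))    # first-order error, O(p^2)
err2 = abs(f(p) - second(p))   # second-order error, O(p^3)
```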
Theorem.1 Taylor s Theorem version Suppose f is cont diff, f ( x+ p) = f( x) + f( x+ tp)' p for some t [0,1]. If f is twice cont. diff, f ( x+ p) = f( x) + f( x)' p+ p' f( x+ tp)' p for some t [0,1]. 1 Also called Mean Value Theorem
Taylor Series Approximation Exercise
Consider the function f(x1, x2) = x1³ + 5x1²x2 + 7x1x2² + 2x2² and x* = [−2, 3]′.
- Compute the gradient and Hessian.
- What is the first order TSA about x*?
- What is the second order TSA about x*?
- Evaluate both TSAs at y = [−1.9, 3.2]′ and compare with f(y).
Exercise (worksheet)
f(x1, x2) = x1³ + 5x1²x2 + 7x1x2² + 2x2²   (function)
∇f(x) = ____   ∇f(x*) = [ __ , __ ]′   (gradient)
∇²f(x) = ____   ∇²f(x*) = ____   (Hessian)
First order TSA:  g(x) = f(x*) + (x−x*)′∇f(x*) = ____
Second order TSA: h(x) = f(x*) + (x−x*)′∇f(x*) + (1/2)(x−x*)′∇²f(x*)(x−x*) = ____
f(y) − g(y) = ____   f(y) − h(y) = ____
Exercise (solution)
f(x1, x2) = x1³ + 5x1²x2 + 7x1x2² + 2x2²   f(x*) = −56   (function)
∇f(x) = [ 3x1² + 10x1x2 + 7x2²,  5x1² + 14x1x2 + 4x2 ]′   ∇f(x*) = [15, −52]′   (gradient)
∇²f(x) =
[ 6x1 + 10x2    10x1 + 14x2 ]
[ 10x1 + 14x2   14x1 + 4    ]
∇²f(x*) =
[ 18    22 ]
[ 22   −24 ]
(Hessian)
Exercise (solution, continued)
First order TSA:  g(x) = f(x*) + (x−x*)′∇f(x*)
Second order TSA: h(x) = f(x*) + (x−x*)′∇f(x*) + (1/2)(x−x*)′∇²f(x*)(x−x*)
f(y) − g(y) = −64.811 − (−64.9) = 0.089
f(y) − h(y) = −64.811 − (−64.85) = 0.039
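The exercise's numbers can be reproduced in a few lines. The sketch below encodes the function, gradient, and Hessian from the solution slides and evaluates both TSAs at y = [−1.9, 3.2]′:

```python
import numpy as np

def f(x):
    x1, x2 = x
    return x1**3 + 5*x1**2*x2 + 7*x1*x2**2 + 2*x2**2

def grad(x):
    x1, x2 = x
    return np.array([3*x1**2 + 10*x1*x2 + 7*x2**2,
                     5*x1**2 + 14*x1*x2 + 4*x2])

def hess(x):
    x1, x2 = x
    return np.array([[6*x1 + 10*x2,  10*x1 + 14*x2],
                     [10*x1 + 14*x2, 14*x1 + 4]])

xs = np.array([-2.0, 3.0])   # x*
y = np.array([-1.9, 3.2])
d = y - xs
g_y = f(xs) + d @ grad(xs)               # first order TSA at y  -> -64.9
h_y = g_y + 0.5 * d @ hess(xs) @ d       # second order TSA at y -> -64.85
```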
General Optimization Algorithm
Specify some initial guess x⁰.
For k = 0, 1, …
- If xᵏ is optimal, then stop.
- Determine a descent direction pᵏ.
- Determine an improved estimate of the solution: x^(k+1) = xᵏ + λₖpᵏ.
The last step is a one-dimensional search problem called a line search.
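The loop above can be sketched in code. This is a minimal illustration, not the course's prescribed method: it takes pᵏ as the negative gradient and uses a simple backtracking line search (the 1e-4 sufficient-decrease constant and halving factor are my own choices), tested on a convex quadratic whose minimizer solves Qx = b.

```python
import numpy as np

def steepest_descent(f, grad, x0, tol=1e-8, max_iter=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # "if x_k is optimal then stop"
            break
        p = -g                        # descent direction
        lam = 1.0                     # backtracking line search
        while f(x + lam * p) > f(x) - 1e-4 * lam * (g @ g):
            lam *= 0.5
        x = x + lam * p               # x_{k+1} = x_k + lam_k p_k
    return x

Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b
x_star = steepest_descent(f, grad, [5.0, -5.0])
```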
Descent Directions
If the directional derivative in direction d is negative,
∇f(x)′d < 0,
then a line search along d will lead to a decrease in the function.
(figure: contour plot with the gradient and a descent direction d)
Descent Directions Create Decrease
Let d′∇f(x) < 0. Then there exists λ̄ > 0 such that f(x+λd) < f(x) for 0 < λ ≤ λ̄.
Proof:
f(x+λd) = f(x) + λd′∇f(x) + λ‖d‖ α(x, λd)
(f(x+λd) − f(x)) / λ = d′∇f(x) + ‖d‖ α(x, λd)
f(x+λd) − f(x) < 0 for λ sufficiently small, since d′∇f(x) < 0 and α(x, λd) → 0.
Negative Gradient
An important fact to know is that the negative gradient always points downhill.
Let d = −∇f(x) ≠ 0. Then there exists λ̄ > 0 such that f(x+λd) < f(x) for 0 < λ ≤ λ̄.
Proof (same steps as before):
f(x+λd) = f(x) + λd′∇f(x) + λ‖d‖ α(x, λd)
(f(x+λd) − f(x)) / λ = d′∇f(x) + ‖d‖ α(x, λd)
f(x+λd) − f(x) < 0 for λ sufficiently small, since d′∇f(x) < 0 and α(x, λd) → 0.
Notes on the Negative Gradient
If the gradient is nonzero, then the negative gradient defines a descent direction:
d′∇f(x) = −∇f(x)′∇f(x) < 0 if ∇f(x) ≠ 0, by substitution of d = −∇f(x).
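The substitution is easy to confirm numerically: for d = −∇f(x), the slope d′∇f(x) equals −‖∇f(x)‖² and is negative. A sketch using f(x) = x1² + 3x2² (my choice of example):

```python
import numpy as np

# Gradient of f(x) = x1^2 + 3 x2^2
grad = lambda x: np.array([2 * x[0], 6 * x[1]])

x = np.array([1.0, -2.0])
g = grad(x)
d = -g             # negative gradient direction
slope = d @ g      # equals -||grad f(x)||^2
```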
Directional Derivative
f′(x; d) = lim_{λ→0⁺} ( f(x+λd) − f(x) ) / λ = ∇f(x)′d
The directional derivative always exists when the function is convex.
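The limit definition can be illustrated by shrinking λ and watching the difference quotients approach ∇f(x)′d. A sketch with f(x) = x1² + 3x2² and a direction of my choosing:

```python
import numpy as np

f = lambda x: x[0]**2 + 3 * x[1]**2
grad = lambda x: np.array([2 * x[0], 6 * x[1]])

x = np.array([1.0, 1.0])
d = np.array([0.0, -1.0])
exact = grad(x) @ d   # directional derivative via grad f(x)'d
# Difference quotients (f(x + lam d) - f(x)) / lam for shrinking lam
quotients = [(f(x + lam * d) - f(x)) / lam for lam in (1e-1, 1e-3, 1e-5)]
```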
Assignment Read chapter 3 in NW