Multivariate Newton Minimization
Optimization of biosurfactant synthesis
Rhamnolipids
Rhamnolipids are naturally occurring glycolipids produced commercially by the bacterium Pseudomonas aeruginosa. Applications: they promote the uptake and biodegradation of poorly soluble substrates, serve as immune modulators and virulence factors, act as antimicrobials, take part in surface motility, and are involved in biofilm development.
Rhamnolipid kinetics
A 2^2 factorial experimental design was used: x_1 — glycerol concentration, x_2 — ratio of sugarcane bagasse to sunflower seeds.
y = 46.25 - 2.35x_1 + 6.18x_2 - 15.8x_1^2 - 14.92x_2^2 - 9.74x_1x_2
Surfaces (hypersurfaces) can have a much more complex topology.
Optimisation: the process of finding the maximum (minimum) value of a given function in a specific region (subject to constraints). Two classes: unconstrained and constrained.
Finding a root of a nonlinear equation — Newton-Raphson method
The tangent at x_1 has slope f'(x_1) and crosses the x-axis at x_2: f'(x_1) = [f(x_1) - 0]/(x_1 - x_2), so x_2 = x_1 - f(x_1)/f'(x_1).
Repeating at x_2: f'(x_2) = [f(x_2) - 0]/(x_2 - x_3), so x_3 = x_2 - f(x_2)/f'(x_2).
General expression: x_{i+1} = x_i - f(x_i)/f'(x_i)
When do we stop? When f(x) = 0 exactly, or when the change between iterates is small. Relative change: err = |x_{i+1} - x_i| / |x_{i+1}| · 100%
Example: f(x) = x^2 - 1, f'(x) = 2x. Stopping criterion: relative change < 30%. Start at x_1 = 4.
i = 1: f(4) = 15, f'(4) = 8 (tangent y = 8x - 17), so x_2 = 4 - 15/8 = 2.125. Relative change |2.125 - 4|/2.125 · 100% ≈ 88% — continue.
i = 2: f(2.125) = 3.52, f'(2.125) = 4.25, so x_3 = 2.125 - 3.52/4.25 ≈ 1.30. Relative change |1.30 - 2.125|/1.30 · 100% ≈ 64% — continue.
i = 3: f(1.30) = 0.69, f'(1.30) = 2.60, so x_4 = 1.30 - 0.69/2.60 ≈ 1.03. Relative change |1.03 - 1.30|/1.03 · 100% ≈ 26% < 30% — finish, with f(1.03) ≈ 0.06 (the true root is x = 1).
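The iteration above can be sketched in a few lines of Python (a minimal illustration: the function and the loose 30% stopping threshold come from the slides' example, the helper name is ours):

```python
# Newton-Raphson iteration with the relative-change stopping criterion,
# applied to the slides' example f(x) = x^2 - 1 started from x = 4.
def newton_raphson(f, df, x, tol_percent=30.0, max_iter=50):
    for _ in range(max_iter):
        x_new = x - f(x) / df(x)
        err = abs(x_new - x) / abs(x_new) * 100.0  # relative change in %
        x = x_new
        if err < tol_percent:
            break
    return x

root = newton_raphson(lambda x: x**2 - 1, lambda x: 2*x, 4.0)
print(root)  # ~1.034 with the loose 30% criterion; a tight tolerance gives 1.0
```

A tighter tolerance (e.g. `tol_percent=1e-8`) drives the iterate to the exact root x = 1.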
What about minima (maxima)? We have a procedure to find a zero of a function: x_{i+1} = x_i - f(x_i)/f'(x_i). A function f has a minimum (maximum) where f'(x) = 0, so we look for a zero of g(x) = f'(x):
x_{i+1} = x_i - g(x_i)/g'(x_i) = x_i - f'(x_i)/f''(x_i)
What about the multidimensional problem? To explore the topology of a multidimensional surface we again use the Taylor expansion series. The Taylor expansion describes the surroundings of a point (f(x + Δx)) using ONLY LOCAL information about the function: f(x) — its value; f'(x) — the rate of change of f at x; f''(x) — its curvature at x; f'''(x) — the rate of change of curvature at x; and so on. Importantly, all the derivatives are computed only at the point x.
Accuracy of numerical derivatives — truncation error, resulting from the finite truncation of the Taylor expansion:
f(x + Δx) = f(x) + (df/dx)Δx + (1/2)(d^2f/dx^2)Δx^2 + …
df/dx = [f(x + Δx) - f(x)]/Δx - (1/2)(d^2f/dx^2)Δx - …
df/dx ≈ [f(x + Δx) - f(x)]/Δx
Truncation error, resulting from the finite truncation of the Taylor expansion: ε_T = (1/2)(d^2f/dx^2)Δx ~ Δx
Accuracy of numerical derivatives — round-off error, resulting from the finite representation of numbers (limited number of significant figures). If each stored function value carries an error up to ε:
df/dx = [(f(x + Δx) + ε) - (f(x) + ε)]/Δx - (1/2)(d^2f/dx^2)Δx - …
df/dx ≈ [f(x + Δx) - f(x)]/Δx ± 2ε/Δx
Round-off error, resulting from the finite representation of numbers: ε_R = 2ε/Δx ~ 1/Δx
Total error: ε_total = 2ε/Δx + (1/2)(d^2f/dx^2)Δx
Examples of truncation and round-off errors
f(x) = x^3 + √x at the point x = 3. The true derivative value at x = 3 is f'(3) = 27.2886751. With Δx = 0.01:
Forward difference: df/dx ≈ [f(3.01) - f(3)]/Δx = 27.3785, ε_T = 0.0899
Central difference: df/dx ≈ [f(3.01) - f(2.99)]/(2Δx) = 27.28878, ε_T = 0.000105
Example of round-off error: f(x) = x^3 + √x at x = 3, Δx = 0.01, with function values rounded to three decimal places: f(3.01) = 29.0058362 ≈ 29.006, f(3) = 28.7320508 ≈ 28.732.
df/dx ≈ (29.006 - 28.732)/0.01 = 27.4, ε_total = 0.1113
Total error as a function of Δx (forward difference, rounded values):
Δx        ε_total
1.0       9.98
0.1       0.911
0.01      0.1113
0.001     0.2887
0.0001    2.7113
0.00001   27.728
The error first falls as Δx shrinks (truncation error dominates), then rises again (round-off dominates); the best accuracy here is near Δx = 0.01.
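The truncation/round-off trade-off in the table above can be reproduced numerically. A minimal sketch, assuming the round-off is modelled by rounding every stored function value to three decimals, as on the slides:

```python
# Forward-difference derivative of f(x) = x^3 + sqrt(x) at x = 3, with
# function values rounded to 3 decimals to mimic limited significant figures.
# Truncation error dominates for large dx, round-off error for small dx.
import math

def f_rounded(x):
    return round(x**3 + math.sqrt(x), 3)  # only 3 decimals survive storage

true_deriv = 3 * 3**2 + 1 / (2 * math.sqrt(3))  # f'(3) = 27.2886751...

errors = {}
for dx in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    approx = (f_rounded(3 + dx) - f_rounded(3)) / dx
    errors[dx] = abs(approx - true_deriv)

for dx, e in errors.items():
    print(f"{dx:>8} {e:.4f}")
# The smallest total error occurs at the intermediate step dx = 0.01.
```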
Taylor's expansion in 2D
f(x_{i+1}, y_{i+1}) = f(x_i, y_i) + (∂f/∂x)Δx + (∂f/∂y)Δy
or, in vector form:
f(x_{i+1}, y_{i+1}) = f(x_i, y_i) + [∂f/∂x, ∂f/∂y] [Δx; Δy]
Taylor's expansion in 2D of two functions
f_1(x_{i+1}, y_{i+1}) = f_1(x_i, y_i) + (∂f_1/∂x)Δx + (∂f_1/∂y)Δy
f_2(x_{i+1}, y_{i+1}) = f_2(x_i, y_i) + (∂f_2/∂x)Δx + (∂f_2/∂y)Δy
Stacking both rows:
[f_1(x_{i+1}, y_{i+1}); f_2(x_{i+1}, y_{i+1})] = [f_1(x_i, y_i); f_2(x_i, y_i)] + [∂f_1/∂x, ∂f_1/∂y; ∂f_2/∂x, ∂f_2/∂y] [Δx; Δy]
i.e. in matrix form, with the Jacobian J:
f(x_{i+1}) = f(x_i) + J_i Δx
Multivariate Taylor's expansion
Multivariate vector function:
f(x + Δx) = f(x) + J(x)Δx + (1/2)Δx^T H Δx + …
e.g. the gravitational force F(r) = -G·Mm/r^3 · r.
Multivariate scalar function:
f(x + Δx) = f(x) + ∇^T f(x)Δx + (1/2)Δx^T H Δx + …
e.g. energy, cost.
Jacobian: J(x) = [∂f_1/∂x_1 … ∂f_1/∂x_n; … ; ∂f_n/∂x_1 … ∂f_n/∂x_n]
Gradient: ∇f(x) = [∂f/∂x_1, …, ∂f/∂x_n]^T
Hessian: H(x) = [∂^2f/∂x_1^2 … ∂^2f/∂x_1∂x_n; … ; ∂^2f/∂x_n∂x_1 … ∂^2f/∂x_n^2]
Multivariate Taylor's expansion — example
f: R^2 → R^1, e.g. f(x, y) = x^2 + y^2, x = [x; y].
Gradient: ∇f(x) = [∂f/∂x; ∂f/∂y] = [2x; 2y]
Hessian: H(x) = [∂^2f/∂x^2, ∂^2f/∂x∂y; ∂^2f/∂x∂y, ∂^2f/∂y^2] = [2, 0; 0, 2]
Expansion:
f(x + Δx) = f(x) + [2x, 2y][Δx; Δy] + (1/2)[Δx, Δy][2, 0; 0, 2][Δx; Δy] + …
          = f(x) + 2xΔx + 2yΔy + Δx^2 + Δy^2
This is exact here, since all higher derivatives vanish; in particular, around the origin f(Δx) = f(0,0) + Δx^2 + Δy^2 = Δx^2 + Δy^2.
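The example above is easy to verify numerically. A short check (the evaluation points are arbitrary illustrative choices):

```python
# Numerical check of the 2D Taylor expansion of f(x, y) = x^2 + y^2:
# the second-order expansion with gradient [2x, 2y] and Hessian 2*I is exact,
# because all third and higher derivatives vanish.
import numpy as np

def f(v):
    return v[0]**2 + v[1]**2

def grad(v):
    return np.array([2*v[0], 2*v[1]])

H = np.array([[2.0, 0.0], [0.0, 2.0]])

x = np.array([1.5, -0.7])
dx = np.array([0.3, 0.8])

taylor2 = f(x) + grad(x) @ dx + 0.5 * dx @ H @ dx
print(taylor2, f(x + dx))  # identical up to floating-point round-off
```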
Multivariate Taylor's expansion — gradient part
f(x + Δx) = f(x) + ∇^T f(x)Δx + (1/2)Δx^T H Δx + …
∇^T f(x) = [∂f/∂x_1, …, ∂f/∂x_n], x = [x_1; …; x_n]
∇^T f(x)·Δx = (∂f/∂x_1)Δx_1 + (∂f/∂x_2)Δx_2 + … + (∂f/∂x_n)Δx_n = Σ_i (∂f/∂x_i)Δx_i — the same pattern as the dot product y^T x = Σ_i y_i x_i.
The first-order part f(x) + ∇^T f(x)Δx is the plane tangent to the function at the point x (compare the plane equation ax_1 + bx_2). The gradient gives information about the rate of change of f in each direction (x, y).
Gradient
∇^T f(x) = [∂f/∂x_1, …, ∂f/∂x_n]
(Figure: gradient vectors ∇f(x) drawn at several points of the surface, pointing in the direction of steepest ascent; -∇f(x) points downhill.)
The gradient is a vector perpendicular to the function's isoline
On an isoline f(x) = c, parametrised as x(t), we have f(x(t)) = c, so
0 = dc/dt = df(x(t))/dt = (∂f/∂x_1)(dx_1/dt) + (∂f/∂x_2)(dx_2/dt) + … + (∂f/∂x_n)(dx_n/dt) = Σ_i (∂f/∂x_i)(dx_i/dt) = ∇^T f(x) · dx/dt
The gradient is therefore orthogonal to dx/dt, the tangent of the isoline.
Multivariate Taylor's expansion — Hessian part
f(x + Δx) = f(x) + ∇^T f(x)Δx + (1/2)Δx^T H Δx + …
H(x) = [∂^2f/∂x_1^2 … ∂^2f/∂x_1∂x_n; … ; ∂^2f/∂x_n∂x_1 … ∂^2f/∂x_n^2], x = [x_1; …; x_n], x^T = [x_1, …, x_n]
We need to evaluate the quadratic form x^T H x.
Step by step:
H x = [H_11 … H_1n; … ; H_n1 … H_nn][x_1; …; x_n] = [H_11 x_1 + H_12 x_2 + … + H_1n x_n; … ; H_n1 x_1 + H_n2 x_2 + … + H_nn x_n] = [Σ_j H_1j x_j; … ; Σ_j H_nj x_j]
x^T H x = [x_1, …, x_n] [Σ_j H_1j x_j; … ; Σ_j H_nj x_j] = Σ_j (H_1j x_j x_1 + H_2j x_j x_2 + … + H_nj x_j x_n) = Σ_i Σ_j H_ij x_i x_j
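The identity x^T H x = Σ_i Σ_j H_ij x_i x_j can be sanity-checked on a random symmetric matrix (the size and random seed are arbitrary):

```python
# Check the expansion x^T H x = sum_{i,j} H_ij x_i x_j on a random symmetric H.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
H = (A + A.T) / 2          # symmetrize
x = rng.normal(size=4)

quad_matrix = x @ H @ x
quad_sums = sum(H[i, j] * x[i] * x[j] for i in range(4) for j in range(4))
print(quad_matrix, quad_sums)  # equal up to floating-point round-off
```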
The Hessian contains all the information about the shape of the function around a minimum:
f(x + Δx) = f(x) + ∇^T f(x)Δx + (1/2)Δx^T H Δx + …
At a minimum ∇f(x) = 0, so
f(x + Δx) = f(x) + (1/2)Δx^T H Δx + …
Recall the quadratic function f(x) = ax^2:
a > 0 (f''(x) > 0): parabola opens upwards — minimum;
a = 0 (f''(x) = 0): flat line;
a < 0 (f''(x) < 0): parabola opens downwards — maximum.
Here f''(x) = 2a plays the role of a one-dimensional Hessian.
For f(x, y) = ax^2 + by^2:
H = [∂^2f/∂x^2, ∂^2f/∂x∂y; ∂^2f/∂x∂y, ∂^2f/∂y^2] = [2a, 0; 0, 2b]
It defines the shape of the quadratic "skeleton" of the resulting surface.
How to detect a minimum (unconstrained)?
1. Necessary condition for an unconstrained optimum at a point x*: ∇f(x*) = 0 and f(x) is differentiable at x*.
2. Sufficient condition for an unconstrained optimum at x*: ∇f(x*) = 0, f(x) is twice differentiable at x*, and ∇^2 f(x*) is positive definite.
Positive and negative definite matrices
If for every nonzero vector x a symmetric matrix H satisfies:
x^T H x > 0, then H is positive definite;
x^T H x < 0, then H is negative definite.
This is a very cumbersome definition — one would need to check every vector x.
A symmetric matrix H is positive definite if:
1. all its eigenvalues are positive; equivalently,
2. the determinants of all its leading principal minors are positive.
A symmetric matrix H is negative definite if:
1. all its eigenvalues are negative; equivalently,
2. after reversing the sign of every element, the determinants of all leading principal minors are positive.
Example: H = [2, 4; 4, 2].
First leading minor: det[2] = 2 > 0. Second: det H = 2·2 - 4·4 = 4 - 16 = -12 < 0.
The minors are not all positive, so H is not positive definite (its eigenvalues are 6 and -2: the matrix is indefinite).
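The eigenvalue criterion is the easiest to apply in code. A short check for the example matrix, with a small helper (our own naming) that classifies all the cases listed on the following slides:

```python
# Definiteness check for the example H = [[2, 4], [4, 2]] using the
# eigenvalue criterion (numpy.linalg.eigvalsh is for symmetric matrices).
import numpy as np

H = np.array([[2.0, 4.0], [4.0, 2.0]])
eigs = np.linalg.eigvalsh(H)
print(eigs)  # [-2.  6.] -> mixed signs: H is indefinite

def definiteness(H):
    lam = np.linalg.eigvalsh(H)
    if np.all(lam > 0):
        return "positive definite"       # minimum
    if np.all(lam < 0):
        return "negative definite"       # maximum
    if np.all(lam >= 0):
        return "positive semi-definite"  # valley
    if np.all(lam <= 0):
        return "negative semi-definite"  # ridge
    return "indefinite"                  # saddle point

print(definiteness(H))  # indefinite
```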
Nature of stationary points:
Hessian H positive definite: quadratic form y^T H y > 0, eigenvalues λ_i > 0. (Writing H = M^T M: y^T H y = y^T M^T M y = (My)^T(My) = ||My||^2 ≥ 0.) Local nature: minimum.
H negative definite: y^T H y < 0, λ_i < 0. Local nature: maximum.
H indefinite: y^T H y ≷ 0, eigenvalues of mixed signs. Local nature: saddle point.
H positive semi-definite: y^T H y ≥ 0, λ_i ≥ 0, H singular! Local nature: valley.
H negative semi-definite: y^T H y ≤ 0, λ_i ≤ 0, H singular! Local nature: ridge.
Stationary point nature — summary
y^T H y   λ_i     Definiteness of H        Nature of x*
> 0       > 0     positive definite        minimum
≥ 0       ≥ 0     positive semi-definite   valley
≷ 0       mixed   indefinite               saddle point
≤ 0       ≤ 0     negative semi-definite   ridge
< 0       < 0     negative definite        maximum
Newton method for optimization, 1D
Recall the Newton method for finding a root of a function f (where is f(x) = 0?):
x_{i+1} = x_i - f(x_i)/f'(x_i)
A function f has a minimum (maximum) where f'(x) = 0. Let g(x) = f'(x); then
x_{i+1} = x_i - g(x_i)/g'(x_i) = x_i - f'(x_i)/f''(x_i)
Newton method in 1D — a different perspective: Taylor expansion
The Newton method is in fact again an application of the Taylor expansion. It answers the question: how far should I jump to reach the minimum, i.e. the point where f'(x_{i+1}) = 0? Expand the function around the initial point x_i:
f(x_i + Δx) = f(x_i) + f'(x_i)Δx + (1/2)f''(x_i)Δx^2 + …
We want to move by a Δx that reaches the minimum, i.e. df(x_i + Δx)/dΔx = 0:
0 = d/dΔx [f(x_i) + f'(x_i)Δx + (1/2)f''(x_i)Δx^2 + …] = f'(x_i) + f''(x_i)Δx + …
For computational simplicity we keep only the first term containing Δx (note that it comes from the quadratic term of the expansion):
0 = f'(x_i) + f''(x_i)Δx
Since Δx = x_{i+1} - x_i:
0 = f'(x_i) + f''(x_i)(x_{i+1} - x_i)  ⇒  x_{i+1} = x_i - f'(x_i)/f''(x_i)
In the Newton method we travel along a parabola. Why? Because the parabola is the lowest-degree polynomial that can have a minimum. (Figure: the quadratic model built at x_i; its minimum is taken as the next iterate x_{i+1} = x_i + Δx.)
For a second-degree polynomial it is a one-step method. For f(x) = x^2 - 1:
x_{i+1} = x_i - f'(x_i)/f''(x_i) = x_i - 2x_i/2 = x_i - x_i = 0
The minimum at x = 0 is found in a single step from any starting point.
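A small sketch of the 1D Newton minimization step, contrasting the quadratic one-step case with a non-quadratic function (f(x) = x^4 is our own illustrative choice, not from the slides):

```python
# Newton iteration for minimization, x_{i+1} = x_i - f'(x_i)/f''(x_i).
# For the quadratic f(x) = x^2 - 1 it lands on the minimum x = 0 in one step;
# for the quartic f(x) = x^4 each step only multiplies x by 2/3.
def newton_minimize(df, d2f, x, steps):
    for _ in range(steps):
        x = x - df(x) / d2f(x)
    return x

# f(x) = x^2 - 1: f'(x) = 2x, f''(x) = 2
one_step = newton_minimize(lambda x: 2*x, lambda x: 2.0, 4.0, steps=1)
print(one_step)  # 0.0 -- one step suffices for a quadratic

# f(x) = x^4: f'(x) = 4x^3, f''(x) = 12x^2  ->  x_{i+1} = (2/3) x_i
several = newton_minimize(lambda x: 4*x**3, lambda x: 12*x**2, 3.0, steps=20)
print(several)  # small, but reached only geometrically
```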
Multivariate Newton method for optimization
Again: how far should we jump to reach the minimum, i.e. the point where ∇f(x_{i+1}) = 0? Expand:
f(x_{i+1}) = f(x_i) + ∇^T f(x_i)(x_{i+1} - x_i) + (1/2)(x_{i+1} - x_i)^T H (x_{i+1} - x_i)
Differentiate with respect to x_{i+1}, using the matrix-calculus identities
f(x) = b  ⇒  ∇f = 0,  f(x) = b^T x  ⇒  ∇f = b,  f(x) = x^T A x  ⇒  ∇f = (A^T + A)x:
∇f(x_{i+1}) = ∇f(x_i) + (1/2)H^T(x_{i+1} - x_i) + (1/2)H(x_{i+1} - x_i)
Since H^T = H:
∇f(x_{i+1}) = ∇f(x_i) + H(x_{i+1} - x_i)
We look for x where ∇f(x) = 0, so:
0 = ∇f(x_{i+1}) = ∇f(x_i) + H(x_{i+1} - x_i)
-∇f(x_i) = H(x_{i+1} - x_i)
x_{i+1} = x_i - H^{-1}(x_i) ∇f(x_i)
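The multivariate Newton step can be sketched as follows, using the quadratic f(x, y) = x^2 + y^2 from the earlier example (the starting point is an arbitrary choice; in practice one solves the linear system rather than forming the inverse):

```python
# One multivariate Newton step, x_{i+1} = x_i - H^{-1} grad f(x_i), applied
# to the quadratic f(x, y) = x^2 + y^2; for a quadratic a single step
# reaches the minimum at (0, 0) exactly.
import numpy as np

def grad(v):
    return np.array([2*v[0], 2*v[1]])

def hess(v):
    return np.array([[2.0, 0.0], [0.0, 2.0]])

def newton_step(x):
    # solve H d = grad f instead of explicitly inverting the Hessian
    return x - np.linalg.solve(hess(x), grad(x))

x0 = np.array([3.0, -1.5])
x1 = newton_step(x0)
print(x1)  # [0. 0.]
```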
Multivariate optimisation — Newton method
Pros: converges fast (especially for quadratic functions); uses both information about the shape (gradient) and the curvature (Hessian).
Cons: high computational cost (Hessian and its inverse); the Hessian may be singular; sensitive to computational errors.
Steepest descent
A simple idea: let's move in the direction of steepest descent, i.e. along r = -∇f(x). Starting from x_0, we move along a straight line (a vector):
x_1 = x_0 - α∇f(x_0) = x_0 + α r_0
How large should α be? We go until we reach the minimum of f along the r_0 direction; that minimiser over α defines the final point x_1.
Directional derivative
We go until we reach the minimum along the chosen direction, so we need the derivative of f over α along the line x(α):
df(x(α))/dα = (∂f/∂x_1)(dx_1/dα) + (∂f/∂x_2)(dx_2/dα) + … + (∂f/∂x_n)(dx_n/dα) = Σ_i (∂f/∂x_i)(dx_i/dα) = ∇^T f · dx/dα
Steepest descent — where to stop along the line
With x_1 = x_0 + α r_0 and r_0 = -∇f(x_0):
df(x_1)/dα = ∇^T f(x_1) · d(x_0 + α r_0)/dα = ∇^T f(x_1) r_0
Setting df(x_1)/dα = 0 gives ∇^T f(x_1) r_0 = 0, i.e. r_1^T r_0 = 0.
We go until the new gradient is perpendicular to the gradient at the initial point — and then start again from x_1.
How to find it computationally? We need r_1^T r_0 = 0. From the quadratic model:
∇f(x_1) = ∇f(x_0) + H(x_1 - x_0) = ∇f(x_0) + α H r_0
so r_1 = -∇f(x_1) = r_0 - α H r_0. Then:
r_1^T r_0 = (r_0 - α H r_0)^T r_0 = r_0^T r_0 - α r_0^T H^T r_0 = 0
With H^T = H:
α = (r_0^T r_0)/(r_0^T H r_0)
Steepest descent routine — example: f(x, y) = x^2 + y^2
1. Choose an initial point, say x_0 = (2, 2)^T.
2. Choose an accuracy ε, say 10^-6.
3. Compute the descent direction at this point: r_0 = -∇f(x_0) = -(2x, 2y)^T = -(4, 4)^T.
4. Compute the optimal α along r_0:
   - Hessian at x_0: H = [2, 0; 0, 2],
   - r_0^T r_0 = (-4)^2 + (-4)^2 = 32,
   - r_0^T H r_0 = 2·32 = 64,
   - α = (r_0^T r_0)/(r_0^T H r_0) = 32/64 = 1/2.
5. Compute the next point: x_1 = x_0 + α r_0 = (2, 2)^T - (1/2)(4, 4)^T = (0, 0)^T.
6. Compute f(x_1) = f(0, 0) = 0. If |f(x_1)| ≤ ε, finish; otherwise go to step 3.
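The routine above can be sketched in a few lines (a minimal illustration; the gradient-norm stopping test replaces step 6, since for a general function the minimum value is not known to be 0):

```python
# Steepest descent with the exact line-search step alpha = r'r / (r'Hr),
# applied to f(x, y) = x^2 + y^2: it reaches the minimum (0, 0) in a single
# step from (2, 2), as in the worked example.
import numpy as np

H = np.array([[2.0, 0.0], [0.0, 2.0]])

def grad(v):
    return H @ v  # gradient of x^2 + y^2 is (2x, 2y)

def steepest_descent(x, eps=1e-6, max_iter=100):
    for _ in range(max_iter):
        r = -grad(x)
        if np.linalg.norm(r) <= eps:
            break
        alpha = (r @ r) / (r @ H @ r)  # exact minimiser along the line
        x = x + alpha * r
    return x

x_min = steepest_descent(np.array([2.0, 2.0]))
print(x_min)  # [0. 0.]
```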
Steepest descent method
Pros: always goes downhill; always converges; simple implementation.
Cons: slow on eccentric (ill-conditioned) functions.
Steepest descent on eccentric functions — theorem
Define the error function of the objective at the current value x as:
E(x) = (1/2)(x - x*)^T H (x - x*)
Then at every step k:
E(x_{k+1}) ≤ ((A - a)/(A + a))^2 E(x_k)
where A is the largest and a the smallest eigenvalue of H.
Steepest descent — eccentric function example
For the function f(x) = x^2 + y^2: H = [2, 0; 0, 2], A = 2, a = 2, so E(x_{k+1}) ≤ (0/4)^2 E(x_k) = 0 — a direct (one-step) method.
For the function f(x) = 50x^2 + y^2: H = [100, 0; 0, 2], A = 100, a = 2, so E(x_{k+1}) ≤ (98/102)^2 E(x_k) ≈ 0.92 E(x_k) — slow convergence.
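The contraction bound can be checked numerically on the eccentric case (the starting point and the number of steps are arbitrary illustrative choices):

```python
# Numerical check of E(x_{k+1}) <= ((A-a)/(A+a))^2 E(x_k) for steepest
# descent with exact line search on f(x, y) = 50x^2 + y^2.
import numpy as np

H = np.array([[100.0, 0.0], [0.0, 2.0]])   # Hessian of 50x^2 + y^2
bound = ((100 - 2) / (100 + 2))**2         # (98/102)^2 ~ 0.923

def E(v):
    return 0.5 * v @ H @ v   # error function; the minimum is x* = 0

x = np.array([1.0, 1.0])
ratios = []
for _ in range(20):
    r = -H @ x                             # steepest-descent direction
    alpha = (r @ r) / (r @ H @ r)          # exact line search
    x_new = x + alpha * r
    ratios.append(E(x_new) / E(x))
    x = x_new

print(max(ratios) <= bound)  # True: every step obeys the theorem's bound
```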
Solution? Combined methods.
Recall that a function is approximately quadratic around its minimum:
f(x + Δx) = f(x) + f'(x)Δx + (1/2)f''(x)Δx^2 + …
If f has a minimum at x, then f'(x) = 0, and so
f(x + Δx) ≈ f(x) + aΔx^2, with a = (1/2)f''(x)
Around a minimum the Newton method should therefore work really well. Combined methods (so-called quasi-Newton methods) start like steepest descent and transform into the Newton method once they reach the near-minimum region.
Improvements
Computing the Hessian is very costly, not to mention its inverse. Methods that approximate the curvature information instead:
BFGS (Broyden-Fletcher-Goldfarb-Shanno),
conjugate gradients,
DFP (Davidon-Fletcher-Powell).
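A minimal sketch of the first of these ideas: BFGS builds an approximation of the inverse Hessian from successive gradient differences instead of computing H. The backtracking line search and the test function f(x, y) = 50x^2 + y^2 are our own illustrative choices; production code would use a library routine (e.g. SciPy's BFGS implementation) rather than this hand-rolled update:

```python
# Sketch of the BFGS quasi-Newton method: no Hessian is ever computed;
# B approximates H^{-1} via the rank-two BFGS update from (s, y) pairs.
import numpy as np

def f(v):
    return 50 * v[0]**2 + v[1]**2

def grad(v):
    return np.array([100 * v[0], 2 * v[1]])

def bfgs(x, tol=1e-6, max_iter=200):
    n = len(x)
    B = np.eye(n)                      # initial inverse-Hessian estimate
    g = grad(x)
    I = np.eye(n)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -B @ g                     # quasi-Newton direction
        t = 1.0                        # backtracking (Armijo) line search
        while f(x + t * d) > f(x) + 1e-4 * t * (g @ d):
            t *= 0.5
        s = t * d
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        rho = 1.0 / (y @ s)            # y's = s'Hs > 0 for convex f
        # BFGS inverse-Hessian update
        B = (I - rho * np.outer(s, y)) @ B @ (I - rho * np.outer(y, s)) \
            + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

print(bfgs(np.array([2.0, 2.0])))  # close to the minimum [0, 0]
```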