Multivariate Newton Minimization


Optimization of biosurfactant synthesis

Rhamnolipid. Rhamnolipids are naturally occurring glycolipids produced commercially by the Pseudomonas aeruginosa species of bacteria. Applications: they promote the uptake and biodegradation of poorly soluble substrates, they serve as immune modulators and virulence factors, they act as antimicrobials, they take part in surface motility, and they are used in biofilm development.

Rhamnolipid kinetics

A 2^2 factorial experimental design was used, with x_1 the glycerol concentration and x_2 the ratio of sugarcane bagasse to sunflower seeds.

Fitted model: y = 46.25 - 2.35 x_1 + 6.18 x_2 - 15.8 x_1^2 - 14.92 x_2^2 - 9.74 x_1 x_2

Surfaces (hypersurfaces) can have a much more complex topology.

Optimisation is the process of finding the maximum (minimum) value of a given function in a specific region (constraints). It comes in two flavours: unconstrained and constrained.

Finding a root of a nonlinear equation - the Newton-Raphson method. Draw the tangent to f at x_1; it crosses the x axis at x_2, so its slope satisfies
f'(x_1) = (f(x_1) - 0) / (x_1 - x_2),
which gives x_1 - x_2 = f(x_1)/f'(x_1) and therefore x_2 = x_1 - f(x_1)/f'(x_1).

Repeating the construction at x_2: f'(x_2) = (f(x_2) - 0)/(x_2 - x_3), so x_3 = x_2 - f(x_2)/f'(x_2).

General expression: f'(x_i) = (f(x_i) - 0)/(x_i - x_{i+1}), hence
x_{i+1} = x_i - f(x_i)/f'(x_i).

When do we stop? When f(x_i) = 0 or when the change is small, measured by the relative change
err = |(x_{i+1} - x_i)/x_{i+1}| * 100%.

Example: f(x) = x^2 - 1, so f'(x) = 2x. Stopping criterion: relative change below 30%. Start at x_1 = 4, where f(4) = 15 and the tangent slope is f'(4) = 2*4 = 8, and apply x_{i+1} = x_i - f(x_i)/f'(x_i).

i = 1: x_2 = x_1 - f(x_1)/f'(x_1) = 4 - (4^2 - 1)/(2*4) = 4 - 15/8 = 2.125. Relative change |x_2 - x_1|/|x_2| * 100% ≈ 88%, above 30%, so continue.

i = 2: f'(x_2) = 2*2.125 = 4.25, and x_3 = 2.125 - (2.125^2 - 1)/4.25 ≈ 1.30. Relative change ≈ 64%, still above 30%, so continue.

i = 3: f'(x_3) = 2*1.30 = 2.60, and x_4 = 1.30 - (1.30^2 - 1)/2.60 ≈ 1.03, where f(1.03) ≈ 0.07. Relative change ≈ 26% < 30%, so we finish: the iterates approach the root x = 1.
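A minimal sketch of the iteration above (assuming Python, no external libraries; the function, starting point and 30% relative-change test are taken from the example, the helper name newton_root is just illustrative):

```python
# Newton-Raphson root finding for f(x) = x^2 - 1, as in the worked example.
def newton_root(f, df, x, rel_tol_percent=30.0, max_iter=50):
    for _ in range(max_iter):
        x_new = x - f(x) / df(x)                        # tangent-line step
        rel_change = abs((x_new - x) / x_new) * 100.0   # relative change in %
        x = x_new
        if rel_change < rel_tol_percent:
            break
    return x

f  = lambda x: x**2 - 1.0
df = lambda x: 2.0 * x

print(newton_root(f, df, x=4.0))   # iterates 4 -> 2.125 -> ~1.30 -> ~1.03
```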

What about minima (maxima)? We have a procedure for finding a zero of a function f(x): x_{i+1} = x_i - f(x_i)/f'(x_i). A function f has a minimum (maximum) where f'(x) = 0, so we simply look for a zero of the function g(x) = f'(x), which gives x_{i+1} = x_i - g(x_i)/g'(x_i) = x_i - f'(x_i)/f''(x_i).

What about multidimensional problems? To explore the topology of a multidimensional surface we again use the Taylor expansion. The Taylor expansion describes the surroundings of a point (f(x + Δx)) using ONLY LOCAL information about the function: f(x), its value; f'(x), the rate of change of f at x; f''(x), its curvature at x; f'''(x), the rate of change of the curvature at x; and so on. What is important is that all the derivatives are computed only at the point x.

Accuracy of numerical derivatives - truncation error. Truncation error results from using only a finite part of the Taylor expansion:
f(x + Δx) = f(x) + (df/dx) Δx + (1/2)(d^2f/dx^2) Δx^2 + ...
so df/dx = [f(x + Δx) - f(x)]/Δx - (1/2)(d^2f/dx^2) Δx - ..., and the forward-difference approximation is df/dx ≈ [f(x + Δx) - f(x)]/Δx.

The truncation error therefore behaves as ε_T = (1/2)|d^2f/dx^2| Δx ~ Δx.

Accuracy of numerical derivatives - round-off error. Round-off error results from the finite representation of numbers (a limited number of significant figures). If each function value carries an error of order ε, then
df/dx ≈ [f(x + Δx) + ε - f(x) - ε]/Δx, so the difference quotient carries an additional error of up to 2ε/Δx.

The round-off error behaves as ε_R = 2ε/Δx ~ 1/Δx, and the total error is ε_total = 2ε/Δx + (1/2)|d^2f/dx^2| Δx.

Example of truncation and round-off errors: f(x) = x^3 + sqrt(x) at the point x = 3. The true derivative value at x = 3 is 27.2886751. With Δx = 0.01:
forward difference: df/dx ≈ [f(3.01) - f(3)]/0.01 ≈ 27.379, so ε_T ≈ 0.090;
central difference: df/dx ≈ [f(3.01) - f(2.99)]/0.02 = 27.28878, so ε_T ≈ 0.000105.

If the function values are additionally rounded to five significant figures, f(3.01) = 29.0058362 ≈ 29.006 and f(3) = 28.7320508 ≈ 28.732, the forward difference becomes (29.006 - 28.732)/0.01 = 27.4 and the total error is ε_total ≈ 0.1113.

Total error as a function of the step size (with the same rounded representation):
Δx = 1.0      ε_total = 9.98
Δx = 0.1      ε_total = 0.911
Δx = 0.01     ε_total = 0.1113
Δx = 0.001    ε_total = 0.2887
Δx = 0.0001   ε_total = 2.7113
Δx = 0.00001  ε_total = 27.728
The error first decreases with Δx (truncation dominates) and then grows again (round-off dominates).
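This behaviour can be reproduced with a short sketch (assuming Python; the function f(x) = x^3 + sqrt(x) and the point x = 3 come from the example above). Because the script works in full double precision rather than with the five-significant-figure values used in the table, the round-off growth only shows up at much smaller Δx:

```python
import math

# Forward-difference derivative of f(x) = x^3 + sqrt(x) at x = 3 for several
# step sizes: the error first shrinks with dx (truncation), then grows (round-off).
f = lambda x: x**3 + math.sqrt(x)
x0 = 3.0
exact = 3 * x0**2 + 0.5 / math.sqrt(x0)   # f'(x) = 3x^2 + 1/(2*sqrt(x))

for dx in [1.0, 0.1, 0.01, 1e-4, 1e-8, 1e-12]:
    approx = (f(x0 + dx) - f(x0)) / dx
    print(f"dx = {dx:8.0e}   error = {abs(approx - exact):.3e}")
```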

Taylor's expansion in 2D:
f(x_{i+1}, y_{i+1}) = f(x_i, y_i) + (∂f/∂x) Δx + (∂f/∂y) Δy,
which in matrix form reads f(x_{i+1}, y_{i+1}) = f(x_i, y_i) + [∂f/∂x, ∂f/∂y] [Δx, Δy]^T.

Taylor's expansion in 2D of two functions:
f_1(x_{i+1}, y_{i+1}) = f_1(x_i, y_i) + (∂f_1/∂x) Δx + (∂f_1/∂y) Δy
f_2(x_{i+1}, y_{i+1}) = f_2(x_i, y_i) + (∂f_2/∂x) Δx + (∂f_2/∂y) Δy
Stacking the two rows of partial derivatives into a matrix gives the compact vector form
f(x_{i+1}) = f(x_i) + J_i Δx,
where J_i is the Jacobian matrix evaluated at x_i.

Multivariate Taylor's expansion. For a multivariate vector function (e.g. the gravitational force F(r) = -G M m r / r^3):
f(x + Δx) = f(x) + J(x) Δx + (1/2) Δx^T H Δx + ...
For a multivariate scalar function (e.g. an energy or a cost):
f(x + Δx) = f(x) + ∇^T f(x) Δx + (1/2) Δx^T H Δx + ...
Here J(x) is the Jacobian matrix with entries ∂f_i/∂x_j (one row per component of f, one column per variable), ∇f(x) is the gradient vector with entries ∂f/∂x_i, and H(x) is the Hessian matrix with entries ∂^2 f/∂x_i ∂x_j.

Multivariate Taylor's expansion - example. Take f: R^2 -> R^1, e.g. f(x, y) = x^2 + y^2, with x = (x, y)^T.
Gradient: ∇f(x) = [∂f/∂x, ∂f/∂y]^T = [2x, 2y]^T.
Hessian: H(x) = [[∂^2f/∂x^2, ∂^2f/∂x∂y], [∂^2f/∂x∂y, ∂^2f/∂y^2]] = [[2, 0], [0, 2]].
The expansion
f(x + Δx) = f(x) + [∂f/∂x, ∂f/∂y] [Δx, Δy]^T + (1/2) [Δx, Δy] H [Δx, Δy]^T + ...
therefore becomes
f(x + Δx) = f(x) + 2xΔx + 2yΔy + Δx^2 + Δy^2,
and expanding around x = (0, 0) gives f(Δx) = f(0, 0) + Δx^2 + Δy^2 = Δx^2 + Δy^2: for this quadratic function the second-order expansion is exact.
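A quick numerical check of the statement above (a sketch assuming Python with NumPy): for f(x, y) = x^2 + y^2 the gradient/Hessian expansion reproduces f(x + Δx) exactly.

```python
import numpy as np

# f(x, y) = x^2 + y^2: the second-order Taylor expansion is exact.
def f(v):
    return v[0]**2 + v[1]**2

x = np.array([1.0, -2.0])                      # expansion point
dx = np.array([0.3, 0.7])                      # displacement

grad = np.array([2 * x[0], 2 * x[1]])          # [2x, 2y]
H = np.array([[2.0, 0.0], [0.0, 2.0]])         # constant Hessian

taylor = f(x) + grad @ dx + 0.5 * dx @ H @ dx
print(f(x + dx), taylor)                       # identical values
```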

Multivariate Taylor's expansion - the gradient part. In
f(x + Δx) = f(x) + ∇^T f(x) Δx + (1/2) Δx^T H Δx + ...
the gradient term is the scalar product
∇^T f(x) Δx = (∂f/∂x_1) Δx_1 + (∂f/∂x_2) Δx_2 + ... + (∂f/∂x_n) Δx_n = Σ_{i=1..n} (∂f/∂x_i) Δx_i,
i.e. the general pattern y^T x = Σ_{i=1..n} y_i x_i. Geometrically, f(x) + ∇^T f(x) Δx is the plane (a x_1 + b x_2 + ...) tangential to the function at the point x. The gradient gives information about the rate of change of f in each direction (x, y).

Gradient: ∇^T f(x) = [∂f/∂x_1, ..., ∂f/∂x_n]. (Contour-plot illustrations: at each point the gradient ∇f(x) points in the direction of steepest increase of f, and -∇f(x) in the direction of steepest decrease.)

The gradient is a vector perpendicular to the function isolines. Indeed, along an isoline f(x(t)) = c, so
0 = dc/dt = df(x(t))/dt = (∂f/∂x_1) dx_1/dt + (∂f/∂x_2) dx_2/dt + ... + (∂f/∂x_n) dx_n/dt = ∇^T f(x) dx/dt,
i.e. the gradient is orthogonal to the tangent vector dx/dt of the isoline.

Multivariate Taylor's expansion - the Hessian part. In
f(x + Δx) = f(x) + ∇^T f(x) Δx + (1/2) Δx^T H Δx + ...
the Hessian H(x) is the n x n matrix of second derivatives ∂^2 f/∂x_i ∂x_j. Writing H with entries H_ij and the displacement as x = (x_1, ..., x_n)^T, the product H x is the vector whose i-th component is
(H x)_i = H_i1 x_1 + H_i2 x_2 + ... + H_in x_n = Σ_{j=1..n} H_ij x_j,
so the quadratic form is
x^T H x = Σ_{i=1..n} x_i (H x)_i = Σ_{i=1..n} Σ_{j=1..n} H_ij x_i x_j.
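The double-sum form of the quadratic term can be checked directly (a sketch assuming Python with NumPy; the matrix and vector values are arbitrary illustrations, not from the slides):

```python
import numpy as np

# x^T H x computed as a matrix product and as the explicit double sum
# sum_i sum_j H_ij x_i x_j derived above.
H = np.array([[2.0, 1.0], [1.0, 3.0]])
x = np.array([0.5, -1.5])

matrix_form = x @ H @ x
double_sum = sum(H[i, j] * x[i] * x[j]
                 for i in range(len(x)) for j in range(len(x)))
print(matrix_form, double_sum)    # same number
```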

The Hessian contains all the information about the shape of the function around a minimum: at a stationary point ∇f(x) = 0, so
f(x + Δx) = f(x) + ∇^T f(x) Δx + (1/2) Δx^T H Δx + ... reduces to f(x + Δx) ≈ f(x) + (1/2) Δx^T H Δx.

Recall the quadratic function f(x) = a x^2: for a > 0 (f''(x) = 2a > 0) it has a minimum, for a = 0 it is flat, and for a < 0 (f''(x) = 2a < 0) it has a maximum; here f''(x) plays the role of a one-dimensional Hessian. In two dimensions, for f(x, y) = a x^2 + b y^2 the Hessian
H = [[∂^2f/∂x^2, ∂^2f/∂x∂y], [∂^2f/∂x∂y, ∂^2f/∂y^2]] = [[2a, 0], [0, 2b]]
defines the shape of the skeleton (quadratic) approximation of the resulting surface.

How to detect a minimum (unconstrained)? 1. Necessary condition for an unconstrained optimum at the point x*: ∇f(x*) = 0 and f(x) is differentiable at x*. 2. Sufficient condition for an unconstrained optimum at the point x*: ∇f(x*) = 0, f(x) is (twice) differentiable at x*, and ∇^2 f(x*) is positive definite.

Positive and negative definite matrices. If for every non-zero vector x a symmetric matrix H satisfies x^T H x > 0, then H is positive definite; if x^T H x < 0, then H is negative definite. This is a very cumbersome definition - one would need to check every vector x.

In practice, a symmetric matrix H is positive definite if all of its eigenvalues are positive, or equivalently if the determinants of all of its leading principal minors are positive. A symmetric matrix H is negative definite if all of its eigenvalues are negative, or equivalently if, after reversing the sign of every element, the determinants of all leading principal minors are positive.

Example: H = [[2, 4], [4, 2]]. The leading principal minors are det[2] = 2 > 0 and det H = 2*2 - 4*4 = 4 - 16 = -12 < 0, so H is not positive definite.
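Both tests are easy to automate; a sketch assuming Python with NumPy, applied to the example matrix above:

```python
import numpy as np

# Check positive definiteness of the example matrix two ways:
# all eigenvalues positive, and all leading principal minors positive.
H = np.array([[2.0, 4.0], [4.0, 2.0]])

eigenvalues = np.linalg.eigvalsh(H)                       # [-2., 6.]
minors = [np.linalg.det(H[:k, :k]) for k in range(1, 3)]  # [2., -12.]

print(eigenvalues, minors)
print("positive definite?", np.all(eigenvalues > 0))      # False
```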

Nature of stationary points. Hessian H positive definite: quadratic form y^T H y > 0, all eigenvalues λ_i > 0 (writing H = M^T M, y^T H y = (My)^T (My) = |My|^2 > 0); local nature: minimum. Hessian H negative definite: y^T H y < 0, all λ_i < 0; local nature: maximum. Hessian H indefinite: y^T H y changes sign, eigenvalues of both signs; local nature: saddle point. Hessian H positive semi-definite: y^T H y ≥ 0, λ_i ≥ 0 with at least one zero eigenvalue, H singular; local nature: valley. Hessian H negative semi-definite: y^T H y ≤ 0, λ_i ≤ 0 with at least one zero eigenvalue, H singular; local nature: ridge.

Stationary point nature summary:
y^T H y > 0,   λ_i > 0     - positive definite       - minimum
y^T H y ≥ 0,   λ_i ≥ 0     - positive semi-definite  - valley
y^T H y <> 0,  λ_i mixed   - indefinite              - saddle point
y^T H y ≤ 0,   λ_i ≤ 0     - negative semi-definite  - ridge
y^T H y < 0,   λ_i < 0     - negative definite       - maximum

Newton method for optimization in 1D. Recall the Newton method for finding a root of a function f (solving f(x) = 0): x_{i+1} = x_i - f(x_i)/f'(x_i). A function f has a minimum (maximum) where f'(x) = 0, so let g(x) = f'(x) and apply the same iteration to g: x_{i+1} = x_i - g(x_i)/g'(x_i) = x_i - f'(x_i)/f''(x_i).

Newton method in 1D from a different perspective - Taylor expansion. The Newton method is in fact again an application of the Taylor expansion. It answers the question: how far should I jump to reach the minimum, namely a point where f'(x_{i+1}) = 0? That question can be answered by expanding the function at the initial point x_i:
f(x_i + Δx) = f(x_i) + f'(x_i) Δx + (1/2) f''(x_i) Δx^2 + ...
Now we want to move by Δx so that we reach the minimum, namely d f(x_i + Δx)/dΔx = 0. So
0 = d/dΔx [ f(x_i) + f'(x_i) Δx + (1/2) f''(x_i) Δx^2 + ... ] = f'(x_i) + f''(x_i) Δx + ...
We keep (for computational simplicity) only the first term containing Δx; recall from the previous equation that it comes from the quadratic term. Then 0 = f'(x_i) + f''(x_i) Δx, and because Δx = x_{i+1} - x_i,
0 = f'(x_i) + f''(x_i)(x_{i+1} - x_i), hence x_{i+1} = x_i - f'(x_i)/f''(x_i).

In the Newton method we travel along a parabola. Why? Because the parabola is the lowest-degree polynomial that has a minimum: at each step the function is replaced by the quadratic model f(x_i) + f'(x_i) Δx + (1/2) f''(x_i) Δx^2, and we jump to the minimum of that model, x_{i+1} = x_i + Δx.

For a second-degree polynomial it is a one-step method. For f(x) = x^2 - 1 we have f'(x) = 2x and f''(x) = 2, so
x_{i+1} = x_i - f'(x_i)/f''(x_i) = x_i - 2x_i/2 = x_i - x_i = 0:
whatever the starting point, the minimum x = 0 is reached in a single step.
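A sketch (assuming Python; the helper name newton_min_1d is illustrative) of the 1D Newton minimisation step x_{i+1} = x_i - f'(x_i)/f''(x_i); for the quadratic f(x) = x^2 - 1 it lands on the minimum x = 0 in one step from any starting point:

```python
# One-dimensional Newton minimisation using first and second derivatives.
def newton_min_1d(df, d2f, x, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# f(x) = x^2 - 1: f'(x) = 2x, f''(x) = 2  ->  x_1 = x_0 - 2*x_0/2 = 0, one step.
print(newton_min_1d(lambda x: 2 * x, lambda x: 2.0, x=4.0))
```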

Multivariate Newton method for optimization. Again: how far should we jump to reach the minimum, namely a point where ∇f(x_{i+1}) = 0? Expand around x_i:
f(x_{i+1}) = f(x_i) + ∇^T f(x_i)(x_{i+1} - x_i) + (1/2)(x_{i+1} - x_i)^T H (x_{i+1} - x_i).
Using the matrix-calculus rules ∇_x(b) = 0, ∇_x(b^T x) = b and ∇_x(x^T A x) = (A^T + A) x, the gradient of the right-hand side is
∇f(x_{i+1}) = ∇f(x_i) + (1/2)(H^T + H)(x_{i+1} - x_i) = ∇f(x_i) + H (x_{i+1} - x_i), since H^T = H.

We look for x where ∇f(x) = 0, so
0 = ∇f(x_{i+1}) = ∇f(x_i) + H (x_{i+1} - x_i),
hence H (x_{i+1} - x_i) = -∇f(x_i) and
x_{i+1} = x_i - H^{-1}(x_i) ∇f(x_i).
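A sketch of the multivariate update x_{i+1} = x_i - H^{-1} ∇f(x_i), assuming Python with NumPy; note that in practice the linear system H Δx = -∇f is solved instead of forming the inverse explicitly.

```python
import numpy as np

# Multivariate Newton iteration: solve H * dx = -grad instead of inverting H.
def newton_minimize(grad, hess, x, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        dx = np.linalg.solve(hess(x), -g)
        x = x + dx
    return x

# Example: f(x, y) = x^2 + y^2, minimum at (0, 0) reached in one step.
grad = lambda v: np.array([2 * v[0], 2 * v[1]])
hess = lambda v: np.array([[2.0, 0.0], [0.0, 2.0]])
print(newton_minimize(grad, hess, np.array([2.0, 2.0])))
```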

Multivariate optimisation - Newton method. Pros: converges fast (especially for quadratic functions); uses both the information about the slope (gradient) and the curvature (Hessian). Cons: high computational cost (Hessian and its inverse); the Hessian may be singular; computational errors.

Steepest descent. Simple idea: let us move in the direction of steepest descent, namely along -∇f, so the direction at the starting point x_0 is r_0 = -∇f(x_0). We move along a straight line (vector) to the next point: x_1 = x_0 - α ∇f(x_0) = x_0 + α r_0. The question is how far should we go, i.e. how large should α be? We go until we reach the minimum of f along the r_0 direction, i.e. the minimum of f(x_1(α)) as a function of α.

Directional derivative. We go until we reach the minimum along the x_1 direction, so we need the derivative with respect to α along that line:
df(x(α))/dα = (∂f/∂x_1) dx_1/dα + (∂f/∂x_2) dx_2/dα + ... + (∂f/∂x_n) dx_n/dα = Σ_{i=1..n} (∂f/∂x_i) dx_i/dα = ∇^T f dx/dα.

Steepest descent - choosing the step. With x_1 = x_0 + α r_0 and r_0 = -∇f(x_0),
df(x_1)/dα = ∇^T f(x_1) dx_1/dα = ∇^T f(x_1) d/dα (x_0 + α r_0) = ∇^T f(x_1) r_0.
Setting df(x_1)/dα = 0 gives ∇^T f(x_1) r_0 = 0, i.e. r_1^T r_0 = 0: we go until the new gradient (and hence the new direction r_1 = -∇f(x_1)) is perpendicular to the previous direction, and then we start again from x_1.

How to find α computationally? Use the condition r_1^T r_0 = 0 together with the quadratic expansion
f(x_1) = f(x_0) + ∇^T f(x_0)(x_1 - x_0) + (1/2)(x_1 - x_0)^T H (x_1 - x_0),
whose gradient is ∇f(x_1) = ∇f(x_0) + H (x_1 - x_0) = ∇f(x_0) + α H r_0. The orthogonality condition then reads
(∇f(x_0) + α H r_0)^T r_0 = 0, i.e. ∇^T f(x_0) r_0 + α r_0^T H r_0 = 0,
and since ∇f(x_0) = -r_0,
α = (r_0^T r_0) / (r_0^T H r_0).

Steepest descent routine, e.g. for f(x) = x^2 + y^2:
1. Choose an initial point, say x_0 = (2, 2)^T, and an accuracy ε, say 10^-6.
2. Compute the gradient at this point: ∇f(x_0) = (2x, 2y)^T = (4, 4)^T, so r_0 = -∇f(x_0) = (-4, -4)^T.
3. Compute the optimal α along r_0: the Hessian at x_0 is H = [[2, 0], [0, 2]]; r_0^T r_0 = 16 + 16 = 32; r_0^T H r_0 = 64; α = r_0^T r_0 / (r_0^T H r_0) = 32/64 = 1/2.
4. Compute the next point: x_1 = x_0 + α r_0 = (2, 2)^T + (1/2)(-4, -4)^T = (0, 0)^T.
5. Compute |∇f(x_1)| = 0. If |∇f(x_1)| ≤ ε, finish; else go back to step 2.
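The routine above written out as a sketch (assuming Python with NumPy) for a quadratic f with constant Hessian H, using the exact step α = r^T r / (r^T H r):

```python
import numpy as np

# Steepest descent with the exact line search alpha = r^T r / (r^T H r),
# valid for a quadratic f whose Hessian H is constant (here f = x^2 + y^2).
def steepest_descent(grad, H, x, eps=1e-6, max_iter=1000):
    for _ in range(max_iter):
        r = -grad(x)                    # descent direction
        if np.linalg.norm(r) < eps:
            break
        alpha = (r @ r) / (r @ H @ r)   # optimal step along r
        x = x + alpha * r
    return x

grad = lambda v: np.array([2 * v[0], 2 * v[1]])
H = np.array([[2.0, 0.0], [0.0, 2.0]])
print(steepest_descent(grad, H, np.array([2.0, 2.0])))   # [0. 0.] in one step
```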

Steepest descent method. Pros: always goes downhill; always converges; simple implementation. Cons: slow on eccentric (elongated) functions.

Steepest descent on eccentric functions - a convergence theorem. Define the error function at the current point x as E(x) = (1/2)(x - x*)^T H (x - x*). Then at every step k
E(x_{k+1}) ≤ ((A - a)/(A + a))^2 E(x_k),
where A is the largest and a the smallest eigenvalue of H.

Steepest descent - eccentric function example. For the function f(x) = x^2 + y^2, H = [[2, 0], [0, 2]], so A = a = 2 and E(x_{k+1}) ≤ 0 * E(x_k) = 0: the method converges in one step (a direct method). For the function f(x) = 50x^2 + y^2, H = [[100, 0], [0, 2]], so A = 100, a = 2 and E(x_{k+1}) ≤ (98/102)^2 E(x_k): slow convergence.
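The effect of eccentricity can be seen by counting iterations of the same exact-line-search routine on the two functions above (a sketch assuming Python with NumPy; the exact counts depend on the starting point and tolerance):

```python
import numpy as np

# Count steepest-descent iterations (exact line search) on a round and an
# eccentric quadratic f(x) = 0.5 * x^T H x, starting from the same point.
def iterations(H, x, eps=1e-6, max_iter=10000):
    for k in range(max_iter):
        r = -H @ x                      # -grad of 0.5 * x^T H x
        if np.linalg.norm(r) < eps:
            return k
        x = x + ((r @ r) / (r @ H @ r)) * r
    return max_iter

x0 = np.array([1.0, 1.0])
print(iterations(np.diag([2.0, 2.0]), x0))      # f = x^2 + y^2: one step
print(iterations(np.diag([100.0, 2.0]), x0))    # f = 50x^2 + y^2: many steps
```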

Solution? Combined methods. Recall that a function around a minimum is quadratic:
f(x + Δx) = f(x) + (df/dx) Δx + (1/2)(d^2f/dx^2) Δx^2 + ...
but if f has a minimum at x then df/dx = 0, so f(x + Δx) ≈ f(x) + a Δx^2. So around a minimum the Newton method should work really well. Combined methods (so-called quasi-Newton methods) start like steepest descent and transform into the Newton method once they reach the near-minimum region.

Improvements. Computing the Hessian is very costly, not to mention its inverse; widely used alternatives are BFGS (Broyden-Fletcher-Goldfarb-Shanno), conjugate gradients, and DFP (Davidon-Fletcher-Powell).
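In practice one rarely codes BFGS by hand; a sketch using SciPy's built-in implementation (assuming Python with NumPy and SciPy are available), which builds an approximation of the inverse Hessian from successive gradients instead of computing the Hessian explicitly:

```python
import numpy as np
from scipy.optimize import minimize

# BFGS approximates the inverse Hessian from gradient differences,
# avoiding the explicit Hessian required by Newton's method.
f    = lambda v: 50 * v[0]**2 + v[1]**2
grad = lambda v: np.array([100 * v[0], 2 * v[1]])

result = minimize(f, x0=np.array([1.0, 1.0]), jac=grad, method="BFGS")
print(result.x, result.nit)    # minimiser near (0, 0) and the iteration count
```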