Multivariate Newton Minimization


1 Multivariate Newton Minimization

2 Optimization of biosurfactant synthesis

3 Rhamnolipid. Rhamnolipids are naturally occurring glycolipids produced commercially by the bacterium Pseudomonas aeruginosa. Applications: they promote the uptake and biodegradation of poorly soluble substrates, serve as immune modulators and virulence factors, act as antimicrobials, take part in surface motility, and are involved in biofilm development.

4 Rhamnolipid kinetics

5 A 2^2 factorial experimental design was used: x_1 - glycerol concentration, x_2 - ratio of sugarcane bagasse to sunflower seeds.

6 Fitted response surface: $y = 46.25 - 2.35x_1 + 6.18x_2 - 15.8x_1^2 - 14.92x_2^2 - 9.74x_1x_2$

7 Surfaces (hypersurfaces) can have a much more complex topology.

8 Optimisation: the process of finding the maximum (minimum) value of a given function in a specific region (constraints): Unconstrained, Constrained.

9 Finding a root of a nonlinear equation - Newton-Raphson method. The tangent to $f$ at $x_1$ satisfies $f'(x_1) = \frac{f(x_1) - 0}{x_1 - x_2}$, so the next estimate is $x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}$.

10 Finding a root of a nonlinear equation - Newton-Raphson method. Repeating from $x_2$: $f'(x_2) = \frac{f(x_2) - 0}{x_2 - x_3}$, so $x_3 = x_2 - \frac{f(x_2)}{f'(x_2)}$.

11 Finding a root of a nonlinear equation - general expression: $f'(x_i) = \frac{f(x_i) - 0}{x_i - x_{i+1}}$, hence $x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$.

12 When do we stop? When $f(x_{i+1}) = 0$ or when the change between iterates is small, measured by the relative change $\left|\frac{x_{i+1} - x_i}{x_{i+1}}\right| \cdot 100\% = \text{err}$.
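
A minimal Python sketch of the Newton-Raphson iteration with the relative-change stopping criterion described above (the tolerance and starting point below are illustrative choices, not prescribed by the slides):

```python
def newton_raphson(f, df, x, tol_percent=30.0, max_iter=50):
    """Find a root of f with Newton-Raphson; stop when the relative change is below tol_percent."""
    for _ in range(max_iter):
        x_new = x - f(x) / df(x)                 # x_{i+1} = x_i - f(x_i) / f'(x_i)
        err = abs((x_new - x) / x_new) * 100.0   # relative change in percent
        x = x_new
        if err < tol_percent:
            break
    return x

# Example from the slides: f(x) = x^2 - 1, starting at x = 4
root = newton_raphson(lambda x: x**2 - 1, lambda x: 2 * x, x=4.0)
print(root)   # converges towards the root x = 1
```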

13 Example: $f(x) = x^2 - 1$, $f'(x) = 2x$. Stopping criterion: relative change < 30 %.

14 Start at $x_1 = 4$: $f(4) = 15$, $f'(x_1) = 2 \cdot 4 = 8$ (tangent $y = 8x - 17$). Where does the tangent cross zero, i.e. where is $x_2$?

15 $x_2 = x_1 - \frac{f(x_1)}{f'(x_1)} = 4 - \frac{15}{8} = 2.125$. Relative change $= 46\% > 30\%$, so continue.

16 $f(x_2) = 2.125^2 - 1 = 3.51$, $f'(x_2) = 4.25$ (slope of the new tangent); $x_3 = x_2 - \frac{f(x_2)}{f'(x_2)} \approx 1.28$.

17 Relative change $= 66\% > 30\%$, so continue.

18 $f(x_3) = 1.28^2 - 1 = 0.64$, $f'(x_3) = 2.56$ (slope of the new tangent). Where is $x_4$?

19 $x_4 = x_3 - \frac{f(x_3)}{f'(x_3)} = 1.28 - \frac{0.64}{2.56} = 1.03$. Relative change $= 24\% < 30\%$: finish!

20 What about minima (maxima)? We have a procedure to find a zero of the function $f$: $x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$. The function $f$ has a minimum (maximum) where $f'(x) = 0$, so we look for a zero of $g(x) = f'(x)$: $x_{i+1} = x_i - \frac{g(x_i)}{g'(x_i)} = x_i - \frac{f'(x_i)}{f''(x_i)}$.
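
The same idea written as a small Python sketch: Newton applied to $g(x) = f'(x)$. The test function $f(x) = (x-2)^2 + 1$ and the starting point are illustrative assumptions, not from the slides:

```python
def newton_minimize_1d(df, d2f, x, tol=1e-8, max_iter=50):
    """Minimize f by finding a zero of g(x) = f'(x): x_{i+1} = x_i - f'(x_i) / f''(x_i)."""
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Illustrative example: f(x) = (x - 2)**2 + 1 has its minimum at x = 2
x_min = newton_minimize_1d(lambda x: 2 * (x - 2), lambda x: 2.0, x=10.0)
print(x_min)   # 2.0, reached in a single step because f is quadratic
```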

21 What about the multidimensional problem? In order to explore the topology of a multidimensional surface we again use the Taylor expansion. The Taylor expansion describes the surroundings of a point ($f(x + \Delta x)$) using ONLY LOCAL information about the function: $f(x)$ - its value, $f'(x)$ - the rate of change of $f$ at $x$, $f''(x)$ - its curvature at $x$, $f'''(x)$ - the rate of change of curvature at $x$, and so on. What is important is that all the derivatives are computed only at the point $x$.

22

23 Accuracy of numerical derivatives - truncation error, resulting from the finite (truncated) Taylor expansion: $f(x+\Delta x) = f(x) + \frac{df}{dx}\Delta x + \frac{1}{2}\frac{d^2f}{dx^2}\Delta x^2 + \dots$, hence $\frac{df}{dx} = \frac{f(x+\Delta x) - f(x)}{\Delta x} - \frac{1}{2}\frac{d^2f}{dx^2}\Delta x - \dots \approx \frac{f(x+\Delta x) - f(x)}{\Delta x}$.

24 Truncation error - resulting from the finite (truncated) Taylor expansion: $\varepsilon_T = \frac{1}{2}\left|\frac{d^2f}{dx^2}\right|\Delta x \sim \Delta x$.

25 Accuracy of numerical derivatives - round-off error, resulting from the finite representation of numbers (limited number of significant figures). Each function value carries an error of order $\varepsilon$: $\frac{df}{dx} \approx \frac{(f(x+\Delta x) \pm \varepsilon) - (f(x) \pm \varepsilon)}{\Delta x} = \frac{f(x+\Delta x) - f(x)}{\Delta x} \pm \frac{2\varepsilon}{\Delta x}$.

26 Round-off error - resulting from the finite representation of numbers: $\varepsilon_R = \frac{2\varepsilon}{\Delta x} \sim \frac{1}{\Delta x}$, and the total error is $\varepsilon_{total} = \frac{2\varepsilon}{\Delta x} + \frac{1}{2}\left|\frac{d^2f}{dx^2}\right|\Delta x$.

27 Example of truncation and round-off errors: $f(x) = x^3 + x^{1/3}$ at the point $x = 3$, with $\Delta x = 0.01$. Forward difference: $\frac{df}{dx} \approx \frac{f(3.01) - f(3)}{0.01}$, with truncation error $\varepsilon_T$. Central difference: $\frac{df}{dx} \approx \frac{f(3.01) - f(2.99)}{0.02}$, which has a much smaller truncation error.

28 Example of truncation and round-off errors: $f(x) = x^3 + x^{1/3}$ at the point $x = 3$, $\Delta x = 0.01$. Comparing the forward-difference estimate $\frac{f(3.01) - f(3)}{0.01}$ with the true derivative at $x = 3$ gives the total error $\varepsilon_{total}$, which can be plotted as a function of $\Delta x$.

29 Total error (plot of $\varepsilon_{total}$ versus $\Delta x$): for large $\Delta x$ the truncation error dominates, for very small $\Delta x$ the round-off error dominates, so there is an optimal intermediate step size.
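
A short sketch that reproduces this trade-off numerically. The test function $f(x) = x^3$ at $x = 3$ is an assumption made only for illustration, not the slide's exact example:

```python
f = lambda x: x**3
df_true = 3 * 3.0**2          # exact derivative of x^3 at x = 3

for dx in [1e-1, 1e-3, 1e-6, 1e-9, 1e-12]:
    approx = (f(3.0 + dx) - f(3.0)) / dx       # forward difference
    err = abs(approx - df_true)
    print(f"dx = {dx:8.0e}   error = {err:.3e}")
# The error first shrinks with dx (truncation ~ dx) and then grows again
# for very small dx (round-off ~ 1/dx), as on the total-error plot above.
```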

30 Taylor's expansion in 2D: $f(x_{i+1}, y_{i+1}) = f(x_i, y_i) + \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y}\Delta y = f(x_i, y_i) + \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right]\begin{bmatrix}\Delta x \\ \Delta y\end{bmatrix}$.

31 Taylor's expansion in 2D of two functions: $f_1(x_{i+1}, y_{i+1}) = f_1(x_i, y_i) + \frac{\partial f_1}{\partial x}\Delta x + \frac{\partial f_1}{\partial y}\Delta y$ and $f_2(x_{i+1}, y_{i+1}) = f_2(x_i, y_i) + \frac{\partial f_2}{\partial x}\Delta x + \frac{\partial f_2}{\partial y}\Delta y$.

32 In row-vector form: $f_k(x_{i+1}, y_{i+1}) = f_k(x_i, y_i) + \left[\frac{\partial f_k}{\partial x}, \frac{\partial f_k}{\partial y}\right]\begin{bmatrix}\Delta x \\ \Delta y\end{bmatrix}$ for $k = 1, 2$.

33 Stacking both expansions: $\begin{bmatrix}f_1(x_{i+1}, y_{i+1}) \\ f_2(x_{i+1}, y_{i+1})\end{bmatrix} = \begin{bmatrix}f_1(x_i, y_i) \\ f_2(x_i, y_i)\end{bmatrix} + \begin{bmatrix}\frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\ \frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y}\end{bmatrix}\begin{bmatrix}\Delta x \\ \Delta y\end{bmatrix}$.

34 In compact notation: $\vec f(\vec x_{i+1}) = \vec f(\vec x_i) + J_i\,\Delta\vec x$, where $J_i$ is the Jacobian matrix.

35 Multivariate Taylor's expansion: $\vec f(\vec x + \Delta\vec x) = \vec f(\vec x) + J(\vec x)\,\Delta\vec x + \frac{1}{2}\Delta\vec x^T H\,\Delta\vec x + \dots$

36 Multivariate Taylor's expansion of a multivariate vector function: $\vec f(\vec x + \Delta\vec x) = \vec f(\vec x) + J(\vec x)\,\Delta\vec x + \frac{1}{2}\Delta\vec x^T H\,\Delta\vec x + \dots$, e.g. the gravitational force $\vec F(\vec r) = -G\frac{Mm}{r^3}\vec r$.

37 For a multivariate scalar function the Jacobian reduces to the gradient: $f(\vec x + \Delta\vec x) = f(\vec x) + \nabla^T f(\vec x)\,\Delta\vec x + \frac{1}{2}\Delta\vec x^T H\,\Delta\vec x + \dots$

38 Multivariate vector function (e.g. gravitational force) versus multivariate scalar function (e.g. energy, cost): the two expansions above, side by side.

39 Definitions: $J(\vec x) = \begin{bmatrix}\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_n}\end{bmatrix}$, $\nabla f(\vec x) = \begin{bmatrix}\frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n}\end{bmatrix}$, $H(\vec x) = \begin{bmatrix}\frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2}\end{bmatrix}$.

40 Multivariate Taylor's expansion - example: $f(\vec x + \Delta\vec x) = f(\vec x) + \nabla^T f(\vec x)\,\Delta\vec x + \frac{1}{2}\Delta\vec x^T H\,\Delta\vec x + \dots$ for $f: \mathbb{R}^2 \to \mathbb{R}$, e.g. $f(x, y) = x^2 + y^2$, $\vec x = \begin{bmatrix}x \\ y\end{bmatrix}$.

41 Gradient: $\nabla f(\vec x) = \begin{bmatrix}\frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y}\end{bmatrix} = \begin{bmatrix}2x \\ 2y\end{bmatrix}$.

42 Hessian: $H(\vec x) = \begin{bmatrix}\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y^2}\end{bmatrix} = \begin{bmatrix}2 & 0 \\ 0 & 2\end{bmatrix}$.

43-46 Substituting into the expansion: $f(\vec x + \Delta\vec x) = f(\vec x) + \begin{bmatrix}2x & 2y\end{bmatrix}\begin{bmatrix}\Delta x \\ \Delta y\end{bmatrix} + \frac{1}{2}\begin{bmatrix}\Delta x & \Delta y\end{bmatrix}\begin{bmatrix}2 & 0 \\ 0 & 2\end{bmatrix}\begin{bmatrix}\Delta x \\ \Delta y\end{bmatrix}$.

47 Expanding the products: $f(\vec x + \Delta\vec x) = f(\vec x) + 2x\Delta x + 2y\Delta y + \Delta x^2 + \Delta y^2$; around the origin, $f(\Delta\vec x) = f(0,0) + \Delta x^2 + \Delta y^2 = \Delta x^2 + \Delta y^2$ (the expansion is exact for this quadratic function).

48 Multivariate Taylor's expansion - gradient part: $f(\vec x + \Delta\vec x) = f(\vec x) + \nabla^T f(\vec x)\,\Delta\vec x + \frac{1}{2}\Delta\vec x^T H\,\Delta\vec x + \dots$, with $\nabla^T f(\vec x) = \left[\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right]$ and $\vec x = [x_1, \dots, x_n]^T$.

49-50 The gradient term is a dot product: $\nabla^T f(\vec x)\,\Delta\vec x = \frac{\partial f}{\partial x_1}\Delta x_1 + \frac{\partial f}{\partial x_2}\Delta x_2 + \dots + \frac{\partial f}{\partial x_n}\Delta x_n = \sum_{i=1}^{n}\frac{\partial f}{\partial x_i}\Delta x_i$; in general $\vec y^T\vec x = \sum_{i=1}^{n} y_i x_i$.

51 Keeping only this linear term gives the plane tangent to the function at the point $\vec x$ (equation of a plane: $ax_1 + bx_2 + \dots$).

52 The gradient gives information about the rate of change of $f$ in each direction $(x, y, \dots)$.

53-59 Gradient, $\nabla^T f(\vec x) = \left[\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right]$ - illustrations: plots of a 2D function with the gradient vectors $\nabla f(\vec x)$ (and $-\nabla f(\vec x)$) drawn at several points of the surface and its contour map.

60 The gradient is a vector perpendicular to the function's isolines. Along an isoline $f(\vec x) = c$, parametrized as $\vec x(t)$, we have $f(\vec x(t)) = c$, so $0 = \frac{dc}{dt} = \frac{df(\vec x(t))}{dt} = \frac{\partial f}{\partial x_1}\frac{dx_1}{dt} + \frac{\partial f}{\partial x_2}\frac{dx_2}{dt} + \dots + \frac{\partial f}{\partial x_n}\frac{dx_n}{dt} = \sum_{i=1}^{n}\frac{\partial f}{\partial x_i}\frac{dx_i}{dt} = \nabla^T f(\vec x)\,\dot{\vec x}(t)$; the gradient is therefore orthogonal to the tangent of the isoline.

61 Multivariate Taylor's expansion - Hessian part: $f(\vec x + \Delta\vec x) = f(\vec x) + \nabla^T f(\vec x)\,\Delta\vec x + \frac{1}{2}\Delta\vec x^T H\,\Delta\vec x + \dots$, with $H(\vec x) = \begin{bmatrix}\frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2}\end{bmatrix}$, $\vec x = [x_1, \dots, x_n]^T$.

62 We now expand the quadratic form $\vec x^T H \vec x$ step by step.

63-65 First the matrix-vector product: $H\vec x = \begin{bmatrix}H_{11} & \cdots & H_{1n} \\ \vdots & & \vdots \\ H_{n1} & \cdots & H_{nn}\end{bmatrix}\begin{bmatrix}x_1 \\ \vdots \\ x_n\end{bmatrix} = \begin{bmatrix}H_{11}x_1 + H_{12}x_2 + \dots + H_{1n}x_n \\ \vdots \\ H_{n1}x_1 + H_{n2}x_2 + \dots + H_{nn}x_n\end{bmatrix} = \begin{bmatrix}\sum_{j=1}^{n} H_{1j}x_j \\ \vdots \\ \sum_{j=1}^{n} H_{nj}x_j\end{bmatrix}$.

66-69 Then the quadratic form: $\vec x^T H \vec x = [x_1, \dots, x_n]\begin{bmatrix}\sum_{j=1}^{n} H_{1j}x_j \\ \vdots \\ \sum_{j=1}^{n} H_{nj}x_j\end{bmatrix} = \sum_{j=1}^{n}[x_1, \dots, x_n]\begin{bmatrix}H_{1j}x_j \\ \vdots \\ H_{nj}x_j\end{bmatrix}$.

70 Expanding the remaining dot product: $\vec x^T H \vec x = \sum_{j=1}^{n}\left(H_{1j}x_j x_1 + H_{2j}x_j x_2 + \dots + H_{nj}x_j x_n\right) = \sum_{i=1}^{n}\sum_{j=1}^{n} H_{ij}\,x_i x_j$.
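
A tiny numpy check of the identity just derived, $\vec x^T H \vec x = \sum_i \sum_j H_{ij} x_i x_j$; the matrix and vector are arbitrary illustrative values:

```python
import numpy as np

H = np.array([[2.0, 1.0],
              [1.0, 3.0]])      # symmetric matrix (illustrative)
x = np.array([1.0, -2.0])

quad_matrix = x @ H @ x                                  # x^T H x as a matrix product
quad_sum = sum(H[i, j] * x[i] * x[j]
               for i in range(2) for j in range(2))      # the double-sum form
print(quad_matrix, quad_sum)    # both give the same value (10.0)
```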

71 The Hessian contains all the information about the shape of the function around a minimum: $f(\vec x + \Delta\vec x) = f(\vec x) + \nabla^T f(\vec x)\,\Delta\vec x + \frac{1}{2}\Delta\vec x^T H\,\Delta\vec x + \dots$; at a minimum $\nabla f(\vec x) = 0$, so $f(\vec x + \Delta\vec x) = f(\vec x) + \frac{1}{2}\Delta\vec x^T H\,\Delta\vec x + \dots$

72 Recall the quadratic function $f(x) = ax^2$ with $a > 0$: $f''(x) = 2a > 0$, the one-dimensional Hessian is positive and the parabola has a minimum.

73 $f(x) = ax^2$ with $a = 0$: $f''(x) = 0$, the function is flat.

74 $f(x) = ax^2$ with $a < 0$: $f''(x) = 2a < 0$, the parabola has a maximum.

75 In 2D, $f(x, y) = ax^2 + by^2$ has $H = \begin{bmatrix}\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y^2}\end{bmatrix} = \begin{bmatrix}2a & 0 \\ 0 & 2b\end{bmatrix}$, which defines the shape of the quadratic skeleton of the resulting surface.

76 How to detect a minimum (unconstrained)? 1. Necessary condition for an unconstrained optimum at the point $\vec x^*$: $\nabla f(\vec x^*) = 0$ and $f(\vec x)$ is differentiable at $\vec x^*$. 2. Sufficient condition for an unconstrained minimum at $\vec x^*$: $\nabla f(\vec x^*) = 0$, $f(\vec x)$ is twice differentiable at $\vec x^*$, and $\nabla^2 f(\vec x^*)$ is positive definite.

77 Positive and negative definite matrices. If for every nonzero vector $\vec x$ a symmetric matrix $H$ satisfies $\vec x^T H \vec x > 0$, then $H$ is positive definite; if $\vec x^T H \vec x < 0$, then $H$ is negative definite. This is a very cumbersome definition: one would need to check every vector $\vec x$.

78 Positive and negative definite matrices. A symmetric matrix $H$ is positive definite if: 1. all its eigenvalues are positive, or equivalently 2. the determinants of all its leading principal minors are positive. A symmetric matrix $H$ is negative definite if: 1. all its eigenvalues are negative, or equivalently 2. after reversing the sign of each of its elements, the determinants of all leading principal minors are positive.
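
A hedged numpy sketch of both practical tests (eigenvalues and leading principal minors); the matrix is an arbitrary example, not one from the slides:

```python
import numpy as np

H = np.array([[4.0, 1.0],
              [1.0, 3.0]])     # symmetric test matrix (illustrative)

# Test 1: are all eigenvalues positive?
eigvals = np.linalg.eigvalsh(H)
print("eigenvalues:", eigvals, "-> positive definite:", bool(np.all(eigvals > 0)))

# Test 2: are the determinants of all leading principal minors positive?
minors = [np.linalg.det(H[:k, :k]) for k in range(1, H.shape[0] + 1)]
print("leading minors:", minors, "-> positive definite:", all(m > 0 for m in minors))
```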

79 Example: for a function $f(\vec x)$ with Hessian $H = \begin{bmatrix}2 & 4 \\ 4 & 2\end{bmatrix}$, the first leading principal minor is $2 > 0$, but $\det H = 4 - 16 = -12 < 0$, so $H$ is not positive definite.

80 Nature of stationary points. Hessian $H$ positive definite: quadratic form $\vec y^T H \vec y > 0$, eigenvalues $\lambda_i > 0$. Indeed, writing $H = M^T \Lambda M$ with $M$ orthogonal, $\vec y^T H \vec y = (M\vec y)^T \Lambda (M\vec y) = \sum_i \lambda_i (M\vec y)_i^2 > 0$. Local nature: minimum.

81 Nature of stationary points (2). Hessian $H$ negative definite: quadratic form $\vec y^T H \vec y < 0$, eigenvalues $\lambda_i < 0$. Local nature: maximum.

82 Nature of stationary points (3). Hessian $H$ indefinite: the quadratic form $\vec y^T H \vec y$ changes sign, eigenvalues of both signs. Local nature: saddle point.

83 Nature of stationary points (4). Hessian $H$ positive semi-definite: $\vec y^T H \vec y \ge 0$, $\lambda_i \ge 0$, $H$ singular! Local nature: valley.

84 Nature of stationary points (5). Hessian $H$ negative semi-definite: $\vec y^T H \vec y \le 0$, $\lambda_i \le 0$, $H$ singular! Local nature: ridge.

85 Stationary point nature - summary: $\vec y^T H \vec y > 0$, $\lambda_i > 0$, positive definite: minimum; $\vec y^T H \vec y \ge 0$, $\lambda_i \ge 0$, positive semi-definite: valley; indefinite: saddle point; $\vec y^T H \vec y \le 0$, $\lambda_i \le 0$, negative semi-definite: ridge; $\vec y^T H \vec y < 0$, $\lambda_i < 0$, negative definite: maximum.

86 Newton method for optimization in 1D. Recall the Newton method for finding a root of a function $f$ (where $f(x) = 0$): $x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$. A function $f$ has a minimum (maximum) where $f'(x) = 0$, so let $g(x) = f'(x)$ and apply the same iteration: $x_{i+1} = x_i - \frac{g(x_i)}{g'(x_i)} = x_i - \frac{f'(x_i)}{f''(x_i)}$.

87 Newton method in 1D, a different perspective: the Taylor expansion. The Newton method is in fact again an application of the Taylor expansion. It answers the question: how far should I jump to reach the minimum, namely the point where $f'(x_{i+1}) = 0$? The question can be answered by expanding the function around the current point $x_i$: $f(x_i + \Delta x) = f(x_i) + f'(x_i)\Delta x + \frac{1}{2}f''(x_i)\Delta x^2 + \dots$ We want to move by the $\Delta x$ that reaches the minimum, i.e. $\frac{d}{d\Delta x} f(x_i + \Delta x) = 0$.

88 Differentiating the expansion: $0 = \frac{d}{d\Delta x}\left[f(x_i) + f'(x_i)\Delta x + \frac{1}{2}f''(x_i)\Delta x^2 + \dots\right] = f'(x_i) + f''(x_i)\Delta x + \dots$ For computational simplicity we keep only the first term in $\Delta x$ (which comes from the quadratic term of the expansion): $0 = f'(x_i) + f''(x_i)\Delta x$. Because $\Delta x = x_{i+1} - x_i$, we get $0 = f'(x_i) + f''(x_i)(x_{i+1} - x_i)$, hence $x_{i+1} = x_i - \frac{f'(x_i)}{f''(x_i)}$.

89 In the Newton method we travel along a parabola. Why? Because the parabola is the lowest-degree polynomial that has a minimum: at each step we fit a parabola through $f(x_i)$, $f'(x_i)$ and $f''(x_i)$ and jump to its minimum at $x_{i+1} = x_i + \Delta x$.

90-93 For a second-degree polynomial it is a one-step method: for $f(x) = x^2 - 1$, $x_{i+1} = x_i - \frac{f'(x_i)}{f''(x_i)} = x_i - \frac{2x_i}{2} = x_i - x_i = 0$, the exact minimizer, regardless of the starting point $x_i$.

94 Multivariate Newton method for optimization. Again: how far should we jump to reach the minimum, namely the point where $\nabla f(\vec x_{i+1}) = 0$? Start from the quadratic model $f(\vec x_{i+1}) = f(\vec x_i) + \nabla^T f(\vec x_i)(\vec x_{i+1} - \vec x_i) + \frac{1}{2}(\vec x_{i+1} - \vec x_i)^T H (\vec x_{i+1} - \vec x_i)$ and take its gradient with respect to $\vec x_{i+1}$, using the identities $\nabla_{\vec x}(\vec b^T \vec x) = \vec b$ and $\nabla_{\vec x}(\vec x^T A \vec x) = (A^T + A)\vec x$: $\nabla f(\vec x_{i+1}) = \nabla f(\vec x_i) + \frac{1}{2}(H^T + H)(\vec x_{i+1} - \vec x_i) = \nabla f(\vec x_i) + H(\vec x_{i+1} - \vec x_i)$, since $H^T = H$.

95 We look for the $\vec x$ where $\nabla f(\vec x) = 0$, so: $0 = \nabla f(\vec x_i) + H(\vec x_{i+1} - \vec x_i)$, hence $-\nabla f(\vec x_i) = H(\vec x_{i+1} - \vec x_i)$ and $\vec x_{i+1} = \vec x_i - H^{-1}(\vec x_i)\,\nabla f(\vec x_i)$.
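
A minimal numpy sketch of this multivariate Newton step, $\vec x_{i+1} = \vec x_i - H^{-1}(\vec x_i)\nabla f(\vec x_i)$; the quadratic test function is an illustrative assumption:

```python
import numpy as np

def newton_optimize(grad, hess, x, tol=1e-8, max_iter=50):
    """Multivariate Newton: repeatedly solve H * step = grad(x) and update x."""
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        step = np.linalg.solve(hess(x), g)   # avoids forming H^{-1} explicitly
        x = x - step
    return x

# Illustrative example: f(x, y) = x^2 + y^2 (minimum at the origin)
grad = lambda v: np.array([2 * v[0], 2 * v[1]])
hess = lambda v: np.array([[2.0, 0.0], [0.0, 2.0]])
print(newton_optimize(grad, hess, np.array([3.0, -1.0])))   # -> [0. 0.] in one step
```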

96 Multivariate optimisation - Newton method. Pros: converges fast (especially for quadratic functions); uses information about both the slope (gradient) and the curvature (Hessian). Cons: high computational cost (Hessian, its inverse); the Hessian may be singular; computational errors.

97 Steepest descent, a simple idea: let's move in the direction of steepest descent, namely along $-\nabla f$, i.e. $\vec r = -\nabla f(\vec x)$. We move along a straight line (vector); the question is how far we should go: $\vec x_1 = \vec x_0 - \alpha\nabla f(\vec x_0) = \vec x_0 + \alpha\vec r_0$.

98-100 Starting point $\vec x_0$, final point $\vec x_1$, direction $\vec r_0 = -\nabla f(\vec x_0)$. How large should $\alpha$ be? We go until we reach the minimum of $f$ along the chosen direction.

101 Plot of $f(\vec x_1(\alpha))$ versus $\alpha$: the optimal $\alpha$ lies at the minimum of this one-dimensional function.

102 Directional derivative. We go until we reach the minimum along the $\vec x_1$ direction, so we need the derivative of $f$ with respect to $\alpha$ along that line: $\frac{df(\vec x)}{d\alpha} = \frac{df(\vec x(\alpha))}{d\alpha} = \frac{\partial f}{\partial x_1}\frac{dx_1}{d\alpha} + \frac{\partial f}{\partial x_2}\frac{dx_2}{d\alpha} + \dots + \frac{\partial f}{\partial x_n}\frac{dx_n}{d\alpha} = \sum_{i=1}^{n}\frac{\partial f}{\partial x_i}\frac{dx_i}{d\alpha} = \nabla^T f\,\frac{d\vec x}{d\alpha}$.

103 Steepest descent - the optimal step. With $\vec x_1 = \vec x_0 - \alpha\nabla f(\vec x_0) = \vec x_0 + \alpha\vec r_0$: $\frac{df(\vec x_1)}{d\alpha} = \nabla^T f(\vec x_1)\,\frac{d\vec x_1}{d\alpha} = \nabla^T f(\vec x_1)\,\frac{d}{d\alpha}(\vec x_0 + \alpha\vec r_0) = \nabla^T f(\vec x_1)\,\vec r_0$. Setting $\frac{df(\vec x_1)}{d\alpha} = 0$ gives $\nabla^T f(\vec x_1)\,\vec r_0 = 0$, i.e. $\vec r_1^T\vec r_0 = 0$: we go until the new gradient is perpendicular to the gradient at the initial point.

104 And then we start again from $\vec x_1$ with the new direction $\vec r_1$.

105 How to find $\alpha$ computationally? We need $\vec r_1^T\vec r_0 = 0$. From the quadratic model $f(\vec x_1) = f(\vec x_0) + \nabla^T f(\vec x_0)(\vec x_1 - \vec x_0) + \frac{1}{2}(\vec x_1 - \vec x_0)^T H (\vec x_1 - \vec x_0)$, the gradient is $\nabla f(\vec x_1) = \nabla f(\vec x_0) + H(\vec x_1 - \vec x_0)$.

106 Therefore $\vec r_1 = -\nabla f(\vec x_1) = -\nabla f(\vec x_0) - H(\vec x_1 - \vec x_0) = \vec r_0 - \alpha H\vec r_0$, and the condition $\vec r_1^T\vec r_0 = 0$ becomes $(\vec r_0 - \alpha H\vec r_0)^T\vec r_0 = 0$.

107 Solving for the step length: $\vec r_0^T\vec r_0 - \alpha\,\vec r_0^T H\vec r_0 = 0$, hence $\alpha = \frac{\vec r_0^T\vec r_0}{\vec r_0^T H\vec r_0}$.

108 Steepest descent routine, e.g. for $f(\vec x) = x^2 + y^2$: 1. Choose an initial point, say $\vec x_0 = (1, 1)^T$. 2. Choose an accuracy $\varepsilon$. 3. Compute the gradient (descent direction) at this point: $\vec r_0 = -\nabla f(\vec x_0) = -(2x, 2y)^T = -(2, 2)^T$. 4. Compute the optimal $\alpha$ along $\vec r_0$: compute the Hessian at $\vec x_0$, $H = \begin{bmatrix}2 & 0 \\ 0 & 2\end{bmatrix}$; compute $\vec r_0^T\vec r_0 = 8$; compute $\vec r_0^T H\vec r_0 = 16$; compute $\alpha = \frac{\vec r_0^T\vec r_0}{\vec r_0^T H\vec r_0} = \frac{8}{16} = \frac{1}{2}$. 5. Compute the next point $\vec x_1 = \vec x_0 + \alpha\vec r_0 = (1, 1)^T - \frac{1}{2}(2, 2)^T = (0, 0)^T$. 6. Compute $\|\nabla f(\vec x_1)\| = 0$; if it is $\le \varepsilon$, finish, otherwise repeat from step 3.
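
The routine from this slide written out as a small numpy sketch; $f(x, y) = x^2 + y^2$ and the numbers $\vec r_0^T\vec r_0 = 8$, $\vec r_0^T H\vec r_0 = 16$, $\alpha = 1/2$ follow the slide, while the tolerance value is an illustrative choice:

```python
import numpy as np

grad = lambda v: np.array([2 * v[0], 2 * v[1]])   # gradient of f(x, y) = x^2 + y^2
H = np.array([[2.0, 0.0], [0.0, 2.0]])             # constant Hessian of this quadratic f

x = np.array([1.0, 1.0])                           # step 1: starting point
eps = 1e-6                                         # step 2: accuracy
while np.linalg.norm(grad(x)) > eps:
    r = -grad(x)                                   # step 3: steepest-descent direction
    alpha = (r @ r) / (r @ H @ r)                  # step 4: optimal step along r
    x = x + alpha * r                              # step 5: next point
print(x)   # [0. 0.] after a single step for this function
```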

109 Steepest descent method. Pros: always goes downhill; always converges; simple implementation. Cons: slow on eccentric (strongly elongated, ill-conditioned) functions.

110 Steepest descent method - eccentric function example. Theorem: if we define the error function at the current point $\vec x$ as $E(\vec x) = \frac{1}{2}(\vec x - \vec x^*)^T H (\vec x - \vec x^*)$, then at every step $k$: $E(\vec x_{k+1}) \le \left(\frac{A - a}{A + a}\right)^2 E(\vec x_k)$, where $A$ is the largest and $a$ the smallest eigenvalue of $H$.

111 Steepest descent method - eccentric function example. For $f(\vec x) = x^2 + y^2$: $H = \begin{bmatrix}2 & 0 \\ 0 & 2\end{bmatrix}$, $A = a = 2$, so $E(\vec x_{k+1}) \le 0 \cdot E(\vec x_k) = 0$ and the method converges in one step (a direct method). For $f(\vec x) = 50x^2 + y^2$: $H = \begin{bmatrix}100 & 0 \\ 0 & 2\end{bmatrix}$, $A = 100$, $a = 2$, so $E(\vec x_{k+1}) \le \left(\frac{98}{102}\right)^2 E(\vec x_k) \approx 0.92\,E(\vec x_k)$, i.e. slow convergence.
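
A quick numerical illustration of this contrast (a hedged sketch that reuses the exact-line-search loop from the routine above on both functions; the tolerance and starting point are arbitrary):

```python
import numpy as np

def steepest_descent_steps(grad, H, x, eps=1e-6, max_iter=10000):
    """Count exact-line-search steepest-descent steps until the gradient is small."""
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:
            return k
        r = -g
        alpha = (r @ r) / (r @ H @ r)
        x = x + alpha * r
    return max_iter

# f = x^2 + y^2: one step;  f = 50x^2 + y^2: many zig-zagging steps
print(steepest_descent_steps(lambda v: np.array([2 * v[0], 2 * v[1]]),
                             np.diag([2.0, 2.0]), np.array([1.0, 1.0])))
print(steepest_descent_steps(lambda v: np.array([100 * v[0], 2 * v[1]]),
                             np.diag([100.0, 2.0]), np.array([1.0, 1.0])))
```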

112 Solution? Combined methods. Recall that a function around a minimum is approximately quadratic: $f(x + \Delta x) = f(x) + \frac{df(x)}{dx}\Delta x + \frac{1}{2}\frac{d^2f(x)}{dx^2}\Delta x^2 + \dots$, but at a minimum $\frac{df(x)}{dx} = 0$, so $f(x + \Delta x) \approx f(x) + a\Delta x^2$. Around the minimum the Newton method should therefore work really well. Combined methods (so-called quasi-Newton methods) start as steepest descent and transform into the Newton method once they reach the near-minimum region.

113 Improvements. Computing the Hessian is very costly, not to mention its inverse; common remedies: BFGS (Broyden-Fletcher-Goldfarb-Shanno), conjugate gradients, DFP (Davidon-Fletcher-Powell).
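
In practice such quasi-Newton methods are available in standard libraries; a hedged usage sketch with scipy's BFGS implementation (the Rosenbrock test function and starting point are illustrative choices, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize

# Rosenbrock function: a classic ill-conditioned test problem
f = lambda v: (1 - v[0])**2 + 100 * (v[1] - v[0]**2)**2

result = minimize(f, x0=np.array([-1.2, 1.0]), method="BFGS")
print(result.x)      # close to the true minimum at (1, 1)
print(result.nit)    # number of iterations used
```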
