Minimization of Static! Cost Functions!

Minimization of Static Cost Functions Robert Stengel Optimal Control and Estimation, MAE 546, Princeton University, 2017 J = Static cost function with constant control parameter vector, u Conditions for a minimum in J with respect to u Analytical and numerical solutions Copyright 2017 by Robert Stengel. All rights reserved. For educational use only. http://www.princeton.edu/~stengel/mae546.html http://www.princeton.edu/~stengel/optconest.html 1 Static Cost Function Minimum value of the cost, J, is fixed, but may be unknown Corresponding control parameter, u*, also is fixed but possibly unknown Cost function preferably has Single minimum Locally smooth, monotonic contours away from the minimum 2

Vector Norms for Real Variables Norm = Measure of length or magnitude of a vector, x Scalar quantity axicab or Manhattan norm L 1 norm = x 1 = Euclidean or Quadratic Norm n i=1 x i x 1 x x = 2 x n dim(x) = n 1 L 2 norm = x 2 = ( x x) 1/2 = x 2 1 + x 2 2 ( 2 ++ x n ) 1/2 p Norm L p norm = x p = n i=1 x i p 1/ p Infinity Norm: p -> 3 p Norm Example L p norm = x p = 4.5 4 3.5 3 n i=1 x i p 15-dimensional vector 1/ p p norms of the vector p L p 1 24.0000 2 7.2111 4 4.6312 8 4.0957 16 4.0050 32 4.0000 2.5 2 1.5 1 0.5 1 2 1 1 3 1 1 4 1 1 3 1 1 2 1 As p increases, the norm approaches the value of the maximum component of the vector 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 4

Apples and Oranges Problem Suppose elements of x represent quantities with different units One solution: Vector, y, normalized by ranges of elements of x y = x = y 1 y 2 y n x 1 x 2 x n = Velocity, m / s Angle, rad emperature, K ( ) ( ) d 1 x 1 x 1 x 1max ( x 1min d 2 x 2 x 2 x 2max ( x 2min d n x n x n x nmax ( x nmin ( ) What is a reasonable definition for the norm? = d 1 0 0 0 d 2 0 0 0 d n x 1 x 2 x n = Dx 5 Weighted Euclidean (Quadratic) Norm of x ( ) 1/2 = y 2 1 + y 2 2 ( 2 ++ y m ) 1/2 ( ) 1/2 dim y = Dx 2 dim x y 2 = y y = x D Dx dim D ( ) = m 1 ( ) = n 1 ( ) = m n If m = n, D is square D D is square and symmetric If D is diagonal D D is diagonal If m n, D is not square D D is square and symmetric Rank(D) m, n > m Rank(D) n, n < m 6

Rank of Matrix D Maximal number of linearly independent rows or columns of D Order of the largest non-zero minor determinant of D Examples D = D = a c e b d f a b c d e f ; maximal rank 2 ; maximal rank 2 7 Quadratic Forms x D Dx x Qx = Quadratic form Q D D = Defining matrix of the quadratic form dim(q) = n x n Q is symmetric x Qx is a scalar Rank(Q) m, n > m Rank(Q) n, n < m Useful identity for the trace of the quadratic form x Qx = r( x Qx) = r( xx Q) = r( Qxx ) ( 1 n) ( n n) ( n 1) = ( 1 1) = r 1 1 ( ) = r ( n n) ( ) = r n n = Scalar 8

Why are Quadratic Forms Useful? 2 x 2 example J ( x) x Qx = x 1 x 2 = q 11 x 2 1 + q 22 x 2 2 + ( q 12 + q 21 )x 1 x 2 q 11 q 12 q 21 q 22 x 1 x 2 = q 11 x 1 2 + q 22 x 2 2 + 2q 12 x 1 x 2 for symmetric matrix 2 nd -order term of aylor series expansion of any vector function f ( x) f ( x 0 + x) = f ( x 0 ) + x ( + 1 f 2 ( x) x x=x0 ( 2 x ( x +... H.O. x 2 x=x 0 ( Gradient (as row vector) well-defined everywhere J x J x 1 J x 2 ( ) ( 2q 22 x 2 + 2q 12 x 1 ) = 2q 11x 1 + 2q 12 x 2 Large values weighted more heavily than small values Some terms can count more than others Coupling between terms can be considered Asymmetric Q can always be replaced by a symmetric Q 9 Definiteness 10

Definiteness of a Matrix If x Qx > 0 for all x 0 a) he scalar is definitely positive b) he matrix Q is a positive definite matrix Q = 1 0 0 0 2 0 0 0 3 If x Qx 0 for all x 0 Q is a positive semi - definite matrix Q = 1 0 0 0 2 0 0 0 0 If x Qx < 0 for all x 0 Q is a negative definite matrix Q = 1 0 0 0 2 0 0 0 3 11 Positive-Definite Matrix Q is positive-definite if All leading principal minor determinants are positive All eigenvalues are real and positive 3 x 3 example q 11 q 12 q 13 Q = q 21 q 22 q 23 q 31 q 32 q 33 ( ) = s 3 + a 2 s 2 + a 1 s + a 0 = ( s 1 )( s 2 )( s 3 ) det si Q 1, 2, 3 > 0 q 11 > 0, q 11 q 12 q 21 q 22 > 0, q 11 q 12 q 13 q 21 q 22 q 23 > 0 q 31 q 32 q 33 What s an eigenvalue? 12

Characteristic Polynomial Characteristic polynomial of a matrix, F si F = det( si F) = Scalar ( si F) = (s) = s n + a n1 s n1 +... + a 1 s + a 0 = s n r( F)s n1 +... + a 1 s + a 0 where ( s f 11 ) f 12... f 1n f 21 s f 22 ( )... f 2n............ ( ) f n1 f n2... s f nn (n x n) 13 Eigenvalues Characteristic equation of the matrix (s) = s n + a n1 s n1 +... + a 1 s + a 0 0 ( )( s 2 )(...)( s n ) 0 = s 1 i are solutions that set ( s) = 0 hey are called the eigenvalues of F the roots of the characteristic polynomial a n1 = r( F) a 0 = 1 n ( ) i i i=1 14

Examples of Positive Definite Matrices Inertia matrix of a rigid body I = (y 2 + z 2 ) xy xz ( xy (x 2 + z 2 ) yz dm = xz yz (x 2 + y 2 ) Bo dy I xx I xy I xz I xy I yy I yz I xz I yz I zz Diagonal matrix with positive elements Q = 7 0 0 0 12 0 0 0 19 15 Rotation (Direction Cosine) Matrix y = Hx Projections of unit vector components of one Cartesian reference frame on another Rotational orientation of one Cartesian reference frame with respect to another H = cos 11 cos 21 cos 31 cos 12 cos 22 cos 32 cos 13 cos 23 cos 33 = h 11 h 12 h 13 h 21 h 22 h 23 h 31 h 32 h 33 16

Euler Angles Body attitude measured with respect to inertial frame hree-angle orientation expressed by sequence of three orthogonal single-angle rotations from inertial to body frame : Yaw angle : Pitch angle : Roll angle det(h) = 1 H = cos cos cos sin sin ( * cos sin + sin sin cos cos cos + sin sin sin sin cos * sin sin + cos sin cos sin cos + cos sin sin cos cos * ) 17 Rotational ransformation Euclidean Norm Position of a particle in two coordinate frames with common origin y = Hx x = H 1 y Orthonormal transformation [H 1 = H ] Not a positive-definite matrix Non-singular matrix [det(h) = 1 0] y 2 = ( y y) 1/2 = y 2 1 + y 2 2 ( 2 + y 3 ) 1/2 ( ) 1/2 = Hx 2 = x 2 = x 2 1 + x 2 2 ( 2 + x 3 ) 1/2 = x H Hx as H H = H 1 H = I 3 (positive-definite) 18

Conditions for a Static Minimum 19 he Static Minimization Problem Find the value of a continuous control parameter, u, that minimizes a continuous scalar cost junction, J Single control parameter min J ( u ); dim J wrt u ( ) = 1, dim ( u ) = 1 u* = argmin J ( u) Argument that minimizes J u Many control parameters min J ( u); dim J wrt u ( ) = 1, dim( u) = m 1 u* = argmin J ( u) u 20

Necessary Condition for Static Optimality Single control dj du u =u* = 0 i.e., the slope is zero at the optimum point Example: J = ( u 4) 2 dj du = 2 u 4 ( ) = 0 when u* = 4 Optimum point is a stationary point 21 Necessary Condition for Static Optimality Multiple controls J u u=u* = J J... u 1 u 2 J u m = 0 u=u* Gradient, defined as a row vector i.e., all the slopes are concurrently zero at the optimum point Example: J = ( u 1 4) + ( u 2 8) dj = 2( u 1 4) = 0 du 1 when u 1 * = 4 dj = 2( u 2 8) = 0 du 2 when u 2 * = 8 J u u=u* = J J u 1 u 2 4 u=u*= 8 Optimum point is a stationary point = 0 0 22

But the Slope can be Zero for More than One Reason Minimum Maximum Either Neither (Inflection Point) 23 But the Gradient can be Zero for More than One Reason Minimum Maximum Either Neither (Saddle Point) 24

Sufficient Condition for Extremum Single control Minimum Satisfy necessary condition plus d 2 J du 2 u =u* > 0 Maximum Satisfy necessary condition plus d 2 J du 2 u =u* < 0 i.e., the curvature is positive at the optimum point Example: J = ( u 4) 2 dj du = 2 u 4 d 2 J du = 2 > 0 2 ( ) i.e., the curvature is negative at the optimum point Example: J = ( u 4) 2 dj du = 2 u 4 d 2 J du = 2 < 0 2 ( ) 25 Sufficient Condition for a Local Minimum Multiple controls Satisfy necessary condition plus Hessian matrix 2 J u 2 u=u* = 2 J u 1 2 2 J u 1 u 2... 2 J u 1 u m 2 J 2 J 2 J... 2 u 2 u 1 u 2 u 2 u m............ 2 J 2 J... u m u 1 u 2 u m 2 J u m 2 u=u* > 0 i.e., J must be convex for a minimum 26

Minimized Cost Function, J* Gradient is zero at the minimum Hessian matrix is positive-definite at the local minimum J ( u*+u) J ( u* ) + J ( u* ) + 2 J ( u* ) +... ( ) = u J J u* 2 J u* ( ) = 1 2 u 2 J u ) u=u* ( = 0 u 2 ) u * 0 u=u* ( First variation is zero at the minimum Second variationis positive at the minimum 27 Equality Constraints 28

Static Cost Functions with Equality Constraints Minimize J(u ), subject to c(u ) = 0 dim(c) = [n x 1] dim(u ) = [(m + n) x 1] 29 wo Approaches to Static Optimization with a Constraint 1.Use constraint to reduce control dimension u = 2.Augment the cost function to recognize the constraint u 1 u 2 Example : min J u 1,u 2 subject to c( u ) = c( u 1,u 2 ) = 0 u 2 = fcn u 1 then ( ) = J ( u 1,u 2 ) = J u 1, fcn( u 1 ) J u ( ) ( ) = J u 1 Lagrange multiplier,, is an unknown constant dim () = dim( c) = n 1 J A ( u ) = J ( u ) + c( u ) Warning Lagrange multiplier,, is not the same as an eigenvalue, 30

Solution Example: First Approach Cost function J = u 1 2 2u 1 u 2 + 3u 2 2 40 Constraint c = u 2 u 1 2 = 0 u 2 = u 1 + 2 31 Solution Example: Reduced Control Dimension Cost function and gradient with substitution J = u 1 2 2u 1 u 2 + 3u 2 2 40 = u 1 2 2u 1 u 1 + 2 = 2u 1 2 + 8u 1 28 J u 1 = 4u 1 + 8 = 0 ( ) + 3( u 1 + 2) 2 40 Optimal solution u 1 * = 2 u 2 * = 0 J* = 36 32

Solution: Second Approach Partition u into a state, x, and a control, u, such that dim(x) = [n x 1] = dim(c) dim(u) = [m x 1] x u = u Add constraint to the cost function, weighted by Lagrange multiplier, u J A J A ( ) = J ( u ) + c( u ) ( x,u) = J ( x,u) + c( x,u) c is required to be zero when J A is a minimum ( ) = c c u x u = 0 33 J A x = J x + c x = 0 Solution: Adjoin Constraint with Lagrange Multiplier Gradients with respect to x, u, and are zero at the optimum point J A u = J u + c u = 0 J A = c = 0 34

Simultaneous Solutions for State and Control First equation: find optimal Lagrange multiplier (n scalar equations) Second and third equations: specify the state and control (n + m) scalar equations * = J x * c * =, x( ) + ( 2n + m) values must be found: ( x, u, ) 1 - /. c x( ) or 1 J x ( ) [Row] [Column] J u + c * u = 0 J u J x c x( ) c( x,u) = 0 1 c u = 0 35 Solution Example: Lagrange Multiplier Cost function Constraint Partial derivatives J x J u J = u 2 2xu + 3x 2 40 c = x u 2 = 0 = 2u + 6x = 2u 2x c x = 1 c u = 1 36

Solution Example: Lagrange Multiplier From first equation * = 2u 6x From second equation ( 2u 2x) + ( 2u 6x) (1) x = 0 From constraint u = 2 Optimal solution x* = 0 u* = 2 J* = 36 37 Numerical Optimization 38

Numerical Optimization What if J is too complicated to find an analytical solution for the minimum? or J has multiple minima? Use numerical optimization to find local and/or global solutions 39 wo Approaches to Numerical Minimization Gradient-Free Search Evaluate J and search directly for min J J u o J u n Gradient-Based Search Evaluate J/u and search for zero = J u u=u0 = starting guess = J u n(1 + ) J u n = J u u=un such that J < J u n u n(1 40

Gradient-Free Search (Based on J) Exhaustive grid search Unstructured random search Structured random search 41 Gradient-Free Search Structured random search Nelder-Mead (Downhill Simplex) algorithm Simulated annealing Genetic algorithm Particle-swarm optimization 42

Downhill Simplex Search (Nelder-Mead Algorithm)* Simplex: N-dimensional figure in control space defined by N + 1 vertices (N + 1) N / 2 straight edges between vertices * Basis for MALAB fminsearch 43 Search Procedure for Downhill Simplex Method Select starting set of vertices Evaluate cost at each vertex Determine vertex with largest cost (e.g., J 1 at right) Project search from this vertex through middle of opposite face (or edge for N = 2) Evaluate cost at new vertex (e.g., J 4 at right) Drop J 1 vertex, and form simplex with new vertex Repeat until cost is small 44

Gradient Search to Minimize a Quadratic Function Cost function, gradient, and Hessian matrix of a quadratic function Guess a starting value of u, u o Solve gradient equation J = 1 ( 2 u u* ) R( u u* ), R > 0 = 1 ( 2 u Ru u Ru*u* Ru + u* Ru* ) J u = ( u u* ) R = 0 when u = u* 2 J = R = symmetric constant 2 u J = ( u o u* ) R = ( u o u* ) 2 J u u=uo u 2 u=u o ( u o u* ) = J R 1 u u=uo J u* = u o R 1 ( u u=uo ( row) ( column) 45 For a quadratic cost function J u* = u o R 1 ( u u=uo = u o 2 J ( u 2 u=u o ( 1 Optimal Value Found in a Single Step J ( u u=uo Gradient establishes general search direction Hessian fine-tunes direction and tells exactly how far to go 46

Newton-Raphson Iteration Many cost functions are not quadratic However, the surface is well-approximated by a quadratic in the vicinity of the optimum, u* J ( u*+u) J ( u* ) + J ( u* ) + 2 J ( u* ) +... J J ( u* ) = u = 0 u u=u* 2 J u* ( ) = 1 2 u 2 J u 2 ) u * 0 u=u* ( Optimal solution requires multiple steps 47 Newton-Raphson Iteration Newton-Raphson algorithm is an iterative search patterned after the quadratic search u k +1 = u k 2 J u 2 u=u k ( ( 1 J u u=uk ( ( 48

Difficulties with Newton- Raphson Iteration u k +1 = u k 2 J u 2 u=u k ( ( 1 J u u=uk ( ( Good when close to the optimum, but Hessian matrix (i.e., the curvature) may be Difficult to estimate from local measurements of the cost May have the wrong sign (e.g., not positive-definite) May lead to large errors in incremental control variation 49 Steepest-Descent Algorithm Multiplies Gradient by a Scalar Constant ( Gain ) u k +1 = u k J u u=uk Replace Hessian matrix by a scalar constant Gradient is orthogonal to equal-cost contours ) () 50

Choice of Steepest- Descent Gain If gain is too small Convergence is slow If gain is too large Convergence oscillates or may fail Solution: Make gain adaptive 51 Optimal Steepest- Descent Gain Find optimal gain by evaluating cost, J, for intermediate solutions with same J u Adjustment rule for Starting estimate, J 0 First estimate, J 1, using ( ) Second estimate, J 2, using 2 If J 2 > J 1 Quadratic fit through three points to find * Else, third estimate, J 3, using 4 Use optimal gain, *, on each major iteration 52

Generalized Direct Search Algorithm J u k+1 (t) = u k (t) K k u t ( ) ( k Choose optimal elements of K by sequential line search before recalculating the gradient K k = k 11......... k 22............ 53 Ad Hoc Modifications to the Newton-Raphson Search u k+1 = u k 2 J ) u 2 u=u k () 1 J ) u u=uk (, < 1 u k+1 = u k 2 J u 1 2 0 0 0 0 0 0 2 J 2 u m ( ( ( ( ( ( ( ( ( ( ( ( ( 1 J ( u u=uk u k+1 = u k 2 J + K( u 2 u=u k ( 1 J ( u u=uk, K > 0, diagonal u k+1 = u k 2 J + I) u 2 u=u k () 1 J ) u u=uk ( With varying, this is the Levenberg-Marquardt algorithm 54

Next ime: Principles for Optimal Control of Dynamic Systems Reading: OCE: Sections 3.1, 3.2, 3.4 55 Supplemental Material 56

L p norm = x p = n i=1 Infinity Norm Norm As p -> Norm is the value of the maximum component x i p 1/ p n ) ***( x p() i i=1 1/) = x imax = L ) norm x 1 x x = 2 x n dim(x) = n n Also called Supremum norm Chebyshev norm Uniform norm L norm = x = max{ x 1, x 2,, x n } 57 Gradient-Free vs. Gradient-Based Searches J is a scalar J provides no search direction Search may provide a global minimum J/u is a vector J/u indicates feasible search direction Search defines a local minimum 58

MALAB Implementations of Gradient-Free Search Nelder-Mead (Downhill Simplex) algorithm http://www.mathworks.com/help/matlab/ref/ fminsearch.html Simulated annealing http://www.mathworks.com/help/gads/ simulated-annealing.html Genetic algorithm http://www.mathworks.com/help/gads/ genetic-algorithm.html Example: edit gaconstrained Particle-swarm optimization http://www.mathworks.com/help/gads/ particle-swarm.html 59 Numerical Example Cost function and derivatives J = 1. 0 ( /* 2 0* 1) 5J 5u u 1 u 2 ( = * * ) 1 + ( 3 - * -, ) u 1 u 2 1 + ( 3 - * -, ) 1 2 2 9 1 2 2 9 + ( - *, * ) u 1 u 2 + ( - ; R = *, ) 1 + 2 3-0 3 -, 0 4 1 2 2 9 First guess at optimal control (details of the actual cost function are unknown) u 1 4 = u 2 7 Derivatives evaluated at starting point J ) = + u u=u0 * + J = 1 ( 2 u u *) R( u u *), R > 0 4 7 ( 1, ) 3. + -. * 1 2 2 9,. = - 11 42 0 + -, ; R = ) 1 2, +. * 2 9 - u* = = 4 7 u 1 u 2 * = Solution from starting point J u* = u o R 1 4 7 3 4 = 1 3 u u=uo ( ( ( 9 / 5 2 / 5 + * - ) 2 / 5 1 / 5, 60 11 42

Conjugate Gradient Algorithm First step H u 1 (t) = u k (t) K ( 0 u t) ( Calculate ratio of integrated gradient magnitudes squared b = t f ( t o t f ( t o H ( u t) H ( 1 u t) dt 1 H ( u t) H ( 0 u t) dt 0 0 Calculate gradient of improved trajectory H u t ( ) 1 Second step ) + H u k +1 (t) = u 2 (t) = u 1 (t) K ( 1 u t) * (,+ 1 b H ( u t) -+ (. 0 /+ 61 Rosenbrock Function ypical test function for numerical optimization algorithms J(u 1,u 2 ) = ( 1 u 1 ) 2 2 + 100( u 2 u 1 ) 2 Wolfram Alpha Plot (D ((1-u1)^2+100 (u2 - u1^2)^2, u1), D ((1-u1)^2+100 (u2 - u1^2)^2, u2)) Minimize[(1-u1)^2+100 (u2 - u1^2)^2, u1, u2] 62

How Many Maxima/Minima does the Mexican Hat [z = (sin R)/R] Have? J u u=u* = J J... u 1 u 2 J u m = 0 u=u* One maximum 2 J u 2 u=u* = 2 J u 1 2 2 J u 1 u 2... 2 J 2 J... 2 u 2 u 1 u 2 2 J u 1 u m 2 J u 2 u m............ 2 J 2 J... u m u 1 u 2 u m 2 J u m 2 u=u* > < 0 Prior example Wolfram Alpha (sin(sqrt(u1^2 + u2^2)) ) / (sqrt(u1^2 + u2^2)) maximize(sin(sqrt(u1^2 + u2^2)) ) / (sqrt(u1^2 + u2^2)) 63