Optimization
Escuela de Ingeniería Informática de Oviedo
Dpto. de Matemáticas (UniOvi), Numerical Computation
Unconstrained optimization: Outline
1 Unconstrained optimization
2 Constrained optimization
Unconstrained optimization: Introduction

General problem of optimization (minimization)
Given $f : \Omega \subset \mathbb{R}^n \to \mathbb{R}$, find $x^* \in \Omega$ such that
$$f(x^*) \le f(x) \quad \text{for all } x \in \Omega.$$
$f$ is called the objective function, and $\Omega$ the set of feasible solutions.

Main cases:
- Unconstrained optimization: $\Omega = \mathbb{R}^n$.
- Constrained optimization: $\Omega \subset \mathbb{R}^n$, usually determined by a set of equality or inequality constraints, $h(x) = 0$, $g(x) \le 0$, etc.
Unconstrained optimization

Fact: There are no general techniques for solving the problem of global optimization. Therefore, one usually solves in a weaker sense:

Local optimization
Find $x^* \in \Omega$ such that $f(x^*) \le f(x)$ for all $x$ such that $\|x - x^*\| \le R$.

Exception
If $f$ is a strictly convex function and $\Omega$ is a strictly convex set, then $f$ has a unique global minimum in $\Omega$.
Unconstrained optimization: Recall theory of local optimization

One variable: Solve the optimization problem for $f : \mathbb{R} \to \mathbb{R}$:
- Find the set of critical points $x^*$ (those with $f'(x^*) = 0$).
- If $f''(x^*) > 0$ then $x^*$ is a local minimum.

n variables: Solve the optimization problem for $f : \mathbb{R}^n \to \mathbb{R}$:
- Find critical points $x^*$, which satisfy $\nabla f(x^*) = 0$, i.e.
  $$\partial_{x_1} f(x^*) = 0, \quad \partial_{x_2} f(x^*) = 0, \ \ldots, \ \partial_{x_n} f(x^*) = 0.$$
- Compute the Hessian at $x^*$,
  $$H(f)(x^*) = \left( \partial_{x_i} \partial_{x_j} f(x^*) \right)_{i,j=1}^n.$$
  If positive definite, then $x^*$ is a local minimum.
Unconstrained optimization: Newton's method

Consider the second-order Taylor expansion of $f$, with $\Delta x = x - x_k$:
$$f(x_k + \Delta x) \approx f(x_k) + \nabla f(x_k)^T \Delta x + \tfrac{1}{2} (\Delta x)^T H(x_k) \Delta x.$$
The extremum is attained when the differential with respect to $\Delta x$ equals zero, i.e. when
$$\nabla f(x_k) + H(x_k) \Delta x = 0 \implies \Delta x = -H(x_k)^{-1} \nabla f(x_k).$$
Newton's method is defined by
$$x_{k+1} = x_k - H(x_k)^{-1} \nabla f(x_k).$$

Remark
- Exact for quadratic objective functions, where $H(x)$ is constant.
- Identical to using Newton's method for solving $\nabla f(x) = 0$.
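As an illustration, the iteration above can be coded in a few lines of NumPy. The following is a minimal sketch: the names grad and hess, the tolerance, and the quadratic test problem are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton's method for minimization: x_{k+1} = x_k - H(x_k)^{-1} grad f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # stationary point reached
            break
        # Solve H(x_k) dx = -grad f(x_k); cheaper and more stable than inverting H
        dx = np.linalg.solve(hess(x), -g)
        x = x + dx
    return x

# Quadratic test: f(x, y) = x^2 + 2y^2 + xy - y has a constant Hessian,
# so Newton's method converges in a single step from any initial guess.
grad = lambda x: np.array([2*x[0] + x[1], 4*x[1] + x[0] - 1])
hess = lambda x: np.array([[2.0, 1.0], [1.0, 4.0]])
print(newton_minimize(grad, hess, [5.0, -3.0]))   # approx [-1/7, 2/7]
```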
Unconstrained optimization: Example

Let $f(x, y) = \frac{1}{m}(x^m + \eta y^m)$, where $m > 1$ is an even integer and $\eta > 0$; $(0, 0)$ is a global minimum. Then
$$\nabla f(x, y) = (x^{m-1}, \eta y^{m-1}), \qquad H_f(x, y) = (m-1) \begin{pmatrix} x^{m-2} & 0 \\ 0 & \eta y^{m-2} \end{pmatrix}.$$
Hence
$$\big(H_f(x, y)\big)^{-1} \nabla f(x, y) = \frac{1}{m-1} \begin{pmatrix} x^{2-m} & 0 \\ 0 & \frac{1}{\eta} y^{2-m} \end{pmatrix} \begin{pmatrix} x^{m-1} \\ \eta y^{m-1} \end{pmatrix} = \frac{1}{m-1} \begin{pmatrix} x \\ y \end{pmatrix}.$$
For $\mathbf{x} = (x, y)$, Newton's method gives
$$\mathbf{x}_{k+1} = \mathbf{x}_k - \frac{1}{m-1} \mathbf{x}_k = \frac{m-2}{m-1} \mathbf{x}_k.$$
Unconstrained optimization

$$\mathbf{x}_{k+1} = \frac{m-2}{m-1} \mathbf{x}_k.$$
- If $m = 2$ ($f$ is a paraboloid), Newton's method converges in the first step for any initial guess.
- If $m \neq 2$, the iterative formula gives
  $$\mathbf{x}_{k+1} = \left( \frac{m-2}{m-1} \right)^{k+1} \mathbf{x}_0 \to 0 \ \text{ as } k \to \infty,$$
  for any $\mathbf{x}_0 \in \mathbb{R}^2$, since $|m-2|/(m-1) < 1$. The method converges for any $m > 1$ and for any initial guess.
- If $m$ is very large, $(m-2)/(m-1) \approx 1$, and the convergence is slow.
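A quick numerical check of this contraction factor; the values m = 8 and η = 2 are illustrative choices, not from the slides.

```python
import numpy as np

# f(x, y) = (1/m)(x^m + eta*y^m) with m = 8, eta = 2; the Newton step
# multiplies the iterate by (m-2)/(m-1) = 6/7, so convergence is slow.
m, eta = 8, 2.0
grad = lambda x: np.array([x[0]**(m-1), eta * x[1]**(m-1)])
hess = lambda x: (m - 1) * np.diag([x[0]**(m-2), eta * x[1]**(m-2)])

x = np.array([1.0, 1.0])
for k in range(3):
    x = x - np.linalg.solve(hess(x), grad(x))
    print(k, x)    # each component shrinks by exactly 6/7 per iteration
```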
Unconstrained optimization: Convergence

Theorem
Assume the following conditions:
- $f$ is three times continuously differentiable,
- $x^*$ is a critical point of $f$,
- $H_f(x^*)$ is positive definite.
Then, if $x_0$ is close enough to $x^*$, the iterations of Newton's method converge quadratically to $x^*$, i.e., for some constant $\lambda > 0$,
$$\|x_{k+1} - x^*\| \le \lambda \|x_k - x^*\|^2.$$
Unconstrained optimization: Problems with Newton's method

- For nonlinear functions, Newton's method requires solving a linear system at every step: expensive.
- It may not converge if the initial guess is not good, or it may converge to a saddle point or a maximum: unreliable.

These difficulties are addressed by using variants or quasi-Newton methods:
$$x_{k+1} = x_k - \alpha_k H_k^{-1} \nabla f(x_k),$$
where $0 < \alpha_k < 1$ and $H_k$ is an approximation to the exact Hessian.
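In practice, quasi-Newton methods are rarely coded by hand. A sketch using SciPy's BFGS implementation, which builds an approximation $H_k$ from successive gradients and chooses $\alpha_k$ by a line search; the Rosenbrock test function is a standard example, not taken from these slides.

```python
import numpy as np
from scipy.optimize import minimize

# Rosenbrock function: a classic non-quadratic test with minimum at (1, 1)
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

# BFGS is a quasi-Newton method: the Hessian approximation is updated
# from gradient differences, so no second derivatives are required.
res = minimize(f, x0=np.array([-1.2, 1.0]), method="BFGS")
print(res.x)   # approx [1., 1.]
```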
Unconstrained optimization: Descent methods

Remark
Finding a local minimum is generally easier than the general problem of solving the nonlinear equations $\nabla f(x^*) = 0$, because:
- We can evaluate (and use) $f$, in addition to $\nabla f$.
- The Hessian is positive definite near the solution, which gives algebraic advantages.
Unconstrained optimization: Descent methods

If we have a current guess for the solution, $x_k$, and know a descent direction (downhill) $d_k$, i.e. a direction in which
$$f(x_k + \alpha d_k) < f(x_k) \quad \text{for all } 0 < \alpha \le \alpha_{\max},$$
then we can move downhill and get a point closer to the minimum:
$$x_{k+1} = x_k + \alpha_k d_k,$$
where $\alpha_k$ is a step length.
Unconstrained optimization: Gradient descent

Using Taylor's expansion,
$$f(x_k + \alpha_k d_k) \approx f(x_k) + \alpha_k (\nabla f(x_k))^T d_k.$$
The fastest local decrease is achieved moving opposite to the gradient:
$$d_k = -\nabla f(x_k).$$
For choosing $\alpha_k$ we minimize $\varphi(\alpha) = f(x_k + \alpha d_k)$. This must be done approximately.
Unconstrained optimization: Choosing the step size, $\alpha_k$

$$\varphi(\alpha) = f(x_k + \alpha d_k)$$
We minimize an interpolator of $\varphi$. We have the data $\varphi(0) = f(x_k)$, $\varphi(1) = f(x_k + d_k)$, and $\varphi'(0) = \langle \nabla f(x_k), d_k \rangle = -\langle d_k, d_k \rangle < 0$, so we take, for $\alpha \in [0, 1]$, the quadratic polynomial
$$q(\alpha) = \varphi(0) + \varphi'(0)\alpha + \big(\varphi(1) - \varphi(0) - \varphi'(0)\big)\alpha^2.$$
- If $\varphi(1) - \varphi(0) - \varphi'(0) < 0$, the minimum of $q$ is on the border of $[0, 1]$, and we take $\alpha = 1$ ($\alpha = 0$ stops the iterations).
- Otherwise $q$ has an interior minimum, given by
  $$\alpha_L = \frac{-\varphi'(0)}{2\big(\varphi(1) - \varphi(0) - \varphi'(0)\big)} > 0.$$
Thus, we take $\alpha = \min\{1, \alpha_L\}$.
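Putting the descent direction and this step-size rule together gives the following sketch; the test function, tolerance, and iteration cap are illustrative assumptions.

```python
import numpy as np

def gradient_descent(f, grad, x0, tol=1e-8, max_iter=5000):
    """Steepest descent with the quadratic-interpolation step choice above."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = -grad(x)                      # steepest descent direction
        if np.linalg.norm(d) < tol:
            break
        phi0, phi1 = f(x), f(x + d)       # phi(0) and phi(1)
        dphi0 = -d @ d                    # phi'(0) = <grad f, d> = -<d, d> < 0
        curv = phi1 - phi0 - dphi0        # quadratic coefficient of q
        if curv <= 0:
            alpha = 1.0                   # q is concave: minimum on the border
        else:
            alpha = min(1.0, -dphi0 / (2 * curv))
        x = x + alpha * d
    return x

# Example: f(x, y) = x^2 + 10*y^2, minimum at the origin
f = lambda x: x[0]**2 + 10 * x[1]**2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
print(gradient_descent(f, grad, [1.0, 1.0]))   # close to [0, 0]
```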
Unconstrained optimization: Some properties

If $\alpha_k$ is the exact minimum of $\varphi(\alpha)$ then, using the chain rule, we obtain
$$0 = \varphi'(\alpha_k) = \langle \nabla f(x_k + \alpha_k d_k), d_k \rangle = -\langle \nabla f(x_{k+1}), \nabla f(x_k) \rangle.$$
Thus, $\nabla f(x_k)$ and $\nabla f(x_{k+1})$ are orthogonal: steepest descent takes a zig-zag path down to the minimum.

Error
Steepest descent has linear convergence: for some constant $\lambda > 0$,
$$\|x_{k+1} - x^*\| \le \lambda \|x_k - x^*\|.$$
It can be very slow for ill-conditioned Hessians.
Unconstrained optimization: Example

Consider the function $f(x) = \frac{a}{2} x^2$, with $a \in (0, 1)$, having its unique critical point at $x^* = 0$. An easy computation for the step $\alpha = \min\{1, \alpha_L\}$ shows that $\alpha_L = 1/a > 1$, so we must take $\alpha = 1$. Then
$$x_{k+1} = x_k - \nabla f(x_k) = x_k - a x_k,$$
so we can expect only linear convergence:
$$|x_{k+1} - x_k| = a |x_k| = a |x_k - x^*|.$$
Moreover, we obtain by recursion that $x_k = (1-a)^k x_0$, and therefore, if $a$ is close to zero, the convergence is extremely slow.
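A short numerical check of this slow decay; the value a = 0.01 is an illustrative choice.

```python
a, x = 0.01, 1.0
for k in range(500):
    x -= a * x              # x_{k+1} = (1 - a) x_k
print(x, (1 - a)**500)      # both approx 0.0066: still far from x* = 0 after 500 steps
```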
Unconstrained optimization

Figure: Descent trajectories for $x_k$ and $f(x_k)$.
Constrained optimization: Outline
1 Unconstrained optimization
2 Constrained optimization
Constrained optimization: General formulation

General constrained optimization problem
Given $f : \mathbb{R}^n \to \mathbb{R}$, find $x \in \mathbb{R}^n$ satisfying
$$\min_{x \in \mathbb{R}^n} f(x), \quad \text{subject to} \quad \varphi(x) = 0 \ \text{(equality constraints)}, \quad \psi(x) \le 0 \ \text{(inequality constraints)}.$$
We assume that the functions $f$, $\varphi$ and $\psi$ are differentiable.
Constrained optimization

Theorem (Necessary conditions for constrained problems)
Suppose that $x^*$ is a point of the set
$$U = \{x \in \Omega : \varphi_i(x) = 0, \ 1 \le i \le m\} \subset \Omega,$$
such that the $m$ vectors $\nabla \varphi_i(x^*) \in \mathbb{R}^N$, with $i = 1, \ldots, m$, are linearly independent. Then, if $f$ has a local minimum at $x^*$ relative to the set $U$, there exist $m$ numbers $\lambda_i(x^*)$ such that
$$\nabla f(x^*) + \lambda_1(x^*) \nabla \varphi_1(x^*) + \ldots + \lambda_m(x^*) \nabla \varphi_m(x^*) = 0.$$
The numbers $\lambda_i(x^*)$ are called Lagrange multipliers.
Constrained optimization: Example

If $\varphi : \mathbb{R}^2 \to \mathbb{R}$, the set $\{x \in \mathbb{R}^2 : \varphi(x) = 0\}$ is a curve with normal vector $\nabla \varphi$. At the minimum, we have $\nabla f \parallel \nabla \varphi$, and then $\nabla f + \lambda \nabla \varphi = 0$ for some $\lambda$.

Figure: Left: surface and curve. Right: contour map. The red line shows the constraint $g(x, y) = c$ ($g$ is our $\varphi$). The blue lines are contours of $f(x, y)$. The point where the red line tangentially touches a blue contour is the solution. Since $d_1 > d_2$, the solution is a maximization of $f(x, y)$.
Constrained optimization

The Lagrangian function is $L : \mathbb{R}^N \times \mathbb{R}^m \to \mathbb{R}$ given by
$$L(x, \lambda) = f(x) + \sum_{i=1}^m \lambda_i \varphi_i(x), \quad \text{with } \lambda = (\lambda_1, \ldots, \lambda_m).$$
If $(x^*, \lambda^*)$ is a minimum of $L$ (without constraints) then $\nabla_{(x,\lambda)} L(x^*, \lambda^*) = 0$. Therefore, the optimality conditions with respect to $x$,
$$\nabla f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla \varphi_i(x^*) = 0,$$
and with respect to $\lambda$,
$$\varphi_i(x^*) = 0, \quad i = 1, \ldots, m,$$
hold. We deduce that any $x^*$ such that $(x^*, \lambda^*)$ is a critical point of $L(x, \lambda)$ is a candidate to be a minimum for the constrained problem.
Constrained optimization: Example

Let $f(x_1, x_2) = x_2$ and $\varphi(x_1, x_2) = x_1^2 + x_2^2 - 1$ ($n = 2$, $m = 1$). The set of constraints is, then, the circumference
$$U = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1\}.$$
The Lagrangian function is given by
$$L(x_1, x_2, \lambda) = x_2 + \lambda (x_1^2 + x_2^2 - 1).$$
The critical points are determined by
$$0 = \partial_{x_1} L(x^*, \lambda^*) = 2\lambda^* x_1^*, \quad 0 = \partial_{x_2} L(x^*, \lambda^*) = 1 + 2\lambda^* x_2^*, \quad 0 = \partial_\lambda L(x^*, \lambda^*) = (x_1^*)^2 + (x_2^*)^2 - 1.$$
Solving, we get $x_1^* = 0$, $x_2^* = \pm 1$ and $\lambda^* = -1/(2 x_2^*)$.
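These critical points can be verified symbolically. A short SymPy sketch, an assumed check rather than part of the slides:

```python
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lam", real=True)
L = x2 + lam * (x1**2 + x2**2 - 1)          # Lagrangian of the example

# Critical points of L: gradient with respect to (x1, x2, lam) equals zero
eqs = [sp.diff(L, v) for v in (x1, x2, lam)]
print(sp.solve(eqs, (x1, x2, lam), dict=True))
# Two critical points: (0, -1, 1/2) and (0, 1, -1/2) (order may vary)
```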
Constrained optimization

Theorem (Sufficient conditions for constrained problems)
Let $x^* \in U$, with $U$ as above (equality constraints), and $\lambda^* \in \mathbb{R}^m$ such that
$$\nabla f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla \varphi_i(x^*) = 0.$$
Suppose that the Hessian matrix of $L$ with respect to $x$, given by
$$H(x^*) = H_f(x^*) + (\lambda^*)^T H_\varphi(x^*),$$
is positive definite in $\{y \in \mathbb{R}^N : \nabla \varphi(x^*)^T y = 0\}$. Then $x^*$ is a constrained minimum of $f$ in $U$.

In the previous example,
$$H(x^*) = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} + \lambda^* \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, \qquad M = \{(y_1, y_2) \in \mathbb{R}^2 : x_2^* y_2 = 0\}.$$
Therefore, $H(x^*)$ is positive definite on $M$ only for $x^* = (0, -1)$, where $\lambda^* = 1/2 > 0$. The other critical point of the Lagrangian, $(0, 1)$, corresponds to a constrained maximum.
Constrained optimization: Penalty method

The problem is stated as
$$\min_{x \in S} f(x).$$
Idea: replace $f(x)$ by $f(x) + cP(x)$ and solve an unconstrained problem. To do this, we take $c > 0$ and $P$ satisfying:
1 $P$ is continuous in $\Omega$,
2 $P(x) \ge 0$ for $x \in \Omega$, and
3 $P(x) = 0$ if and only if $x \in S$.
Constrained optimization

Example: Suppose that $S$ is given by
$$S = \{x \in \mathbb{R}^N : \varphi_i(x) \le 0, \ i = 1, \ldots, m\}.$$
An example of penalty function is
$$P(x) = \frac{1}{2} \sum_{i=1}^m \max(0, \varphi_i(x))^2.$$
In the next figure we have an example of $cP(x)$, with $\varphi_1(x) = x - 2$, $\varphi_2(x) = 1 - x$. For $c$ large, the minimum of $f(x) + cP(x)$ lies in a region where $P$ is small. When $c \to \infty$, the solution to the penalty problem converges to the solution of the constrained problem.
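For these two constraints, which describe the set $S = [1, 2]$, the penalty above can be coded directly; a minimal sketch, assuming NumPy:

```python
import numpy as np

# S = [1, 2]: phi1(x) = x - 2 <= 0 and phi2(x) = 1 - x <= 0
def P(x):
    return 0.5 * (np.maximum(0, x - 2)**2 + np.maximum(0, 1 - x)**2)

print(P(np.array([0.5, 1.5, 2.5])))   # [0.125 0.    0.125]: zero exactly on S
```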
Constrained optimization

Figure: Function $cP(x)$ for several values of $c$ ($c = 1, 5, 20$).
Constrained optimization

The procedure to solve the constrained problem by the penalty method is:
- Let $c_k$ be a sequence such that $c_k \ge 0$, $c_{k+1} > c_k$, and $\lim_{k \to \infty} c_k = \infty$.
- Define the functions $q(c, x) = f(x) + cP(x)$.
- For each $k$, assume that the problem $\min q(c_k, x)$ has a solution, $x_k$.

Theorem
Let $x_k$ be a sequence generated by the penalty method. Then, any limit point of the sequence is a solution of the constrained minimization problem.
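A sketch of this loop for the example on the next slide, using SciPy's unconstrained minimizer for each subproblem; the schedule for $c_k$ and the warm start are illustrative choices, not prescribed by the slides.

```python
import numpy as np
from scipy.optimize import minimize

f = lambda v: v[0]**2 + 2 * v[1]**2            # objective
P = lambda v: max(0.0, 1 - v[0] - v[1])**2     # penalty: zero iff x + y >= 1

x = np.zeros(2)
for c in [1, 10, 100, 1000]:                   # increasing penalty parameter c_k
    q = lambda v, c=c: f(v) + c * P(v)         # q(c_k, x) = f(x) + c_k P(x)
    x = minimize(q, x).x                       # warm start from the previous x_k
    print(c, x)    # approaches the constrained minimum (2/3, 1/3)
```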
Constrained optimization

Example: Minimize $f(x, y) = x^2 + 2y^2$ in the set $S = \{(x, y) \in \mathbb{R}^2 : x + y \ge 1\}$. We define the differentiable penalty function
$$P(x, y) = \begin{cases} 0 & \text{if } (x, y) \in S, \\ (x + y - 1)^2 & \text{if } (x, y) \in \mathbb{R}^2 \setminus S, \end{cases}$$
$c_k = k$, and $q_k(x, y) = f(x, y) + c_k P(x, y)$. $P$ and $c_k$ satisfy the conditions. In practice, we use a numerical method (such as the gradient method) to solve the unconstrained minimization of $q_k$. In this (simple) example, we compute the exact solution.
Constrained optimization

We start by computing the critical points.
- If $(x, y) \in S$ is a critical point of $q_k$, then $\nabla q_k(x, y) = (2x, 4y) = (0, 0)$. The solution is $(0, 0) \notin S$; therefore, we disregard this point.
- If $(x, y) \in \mathbb{R}^2 \setminus S$ is a critical point of $q_k$, then
  $$\nabla q_k(x, y) = \big(2(1 + k)x + 2ky - 2k, \ 2kx + 2(2 + k)y - 2k\big) = (0, 0),$$
  with the solution
  $$(x_k, y_k) = \left( \frac{2k}{3k + 2}, \frac{k}{3k + 2} \right).$$
Since $x_k + y_k = 3k/(3k + 2) < 1$, we have $(x_k, y_k) \in \mathbb{R}^2 \setminus S$ for all $k$. The exact constrained minimum of $f$ is obtained taking the limit $k \to \infty$, which gives $(x^*, y^*) = (2/3, 1/3) \in S$.
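The closed-form minimizer and its limit can be double-checked symbolically; a short SymPy sketch, an assumed verification rather than part of the slides:

```python
import sympy as sp

x, y, k = sp.symbols("x y k", positive=True)
q = x**2 + 2*y**2 + k * (x + y - 1)**2         # q_k in R^2 \ S, where x + y < 1

sol = sp.solve([sp.diff(q, x), sp.diff(q, y)], (x, y))
print(sp.simplify(sol[x] - 2*k/(3*k + 2)))     # 0: matches the formula for x_k
print(sp.limit(sol[x], k, sp.oo), sp.limit(sol[y], k, sp.oo))   # 2/3, 1/3
```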