Chapter 4: Unconstrained nonlinear optimization

Edoardo Amaldi, DEIB, Politecnico di Milano
edoardo.amaldi@polimi.it
Website: http://home.deib.polimi.it/amaldi/opt-15-16.shtml

Academic year 2015-16
4.1 Examples

1) Statistical estimation

A random variable $X$ with probability density $f(x,\theta)$, where $\theta \in \mathbb{R}^m$ is the parameter vector, and $n$ independent observations $x_1,\ldots,x_n$ of $X$.

Maximum likelihood: estimates $\hat{\theta}$ of $\theta$ are derived by maximizing

$$L(\theta) = f(x_1,\theta)\, f(x_2,\theta) \cdots f(x_n,\theta).$$

Assumption: there exists $\theta$ for which all factors are positive.

Since $\ln$ is monotonically increasing, $\hat{\theta}$ also maximizes

$$\ln(L(\theta)) = \sum_{j=1}^n \ln(f(x_j,\theta)).$$

If $f$ is differentiable with respect to $\theta$ at $\hat{\theta}$, the necessary optimality conditions are

$$\sum_{j=1}^n \frac{\nabla_\theta f(x_j,\hat{\theta})}{f(x_j,\hat{\theta})} = 0.$$
For the Gaussian density with $\theta = (\mu,\sigma)$, we have

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

$$\ln(L(\theta)) = -\frac{n}{2}\ln(2\pi) - n\ln(\sigma) - \frac{1}{2\sigma^2}\sum_{j=1}^n (x_j-\mu)^2.$$

The maximum is achieved at a stationary point:

$$\frac{\partial \ln(L(\theta))}{\partial \mu} = \frac{1}{\sigma^2}\sum_{j=1}^n (x_j-\mu) = 0$$

and

$$\frac{\partial \ln(L(\theta))}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{j=1}^n (x_j-\hat{\mu})^2 = 0.$$

Therefore

$$\hat{\mu} = \frac{1}{n}\sum_{j=1}^n x_j \qquad \hat{\sigma} = \sqrt{\frac{1}{n}\sum_{j=1}^n (x_j-\hat{\mu})^2}.$$
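As a quick numerical check of the closed-form estimates above, the following sketch computes $\hat{\mu}$ and $\hat{\sigma}$ on a small made-up sample (the data values are purely illustrative) and verifies that both partial derivatives of $\ln L$ vanish there:

```python
import math

# Hypothetical sample: n independent observations of X (illustrative values only).
x = [2.1, 1.9, 2.4, 1.6, 2.0]
n = len(x)

# Closed-form maximum-likelihood estimates derived above.
mu_hat = sum(x) / n
sigma_hat = math.sqrt(sum((xj - mu_hat) ** 2 for xj in x) / n)

# Stationarity check: both partial derivatives of ln L at (mu_hat, sigma_hat).
d_mu = sum(xj - mu_hat for xj in x) / sigma_hat ** 2
d_sigma = -n / sigma_hat + sum((xj - mu_hat) ** 2 for xj in x) / sigma_hat ** 3

print(mu_hat, sigma_hat)  # mu_hat is 2.0 for this sample
print(d_mu, d_sigma)      # both (numerically) zero
```

Both derivatives vanish up to floating-point rounding, confirming that the closed-form estimates are a stationary point of the log-likelihood.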
2) 3-D image reconstruction (computed tomography, see Chapter 1)

Problem: Given $V \subset \mathbb{R}^3$ subdivided into $n$ voxels $V_j$ and the measurements provided by $m$ beams, reconstruct a 3-D image of $V$, that is, determine the density $x_j$ of each $V_j$.

The $i$-th beam attenuation depends on the total amount of matter on the way: $\sum_{j \in J_i} a_{ij} x_j$. Let $b_i$ be the measurement of the $i$-th beam at the exit point. Given $m$ beams with prescribed directions, we have:

$$\sum_{j \in J_i} a_{ij} x_j = b_i \qquad i = 1,\ldots,m$$
$$x_j \geq 0 \qquad j = 1,\ldots,n$$

usually infeasible due to measurement errors, non-uniformity of the $V_j$'s, ...

Since in general $m < n$, one possible formulation:

$$\min \sum_{i=1}^m \Big(b_i - \sum_{j \in J_i} a_{ij} x_j\Big)^2 + \delta \sum_{j=1}^n x_j$$
$$\text{s.t.} \quad x_j \geq 0 \qquad j = 1,\ldots,n$$

with $\delta > 0$.

3) Linear regression ...
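The regularized least-squares formulation above can be sketched numerically. The following toy example (the $3 \times 4$ system $A$, $b$ and the value of $\delta$ are made-up illustrative data, not from any real scanner) applies projected gradient descent, projecting onto the nonnegativity constraints $x_j \geq 0$ at each step:

```python
# Made-up beam/voxel incidence matrix (m = 3 beams, n = 4 voxels) and measurements.
A = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]
b = [2.0, 3.0, 2.5]
delta = 0.01
m, n = len(A), len(A[0])

def objective(x):
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    return sum(ri ** 2 for ri in r) + delta * sum(x)

def gradient(x):
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    return [-2 * sum(A[i][j] * r[i] for i in range(m)) + delta for j in range(n)]

x = [0.0] * n
step = 0.05
for _ in range(2000):
    g = gradient(x)
    x = [max(0.0, xj - step * gj) for xj, gj in zip(x, g)]  # project onto x >= 0

print([round(xj, 3) for xj in x], round(objective(x), 4))
```

Any textbook method for bound-constrained least squares would do here; projected gradient is used only because it fits in a few lines.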
4.2 Optimality conditions

Consider a generic optimization problem

$$\min_{x \in S} f(x)$$

where $S \subseteq \mathbb{R}^n$ and $f \in C^1$ or $C^2$.

Unconstrained case: $S = \mathbb{R}^n$.

Extension of the necessary and sufficient optimality conditions (first and second order), and special case where $f$ and $S$ are convex.

Definition: $d \in \mathbb{R}^n$ is a feasible direction at $\bar{x} \in S$ if $\exists\, \bar{\alpha} > 0$ such that

$$\bar{x} + \alpha d \in S \qquad \forall \alpha \in [0,\bar{\alpha}]. \tag{1}$$

N.B.: At any interior point of $S$ all directions (all $d \in \mathbb{R}^n$) are feasible.
First-order necessary optimality conditions: If $f \in C^1$ on $S$ and $\bar{x}$ is a local minimum of $f$ over $S$, then for any feasible direction $d \in \mathbb{R}^n$ at $\bar{x}$

$$\nabla^t f(\bar{x})\, d \geq 0,$$

namely no feasible direction at $\bar{x}$ is a descent direction.

Proof: According to (1), we consider $\varphi : [0,\bar{\alpha}] \to \mathbb{R}$ such that $\varphi(\alpha) = f(\bar{x} + \alpha d)$. Since $\bar{x}$ is a local minimum of $f$ over $S$, $\alpha = 0$ is a local minimum of $\varphi$.

Taylor expansion of $\varphi$ at $\alpha = 0$:

$$\varphi(\alpha) = \varphi(0) + \alpha \varphi'(0) + o(\alpha).$$

N.B.: $u(\alpha) = o(\alpha)$ if $u(\alpha)$ tends to 0 faster than $\alpha$ when $\alpha \to 0$.

Suppose that $\varphi'(0) < 0$: as $\alpha \to 0^+$ we can neglect the $o(\alpha)$ term and we have $\varphi(\alpha) - \varphi(0) < 0$, which contradicts the local optimality of $\alpha = 0$. Therefore $\varphi'(0) \geq 0$ and, since $\varphi'(\alpha) = \nabla^t f(\bar{x} + \alpha d)\, d$, we have $\nabla^t f(\bar{x})\, d \geq 0$.
Example:

$$\min_{x_1,\, x_2 \geq 0} f(x_1,x_2) = x_1^2 - x_1 + x_2 + x_1 x_2$$

[Contour plot of $f$ over the nonnegative orthant.]

$x^* = (\tfrac{1}{2}\ 0)^t$ is a global minimum because $\nabla^t f(x^*)\, d \geq 0$ for all feasible directions $d$ at $x^*$ (all those with $d_2 \geq 0$), even though $\nabla f(x^*) = (0\ \tfrac{3}{2})^t \neq 0$.
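The claim in this example can be verified directly. A minimal sketch (the sample directions are arbitrary illustrative choices) computes $\nabla f$ at $x^* = (\tfrac{1}{2}, 0)$ and checks that $\nabla^t f(x^*)\, d \geq 0$ for feasible directions, i.e. those with $d_2 \geq 0$:

```python
# f(x1, x2) = x1^2 - x1 + x2 + x1*x2; its gradient is (2*x1 - 1 + x2, 1 + x1).
def grad(x1, x2):
    return (2 * x1 - 1 + x2, 1 + x1)

g = grad(0.5, 0.0)
print(g)  # (0.0, 1.5): nonzero gradient, yet x* is optimal on the orthant

# grad^T d = 1.5 * d2, which is >= 0 whenever d2 >= 0: no feasible descent direction.
directions = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (1.0, 2.0), (-3.0, 0.5)]
print(all(g[0] * d1 + g[1] * d2 >= 0 for d1, d2 in directions))  # True
```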
Second-order necessary optimality conditions: If $f \in C^2$ on $S$ and $\bar{x}$ is a local minimum of $f$ over $S$, then

i) $\nabla^t f(\bar{x})\, d \geq 0$ for every feasible direction $d \in \mathbb{R}^n$ at $\bar{x}$;
ii) if $\nabla^t f(\bar{x})\, d = 0$, then $d^t \nabla^2 f(\bar{x})\, d \geq 0$.

Proof: To verify (ii), we proceed in a similar way. Suppose $\nabla^t f(\bar{x})\, d = 0$; then

$$\varphi(\alpha) = \varphi(0) + \underbrace{\alpha \varphi'(0)}_{=\,0} + \frac{1}{2}\alpha^2 \varphi''(0) + o(\alpha^2).$$

If $\varphi''(0) < 0$, for sufficiently small values of $\alpha$ we have $\varphi(\alpha) - \varphi(0) \approx \frac{1}{2}\alpha^2 \varphi''(0) < 0$, namely $\alpha = 0$ would not be a local minimum of $\varphi$. Hence $\varphi''(0) \geq 0$ and $\varphi''(0) = d^t \nabla^2 f(\bar{x})\, d \geq 0$.
Corollary (unconstrained case): If $f \in C^2$ on $S$ and $\bar{x} \in \operatorname{int}(S)$ is a local minimum of $f$ over $S$, then

1) $\nabla f(\bar{x}) = 0$ (stationarity condition)
2) $\nabla^2 f(\bar{x})$ is positive semidefinite.

Proof: Since $\bar{x} \in \operatorname{int}(S)$, all $d \in \mathbb{R}^n$ are feasible directions at $\bar{x}$. The facts that $\nabla^t f(\bar{x})\, d \geq 0$ for every $d$ and $-d$ imply 1). Point 2) is an immediate consequence of $d^t \nabla^2 f(\bar{x})\, d \geq 0$ for all $d \in \mathbb{R}^n$.

Three types of candidate points: local minima, local maxima and saddle points.

Clearly these optimality conditions are not sufficient. For instance, $f(x) = x^3$ satisfies $f'(0) = 0$ and $f''(0) = 0$, but $x = 0$ is not a local minimum.
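The $x^3$ counterexample above can be checked in a couple of lines: both necessary conditions hold at $x = 0$, yet $f$ takes smaller values arbitrarily close to the left of $0$:

```python
# f(x) = x^3: f'(0) = 0 and f''(0) = 0, but 0 is not a local minimum.
f = lambda x: x ** 3
df = lambda x: 3 * x ** 2    # first derivative
d2f = lambda x: 6 * x        # second derivative

print(df(0.0), d2f(0.0))   # 0.0 0.0 : both necessary conditions hold
print(f(-1e-3) < f(0.0))   # True    : f decreases immediately to the left of 0
```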
Example:

$$\min_{x_1,\, x_2 \geq 0} f(x_1,x_2) = x_1^3 - x_1^2 x_2 + 2 x_2^2$$

[Surface plot of $f$.]

Candidate points: $(0\ 0)$ and $(6\ 9)$. The point $(0\ 0)$ belongs to the boundary, and $(6\ 9)$ is not a local minimum even though, for $x_1 = 6$ fixed, $x_2 = 9$ is a local minimum w.r.t. $x_2$ and, for $x_2 = 9$ fixed, $x_1 = 6$ is a local minimum w.r.t. $x_1$.
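The second-order conditions explain why $(6\ 9)$ fails: both candidates are stationary, but at $(6, 9)$ the Hessian is indefinite, so it is a saddle point. A short check:

```python
# f(x1, x2) = x1^3 - x1^2*x2 + 2*x2^2
def grad(x1, x2):
    return (3 * x1 ** 2 - 2 * x1 * x2, -x1 ** 2 + 4 * x2)

def hessian(x1, x2):
    return [[6 * x1 - 2 * x2, -2 * x1],
            [-2 * x1, 4]]

# Both candidate points are stationary:
print(grad(0.0, 0.0), grad(6.0, 9.0))  # (0.0, 0.0) (0.0, 0.0)

# At (6, 9) the Hessian is [[18, -12], [-12, 4]]; its determinant is negative,
# so the Hessian is indefinite and (6, 9) is a saddle point, not a local minimum.
H = hessian(6.0, 9.0)
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
print(det)  # -72.0
```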
Sufficient optimality conditions: If $f \in C^2$ on $S$ and $\bar{x} \in \operatorname{int}(S)$ is such that $\nabla f(\bar{x}) = 0$ and $\nabla^2 f(\bar{x})$ is positive definite, then $\bar{x}$ is a strict local minimum of $f$ over $S$, namely

$$f(x) > f(\bar{x}) \qquad \forall x \in N_\epsilon(\bar{x}) \cap S,\ x \neq \bar{x}.$$

Proof: Let $d \in B_\epsilon(0)$ be any direction such that $\bar{x} + d \in S \cap B_\epsilon(\bar{x})$. Then

$$f(\bar{x}+d) = f(\bar{x}) + \nabla^t f(\bar{x})\, d + \frac{1}{2} d^t \nabla^2 f(\bar{x})\, d + o(\|d\|^2)$$

with $\nabla f(\bar{x}) = 0$.

Since $\nabla^2 f(\bar{x})$ is positive definite, $\exists\, a > 0$ such that $d^t \nabla^2 f(\bar{x})\, d \geq a \|d\|^2$, with $a$ the smallest eigenvalue of $\nabla^2 f(\bar{x})$. Thus, for $\|d\|$ sufficiently small,

$$f(\bar{x}+d) - f(\bar{x}) \geq \frac{a}{2}\|d\|^2 + o(\|d\|^2) > 0,$$

which implies $f(\bar{x}+d) > f(\bar{x})$, namely $\bar{x}$ is a strict local minimum along $d$. Since this holds for every $d \in \mathbb{R}^n$ such that $\bar{x} + d \in S \cap B_\epsilon(\bar{x})$, $\bar{x}$ is a strict local minimum of $f$ over $S$.
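The proof's key quantity $a$ is the smallest eigenvalue of the Hessian. As an illustration on a made-up quadratic, $f(x_1,x_2) = x_1^2 - x_1 x_2 + x_2^2$, the origin is stationary and the (constant) Hessian is positive definite, so the origin is a strict minimum:

```python
import math

# f(x1, x2) = x1^2 - x1*x2 + x2^2 (illustrative quadratic)
def grad(x1, x2):
    return (2 * x1 - x2, -x1 + 2 * x2)

H = [[2.0, -1.0], [-1.0, 2.0]]  # constant Hessian of this quadratic

print(grad(0.0, 0.0))  # (0.0, 0.0): the origin is stationary

# 2x2 eigenvalues from trace and determinant; both positive => positive definite.
tr = H[0][0] + H[1][1]
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
disc = math.sqrt(tr * tr - 4 * det)
eig_min, eig_max = (tr - disc) / 2, (tr + disc) / 2
print(eig_min, eig_max)  # 1.0 3.0

# With a = eig_min > 0, d^t H d >= a * ||d||^2 for every d, as used in the proof.
```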
Convex problems

$$\min_{x \in C} f(x)$$

where $C \subseteq \mathbb{R}^n$ is convex and $f$ is convex.

We know that, if $f : C \to \mathbb{R}$ is convex, every local minimum is a global minimum.

Necessary and sufficient condition for global optimality: Let $f : C \to \mathbb{R}$ be convex and of class $C^1$ on the convex set $C \subseteq \mathbb{R}^n$. Then $x^*$ is a global minimum of $f$ on $C$ if and only if

$$\nabla^t f(x^*)(y - x^*) \geq 0 \qquad \forall y \in C.$$

Proof: Necessity: if $f \in C^1$ and $x^*$ is a local minimum (and hence, due to convexity, also a global minimum), then $\nabla^t f(x^*)\, d \geq 0$ for all feasible directions $d$ at $x^*$, namely $d = y - x^*$ with $y \in C$.

Sufficiency: $f$ is convex if and only if $f(y) \geq f(x^*) + \nabla^t f(x^*)(y - x^*)$ for all $y \in C$. The assumption $\nabla^t f(x^*)(y - x^*) \geq 0$ implies that $f(y) \geq f(x^*)$ for every $y \in C$.
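A one-dimensional sketch of this condition (the function and set are made-up illustrative choices): for $f(x) = x^2$ over $C = [1, 2]$, the point $x^* = 1$ has $f'(x^*) = 2 \neq 0$, yet $f'(x^*)(y - x^*) \geq 0$ for every $y \in C$, so $x^*$ is the global minimum over $C$:

```python
# f(x) = x^2 over the convex set C = [1, 2]; x* = 1 is the global minimum.
f = lambda x: x ** 2
df = lambda x: 2 * x

x_star = 1.0
ys = [1.0 + 0.1 * k for k in range(11)]  # sample points of C = [1, 2]

print(df(x_star))  # 2.0: nonzero derivative at the constrained minimum
print(all(df(x_star) * (y - x_star) >= 0 for y in ys))  # True
print(all(f(y) >= f(x_star) for y in ys))               # True
```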
Definition: Let $C \subseteq \mathbb{R}^n$ be convex. Then $x \in C$ is an extreme point of $C$ if it cannot be expressed as a convex combination of two different points of $C$, namely

$$x = \alpha x^1 + (1-\alpha) x^2 \ \text{ with } x^1, x^2 \in C \text{ and } \alpha \in (0,1)$$

implies that $x^1 = x^2$.

Property (maximization of convex functions): Let $f$ be a convex function defined on the convex, bounded and closed set $C$. If $f$ has a (finite) maximum over $C$, then there exists an optimal extreme point of $C$.

Proof: Suppose that $x^*$ is a global maximum of $f$ over $C$ but not an extreme point.

1) Verify that the maximum is achieved at a point on the boundary $\partial C$. Since $C$ is convex, bounded and closed, any $x^* \in \operatorname{int}(C)$ can be expressed as a convex combination of two points of the boundary: $\exists\, y^1, y^2 \in \partial C$ and $\alpha \in (0,1)$ such that $x^* = \alpha y^1 + (1-\alpha) y^2$. By convexity,

$$f(x^*) \leq \alpha f(y^1) + (1-\alpha) f(y^2) \leq \max\{f(y^1), f(y^2)\}.$$

Since $f(y^1), f(y^2) \leq f(x^*)$ and $\alpha \in (0,1)$, this forces $f(y^1) = f(y^2) = f(x^*)$; thus $y^1$ and $y^2$ are also global maxima.
2) Suppose that $x^* \in \partial C$ is not an extreme point. Consider the intersection $T_1 = C \cap H$, where $H$ is a supporting hyperplane of $C$ at $x^*$. Clearly $T_1$ is of dimension at most $n-1$. Since $T_1$ is compact, there exists a global maximum $x^1$ of $f$ over $T_1$ such that

$$\max_{x \in T_1} f(x) = f(x^1) = f(x^*)$$

and, as previously, we can take $x^1 \in \partial T_1$.

Claim: If $x^1$ is an extreme point of $T_1$, then $x^1$ is also an extreme point of $C$.

If $x^1$ is not an extreme point of $T_1$, we similarly define $T_2, \ldots$ In the worst case, $\dim(T_n) = 0$. Such an isolated point $x^n$ is clearly an extreme point. Since an extreme point of $T_i$ is also an extreme point of $T_{i-1}$, $x^n$ must be an extreme point of $C$.

Illustrations: a polyhedron and a convex set with an infinite number of extreme points.

Particular case: Linear programming (a linear function is both convex and concave, and the polyhedron of the feasible solutions has a finite number of extreme points).
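The property above can be illustrated in one dimension (a made-up example): for the convex function $f(x) = x^2$ on the compact convex set $C = [-1, 2]$, the extreme points are the two endpoints, and the maximum is attained at one of them:

```python
# Maximizing the convex f(x) = x^2 over C = [-1, 2] by brute-force sampling:
# the maximizer is the extreme point (endpoint) x = 2.
f = lambda x: x * x
C_samples = [k / 100 - 1 for k in range(301)]  # grid over [-1, 2]

best = max(C_samples, key=f)
print(best, f(best))  # 2.0 4.0
```

The interior sample points all have strictly smaller values, consistent with the proof: a maximum in the interior would force equal values at boundary points.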