Nonlinear Programming Models p. 1


Nonlinear Programming Models (Fabio Schoen)

Introduction: NLP problems, local and global optima

  min f(x),  x ∈ S ⊆ R^n

Standard form:

  min f(x)
  h_i(x) = 0,  i = 1,…,m
  g_j(x) ≤ 0,  j = 1,…,k

Here S = {x ∈ R^n : h_i(x) = 0 ∀i, g_j(x) ≤ 0 ∀j}.

A global minimum (or global optimum) is any x* ∈ S such that

  f(x*) ≤ f(x)  ∀x ∈ S.

A point x̄ ∈ S is a local optimum if there exists ε > 0 such that

  f(x̄) ≤ f(x)  ∀x ∈ S ∩ B(x̄, ε),

where B(x̄, ε) = {x ∈ R^n : ‖x − x̄‖ ≤ ε} is a ball in R^n. Any global optimum is also a local optimum, but the converse is generally false.

Convex Functions

A set S ⊆ R^n is convex if x, y ∈ S implies λx + (1−λ)y ∈ S for every λ ∈ [0,1]. Let Ω ⊆ R^n be a non-empty convex set. A function f : Ω → R is convex iff for all x, y ∈ Ω and λ ∈ [0,1]

  f(λx + (1−λ)y) ≤ λf(x) + (1−λ)f(y).

Properties of convex functions

Every convex function is continuous in the interior of Ω; it may be discontinuous, but only on the boundary. If f is continuously differentiable, then f is convex iff for all x, y ∈ Ω

  f(y) ≥ f(x) + (y − x)^T ∇f(x).
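The defining inequality can be spot-checked numerically. The sketch below samples random pairs of points and a random λ and tests f(λx + (1−λ)y) ≤ λf(x) + (1−λ)f(y); a single violation certifies non-convexity, while the absence of violations is only evidence (the function names and sample ranges are illustrative choices, not part of the slides):

```python
import random

def is_convex_on_samples(f, dim, trials=2000, seed=0, tol=1e-9):
    """Spot-check f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y) at random points."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
        y = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
        lam = rng.random()
        z = [lam * a + (1.0 - lam) * b for a, b in zip(x, y)]
        if f(z) > lam * f(x) + (1.0 - lam) * f(y) + tol:
            return False   # found a certificate of non-convexity
    return True            # no violation found (evidence, not a proof)

# The squared Euclidean norm is convex; its negation is concave.
convex_ok = is_convex_on_samples(lambda v: sum(t * t for t in v), 2)
concave_ok = is_convex_on_samples(lambda v: -sum(t * t for t in v), 2)
```

For the concave example a violating triple is found almost immediately, since the inequality fails strictly at every midpoint of distinct samples.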

If f is twice continuously differentiable, then f is convex iff its Hessian matrix

  ∇²f(x) := [ ∂²f/∂x_i∂x_j ]

is positive semi-definite: ∇²f(x) ⪰ 0, i.e. v^T ∇²f(x) v ≥ 0 ∀v ∈ R^n or, equivalently, all eigenvalues of ∇²f(x) are non-negative.

Example: an affine function is convex (and concave). For a quadratic function (Q: symmetric matrix)

  f(x) = ½ x^T Q x + b^T x + c

we have ∇f(x) = Qx + b and ∇²f(x) = Q, so f is convex iff Q ⪰ 0.

Convex Optimization Problems

min f(x), x ∈ S is a convex optimization problem iff S is a convex set and f is convex on S. For a problem in standard form

  min f(x)
  h_i(x) = 0,  i = 1,…,m
  g_j(x) ≤ 0,  j = 1,…,k

if f is convex, the h_i(x) are affine functions and the g_j(x) are convex functions, then the problem is convex.

Maximization. With a slight abuse of notation, a problem max f(x), x ∈ S is called convex iff S is a convex set and f is a concave function (not to be confused with minimization of a concave function, or maximization of a convex function, which are NOT convex optimization problems).
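For a quadratic, the Hessian is the constant matrix Q, so the convexity test reduces to an eigenvalue computation. A minimal sketch (the example matrices are arbitrary):

```python
import numpy as np

def quadratic_is_convex(Q, tol=1e-10):
    """f(x) = 0.5 x^T Q x + b^T x + c is convex iff Q (symmetrized) is PSD."""
    Qs = 0.5 * (Q + Q.T)   # only the symmetric part contributes to x^T Q x
    return bool(np.linalg.eigvalsh(Qs).min() >= -tol)

psd = quadratic_is_convex(np.array([[2.0, 0.0], [0.0, 1.0]]))    # eigenvalues 2, 1
indef = quadratic_is_convex(np.array([[1.0, 0.0], [0.0, -1.0]])) # one negative eigenvalue
```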

Convex and non-convex optimization

Convex optimization is easy; non-convex optimization is usually very hard. Fundamental property of convex optimization problems: every local optimum is also a global optimum (a proof is given later). Minimizing a positive semidefinite quadratic function over a polyhedron is easy (polynomially solvable); if even a single eigenvalue of the Hessian is negative, the problem becomes NP-hard.

Convex functions: examples

Many (of course not all…) functions are convex:
- affine functions a^T x + b
- quadratic functions ½ x^T Q x + b^T x + c with Q = Q^T, Q ⪰ 0
- any norm is a convex function
- x log x (however, log x is concave)
- f is convex if and only if, for every x_0, d ∈ R^n, its restriction to any line φ(α) = f(x_0 + αd) is a convex function
- a linear non-negative combination of convex functions is convex
- g(x,y) convex in x for all y ⇒ ∫ g(x,y) dy is convex

More examples:
- max_i {a_i^T x + b_i} is convex
- f, g convex ⇒ max{f(x), g(x)} is convex
- f_a convex for every a ∈ A (a possibly uncountable set) ⇒ sup_{a∈A} f_a(x) is convex
- f convex ⇒ f(Ax + b) convex
- for any set S ⊆ R^n, f(x) = sup_{s∈S} ‖x − s‖ is convex
- Trace(A^T X) = Σ_{i,j} A_ij X_ij is convex (it is linear!)
- −log det X is convex over the set of matrices X ∈ R^{n×n}, X ≻ 0
- λ_max(X) (the largest eigenvalue of a matrix X) is convex

Data Approximation

Table of contents: norm approximation; maximum likelihood; robust estimation.

Problem: norm approximation

  min_x ‖Ax − b‖

where A, b are parameters. Usually the system is over-determined, i.e. b ∉ Range(A). For example, this happens when A ∈ R^{m×n} with m > n and A has full rank. r := Ax − b is the residual.

Examples

- ‖r‖ = √(r^T r): least squares (or "regression")
- ‖r‖ = √(r^T P r) with P ≻ 0: weighted least squares
- ‖r‖ = max_i |r_i|: minimax, or ℓ_∞, or Tchebichev approximation
- ‖r‖ = Σ_i |r_i|: absolute or ℓ_1 approximation

Possible (convex) additional constraints:
- maximum deviation from an initial estimate: ‖x − x_est‖ ≤ ε
- simple bounds: l_i ≤ x_i ≤ u_i
- ordering: x_1 ≤ x_2 ≤ … ≤ x_n

[Figure: histogram of the residuals of an ℓ_1-norm approximation for a sample matrix A.]
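For the ℓ_2 case the problem is ordinary least squares, and the optimality condition is that the residual is orthogonal to the range of A (i.e. A^T r = 0). A small sketch on synthetic data (the particular sizes and noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))                  # over-determined: m = 20 > n = 3
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.1 * rng.normal(size=20)    # b is (almost surely) not in Range(A)

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)  # minimizes ||A x - b||_2
r = A @ x_ls - b                              # residual at the optimum
grad_norm = np.linalg.norm(A.T @ r)           # normal equations: A^T r = 0
```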

[Figure: comparison of the residual histograms of the ℓ_1-norm and ℓ_2-norm approximations.]

Variants

  min_x Σ_i h(y_i − a_i^T x)

where h is a convex penalty function, e.g.:

- linear-quadratic: h(z) = z² if |z| ≤ 1, 2|z| − 1 if |z| > 1
- dead zone: h(z) = 0 if |z| ≤ 1, |z| − 1 if |z| > 1
- logarithmic barrier: h(z) = −log(1 − z²) if |z| < 1, +∞ otherwise

[Figure: comparison of the penalties |z| (ℓ_1), z² (ℓ_2), linear-quadratic, dead zone and logarithmic barrier.]

Maximum likelihood

Maximum likelihood estimate (MLE). Given a sample X_1, X_2, …, X_k and a parametric family of probability density functions L(·; θ), the maximum likelihood estimate of θ given the sample is

  θ̂ = arg max_θ L(X_1, …, X_k; θ)

Example: linear measurements with additive i.i.d. (independent, identically distributed) noise:

  X_i = a_i^T θ + ε_i    (1)

where the ε_i are i.i.d. random variables with density p(·):

  L(X_1, …, X_k; θ) = Π_{i=1}^k p(X_i − a_i^T θ)

Taking the logarithm (which does not change the optimum points):

  θ̂ = arg max_θ Σ_i log p(X_i − a_i^T θ)

If p is log-concave this problem is convex. Examples:

- ε ~ N(0, σ²), i.e. p(z) = (2πσ²)^{-1/2} exp(−z²/(2σ²)): the MLE is the ℓ_2 estimate, θ̂ = arg min_θ ‖Aθ − X‖_2
- p(z) = (1/(2a)) exp(−|z|/a) (Laplace): the ℓ_1 estimate, θ̂ = arg min_θ ‖Aθ − X‖_1
- p(z) = (1/a) exp(−z/a) 1{z ≥ 0} (negative exponential): the estimate can be found by solving the LP problem min 1^T(X − Aθ) s.t. Aθ ≤ X
- p uniform on [−a, a]: the MLE is any θ such that ‖Aθ − X‖_∞ ≤ a

Ellipsoids

An ellipsoid is a subset of R^n of the form

  E = {x ∈ R^n : (x − x_0)^T P^{-1} (x − x_0) ≤ 1}

where x_0 ∈ R^n is the center of the ellipsoid and P is a symmetric positive-definite matrix. Alternative representations:

  E = {x ∈ R^n : ‖Ax − b‖_2 ≤ 1}, where A ≻ 0, or
  E = {x ∈ R^n : x = x_0 + Au, ‖u‖_2 ≤ 1}

where A is square and non-singular (an affine transformation of the unit ball).

Robust Least Squares

Least squares: x̂ = arg min_x Σ_i (a_i^T x − b_i)². Hypothesis: the a_i are not known exactly, but it is known that

  a_i ∈ E_i = {ā_i + P_i u : ‖u‖ ≤ 1},  P_i = P_i^T ⪰ 0.

Definition: worst-case residuals

  Σ_i max_{a_i ∈ E_i} (a_i^T x − b_i)²

A robust estimate of x is the solution of

  x̂_r = arg min_x Σ_i max_{a_i ∈ E_i} (a_i^T x − b_i)²

It holds that, for ‖y‖ ≤ 1, |α + β^T y| ≤ |α| + ‖β‖; choosing y = β/‖β‖ if α ≥ 0 and y = −β/‖β‖ if α < 0, then ‖y‖ = 1 and |α + β^T y| = |α| + ‖β‖. Then

  max_{a_i ∈ E_i} |a_i^T x − b_i| = max_{‖u‖≤1} |ā_i^T x − b_i + u^T P_i x| = |ā_i^T x − b_i| + ‖P_i x‖

Thus the robust least squares problem reduces to

  min_x ( Σ_i ( |ā_i^T x − b_i| + ‖P_i x‖ )² )^{1/2}

(a convex optimization problem). Transformation:

  min_{x,t} ‖t‖_2
  |ā_i^T x − b_i| + ‖P_i x‖ ≤ t_i  ∀i

i.e.

  min_{x,t} ‖t‖_2
  ā_i^T x − b_i + ‖P_i x‖ ≤ t_i  ∀i
  −ā_i^T x + b_i + ‖P_i x‖ ≤ t_i  ∀i

(a second-order cone problem). A norm cone is the convex set C = {(x,t) ∈ R^{n+1} : ‖x‖ ≤ t}.

Geometrical Problems

- projections and distances
- polyhedral intersection
- extremal volume ellipsoids
- classification problems

Projection on a set

Given a set C, the projection of x on C is defined as

  P_C(x) = arg min_{z ∈ C} ‖x − z‖

Projection on a convex set

If C = {x : Ax = b, f_i(x) ≤ 0, i = 1,…,m} where the f_i are convex, then C is a convex set and the problem

  P_C(x) = arg min_z ‖x − z‖
  Az = b
  f_i(z) ≤ 0,  i = 1,…,m

is convex.
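For simple convex sets the projection has a closed form; a box is the easiest case, where projecting just clips each coordinate. A minimal sketch, with a sampling check that no feasible point is closer (the specific box and point are arbitrary):

```python
import random

def project_box(x, l, u):
    """Projection onto {z : l <= z <= u}: clip coordinate-wise."""
    return [min(max(xi, li), ui) for xi, li, ui in zip(x, l, u)]

def d2(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

x = [2.0, -3.0, 0.5]
p = project_box(x, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])

# No randomly sampled feasible point should be strictly closer to x than p.
rng = random.Random(1)
closer = all(d2(x, p) <= d2(x, [rng.random() for _ in range(3)]) + 1e-12
             for _ in range(1000))
```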

Distance between convex sets

  dist(C⁽¹⁾, C⁽²⁾) = min { ‖x − y‖ : x ∈ C⁽¹⁾, y ∈ C⁽²⁾ }

If C⁽ʲ⁾ = {x : A⁽ʲ⁾x = b⁽ʲ⁾, f_i⁽ʲ⁾(x) ≤ 0}, then the minimum distance can be found through a convex model:

  min ‖x⁽¹⁾ − x⁽²⁾‖
  A⁽¹⁾x⁽¹⁾ = b⁽¹⁾
  A⁽²⁾x⁽²⁾ = b⁽²⁾
  f_i⁽¹⁾(x⁽¹⁾) ≤ 0
  f_i⁽²⁾(x⁽²⁾) ≤ 0

Polyhedral intersection

Case 1: polyhedra described by means of linear inequalities: P_1 = {x : Ax ≤ b}, P_2 = {x : Cx ≤ d}.

P_1 ∩ P_2 ≠ ∅? It is a linear feasibility problem: Ax ≤ b, Cx ≤ d.

P_1 ⊆ P_2? Just check

  sup { c_k^T x : Ax ≤ b } ≤ d_k  ∀k

(solution of a finite number of LPs).
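The distance between two convex sets can also be computed by alternating projections (project onto one set, then the other, and repeat); for disjoint closed convex sets this converges to a pair of closest points. A sketch using two boxes, where each projection is a clip and the true gap is known (this iterative scheme is an illustration, not the slides' LP/QP formulation):

```python
import math

def clip(x, l, u):
    return [min(max(xi, li), ui) for xi, li, ui in zip(x, l, u)]

def box_distance(l1, u1, l2, u2, iters=100):
    """Alternating projections x <- P1(P2(x)) between two boxes."""
    x = [(a + b) / 2.0 for a, b in zip(l1, u1)]   # start inside box 1
    y = x
    for _ in range(iters):
        y = clip(x, l2, u2)   # project current point onto box 2
        x = clip(y, l1, u1)   # project back onto box 1
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Unit square [0,1]^2 vs the square [3,4] x [0,1]: the gap along x is 2.
d = box_distance([0.0, 0.0], [1.0, 1.0], [3.0, 0.0], [4.0, 1.0])
```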

Polyhedral intersection (2)

Case 2: polyhedra (polytopes) described through vertices: P_1 = conv{v_1,…,v_k}, P_2 = conv{w_1,…,w_h}.

P_1 ∩ P_2 ≠ ∅? Need to find λ_1,…,λ_k, μ_1,…,μ_h ≥ 0 such that

  Σ_i λ_i = Σ_j μ_j = 1,  Σ_i λ_i v_i = Σ_j μ_j w_j

P_1 ⊆ P_2? For every i = 1,…,k check whether there exist μ_j ≥ 0 with

  Σ_j μ_j = 1,  Σ_j μ_j w_j = v_i

Minimal ellipsoid containing k points

Given v_1,…,v_k ∈ R^n, find an ellipsoid E = {x : ‖Ax − b‖ ≤ 1} with minimal volume containing the k given points. With A = A^T ≻ 0, the volume of E is proportional to det A^{-1}, so this is a convex optimization problem in the unknowns A, b:

  min −log det A
  A = A^T ≻ 0
  ‖Av_i − b‖ ≤ 1,  i = 1,…,k

[Figure: minimal-volume ellipsoid enclosing a set of points.]

Maximum-volume ellipsoid contained in a polyhedron

Given P = {x : Ax ≤ b}, find an ellipsoid E = {By + d : ‖y‖ ≤ 1} contained in P with maximum volume.

Maximum-volume ellipsoid contained in a polyhedron (continued)

  E ⊆ P  ⇔  a_i^T (By + d) ≤ b_i  ∀y : ‖y‖ ≤ 1
         ⇔  sup_{‖y‖≤1} { a_i^T B y } + a_i^T d ≤ b_i  ∀i
         ⇔  ‖B a_i‖ + a_i^T d ≤ b_i  ∀i

  max_{B,d} log det B
  B = B^T ≻ 0
  ‖B a_i‖ + a_i^T d ≤ b_i,  i = 1,…

Difficult variants

These problems are hard:
- find a maximal volume ellipsoid contained in a polyhedron given by its vertices;
- find a minimal volume ellipsoid containing a polyhedron described as a system of linear inequalities.

It is already a difficult problem to decide whether a given ellipsoid E contains a polyhedron P = {x : Ax ≤ b}. This problem remains difficult even when the ellipsoid is a sphere: it is equivalent to norm maximization over a polyhedron, an NP-hard concave optimization problem.

[Figure: maximum-volume ellipsoid inscribed in a polyhedron given by its vertices.]

Linear classification (separation)

Given two point sets X_1,…,X_k and Y_1,…,Y_h, find a hyperplane a^T x = t such that

  a^T X_i ≥ t,  i = 1,…,k
  a^T Y_j ≤ t,  j = 1,…,h

(an LP feasibility problem).

Robust separation

Find a maximal separation:

  max_{a : ‖a‖ ≤ 1} ( min_i a^T X_i − max_j a^T Y_j )

equivalent to the convex problem:

  max t_1 − t_2
  a^T X_i ≥ t_1,  i = 1,…,k
  a^T Y_j ≤ t_2,  j = 1,…,h
  ‖a‖ ≤ 1
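For linearly separable data a separating hyperplane can also be found with a simple perceptron iteration; this stands in for the LP feasibility formulation above (it is a different algorithm, shown here only as a self-contained sketch, and the two toy point sets are hypothetical):

```python
def perceptron(pos, neg, epochs=100):
    """Find (w, b) with w.z + b > 0 on pos and < 0 on neg (separable 2-D data)."""
    w, b = [0.0, 0.0], 0.0
    data = [(p, 1.0) for p in pos] + [(q, -1.0) for q in neg]
    for _ in range(epochs):
        errors = 0
        for (z1, z2), s in data:
            if s * (w[0] * z1 + w[1] * z2 + b) <= 0.0:   # misclassified
                w[0] += s * z1; w[1] += s * z2; b += s   # perceptron update
                errors += 1
        if errors == 0:
            break
    return w, b

X = [(2.0, 2.0), (3.0, 1.0), (2.5, 3.0)]      # hypothetical class 1
Y = [(-1.0, 0.0), (0.0, -2.0), (-2.0, -1.0)]  # hypothetical class 2
w, b = perceptron(X, Y)
```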

Optimality Conditions (Fabio Schoen)

Optimality Conditions: descent directions

Let S ⊆ R^n be a convex set and consider the problem min_{x∈S} f(x), where f : S → R. Let x_1, x_2 ∈ S and d = x_2 − x_1: d is a feasible direction. If there exists ε̄ > 0 such that

  f(x_1 + εd) < f(x_1)  ∀ε ∈ (0, ε̄),

d is called a descent direction at x_1. Elementary necessary optimality condition: if x* is a local optimum, no descent direction may exist at x*.

Optimality Conditions for Convex Sets

If x* ∈ S is a local optimum for f(·) and there exists a neighborhood U(x*) such that f ∈ C¹(U(x*)), then

  d^T ∇f(x*) ≥ 0  for every feasible direction d.

Proof. Taylor expansion:

  f(x* + εd) = f(x*) + ε d^T ∇f(x*) + o(ε)

d cannot be a descent direction, so, if ε is sufficiently small, f(x* + εd) ≥ f(x*). Thus

  ε d^T ∇f(x*) + o(ε) ≥ 0

and, dividing by ε,

  d^T ∇f(x*) + o(ε)/ε ≥ 0.

Letting ε → 0 the proof is complete.

Optimality Conditions: tangent cone

General case:

  min f(x)
  g_i(x) ≤ 0,  i = 1,…,m
  x ∈ X  (X: an open set)

Let S = {x ∈ X : g_i(x) ≤ 0, i = 1,…,m}. The tangent cone to S at x̄ is

  T(x̄) = { d ∈ R^n : d = lim_{x_k → x̄} (x_k − x̄)/‖x_k − x̄‖, x_k ∈ S }

Some examples

- S = R^n ⇒ T(x) = R^n
- S = {x : Ax = b} ⇒ T(x) = {d : Ad = 0}
- S = {x : Ax ≤ b}; let I be the set of active constraints at x̄: a_i^T x̄ = b_i for i ∈ I, a_i^T x̄ < b_i for i ∉ I.

Let d = lim_k (x_k − x̄)/‖x_k − x̄‖. For i ∈ I:

  a_i^T d = a_i^T lim_k (x_k − x̄)/‖x_k − x̄‖
          = lim_k a_i^T (x_k − x̄)/‖x_k − x̄‖
          = lim_k (a_i^T x_k − b_i)/‖x_k − x̄‖
          ≤ 0

Thus if d ∈ T(x̄), then a_i^T d ≤ 0 for i ∈ I.

Vice versa, let x_k = x̄ + α_k d. If a_i^T d ≤ 0 for i ∈ I, then

  a_i^T x_k = a_i^T (x̄ + α_k d) = b_i + α_k a_i^T d ≤ b_i,  i ∈ I

and, for i ∉ I,

  a_i^T x_k = a_i^T x̄ + α_k a_i^T d < b_i  if α_k is small enough.

Thus T(x̄) = {d : a_i^T d ≤ 0 ∀i ∈ I}.

Example

Let S = {(x,y) ∈ R² : x² − y = 0} (a parabola). Tangent cone at (0,0)? Let {(x_k, y_k)} → (0,0), i.e. x_k → 0, y_k = x_k². Then

  ‖(x_k, y_k) − (0,0)‖ = √(x_k² + x_k⁴) = |x_k| √(1 + x_k²)

and

  lim_{x_k→0⁺} x_k/(|x_k|√(1+x_k²)) = 1,  lim_{x_k→0⁻} x_k/(|x_k|√(1+x_k²)) = −1,
  lim_{x_k→0} y_k/(|x_k|√(1+x_k²)) = 0,

thus T(0,0) = {(1,0), (−1,0)} (the two horizontal unit directions).

Descent direction

d ∈ R^n is a feasible direction at x̄ ∈ S if there exists ᾱ > 0 such that x̄ + αd ∈ S ∀α ∈ [0, ᾱ). d feasible ⇒ d ∈ T(x̄), but in general the converse is false. If f(x̄ + αd) ≤ f(x̄) ∀α ∈ (0, ᾱ), d is a descent direction.

First order necessary optimality condition

Let x̄ ∈ S ⊆ R^n be a local optimum for min_{x∈S} f(x), and let f ∈ C¹(U(x̄)). Then

  d^T ∇f(x̄) ≥ 0  ∀d ∈ T(x̄)

Proof. Let d = lim_k (x_k − x̄)/‖x_k − x̄‖. Taylor expansion:

  f(x_k) = f(x̄) + ∇^T f(x̄)(x_k − x̄) + o(‖x_k − x̄‖)
         = f(x̄) + ∇^T f(x̄)(x_k − x̄) + ‖x_k − x̄‖ o(1).

x̄ local optimum ⇒ there exists U(x̄) such that f(x) ≥ f(x̄) ∀x ∈ U ∩ S.

If k is large enough, x_k ∈ U(x̄): f(x_k) − f(x̄) ≥ 0, thus

  ∇^T f(x̄)(x_k − x̄) + ‖x_k − x̄‖ o(1) ≥ 0.

Dividing by ‖x_k − x̄‖:

  ∇^T f(x̄)(x_k − x̄)/‖x_k − x̄‖ + o(1) ≥ 0

and in the limit ∇^T f(x̄) d ≥ 0.

Examples: unconstrained problems

Every d ∈ R^n belongs to the tangent cone at a local optimum:

  ∇^T f(x̄) d ≥ 0  ∀d ∈ R^n.

Choosing d = e_i and d = −e_i we get ∇f(x̄) = 0. NB: the same is true if x̄ is a local minimum in the relative interior of the feasible region.

Linear equality constraints

  min f(x)
  Ax = b

Tangent cone: {d : Ad = 0}. Necessary conditions: ∇^T f(x̄) d ≥ 0 ∀d : Ad = 0. Equivalent statement:

  min_d ∇^T f(x̄) d = 0
  Ad = 0

(a linear program). From LP duality:

  max 0^T λ = 0
  A^T λ = −∇f(x̄)

Thus at a local minimum point there exist Lagrange multipliers λ : A^T λ = −∇f(x̄).

Linear inequalities

  min f(x)
  Ax ≤ b

Tangent cone at a local minimum x̄: {d ∈ R^n : a_i^T d ≤ 0, i ∈ I(x̄)}. Let A_I be the rows of A associated to the active constraints at x̄. Then

  min_d ∇^T f(x̄) d = 0
  A_I d ≤ 0

From LP duality:

  max 0^T λ = 0
  A_I^T λ = −∇f(x̄)
  λ ≥ 0

Thus, at a local optimum, the gradient is a non-positive linear combination of the coefficient rows of the active constraints.

Farkas Lemma

Let A be a matrix in R^{m×n} and b ∈ R^m. One and only one of the following two sets is non-empty:

  {x : Ax = b, x ≥ 0}   or   {y : A^T y ≤ 0, b^T y > 0}

Geometrical interpretation: either b belongs to the cone {z : ∃x ≥ 0, z = Ax} generated by the columns of A, or there exists a vector y making an obtuse angle with all columns of A ({y : A^T y ≤ 0}) and an acute angle with b.

Proof. 1) If there exists x ≥ 0 with Ax = b, then b^T y = x^T A^T y; thus A^T y ≤ 0 implies b^T y ≤ 0, so the second set is empty.

2) Premise (separating hyperplane theorem): let C and D be two convex non-empty sets with C ∩ D = ∅. Then there exist a ≠ 0 and β:

  a^T x ≤ β  ∀x ∈ C,   a^T x ≥ β  ∀x ∈ D.

If C is a point and D is a closed convex set, the separation is strict, i.e. a^T C < β and a^T x > β ∀x ∈ D.

Farkas Lemma (proof of part 2): let {x : Ax = b, x ≥ 0} = ∅, and let S = {y ∈ R^m : ∃x ≥ 0, Ax = y}. S is closed and convex, and b ∉ S. From the separating hyperplane theorem there exist α ∈ R^m, α ≠ 0, and β ∈ R:

  α^T y ≤ β  ∀y ∈ S,   α^T b > β.

0 ∈ S ⇒ β ≥ 0 ⇒ α^T b > 0; α^T Ax ≤ β for all x ≥ 0, which is possible iff α^T A ≤ 0. Letting y = α we obtain a solution of A^T y ≤ 0, b^T y > 0.

First order feasible variations cone

  G(x̄) = {d ∈ R^n : ∇^T g_i(x̄) d ≤ 0, i ∈ I}

It holds that G(x̄) ⊇ T(x̄). In fact, let {x_k} ⊆ S be feasible with x_k → x̄ and

  d = lim_k (x_k − x̄)/‖x_k − x̄‖ ∈ T(x̄).

Let α_k = ‖x_k − x̄‖, with α_k ↓ 0, so that x_k = x̄ + α_k d + o(α_k). For every active constraint i ∈ I, g_i(x̄) = 0 and

  g_i(x̄ + α_k d) = g_i(x̄) + α_k ∇^T g_i(x̄) d + o(α_k) = α_k ∇^T g_i(x̄) d + o(α_k) ≤ 0

so that

  g_i(x̄ + α_k d)/α_k = ∇^T g_i(x̄) d + o(α_k)/α_k ≤ 0.

Letting α_k ↓ 0 the result ∇^T g_i(x̄) d ≤ 0 is obtained.

Example: the inclusion G(x̄) ⊇ T(x̄) can be strict, e.g. with the constraints x³ + y ≤ 0, −y ≤ 0 at the origin.

KKT necessary conditions (Karush-Kuhn-Tucker)

Let x̄ ∈ X ⊆ R^n be a local optimum for

  min f(x)
  g_i(x) ≤ 0,  i = 1,…,m
  x ∈ X

and let I be the set of indices of the active constraints at x̄. If:

1. f(x), g_i(x) ∈ C¹ in a neighborhood of x̄ for i ∈ I;
2. the constraint qualification condition T(x̄) = G(x̄) holds at x̄;

then there exist Lagrange multipliers λ_i ≥ 0, i ∈ I:

  ∇f(x̄) + Σ_{i∈I} λ_i ∇g_i(x̄) = 0.

Proof. x̄ local optimum ⇒ d ∈ T(x̄) implies d^T ∇f(x̄) ≥ 0. But d ∈ T(x̄) ⇔ d^T ∇g_i(x̄) ≤ 0 ∀i ∈ I. Thus it is impossible that

  ∇^T f(x̄) d < 0,  ∇^T g_i(x̄) d ≤ 0  ∀i ∈ I.

From Farkas' Lemma there exists a solution of

  Σ_{i∈I} λ_i ∇^T g_i(x̄) = −∇^T f(x̄),  λ_i ≥ 0, i ∈ I.

Constraint qualifications: examples

- polyhedra: X = R^n and the g_i(x) are affine functions: Ax ≤ b
- linear independence: X open set, g_i(x), i ∉ I continuous at x̄, and {∇g_i(x̄)}, i ∈ I, linearly independent
- Slater condition: X open set, g_i(x), i ∈ I, convex differentiable functions at x̄, g_i(x), i ∉ I, continuous at x̄, and there exists x̂ ∈ X strictly feasible: g_i(x̂) < 0 ∀i ∈ I.
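A tiny worked instance makes the multiplier concrete. For min (x_1−1)² + (x_2−1)² subject to x_1 + x_2 ≤ 1, the unconstrained minimizer (1,1) is infeasible, the optimum is its projection x* = (0.5, 0.5) onto the halfspace, the constraint is active, and λ = 1 makes the stationarity equation hold (the problem instance is an illustrative choice):

```python
import numpy as np

# min (x1-1)^2 + (x2-1)^2   s.t.   g(x) = x1 + x2 - 1 <= 0
x_star = np.array([0.5, 0.5])        # projection of (1,1) onto the halfspace
grad_f = 2.0 * (x_star - 1.0)        # gradient of f at x*: (-1, -1)
grad_g = np.array([1.0, 1.0])        # gradient of g
lam = 1.0                            # KKT multiplier
stationarity = grad_f + lam * grad_g # should be the zero vector
g_val = x_star.sum() - 1.0           # constraint is active: g(x*) = 0
```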

Convex problems

An optimization problem min_{x∈S} f(x) is a convex problem if S is a convex set, i.e.

  x, y ∈ S ⇒ λx + (1−λ)y ∈ S  ∀λ ∈ [0,1],

and f is a convex function on S, i.e.

  f(λx + (1−λ)y) ≤ λf(x) + (1−λ)f(y)  ∀λ ∈ [0,1], x, y ∈ S.

Standard convex problem

  min f(x)
  g_i(x) ≤ 0,  i = 1,…,m
  h_j(x) = 0,  j = 1,…,k

If f is convex, the g_i are convex and the h_j are affine (i.e. of the form α^T x + β), then the problem is convex.

Convex problems: every local optimum is a global one

Proof: let x̄ be a local optimum for min_S f(x) and x* a global optimum. S convex ⇒ λx* + (1−λ)x̄ ∈ S. Thus, if λ is close enough to 0 (so that λx* + (1−λ)x̄ lies in the neighborhood where x̄ is minimal),

  f(x̄) ≤ f(λx* + (1−λ)x̄) ≤ λf(x*) + (1−λ)f(x̄)

from which f(x̄) ≤ f(x*), and x̄ is also a global optimum.

Sufficiency of 1st order conditions

For a convex differentiable problem: if d^T ∇f(x̄) ≥ 0 ∀d ∈ T(x̄), then x̄ is a (global) optimum.

Proof:

  f(y) ≥ f(x̄) + (y − x̄)^T ∇f(x̄)  ∀y ∈ S.

But y − x̄ ∈ T(x̄), so f(y) ≥ f(x̄) + d^T ∇f(x̄) ≥ f(x̄) ∀y ∈ S; thus x̄ is a global minimum.

Convexity of the set of global optima

(For convex problems.) The set of global minima of a convex problem is a convex set. In fact, let x̄ and ȳ be global minima for the convex problem min_{x∈S} f(x). Then, choosing λ ∈ [0,1], we have λx̄ + (1−λ)ȳ ∈ S, as S is convex. Moreover

  f(λx̄ + (1−λ)ȳ) ≤ λf(x̄) + (1−λ)f(ȳ) = λf* + (1−λ)f* = f*

where f* is the global minimum value. Thus equality holds and the proof is complete.

KKT for equality constraints

Let x̄ be a local optimum for

  min f(x)
  g_i(x) ≤ 0,  i = 1,…,m
  h_j(x) = 0,  j = 1,…,k
  x ∈ X ⊆ R^n

Let I be the set of active inequalities at x̄. If f(x), g_i(x), i ∈ I, and the h_j(x) are C¹ and constraint qualifications hold at x̄, then there exist λ_i ≥ 0, i ∈ I, and μ_j ∈ R, j = 1,…,k:

  ∇f(x̄) + Σ_{i∈I} λ_i ∇g_i(x̄) + Σ_{j=1}^k μ_j ∇h_j(x̄) = 0

Complementarity

KKT equivalent formulation:

  ∇f(x̄) + Σ_{i=1}^m λ_i ∇g_i(x̄) + Σ_{j=1}^k μ_j ∇h_j(x̄) = 0
  λ_i g_i(x̄) = 0,  i = 1,…,m

The condition λ_i g_i(x̄) = 0 is called the complementarity condition.

Second order necessary conditions

If f, g_i, h_j are C² at x̄ and the gradients of the active constraints at x̄ are linearly independent, then there exist multipliers λ_i ≥ 0, i ∈ I, and μ_j, j = 1,…,k, such that

  ∇f(x̄) + Σ_{i∈I} λ_i ∇g_i(x̄) + Σ_{j=1}^k μ_j ∇h_j(x̄) = 0

and

  d^T ∇²L(x̄) d ≥ 0  for every direction d : d^T ∇g_i(x̄) = 0, i ∈ I, d^T ∇h_j(x̄) = 0,

where

  ∇²L(x) := ∇²f(x) + Σ_{i∈I} λ_i ∇²g_i(x) + Σ_{j=1}^k μ_j ∇²h_j(x)

Sufficient conditions

Let f, g_i, h_j be twice continuously differentiable, and let x*, λ*, μ* satisfy:

  ∇f(x*) + Σ_{i∈I} λ*_i ∇g_i(x*) + Σ_{j=1}^k μ*_j ∇h_j(x*) = 0
  λ*_i g_i(x*) = 0
  λ*_i ≥ 0
  d^T ∇²L(x*) d > 0  ∀d ≠ 0 : d^T ∇h_j(x*) = 0, d^T ∇g_i(x*) = 0, i ∈ I;

then x* is a local minimum.

Lagrange Duality

Problem:

  f* = min f(x)
  g_i(x) ≤ 0
  x ∈ X

Definition (Lagrange function):

  L(x; λ) = f(x) + Σ_i λ_i g_i(x),  λ ≥ 0, x ∈ X

Relaxation

Given an optimization problem min_{x∈S} f(x), a relaxation is a problem min_{x∈Q} g(x) where

  S ⊆ Q,   g(x) ≤ f(x)  ∀x ∈ S.

Weak duality: the optimal value of a relaxation is a lower bound on the optimal value of the problem.

Proof that Lagrange minimization is a relaxation: the feasible set of the Lagrange problem is X (it contains the original one), and if g(x) ≤ 0 and λ ≥ 0, then

  L(x, λ) = f(x) + λ^T g(x) ≤ f(x).

Dual Lagrange function

With respect to the constraints g(x) ≤ 0:

  θ(λ) = inf_{x∈X} L(x, λ) = inf_{x∈X} ( f(x) + λ^T g(x) )

For every choice of λ ≥ 0, θ(λ) is a lower bound on the value of every feasible solution and in particular on the global minimum value of the problem.

Example (circle packing)

  max r  (written as min −r)
  4r² − (x_i − x_j)² − (y_i − y_j)² ≤ 0,  1 ≤ i < j ≤ N
  0 ≤ x_i ≤ 1,  0 ≤ y_i ≤ 1,  i = 1,…,N

Solution

When N = 2, relaxing the first constraint:

  θ(λ) = min { −r + λ( 4r² − (x_1 − x_2)² − (y_1 − y_2)² ) : 0 ≤ x_i, y_i ≤ 1 }

Minimizing with respect to x, y gives |x_1 − x_2| = |y_1 − y_2| = 1, from which

  θ(λ) = min_r ( −r + 4λr² ) − 2λ

  −1 + 8λr = 0  ⇒  r* = 1/(8λ)

  θ(λ) = −1/(16λ) − 2λ

This is a lower bound on the optimum value. Best possible lower bound:

  θ* = max_λ θ(λ)  ⇒  λ* = 1/(4√2),  θ* = −√2/2
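The N = 2 dual bound above can be verified numerically: minimize −r + 4λr² on a fine grid (the separation term contributes −2λ, with the centers pushed to opposite corners) and compare with the closed form −1/(16λ) − 2λ at λ* = 1/(4√2). A sketch (the grid range and resolution are arbitrary choices):

```python
import math
import numpy as np

def theta_closed(lam):
    """theta(lam) = -1/(16 lam) - 2 lam, obtained from r* = 1/(8 lam)."""
    return -1.0 / (16.0 * lam) - 2.0 * lam

def theta_numeric(lam):
    """Grid-minimize -r + 4 lam r^2 over r, then add the -2 lam center term."""
    rs = np.linspace(0.0, 1.5, 150001)
    return float(np.min(-rs + 4.0 * lam * rs ** 2)) - 2.0 * lam

lam_star = 1.0 / (4.0 * math.sqrt(2.0))
```

At λ*, both evaluations agree with θ* = −√2/2, which equals the objective −r of the feasible solution with r = √2/2.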

Lagrange Dual

Choosing (x_1, y_1) = (0,0) and (x_2, y_2) = (1,1), a feasible solution with r = √2/2 is obtained. The Lagrange dual gives a lower bound equal to −√2/2: the same as the objective function (−r) at a feasible solution ⇒ optimal solution! (An exception, not the rule!)

The problem

  θ* = max_{λ ≥ 0} θ(λ)

might:

1. be unbounded
2. have a finite sup but no max
3. have a unique maximum attained in correspondence with a single solution x
4. have many different maxima, each connected with a different solution x

Equality constraints

  f* = min f(x)
  g_i(x) ≤ 0,  i = 1,…,m
  h_j(x) = 0,  j = 1,…,k
  x ∈ X

Lagrange function: L(x; λ, μ) = f(x) + λ^T g(x) + μ^T h(x), where λ ≥ 0 but μ is free.

Linear Programming

  min c^T x
  Ax ≥ b

Dual Lagrange function:

  θ(λ) = min_x c^T x + λ^T (b − Ax) = λ^T b + min_x (c^T − λ^T A) x

but:

  min_x (c^T − λ^T A) x = 0 if c^T − λ^T A = 0, and −∞ otherwise.

Lagrange dual function:

  θ(λ) = λ^T b if A^T λ = c, and −∞ otherwise.

Lagrange dual:

  max λ^T b
  A^T λ = c
  λ ≥ 0

(the familiar LP dual).

Quadratic Programming (QP)

  min ½ x^T Q x + c^T x
  Ax = b

(Q: symmetric). Lagrange dual function:

  θ(λ) = min_x ½ x^T Q x + c^T x + λ^T (Ax − b)
       = −λ^T b + min_x ½ x^T Q x + (c^T + λ^T A) x

QP Case 1: Q has at least one negative eigenvalue

  min_x ½ x^T Q x + (c^T + λ^T A) x = −∞

In fact there exists d : d^T Q d < 0. Choosing x = αd with α > 0,

  ½ x^T Q x + (c^T + λ^T A) x = ½ α² d^T Q d + α (c^T + λ^T A) d

and for large values of α this can be made as small as desired.

QP Case 2: Q positive definite

Minimum point of the inner minimization:

  Q x̄ + (c + A^T λ) = 0,  i.e.  x̄ = −Q^{-1}(c + A^T λ)

Lagrange dual function value:

  θ(λ) = −λ^T b + ½ x̄^T Q x̄ + (c^T + λ^T A) x̄
       = −λ^T b + ½ (c + A^T λ)^T Q^{-1} Q Q^{-1} (c + A^T λ) − (c^T + λ^T A) Q^{-1} (c + A^T λ)
       = −λ^T b − ½ (c + A^T λ)^T Q^{-1} (c + A^T λ)

Lagrange dual (seen as a min problem):

  min_λ λ^T b + ½ (c + A^T λ)^T Q^{-1} (c + A^T λ)

Optimality conditions:

  b + A Q^{-1} (c + A^T λ) = 0

But recalling that x̄ = −Q^{-1}(c + A^T λ):

  b − A x̄ = 0  ⇒  feasibility of x̄.

So if we find the optimal multipliers λ (a linear system), we get the optimal solution x̄ (thanks to feasibility and weak duality)!

Properties of the Lagrange dual

For any problem

  f* = min f(x),  g_i(x) ≤ 0, i = 1,…,m,  x ∈ X

where X is non-empty and compact, if f and the g_i are continuous, then the Lagrange dual function is concave.

Proof. From the Weierstrass theorem, θ(λ) = min_{x∈X} f(x) + λ^T g(x) exists and is finite. For η ∈ [0,1]:

  θ(ηa + (1−η)b) = min_{x∈X} ( f(x) + (ηa + (1−η)b)^T g(x) )
                 = min_{x∈X} ( η(f(x) + a^T g(x)) + (1−η)(f(x) + b^T g(x)) )
                 ≥ η min_{x∈X} (f(x) + a^T g(x)) + (1−η) min_{x∈X} (f(x) + b^T g(x))
                 = η θ(a) + (1−η) θ(b).
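For an equality-constrained QP with positive definite Q, the dual optimality condition is a linear system in λ, after which the primal solution is recovered in closed form. A minimal sketch (the particular Q, c, A, b are an arbitrary feasible instance):

```python
import numpy as np

# min 0.5 x'Qx + c'x  s.t.  Ax = b, with Q positive definite
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# Dual optimality b + A Q^{-1}(c + A'lam) = 0  =>  (A Q^{-1} A') lam = -(b + A Q^{-1} c)
Qinv_c = np.linalg.solve(Q, c)
M = A @ np.linalg.solve(Q, A.T)
lam = np.linalg.solve(M, -(b + A @ Qinv_c))

x_bar = -np.linalg.solve(Q, c + A.T @ lam)   # minimizer of the Lagrangian
```

Feasibility of x̄ plus weak duality then certifies optimality, exactly as argued on the slide.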

Solution of the Lagrange dual

  max_{λ≥0} θ(λ) = max_{λ≥0} min_{x∈X} ( f(x) + λ^T g(x) )

is equivalent to

  max z
  z ≤ f(x) + λ^T g(x)  ∀x ∈ X
  λ ≥ 0

After having computed f and g at x_1, x_2, …, x_k, a restricted dual can be defined:

  max z
  z ≤ f(x_j) + λ^T g(x_j),  j = 1,…,k
  λ ≥ 0

Let λ̄ be the optimal solution of the restricted dual. Is it an optimal dual solution, i.e. is it true that z̄ ≤ f(x) + λ̄^T g(x) for all x ∈ X? Check: look for x̄, an optimal solution of

  min_{x∈X} f(x) + λ̄^T g(x).

If f(x̄) + λ̄^T g(x̄) ≥ z̄, then we have found the optimal solution of the dual; otherwise the pair (x̄, f(x̄)) is added to the restricted dual and a new solution is computed.

Geometric programming

Unconstrained geometric program:

  min_{x>0} Σ_{k=1}^m c_k Π_{j=1}^n x_j^{α_kj},  α_kj ∈ R, c_k > 0

(non-convex). Variable substitution: x_j = exp(y_j), y_j ∈ R. Transformed problem:

  min_y Σ_{k=1}^m c_k e^{α_k^T y} = min_y Σ_{k=1}^m e^{α_k^T y + β_k},  β_k = log c_k

still non-convex, but its logarithm is convex.

Duality example: solving the dual

Dual of

  min_x log Σ_{k=1}^m exp(α_k^T x + β_k)

With no constraints, the dual Lagrange function is identical to f(x)! Strong duality holds, but is useless. Simple transformation:

  min log Σ_{k=1}^m exp y_k
  y_k = α_k^T x + β_k

Dual function:

  L(λ) = min_{x,y} log Σ_{k=1}^m exp y_k + λ^T (Ax + β − y)

Minimization in x is unconstrained: min_x λ^T A x is unbounded if λ^T A ≠ 0; if λ^T A = 0 then

  L(λ) = min_y log Σ_{k=1}^m exp y_k + λ^T (β − y)

First order (unconstrained) optimality conditions w.r.t. y_i:

  exp y_i / Σ_k exp y_k − λ_i = 0

Lagrange multipliers exist provided that Σ_i λ_i = 1 and λ_i > 0 ∀i. Substituting λ_j = exp y_j / Σ_k exp y_k (so that log λ_j = y_j − log Σ_k exp y_k):

  log Σ_j exp y_j − Σ_j λ_j y_j = Σ_k λ_k ( log Σ_j exp y_j − y_k )
                                = −Σ_k λ_k log λ_k
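The key fact used here, that the log of the transformed posynomial (a log-sum-exp of affine functions) is convex, can be spot-checked numerically on random midpoints. A sketch (the exponents α_k and constants β_k are hypothetical sample data):

```python
import math, random

alphas = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hypothetical exponent rows alpha_k
betas = [0.0, 0.5, -1.0]                         # beta_k = log c_k

def f(y):
    """log sum_k exp(alpha_k' y + beta_k): the transformed GP objective."""
    return math.log(sum(math.exp(a1 * y[0] + a2 * y[1] + bk)
                        for (a1, a2), bk in zip(alphas, betas)))

rng = random.Random(0)
convex = True
for _ in range(2000):
    y1 = [rng.uniform(-2, 2), rng.uniform(-2, 2)]
    y2 = [rng.uniform(-2, 2), rng.uniform(-2, 2)]
    t = rng.random()
    mid = [t * a + (1 - t) * b for a, b in zip(y1, y2)]
    if f(mid) > t * f(y1) + (1 - t) * f(y2) + 1e-9:
        convex = False   # a violation would disprove convexity
        break
```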

Lagrange Dual

The Lagrange dual becomes:

  max β^T λ − Σ_k λ_k log λ_k
  Σ_k λ_k = 1
  A^T λ = 0
  λ ≥ 0

Special cases: linear constraints

  min f(x)
  Ax ≤ b

Lagrange function: L(x, λ) = f(x) + λ^T (Ax − b). Constraint qualifications always hold (polyhedron). If x* is a local optimum, there exists λ* ≥ 0:

  Ax* ≤ b
  ∇f(x*) = −A^T λ*
  (λ*)^T (Ax* − b) = 0

Non-negativity constraints

  min f(x)
  x ≥ 0

Lagrange function: L(x, λ) = f(x) − λ^T x. KKT conditions:

  ∇f(x*) = λ*
  x* ≥ 0,  λ* ≥ 0
  (λ*)^T x* = 0

from which λ*_j = ∂f(x*)/∂x_j, j = 1,…,n, and

  ∂f(x*)/∂x_j = 0  ∀j : x*_j > 0
  ∂f(x*)/∂x_j ≥ 0  otherwise

Box constraints

  min f(x)
  l ≤ x ≤ u  (with l_i < u_i ∀i)

Lagrange function: L(x, λ, μ) = f(x) + λ^T (l − x) + μ^T (x − u). KKT conditions:

  ∇f(x*) = λ* − μ*
  (l − x*)^T λ* = 0
  (x* − u)^T μ* = 0
  (λ*, μ*) ≥ 0

Given x*, let J_l = {j : x*_j = l_j}, J_u = {j : x*_j = u_j}, J_0 = {j : l_j < x*_j < u_j}. Then, from complementarity,

  ∂f(x*)/∂x_j = λ*_j,   j ∈ J_l
  ∂f(x*)/∂x_j = −μ*_j,  j ∈ J_u
  ∂f(x*)/∂x_j = 0,      j ∈ J_0

Box constraints (cont.)

Thus

  ∂f(x*)/∂x_j ≥ 0,  j ∈ J_l
  ∂f(x*)/∂x_j ≤ 0,  j ∈ J_u
  ∂f(x*)/∂x_j = 0,  j ∈ J_0

with feasibility l ≤ x* ≤ u.

Optimization over the simplex

  min f(x)
  1^T x = 1
  x ≥ 0

Lagrange function: L(x, λ, μ) = f(x) − λ^T x + μ(1 − 1^T x). KKT:

  ∂f(x*)/∂x_j − λ*_j = μ*  (all equal)
  1^T x* = 1
  (x*, λ*) ≥ 0
  (λ*)^T x* = 0

Thus, from complementarity, if x*_j > 0 then λ*_j = 0 and ∂f(x*)/∂x_j = μ*; otherwise ∂f(x*)/∂x_j ≥ μ*. Hence, for every j with x*_j > 0,

  ∂f(x*)/∂x_j ≤ ∂f(x*)/∂x_k  ∀k

Application: minimum variance portfolio

Given n assets with random returns R_1,…,R_n, how do we invest 1 euro so that the resulting portfolio has minimum variance? If x_j denotes the fraction of the investment in asset j, the variance of the portfolio P(x) is

  Var = E( P(x) − E(P(x)) )²
      = E( Σ_{j=1}^n (R_j − E(R_j)) x_j )²
      = Σ_{i,j} E[ (R_i − E(R_i))(R_j − E(R_j)) ] x_i x_j
      = x^T Q x

where Q is the variance-covariance matrix of the n assets.
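When the minimum-variance weights turn out to be all positive, the non-negativity bounds are inactive and the solution has the closed form x = Q⁻¹1 / (1ᵀQ⁻¹1); the simplex KKT condition then says the marginal risks (Qx)_j are all equal. A sketch with a hypothetical covariance matrix (the numbers are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical covariance matrix of 3 assets (symmetric positive definite).
Q = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])

# Interior solution of min 0.5 x'Qx s.t. 1'x = 1: x = Q^{-1} 1 / (1' Q^{-1} 1).
w = np.linalg.solve(Q, np.ones(3))
x = w / w.sum()

marginal_risk = Q @ x   # should have all components equal at the optimum
```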

Minimum variance portfolio

Problem (objective multiplied by ½ for simpler computations):

  min ½ x^T Q x
  1^T x = 1
  x ≥ 0

Optimal portfolio KKT: for all j such that x*_j > 0,

  Σ_i Q_ji x*_i ≤ Σ_i Q_ki x*_i  ∀k

The vector Qx may be thought of as the vector of marginal contributions to the total risk (which is a weighted sum of the elements of Qx). Thus in the optimal portfolio, all assets held at a positive level give an equal (and minimal) contribution to the total risk.

Algorithms for unconstrained local optimization (Fabio Schoen)

Optimization Algorithms

The most common form of optimization algorithm is the line search-based method: given a starting point x_0, a sequence is generated as

  x_{k+1} = x_k + α_k d_k

where d_k ∈ R^n is the search direction and α_k > 0 the step. Usually d_k is chosen first, and the step is then obtained, often from a one-dimensional optimization.

Trust-region algorithms

A model m(x) and a confidence (trust) region U(x_k) containing x_k are defined. The new iterate is chosen as the solution of the constrained optimization problem

  min m(x),  x ∈ U(x_k)

The model and the confidence region are possibly updated at each iteration.

Speed measures

Let x* be a local optimum. The error at x_k might be measured, e.g., as

  e(x_k) = ‖x_k − x*‖   or   e(x_k) = f(x_k) − f(x*).

Given {x_k} → x*, if there exist q > 0 and β ∈ (0,1) such that, for k large enough,

  e(x_k) ≤ q β^k

then {x_k} is linearly convergent (converges with order 1); β is the convergence rate. A sufficient condition for linear convergence:

  lim sup_k e(x_{k+1})/e(x_k) ≤ β

Super-linear convergence

If for every β ∈ (0,1) there exists q such that e(x_k) ≤ q β^k, then convergence is super-linear. Sufficient condition:

  lim sup_k e(x_{k+1})/e(x_k) = 0

Higher order convergence

If, given p > 1, there exist q > 0 and β ∈ (0,1) such that

  e(x_k) ≤ q β^(p^k)

then {x_k} is said to converge with order at least p. If p = 2: quadratic convergence. Sufficient condition:

  lim sup_k e(x_{k+1}) / e(x_k)^p < ∞

Examples

- 1/k converges to 0 with order one (linear convergence)
- 1/k² converges to 0 with order one

Further examples:

- 2^{−k} converges to 0 with order one
- k^{−k} converges to 0 with order one; convergence is super-linear
- 2^{−2^k} converges to 0 with order 2: quadratic convergence

Descent directions and the gradient

Let f ∈ C¹(R^n) and x_k ∈ R^n with ∇f(x_k) ≠ 0. Let d ∈ R^n. If

  d^T ∇f(x_k) < 0

then d is a descent direction. Taylor expansion:

  f(x_k + αd) − f(x_k) = α d^T ∇f(x_k) + o(α)
  ( f(x_k + αd) − f(x_k) ) / α = d^T ∇f(x_k) + o(1)

Thus, if α is small enough, f(x_k + αd) − f(x_k) < 0. NB: d might be a descent direction even if d^T ∇f(x_k) = 0.
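The convergence-order examples can be checked by looking at the ratios e(k+1)/e(k) (or e(k+1)/e(k)² in the quadratic case); the super-linear ratio is computed in log space to avoid floating-point underflow for large k (a small sketch following the example sequences):

```python
import math

e_lin = lambda k: 1.0 / k               # ratio e(k+1)/e(k) = k/(k+1) -> 1
e_geo = lambda k: 2.0 ** (-k)           # ratio exactly 1/2: linear convergence
e_quad = lambda k: 2.0 ** (-(2 ** k))   # e(k+1) = e(k)^2: quadratic convergence

def super_ratio(k):
    """Ratio e(k+1)/e(k) for e(k) = k^{-k}, evaluated in log space."""
    return math.exp(k * math.log(k) - (k + 1) * math.log(k + 1))
```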

Convergence of line search methods

If a sequence x_{k+1} = x_k + α_k d_k is generated in such a way that:
- L_0 = {x : f(x) ≤ f(x_0)} is compact
- d_k ≠ 0 whenever ∇f(x_k) ≠ 0
- f(x_{k+1}) < f(x_k) if ∇f(x_k) ≠ 0, for all k
- lim_k d_k^T ∇f(x_k) / ‖d_k‖ = 0
- if d_k ≠ 0, then |d_k^T ∇f(x_k)| / ‖d_k‖ ≥ σ(‖∇f(x_k)‖), where σ is such that lim_k σ(t_k) = 0 ⇒ lim_k t_k = 0 (σ is called a forcing function)

then either there exists a finite index k̄ such that ∇f(x_k̄) = 0, or otherwise:
- x_k ∈ L_0 and all of its limit points are in L_0
- {f(x_k)} admits a limit
- lim_k ‖∇f(x_k)‖ = 0
- for every limit point x̄ of {x_k} we have ∇f(x̄) = 0.

Comments on the assumptions:
- f(x_{k+1}) < f(x_k): most optimization methods choose d_k as a descent direction. If d_k is a descent direction, choosing α_k sufficiently small ensures the validity of the assumption.
- lim_k d_k^T ∇f(x_k)/‖d_k‖ = 0: for a normalized direction d_k, the scalar product d_k^T ∇f(x_k) is the directional derivative of f along d_k; it is required that this goes to zero. This can be achieved through precise line searches (choosing the step so that f is minimized along d_k).
- |d_k^T ∇f(x_k)|/‖d_k‖ ≥ σ(‖∇f(x_k)‖): letting, e.g., σ(t) = ct with c > 0, if d_k is such that d_k^T ∇f(x_k) < 0 then the condition becomes
|d_k^T ∇f(x_k)| / (‖d_k‖ ‖∇f(x_k)‖) ≥ c

Gradient Algorithms

Recalling that
cos θ_k = −d_k^T ∇f(x_k) / (‖d_k‖ ‖∇f(x_k)‖)
the condition becomes cos θ_k ≥ c: the angle θ_k between d_k and −∇f(x_k) is bounded away from orthogonality.

General scheme:
x_{k+1} = x_k − α_k D_k ∇f(x_k)
with D_k ≻ 0 and α_k > 0. If ∇f(x_k) ≠ 0 then d_k = −D_k ∇f(x_k) is a descent direction. In fact
d_k^T ∇f(x_k) = −∇^T f(x_k) D_k ∇f(x_k) < 0

Steepest descent (or gradient method): D_k := I, i.e. x_{k+1} = x_k − α_k ∇f(x_k). If ∇f(x_k) ≠ 0 then d_k = −∇f(x_k) is a descent direction. Moreover, it is the steepest one (w.r.t. the euclidean norm):
min_{d ∈ R^n, ‖d‖ ≤ 1} ∇^T f(x_k) d

The steepest direction solves
min_{d ∈ R^n} ∇^T f(x_k) d  s.t.  d^T d ≤ 1
KKT conditions: in the interior, ∇f(x_k) = 0; if the constraint is active,
∇f(x_k) + 2λ d = 0,  d^T d = 1,  λ ≥ 0
⇒ d = −∇f(x_k) / ‖∇f(x_k)‖.

Newton's method

D_k := (∇²f(x_k))^{−1}. Motivation: Taylor expansion of f:
f(x) ≈ f(x_k) + ∇^T f(x_k)(x − x_k) + ½ (x − x_k)^T ∇²f(x_k)(x − x_k)
Minimizing the approximation:
∇f(x_k) + ∇²f(x_k)(x − x_k) = 0
If the Hessian is non-singular,
x = x_k − (∇²f(x_k))^{−1} ∇f(x_k)

Step choice

Given d_k, how to choose α_k in x_{k+1} = x_k + α_k d_k? Optimal choice (one-dimensional optimization):
α_k = arg min_{α ≥ 0} f(x_k + α d_k)
An analytical expression for the optimal step is available only in a few cases, e.g. if f(x) = ½ x^T Q x + c^T x with Q ≻ 0. Then
f(x_k + α d_k) = ½ (x_k + α d_k)^T Q (x_k + α d_k) + c^T (x_k + α d_k) = ½ α² d_k^T Q d_k + α (Q x_k + c)^T d_k + β
where β does not depend on α. Minimizing w.r.t. α:
α d_k^T Q d_k + (Q x_k + c)^T d_k = 0
so
α_k = α* = −(Q x_k + c)^T d_k / (d_k^T Q d_k) = −d_k^T ∇f(x_k) / (d_k^T ∇²f(x_k) d_k)
E.g., in steepest descent:
α* = ‖∇f(x_k)‖² / (∇^T f(x_k) ∇²f(x_k) ∇f(x_k))
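The closed-form step for the quadratic case can be sanity-checked numerically; Q and c below are arbitrary illustrative data.

```python
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
c = np.array([1.0, -2.0])

def f(x):
    return 0.5 * x @ Q @ x + c @ x

x_k = np.array([2.0, 1.0])
g = Q @ x_k + c                  # gradient of f at x_k
d = -g                           # steepest-descent direction
alpha = -(g @ d) / (d @ Q @ d)   # alpha* = -d^T grad f / (d^T Q d)

# alpha* minimizes f(x_k + alpha d): nearby steps cannot do better
best = f(x_k + alpha * d)
```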

Approximate step size

Rules for choosing a step size (from the sufficient conditions for convergence):
- f(x_{k+1}) < f(x_k)
- lim_k d_k^T ∇f(x_k) / ‖d_k‖ = 0
Often it is also required that x_{k+1} − x_k → 0 and
d_k^T ∇f(x_k + α_k d_k) ≥ 0
In general it is important to ensure a sufficient reduction of f (avoid too large steps) and a sufficiently large step ‖x_{k+1} − x_k‖ (avoid too small steps).

Armijo's rule

Input: δ ∈ (0,1), γ ∈ (0,1/2), Δ_k > 0
α := Δ_k ;
while f(x_k + α d_k) > f(x_k) + γ α d_k^T ∇f(x_k) do
    α := δα ;
end
return α

Typical values: δ ∈ [0.1, 0.5], γ ∈ [10⁻⁴, 10⁻³]. On exit the returned step satisfies
f(x_k + α d_k) ≤ f(x_k) + γ α d_k^T ∇f(x_k)
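Armijo's rule translates almost line by line into code; the quadratic test problem at the bottom is only for demonstration.

```python
import numpy as np

def armijo(f, grad, x, d, delta=0.5, gamma=1e-4, alpha0=1.0, max_halvings=60):
    """Backtracking: shrink alpha by delta until the sufficient-decrease
    condition f(x + alpha d) <= f(x) + gamma * alpha * d^T grad f(x) holds."""
    fx, slope = f(x), d @ grad(x)
    assert slope < 0, "d must be a descent direction"
    alpha = alpha0
    for _ in range(max_halvings):
        if f(x + alpha * d) <= fx + gamma * alpha * slope:
            return alpha
        alpha *= delta
    return alpha

f = lambda x: x @ x
grad = lambda x: 2 * x
x = np.array([1.0, 2.0])
alpha = armijo(f, grad, x, -grad(x))
accepted = f(x + alpha * (-grad(x))) <= f(x) + 1e-4 * alpha * (-grad(x)) @ grad(x)
```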

Line search in practice

How to choose the initial step size Δ_k? Let φ(α) = f(x_k + α d_k). A possibility is to choose Δ_k = α*, the minimizer of a quadratic approximation to φ(·):
q(α) = c_0 + c_1 α + ½ c_2 α²
q(0) = c_0 := f(x_k),  q′(0) = c_1 := d_k^T ∇f(x_k)
Then α* = −c_1 / c_2.

(Figure: acceptable steps — the lines f(x_k) + γ α d_k^T ∇f(x_k) and f(x_k) + α d_k^T ∇f(x_k) delimit the acceptable region.)

Third condition? If an estimate f̂ of the minimum of f(x_k + α d_k) is available, choose c_2 so that min q(α) = f̂:
min q(α) = q(−c_1/c_2) = c_0 − c_1²/(2 c_2) := f̂  ⇒  c_2 = c_1²/(2(c_0 − f̂))
so
α* = −c_1 / c_2 = 2 (f̂ − c_0) / c_1
Thus it is reasonable to start with
Δ_k = 2 (f̂ − f(x_k)) / (d_k^T ∇f(x_k))
A reasonable estimate might be obtained from the previous iteration:
Δ_k = 2 (f(x_k) − f(x_{k−1})) / (d_k^T ∇f(x_k))
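The initial-step formula Δ_k = 2(f̂ − f(x_k))/(d_k^T ∇f(x_k)) can be checked on a function whose minimum value f̂ is known exactly; here q(α) coincides with φ(α), so the predicted step lands exactly on the minimizer. The test function is an illustrative choice.

```python
import numpy as np

f = lambda x: 0.5 * x @ x     # minimum value f_hat = 0, attained at the origin
grad = lambda x: x

x_k = np.array([2.0, 0.0])
d_k = -grad(x_k)
f_hat = 0.0                   # exact estimate of min_alpha f(x_k + alpha d_k)

delta_k = 2 * (f_hat - f(x_k)) / (d_k @ grad(x_k))
# for this quadratic the interpolation is exact: x_k + delta_k d_k is the minimizer
x_new = x_k + delta_k * d_k
```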

Convergence of steepest descent

x_{k+1} = x_k − α_k ∇f(x_k). If a sufficiently accurate step size is used, the conditions of the theorem on global convergence are satisfied and the steepest descent algorithm globally converges to a stationary point. "Sufficiently accurate" means exact line search or, e.g., Armijo's rule.

Local analysis of steepest descent

Behaviour of the algorithm when minimizing f(x) = ½ x^T Q x, where Q ≻ 0; the (local and global) optimum is x* = 0. Steepest descent method:
x_{k+1} = x_k − α_k ∇f(x_k) = x_k − α_k Q x_k = (I − α_k Q) x_k
Error (in x) at step k+1:
‖x_{k+1} − 0‖ = ‖(I − α_k Q) x_k‖ = √(x_k^T (I − α_k Q)² x_k)

Analysis: let A be symmetric with eigenvalues λ_1 ≤ … ≤ λ_n. Then
λ_1 ‖v‖² ≤ v^T A v ≤ λ_n ‖v‖²  ∀v ∈ R^n
so
x_k^T (I − α_k Q)² x_k ≤ λ̄ x_k^T x_k
where λ̄ is the largest eigenvalue of (I − α_k Q)². Moreover:
- λ is an eigenvalue of A iff αλ is an eigenvalue of αA;
- λ is an eigenvalue of A iff 1 + λ is an eigenvalue of I + A.
Thus the eigenvalues of I − α_k Q are 1 − α_k λ_i, where the λ_i are the eigenvalues of Q. The largest eigenvalue of (I − α_k Q)² is
max{(1 − α_k λ_1)², (1 − α_k λ_n)²}
and thus
‖x_{k+1}‖ ≤ √(max{(1 − α_k λ_1)², (1 − α_k λ_n)²}) ‖x_k‖ = max{|1 − α_k λ_1|, |1 − α_k λ_n|} ‖x_k‖

Eliminating the dependency on α_k: minimize
max{|1 − αλ_1|, |1 − αλ_n|}
over α ≥ 0. (Figure: the two curves |1 − αλ_1| and |1 − αλ_n|.) Since α ≥ 0 and λ_1 ≤ λ_n, the minimum point is where
1 − αλ_1 = −(1 − αλ_n),  i.e.  α* = 2 / (λ_1 + λ_n)

In the best possible case:
‖x_{k+1}‖ / ‖x_k‖ ≤ |1 − α* λ_1| = (λ_n − λ_1)/(λ_n + λ_1) = (ρ − 1)/(ρ + 1)
where ρ = λ_n/λ_1 is the condition number of Q:
- ρ ≫ 1 (ill-conditioned problem): very slow convergence
- ρ ≈ 1: very fast convergence

Zig-zagging: min ½(x² + M y²), where M > 0. Optimum: x = 0, y = 0. Starting point: (M, 1). Iterates:
[x_{k+1}; y_{k+1}] = [x_k; y_k] − α_k [x_k; M y_k]
With the optimal step size:
x_k = M ((M−1)/(M+1))^k,  y_k = (−(M−1)/(M+1))^k
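The (ρ−1)/(ρ+1) contraction and the zig-zag iterates can be reproduced numerically with exact line search on f(x, y) = ½(x² + My²), using the starting point (M, 1) from the slides:

```python
import numpy as np

M = 10.0
Q = np.diag([1.0, M])

def exact_sd_step(z):
    g = Q @ z                        # gradient of 0.5 z^T Q z
    alpha = (g @ g) / (g @ Q @ g)    # exact line search on a quadratic
    return z - alpha * g

z = np.array([M, 1.0])               # starting point (M, 1)
ratios = []
for _ in range(5):
    z_new = exact_sd_step(z)
    ratios.append(abs(z_new[0] / z[0]))  # per-step contraction of the x-coordinate
    z = z_new

predicted = (M - 1) / (M + 1)        # = (rho - 1)/(rho + 1), rho = M
```

Every iteration contracts both coordinates by exactly (M−1)/(M+1) while the sign of y alternates, which is the zig-zag pattern.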

Zig-zagging

Convergence is rapid if M ≈ 1; it is very slow, with zig-zagging, if M ≫ 1 or M ≪ 1. Slow convergence and zig-zagging are general phenomena (especially when the starting point is near the longest axis of the ellipsoidal level sets).

Analysis of Newton's method

Newton-Raphson method: x_{k+1} = x_k − (∇²f(x_k))^{−1} ∇f(x_k). Let x* be a local optimum. Taylor expansion of ∇f:
∇f(x*) = 0 = ∇f(x_k) + ∇²f(x_k)(x* − x_k) + o(‖x* − x_k‖)
If ∇²f(x_k) is non-singular and (∇²f(x_k))^{−1} is bounded:
0 = (∇²f(x_k))^{−1} ∇f(x_k) + (x* − x_k) + o(‖x* − x_k‖) = x* − x_{k+1} + o(‖x* − x_k‖)
Thus
‖x* − x_{k+1}‖ = o(‖x* − x_k‖),  i.e.  ‖x* − x_{k+1}‖ / ‖x* − x_k‖ → 0:
convergence is at least superlinear.
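Quadratic convergence is easy to observe numerically. Newton's iteration on the gradient of the illustrative function f(x) = x² + eˣ (whose second derivative is everywhere positive, so the difficulties listed later do not arise):

```python
import math

# f(x) = x^2 + exp(x):  f'(x) = 2x + e^x,  f''(x) = 2 + e^x > 0
fp  = lambda x: 2 * x + math.exp(x)
fpp = lambda x: 2 + math.exp(x)

x = 1.0
iterates = [x]
for _ in range(8):
    x = x - fp(x) / fpp(x)    # Newton step on the stationarity equation f'(x) = 0
    iterates.append(x)

x_star = iterates[-1]         # converged to machine precision
errors = [abs(z - x_star) for z in iterates[:-1]]
# quadratic convergence: e_{k+1} <= C e_k^2 once the iterate is close to x*
```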

Local Convergence of Newton's Method

Let f ∈ C²(U(x*, δ*)), where U(x*, δ*) is the ball with radius δ* and center x*, and let ∇²f(x*) be non-singular. Then:
1. ∃ δ > 0 such that if x_0 ∈ U(x*, δ), then {x_k} is well defined and converges to x* at least superlinearly.
2. If ∃ δ > 0, L > 0, M > 0 such that
‖∇²f(x) − ∇²f(y)‖ ≤ L ‖x − y‖  and  ‖(∇²f(x))^{−1}‖ ≤ M
then, if x_0 ∈ U(x*, δ), Newton's method converges with order at least 2 and
‖x_{k+1} − x*‖ ≤ (LM/2) ‖x_k − x*‖²

Difficulties: many things might go wrong:
- at some iteration ∇²f(x_k) might be singular — for example, if x_k belongs to a flat region where f(x) = constant;
- even if non-singular, inverting ∇²f(x_k) — or, in any case, solving a linear system with coefficient matrix ∇²f(x_k) — is numerically unstable and computationally demanding;
- there is no guarantee that ∇²f(x_k) ≻ 0: the Newton direction might not be a descent direction;
- Newton's method just tries to solve the system ∇f(x) = 0 and thus might very well be attracted towards a maximum;
- the method lacks global convergence: it converges only if started near a local optimum.

Newton-type methods:
- line search variant: x_{k+1} = x_k − α_k (∇²f(x_k))^{−1} ∇f(x_k);
- modified Newton method: replace ∇²f(x_k) by (∇²f(x_k) + D_k), where D_k is chosen so that ∇²f(x_k) + D_k is positive definite.

Quasi-Newton methods

Consider solving the nonlinear system ∇f(x) = 0. Taylor expansion of the gradient:
∇f(x_k) ≈ ∇f(x_{k+1}) + ∇²f(x_{k+1})(x_k − x_{k+1})
Let B_{k+1} be an approximation of the Hessian at x_{k+1}, and let
s_k := x_{k+1} − x_k,  y_k := ∇f(x_{k+1}) − ∇f(x_k)
Quasi-Newton equation: B_{k+1} s_k = y_k.

If B_k was the previous approximate Hessian, we ask that:
1. the variation between B_k and B_{k+1} is small;
2. nothing changes along directions which are normal to the step s_k:
B_k z = B_{k+1} z  ∀z : z^T s_k = 0
Choosing n − 1 vectors z orthogonal to s_k, together with the quasi-Newton equation this gives n² linearly independent equations in n² unknowns, hence a unique solution.

Broyden updating

It can be shown that the unique solution is given by:
B_{k+1} = B_k + (y_k − B_k s_k) s_k^T / (s_k^T s_k)

Theorem: let B_k ∈ R^{n×n} and s_k ≠ 0. The unique solution of
min_{B̂ : B̂ s_k = y_k} ‖B_k − B̂‖_F
is Broyden's update B_{k+1}; here ‖X‖_F = √Tr(X^T X) denotes the Frobenius norm.

Proof:
‖B_{k+1} − B_k‖ = ‖(y_k − B_k s_k) s_k^T‖ / (s_k^T s_k) = ‖(B̂ s_k − B_k s_k) s_k^T‖ / (s_k^T s_k)
 = ‖(B̂ − B_k) s_k s_k^T‖ / (s_k^T s_k) ≤ ‖B̂ − B_k‖ ‖s_k s_k^T‖ / (s_k^T s_k) = ‖B̂ − B_k‖
since ‖s_k s_k^T‖_F = s_k^T s_k. Uniqueness is a consequence of the strict convexity of the norm and the convexity of the feasible region.
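Broyden's update can be verified directly against the two requirements above — the quasi-Newton (secant) equation and invariance on directions orthogonal to s_k; the random data is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
B_k = rng.standard_normal((n, n))   # previous approximate Hessian
s_k = rng.standard_normal(n)        # step  x_{k+1} - x_k
y_k = rng.standard_normal(n)        # gradient difference

# Broyden's rank-one update
B_next = B_k + np.outer(y_k - B_k @ s_k, s_k) / (s_k @ s_k)

secant_ok = np.allclose(B_next @ s_k, y_k)   # quasi-Newton equation holds
# directions orthogonal to s_k are left unchanged
z = rng.standard_normal(n)
z -= (z @ s_k) / (s_k @ s_k) * s_k           # project out the s_k component
unchanged = np.allclose(B_next @ z, B_k @ z)
```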

Quasi-Newton and optimization

The optimization setting is special:
1. the Hessian matrix in optimization problems is symmetric;
2. in gradient methods, when we let x_{k+1} = x_k − (B_{k+1})^{−1} ∇f(x_k), it is desirable that B_{k+1} be positive definite.

Broyden's update
B_{k+1} = B_k + (y_k − B_k s_k) s_k^T / (s_k^T s_k)
is generally not symmetric even if B_k is.

Symmetry. Remedy: let C_1 = B_k + (y_k − B_k s_k) s_k^T / (s_k^T s_k) and symmetrize: C_2 = ½(C_1 + C_1^T). However, C_2 does not satisfy the quasi-Newton equation. The Broyden update of C_2,
C_3 = C_2 + (y_k − C_2 s_k) s_k^T / (s_k^T s_k),
is in turn not symmetric, and so on.

PBS update. In the limit this process yields
B_{k+1} = B_k + [(y_k − B_k s_k) s_k^T + s_k (y_k − B_k s_k)^T] / (s_k^T s_k) − (s_k^T (y_k − B_k s_k)) s_k s_k^T / (s_k^T s_k)²
(the PBS, Powell-Broyden-Symmetric, update). Imposing also hereditary positive definiteness, the DFP (Davidon-Fletcher-Powell) update is obtained:
B_{k+1} = B_k + [(y_k − B_k s_k) y_k^T + y_k (y_k − B_k s_k)^T] / (y_k^T s_k) − (s_k^T (y_k − B_k s_k)) y_k y_k^T / (y_k^T s_k)²
 = (I − y_k s_k^T / (y_k^T s_k)) B_k (I − s_k y_k^T / (y_k^T s_k)) + y_k y_k^T / (y_k^T s_k)

BFGS. The same ideas, but applied to the approximate inverse Hessian. Inverse quasi-Newton equation: s_k = H_{k+1} y_k. This leads to the most common quasi-Newton update, BFGS (Broyden-Fletcher-Goldfarb-Shanno):
H_{k+1} = (I − s_k y_k^T / (y_k^T s_k)) H_k (I − y_k s_k^T / (y_k^T s_k)) + s_k s_k^T / (y_k^T s_k)
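The BFGS inverse update can be checked for its two key properties: it satisfies the inverse quasi-Newton equation H_{k+1} y_k = s_k, and it preserves symmetric positive definiteness whenever the curvature condition y_k^T s_k > 0 holds. The random data is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
H_k = np.eye(n)                 # current (SPD) inverse-Hessian approximation
s_k = rng.standard_normal(n)
y_k = rng.standard_normal(n)
if y_k @ s_k < 0:               # enforce the curvature condition y^T s > 0
    y_k = -y_k

rho = 1.0 / (y_k @ s_k)
V = np.eye(n) - rho * np.outer(s_k, y_k)
H_next = V @ H_k @ V.T + rho * np.outer(s_k, s_k)   # BFGS update of H_k

inverse_secant_ok = np.allclose(H_next @ y_k, s_k)
eigvals = np.linalg.eigvalsh(H_next)                # H_next is symmetric
positive_definite = bool(np.all(eigvals > 0))
```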

BFGS method

x_{k+1} = x_k − α_k H_k ∇f(x_k)
s_k = x_{k+1} − x_k,  y_k = ∇f(x_{k+1}) − ∇f(x_k)
H_{k+1} = (I − s_k y_k^T / (y_k^T s_k)) H_k (I − y_k s_k^T / (y_k^T s_k)) + s_k s_k^T / (y_k^T s_k)

Trust Region methods

A possible defect of the standard Newton method: the approximation becomes less and less precise as we move away from the current point — a long step means a bad approximation. Idea: constrained minimization of the quadratic approximation:
x_{k+1} = arg min m_k(x)  s.t.  ‖x_{k+1} − x_k‖ ≤ Δ_k
where
m_k(x) = f(x_k) + ∇^T f(x_k)(x − x_k) + ½ (x − x_k)^T ∇²f(x_k)(x − x_k)
and Δ_k > 0 is a parameter. A first advantage (over pure Newton): the step is always well defined (thanks to Weierstrass's theorem).

Outline of Trust Region methods

Let m_k(·) be a local model function, e.g., in Newton trust-region methods,
m_k(s) = f(x_k) + s^T ∇f(x_k) + ½ s^T ∇²f(x_k) s
or, in a quasi-Newton trust-region method,
m_k(s) = f(x_k) + s^T ∇f(x_k) + ½ s^T B_k s
How to choose and update the trust-region radius Δ_k? Given a step s_k, let
ρ_k = (f(x_k) − f(x_k + s_k)) / (m_k(0) − m_k(s_k))
the ratio between the actual reduction and the predicted reduction.

Model updating

The predicted reduction is always non-negative; if ρ_k is small (surely if it is negative), the model and the function strongly disagree: the step must be rejected and the trust region reduced. If ρ_k ≈ 1 it is safe to expand the trust region; intermediate values of ρ_k lead us to keep the region unchanged.

Algorithm
Data: Δ̂ > 0, Δ_0 ∈ (0, Δ̂), η ∈ [0, 1/4)
for k = 0, 1, ... do
    find the step s_k minimizing the model in the trust region, and compute ρ_k ;
    if ρ_k < 1/4 then Δ_{k+1} = Δ_k / 4 ;
    else if ρ_k > 3/4 and ‖s_k‖ = Δ_k then Δ_{k+1} = min{2Δ_k, Δ̂} ;
    else Δ_{k+1} = Δ_k ;
    if ρ_k > η then x_{k+1} = x_k + s_k ; else x_{k+1} = x_k ;
end

Solving the model: how to find
min_{‖s‖ ≤ Δ} ∇f(x_k)^T s + ½ s^T B_k s
If B_k ⪰ 0, the KKT conditions are necessary and sufficient; rewriting the constraint as s^T s ≤ Δ²:
∇f(x_k) + B_k s + 2λ s = 0,  λ(s^T s − Δ²) = 0,  λ ≥ 0
Thus either s is in the interior of the ball with radius Δ, in which case λ = 0 and we have the (quasi-)Newton step
p = −B_k^{−1} ∇f(x_k),
or ‖s‖ = Δ and, if λ > 0,
2λ s = −∇f(x_k) − B_k s = −∇m_k(s):
s is parallel to the negative gradient of the model and normal to its contour lines.
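The radius-update and acceptance logic of the algorithm above, in isolation; the thresholds 1/4, 3/4 and the factors are the ones on the slide, while η = 0.1 is just one admissible value.

```python
def trust_region_update(rho, delta, step_norm, delta_max, eta=0.1):
    """Shrink the radius on poor model agreement, expand it on very good
    agreement when the step hits the boundary; accept only if rho > eta."""
    if rho < 0.25:
        delta_new = delta / 4
    elif rho > 0.75 and abs(step_norm - delta) < 1e-12:
        delta_new = min(2 * delta, delta_max)
    else:
        delta_new = delta
    accept = rho > eta
    return delta_new, accept

# poor model agreement: radius quartered, step rejected
d1, a1 = trust_region_update(rho=-0.5, delta=1.0, step_norm=0.8, delta_max=4.0)
# excellent agreement with the step on the boundary: radius doubled, step accepted
d2, a2 = trust_region_update(rho=0.9, delta=1.0, step_norm=1.0, delta_max=4.0)
```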

The Cauchy Point

A strategy to approximately solve the trust-region subproblem: find the Cauchy point, the minimizer of m_k along the direction −∇f(x_k) within the trust region. First find the direction:
p_k^s = arg min_{‖p‖ ≤ Δ_k} f_k + ∇f(x_k)^T p
Then, along this direction, find a minimizer:
τ_k = arg min_{τ ≥ 0, ‖τ p_k^s‖ ≤ Δ_k} m_k(τ p_k^s)

Finding the Cauchy point: p_k^s is easy — there is an analytic solution:
p_k^s = −Δ_k ∇f(x_k) / ‖∇f(x_k)‖
For the step size τ_k: if ∇f(x_k)^T B_k ∇f(x_k) ≤ 0 (a negative curvature direction), take the largest possible step, τ_k = 1. Otherwise the model along the line is strictly convex, so
τ_k = min{1, ‖∇f(x_k)‖³ / (Δ_k ∇f(x_k)^T B_k ∇f(x_k))}
The Cauchy point is x_k + τ_k p_k^s.

Choosing the Cauchy point gives global but extremely slow convergence (similar to steepest descent). Usually an improved point is searched for, starting from the Cauchy one.
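The Cauchy-point computation translates line by line; B below plays the role of ∇²f(x_k) or B_k, and the matrices chosen are illustrative.

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Minimizer of the quadratic model along -g within a ball of radius delta."""
    g_norm = np.linalg.norm(g)
    p_s = -delta * g / g_norm        # boundary step along -g
    curvature = g @ B @ g
    if curvature <= 0:               # negative curvature: go all the way to the boundary
        tau = 1.0
    else:                            # strictly convex along the line
        tau = min(1.0, g_norm**3 / (delta * curvature))
    return tau * p_s

g = np.array([1.0, 0.0])
# convex model: the unconstrained minimizer along -g (at g.g/(g B g) = 0.5) is inside
p_c = cauchy_point(g, np.diag([2.0, 1.0]), delta=5.0)
# negative curvature model: the Cauchy point sits on the boundary
p_nc = cauchy_point(g, np.diag([-1.0, 1.0]), delta=5.0)
```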

Derivative Free Optimization — Pattern Search

For smooth optimization, but without knowledge of derivatives. Elementary idea: if x ∈ R² is not a local minimum of f, then at least one of the directions e_1, e_2, −e_1, −e_2 (moving towards E, N, W, S) forms an acute angle with −∇f(x) and is thus a descent direction. Direct search: explore all the directions in search of one which gives a descent.

Coordinate search. Let D = {±e_i} be the set of coordinate directions and their opposites.
Data: k = 0, Δ_0 an initial step length, x_0 a starting point
while Δ_k is large enough do
    if f(x_k + Δ_k d) < f(x_k) for some d ∈ D then
        x_{k+1} = x_k + Δ_k d (step accepted) ;
    else
        Δ_{k+1} = 0.5 Δ_k ;
    end
    k = k + 1 ;
end

Pattern search. It is not necessary to explore 2n directions: it is sufficient that the set of directions forms a positive span, i.e. every v ∈ R^n should be expressible as a non-negative linear combination of the vectors in the set. Formally, G is a generating set iff
∀v ≠ 0 ∈ R^n ∃g ∈ G : v^T g > 0
A good generating set should be characterized by a sufficiently high cosine measure:
κ(G) := min_{v ≠ 0} max_{d ∈ G} v^T d / (‖v‖ ‖d‖)
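Coordinate search as described above, with D = {±e_i} and step halving on failure; the quadratic test function is an illustrative choice.

```python
import numpy as np

def coordinate_search(f, x0, delta=1.0, min_delta=1e-8, max_iter=10000):
    """Try the 2n coordinate directions; halve the step when all of them fail."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    directions = [s * e for e in np.eye(n) for s in (+1.0, -1.0)]
    for _ in range(max_iter):
        if delta < min_delta:        # step length no longer "large enough"
            break
        for d in directions:
            if f(x + delta * d) < f(x):
                x = x + delta * d    # step accepted
                break
        else:
            delta *= 0.5             # all 2n directions failed
    return x

f = lambda x: (x[0] - 1) ** 2 + 3 * (x[1] + 2) ** 2
x_min = coordinate_search(f, [0.0, 0.0])   # minimizer of f is (1, -2)
```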


More information

Quasi-Newton methods for minimization

Quasi-Newton methods for minimization Quasi-Newton methods for minimization Lectures for PHD course on Numerical optimization Enrico Bertolazzi DIMS Universitá di Trento November 21 December 14, 2011 Quasi-Newton methods for minimization 1

More information

5. Duality. Lagrangian

5. Duality. Lagrangian 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

Geometric problems. Chapter Projection on a set. The distance of a point x 0 R n to a closed set C R n, in the norm, is defined as

Geometric problems. Chapter Projection on a set. The distance of a point x 0 R n to a closed set C R n, in the norm, is defined as Chapter 8 Geometric problems 8.1 Projection on a set The distance of a point x 0 R n to a closed set C R n, in the norm, is defined as dist(x 0,C) = inf{ x 0 x x C}. The infimum here is always achieved.

More information

Optimality, Duality, Complementarity for Constrained Optimization

Optimality, Duality, Complementarity for Constrained Optimization Optimality, Duality, Complementarity for Constrained Optimization Stephen Wright University of Wisconsin-Madison May 2014 Wright (UW-Madison) Optimality, Duality, Complementarity May 2014 1 / 41 Linear

More information

Convex Optimization. Problem set 2. Due Monday April 26th

Convex Optimization. Problem set 2. Due Monday April 26th Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining

More information

8. Geometric problems

8. Geometric problems 8. Geometric problems Convex Optimization Boyd & Vandenberghe extremal volume ellipsoids centering classification placement and facility location 8 Minimum volume ellipsoid around a set Löwner-John ellipsoid

More information

Optimization Problems with Constraints - introduction to theory, numerical Methods and applications

Optimization Problems with Constraints - introduction to theory, numerical Methods and applications Optimization Problems with Constraints - introduction to theory, numerical Methods and applications Dr. Abebe Geletu Ilmenau University of Technology Department of Simulation and Optimal Processes (SOP)

More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

Convex Optimization and Modeling

Convex Optimization and Modeling Convex Optimization and Modeling Convex Optimization Fourth lecture, 05.05.2010 Jun.-Prof. Matthias Hein Reminder from last time Convex functions: first-order condition: f(y) f(x) + f x,y x, second-order

More information

Lecture V. Numerical Optimization

Lecture V. Numerical Optimization Lecture V Numerical Optimization Gianluca Violante New York University Quantitative Macroeconomics G. Violante, Numerical Optimization p. 1 /19 Isomorphism I We describe minimization problems: to maximize

More information

Introduction to Nonlinear Stochastic Programming

Introduction to Nonlinear Stochastic Programming School of Mathematics T H E U N I V E R S I T Y O H F R G E D I N B U Introduction to Nonlinear Stochastic Programming Jacek Gondzio Email: J.Gondzio@ed.ac.uk URL: http://www.maths.ed.ac.uk/~gondzio SPS

More information

Lecture 14: October 17

Lecture 14: October 17 1-725/36-725: Convex Optimization Fall 218 Lecture 14: October 17 Lecturer: Lecturer: Ryan Tibshirani Scribes: Pengsheng Guo, Xian Zhou Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Assignment 1: From the Definition of Convexity to Helley Theorem

Assignment 1: From the Definition of Convexity to Helley Theorem Assignment 1: From the Definition of Convexity to Helley Theorem Exercise 1 Mark in the following list the sets which are convex: 1. {x R 2 : x 1 + i 2 x 2 1, i = 1,..., 10} 2. {x R 2 : x 2 1 + 2ix 1x

More information

1 Numerical optimization

1 Numerical optimization Contents 1 Numerical optimization 5 1.1 Optimization of single-variable functions............ 5 1.1.1 Golden Section Search................... 6 1.1. Fibonacci Search...................... 8 1. Algorithms

More information

The Steepest Descent Algorithm for Unconstrained Optimization

The Steepest Descent Algorithm for Unconstrained Optimization The Steepest Descent Algorithm for Unconstrained Optimization Robert M. Freund February, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 1 Steepest Descent Algorithm The problem

More information

Duality Theory of Constrained Optimization

Duality Theory of Constrained Optimization Duality Theory of Constrained Optimization Robert M. Freund April, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 The Practical Importance of Duality Duality is pervasive

More information

The general programming problem is the nonlinear programming problem where a given function is maximized subject to a set of inequality constraints.

The general programming problem is the nonlinear programming problem where a given function is maximized subject to a set of inequality constraints. 1 Optimization Mathematical programming refers to the basic mathematical problem of finding a maximum to a function, f, subject to some constraints. 1 In other words, the objective is to find a point,

More information

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL)

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL) Part 4: Active-set methods for linearly constrained optimization Nick Gould RAL fx subject to Ax b Part C course on continuoue optimization LINEARLY CONSTRAINED MINIMIZATION fx subject to Ax { } b where

More information

Lecture 15 Newton Method and Self-Concordance. October 23, 2008

Lecture 15 Newton Method and Self-Concordance. October 23, 2008 Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications

More information

Lecture: Duality.

Lecture: Duality. Lecture: Duality http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Introduction 2/35 Lagrange dual problem weak and strong

More information

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

More information

Conic Linear Programming. Yinyu Ye

Conic Linear Programming. Yinyu Ye Conic Linear Programming Yinyu Ye December 2004, revised January 2015 i ii Preface This monograph is developed for MS&E 314, Conic Linear Programming, which I am teaching at Stanford. Information, lecture

More information

4TE3/6TE3. Algorithms for. Continuous Optimization

4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Duality in Nonlinear Optimization ) Tamás TERLAKY Computing and Software McMaster University Hamilton, January 2004 terlaky@mcmaster.ca Tel: 27780 Optimality

More information

Lecture: Duality of LP, SOCP and SDP

Lecture: Duality of LP, SOCP and SDP 1/33 Lecture: Duality of LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html wenzw@pku.edu.cn Acknowledgement:

More information

Convex optimization problems. Optimization problem in standard form

Convex optimization problems. Optimization problem in standard form Convex optimization problems optimization problem in standard form convex optimization problems linear optimization quadratic optimization geometric programming quasiconvex optimization generalized inequality

More information

14. Duality. ˆ Upper and lower bounds. ˆ General duality. ˆ Constraint qualifications. ˆ Counterexample. ˆ Complementary slackness.

14. Duality. ˆ Upper and lower bounds. ˆ General duality. ˆ Constraint qualifications. ˆ Counterexample. ˆ Complementary slackness. CS/ECE/ISyE 524 Introduction to Optimization Spring 2016 17 14. Duality ˆ Upper and lower bounds ˆ General duality ˆ Constraint qualifications ˆ Counterexample ˆ Complementary slackness ˆ Examples ˆ Sensitivity

More information

Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS

Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS Here we consider systems of linear constraints, consisting of equations or inequalities or both. A feasible solution

More information

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0 Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =

More information

UC Berkeley Department of Electrical Engineering and Computer Science. EECS 227A Nonlinear and Convex Optimization. Solutions 5 Fall 2009

UC Berkeley Department of Electrical Engineering and Computer Science. EECS 227A Nonlinear and Convex Optimization. Solutions 5 Fall 2009 UC Berkeley Department of Electrical Engineering and Computer Science EECS 227A Nonlinear and Convex Optimization Solutions 5 Fall 2009 Reading: Boyd and Vandenberghe, Chapter 5 Solution 5.1 Note that

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Lecture 5. Theorems of Alternatives and Self-Dual Embedding

Lecture 5. Theorems of Alternatives and Self-Dual Embedding IE 8534 1 Lecture 5. Theorems of Alternatives and Self-Dual Embedding IE 8534 2 A system of linear equations may not have a solution. It is well known that either Ax = c has a solution, or A T y = 0, c

More information

Interior Point Methods for Mathematical Programming

Interior Point Methods for Mathematical Programming Interior Point Methods for Mathematical Programming Clóvis C. Gonzaga Federal University of Santa Catarina, Florianópolis, Brazil EURO - 2013 Roma Our heroes Cauchy Newton Lagrange Early results Unconstrained

More information

Chapter 2. Optimization. Gradients, convexity, and ALS

Chapter 2. Optimization. Gradients, convexity, and ALS Chapter 2 Optimization Gradients, convexity, and ALS Contents Background Gradient descent Stochastic gradient descent Newton s method Alternating least squares KKT conditions 2 Motivation We can solve

More information

Optimization. A first course on mathematics for economists

Optimization. A first course on mathematics for economists Optimization. A first course on mathematics for economists Xavier Martinez-Giralt Universitat Autònoma de Barcelona xavier.martinez.giralt@uab.eu II.3 Static optimization - Non-Linear programming OPT p.1/45

More information

MATHEMATICS FOR COMPUTER VISION WEEK 8 OPTIMISATION PART 2. Dr Fabio Cuzzolin MSc in Computer Vision Oxford Brookes University Year

MATHEMATICS FOR COMPUTER VISION WEEK 8 OPTIMISATION PART 2. Dr Fabio Cuzzolin MSc in Computer Vision Oxford Brookes University Year MATHEMATICS FOR COMPUTER VISION WEEK 8 OPTIMISATION PART 2 1 Dr Fabio Cuzzolin MSc in Computer Vision Oxford Brookes University Year 2013-14 OUTLINE OF WEEK 8 topics: quadratic optimisation, least squares,

More information

MVE165/MMG631 Linear and integer optimization with applications Lecture 13 Overview of nonlinear programming. Ann-Brith Strömberg

MVE165/MMG631 Linear and integer optimization with applications Lecture 13 Overview of nonlinear programming. Ann-Brith Strömberg MVE165/MMG631 Overview of nonlinear programming Ann-Brith Strömberg 2015 05 21 Areas of applications, examples (Ch. 9.1) Structural optimization Design of aircraft, ships, bridges, etc Decide on the material

More information

IOE 511/Math 652: Continuous Optimization Methods, Section 1

IOE 511/Math 652: Continuous Optimization Methods, Section 1 IOE 511/Math 652: Continuous Optimization Methods, Section 1 Marina A. Epelman Fall 2007 These notes can be freely reproduced for any non-commercial purpose; please acknowledge the author if you do so.

More information

Chapter 2: Preliminaries and elements of convex analysis

Chapter 2: Preliminaries and elements of convex analysis Chapter 2: Preliminaries and elements of convex analysis Edoardo Amaldi DEIB Politecnico di Milano edoardo.amaldi@polimi.it Website: http://home.deib.polimi.it/amaldi/opt-14-15.shtml Academic year 2014-15

More information

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09 Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods

More information

10 Numerical methods for constrained problems

10 Numerical methods for constrained problems 10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions

More information

Linear and non-linear programming

Linear and non-linear programming Linear and non-linear programming Benjamin Recht March 11, 2005 The Gameplan Constrained Optimization Convexity Duality Applications/Taxonomy 1 Constrained Optimization minimize f(x) subject to g j (x)

More information

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010 I.3. LMI DUALITY Didier HENRION henrion@laas.fr EECI Graduate School on Control Supélec - Spring 2010 Primal and dual For primal problem p = inf x g 0 (x) s.t. g i (x) 0 define Lagrangian L(x, z) = g 0

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 3. Gradient Method

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 3. Gradient Method Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 3 Gradient Method Shiqian Ma, MAT-258A: Numerical Optimization 2 3.1. Gradient method Classical gradient method: to minimize a differentiable convex

More information

Convex Optimization M2

Convex Optimization M2 Convex Optimization M2 Lecture 8 A. d Aspremont. Convex Optimization M2. 1/57 Applications A. d Aspremont. Convex Optimization M2. 2/57 Outline Geometrical problems Approximation problems Combinatorial

More information

1 Numerical optimization

1 Numerical optimization Contents Numerical optimization 5. Optimization of single-variable functions.............................. 5.. Golden Section Search..................................... 6.. Fibonacci Search........................................

More information

Lecture: Convex Optimization Problems

Lecture: Convex Optimization Problems 1/36 Lecture: Convex Optimization Problems http://bicmr.pku.edu.cn/~wenzw/opt-2015-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Introduction 2/36 optimization

More information

Constrained Optimization Theory

Constrained Optimization Theory Constrained Optimization Theory Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Constrained Optimization Theory IMA, August

More information

Lecture 13: Constrained optimization

Lecture 13: Constrained optimization 2010-12-03 Basic ideas A nonlinearly constrained problem must somehow be converted relaxed into a problem which we can solve (a linear/quadratic or unconstrained problem) We solve a sequence of such problems

More information

Appendix A Taylor Approximations and Definite Matrices

Appendix A Taylor Approximations and Definite Matrices Appendix A Taylor Approximations and Definite Matrices Taylor approximations provide an easy way to approximate a function as a polynomial, using the derivatives of the function. We know, from elementary

More information

A New Trust Region Algorithm Using Radial Basis Function Models

A New Trust Region Algorithm Using Radial Basis Function Models A New Trust Region Algorithm Using Radial Basis Function Models Seppo Pulkkinen University of Turku Department of Mathematics July 14, 2010 Outline 1 Introduction 2 Background Taylor series approximations

More information