Notes on Numerical Optimization
Notes on Numerical Optimization
University of Chicago, 2014
Vivak Patel
October 18, 2014
Contents

Part I: Fundamentals of Optimization
1 Overview of Numerical Optimization
  1.1 Problem and Classification
  1.2 Theoretical Properties of Solutions
  1.3 Algorithm Overview
  1.4 Fundamental Definitions and Results
2 Line Search Methods
  2.1 Step Length
  2.2 Step Length Selection Algorithms
  2.3 Global Convergence and Zoutendijk
3 Trust Region Methods
  3.1 Trust Region Subproblem and Fidelity
  3.2 Fidelity Algorithms
  3.3 Approximate Solutions to Subproblem
    3.3.1 Cauchy Point
    3.3.2 Dogleg Method
    3.3.3 Global Convergence of Cauchy Point Methods
  3.4 Iterative Solutions to Subproblem
    3.4.1 Exact Solution to Subproblem
    3.4.2 Newton's Root Finding Iterative Solution
    3.4.3 Trust Region Subproblem Algorithms
4 Conjugate Gradients

Part II: Model Hessian Selection
5 Newton's Method
6 Newton's Method with Hessian Modification
7 Quasi-Newton's Method
  7.1 Rank-2 Update: DFP & BFGS
  7.2 Rank-1 Update: SR1

Part III: Specialized Applications of Optimization
8 Inexact Newton's Method
9 Limited Memory BFGS
10 Least Squares Regression Methods
  10.1 Linear Least Squares Regression
  10.2 Line Search: Gauss-Newton
  10.3 Trust Region: Levenberg-Marquardt

Part IV: Constrained Optimization
11 Theory of Constrained Optimization
List of Algorithms
1 Backtracking Algorithm
2 Interpolation Algorithm
3 Trust Region Management Algorithm
4 Solution Acceptance Algorithm
5 Overview of Conjugate Gradient Algorithm
6 Conjugate Gradients Algorithm for Convex Quadratic
7 Fletcher-Reeves CG Algorithm
8 Line Search with Modified Hessian
9 Overview of Inexact CG Algorithm
10 L-BFGS Algorithm
Part I
Fundamentals of Optimization

1 Overview of Numerical Optimization

1.1 Problem and Classification

1. Problem:
$$\arg\min_{z \in \mathbb{R}^n} f(z) \quad \text{subject to} \quad c_i(z) = 0,\ i \in \mathcal{E}; \qquad c_i(z) \geq 0,\ i \in \mathcal{I}$$
   (a) $f : \mathbb{R}^n \to \mathbb{R}$ is known as the objective function
   (b) $\{c_i : i \in \mathcal{E}\}$ are the equality constraints
   (c) $\{c_i : i \in \mathcal{I}\}$ are the inequality constraints
2. Classifications
   (a) Unconstrained vs. Constrained. If $\mathcal{E} \cup \mathcal{I} = \emptyset$ then it is an unconstrained problem.
   (b) Linear Programming vs. Nonlinear Programming. When $f(z)$ and the $c_i(z)$ are all linear functions of $z$, this is a linear programming problem. Otherwise, it is a nonlinear programming problem.
   (c) Note: this is not the same as having linear or nonlinear equations.

1.2 Theoretical Properties of Solutions

1. Global Minimizer. Weak Local Minimizer (Local Minimizer). Strict Local Minimizer. Isolated Local Minimizer.

   Definition 1.1. Let $f : \mathbb{R}^n \to \mathbb{R}$.
   (a) $x^*$ is a global minimizer of $f$ if $f(x^*) \leq f(x)$ for all $x \in \mathbb{R}^n$.
   (b) $x^*$ is a local minimizer of $f$ if for some neighborhood $N$ of $x^*$, $f(x^*) \leq f(x)$ for all $x \in N(x^*)$.
   (c) $x^*$ is a strict local minimizer of $f$ if for some neighborhood $N(x^*)$, $f(x^*) < f(x)$ for all $x \in N(x^*)$ with $x \neq x^*$.
   (d) $x^*$ is an isolated local minimizer of $f$ if for some neighborhood $N(x^*)$, there are no other local minimizers of $f$ in $N(x^*)$.

   Note 1.1. Every isolated local minimizer is a strict minimizer, but it is not true that a strict minimizer is always an isolated local minimizer.

2. Taylor's Theorem

   Theorem 1.1. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. Let $p \in \mathbb{R}^n$. Then for some $t \in (0, 1)$:
$$f(x + p) = f(x) + \nabla f(x + tp)^T p \tag{1}$$
If $f$ is twice continuously differentiable, then:
$$\nabla f(x + p) = \nabla f(x) + \int_0^1 \nabla^2 f(x + tp)\, p \, dt \tag{2}$$
And for some $t \in (0, 1)$:
$$f(x + p) = f(x) + \nabla f(x)^T p + \frac{1}{2} p^T \nabla^2 f(x + tp)\, p \tag{3}$$

3. Necessary Conditions
   (a) First Order Necessary Conditions
   Lemma 1.1. Let $f$ be continuously differentiable. If $x^*$ is a local minimizer of $f$, then $\nabla f(x^*) = 0$.
   Proof. Suppose $\nabla f(x^*) \neq 0$. Let $p = -\nabla f(x^*)$. Then $p^T \nabla f(x^*) < 0$, and by continuity there is some $T > 0$ such that for any $t \in [0, T)$, $p^T \nabla f(x^* + tp) < 0$. We now apply Equation (1) of Taylor's Theorem to get a contradiction. Let $\bar{t} \in (0, T)$. Then there is a $t \in (0, \bar{t})$ such that:
$$f(x^* + \bar{t}p) = f(x^*) + \bar{t}\,\nabla f(x^* + tp)^T p < f(x^*)$$
   (b) Second Order Necessary Conditions
   Lemma 1.2. Suppose $f$ is twice continuously differentiable. If $x^*$ is a local minimizer of $f$, then $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*) \succeq 0$ (i.e. positive semi-definite).
   Proof. The first conclusion follows from the first order necessary condition. We prove the second by contradiction and Taylor's Theorem. Suppose $\nabla^2 f(x^*) \not\succeq 0$. Then there is a $p$ such that:
$$p^T \nabla^2 f(x^*)\, p < 0$$
   By continuity, there is a $T > 0$ such that $p^T \nabla^2 f(x^* + tp)\, p < 0$ for $t \in [0, T]$. Fix $\bar{t} \in (0, T]$. Then, by Equation (3) of Taylor's Theorem and $\nabla f(x^*) = 0$, there is a $t \in (0, 1)$ such that:
$$f(x^* + \bar{t}p) = f(x^*) + \frac{\bar{t}^2}{2}\, p^T \nabla^2 f(x^* + t\bar{t}p)\, p < f(x^*)$$
4. Second Order Sufficient Condition
   Lemma 1.3. Let $x^* \in \mathbb{R}^n$. Suppose $\nabla^2 f$ is continuous, $\nabla^2 f(x^*) \succ 0$, and $\nabla f(x^*) = 0$. Then $x^*$ is a strict local minimizer.
Proof. By continuity, there is a ball $B$ of radius $r$ about $x^*$ in which $\nabla^2 f(x) \succ 0$. Let $0 < \|p\| < r$. Then there exists a $t \in (0, 1)$ such that:
$$f(x^* + p) = f(x^*) + \frac{1}{2} p^T \nabla^2 f(x^* + tp)\, p > f(x^*)$$

5. Convexity and Local Minimizers
   Lemma 1.4. When $f$ is convex, any local minimizer $x^*$ is a global minimizer. If, in addition, $f$ is differentiable, then any stationary point is a global minimizer.
   Proof. Suppose $x^*$ is not a global minimizer. Let $z$ be a global minimizer, so $f(x^*) > f(z)$. By convexity, for any $\lambda \in (0, 1]$:
$$f(\lambda z + (1 - \lambda)x^*) \leq \lambda f(z) + (1 - \lambda) f(x^*) < f(x^*)$$
   So as $\lambda \to 0$, we see that in any neighborhood of $x^*$ there is a point $w$ with $f(w) < f(x^*)$. A contradiction.
   For the second part, suppose $z$ is as above. Then:
$$\nabla f(x^*)^T (z - x^*) = \lim_{\lambda \downarrow 0} \frac{f(x^* + \lambda(z - x^*)) - f(x^*)}{\lambda} \leq \lim_{\lambda \downarrow 0} \frac{\lambda f(z) + (1 - \lambda)f(x^*) - f(x^*)}{\lambda} = f(z) - f(x^*) < 0$$
   So $x^*$ is not a stationary point.

1.3 Algorithm Overview

1. In general, algorithms begin with a seed point $x_0$ and locally search for decreases in the objective function, producing iterates $x_k$, until stopping conditions are met.
2. Algorithms typically generate a local model for $f$ near a point $x_k$:
$$f(x_k + p) \approx m_k(p) = f_k + p^T \nabla f_k + \frac{1}{2} p^T B_k p$$
3. Different choices of $B_k$ lead to different methods with different properties:
   (a) $B_k = 0$ leads to steepest descent methods
   (b) Letting $B_k$ be the closest positive definite approximation to $\nabla^2 f_k$ leads to Newton's methods
   (c) Iterative approximations to the Hessian given by $B_k$ lead to quasi-Newton's methods
   (d) Conjugate gradient methods update $p_k$ without explicitly computing $B_k$
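The choices of $B_k$ above can be made concrete on a toy problem. A minimal numpy sketch (the helper name and the example quadratic are mine, not from the notes) contrasting the steepest descent direction ($B_k = 0$) with the Newton direction ($B_k = \nabla^2 f_k$):

```python
import numpy as np

def model_direction(grad, B=None):
    """Minimizer direction of m(p) = g^T p + 0.5 p^T B p.
    B=None (i.e. B = 0) gives the steepest descent direction -g;
    a positive definite B gives the (quasi-)Newton direction -B^{-1} g."""
    if B is None:
        return -grad
    return -np.linalg.solve(B, grad)

# Toy objective f(x) = 0.5 x^T A x with A positive definite; gradient is A x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
x = np.array([1.0, 1.0])
g = A @ x

p_sd = model_direction(g)          # B = 0: steepest descent
p_newton = model_direction(g, A)   # B = Hessian: Newton step

# For a quadratic, the Newton step jumps straight to the minimizer x* = 0.
```

Both directions are descent directions; they differ in how much curvature information they use.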
1.4 Fundamental Definitions and Results

1. Q-convergence.
   Definition 1.2. Let $x_k \to x^*$ in $\mathbb{R}^n$.
   (a) $x_k$ converges Q-linearly if there is a $q \in (0, 1)$ such that for sufficiently large $k$:
$$\frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|} \leq q$$
   (b) $x_k$ converges Q-superlinearly if:
$$\lim_{k \to \infty} \frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|} = 0$$
   (c) $x_k$ converges Q-quadratically if there is a $q_2 > 0$ such that for sufficiently large $k$:
$$\frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|^2} \leq q_2$$
2. R-convergence.
   Definition 1.3. Let $x_k \to x^*$ in $\mathbb{R}^n$.
   (a) $x_k$ converges R-linearly if there is a sequence $v_k \subset \mathbb{R}$ converging Q-linearly to 0 such that $\|x_k - x^*\| \leq v_k$.
   (b) $x_k$ converges R-superlinearly if there is a sequence $v_k \subset \mathbb{R}$ converging Q-superlinearly to 0 such that $\|x_k - x^*\| \leq v_k$.
   (c) $x_k$ converges R-quadratically if there is a sequence $v_k \subset \mathbb{R}$ converging Q-quadratically to 0 such that $\|x_k - x^*\| \leq v_k$.
3. Sherman-Morrison-Woodbury
   Lemma 1.5. If $A$ and $\tilde{A}$ are non-singular and related by $\tilde{A} = A + ab^T$, then:
$$\tilde{A}^{-1} = A^{-1} - \frac{A^{-1} a b^T A^{-1}}{1 + b^T A^{-1} a}$$
   Proof. We can check this by simply plugging in the formula. However, to derive it (formally, via the Neumann series):
$$\tilde{A}^{-1} = (A + ab^T)^{-1} = \left(A[I + A^{-1}ab^T]\right)^{-1} = [I + A^{-1}ab^T]^{-1} A^{-1}$$
$$= \left[I - A^{-1}ab^T + A^{-1}ab^T A^{-1}ab^T - \cdots\right] A^{-1} = \left[I - A^{-1}a\left(1 - b^T A^{-1}a + (b^T A^{-1}a)^2 - \cdots\right)b^T\right] A^{-1} = A^{-1} - \frac{A^{-1}ab^T A^{-1}}{1 + b^T A^{-1} a}$$
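Lemma 1.5 is easy to sanity-check numerically. A small sketch (the particular matrices are mine, chosen so the denominator is safely nonzero):

```python
import numpy as np

# Sherman-Morrison-Woodbury rank-one inverse update, Lemma 1.5.
A = np.array([[2.0, 0.0], [0.0, 3.0]])
a = np.array([1.0, 1.0])
b = np.array([0.5, 1.0])

A_inv = np.linalg.inv(A)
denom = 1.0 + b @ A_inv @ a   # nonzero iff A + a b^T is non-singular
A_tilde_inv = A_inv - np.outer(A_inv @ a, b @ A_inv) / denom
```

This is the identity that makes quasi-Newton inverse-Hessian updates cheap: a rank-one change to the matrix yields a rank-one change to its inverse.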
2 Line Search Methods

2.1 Step Length

1. Descent Direction
   Definition 2.1. $p \in \mathbb{R}^n$ is a descent direction at $x_k$ if $p^T \nabla f_k < 0$.
2. Problem: find $\alpha \in \arg\min_\alpha \phi(\alpha)$, where $\phi(\alpha) = f(x_k + \alpha p_k)$. However, it is too costly to find the exact minimizer of $\phi$. Instead, we find an $\alpha$ which acceptably reduces $\phi$.
3. Wolfe Conditions
   (a) Armijo Condition. Curvature Condition. Wolfe Condition. Strong Wolfe Condition.
   Definition 2.2. Fix $x$ and a descent direction $p$. Let $\phi(\alpha) = f(x + \alpha p)$, so $\phi'(\alpha) = \nabla f(x + \alpha p)^T p$. Fix $0 < c_1 < c_2 < 1$.
   i. $\alpha$ satisfies the Armijo Condition if $\phi(\alpha) \leq \phi(0) + \alpha c_1 \phi'(0)$. Note $l(\alpha) := \phi(0) + \alpha c_1 \phi'(0)$.
   ii. $\alpha$ satisfies the Curvature Condition if $\phi'(\alpha) \geq c_2 \phi'(0)$.
   iii. $\alpha$ satisfies the Strong Curvature Condition if $|\phi'(\alpha)| \leq c_2 |\phi'(0)|$.
   iv. The Wolfe Condition is the Armijo Condition together with the Curvature Condition.
   v. The Strong Wolfe Condition is the Armijo Condition together with the Strong Curvature Condition.
   (b) The Armijo Condition guarantees a decrease at the next iterate by ensuring $\phi(\alpha) < \phi(0)$.
   (c) Curvature Condition
   i. If $\phi'(\alpha) < c_2 \phi'(0)$, then $\phi$ is still decreasing steeply at $\alpha$, and so we can improve the reduction in $f$ by taking a larger $\alpha$.
   ii. If $\phi'(\alpha) \geq c_2 \phi'(0)$, then either we are closer to a minimum, where $\phi' = 0$, or $\phi' > 0$, which means we have surpassed the minimum.
   iii. The strong condition guarantees that the choice of $\alpha$ is closer to a point where $\phi' = 0$.
   (d) Existence of Steps satisfying the Wolfe and Strong Wolfe Conditions
   Lemma 2.1. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. Let $p$ be a descent direction at $x$, and assume that $f$ is bounded from below along the ray $\{x + \alpha p : \alpha > 0\}$. If $0 < c_1 < c_2 < 1$, there exist intervals of step lengths satisfying the Wolfe and Strong Wolfe Conditions.
   Proof.
   i. Satisfying the Armijo Condition: let $l(\alpha) = \phi(0) + c_1 \alpha \phi'(0)$. Since $l(\alpha)$ is decreasing without bound and $\phi(\alpha)$ is bounded from below, eventually $\phi(\alpha) = l(\alpha)$. Let $\alpha_A > 0$ be the first time this intersection occurs. Then $\phi(\alpha) < l(\alpha)$ for $\alpha \in (0, \alpha_A)$.
   ii. Curvature Condition: by the mean value theorem, there is a $\beta \in (0, \alpha_A)$ such that (since $l(\alpha_A) = \phi(\alpha_A)$):
$$\alpha_A \phi'(\beta) = \phi(\alpha_A) - \phi(0) = c_1 \alpha_A \phi'(0) > c_2 \alpha_A \phi'(0)$$
   And since $\phi'$ is continuous, there is an interval containing $\beta$ on which this inequality holds.
   iii. Strong Curvature Condition: since $\phi'(\beta) = c_1 \phi'(0) < 0$, it follows that:
$$|\phi'(\beta)| < c_2 |\phi'(0)|$$
4. Goldstein Condition
   (a) Goldstein Condition
   Definition 2.3. Take $c \in (0, 1/2)$. The Goldstein Condition is satisfied by $\alpha$ if:
$$\phi(0) + (1 - c)\alpha\phi'(0) \leq \phi(\alpha) \leq \phi(0) + c\alpha\phi'(0)$$
   (b) Existence of a Step satisfying the Goldstein Condition
   Lemma 2.2. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. Let $p$ be a descent direction at $x$, and assume that $f$ is bounded from below along the ray $\{x + \alpha p : \alpha > 0\}$. Let $c \in (0, 1/2)$. Then there exists an interval of step lengths satisfying the Goldstein Condition.
   Proof. Let $l(\alpha) = \phi(0) + (1 - c)\alpha\phi'(0)$ and $u(\alpha) = \phi(0) + c\alpha\phi'(0)$. For $\alpha > 0$, $l(\alpha) < u(\alpha)$ since $c \in (0, 1/2)$. Since $\phi$ is bounded from below, let $\alpha_u$ be the smallest point of intersection between $u(\alpha)$ and $\phi(\alpha)$ for $\alpha > 0$, and let $\alpha_l$ be the largest point of intersection, less than $\alpha_u$, of $l(\alpha)$ and $\phi(\alpha)$. Then, for $\alpha \in (\alpha_l, \alpha_u)$, $l(\alpha) < \phi(\alpha) < u(\alpha)$.
5. Backtracking
   (a) Backtracking
   Definition 2.4. Let $\rho \in (0, 1)$ and $\bar{\alpha} > 0$. Backtracking checks the sequence $\bar{\alpha}, \rho\bar{\alpha}, \rho^2\bar{\alpha}, \ldots$ until an $\alpha$ is found satisfying the Armijo Condition.
   (b) Existence of a Step satisfying Backtracking
   Lemma 2.3. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. Let $p$ be a descent direction at $x$. Then there is a step produced by the backtracking algorithm which satisfies the Armijo Condition.
   Proof. By continuity, there is an $\epsilon > 0$ such that for $0 < \alpha < \epsilon$, $\phi(\alpha) < \phi(0) + c\alpha\phi'(0)$. Since $\rho^n\bar{\alpha} \to 0$, there is an $n$ such that $\rho^n\bar{\alpha} < \epsilon$.
2.2 Step Length Selection Algorithms

1. Algorithms which implement the Wolfe or Goldstein Conditions exist and are ideal, but are not discussed herein.
2. Backtracking Algorithm

   Algorithm 1: Backtracking Algorithm
   input: reduction rate $\rho \in (0, 1)$, initial estimate $\bar{\alpha} > 0$, $c \in (0, 1)$, $\phi(\alpha)$
   $\alpha \leftarrow \bar{\alpha}$
   while $\phi(\alpha) > \phi(0) + \alpha c \phi'(0)$ do
     $\alpha \leftarrow \rho\alpha$
   end
   return $\alpha$

3. Interpolation with Expensive First Derivatives
   (a) Suppose the interpolation technique produces a sequence of guesses $\alpha_0, \alpha_1, \ldots$.
   (b) Interpolation produces $\alpha_k$ by modeling $\phi$ with a polynomial $m$ (quadratic or cubic) and then minimizing $m$ to find a new estimate for $\alpha$. The parameters of the polynomial are determined by requiring:
   i. $m(0) = \phi(0)$
   ii. $m'(0) = \phi'(0)$
   iii. $m(\alpha_{k-1}) = \phi(\alpha_{k-1})$
   iv. $m(\alpha_{k-2}) = \phi(\alpha_{k-2})$
   (c) When $k = 1$, we model using a quadratic polynomial (dropping the last condition).
4. Interpolation with Inexpensive First Derivatives
   (a) Suppose interpolation produces a sequence of guesses $\alpha_0, \alpha_1, \ldots$.
   (b) In this case, $m$ is always a cubic polynomial, and $\alpha_k$ is computed by requiring:
   i. $m(\alpha_{k-1}) = \phi(\alpha_{k-1})$
   ii. $m(\alpha_{k-2}) = \phi(\alpha_{k-2})$
   iii. $m'(\alpha_{k-1}) = \phi'(\alpha_{k-1})$
   iv. $m'(\alpha_{k-2}) = \phi'(\alpha_{k-2})$
5. Interpolation Algorithm

   Algorithm 2: Interpolation Algorithm
   input: feasible search region $[\bar{a}, \bar{b}]$, initial estimate $\alpha_0 > 0$, $\phi$
   $\alpha \leftarrow \alpha_0$, $\beta \leftarrow \alpha_0$
   while $\phi(\beta) > \phi(0) + c\beta\phi'(0)$ do
     Approximate $m$
     if inexpensive derivatives then $\alpha \leftarrow \beta$ end
     Explicitly compute the minimizer of $m$ in the feasible region; store as $\beta$
   end
   return $\beta$
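Algorithm 1 translates almost line for line into code. A minimal sketch (the quartic test function and parameter defaults are mine, not from the notes):

```python
def backtracking(phi, phi0, dphi0, alpha_bar=1.0, rho=0.5, c=1e-4):
    """Algorithm 1: shrink alpha by rho until the Armijo condition
    phi(alpha) <= phi(0) + c * alpha * phi'(0) holds.
    Assumes dphi0 < 0 (descent direction), so Lemma 2.3 guarantees
    termination."""
    alpha = alpha_bar
    while phi(alpha) > phi0 + c * alpha * dphi0:
        alpha *= rho
    return alpha

# Example: f(x) = x^4 at x0 = 1 with descent direction p = -f'(x0) = -4.
f = lambda x: x ** 4
x0, p = 1.0, -4.0
phi = lambda a: f(x0 + a * p)
dphi0 = 4 * x0 ** 3 * p   # phi'(0) = f'(x0) * p = -16

alpha = backtracking(phi, phi(0.0), dphi0)
```

On this example the loop rejects $\alpha = 1$ and $\alpha = 0.5$ (the step overshoots) and accepts $\alpha = 0.25$, which lands exactly on the minimizer of $\phi$.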
2.3 Global Convergence and Zoutendijk

1. Globally Convergent. Zoutendijk Condition.
   Definition 2.5. Suppose we have an algorithm $\Omega$ which produces iterates $(x_k)$, and denote $\nabla f(x_k) = \nabla f_k$.
   (a) $\Omega$ is globally convergent if $\|\nabla f_k\| \to 0$. (That is, the $x_k$ converge to a stationary point.)
   (b) Suppose $\Omega$ produces search directions $p_k$ such that $\|p_k\| = 1$, and let $\theta_k$ be the angle between $-\nabla f_k$ and $p_k$. $\Omega$ satisfies the Zoutendijk Condition if
$$\sum_{k=1}^\infty \cos^2(\theta_k)\,\|\nabla f_k\|^2 < \infty$$
2. Zoutendijk Condition and an angle bound imply global convergence
   Lemma 2.4. Suppose $\Omega$ produces a sequence $(x_k, p_k, \nabla f_k, \theta_k)$ such that for some $\delta > 0$ and all $k \geq 1$, $\cos(\theta_k) \geq \delta$. If $\Omega$ satisfies the Zoutendijk Condition, then $\Omega$ is globally convergent.
   Proof. By the Zoutendijk Condition:
$$\delta^2 \sum_{k=1}^\infty \|\nabla f_k\|^2 \leq \sum_{k=1}^\infty \cos^2(\theta_k)\|\nabla f_k\|^2 < \infty$$
   Therefore $\|\nabla f_k\| \to 0$.
3. Example of an Angle Bound
   Example 2.1. Suppose $p_k = -B_k^{-1}\nabla f_k$, where the $B_k$ are positive definite and $\|B_k\|\|B_k^{-1}\| \leq M < \infty$ for all $k$. Using the 2-norm, let $B_k = X\Lambda X^T$ be the eigenvalue decomposition of $B_k$ with eigenvalues $\lambda_1 \geq \cdots \geq \lambda_n > 0$, and let $z = X^T \nabla f_k$. Then:
$$\cos(\theta_k) = \frac{\nabla f_k^T B_k^{-1} \nabla f_k}{\|\nabla f_k\|\,\|B_k^{-1}\nabla f_k\|} = \frac{z^T \Lambda^{-1} z}{\|z\|\,\|\Lambda^{-1} z\|} \geq \frac{\|z\|^2/\lambda_1}{\|z\|\cdot\|z\|/\lambda_n} = \frac{\lambda_n}{\lambda_1} \geq \frac{1}{M}$$
   Hence, if $\Omega$ satisfies the Zoutendijk Condition, we see that $\|\nabla f_k\| \to 0$.
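Example 2.1's bound is easy to observe numerically. A small sketch (the specific $B$ and gradient are mine, not from the notes) checking $\cos(\theta) \geq 1/\kappa(B)$ for $p = -B^{-1}\nabla f$:

```python
import numpy as np

# Angle bound of Example 2.1: kappa = ||B|| * ||B^{-1}|| is the condition number.
B = np.array([[10.0, 0.0], [0.0, 1.0]])   # kappa(B) = 10
g = np.array([1.0, 2.0])                   # an arbitrary gradient

p = -np.linalg.solve(B, g)
cos_theta = -(g @ p) / (np.linalg.norm(g) * np.linalg.norm(p))
kappa = np.linalg.norm(B, 2) * np.linalg.norm(np.linalg.inv(B), 2)
```

So a uniformly bounded condition number for the model Hessians keeps the search directions from becoming orthogonal to the gradient, which is exactly what Lemma 2.4 needs.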
4. Wolfe Condition Line Search satisfies the Zoutendijk Condition
   Theorem 2.1. Suppose we have an objective $f$ satisfying:
   (a) $f$ is bounded from below in $\mathbb{R}^n$
   (b) Given an initial $x_0$, there is an open set $N$ containing the level set $L = \{x : f(x) \leq f(x_0)\}$
   (c) $f$ is continuously differentiable on $N$
   (d) $\nabla f$ is Lipschitz continuous on $N$
   Suppose we have an algorithm $\Omega$ producing $(x_k, p_k, \nabla f_k, \theta_k, \alpha_k)$ such that:
   (a) $p_k$ is a descent direction (with $\|p_k\| = 1$)
   (b) $\alpha_k$ satisfies the Wolfe Conditions
   Then $\Omega$ satisfies the Zoutendijk Condition.
   Proof. The general strategy is to lower bound $\alpha_k$ uniformly by $C(-\nabla f_k^T p_k)$, where $C > 0$. Then, using the Armijo condition:
$$f_{k+1} \leq f_k + c_1\alpha_k \nabla f_k^T p_k \leq f_k - c_1 C\,(\nabla f_k^T p_k)^2 = f_k - c_1 C \cos^2(\theta_k)\|\nabla f_k\|^2$$
   and summing over the iterates:
$$f_{k+1} \leq f_0 - c_1 C \sum_{j=1}^k \cos^2(\theta_j)\|\nabla f_j\|^2$$
   Since $f$ is bounded from below by some $l$, we conclude $\sum_j \cos^2(\theta_j)\|\nabla f_j\|^2 \leq (f_0 - l)/(c_1 C) < \infty$. The result follows.
   So now we set out to show that $\alpha_k \geq C(-\nabla f_k^T p_k)$, leveraging the Curvature Condition.
   (a) From the Curvature Condition:
$$(\nabla f_{k+1} - \nabla f_k)^T p_k \geq (c_2 - 1)\nabla f_k^T p_k$$
   (b) From Lipschitz continuity, with constant $L$:
$$(\nabla f_{k+1} - \nabla f_k)^T p_k \leq \|\nabla f_{k+1} - \nabla f_k\|\|p_k\| \leq L\alpha_k\|p_k\|^2 = L\alpha_k$$
   Together these imply $\alpha_k \geq \frac{1 - c_2}{L}\left(-\nabla f_k^T p_k\right)$.
5. Goldstein Condition Line Search satisfies the Zoutendijk Condition
   Theorem 2.2. Suppose $f$ has the same properties as in Theorem 2.1, and suppose $\Omega$ produces $(x_k, p_k, \nabla f_k, \theta_k, \alpha_k)$ such that:
   (a) $p_k$ is a descent direction with $\|p_k\| = 1$
   (b) $\alpha_k$ satisfies the Goldstein Conditions
   Then $\Omega$ satisfies the Zoutendijk Condition.
   Proof. We use Taylor's Theorem and then Lipschitz continuity to find the lower bound on $\alpha_k$. By Taylor's Theorem, for some $t_k \in (0, 1)$:
$$(1 - c)\alpha_k \nabla f_k^T p_k \leq f_{k+1} - f_k = \alpha_k \nabla f(x_k + t_k\alpha_k p_k)^T p_k$$
   Therefore:
$$-c\,\alpha_k\nabla f_k^T p_k \leq \alpha_k\left(\nabla f(x_k + t_k\alpha_k p_k) - \nabla f_k\right)^T p_k \leq L t_k \alpha_k^2 \|p_k\|^2 \leq L\alpha_k^2$$
   Hence $\alpha_k \geq \frac{c}{L}\left(-\nabla f_k^T p_k\right)$, and the upper Goldstein inequality plays the role of the Armijo condition; the result follows as in Theorem 2.1.
6. Backtracking Line Search satisfies the Zoutendijk Condition
   Theorem 2.3. Suppose $f$ has the same properties as in Theorem 2.1, and suppose $\Omega$ produces $(x_k, p_k, \nabla f_k, \theta_k, \alpha_k)$ such that:
   (a) $p_k$ is a descent direction with $\|p_k\| = 1$
   (b) $\alpha_k$ is selected by backtracking with $\bar{\alpha} = 1$
   Then $\Omega$ satisfies the Zoutendijk Condition.
   Proof. Suppose $\Omega$ uses $\rho \in (0, 1)$. There are two cases to consider: $\alpha_k = 1$ and $\alpha_k < 1$. We denote the first subsequence by $(j)$ and the second by $(l)$.
   For $\alpha_{(l)}$ we know at least that $\alpha_{(l)}/\rho$ was rejected, that is:
$$f\!\left(x_{(l)} + (\alpha_{(l)}/\rho)\,p_{(l)}\right) > f_{(l)} + c\,(\alpha_{(l)}/\rho)\,\nabla f_{(l)}^T p_{(l)}$$
   Using the same strategy as for Goldstein, there is a $t \in (0,1)$ such that:
$$\nabla f\!\left(x_{(l)} + t(\alpha_{(l)}/\rho)\,p_{(l)}\right)^T p_{(l)} > c\,\nabla f_{(l)}^T p_{(l)}$$
   Therefore, by Lipschitz continuity:
$$(1 - c)\left(-\nabla f_{(l)}^T p_{(l)}\right) \leq L t\,\alpha_{(l)}/\rho \leq L\alpha_{(l)}/\rho$$
   so $\alpha_{(l)} \geq \frac{\rho(1 - c)}{L}\left(-\nabla f_{(l)}^T p_{(l)}\right)$, and the Armijo condition on the accepted step yields the Zoutendijk summand as in Theorem 2.1.
   We now consider $\alpha_{(j)} = 1$. The Armijo condition gives:
$$f_{(j)+1} \leq f_{(j)} + c\,\nabla f_{(j)}^T p_{(j)} = f_{(j)} - c\cos(\theta_{(j)})\|\nabla f_{(j)}\|$$
   Therefore:
$$c\sum_j \cos(\theta_{(j)})\|\nabla f_{(j)}\| < \infty$$
   so $\cos(\theta_{(j)})\|\nabla f_{(j)}\| \to 0$, and there is a $J$ such that for $j \geq J$, $\cos(\theta_{(j)})\|\nabla f_{(j)}\| < 1$, whence
$$\cos^2(\theta_{(j)})\|\nabla f_{(j)}\|^2 \leq \cos(\theta_{(j)})\|\nabla f_{(j)}\|$$
   Therefore the Zoutendijk Condition holds.
3 Trust Region Methods

3.1 Trust Region Subproblem and Fidelity

1. The strategy of trust region methods is as follows: at any point $x$ we have a model $m_x(p)$ of $f(x + p)$ which we trust to be a good approximation in a region $\|p\| \leq \Delta$. Minimizing the model over this region, we obtain a new iterate at which to continue applying the approach.
2. Trust Region Subproblem
   Definition 3.1. Let $x \in \mathbb{R}^n$ and let $f : \mathbb{R}^n \to \mathbb{R}$ be an objective function with model $m_x(p)$ in a region $\|p\| \leq \Delta$. The Trust Region Subproblem is to find
$$\arg\min\{m_x(p) : \|p\| \leq \Delta\}$$
   Note 3.1. Usually $m_x$ is taken to be $m_x(p) = f(x) + \nabla f_x^T p + \frac{1}{2}p^T B p$, where $B$ is the model Hessian and is not necessarily positive definite.
3. Fidelity
   Definition 3.2. The fidelity of $m_x$ in a region $\|p\| \leq \Delta$ at the point $x + p$ is
$$\rho(m_x, f, p, \Delta) = \frac{f(x + p) - f(x)}{m_x(p) - m_x(0)}$$
4. Fidelity and the Trust Region. Let $0 < c_1 < c_2 < 1$.
   (a) If $\rho \geq c_2$, especially if $p$ is intended for descent, the model is a good approximation to $f$ and we can expand the trust region.
   (b) If $\rho \leq c_1$, especially if $p$ is intended for descent, the model is a poor approximation to $f$ and we should shrink the trust region.
   (c) Otherwise, the model is a sufficient approximation to $f$ and the trust region should not be adjusted.
5. Fidelity and Iterates. Suppose $p$ causes a descent in $m_x$, and let $\eta \in (0, 1/4]$.
   (a) If $\rho < \eta$, then we do not have a sufficient decrease in the function, and the subproblem should be repeated at $x$ (with a smaller region).
   (b) If $\rho \geq \eta$, then we have a sufficient decrease in $f$, and we should continue the process from $x + p$.
6. Notation
   (a) Let $x_0, x_1, \ldots$ be the iterates produced by successive solutions of the trust region subproblem for some objective $f$.
   (b) $m_{x_k}(p) =: m_k(p)$
   (c) $\Delta$ for a particular $m_k$ is denoted $\Delta_k$
   (d) Accepted solutions to the subproblem at $x_k$ are $p_k$, so that $x_{k+1} = x_k + p_k$.
3.2 Fidelity Algorithms

1. Fidelity and the Trust Region

   Algorithm 3: Trust Region Management Algorithm
   input: thresholds $0 < c_2 < c_1 < 1$, trust region $\Delta$, maximum radius $\Delta_{\max}$, fidelity $\rho$
   if $\rho > c_1$ then
     $\Delta \leftarrow \min(2\Delta, \Delta_{\max})$
   else if $\rho < c_2$ then
     $\Delta \leftarrow \Delta/2$
   else
     continue (leave $\Delta$ unchanged)
   end
   return $\Delta$

   Note 3.2. Usually $c_1 = 3/4$ and $c_2 = 1/4$ (expand when $\rho > 3/4$, shrink when $\rho < 1/4$).

2. Fidelity and Solution Acceptance

   Algorithm 4: Solution Acceptance Algorithm
   input: threshold $\eta \in (0, 1/4]$, fidelity $\rho$, point $x$, step $p$
   if $\rho \geq \eta$ then
     $x \leftarrow x + p$
   else
     continue (keep $x$)
   end
   return $x$

3.3 Approximate Solutions to the Subproblem

3.3.1 Cauchy Point

1. Cauchy Point
   Definition 3.3. Let $m_x$ be a model of $f$ within $\|p\| \leq \Delta$. Let $p^S$ be the direction of steepest descent of $m_x$ at $p = 0$ with $\|p^S\| = 1$, and let
$$\tau = \arg\min\{m_x(\tau p^S) : 0 \leq \tau \leq \Delta\}$$
   Then $p^C = \tau p^S$ is the Cauchy Point of $m_x$.
   Note 3.3. The Cauchy Point is the line search minimizer of $m_x$ along the direction of steepest descent, subject to $\|p\| \leq \Delta$.
2. Computing the Cauchy Point
   Lemma 3.1. Let $m_x$ be a quadratic model of $f$ within $\|p\| \leq \Delta$ of the form
$$m_x(p) = f(x) + g^T p + \frac{1}{2}p^T B p$$
   Then its Cauchy Point is:
$$p^C = \begin{cases} -\min\left(\Delta, \dfrac{\|g\|^3}{g^T B g}\right)\dfrac{g}{\|g\|} & g^T B g > 0 \\[2ex] -\Delta\,\dfrac{g}{\|g\|} & g^T B g \leq 0 \end{cases}$$
Proof. First, we compute the direction of steepest descent:
$$p^S = -\frac{g}{\|g\|}$$
Then $m_x(tp^S) = f(x) + t\,g^T p^S + \frac{t^2}{2}(p^S)^T B p^S$. If the quadratic term is non-positive, then $t = \Delta$ is the minimizer, and so one option is:
$$p^C = -\Delta\frac{g}{\|g\|}$$
If the quadratic term is positive, then:
$$\frac{d}{dt}m_x(tp^S) = g^T p^S + t\,(p^S)^T B p^S$$
Setting this to 0 and substituting back in for $p^S$:
$$t = \frac{\|g\|^3}{g^T B g}$$
This value could potentially be larger than $\Delta$, so to safeguard against this, in this case:
$$\tau = \min\left(\Delta, \frac{\|g\|^3}{g^T B g}\right)$$
3. Using the Cauchy Point alone does not produce the best rate of convergence (it is essentially steepest descent).

3.3.2 Dogleg Method

1. Requirement: $B \succ 0$.
2. Intuitively, the dogleg method moves toward the minimizer of $m_x$, which is $p^{\min} = -B^{-1}g$, until it reaches this point or hits the boundary of the trust region. It does this by first moving to the Cauchy Point, and then along a linear path to $p^{\min}$.
3. Dogleg Path
   Definition 3.4. Let $m_x$ be the quadratic model with $B \succ 0$ of $f$ with radius $\Delta$. Let $p^C$ be the Cauchy Point and $p^{\min}$ the unconstrained minimizer of $m_x$. The Dogleg Path is $p^{DL}(t)$, where:
$$p^{DL}(t) = \begin{cases} t\,p^C & t \in [0, 1] \\ p^C + (t - 1)(p^{\min} - p^C) & t \in [1, 2] \end{cases}$$
4. Dogleg Method
   Definition 3.5. The Dogleg Method chooses $x_{k+1} = x_k + p^{DL}(\tau)$, where $\tau$ is the largest $t \in [0, 2]$ such that $\|p^{DL}(t)\| \leq \Delta$.
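Lemma 3.1 and Definition 3.5 can be sketched directly in code. The following (function names and the toy data are mine, not from the notes; it assumes $g \neq 0$) computes the Cauchy point and a dogleg step:

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Lemma 3.1: minimizer of the quadratic model along -g with ||p|| <= delta."""
    gBg = g @ B @ g
    gn = np.linalg.norm(g)
    tau = delta if gBg <= 0 else min(delta, gn**3 / gBg)
    return -(tau / gn) * g

def dogleg(g, B, delta):
    """Dogleg step for B positive definite: take the full step p_min = -B^{-1} g
    if it is inside the region; otherwise follow the dogleg path to the boundary."""
    p_min = -np.linalg.solve(B, g)
    if np.linalg.norm(p_min) <= delta:
        return p_min
    p_c = cauchy_point(g, B, delta)
    if np.linalg.norm(p_c) >= delta:
        return (delta / np.linalg.norm(p_c)) * p_c
    # second leg: find z in [0, 1] with ||p_c + z (p_min - p_c)|| = delta
    d = p_min - p_c
    a, b, c = d @ d, 2 * p_c @ d, p_c @ p_c - delta**2
    z = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return p_c + z * d

B = np.array([[2.0, 0.0], [0.0, 1.0]])
g = np.array([1.0, 1.0])
p = dogleg(g, B, delta=0.8)   # here the boundary is active
```

The quadratic in $z$ has a positive root because, by Lemma 3.2 below, the norm increases along the second leg.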
5. Properties of the Dogleg Path
   Lemma 3.2. Suppose $p^{DL}(t)$ is the Dogleg Path of $m_x$ with $B \succ 0$ (ignoring the trust region constraint). Then:
   (a) $m_x(p^{DL}(t))$ is decreasing in $t$
   (b) $\|p^{DL}(t)\|$ is increasing in $t$
   Proof. The case $0 \leq t \leq 1$ is uninteresting, since along the steepest descent segment we already know $m_x(p^{DL}(t))$ is decreasing and $\|p^{DL}(t)\|$ is increasing. So we consider $1 \leq t \leq 2$ and reparameterize using $z = t - 1$. We have (using $Bp^{\min} = -g$):
$$\frac{d}{dz}m_x(p^{DL}(1 + z)) = (p^{\min} - p^C)^T(g + Bp^C) + z\,(p^{\min} - p^C)^T B (p^{\min} - p^C) = -(1 - z)(p^{\min} - p^C)^T B (p^{\min} - p^C)$$
   Letting $c = (p^{\min} - p^C)^T B (p^{\min} - p^C) > 0$ (since $B \succ 0$), the derivative is strictly negative for $z < 1$.
   To show that the norm is increasing, first we show that the angle between $p^C$ and $p^{\min}$ is in $(-\pi/2, \pi/2)$, then that $p^{\min}$ is the longer vector.
   (a) For the angle, with $g \neq 0$:
$$\langle p^{\min}, p^C \rangle = \frac{\|g\|^2}{g^T B g}\,g^T B^{-1} g > 0$$
   (b) Using Cauchy-Schwarz (applied with $B^{1/2}$), $(g^T B^{-1} g)(g^T B g) \geq \|g\|^4$, so:
$$\|p^{\min}\| = \|B^{-1}g\| \geq \frac{g^T B^{-1} g}{\|g\|} \geq \frac{\|g\|^3}{g^T B g} = \|p^C\|$$

3.3.3 Global Convergence of Cauchy Point Methods

1. Reduction obtained by the Cauchy Point
   Lemma 3.3. The Cauchy Point $p^C_k$ satisfies
$$m_k(p^C_k) - m_k(0) \leq -\frac{1}{2}\|g_k\|\min\left(\Delta_k, \frac{\|g_k\|}{\|B_k\|}\right)$$
   Proof. We have that
$$m_k(p^C_k) - m_k(0) = g_k^T p^C_k + \frac{1}{2}(p^C_k)^T B_k\, p^C_k$$
This brings in several cases. When $g^T B g \leq 0$ (dropping subscripts), $p^C = -\Delta g/\|g\|$ and:
$$m(p^C) - m(0) = -\Delta\|g\| + \frac{\Delta^2}{2\|g\|^2}g^T B g \leq -\Delta\|g\| \leq -\frac{1}{2}\|g\|\min\left(\Delta, \frac{\|g\|}{\|B\|}\right)$$
When $g^T B g > 0$ and $\Delta\,g^T B g \leq \|g\|^3$, the constraint binds ($\tau = \Delta$) and:
$$m(p^C) - m(0) = -\Delta\|g\| + \frac{\Delta^2}{2\|g\|^2}g^T B g \leq -\Delta\|g\| + \frac{\Delta\|g\|}{2} = -\frac{1}{2}\Delta\|g\| \leq -\frac{1}{2}\|g\|\min\left(\Delta, \frac{\|g\|}{\|B\|}\right)$$
When $\Delta\,g^T B g > \|g\|^3$, the minimizer is interior and:
$$m(p^C) - m(0) = -\frac{\|g\|^4}{g^T B g} + \frac{\|g\|^4}{2\,g^T B g} = -\frac{\|g\|^4}{2\,g^T B g} \leq -\frac{1}{2}\frac{\|g\|^2}{\|B\|} \leq -\frac{1}{2}\|g\|\min\left(\Delta, \frac{\|g\|}{\|B\|}\right)$$
2. Reduction achieved by the Dogleg step
   Corollary 3.1. The dogleg step $p^{DL}_k$ satisfies:
$$m_k(p^{DL}_k) - m_k(0) \leq -\frac{1}{2}\|g_k\|\min\left(\Delta_k, \frac{\|g_k\|}{\|B_k\|}\right)$$
   Proof. Since $m_k(p^{DL}_k) \leq m_k(p^C_k)$ (the dogleg path passes through the Cauchy Point and $m_k$ decreases along it), the result follows.
3. Reduction achieved by any method based on the Cauchy Point
   Corollary 3.2. Suppose a method uses a step $p_k$ for which $m_k(p_k) - m_k(0) \leq c\left(m_k(p^C_k) - m_k(0)\right)$ for some $c \in (0, 1)$. Then $p_k$ satisfies:
$$m_k(p_k) - m_k(0) \leq -\frac{c}{2}\|g_k\|\min\left(\Delta_k, \frac{\|g_k\|}{\|B_k\|}\right)$$
4. Convergence when $\eta = 0$
   Theorem 3.1. Let $x_0 \in \mathbb{R}^n$ and let $f$ be an objective function. Let $S = \{x : f(x) \leq f(x_0)\}$ and let $S(R_0) = \{x : \|x - y\| < R_0,\ y \in S\}$ be a neighborhood of radius $R_0$ about $S$. Suppose the following conditions:
   (a) $f$ is bounded from below on $S$
   (b) $f$ is Lipschitz continuously differentiable on $S(R_0)$ for some $R_0 > 0$ and some constant $L$
   (c) There is a $c$ such that for all $k$:
$$m_k(p_k) - m_k(0) \leq -c\|g_k\|\min\left(\Delta_k, \frac{\|g_k\|}{\|B_k\|}\right)$$
   (d) There is a $\gamma \geq 1$ such that for all $k$: $\|p_k\| \leq \gamma\Delta_k$
   (e) $\eta = 0$ (fidelity threshold)
   (f) $\|B_k\| \leq \beta$ for all $k$
   Then:
$$\liminf_{k \to \infty}\|g_k\| = 0$$
5. Convergence when $\eta \in (0, 1/4)$
   Theorem 3.2. If $\eta \in (0, 1/4)$, with all other conditions from Theorem 3.1 holding, then:
$$\lim_{k \to \infty}\|g_k\| = 0$$

3.4 Iterative Solutions to the Subproblem

3.4.1 Exact Solution to the Subproblem

1. Conditions for Minimizing Quadratics
   Lemma 3.4. Let $m$ be the quadratic function
$$m(p) = g^T p + \frac{1}{2}p^T B p$$
   where $B$ is a symmetric matrix.
   (a) $m$ attains a minimum if and only if $B \succeq 0$ and $-g \in \mathrm{Im}(B)$. If $B \succeq 0$, then every $p$ satisfying $Bp = -g$ is a global minimizer of $m$.
   (b) $m$ has a unique minimizer if and only if $B \succ 0$.
   Proof. If $p$ is a minimizer of $m$, we have from the second order conditions that $0 = \nabla m(p) = Bp + g$ and $B = \nabla^2 m(p) \succeq 0$. Now suppose $B \succeq 0$ and $-g \in \mathrm{Im}(B)$; then there is a $p$ such that $Bp = -g$. Let $w \in \mathbb{R}^n$. Then:
$$m(p + w) = g^T(p + w) + \frac{1}{2}\left(p^T B p + 2p^T B w + w^T B w\right) = m(p) + (g + Bp)^T w + \frac{1}{2}w^T B w = m(p) + \frac{1}{2}w^T B w \geq m(p)$$
where the inequality follows since $B \succeq 0$. This proves the first part.
Now suppose $m$ has a unique minimizer $p$, and suppose there is a $w \neq 0$ such that $w^T B w = 0$. Since $p$ is a minimizer, we have $Bp = -g$, and so from the computation above $m(p + w) = m(p)$; thus $p + w$ and $p$ are both minimizers of $m$, which is a contradiction. For the other direction, since $B \succ 0$, let $p = -B^{-1}g$; by the Second Order Sufficient Conditions, $p$ is the unique minimizer of $m$.
2. Characterization of the Exact Solution
   Theorem 3.3. The vector $p^*$ is a solution to the trust region subproblem
$$\min_{p \in \mathbb{R}^n} f(x) + g^T p + \frac{1}{2}p^T B p \quad : \quad \|p\| \leq \Delta$$
   if and only if $p^*$ is feasible and there is a $\lambda \geq 0$ such that:
   (a) $(B + \lambda I)p^* = -g$
   (b) $\lambda(\Delta - \|p^*\|) = 0$
   (c) $B + \lambda I \succeq 0$
   Proof. First we prove the "if" direction. So suppose $\lambda \geq 0$ satisfies the properties, and consider:
$$\hat{m}(p) = f(x) + g^T p + \frac{1}{2}p^T(B + \lambda I)p = m(p) + \frac{\lambda}{2}\|p\|^2$$
   By Lemma 3.4, since $B + \lambda I \succeq 0$ and $(B + \lambda I)p^* = -g$, $p^*$ minimizes $\hat{m}$, so $\hat{m}(p) \geq \hat{m}(p^*)$, which implies:
$$m(p) \geq m(p^*) + \frac{\lambda}{2}\left(\|p^*\|^2 - \|p\|^2\right)$$
   So if $\lambda = 0$, then $m(p) \geq m(p^*)$ and $p^*$ is a minimizer of $m$; and if $\lambda > 0$, then $\|p^*\| = \Delta$ by (b), so for any feasible $p$, $\|p\| \leq \|p^*\|$, which again implies $m(p) \geq m(p^*)$.
   Now suppose $p^*$ is the minimizer of the subproblem. If $\|p^*\| < \Delta$, then $p^*$ is an unconstrained minimizer of $m$; take $\lambda = 0$, so that $\nabla m(p^*) = Bp^* + g = 0$ and $\nabla^2 m(p^*) = B \succeq 0$ by the previous lemma. Now suppose $\|p^*\| = \Delta$. Then the second condition is automatically satisfied. We use the Lagrangian to determine the solution:
$$\mathcal{L}(\lambda, p) = g^T p + \frac{1}{2}p^T B p + \frac{\lambda}{2}\left(p^T p - \Delta^2\right)$$
   Differentiating with respect to $p$ and setting this equal to 0, we see that $(B + \lambda I)p^* + g = 0$. Now, to show that $B + \lambda I$ is positive semi-definite, note that for any $p$ with $\|p\| = \Delta$, since $p^*$ is the minimizer of $m$ on the boundary:
$$m(p) + \frac{\lambda}{2}\|p\|^2 \geq m(p^*) + \frac{\lambda}{2}\|p^*\|^2$$
   Rearranging (using $(B + \lambda I)p^* = -g$), we have that:
$$(p - p^*)^T(B + \lambda I)(p - p^*) \geq 0$$
   And since the set of all normalized $\pm(p - p^*)$ with $\|p\| = \Delta$ is dense on the unit sphere, $B + \lambda I$ is positive semi-definite. The last thing to show is that $\lambda \geq 0$, which follows from the fact that if $\lambda < 0$, then $p^*$ would be a global minimizer of $m$; by the previous lemma, this contradicts the constraint being active.
3.4.2 Newton's Root Finding Iterative Solution

1. Theorem 3.3 guarantees that a solution exists, and we need only find a $\lambda$ which satisfies its conditions. Two cases arise from the theorem:
   Case 1: $\lambda = 0$. Then $B \succeq 0$, and we can compute a solution of $Bp = -g$ using, e.g., a QR decomposition.
   Case 2: $\lambda > 0$. First, let $B = Q\Lambda Q^T$ be the eigenvalue decomposition of $B$, with eigenvalues $\lambda_1 \leq \cdots \leq \lambda_n$, and define:
$$p(\lambda) = -Q(\Lambda + \lambda I)^{-1}Q^T g$$
   Hence:
$$\|p(\lambda)\|^2 = g^T Q(\Lambda + \lambda I)^{-2}Q^T g = \sum_{j=1}^n \frac{(q_j^T g)^2}{(\lambda_j + \lambda)^2}$$
   From Theorem 3.3, we want to find $\lambda$ (with $B + \lambda I \succeq 0$, i.e. $\lambda \geq -\lambda_1$) for which:
$$\Delta^2 = \sum_{j=1}^n \frac{(q_j^T g)^2}{(\lambda_j + \lambda)^2}$$
2. The second case has two sub-cases which must be considered. Let $Q_1$ be the columns of $Q$ corresponding to the eigenspace of the smallest eigenvalue $\lambda_1$.
   Case 2a: If $Q_1^T g \neq 0$, then we implement Newton's root finding algorithm on:
$$h(\lambda) = \frac{1}{\Delta} - \frac{1}{\|p(\lambda)\|}$$
   since as $\lambda \to -\lambda_1$, $\|p(\lambda)\| \to \infty$, and as $\lambda \to \infty$, $\|p(\lambda)\| \to 0$, so a root exists in $(-\lambda_1, \infty)$. (Newton's method is applied to $h$ rather than to $\|p(\lambda)\|^2 - \Delta^2$ because $1/\|p(\lambda)\|$ is nearly linear in $\lambda$.)
   Case 2b: If $Q_1^T g = 0$, then applying Newton's root finding algorithm naively is not useful, since $\|p(\lambda)\|$ no longer blows up as $\lambda \to -\lambda_1$. We note that when $\lambda = -\lambda_1$, $B + \lambda I \succeq 0$, and so we can find a $\tau$ such that:
$$\Delta^2 = \sum_{j : \lambda_j \neq \lambda_1}\frac{(q_j^T g)^2}{(\lambda_j - \lambda_1)^2} + \tau^2$$
   Computation. Take $z \neq 0$ such that $(B - \lambda_1 I)z = 0$ and $\|z\| = 1$. Therefore $q_j^T z = 0$ for all $j$ with $\lambda_j \neq \lambda_1$. Setting
$$p(-\lambda_1, \tau) = -\sum_{j : \lambda_j \neq \lambda_1}\frac{q_j^T g}{\lambda_j - \lambda_1}\,q_j + \tau z$$
   we need only find $\tau$, which the display above provides.

3.4.3 Trust Region Subproblem Algorithms
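Case 2a above can be sketched as follows. This is a minimal, unsafeguarded Newton iteration of my own construction (a practical implementation guards the interval $(-\lambda_1, \infty)$; the test matrix is also mine), using the closed-form derivative of $h$ from the eigendecomposition:

```python
import numpy as np

def solve_tr_subproblem(g, B, delta, iters=50):
    """Newton root finding on h(lam) = 1/delta - 1/||p(lam)||, where
    p(lam) = -(B + lam I)^{-1} g (the 'easy' case Q1^T g != 0).
    Assumes the unconstrained minimizer lies outside the trust region."""
    lams, Q = np.linalg.eigh(B)        # ascending eigenvalues
    qg = Q.T @ g
    lam = max(0.0, -lams[0]) + 1e-3    # start just right of max(0, -lambda_1)
    for _ in range(iters):
        pnorm = np.sqrt(np.sum(qg**2 / (lams + lam)**2))
        h = 1.0 / delta - 1.0 / pnorm
        dh = -np.sum(qg**2 / (lams + lam)**3) / pnorm**3
        lam = lam - h / dh
    p = -Q @ (qg / (lams + lam))
    return p, lam

B = np.diag([1.0, 3.0])
g = np.array([1.0, 1.0])
p, lam = solve_tr_subproblem(g, B, delta=0.5)
```

Because $1/\|p(\lambda)\|$ is nearly linear, the iteration converges very quickly from the right of $-\lambda_1$ on well-behaved problems.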
4 Conjugate Gradients

1. Conjugate Gradients Overview

   Algorithm 5: Overview of Conjugate Gradient Algorithm
   input: starting point $x_0$, tolerance $\epsilon$, objective $f$
   $x \leftarrow x_0$
   Compute gradient: $r \leftarrow \nabla f(x)$
   Compute conjugate descent direction: $p \leftarrow -r$
   while $\|r\| > \epsilon$ do
     Compute optimal step length: $\alpha$
     Compute step: $x \leftarrow x + \alpha p$
     Compute gradient: $r \leftarrow \nabla f(x)$
     Compute conjugate step direction: $p$
   end
   return minimizer $x$

2. Conjugate Gradients for minimizing a convex ($A \succ 0$) quadratic function $f(x) = \frac{1}{2}x^T A x - b^T x$

   Algorithm 6: Conjugate Gradients Algorithm for Convex Quadratic
   input: starting point $x_0$, tolerance $\epsilon = 0$, objective $f$
   $x \leftarrow x_0$
   Compute gradient: $r \leftarrow \nabla f(x) = Ax - b$
   Compute conjugate descent direction: $p \leftarrow -r$
   while $\|r\| > \epsilon$ do
     Compute optimal step length: $\alpha = \frac{r^T r}{p^T A p}$
     Compute step: $x \leftarrow x + \alpha p$
     Store previous: $r_{\text{old}} \leftarrow r$
     Compute gradient: $r \leftarrow r + \alpha A p$
     Compute conjugate step direction: $p \leftarrow -r + \frac{\|r\|^2}{\|r_{\text{old}}\|^2}\,p$
   end
   return minimizer $x$

3. Fletcher-Reeves for Nonlinear Functions

   Algorithm 7: Fletcher-Reeves CG Algorithm
   input: starting point $x_0$, tolerance $\epsilon$, objective $f$
   $x \leftarrow x_0$
   Compute gradient: $r \leftarrow \nabla f(x)$
   Compute conjugate descent direction: $p \leftarrow -r$
   while $\|r\| > \epsilon$ do
     Compute step length $\alpha$ by a Strong Wolfe line search
     Compute step: $x \leftarrow x + \alpha p$
     Store previous: $r_{\text{old}} \leftarrow r$
     Compute gradient: $r \leftarrow \nabla f(x)$
     Compute conjugate step direction: $p \leftarrow -r + \frac{\|r\|^2}{\|r_{\text{old}}\|^2}\,p$
   end
   return minimizer $x$
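Algorithm 6 can be sketched directly (the function name, tolerance handling, and test system are mine; in exact arithmetic CG terminates in at most $n$ steps):

```python
import numpy as np

def cg(A, b, x0, tol=1e-10):
    """Algorithm 6: conjugate gradients for min 0.5 x^T A x - b^T x with
    A symmetric positive definite (equivalently, solve A x = b)."""
    x = x0.copy()
    r = A @ x - b          # gradient of the quadratic
    p = -r                 # first conjugate direction
    k, n = 0, len(b)
    while np.linalg.norm(r) > tol and k <= n:
        alpha = (r @ r) / (p @ A @ p)
        x = x + alpha * p
        r_new = r + alpha * (A @ p)
        beta = (r_new @ r_new) / (r @ r)
        p = -r_new + beta * p
        r = r_new
        k += 1
    return x, k

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x, k = cg(A, b, np.zeros(2))
```

For this $2 \times 2$ system the iteration reaches the solution in two steps, illustrating the finite termination property.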
Part II
Model Hessian Selection

5 Newton's Method

1. Requires $\nabla^2 f \succ 0$ in some region for the method to work.
2. Line Search
   (a) The search direction: $p^N_k = -\nabla^2 f_k^{-1}\nabla f_k$
   (b) Rate of Convergence
   Theorem 5.1. Suppose $f$ is an objective with minimizer $x^*$ satisfying the Second Order Conditions. Moreover:
   i. Suppose $f$ is twice continuously differentiable in a neighborhood of $x^*$
   ii. Specifically, suppose $\nabla^2 f$ is Lipschitz continuous in a neighborhood of $x^*$
   iii. Suppose $p_k = p^N_k$ and $x_{k+1} = x_k + p^N_k$
   iv. Suppose $x_k \to x^*$
   Then for $x_0$ sufficiently close to $x^*$:
   i. $x_k \to x^*$ quadratically
   ii. $\|\nabla f_k\| \to 0$ quadratically
   Proof. Our goal is to express $x_{k+1} - x^*$ in terms of $\nabla^2 f$ and use Lipschitz continuity to provide an upper bound.
   i. We have that (using $\nabla f(x^*) = 0$):
$$x_{k+1} - x^* = x_k + p_k - x^* = x_k - x^* - \nabla^2 f_k^{-1}\nabla f_k = \nabla^2 f_k^{-1}\left[\nabla^2 f_k(x_k - x^*) - (\nabla f_k - \nabla f(x^*))\right]$$
$$= \nabla^2 f_k^{-1}\left[\nabla^2 f_k(x_k - x^*) - \int_0^1 \nabla^2 f(x^* + t(x_k - x^*))(x_k - x^*)\,dt\right]$$
$$= \nabla^2 f_k^{-1}\int_0^1\left(\nabla^2 f_k - \nabla^2 f(x^* + t(x_k - x^*))\right)(x_k - x^*)\,dt$$
   ii. Using Lipschitz continuity:
$$\|x_{k+1} - x^*\| \leq \|\nabla^2 f_k^{-1}\|\int_0^1\left\|\nabla^2 f(x_k) - \nabla^2 f(x^* + t(x_k - x^*))\right\|\,\|x_k - x^*\|\,dt \leq \|\nabla^2 f_k^{-1}\|\int_0^1 (1 - t)L\|x_k - x^*\|^2\,dt = \frac{L}{2}\|\nabla^2 f_k^{-1}\|\,\|x_k - x^*\|^2$$
   We now need to show that $\|\nabla^2 f_k^{-1}\|$ is bounded from above. Let $\eta(2r)$ be a neighborhood of radius $2r$ about $x^*$ on which $\nabla^2 f \succ 0$. On this neighborhood, $\nabla^2 f(x)^{-1}$ exists and $g(x) = \|\nabla^2 f(x)^{-1}\|$ is continuous. Then, on the compact closed ball $\bar{\eta}(r)$, $g(x)$ has a finite maximum. Therefore, we have quadratic convergence of $x_k \to x^*$.
   iii. We now consider the first derivative. Since $\nabla^2 f_k\,p_k = -\nabla f_k$:
$$\|\nabla f_{k+1}\| = \left\|\nabla f_{k+1} - \nabla f_k - \nabla^2 f_k\,p_k\right\| = \left\|\int_0^1\left(\nabla^2 f(x_k + tp_k) - \nabla^2 f_k\right)p_k\,dt\right\| \leq \int_0^1 Lt\|p_k\|^2\,dt = \frac{L}{2}\|p_k\|^2 \leq \frac{L}{2}\,g(x_k)^2\,\|\nabla f_k\|^2$$
   So if $x_0 \in \eta(r)$, the result follows.
3. Trust Region
   (a) Asymptotically Similar
   Definition 5.1. A sequence $p_k$ is asymptotically similar to a sequence $q_k$ if $\|p_k - q_k\| = o(\|q_k\|)$.
   (b) The Subproblem
$$\min_{p \in \mathbb{R}^n} f(x_k) + \nabla f_k^T p + \frac{1}{2}p^T\nabla^2 f_k\,p \quad : \quad \|p\| \leq \Delta_k$$
   (c) Rate of Convergence
   Theorem 5.2. Suppose $f$ is an objective with minimizer $x^*$ satisfying the Second Order Conditions. Moreover:
   i. Suppose $f$ is twice continuously differentiable in a neighborhood of $x^*$
   ii. Specifically, suppose $\nabla^2 f$ is Lipschitz continuous in a neighborhood of $x^*$
   iii. Suppose for some $c \in (0, 1)$ the steps $p_k$ satisfy
$$m_k(p_k) - m_k(0) \leq -c\|\nabla f_k\|\min\left(\Delta_k, \frac{\|\nabla f_k\|}{\|\nabla^2 f_k\|}\right)$$
   and are asymptotically similar to $p^N_k$ whenever $\|p^N_k\| \leq \frac{1}{2}\Delta_k$
   iv. Suppose $x_{k+1} = x_k + p_k$ and $x_k \to x^*$
   Then:
   i. $\Delta_k$ becomes inactive for sufficiently large $k$
   ii. $x_k \to x^*$ superlinearly
   Proof. We need to show that the $\Delta_k$ are bounded from below. The second part follows from this quickly, since $x_k \to x^*$ implies $p^N_k \to 0$, so eventually $\|p^N_k\| \leq \frac{1}{2}\Delta_k$ and asymptotic similarity applies:
$$\|x_k + p_k - x^*\| \leq \|x_k + p^N_k - x^*\| + \|p^N_k - p_k\| \leq o(\|x_k - x^*\|^2) + o(\|p^N_k\|) \leq o(\|x_k - x^*\|^2) + o(\|x_k - x^*\|) = o(\|x_k - x^*\|)$$
   where the penultimate inequality follows from:
$$\|p^N_k\| \leq \|\nabla^2 f_k^{-1}\|\,\|\nabla f_k\| \leq \|\nabla^2 f_k^{-1}\|\left(\sup_{x \in \eta(r)}\|\nabla^2 f(x)\|\right)\|x_k - x^*\|$$
   To get that $\Delta_k$ is bounded from below, we frame things in terms of $\rho_k$. The numerator of $\rho_k - 1$ satisfies:
$$\left|f(x_k + p_k) - f(x_k) - \left(m_k(p_k) - m_k(0)\right)\right| = \left|\frac{1}{2}p_k^T\int_0^1\left(\nabla^2 f(x_k + tp_k) - \nabla^2 f_k\right)p_k\,dt\right| \leq \frac{L}{4}\|p_k\|^3$$
   And the denominator of $\rho_k - 1$ satisfies (using $\|p_k\| \leq \Delta_k$):
$$m_k(0) - m_k(p_k) \geq c\|\nabla f_k\|\min\left(\|p_k\|, \frac{\|\nabla f_k\|}{\|\nabla^2 f_k\|}\right)$$
   So we need only lower bound $\|\nabla f_k\|$ by $\|p_k\|$ to translate this into a lower bound for $\Delta_k$. If $\|p^N_k\| > \frac{1}{2}\Delta_k$, then (in some neighborhood of $x^*$):
$$\|p_k\| \leq \Delta_k < 2\|p^N_k\| \leq 2\|\nabla^2 f_k^{-1}\|\,\|\nabla f_k\| \leq C\|\nabla f_k\|$$
   Else, if $\|p^N_k\| \leq \frac{1}{2}\Delta_k$, then by asymptotic similarity:
$$\|p_k\| \leq \|p^N_k\| + o(\|p^N_k\|) \leq 2\|p^N_k\| \leq C\|\nabla f_k\|$$
   Therefore:
$$m_k(0) - m_k(p_k) \geq \frac{c}{C}\|p_k\|\min\left(\|p_k\|, \frac{\|p_k\|}{C\,\|\nabla^2 f_k\|\,\|\nabla^2 f_k^{-1}\|}\right)$$
   Since the second term's denominator involves the condition number of the matrix (which must be greater than or equal to 1), we have that:
$$|\rho_k - 1| \leq \frac{(L/4)\|p_k\|^3}{(c/C')\|p_k\|^2} = \tilde{C}\|p_k\| \leq \tilde{C}\gamma\Delta_k$$
   Hence, if $\Delta_k < \frac{3}{4\tilde{C}\gamma}$, then $\rho_k > 3/4$ and $\Delta_k$ would be doubled by the trust region management algorithm. So the $\Delta_k$ are bounded from below by some fixed $\hat{\Delta} > 0$. Since $\|p_k\| \leq C\|\nabla f_k\| \to 0$, for sufficiently large $k$ we have $\|p_k\| < \hat{\Delta} \leq \Delta_k$: thus $\rho_k \to 1$ and $\Delta_k$ becomes inactive.
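The line search form of Newton's method (Theorem 5.1) can be sketched as a pure Newton iteration with unit steps; the strictly convex test objective below is my own choice, not from the notes:

```python
import numpy as np

def newton(grad, hess, x0, iters=8):
    """Pure Newton iteration x_{k+1} = x_k - hess(x_k)^{-1} grad(x_k),
    started sufficiently close to a minimizer satisfying the second
    order conditions (Theorem 5.1); no globalization safeguards."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - np.linalg.solve(hess(x), grad(x))
    return x

# Strictly convex objective f(x) = sum_i (exp(x_i) - x_i); minimizer x* = 0.
grad = lambda x: np.exp(x) - 1.0
hess = lambda x: np.diag(np.exp(x))
x_star = newton(grad, hess, [1.0, 0.5])
```

On this example the gradient norm collapses at the quadratic rate of Theorem 5.1: each iteration roughly squares the error once the iterate is near the minimizer.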
6 Newton's Method with Hessian Modification

1. If $\nabla^2 f_k \not\succ 0$, we can add a matrix $E_k$ so that $\nabla^2 f_k + E_k =: B_k \succ 0$. As long as this model produces a descent direction, and $\|B_k\|\|B_k^{-1}\|$ is bounded uniformly, Zoutendijk guarantees convergence.
2. Newton's Method with Hessian Modification

   Algorithm 8: Line Search with Modified Hessian
   input: starting point $x_0$, tolerance $\epsilon \geq 0$
   $k \leftarrow 0$
   while $\|\nabla f_k\| > \epsilon$ do
     Find $E_k$ so that $B_k = \nabla^2 f_k + E_k \succ 0$
     Solve $B_k p_k = -\nabla f_k$
     Compute $\alpha_k$ (Wolfe, Goldstein, Backtracking, Interpolation)
     $x_{k+1} \leftarrow x_k + \alpha_k p_k$
     $k \leftarrow k + 1$
   end

3. Minimum Frobenius Norm
   (a) Problem: find $E$ which satisfies $\min\|E\|_F^2 : \lambda_{\min}(A + E) \geq \delta$.
   (b) Solution: if $A = Q\Lambda Q^T$ is the eigenvalue decomposition of $A$, then $E = Q\,\mathrm{diag}(\tau_i)\,Q^T$, where
$$\tau_i = \begin{cases} 0 & \lambda_i \geq \delta \\ \delta - \lambda_i & \lambda_i < \delta \end{cases}$$
4. Minimum 2-Norm
   (a) Problem: find $E$ which satisfies $\min\|E\|_2 : \lambda_{\min}(A + E) \geq \delta$.
   (b) Solution: if $\lambda_{\min}(A) = \lambda_n$, then $E = \tau I$, where
$$\tau = \begin{cases} 0 & \lambda_n \geq \delta \\ \delta - \lambda_n & \lambda_n < \delta \end{cases}$$
5. Modified Cholesky
   (a) Compute the $LDL^T$ factorization, but add constants to the diagonal to ensure positive definiteness of $A$.
   (b) May require pivoting to ensure numerical stability.
6. Modified Block Cholesky
   (a) Compute the $LBL^T$ factorization, where $B$ is block diagonal with $1 \times 1$ and $2 \times 2$ blocks.
   (b) Compute the eigenvalue decomposition of $B$ and modify it to ensure appropriate (positive) eigenvalues.
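The minimum 2-norm modification above is a one-liner once the smallest eigenvalue is known. A minimal sketch (names and the indefinite test matrix are mine, not from the notes):

```python
import numpy as np

def modified_hessian(H, delta=1e-3):
    """Minimum 2-norm modification: E = tau * I with
    tau = 0 if lambda_min(H) >= delta, else tau = delta - lambda_min(H)."""
    lam_min = np.linalg.eigvalsh(H)[0]   # eigvalsh returns ascending eigenvalues
    tau = 0.0 if lam_min >= delta else delta - lam_min
    return H + tau * np.eye(H.shape[0]), tau

H = np.array([[1.0, 2.0], [2.0, 1.0]])   # indefinite: eigenvalues 3 and -1
B_mod, tau = modified_hessian(H)

g = np.array([1.0, 0.0])
p = -np.linalg.solve(B_mod, g)           # modified Newton direction: a descent direction
```

Shifting by $\tau I$ moves every eigenvalue up by the same amount, so it is the cheapest modification when a full eigendecomposition is affordable; the modified Cholesky variants below avoid the eigendecomposition entirely.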
29 7 Quasi-Newton Methods
1. Fundamental Idea. Let f be our objective and suppose it is twice continuously differentiable with Lipschitz continuous Hessian. From Taylor's Theorem:
∇f(x + p) = ∇f(x) + ∫₀¹ ∇²f(x + tp) p dt
          = ∇f(x) + ∇²f(x) p + ∫₀¹ ( ∇²f(x + tp) − ∇²f(x) ) p dt
          = ∇f(x) + ∇²f(x) p + o(‖p‖)
Therefore, letting x = x_k and p = x_{k+1} − x_k:
∇²f_k (x_{k+1} − x_k) ≈ ∇f_{k+1} − ∇f_k
Quasi-Newton methods choose an approximation B_{k+1} to ∇²f which satisfies:
B_{k+1} (x_{k+1} − x_k) = g_{k+1} − g_k
2. Secant Equation
Definition 7.1. Let x_i be the sequence of iterates produced when minimizing an objective function f. Let s_k = x_{k+1} − x_k and y_k = g_{k+1} − g_k, where g_i = ∇f_i. Then the secant equation is
B_{k+1} s_k = y_k
where B_{k+1} is some matrix.
3. Curvature Condition
Definition 7.2. The curvature condition is s_k^T y_k > 0.
Note 7.1. If B_{k+1} satisfies the secant equation and s_k, y_k satisfy the curvature condition, then s_k^T B_{k+1} s_k > 0. So the quadratic model along the step direction is convex.
4. Line Search Rate of Convergence
Theorem 7.1. Suppose f is an objective with minimizer x* satisfying the second order sufficient conditions. Moreover:
(a) Suppose f is twice continuously differentiable in a neighborhood of x*.
(b) Suppose B_k is computed using a quasi-Newton method and B_k p_k = −g_k.
(c) Suppose x_{k+1} = x_k + p_k.
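The secant equation and curvature condition can be checked numerically on a toy quadratic f(x) = ½ x^T A x (a hypothetical example, not from the notes), for which ∇f(x) = Ax and hence y_k = A s_k exactly:

```python
# For a quadratic objective the true Hessian A itself satisfies the secant
# equation, and the curvature condition holds automatically when A is SPD.

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[2.0, 0.0], [0.0, 3.0]]                 # SPD Hessian
x_k, x_k1 = [1.0, 1.0], [0.5, -0.25]
s = [b - a for a, b in zip(x_k, x_k1)]       # s_k = x_{k+1} - x_k
g_k, g_k1 = matvec(A, x_k), matvec(A, x_k1)  # gradients A x
y = [b - a for a, b in zip(g_k, g_k1)]       # y_k = g_{k+1} - g_k

secant_holds = (matvec(A, s) == y)           # B_{k+1} = A satisfies B s_k = y_k
curvature = dot(s, y)                        # > 0 since A is SPD
```

Any B_{k+1} produced by the updates below is built to satisfy the same identity B_{k+1} s_k = y_k without ever forming the true Hessian.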
30 (d) Suppose x_k → x*.
When x_0 is sufficiently close to x*, x_k → x* superlinearly if and only if:
lim_{k→∞} ‖(B_k − ∇²f(x*)) p_k‖ / ‖p_k‖ = 0
Proof. We show that the condition is equivalent to: ‖p_k − p_k^N‖ = o(‖p_k‖). First, if the condition holds:
p_k − p_k^N = (∇²f_k)⁻¹ ( ∇²f_k p_k + ∇f_k )
            = (∇²f_k)⁻¹ ( ∇²f_k − B_k ) p_k
            = (∇²f_k)⁻¹ [ ( ∇²f_k − ∇²f(x*) ) p_k + ( ∇²f(x*) − B_k ) p_k ]
Taking norms, and using that x_0 is sufficiently close to x* together with continuity:
‖p_k − p_k^N‖ = o(‖p_k‖)
For the other direction, if ‖p_k − p_k^N‖ = o(‖p_k‖), the same identity gives
‖( ∇²f(x*) − B_k ) p_k‖ ≤ ‖∇²f_k‖ ‖p_k − p_k^N‖ + ‖( ∇²f_k − ∇²f(x*) ) p_k‖ = o(‖p_k‖)
Now we just apply the triangle inequality and properties of the Newton step:
‖x_k + p_k − x*‖ ≤ ‖x_k + p_k^N − x*‖ + ‖p_k − p_k^N‖ = o(‖x_k − x*‖²) + o(‖p_k‖)
Note that o(‖p_k‖) = o(‖x_k − x*‖), which gives superlinear convergence.
7.1 Rank-2 Update: DFP & BFGS
1. Davidon-Fletcher-Powell Update
(a) Let:
G_k = ∫₀¹ ∇²f(x_k + t α_k p_k) dt
So Taylor's Theorem implies: y_k = G_k s_k.
(b) Let:
‖A‖_W = ‖G_k^{−1/2} A G_k^{−1/2}‖_F
(c) Problem. Given B_k symmetric positive definite, s_k and y_k, find
arg min ‖B − B_k‖_W : B = B^T, B s_k = y_k
(d) Solution:
B_{k+1} = ( I − (y_k s_k^T)/(y_k^T s_k) ) B_k ( I − (s_k y_k^T)/(y_k^T s_k) ) + (y_k y_k^T)/(y_k^T s_k)
31 (e) Inverse Hessian:
H_{k+1} = H_k − (H_k y_k y_k^T H_k)/(y_k^T H_k y_k) + (s_k s_k^T)/(y_k^T s_k)
2. Broyden-Fletcher-Goldfarb-Shanno Update
(a) Problem: Given H_k (the inverse of B_k) symmetric positive definite, s_k and y_k, find
arg min ‖H − H_k‖_W : H = H^T, H y_k = s_k
(b) Solution:
H_{k+1} = ( I − (s_k y_k^T)/(y_k^T s_k) ) H_k ( I − (y_k s_k^T)/(y_k^T s_k) ) + (s_k s_k^T)/(y_k^T s_k)
(c) Hessian:
B_{k+1} = B_k − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (y_k y_k^T)/(y_k^T s_k)
3. Preservation of Positive Definiteness
Lemma 7.1. If B_k ≻ 0, the curvature condition s_k^T y_k > 0 holds, and B_{k+1} is produced by the BFGS update, then B_{k+1} ≻ 0.
Proof. We can equivalently show that H_{k+1} ≻ 0. Let z ∈ R^n \ {0} and ρ_k = 1/(y_k^T s_k). Then:
z^T H_{k+1} z = w^T H_k w + ρ_k (s_k^T z)²
where w = z − ρ_k (s_k^T z) y_k. If s_k^T z = 0, then w = z ≠ 0 and z^T H_{k+1} z = z^T H_k z > 0. If s_k^T z ≠ 0, then ρ_k (s_k^T z)² > 0 (regardless of whether w = 0), and so H_{k+1} ≻ 0.
7.2 Rank-1 Update: SR1
1. Problem: find v such that:
B_{k+1} = B_k + σ v v^T : B_{k+1} s_k = y_k
2. Solution:
B_{k+1} = B_k + ( (y_k − B_k s_k)(y_k − B_k s_k)^T ) / ( s_k^T (y_k − B_k s_k) )
Proof. Multiplying the secant equation by s_k:
y_k − B_k s_k = σ (v^T s_k) v
Hence, v is a multiple of y_k − B_k s_k.
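A dependency-free sketch of the BFGS inverse-Hessian update from item 2(b) (the plain-list matrix helpers and the test data are illustrative, not from the notes); note that H_{k+1} satisfies the inverse secant equation H_{k+1} y_k = s_k by construction:

```python
# BFGS update of the inverse Hessian approximation H_k, written out so the
# formula H_{k+1} = (I - rho s y^T) H_k (I - rho y s^T) + rho s s^T is visible.

def dot(u, v): return sum(a * b for a, b in zip(u, v))
def outer(u, v): return [[a * b for b in v] for a in u]
def matvec(A, v): return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def madd(A, B): return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
def msub(A, B): return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
def mscale(c, A): return [[c * a for a in row] for row in A]

def bfgs_update(H, s, y):
    n = len(s)
    rho = 1.0 / dot(y, s)            # requires the curvature condition s^T y > 0
    I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    V = msub(I, mscale(rho, outer(s, y)))
    W = msub(I, mscale(rho, outer(y, s)))
    return madd(matmul(matmul(V, H), W), mscale(rho, outer(s, s)))

H0 = [[1.0, 0.0], [0.0, 1.0]]
s, y = [1.0, 0.0], [2.0, 1.0]        # s^T y = 2 > 0
H1 = bfgs_update(H0, s, y)
```

Here H1 = [[0.75, -0.5], [-0.5, 1.0]]: symmetric, positive definite (as Lemma 7.1 guarantees), and H1 y = s.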
32 Part III Specialized Applications of Optimization
8 Inexact Newton's Method
1. Requirements for the search direction:
(a) p_k is inexpensive to compute.
(b) p_k approximates the Newton step closely, as measured by the residual z_k = ∇²f_k p_k + ∇f_k.
(c) The error is controlled by the current gradient: ‖z_k‖ ≤ η_k ‖∇f_k‖.
2. Typically, inexact methods keep B_k = ∇²f_k and improve on how p_k is computed.
3. CG-Based Algorithmic Overview
Algorithm 9: Overview of Inexact CG Algorithm
input : x_0 a starting point, f objective, restrictions
x ← x_0
Define tolerance: ε
Compute gradient: r ← ∇f(x)
Compute conjugate descent direction: p ← −r
while ‖r‖ > ε do
  Check restricted polynomial:
  if p^T B p ≤ 0 then
    Compute α given restrictions
    Compute minimizer: x ← x + αp
    Break loop
  else
    Compute optimal step length: α
    if restrictions on x are violated then
      Compute τ given restrictions
      Compute minimizer: x ← x + ταp
      Break loop
    else
      Compute step: x ← x + αp
      Compute gradient: r ← ∇f(x)
      Compute conjugated step direction: p
    end
  end
end
return minimizer: x
4. As with the usual CG, we can implement methods for computing the gradient and conjugate step direction efficiently.
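The forcing-sequence stopping rule in item 1(c) can be sketched with a plain CG solver that terminates once ‖∇²f_k p + ∇f_k‖ ≤ η_k ‖∇f_k‖ (the SPD test matrix and the value of η are illustrative only; the negative-curvature and trust-region branches of Algorithm 9 are omitted here):

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def inexact_newton_step(B, g, eta, max_iter=50):
    """Approximately solve B p = -g, stopping when ||B p + g|| <= eta ||g||."""
    n = len(g)
    p = [0.0] * n
    r = [-gi for gi in g]                 # residual of B p = -g at p = 0
    d = r[:]                              # CG search direction
    tol = eta * math.sqrt(dot(g, g))
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol:
            break
        Bd = matvec(B, d)
        alpha = dot(r, r) / dot(d, Bd)    # assumes d^T B d > 0 (B SPD here)
        p = [pi + alpha * di for pi, di in zip(p, d)]
        r_new = [ri - alpha * bi for ri, bi in zip(r, Bd)]
        beta = dot(r_new, r_new) / dot(r, r)
        d = [ri + beta * di for ri, di in zip(r_new, d)]
        r = r_new
    return p

B = [[4.0, 1.0], [1.0, 3.0]]              # stands in for the Hessian at x_k
g = [1.0, 2.0]                            # stands in for the gradient at x_k
p = inexact_newton_step(B, g, eta=1e-8)
residual = [a + b for a, b in zip(matvec(B, p), g)]   # z_k = B p + g
```

Choosing η_k → 0 (for example η_k proportional to ‖∇f_k‖) tightens the inner solves as the outer iteration converges, which is what recovers fast local convergence.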
33 5. Line Search Strategy: the restriction on α is simply that it satisfies the strong Wolfe conditions.
6. Trust-Region Strategy:
(a) When p^T B p ≤ 0, the restriction on α is that ‖x + αp‖ = Δ: because the restricted polynomial in the direction p is decreasing, we should go all the way to the boundary.
(b) The restriction on x is triggered when ‖x + αp‖ > Δ: x + αp is no longer in the trust region, so we need to find τ ensuring ‖x + ταp‖ = Δ.
7. An alternative is to use the Lanczos method, which generalizes CG methods.
9 Limited Memory BFGS
1. In normal BFGS:
H_{k+1} = ( I − ρ_k s_k y_k^T ) H_k ( I − ρ_k y_k s_k^T ) + ρ_k s_k s_k^T
2. Substituting in for H_k until H_0 reveals that only the pairs s_i, y_i must be stored to compute H_{k+1}.
3. Limited-memory BFGS capitalizes on this by storing a fixed number m of pairs s_i, y_i and re-updating H_k from a (typically diagonal) matrix H_k^0.
Algorithm 10: L-BFGS Algorithm
input : x_0 a starting point, f objective, m storage size, ε tolerance
x ← x_0
k ← 0
while ‖∇f(x)‖ > ε do
  Choose H_k^0
  Update H_k from {(s_{k−1}, y_{k−1}), ..., (s_{k−m}, y_{k−m})}
  Compute p_k ← −H_k ∇f(x)
  Implement line search or trust region (dogleg) to find the step p
  x ← x + p
  k ← k + 1
end
return minimizer: x
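The "update H_k from stored pairs" step is usually implemented with the two-loop recursion rather than by forming H_k explicitly; a sketch (the recursion is standard L-BFGS, though the notes above only give the update in matrix form):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def two_loop_direction(grad, pairs, gamma=1.0):
    """Return p_k = -H_k grad, with pairs = [(s_i, y_i), ...] oldest first
    and the initial matrix H_k^0 = gamma * I (a typical diagonal choice)."""
    q = grad[:]
    cache = []
    for s, y in reversed(pairs):                 # newest pair first
        rho = 1.0 / dot(y, s)
        a = rho * dot(s, q)
        q = [qi - a * yi for qi, yi in zip(q, y)]
        cache.append((a, rho))
    r = [gamma * qi for qi in q]                 # apply H_k^0
    for (a, rho), (s, y) in zip(reversed(cache), pairs):   # oldest pair first
        b = rho * dot(y, r)
        r = [ri + (a - b) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]

# With a single stored pair this reproduces -H_1 grad for the BFGS update of I:
p = two_loop_direction([1.0, 1.0], [([1.0, 0.0], [2.0, 1.0])])
```

With m stored pairs the cost is O(mn) per iteration instead of the O(n²) needed to store and apply H_k directly, which is the entire point of the limited-memory variant.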
34 10 Least Squares Regression Methods
1. In least squares problems, the objective function has a specialized form:
f(x) = ½ Σ_{j=1}^m r_j(x)²
(a) x ∈ R^n, and its components are usually the parameters of a particular model of interest.
(b) The r_j are usually the residuals of the model φ with parameter x, evaluated at input t_j with observed output y_j. That is:
r_j(x) = φ(x; t_j) − y_j
2. Formulation
(a) Let r(x) = [ r_1(x) r_2(x) ··· r_m(x) ]^T. Then:
f(x) = ½ ‖r(x)‖²
(b) The first derivative:
∇f(x) = [ ∇r_1(x) ∇r_2(x) ··· ∇r_m(x) ] r(x) =: J(x)^T r(x)
where J(x) is the Jacobian of r, whose rows are the ∇r_j(x)^T.
(c) The second derivative:
∇²f(x) = J(x)^T J(x) + Σ_{j=1}^m r_j(x) ∇²r_j(x)
3. The main idea behind least squares methods is to approximate ∇²f(x) by J(x)^T J(x); thus, only first derivative information is required.
10.1 Linear Least Squares Regression
1. Suppose φ(x; t_j) = t_j^T x. Then:
r(x) = Tx − y
where each row of T is t_j^T.
2. Objective function:
f(x) = ½ ‖Tx − y‖²
Note. The objective function can be minimized directly using matrix factorizations if m is not too large.
3. First derivative:
∇f(x) = T^T T x − T^T y
Setting this to zero yields the normal equations T^T T x = T^T y.
Note. The normal equations may be better to solve if m is large because, by multiplying by the transpose, we need only solve an n × n system of equations.
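A dependency-free sketch of the normal equations for the linear model φ(x; t_j) = x₁ + x₂ t_j (an illustrative choice, so each row of T is [1, t_j]), with the resulting 2 × 2 system solved by Cramer's rule:

```python
# Solve T^T T x = T^T y for a straight-line fit; the data below lie exactly
# on y = 1 + 2 t, so the least squares solution has zero residual.

t = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]

T = [[1.0, tj] for tj in t]
# Form the n x n system T^T T and the right-hand side T^T y (n = 2 here).
TtT = [[sum(row[i] * row[j] for row in T) for j in range(2)] for i in range(2)]
Tty = [sum(row[i] * yj for row, yj in zip(T, y)) for i in range(2)]

det = TtT[0][0] * TtT[1][1] - TtT[0][1] * TtT[1][0]
x = [(Tty[0] * TtT[1][1] - TtT[0][1] * Tty[1]) / det,
     (TtT[0][0] * Tty[1] - Tty[0] * TtT[1][0]) / det]
```

This is the m ≫ n regime described in the note above: only the 2 × 2 system is solved, no matter how many data points there are. For ill-conditioned T, a QR or SVD factorization of T itself is numerically preferable to forming T^T T.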
35 10.2 Line Search: Gauss-Newton
1. The Gauss-Newton algorithm is a line search for nonlinear least squares regression with search direction p_k solving:
J_k^T J_k p_k = −J_k^T r_k
2. Notice that these are the normal equations from:
min_p ½ ‖J_k p + r_k‖²
3. Using this search direction, we proceed with the line search as per usual.
10.3 Trust Region: Levenberg-Marquardt
1. The local model is:
m_k(p) = f(x_k) + ( J(x_k)^T r(x_k) )^T p + ½ p^T J(x_k)^T J(x_k) p,   ‖p‖ ≤ Δ_k
2. Since f(x) = ‖r(x)‖²/2, minimizing m_k(p) is equivalent to:
min_{‖p‖ ≤ Δ_k} ½ ‖J(x_k) p + r(x_k)‖²
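A one-parameter Gauss-Newton illustration (the exponential model and the data are hypothetical): with n = 1, the normal equations JᵀJ p = −Jᵀr collapse to a scalar division, and full steps α_k = 1 already converge for this zero-residual problem:

```python
import math

# Fit phi(x; t_j) = exp(x * t_j) to data generated with the true value x = 0.5.
t = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [math.exp(0.5 * tj) for tj in t]

def gauss_newton_step(x):
    r = [math.exp(x * tj) - yj for tj, yj in zip(t, y)]   # residuals r_j(x)
    J = [tj * math.exp(x * tj) for tj in t]               # dr_j/dx (m x 1 Jacobian)
    # Scalar normal equations: p = -(J^T r) / (J^T J)
    return -sum(Jj * rj for Jj, rj in zip(J, r)) / sum(Jj * Jj for Jj in J)

x = 0.0
for _ in range(12):          # full steps; a line search would choose alpha_k
    x += gauss_newton_step(x)
```

Because the residual at the solution is zero, the neglected Σ r_j ∇²r_j term in the Hessian vanishes there, so Gauss-Newton behaves like Newton's method near x = 0.5.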
36 Part IV Constrained Optimization
10.4 Theory of Constrained Optimization
1. In constrained optimization, the problem is:
arg min_x f(x) : c_i(x) = 0 ∀ i ∈ E,  c_j(x) ≥ 0 ∀ j ∈ I
(a) f is the usual objective function.
(b) E is the index set for equality constraints.
(c) I is the index set for inequality constraints.
2. Feasible Set. Active Set.
Definition. The feasible set, Ω, is the set of all points which satisfy the constraints:
Ω = { x : c_i(x) = 0 ∀ i ∈ E, c_j(x) ≥ 0 ∀ j ∈ I }
The active set at a point x ∈ Ω, A(x), is defined by:
A(x) = E ∪ { j ∈ I : c_j(x) = 0 }
3. Feasible Sequence. Tangent. Tangent Cone.
Definition. Let x ∈ Ω. A sequence z_k → x is a feasible sequence if, for sufficiently large K, z_k ∈ Ω whenever k ≥ K. A vector d is a tangent to Ω at x if there is a feasible sequence z_k → x and a sequence t_k ↓ 0 such that:
lim_{k→∞} (z_k − x)/t_k = d
The set of all tangents is the tangent cone at x, denoted T_Ω(x).
Note. T_Ω(x) contains all step directions which, for a sufficiently small step size, remain in the feasible region Ω.
4. Linearized Feasible Directions.
Definition. Let x ∈ Ω and A(x) be its active set. The set of linearized feasible directions, F(x), is defined as:
F(x) = { d : d^T ∇c_i(x) = 0 ∀ i ∈ E,  d^T ∇c_j(x) ≥ 0 ∀ j ∈ A(x) ∩ I }
Note. Let x ∈ Ω and suppose we linearize the constraints near x. The set F(x) contains all search directions for which the linearized approximations to c_i(x + d) satisfy the constraints. For example, if c_i(x) = 0 for i ∈ E, then we need 0 = c_i(x + d) ≈ c_i(x) + d^T ∇c_i(x) = d^T ∇c_i(x) for d to be a good search direction.
5. LICQ Constraint Qualification
37 Definition. Given a point x and active set A(x), the linear independence constraint qualification (LICQ) holds if the set of active constraint gradients, { ∇c_i(x) : i ∈ A(x) }, is linearly independent.
Note. Most algorithms using line search or trust regions require that F(x) = T_Ω(x); otherwise, there is not a good way to find the next iterate.
6. First Order Necessary Conditions
Theorem. Let f, c_i be continuously differentiable. Suppose x* is a local solution to the optimization problem and LICQ holds at x*. Then there is a Lagrange multiplier vector λ*, with components λ_i* for i ∈ E ∪ I, such that the Karush-Kuhn-Tucker (KKT) conditions are satisfied at (x*, λ*):
(a) ∇_x L(x*, λ*) = 0
(b) For all i ∈ E, c_i(x*) = 0
(c) For all j ∈ I, c_j(x*) ≥ 0
(d) For all j ∈ I, λ_j* ≥ 0
(e) For all j ∈ I, λ_j* c_j(x*) = 0
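The KKT conditions can be verified numerically on a small hypothetical problem: minimize f(x) = x₁ + x₂ subject to the single equality constraint c(x) = 2 − x₁² − x₂² = 0, whose solution is x* = (−1, −1) with multiplier λ* = 1/2:

```python
# Check the KKT conditions at (x*, lambda*) for
#   min x1 + x2   subject to   c(x) = 2 - x1^2 - x2^2 = 0,
# using the Lagrangian L(x, lam) = f(x) - lam * c(x).

def grad_f(x):
    return [1.0, 1.0]

def c(x):
    return 2.0 - x[0] ** 2 - x[1] ** 2

def grad_c(x):
    return [-2.0 * x[0], -2.0 * x[1]]

x_star = [-1.0, -1.0]
lam = 0.5

# (a) Stationarity: grad_x L = grad f - lam * grad c = 0 at x*.
grad_L = [gf - lam * gc for gf, gc in zip(grad_f(x_star), grad_c(x_star))]
# (b) Primal feasibility of the equality constraint.
feasible = (c(x_star) == 0.0)
# Conditions (c)-(e) are vacuous here because there are no inequality constraints.
```

Note that because LICQ holds at x* (the single gradient ∇c(x*) = (2, 2) is nonzero, hence linearly independent), the multiplier λ* is unique.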
Newton s Method Pros and Cons Quasi-Newton Methods MA 348 Kurt Bryan Newton s method has some very nice properties: It s extremely fast, at least once it gets near the minimum, and with the simple modifications
More informationMATH 205C: STATIONARY PHASE LEMMA
MATH 205C: STATIONARY PHASE LEMMA For ω, consider an integral of the form I(ω) = e iωf(x) u(x) dx, where u Cc (R n ) complex valued, with support in a compact set K, and f C (R n ) real valued. Thus, I(ω)
More informationSecond Order Optimization Algorithms I
Second Order Optimization Algorithms I Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 7, 8, 9 and 10 1 The
More informationSurvey of NLP Algorithms. L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA
Survey of NLP Algorithms L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA NLP Algorithms - Outline Problem and Goals KKT Conditions and Variable Classification Handling
More informationNOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained
NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume
More informationUnconstrained Optimization
1 / 36 Unconstrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University February 2, 2015 2 / 36 3 / 36 4 / 36 5 / 36 1. preliminaries 1.1 local approximation
More informationUnconstrained minimization of smooth functions
Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and
More informationPreconditioned conjugate gradient algorithms with column scaling
Proceedings of the 47th IEEE Conference on Decision and Control Cancun, Mexico, Dec. 9-11, 28 Preconditioned conjugate gradient algorithms with column scaling R. Pytla Institute of Automatic Control and
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More information1. Introduction and motivation. We propose an algorithm for solving unconstrained optimization problems of the form (1) min
LLNL IM Release number: LLNL-JRNL-745068 A STRUCTURED QUASI-NEWTON ALGORITHM FOR OPTIMIZING WITH INCOMPLETE HESSIAN INFORMATION COSMIN G. PETRA, NAIYUAN CHIANG, AND MIHAI ANITESCU Abstract. We present
More informationConstrained Optimization Theory
Constrained Optimization Theory Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Constrained Optimization Theory IMA, August
More informationOptimality Conditions for Constrained Optimization
72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)
More information