Notes on Numerical Optimization


Notes on Numerical Optimization
University of Chicago, 2014
Vivak Patel
October 18, 2014

Contents

Part I Fundamentals of Optimization
1 Overview of Numerical Optimization
  1.1 Problem and Classification
  1.2 Theoretical Properties of Solutions
  1.3 Algorithm Overview
  1.4 Fundamental Definitions and Results
2 Line Search Methods
  2.1 Step Length
  2.2 Step Length Selection Algorithms
  2.3 Global Convergence and Zoutendijk
3 Trust Region Methods
  3.1 Trust Region Subproblem and Fidelity
  3.2 Fidelity Algorithms
  3.3 Approximate Solutions to Subproblem (Cauchy Point, Dogleg Method, Global Convergence of Cauchy Point Methods)
  3.4 Iterative Solutions to Subproblem (Exact Solution to Subproblem, Newton's Root Finding Iterative Solution, Trust Region Subproblem Algorithms)
4 Conjugate Gradients

Part II Model Hessian Selection
5 Newton's Method
6 Newton's Method with Hessian Modification
7 Quasi-Newton Methods
  7.1 Rank-2 Update: DFP & BFGS
  7.2 Rank-1 Update: SR1

Part III Specialized Applications of Optimization
8 Inexact Newton's Method
9 Limited Memory BFGS

10 Least Squares Regression Methods
  10.1 Linear Least Squares Regression
  10.2 Line Search: Gauss-Newton
  10.3 Trust Region: Levenberg-Marquardt

Part IV Constrained Optimization
11 Theory of Constrained Optimization

List of Algorithms

1 Backtracking Algorithm
2 Interpolation Algorithm
3 Trust Region Management Algorithm
4 Solution Acceptance Algorithm
5 Overview of Conjugate Gradient Algorithm
6 Conjugate Gradients Algorithm for Convex Quadratic
7 Fletcher-Reeves CG Algorithm
8 Line Search with Modified Hessian
9 Overview of Inexact CG Algorithm
10 L-BFGS Algorithm

Part I Fundamentals of Optimization

1 Overview of Numerical Optimization

1.1 Problem and Classification

1. Problem:
   $$\arg\min_{z \in \mathbb{R}^n} f(z) \ : \ \begin{cases} c_i(z) = 0 & i \in E \\ c_i(z) \geq 0 & i \in I \end{cases}$$
   (a) $f : \mathbb{R}^n \to \mathbb{R}$ is known as the objective function
   (b) $E$ indexes the equality constraints
   (c) $I$ indexes the inequality constraints

2. Classifications
   (a) Unconstrained vs. Constrained. If $E \cup I = \emptyset$ then it is an unconstrained problem.
   (b) Linear Programming vs. Nonlinear Programming. When $f(z)$ and the $c_i(z)$ are all linear functions of $z$, this is a linear programming problem. Otherwise, it is a nonlinear programming problem.
   (c) Note: This is not the same as having linear or nonlinear equations.

1.2 Theoretical Properties of Solutions

1. Global Minimizer. Weak Local Minimizer (Local Minimizer). Strict Local Minimizer. Isolated Local Minimizer.

   Definition 1.1. Let $f : \mathbb{R}^n \to \mathbb{R}$.
   (a) $x^*$ is a global minimizer of $f$ if $f(x^*) \leq f(x)$ for all $x \in \mathbb{R}^n$.
   (b) $x^*$ is a local minimizer of $f$ if for some neighborhood $N(x^*)$ of $x^*$, $f(x^*) \leq f(x)$ for all $x \in N(x^*)$.
   (c) $x^*$ is a strict local minimizer of $f$ if for some neighborhood $N(x^*)$, $f(x^*) < f(x)$ for all $x \in N(x^*)$, $x \neq x^*$.
   (d) $x^*$ is an isolated local minimizer of $f$ if for some neighborhood $N(x^*)$, there are no other local minimizers of $f$ in $N(x^*)$.

   Note 1.1. Every isolated local minimizer is a strict local minimizer, but it is not true that a strict local minimizer is always an isolated local minimizer.

2. Taylor's Theorem

   Theorem 1.1. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. Let $p \in \mathbb{R}^n$. Then for some $t \in (0, 1)$:
   $$f(x + p) = f(x) + \nabla f(x + tp)^T p \quad (1)$$

If $f$ is twice continuously differentiable, then:
$$\nabla f(x + p) = \nabla f(x) + \int_0^1 \nabla^2 f(x + tp)\, p \, dt \quad (2)$$
And for some $t \in (0, 1)$:
$$f(x + p) = f(x) + \nabla f(x)^T p + \tfrac{1}{2} p^T \nabla^2 f(x + tp)\, p \quad (3)$$

3. Necessary Conditions

   (a) First Order Necessary Conditions
   Lemma 1.1. Let $f$ be continuously differentiable. If $x^*$ is a local minimizer of $f$ then $\nabla f(x^*) = 0$.
   Proof. Suppose $\nabla f(x^*) \neq 0$. Let $p = -\nabla f(x^*)$. Then $p^T \nabla f(x^*) < 0$, and by continuity there is some $T > 0$ such that for any $t \in [0, T)$, $p^T \nabla f(x^* + tp) < 0$. We now apply Equation (1) of Taylor's Theorem to get a contradiction. Let $\bar{t} \in (0, T)$ and $q = \bar{t}p$. Then there exists $t \in (0, \bar{t})$ such that:
   $$f(x^* + q) = f(x^*) + \bar{t}\,\nabla f(x^* + t p)^T p < f(x^*)$$

   (b) Second Order Necessary Conditions
   Lemma 1.2. Suppose $f$ is twice continuously differentiable. If $x^*$ is a local minimizer of $f$ then $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*) \succeq 0$ (i.e. positive semi-definite).
   Proof. The first conclusion follows from the first order necessary condition. We prove the second by contradiction and Taylor's Theorem. Suppose $\nabla^2 f(x^*) \not\succeq 0$. Then there is a $p$ such that:
   $$p^T \nabla^2 f(x^*)\, p < 0$$
   By continuity, there is a $T > 0$ such that $p^T \nabla^2 f(x^* + tp)\, p < 0$ for $t \in [0, T]$. Fix $\bar{t} \in (0, T]$ and let $q = \bar{t}p$. Then, by Equation (3) of Taylor's Theorem and since $\nabla f(x^*) = 0$, there exists $t \in (0, \bar{t})$ such that:
   $$f(x^* + q) = f(x^*) + \tfrac{1}{2}\bar{t}^2\, p^T \nabla^2 f(x^* + t\bar{t}p)\, p < f(x^*)$$

4. Second Order Sufficient Condition
   Lemma 1.3. Let $x^* \in \mathbb{R}^n$. Suppose $\nabla^2 f$ is continuous, $\nabla^2 f(x^*) \succ 0$, and $\nabla f(x^*) = 0$. Then $x^*$ is a strict local minimizer.

Proof. By continuity, there is a ball $B$ of radius $T$ about $x^*$ in which $\nabla^2 f(x) \succ 0$. Let $\|p\| < T$. Then, there exists a $t \in (0, 1)$ such that:
$$f(x^* + p) = f(x^*) + \tfrac{1}{2} p^T \nabla^2 f(x^* + tp)\, p > f(x^*)$$

5. Convexity and Local Minimizers
   Lemma 1.4. When $f$ is convex, any local minimizer $x^*$ is a global minimizer. If, in addition, $f$ is differentiable, then any stationary point is a global minimizer.
   Proof. Suppose $x^*$ is not a global minimizer. Let $z$ be the global minimizer, so $f(x^*) > f(z)$. By convexity, for any $\lambda \in (0, 1]$:
   $$f(x^*) > \lambda f(z) + (1 - \lambda) f(x^*) \geq f(\lambda z + (1 - \lambda) x^*)$$
   So as $\lambda \to 0$, we see that in any neighborhood of $x^*$ there is a point $w$ such that $f(w) < f(x^*)$ — a contradiction. For the second part, suppose $\nabla f(x^*) = 0$ and $z$ is as above. Then:
   $$\nabla f(x^*)^T (z - x^*) = \lim_{\lambda \downarrow 0} \frac{f(x^* + \lambda(z - x^*)) - f(x^*)}{\lambda} \leq \lim_{\lambda \downarrow 0} \frac{\lambda f(z) + (1 - \lambda) f(x^*) - f(x^*)}{\lambda} = f(z) - f(x^*) < 0$$
   which contradicts $\nabla f(x^*) = 0$.

1.3 Algorithm Overview

1. In general, algorithms begin with a seed point $x_0$ and locally search for decreases in the objective function, producing iterates $x_k$, until stopping conditions are met.

2. Algorithms typically generate a local model for $f$ near a point $x_k$:
   $$f(x_k + p) \approx m_k(p) = f_k + p^T \nabla f_k + \tfrac{1}{2} p^T B_k p$$

3. Different choices of $B_k$ lead to different methods with different properties:
   (a) $B_k = 0$ leads to steepest descent methods
   (b) Letting $B_k$ be the closest positive definite approximation to $\nabla^2 f_k$ leads to Newton's methods
   (c) Iterative approximations to the Hessian given by $B_k$ lead to quasi-Newton methods
   (d) Conjugate gradient methods update $p_k$ without explicitly computing $B_k$

1.4 Fundamental Definitions and Results

1. Q-convergence
   Definition 1.2. Let $x_k \to x^*$ in $\mathbb{R}^n$.
   (a) $x_k$ converges Q-linearly if there is a $q \in (0, 1)$ such that for sufficiently large $k$:
   $$\frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|} \leq q$$
   (b) $x_k$ converges Q-superlinearly if:
   $$\lim_{k \to \infty} \frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|} = 0$$
   (c) $x_k$ converges Q-quadratically if there is a $q_2 > 0$ such that for sufficiently large $k$:
   $$\frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|^2} \leq q_2$$

2. R-convergence
   Definition 1.3. Let $x_k \to x^*$ in $\mathbb{R}^n$.
   (a) $x_k$ converges R-linearly if there is a sequence $v_k \subset \mathbb{R}$ converging Q-linearly to 0 such that $\|x_k - x^*\| \leq v_k$.
   (b) $x_k$ converges R-superlinearly if there is a sequence $v_k \subset \mathbb{R}$ converging Q-superlinearly to 0 such that $\|x_k - x^*\| \leq v_k$.
   (c) $x_k$ converges R-quadratically if there is a sequence $v_k \subset \mathbb{R}$ converging Q-quadratically to 0 such that $\|x_k - x^*\| \leq v_k$.

3. Sherman-Morrison-Woodbury
   Lemma 1.5. If $A$ and $\tilde{A}$ are non-singular and related by $\tilde{A} = A + ab^T$, then:
   $$\tilde{A}^{-1} = A^{-1} - \frac{A^{-1}ab^T A^{-1}}{1 + b^T A^{-1}a}$$
   Proof. We can check this by simply plugging in the formula. However, to derive it:
   $$\tilde{A}^{-1} = \left(A + ab^T\right)^{-1} = \left(A\left[I + A^{-1}ab^T\right]\right)^{-1} = \left[I + A^{-1}ab^T\right]^{-1}A^{-1}$$
   $$= \left[I - A^{-1}ab^T + A^{-1}ab^T A^{-1}ab^T - \cdots\right]A^{-1} = \left[I - A^{-1}a\left(1 - b^T A^{-1}a + \cdots\right)b^T\right]A^{-1} = A^{-1} - \frac{A^{-1}ab^T A^{-1}}{1 + b^T A^{-1}a}$$
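Lemma 1.5 is easy to sanity-check numerically. A minimal sketch (the matrix `A` and vectors `a`, `b` below are arbitrary illustrative data, and `smw_inverse` is a hypothetical helper name):

```python
import numpy as np

def smw_inverse(A_inv, a, b):
    """Inverse of A + a b^T via the Sherman-Morrison formula (Lemma 1.5)."""
    denom = 1.0 + b @ A_inv @ a                      # scalar 1 + b^T A^{-1} a
    return A_inv - np.outer(A_inv @ a, b @ A_inv) / denom

rng = np.random.default_rng(0)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))    # well-conditioned base matrix
a, b = rng.standard_normal(3), rng.standard_normal(3)

lhs = smw_inverse(np.linalg.inv(A), a, b)            # rank-1 update of the inverse
rhs = np.linalg.inv(A + np.outer(a, b))              # direct inversion for comparison
assert np.allclose(lhs, rhs)
```

The point of the formula in optimization is that updating an already-known inverse costs $O(n^2)$ instead of the $O(n^3)$ of a fresh inversion, which is what makes quasi-Newton inverse updates cheap.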

2 Line Search Methods

2.1 Step Length

1. Descent Direction
   Definition 2.1. $p \in \mathbb{R}^n$ is a descent direction at $x_k$ if $p^T \nabla f_k < 0$.

2. Problem: Find $\alpha_k \in \arg\min_\alpha \phi(\alpha)$ where $\phi(\alpha) = f(x_k + \alpha p_k)$. However, it is too costly to find the exact minimizer of $\phi$. Instead, we find an $\alpha$ which acceptably reduces $\phi$.

3. Wolfe Conditions
   (a) Armijo Condition. Curvature Condition. Wolfe Conditions. Strong Wolfe Conditions.
   Definition 2.2. Fix $x$ and a descent direction $p$. Let $\phi(\alpha) = f(x + \alpha p)$, so $\phi'(\alpha) = \nabla f(x + \alpha p)^T p$. Fix $0 < c_1 < c_2 < 1$.
   i. $\alpha$ satisfies the Armijo Condition if $\phi(\alpha) \leq \phi(0) + \alpha c_1 \phi'(0)$. Note $l(\alpha) := \phi(0) + \alpha c_1 \phi'(0)$.
   ii. $\alpha$ satisfies the Curvature Condition if $\phi'(\alpha) \geq c_2 \phi'(0)$.
   iii. $\alpha$ satisfies the Strong Curvature Condition if $|\phi'(\alpha)| \leq c_2 |\phi'(0)|$.
   iv. The Wolfe Conditions are the Armijo Condition together with the Curvature Condition.
   v. The Strong Wolfe Conditions are the Armijo Condition together with the Strong Curvature Condition.
   (b) The Armijo Condition guarantees a decrease at the next iterate by ensuring $\phi(\alpha) < \phi(0)$.
   (c) Curvature Condition
   i. If $\phi'(\alpha) < c_2 \phi'(0)$, then $\phi$ is still decreasing steeply at $\alpha$, and so we can improve the reduction in $f$ by taking a larger $\alpha$.
   ii. If $\phi'(\alpha) \geq c_2 \phi'(0)$, then either we are closer to a minimum where $\phi' = 0$, or $\phi' > 0$, which means we have passed the minimum.
   iii. The strong condition guarantees that the choice of $\alpha$ is closer to a point where $\phi' = 0$.
   (d) Existence of Steps satisfying the Wolfe and Strong Wolfe Conditions
   Lemma 2.1. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. Let $p$ be a descent direction at $x$, and assume that $f$ is bounded from below along the ray $\{x + \alpha p : \alpha > 0\}$. If $0 < c_1 < c_2 < 1$, there exist intervals of step lengths satisfying the Wolfe and the Strong Wolfe Conditions.
   Proof.
   i. Satisfying the Armijo Condition: Let $l(\alpha) = f(x) + c_1 \alpha p^T \nabla f(x)$. Since $l(\alpha)$ is decreasing without bound and $\phi(\alpha)$ is bounded from below, eventually $\phi(\alpha) = l(\alpha)$. Let $\alpha_A > 0$ be the first time this intersection occurs. Then $\phi(\alpha) < l(\alpha)$ for $\alpha \in (0, \alpha_A)$.

   ii. Curvature Condition: By the mean value theorem, there exists $\beta \in (0, \alpha_A)$ such that (since $l(\alpha_A) = \phi(\alpha_A)$):
   $$\alpha_A \phi'(\beta) = \phi(\alpha_A) - \phi(0) = c_1 \alpha_A \phi'(0) > c_2 \alpha_A \phi'(0)$$
   And since $\phi'$ is continuous, there is an interval containing $\beta$ on which this inequality holds.
   iii. Strong Curvature Condition: Since $\phi'(\beta) = c_1 \phi'(0) < 0$, it follows that:
   $$|\phi'(\beta)| < c_2 |\phi'(0)|$$

4. Goldstein Conditions
   (a) Goldstein Conditions
   Definition 2.3. Take $c \in (0, 1/2)$. The Goldstein Conditions are satisfied by $\alpha$ if:
   $$\phi(0) + (1 - c)\alpha \phi'(0) \leq \phi(\alpha) \leq \phi(0) + c\alpha \phi'(0)$$
   (b) Existence of a Step satisfying the Goldstein Conditions
   Lemma 2.2. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. Let $p$ be a descent direction at $x$, and assume that $f$ is bounded from below along the ray $\{x + \alpha p : \alpha > 0\}$. Let $c \in (0, 1/2)$. Then there exists an interval satisfying the Goldstein Conditions.
   Proof. Let $l(\alpha) = \phi(0) + (1 - c)\alpha\phi'(0)$ and $u(\alpha) = \phi(0) + c\alpha\phi'(0)$. For $\alpha > 0$, $l(\alpha) < u(\alpha)$ since $c \in (0, 1/2)$. Since $\phi$ is bounded from below, let $\alpha_u$ be the smallest point of intersection between $u(\alpha)$ and $\phi(\alpha)$ for $\alpha > 0$, and let $\alpha_l$ be the largest point of intersection, less than $\alpha_u$, of $l(\alpha)$ and $\phi(\alpha)$. Then, for $\alpha \in (\alpha_l, \alpha_u)$, $l(\alpha) < \phi(\alpha) < u(\alpha)$.

5. Backtracking
   (a) Backtracking
   Definition 2.4. Let $\rho \in (0, 1)$ and $\bar{\alpha} > 0$. Backtracking checks the sequence $\bar{\alpha}, \rho\bar{\alpha}, \rho^2\bar{\alpha}, \ldots$ until an $\alpha$ is found satisfying the Armijo condition.
   (b) Existence of a Step satisfying Backtracking
   Lemma 2.3. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. Let $p$ be a descent direction at $x$. Then there is a step produced by the backtracking algorithm which satisfies the Armijo condition.
   Proof. By continuity, there is an $\epsilon > 0$ such that for $0 < \alpha < \epsilon$, $\phi(\alpha) < \phi(0) + c\alpha\phi'(0)$. Since $\rho^n\bar{\alpha} \to 0$, there is an $n$ such that $\rho^n\bar{\alpha} < \epsilon$.
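The step-acceptance conditions above are plain inequalities on $\phi$ and $\phi'$, so they are easy to test in code. A minimal sketch of checkers for the Wolfe and Goldstein Conditions (the quadratic test function and the function names `wolfe`/`goldstein` are illustrative assumptions, not part of the notes):

```python
def wolfe(phi, dphi, alpha, c1=1e-4, c2=0.9, strong=False):
    """Armijo + (strong) curvature condition of Definition 2.2."""
    armijo = phi(alpha) <= phi(0) + c1 * alpha * dphi(0)
    if strong:
        curvature = abs(dphi(alpha)) <= c2 * abs(dphi(0))
    else:
        curvature = dphi(alpha) >= c2 * dphi(0)
    return armijo and curvature

def goldstein(phi, dphi, alpha, c=0.25):
    """Goldstein sandwich of Definition 2.3."""
    lo = phi(0) + (1 - c) * alpha * dphi(0)
    hi = phi(0) + c * alpha * dphi(0)
    return lo <= phi(alpha) <= hi

# phi(a) = f(x + a p) for f(x) = x^2 with x = 1, p = -1 (a descent direction)
phi  = lambda a: (1 - a) ** 2
dphi = lambda a: -2 * (1 - a)

assert wolfe(phi, dphi, 1.0)           # the exact minimizer satisfies Wolfe
assert goldstein(phi, dphi, 0.5)       # moderate step passes the sandwich
assert not goldstein(phi, dphi, 1.9)   # too-long step fails the upper bound
```

Note how the Goldstein lower bound rejects steps the Armijo condition alone would accept; this is its substitute for the curvature condition.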

2.2 Step Length Selection Algorithms

1. Algorithms which implement the Wolfe or Goldstein Conditions exist and are ideal, but are not discussed herein.

2. Backtracking Algorithm

   Algorithm 1: Backtracking Algorithm
   input: reduction rate $\rho \in (0, 1)$, initial estimate $\bar{\alpha} > 0$, $c \in (0, 1)$, $\phi(\alpha)$
   $\alpha \leftarrow \bar{\alpha}$
   while $\phi(\alpha) > \phi(0) + \alpha c \phi'(0)$ do
       $\alpha \leftarrow \rho\alpha$
   end
   return $\alpha$

3. Interpolation with Expensive First Derivatives
   (a) Suppose the interpolation technique produces a sequence of guesses $\alpha_0, \alpha_1, \ldots$
   (b) Interpolation produces $\alpha_k$ by modeling $\phi$ with a polynomial $m$ (quadratic or cubic) and then minimizing $m$ to find a new estimate for $\alpha$. The parameters of the polynomial are given by requiring:
   i. $m(0) = \phi(0)$
   ii. $m'(0) = \phi'(0)$
   iii. $m(\alpha_{k-1}) = \phi(\alpha_{k-1})$
   iv. $m(\alpha_{k-2}) = \phi(\alpha_{k-2})$
   (c) When $k = 1$, we model using a quadratic polynomial (condition iv is dropped).

4. Interpolation with Inexpensive First Derivatives
   (a) Suppose interpolation produces a sequence of guesses $\alpha_0, \alpha_1, \ldots$
   (b) In this case, $m$ is always a cubic polynomial, and $\alpha_k$ is computed by requiring:
   i. $m(\alpha_{k-1}) = \phi(\alpha_{k-1})$
   ii. $m(\alpha_{k-2}) = \phi(\alpha_{k-2})$
   iii. $m'(\alpha_{k-1}) = \phi'(\alpha_{k-1})$
   iv. $m'(\alpha_{k-2}) = \phi'(\alpha_{k-2})$

5. Interpolation Algorithm

   Algorithm 2: Interpolation Algorithm
   input: feasible search region $[\bar{a}, \bar{b}]$, initial estimate $\alpha_0 > 0$, $\phi$
   $\alpha \leftarrow \alpha_0$, $\beta \leftarrow \alpha_0$
   while $\phi(\beta) > \phi(0) + c\beta\phi'(0)$ do
       Approximate $m$
       if inexpensive derivatives then
           $\alpha \leftarrow \beta$
       end
       Explicitly compute the minimizer of $m$ in the feasible region; store as $\beta$
   end
   return $\beta$
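Algorithm 1 translates almost line-for-line into code. A minimal sketch (the quadratic objective and the helper name `backtracking` are illustrative assumptions):

```python
import numpy as np

def backtracking(f, grad, x, p, alpha=1.0, rho=0.5, c=1e-4):
    """Algorithm 1: shrink alpha geometrically until the Armijo condition holds."""
    fx, slope = f(x), grad(x) @ p            # phi(0) and phi'(0)
    assert slope < 0, "p must be a descent direction"
    while f(x + alpha * p) > fx + c * alpha * slope:
        alpha *= rho
    return alpha

f    = lambda x: 0.5 * x @ x                 # f(x) = ||x||^2 / 2
grad = lambda x: x

x = np.array([2.0, -1.0])
p = -grad(x)                                 # steepest descent direction
a = backtracking(f, grad, x, p)
assert f(x + a * p) < f(x)                   # Armijo guarantees a decrease
```

Termination is exactly Lemma 2.3: the loop must exit because $\rho^n \bar{\alpha} \to 0$ enters the interval where the Armijo inequality holds.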

2.3 Global Convergence and Zoutendijk

1. Globally Convergent. Zoutendijk Condition.
   Definition 2.5. Suppose we have an algorithm $\Omega$ which produces iterates $(x_k)$, and denote $\nabla f(x_k) = \nabla f_k$.
   (a) $\Omega$ is globally convergent if $\|\nabla f_k\| \to 0$. (That is, the $x_k$ converge to a stationary point.)
   (b) Suppose $\Omega$ produces search directions $p_k$ such that $\|p_k\| = 1$, and let $\theta_k$ be the angle between $-\nabla f_k$ and $p_k$. $\Omega$ satisfies the Zoutendijk Condition if
   $$\sum_{k=1}^{\infty} \cos^2(\theta_k)\, \|\nabla f_k\|^2 < \infty$$

2. Zoutendijk Condition & Angle Bound imply global convergence
   Lemma 2.4. Suppose $\Omega$ produces a sequence $(x_k, p_k, \nabla f_k, \theta_k)$ such that there is a $\delta > 0$ with $\cos(\theta_k) \geq \delta$ for all $k \geq 1$. If $\Omega$ satisfies the Zoutendijk condition, then $\Omega$ is globally convergent.
   Proof. By the Zoutendijk condition:
   $$\delta^2 \sum_{k=1}^{\infty} \|\nabla f_k\|^2 \leq \sum_{k=1}^{\infty} \cos^2(\theta_k)\|\nabla f_k\|^2 < \infty$$
   Therefore, $\|\nabla f_k\|^2 \to 0$.

3. Example of an Angle Bound
   Example 2.1. Suppose $p_k = -B_k^{-1}\nabla f_k$, where the $B_k$ are positive definite and $\|B_k\|\,\|B_k^{-1}\| < M < \infty$ for all $k$. Working in the 2-norm, let $B = X\Lambda X^T$ be the EVD of $B_k$ (dropping subscripts, with eigenvalues $\lambda_1 \leq \cdots \leq \lambda_n$) and $z = X^T \nabla f$:
   $$\cos(\theta_k) = \frac{\nabla f^T B^{-1}\nabla f}{\|\nabla f\|\,\|B^{-1}\nabla f\|} = \frac{z^T\Lambda^{-1}z}{\|z\|\,\|\Lambda^{-1}z\|} = \frac{\sum_{i=1}^n z_i^2/\lambda_i}{\sqrt{\sum_{i=1}^n z_i^2}\,\sqrt{\sum_{i=1}^n z_i^2/\lambda_i^2}} \geq \frac{\|z\|^2/\lambda_n}{\|z\|^2/\lambda_1} = \frac{\lambda_1}{\lambda_n} \geq \frac{1}{M}$$
   Hence, if $\Omega$ satisfies the Zoutendijk Condition, we see that $\lim_k \|\nabla f_k\| = 0$.
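Example 2.1's bound $\cos(\theta_k) \geq 1/M$ can be verified numerically. A sketch (the SPD matrix `B` and gradient vector `g` are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
Q = np.linalg.qr(rng.standard_normal((4, 4)))[0]     # random orthogonal matrix
B = Q @ np.diag([1.0, 2.0, 5.0, 10.0]) @ Q.T         # SPD, condition number 10
g = rng.standard_normal(4)                            # stands in for grad f_k

p = -np.linalg.solve(B, g)                            # p_k = -B_k^{-1} grad f_k
cos_theta = -(g @ p) / (np.linalg.norm(g) * np.linalg.norm(p))
M = np.linalg.cond(B)                                 # ||B|| * ||B^{-1}|| in 2-norm
assert cos_theta >= 1.0 / M - 1e-12                   # the angle bound of Example 2.1
```

The bound degrades as the condition number grows, which is why ill-conditioned model Hessians slow steepest-descent-like behavior even when convergence is guaranteed.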

4. Wolfe Condition Line Search satisfies the Zoutendijk Condition
   Theorem 2.1. Suppose we have an objective $f$ satisfying:
   (a) $f$ is bounded from below in $\mathbb{R}^n$
   (b) given an initial $x_0$, there is an open set $N$ containing $L = \{x : f(x) \leq f(x_0)\}$
   (c) $f$ is continuously differentiable on $N$
   (d) $\nabla f$ is Lipschitz continuous on $N$
   Suppose we have an algorithm $\Omega$ producing $(x_k, p_k, \nabla f_k, \theta_k, \alpha_k)$ such that:
   (a) $p_k$ is a descent direction (with $\|p_k\| = 1$)
   (b) $\alpha_k$ satisfies the Wolfe Conditions
   Then $\Omega$ satisfies the Zoutendijk Condition.
   Proof. The general strategy is to lower bound $\alpha_k$ uniformly by $-C\nabla f_k^T p_k$, where $C > 0$. Then, using the Armijo condition:
   $$f_{k+1} \leq f_k + c_1\alpha_k\nabla f_k^T p_k \leq f_k - c_1 C\left(\nabla f_k^T p_k\right)^2 = f_k - c_1 C\cos^2(\theta_k)\|\nabla f_k\|^2$$
   so that
   $$f_{k+1} \leq f_0 - c_1 C\sum_{j=1}^{k}\cos^2(\theta_j)\|\nabla f_j\|^2$$
   And we can conclude that, since $f$ is bounded from below by some $l$:
   $$c_1 C\sum_{j=1}^{\infty}\cos^2(\theta_j)\|\nabla f_j\|^2 \leq f_0 - l < \infty$$
   The result follows. So now we set out to show that $\alpha_k \geq -C\nabla f_k^T p_k$, leveraging the Curvature Condition.
   (a) From the curvature condition, we have that:
   $$(\nabla f_{k+1} - \nabla f_k)^T p_k \geq (c_2 - 1)\nabla f_k^T p_k$$
   (b) From Lipschitz continuity, with constant $L$:
   $$(\nabla f_{k+1} - \nabla f_k)^T p_k \leq \|\nabla f_{k+1} - \nabla f_k\|\,\|p_k\| \leq L\alpha_k\|p_k\|^2 = L\alpha_k$$
   Together these imply $\alpha_k \geq \frac{1 - c_2}{L}\left(-\nabla f_k^T p_k\right)$.

5. Goldstein Condition Line Search satisfies the Zoutendijk Condition
   Theorem 2.2. Suppose $f$ has the same properties as it does in Theorem 2.1, and suppose $\Omega$ produces $(x_k, p_k, \nabla f_k, \theta_k, \alpha_k)$ such that:
   (a) $p_k$ is a descent direction with $\|p_k\| = 1$
   (b) $\alpha_k$ satisfies the Goldstein Conditions
   Then $\Omega$ satisfies the Zoutendijk Condition.

Proof. We use Taylor's Theorem and then Lipschitz continuity to find the lower bound on $\alpha_k$. By Taylor's Theorem, for some $t_k \in (0, 1)$:
$$(1 - c)\alpha_k\nabla f_k^T p_k \leq f_{k+1} - f_k = \nabla f(x_k + t_k\alpha_k p_k)^T \alpha_k p_k$$
Therefore,
$$-c\,\alpha_k\nabla f_k^T p_k \leq \alpha_k\left(\nabla f(x_k + t_k\alpha_k p_k) - \nabla f_k\right)^T p_k \leq L t_k\alpha_k^2\|p_k\|^2 \leq L\alpha_k^2$$
The result follows.

6. Backtracking Line Search satisfies the Zoutendijk Condition
   Theorem 2.3. Suppose $f$ has the same properties as it does in Theorem 2.1, and suppose $\Omega$ produces $(x_k, p_k, \nabla f_k, \theta_k, \alpha_k)$ such that:
   (a) $p_k$ is a descent direction with $\|p_k\| = 1$
   (b) $\alpha_k$ is selected by backtracking with $\bar{\alpha} = 1$
   Then $\Omega$ satisfies the Zoutendijk condition.
   Proof. Suppose $\Omega$ uses $\rho \in (0, 1)$. There are two cases to consider: $\alpha_k = 1$ and $\alpha_k < 1$. We denote the first subsequence by $(j)$ and the second by $(l)$. For $\alpha_{(l)}$ we know at least that $\alpha_{(l)}/\rho$ was rejected, that is:
   $$f(x_{(l)} + (\alpha_{(l)}/\rho)p_{(l)}) > f_{(l)} + c(\alpha_{(l)}/\rho)\nabla f_{(l)}^T p_{(l)}$$
   Using the same strategy as in the Goldstein case, there is a $t$ such that:
   $$\nabla f(x_{(l)} + t(\alpha_{(l)}/\rho)p_{(l)})^T(\alpha_{(l)}/\rho)p_{(l)} > c(\alpha_{(l)}/\rho)\nabla f_{(l)}^T p_{(l)}$$
   Therefore, by Lipschitz continuity:
   $$-(1 - c)\nabla f_{(l)}^T p_{(l)} \leq Lt\alpha_{(l)}/\rho \leq L\alpha_{(l)}/\rho$$
   so $\alpha_{(l)}$ is bounded below by a multiple of $-\nabla f_{(l)}^T p_{(l)}$, and the argument of Theorem 2.1 applies along this subsequence. We now consider $\alpha_{(j)} = 1$, since the Armijo condition gives:
   $$f_{(j)+1} \leq f_{(j)} + c\nabla f_{(j)}^T p_{(j)} = f_{(j)} - c\cos(\theta_{(j)})\|\nabla f_{(j)}\|$$
   Therefore, summing along this subsequence and using that $f$ is bounded from below:
   $$c\sum_j \cos(\theta_{(j)})\|\nabla f_{(j)}\| < \infty$$
   Then there is a $J$ such that for $j \geq J$:
   $$\cos^2(\theta_{(j)})\|\nabla f_{(j)}\|^2 \leq \cos(\theta_{(j)})\|\nabla f_{(j)}\| < 1$$
   Therefore, the Zoutendijk condition holds.

3 Trust Region Methods

3.1 Trust Region Subproblem and Fidelity

1. The strategy of the trust region method is as follows: at any point $x$ we have a model $m_x(p)$ of $f(x + p)$ which we trust to be a good approximation in a region $\|p\| \leq \Delta$. Minimizing this, we have a new iterate at which to continue applying the approach.

2. Trust Region Subproblem
   Definition 3.1. Let $x \in \mathbb{R}^n$ and let $f : \mathbb{R}^n \to \mathbb{R}$ be an objective function with model $m_x(p)$ in a region $\|p\| \leq \Delta$. The Trust Region Subproblem is to find $\arg\min\{m_x(p) : \|p\| \leq \Delta\}$.
   Note 3.1. Usually $m_x$ is taken to be $m_x(p) = f(x) + \nabla f_x^T p + \frac{1}{2}p^T B p$, where $B$ is the model Hessian and is not necessarily positive definite.

3. Fidelity
   Definition 3.2. The fidelity of $m_x$ in a region $\|p\| \leq \Delta$ at a point $x + p$ is
   $$\rho(m_x, f, p, \Delta) = \frac{f(x + p) - f(x)}{m_x(p) - m_x(0)}$$

4. Fidelity and Trust Region. Let $0 < c_1 < c_2 < 1$.
   (a) If $\rho \geq c_2$, especially if $p$ is intended for descent, the model is a good approximation to $f$ and we can expand the trust region.
   (b) If $\rho \leq c_1$, especially if $p$ is intended for descent, the model is a poor approximation to $f$ and we should shrink the trust region.
   (c) Otherwise, the model is a sufficient approximation to $f$ and the trust region should not be adjusted.

5. Fidelity and Iterates. Suppose $p$ causes a descent in $m_x$, and let $\eta \in (0, 1/4]$.
   (a) If $\rho < \eta$ then we do not have a sufficient decrease in the function, and the process should be repeated from $x$.
   (b) If $\rho \geq \eta$ then we have a sufficient decrease in $f$, and we should continue the process from $x + p$.

6. Notation
   (a) Let $x_0, x_1, \ldots$ be the iterates produced by successive solutions to the trust region subproblem for some objective $f$.
   (b) $m_{x_k}(p) =: m_k(p)$
   (c) $\Delta$ for a particular $m_k$ is denoted $\Delta_k$
   (d) Accepted solutions to the subproblem at $x_k$ are $p_k$, so that $x_{k+1} = x_k + p_k$.

3.2 Fidelity Algorithms

1. Fidelity and Trust Region

   Algorithm 3: Trust Region Management Algorithm
   input: thresholds $0 < c_2 < c_1 < 1$, trust region $\Delta$, maximum radius $\Delta_{\max}$, fidelity $\rho_x$
   if $\rho_x > c_1$ then
       $\Delta \leftarrow \min(2\Delta, \Delta_{\max})$
   else if $\rho_x < c_2$ then
       $\Delta \leftarrow \Delta/2$
   else
       leave $\Delta$ unchanged
   end
   return $\Delta$

   Note 3.2. Usually $c_1 = 3/4$ and $c_2 = 1/4$ (consistent with the ordering $c_2 < c_1$ above).

2. Fidelity and Solution Acceptance

   Algorithm 4: Solution Acceptance Algorithm
   input: threshold $\eta \in (0, 1/4]$, $\rho$, $x$, $p$
   if $\rho_x(p) \geq \eta$ then
       $x \leftarrow x + p$
   else
       keep $x$
   end
   return $x$

3.3 Approximate Solutions to the Subproblem

3.3.1 Cauchy Point

1. Cauchy Point
   Definition 3.3. Let $m_x$ be a model of $f$ within $\|p\| \leq \Delta$. Let $p^S$ be the direction of steepest descent of $m_x$ at $p = 0$ with $\|p^S\| = 1$, and let
   $$\tau = \arg\min\{m_x(\tau p^S) : 0 \leq \tau \leq \Delta\}$$
   Then $p^C = \tau p^S$ is the Cauchy Point of $m_x$.
   Note 3.3. The Cauchy point is the line search minimizer of $m_x$ along the direction of steepest descent, subject to $\|p\| \leq \Delta$.

2. Computing the Cauchy Point
   Lemma 3.1. Let $m_x$ be a quadratic model of $f$ within $\|p\| \leq \Delta$,
   $$m_x(p) = f(x) + g^T p + \tfrac{1}{2}p^T B p$$
   Then its Cauchy Point is:
   $$p^C = \begin{cases} -\min\left(\Delta, \dfrac{\|g\|^3}{g^T B g}\right)\dfrac{g}{\|g\|} & g^T B g > 0 \\[2mm] -\Delta\,\dfrac{g}{\|g\|} & g^T B g \leq 0 \end{cases}$$

Proof. First, we compute the direction of steepest descent: $p^S = -g/\|g\|$. Then we have that
$$m_x(tp^S) = f(x) - t\|g\| + \frac{t^2}{2\|g\|^2}g^T B g$$
If the quadratic term is non-positive, then $t = \Delta$ is the minimizer, and so one option is:
$$p^C = -\Delta\frac{g}{\|g\|}$$
If the quadratic term is positive, then:
$$\frac{d}{dt}m_x(tp^S) = -\|g\| + t\,\frac{g^T B g}{\|g\|^2}$$
Setting this to 0:
$$t = \frac{\|g\|^3}{g^T B g}$$
This value could potentially be larger than $\Delta$, so to safeguard against this, in this case:
$$\tau = \min\left(\Delta, \frac{\|g\|^3}{g^T B g}\right)$$

3. Using the Cauchy point alone does not produce the best rate of convergence (it is essentially steepest descent restricted to the trust region).

3.3.2 Dogleg Method

1. Requirement: $B \succ 0$.

2. Intuitively, the dogleg method moves toward the minimizer of $m_x$, which is $p^{\min} = -B^{-1}g$, until it reaches this point or hits the boundary of the trust region. It does this by first moving to the Cauchy point, and then along a linear path toward $p^{\min}$.

3. Dogleg Path
   Definition 3.4. Let $m_x$ be the quadratic model with $B \succ 0$ of $f$ with radius $\Delta$. Let $p^C$ be the Cauchy Point and $p^{\min}$ the unconstrained minimizer of $m_x$. The Dogleg Path is $p^{DL}(t)$ where:
   $$p^{DL}(t) = \begin{cases} t\,p^C & t \in [0, 1] \\ p^C + (t - 1)(p^{\min} - p^C) & t \in [1, 2] \end{cases}$$

4. Dogleg Method
   Definition 3.5. The Dogleg Method chooses $x_{k+1} = x_k + p^{DL}(\tau)$, where $\tau$ solves $\|p^{DL}(\tau)\| = \Delta$ (taking $\tau = 2$ if the whole path lies inside the region).
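Lemma 3.1's closed form is short to implement. A minimal sketch (the model data `g`, `B`, and `delta` are illustrative assumptions):

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Cauchy point of m(p) = f + g^T p + p^T B p / 2 over ||p|| <= delta (Lemma 3.1)."""
    gBg = g @ B @ g
    gnorm = np.linalg.norm(g)
    if gBg <= 0:
        tau = 1.0                                     # model unbounded along -g: go to boundary
    else:
        tau = min(1.0, gnorm ** 3 / (delta * gBg))    # interior minimizer, capped at boundary
    return -tau * (delta / gnorm) * g

g = np.array([1.0, 2.0])
B = np.diag([1.0, 3.0])
p = cauchy_point(g, B, delta=0.1)

m = lambda p: g @ p + 0.5 * p @ B @ p                 # model decrease relative to m(0) = 0
assert np.linalg.norm(p) <= 0.1 + 1e-12               # feasible
assert m(p) < 0                                       # strict model decrease
```

Here `tau` is expressed as a fraction of the boundary step `-delta * g / ||g||`, which is an equivalent rewriting of the two cases in the lemma.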

5. Properties of the Dogleg Path
   Lemma 3.2. Suppose $p^{DL}(t)$ is the Dogleg path of $m_x$ with $\Delta = \infty$ and $B \succ 0$. Then:
   (a) $m_x(p^{DL}(t))$ is decreasing in $t$
   (b) $\|p^{DL}(t)\|$ is increasing in $t$
   Proof. When $0 < t < 1$ the case is uninteresting, since for these values we know $m_x(p^{DL}(t))$ is decreasing and $\|p^{DL}(t)\|$ is increasing. So we consider $1 < t < 2$ and reparameterize using $z = t - 1$. We have that:
   $$\frac{d}{dz}m_x(p^{DL}(1 + z)) = (p^{\min} - p^C)^T(g + Bp^C) + z(p^{\min} - p^C)^T B(p^{\min} - p^C) = -(1 - z)(p^{\min} - p^C)^T B(p^{\min} - p^C)$$
   Letting $c = (p^{\min} - p^C)^T B(p^{\min} - p^C) > 0$, since $B \succ 0$, we have that the derivative is strictly negative for $z < 1$. To show that the norm is increasing, first we show that the angle between $p^C$ and $p^{\min}$ is in $(-\pi/2, \pi/2)$, then we show that $p^{\min}$ is the longer of the two.
   (a) For the angle, with $g \neq 0$:
   $$\langle p^{\min}, p^C\rangle = \frac{g^T B^{-1}g\,\|g\|^2}{g^T B g} > 0$$
   (b) Using Cauchy-Schwarz twice, $\left(g^T g\right)^2 = \left(g^T B^{1/2}B^{-1/2}g\right)^2 \leq \left(g^T B g\right)\left(g^T B^{-1}g\right)$ and $\|B^{-1}g\|\,\|g\| \geq g^T B^{-1}g$, so:
   $$\|p^{\min}\| = \|B^{-1}g\| \geq \frac{g^T B^{-1}g}{\|g\|} \geq \frac{\|g\|^3}{g^T B g} = \|p^C\|$$

3.3.3 Global Convergence of Cauchy Point Methods

1. Reduction obtained by the Cauchy Point
   Lemma 3.3. The Cauchy point $p_k^C$ satisfies:
   $$m_k(p_k^C) - m_k(0) \leq -\frac{1}{2}\|g_k\|\min\left(\Delta_k, \frac{\|g_k\|}{\|B_k\|}\right)$$
   Proof. We have that:
   $$m_k(p_k^C) - m_k(0) = g_k^T p_k^C + \tfrac{1}{2}(p_k^C)^T B_k\, p_k^C$$

This brings in cases. When $g^T B g \leq 0$ (dropping subscripts), $\tau = \Delta$ and:
$$m(p^C) - m(0) = -\Delta\|g\| + \frac{\Delta^2}{2\|g\|^2}g^T B g \leq -\Delta\|g\| \leq -\frac{1}{2}\|g\|\min\left(\Delta, \frac{\|g\|}{\|B\|}\right)$$
When $g^T B g > 0$ and $\Delta\, g^T B g \leq \|g\|^3$, we are again at the boundary ($\tau = \Delta$):
$$m(p^C) - m(0) = -\Delta\|g\| + \frac{\Delta^2}{2\|g\|^2}g^T B g \leq -\Delta\|g\| + \frac{\Delta}{2}\|g\| = -\frac{\Delta}{2}\|g\| \leq -\frac{1}{2}\|g\|\min\left(\Delta, \frac{\|g\|}{\|B\|}\right)$$
When $\Delta\, g^T B g > \|g\|^3$ (the interior case):
$$m(p^C) - m(0) = -\frac{\|g\|^4}{g^T B g} + \frac{\|g\|^4}{2\,(g^T B g)^2}\,g^T B g = -\frac{\|g\|^4}{2\,g^T B g} \leq -\frac{1}{2}\frac{\|g\|^2}{\|B\|} \leq -\frac{1}{2}\|g\|\min\left(\Delta, \frac{\|g\|}{\|B\|}\right)$$

2. Reduction achieved by the Dogleg Method
   Corollary 3.1. The dogleg step $p_k^{DL}$ satisfies:
   $$m_k(p_k^{DL}) - m_k(0) \leq -\frac{1}{2}\|g_k\|\min\left(\Delta_k, \frac{\|g_k\|}{\|B_k\|}\right)$$
   Proof. Since $m_k(p_k^{DL}) \leq m_k(p_k^C)$, the result follows.

3. Reduction achieved by any method based on the Cauchy Point
   Corollary 3.2. Suppose a method uses steps $p_k$ where $m_k(p_k) - m_k(0) \leq c\left(m_k(p_k^C) - m_k(0)\right)$ for some $c \in (0, 1)$. Then $p_k$ satisfies:
   $$m_k(p_k) - m_k(0) \leq -\frac{c}{2}\|g_k\|\min\left(\Delta_k, \frac{\|g_k\|}{\|B_k\|}\right)$$
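The dogleg step of Definitions 3.4-3.5 (requiring $B \succ 0$) can be sketched as follows: take the full step if it fits, stop on the first leg if the Cauchy point is already outside, and otherwise solve a scalar quadratic for the boundary crossing on the second leg. The data below are illustrative assumptions:

```python
import numpy as np

def dogleg(g, B, delta):
    """Dogleg step for m(p) = g^T p + p^T B p / 2 with B > 0 and ||p|| <= delta."""
    p_min = -np.linalg.solve(B, g)                # unconstrained minimizer
    if np.linalg.norm(p_min) <= delta:
        return p_min                              # whole path inside the region
    gBg = g @ B @ g
    p_c = -(g @ g / gBg) * g                      # unconstrained Cauchy point (B > 0)
    if np.linalg.norm(p_c) >= delta:
        return -(delta / np.linalg.norm(g)) * g   # boundary hit on the first leg
    d = p_min - p_c                               # second leg: ||p_c + z d|| = delta, z in [0, 1]
    a, b, c = d @ d, 2 * p_c @ d, p_c @ p_c - delta ** 2
    z = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return p_c + z * d

g = np.array([1.0, 2.0])
B = np.diag([1.0, 3.0])
assert np.allclose(dogleg(g, B, 10.0), -np.linalg.solve(B, g))  # interior case
assert abs(np.linalg.norm(dogleg(g, B, 0.5)) - 0.5) < 1e-12     # first-leg boundary
assert abs(np.linalg.norm(dogleg(g, B, 1.0)) - 1.0) < 1e-12     # second-leg boundary
```

Lemma 3.2 is what justifies the boundary-crossing logic: since $\|p^{DL}(t)\|$ is increasing along the path, the equation $\|p_c + z\,d\| = \Delta$ has exactly one root $z \in [0, 1]$ when the first leg lies inside the region.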

4. Convergence when $\eta = 0$
   Theorem 3.1. Let $x_0 \in \mathbb{R}^n$ and let $f$ be an objective function. Let $S = \{x : f(x) \leq f(x_0)\}$ and let $S(R_0) = \{x : \|x - y\| < R_0,\ y \in S\}$ be a neighborhood of radius $R_0$ about $S$. Suppose the following conditions:
   (a) $f$ is bounded from below on $S$
   (b) $f$ is Lipschitz continuously differentiable on $S(R_0)$ for some $R_0 > 0$ and some constant $L$
   (c) there is a $c$ such that for all $k$:
   $$m_k(p_k) - m_k(0) \leq -c\|g_k\|\min\left(\Delta_k, \frac{\|g_k\|}{\|B_k\|}\right)$$
   (d) there is a $\gamma \geq 1$ such that for all $k$: $\|p_k\| \leq \gamma\Delta_k$
   (e) $\eta = 0$ (fidelity threshold)
   (f) $\|B_k\| \leq \beta$ for all $k$
   Then: $\liminf_k \|g_k\| = 0$.

5. Convergence when $\eta \in (0, 1/4)$
   Theorem 3.2. If $\eta \in (0, 1/4)$, with all other conditions from Theorem 3.1 holding, then $\lim_k \|g_k\| = 0$.

3.4 Iterative Solutions to the Subproblem

3.4.1 Exact Solution to the Subproblem

1. Conditions for Minimizing Quadratics
   Lemma 3.4. Let $m$ be the quadratic function $m(p) = g^T p + \frac{1}{2}p^T B p$, where $B$ is a symmetric matrix.
   (a) $m$ attains a minimum if and only if $B \succeq 0$ and $-g \in \operatorname{Im}(B)$. If $B \succeq 0$, then every $p$ satisfying $Bp = -g$ is a global minimizer of $m$.
   (b) $m$ has a unique minimizer if and only if $B \succ 0$.
   Proof. If $p^*$ is a minimizer of $m$, we have from the second order conditions that $0 = \nabla m(p^*) = Bp^* + g$ and $B = \nabla^2 m(p^*) \succeq 0$. Now suppose that $B \succeq 0$ and $-g \in \operatorname{Im}(B)$; then there is a $p$ such that $Bp = -g$. Let $w \in \mathbb{R}^n$. Then:
   $$m(p + w) = g^T p + g^T w + \tfrac{1}{2}\left(w^T B w + 2p^T B w + p^T B p\right) = m(p) + g^T w - g^T w + \tfrac{1}{2}w^T B w \geq m(p)$$

where the inequality follows since $B \succeq 0$. This proves the first part. Now suppose $m$ has a unique minimizer $p$, and suppose there is a $w \neq 0$ such that $w^T B w = 0$ (so $Bw = 0$, since $B \succeq 0$). Since $p$ is a minimizer, we have that $Bp = -g$, and so from the computation above $m(p + w) = m(p)$; thus $p + w$ and $p$ are both minimizers of $m$, which is a contradiction. For the other direction, since $B \succ 0$, letting $p = -B^{-1}g$, by the Second Order Sufficient Conditions $m$ has a unique minimizer.

2. Characterization of the Exact Solution
   Theorem 3.3. The vector $p^*$ is a solution to the trust region problem
   $$\min_{p \in \mathbb{R}^n}\ f(x) + g^T p + \tfrac{1}{2}p^T B p\ :\ \|p\| \leq \Delta$$
   if and only if $p^*$ is feasible and there is a $\lambda \geq 0$ such that:
   (a) $(B + \lambda I)p^* = -g$
   (b) $\lambda(\Delta - \|p^*\|) = 0$
   (c) $B + \lambda I \succeq 0$
   Proof. First we prove the ($\Leftarrow$) direction. So suppose $\lambda \geq 0$ satisfies the properties. Consider:
   $$\hat{m}(p) = f(x) + g^T p + \tfrac{1}{2}p^T(B + \lambda I)p = m(p) + \tfrac{\lambda}{2}\|p\|^2$$
   By (a), (c), and the previous lemma, $p^*$ minimizes $\hat{m}$, so $\hat{m}(p) \geq \hat{m}(p^*)$, which implies:
   $$m(p) \geq m(p^*) + \tfrac{\lambda}{2}\left(\|p^*\|^2 - \|p\|^2\right)$$
   So if $\lambda = 0$ then $m(p) \geq m(p^*)$ for all $p$ and $p^*$ is a minimizer; or, if $\lambda > 0$, then $\|p^*\| = \Delta$ by (b), and so for a feasible $p$, $\|p\| \leq \|p^*\|$, which implies $m(p) \geq m(p^*)$.
   Now suppose $p^*$ is the constrained minimizer of $m$. If $\|p^*\| < \Delta$, then $p^*$ is an unconstrained minimizer of $m$, so take $\lambda = 0$: then $\nabla m(p^*) = Bp^* + g = 0$ and $\nabla^2 m(p^*) = B \succeq 0$ by the previous lemma. Now suppose $\|p^*\| = \Delta$. Then the second condition is automatically satisfied. We use the Lagrangian to determine the solution:
   $$\mathcal{L}(\lambda, p) = g^T p + \tfrac{1}{2}p^T B p + \tfrac{\lambda}{2}\left(p^T p - \Delta^2\right)$$
   Differentiating with respect to $p$ and setting this equal to 0, we see that $(B + \lambda I)p^* + g = 0$. Now, to show that $B + \lambda I$ is positive semi-definite, note that for any $p$ with $\|p\| = \Delta$, since $p^*$ is the minimizer of $m$ on the boundary:
   $$m(p) + \tfrac{\lambda}{2}\|p\|^2 \geq m(p^*) + \tfrac{\lambda}{2}\|p^*\|^2$$
   Rearranging (using $(B + \lambda I)p^* = -g$), we have that:
   $$(p - p^*)^T(B + \lambda I)(p - p^*) \geq 0$$
   And since the set of all normalized $\pm(p - p^*)$ with $\|p\| = \Delta$ is dense on the unit sphere, $B + \lambda I$ is positive semi-definite. The last thing to show is that $\lambda \geq 0$. If $\lambda < 0$, then the inequality $m(p) \geq m(p^*) + \frac{\lambda}{2}(\|p^*\|^2 - \|p\|^2)$ makes $p^*$ an unconstrained global minimizer of $m$, so by the previous lemma $Bp^* = -g$; combined with $(B + \lambda I)p^* = -g$ this forces $\lambda p^* = 0$, contradicting $\lambda \neq 0$ and $\|p^*\| = \Delta > 0$.

3.4.2 Newton's Root Finding Iterative Solution

1. Theorem 3.3 guarantees that a solution exists, and we need only find a $\lambda$ which satisfies its conditions. Two cases exist from this theorem:
   Case 1: $\lambda = 0$. If $\lambda = 0$ then $B \succeq 0$ and we can compute a solution $p$ of $Bp = -g$ using a QR decomposition.
   Case 2: $\lambda > 0$. First, letting $Q\Lambda Q^T$ be the EVD of $B$ with smallest eigenvalue $\lambda_1$, define:
   $$p(\lambda) = -Q(\Lambda + \lambda I)^{-1}Q^T g$$
   Hence:
   $$\|p(\lambda)\|^2 = g^T Q(\Lambda + \lambda I)^{-2}Q^T g = \sum_{j=1}^n \frac{(q_j^T g)^2}{(\lambda_j + \lambda)^2}$$
   From Theorem 3.3, we want to find $\lambda > 0$ for which:
   $$\Delta^2 = \sum_{j=1}^n \frac{(q_j^T g)^2}{(\lambda_j + \lambda)^2}$$

2. The second case has two sub-cases which must be considered. Let $Q_1$ be the columns of $Q$ which correspond to the eigenspace of $\lambda_1$.
   Case 2a: If $Q_1^T g \neq 0$, then we implement Newton's root finding algorithm on:
   $$f(\lambda) = \frac{1}{\Delta} - \frac{1}{\|p(\lambda)\|}$$
   since as $\lambda \downarrow -\lambda_1$, $\|p(\lambda)\| \to \infty$, and so a solution exists.
   Case 2b: If $Q_1^T g = 0$, then applying Newton's root finding algorithm naively is not useful, since $\|p(\lambda)\| \not\to \infty$ as $\lambda \downarrow -\lambda_1$. We note that when $\lambda = -\lambda_1$, $B + \lambda I \succeq 0$, and so we can find a $\tau$ such that:
   $$\Delta^2 = \sum_{j : \lambda_j \neq \lambda_1}\frac{(q_j^T g)^2}{(\lambda_j - \lambda_1)^2} + \tau^2$$
   Computation: there is a $z \neq 0$ such that $(B - \lambda_1 I)z = 0$ and $\|z\| = 1$. Therefore, $q_j^T z = 0$ for all $j$ with $\lambda_j \neq \lambda_1$. Setting
   $$p(-\lambda_1, \tau) = -\sum_{j : \lambda_j \neq \lambda_1}\frac{q_j^T g}{\lambda_j - \lambda_1}\,q_j + \tau z$$
   we need only find $\tau$.

3.4.3 Trust Region Subproblem Algorithms
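Case 2a is typically solved by Newton's iteration on $f(\lambda) = 1/\Delta - 1/\|p(\lambda)\|$, which is nearly linear in $\lambda$; in practice $p(\lambda)$ is computed by a Cholesky solve rather than a full EVD. A sketch under those assumptions (the update formula uses the standard Cholesky-based derivative, and `B`, `g`, `delta` are illustrative data):

```python
import numpy as np

def tr_lambda(B, g, delta, iters=100):
    """Newton's iteration for lambda with (B + lambda I) p = -g and ||p|| = delta."""
    n = B.shape[0]
    lam_lo = max(0.0, -np.linalg.eigvalsh(B)[0])      # need B + lam I > 0
    lam = lam_lo + 1.0
    for _ in range(iters):
        L = np.linalg.cholesky(B + lam * np.eye(n))
        p = np.linalg.solve(L.T, np.linalg.solve(L, -g))
        q = np.linalg.solve(L, p)                     # gives ||p||^2 derivative info
        pn = np.linalg.norm(p)
        lam += (pn / np.linalg.norm(q)) ** 2 * (pn - delta) / delta
        lam = max(lam, lam_lo + 1e-10)                # safeguard: stay in the valid interval
    return lam

B = np.diag([-2.0, 1.0])                              # indefinite model Hessian
g = np.array([1.0, 1.0])
delta = 0.5
lam = tr_lambda(B, g, delta)
p = np.linalg.solve(B + lam * np.eye(2), -g)
assert abs(np.linalg.norm(p) - delta) < 1e-6          # ||p*|| = delta (condition (b))
assert np.all(np.linalg.eigvalsh(B + lam * np.eye(2)) >= 0)  # condition (c)
```

Newton's method on $1/\Delta - 1/\|p(\lambda)\|$, rather than on $\Delta^2 - \|p(\lambda)\|^2$ directly, is preferred precisely because the former is close to linear, so a handful of iterations suffices.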

4 Conjugate Gradients

1. Conjugate Gradients Overview

   Algorithm 5: Overview of Conjugate Gradient Algorithm
   input: starting point $x_0$, tolerance $\epsilon$, objective $f$
   $x \leftarrow x_0$
   Compute gradient: $r \leftarrow \nabla f(x)$
   Compute conjugated descent direction: $p \leftarrow -r$
   while $\|r\| > \epsilon$ do
       Compute optimal step length: $\alpha$
       Compute step: $x \leftarrow x + \alpha p$
       Compute gradient: $r \leftarrow \nabla f(x)$
       Compute conjugated step direction: $p$
   end
   return minimizer $x$

2. Conjugate Gradients for minimizing a convex ($A \succ 0$) quadratic function $f(x) = \frac{1}{2}x^T A x - b^T x$

   Algorithm 6: Conjugate Gradients Algorithm for Convex Quadratic
   input: starting point $x_0$, tolerance $\epsilon = 0$, objective $f$
   $x \leftarrow x_0$
   Compute gradient: $r \leftarrow \nabla f(x) = Ax - b$
   Compute conjugated descent direction: $p \leftarrow -r$
   while $\|r\| > \epsilon$ do
       Compute optimal step length: $\alpha = \dfrac{r^T r}{p^T A p}$
       Compute step: $x \leftarrow x + \alpha p$
       Store previous: $r_{\text{old}} \leftarrow r$
       Compute gradient: $r \leftarrow r + \alpha A p$
       Compute conjugated step direction: $p \leftarrow -r + \dfrac{\|r\|^2}{\|r_{\text{old}}\|^2}\,p$
   end
   return minimizer $x$

3. Fletcher-Reeves for Nonlinear Functions

   Algorithm 7: Fletcher-Reeves CG Algorithm
   input: starting point $x_0$, tolerance $\epsilon$, objective $f$
   $x \leftarrow x_0$
   Compute gradient: $r \leftarrow \nabla f(x)$
   Compute conjugated descent direction: $p \leftarrow -r$
   while $\|r\| > \epsilon$ do
       Compute optimal step length: $\alpha$ (Strong Wolfe line search)
       Compute step: $x \leftarrow x + \alpha p$
       Store previous: $r_{\text{old}} \leftarrow r$
       Compute gradient: $r \leftarrow \nabla f(x)$
       Compute conjugated step direction: $p \leftarrow -r + \dfrac{\|r\|^2}{\|r_{\text{old}}\|^2}\,p$
   end
   return minimizer $x$
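Algorithm 6 in code; a minimal sketch (the SPD system `A`, `b` is an illustrative assumption). Note that the step is taken along the conjugate direction $p$, not the gradient $r$:

```python
import numpy as np

def cg_quadratic(A, b, x0, eps=1e-10):
    """Algorithm 6: minimize f(x) = x^T A x / 2 - b^T x for symmetric positive definite A."""
    x = x0.copy()
    r = A @ x - b                                     # gradient of f
    p = -r                                            # initial conjugate direction
    while np.linalg.norm(r) > eps:
        alpha = (r @ r) / (p @ A @ p)                 # exact line minimizer along p
        x = x + alpha * p
        r_old = r
        r = r + alpha * (A @ p)                       # gradient update without re-evaluation
        p = -r + ((r @ r) / (r_old @ r_old)) * p      # Fletcher-Reeves beta
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = cg_quadratic(A, b, np.zeros(2))
assert np.allclose(A @ x, b)                          # minimizer solves Ax = b
```

For an $n \times n$ SPD system, exact arithmetic terminates in at most $n$ iterations; the example above converges in two.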

Part II Model Hessian Selection

5 Newton's Method

1. Requires $\nabla^2 f \succ 0$ in some region for the method to work.

2. Line Search
   (a) The search direction: $p_k^N = -\nabla^2 f_k^{-1}\nabla f_k$
   (b) Rate of Convergence
   Theorem 5.1. Suppose $f$ is an objective with minimizer $x^*$ satisfying the Second Order Conditions. Moreover:
   i. suppose $f$ is twice continuously differentiable in a neighborhood of $x^*$;
   ii. specifically, suppose $\nabla^2 f$ is Lipschitz continuous in a neighborhood of $x^*$;
   iii. suppose $p_k = p_k^N$ and $x_{k+1} = x_k + p_k^N$;
   iv. suppose $x_k \to x^*$.
   Then for $x_0$ sufficiently close to $x^*$:
   i. $x_k \to x^*$ quadratically;
   ii. $\|\nabla f_k\| \to 0$ quadratically.
   Proof. Our goal is to get $x_{k+1} - x^*$ in terms of $\nabla^2 f$ and use Lipschitz continuity to provide an upper bound.
   i. We have that (using $\nabla f(x^*) = 0$ and Equation (2) of Taylor's Theorem):
   $$x_{k+1} - x^* = x_k + p_k - x^* = x_k - x^* - \nabla^2 f_k^{-1}\nabla f_k = \nabla^2 f_k^{-1}\left[\nabla^2 f_k(x_k - x^*) - (\nabla f_k - \nabla f(x^*))\right]$$
   $$= \nabla^2 f_k^{-1}\left[\nabla^2 f_k(x_k - x^*) - \int_0^1\nabla^2 f(x^* + t(x_k - x^*))(x_k - x^*)\,dt\right] = \nabla^2 f_k^{-1}\int_0^1\left(\nabla^2 f_k - \nabla^2 f(x^* + t(x_k - x^*))\right)(x_k - x^*)\,dt$$

   ii. Using Lipschitz continuity (with constant $L$):
   $$\|x_{k+1} - x^*\| \leq \|\nabla^2 f_k^{-1}\|\int_0^1\left\|\nabla^2 f_k - \nabla^2 f(x^* + t(x_k - x^*))\right\|\,\|x_k - x^*\|\,dt \leq \|\nabla^2 f_k^{-1}\|\int_0^1 (1 - t)L\|x_k - x^*\|^2\,dt = \|\nabla^2 f_k^{-1}\|\frac{L}{2}\|x_k - x^*\|^2$$
   We now need to show that $\|\nabla^2 f_k^{-1}\|$ is bounded from above. Let $\eta(2r)$ be a neighborhood of radius $2r$ about $x^*$ in which $\nabla^2 f \succ 0$. In this neighborhood $\nabla^2 f(x)^{-1}$ exists and $g(x) = \|\nabla^2 f(x)^{-1}\|$ is continuous. Then, on the compact closed ball $\bar{\eta}(r)$, $g(x)$ has a finite maximum. Therefore, we have quadratic convergence of $x_k \to x^*$.
   iii. We now consider the first derivative. Since $\nabla^2 f_k\,p_k = -\nabla f_k$:
   $$\|\nabla f_{k+1}\| = \left\|\nabla f_{k+1} - \nabla f_k - \nabla^2 f_k\,p_k\right\| = \left\|\int_0^1\left(\nabla^2 f(x_k + tp_k) - \nabla^2 f_k\right)p_k\,dt\right\| \leq \int_0^1 Lt\|p_k\|^2\,dt \leq \frac{L}{2}\,g(x_k)^2\|\nabla f_k\|^2$$
   So if $x_0 \in \eta(r)$, the result follows.

3. Trust Region
   (a) Asymptotically Similar
   Definition 5.1. A sequence $p_k$ is asymptotically similar to a sequence $q_k$ if $\|p_k - q_k\| = o(\|q_k\|)$.
   (b) The Subproblem:
   $$\min_{p \in \mathbb{R}^n} f(x_k) + \nabla f_k^T p + \tfrac{1}{2}p^T\nabla^2 f_k\,p\ :\ \|p\| \leq \Delta_k$$
   (c) Rate of Convergence
   Theorem 5.2. Suppose $f$ is an objective with minimizer $x^*$ satisfying the Second Order Conditions. Moreover:
   i. suppose $f$ is twice continuously differentiable in a neighborhood of $x^*$;

   ii. specifically, suppose $\nabla^2 f$ is Lipschitz continuous in a neighborhood of $x^*$;
   iii. suppose for some $c \in (0, 1)$ the $p_k$ satisfy
   $$m_k(p_k) - m_k(0) \leq -c\|\nabla f_k\|\min\left(\Delta_k, \frac{\|\nabla f_k\|}{\|\nabla^2 f_k\|}\right)$$
   and are asymptotically similar to $p_k^N$ whenever $\|p_k^N\| \leq \Delta_k/2$;
   iv. suppose $x_{k+1} = x_k + p_k$ and $x_k \to x^*$.
   Then:
   i. $\Delta_k$ becomes inactive for sufficiently large $k$;
   ii. $x_k \to x^*$ superlinearly.
   Proof. We need to show that the $\Delta_k$ are bounded from below. The second part follows from this quickly, since $x_k \to x^*$ implies $\|p_k^N\| \to 0$ and eventually $\|p_k^N\| \leq \Delta_k/2$. So, applying the asymptotic similarity:
   $$\|x_k + p_k - x^*\| \leq \|x_k + p_k^N - x^*\| + \|p_k^N - p_k\| \leq o(\|x_k - x^*\|^2) + o(\|p_k^N\|) \leq o(\|x_k - x^*\|^2) + o(\|x_k - x^*\|) = o(\|x_k - x^*\|)$$
   where the penultimate inequality follows from:
   $$\|p_k^N\| = \|\nabla^2 f_k^{-1}\nabla f_k\| \leq \|x_k - x^*\|\,\|\nabla^2 f_k^{-1}\|\sup_{x \in \eta(r)}\|\nabla^2 f(x)\|$$
   To get that the $\Delta_k$ are bounded from below, we frame things in terms of $\rho_k$. The numerator of $\rho_k - 1$ satisfies:
   $$\left|f(x_k + p_k) - f(x_k) - \left(m_k(p_k) - m_k(0)\right)\right| = \left|\tfrac{1}{2}p_k^T\int_0^1\left(\nabla^2 f(x_k + tp_k) - \nabla^2 f_k\right)p_k\,dt\right| \leq \frac{L}{4}\|p_k\|^3$$
   And the denominator of $\rho_k - 1$ satisfies:
   $$m_k(0) - m_k(p_k) \geq c\|\nabla f_k\|\min\left(\Delta_k, \frac{\|\nabla f_k\|}{\|\nabla^2 f_k\|}\right) \geq c\|\nabla f_k\|\min\left(\|p_k\|, \frac{\|\nabla f_k\|}{\|\nabla^2 f_k\|}\right)$$
   So we need only lower bound $\|\nabla f_k\|$ by $\|p_k\|$ to translate this into a lower bound for $\Delta_k$. If $\|p_k^N\| > \Delta_k/2$, then (in some neighborhood of $x^*$):
   $$\|p_k\| \leq \Delta_k < 2\|p_k^N\| \leq 2\|\nabla^2 f_k^{-1}\|\,\|\nabla f_k\| \leq C\|\nabla f_k\|$$

   Otherwise, if $\|p_k^N\| \leq \Delta_k/2$, then:
   $$\|p_k\| \leq \|p_k^N\| + o(\|p_k^N\|) \leq 2\|p_k^N\| \leq C\|\nabla f_k\|$$
   Therefore:
   $$m_k(0) - m_k(p_k) \geq c\,\frac{\|p_k\|}{C}\min\left(\|p_k\|, \frac{\|p_k\|}{C\|\nabla^2 f_k\|}\right)$$
   Since the second term's denominator involves (up to constants) the condition number of the matrix, which must be greater than or equal to 1, this is bounded below by a constant multiple of $\|p_k\|^2$, and hence:
   $$|\rho_k - 1| \leq \tilde{C}\|p_k\|$$
   for some constant $\tilde{C}$. Hence, if $\Delta_k < \frac{3}{4\tilde{C}}$, then $\|p_k\| < \frac{3}{4\tilde{C}}$, so $\rho_k > 3/4$ and $\Delta_k$ would be doubled. It follows that $\Delta_k \geq \hat{\Delta}$ for some fixed $\hat{\Delta} > 0$. So, for sufficiently large $k$, $\|p_k^N\| \to 0$ while $\Delta_k$ stays bounded away from 0, and so $\rho_k \to 1$ and $\Delta_k$ becomes inactive.
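The quadratic convergence of Theorem 5.1 is easy to observe on a toy problem. A sketch (the scalar objective $f(x) = e^x - 2x$, with minimizer $x^* = \ln 2$ and $f''(x^*) = 2 > 0$, is an illustrative assumption):

```python
import math

# Pure Newton iteration x <- x - f'(x)/f''(x) for f(x) = exp(x) - 2x.
def newton_step(x):
    g = math.exp(x) - 2.0          # f'(x)
    h = math.exp(x)                # f''(x) > 0 everywhere, so SOC holds at x*
    return x - g / h

x, errs = 0.0, []
for _ in range(6):
    x = newton_step(x)
    errs.append(abs(x - math.log(2.0)))

# Quadratic convergence: the error is roughly squared each iteration.
assert errs[-1] < 1e-12
assert all(e2 < e1 for e1, e2 in zip(errs, errs[1:]) if e1 > 1e-14)
```

The doubling of correct digits per step is the practical signature of Q-quadratic convergence from Definition 1.2(c).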

6 Newton's Method with Hessian Modification

1. If $\nabla^2 f_k \not\succ 0$, we can add a matrix $E_k$ so that $\nabla^2 f_k + E_k =: B_k \succ 0$. As long as this model produces a descent direction, and $\|B_k\|\,\|B_k^{-1}\|$ is bounded uniformly, Zoutendijk guarantees convergence.

2. Newton's Method with Hessian Modification

   Algorithm 8: Line Search with Modified Hessian
   input: starting point $x_0$, tolerance $\epsilon$
   $k \leftarrow 0$
   while $\|\nabla f_k\| > \epsilon$ do
       Find $E_k$ so that $\nabla^2 f_k + E_k =: B_k \succ 0$
       Solve $B_k p_k = -\nabla f_k$
       Compute $\alpha_k$ (Wolfe, Goldstein, Backtracking, or Interpolation)
       $x_{k+1} \leftarrow x_k + \alpha_k p_k$
       $k \leftarrow k + 1$
   end

3. Minimum Frobenius Norm
   (a) Problem: find $E$ which satisfies $\min \|E\|_F^2 : \lambda_{\min}(A + E) \geq \delta$.
   (b) Solution: if $A = Q\Lambda Q^T$ is the EVD of $A$, then $E = Q\,\mathrm{diag}(\tau_i)\,Q^T$ where:
   $$\tau_i = \begin{cases} 0 & \lambda_i \geq \delta \\ \delta - \lambda_i & \lambda_i < \delta \end{cases}$$

4. Minimum 2-Norm
   (a) Problem: find $E$ which satisfies $\min \|E\|_2^2 : \lambda_{\min}(A + E) \geq \delta$.
   (b) Solution: if $\lambda_{\min}(A) = \lambda_n$, then $E = \tau I$ where:
   $$\tau = \begin{cases} 0 & \lambda_n \geq \delta \\ \delta - \lambda_n & \lambda_n < \delta \end{cases}$$

5. Modified Cholesky
   (a) Compute the $LDL^T$ factorization, but add constants to the diagonal to ensure positive definiteness of the factorized matrix.
   (b) May require pivoting to ensure numerical stability.

6. Modified Block Cholesky
   (a) Compute the $LBL^T$ factorization, where $B$ is block diagonal with $1 \times 1$ and $2 \times 2$ blocks (the number of $1 \times 1$ blocks is the number of positive eigenvalues, and the number of $2 \times 2$ blocks is the number of negative eigenvalues).
   (b) Compute the EVD of $B$ and modify it to ensure appropriate eigenvalues.
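The minimum Frobenius-norm and minimum 2-norm modifications above have the stated closed forms, so both are a few lines of code. A sketch (the indefinite matrix `A` and threshold `delta` are illustrative assumptions):

```python
import numpy as np

def modify_frobenius(A, delta):
    """Minimum-Frobenius-norm E with lambda_min(A + E) >= delta: clip eigenvalues up to delta."""
    lam, Q = np.linalg.eigh(A)
    tau = np.where(lam >= delta, 0.0, delta - lam)
    return Q @ np.diag(tau) @ Q.T

def modify_two_norm(A, delta):
    """Minimum-2-norm E = tau * I with lambda_min(A + E) >= delta: shift all eigenvalues."""
    lam_min = np.linalg.eigvalsh(A)[0]
    tau = 0.0 if lam_min >= delta else delta - lam_min
    return tau * np.eye(A.shape[0])

A = np.array([[1.0, 2.0], [2.0, -3.0]])          # indefinite: one negative eigenvalue
for E in (modify_frobenius(A, 1e-3), modify_two_norm(A, 1e-3)):
    assert np.linalg.eigvalsh(A + E)[0] >= 1e-3 - 1e-10   # B = A + E is positive definite
```

The Frobenius version perturbs only the offending eigenspace; the 2-norm version shifts every eigenvalue, which is cheaper to apply but changes the well-conditioned directions too.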

7 Quasi-Newton's Method

1. Fundamental Idea/Computation. Let f be our objective and suppose it is twice continuously differentiable with Lipschitz continuous Hessian. From Taylor's Theorem:

    ∇f(x + p) = ∇f(x) + ∫₀¹ ∇²f(x + tp) p dt
              = ∇f(x) + ∇²f(x) p + ∫₀¹ ( ∇²f(x + tp) − ∇²f(x) ) p dt
              = ∇f(x) + ∇²f(x) p + o(‖p‖)

Therefore, letting x = x_k and p = x_{k+1} − x_k:

    ∇²f_k (x_{k+1} − x_k) ≈ g_{k+1} − g_k

Quasi-Newton methods choose an approximation B_{k+1} to ∇²f_k which satisfies:

    B_{k+1} (x_{k+1} − x_k) = g_{k+1} − g_k

2. Secant Equation

Definition 7.1. Let x_i be a sequence of iterates when minimizing an objective function f. Let s_k = x_{k+1} − x_k and y_k = g_{k+1} − g_k, where g_i = ∇f_i. Then the secant equation is

    B_{k+1} s_k = y_k

where B_{k+1} is some matrix.

3. Curvature Condition

Definition 7.2. The curvature condition is s_kᵀ y_k > 0.

Note 7.1. If B_{k+1} satisfies the secant equation and s_k, y_k satisfy the curvature condition, then s_kᵀ B_{k+1} s_k > 0. So the quadratic model along the step direction is convex.

4. Line Search Rate of Convergence

Theorem 7.1. Suppose f is an objective with minimizer x* satisfying the Second Order Conditions. Moreover:

(a) Suppose f is twice continuously differentiable in a neighborhood of x*
(b) Suppose B_k is computed using a quasi-Newton method and B_k p_k = −g_k
(c) Suppose x_{k+1} = x_k + p_k
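The secant equation and curvature condition can be checked directly on a convex quadratic, where y_k = ∇²f · s_k holds exactly. A small sketch (the matrix and iterates are arbitrary choices for illustration):

```python
# f(x) = 1/2 x^T A x - b^T x, with A symmetric positive definite.
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]

def grad(x):
    """g(x) = A x - b for the quadratic above."""
    return [A[0][0]*x[0] + A[0][1]*x[1] - b[0],
            A[1][0]*x[0] + A[1][1]*x[1] - b[1]]

x0, x1 = [0.0, 0.0], [0.5, -0.25]
s = [x1[i] - x0[i] for i in range(2)]                  # s_k = x_{k+1} - x_k
y = [grad(x1)[i] - grad(x0)[i] for i in range(2)]      # y_k = g_{k+1} - g_k

# For a quadratic, y_k = A s_k exactly, so B_{k+1} = A satisfies the secant
# equation; positive definiteness of A gives the curvature condition.
As = [A[0][0]*s[0] + A[0][1]*s[1], A[1][0]*s[0] + A[1][1]*s[1]]
curvature = s[0]*y[0] + s[1]*y[1]                      # s_k^T y_k > 0
```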

(d) Suppose x_k → x*

When x_0 is sufficiently close to x*, x_k → x* superlinearly if and only if:

    lim_{k→∞} ‖( B_k − ∇²f(x*) ) p_k‖ / ‖p_k‖ = 0

Proof. We show that the condition is equivalent to: ‖p_k − p_k^N‖ = o(‖p_k‖). First, if the condition holds:

    p_k − p_k^N = ∇²f_k⁻¹ ( ∇²f_k p_k + ∇f_k )
                = ∇²f_k⁻¹ ( ∇²f_k − B_k ) p_k
                = ∇²f_k⁻¹ [ ( ∇²f_k − ∇²f(x*) ) p_k + ( ∇²f(x*) − B_k ) p_k ]

Taking norms and using that x_0 is sufficiently close to x*, and continuity:

    ‖p_k − p_k^N‖ = o(‖p_k‖)

For the other direction, the same identity gives:

    ‖( ∇²f(x*) − B_k ) p_k‖ ≤ ‖∇²f_k‖ ‖p_k − p_k^N‖ + ‖( ∇²f_k − ∇²f(x*) ) p_k‖ = o(‖p_k‖)

Now we just apply the triangle inequality and properties of the Newton step:

    ‖x_k + p_k − x*‖ ≤ ‖x_k + p_k^N − x*‖ + ‖p_k − p_k^N‖ = o(‖x_k − x*‖²) + o(‖p_k‖)

Note that o(‖p_k‖) = o(‖x_k − x*‖), which gives superlinear convergence.

7.1 Rank-2 Update: DFP & BFGS

1. Davidon-Fletcher-Powell Update

(a) Let:

    G_k = ∫₀¹ ∇²f(x_k + t α_k p_k) dt

So Taylor's Theorem implies: y_k = G_k s_k

(b) Let:

    ‖A‖_W = ‖G_k^{−1/2} A G_k^{−1/2}‖_F

(c) Problem. Given B_k symmetric positive definite, s_k and y_k, find

    arg min_B ‖B − B_k‖_W : B = Bᵀ, B s_k = y_k

(d) Solution:

    B_{k+1} = ( I − (y_k s_kᵀ)/(y_kᵀ s_k) ) B_k ( I − (s_k y_kᵀ)/(y_kᵀ s_k) ) + (y_k y_kᵀ)/(y_kᵀ s_k)

(e) Inverse Hessian:

    H_{k+1} = H_k − (H_k y_k y_kᵀ H_k)/(y_kᵀ H_k y_k) + (s_k s_kᵀ)/(y_kᵀ s_k)

2. Broyden-Fletcher-Goldfarb-Shanno Update

(a) Problem: Given H_k (inverse of B_k) symmetric positive definite, s_k and y_k, find

    arg min_H ‖H − H_k‖_W : H = Hᵀ, H y_k = s_k

(b) Solution:

    H_{k+1} = ( I − (s_k y_kᵀ)/(y_kᵀ s_k) ) H_k ( I − (y_k s_kᵀ)/(y_kᵀ s_k) ) + (s_k s_kᵀ)/(y_kᵀ s_k)

(c) Hessian:

    B_{k+1} = B_k − (B_k s_k s_kᵀ B_k)/(s_kᵀ B_k s_k) + (y_k y_kᵀ)/(y_kᵀ s_k)

3. Preservation of Positive Definiteness

Lemma 7.1. If B_k ≻ 0 and B_{k+1} is updated from the BFGS method, then B_{k+1} ≻ 0.

Proof. We can equivalently show that H_{k+1} ≻ 0. Let z ∈ Rⁿ \ {0} and let ρ_k = 1/(y_kᵀ s_k). Then:

    zᵀ H_{k+1} z = wᵀ H_k w + ρ_k (s_kᵀ z)²

where w = z − ρ_k (s_kᵀ z) y_k. If s_kᵀ z = 0, then w = z and zᵀ H_{k+1} z = zᵀ H_k z > 0. If s_kᵀ z ≠ 0, then ρ_k (s_kᵀ z)² > 0 (regardless of whether w = 0), and so H_{k+1} ≻ 0.

7.2 Rank-1 Update: SR1

1. Problem: find σ and v such that:

    B_{k+1} = B_k + σ v vᵀ : B_{k+1} s_k = y_k

2. Solution:

    B_{k+1} = B_k + ( (y_k − B_k s_k)(y_k − B_k s_k)ᵀ ) / ( s_kᵀ (y_k − B_k s_k) )

Proof. Multiplying by s_k:

    y_k − B_k s_k = σ (vᵀ s_k) v

Hence, v is a multiple of y_k − B_k s_k.
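A minimal sketch of the BFGS inverse update from item 2(b), in pure Python with nested lists standing in for matrices (the function name is an invention here). On a quadratic, a single update already forces the secant equation for the inverse, H_{k+1} y_k = s_k:

```python
def bfgs_inverse_update(H, s, y):
    """One BFGS update of the inverse Hessian approximation:
    H_{k+1} = (I - rho s y^T) H (I - rho y s^T) + rho s s^T, rho = 1/(y^T s)."""
    n = len(s)
    rho = 1.0 / sum(y[i] * s[i] for i in range(n))
    # V = I - rho s y^T; note that I - rho y s^T = V^T.
    V = [[(1.0 if i == j else 0.0) - rho * s[i] * y[j] for j in range(n)] for i in range(n)]
    VH = [[sum(V[i][k] * H[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    VHVt = [[sum(VH[i][k] * V[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[VHVt[i][j] + rho * s[i] * s[j] for j in range(n)] for i in range(n)]

# On a quadratic with Hessian A = [[2, 0], [0, 4]], y_k = A s_k exactly.
H0 = [[1.0, 0.0], [0.0, 1.0]]
s, y = [1.0, 1.0], [2.0, 4.0]
H1 = bfgs_inverse_update(H0, s, y)
# The secant equation for the inverse, H_{k+1} y_k = s_k, now holds:
Hy = [sum(H1[i][j] * y[j] for j in range(2)) for i in range(2)]
```

The update is symmetric by construction, and by Lemma 7.1 it preserves positive definiteness whenever the curvature condition y_kᵀ s_k > 0 holds.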

Part III Specialized Applications of Optimization

8 Inexact Newton's Method

1. Requirements for the search direction:

(a) p_k is inexpensive to compute
(b) p_k approximates the Newton step closely, as measured by the residual:

    z_k = ∇²f_k p_k + ∇f_k

(c) The error is controlled by the current gradient:

    ‖z_k‖ ≤ η_k ‖∇f_k‖

2. Typically, inexact methods keep B_k = ∇²f_k and improve on computing p_k

3. CG-Based Algorithmic Overview

Algorithm 9: Overview of Inexact CG Algorithm
  input: x_0 a starting point, f objective, restrictions
  x ← x_0
  Define Tolerance: ɛ
  Compute Gradient: r ← ∇f(x)
  Compute Conjugated Descent Direction: p ← −r
  while ‖r‖ > ɛ do
    Check Constrained Polynomial:
    if pᵀBp ≤ 0 then
      Compute α given restrictions
      Compute Minimizer: x ← x + αp
      Break Loop
    else
      Compute Optimal Step Length: α
      if Restrictions on x + αp then
        Compute τ given restrictions
        Compute Minimizer: x ← x + ταp
        Break Loop
      else
        Compute Step: x ← x + αp
      end
      Compute Gradient: r ← ∇f(x)
      Compute Conjugated Step Direction: p
    end
  end
  return Minimizer: x

4. As with the usual CG, we can implement methods for computing the gradient and the conjugate step direction efficiently
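Without the trust region restrictions, the inner loop of Algorithm 9 is essentially CG applied to ∇²f_k p = −∇f_k, stopped once the residual satisfies ‖z_k‖ ≤ η_k ‖∇f_k‖. A rough pure-Python sketch (the function name is invented, and the negative-curvature handling is simplified to a steepest-descent fallback rather than the notes' restricted step):

```python
def cg_newton_direction(B, g, eta):
    """Truncated CG for B p = -g. Stops when ||B p + g|| <= eta * ||g||;
    falls back to steepest descent if negative curvature appears first."""
    n = len(g)
    dot = lambda u, v: sum(u[i] * v[i] for i in range(n))
    mv = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    p = [0.0] * n
    r = g[:]                    # residual of B p + g at p = 0
    d = [-ri for ri in r]       # first direction: steepest descent
    gnorm = dot(g, g) ** 0.5
    for _ in range(n):
        Bd = mv(B, d)
        curv = dot(d, Bd)
        if curv <= 0:                       # negative curvature detected
            return d if p == [0.0] * n else p
        alpha = dot(r, r) / curv
        p = [p[i] + alpha * d[i] for i in range(n)]
        r_new = [r[i] + alpha * Bd[i] for i in range(n)]
        if dot(r_new, r_new) ** 0.5 <= eta * gnorm:
            return p                        # inexactness test passed
        beta = dot(r_new, r_new) / dot(r, r)
        r = r_new
        d = [-r[i] + beta * d[i] for i in range(n)]
    return p

B = [[4.0, 1.0], [1.0, 3.0]]    # positive definite model Hessian
g = [1.0, 2.0]                  # current gradient
p = cg_newton_direction(B, g, eta=1e-10)   # approximately solves B p = -g
```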

5. Line Search Strategy: the restriction on α is simply that it satisfies the strong Wolfe Conditions

6. Trust-Region Strategy:

(a) The restriction on α is that ‖x + αp‖ = Δ, because the restricted polynomial in the direction p is decreasing, so we should go all the way to the boundary
(b) The restriction on x is that ‖x + αp‖ > Δ: x + αp is no longer in the trust region, so we need to find τ that returns the step to the boundary

7. An alternative is to use Lanczos' Method, which generalizes CG methods.

9 Limited Memory BFGS

1. In normal BFGS:

    H_{k+1} = (I − ρ_k s_k y_kᵀ) H_k (I − ρ_k y_k s_kᵀ) + ρ_k s_k s_kᵀ

2. Substituting in for H_k until H_0 reveals that only the pairs (s_i, y_i) must be stored to compute H_{k+1}.

3. Limited-memory BFGS capitalizes on this by storing a fixed number of pairs (s_i, y_i) and re-updating H_k from a (typically diagonal) matrix H_k^0

Algorithm 10: L-BFGS Algorithm
  input: x_0 a starting point, f objective, m storage size, ɛ tolerance
  x ← x_0
  k ← 0
  while ‖∇f(x)‖ > ɛ do
    Choose H_k^0
    Update H_k from {(s_{k−1}, y_{k−1}), ..., (s_{k−m}, y_{k−m})}
    Compute p_k^u ← −H_k ∇f(x)
    Implement Line Search or Trust Region (dogleg) to find p_k
    x ← x + p_k
    k ← k + 1
  end
  return Minimizer: x
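Algorithm 10's step "Update H_k from the stored pairs" is usually implemented implicitly by the two-loop recursion, which applies H_k to a vector without ever forming a matrix. A sketch under the conventions that H_k^0 = γI and the newest pair is stored last (the function name is an invention here):

```python
def lbfgs_apply(grad, pairs, gamma=1.0):
    """Two-loop recursion: returns H_k * grad built from the m stored
    (s, y) pairs, with H_k^0 = gamma * I and the newest pair last."""
    n = len(grad)
    dot = lambda u, v: sum(u[i] * v[i] for i in range(n))
    q = grad[:]
    stack = []
    for s, y in reversed(pairs):              # newest pair first
        rho = 1.0 / dot(y, s)
        a = rho * dot(s, q)
        stack.append((a, rho, s, y))
        q = [q[i] - a * y[i] for i in range(n)]
    r = [gamma * qi for qi in q]              # apply H_k^0
    for a, rho, s, y in reversed(stack):      # second loop: oldest first
        b = rho * dot(y, r)
        r = [r[i] + (a - b) * s[i] for i in range(n)]
    return r                                   # = H_k * grad

# One stored pair on a quadratic with Hessian A = [[2, 0], [0, 4]]: y = A s.
pairs = [([1.0, 1.0], [2.0, 4.0])]
Hy = lbfgs_apply([2.0, 4.0], pairs)   # applying H_k to y_k recovers s_k
```

The search direction in Algorithm 10 is then p_k^u = −lbfgs_apply(∇f(x), pairs); the cost is O(mn) per iteration instead of the O(n²) of full BFGS.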

10 Least Squares Regression Methods

1. In least squares problems, the objective function has a specialized form, for example:

    f(x) = ½ Σ_{j=1}^m r_j(x)²

(a) x ∈ Rⁿ, and the entries of x are usually the parameters of a particular model of interest
(b) The r_j are usually the residuals of the model φ with parameter x, evaluated at point t_j with output y_j. That is:

    r_j(x) = φ(x, t_j) − y_j

2. Formulation

(a) Let r(x) = [ r_1(x) r_2(x) ⋯ r_m(x) ]ᵀ. Then:

    f(x) = ½ ‖r(x)‖²

(b) The first derivative:

    ∇f(x) = Σ_{j=1}^m r_j(x) ∇r_j(x) = J(x)ᵀ r(x)

where J(x) is the Jacobian of r, whose j-th row is ∇r_j(x)ᵀ.

(c) The second derivative:

    ∇²f(x) = J(x)ᵀ J(x) + Σ_{j=1}^m r_j(x) ∇²r_j(x)

3. The main idea behind least squares methods is to approximate ∇²f(x) by J(x)ᵀ J(x); thus, only first derivative information is required.

10.1 Linear Least Squares Regression

1. Suppose φ(x, t_j) = t_jᵀ x. Then:

    r(x) = Tx − y

where each row of T is t_jᵀ.

2. Objective Function:

    f(x) = ½ ‖Tx − y‖²

Note. The objective function can be minimized directly using matrix factorizations if m is not too large.

3. First Derivative:

    ∇f(x) = Tᵀ T x − Tᵀ y

Note. The normal equations may be better to solve if m is large, because by multiplying by the transpose, we need only solve an n × n system of equations.
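In the linear case, setting the gradient in item 3 to zero gives the normal equations TᵀT x = Tᵀy, which can be solved directly. A tiny sketch fitting a two-parameter line (the data are invented for illustration, and a 2×2 Cramer's-rule solve stands in for a proper factorization):

```python
# Fit phi(x, t) = x0 + x1 * t by solving the normal equations T^T T x = T^T y.
t = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]            # exactly y = 1 + 2 t

# T has rows [1, t_j]; form the 2x2 matrix T^T T and the vector T^T y.
m = len(t)
TtT = [[m, sum(t)], [sum(t), sum(ti * ti for ti in t)]]
Tty = [sum(y), sum(t[j] * y[j] for j in range(m))]

# Solve the 2x2 system by Cramer's rule.
det = TtT[0][0] * TtT[1][1] - TtT[0][1] * TtT[1][0]
x0 = (Tty[0] * TtT[1][1] - TtT[0][1] * Tty[1]) / det
x1 = (TtT[0][0] * Tty[1] - Tty[0] * TtT[1][0]) / det
```

Since the data lie exactly on a line, the solution recovers the intercept x0 = 1 and slope x1 = 2.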

10.2 Line Search: Gauss-Newton

1. The Gauss-Newton Algorithm is a line search method for nonlinear least squares regression, with search direction p_k solving:

    J_kᵀ J_k p_k = −J_kᵀ r_k

2. Notice that these are the normal equations of:

    min_p ½ ‖J_k p + r_k‖²

3. Using this search direction, we proceed with the line search as usual.

10.3 Trust Region: Levenberg-Marquardt

1. The local model is:

    m_k(p) = f(x_k) + ( J(x_k)ᵀ r(x_k) )ᵀ p + ½ pᵀ J(x_k)ᵀ J(x_k) p,    ‖p‖ ≤ Δ_k

2. Since f(x) = ‖r(x)‖²/2, minimizing m_k(p) is equivalent to:

    min_p ½ ‖J(x_k) p + r(x_k)‖²,    ‖p‖ ≤ Δ_k
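With a single parameter, the Gauss-Newton system J_kᵀ J_k p_k = −J_kᵀ r_k collapses to a scalar division, which makes the method easy to sketch (the unit step length and the model φ(x, t) = e^{xt} are illustrative choices, not from the notes):

```python
import math

# Gauss-Newton with unit steps for the one-parameter model phi(x, t) = exp(x*t).
t = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [math.exp(0.5 * ti) for ti in t]   # synthetic data, true parameter 0.5

x = 0.0                                 # starting guess
for _ in range(20):
    r = [math.exp(x * ti) - yi for ti, yi in zip(t, y)]   # residuals r_j(x)
    J = [ti * math.exp(x * ti) for ti in t]               # dr_j/dx (Jacobian column)
    # Scalar normal equations: (sum J_j^2) p = -(sum J_j r_j)
    p = -sum(Jj * rj for Jj, rj in zip(J, r)) / sum(Jj * Jj for Jj in J)
    x += p
```

Because the residuals vanish at the solution, J(x)ᵀJ(x) is the full Hessian there (item 2(c) of the previous section), and the iteration converges rapidly to the true parameter.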

Part IV Constrained Optimization

10.4 Theory of Constrained Optimization

1. In constrained optimization, the problem is:

    min f(x) : c_i(x) = 0 ∀i ∈ E, c_j(x) ≥ 0 ∀j ∈ I

(a) f is the usual objective function
(b) E is the index set for equality constraints
(c) I is the index set for inequality constraints

2. Feasible Set. Active Set.

Definition. The feasible set, Ω, is the set of all points which satisfy the constraints:

    Ω = { x : c_i(x) = 0 ∀i ∈ E, c_j(x) ≥ 0 ∀j ∈ I }

The active set at a point x ∈ Ω, A(x), is defined by:

    A(x) = E ∪ { j ∈ I : c_j(x) = 0 }

3. Feasible Sequence. Tangent. Tangent Cone.

Definition. Let x ∈ Ω. A sequence z_k → x is a feasible sequence if for sufficiently large K, k ≥ K implies z_k ∈ Ω. A vector d is a tangent to x if there is a feasible sequence z_k → x and a sequence t_k ↓ 0 such that:

    lim_{k→∞} (z_k − x) / t_k = d

The set of all tangents is the Tangent Cone at x, denoted T_Ω(x).

Note. T_Ω(x) contains all step directions which, for a sufficiently small step size, will remain in the feasible region Ω.

4. Linearized Feasible Directions.

Definition. Let x ∈ Ω and let A(x) be its active set. The set of linearized feasible directions, F(x), is defined as:

    F(x) = { d : dᵀ∇c_i(x) = 0 ∀i ∈ E, dᵀ∇c_j(x) ≥ 0 ∀j ∈ A(x) ∩ I }

Note. Let x ∈ Ω and suppose we linearize the constraints near x. The set F(x) contains all search directions d for which the linearized approximations to c_i(x + d) satisfy the constraints. For example, if c_i(x) = 0 for i ∈ E, then we need

    0 = c_i(x + d) ≈ c_i(x) + dᵀ∇c_i(x) = dᵀ∇c_i(x)

for d to be a good search direction.

5. LICQ Constraint Qualification

Definition. Given a point x and active set A(x), the linear independence constraint qualification (LICQ) holds if the set of active constraint gradients, { ∇c_i(x) : i ∈ A(x) }, is linearly independent.

Note. Most algorithms using line search or trust region require that F(x) = T_Ω(x); otherwise there is not a good way to find another iterate.

6. First Order Necessary Conditions

Theorem. Let f, c_i be continuously differentiable. Suppose x* is a local solution to the optimization problem and LICQ holds at x*. Then there is a Lagrange multiplier vector λ*, with components λ*_i for i ∈ E ∪ I, such that the Karush-Kuhn-Tucker (KKT) conditions are satisfied at (x*, λ*):

(a) ∇_x L(x*, λ*) = 0
(b) For all i ∈ E, c_i(x*) = 0
(c) For all j ∈ I, c_j(x*) ≥ 0
(d) For all j ∈ I, λ*_j ≥ 0
(e) For all j ∈ I, λ*_j c_j(x*) = 0
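The KKT conditions can be checked numerically at a candidate pair (x*, λ*). A sketch for the toy problem min x₁ + x₂ subject to 2 − x₁² − x₂² ≥ 0 (this example and its multiplier are assumptions for illustration; the solution is x* = (−1, −1) with λ* = 1/2):

```python
# Check conditions (a), (c), (d), (e) at the candidate point; there are no
# equality constraints here, so (b) is vacuous.
x = (-1.0, -1.0)
lam = 0.5                                    # candidate Lagrange multiplier

c = 2.0 - x[0]**2 - x[1]**2                  # constraint value (active: c = 0)
grad_f = (1.0, 1.0)
grad_c = (-2.0 * x[0], -2.0 * x[1])          # = (2, 2) at x*

# (a) stationarity of the Lagrangian L(x, lam) = f(x) - lam * c(x)
grad_L = (grad_f[0] - lam * grad_c[0], grad_f[1] - lam * grad_c[1])

# (c) primal feasibility, (d) dual feasibility, (e) complementarity
kkt_ok = (abs(grad_L[0]) < 1e-12 and abs(grad_L[1]) < 1e-12
          and c >= -1e-12 and lam >= 0 and abs(lam * c) < 1e-12)
```

Here the constraint is active, its gradient (2, 2) is nonzero, so LICQ holds trivially with a single constraint, and all the conditions are satisfied.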


More information

A line-search algorithm inspired by the adaptive cubic regularization framework with a worst-case complexity O(ɛ 3/2 )

A line-search algorithm inspired by the adaptive cubic regularization framework with a worst-case complexity O(ɛ 3/2 ) A line-search algorithm inspired by the adaptive cubic regularization framewor with a worst-case complexity Oɛ 3/ E. Bergou Y. Diouane S. Gratton December 4, 017 Abstract Adaptive regularized framewor

More information

2.3 Linear Programming

2.3 Linear Programming 2.3 Linear Programming Linear Programming (LP) is the term used to define a wide range of optimization problems in which the objective function is linear in the unknown variables and the constraints are

More information

MS&E 318 (CME 338) Large-Scale Numerical Optimization

MS&E 318 (CME 338) Large-Scale Numerical Optimization Stanford University, Management Science & Engineering (and ICME) MS&E 318 (CME 338) Large-Scale Numerical Optimization 1 Origins Instructor: Michael Saunders Spring 2015 Notes 9: Augmented Lagrangian Methods

More information

Lecture 7 Unconstrained nonlinear programming

Lecture 7 Unconstrained nonlinear programming Lecture 7 Unconstrained nonlinear programming Weinan E 1,2 and Tiejun Li 2 1 Department of Mathematics, Princeton University, weinan@princeton.edu 2 School of Mathematical Sciences, Peking University,

More information

Optimization Methods

Optimization Methods Optimization Methods Decision making Examples: determining which ingredients and in what quantities to add to a mixture being made so that it will meet specifications on its composition allocating available

More information

Handling Nonpositive Curvature in a Limited Memory Steepest Descent Method

Handling Nonpositive Curvature in a Limited Memory Steepest Descent Method Handling Nonpositive Curvature in a Limited Memory Steepest Descent Method Fran E. Curtis and Wei Guo Department of Industrial and Systems Engineering, Lehigh University, USA COR@L Technical Report 14T-011-R1

More information

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell DAMTP 2014/NA02 On fast trust region methods for quadratic models with linear constraints M.J.D. Powell Abstract: Quadratic models Q k (x), x R n, of the objective function F (x), x R n, are used by many

More information

Computational Optimization. Augmented Lagrangian NW 17.3

Computational Optimization. Augmented Lagrangian NW 17.3 Computational Optimization Augmented Lagrangian NW 17.3 Upcoming Schedule No class April 18 Friday, April 25, in class presentations. Projects due unless you present April 25 (free extension until Monday

More information

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL)

Part 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL) Part 4: Active-set methods for linearly constrained optimization Nick Gould RAL fx subject to Ax b Part C course on continuoue optimization LINEARLY CONSTRAINED MINIMIZATION fx subject to Ax { } b where

More information

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING XIAO WANG AND HONGCHAO ZHANG Abstract. In this paper, we propose an Augmented Lagrangian Affine Scaling (ALAS) algorithm for general

More information

Optimization. Charles J. Geyer School of Statistics University of Minnesota. Stat 8054 Lecture Notes

Optimization. Charles J. Geyer School of Statistics University of Minnesota. Stat 8054 Lecture Notes Optimization Charles J. Geyer School of Statistics University of Minnesota Stat 8054 Lecture Notes 1 One-Dimensional Optimization Look at a graph. Grid search. 2 One-Dimensional Zero Finding Zero finding

More information

On Lagrange multipliers of trust-region subproblems

On Lagrange multipliers of trust-region subproblems On Lagrange multipliers of trust-region subproblems Ladislav Lukšan, Ctirad Matonoha, Jan Vlček Institute of Computer Science AS CR, Prague Programy a algoritmy numerické matematiky 14 1.- 6. června 2008

More information

CONSTRAINED NONLINEAR PROGRAMMING

CONSTRAINED NONLINEAR PROGRAMMING 149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach

More information

Arc Search Algorithms

Arc Search Algorithms Arc Search Algorithms Nick Henderson and Walter Murray Stanford University Institute for Computational and Mathematical Engineering November 10, 2011 Unconstrained Optimization minimize x D F (x) where

More information

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction

More information

Sub-Sampled Newton Methods I: Globally Convergent Algorithms

Sub-Sampled Newton Methods I: Globally Convergent Algorithms Sub-Sampled Newton Methods I: Globally Convergent Algorithms arxiv:1601.04737v3 [math.oc] 26 Feb 2016 Farbod Roosta-Khorasani February 29, 2016 Abstract Michael W. Mahoney Large scale optimization problems

More information

Quasi-Newton Methods

Quasi-Newton Methods Newton s Method Pros and Cons Quasi-Newton Methods MA 348 Kurt Bryan Newton s method has some very nice properties: It s extremely fast, at least once it gets near the minimum, and with the simple modifications

More information

MATH 205C: STATIONARY PHASE LEMMA

MATH 205C: STATIONARY PHASE LEMMA MATH 205C: STATIONARY PHASE LEMMA For ω, consider an integral of the form I(ω) = e iωf(x) u(x) dx, where u Cc (R n ) complex valued, with support in a compact set K, and f C (R n ) real valued. Thus, I(ω)

More information

Second Order Optimization Algorithms I

Second Order Optimization Algorithms I Second Order Optimization Algorithms I Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 7, 8, 9 and 10 1 The

More information

Survey of NLP Algorithms. L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA

Survey of NLP Algorithms. L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA Survey of NLP Algorithms L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA NLP Algorithms - Outline Problem and Goals KKT Conditions and Variable Classification Handling

More information

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume

More information

Unconstrained Optimization

Unconstrained Optimization 1 / 36 Unconstrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University February 2, 2015 2 / 36 3 / 36 4 / 36 5 / 36 1. preliminaries 1.1 local approximation

More information

Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

More information

Preconditioned conjugate gradient algorithms with column scaling

Preconditioned conjugate gradient algorithms with column scaling Proceedings of the 47th IEEE Conference on Decision and Control Cancun, Mexico, Dec. 9-11, 28 Preconditioned conjugate gradient algorithms with column scaling R. Pytla Institute of Automatic Control and

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

1. Introduction and motivation. We propose an algorithm for solving unconstrained optimization problems of the form (1) min

1. Introduction and motivation. We propose an algorithm for solving unconstrained optimization problems of the form (1) min LLNL IM Release number: LLNL-JRNL-745068 A STRUCTURED QUASI-NEWTON ALGORITHM FOR OPTIMIZING WITH INCOMPLETE HESSIAN INFORMATION COSMIN G. PETRA, NAIYUAN CHIANG, AND MIHAI ANITESCU Abstract. We present

More information

Constrained Optimization Theory

Constrained Optimization Theory Constrained Optimization Theory Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Constrained Optimization Theory IMA, August

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information