ON THE CONNECTION BETWEEN THE CONJUGATE GRADIENT METHOD AND QUASI-NEWTON METHODS ON QUADRATIC PROBLEMS

Anders FORSGREN, Tove ODLAND

Technical Report TRITA-MAT-2013-OS-03, Department of Mathematics, KTH Royal Institute of Technology, February 2013

Optimization and Systems Theory, Department of Mathematics, KTH Royal Institute of Technology, Stockholm, Sweden (andersf@kth.se, odland@kth.se). Research supported by the Swedish Research Council (VR).

Abstract

It is well known that the conjugate gradient method and a quasi-Newton method, using any well-defined update matrix from the one-parameter Broyden family of updates, produce the same iterates on a quadratic problem with positive-definite Hessian. This equivalence does not hold for any quasi-Newton method. We discuss more precisely the conditions on the update matrix that give rise to this behavior, and show that the crucial fact is that the components of each update matrix are chosen in the last two dimensions of the Krylov subspaces defined by the conjugate gradient method. In the framework based on a sufficient condition to obtain mutually conjugate search directions, we show that the one-parameter Broyden family is complete. We also show that the update matrices from the one-parameter Broyden family are almost always well-defined on a quadratic problem with positive-definite Hessian. The only exception is when the symmetric rank-one update is used and the unit steplength is taken in the same iteration; in this case it is the Broyden parameter that becomes undefined.

1 Introduction

In this paper we examine some well-known methods used for solving unconstrained optimization problems, and specifically their behavior on quadratic problems. A motivation why these problems are of interest is that the task of solving a linear system of equations Ax = b,

with the assumption A = A^T ≻ 0, may equivalently be considered as that of solving an unconstrained quadratic programming problem,

  min_{x ∈ R^n} q(x) = min_{x ∈ R^n} (1/2) x^T H x + c^T x,   (QP)

where one lets H = A and c = −b to obtain the usual notation.

Some examples of methods that can be used to solve (QP) are the steepest-descent method, quasi-Newton methods, Newton's method and conjugate direction methods. (Newton's method solves (QP) in one iteration, but requires an explicit expression of the Hessian H.) Given an initial guess x_0, the general idea of these methods is to, in each iteration, generate a search direction p_k and then take a step α_k along that direction to approach the optimal solution. For k ≥ 0, the next iterate is hence obtained as

  x_{k+1} = x_k + α_k p_k.   (1)

The main difference between the methods mentioned above is the manner in which the search direction p_k is generated. For high-dimensional problems it is preferred that only function and gradient values are used in calculations. The gradient of the objective function q(x) is given by g(x) = Hx + c and its value at x_k is denoted by g_k.

The research presented in this paper stems from the desire to better understand the well-known connection between the conjugate gradient method, henceforth CG, and quasi-Newton methods, henceforth QN, namely that, using exact linesearch, these two methods will generate the same sequence of iterates as they approach the optimal solution of (QP). We are interested in understanding this connection especially in light of the choices made in association with the generation of p_k in QN.

QN and CG will be introduced in Section 2, where we also state some background results on the connection between the two methods. In Section 3 we present our results, and some concluding remarks are made in Section 4.

2 Background

On (QP), naturally exact linesearch is used to obtain the steplength in each iteration. Exact linesearch entails that the steplength is chosen such that the unique minimum of the function value along the given search direction p_k is obtained. In iteration k, the optimal steplength is given by

  α_k = − p_k^T g_k / (p_k^T H p_k).   (2)

Since H = H^T ≻ 0, it can be shown that the descent property, q(x_{k+1}) < q(x_k), holds as long as

  p_k^T g_k ≠ 0.   (3)
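The basic iteration (1) with the exact steplength (2) can be sketched in a few lines of Python. This is an illustration only, not code from the report; the function name exact_linesearch_step and the small test problem are assumptions made for the example.

import numpy as np

def exact_linesearch_step(H, c, x, p):
    """Take one step x_{k+1} = x_k + alpha_k p_k with the exact steplength (2).

    Illustrative sketch, assuming H is symmetric positive definite.
    """
    g = H @ x + c                       # gradient g_k = H x_k + c
    alpha = -(p @ g) / (p @ (H @ p))    # exact steplength, equation (2)
    return x + alpha * p, alpha

# Tiny usage example on a 2-by-2 quadratic.
H = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([-1.0, 1.0])
x0 = np.zeros(2)
p0 = -(H @ x0 + c)                      # steepest-descent direction at x_0
x1, alpha0 = exact_linesearch_step(H, c, x0, p0)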

Note that if p_k^T g_k < 0, i.e. if p_k is a descent direction, then α_k > 0 and a step is taken along that direction. If p_k^T g_k > 0, i.e. if p_k is an ascent direction, then α_k < 0 and a step is taken in the opposite direction.

On (QP), one is generally interested in methods for which the optimal solution is found in at most n iterations. It can be shown that a sufficient property for this behavior is that the method generates search directions which are mutually conjugate with respect to H, i.e.

  p_i^T H p_k = 0,  i ≠ k,   (4)

see, e.g., [13, Chapter 5].

A generic way to generate conjugate vectors is by means of the conjugate Gram-Schmidt process. Given a set of linearly independent vectors {a_0, ..., a_{n−1}}, a set of vectors {p_0, ..., p_{n−1}} mutually conjugate with respect to H can be constructed by letting p_0 = a_0 and, for k > 0,

  p_k = a_k + Σ_{i=0}^{k−1} β_{k,i} p_i.   (5)

The values of {β_{k,i}}_{i=0}^{k−1} are uniquely determined in order to make p_k conjugate to {p_0, ..., p_{k−1}} with respect to H. Conjugate direction methods is the common name for all methods which are based on generating search directions in this manner. See, e.g., [16] for an intuitive introduction; a short computational sketch of this process is given below.

2.1 Quasi-Newton methods

In QN methods the search directions are generated by solving

  B_k p_k = −g_k,   (6)

in each iteration. A well-known sufficient condition for this p_k to be conjugate with respect to H to all previous search directions, and hence guarantee the finite termination property, is for B_k to satisfy

  B_k p_i = ρ_i H p_i,  i = 0, ..., k−1,   (7)

where ρ_i is an arbitrary nonzero scaling factor that we for the time being will put to 1. Observe that (7) can be split into the two sets of equations

  B_k p_{k−1} = H p_{k−1},   (8)

and

  B_k p_i = H p_i,  i = 0, ..., k−2.   (9)

Here (8) can be seen as installing the correct curvature for B_k over p_{k−1}, while (9) installs the correct curvature over the rest of the previous search directions. The matrix B_k is in this sense an approximation of the Hessian H. (The choice B_k = H would give Newton's method, whereas the choice B_k = I would give the method of steepest descent.)
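Returning to the conjugate Gram-Schmidt process (5), the construction can be sketched in a few lines of Python (not code from the report; the name conjugate_gram_schmidt and the test matrix are illustrative). The same conjugacy, p_i^T H p_j = 0 for i ≠ j, is what condition (7) asks a quasi-Newton matrix B_k to reproduce.

import numpy as np

def conjugate_gram_schmidt(a_vectors, H):
    """Conjugate Gram-Schmidt process of equation (5).

    Given linearly independent vectors a_0, ..., a_{n-1}, return vectors
    p_0, ..., p_{n-1} that are mutually conjugate with respect to H,
    i.e. p_i^T H p_j = 0 for i != j.
    """
    P = []
    for a in a_vectors:
        p = a.astype(float)
        for q in P:
            # beta_{k,i} is chosen so that the new p satisfies p^T H q = 0
            p -= (a @ (H @ q)) / (q @ (H @ q)) * q
        P.append(p)
    return P

# Usage: conjugating the standard basis of R^3 with respect to H.
H = np.diag([1.0, 2.0, 3.0]) + 0.1 * np.ones((3, 3))
P = conjugate_gram_schmidt(list(np.eye(3)), H)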

In this paper, we will consider symmetric approximations of the Hessian, i.e. B_k = B_k^T. As will be mentioned, one could also consider unsymmetric approximation matrices, see [10].

The first suggestion of a QN method was made by Davidon in 1959 [3], using the term variable metric method. In 1963, in a famous paper by Fletcher and Powell [5], Davidon's method was modified ("We have made both a simplification by which certain orthogonality conditions which are important to the rate of attaining the solution are preserved, and also an improvement in the criterion of convergence." [5]) and this was the starting point for making these QN methods widely known, used and studied. We choose to work with an approximation of the Hessian rather than an approximation of the inverse Hessian, M_k, as many of the earlier papers did, e.g. [5]. Our results can however straightforwardly be derived for the inverse point of view, where (6) is replaced by p_k = −M_k g_k.

The approximation matrix B_k used in iteration k to solve for p_k is obtained by adding an update matrix, U_k, to the previous approximation matrix,

  B_k = B_{k−1} + U_k.   (10)

One often considers the Cholesky factorization of B_k; then (6) can be solved in order of n^2 operations. Also, if in (10) the update matrix U_k is of low rank, one does not need to compute the Cholesky factorization of B_k from scratch in each iteration, see, e.g., [7].

Equation (7) can be seen as a starting point for deriving update schemes. In the literature, much emphasis is placed on that the updates should satisfy (8), the so-called quasi-Newton condition. (This condition alone is not a sufficient condition on B_k to give conjugate directions.) Probably the most famous update scheme is the one using update matrices from the one-parameter Broyden family of updates [1], described by

  U_k = H p_{k−1} p_{k−1}^T H / (p_{k−1}^T H p_{k−1}) − B_{k−1} p_{k−1} p_{k−1}^T B_{k−1} / (p_{k−1}^T B_{k−1} p_{k−1}) + φ (p_{k−1}^T B_{k−1} p_{k−1}) w_k w_k^T,   (11)

with

  w_k = H p_{k−1} / (p_{k−1}^T H p_{k−1}) − B_{k−1} p_{k−1} / (p_{k−1}^T B_{k−1} p_{k−1}),

and where φ is a free parameter, which we will refer to as the Broyden parameter. For all updates in this family, (10) has the property of hereditary symmetry, i.e. if B_{k−1} is symmetric then B_k will be symmetric. The update given by the choice φ = 0 is known as the Broyden-Fletcher-Goldfarb-Shanno update, or BFGS update for short. For this update, when exact linesearch is used, (10) has the property of hereditary positive definiteness, i.e. if B_{k−1} ≻ 0 then B_k ≻ 0. An implication of this is that for all updates given by φ ≥ 0, when exact linesearch is used, (10) has the property of hereditary positive definiteness, see, e.g., [11, Chapter 9].
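The following is a minimal Python sketch of the update (11). It is an illustration, not code from the report; the name broyden_update is assumed, and the Hessian H is passed explicitly even though on (QP) the product H p_{k−1} equals (g_k − g_{k−1}) / α_{k−1} and so is available from gradients alone.

import numpy as np

def broyden_update(B, H, p, phi):
    """One-parameter Broyden family update (11): return B + U for a given phi.

    Illustrative sketch.  B plays the role of B_{k-1} and p that of p_{k-1};
    phi = 0 gives the BFGS update.
    """
    Hp, Bp = H @ p, B @ p
    pHp, pBp = p @ Hp, p @ Bp            # assumed nonzero
    w = Hp / pHp - Bp / pBp
    U = (np.outer(Hp, Hp) / pHp
         - np.outer(Bp, Bp) / pBp
         + phi * pBp * np.outer(w, w))
    return B + U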

Note that there are updates in the one-parameter Broyden family for which (10) does not have this property.

For all values of the Broyden parameter φ in (11), the approximation matrix B_k given by (10) satisfies (7). The one-parameter Broyden family of updates is by no means the only family of updates that gives an approximation matrix B_k satisfying (7). However, on (QP), these updates are linked in a particular way to the conjugate gradient method introduced next.

2.2 Conjugate gradient method

With the choice a_k = −g_k in (5) one obtains the conjugate gradient method, CG, of Hestenes and Stiefel [9]. In effect, in CG let p_0 = −g_0, and for k > 0 the only β-values in (5) that will be non-zero are

  β_{k,k−1} = p_{k−1}^T H g_k / (p_{k−1}^T H p_{k−1}),   (12)

where one may drop the first sub-index. Equation (5) can then be written as

  p_k = −g_k + β_k p_{k−1}.   (13)

From the use of exact linesearch it holds that g_k^T p_i = 0, i < k, which implies

  g_k^T g_i = 0,  i < k,   (14)

i.e. the produced gradients are orthogonal and therefore linearly independent, as required. In CG, one only needs to store the most recent previous search direction, p_{k−1}. This is a reduction in the amount of storage required compared to a general conjugate direction method, where potentially all previous search directions are needed to compute p_k.

Although equations (1), (2), (12) and (13) give a complete description of an iteration of CG, the power and richness of the method is somewhat clouded in notation. An intuitive way to picture what happens in an iteration of CG is to describe it as a Krylov subspace method.

Definition 2.1. Given a matrix A and a vector b, the Krylov subspace generated by A and b is given by K_k(b, A) = span{b, Ab, ..., A^{k−1} b}.

Krylov subspaces are linear subspaces which are expanding, i.e. K_1(b, A) ⊆ K_2(b, A) ⊆ K_3(b, A) ⊆ ..., and dim(K_k(b, A)) ≤ k. Given x ∈ K_k(b, A), then Ax ∈ K_{k+1}(b, A); see, e.g., [8] for an introduction to Krylov space methods.

CG is a Krylov subspace method, and iteration k may be put as the following constrained optimization problem

  min q(x),  s.t.  x ∈ x_0 + K_{k+1}(p_0, H).   (CG_k)

The optimal solution of (CG_k) is x_{k+1} and the corresponding multiplier is given by g_{k+1} = ∇q(x_{k+1}) = H x_{k+1} + c.
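For reference, an iteration of CG built directly from (1), (2), (12) and (13) can be sketched as follows. This is an illustration only, not code from the report; cg_quadratic is an assumed name.

import numpy as np

def cg_quadratic(H, c, x0, tol=1e-12):
    """CG on (QP): minimize q(x) = 0.5 x^T H x + c^T x for H = H^T > 0.

    Illustrative sketch; returns the list of iterates x_0, x_1, ...
    """
    x = np.asarray(x0, dtype=float).copy()
    g = H @ x + c
    p = -g                                   # p_0 = -g_0
    iterates = [x.copy()]
    for _ in range(len(x)):                  # at most n iterations on (QP)
        if np.linalg.norm(g) <= tol:
            break
        Hp = H @ p
        alpha = -(p @ g) / (p @ Hp)          # exact steplength (2)
        x = x + alpha * p                    # iterate update (1)
        g = H @ x + c
        beta = (p @ (H @ g)) / (p @ Hp)      # equation (12)
        p = -g + beta * p                    # equation (13)
        iterates.append(x.copy())
    return iterates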

In each iteration, the dimension of the affine subspace where the optimal solution is sought increases by one. After at most n iterations the optimal solution in an n-dimensional space is found, which will then be the optimal solution of (QP). It may happen, depending on the number of distinct eigenvalues of H and the orientation of p_0, that the optimal solution is found after less than n iterations, see, e.g., [15, Chapter 6]. We will henceforth let n_CG denote the number of iterations that CG will require to solve a given problem on the form (QP).

The search direction p_k belongs to K_{k+1}(p_0, H), and as it is conjugate to all previous search directions with respect to H it holds that span{p_0, p_1, ..., p_k} = K_{k+1}(p_0, H), i.e., the search directions p_0, ..., p_k form an H-orthogonal basis for K_{k+1}(p_0, H). This result is summarized in the following lemma; for a proof see, e.g., [8].

Lemma 2.2. If a set of vectors {p_0, ..., p_k} satisfies p_i ∈ K_{i+1}(p_0, H), i ≤ k, and p_i^T H p_j = 0, i ≠ j, then span{p_0, ..., p_k} = K_{k+1}(p_0, H).

We will henceforth refer to the search direction produced by CG, in iteration k on (QP), as p_k^CG. Since the gradients are mutually orthogonal, and because of the relationship with the search directions in (13), it can be shown that span{g_0, ..., g_k} = K_{k+1}(p_0, H), i.e. the gradients form an orthogonal basis for the subspace K_{k+1}(p_0, H).

General conjugate direction methods can not be described as Krylov subspace methods, since in general one does not have span{p_0, ..., p_k} = K_{k+1}(p_0, H). We will use this special characteristic of CG when investigating the well-known connection to QN. Although our focus is quadratic programming, it may interest the reader to know that CG was extended to general unconstrained problems by Fletcher and Reeves [6].

2.3 Background results

Given two parallel vectors and an initial point x_k, performing exact linesearch from the initial point with respect to a given objective function along these two vectors will yield the same iterate x_{k+1}. Hence, two methods will find the same sequence of iterates if and only if the search directions generated by the two methods are parallel.

In [10], Huang shows that a QN method with B_k in the Huang family of updates produces search directions which are parallel to those of CG on (QP). In the Huang family of updates the additional scaling parameter ρ_i in (7) is allowed to be different from 1; also, this family includes unsymmetric approximation matrices. The symmetric part of the Huang family, with ρ_i = 1 for all i, is the one-parameter Broyden family, see, e.g., [4].

As we focus on symmetric approximations, our interest is in the connection between CG and QN with B_k in the one-parameter Broyden family of updates. However, the following results could straightforwardly be translated to the case of an arbitrary ρ_i in (7), but we keep this parameter fixed to 1 to keep the notation cleaner.

Nazareth [12] shows that a QN method with B_k updated using the BFGS update gives a search direction that, in addition to being parallel to the search direction of CG, has the same length. Buckley, see e.g. [2], uses this special connection to design algorithms that combine CG and QN using the BFGS update. Another interesting result, proved by Dixon [4], is that on any convex function, using perfect line-search (a generalization of exact linesearch for general convex functions, see [4]), the one-parameter Broyden family gives rise to parallel search directions. However, the connection between CG and QN does not hold for general convex functions.

All B_k satisfying (7) can be seen as defining different conjugate direction methods, but only certain choices of B_k will give CG. The one-parameter Broyden family satisfies (7), but some additional condition is required in order to obtain the special conjugate direction method which is CG. What is it about this family of updates that makes the generated search direction p_k parallel to p_k^CG for all k?

In this paper, we provide an answer to this question by turning our attention to the search directions defined by CG and the Krylov subspaces they span. We will investigate the conditions on B_k, and in particular on U_k, that arise in this setting. The conditions we obtain will contribute to a better understanding of the well-known connection between CG and QN.

3 Results

As a reminder for the reader we will, in the following proposition, state the necessary and sufficient conditions for when each vector in the set {p_0, ..., p_k} is parallel to the corresponding vector in the set {p_0^CG, ..., p_k^CG}. Given x_0, one may calculate g_0 = H x_0 + c.

Proposition 3.1. Let p_0 = p_0^CG = −g_0. Then for all k ≥ 0, p_k is parallel to p_k^CG if and only if

(i) p_k ∈ K_{k+1}(p_0, H), and,
(ii) p_k is conjugate to {p_0, ..., p_{k−1}} with respect to H.

Proof. For the sake of completeness we include the proof.

Necessary: Suppose p_k is parallel to p_k^CG for all k ≥ 0. Then p_k = δ_k p_k^CG, for arbitrary nonzero scalars δ_k for k ≥ 1, and δ_0 = 1. As p_k^CG satisfies (i), it holds that p_k = δ_k p_k^CG ∈ K_{k+1}(p_0, H), since K_{k+1}(p_0, H) is a linear subspace. And as p_k^CG satisfies (ii), it follows that

  p_k^T H p_i = δ_k δ_i (p_k^CG)^T H p_i^CG = 0,  i ≤ k−1,

i.e. p_k is conjugate to {p_0, ..., p_{k−1}} with respect to H.

Sufficient: Suppose p_k satisfies (i) and (ii) for all k ≥ 0. It is clear that p_0 and p_0^CG are parallel. From (i) one has that, for all k, p_k and p_k^CG lie in the same subspace of dimension k+1, namely K_{k+1}(p_0, H). By (ii) and Lemma 2.2, both {p_0, ..., p_{k−1}} and {p_0^CG, ..., p_{k−1}^CG} form an H-orthogonal basis for the space K_k(p_0, H). This only leaves one dimension of K_{k+1}(p_0, H) in which p_k and p_k^CG can lie. Hence, p_k and p_k^CG are parallel for each k.

These necessary and sufficient conditions will serve as a foundation for the rest of our results. We will determine conditions, first on B_k, and second on U_k used in QN, in order for the generated search direction p_k to satisfy the conditions of Proposition 3.1. Given the current iterate x_k, one may calculate g_k = H x_k + c. Hence, in iteration k of QN, p_k is determined from (6) and depends entirely on the choice of B_k. Assume that B_k is constructed as

  B_k = I + Σ_{j=1}^{n_k} γ_j v_j v_j^T,   (15)

i.e. an identity matrix plus a linear combination of rank-one matrices. (Any symmetric matrix can be expressed on this form.) We make no assumption on n_k. Let B_0 = I, then p_0 = p_0^CG = −g_0.

Given a sequence {p_0, ..., p_{k−1}} that satisfies the conditions of Proposition 3.1, the following lemma gives sufficient conditions on B_k in order for the sequence {p_0, ..., p_{k−1}, p_k} to satisfy the conditions of Proposition 3.1.

Lemma 3.2. Let B_0 = I, so that p_0 = p_0^CG = −g_0. Given {p_0, ..., p_{k−1}} that satisfies the conditions of Proposition 3.1, if p_k is obtained from (6), and B_k satisfies

(i) v_j ∈ K_{k+1}(p_0, H) for all j = 1, ..., n_k, and,
(ii) B_k p_i = H p_i for all i ≤ k−1,

then {p_0, ..., p_{k−1}, p_k} will satisfy the conditions of Proposition 3.1.

Proof. First we show that p_k satisfies condition (i) of Proposition 3.1. If (15) is inserted into (6) one obtains

  (I + Σ_{j=1}^{n_k} γ_j v_j v_j^T) p_k = −g_k,

so that

  p_k = −g_k − Σ_{j=1}^{n_k} γ_j v_j v_j^T p_k = −g_k + Σ_{j=1}^{n_k} γ̂_j v_j,

where

  γ̂_j = −γ_j v_j^T p_k,

are scalars for all j. Since g_k ∈ K_{k+1}(p_0, H), and since, by assumption, v_j ∈ K_{k+1}(p_0, H) for all j, it holds that p_k ∈ K_{k+1}(p_0, H). Hence, p_k satisfies condition (i) of Proposition 3.1.

Condition (ii) is identical to (7) with ρ_i = 1 for all i, and this is a sufficient condition for p_k to be conjugate to {p_0, ..., p_{k−1}} with respect to H, i.e. p_k satisfies condition (ii) of Proposition 3.1.

If in QN, for each k = 0, ..., n_CG − 1, the matrix B_k satisfies the conditions of Lemma 3.2, then one will obtain a sequence {p_0, ..., p_{n_CG−1}} that satisfies the conditions of Proposition 3.1. Hence, with exact line-search, the same iterates would be obtained as for CG on (QP).

The conditions on B_k given in Lemma 3.2, and the conditions in all the following results, will be sufficient conditions to generate search directions that satisfy condition (ii) of Proposition 3.1. The reason we do not get necessary and sufficient conditions is due to the use of (7) to obtain the desired conjugacy of the search directions.

As B_k is updated according to (10), one would prefer to have conditions, in iteration k, on the update matrix U_k instead of on the entire matrix B_k. Therefore, we now modify Lemma 3.2 by noting that equation (15) may be stated as

  B_k = B_{k−1} + U_k = I + Σ_{j=1}^{n_{k−1}} γ_j v_j v_j^T + Σ_{j=1}^{m_k} η_j^(k) u_j^(k) (u_j^(k))^T,   (16)

where the last sum is U_k, the part that is added in iteration k. We make no assumption on m_k. Lemma 3.2 can be modified, recalling that one may split condition (ii) as

  B_k p_i = H p_i,  i ≤ k−2,   (17)
  B_k p_{k−1} = H p_{k−1}.   (18)

Using (10) one can reformulate (17) and (18) in terms of U_k, see (ii) and (iii) of the following proposition. Given B_i that satisfies Lemma 3.2 for i = 0, ..., k−1, i.e. {p_0, ..., p_{k−1}} satisfies the conditions of Proposition 3.1, the following proposition gives sufficient conditions on the update matrix U_k in order for the sequence {p_0, ..., p_{k−1}, p_k} to satisfy the conditions of Proposition 3.1.

Proposition 3.3. Let B_0 = I, so that p_0 = p_0^CG = −g_0. Given {p_0, ..., p_{k−1}} that satisfies the conditions of Proposition 3.1, if p_k is obtained from (6), B_k is obtained from (10), and U_k satisfies

(i) u_j^(k) ∈ K_{k+1}(p_0, H) for all j = 1, ..., m_k,
(ii) U_k p_i = 0, i ≤ k−2, and,
(iii) U_k p_{k−1} = (H − B_{k−1}) p_{k−1},

then {p_0, ..., p_{k−1}, p_k} will satisfy the conditions of Proposition 3.1.

Proof. The proof is identical to the one of Lemma 3.2. If (16) is inserted into (6) one obtains

  p_k = −g_k − Σ_{j=1}^{n_{k−1}} γ_j v_j v_j^T p_k − Σ_{j=1}^{m_k} η_j^(k) u_j^(k) (u_j^(k))^T p_k = −g_k + Σ_{j=1}^{n_{k−1}} γ̂_j v_j + Σ_{j=1}^{m_k} η̂_j^(k) u_j^(k).

Since g_k ∈ K_{k+1}(p_0, H), v_j ∈ K_k(p_0, H) ⊆ K_{k+1}(p_0, H) for all j = 1, ..., n_{k−1}, and, by assumption, u_j^(k) ∈ K_{k+1}(p_0, H) for all j = 1, ..., m_k, it holds that p_k ∈ K_{k+1}(p_0, H). Hence, p_k satisfies condition (i) of Proposition 3.1. Conditions (ii) and (iii) are merely a reformulation of (7) with ρ_i = 1 for all i, and this is a sufficient condition for p_k to be conjugate to {p_0, ..., p_{k−1}} with respect to H, i.e. p_k satisfies condition (ii) of Proposition 3.1.

If in QN, for each k = 0, ..., n_CG − 1, the matrix U_k satisfies the conditions of Proposition 3.3, then one will obtain a sequence {p_0, ..., p_{n_CG−1}} that satisfies the conditions of Proposition 3.1. Hence, with exact line-search, the same iterates would be obtained as for CG on (QP).

Next, we consider a modification of Proposition 3.3 where condition (ii) is rewritten in terms of the vectors {u_j^(k)} that compose U_k, as

  Σ_{j=1}^{m_k} η_j^(k) u_j^(k) (u_j^(k))^T p_i = 0,  i ≤ k−2.   (19)

Clearly the above holds if, for all j = 1, ..., m_k,

  (u_j^(k))^T p_i = 0,  i ≤ k−2.   (20)

A necessary and sufficient condition for (20) to hold is that u_j^(k) ⊥ K_{k−1}(p_0, H) for all j, since, by Lemma 2.2, span{p_0, ..., p_{k−2}} = K_{k−1}(p_0, H). Note that (20) is a potentially stronger condition on the vectors {u_j^(k)} than (19). Hence, given B_i that satisfies Lemma 3.2 for i = 0, ..., k−1, i.e. {p_0, ..., p_{k−1}} satisfies the conditions of Proposition 3.1, the next proposition potentially gives stronger sufficient conditions on U_k than Proposition 3.3 in order for the sequence {p_0, ..., p_{k−1}, p_k} to satisfy the conditions of Proposition 3.1.

Proposition 3.4. Let B_0 = I, so that p_0 = p_0^CG = −g_0. Given {p_0, ..., p_{k−1}} that satisfies the conditions of Proposition 3.1, if p_k is obtained from (6), B_k is obtained from (10), and U_k defined by (16) satisfies

(i) u_j^(k) ∈ K_{k−1}(p_0, H)^⊥ ∩ K_{k+1}(p_0, H) for all j = 1, ..., m_k, and,
(ii) Σ_{j=1}^{m_k} η_j^(k) u_j^(k) (u_j^(k))^T p_{k−1} = (H − B_{k−1}) p_{k−1},

then {p_0, ..., p_{k−1}, p_k} will satisfy the conditions of Proposition 3.1.

If in QN, for each k = 0, ..., n_CG − 1, the matrix U_k satisfies the conditions of Proposition 3.4, then one will obtain a sequence {p_0, ..., p_{n_CG−1}} that satisfies the conditions of Proposition 3.1. Hence, with exact line-search, the same iterates would be obtained as for CG on (QP). All U_k that satisfy Proposition 3.4 will also satisfy Proposition 3.3.

Next we state a result which will be needed in our further investigation, and which holds for any update scheme with approximation matrix B_k that generates search directions that satisfy the conditions of Proposition 3.1. Hence, it holds for any update scheme defined by Proposition 3.3 or Proposition 3.4.

Proposition 3.5. If B_i for i = 0, ..., k is such that {p_0, ..., p_k} satisfy the conditions of Proposition 3.1, then p_k^T B_k p_k ≠ 0, unless B_k p_k = −g_k = 0.

Proof. Assume that p_k^T B_k p_k = 0, but B_k p_k = −g_k ≠ 0. Since p_k satisfies the conditions of Proposition 3.1 and {g_i}_{i=0}^{k} form an orthogonal basis for K_{k+1}(p_0, H), one may write

  p_k = Σ_{i=0}^{k} β_i g_i = Σ_{i=0}^{k−1} β_i g_i + β_k g_k.

Introduce this expression in p_k^T B_k p_k = −p_k^T g_k = 0:

  0 = p_k^T g_k = (Σ_{i=0}^{k−1} β_i g_i + β_k g_k)^T g_k = β_k g_k^T g_k  ⟹  β_k = 0,

where the last equality follows from the gradients being mutually orthogonal. However, since span{p_0, ..., p_k} = span{g_0, ..., g_k} = K_{k+1}(p_0, H), one can only get β_k = 0 if K_{k+1}(p_0, H) = K_k(p_0, H), i.e. x_k is the optimal solution and g_k = 0, which is a contradiction to our assumption. This implies that one does not get p_k^T B_k p_k = 0 unless B_k p_k = 0.

Hence, on (QP), p_k^T B_k p_k = −p_k^T g_k ≠ 0, unless g_k = 0, for update schemes that generate search directions which satisfy the conditions of Proposition 3.1. Note that this implies that (3) is satisfied for any QN method using an update scheme that satisfies Proposition 3.3 or Proposition 3.4.

We will now investigate the updates defined by Proposition 3.3 and Proposition 3.4 separately, starting with Proposition 3.4 as it gives an intuitive picture of how the vectors {u_j^(k)} have to be chosen in a two-dimensional space.

3.1 Update matrices defined by Proposition 3.4

As, for each k, {g_0, ..., g_k} form an orthogonal basis for K_{k+1}(p_0, H), it holds that span{g_{k−1}, g_k} = K_{k−1}(p_0, H)^⊥ ∩ K_{k+1}(p_0, H). Hence, {u_j^(k)} satisfying condition (i) of Proposition 3.4 is equivalent to expressing {u_j^(k)} as a linear combination of g_{k−1} and g_k.

Due to the linear relationships

  g_{k−1} = −B_{k−1} p_{k−1},   g_k = α_{k−1} H p_{k−1} + g_{k−1},

it follows that

  span{g_{k−1}, g_k} = span{B_{k−1} p_{k−1}, H p_{k−1}} = K_{k−1}(p_0, H)^⊥ ∩ K_{k+1}(p_0, H).

This implies that {u_j^(k)} satisfying condition (i) of Proposition 3.4 is equivalent to expressing {u_j^(k)} as a linear combination of B_{k−1} p_{k−1} and H p_{k−1}. Note that B_{k−1} p_{k−1} and H p_{k−1} are linearly independent if and only if g_{k−1} and g_k are linearly independent, and that this holds until the optimal solution has been found, i.e. only if g_k = 0 does it hold that α_{k−1} H p_{k−1} = B_{k−1} p_{k−1}.

So far, we have made no assumptions on the rank of U_k. By condition (i) of Proposition 3.4 the vectors {u_j^(k)} must be chosen in the 2-dimensional space K_{k−1}(p_0, H)^⊥ ∩ K_{k+1}(p_0, H). Hence, a general construction of U_k is given by

  U_k = η_1^(k) u_1^(k) (u_1^(k))^T + η_2^(k) u_2^(k) (u_2^(k))^T.   (21)

For this U_k, rank(U_k) ≤ 2, where equality holds as long as u_1^(k) and u_2^(k) are linearly independent and η_j^(k) ≠ 0 for j = 1, 2. Hence, we can conclude that the conditions of Proposition 3.4 will define updates of at most rank two.

In the following theorem, we show that Proposition 3.4 defines the one-parameter Broyden family of updates.

Theorem 3.6. Assume g_k ≠ 0. A matrix U_k satisfies (i)-(ii) of Proposition 3.4 if and only if U_k can be expressed according to (11), the one-parameter Broyden family, for some φ.

Proof. We want to express U_k on a general form that satisfies the conditions of Proposition 3.4, and may do so in the following setting. As g_k ≠ 0, the vectors {B_{k−1} p_{k−1}, H p_{k−1}} form a basis for K_{k−1}(p_0, H)^⊥ ∩ K_{k+1}(p_0, H); one may scale them and obtain the following basis,

  span{ H p_{k−1} / (p_{k−1}^T H p_{k−1}),  B_{k−1} p_{k−1} / (p_{k−1}^T B_{k−1} p_{k−1}) } = K_{k−1}(p_0, H)^⊥ ∩ K_{k+1}(p_0, H).

Note that the division by p_{k−1}^T B_{k−1} p_{k−1} in the scaling is well-defined according to Proposition 3.5, since our assumption is that we have used B_i, i ≤ k−1, that generate search directions which satisfy the conditions of Proposition 3.1. For now we will drop the sub-indices on p, U and B. Then one may express (21) as

  U = ( Hp/(p^T Hp)   Bp/(p^T Bp) ) M ( Hp/(p^T Hp)   Bp/(p^T Bp) )^T,   (22)

where M is any symmetric two-by-two matrix. Inserting this expression into condition (ii), U p = (H − B)p, and noting that ( Hp/(p^T Hp)  Bp/(p^T Bp) )^T p = (1, 1)^T while (H − B)p = ( Hp/(p^T Hp)  Bp/(p^T Bp) ) ( p^T Hp, −p^T Bp )^T, yields

  ( Hp/(p^T Hp)   Bp/(p^T Bp) ) ( M (1, 1)^T − ( p^T Hp, −p^T Bp )^T ) = 0.

This implies that, since the first matrix has linearly independent columns,

  M (1, 1)^T = ( p^T Hp, −p^T Bp )^T.

One may express M as

  M = ( p^T Hp   0 ;  0   −p^T Bp ) + T,   (23)

where T is any symmetric matrix such that T (1, 1)^T = (0, 0)^T. Hence, T has one eigenvalue which is zero, and we may conclude that rank(T) ≤ 1. In addition, the eigenvector corresponding to the nonzero eigenvalue is a multiple of (1, −1)^T. Therefore, T may be written as

  T = ϕ (1, −1)^T (1, −1) = ( ϕ   −ϕ ;  −ϕ   ϕ ),   (24)

where ϕ is a free parameter. One may scale ϕ as ϕ := φ p^T Bp, again using the fact that, by Proposition 3.5, it holds that p^T Bp ≠ 0. Combining (22), (23) and (24), U may be expressed as

  U = Hp p^T H / (p^T Hp) − Bp p^T B / (p^T Bp) + φ (p^T Bp) w w^T,

with

  w = Hp / (p^T Hp) − Bp / (p^T Bp).

In effect, U describes the one-parameter Broyden family defined in (11).

The one-parameter Broyden family of updates completely describes the updates defined by Proposition 3.4.
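As a numerical illustration of this equivalence, the following sketch runs a QN method with B_0 = I, exact linesearch and a fixed Broyden parameter φ ≥ 0, and compares its iterates with those of CG on a random positive-definite quadratic. It reuses the hypothetical broyden_update and cg_quadratic sketches given earlier and is not code from the report; by the results above, the two sequences of iterates should agree up to rounding.

import numpy as np

def qn_broyden_quadratic(H, c, x0, phi=0.0, tol=1e-12):
    """QN iteration: p_k from (6) with B_0 = I, exact linesearch (2),
    and B_k = B_{k-1} + U_k with U_k from the Broyden family (11).
    Illustrative sketch only."""
    n = len(x0)
    x, B = np.asarray(x0, dtype=float).copy(), np.eye(n)
    iterates = [x.copy()]
    for _ in range(n):
        g = H @ x + c
        if np.linalg.norm(g) <= tol:
            break
        p = np.linalg.solve(B, -g)           # equation (6)
        alpha = -(p @ g) / (p @ (H @ p))     # exact steplength (2)
        x = x + alpha * p
        B = broyden_update(B, H, p, phi)     # update for the next iteration
        iterates.append(x.copy())
    return iterates

# Compare with CG on a random positive-definite quadratic.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
H = A @ A.T + 6.0 * np.eye(6)
c = rng.standard_normal(6)
x_cg = cg_quadratic(H, c, np.zeros(6))
x_qn = qn_broyden_quadratic(H, c, np.zeros(6), phi=0.25)
max_dev = max(np.linalg.norm(a - b) for a, b in zip(x_cg, x_qn))
print(f"max deviation between CG and QN iterates: {max_dev:.2e}")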

It can be shown that there is a unique value of the Broyden parameter φ for which the update matrix U_k, defined by (11), is of rank one. This value is

  φ = p^T Hp / (p^T (H − B) p),

and the corresponding update is called the symmetric rank-one update, SR1, see, e.g., [13, Chapter 8]. It is well known that the SR1 update is uniquely defined by condition (ii) of Proposition 3.4. This result is summarized in the following lemma; for a proof see, e.g., [11, Chapter 9].

Lemma 3.7. Let U be such that rank(U) = 1, i.e. U = η u u^T. If U satisfies condition (ii) of Proposition 3.4, i.e. η u u^T p = (H − B)p, then U is uniquely determined as

  U = η u u^T = (1 / (p^T (H − B) p)) (H − B)p ( (H − B)p )^T.   (25)

In addition, it holds that u = ±(H − B)p satisfies condition (i) of Proposition 3.4, u ∈ K_{k−1}(p_0, H)^⊥ ∩ K_{k+1}(p_0, H).

We may conclude that for rank(U_k) = 1, condition (ii) of Proposition 3.4 is by itself sufficient to obtain p_k that satisfies Proposition 3.1. For rank(U_k) = 2, both conditions of Proposition 3.4 are needed in order to define the update matrices.

We will now discuss under which circumstances the one-parameter Broyden family is not well-defined. For the SR1 update of (25) we state, in the following lemma, the necessary and sufficient conditions for when this update is not well-defined.

Lemma 3.8. The SR1 update defined by (25) is not well-defined if and only if α_{k−1} = 1 and g_k ≠ 0.

Proof. SR1 is undefined if and only if ( (H − B_{k−1}) p_{k−1} )^T p_{k−1} = 0 and (H − B_{k−1}) p_{k−1} ≠ 0. Since

  p_{k−1}^T H p_{k−1} − p_{k−1}^T B_{k−1} p_{k−1} = 0  ⟺  1 = p_{k−1}^T B_{k−1} p_{k−1} / (p_{k−1}^T H p_{k−1}) = −p_{k−1}^T g_{k−1} / (p_{k−1}^T H p_{k−1}) = α_{k−1},

and, for α_{k−1} = 1,

  0 ≠ (H − B_{k−1}) p_{k−1} = (g_k − g_{k−1}) / α_{k−1} + g_{k−1} = g_k,

the statement of the lemma holds.

If iteration k−1 gives α_{k−1} = 1 and g_k ≠ 0, i.e., x_k is not the optimal solution, then U_k will be undefined if the SR1 update (25) is used to update B_{k−1}. Hence, taking the unit steplength in an iteration is an indication that one needs to choose a different update scheme in that iteration. Note that the undefinedness for SR1 enters in the Broyden parameter

  φ_SR1 := p^T Hp / (p^T (H − B) p).

For all well-defined values of φ, it holds that the one-parameter Broyden family given by (11) is not well-defined if and only if p^T Bp = 0 and Bp ≠ 0. It is clear that requiring B to be positive definite is sufficient to avoid U being undefined. However, from Proposition 3.5 it follows that, on (QP), this undefinedness does not occur for any update scheme that generates search directions that satisfy the conditions of Proposition 3.1.
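A concrete instance of Lemma 3.8 may be helpful (an illustration, not from the report; the data below are chosen only so that the unit steplength occurs in the first iteration). Since B_{k−1} p_{k−1} = −g_{k−1}, one has p_{k−1}^T (H − B_{k−1}) p_{k−1} = (1 − α_{k−1}) p_{k−1}^T H p_{k−1}, so the SR1 denominator vanishes exactly when α_{k−1} = 1.

import numpy as np

# Data chosen so that alpha_0 = 1 while x_1 is not optimal (illustration only).
H = np.diag([0.5, 1.5])
c = np.array([1.0, 1.0])
x0 = np.zeros(2)
B0 = np.eye(2)                                 # B_0 = I, hence p_0 = -g_0
g0 = H @ x0 + c
p0 = -g0
alpha0 = -(p0 @ g0) / (p0 @ (H @ p0))          # equals 1 for this data
sr1_denominator = p0 @ ((H - B0) @ p0)         # = (1 - alpha0) * p_0^T H p_0 = 0
g1 = H @ (x0 + alpha0 * p0) + c                # nonzero, so x_1 is not optimal
print(alpha0, sr1_denominator, g1)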

Hence, based on Theorem 3.6, Proposition 3.5 and Lemma 3.8, we may pose the following corollary, giving necessary and sufficient conditions for when the one-parameter Broyden family is well-defined on (QP).

Corollary 3.9. Unless φ = φ_SR1 in (11) and the unit steplength is taken in the same iteration, U_k defined by (11), the one-parameter Broyden family, is always well-defined on (QP).

Our results hold for the entire one-parameter Broyden family, and it is well known that (10) does not have the property of hereditary positive definiteness for all updates in this family, e.g. SR1. We may therefore conclude that hereditary positive definiteness is not a necessary property of an update scheme in order for it to produce search directions that satisfy Proposition 3.1. Also, as was mentioned above, positive definiteness of B_k is not a necessary property to guarantee that the updates are well-defined on (QP).

3.2 Update matrices defined by Proposition 3.3

We now turn to Proposition 3.3 to determine if the seemingly weaker conditions

(i) u_j^(k) ∈ K_{k+1}(p_0, H) for all j = 1, ..., m_k,
(ii) U_k p_i = 0, i ≤ k−2,
(iii) U_k p_{k−1} = (H − B_{k−1}) p_{k−1},

yield a larger class of updates than the conditions of Proposition 3.4. It turns out that this is not the case, and in the following theorem we show that Proposition 3.3 also defines the one-parameter Broyden family of updates.

Theorem 3.10. Assume g_k ≠ 0. A matrix U_k satisfies (i)-(iii) of Proposition 3.3 if and only if U_k can be expressed according to (11), the one-parameter Broyden family, for some φ.

Proof. As span{g_0, ..., g_k} = K_{k+1}(p_0, H), a general form for U_k that satisfies condition (i) of Proposition 3.3 would be U_k = G M G^T, where G = ( g_0 ⋯ g_k ) ∈ R^{n×(k+1)} and M ∈ R^{(k+1)×(k+1)} is any symmetric matrix. Then conditions (ii) and (iii) may be written as

(ii) G M G^T p_i = 0, i ≤ k−2,
(iii) G M G^T p_{k−1} = (H − B_{k−1}) p_{k−1} = (1 − 1/α_{k−1}) g_{k−1} + (1/α_{k−1}) g_k.

Let

  P = ( p_0 ⋯ p_{k−1} ) ∈ R^{n×k},
  α = ( 0, ..., 0, 1 − 1/α_{k−1}, 1/α_{k−1} )^T ∈ R^{k+1},
  ᾱ = ( 0 ⋯ 0 α ) ∈ R^{(k+1)×k},

then conditions (ii) and (iii) may be written as

  G M G^T P = G ᾱ  ⟺  G ( M G^T P − ᾱ ) = 0.

Since G is a matrix whose columns are linearly independent, it holds that

  M G^T P − ᾱ = 0  ⟺  M G^T P = ᾱ.   (26)

Note that the matrix G^T P, whose entries are g_i^T p_j, is upper triangular with its last row equal to zero, since g_i^T p_j = 0 for j < i, and that ᾱ has only two non-zero components. Hence M in (26) has to be of the form in which the only non-zero entries are in the trailing two-by-two block, with

  m_{k−1,k−1} = (1 − 1/α_{k−1}) / (g_{k−1}^T p_{k−1}),   m_{k,k−1} = m_{k−1,k} = 1 / (α_{k−1} g_{k−1}^T p_{k−1}),   m_{k,k} = ϕ,

where ϕ is a free parameter. Hence, one may write a general U_k that satisfies the conditions of Proposition 3.3 as

  U_k = ( g_{k−1}  g_k ) ( (1 − 1/α_{k−1}) / (g_{k−1}^T p_{k−1})   1 / (α_{k−1} g_{k−1}^T p_{k−1}) ;  1 / (α_{k−1} g_{k−1}^T p_{k−1})   ϕ ) ( g_{k−1}  g_k )^T.   (27)

Recalling that span{g_{k−1}, g_k} = K_{k−1}(p_0, H)^⊥ ∩ K_{k+1}(p_0, H), it turns out that the conditions of Proposition 3.3 restrict the choice of {u_j^(k)} to the subspace K_{k−1}(p_0, H)^⊥ ∩ K_{k+1}(p_0, H). Hence, it follows that the updates defined by Proposition 3.3 satisfy Proposition 3.4, and the statement of the theorem follows from Theorem 3.6.

In effect, the sufficient conditions on U_k of the two Propositions 3.3 and 3.4 are equivalent, and they define the one-parameter Broyden family of updates according to Theorems 3.6 and 3.10. From these results we may draw the conclusion that, in the framework where we use the sufficient condition (7) to guarantee conjugacy of the search directions, the update schemes in QN that give the same sequence of iterates as CG are completely described by the one-parameter Broyden family.

4 Conclusions

According to our analysis, if one in a QN method chooses B_k to satisfy the conditions of Lemma 3.2, or equivalently U_k to satisfy the conditions of Proposition 3.3 or 3.4, then the generated search directions will satisfy the conditions of Proposition 3.1. Hence, with exact line-search, the same iterates would be obtained as for CG on (QP). As is shown in Theorems 3.6 and 3.10, the one-parameter Broyden family of updates is defined by these results.

In the derivation it becomes clear that it is the fact that one chooses the components {u_j^(k)} of the update matrix U_k in the subspace K_{k−1}(p_0, H)^⊥ ∩ K_{k+1}(p_0, H), i.e. the two last added dimensions of the Krylov subspace, that guarantees that the generated p_k will be such that {p_0, ..., p_k} satisfy the conditions of Proposition 3.1. What is often mentioned as a feature of the one-parameter Broyden family, using information only from the current and previous iteration, is in fact the condition that guarantees this behavior. Also, the fact that the update matrices in (11) are of low rank is a consequence of satisfying this condition. This is what distinguishes the one-parameter Broyden family of updates from any B_k satisfying (7).

We believe that this extended knowledge on the connection between QN using the one-parameter Broyden family and CG is valuable, as it gives a deeper understanding of the impact of the choices made in different update schemes of QN. In our framework, where the derivation of conjugate directions is based on (7), the one-parameter Broyden family completely describes the updates which will render QN to produce identical iterates as CG on (QP). It seems as if it is the sufficient requirement to get conjugate directions, B_k p_i = H p_i, i ≤ k−1, that limits the freedom when choosing B_k to the one-parameter Broyden family of updates, as shown by the above results. Since this condition is only sufficient, one may still pose the question whether there are other update schemes that yield the same sequence of iterates as CG on (QP). We believe that in order to understand this possible limitation it will be necessary to take a closer look at what really happens in CG.

We are also able to make the following remarks regarding when the one-parameter Broyden family is not well-defined. The only time when these updates are not well-defined on (QP) is when the steplength is of unit length and in the same iteration the rank-one update is used. This means that one can employ the rank-one update scheme when solving (QP) and only switch to a scheme with some rank-two update from the one-parameter Broyden family (or a rank-one update scheme defined for some ρ_i ≠ 1 in (7)) when the unit steplength is taken.

Furthermore, it is clear from our results that the concept of hereditary positive definiteness, or just hereditary definiteness, for (10) is not relevant when solving (QP). However, on problems with general convex functions this would be a sufficient condition to make sure that B_k does not become undefined for rank two. In fact, the necessary and sufficient condition that needs to be satisfied to avoid undefinedness

for U_k of rank two is p^T Bp = −p^T g ≠ 0. This condition is, as mentioned, identical to (3), the condition that guarantees the descent property in exact linesearch. On (QP), this condition will always be fulfilled, according to Proposition 3.5, for any update scheme that generates search directions which satisfy the conditions of Proposition 3.1.

In this paper we have focused on quadratic programming. Besides being important in its own right, it is also highly important as a subproblem when solving unconstrained optimization problems. For a survey on methods for unconstrained optimization see, e.g., [14]. A further motivation for this research is that the deeper understanding of what is important in the choice of U_k could be implemented in a limited-memory QN method. I.e., can one choose which columns to save based on some other criteria than just picking the most recent ones? See, e.g., [13, Chapter 9], for an introduction to limited-memory QN methods.

References

[1] Broyden, C. G. Quasi-Newton methods and their application to function minimisation. Math. Comp. 21 (1967).

[2] Buckley, A. G. A combined conjugate-gradient quasi-Newton minimization algorithm. Math. Programming 15, 2 (1978).

[3] Davidon, W. C. Variable metric method for minimization. SIAM J. Optim. 1, 1 (1991), 1-17.

[4] Dixon, L. C. W. Quasi-Newton algorithms generate identical points. Math. Programming 2 (1972).

[5] Fletcher, R., and Powell, M. J. D. A rapidly convergent descent method for minimization. Comput. J. 6 (1963/1964).

[6] Fletcher, R., and Reeves, C. M. Function minimization by conjugate gradients. Comput. J. 7 (1964).

[7] Gill, P. E., Murray, W., and Wright, M. H. Practical Optimization. Academic Press Inc. [Harcourt Brace Jovanovich Publishers], London, 1981.

[8] Gutknecht, M. H. A brief introduction to Krylov space methods for solving linear systems. In Frontiers of Computational Science (2007), Y. Kaneda, H. Kawamura, and M. Sasai, Eds., Springer-Verlag, Berlin. International Symposium on Frontiers of Computational Science, Nagoya Univ., Nagoya, Japan.

[9] Hestenes, M. R., and Stiefel, E. Methods of conjugate gradients for solving linear systems. J. Research Nat. Bur. Standards 49 (1952), (1953).

[10] Huang, H. Y. Unified approach to quadratically convergent algorithms for function minimization. J. Optimization Theory Appl. 5 (1970).

[11] Luenberger, D. G. Linear and Nonlinear Programming, second ed. Addison-Wesley Publishing Co., Boston, MA, 1984.

[12] Nazareth, L. A relationship between the BFGS and conjugate gradient algorithms and its implications for new algorithms. SIAM J. Numer. Anal. 16, 5 (1979).

[13] Nocedal, J., and Wright, S. J. Numerical Optimization. Springer Series in Operations Research. Springer-Verlag, New York, 1999.

[14] Powell, M. J. D. Recent advances in unconstrained optimization. Math. Programming 1 (1971).

[15] Saad, Y. Iterative Methods for Sparse Linear Systems, second ed. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2003.

[16] Shewchuk, J. R. An introduction to the conjugate gradient method without the agonizing pain. Tech. rep., Pittsburgh, PA, USA, 1994.


More information

Numerical Optimization of Partial Differential Equations

Numerical Optimization of Partial Differential Equations Numerical Optimization of Partial Differential Equations Part I: basic optimization concepts in R n Bartosz Protas Department of Mathematics & Statistics McMaster University, Hamilton, Ontario, Canada

More information

BFGS WITH UPDATE SKIPPING AND VARYING MEMORY. July 9, 1996

BFGS WITH UPDATE SKIPPING AND VARYING MEMORY. July 9, 1996 BFGS WITH UPDATE SKIPPING AND VARYING MEMORY TAMARA GIBSON y, DIANNE P. O'LEARY z, AND LARRY NAZARETH x July 9, 1996 Abstract. We give conditions under which limited-memory quasi-newton methods with exact

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Jason E. Hicken Aerospace Design Lab Department of Aeronautics & Astronautics Stanford University 14 July 2011 Lecture Objectives describe when CG can be used to solve Ax

More information

Optimization II: Unconstrained Multivariable

Optimization II: Unconstrained Multivariable Optimization II: Unconstrained Multivariable CS 205A: Mathematical Methods for Robotics, Vision, and Graphics Doug James (and Justin Solomon) CS 205A: Mathematical Methods Optimization II: Unconstrained

More information

Optimization: Nonlinear Optimization without Constraints. Nonlinear Optimization without Constraints 1 / 23

Optimization: Nonlinear Optimization without Constraints. Nonlinear Optimization without Constraints 1 / 23 Optimization: Nonlinear Optimization without Constraints Nonlinear Optimization without Constraints 1 / 23 Nonlinear optimization without constraints Unconstrained minimization min x f(x) where f(x) is

More information

Efficient Quasi-Newton Proximal Method for Large Scale Sparse Optimization

Efficient Quasi-Newton Proximal Method for Large Scale Sparse Optimization Efficient Quasi-Newton Proximal Method for Large Scale Sparse Optimization Xiaocheng Tang Department of Industrial and Systems Engineering Lehigh University Bethlehem, PA 18015 xct@lehigh.edu Katya Scheinberg

More information

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality

More information

Seminal papers in nonlinear optimization

Seminal papers in nonlinear optimization Seminal papers in nonlinear optimization Nick Gould, CSED, RAL, Chilton, OX11 0QX, England (n.gould@rl.ac.uk) December 7, 2006 The following papers are classics in the field. Although many of them cover

More information

First Published on: 11 October 2006 To link to this article: DOI: / URL:

First Published on: 11 October 2006 To link to this article: DOI: / URL: his article was downloaded by:[universitetsbiblioteet i Bergen] [Universitetsbiblioteet i Bergen] On: 12 March 2007 Access Details: [subscription number 768372013] Publisher: aylor & Francis Informa Ltd

More information

A NOTE ON Q-ORDER OF CONVERGENCE

A NOTE ON Q-ORDER OF CONVERGENCE BIT 0006-3835/01/4102-0422 $16.00 2001, Vol. 41, No. 2, pp. 422 429 c Swets & Zeitlinger A NOTE ON Q-ORDER OF CONVERGENCE L. O. JAY Department of Mathematics, The University of Iowa, 14 MacLean Hall Iowa

More information

Conjugate Gradient algorithm. Storage: fixed, independent of number of steps.

Conjugate Gradient algorithm. Storage: fixed, independent of number of steps. Conjugate Gradient algorithm Need: A symmetric positive definite; Cost: 1 matrix-vector product per step; Storage: fixed, independent of number of steps. The CG method minimizes the A norm of the error,

More information

New hybrid conjugate gradient methods with the generalized Wolfe line search

New hybrid conjugate gradient methods with the generalized Wolfe line search Xu and Kong SpringerPlus (016)5:881 DOI 10.1186/s40064-016-5-9 METHODOLOGY New hybrid conjugate gradient methods with the generalized Wolfe line search Open Access Xiao Xu * and Fan yu Kong *Correspondence:

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2017/18 Part 3: Iterative Methods PD

More information

On the acceleration of augmented Lagrangian method for linearly constrained optimization

On the acceleration of augmented Lagrangian method for linearly constrained optimization On the acceleration of augmented Lagrangian method for linearly constrained optimization Bingsheng He and Xiaoming Yuan October, 2 Abstract. The classical augmented Lagrangian method (ALM plays a fundamental

More information

Course Notes: Week 4

Course Notes: Week 4 Course Notes: Week 4 Math 270C: Applied Numerical Linear Algebra 1 Lecture 9: Steepest Descent (4/18/11) The connection with Lanczos iteration and the CG was not originally known. CG was originally derived

More information

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α

More information

Conjugate Directions for Stochastic Gradient Descent

Conjugate Directions for Stochastic Gradient Descent Conjugate Directions for Stochastic Gradient Descent Nicol N Schraudolph Thore Graepel Institute of Computational Science ETH Zürich, Switzerland {schraudo,graepel}@infethzch Abstract The method of conjugate

More information

The conjugate gradient method

The conjugate gradient method The conjugate gradient method Michael S. Floater November 1, 2011 These notes try to provide motivation and an explanation of the CG method. 1 The method of conjugate directions We want to solve the linear

More information

Lecture 9: Krylov Subspace Methods. 2 Derivation of the Conjugate Gradient Algorithm

Lecture 9: Krylov Subspace Methods. 2 Derivation of the Conjugate Gradient Algorithm CS 622 Data-Sparse Matrix Computations September 19, 217 Lecture 9: Krylov Subspace Methods Lecturer: Anil Damle Scribes: David Eriksson, Marc Aurele Gilles, Ariah Klages-Mundt, Sophia Novitzky 1 Introduction

More information

Mathematical optimization

Mathematical optimization Optimization Mathematical optimization Determine the best solutions to certain mathematically defined problems that are under constrained determine optimality criteria determine the convergence of the

More information

ENSIEEHT-IRIT, 2, rue Camichel, Toulouse (France) LMS SAMTECH, A Siemens Business,15-16, Lower Park Row, BS1 5BN Bristol (UK)

ENSIEEHT-IRIT, 2, rue Camichel, Toulouse (France) LMS SAMTECH, A Siemens Business,15-16, Lower Park Row, BS1 5BN Bristol (UK) Quasi-Newton updates with weighted secant equations by. Gratton, V. Malmedy and Ph. L. oint Report NAXY-09-203 6 October 203 0.5 0 0.5 0.5 0 0.5 ENIEEH-IRI, 2, rue Camichel, 3000 oulouse France LM AMECH,

More information

Comparative study of Optimization methods for Unconstrained Multivariable Nonlinear Programming Problems

Comparative study of Optimization methods for Unconstrained Multivariable Nonlinear Programming Problems International Journal of Scientific and Research Publications, Volume 3, Issue 10, October 013 1 ISSN 50-3153 Comparative study of Optimization methods for Unconstrained Multivariable Nonlinear Programming

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A.

The amount of work to construct each new guess from the previous one should be a small multiple of the number of nonzeros in A. AMSC/CMSC 661 Scientific Computing II Spring 2005 Solution of Sparse Linear Systems Part 2: Iterative methods Dianne P. O Leary c 2005 Solving Sparse Linear Systems: Iterative methods The plan: Iterative

More information

A SUFFICIENTLY EXACT INEXACT NEWTON STEP BASED ON REUSING MATRIX INFORMATION

A SUFFICIENTLY EXACT INEXACT NEWTON STEP BASED ON REUSING MATRIX INFORMATION A SUFFICIENTLY EXACT INEXACT NEWTON STEP BASED ON REUSING MATRIX INFORMATION Anders FORSGREN Technical Report TRITA-MAT-2009-OS7 Department of Mathematics Royal Institute of Technology November 2009 Abstract

More information

NonlinearOptimization

NonlinearOptimization 1/35 NonlinearOptimization Pavel Kordík Department of Computer Systems Faculty of Information Technology Czech Technical University in Prague Jiří Kašpar, Pavel Tvrdík, 2011 Unconstrained nonlinear optimization,

More information

Lecture 14: October 17

Lecture 14: October 17 1-725/36-725: Convex Optimization Fall 218 Lecture 14: October 17 Lecturer: Lecturer: Ryan Tibshirani Scribes: Pengsheng Guo, Xian Zhou Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

New Hybrid Conjugate Gradient Method as a Convex Combination of FR and PRP Methods

New Hybrid Conjugate Gradient Method as a Convex Combination of FR and PRP Methods Filomat 3:11 (216), 383 31 DOI 1.2298/FIL161183D Published by Faculty of Sciences and Mathematics, University of Niš, Serbia Available at: http://www.pmf.ni.ac.rs/filomat New Hybrid Conjugate Gradient

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf-Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2018/19 Part 4: Iterative Methods PD

More information

GLOBAL CONVERGENCE OF CONJUGATE GRADIENT METHODS WITHOUT LINE SEARCH

GLOBAL CONVERGENCE OF CONJUGATE GRADIENT METHODS WITHOUT LINE SEARCH GLOBAL CONVERGENCE OF CONJUGATE GRADIENT METHODS WITHOUT LINE SEARCH Jie Sun 1 Department of Decision Sciences National University of Singapore, Republic of Singapore Jiapu Zhang 2 Department of Mathematics

More information

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell DAMTP 2014/NA02 On fast trust region methods for quadratic models with linear constraints M.J.D. Powell Abstract: Quadratic models Q k (x), x R n, of the objective function F (x), x R n, are used by many

More information

Improving L-BFGS Initialization for Trust-Region Methods in Deep Learning

Improving L-BFGS Initialization for Trust-Region Methods in Deep Learning Improving L-BFGS Initialization for Trust-Region Methods in Deep Learning Jacob Rafati http://rafati.net jrafatiheravi@ucmerced.edu Ph.D. Candidate, Electrical Engineering and Computer Science University

More information

A projected Hessian for full waveform inversion

A projected Hessian for full waveform inversion CWP-679 A projected Hessian for full waveform inversion Yong Ma & Dave Hale Center for Wave Phenomena, Colorado School of Mines, Golden, CO 80401, USA (c) Figure 1. Update directions for one iteration

More information

Chapter 8 Gradient Methods

Chapter 8 Gradient Methods Chapter 8 Gradient Methods An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Introduction Recall that a level set of a function is the set of points satisfying for some constant. Thus, a point

More information

Notes on Numerical Optimization

Notes on Numerical Optimization Notes on Numerical Optimization University of Chicago, 2014 Viva Patel October 18, 2014 1 Contents Contents 2 List of Algorithms 4 I Fundamentals of Optimization 5 1 Overview of Numerical Optimization

More information

Lec10p1, ORF363/COS323

Lec10p1, ORF363/COS323 Lec10 Page 1 Lec10p1, ORF363/COS323 This lecture: Conjugate direction methods Conjugate directions Conjugate Gram-Schmidt The conjugate gradient (CG) algorithm Solving linear systems Leontief input-output

More information

Arc Search Algorithms

Arc Search Algorithms Arc Search Algorithms Nick Henderson and Walter Murray Stanford University Institute for Computational and Mathematical Engineering November 10, 2011 Unconstrained Optimization minimize x D F (x) where

More information

Handling Nonpositive Curvature in a Limited Memory Steepest Descent Method

Handling Nonpositive Curvature in a Limited Memory Steepest Descent Method Handling Nonpositive Curvature in a Limited Memory Steepest Descent Method Fran E. Curtis and Wei Guo Department of Industrial and Systems Engineering, Lehigh University, USA COR@L Technical Report 14T-011-R1

More information