Institute of Computer Science
Academy of Sciences of the Czech Republic

Limited-memory projective variable metric methods for unconstrained minimization

J. Vlček, L. Lukšan

Technical report No. V 1036

December 2008

Pod Vodárenskou věží 2, Prague 8, e-mail: luksan@cs.cas.cz, vlcek@cs.cas.cz

Institute of Computer Science
Academy of Sciences of the Czech Republic

Limited-memory projective variable metric methods for unconstrained minimization

J. Vlček, L. Lukšan
Technical report No. V 1036, December 2008

Abstract: A new family of limited-memory variable metric or quasi-Newton methods for unconstrained minimization is given. The methods are based on a positive definite inverse Hessian approximation in the form of the sum of the identity matrix and two low-rank matrices, obtained by the standard scaled Broyden class update. To reduce the rank of the matrices, various projections are used. Numerical experience is encouraging.

Keywords: Unconstrained minimization, variable metric methods, limited-memory methods, Broyden class updates, projection matrix, numerical results.

This work was supported by the Grant Agency of the Czech Academy of Sciences, project No. IAA1030405, and by the institutional research plan No. AV0Z. L. Lukšan is also from the Technical University of Liberec, Hálkova 6, 461 17 Liberec.

1 Introduction

In this report we present a new family of limited-memory variable metric (VM) line search methods for unconstrained minimization which are projective, i.e. they reduce the rank of the matrices in the inverse Hessian approximation by a suitable projection.

VM line search methods, see [5], [3], are iterative. Starting with an initial point $x_0 \in \mathbb{R}^N$, they generate iterations $x_{k+1} \in \mathbb{R}^N$ by the process $x_{k+1} = x_k + s_k$, $s_k = t_k d_k$, $k \ge 0$, where $d_k$ is the direction vector and $t_k > 0$ is a stepsize. We assume that the problem function $f:\mathbb{R}^N \to \mathbb{R}$ is differentiable, $d_k = -H_k g_k$ and the stepsize $t_k$ is chosen in such a way that

$$ f_{k+1} - f_k \le \varepsilon_1 t_k g_k^T d_k, \qquad g_{k+1}^T d_k \ge \varepsilon_2 g_k^T d_k, \qquad (1.1) $$

$k \ge 0$, where $0 < \varepsilon_1 < 1/2$, $\varepsilon_1 < \varepsilon_2 < 1$, $f_k = f(x_k)$, $g_k = \nabla f(x_k)$ and $H_k$ is a symmetric positive definite matrix. We denote $B_k = H_k^{-1}$, $y_k = g_{k+1} - g_k$, $k \ge 0$, and by $\|\cdot\|_F$ the Frobenius matrix norm.

Various possibilities how to construct positive definite inverse Hessian approximations based on low-rank matrices are discussed in Section 2. In Section 3 we show how properties of the standard scaled BFGS update can be utilized and generalized to construct new limited-memory methods. Numerical results are presented in Section 4.

2 Positive definite inverse Hessian approximation

A simple form of positive definite matrices $H_k$ based on low-rank matrices is $\zeta_k I + U_k U_k^T$, $k \ge 0$, where $\zeta_k > 0$ and $U_k$ are $N \times m_k$ rectangular matrices, $m_k \ll N$, but updating matrices of this form appears to be difficult. To avoid this drawback, shifted VM methods, see [7], were developed. They appeared to be very efficient, except for ill-conditioned problems, which is probably caused by the fact that shifted VM updates are not invariant under linear transformations (the significance of the invariance property of methods for the solution of ill-conditioned problems is discussed in [3]). Another possibility how to update matrices of this form is described in [8]. There invariant variationally-derived updates are applied, but the VM matrices cannot be directly used to calculate the direction vectors $d_k$, and thus complicated corrections are necessary.

Our new methods are based on matrices $H_k$ of the form

$$ H_k = \zeta_k I + U_k U_k^T - R_k R_k^T, \qquad (2.1) $$

$k \ge 0$, where $\zeta_k > 0$ and $U_k$, $R_k$ are $N \times \min(k, m)$ rectangular matrices, $m \ll N$. Most VM matrices in standard VM methods, see [3], [5], can be written in this way. These methods usually set $H_0 = I$, and $H_{k+1}$ is obtained from $\gamma_k H_k$ ($\gamma_k$ is a scaling parameter) by a rank-two VM update to satisfy the quasi-Newton condition; its generalized form is

$$ H_{k+1} y_k = \varrho_k s_k, \qquad (2.2) $$

where $\varrho_k > 0$ is a nonquadratic correction parameter (see [5]). Therefore the matrices $H_k$ can be easily updated in starting iterations, i.e. for $k < m$. When $k \ge m$, the rank of the matrices $U$, $R$ needs to be reduced before each update and the updating formulas modified, which is described in Section 2.1.
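The representation (2.1) is cheap to use because $H_k$ never has to be formed explicitly: a product $H_k g$ costs only $O(Nm_k)$ operations. The following minimal numpy sketch (not part of the report; all data and names are hypothetical) illustrates this point.

```python
import numpy as np

def apply_H(zeta, U, R, g):
    """Compute H g for H = zeta*I + U U^T - R R^T using only the thin factors.

    Cost is O(N*m) instead of the O(N^2) needed for an explicit H.
    """
    return zeta * g + U @ (U.T @ g) - R @ (R.T @ g)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, m = 1000, 10                        # dimensions used in the report's tests
    zeta = 1.0
    U = rng.standard_normal((N, m))
    R = 0.01 * rng.standard_normal((N, m)) # small R keeps H positive definite in this illustration
    g = rng.standard_normal(N)

    d = -apply_H(zeta, U, R, g)            # direction vector d = -H g

    # check against the explicit matrix (feasible only for moderate N)
    H = zeta * np.eye(N) + U @ U.T - R @ R.T
    print(np.allclose(d, -H @ g))          # True
```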

Some conditions to have the result positive definite are discussed in Section 2.2. In this report we use matrices $U_k$, $R_k$ with the same number of columns only for simplicity. Numerical experiments indicate that the number of columns of $R_k$ can be substantially reduced without a significant increase of the number of function and gradient evaluations, which can increase efficiency. All methods given here can be easily adapted in this way.

For a given $q_k \in \mathbb{R}^N$, $q_k^T y_k \ne 0$, we denote by $V_{q_k}$ the projection matrix $I - q_k y_k^T / q_k^T y_k$. To simplify the notation we frequently omit the index $k$ and replace the index $k+1$ by the symbol $+$. In the subsequent analysis we use the following notation

$$ C = \zeta I - RR^T, \qquad b = s^T y, \qquad V = V_s = I - (1/b)\,s y^T. $$

Note that $b > 0$ by (1.1).

2.1 Limited-memory VM matrices updating

To be able to add further columns when the matrices $U$, $R$ have rank $m$, we first need to make their columns dependent, i.e. we need matrices $\bar U$, $\bar R$, near to $U$, $R$, and suitable vectors $z_1, z_2 \in \mathbb{R}^m$ satisfying $\bar U z_1 = 0$, $\bar R z_2 = 0$. Such a matrix $\bar U$ can in a general way be written as the product of any $N \times m$ matrix and the orthogonal projection matrix $P_1 = I - z_1 z_1^T / z_1^T z_1 = I - \bar z_1 \bar z_1^T$, $\bar z_1 = z_1/|z_1|$, and similarly for the matrix $\bar R$ with $P_2 = I - \bar z_2 \bar z_2^T$, $\bar z_2 = z_2/|z_2|$, instead of $P_1$. The following lemma can be used to find the nearest (in the sense of the Frobenius matrix norm) matrices to $U$, $R$.

Lemma 2.1. Let $T$ be symmetric positive definite, $q \in \mathbb{R}^N$, $z \in \mathbb{R}^m$, $z \ne 0$, $m \le N$, and denote by $\mathcal{W}$ the set of $N \times m$ matrices, $W \in \mathcal{W}$. Then the unique solution to

$$ \min\{\|T^{1/2}(\bar W - W)\|_F : \bar W \in \mathcal{W}\} \quad \text{s.t.} \quad \bar W z = 0 \qquad (2.3) $$

is

$$ \bar W = WP, \qquad P = I - zz^T/z^Tz, \qquad (2.4) $$

independently of the choice of $T$. Moreover, we can write

$$ \bar W\bar W^T + qq^T = \big(\bar W + qz^T/|z|\big)\big(\bar W + qz^T/|z|\big)^T. \qquad (2.5) $$

Proof. Setting $W = (w_1, \dots, w_m)$, $\bar W = (\bar w_1, \dots, \bar w_m)$, $z = (\xi_1, \dots, \xi_m)^T$, we define the Lagrangian function $L = L(\bar W, e)$, $e \in \mathbb{R}^N$, as

$$ L = \tfrac12 \|T^{1/2}(\bar W - W)\|_F^2 + e^T \bar W z = \tfrac12 \sum_{i=1}^m (\bar w_i - w_i)^T T (\bar w_i - w_i) + \sum_{i=1}^m \xi_i e^T \bar w_i. $$

A local minimizer $\bar W$ satisfies the equations $\partial L/\partial \bar w_i = 0$, $i = 1, \dots, m$, which gives $T(\bar w_i - w_i) + \xi_i e = 0$, $i = 1, \dots, m$, yielding $\bar W = W - T^{-1} e z^T$. Using the condition $\bar W z = 0$, we have $T^{-1} e = Wz/z^Tz$, i.e. $\bar W = WP$. The matrix $\bar W$ satisfies (2.3) in view of the convexity of the Frobenius norm, and the rest follows immediately from (2.4).
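Lemma 2.1 is easy to check numerically. The sketch below (illustrative only, with random data of arbitrary size) verifies the constraint $\bar Wz = 0$ for $\bar W = WP$ and the factorization identity (2.5).

```python
import numpy as np

rng = np.random.default_rng(1)
N, m = 50, 6
W = rng.standard_normal((N, m))
z = rng.standard_normal(m)
q = rng.standard_normal(N)

# orthogonal projection onto the complement of span{z}, cf. (2.4)
P = np.eye(m) - np.outer(z, z) / (z @ z)
W_bar = W @ P

print(np.linalg.norm(W_bar @ z))          # ~0: the constraint W_bar z = 0
# identity (2.5): W_bar W_bar^T + q q^T = (W_bar + q z^T/|z|)(W_bar + q z^T/|z|)^T
lhs = W_bar @ W_bar.T + np.outer(q, q)
B = W_bar + np.outer(q, z) / np.linalg.norm(z)
print(np.allclose(lhs, B @ B.T))          # True
```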

In view of Lemma 2.1 we set

$$ \bar U = UP_1, \qquad \bar R = RP_2, \qquad \bar C = \zeta I - \bar R\bar R^T, \qquad \bar H = \bar C + \bar U\bar U^T. \qquad (2.6) $$

If the matrix $\bar H$ is positive definite and we choose a suitable standard VM update of $\bar H$ in the form $(1/\gamma)H_+ = \bar H + uu^T - rr^T$ which maintains positive definiteness of VM matrices, limited-memory updating of $H$ can be completely covered by Lemma 2.1. Writing $H_+ = \zeta_+ I - R_+R_+^T + U_+U_+^T$ and using (2.5), we can set $\zeta_+ = \gamma\zeta$, $(1/\sqrt\gamma)U_+ = UP_1 + u\bar z_1^T$, $(1/\sqrt\gamma)R_+ = RP_2 + r\bar z_2^T$. Unfortunately, in the general case we cannot find vectors $z_1$, $z_2$ which guarantee positive definiteness of $\bar H$. Some conditions that ensure it will be given in the next section.

2.2 Conditions for positive definiteness of reduced VM matrices

In view of $P_i^2 = P_i$, $i = 1, 2$, we can write

$$ \bar U\bar U^T = UP_1U^T = UU^T - U\bar z_1\bar z_1^TU^T, \qquad \bar R\bar R^T = RP_2R^T = RR^T - R\bar z_2\bar z_2^TR^T. \qquad (2.7) $$

In view of $\bar C = C + R\bar z_2\bar z_2^TR^T$ by (2.6), we see that the situation is quite simple when the matrix $C$ is positive definite, since then also the matrices $\bar C$ and $\bar H = \bar C + \bar U\bar U^T$ are positive definite for any $z_1$, $z_2$. The next section will be devoted to such standard VM updates that can be adapted to maintain this property.

Even if the matrix $C$ is not positive definite, the choice of $z_2$ has no influence on preserving positive definiteness of $\bar H$ after the reduction of matrix $R$, since $\zeta I - \bar R\bar R^T + UU^T = H + R\bar z_2\bar z_2^TR^T$ by (2.7). The following lemma shows one possibility how to attain positive definiteness of the VM matrix after the reduction of matrix $U$.

Lemma 2.2. Let $A = \bar H + vv^T$, $v \in \mathbb{R}^N$, and denote by $\lambda(A)$, $\lambda(\bar H)$ the minimum eigenvalues of $A$, $\bar H$. If $[\bar U, \bar R]^T v = 0$ then $\lambda(\bar H) = \lambda(A)$. The conditions $\bar U^T v = 0$, $\bar R^T v = 0$ can be satisfied e.g. by the choice $z_1 = U^T v$, $z_2 = R^T v$.

Proof. We first show that $\lambda(A) \le \zeta$. Denoting $K_1 = \{q \in \mathbb{R}^N : [\bar U, v]^T q = 0,\ |q| = 1\}$, we have

$$ \lambda(A) = \min_{|q|=1} q^TAq \le \min_{q\in K_1} q^TAq = \min_{q\in K_1} q^T(\zeta I - \bar R\bar R^T)q \le \zeta. $$

We will consider the following two cases:

(a) If $\lambda(\bar H) = \zeta$, then $\lambda(\bar H) = \lambda(A)$ holds, since $\zeta = \lambda(\bar H) \le \lambda(A) \le \zeta$.

(b) If $\lambda(\bar H) \ne \zeta$, then $\bar H w = \lambda(\bar H)w$, $w \ne 0$, implies $\bar U\bar U^T w - \bar R\bar R^T w = (\lambda(\bar H) - \zeta)w$ and thus $w \in K_2 = \{q \in \mathrm{range}([\bar U, \bar R]) : q \ne 0\}$. By $[\bar U, \bar R]^T v = 0$ we get

$$ \lambda(\bar H) \le \lambda(A) = \min_{q\ne0}\frac{q^TAq}{q^Tq} \le \min_{q\in K_2}\frac{q^TAq}{q^Tq} = \min_{q\in K_2}\frac{q^T\bar Hq}{q^Tq} \le \frac{w^T\bar Hw}{w^Tw} = \lambda(\bar H), $$

therefore we again obtain $\lambda(\bar H) = \lambda(A)$.

If $z_1 = U^T v$, we have $\bar U^T v = P_1 U^T v = P_1 z_1 = 0$ and similarly $\bar R^T v = 0$ for $z_2 = R^T v$.
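The statement of Lemma 2.2 can likewise be checked numerically. The following sketch (illustrative only; random data, sizes kept small so that the explicit eigenvalue computation stays cheap) uses the choices $z_1 = U^Tv$, $z_2 = R^Tv$ and compares the smallest eigenvalues of $\bar H$ and $A = \bar H + vv^T$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, m = 40, 5
zeta = 2.0
U = rng.standard_normal((N, m))
R = 0.1 * rng.standard_normal((N, m))
v = rng.standard_normal(N)

def reduce(M, z):
    """Return M P with P = I - z z^T / z^T z, so that (M P) z = 0."""
    P = np.eye(M.shape[1]) - np.outer(z, z) / (z @ z)
    return M @ P

U_bar = reduce(U, U.T @ v)     # choice z1 = U^T v from Lemma 2.2
R_bar = reduce(R, R.T @ v)     # choice z2 = R^T v

H_bar = zeta * np.eye(N) + U_bar @ U_bar.T - R_bar @ R_bar.T
A = H_bar + np.outer(v, v)

lam_H = np.linalg.eigvalsh(H_bar)[0]   # smallest eigenvalue of H_bar
lam_A = np.linalg.eigvalsh(A)[0]       # smallest eigenvalue of A
print(lam_H, lam_A, np.isclose(lam_H, lam_A))   # the two minimum eigenvalues coincide
```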

Note that in our case $v = U\bar z_1$ by (2.7), thus the choice $z_1 = U^T v$ leads to the eigenvalue problem $U^TU z_1 = |z_1|\, z_1$; in view of (2.7) the solution of this problem minimizes the damage to the matrix $UU^T$, measured by $\|U\bar z_1\bar z_1^TU^T\| = |U\bar z_1|^2 = z_1^TU^TUz_1/z_1^Tz_1$, if we choose as $z_1$ the eigenvector corresponding to the smallest eigenvalue of $U^TU$. This is valuable also in situations when we attain positive definiteness of the VM matrix after the reduction of matrix $U$ in another way.

3 New limited-memory methods

In this section we focus on the adaptation of standard VM updates to maintain positive definiteness of the matrix $C$. Having the reduced matrix $\bar C$ positive definite, we want to find an updated matrix $H_+$ satisfying the quasi-Newton condition (2.2), in the form $H_+ = C_+ + U_+U_+^T$, where $C_+$ is positive definite. An easy way to do it is shown in Section 3.1 in case of the scaled BFGS method, see [5]. The variant of this approach based on the partly inverse representation of the matrix $C$ is described in Section 3.2. These results are generalized in Section 3.3, using a variational approach, and in Section 3.4, using a special transformation. The choice of parameters, including $z_1$ and $z_2$, is discussed in Section 3.5.

3.1 Adaptation of the scaled BFGS method

The standard scaled BFGS update of $\bar H$ can be written in the form (see [5])

$$ \frac1\gamma H_+^{BFGS} = \frac{\varrho}{\gamma}\,\frac{ss^T}{b} + V\bar HV^T, \qquad (3.1) $$

which in view of $\bar H = \bar C + \bar U\bar U^T$ after easy rearrangements leads to

$$ \frac1\gamma H_+^{BFGS} = \frac1\gamma C_+^{BFGS} + V\bar U\bar U^TV^T = \bar C - rr^T + uu^T + V\bar U\bar U^TV^T, \qquad (3.2) $$

where

$$ u = \frac{1}{\sqrt{\omega b}}\,(\omega s - \bar Cy), \qquad r = \frac{1}{\sqrt{\omega b}}\,\bar Cy, \qquad \omega = \varrho/\gamma + \tilde a/b, \qquad \tilde a = y^T\bar Cy. \qquad (3.3) $$

If we define $(1/\gamma)C_+ = \bar C - rr^T$, we can easily show that the matrix $C_+$ is positive definite.

Lemma 3.1. Let the matrix $\bar C$ be positive definite and the vector $r$ be given by (3.3). Then the matrix $\bar C - rr^T$ is also positive definite.

Proof. For any $q \in \mathbb{R}^N$, $q \ne 0$, we obtain by the Schwarz inequality, (3.3) and $b > 0$

$$ \frac{q^T(\bar C - rr^T)q}{q^T\bar Cq} = 1 - \frac{(q^T\bar Cy)^2}{\omega b\, q^T\bar Cq} \ge 1 - \frac{\tilde a}{\omega b} = 1 - \frac{\tilde a}{b\varrho/\gamma + \tilde a} = \frac{b\varrho/\gamma}{b\varrho/\gamma + \tilde a} > 0. $$

Replacing $\bar C - rr^T$ in (3.2) by $(1/\gamma)C_+$, we see that to get

$$ H_+ = C_+ + U_+U_+^T, \qquad (3.4) $$

we need a matrix $U_+$ satisfying $(1/\gamma)U_+U_+^T = V\bar U\bar U^TV^T + uu^T$. Similarly, considering that $\bar C = \zeta I - \bar R\bar R^T$, we see that to get

$$ C_+ = \zeta_+ I - R_+R_+^T, \qquad (3.5) $$

we need $\zeta_+ > 0$ and a matrix $R_+$ satisfying $(1/\gamma)R_+R_+^T = \bar R\bar R^T + rr^T$. By (2.6) we can fulfil all these requirements if we define

$$ \zeta_+ = \gamma\zeta, \qquad (1/\sqrt\gamma)\,U_+ = V\bar U + u\bar z_1^T, \qquad (1/\sqrt\gamma)\,R_+ = \bar R + r\bar z_2^T. \qquad (3.6) $$

For better clarity, we summarize the updating process. First we choose the vectors $z_1$, $z_2$, see the recommendations in Section 3.5, and then we determine the matrices $\bar U$, $\bar R$, $\bar C$, $\bar H$ by (2.6) and compute the vectors $u$, $r$ by (3.3). Finally we compute the number $\zeta_+$ and the matrices $U_+$, $R_+$ by (3.6) and determine the matrices $C_+$ by (3.5) and $H_+$ by (3.4). In starting iterations we omit the reductions of the matrices $U$, $R$, i.e. $\bar C = C$, $\bar U = U$, $\bar R = R$, and simply add the column $u$ to $V\bar U$ and $r$ to $\bar R$. Note that then $H_+$ is the BFGS update of $H$, unlike $C_+$, which is not any standard VM update of the matrix $C$.
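One complete limited-memory step of Section 3.1 can be prototyped directly from (2.6), (3.3) and (3.6). The sketch below is only an illustration on random dense data (it takes $\gamma = \varrho = 1$ and random projection vectors $z_1$, $z_2$ instead of the recommendations of Section 3.5); it checks the quasi-Newton condition (2.2) and positive definiteness of $H_+$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, m = 60, 5
gamma, rho, zeta = 1.0, 1.0, 2.0

U = rng.standard_normal((N, m))
R = 0.1 * rng.standard_normal((N, m))          # small R => C = zeta*I - R R^T stays positive definite
s = rng.standard_normal(N)
y = s + 0.3 * rng.standard_normal(N)           # a curvature pair with b = s^T y > 0
b = s @ y
assert b > 0

def project_out(M, z):
    """Column reduction M P with P = I - z z^T / z^T z (Lemma 2.1)."""
    return M - np.outer(M @ z, z) / (z @ z)

z1, z2 = rng.standard_normal(m), rng.standard_normal(m)     # Section 3.5 gives better choices
U_bar, R_bar = project_out(U, z1), project_out(R, z2)
C_bar = zeta * np.eye(N) - R_bar @ R_bar.T                  # (2.6)

# scaled BFGS vectors (3.3)
Cy = C_bar @ y
omega = rho / gamma + (y @ Cy) / b
u = (omega * s - Cy) / np.sqrt(omega * b)
r = Cy / np.sqrt(omega * b)

# limited-memory update (3.6)
V = np.eye(N) - np.outer(s, y) / b
zb1, zb2 = z1 / np.linalg.norm(z1), z2 / np.linalg.norm(z2)
U_new = np.sqrt(gamma) * (V @ U_bar + np.outer(u, zb1))
R_new = np.sqrt(gamma) * (R_bar + np.outer(r, zb2))
zeta_new = gamma * zeta

H_new = zeta_new * np.eye(N) + U_new @ U_new.T - R_new @ R_new.T
print(np.allclose(H_new @ y, rho * s))                      # quasi-Newton condition (2.2)
print(np.linalg.eigvalsh(H_new)[0] > 0)                     # positive definiteness
```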

3.2 Partly inverse representation of the scaled BFGS method

In the previous method we can change $\zeta$ only by the scaling (3.6) to guarantee that $C_+$ will be positive definite. This drawback can be overcome when we use a partly inverse representation of the scaled BFGS update. In starting iterations we can rewrite the BFGS update of the matrix $\bar C = C$ in (3.2) in the form

$$ \frac1\gamma C_+^{BFGS} = \Big(C^{-1} + \frac{\gamma}{\varrho b}\,yy^T\Big)^{-1} + uu^T \qquad (3.7) $$

with $u$ given again by (3.2), which can be readily verified. Thus instead of $(1/\gamma)C_+ = C - rr^T$, we can use the equivalent formula

$$ C_+^{-1} = \frac1\gamma C^{-1} + \frac{1}{\varrho b}\,yy^T. \qquad (3.8) $$

Starting with $C_0 = I$ and updating $C_0^{-1}, \dots, C_k^{-1}$ according to (3.8), we obtain

$$ C^{-1} = \bar C^{-1} = (1/\bar\gamma)\,I + YDY^T, \qquad (3.9) $$

where $\bar\gamma = \gamma_1\cdots\gamma_k$, $Y = [y_1, \dots, y_k]$ and the diagonal positive definite matrix $D$ is given by $D = [1/(\varrho_1 b_1)]$ and by the update formula

$$ D_+ = \mathrm{diag}\Big(\frac1\gamma D,\ \frac{1}{\varrho b}\Big). \qquad (3.10) $$

In view of (3.9), we store the matrices $Y$, $D$ instead of $R$. When $k \ge m$, we reduce the matrix $U$ in the same way as before, i.e. $\bar U = UP_1$, while in case of the matrices $Y$, $D$ we can simply delete their first columns, i.e. supposing that $Y = [y_{k-m}, \dots, y_{k-1}]$, $D = \mathrm{diag}(d_1, \dots, d_m)$ and $C^{-1} = (1/\zeta)I + YDY^T$, $\zeta > 0$, we set

$$ \bar Y = [y_{k-m+1}, \dots, y_{k-1}], \qquad \bar D = \mathrm{diag}(d_2, \dots, d_m), \qquad \bar C^{-1} = (1/\zeta)\,I + \bar Y\bar D\bar Y^T. \qquad (3.11) $$
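The equivalence of (3.8) with the direct formula $(1/\gamma)C_+ = C - rr^T$ is a consequence of the Sherman-Morrison formula and can be confirmed numerically. The sketch below (illustrative only; dense random data, $\gamma = \varrho = 1$) builds both representations and compares them.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 40
gamma, rho, zeta = 1.0, 1.0, 1.5

R = 0.1 * rng.standard_normal((N, 4))
C = zeta * np.eye(N) - R @ R.T                 # positive definite for small R
s = rng.standard_normal(N)
y = s + 0.3 * rng.standard_normal(N)
b = s @ y
assert b > 0

# direct form: (1/gamma) C_+ = C - r r^T with r from (3.3)
Cy = C @ y
omega = rho / gamma + (y @ Cy) / b
C_plus = gamma * (C - np.outer(Cy, Cy) / (omega * b))

# partly inverse form (3.8): C_+^{-1} = (1/gamma) C^{-1} + (1/(rho*b)) y y^T
C_plus_inv = np.linalg.inv(C) / gamma + np.outer(y, y) / (rho * b)

print(np.allclose(C_plus @ C_plus_inv, np.eye(N)))    # True: the two representations agree
```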

To get $H_+ = C_+ + U_+U_+^T$, we define $\zeta_+$, $U_+$ again by (3.6) and $C_+^{-1}$ in view of (3.8) by

$$ C_+^{-1} = \frac{1}{\zeta_+}I + Y_+D_+Y_+^T, \qquad Y_+ = [\bar Y, y], \qquad D_+ = \mathrm{diag}\Big(\frac1\gamma\bar D,\ \frac{1}{\varrho b}\Big). \qquad (3.12) $$

To be able to compute $\bar Cy$, we store the $m \times m$ matrix $Y^TY$ and use the Woodbury formula, which yields

$$ \bar C = \Big(\frac1\zeta I + \bar Y\bar D\bar Y^T\Big)^{-1} = \zeta I - \zeta\,\bar Y\Big(\frac1\zeta\bar D^{-1} + \bar Y^T\bar Y\Big)^{-1}\bar Y^T, \qquad (3.13) $$

i.e. to compute $\bar Cy$, we need to solve a linear system with the symmetric positive definite matrix $(1/\zeta)\bar D^{-1} + \bar Y^T\bar Y$. Note that during the computation of $\bar Cy$ we can save the vector $\bar Y^Ty$ and use it in the updating of the matrix $Y^TY$, since by (3.12) we obtain

$$ Y_+^TY_+ = \begin{pmatrix} \bar Y^T\bar Y & \bar Y^Ty \\ y^T\bar Y & y^Ty \end{pmatrix}. \qquad (3.14) $$

Now we summarize the updating process. First we choose the vector $z_1$, see the recommendations in Section 3.5, and then we determine the matrix $\bar U$ by (2.6), $\bar Y$, $\bar D$, $\bar C^{-1}$ by (3.11) and $\bar H$ by (2.6) and compute the vector $u$ by (3.3), using (3.13). Finally we compute the number $\zeta_+$ and the matrix $U_+$ by (3.6) and determine the matrices $Y_+$, $D_+$, $C_+^{-1}$ by (3.12), using (3.14), and $H_+$ by (3.4).

If we wish to use a different projection instead of deleting the first columns of $Y$ and $D$ (although we are not able to say at this time how to choose a suitable projection), we can use another representation of $C^{-1}$. In starting iterations we rewrite (3.9) in the form

$$ C^{-1} = \bar C^{-1} = (1/\bar\gamma)\,I + \tilde Y\tilde Y^T, \qquad \tilde Y = YD^{1/2}. \qquad (3.15) $$

Then for $k \ge m$ we reduce the matrix $U$ as before, i.e. $\bar U = UP_1$, and the matrix $\tilde Y$ in a similar way, i.e. $\bar{\tilde Y} = \tilde YP_3$, $P_3 = I - \bar z_3\bar z_3^T$, $\bar z_3 = z_3/|z_3|$, with some $z_3 \in \mathbb{R}^m$, and set

$$ \bar C^{-1} = (1/\zeta)\,I + \bar{\tilde Y}\bar{\tilde Y}^T, \qquad \zeta > 0. \qquad (3.16) $$

To get $H_+ = C_+ + U_+U_+^T$, we define $\zeta_+$, $U_+$ again by (3.6) and $C_+^{-1}$ in view of (3.8) by

$$ C_+^{-1} = \frac{1}{\zeta_+}I + \tilde Y_+\tilde Y_+^T, \qquad \tilde Y_+ = \frac{1}{\sqrt\gamma}\,\bar{\tilde Y} + \frac{1}{\sqrt{\varrho b}}\,y\bar z_3^T. \qquad (3.17) $$

To be able to compute $\bar Cy$, we store the $m \times m$ matrix $\tilde Y^T\tilde Y$ and use the Woodbury formula, which yields

$$ \bar C = \Big(\frac1\zeta I + \bar{\tilde Y}\bar{\tilde Y}^T\Big)^{-1} = \zeta I - \zeta\,\bar{\tilde Y}\Big(\frac1\zeta I + \bar{\tilde Y}^T\bar{\tilde Y}\Big)^{-1}\bar{\tilde Y}^T. $$

Note that during the computation of $\bar Cy$ we can again save the vector $\bar{\tilde Y}^Ty$ and use it in the updating of the matrix $\tilde Y^T\tilde Y$, since by (3.17) we obtain

$$ \tilde Y_+^T\tilde Y_+ = \frac1\gamma\,\bar{\tilde Y}^T\bar{\tilde Y} + \frac{1}{\sqrt{\varrho\gamma b}}\Big(\bar{\tilde Y}^Ty\,\bar z_3^T + \bar z_3\,y^T\bar{\tilde Y}\Big) + \frac{y^Ty}{\varrho b}\,\bar z_3\bar z_3^T. $$
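Formula (3.13) is the computational core of this section: the product $\bar Cy$ is obtained from $\zeta$, $\bar Y$, $\bar D$ by solving only an $m \times m$ system. A minimal sketch follows (illustrative only; random data, $\bar D$ a random positive diagonal); it compares the result with a dense reference computation.

```python
import numpy as np

rng = np.random.default_rng(5)
N, m = 80, 6
zeta = 1.3
Y = rng.standard_normal((N, m))
d = rng.uniform(0.5, 2.0, m)               # diagonal of the positive definite matrix D
y = rng.standard_normal(N)

# C_bar * y from (3.13): C_bar = zeta*I - zeta * Y ((1/zeta) D^{-1} + Y^T Y)^{-1} Y^T
YtY = Y.T @ Y                              # the stored m x m matrix
small = np.diag(1.0 / d) / zeta + YtY      # symmetric positive definite m x m system
Cy = zeta * y - zeta * (Y @ np.linalg.solve(small, Y.T @ y))

# reference: invert C_bar^{-1} = (1/zeta) I + Y D Y^T densely (feasible only for small N)
C_inv = np.eye(N) / zeta + Y @ np.diag(d) @ Y.T
print(np.allclose(Cy, np.linalg.solve(C_inv, y)))     # True
```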

3.3 Variationally-derived generalization

Although the scaled Broyden class updates of $\bar H$ with a positive value of the parameter $\eta$ can be written in a quasi-product form similar to (3.1), see [7], we cannot construct limited-memory methods in the same way as in the previous sections, since then the projection matrix contains the vector $\bar Hy \ne \bar Cy$ and therefore we do not obtain a relation similar to (3.2). In spite of that, it is possible to generalize this process to the standard scaled Broyden class updates, as we show in the next section. In this section we briefly describe two other possibilities.

The first approach is based on the projection variant, see [8], of the well-known Greenstadt theorem, see [4]:

Theorem 3.1. Let $M$, $T$ be symmetric matrices, $T$ positive definite, $\varrho > 0$, $p = Ty$, and denote by $\mathcal{M}$ the set of $N \times N$ symmetric matrices. Then the unique solution to

$$ \min\{\|T^{-1/2}(M_+ - M)T^{-1/2}\|_F : M_+ \in \mathcal{M}\} \quad \text{s.t.} \quad M_+y = \varrho s $$

is determined by the relation $V_p(M_+ - M)V_p^T = 0$ and can be written in the form

$$ M_+ = E + V_p(M - E)V_p^T, $$

where $E$ is any symmetric matrix satisfying $Ey = \varrho s$, e.g. $E = (\varrho/b)ss^T$.

Using this theorem with $M = \gamma\bar H = \gamma(\bar C + \bar U\bar U^T)$, $M_+ = H_+$, $p = s - \alpha\bar Cy$, $\alpha \in \mathbb{R}$, and $E = (\varrho/b)ss^T$, we obtain

$$ \frac1\gamma H_+ = V_p\bar U\bar U^TV_p^T + \frac1\gamma C_+^{BC}, \qquad \frac1\gamma C_+^{BC} = \frac{\varrho}{\gamma b}\,ss^T + V_p\Big(\bar C - \frac{\varrho}{\gamma b}\,ss^T\Big)V_p^T, \qquad (3.18) $$

where by Lemma 2.3 in [8] this $C_+^{BC}$ is the Broyden class update, see [5], of $\bar C$ with the parameter

$$ \eta_C = \frac{b^2}{(b - \alpha\tilde a)^2}\Big(1 - \alpha^2\,\frac{\varrho}{\gamma}\,\frac{\tilde a}{b}\Big). \qquad (3.19) $$

For a given $\eta_C$ satisfying

$$ 0 < \eta_C < \omega\gamma/\varrho, \qquad \omega = \frac{\varrho}{\gamma} + \frac{\tilde a}{b}\,\eta_C, \qquad (3.20) $$

we can readily verify that the corresponding $\alpha$ can be obtained from

$$ \alpha_1 = \frac{\eta_C + \sqrt{\eta_C + (1-\eta_C)\,\varrho b/(\gamma\tilde a)}}{\omega} \qquad \text{or} \qquad \alpha_2 = \frac{(b/\tilde a)(\eta_C - 1)}{\eta_C + \sqrt{\eta_C + (1-\eta_C)\,\varrho b/(\gamma\tilde a)}}. \qquad (3.21) $$

Using the usual form of the Broyden class update of $\bar C$ in (3.18), after straightforward rearrangements we obtain the relation, similar to (3.2),

$$ \frac1\gamma H_+ = \frac1\gamma C_+^{BC} + V_p\bar U\bar U^TV_p^T = \bar C - \tilde r\tilde r^T + \tilde u\tilde u^T + V_p\bar U\bar U^TV_p^T, \qquad (3.22) $$

where

$$ \tilde u = \sqrt{\frac{\omega}{b}}\,\Big(s - \frac{\eta_C}{\omega}\,\bar Cy\Big), \qquad \tilde r = \sqrt{1 - \frac{\varrho}{\gamma}\,\frac{\eta_C}{\omega}}\;\frac{\bar Cy}{\sqrt{\tilde a}}. \qquad (3.23) $$
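The vectors (3.23) can be checked directly against the definition (3.18): for an admissible $\eta_C$ and $\alpha_1$ from (3.21), the matrix $\bar C - \tilde r\tilde r^T + \tilde u\tilde u^T$ should coincide with $(1/\gamma)C_+^{BC}$ and therefore satisfy $C_+^{BC}y = \varrho s$. The following sketch (illustrative only; random data, $\gamma = \varrho = 1$, $\eta_C = 0.5$) performs this comparison.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 50
gamma, rho, zeta = 1.0, 1.0, 1.4
eta_C = 0.5                                     # any value satisfying (3.20) for this data

R = 0.1 * rng.standard_normal((N, 4))
C_bar = zeta * np.eye(N) - R @ R.T              # positive definite reduced matrix
s = rng.standard_normal(N)
y = s + 0.3 * rng.standard_normal(N)
b = s @ y
assert b > 0
Cy = C_bar @ y
a_t = y @ Cy

omega = rho / gamma + eta_C * a_t / b           # (3.20)
assert 0 < eta_C < omega * gamma / rho
alpha = (eta_C + np.sqrt(eta_C + (1 - eta_C) * rho * b / (gamma * a_t))) / omega   # alpha_1 in (3.21)

# rank-two form (3.22)-(3.23)
u_t = np.sqrt(omega / b) * (s - (eta_C / omega) * Cy)
r_t = np.sqrt(1 - (rho / gamma) * (eta_C / omega)) * Cy / np.sqrt(a_t)
lhs = C_bar - np.outer(r_t, r_t) + np.outer(u_t, u_t)

# projection form (3.18) with p = s - alpha * C_bar y and E = (rho/(gamma*b)) s s^T
p = s - alpha * Cy
Vp = np.eye(N) - np.outer(p, y) / (p @ y)
E = (rho / (gamma * b)) * np.outer(s, s)
rhs = E + Vp @ (C_bar - E) @ Vp.T

print(np.allclose(lhs, rhs))                    # True: (3.23) reproduces the update (3.18)
print(np.allclose(gamma * lhs @ y, rho * s))    # quasi-Newton condition for C_+^{BC}
```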

Note that $H_+$ is not here the Broyden class update of $\bar H$. Similarly as in Section 3.1 we define $(1/\gamma)C_+ = \bar C - \tilde r\tilde r^T = \zeta I - \bar R\bar R^T - \tilde r\tilde r^T$. The following lemma gives simple conditions for the matrix $C_+$ to be positive definite.

Lemma 3.2. Let the matrix $\bar C$ be positive definite, $\eta_C$ satisfy (3.20) and the vector $\tilde r$ be given by (3.23). Then the matrix $\bar C - \tilde r\tilde r^T$ is also positive definite.

Proof. For any $q \in \mathbb{R}^N$, $q \ne 0$, we obtain by the Schwarz inequality and $b > 0$

$$ \frac{q^T(\bar C - \tilde r\tilde r^T)q}{q^T\bar Cq} = 1 - \Big(1 - \frac{\varrho}{\gamma}\,\frac{\eta_C}{\omega}\Big)\frac{(q^T\bar Cy)^2}{\tilde a\, q^T\bar Cq} \ge 1 - \Big(1 - \frac{\varrho}{\gamma}\,\frac{\eta_C}{\omega}\Big) = \frac{\varrho}{\gamma}\,\frac{\eta_C}{\omega} > 0. $$

To get $H_+ = C_+ + U_+U_+^T$, we can similarly as in (3.6) set for $k \ge m$

$$ \zeta_+ = \gamma\zeta, \qquad (1/\sqrt\gamma)\,U_+ = V_p\bar U + \tilde u\bar z_1^T, \qquad (1/\sqrt\gamma)\,R_+ = \bar R + \tilde r\bar z_2^T, \qquad (3.24) $$

and for $k < m$ simply add the column $\tilde u$ to $V_p\bar U$ and $\tilde r$ to $\bar R$.

The second approach utilizes the idea of shifting, similarly as the shifted VM methods, see [7]. We again update $\bar C$, using the scaled standard Broyden class update (3.18) in the usual form $(1/\gamma)C_+^{BC} = \bar C - \tilde r\tilde r^T + \tilde u\tilde u^T$ with $\tilde r$, $\tilde u$ given by (3.23), set $(1/\gamma)C_+ = \bar C - \tilde r\tilde r^T$ and update $\zeta$ and $R$ according to (3.24). To update the matrix $U$, we use the shifted quasi-Newton condition

$$ U_+U_+^Ty = \varrho\,\bar s, \qquad (3.25) $$

where in view of $H_+ = C_+ + U_+U_+^T$, (2.2) and (3.23) we define

$$ \bar s = \frac1\varrho\big(H_+y - C_+y\big) = s - \frac\gamma\varrho\big(\bar Cy - (\tilde r^Ty)\,\tilde r\big) = s - \frac\gamma\varrho\Big[1 - \Big(1 - \frac\varrho\gamma\,\frac{\eta_C}{\omega}\Big)\Big]\bar Cy = s - \frac{\eta_C}{\omega}\,\bar Cy. \qquad (3.26) $$

An important property of the shifting defined in this way is that the number $\bar b = \bar s^Ty$ is always positive, since by (3.26)

$$ \bar b = \Big(s - \frac{\eta_C}{\omega}\,\bar Cy\Big)^Ty = b - \frac{\eta_C}{\omega}\,\tilde a = \frac{b\omega - \tilde a\,\eta_C}{\omega} = \frac{\varrho}{\gamma}\,\frac{b}{\omega} > 0. \qquad (3.27) $$

Now we define the matrix $U_+$ by the following theorem, see [7].

Theorem 3.2. Let $T$ be a symmetric positive definite matrix, $\varrho > 0$, $\gamma > 0$, $p = Ty$, and denote by $\mathcal{U}$ the set of $N \times m$, $m \le N$, matrices. Then the unique solution to

$$ \min\{\varphi(U_+) : U_+ \in \mathcal{U}\} \quad \text{s.t.} \quad (3.25), \qquad \varphi(U_+) = y^TTy\,\|T^{-1/2}(U_+ - \sqrt\gamma\,\bar U)\|_F^2, $$

is

$$ (1/\sqrt\gamma)\,U_+ = V_p\bar U\bar P + (1/\bar b)\,\bar s\tilde z^T, \qquad \tilde z = (1/\sqrt\gamma)\,U_+^Ty, \qquad \bar P = I - \tilde z\tilde z^T/\tilde z^T\tilde z. $$

It follows from (3.25) and (3.27) that the vector $\tilde z$ defined by this theorem satisfies

$$ \tilde z^T\tilde z = \frac1\gamma\,y^TU_+U_+^Ty = \frac\varrho\gamma\,\bar s^Ty = \bar b^2\,\frac{\varrho/\gamma}{\bar b} = \bar b^2\,\frac{\omega}{b}, $$

which yields by (3.26) and (3.23) $\big(\sqrt{\tilde z^T\tilde z}/\bar b\big)\bar s = \sqrt{\omega/b}\,\bar s = \tilde u$. Setting $z_1 = \tilde z$, Theorem 3.2 gives for $U_+$ the updating formula

$$ (1/\sqrt\gamma)\,U_+ = V_p\bar U + \big(\sqrt{\tilde z^T\tilde z}/\bar b\big)\,\bar s\,\bar z_1^T = V_p\bar U + \tilde u\bar z_1^T, \qquad (3.28) $$

which is the corresponding formula in (3.24); we see that the projection matrix in this update formula can be derived not only from the update of the matrix $C$, but also in another way, e.g. from the update of $H$.

Summarizing the updating process, we see that it is very similar to the process in Section 3.1. First we choose the vectors $z_1$, $z_2$, see the recommendations in Section 3.5, and then we determine the matrices $\bar U$, $\bar R$, $\bar C$, $\bar H$ by (2.6) and compute the vectors $\tilde u$, $\tilde r$ by (3.23). Finally we compute the number $\zeta_+$ and the matrices $U_+$ by (3.24) or (3.28) and $R_+$ by (3.24) and determine the matrices $C_+$ by (3.5) and $H_+$ by (3.4).

The previous methods can also be modified to use the partly inverse representation of updates. We can readily verify the following generalization of (3.7) for the scaled standard Broyden class update with parameter $\eta_C \ne 0$

$$ \frac1\gamma C_+^{BC} = \Big[\bar C^{-1} + \Big(\frac{1-\eta_C}{\eta_C\,\tilde a} + \frac{\gamma}{\varrho b}\Big)yy^T\Big]^{-1} + \tilde u\tilde u^T, \qquad (3.29) $$

which can be used instead of $(1/\gamma)C_+^{BC} = \bar C - \tilde r\tilde r^T + \tilde u\tilde u^T$ in (3.22).

3.4 Generalization to the scaled Broyden class using a transformation

Methods based on the adaptation of the scaled BFGS method can be generalized to the standard Broyden class updates with parameter $\eta$ if we use the following special representation of these updates. We denote

$$ a = y^T\bar Hy, \qquad \omega = \frac{\varrho}{\gamma} + \frac{a}{b}\,\eta, \qquad \mu = \eta + (1-\eta)\,\frac{\varrho}{\gamma}\,\frac{b}{a}. $$

Theorem 3.3. Let $\varrho > 0$, $\gamma > 0$, $a\omega \ne 0$, $\mu \ge 0$ and $\beta = (\eta \pm \sqrt\mu)/\omega$. Then the scaled standard Broyden class update of $\bar H$ with parameter $\eta$, scaling parameter $\gamma$ and nonquadratic correction parameter $\varrho$ can be expressed in the form

$$ \frac1\gamma H_+^{BC} = \frac{\varrho}{\gamma}\,\eta\,\frac{\hat s\hat s^T}{b} + \hat V\bar H\hat V^T, \qquad \hat s = s - \beta\bar Hy, \qquad \hat V = I \pm \frac{\sqrt\mu}{b}\,\hat sy^T. \qquad (3.30) $$

Moreover, if $\hat\omega = (\varrho/\gamma)\eta + (\tilde a/b)\mu > 0$, we can write

$$ \frac1\gamma H_+^{BC} = \bar C - \hat r\hat r^T + \hat u\hat u^T + \hat V\bar U\bar U^T\hat V^T, \qquad \hat r = \sqrt{\frac{\mu}{\hat\omega b}}\,\bar Cy, \qquad \hat u = \sqrt{\frac{\hat\omega}{b}}\,\hat s \pm \hat r. \qquad (3.31) $$

Proof. Consider the scaled Broyden class update with parameters $\eta$, $\gamma$ and $\varrho$ in the form, see [5],

$$ \frac1\gamma H_+^{BC} = \bar H + \frac{\omega}{b}\,ss^T - \frac{\eta}{b}\big(\bar Hys^T + sy^T\bar H\big) + \frac{\eta-1}{a}\,\bar Hyy^T\bar H. $$

Setting $s = \hat s + \xi\bar Hy$, $\xi \in \mathbb{R}$, we obtain

$$ \frac1\gamma H_+^{BC} = \bar H + \frac{\omega}{b}\,\hat s\hat s^T + \frac{\xi\omega - \eta}{b}\big(\bar Hy\hat s^T + \hat sy^T\bar H\big) + \Big(\frac{\eta-1}{a} + \frac{\xi^2\omega - 2\xi\eta}{b}\Big)\bar Hyy^T\bar H. $$

The last term vanishes for $\xi^2\omega - 2\xi\eta + (b/a)(\eta-1) = 0$, i.e. for $\xi = (\eta \pm \sqrt\mu)/\omega = \beta$; then $\xi\omega - \eta = \pm\sqrt\mu$ and

$$ \frac1\gamma H_+^{BC} = \bar H + \frac{\omega}{b}\,\hat s\hat s^T \pm \frac{\sqrt\mu}{b}\big(\bar Hy\hat s^T + \hat sy^T\bar H\big) = \hat V\bar H\hat V^T + \Big(\omega - \frac{a\mu}{b}\Big)\frac{\hat s\hat s^T}{b}, $$

which is (3.30) in view of $b\omega - a\mu = (\varrho/\gamma)\,b\,[1 - (1-\eta)] + a\eta - a\eta = (\varrho/\gamma)\,\eta b$. Setting $\bar H = \bar C + \bar U\bar U^T$ into (3.30), we get

$$ \frac1\gamma H_+^{BC} = \frac{\varrho}{\gamma}\,\eta\,\frac{\hat s\hat s^T}{b} + \hat V\bar U\bar U^T\hat V^T + \Big(I \pm \frac{\sqrt\mu}{b}\,\hat sy^T\Big)\bar C\Big(I \pm \frac{\sqrt\mu}{b}\,y\hat s^T\Big) $$
$$ = \hat V\bar U\bar U^T\hat V^T + \bar C \pm \frac{\sqrt\mu}{b}\big(\bar Cy\hat s^T + \hat sy^T\bar C\big) + \Big(\frac{\varrho}{\gamma}\,\eta + \frac{\tilde a}{b}\,\mu\Big)\frac{\hat s\hat s^T}{b} $$
$$ = \hat V\bar U\bar U^T\hat V^T + \bar C + \frac{\hat\omega}{b}\Big(\hat s \pm \frac{\sqrt\mu}{\hat\omega}\,\bar Cy\Big)\Big(\hat s \pm \frac{\sqrt\mu}{\hat\omega}\,\bar Cy\Big)^T - \frac{\mu}{\hat\omega b}\,\bar Cyy^T\bar C, $$

which is (3.31).

Note that $\hat V$ is not a projection matrix in general and that we prefer the minus sign in $\beta$, $\hat V$ and $\hat u$, since then for $\eta = 1$ (BFGS) we get $\beta = 0$, $\hat s = s$ and $\hat V = V$. For $\eta \to 1$ we also have $\mu \to 1$, therefore the formula for $\beta$ above should be rewritten in another form. In view of

$$ \eta^2 - \mu = \eta^2 - \eta - (1-\eta)\,\varrho b/(\gamma a) = (\eta-1)\big(\eta + \varrho b/(\gamma a)\big) = (\eta-1)\,\omega b/a $$

we obtain

$$ \beta = \frac{\eta - \sqrt\mu}{\omega} = \frac{\eta^2 - \mu}{\omega(\eta + \sqrt\mu)} = \frac{(\eta-1)\,\omega b/a}{\omega(\eta + \sqrt\mu)} = \frac{(\eta-1)\,b/a}{\eta + \sqrt\mu}. $$

For a better understanding, the condition $\mu \ge 0$ can be rewritten as $\eta(\varrho b - \gamma a) \le \varrho b$, i.e. $\eta \le \eta^{SR1}$ for $\eta^{SR1} > 0$, or $\eta \ge \eta^{SR1}$ for $\eta^{SR1} < 0$, where $\eta^{SR1}$ is the value of the parameter $\eta$ for the SR1 method, $\eta^{SR1} = \varrho b/(\varrho b - \gamma a)$, see [5].

Now we define $(1/\gamma)C_+ = \bar C - \hat r\hat r^T$ and give conditions for the matrix $C_+$ to be positive definite.

Lemma 3.3. Let the assumptions of Theorem 3.3 be satisfied, the matrix $\bar C$ be positive definite, the vector $\hat r$ be given by (3.31) and $\eta > 0$. Then $\bar C - \hat r\hat r^T$ is also positive definite.

Proof. For any $q \in \mathbb{R}^N$, $q \ne 0$, we obtain by the Schwarz inequality and $b > 0$

$$ \frac{q^T(\bar C - \hat r\hat r^T)q}{q^T\bar Cq} = 1 - \frac{\mu\,(q^T\bar Cy)^2}{\hat\omega b\, q^T\bar Cq} \ge 1 - \frac{\mu\tilde a}{\hat\omega b} = 1 - \frac{\mu\tilde a}{\eta b\varrho/\gamma + \mu\tilde a} = \frac{\eta b\varrho/\gamma}{\eta b\varrho/\gamma + \mu\tilde a} > 0. $$

To get $H_+ = C_+ + U_+U_+^T$, we can set for $k \ge m$, similarly as in (3.6),

$$ \zeta_+ = \gamma\zeta, \qquad (1/\sqrt\gamma)\,U_+ = \hat V\bar U + \hat u\bar z_1^T, \qquad (1/\sqrt\gamma)\,R_+ = \bar R + \hat r\bar z_2^T, \qquad (3.32) $$

and for $k < m$ simply add the column $\hat u$ to $\hat V\bar U$ and $\hat r$ to $\bar R$.

Summarizing the updating process, we see that it is again very similar to the process in Section 3.1. First we choose the vectors $z_1$, $z_2$, see the recommendations in Section 3.5, and then we determine the matrices $\bar U$, $\bar R$, $\bar C$, $\bar H$ by (2.6) and compute the vector $\hat s$ by (3.30) and the vectors $\hat u$, $\hat r$ by (3.31). Finally we compute the number $\zeta_+$ and the matrices $U_+$, $R_+$ by (3.32) and determine the matrices $C_+$ by (3.5) and $H_+$ by (3.4).

It can be readily verified that the partly inverse representation of (3.31) has the form

$$ \frac1\gamma H_+^{BC} = \Big[\bar C^{-1} + \Big(\frac{1-\eta}{\eta a} + \frac{\gamma}{\varrho b}\Big)yy^T\Big]^{-1} + \hat u\hat u^T + \hat V\bar U\bar U^T\hat V^T. \qquad (3.33) $$

3.5 Choice of parameters

The efficiency of all methods described in this report depends very much on a suitable choice of parameters. Since the analysis of the VM matrix damage caused by projections is very complicated, the recommendations given here have an empirical character.

The choice of the scaling parameter $\gamma$ plays the main role. The value $\gamma = 1$, i.e. scaling omitted in some iterations, is not good here in general. In starting iterations ($\bar H = H$, $\bar C = C$) the value $\gamma = b/a$ appears to be the best choice; the choices $\gamma = b/\tilde a$, $\gamma = b/\sqrt{a\tilde a}$ give approximately the same results. In other iterations the choice $\gamma = b/\sqrt{a\tilde a}$ gives good results; the value $\gamma = b\big/\sqrt{\tilde a\,\max(a,\ \tilde a + |U^TBs|^2)}$ seems to be the best choice. In all iterations, if $\gamma$ computed in this way is less than $10^{-3}$, we set $\gamma = b/\tilde a$.

Also the choice of the projection vector $z_1$ has a great influence. Good choices are

$$ z_1^{(1)} = U^TBs - \frac{y^TUU^TBs}{|U^Ty|^2}\,U^Ty, \qquad z_1^{(2)} = U^TBs - \frac{|U^TBs|}{\sqrt{y^THy}}\,U^Ty; $$

we use the combination $z_1 = (1-\theta)\,z_1^{(1)} + \theta\,z_1^{(2)}$ with $\theta = |U^Ty|^2/\big(|U^Ty|^2 + |U^TBs|^2\big)$. Note that $y^TUz_1^{(1)} = 0$. On the other hand, the choice of the vector $z_2$ has only little influence. We use the vector

$$ z_2 = (y^TRR^TBs)\,R^TBs - |R^TBs|^2\,R^Ty, $$

which satisfies $s^TBRz_2 = 0$. As regards the nonquadratic correction parameter $\varrho$, no value seems to be the best, therefore we choose $\varrho = 1$.
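The recommendations of this section translate directly into code. The sketch below is only an illustration (the function and variable names are ours, not the report's, and it assumes the scalars $a = y^THy$, $\tilde a = y^T\bar Cy$, $b = s^Ty$ and the products $U^TBs$, $U^Ty$, $R^TBs$, $R^Ty$ are already available); it computes the scaling parameter $\gamma$ and the projection vectors $z_1$, $z_2$.

```python
import numpy as np

def choose_gamma(a, a_tilde, b, UtBs, starting_iteration):
    """Scaling parameter gamma following the recommendations of Section 3.5."""
    if starting_iteration:
        return b / a                                    # gamma = b/a in starting iterations
    gamma = b / np.sqrt(a_tilde * max(a, a_tilde + UtBs @ UtBs))
    return gamma if gamma >= 1e-3 else b / a_tilde      # safeguard used in all iterations

def choose_z1(UtBs, Uty, a):
    """Combination z1 = (1 - theta) z1^(1) + theta z1^(2); note y^T U z1^(1) = 0."""
    z1a = UtBs - (Uty @ UtBs) / (Uty @ Uty) * Uty
    z1b = UtBs - np.linalg.norm(UtBs) / np.sqrt(a) * Uty
    theta = (Uty @ Uty) / (Uty @ Uty + UtBs @ UtBs)
    return (1.0 - theta) * z1a + theta * z1b

def choose_z2(RtBs, Rty):
    """Vector z2 = (y^T R R^T B s) R^T B s - |R^T B s|^2 R^T y, so that s^T B R z2 = 0."""
    return (Rty @ RtBs) * RtBs - (RtBs @ RtBs) * Rty

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    m = 10
    UtBs, Uty = rng.standard_normal(m), rng.standard_normal(m)
    RtBs, Rty = rng.standard_normal(m), rng.standard_normal(m)
    a, a_tilde, b = 3.0, 2.5, 1.0                       # hypothetical scalar products
    print(choose_gamma(a, a_tilde, b, UtBs, starting_iteration=False))
    print(np.isclose(RtBs @ choose_z2(RtBs, Rty), 0.0)) # True: s^T B R z2 = 0
```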

4 Computational experiments

Our new limited-memory VM methods were tested using the collection of sparse and partially separable test problems from [6] (Test 14 and Test 15, 22 problems each) with $N = 1000$, $m = 10$, $\varrho = 1$, $\eta = 0.8$ in starting iterations and $\eta = 1$ otherwise, and the final precision $\|g(x^\star)\| \le 10^{-6}$ for Test 14 and $\|g(x^\star)\| \le 10^{-5}$ for Test 15.

In Table 1 we compare our method described in Section 3.4 (PLM) with the following methods: the method given in [2] (BNS), the shifted limited-memory method (SLM), see [7], and the variationally-derived limited-memory method (VLM), see [8]. We present the total numbers of function and also gradient evaluations over all problems (NFE) and the total computational time (Time) in seconds.

Table 1: Comparison with other methods (BNS, SLM, VLM, PLM; total NFE and Time in seconds) for Test 14 and Test 15.

We see that the numerical results of the new methods are comparable with the other methods as regards the number of evaluations. The longer total computational time is caused by the fact that the numbers of columns of the matrices $U$, $R$ are the same in this test version of the method. We plan to reduce the number of columns of the matrix $R$ in the future, see Section 2.

For a better comparison with the method BNS, we performed additional tests with problems from the widely used CUTE collection [1] with various dimensions $N$ and the final precision $\|g(x^\star)\| \le 10^{-6}$. The results are given in Table 2, where NFE is the number of function and also gradient evaluations and Time the computational time in seconds.

Table 2: Comparison with other methods (BNS, SLM, VLM, PLM; NFE and Time in seconds) for the CUTE problems BROYDN7D, CURLY, DIXMAANI, DQRTIC, FLETCBV, GENHUMPS, GENROSE, MSQRTALS, NCB20B, NONCVXU, NONDQUAR, POWER and QUARTC.

Our limited numerical experiments indicate that additional improvements and testing of the new methods could be useful.

References

[1] I. Bongartz, A.R. Conn, N. Gould, P.L. Toint: CUTE: constrained and unconstrained testing environment, ACM Transactions on Mathematical Software 21 (1995).

[2] R.H. Byrd, J. Nocedal, R.B. Schnabel: Representation of quasi-Newton matrices and their use in limited memory methods, Math. Programming 63 (1994).

[3] R. Fletcher: Practical Methods of Optimization, John Wiley & Sons, Chichester, 1987.

[4] J. Greenstadt: Variations on variable metric methods, Math. Comput. 24 (1970) 1-22.

[5] L. Lukšan, E. Spedicato: Variable metric methods for unconstrained optimization and nonlinear least squares, J. Comput. Appl. Math. 124 (2000).

[6] L. Lukšan, J. Vlček: Sparse and partially separable test problems for unconstrained and equality constrained optimization, Report V-767, Prague, ICS AS CR, 1998.

[7] J. Vlček, L. Lukšan: Shifted variable metric methods for large-scale unconstrained optimization, J. Comput. Appl. Math. 186 (2006).

[8] J. Vlček, L. Lukšan: New class of limited-memory variationally-derived variable metric methods, Report V-973, Prague, ICS AS CR, 2006.


More information

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical

More information

Quasi-Newton Methods

Quasi-Newton Methods Quasi-Newton Methods Werner C. Rheinboldt These are excerpts of material relating to the boos [OR00 and [Rhe98 and of write-ups prepared for courses held at the University of Pittsburgh. Some further references

More information

Convex Optimization CMU-10725

Convex Optimization CMU-10725 Convex Optimization CMU-10725 Quasi Newton Methods Barnabás Póczos & Ryan Tibshirani Quasi Newton Methods 2 Outline Modified Newton Method Rank one correction of the inverse Rank two correction of the

More information

f = 2 x* x g 2 = 20 g 2 = 4

f = 2 x* x g 2 = 20 g 2 = 4 On the Behavior of the Gradient Norm in the Steepest Descent Method Jorge Nocedal y Annick Sartenaer z Ciyou Zhu x May 30, 000 Abstract It is well known that the norm of the gradient may be unreliable

More information

10-725/ Optimization Midterm Exam

10-725/ Optimization Midterm Exam 10-725/36-725 Optimization Midterm Exam November 6, 2012 NAME: ANDREW ID: Instructions: This exam is 1hr 20mins long Except for a single two-sided sheet of notes, no other material or discussion is permitted

More information

CHAPTER 6. Projection Methods. Let A R n n. Solve Ax = f. Find an approximate solution ˆx K such that r = f Aˆx L.

CHAPTER 6. Projection Methods. Let A R n n. Solve Ax = f. Find an approximate solution ˆx K such that r = f Aˆx L. Projection Methods CHAPTER 6 Let A R n n. Solve Ax = f. Find an approximate solution ˆx K such that r = f Aˆx L. V (n m) = [v, v 2,..., v m ] basis of K W (n m) = [w, w 2,..., w m ] basis of L Let x 0

More information

Convergence of a Two-parameter Family of Conjugate Gradient Methods with a Fixed Formula of Stepsize

Convergence of a Two-parameter Family of Conjugate Gradient Methods with a Fixed Formula of Stepsize Bol. Soc. Paran. Mat. (3s.) v. 00 0 (0000):????. c SPM ISSN-2175-1188 on line ISSN-00378712 in press SPM: www.spm.uem.br/bspm doi:10.5269/bspm.v38i6.35641 Convergence of a Two-parameter Family of Conjugate

More information

On Sparse and Symmetric Matrix Updating

On Sparse and Symmetric Matrix Updating MATHEMATICS OF COMPUTATION, VOLUME 31, NUMBER 140 OCTOBER 1977, PAGES 954-961 On Sparse and Symmetric Matrix Updating Subject to a Linear Equation By Ph. L. Toint* Abstract. A procedure for symmetric matrix

More information

Reduced-Hessian Methods for Constrained Optimization

Reduced-Hessian Methods for Constrained Optimization Reduced-Hessian Methods for Constrained Optimization Philip E. Gill University of California, San Diego Joint work with: Michael Ferry & Elizabeth Wong 11th US & Mexico Workshop on Optimization and its

More information

Appendix to Portfolio Construction by Mitigating Error Amplification: The Bounded-Noise Portfolio

Appendix to Portfolio Construction by Mitigating Error Amplification: The Bounded-Noise Portfolio Appendix to Portfolio Construction by Mitigating Error Amplification: The Bounded-Noise Portfolio Long Zhao, Deepayan Chakrabarti, and Kumar Muthuraman McCombs School of Business, University of Texas,

More information

Math 408A: Non-Linear Optimization

Math 408A: Non-Linear Optimization February 12 Broyden Updates Given g : R n R n solve g(x) = 0. Algorithm: Broyden s Method Initialization: x 0 R n, B 0 R n n Having (x k, B k ) compute (x k+1, B x+1 ) as follows: Solve B k s k = g(x

More information