Institute of Computer Science
Academy of Sciences of the Czech Republic

Limited-memory projective variable metric methods for unconstrained minimization

J. Vlček, L. Lukšan

Technical report No. V 1036

December 2008

Pod Vodárenskou věží 2, Prague 8, e-mail: luksan@cs.cas.cz, vlcek@cs.cas.cz

Institute of Computer Science
Academy of Sciences of the Czech Republic

Limited-memory projective variable metric methods for unconstrained minimization

J. Vlček, L. Lukšan
Technical report No. V 1036, December 2008

Abstract: A new family of limited-memory variable metric or quasi-Newton methods for unconstrained minimization is given. The methods are based on a positive definite inverse Hessian approximation in the form of the sum of the identity matrix and two low-rank matrices, obtained by the standard scaled Broyden class update. To reduce the rank of the matrices, various projections are used. Numerical experience is encouraging.

Keywords: Unconstrained minimization, variable metric methods, limited-memory methods, Broyden class updates, projection matrix, numerical results.

This work was supported by the Grant Agency of the Czech Academy of Sciences, project No. IAA1030405, and by the institutional research plan No. AV0Z. L. Lukšan is also from the Technical University of Liberec, Hálkova 6, 461 17 Liberec.

1 Introduction

In this report we present a new family of limited-memory variable metric (VM) line search methods for unconstrained minimization which are projective, i.e. they reduce the rank of the matrices in the inverse Hessian approximation by a suitable projection.

VM line search methods, see [5], [3], are iterative. Starting with an initial point $x_0 \in \mathbb{R}^N$, they generate iterations $x_{k+1} \in \mathbb{R}^N$ by the process $x_{k+1} = x_k + s_k$, $s_k = t_k d_k$, $k \ge 0$, where $d_k$ is the direction vector and $t_k > 0$ is a stepsize. We assume that the problem function $f:\mathbb{R}^N \to \mathbb{R}$ is differentiable, $d_k = -H_k g_k$ and the stepsize $t_k$ is chosen in such a way that

$$ f_{k+1} - f_k \le \varepsilon_1 t_k g_k^T d_k, \qquad g_{k+1}^T d_k \ge \varepsilon_2 g_k^T d_k, \qquad (1.1) $$

$k \ge 0$, where $0 < \varepsilon_1 < 1/2$, $\varepsilon_1 < \varepsilon_2 < 1$, $f_k = f(x_k)$, $g_k = \nabla f(x_k)$ and $H_k$ is a symmetric positive definite matrix. We denote $B_k = H_k^{-1}$, $y_k = g_{k+1} - g_k$, $k \ge 0$, and by $\|\cdot\|_F$ the Frobenius matrix norm.

Various possibilities how to construct positive definite inverse Hessian approximations based on low-rank matrices are discussed in Section 2. In Section 3 we show how properties of the standard scaled BFGS update can be utilized and generalized to construct new limited-memory methods. Numerical results are presented in Section 4.

2 Positive definite inverse Hessian approximation

A simple form of positive definite matrices $H_k$ based on low-rank matrices is $\zeta_k I + U_k U_k^T$, $k \ge 0$, where $\zeta_k > 0$ and $U_k$ are $N \times m_k$ rectangular matrices, $m_k \ll N$, but updating matrices of this form appears to be difficult. To avoid this drawback, shifted VM methods, see [7], were developed. They appeared to be very efficient, except for ill-conditioned problems, which is probably caused by the fact that shifted VM updates are not invariant under linear transformations (the significance of the invariance property of methods for the solution of ill-conditioned problems is discussed in [3]). Another possibility how to update matrices of this form is described in [8]. There invariant variationally-derived updates are applied, but the VM matrices cannot be directly used to calculate the direction vectors $d_k$, and thus complicated corrections are necessary.

Our new methods are based on matrices $H_k$ of the form

$$ H_k = \zeta_k I + U_k U_k^T - R_k R_k^T, \qquad (2.1) $$

$k \ge 0$, where $\zeta_k > 0$ and $U_k$, $R_k$ are $N \times \min(k, m)$ rectangular matrices, $m \ll N$. Most VM matrices in standard VM methods, see [3], [5], can be written in this way. These methods usually set $H_0 = I$, and $H_{k+1}$ is obtained from $\gamma_k H_k$ ($\gamma_k$ is a scaling parameter) by a rank-two VM update to satisfy the quasi-Newton condition; its generalized form is

$$ H_{k+1} y_k = \varrho_k s_k, \qquad (2.2) $$

where $\varrho_k > 0$ is a nonquadratic correction parameter (see [5]). Therefore the matrices $H_k$ can be easily updated in starting iterations, i.e. for $k < m$. When $k \ge m$, the rank of the matrices $U$, $R$ needs to be reduced before each update and the updating formulas modified, which is described in Section 2.1.
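The representation (2.1) is cheap to use because $H_k$ never has to be formed explicitly: a product $H_k g$ costs only $O(Nm_k)$ operations. The following minimal numpy sketch (not part of the report; all data and names are hypothetical) illustrates this point.

```python
import numpy as np

def apply_H(zeta, U, R, g):
    """Compute H g for H = zeta*I + U U^T - R R^T using only the thin factors.

    Cost is O(N*m) instead of the O(N^2) needed for an explicit H.
    """
    return zeta * g + U @ (U.T @ g) - R @ (R.T @ g)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, m = 1000, 10                        # dimensions used in the report's tests
    zeta = 1.0
    U = rng.standard_normal((N, m))
    R = 0.01 * rng.standard_normal((N, m)) # small R keeps H positive definite in this illustration
    g = rng.standard_normal(N)

    d = -apply_H(zeta, U, R, g)            # direction vector d = -H g

    # check against the explicit matrix (feasible only for moderate N)
    H = zeta * np.eye(N) + U @ U.T - R @ R.T
    print(np.allclose(d, -H @ g))          # True
```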

Some conditions to have the result positive definite are discussed in Section 2.2. In this report we use matrices $U_k$, $R_k$ with the same number of columns only for simplicity. Numerical experiments indicate that the number of columns of $R_k$ can be substantially reduced without a significant increase of the number of function and gradient evaluations, which can increase efficiency. All methods given here can be easily adapted in this way.

For a given $q_k \in \mathbb{R}^N$, $q_k^T y_k \ne 0$, we denote by $V_{q_k}$ the projection matrix $I - q_k y_k^T / q_k^T y_k$. To simplify the notation we frequently omit the index $k$ and replace the index $k+1$ by the symbol $+$. In the subsequent analysis we use the following notation

$$ C = \zeta I - RR^T, \qquad b = s^T y, \qquad V = V_s = I - (1/b)\,s y^T. $$

Note that $b > 0$ by (1.1).

2.1 Limited-memory VM matrices updating

To be able to add further columns when the matrices $U$, $R$ have rank $m$, we first need to make their columns dependent, i.e. we need matrices $\bar U$, $\bar R$, near to $U$, $R$, and suitable vectors $z_1, z_2 \in \mathbb{R}^m$ satisfying $\bar U z_1 = 0$, $\bar R z_2 = 0$. Such a matrix $\bar U$ can in a general way be written as the product of any $N \times m$ matrix and the orthogonal projection matrix $P_1 = I - z_1 z_1^T / z_1^T z_1 = I - \bar z_1 \bar z_1^T$, $\bar z_1 = z_1/|z_1|$, and similarly for the matrix $\bar R$ with $P_2 = I - \bar z_2 \bar z_2^T$, $\bar z_2 = z_2/|z_2|$, instead of $P_1$. The following lemma can be used to find the nearest (in the sense of the Frobenius matrix norm) matrices to $U$, $R$.

Lemma 2.1. Let $T$ be symmetric positive definite, $q \in \mathbb{R}^N$, $z \in \mathbb{R}^m$, $z \ne 0$, $m \le N$, and denote by $\mathcal{W}$ the set of $N \times m$ matrices, $W \in \mathcal{W}$. Then the unique solution to

$$ \min\{\|T^{1/2}(\bar W - W)\|_F : \bar W \in \mathcal{W}\} \quad \text{s.t.} \quad \bar W z = 0 \qquad (2.3) $$

is

$$ \bar W = WP, \qquad P = I - zz^T/z^Tz, \qquad (2.4) $$

independently of the choice of $T$. Moreover, we can write

$$ \bar W\bar W^T + qq^T = \big(\bar W + qz^T/|z|\big)\big(\bar W + qz^T/|z|\big)^T. \qquad (2.5) $$

Proof. Setting $W = (w_1, \dots, w_m)$, $\bar W = (\bar w_1, \dots, \bar w_m)$, $z = (\xi_1, \dots, \xi_m)^T$, we define the Lagrangian function $L = L(\bar W, e)$, $e \in \mathbb{R}^N$, as

$$ L = \tfrac12 \|T^{1/2}(\bar W - W)\|_F^2 + e^T \bar W z = \tfrac12 \sum_{i=1}^m (\bar w_i - w_i)^T T (\bar w_i - w_i) + \sum_{i=1}^m \xi_i e^T \bar w_i. $$

A local minimizer $\bar W$ satisfies the equations $\partial L/\partial \bar w_i = 0$, $i = 1, \dots, m$, which gives $T(\bar w_i - w_i) + \xi_i e = 0$, $i = 1, \dots, m$, yielding $\bar W = W - T^{-1} e z^T$. Using the condition $\bar W z = 0$, we have $T^{-1} e = Wz/z^Tz$, i.e. $\bar W = WP$. The matrix $\bar W$ satisfies (2.3) in view of the convexity of the Frobenius norm, and the rest follows immediately from (2.4).
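Lemma 2.1 is easy to check numerically. The sketch below (illustrative only, with random data of arbitrary size) verifies the constraint $\bar Wz = 0$ for $\bar W = WP$ and the factorization identity (2.5).

```python
import numpy as np

rng = np.random.default_rng(1)
N, m = 50, 6
W = rng.standard_normal((N, m))
z = rng.standard_normal(m)
q = rng.standard_normal(N)

# orthogonal projection onto the complement of span{z}, cf. (2.4)
P = np.eye(m) - np.outer(z, z) / (z @ z)
W_bar = W @ P

print(np.linalg.norm(W_bar @ z))          # ~0: the constraint W_bar z = 0
# identity (2.5): W_bar W_bar^T + q q^T = (W_bar + q z^T/|z|)(W_bar + q z^T/|z|)^T
lhs = W_bar @ W_bar.T + np.outer(q, q)
B = W_bar + np.outer(q, z) / np.linalg.norm(z)
print(np.allclose(lhs, B @ B.T))          # True
```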

In view of Lemma 2.1 we set

$$ \bar U = UP_1, \qquad \bar R = RP_2, \qquad \bar C = \zeta I - \bar R\bar R^T, \qquad \bar H = \bar C + \bar U\bar U^T. \qquad (2.6) $$

If the matrix $\bar H$ is positive definite and we choose a suitable standard VM update of $\bar H$ in the form $(1/\gamma)H_+ = \bar H + uu^T - rr^T$ which maintains positive definiteness of VM matrices, limited-memory updating of $H$ can be completely covered by Lemma 2.1. Writing $H_+ = \zeta_+ I - R_+R_+^T + U_+U_+^T$ and using (2.5), we can set $\zeta_+ = \gamma\zeta$, $(1/\sqrt\gamma)U_+ = UP_1 + u\bar z_1^T$, $(1/\sqrt\gamma)R_+ = RP_2 + r\bar z_2^T$. Unfortunately, in the general case we cannot find vectors $z_1$, $z_2$ which guarantee positive definiteness of $\bar H$. Some conditions that ensure it will be given in the next section.

2.2 Conditions for positive definiteness of reduced VM matrices

In view of $P_i^2 = P_i$, $i = 1, 2$, we can write

$$ \bar U\bar U^T = UP_1U^T = UU^T - U\bar z_1\bar z_1^TU^T, \qquad \bar R\bar R^T = RP_2R^T = RR^T - R\bar z_2\bar z_2^TR^T. \qquad (2.7) $$

In view of $\bar C = C + R\bar z_2\bar z_2^TR^T$ by (2.6), we see that the situation is quite simple when the matrix $C$ is positive definite, since then also the matrices $\bar C$ and $\bar H = \bar C + \bar U\bar U^T$ are positive definite for any $z_1$, $z_2$. The next section will be devoted to such standard VM updates that can be adapted to maintain this property.

Even if the matrix $C$ is not positive definite, the choice of $z_2$ has no influence on preserving positive definiteness of $\bar H$ after the reduction of matrix $R$, since $\zeta I - \bar R\bar R^T + UU^T = H + R\bar z_2\bar z_2^TR^T$ by (2.7). The following lemma shows one possibility how to attain positive definiteness of the VM matrix after the reduction of matrix $U$.

Lemma 2.2. Let $A = \bar H + vv^T$, $v \in \mathbb{R}^N$, and denote by $\lambda(A)$, $\lambda(\bar H)$ the minimum eigenvalues of $A$, $\bar H$. If $[\bar U, \bar R]^T v = 0$ then $\lambda(\bar H) = \lambda(A)$. The conditions $\bar U^T v = 0$, $\bar R^T v = 0$ can be satisfied e.g. by the choice $z_1 = U^T v$, $z_2 = R^T v$.

Proof. We first show that $\lambda(A) \le \zeta$. Denoting $K_1 = \{q \in \mathbb{R}^N : [\bar U, v]^T q = 0,\ |q| = 1\}$, we have

$$ \lambda(A) = \min_{|q|=1} q^TAq \le \min_{q\in K_1} q^TAq = \min_{q\in K_1} q^T(\zeta I - \bar R\bar R^T)q \le \zeta. $$

We will consider the following two cases:

(a) If $\lambda(\bar H) = \zeta$, then $\lambda(\bar H) = \lambda(A)$ holds, since $\zeta = \lambda(\bar H) \le \lambda(A) \le \zeta$.

(b) If $\lambda(\bar H) \ne \zeta$, then $\bar H w = \lambda(\bar H)w$, $w \ne 0$, implies $\bar U\bar U^T w - \bar R\bar R^T w = (\lambda(\bar H) - \zeta)w$ and thus $w \in K_2 = \{q \in \mathrm{range}([\bar U, \bar R]) : q \ne 0\}$. By $[\bar U, \bar R]^T v = 0$ we get

$$ \lambda(\bar H) \le \lambda(A) = \min_{q\ne0}\frac{q^TAq}{q^Tq} \le \min_{q\in K_2}\frac{q^TAq}{q^Tq} = \min_{q\in K_2}\frac{q^T\bar Hq}{q^Tq} \le \frac{w^T\bar Hw}{w^Tw} = \lambda(\bar H), $$

therefore we again obtain $\lambda(\bar H) = \lambda(A)$.

If $z_1 = U^T v$, we have $\bar U^T v = P_1 U^T v = P_1 z_1 = 0$ and similarly $\bar R^T v = 0$ for $z_2 = R^T v$.
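The statement of Lemma 2.2 can likewise be checked numerically. The following sketch (illustrative only; random data, sizes kept small so that the explicit eigenvalue computation stays cheap) uses the choices $z_1 = U^Tv$, $z_2 = R^Tv$ and compares the smallest eigenvalues of $\bar H$ and $A = \bar H + vv^T$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, m = 40, 5
zeta = 2.0
U = rng.standard_normal((N, m))
R = 0.1 * rng.standard_normal((N, m))
v = rng.standard_normal(N)

def reduce(M, z):
    """Return M P with P = I - z z^T / z^T z, so that (M P) z = 0."""
    P = np.eye(M.shape[1]) - np.outer(z, z) / (z @ z)
    return M @ P

U_bar = reduce(U, U.T @ v)     # choice z1 = U^T v from Lemma 2.2
R_bar = reduce(R, R.T @ v)     # choice z2 = R^T v

H_bar = zeta * np.eye(N) + U_bar @ U_bar.T - R_bar @ R_bar.T
A = H_bar + np.outer(v, v)

lam_H = np.linalg.eigvalsh(H_bar)[0]   # smallest eigenvalue of H_bar
lam_A = np.linalg.eigvalsh(A)[0]       # smallest eigenvalue of A
print(lam_H, lam_A, np.isclose(lam_H, lam_A))   # the two minimum eigenvalues coincide
```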

Note that in our case $v = U\bar z_1$ by (2.7), thus the choice $z_1 = U^T v$ leads to the eigenvalue problem $U^TU z_1 = |z_1|\, z_1$; in view of (2.7) the solution of this problem minimizes the damage to the matrix $UU^T$, measured by $\|U\bar z_1\bar z_1^TU^T\| = |U\bar z_1|^2 = z_1^TU^TUz_1/z_1^Tz_1$, if we choose as $z_1$ the eigenvector corresponding to the smallest eigenvalue of $U^TU$. This is valuable also in situations when we attain positive definiteness of the VM matrix after the reduction of matrix $U$ in another way.

3 New limited-memory methods

In this section we focus on the adaptation of standard VM updates to maintain positive definiteness of the matrix $C$. Having the reduced matrix $\bar C$ positive definite, we want to find an updated matrix $H_+$ satisfying the quasi-Newton condition (2.2), in the form $H_+ = C_+ + U_+U_+^T$, where $C_+$ is positive definite. An easy way to do it is shown in Section 3.1 in case of the scaled BFGS method, see [5]. The variant of this approach based on the partly inverse representation of the matrix $C$ is described in Section 3.2. These results are generalized in Section 3.3, using a variational approach, and in Section 3.4, using a special transformation. The choice of parameters, including $z_1$ and $z_2$, is discussed in Section 3.5.

3.1 Adaptation of the scaled BFGS method

The standard scaled BFGS update of $\bar H$ can be written in the form (see [5])

$$ \frac1\gamma H_+^{BFGS} = \frac{\varrho}{\gamma}\,\frac{ss^T}{b} + V\bar HV^T, \qquad (3.1) $$

which in view of $\bar H = \bar C + \bar U\bar U^T$ after easy rearrangements leads to

$$ \frac1\gamma H_+^{BFGS} = \frac1\gamma C_+^{BFGS} + V\bar U\bar U^TV^T = \bar C - rr^T + uu^T + V\bar U\bar U^TV^T, \qquad (3.2) $$

where

$$ u = \frac{1}{\sqrt{\omega b}}\,(\omega s - \bar Cy), \qquad r = \frac{1}{\sqrt{\omega b}}\,\bar Cy, \qquad \omega = \varrho/\gamma + \tilde a/b, \qquad \tilde a = y^T\bar Cy. \qquad (3.3) $$

If we define $(1/\gamma)C_+ = \bar C - rr^T$, we can easily show that the matrix $C_+$ is positive definite.

Lemma 3.1. Let the matrix $\bar C$ be positive definite and the vector $r$ be given by (3.3). Then the matrix $\bar C - rr^T$ is also positive definite.

Proof. For any $q \in \mathbb{R}^N$, $q \ne 0$, we obtain by the Schwarz inequality, (3.3) and $b > 0$

$$ \frac{q^T(\bar C - rr^T)q}{q^T\bar Cq} = 1 - \frac{(q^T\bar Cy)^2}{\omega b\, q^T\bar Cq} \ge 1 - \frac{\tilde a}{\omega b} = 1 - \frac{\tilde a}{b\varrho/\gamma + \tilde a} = \frac{b\varrho/\gamma}{b\varrho/\gamma + \tilde a} > 0. $$

Replacing $\bar C - rr^T$ in (3.2) by $(1/\gamma)C_+$, we see that to get

$$ H_+ = C_+ + U_+U_+^T, \qquad (3.4) $$

we need a matrix $U_+$ satisfying $(1/\gamma)U_+U_+^T = V\bar U\bar U^TV^T + uu^T$. Similarly, considering that $\bar C = \zeta I - \bar R\bar R^T$, we see that to get

$$ C_+ = \zeta_+ I - R_+R_+^T, \qquad (3.5) $$

we need $\zeta_+ > 0$ and a matrix $R_+$ satisfying $(1/\gamma)R_+R_+^T = \bar R\bar R^T + rr^T$. By (2.6) we can fulfil all these requirements if we define

$$ \zeta_+ = \gamma\zeta, \qquad (1/\sqrt\gamma)\,U_+ = V\bar U + u\bar z_1^T, \qquad (1/\sqrt\gamma)\,R_+ = \bar R + r\bar z_2^T. \qquad (3.6) $$

For better clarity, we summarize the updating process. First we choose the vectors $z_1$, $z_2$, see the recommendations in Section 3.5, and then we determine the matrices $\bar U$, $\bar R$, $\bar C$, $\bar H$ by (2.6) and compute the vectors $u$, $r$ by (3.3). Finally we compute the number $\zeta_+$ and the matrices $U_+$, $R_+$ by (3.6) and determine the matrices $C_+$ by (3.5) and $H_+$ by (3.4). In starting iterations we omit the reductions of the matrices $U$, $R$, i.e. $\bar C = C$, $\bar U = U$, $\bar R = R$, and simply add the column $u$ to $V\bar U$ and $r$ to $\bar R$. Note that then $H_+$ is the BFGS update of $H$, unlike $C_+$, which is not any standard VM update of the matrix $C$.
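One complete limited-memory step of Section 3.1 can be prototyped directly from (2.6), (3.3) and (3.6). The sketch below is only an illustration on random dense data (it takes $\gamma = \varrho = 1$ and random projection vectors $z_1$, $z_2$ instead of the recommendations of Section 3.5); it checks the quasi-Newton condition (2.2) and positive definiteness of $H_+$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, m = 60, 5
gamma, rho, zeta = 1.0, 1.0, 2.0

U = rng.standard_normal((N, m))
R = 0.1 * rng.standard_normal((N, m))          # small R => C = zeta*I - R R^T stays positive definite
s = rng.standard_normal(N)
y = s + 0.3 * rng.standard_normal(N)           # a curvature pair with b = s^T y > 0
b = s @ y
assert b > 0

def project_out(M, z):
    """Column reduction M P with P = I - z z^T / z^T z (Lemma 2.1)."""
    return M - np.outer(M @ z, z) / (z @ z)

z1, z2 = rng.standard_normal(m), rng.standard_normal(m)     # Section 3.5 gives better choices
U_bar, R_bar = project_out(U, z1), project_out(R, z2)
C_bar = zeta * np.eye(N) - R_bar @ R_bar.T                  # (2.6)

# scaled BFGS vectors (3.3)
Cy = C_bar @ y
omega = rho / gamma + (y @ Cy) / b
u = (omega * s - Cy) / np.sqrt(omega * b)
r = Cy / np.sqrt(omega * b)

# limited-memory update (3.6)
V = np.eye(N) - np.outer(s, y) / b
zb1, zb2 = z1 / np.linalg.norm(z1), z2 / np.linalg.norm(z2)
U_new = np.sqrt(gamma) * (V @ U_bar + np.outer(u, zb1))
R_new = np.sqrt(gamma) * (R_bar + np.outer(r, zb2))
zeta_new = gamma * zeta

H_new = zeta_new * np.eye(N) + U_new @ U_new.T - R_new @ R_new.T
print(np.allclose(H_new @ y, rho * s))                      # quasi-Newton condition (2.2)
print(np.linalg.eigvalsh(H_new)[0] > 0)                     # positive definiteness
```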

3.2 Partly inverse representation of the scaled BFGS method

In the previous method we can change $\zeta$ only by the scaling (3.6) to guarantee that $C_+$ will be positive definite. This drawback can be overcome when we use a partly inverse representation of the scaled BFGS update. In starting iterations we can rewrite the BFGS update of the matrix $\bar C = C$ in (3.2) in the form

$$ \frac1\gamma C_+^{BFGS} = \Big(C^{-1} + \frac{\gamma}{\varrho b}\,yy^T\Big)^{-1} + uu^T \qquad (3.7) $$

with $u$ given again by (3.2), which can be readily verified. Thus instead of $(1/\gamma)C_+ = C - rr^T$, we can use the equivalent formula

$$ C_+^{-1} = \frac1\gamma C^{-1} + \frac{1}{\varrho b}\,yy^T. \qquad (3.8) $$

Starting with $C_0 = I$ and updating $C_0^{-1}, \dots, C_k^{-1}$ according to (3.8), we obtain

$$ C^{-1} = \bar C^{-1} = (1/\bar\gamma)\,I + YDY^T, \qquad (3.9) $$

where $\bar\gamma = \gamma_1\cdots\gamma_k$, $Y = [y_1, \dots, y_k]$ and the diagonal positive definite matrix $D$ is given by $D = [1/(\varrho_1 b_1)]$ and by the update formula

$$ D_+ = \mathrm{diag}\Big(\frac1\gamma D,\ \frac{1}{\varrho b}\Big). \qquad (3.10) $$

In view of (3.9), we store the matrices $Y$, $D$ instead of $R$. When $k \ge m$, we reduce the matrix $U$ in the same way as before, i.e. $\bar U = UP_1$, while in case of the matrices $Y$, $D$ we can simply delete their first columns, i.e. supposing that $Y = [y_{k-m}, \dots, y_{k-1}]$, $D = \mathrm{diag}(d_1, \dots, d_m)$ and $C^{-1} = (1/\zeta)I + YDY^T$, $\zeta > 0$, we set

$$ \bar Y = [y_{k-m+1}, \dots, y_{k-1}], \qquad \bar D = \mathrm{diag}(d_2, \dots, d_m), \qquad \bar C^{-1} = (1/\zeta)\,I + \bar Y\bar D\bar Y^T. \qquad (3.11) $$
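The equivalence of (3.8) with the direct formula $(1/\gamma)C_+ = C - rr^T$ is a consequence of the Sherman-Morrison formula and can be confirmed numerically. The sketch below (illustrative only; dense random data, $\gamma = \varrho = 1$) builds both representations and compares them.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 40
gamma, rho, zeta = 1.0, 1.0, 1.5

R = 0.1 * rng.standard_normal((N, 4))
C = zeta * np.eye(N) - R @ R.T                 # positive definite for small R
s = rng.standard_normal(N)
y = s + 0.3 * rng.standard_normal(N)
b = s @ y
assert b > 0

# direct form: (1/gamma) C_+ = C - r r^T with r from (3.3)
Cy = C @ y
omega = rho / gamma + (y @ Cy) / b
C_plus = gamma * (C - np.outer(Cy, Cy) / (omega * b))

# partly inverse form (3.8): C_+^{-1} = (1/gamma) C^{-1} + (1/(rho*b)) y y^T
C_plus_inv = np.linalg.inv(C) / gamma + np.outer(y, y) / (rho * b)

print(np.allclose(C_plus @ C_plus_inv, np.eye(N)))    # True: the two representations agree
```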

To get $H_+ = C_+ + U_+U_+^T$, we define $\zeta_+$, $U_+$ again by (3.6) and $C_+^{-1}$ in view of (3.8) by

$$ C_+^{-1} = \frac{1}{\zeta_+}I + Y_+D_+Y_+^T, \qquad Y_+ = [\bar Y, y], \qquad D_+ = \mathrm{diag}\Big(\frac1\gamma\bar D,\ \frac{1}{\varrho b}\Big). \qquad (3.12) $$

To be able to compute $\bar Cy$, we store the $m \times m$ matrix $Y^TY$ and use the Woodbury formula, which yields

$$ \bar C = \Big(\frac1\zeta I + \bar Y\bar D\bar Y^T\Big)^{-1} = \zeta I - \zeta\,\bar Y\Big(\frac1\zeta\bar D^{-1} + \bar Y^T\bar Y\Big)^{-1}\bar Y^T, \qquad (3.13) $$

i.e. to compute $\bar Cy$, we need to solve a linear system with the symmetric positive definite matrix $(1/\zeta)\bar D^{-1} + \bar Y^T\bar Y$. Note that during the computation of $\bar Cy$ we can save the vector $\bar Y^Ty$ and use it in the updating of the matrix $Y^TY$, since by (3.12) we obtain

$$ Y_+^TY_+ = \begin{pmatrix} \bar Y^T\bar Y & \bar Y^Ty \\ y^T\bar Y & y^Ty \end{pmatrix}. \qquad (3.14) $$

Now we summarize the updating process. First we choose the vector $z_1$, see the recommendations in Section 3.5, and then we determine the matrix $\bar U$ by (2.6), $\bar Y$, $\bar D$, $\bar C^{-1}$ by (3.11) and $\bar H$ by (2.6) and compute the vector $u$ by (3.3), using (3.13). Finally we compute the number $\zeta_+$ and the matrix $U_+$ by (3.6) and determine the matrices $Y_+$, $D_+$, $C_+^{-1}$ by (3.12), using (3.14), and $H_+$ by (3.4).

If we wish to use a different projection instead of deleting the first columns of $Y$ and $D$ (although we are not able to say at this time how to choose a suitable projection), we can use another representation of $C^{-1}$. In starting iterations we rewrite (3.9) in the form

$$ C^{-1} = \bar C^{-1} = (1/\bar\gamma)\,I + \tilde Y\tilde Y^T, \qquad \tilde Y = YD^{1/2}. \qquad (3.15) $$

Then for $k \ge m$ we reduce the matrix $U$ as before, i.e. $\bar U = UP_1$, and the matrix $\tilde Y$ in a similar way, i.e. $\bar{\tilde Y} = \tilde YP_3$, $P_3 = I - \bar z_3\bar z_3^T$, $\bar z_3 = z_3/|z_3|$, with some $z_3 \in \mathbb{R}^m$, and set

$$ \bar C^{-1} = (1/\zeta)\,I + \bar{\tilde Y}\bar{\tilde Y}^T, \qquad \zeta > 0. \qquad (3.16) $$

To get $H_+ = C_+ + U_+U_+^T$, we define $\zeta_+$, $U_+$ again by (3.6) and $C_+^{-1}$ in view of (3.8) by

$$ C_+^{-1} = \frac{1}{\zeta_+}I + \tilde Y_+\tilde Y_+^T, \qquad \tilde Y_+ = \frac{1}{\sqrt\gamma}\,\bar{\tilde Y} + \frac{1}{\sqrt{\varrho b}}\,y\bar z_3^T. \qquad (3.17) $$

To be able to compute $\bar Cy$, we store the $m \times m$ matrix $\tilde Y^T\tilde Y$ and use the Woodbury formula, which yields

$$ \bar C = \Big(\frac1\zeta I + \bar{\tilde Y}\bar{\tilde Y}^T\Big)^{-1} = \zeta I - \zeta\,\bar{\tilde Y}\Big(\frac1\zeta I + \bar{\tilde Y}^T\bar{\tilde Y}\Big)^{-1}\bar{\tilde Y}^T. $$

Note that during the computation of $\bar Cy$ we can again save the vector $\bar{\tilde Y}^Ty$ and use it in the updating of the matrix $\tilde Y^T\tilde Y$, since by (3.17) we obtain

$$ \tilde Y_+^T\tilde Y_+ = \frac1\gamma\,\bar{\tilde Y}^T\bar{\tilde Y} + \frac{1}{\sqrt{\varrho\gamma b}}\Big(\bar{\tilde Y}^Ty\,\bar z_3^T + \bar z_3\,y^T\bar{\tilde Y}\Big) + \frac{y^Ty}{\varrho b}\,\bar z_3\bar z_3^T. $$
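Formula (3.13) is the computational core of this section: the product $\bar Cy$ is obtained from $\zeta$, $\bar Y$, $\bar D$ by solving only an $m \times m$ system. A minimal sketch follows (illustrative only; random data, $\bar D$ a random positive diagonal); it compares the result with a dense reference computation.

```python
import numpy as np

rng = np.random.default_rng(5)
N, m = 80, 6
zeta = 1.3
Y = rng.standard_normal((N, m))
d = rng.uniform(0.5, 2.0, m)               # diagonal of the positive definite matrix D
y = rng.standard_normal(N)

# C_bar * y from (3.13): C_bar = zeta*I - zeta * Y ((1/zeta) D^{-1} + Y^T Y)^{-1} Y^T
YtY = Y.T @ Y                              # the stored m x m matrix
small = np.diag(1.0 / d) / zeta + YtY      # symmetric positive definite m x m system
Cy = zeta * y - zeta * (Y @ np.linalg.solve(small, Y.T @ y))

# reference: invert C_bar^{-1} = (1/zeta) I + Y D Y^T densely (feasible only for small N)
C_inv = np.eye(N) / zeta + Y @ np.diag(d) @ Y.T
print(np.allclose(Cy, np.linalg.solve(C_inv, y)))     # True
```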

3.3 Variationally-derived generalization

Although the scaled Broyden class updates of $\bar H$ with a positive value of the parameter $\eta$ can be written in a quasi-product form similar to (3.1), see [7], we cannot construct limited-memory methods in the same way as in the previous sections, since then the projection matrix contains the vector $\bar Hy \ne \bar Cy$ and therefore we do not obtain a relation similar to (3.2). In spite of that, it is possible to generalize this process to the standard scaled Broyden class updates, as we show in the next section. In this section we briefly describe two other possibilities.

The first approach is based on the projection variant, see [8], of the well-known Greenstadt theorem, see [4]:

Theorem 3.1. Let $M$, $T$ be symmetric matrices, $T$ positive definite, $\varrho > 0$, $p = Ty$, and denote by $\mathcal{M}$ the set of $N \times N$ symmetric matrices. Then the unique solution to

$$ \min\{\|T^{-1/2}(M_+ - M)T^{-1/2}\|_F : M_+ \in \mathcal{M}\} \quad \text{s.t.} \quad M_+y = \varrho s $$

is determined by the relation $V_p(M_+ - M)V_p^T = 0$ and can be written in the form

$$ M_+ = E + V_p(M - E)V_p^T, $$

where $E$ is any symmetric matrix satisfying $Ey = \varrho s$, e.g. $E = (\varrho/b)ss^T$.

Using this theorem with $M = \gamma\bar H = \gamma(\bar C + \bar U\bar U^T)$, $M_+ = H_+$, $p = s - \alpha\bar Cy$, $\alpha \in \mathbb{R}$, and $E = (\varrho/b)ss^T$, we obtain

$$ \frac1\gamma H_+ = V_p\bar U\bar U^TV_p^T + \frac1\gamma C_+^{BC}, \qquad \frac1\gamma C_+^{BC} = \frac{\varrho}{\gamma b}\,ss^T + V_p\Big(\bar C - \frac{\varrho}{\gamma b}\,ss^T\Big)V_p^T, \qquad (3.18) $$

where by Lemma 2.3 in [8] this $C_+^{BC}$ is the Broyden class update, see [5], of $\bar C$ with the parameter

$$ \eta_C = \frac{b^2}{(b - \alpha\tilde a)^2}\Big(1 - \alpha^2\,\frac{\varrho}{\gamma}\,\frac{\tilde a}{b}\Big). \qquad (3.19) $$

For a given $\eta_C$ satisfying

$$ 0 < \eta_C < \omega\gamma/\varrho, \qquad \omega = \frac{\varrho}{\gamma} + \frac{\tilde a}{b}\,\eta_C, \qquad (3.20) $$

we can readily verify that the corresponding $\alpha$ can be obtained from

$$ \alpha_1 = \frac{\eta_C + \sqrt{\eta_C + (1-\eta_C)\,\varrho b/(\gamma\tilde a)}}{\omega} \qquad \text{or} \qquad \alpha_2 = \frac{(b/\tilde a)(\eta_C - 1)}{\eta_C + \sqrt{\eta_C + (1-\eta_C)\,\varrho b/(\gamma\tilde a)}}. \qquad (3.21) $$

Using the usual form of the Broyden class update of $\bar C$ in (3.18), after straightforward rearrangements we obtain the relation, similar to (3.2),

$$ \frac1\gamma H_+ = \frac1\gamma C_+^{BC} + V_p\bar U\bar U^TV_p^T = \bar C - \tilde r\tilde r^T + \tilde u\tilde u^T + V_p\bar U\bar U^TV_p^T, \qquad (3.22) $$

where

$$ \tilde u = \sqrt{\frac{\omega}{b}}\,\Big(s - \frac{\eta_C}{\omega}\,\bar Cy\Big), \qquad \tilde r = \sqrt{1 - \frac{\varrho}{\gamma}\,\frac{\eta_C}{\omega}}\;\frac{\bar Cy}{\sqrt{\tilde a}}. \qquad (3.23) $$
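The vectors (3.23) can be checked directly against the definition (3.18): for an admissible $\eta_C$ and $\alpha_1$ from (3.21), the matrix $\bar C - \tilde r\tilde r^T + \tilde u\tilde u^T$ should coincide with $(1/\gamma)C_+^{BC}$ and therefore satisfy $C_+^{BC}y = \varrho s$. The following sketch (illustrative only; random data, $\gamma = \varrho = 1$, $\eta_C = 0.5$) performs this comparison.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 50
gamma, rho, zeta = 1.0, 1.0, 1.4
eta_C = 0.5                                     # any value satisfying (3.20) for this data

R = 0.1 * rng.standard_normal((N, 4))
C_bar = zeta * np.eye(N) - R @ R.T              # positive definite reduced matrix
s = rng.standard_normal(N)
y = s + 0.3 * rng.standard_normal(N)
b = s @ y
assert b > 0
Cy = C_bar @ y
a_t = y @ Cy

omega = rho / gamma + eta_C * a_t / b           # (3.20)
assert 0 < eta_C < omega * gamma / rho
alpha = (eta_C + np.sqrt(eta_C + (1 - eta_C) * rho * b / (gamma * a_t))) / omega   # alpha_1 in (3.21)

# rank-two form (3.22)-(3.23)
u_t = np.sqrt(omega / b) * (s - (eta_C / omega) * Cy)
r_t = np.sqrt(1 - (rho / gamma) * (eta_C / omega)) * Cy / np.sqrt(a_t)
lhs = C_bar - np.outer(r_t, r_t) + np.outer(u_t, u_t)

# projection form (3.18) with p = s - alpha * C_bar y and E = (rho/(gamma*b)) s s^T
p = s - alpha * Cy
Vp = np.eye(N) - np.outer(p, y) / (p @ y)
E = (rho / (gamma * b)) * np.outer(s, s)
rhs = E + Vp @ (C_bar - E) @ Vp.T

print(np.allclose(lhs, rhs))                    # True: (3.23) reproduces the update (3.18)
print(np.allclose(gamma * lhs @ y, rho * s))    # quasi-Newton condition for C_+^{BC}
```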

Note that $H_+$ is not here the Broyden class update of $\bar H$. Similarly as in Section 3.1 we define $(1/\gamma)C_+ = \bar C - \tilde r\tilde r^T = \zeta I - \bar R\bar R^T - \tilde r\tilde r^T$. The following lemma gives simple conditions for the matrix $C_+$ to be positive definite.

Lemma 3.2. Let the matrix $\bar C$ be positive definite, $\eta_C$ satisfy (3.20) and the vector $\tilde r$ be given by (3.23). Then the matrix $\bar C - \tilde r\tilde r^T$ is also positive definite.

Proof. For any $q \in \mathbb{R}^N$, $q \ne 0$, we obtain by the Schwarz inequality and $b > 0$

$$ \frac{q^T(\bar C - \tilde r\tilde r^T)q}{q^T\bar Cq} = 1 - \Big(1 - \frac{\varrho}{\gamma}\,\frac{\eta_C}{\omega}\Big)\frac{(q^T\bar Cy)^2}{\tilde a\, q^T\bar Cq} \ge 1 - \Big(1 - \frac{\varrho}{\gamma}\,\frac{\eta_C}{\omega}\Big) = \frac{\varrho}{\gamma}\,\frac{\eta_C}{\omega} > 0. $$

To get $H_+ = C_+ + U_+U_+^T$, we can similarly as in (3.6) set for $k \ge m$

$$ \zeta_+ = \gamma\zeta, \qquad (1/\sqrt\gamma)\,U_+ = V_p\bar U + \tilde u\bar z_1^T, \qquad (1/\sqrt\gamma)\,R_+ = \bar R + \tilde r\bar z_2^T, \qquad (3.24) $$

and for $k < m$ simply add the column $\tilde u$ to $V_p\bar U$ and $\tilde r$ to $\bar R$.

The second approach utilizes the idea of shifting, similarly as the shifted VM methods, see [7]. We again update $\bar C$, using the scaled standard Broyden class update (3.18) in the usual form $(1/\gamma)C_+^{BC} = \bar C - \tilde r\tilde r^T + \tilde u\tilde u^T$ with $\tilde r$, $\tilde u$ given by (3.23), set $(1/\gamma)C_+ = \bar C - \tilde r\tilde r^T$ and update $\zeta$ and $R$ according to (3.24). To update the matrix $U$, we use the shifted quasi-Newton condition

$$ U_+U_+^Ty = \varrho\,\bar s, \qquad (3.25) $$

where in view of $H_+ = C_+ + U_+U_+^T$, (2.2) and (3.23) we define

$$ \bar s = \frac1\varrho\big(H_+y - C_+y\big) = s - \frac\gamma\varrho\big(\bar Cy - (\tilde r^Ty)\,\tilde r\big) = s - \frac\gamma\varrho\Big[1 - \Big(1 - \frac\varrho\gamma\,\frac{\eta_C}{\omega}\Big)\Big]\bar Cy = s - \frac{\eta_C}{\omega}\,\bar Cy. \qquad (3.26) $$

An important property of the shifting defined in this way is that the number $\bar b = \bar s^Ty$ is always positive, since by (3.26)

$$ \bar b = \Big(s - \frac{\eta_C}{\omega}\,\bar Cy\Big)^Ty = b - \frac{\eta_C}{\omega}\,\tilde a = \frac{b\omega - \tilde a\,\eta_C}{\omega} = \frac{\varrho}{\gamma}\,\frac{b}{\omega} > 0. \qquad (3.27) $$

Now we define the matrix $U_+$ by the following theorem, see [7].

Theorem 3.2. Let $T$ be a symmetric positive definite matrix, $\varrho > 0$, $\gamma > 0$, $p = Ty$, and denote by $\mathcal{U}$ the set of $N \times m$, $m \le N$, matrices. Then the unique solution to

$$ \min\{\varphi(U_+) : U_+ \in \mathcal{U}\} \quad \text{s.t.} \quad (3.25), \qquad \varphi(U_+) = y^TTy\,\|T^{-1/2}(U_+ - \sqrt\gamma\,\bar U)\|_F^2, $$

is

$$ (1/\sqrt\gamma)\,U_+ = V_p\bar U\bar P + (1/\bar b)\,\bar s\tilde z^T, \qquad \tilde z = (1/\sqrt\gamma)\,U_+^Ty, \qquad \bar P = I - \tilde z\tilde z^T/\tilde z^T\tilde z. $$

It follows from (3.25) and (3.27) that the vector $\tilde z$ defined by this theorem satisfies

$$ \tilde z^T\tilde z = \frac1\gamma\,y^TU_+U_+^Ty = \frac\varrho\gamma\,\bar s^Ty = \bar b^2\,\frac{\varrho/\gamma}{\bar b} = \bar b^2\,\frac{\omega}{b}, $$

which yields by (3.26) and (3.23) $\big(\sqrt{\tilde z^T\tilde z}/\bar b\big)\bar s = \sqrt{\omega/b}\,\bar s = \tilde u$. Setting $z_1 = \tilde z$, Theorem 3.2 gives for $U_+$ the updating formula

$$ (1/\sqrt\gamma)\,U_+ = V_p\bar U + \big(\sqrt{\tilde z^T\tilde z}/\bar b\big)\,\bar s\,\bar z_1^T = V_p\bar U + \tilde u\bar z_1^T, \qquad (3.28) $$

which is the corresponding formula in (3.24); we see that the projection matrix in this update formula can be derived not only from the update of the matrix $C$, but also in another way, e.g. from the update of $H$.

Summarizing the updating process, we see that it is very similar to the process in Section 3.1. First we choose the vectors $z_1$, $z_2$, see the recommendations in Section 3.5, and then we determine the matrices $\bar U$, $\bar R$, $\bar C$, $\bar H$ by (2.6) and compute the vectors $\tilde u$, $\tilde r$ by (3.23). Finally we compute the number $\zeta_+$ and the matrices $U_+$ by (3.24) or (3.28) and $R_+$ by (3.24) and determine the matrices $C_+$ by (3.5) and $H_+$ by (3.4).

The previous methods can also be modified to use the partly inverse representation of updates. We can readily verify the following generalization of (3.7) for the scaled standard Broyden class update with parameter $\eta_C \ne 0$

$$ \frac1\gamma C_+^{BC} = \Big[\bar C^{-1} + \Big(\frac{1-\eta_C}{\eta_C\,\tilde a} + \frac{\gamma}{\varrho b}\Big)yy^T\Big]^{-1} + \tilde u\tilde u^T, \qquad (3.29) $$

which can be used instead of $(1/\gamma)C_+^{BC} = \bar C - \tilde r\tilde r^T + \tilde u\tilde u^T$ in (3.22).

3.4 Generalization to the scaled Broyden class using a transformation

Methods based on the adaptation of the scaled BFGS method can be generalized to the standard Broyden class updates with parameter $\eta$ if we use the following special representation of these updates. We denote

$$ a = y^T\bar Hy, \qquad \omega = \frac{\varrho}{\gamma} + \frac{a}{b}\,\eta, \qquad \mu = \eta + (1-\eta)\,\frac{\varrho}{\gamma}\,\frac{b}{a}. $$

Theorem 3.3. Let $\varrho > 0$, $\gamma > 0$, $a\omega \ne 0$, $\mu \ge 0$ and $\beta = (\eta \pm \sqrt\mu)/\omega$. Then the scaled standard Broyden class update of $\bar H$ with parameter $\eta$, scaling parameter $\gamma$ and nonquadratic correction parameter $\varrho$ can be expressed in the form

$$ \frac1\gamma H_+^{BC} = \frac{\varrho}{\gamma}\,\eta\,\frac{\hat s\hat s^T}{b} + \hat V\bar H\hat V^T, \qquad \hat s = s - \beta\bar Hy, \qquad \hat V = I \pm \frac{\sqrt\mu}{b}\,\hat sy^T. \qquad (3.30) $$

Moreover, if $\hat\omega = (\varrho/\gamma)\eta + (\tilde a/b)\mu > 0$, we can write

$$ \frac1\gamma H_+^{BC} = \bar C - \hat r\hat r^T + \hat u\hat u^T + \hat V\bar U\bar U^T\hat V^T, \qquad \hat r = \sqrt{\frac{\mu}{\hat\omega b}}\,\bar Cy, \qquad \hat u = \sqrt{\frac{\hat\omega}{b}}\,\hat s \pm \hat r. \qquad (3.31) $$

Proof. Consider the scaled Broyden class update with parameters $\eta$, $\gamma$ and $\varrho$ in the form, see [5],

$$ \frac1\gamma H_+^{BC} = \bar H + \frac{\omega}{b}\,ss^T - \frac{\eta}{b}\big(\bar Hys^T + sy^T\bar H\big) + \frac{\eta-1}{a}\,\bar Hyy^T\bar H. $$

Setting $s = \hat s + \xi\bar Hy$, $\xi \in \mathbb{R}$, we obtain

$$ \frac1\gamma H_+^{BC} = \bar H + \frac{\omega}{b}\,\hat s\hat s^T + \frac{\xi\omega - \eta}{b}\big(\bar Hy\hat s^T + \hat sy^T\bar H\big) + \Big(\frac{\eta-1}{a} + \frac{\xi^2\omega - 2\xi\eta}{b}\Big)\bar Hyy^T\bar H. $$

The last term vanishes for $\xi^2\omega - 2\xi\eta + (b/a)(\eta-1) = 0$, i.e. for $\xi = (\eta \pm \sqrt\mu)/\omega = \beta$; then $\xi\omega - \eta = \pm\sqrt\mu$ and

$$ \frac1\gamma H_+^{BC} = \bar H + \frac{\omega}{b}\,\hat s\hat s^T \pm \frac{\sqrt\mu}{b}\big(\bar Hy\hat s^T + \hat sy^T\bar H\big) = \hat V\bar H\hat V^T + \Big(\omega - \frac{a\mu}{b}\Big)\frac{\hat s\hat s^T}{b}, $$

which is (3.30) in view of $b\omega - a\mu = (\varrho/\gamma)\,b\,[1 - (1-\eta)] + a\eta - a\eta = (\varrho/\gamma)\,\eta b$. Setting $\bar H = \bar C + \bar U\bar U^T$ into (3.30), we get

$$ \frac1\gamma H_+^{BC} = \frac{\varrho}{\gamma}\,\eta\,\frac{\hat s\hat s^T}{b} + \hat V\bar U\bar U^T\hat V^T + \Big(I \pm \frac{\sqrt\mu}{b}\,\hat sy^T\Big)\bar C\Big(I \pm \frac{\sqrt\mu}{b}\,y\hat s^T\Big) $$
$$ = \hat V\bar U\bar U^T\hat V^T + \bar C \pm \frac{\sqrt\mu}{b}\big(\bar Cy\hat s^T + \hat sy^T\bar C\big) + \Big(\frac{\varrho}{\gamma}\,\eta + \frac{\tilde a}{b}\,\mu\Big)\frac{\hat s\hat s^T}{b} $$
$$ = \hat V\bar U\bar U^T\hat V^T + \bar C + \frac{\hat\omega}{b}\Big(\hat s \pm \frac{\sqrt\mu}{\hat\omega}\,\bar Cy\Big)\Big(\hat s \pm \frac{\sqrt\mu}{\hat\omega}\,\bar Cy\Big)^T - \frac{\mu}{\hat\omega b}\,\bar Cyy^T\bar C, $$

which is (3.31).

Note that $\hat V$ is not a projection matrix in general and that we prefer the minus sign in $\beta$, $\hat V$ and $\hat u$, since then for $\eta = 1$ (BFGS) we get $\beta = 0$, $\hat s = s$ and $\hat V = V$. For $\eta \to 1$ we also have $\mu \to 1$, therefore the formula for $\beta$ above should be rewritten in another form. In view of

$$ \eta^2 - \mu = \eta^2 - \eta - (1-\eta)\,\varrho b/(\gamma a) = (\eta-1)\big(\eta + \varrho b/(\gamma a)\big) = (\eta-1)\,\omega b/a $$

we obtain

$$ \beta = \frac{\eta - \sqrt\mu}{\omega} = \frac{\eta^2 - \mu}{\omega(\eta + \sqrt\mu)} = \frac{(\eta-1)\,\omega b/a}{\omega(\eta + \sqrt\mu)} = \frac{(\eta-1)\,b/a}{\eta + \sqrt\mu}. $$

For a better understanding, the condition $\mu \ge 0$ can be rewritten as $\eta(\varrho b - \gamma a) \le \varrho b$, i.e. $\eta \le \eta^{SR1}$ for $\eta^{SR1} > 0$, or $\eta \ge \eta^{SR1}$ for $\eta^{SR1} < 0$, where $\eta^{SR1}$ is the value of the parameter $\eta$ for the SR1 method, $\eta^{SR1} = \varrho b/(\varrho b - \gamma a)$, see [5].

Now we define $(1/\gamma)C_+ = \bar C - \hat r\hat r^T$ and give conditions for the matrix $C_+$ to be positive definite.

Lemma 3.3. Let the assumptions of Theorem 3.3 be satisfied, the matrix $\bar C$ be positive definite, the vector $\hat r$ be given by (3.31) and $\eta > 0$. Then $\bar C - \hat r\hat r^T$ is also positive definite.

Proof. For any $q \in \mathbb{R}^N$, $q \ne 0$, we obtain by the Schwarz inequality and $b > 0$

$$ \frac{q^T(\bar C - \hat r\hat r^T)q}{q^T\bar Cq} = 1 - \frac{\mu\,(q^T\bar Cy)^2}{\hat\omega b\, q^T\bar Cq} \ge 1 - \frac{\mu\tilde a}{\hat\omega b} = 1 - \frac{\mu\tilde a}{\eta b\varrho/\gamma + \mu\tilde a} = \frac{\eta b\varrho/\gamma}{\eta b\varrho/\gamma + \mu\tilde a} > 0. $$

To get $H_+ = C_+ + U_+U_+^T$, we can set for $k \ge m$, similarly as in (3.6),

$$ \zeta_+ = \gamma\zeta, \qquad (1/\sqrt\gamma)\,U_+ = \hat V\bar U + \hat u\bar z_1^T, \qquad (1/\sqrt\gamma)\,R_+ = \bar R + \hat r\bar z_2^T, \qquad (3.32) $$

and for $k < m$ simply add the column $\hat u$ to $\hat V\bar U$ and $\hat r$ to $\bar R$.

Summarizing the updating process, we see that it is again very similar to the process in Section 3.1. First we choose the vectors $z_1$, $z_2$, see the recommendations in Section 3.5, and then we determine the matrices $\bar U$, $\bar R$, $\bar C$, $\bar H$ by (2.6) and compute the vector $\hat s$ by (3.30) and the vectors $\hat u$, $\hat r$ by (3.31). Finally we compute the number $\zeta_+$ and the matrices $U_+$, $R_+$ by (3.32) and determine the matrices $C_+$ by (3.5) and $H_+$ by (3.4).

It can be readily verified that the partly inverse representation of (3.31) has the form

$$ \frac1\gamma H_+^{BC} = \Big[\bar C^{-1} + \Big(\frac{1-\eta}{\eta a} + \frac{\gamma}{\varrho b}\Big)yy^T\Big]^{-1} + \hat u\hat u^T + \hat V\bar U\bar U^T\hat V^T. \qquad (3.33) $$

3.5 Choice of parameters

The efficiency of all methods described in this report depends very much on a suitable choice of parameters. Since the analysis of the VM matrix damage caused by projections is very complicated, the recommendations given here have an empirical character.

The choice of the scaling parameter $\gamma$ plays the main role. The value $\gamma = 1$, i.e. scaling omitted in some iterations, is not good here in general. In starting iterations ($\bar H = H$, $\bar C = C$) the value $\gamma = b/a$ appears to be the best choice; the choices $\gamma = b/\tilde a$, $\gamma = b/\sqrt{a\tilde a}$ give approximately the same results. In other iterations the choice $\gamma = b/\sqrt{a\tilde a}$ gives good results; the value $\gamma = b\big/\sqrt{\tilde a\,\max(a,\ \tilde a + |U^TBs|^2)}$ seems to be the best choice. In all iterations, if $\gamma$ computed in this way is less than $10^{-3}$, we set $\gamma = b/\tilde a$.

Also the choice of the projection vector $z_1$ has a great influence. Good choices are

$$ z_1^{(1)} = U^TBs - \frac{y^TUU^TBs}{|U^Ty|^2}\,U^Ty, \qquad z_1^{(2)} = U^TBs - \frac{|U^TBs|}{\sqrt{y^THy}}\,U^Ty; $$

we use the combination $z_1 = (1-\theta)\,z_1^{(1)} + \theta\,z_1^{(2)}$ with $\theta = |U^Ty|^2/\big(|U^Ty|^2 + |U^TBs|^2\big)$. Note that $y^TUz_1^{(1)} = 0$. On the other hand, the choice of the vector $z_2$ has only little influence. We use the vector

$$ z_2 = (y^TRR^TBs)\,R^TBs - |R^TBs|^2\,R^Ty, $$

which satisfies $s^TBRz_2 = 0$. As regards the nonquadratic correction parameter $\varrho$, no value seems to be the best, therefore we choose $\varrho = 1$.
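The recommendations of this section translate directly into code. The sketch below is only an illustration (the function and variable names are ours, not the report's, and it assumes the scalars $a = y^THy$, $\tilde a = y^T\bar Cy$, $b = s^Ty$ and the products $U^TBs$, $U^Ty$, $R^TBs$, $R^Ty$ are already available); it computes the scaling parameter $\gamma$ and the projection vectors $z_1$, $z_2$.

```python
import numpy as np

def choose_gamma(a, a_tilde, b, UtBs, starting_iteration):
    """Scaling parameter gamma following the recommendations of Section 3.5."""
    if starting_iteration:
        return b / a                                    # gamma = b/a in starting iterations
    gamma = b / np.sqrt(a_tilde * max(a, a_tilde + UtBs @ UtBs))
    return gamma if gamma >= 1e-3 else b / a_tilde      # safeguard used in all iterations

def choose_z1(UtBs, Uty, a):
    """Combination z1 = (1 - theta) z1^(1) + theta z1^(2); note y^T U z1^(1) = 0."""
    z1a = UtBs - (Uty @ UtBs) / (Uty @ Uty) * Uty
    z1b = UtBs - np.linalg.norm(UtBs) / np.sqrt(a) * Uty
    theta = (Uty @ Uty) / (Uty @ Uty + UtBs @ UtBs)
    return (1.0 - theta) * z1a + theta * z1b

def choose_z2(RtBs, Rty):
    """Vector z2 = (y^T R R^T B s) R^T B s - |R^T B s|^2 R^T y, so that s^T B R z2 = 0."""
    return (Rty @ RtBs) * RtBs - (RtBs @ RtBs) * Rty

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    m = 10
    UtBs, Uty = rng.standard_normal(m), rng.standard_normal(m)
    RtBs, Rty = rng.standard_normal(m), rng.standard_normal(m)
    a, a_tilde, b = 3.0, 2.5, 1.0                       # hypothetical scalar products
    print(choose_gamma(a, a_tilde, b, UtBs, starting_iteration=False))
    print(np.isclose(RtBs @ choose_z2(RtBs, Rty), 0.0)) # True: s^T B R z2 = 0
```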

4 Computational experiments

Our new limited-memory VM methods were tested using the collection of sparse and partially separable test problems from [6] (Test 14 and Test 15, 22 problems each) with $N = 1000$, $m = 10$, $\varrho = 1$, $\eta = 0.8$ in starting iterations and $\eta = 1$ otherwise, and the final precision $\|g(x^\star)\| \le 10^{-6}$ for Test 14 and $\|g(x^\star)\| \le 10^{-5}$ for Test 15.

In Table 1 we compare our method described in Section 3.4 (PLM) with the following methods: the method given in [2] (BNS), the shifted limited-memory method (SLM), see [7], and the variationally-derived limited-memory method (VLM), see [8]. We present the total numbers of function and also gradient evaluations over all problems (NFE) and the total computational time (Time) in seconds.

Table 1: Comparison with other methods (BNS, SLM, VLM, PLM; total NFE and Time in seconds) for Test 14 and Test 15.

We see that the numerical results of the new methods are comparable with the other methods as regards the number of evaluations. The longer total computational time is caused by the fact that the numbers of columns of the matrices $U$, $R$ are the same in this test version of the method. We plan to reduce the number of columns of the matrix $R$ in the future, see Section 2.

For a better comparison with the method BNS, we performed additional tests with problems from the widely used CUTE collection [1] with various dimensions $N$ and the final precision $\|g(x^\star)\| \le 10^{-6}$. The results are given in Table 2, where NFE is the number of function and also gradient evaluations and Time the computational time in seconds.

Table 2: Comparison with other methods (BNS, SLM, VLM, PLM; NFE and Time in seconds) for the CUTE problems BROYDN7D, CURLY, DIXMAANI, DQRTIC, FLETCBV, GENHUMPS, GENROSE, MSQRTALS, NCB20B, NONCVXU, NONDQUAR, POWER and QUARTC.

Our limited numerical experiments indicate that additional improvements and testing of the new methods could be useful.

References

[1] I. Bongartz, A.R. Conn, N. Gould, P.L. Toint: CUTE: constrained and unconstrained testing environment, ACM Transactions on Mathematical Software 21 (1995).

[2] R.H. Byrd, J. Nocedal, R.B. Schnabel: Representation of quasi-Newton matrices and their use in limited memory methods, Math. Programming 63 (1994).

[3] R. Fletcher: Practical Methods of Optimization, John Wiley & Sons, Chichester, 1987.

[4] J. Greenstadt: Variations on variable metric methods, Math. Comput. 24 (1970) 1-22.

[5] L. Lukšan, E. Spedicato: Variable metric methods for unconstrained optimization and nonlinear least squares, J. Comput. Appl. Math. 124 (2000).

[6] L. Lukšan, J. Vlček: Sparse and partially separable test problems for unconstrained and equality constrained optimization, Report V-767, Prague, ICS AS CR, 1998.

[7] J. Vlček, L. Lukšan: Shifted variable metric methods for large-scale unconstrained optimization, J. Comput. Appl. Math. 186 (2006).

[8] J. Vlček, L. Lukšan: New class of limited-memory variationally-derived variable metric methods, Report V-973, Prague, ICS AS CR, 2006.


More information

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical

More information

Quasi-Newton Methods

Quasi-Newton Methods Quasi-Newton Methods Werner C. Rheinboldt These are excerpts of material relating to the boos [OR00 and [Rhe98 and of write-ups prepared for courses held at the University of Pittsburgh. Some further references

More information

Convex Optimization CMU-10725

Convex Optimization CMU-10725 Convex Optimization CMU-10725 Quasi Newton Methods Barnabás Póczos & Ryan Tibshirani Quasi Newton Methods 2 Outline Modified Newton Method Rank one correction of the inverse Rank two correction of the

More information

f = 2 x* x g 2 = 20 g 2 = 4

f = 2 x* x g 2 = 20 g 2 = 4 On the Behavior of the Gradient Norm in the Steepest Descent Method Jorge Nocedal y Annick Sartenaer z Ciyou Zhu x May 30, 000 Abstract It is well known that the norm of the gradient may be unreliable

More information

10-725/ Optimization Midterm Exam

10-725/ Optimization Midterm Exam 10-725/36-725 Optimization Midterm Exam November 6, 2012 NAME: ANDREW ID: Instructions: This exam is 1hr 20mins long Except for a single two-sided sheet of notes, no other material or discussion is permitted

More information

CHAPTER 6. Projection Methods. Let A R n n. Solve Ax = f. Find an approximate solution ˆx K such that r = f Aˆx L.

CHAPTER 6. Projection Methods. Let A R n n. Solve Ax = f. Find an approximate solution ˆx K such that r = f Aˆx L. Projection Methods CHAPTER 6 Let A R n n. Solve Ax = f. Find an approximate solution ˆx K such that r = f Aˆx L. V (n m) = [v, v 2,..., v m ] basis of K W (n m) = [w, w 2,..., w m ] basis of L Let x 0

More information

Convergence of a Two-parameter Family of Conjugate Gradient Methods with a Fixed Formula of Stepsize

Convergence of a Two-parameter Family of Conjugate Gradient Methods with a Fixed Formula of Stepsize Bol. Soc. Paran. Mat. (3s.) v. 00 0 (0000):????. c SPM ISSN-2175-1188 on line ISSN-00378712 in press SPM: www.spm.uem.br/bspm doi:10.5269/bspm.v38i6.35641 Convergence of a Two-parameter Family of Conjugate

More information

On Sparse and Symmetric Matrix Updating

On Sparse and Symmetric Matrix Updating MATHEMATICS OF COMPUTATION, VOLUME 31, NUMBER 140 OCTOBER 1977, PAGES 954-961 On Sparse and Symmetric Matrix Updating Subject to a Linear Equation By Ph. L. Toint* Abstract. A procedure for symmetric matrix

More information

Reduced-Hessian Methods for Constrained Optimization

Reduced-Hessian Methods for Constrained Optimization Reduced-Hessian Methods for Constrained Optimization Philip E. Gill University of California, San Diego Joint work with: Michael Ferry & Elizabeth Wong 11th US & Mexico Workshop on Optimization and its

More information

Appendix to Portfolio Construction by Mitigating Error Amplification: The Bounded-Noise Portfolio

Appendix to Portfolio Construction by Mitigating Error Amplification: The Bounded-Noise Portfolio Appendix to Portfolio Construction by Mitigating Error Amplification: The Bounded-Noise Portfolio Long Zhao, Deepayan Chakrabarti, and Kumar Muthuraman McCombs School of Business, University of Texas,

More information

Math 408A: Non-Linear Optimization

Math 408A: Non-Linear Optimization February 12 Broyden Updates Given g : R n R n solve g(x) = 0. Algorithm: Broyden s Method Initialization: x 0 R n, B 0 R n n Having (x k, B k ) compute (x k+1, B x+1 ) as follows: Solve B k s k = g(x

More information