Limited-memory projective variable metric methods for unconstrained minimization
Institute of Computer Science
Academy of Sciences of the Czech Republic

Limited-memory projective variable metric methods for unconstrained minimization

J. Vlček, L. Lukšan

Technical report No. V-1036, December 2008

Pod Vodárenskou věží 2, Prague 8, e-mail: luksan@cs.cas.cz, vlcek@cs.cas.cz
Abstract: A new family of limited-memory variable metric or quasi-Newton methods for unconstrained minimization is given. The methods are based on a positive definite inverse Hessian approximation in the form of the sum of the identity matrix and two low-rank matrices, obtained by the standard scaled Broyden class update. To reduce the rank of the matrices, various projections are used. Numerical experience is encouraging.

Keywords: Unconstrained minimization, variable metric methods, limited-memory methods, Broyden class updates, projection matrix, numerical results.

This work was supported by the Grant Agency of the Czech Academy of Sciences, project No. IAA1030405, and the Institutional research plan No. AV0Z10300504. L. Lukšan is also from the Technical University of Liberec, Hálkova 6, 461 17 Liberec.
1 Introduction

In this report we present a new family of limited-memory variable metric (VM) line search methods for unconstrained minimization, which are projective, i.e. they reduce the rank of the matrices in the inverse Hessian approximation by a suitable projection.

VM line search methods, see [5], [3], are iterative. Starting with an initial point x_0 ∈ R^N, they generate iterates x_{k+1} ∈ R^N by the process

    x_{k+1} = x_k + s_k,    s_k = t_k d_k,    k ≥ 0,

where d_k is the direction vector and t_k > 0 is a stepsize. We assume that the problem function f: R^N → R is differentiable, d_k = −H_k g_k, and the stepsize t_k is chosen in such a way that

    f_{k+1} − f_k ≤ ε_1 t_k g_k^T d_k,    g_{k+1}^T d_k ≥ ε_2 g_k^T d_k,    (1.1)

k ≥ 0, where 0 < ε_1 < 1/2, ε_1 < ε_2 < 1, f_k = f(x_k), g_k = ∇f(x_k) and H_k is a symmetric positive definite matrix. We denote B_k = H_k^{-1}, y_k = g_{k+1} − g_k, k ≥ 0, and by ‖·‖_F the Frobenius matrix norm.

Various possibilities how to construct positive definite inverse Hessian approximations based on low-rank matrices are discussed in Section 2. In Section 3 we show how properties of the standard scaled BFGS update can be utilized and generalized to construct new limited-memory methods. Numerical results are presented in Section 4.

2 Positive definite inverse Hessian approximation

A simple form of positive definite matrices H_k based on low-rank matrices is ζ_k I + U_k U_k^T, k ≥ 0, where ζ_k > 0 and U_k is an N × m_k rectangular matrix, m_k ≪ N, but updating matrices of this form appears to be difficult. To avoid this drawback, shifted VM methods, see [7], were developed. They appeared to be very efficient, except for ill-conditioned problems, which is probably caused by the fact that shifted VM updates are not invariant under linear transformations (the significance of the invariance property of methods for the solution of ill-conditioned problems is discussed in [3]). Another possibility how to update matrices of this form is described in [8]. There invariant variationally-derived updates are applied, but the VM matrices cannot be directly used to calculate the direction vectors d_k, and thus complicated corrections are necessary.

Our new methods are based on matrices H_k of the form

    H_k = ζ_k I + U_k U_k^T − R_k R_k^T,    (2.1)

k ≥ 0, where ζ_k > 0 and U_k, R_k are N × min(k, m) rectangular matrices, m ≪ N. Most VM matrices in standard VM methods, see [3], [5], can be written in this way. These methods usually set H_0 = I, and H_{k+1} is obtained from γ_k H_k (γ_k is a scaling parameter) by a rank-two VM update to satisfy the quasi-Newton condition; its generalized form is

    H_{k+1} y_k = ϱ_k s_k,    (2.2)

where ϱ_k > 0 is a nonquadratic correction parameter (see [5]). Therefore the matrices H_k can be easily updated in starting iterations, i.e. for k < m. When k ≥ m, the rank of the matrices U, R needs to be reduced before each update and the updating formulas modified, as described in Section 2.1.
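For illustration, the direction vector d_k = −H_k g_k can be formed from the representation (2.1) without ever assembling H_k. The following NumPy sketch (ours, not part of the report; the function name is hypothetical) shows the O(Nm) computation.

```python
import numpy as np

def direction(g, zeta, U, R):
    """d = -H g for H = zeta*I + U U^T - R R^T, without forming H.

    U and R are N x m with m << N, so the cost is O(N*m).
    Illustrative sketch only, using the report's notation."""
    return -(zeta * g + U @ (U.T @ g) - R @ (R.T @ g))
```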
Some conditions that keep the resulting matrix positive definite are discussed in Section 2.2. We use matrices U_k, R_k with the same number of columns only for simplicity. Numerical experiments indicate that the number of columns of R_k can be substantially reduced without a significant increase of the number of function and gradient evaluations, which can increase efficiency. All methods here can be easily adapted in this way.

For a given q_k ∈ R^N, q_k^T y_k ≠ 0, we denote by V_{q_k} the projection matrix I − q_k y_k^T / q_k^T y_k. To simplify the notation we frequently omit the index k and replace the index k+1 by the symbol +. In the subsequent analysis we use the following notation

    C = ζI − RR^T,    b = s^T y,    V = V_s = I − (1/b) s y^T.

Note that b > 0 by (1.1).

2.1 Limited-memory VM matrices updating

To be able to add new columns when the matrices U, R have rank m, we first need to make their columns linearly dependent, i.e. we need matrices Ū, R̄, near to U, R, and suitable vectors z_1, z_2 ∈ R^m, satisfying Ū z_1 = 0, R̄ z_2 = 0. Such a matrix Ū can in a general way be written as the product of any N × m matrix and the orthogonal projection matrix P_1 = I − z_1 z_1^T / z_1^T z_1 = I − z̄_1 z̄_1^T, z̄_1 = z_1/|z_1|, and similarly for the matrix R̄ with P_2 = I − z̄_2 z̄_2^T, z̄_2 = z_2/|z_2|, instead of P_1. The following lemma can be used to find the nearest (in the sense of the Frobenius matrix norm) matrices to U, R.

Lemma 2.1. Let T be symmetric positive definite, q ∈ R^N, z ∈ R^m, z ≠ 0, m ≤ N, and denote by W the set of N × m matrices, W ∈ W. Then the unique solution to

    min{ ‖T^{-1/2}(W̄ − W)‖_F : W̄ ∈ W }  s.t.  W̄ z = 0    (2.3)

is

    W̄ = W P,    P = I − zz^T/z^T z,    (2.4)

independently of the choice of T. Moreover, we can write

    W̄ W̄^T + qq^T = ( W̄ + q z^T/|z| ) ( W̄ + q z^T/|z| )^T.    (2.5)

Proof. Setting W = (w_1, ..., w_m), W̄ = (w̄_1, ..., w̄_m), z = (ξ_1, ..., ξ_m)^T, we define the Lagrangian function L = L(W̄, e), e ∈ R^N, as

    L = (1/2) ‖T^{-1/2}(W̄ − W)‖_F^2 + e^T W̄ z = (1/2) Σ_{i=1}^m (w̄_i − w_i)^T T^{-1} (w̄_i − w_i) + Σ_{i=1}^m ξ_i e^T w̄_i.

A local minimizer W̄ satisfies the equations ∂L/∂w̄_i = 0, i = 1, ..., m, which gives T^{-1}(w̄_i − w_i) + ξ_i e = 0, i = 1, ..., m, yielding W̄ = W − T e z^T. Using the condition W̄ z = 0, we have T e = W z / z^T z, i.e. W̄ = W P. The matrix W̄ satisfies (2.3) in view of the convexity of the Frobenius norm, and the rest follows immediately from (2.4).
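A minimal NumPy sketch of the reduction step of Lemma 2.1 (illustration only; the function name is ours): the projector P is never formed, the rank-one correction is applied directly.

```python
import numpy as np

def project_columns(W, z):
    """W_bar = W (I - z z^T / z^T z): the matrix nearest to W (Lemma 2.1)
    whose columns are made dependent, i.e. W_bar z = 0.

    W is N x m, z a nonzero vector of length m; sketch only."""
    z_bar = z / np.linalg.norm(z)
    return W - np.outer(W @ z_bar, z_bar)
```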
In view of Lemma 2.1 we set

    Ū = U P_1,    R̄ = R P_2,    C̄ = ζI − R̄ R̄^T,    H̄ = C̄ + Ū Ū^T.    (2.6)

If the matrix H̄ is positive definite and we choose a suitable standard VM update of H̄ in the form (1/γ)H_+ = H̄ + uu^T − rr^T which maintains positive definiteness of the VM matrices, limited-memory updating of H can be completely covered by Lemma 2.1. Writing H_+ = ζ_+ I − R_+ R_+^T + U_+ U_+^T and using (2.5), we can set ζ_+ = γζ, (1/√γ)U_+ = U P_1 + u z̄_1^T, (1/√γ)R_+ = R P_2 + r z̄_2^T. Unfortunately, in the general case we cannot find vectors z_1, z_2 which guarantee positive definiteness of H̄. Some conditions that ensure it will be given in the next section.

2.2 Conditions for positive definiteness of reduced VM matrices

In view of P_i^2 = P_i, i = 1, 2, we can write

    Ū Ū^T = U P_1 U^T = U U^T − U z̄_1 z̄_1^T U^T,    R̄ R̄^T = R P_2 R^T = R R^T − R z̄_2 z̄_2^T R^T.    (2.7)

In view of C̄ = C + R z̄_2 z̄_2^T R^T by (2.6), we see that the situation is quite simple when the matrix C is positive definite, since then also the matrices C̄ and H̄ = C̄ + Ū Ū^T are positive definite for any z_1, z_2. The next section will be devoted to such standard VM updates that can be adapted to maintain this property. Even if the matrix C is not positive definite, the choice of z_2 has no influence on the preservation of positive definiteness of H after the reduction of matrix R, since ζI − R̄ R̄^T + U U^T = H + R z̄_2 z̄_2^T R^T by (2.7). The following lemma shows one possibility how to attain positive definiteness of the VM matrix after the reduction of matrix U.

Lemma 2.2. Let A = H̄ + vv^T, v ∈ R^N, and denote by λ(A), λ(H̄) the minimum eigenvalues of A, H̄. If [Ū, R̄]^T v = 0 then λ(H̄) = λ(A). The conditions Ū^T v = 0, R̄^T v = 0 can be satisfied e.g. by the choice z_1 = U^T v, z_2 = R^T v.

Proof. We first show that λ(A) ≤ ζ. Denoting K_1 = {q ∈ R^N : [Ū, v]^T q = 0, |q| = 1}, we have

    λ(A) = min_{|q|=1} q^T A q ≤ min_{q ∈ K_1} q^T A q = min_{q ∈ K_1} q^T (ζI − R̄ R̄^T) q ≤ ζ.

We will consider the following two cases:
(a) If λ(H̄) = ζ, then λ(H̄) = λ(A) holds, since ζ = λ(H̄) ≤ λ(A) ≤ ζ.
(b) If λ(H̄) ≠ ζ, then H̄ w = λ(H̄) w, w ≠ 0, implies Ū Ū^T w − R̄ R̄^T w = (λ(H̄) − ζ) w and thus w ∈ K_2 = {q ∈ range([Ū, R̄]) : q ≠ 0}. By [Ū, R̄]^T v = 0 we get

    λ(H̄) ≤ λ(A) = min_{q ≠ 0} (q^T A q)/(q^T q) ≤ min_{q ∈ K_2} (q^T A q)/(q^T q) = min_{q ∈ K_2} (q^T H̄ q)/(q^T q) ≤ (w^T H̄ w)/(w^T w) = λ(H̄),

therefore we again obtain λ(H̄) = λ(A). If z_1 = U^T v, we have Ū^T v = P_1 U^T v = P_1 z_1 = 0 and similarly R̄^T v = 0 for z_2 = R^T v.
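A small NumPy illustration of Lemma 2.2 (ours, not from the report; names are hypothetical): choosing z_1 = U^T v and z_2 = R^T v makes the reduced matrices orthogonal to v, so the smallest eigenvalue of H̄ + vv^T equals that of H̄.

```python
import numpy as np

def reduce_with_v(U, R, v):
    """Reduction (2.6) with z1 = U^T v, z2 = R^T v as in Lemma 2.2.

    Returns U_bar, R_bar satisfying U_bar^T v = 0 and R_bar^T v = 0
    (up to rounding).  Illustrative sketch only."""
    def proj(W, z):
        z_bar = z / np.linalg.norm(z)
        return W - np.outer(W @ z_bar, z_bar)
    U_bar, R_bar = proj(U, U.T @ v), proj(R, R.T @ v)
    assert np.allclose(U_bar.T @ v, 0) and np.allclose(R_bar.T @ v, 0)
    return U_bar, R_bar
```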
Note that in our case v = U z̄_1 by (2.7), thus the choice z_1 = U^T v leads to the eigenvalue problem U^T U z̄_1 = |z_1| z̄_1; in view of (2.7) the solution of this problem minimizes the damage to the matrix UU^T, measured by ‖U z̄_1 z̄_1^T U^T‖_F = |U z̄_1|^2 = z_1^T U^T U z_1 / z_1^T z_1, if we choose as z_1 the eigenvector corresponding to the smallest eigenvalue of U^T U. This is valuable also in situations when we attain positive definiteness of the VM matrix after the reduction of matrix U in another way.

3 New limited-memory methods

In this section we focus on the adaptation of standard VM updates to maintain positive definiteness of the matrix C. Having the reduced matrix C̄ positive definite, we want to find an updated matrix H_+ satisfying the quasi-Newton condition (2.2), in the form H_+ = C_+ + U_+ U_+^T, where C_+ is positive definite. An easy way to do it is shown in Section 3.1 in the case of the scaled BFGS method, see [5]. A variant of this approach based on the partly inverse representation of the matrix C is described in Section 3.2. These results are generalized in Section 3.3, using a variational approach, and in Section 3.4, using a special transformation. The choice of parameters, including z_1 and z_2, is discussed in Section 3.5.

3.1 Adaptation of the scaled BFGS method

The standard scaled BFGS update of H̄ can be written in the form (see [5])

    (1/γ) H_+^{BFGS} = (ϱ/(γb)) ss^T + V H̄ V^T,    (3.1)

which in view of H̄ = C̄ + Ū Ū^T after easy rearrangements leads to

    (1/γ) H_+^{BFGS} = (1/γ) C_+^{BFGS} + V Ū Ū^T V^T = C̄ − rr^T + uu^T + V Ū Ū^T V^T,    (3.2)

where

    u = (1/√(ωb)) (ωs − C̄ y),    r = (1/√(ωb)) C̄ y,    ω = ϱ/γ + ã/b,    ã = y^T C̄ y.    (3.3)

If we define (1/γ)C_+ = C̄ − rr^T, we can easily show that the matrix C_+ is positive definite.

Lemma 3.1. Let the matrix C̄ be positive definite and the vector r be given by (3.3). Then the matrix C̄ − rr^T is also positive definite.

Proof. For any nonzero q ∈ R^N we obtain by the Schwarz inequality, (3.3) and b > 0

    q^T (C̄ − rr^T) q / (q^T C̄ q) = 1 − (q^T C̄ y)^2 / (ωb · q^T C̄ q) ≥ 1 − ã/(ωb) = 1 − ã/(bϱ/γ + ã) = (bϱ/γ)/(bϱ/γ + ã) > 0.
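The vectors u and r of (3.3) are cheap to form once the product C̄y is available; a hedged NumPy sketch follows (ours, not the report's code; the function name is hypothetical).

```python
import numpy as np

def bfgs_vectors(C_bar_y, s, y, rho=1.0, gamma=1.0):
    """Vectors u, r of the adapted scaled BFGS update (3.2)-(3.3).

    C_bar_y is the product of C-bar with y; rho and gamma are the
    nonquadratic correction and scaling parameters.  Sketch only."""
    b = s @ y                      # b = s^T y, positive by the line search (1.1)
    a_tilde = y @ C_bar_y          # a-tilde = y^T C-bar y
    omega = rho / gamma + a_tilde / b
    scale = 1.0 / np.sqrt(omega * b)
    return scale * (omega * s - C_bar_y), scale * C_bar_y   # u, r
```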
Replacing C̄ − rr^T in (3.2) by (1/γ)C_+, we see that to get

    H_+ = C_+ + U_+ U_+^T,    (3.4)

we need a matrix U_+ satisfying (1/γ)U_+ U_+^T = V Ū Ū^T V^T + uu^T. Similarly, considering that C̄ = ζI − R̄ R̄^T, we see that to get

    C_+ = ζ_+ I − R_+ R_+^T,    (3.5)

we need ζ_+ > 0 and a matrix R_+ satisfying (1/γ)R_+ R_+^T = R̄ R̄^T + rr^T. By (2.6) we can fulfil all these requirements if we define

    ζ_+ = γζ,    (1/√γ)U_+ = V Ū + u z̄_1^T,    (1/√γ)R_+ = R̄ + r z̄_2^T.    (3.6)

For better clarity, we summarize the updating process. First we choose vectors z_1, z_2, see the recommendations in Section 3.5, and then we determine the matrices Ū, R̄, C̄, H̄ by (2.6) and compute the vectors u, r by (3.3). Finally we compute the number ζ_+ and the matrices U_+, R_+ by (3.6) and determine the matrices C_+ by (3.5) and H_+ by (3.4). In starting iterations we omit the reductions of the matrices U, R, i.e. C̄ = C, Ū = U, R̄ = R, and simply add the column u to V Ū and r to R̄. Note that then H_+ is the BFGS update of H, unlike C_+, which is not any standard VM update of the matrix C.

3.2 Partly inverse representation of the scaled BFGS method

In the previous method we can change ζ only by the scaling (3.6) to guarantee that C_+ will be positive definite. This drawback can be overcome when we use a partly inverse representation of the scaled BFGS update. In starting iterations we can rewrite the BFGS update of the matrix C̄ = C in (3.2) in the form

    (1/γ) C_+^{BFGS} = ( C^{-1} + (γ/(ϱb)) yy^T )^{-1} + uu^T    (3.7)

with u given again by (3.3), which can be readily verified. Thus instead of (1/γ)C_+ = C̄ − rr^T, we can use the equivalent formula

    C_+^{-1} = (1/γ) C^{-1} + (1/(ϱb)) yy^T.    (3.8)

Starting with C_0 = I and updating C_0^{-1}, ..., C_{k-1}^{-1} according to (3.8), we obtain

    C^{-1} = C_k^{-1} = (1/γ̃) I + Y D Y^T,    (3.9)

where γ̃ = γ_0 γ_1 ⋯ γ_{k-1}, Y = [y_0, ..., y_{k-1}] and the diagonal positive definite matrix D is given by D_1 = [1/(ϱ_0 b_0)] and by the update formula

    D_+ = diag( (1/γ) D, 1/(ϱb) ).    (3.10)

In view of (3.9), we store the matrices Y, D instead of R. When k ≥ m, we reduce the matrix U in the same way as before, i.e. Ū = U P_1, while in the case of the matrices Y, D we can simply delete their first columns, i.e. supposing that Y = [y_{k-m}, ..., y_{k-1}], D = diag(d_1, ..., d_m) and C^{-1} = (1/ζ) I + Y D Y^T, ζ > 0, we set

    Ȳ = [y_{k-m+1}, ..., y_{k-1}],    D̄ = diag(d_2, ..., d_m),    C̄^{-1} = (1/ζ) I + Ȳ D̄ Ȳ^T.    (3.11)
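A small sketch of the memory bookkeeping in (3.9)-(3.11) together with the append step of the subsequent update (ours; the report stores D as a diagonal matrix, here represented by the vector of its entries):

```python
import numpy as np

def reduce_and_append(Y, d, y_new, b, rho=1.0, gamma=1.0, m=10):
    """Memory step for the representation (3.9)-(3.11): when k >= m, the
    oldest column of Y and entry of D are deleted as in (3.11); the new
    pair is then appended with weight 1/(rho*b), cf. (3.10).

    d is the vector of diagonal entries of D.  Illustrative sketch only."""
    if Y.shape[1] >= m:                      # reduction (3.11)
        Y, d = Y[:, 1:], d[1:]
    Y_plus = np.column_stack([Y, y_new])
    d_plus = np.append(d / gamma, 1.0 / (rho * b))
    return Y_plus, d_plus
```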
To get H_+ = C_+ + U_+ U_+^T, we define ζ_+, U_+ again by (3.6) and C_+ in view of (3.8) by

    C_+^{-1} = (1/ζ_+) I + Y_+ D_+ Y_+^T,    Y_+ = [Ȳ, y],    D_+ = diag( (1/γ) D̄, 1/(ϱb) ).    (3.12)

To be able to compute C̄ y, we store the m × m matrix Y^T Y and use the Woodbury formula, which yields

    C̄ = ( (1/ζ) I + Ȳ D̄ Ȳ^T )^{-1} = ζI − ζ Ȳ ( (1/ζ) D̄^{-1} + Ȳ^T Ȳ )^{-1} Ȳ^T,    (3.13)

i.e. to compute C̄ y, we need to solve a linear system with the symmetric positive definite matrix (1/ζ) D̄^{-1} + Ȳ^T Ȳ. Note that during the computation of C̄ y we can save the vector Ȳ^T y and use it in the updating of the matrix Y^T Y, since by (3.12) we obtain

    Y_+^T Y_+ = [[ Ȳ^T Ȳ, Ȳ^T y ], [ y^T Ȳ, y^T y ]].    (3.14)

Now we summarize the updating process. First we choose the vector z_1, see the recommendations in Section 3.5, and then we determine the matrix Ū by (2.6), Ȳ, D̄, C̄^{-1} by (3.11) and H̄ by (2.6) and compute the vector u by (3.3), using (3.13). Finally we compute the number ζ_+ and the matrix U_+ by (3.6) and determine the matrices Y_+, D_+, C_+^{-1} by (3.12), using (3.14), and H_+ by (3.4).

If we wish to use a different projection instead of deleting the first columns of Y and D (although we are not able to say at this time how to choose a suitable projection), we can use another representation of C^{-1}. In starting iterations we rewrite (3.9) in the form

    C^{-1} = C_k^{-1} = (1/γ̃) I + Ỹ Ỹ^T,    Ỹ = Y D^{1/2}.    (3.15)

Then for k ≥ m we reduce the matrix U as before, i.e. Ū = U P_1, and the matrix Ỹ in a similar way, i.e. Ỹ̄ = Ỹ P_3, P_3 = I − z̄_3 z̄_3^T, z̄_3 = z_3/|z_3|, with some z_3 ∈ R^m, and set

    C̄^{-1} = (1/ζ) I + Ỹ̄ Ỹ̄^T,    ζ > 0.    (3.16)

To get H_+ = C_+ + U_+ U_+^T, we define ζ_+, U_+ again by (3.6) and C_+ in view of (3.8) by

    C_+^{-1} = (1/ζ_+) I + Ỹ_+ Ỹ_+^T,    Ỹ_+ = (1/√γ) Ỹ̄ + (1/√(ϱb)) y z̄_3^T.    (3.17)

To be able to compute C̄ y, we store the m × m matrix Ỹ^T Ỹ and use the Woodbury formula, which yields

    C̄ = ( (1/ζ) I + Ỹ̄ Ỹ̄^T )^{-1} = ζI − ζ Ỹ̄ ( (1/ζ) I + Ỹ̄^T Ỹ̄ )^{-1} Ỹ̄^T.

Note that during the computation of C̄ y we can again save the vector Ỹ̄^T y and use it in the updating of the matrix Ỹ^T Ỹ, since by (3.17) we obtain

    Ỹ_+^T Ỹ_+ = (1/γ) Ỹ̄^T Ỹ̄ + (1/√(ϱγb)) ( Ỹ̄^T y z̄_3^T + z̄_3 y^T Ỹ̄ ) + (y^T y/(ϱb)) z̄_3 z̄_3^T.
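For illustration, the product C̄y needed in (3.3) can be obtained from the stored pair (Ȳ, D̄) through the Woodbury identity (3.13); a hedged NumPy sketch (ours) also returns Ȳ^T y for reuse in (3.14).

```python
import numpy as np

def apply_C_bar(y, zeta, Y_bar, d_bar, YtY_bar):
    """Compute the product of C-bar with y from the partly inverse
    representation (3.11) via the Woodbury formula (3.13); only a small
    symmetric positive definite system is solved.

    d_bar holds the diagonal of D-bar, YtY_bar the stored Gram matrix
    of Y-bar.  Illustrative sketch only."""
    Yty = Y_bar.T @ y                                  # reusable in (3.14)
    M = np.diag(1.0 / (zeta * d_bar)) + YtY_bar        # (1/zeta) D^{-1} + Y^T Y
    return zeta * (y - Y_bar @ np.linalg.solve(M, Yty)), Yty
```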
3.3 Variationally-derived generalization

Although the scaled Broyden class updates of H̄ with a positive value of the parameter η can be written in a quasi-product form similar to (3.1), see [7], we cannot construct limited-memory methods in a similar way as in the previous sections, since then the projection matrix contains the vector H̄ y instead of C̄ y and therefore we do not obtain a relation similar to (3.2). In spite of that, it is possible to generalize this process to the standard scaled Broyden class updates, as we show in the next section. In this section we briefly describe two other possibilities.

The first approach is based on the projection variant, see [8], of the well-known Greenstadt's theorem, see [4]:

Theorem 3.1. Let M, T be symmetric matrices, T positive definite, ϱ > 0, p = T y, and denote by M the set of N × N symmetric matrices. Then the unique solution to

    min{ ‖T^{-1/2}(M_+ − M)T^{-1/2}‖_F : M_+ ∈ M }  s.t.  M_+ y = ϱs

is determined by the relation V_p (M_+ − M) V_p^T = 0 and can be written in the form

    M_+ = E + V_p (M − E) V_p^T,

where E is any symmetric matrix satisfying E y = ϱs, e.g. E = (ϱ/b) ss^T.

Using this theorem with M = γH̄ = γ(C̄ + Ū Ū^T), M_+ = H_+, p = s − α C̄ y, α ∈ R, and E = (ϱ/b) ss^T, we obtain

    (1/γ) H_+ = V_p Ū Ū^T V_p^T + (1/γ) C_+^{BC},    (1/γ) C_+^{BC} = (ϱ/(γb)) ss^T + V_p ( C̄ − (ϱ/(γb)) ss^T ) V_p^T,    (3.18)

where by Lemma 2.3 in [8] this C_+^{BC} is the Broyden class update, see [5], of C̄ with the parameter

    η_C = ( b^2/(b − αã)^2 ) ( 1 − α^2 (ϱ/γ)(ã/b) ).    (3.19)

For a given η_C satisfying

    0 < η_C < ωγ/ϱ,    ω = ϱ/γ + (ã/b) η_C,    (3.20)

we can readily verify that the corresponding α can be obtained from

    α_1 = [ η_C + √( η_C + (1 − η_C) ϱb/(γã) ) ] / ω   or   α_2 = (b/ã)(η_C − 1) / [ η_C + √( η_C + (1 − η_C) ϱb/(γã) ) ].    (3.21)

Using the usual form of the Broyden class update of C̄ in (3.18), after straightforward rearrangements we obtain the relation, similar to (3.2),

    (1/γ) H_+ = (1/γ) C_+^{BC} + V_p Ū Ū^T V_p^T = C̄ − r̃r̃^T + ũũ^T + V_p Ū Ū^T V_p^T,    (3.22)

where

    ũ = √(ω/b) ( s − (η_C/ω) C̄ y ),    r̃ = √( 1 − ϱη_C/(γω) ) C̄ y / √ã.    (3.23)
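A hedged NumPy sketch (ours) of the quantities in (3.20)-(3.23) as reconstructed above; eta_C is assumed to satisfy (3.20), and the function name is hypothetical.

```python
import numpy as np

def broyden_vectors(C_bar_y, s, y, eta_C, rho=1.0, gamma=1.0):
    """Vectors u-tilde, r-tilde of (3.23) and the root alpha_1 of (3.21)
    for a given Broyden-class parameter eta_C.  Illustrative sketch only."""
    b = s @ y
    a_tilde = y @ C_bar_y
    omega = rho / gamma + (a_tilde / b) * eta_C        # omega of (3.20)
    root = np.sqrt(eta_C + (1.0 - eta_C) * rho * b / (gamma * a_tilde))
    alpha1 = (eta_C + root) / omega
    u_t = np.sqrt(omega / b) * (s - (eta_C / omega) * C_bar_y)
    r_t = np.sqrt(1.0 - rho * eta_C / (gamma * omega)) * C_bar_y / np.sqrt(a_tilde)
    return u_t, r_t, alpha1
```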
Note that H_+ is not here the Broyden class update of H̄. Similarly as in Section 3.1 we define (1/γ)C_+ = C̄ − r̃r̃^T = ζI − R̄ R̄^T − r̃r̃^T. The following lemma gives simple conditions for the matrix C_+ to be positive definite.

Lemma 3.2. Let the matrix C̄ be positive definite, η_C satisfy (3.20) and the vector r̃ be given by (3.23). Then the matrix C̄ − r̃r̃^T is also positive definite.

Proof. For any nonzero q ∈ R^N we obtain by the Schwarz inequality and b > 0

    q^T (C̄ − r̃r̃^T) q / (q^T C̄ q) = 1 − ( 1 − ϱη_C/(γω) ) (q^T C̄ y)^2 / (ã · q^T C̄ q) ≥ 1 − ( 1 − ϱη_C/(γω) ) = ϱη_C/(γω) > 0.

To get H_+ = C_+ + U_+ U_+^T, we can, similarly as in (3.6), for k ≥ m set

    ζ_+ = γζ,    (1/√γ)U_+ = V_p Ū + ũ z̄_1^T,    (1/√γ)R_+ = R̄ + r̃ z̄_2^T,    (3.24)

and for k < m simply add the column ũ to V_p Ū and r̃ to R̄.

The second approach utilizes the idea of shifting, similarly as the shifted VM methods, see [7]. We again update C̄, using the scaled standard Broyden class update (3.18) in the usual form (1/γ)C_+^{BC} = C̄ − r̃r̃^T + ũũ^T with r̃, ũ given by (3.23), set (1/γ)C_+ = C̄ − r̃r̃^T and update ζ and R̄ according to (3.24). To update the matrix U, we use the shifted quasi-Newton condition

    U_+ U_+^T y = ϱ s̃,    (3.25)

where in view of H_+ = C_+ + U_+ U_+^T, (2.2) and (3.23) we define

    s̃ = (1/ϱ) ( H_+ y − C_+ y ) = s − (γ/ϱ) ( C̄ y − (r̃^T y) r̃ ) = s − (γ/ϱ)( ϱη_C/(γω) ) C̄ y = s − (η_C/ω) C̄ y.    (3.26)

An important property of the shifting defined in this way is that the number b̃ = s̃^T y is always positive, since by (3.26)

    b̃ = ( s − (η_C/ω) C̄ y )^T y = b − (η_C/ω) ã = (bω − ã η_C)/ω = (ϱ/γ) b/ω > 0.    (3.27)

Now we define the matrix U_+ by the following theorem, see [7].

Theorem 3.2. Let T be a symmetric positive definite matrix, ϱ > 0, γ > 0, p = T y, and denote by U the set of N × m, m ≤ N, matrices. Then the unique solution to

    min{ ϕ(U_+) : U_+ ∈ U }  s.t. (3.25),    ϕ(U_+) = y^T T y · ‖T^{-1/2}(U_+ − √γ U)‖_F^2,

is

    (1/√γ)U_+ = V_p U P + (1/b̃) s̃ z^T,    z = (1/√γ) U_+^T y,    P = I − zz^T/z^T z.
It follows from (3.25) and (3.27) that the vector z defined by this theorem satisfies

    z^T z = (1/γ) y^T U_+ U_+^T y = (ϱ/γ) s̃^T y = (ϱ/γ) b̃ = b̃^2 ω/b,

which yields by (3.26) and (3.23) ( √(z^T z)/b̃ ) s̃ = √(ω/b) s̃ = ũ. Setting z_1 = z, Theorem 3.2 gives for U_+ the updating formula

    (1/√γ)U_+ = V_p U P_1 + ( √(z^T z)/b̃ ) s̃ z̄_1^T = V_p Ū + ũ z̄_1^T,    (3.28)

which is the corresponding formula from (3.24) with the projection matrix V_p defined by p = T y instead of by p = s − α C̄ y, i.e. we see that the projection matrix in this update formula can be derived not only from an update of the matrix C, but also in another way, e.g. from an update of H.

Summarizing the updating process, we see that it is very similar to the process in Section 3.1. First we choose vectors z_1, z_2, see the recommendations in Section 3.5, and then we determine the matrices Ū, R̄, C̄, H̄ by (2.6) and compute the vectors ũ, r̃ by (3.23). Finally we compute the number ζ_+ and the matrices U_+ by (3.24) or (3.28) and R_+ by (3.24) and determine the matrices C_+ by (3.5) and H_+ by (3.4).

The previous methods can also be modified to use the partly inverse representation of updates. We can readily verify the following generalization of (3.7) for the scaled standard Broyden class update with parameter η_C ≠ 0:

    (1/γ) C_+^{BC} = ( C̄^{-1} + [ (1 − η_C)/(η_C ã) + γ/(ϱb) ] yy^T )^{-1} + ũũ^T,    (3.29)

which can be used instead of (1/γ)C_+^{BC} = C̄ − r̃r̃^T + ũũ^T in (3.22).

3.4 Generalization to the scaled Broyden class using transformation

Methods based on the adaptation of the scaled BFGS method can be generalized to the standard Broyden class updates with parameter η if we use the following special representation of these updates. We denote

    a = y^T H̄ y,    ω = ϱ/γ + (a/b) η,    μ = η + (1 − η)(ϱ/γ)(b/a).

Theorem 3.3. Let ϱ > 0, γ > 0, aω ≠ 0, μ ≥ 0 and β = (η ± √μ)/ω. Then the scaled standard Broyden class update of H̄ with parameter η, scaling parameter γ and nonquadratic correction parameter ϱ can be expressed in the form

    (1/γ) H_+^{BC} = (ϱη/(γb)) ŝŝ^T + V̂ H̄ V̂^T,    ŝ = s − β H̄ y,    V̂ = I ± (√μ/b) ŝ y^T.    (3.30)

Moreover, if ω̂ = (ϱ/γ)η + (ã/b)μ > 0, we can write

    (1/γ) H_+^{BC} = C̄ − r̂r̂^T + ûû^T + V̂ Ū Ū^T V̂^T,    r̂ = √(μ/(ω̂b)) C̄ y,    û = √(ω̂/b) ŝ ± r̂.    (3.31)
Proof. Consider the scaled Broyden class update with parameters η, γ and ϱ in the form, see [5],

    (1/γ) H_+^{BC} = H̄ + (ω/b) ss^T − (η/b) ( H̄ y s^T + s y^T H̄ ) + ((η − 1)/a) H̄ y y^T H̄.

Setting s = ŝ + ξ H̄ y, ξ ∈ R, we obtain

    (1/γ) H_+^{BC} = H̄ + (ω/b) ŝŝ^T + ((ξω − η)/b) ( H̄ y ŝ^T + ŝ y^T H̄ ) + ( (ξ^2 ω − 2ξη)/b + (η − 1)/a ) H̄ y y^T H̄.

The last term vanishes for ξ^2 ω − 2ξη + (b/a)(η − 1) = 0, i.e. for ξ = (η ± √μ)/ω = β; then ξω − η = ±√μ and

    (1/γ) H_+^{BC} = H̄ + (ω/b) ŝŝ^T ± (√μ/b) ( H̄ y ŝ^T + ŝ y^T H̄ ) = V̂ H̄ V̂^T + ( (ω − (a/b)μ)/b ) ŝŝ^T,

which is (3.30) in view of bω − aμ = (ϱ/γ)b[1 − (1 − η)] + aη − aη = (ϱ/γ)ηb. Setting H̄ = C̄ + Ū Ū^T into (3.30), we get

    (1/γ) H_+^{BC} = (ϱη/(γb)) ŝŝ^T + V̂ Ū Ū^T V̂^T + ( I ± (√μ/b) ŝ y^T ) C̄ ( I ± (√μ/b) y ŝ^T )
                   = V̂ Ū Ū^T V̂^T + C̄ ± (√μ/b) ( C̄ y ŝ^T + ŝ y^T C̄ ) + ( (ϱη/γ + (ã/b)μ)/b ) ŝŝ^T
                   = V̂ Ū Ū^T V̂^T + C̄ + (ω̂/b) ( ŝ ± (√μ/ω̂) C̄ y )( ŝ ± (√μ/ω̂) C̄ y )^T − (μ/(ω̂b)) C̄ y y^T C̄,

which is (3.31).

Note that V̂ is not a projection matrix in general and that we prefer the minus sign in β, V̂ and û, since then for η = 1 (BFGS) we get β = 0, ŝ = s and V̂ = V. For η → 1 we also have √μ → 1, therefore the formula for β above should be rewritten in another form. In view of

    η^2 − μ = η^2 − η − (1 − η) ϱb/(γa) = (η − 1)( η + ϱb/(γa) ) = (η − 1) ωb/a

we obtain

    β = (η − √μ)/ω = (η^2 − μ)/( ω(η + √μ) ) = (η − 1)(ωb/a)/( ω(η + √μ) ) = (η − 1)(b/a)/( η + √μ ).

For better understanding, the condition μ ≥ 0 can be rewritten as η(ϱb − γa) ≤ ϱb, i.e. η ≤ η^{SR1} for η^{SR1} > 0, or η ≥ η^{SR1} for η^{SR1} < 0, where η^{SR1} is the value of the parameter η for the SR1 method, η^{SR1} = ϱb/(ϱb − γa), see [5].
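A brief NumPy sketch (ours, names hypothetical) of the transformation quantities of Theorem 3.3 with the preferred minus sign, using the cancellation-free form of β derived above:

```python
import numpy as np

def transform_quantities(H_bar_y, C_bar_y, s, y, eta, rho=1.0, gamma=1.0):
    """mu, the stable beta, s-hat and the vectors of (3.31) for Theorem 3.3.

    H_bar_y and C_bar_y are the products of H-bar and C-bar with y.
    Requires mu >= 0; illustrative sketch only."""
    b, a = s @ y, y @ H_bar_y
    mu = eta + (1.0 - eta) * rho * b / (gamma * a)
    beta = (eta - 1.0) * (b / a) / (eta + np.sqrt(mu))   # = (eta - sqrt(mu))/omega
    s_hat = s - beta * H_bar_y                           # (3.30)
    omega_hat = (rho / gamma) * eta + (y @ C_bar_y / b) * mu
    r_hat = np.sqrt(mu / (omega_hat * b)) * C_bar_y      # (3.31)
    u_hat = np.sqrt(omega_hat / b) * s_hat - r_hat       # minus sign preferred
    return s_hat, u_hat, r_hat, beta
```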
Now we define (1/γ)C_+ = C̄ − r̂r̂^T and give conditions for the matrix C_+ to be positive definite.

Lemma 3.3. Let the assumptions of Theorem 3.3 be satisfied, the matrix C̄ be positive definite, the vector r̂ be given by (3.31) and η > 0. Then C̄ − r̂r̂^T is also positive definite.

Proof. For any nonzero q ∈ R^N we obtain by the Schwarz inequality and b > 0

    q^T (C̄ − r̂r̂^T) q / (q^T C̄ q) = 1 − μ (q^T C̄ y)^2 / (ω̂b · q^T C̄ q) ≥ 1 − μã/(ω̂b) = 1 − μã/(ηbϱ/γ + μã) = (ηbϱ/γ)/(ηbϱ/γ + μã) > 0.

To get H_+ = C_+ + U_+ U_+^T, we can set for k ≥ m, similarly as in (3.6),

    ζ_+ = γζ,    (1/√γ)U_+ = V̂ Ū + û z̄_1^T,    (1/√γ)R_+ = R̄ + r̂ z̄_2^T,    (3.32)

and for k < m simply add the column û to V̂ Ū and r̂ to R̄.

Summarizing the updating process, we see that it is again very similar to the process in Section 3.1. First we choose vectors z_1, z_2, see the recommendations in Section 3.5, and then we determine the matrices Ū, R̄, C̄, H̄ by (2.6) and compute the vector ŝ by (3.30) and the vectors û, r̂ by (3.31). Finally we compute the number ζ_+ and the matrices U_+, R_+ by (3.32) and determine the matrices C_+ by (3.5) and H_+ by (3.4).

It can be readily verified that the partly inverse representation of (3.31) has the form

    (1/γ) H_+^{BC} = ( C̄^{-1} + [ (1 − η)/(ηa) + γ/(ϱb) ] yy^T )^{-1} + ûû^T + V̂ Ū Ū^T V̂^T.    (3.33)

3.5 Choice of parameters

The efficiency of all methods described in this report depends very much on a suitable choice of parameters. Since an analysis of the VM matrix damage caused by projections is very complicated, the recommendations given here have an empirical character.

The choice of the scaling parameter γ plays the main role. The value γ = 1, i.e. scaling omitted in some iterations, is not good here in general. In starting iterations (H̄ = H, C̄ = C), the value γ = b/a appears to be the best choice; the choices γ = b/ã, γ = b/√(aã) give approximately the same results. In other iterations the choice γ = b/√(aã) gives good results; the value γ = b/√( ã max(a, ã + |U^T Bs|^2) ) seems to be the best choice. In all iterations, if γ computed in this way is less than 10^{-3}, we set γ = b/ã.

Also the choice of the projection vector z_1 has a great influence. Good choices are

    z_1^{(1)} = U^T Bs − ( y^T U U^T Bs / |U^T y|^2 ) U^T y,    z_1^{(2)} = U^T Bs − ( |U^T Bs| / √(y^T H y) ) U^T y;

we use the combination z_1 = (1 − θ) z_1^{(1)} + θ z_1^{(2)} with θ = |U^T y|^2 / ( |U^T y|^2 + |U^T Bs|^2 ). Note that y^T U z_1^{(1)} = 0.

On the other hand, the choice of the vector z_2 has only a little influence. We use the vector z_2 = (y^T R R^T Bs) R^T Bs − ( |R^T Bs|^2 ) R^T y, which satisfies s^T B R z_2 = 0.

As regards the nonquadratic correction parameter ϱ, no value seems to be the best, therefore we choose ϱ = 1.
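A hedged NumPy sketch (ours) of the z_1 combination from Section 3.5; the original formulas are partly garbled in this copy, so the exact scaling factors below should be treated as an assumption.

```python
import numpy as np

def choose_z1(U, Bs, y, yHy):
    """Combination z1 = (1 - theta) z1_(1) + theta z1_(2) of Section 3.5.

    Bs = B s and yHy = y^T H y are assumed available from the iteration.
    The exact scalings follow our reading of the report; sketch only."""
    UtBs, Uty = U.T @ Bs, U.T @ y
    z1a = UtBs - ((Uty @ UtBs) / (Uty @ Uty)) * Uty      # y^T U z1_(1) = 0
    z1b = UtBs - (np.linalg.norm(UtBs) / np.sqrt(yHy)) * Uty
    theta = (Uty @ Uty) / (Uty @ Uty + UtBs @ UtBs)
    return (1.0 - theta) * z1a + theta * z1b
```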
4 Computational experiments

Our new limited-memory VM methods were tested using the collection of sparse and partially separable test problems from [6] (Test 14 and Test 15, 22 problems each) with N = 1000, m = 10, ϱ = 1, η = 0.8 in starting iterations and η = 1 otherwise, and the final precision ‖g(x*)‖ ≤ 10^{-6} for Test 14 and ‖g(x*)‖ ≤ 10^{-5} for Test 15.

In Table 1 we compare our method described in Section 3.4 (PLM) with the following methods: the method given in [2] (BNS), the shifted limited-memory method (SLM), see [7], and the variationally-derived limited-memory method (VLM), see [8]. We present the total numbers of function and also gradient evaluations over all problems (NFE) and the total computational time (Time) in seconds.

Table 1: Comparison with other methods for Test 14 and Test 15 (rows Test 14, Test 15; columns NFE and Time for BNS, SLM, VLM and PLM; the numerical entries did not survive in this copy of the report).

We see that the numerical results of the new methods are comparable with the other methods as regards the number of evaluations. The longer total computational time is caused by the fact that the numbers of columns of the matrices U, R are the same in this test version of the method. We plan to reduce the number of columns of the matrix R in the future, see Section 2.

For a better comparison with the method BNS, we performed additional tests with problems from the widely used CUTE collection [1] with various dimensions N and the final precision ‖g(x*)‖ ≤ 10^{-6}. The results are given in Table 2, where NFV is the number of function and also gradient evaluations and Time the computational time in seconds.

Table 2: Comparison with other methods for CUTE (problems BROYDN7D, CURLY, DIXMAANI, DQRTIC, FLETCBV, GENHUMPS, GENROSE, MSQRTALS, NCB20B, NONCVXU, NONDQUAR, POWER, QUARTC; columns N, NFV and Time for BNS, SLM, VLM and PLM; the numerical entries did not survive in this copy of the report).
Our limited numerical experiments indicate that additional improvements and testing of the new methods could be useful.

References

[1] I. Bongartz, A.R. Conn, N. Gould, P.L. Toint: CUTE: constrained and unconstrained testing environment, ACM Transactions on Mathematical Software 21 (1995).
[2] R.H. Byrd, J. Nocedal, R.B. Schnabel: Representations of quasi-Newton matrices and their use in limited memory methods, Math. Programming 63 (1994).
[3] R. Fletcher: Practical Methods of Optimization, John Wiley & Sons, Chichester, 1987.
[4] J. Greenstadt: Variations on variable metric methods, Math. Comput. 24 (1970) 1-22.
[5] L. Lukšan, E. Spedicato: Variable metric methods for unconstrained optimization and nonlinear least squares, J. Comput. Appl. Math. 124 (2000).
[6] L. Lukšan, J. Vlček: Sparse and partially separable test problems for unconstrained and equality constrained optimization, Report V-767, Prague, ICS AS CR, 1998.
[7] J. Vlček, L. Lukšan: Shifted variable metric methods for large-scale unconstrained optimization, J. Comput. Appl. Math. 186 (2006).
[8] J. Vlček, L. Lukšan: New class of limited-memory variationally-derived variable metric methods, Report V-973, Prague, ICS AS CR, 2006.