THE RIEMANNIAN BARZILAI-BORWEIN METHOD WITH NONMONOTONE LINE SEARCH AND THE MATRIX GEOMETRIC MEAN COMPUTATION

BRUNO IANNAZZO AND MARGHERITA PORCELLI

Abstract. The Barzilai-Borwein method, an effective gradient descent method with a clever choice of the step-length, is adapted from nonlinear optimization to Riemannian manifold optimization. More generally, global convergence of a nonmonotone line-search strategy for Riemannian optimization algorithms is proved under some standard assumptions. By a set of numerical tests, the Riemannian Barzilai-Borwein method with nonmonotone line-search is shown to be competitive in several Riemannian optimization problems. When used to compute the matrix geometric mean, known as the Karcher mean of positive definite matrices, it notably outperforms existing first-order optimization methods.

Riemannian optimization; manifold optimization; Barzilai-Borwein algorithm; nonmonotone line-search; Karcher mean; matrix geometric mean; positive definite matrix.

1. Introduction. Globally convergent Barzilai-Borwein (BB) methods are well-known optimization methods for the solution of unconstrained and constrained optimization problems formulated in the Euclidean space. These algorithms are rather appealing for their simplicity, low cost per iteration (they are gradient-type algorithms) and good practical performance due to a clever choice of the step-length ([2, 11, 13, 9]).

The key to the success of BB methods lies, on the one side, in the explicit use of first-order information on the cost function and, on the other side, in the implicit use of second-order information embedded in the step-length through a rough approximation of the Hessian of the cost function. This is crucial in the solution of problems where computing the Hessian represents a heavy burden, as when the problem dimension is large. For strictly convex quadratic problems, global convergence of the BB method has been established in [28], while in the general nonquadratic case this property is guaranteed if BB is incorporated in a globalization strategy, see [29]. Due to the nonmonotone behaviour of the cost function along the BB steps, BB methods are generally combined with nonmonotone line-search strategies that do not impose a sufficient decrease condition on the cost function at each step, but rather on the maximum of the cost function values over the last, say, M steps (with M > 1) ([16, 29, 17]). It has been observed that if M is sufficiently large this strategy does not spoil the average rate of convergence of the BB method ([29, 17]) and yields impressively good practical performance, especially in comparison with classical conjugate gradient methods, see [29].

Motivated by the nice theoretical and numerical properties of the (globalized) BB methods in the Euclidean space, we consider here a more general setting: BB methods for Riemannian manifold optimization based on retractions. Therefore, in this work we address the following optimization problem

    $\min_{x \in \mathcal{M}} f(x),$    (1.1)

where $f$ is a smooth real-valued function (the cost function) defined over a Riemannian manifold $\mathcal{M}$.

Version of February 14,
Dipartimento di Matematica e Informatica, Università degli Studi di Perugia, Via Vanvitelli 1, Perugia, Italy (bruno.iannazzo@dmi.unipg.it).
Dipartimento di Ingegneria Industriale, Università degli Studi di Firenze, Viale Morgagni 40/44, Firenze, Italy (margherita.porcelli@unifi.it).

To the best of our knowledge, globally convergent nonmonotone BB-type methods have been proposed in Riemannian optimization only for the special case of Stiefel manifolds, see [14, 33, 21]. More precisely, in [14] the global convergence of a trust-region nonmonotone Levenberg-Marquardt algorithm for minimization problems on closed domains is provided, and the application of the proposed method to minimization on Stiefel manifolds results in a BB method with a nonmonotone trust-region. The papers [21] and [33] are concerned with the BB method with nonmonotone line-search, and the global convergence of the procedure is proved in [21] for Stiefel manifolds.

As a first contribution of this paper, under mild assumptions we prove global convergence of general (gradient-related) Riemannian optimization algorithms, when equipped with a nonmonotone line-search strategy. The line-search will be suitably adjusted to fit the framework of Riemannian manifold optimization and will be performed on the tangent space at a point. We therefore extend to Riemannian optimization the convergence results proved for the Euclidean space in [16, 29, 8]. Secondly, we introduce the Riemannian Barzilai-Borwein algorithm, which generalizes the Euclidean BB method, and discuss the corresponding algorithm embedded in a nonmonotone line-search strategy. Thirdly, we apply the proposed algorithm to the computation of the geometric mean of positive definite matrices, the so-called Karcher mean, which is very popular in many areas of applied mathematics and engineering, see, e.g., part II of [26]. The proposed method is a first-order algorithm; nevertheless, as pointed out in [20], the Riemannian Hessian (or its full approximation) of the cost function defining the Karcher mean of positive definite matrices has a high computational complexity, which makes the use of second-order algorithms prohibitive. Finally, we benchmark the Riemannian BB over a set of problems involving different manifold structures and compare it with the trust-region and the steepest-descent line-search algorithms. Remarkably, in all our experiments, the performance of the proposed algorithm is superior to those of the existing first-order optimization algorithms for computing the Karcher mean of positive definite matrices ([31, 20, 5, 34]).

The paper is organized as follows. For later convenience, we recall in Section 2 some concepts from Riemannian optimization, using the language and the tools described in the book [1]. Section 3 is devoted to the global convergence analysis of the nonmonotone line-search strategy for gradient-related Riemannian optimization algorithms; then, in Section 4, we introduce the Riemannian BB algorithm and its globalization via nonmonotone line-search. Section 5 is devoted to the Karcher mean of positive definite matrices and to the implementation of the Riemannian BB method for its computation. In Section 6 numerical tests are performed to confirm the good behaviour of the Riemannian BB method for computing the Karcher mean of positive definite matrices and the possibility to apply the method to other problems of Riemannian optimization. Conclusions are drawn in Section 7.

2. Preliminaries on Riemannian optimization. The subject of Riemannian optimization is the computation of the minima of a differentiable function $f : \mathcal{M} \to \mathbb{R}$, where $\mathcal{M}$ is a real Riemannian manifold. Standard nonlinear optimization, which will be referred to as Euclidean optimization, can be interpreted as a special case of Riemannian optimization, where $\mathcal{M} = \mathbb{R}^n$. For the ease of the reader, we briefly recall some of the basic concepts of Riemannian optimization based on retractions.
We refer the reader to the book [1] for a thorough treatment of the topic. In Riemannian optimization algorithms, the iterate $x_k$ belongs to a manifold $\mathcal{M}$, while the gradient at $x_k$ belongs to the tangent space $T_{x_k}\mathcal{M}$ at $x_k$, which is isomorphic to a Euclidean space. The gradient of $f$ related to the geometry of $\mathcal{M}$ is denoted by $\nabla^{(R)} f$ and defined as follows: for any point $x \in \mathcal{M}$, $\nabla^{(R)} f(x)$ is the unique element of $T_x\mathcal{M}$ that satisfies

    $Df(x)[h] = \langle \nabla^{(R)} f(x), h \rangle_x, \qquad h \in T_x\mathcal{M},$

3 The Riemannian Barzilai-Borwein algorithm 3 where, x is the Riemannian scalar product in T x M and Df(x) : T x M T x M is the differential of f at x. In the paper, we refer to (R) f(x) as the Riemannian gradient of f at x and frequently use the notation g(x) := (R) f(x) and g k := (R) f(x k ). A step of a gradient-like method in R n involves summing points and gradient vectors. While in R n, the concept of moving in the direction of the negative gradient is straightforward, on a manifold, using the gradient (R) f(x k ) to update x k is tricky since x k lies on the manifold while the gradient belongs to T xk M. Indeed, the next iterate x k+1 ought to be constructed following a geodesic starting at x k and with tangent vector (R) f(x k ), that is a curve γ(t) : [0, t 0 ) M, with t 0 (0, ]. In contrast with the standard gradient-like methods for Euclidean unconstrained optimization, where t is an arbitrary positive number, in the Riemannian case, t must be chosen between 0 and t 0. Nevertheless, there are special manifolds, said to be geodesically complete, for which t 0 can be always chosen to be. Geodesics can be followed using the exponential map exp x (v), which gives the point γ(1) of a geodesic γ(t) starting at x = γ(0) and with tangent vector v. Since it is not always easy to find and follow geodesics, in manifold optimization the exponential map is approximated with a retraction map and we refer to retraction-based Riemannian optimization. The retraction is defined as a smooth map R from the tangent bundle T M to M, such that R x (0 x ) = x and DR x (0 x ) = id TxM, where R x denotes the restriction of R to T x M, 0 x is the zero element of T x M, id TxM is the identity map on T x M and we have identified the tangent space to T x M with itself. The exponential map is a special case of retraction since it satisfies the above properties too. In practice using a retraction, in lieu of the exponential map, yields the same convergence results in an optimization algorithm, being possibly much easier to compute. In general, the domain of R is not necessarily the whole tangent bundle but for each point x we may assume that there is a neighbourhood of 0 x in T x M such that R x is defined at any point of this neighbourhood. The step of a gradient-like method based on a retraction is of the form x k+1 = R xk ( α k g k ), where α k is a step-length such that α k g k belongs to the domain of R xk. The extension of the Euclidean BB algorithm to the Riemannian setting requires the sum of gradients evaluated at different points (see Section 4), that is vectors belonging to different tangent spaces. In retraction-based optimization algorithms, the standard tool to move vectors from a tangent space to another is the vector transport. Roughly speaking, given two tangent vectors at x, say v x, w x T x M, a vector transport T associated with a retraction R, is a smooth map that generates a point T wx (v x ) that belongs to T Rx(w x)m, and that is linear with respect to the argument. For the precise definition of the vector transport, we refer the reader to Sec. 8.1 of [1]. In many algorithms, a vector v T x M has to be transported to the tangent space T y M, with y x. Given a vector transport T, for the sake of simplicity, we use the notation T x y (v) to indicate the transport T wx (v), where w x is such that R x (w x ) = y. 
To ensure the existence of a $w_x$ such that $R_x(w_x) = y$, we have to assume that the retraction has a sufficiently large domain, so as to allow the search step and the vector transport. A special case of vector transport, well known in geometry, is the parallel transport, which can be interpreted as a vector transport where the underlying retraction is the exponential map.

3. Nonmonotone line-search in Riemannian optimization. The Armijo line-search is a standard optimization strategy that consists in successively reducing the step-length (backtracking) until a sufficient decrease criterion, called Armijo's condition, is met. When combined with a gradient-type algorithm, this strategy guarantees global convergence of the overall procedure under mild assumptions, see e.g. [27].
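Before giving the precise Riemannian conditions, here is a minimal Python sketch of the backtracking idea in the Euclidean setting (an illustration written for this transcription, not code from the paper). The nonmonotone variant used below simply replaces $f(x_k)$ on the right-hand side by the maximum of the last $\min\{k+1, M\}$ cost values, and the Riemannian version evaluates the trial point through a retraction, $R_{x_k}(\alpha d_k)$.

```python
import numpy as np

def armijo_backtracking(f, grad, x, d, lam=1.0, sigma=0.5, gamma=1e-4, max_red=50):
    """Return alpha = sigma^h * lam satisfying the (monotone) Armijo condition
       f(x + alpha*d) <= f(x) + gamma*alpha*grad(x)^T d, for a descent direction d."""
    fx, slope = f(x), grad(x) @ d
    assert slope < 0, "d must be a descent direction"
    alpha = lam
    for _ in range(max_red):
        if f(x + alpha * d) <= fx + gamma * alpha * slope:
            break
        alpha *= sigma
    return alpha

# Example: one steepest-descent step on f(x) = ||x||^2.
f = lambda x: x @ x
grad = lambda x: 2 * x
x = np.array([3.0, -4.0])
d = -grad(x)
alpha = armijo_backtracking(f, grad, x, d)
print(alpha, f(x + alpha * d) < f(x))
```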

In Euclidean geometry, given a point $x_k \in \mathbb{R}^n$ and a search direction $d_k$, the Armijo line-search strategy is as follows: starting from a fixed step-length $\lambda_k > 0$, compute the smallest nonnegative integer $h_k$ such that

    $f(x_k + \sigma^{h_k}\lambda_k d_k) \le f(x_k) + \gamma\sigma^{h_k}\lambda_k \nabla f(x_k)^T d_k,$

for a suitable $\gamma \in (0,1)$ and reduction factor $\sigma \in (0,1)$; set the current step-length $\alpha_k = \sigma^{h_k}\lambda_k$ and update $x_{k+1} = x_k + \alpha_k d_k$. We have denoted by $\nabla f(x_k)$ the (Euclidean) gradient of $f$ at $x_k$, that is, the unique vector such that $\nabla f(x_k)^T h = Df(x_k)[h]$ for $h \in \mathbb{R}^n$.

A generalization of the Armijo line-search to Riemannian manifold optimization can be found in [1, Sec. 4.2], where the condition to find $h_k$ is

    $f(R_{x_k}(\sigma^{h_k}\lambda_k d_k)) \le f(x_k) + \gamma\sigma^{h_k}\lambda_k \langle g_k, d_k \rangle_{x_k},$    (3.1)

where $R$ is a retraction on $\mathcal{M}$ and $g_k$ is the Riemannian gradient of $f$ at $x_k$. Then the iterate is updated as $x_{k+1} = R_{x_k}(\alpha_k d_k)$ with $\alpha_k = \sigma^{h_k}\lambda_k$. This strategy is sometimes referred to as curvilinear search since, in some sense, it is a search on a curve in $\mathcal{M}$ ([33]). Alternatively, it can be seen as a line-search on the tangent space.

In certain optimization algorithms, such as the BB method, global convergence is still ensured if one allows the cost function to increase from time to time, while asking that the maximum of the cost function over the last $M > 1$ steps decreases. This can be achieved using a nonmonotone Armijo line-search strategy and choosing $h_k$ as the smallest nonnegative integer such that

    $f(x_k + \sigma^{h_k}\lambda_k d_k) \le \max_{1 \le j \le \min\{k+1, M\}} \{f(x_{k+1-j})\} + \gamma\sigma^{h_k}\lambda_k \nabla f(x_k)^T d_k,$

where $\gamma \in (0,1)$ and $\sigma \in (0,1)$. The adaptation of the nonmonotone line-search to Riemannian optimization consists in finding $h_k$ as the smallest nonnegative integer such that

    $f(R_{x_k}(\lambda_k\sigma^{h_k} d_k)) \le \max_{1 \le j \le \min\{k+1, M\}} \{f(x_{k+1-j})\} + \gamma\sigma^{h_k}\lambda_k \langle g_k, d_k \rangle_{x_k}.$    (3.2)

We now give our main result on the global convergence of the nonmonotone Armijo line-search in Riemannian optimization when the search direction $d_k$ satisfies a general set of suitable conditions, as, e.g., that of being gradient-related. The proof closely follows and generalizes the ones of [16], [8] and [29], where the case $\mathcal{M} = \mathbb{R}^n$ is considered.

Theorem 3.1. Let $\mathcal{M}$ be a Riemannian manifold with metric $\langle \cdot, \cdot \rangle_x$ at $x \in \mathcal{M}$, and let $R$ be a retraction on $\mathcal{M}$ whose domain is the whole tangent bundle. Let $f : \mathcal{M} \to \mathbb{R}$ be a smooth function and $x_0 \in \mathcal{M}$ be such that $\mathcal{M}_{(x_0)} := \{x \in \mathcal{M} : f(x) \le f(x_0)\}$ is a compact set. Let $\{x_k\} \subset \mathcal{M}$ be a sequence defined by $x_{k+1} = R_{x_k}(\alpha_k d_k)$, $k = 0, 1, 2, \ldots$, with $\alpha_k \in \mathbb{R}$ and $d_k \in T_{x_k}\mathcal{M}$, and let $g_k$ denote the Riemannian gradient of $f$ at $x_k$. Let $\sigma \in (0,1)$, $\gamma \in (0,1)$, $M$ be a nonnegative integer and $\{\lambda_k\}$ be a sequence such that $0 < a \le \lambda_k \le b$, where $a$ and $b$ are independent of $k$. Assume that:
(a) $d_k$ is a descent direction, that is, $\langle g_k, d_k \rangle_{x_k} < 0$ for each $k$;
(b) the sequence $\{d_k\}$ is gradient-related, namely, for any subsequence $\{x_k\}_{k \in K}$ of $\{x_k\}$ that converges to a nonstationary point of $f$, the corresponding subsequence $\{d_k\}_{k \in K}$ is bounded and satisfies $\limsup_{k \in K} \langle g_k, d_k \rangle_{x_k} < 0$;

5 The Riemannian Barzilai-Borwein algorithm 5 (c) for any subsequence {x k } k K of {x k } that converges to a stationary point, then there exists a subsequence {d k } k K2 of {d k }, with K 2 K, such that lim d k = 0; k K 2 (d) α k = σ h k λ k, where, for each k, h k is the smallest nonnegative integer such that f(r xk (σ h k λ k d k )) Then every limit point of the sequence {x k } is stationary. max {f(x k+1 j)} + γσ h k λ k g k, d k xk. (3.3) 1 j min{k+1,m} We prove the theorem using a few lemmas. Lemma 3.2. In the hypotheses of Theorem 3.1, for j = 1, 2, 3,... let V j = max{f(x jm M+1 ), f(x jm M+2 ),..., f(x jm )}, and ν(j) {jm M + 1, jm M + 2,..., jm} be such that Then, f(x ν(j) ) = V j. V j+1 V j + γα ν(j+1) 1 g ν(j+1) 1, d ν(j+1) 1 xν(j+1) 1, (3.4) for all j = 1, 2,.... Proof. We prove that, for l = 1, 2,..., M, and for all j = 1, 2,..., we have f(x jm+l ) V j + γα jm+l 1 g jm+l 1, d jm+l 1 xjm+l 1 V j. (3.5) Observe that from assumption (a) and the definition of α jm+l 1 and γ we have the latter inequality. For any j, the case l = 1 follows by the nonmonotone line-search assumption (d), while assuming that the statement is true for any l < l, with 1 < l M, we can write f(x jm+l ) max{f(x jm M+l ),..., f(x jm ), f(x jm+1 ),..., f(x jm+l 1 )} + γα jm+l 1 g jm+l 1, d jm+l 1 xjm+l 1. Since f(x jm M+l ),..., f(x jm ) V j by definition and f(x jm+1 ),..., f(x jm+l 1 ) V j by inductive hypothesis, we have that (3.5) holds for l. Since x ν(j+1) = x jm+l for some l {1,..., M}, the inequality (3.4) follows from (3.5). Let us define and observe that ν(j) < ν(j + 1) < ν(j) + 2M. K = {ν(1) 1, ν(2) 1,... }, (3.6) Lemma 3.3. In the hypotheses of Theorem 3.1 and with K as in (3.6), every limit point of {x k } k K is stationary. Proof. Since f is bounded in M (x0) = {x M : f(x) f(x 0 )} and it is monotone in {x k+1 } k K, there exists f such that lim k K f(x k+1 ) = f. Taking the limit of both sides of (3.4), and using the assumption (a) of Theorem 3.1, we get lim γα ν(j+1) 1 g ν(j+1) 1, d ν(j+1) 1 xν(j+1) 1 = 0, j

6 6 B. Iannazzo and M. Porcelli that is lim α k g k, d k xk = 0. (3.7) k K By contradiction, assume that there exists a nonstationary limit point, namely, there exists K 2 K such that lim k K2 x k = x and g(x ) 0. Since d k is gradient-related by assumption (b), there exists K 3 K 2 such that lim k K3 g k, d k xk exists and is a negative number. Thus, lim k K3 α k = 0, in view of (3.7). Since λ k a > 0, the condition lim k K3 α k = 0 implies that, for a sufficiently large k K 3, say for k K 4 K 3, we have h k 1, and thus condition (3.3) is not verified for α k = α k/σ. We have found a sequence {α k } k K 4 such that lim k K4 α k = 0 and f ( R xk (α kd k ) ) > max {f(x k+1 j)} + γα k g k, d k xk f(x k ) + γα k g k, d k xk, k K 4. 1 j min{k+1,m} (3.8) By taking a subsequence K 5 K 4 we may assume that every point {x k } k K5 and x belong to a single coordinate chart. By the previous relation (3.8), the fact that R x (0) = x for x M, and the mean value theorem applied to the smooth function f R xk : T xk M R, there exists ξ k (0, α k ), such that f ( R xk (α k d k) ) f ( R xk (0) ) = D(f R xk )(ξ k d k )[d k ] α k = Df(R xk (ξ k d k ))[DR xk (ξ k d k )[d k ]] = g(r xk (ξ k d k )), DR xk (ξ k d k )[d k ] Rxk (ξ k d k ) > γ g k, d k xk. Since d k is bounded by assumption (b), there exists K 6 K 5 such that lim k K6 d k = d 0 and thus lim k K6 ξ k d k = 0, and since the retraction R x (v) is smooth as a function of both x and v, we have lim k K6 R xk (ξ k d k ) = R x (0) = x and lim k K6 DR xk (ξ k d k )[d k ] = DR x (0)[d] = d. Finally, since f and the Riemannian metric are smooth, using the chart where the sequence and x lie, and taking limits on the last line of (3.9) we get lim k K 6 g(r xk (ξ k d k )), DR xk (ξ k d k )[d k ] Rxk (ξ k d k ) = g(x ), d x γ g(x ), d x. Now we have (1 γ) g(x ), d x 0, with γ (0, 1), and on the other hand, by the choice of K 3, g(x ), d x < 0 which leads to a contradiction. Lemma 3.4. In the hypotheses of Theorem 3.1, let K(l) for l 0 be the set {k + l, k K}. For each l 0, every limit point of {x k } k K(l) is stationary. Proof. We give a proof by induction on l. Since K(0) = K, the case l = 0 is Lemma 3.3. We assume that the property is true for K(l) and we prove it for K(l + 1). Let x be a limit point of {x k } k K(l+1), namely, there exists H 1 K(l + 1) such that lim k H1 x k = x ; we want to prove that x is stationary. The sequence {x k 1 } k H1 belongs to the compact set M (x0) and thus it has a subsequence {x k 1 } k H2, with H 2 H 1, converging to a certain limit point y. By inductive hypothesis, since {x k 1 } k H2 is a subsequence of {x k } k K(l), we have that y is a stationary point. By assumption (c), there exists H 3 H 2 such that lim k H3 d k 1 = 0, and since α j λ j b for any j, we have hence, x is stationary. x = lim k H 3 x k = lim k H 3 R xk 1 (α k 1 d k 1 ) = R y (0) = y; (3.9)

7 The Riemannian Barzilai-Borwein algorithm 7 Proof. (of Theorem 3.1). Since {1, 2,...} = 2M l=0 K(l), one can see that for any sequence {x k } k K0 converging to x, there exists 0 l 2M such that infinitely many terms of {x k } k K0 belong to {x k } k K( l), that is, there exists K 00 K 0 K( l) such that lim k K00 x k = x. Since the sequence {x k } k K00 belongs to {x k } k K( l), using Lemma 3.4, we conclude that x is a stationary point. A few remarks on the assumptions of Theorem 3.1 are in order. Provided that x k is not stationary for all k, assumptions (a)-(c) are always fulfilled for non-finite sequences generated by a gradient-type algorithm, being d k = g k. The requirement that M (x0) is a compact set is natural in nonlinear minimization. The strongest hypothesis is that the retraction R has to be defined in the whole tangent bundle. This occurs in several practical cases, but it is not always the case. On the other hand, in order to apply the algorithm and prove its global convergence, we only need that for each k, R xk (td k ) is defined for t [0, λ k ]; this might be true also if R is not globally defined. 4. The Riemannian Barzilai-Borwein method. In this section we propose an adaptation of the Euclidean BB method to the solution of the Riemannian manifold optimization problem (1.1). The BB method belongs to the class of first-order (or gradient-type) optimization algorithms, namely algorithms that only use the gradient of the cost function, while disregarding the Hessian (second-order information). Given the problem min f(x), x R n where f : R n R is a given smooth cost function, the simplest gradient-type method is a linesearch method based on the steepest descent direction: given x 0 R n, the algorithm generates a sequence {x k } k with x k+1 = x k α k f(x k ), (4.1) where α k is a suitable step-length. The majority of gradient-type methods are also descent methods, that is, {x k } k is built so that the cost function decreases monotonically on {x k } k, i.e., f(x k+1 ) < f(x k ) for all k, unless the sequence converges in a finite number of steps. The ideal step-length α k satisfies α k = argmin f(x k α f(x k )) (4.2) α ( exact line-search ); however this value is usually unnecessarily expensive to compute for a general nonlinear cost function f. More practical strategies perform an inexact line-search to identify a step-length that achieves adequate reductions in f at minimal cost. The steepest-descent method with exact line-search is commonly considered ineffective because of its slow convergence rate and its oscillatory behaviour, that has been completely understood and studied in the convex quadratic case, see [27]. The BB method provides an alternative strategy for the choice of the step-length; while it does not guarantee the cost function decrease at each step, in some problems, it yields impressive good practical performance. The basic idea of the BB method is to solve, for k 1, the leastsquares problem min t s k t y k 2, (4.3) with s k := x k+1 x k and y k := f(x k+1 ) f(x k ), which, assuming that x k+1 x k, has the

unique solution $t^* = \frac{s_k^T y_k}{s_k^T s_k}$. When $s_k^T y_k > 0$, the BB step-length is chosen to be

    $\alpha^{BB}_{k+1} = \frac{s_k^T s_k}{s_k^T y_k}.$    (4.4)

The overdetermined system $s_k t = y_k$ is in fact a Quasi-Newton secant equation of the type $A_{k+1}(x_{k+1} - x_k) = \nabla f(x_{k+1}) - \nabla f(x_k)$, where we require the matrix $A_{k+1} \in \mathbb{R}^{n \times n}$ to be a scalar multiple of the identity matrix, and then treat it as a linear least-squares problem. The Quasi-Newton method thus results in a gradient-type method where second-order information is implicitly embedded in the step-length through a cheap approximation of the Hessian.

As in the Euclidean case, the idea of the Riemannian BB method is to approximate the action of the Riemannian Hessian of $f$ at a certain point by a multiple of the identity. At the $(k+1)$-st step, the Hessian is a linear map from $T_{x_{k+1}}\mathcal{M}$ to $T_{x_{k+1}}\mathcal{M}$. Instead of the increment $x_{k+1} - x_k$, we can consider the vector $\eta_k = -\alpha_k g_k$ belonging to $T_{x_k}\mathcal{M}$, and transport it to $T_{x_{k+1}}\mathcal{M}$, yielding

    $s_k := T_{\eta_k}(\eta_k) = T_{x_k \to x_{k+1}}(-\alpha_k g_k) = -\alpha_k T_{x_k \to x_{k+1}}(g_k).$    (4.5)

We assume here that a vector transport between $x_k$ and $x_{k+1}$ exists. To obtain $y_k$ we need to subtract two gradients that lie in two different tangent spaces. To be coherent with the manifold structure and to work on $T_{x_{k+1}}\mathcal{M}$, this difference should be made after $g_k$ is transported to $T_{x_{k+1}}\mathcal{M}$, so that

    $y_k := g_{k+1} - T_{x_k \to x_{k+1}}(g_k) = g_{k+1} + \frac{1}{\alpha_k} T_{\eta_k}(\eta_k).$    (4.6)

Imposing a scalar multiple of the identity as the approximation of the Hessian, the Riemannian counterpart of the secant equation (4.3) can be written again as $s_k t = y_k$, in the unknown $t \in \mathbb{R}$. The least-squares approximation with respect to the metric of $T_{x_{k+1}}\mathcal{M}$ yields $t^* = \frac{\langle s_k, y_k \rangle_{x_{k+1}}}{\langle s_k, s_k \rangle_{x_{k+1}}}$. Therefore the Riemannian BB step-length has the form

    $\alpha^{BB}_{k+1} = \frac{\langle s_k, s_k \rangle_{x_{k+1}}}{\langle s_k, y_k \rangle_{x_{k+1}}},$    (4.7)

provided that $\langle s_k, y_k \rangle_{x_{k+1}} > 0$.

Remark 4.1. Alternative strategies for the BB step-length have been proposed in Euclidean optimization, see e.g. [2, 11, 30, 15]. A standard choice is the step-length obtained by considering the least-squares problem $\min_t \|s_k - y_k t\|^2$ ([2]) which, by symmetry with respect to (4.3), yields the Riemannian BB step-length

    $\tilde{\alpha}^{BB}_{k+1} = \frac{\langle s_k, y_k \rangle_{x_{k+1}}}{\langle y_k, y_k \rangle_{x_{k+1}}},$    (4.8)

or the one obtained by alternating $\alpha^{BB}_{k+1}$ and $\tilde{\alpha}^{BB}_{k+1}$.
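Once a retraction, a vector transport and the Riemannian metric are available as callables, the update (4.5)-(4.7) amounts to a few lines of code. The following Python sketch is an illustration under that assumption: transport and inner are placeholder function handles (not a specific library API), and the treatment of the case $\langle s_k, y_k \rangle_{x_{k+1}} \le 0$ mirrors the safeguard of Step 6 of Algorithm 1 below.

```python
def riemannian_bb_step(x_k, x_k1, g_k, g_k1, alpha_k, transport, inner,
                       alpha_min=1e-3, alpha_max=1e3):
    """One Barzilai-Borwein step-length update, Riemannian version, cf. (4.5)-(4.7).

    transport(x, y, v): vector transport of v in T_x M to T_y M along the step taken.
    inner(x, u, v):     Riemannian metric <u, v>_x.
    """
    g_k_moved = transport(x_k, x_k1, g_k)      # T_{x_k -> x_{k+1}}(g_k)
    s_k = -alpha_k * g_k_moved                 # transported increment, eq. (4.5)
    y_k = g_k1 - g_k_moved                     # gradient difference, eq. (4.6)
    sy = inner(x_k1, s_k, y_k)
    if sy > 0:
        tau = inner(x_k1, s_k, s_k) / sy       # eq. (4.7)
        return min(alpha_max, max(alpha_min, tau))
    return alpha_max                           # safeguard when <s_k, y_k> <= 0
```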

Remark 4.2. As opposed to the Euclidean case, different alternatives are available for $y_k$. For instance, we can do the subtraction in $T_{x_k}\mathcal{M}$ after $g_{k+1}$ is transported over there. Or, we can transport the two vectors to the tangent space at any point of the geodesic joining $x_k$ and $x_{k+1}$, or even at any point of the manifold reachable from $x_k$ and $x_{k+1}$ through a vector transport. All these alternatives for $y_k$ lead to different versions of the Riemannian BB method. Yet another possibility is to avoid the vector transport completely, identifying $T_{x_k}\mathcal{M}$ with $T_{x_{k+1}}\mathcal{M}$ and thus approximating the transport with the identity. This approximation is justified by the property $T_{0_x}(v) = v$, where $v, 0_x \in T_x\mathcal{M}$ (see [1]), but it is not recommended if one wants to preserve the structure.

4.1. The globalized Riemannian Barzilai-Borwein algorithm. Algorithm 1 summarizes the main steps of the Riemannian BB method with a nonmonotone line-search strategy. Here $\alpha_k^{BB}$ plays the role of $\lambda_k$ used in Section 3 and its updating at Step 5 follows [8]. Moreover, we assume that a retraction $R$ and a vector transport $T$ are supplied for the manifold under consideration.

Algorithm 1 (The Riemannian Barzilai-Borwein with nonmonotone line-search (RBB-NMLS) algorithm).
Set the line-search parameters: the step-length reduction factor $\sigma \in (0,1)$; the sufficient decrease parameter $\gamma \in (0,1)$; the integer parameter for the nonmonotone line-search $M > 0$; the upper and lower bounds for the step-length $\alpha_{\max} > \alpha_{\min} > 0$. Set the initial values: the starting point $x_0 \in \mathcal{M}$; the starting step-length $\alpha_0^{BB} \in [\alpha_{\min}, \alpha_{\max}]$.
1. Compute $f_0 = f(x_0)$ and $g_0 = \nabla^{(R)} f(x_0)$.
2. For $k = 0, 1, 2, \ldots$ find the smallest $h = 0, 1, 2, \ldots$ such that
   $f(R_{x_k}(-\sigma^h \alpha_k^{BB} g_k)) \le \max_{1 \le j \le \min\{k+1, M\}} f_{k+1-j} - \gamma\sigma^h \alpha_k^{BB} \langle g_k, g_k \rangle_{x_k}$
   and set $\alpha_k := \sigma^h \alpha_k^{BB}$.
3. Compute $x_{k+1} = R_{x_k}(-\alpha_k g_k)$, $f_{k+1} = f(x_{k+1})$ and $g_{k+1} = \nabla^{(R)} f(x_{k+1})$.
4. Compute $y_k$ and $s_k$ as in (4.6) and (4.5), respectively.
5. Set $\tau_{k+1} = \frac{\langle s_k, s_k \rangle_{x_{k+1}}}{\langle s_k, y_k \rangle_{x_{k+1}}}$.
6. Set $\alpha_{k+1}^{BB} = \min\{\alpha_{\max}, \max\{\alpha_{\min}, \tau_{k+1}\}\}$ if $\langle s_k, y_k \rangle_{x_{k+1}} > 0$, and $\alpha_{k+1}^{BB} = \alpha_{\max}$ otherwise.

We point out that $\tau_{k+1}$ can be chosen using alternative strategies, as mentioned in Remark 4.1. Moreover, the backtracking strategy considered in Step 2 of Algorithm 1 could be slightly generalized without affecting the global convergence. In particular, the contraction factor $\sigma$ could vary at each iteration of the line-search and, for example, could be chosen by safeguarded interpolation, see Sec. 3.5 in [27].

If $x_k$ is not a stationary point, then, assuming that $R_{x_k}(-\alpha g_k)$ is defined for $\alpha > 0$, by the mean value theorem (compare equation (3.9)) there exists $\xi \in (0, \alpha)$ such that

    $\frac{f(R_{x_k}(-\alpha g_k)) - f(x_k)}{\alpha} + \gamma \langle g_k, g_k \rangle_{x_k} = \langle g(R_{x_k}(-\xi g_k)), DR_{x_k}(-\xi g_k)[-g_k] \rangle_{R_{x_k}(-\xi g_k)} + \gamma \langle g_k, g_k \rangle_{x_k} \xrightarrow{\alpha \to 0} (\gamma - 1)\langle g_k, g_k \rangle_{x_k} < 0,$

since $\gamma < 1$. Thus, for $\alpha$ sufficiently small, we have

    $f(R_{x_k}(-\alpha g_k)) \le f(x_k) - \gamma\alpha \langle g_k, g_k \rangle_{x_k} \le \max_{1 \le j \le \min\{k+1, M\}} \{f(x_{k+1-j})\} - \gamma\alpha \langle g_k, g_k \rangle_{x_k},$

that is, the nonmonotone Armijo condition at Step 2 of Algorithm 1 is verified after a finite number of step reductions.

The practical implementation of the various steps strictly depends on the structure of the cost function $f$ and of the underlying manifold $\mathcal{M}$; further details will be discussed in the next section for the computation of the matrix geometric mean. Other implementation issues are postponed to Section 6. We remark that the use of the nonmonotone strategy requires the additional cost of evaluating the cost function at each trial point $x_k$ (Step 3). Indeed, a pure BB algorithm (i.e., without line-search) consists in setting $x_{k+1} = R_{x_k}(-\alpha_k^{BB} g_k)$ without computing $\alpha_k$ at Step 2. We will refer to RBB-NMLS and to RBB to indicate the Riemannian BB algorithm with and without nonmonotone line-search, respectively. Finally, note that if $M = 1$ then Algorithm 1 corresponds to the standard (monotone) line-search with Armijo's rule.

5. Computing the Karcher mean of positive definite matrices. In this section we recall the Riemannian optimization problem whose solution is the Karcher mean of positive definite matrices, and propose the adaptation of the RBB method for this problem. First, we describe the set of positive definite matrices as a Riemannian manifold.

Let $P_n$ denote the set of positive definite matrices of size $n$ and $H_n$ denote the real vector space of Hermitian matrices. The set $P_n$ is an open subset of the Euclidean space $H_n$ (with the scalar product $\langle E, F \rangle = \mathrm{trace}(EF)$ for $E, F \in H_n$) and thus it is a differentiable manifold, with $T_X(P_n)$ isomorphic to $H_n$ for each $X \in P_n$. Two main Riemannian structures on $P_n$ are commonly used in the literature: the first one is obtained by using the Euclidean scalar product of $H_n$ at each point $X \in P_n$; the second is the one induced by the scalar product

    $\langle E, F \rangle_X = \mathrm{trace}(X^{-1} E X^{-1} F), \qquad E, F \in H_n,\ X \in P_n,$    (5.1)

see Ch. XII in [22] and Sec. 6.3 in [25]. Since the first structure is trivial and the second one does not have an established name, in the following we will refer to the second geometry as the Riemannian structure of $P_n$. The scalar product (5.1) induces a metric on $P_n$ such that the distance between $A, B \in P_n$ is

    $\delta(A, B) = \| \log(A^{-1/2} B A^{-1/2}) \|_F,$    (5.2)

where $\|\cdot\|_F$ is the Frobenius norm. The resulting metric space is complete [3], simply connected and with nonpositive curvature; thus it is an example of a Cartan-Hadamard manifold. There exists a unique geodesic joining $A$ and $B$ in $P_n$, whose natural parametrization is

    $\gamma(t) = A(A^{-1}B)^t = A^{1/2}(A^{-1/2} B A^{-1/2})^t A^{1/2} =: A \#_t B, \qquad t \in [0, 1],$

see [23]. Notice that the geodesic can be indefinitely extended for $t \in \mathbb{R}$. A nice feature of this geometry is that for any set of matrices $A_1, \ldots, A_m \in P_n$ there exists a unique $X \in P_n$ that minimizes the function

    $f(X) := f(X; A_1, \ldots, A_m) = \sum_{k=1}^m \delta^2(X, A_k).$    (5.3)

This point is called the matrix geometric mean or Karcher mean of $A_1, \ldots, A_m$, and it is established as the geometric mean of matrices. Therefore, we state the problem of computing the Karcher mean as the problem of minimizing the above function $f$ over the manifold $P_n$. A class of algorithms developed for this problem can be found in [5] and in [20].
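The distance (5.2) and the cost (5.3) are straightforward to evaluate; the following NumPy/SciPy sketch (my own illustration, not the implementation used in the paper) exploits the fact that the eigenvalues of $A^{-1/2} B A^{-1/2}$ coincide with the generalized eigenvalues of the pencil $(B, A)$. The symmetry check at the end is only illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def delta(A, B):
    """Affine-invariant distance (5.2): ||log(A^{-1/2} B A^{-1/2})||_F,
    computed from the generalized eigenvalues of (B, A)."""
    lam = eigh(B, A, eigvals_only=True)
    return np.sqrt(np.sum(np.log(lam) ** 2))

def karcher_cost(X, As):
    """Cost function (5.3): f(X) = sum_k delta^2(X, A_k)."""
    return sum(delta(X, A) ** 2 for A in As)

# Symmetry of the distance: karcher_cost(A, [A, B]) = delta(A, B)^2 = karcher_cost(B, [A, B]).
rng = np.random.default_rng(1)
def rand_spd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)
A, B = rand_spd(4), rand_spd(4)
print(karcher_cost(A, [A, B]), karcher_cost(B, [A, B]))
```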

Since the cost function $f$ in (5.3) is smooth, the corresponding derivative can be computed and two different gradients can be defined, one for each of the considered geometries, i.e., the Euclidean and the Riemannian geometry. A tedious computation proves that the gradient with respect to the Euclidean geometry, that is, the unique matrix $\nabla^{(E)} f(X)$ such that $Df(X)[H] = \mathrm{trace}(\nabla^{(E)} f(X) H)$, is

    $\nabla^{(E)} f(X) = 2 \sum_{k=1}^m X^{-1} \log(X A_k^{-1}),$

while the gradient with respect to the Riemannian structure, namely the Riemannian gradient, defined as the unique Hermitian matrix $\nabla^{(R)} f(X)$ such that $Df(X)[H] = \langle \nabla^{(R)} f(X), H \rangle_X$ for $H \in H_n$, turns out to be

    $\nabla^{(R)} f(X) = -2 \sum_{k=1}^m X \log(X^{-1} A_k),$    (5.4)

see [24]. Notice that $\nabla^{(E)} f(X) = 0$ if and only if $\nabla^{(R)} f(X) = 0$.

An interesting property of the Riemannian geometry of $P_n$ is that the function $f$ in (5.3) is geodesically strictly convex with respect to the Riemannian geometry of $P_n$, that is, for any $A, B \in P_n$ with $A \ne B$ we have (see [3])

    $f(A \#_t B) < (1 - t) f(A) + t f(B), \qquad t \in (0, 1);$

on the contrary, it can be shown that $f$ is not convex in the Euclidean sense.

The application of Algorithm 1 to the Karcher mean computation requires the definition of the retraction $R$ and vector transport $T$ mappings for the manifold $P_n$. Since the final point of a geodesic starting at $A$ and with tangent vector $V$ is $A \exp(A^{-1} V)$, the retraction at the point $A$ has the form $R_A(V) = A \exp(A^{-1} V)$. It follows that the iterate update of RBB is

    $X_{k+1} = R_{X_k}(-\alpha_k^{BB} g_k) = X_k \exp(-\alpha_k^{BB} X_k^{-1} g_k),$    (5.5)

where $\alpha_k^{BB}$ is defined by (4.7) (or in an alternative way, as discussed in Remark 4.1) and $s_k$ and $y_k$ are given below. Notice that the retraction (5.5) is the exponential map with respect to the Riemannian geometry of $P_n$, and thus the vector transport is given by the parallel transport. In the Riemannian geometry of $P_n$, the parallel transport of a vector $V \in H_n$ from the tangent space at $A$ to the tangent space at $B$ is given by (see [32])

    $T_{A \to B}(V) = (B A^{-1})^{1/2} V (A^{-1} B)^{1/2}.$

Notice that the parallel transport is a congruence of the type $M V M^*$, with $M = (B A^{-1})^{1/2}$. Using the $\mathrm{vec}(\cdot)$ operator, which stacks the columns of a matrix into a long vector, and the Kronecker product, we get the simple expression

    $T_{A \to B}\, \mathrm{vec}(V) = (\overline{M} \otimes M)\, \mathrm{vec}(V), \qquad M = (B A^{-1})^{1/2}.$

Alternatively, we can also write $T_{A \to B}(V) = (A \#_{1/2} B) A^{-1} V A^{-1} (A \#_{1/2} B)$.

In view of (4.5) and (4.6), it is sufficient to transport a vector along itself or, equivalently, along the geodesic tangent to it. We now derive the formula for this case, which is simpler than those above.

Let $V_A \in T_A P_n$ and let $B := R_A(V_A) = A \exp(A^{-1} V_A)$; then $A^{-1} B = \exp(A^{-1} V_A)$ and

    $T_{V_A}(V_A) = T_{A \to B}(V_A) = A (A^{-1} B)^{1/2} A^{-1} V_A (A^{-1} B)^{1/2} = A \exp(A^{-1} V_A)^{1/2} A^{-1} V_A \exp(A^{-1} V_A)^{1/2} = V_A A^{-1} R_A(V_A) = V_A \exp(A^{-1} V_A),$    (5.6)

where we have used standard properties of matrix functions (compare Theorem 1.13 in [18]). The formula (5.6) yields simple expressions for the $s_k$ and $y_k$ of equations (4.5) and (4.6), respectively,

    $s_k = -\alpha_k g_k \exp(-\alpha_k X_k^{-1} g_k), \qquad y_k = g_{k+1} - g_k \exp(-\alpha_k X_k^{-1} g_k),$

which characterize the Riemannian BB method for the Karcher mean computation.

As a final remark, we observe that the retraction used in the Riemannian geometry of the positive definite matrices is defined in the whole tangent bundle and that, for any $X_0 \in P_n$, the level set $\mathcal{M}_{(X_0)} = \{X \in P_n : f(X) \le f(X_0)\}$ is a compact set (compare the proof of Theorem 2.1 in [7]). Thus, this problem fits the hypotheses of Theorem 3.1.

5.1. Implementation of the RBB method for the Karcher mean. The RBB method requires the computation of several quantities of the type $A \varphi(A^{-1} B)$, where $A$ is positive definite, $B$ is Hermitian and $\varphi$ is a real function. This is indeed the case for the Riemannian gradient (5.4) and the retraction (5.5), while formula (5.6) is fairly similar. Following [19], these quantities can be efficiently computed by forming the Cholesky factorization $A = R_A^* R_A$. Using the similarity invariance of matrix functions, we write

    $A \varphi(A^{-1} B) = R_A^* R_A \varphi(R_A^{-1} (R_A^*)^{-1} B) = R_A^* \varphi((R_A^*)^{-1} B R_A^{-1}) R_A,$

and observe that $(R_A^*)^{-1} B R_A^{-1} = U D U^*$, where $D = (d_{ij})$ is diagonal real and $U$ is unitary, so that we obtain

    $A \varphi(A^{-1} B) = R_A^* U\, \mathrm{diag}(\varphi(d_{11}), \ldots, \varphi(d_{nn}))\, U^* R_A.$

The cost of this basic operation is approximately $\gamma_1 n^3$, where $\gamma_1$ is a moderate constant. Since one step of the RBB algorithm requires $m + 1$ evaluations of the type $A \varphi(A^{-1} B)$ ($m$ are needed to compute the gradient and one for the retraction), the cost of the algorithm is about $\gamma_1 (m+1) n^3 k$ arithmetic operations (ops), where $k$ is the number of steps needed to achieve convergence, while $m$ is the number of matrices and $n$ is their size. For comparison, the Richardson algorithm in [5], the steepest descent with fixed step-length and the Majorization-Minimization algorithm in [34] have essentially the same cost.

Regarding the cost of the version of the RBB algorithm with the nonmonotone line-search, we need to take into account the number of ops needed to compute the distance (5.2) during the cost function evaluation. By a similar use of Cholesky factorizations and an eigenvalue computation, the distance can be computed with $\gamma_2 n^3$ ops, where $\gamma_2$ is smaller than $\gamma_1$. Overall, the cost function evaluation requires $\gamma_2 m n^3$ ops. This is not negligible with respect to the cost of a single step of the RBB algorithm. Hence algorithms without line-search are usually more effective, when the number of steps performed is the same.

6. Experiments. In this section we explore the behaviour of the Riemannian BB method (RBB) and the Riemannian BB method with nonmonotone line-search (RBB-NMLS) for the solution of various optimization problems on Riemannian manifolds. Section 6.1 is devoted to the computation of the Karcher mean of a set of positive definite matrices. To this purpose, we consider several test matrices constructed with the random function of the Matrix Means Toolbox ([6]).

These tests have been performed using MATLAB R2011b on a machine with a double Intel Xeon X5680 processor and 24 GB of RAM. The experiments in Section 6.2 are carried out using Manopt ([10]), a toolbox which provides a large library of manifolds and ready-to-use Riemannian optimization algorithms, together with the flexibility of including user-defined solvers. These experiments are run using MATLAB R2015a on a machine with a double Intel Xeon E v2 processor and 128 GB of DDR3 RAM.

6.1. Tests on the Karcher mean computation. We investigate the performance of the Riemannian BB method (RBB) in comparison with four other first-order optimization algorithms for computing the Karcher mean of positive definite matrices: the Richardson iteration (Richardson) proposed in [5], the Majorization-Minimization algorithm (MajMin) of [34], and the standard Steepest-Descent (SD) and the Fletcher-Reeves variant of the Conjugate Gradient (CG FR) algorithms described in [10, 20]. Comparisons with the (Euclidean) Barzilai-Borwein algorithm (BB NoRiem) are also included.

For the SD and CG FR algorithms we implemented the Riemannian Armijo line-search (3.1), where the parameters $\lambda_k = 1$, $\sigma = 0.5$ and $\gamma = 0.5$ are chosen in accordance with [20]; the maximum number of reductions has been set to 10. For the Conjugate Gradient algorithm we have chosen the Fletcher-Reeves variant because in our experience it showed slightly better results. The implementation of the standard Barzilai-Borwein algorithm (BB NoRiem) consists in using the Euclidean gradient as a search direction, avoiding the parallel transport and using the Euclidean scalar product in the definition of $\alpha_k$; moreover, in order to ensure that all iterates are positive definite, the iteration update has been performed using the retraction, i.e., setting $X_{k+1} = R_{X_k}(-\alpha_k g_k)$.

We compare the number of iterations needed for convergence, since all the algorithms require roughly the same computational cost at each step, with the exception of SD and CG FR, where some extra cost function evaluations are needed. RBB has been implemented following Algorithm 1. We have considered only the RBB without nonmonotone line-search, since we have noticed that for this problem the line-search is not required. The first step-length $\alpha_0^{BB}$ for the RBB has been chosen using the strategy of the Richardson iteration in [5]. The step-length has been computed using (4.7) in both RBB and BB NoRiem. The updating (4.8) and the alternating strategy for the step-length described in Remark 4.1 have been tested as well, giving essentially the same results (not reported here).

As a first test, we consider the three matrices

    A_1 = , A_2 = , A_3 = ,    (6.1)

and we run the algorithms for the Karcher mean for a fixed number of steps and with no stopping criterion. At each step of each algorithm, we compute the relative error of the current value $X_k$, that is

    $\varepsilon_k = \frac{\| X_k - K \|_2}{\| K \|_2},$

where $K$ is a reference value of the Karcher mean obtained using variable precision arithmetic in MATLAB and rounded to 16 significant digits. The results are presented in Figure 6.1 and show that RBB works very well in this example and outperforms the other first-order algorithms. Notice that BB NoRiem does not exploit the geometric structure of the problem and thus gives worse results, converging in a nonmonotone way and in a larger number of steps.
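The ingredients described in Section 5 combine into a short iteration. The following Python/NumPy sketch is an illustration written for this transcription (not the authors' MATLAB implementation): it computes the basic operation $A\varphi(A^{-1}B)$ through a Cholesky factorization as in Section 5.1, the gradient (5.4), the retraction (5.5) and the BB step-length (4.7) with $s_k$ and $y_k$ as above. The random data and the initial step-length $1/(2m)$ at the end are placeholder choices, since the matrices (6.1) and the Richardson-based initialization of [5] are not reproduced here.

```python
import numpy as np
from scipy.linalg import cholesky, eigh, solve_triangular

def a_phi(A, B, phi):
    """A * phi(A^{-1} B) for A positive definite and B symmetric (cf. Section 5.1).
    With A = L L^T and L^{-1} B L^{-T} = U diag(d) U^T, similarity invariance gives
    A phi(A^{-1} B) = (L U) diag(phi(d)) (L U)^T."""
    L = cholesky(A, lower=True)
    M = solve_triangular(L, solve_triangular(L, B, lower=True).T, lower=True).T
    d, U = eigh((M + M.T) / 2)                      # symmetrize against rounding
    W = L @ U
    return (W * phi(d)) @ W.T

def karcher_grad(X, As):
    """Riemannian gradient (5.4): -2 * sum_k X log(X^{-1} A_k)."""
    return -2 * sum(a_phi(X, A, np.log) for A in As)

def rbb_karcher(As, X0, alpha0, tol=1e-10, max_iter=100):
    """Riemannian BB iteration without line-search for the Karcher mean (Section 5)."""
    X, g, alpha = X0, karcher_grad(X0, As), alpha0
    for _ in range(max_iter):
        eta = -alpha * g                            # step in T_X P_n
        X_new = a_phi(X, eta, np.exp)               # retraction (5.5): X exp(X^{-1} eta)
        E = np.linalg.solve(X, X_new)               # exp(X^{-1} eta)
        s = eta @ E                                 # s_k = eta exp(X^{-1} eta), cf. (5.6)
        g_new = karcher_grad(X_new, As)
        y = g_new + s / alpha                       # y_k = g_{k+1} - T(g_k), with T(g_k) = -s_k/alpha_k
        Zs, Zy = np.linalg.solve(X_new, s), np.linalg.solve(X_new, y)
        ss, sy = np.trace(Zs @ Zs), np.trace(Zs @ Zy)             # metric (5.1) inner products
        alpha = min(1e3, max(1e-3, ss / sy)) if sy > 0 else 1e3   # Step 6 of Algorithm 1
        X, g = X_new, g_new
        Zg = np.linalg.solve(X, g)
        if np.sqrt(np.trace(Zg @ Zg)) < tol:        # stop on the Riemannian gradient norm
            break
    return X

# Placeholder data: four random 3 x 3 positive definite matrices, arithmetic mean as X0.
rng = np.random.default_rng(0)
G = [rng.standard_normal((3, 3)) for _ in range(4)]
As = [M @ M.T + 3 * np.eye(3) for M in G]
K = rbb_karcher(As, X0=sum(As) / len(As), alpha0=1 / (2 * len(As)))
```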
Fig. 6.1. Comparison of different first-order Riemannian optimization methods and the (Euclidean) Barzilai-Borwein algorithm for the Karcher mean of the matrices in (6.1) (relative error versus iteration).

As a second test, we investigate how convergence depends on the initial value. We consider the matrices (6.1) with four different initial values: the arithmetic mean of the data, the Cheap mean of the data, known to be a good approximation of the Karcher mean ([4]), one of the data matrices, and an ill-conditioned matrix of norm 1. The results are presented in Figure 6.2. The relative behaviour of the methods is mostly independent of the initial value in the later phase, while there can be some differences in the early steps. In all cases RBB performs better than the other methods. The step-length choice of the Richardson method is based on optimizing the rate of convergence near the fixed point (see [5]), and therefore the method is penalized whenever the initial value is far from the Karcher mean.

As a third test, we run the algorithms on 4 different data sets: 100 random matrices, obtained with the command random(10) of the Matrix Means Toolbox; 10 random matrices, obtained with the command random(100); 10 ill-conditioned matrices, with condition number $10^5$, obtained with the command random(10,2,1e5); 10 ill-conditioned matrices clustered far from the identity, obtained with the command random(10,2,1e5)*0.01+A, where A is a fixed matrix generated as in data set 3. In Figure 6.3 we show the relative error with respect to a reference approximation of the Karcher mean computed with variable precision arithmetic. We observe that the convergence behaviour of RBB is always the best, while the performance of Richardson, MajMin, SD and CG FR depends on the test case. In particular, failures of SD and CG FR are due to the computation of tiny steps that prevent progress in the iteration. We remark that, since the data are randomly generated, each test has been repeated several times and we have always obtained the qualitative behaviour shown in Figure 6.3.
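The test matrices above are produced by the Matrix Means Toolbox command random, whose implementation is not reproduced here. As a rough stand-in for experimenting with the sketches above, a random positive definite matrix with a prescribed condition number can be generated as follows (my own construction, not the toolbox's algorithm).

```python
import numpy as np

def random_spd(n, cond=1.0e5, rng=None):
    """Random n x n symmetric positive definite matrix with condition number ~cond:
    Q diag(lambda) Q^T with a random orthogonal Q and log-spaced eigenvalues."""
    rng = np.random.default_rng(rng)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal factor
    lam = np.logspace(0, np.log10(cond), n)            # eigenvalues from 1 to cond
    return (Q * lam) @ Q.T

# e.g. a set of 10 ill-conditioned 10 x 10 matrices, loosely mimicking data set 3:
As = [random_spd(10, 1e5) for _ in range(10)]
```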

Fig. 6.2. Comparison of different first-order optimization methods for the Karcher mean of the matrices in (6.1), using different initial values: the arithmetic mean (top-left), the Cheap mean (top-right), one of the data matrices (bottom-left) and a matrix with large condition number and unit norm (bottom-right).

6.2. Testing RBB using Manopt. We consider the trust-region (TR) and steepest-descent (with monotone line-search and Armijo step-length) (SD) algorithms available in Manopt ([10]) and, for comparison, we consider an implementation of our RBB algorithm obtained by modifying the steepest-descent implementation of Manopt. This allows us to use the execution time as a fair measure of algorithmic comparison. In our tests, we use the RBB method without line-search (RBB) and with nonmonotone line-search (RBB-NMLS) as described in Algorithm 1, both using the step-length (4.7). Default parameters, stopping criteria and starting guess choices provided in Manopt have been used. Regarding the parameter setting of RBB-NMLS, we set $M = 10$, $\gamma = 10^{-4}$, $\alpha_{\min} = 10^{-3}$, $\alpha_{\max} = 10^{3}$ and $\sigma = 0.5$ in Algorithm 1, as indicated in [8]. The vectors $s_k$ and $y_k$ at Step 6 are computed using the vector transport functions provided by Manopt for both RBB and RBB-NMLS. Moreover, the initial $\alpha_0^{BB}$ is chosen so that the first trial point $x_1$ is the same as the one computed by SD.

We consider all test problems available in [10] and follow the formulations and implementations provided therein. In particular, Table 6.1 summarizes the test problems together with a brief description and the various tested sizes. Table 6.2 collects the results in terms of average number of iterations Av-it, average execution time in seconds Av-cpu and number of failures F computed over 10 repeated runs.

Fig. 6.3. Comparison of different first-order optimization methods for the Karcher mean using different data sets: 100 random matrices (top-left), 10 random matrices (top-right), 10 ill-conditioned matrices (bottom-left) and 10 ill-conditioned matrices clustered far from the identity (bottom-right).

In the presence of failures within the 10 runs, the average values are computed taking into account successful runs for all solvers only. The overall number of runs is 240. The comments below follow from Table 6.2:
- As expected, the use of the nonmonotone strategy makes the RBB procedure much more robust: overall, RBB and RBB-NMLS fail 33 and 3 times, respectively. Clearly, when RBB is successful, that is, when the nonmonotone strategy is not needed and therefore not activated in RBB-NMLS, RBB is faster than RBB-NMLS, since there is no need to compute and store the cost function at each iteration.
- Focusing on first-order procedures, the use of the BB step-length significantly enhances the speed of convergence of the gradient-type algorithms: on average, SD takes many more iterations than RBB and RBB-NMLS, resulting in a much higher computational time. This behaviour is especially evident in the solution of KM, where the Av-cpu of SD is at least a factor of 6 larger than that of RBB.
- Regarding the comparison of RBB with the second-order procedures, we note that the behaviour of TR and RBB-NMLS is comparable. For KM and increasing values of n, RBB is faster than TR since, as expected, the computation of the Hessian of the problem cost function becomes increasingly costly.

Name | Sizes | Description | Manifold
KM | n = 50; n = 100; n = 200 | Computing the Karcher Mean of a set of n x n positive definite matrices. | SPD
SPCA | n = 100, p = 10, m = 2; n = 500, p = 15, m = 5; n = 1000, p = 30, m = 10 | Computing the m Sparse Principal Components of a p x n matrix encoding p samples of n variables. | Stiefel
SP | n = 8; n = 12; n = 24 | Finding the largest diameter of n equal circles that can be placed on the sphere without overlap (the Sphere Packing problem). | Product of spheres
DIS | n = 128; n = 500; n = 1000 | Finding an orthonormal basis of the Dominant Invariant 3-Subspace of an n x n matrix. | Grassmann
LRMC | n = 100, p = 10, m = 2; n = 500, p = 15, m = 5; n = 1000, p = 30, m = 10 | Low-Rank Matrix Completion: given a partial observation of an m x n matrix of rank p, attempting to complete it. | Fixed-rank matrices
MC | n = 20; n = 100; n = 300 | Max-Cut: given an n x n Laplacian matrix of a graph, finding the max-cut, or an upper bound on the maximum-cut value. | SPD matrices of rank 2
TSVD | n = 60, m = 42, p = 5; n = 100, m = 60, p = 7; n = 200, m = 70, p = 8 | Computing the SVD decomposition of an m x n matrix truncated to rank p. | Grassmann
GP | m = 10, N = 50; m = 50, N = 100; m = 50, N = 500 | Generalized Procrustes: rotationally align clouds of points. Data: matrix A in R^{3 x m x N}, each slice is a cloud of m points in R^3. | Product of rotations with the Euclidean space for A

Table 6.1. Test problems and manifold structure provided in Manopt.

     | TR              | SD              | RBB             | RBB-NMLS
     | Av-it Av-cpu F  | Av-it Av-cpu F  | Av-it Av-cpu F  | Av-it Av-cpu F
KM   |
SPCA |
SP   |
DIS  |
LRMC |
MC   |
TSVD |
GP   |

Table 6.2. Results on problems in Table 6.1.

Fig. 6.4. CPU time performance profiles on the problems in Table 6.1. Percentage of tests solved by each solver: TR 100%, SD LS 59.6%, RBB 85.8%, RBB-NMLS 98.7%.

We also summarize the results by plotting the performance profile function $\pi_S$ defined as

    $\pi_S(\tau) = \frac{\text{number of problems s.t. } q_{P,S} \le \tau\, q_P}{\text{number of problems}}, \qquad \tau \ge 1,$

that is, the probability for solver $S$ that a performance ratio $q_{P,S}/q_P$ is within a factor $\tau$ of the best possible ratio ([12]). Here $q_{P,S}$ denotes the computational effort of the solver $S$ to solve problem $P$, and $q_P$ is the computational effort of the best solver to solve problem $P$. Note that $\pi_S(1)$ is the fraction of problems for which solver $S$ performs best, $\pi_S(2)$ gives the fraction of problems for which the algorithm is within a factor of 2 of the best algorithm, and, for $\tau$ sufficiently large, $\pi_S(\tau)$ is the fraction of problems solved by $S$. In Figure 6.4 we consider the total CPU time as the measure of computational effort for the compared solvers and plot $\pi_S(\tau)$ for $\tau \le 6$ in order to zoom in on the behaviour of the solvers for small values of $\tau$. Therefore, we include in the legend the percentage of tests solved by each solver (retrievable in the plot for large values of $\tau$). We observe that RBB is the most efficient for 60% of the runs, TR and RBB-NMLS are comparable in both robustness and efficiency, and SD shows the worst performance.

7. Conclusions. We have adapted the Barzilai-Borwein method to the framework of Riemannian optimization. The resulting algorithm is competitive in several cases, since it requires just first-order information, while it usually converges faster than other first-order algorithms such as the steepest-descent method. We have also provided a general convergence result for gradient-related Riemannian optimization algorithms, when a nonmonotone line-search is considered. This strategy can be applied, in


More information

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α

More information

Numerical Optimization of Partial Differential Equations

Numerical Optimization of Partial Differential Equations Numerical Optimization of Partial Differential Equations Part I: basic optimization concepts in R n Bartosz Protas Department of Mathematics & Statistics McMaster University, Hamilton, Ontario, Canada

More information

On efficiency of nonmonotone Armijo-type line searches

On efficiency of nonmonotone Armijo-type line searches Noname manuscript No. (will be inserted by the editor On efficiency of nonmonotone Armijo-type line searches Masoud Ahookhosh Susan Ghaderi Abstract Monotonicity and nonmonotonicity play a key role in

More information

8 Numerical methods for unconstrained problems

8 Numerical methods for unconstrained problems 8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields

More information

Numerical optimization

Numerical optimization Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal

More information

Outline. Scientific Computing: An Introductory Survey. Optimization. Optimization Problems. Examples: Optimization Problems

Outline. Scientific Computing: An Introductory Survey. Optimization. Optimization Problems. Examples: Optimization Problems Outline Scientific Computing: An Introductory Survey Chapter 6 Optimization 1 Prof. Michael. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

More information

Residual iterative schemes for largescale linear systems

Residual iterative schemes for largescale linear systems Universidad Central de Venezuela Facultad de Ciencias Escuela de Computación Lecturas en Ciencias de la Computación ISSN 1316-6239 Residual iterative schemes for largescale linear systems William La Cruz

More information

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume

More information

Math Introduction to Numerical Analysis - Class Notes. Fernando Guevara Vasquez. Version Date: January 17, 2012.

Math Introduction to Numerical Analysis - Class Notes. Fernando Guevara Vasquez. Version Date: January 17, 2012. Math 5620 - Introduction to Numerical Analysis - Class Notes Fernando Guevara Vasquez Version 1990. Date: January 17, 2012. 3 Contents 1. Disclaimer 4 Chapter 1. Iterative methods for solving linear systems

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

Spectral gradient projection method for solving nonlinear monotone equations

Spectral gradient projection method for solving nonlinear monotone equations Journal of Computational and Applied Mathematics 196 (2006) 478 484 www.elsevier.com/locate/cam Spectral gradient projection method for solving nonlinear monotone equations Li Zhang, Weijun Zhou Department

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares Robert Bridson October 29, 2008 1 Hessian Problems in Newton Last time we fixed one of plain Newton s problems by introducing line search

More information

II. DIFFERENTIABLE MANIFOLDS. Washington Mio CENTER FOR APPLIED VISION AND IMAGING SCIENCES

II. DIFFERENTIABLE MANIFOLDS. Washington Mio CENTER FOR APPLIED VISION AND IMAGING SCIENCES II. DIFFERENTIABLE MANIFOLDS Washington Mio Anuj Srivastava and Xiuwen Liu (Illustrations by D. Badlyans) CENTER FOR APPLIED VISION AND IMAGING SCIENCES Florida State University WHY MANIFOLDS? Non-linearity

More information

CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS

CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS Igor V. Konnov Department of Applied Mathematics, Kazan University Kazan 420008, Russia Preprint, March 2002 ISBN 951-42-6687-0 AMS classification:

More information

Chapter 8 Gradient Methods

Chapter 8 Gradient Methods Chapter 8 Gradient Methods An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Introduction Recall that a level set of a function is the set of points satisfying for some constant. Thus, a point

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

CHAPTER 11. A Revision. 1. The Computers and Numbers therein

CHAPTER 11. A Revision. 1. The Computers and Numbers therein CHAPTER A Revision. The Computers and Numbers therein Traditional computer science begins with a finite alphabet. By stringing elements of the alphabet one after another, one obtains strings. A set of

More information

On Lagrange multipliers of trust-region subproblems

On Lagrange multipliers of trust-region subproblems On Lagrange multipliers of trust-region subproblems Ladislav Lukšan, Ctirad Matonoha, Jan Vlček Institute of Computer Science AS CR, Prague Programy a algoritmy numerické matematiky 14 1.- 6. června 2008

More information

Tangent spaces, normals and extrema

Tangent spaces, normals and extrema Chapter 3 Tangent spaces, normals and extrema If S is a surface in 3-space, with a point a S where S looks smooth, i.e., without any fold or cusp or self-crossing, we can intuitively define the tangent

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Lecture 5, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The notion of complexity (per iteration)

More information

Extending a two-variable mean to a multi-variable mean

Extending a two-variable mean to a multi-variable mean Extending a two-variable mean to a multi-variable mean Estelle M. Massart, Julien M. Hendrickx, P.-A. Absil Universit e catholique de Louvain - ICTEAM Institute B-348 Louvain-la-Neuve - Belgium Abstract.

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

Trust-region methods for rectangular systems of nonlinear equations

Trust-region methods for rectangular systems of nonlinear equations Trust-region methods for rectangular systems of nonlinear equations Margherita Porcelli Dipartimento di Matematica U.Dini Università degli Studi di Firenze Joint work with Maria Macconi and Benedetta Morini

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

Convergence of a Two-parameter Family of Conjugate Gradient Methods with a Fixed Formula of Stepsize

Convergence of a Two-parameter Family of Conjugate Gradient Methods with a Fixed Formula of Stepsize Bol. Soc. Paran. Mat. (3s.) v. 00 0 (0000):????. c SPM ISSN-2175-1188 on line ISSN-00378712 in press SPM: www.spm.uem.br/bspm doi:10.5269/bspm.v38i6.35641 Convergence of a Two-parameter Family of Conjugate

More information

Higher-Order Methods

Higher-Order Methods Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth

More information

Convex Optimization. Problem set 2. Due Monday April 26th

Convex Optimization. Problem set 2. Due Monday April 26th Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining

More information

A Randomized Algorithm for the Approximation of Matrices

A Randomized Algorithm for the Approximation of Matrices A Randomized Algorithm for the Approximation of Matrices Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert Technical Report YALEU/DCS/TR-36 June 29, 2006 Abstract Given an m n matrix A and a positive

More information

1 Numerical optimization

1 Numerical optimization Contents 1 Numerical optimization 5 1.1 Optimization of single-variable functions............ 5 1.1.1 Golden Section Search................... 6 1.1. Fibonacci Search...................... 8 1. Algorithms

More information

4 damped (modified) Newton methods

4 damped (modified) Newton methods 4 damped (modified) Newton methods 4.1 damped Newton method Exercise 4.1 Determine with the damped Newton method the unique real zero x of the real valued function of one variable f(x) = x 3 +x 2 using

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT -09 Computational and Sensitivity Aspects of Eigenvalue-Based Methods for the Large-Scale Trust-Region Subproblem Marielba Rojas, Bjørn H. Fotland, and Trond Steihaug

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

B553 Lecture 5: Matrix Algebra Review

B553 Lecture 5: Matrix Algebra Review B553 Lecture 5: Matrix Algebra Review Kris Hauser January 19, 2012 We have seen in prior lectures how vectors represent points in R n and gradients of functions. Matrices represent linear transformations

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

DISSERTATION MEAN VARIANTS ON MATRIX MANIFOLDS. Submitted by. Justin D Marks. Department of Mathematics. In partial fulfillment of the requirements

DISSERTATION MEAN VARIANTS ON MATRIX MANIFOLDS. Submitted by. Justin D Marks. Department of Mathematics. In partial fulfillment of the requirements DISSERTATION MEAN VARIANTS ON MATRIX MANIFOLDS Submitted by Justin D Marks Department of Mathematics In partial fulfillment of the requirements For the Degree of Doctor of Philosophy Colorado State University

More information

THEODORE VORONOV DIFFERENTIABLE MANIFOLDS. Fall Last updated: November 26, (Under construction.)

THEODORE VORONOV DIFFERENTIABLE MANIFOLDS. Fall Last updated: November 26, (Under construction.) 4 Vector fields Last updated: November 26, 2009. (Under construction.) 4.1 Tangent vectors as derivations After we have introduced topological notions, we can come back to analysis on manifolds. Let M

More information

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005 3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider

More information

Step-size Estimation for Unconstrained Optimization Methods

Step-size Estimation for Unconstrained Optimization Methods Volume 24, N. 3, pp. 399 416, 2005 Copyright 2005 SBMAC ISSN 0101-8205 www.scielo.br/cam Step-size Estimation for Unconstrained Optimization Methods ZHEN-JUN SHI 1,2 and JIE SHEN 3 1 College of Operations

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell DAMTP 2014/NA02 On fast trust region methods for quadratic models with linear constraints M.J.D. Powell Abstract: Quadratic models Q k (x), x R n, of the objective function F (x), x R n, are used by many

More information

Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization

Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization Jialin Dong ShanghaiTech University 1 Outline Introduction FourVignettes: System Model and Problem Formulation Problem Analysis

More information

Step lengths in BFGS method for monotone gradients

Step lengths in BFGS method for monotone gradients Noname manuscript No. (will be inserted by the editor) Step lengths in BFGS method for monotone gradients Yunda Dong Received: date / Accepted: date Abstract In this paper, we consider how to directly

More information

IPAM Summer School Optimization methods for machine learning. Jorge Nocedal

IPAM Summer School Optimization methods for machine learning. Jorge Nocedal IPAM Summer School 2012 Tutorial on Optimization methods for machine learning Jorge Nocedal Northwestern University Overview 1. We discuss some characteristics of optimization problems arising in deep

More information

Handling nonpositive curvature in a limited memory steepest descent method

Handling nonpositive curvature in a limited memory steepest descent method IMA Journal of Numerical Analysis (2016) 36, 717 742 doi:10.1093/imanum/drv034 Advance Access publication on July 8, 2015 Handling nonpositive curvature in a limited memory steepest descent method Frank

More information

Sample ECE275A Midterm Exam Questions

Sample ECE275A Midterm Exam Questions Sample ECE275A Midterm Exam Questions The questions given below are actual problems taken from exams given in in the past few years. Solutions to these problems will NOT be provided. These problems and

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

GRADIENT = STEEPEST DESCENT

GRADIENT = STEEPEST DESCENT GRADIENT METHODS GRADIENT = STEEPEST DESCENT Convex Function Iso-contours gradient 0.5 0.4 4 2 0 8 0.3 0.2 0. 0 0. negative gradient 6 0.2 4 0.3 2.5 0.5 0 0.5 0.5 0 0.5 0.4 0.5.5 0.5 0 0.5 GRADIENT DESCENT

More information

, b = 0. (2) 1 2 The eigenvectors of A corresponding to the eigenvalues λ 1 = 1, λ 2 = 3 are

, b = 0. (2) 1 2 The eigenvectors of A corresponding to the eigenvalues λ 1 = 1, λ 2 = 3 are Quadratic forms We consider the quadratic function f : R 2 R defined by f(x) = 2 xt Ax b T x with x = (x, x 2 ) T, () where A R 2 2 is symmetric and b R 2. We will see that, depending on the eigenvalues

More information

Proximal-like contraction methods for monotone variational inequalities in a unified framework

Proximal-like contraction methods for monotone variational inequalities in a unified framework Proximal-like contraction methods for monotone variational inequalities in a unified framework Bingsheng He 1 Li-Zhi Liao 2 Xiang Wang Department of Mathematics, Nanjing University, Nanjing, 210093, China

More information

Pseudo-Poincaré Inequalities and Applications to Sobolev Inequalities

Pseudo-Poincaré Inequalities and Applications to Sobolev Inequalities Pseudo-Poincaré Inequalities and Applications to Sobolev Inequalities Laurent Saloff-Coste Abstract Most smoothing procedures are via averaging. Pseudo-Poincaré inequalities give a basic L p -norm control

More information

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming Zhaosong Lu Lin Xiao March 9, 2015 (Revised: May 13, 2016; December 30, 2016) Abstract We propose

More information

17 Solution of Nonlinear Systems

17 Solution of Nonlinear Systems 17 Solution of Nonlinear Systems We now discuss the solution of systems of nonlinear equations. An important ingredient will be the multivariate Taylor theorem. Theorem 17.1 Let D = {x 1, x 2,..., x m

More information

Nonlinear Optimization: What s important?

Nonlinear Optimization: What s important? Nonlinear Optimization: What s important? Julian Hall 10th May 2012 Convexity: convex problems A local minimizer is a global minimizer A solution of f (x) = 0 (stationary point) is a minimizer A global

More information

Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

More information

Convex Functions and Optimization

Convex Functions and Optimization Chapter 5 Convex Functions and Optimization 5.1 Convex Functions Our next topic is that of convex functions. Again, we will concentrate on the context of a map f : R n R although the situation can be generalized

More information

Scaled gradient projection methods in image deblurring and denoising

Scaled gradient projection methods in image deblurring and denoising Scaled gradient projection methods in image deblurring and denoising Mario Bertero 1 Patrizia Boccacci 1 Silvia Bonettini 2 Riccardo Zanella 3 Luca Zanni 3 1 Dipartmento di Matematica, Università di Genova

More information

Chapter 10 Conjugate Direction Methods

Chapter 10 Conjugate Direction Methods Chapter 10 Conjugate Direction Methods An Introduction to Optimization Spring, 2012 1 Wei-Ta Chu 2012/4/13 Introduction Conjugate direction methods can be viewed as being intermediate between the method

More information

AN ALTERNATING MINIMIZATION ALGORITHM FOR NON-NEGATIVE MATRIX APPROXIMATION

AN ALTERNATING MINIMIZATION ALGORITHM FOR NON-NEGATIVE MATRIX APPROXIMATION AN ALTERNATING MINIMIZATION ALGORITHM FOR NON-NEGATIVE MATRIX APPROXIMATION JOEL A. TROPP Abstract. Matrix approximation problems with non-negativity constraints arise during the analysis of high-dimensional

More information

Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations

Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Multipoint secant and interpolation methods with nonmonotone line search for solving systems of nonlinear equations Oleg Burdakov a,, Ahmad Kamandi b a Department of Mathematics, Linköping University,

More information

Matrix Derivatives and Descent Optimization Methods

Matrix Derivatives and Descent Optimization Methods Matrix Derivatives and Descent Optimization Methods 1 Qiang Ning Department of Electrical and Computer Engineering Beckman Institute for Advanced Science and Techonology University of Illinois at Urbana-Champaign

More information

Course 212: Academic Year Section 1: Metric Spaces

Course 212: Academic Year Section 1: Metric Spaces Course 212: Academic Year 1991-2 Section 1: Metric Spaces D. R. Wilkins Contents 1 Metric Spaces 3 1.1 Distance Functions and Metric Spaces............. 3 1.2 Convergence and Continuity in Metric Spaces.........

More information

Review of Classical Optimization

Review of Classical Optimization Part II Review of Classical Optimization Multidisciplinary Design Optimization of Aircrafts 51 2 Deterministic Methods 2.1 One-Dimensional Unconstrained Minimization 2.1.1 Motivation Most practical optimization

More information

Keywords: Nonlinear least-squares problems, regularized models, error bound condition, local convergence.

Keywords: Nonlinear least-squares problems, regularized models, error bound condition, local convergence. STRONG LOCAL CONVERGENCE PROPERTIES OF ADAPTIVE REGULARIZED METHODS FOR NONLINEAR LEAST-SQUARES S. BELLAVIA AND B. MORINI Abstract. This paper studies adaptive regularized methods for nonlinear least-squares

More information

Directional Field. Xiao-Ming Fu

Directional Field. Xiao-Ming Fu Directional Field Xiao-Ming Fu Outlines Introduction Discretization Representation Objectives and Constraints Outlines Introduction Discretization Representation Objectives and Constraints Definition Spatially-varying

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

FIXED POINT ITERATIONS

FIXED POINT ITERATIONS FIXED POINT ITERATIONS MARKUS GRASMAIR 1. Fixed Point Iteration for Non-linear Equations Our goal is the solution of an equation (1) F (x) = 0, where F : R n R n is a continuous vector valued mapping in

More information

CALCULUS ON MANIFOLDS. 1. Riemannian manifolds Recall that for any smooth manifold M, dim M = n, the union T M =

CALCULUS ON MANIFOLDS. 1. Riemannian manifolds Recall that for any smooth manifold M, dim M = n, the union T M = CALCULUS ON MANIFOLDS 1. Riemannian manifolds Recall that for any smooth manifold M, dim M = n, the union T M = a M T am, called the tangent bundle, is itself a smooth manifold, dim T M = 2n. Example 1.

More information

Key words. conjugate gradients, normwise backward error, incremental norm estimation.

Key words. conjugate gradients, normwise backward error, incremental norm estimation. Proceedings of ALGORITMY 2016 pp. 323 332 ON ERROR ESTIMATION IN THE CONJUGATE GRADIENT METHOD: NORMWISE BACKWARD ERROR PETR TICHÝ Abstract. Using an idea of Duff and Vömel [BIT, 42 (2002), pp. 300 322

More information