Handling High Dimensional Problems with Multi-Objective Continuation Methods via Successive Approximation of the Tangent Space


Maik Ringkamp¹, Sina Ober-Blöbaum¹, Michael Dellnitz¹, and Oliver Schütze²

¹ University of Paderborn, Faculty of Computer Science, Electrical Engineering and Mathematics, Warburger Strasse 100, Paderborn, Germany
{ringkamp,dellnitz,sinaob}@math.uni-paderborn.de

² CINVESTAV-IPN, Computer Science Department, Av. IPN 2508, Col. San Pedro Zacatenco, Mexico City, Mexico
schuetze@cs.cinvestav.mx

Abstract. In many applications, several conflicting objectives have to be optimized concurrently, leading to a multi-objective optimization problem. Since the set of solutions, the so-called Pareto set, typically forms a (k−1)-dimensional manifold, where k is the number of objectives considered in the model, continuation methods such as predictor-corrector (PC) methods are in certain cases very efficient tools to rapidly compute a finite size representation of the set of interest. However, their classical implementation leads to trouble when considering higher dimensional models (i.e., for dimension n > 1000 of the parameter space). In this work, it is proposed to perform a successive approximation of the tangent space which allows one to find promising predictor points with lower effort, in particular for high dimensional models, since no Hessians of the objectives have to be calculated. The applicability of the resulting PC variant is demonstrated on a benchmark model for up to n = 100,000 parameters.

Keywords: multi-objective optimization, continuation, tangent space approximation, high dimensional problems.

1 Introduction

In a variety of applications in industry and finance the problem arises that several objective functions have to be optimized concurrently. For instance, for a perfect economical production plan, the ultimate desire would be to simultaneously minimize cost and maximize quality. This example already illustrates

a natural feature of these problems, namely that the different objectives typically contradict each other and therefore certainly do not have identical optima. Thus, the question arises how to approximate a particular optimal compromise (see e.g. Miettinen (1999) for an overview of widely used interactive methods) or how to compute or approximate all optimal compromises of this multi-objective optimization problem (MOP). The latter will be considered in this article.

Mathematically speaking, in a MOP there are given k objective functions f_1, ..., f_k : R^n → R which have to be minimized. The set of optimal compromises with respect to the objective functions is called the Pareto set¹. A point x ∈ R^n in parameter space is said to be optimal or a Pareto point if there is no other point which is at least as good as x in all the objectives and strictly better in at least one objective. This work will concentrate on the approximation of the Pareto set.

Multi-objective optimization is currently a very active area of research. By far most of the methods for the computation of single Pareto points or the entire Pareto set are based on a scalarization of the MOP (i.e., a transformation of the original MOP into one or a sequence of scalar optimization problems, see, e.g., Das and Dennis (1998), Fliege (2004) or Eichfelder (2008)). For a survey of these and further methods the reader is referred to Miettinen (1999) for nonlinear MOPs and to Jahn (1986) and Steuer (1986) in the linear case. Another way to attack the problem is by using bio-inspired heuristics such as Evolutionary Algorithms (e.g., Deb (2001), Coello Coello et al. (2007)). In such methods, an entire set of candidate solutions (population) is considered and iterated (evolved) simultaneously, which allows for a finite size representation of the entire Pareto set in one run of the algorithm. These methods work without gradient information of the objectives and are particularly advantageous in the situation where the MOP is not differentiable and/or where the objectives contain many local minima. A method which is based on a stochastic approach is presented in Schäffler et al. (2002). In this work, the authors derive a stochastic differential equation which has the property that it is very likely to observe corresponding solutions in a neighborhood of the set of (local) Pareto points. Similar to the evolutionary strategies, the idea of Schäffler et al. (2002) is to directly approximate the entire solution set and not just single Pareto points on the set. Another way to compute the entire Pareto set is to use subdivision techniques (Dellnitz et al. 2005, Jahn 2006). These algorithms start with a compact subset Q ⊂ R^n of the domain and generate outer approximations of the Pareto set which get finer under iteration (see Dellnitz et al. (2002) for a convergence result). The approach is of global nature and hence in practice restricted to moderate dimensions of the parameter space.

Typically, that is, under mild regularity conditions on the model, the set of Pareto points forms locally a (k−1)-dimensional manifold if there are k smooth objective functions. This is the reason why continuation methods such as predictor-

¹ Named after the economist Vilfredo Pareto (1848–1923).

corrector (PC) methods for the computation of general implicitly defined manifolds (e.g., Rheinboldt (1986), Allgower and Georg (1990)) can be successfully applied in the context of multi-objective optimization, see e.g. Guddat et al. (1985), Rakowska et al. (1991), Hillermeier (2001) or Schütze et al. (2005).

In the following, the working principle of a PC method is described. For the sake of a better understanding, the particular case of a bi-objective problem is considered here (i.e., a MOP with k = 2 objectives, where it is assumed that the Pareto set can be expressed by a curve; see Figure 1 for such an example). For a more general and thorough description of PC methods the reader is referred e.g. to Allgower and Georg (1990) or Section 2. Given a Pareto optimal solution x_i ∈ P_Q, where P_Q denotes the Pareto set of the given problem, a further solution x_{i+1} ∈ P_Q near to x_i is computed in the following two steps:

(P) Predict a point p_{i+1} ∈ R^n such that p_{i+1} − x_i points along the solution set. Typically, this is done by linearizing P_Q at x_i. That is, one can choose p_{i+1} := x_i + tν, where t ∈ R\{0} is a step size and ν is the tangent vector of P_Q at x_i.

(C) Compute a solution x_{i+1} ∈ P_Q in the vicinity of p_{i+1} (p_{i+1} is corrected to the solution set).

It is widely accepted (e.g., Allgower and Georg (1990)) that the additional computation of the predictor is more beneficial for the overall computational efficiency than directly computing x_{i+1} from x_i, due to the locality of the search in step (C) (in case t is chosen properly, p_{i+1} is already very close to P_Q). Proceeding in the same manner, one obtains a method that generates solutions along the Pareto set starting from the initial solution x_i.

Though PC methods are at least locally typically quite effective, they are, however, based on some assumptions. First, an initial solution has to be known or computed before the process can start. Further, it can be the case that P_Q falls into several connected components (which may happen, for instance, if one or more objectives contain discontinuities). Due to their local nature, PC methods are restricted to the connected component that contains the given initial solution. Hence, in order to be able to approximate the entire Pareto set, the PC method has to be fed with multiple initial solutions. As a possible remedy for both problems, PC methods can be combined with global search strategies such as evolutionary algorithms. This has been done e.g. in Schütze et al. (2003), Harada et al. (2007), Schütze et al. (2008) and Lara et al. (2010). One problem remaining is that PC methods may run into trouble for the treatment of higher dimensional MOPs, as addressed here. Given a solution x, the main requirements of a classical (multi-objective) PC method to obtain a further solution are as follows:

(P) In the predictor step, the Hessians ∇²f_i(x) of the objective functions f_i have to be computed. Further, a QR-decomposition of the Jacobian F' of an auxiliary map F has to be computed. F' is, at least in the unconstrained case,

Fig. 1. Given x_i ∈ P_Q, a further solution is computed in two steps by PC methods: (P) a predictor p_{i+1} is generated by linearizing P_Q at x_i, and (C) this point is corrected back to the solution set, yielding x_{i+1}.

mainly given by a weighted sum H(x, α) := Σ_{i=1}^k α_i ∇²f_i(x) of the Hessians ∇²f_i. This yields a linearization of the solution set at x.

(C) For the correction of the predicted solution p obtained via linearization, typically the Gauss-Newton method applied to F is used, which requires H(x^(i), α^(i)) in each iteration i and the solution of a linear system of equations (of dimension m > n, where x ∈ R^n; however, for large values of n one can assume m ≈ n).

Hence, the cost to obtain a further solution is O(n²) in terms of memory and O(n³) in terms of flops for full matrices H(x, α), and the algorithm runs into trouble on a standard PC for n > 1,000. One possible remedy for high dimensional MOPs is certainly to exploit the sparsity of the model (if given). Here, an alternative approach is followed by changing the PC method as follows: (P) perform a successive approximation of the tangent space of the implicitly defined manifold at a given solution x, which is based on some analysis performed on the geometry of the problem (and which is also the main contribution of this work), and (C) replace the Gauss-Newton method by the limited memory BFGS method proposed in Shanno (1978), which is designed and approved for large scale problems. The cost of the novel method is O(n) in terms of memory and O(n²) in terms of flops.

The remainder of this paper is organized as follows: In Section 2, the required background for the understanding of the sequel is given. In Section 3, the analysis for the successive approximation of the tangent space is presented, and in Section 4 the resulting algorithms. In Section 5, some numerical results on an academic model are shown with up to n = 100,000 dimensions, and finally, some conclusions are drawn in Section 6.
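To make the two steps concrete, the following is a minimal sketch of one predictor-corrector iteration as described above. The function names and the use of a simple norm-minimization corrector are illustrative assumptions for this sketch, not the implementation used later in the paper.

```python
import numpy as np
from scipy.optimize import minimize

def pc_step(F, x, tangent, t=0.1):
    """One predictor-corrector step along the zero set of F.

    F       : map R^N -> R^M whose regular zero set is the manifold of interest
    x       : current (approximate) solution, F(x) ~ 0
    tangent : unit vector approximating the tangent of the solution curve at x
    t       : step size
    """
    # (P) predictor: move along the (approximated) tangent direction
    p = x + t * tangent
    # (C) corrector: drive the residual back to zero, starting from the predictor
    res = minimize(lambda z: float(np.dot(F(z), F(z))), p, method="BFGS")
    return res.x
```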

2 Background

This section gives the required background for the predictor-corrector algorithm which is described in the next section: the basic idea of continuation methods is addressed (following mainly Rheinboldt (1986) and Allgower and Georg (1990)), and further the connection to multi-objective optimization is given.

2.1 Continuation Methods

Assume a differentiable map

H : S ⊂ R^N → R^M,  d := N − M ≥ 1,  (1)

of class C^r, r ≥ 1, is given on an open subset S ⊂ R^N. A point x ∈ S is called regular if the first derivative, H'(x) ∈ R^{M×N}, has maximal rank M. Further, assume one is interested in the following system of equations:

H(x) = 0,  x ∈ S.  (2)

In case the regular solution set

M = {x ∈ S : H(x) = 0, x regular}  (3)

is non-empty, it is well-known that M is a d-dimensional C^r-manifold in R^N without boundary (e.g., Rheinboldt (1986)). One possible way to tackle problem (2) numerically is to use continuation methods such as PC methods. Given an initial (approximated) solution x ∈ M, further solutions x^(i) ∈ M near x are found by PC methods via the following two steps:

(P) Predict a set {p_1, ..., p_s} of distinct (and well distributed) points which are near both to x and to M.

(C) For i = 1, ..., s: starting with the predicted point p_i, compute by some (typically few) iterative steps an approximated element x^(i) of M, i.e. H(x^(i)) ≈ 0.

One way to obtain well distributed predictors near to a solution x ∈ M is to compute an orthonormal basis of the tangent space at x via a QR-decomposition of H'(x)^T: The tangent space at a point x ∈ M is given by

T_x M = ker H'(x) = {u ∈ R^N : H'(x)u = 0}.  (4)
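As a quick numerical illustration of (4), an orthonormal basis of T_xM can be obtained directly from the Jacobian H'(x). The sketch below uses SciPy's SVD-based null_space routine instead of the QR-decomposition discussed next, purely for brevity; it is not the construction used in the paper.

```python
import numpy as np
from scipy.linalg import null_space

def tangent_basis(Hprime):
    """Orthonormal basis of ker H'(x) = T_x M for a regular point x.

    Hprime : (M, N) Jacobian of H at x with full row rank M
    returns: (N, d) matrix whose columns span the tangent space, d = N - M
    """
    Q = null_space(Hprime)  # columns form an orthonormal basis of the kernel
    assert Q.shape[1] == Hprime.shape[1] - Hprime.shape[0], "x is not regular"
    return Q
```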

The normal space N_x M at x ∈ M is the orthogonal complement of T_x M:

N_x M = (T_x M)^⊥ = (ker H'(x))^⊥ = rge H'(x)^T.  (5)

Let Q = (Q_1 Q_2) ∈ R^{N×N} be an orthogonal matrix and R = (R_1; 0) ∈ R^{N×M}, where R_1 ∈ R^{M×M} is an upper triangular matrix and 0 ∈ R^{(N−M)×M}, such that

H'(x)^T = QR = (Q_1 Q_2) (R_1; 0).  (6)

If x is regular, it follows that the diagonal elements of R_1 do not vanish, and hence it is straightforward to see that the columns of Q_1 ∈ R^{N×M} provide an orthonormal basis of rge H'(x)^T = N_x M. Thus, an orthonormal basis of T_x M is given by the columns of Q_2 ∈ R^{N×d}. Hence, one may choose predictors p ∈ R^N e.g. by

p := x + tq,  (7)

where t ∈ R\{0} is a step size and q is a column vector of Q_2 (compare to the example related to Figure 1). For particular choices to spread the predictors along the tangent space as well as for step size strategies refer to Ringkamp (2009). The efficient approximation of Q_2 is the main focus of this work.

For the realization of the corrector step (C), typically the Gauss-Newton method

x^(i+1) = x^(i) − H'(x^(i))^+ H(x^(i)),  i = 0, 1, ...,  (8)

where H'(x^(i))^+ ∈ R^{N×M} is the Moore-Penrose inverse of H'(x^(i)), is applied. It is well-known that this method converges quadratically to a point x ∈ M if the starting vector x^(0) ∈ R^N is chosen close enough to M. Refer e.g. to Deuflhard (2004) for a local convergence result. In case of higher dimensions it is suggested here to use the limited memory BFGS method introduced by Shanno (1978),

x^(i+1) := x^(i) + t_i d^(i),  i = 0, 1, ...,  (9)

where t_i is an exact, Powell- or Armijo-step size. With f(x) := ‖H(x)‖₂² it holds d^(0) = −∇f(x^(0)) and for i = 0, 1, ...

d^(i+1) := − ((y^(i))^T s^(i) / ‖y^(i)‖₂²) g^(i+1) − ( 2 (s^(i))^T g^(i+1) / ((y^(i))^T s^(i)) − (y^(i))^T g^(i+1) / ‖y^(i)‖₂² ) s^(i) + ((s^(i))^T g^(i+1) / ‖y^(i)‖₂²) y^(i),

where g^(i) := ∇f(x^(i)), s^(i) := x^(i+1) − x^(i) and y^(i) := ∇f(x^(i+1)) − ∇f(x^(i)).
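For illustration, the two corrector variants just described can be sketched as follows. Treating the correction as the minimization of f(x) = ‖H(x)‖² and delegating the limited memory BFGS iteration to SciPy's L-BFGS-B implementation is an assumption made here for brevity; it is not the exact routine of Shanno (1978).

```python
import numpy as np
from scipy.optimize import minimize

def correct_gauss_newton(H, Hprime, p, tol=1e-10, max_iter=20):
    """Gauss-Newton corrector (8): x <- x - H'(x)^+ H(x), started at the predictor p."""
    x = p.copy()
    for _ in range(max_iter):
        r = H(x)
        if np.linalg.norm(r) < tol:
            break
        x = x - np.linalg.pinv(Hprime(x)) @ r   # Moore-Penrose pseudoinverse step
    return x

def correct_lbfgs(H, p):
    """Corrector suggested for high dimensions: minimize f(x) = ||H(x)||^2 by limited memory BFGS."""
    f = lambda x: float(np.dot(H(x), H(x)))
    res = minimize(f, p, method="L-BFGS-B")
    return res.x
```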

2.2 Multi-Objective Optimization

In a multi-objective optimization problem (MOP) the task is to simultaneously optimize k objective functions f_1, ..., f_k : R^n → R. More precisely, a general MOP can be stated as follows:

min_{x∈Q} F(x),  Q := {x ∈ R^n : h(x) = 0, g(x) ≤ 0},  (MOP)

where the function F is defined as the vector of the objective functions F : Q → R^k, F(x) = (f_1(x), ..., f_k(x)), and h : Q → R^m, m ≤ n, and g : Q → R^q. Optimality of a MOP is defined by the concept of dominance (Pareto 1971).

Definition 1.
(a) Let v, w ∈ R^k. Then the vector v is less than w (v <_p w) if v_i < w_i for all i ∈ {1, ..., k}. The relation ≤_p is defined analogously.
(b) A vector y ∈ R^n is dominated by a vector x ∈ R^n (x ≺ y) with respect to (MOP) if F(x) ≤_p F(y) and F(x) ≠ F(y); else y is called non-dominated by x.
(c) A point x ∈ Q is called (Pareto) optimal or a Pareto point if there is no y ∈ Q which dominates x.

The set of all Pareto optimal solutions is called the Pareto set and is denoted by P_Q. The image F(P_Q) of the Pareto set is called the Pareto front. Fundamental for many methods for the numerical treatment of MOPs is the following theorem of Kuhn and Tucker (1951), which states a necessary condition for Pareto optimality for MOPs with equality constraints².

Theorem 1. Let x* be a Pareto point of (MOP) with q = 0. Suppose that the set of vectors {∇h_i(x*) : i = 1, ..., m} is linearly independent. Then there exist vectors λ ∈ R^m and α ∈ R^k with α_i ≥ 0, i = 1, ..., k, and Σ_{i=1}^k α_i = 1 such that

Σ_{i=1}^k α_i ∇f_i(x*) + Σ_{j=1}^m λ_j ∇h_j(x*) = 0,  (10)
h_i(x*) = 0,  i = 1, ..., m.

In the unconstrained case, i.e. for m = 0, the theorem says that the vector of zeros can be written as a convex combination of the gradients of the objectives at every Pareto point. Obviously, (10) is not a sufficient condition for (local) Pareto optimality. On the other hand, points satisfying (10) are certainly Pareto candidates, and thus, following Miettinen (1999), their relevance is now emphasized by the following

Definition 2. A point x ∈ R^n is called a substationary point or Karush-Kuhn-Tucker point³ (KKT point) if there exist scalars α_1, ..., α_k ≥ 0 and λ ∈ R^m such that (10) is satisfied.

² Without loss of generality only equality constraints are considered here. For a more general formulation of the theorem refer e.g. to Miettinen (1999).
³ Named after the works of Karush (1939) and Kuhn and Tucker (1951) for scalar valued optimization problems.
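As a small illustration of Definition 1 and of condition (10) in the unconstrained case, the following sketch checks dominance between two objective vectors and evaluates the residual ‖Σ α_i ∇f_i(x)‖ for given gradients and weights. The function names are illustrative and not part of the paper's implementation.

```python
import numpy as np

def dominates(Fx, Fy):
    """True if the objective vector Fx dominates Fy (Definition 1 (b))."""
    Fx, Fy = np.asarray(Fx), np.asarray(Fy)
    return bool(np.all(Fx <= Fy) and np.any(Fx < Fy))

def kkt_residual(grads, alpha):
    """Norm of the convex combination of objective gradients, condition (10) with m = 0.

    grads : (k, n) array, row i holds the gradient of f_i at x
    alpha : (k,) weights with alpha_i >= 0 and sum(alpha) = 1
    """
    grads, alpha = np.asarray(grads), np.asarray(alpha)
    assert np.all(alpha >= 0) and np.isclose(alpha.sum(), 1.0)
    return float(np.linalg.norm(alpha @ grads))
```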

Having stated Theorem 1, one is (following Hillermeier (2001)) in the position to give a qualitative description of the set of Pareto optimal solutions, which gives at the same time the link to (2): Denote by F : R^{n+m+k} → R^{n+m+1} the following auxiliary map:

F(x, λ, α) = ( Σ_{i=1}^k α_i ∇f_i(x) + Σ_{j=1}^m λ_j ∇h_j(x) ;  h(x) ;  Σ_{i=1}^k α_i − 1 ).  (11)

By Theorem 1 it follows that for every substationary point x ∈ R^n there exist vectors λ* ∈ R^m and α* ∈ R^k such that

F(x, λ*, α*) = 0.  (12)

Hence, one expects that the set of KKT points defines a (k−1)-dimensional manifold. This is indeed the case under certain smoothness assumptions, see Hillermeier (2001) for a thorough discussion of this topic.

To estimate the approximation quality of a candidate set generated by an optimization procedure relative to the Pareto front, the Hausdorff distance will be used in this work, which is defined as follows:

Definition 3. Let u, v ∈ R^n, A, B ⊂ R^n, and d(·,·) be a metric on R^n. Then, the Hausdorff distance d_H(·,·) is defined as follows:
(a) dist(u, A) := inf_{v∈A} d(u, v)
(b) dist(B, A) := sup_{u∈B} dist(u, A)
(c) d_H(A, B) := max{dist(A, B), dist(B, A)}

3 Approximation of T_{x^(0)}M

In this section, the geometry of the problem will be analyzed. The results will be the basis for the successive approximation of the tangent space which will be done in the next section.

In the following, assume that M as defined in (3) is a sufficiently smooth d-dimensional manifold, and that a point x^(0) ∈ M is given. In the sequel, matrices are used for the representation of approximations of the tangent space T_{x^(0)}M, which are defined as follows:

Definition 4. Let c, δ ∈ R with c ≥ 1 and δ > 0. Denote by T_{x^(0)}M(c, δ) the set of all matrices A ∈ R^{N×d} with rank(A) = d, condition number κ₂(A) ≤ c,

9 9 and A i + x (0) M B δ (x (0) ) i = 1,..., d, where A i are the columns of A, i.e., T x (0)M(c, δ) := {A = (A 1,..., A d ) R N d rank(a) = d, κ (A) c, A i + x (0) M B δ (x (0) ) i = 1,..., d} (13) Remark 1. (a) Given A T x (0)M(c, δ), the columns A i can be interpreted as secants that intersect M in the two points x (0) and x (i) := A i + x (0). In case the A i s are linearly independent, the image of the linear map A : R N R d, is a d-dimensional subspace of R N. In this way, A(R N ) can be viewed as an approximation of T x (0)M (or the matrix Q as defined in Equation (6)). In the context of PC methods it means that if A is accepted as a suitable approximation of T x (0)M predictors can be chosen e.g. as p := x (0) + t A i A i, (14) where A i is a column vector of A and t is chosen as in (7). (b) For all δ > 0 and 1 c 1 c it holds T x (0)M(c 1, δ) T x (0)M(c, δ). Analog for all c 1 and 0 < δ 1 δ it is T x (0)M(c, δ 1 ) T x (0)M(c, δ ). Lemma 1. There exists δ > 0 such that for all δ R with 0 < δ < δ there exists a matrix A = (A 1,..., A d ) T x (0)M(1, δ) with A i = δ i = 1,..., d and A i A j i j. Proof. The proof is separated into two parts. In (a), the existence of x M l = (H l ) 1 (0) with x x (0) = δ will be shown under some requirements on M l. In (b), it will be first proven for all l {0,..., d} that these requirements hold for some points x (1),..., x (l) M. After that, part (a) will be repeatedly used to construct these points and they will finally be used to create the matrix A. (a): For 1 l d let M l := M M l be a (d l + 1) dimensional submanifold given by M l = (H l ) 1 (0) where H l : B δ (x (0) ) R N R N d+l 1 and x (0) M l. Further let (i) ϕ : B δ (x 0 ) V be a local chart for the d-dimensional submanifold M with ϕ(b δ (x 0 ) M) = V (R d {0}), (ii) M l be a (N l + 1)-dimensional submanifold with a chart ϕ l : R N R N, ϕ l ( M l ) = R N l+1 {0}, (iii) the N d + l vectors (x x (0) ), H1(x), l..., HN d+l 1 l (x) be linearly independent for all x M l Ḃδ(x (0) ) where Ḃδ(x (0) ) := B δ (x (0) ) \ {x (0) }. Then, for all δ > 0 with δ < δ it follows the existence of x M l with x x (0) = δ.

10 10 Proof: Let K := M l B δ(x (0) ). First it is proven that K is compact by showing that each sequence in K has a convergent subsequence with limit in K. Let (x n ) n N K be a sequence, (x n ) n N is bounded and due to the Bolzano- Weierstrass theorem it has a convergent subsequence. In abuse of notation the same index n is used for the subsequence, since the entire sequence is not needed any longer. Thus, it is x n x B δ(x (0) ) B δ (x (0) ) and according to (i) and (ii) it follows that ϕ(x) V and ϕ l (x) N N l+1. By continuity of ϕ and using x n M l n N it is 0 = lim n ϕ d+i(x n ) = ϕ d+i (x) i {1,..., N d}, (15) which implies ϕ(x) V (R d {0}) and so x M. By continuity of ϕ l it follows analogously that x M l, and hence, also that x K which shows the compactness of K. By this it follows the existence of a vector x K with x x (0) = max x K x x(0). (16) In the case x x (0) = δ the claim follows. Now assume that x x (0) < δ. Consider the following optimization problem min x R N x x (0) s.t. H l (x) = 0 g(x) := x x (0) δ 0. (17) It is x x (0) and so x Ḃδ(x (0) ) since M l is a (d l + 1)-dimensional submanifold with x (0) M l. As a result the inequality constraint is not active at x and using (iii) it follows that the vectors H1( x), l..., HN d+l 1 l ( x) are linearly independent. That means that the Mangasarian-Fromowitz Constraint Qualification (e.g., Nocedal and Wright (006)) is fulfilled and since x is a local minimizer for the optimization problem (17), there exists λ R N d+l such that ( x, λ) is a Karush-Kuhn-Tucker point. Therefore, it holds λ N d+l g( x) = 0 and N d+l 0 = ( x x (0) ) + λ i Hi( x) l + λ N d+l ( x x (0) ). (18) i=1 Since g( x) 0 it follows that λ N d+l = 0, and thus, that the vectors ( x x (0) ), H1( x), l..., HN d+l 1 l ( x) are linearly dependent. That is a contradiction, and hence, it holds that x x (0) = δ, and the claim follows. (b): Let ϕ : U V be a local chart for the d-dimensional submanifold M and let U, V R N be open sets with x (0) U, ϕ(x (0) ) = 0 V and ϕ(u M) = V (R d {0}) and let l {1,..., d}. First of all, it will be proven by contradiction that a rank criterion holds for some special x (1),..., x (l) M.

11 Assumption: δ > 0 x (1),..., x (l) Ḃδ(x (0) ) M, x (i) x (0) x (j) x (0) i j and x B δ (x (0) ) with H (x) (x (1) x (0) ) T rank N d + l.. (19) (x (l) x (0) ) T By choosing the sequence (δ n ) n N = ( ) 1 n the assumption leads to the existence of sequences x (1)n) ( ( n N,..., x (l)n) Ḃ 1 (x(0) ) M and (x n ) n N n N n N n B 1 ) with x (n), x (1)n,..., x (l)n x (0) and x (i) x (0) x (j) x (0) i j. n Since the rank in (19) has the upper bound N d + l, it is H (x n ) (x (1)n x (0) ) T rank < N d + l. n N. (0) (x (l)n x (0) ) T Equation (0) is used in the following to get a contradiction. In fact, a sequence of zeros with non-zero limit will be constructed. According to standard analysis (e.g. Königsberger (1997)) there exists an embedding γ : V R N with an open set V R d and γ(v ) = M U. W.l.o.g. assume 0 V, γ(0) = x (0) and B 1 (x (0) ) U. Therefore, it holds: (i) γ : V M U is bijective, (ii) γ is continuously differentiable, (iii) γ (0) R N d has maximal rank and T x (0)M = γ (0)R d, (iv) γ 1 : M U V exists and is continuous. Define t (i)n := γ 1 (x (i)n ), i {1,..., l}, n N. By γ 1 (x (0) ) = 0 and (i) it follows that t (i)n 0. Simple calculations show that the following sequence is bounded (more concrete, its Euclidean norm is bounded by l) t (1)n t (1)n. t (l)n t (l)n n N 11. (1) Therefore one can apply the theorem of Bolzano-Weierstrass to obtain a convergent subsequence. In abuse of notation the same index n is used for that subsequence, since the entire sequence is not needed ( any longer. Subdividing that t subsequence in the l d-dimensional parts (i)n t (i)n leads to l convergent )n N sequences. It is t(i)n t t (i)n = 1 n N, hence, it follows (i)n t (i)n t (i) 0. Due to (iii) it follows 0 b (i) := γ (0)t (i) T x (0)M ()

12 1 and due to (ii) it is x (i)n x (0) t (i)n Using x (i) x (0) x (j) x (0) By 0 = lim n = γ(t(i)n ) γ(0) t (i)n b (i). (3) i j, it follows x (i)n x (0), x(j)n x (0) t (i)n t (j)n = b (i), b (j) i j. (4) T x (0)M = {v R N H (x (0) ) v = 0} (5) and () it follows that b (i) H l (x (0) ) i {1,..., l}, l {1,..., N d}. Using that and (4) it becomes apparent that the matrix H (x (0) ) b (1)T. R(N d+l) N (6) b (l)t has full rank. Let det be a composition of a function which projects the matrix in (6) to a regular (N d + l) (N d + l) submatrix and the determinant. It holds that det is a continuous function since it is a composition of continuous functions. Therefore, it is H (x (0) ) H (x n ) 0 det b (1)T. b (k)t (3) = lim n det (x (1)n x (0) ) T t (1)n. (x (k)n x (0) ) T t (k)n (0) = lim 0 = 0, (7) n which is a contradiction. Thus, the initial assumption is false. Hence, the following statement holds: l {1,..., d} δ > 0 such that x (1),..., x (l) Ḃ δ (x (0) ) M, x (i) x (0) x (j) x (0) i j and x B δ (x (0) ), and hence H (x) (x (1) x (0) ) T rank = N d + l. (8). (x (l) x (0) ) T Since there are just finitely many integers l {1,..., d} to consider, it follows the existence of such a radius δ > 0 for all l {1,..., d}. Using such a radius δ with B δ (x (0) ) U mathematical induction will be used to show the existence of points x (1),..., x (l) M with x (i) x (0) = δ i {1,..., l} and x (i) x (0) x (j) x (0) i, j {1,..., l}, i j for all l d. Basis l = 1:

13 Define M 1 := R N and use some orthogonal vectors v (1),..., v (N) R N to define a chart ϕ 1 : R N R N, (v (1) ) T (x x (0) ) ϕ 1 (x) :=. (9) (v (N) ) T (x x (0) ) for the N-dimensional submanifold M 1. Further define the d-dimensional submanifold M 1 := M M 1 = M = (H 1 ) 1 (0) with H 1 = H. Therefore (a) (ii) and (i) are fulfilled. According to the rank condition (8) it follows the linear independence of (x (1) x (0) ), H 1 1 (x (1) ),..., H 1 N d(x (1) ) (30) for all x (1) M 1 Ḃδ(x (0) ) as desired in (a) (iii). As a result (a) yields a point x (1) M 1 with x (1) x (0) = δ. Inductive step: Let points x (1),..., x (l) M exist with x (i) x (0) = δ i {1,..., l} and x (i) x (0) x (j) x (0) i, j {1,..., l}, i j for l < d. It has to be shown that this is also true for l + 1 d. Let H l+1 : R N R l with ( x (1) x (0) ) T (x x (0) ) H l+1 (x) :=.. (31) ( x (l) x (0) ) T (x x (0) ) According to standard linear algebra there exist orthogonal vectors v (1),..., v (N l) R N with ( x (i) x (0) ) v (j) for all i {0,..., l}, j {1,..., N l}. Hence, it follows that ϕ l+1 : R N R N, (v (1) ) T (x x (0) ). ϕ l+1 (v (N l) ) T (x x (0) ) (x) := ( x (1) x (0) ) T (x x (0) (3) ). ( x (l) x (0) ) T (x x (0) ) is a chart for the (N l)-dimensional submanifold M l+1 := ( H l+1 ) 1 (0). Further, let H l+1 : B δ (x (0) ) R N d+l with ( ) H(x) H l+1 (x) := H l+1 (33) (x) and define M l+1 := (H l+1 ) 1 (0) = H 1 (0) ( H l+1 ) 1 (0) = M M l+1. According to (8) it is rank ( (H l+1 ) (x) ) = N d + l for all x B δ (x (0) ) and 13

14 14 with x (0) M l+1 it follows with the regular value theorem that M l+1 is a (d l)-dimensional submanifold. Therefore (a) (i) and (ii) are fulfilled. Because of Hl+1 (x (l+1) ) = 0 x (l+1) M l+1 it follows x (l+1) x 0 x (i) x 0 i {1,..., l}. Again, using the rank condition (8) yields the linear independence of H 1 (x (l+1) ),..., H N d (x (l+1) ), ( x (1) x 0 ),..., ( x (l) x 0 ), (x (l+1) x (0) ) = H l+1 1 (x (l+1) ),..., H l+1 N d+l (x(l+1) ), (x (l+1) x (0) ) (34) for all x (l+1) M l+1 Ḃδ(x (0) ) as desired in (a) (iii). As a result (a) yields a point x (l+1) M l+1 with x (l+1) x (0) = δ and x (i) x (0) x (j) x (0) {1,..., l + 1}, i j. That completes the mathematical induction. Finally, define A i := x (i) x (0) i 1,..., d, then with i, j A T A = ( A T ) d i A j = diag(δ,..., δ) (35) i,j=1 it follows κ (A) = λmax(a T A) λ min(a T A) = δ δ = 1, where λ max(a T A) is the maximal and λ min (A T A) the minimal singular value of A T A. Thus, A T x (0)M(1, δ), and the claim follows. The following result shows that for every c 1 there exists a δ > 0 such that the image of every matrix A T x (0)M(c, δ) approximates the tangent space within a pre-defined tolerance ɛ R +. Theorem. Let N, d N, d N and M R N be a d-dimensional submanifold with tangent space T x (0)M in x (0) M. Then it holds: (a) δ > 0, c 1 it is T x (0)M(c, δ). (b) ɛ > 0, c 1 there exists a δ > 0 such that A T x (0)M(c, δ) it holds: B R N d with rank(b) = d, B(R d ) = T x (0)M and At Bt ɛ Bt t R d (36) Proof. Ad (a): Let δ > 0 and c 1. Due to Lemma 1 there exists δ > 0 such that δ 1 R with 0 < δ 1 < δ there exists a matrix A = (A 1,..., A d ) T x (0)M(1, δ ) with A i = δ 1 i = 1,..., d. Hence, it is A T x (0)M(1, δ 1 ), and for δ 1 < δ it follows with Remark 1 (b): T x (0)M(1, δ 1 ) T x (0)M(c, δ). Ad (b): According e.g. to Königsberger (1997) there exists an embedding γ : V R N, U R N, V R d with γ(v ) = M U and x (0) M U. W.l.o.g. assume 0 V and γ(0) = x (0). Therefore, it holds: (i) γ (0) R N d has maximal rank, (ii) γ is continuously differentiable, (iii) γ 1 : M U V exists and is continuous.

15 15 Given a matrix C R N d with rank(c) = d, it follows that also the pseudo inverse C + R d N has maxmial rank. Further, since C + C = I R d d, it holds: t = C + Ct C + Ct, t R d. (37) Because of (ii) for each a > 0 there exists a constant δ > 0, such that t R d B δ(0) it holds γ(t) γ(0) γ (0)t a t. (38) Due to (i) it holds γ (0) + > 0 and one can define a := ɛ (c + 1) d γ (0) +. (39) By (iii) and γ 1 (x (0) ) = 0 it follows that for every δ > 0 there exists a δ > 0 such that B δ (x (0) ) U and for all x M B δ (x (0) ) it holds γ 1 (x) < δ. Defining t := γ 1 (x) it follows x x (0) γ (0)t = γ(t) γ(0) γ (0)t (38) (37) ɛ (c + 1) d γ (0)t. ɛ (c + 1) d γ (0) + t (40) Let A := (A 1,..., A d ) T x (0)M(c, δ) and define x (i) := A i + x (0) i = 1,..., d, then it is x (i) M B δ (x (0) ). Further, define t (i) := γ 1 (x (i) ) and B := (B 1,..., B d ) = (γ (0)t (1),..., γ (0)t (d) ) R N d, then it follows by (40) A i B i ɛ (c + 1) d B i, i = 1,..., d. (41) W.l.o.g. assume that ɛ < 1. Using the Frobenius norm. F : R N d R, d (C 1,..., C d ) F = i=1 C i, and the inequalities C C F d C (e.g., Golub and Loan (1996)) it follows A B A B F = d A i B i = i=1 ɛ (c + 1) d B F (41) ɛ (c + 1) B ɛ (c + 1) d B i d i=1 ɛ (c + 1) ( A B + A ) ɛ<1 A B + ɛ A (c + 1). (4)

16 16 By this it follows that A B ɛ A c κ (A) c ɛ A A + A = ɛ A + (43) which leads to A + B A < 1. Using the perturbation lemma (e.g., Golub and Loan (1996)) it follows and rank(b) = rank(a + (B A)) = d (44) B + = (A + (B A)) + By (44) it follows B(R d ) = T x (0)M. Further, it holds A + 1 A + B A. (45) (43) A B ɛ A + A B ( 1 A + A B = ɛ A + ɛ<1 ɛ A + ɛ A B ) (45) ɛ B +. (46) For all t R d with t = 1 it follows (A B)t max (A B) t = A B t =1 (46) ɛ B + = ɛ t (37) B + ɛ Bt. (47) Finally, since (A B) is a linear function, it holds t R d At Bt ɛ Bt. (48) That is, roughly speaking, for every c > 1 there exists a δ > 0 such that the relative deviation v ṽ v of every vector v T x (0)M \ {0} to a vector ṽ A(R d ) is less than a given tolerance ɛ > 0. Here, A T x (0)M(c, δ) can be chosen arbitrarily. However, in general, no δ > 0 can be determined such that this property holds for all c > 1 which will be demonstrated in the following example: In each neighborhood around x 0 M there exist further points x (1), x () M such that A 1 := x (1) x (0) and A := x () x (0) are linearly dependent, however, in the vector space A(R d ) there exist always vectors which are perpendicular to vectors in T x (0)M. Example 1. Consider the -dimensional manifold M := {x R 3 x 1 + x + x 3 = 0, x i < 1, i = 1, } (49)

17 17 with the embedding ϕ : U R 3, ϕ(t) := (t 1, t, t 1 t ) T, (50) ), where U := {t R t i < 1}. It is ϕ(u) = M. Let B := ϕ (0) = ( then it is T x (0)M = B(R ). For 1 > δ > 0 and t (0) := (0, 0) T, t (1) := ( δ, 0)T, t () := ( δ 4, 0)T let x (0) := ϕ(t (0) ) = (0, 0, 0) T, x (1) := ϕ(t 1 ) = ( δ, 0, δ 4 )T, and x () := ϕ(t () ) = ( δ δ 4, 0, 16 )T. Then, it is x (0), x (1), x () M B δ (x 0 ), and the vectors A 1 := x (1) x (0) = ( δ, 0, δ 4 A := x () x (0) δ4 δ = (, 0, 16 ) T ) T (51) are linearly independent for every value of δ 0. However, the subspace spanned by A := (A 1, A ) contains vectors that are orthogonal to T x (0)M such as v = (0, 0, 1) T (compare to Figure ). It holds A + = (A + 1, A+, A+ 3 ), with A+ 1 = ( δ, 8 δ )T, A + = (0, 0)T, A + 3 = ( 8 δ, 16 δ ) T. Thus, for the condition number of A it holds κ (A) = A A + Ae A + e 3 = δ + 4 δ + 1 (5) for δ 0 and thus there does not exist any constant c 1 with κ (A) c required in Theorem. δ > 0 as Corollary 1. For all ɛ > 0, δ > 0, c 1 there exists δ > 0 such that A T x (0)M(c, δ) it holds v T x (0)M B δ(0) ṽ A(R d ) : ṽ v ɛ. (53) Proof. Let v T x (0)M B δ(0). By Theorem there exists a matrix B R N d such that v T x (0)M B δ(0) t R d with v = Bt. Let ṽ := At, it follows ṽ v = At Bt ɛ δ Bt = ɛ δ v ɛ. (54) The above result gives a hint of how to approximate the tangent space: Given x (0) M, one can compute d further solutions x (i) M, i = 1,..., d, in the vicinity of x (0). If for the matrix A of secants it holds rank(a) = d (55) κ (A) c (56) A i + x (0) M B δ (x (0) ), (57)

18 18 Fig.. The vectors v i, i = 1, (which are multiples of the secants A i for sake of a better visualization) span a subspace that contains vectors that are orthogonal to the (exact) tangent space T x (0)M. then it is A T x (0)M(c, δ). If further c and δ are small enough, then one can expect due to Corollary 1 that rge(a) serves as a good approximation of T x (0)M. However, so far nothing is gained from the practical point of view since it is still unclear how to choose the neighboring solutions x (i), and for a given set of solutions the conditions (55) to (57) have to be checked. In the following, a result is stated that is the basis for the successive approximation of the tangent space that is proposed in the next section. As an additional bonus, the verification of the conditions (55) to (57) will get superfluous. Theorem 3. Let c 1, then it holds: (a) There exists δ > 0 such that δ l, δ u R [ ] with 0 < δ u < δ and δ l (1+c )(d 1)+ (1+c )(d 1)+c δ u, δ u there exist vectors x (i) M B δ (x (0) ) i = 1,..., d such that (a1) x (i) x (0) [δ l, δ u ] i = 1,..., d (a) 1 x (i) x (0) (x (j) x (0) ) ] [δ u c δ l δ u (1+c )(d 1), δ l + c δ l δ u (1+c )(d 1) i j. [ ] (b) δ, δ l, δ u R with 0 < δ u < δ and δ l (1+c )(d 1)+ (1+c )(d 1)+c δ u, δ u and x (i) M B δ (x (0) ) such that (a1) and (a) are satisfied, it holds A := (x (1) x (0),..., x (d) x (0) ) T x (0)M(c, δ). (58) Proof. Ad (a): Due to Lemma 1 there exists δ > 0, such that δ R with 0 < δ < δ there exists a matrix A = (A 1,..., A d ) T x (0)M(1, δ) with A i = δ,

19 19 i = 1,..., d, and A i A j i j. Let δ l, δ u R with 0 < δ u < δ and δ l 0 < δ l δ u, since 0 < (1+c )(d 1)+ (1+c )(d 1)+c [ (1+c )(d 1)+ (1+c )(d 1)+c δ u, δ u ], then it holds (1+c )(d 1)+ (1+c )(d 1)+ = 1. It follows that [δ l, δ u ]. Let x (i) := A i + x (0), then it is x (i) M B δ (x (0) ) i = 1,..., d. Choosing δ δl = +δ u δl it follows that δ l = +δ l δ δu +δ u = δ u, and it is x (i) x (0) = A i = δ [δ l, δ u ] i = 1,..., d, (59) i.e., condition (a1) holds for the chosen solutions x (i). Further, it holds for all i j Furthermore, it is 1 x (i) x (0) (x (j) x (0) ) = 1 A i A j (A i A j) = = δ. 1 ( A i + A j ) (60) (1 + c )(d 1) + (1 + c )(d 1) + c δ u δ l (61) (1 + c )(d 1)δ u + δ u (1 + c )(d 1)δ l + c δ l (6) δ u + δ u (1 + c )(d 1) δ l + c δ l (1 + c )(d 1) (63) δ u δ l + c δ l δ u (1 + c )(d 1). (64) It follows that δ l + δ u δ l + c δ l δ u (1 + c )(d 1) (65) and δu c δl δ u (1 + c )(d 1) δ l + δ u. (66) Combining the above results it follows i j 1 x (i) x (0) (x (j) x (0) ) = δ = δ l + δ u [ δu c δ l δ u (1 + c )(d 1), δ l + c δ l δ u (1 + c )(d 1) ].

20 0 Hence, condition (a) holds for the chosen x (i), and the claim follows. [ ] Ad (b): Let δ, δ l, δ u R with 0 < δ u < δ and δ l (1+c )(d 1)+ (1+c )(d 1)+c δ u, δ u and let x (i) M B δ (x (0) ) i = 1,..., d. Let A i := x (i) x (0) and A := (A 1,..., A d ) R N d, then A T A = ( A i, A j ) d i,j=1 R d d is symmetric and has d real eigenvalues λ i, i = 1,..., d. Let K i := z R z A i, A i d j=1,j i A i, A j, (67) then it follows by the Theorem of Gerschgorin (e.g., Atkinson (1989)) that every eigenvalue λ i of A T A is contained in d i=1 K i. By condition (a1) it holds A i, A i = A i [δl, δ u], i = 1,..., d, and by condition (a)it is 1 A i A j [δ u c δ l δ u (1+c )(d 1), δ l + Putting this together it follows i j c δ l δ u (1+c )(d 1) ] i j. A i, A j = 1 Ai + A j A i A j = { 1 1 (a1),(a) = ( ) Ai + A j A i A j, if Ai + A j A i A j ( ) Ai A j A i A j, if Ai + A j < A i A j δu δl + ( δ u c δ l δ u (1 + c )(d 1). ) c δ l δ u (1+c )(d 1), if A i + A j A i A j c δ l δ u (1+c )(d 1) δ l, if A i + A j < A i A j For all z K i with z A i it follows z A i = z A i, A i z A i + d j=1,j i = δ u + c δ l δ u (1 + c ). d j=1,j i A i, A j A i, A j δu c δl + (d 1) δ u (1 + c )(d 1)

21 1 Furthermore, for all z K i with z A i it is d A i z = z A i, A i A i, A j d z A i j=1,j i j=1,j i A i, A j δl c δl (d 1) δ u (1 + c )(d 1) Hence, it is K i = δ l c δ l δ u (1 + c ). [ ] δl c δ l δ u (1+c ), δ u + c δ l δ u (1+c ) i = 1,..., d and it follows K := d K i i=1 [ δl c δl δ u (1 + c ), δ u + c δl ] δ u (1 + c. (68) ) The following consideration shows that all eigenvalues λ 1... λ d of A T A are strictly positive. It is Hence, λ1 λ d λ i inf z K z δ l c δ l δ u (1 + c ) > δ l c δ l + δ l (1 + c ) = δ l (c + 1) (1 + c ) δ l = 0. is defined and it holds c δ l δ u (1+c ) λ 1 δ u + λ d δl c δl δ u (1+c ) = (1 + c )δ u + c δ l δ u (1 + c )δ l c δ l + δ u = c (δ u + δ l ) δ l + δ u = c. Since A T A has only strictly positive eigenvalues it is rank(a T A) = d and it follows that also rank(a) = d. Let σ 1... σ d be the singular values of A, then it holds σ i = λ i, and it follows κ (A) = σ 1 λ1 = c. σ d λ d Hence, it is A T x (0)M(c, δ), and the proof is complete.
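Before turning to the algorithms, the requirements just established for a secant matrix can also be checked numerically. The following sketch verifies, for a base point x0 and neighboring solutions x1, ..., xd on M, that the secants have full rank, that the 2-norm condition number stays below c, and that the secant lengths lie in [δ_l, δ_u] (the reading of condition (a1) used here); it is an illustration under these assumptions, not code from the paper.

```python
import numpy as np

def check_secant_matrix(x0, neighbors, c, delta_l, delta_u):
    """Check the secant matrix A = (x1-x0, ..., xd-x0) against Definition 4 / condition (a1).

    x0        : (N,) base point on the manifold
    neighbors : list of d points x_i on the manifold near x0
    c         : bound on the condition number kappa_2(A)
    delta_l, delta_u : admissible range for the secant lengths
    """
    A = np.column_stack([np.asarray(x) - np.asarray(x0) for x in neighbors])
    lengths = np.linalg.norm(A, axis=0)
    kappa = np.linalg.cond(A, 2)                      # 2-norm condition number
    return {
        "full_rank": bool(np.linalg.matrix_rank(A) == A.shape[1]),
        "condition_ok": bool(kappa <= c),
        "lengths_ok": bool(np.all((lengths >= delta_l) & (lengths <= delta_u))),
        "kappa": float(kappa),
    }
```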

4 The Algorithms

Here, three different strategies for the successive approximation of the tangent space are presented that are based on the considerations made in the previous section. Given an initial solution z^(0) = (x^(0), α^(0)) ∈ M ⊂ R^N, N = n + k, where M = F^{-1}(0) and F is the map defined in (11), all methods aim to find suitable neighboring solutions z^(i) ∈ M in the vicinity of z^(0). While the first two approaches work directly in the complete (x, α)-space, the third approach splits the x- and the α-space for the successive approximation.

The first method considered here (see Algorithm 1) is straightforward: successively, d neighboring solutions z^(i) ∈ M ∩ B_δ(z^(0)) are computed starting from randomly chosen starting points in the vicinity of z^(0). If the resulting matrix A satisfies (a1) and (a2) from Theorem 3, then rge(A) can be viewed as a suitable approximation of T_{x^(0)}M.

Algorithm 1 (Randomly chosen solutions z^(i))
(S1) Choose δ > 0, set i := 1.
(S2) Choose z̃^(i) ∈ B_δ(z^(0)) ⊂ R^N uniformly at random.
(S3) Solve min ‖F(·)‖² starting with z̃^(i).
(S4) If no acceptable solution has been found in (S3), go to (S2), else proceed with the solution z^(i) with ‖F(z^(i))‖ ≈ 0.
(S5) Set A_i := z^(i) − z^(0) and i := i + 1.
(S6) If i < d + 1 go to (S2), else STOP.

Here, a solution z* is defined to be acceptable if the value ‖F(z*)‖ is below a given (low) threshold. In practice, it has been observed that Algorithm 1 already yields sufficient approximations of the tangent spaces even though it does not check the conditions (a1) and (a2) from Theorem 3 (refer to the numerical results presented in the next section). However, as Example 1 shows, an acceptable approximation cannot be expected in general, even for arbitrarily small values of δ. To prevent such cases, the next algorithm is constructed. For this, the following penalty functions will be needed (a code sketch of both follows below):

h_0(δ_l, δ_u, z^(0), z^(i)) :=
  (‖z^(i) − z^(0)‖ − δ_l)²,  if ‖z^(i) − z^(0)‖ < δ_l,
  (‖z^(i) − z^(0)‖ − δ_u)²,  if ‖z^(i) − z^(0)‖ > δ_u,
  0,  else,   (69)

h_1(d, c, δ_l, δ_u, z^(i), z^(j)) :=
  (½‖z^(i) − z^(j)‖² − (δ_u² − c²δ_lδ_u/((1+c²)(d−1))))²,  if ½‖z^(i) − z^(j)‖² < δ_u² − c²δ_lδ_u/((1+c²)(d−1)),
  (½‖z^(i) − z^(j)‖² − (δ_l² + c²δ_lδ_u/((1+c²)(d−1))))²,  if ½‖z^(i) − z^(j)‖² > δ_l² + c²δ_lδ_u/((1+c²)(d−1)),
  0,  else.   (70)
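The following is a minimal sketch of the two penalty terms; the interval bounds in h1 follow the reading of condition (a2) and of (70) used above and should be checked against Theorem 3 before any reuse.

```python
import numpy as np

def h0(delta_l, delta_u, z0, zi):
    """Quadratic penalty for violating (a1): ||z_i - z_0|| must lie in [delta_l, delta_u]."""
    r = np.linalg.norm(np.asarray(zi) - np.asarray(z0))
    if r < delta_l:
        return (r - delta_l) ** 2
    if r > delta_u:
        return (r - delta_u) ** 2
    return 0.0

def h1(d, c, delta_l, delta_u, zi, zj):
    """Quadratic penalty for violating (a2) on the pair (z_i, z_j), as reconstructed in (70)."""
    x = c**2 * delta_l * delta_u / ((1 + c**2) * (d - 1))  # width term of the admissible interval
    lo, hi = delta_u**2 - x, delta_l**2 + x
    s = 0.5 * np.linalg.norm(np.asarray(zi) - np.asarray(zj)) ** 2
    if s < lo:
        return (s - lo) ** 2
    if s > hi:
        return (s - hi) ** 2
    return 0.0
```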

By construction of h_0 and h_1 it holds:

h_0(δ_l, δ_u, z^(0), z^(i)) = 0 ⟺ z^(i) − z^(0) satisfies (a1) from Theorem 3.
h_1(d, c, δ_l, δ_u, z^(i), z^(j)) = 0 ∀ j < i ⟺ ½‖z^(i) − z^(j)‖² satisfies (a2) from Theorem 3.

Algorithm 2 is based on the result in the previous section since it aims to find a distribution of the solutions z^(i) as discussed in Theorem 3.

Algorithm 2 (Distribution of solutions z^(i) via penalization)
(S1) Choose δ_u > 0, c ≥ 1, δ_l ∈ [√(((1+c²)(d−1)+2)/((1+c²)(d−1)+2c²)) δ_u, δ_u] and set i := 1.
(S2) Choose z̃^(i) ∈ B_{δ_u}(z^(0)) ⊂ R^N uniformly at random.
(S3) Solve min ‖F(·)‖² + w_0 h_0(δ_l, δ_u, z^(0), ·) + w_1 Σ_{j<i} h_1(d, c, δ_l, δ_u, ·, z^(j)) with weights w_0, w_1 ∈ R_+ starting with z̃^(i).
(S4) If no acceptable solution has been found in (S3), go to (S2), else proceed with the solution z^(i) with ‖F(z^(i))‖² + w_0 h_0(δ_l, δ_u, z^(0), z^(i)) + w_1 Σ_{j<i} h_1(d, c, δ_l, δ_u, z^(i), z^(j)) ≈ 0.
(S5) Set A_i := z^(i) − z^(0) and i := i + 1.
(S6) If i < d + 1 go to (S2), else STOP.

Crucial is, of course, the computation of the minimizers of g_MOP(z) = ‖F(z)‖². Note that F already contains the gradient information of each objective. Hence, if the gradient ∇g_MOP(z̃) is evaluated directly, the Hessians ∇²f_i of each objective at x̃ have to be computed. To prevent this, it is suggested here to approximate ∇g_MOP(z̃) by finite differences or (which is more accurate) to compute it using automatic differentiation (Griewank 2000). In that case, and assuming that n ≫ k, the cost to obtain ∇g_MOP(z̃) scales basically linearly with n in terms of memory (if not the entire matrix is stored at once but evaluated one by one) and quadratically in terms of flops. If a derivative free solver is used, the number of flops grows only linearly with n. These costs hold ideally also for the entire approximation of the tangent space.

The above methods can in principle be applied to any given problem of the form (1). Based on numerical experiments on high dimensional MOPs, the authors of this work have, however, observed a different sensitivity in x- and α-space of the map F, leading to the problem of finding a proper value of δ for the neighborhood search. As a possible remedy, it is suggested here to split the two spaces as follows: instead of choosing a neighboring solution z̃^(i) = (x̃^(i), α̃^(i)) ∈ B_δ(z^(0)) which is corrected to the solution set (steps (S2) and (S3) of Algorithm 1), one computes neighboring solutions by varying the weight vector α^(i) once in the beginning and tackling F_{α^(i)}(x) := F(x, α^(i)) for fixed α^(i). Algorithm 3 details this procedure.

Algorithm 3 (Distribution of solutions z^(i) via variation of α)

(S1) Choose δ_α, δ_z > 0, set i := 1.
(S2) Choose α^(i) ∈ B_{δ_α}(α^(0)) ⊂ R^k with ‖α^(i)‖₁ = 1 and α^(i) ≥ 0 at random.
(S3) Solve min ‖F_{α^(i)}(x)‖² starting with x^(0).
(S4) If (x^(i), α^(i)) ∉ B_{δ_z}((x^(0), α^(0))), set δ_α := δ_α/2 and go to (S2).
(S5) If no acceptable solution could be computed, go to (S2), else set x^(i) as the obtained solution.
(S6) Set A_i := (x^(i), α^(i)) − (x^(0), α^(0)) and i := i + 1.
(S7) If i < d + 1 go to (S2), else STOP.

5 Results

In this section, first the mechanism of Algorithm 2 to select new solutions z^(i) is demonstrated. Further on, the performances of the resulting PC methods are tested and compared against the classical implementation.

5.1 Revisit of Example 1

In Example 1, two points x^(1) and x^(2) have been chosen such that the space spanned by x^(1) − x^(0) and x^(2) − x^(0) is orthogonal to the tangent space T_{x^(0)}M, and one could find such points in every ball B_δ(x^(0)). The example demonstrated what can go wrong if one does not take care of the condition number constraint in Theorem 2. The following discussion shows that for a given point x^(1), Algorithm 2 computes another point x^(2) which, in case δ is small enough, prevents the spanned space from being orthogonal to T_{x^(0)}M and which even serves as an approximation of T_{x^(0)}M.

Consider the 2-dimensional manifold from Example 1,

M := {x ∈ R³ : x₁² + x₂² − x₃ = 0, |x_i| < 1, i = 1, 2},  (71)

and the vector given therein,

A₁ := x^(1) − x^(0) = (δ/2, 0, δ²/4)^T.  (72)

For δ = 3/2 it holds

A₁ = (3/4, 0, 9/16)^T.  (73)

Setting δ_u := 1, c := √(23/3), and d = 2, it follows that √(((1+c²)(d−1)+2)/((1+c²)(d−1)+2c²)) = 2/3. Choosing δ_l := ((2/3)δ_u + δ_u)/2 = 5/6 leads to δ_l ∈ [(2/3)δ_u, δ_u] as required in Algorithm 2, and also A₁ satisfies ‖A₁‖ ∈ [δ_l, δ_u] = [5/6, 1]. Similar to Example 1, one can choose another vector A₂ := x^(2) − x^(0) with x^(2) ∈ M, but in contrast to Example 1

A₂ is chosen here such that it satisfies conditions (a1) and (a2), as Algorithm 2 does. Defining the vector

A₂ := (0, 3/4, 9/16)^T  (74)

leads to ‖A₂‖ = 15/16 ∈ [5/6, 1] and ½‖A₁ − A₂‖² = 9/16 ∈ [δ_u² − c²δ_lδ_u/((1+c²)(d−1)), δ_l² + c²δ_lδ_u/((1+c²)(d−1))]. Thus, A₁ and A₂ satisfy (a1) and (a2). Figure 3 illustrates the areas of points which satisfy (a1) and (a2) and the approximated tangent space.

Choosing a smaller value for δ leads to a better approximation of T_{x^(0)}M, e.g. for δ = 3/4 it holds

A₃ = (3/8, 0, 9/64)^T.  (75)

Setting δ_u := 9/20, c := √(23/3), d = 2 and choosing δ_l in the same way as above leads to δ_l = 3/8 ∈ [(2/3)δ_u, δ_u]. In addition, A₃ satisfies ‖A₃‖ ∈ [δ_l, δ_u] = [3/8, 9/20]. Defining the vector

A₄ := (0, 3/8, 9/64)^T  (76)

leads to ‖A₄‖ ∈ [3/8, 9/20] and ½‖A₃ − A₄‖² = 9/64 ∈ [δ_u² − c²δ_lδ_u/((1+c²)(d−1)), δ_l² + c²δ_lδ_u/((1+c²)(d−1))]. Thus, A₃ and A₄ satisfy (a1) and (a2). Figure 4 illustrates the areas of points which satisfy (a1) and (a2) and the approximated tangent space spanned by A₃ and A₄.

5.2 Testing the PC Methods

Now the performances of the different PC methods when approximating the tangent space successively are tested and compared. As base algorithm it was chosen to use the recovering technique presented in Dellnitz et al. (2005) and Schütze et al. (2005). This method uses boxes as a tool to maintain a spread of the solutions: the domain R is partitioned by a set of boxes, every solution z of F is associated with the box which contains z, and only one solution is associated with each box. The idea of the recovering algorithm is to detect, from a given box which contains a solution of F, neighboring boxes which contain further solutions of F, and so on. By this, the solution set is represented by a box collection C. Ideally, i.e., for a perfect outcome set, the associated box collection C covers the entire Pareto set tightly. In the following, it will be distinguished between the classical recover algorithm R_QR as described in Schütze et al. (2005), which uses a QR-decomposition of F' to obtain the tangent space, and the modifications R_Alg.1, R_Alg.2, and R_Alg.3, which are obtained via a successive approximation of the tangent space via Algorithms 1 to 3, respectively.

Fig. 3. The vectors v_i, i = 1, 2 (which are multiples of the secants A_i for the sake of a better visualization) span a subspace that approximates the (exact) tangent space T_{x^(0)}M. The horizontal area marks the points which satisfy (a1), the vertical area marks the points which satisfy (a2), and their intersection marks the points which satisfy both.

To compare the performance of the three different PC algorithms, the following scalable MOP taken from Schütze et al. (2005) is used:

f_1, f_2, f_3 : R^n → R,
f_i(x) = Σ_{j=1, j≠i}^n (x_j − a^i_j)² + (x_i − a^i_i)⁴,  (77)

where

a¹ = (1, 1, 1, 1, ...) ∈ R^n,
a² = (−1, −1, −1, −1, ...) ∈ R^n,
a³ = (1, −1, 1, −1, ...) ∈ R^n.

For the application of the recovering techniques the domain R = [−1.5, 1.5]^n has been chosen. Table 1 shows a comparison of the algorithms R_QR, R_Alg.1, and R_Alg.2 for different values of n on the benchmark problem. Here, all procedures have been started with one single solution z = (a¹, α¹), where α¹ = (1, 0, 0)^T, i.e., with the solution of the first objective f_1. For all required scalar optimization problems in both the predictor and the corrector step, the derivative free Quasi-Newton method e04jyf of the NAG Fortran package has been used. For all cases in Table 1, it holds for two box collections C_1 and C_2 with |C_1| > |C_2| that the collection C_1 is indeed a superset of C_2.
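For reference, the following is a small sketch of the benchmark objectives (77) as read here, vectorized with NumPy; it is not taken from the authors' implementation.

```python
import numpy as np

def make_benchmark(n):
    """Objectives f_1, f_2, f_3 of MOP (77) for parameter dimension n."""
    a = [np.ones(n),                                  # a^1 = (1, 1, 1, ...)
         -np.ones(n),                                 # a^2 = (-1, -1, -1, ...)
         np.array([(-1.0) ** j for j in range(n)])]   # a^3 = (1, -1, 1, -1, ...)

    def f(i, x):
        x = np.asarray(x, dtype=float)
        diff = x - a[i]
        # sum of squares over j != i plus the fourth power in component i
        return float(np.sum(diff ** 2) - diff[i] ** 2 + diff[i] ** 4)

    return [lambda x, i=i: f(i, x) for i in range(3)]

# usage: f1, f2, f3 = make_benchmark(1000); f1(np.zeros(1000))
```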

Fig. 4. The vectors v_i, i = 3, 4 (which are multiples of the secants A_i for the sake of a better visualization) span a subspace that approximates the (exact) tangent space T_{x^(0)}M. The horizontal area marks the points which satisfy (a1), the vertical area marks the points which satisfy (a2), and their intersection marks the points which satisfy both.

(Though the differences do not play a major role in this case, since additional boxes are mostly neighboring solutions, and no significant difference could be observed when considering the Pareto fronts; in all cases, the box collections are near to a perfect covering of the Pareto set.) Though the approximation qualities are basically equal, this does not hold for the computational times. In all cases, R_Alg.1 is the fastest method, and for n = 1,000, R_QR is the slowest method (which holds as well for all larger values of n where R_QR is applicable). Among the two novel methods R_Alg.1 and R_Alg.2, it can be observed, as anticipated, that R_Alg.1 is a bit faster while R_Alg.2 tends to find more solutions, which is probably due to the better approximation of the tangent space. Figure 5 shows the result obtained by R_Alg.2 for n = 1,000.

In order to treat parameter dimensions n > 1,000, it was observed that the best strategy is to approximate the tangent space via a split of x- and α-space as done in Algorithm 3. Further, it is advantageous to use a solver that uses gradient information such as the limited memory BFGS method (Shanno 1978). Table 2 lists the number of function and derivative calls as well as the CPU time of R_Alg.3 applied on MOP (77) for n = 100 up to n = 100,000, where #F' is the number of derivative calls of F_{α^(i)}(x). Figure 6 shows the result for n = 100,000. None of the other methods could obtain similar results.

Finally, an attempt has been made to estimate the Hausdorff distance between the x-part of the candidate set obtained by the different PC methods (X_QR, X_Alg.1, X_Alg.2, X_Alg.3) and the x-part of the Pareto set P_Q = (X_{P_Q}, α_{P_Q}). To approximate X_{P_Q}, all the algorithms described above have been run using a much smaller box size with different starting values, the resulting

28 8 Table 1. Comparison of the recovering methods R QR, R Alg.1, and R Alg. on MOP (77), where the derivative free routine e04jyf has been used to solve the required scalar optimization problems. Listed are the number of boxes generated by the recovering algorithms, the CPU time (in seconds), the numbers of function calls # F and derivative calls # F. n R QR R Alg.1 R Alg. #boxes CPU # F 895, 59 1, 83, 57 1, 848, 806 # F , 000 #boxes CPU # F, 169, 144, 89, 431 4, 178, 00 # F #boxes CPU # F 5, 765, 70 8, 69, , 70, 643 # F #boxes CPU # F 1, 654, 54 0, 700, 411 5, 561, 88 # F non-dominated solutions have been merged to get a reference set X ref X PQ (this led to an amount of 70,000 non-dominated solutions). Since boxes are used to represent the sets of interest, the metric induced by the -norm has been chosen to calculate the Hausdorff distance (compare to Definition 3). The boxes used to calculate the box collections as shown in Table 1 have a side length of d b = In Table 5. the resulting Hausdorff distances obtained by the different methods for dimensions n = 100 and n = 1, 000 are listed. Since all values are below d b, all approximations can be considered as good enough according to the given precision induced by the boxes. lar approximation results for 6 Conclusions and Future Work In this paper, the numerical treatment of high dimensional multi-objective optimization problems has been addressed by means of continuation methods. The bottleneck for such problems is the approximation of the tangent space. Given a solution, the cost to obtain a further solution is for full Hessians of the objectives O(n ) in terms of memory and O(n 3 ) in terms of flops, where n is the dimension of the parameter space. Alternatively, is was suggested to perform a successive approximation of the tangent space which is based on some analysis

29 Table. Numerical results for R Alg.3 on MOP (77) for n = 1, 000 to n = 100, 000 parameters. n # F # F CPU boxes 100, 947 7, , 893 7, , 7 6, , 000 3, 155 7, , 000, 15 7, , 000 1, 170 6, , 000, 069 7, , 000 1, 3 6, , 000, 385 7, , 000 1, 67 6, Table 3. Hausdorff distance of the results of all PC methods to the estimated Pareto front of MOP (77) for n = 100 and n = 1, 000 parameters. n d H(X ref, X QR) d H(X ref, X Alg.1 ) d H(X ref, X Alg. ) d H(X ref, X Alg.3 ) , on the geometry of the problem as presented in Section 3. The cost of the novel method is O(n) in terms of memory and O(n ) in terms of flops. Finally, the new approach has been tested within a particular predictor corrector method on a benchmark model with up to n = 100, 000 dimensions yielding superior results to the classical implementation. The presented approach is not restricted to the solution of high dimensional multi-objective optimization problems. In fact, any parameter-dependent rootfinding problem of high dimension can be efficiently tackled using the presented continuation method (under the assumptions discussed in Section 1). For the future, it would be interesting to apply the novel method to a real world problem which will also allow for an improved comparison to other methods. The techniques are of importance for various technical and engineering applications. For example, for the numerical solution of optimal control problems for complex systems such as multi-body systems, one is faced with high dimensional multiobjective optimization problems. Using a direct method, the optimal control problem is transformed to a nonlinear optimization problem by discretizing all state and control variables in time. The discrete states and controls defined on a discrete time grid are the optimization variables for the optimization problem (see e.g. Leyendecker et al. (009) or Ober-Blöbaum et al. (010) for singleobjective optimal control problems). To meet accuracy requirements, the time discretization has to be fine enough which leads especially for long time spans as e.g. for the trajectory design for space missions to a high number of optimization

variables. Since the minimization of not only one but rather multiple conflicting objectives is of interest (e.g., minimal fuel consumption and minimal flight time), the high dimensional optimization problem results in a multi-objective problem (Dellnitz et al. 2009). Considering partial differential equation constrained optimization problems (for an overview see Biegler et al. (2003)), a discretization in space and time leads to an even higher number of optimization parameters. Using the presented algorithms, one is able to compute the entire Pareto set for these multi-objective optimal control problems in an efficient way.

Acknowledgments

The last author acknowledges support from CONACyT project no.

References

Allgower, E. L. and Georg, K. (1990), Numerical Continuation Methods, Springer.
Atkinson, K. E. (1989), An Introduction to Numerical Analysis, Springer, New York.
Biegler, L. T., Ghattas, O., Heinkenschloss, M. and van Bloemen Waanders, B. (2003), Large-Scale PDE-Constrained Optimization, Springer Lecture Notes in Computational Science and Engineering.
Coello Coello, C. A., Lamont, G. B. and Van Veldhuizen, D. A. (2007), Evolutionary Algorithms for Solving Multi-Objective Problems, second edn, Springer, New York.
Das, I. and Dennis, J. (1998), Normal-boundary intersection: A new method for generating the Pareto surface in nonlinear multicriteria optimization problems, SIAM Journal of Optimization 8.
Deb, K. (2001), Multi-Objective Optimization Using Evolutionary Algorithms, Wiley.
Dellnitz, M., Ober-Blöbaum, S., Post, M., Schütze, O. and Thiere, B. (2009), A multiobjective approach to the design of low thrust space trajectories using optimal control, Celestial Mechanics and Dynamical Astronomy 105(1).
Dellnitz, M., Schütze, O. and Hestermeyer, T. (2005), Covering Pareto sets by multilevel subdivision techniques, Journal of Optimization Theory and Applications 124.
Dellnitz, M., Schütze, O. and Sertl, S. (2002), Finding zeros by multilevel subdivision techniques, IMA Journal of Numerical Analysis 22(2).
Deuflhard, P. (2004), Newton Methods for Nonlinear Problems. Affine Invariance and Adaptive Algorithms, Springer.
Eichfelder, G. (2008), Adaptive Scalarization Methods in Multiobjective Optimization, Springer, Berlin Heidelberg.
Fliege, J. (2004), Gap-free computation of Pareto-points by quadratic scalarizations, Mathematical Methods of Operations Research 59.
Golub, G. H. and Loan, C. F. V. (1996), Matrix Computations, Johns Hopkins University Press.
Griewank, A. (2000), Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Frontiers in Appl. Math., SIAM, Philadelphia, PA.
Guddat, J., Vasquez, F. G., Tammer, K. and Wendler, K. (1985), Multiobjective and Stochastic Optimization based on Parametric Optimization, Akademie-Verlag.

31 Harada, K., Sakuma, J., Kobayashi, S. and Ono, I. (007), Uniform sampling of local Pareto-optimal solution curves by pareto path following and its applications in multi-objective GA, in GECCO, pp Hillermeier, C. (001), Nonlinear Multiobjective Optimization - A Generalized Homotopy Approach, Birkhäuser. Jahn, J. (1986), Mathematical Vector Optimization in Partially Ordered Linear Spaces, Verlag Peter Lang GmbH, Frankfurt am Main. Jahn, J. (006), Multiobjective search algorithm with subdivision technique, Computational Optimization and Applications 35(), Karush, W. E. (1939), Minima of functions of several variables with inequalities as side conditions, PhD thesis, University of Chicago. Königsberger, K. (1997), Analysis, Springer. Kuhn, H. and Tucker, A. (1951), Nonlinear programming, in J. Neumann, ed., Proceeding of the nd Berkeley Symposium on Mathematical Statistics and Probability, pp Lara, A., Sanchez, G., Coello, C. A. C. and Schütze, O. (010), HCS: A new local search strategy for memetic multiobjective evolutionary algorithms, IEEE Transactions on Evolutionary Computation 14(1), Leyendecker, S., Ober-Blöbaum, S., Marsden, J. and Ortiz, M. (009), Discrete mechanics and optimal control for constrained systems, Optimal Control Applications & Methods, DOI: /oca.91. Miettinen, K. (1999), Nonlinear Multiobjective Optimization, Kluwer Academic Publishers. Nocedal, J. and Wright, S. (006), Numerical Optimization, Springer Series in Operations Research and Financial Engineering, Springer. Ober-Blöbaum, S., Junge, O. and Marsden, J. (010), Discrete mechanics and optimal control: an analysis, ESAIM: Control, Optimisation and Calculus of Variations. DOI: /cocv/ Pareto, V. (1971), Manual of Political Economy, The MacMillan Press. Rakowska, J., Haftka, R. T. and Watson, L. T. (1991), Tracing the efficient curve for multi-objective control-structure optimization, Computing Systems in Engineering (6), Rheinboldt, W. C. (1986), Numerical Analysis of Parametrized Nonlinear Equations, Wiley. Ringkamp, M. (009), Fortsetzungsalgorithmen für hochdimensionale Mehrzieloptimierungsprobleme, Diploma thesis, University of Paderborn. Schäffler, S., Schultz, R. and Weinzierl, K. (00), A stochastic method for the solution of unconstrained vector optimization problems, Journal of Optimization Theory and Applications 114(1), 09. Schütze, O., Coello, C. A. C., Mostaghim, S. and Talbi, E.-G. (008), Hybridizing evolutionary strategies with continuation methods for solving multi-objective problems, Engineering Optimization 40(5), Schütze, O., Dell Aere, A. and Dellnitz, M. (005), On continuation methods for the numerical treatment of multi-objective optimization problems, in J. Branke, K. Deb, K. Miettinen and R. E. Steuer, eds, Practical Approaches to Multi- Objective Optimization, number in Dagstuhl Seminar Proceedings, Internationales Begegnungs- und Forschungszentrum (IBFI), Schloss Dagstuhl, Germany. < Schütze, O., Mostaghim, S., Dellnitz, M. and Teich, J. (003), Covering Pareto sets by multilevel evolutionary subdivision techniques, in C. M. Fonseca, P. J. Fleming, 31

32 3 E. Zitzler, K. Deb and L. Thiele, eds, Evolutionary Multi-Criterion Optimization, Lecture Notes in Computer Science. Shanno, D. F. (1978), On the convergence of a new conjugate gradient algorithm., SIAM J. Numer. Anal. 15, Steuer, R. E. (1986), Multiple Criteria Optimization: Theory, Computation, and Applications, John Wiley & Sons, Inc.

Fig. 5. Numerical result of R Alg. 2 on MOP (77) for n = 1,000: (a) parameter space, showing the projection of the final box collection onto x_1, x_2, and x_3; (b) image space, showing the obtained Pareto front.
