Mesh adaptive direct search with second directional derivative-based Hessian update


Comput Optim Appl
© Springer Science+Business Media New York 2015

Árpád Bűrmen (1), Jernej Olenšek (1), Tadej Tuma (1)
Received: 28 February 2014
(1) Faculty of Electrical Engineering, University of Ljubljana, Tržaška cesta 25, 1000 Ljubljana, Slovenia
Corresponding author: Árpád Bűrmen, arpad.buermen@fe.uni-lj.si

Abstract  The subject of this paper is inequality constrained black-box optimization with mesh adaptive direct search (MADS). The MADS search step can include additional strategies for accelerating the convergence and improving the accuracy of the solution. The strategy proposed in this paper involves building a quadratic model of the function and linear models of the constraints. The quadratic model is built by means of a second directional derivative-based Hessian update. The linear terms are obtained by linear regression. The resulting quadratic programming (QP) problem is solved with a dedicated solver and the original functions are evaluated at the QP solution. The proposed search strategy is computationally less expensive than the quadratically constrained QP strategy in the state-of-the-art MADS implementation (NOMAD). The proposed MADS variant (QPMADS) and NOMAD are compared on four sets of test problems. QPMADS outperforms NOMAD on all four of them for all but the smallest computational budgets.

Keywords  Black-box optimization · Constrained optimization · Mesh adaptive direct search · Second directional derivative · Hessian update · Quadratic models · Quadratic programming

Mathematics Subject Classification  90C30 · 90C56 · 65K05 · 90C20

1 Introduction

In constrained black-box optimization the derivatives are usually not available. The optimization problem is described in the form of a black box capable of computing objective and constraint function values. One of the algorithmic frameworks for solving such problems is mesh adaptive direct search (MADS) [6]. The optimization problem is given by

    \min_{x \in \Omega} f(x),    (1)

where f : \mathbb{R}^n \to \mathbb{R} and \Omega \subseteq \mathbb{R}^n is the feasible region defined by nonlinear constraints. In this paper we limit ourselves to problems where f and the nonlinear constraints cannot be evaluated separately. This is often the case when f and the constraints are obtained from the results of a simulation. One of the advantages of MADS is its capability of handling general nonlinear constraints with the extreme barrier approach, i.e. by replacing f(x) with f_\Omega(x), which is equal to f(x) if x \in \Omega and +\infty otherwise.

Some of the constraints defining \Omega can be of the form

    x_L \le x \le x_H,    (2)

where the vectors x_L and x_H represent lower and upper bounds on the optimization variables. These constraints are trivial to evaluate. We assume the function and the remaining constraints can only be evaluated if (2) is satisfied. This is often the case in simulation-based optimization where points outside the bounds cannot be evaluated due to the inherent limitations of the underlying simulator.

MADS relies on two basic ingredients to deliver certain convergence properties [1,6,27] for nonsmooth functions in the Clarke sense [10] and in the Rockafellar sense [23]. The first one is the restriction of the visited points to a discrete set (mesh). This restriction guarantees the existence of certain subsequences for which favourable convergence properties exist. The second ingredient is the exploration of the neighbourhood of the incumbent solution (also referred to as polling). The normalized directions in which MADS explores the neighbourhood of the incumbent solution are asymptotically dense on the unit sphere.

The exploration mechanism provided by polling is slow, particularly on higher dimensional problems. To accelerate convergence MADS allows for the evaluation of arbitrary points on the current mesh (search). The points can be chosen by an arbitrary algorithm that can also use curvature information to significantly accelerate the convergence towards a solution. The MADS instance in [12] (implemented in the NOMAD software [18]) constructs a quadratic model of the function and quadratic models of the constraints from previously evaluated points in the neighbourhood of the incumbent solution. The obtained subproblem is solved and its solution is used as a trial point for the search. The model can also be used for ordering the poll directions so that directions resulting in the largest incumbent solution improvement predicted by the model are probed first [12]. Sampling and simplex derivatives were used for poll direction ordering and adjusting the mesh size parameter in [14]. Simplex derivatives can be obtained by interpolation or regression of a sufficiently large set of sample points. The smallest

number of points required for computing a full quadratic model (all first and second simplex derivatives) is (n+1)(n+2)/2. This requirement was relaxed in [12], where minimum Frobenius norm models were built using fewer than (n+1)(n+2)/2 points. An alternative approach is the use of the curvature information matrix (CIM) that approximates the Hessian for smooth functions [17]. Finite difference approximations of the second directional derivatives along coordinate directions yield the diagonal elements of the CIM. Extradiagonal elements are obtained by computing finite difference approximations of mixed derivatives in 2-dimensional subspaces defined by pairs of coordinate directions.

For building a quadratic model one needs to compute the gradient and the Hessian matrix (i.e. the first and the second derivatives of the function). The Hessian matrix can be used in many different ways. One of the more straightforward approaches for exploiting the information in this matrix is to align the search directions with its eigenvectors [3]. This helps the algorithm to converge to a second order stationary point.

Computing a complete quadratic model from sample points involves solving a system of (n+1)(n+2)/2 linear equations. To avoid this one can construct a sequence of Hessian approximations using various update formulas. In derivative-based quasi-Newton optimization methods, update formulas for approximating the Hessian matrix have been known since 1959 (see [16]). These update formulas are not suitable for derivative-free methods like MADS, because they require knowledge of the function's gradient. The gradient of a function of n variables can be approximated by evaluating at least n+1 points. If these points are not positioned in a particular manner (i.e. n points must lie along coordinate directions with respect to the (n+1)-th point), a linear system of n equations must be solved to obtain the gradient.

Recently, a simple Hessian matrix update formula based on the second directional derivative was analysed in [19]. By continually applying this update the Hessian approximation converges to the Hessian of the function. The second directional derivative can be approximated using only three distinct collinear points. Such a constellation of points is common in some variants of MADS. For the Hessian update formula in [19] to perform optimally, the directions along which the second directional derivative is approximated must be uniformly distributed. Unfortunately, no known variant of MADS produces uniformly distributed poll directions. The first published MADS variant (LTMADS [6]) tends to distribute most of its search directions along lower dimensional subspaces aligned with the coordinate directions [2,25]. The OrthoMADS variant of MADS [2] was developed with this shortcoming in mind. In [25] it was shown that the poll directions in OrthoMADS tend to concentrate around coordinate directions as the problem dimension increases. A solution that generates the poll directions by means of a QR decomposition of a random matrix was proposed in [25]. It produces seemingly uniformly distributed poll directions, although the uniformity of the distribution was not proven mathematically. Furthermore, the distribution used for generating the random matrix subject to QR decomposition was also not given.

In [12], quadratic models of the objective and constraint functions are constructed. The obtained model of the original optimization problem (also referred to as the subproblem) is solved by a simple MADS algorithm. Because this algorithm uses

only the poll step, the obtained solution of the subproblem may be of very low quality, particularly for higher dimensional problems. The approach proposed in this paper relies on a dedicated quadratic programming (QP) solver, and the proposed algorithm is referred to as QPMADS. Because QP solvers handle only linear constraints, QPMADS uses a linear model of the constraints.

The remainder of the paper is divided as follows. Section 2 introduces the second directional derivative-based Hessian update. Section 3 gives an overview of MADS and outlines the QPMADS algorithm. The approach for generating uniformly distributed poll directions and the update rules for the mesh and the poll size parameters are given in Sect. 4. Section 5 explains the model-based poll direction ordering and the application of the second directional derivative-based Hessian update. Construction of the model and its role in the search step are the subject of Sect. 6. The convergence proof for QPMADS is presented in Sect. 7. QPMADS was tested on several sets of test problems from [12,25] and its performance was compared to the performance of the state-of-the-art MADS implementation in the NOMAD software [12,18]. The results are discussed in Sect. 8.

Notation. \mathbb{N}, \mathbb{Z}, and \mathbb{R} denote, respectively, the sets of natural, integer, and real numbers. Vectors and matrices are denoted by lower- and uppercase bold letters (e.g. x, A), respectively. The identity matrix of appropriate size is denoted by I. Vectors are treated as column vectors and the dot product of two vectors is expressed as a product of two matrices (x^T y). Inequality relations are applied to vectors componentwise (e.g. x < 0 means all components of x are negative). Members of the standard orthonormal basis are denoted by e_i. The expression x \in A is used for specifying that x is a column of matrix A. \|\cdot\|, \|\cdot\|_\infty, and \|\cdot\|_F denote, respectively, the Euclidean, maximum, and Frobenius norms. Sequences are denoted by \{x_k\}_{k=1}^\infty. N(\mu, \sigma^2) denotes the normal distribution with mean \mu and variance \sigma^2. The rounding operator R_k : \mathbb{R}^n \to \mathbb{R}^n is defined as

    R_k(x) \in \arg\min_{y \in G} \|x - y\|,    (3)

where G is the set of available rounded values. When arg min results in multiple points a tie breaking rule chooses a unique point for every x and every G. A tie breaking rule is implicitly applied every time the rounding function of the underlying mathematical library is invoked.

2 The Hessian update

We consider the construction of a quadratic model of a nonlinear function f(x). Let g and B denote the approximate gradient and the approximate Hessian of f(x) at x_0. Then the model is given by

    m(x) = (1/2)(x - x_0)^T B (x - x_0) + g^T (x - x_0) + f(x_0).    (4)

Suppose an approximation of the second directional derivative (d^2/d\alpha^2) f(x_0 + \alpha p) is available. Such an approximation can be obtained from the first three terms of the Taylor series expansion of f:

    f_{2,p}(x_0) = \frac{d^2}{d\alpha^2} f(x_0 + \alpha p) \approx \frac{2}{\alpha_+ - \alpha_-}\left(\frac{f_+ - f_0}{\alpha_+} - \frac{f_- - f_0}{\alpha_-}\right),    (5)

where f_0, f_+, and f_- denote the values of f(x_0 + \alpha p) at \alpha = 0, \alpha = \alpha_+, and \alpha = \alpha_-, respectively. The formula is exact if f is a quadratic function. Three collinear points are often available such that \alpha_+ = -\alpha_- = 1. In this case (5) can be simplified to

    f_{2,p}(x_0) \approx f_+ + f_- - 2 f_0.    (6)

With decreasing \|p\| the approximation becomes more accurate. The second directional derivative of a quadratic function depends only on p and can be expressed as

    f_{2,p} = p^T H p,    (7)

where H is the Hessian of f. In [19] the second directional derivative information was used for updating the Hessian approximation. Let B_+ denote the updated Hessian approximation. By minimizing the Frobenius norm of B_+ - B subject to the linear constraint

    p^T B_+ p = f_{2,p},    (8)

the following update formula is obtained:

    B_+ = B + \frac{f_{2,p} - p^T B p}{\|p\|^4} p p^T.    (9)

The following properties of the update formula (9) justify its use (see [19]).

Lemma 1  Suppose H is the Hessian of a quadratic function f, B is a symmetric matrix, and p \in \mathbb{R}^n is a random vector such that p/\|p\| is uniformly distributed on the unit sphere. Then the update formula given by (9) satisfies

    \|B_+ - H\|_F \le \|B - H\|_F    (10)

and

    E[\|B_+ - H\|_F^2] \le \left(1 - \frac{2}{n(n+2)}\right)\|B - H\|_F^2.    (11)

Proof  See the proof of Theorem 2.1 in [19].

Lemma 1 indicates that the Hessian approximation obtained with (9) converges linearly to H if p/\|p\| is uniformly distributed on the unit sphere.
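The three-point approximation (6) and the rank-one update (9) translate directly into a few lines of NumPy. The sketch below is ours and is not taken from the QPMADS implementation; the function names and the assumption that f accepts a NumPy vector are illustrative only.

    import numpy as np

    def second_directional_derivative(f, x0, p, f0=None):
        # Approximate f_{2,p}(x0) = d^2/da^2 f(x0 + a*p) from the three
        # collinear points x0 - p, x0, x0 + p, as in Eq. (6).
        if f0 is None:
            f0 = f(x0)
        return f(x0 + p) + f(x0 - p) - 2.0 * f0

    def hessian_update(B, p, f2p):
        # Update (9): the minimum Frobenius-norm change of B that satisfies
        # the constraint p^T B_+ p = f2p.
        coeff = (f2p - p @ B @ p) / np.dot(p, p) ** 2
        return B + coeff * np.outer(p, p)

An update of this kind can be applied whenever three collinear points have been evaluated, which is exactly the situation QPMADS engineers in its poll and speculative search steps (Sect. 5).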

In fact, we can relax the requirements for linear convergence.

Lemma 2  Suppose H is the Hessian of a quadratic function f, B is a symmetric matrix, and p \in \mathbb{R}^n is a random vector such that

    E\left[\frac{(p^T (B - H) p)^2}{\|p\|^4}\right] \ge \alpha \|B - H\|_F^2    (12)

for some \alpha > 0. Then the update formula given by (9) satisfies (10) and

    E[\|B_+ - H\|_F^2] \le (1 - \alpha)\|B - H\|_F^2.    (13)

Proof  The proof of (10) in [19] does not depend on the distribution of p. Therefore (10) holds under the assumptions of Lemma 2. To prove (13) we begin by subtracting H from (9). Taking into account (7) results in

    B_+ - H = B - H - \frac{p^T (B - H) p}{\|p\|^4} p p^T.    (14)

Matrix B_+ - H is orthogonal to p p^T in the Frobenius product sense because

    B_+ - B = \frac{p^T (H - B) p}{\|p\|^4} p p^T    (15)

has minimal Frobenius norm subject to constraint (8). Thus we have

    \|B_+ - H\|_F^2 = \|B - H\|_F^2 - \frac{(p^T (B - H) p)^2}{\|p\|^4}.    (16)

Computing the expected value and taking into account (12) yields the desired result.

Lemma 3  Let B_0 be an arbitrary symmetric matrix and let \{H_k\} denote the sequence of Hessians of f corresponding to a sequence of random points \{x_k\} such that E[\|H_k\|_F] is finite. Let the members of the sequence of vectors \{p_k/\|p_k\|\} be independent and uniformly distributed on the unit sphere. Construct a sequence of random matrices using

    B_{k+1} = B_k + \frac{f_{2,p_k}(x_k) - p_k^T B_k p_k}{\|p_k\|^4} p_k p_k^T.    (17)

Suppose H_k converges in squared Frobenius norm to H (i.e. E[\|H_k - H\|_F^2] \to 0). Then E[\|B_k - H\|_F^2] \to 0.

Proof  See the proof of Theorem 2.3 in [19]. The proof relies on Lemma 1.

If we replace Lemma 1 with Lemma 2 we obtain a more general result that requires the directions to satisfy (12). The final result of Lemma 3 justifies the use of update (9) for approximating the Hessian of a general nonlinear function.
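The linear convergence promised by Lemmas 1 and 3 is easy to observe numerically for a fixed quadratic. The following self-contained check is only an illustration of the lemmas; the dimension, seed, and iteration count are arbitrary choices of ours.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    A = rng.standard_normal((n, n))
    H = (A + A.T) / 2.0                 # Hessian of the test quadratic f(x) = 0.5 x^T H x
    B = np.zeros((n, n))                # arbitrary symmetric starting approximation

    for k in range(2000):
        p = rng.standard_normal(n)      # p/||p|| is uniformly distributed on the unit sphere
        f2p = p @ H @ p                 # exact second directional derivative, Eq. (7)
        coeff = (f2p - p @ B @ p) / np.dot(p, p) ** 2
        B = B + coeff * np.outer(p, p)  # update (9)

    print(np.linalg.norm(B - H, 'fro'))  # decays roughly like (1 - 2/(n(n+2)))^(k/2) in expectation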

3 Algorithm outline

In iteration k MADS examines points that lie in scaled directions from the incumbent solution x_k. These scaled directions are members of the set

    G_k = \{\Delta_k^m D z : z \in \mathbb{N}^{n_D}\},    (18)

where the n_D columns of D = GZ form a positive spanning set for \mathbb{R}^n [11], G is an n \times n real matrix, and Z is an n \times n_D integer matrix. Points visited by the algorithm in the k-th iteration lie on the mesh

    M_k = \{x + p : x \in S_k, p \in G_k\},    (19)

where S_k is the set of all points visited by the algorithm in past iterations. The scaling coefficient \Delta_k^m in (18) is also referred to as the mesh size. Let x_k denote the point with the lowest value of f_\Omega found in iterations 1, ..., k-1 and f_k the corresponding value of f_\Omega. One iteration of MADS is given by Algorithm 1.

Algorithm 1  k-th iteration of the MADS framework.
1. Search. Evaluate f_\Omega on a finite subset of M_k.
2. Poll. Select a finite set D_k \subset G_k. Evaluate f_\Omega(x_k + d) for d \in D_k until f_{k+1} < f_k or D_k is exhausted.
3. Update. Choose \Delta_{k+1}^m and \Delta_{k+1}^p.

The length of the poll directions (d) is determined by the poll size (\Delta_k^p). The search (or the poll, for that matter) fails if it does not find a point x for which f_\Omega(x) < f_k. An iteration is deemed successful if either the search or the poll is successful (i.e. does not fail). The poll may be omitted if the search succeeds. The algorithm starts with a given x_0 \in \Omega, f_0 = f_\Omega(x_0) < +\infty, and k = 0.

Let \bar{D}_k denote the set of normalized poll directions \{d/\|d\| : d \in D_k\}. A minimal frame center is a point x_k for which all poll directions d \in D_k fail to satisfy f_\Omega(x_k + d) < f_k. A refining subsequence is any sequence of minimal frame centers \{x_k\}_{k \in K} for which \{\Delta_k^p\}_{k \in K} converges to zero. MADS guarantees favourable convergence properties [6] for limit points of certain refining subsequences if the following requirements are satisfied:

(A) \Delta_k^m/\Delta_{k+1}^m is an integer power of a given nonzero rational constant \tau > 1.
(B) \Delta_{k+1}^m < \Delta_k^m if f_{k+1} = f_k, otherwise \Delta_{k+1}^m \ge \Delta_k^m.
(C) For some C > 0 and all d \in D_k the bound \|d\| \le C \Delta_k^p is satisfied.
(D) \lim_{k \to \infty} \Delta_k^p = 0 if and only if \lim_{k \to \infty} \Delta_k^m = 0.
(E) Limit points of \bar{D}_k in the sense of [13] are positive spanning sets.
(F) The set \bigcup_{k \in K} \bar{D}_k, where K corresponds to all failed iterations, is dense on the unit sphere.

Requirements A-D guarantee the existence of at least one refining subsequence. Requirement E ensures the limit points of refining subsequences are also stationary points if f is smooth and \Omega = \mathbb{R}^n, even when Requirement F is not satisfied (see [5]). Finally, F bestows MADS with convergence properties on nonsmooth functions and constrained problems (see [6]).

The outline of the proposed approach (QPMADS) is given by Algorithm 2. Operator R_k rounds a point to the nearest point in G_k. The feasible region \Omega is defined as

    \Omega = \{x \in \mathbb{R}^n : c(x) \le 0\},    (20)

where the vector valued function c is a map \mathbb{R}^n \to \mathbb{R}^m representing m inequality constraints. To simplify notation we introduce s = x - x_k and c_k = c(x_k). Note that every evaluation of f_\Omega(x) involves the evaluation of f(x) and c(x).

Algorithm 2  k-th iteration of the QPMADS algorithm.
1. QP search step. Use linear regression to build a linear model m_c(s) of c(x) in the neighbourhood of x_k. For given B use linear regression to obtain g for the model m_f(s) = (1/2) s^T B s + g^T s + f_k. Replace B with a positive definite matrix B + \beta I by choosing a sufficiently large \beta \ge 0. Obtain the step s by minimizing m_f(s) subject to m_c(s) \le 0 (use a QP solver). Evaluate f_\Omega at x_k + R_k(s). If f_\Omega(x_k + R_k(s)) < f_k go to step 4.
2. Poll step (performed if the QP search step fails). Select a positive spanning set D_k \subset G_k. Evaluate f_\Omega(x_k + d) for d \in D_k until f_\Omega(x_k + d) < f_k or D_k is exhausted. If a d \in D_k is found satisfying f_\Omega(x_k + d) < f_k set p_k := d, otherwise go to step 4.
3. Speculative search step. Evaluate f_\Omega(x_k + 2 p_k).
4. Update. Compute \Delta_{k+1}^m and \Delta_{k+1}^p.

The quadratic programming (QP) search step first builds a quadratic model of f(x),

    m_f(s) = (1/2) s^T B s + g^T s + f_k,    (21)

and a linear model of c(x),

    m_c(s) = A s + c_k.    (22)

Note that the Hessian approximation B is available because (9) is applied frequently, i.e. every time three collinear points are evaluated. The update formula (9) does not guarantee positive definiteness and therefore B can be indefinite. Because QP solvers are usually capable of handling only convex problems, a positive definite matrix B + \beta I is used instead of B in (21). The modified quadratic model of f and the linear model of c constitute a convex QP subproblem that approximates the original optimization problem in the neighbourhood of x_k along some descent direction of the original QP problem. The modified subproblem can be solved using one of the many available QP solvers. The obtained step s is rounded to the nearest point from G_k (denoted by R_k(s)) and f_\Omega is evaluated at x_k + R_k(s). The QP search fails if f_\Omega(x_k + R_k(s)) \ge f_k.

A failed QP search is followed by a poll that examines the points x_k + d, d \in D_k. If the poll step finds a point x_k + p_k for which f_\Omega(x_k + p_k) < f_k, the speculative search step evaluates the point x_k + 2 p_k further along the direction of the last success in the hope of finding an even lower value of f_\Omega. We also experimented with a speculative step performed after a successful search step, but found that in most cases it had a negative impact on the performance of the algorithm. This is probably due to the fact that most of the information contained in the model is already exploited by the search step.

Note that Algorithm 2 does not completely adhere to the MADS framework given by Algorithm 1 because the speculative search is performed at the end of the corresponding iteration. This does not invalidate the convergence theory given in [6]. See Sect. 7 for the details regarding this issue.

4 Poll direction generation and the update

In QPMADS the second directional derivative is approximated along search and poll directions. Lemma 1 guarantees linear convergence of the approximate Hessian to the actual Hessian if these directions are uniformly distributed. The algorithm in [2] generates sets of orthogonal directions that are not uniformly distributed [25]. To generate sets of uniformly distributed directions we give up orthogonality in favour of almost orthogonal search directions [25].

Algorithm 3  Generating poll directions for the k-th iteration of QPMADS. \{N_t\}_{t=0}^\infty is a sequence of realizations of an n \times n random matrix with independent identically distributed elements from N(0, 1).
1. QR-decompose N_{t_k} into an orthogonal matrix Q_{t_k} and an upper triangular matrix R_{t_k}.
2. Construct a diagonal matrix D_{t_k} with elements d_{ii} = sign(r_{ii}).
3. U_{t_k} := Q_{t_k} D_{t_k}.
4. Construct V_k by rounding the columns of \Delta_k^p U_{t_k} to the nearest points in G_k.
5. The 2n members of D_k are the columns of V_k and -V_k.

The basic idea for generating uniformly distributed poll directions given by Algorithm 3 is based on [24]. Every iteration of QPMADS is assigned an integer index t_k corresponding to the matrix N_{t_k} from which the set of poll directions D_k is constructed. Matrix U is a random orthogonal matrix from the Haar measure on the orthogonal group O_n (see [24] for a proof) and U_{t_k} is its t_k-th realization. The distribution of such a random matrix is invariant to orthogonal transformations, i.e. the distribution of OU is identical to the distribution of U for every orthogonal matrix O. The following theorem is the basis for generating random vectors that are uniformly distributed on the unit sphere (i.e. vectors that point with equal probability in all directions).

Theorem 1  Let U be a random matrix chosen from the Haar measure on the orthogonal group O_n and let a be a unit vector. Then Ua is distributed uniformly on the unit sphere.

Proof  It is sufficient to prove that the distribution of Ua does not change under orthogonal transformations. Let O be an arbitrary orthogonal matrix. The distribution of OU is by assumption identical to the distribution of U. From what was previously said and from O(Ua) = (OU)a we conclude that the distribution of (OU)a is identical to the distribution of Ua, which completes the proof.

By replacing a with a basis vector e_i we can see that every column of U is a random vector uniformly distributed on the unit sphere. By scaling the columns of U_{t_k} with \Delta_k^p and then rounding the results to the nearest points in G_k an almost orthogonal matrix V_k is obtained. Rounding a vector d \in \mathbb{R}^n to the nearest point from G_k introduces a rounding error denoted by \delta. This error is bounded in norm (see [9]):

    \|\delta\| \le n^{1/2} \Delta_k^m/2 = \delta_0.    (23)
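Steps 1-3 of Algorithm 3 have a direct NumPy transcription. In the sketch below the rounding of step 4 is simplified to componentwise rounding onto the mesh spanned by D = [I  -I] with mesh size delta_m; this simplification and the function names are our assumptions, not code from the QPMADS implementation.

    import numpy as np

    def haar_orthogonal(n, rng):
        # Steps 1-3 of Algorithm 3: QR-decompose a Gaussian matrix and fix the
        # column signs with sign(r_ii) so that U is Haar-distributed on O_n.
        N = rng.standard_normal((n, n))
        Q, R = np.linalg.qr(N)
        return Q * np.sign(np.diag(R))

    def poll_directions(n, delta_p, delta_m, rng):
        # Steps 4-5: scale the columns by the poll size, round them to the mesh
        # of size delta_m, and return the 2n directions (columns of V_k and -V_k).
        U = haar_orthogonal(n, rng)
        V = np.round(delta_p * U / delta_m) * delta_m
        return np.hstack([V, -V])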

The effect of scaling and rounding on the distribution of the normalized columns of U can be summarized in the following theorem.

Theorem 2  Let U be a random matrix chosen from the Haar measure on the orthogonal group O_n and let V be obtained by rounding the columns of \Delta_k^p U to the nearest points in G_k. As \Delta_k^m/\Delta_k^p approaches zero, the distribution of the normalized columns of V approaches the uniform distribution on the unit sphere.

Proof  Let u denote a column of U. By assumption we have \|u\| = 1. The corresponding column of V can be expressed as

    v = R_k(\Delta_k^p u) = \Delta_k^p u + \delta.    (24)

The normalized vector v is then given by

    \frac{v}{\|v\|} = \frac{\Delta_k^p u + \delta}{\|\Delta_k^p u + \delta\|} = \frac{\Delta_k^p}{\|\Delta_k^p u + \delta\|} u + \frac{\delta}{\|\Delta_k^p u + \delta\|}.    (25)

From \Delta_k^m/\Delta_k^p \to 0 and (23) we see that \|\delta\|/\Delta_k^p \to 0. Consequently

    \frac{\Delta_k^p}{\|\Delta_k^p u + \delta\|} = \frac{1}{\|u + \delta/\Delta_k^p\|} \to 1.    (26)

The second term in (25) can be rewritten as

    \frac{\delta}{\|\Delta_k^p u + \delta\|} = \frac{\delta}{\Delta_k^p} \cdot \frac{1}{\|u + \delta/\Delta_k^p\|} \to 0.    (27)

By taking into account (25), (26), and (27) we arrive at

    \frac{v}{\|v\|} \to u.    (28)

Vector u is by assumption uniformly distributed on the unit sphere.

The columns of V_k become mutually orthogonal as \Delta_k^m/\Delta_k^p approaches zero. In the last step of Algorithm 3, the set of poll directions D_k is constructed from the columns of V_k such that for every poll direction d it also contains -d.

Every iteration is associated with an index l_k defining the mesh and the poll size. Greater values of l_k correspond to shorter poll directions and finer meshes. The update rules for l_k and t_k are based on those from [2,25]:

    l_{k+1} = l_k + 1   if f_{k+1} = f_k,
    l_{k+1} = l_k - 1   if f_{k+1} < f_k,    (29)

    t_{k+1} = l_{k+1}                  if l_{k+1} > \max_{i \le k} l_i,
    t_{k+1} = 1 + \max_{i \le k} t_i   otherwise.    (30)

The update rule for t_k guarantees that every sequence of minimal frame centers \{x_k\}_{k \in K} for which \{l_k\}_{k \in K} = \{0, 1, 2, ...\} corresponds to the sequence \{N_t\}_{t=0}^\infty. In the first iteration of QPMADS (k = 0), both t_0 and l_0 are set to zero.
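A literal transcription of (29)-(30) as a helper function; l_hist and t_hist are hypothetical lists holding the values l_i and t_i for i \le k.

    def update_indices(l_k, f_next, f_k, l_hist, t_hist):
        # (29): refine the mesh on failure (l+1), coarsen it on success (l-1).
        l_next = l_k + 1 if f_next == f_k else l_k - 1
        # (30): take t = l when the new mesh is finer than any used so far,
        # otherwise move on to a fresh random matrix N_t.
        t_next = l_next if l_next > max(l_hist) else 1 + max(t_hist)
        return l_next, t_next

With l_0 = t_0 = 0 this reproduces the indexing described above.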

The set G_k in QPMADS is obtained from D = [I  -I]. Rounding transforms the orthogonal columns of \Delta_k^p U_{t_k} into the almost orthogonal columns of V_k. If the poll size \Delta_k^p is not sufficiently large compared to \Delta_k^m, then rounding can result in a singular V_k, and consequently D_k no longer positively spans \mathbb{R}^n. Due to this, the mesh and the poll size parameters are updated in the following manner:

    \Delta_k^m = \min\{1, 4^{-l_k}\}/\gamma,    (31)
    \Delta_k^p = 2^{-l_k}.    (32)

The constant \gamma ensures that \Delta_k^p/\Delta_k^m = 2^{|l_k|}\gamma is sufficiently large for all l_k \in \mathbb{Z}. It depends on the problem dimension and can be obtained via the cosine measure of D_k, denoted by cm(D_k) [11]. The cosine measure of a positive basis B_k comprising an orthogonal linear basis for \mathbb{R}^n and its negative is cm(B_k) = n^{-1/2} [11]. By rounding the members of B_k to the nearest points from G_k, a new set of vectors D_k is obtained. Its cosine measure is bounded by [9]

    cm(D_k) \ge \frac{cm(B_k) - \delta_0/\|b_{min}\|}{1 + \delta_0/\|b_{min}\|},    (33)

where b_{min} is the shortest vector in B_k (i.e. the column of \Delta_k^p U_{t_k} with the smallest norm). The set D_k positively spans \mathbb{R}^n if cm(D_k) > 0. Consequently, the rounding error must satisfy [9]

    \delta_0 < n^{-1/2} \|b_{min}\|.    (34)

The columns of U_{t_k} are unit length vectors, so \|b_{min}\| = \Delta_k^p. Therefore (34) can be satisfied with

    n^{1/2} \Delta_k^m/2 < n^{-1/2} \Delta_k^p.    (35)

Rearranging (35) leads to

    \Delta_k^p/\Delta_k^m > n/2.    (36)

For every \gamma > n/2, (36) is satisfied for all l_k \in \mathbb{Z}. In QPMADS we use

    \gamma = 1 + n/2.    (37)
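The parameter updates (31), (32), and (37) are one-liners; a minimal sketch:

    def mesh_and_poll_size(l_k, n):
        # Mesh size (31) and poll size (32) with gamma chosen as in (37);
        # then delta_p/delta_m = gamma * 2**abs(l_k) >= gamma > n/2, so the
        # rounded poll directions keep positively spanning R^n.
        gamma = 1.0 + n / 2.0
        delta_m = min(1.0, 4.0 ** (-l_k)) / gamma
        delta_p = 2.0 ** (-l_k)
        return delta_m, delta_p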

In problems without constraints the set of normalized poll directions d \in D_k resulting in feasible points x_k + d \in \Omega is asymptotically dense in the unit sphere. Let us now consider the case where (2) are the only constraints. Every active bound on the optimization variables halves the set of normalized poll directions that result in feasible points. Because poll directions are generated in a random way, the probability of generating a feasible point decreases exponentially with the number of active bounds. The Hessian update described in Sect. 2 uses the points x_k + d, x_k, and x_k - d, which only makes things worse. The set of directions d for which all three points are feasible reduces to \mathbb{R}^{n-m} when m bounds are active.

For now let us consider the case when both x_k + d and x_k - d violate (2). Then the poll direction d is modified to obtain \tilde{d}. The components of \tilde{d} are equal to the components of d when the corresponding components of x_k + d satisfy (2). The remaining components of \tilde{d} are set to the corresponding components of -d. Poll directions d and -d are removed from D_k and replaced with \tilde{d} and -\tilde{d}. Now in every pair of collinear poll directions there is at least one that generates a point satisfying (2). The modified Hessian update (used when one of the points x_k + \tilde{d} or x_k - \tilde{d} violates (2)) is described in Sect. 5.

5 Ordering the poll steps and applying the Hessian update

Every time three collinear points are evaluated, the corresponding second directional derivative can be approximated using (5), and update (9) can be applied. Lemma 3 assures us that the approximate Hessian will converge to the actual Hessian of f. To maximize the number of times the update is applied, the poll directions d \in D_k are examined in a particular order. Let d_1, d_2, ..., d_n denote the n columns of V_k. The poll directions from D_k are evaluated in the following order:

    d_1, -d_1, d_2, -d_2, ...    (38)

The proposed ordering makes it possible to update the Hessian after every even poll direction using the function values f(x_k - d_i), f(x_k), and f(x_k + d_i). Note that whenever one of the three collinear points lies outside \Omega the function f is still evaluated, as long as the point satisfies (2). This was our initial assumption (i.e. the function and the constraints cannot be evaluated separately). When the bounds given by (2) are not satisfied, the function and the constraints cannot be evaluated.

Suppose x_k + d fails to satisfy (2). Then d is replaced by \tilde{d} (see the last two paragraphs of Sect. 4) such that x_k + \tilde{d} satisfies (2). If x_k - \tilde{d} also satisfies (2), the function and the constraints can be evaluated at both points and the Hessian update can be applied. It is more common that x_k - \tilde{d} fails to satisfy (2). In this case it is replaced with x_k + 2\tilde{d} \in M_k. As the step size becomes sufficiently small, x_k + 2\tilde{d} always satisfies (2), and again we have three collinear points for the Hessian update (x_k, x_k + \tilde{d}, and x_k + 2\tilde{d}).
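The componentwise mirroring that produces \tilde{d} can be sketched in a few lines of NumPy; the function name and the explicit bound vectors are our notation, not the paper's.

    import numpy as np

    def mirror_into_bounds(x_k, d, x_L, x_H):
        # Keep the components of d for which x_k + d respects the bounds (2)
        # and flip the remaining components, following the construction in Sect. 4.
        trial = x_k + d
        keep = (trial >= x_L) & (trial <= x_H)
        return np.where(keep, d, -d)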

One could argue that replacing d with \tilde{d} changes the distribution of the normalized random direction q = \tilde{d}/\|\tilde{d}\| along which the Hessian is updated to such an extent that it is no longer uniform on the unit sphere and Lemma 1 no longer holds. The following theorem shows that the convergence remains linear.

Theorem 3  Suppose d is a random vector such that d/\|d\| is uniformly distributed on the unit sphere and the Hessian update

    B_+ = B + \frac{f_{2,d} - d^T B d}{\|d\|^4} d d^T    (39)

converges linearly to the true Hessian of a quadratic function. If d is replaced with \tilde{d}, the convergence of the Hessian update remains linear.

Proof  According to Lemma 2 the Hessian approximation converges linearly as long as the expected value of (\tilde{d}^T (H - B) \tilde{d})^2/\|\tilde{d}\|^4 = (q^T (H - B) q)^2 satisfies

    E[(q^T (H - B) q)^2] \ge \alpha \|H - B\|_F^2    (40)

for some \alpha > 0. Without loss of generality we can assume H = 0 and \|B\|_F = 1. Now we need to verify

    E[(q^T B q)^2] \ge \alpha > 0  for all B with \|B\|_F = 1.    (41)

Because of the way direction d is replaced with \tilde{d}, the set of normalized directions corresponds to the part S of the unit sphere satisfying inequalities of the form e_i^T q \ge 0 (active lower bound on the i-th variable) or e_i^T q \le 0 (active upper bound on the i-th variable). Within S the normalized random direction is distributed uniformly. This is because changing the sign of one component of d maps one half of the hypersphere to the other. To compute E[(q^T B q)^2] one has to integrate (q^T B q)^2 \ge 0 over S and divide the result by the surface area of S. Suppose the integral over S is not bounded away from zero. Then it must vanish for some \|B\|_F = 1. This is only possible if q^T B q is zero almost everywhere on S. From q^T B q being a quadratic function we conclude B = 0. The obtained contradiction confirms (41).

In [8] it was pointed out that MADS performance can be improved if the poll directions are ordered by ascending angle between d \in D_k and the last direction of success p_{k_0} (angle-based ordering). Let \hat{d}_1, ..., \hat{d}_{2n} denote the ordered poll directions. Because of the way D_k is constructed, the first n vectors \hat{d}_1, ..., \hat{d}_n linearly span \mathbb{R}^n and every member of D_k appears exactly once in the sequence \{\hat{d}_1, -\hat{d}_1, \hat{d}_2, -\hat{d}_2, ...\}. QPMADS uses this sequence instead of (38).

When m_f and m_c are available the poll directions can be ordered according to the predicted constraint violation obtained from m_c and the predicted function decrease obtained from m_f (model-based ordering). This type of poll direction ordering is preferred over the simpler angle-based ordering. The constraint violation corresponding to a step s is given by

    \phi(s) = \sum_{i \in I} (e_i^T (A s + c_k))^2,    (42)

where I denotes the set of constraint indices corresponding to violated constraints,

    I = \{i \in \{1, 2, ..., m\} : e_i^T (A s + c_k) \ge 0\}.    (43)

The primary criterion for the model-based poll direction ordering is the value of \phi(s); i.e., steps corresponding to a lower value of \phi are ordered before steps corresponding to a higher value of \phi. The secondary criterion is based on m_f and is applied only when two poll directions result in the same value of \phi. Poll directions with a lower value of m_f are ordered before poll directions with a higher value of m_f.
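The ordering keys can be computed directly from the models (21) and (22); a minimal sketch, assuming A, c_k, B, g, and f_k are already available:

    import numpy as np

    def constraint_violation(s, A, c_k):
        # phi(s) from (42)-(43): sum of squared violated linear-model constraints.
        r = A @ s + c_k
        return float(np.sum(np.maximum(r, 0.0) ** 2))

    def model_value(s, B, g, f_k):
        # Quadratic model (21), used as the secondary criterion.
        return 0.5 * s @ B @ s + g @ s + f_k

    def ordering_key(s, A, c_k, B, g, f_k):
        # Primary key: predicted violation; secondary key: predicted objective.
        return (constraint_violation(s, A, c_k), model_value(s, B, g, f_k))

Sorting candidate steps with this key (e.g. sorted(steps, key=...)) reproduces the two-level comparison described above.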

Model-based ordering is performed in two stages. In the first stage, pairs of poll directions \{d, -d\} are compared based on \phi and m_f. The resulting n steps, denoted by \check{d}_1, ..., \check{d}_n, form a basis for \mathbb{R}^n. In the second stage these steps are ordered again according to \phi and m_f to obtain the n ordered steps \hat{d}_1, ..., \hat{d}_n. The final ordering of the poll directions is then \hat{d}_1, -\hat{d}_1, \hat{d}_2, -\hat{d}_2, ... Every member of D_k appears exactly once in this sequence.

Finally, after every speculative search where x_k + 2 p_k satisfies the bounds given by (2), the function value is available at three collinear points (x_k, x_k + p_k, and x_k + 2 p_k) and the Hessian approximation can be updated using (9).

6 The quadratic programming search

At the beginning of every QP search a quadratic model of f and a linear model of c are constructed. The models are given by Eqs. (21) and (22). The Hessian approximation B is gradually built using the update formula (9). It corresponds to the quadratic terms in model (21). The linear terms represented by g can be obtained from the computed values of f in the neighbourhood of x_k using linear regression.

QPMADS keeps a list of tuples of the form (x, f(x), c(x)). Every time f_\Omega is evaluated at some point x not in the list, a new tuple (x, f(x), c(x)) is added to the list. The length of the list is limited to 2n+1. If it is exceeded, tuples are purged from the list according to their age (oldest first) until the number of list members drops to 2n+1. No attempts are made to keep the set of stored points well poised [14].

Linear regression is applied to obtain g and A from the list of stored points whenever it has at least n+1 members. The value of g is obtained by minimizing the summed square distance between m_f(x) and f(x) for the points in the list. This problem can be formulated as a linear least squares problem,

    \arg\min_{g \in \mathbb{R}^n} \|D_x g - d_f\|^2,    (44)

where the rows of D_x are the vectors (x - x_k)^T and the components of the vector d_f are the corresponding values of f(x) - f_k - (1/2)(x - x_k)^T B (x - x_k), so that the components of D_x g - d_f are the model errors m_f(x - x_k) - f(x). Only points x \ne x_k satisfying \|x - x_k\| \le \rho \Delta_k^p are used in the regression. Based on numerical experiments we have chosen \rho = 4. Problem (44) can be solved via the singular value decomposition of D_x. Similarly to [15] we replace the singular values that are smaller than 2^{-52} with 2^{-52}.

Similarly, A can be obtained by first formulating a linear least squares problem,

    \arg\min_{A \in \mathbb{R}^{m \times n}} \|D_x A^T - D_c\|_F^2,    (45)

where the rows of D_c are the corresponding values of (c(x) - c_k)^T, so that the rows of D_x A^T - D_c are the constraint model errors. The singular value decomposition of D_x needs to be computed only once for both regressions.
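A sketch of the two regressions (44) and (45) sharing one SVD of D_x. Here X, F, and C are hypothetical arrays holding the stored points (as rows), their function values, and their constraint values, and the sign conventions for d_f and D_c follow the reconstruction given above rather than the original implementation.

    import numpy as np

    def regress_models(X, F, C, x_k, f_k, c_k, B, eps=2.0 ** -52):
        D_x = X - x_k                                   # rows are (x - x_k)^T
        quad = 0.5 * np.einsum('ij,jk,ik->i', D_x, B, D_x)
        d_f = F - f_k - quad                            # right-hand side of (44)
        D_c = C - c_k                                   # right-hand side of (45)
        U, s, Vt = np.linalg.svd(D_x, full_matrices=False)
        s = np.maximum(s, eps)                          # clamp tiny singular values
        pinv = Vt.T @ np.diag(1.0 / s) @ U.T            # regularized pseudoinverse of D_x
        g = pinv @ d_f                                  # solves (44)
        A = (pinv @ D_c).T                              # solves (45); A is m x n
        return g, A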

If both linear regressions succeed, then B, g, A, and c_k define the QP problem

    \arg\min_{s \in \mathbb{R}^n} (1/2) s^T B s + g^T s,    (46)
    subject to  A s + c_k \le 0.    (47)

If B is indefinite, the corresponding QP problem is NP-hard. QP problems with a positive semi-definite B are convex and can be solved in polynomial time. There exist several software packages for solving such problems. Our implementation of QPMADS uses the CVXOPT package [4].

When B is indefinite the QP problem given by (46) and (47) is modified to obtain a convex QP problem. This modified QP problem approximates the original one in the neighbourhood of s = 0 along some descent direction. For this purpose a Hessian modification ([21], Algorithm 3.3) is applied by replacing B with B + \beta I, where \beta > 0 is sufficiently large so that B + \beta I is positive definite. The value of \beta is found by attempting to apply the Cholesky algorithm to B + \beta I for increasing values of \beta until the decomposition succeeds. When \beta > 0, the constraint

    \|s\| \le \rho_{neg} \Delta_k^p    (48)

is added to the constraints (47) to prevent s from becoming too large and resulting in points where the modified QP problem differs significantly from the original one. QPMADS uses \rho_{neg} = 1 < \rho. Note that problem (46)-(47) is just an approximation. Its purpose is to produce a QP search step that additionally decreases f_\Omega. The (possibly modified) QP problem is solved by a dedicated solver. The obtained step s is rounded to the grid G_k and f_\Omega is evaluated at x_k + R_k(s).

Note that (48) is actually a trust region similar to the one used in [15]. We also considered applying a trust region for the case when B is positive definite, but obtained no significant improvement of the algorithm's performance. In the unconstrained case one could also use a standard trust region solver for the quadratic subproblem and thereby avoid the need to modify B (trust region solvers can handle indefinite Hessians). But even with the proposed crude Hessian modification scheme we still managed to outperform the state-of-the-art MADS solver [18]. Therefore, we did not probe further in the direction of trust region solvers for the unconstrained subproblem. Applying the trust region approach to constrained problems is significantly more complicated, which is why we decided in favour of QP.

If there are insufficient points in the list for linear regression (i.e. fewer than n+1), the linear regression fails, or the QP solver fails to compute s, then the QP search is considered as failed.
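The \beta-search used to convexify the subproblem can be written directly in terms of repeated Cholesky attempts. This is only a sketch in the spirit of [21], Algorithm 3.3; the initial value and the growth factor are our choices, not those of the QPMADS implementation.

    import numpy as np

    def convexify(B, beta0=1e-3, growth=10.0):
        # Return (B + beta*I, beta) for the smallest tried beta >= 0 such that
        # the Cholesky factorization succeeds (B itself is tried first).
        n = B.shape[0]
        beta = 0.0
        while True:
            try:
                np.linalg.cholesky(B + beta * np.eye(n))
                return B + beta * np.eye(n), beta
            except np.linalg.LinAlgError:
                beta = beta0 if beta == 0.0 else beta * growth

When the returned beta is positive, the step-length constraint (48) is added before the subproblem is handed to the QP solver.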

Proof  The update rules (29) and (30) guarantee the existence of a subsequence of iteration indices K' \subseteq K such that t_k covers the whole set \{0, 1, 2, ...\} and \Delta_k^m \to 0 for k \in K' and k \to \infty. One half of the members of D_k are the columns of U_{t_k} scaled by \Delta_k^p and rounded to G_k. The remaining members are their corresponding negatives. The columns of U_{t_k} are uniformly distributed on the unit sphere and are dense with probability 1. All statements in the remainder of the proof hold with probability 1.

For any x satisfying \|x\| = 1 and any \epsilon > 0 there exists an infinite subsequence of indices K'' \subseteq K' such that for every k \in K'' there exists y_k \in \Delta_k^p U_{t_k} for which

    \|x - y_k/\|y_k\|\| < \epsilon/2.    (49)

Rounding y_k to G_k introduces an error \delta with an upper bound on its norm \|\delta\| \le \delta_0 = n^{1/2} \Delta_k^m/2. Let y'_k denote R_k(y_k) \in D_k. It follows that

    \|y_k/\|y_k\| - y'_k/\|y'_k\|\| = \|y_k/\|y_k\| - (y_k + \delta)/\|y_k + \delta\|\| \le \delta_0/\min(\|y_k\|, \|y_k + \delta\|).

Due to the existence of at least one refining subsequence, \Delta_k^m can be made arbitrarily small for all k > k' by choosing a sufficiently large k'. Therefore for all k \in K'' and k > k' we have

    \|y_k/\|y_k\| - y'_k/\|y'_k\|\| \le \epsilon/2.    (50)

Merging (49) and (50) results in

    \|x - y'_k/\|y'_k\|\| \le \epsilon/2 + \epsilon/2 = \epsilon.    (51)

Because y'_k/\|y'_k\| \in \bar{D}_k the proof is complete.

Now we can state the final result.

Theorem 4  QPMADS is a valid MADS instance.

Proof  We begin by showing that Algorithm 2 adheres to the MADS framework given by Algorithm 1. Then we show that Requirements A-F are satisfied.

The QP search step in the k-th iteration is a member of G_k. QPMADS follows the MADS algorithmic framework given by Algorithm 1 with one exception: the speculative search is performed after a successful poll. The convergence proof for the MADS framework applies to refining subsequences. These subsequences correspond to iterations for which polling fails to decrease the value of f_\Omega. A failed poll is not followed by a speculative search. Thus the placement of the speculative search step after the poll step does not invalidate the convergence theory inherited from the MADS framework. Finally, by choosing D = [I  -I] we see that QPMADS corresponds to the MADS framework given by Algorithm 1.

Requirements A and D are satisfied by construction (see (31), (32), and choose \tau = 4). The update rule for l_k given by (29), which is the basis for computing \Delta_k^m, ensures that Requirement B is satisfied. One half of the members of D_k are columns of the orthogonal matrix U_{t_k} scaled by \Delta_k^p and rounded to the nearest members of G_k, while the other half

are their negatives. Rounding introduces an error bounded in norm by \delta_0 = n^{1/2} \Delta_k^m/2 (see (23)). Thus we have

    \|d\| \le \Delta_k^p + \delta_0 \le \Delta_k^p \left(1 + \frac{n^{1/2}}{2} \frac{\Delta_k^m}{\Delta_k^p}\right) = \Delta_k^p \left(1 + \frac{n^{1/2}}{2} \cdot \frac{2^{-|l_k|}}{\gamma}\right).    (52)

Requirement C is satisfied by noting that the last parenthetical expression in (52) is bounded above by C = 1 + n^{1/2}/(2\gamma).

The lower bound on the cosine measure of D_k is given by (33). Because of the way the set of unrounded poll directions is constructed, the lower bound on the cosine measure of D_k is

    cm(D_k) \ge \frac{n^{-1/2} - \delta_0/\Delta_k^p}{1 + \delta_0/\Delta_k^p} = \frac{n^{-1/2} - n^{1/2}\Delta_k^m/(2\Delta_k^p)}{1 + n^{1/2}\Delta_k^m/(2\Delta_k^p)}.    (53)

By taking into account \Delta_k^m/\Delta_k^p \le \gamma^{-1} and \gamma = n/2 + 1 we have

    cm(D_k) \ge \frac{\gamma n^{-1/2} - n^{1/2}/2}{\gamma + n^{1/2}/2} = \frac{2 n^{-1/2}}{n + n^{1/2} + 2} > 0.    (54)

Since the cosine measure of D_k is uniformly bounded away from zero, all limit points of \bar{D}_k must be positive spanning sets (Requirement E). Finally, Lemma 4 ensures that Requirement F is satisfied with probability 1.

Theorem 4 guarantees all of the convergence properties given in [6] with probability 1.

8 Results and discussion

QPMADS was implemented in Python [26]. For numerical computations the NumPy and SciPy libraries were used [22]. The performance of QPMADS was compared to the performance of the NOMAD black-box optimization software [12,18]. All algorithmic parameters were kept at their respective default values, except for the following ones:

1. the set of poll directions comprised an orthogonal linear basis for \mathbb{R}^n and its negative,
2. the algorithm was stopped when \Delta_k^p fell below a prescribed threshold.

With this choice of algorithmic parameters NOMAD uses the OrthoMADS instance of MADS [2]. Its main drawback is the non-uniform distribution of poll directions [25]. The poll directions in OrthoMADS tend to concentrate around coordinate directions as the problem dimension increases.

All algorithms were stopped when the poll size parameter \Delta_k^p dropped below this threshold. The number of function evaluations was limited to 1000(n+1). For every problem and every tested algorithm two runs were performed. The first run didn't use quadratic models, implying the QP search was omitted and a

simple poll direction ordering was used. In the second run, the QP search was included and model-based poll direction ordering was used whenever this was possible (i.e. a model was successfully computed). The four MADS instances were tested on four sets of test problems. The first three [25] comprised 60 smooth problems (problem dimensions between 3 and 40), 62 nonsmooth problems (problem dimensions between 4 and 40), and 28 constrained problems (problem dimensions between 4 and 40). The fourth set (also referred to as the MADSMODEL set) comprised a mix of 48 test problems (smooth, nonsmooth, and constrained) from [12] with problem dimensions between 2 and 50.

The performance of the tested algorithms was expressed in terms of data profiles [20]. A data profile visualizes the share of the problems solved by an algorithm with respect to the computational budget expressed in terms of simplex gradient evaluations (one simplex gradient evaluation corresponds to n+1 function evaluations). A problem is considered as solved when a feasible point is found for which the corresponding value of the function subject to optimization satisfies

    f < f_L + \epsilon (f_0 - f_L),    (55)

where f_0 denotes the function value at the initial point, and f_L the lowest function value found by all considered algorithms given the maximal allowed budget (in our case 1000(n+1) function evaluations). The accuracy given by \epsilon was set to 10^{-3}. A data profile is a monotonically increasing function of the computational budget. It is bounded from above by 1. The data profiles are depicted in Fig. 1.

Without quadratic models the use of uniformly distributed search directions improves the performance of MADS with respect to OrthoMADS for smooth and nonsmooth problems. The same can be said for MADSMODEL problems and computational budgets up to 500(n+1). Smooth unconstrained problems admit an open halfspace of descent directions at every point in \mathbb{R}^n. Nonsmooth and constrained problems have points where the set of descent directions can be significantly smaller. For such functions choosing a uniformly distributed random direction is expected to perform better than choosing a non-uniformly distributed random direction (like in OrthoMADS). This is consistent with the observed improvement of QPMADS compared to NOMAD on nonsmooth and MADSMODEL test problems. On the other hand, we found no simple explanation for the worse performance of QPMADS observed on the set of constrained problems.

When quadratic models are used QPMADS performs significantly better than NOMAD on constrained problems. In our numerical experiments both algorithms used the extreme barrier approach for handling constraints. Due to this one might think the performance of NOMAD would improve if the progressive barrier approach had been used [7]. To clarify this issue we compared the performance of QPMADS, NOMAD with extreme barrier, and NOMAD with progressive barrier on the set of constrained problems and on the MADSMODEL set of problems. The former comprises nonlinearly constrained problems only, while the latter contains 10 nonlinearly constrained problems. The data profiles are depicted in Fig. 2. The profiles show that the use of the progressive barrier improves the performance of NOMAD on constrained problems to some extent, but not enough to compete with QPMADS.
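The convergence test (55) behind the data profiles is straightforward to implement. In the sketch below, history is a hypothetical list of the best feasible objective values indexed by the number of function evaluations (None until a feasible point is found).

    def evaluations_to_solve(history, f0, fL, eps=1e-3):
        # Smallest number of function evaluations after which the best feasible
        # value satisfies f < fL + eps*(f0 - fL), as in (55); None if never.
        target = fL + eps * (f0 - fL)
        for n_eval, best_f in enumerate(history, start=1):
            if best_f is not None and best_f < target:
                return n_eval
        return None

A data profile is then the fraction of problems for which evaluations_to_solve does not exceed kappa*(n+1), plotted as a function of the budget kappa expressed in simplex gradient evaluations.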

Fig. 1  Data profiles for QPMADS and NOMAD for 60 smooth (top left), 62 nonsmooth (top right), and 28 constrained (bottom left) problems from [25]. The bottom right data profile was obtained with the set of 48 MADSMODEL problems from [12].

Fig. 2  Data profiles for QPMADS, NOMAD with extreme barrier, and NOMAD with progressive barrier obtained on the set of constrained problems (left) and on the MADSMODEL set (right).

As expected, the use of quadratic models significantly improves the performance of NOMAD (see [12]). NOMAD builds the quadratic models from a set of nearby points evaluated in the past. Building a full quadratic model requires (n+1)(n+2)/2 points. When there are insufficient points available the quadratic model is obtained by finding the matrix B with the smallest Frobenius norm (smallest curvature model). Unlike QPMADS, NOMAD builds quadratic models for all constraints.

Table 1  Percentage of problems solved with a budget of 1000(n+1) function evaluations

                      Smooth    Nonsmooth    Constrained    MADSMODEL
NOMAD, no models
QPMADS, no models
NOMAD
QPMADS

NOMAD's approach to building quadratic models has a disadvantage due to its computational complexity. It requires the solution of a linear system with up to (n+1)(n+2)/2 equations. Thus the computational complexity of the underlying linear subproblems can grow proportionally to n^6. On the other hand, QPMADS approximates the Hessian by updating B. The update itself can be expressed in terms of vector dot products and matrix-vector products with computational complexity proportional to n^2. The computation of the vector g and the matrix A involves linear regression with up to 2n+1 points computed via singular value decomposition (computational complexity proportional to n^3).

The obtained QCQP subproblem in NOMAD is solved by means of a simple MADS algorithm. In constrained problems there are often points where the set of descent directions is much smaller than an open halfspace. Guessing a descent direction with a given step length (which is actually what MADS does in the poll step) can be hard in the neighbourhood of such points. Consequently the quality of the subproblem solution can be low. On the other hand, QPMADS uses a dedicated QP solver that reliably solves the QP subproblem.

When quadratic models were used QPMADS outperformed NOMAD on all four sets of test functions. For small computational budgets (less than 200 simplex gradient evaluations) NOMAD performed slightly better on smooth, nonsmooth, and MADSMODEL problems. We attribute this to the fact that NOMAD uses much more information for building the quadratic models than QPMADS, where the quadratic part of the model is built by means of second directional derivative-based updates.

The percentage of the problems solved with a large computational budget is given in Table 1. QPMADS solved 97% of the smooth problems. On the remaining three sets of test problems QPMADS solved between 74 and 89% of all problems. On constrained problems the success rate of QPMADS was more than two times greater than the success rate of NOMAD.

Table 2 lists the number of function evaluations required for solving selected problems from [25] with accuracy \epsilon = 10^{-3} and \epsilon = 10^{-7} when f_L = f* is used as the lowest function value in (55). The first four problems are smooth, the next three are nonsmooth, and the last two are constrained. The computational budget was set to 1000(n+1) function evaluations. The table illustrates the effect of the required solution accuracy on the number of function evaluations that are needed for finding the solution.

Table 2  Number of function evaluations required for reaching the prescribed precision (\epsilon) for selected problems

Function/constraint      n    f*    With models              Without models
                                    QPMADS      NOMAD        QPMADS      NOMAD
Broyden trid.
Penalty I
Trigonometric
Discrete int. eq.
ElAttar
Active faces
Gen. MAXQ
CB3 I / MAD1 I
LQ / MAD1 I

The first and the second value in every pair correspond to \epsilon = 10^{-3} and \epsilon = 10^{-7}, respectively. A dash indicates a failure to solve a problem with the corresponding precision.


More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α

More information

An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization

An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with Travis Johnson, Northwestern University Daniel P. Robinson, Johns

More information

Self-Concordant Barrier Functions for Convex Optimization

Self-Concordant Barrier Functions for Convex Optimization Appendix F Self-Concordant Barrier Functions for Convex Optimization F.1 Introduction In this Appendix we present a framework for developing polynomial-time algorithms for the solution of convex optimization

More information

Penalty and Barrier Methods General classical constrained minimization problem minimize f(x) subject to g(x) 0 h(x) =0 Penalty methods are motivated by the desire to use unconstrained optimization techniques

More information

Unconstrained optimization

Unconstrained optimization Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout

More information

Optimization Methods

Optimization Methods Optimization Methods Decision making Examples: determining which ingredients and in what quantities to add to a mixture being made so that it will meet specifications on its composition allocating available

More information

Numerical optimization

Numerical optimization Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal

More information

MESH ADAPTIVE DIRECT SEARCH ALGORITHMS FOR CONSTRAINED OPTIMIZATION

MESH ADAPTIVE DIRECT SEARCH ALGORITHMS FOR CONSTRAINED OPTIMIZATION SIAM J. OPTIM. Vol. 17, No. 1, pp. 188 217 c 2006 Society for Industrial and Applied Mathematics MESH ADAPTIVE DIRECT SEARCH ALGORITHMS FOR CONSTRAINED OPTIMIZATION CHARLES AUDET AND J. E. DENNIS, JR.

More information

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09 Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods

More information

Chapter 5 HIGH ACCURACY CUBIC SPLINE APPROXIMATION FOR TWO DIMENSIONAL QUASI-LINEAR ELLIPTIC BOUNDARY VALUE PROBLEMS

Chapter 5 HIGH ACCURACY CUBIC SPLINE APPROXIMATION FOR TWO DIMENSIONAL QUASI-LINEAR ELLIPTIC BOUNDARY VALUE PROBLEMS Chapter 5 HIGH ACCURACY CUBIC SPLINE APPROXIMATION FOR TWO DIMENSIONAL QUASI-LINEAR ELLIPTIC BOUNDARY VALUE PROBLEMS 5.1 Introduction When a physical system depends on more than one variable a general

More information

Review of Classical Optimization

Review of Classical Optimization Part II Review of Classical Optimization Multidisciplinary Design Optimization of Aircrafts 51 2 Deterministic Methods 2.1 One-Dimensional Unconstrained Minimization 2.1.1 Motivation Most practical optimization

More information

A Proximal Method for Identifying Active Manifolds

A Proximal Method for Identifying Active Manifolds A Proximal Method for Identifying Active Manifolds W.L. Hare April 18, 2006 Abstract The minimization of an objective function over a constraint set can often be simplified if the active manifold of the

More information

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems

Numerical optimization. Numerical optimization. Longest Shortest where Maximal Minimal. Fastest. Largest. Optimization problems 1 Numerical optimization Alexander & Michael Bronstein, 2006-2009 Michael Bronstein, 2010 tosca.cs.technion.ac.il/book Numerical optimization 048921 Advanced topics in vision Processing and Analysis of

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

USING SIMPLEX GRADIENTS OF NONSMOOTH FUNCTIONS IN DIRECT SEARCH METHODS

USING SIMPLEX GRADIENTS OF NONSMOOTH FUNCTIONS IN DIRECT SEARCH METHODS USING SIMPLEX GRADIENTS OF NONSMOOTH FUNCTIONS IN DIRECT SEARCH METHODS A. L. CUSTÓDIO, J. E. DENNIS JR., AND L. N. VICENTE Abstract. It has been shown recently that the efficiency of direct search methods

More information

Gradient-Based Optimization

Gradient-Based Optimization Multidisciplinary Design Optimization 48 Chapter 3 Gradient-Based Optimization 3. Introduction In Chapter we described methods to minimize (or at least decrease) a function of one variable. While problems

More information

Lecture 6. Numerical methods. Approximation of functions

Lecture 6. Numerical methods. Approximation of functions Lecture 6 Numerical methods Approximation of functions Lecture 6 OUTLINE 1. Approximation and interpolation 2. Least-square method basis functions design matrix residual weighted least squares normal equation

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

Lecture 13: Constrained optimization

Lecture 13: Constrained optimization 2010-12-03 Basic ideas A nonlinearly constrained problem must somehow be converted relaxed into a problem which we can solve (a linear/quadratic or unconstrained problem) We solve a sequence of such problems

More information

Notes on Cellwise Data Interpolation for Visualization Xavier Tricoche

Notes on Cellwise Data Interpolation for Visualization Xavier Tricoche Notes on Cellwise Data Interpolation for Visualization Xavier Tricoche urdue University While the data (computed or measured) used in visualization is only available in discrete form, it typically corresponds

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

Optimization. Charles J. Geyer School of Statistics University of Minnesota. Stat 8054 Lecture Notes

Optimization. Charles J. Geyer School of Statistics University of Minnesota. Stat 8054 Lecture Notes Optimization Charles J. Geyer School of Statistics University of Minnesota Stat 8054 Lecture Notes 1 One-Dimensional Optimization Look at a graph. Grid search. 2 One-Dimensional Zero Finding Zero finding

More information

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Linear Algebra A Brief Reminder Purpose. The purpose of this document

More information

Chapter 3 Transformations

Chapter 3 Transformations Chapter 3 Transformations An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Linear Transformations A function is called a linear transformation if 1. for every and 2. for every If we fix the bases

More information

Outline Introduction: Problem Description Diculties Algebraic Structure: Algebraic Varieties Rank Decient Toeplitz Matrices Constructing Lower Rank St

Outline Introduction: Problem Description Diculties Algebraic Structure: Algebraic Varieties Rank Decient Toeplitz Matrices Constructing Lower Rank St Structured Lower Rank Approximation by Moody T. Chu (NCSU) joint with Robert E. Funderlic (NCSU) and Robert J. Plemmons (Wake Forest) March 5, 1998 Outline Introduction: Problem Description Diculties Algebraic

More information

Trust Regions. Charles J. Geyer. March 27, 2013

Trust Regions. Charles J. Geyer. March 27, 2013 Trust Regions Charles J. Geyer March 27, 2013 1 Trust Region Theory We follow Nocedal and Wright (1999, Chapter 4), using their notation. Fletcher (1987, Section 5.1) discusses the same algorithm, but

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

On Lagrange multipliers of trust-region subproblems

On Lagrange multipliers of trust-region subproblems On Lagrange multipliers of trust-region subproblems Ladislav Lukšan, Ctirad Matonoha, Jan Vlček Institute of Computer Science AS CR, Prague Programy a algoritmy numerické matematiky 14 1.- 6. června 2008

More information

Quadratic Programming

Quadratic Programming Quadratic Programming Outline Linearly constrained minimization Linear equality constraints Linear inequality constraints Quadratic objective function 2 SideBar: Matrix Spaces Four fundamental subspaces

More information

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained

More information

Algorithms for Nonsmooth Optimization

Algorithms for Nonsmooth Optimization Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

1 Column Generation and the Cutting Stock Problem

1 Column Generation and the Cutting Stock Problem 1 Column Generation and the Cutting Stock Problem In the linear programming approach to the traveling salesman problem we used the cutting plane approach. The cutting plane approach is appropriate when

More information

Nonlinear Manifold Learning Summary

Nonlinear Manifold Learning Summary Nonlinear Manifold Learning 6.454 Summary Alexander Ihler ihler@mit.edu October 6, 2003 Abstract Manifold learning is the process of estimating a low-dimensional structure which underlies a collection

More information

Basic Math for

Basic Math for Basic Math for 16-720 August 23, 2002 1 Linear Algebra 1.1 Vectors and Matrices First, a reminder of a few basic notations, definitions, and terminology: Unless indicated otherwise, vectors are always

More information

An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints

An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints Klaus Schittkowski Department of Computer Science, University of Bayreuth 95440 Bayreuth, Germany e-mail:

More information

Multidisciplinary System Design Optimization (MSDO)

Multidisciplinary System Design Optimization (MSDO) Multidisciplinary System Design Optimization (MSDO) Numerical Optimization II Lecture 8 Karen Willcox 1 Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Today s Topics Sequential

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno February 6, / 25 (BFG. Limited memory BFGS (L-BFGS)

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno February 6, / 25 (BFG. Limited memory BFGS (L-BFGS) Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno (BFGS) Limited memory BFGS (L-BFGS) February 6, 2014 Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb

More information

47-831: Advanced Integer Programming Lecturer: Amitabh Basu Lecture 2 Date: 03/18/2010

47-831: Advanced Integer Programming Lecturer: Amitabh Basu Lecture 2 Date: 03/18/2010 47-831: Advanced Integer Programming Lecturer: Amitabh Basu Lecture Date: 03/18/010 We saw in the previous lecture that a lattice Λ can have many bases. In fact, if Λ is a lattice of a subspace L with

More information

EE731 Lecture Notes: Matrix Computations for Signal Processing

EE731 Lecture Notes: Matrix Computations for Signal Processing EE731 Lecture Notes: Matrix Computations for Signal Processing James P. Reilly c Department of Electrical and Computer Engineering McMaster University September 22, 2005 0 Preface This collection of ten

More information

Arc Search Algorithms

Arc Search Algorithms Arc Search Algorithms Nick Henderson and Walter Murray Stanford University Institute for Computational and Mathematical Engineering November 10, 2011 Unconstrained Optimization minimize x D F (x) where

More information

Key words. conjugate gradients, normwise backward error, incremental norm estimation.

Key words. conjugate gradients, normwise backward error, incremental norm estimation. Proceedings of ALGORITMY 2016 pp. 323 332 ON ERROR ESTIMATION IN THE CONJUGATE GRADIENT METHOD: NORMWISE BACKWARD ERROR PETR TICHÝ Abstract. Using an idea of Duff and Vömel [BIT, 42 (2002), pp. 300 322

More information

DUAL REGULARIZED TOTAL LEAST SQUARES SOLUTION FROM TWO-PARAMETER TRUST-REGION ALGORITHM. Geunseop Lee

DUAL REGULARIZED TOTAL LEAST SQUARES SOLUTION FROM TWO-PARAMETER TRUST-REGION ALGORITHM. Geunseop Lee J. Korean Math. Soc. 0 (0), No. 0, pp. 1 0 https://doi.org/10.4134/jkms.j160152 pissn: 0304-9914 / eissn: 2234-3008 DUAL REGULARIZED TOTAL LEAST SQUARES SOLUTION FROM TWO-PARAMETER TRUST-REGION ALGORITHM

More information

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space. Chapter 1 Preliminaries The purpose of this chapter is to provide some basic background information. Linear Space Hilbert Space Basic Principles 1 2 Preliminaries Linear Space The notion of linear space

More information

SPARSE signal representations have gained popularity in recent

SPARSE signal representations have gained popularity in recent 6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying

More information

Improving the Convergence of Back-Propogation Learning with Second Order Methods

Improving the Convergence of Back-Propogation Learning with Second Order Methods the of Back-Propogation Learning with Second Order Methods Sue Becker and Yann le Cun, Sept 1988 Kasey Bray, October 2017 Table of Contents 1 with Back-Propagation 2 the of BP 3 A Computationally Feasible

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 5 Nonlinear Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

More information

8. Diagonalization.

8. Diagonalization. 8. Diagonalization 8.1. Matrix Representations of Linear Transformations Matrix of A Linear Operator with Respect to A Basis We know that every linear transformation T: R n R m has an associated standard

More information

6. Iterative Methods for Linear Systems. The stepwise approach to the solution...

6. Iterative Methods for Linear Systems. The stepwise approach to the solution... 6 Iterative Methods for Linear Systems The stepwise approach to the solution Miriam Mehl: 6 Iterative Methods for Linear Systems The stepwise approach to the solution, January 18, 2013 1 61 Large Sparse

More information

Directional Field. Xiao-Ming Fu

Directional Field. Xiao-Ming Fu Directional Field Xiao-Ming Fu Outlines Introduction Discretization Representation Objectives and Constraints Outlines Introduction Discretization Representation Objectives and Constraints Definition Spatially-varying

More information

Interpolation-Based Trust-Region Methods for DFO

Interpolation-Based Trust-Region Methods for DFO Interpolation-Based Trust-Region Methods for DFO Luis Nunes Vicente University of Coimbra (joint work with A. Bandeira, A. R. Conn, S. Gratton, and K. Scheinberg) July 27, 2010 ICCOPT, Santiago http//www.mat.uc.pt/~lnv

More information

USING SIMPLEX GRADIENTS OF NONSMOOTH FUNCTIONS IN DIRECT SEARCH METHODS

USING SIMPLEX GRADIENTS OF NONSMOOTH FUNCTIONS IN DIRECT SEARCH METHODS Pré-Publicações do Departamento de Matemática Universidade de Coimbra Preprint Number 06 48 USING SIMPLEX GRADIENTS OF NONSMOOTH FUNCTIONS IN DIRECT SEARCH METHODS A. L. CUSTÓDIO, J. E. DENNIS JR. AND

More information

arxiv: v1 [math.na] 5 May 2011

arxiv: v1 [math.na] 5 May 2011 ITERATIVE METHODS FOR COMPUTING EIGENVALUES AND EIGENVECTORS MAYSUM PANJU arxiv:1105.1185v1 [math.na] 5 May 2011 Abstract. We examine some numerical iterative methods for computing the eigenvalues and

More information

Positive semidefinite matrix approximation with a trace constraint

Positive semidefinite matrix approximation with a trace constraint Positive semidefinite matrix approximation with a trace constraint Kouhei Harada August 8, 208 We propose an efficient algorithm to solve positive a semidefinite matrix approximation problem with a trace

More information

4 Newton Method. Unconstrained Convex Optimization 21. H(x)p = f(x). Newton direction. Why? Recall second-order staylor series expansion:

4 Newton Method. Unconstrained Convex Optimization 21. H(x)p = f(x). Newton direction. Why? Recall second-order staylor series expansion: Unconstrained Convex Optimization 21 4 Newton Method H(x)p = f(x). Newton direction. Why? Recall second-order staylor series expansion: f(x + p) f(x)+p T f(x)+ 1 2 pt H(x)p ˆf(p) In general, ˆf(p) won

More information

MATHEMATICS FOR COMPUTER VISION WEEK 8 OPTIMISATION PART 2. Dr Fabio Cuzzolin MSc in Computer Vision Oxford Brookes University Year

MATHEMATICS FOR COMPUTER VISION WEEK 8 OPTIMISATION PART 2. Dr Fabio Cuzzolin MSc in Computer Vision Oxford Brookes University Year MATHEMATICS FOR COMPUTER VISION WEEK 8 OPTIMISATION PART 2 1 Dr Fabio Cuzzolin MSc in Computer Vision Oxford Brookes University Year 2013-14 OUTLINE OF WEEK 8 topics: quadratic optimisation, least squares,

More information

Nonlinear Optimization: What s important?

Nonlinear Optimization: What s important? Nonlinear Optimization: What s important? Julian Hall 10th May 2012 Convexity: convex problems A local minimizer is a global minimizer A solution of f (x) = 0 (stationary point) is a minimizer A global

More information

1 Non-negative Matrix Factorization (NMF)

1 Non-negative Matrix Factorization (NMF) 2018-06-21 1 Non-negative Matrix Factorization NMF) In the last lecture, we considered low rank approximations to data matrices. We started with the optimal rank k approximation to A R m n via the SVD,

More information

A derivative-free nonmonotone line search and its application to the spectral residual method

A derivative-free nonmonotone line search and its application to the spectral residual method IMA Journal of Numerical Analysis (2009) 29, 814 825 doi:10.1093/imanum/drn019 Advance Access publication on November 14, 2008 A derivative-free nonmonotone line search and its application to the spectral

More information

A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES

A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES IJMMS 25:6 2001) 397 409 PII. S0161171201002290 http://ijmms.hindawi.com Hindawi Publishing Corp. A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES

More information

Lecture 7: CS395T Numerical Optimization for Graphics and AI Trust Region Methods

Lecture 7: CS395T Numerical Optimization for Graphics and AI Trust Region Methods Lecture 7: CS395T Numerical Optimization for Graphics and AI Trust Region Methods Qixing Huang The University of Texas at Austin huangqx@cs.utexas.edu 1 Disclaimer This note is adapted from Section 4 of

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

MS&E 318 (CME 338) Large-Scale Numerical Optimization

MS&E 318 (CME 338) Large-Scale Numerical Optimization Stanford University, Management Science & Engineering (and ICME) MS&E 318 (CME 338) Large-Scale Numerical Optimization 1 Origins Instructor: Michael Saunders Spring 2015 Notes 9: Augmented Lagrangian Methods

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

McMaster University. Advanced Optimization Laboratory. Title: A Proximal Method for Identifying Active Manifolds. Authors: Warren L.

McMaster University. Advanced Optimization Laboratory. Title: A Proximal Method for Identifying Active Manifolds. Authors: Warren L. McMaster University Advanced Optimization Laboratory Title: A Proximal Method for Identifying Active Manifolds Authors: Warren L. Hare AdvOl-Report No. 2006/07 April 2006, Hamilton, Ontario, Canada A Proximal

More information

Iterative methods for Linear System

Iterative methods for Linear System Iterative methods for Linear System JASS 2009 Student: Rishi Patil Advisor: Prof. Thomas Huckle Outline Basics: Matrices and their properties Eigenvalues, Condition Number Iterative Methods Direct and

More information

Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization

Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization Jialin Dong ShanghaiTech University 1 Outline Introduction FourVignettes: System Model and Problem Formulation Problem Analysis

More information

Convex Optimization. Problem set 2. Due Monday April 26th

Convex Optimization. Problem set 2. Due Monday April 26th Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining

More information