Technical Report WM-CS
College of William & Mary
Department of Computer Science

Active Set Identification without Derivatives

Robert Michael Lewis and Virginia Torczon

September 2008

Submitted to SIAM Journal on Optimization

ACTIVE SET IDENTIFICATION WITHOUT DERIVATIVES

ROBERT MICHAEL LEWIS AND VIRGINIA TORCZON

Abstract. We consider active set identification for linearly constrained optimization problems in the absence of explicit information about the derivative of the objective function. We focus on generating set search methods, a class of derivative-free direct search methods. We begin by presenting some general results on active set identification that are not tied to any particular algorithm. These general results are sufficiently strong that, given a sequence of iterates converging to a Karush-Kuhn-Tucker point, they allow us to identify binding constraints for which there are nonzero multipliers. We then discuss why these general results, which are posed in terms of the direction of steepest descent, apply to generating set search, even though these methods do not have explicit recourse to derivatives. Nevertheless, there is a clearly identifiable subsequence of iterations at which we can reliably estimate the set of constraints that are binding at a solution. We discuss how active set estimation can be used to accelerate generating set search methods and illustrate the appreciable improvement that can result using several examples from the CUTEr test suite. We also introduce two algorithmic refinements for generating set search methods. The first expands the subsequence of iterations at which we can make inferences about stationarity. The second is a more flexible step acceptance criterion.

Key words. active sets, constrained optimization, linear constraints, direct search, generating set search, generalized pattern search, derivative-free methods.

AMS subject classifications. 90C56, 90C30, 65K05

1. Introduction. We consider active constraint identification for generating set search (GSS) algorithms for linearly constrained minimization. For our discussion the optimization problem of interest will be posed as

    minimize f(x) subject to Ax ≤ b,    (1.1)

where f : ℝⁿ → ℝ and A ∈ ℝ^{m×n}.
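As a concrete (purely illustrative) instance of (1.1), the sketch below sets up a small two-variable strictly convex quadratic objective with two linear inequality constraints; the data here are made up for illustration and are not taken from the paper's test problems.

```python
import numpy as np

# Illustrative instance of (1.1): minimize f(x) subject to A x <= b.
# f is a simple strictly convex quadratic; A and b encode two
# inequality constraints in R^2.  (Hypothetical data.)
def f(x):
    return 0.5 * np.dot(x, x) - np.array([3.0, 1.0]) @ x

A = np.array([[1.0, 1.0],    # x0 + x1 <= 2
              [1.0, 0.0]])   # x0      <= 1.5
b = np.array([2.0, 1.5])

def is_feasible(x, tol=1e-12):
    """Membership test for Omega = { x : A x <= b }."""
    return bool(np.all(A @ x <= b + tol))

print(is_feasible(np.zeros(2)), f(np.zeros(2)))  # the origin is feasible
```

The unconstrained minimizer of this f lies outside the feasible region, so some constraints are active at the solution, which is the situation the paper studies.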
GSS algorithms are a class of direct search methods and as such do not make explicit use of derivatives of the objective f [11]. Results on active constraint identification for smooth nonlinear programming, on the other hand, do rely explicitly on the gradient ∇f(x) of f (e.g., [5, 3, 2, 4, 7, 18]). Nevertheless, GSS algorithms compute information in the course of their operation that can be used to identify the set of constraints active at a Karush-Kuhn-Tucker (KKT) point of (1.1). Thus, while we assume for our theoretical results that f is continuously differentiable, for the application to GSS methods we do not assume that ∇f(x) (or an approximation thereto) is computationally available. The fact that active set estimation is possible in the absence of an explicit linear or quadratic model of the objective is the reason for the intentionally provocative title of this paper.

We first prove some general active set identification results. While these results are intended to reveal the active set identification properties of GSS methods, they are general and not tied to any particular class of algorithms. The first set of results, Theorems 3.2 and 3.3 and Corollaries 3.4 and 3.5, establish the active set identification properties of a sequence of points in terms of the projection of the direction

[Footnotes: Department of Mathematics, College of William & Mary, P.O. Box 8795, Williamsburg, Virginia; buckaroo@math.wm.edu. This work was supported by the National Science Foundation under Grant No. DMS. Department of Computer Science, College of William & Mary, P.O. Box 8795, Williamsburg, Virginia; va@cs.wm.edu.]

of steepest descent onto a cone associated with constraints near the points in the sequence. In this first set of results we allow the sequence to be entirely interior to the feasible region in (1.1), and this leads to additional requirements of linear independence of active constraints and strict complementarity that, while standard, we wish to avoid. We do so by observing that an easily computed auxiliary sequence of points (that can include points on the boundary) possesses even stronger active set identification properties. In Theorem 3.6 we show that for this auxiliary sequence we have the stronger result that the projection of the direction of steepest descent onto the feasible region tends to zero. From results of Burke and Moré [3, 4] we then have the following. First, this auxiliary sequence allows us to identify the active constraints for which there are nonzero Lagrange multipliers (unique or not) without any assumption of linear independence of the active constraints. Second, under mild nondegeneracy assumptions this auxiliary sequence will find the face determined by the active constraints in a finite number of iterations.

Any algorithm for which these general results hold will have active set identification properties at least as strong as the gradient projection method [5, 3, 2, 4]. It is therefore perhaps surprising that these results apply to GSS methods, which do not have direct access to the gradient. However, even though GSS methods do not have recourse to ∇f(x), there does exist an implicit relationship between an explicitly known algorithmic parameter used to control step lengths and an appropriately chosen measure of stationarity for (1.1) [13]. As the step-length control parameter tends to zero, so does this measure of stationarity.
This correlation means that GSS methods generate at least one subsequence of iterates that satisfies the hypotheses of our general active set identification results, and so GSS methods are able to identify active sets of constraints, even without direct access to the derivative of the objective.

We illustrate the practical benefits of incorporating active set estimation into GSS algorithms. We propose several ways in which one can use estimates of the active set to accelerate the search for a solution of (1.1) and present numerical tests that show these strategies can appreciably accelerate progress towards a solution. These tests show that generating set search can be competitive with a derivative-based quasi-Newton algorithm. As part of these algorithmic developments we also introduce two refinements for GSS methods. The first expands the subsequence of iterations at which we can make inferences about stationarity by introducing the notion of tangentially unsuccessful iterations, as discussed further in Section 4.4. The second is a more flexible step acceptance criterion that scales automatically with the magnitude of the objective, as discussed further in Section 4.3.

The general results on active set identification are given in Section 3. The relevant features of GSS methods are reviewed and expanded in Section 4. Section 5 discusses the active set identification properties of GSS methods and proposes acceleration schemes based on them. Section 6 contains some numerical tests of these ideas.

2. Notation. Let a_i^T be the ith row of the constraint matrix A. The set of constraints active at a point x is defined to be

    A(x) = { i : a_i^T x = b_i }.

Let C_i = { y : a_i^T y = b_i } denote the set of points where the ith constraint is binding. Denote the feasible region by Ω:

    Ω = { x : Ax ≤ b }.

Given x ∈ Ω and ε ≥ 0, define

    I(x, ε) = { i : dist(x, C_i) = (b_i − a_i^T x) / ‖a_i‖ ≤ ε }.    (2.1)
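The working set (2.1), and the projections onto the cones defined next, are straightforward to compute. A minimal sketch (the function names are ours; by the Moreau decomposition v = [v]_N + [v]_T for a closed convex cone and its polar, and [v]_N can be obtained from a nonnegative least-squares fit to the generators a_i, for which scipy's nnls is one convenient solver):

```python
import numpy as np
from scipy.optimize import nnls

def working_set(x, A, b, eps):
    """I(x, eps) from (2.1): indices i with dist(x, C_i) <= eps,
    where dist(x, C_i) = (b_i - a_i^T x) / ||a_i|| for feasible x."""
    dists = (b - A @ x) / np.linalg.norm(A, axis=1)
    return np.where(dists <= eps)[0]

def project_tangent_cone(v, A, I):
    """[v]_T for the eps-tangent cone, the polar of the cone N
    generated by { a_i : i in I }.  Moreau: v = [v]_N + [v]_T, and
    [v]_N = A_I^T c where c solves min ||A_I^T c - v||, c >= 0."""
    if len(I) == 0:
        return v.copy()          # N = {0}, so [v]_T = v
    c, _ = nnls(A[I].T, v)
    return v - A[I].T @ c
```

For example, with the constraints x₀ ≤ 1 and x₁ ≤ 1 and x = (0.95, 0.5), only the first constraint is within ε = 0.1, and projecting v = (1, 1) onto the resulting ε-tangent cone zeroes its first component.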

The vectors a_i for i ∈ I(x, ε) are the outward-pointing normals to the constraints defining the boundary of Ω within distance ε of x. We call the set of such vectors the working set of constraints associated with x and ε. The ε-normal cone N(x, ε) and the ε-tangent cone T(x, ε) are defined to be

    N(x, ε) = { v : v = Σ_{i ∈ I(x,ε)} c_i a_i, c_i ≥ 0 } ∪ {0}    and    T(x, ε) = N°(x, ε),

where N°(x, ε) denotes the polar of N(x, ε). If ε = 0 these cones are the usual normal cone N(x) and tangent cone T(x) of Ω at x. If K ⊆ ℝⁿ is a cone and v ∈ ℝⁿ, then we denote the Euclidean norm projection of v onto K by [v]_K. Finally, ‖·‖ denotes the Euclidean norm.

3. Results concerning active set identification. Suppose x* is a KKT point of (1.1), and consider an algorithm that generates a sequence of feasible points {x_k} ⊆ Ω, indexed by k ∈ K, such that x_k → x* as k → ∞. Let A(x*) denote the set of indices of the linear constraints active at x*. Then at x* we have

    −∇f(x*) = Σ_{j ∈ A(x*)} λ_j a_j,    λ_j ≥ 0 for all j ∈ A(x*).

At this point we make no assumption that the Lagrange multipliers λ_j are unique. We wish to show that for some sequence {x̂_k} (either {x_k} itself or one we can derive from {x_k}) and a set of constraints W(x̂_k) that we can construct knowing x̂_k, we have the following:

1. For all k sufficiently large, we either identify the active set at x* or at least those constraints in A(x*) with nonzero multipliers. That is, for all k sufficiently large we would prefer to have A(x*) = W(x̂_k); at the very least we want { i ∈ A(x*) : λ_i > 0 } ⊆ W(x̂_k).

2. If x* lies in a face of Ω, then we land on the face that contains x* in a finite number of iterations. That is, for all k sufficiently large we have A(x̂_k) = A(x*). In particular, if x* is at a vertex of Ω, then we land on x* in a finite number of iterations: x̂_k = x* for all k sufficiently large.

Moreover, we want these results to hold under assumptions on ∇f(x*) and A(x*) that are as mild as possible.
Because the gradient projection algorithm possesses these properties [5, 3, 2, 4], we view these requirements as reasonable expectations for any combination of optimization algorithm and active set identification technique applied to (1.1) when f is differentiable.

Theorem 3.1 introduces the property of the iterates x_k that is central to the active set identification properties of generating set search.

Theorem 3.1. Suppose that f is continuously differentiable on Ω, that x_k → x*, and that there exist sequences ε_k and η_k with ε_k, η_k ≥ 0 and ε_k, η_k → 0 such that

    ‖[−∇f(x_k)]_{T(x_k, ε_k)}‖ ≤ η_k.

Then x* is a KKT point of (1.1).

Proof. Since x_k → x* and ε_k → 0 we know that once k ∈ K is sufficiently large we have I(x_k, ε_k) ⊆ A(x*) [16, Proposition 1]. Henceforth restrict attention to such k. Since I(x_k, ε_k) ⊆ A(x*) it follows that N(x_k, ε_k) ⊆ N(x*, 0). Therefore T(x*, 0) ⊆ T(x_k, ε_k) and so

    ‖[−∇f(x_k)]_{T(x*, 0)}‖ ≤ ‖[−∇f(x_k)]_{T(x_k, ε_k)}‖ ≤ η_k.

Taking the limit as k → ∞ yields ‖[−∇f(x*)]_{T(x*, 0)}‖ = 0, and the result follows.

Theorem 3.2 is a first step towards finding a set of constraints that characterize the KKT point x*.

Theorem 3.2. Suppose that f is continuously differentiable on Ω, that x_k → x*, and that there exist sequences ε_k and η_k with ε_k, η_k ≥ 0 and η_k → 0 such that

    ‖[−∇f(x_k)]_{T(x_k, ε_k)}‖ ≤ η_k.

Then for all k ∈ K sufficiently large, −∇f(x*) ∈ N(x_k, ε_k).

Proof. Since there is only a finite number of possible distinct working sets I(x_k, ε_k), we know that there exists k̄ ∈ K such that if k ≥ k̄ then the associated working set I(x_k, ε_k) appears infinitely often among the working sets { I(x_l, ε_l) : l ≥ k̄ }. Now suppose k ∈ K and k ≥ k̄. Then

    lim_{l → ∞, I(x_l,ε_l) = I(x_k,ε_k)} ‖[−∇f(x_l)]_{T(x_k, ε_k)}‖ ≤ lim_{l → ∞, I(x_l,ε_l) = I(x_k,ε_k)} η_l = 0,

whence ‖[−∇f(x*)]_{T(x_k, ε_k)}‖ = 0. It follows that −∇f(x*) ∈ N(x_k, ε_k).

The next theorem relates active constraints that are missing from I(x_k, ε_k) to their associated Lagrange multipliers λ_j. Its significance will be made explicit in Corollaries 3.4 and 3.5. For i ∈ A(x*) let Π_i(·) denote the projection onto the orthogonal complement of the linear span of the other active constraints { a_j : j ∈ A(x*) \ {i} }.

Theorem 3.3. Suppose that f is continuously differentiable on Ω, that x_k → x*, and that there exist sequences ε_k and η_k with ε_k, η_k ≥ 0 and ε_k → 0 such that

    ‖[−∇f(x_k)]_{T(x_k, ε_k)}‖ ≤ η_k.

Then for all k ∈ K sufficiently large, if i ∈ A(x*) but i ∉ I(x_k, ε_k), then

    λ_i ‖Π_i(a_i)‖ ≤ η_k + ‖∇f(x_k) − ∇f(x*)‖.

Proof. Since x_k → x* and ε_k → 0 we know that once k ∈ K is sufficiently large we have I(x_k, ε_k) ⊆ A(x*). Henceforth we restrict attention to such k. Let N(A(x*) \ {i}) be the cone generated by { a_j : j ∈ A(x*) \ {i} } and let T(A(x*) \ {i}) be the cone polar to N(A(x*) \ {i}). Since i ∈ A(x*) but i ∉ I(x_k, ε_k) we know that I(x_k, ε_k) ⊆ A(x*) \ {i}, so it follows that N(x_k, ε_k) ⊆ N(A(x*) \ {i}) and thus T(A(x*) \ {i}) ⊆ T(x_k, ε_k). This means that

    ‖[−∇f(x_k)]_{T(A(x*) \ {i})}‖ ≤ ‖[−∇f(x_k)]_{T(x_k, ε_k)}‖.    (3.1)

Meanwhile, decomposing −∇f(x*) into its projections onto the mutually polar cones N(A(x*) \ {i}) and T(A(x*) \ {i}) and applying Π_i,

    Π_i( [−∇f(x*)]_{N(A(x*) \ {i})} + [−∇f(x*)]_{T(A(x*) \ {i})} ) = Π_i( Σ_{j ∈ A(x*) \ {i}} λ_j a_j + λ_i a_i ),

whence

    Π_i( [−∇f(x*)]_{T(A(x*) \ {i})} ) = λ_i Π_i(a_i),

since Π_i annihilates both [−∇f(x*)]_{N(A(x*) \ {i})} and the a_j for j ∈ A(x*) \ {i}. Therefore

    λ_i ‖Π_i(a_i)‖ = ‖Π_i( [−∇f(x*)]_{T(A(x*) \ {i})} )‖ ≤ ‖[−∇f(x*)]_{T(A(x*) \ {i})}‖.

The theorem then follows from (3.1):

    λ_i ‖Π_i(a_i)‖ ≤ ‖[−∇f(x*)]_{T(A(x*) \ {i})}‖
                 ≤ ‖[−∇f(x_k)]_{T(A(x*) \ {i})}‖ + ‖[−∇f(x*)]_{T(A(x*) \ {i})} − [−∇f(x_k)]_{T(A(x*) \ {i})}‖
                 ≤ ‖[−∇f(x_k)]_{T(x_k, ε_k)}‖ + ‖∇f(x_k) − ∇f(x*)‖
                 ≤ η_k + ‖∇f(x_k) − ∇f(x*)‖.

In the last steps we use (3.1) and the fact that projection onto a convex cone is nonexpansive.

Several useful facts follow from Theorem 3.3. The first concerns the nature of any constraints in A(x*) that we fail to capture in I(x_k, ε_k), our estimate of A(x*). Suppose constraint i is active at x*, and that Π_i(a_i) ≠ 0. Then once k is sufficiently large, constraint i can fail to be in I(x_k, ε_k) only if the associated multiplier λ_i is small. If the multipliers are valid shadow costs, then once k is sufficiently large the only constraints in A(x*) that can possibly be absent from I(x_k, ε_k) are those that play only a small role in constraining the optimal objective value. This observation, and its consequences as k → ∞, are summarized in Corollary 3.4.

Corollary 3.4. Suppose that the conditions of Theorem 3.3 hold. In addition, suppose that η_k → 0 and i ∈ A(x*) is such that Π_i(a_i) ≠ 0. Then:

1. If i ∉ I(x_k, ε_k) and k ∈ K is sufficiently large, then

    λ_i ≤ ( η_k + ‖∇f(x_k) − ∇f(x*)‖ ) / ‖Π_i(a_i)‖.    (3.2)

2. If i ∉ I(x_k, ε_k) for infinitely many k ∈ K, then λ_i = 0, where λ_i is the multiplier associated with a_i at x*.

3. For all k ∈ K sufficiently large we have { j ∈ A(x*) : λ_j > 0, Π_j(a_j) ≠ 0 } ⊆ I(x_k, ε_k).

Proof. Since Π_i(a_i) ≠ 0, Part 1 follows from Theorem 3.3. To prove Part 2, let K′ ⊆ K be an infinite subsequence of iterations for which i ∉ I(x_k, ε_k). Then taking the limit of (3.2) for k ∈ K′ as k → ∞ yields

    λ_i ≤ lim_{k ∈ K′, k → ∞} ( η_k + ‖∇f(x_k) − ∇f(x*)‖ ) / ‖Π_i(a_i)‖ = 0.

Part 3 is the contrapositive of Part 2.

Note that the nonlinear program (1.1) is posed in terms of inequality constraints. We assume that any equality constraints are expressed as a pair of opposing inequalities: a_i^T x ≤ b_i and −a_i^T x ≤ −b_i. Let E denote the set of equality constraints:

    E = { i : a_i^T x = b_i for all x ∈ Ω }.
The set of constraints A(x*) \ E is the set of true inequalities that are active at x*.

Corollary 3.5. Suppose that the conditions of Theorem 3.3 hold, that η_k → 0, that x_k ∈ Ω for all k, and that for all i ∈ A(x*) \ E we have Π_i(a_i) ≠ 0. Then:

1. For all k ∈ K sufficiently large we have E ∪ { j ∈ A(x*) \ E : λ_j > 0 } ⊆ I(x_k, ε_k).

2. If strict complementarity holds at x* for the constraints in A(x*) \ E, then A(x*) = I(x_k, ε_k) for all k sufficiently large.

Proof. Since by assumption x_k ∈ Ω for all k, we are guaranteed that the equality constraints are satisfied at every iteration and thus E ⊆ I(x_k, ε_k) for all k. Moreover, Π_i(a_i) ≠ 0 for all i ∈ A(x*) \ E, so Part 3 of Corollary 3.4 holds for the constraints

in A(x*) \ E. Part 1 then follows. If strict complementarity holds for the constraints in A(x*) \ E then E ∪ { j ∈ A(x*) \ E : λ_j > 0 } = A(x*), so A(x*) ⊆ I(x_k, ε_k). However, for k sufficiently large we know that I(x_k, ε_k) ⊆ A(x*), so Part 2 follows.

Theorem 3.3 and Corollary 3.4 allow us to make useful inferences about the active set A(x*) from the sets I(x_k, ε_k). Corollary 3.5 shows that under stronger assumptions we have I(x_k, ε_k) = A(x*), which is the first of the goals outlined at the start of this section. However, the linear independence hypothesis of Corollary 3.5 is dissatisfying since it does not hold at such reasonable and benign points as the apex of a pyramid. In addition, we have not required any of the iterates x_k to lie on the boundary of Ω, and we need iterates that lie on the boundary of Ω in order to achieve our second goal, that A(x_k) = A(x*) after a finite number of iterations.

To address these issues we define an auxiliary sequence {x̂_k} of points that can lie on the boundary of Ω. Suppose the set { x : Ax ≤ b and a_i^T x = b_i for i ∈ I(x_k, ε_k) } is nonempty (if I(x_k, ε_k) = ∅, then this set should be taken to mean all of ℝⁿ). In this situation define

    x̂_k = argmin_x { ‖x − x_k‖ : Ax ≤ b and a_i^T x = b_i for i ∈ I(x_k, ε_k) }.    (3.3)

Note that when I(x_k, ε_k) = ∅, x̂_k = x_k. We will show that {x̂_k} has even stronger active set identification properties than {x_k}.

Theorem 3.6. Suppose that f is continuously differentiable on Ω, that x_k → x*, and that there exist sequences ε_k and η_k with ε_k, η_k ≥ 0 and ε_k, η_k → 0 such that

    ‖[−∇f(x_k)]_{T(x_k, ε_k)}‖ ≤ η_k.

Then:

1. x̂_k is defined for all k ∈ K sufficiently large;
2. lim_{k → ∞} ‖x̂_k − x_k‖ = 0; and
3. lim_{k → ∞} ‖[−∇f(x̂_k)]_{T(x̂_k, 0)}‖ = 0.

Proof. Let k ∈ K be sufficiently large that I(x_k, ε_k) ⊆ A(x*). Then we know that x* ∈ { x : Ax ≤ b and a_i^T x = b_i for all i ∈ I(x_k, ε_k) }. Since x* is feasible for the optimization problem in (3.3), it follows that x̂_k is defined. This proves Part 1.
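Computing x̂_k exactly in (3.3) is a small quadratic program (equalities on the working set plus the remaining inequalities). The sketch below, our own simplification, handles only the equality part: it projects x_k onto the affine set { y : a_i^T y = b_i for i ∈ I(x_k, ε_k) } via a pseudoinverse and then merely reports whether the result happens to satisfy the remaining inequalities; a real implementation would use a QP solver when it does not.

```python
import numpy as np

def snap_to_working_set(x, A, b, I):
    """Simplified stand-in for (3.3): Euclidean projection of x onto
    { y : a_i^T y = b_i for i in I }.  The true hat-x_k also enforces
    the remaining inequalities A y <= b (a small QP); here we just
    project and report whether the projected point is feasible."""
    if len(I) == 0:
        return x.copy(), True     # hat-x_k = x_k when I is empty
    A_I, b_I = A[I], b[I]
    # Least-norm correction: y = x - A_I^T (A_I A_I^T)^+ (A_I x - b_I)
    y = x - A_I.T @ np.linalg.pinv(A_I @ A_I.T) @ (A_I @ x - b_I)
    feasible = bool(np.all(A @ y <= b + 1e-10))
    return y, feasible
```

For a single working-set constraint the projection simply moves x_k along the constraint normal until the constraint holds with equality, which matches the picture of "snapping" the iterate onto the nearby boundary.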
Moreover, since x* is feasible for (3.3), for all sufficiently large k we have ‖x̂_k − x_k‖ ≤ ‖x* − x_k‖. Since x_k → x* we conclude that ‖x̂_k − x_k‖ → 0, which is Part 2.

Finally, the definition of x̂_k ensures that all the constraints in I(x_k, ε_k) are active at x̂_k. Thus N(x_k, ε_k) ⊆ N(x̂_k, 0) and so T(x̂_k, 0) ⊆ T(x_k, ε_k), whence

    ‖[−∇f(x̂_k)]_{T(x̂_k, 0)}‖ ≤ ‖[−∇f(x̂_k)]_{T(x_k, ε_k)}‖
                             ≤ ‖[−∇f(x_k)]_{T(x_k, ε_k)}‖ + ‖[−∇f(x̂_k)]_{T(x_k, ε_k)} − [−∇f(x_k)]_{T(x_k, ε_k)}‖
                             ≤ η_k + ‖∇f(x̂_k) − ∇f(x_k)‖.

By Part 2, we know that ‖x̂_k − x_k‖ → 0 as k → ∞. Taking the limit yields

    lim_{k → ∞} ‖[−∇f(x̂_k)]_{T(x̂_k, 0)}‖ = 0,

which is Part 3.

We next recall some results of Burke and Moré [3, 4], stated here in terms of the linearly constrained problem (1.1). Theorem 3.7 assumes a mild nondegeneracy condition and gives a necessary and sufficient condition for identifying the active set A(x*) from A(x̂_k) in a finite number of iterations. It also describes a situation in which x* will be attained in a finite number of iterations. Theorem 3.8 gives a necessary and sufficient condition for the identification of all constraints in A(x*) for which there is a strictly positive Lagrange multiplier.

Theorem 3.7. Let f : ℝⁿ → ℝ be continuously differentiable on Ω, and assume that {x̂_k} ⊆ Ω converges to a KKT point x* for which −∇f(x*) is in the relative interior of N(x*).

1. [3, Corollary 3.6] Then A(x̂_k) = A(x*) for all k sufficiently large if and only if {[−∇f(x̂_k)]_{T(x̂_k, 0)}} converges to zero.

2. [3, Corollary 3.5] If N(x*) has a nonempty interior and {[−∇f(x̂_k)]_{T(x̂_k, 0)}} converges to zero, then x̂_k = x* for all k sufficiently large.

The requirement that −∇f(x*) lie in the relative interior of N(x*) is equivalent to

    −∇f(x*) = Σ_{j ∈ A(x*)} λ_j a_j,    λ_j > 0 for all j ∈ A(x*).

(See [3, Lemma 3.2].)

Theorem 3.8. [4, Theorem 4.5] Let f : ℝⁿ → ℝ be continuously differentiable on Ω, and assume that {x̂_k} ⊆ Ω converges to a KKT point x* at which we have

    −∇f(x*) = Σ_{i ∈ A(x*)} λ_i a_i,    λ_i ≥ 0.    (3.4)

Then lim_{k → ∞} ‖[−∇f(x̂_k)]_{T(x̂_k, 0)}‖ = 0 if and only if { i ∈ A(x*) : λ_i > 0 } ⊆ A(x̂_k) for all k sufficiently large.

From Theorems 3.7 and 3.8 and Part 3 of Theorem 3.6 we obtain Corollary 3.9, which achieves the goals concerning active set identification set out at the beginning of this section. Corollary 3.9 says that while I(x_k, ε_k) serves as a useful estimate of the constraints active at x*, the constraints in A(x̂_k) are a sharper estimate.

Corollary 3.9. Suppose that x_k → x*, η_k → 0, ε_k → 0, and ‖[−∇f(x_k)]_{T(x_k, ε_k)}‖ ≤ η_k. Then we have the following:

1. If −∇f(x*) is in the relative interior of N(x*), then A(x̂_k) = A(x*) for all k sufficiently large. If, in addition, N(x*) has a nonempty interior, then x̂_k = x* for all k sufficiently large.

2. Suppose −∇f(x*) = Σ_{i ∈ A(x*)} λ_i a_i, λ_i ≥ 0. Then { i ∈ A(x*) : λ_i > 0 } ⊆ A(x̂_k) for all k sufficiently large.

4. Generating set search for linearly constrained minimization. We next review the features of GSS that are relevant to active set identification for (1.1) (see [13, 14] for further discussion). GSS algorithms for linearly constrained problems are feasible iterate methods. Associated with each x_k is a value ε_k ≥ 0, defined shortly, which is used to compute a working set of nearby constraints I(x_k, ε_k) as defined by (2.1). The connection with the active set identification results of the previous section comes through the stationarity properties of GSS discussed in Section 4.7.

4.1. Partitioning the search directions. At the center of the convergence properties of GSS methods is the notion of a core set of search directions, denoted G_k, which is constructed at each iteration in a way that ensures the inclusion of at least one direction of descent along which a feasible step of sufficient length can be taken. In the linearly constrained case G_k must comprise a set of generators for the ε-tangent cone T(x_k, ε_k). If G_k satisfies this requirement, then convergence results can be obtained under straightforward conditions detailed in [13]. The core search directions (the elements of G_k) conform to the boundary of Ω near x_k, where "near" is determined by the value of ε_k. To simplify the discussion, we assume that all of the directions in G_k are normalized. For details on other options, see [13, section 2.3].

As a practical matter it is useful to allow additional search directions. For instance, in Section 5 we introduce additional search directions to effect the acceleration strategies based on active set estimation. The set of additional search directions is denoted by H_k. The total set of search directions D_k is thus D_k = G_k ∪ H_k.

In the event that T(x_k, ε_k) contains a lineality space (as in Figure 4.2(a) and Figure 4.2(f)) we have some freedom in our choice of generators. In order to preserve the convergence properties of the algorithm we place Condition 4.1 on the choice of G_k. Condition 4.1 makes use of the quantity κ(G), which appears in [13, (2.1)] and is a generalization of that given in [11, (3.10)]. For any finite set of vectors G define

    κ(G) = inf_{v ∈ ℝⁿ, [v]_K ≠ 0} max_{d ∈ G} (v^T d) / (‖[v]_K‖ ‖d‖),    where K is the cone generated by G.    (4.1)

We place the following condition on the set of search directions G_k.

Condition 4.1. There exists a constant κ_min > 0, independent of k, such that the following holds. For every k, G_k generates T(x_k, ε_k); if G_k ≠ {0}, then κ(G_k) ≥ κ_min.
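Because κ(G) in (4.1) is an infimum over all v with [v]_K ≠ 0, it cannot be computed exactly by sampling; random sampling does, however, give an upper bound on κ(G) that can serve as a quick diagnostic. The sketch below (our own helper, not from the paper) reuses an NNLS-based cone projection:

```python
import numpy as np
from scipy.optimize import nnls

def cone_projection(v, G):
    """Project v onto the cone K generated by the columns of G,
    via the nonnegative least-squares problem min ||G c - v||, c >= 0."""
    c, _ = nnls(G, v)
    return G @ c

def kappa_upper_bound(G, n_samples=8000, seed=0):
    """Monte Carlo upper bound on kappa(G) from (4.1): sample random v
    and for each take max_d v.d / (||[v]_K|| ||d||).  The minimum over
    samples can only overestimate the infimum."""
    rng = np.random.default_rng(seed)
    n = G.shape[0]
    best = np.inf
    for _ in range(n_samples):
        v = rng.standard_normal(n)
        pk = cone_projection(v, G)
        norm_pk = np.linalg.norm(pk)
        if norm_pk < 1e-12:
            continue                      # [v]_K = 0 is excluded in (4.1)
        cos = max(v @ d / (norm_pk * np.linalg.norm(d)) for d in G.T)
        best = min(best, cos)
    return best

# G = {e1, -e1, e2} generates the upper half-plane in R^2; for this G
# the worst-case v bisects e1 (or -e1) and e2, so kappa(G) = 1/sqrt(2).
G = np.array([[1.0, -1.0, 0.0],
              [0.0,  0.0, 1.0]])
print(kappa_upper_bound(G))
```

For direction sets that generate a cone with a lineality space, as in the example above, this kind of check gives a quick sense of how far a candidate G_k is from violating Condition 4.1.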
4.2. Determining the lengths of steps. The step-length control parameter Δ_k regulates the steps tried along the search directions in D_k and prevents them from becoming either too long or too short. The trial point associated with any d ∈ D_k is x_k + Δ̃(d)d, where Δ̃(d) is

    Δ̃(d) = max { Δ ∈ [0, Δ_k] : x_k + Δd ∈ Ω }.    (4.2)

This construction ensures that the full step (with respect to Δ_k) is taken if the resulting trial point is feasible. Otherwise, the trial point is found by taking the longest feasible step possible from x_k along d.

With the directions in G_k normalized, the value of ε_k used to determine the current working set is derived from the current value of Δ_k according to ε_k = min{ε_max, Δ_k} for some ε_max > 0. Prudent choices of ε_max and Δ_0 mean that in practice it is typically the case that ε_k = Δ_k.

4.3. Ascertaining success. A trial point is considered acceptable only if it satisfies the sufficient decrease condition

    f(x_k + Δ̃(d)d) < f(x_k) − ρ_k(Δ_k).    (4.3)

An appropriate sufficient decrease condition allows the possibility of taking exact steps to the boundary of the feasible region [16, 13]. Here we use

    ρ_k(Δ) = α_k Δ².    (4.4)
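Both the max-feasible-step computation (4.2) and the acceptance test (4.3)-(4.4) are a few lines of code. The sketch below uses the paper's scaled choice α_k = α max{f_typ, |f(x_k)|} (equation (4.5)); the default values of α and f_typ here are illustrative, and the function names are ours:

```python
import numpy as np

def max_feasible_step(x, d, delta_k, A, b):
    """Delta-tilde(d) from (4.2): the largest Delta in [0, delta_k]
    with x + Delta d feasible for A x <= b (a one-dimensional ratio test)."""
    step = delta_k
    for a_i, b_i in zip(A, b):
        ad = a_i @ d
        if ad > 0:                         # moving toward this constraint
            step = min(step, (b_i - a_i @ x) / ad)
    return max(step, 0.0)

def accepts(f, x, d, delta_k, A, b, alpha=1e-4, f_typ=1.0):
    """Sufficient decrease test (4.3) with rho_k(Delta) = alpha_k Delta^2
    and alpha_k = alpha * max(f_typ, |f(x)|) as in (4.4)-(4.5).
    (alpha and f_typ defaults here are illustrative, not prescribed.)"""
    step = max_feasible_step(x, d, delta_k, A, b)
    alpha_k = alpha * max(f_typ, abs(f(x)))
    trial = x + step * d
    return f(trial) < f(x) - alpha_k * delta_k ** 2, trial
```

Note that the decrease demanded, ρ_k(Δ_k), is keyed to the full Δ_k even when the step itself is clipped at the boundary, which is exactly what permits exact steps onto the boundary in (4.2)-(4.3).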

We require the sequence {α_k} to be bounded away from zero; i.e., there exists α_min > 0 such that α_min ≤ α_k for all k. The choice of α_k in the work reported here is

    α_k = α max { f_typ, |f(x_k)| },    (4.5)

where α > 0 is fixed and f_typ ≠ 0 is some fixed value that reflects the typical magnitude of the objective, given feasible inputs. In earlier work [16, 13, 14], the decrease condition was independent of k, whereas here it is not. We introduce ρ_k(·), as in (4.4), so that the step acceptance rule (4.3) scales automatically with the magnitude of the objective. In Section 4.7 we prove that this change does not adversely affect the key stationarity results from [13].

To ensure asymptotic success, if there is no direction d ∈ D_k for which (4.3) is satisfied, then Δ_{k+1} = θ_k Δ_k for some choice of θ_k satisfying 0 < θ_k ≤ θ_max < 1. Otherwise, Δ_{k+1} = φ_k Δ_k for some choice of φ_k satisfying φ_k ≥ 1. To keep things simple, in both the examples shown here and in our implementation, we use θ_k = 1/2 and φ_k = 1 for all k.

4.4. Tangentially unsuccessful iterations. We distinguish three types of iterations. This is a departure from previous work [13, 14] in which the iterations are designated as either successful, when a trial point satisfying (4.3) is found, or unsuccessful, when no trial point satisfying (4.3) is found. Let S and U denote the sets of successful and unsuccessful iterations, respectively; then the set of all iterations is S ∪ U with S ∩ U = ∅.

We want to track more closely a lack of decrease along the core search directions contained in the set G_k, so we introduce a new set U_T, for tangentially unsuccessful iterations. A tangentially unsuccessful iteration is one at which we know that (4.3) is not satisfied for any d ∈ G_k. We call the iteration tangentially unsuccessful since no decrease is realized by any of the trial steps defined by the generators of the ε-tangent cone. Thus, any unsuccessful iteration is also tangentially unsuccessful, since at an unsuccessful iteration no direction in D_k = G_k ∪ H_k is found to satisfy (4.3). However, a tangentially unsuccessful iteration is not necessarily unsuccessful, since the steps along the directions in G_k may fail while a step along one of the extra search directions in H_k succeeds. To summarize, we have U ⊆ U_T but, possibly, S ∩ U_T ≠ ∅.

4.5. Illustration. In Figure 4.2 we illustrate several iterations of one variant of linearly constrained GSS applied to a simple example. This example will be used later to give some insight into how GSS can be accelerated by active set identification. The objective in Figure 4.2 is a modification of the two-dimensional Broyden tridiagonal function [1, 17]. Its level sets are indicated in the background in Figure 4.2. We have introduced two linear inequality constraints, both of which are active at the KKT point, which is indicated by the star in Figure 4.2(a). Figure 4.1 provides a legend.

[Figure 4.1: legend for Figures 4.2(a)-(f), identifying the constraints, the KKT point, the ε-ball used to identify the working set, the current iterate x_k, constraints in the working set, feasible and infeasible trial points x_k + Δ_k d_k for directions d_k ∈ G_k and d_k ∈ H_k, feasible trial points x_k + Δ̃(d_k)d_k for d_k ∈ H_k, and the direction and feasible trial point from iteration k − 1.]

At the first iteration, illustrated in Figure 4.2(a), we choose G_0 = {e_1, −e_1, e_2}, where e_i, i ∈ {1, 2}, is a unit coordinate vector. The choice of e_1 and −e_1 is dictated
Thus, any unsuccessful step is also tangentially unsuccessful, since at an unsuccessful step no direction in D k = G k H k is found to satisfy (4.3). However, a tangentially unsuccessful step is not necessarily unsuccessful, since steps along the directions in G k may be unsuccessful but a step along one of the extra search directions in H k is successful. To summarize, we have U U T but, possibly, S U T Illustration. In Figure 4.2 we illustrate several iterations of one variant of linearly constrained GSS applied to a simple example. This example will be used later to give some insight into how GSS can be accelerated by active set identification. The objective in Figure 4.2 is a modification of the two-dimensional Broyden tridiagonal function [1, 17]. Its level sets are indicated in the background in Figure 4.2. We have introduced two linear inequality constraints, both of which are active at the KKT point, which is indicated by the star in Figure 4.2(a). Figure 4.1 provides a legend. constraint KKT point ε-ball to identify working set current iterate x k constraint in working set trial point x k + k d k, d k G k direction defined by dk G k infeasible trial point x k + k d k, d k H k direction defined by dk H k feasible trial point x k + (d k )d k, d k H k direction from iteration k 1 feasible trial point from iteration k 1 Fig Legend for Figures At the first iteration, illustrated in Figure 4.2(a), we choose G 0 = {e 1, e 1, e 2 }, where e i, i {1,2} is a unit coordinate vector. The choice of e 1 and e 1 is dictated 9

[Figure 4.2: A version of linearly constrained GSS applied to the modified Broyden tridiagonal function augmented with two linear inequality constraints; at each iteration, a constraint in the working set is indicated with a dashed line. (a) k = 0: the working set contains one constraint; move East; k ∈ S. (b) k = 1: new working set; change set of search directions; move North; k ∈ S and k ∈ U_T. (c) k = 2: same working set; keep set of search directions; no decrease; halve Δ_k; k ∈ U and k ∈ U_T. (d) k = 3: same working set; keep set of search directions; no decrease; halve Δ_k; k ∈ U and k ∈ U_T. (e) k = 4: same working set; keep set of search directions; no decrease; halve Δ_k; k ∈ U and k ∈ U_T. (f) k = 5: new working set (with respect to k = 4); change set of search directions; move East; k ∈ S.]

by the single constraint in the working set. To ensure that G_0 generates T(x_0, ε_0) we require a third vector; e_2 is a sensible choice (see [14, section 5]). We also elect to include the vector −e_2, which generates N(x_0, ε_0), since it allows the search to move toward the constraint contained in the working set. Thus, H_0 = {−e_2}. A full step of length Δ_0 along −e_2 is not feasible, so we reduce the length of the step so that we stop at the boundary, leading to a step of the form Δ̃(−e_2)(−e_2). The step to the East satisfies (4.3), so this trial point becomes the next iterate and we assign k = 0 to S.

At the next iteration, illustrated in Figure 4.2(b), the second constraint enters the working set of constraints. This gives us a new core set G_1 containing only two vectors. However, none of the trial steps defined by the vectors in G_1 improve upon the value of the objective at the current iterate. Again, H_1 comprises a set of generators for the ε-normal cone. By considering the feasible steps along the directions in H_1 we see the most improvement by taking the step to the North, so that is the step we accept.
We assign k = 1 to both S and U_T since, while the iteration was a success, no improvement was found along the directions defined by the vectors in G_1.

At the next iteration, illustrated in Figure 4.2(c), the working set remains unchanged, so we do not change the set of search directions: G_2 = G_1 and H_2 = H_1.

Once again, the full steps of length Δ_2 along the vectors contained in the core set G_2 are feasible but do not yield improvement. Moreover, there are no feasible steps along the two vectors in H_2. Since no feasible decrease has been found, the iteration is unsuccessful; the current iterate is unchanged, k = 2 is assigned to the set U as well as to the set U_T, and Δ_2 is halved. The same is true in the next two iterations, illustrated in Figures 4.2(d) and 4.2(e).

At iteration k = 5, illustrated in Figure 4.2(f), the reductions in Δ_k finally pay off. Now ε_5 = Δ_5 is sufficiently small that the second constraint is dropped from the working set. Then it is possible to take a feasible step of length Δ_5 along the vector e_1 ∈ G_5. The iteration is successful and k = 5 is assigned to the set S.

4.6. The algorithm. Algorithm 4.1 restates Algorithm 5.1 from [13]. We assume that the search directions in G_k have been normalized, though this simplification is not essential to the results here. We use (4.3) as our new sufficient decrease criterion. We also now keep track of tangentially unsuccessful iterations.

4.7. Stationarity results. Next we examine the relevant convergence properties of Algorithm 4.1. The results in this section and their proofs are extensions of results in [11, 13]. They are key to active set identification for GSS.

The following theorem, similar to [13, Theorem 6.3], shows that for the subsequence of iterations k ∈ U_T, Algorithm 4.1 bounds the size of the projection of −∇f(x_k) onto T(x_k, ε_k) as a function of the step-length control parameter Δ_k. The two modifications introduced in Algorithm 4.1 mean that Theorem 4.2 differs from [13, Theorem 6.3] in two respects. First, we have replaced ρ(Δ_k) with a ρ_k(Δ_k) that does not satisfy the requirements on ρ(·) found in [13, Condition 4]. Second, the bound (4.8) is shown to hold for the subsequence k ∈ U_T, not just the subsequence k ∈ U (recall that U ⊆ U_T).
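The overall shape of Algorithm 4.1 can be conveyed by a deliberately stripped-down sketch. The toy below, our own illustration rather than the paper's implementation, specializes to bound constraints x ≥ l, for which the fixed direction set {±e_i} generates every ε-tangent cone, so the working-set machinery reduces to clipping steps at the boundary as in (4.2); H_k is empty, φ_k = 1, and θ_k = 1/2.

```python
import numpy as np

def gss_box(f, x0, lower, delta0=0.5, delta_tol=1e-6,
            alpha=1e-4, f_typ=1.0, max_iter=100000):
    """Toy GSS loop in the spirit of Algorithm 4.1, specialized to the
    feasible region x >= lower.  Directions +/- e_i generate every
    eps-tangent cone of a box, so G_k is fixed; steps are clipped at
    the boundary (4.2), acceptance uses (4.3)-(4.4), and Delta is
    halved after unsuccessful iterations."""
    x = np.asarray(x0, dtype=float)
    lo = np.asarray(lower, dtype=float)
    n = x.size
    dirs = [s * np.eye(n)[i] for i in range(n) for s in (1.0, -1.0)]
    delta = delta0
    for _ in range(max_iter):
        if delta < delta_tol:              # termination test (Step 4)
            break
        fx = f(x)
        rho = alpha * max(f_typ, abs(fx)) * delta ** 2   # (4.4)-(4.5)
        for d in dirs:
            step = delta                   # clip at the boundary, (4.2)
            for i in range(n):
                if d[i] < 0.0:
                    step = min(step, x[i] - lo[i])
            trial = x + step * d
            if f(trial) < fx - rho:        # sufficient decrease, (4.3)
                x = trial                  # successful iteration
                break
        else:
            delta *= 0.5                   # unsuccessful: halve Delta
    return x

# Minimize ||x - (-1, 2)||^2 over x >= 0; the solution is (0, 2),
# with the first bound active.
target = np.array([-1.0, 2.0])
sol = gss_box(lambda x: np.sum((x - target) ** 2),
              x0=[1.0, 1.0], lower=[0.0, 0.0])
print(sol)
```

Even this toy exhibits the behavior the paper exploits: the iterate lands exactly on the active bound via a clipped step, and thereafter decrease is only found tangentially to that bound, so halvings of Δ signal approximate stationarity.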
We also prove a slightly more general result using a modulus of continuity for ∇f: given x ∈ Ω and r > 0,

    ω(x, r) = max { ‖∇f(y) − ∇f(x)‖ : y ∈ Ω, ‖y − x‖ ≤ r }.    (4.7)

Theorem 4.2. Suppose that f is continuously differentiable on Ω. Consider the iterates produced by Algorithm 4.1. If k ∈ U_T and ε_k satisfies ε_k = Δ_k, then

    ‖[−∇f(x_k)]_{T(x_k, ε_k)}‖ ≤ (1/κ_min) ( ω(x_k, Δ_k) + α_k Δ_k ),    (4.8)

where κ_min is from Condition 4.1 and α_k is from (4.5).

Proof. Clearly, we need only consider the case when [−∇f(x_k)]_{T(x_k, ε_k)} ≠ 0. Condition 4.1 guarantees a set G_k that generates T(x_k, ε_k). Recall that Algorithm 4.1 requires ‖d‖ = 1 for all d ∈ G_k. Thus, by (4.1) with K = T(x_k, ε_k) and v = −∇f(x_k), there exists some d̂ ∈ G_k such that

    κ_min ‖[−∇f(x_k)]_{T(x_k, ε_k)}‖ ≤ −∇f(x_k)^T d̂.    (4.9)

The fact that k ∈ U_T ensures that f(x_k + Δ̃(d)d) ≥ f(x_k) − ρ_k(Δ_k) for all d ∈ G_k. By assumption ε_k = Δ_k and Algorithm 4.1 requires ‖d‖ = 1 for all d ∈ G_k, so ‖Δ_k d‖ = ε_k for all d ∈ G_k. From [13, Proposition 2.2] we know that if x ∈ Ω and v ∈ T(x, ε) satisfies ‖v‖ ≤ ε, then x + v ∈ Ω. Therefore x_k + Δ_k d ∈ Ω for all d ∈ G_k. This, together with (4.2), ensures

    f(x_k + Δ_k d) − f(x_k) + ρ_k(Δ_k) ≥ 0 for all d ∈ G_k.    (4.10)

Step 0. Initialization. Let x_0 ∈ Ω be the initial iterate. Let Δ_tol > 0 be the tolerance used to test for convergence. Let Δ_0 > Δ_tol be the initial value of the step-length control parameter. Let ε_max > Δ_tol be the maximum distance used to identify nearby constraints (ε_max = +∞ is permissible). Let α > 0. Let ρ_k(Δ_k) = α max{ f_typ, |f(x_k)| } Δ_k², where f_typ > 0 is some value that reflects the typical magnitude of the objective for x ∈ Ω (use f_typ = 1 as a default). Let 0 < θ_max < 1.

Step 1. Choose search directions. Let ε_k = min{ε_max, Δ_k}. Choose a set of search directions D_k = G_k ∪ H_k satisfying Condition 4.1. Normalize the core search directions in G_k so that ‖d‖ = 1 for all d ∈ G_k.

Step 2. Look for decrease. Consider trial steps of the form x_k + Δ(d) d for d ∈ D_k, where Δ(d) is as defined in (4.2), until either finding a d_k ∈ D_k that satisfies (4.3) (a successful iteration) or determining that (4.3) is not satisfied for any d ∈ G_k (an unsuccessful iteration).

Step 3. Successful iteration. If there exists d_k ∈ D_k such that (4.3) holds, then: Set x_{k+1} = x_k + Δ(d_k) d_k. Set Δ_{k+1} = φ_k Δ_k with φ_k ≥ 1. Set S = S ∪ {k}. If during Step 2 it was determined that (4.3) is not satisfied for any d ∈ G_k, set U_T = U_T ∪ {k}.

Step 4. Unsuccessful iteration. Otherwise, set x_{k+1} = x_k. Set

    Δ_{k+1} = θ_k Δ_k with 0 < θ_k ≤ θ_max < 1.    (4.6)

Set U = U ∪ {k}. Set U_T = U_T ∪ {k}. If Δ_{k+1} < Δ_tol, then terminate.

Step 5. Advance. Increment k by one and go to Step 1.

Algorithm 4.1: A linearly constrained GSS algorithm.

Meanwhile, the mean value theorem says that for some σ_k ∈ (0, 1),

    f(x_k + Δ_k d) − f(x_k) = Δ_k ∇f(x_k + σ_k Δ_k d)^T d  for all d ∈ G_k.

Combining this with (4.10) yields

    0 ≤ Δ_k ∇f(x_k + σ_k Δ_k d)^T d + ρ_k(Δ_k)  for all d ∈ G_k.

Dividing through by Δ_k and subtracting ∇f(x_k)^T d from both sides leads to

    −∇f(x_k)^T d ≤ ( ∇f(x_k + σ_k Δ_k d) − ∇f(x_k) )^T d + ρ_k(Δ_k)/Δ_k  for all d ∈ G_k.

Since 0 < σ_k < 1 and ‖d‖ = 1 for all d ∈ G_k, from (4.7) we obtain

    −∇f(x_k)^T d ≤ ω(x_k, Δ_k) + ρ_k(Δ_k)/Δ_k  for all d ∈ G_k.    (4.11)

Since (4.11) holds for all d ∈ G_k, (4.9) tells us that for some d̂ ∈ G_k,

    κ_min ‖[−∇f(x_k)]_{T(x_k, ε_k)}‖ ≤ ω(x_k, Δ_k) + ρ_k(Δ_k)/Δ_k.

Using (4.4) and (4.5) finishes the proof:

    ‖[−∇f(x_k)]_{T(x_k, ε_k)}‖ ≤ (1/κ_min) ( ω(x_k, Δ_k) + α max{ f_typ, |f(x_k)| } Δ_k ).

If ∇f is Lipschitz continuous with constant M on Ω, then ω(x, r) ≤ M r and (4.8) reduces to

    ‖[−∇f(x_k)]_{T(x_k, ε_k)}‖ ≤ (1/κ_min) ( M + α_k ) Δ_k.    (4.12)

From (4.12) it is easy to see that the choice of α_k in (4.5) yields a bound on the relative size of the projection of −∇f(x_k) onto T(x_k, ε_k):

    ‖[−∇f(x_k)]_{T(x_k, ε_k)}‖ ≤ (1/κ_min) ( M + α max{ f_typ, |f(x_k)| } ) Δ_k.

Next we have the following result, which strengthens [11, Theorem 3.4] while also incorporating the ρ_k(·) we have introduced here as a replacement for the ρ(·) used in [11, 13]. Note the mild conditions placed on φ_k and Δ_k.

Theorem 4.3. Consider the iterates produced by Algorithm 4.1. Suppose that for all k, 1 ≤ φ_k ≤ φ_max and Δ_k ≤ Δ_max for some Δ_max > 0. Then either lim_{k→∞} Δ_k = 0 or lim_{k→∞} f(x_k) = −∞.

Proof. Suppose lim_{k→∞} Δ_k ≠ 0. Observe that there must be at least one successful iteration, since otherwise every iteration would be unsuccessful and the updating rule (4.6) would ensure that lim_{k→∞} Δ_k = 0. Let k̄ be the first successful iteration. Let Δ* = lim sup_{k→∞} Δ_k. Since lim_{k→∞} Δ_k ≠ 0 we know that Δ* > 0. Define

    Q = { k : k ≥ k̄, Δ_k ≥ Δ*/2 }  and  Q′ = { k : k ≥ k̄, Δ_k ≥ Δ*/(2 φ_max) }.

We claim that there are infinitely many successful iterations in Q′. To prove the claim, note that Q is infinite. If there are infinitely many successful iterations in Q then there is nothing to prove, since Q ⊆ Q′. On the other hand, suppose there are infinitely many unsuccessful iterations in Q.
Let k be any such iteration, and let m be such that iteration k − m − 1 was the last successful iteration preceding k (we know there must be such an iteration since k ≥ k̄). Since iteration

k − m − 1 was successful and iterations k − m to k were unsuccessful, the update rules tell us that

    Δ_k = θ_{k−1} θ_{k−2} ⋯ θ_{k−m} φ_{k−m−1} Δ_{k−m−1},

so

    Δ_{k−m−1} = ( θ_{k−1} θ_{k−2} ⋯ θ_{k−m} φ_{k−m−1} )^{−1} Δ_k.

From (4.6), the assumptions placed on φ_k and Δ_k, and the fact that k ∈ Q we obtain

    Δ_max ≥ Δ_{k−m−1} ≥ (θ_max)^{−m} Δ*/(2 φ_max) ≥ Δ*/(2 φ_max).    (4.13)

Since k − m − 1 ≥ k̄, we conclude that k − m − 1 ∈ Q′. Moreover, (4.13) gives

    (θ_max)^m ≥ Δ*/(2 φ_max Δ_max).

Using the fact that log θ_max < 0 because θ_max < 1, we obtain

    m ≤ log( Δ*/(2 φ_max Δ_max) ) / log θ_max ≡ m*.

Note that m* does not depend on k. Thus, for each of the infinitely many unsuccessful iterations k ∈ Q there is a successful iteration in Q′ that precedes k by no more than m* + 1 iterations, so there must be infinitely many successful iterations in Q′.

Next, let ρ* = α f_typ ( Δ*/(2 φ_max) )². Then ρ* > 0. If k ∈ Q′ is a successful iteration, it follows from the step acceptance rule (4.3), the definitions (4.4) and (4.5), and the definition of Q′ that

    f(x_{k+1}) − f(x_k) < −ρ_k(Δ_k) ≤ −ρ* < 0.

Meanwhile, for all other iterations (successful iterations not in Q′ and unsuccessful iterations) we have f(x_{k+1}) ≤ f(x_k). Since there are infinitely many successful iterations in Q′, it follows that lim_{k→∞} f(x_k) = −∞, and the theorem is proved.

Theorems 4.2 and 4.3 then yield the following.

Theorem 4.4. Let f : R^n → R be continuously differentiable on Ω, and let {x_k} be the sequence of iterates produced by the generating set search method in Algorithm 4.1 with ε_k satisfying ε_k = Δ_k. If {x_k} is bounded, or if f is bounded below on Ω and ∇f is uniformly continuous on Ω, then

    lim_{k→∞, k ∈ U_T} ‖[−∇f(x_k)]_{T(x_k, ε_k)}‖ = 0.

5. Active set identification properties of GSS. It is straightforward to specify Algorithm 4.1 in a way that ensures that the conclusions of Theorems 4.2–4.4 hold. For instance, the usual choice of φ_k = 1 for all k ensures that Δ_k ≤ Δ_0 for all k. In this case the choice ε_max ≥ Δ_0 ensures that ε_k = Δ_k for all k, since {Δ_k} is then monotonically decreasing.
Theorems 4.2–4.4 then allow us to conclude that the active set identification results of Section 3 hold for the subsequence K = U_T of tangentially unsuccessful iterations produced by Algorithm 4.1. Thus, for problem (1.1) GSS has active set identification properties like those of gradient projection, even though GSS does not explicitly rely on derivatives of the objective.
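To make the iteration concrete, the following is a minimal sketch of the loop in Algorithm 4.1 for the simplest setting: bound constraints polled along fixed coordinate directions. The names (`gss`, `directions`, `is_feasible`) and the default parameter values are our own illustrative choices, not part of the paper; a full implementation must instead generate the core set G_k from the ε-tangent cone of the nearby constraints and may carry an additional direction set H_k.

```python
import numpy as np

def gss(f, x0, directions, is_feasible, delta0=1.0, delta_tol=1e-6,
        alpha=1e-4, f_typ=1.0, theta=0.5, phi=1.0, max_iter=10_000):
    """Illustrative sketch of the GSS loop (Algorithm 4.1), bound-constrained case."""
    x = np.asarray(x0, dtype=float)
    delta = delta0
    S, U = [], []                      # successful / unsuccessful iteration indices
    for k in range(max_iter):
        eps = delta                    # eps_k = min(eps_max, Delta_k) with eps_max = +inf
        fx = f(x)
        rho = alpha * max(f_typ, abs(fx)) * delta**2   # rho_k(Delta_k)
        for d in directions(x, eps):
            trial = x + delta * d
            # sufficient decrease criterion (4.3) at a feasible trial point
            if is_feasible(trial) and f(trial) < fx - rho:
                x = trial
                delta *= phi           # successful: phi_k >= 1
                S.append(k)
                break
        else:
            delta *= theta             # unsuccessful: contract per (4.6)
            U.append(k)
            if delta < delta_tol:
                break                  # termination test of Step 4
    return x, S, U
```

On a one-dimensional bound-constrained quadratic this sketch alternates successful steps with contractions of Δ_k until the termination test fires, returning an iterate near the constrained minimizer.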

When the sequence of tangentially unsuccessful iterates {x_k}, k ∈ U_T, converges to a KKT point x*, then Corollaries 3.4 and 3.5 allow us to make inferences about A(x*), the set of constraints binding at x*, from the working sets I(x_k, ε_k) at the tangentially unsuccessful iterates x_k. The even stronger results of Corollary 3.9 hold for the auxiliary sequence {x̂_k}, k ∈ U_T, defined in (3.3). In particular, Corollary 3.9 says that, under mild nondegeneracy assumptions, in a finite number of iterations x̂_k will identify the face containing x*.

We next turn to strategies for using the results of Section 3 to accelerate GSS algorithms. We begin by illustrating what can slow progress toward a solution if the search does not make better use of information about the working set I(x_k, ε_k). In Figure 4.2 the GSS algorithm makes slow and steady progress toward the KKT point. Figure 5.1 shows just how slow the progress might be by illustrating the next three iterations, magnified so that the process is easier to see. Notice the pattern that emerges: so long as the working set contains both of the constraints that are active at the solution, no progress is made, since there is no search direction that yields feasible improvement. But as soon as the working set drops back down to a single constraint, a feasible step to the East is possible, which is exactly what is needed to move the current iterate toward the solution at the vertex formed by the two constraints. This behavior is especially contrary, since the search fails to make progress at precisely those steps where it actually has information that suggests where the solution lies.

[Figure 5.1 panels:] (a) k = 6. New working set (w.r.t. k = 5). Change set of search directions. No decrease; halve Δ_k; k ∈ U and k ∈ U_T. (b) k = 7. Same working set. Keep set of search directions. No decrease; halve Δ_k; k ∈ U and k ∈ U_T. (c) k = 8. New working set (w.r.t. k = 7). Change set of search directions. Move East; k ∈ S.
Fig. 5.1. A continuation (at a finer scale) of the example given in Figure 4.2.

If Δ_k is sufficiently large, then at a tangentially unsuccessful step −∇f(x_k) lies primarily in the direction of N(x_k, ε_k). This suggests we include the generators of N(x_k, ε_k) in the set of additional search directions H_k [14], just as illustrated in Figures 4.2 and 5.1. However, in the preceding example there are no feasible steps along the generators of N(x_k, ε_k) once k > 1. In general, steps along generators of N(x_k, ε_k) usually capture only one constraint in A(x*) at a time, as seen in Figure 4.2(b). Typically, active set strategies that change one constraint in the working set at a time are not as efficient as those that allow multiple constraints to change [9]. Our resolution is to make use of x̂_k from (3.3). We now outline some active set strategies for GSS that make use of the sequence {x̂_k} and its properties described in Corollary 3.9, as well as other information we can infer from the working set I(x_k, ε_k).

5.1. Jumping onto the face identified by the working set. If at iteration k we find that (4.3) is not satisfied by any d ∈ G_k, then the step is tangentially unsuccessful. We then could try the point x̂_k defined by (3.3). If this trial point exists and satisfies (4.3), we could accept it. If it is not acceptable, we still could track A(x̂_k) as an estimate of A(x*). The rationale for the latter is that we may actually have A(x̂_k) = A(x*) but might not have taken the right step into this face.

A more aggressive strategy, which we have found to be more effective in our tests, is to try the step x̂_k at every iteration, tangentially unsuccessful or not. If this trial point exists and satisfies the acceptance criterion (4.3), we immediately accept it. This strategy is consistent with our stationarity results since it is equivalent to including the vector for this step in H_k. In this aggressive strategy we venture that if constraints are showing up in the working set, it may be because they are active at the solution. We can only guarantee that this is true asymptotically, and then only for the subsequence k ∈ U_T. But in our tests we found that GSS had surprisingly few iterations that were tangentially unsuccessful (i.e., k ∈ U_T) while also being successful (i.e., k ∈ S). In general, S ∩ U_T ≠ ∅. Moreover, to ascertain that an iteration is tangentially unsuccessful we must verify that (4.3) is not satisfied for any d ∈ G_k, which requires |G_k| evaluations of the objective. This can be expensive when evaluation of the objective is costly. On the other hand, determining whether the projection onto a face will be a success requires only one evaluation of the objective. If the step is successful, the work for the iteration is done; otherwise, we fall back on the usual strategy after the expense of this single speculative objective evaluation.
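As a sketch of the speculative step, suppose, as one plausible realization of (3.3), that x̂_k is the least-squares projection of x_k onto the affine set where the working-set constraints hold with equality. The helper below is hypothetical: the paper's exact definition of x̂_k is not reproduced in this section, so this stand-in is only an assumption for illustration.

```python
import numpy as np

def jump_to_face(x, A, b, working, f, rho):
    """Hypothetical sketch of the speculative step: project x onto the face
    {y : A[i] y = b[i], i in working} and test the acceptance criterion (4.3)."""
    Aw, bw = A[working], b[working]
    # minimum-norm correction onto the face via the pseudoinverse
    x_hat = x + np.linalg.pinv(Aw) @ (bw - Aw @ x)
    if f(x_hat) < f(x) - rho:          # sufficient decrease test (4.3)
        return x_hat, True             # accept: the iteration is done
    return x, False                    # reject: fall back on the usual search
```

Only one extra objective evaluation, f(x_hat), is spent when the speculative step is rejected, which is the trade-off discussed above.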
We accept the risk knowing that this simple strategy can be fooled, but confident in the knowledge that asymptotically it will work consistently in our favor. It is easy to see that the aggressive strategy might not succeed in the initial iterations if we start far from a solution. For instance, consider a problem with only lower and upper bounds on the variables, assume that x* is the vertex where all the upper bounds are active, and that x_0 is a feasible point near the vertex where all the lower bounds are active. Choose ε_0 so that all of the lower bounds, but none of the upper bounds, are contained in the working set I(x_0, ε_0). Then at k = 0 the projection will produce the vertex defined by the lower bounds, and most likely the objective function at that point will not satisfy (4.3). However, once the upper bounds begin to enter the working set, Corollary 3.9 tells us this strategy will succeed.

5.2. Restricting the search to the face identified by the working set. Corollary 3.9 tells us that a step to x̂_k is a step into a face of Ω that contains x* once k is sufficiently large. Once in such a face we have the option of giving priority to searching inside that face. Given an iteration k, let j be the last iteration preceding k that is known to be tangentially unsuccessful. If the current working set I(x_k, ε_k) satisfies I(x_k, ε_k) ⊆ I(x_j, ε_j), then we first search in the face containing x_k. If one of these trial steps satisfies (4.3), then we accept it and declare the iteration successful. However, if none of the steps restricted to the face satisfies (4.3), then we continue the search using the directions remaining in G_k and/or H_k, which may possibly move the search off of the current face. Figure 5.2 illustrates this strategy.
Rather than automatically attempting to evaluate f at all four of the trial points defined by Δ_k and the set of directions D_k, we start by evaluating the objective at trial steps in the face identified by the single constraint in the working set. So long as one of these two trial points is successful, we do not consider the trial steps defined by the two directions normal to this constraint.
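The face-first polling order described above might be sketched as follows. The routine simply partitions the candidate directions into those lying in the face, i.e., orthogonal to every working-set constraint normal, and the rest; it is an illustrative helper of our own, not code from the paper.

```python
import numpy as np

def face_first_order(A, working, directions):
    """Sketch: reorder trial directions so steps inside the face identified by
    the working set are polled before steps that would leave that face."""
    Aw = A[working]
    in_face, off_face = [], []
    for d in directions:
        # d stays in the face iff it is orthogonal to each working-set normal
        (in_face if np.allclose(Aw @ d, 0.0) else off_face).append(d)
    return in_face + off_face          # evaluate in-face trial steps first
```

If an in-face trial step satisfies the acceptance criterion, the off-face directions need never be evaluated that iteration, which is the source of the savings.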

The reasoning here is that, in general, the performance of direct search methods suffers as the number of optimization variables increases. If we can, with reasonable confidence, restrict the search to a face, then the reduced dimension of the problem should make the search more efficient. Again, this is a calculated risk. If the working set is a reasonable approximation of the active set, then searching first in the face for improvement makes sense. Furthermore, we do not risk premature convergence so long as, once we cannot find a successful step in the face, we return to a search in the full space.

Fig. 5.2. Giving priority to searching along directions inside a face believed to contain x*; the steps within the face defined by the constraint in the working set are specially marked. We hold the third generator for T(x_k, ε_k) in reserve in G_k in case it is needed for the search to progress, though that is not the case for this particular example.

Incidentally, Figure 5.2 illustrates why the working sets at tangentially successful iterations (i.e., k ∉ U_T) are not guaranteed to coincide asymptotically with the set of constraints that are active at x*. This can also be seen in Figures 4.2 and 5.1 by considering the subsequence k ∉ U_T = {0, 1, 5, 8, ...}, whereas if we consider the subsequence k ∈ U_T = {1, 2, 3, 4, 6, 7, ...} it is clear that the working sets for this subsequence of iterations correspond to A(x*).

5.3. Identifying vertex solutions. Another benefit of computing x̂_k is that Corollary 3.9 says that if {x_k} is converging to a KKT point x* which is a vertex, then {x̂_k} converges to x* in a finite number of iterations. If we find a successful step to a vertex and this is followed by an unbroken sequence of unsuccessful iterations, then this suggests that we have arrived at a KKT point. This is attractive since it gives us another mechanism for terminating the search.
Various analytical results, of which Theorem 4.2 is an example, bound appropriately chosen measures of stationarity in terms of the step-length control parameter Δ_k [6, 15, 13]. This makes the magnitude of Δ_k a practical measure of stationarity; hence our use of the test θ_k Δ_k < Δ_tol, for some Δ_tol > 0, for termination in Step 4 of Algorithm 4.1. However, we only decrease Δ_k at the end of an unsuccessful iteration. Furthermore, to determine that an iteration is unsuccessful requires verifying that (4.3) is not satisfied for any d ∈ G_k. Experience has shown that while GSS methods make good progress toward the solution, they can be slow and costly in verifying that a solution has been found, precisely because they lack explicit derivative information. In the case of vertex solutions the strategy outlined above gives us another stopping rule, one that might require far fewer objective evaluations, as becomes clear when considering the example illustrated in Figure 5.3.

5.4. Illustration of the active set strategies. Returning to the example used in Figure 4.2, Figure 5.3 illustrates what happens if the strategies in Sections 5.1–5.3 are incorporated into Algorithm 4.1. We demonstrate in the next section that these same strategies can be successfully applied to far more difficult problems from the CUTEr test suite.

[Figure 5.3 panels:] (a) k = 0. New working set. Try but reject the speculative step x̂_k. Move East; k ∈ S. (b) k = 1. New working set. Try and accept the speculative step x̂_k; k ∈ S and k ∈ U_T. (c) k = 2. Same working set. No decrease; halve Δ_k; k ∈ U and k ∈ U_T. (d) k = 3. Same working set. No decrease; halve Δ_k; k ∈ U and k ∈ U_T. (e) k = 4. Same working set. No decrease; halve Δ_k; k ∈ U and k ∈ U_T. (f) k = 5. Same working set. No decrease; halve Δ_k; k ∈ U and k ∈ U_T. Possibly stop.

Fig. 5.3. The effect of employing the strategies in Sections 5.1–5.3 on the example used in Figure 4.2. The speculative step x̂_k defined in (3.3) is specially marked. For any k > 1, Corollary 3.9 supports the conclusion for this example that x_k = x̂_k = x*, since x_k is at a vertex defined by the constraints in I(x_k, ε_k), I(x_k, ε_k) remains the same, and k ∈ U_T.

6. Numerical illustration. We demonstrate the effect of using the active set strategies outlined in Section 5 on eight problems drawn from the CUTEr test suite [10]. Basic characteristics of these eight problems are given in Table 6.1. Seven of the eight problems must deal with degeneracy of the linear constraints in the working set at least once during the course of the search; to do so we used the basic strategy outlined in [14]. We included the problem bleachng, even though it only has bounds on the variables, because it is representative of the type of simulation-based problems to which direct search methods are typically applied [11]. In the case of bleachng, the evaluation of the objective involves the numerical integration of differential equations.
The second derivatives of the objective are not available, computing the objective is relatively expensive, and the values obtained are only moderately accurate.


More information

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality

More information

The Karush-Kuhn-Tucker (KKT) conditions

The Karush-Kuhn-Tucker (KKT) conditions The Karush-Kuhn-Tucker (KKT) conditions In this section, we will give a set of sufficient (and at most times necessary) conditions for a x to be the solution of a given convex optimization problem. These

More information

Linear Discrimination Functions

Linear Discrimination Functions Laurea Magistrale in Informatica Nicola Fanizzi Dipartimento di Informatica Università degli Studi di Bari November 4, 2009 Outline Linear models Gradient descent Perceptron Minimum square error approach

More information

CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS

CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS Igor V. Konnov Department of Applied Mathematics, Kazan University Kazan 420008, Russia Preprint, March 2002 ISBN 951-42-6687-0 AMS classification:

More information

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS MATHEMATICS OF OPERATIONS RESEARCH Vol. 28, No. 4, November 2003, pp. 677 692 Printed in U.S.A. ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS ALEXANDER SHAPIRO We discuss in this paper a class of nonsmooth

More information

2.3 Linear Programming

2.3 Linear Programming 2.3 Linear Programming Linear Programming (LP) is the term used to define a wide range of optimization problems in which the objective function is linear in the unknown variables and the constraints are

More information

Key words. nonlinear programming, pattern search algorithm, derivative-free optimization, convergence analysis, second order optimality conditions

Key words. nonlinear programming, pattern search algorithm, derivative-free optimization, convergence analysis, second order optimality conditions SIAM J. OPTIM. Vol. x, No. x, pp. xxx-xxx c Paper submitted to Society for Industrial and Applied Mathematics SECOND ORDER BEHAVIOR OF PATTERN SEARCH MARK A. ABRAMSON Abstract. Previous analyses of pattern

More information

Maria Cameron. f(x) = 1 n

Maria Cameron. f(x) = 1 n Maria Cameron 1. Local algorithms for solving nonlinear equations Here we discuss local methods for nonlinear equations r(x) =. These methods are Newton, inexact Newton and quasi-newton. We will show that

More information

2.098/6.255/ Optimization Methods Practice True/False Questions

2.098/6.255/ Optimization Methods Practice True/False Questions 2.098/6.255/15.093 Optimization Methods Practice True/False Questions December 11, 2009 Part I For each one of the statements below, state whether it is true or false. Include a 1-3 line supporting sentence

More information

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained

More information

Mingbin Feng, John E. Mitchell, Jong-Shi Pang, Xin Shen, Andreas Wächter

Mingbin Feng, John E. Mitchell, Jong-Shi Pang, Xin Shen, Andreas Wächter Complementarity Formulations of l 0 -norm Optimization Problems 1 Mingbin Feng, John E. Mitchell, Jong-Shi Pang, Xin Shen, Andreas Wächter Abstract: In a number of application areas, it is desirable to

More information

Lecture 5 : Projections

Lecture 5 : Projections Lecture 5 : Projections EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Up until now, we have seen convergence rates of unconstrained gradient descent. Now, we consider a constrained minimization

More information

Identifying Active Constraints via Partial Smoothness and Prox-Regularity

Identifying Active Constraints via Partial Smoothness and Prox-Regularity Journal of Convex Analysis Volume 11 (2004), No. 2, 251 266 Identifying Active Constraints via Partial Smoothness and Prox-Regularity W. L. Hare Department of Mathematics, Simon Fraser University, Burnaby,

More information

Generalized Pattern Search Algorithms : unconstrained and constrained cases

Generalized Pattern Search Algorithms : unconstrained and constrained cases IMA workshop Optimization in simulation based models Generalized Pattern Search Algorithms : unconstrained and constrained cases Mark A. Abramson Air Force Institute of Technology Charles Audet École Polytechnique

More information

1 Directional Derivatives and Differentiability

1 Directional Derivatives and Differentiability Wednesday, January 18, 2012 1 Directional Derivatives and Differentiability Let E R N, let f : E R and let x 0 E. Given a direction v R N, let L be the line through x 0 in the direction v, that is, L :=

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Optimality, Duality, Complementarity for Constrained Optimization

Optimality, Duality, Complementarity for Constrained Optimization Optimality, Duality, Complementarity for Constrained Optimization Stephen Wright University of Wisconsin-Madison May 2014 Wright (UW-Madison) Optimality, Duality, Complementarity May 2014 1 / 41 Linear

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares

CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares Robert Bridson October 29, 2008 1 Hessian Problems in Newton Last time we fixed one of plain Newton s problems by introducing line search

More information

In view of (31), the second of these is equal to the identity I on E m, while this, in view of (30), implies that the first can be written

In view of (31), the second of these is equal to the identity I on E m, while this, in view of (30), implies that the first can be written 11.8 Inequality Constraints 341 Because by assumption x is a regular point and L x is positive definite on M, it follows that this matrix is nonsingular (see Exercise 11). Thus, by the Implicit Function

More information

A derivative-free nonmonotone line search and its application to the spectral residual method

A derivative-free nonmonotone line search and its application to the spectral residual method IMA Journal of Numerical Analysis (2009) 29, 814 825 doi:10.1093/imanum/drn019 Advance Access publication on November 14, 2008 A derivative-free nonmonotone line search and its application to the spectral

More information

8 Numerical methods for unconstrained problems

8 Numerical methods for unconstrained problems 8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields

More information

Implicitely and Densely Discrete Black-Box Optimization Problems

Implicitely and Densely Discrete Black-Box Optimization Problems Implicitely and Densely Discrete Black-Box Optimization Problems L. N. Vicente September 26, 2008 Abstract This paper addresses derivative-free optimization problems where the variables lie implicitly

More information

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725 Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee227c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee227c@berkeley.edu

More information

Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

More information

The Newton Bracketing method for the minimization of convex functions subject to affine constraints

The Newton Bracketing method for the minimization of convex functions subject to affine constraints Discrete Applied Mathematics 156 (2008) 1977 1987 www.elsevier.com/locate/dam The Newton Bracketing method for the minimization of convex functions subject to affine constraints Adi Ben-Israel a, Yuri

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

Implications of the Constant Rank Constraint Qualification

Implications of the Constant Rank Constraint Qualification Mathematical Programming manuscript No. (will be inserted by the editor) Implications of the Constant Rank Constraint Qualification Shu Lu Received: date / Accepted: date Abstract This paper investigates

More information

More First-Order Optimization Algorithms

More First-Order Optimization Algorithms More First-Order Optimization Algorithms Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 3, 8, 3 The SDM

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This

More information

4TE3/6TE3. Algorithms for. Continuous Optimization

4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Algorithms for Constrained Nonlinear Optimization Problems) Tamás TERLAKY Computing and Software McMaster University Hamilton, November 2005 terlaky@mcmaster.ca

More information

Constrained optimization

Constrained optimization Constrained optimization In general, the formulation of constrained optimization is as follows minj(w), subject to H i (w) = 0, i = 1,..., k. where J is the cost function and H i are the constraints. Lagrange

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 4 Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 2 4.1. Subgradients definition subgradient calculus duality and optimality conditions Shiqian

More information

The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization 1

The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization 1 October 2003 The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization 1 by Asuman E. Ozdaglar and Dimitri P. Bertsekas 2 Abstract We consider optimization problems with equality,

More information

Lecture 7: Semidefinite programming

Lecture 7: Semidefinite programming CS 766/QIC 820 Theory of Quantum Information (Fall 2011) Lecture 7: Semidefinite programming This lecture is on semidefinite programming, which is a powerful technique from both an analytic and computational

More information

MESH ADAPTIVE DIRECT SEARCH ALGORITHMS FOR CONSTRAINED OPTIMIZATION

MESH ADAPTIVE DIRECT SEARCH ALGORITHMS FOR CONSTRAINED OPTIMIZATION SIAM J. OPTIM. Vol. 17, No. 1, pp. 188 217 c 2006 Society for Industrial and Applied Mathematics MESH ADAPTIVE DIRECT SEARCH ALGORITHMS FOR CONSTRAINED OPTIMIZATION CHARLES AUDET AND J. E. DENNIS, JR.

More information

Barrier Method. Javier Peña Convex Optimization /36-725

Barrier Method. Javier Peña Convex Optimization /36-725 Barrier Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: Newton s method For root-finding F (x) = 0 x + = x F (x) 1 F (x) For optimization x f(x) x + = x 2 f(x) 1 f(x) Assume f strongly

More information

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING XIAO WANG AND HONGCHAO ZHANG Abstract. In this paper, we propose an Augmented Lagrangian Affine Scaling (ALAS) algorithm for general

More information

ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints

ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints Instructor: Prof. Kevin Ross Scribe: Nitish John October 18, 2011 1 The Basic Goal The main idea is to transform a given constrained

More information

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Robert M. Freund March, 2004 2004 Massachusetts Institute of Technology. The Problem The logarithmic barrier approach

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 5 Nonlinear Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

Gradient Descent. Dr. Xiaowei Huang

Gradient Descent. Dr. Xiaowei Huang Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,

More information

Key words. Pattern search, linearly constrained optimization, derivative-free optimization, degeneracy, redundancy, constraint classification

Key words. Pattern search, linearly constrained optimization, derivative-free optimization, degeneracy, redundancy, constraint classification PATTERN SEARCH METHODS FOR LINEARLY CONSTRAINED MINIMIZATION IN THE PRESENCE OF DEGENERACY OLGA A. BREZHNEVA AND J. E. DENNIS JR. Abstract. This paper deals with generalized pattern search (GPS) algorithms

More information

Nonlinear Programming and the Kuhn-Tucker Conditions

Nonlinear Programming and the Kuhn-Tucker Conditions Nonlinear Programming and the Kuhn-Tucker Conditions The Kuhn-Tucker (KT) conditions are first-order conditions for constrained optimization problems, a generalization of the first-order conditions we

More information

Convex Optimization. Ofer Meshi. Lecture 6: Lower Bounds Constrained Optimization

Convex Optimization. Ofer Meshi. Lecture 6: Lower Bounds Constrained Optimization Convex Optimization Ofer Meshi Lecture 6: Lower Bounds Constrained Optimization Lower Bounds Some upper bounds: #iter μ 2 M #iter 2 M #iter L L μ 2 Oracle/ops GD κ log 1/ε M x # ε L # x # L # ε # με f

More information

ORIE 6326: Convex Optimization. Quasi-Newton Methods

ORIE 6326: Convex Optimization. Quasi-Newton Methods ORIE 6326: Convex Optimization Quasi-Newton Methods Professor Udell Operations Research and Information Engineering Cornell April 10, 2017 Slides on steepest descent and analysis of Newton s method adapted

More information

c 1998 Society for Industrial and Applied Mathematics

c 1998 Society for Industrial and Applied Mathematics SIAM J. OPTIM. Vol. 9, No. 1, pp. 179 189 c 1998 Society for Industrial and Applied Mathematics WEAK SHARP SOLUTIONS OF VARIATIONAL INEQUALITIES PATRICE MARCOTTE AND DAOLI ZHU Abstract. In this work we

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 6 Optimization Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction permitted

More information

Numerical Optimization of Partial Differential Equations

Numerical Optimization of Partial Differential Equations Numerical Optimization of Partial Differential Equations Part I: basic optimization concepts in R n Bartosz Protas Department of Mathematics & Statistics McMaster University, Hamilton, Ontario, Canada

More information

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2

Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Methods for Unconstrained Optimization Numerical Optimization Lectures 1-2 Coralia Cartis, University of Oxford INFOMM CDT: Modelling, Analysis and Computation of Continuous Real-World Problems Methods

More information