Mesh adaptive direct search algorithms for mixed variable optimization


Optim Lett (2009) 3:35–47
ORIGINAL PAPER

Mark A. Abramson · Charles Audet · James W. Chrissis · Jennifer G. Walston

Received: 17 July 2007 / Accepted: 27 May 2008 / Published online: 20 June 2008
© Springer-Verlag 2008

Abstract This paper introduces a new derivative-free class of mesh adaptive direct search (MADS) algorithms for solving constrained mixed variable optimization problems, in which the variables may be continuous or categorical. This new class of algorithms, called mixed variable MADS (MV-MADS), generalizes both mixed variable pattern search (MVPS) algorithms for linearly constrained mixed variable problems and MADS algorithms for general constrained problems with only continuous variables. The convergence analysis, which makes use of the Clarke nonsmooth calculus, similarly generalizes the existing theory for both MVPS and MADS algorithms, and reasonable conditions are established for ensuring convergence of a subsequence of iterates to a suitably defined stationary point in the nonsmooth and mixed variable sense.

M. A. Abramson: The Boeing Company, P.O. Box 3707, Mail Code 7L-21, Seattle, WA, USA. Abramson.Mark@gmail.com
C. Audet: Département de Mathématiques et de Génie Industriel, École Polytechnique de Montréal and GERAD, C.P. 6079, Succ. Centre-ville, Montréal, QC, Canada H3C 3A7. Charles.Audet@gerad.ca
J. W. Chrissis: Department of Operational Sciences, Air Force Institute of Technology, 2950 Hobson Way, Wright Patterson AFB, OH 45433, USA. James.Chrissis@afit.edu
J. G. Walston: Air Force Logistics Management Agency, AFLMA/LGY, 501 Ward Street, Maxwell AFB-Gunter Annex, AL 36114, USA. Jennifer.Walston@us.af.mil

Keywords Nonlinear programming · Mesh adaptive direct search · Mixed variables · Derivative-free optimization · Convergence analysis

1 Introduction

In this paper, we generalize the class of mesh adaptive direct search (MADS) algorithms to mixed variable optimization problems and establish a unifying convergence theory, so that the existing theorems act as corollaries to the results presented here. This is done in a relatively straightforward manner, somewhat similar to the work of Audet and Dennis [7] in their extension of pattern search algorithms to bound constrained mixed variable problems.

Mixed variable optimization problems are characterized by a combination of continuous and categorical variables, the latter being discrete variables that must take their values from a finite pre-defined list or set of categories. Categorical variables may be nonnumeric, such as color, shape, or type of material; thus, traditional approaches involving branch-and-bound for solving mixed integer nonlinear programming (MINLP) problems are not directly applicable. There are cases where a problem modeled using categorical variables may be reformulated using nonlinear programming; for example, see [1] for a non-trivial reformulation of the problem presented in [17]. Such reformulations are impossible in cases where the objective or constraint functions are provided as black boxes.

To be as general as possible, we allow for the case where changes in the categorical variable values can mean a change in problem dimensions. We denote the maximum dimensions of the continuous and discrete variables by $n^c$ and $n^d$, respectively, and we partition each point $x = (x^c, x^d)$ into its continuous and categorical components, so that $x^c \in \Omega^c \subseteq \mathbb{R}^{n^c}$ and $x^d \in \Omega^d \subseteq \mathbb{Z}^{n^d}$. We adopt the convention of ignoring unused variables. The problem under consideration is given by
$$\min_{x \in \Omega} f(x) \qquad (1)$$
where $f : \Omega \to \mathbb{R} \cup \{\infty\}$, and the domain $\Omega$ (feasible region) is the union of continuous domains across possible discrete variable values; i.e., $\Omega = \bigcup_{x^d \in \Omega^d} \big( \Omega^c(x^d) \times \{x^d\} \big)$, with the convention that $x = x^c$ and $\Omega = \Omega^c$ if $n^d = 0$.

We treat the constraints by the extreme barrier approach of applying our class of algorithms, not to $f$, but to the barrier objective function $f_\Omega = f + \psi_\Omega$, where $\psi_\Omega$ is the indicator function for $\Omega$; i.e., it is zero on $\Omega$ and infinity elsewhere. If a point $x$ does not belong to $\Omega$, then we set $f_\Omega(x) = \infty$, and $f$ is not evaluated. This is important in many practical engineering problems where $f$ is expensive to evaluate. In fact, this approach is suitable for general set constraints, in which the only output of a constraint evaluation is a binary response indicating feasibility or not.
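The extreme barrier is straightforward to implement. The following minimal Python sketch (hypothetical names, not code from the paper) wraps a black-box objective and a set-membership oracle into the barrier function $f_\Omega$, never evaluating $f$ at infeasible points:

```python
import math

def make_barrier_objective(f, is_feasible):
    """Return the extreme-barrier function f_Omega = f + psi_Omega.

    f           : callable evaluating the (possibly expensive) objective
    is_feasible : callable returning True iff the point lies in Omega;
                  this binary answer is the only constraint information needed.
    """
    def f_omega(x):
        # If x is infeasible, return +inf WITHOUT evaluating f,
        # which matters when f is expensive to compute.
        if not is_feasible(x):
            return math.inf
        return f(x)
    return f_omega

# Tiny illustration with a single continuous variable (n^d = 0).
f_omega = make_barrier_objective(lambda x: (x - 2.0) ** 2,
                                 lambda x: x >= 0.0)
print(f_omega(3.0))   # 1.0
print(f_omega(-1.0))  # inf (f is never evaluated here)
```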

The class of MADS algorithms was introduced and analyzed by Audet and Dennis [10] as an extension of generalized pattern search (GPS) algorithms [8,18,24] for solving nonlinearly constrained problems in the absence of categorical variables. Pattern search was first introduced and analyzed by Torczon [24] as a derivative-free class of algorithms for unconstrained problems. It was extended to bound [19] and linearly [20] constrained problems by Lewis and Torczon. In all these cases, convergence of a subsequence of iterates to a first-order stationary point was proved, provided that the objective function is continuously differentiable on a compact level set containing all the iterates. Using the Clarke nonsmooth calculus [13], Audet and Dennis [8] analyzed the class of generalized pattern search (GPS) algorithms for problems with weaker assumptions on the smoothness of the objective function, and they established a hierarchy of results such that the previous work acts as corollaries to their own. In another work, Audet and Dennis [7] extended pattern search to mixed variable problems. The approach they used to handle the categorical variables is almost identical to that of the present work, and will be described in Sect. 2.

In GPS, the algorithm systematically evaluates trial points that lie on a mesh or lattice centered at the current iterate. The mesh is defined by a finite set of positive spanning directions, that is, directions such that any vector in the space can be represented as a nonnegative linear combination of directions in the set [15]. In handling finitely many linear constraints [20], Lewis and Torczon introduced a scheme for selecting these directions so that they generate the tangent cone with respect to any nearby constraints. This scheme is vital to maintaining the algorithm's convergence properties. However, for nonlinearly constrained problems, generating the tangent cone requires the use of an infinite set of positive spanning directions, which violates GPS theory.

Two key attempts have been made to overcome this limitation, but both have serious drawbacks. The augmented Lagrangian approach of Lewis and Torczon [21] suffers from the same practical issues as most penalty function approaches; namely, numerical performance is highly dependent on the choice of algorithm parameters and the problem being solved. The filter-GPS algorithm of Audet and Dennis [9] works well in practice, but convergence to a stationary point is not guaranteed; in fact, there are known counter-examples [6]. The filter-GPS algorithm has been extended to mixed variable problems [2,5], but these limitations remain.

The class of MADS algorithms [10] was introduced as an alternative to filter-GPS, but with much stronger convergence properties. In MADS, a new parameter is added that gives the algorithm enough flexibility in its choice of positive spanning directions that the variable space is explored in an asymptotically dense set of directions. Under reasonable assumptions, this enables convergence to both first-order [10] and second-order [3] stationary points in the Clarke [13] sense, depending on the assumptions made about the smoothness of the objective function. The present work extends MADS to mixed variable problems in roughly the same way that Audet and Dennis [7] did for GPS, but with a convergence theory that makes use of the Clarke calculus.

The paper is organized as follows.
Section 2 describes the mixed variable MADS (MV-MADS) algorithm in detail, Sect. 3 contains new theoretical convergence results for the MV-MADS algorithm, and Sect. 4 offers some concluding remarks.

Notation. $\mathbb{R}$, $\mathbb{Z}$, and $\mathbb{N}$ denote the sets of real numbers, integers, and nonnegative integers, respectively. For any set $S$, $\mathrm{int}(S)$ denotes its interior and $\mathrm{cl}(S)$ its closure. For any matrix $A$, the notation $a \in A$ means that $a$ is a column of $A$. For $x^c \in \mathbb{R}^{n^c}$ and $\varepsilon > 0$, we denote by $B_\varepsilon(x^c)$ the open ball $\{ y \in \mathbb{R}^{n^c} : \|y - x^c\| < \varepsilon \}$.

2 Mixed variable MADS

As in [2,5,7], local optimality is defined in terms of local neighborhoods. However, since there is no metric associated with categorical variables, the notion of a local neighborhood must be defined by the user. This is well-defined for continuous variables, but not for categorical variables; special knowledge of the underlying engineering process or physical problem may be the only guide. We can define a general local neighborhood in terms of a set-valued function $\mathcal{N} : \Omega \to 2^\Omega$, where $2^\Omega$ denotes the power set (the set of all possible subsets) of $\Omega$. By convention, we assume that for all $x \in \Omega$, the user-defined set $\mathcal{N}(x)$ is finite and $x \in \mathcal{N}(x)$. As an example, one common choice of neighborhood function for integer variables is the one defined by $\mathcal{N}(x) = \{ y \in \Omega : y^c = x^c, \ \|y^d - x^d\|_1 \le 1 \}$ (a minimal sketch of such a function is given after Definition 1 below). However, categorical variables may have no inherent metric, which would make this particular choice inapplicable.

With this construction, the classical definition of local optimality is extended to mixed variable domains by the following definition, which is similar to one found in [7].

Definition 1 A point $x = (x^c, x^d) \in \Omega$ is said to be a local minimizer of $f$ on $\Omega$ with respect to the set of neighbors $\mathcal{N}(x) \subseteq \Omega$ if there exists an $\varepsilon > 0$ such that $f(x) \le f(v)$ for all $v$ in the set $\Omega \cap \bigcup_{y \in \mathcal{N}(x)} \big( B_\varepsilon(y^c) \times \{y^d\} \big)$.
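As a concrete illustration of a user-supplied neighborhood function, the following Python sketch implements the integer-variable choice mentioned above for the special case where the categorical variables happen to be integer-valued; the tuple representation of points and the names used are assumptions of this illustration, not constructs from the paper.

```python
def integer_neighbors(x, omega_d):
    """Discrete neighbors N(x) = {y : y^c = x^c, ||y^d - x^d||_1 <= 1}.

    x       : pair (xc, xd) with xc a tuple of floats and xd a tuple of ints
    omega_d : collection of admissible discrete value combinations
    Returns a finite list of neighbors that includes x itself.
    """
    xc, xd = x
    neighbors = [(xc, xd)]  # by convention, x belongs to its own neighborhood
    for i in range(len(xd)):
        for step in (-1, +1):
            yd = xd[:i] + (xd[i] + step,) + xd[i + 1:]
            if yd in omega_d:            # keep only admissible discrete values
                neighbors.append((xc, yd))
    return neighbors

# Example: two categorical variables, each taking values in {0, 1, 2}.
omega_d = {(i, j) for i in range(3) for j in range(3)}
print(integer_neighbors(((0.5,), (1, 0)), omega_d))
```

For truly categorical variables with no metric, such a function would instead enumerate the problem-specific alternatives (e.g., admissible material types) deemed "adjacent" by the user.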

Each iteration $k$ of a MADS algorithm is characterized by an optional search step, a local poll step, and an extended poll step, in which $f_\Omega$ is evaluated at specified points. Every trial point must be chosen on an underlying mesh $M_k$ defined on the space of variables, whose fineness or coarseness is dictated by a nonnegative scalar $\Delta^m_k$ called the mesh size parameter. The goal of each iteration is to find a feasible improved mesh point; i.e., a point $y \in M_k$ for which $f_\Omega(y) < f_\Omega(x_k)$, where $x_k \in \Omega$ is the current iterate, or incumbent best point found thus far. The mesh is purely conceptual and is never explicitly constructed; instead, mesh points are generated as necessary in the algorithm.

Consistent with [5], the mesh is described as follows. For each combination $i = 1, 2, \ldots, i_{\max}$ of values that the discrete variables may possibly take, let $D^i = G^i Z^i$ be a set of positive spanning directions [15] (i.e., nonnegative linear combinations of the columns of $D^i$ must span $\mathbb{R}^{n^c}$), where $G^i \in \mathbb{R}^{n^c \times n^c}$ is a nonsingular generating matrix and $Z^i \in \mathbb{Z}^{n^c \times |D^i|}$.

The mesh $M_k$ at iteration $k$ is formed as the direct product of $\Omega^d$ with the union of a finite number of lattices in $\Omega^c$; i.e.,
$$M_k = \left( \bigcup_{i=1}^{i_{\max}} M^i_k \right) \times \Omega^d \quad \text{with} \quad M^i_k = \bigcup_{x \in V_k} \{ x^c + \Delta^m_k D^i z : z \in \mathbb{N}^{|D^i|} \} \subseteq \mathbb{R}^{n^c}, \qquad (2)$$
where $\Delta^m_k > 0$ is the mesh size parameter and $V_k$ denotes the set of all previously evaluated trial points at iteration $k$ ($V_0$ is the set of initial trial points). Furthermore, the neighborhood function must be constructed so that all discrete neighbors lie on the current mesh; i.e., $\mathcal{N}(x_k) \subseteq M_k$ for all $k$.

The search step allows evaluation of $f_\Omega$ at any finite set of mesh points. Any strategy may be used, including none. The search step adds nothing to the convergence theory, but well-chosen search strategies, such as those that make use of surrogates, can greatly improve algorithm performance (see [4,11,12,22]).

The poll step includes both a traditional MADS [10] poll with respect to the continuous variables and an evaluation of the discrete neighbors. In the MADS poll step, a second mesh parameter $\Delta^p_k > 0$, called the poll size parameter, is introduced, which satisfies $\Delta^m_k \le \Delta^p_k$ for all $k$ and
$$\lim_{k \in K} \Delta^m_k = 0 \iff \lim_{k \in K} \Delta^p_k = 0 \quad \text{for any infinite subset of indices } K. \qquad (3)$$
Thus, GPS becomes the specific MADS instance in which $\Delta_k = \Delta^p_k = \Delta^m_k$, where $\Delta_k$ is the mesh size parameter in the notation of [8].

At iteration $k$, let $D_k(x)$ denote the positive spanning set of poll directions for some $x \in V_k$ corresponding to the $i_0$-th set of discrete variable values. The set of points generated when polling about $x$ with respect to the continuous variables is called a frame, and $x$ is called the frame center. The formal definition given below (generalized slightly from [10]) allows more flexibility than GPS, which requires $D_k$ to be a subset of the finite set $D^{i_0}$.

Definition 2 At iteration $k$, the MADS frame is defined to be the set
$$P_k(x) = \{ (x^c + \Delta^m_k d, x^d) : d \in D_k(x) \} \subseteq M_k,$$
where $D_k(x)$ is a positive spanning set such that for each $d \in D_k(x)$:
- $d \neq 0$ can be written as a nonnegative integer combination of the directions in $D$: $d = Du$ for some vector $u \in \mathbb{N}^{n_D}$ that may depend on the iteration number $k$;
- the distance from the frame center $x$ to a poll point $(x^c + \Delta^m_k d, x^d)$ is bounded above by a constant times the poll size parameter: $\Delta^m_k \|d\| \le \Delta^p_k \max\{ \|d'\| : d' \in D \}$;
- limits (as defined in Coope and Price [14]) of the normalized sets $D_k(x)$ are positive spanning sets.
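As an illustration of the frame in Definition 2, the following minimal Python sketch (a hypothetical example, not code from the paper) builds a GPS-like frame from the fixed coordinate directions $\pm e_i$; an actual MADS instance such as LTMADS, described next, would instead draw $D_k(x)$ so that the normalized poll directions become asymptotically dense.

```python
def poll_frame(xc, xd, delta_m, directions):
    """Frame P_k(x) = {(x^c + delta_m * d, x^d) : d in D_k(x)} of Definition 2.

    xc         : tuple of continuous variable values
    xd         : tuple of categorical values, held fixed during the poll
    delta_m    : current mesh size parameter
    directions : list of integer direction vectors forming a positive spanning set
    """
    return [(tuple(c + delta_m * di for c, di in zip(xc, d)), xd)
            for d in directions]

# GPS-like positive spanning set {e1, e2, -e1, -e2} in R^2.
D = [(1, 0), (0, 1), (-1, 0), (0, -1)]
for point in poll_frame((1.0, -0.5), ("steel",), 0.25, D):
    print(point)
```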

In [10], an instance of MADS, called LTMADS, is presented in which the closure of the cone generated by the set of normalized directions $\bigcup_{k=1}^{\infty} \left\{ \frac{d}{\|d\|} : d \in D_k \right\}$ equals $\mathbb{R}^n$ with probability one. In this case, we say that the set of poll directions is asymptotically dense in $\mathbb{R}^n$ with probability one.

Thus the poll step consists of evaluating points in $P_k(x_k) \cup \mathcal{N}(x_k)$. If the poll step fails to generate a lower objective function value, then the extended poll step is performed around each promising point in $\mathcal{N}(x_k)$ whose objective function value is sufficiently close to the incumbent value. That is, for a fixed positive scalar $\xi$, if $y \in \mathcal{N}(x_k)$ satisfies $f_\Omega(x_k) \le f_\Omega(y) < f_\Omega(x_k) + \xi_k$ for some user-specified tolerance value $\xi_k \ge \xi$ (called the extended poll trigger), then a finite sequence of poll steps about the points $\{ y^j_k \}_{j=1}^{J_k}$ is performed, beginning with $y^0_k = y_k \in \mathcal{N}(x_k)$ and ending with $z_k = y^{J_k}_k$. The extended poll endpoint $z_k$ occurs when either $f_\Omega(z^c_k + \Delta^m_k d, z^d_k) < f_\Omega(x_k)$ for some $d \in D_k(z_k)$, or when $f_\Omega(x_k) \le f_\Omega(z^c_k + \Delta^m_k d, z^d_k)$ for all $d \in D_k(z_k)$. In Sect. 3, we make the common assumption that all iterates lie in a compact set, which ensures that $J_k$ is always finite, i.e., that any extended poll step generates a finite number of trial points. The set of extended poll points can be expressed as
$$X_k(\xi_k) = \bigcup_{y_k \in \mathcal{N}^{\xi_k}_k} \ \bigcup_{j=1}^{J_k} P_k(y^j_k), \qquad (4)$$
where $\mathcal{N}^{\xi_k}_k := \{ y \in \mathcal{N}(x_k) : f_\Omega(x_k) \le f_\Omega(y) \le f_\Omega(x_k) + \xi_k \}$. In practice, the parameter $\xi_k$ is typically set as a percentage of the objective function value (but bounded away from zero), such as $\xi_k = \max\{ \xi, 0.05 |f(x_k)| \}$. Higher values of $\xi_k$ generate more extended polling, which is more costly, but which may lead to a better local solution, since more of the design space is searched.

As soon as any of the three steps is successful in finding an improved mesh point, the iteration ends, the improved mesh point becomes the new current iterate $x_{k+1} \in \Omega$, and the mesh is either retained or coarsened. If no improved mesh point is found, then $P_k(x_k)$ is said to be a minimal frame with minimal frame center $x_k$, the minimal frame center is retained as the current iterate (i.e., $x_{k+1} = x_k$), and the mesh is refined. Rules for refining and coarsening the mesh are the same as in [7,10]. Given a fixed rational number $\tau > 1$ and two integers $w^- \le -1$ and $w^+ \ge 0$, the mesh size parameter $\Delta^m_k$ is updated according to the rule
$$\Delta^m_{k+1} = \tau^{w_k} \Delta^m_k \quad \text{for some } w_k \in \begin{cases} \{0, 1, \ldots, w^+\} & \text{if an improved mesh point is found,} \\ \{w^-, w^- + 1, \ldots, -1\} & \text{otherwise.} \end{cases} \qquad (5)$$

The class of MV-MADS algorithms is stated formally in Fig. 1.

Fig. 1 A general MV-MADS algorithm
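As a rough illustration of how the search, poll, extended poll, and mesh update of rule (5) fit together within one iteration, the following Python sketch gives a schematic reading of the method described above; the helper callables (`search_points`, `poll_points`, `neighbors`, `extended_poll`), the opportunistic evaluation order, and the fixed update exponents are assumptions of this sketch, not the formal statement in Fig. 1.

```python
def mvmads_iteration(x, f_omega, delta_m, tau=4.0, w_plus=1, w_minus=-1, xi=0.01,
                     search_points=None, poll_points=None, neighbors=None,
                     extended_poll=None):
    """One schematic MV-MADS iteration: returns (next incumbent, next mesh size).

    All trial points produced by the helpers are assumed to lie on the mesh M_k,
    and the poll size parameter is assumed to be updated consistently with (3).
    """
    fx = f_omega(x)
    trial = list(search_points(x, delta_m)) if search_points else []  # optional SEARCH
    trial += poll_points(x, delta_m) + neighbors(x)                    # POLL + discrete neighbors
    for y in trial:
        if f_omega(y) < fx:                       # improved mesh point: coarsen mesh
            return y, delta_m * tau ** w_plus
    xi_k = max(xi, 0.05 * abs(fx))                # extended poll trigger
    for y in neighbors(x):
        if fx <= f_omega(y) < fx + xi_k:          # promising discrete neighbor
            z = extended_poll(y, delta_m)         # finite sequence of polls started at y
            if f_omega(z) < fx:
                return z, delta_m * tau ** w_plus
    return x, delta_m * tau ** w_minus            # minimal frame: refine mesh
```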

3 Convergence results

In this section, we establish convergence results for the new MV-MADS algorithm. Many of these results will appear very similar to results from either the MADS algorithm [10] or mixed variable pattern search [7]. Before we present our main results, we review some definitions, assumptions, and preliminary results, followed by a subsection that extends some Clarke calculus ideas to mixed variables.

3.1 Preliminaries

The convergence analysis relies on the following standard assumptions, which are essentially identical to those made in [2,5,7].

A1. An initial point $x_0$ with $f_\Omega(x_0) < \infty$ is available.
A2. All iterates $\{x_k\}$ generated by MV-MADS lie in a compact set.
A3. The set of discrete neighbors $\mathcal{N}(x_k)$ lies on the mesh $M_k$.

Under these assumptions, the following results are obtained by proofs that are identical to those found in [2,7] for mixed variable GPS:
- $\liminf_{k \to +\infty} \Delta^p_k = \liminf_{k \to +\infty} \Delta^m_k = 0$;
- there exists a refining subsequence $\{x_k\}_{k \in K}$ of minimal frame centers for which there are limit points $\hat{x} = \lim_{k \in K} x_k$, $\hat{y} = \lim_{k \in K} y_k$, and $\hat{z} = (\hat{z}^c, \hat{y}^d) = \lim_{k \in K} z_k$, where each $z_k \in \Omega$ is the endpoint of the extended poll step initiated at $y_k \in \mathcal{N}(x_k)$, and $\lim_{k \in K} \Delta^p_k = 0$.

The notation used in identifying these limit points will be retained and used throughout the remainder of this paper. Some of the results that follow require the additional assumption that $\hat{y} \in \mathcal{N}(\hat{x})$.

For the main results, the following four definitions [13,16,23] are needed. They have been adapted to our context, where only a subset of the variables are continuous. The standard definitions follow when all variables are continuous, i.e., when $n^d = 0$ and $x = x^c$.

Definition 3 A vector $v \in \mathbb{R}^{n^c}$ is said to be a hypertangent vector to the continuous variables of the set $\Omega$ at the point $x = (x^c, x^d) \in \Omega$ if there exists a scalar $\varepsilon > 0$ such that $(y + tw, x^d) \in \Omega$ for all $y \in B_\varepsilon(x^c)$ with $(y, x^d) \in \Omega$, $w \in B_\varepsilon(v)$, and $0 < t < \varepsilon$. The set $T^H_\Omega(x)$ of all hypertangent vectors to $\Omega$ at $x$ is called the hypertangent cone to $\Omega$ at $x$.

Definition 4 A vector $v \in \mathbb{R}^{n^c}$ is said to be a Clarke tangent vector to the continuous variables of the set $\Omega$ at the point $x = (x^c, x^d) \in \mathrm{cl}(\Omega)$ if for every sequence $\{y_k\}$ that converges to $x^c$ with $(y_k, x^d) \in \Omega$ and for every sequence of positive real numbers $\{t_k\}$ converging to zero, there exists a sequence of vectors $\{w_k\}$ converging to $v$ such that $(y_k + t_k w_k, x^d) \in \Omega$. The set $T^{Cl}_\Omega(x)$ of all Clarke tangent vectors to $\Omega$ at $x$ is called the Clarke tangent cone to $\Omega$ at $x$.

Definition 5 A vector $v \in \mathbb{R}^{n^c}$ is said to be a tangent vector to the continuous variables of the set $\Omega$ at the point $x = (x^c, x^d) \in \mathrm{cl}(\Omega)$ if there exists a sequence $\{y_k\}$ that converges to $x^c$ with $(y_k, x^d) \in \Omega$ and a sequence of positive real numbers $\{\lambda_k\}$ for which $v = \lim_k \lambda_k (y_k - x^c)$. The set $T^{Co}_\Omega(x)$ of all tangent vectors to $\Omega$ at $x$ is called the contingent cone to $\Omega$ at $x$.

Definition 6 The set $\Omega$ is said to be regular at $x$ if $T^{Cl}_\Omega(x) = T^{Co}_\Omega(x)$.
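To make Definitions 3 through 6 concrete, consider the following simple illustration (not an example from the paper): fix $x^d$, suppose $\Omega^c(x^d) = \{ y \in \mathbb{R}^2 : y_1 \ge 0, \ y_2 \ge 0 \}$, and take $x = (x^c, x^d)$ with $x^c = (0, 0)$. Then
$$T^H_\Omega(x) = \{ v \in \mathbb{R}^2 : v_1 > 0, \ v_2 > 0 \}, \qquad T^{Cl}_\Omega(x) = T^{Co}_\Omega(x) = \{ v \in \mathbb{R}^2 : v_1 \ge 0, \ v_2 \ge 0 \};$$
the hypertangent cone is the open positive orthant (one can move a short distance along any direction $w$ near $v$ from every nearby feasible $y$ and remain feasible), the Clarke tangent cone is its closure, and $\Omega$ is regular at $x$ in the sense of Definition 6, as is the case whenever $\Omega^c(x^d)$ is convex.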

3.2 Extension of the Clarke calculus to mixed variables

For the results of this section, we make use of a generalization [16] of the Clarke [13] directional derivative, in which function evaluations are restricted to points in the domain. Furthermore, we restrict the notions of generalized directional derivatives and gradients to the subspace of continuous variables. The generalized directional derivative of a locally Lipschitz function $f$ at $x = (x^c, x^d) \in \Omega$ in the direction $v \in \mathbb{R}^{n^c}$ is defined by
$$f^\circ(x; v) := \limsup_{\substack{y \to x^c, \ (y, x^d) \in \Omega \\ t \downarrow 0, \ (y + tv, x^d) \in \Omega}} \frac{f(y + tv, x^d) - f(y, x^d)}{t}. \qquad (6)$$
Furthermore, it is shown in [10] that if $T^H_\Omega(x)$ is not empty and $v \in T^{Cl}_\Omega(x)$, then
$$f^\circ(x; v) = \lim_{\substack{u \to v \\ u \in T^H_\Omega(x)}} f^\circ(x; u). \qquad (7)$$

We similarly generalize other derivative ideas. We denote by $\nabla f(x) \in \mathbb{R}^{n^c}$ and $\partial f(x) \subseteq \mathbb{R}^{n^c}$, respectively, the gradient and generalized gradient of the function $f$ at $x = (x^c, x^d) \in \Omega$ with respect to the continuous variables $x^c$ while holding the categorical variables $x^d$ constant. In particular, the generalized gradient of $f$ at $x$ (see [13]) with respect to the continuous variables is defined by
$$\partial f(x) := \left\{ s \in \mathbb{R}^{n^c} : f^\circ(x; v) \ge v^T s \ \text{for all} \ v \in \mathbb{R}^{n^c} \right\}.$$
The function $f$ is said to be strictly differentiable at $x$ with respect to the continuous variables if the generalized gradient of $f$ with respect to the continuous variables at $x$ is a singleton; i.e., $\partial f(x) = \{ \nabla f(x) \}$.

The final definition, which is adapted from [10] for the mixed variable case, provides some nonsmooth terminology for stationarity.

Definition 7 Let $f$ be Lipschitz near $x^* \in \Omega$. Then $x^*$ is said to be a Clarke, or contingent, stationary point of $f$ over $\Omega$ with respect to the continuous variables if $f^\circ(x^*; v) \ge 0$ for every direction $v$ in the Clarke tangent cone, or contingent cone, to $\Omega$ at $x^*$, respectively. In addition, $x^*$ is said to be a Clarke, or contingent, KKT stationary point of $f$ over $\Omega$ if $-\nabla f(x^*)$ exists and belongs to the polar of the Clarke tangent cone, or contingent cone, to $\Omega$ at $x^*$, respectively.

If $\Omega^c(x^{*d}) = \mathbb{R}^{n^c}$ or $x^{*c}$ lies in the relative interior of $\Omega^c(x^{*d})$, then a stationary point as described by Definition 7 meets the condition that $f^\circ(x^*; v) \ge 0$ for all $v \in \mathbb{R}^{n^c}$. This is equivalent to $0 \in \partial f(x^*)$.
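As a simple illustration of these notions (again, not an example from the paper), take $n^d = 0$, $\Omega = \mathbb{R}$, and $f(x) = |x|$, which is Lipschitz but not differentiable at $x = 0$. Then
$$f^\circ(0; v) = \limsup_{y \to 0, \ t \downarrow 0} \frac{|y + tv| - |y|}{t} = |v| \ge 0 \quad \text{for every } v \in \mathbb{R},$$
so $0$ is a Clarke stationary point in the sense of Definition 7, and the generalized gradient $\partial f(0) = \{ s \in \mathbb{R} : |v| \ge vs \text{ for all } v \in \mathbb{R} \} = [-1, 1]$ indeed contains $0$. Since $\partial f(0)$ is not a singleton, $f$ is not strictly differentiable at $0$, and the KKT statements of Definition 7 do not apply at that point.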

3.3 Main results

Our main convergence results consist of four theorems, all of which are generalizations of similar results from MADS [10] or mixed variable pattern search [2,5,7]. The first result, which is an extension of Theorem 3.12 in [10] to mixed variables, establishes a notion of directional stationarity at certain limit points. The second result ensures local optimality with respect to the set of discrete neighbors. The remaining two results establish Clarke-based stationarity in a mixed variable sense.

Theorem 8 Let $\hat{w}$ be the limit point of a refining subsequence or the associated subsequence of extended poll endpoints, and let $v$ be a refining direction in the hypertangent cone $T^H_\Omega(\hat{w})$. If $f$ is Lipschitz at $\hat{w}$ with respect to the continuous variables, then $f^\circ(\hat{w}; v) \ge 0$.

Proof Let $\{w_k\}_{k \in K}$ be a refining subsequence converging to $\hat{w} = (\hat{w}^c, \hat{w}^d)$. Without any loss of generality, we may assume that $w_k = (w^c_k, \hat{w}^d)$ for all $k \in K$. In accordance with Definition 3.2 in [10], let $v = \lim_{k \in L} \frac{d_k}{\|d_k\|} \in T^H_\Omega(\hat{w})$ be a refining direction for $\hat{w}$, where $d_k \in D_k$ for all $k \in L$ and $L$ is some subset of $K$. Since $\Delta^p_k$ converges to zero and Definition 2 ensures that $\{\Delta^m_k \|d_k\|\}_{k \in L}$ is bounded above by a constant times $\Delta^p_k$, it follows from (3) that $\{\Delta^m_k \|d_k\|\}_{k \in L}$ must also converge to zero. Thus, it follows from (6) and (7) that
$$f^\circ(\hat{w}; v) = \limsup_{\substack{y \to \hat{w}^c, \ (y, \hat{w}^d) \in \Omega \\ t \downarrow 0, \ (y + tu, \hat{w}^d) \in \Omega \\ u \to v, \ u \in T^H_\Omega(\hat{w})}} \frac{f(y + tu, \hat{w}^d) - f(y, \hat{w}^d)}{t} \ \ge \ \limsup_{k \in L} \frac{f\big(w^c_k + \Delta^m_k \|d_k\| \tfrac{d_k}{\|d_k\|}, \hat{w}^d\big) - f(w^c_k, \hat{w}^d)}{\Delta^m_k \|d_k\|} \ = \ \limsup_{k \in L} \frac{f(w^c_k + \Delta^m_k d_k, \hat{w}^d) - f(w_k)}{\Delta^m_k \|d_k\|} \ \ge \ 0.$$
The last inequality holds because $(w^c_k + \Delta^m_k d_k, \hat{w}^d) \in \Omega$ and $f(w^c_k + \Delta^m_k d_k, \hat{w}^d) \ge f(w_k)$ (since $w_k$ is a minimal frame center) for all sufficiently large $k \in L$.

The next result gives sufficient conditions under which $\hat{x}$ is a local minimizer with respect to its discrete neighbors. The proof is essentially the same as those given in [2,5].

Theorem 9 If $f$ is lower semi-continuous at $\hat{x}$ and upper semi-continuous at $\hat{y} \in \mathcal{N}(\hat{x})$ with respect to the continuous variables, then $f(\hat{x}) \le f(\hat{y})$.

Proof Since $k \in K$ ensures that $\{x_k\}_{k \in K}$ are minimal frame centers, we have $f(x_k) \le f(y_k)$ for all $k \in K$. By the assumptions of lower and upper semi-continuity on $f$ and the definitions of $\hat{x}$ and $\hat{y}$, we have $f(\hat{x}) \le \lim_{k \in K} f(x_k) \le \lim_{k \in K} f(y_k) \le f(\hat{y})$.

The next theorem lists conditions that ensure that $\hat{x}$ satisfies certain stationarity conditions under various smoothness requirements. It is essentially an extension of four results from [10] to the mixed variable case; namely, Theorem 3.13, Corollary 3.14, Proposition 3.15, and Corollary 3.16, and the proofs are somewhat similar.

Theorem 10 Assume that $T^H_\Omega(\hat{x}) \neq \emptyset$ and the set of refining directions is asymptotically dense in $T^H_\Omega(\hat{x})$.
1. If $f$ is Lipschitz near $\hat{x}$ with respect to the continuous variables, then $\hat{x}$ is a Clarke stationary point of $f$ on $\Omega$ with respect to the continuous variables.
2. If $f$ is strictly differentiable at $\hat{x}$ with respect to the continuous variables, then $\hat{x}$ is a Clarke KKT stationary point of $f$ on $\Omega$ with respect to the continuous variables.
Furthermore, if $\Omega$ is regular at $\hat{x}$, then the following hold:
3. If $f$ is Lipschitz near $\hat{x}$ with respect to the continuous variables, then $\hat{x}$ is a contingent stationary point of $f$ on $\Omega$ with respect to the continuous variables.
4. If $f$ is strictly differentiable at $\hat{x}$ with respect to the continuous variables, then $\hat{x}$ is a contingent KKT stationary point of $f$ on $\Omega$.

Proof First, Rockafellar [23] showed that if the hypertangent cone is not empty at $\hat{x}$, then $T^{Cl}_\Omega(\hat{x}) = \mathrm{cl}(T^H_\Omega(\hat{x}))$. Since the set $S$ of refining directions for $f$ at $\hat{x}$ is a dense subset of $T^H_\Omega(\hat{x})$, $S$ is also a dense subset of $T^{Cl}_\Omega(\hat{x})$. Thus, any vector $v \in T^{Cl}_\Omega(\hat{x})$ can be expressed as the limit of directions in $S$, and the first result follows directly from (7) and Theorem 8.

Strict differentiability ensures the existence of $\nabla f(\hat{x})$ and that $\nabla f(\hat{x})^T v = f^\circ(\hat{x}; v)$ for all $v \in T^{Cl}_\Omega(\hat{x})$. Since $f^\circ(\hat{x}; v) \ge 0$ for all $v \in T^{Cl}_\Omega(\hat{x})$, we have $(-\nabla f(\hat{x}))^T v \le 0$, and the second result follows from Definition 7. Furthermore, if $\Omega$ is regular at $\hat{x}$, then by Definition 6, $T^{Cl}_\Omega(\hat{x}) = T^{Co}_\Omega(\hat{x})$, and the final two results follow directly from Definition 7.

The next result is similar to Theorem 10, but considers the limit of extended poll endpoints $\hat{z}$ instead of $\hat{x}$.

Theorem 11 Assume that $\hat{y} \in \mathcal{N}(\hat{x})$, $T^H_\Omega(\hat{z}) \neq \emptyset$, and the set of refining directions is asymptotically dense in $T^H_\Omega(\hat{z})$.
1. If $f$ is Lipschitz near $\hat{z}$ with respect to the continuous variables, then $\hat{z}$ is a Clarke stationary point of $f$ on $\Omega$ with respect to the continuous variables.
2. If $f$ is strictly differentiable at $\hat{z}$ with respect to the continuous variables, then $\hat{z}$ is a Clarke KKT stationary point of $f$ on $\Omega$ with respect to the continuous variables.
Furthermore, if $\Omega$ is regular at $\hat{z}$, then the following hold:
3. If $f$ is Lipschitz near $\hat{z}$ with respect to the continuous variables, then $\hat{z}$ is a contingent stationary point of $f$ on $\Omega$ with respect to the continuous variables.
4. If $f$ is strictly differentiable at $\hat{z}$ with respect to the continuous variables, then $\hat{z}$ is a contingent KKT stationary point of $f$ on $\Omega$.

Proof The proof is identical to that of Theorem 10, but with $\hat{z}$ replacing $\hat{x}$.

We have shown conditions under which the limit points $\hat{x}$ and $\hat{z}$ satisfy certain necessary conditions for optimality. We now tie these results together with the notion of local optimality given in Definition 1.

Remark 12 If $\hat{y} \in \mathcal{N}(\hat{x})$ is the limit point of discrete neighbors $\{y_k\}_{k \in K}$, where $y_k \in \mathcal{N}(x_k)$ for $k \in K$, then Theorem 9 ensures that $f(\hat{x}) \le f(\hat{y})$. If this inequality is strict, then $\hat{x}$ is locally optimal (i.e., Definition 1 is satisfied) with respect to $\hat{y}$. On the other hand, suppose $f(\hat{x}) = f(\hat{y})$. Then, since the extended poll triggers are bounded away from zero, an extended poll step must have been performed around $y_k$ for infinitely many $k \in K$. Since $f(x_k) \le f(z_k) \le f(y_k)$ for all $k \in K$, it follows that $f(\hat{x}) = f(\hat{z}) = f(\hat{y})$, and Theorem 11 ensures first-order stationarity at $\hat{z}$ with respect to the continuous variables.

4 Concluding remarks

This paper fills an important gap in the convergence theory for the class of MADS algorithms. We have introduced a new class of MADS algorithms for mixed variable optimization problems and proved that it possesses appropriate convergence properties, which are consistent with previous results for less general algorithms.

We hope that these results will serve as a springboard for extending MADS to other classes of problems, such as those with stochastic noise and multiple objectives.

Acknowledgments Work of the first author was supported by the Air Force Office of Scientific Research (AFOSR) and Los Alamos National Laboratory. Work of the second author was supported by AFOSR, ExxonMobil Upstream Research Company, and NSERC. The views expressed in this paper are those of the authors and do not reflect the official policy or position of the US Air Force, Department of Defense, or US Government.

References

1. Abhishek, K., Leyffer, S., Linderoth, J.T.: Modeling without categorical variables: a mixed-integer nonlinear program for the optimization of thermal insulation systems. Technical Report Preprint ANL/MCS-P, Argonne National Laboratory (2007)
2. Abramson, M.A.: Pattern Search Algorithms for Mixed Variable General Constrained Optimization Problems. PhD Thesis, Department of Computational and Applied Mathematics, Rice University (2002)
3. Abramson, M.A., Audet, C.: Second-order convergence of mesh adaptive direct search. SIAM J. Optim. 17(2) (2006)
4. Abramson, M.A., Audet, C., Dennis, J.E. Jr.: Generalized pattern searches with derivative information. Math. Program. Ser. B 100(1), 3–25 (2004)
5. Abramson, M.A., Audet, C., Dennis, J.E. Jr.: Filter pattern search algorithms for mixed variable constrained optimization problems. Pac. J. Optim. 3(3) (2007)
6. Audet, C.: Convergence results for pattern search algorithms are tight. Optim. Eng. 5(2) (2004)
7. Audet, C., Dennis, J.E. Jr.: Pattern search algorithms for mixed variable programming. SIAM J. Optim. 11(3) (2000)
8. Audet, C., Dennis, J.E. Jr.: Analysis of generalized pattern searches. SIAM J. Optim. 13(3) (2003)
9. Audet, C., Dennis, J.E. Jr.: A pattern search filter method for nonlinear programming without derivatives. SIAM J. Optim. 14(4) (2004)
10. Audet, C., Dennis, J.E. Jr.: Mesh adaptive direct search algorithms for constrained optimization. SIAM J. Optim. 17(1), 188–217 (2006)
11. Audet, C., Orban, D.: Finding optimal algorithmic parameters using the mesh adaptive direct search algorithm. SIAM J. Optim. 17(3) (2006)
12. Booker, A.J., Dennis, J.E. Jr., Frank, P.D., Serafini, D.B., Torczon, V., Trosset, M.W.: A rigorous framework for optimization of expensive functions by surrogates. Struct. Optim. 17(1), 1–13 (1999)
13. Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley, New York (1983). Reissued in 1990 by SIAM Publications, Philadelphia, as Vol. 5 in the series Classics in Applied Mathematics
14. Coope, I.D., Price, C.J.: Frame-based methods for unconstrained optimization. J. Optim. Theory Appl. 107(2) (2000)
15. Davis, C.: Theory of positive linear dependence. Am. J. Math. 76(4) (1954)
16. Jahn, J.: Introduction to the Theory of Nonlinear Optimization. Springer, Berlin (1994)
17. Kokkolaras, M., Audet, C., Dennis, J.E. Jr.: Mixed variable optimization of the number and composition of heat intercepts in a thermal insulation system. Optim. Eng. 2(1), 5–29 (2001)
18. Kolda, T.G., Lewis, R.M., Torczon, V.: Optimization by direct search: new perspectives on some classical and modern methods. SIAM Rev. 45(3) (2003)
19. Lewis, R.M., Torczon, V.: Pattern search algorithms for bound constrained minimization. SIAM J. Optim. 9(4) (1999)
20. Lewis, R.M., Torczon, V.: Pattern search methods for linearly constrained minimization. SIAM J. Optim. 10(3) (2000)
21. Lewis, R.M., Torczon, V.: A globally convergent augmented Lagrangian pattern search algorithm for optimization with general constraints and simple bounds. SIAM J. Optim. 12(4) (2002)
22. McKay, M.D., Conover, W.J., Beckman, R.J.: A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21(2) (1979)
23. Rockafellar, R.T.: Generalized directional derivatives and subgradients of nonconvex functions. Can. J. Math. 32(2) (1980)
24. Torczon, V.: On the convergence of pattern search algorithms. SIAM J. Optim. 7(1), 1–25 (1997)


The exact absolute value penalty function method for identifying strict global minima of order m in nonconvex nonsmooth programming

The exact absolute value penalty function method for identifying strict global minima of order m in nonconvex nonsmooth programming Optim Lett (2016 10:1561 1576 DOI 10.1007/s11590-015-0967-3 ORIGINAL PAPER The exact absolute value penalty function method for identifying strict global minima of order m in nonconvex nonsmooth programming

More information

Some Background Material

Some Background Material Chapter 1 Some Background Material In the first chapter, we present a quick review of elementary - but important - material as a way of dipping our toes in the water. This chapter also introduces important

More information

Local strong convexity and local Lipschitz continuity of the gradient of convex functions

Local strong convexity and local Lipschitz continuity of the gradient of convex functions Local strong convexity and local Lipschitz continuity of the gradient of convex functions R. Goebel and R.T. Rockafellar May 23, 2007 Abstract. Given a pair of convex conjugate functions f and f, we investigate

More information

Convex Analysis and Economic Theory AY Elementary properties of convex functions

Convex Analysis and Economic Theory AY Elementary properties of convex functions Division of the Humanities and Social Sciences Ec 181 KC Border Convex Analysis and Economic Theory AY 2018 2019 Topic 6: Convex functions I 6.1 Elementary properties of convex functions We may occasionally

More information

The Skorokhod reflection problem for functions with discontinuities (contractive case)

The Skorokhod reflection problem for functions with discontinuities (contractive case) The Skorokhod reflection problem for functions with discontinuities (contractive case) TAKIS KONSTANTOPOULOS Univ. of Texas at Austin Revised March 1999 Abstract Basic properties of the Skorokhod reflection

More information

STATIONARITY RESULTS FOR GENERATING SET SEARCH FOR LINEARLY CONSTRAINED OPTIMIZATION

STATIONARITY RESULTS FOR GENERATING SET SEARCH FOR LINEARLY CONSTRAINED OPTIMIZATION STATIONARITY RESULTS FOR GENERATING SET SEARCH FOR LINEARLY CONSTRAINED OPTIMIZATION TAMARA G. KOLDA, ROBERT MICHAEL LEWIS, AND VIRGINIA TORCZON Abstract. We present a new generating set search (GSS) approach

More information

x 2 x n r n J(x + t(x x ))(x x )dt. For warming-up we start with methods for solving a single equation of one variable.

x 2 x n r n J(x + t(x x ))(x x )dt. For warming-up we start with methods for solving a single equation of one variable. Maria Cameron 1. Fixed point methods for solving nonlinear equations We address the problem of solving an equation of the form (1) r(x) = 0, where F (x) : R n R n is a vector-function. Eq. (1) can be written

More information

P-adic Functions - Part 1

P-adic Functions - Part 1 P-adic Functions - Part 1 Nicolae Ciocan 22.11.2011 1 Locally constant functions Motivation: Another big difference between p-adic analysis and real analysis is the existence of nontrivial locally constant

More information

Key words. Global optimization, multistart strategies, direct-search methods, pattern search methods, nonsmooth calculus.

Key words. Global optimization, multistart strategies, direct-search methods, pattern search methods, nonsmooth calculus. GLODS: GLOBAL AND LOCAL OPTIMIZATION USING DIRECT SEARCH A. L. CUSTÓDIO AND J. F. A. MADEIRA Abstract. Locating and identifying points as global minimizers is, in general, a hard and timeconsuming task.

More information

SEPARABILITY AND COMPLETENESS FOR THE WASSERSTEIN DISTANCE

SEPARABILITY AND COMPLETENESS FOR THE WASSERSTEIN DISTANCE SEPARABILITY AND COMPLETENESS FOR THE WASSERSTEIN DISTANCE FRANÇOIS BOLLEY Abstract. In this note we prove in an elementary way that the Wasserstein distances, which play a basic role in optimal transportation

More information

Math 730 Homework 6. Austin Mohr. October 14, 2009

Math 730 Homework 6. Austin Mohr. October 14, 2009 Math 730 Homework 6 Austin Mohr October 14, 2009 1 Problem 3A2 Proposition 1.1. If A X, then the family τ of all subsets of X which contain A, together with the empty set φ, is a topology on X. Proof.

More information

Approximation Metrics for Discrete and Continuous Systems

Approximation Metrics for Discrete and Continuous Systems University of Pennsylvania ScholarlyCommons Departmental Papers (CIS) Department of Computer & Information Science May 2007 Approximation Metrics for Discrete Continuous Systems Antoine Girard University

More information

A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees

A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees Noname manuscript No. (will be inserted by the editor) A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees Frank E. Curtis Xiaocun Que May 26, 2014 Abstract

More information

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems

UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems UNDERGROUND LECTURE NOTES 1: Optimality Conditions for Constrained Optimization Problems Robert M. Freund February 2016 c 2016 Massachusetts Institute of Technology. All rights reserved. 1 1 Introduction

More information

PARTIAL REGULARITY OF BRENIER SOLUTIONS OF THE MONGE-AMPÈRE EQUATION

PARTIAL REGULARITY OF BRENIER SOLUTIONS OF THE MONGE-AMPÈRE EQUATION PARTIAL REGULARITY OF BRENIER SOLUTIONS OF THE MONGE-AMPÈRE EQUATION ALESSIO FIGALLI AND YOUNG-HEON KIM Abstract. Given Ω, Λ R n two bounded open sets, and f and g two probability densities concentrated

More information

Global convergence of trust-region algorithms for constrained minimization without derivatives

Global convergence of trust-region algorithms for constrained minimization without derivatives Global convergence of trust-region algorithms for constrained minimization without derivatives P.D. Conejo E.W. Karas A.A. Ribeiro L.G. Pedroso M. Sachine September 27, 2012 Abstract In this work we propose

More information

TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM

TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM H. E. Krogstad, IMF, Spring 2012 Karush-Kuhn-Tucker (KKT) Theorem is the most central theorem in constrained optimization, and since the proof is scattered

More information

Only Intervals Preserve the Invertibility of Arithmetic Operations

Only Intervals Preserve the Invertibility of Arithmetic Operations Only Intervals Preserve the Invertibility of Arithmetic Operations Olga Kosheleva 1 and Vladik Kreinovich 2 1 Department of Electrical and Computer Engineering 2 Department of Computer Science University

More information

A projection-type method for generalized variational inequalities with dual solutions

A projection-type method for generalized variational inequalities with dual solutions Available online at www.isr-publications.com/jnsa J. Nonlinear Sci. Appl., 10 (2017), 4812 4821 Research Article Journal Homepage: www.tjnsa.com - www.isr-publications.com/jnsa A projection-type method

More information

Technische Universität Dresden Herausgeber: Der Rektor

Technische Universität Dresden Herausgeber: Der Rektor Als Manuskript gedruckt Technische Universität Dresden Herausgeber: Der Rektor The Gradient of the Squared Residual as Error Bound an Application to Karush-Kuhn-Tucker Systems Andreas Fischer MATH-NM-13-2002

More information

Priority Programme 1962

Priority Programme 1962 Priority Programme 1962 An Example Comparing the Standard and Modified Augmented Lagrangian Methods Christian Kanzow, Daniel Steck Non-smooth and Complementarity-based Distributed Parameter Systems: Simulation

More information

Pacific Journal of Optimization (Vol. 2, No. 3, September 2006) ABSTRACT

Pacific Journal of Optimization (Vol. 2, No. 3, September 2006) ABSTRACT Pacific Journal of Optimization Vol., No. 3, September 006) PRIMAL ERROR BOUNDS BASED ON THE AUGMENTED LAGRANGIAN AND LAGRANGIAN RELAXATION ALGORITHMS A. F. Izmailov and M. V. Solodov ABSTRACT For a given

More information

Quasi-Newton Methods

Quasi-Newton Methods Quasi-Newton Methods Werner C. Rheinboldt These are excerpts of material relating to the boos [OR00 and [Rhe98 and of write-ups prepared for courses held at the University of Pittsburgh. Some further references

More information