The cluster problem in constrained global optimization


The MIT Faculty has made this article openly available. Citation (as published): Kannan, Rohit, and Paul I. Barton. "The Cluster Problem in Constrained Global Optimization." Journal of Global Optimization 69, no. 3 (May 2017): 629-676. http://dx.doi.org/10.1007/s10898-017-0531-z. Publisher: Springer-Verlag. Version: Author's final manuscript. Terms of Use: Creative Commons Attribution-Noncommercial-Share Alike, http://creativecommons.org/licenses/by-nc-sa/4.0/

Rohit Kannan · Paul I. Barton

Abstract Deterministic branch-and-bound algorithms for continuous global optimization often visit a large number of boxes in the neighborhood of a global minimizer, resulting in the so-called cluster problem [J Glob Optim 5(3):253-265, 1994]. This article extends previous analyses of the cluster problem in unconstrained global optimization [J Glob Optim 5(3):253-265, 1994; J Glob Optim 58(3):429-438, 2014] to the constrained setting, based on a recently developed notion of convergence order for convex relaxation-based lower bounding schemes. It is shown that clustering can occur both on nearly-optimal and on nearly-feasible regions in the vicinity of a global minimizer. In unconstrained optimization, at least second-order convergent schemes of relaxations are required to mitigate the cluster problem when the minimizer sits at a point of differentiability of the objective function. In contrast, it is shown that first-order convergent lower bounding schemes for constrained problems may mitigate the cluster problem under certain conditions. Additionally, conditions under which second-order convergent lower bounding schemes are sufficient to mitigate the cluster problem around a global minimizer are developed. Conditions on the convergence order prefactor that are sufficient to altogether eliminate the cluster problem are also provided. Under suitable assumptions, this analysis reduces to previous analyses of the cluster problem for unconstrained optimization.
Keywords Cluster problem · Global optimization · Constrained optimization · Branch-and-bound · Convergence order · Convex relaxation · Lower bounding scheme

Mathematics Subject Classification (2010) 49M20 · 49M37 · 65K05 · 68Q25 · 90C26 · 90C46

The authors gratefully acknowledge financial support from BP. This work was conducted as a part of the BP-MIT conversion research program.

Rohit Kannan · Paul I. Barton, Process Systems Engineering Laboratory, Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA. E-mail: rohitk@mit.edu (R. Kannan), pib@mit.edu (P. I. Barton)

1 Introduction

One of the key issues faced by deterministic branch-and-bound algorithms for continuous global optimization [] is the so-called cluster problem, in which the algorithm may visit a large number of boxes in the vicinity of a global minimizer [7, 2, 29]. Du and Kearfott [7, 3] were the first to analyze this phenomenon, in the context of interval branch-and-bound algorithms for unconstrained global optimization. They established that the extent of the cluster problem is dictated by how accurately the bounding scheme estimates the range of the objective function, as measured by the notion of convergence order (see Definition 7). Furthermore, they determined that, in the worst case, at least second-order convergence of the bounding scheme is required to mitigate clustering [7]. Next, Neumaier [2] provided a similar analysis and concluded that even second-order convergence of the bounding scheme might, in the worst case, result in an exponential number of boxes in the vicinity of an unconstrained global minimizer. In addition, Neumaier claimed that a similar situation holds in a reduced manifold for the constrained case [2]. Recently, Wechsung et al. [29] provided a refined analysis of Neumaier's argument for unconstrained global optimization that corroborated the previous analyses. They also showed that the number of boxes visited in the vicinity of a global minimizer may scale differently depending on the convergence order prefactor. As a result, second-order convergent bounding schemes with small-enough prefactors may altogether eliminate the cluster problem, while second-order convergent bounding schemes with large-enough prefactors may result in an exponential number of boxes being visited. Also note the analysis by Wechsung [28, Section 2.3], which shows that first-order convergence of the bounding scheme may be sufficient to mitigate the cluster problem in unconstrained optimization when the optimizer sits at a point of nondifferentiability of the objective function.

As highlighted above, the convergence order of the bounding scheme plays a key role in the analysis of the cluster problem. This concept is based on the rate at which the excess width of interval extensions [8] shrinks to zero, and it compares the rate at which an estimated range of a function converges to its true range.
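The following Python sketch makes the notion of excess width concrete. It is an illustration chosen for this rewrite, not an example from the article. It evaluates the natural interval extension $F(X) = X - X \cdot X$ of the hypothetical function $f(x) = x - x^2$ on intervals of width $w$ centered at $c = 0.2$, and it compares the width of $F(X)$ with the width of the true range. For this choice the excess width equals $4cw$, so it shrinks only linearly with $w$ (first-order convergence). The analyses above indicate that at least second-order convergence is needed in the worst case to mitigate clustering in unconstrained problems.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def interval_mul(x: Interval, y: Interval) -> Interval:
    """Interval product [x] * [y] in standard interval arithmetic."""
    products = (x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1])
    return min(products), max(products)


def natural_extension(x: Interval) -> Interval:
    """Natural interval extension F(X) = X - X * X of f(x) = x - x**2.

    X * X is evaluated as a general product, so the dependency between the two
    occurrences of X is ignored; this is what produces the excess width.
    """
    sq = interval_mul(x, x)
    return x[0] - sq[1], x[1] - sq[0]


def true_range(x: Interval) -> Interval:
    """Exact range of f(x) = x - x**2 (concave, maximized at x = 1/2) on X."""
    lo, hi = x
    f_lo, f_hi = lo - lo * lo, hi - hi * hi
    f_max = 0.25 if lo <= 0.5 <= hi else max(f_lo, f_hi)
    return min(f_lo, f_hi), f_max


def excess_width(x: Interval) -> float:
    """w(F(X)) - w(f(X)): how much the interval estimate overstates the range."""
    F, R = natural_extension(x), true_range(x)
    return (F[1] - F[0]) - (R[1] - R[0])


c = 0.2
for w in (0.1, 0.05, 0.025):
    x = (c - w / 2, c + w / 2)
    print(w, excess_width(x), excess_width(x) / w)  # ratio stays at about 4 * c = 0.8
```

Halving $w$ halves the excess width here. A second-order scheme would instead reduce it by roughly a factor of four.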
Bompadre and Mitsos [3] developed the notions of Hausdorff and pointwise convergence rates of bounding schemes, and established sharp rules for the propagation of convergence orders of bounding schemes constructed using McCormick's composition rules [7]. In addition, Bompadre and Mitsos [3] demonstrated second-order pointwise convergence of schemes of convex and concave envelopes of twice continuously differentiable functions and second-order pointwise convergence of schemes of αBB relaxations [], and provided a conservative estimate of the prefactor of αBB relaxation schemes for the case of constant α. Scholz [25] demonstrated second-order convergence of centered forms (see also, for instance, the article by Krawczyk and Nickel [5]). Bompadre and coworkers [4] established sharp rules for the propagation of convergence orders of Taylor and McCormick-Taylor models. Najman and Mitsos [20] established sharp rules for the propagation of convergence orders of the multivariate McCormick relaxations developed in [9, 26]. Finally, Khan and coworkers [4] developed a continuously differentiable variant of McCormick relaxations [7, 9, 26], and established second-order pointwise convergence of schemes of the differentiable McCormick relaxations for twice continuously differentiable functions. This literature helps develop bounding schemes for unconstrained optimization with the requisite convergence order. It also provides conservative estimates of the convergence order prefactor (see Definition 7). Also note the related definition of the rate of convergence of lower bounding schemes for geometric branch-and-bound methods provided by Schöbel and Scholz [23].

This work provides an analysis of the cluster problem for constrained global optimization. It is shown that clustering can occur both on feasible and on infeasible regions in the neighborhood of a global minimizer.
As in unconstrained optimization, both the convergence order of a lower bounding scheme and its corresponding prefactor (see Definition 8) may be crucial for tackling the cluster problem. However, in contrast to the unconstrained case, it is shown that first-order convergent lower bounding schemes with small-enough prefactors may eliminate the cluster problem under certain conditions. Additionally, conditions under which second-order convergence of the lower bounding scheme may be sufficient to mitigate clustering are developed. This work assumes that boxes can be placed such that global minimizers are always in their relative interior; otherwise, an exponential number of boxes can contain global minimizers. Techniques such as epsilon-inflation [6] or back-boxing [2, 27] can potentially be used to place boxes with global minimizers in their relative interior.

This article is organized as follows. Section 2 provides the problem formulation, describes the notions of convergence used in this work, and sets up the framework for analyzing the cluster problem in Section 3. Section 3.1 analyzes the cluster problem on the set of nearly-optimal feasible points in a neighborhood of a global minimizer. It determines conditions under which first-order and second-order convergent bounding schemes may be sufficient to mitigate clustering in such neighborhoods. Section 3.2 analyzes the cluster problem on the set of nearly-feasible points in a neighborhood of a global minimizer that have a good-enough objective function value. It develops conditions under which first-order and second-order convergent bounding schemes may be sufficient to mitigate clustering in such neighborhoods. Finally, Section 4 lists the conclusions of this work.

2 Problem Formulation and Background

Consider the problem
$$\min_{x} \; f(x) \quad \text{s.t.} \quad g(x) \le 0, \quad h(x) = 0, \quad x \in X, \qquad \text{(P)}$$
where $X \subset \mathbb{R}^{n_x}$ is a nonempty open bounded convex set, the functions $f: X \to \mathbb{R}$, $g: X \to \mathbb{R}^{m_I}$, and $h: X \to \mathbb{R}^{m_E}$ are continuous on $X$, and $0$ denotes a vector of zeros of appropriate dimension. The following assumptions are enforced throughout this work.

Assumption 1 The constraints define a nonempty compact set $\{x \in X : g(x) \le 0,\ h(x) = 0\} \subset X$.

Assumption 2 Let $x^* \in X$ be a global minimum for Problem (P), and assume that the branch-and-bound algorithm has found the upper bound $UBD = f(x^*)$ sufficiently early on. Let $\varepsilon$ be the termination tolerance for the branch-and-bound algorithm, and suppose the algorithm fathoms node $k$ when $UBD - LBD_k \le \varepsilon$, where $LBD_k$ is the lower bound on node $k$.

When Assumption 1 is enforced, Problem (P) attains its optimal solution on $X$ because $f$ is continuous on $X$. The assumption that $X$ is an open set is made purely for ease of exposition, particularly when differentiability assumptions on the functions in Problem (P) are made, and it is not practically implementable in general. We therefore implicitly assume throughout this work that finite bounds on the variables, defining an interval in the interior of $X$, are available for use in a branch-and-bound setting. Assumption 2 essentially assumes that the convergence of the overall lower bound is the limiting factor for the convergence of the branch-and-bound algorithm.
This is usually a reasonable assumption for branch-and-bound algorithms for global optimization, where most of the effort is typically spent proving ε-optimality of feasible solutions found by heuristic, local optimization-based techniques.

The cluster problem analysis in this work is, in general, asymptotic in $\varepsilon$. We provide conservative estimates of the worst-case number of boxes that the branch-and-bound algorithm visits in nearly-optimal and nearly-feasible neighborhoods of global minimizers, for some sufficiently small $\varepsilon > 0$. The conservatism of these estimates decreases as $\varepsilon \to 0$. This asymptotic nature has two sources. The first is that we consider the local behavior of the objective function in the vicinity of a global minimizer, which is also a limitation of the analyses of the cluster problem in unconstrained optimization [7, 2, 28, 29]. The second is that we also consider the local behavior of the constraints, and therefore of the feasible region, in the vicinity of a global minimizer. In practice, the analysis can provide a reasonable overestimate of the number of boxes visited for values of $\varepsilon$ much larger than machine precision, as the examples in Section 3 show. Also note that the fathoming criterion in this work differs from the one considered by Wechsung et al. [29], who assume that node $k$ is fathomed only when $LBD_k > UBD$. However, this difference does not affect the worst-case estimates of the number of boxes visited by the branch-and-bound algorithm.
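The link between convergence order and the number of boxes visited can be seen in a toy experiment. The Python sketch below is not the article's experiment. It runs a bisection branch-and-bound on the hypothetical one-dimensional problem $\min_{x \in [-1,1]} x^2$, so $x^* = 0$, and, as in Assumption 2, $UBD = f(x^*) = 0$ is known from the start. Each box $Z$ is assigned the worst-case lower bound permitted by a relaxation scheme of order $\beta$ with prefactor $\tau$ (Definition 7), namely $\min_Z f - \tau\, w(Z)^{\beta}$. Nodes are fathomed when $UBD - LBD \le \varepsilon$. The function names and the choice $\tau = 1$ are assumptions of the sketch.

```python
def min_on_box(a: float, b: float) -> float:
    """Exact minimum of the toy objective f(x) = x**2 on [a, b]."""
    if a <= 0.0 <= b:
        return 0.0
    return min(a * a, b * b)


def count_visited_boxes(beta: float, tau: float, eps: float,
                        lo: float = -1.0, hi: float = 1.0) -> int:
    """Count nodes visited by bisection branch-and-bound on min x**2 over [lo, hi].

    Illustrative assumptions: UBD = f(x*) = 0 is known from the start, and every
    box Z gets the worst-case lower bound min_Z f - tau * w(Z)**beta.
    """
    ubd = 0.0
    stack = [(lo, hi)]
    visited = 0
    while stack:
        a, b = stack.pop()
        visited += 1
        lbd = min_on_box(a, b) - tau * (b - a) ** beta
        if ubd - lbd <= eps:
            continue  # fathomed by value dominance: UBD - LBD <= eps
        mid = 0.5 * (a + b)
        stack.extend([(a, mid), (mid, b)])  # bisect and keep exploring
    return visited


for beta in (1, 2):
    counts = [count_visited_boxes(beta, 1.0, eps) for eps in (1e-2, 1e-4, 1e-6)]
    print(f"beta={beta}: visited boxes for eps = 1e-2, 1e-4, 1e-6 -> {counts}")
```

In this toy setting, the count for $\beta = 1$ grows roughly like $1/\sqrt{\varepsilon}$ as $\varepsilon$ decreases. For $\beta = 2$ (with $\tau = 1$) it grows only logarithmically, in line with the unconstrained analyses cited in the introduction.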

Throughout this work, we use the following notation:
- $x^*$ denotes a global minimizer of Problem (P);
- $\mathbb{I}Z$ denotes the set of nonempty, closed and bounded interval subsets of $Z \subset \mathbb{R}^n$;
- $Z^C$ denotes the relative complement of a set $Z \subset \mathbb{R}^n$ with respect to $X$;
- $\mathrm{cl}(Z)$ denotes the closure of a set $Z \subset \mathbb{R}^n$;
- $\|z\|$ denotes the Euclidean norm of $z \in \mathbb{R}^n$;
- $\mathbb{R}^n_{-}$ denotes the nonpositive orthant;
- $z_j$ denotes the $j$th component of a vector $z$;
- $(z_1, z_2, \ldots, z_n)$ denotes a vector $z \in \mathbb{R}^n$ with entries $z_1, z_2, \ldots, z_n \in \mathbb{R}$. Note that $(z_1, z_2)$ denotes both an open interval in $\mathbb{R}$ and a vector in $\mathbb{R}^2$; the intended use will be clear from the context;
- $\lceil \cdot \rceil$ denotes the ceiling function;
- $\begin{bmatrix} g \\ h \end{bmatrix}$ denotes the vector-valued function with domain $Y$ and codomain $\mathbb{R}^{m+n}$ formed from vector-valued functions $g: Y \to \mathbb{R}^m$ and $h: Y \to \mathbb{R}^n$;
- $f(Z)$ denotes the image of $Z \subset Y$ under the function $f: Y \to \mathbb{R}^m$;
- $f'(z; d)$ denotes the directional derivative of a function $f: Z \subset \mathbb{R}^n \to \mathbb{R}$, with $Z$ open, at a point $z \in Z$ in a direction $d \in \mathbb{R}^n$;
- differentiability refers to differentiability in the Fréchet sense.

The following definitions are in order.

Definition 1 (Width of an Interval) Let $Z = [z_1^L, z_1^U] \times \cdots \times [z_n^L, z_n^U]$ be an element of $\mathbb{I}\mathbb{R}^n$. The width of $Z$, denoted by $w(Z)$, is given by $w(Z) := \max_{i = 1, \ldots, n} (z_i^U - z_i^L)$.

Definition 2 (Distance Between Two Sets) Let $Y, Z \subset \mathbb{R}^n$. The distance between $Y$ and $Z$, denoted by $d(Y, Z)$, is defined as
$$d(Y, Z) := \inf_{y \in Y,\ z \in Z} \|y - z\|.$$
This definition of distance does not define a metric. It will nevertheless prove useful for defining a measure of infeasibility of points in $X$ for Problem (P).

Definition 3 (Lipschitz Continuous Function) Let $Z \subset \mathbb{R}^n$. A function $f: Z \to \mathbb{R}$ is Lipschitz continuous with Lipschitz constant $M \ge 0$ if $|f(z_1) - f(z_2)| \le M \|z_1 - z_2\|$ for all $z_1, z_2 \in Z$.

Since the cluster problem analysis is asymptotic in $\varepsilon$, we need the following asymptotic notations.

Definition 4 (Big O and Little o Notations) Let $Y \subset \mathbb{R}$, $f: Y \to \mathbb{R}$, and $g: Y \to \mathbb{R}$. We say that $f(y) = O(g(y))$ as $y \to \bar y \in Y$ if and only if there exist $\delta, M > 0$ such that $|f(y)| \le M |g(y)|$ for all $y \in Y$ with $|y - \bar y| < \delta$.
Similarly, we say that $f(y) = o(g(y))$ as $y \to \bar y \in Y$ if and only if for all $M > 0$ there exists $\delta > 0$ such that $|f(y)| \le M |g(y)|$ for all $y \in Y$ with $|y - \bar y| < \delta$. Unless otherwise specified, we take $\bar y = 0$ in this work.

Definition 5 (Convex and Concave Relaxations) Given a convex set $Z \subset \mathbb{R}^n$ and a function $f: Z \to \mathbb{R}$, a convex function $f^{cv}_Z: Z \to \mathbb{R}$ is called a convex relaxation of $f$ on $Z$ if $f^{cv}_Z(z) \le f(z)$ for all $z \in Z$. Similarly, a concave function $f^{cc}_Z: Z \to \mathbb{R}$ is called a concave relaxation of $f$ on $Z$ if $f^{cc}_Z(z) \ge f(z)$ for all $z \in Z$.

The following definition introduces the notion of schemes of relaxations [3].

Definition 6 (Schemes of Convex and Concave Relaxations) Let $Y \subset \mathbb{R}^n$ be a nonempty convex set, and let $f: Y \to \mathbb{R}$. Assume that for every $Z \in \mathbb{I}Y$ we can construct functions $f^{cv}_Z: Z \to \mathbb{R}$ and $f^{cc}_Z: Z \to \mathbb{R}$ that are convex and concave relaxations, respectively, of $f$ on $Z$. The sets of functions $(f^{cv}_Z)_{Z \in \mathbb{I}Y}$ and $(f^{cc}_Z)_{Z \in \mathbb{I}Y}$ define schemes of convex and concave relaxations, respectively, of $f$ in $Y$. The set of pairs of functions $(f^{cv}_Z, f^{cc}_Z)_{Z \in \mathbb{I}Y}$ defines a scheme of relaxations of $f$ in $Y$. The schemes of relaxations are called continuous when $f^{cv}_Z$ and $f^{cc}_Z$ are continuous on $Z$ for each $Z \in \mathbb{I}Y$.

The next definition presents a notion of convergence order of schemes of convex and concave relaxations [29], based on the notion of Hausdorff convergence order of a scheme of relaxations [3].

Definition 7 (Convergence Orders of Schemes of Convex and Concave Relaxations) Let $Y \subset \mathbb{R}^n$ be a nonempty bounded convex set, and let $f: Y \to \mathbb{R}$ be a continuous function. Let $(f^{cv}_Z)_{Z \in \mathbb{I}Y}$ and $(f^{cc}_Z)_{Z \in \mathbb{I}Y}$ respectively denote continuous schemes of convex and concave relaxations of $f$ in $Y$. The scheme of convex relaxations $(f^{cv}_Z)_{Z \in \mathbb{I}Y}$ is said to have convergence of order $\beta > 0$ at $y \in Y$ if there exists $\tau^{cv} \ge 0$ such that
$$\min_{z \in Z} f(z) - \min_{z \in Z} f^{cv}_Z(z) \le \tau^{cv}\, w(Z)^{\beta}, \quad \forall Z \in \mathbb{I}Y \text{ with } y \in Z.$$
Similarly, the scheme of concave relaxations $(f^{cc}_Z)_{Z \in \mathbb{I}Y}$ is said to have convergence of order $\beta > 0$ at $y \in Y$ if there exists $\tau^{cc} \ge 0$ such that
$$\max_{z \in Z} f^{cc}_Z(z) - \max_{z \in Z} f(z) \le \tau^{cc}\, w(Z)^{\beta}, \quad \forall Z \in \mathbb{I}Y \text{ with } y \in Z.$$
The schemes $(f^{cv}_Z)_{Z \in \mathbb{I}Y}$ and $(f^{cc}_Z)_{Z \in \mathbb{I}Y}$ are said to have convergence of order $\beta > 0$ on $Y$ if they have convergence of order at least $\beta$ at each $y \in Y$, with the constants $\tau^{cv}$ and $\tau^{cc}$ independent of $y$.

The following definition extends the notion of convergence order of a bounding scheme [3, 4, 29] to constrained problems. Conditions under which specific lower bounding schemes are guaranteed to exhibit a certain convergence order will be presented in a future article.

Definition 8 (Convergence Order of a Lower Bounding Scheme) Consider Problem (P). For any $Z \in \mathbb{I}X$, let $\mathcal{F}(Z) = \{x \in Z : g(x) \le 0,\ h(x) = 0\}$ denote the feasible set of Problem (P) with $x$ restricted to $Z$. Let $(f^{cv}_Z)_{Z \in \mathbb{I}X}$ and $(g^{cv}_Z)_{Z \in \mathbb{I}X}$ denote continuous schemes of convex relaxations of $f$ and $g$, respectively, in $X$, and let $(h^{cv}_Z, h^{cc}_Z)_{Z \in \mathbb{I}X}$ denote a continuous scheme of relaxations of $h$ in $X$. For any $Z \in \mathbb{I}X$, let $\mathcal{F}^{cv}(Z) = \{x \in Z : g^{cv}_Z(x) \le 0,\ h^{cv}_Z(x) \le 0,\ h^{cc}_Z(x) \ge 0\}$ denote the feasible set of the convex relaxation-based lower bounding scheme. The convex relaxation-based lower bounding scheme is said to have convergence of order $\beta > 0$

1. at a feasible point $x \in X$ if there exists $\tau \ge 0$ such that for every $Z \in \mathbb{I}X$ with $x \in Z$,
$$\min_{z \in \mathcal{F}(Z)} f(z) - \min_{z \in \mathcal{F}^{cv}(Z)} f^{cv}_Z(z) \le \tau\, w(Z)^{\beta};$$

2. at an infeasible point $x \in X$ if there exists $\hat\tau \ge 0$ such that for every $Z \in \mathbb{I}X$ with $x \in Z$,
$$d\left(\begin{bmatrix} g \\ h \end{bmatrix}(Z),\ \mathbb{R}^{m_I}_{-} \times \{0\}\right) - d\left(\mathcal{I}^C(Z),\ \mathbb{R}^{m_I}_{-} \times \{0\}\right) \le \hat\tau\, w(Z)^{\beta},$$
where $\begin{bmatrix} g \\ h \end{bmatrix}(Z)$ denotes the image of $Z$ under the vector-valued function $\begin{bmatrix} g \\ h \end{bmatrix}$, and $\mathcal{I}^C(Z)$ is defined by
$$\mathcal{I}^C(Z) := \left\{(v, w) \in \mathbb{R}^{m_I} \times \mathbb{R}^{m_E} : v = g^{cv}_Z(z),\ h^{cv}_Z(z) \le w \le h^{cc}_Z(z) \text{ for some } z \in Z\right\}.$$

The scheme of lower bounding problems is said to have convergence of order $\beta > 0$ on $X$ if it has convergence of order at least $\beta$ at each $x \in X$, with the constants $\tau$ and $\hat\tau$ independent of $x$.

Definition 8 is motivated by what a lower bounding scheme must do to fathom feasible and infeasible regions in a branch-and-bound procedure []. On nested sequences of intervals converging to a feasible point of Problem (P), we require that the corresponding sequences of lower bounds converge rapidly to the corresponding sequences of minimum objective values. On nested sequences of intervals converging to an infeasible point of Problem (P), we require that the corresponding sequences of lower bounding problems rapidly detect the eventual infeasibility of those intervals for Problem (P). The latter requirement is enforced by requiring that the measures of infeasibility of the lower bounding problems, as determined by the distance function $d$, converge rapidly to the measures of infeasibility of the corresponding restricted Problems (P). Some intervals that contain only infeasible points may also be fathomed by value dominance, if the lower bounds obtained on them from the relaxation-based lower bounding problems are greater than or equal to $UBD - \varepsilon$. This possibility is considered later in this section (see, for instance, Lemma 3) and in Section 3.2.

The following lemmata give worst-case conditions under which nodes containing a global minimum, and nodes containing only infeasible points, are fathomed.

Lemma 1 (Fathoming Nodes Containing Global Minimizers) Let $\hat X \in \mathbb{I}X$, with $x^* \in \hat X$, correspond to the domain of node $k$ in the branch-and-bound tree. Suppose the convex relaxation-based lower bounding scheme has convergence of order $\beta > 0$ at $x^*$ with a prefactor $\tau > 0$ (see Definition 8). For node $k$ to be fathomed, we require, in the worst case, that $w(\hat X) \le (\varepsilon/\tau)^{1/\beta}$.

Proof The condition for node $k$ to be fathomed by value dominance is $UBD - LBD_k = f(x^*) - LBD_k \le \varepsilon$. Since we are concerned with convergence at the feasible point $x^* \in \hat X$, and $\min_{z \in \mathcal{F}(\hat X)} f(z) = f(x^*)$, Definition 8 gives
$$\min_{z \in \mathcal{F}(\hat X)} f(z) - \min_{z \in \mathcal{F}^{cv}(\hat X)} f^{cv}_{\hat X}(z) \le \tau\, w(\hat X)^{\beta} \implies LBD_k = \min_{z \in \mathcal{F}^{cv}(\hat X)} f^{cv}_{\hat X}(z) \ge f(x^*) - \tau\, w(\hat X)^{\beta}.$$
Therefore, in the worst case, node $k$ is fathomed only when
$$LBD_k = f(x^*) - \tau\, w(\hat X)^{\beta} \ge f(x^*) - \varepsilon \iff w(\hat X) \le \left(\frac{\varepsilon}{\tau}\right)^{1/\beta}.$$

Lemma 2 (Fathoming Infeasible Nodes by Infeasibility) Let $\hat X_I \in \mathbb{I}X$, with
$$\hat X_I \subset \left\{x \in X : d\left(\begin{bmatrix} g \\ h \end{bmatrix}(x),\ \mathbb{R}^{m_I}_{-} \times \{0\}\right) > \varepsilon_f\right\}$$
for some $\varepsilon_f > 0$, correspond to the domain of node $k_I$ in the branch-and-bound tree. Suppose the convex relaxation-based lower bounding scheme has convergence of order $\beta_I > 0$ at each $x \in \hat X_I$ with a prefactor $\tau_I > 0$ that is independent of $x$ (see Definition 8). For node $k_I$ to be fathomed by infeasibility, we require, in the worst case, that $w(\hat X_I) \le (\varepsilon_f/\tau_I)^{1/\beta_I}$.

Proof For node $k_I$ to be fathomed by infeasibility, the convex relaxation-based lower bounding problem must be infeasible on $\hat X_I$, i.e., $d\left(\mathcal{I}^C(\hat X_I), \mathbb{R}^{m_I}_{-} \times \{0\}\right) > 0$. Since we are concerned with convergence at infeasible points, Definition 8 gives
$$d\left(\begin{bmatrix} g \\ h \end{bmatrix}(\hat X_I), \mathbb{R}^{m_I}_{-} \times \{0\}\right) - d\left(\mathcal{I}^C(\hat X_I), \mathbb{R}^{m_I}_{-} \times \{0\}\right) \le \tau_I\, w(\hat X_I)^{\beta_I}$$
$$\implies d\left(\mathcal{I}^C(\hat X_I), \mathbb{R}^{m_I}_{-} \times \{0\}\right) \ge d\left(\begin{bmatrix} g \\ h \end{bmatrix}(\hat X_I), \mathbb{R}^{m_I}_{-} \times \{0\}\right) - \tau_I\, w(\hat X_I)^{\beta_I}.$$
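Therefore, node $k_I$ is fathomed, in the worst case, only when
$$d\left(\begin{bmatrix} g \\ h \end{bmatrix}(\hat X_I), \mathbb{R}^{m_I}_{-} \times \{0\}\right) - \tau_I\, w(\hat X_I)^{\beta_I} > 0.$$
The first distance exceeds $\varepsilon_f$: the infimum is attained because $\hat X_I$ is compact and $g$ and $h$ are continuous. Hence this condition holds whenever $\varepsilon_f - \tau_I\, w(\hat X_I)^{\beta_I} \ge 0$, i.e., whenever $w(\hat X_I) \le (\varepsilon_f/\tau_I)^{1/\beta_I}$.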
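The following Python sketch connects Definition 7 with Lemma 1 in an assumed setting that does not come from the article. It uses the αBB underestimator $f^{cv}_Z(x) = f(x) + \alpha (x^L - x)(x^U - x)$ of the hypothetical function $f(x) = \sin x$ with $\alpha = 1/2$, which is valid because $f'' \ge -1$. It estimates the convergence order $\beta$ and prefactor $\tau$ at the minimizer $y = -\pi/2$ from two box widths, and then evaluates the worst-case fathoming width $(\varepsilon/\tau)^{1/\beta}$ of Lemma 1. For an unconstrained problem, $\mathcal{F}(Z) = \mathcal{F}^{cv}(Z) = Z$, so item 1 of Definition 8 coincides with Definition 7.

```python
import math

ALPHA = 0.5          # assumed alphaBB parameter for f = sin (valid since f'' >= -1)
Y = -math.pi / 2.0   # assumed point of interest: a minimizer of sin


def f(x: float) -> float:
    return math.sin(x)


def f_cv(x: float, a: float, b: float) -> float:
    """alphaBB convex relaxation of sin on Z = [a, b]."""
    return f(x) + ALPHA * (a - x) * (b - x)


def ternary_min(fun, a: float, b: float, iters: int = 200) -> float:
    """Minimum value of a convex function on [a, b] via ternary search."""
    for _ in range(iters):
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if fun(m1) <= fun(m2):
            b = m2
        else:
            a = m1
    return fun(0.5 * (a + b))


def gap(w: float) -> float:
    """min_Z f - min_Z f_cv for Z = [Y - w/2, Y + w/2] (left-hand side of Definition 7)."""
    a, b = Y - w / 2.0, Y + w / 2.0
    # sin is convex on (-pi, 0), so ternary search is also valid for f when w < pi.
    return ternary_min(f, a, b) - ternary_min(lambda x: f_cv(x, a, b), a, b)


def estimate_order(w1: float, w2: float) -> float:
    """Fit gap(w) = tau * w**beta through two widths and return beta."""
    return math.log(gap(w1) / gap(w2)) / math.log(w1 / w2)


def lemma1_width(eps: float, tau: float, beta: float) -> float:
    """Lemma 1: worst-case width (eps/tau)**(1/beta) at which a node containing x* is
    fathomed. Lemma 2 has the same form with (eps_f, tau_I, beta_I)."""
    return (eps / tau) ** (1.0 / beta)


beta_hat = estimate_order(0.1, 0.05)
tau_hat = gap(0.1) / 0.1 ** 2
print(beta_hat, tau_hat)                     # close to 2 and ALPHA / 4 = 0.125
print(lemma1_width(1e-4, tau_hat, beta_hat))
```

For this function the gap at $y = -\pi/2$ equals $\alpha w^2/4$ exactly, so the estimate should return $\beta \approx 2$ and $\tau \approx 0.125$.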

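Lemma 3 (Fathoming Infeasible Nodes by Value Dominance) Let $\hat X_I \in \mathbb{I}X$, with
$$\hat X_I \subset \left\{x \in X : d\left(\begin{bmatrix} g \\ h \end{bmatrix}(x),\ \mathbb{R}^{m_I}_{-} \times \{0\}\right) > 0\right\},$$
correspond to the domain of node $k_I$ in the branch-and-bound tree. Suppose $f(x) \ge f(x^*)$ for all $x \in \hat X_I$. Furthermore, suppose the scheme $(f^{cv}_Z)_{Z \in \mathbb{I}X}$ has convergence of order $\beta_f > 0$ at each $x \in \hat X_I$ with a prefactor $\tau_f > 0$ that is independent of $x$ (see Definition 7). If $w(\hat X_I) \le (\varepsilon/\tau_f)^{1/\beta_f}$, then node $k_I$ will be fathomed.

Proof A sufficient condition for node $k_I$ to be fathomed is $\min_{z \in \mathcal{F}^{cv}(\hat X_I)} f^{cv}_{\hat X_I}(z) \ge f(x^*) - \varepsilon$. Since $(f^{cv}_Z)_{Z \in \mathbb{I}X}$ has convergence of order $\beta_f$, Definition 7 gives
$$\min_{z \in \hat X_I} f^{cv}_{\hat X_I}(z) \ge \min_{z \in \hat X_I} f(z) - \tau_f\, w(\hat X_I)^{\beta_f} \ge \min_{z \in \hat X_I} f(z) - \varepsilon \ge f(x^*) - \varepsilon,$$
where Step 2 uses $w(\hat X_I) \le (\varepsilon/\tau_f)^{1/\beta_f}$, and Step 3 uses $f(x) \ge f(x^*)$ for all $x \in \hat X_I$. Therefore,
$$\min_{z \in \mathcal{F}^{cv}(\hat X_I)} f^{cv}_{\hat X_I}(z) \ge \min_{z \in \hat X_I} f^{cv}_{\hat X_I}(z) \ge f(x^*) - \varepsilon,$$
and the desired result follows.

In what follows, we partition the set $X$ into distinct regions, each of which is either relatively easy to fathom based on Lemmata 1 to 3 or relatively hard to fathom. Suppose the convex relaxation-based lower bounding scheme has convergence of order $\beta > 0$ on $\mathcal{F}(X)$ with prefactor $\tau > 0$, and convergence of order $\beta_I > 0$ on $\mathcal{F}(X)^C$ with prefactor $\tau_I > 0$. As will become clear in Section 3, it is sufficient for our analysis that the lower bounding scheme has these convergence orders on some neighborhood of the global minimizers of Problem (P). Furthermore, suppose the scheme $(f^{cv}_Z)_{Z \in \mathbb{I}X}$ has convergence of order $\beta_f > 0$ on $X$ with prefactor $\tau_f > 0$. Pick a feasibility tolerance $\varepsilon_f$ and an optimality tolerance $\varepsilon_o$ such that
$$\left(\frac{\varepsilon_f}{\tau_I}\right)^{1/\beta_I} = \left(\frac{\varepsilon_o}{\tau_f}\right)^{1/\beta_f} = \left(\frac{\varepsilon}{\tau}\right)^{1/\beta}, \qquad \text{(TOL)}$$
and consider the following partition of $X$:
$$\begin{aligned}
X_1 &:= \left\{x \in X : d\left(\begin{bmatrix} g \\ h \end{bmatrix}(x), \mathbb{R}^{m_I}_{-} \times \{0\}\right) > \varepsilon_f\right\},\\
X_2 &:= \left\{x \in X : d\left(\begin{bmatrix} g \\ h \end{bmatrix}(x), \mathbb{R}^{m_I}_{-} \times \{0\}\right) \in (0, \varepsilon_f] \text{ and } f(x) - f(x^*) > \varepsilon_o\right\},\\
X_3 &:= \left\{x \in X : d\left(\begin{bmatrix} g \\ h \end{bmatrix}(x), \mathbb{R}^{m_I}_{-} \times \{0\}\right) \in (0, \varepsilon_f] \text{ and } f(x) - f(x^*) \le \varepsilon_o\right\},\\
X_4 &:= \left\{x \in X : d\left(\begin{bmatrix} g \\ h \end{bmatrix}(x), \mathbb{R}^{m_I}_{-} \times \{0\}\right) = 0 \text{ and } f(x) - f(x^*) > \varepsilon\right\},\\
X_5 &:= \left\{x \in X : d\left(\begin{bmatrix} g \\ h \end{bmatrix}(x), \mathbb{R}^{m_I}_{-} \times \{0\}\right) = 0 \text{ and } f(x) - f(x^*) \le \varepsilon\right\}.
\end{aligned}$$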
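Equation (TOL) ties the three tolerances to a single width $\delta = (\varepsilon/\tau)^{1/\beta}$. The Python sketch below uses hypothetical orders and prefactors. It solves (TOL) for $\varepsilon_f$ and $\varepsilon_o$ and evaluates the width condition of Lemma 3. It also shows that when $\beta_I < \beta$, the feasibility tolerance $\varepsilon_f = \tau_I (\varepsilon/\tau)^{\beta_I/\beta}$ is much larger than $\varepsilon$ for small $\varepsilon$, so the nearly-feasible set $X_3$ can be correspondingly large.

```python
def tol_tolerances(eps, tau, beta, tau_I, beta_I, tau_f, beta_f):
    """Return (delta, eps_f, eps_o) satisfying Equation (TOL):
    (eps_f / tau_I)**(1/beta_I) = (eps_o / tau_f)**(1/beta_f) = (eps / tau)**(1/beta) = delta.
    """
    delta = (eps / tau) ** (1.0 / beta)
    eps_f = tau_I * delta ** beta_I
    eps_o = tau_f * delta ** beta_f
    return delta, eps_f, eps_o


def lemma3_width(eps, tau_f, beta_f):
    """Lemma 3: a node with f(x) >= f(x*) on its domain is fathomed once w <= (eps/tau_f)**(1/beta_f)."""
    return (eps / tau_f) ** (1.0 / beta_f)


# Hypothetical orders: second order at feasible points, first order at infeasible points.
delta, eps_f, eps_o = tol_tolerances(eps=1e-4, tau=1.0, beta=2.0,
                                     tau_I=1.0, beta_I=1.0, tau_f=1.0, beta_f=2.0)
print(delta, eps_f, eps_o)   # eps_f = delta is about 1e-2, much larger than eps = 1e-4
```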

[Figure 1: panel (a) Example 1 (unconstrained), panel (b) Example 2 (inequality-constrained), panel (c) Example 3 (equality-constrained); legend: $X_1$, $X_2$, $X_3$, $X_4$, $X_5$, $x^*$.]

Fig. 1: Plots of the sets $X_1$ through $X_5$ for an unconstrained, an inequality-constrained, and an equality-constrained problem. The dashed lines define the sets $X$, and the filled-in triangles denote the unique global minimizers of the problems on $X$. All plots use $\varepsilon = \varepsilon_o = \varepsilon_f = 0.1$ for illustration.

In words, the five sets are:
- $X_1$: the infeasible points for Problem (P) whose measure of infeasibility is greater than $\varepsilon_f$;
- $X_2$: the infeasible points whose measure of infeasibility is at most $\varepsilon_f$ and whose objective function value is greater than $f(x^*) + \varepsilon_o$;
- $X_3$: the infeasible points whose measure of infeasibility is at most $\varepsilon_f$ and whose objective function value is at most $f(x^*) + \varepsilon_o$;
- $X_4$: the feasible points for Problem (P) whose objective value is greater than $f(x^*) + \varepsilon$;
- $X_5$: the feasible points whose objective value is at most $f(x^*) + \varepsilon$.

The sets $X_1$ through $X_5$ are illustrated in Figure 1 for the three two-dimensional problems presented in Examples 1 to 3. Intuitively, we expect nodes with domains contained in $X_1$ and $X_2$ to be fathomed (by infeasibility and by value dominance, respectively) more easily than nodes with domains contained in $X_3$. Similarly, we expect nodes with domains contained in $X_4$ to be fathomed by value dominance more easily than nodes with domains contained in $X_5$. This intuition is formalized in Corollary 1. Consequently, the extent of clustering is dictated primarily by the number of boxes required to cover the regions $X_3$ and $X_5$.
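Membership in the sets $X_1$ through $X_5$ depends only on the measure of infeasibility and the objective gap. The Python sketch below assigns a point to its set from the constraint values $g(x)$, $h(x)$ and the objective value $f(x)$. The helper names are hypothetical. The distance to $\mathbb{R}^{m_I}_{-} \times \{0\}$ is computed by projection: clip each $g_j$ at zero from above and set each $h_k$ to zero. Exact comparison of $d$ with zero mirrors the set definitions; a practical code would likely use a small feasibility tolerance instead.

```python
import math
from typing import Sequence


def infeasibility(g_vals: Sequence[float], h_vals: Sequence[float]) -> float:
    """Euclidean distance from [g(x); h(x)] to R_-^{m_I} x {0} (Definition 2 for a point).

    The nearest point of R_-^{m_I} x {0} replaces each g_j by min(g_j, 0) and each
    h_k by 0, so only constraint violations contribute to the distance.
    """
    return math.sqrt(sum(max(0.0, gj) ** 2 for gj in g_vals)
                     + sum(hk ** 2 for hk in h_vals))


def partition_index(g_vals: Sequence[float], h_vals: Sequence[float],
                    f_val: float, f_star: float,
                    eps: float, eps_f: float, eps_o: float) -> int:
    """Return k in {1, ..., 5} such that the point lies in X_k."""
    d = infeasibility(g_vals, h_vals)
    if d > eps_f:
        return 1                                   # infeasible beyond eps_f
    if d > 0.0:
        return 2 if f_val - f_star > eps_o else 3  # nearly feasible
    return 4 if f_val - f_star > eps else 5        # feasible


print(infeasibility([-1.0, 3.0], [4.0]))                         # 5.0
print(partition_index([-1.0], [], -0.45, -0.5, 0.1, 0.1, 0.1))   # 5
```

With $\varepsilon = \varepsilon_o = \varepsilon_f = 0.1$ as in Figure 1, a feasible point whose objective value lies within $0.1$ of $f(x^*)$ falls into $X_5$.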
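Section 3 provides conservative estimates, under suitable assumptions, of the number of boxes of certain widths required to cover $X_3$ and $X_5$. As an aside, the condition in Equation (TOL) roughly ensures that, in the worst case, nodes with domains contained in $X_1$, $X_2$, and $X_4$ can be fathomed with a similar level of effort.

Example 1 Let $X = (0, 1) \times (0, 1)$, $m_I = m_E = 0$, and $f(x) = x_1^4 + x_2^4 - x_1^2 - x_2^2$, with $x^* = (1/\sqrt{2}, 1/\sqrt{2})$. We have $X_1 = X_2 = X_3 = \emptyset$,
$$X_4 = \{x \in X : x_1^4 + x_2^4 - x_1^2 - x_2^2 > f(x^*) + \varepsilon\}, \quad X_5 = \{x \in X : x_1^4 + x_2^4 - x_1^2 - x_2^2 \le f(x^*) + \varepsilon\}.$$
The sets $X_1$ through $X_5$ are depicted in Figure 1a for $\varepsilon = 0.1$.

Example 2 Let $X = (2.2, 2.5) \times (2.9, 3.3)$, $m_I = 3$, $m_E = 0$, $f(x) = -x_1 - x_2$, $g_1(x) = -2x_1^4 + 8x_1^3 - 8x_1^2 + x_2 - 2$, $g_2(x) = -4x_1^4 + 32x_1^3 - 88x_1^2 + 96x_1 + x_2 - 36$, and $g_3(x) = 3 - x_2$, with $x^* \approx (2.33, 3.18)$, based on Example 4.10 in [8]. We have:
$$\begin{aligned}
X_1 &= \left\{x \in X : \sqrt{\textstyle\sum_{j=1}^{3} \max(0, g_j(x))^2} > \varepsilon_f\right\},\\
X_2 &= \left\{x \in X : \sqrt{\textstyle\sum_{j=1}^{3} \max(0, g_j(x))^2} \in (0, \varepsilon_f],\ -x_1 - x_2 > f(x^*) + \varepsilon_o\right\},\\
X_3 &= \left\{x \in X : \sqrt{\textstyle\sum_{j=1}^{3} \max(0, g_j(x))^2} \in (0, \varepsilon_f],\ -x_1 - x_2 \le f(x^*) + \varepsilon_o\right\},\\
X_4 &= \left\{x \in X : g(x) \le 0,\ -x_1 - x_2 > f(x^*) + \varepsilon\right\},\\
X_5 &= \left\{x \in X : g(x) \le 0,\ -x_1 - x_2 \le f(x^*) + \varepsilon\right\}.
\end{aligned}$$
The sets $X_1$ through $X_5$ are depicted in Figure 1b for $\varepsilon = \varepsilon_o = \varepsilon_f = 0.1$.

Example 3 Let $X = (0.4, 1.0) \times (0.5, 2.0)$, $m_I = 2$, $m_E = 1$, $f(x) = -12x_1 - 7x_2 + x_2^2$, $g_1(x) = x_1 - 0.9$, $g_2(x) = 0.5 - x_1$, and $h(x) = x_2 + 2x_1^4 - 2$, with $x^* \approx (0.72, 1.47)$, based on Example 4.9 in [8]. We have:
$$\begin{aligned}
X_1 &= \left\{x \in X : \sqrt{\textstyle\sum_{j=1}^{2} \max(0, g_j(x))^2 + h(x)^2} > \varepsilon_f\right\},\\
X_2 &= \left\{x \in X : \sqrt{\textstyle\sum_{j=1}^{2} \max(0, g_j(x))^2 + h(x)^2} \in (0, \varepsilon_f],\ -12x_1 - 7x_2 + x_2^2 > f(x^*) + \varepsilon_o\right\},\\
X_3 &= \left\{x \in X : \sqrt{\textstyle\sum_{j=1}^{2} \max(0, g_j(x))^2 + h(x)^2} \in (0, \varepsilon_f],\ -12x_1 - 7x_2 + x_2^2 \le f(x^*) + \varepsilon_o\right\},\\
X_4 &= \left\{x \in X : g(x) \le 0,\ h(x) = 0,\ -12x_1 - 7x_2 + x_2^2 > f(x^*) + \varepsilon\right\},\\
X_5 &= \left\{x \in X : g(x) \le 0,\ h(x) = 0,\ -12x_1 - 7x_2 + x_2^2 \le f(x^*) + \varepsilon\right\}.
\end{aligned}$$
The sets $X_1$ through $X_5$ are depicted in Figure 1c for $\varepsilon = \varepsilon_o = \varepsilon_f = 0.1$.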
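For Example 1, the extent of $X_5$ can be bounded by hand, which gives a conservative count of boxes of a given width needed to cover it. The bound works coordinate by coordinate. Since $q(t) = t^4 - t^2 \ge -1/4$, the condition $f(x) \le f(x^*) + \varepsilon$ forces $q(x_i) \le -1/4 + \varepsilon$ for each $i$. The Python sketch below computes the resulting interval and counts the $\delta$-wide boxes that tile the bounding square. The widths $\delta$ are hypothetical, and covering the square overestimates what is needed to cover $X_5$ itself.

```python
import math

EPS = 0.1       # epsilon used for Figure 1a
F_STAR = -0.5   # f(x*) at x* = (1/sqrt(2), 1/sqrt(2)), i.e. 2 * (1/4 - 1/2)


def f(x1: float, x2: float) -> float:
    """Objective of Example 1."""
    return x1**4 + x2**4 - x1**2 - x2**2


def x5_coordinate_range(eps: float = EPS):
    """Interval [t_lo, t_hi] containing both coordinates of every point of X_5.

    q(x_i) <= -1/4 + eps with q(t) = t**4 - t**2; substituting s = t**2 gives
    s**2 - s - c <= 0 with c = -1/4 + eps, whose roots are (1 -/+ sqrt(1 + 4c)) / 2.
    Assumes 0 <= eps <= 1/4 so that both roots are nonnegative.
    """
    c = -0.25 + eps
    disc = math.sqrt(1.0 + 4.0 * c)
    return math.sqrt((1.0 - disc) / 2.0), math.sqrt((1.0 + disc) / 2.0)


def boxes_to_cover(delta: float, eps: float = EPS) -> int:
    """Number of delta-wide boxes tiling the bounding square of X_5 (conservative)."""
    t_lo, t_hi = x5_coordinate_range(eps)
    per_axis = math.ceil((t_hi - t_lo) / delta)
    return per_axis ** 2


print(x5_coordinate_range())                          # about (0.4287, 0.9035)
print([boxes_to_cover(d) for d in (0.1, 0.01, 0.001)])
```

In the article's terms, $\delta$ would be the width at which nodes in $X_5$ can be fathomed in the worst case. The quadratic growth of this count as $\delta$ shrinks is what makes $X_5$ a source of clustering when $\delta$ is small.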

The following corollary of Lemmata 1, 2, and 3, similar to Lemma 2 in [29], provides sufficient conditions under which nodes with domains contained in $X_1$, $X_2$, and $X_4$ can be fathomed.

Corollary 1 (Fathoming Nodes Contained in $X_1$, $X_2$, and $X_4$) Let $\delta = (\varepsilon/\tau)^{1/\beta}$.
1. Suppose the convex relaxation-based lower bounding scheme has convergence of order $\beta_I > 0$ at each $x \in X_1$ with a prefactor $\tau_I > 0$ that is independent of $x$. Consider $\tilde{X}_1 \in \mathbb{I}X_1$ corresponding to the domain of node $k_1$ in the branch-and-bound tree. If $w(\tilde{X}_1) \le \delta$, then node $k_1$ will be fathomed by infeasibility.
2. Suppose the scheme of convex relaxations $(f^{cv}_Z)_{Z \in \mathbb{I}X}$ has convergence of order $\beta_f > 0$ at each $x \in X_2$ with a prefactor $\tau_f > 0$ that is independent of $x$. Consider $\tilde{X}_2 \in \mathbb{I}X_2$ corresponding to the domain of node $k_2$ in the branch-and-bound tree. If $w(\tilde{X}_2) \le \delta$, then node $k_2$ will be fathomed by value dominance.
3. Suppose the convex relaxation-based lower bounding scheme has convergence of order $\beta > 0$ at each $x \in X_4$ with a prefactor $\tau > 0$ that is independent of $x$. Consider $\tilde{X}_4 \in \mathbb{I}X_4$ corresponding to the domain of node $k_4$ in the branch-and-bound tree. If $w(\tilde{X}_4) \le \delta$, then node $k_4$ will be fathomed by value dominance.

Corollary 1 implies that nodes with domains $\tilde{X}_1$, $\tilde{X}_2$, and $\tilde{X}_4$ such that $\tilde{X}_1 \in \mathbb{I}X_1$, $\tilde{X}_2 \in \mathbb{I}X_2$, and $\tilde{X}_4 \in \mathbb{I}X_4$ can be fathomed when or before their widths are $\delta$ (in fact, nodes with domains in $\mathbb{I}X_2$ and $\mathbb{I}X_4$ can be fathomed when or before their widths are $((\varepsilon_o + \varepsilon)/\tau_f)^{1/\beta_f}$ and $(2\varepsilon/\tau)^{1/\beta}$, respectively). However, nodes $\tilde{X}_5 \in \mathbb{I}X_5$ may, in the worst case, need to be covered by boxes of width $\delta$ before they are fathomed. Furthermore, nodes $\tilde{X}_3 \in \mathbb{I}X_3$ may need to be covered by a large number of boxes depending on the convergence properties of the lower bounding scheme on $X_3$. The following example presents a case in which clustering may occur on $X_3$ because the lower bounding scheme does not have a sufficiently large convergence order at infeasible points.

Example 4 Let $X = (-2, 2)$, $m_I = 3$, and $m_E = 0$ with $f(x) = x$, $g_1(x) = x^2$, $g_2(x) = x - 1$, and $g_3(x) = -1 - x$. We have $x^* = 0$, which is the only feasible point.
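For any $[x^L, x^U] =: Z \in \mathbb{I}X$, let $f^{cv}_Z(x) = x$,
$$g^{cv}_{1,Z}(x) = \begin{cases} -(x^U - x^L), & \text{if } 0 \in [x^L, x^U], \\ \min\{(x^L)^2, (x^U)^2\} - (x^U - x^L), & \text{otherwise}, \end{cases}$$
$g^{cv}_{2,Z}(x) = x - 1$, and $g^{cv}_{3,Z}(x) = -1 - x$. We have $\beta = \beta_I = 1$ and $\beta_f$ arbitrarily large, with prefactors $\tau$, $\tau_I$, and $\tau_f$, respectively, greater than zero. Suppose $\varepsilon, \varepsilon_f < 1$. Pick $\gamma > 0$ and $\alpha \in (0, \gamma)$ such that $(\gamma + \alpha)^2 = \varepsilon_f$. Let $x^L := -\gamma - \alpha = -\sqrt{\varepsilon_f}$ and $x^U := -\gamma + \alpha < 0$. The width of $Z$ is $w(Z) = 2\alpha$. Note that $g_2$ and $g_3$ are feasible on $Z$; therefore, we need only be concerned with the feasibility of $g_1$. We have $g_1(Z) = [(\gamma - \alpha)^2, (\gamma + \alpha)^2]$ and $d(g(Z), \mathbb{R}^{m_I}_-) = (\gamma - \alpha)^2$. This implies $g$ is infeasible at each $x \in Z$. Note that $X_3 = [x^L, 0) \cup (0, \min\{\varepsilon_o, \sqrt{\varepsilon_f}\}]$, which follows, in part, from each $x \in [x^L, 0)$ being infeasible with $f(x) \le f(x^*)$ and $d(g(x), \mathbb{R}^{m_I}_-) \le \varepsilon_f$. We have $g^{cv}_{1,Z}(Z) = [(\gamma - \alpha)^2 - 2\alpha, (\gamma - \alpha)^2 - 2\alpha]$ and $d(g^{cv}_Z(Z), \mathbb{R}^{m_I}_-) = \max\{0, (\gamma - \alpha)^2 - 2\alpha\}$. The optimal objective value of the lower bounding problem on $Z$ is $-\gamma - \alpha$ when $d(g^{cv}_Z(Z), \mathbb{R}^{m_I}_-) = 0$, and is $+\infty$ otherwise. Note that the lower bounding problem is infeasible on $Z$ when $(\gamma - \alpha)^2 - 2\alpha > 0$, which can be achieved by choosing $\alpha$ to be sufficiently small and increasing $\gamma$ accordingly. The maximum width of the interval $Z$ for which it can be fathomed by infeasibility can be shown to be $w(Z) = 2\alpha^* := 2(1 + \gamma - \sqrt{1 + 2\gamma}) = O(\gamma^2) = O(\varepsilon_f)$ (note that $\gamma < 1$ because $\varepsilon_f < 1$). For $\alpha > \alpha^*$, the interval $Z$ cannot be fathomed by infeasibility, and the optimal objective value of the lower bounding problem on $Z$ is $-\gamma - \alpha = -\sqrt{\varepsilon_f} = -O(\sqrt{\varepsilon})$. Such an interval $Z$ cannot be fathomed by value dominance either, since $\varepsilon < 1$. Therefore, in the worst case, the interval $Z$ can be fathomed only when $w(Z) = O(\gamma^2) = O(\varepsilon_f)$. This causes clustering in the worst case since $w([x^L, 0)) = O(\sqrt{\varepsilon_f})$ and $[x^L, 0) \subset X_3$.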
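The threshold $\alpha^*$ in Example 4 is the smaller root of $(\gamma - \alpha)^2 = 2\alpha$, the equation obtained by setting $d(g^{cv}_Z(Z), \mathbb{R}^{m_I}_-) = 0$. A minimal sketch (the function name `alpha_star` is ours) evaluates it in the algebraically equivalent, cancellation-free form $\alpha^* = \gamma^2 / (1 + \gamma + \sqrt{1 + 2\gamma})$ and checks that $2\alpha^*/\gamma^2 \to 1$, i.e. that the fathomable width scales like $\gamma^2 = O(\varepsilon_f)$.

```python
import math

def alpha_star(gamma):
    """Half of the largest width 2*alpha* that Example 4 fathoms by infeasibility.

    Smaller root of (gamma - a)**2 = 2*a, i.e. a = 1 + gamma - sqrt(1 + 2*gamma),
    rewritten as gamma**2 / (1 + gamma + sqrt(1 + 2*gamma)) to avoid cancellation.
    """
    return gamma ** 2 / (1.0 + gamma + math.sqrt(1.0 + 2.0 * gamma))

for gamma in (0.1, 0.01, 0.001):
    a = alpha_star(gamma)
    residual = (gamma - a) ** 2 - 2.0 * a   # zero at the infeasibility threshold
    print(f"gamma={gamma:g}  2*alpha*={2 * a:.3e}  "
          f"2*alpha*/gamma^2={2 * a / gamma ** 2:.4f}  residual={residual:.1e}")
```

The ratio column approaches 1 as $\gamma$ shrinks, while the interval $[x^L, 0)$ that must be covered only shrinks like $\gamma$, which is the source of the clustering described above.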

3 Analysis of the Cluster Problem

In this section, conservative estimates for the number of boxes required to cover $X_3$ and $X_5$ are provided based on assumptions on Problem (P) (in particular, on its set of global minimizers) and characteristics of the branch-and-bound algorithm. First, some requisite definitions are provided [2].

Definition 9 (Neighborhood of a Point) Let $x \in X \subset \mathbb{R}^{n_x}$. For any $\alpha > 0$, $p \in \mathbb{N}$, the set $N^p_\alpha(x) := \{z \in X : \|z - x\|_p < \alpha\}$ is called the $\alpha$-neighborhood of $x$ relative to $X$ with respect to the $p$-norm. Note that all norms on $\mathbb{R}^{n_x}$ are equivalent.

Definition 10 (Strict Local Minimum) Let $F(X)$ denote the feasible set of Problem (P). A point $x^* \in F(X)$ is called a strict local minimum if $x^*$ is a local minimum, and $\exists \alpha > 0$ such that $f(x) > f(x^*)$, $\forall x \in N^2_\alpha(x^*) \cap F(X)$ such that $x \ne x^*$.

Definition 11 (Nonisolated Feasible Point) A feasible point $x^* \in F(X)$ is said to be nonisolated if $\forall \alpha > 0$, $\exists z \in N^2_\alpha(x^*) \cap F(X)$ such that $z \ne x^*$.

Definition 12 (Set of Active Inequality Constraints) Let $x^* \in F(X)$ be a feasible point for Problem (P). The set of active inequality constraints at $x^*$, denoted by $A(x^*)$, is given by $A(x^*) := \{j \in \{1, \dots, m_I\} : g_j(x^*) = 0\}$.

Definition 13 (Tangent and Cone of Tangents) Let $x^* \in F(X) \subset \mathbb{R}^{n_x}$ be a feasible point for Problem (P). A vector $d \in \mathbb{R}^{n_x}$ is said to be a tangent of $F(X)$ at $x^*$ if there exists a sequence $\{\lambda_k\} \to 0$ with $\lambda_k > 0$, and a sequence $\{x_k\} \to x^*$ with $x_k \in F(X)$, such that $d = \lim_{k \to \infty} (x_k - x^*)/\lambda_k$. The set of all tangents of $F(X)$ at $x^*$, denoted by $T(x^*)$, is called the tangent cone of $F(X)$ at $x^*$.

3.1 Estimates for the number of boxes required to cover $X_5$

This section assumes that Problem (P) has a finite number of global minimizers (which implies each global minimum is a strict local minimum), and that $\varepsilon$ is small enough that $X_5$ is guaranteed to be contained in neighborhoods of global minimizers under additional assumptions. An estimate for the number of boxes of width $\delta$ required to cover some neighborhood of a minimum $x^*$ that contains the subset of $X_5$ around $x^*$ is provided under suitable assumptions.
An estimate for the number of boxes required to cover $X_5$ can be obtained by summing the above estimates over the set of global minimizers. Throughout this section, we assume that $x^*$ is a nonisolated feasible point; otherwise, $\exists \alpha > 0$ such that $N^2_\alpha(x^*) \cap X_5 = \{x^*\}$, which can be covered using a single box. We begin with a necessary condition for $x^*$ to be a local minimum.

Theorem 1 (First-Order Necessary Optimality Condition) Consider Problem (P), and suppose $f$ is differentiable at $x^*$. Then $\{d : \nabla f(x^*)^T d < 0\} \cap T(x^*) = \emptyset$.

Proof See Theorem 5.1.2 in [2].

Lemma 4 Consider Problem (P). Suppose $x^*$ is nonisolated and $f$ is differentiable at $x^*$. Then $\forall \theta > 0$, $\exists \alpha > 0$ such that
$$\inf_{d : \|d\|_1 = 1,\ \exists t > 0 : x^* + td \in N^1_\alpha(x^*) \cap F(X)} \nabla f(x^*)^T d > \min_{d : \|d\|_1 = 1,\ d \in T(x^*)} \nabla f(x^*)^T d - \theta.$$

Proof See Appendix A.1.

The following result, inspired by Lemma 2.4 in [28], provides a conservative estimate of the subset of $X_5$ around a nonisolated $x^*$ under the assumption that the objective function grows linearly on the feasible region in some neighborhood of $x^*$. The reader can compare the assumptions of Lemma 5 with what follows from Lemma 4 and the necessary optimality condition in Theorem 1 (see Remark 1 for details).

Lemma 5 Consider Problem (P). Suppose $x^*$ is nonisolated, $f$ is differentiable at $x^*$, and $\exists \alpha > 0$ such that
$$L := \inf_{d : \|d\|_1 = 1,\ \exists t > 0 : x^* + td \in N^1_\alpha(x^*) \cap F(X)} \nabla f(x^*)^T d > 0.$$
Then, $\exists \hat\alpha \in (0, \alpha]$ such that the region $N^1_{\hat\alpha}(x^*) \cap X_5$ can be conservatively approximated by $\hat{X}_5 = \{x \in N^1_{\hat\alpha}(x^*) : L\|x - x^*\|_1 \le 2\varepsilon\}$.

Proof Let $x = x^* + td \in N^1_\alpha(x^*) \cap F(X)$ with $\|d\|_1 = 1$ and $t = \|x - x^*\|_1 > 0$. We have
$$f(x) = f(x^* + td) = f(x^*) + \nabla f(x^*)^T (x - x^*) + o(\|x - x^*\|_1) = f(x^*) + t \nabla f(x^*)^T d + o(t) \ge f(x^*) + Lt + o(t),$$
where Step 2 follows from the differentiability of $f$ at $x^*$. Consequently, there exists $\hat\alpha \in (0, \alpha]$ such that for all $x = x^* + td \in F(X)$ with $\|d\|_1 = 1$ and $t \in [0, \hat\alpha)$:
$$f(x) \ge f(x^*) + Lt + o(t) \ge f(x^*) + \tfrac{L}{2} t.$$
Therefore, $\forall x \in N^1_{\hat\alpha}(x^*) \cap X_5$ we have $x = x^* + td \in F(X)$ with $\|d\|_1 = 1$ and $t = \|x - x^*\|_1 < \hat\alpha$, and
$$\varepsilon \ge f(x) - f(x^*) \ge \tfrac{L}{2} t \implies Lt = L\|x - x^*\|_1 \le 2\varepsilon.$$

A conservative estimate of the number of boxes of width $\delta$ required to cover $N^1_{\hat\alpha}(x^*) \cap X_5$ can be obtained by estimating the number of boxes of width $\delta$ required to cover $\hat{X}_5$ (see Theorem 2). The following remark is in order.

Remark 1
1. Lemma 5 is not applicable when $L = 0$. This can occur, for instance, when $x^*$ is an unconstrained minimum, in which case other techniques have to be employed to analyze the cluster problem [7, 2, 28, 29] under alternative assumptions.
This is because when $f$ is differentiable at an unconstrained minimizer $x^*$, it grows slower than linearly around $x^*$ as a result of the first-order necessary optimality condition $\nabla f(x^*) = 0$ (note that if $f$ is twice-differentiable at $x^*$ and $\nabla^2 f(x^*)$ is positive definite, then $f$ grows quadratically around $x^*$). The assumptions of Lemma 5 may be satisfied for a constrained problem, however, because they only require that the objective function grow linearly in the set of directions that lead to feasible points in some neighborhood of $x^*$. An example of $L = 0$ when $x^*$ is not an unconstrained minimum is: $X = (-2, 2)$, $m_I = 2$, $m_E = 0$, $f(x) = x^3$, $g_1(x) = -x$, and $g_2(x) = x - 1$ with $x^* = 0$. In this example, the objective function only grows cubically around $x^*$ in the direction from $x^*$ that leads to feasible points. From Lemma 4, we have that a sufficient condition for the key assumption of Lemma 5 to be satisfied is $\min_{d : \|d\|_1 = 1,\ d \in T(x^*)} \nabla f(x^*)^T d > 0$. It is not hard to show that this condition is also necessary when $f$ is differentiable at $x^*$. Proposition 2 shows that the assumptions of Lemma 5 will not be satisfied when Problem (P) does not contain any active inequality constraints and the minimizer corresponds to a KKT point for Problem (P).
2. $\hat\alpha$ depends on the local behavior of $f$ around $x^*$, but is independent of $\varepsilon$, since it is determined by the subset of $N^1_\alpha(x^*) \cap F(X)$ on which the affine function $f(x^*) + \tfrac{L}{2} t$ underestimates $f(x)$. Consequently, for sufficiently small $\varepsilon$, $\hat{X}_5 = \{x \in X : L\|x - x^*\|_1 \le 2\varepsilon\}$, since $\{x \in X : L\|x - x^*\|_1 \le 2\varepsilon\}$ will then be a subset of $N^1_{\hat\alpha}(x^*)$. Note that the factor 2 in the denominator of $\tfrac{L}{2} t$ is arbitrarily chosen; any factor $> 1$ can instead be chosen with a corresponding $\hat\alpha$. Furthermore, $x^*$ is necessarily the unique global minimizer of Problem (P) on $N^1_{\hat\alpha}(x^*)$ since $L > 0$.
3. If, in addition to the assumptions of Lemma 5, $f$ is assumed to be convex on $N^1_\alpha(x^*)$, then we can choose $\hat\alpha = \alpha$. Additionally, $N^1_{\hat\alpha}(x^*) \cap X_5$ can be conservatively approximated by $\{x \in X : L\|x - x^*\|_1 \le \varepsilon\}$ when $\varepsilon$ is small enough.
4. The estimate $\hat{X}_5$ becomes less conservative as $\varepsilon$ is decreased since the higher-order term $o(t) \to 0$ as $\varepsilon \to 0$. Simply put, this is because the affine approximation $f(x^*) + Lt$ provides a better description of $f$ as $\varepsilon \to 0$.

In fact, under the assumptions of Lemma 5, a less conservative estimate of $X_5$ can be obtained by accounting for the fact that not all points $x \in \{x \in N^1_{\hat\alpha}(x^*) : L\|x - x^*\|_1 \le 2\varepsilon\}$ satisfy $\nabla f(x^*)^T (x - x^*) \ge L\|x - x^*\|_1$.

Proposition 1 Consider Problem (P), and suppose the assumptions of Lemma 5 are satisfied. Then, $\exists \hat\alpha \in (0, \alpha]$ such that the region $N^1_{\hat\alpha}(x^*) \cap X_5$ can be conservatively approximated by
$$\hat{X}_5 = \{x \in N^1_{\hat\alpha}(x^*) : L\|x - x^*\|_1 \le 2\varepsilon,\ L\|x - x^*\|_1 \le \nabla f(x^*)^T (x - x^*)\}.$$
Proof The desired result follows from Lemma 5 and the fact that
$$\nabla f(x^*)^T (x - x^*) \ge L\|x - x^*\|_1, \quad \forall x \in N^1_\alpha(x^*) \cap F(X),$$
from the assumptions of Lemma 5.

As an illustration of the application of Lemma 5, let us reconsider Example 2. Recall that $X = (2.2, 2.5) \times (2.9, 3.3)$, $m_I = 3$, $m_E = 0$, $f(x) = -x_1 - x_2$, $g_1(x) = -2x_1^4 + 8x_1^3 - 8x_1^2 + x_2 - 2$, $g_2(x) = -4x_1^4 + 32x_1^3 - 88x_1^2 + 96x_1 + x_2 - 36$, and $g_3(x) = 3 - x_2$ with $x^* \approx (2.33, 3.18)$. Let $\varepsilon \le 0.07$. We have $F(X) = \{x \in X : g(x) \le 0\}$, $\nabla f(x^*) = (-1, -1)$, $\alpha = +\infty$, $L \approx 0.649$, and $X_5 = \{x \in X : g(x) \le 0,\ -x_1 - x_2 \le f(x^*) + \varepsilon\}$. Choose $\hat\alpha = +\infty$ in Lemma 5. From Lemma 5 and Remark 1, we have $\hat{X}_5 = \{x : 0.649\|x - x^*\|_1 \le \varepsilon\}$ since $f$ is convex. Figure 2a plots $X_5$ and $\hat{X}_5$ for $\varepsilon = 0.07$, and Figure 2b shows the improvement in the estimate when Proposition 1 is used, in which case we obtain
$$\hat{X}_5 = \{x : 0.649\|x - x^*\|_1 \le \varepsilon,\ 0.649\|x - x^*\|_1 \le -(x_1 - x_1^*) - (x_2 - x_2^*)\}.$$
Note that an even better estimate of $X_5$ may be obtained by using knowledge of the local feasible set $N^1_\alpha(x^*) \cap F(X)$. However, other than in some special cases (see Lemma 6), we shall stick with the estimate $\hat{X}_5$ from Lemma 5, since we are mainly concerned with the dependence of the extent of clustering on the convergence rate of the lower bounding scheme.

Before we provide an estimate of the number of boxes of width $\delta$ required to cover $N^1_{\hat\alpha}(x^*) \cap X_5$, we provide a few more examples that satisfy the assumptions of Lemma 5 and present an approach that could help determine if its assumptions are satisfied. Example 5 illustrates another inequality-constrained case which satisfies the assumptions of Lemma 5. Note that the minimizer $x^*$ does not satisfy the KKT conditions in this case.

Example 5 Let $\varepsilon \le 1$, $X = (-2, 2)$, $m_I = 3$, and $m_E = 0$ with $f(x) = -x$, $g_1(x) = x^3$, $g_2(x) = -1 - x$, $g_3(x) = x - 1$, and $x^* = 0$. We have $F(X) = [-1, 0]$, $\nabla f(x^*) = -1$, $\alpha = +\infty$, $L = 1$, and $X_5 = [-\varepsilon, 0]$. Choose $\hat\alpha = +\infty$ in Lemma 5. From Lemma 5 and Remark 1, we have $\hat{X}_5 = [-\varepsilon, +\varepsilon]$ since $f$ is convex.
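Returning to Example 2, the value $L \approx 0.649$ and the inclusion $X_5 \subset \hat{X}_5$ can be checked numerically. The sketch below is our own illustration, not the paper's procedure: it finds $x^*$ as the point where $g_1$ and $g_2$ are both active (Newton's method on the difference of the two boundary curves), approximates $L$ by sampling directions with $\|d\|_1 = 1$ in the linearized cone, which is the route of Proposition 3 and assumes that cone agrees with $T(x^*)$ here, and then scans a grid over $X$. It omits $g_3$; dropping a constraint only enlarges the feasible set, so the grid check runs over a superset of $X_5$.

```python
import math

# Example 2: f(x) = -x1 - x2; g1 <= 0 and g2 <= 0 read as x2 <= c1(x1) and x2 <= c2(x1)
def c1(t): return 2*t**4 - 8*t**3 + 8*t**2 + 2
def c2(t): return 4*t**4 - 32*t**3 + 88*t**2 - 96*t + 36
def dc1(t): return 8*t**3 - 24*t**2 + 16*t
def dc2(t): return 16*t**3 - 96*t**2 + 176*t - 96

# x* is where both constraints are active: solve c1(t) = c2(t) near t = 2.33
t = 2.33
for _ in range(30):
    t -= (c1(t) - c2(t)) / (dc1(t) - dc2(t))
x_star = (t, c1(t))
f_star = -x_star[0] - x_star[1]

# Linearized cone at x*: grad g_j(x*)^T d <= 0, with grad g_j(x*) = (-c_j'(x1*), 1)
grads = [(-dc1(t), 1.0), (-dc2(t), 1.0)]

def in_cone(d):
    return all(a * d[0] + b * d[1] <= 0.0 for a, b in grads)

# Minimize grad f(x*)^T d = -d1 - d2 over sampled directions with ||d||_1 = 1
L = math.inf
n_dirs = 100_000
for k in range(n_dirs):
    th = 2.0 * math.pi * k / n_dirs
    s = abs(math.cos(th)) + abs(math.sin(th))
    d = (math.cos(th) / s, math.sin(th) / s)
    if in_cone(d):
        L = min(L, -d[0] - d[1])

# Grid check of X_5 against {x : 0.649 * ||x - x*||_1 <= eps} (valid since f is convex)
eps = 0.07
violations = 0
for i in range(301):
    for j in range(401):
        x = (2.2 + 0.001 * i, 2.9 + 0.001 * j)
        feasible = x[1] <= c1(x[0]) and x[1] <= c2(x[0])   # g3 omitted on purpose
        if feasible and -x[0] - x[1] <= f_star + eps:
            if 0.649 * (abs(x[0] - x_star[0]) + abs(x[1] - x_star[1])) > eps:
                violations += 1

print(x_star, round(L, 4), violations)   # expected: L close to 0.649, no violations
```

Sampling only approximates the minimum over the cone from above; a finer direction grid, or an exact vertex enumeration of the cone intersected with the $\ell_1$ sphere, would tighten it.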

[Figure 2: panels (a) $X_5$ and estimate $\hat{X}_5$ from Lemma 5, (b) $X_5$ and estimate $\hat{X}_5$ from Proposition 1; axes $x_1$ and $x_2$]

Fig. 2: Plots of $X_5$ (solid regions) and $\hat{X}_5$ (the areas between the dotted lines) for Example 2 for $\varepsilon = 0.07$ (note that we do not use $\varepsilon = 0.1$ as in Figure 1b because the corresponding $\hat{X}_5$ are not contained in $X$). The dashed lines define the set $X$, the filled-in triangles correspond to the minimizer $x^*$, and the dash-dotted lines represent the axes translated to $x^*$.

The reader may conjecture, based on Example 5 and other examples of low dimension, that every nonisolated minimizer $x^*$ which does not satisfy the KKT conditions will automatically satisfy the main assumption of Lemma 5. Example 6, inspired by [10, Section 4.1], however, illustrates a case in which the assumptions of Lemma 5 are not satisfied even though $x^*$ does not satisfy the KKT conditions.

Example 6 Let $X = (-2, 2)^3$, $m_I = 5$, and $m_E = 0$ with fx = x + 3, g x = x, g 2 x = x, g 3 x = 2, g 4x = x 3, g 5 x = x 3, and $x^* = (0, 0, 0)$. We have FX = x [0,] 3 : = 0, fx =,0,0, and $L = 0$ for every $\alpha > 0$ since $(0, 0, 1) \in T(x^*)$ and $\nabla f(x^*)^T (0, 0, 1) = 0$.

The next result provides conditions under which the assumptions of Lemma 5 will not be satisfied. In particular, it is shown that the assumptions of Lemma 5 will not be satisfied if Problem (P) is purely equality-constrained and all the functions in Problem (P) are differentiable at a nonisolated $x^*$.

Proposition 2 Consider Problem (P) with $m_E \ge 1$. Suppose $x^*$ is nonisolated, $f$ is differentiable at $x^*$, the functions $h_k$, $k = 1, \dots, m_E$, are differentiable at $x^*$, and $A(x^*) = \emptyset$. Furthermore, suppose there exist multipliers $\lambda^* \in \mathbb{R}^{m_E}$ corresponding to the equality constraints such that $(x^*, 0, \lambda^*)$ is a KKT point. Then
$$\min_{d : \|d\|_1 = 1,\ d \in T(x^*)} \nabla f(x^*)^T d = 0.$$

Proof See Appendix A.2.

Note that the above result can naturally be extended to accommodate weakly active inequality constraints (see [2, Section 4.4]).
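To see the mechanism behind Proposition 2 on a concrete case, the sketch below uses a hypothetical instance that is not from the paper: minimize $f(x) = x_1 + x_2$ subject to the single equality constraint $h(x) = x_1^2 + x_2^2 - 2 = 0$, whose minimizer $x^* = (-1, -1)$ is a KKT point with multiplier $\lambda^* = 1/2$ and no inequality constraints. Along the feasible circle, the quotient $\nabla f(x^*)^T (x - x^*)/\|x - x^*\|_1$ works out to $\tan(|s|/2)$ for the point at angle $5\pi/4 + s$, so it tends to 0 and the constant $L$ of Lemma 5 is 0 for every $\alpha$.

```python
import math

# Hypothetical instance (not from the paper): min x1 + x2  s.t.  x1**2 + x2**2 - 2 = 0
x_star = (-1.0, -1.0)
lam = 0.5                                   # KKT multiplier for the equality constraint
grad_f = (1.0, 1.0)
grad_h = (2 * x_star[0], 2 * x_star[1])
kkt_residual = max(abs(grad_f[i] + lam * grad_h[i]) for i in range(2))

def ratio(s):
    """grad f(x*)^T (x - x*) / ||x - x*||_1 for the feasible point at angle 5*pi/4 + s."""
    th = 5 * math.pi / 4 + s
    x = (math.sqrt(2) * math.cos(th), math.sqrt(2) * math.sin(th))
    dx = (x[0] - x_star[0], x[1] - x_star[1])
    return (grad_f[0] * dx[0] + grad_f[1] * dx[1]) / (abs(dx[0]) + abs(dx[1]))

for s in (1e-1, 1e-2, 1e-3):
    print(s, ratio(s), ratio(-s), math.tan(s / 2))   # both ratios equal tan(|s|/2)
```

Because the tangent direction $(1, -1)$ is orthogonal to $\nabla f(x^*)$, the objective grows only quadratically along the feasible set, which is exactly the situation the proposition rules out for Lemma 5.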
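The ensuing examples illustrate that the assumptions of Lemma 5 may be satisfied when individual assumptions of Proposition 2 do not hold.

Example 7 Let $\varepsilon \le 0.5$, $X = (-2, 2) \times (-2, 2)$, $m_I = 1$, and $m_E = 1$ with $f(x) = x_1 + 10x_2^2$, $g(x) = -x_1$, $h(x) = x_1 - |x_2|$, and $x^* = (0, 0)$. We have $F(X) = \{x \in X : x_1 = |x_2|\}$, $\nabla f(x^*) = (1, 0)$, $\alpha = +\infty$, $L = 0.5$, and $X_5 = \{x \in [0, \varepsilon] \times [-\varepsilon, \varepsilon] : x_1 = |x_2|,\ x_1 + 10x_2^2 \le \varepsilon\}$. Choose $\hat\alpha = +\infty$ in Lemma 5. From Lemma 5 and Remark 1, we have $\hat{X}_5 = \{x \in X : \|x\|_1 \le 2\varepsilon\}$ since $f$ is convex.

Example 8 Let $\varepsilon \le 0.5$, $X = (-2, 2) \times (-2, 2)$, $m_I = 4$, and $m_E = 1$ with $f(x) = x_1 + x_2$, $g_1(x) = -x_1$, $g_2(x) = -x_2$, $g_3(x) = x_1 - 1$, $g_4(x) = x_2 - 1$, $h(x) = x_2 - x_1^3$, and $x^* = (0, 0)$. We have $F(X) = \{x \in [0, 1]^2 : x_2 = x_1^3\}$, $\nabla f(x^*) = (1, 1)$, $\alpha = +\infty$, $L = 1$, and $X_5 = \{x \in [0, \varepsilon] \times [0, \varepsilon] : x_2 = x_1^3,\ x_1 + x_2 \le \varepsilon\}$. Choose $\hat\alpha = +\infty$ in Lemma 5. From Lemma 5 and Remark 1, we have $\hat{X}_5 = \{x \in X : \|x\|_1 \le \varepsilon\}$ since $f$ is convex.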
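The constants $L = 0.5$ and $L = 1$ in Examples 7 and 8 can be checked directly from the definition in Lemma 5 by evaluating $\nabla f(x^*)^T (x - x^*)/\|x - x^*\|_1$ at sampled feasible points near $x^* = (0, 0)$. A minimal sketch, assuming the feasible sets stated above; `sampled_L` is a hypothetical helper, and in general sampling only approximates the infimum from above (here the quotient happens to be constant along each feasible set).

```python
def sampled_L(grad_f, points):
    """Smallest grad_f^T (x - x*) / ||x - x*||_1 over sampled feasible x, with x* = (0, 0)."""
    return min((grad_f[0] * x[0] + grad_f[1] * x[1]) / (abs(x[0]) + abs(x[1]))
               for x in points)

ts = [k / 1000 for k in range(1, 1001)]        # distances along each feasible branch

# Example 7: feasible set {x : x1 = |x2|}, grad f(x*) = (1, 0)
pts7 = [(t, t) for t in ts] + [(t, -t) for t in ts]
# Example 8: feasible set {x in [0, 1]^2 : x2 = x1**3}, grad f(x*) = (1, 1)
pts8 = [(t, t ** 3) for t in ts]

L7 = sampled_L((1.0, 0.0), pts7)
L8 = sampled_L((1.0, 1.0), pts8)
print(L7, L8)   # 0.5 1.0
```

In both examples the feasible points near $x^*$ lie on a curve, which is why $\hat{X}_5$ from Lemma 5 (a full-dimensional $\ell_1$ ball) overestimates $X_5$ so visibly.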

[Figure 3: panels (a) $X_5$ and estimate $\hat{X}_5$ from Lemma 5, (b) $X_5$ and estimate $\hat{X}_5$ from Lemma 6; axes $x_1$ and $x_2$]

Fig. 3: Plots of $X_5$ (solid curves) and $\hat{X}_5$ (left figure: area between the dotted lines; right figure: curve depicted by the circles) for Example 8 for $\varepsilon = 0.5$. The filled-in triangles correspond to the minimizer $x^*$, and the dash-dotted lines represent the axes translated to $x^*$.

Figure 3a plots $X_5$ and $\hat{X}_5$ for Example 8 for $\varepsilon = 0.5$. It is seen that the estimate $\hat{X}_5$ does not capture the one-dimensional nature of $X_5$, which is a consequence of the equality constraint in Example 8. This issue is addressed in Lemma 6. Note that $X_5$ for Example 7 also resides in a reduced-dimensional manifold, but Lemma 6 does not apply in this case since $h$ is not differentiable at $x^*$ (the discussion after Lemma 6 proposes a modification of the assumptions of Lemma 6 that addresses this issue).

While Lemma 5 provides a conservative estimate of $N^1_{\hat\alpha}(x^*) \cap X_5$ under suitable assumptions, verifying that its assumptions are satisfied is not straightforward. The following proposition provides a conservative approach for determining whether the assumptions of Lemma 5 are satisfied.

Proposition 3 Let $L(\alpha)$ denote the constant $L$ in Lemma 5 for a given $\alpha > 0$. When the active constraints are differentiable at $x^*$, a lower bound on $L_0 := \lim_{\alpha \to 0^+} L(\alpha)$ can be obtained by solving
$$\min_d\ \nabla f(x^*)^T d \quad \text{s.t.} \quad \|d\|_1 = 1,\ d \in \mathcal{L}(x^*),$$
where $\mathcal{L}(x^*) := \{d \in \mathbb{R}^{n_x} : \nabla g_j(x^*)^T d \le 0,\ \forall j \in A(x^*),\ \nabla h_k(x^*)^T d = 0,\ \forall k \in \{1, \dots, m_E\}\}$ denotes the linearized cone at $x^*$. If $x^*$ corresponds to a KKT point, the above formulation provides the exact value of $L_0$.

So far in this section, we have established conditions under which a conservative estimate of the subset of $X_5$ around a minimizer $x^*$ can be obtained, presented examples for which the above conditions hold, and isolated a class of problems for which the above conditions are not satisfied. The following theorem follows from Corollary 2.1 in [28], the proof of which is rederived in Appendix A for completeness. It provides a conservative estimate of the number of boxes of width $\delta$ required to cover $\hat{X}_5$ from Lemma 5. Therefore, from Lemma 1 and the result below, we can get an upper bound on the worst-case number of boxes required to cover $N^1_{\hat\alpha}(x^*) \cap X_5$ and estimate the extent of the cluster problem on that region (recall from Remark 1 that the subset of $X_5$ around $x^*$ will be contained in $N^1_{\hat\alpha}(x^*)$ for sufficiently small $\varepsilon$).

Theorem 2 Suppose the assumptions of Lemma 5 hold. Let $\delta = (\varepsilon/\tau)^{1/\beta}$ and $r = 2\varepsilon/L$.
1. If $\delta \ge 2r$, let $N = 1$.

2. If $2r/(m - 1) > \delta \ge 2r/m$ for some $m \in \mathbb{N}$ with $m \le n_x$ and $2 \le m \le 5$, then let m N = i=0 2 i nx m 3 + 2n x. i 3
3. Otherwise, let N = 2τ β ε nx β L 2τ β ε β L + 2n x τ β ε β L.

Then, $N$ is an upper bound on the number of boxes of width $\delta$ required to cover $\hat{X}_5$.

Proof See Appendix A.3.

Remark 2 Under the assumptions of Lemma 5, the dependence of $N$ on $\varepsilon$ disappears when the lower bounding scheme has first-order convergence on $N^1_{\hat\alpha}(x^*) \cap F(X)$, i.e., $\beta = 1$. Therefore, the cluster problem on $X_5$ may be eliminated even using first-order convergent lower bounding schemes with sufficiently small prefactors. This is in contrast to unconstrained global optimization, where at least second-order convergent lower bounding schemes are required to eliminate the cluster problem (see Remark 1 for an intuitive explanation of this qualitative difference in behavior). Note that the dependence of $N$ on the prefactor $\tau$ can be detailed in a manner similar to Table 1 in [29]. The above scaling has also been empirically observed by Goldsztejn et al. [9], who reason that [...] removes the tangency between the feasible set and the objective level set, and therefore should prevent the cluster effect.

The next result refines the analysis of Lemma 5 when Problem (P) contains equality constraints that can locally be eliminated using the implicit function theorem [22].

Lemma 6 Consider Problem (P) with $m_E < n_x$. Suppose $x^*$ is nonisolated, $f$ is differentiable at $x^*$, and $\exists \alpha > 0$ such that $h$ is continuously differentiable on $N^1_\alpha(x^*)$ and
$$L := \inf_{d : \|d\|_1 = 1,\ \exists t > 0 : x^* + td \in N^1_\alpha(x^*) \cap F(X)} \nabla f(x^*)^T d > 0.$$
Furthermore, suppose the variables $x$ can be reordered and partitioned into dependent variables $z \in \mathbb{R}^{m_E}$ and independent variables $p \in \mathbb{R}^{n_x - m_E}$, with $x \equiv (z, p)$, such that $\nabla_z h(z, p)$ is nonsingular on $N^1_\alpha(z^*, p^*)$, where $x^* \equiv (z^*, p^*)$. Then, $\exists \alpha_p, \alpha_z \in (0, \alpha]$, a continuously differentiable function $\phi : N^1_{\alpha_p}(p^*) \to N^1_{\alpha_z}(z^*)$, and $\hat\alpha \in (0, \alpha_p)$ such that the region $(N^1_{\alpha_z}(z^*) \times N^1_{\hat\alpha}(p^*)) \cap X_5$ can be conservatively approximated by
$$\hat{X}_5 = \{(z, p) \in N^1_{\alpha_z}(z^*) \times N^1_{\hat\alpha}(p^*) : z = \phi(p),\ L\|p - p^*\|_1 \le 2\varepsilon\}.$$
Proof The result follows from the proof of Lemma 5 and the implicit function theorem [22, Chapter 9].

Lemma 6 effectively states that, under suitable conditions, the subset of $X_5$ around $x^*$ resides in a reduced-dimensional manifold. Figure 3b compares the estimate $\hat{X}_5$ obtained from Lemma 6 (when we assume precise knowledge of the implicit function) with the one obtained from Lemma 5 for Example 8. The reason for distinguishing between $\alpha_p$ and $\hat\alpha$ is so that $\phi$ is continuously differentiable on $\mathrm{cl}\, N^1_{\hat\alpha}(p^*)$; this fact will be used shortly. Note that the assumptions that $h$ is continuously differentiable on $N^1_\alpha(x^*)$ and that $\nabla_z h(z, p)$ is nonsingular on $N^1_\alpha(z^*, p^*)$ can be relaxed based on a nonsmooth variant of the implicit function theorem [6, Chapter 7], which can be used to derive a less conservative estimate of $X_5$ for Example 7, for instance.

The following corollary of Theorem 2 refines the estimate of the number of boxes of width $\delta$ required to cover $\hat{X}_5$ under the assumptions of Lemma 6. It provides an upper bound on the number of boxes of width $\delta$ required to cover $X_5$ that scales as $O(\varepsilon^{(n_x - m_E)(1 - 1/\beta)})$, in contrast to the scaling $O(\varepsilon^{n_x(1 - 1/\beta)})$ from Theorem 2.
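The case distinctions in Theorem 2 involve $\varepsilon$ only through $r = 2\varepsilon/L$ and $\delta = (\varepsilon/\tau)^{1/\beta}$, that is, through the ratio $r/\delta = 2\tau^{1/\beta}\varepsilon^{(\beta - 1)/\beta}/L$. A minimal sketch of that ratio (the function name and the placeholder value $\tau = 1$ are ours; $L = 0.649$ is taken from Example 2) shows why first-order convergence, $\beta = 1$, already removes the $\varepsilon$-dependence noted in Remark 2.

```python
import math

def r_over_delta(eps, tau, beta, L):
    """Ratio r/delta with r = 2*eps/L and delta = (eps/tau)**(1/beta).

    Algebraically equal to 2 * tau**(1/beta) * eps**((beta - 1)/beta) / L.
    """
    r = 2.0 * eps / L
    delta = (eps / tau) ** (1.0 / beta)
    return r / delta

tau, L = 1.0, 0.649     # tau = 1 is a placeholder prefactor; L from Example 2
for beta in (0.5, 1.0, 2.0):
    print(beta, [r_over_delta(eps, tau, beta, L) for eps in (1e-1, 1e-2, 1e-3)])
```

For $\beta = 1$ every entry equals $2\tau/L$; for $\beta < 1$ the ratio, and with it the number of boxes, grows as $\varepsilon$ decreases, consistent with the $O(\varepsilon^{n_x(1 - 1/\beta)})$ scaling above.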

Corollary 2 Suppose the assumptions of Lemma 6 hold. Let $\delta = (\varepsilon/\tau)^{1/\beta}$ and $r = 2\varepsilon/L$. Define
$$M_k := \Big\lceil (n_x - m_E) \max_{p \in \mathrm{cl}\, N^1_{\hat\alpha}(p^*)} \|\nabla \phi_k(p)\|_\infty \Big\rceil, \quad k \in \{1, \dots, m_E\}, \qquad K := \{k \in \{1, \dots, m_E\} : M_k > 1\}.$$
1. If $\delta \ge 2r$, let $N = \prod_{k \in K} M_k$.
2. If $2r/(m - 1) > \delta \ge 2r/m$ for some $m \in \mathbb{N}$ with $m \le n_x - m_E$ and $2 \le m \le 5$, then let $N$ be the expression for $N$ in case 2 of Theorem 2 with $n_x$ replaced by $n_x - m_E$, multiplied by $\prod_{k \in K} M_k$.
3. Otherwise, let $N$ be the expression for $N$ in case 3 of Theorem 2 with $n_x$ replaced by $n_x - m_E$, multiplied by $\prod_{k \in K} M_k$.

Then, $N$ is an upper bound on the number of boxes of width $\delta$ required to cover $\hat{X}_5$.

Proof Theorem 2 can be used to obtain an overestimate of the number of boxes of width $\delta$ required to cover the projection of $\hat{X}_5$, as defined by Lemma 6, on $p$, i.e., $\{p \in N^1_{\hat\alpha}(p^*) : L\|p - p^*\|_1 \le 2\varepsilon\}$, by replacing $n_x$ with $n_x - m_E$ in the expressions for $N$. This estimate can be extended to obtain a conservative estimate of the number of boxes of width $\delta$ required to cover $\hat{X}_5$ as follows. Note that $\phi_k$ is Lipschitz continuous on $\mathrm{cl}\, N^1_{\hat\alpha}(p^*)$ with Lipschitz constant $M_k/(n_x - m_E)$, $\forall k \in \{1, \dots, m_E\}$. Consider any box $B$ of width $\delta$ that is used to cover the projection of $\hat{X}_5$ on $p$. We have
$$w(\phi_k(B \cap \mathrm{cl}\, N^1_{\hat\alpha}(p^*))) \le M_k \delta, \quad \forall k \in \{1, \dots, m_E\},$$
from the Lipschitz continuity of $\phi_k$. Therefore, we can replace the box $B$ using $\prod_{k \in K} M_k$ such boxes and translate them appropriately to cover the region $\{(z, p) \in N^1_{\alpha_z}(z^*) \times (B \cap N^1_{\hat\alpha}(p^*)) : L\|p - p^*\|_1 \le 2\varepsilon,\ z = \phi(p)\}$. Since $\bigcup_B (B \cap N^1_{\hat\alpha}(p^*))$ covers the projection of $\hat{X}_5$ on $p$, the desired result follows by multiplying the estimate obtained from Theorem 2 (with $n_x$ replaced by $n_x - m_E$) by $\prod_{k \in K} M_k$.

The next result provides a natural extension of Lemma 5 to the case when the objective function is not differentiable at the minimizer $x^*$ [28]. Note that a similar result was derived for the case of unconstrained optimization in [28, Section 2.3] under alternative assumptions.

Lemma 7 Consider Problem (P). Suppose $x^*$ is nonisolated, $f$ is locally Lipschitz continuous on $X$ and directionally differentiable at $x^*$, and $\exists \alpha > 0$ such that
$$L := \inf_{d : \|d\|_1 = 1,\ \exists t > 0 : x^* + td \in N^1_\alpha(x^*) \cap F(X)} f'(x^*; d) > 0.$$
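Then, $\exists \hat\alpha \in (0, \alpha]$ such that the region $N^1_{\hat\alpha}(x^*) \cap X_5$ can be conservatively approximated by $\hat{X}_5 = \{x \in N^1_{\hat\alpha}(x^*) : L\|x - x^*\|_1 \le 2\varepsilon\}$.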

Proof The proof is relegated to Appendix A.4 since it is similar to the proof of Lemma 5.

Remark 3 Theorem 2 can be extended to the case when the assumption that the function $f$ is differentiable at $x^*$ is relaxed by using Lemmata 1 and 7 and Corollary 2.1 in [28] (also see Theorem 2). Similar to the differentiable case, the dependence of $N$ on $\varepsilon$ disappears when the lower bounding scheme has first-order convergence on $N^1_{\hat\alpha}(x^*) \cap F(X)$, i.e., $\beta = 1$. Additionally, Lemma 6 and Corollary 2 can also be extended to the case when $f$ is not differentiable at $x^*$ under suitable assumptions.

Thus far, we have established conditions under which first-order convergence of the lower bounding scheme at feasible points is sufficient to mitigate the cluster problem on $X_5$. In the remainder of this section, we will present conditions under which second-order convergence of the lower bounding scheme is sufficient to mitigate clustering on $X_5$. The first result in this regard provides a conservative estimate of the subset of $X_5$ around a nonisolated $x^*$ under the assumption that the objective function grows quadratically or faster on the feasible region in some neighborhood of $x^*$.

Lemma 8 Consider Problem (P), and suppose $f$ is twice-differentiable at $x^*$. Suppose $\exists \alpha > 0, \gamma > 0$ such that
$$\nabla f(x^*)^T d + \tfrac{1}{2} d^T \nabla^2 f(x^*) d \ge \gamma d^T d, \quad \forall d \in \{d : x^* + d \in N^2_\alpha(x^*) \cap F(X)\}.$$
Then $\exists \hat\alpha \in (0, \alpha]$ such that the region $N^2_{\hat\alpha}(x^*) \cap X_5$ can be conservatively approximated by $\hat{X}_5 = \{x \in N^2_{\hat\alpha}(x^*) : \gamma\|x - x^*\|_2^2 \le 2\varepsilon\}$. Furthermore, $x^*$ is the unique global minimizer for Problem (P) on $N^2_{\hat\alpha}(x^*)$.

Proof Let $x = x^* + d \in N^2_\alpha(x^*) \cap F(X)$. We have
$$f(x) = f(x^* + d) = f(x^*) + \nabla f(x^*)^T d + \tfrac{1}{2} d^T \nabla^2 f(x^*) d + o(\|d\|^2) \ge f(x^*) + \gamma d^T d + o(\|d\|^2).$$
Consequently, there exists $\hat\alpha \in (0, \alpha]$ such that for all $x = x^* + d \in F(X)$ with $\|d\| \in [0, \hat\alpha)$:
$$f(x) \ge f(x^*) + \gamma d^T d + o(\|d\|^2) \ge f(x^*) + \tfrac{\gamma}{2} d^T d.$$
Therefore, $\forall x \in N^2_{\hat\alpha}(x^*) \cap X_5$ we have $x = x^* + d \in F(X)$ with $\|d\| < \hat\alpha$, and
$$\varepsilon \ge f(x) - f(x^*) \ge \tfrac{\gamma}{2} d^T d \implies \gamma\|d\|^2 = \gamma\|x - x^*\|^2 \le 2\varepsilon.$$
The conclusion that $x^*$ is the unique global minimizer for Problem (P) on $N^2_{\hat\alpha}(x^*)$ follows from the inequality $f(x) \ge f(x^*) + \tfrac{\gamma}{2} d^T d$ above.

Remark 4
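1. Lemma 8 is not applicable when no $\alpha > 0$ and $\gamma > 0$ satisfying its assumptions exist, for example when $X = (-2, 2) \times (-2, 2)$, $m_I = 2$, $m_E = 0$, $f(x) = x_2$, $g_1(x) = x_1^4 - x_2$, g 2 x =, and $x^* = (0, 0)$. In this case, for any $\alpha > 0$, there exist directions from $x^*$ to feasible points in which $f$ grows slower than quadratically near $x^*$.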
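Under Lemma 8 the estimate $\hat{X}_5$ is an $\ell_2$ ball of radius $\sqrt{2\varepsilon/\gamma}$, so the relevant comparison is between this radius and the box width $\delta = (\varepsilon/\tau)^{1/\beta}$. A minimal sketch (hypothetical function name; $\tau = \gamma = 1$ are placeholder values) shows that the ratio is independent of $\varepsilon$ exactly when $\beta = 2$, which is the sense in which second-order convergence suffices on $X_5$ under quadratic growth, while $\beta = 1$ leaves a ratio that grows like $\varepsilon^{-1/2}$.

```python
import math

def radius_over_delta(eps, tau, beta, gamma):
    """Ratio of the l2 radius sqrt(2*eps/gamma) of the Lemma 8 estimate to delta = (eps/tau)**(1/beta)."""
    radius = math.sqrt(2.0 * eps / gamma)
    delta = (eps / tau) ** (1.0 / beta)
    return radius / delta

for beta in (1.0, 2.0):
    print(beta, [radius_over_delta(eps, 1.0, beta, 1.0) for eps in (1e-2, 1e-4, 1e-6)])
```

This is only a one-dimensional proxy for the box count; in $n_x$ dimensions the number of boxes would be expected to behave roughly like this ratio raised to the power $n_x$.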