Safe and Tight Linear Estimators for Global Optimization
Glencora Borradaile and Pascal Van Hentenryck
Brown University, Box 1910, Providence, RI, {glencora,

Abstract. Global optimization problems are often approached by branch-and-bound algorithms which use linear relaxations of the nonlinear constraints computed from the current variable bounds. This paper studies how to derive safe linear relaxations to account for numerical errors arising when computing the linear coefficients. It first proposes two classes of safe linear estimators for univariate functions. Class-1 estimators generalize previously suggested estimators from quadratic to arbitrary functions, while class-2 estimators are novel. Class-2 estimators are shown to be tighter theoretically (in a certain sense) and almost always tighter numerically. The paper then generalizes these results to multivariate functions. It shows how to derive estimators for multivariate functions by combining univariate estimators derived for each variable independently. Moreover, the combination of tight class-1 safe univariate estimators is shown to be a tight class-1 safe multivariate estimator. Finally, multivariate class-2 estimators are shown to be theoretically tighter (in a certain sense) than multivariate class-1 estimators.

1 Introduction

Global optimization problems arise naturally in many application areas, including chemical and electrical engineering, biology, economics, and robotics, to name only a few. They consist of finding all solutions, or the global optima, of nonlinear programming problems. These problems are inherently difficult computationally (i.e., they are PSPACE-hard [4]) and may also be challenging numerically. In addition, there has been considerable interest in recent years in producing rigorous or reliable results, i.e., in making sure that the exact solutions are enclosed in the results of the algorithms.
Global optimization problems are often approached by branch-and-bound algorithms which use linear relaxations of the nonlinear constraints computed from the current bounds for the variables at each node of the search tree (e.g., [5, 1, 6, 9, 10, 15, 16, 19, 20]). The linear relaxation can be used to obtain a lower bound on the objective function (in minimization problems) and/or to update the variable bounds. This approach can also be combined with constraint satisfaction techniques for global optimization, which are also effective in reducing the variable bounds (e.g., [2, 3, 11, 7, 18]).
The linear relaxation is generally obtained by linearizing nonlinear terms independently, giving what is often called linear over- and under-estimators. When rigorous and reliable results are desired, it is critical to generate a safe linear relaxation which over-approximates the solution set of the nonlinear problem at hand (e.g., [13, 14]). Indeed, the coefficients in the linear constraints are generally given by real functions which are subject to rounding errors when evaluated. As a consequence, the resulting linear relaxation may not be safe. Moreover, naive approaches (e.g., upward rounding of the overestimator's coefficients) are not safe in general either. Once a safe linear relaxation is available, it can be solved exactly, or safe bounds on the objective function can be obtained using duality, as in [14, 8] for instance. Experimental results (e.g., [13]) have shown that both of these corrections are critical in practice, even on simple problems, to find all solutions to nonlinear polynomial systems. This paper focuses on obtaining safe linear relaxations for global optimization problems and contains two main contributions:

1. The paper presents two classes of safe estimators of univariate functions. The first class of estimators generalizes the results of [13] from quadratic to arbitrary functions, while the second class is entirely novel. Theoretical tightness results are given for both classes, giving the relative strengths of the presented estimators. In particular, the results show that class-2 estimators (when they apply) are theoretically tighter than class-1 estimators (in a certain sense to be defined). Moreover, the numerical results indicate that class-2 estimators are almost always tighter in practice in our experiments.

2. The paper then generalizes the univariate results to multivariate functions.
It shows how to derive estimators for multivariate functions by combining univariate estimators derived for each variable independently. Moreover, the combination of tight class-1 safe univariate estimators is shown to give a tight class-1 safe multivariate estimator. Finally, univariate relative tightness results are shown to carry over to the multivariate case, i.e., multivariate class-2 estimators are shown theoretically tighter (in a certain sense) than multivariate class-1 estimators. As a consequence, these results provide a systematic, comprehensive, and elegant framework to derive safe linear estimators for global optimization problems. In conjunction with the safe bounds on the linear relaxations derived in [14, 8], they provide the theoretical foundation for rigorous results in branch-and-bound approaches to global optimization based on linear programming. The rest of this paper is organized as follows: Section 2 defines the concept of safe estimators and the problems arising in deriving them. Section 3 derives the two classes of linear estimators for univariate functions. Section 4 presents the theoretical and numerical tightness results. Section 5 presents the multivariate results.
2 Definitions and Problem Statement

This section defines the concepts of linear estimators and safe linear overestimators, as well as the problem tackled in this paper. For simplicity, the definitions are given for univariate functions only, but they generalize naturally to multivariate functions. We only consider overestimators, since the treatment for underestimators is similar. A linear overestimator is a linear function which provides an upper bound to a univariate function over an interval.

Definition 1 (Linear Overestimators). Let $g$ be a univariate function $\mathbb{R} \to \mathbb{R}$. A linear overestimator of $g$ over the interval $[\underline{x}, \overline{x}]$ (1) is a linear function $mx + b$ satisfying $mx + b \ge g(x)$, $\forall x \in [\underline{x}, \overline{x}]$.

In general, given a univariate function $g$, linear overestimators are obtained through tangent or secant lines. These are implicitly specified by two functions $f_m(\underline{x}, \overline{x}, g)$ and $f_b(\underline{x}, \overline{x}, g)$ respectively computing the slope $m$ and the intercept $b$ of the estimator. (2) For instance, the secant line for the function $x^n$ ($n$ even) is the linear overestimator $mx + b$ over $[\underline{x}, \overline{x}]$ specified by
$$m = \frac{\overline{x}^n - \underline{x}^n}{\overline{x} - \underline{x}} \qquad b = \frac{\overline{x}\,\underline{x}^n - \underline{x}\,\overline{x}^n}{\overline{x} - \underline{x}}. \quad (1)$$
Unfortunately, given an underlying floating-point system $F$ and a representation of $f_m$ and $f_b$, the computation of these functions is subject to rounding errors and will produce approximations $\tilde{m}$ and $\tilde{b}$. However, the linear function $\tilde{m}x + \tilde{b}$ is not guaranteed to be a linear estimator of $g$ in $[\underline{x}, \overline{x}]$. The main issue addressed in this paper is how to compute safe linear overestimators, i.e., linear overestimators $m'x + b'$ where $m'$ and $b'$ are floating-point numbers in the underlying floating-point system $F$.

Definition 2 (Safe Linear Overestimators). Let $g$ be a univariate function $\mathbb{R} \to \mathbb{R}$. A safe linear overestimator of $g$ over interval $[\underline{x}, \overline{x}]$ is a linear overestimator $m'x + b'$ for $g$ over $[\underline{x}, \overline{x}]$, where $m', b' \in F$.

Notations. In the following, we often abuse notation and use $m$ and $b$ to represent the functions $f_m$ and $f_b$.
The critical point to remember is that $m$ and $b$ cannot be computed exactly and may involve significant rounding errors. Also, given an expression $e$, we use $\underline{e}$ and $\overline{e}$ to denote the most precise lower and upper approximations of $e$ at our disposal given $F$ and the representation of $e$. (3)

1 Without loss of generality, it will be assumed for the remainder of the paper that $\underline{x} < \overline{x}$. Since functions over a degenerate interval $[a, a]$ will not be estimated in practice, this is not a limitation.
2 More precisely, they are specified by a representation (e.g., the text) of these two functions.
3 Note that the safety results presented in this paper hold even if the approximations are not the most precise.
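As a concrete illustration of Definition 1 and the secant formulas in (1), the following sketch (ours, not code from the paper) computes the secant of $x^n$ with exact rational arithmetic, so that no rounding error enters yet:

```python
from fractions import Fraction

def secant_overestimator(n, xlo, xhi):
    """Slope and intercept (Eq. 1) of the secant of x**n over [xlo, xhi]."""
    m = Fraction(xhi**n - xlo**n, xhi - xlo)
    b = Fraction(xhi * xlo**n - xlo * xhi**n, xhi - xlo)
    return m, b

# For even n, x**n is convex, so the secant lies above it on the interval.
m, b = secant_overestimator(4, -2, 3)
assert (m, b) == (13, 42)  # secant through (-2, 16) and (3, 81)
assert all(m * x + b >= x**4
           for x in [Fraction(-2), Fraction(-1, 2), 0, 1, Fraction(5, 2), 3])
```

The rest of the paper is precisely about what happens when these rational formulas are evaluated in floating point instead.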
Fig. 1. Why $\overline{m}x + \overline{b}$ is not a Safe Linear Overestimator.

The Problem. At first sight, it may seem that the problem of finding a safe linear overestimator $m'x + b'$ is trivial: simply round upward, i.e., choose $m' = \overline{m}$ and $b' = \overline{b}$. Unfortunately, as shown in Figure 1, this is not correct. The figure shows $g$ and its linear overestimator $mx + b$ over $[\underline{x}, \overline{x}]$. The estimator $\overline{m}x + \overline{b}$ is correct in the $R^+$ region, but not in the $R^-$ region, where the slope $\overline{m}$ is too strong. Similarly, $\underline{m}x + \overline{b}$ is not a safe overestimator because its slope is too weak in the $R^+$ region. The value $b'$ must be chosen carefully when $m' = \overline{m}$ or $m' = \underline{m}$. The figure shows such a choice of $b'$.

Tightness. In addition to safety, one is generally interested in linear overestimators which are as tight as possible given $F$ and the representation of $f_m$ and $f_b$.

Definition 3 (Error of Linear Overestimators). Let $g$ be a univariate function $\mathbb{R} \to \mathbb{R}$. The error of a linear overestimator $mx + b$ for $g$ over $[\underline{x}, \overline{x}]$ is given by $\int_{\underline{x}}^{\overline{x}} (mx + b - g(x))\, dx$.

Definition 4 (Tightness of Linear Overestimators). Let $g$ be a univariate function $\mathbb{R} \to \mathbb{R}$. Let $l_1$ and $l_2$ be two linear overestimators of $g$ over $[\underline{x}, \overline{x}]$. $l_1$ is tighter than $l_2$ wrt $g$ and $[\underline{x}, \overline{x}]$ if $l_1$ has a smaller error than $l_2$.

3 Safe Linear Overestimators for Univariate Functions

This section describes two classes of safe overestimators. These estimators are derived from the linear overestimator $mx + b$. In other words, the goal is to find
Fig. 2. A Safe Overestimator when $\underline{x} \ge 0$.

$m'$ and $b'$ in $F$ such that
$$m'x + b' \ge mx + b, \quad \forall x \in [\underline{x}, \overline{x}]. \quad (2)$$
As mentioned, the first class generalizes the results of Michel et al. [13], who gave safe estimators for $x^2$. The second class is entirely new and enjoys some nice theoretical and numerical properties.

3.1 A First Class of Safe Overestimators

To obtain a safe linear overestimator $m'x + b'$ from $mx + b$, there are only two reasonable choices for $m'$: $\underline{m}$ and $\overline{m}$. Other choices would necessarily be less tight. We derive the safe overestimators for $m' = \overline{m}$ only, the derivation for $\underline{m}$ being similar. The problem thus reduces to finding $b'$ such that
$$\overline{m}x + b' \ge mx + b. \quad (3)$$
Since $\overline{m} \ge m$, it is sufficient to satisfy (3) at $x = \underline{x}$, which implies $b' \ge b - (\overline{m} - m)\underline{x}$. The overestimator can now be derived by a case analysis on the sign of $\underline{x}$. If $\underline{x} \le 0$, we have $b - (\overline{m} - m)\underline{x} \le \overline{b} - (\overline{m} - \underline{m})\underline{x} = \overline{b} - \mathrm{err}(m)\,\underline{x}$, where $\mathrm{err}(m) = \overline{m} - \underline{m}$. Therefore, choosing $b'$ as an upward-rounded value of $\overline{b} - \mathrm{err}(m)\,\underline{x}$ satisfies (3). If $\underline{x} \ge 0$, it is sufficient to choose $b' = \overline{b}$, as shown in Figure 2. The following theorem summarizes the results.
Theorem 1 (Safe Linear Overestimators for Univariate Functions, Class-1). Let $g$ be a univariate function and let $mx + b$ be a linear overestimator for $g$ in $[\underline{x}, \overline{x}]$. We have that
$$mx + b \le \begin{cases} \underline{m}x + \overline{b} + \mathrm{err}(m)\,\overline{x} & \text{if } \overline{x} \ge 0 \\ \underline{m}x + \overline{b} & \text{if } \overline{x} \le 0 \\ \overline{m}x + \overline{b} - \mathrm{err}(m)\,\underline{x} & \text{if } \underline{x} \le 0 \\ \overline{m}x + \overline{b} & \text{if } \underline{x} \ge 0. \end{cases}$$
As a consequence, the four right-hand sides (with their constant terms rounded upward) are safe linear overestimators for $g$ in $[\underline{x}, \overline{x}]$ under the specified conditions.

Proof. We give the proofs for the first two cases. The proofs are similar for the symmetric cases. In the first case, letting $x = \overline{x} - s$ with $s \ge 0$, we have
$$\begin{aligned}
\underline{m}x + \overline{b} + \mathrm{err}(m)\,\overline{x} &= \underline{m}(\overline{x} - s) + \overline{b} + \mathrm{err}(m)\,\overline{x} \\
&\ge \underline{m}(\overline{x} - s) + b + (\overline{m} - \underline{m})\overline{x} = \overline{m}\,\overline{x} - \underline{m}s + b \\
&\ge m\overline{x} - \underline{m}s + b \qquad \text{since } \overline{x} \ge 0 \text{ and } \overline{m} \ge m \\
&\ge m\overline{x} - ms + b \qquad \text{since } s \ge 0 \text{ and } \underline{m} \le m \\
&= mx + b.
\end{aligned}$$
In the second case, since $x \le \overline{x} \le 0$, we have that $\underline{m}x \ge mx$ and hence $\underline{m}x + \overline{b} \ge mx + b$.

3.2 A Second Class of Safe Overestimators

In general, the overestimators used in global optimization are either secant lines of $g$ (as in Figure 1) or tangent lines to $g$ at $\underline{x}$ or $\overline{x}$ (see, for instance, [9]). As a consequence, we have that $g(\underline{x}) = m\underline{x} + b$ and/or $g(\overline{x}) = m\overline{x} + b$. In these circumstances, it is possible to find a $b'$ satisfying (3) which does not depend on the sign of $\underline{x}$. Assume that $g(\underline{x}) = m\underline{x} + b$. Since $b'$ must satisfy $\overline{m}\,\underline{x} + b' \ge m\underline{x} + b = g(\underline{x})$, it follows that $b' \ge g(\underline{x}) - \overline{m}\,\underline{x}$. Choosing $b'$ as an upward-rounded value of $g(\underline{x}) - \overline{m}\,\underline{x}$ also satisfies (3). This choice for $b'$ enjoys some nice theoretical and experimental properties, as detailed in Section 4. Figure 3 illustrates this situation.

Theorem 2 (Safe Linear Overestimators of Univariate Functions, Class-2). Let $g$ be a univariate function and let $mx + b$ be a linear overestimator for $g$ in $[\underline{x}, \overline{x}]$. We have that
$$mx + b \le \begin{cases} \overline{m}x + g(\underline{x}) - \overline{m}\,\underline{x} & \text{if } g(\underline{x}) = m\underline{x} + b \\ \underline{m}x + g(\overline{x}) - \underline{m}\,\overline{x} & \text{if } g(\overline{x}) = m\overline{x} + b. \end{cases}$$
As a consequence, the right-hand sides (with their constant terms rounded upward) are safe linear overestimators for $g$ in $[\underline{x}, \overline{x}]$ under the specified conditions.
Fig. 3. Choosing $b'$ as an upward-rounded value of $g(\underline{x}) - \overline{m}\,\underline{x}$, which is almost always tighter than the corresponding class-1 intercept.

Proof. We give the proof for the first case. The proof is similar for the symmetric case. Letting $x = \underline{x} + s$ with $s \ge 0$, we have
$$\begin{aligned}
\overline{m}x + g(\underline{x}) - \overline{m}\,\underline{x} &= \overline{m}(\underline{x} + s) + g(\underline{x}) - \overline{m}\,\underline{x} \\
&= \overline{m}s + m\underline{x} + b \qquad \text{since } g(\underline{x}) = m\underline{x} + b \\
&\ge ms + m\underline{x} + b \qquad \text{since } s \ge 0 \text{ and } \overline{m} \ge m \\
&= mx + b.
\end{aligned}$$

4 Tightness of Safe Linear Overestimators

Theorems 1 and 2 provide us with six safe overestimators of a univariate function. Several of the conditions for these estimators are not mutually exclusive, and it is natural to study their relative tightness. Of course, it is possible to use all applicable safe estimators in the linear relaxation, but this may be undesirable for numerical and efficiency reasons. This section presents theoretical and experimental results on the tightness of the estimators.

4.1 Theoretical Results on Class-1 Estimators

This section studies the tightness of class-1 estimators. We first compare the estimators $\underline{m}x + \overline{b} + \mathrm{err}(m)\,\overline{x}$ and $\overline{m}x + \overline{b} - \mathrm{err}(m)\,\underline{x}$, which are both
Fig. 4. Finding the optimal floating-point representation. In this case $\underline{m}x + b_2$ is a tighter estimator than $\overline{m}x + b_1$.

applicable when $0 \in [\underline{x}, \overline{x}]$. The result shows which estimator to choose according to the magnitudes of $\underline{x}$ and $\overline{x}$. Figure 4 illustrates this.

Theorem 3 (Tightness of Class-1 Safe Linear Overestimators when $0 \in [\underline{x}, \overline{x}]$). Let $g$ be a univariate function, $mx + b$ be a linear overestimator for $g$ in $[\underline{x}, \overline{x}]$, and $\underline{x} < 0$ and $\overline{x} > 0$. The safe linear overestimator $\underline{m}x + \overline{b} + \mathrm{err}(m)\,\overline{x}$ is tighter than the safe linear overestimator $\overline{m}x + \overline{b} - \mathrm{err}(m)\,\underline{x}$ if $\overline{x} < -\underline{x}$. Similarly, the safe linear overestimator $\overline{m}x + \overline{b} - \mathrm{err}(m)\,\underline{x}$ is tighter than the safe linear overestimator $\underline{m}x + \overline{b} + \mathrm{err}(m)\,\overline{x}$ if $-\underline{x} < \overline{x}$.

Proof. To compare the two estimators, we compare the relative tightness of the slightly tighter estimators $\underline{m}x + b + \mathrm{err}(m)\,\overline{x}$ and $\overline{m}x + b - \mathrm{err}(m)\,\underline{x}$ with real intercepts. Their tightness is easier to determine and approximates well the actual relative tightness, since the rounding errors in computing $\overline{b} + \mathrm{err}(m)\,\overline{x}$ and $\overline{b} - \mathrm{err}(m)\,\underline{x}$ are comparable. First consider the error of $\underline{m}x + b + \mathrm{err}(m)\,\overline{x}$:
$$E_1 = \int_{\underline{x}}^{\overline{x}} (\underline{m}x + b + \mathrm{err}(m)\,\overline{x} - g(x))\, dx = \int_{\underline{x}}^{\overline{x}} ((\underline{m} - m)x + \mathrm{err}(m)\,\overline{x})\, dx + \underbrace{\int_{\underline{x}}^{\overline{x}} (mx + b - g(x))\, dx}_{E}$$
$$= (\overline{x} - \underline{x})\left[\mathrm{err}(m)\,\overline{x} - (m - \underline{m})\frac{\overline{x} + \underline{x}}{2}\right] + E.$$
The error of $\overline{m}x + b - \mathrm{err}(m)\,\underline{x}$ is similarly given by:
$$E_2 = \int_{\underline{x}}^{\overline{x}} (\overline{m}x + b - \mathrm{err}(m)\,\underline{x} - g(x))\, dx = (\overline{x} - \underline{x})\left[(\overline{m} - m)\frac{\overline{x} + \underline{x}}{2} - \mathrm{err}(m)\,\underline{x}\right] + E.$$
Estimator $\underline{m}x + b + \mathrm{err}(m)\,\overline{x}$ is tighter than estimator $\overline{m}x + b - \mathrm{err}(m)\,\underline{x}$ when $E_1 < E_2$, i.e., when $\overline{x} < -\underline{x}$. Similarly, estimator $\overline{m}x + b - \mathrm{err}(m)\,\underline{x}$ is tighter than estimator $\underline{m}x + b + \mathrm{err}(m)\,\overline{x}$ when $-\underline{x} < \overline{x}$.

When $\underline{x} \ge 0$, it is interesting to compare the estimator $\underline{m}x + \overline{b} + \mathrm{err}(m)\,\overline{x}$ with $\overline{m}x + \overline{b}$, and, when $\overline{x} \le 0$, $\underline{m}x + \overline{b}$ with $\overline{m}x + \overline{b} - \mathrm{err}(m)\,\underline{x}$.

Theorem 4 (Tightness of Class-1 Safe Linear Overestimators when $0 \notin [\underline{x}, \overline{x}]$). Let $g$ be a univariate function and $mx + b$ be a linear overestimator for $g$ in $[\underline{x}, \overline{x}]$. When $\underline{x} \ge 0$, $\overline{m}x + \overline{b}$ is tighter than $\underline{m}x + \overline{b} + \mathrm{err}(m)\,\overline{x}$. When $\overline{x} \le 0$, $\underline{m}x + \overline{b}$ is tighter than $\overline{m}x + \overline{b} - \mathrm{err}(m)\,\underline{x}$.

Proof. Consider the slightly tighter and easily comparable estimators $\overline{m}x + b$ and $\underline{m}x + b + \mathrm{err}(m)\,\overline{x}$ with real intercepts. The error of $\underline{m}x + b + \mathrm{err}(m)\,\overline{x}$ is:
$$E_1 = \int_{\underline{x}}^{\overline{x}} (\underline{m}x + b + \mathrm{err}(m)\,\overline{x} - g(x))\, dx = (\overline{x} - \underline{x})\left[\mathrm{err}(m)\,\overline{x} - (m - \underline{m})\frac{\overline{x} + \underline{x}}{2}\right] + E,$$
where $E$ is the error of $mx + b$. Likewise, the error of $\overline{m}x + b$ is:
$$E_2 = \int_{\underline{x}}^{\overline{x}} (\overline{m}x + b - g(x))\, dx = \frac{1}{2}(\overline{x} - \underline{x})(\overline{m} - m)(\overline{x} + \underline{x}) + E.$$
$\overline{m}x + \overline{b}$ is tighter than $\underline{m}x + \overline{b} + \mathrm{err}(m)\,\overline{x}$ when $E_2 < E_1$, which reduces to $\underline{x} < \overline{x}$. Since this condition is always met, $\overline{m}x + \overline{b}$ is always tighter than $\underline{m}x + \overline{b} + \mathrm{err}(m)\,\overline{x}$. Similarly, $\underline{m}x + \overline{b}$ is tighter than $\overline{m}x + \overline{b} - \mathrm{err}(m)\,\underline{x}$ when $\overline{x} \le 0$.

Theorems 3 and 4 generalize and provide the theoretical justification for the heuristic used by Michel et al. [13].
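A small sketch (our illustration, not the paper's code) of the class-1 construction of Theorem 1 together with the slope-selection heuristic of Theorems 3 and 4. It emulates upward rounding with `math.nextafter`, a pessimistic stand-in for a properly directed-rounding implementation; the caller is assumed to supply an enclosure `m_lo <= m <= m_hi` of the real slope and an upper bound `b_hi >= b` on the real intercept:

```python
import math

def up(v):
    """Next float above v: a crude stand-in for upward rounding."""
    return math.nextafter(v, math.inf)

def class1_safe(m_lo, m_hi, b_hi, xlo, xhi):
    """Theorem 1 with the rounding direction chosen by Theorems 3 and 4.
    Returns (m', b') with m'*x + b' >= m*x + b on [xlo, xhi]."""
    err = up(m_hi - m_lo)                     # err(m), rounded up
    if xlo >= 0:                              # Theorem 4: m_hi, no correction
        return m_hi, b_hi
    if xhi <= 0:                              # Theorem 4: m_lo, no correction
        return m_lo, b_hi
    if -xlo < xhi:                            # Theorem 3: correct at xlo
        return m_hi, up(b_hi + up(err * (-xlo)))
    return m_lo, up(b_hi + up(err * xhi))     # Theorem 3: correct at xhi
```

For example, with the exact secant of $x^2$ over $[-1, 2]$ ($m = 1$, $b = 2$), `class1_safe(1.0, 1.0, 2.0, -1.0, 2.0)` keeps slope $1.0$ and nudges the intercept just above $2.0$.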
4.2 Theoretical Results on Class-2 Estimators

We now study the tightness of class-2 estimators. The next theorem compares the real counterparts of the two class-2 estimators.

Theorem 5 (Tightness of Class-2 Safe Linear Overestimators). Let $g$ be a univariate function with linear overestimator $mx + b$ over $[\underline{x}, \overline{x}]$ such that $g(\underline{x}) = m\underline{x} + b$ and $g(\overline{x}) = m\overline{x} + b$. The function $\overline{m}x + g(\underline{x}) - \overline{m}\,\underline{x}$ is a tighter estimator than $\underline{m}x + g(\overline{x}) - \underline{m}\,\overline{x}$ when $\overline{m} - m < m - \underline{m}$.

Proof. The error of $\overline{m}x + g(\underline{x}) - \overline{m}\,\underline{x}$ is given by
$$E_1 = \int_{\underline{x}}^{\overline{x}} (\overline{m}x + g(\underline{x}) - \overline{m}\,\underline{x} - g(x))\, dx = \frac{1}{2}(\overline{x} - \underline{x})^2(\overline{m} - m) + E,$$
where $E$ is the error of $mx + b$. Likewise, the error of $\underline{m}x + g(\overline{x}) - \underline{m}\,\overline{x}$ is
$$E_2 = \frac{1}{2}(\overline{x} - \underline{x})^2(m - \underline{m}) + E.$$
The $\overline{m}$ formulation is tighter when $E_1 < E_2$, which reduces to $\overline{m} - m < m - \underline{m}$, i.e., when $\overline{m}$ is a better approximation of $m$ than $\underline{m}$.

Of course, this result is not useful in practice, since $m$ is not known and there is no way to evaluate the condition stated in Theorem 5. The theorem also only considers the real counterparts of the estimators, i.e., it ignores the rounding errors in the actual evaluation of the estimators. In other words, since these estimators can be rewritten as $\overline{m}(x - \underline{x}) + g(\underline{x})$ and $\underline{m}(x - \overline{x}) + g(\overline{x})$, Theorem 5 applies to the class-2 estimators whenever the rounding errors in these two terms are similar, which in turn requires that $g(\underline{x}) \approx g(\overline{x})$. Fortunately, the next section, which relates both classes, gives a criterion to choose between them.

4.3 Theoretical Results on Class-1 and Class-2 Estimators

This section compares the class-1 and class-2 overestimators. Its main result shows that class-2 estimators are always theoretically tighter than the corresponding optimal class-1 estimators. The theorems are given for the $\overline{m}$ estimators, but similar results hold for the $\underline{m}$ estimators.

Theorem 6 (Relative Tightness of Class-1 and Class-2 Safe Linear Overestimators). Let $g$ be a univariate function with linear overestimator $mx + b$ over $[\underline{x}, \overline{x}]$ such that $g(\underline{x}) = m\underline{x} + b$.
The class-2 estimator with real intercept, $\overline{m}x + g(\underline{x}) - \overline{m}\,\underline{x}$, is always tighter than the optimal (using the rules given in Theorems 3 and 4) class-1 estimator with real intercept when $-\underline{x} \le \overline{x}$.
Proof. When $\underline{x} \le 0$, since $\overline{m} \le m + \mathrm{err}(m)$, we have
$$\overline{m}x + g(\underline{x}) - \overline{m}\,\underline{x} \le \overline{m}x + g(\underline{x}) - (m + \mathrm{err}(m))\underline{x} = \overline{m}x + b - \mathrm{err}(m)\,\underline{x},$$
which is the class-1 estimator with real intercept when $\underline{x} \le 0$. When $\underline{x} \ge 0$, since $\overline{m} \ge m$, we have
$$\overline{m}x + g(\underline{x}) - \overline{m}\,\underline{x} \le \overline{m}x + g(\underline{x}) - m\underline{x} = \overline{m}x + b,$$
which is the class-1 estimator with real intercept when $\underline{x} \ge 0$.

Theorem 6 is interesting for many reasons. First, although this result compares estimators with real intercepts, it extends to estimators with floating-point intercepts. Notice that, for example, $g(\underline{x}) - \overline{m}\,\underline{x}$ is simply a specific way to compute the intercept $b$. In most cases (for example, Equation 1), $\mathrm{err}(b) \approx \mathrm{err}(g(\underline{x}) - \overline{m}\,\underline{x})$, except when significant simplifications can be made to the formula defining $b$ (for example, Equation 1 when $n = 2$). Since the rounding errors are approximately equivalent in these final computations, this relative tightness result will frequently hold for estimators with floating-point intercepts. The experimental results in Section 4.4 confirm this. Second, it provides intuition as to why class-2 estimators are tighter than class-1 estimators: a class-1 estimator systematically accounts for an error $\mathrm{err}(m) = \overline{m} - \underline{m}$ in the slope, while a class-2 estimator only adds an error $\overline{m} - m$ (resp. $m - \underline{m}$). Third, since class-2 estimators can be viewed as tighter versions of the class-1 estimators, similar criteria can be applied to choose between them, solving the problem left open in the previous section. For completeness, Appendix A compares the class-2 safe linear overestimator built from $g(\underline{x}) - \overline{m}\,\underline{x}$ with the class-1 estimators that use $m' = \underline{m}$. This result is not useful in practice when the class-2 estimator using $\underline{m}$ is available (which is always the case for secant lines, for instance); in this case, one should choose the other class-2 estimator, which is guaranteed to be tighter theoretically. However, the result sheds some light on the relationships between the two classes of estimators and indicates that, in general, class-2 estimators should be preferred.
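In the same hedged spirit as the class-1 sketch (ours, not the paper's code; `up` emulates upward rounding via `math.nextafter`), the class-2 intercept of Theorem 2 anchored at the left endpoint can be computed as follows; the caller is assumed to supply an upper bound on the true $g(\underline{x})$ and on the true slope:

```python
import math

def up(v):
    """Next float above v: a crude stand-in for upward rounding."""
    return math.nextafter(v, math.inf)

def class2_safe_at_lower(g_xlo_up, m_hi, xlo):
    """Theorem 2, anchored at xlo: slope m_hi and an upward-rounded
    value of g(xlo) - m_hi*xlo. g_xlo_up must satisfy g_xlo_up >= g(xlo)."""
    return m_hi, up(g_xlo_up + up(m_hi * (-xlo)))

# sanity check against the exact secant of x**2 over [-1, 2]: m = 1, b = 2
ms, bs = class2_safe_at_lower(1.0, 1.0, -1.0)   # g(-1) = 1
assert all(ms * x + bs >= x * x for x in [-1.0, -0.5, 0.0, 1.0, 2.0])
```

Note how the correction involves only the gap between `m_hi` and the real slope, rather than the full `err(m)` of the class-1 construction, which is the intuition behind Theorem 6.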
4.4 Numerical Results

We now compare class-1 and class-2 estimators numerically to confirm the findings of Theorem 6. More precisely, for $0 \in [\underline{x}, \overline{x}]$, we compare the class-2 estimators $\underline{m}x + g(\overline{x}) - \underline{m}\,\overline{x}$ and $\overline{m}x + g(\underline{x}) - \overline{m}\,\underline{x}$ with the class-1 estimators $\underline{m}x + \overline{b} + \mathrm{err}(m)\,\overline{x}$ and $\overline{m}x + \overline{b} - \mathrm{err}(m)\,\underline{x}$ respectively (all intercepts rounded upward), using the above heuristics to choose between the pairs. We use even powers for comparison purposes, for which linear overestimators are given as follows:
$$g(x) = x^n \le \underbrace{\frac{\overline{x}^n - \underline{x}^n}{\overline{x} - \underline{x}}}_{m}\, x + \underbrace{\frac{\overline{x}\,\underline{x}^n - \underline{x}\,\overline{x}^n}{\overline{x} - \underline{x}}}_{b}, \quad n \text{ even}, \; x \in [\underline{x}, \overline{x}]. \quad (4)$$
Fig. 5. Numerical Results for $x^2$.

This general term can be simplified using $m = \underline{x} + \overline{x}$ and $b = -\underline{x}\,\overline{x}$ when $n = 2$. Moreover, since (4) is a secant of $x^n$, $g(x) = mx + b$ at both $x = \underline{x}$ and $x = \overline{x}$. Given the theoretical results, it is easy to derive a set of numerical experiments. When $\overline{x} \le -\underline{x}$, we only compare the intercepts $g(\overline{x}) - \underline{m}\,\overline{x}$ and $\overline{b} + \mathrm{err}(m)\,\overline{x}$, using the estimators with the same slope $\underline{m}$. The smallest intercept gives the tightest estimator. Likewise, when $-\underline{x} < \overline{x}$, we compare the intercepts $g(\underline{x}) - \overline{m}\,\underline{x}$ and $\overline{b} - \mathrm{err}(m)\,\underline{x}$. Figures 5 and 6 depict the experimental results. To compute $b$, our numerical results use the specialized form for $n = 2$ (Figure 5) and the general form otherwise (Figure 6). Random values were generated for $\underline{x}$ and $\overline{x}$ in a wide range of values. The results were collected region by region. We computed the percentage of cases where class-2 estimators were strictly tighter than class-1 estimators (%C21) and vice versa (%C12). The figures report the difference (%C21 − %C12). For the quadratic case, Figure 5 shows that class-2 estimators are very often tighter than class-1 estimators. Typically, class-2 estimators are tighter in about 40-50% of the cases, while class-1 estimators are tighter in about 0-10% of the cases (they have equal tightness in the remaining cases). It is only when the two bounds are about the same size that class-1 improves over class-2 in about 40-50% of the cases. Figure 6 depicts the results for $x^4$ to $x^{10}$. The improvement of class-2 estimators here is striking: class-2 estimators improve over class-1 estimators in more than 99% of the cases. Of course, this is not surprising in light of the proof of Theorem 6, since the errors in the slope multiplying $\underline{x}$ or $\overline{x}$ become larger for class-1 estimators, and the functions $f_m$ and $f_b$ are also more complex than for $x^2$ when $n > 2$, leading to more pronounced rounding errors. Note also that the improvement in tightness is proportional to the magnitude of the bounds.
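A small-scale re-creation of this experiment (ours; the random ranges and the one-ulp widening via `math.nextafter` are illustrative stand-ins, not the paper's exact setup, and a rigorous implementation would evaluate the enclosures with directed rounding):

```python
import math, random

def up(v): return math.nextafter(v, math.inf)
def down(v): return math.nextafter(v, -math.inf)

random.seed(1)
n, trials = 4, 1000
c21 = c12 = ties = 0
for _ in range(trials):
    xlo, xhi = -random.uniform(0.1, 100.0), random.uniform(0.1, 100.0)
    # crude outward enclosure of the secant slope/intercept of x**n (Eq. 4)
    m = (xhi**n - xlo**n) / (xhi - xlo)
    m_lo, m_hi = down(down(m)), up(up(m))
    b_hi = up(up((xhi * xlo**n - xlo * xhi**n) / (xhi - xlo)))
    err = up(m_hi - m_lo)
    if -xlo < xhi:   # optimal slope m_hi: compare class-1 vs class-2 intercepts
        b1 = up(b_hi + up(err * (-xlo)))
        b2 = up(up(xlo**n) + up(m_hi * (-xlo)))
    else:            # optimal slope m_lo
        b1 = up(b_hi + up(err * xhi))
        b2 = up(up(xhi**n) + up(-(m_lo * xhi)))
    c21 += b2 < b1
    c12 += b1 < b2
    ties += b1 == b2
print(c21, c12, ties)   # class-2 wins, class-1 wins, ties
```

The smaller intercept wins because both candidates in each comparison share the same slope.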
Fig. 6. Numerical Results for $x^4$ to $x^{10}$.

The absolute improvement in the value of $b'$ is on the order of a percent in the cases shown in Figure 6. Although this may seem like a small gain, it is significant, especially in the cases when $b$ is large or when the interval is large. In summary, the experimental results confirm the theoretical results and clearly demonstrate the value of class-2 estimators.

5 Safe Linear Overestimators for Multivariate Functions

We now turn our attention to safe linear overestimators for multivariate functions. Multivariate functions frequently appear in global optimization. They can be estimated using a hyperplane in $n$ dimensions. For example, a linear overestimator for the bilinear term $xy$ (e.g., [9, 15]) is given by two planes:
$$xy \le \min\{\underline{x}y + x\overline{y} - \underline{x}\,\overline{y},\; \overline{x}y + x\underline{y} - \overline{x}\,\underline{y}\}, \quad (x, y) \in [\underline{x}, \overline{x}] \times [\underline{y}, \overline{y}]. \quad (5)$$
Since the slopes of the two planes given in (5) are floating-point numbers, safe estimators for this case are given simply by rounding up the intercepts $-\underline{x}\,\overline{y}$ and $-\overline{x}\,\underline{y}$. The overestimator for the term $x/y$ used by [15, 20] has slopes that are nonlinear
functions of floating-point numbers:
$$\frac{x}{y} \le \min\left\{\frac{1}{\underline{y}\,\overline{y}}(\overline{y}x - \underline{x}y + \underline{x}\,\underline{y}),\; \frac{1}{\underline{y}\,\overline{y}}(\underline{y}x - \overline{x}y + \overline{x}\,\overline{y})\right\}, \quad (x, y) \in [\underline{x}, \overline{x}] \times [\underline{y}, \overline{y}],\; \underline{y} > 0. \quad (6)$$
In order to represent (6) with floating-point numbers, special care must be taken to satisfy the two-dimensional version of (2). More generally, estimators for multivariate functions can be obtained through estimators for factorable functions (see, for instance, [12]). The above discussion indicates that it is critical in practice to generalize our results to multivariate functions. The problem can be formalized as follows (for overestimators): given an $n$-dimensional hyperplane overestimating an $n$-variate function $g(x) : \mathbb{R}^n \to \mathbb{R}$, $x = (x_1, \ldots, x_n)$, over $[\underline{x}_1, \overline{x}_1] \times \cdots \times [\underline{x}_n, \overline{x}_n]$, the goal is to find $m'_1, \ldots, m'_n, b' \in F$ such that:
$$\sum_{i=1}^{n} m'_i x_i + b' \ge \sum_{i=1}^{n} m_i x_i + b \ge g(x), \quad \forall x \in [\underline{x}_1, \overline{x}_1] \times \cdots \times [\underline{x}_n, \overline{x}_n]. \quad (7)$$
The first key result in this section shows that safe linear estimators for multivariate functions can be derived naturally by combining univariate linear estimators. In other words, the result makes it possible to consider each variable independently in the estimator $\sum_i m_i x_i + b$, replace $m_i$ and $b$ by their safe counterparts $m'_i$ and $b'_i$, and combine all the individual coefficients into a safe estimator for the multivariate function. One of the interesting aspects of this result is the ability to factor $b$ out from the $b'_i$'s to obtain a tight estimator.

Theorem 7 (Safe Linear Overestimators for Multivariate Functions). Let $g$ be an $n$-variate function and let $\sum_{i=1}^{n} m_i x_i + b$ be an overestimator for $g$ in $[\underline{x}_1, \overline{x}_1] \times \cdots \times [\underline{x}_n, \overline{x}_n]$. Let $m'_i x_i + b'_i$ be a safe linear overestimator of $m_i x_i + b$ in $[\underline{x}_i, \overline{x}_i]$ and $\delta_i = b'_i - b$, $1 \le i \le n$. Then the hyperplane $\sum_{i=1}^{n} m'_i x_i + b + \sum_{i=1}^{n} \delta_i$ (with the constant term rounded upward) is a safe linear overestimator for $g$ in $[\underline{x}_1, \overline{x}_1] \times \cdots \times [\underline{x}_n, \overline{x}_n]$.

Proof. We show that
$$\sum_{i=1}^{n} m'_i x_i + b + \sum_{i=1}^{n} \delta_i \ge \sum_{i=1}^{n} m_i x_i + b.$$
We have
$$\sum_{i=1}^{n} m'_i x_i + b + \sum_{i=1}^{n} \delta_i = \sum_{i=1}^{n} (m'_i x_i + b'_i - b) + b = \sum_{i=1}^{n} (m'_i x_i + b'_i) - (n-1)b \ge \sum_{i=1}^{n} (m_i x_i + b) - (n-1)b = \sum_{i=1}^{n} m_i x_i + b.$$
Note that class-1 estimators trivially satisfy the requirements of this theorem. As a consequence, the theorem gives an elegant and simple way to obtain safe overestimators for multivariate functions. (A similar result can be derived for underestimators.)

5.1 Class-1 Safe Multivariate Linear Estimators

We now investigate the tightness of class-1 safe multivariate estimators derived using Theorem 7. The class-1 safe multivariate estimators are derived by combining class-1 univariate estimators. Our first result is given by Theorem 8:

Theorem 8 (Tightness of Class-1 Safe Multivariate Overestimators). Class-1 multivariate overestimators derived using Theorem 7 are as tight as possible for a given choice of rounding direction for each variable (i.e., $\underline{m}_i$ or $\overline{m}_i$).

Proof. The safe overestimating hyperplane given by Theorem 7 can be written as:
$$\sum_{i \in I \cup K} \overline{m}_i x_i + \sum_{j \in J \cup L} \underline{m}_j x_j + \overline{b} - \sum_{k \in K} \mathrm{err}(m_k)\,\underline{x}_k + \sum_{l \in L} \mathrm{err}(m_l)\,\overline{x}_l,$$
where the sets $I$, $J$, $K$ and $L$ partition $\{1, \ldots, n\}$. Using the same choices of $\overline{m}$ and $\underline{m}$ defined by this partition, we now calculate $b'$ such that
$$\sum_{i \in I \cup K} \overline{m}_i x_i + \sum_{j \in J \cup L} \underline{m}_j x_j + b' \ge \sum_{i=1}^{n} m_i x_i + b.$$
It is sufficient to satisfy this inequality at $(x'_1, \ldots, x'_n)$ where
$$x'_i = \begin{cases} \overline{x}_i & \text{if } i \in J \cup L \\ \underline{x}_i & \text{otherwise,} \end{cases}$$
due to the combination of $\overline{m}$ and $\underline{m}$ given by the partition. $b'$ must satisfy
$$\sum_{i \in I \cup K} \overline{m}_i x'_i + \sum_{j \in J \cup L} \underline{m}_j x'_j + b' \ge \sum_{i \in I \cup K} m_i x'_i + \sum_{j \in J \cup L} m_j x'_j + b.$$
By a case analysis on the sign of $x'_i$ for each $i$ (for $i \in I$, $\overline{m}_i x'_i \ge m_i x'_i$ since $x'_i = \underline{x}_i \ge 0$; for $j \in J$, $\underline{m}_j x'_j \ge m_j x'_j$ since $x'_j = \overline{x}_j \le 0$; for $k \in K$, $\overline{m}_k\,\underline{x}_k \ge m_k\,\underline{x}_k + \mathrm{err}(m_k)\,\underline{x}_k$; and for $l \in L$, $\underline{m}_l\,\overline{x}_l \ge m_l\,\overline{x}_l - \mathrm{err}(m_l)\,\overline{x}_l$), collecting terms gives
$$b' \ge \overline{b} - \sum_{k \in K} \mathrm{err}(m_k)\,\underline{x}_k + \sum_{l \in L} \mathrm{err}(m_l)\,\overline{x}_l.$$
Correctly rounding this results in the same hyperplane as constructed by the theorem.

It remains to choose appropriate rounding directions for each variable. The next theorem shows that the combination of tight class-1 univariate estimators gives a tight class-1 multivariate estimator.

Theorem 9 (Optimality of Class-1 Safe Multivariate Overestimators). A multivariate class-1 safe estimator derived using Theorem 7 by choosing optimal class-1 univariate safe estimators for each variable is an optimal class-1 multivariate safe estimator.

Proof. Consider the hyperplane:
$$H = \sum_{i \in I \cup K} \overline{m}_i x_i + \sum_{j \in J \cup L} \underline{m}_j x_j + \overline{b} - \sum_{k \in K} \mathrm{err}(m_k)\,\underline{x}_k + \sum_{l \in L} \mathrm{err}(m_l)\,\overline{x}_l, \quad (8)$$
where $I = \{i : \underline{x}_i \ge 0\}$, $J = \{j : \overline{x}_j \le 0\}$, $K = \{k : 0 \in (\underline{x}_k, \overline{x}_k), -\underline{x}_k < \overline{x}_k\}$ and $L = \{l : 0 \in (\underline{x}_l, \overline{x}_l), \overline{x}_l \le -\underline{x}_l\}$ partition $\{1, \ldots, n\}$. By Theorem 7, $H$ is safe. By Theorem 8, $H$ is as tight as possible given the choices of $\overline{m}$ and $\underline{m}$ defined by the partition. Consider the error between a slightly tighter version of (8), given by an unrounded intercept, and $\sum_{i=1}^{n} m_i x_i + b$. The remaining error $\int_{\underline{x}_1}^{\overline{x}_1} \cdots \int_{\underline{x}_n}^{\overline{x}_n} (\sum_{i=1}^{n} m_i x_i + b - g)\, dx_n \cdots dx_1$ is constant over all $2^n$ possible safe class-1 overestimating hyperplanes. The error is defined by the natural extension of
Definition 3 to higher dimensions:
$$E = \int_{\underline{x}_1}^{\overline{x}_1} \cdots \int_{\underline{x}_n}^{\overline{x}_n} \Big( \sum_{i \in I \cup K} \overline{m}_i x_i + \sum_{j \in J \cup L} \underline{m}_j x_j + b - \sum_{k \in K} \mathrm{err}(m_k)\,\underline{x}_k + \sum_{l \in L} \mathrm{err}(m_l)\,\overline{x}_l - \Big(\sum_{i=1}^{n} m_i x_i + b\Big) \Big)\, dx_n \cdots dx_1.$$
Using the integrals
$$\int_{\underline{x}_1}^{\overline{x}_1} \cdots \int_{\underline{x}_n}^{\overline{x}_n} dx_n \cdots dx_1 = \prod_{j=1}^{n} (\overline{x}_j - \underline{x}_j) = P, \qquad \int_{\underline{x}_1}^{\overline{x}_1} \cdots \int_{\underline{x}_n}^{\overline{x}_n} x_i\, dx_n \cdots dx_1 = \frac{1}{2}(\overline{x}_i + \underline{x}_i)P,$$
the error is rewritten as:
$$E = \frac{P}{2}\Big\{ \sum_{i \in I \cup K} (\overline{m}_i - m_i)(\overline{x}_i + \underline{x}_i) - \sum_{j \in J \cup L} (m_j - \underline{m}_j)(\overline{x}_j + \underline{x}_j) - 2\sum_{k \in K} \mathrm{err}(m_k)\,\underline{x}_k + 2\sum_{l \in L} \mathrm{err}(m_l)\,\overline{x}_l \Big\}. \quad (9)$$
Define the type of a variable as the set (one of $I$, $J$, $K$, or $L$) to which it belongs. We show that another partition, $\{I', J', K', L'\}$, does not define a tighter hyperplane $H'$. Since the error given by (9) is linear in the type of each variable, we can consider each variable independently. The hyperplanes $H$ and $H'$ can differ in any of the following four ways while remaining safe:
A variable $x_i$ in set $I$ can be moved to set $L$, producing $I' = I \setminus \{i\}$ and $L' = L \cup \{i\}$. This adds the term $\frac{P}{2}\,\mathrm{err}(m_i)(\overline{x}_i - \underline{x}_i) \ge 0$ to the error, thereby increasing the total error. The hyperplane $H$ remains tighter than $H'$.

A variable $x_i$ in set $J$ can be moved to set $K$, producing $J' = J \setminus \{i\}$ and $K' = K \cup \{i\}$. This likewise adds the term $\frac{P}{2}\,\mathrm{err}(m_i)(\overline{x}_i - \underline{x}_i) \ge 0$ to the error, thereby increasing the total error. The hyperplane $H$ remains tighter than $H'$.

A variable $x_i$ in set $K$ can be moved to set $L$, producing $K' = K \setminus \{i\}$ and $L' = L \cup \{i\}$. The difference in error is
$$\frac{P}{2}\,\mathrm{err}(m_i)(\overline{x}_i + \underline{x}_i) \ge 0,$$
since a variable in $K$ satisfies $-\underline{x}_i < \overline{x}_i$. Since the error does not decrease with this change, $H$ remains tighter than $H'$. Likewise, a variable moving from set $L$ to set $K$ increases the error of the overestimating hyperplane.

By the above arguments, the hyperplane defined by Theorem 7, created by combining tight safe class-1 univariate overestimators, is the tightest class-1 overestimator for an $n$-variate function.

5.2 Class-2 Safe Multivariate Linear Estimators

The definition of class-2 estimators for multivariate functions is a direct extension of those for univariate functions. Notice that for the linear fractional term $x/y$, the overestimator $(\underline{y}x - \overline{x}y + \overline{x}\,\overline{y})/\underline{y}\,\overline{y}$ is a plane through the points $(\underline{x}, \overline{y})$, $(\overline{x}, \underline{y})$, $(\overline{x}, \overline{y})$. As a result, a class-2 estimator can be defined at any of these points. In general, a purely convex $n$-variate function will have a linear overestimator passing through at least $n + 1$ points (the secant hyperplane). A purely concave function will have a linear overestimator passing through one point (the tangent hyperplane). This information is used to define a safe overestimating hyperplane:

Theorem 10 (Class-2 Safe Multivariate Overestimators).
Let $g$ be an $n$-variate function with linear overestimator $\sum_{i=1}^{n} m_i x_i + b$ over $[\underline{x}_1, \overline{x}_1] \times \cdots \times [\underline{x}_n, \overline{x}_n]$ such that $g(x'_1, \ldots, x'_n) = \sum_{i=1}^{n} m_i x'_i + b$, where $x'_i \in \{\underline{x}_i, \overline{x}_i\}$, $i = 1, \ldots, n$. Let $M = \{i \mid x'_i = \underline{x}_i\}$ and $N = \{i \mid x'_i = \overline{x}_i\}$. The hyperplane
$$\sum_{i \in M} \overline{m}_i x_i + \sum_{i \in N} \underline{m}_i x_i + g(x'_1, \ldots, x'_n) - \sum_{i \in M} \overline{m}_i\,\underline{x}_i - \sum_{i \in N} \underline{m}_i\,\overline{x}_i$$
(with the constant term rounded upward) is a safe linear overestimator.
Proof. The proof follows the same format as the proof of Theorem 2. Letting $x_i = \underline{x}_i + s_i$ for $i \in M$ and $x_i = \overline{x}_i - s_i$ for $i \in N$, with $s_i \ge 0$, we have
$$\begin{aligned}
&\sum_{i \in M} \overline{m}_i x_i + \sum_{i \in N} \underline{m}_i x_i + g(x'_1, \ldots, x'_n) - \sum_{i \in M} \overline{m}_i\,\underline{x}_i - \sum_{i \in N} \underline{m}_i\,\overline{x}_i \\
&\quad = \sum_{i \in M} \overline{m}_i s_i - \sum_{i \in N} \underline{m}_i s_i + \sum_{i=1}^{n} m_i x'_i + b \\
&\quad \ge \sum_{i \in M} m_i s_i - \sum_{i \in N} m_i s_i + \sum_{i=1}^{n} m_i x'_i + b \\
&\quad = \sum_{i \in M} m_i (\underline{x}_i + s_i) + \sum_{i \in N} m_i (\overline{x}_i - s_i) + b = \sum_{i=1}^{n} m_i x_i + b.
\end{aligned}$$
Just as the univariate class-2 estimators were shown to be tighter than the class-1 estimators, the multivariate class-2 estimators are tighter than their corresponding class-1 multivariate estimators. The first result shows that an optimal class-1 estimator is always less tight than its corresponding class-2 estimator (if it exists).

Theorem 11 (Relative Tightness of Class-1 and Class-2 Safe Overestimating Hyperplanes). Let $g$ be an $n$-variate function with linear overestimator $\sum_{i=1}^{n} m_i x_i + b$ over $[\underline{x}_1, \overline{x}_1] \times \cdots \times [\underline{x}_n, \overline{x}_n]$. Let $g$ be such that $g(x'_1, \ldots, x'_n) = \sum_{i=1}^{n} m_i x'_i + b$, where $x'_i \in \{\underline{x}_i, \overline{x}_i\}$, $i = 1, \ldots, n$. Suppose that $\sum_{i \in M} \overline{m}_i x_i + \sum_{i \in N} \underline{m}_i x_i + b + \sum_{i=1}^{n} \delta_i$ is the optimal class-1 estimator with real intercept, where $M = \{i \mid x'_i = \underline{x}_i\}$ and $N = \{i \mid x'_i = \overline{x}_i\}$. The corresponding class-2 estimator with real intercept is tighter than the class-1 estimator.

Proof. When the difference in error between the class-1 and class-2 estimators, $\Delta E$, is positive, the class-2 estimator is tighter. We calculate the difference in
error of the estimators with real intercepts:
$$\begin{aligned}
E &= \int_{\underline{x}_1}^{\overline{x}_1}\!\cdots\!\int_{\underline{x}_n}^{\overline{x}_n} \Big\{\Big(\sum_{i \in M} \overline{m}_i x_i + \sum_{i \in N} \underline{m}_i x_i + b + \sum_{i=1}^n \delta_i\Big) \\
&\qquad - \Big(\sum_{i \in M} \overline{m}_i x_i + \sum_{i \in N} \underline{m}_i x_i + g(\tilde{x}_1, \ldots, \tilde{x}_n) - \sum_{i \in M} \overline{m}_i \underline{x}_i - \sum_{i \in N} \underline{m}_i \overline{x}_i\Big)\Big\}\, dx_n \cdots dx_1 \\
&= \int_{\underline{x}_1}^{\overline{x}_1}\!\cdots\!\int_{\underline{x}_n}^{\overline{x}_n} \Big\{\sum_{i=1}^n \delta_i + \sum_{i \in M} (\overline{m}_i - m_i)\underline{x}_i - \sum_{i \in N} (m_i - \underline{m}_i)\overline{x}_i\Big\}\, dx_n \cdots dx_1 \\
&= \underbrace{\prod_{j=1}^n (\overline{x}_j - \underline{x}_j)}_{=\,P\,\ge\,0} \Big\{\sum_{i=1}^n \delta_i + \sum_{i \in M} (\overline{m}_i - m_i)\underline{x}_i - \sum_{i \in N} (m_i - \underline{m}_i)\overline{x}_i\Big\},
\end{aligned}$$
using $g(\tilde{x}_1, \ldots, \tilde{x}_n) = \sum_{i=1}^n m_i \tilde{x}_i + b$. The sets $I$, $J$, $K$, and $L$ are defined in Theorem 7; for the class-1 estimator to round up on $M$ and down on $N$, we have $M = I \cup K$ and $N = J \cup L$. Substituting the corrections $\delta_i$ gives
$$E = P\Big\{-\sum_{i \in K} \operatorname{err}(m_i)\underline{x}_i + \sum_{i \in L} \operatorname{err}(m_i)\overline{x}_i + \sum_{i \in I \cup K} (\overline{m}_i - m_i)\underline{x}_i - \sum_{i \in J \cup L} (m_i - \underline{m}_i)\overline{x}_i\Big\}.$$
Using the facts that
$$\sum_{i \in I \cup K} (\overline{m}_i - m_i)\underline{x}_i \ \ge\ \sum_{i \in I} (\overline{m}_i - m_i)\underline{x}_i + \sum_{i \in K} \operatorname{err}(m_i)\underline{x}_i$$
and
$$\sum_{i \in J \cup L} (m_i - \underline{m}_i)\overline{x}_i \ \le\ \sum_{i \in J} (m_i - \underline{m}_i)\overline{x}_i + \sum_{i \in L} \operatorname{err}(m_i)\overline{x}_i,$$
we get the following lower bound:
$$E \ \ge\ P\Big\{\sum_{i \in I} (\overline{m}_i - m_i)\underline{x}_i - \sum_{i \in J} (m_i - \underline{m}_i)\overline{x}_i\Big\} \ \ge\ 0.$$
Therefore, the class-2 estimator is tighter.

This theorem extends to compare estimators with floating-point intercepts by the discussion following Theorem 6. Theorem 13 in Appendix A compares an optimal class-1 estimator with a class-2 estimator that is not its direct counterpart. Once again, it provides a reasonable justification for preferring class-2 estimators in general.
6 Conclusion

Global optimization problems are often approached by branch and bound algorithms which use linear relaxations of the nonlinear constraints, computed from the current variable bounds at each node of the search tree. This paper considered the problem of obtaining safe linear relaxations which are guaranteed to enclose the solutions of the nonlinear problem. It made two main contributions. On the one hand, it studied two classes of linear estimators for univariate functions. The first class of estimators generalizes the results in [13] from quadratic to arbitrary functions, while the second class is entirely novel. Theoretical and numerical results indicated that class-2 estimators, when they apply, are tighter than class-1 estimators. On the other hand, the paper generalized the univariate results to multivariate functions. It indicated how to derive estimators for multivariate functions by combining univariate estimators derived for each variable independently. Moreover, it showed that the combination of tight class-1 safe univariate estimators is a tight class-1 safe multivariate estimator, and that class-2 safe multivariate estimators are tighter than their corresponding optimal class-1 safe multivariate estimators. Together these results provide a systematic, comprehensive, and elegant framework to derive safe linear estimators for global optimization problems. In conjunction with the safe bounds on the linear relaxations derived in [14, 8], they provide the theoretical foundation for rigorous results of branch and bound approaches to global optimization based on linear programming. The problem of safely representing a convex, but nonlinear, relaxation used within a branch and bound scheme (as in [1, 17], for example) is left open.

Acknowledgements

This work is partially supported by NSF ITR Awards DMI and ACI and by a Canadian NSERC PGS-A grant.

References

1. C. Adjiman, S. Dallwig, C. Floudas, and A. Neumaier.
A global optimization method, αBB, for general twice-differentiable constrained NLPs - I. Theoretical advances. Computers and Chemical Engineering, 22.
2. F. Benhamou, D. McAllester, and P. Van Hentenryck. CLP(Intervals) Revisited. In Proceedings of the International Symposium on Logic Programming (ILPS-94), Ithaca, NY, Nov. 1994.
3. F. Benhamou and W. Older. Applying Interval Arithmetic to Real, Integer and Boolean Constraints. Journal of Logic Programming, 32(1):1-24, July.
4. J. Canny. Some algebraic and geometric computations in PSPACE. In Proceedings of the 20th Symposium on the Theory of Computation.
5. J. Garloff, C. Jansson, and A. P. Smith. Lower bound functions for polynomials. Journal of Computational and Applied Mathematics, 157, 2003.
6. I. E. Grossmann and S. Lee. Generalized convex disjunctive programming: Nonlinear convex hull relaxation. To appear.
7. P. Van Hentenryck, D. McAllester, and D. Kapur. Solving Polynomial Systems Using a Branch and Prune Approach. SIAM Journal on Numerical Analysis, 34(2).
8. C. Jansson. Rigorous lower and upper bounds in linear programming. Technical Report 02.1, Forschungsschwerpunkt Informations- und Kommunikationstechnik, TU Hamburg-Harburg.
9. Y. Lebbah, M. Rueher, and C. Michel. A global filtering algorithm for handling systems of quadratic equations and inequations. Lecture Notes in Computer Science, 2470.
10. S. Lee and I. E. Grossmann. A global optimization algorithm for nonconvex generalized disjunctive programming and applications to process systems. Computers and Chemical Engineering, 25.
11. O. Lhomme. Consistency Techniques for Numerical Constraint Satisfaction Problems. In Proceedings of the 1993 International Joint Conference on Artificial Intelligence, Chambéry, France, Aug. 1993.
12. G. P. McCormick. Computability of global solutions to factorable nonconvex programs: Part I - Convex underestimating problems. Mathematical Programming, 10.
13. C. Michel, Y. Lebbah, and M. Rueher. Safe embedding of the simplex algorithm in a CSP framework. In Proceedings of CPAIOR 03, 2003.
14. A. Neumaier and O. Shcherbina. Safe bounds in linear and mixed-integer programming. To appear.
15. I. Quesada and I. E. Grossmann. A global optimization algorithm for linear fractional and bilinear programs. Journal of Global Optimization, 6:39-76.
16. H. S. Ryoo and N. V. Sahinidis. Global optimization of nonconvex NLPs and MINLPs with applications in process design. Computers and Chemical Engineering, 19.
17. H. D. Sherali and W. P. Adams. A Reformulation-Linearization Technique for Solving Discrete and Continuous Nonconvex Problems, volume 31 of Nonconvex Optimization and Its Applications. Kluwer Academic Publishers.
18. P. Van Hentenryck, L. Michel, and Y. Deville.
Numerica: A Modeling Language for Global Optimization. The MIT Press, Cambridge, MA, USA.
19. J. M. Zamora and I. E. Grossmann. A comprehensive global optimization approach for the synthesis of heat exchanger networks with no stream splits. Computers and Chemical Engineering, 21:S65-S70.
20. J. M. Zamora and I. E. Grossmann. A branch and contract algorithm for problems with concave univariate, bilinear and linear fractional terms. Journal of Global Optimization, 14, 1999.
A Additional Tightness Results

For completeness, we compare the class-2 safe linear overestimator $\underline{m}x + g(\overline{x}) - \underline{m}\,\overline{x}$ with the class-1 estimator that uses $\overline{m}$. This result is not useful in practice when the class-2 estimator using $\overline{m}$ is available (which is always the case for secant lines, for instance). In that case, one should choose the other class-2 estimator, which is guaranteed to be tighter theoretically. However, the result sheds some light on the relationship between the two classes of estimators and indicates that, in general, class-2 estimators should be preferred.

Theorem 12 (Relative Tightness of Class-1 and Class-2 Safe Linear Overestimators). Let $g$ be a univariate function with linear overestimator $mx + b$ over $[\underline{x}, \overline{x}]$ such that $g(\overline{x}) = m\overline{x} + b$. The class-2 estimator with real intercept $\underline{m}x + g(\overline{x}) - \underline{m}\,\overline{x}$ is often tighter than the optimal (using the rules given in Theorems 3 and 4) class-1 estimator with real intercept when $\overline{x} \ge -\underline{x}$.

Proof. Consider the difference in error (given by Definition 3), $E$, between the class-1 estimator $\overline{m}x + b - \operatorname{err}(m)\underline{x}$ when $\overline{x} \ge -\underline{x}$ and $0 \in (\underline{x}, \overline{x})$ (the conditions for optimality of this estimator) and the class-2 estimator with real intercept. The class-2 estimator is tighter when $E > 0$. We consider the difference in error:
$$\begin{aligned}
E &= \int_{\underline{x}}^{\overline{x}} \Big(\overline{m}x + b - \operatorname{err}(m)\underline{x} - \big(\underline{m}x + g(\overline{x}) - \underline{m}\,\overline{x}\big)\Big)\, dx \\
&= \tfrac{1}{2}\operatorname{err}(m)(\overline{x}^2 - \underline{x}^2) + \big({-\operatorname{err}(m)\underline{x}} + (\underline{m} - m)\overline{x}\big)(\overline{x} - \underline{x}) \\
&= (\overline{x} - \underline{x})\Big\{\tfrac{1}{2}\operatorname{err}(m)(\overline{x} - \underline{x}) - (m - \underline{m})\overline{x}\Big\}.
\end{aligned}$$
The above is positive when
$$\tfrac{1}{2}\operatorname{err}(m)(\overline{x} - \underline{x}) - (m - \underline{m})\overline{x} \ \ge\ 0$$
or, alternatively, when
$$\frac{m - \underline{m}}{\operatorname{err}(m)} \ \le\ \frac{\overline{x} - \underline{x}}{2\overline{x}} \ \in\ [0.5, 1],$$
where the final range is given by the range for $-\underline{x}$: $[0, \overline{x}]$. Assuming that $m$ is uniformly distributed in $[\underline{m}, \overline{m}]$, the class-2 estimator is expected to be tighter in at least 50% of the cases, and the percentage tends to 100% as $\underline{x} \to -\overline{x}$.

By the discussion following Theorem 6, this relative tightness result extends to estimators with floating-point intercepts. The result generalizes to the multivariate case and suggests that, in general, class-2 estimators should be preferred.
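The comparison in Theorem 12 is easy to check numerically. The following is a small sketch under stated assumptions, not the paper's implementation: it takes $g(x) = x^2$ with its secant overestimator, emulates the rounded slopes $\underline{m}$, $\overline{m}$ with one `math.nextafter` step from the floating-point slope, and evaluates both excess-error integrals exactly with rationals. The interval is chosen with $0$ in its interior and $\overline{x} \ge -\underline{x}$, the case treated in the proof.

```python
# Numeric sanity check for the univariate comparison (Theorem 12).
# g(x) = x^2 on [xlo, xhi] with its secant overestimator m*x + b,
# m = xlo + xhi, b = -xlo*xhi, which touches g at both endpoints.
# Assumption: rounded slopes are emulated with math.nextafter (one ulp
# outward); all error integrals are evaluated exactly with Fractions.
import math
from fractions import Fraction as F

xlo, xhi = -0.6, 0.7      # 0 lies inside and xhi >= -xlo

m_real = F(xlo) + F(xhi)  # exact secant slope for these float bounds
b = -F(xlo) * F(xhi)
m_fl = xlo + xhi          # floating-point slope (may be rounded)
m_up = F(math.nextafter(m_fl, math.inf))    # rounded-up slope
m_dn = F(math.nextafter(m_fl, -math.inf))   # rounded-down slope
err = m_up - m_dn

w = F(xhi) - F(xlo)
# Class-1 (slope rounded up, intercept correction -err*xlo):
#   integral of (m_up - m)x - err*xlo over [xlo, xhi]
E1 = (m_up - m_real) * (F(xhi) ** 2 - F(xlo) ** 2) / 2 - err * F(xlo) * w
# Class-2 through the touching point (xhi, g(xhi)), slope rounded down:
#   integral of (m_dn - m)x + (m - m_dn)*xhi over [xlo, xhi]
E2 = (m_real - m_dn) * w ** 2 / 2

print("class-1 excess error:", float(E1))
print("class-2 excess error:", float(E2))
print("class-2 tighter:", E2 <= E1)
```

Both excess errors are nonnegative (each estimator lies above the real secant), and on this interval the condition $(m - \underline{m})/\operatorname{err}(m) \le (\overline{x} - \underline{x})/(2\overline{x})$ holds, so the class-2 estimator comes out tighter, as the theorem predicts.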
Theorem 13 (Relative Tightness of Class-1 and Class-2 Safe Overestimating Hyperplanes). Let $g$ be an $n$-variate function with linear overestimator $\sum_{i=1}^n m_i x_i + b$ over $[\underline{x}_1, \overline{x}_1] \times \cdots \times [\underline{x}_n, \overline{x}_n]$ such that $g(\tilde{x}_1, \ldots, \tilde{x}_n) = \sum_{i=1}^n m_i \tilde{x}_i + b$, where $\tilde{x}_i \in \{\underline{x}_i, \overline{x}_i\}$, $i = 1, \ldots, n$. The corresponding class-2 estimator is often tighter than the optimal class-1 estimator.

Proof. The optimal class-1 estimator is
$$\sum_{i=1}^n \widetilde{m}_i x_i + b + \sum_{i=1}^n \delta_i, \qquad \widetilde{m}_i \in \{\underline{m}_i, \overline{m}_i\}.$$
The class-2 estimator in consideration is
$$\sum_{i=1}^n \widehat{m}_i x_i + g(\tilde{x}_1, \ldots, \tilde{x}_n) - \sum_{i=1}^n \widehat{m}_i \tilde{x}_i, \qquad \widehat{m}_i = \begin{cases} \overline{m}_i & \text{if } \tilde{x}_i = \underline{x}_i, \\ \underline{m}_i & \text{if } \tilde{x}_i = \overline{x}_i. \end{cases}$$
Consider the difference in error, $E$, between the two estimators:
$$\begin{aligned}
E &= \int_{\underline{x}_1}^{\overline{x}_1}\!\cdots\!\int_{\underline{x}_n}^{\overline{x}_n} \Big\{\sum_{i=1}^n \widetilde{m}_i x_i + b + \sum_{i=1}^n \delta_i - \Big(\sum_{i=1}^n \widehat{m}_i x_i + g(\tilde{x}_1, \ldots, \tilde{x}_n) - \sum_{i=1}^n \widehat{m}_i \tilde{x}_i\Big)\Big\}\, dx_n \cdots dx_1 \\
&= \int_{\underline{x}_1}^{\overline{x}_1}\!\cdots\!\int_{\underline{x}_n}^{\overline{x}_n} \Big\{\sum_{i=1}^n (\widetilde{m}_i - \widehat{m}_i) x_i + \sum_{i=1}^n \delta_i - \sum_{i=1}^n (m_i - \widehat{m}_i)\tilde{x}_i\Big\}\, dx_n \cdots dx_1 \\
&= \underbrace{\prod_{j=1}^n (\overline{x}_j - \underline{x}_j)}_{=\,P\,\ge\,0} \Big\{\sum_{i=1}^n \tfrac{1}{2}(\widetilde{m}_i - \widehat{m}_i)(\underline{x}_i + \overline{x}_i) + \sum_{i=1}^n \delta_i - \sum_{i=1}^n (m_i - \widehat{m}_i)\tilde{x}_i\Big\}.
\end{aligned}$$
The class-2 estimator is tighter when $E \ge 0$. Let $I$, $J$, $K$, and $L$ be the sets defined in Theorem 7. Also define a partition for the class-2 estimator:
$M = \{i \mid \tilde{x}_i = \underline{x}_i\}$ and $N = \{i \mid \tilde{x}_i = \overline{x}_i\}$. The difference in error becomes:
$$E = P\Big\{\tfrac{1}{2}\!\!\sum_{i \in (I \cup K) \cap N}\!\! \operatorname{err}(m_i)(\underline{x}_i + \overline{x}_i) \;-\; \tfrac{1}{2}\!\!\sum_{i \in (J \cup L) \cap M}\!\! \operatorname{err}(m_i)(\underline{x}_i + \overline{x}_i) \;-\; \sum_{i \in K} \operatorname{err}(m_i)\underline{x}_i \;+\; \sum_{i \in L} \operatorname{err}(m_i)\overline{x}_i \;+\; \sum_{i \in M} (\overline{m}_i - m_i)\underline{x}_i \;-\; \sum_{i \in N} (m_i - \underline{m}_i)\overline{x}_i\Big\},$$
since $\widetilde{m}_i = \widehat{m}_i$ on $(I \cup K) \cap M$ and on $(J \cup L) \cap N$. We now use the assumptions that
$$\tfrac{1}{2}\operatorname{err}(m_i)(\underline{x}_i + \overline{x}_i) \ \ge\ \underline{x}_i(\overline{m}_i - m_i), \quad i \in M,$$
$$-\tfrac{1}{2}\operatorname{err}(m_i)(\underline{x}_i + \overline{x}_i) \ \ge\ -\overline{x}_i(m_i - \underline{m}_i), \quad i \in N.$$
These assumptions appear in the proof of Theorem 12 and the corresponding theorem for $g(\underline{x}) = m\underline{x} + b$. The assumptions, as argued in Theorem 12, hold more often as $\underline{x}_i \to -\overline{x}_i$ when $i \in N$ and as $\overline{x}_i \to -\underline{x}_i$ when $i \in M$. Using these assumptions, every remaining term of the resulting lower bound on $E$ is nonnegative by the sign conditions defining $I$, $J$, $K$, $L$, $M$, and $N$; upon further analysis, we find that $E \ge 0$. The class-2 estimator is tighter than the optimal class-1 estimator when the above assumptions hold. Since these conditions are biased to be met more often than not, the class-2 estimator is often the tightest estimator.
More informationAPPLICATIONS OF DIFFERENTIATION
4 APPLICATIONS OF DIFFERENTIATION APPLICATIONS OF DIFFERENTIATION 4.8 Newton s Method In this section, we will learn: How to solve high degree equations using Newton s method. INTRODUCTION Suppose that
More informationValid inequalities for sets defined by multilinear functions
Valid inequalities for sets defined by multilinear functions Pietro Belotti 1 Andrew Miller 2 Mahdi Namazifar 3 3 1 Lehigh University 200 W Packer Ave Bethlehem PA 18018, USA belotti@lehighedu 2 Institut
More information8.1 Polynomial Threshold Functions
CS 395T Computational Learning Theory Lecture 8: September 22, 2008 Lecturer: Adam Klivans Scribe: John Wright 8.1 Polynomial Threshold Functions In the previous lecture, we proved that any function over
More informationMATH 108 REVIEW TOPIC 6 Radicals
Math 08 T6-Radicals Page MATH 08 REVIEW TOPIC 6 Radicals I. Computations with Radicals II. III. IV. Radicals Containing Variables Rationalizing Radicals and Rational Eponents V. Logarithms Answers to Eercises
More informationCommon Core State Standards for Activity 14. Lesson Postal Service Lesson 14-1 Polynomials PLAN TEACH
Postal Service Lesson 1-1 Polynomials Learning Targets: Write a third-degree equation that represents a real-world situation. Graph a portion of this equation and evaluate the meaning of a relative maimum.
More informationExponential Functions, Logarithms, and e
Chapter 3 Starry Night, painted by Vincent Van Gogh in 1889 The brightness of a star as seen from Earth is measured using a logarithmic scale Eponential Functions, Logarithms, and e This chapter focuses
More informationInteger Programming ISE 418. Lecture 13. Dr. Ted Ralphs
Integer Programming ISE 418 Lecture 13 Dr. Ted Ralphs ISE 418 Lecture 13 1 Reading for This Lecture Nemhauser and Wolsey Sections II.1.1-II.1.3, II.1.6 Wolsey Chapter 8 CCZ Chapters 5 and 6 Valid Inequalities
More information3x 2. x ))))) and sketch the graph, labelling everything.
Fall 2006 Practice Math 102 Final Eam 1 1. Sketch the graph of f() =. What is the domain of f? [0, ) Use transformations to sketch the graph of g() = 2. What is the domain of g? 1 1 2. a. Given f() = )))))
More informationSOLVING QUADRATICS. Copyright - Kramzil Pty Ltd trading as Academic Teacher Resources
SOLVING QUADRATICS Copyright - Kramzil Pty Ltd trading as Academic Teacher Resources SOLVING QUADRATICS General Form: y a b c Where a, b and c are constants To solve a quadratic equation, the equation
More informationEconomics 205 Exercises
Economics 05 Eercises Prof. Watson, Fall 006 (Includes eaminations through Fall 003) Part 1: Basic Analysis 1. Using ε and δ, write in formal terms the meaning of lim a f() = c, where f : R R.. Write the
More informationChapter 2 Analysis of Graphs of Functions
Chapter Analysis of Graphs of Functions Chapter Analysis of Graphs of Functions Covered in this Chapter:.1 Graphs of Basic Functions and their Domain and Range. Odd, Even Functions, and their Symmetry..
More information2.29 Numerical Fluid Mechanics Spring 2015 Lecture 4
2.29 Spring 2015 Lecture 4 Review Lecture 3 Truncation Errors, Taylor Series and Error Analysis Taylor series: 2 3 n n i1 i i i i i n f( ) f( ) f '( ) f ''( ) f '''( )... f ( ) R 2! 3! n! n1 ( n1) Rn f
More informationCAMI Education links: Maths NQF Level 4
CONTENT 1.1 Work with Comple numbers 1. Solve problems using comple numbers.1 Work with algebraic epressions using the remainder and factor theorems CAMI Education links: MATHEMATICS NQF Level 4 LEARNING
More informationand Rational Functions
chapter This detail from The School of Athens (painted by Raphael around 1510) depicts Euclid eplaining geometry. Linear, Quadratic, Polynomial, and Rational Functions In this chapter we focus on four
More informationSuggested Time Frame (Block) Review 1.1 Functions 1. Text Section. Section Name
Date Taught Objective Subsets of real numbers, evaluate functions and state Finding y-intercepts, zeros, & symmetry GADSDEN CITY CURRICULUM GUIDE ESSENTIAL CONTENT AND SKILLS ANALYTICAL MATHEMATICS Text
More informationAlgebraic Functions, Equations and Inequalities
Algebraic Functions, Equations and Inequalities Assessment statements.1 Odd and even functions (also see Chapter 7)..4 The rational function a c + b and its graph. + d.5 Polynomial functions. The factor
More informationCOFINITE INDUCED SUBGRAPHS OF IMPARTIAL COMBINATORIAL GAMES: AN ANALYSIS OF CIS-NIM
#G INTEGERS 13 (013) COFINITE INDUCED SUBGRAPHS OF IMPARTIAL COMBINATORIAL GAMES: AN ANALYSIS OF CIS-NIM Scott M. Garrabrant 1 Pitzer College, Claremont, California coscott@math.ucla.edu Eric J. Friedman
More information