On Latin hypercube designs and estimating black box functions


by L. Gijben [s561960], B.Sc. Tilburg University 2008

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Operations Research and Management Science, Faculty of Economics and Business Administration, Tilburg University.

Supervisor: Prof. dr. ir. E. R. van Dam

September 2, 2010

1 Introduction

A set of points $x_1, x_2, \ldots, x_n \in \{0, \ldots, n-1\}^m$ is called an $m$-dimensional Latin hypercube design (LHD) when $(x_k)_i \neq (x_l)_i$ for any $k \neq l$. LHDs are often used in computer experiments in conjunction with computer simulations to approximate black box functions. LHDs are particularly useful when evaluating time-consuming black box functions, in which case only a limited number of points can be evaluated. In such a scenario it is important to pick as small a set of points as possible in such a way that the entire domain of the function is well represented. This property of a design is referred to as space-filling. Furthermore, as it is usually hard to tell which factors are important since the function is unknown, designs should also be noncollapsing. That is, the projections of the design points onto any of the axes have to be distinct. Note that LHDs automatically satisfy this property. The idea of LHDs was first introduced by McKay et al. (1979) [9].

Although all LHDs are noncollapsing, not all LHDs are (equally) space-filling. As a result, different kinds of LHDs have been proposed and investigated since they were first introduced. We distinguish three ways to differentiate within the class of LHDs: the optimization criterion used to indicate the space-fillingness of a particular design; the type of LHD, that is, extra conditions that can be added to create a subclass of LHDs with certain additional properties; and finally the distance measure used. A distance measure that is used often is the so-called $p$-norm, $\|\cdot\|_p$. That is, for any two points $x, y \in \mathbb{R}^m$,

$$\mathrm{dist}(x, y) = \|x - y\|_p = \left( \sum_{i=1}^{m} |x_i - y_i|^p \right)^{1/p}.$$

In particular the Manhattan norm ($l_1$, $p = 1$), the Euclidean norm ($l_2$, $p = 2$) and the infinity norm ($l_\infty$, $p \to \infty$) are often used.

Two criteria that are used often in the construction of LHDs are the so-called maximin and minimax criteria, which were first introduced by Johnson et al. (1990) [6], albeit in a general framework. The application to LHDs was first brought up in a paper by Morris and Mitchell (1995) [8], who constructed an algorithm based on simulated annealing. Similar algorithms, based on a permutation genetic algorithm and the enhanced stochastic evolutionary (ESE) algorithm respectively, have been proposed by Bates et al. (2004) [3] and Jin et al. (2005) [5]. Improved results have been obtained by Husslage et al. in a yet unpublished paper [4], using periodic designs and the ESE algorithm. Most of these algorithms, however, give only approximate maximin designs, although some exact maximin designs are known. Van Dam et al. (2007) [14] derived general formulas for 2-dimensional maximin LHDs for the $l_1$ and $l_\infty$ distance measures. Furthermore, for the $l_2$ distance measure they obtained optimal 2-dimensional maximin LHDs for values of $n \leq 70$. Similar results have been obtained by Van Dam (2008) [13] for the minimax criterion. A third criterion that often appears in the literature is

$$\sum_{i=1}^{n} \sum_{j=i+1}^{n} \frac{1}{d(x_i, x_j)^2},$$

where $d(x_i, x_j)$ is the Euclidean norm. This criterion was first introduced by Audze and Eglais (1977) [2], and LHDs constructed using this criterion are often referred to as Audze-Eglais designs.

Besides the general LHDs as defined above, different types of LHDs have been proposed that are said to possess very good space-filling properties. A construction method based on orthogonal arrays was proposed by Tang (1993) [12].
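Before turning to these special constructions, the following sketch (an illustration added here, not code from the thesis) makes the two distance-based criteria above concrete: it evaluates the maximin distance and the Audze-Eglais criterion for a small randomly generated LHD. A maximin design maximizes the former, while an Audze-Eglais design minimizes the latter.

```python
import itertools
import numpy as np

def pairwise_distances(design, p=2):
    """All pairwise l_p distances between the rows of an n x m design."""
    X = np.asarray(design, dtype=float)
    return [np.linalg.norm(X[i] - X[j], ord=p)
            for i, j in itertools.combinations(range(len(X)), 2)]

def maximin_value(design, p=2):
    """Smallest pairwise l_p distance; maximin designs maximize this value."""
    return min(pairwise_distances(design, p))

def audze_eglais(design):
    """Audze-Eglais criterion: sum of 1/d^2 over all pairs (to be minimized)."""
    return sum(1.0 / d ** 2 for d in pairwise_distances(design, p=2))

# A random 5-point LHD in 2 dimensions: each column is a permutation of 0..4,
# so the design is noncollapsing by construction.
rng = np.random.default_rng(0)
lhd = np.column_stack([rng.permutation(5) for _ in range(2)])
print(lhd)
print("maximin l2 distance:", maximin_value(lhd))
print("Audze-Eglais value :", audze_eglais(lhd))
```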
An $n \times m$ array $B$ with entries from $S = \{0, 1, \ldots, s-1\}$ is said to be an orthogonal array with $s$ levels, strength $t$ (for some $0 \leq t \leq m$) and index $\lambda$ if every $n \times t$ subarray of $B$ contains each $t$-tuple based on $S$ exactly $\lambda$ times as a row; it is denoted oa$(n, m, s, t)$ with parameter $\lambda$. The construction method proposed by Tang is the following mapping from an orthogonal array oa$(n, m, s, t)$ ($B$, say) with parameter $\lambda$ to an $n \times m$ LHD: for every column of $B$, replace the $\lambda s^{t-1}$ positions with entry $k$ by a permutation of $(k-1)\lambda s^{t-1} + 1, (k-1)\lambda s^{t-1} + 2, \ldots, k\lambda s^{t-1}$, for $k = 1, \ldots, s$. For this class of LHDs an algorithm similar to that of Morris and Mitchell (1995) [8], based on simulated annealing, has

been proposed by Leary et al. (2003) [7], the difference being that they only search within the class of orthogonal array based Latin hypercube designs (oaLHDs). Another type of LHD, the orthogonal-column Latin hypercube design (ocLHD), was introduced by Prescott (2009) [10]. Let $A$ be some $n \times m$ matrix where each row is a point of an LHD. Then an ocLHD is such that the columns of $A$, projected onto the hypercube $[-1, 1]^m$, are orthogonal to each other, i.e. have zero pairwise column correlations.

From the literature it becomes apparent that several different criteria and construction methods are available to generate LHDs, the most predominant criterion being the maximin criterion. An often mentioned subclass of LHDs are the oaLHDs. In several papers this criterion and this construction method are ascribed excellent space-filling properties, mostly on intuition-based arguments. However, literature that actually tests this intuition seems to be sparse. This is particularly odd considering that oaLHDs will most likely be worse than general LHDs in terms of the maximin distance, at least for three dimensions and higher. A proof of this will not be given. Instead, in Section 1.1, we will give a counterexample to the following statement.

Statement 1. oaLHDs and general LHDs are equivalent in terms of the maximin distance for dimension 3 and higher.

For 2-dimensional designs it seems that optimal maximin designs are also oaLHDs. Furthermore, in this thesis we will be comparing several different types of LHDs and try to find out how they relate to one another, which will be the topic of Sections 2 and 3. We will then try to come to some conclusion on what would make a good LHD, which we do in Section 3. In Section 4, we construct an algorithm to try to create these particular types of LHDs. Finally, in Sections 5 and 6 we show some results we obtained and present a few concluding remarks.

1.1 A Counterexample

Consider the set of LHDs based on oa(9, 3, 3, 2). We take as distance metric the Euclidean distance, i.e. the $l_2$ measure. For the general case the optimal (squared) maximin $l_2$ distance for an LHD is 22. This result is taken from a website [15], hosted by Tilburg University, containing a number of space-filling designs. To determine the maximin $l_2$ distance for the oaLHDs one could simply try all possibilities. This would not be hard considering the limited size of the designs considered. A problem, however, arises in the number of possible orthogonal arrays: there seem to be quite a few possibilities and it seems hard to find all of them. To avoid this issue we make the following observation. There are only two different types of orthogonal arrays that have to be considered for LHDs; all the others are isomorphic to these two in terms of the LHDs resulting from them using Tang's construction method. This can be seen as follows. The mapping used, as introduced by Tang (1993) [12], assigns a 0, 1 or 2 to every 0 in the orthogonal array, a 3, 4 or 5 to every 1, and a 6, 7 or 8 to every 2.

Figure 1: Cube consisting of 27 blocks of size 3 x 3 x 3.
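As an illustration of Tang's mapping and of this counterexample, the sketch below is added here; the particular oa(9, 3, 3, 2) and the randomized search over the within-level permutations are assumptions of this illustration, whereas the thesis checks all possibilities exhaustively.

```python
import itertools
import numpy as np

# One standard oa(9, 3, 3, 2) of index 1: rows (a, b, a+b mod 3).
oa = np.array([[a, b, (a + b) % 3] for a in range(3) for b in range(3)])

def tang_lhd(oa, rng):
    """Tang (1993): in every column, replace the 3 positions carrying level k
    by a random permutation of {3k, 3k+1, 3k+2}, giving a 9-point LHD on {0,...,8}^3."""
    n, m = oa.shape
    lhd = np.empty_like(oa)
    for j in range(m):
        for k in range(3):
            rows = np.flatnonzero(oa[:, j] == k)
            lhd[rows, j] = 3 * k + rng.permutation(3)
    return lhd

def squared_maximin_l2(design):
    """Smallest squared pairwise Euclidean distance of the design."""
    return min(np.sum((design[i] - design[j]) ** 2)
               for i, j in itertools.combinations(range(len(design)), 2))

rng = np.random.default_rng(1)
best = max(squared_maximin_l2(tang_lhd(oa, rng)) for _ in range(20000))
# The exhaustive check described in the text gives 21 as the optimum for
# oa-based LHDs, versus 22 for unrestricted 9-point LHDs in three dimensions.
print("best squared maximin l2 distance found:", best)
```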

In this sense every row of an orthogonal array corresponds to a box in which exactly one point can be selected. In total the entire hypercube $[0, 8]^3$ contains 27 such boxes, of which any orthogonal array can represent only 9, see Figure 1. These 9 are such that they don't share any rows or columns, otherwise it is no longer an oa(9, 3, 3, 2). Because of this restriction there are only 3 possible configurations for each layer (see Figure 2), of which the first two are obviously equivalent. As a result there are two different kinds of orthogonal arrays to consider: one where the third configuration is in the middle layer and one where it is on the outside. Now we can simply check all possibilities and conclude that the (squared) maximin $l_2$ distance for the LHD based on oa(9, 3, 3, 2) is 21. Hence we have a counterexample to Statement 1. This result appears to be persistent under the general metric $\|\cdot\|_p$, $p > 0$. As noted before, this is not a proof of the opposite of Statement 1; however, we do expect this result to generalize to larger designs and higher dimensions. Hence we suspect the opposite of Statement 1 to be correct; after all, it seems unlikely that adding extra points or an extra dimension will suddenly make oaLHDs and LHDs equivalent again in terms of the maximin distance. Note, for example, that a 4-dimensional LHD consisting of n points can be seen as a 3-dimensional design, consisting of n points, stretched over a series of n cubes.

Figure 2: Possible configurations per layer of the cube.

2 A Kriging experiment

Now, if we assume that for dimension three and higher oaLHDs are strictly worse than general LHDs in terms of the maximin distance, we would expect the latter designs to be more space-filling if we also assume that the maximin criterion is a good way to construct space-filling designs, which, as noted in the introduction, is a claim made in several papers. In order to quantify the property we defined as space-filling, we set up a small Kriging experiment where we estimate a function using Kriging. The function we will try to estimate is the so-called banana function introduced by Rosenbrock (1960) [11], which generalises to arbitrary dimension $m \geq 2$:

$$f(x) = \sum_{i=1}^{m-1} \left[ 100 \left( x_{i+1} - x_i^2 \right)^2 + (1 - x_i)^2 \right], \qquad -2.048 \leq x_i \leq 2.048, \quad i \in \{1, \ldots, m\}.$$

We shall compare the mean squared error (MSE) obtained in estimating this function using several different sizes of designs for each of these types of designs. More precisely, we first evaluate the banana function at the points of the design and then use this information to estimate the banana function using Kriging. We then use this estimate to predict the function at 3375 points on a grid spread out evenly over the domain of the banana function, $\{-2.048, \ldots, 2.048\}^3$. To keep the comparison as fair as possible we filter out all the outliers, that is, all errors whose distance from the mean is larger than or equal to 3 times the standard deviation of the errors.
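A minimal sketch of this experiment is given below. It is an illustration added here: it uses a hand-rolled Kriging-style predictor with a fixed Gaussian correlation parameter and a random LHD as a placeholder design, whereas the thesis uses maximin and oa-based designs and its exact Kriging settings are not specified; the outlier filtering described above is also omitted.

```python
import numpy as np

def banana(X):
    """Generalized Rosenbrock ('banana') function for any dimension m >= 2."""
    X = np.atleast_2d(X)
    return np.sum(100.0 * (X[:, 1:] - X[:, :-1] ** 2) ** 2
                  + (1.0 - X[:, :-1]) ** 2, axis=1)

def kriging_fit_predict(Xd, yd, Xt, theta=2.0, nugget=1e-8):
    """Small ordinary-Kriging-style predictor with Gaussian correlation
    R(h) = exp(-theta * ||h||^2); theta is fixed here, not estimated by
    maximum likelihood as a full Kriging implementation would do."""
    def corr(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-theta * d2)
    R = corr(Xd, Xd) + nugget * np.eye(len(Xd))
    ones = np.ones(len(Xd))
    mu = ones @ np.linalg.solve(R, yd) / (ones @ np.linalg.solve(R, ones))
    w = np.linalg.solve(R, yd - mu * ones)
    return mu + corr(Xt, Xd) @ w

# Scale a 3-dimensional LHD on {0,...,n-1}^3 to the banana domain [-2.048, 2.048]^3,
# fit the predictor, and measure the MSE on a 15 x 15 x 15 evaluation grid (3375 points).
n, m = 20, 3
rng = np.random.default_rng(2)
lhd = np.column_stack([rng.permutation(n) for _ in range(m)])   # placeholder design
Xd = -2.048 + 2 * 2.048 * lhd / (n - 1)
g = np.linspace(-2.048, 2.048, 15)
Xt = np.array(np.meshgrid(g, g, g)).reshape(m, -1).T
err = kriging_fit_predict(Xd, banana(Xd), Xt) - banana(Xt)
print("MSE:", np.mean(err ** 2))
```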

Furthermore, besides comparing maximin LHDs and oaLHDs as obtained directly from the construction method introduced by Tang (1993) [12], we will also try oaLHDs that have been improved in terms of the maximin distance criterion using the algorithm of Leary et al. (2003) [7]. This is to account for the possibility that a combination of the two types of LHD could lead to more space-filling designs, and it provides an extra possibility to test the relation between the space-fillingness of a design and the maximin distance criterion. The dimension of the designs (and also of the banana function) is 3, while the distance metric chosen is the Euclidean or $l_2$ distance metric; the results can be found in Table 1. The designs used are not necessarily optimal in terms of the maximin distance. More precisely, the general LHDs used are the current best known designs, whereas the improved oaLHDs are arbitrarily improved oaLHDs (better designs in terms of the maximin distance might be known).

Table 1: MSE (squared maximin $l_2$ distance) from estimations of the banana function using Kriging, for general LHDs, oaLHDs and improved oaLHDs of various sizes n.

It immediately becomes clear that neither of the two options is strictly better than the other. That is, there appears to be no correlation between the maximin $l_2$ distance and the MSE, nor can it be concluded that oaLHDs give smaller MSEs than general LHDs. The general LHDs give the best results in 6 cases. The oaLHDs are preferable in the other 6 cases, where in 3 cases the errors get larger after the improvement with regard to the $l_2$ maximin distance.

3 Minimax Latin hypercube designs

We start this section by noting that the discussion in the previous sections is in no way conclusive proof that neither the maximin criterion nor the oa property is capable of providing good space-filling designs. In order to get a better understanding of this, more rigorous testing will have to be done using more designs as well as many more different functions. This, however, is not the main topic of this thesis, and we will for now assume that neither of these two types of LHDs can be said to be more space-filling than the other. Then either both types of LHDs are very space-filling or both types are not very space-filling at all. For the remainder of this thesis we will additionally assume the latter to be the case, that is, both types of LHDs are not optimally space-filling. This assumption immediately raises the question what criterion could and should be used instead. To try to answer this question we note that for Kriging, in general, the estimation at some given point is more accurate when that point lies closer to a design point. Hence, intuitively we would want to construct an LHD in such a way that all points of the corresponding hypercube are as close as possible to at least one point of the LHD. More formally, we have the following objective:

$$\min_{A \in \Pi(m,n)} \rho, \qquad \text{where } \rho := \max_{x \in [0, n-1]^m} \min_{i \in \{1, \ldots, n\}} \|x - a(i)\|_p, \tag{1}$$

and where $\Pi(m, n)$ is the set of all LHDs of dimension $m$ and size $n$. Furthermore, $a(i)$ corresponds to point $i$ of a given LHD $A \in \Pi(m, n)$, $i = 1, \ldots, n$. We call this the minimax problem. A more intuitive way to think of this problem is to see it as a covering problem. That is, we want to cover the hypercube $[0, n-1]^m$ with a set of $n$ balls $\{x : \|x - a(i)\|_p \leq \rho\}$, $i = 1, \ldots, n$, with $\rho$, the covering radius, as small as possible. These types of LHDs have been studied before in a paper by Van Dam (2008) [13] and are called minimax designs. However, this paper only deals with the two-dimensional case and only with the infinity, the Manhattan and the Euclidean norm. For the infinity norm a lower bound is provided, as well as a construction method for general n that allows for the construction of LHDs with a minimal covering radius. For the Manhattan norm such a construction method is provided for finitely many n, as well as a lower bound for general n. Finally, for the Euclidean norm a lower bound is provided for general n, and designs with a minimal covering radius have been found up to n = 27 by an exhaustive search method.

In order to try to confirm our intuition on what would be a good criterion to construct LHDs with regard to Kriging, we perform an experiment similar to the one described above. That is, we try to estimate some function (again the banana function) using the two-dimensional optimal minimax and maximin designs created using the Euclidean norm. We note that we used 3-dimensional designs earlier in Section 2; however, 2-dimensional designs are currently all that we have. Furthermore, we note that this test is in no way intended to provide conclusive proof as to whether minimax designs in general are better than, say, maximin designs, but is only done to see if we can get some indication as to whether our intuition is possibly correct or not. We shall be performing this test using designs from n = 15 up to n = 27. Predictions will be done on 1600 points on a grid, spread out evenly over the domain of the banana function, $\{-2.048, \ldots, 2.048\}^2$. The results in terms of the MSE are given in Table 2. These results appear to be consistent with our intuition on what would make a good LHD with regard to Kriging, as the minimax designs consistently outperform the maximin designs, by as much as a factor of 10. That is, the idea that function values at points can be predicted better when these points lie close to a design point, compared to points that lie further away from any of the design points, appears to be correct. We would expect this to be the case for higher dimensions as well, albeit that the meaning of the term "close" will change as the dimension increases.

Table 2: MSE in estimating the banana function using $l_2$ maximin and $l_2$ minimax LHDs, for 15 to 27 design points.

Unfortunately, the paper by Van Dam (2008) seems to suggest that, besides for the easier norms (infinity and Manhattan), these so-called minimax designs are hard to find for general p-norms. This presumed difficulty has two important and related components. First of all there is the fact that we are considering all points inside a hypercube $[0, n-1]^m$, of which there are infinitely many, which makes it hard to find some general algorithm to construct minimax designs, but also to create heuristics based on, for example, the ESE algorithm or simulated annealing. This is because it is hard to determine the actual covering radius for a given design, especially in higher dimensions.
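To make this difficulty concrete, the sketch below (an illustration added here; the design shown is an arbitrary example) approximates the covering radius of a given design by brute force over a finite evaluation grid. The number of grid points grows exponentially with the dimension, and the value obtained only underestimates the true covering radius; controlling that gap is exactly what the scheme of the next section does.

```python
import numpy as np

def covering_radius_on_grid(design, n, p=2, pts_per_axis=61):
    """Approximate the l_p covering radius of a design on [0, n-1]^m by taking
    the maximum, over a fine evaluation grid, of the distance to the nearest
    design point. This underestimates the true radius; the gap shrinks as the
    grid is refined."""
    design = np.asarray(design, dtype=float)
    m = design.shape[1]
    axis = np.linspace(0.0, n - 1.0, pts_per_axis)
    grid = np.array(np.meshgrid(*[axis] * m)).reshape(m, -1).T
    # distance from every grid point to every design point, then nearest-point minimum
    diff = np.abs(grid[:, None, :] - design[None, :, :])
    dist = (diff ** p).sum(axis=-1) ** (1.0 / p)
    return dist.min(axis=1).max()

# An arbitrary 5-point LHD in two dimensions (not a design from the thesis).
A = np.array([[0, 2], [1, 4], [2, 1], [3, 3], [4, 0]])
print("approximate l2 covering radius:", covering_radius_on_grid(A, n=5))
```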

The other component has to do with the complexity of the distance norm. Because these are in general non-linear functions, it is hard to find construction methods that allow us to build minimax designs for general p-norms. Note that both the infinity norm and the Manhattan norm, which do allow for a construction method to build minimax designs, do not have this problem. In light of this discussion we will, rather than try to find exact solutions, construct a $(1 + \epsilon)$-approximation scheme for general dimensions $m$. This will be the topic of the next section.

4 A $(1 + \epsilon)$-approximation scheme

The idea behind the approximation scheme is to first solve an easier sub-problem and then to translate the solution of this sub-problem back to the original minimax problem. We do this by constructing a set, $G$ say, of points inside the hypercube $[0, n-1]^m$ and then requiring only these points to be covered, rather than the entire hypercube. This avoids the problem of having to deal with an infinite number of points, as mentioned in the previous section. Then, by increasing the found radius $\rho$ by some quantity, $\delta$ say, it can be shown that the solution to the sub-problem covers the entire hypercube with radius $\rho + \delta$. To show that this can be done we introduce the following lemma.

Lemma 1. Given $m, n \in \mathbb{N}$, $r > 0$ and a distance metric $l_p$, $p \geq 1$, let $G$ be a set of points inside the hypercube $[0, n-1]^m$, all on a grid with interval length $\theta = 2r\, m^{-1/p}$, where the points on the outside are at distance $\frac{1}{2}\theta$ from the border of the hypercube, i.e. $G = \{\frac{1}{2}\theta, \frac{3}{2}\theta, \frac{5}{2}\theta, \ldots, n-1-\frac{1}{2}\theta\}^m$, and where $r$ is chosen such that $\frac{n-1}{\theta} \in \mathbb{Z}$. Let $A$ be an $m$-dimensional LHD of size $n$ that covers all points in $G$ with radius $\rho^\#$. Then $A$ covers the entire hypercube with radius $\rho^\# + r$.

Proof. Consider a subcube $\Gamma$ of the hypercube $[0, n-1]^m$ with sides of length $\theta$, positioned in such a way that points of $G$ are on the vertices of $\Gamma$. The point of $\Gamma$ furthest away from any of these vertices is right in the middle of $\Gamma$; denote this point by $\gamma$. Note that all other points of $G$, outside $\Gamma$, are further away from $\gamma$ than the vertices of $\Gamma$. Let $x$ be the vertex of $\Gamma$ with the smallest coordinates; then $\gamma = x + \frac{1}{2}\theta \mathbf{1}_m$ (where $\mathbf{1}_m$ is the $m$-dimensional all-ones vector) and

$$\|\gamma - x\|_p = \sqrt[p]{m \left[\tfrac{1}{2}\theta\right]^p} = \sqrt[p]{m \left[\tfrac{1}{2} \cdot 2r\, m^{-1/p}\right]^p} = \sqrt[p]{m \cdot \tfrac{1}{m}\, r^p} = r.$$

Hence every point of the hypercube is within distance $r$ of some point of $G$, and by the triangle inequality $A$ covers the entire hypercube with radius $\rho^\# + r$.

We can now introduce the following theorem, which provides an algorithm and shows that this algorithm is a $(1 + \epsilon)$-approximation scheme for the minimax problem.

Theorem 1. Consider the following algorithm, where $m, n \in \mathbb{N}$ and $p \geq 1$.

1. Construct a set of points $G$ on the hypercube $[0, n-1]^m$, as described in Lemma 1, with $r = \epsilon L$, where $L$ is a lower bound for the covering radius $\rho$ and where $0 < \epsilon \leq 1$.

2. Solve the sub-problem

$$\min \rho \quad \text{s.t.} \quad \|a - y\|_p \leq \rho \text{ for some } a \in A, \text{ for all } y \in G, \qquad A \in \Pi(m, n), \tag{**}$$

where $\Pi(m, n)$ is the set of all possible LHDs on the hypercube $[0, n-1]^m$.

This algorithm is a $(1 + \epsilon)$-approximation scheme for the minimax problem.

Proof. Let $A$ be the solution to (**) for given values of $m$, $n$ and $p$. Furthermore, let $\rho^\#$ and $\rho^{OPT}$ be the values corresponding to the solutions of (**) and the minimax problem respectively, for those same values of $n$, $m$ and $p$. Then $A$ will cover the entire hypercube with radius $\rho^\# + \epsilon L$ by Lemma 1. Next, note that obviously $\rho^\# \leq \rho^{OPT}$ and $L \leq \rho^{OPT}$. Hence we get

$$\rho^{OPT} \leq \rho^\# + \epsilon L \leq \rho^{OPT} + \epsilon \rho^{OPT} = (1 + \epsilon)\rho^{OPT}.$$

Hence indeed the algorithm is a $(1 + \epsilon)$-approximation scheme for the minimax problem.

The problem we are left with now is how to solve the sub-problem (**). Two ideas on this will be discussed next, both being binary programming formulations.

4.1 Binary programming formulation I

The first alternative is a rather straightforward formulation of (**). Let $I = (i_1, \ldots, i_m)$ and $K = (k_1, \ldots, k_m)$ with $i_j \in \{1, \ldots, n\}$ and $k_j \in \{1, \ldots, \frac{n-1}{\theta}\}$ for $j = 1, \ldots, m$. Define the sets $\bar{I}$ and $\bar{K}$ as the sets of all such possible $I$ and $K$ respectively, so that $\bar{I}$ indexes the candidate design points and $\bar{K}$ indexes the grid points of $G$. Then for every point $K \in \bar{K}$ an $m$-dimensional tensor $(d_K)_I$ is constructed containing the distances to all possible $I \in \bar{I}$. Let $M$ be some very large number. We can now present the following ILP formulation:

$$
\begin{aligned}
\min \ & \rho \\
\text{s.t.} \ & (d_K)_I\, x_I - M (y_K)_I \leq \rho && \forall I \in \bar{I},\ K \in \bar{K} && (1)\\
& \textstyle\sum_{I \in \bar{I}} (y_K)_I \leq n^m - 1 && \forall K \in \bar{K} && (2)\\
& (y_K)_I + x_I \geq 1 && \forall I \in \bar{I},\ K \in \bar{K} && (3)\\
& \textstyle\sum_{I : i_j = u} x_I = 1 && \forall u = 1, \ldots, n,\ j \in \{1, \ldots, m\} && (4)\\
& (y_K)_I \in \{0, 1\} \quad \forall I \in \bar{I},\ K \in \bar{K}, \qquad x_I \in \{0, 1\} \quad \forall I \in \bar{I}.
\end{aligned}
$$

The sets of constraints (1), (2) and (3) make sure that every point $K \in \bar{K}$ is covered within radius $\rho$ by at least one selected point $I \in \bar{I}$: by (3) every unselected candidate has $(y_K)_I = 1$, so (2) forces $(y_K)_I = 0$ for at least one selected candidate, whose distance to $K$ then bounds $\rho$ through (1). The set of constraints (4) makes sure that the points $I$ are chosen such that they form an LHD. Note that the above formulation solves the sub-problem (**). Unfortunately the size of the instances grows very rapidly, both in the size of the LHD, $n$, and in the dimension, $m$. Say $m$ and $n$ are given. Then the number of variables is $O((\frac{1}{\epsilon})^m n^{2m})$. The number of constraints is $n^m (\frac{n-1}{\theta})^m$ for each of the sets of constraints (1) and (3), $(\frac{n-1}{\theta})^m$ for the constraints (2) and $nm$ for the set of constraints (4), giving a total number of constraints of $O(2(\frac{1}{\epsilon})^m n^{2m})$. The rapidly increasing size of the instances, together with the fact that binary programming is a known NP-hard problem, means that this formulation is only useful for really small LHDs, in terms of $n$ as well as $m$, and for rather large values of $\epsilon$. Already for $m = 2$ and $n = 10$, with $\epsilon$ as large as 0.25, the problem becomes intractable. To cope with larger values of both $n$ and $m$ we will present an alternative BP formulation that allows for smaller instances and computation times, but that takes a more ad hoc approach and requires a bit of trial and error.
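As an illustration, the sketch below sets up this first formulation for a deliberately tiny instance with an off-the-shelf MILP modeller. The PuLP package with its bundled CBC solver and the coarse 4 x 4 stand-in for the grid $G$ are assumptions of this illustration; the thesis does not specify an implementation. Anything much larger quickly becomes intractable, as noted above.

```python
import itertools
import numpy as np
import pulp  # assumed available; any MILP modeller would do

n, m = 4, 2                                              # a deliberately tiny instance
cand = list(itertools.product(range(n), repeat=m))       # candidate design points I
axis = np.linspace(0.5, n - 1.5, 4)                      # coarse stand-in for the grid points K
gpts = list(itertools.product(axis, repeat=m))
M = float(n) * m                                         # big-M larger than any distance in the box

prob = pulp.LpProblem("minimax_LHD_BP1", pulp.LpMinimize)
rho = pulp.LpVariable("rho", lowBound=0)
x = [pulp.LpVariable(f"x_{i}", cat="Binary") for i in range(len(cand))]
y = [[pulp.LpVariable(f"y_{k}_{i}", cat="Binary") for i in range(len(cand))]
     for k in range(len(gpts))]
prob += rho                                              # objective: minimize the radius

for k, K in enumerate(gpts):
    for i, I in enumerate(cand):
        d = float(np.linalg.norm(np.array(K) - np.array(I)))   # (d_K)_I, Euclidean
        prob += d * x[i] - M * y[k][i] <= rho                   # constraints (1)
        prob += y[k][i] + x[i] >= 1                             # constraints (3)
    prob += pulp.lpSum(y[k]) <= len(cand) - 1                   # constraints (2)

for j in range(m):                                              # constraints (4): one point per level
    for u in range(n):
        prob += pulp.lpSum(x[i] for i, I in enumerate(cand) if I[j] == u) == 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
design = [cand[i] for i in range(len(cand)) if x[i].value() > 0.5]
print("covering radius over the grid:", pulp.value(rho), "design:", design)
```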

4.2 Binary programming formulation II

Before presenting the second BP formulation we provide the following corollary to Theorem 1; its proof follows directly from the proof of Theorem 1.

Corollary 1. Suppose we replace step 2 in the algorithm of Theorem 1 by the following alternative: find an LHD $A$ that covers all points in $G$ with a given covering radius $\rho^\#$, say, not necessarily optimal with regard to (**). Let $L$ be a lower bound on the covering radius for the corresponding minimax problem, which has optimal covering radius $\rho^{OPT}$. If $\rho^\# \leq L$, then $A$ covers the entire hypercube $[0, n-1]^m$ with radius $\rho^\# + \epsilon L$ and $\rho^\# + \epsilon L \leq (1 + \epsilon)\rho^{OPT}$.

That is, the solution found this way is guaranteed to be within a fraction $\epsilon$ of the optimal solution to the minimax problem. Now we can define the following BP formulation, using similar notation as in the first BP formulation. We first pick some $\rho^\# \leq L$ (the lower bound). Then for every point $K \in \bar{K}$ we construct an $m$-dimensional tensor $(C_K)_I$ composed of 0's and 1's, where $C_{I,K} = 1$ if the distance between the points corresponding to $I$ and $K$ is at most $\rho^\#$, and 0 otherwise:

$$
\begin{aligned}
\min \ & 1 \\
\text{s.t.} \ & \textstyle\sum_{I \in \bar{I}} x_I\, C_{I,K} \geq 1 && \forall K \in \bar{K} && (1)\\
& \textstyle\sum_{I : i_j = u} x_I = 1 && \forall u = 1, \ldots, n,\ j \in \{1, \ldots, m\} && (2)\\
& x_I \in \{0, 1\} \quad \forall I \in \bar{I}.
\end{aligned}
$$

The set of constraints (1) makes sure that every point $K \in \bar{K}$ is covered by at least one point $I \in \bar{I}$. The set of constraints (2) makes sure that the points $I \in \bar{I}$ are chosen in such a way that they form an LHD. The number of variables for this problem is $n^m$. The number of constraints is $O((\frac{n}{\epsilon})^m)$. Furthermore, we are now dealing with a feasibility problem only, rather than an optimization problem. The downside is that we need good lower bounds to make this second option viable, since we are not guaranteed to find a solution for arbitrary values of $\rho^\#$ and $\epsilon$. In this sense the resulting algorithm is no longer a $(1 + \epsilon)$-approximation scheme, since we can no longer guarantee a solution for any $\epsilon > 0$. In fact, the values of $\epsilon$ that still allow for a solution are highly dependent on the quality of the lower bounds. However, if a solution is found for some given value of $\epsilon > 0$, we can still guarantee that it is within a fraction $\epsilon$ of the optimal solution to the minimax problem, by Corollary 1. In the next two subsections, 4.3 and 4.4 respectively, we present a lower bound and some remarks on the choice of the norm to be used for the construction of LHDs in different dimensions.

4.3 Lower bound

We start by noting that for any $x, y \in \mathbb{R}^m$ and for any $p, q \in \mathbb{R}_{++}$ such that $p \geq q$ it holds that $\|x - y\|_\infty \leq \|x - y\|_p \leq \|x - y\|_q$. We can now provide the following lower bound, which is a generalisation of a lower bound found by Van Dam (2008) [13] (Lemma 2); its proof is mostly analogous to the one given in that same paper, although there are some new ideas in it as well.

Theorem 2. Let $n \geq 2$, $m \geq 2$ and $p > 0$. An LHD of $n$ points in $m$ dimensions has covering $l_p$ radius $\rho$ at least $\min\{\rho^{(1)}, \rho^{(2)}\}$, where $\rho^{(1)}$ is such that

$$(\rho + 1)(2\rho)^{m-1} - 2(m-1)\left(\frac{n-1}{2\rho} - 1\right)\left((2\rho)^{m-1} - 2\rho(2\rho - 1)^{m-2}\right) \geq (n-1)^{m-1}$$

and $\rho^{(2)}$ is such that

$$\left(\rho + \tfrac{1}{2}\right)(2\rho)^{m-1} - 2\left((2\rho)^{m-1} - 2\rho\left(2\rho - \tfrac{1}{2}\right)^{m-2} + (m-1)\left(\frac{n-1}{2\rho} - 1\right)\left((2\rho)^{m-1} - 2\rho\left(2\rho - \tfrac{3}{2}\right)^{m-2}\right)\right) \geq (n-1)^{m-1}.$$

Proof. We shall prove that the stated lower bound is a lower bound for the covering $l_\infty$ radius ($\rho = \rho_\infty$ for the remainder of the proof unless stated otherwise). From the remark at the start of this subsection it then immediately follows that this lower bound is also a lower bound for any covering $l_p$ radius with $p > 0$. Consider an $m$-dimensional Latin hypercube design of $n$ points with some given covering radius $\rho$. Then $\rho$ is either integer or half-integer. Firstly, suppose $\rho$ is integer. Then the hyperplane defined by $x_i = 0$, for some $i \in \{1, 2, \ldots, m\}$, can only be covered by the $\rho + 1$ points with $x_i = 0, 1, \ldots, \rho$. Note that each of these points can cover a part of the hyperplane of at most $(2\rho)^{m-1}$. Hence $\rho$ has to be such that $(\rho + 1)(2\rho)^{m-1} \geq (n-1)^{m-1}$. Next suppose that equality holds.
Then each of the $\rho + 1$ points has to be chosen such that the areas they cover do not overlap. That is, for any two of these points there has to be a coordinate $x_k$, say, with $k \neq i$, in which they differ by at least $2\rho$. In particular, for every direction $x_k$ with $k \neq i$ there have to be a couple of pairs of points that differ by exactly $2\rho$. Furthermore, the points

have to be such that the area each of them covers lies completely within the hyperplane; that is, the distance to the borders of the hyperplane is at least $\rho$. In fact, for a couple of points the distance has to be exactly $\rho$ in each direction $x_k$, $k \neq i$. Without loss of generality let such a point be $(0, \rho, \rho, \ldots, \rho)$, so $i = 1$. Now we move into any direction, $x_2$ say. Let the next point in that direction be $(0, \rho + 2\rho, \rho, \ldots, \rho)$; note that otherwise equality can no longer hold. We then have to move this new point in all directions besides $x_1$ and $x_2$ as well, since $\rho$ has been used already for each of those coordinates. Note that moving further into the hyperplane leaves a small gap at the border that is not covered. Trying to cover such a small area will always result in huge overlaps between points, so instead we move a little further outwards, to $(0, \rho + 2\rho, \rho - 1, \ldots, \rho - 1)$ say. By doing this a small part of the area that is covered by this point will fall outside of the hyperplane, in particular a part of size $(2\rho)^{m-1} - 2\rho(2\rho - 1)^{m-2}$ (the part that can be covered by a point minus the part that lies inside the hyperplane). Furthermore, in each of the $(m-1)$ directions $x_k$, $k \neq i$, lie at least $\frac{n-1}{2\rho}$ such points, of which one is $(\rho, \rho, \ldots, \rho)$ itself. We can now repeat the entire argument for the opposite side of the hyperplane, i.e., without loss of generality starting from the point $(0, n-1-\rho, n-1-\rho, \ldots, n-1-\rho)$. Hence in total we lose a part of size at least $2(m-1)\left(\frac{n-1}{2\rho} - 1\right)\left((2\rho)^{m-1} - 2\rho(2\rho - 1)^{m-2}\right)$.

Next, assume that $\rho$ is half-integer. Then the hyperplane defined by $x_i = 0$, for some $i \in \{1, 2, \ldots, m\}$, can only be covered by a total of $\rho + \frac{1}{2}$ points. Each of these can cover a part of the hyperplane of size $(2\rho)^{m-1}$ as before, so that $(\rho + \frac{1}{2})(2\rho)^{m-1} \geq (n-1)^{m-1}$. Then, if equality holds, we can again repeat the entire argument as for integer $\rho$, although with some differences. That is, part of the area that is covered by the point in the corner will be outside of the hyperplane, in particular a part of size $(2\rho)^{m-1} - 2\rho(2\rho - \frac{1}{2})^{m-2}$. Furthermore, there is a small difference for any of the other points considered, the ones for which part of the area that they cover falls outside of the hyperplane. The size of these parts that are outside of the hyperplane will now be $(2\rho)^{m-1} - 2\rho(2\rho - \frac{3}{2})^{m-2}$.

We have now obtained a lower bound for any covering $l_p$ radius with $p > 0$, albeit that it is not a very strong bound, especially for lower values of $p$; this is a direct result of the ordering of the different metrics as noted at the start of this subsection. An advantage of the obtained lower bound is that we are now completely free to choose any $p$-norm we want, because, as it turns out, we will probably require different values of $p$ depending on the dimension of the LHDs.
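The bound is straightforward to evaluate numerically. The sketch below does so for a few design sizes in three dimensions; it is an illustration added here, based on the two inequalities of Theorem 2 as stated above, interpreting $\rho^{(1)}$ and $\rho^{(2)}$ as the smallest integer and half-integer values satisfying the respective inequalities.

```python
import numpy as np

def theorem2_lower_bound(n, m, rho_max=50.0):
    """Evaluate the Theorem 2 lower bound: the smallest integer rho satisfying
    the first inequality, the smallest half-integer rho satisfying the second,
    and then the minimum of the two."""
    target = (n - 1) ** (m - 1)

    def lhs_integer(r):
        return ((r + 1) * (2 * r) ** (m - 1)
                - 2 * (m - 1) * ((n - 1) / (2 * r) - 1)
                * ((2 * r) ** (m - 1) - 2 * r * (2 * r - 1) ** (m - 2)))

    def lhs_half(r):
        return ((r + 0.5) * (2 * r) ** (m - 1)
                - 2 * ((2 * r) ** (m - 1) - 2 * r * (2 * r - 0.5) ** (m - 2)
                       + (m - 1) * ((n - 1) / (2 * r) - 1)
                       * ((2 * r) ** (m - 1) - 2 * r * (2 * r - 1.5) ** (m - 2))))

    rho1 = next((r for r in np.arange(1.0, rho_max) if lhs_integer(r) >= target), None)
    rho2 = next((r for r in np.arange(0.5, rho_max, 1.0) if lhs_half(r) >= target), None)
    return min(v for v in (rho1, rho2) if v is not None)

for n in (10, 15, 20):
    print(n, theorem2_lower_bound(n, m=3))
```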
4.4 Choosing a suitable norm

There seems to be a strong resemblance between the maximin designs and the $l_\infty$ minimax designs. From the limited observations we made so far, we found that good results are obtained using low values of $p$; this seems to imply that, for values of $p$, lower is better. This observation is also found in a paper by Aggarwal et al. [1], and it seems to apply to LHDs as well, although, as we found, only to a certain extent. Consider for example a 3-dimensional cube and take as distance metric the $l_1$ norm, which we assumed to provide good minimax designs in two dimensions (see Section 2). Then finding a minimax design in three dimensions is equivalent to covering the entire cube with a number of octahedra with some radius $\rho$. This turns out to be very difficult. We know that for the $l_1$ distance metric $\rho$ is either integer or half-integer. For $n = 4$ we found, by using BP formulation II with different sufficiently small values of $\epsilon$, that it is (most likely) not possible to cover the entire cube for values of $\rho \leq 2.5$. Then for $\rho = 3$ it turns out that almost every possible LHD consisting of 4 points covers the entire cube. In fact, intuitively this makes sense, as it is quite difficult to cover a cube entirely using octahedra, a notion that probably remains valid for higher dimensions, where this same covering problem can be seen as a series of cubes that need to be covered using octahedra. Furthermore, making $p$ smaller just makes the problem of covering the entire (hyper)cube even harder, as can be seen from Figure 3.

Figure 3: Covering areas for different values of p for some given value of ρ.

Hence, it seems that very low values of $p$ in comparison to the dimension $m$ do not provide

enough contrast between points to construct good space-filling designs. We appear to get the best results when $p = m$ or $p = m - 1$.

5 Results

In this section we present some results obtained using binary programming formulation II. Using this BP formulation together with Theorem 1 we created 2-dimensional approximate minimax LHDs from n = 28 (minimax designs up to n = 27 have been found by Van Dam (2008) [13]) up to n = 49, using the Euclidean distance metric $l_2$. We set $\epsilon$ equal to 0.1 and then proceeded to take $\rho^\#$ a bit below the given lower bounds, making the guarantee in case of a solution actually a bit better than 10%. That is, we tried to obtain better results doing less work. For example, for n = 28 we took $\rho^\# = 3.8$, a bit below the lower bound for $\rho$, which gives a guarantee somewhat better than just 0.1. Summarized results can be found in Table 3. What follows is an explanation of the symbols used in Table 3.

n - the size of the LHD (i.e., the number of points it consists of),
$\epsilon$ - the value used to construct the grid and the guarantee given by the algorithm,
$\epsilon_{ACT}$ - the actual guarantee that can be given for the found approximate minimax LHD, as determined afterwards,
lower bound - the lower bound for 2-dimensional $l_2$ minimax designs as found by Van Dam (2008) [13],
$\rho^\#$ - the value of $\rho$ with which we tried to cover the grid (i.e., to get a solution for the sub-problem),
$\rho$ - the value of $\rho$ that was guaranteed to cover the entire square when a solution for the sub-problem was found with covering radius $\rho^\#$, as implied by Theorem 1,
$\rho_{ACT}$ - the actual covering radius of the found approximate minimax LHD, as determined afterwards,
MSE ratio - the ratio of the MSEs obtained in estimating the banana function using a minimax design and a maximin design respectively,
CPU time - the time needed for CPLEX to solve binary programming formulation II, in seconds, on a single core running at 2.13 GHz with 1 GB of RAM.

From Table 3 it can be seen that for n = 28 the actual guarantee that can be given is much better than 0.1, and even better than the value derived above. In fact, more often than not it turns out that the designs found give much better approximations than one would expect from the values chosen for $\epsilon$. Furthermore, the relative difference between the actual covering radius $\rho_{ACT}$ and the lower bounds is similar to that of the optimal minimax designs found in Van Dam (2008) [13]. This seems to suggest that the approximate designs found here could also be optimal, or are at least very close to optimal, even much closer than the best guarantee that we have been able to provide in this thesis. Another thing to note is the huge variation in the CPU times required. They range from as little as 18 seconds to as much as almost 4 hours, and there seems to be no direct correlation between the CPU times and any of the parameters. The differences in the required CPU times are probably caused by the fact that some values of $\rho^\#$ are tighter with regard to the optimum than others, which means there are fewer feasible solutions and CPLEX will, in most cases, have to search through a larger part of the entire tree of possibilities. Finally, note that again the MSE is much smaller using minimax designs compared to maximin designs. In fact, the differences are very similar to the ones obtained in Section 3,
apart from the cases where n = 31 and n = 33; for these two cases, however, it turns out that the maximin LHDs perform better than usual, whereas the minimax LHDs perform consistently with regard to the other minimax LHDs. We then proceeded to play around with the parameters for n = 28 to see if we could find even better minimax designs. We eventually found the same design using $\rho^\#$ equal to the lower bound and $\epsilon = 0.07$, as well as a slightly better design, with $\rho_{ACT} = 4.1$, using again $\rho^\#$ equal to the lower bound but with another value of $\epsilon$.

Table 3: Summarized results for 2-dimensional $l_2$ minimax LHDs (for each n: $\epsilon$, $\epsilon_{ACT}$, lower bound, $\rho^\#$, $\rho$, $\rho_{ACT}$, MSE ratio and CPU time).

The CPU times were 36 and 224 seconds respectively, which are very similar to those obtained in Table 3. Furthermore, we tried to reproduce these results by setting a very low value for $\rho^\#$ and a very high value of $\epsilon$, which in theory would be possible; note that doing this partly takes care of the memory problem. Unfortunately we found that the accuracy obtained this way becomes worse. In this sense there still is a relation between $\epsilon$ and the quality of the designs.

Finally, we also obtained a few preliminary results for 3-dimensional designs. We tried to find a few approximate minimax designs for n = 10, n = 15 and n = 20, setting $\rho^\#$ equal to the lower bound obtained from Theorem 2 and $\epsilon = 0.2$. It seems that binary programming formulation II is still rather quick, delivering CPU times similar to the ones obtained in Table 3. There is, however, a problem with the size of the instances and the memory required to store them. Although the size of the instances is still reasonable for the three cases we looked into, it grows quite rapidly and will eventually become a problem for larger values of n or lower values of $\epsilon$, and in particular for higher dimensions. Another problem we encountered is that the lower bound we obtained in Theorem 2 is not strong enough. Due to this problem we can only take relatively low values for $\rho^\#$, which, as noted before, often require large values of $\epsilon$ in order for binary programming formulation II to be feasible. This means that the approximations we obtain this way are possibly quite poor.

6 Conclusion

We started off comparing maximin LHDs, orthogonal array based LHDs and orthogonal array based maximin LHDs by trying to estimate a black box function using Kriging. We found that none of these designs could be said to be a better choice than the others. We then tried to argue what would make a good design and found that minimax designs would probably be a good choice. We verified this intuition by performing a similar test as before using Kriging. Eventually we constructed a $(1 + \epsilon)$-approximation scheme to construct these minimax LHDs, as well as a more restricted (with regard to the choice of $\epsilon$) version of this approximation scheme. Furthermore, we obtained a general lower bound for m-dimensional minimax designs. We then managed to construct quite decent 2-dimensional (approximate) minimax designs for n = 28 up to n = 49 using this more restricted approximation scheme. We also did some preliminary investigation into

3-dimensional designs. We found that computing these designs was still possible for values of n that are not too large. However, the approximations were very poor due to the fact that the lower bound was not strong enough. Some ideas for future research would be to find a better (polynomial) algorithm to solve the sub-problem as defined in Section 4, or, if this is not possible, a proof that the sub-problem is NP-hard. Further ideas for research could also include finding a better lower bound, obtaining more results for 3- and higher-dimensional designs, and more rigorous testing of different kinds of LHDs.

References

[1] C. C. Aggarwal, A. Hinneburg, and D. A. Keim. On the surprising behaviour of distance metrics in high dimensional space. Lecture Notes in Computer Science, Springer.
[2] P. Audze and V. Eglais. New approach for planning out of experiments. Problems of Dynamics and Strengths, 35, 1977.
[3] S. J. Bates, J. Sienz, and V. V. Toropov. Formulation of the optimal Latin hypercube design of experiments using a permutation genetic algorithm. AIAA, pages 1-7, 2004.
[4] B. Husslage, G. Rennen, E. R. van Dam, and D. den Hertog. Space-filling Latin hypercube designs for computer experiments. CentER Discussion Paper, Tilburg University, 18.
[5] R. Jin, W. Chen, and A. Sudjianto. An efficient algorithm for constructing optimal design of computer experiments. Journal of Statistical Planning and Inference, 134, 2005.
[6] M. E. Johnson, L. M. Moore, and D. Ylvisaker. Minimax and maximin distance designs. Journal of Statistical Planning and Inference, 26, 1990.
[7] S. Leary, A. Bhaskar, and A. Keane. Optimal orthogonal-array-based Latin hypercubes. Journal of Applied Statistics, 30, 2003.
[8] M. D. Morris and T. J. Mitchell. Exploratory designs for computational experiments. Journal of Statistical Planning and Inference, 43, 1995.
[9] M. D. McKay, R. J. Beckman, and W. J. Conover. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21, 1979.
[10] P. Prescott. Orthogonal-column Latin hypercube designs with small samples. Computational Statistics and Data Analysis, 53, 2009.
[11] H. H. Rosenbrock. An automatic method for finding the greatest or least value of a function. Computer Journal, 1960.
[12] B. Tang. Orthogonal array-based Latin hypercubes. Journal of the American Statistical Association, 88, 1993.
[13] E. R. van Dam. Two-dimensional minimax Latin hypercube designs. Discrete Applied Mathematics, 156, 2008.
[14] E. R. van Dam, B. G. M. Husslage, D. den Hertog, and J. B. M. Melissen. Maximin Latin hypercube designs in two dimensions. Operations Research, 55, 2007.
[15] Website. Space-filling designs, hosted by Tilburg University.


More information

Irredundant Families of Subcubes

Irredundant Families of Subcubes Irredundant Families of Subcubes David Ellis January 2010 Abstract We consider the problem of finding the maximum possible size of a family of -dimensional subcubes of the n-cube {0, 1} n, none of which

More information

1 Basic Combinatorics

1 Basic Combinatorics 1 Basic Combinatorics 1.1 Sets and sequences Sets. A set is an unordered collection of distinct objects. The objects are called elements of the set. We use braces to denote a set, for example, the set

More information

NP-Completeness I. Lecture Overview Introduction: Reduction and Expressiveness

NP-Completeness I. Lecture Overview Introduction: Reduction and Expressiveness Lecture 19 NP-Completeness I 19.1 Overview In the past few lectures we have looked at increasingly more expressive problems that we were able to solve using efficient algorithms. In this lecture we introduce

More information

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) Contents 1 Vector Spaces 1 1.1 The Formal Denition of a Vector Space.................................. 1 1.2 Subspaces...................................................

More information

Going from graphic solutions to algebraic

Going from graphic solutions to algebraic Going from graphic solutions to algebraic 2 variables: Graph constraints Identify corner points of feasible area Find which corner point has best objective value More variables: Think about constraints

More information

A Method for Reducing Ill-Conditioning of Polynomial Root Finding Using a Change of Basis

A Method for Reducing Ill-Conditioning of Polynomial Root Finding Using a Change of Basis Portland State University PDXScholar University Honors Theses University Honors College 2014 A Method for Reducing Ill-Conditioning of Polynomial Root Finding Using a Change of Basis Edison Tsai Portland

More information

Spring 2014 Advanced Probability Overview. Lecture Notes Set 1: Course Overview, σ-fields, and Measures

Spring 2014 Advanced Probability Overview. Lecture Notes Set 1: Course Overview, σ-fields, and Measures 36-752 Spring 2014 Advanced Probability Overview Lecture Notes Set 1: Course Overview, σ-fields, and Measures Instructor: Jing Lei Associated reading: Sec 1.1-1.4 of Ash and Doléans-Dade; Sec 1.1 and A.1

More information

Linear Algebra I. Ronald van Luijk, 2015

Linear Algebra I. Ronald van Luijk, 2015 Linear Algebra I Ronald van Luijk, 2015 With many parts from Linear Algebra I by Michael Stoll, 2007 Contents Dependencies among sections 3 Chapter 1. Euclidean space: lines and hyperplanes 5 1.1. Definition

More information

2 = = 0 Thus, the number which is largest in magnitude is equal to the number which is smallest in magnitude.

2 = = 0 Thus, the number which is largest in magnitude is equal to the number which is smallest in magnitude. Limits at Infinity Two additional topics of interest with its are its as x ± and its where f(x) ±. Before we can properly discuss the notion of infinite its, we will need to begin with a discussion on

More information

On the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems

On the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems MATHEMATICS OF OPERATIONS RESEARCH Vol. xx, No. x, Xxxxxxx 00x, pp. xxx xxx ISSN 0364-765X EISSN 156-5471 0x xx0x 0xxx informs DOI 10.187/moor.xxxx.xxxx c 00x INFORMS On the Power of Robust Solutions in

More information

Lecture 3: Latin Squares and Groups

Lecture 3: Latin Squares and Groups Latin Squares Instructor: Padraic Bartlett Lecture 3: Latin Squares and Groups Week 2 Mathcamp 2012 In our last lecture, we came up with some fairly surprising connections between finite fields and Latin

More information

CONSTRUCTION OF THE REAL NUMBERS.

CONSTRUCTION OF THE REAL NUMBERS. CONSTRUCTION OF THE REAL NUMBERS. IAN KIMING 1. Motivation. It will not come as a big surprise to anyone when I say that we need the real numbers in mathematics. More to the point, we need to be able to

More information

Spanning and Independence Properties of Finite Frames

Spanning and Independence Properties of Finite Frames Chapter 1 Spanning and Independence Properties of Finite Frames Peter G. Casazza and Darrin Speegle Abstract The fundamental notion of frame theory is redundancy. It is this property which makes frames

More information

CS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding

CS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding CS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding Tim Roughgarden October 29, 2014 1 Preamble This lecture covers our final subtopic within the exact and approximate recovery part of the course.

More information

A NEW CLASS OF NESTED (NEARLY) ORTHOGONAL LATIN HYPERCUBE DESIGNS

A NEW CLASS OF NESTED (NEARLY) ORTHOGONAL LATIN HYPERCUBE DESIGNS Statistica Sinica 26 (2016), 1249-1267 doi:http://dx.doi.org/10.5705/ss.2014.029 A NEW CLASS OF NESTED (NEARLY) ORTHOGONAL LATIN HYPERCUBE DESIGNS Xue Yang 1,2, Jian-Feng Yang 2, Dennis K. J. Lin 3 and

More information

Chapter 11 - Sequences and Series

Chapter 11 - Sequences and Series Calculus and Analytic Geometry II Chapter - Sequences and Series. Sequences Definition. A sequence is a list of numbers written in a definite order, We call a n the general term of the sequence. {a, a

More information

3 The language of proof

3 The language of proof 3 The language of proof After working through this section, you should be able to: (a) understand what is asserted by various types of mathematical statements, in particular implications and equivalences;

More information

THE EXISTENCE AND USEFULNESS OF EQUALITY CUTS IN THE MULTI-DEMAND MULTIDIMENSIONAL KNAPSACK PROBLEM LEVI DELISSA. B.S., Kansas State University, 2014

THE EXISTENCE AND USEFULNESS OF EQUALITY CUTS IN THE MULTI-DEMAND MULTIDIMENSIONAL KNAPSACK PROBLEM LEVI DELISSA. B.S., Kansas State University, 2014 THE EXISTENCE AND USEFULNESS OF EQUALITY CUTS IN THE MULTI-DEMAND MULTIDIMENSIONAL KNAPSACK PROBLEM by LEVI DELISSA B.S., Kansas State University, 2014 A THESIS submitted in partial fulfillment of the

More information

Lecture 6: Greedy Algorithms I

Lecture 6: Greedy Algorithms I COMPSCI 330: Design and Analysis of Algorithms September 14 Lecturer: Rong Ge Lecture 6: Greedy Algorithms I Scribe: Fred Zhang 1 Overview In this lecture, we introduce a new algorithm design technique

More information

Lecture 2: Vector Spaces, Metric Spaces

Lecture 2: Vector Spaces, Metric Spaces CCS Discrete II Professor: Padraic Bartlett Lecture 2: Vector Spaces, Metric Spaces Week 2 UCSB 2015 1 Vector Spaces, Informally The two vector spaces 1 you re probably the most used to working with, from

More information

Statistical inference

Statistical inference Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall

More information

Contribution of Problems

Contribution of Problems Exam topics 1. Basic structures: sets, lists, functions (a) Sets { }: write all elements, or define by condition (b) Set operations: A B, A B, A\B, A c (c) Lists ( ): Cartesian product A B (d) Functions

More information

September Math Course: First Order Derivative

September Math Course: First Order Derivative September Math Course: First Order Derivative Arina Nikandrova Functions Function y = f (x), where x is either be a scalar or a vector of several variables (x,..., x n ), can be thought of as a rule which

More information

SELECTING LATIN HYPERCUBES USING CORRELATION CRITERIA

SELECTING LATIN HYPERCUBES USING CORRELATION CRITERIA Statistica Sinica 8(1998), 965-977 SELECTING LATIN HYPERCUBES USING CORRELATION CRITERIA Boxin Tang University of Memphis and University of Western Ontario Abstract: Latin hypercube designs have recently

More information

CLASS NOTES FOR APRIL 14, 2000

CLASS NOTES FOR APRIL 14, 2000 CLASS NOTES FOR APRIL 14, 2000 Announcement: Section 1.2, Questions 3,5 have been deferred from Assignment 1 to Assignment 2. Section 1.4, Question 5 has been dropped entirely. 1. Review of Wednesday class

More information

Counting Clusters on a Grid

Counting Clusters on a Grid Dartmouth College Undergraduate Honors Thesis Counting Clusters on a Grid Author: Jacob Richey Faculty Advisor: Peter Winkler May 23, 2014 1 Acknowledgements There are a number of people who have made

More information

Spectral Graph Theory Lecture 2. The Laplacian. Daniel A. Spielman September 4, x T M x. ψ i = arg min

Spectral Graph Theory Lecture 2. The Laplacian. Daniel A. Spielman September 4, x T M x. ψ i = arg min Spectral Graph Theory Lecture 2 The Laplacian Daniel A. Spielman September 4, 2015 Disclaimer These notes are not necessarily an accurate representation of what happened in class. The notes written before

More information

MATH3302. Coding and Cryptography. Coding Theory

MATH3302. Coding and Cryptography. Coding Theory MATH3302 Coding and Cryptography Coding Theory 2010 Contents 1 Introduction to coding theory 2 1.1 Introduction.......................................... 2 1.2 Basic definitions and assumptions..............................

More information

Z Algorithmic Superpower Randomization October 15th, Lecture 12

Z Algorithmic Superpower Randomization October 15th, Lecture 12 15.859-Z Algorithmic Superpower Randomization October 15th, 014 Lecture 1 Lecturer: Bernhard Haeupler Scribe: Goran Žužić Today s lecture is about finding sparse solutions to linear systems. The problem

More information

0 Real Analysis - MATH20111

0 Real Analysis - MATH20111 0 Real Analysis - MATH20111 Warmup questions Most of you have seen limits, series, continuous functions and differentiable functions in school and/or in calculus in some form You might wonder why we are

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

x y = 1, 2x y + z = 2, and 3w + x + y + 2z = 0

x y = 1, 2x y + z = 2, and 3w + x + y + 2z = 0 Section. Systems of Linear Equations The equations x + 3 y =, x y + z =, and 3w + x + y + z = 0 have a common feature: each describes a geometric shape that is linear. Upon rewriting the first equation

More information

a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2.

a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2. Chapter 1 LINEAR EQUATIONS 11 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,, a n, b are given real

More information

2 Notation and Preliminaries

2 Notation and Preliminaries On Asymmetric TSP: Transformation to Symmetric TSP and Performance Bound Ratnesh Kumar Haomin Li epartment of Electrical Engineering University of Kentucky Lexington, KY 40506-0046 Abstract We show that

More information

Integer Linear Programs

Integer Linear Programs Lecture 2: Review, Linear Programming Relaxations Today we will talk about expressing combinatorial problems as mathematical programs, specifically Integer Linear Programs (ILPs). We then see what happens

More information

Optimization of Quadratic Forms: NP Hard Problems : Neural Networks

Optimization of Quadratic Forms: NP Hard Problems : Neural Networks 1 Optimization of Quadratic Forms: NP Hard Problems : Neural Networks Garimella Rama Murthy, Associate Professor, International Institute of Information Technology, Gachibowli, HYDERABAD, AP, INDIA ABSTRACT

More information

DISTINGUISHING PARTITIONS AND ASYMMETRIC UNIFORM HYPERGRAPHS

DISTINGUISHING PARTITIONS AND ASYMMETRIC UNIFORM HYPERGRAPHS DISTINGUISHING PARTITIONS AND ASYMMETRIC UNIFORM HYPERGRAPHS M. N. ELLINGHAM AND JUSTIN Z. SCHROEDER In memory of Mike Albertson. Abstract. A distinguishing partition for an action of a group Γ on a set

More information

Kullback-Leibler Designs

Kullback-Leibler Designs Kullback-Leibler Designs Astrid JOURDAN Jessica FRANCO Contents Contents Introduction Kullback-Leibler divergence Estimation by a Monte-Carlo method Design comparison Conclusion 2 Introduction Computer

More information

VC-DENSITY FOR TREES

VC-DENSITY FOR TREES VC-DENSITY FOR TREES ANTON BOBKOV Abstract. We show that for the theory of infinite trees we have vc(n) = n for all n. VC density was introduced in [1] by Aschenbrenner, Dolich, Haskell, MacPherson, and

More information

Contents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces

Contents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v 250) Contents 2 Vector Spaces 1 21 Vectors in R n 1 22 The Formal Denition of a Vector Space 4 23 Subspaces 6 24 Linear Combinations and

More information

Lecture 8 : Eigenvalues and Eigenvectors

Lecture 8 : Eigenvalues and Eigenvectors CPS290: Algorithmic Foundations of Data Science February 24, 2017 Lecture 8 : Eigenvalues and Eigenvectors Lecturer: Kamesh Munagala Scribe: Kamesh Munagala Hermitian Matrices It is simpler to begin with

More information

Root systems and optimal block designs

Root systems and optimal block designs Root systems and optimal block designs Peter J. Cameron School of Mathematical Sciences Queen Mary, University of London Mile End Road London E1 4NS, UK p.j.cameron@qmul.ac.uk Abstract Motivated by a question

More information

ORBIT-HOMOGENEITY. Abstract

ORBIT-HOMOGENEITY. Abstract Submitted exclusively to the London Mathematical Society DOI: 10.1112/S0000000000000000 ORBIT-HOMOGENEITY PETER J. CAMERON and ALEXANDER W. DENT Abstract We introduce the concept of orbit-homogeneity of

More information

Metric spaces and metrizability

Metric spaces and metrizability 1 Motivation Metric spaces and metrizability By this point in the course, this section should not need much in the way of motivation. From the very beginning, we have talked about R n usual and how relatively

More information

Primitive Matrices with Combinatorial Properties

Primitive Matrices with Combinatorial Properties Southern Illinois University Carbondale OpenSIUC Research Papers Graduate School Fall 11-6-2012 Primitive Matrices with Combinatorial Properties Abdulkarem Alhuraiji al_rqai@yahoo.com Follow this and additional

More information

Mid Term-1 : Practice problems

Mid Term-1 : Practice problems Mid Term-1 : Practice problems These problems are meant only to provide practice; they do not necessarily reflect the difficulty level of the problems in the exam. The actual exam problems are likely to

More information