arxiv: v1 [stat.me] 10 Jul 2009

Similar documents
Generalized Latin hypercube design for computer experiments

Stat 890 Design of computer experiments

Optimizing Latin hypercube designs by particle swarm

Kullback-Leibler Designs

Sensitivity analysis in linear and nonlinear models: A review. Introduction

arxiv: v1 [stat.me] 24 May 2010

Statistica Sinica Preprint No: SS R2

Park, Pennsylvania, USA. Full terms and conditions of use:

Anale. Seria Informatică. Vol. XIII fasc Annals. Computer Science Series. 13 th Tome 1 st Fasc. 2015

A GENERAL CONSTRUCTION FOR SPACE-FILLING LATIN HYPERCUBES

One-at-a-Time Designs for Estimating Elementary Effects of Simulator Experiments with Non-rectangular Input Regions

Experimental Space-Filling Designs For Complicated Simulation Outpts

Construction of column-orthogonal designs for computer experiments

Sequential adaptive designs in computer experiments for response surface model fit

On Latin hypercube designs and estimating black box functions

Computer experiments with functional inputs and scalar outputs by a norm-based approach

CONSTRUCTION OF SLICED SPACE-FILLING DESIGNS BASED ON BALANCED SLICED ORTHOGONAL ARRAYS

CONSTRUCTION OF NESTED (NEARLY) ORTHOGONAL DESIGNS FOR COMPUTER EXPERIMENTS

A note on optimal foldover design

Available online: 01 Jan 2012

E(s 2 )-OPTIMALITY AND MINIMUM DISCREPANCY IN 2-LEVEL SUPERSATURATED DESIGNS

Latin Hypercube Sampling with Multidimensional Uniformity

Bridging the Gap Between Space-Filling and Optimal Designs. Design for Computer Experiments. Kathryn Kennedy

DESIGNS FOR COMPUTER EXPERIMENTS AND UNCERTAINTY QUANTIFICATION

Gaussian Processes for Computer Experiments

Two-stage Sensitivity-based Group Screening in Computer Experiments

Nested Latin Hypercube Designs with Sliced Structures

Introduction to emulators - the what, the when, the why

A CENTRAL LIMIT THEOREM FOR NESTED OR SLICED LATIN HYPERCUBE DESIGNS

LOGNORMAL ORDINARY KRIGING METAMODEL

Research the applicability of DOE about optimizing manufacturing process parameters

Tilburg University. Two-dimensional maximin Latin hypercube designs van Dam, Edwin. Published in: Discrete Applied Mathematics

Stratified Random Sampling for Dependent Inputs

SELECTING LATIN HYPERCUBES USING CORRELATION CRITERIA

COMPARING DESIGNS FOR COMPUTER SIMULATION EXPERIMENTS. Rachel T. Johnson Bradley Jones John W. Fowler Douglas C. Montgomery

LOGNORMAL ORDINARY KRIGING METAMODEL IN SIMULATION OPTIMIZATION

CONSTRUCTION OF NESTED ORTHOGONAL LATIN HYPERCUBE DESIGNS

A Unified Framework for Uncertainty and Sensitivity Analysis of Computational Models with Many Input Parameters

Evaluating prediction uncertainty in simulation models

A NEW CLASS OF NESTED (NEARLY) ORTHOGONAL LATIN HYPERCUBE DESIGNS

Mustafa H. Tongarlak Bruce E. Ankenman Barry L. Nelson

Design and Analysis of Computer Experiments for Screening Input Variables. Dissertation. Hyejung Moon, M.S. Graduate Program in Statistics

An introduction to Bayesian statistics and model calibration and a host of related topics

A construction method for orthogonal Latin hypercube designs

18Ï È² 7( &: ÄuANOVAp.O`û5 571 Based on this ANOVA model representation, Sobol (1993) proposed global sensitivity index, S i1...i s = D i1...i s /D, w

Orthogonal and nearly orthogonal designs for computer experiments

OPTIMAL DESIGN INPUTS FOR EXPERIMENTAL CHAPTER 17. Organization of chapter in ISSO. Background. Linear models

Weighted space-filling designs

Randomized Quasi-Monte Carlo for MCMC

Application-driven Sequential Designs for Simulation Experiments Kleijnen, J.P.C.; van Beers, W.C.M.

Bayesian optimal design for Gaussian process models

OPTIMAL MAXIMIN L 1 -DISTANCE LATIN HYPERCUBE DESIGNS BASED ON GOOD LATTICE POINT DESIGNS 1

arxiv: v1 [math.st] 2 Oct 2010

Maximin Strong Orthogonal Arrays

Tilburg University. Two-Dimensional Minimax Latin Hypercube Designs van Dam, Edwin. Document version: Publisher's PDF, also known as Version of record

University, Wuhan, China c College of Physical Science and Technology, Central China Normal. University, Wuhan, China Published online: 25 Apr 2014.

Some methods for sensitivity analysis of systems / networks

Numerical studies of space filling designs: optimization of Latin Hypercube Samples and subprojection properties

Construction and analysis of Es 2 efficient supersaturated designs

Mixture Designs Based On Hadamard Matrices

Design of experiments for generalized linear models with random block e ects

Computer Model Calibration or Tuning in Practice

Use of Design Sensitivity Information in Response Surface and Kriging Metamodels

SENSITIVITY ANALYSIS BY THE USE OF A SURROGATE MODEL IN LB-LOCA: LOFT L2-5 WITH CATHARE-2 V2.5 CODE

DESIGNING COMPUTER EXPERIMENTS TO DETERMINE ROBUST CONTROL VARIABLES

Monitoring Wafer Geometric Quality using Additive Gaussian Process

Estimating percentiles of uncertain computer code outputs

Parameter Selection Techniques and Surrogate Models

Uniform Random Number Generators

Covariance function estimation in Gaussian process regression

Outline Introduction OLS Design of experiments Regression. Metamodeling. ME598/494 Lecture. Max Yi Ren

Optimizing Low-Discrepancy Sequences with an Evolutionary Algorithm

An Algorithm for Constructing Orthogonal and Nearly Orthogonal Arrays with Mixed Levels and Small Runs

Surrogate Modeling of Computer Experiments With Different Mesh Densities

Youden Square, George Box & Latin Hypercube. The Evolution of Experimental Design. The Evolution of Science

Space-Filling Designs for Computer Experiments

Gaussian Process Regression and Emulation

An adaptive kriging method for characterizing uncertainty in inverse problems

A Polynomial Chaos Approach to Robust Multiobjective Optimization

Sequential Exploration of Complex Surfaces Using Minimum Energy Designs

Journal of Statistical Planning and Inference

TRB 2005: For Presentation and Publication Paper # Final Submission: April 1, 2005 Word Count: 8,191

Bayesian Dynamic Linear Modelling for. Complex Computer Models

A Flexible Strategy for Augmenting Design Points For Computer Experiments

Understanding the uncertainty in the biospheric carbon flux for England and Wales

A GENERAL THEORY FOR ORTHOGONAL ARRAY BASED LATIN HYPERCUBE SAMPLING

E-optimal approximate block designs for treatment-control comparisons

Empirical Bayesian Analysis for Computer Experiments Involving Finite-Difference Codes

Tutorial on quasi-monte Carlo methods

Sequential Importance Sampling for Rare Event Estimation with Computer Experiments

Low Discrepancy Sequences in High Dimensions: How Well Are Their Projections Distributed?

Simplex-based screening designs for estimating metamodels

KRIGING METAMODELING IN MULTI-OBJECTIVE SIMULATION OPTIMIZATION. Mehdi Zakerifar William E. Biles Gerald W. Evans

Polynomial chaos expansions for sensitivity analysis

SEQUENTIAL DESIGN OF COMPUTER EXPERIMENTS TO MINIMIZE INTEGRATED RESPONSE FUNCTIONS

UCLA Department of Statistics Papers

A Statistical Approach to the Study of Qualitative Behavior of Solutions of Second Order Neutral Differential Equations

Understanding Uncertainty in Climate Model Components Robin Tokmakian Naval Postgraduate School

Department of Statistics, University of Ilorin, P.M.B 1515, Ilorin, Nigeria

SURROGATE PREPOSTERIOR ANALYSES FOR PREDICTING AND ENHANCING IDENTIFIABILITY IN MODEL CALIBRATION

Transcription:

6th St.Petersburg Workshop on Simulation (2009) 1091-1096 Improvement of random LHD for high dimensions arxiv:0907.1823v1 [stat.me] 10 Jul 2009 Andrey Pepelyshev 1 Abstract Designs of experiments for multivariate case are reviewed. Fast algorithm of construction of good Latin hypercube designs is developed. 1 Introduction The mathematical theory for designing experiments was started to develop by Sir Ronald A. Fisher who pioneered the design principles in his studies of analysis of variance originally in agriculture. The theory of experimental designs have received considerable development further in the middle of the twentieth century in works by G.E.P Box, J. Kiefer and many others. Computer experiments have become available with the appearance of computer engineering. Mathematical computer models are a replacement for natural (physical, chemical, biological) experiments which are too time consuming or too costly. Moreover, mathematical models may describe phenomena which can not be reproduced, for example, weather modeling. Experimental designs for deterministic computer models was studied first by McKay et al. (1979). The theoretical principles of analysis of deterministic computer models were determined in Sacks et al. (1989) and the analysis of simulation models (deterministic computer codes with stochastic output) in Kleijnen (1987). During the last decade the Bayesian approach to computer experiments was extensively developed, see Kennedy, O Hagan (2001), Conti, O Hagan (2008) and references within. The technique used in the Bayesian approach is close to Kriging in a manner that a special construction is used to interpolate the values of the output of the deterministic code rather than the values of a random field and uncertainty intervals for untried values of inputs are calculated; see Koehler, Owen (1996), Kennedy, O Hagan(2001). One run by a computer model may require considerable time. Thus the main problem is to reduce an uncertainty of inferences on a computer model by making only a few runs. Consequently, we are faced with the problem of optimal choice of experimental conditions. The present paper is organized as follows. In Section 2 we review experimental designs for a multivariate case in order to choose the most appropriate criteria of optimality. In Section 3 we propose a fast algorithm for constructing good optimal designs for computer experiments. 1 University of Sheffield, E-mail: a.pepelyshev@sheffield.ac.uk

2 Comparison of natural and computer experiments Basic features of natural and computer experiments are presented in the following two-column style. Natural experiments The response is observed with errors which may be correlated. The response is described either by known regression function with unknown parameters or by multivariate linear or quadratic model which is valid at a design subspace. A primary objective is to estimate parameters or find conditions which maximize response. Other aims are identifying variables which have a significant effect, etc. Optimal design is typically to minimize the (generalized) variance of estimated characteristics. Optimal designs are, for example, factorial, incomplete block, orthogonal, central composite, screening and D-optimal designs. Computer experiments The output is deterministic. The running of a computer code at the same inputs gives the same output. A computer code is considered to be like as a black box. The main assumption is factor sparsity, that is the output depends in nonlinear way only on a few number of inputs 1. A primary objective is to fit a cheaper unbiased low uncertain predictor. Other aims are calibration of model parameters to physical data, optimization of output, etc. Optimality criteria is the minimization of mean square error over design space or the maximization of entropy. Optimal design is space-filling design. Latin hypercube design is recommended in many papers. Note that optimal designs for natural experiments mostly have two or three points in projection on each coordinate, e.g. the 2 k p block and orthogonal designs have two points in projection, central composite design has three points in projection. This fact is a consequence of the multivariate linear or quadratic model which is assumed to be valid. Such designs are not suitable for computer experiments since weassumethat theoutput maybe highlynonlinearin severalvariables. Due tothe objectives of computer experiments, optimal design should minimize mean square error between the prediction of response at untried inputs and the true output. This criterion leads to the optimal design which should fill an entire design space uniformly at the initial stage of computer experiments. The examples of spacefilling design are Latin hypercube design, sphere packing design, distance based design, uniform design, design based on random or pseudo-random sequences, see Santner et al. (2003), Fang et al. (2006). The optimal design should be a dense set in projection to each coordinate and should be a dense set in entire design space. Each of the above space-filling designs has attractive properties and satisfies some useful criterion. As far as is known, the best design should optimize a compound criterion. 1 Without sparsity assumption we need a lot of runs to construct an unbiased low uncertain predictor. 1092

3 Latin Hypercube Designs At first, we need to recall an algorithm for construction of LH designs, which was introduced in McKay et al. (1979). The algorithm generates n points in dimension d in the following manner. 1) Generate n uniform equidistant points x (s) 1,...,x(s) n in the range of each input, s = 1,...,d. 2) Generate a matrix (p i,j ) of size d n such that each row is a random permutation of numbers 1,...,n and these permutations are independent. 3) Each column of the matrix (p i,j ) corresponds to a design point, that is (x (1) p 1,j,...,x p (d) d,j ) T is jth point of LHD. Without loss of generality, we assume that the range of each input is [0,1] and x (s) j R = {0,1/(n 1),2/(n 1),...,1}. By construction, LHD has the best filling of range in projection on each coordinate. Unfortunately, LHD may have a poor filling of entire hypercube. Several criteria of optimality are introduced in order to choose a good LHD in a class of all LHD. Maximin criterion is a maximization of minimal distance ( d ) 1/p Ψ p (L) = min x i x j p = min x s,i x s,j p i,j=1,...,n i,j=1,...,n usually used with p = 2 where x i = (x 1,i,...,x d,i ) T is ith point of design L. An LHD which maximize Ψ p (L) is called by maximin LHD. Audze-Eglais criterion introduced in Audze, Eglais (1977) is a sum of forces between charged particles and is a minimization of n n 1 Ψ AE (L) = x i x j 2. 2 i=1 j=i+1 Others criteria of uniformity are star L 2 -discrepancy, centered L 2 -discrepancy, wrap-around L 2 -discrepancy which are motivated by quasi-monte-carlo methods and the Koksma-Hlawka inequality, see Hickernell (1998), Fang et al. (2000). Algorithms of optimization are studied in a number of papers, the local search algorithm in Grosso et al. (2008), the enhanced stochastic evolutionary algorithm in Jin et al. (2005), the simulated annealing algorithm in Morris, Mitchell, (1995) the columnwise-pairwise procedure in Ye et al. (2000), the genetic algorithm in Liefvendahl, Stocki, (2006) and Bates et al. (2003), the collapsing method in Fang, Qin (2003). Cited authors concentrate on the case of low dimensions. Basing on an analysis of papers on computer experiments, we can say that the size of LHD is approximately equal to the input dimension multiplied by 10, that is n 10d. Further we propose a fast algorithm of constructing good LHD for the case of high dimensions which is not studied, to the best of our knowledge. First, we need to study features of random LHD generated by the above algorithm. Let L = {x 1,...,x n } be a LHD. Let r i be a minimal distance between x i and other points of L; that is r i = min j i x i x j 2 (further we consider euclidian distances). These distances characterize a design L. Let Q α denote an α-percentile for sample r 1,...,r n. Averaged values of low and upper quartiles, Q 0.25 and Q 0.75, are presented in table 1. We see that the inter-point distances are varied and the quarter of distances are quite small. Also note that distances between points is increased as the dimension is increased since n = 10d. 1093 s=1

Table 1: Low and upper quartiles of distances between points of n-point random LHD for different dimensions, n = 10d. d 2 3 4 5 6 7 Q 0.25 0.108 0.167 0.232 0.305 0.368 0.434 Q 0.75 0.175 0.270 0.347 0.431 0.502 0.573 d 8 9 10 14 20 Q 0.25 0.494 0.554 0.610 0.821 1.096 Q 0.75 0.636 0.699 0.757 0.972 1.249 For construction of n-point LHD with a given inter-point distance r at dimension d, we propose the following heuristic algorithm. Algorithm. 1. Let L k is a k-point design at kth step. Let L 1 = {x 1 } where x 1 is a random point in the middle of R d such that its coordinatesare unequal to eachother. 2. Compute a boolean matrix B = {B i,j } of size d n upon L k such that b i,j = 1 ( used ) if there exists a point in L k with ith coordinate which equals (j 1)/(n 1), and 0 ( unused ) otherwise. 3. Generate a random point z = (q 1,...,q d ) T /(n 1) R d such that each coordinate is unused; that is b i,qi = 0, i = 1,...,d except one random coordinate which should be taken nearby 0.5. 4. Create a set C of candidate points in R d with unused coordinates which are approximate points which are the closest and the furthest point from z lied on spheres S r (x j ) with centers x j L k and radius r, j = 1,...,k. 5. Find apointx C suchthat x liesoutsideofalls r (x j ), thatis x x j > r, j = 1,...,k. If there exist several such points, choose a point which minimizes #{s : d i=1 B i,s = m }, where m d = min j=1,...,n i=1 B i,j and B = B(L k {x }). 6. Add x to design, that is L k+1 = L k {x }. Stop at nth step. 7. If we could not find x at step 5, goto step 3. If we could find x after several trials, we should decrease r since it is impossible to find a point which is far from L k at given distance r. Let a design obtained by Algorithm be called SLHD. Numerical results show that Algorithm is fast and work well for any dimension. It requires 40 seconds to compute 100-point SLHD at dimension d = 10 and 60 seconds for 140-point SLHD at d = 14 and 200 seconds for 200-point SLHD at d = 20 on PC 2.1GHz. The choice of r should be smaller than r where r is the minimal distance between points of exact maximin LHD. Since r is unknown, we recommend the running Algorithm with different r, say, start with Q 0.75 for random LHD and increase it by small increment. The decreasing of r at step 7 does not mean that SLHD does not exist for given r and is a consequence a poor placement of points at previous iterations. Features of SLHD are presented in Table 2. Values of r are taken from website http://www.spacefillingdesigns.nl/. We see that 90% of inter-point distances 1094

Table 2: Low and upper quartiles and Q 0.1 of distances between points of n-point SLHD for different dimensions, n = 10d. The value r of maximin LHD is given. d 2 3 4 5 6 7 Q 0.1 0.217 0.310 0.351 0.393 0.512 0.584 Q 0.25 0.217 0.312 0.363 0.476 0.535 0.617 Q 0.75 0.217 0.323 0.409 0.486 0.552 0.626 r 0.223 0.360 0.476 0.589 0.687 0.779 d 8 9 10 14 20 Q 0.1 0.679 0.763 0.823 1.035 1.268 Q 0.25 0.694 0.765 0.824 1.037 1.271 Q 0.75 0.706 0.774 0.836 1.045 1.281 r 0.867 0.950 1.021 - - of SLHD are higher than the most of distances at random LHD. Thus SLHD has a better filling of entire design space. Figure 1 display points of SLHD for d = 2 and d = 20. We can see a quite uniform filling of square. Further improvement of experimental design can be done by applying the local search or the simulated annealing algorithm. 1 20 13 11 16 0.8 9 6 5 4 0.6 18 1 2 0.4 10 7 3 14 0.2 12 17 8 15 0 19 0 0.2 0.4 0.6 0.8 1 1 0.8 0.6 0.4 0.2 0 0 0.2 0.4 0.6 0.8 1 Figure 1: The points of SLHD for d = 2 with the order of including (left) and two coordinates of points of SLHD for d = 20 (right). 4 Conclusion The algorithm of construction of LHD with given inter-point distance is constructed and studied. By the algorithm we can quickly compute LHD such that the most of inter-point distances are larger than distances at random LHD. The proposed algorithm is more efficient than simply generate many random LHDs and choose the best one. 1095

References [1] Bates S.J., Sienz J., Langley D.S. Formulation of the AudzeEglais Uniform Latin Hypercube design of experiments. Advances in Engineering Software 34, (2003) 493 506. [2] Conti S., O Hagan A. Bayesian emulation of complex multi-output and dynamic computer models. J. Statist. Plan. Infer. (2008) To appear. [3] Jin R., Chen, W., Sudjianto A. An efficient algorithm for constructing optimal design of computer experiments. J. Statist. Plann. Inf. 134 (2005), 268 287. [4] Grosso A., Jamali A., Locatelli M. Finding maximin latin hypercube designs by Iterated Local Search heuristics. accepted to European J. Operational Research. (2008). [5] Fang K.-T., Qin H. A note on construction of nearly uniform designs with large number of runs. Statist. Probab. Lett. 61 (2003), no. 2, 215 224. [6] Fang K.-T., Lin D.K.J., Winker P., Zhang Y. Uniform design: theory and application. Technometrics 42 (2000), no. 3, 237 248. [7] Fang K.-T. Li R., Sudjianto A. Design and modeling for computer experiments. Chapman & Hall/CRC, (2006). [8] Fang K.-T., Ma C.-X., Winker P. Centered L 2 -discrepancy of random sampling and Latin hypercube design, and construction of uniform designs. Math. Comp. 71 (2002), no. 237, 275 296. [9] Hickernell F.J. A generalized discrepancy and quadrature error bound. Math. Comp. 67 (1998), no. 221, 299 322. [10] Kennedy M.C., O Hagan A. Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B 63 (2001), no. 3, 425 464. [11] Kleijnen J.P.C. Statistical tools for simulation practitioners. (1986) [12] Koehler J.R., Owen A.B. Computer experiments. In Handbook of Statistics, (1996), 261 308. [13] Liefvendahl M., Stocki R. A study on algorithms for optimization of Latin hypercubes. J. Statist. Plann. Inference 136 (2006), 3231 3247. [14] McKay M. D., Beckman R. J., Conover W. J. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21 (1979), no. 2, 239 245. [15] Morris M.D., Mitchell T.J. Exploratory designs for computer experiments J. Stat. Plan. Inf., 43 (1995), 381 402. [16] Sacks J., Welch W.J., Mitchell T.J., Wynn H.P. Design and analysis of computer experiments. With comments and a rejoinder by the authors. Statist. Sci. 4 (1989), no. 4, 409 435. [17] Santner T.J., Williams B.J., Notz W. The Design and Analysis of Computer Experiments. (2003). [18] Ye K.Q., Li W., Sudjiantoc A. Algorithmic construction of optimal symmetric Latin hypercube designs. J. Stat. Plan. Inf. 90, (2000), 145 159. 1096