Adaptive evolutionary Monte Carlo algorithm for optimization with applications to sensor placement problems


Stat Comput (2008) 18: 375–390
DOI 10.1007/s11222-008-9079-6

Adaptive evolutionary Monte Carlo algorithm for optimization with applications to sensor placement problems

Yuan Ren · Yu Ding · Faming Liang

Received: 28 April 2008 / Accepted: 13 June 2008 / Published online: 11 July 2008
© Springer Science+Business Media, LLC 2008

Abstract In this paper, we present an adaptive evolutionary Monte Carlo algorithm (AEMC), which combines a tree-based predictive model with an evolutionary Monte Carlo sampling procedure for the purpose of global optimization. Our development is motivated by sensor placement applications in engineering, which require optimizing a certain complicated black-box objective function. The proposed method is able to enhance the optimization efficiency and effectiveness as compared to a few alternative strategies. AEMC falls into the category of adaptive Markov chain Monte Carlo (MCMC) algorithms and is the first adaptive MCMC algorithm that simulates multiple Markov chains in parallel. A theorem about the ergodicity property of the AEMC algorithm is stated and proven. We demonstrate the advantages of the proposed method by applying it to a sensor placement problem in a manufacturing process, as well as to a standard Griewank test function.

Keywords Global optimization · Adaptive MCMC · Evolutionary Monte Carlo · Data mining

Y. Ren · Y. Ding
Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX 77843-3131, USA

Y. Ding
e-mail: yuding@iemail.tamu.edu

F. Liang (✉)
Department of Statistics, Texas A&M University, College Station, TX 77843-3131, USA
e-mail: fliang@stat.tamu.edu

1 Introduction

Optimization problems arise in many engineering applications. Engineers often need to optimize a black-box objective function, i.e., a function that can only be evaluated by running a computer program. These problems are generally difficult to solve because of the complexity of the objective function and the large number of decision variables involved.
Two categories of statistical methodologies, one based on random sampling and another based on predictive modeling, have made great contributions to solving optimization problems of this nature. In this article, we propose an adaptive evolutionary Monte Carlo (AEMC) method, which enhances the efficiency and effectiveness of engineering optimization.

A real example that motivates this research is the sensor placement problem. Simply put, in a sensor placement problem, one needs to determine the number and locations of multiple sensors so that certain design criteria can be optimized within a given budget. Sensor placement issues have been encountered in various applications, such as manufacturing quality control (Mandroli et al. 2006), structural health monitoring (Bukkapatnam et al. 2005), transportation management (Čivilis et al. 2005), and security surveillance (Brooks et al. 2003). Depending on the application, the design criteria to be optimized include, among others, sensitivity, detection probability, and coverage. A design criterion is a function of the number and locations of sensors, and this function is usually complicated and nonlinear. Evaluating the design criterion requires running a computer program, qualifying it as a black-box objective function. Mathematically, a sensor placement problem can be formulated as a constrained optimization problem

$$\min_{w \in W} H(w) \quad \text{subject to} \quad G(w) \le 0, \qquad (1)$$

where $W \subseteq \mathbb{R}^d$, $d$ is the number of sensors, $w$ is a vector of decision variables (i.e., sensor locations), $H : W \to \mathbb{R}$ is a user-specified design criterion to be optimized, and $G(\cdot) \le 0$ represents the physical constraints associated with engineering systems. Taking sensor placement in an assembly process as an example, $G(\cdot) \le 0$ means that sensors can only be installed on the surface of subassemblies, and $H(\cdot)$ is an E-optimality design criterion (Mandroli et al. 2006). In Sect. 4, we will visit this sensor placement problem in more detail.

When the physical constraints are complicated and difficult to handle in an optimization routine, engineers can discretize the solution space $W$ and create a finite (yet possibly huge) number of solution candidates that satisfy the constraints (see Kim and Ding 2005, Sect. 1 for an example). For sensor placement problems, this means identifying all the viable sensor locations a priori; this can be done relatively easily because individual sensors are located in a low (less than or equal to three) dimensional space. One should use a high enough resolution for discretization so that good sensor locations are not lost. Suppose we do discretize. Then the formulation (1) becomes an unconstrained optimization problem,

$$\min_{x \in \mathcal{X}} H(x), \qquad (2)$$

where $\mathcal{X}$ is the sample space that contains the finite number of candidate sensor locations. Clearly, $\mathcal{X} \subset \mathbb{Z}^d$, which is the set of $d$-dimensional vectors with integer elements. Note that $H(\cdot)$ in (2) is still calculated according to the same design criterion as in (1), but is now defined on $\mathcal{X}$. Recall that $H(\cdot)$ is of black-box type with potentially plenty of local optima, due to the complex nature of engineering systems. Solving this discrete optimization problem might seem mathematically trivial, because one just needs to enumerate all potential solutions exhaustively and select the best one. In most real-world applications, however, there could be an overwhelmingly large number of potential solutions to be evaluated, especially when a high-resolution discretization was performed.
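To see why exhaustive enumeration is hopeless at a high discretization resolution, a quick back-of-the-envelope count helps (a sketch with made-up numbers, not figures from the paper):

```python
from math import comb

# Hypothetical discretization: each of 3 spatial axes is cut into 20
# candidate positions, giving 20**3 = 8000 viable sensor sites; a
# solution then picks d of those sites (unordered placements).
candidate_sites = 20 ** 3

for d in (3, 5, 10):
    n_solutions = comb(candidate_sites, d)  # number of candidate placements
    print(f"d = {d:2d}: {n_solutions:.3e} candidate placements")
```

Even for a modest number of sensors, the candidate count quickly exceeds what any black-box objective function could be evaluated on exhaustively.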
Two categories of statistical methodologies exist for solving this type of optimization problem. The first category is the sampling-based methods: such a method starts with a set of random samples, then generates new samples according to some pre-specified mechanism based on the current samples, and probabilistically accepts/rejects the new samples for subsequent iterations. Many well-known optimization methods, such as simulated annealing (Bertsimas and Tsitsiklis 1993), genetic algorithms (Holland 1992), and Markov chain Monte Carlo (MCMC) methods (Wong and Liang 1997), fall into this category; the differences among them come from the specific mechanism an algorithm uses to generate and accept new samples. These methods can handle complicated response surfaces well and have been widely applied to engineering optimization. Their shortcoming is that they generally require a large number of function evaluations before reaching a good solution.

The second category is the metamodel-based methods. Such a method also starts with a set of solution samples $\{x\}$. A metamodel is a predictive model fitted using the historical solution pairs $\{x, H(x)\}$. With this predictive model, new solutions are generated based on the model's prediction of where one is more likely to find good solutions. Subsequently, the predictive model is updated as more solutions are collected. The model is labeled a metamodel because $H(x)$ is the computational output of a computer model. The metamodel-based method originates from the research on computer experiments (Chen et al. 2006; Fang et al. 2006; Sacks et al. 1989; Simpson et al. 1997). This strategy is also called the data-mining guided method, especially when the predictive model used therein is a classification tree model (Liu and Igusa 2007; Kim and Ding 2005; Schwabacher et al. 2001), since the tree model is a typical data-mining tool. For the metamodel-based or data-mining guided methods, the major shortcoming is their ineffectiveness in handling complicated response surfaces; as a result, they only look for local optima.
This paper proposes an optimization algorithm combining the sampling-based and metamodel-based methods. Specifically, the proposed algorithm combines evolutionary Monte Carlo (EMC) (Liang and Wong 2000, 2001) and a tree-based predictive model. The advantage of such a hybrid is that it incorporates strengths from both EMC sampling and predictive metamodeling: the tree-based predictive model adaptively learns informative rules from past solutions, so that the new solutions generated from these rules are expected to have better objective function values than the ones generated from blind sampling operations, while the EMC mechanism allows a search to go over the whole sample space and guides the solutions toward the global optimum. We thus label the proposed algorithm adaptive evolutionary Monte Carlo (AEMC). We will further elaborate the intuitions behind AEMC at the beginning of Sect. 3, after we review the two existing methodologies in more detail in Sect. 2.

The remainder of this paper is organized as follows. Section 2 provides details of the two relevant methodologies. Section 3 describes the general idea and implementation details of the AEMC algorithm. We also prove that the AEMC algorithm preserves the ergodicity property of Markov chain samples. In Sect. 4, we employ AEMC to solve a sensor placement problem. We provide additional numerical examples to show AEMC's performance in optimization as well as its potential use for sampling. We conclude this paper in Sect. 5.

2 Related work

2.1 Sampling-based methods

Among the sampling-based methods, simulated annealing and genetic algorithms have been used to solve optimization problems for quite some time. They use different techniques to generate new random samples. Simulated annealing works by simulating a sequence of distributions determined by a temperature ladder. It draws samples according to these distributions and probabilistically accepts or rejects them. Geman and Geman (1984) have shown that if the temperature decreases sufficiently slowly (at a logarithmic rate), simulated annealing can reach the global optimum of $H(x)$ with probability 1. However, no one can afford such a slow cooling schedule in practice. People generally use a linearly or geometrically decreasing cooling schedule, but when doing so, the global optimum is no longer guaranteed.

A genetic algorithm uses evolutionary operators such as crossover and mutation to construct new samples. Mimicking natural selection, crossover operators are applied to two parental samples to produce an offspring that inherits characteristics of both parents, while mutation operators are occasionally used to bring variation to the new samples. A genetic algorithm selects new samples according to their fitness (for example, their objective function values can be used as a measure of fitness). Genetic algorithms are known to converge to a good solution rather slowly and lack rigorous theories to support their convergence to the global optimum.

The MCMC methods have also been used to solve optimization problems (Wong and Liang 1997; Liang 2005; Liang et al. 2007; Neal 1996). Even though a typical application of MCMC is to draw samples from complicated probability distributions, the sampling operations can be readily utilized for optimization. Consider a Boltzmann distribution $p(x) \propto \exp(-H(x)/\tau)$ for some $\tau > 0$. MCMC methods can be used to generate samples from $p(x)$. As a result, the MCMC method has a higher chance to obtain samples with lower $H(x)$ values.
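This optimization-by-sampling idea can be sketched with a plain single-chain Metropolis sampler (our illustration, not code from the paper; the ±1 integer proposal and the quadratic test objective are hypothetical choices):

```python
import math
import random

def metropolis_minimize(H, x0, tau=1.0, steps=5000, seed=0):
    """Metropolis sampler for p(x) ∝ exp(-H(x)/tau) over integer vectors.

    Proposal: perturb one randomly chosen coordinate by ±1 (symmetric).
    Because lower-H states carry more probability mass, tracking the
    best sample seen yields an estimate of the minimizer.
    """
    rng = random.Random(seed)
    x = list(x0)
    hx = H(x)
    best_x, best_h = list(x), hx
    for _ in range(steps):
        y = list(x)
        i = rng.randrange(len(y))
        y[i] += rng.choice((-1, 1))
        hy = H(y)
        # Metropolis rule: always accept downhill, sometimes accept uphill
        if hy <= hx or rng.random() < math.exp(-(hy - hx) / tau):
            x, hx = y, hy
            if hx < best_h:
                best_x, best_h = list(x), hx
    return best_x, best_h

# hypothetical separable test objective with minimum at (3, -2)
H = lambda x: (x[0] - 3) ** 2 + (x[1] + 2) ** 2
print(metropolis_minimize(H, [0, 0]))
```

The uphill-acceptance term is what lets the chain escape local optima, at the cost of many function evaluations, which is exactly the trade-off described above.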
If we keep generating samples according to $p(x)$, we will eventually find samples close enough to the global minimum of $H(x)$. The MCMC methods perform random walks in the whole sample space and thus may potentially escape from local optima, given long enough run time. Liang and Wong (2000, 2001) proposed a method called evolutionary Monte Carlo (EMC), which incorporates many attractive features of simulated annealing and genetic algorithms into an MCMC framework. It has been shown that EMC is effective both for sampling from high-dimensional distributions and for optimization problems (Liang and Wong 2000, 2001). Because EMC is an MCMC procedure, it guarantees the ergodicity of the Markov chain samples in the long run. Nonetheless, it appears that there is still a need and room to further improve the convergence rate of an EMC procedure.

Recently, adaptive proposals have been used to improve the convergence rate of traditional MCMC algorithms. For example, Gilks et al. (1998) and Brockwell and Kadane (2005) proposed to use regenerative Markov chains and update the proposal parameters at regeneration times; Haario et al. (2001) proposed an adaptive Metropolis algorithm which attempts to update the covariance matrix of the proposal distributions by making use of all past samples. Important theoretical advances on the ergodicity of adaptive MCMC methods have been made by Haario et al. (2001), Andrieu and Robert (2002), Atchadé and Rosenthal (2005), and Roberts and Rosenthal (2007).

2.2 Metamodel-based methods

In essence, metamodel-based methods are not much different from other sequential sampling procedures guided by a predictive model, e.g., the response surface methodology used for physical experiments (Box and Wilson 1951). The metamodel-based method constitutes the design and analysis of computer experiments (Chen et al. 2006; Fang et al. 2006; Sacks et al. 1989; Simpson et al.
1997), where the so-called metamodel is an inexpensive surrogate or substitute for the computer model, which is oftentimes computationally expensive to run (e.g., the computer model could be a finite element model of a civil structure). Various statistical predictive methods have been used as metamodels; according to the survey by Chen et al. (2006), these include neural networks, tree-based methods, splines, and spatial correlation models.

During the past few years, a number of research developments have emerged, labeled as data-mining guided engineering design (Guikema et al. 2004; Huyet 2006; Liu and Igusa 2007; Kim and Ding 2005; Michalski 2000; Schwabacher et al. 2001). The data-mining guided methods are basically one form of metamodel-based methods, because they also use a statistical predictive model to guide the selection of design solutions. The predictive models used in the data-mining guided designs include regression, classification tree, and clustering methods. When looking for an optimal solution, the predictive model is used as follows. After fitting a metamodel (or simply a model, in the case of physical experiments), one can use it to predict where good solutions are more likely to be found and select subsequent samples accordingly. This sampling-modeling-prediction procedure is considered a data-mining operation. Liu and Igusa (2007) and Kim and Ding (2005) demonstrated that the data-mining operation can greatly speed up computation under the right circumstances. Compared with the slowly converging sampling-based methods, the metamodel-based methods can be especially useful when one has a limited amount of data samples; this happens when physical experiments or computer simulations are expensive to conduct. But the metamodel-based

methods are greedy search methods and can easily be entrapped in local optima.

3 Adaptive evolutionary Monte Carlo

3.1 General idea of AEMC

The strengths as well as the limitations of the sampling-based and the metamodel-based search methods motivate us to combine the two schemes and develop the AEMC algorithm. The intuition behind how AEMC works is explained as follows.

A critical shortcoming of the metamodel-based methods is that their effectiveness highly depends on how representative the sampled solutions are of the optimal regions of the sample space. Without representative data, the resulting metamodel could mislead the search to non-optimal regions. Consequently, the subsequent sampling from those regions will not help the search get out of the trap. In particular, when the sample space is large and good solutions lie in only a small portion of the space, data obtained by uniform sampling from the sample space will not be representative enough. Taking the sensor placement problem shown later in this paper as an example, we found that only 5% of the solutions have relatively good objective function values. Under this circumstance, a stand-alone metamodeling mechanism could hardly be effective (as shown in the numerical results in Sect. 4), thereby prompting the need to improve the sample quality for the purpose of establishing a better metamodel.

It turns out that sampling-based algorithms (we choose EMC in this paper), though slow as stand-alone optimization tools, are able to improve the quality of the sampled solutions. This is because, when conducting random searches over a sample space, EMC will gradually converge in distribution to the Boltzmann distribution in (4), i.e., the smaller the value of $H(x)$ is, the higher the probability of sampling $x$ is (recall that we want to minimize $H(x)$). In other words, EMC will iteratively and stochastically direct current samples toward the optimal regions, so that the visited solutions are more representative of the optimal regions of the sample space.
With the representative samples produced by EMC, a metamodeling operation can generate more accurate predictive models that characterize the promising subregions of the sample space. The primary tool for improving a sampling-based search is to speed up its convergence rate. As argued in Sect. 2, making an MCMC method adaptive is an effective way of achieving this objective. The metamodel part of AEMC learns the function surface of $H(x)$ and allows us to construct more effective proposal distributions for subsequent sampling operations. As argued in Gilks et al. (1995), the rate of convergence of a Markov chain to the Boltzmann distribution in (4) depends crucially on the relationship between the proposal function and the target function $H(x)$. To the best of our knowledge, AEMC is also the first adaptive MCMC method that simulates multiple Markov chains in parallel, while the existing adaptive MCMC methods are all based on the simulation of a single Markov chain. So AEMC can utilize information from multiple chains to improve the convergence rate.

The above discussion explains the benefit of combining the metamodel-based and sampling-based methods and executing them alternately in the fashion shown in Fig. 1. In the sequel, we will present the details of the proposed AEMC algorithm. For the metamodeling (or data-mining) operations, we use classification and regression trees (CART), proposed by Breiman et al. (1984), to fit predictive models. We choose CART primarily because of its computational efficiency. Our goal of solving an optimization problem requires the data-mining operations to be fast and computationally scalable in order to accommodate large-sized data sets. Since the data-mining operations are used repeatedly, a complicated and computationally expensive method would unavoidably slow down the optimization process.

Fig. 1 General framework of combining sampling-based and metamodel-based methods
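The alternation of Fig. 1 can be outlined in a few lines (our paraphrase of the framework; the two mode functions are placeholders, not the paper's code):

```python
def alternate(sampling_step, metamodel_step, n_cycles, M):
    """Alternate M sampling-based iterations with one metamodeling pass.

    sampling_step(k)  -- one EMC-style sampling iteration
    metamodel_step(k) -- fit a predictive model on the samples seen so
                         far and propose new solutions from its rules
    This mirrors Fig. 1: sampling supplies representative data to the
    metamodel, and the metamodel redirects subsequent sampling.
    """
    k = 0
    for _ in range(n_cycles):
        for _ in range(M):
            sampling_step(k)
            k += 1
        metamodel_step(k)
        k += 1
    return k  # total number of iterations performed
```

The ratio of sampling to metamodeling work is governed by M, which reappears later as the switching condition of AEMC.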

3.2 Evolutionary Monte Carlo

For the convenience of reading this paper, we provide a brief summary of EMC in this section and a description of the operators of EMC in Appendix A. Please refer to Liang and Wong (2000, 2001) for more details.

EMC integrates features of simulated annealing and genetic algorithms into an MCMC framework. Similar to simulated annealing, EMC uses a temperature ladder and simultaneously simulates a population of Markov chains, each of which is associated with a different temperature. The chains with high temperatures can easily escape from local optima, while the chains with low temperatures can search around some local regions and find better solutions faster. The population is updated by crossover and mutation operators, just like a genetic algorithm, and therefore adopts some level of learning capability, i.e., samples with better fitness will have a greater probability of being selected and passing their good genetic material to the offspring.

A population, as mentioned above, is actually a set of $n$ solution samples. The state space associated with a population is the product of $n$ sample spaces, namely $\mathcal{X}^n = \mathcal{X} \times \cdots \times \mathcal{X}$. Denote a population $x \in \mathcal{X}^n$ such that $x = \{x_1, \ldots, x_n\}$, where $x_i = \{x_{i1}, \ldots, x_{id}\} \in \mathcal{X}$ is the $i$-th $d$-dimensional solution sample. EMC attaches a different temperature $t_i$ to each sample $x_i$, and the temperatures form a ladder with the ordering $t_1 \ge \cdots \ge t_n$. We denote $t = \{t_1, \ldots, t_n\}$. Then the Boltzmann density can be defined for a sample $x_i$ as

$$f_i(x_i) = \frac{1}{Z(t_i)} \exp\{-H(x_i)/t_i\}, \qquad (3)$$

where $Z(t_i)$ is a normalizing constant, $Z(t_i) = \sum_{\{x_i\}} \exp\{-H(x_i)/t_i\}$.

Assuming that the samples in a population are mutually independent, we then have the Boltzmann distribution of the population as

$$f(x) = \prod_{i=1}^{n} f_i(x_i) = \frac{1}{Z(t)} \exp\left\{-\sum_{i=1}^{n} H(x_i)/t_i\right\}, \qquad (4)$$

where $Z(t) = \prod_{i=1}^{n} Z(t_i)$.

Given an initial population $x^{(0)} = \{x_1^{(0)}, \ldots, x_n^{(0)}\}$ and the temperature ladder $t = \{t_1, \ldots, t_n\}$, $n$ Markov chains are simulated simultaneously. Denote the iteration index by $k = 1, 2, \ldots$; the $k$-th iteration of EMC consists of two steps:

1.
With probability $p_m$ (the mutation rate), apply a mutation operator to each sample in the population $x^{(k)}$ independently. With probability $1 - p_m$, apply a crossover operator to the population $x^{(k)}$. Accept the new population according to the Metropolis-Hastings rule. Details are given in Appendices A.1 and A.2.

2. Try to exchange $n$ pairs of samples $(x_i^{(k)}, x_j^{(k)})$, with $i$ uniformly chosen from $\{1, \ldots, n\}$ and $j = i \pm 1$, with probability $w(x_j^{(k)} \mid x_i^{(k)})$ as described in Appendix A.3.

EMC is a standard MCMC algorithm and thus maintains the ergodicity property of its Markov chains. Because it incorporates features from simulated annealing and genetic algorithms, EMC constructs proposal distributions more effectively and converges faster than traditional MCMC algorithms.

3.3 The AEMC algorithm

In AEMC, we first run a number of iterations of EMC and then use CART to learn a proposal distribution (for generating new samples) from the samples produced by EMC. Denote by $\mathcal{D}^{(k)}$ the set of samples we have retained after iteration $k$. From $\mathcal{D}^{(k)}$, we define the high performance samples to be those with relatively small $H(x)$ values; the high performance samples are the representatives of the promising search regions. We denote by $H^{(k)}_{(h)}$ the $h$-percentile of the $H(x)$ values in $\mathcal{D}^{(k)}$. Then the set of high performance samples at iteration $k$, $\mathcal{H}^{(k)}$, is defined as

$$\mathcal{H}^{(k)} = \{x : x \in \mathcal{D}^{(k)} \text{ and } H(x) \le H^{(k)}_{(h)}\}.$$

As a result, the samples in $\mathcal{D}^{(k)}$ are grouped into two classes: the high performance samples in $\mathcal{H}^{(k)}$ and the others. Treating these samples as a training dataset, we then fit a CART model for a two-class classification problem. Using the prediction from the resulting CART model, we can partition the sample space into rectangular regions, some of which have small $H(x)$ values and are therefore deemed the promising regions, while the remaining regions are non-promising. The promising regions produced by CART are represented as $a_j^{(k)} \le x_{ij} \le b_j^{(k)}$, $j = 1, \ldots, d$, $i = 1, \ldots, n$. Since $\mathcal{X}$ is discrete and finite, there are a lower bound $l_j$ and an upper bound $u_j$ in the $j$-th dimension of the sample space.
Clearly, we have $l_j \le a_j^{(k)} \le b_j^{(k)} \le u_j$. CART may produce multiple promising regions. We denote by $m^{(k)}$ the number of regions. Then the collection of promising regions is specified as follows:

$$a_{sj}^{(k)} \le x_{ij} \le b_{sj}^{(k)}, \quad j = 1, \ldots, d, \; i = 1, \ldots, n, \; s = 1, \ldots, m^{(k)}. \qquad (5)$$

As the algorithm goes on, we continuously update $\mathcal{D}^{(k)}$, and hence $a_{sj}^{(k)}$ and $b_{sj}^{(k)}$. After we have identified the promising regions, the proposal density is constructed based on the following thoughts:

get a sample from the promising regions with probability $R$, and from elsewhere with probability $1 - R$, respectively. We recommend using a relatively large $R$ value, say $R = .9$. Since there may be multiple promising regions identified by CART, we denote the proposal density associated with each region by $q_{ks}(x)$, $s = 1, \ldots, m^{(k)}$. In this paper, we use a Metropolis-within-Gibbs procedure (Müller 1991) to generate new samples as follows.

For $i = 1, \ldots, n$, denote the population after the $k$-th iteration by $x^{(k+1,i-1)} = (x_1^{(k+1)}, \ldots, x_{i-1}^{(k+1)}, x_i^{(k)}, \ldots, x_n^{(k)})$, of which the first $i - 1$ samples have been updated, so the Metropolis-within-Gibbs procedure is about to generate the $i$-th new sample. Note that $x^{(k+1,0)} = (x_1^{(k)}, \ldots, x_n^{(k)})$.

1. Set $S$ to be randomly chosen from $\{1, \ldots, m^{(k)}\}$, and generate a sample $x_i'$ from the proposal density $q_{kS}(\cdot)$:

$$q_{kS}(x_i') = \prod_{j=1}^{d} \left( r \, \frac{I(a_{Sj}^{(k)} \le x_{ij}' \le b_{Sj}^{(k)})}{b_{Sj}^{(k)} - a_{Sj}^{(k)}} + (1 - r) \, \frac{I(x_{ij}' < a_{Sj}^{(k)} \ \text{or} \ x_{ij}' > b_{Sj}^{(k)})}{(u_j - l_j) - (b_{Sj}^{(k)} - a_{Sj}^{(k)})} \right), \qquad (6)$$

where $I(\cdot)$ is the indicator function. Here $r$ is the probability of sampling uniformly within the range specified by the CART rules in each dimension. Since the dimensions are sampled independently of each other, we have $R = r^d$.

2. Construct a new population $x^{(k+1,i)}$ by replacing $x_i^{(k)}$ with $x_i'$, and accept the new population with probability $\min(1, r_d)$, where

$$r_d = \frac{f(x^{(k+1,i)})\, T(x^{(k+1,i-1)} \mid x^{(k+1,i)})}{f(x^{(k+1,i-1)})\, T(x^{(k+1,i)} \mid x^{(k+1,i-1)})} = \exp\{-(H(x_i') - H(x_i^{(k)}))/t_i\} \, \frac{T(x^{(k+1,i-1)} \mid x^{(k+1,i)})}{T(x^{(k+1,i)} \mid x^{(k+1,i-1)})}. \qquad (7)$$

If the proposal is rejected, set $x^{(k+1,i)} = x^{(k+1,i-1)}$.

The transition probability in (7) is calculated as follows. Since we change only one sample in each Metropolis-within-Gibbs step, the transition probability can be written as $T(x_i' \mid x_i^{(k)}, x_{[-i]}^{(k)})$, where $x_{[-i]}^{(k)} = (x_1^{(k+1)}, \ldots, x_{i-1}^{(k+1)}, x_{i+1}^{(k)}, \ldots, x_n^{(k)})$. Then we have

$$T(x_i' \mid x_i^{(k)}, x_{[-i]}^{(k)}) = \frac{1}{m^{(k)}} \sum_{s=1}^{m^{(k)}} q_{ks}(x_i').$$

From (6), it is not difficult to see that, as long as $0 < r < 1$, the proposal is global, i.e., $q_{ks}(x) > 0$ for all $x \in \mathcal{X}$. Since $\mathcal{X}$ is finite, it is natural to assume that $f(x)$ is bounded away from 0 and $\infty$ on $\mathcal{X}$.
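The coordinate-wise mixture in (6) can be sketched as follows for a single promising region (our illustration, assuming integer-valued coordinates; the inclusive-bound handling and the fallback branch are our choices, not the paper's):

```python
import random

def sample_proposal(a, b, l, u, r, rng):
    """Draw one candidate from the per-dimension mixture of Eq. (6).

    a, b : promising-region bounds for one region (lists of ints, a <= b)
    l, u : global lower/upper bounds of the discretized sample space
    r    : per-dimension probability of sampling inside [a[j], b[j]],
           so the overall inside-region probability is R = r**d
    """
    x = []
    for j in range(len(a)):
        inside = rng.random() < r
        outside_vals = [v for v in range(l[j], u[j] + 1)
                        if v < a[j] or v > b[j]]
        if inside or not outside_vals:   # fall back if region fills the range
            x.append(rng.randint(a[j], b[j]))     # uniform inside the region
        else:
            x.append(rng.choice(outside_vals))    # uniform outside it
    return x
```

Because both branches have positive probability whenever 0 < r < 1, every point of the (finite) sample space can be proposed, which is the globality property the ergodicity argument relies on.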
Thus, the minorisation condition (Mengersen and Tweedie 1996), i.e.,

$$\omega = \sup_{x \in \mathcal{X}} \frac{f(x)}{q_{ks}(x)} < \infty,$$

is satisfied. As shown in Appendix C, satisfaction of this condition leads to the ergodicity of the AEMC algorithm.

Now we are ready to present a summary of the AEMC algorithm, which consists of two modes: the EMC mode and the data-mining (or metamodeling) mode.

1. Set $k = 0$. Start with an initial population $x^{(0)}$, obtained by uniformly sampling $n$ samples over $\mathcal{X}$, and a temperature ladder $t = \{t_1, \ldots, t_n\}$.
2. EMC mode: run EMC until a switching condition is met. Apply mutation, crossover, and exchange operators to the population $x^{(k)}$ and accept the updated population according to the Metropolis-Hastings rule. Set $k = k + 1$.
3. Data-mining mode: run until a switching condition is met. With probability $P_k$, use the CART method to update the promising regions, i.e., update the values of $a_{sj}^{(k+1)}$ and $b_{sj}^{(k+1)}$ in (5). With probability $1 - P_k$, do not apply CART and simply let $a_{sj}^{(k+1)} = a_{sj}^{(k)}$ and $b_{sj}^{(k+1)} = b_{sj}^{(k)}$. Generate $n$ new samples following the Metropolis-within-Gibbs procedure described earlier in this section. Set $k = k + 1$.
4. Alternate between the two modes until a stopping rule is met. The algorithm can terminate when the computational budget (the number of iterations) is consumed, or when the change in the best $H(x)$ value does not exceed a given threshold for several iterations.

To implement AEMC effectively, several issues need to be considered.

Firstly, the choice of the EMC parameters $n$ and $p_m$: we simply follow the recommendations made in the EMC-related research and set $n$ and $p_m$ to values that favor EMC, typically $n = 5$–$20$ and $p_m \approx .25$ (Liang and Wong 2000).

Secondly, the choice of $P_k$: we need to make sure that $P_1 > P_2 > \cdots > P_k > \cdots$ and $\lim_{k \to \infty} P_k = 0$, which ensures the diminishing adaptation condition required for the ergodicity of adaptive MCMC algorithms (Roberts and Rosenthal 2007). As discussed in Appendix C, meeting the diminishing adaptation condition is crucial to the convergence of AEMC. Intuitively, $P_k$ can be considered a learning rate.
At the beginning of the algorithm, we do not have much information about the function surface, and therefore we apply the data-mining mode to learn new information

with a relatively high probability. As the algorithm goes on, we accumulate sufficient knowledge about the function surface, and it would be wasteful to execute the data-mining mode too often. So we make the learning rate decrease over time. Specifically, we set P_k = 1/k^δ. The δ (δ > 0) controls the decreasing speed of P_k: the larger δ is, the faster P_k decreases to 0. We choose δ = .1 in this paper.

Thirdly, the construction of the training samples. The question is whether we should use all the past samples to construct the training set, or only a subset, for example, only the recent samples gathered since the last data-mining operation. If we use all the past samples, data mining will be performed on a large dataset, which will inevitably take a long time and thus slow down the optimization process. Because EMC is able to randomly sample from the whole sample space, AEMC is less likely to fall into local optima even if we use only recent samples. Thus, the latter becomes our choice.

Lastly, we discuss the following tuning parameters of AEMC.

Switching condition M. In order to adopt the strengths of the two mechanisms and compensate for their weaknesses, a proper switching condition is needed to select the appropriate mode of operation for the proposed optimization procedure. Because of the inability of a stand-alone data-mining mode to find representative samples (as will be shown in Sect. 4), it is not beneficial to run the data-mining mode for multiple iterations. So a natural choice is to run the data-mining mode only once for every M iterations of EMC. If M is too large, the learning effect from the data-mining mode will be overshadowed and AEMC virtually becomes a pure EMC algorithm. If M is too small, EMC may not be able to gather enough representative data and thus data mining can hardly be effective. We recommend choosing M based on the value of nM, which is the sample size used in the data-mining mode for establishing the predictive model. From our experience, nM = 300–500 works quite well.

Percentile value h. The proper choice of h varies for different problems.
For a minimization problem, a large h value may bring many uninteresting samples into the set of supposedly high-performance solutions and thus slow down the optimization process. On the other hand, a small value of h increases the chance for the algorithm to fall into a local optimum. Besides, a very small h value may lead to small promising regions, which could make the acceptance rate of new samples too low. But the danger of falling into local optima is not grave, because the data-mining mode is followed by EMC, which randomizes the population again and makes it possible to escape from local optima. In light of this, we recommend an aggressive choice for the h value, i.e., h = 5–15%.

Choice of the tree size in CART. Fitting a tree is to approximate the response surface of H(x). A small-sized tree may not be sophisticated enough to approximate the surface, while a large-sized tree may overfit the data. Since we apply CART multiple times in the entire AEMC procedure, we believe that the mechanism of how the trees work in AEMC is similar to tree boosting (Hastie et al. 2001), where a series of CART models are put together to produce a result. For tree boosting, Hastie et al. (2001) recommended a tree size of 2 ≤ J ≤ 10. In our problem, however, controlling J alone does not precisely fulfill our objective. Because our goal is to find the global optimum rather than to make good predictions over the whole response surface, we are much more interested in the part of the response surface where the H(x) values are relatively small, corresponding to the class of high-performance samples, H^{(k)}. It then makes sense to control the number of terminal nodes associated with H^{(k)}, denoted by J_H. Controlling J_H enables us to fit a CART model that better approximates the high-performance portion of the response surface. Note that the set of terminal nodes representing H^{(k)} is a subset of all terminal nodes in the corresponding tree, meaning that the value of J_H is positively correlated with the value of J. Thus, controlling J_H also regulates the value of J.
The basic rationale behind the selection of J_H is similar to that for J: a large J_H results in a large J and could lead to overfitting, while the danger of using too small a J_H is that there will be too few promising regions for the subsequent actions to search and evolve from, which may cause the proposed procedure to miss some good samples. From the above arguments, we note that H^{(k)} in our problem plays a role analogous to the whole response surface in traditional tree boosting. We believe that the guideline for J can be transferred to J_H, i.e., 2 ≤ J_H ≤ 10.

In Sect. 4, we provide a sensitivity analysis of the three tuning parameters, which reveals how the performance of AEMC depends on their choices. From the results, we will see that M and h are the two most important tuning parameters and that AEMC is not sensitive to the value of J_H, provided that it is chosen from the above-recommended range.

As an adaptive MCMC algorithm, AEMC simulates multiple Markov chains in parallel and can therefore utilize the information from different chains to improve the convergence rate. We will provide some empirical results in support of this claim in Sect. 4, because a theoretical assessment of the convergence rate is too difficult to obtain. But we do investigate the ergodicity of AEMC. We are able to show that AEMC is ergodic as long as the proposal q_{ks}(·) is global and the data-mining mode is run with probability P_k → 0 as k → ∞. The ergodicity implies that AEMC will reach the

global optimum of H(x) given enough run time. This property of the AEMC algorithm is stated in Theorem 1 and its proof is included in Appendix C.

Theorem 1 If 0 < r < 1, lim_{k→∞} P_k = 0, and X is compact, then AEMC is ergodic, i.e., the samples x^{(k)} converge in distribution to f(x).

A final note is that we assume the sample space X to be discrete and bounded in this paper. Yet the AEMC algorithm can easily be extended to an optimization problem with a continuous and bounded sample space: one just needs to use the mutation and crossover operators proposed in Liang and Wong (2001), and Theorem 1 still holds. For unbounded sample spaces, some constraints can be put on the tails of the distribution f(x) and the proposal distribution to ensure that the minorisation condition holds. Refer to Roberts and Tweedie (1996), Rosenthal (1995) and Roberts and Rosenthal (2004) for more discussion of this issue.

4 Numerical results

To illustrate the effectiveness of the AEMC algorithm, we use it to solve three problems: the first two are for optimization purposes and the third is for sampling purposes. The first example is a sensor placement problem in an assembly process, and in the second example we optimize the Griewank function (Griewank 1981), a widely used test function in the area of global optimization. In the third example we use AEMC to sample from a mixture Gaussian distribution and see how AEMC, as an adaptive MCMC method, can help the sampling process. For the two optimization examples, we compare AEMC with EMC, the stand-alone CART-guided method, and the standard genetic algorithm. As to the parameters in AEMC, we chose n = 5, p_m = .25, M = 60, J_H = 6 and h = 10%. For the standard genetic algorithm, we let the population size be 100, the crossover rate be .9 and the mutation rate be .01. All optimization algorithms were implemented in the MATLAB environment, and all reported performance statistics of the algorithms are the average results of 10 trials.
The performance indices for comparison include the best function value found by an algorithm and the number of times that H(·) has been evaluated (also called the number of function evaluations hereinafter). The number of function evaluations is a sensible performance measure for many engineering design problems, where the objective function H(·) is complex and time-consuming to evaluate, so the time spent on function evaluations essentially dominates the entire computational cost. In Sect. 4.3, we also present a sensitivity analysis of the three tuning parameters: the switching condition M, the percentile value h, and the tree size J_H.

4.1 Sensor placement example

In this section, we attempt to find an optimal sensor placement strategy in a three-station two-dimensional (2-D) assembly process (Fig. 2). Coordinate sensors are distributed throughout the assembly process to monitor the dimensional quality of the final assembly and/or of the intermediate subassemblies. M_1–M_5 are five coordinate sensors that are currently in place on the three stations; this is simply one instance out of hundreds of thousands of other possible sensor placements. The goal of having these coordinate sensors is to estimate the dimensional deviation at the fixture locators on different stations, labeled P_i, i = 1,...,8, in Fig. 2. Researchers have established physical models connecting the sensor measurements to the deviations associated with the fixture locators (Jin and Shi 1999; Mandroli et al. 2006). Such a relationship can be expressed as a linear model, mathematically equivalent to a linear regression model. Thus, the design of sensor placement becomes very similar to the optimal design problem in experimentation, and the problem is to

Fig. 2  Illustration: a multi-station assembly process. The process proceeds as follows: (i) at station I, part 1 and part 2 are assembled; (ii) at station II, the subassembly consisting of part 1 and part 2 receives part 3 and part 4; and (iii) at station III, no assembly operation is performed but the final assembly is inspected.
The 4-way pins constrain the part motion in both the x- and the z-axes, and the 2-way pins constrain the part motion in the z-axis.

decide where one should place the sensors on the respective assembly stations so that the estimation of the parameters (i.e., fixturing deviations) can be achieved with the minimum variance. Similar to optimal experimental design, one chooses to optimize an alphabetic optimality criterion (such as D-optimality or E-optimality) of an information matrix that is determined by the corresponding sensor placement. In this paper, we use the E-optimality design criterion as a measure of the sensor system sensitivity, the same as in Liu et al. (2005), but the AEMC algorithm is certainly applicable to other design criteria. Due to the complexity of this sensor placement problem, one needs to run a set of MATLAB codes to calculate the sensitivity response for a given placement of sensors. For more details of the physical process and modeling, please refer to Mandroli et al. (2006).

We want to maximize the sensitivity, which is equivalent to minimizing the maximum variance of the parameter estimation. To facilitate the application of the AEMC algorithm, we discretize the geometric area of each part viable for sensor placement using a resolution of 10 mm (which is the size of a locator's diameter); this treatment is the same as what was done in Kim and Ding (2005) and Liu et al. (2005). The discretization procedure also ensures that all constraints are incorporated into the candidate samples so that we can solve the unconstrained optimization (2) for sensor placement. This discretization results in N_c candidate locations for any single sensor, so the sample space for (2) is X = [1, N_c]^d ∩ Z^d. For the assembly process shown in Fig. 2, the 10-mm resolution level results in the number of candidate sensor locations on each part being n_1 = 6,650, n_2 = 7,480, n_3 = 2,600, and n_4 = 2,600. Because parts 1 and 2 appear on all three stations, and parts 3 and 4 appear on the second and third stations, there are in total N_c = 3 × (n_1 + n_2) + 2 × (n_3 + n_4) = 52,790 candidate locations for each sensor.
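The size of the search space follows from simple counting, which can be checked directly (a minimal sketch of the arithmetic in the text, using the per-part counts reported above):

```python
from math import comb

# Candidate sensor locations per part at the 10-mm resolution (Sect. 4.1).
n1, n2, n3, n4 = 6650, 7480, 2600, 2600

# Parts 1 and 2 appear on all three stations; parts 3 and 4 on two stations.
N_c = 3 * (n1 + n2) + 2 * (n3 + n4)
print(N_c)           # 52790 candidate locations per sensor

# With d = 9 sensors, the number of distinct placements is C(52790, 9),
# roughly 8.8e36, as quoted in the text.
print(comb(N_c, 9))
```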
Suppose that d = 9, meaning that nine sensors are to be installed; then the total number of solution candidates is C_{52,790}^9 ≈ 8.8 × 10^{36}, where C_a^b is the combinatorial operator. Evidently, the number of solution candidates is overwhelmingly large. Moreover, we want to maximize the sensitivity objective function (i.e., the more sensitive a sensor system, the better), while AEMC solves a minimization problem. For this reason, we let H(x) in the AEMC algorithm equal the sensitivity response of x multiplied by −1, where x represents an instance of sensor placement.

We solve the optimal sensor placement problem for nine sensors and 20 sensors, respectively. For the scenario of d = 9, each algorithm was run for 10^5 function evaluations. The results of the various methods are presented in Fig. 3, which demonstrates a clear advantage of AEMC over the other algorithms. EMC and the genetic algorithm have similar performances. After about 4 × 10^4 function evaluations, AEMC finds H(x) ≈ −1.20, which is the best value found by EMC and the genetic algorithm after 10^5 function evaluations. This translates to a 2.5-fold improvement in terms of CPU time.

Fig. 3  Performances of the various algorithms for nine sensors

Fig. 4  Uncertainty of different algorithms for nine sensors

Figure 4 gives a boxplot of the best sensitivity values found at the end of each algorithm. AEMC finds a sensitivity value that is, on average, 10% better than EMC and the genetic algorithm. AEMC also has smaller uncertainty than EMC. From the two figures, it is worth noting that the stand-alone CART-guided method performs much worse than the other algorithms in this example. We believe this happens mainly because the stand-alone CART-guided method fails to gather representative data in the sample space associated with the problem. Figure 5 presents the best (i.e., yielding the largest sensitivity) sensor placement strategy found in this example.

We also test the AEMC method in a higher-dimensional case, i.e., d = 20. All the algorithms were again run for 10^5 function evaluations. The algorithm performance curves are presented in Fig.
6, where we observe that the sensitivity value found by AEMC after 3 × 10^4 function evaluations is the same as that found by EMC after 10^5 function evaluations. This translates to a 3-fold improvement in terms of CPU time. Interestingly, this improvement is greater than that in the 9-sensor case. The final sensitivity value attained by AEMC is, on average, 7% better than EMC and the genetic algorithm. Again, the stand-alone CART-guided method fails to compete with the other algorithms. We feel that as the dimensionality of the sample space gets higher, the performance of the stand-alone CART-guided method gets worse compared to the others. Figure 7 shows the uncertainty of each algorithm. In this case, AEMC has slightly higher uncertainty than EMC, but the average results of AEMC are still better. Figure 8 presents the best sensor placement strategy found in this example.

Fig. 5  Best sensor placement for nine sensors
Fig. 6  Performances of the various algorithms for 20 sensors
Fig. 7  Uncertainty of different algorithms for 20 sensors
Fig. 8  Best sensor placement for 20 sensors
Fig. 9  Performances of the various algorithms for the Griewank function

4.2 Griewank test function

In order to show the potential applicability of AEMC to other optimization problems, we test it on a well-known test function. The Griewank function (Griewank 1981) has been used as a test function for global optimization algorithms in a broad body of literature. The function is defined as

$$H_G(x) = \sum_{i=1}^{d} \frac{x_i^2}{4000} - \prod_{i=1}^{d} \cos\left(\frac{x_i}{\sqrt{i}}\right) + 1, \qquad -600 \le x_i \le 600,\ i = 1,\ldots,d.$$

The global minimum is located at the origin and the global minimum value is 0. The function has a very large number of local minima, which increases exponentially with d. Here we set d = 50. Figure 9 presents the performances of the various algorithms. All algorithms were run for 10^5 function evaluations. Please note that we have truncated the y-axis to be between 0 and 400 so as to show the early-stage performances of the different algorithms more clearly. AEMC clearly outperforms the other algorithms, especially in the beginning stage, as one can observe that AEMC converges much faster. As explained earlier, this fast convergence is an appealing property for engineering design problems. In this example,
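The test function above is straightforward to implement; a minimal sketch (the standard Griewank definition, matching the formula in the text):

```python
import math

def griewank(x):
    """Griewank function: sum(x_i^2)/4000 - prod(cos(x_i/sqrt(i))) + 1.
    Global minimum 0 at the origin; -600 <= x_i <= 600."""
    s = sum(xi * xi for xi in x) / 4000.0
    p = math.prod(math.cos(xi / math.sqrt(i)) for i, xi in enumerate(x, 1))
    return s - p + 1.0

print(griewank([0.0] * 50))  # 0.0 at the global minimum
```

The cosine product is what creates the exponentially many local minima mentioned in the text: each dimension contributes an oscillation on top of the slowly varying quadratic bowl.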

the stand-alone CART-guided method converges faster than the genetic algorithm and EMC, but is entrapped in a local optimum at an early stage. AEMC appears to level off after 65,000 function evaluations. However, as assured by Theorem 1, AEMC will eventually reach the global optimum if given enough computational effort. Figure 10 gives a boxplot of the best H_G(x) values found at the end of the algorithms. Compared to the other methods, the AEMC algorithm not only improves the average performance but also reduces the uncertainty. Although the stand-alone CART-guided method has found good solutions, the uncertainty of this algorithm is much higher than that of AEMC and the other two.

Fig. 10  Uncertainty of different algorithms for the Griewank function

4.3 Sensitivity analysis

We run an ANOVA analysis to investigate how sensitive the performance of AEMC is to the tuning parameters: the switching condition M, the percentile value h, and the tree size J_H. The value of M is chosen from five levels (10, 30, 60, 90, 120), h is chosen from three levels (1%, 10%, 20%), and J_H is chosen from three levels (3, 6, 12). A full factorial design with 45 cases is then constructed. For the sensor placement example, we use the 9-sensor case for the ANOVA analysis. AEMC was run for 10^5 function evaluations, and we recorded the best function value found as the output. For each factor-level combination, this was done five times. The ANOVA table is shown in Table 1. We can see that the main effects of M and h are significant at the .05 level. Our study also revealed that relatively smaller M and h are favored.

Table 1  ANOVA analysis for the sensor placement example

  Source    Sum sq.   D.f.   Mean sq.   F       Prob > F
  M         0.33      4      0.08       2.43    0.05
  h         0.87      2      0.44       12.75   0.00
  J_H       0.06      2      0.03       0.84    0.43
  M × h     0.26      8      0.03       0.94    0.48
  M × J_H   0.14      8      0.02       0.53    0.84
  h × J_H   0.11      4      0.03       0.79    0.53
  Error     6.70      196    0.03
  Total     8.47      224

For the Griewank example, we ran AEMC for 5 × 10^4 function evaluations and recorded the best function value found as the output.
For each factor-level combination, this was done five times. The ANOVA table is shown in Table 2. At the .05 level, the main effects of M and h are again significant, and smaller h and M are favored as well.

Table 2  ANOVA analysis for the Griewank example

  Source    Sum sq.   D.f.   Mean sq.   F       Prob > F
  M         2393.40   4      598.35     48.50   0.00
  h         1046.26   2      523.13     42.40   0.00
  J_H       0.18      2      0.09       0.01    0.99
  M × h     163.51    8      20.44      1.66    0.11
  M × J_H   104.15    8      13.02      1.06    0.40
  h × J_H   43.41     4      10.85      0.88    0.48
  Error     2418.08   196    12.34
  Total     6169.00   224

Based on our sensitivity analysis of both examples, we understand that the switching condition M and the percentile value h are the important factors affecting the performance of AEMC. To choose suitable values for M and h, users may follow our general guidelines outlined in Sect. 3.3 and further tune the values for their specific problems.

4.4 Sampling from a mixture Gaussian distribution

AEMC falls into the category of adaptive MCMC methods, and thus can be used to draw samples from a given target distribution. As shown by Theorem 1, the distribution of those samples will asymptotically converge to the target distribution. We test AEMC on a five-dimensional mixture Gaussian distribution

$$\pi(x) = \tfrac{1}{3} N_5(0, I_5) + \tfrac{2}{3} N_5(5, I_5),$$

where 0 = (0, 0, 0, 0, 0) and 5 = (5, 5, 5, 5, 5). This example was used in Liang and Wong (2001). Since in this paper we assume the sample space to be bounded, we set the sample space to [−10, 10]^5 here. The distance between the two modes is 5√5, which makes it difficult to jump from one mode to another. We compare the performance of AEMC with the Metropolis algorithm and EMC. Each algorithm

was used to obtain 10^5 samples, and all numerical results are averages of 10 runs. The Metropolis algorithm was applied with a uniform proposal distribution U[x − 2, x + 2]^5; its acceptance rate was .22. The Metropolis algorithm could not escape from the mode in which it started.

We then compare AEMC with EMC. We only look at samples of the first dimension, since the dimensions are independent of each other. Since the true histogram of the distribution is known, we can calculate the L_2 distance between the estimated mass vector and the true distribution. Specifically, we divide the interval [−10, 10] into 40 intervals (with a resolution of .5), and we calculate the true and estimated probability mass in each of the intervals. All EMC-related parameters are set following Liang and Wong (2001). In AEMC, we set h = 25% so that samples from both modes can be obtained. If h is too small, AEMC will focus only on the peaks of the function and thus only samples around the mode 5 will be obtained (this is because the probability of sampling around the mode 5 is twice as large as around the mode 0). In EMC, we employ the mutation and crossover operators used in Liang and Wong (2001). The acceptance rates of the mutation and crossover operators were .22 and .44, respectively. In AEMC, the acceptance rates of the mutation, crossover, and data-mining operators were .23, .54, and .10, respectively.

Fig. 11  Convergence rate of different algorithms

Figure 11 shows the L_2 distance versus the number of samples for the three methods in comparison. AEMC converges faster than EMC and the Metropolis algorithm: its sampling quality is far better than that of the Metropolis algorithm, and it also achieves better sampling quality than EMC.

5 Conclusions

In this paper, we have presented an AEMC algorithm for optimization problems with black-box objective functions, which are often encountered in engineering design (e.g., a sensor placement problem in an assembly process).
Our experience indicates that hybridizing a predictive model with an evolutionary Monte Carlo method can improve the convergence rate of an optimization procedure. We have also shown that the algorithm maintains ergodicity, implying its convergence to the global optimum. Numerical studies were used to compare the proposed AEMC method with other alternatives: all methods in comparison were used to solve a sensor placement problem and to optimize the Griewank function. In these studies, AEMC outperformed the alternatives and showed a much enhanced convergence rate.

This paper focuses mainly on the application of AEMC to solving optimization problems. Yet considering that the AEMC algorithm is an adaptive MCMC method, it should also be useful for sampling from complicated probability distributions. The EMC algorithm has already been shown to be a powerful sampling tool. We believe that the data-mining component can further improve the convergence rate to the target distribution without destroying the ergodicity of the algorithm. We demonstrated the effectiveness of AEMC for sampling purposes using a mixture Gaussian distribution.

In the current version of AEMC, we use CART as the metamodeling method; consequently the sample space is sliced into rectangles. When the function surface is complex, a rectangular partition may not be sufficient and a more sophisticated partition may be required. However, a good replacement for CART may not be straightforward to find, because any viable candidate must be computationally efficient so as not to slow down the optimization process. Some other data-mining or predictive modeling methods, such as neural networks, may have more learning power, but they are computationally much more expensive and are therefore less likely to be good candidates for the AEMC algorithm.

Acknowledgements The authors gratefully acknowledge financial support from the NSF under grants CMMI-0348150 and CMMI-0726939. The authors also appreciate the editors and the referees for their valuable comments and suggestions.
Appendix

We first describe the crossover, mutation, and exchange operators used in the EMC algorithm in Appendix A. In Appendix B, we then give a brief summary of the published results on the convergence of adaptive MCMC algorithms. In Appendix C, we prove the ergodicity of the AEMC algorithm.

Appendix A: Operators in the EMC algorithm

A.1 Crossover

From the current population x^{(k)} = {x_1^{(k)},...,x_n^{(k)}}, we first select one parental pair, say x_i^{(k)} and x_j^{(k)} (i ≠ j). The first parental sample is chosen according to a roulette-wheel procedure with Boltzmann weights. Then the second parental sample is chosen randomly from the rest of the population. So the probability of selecting the pair (x_i^{(k)}, x_j^{(k)}) is

$$P((x_i^{(k)}, x_j^{(k)}) \mid x^{(k)}) = \frac{1}{(n-1)\,G(x^{(k)})}\left[\exp\{-H(x_i^{(k)})/\tau_s\} + \exp\{-H(x_j^{(k)})/\tau_s\}\right], \qquad (8)$$

where G(x^{(k)}) = Σ_{i=1}^n exp{−H(x_i^{(k)})/τ_s}, and τ_s is the selection temperature. Two offspring are generated by some crossover operator; the offspring with the smaller fitness value is denoted y_i and the other y_j. All the crossover operators used in genetic algorithms, e.g., 1-point crossover, 2-point crossover and real crossover, can be used here. Then the new population y = {x_1^{(k)},...,y_i,...,y_j,...,x_n^{(k)}} is accepted with probability min(1, r_c),

$$r_c = \frac{f(y)\,T(x^{(k)} \mid y)}{f(x^{(k)})\,T(y \mid x^{(k)})} = \exp\left\{-\frac{H(y_i) - H(x_i^{(k)})}{t_i} - \frac{H(y_j) - H(x_j^{(k)})}{t_j}\right\} \frac{T(x^{(k)} \mid y)}{T(y \mid x^{(k)})},$$

where T(·|·) is the transition probability between populations, and T(y|x) = P((x_i, x_j)|x) · P((y_i, y_j)|(x_i, x_j)) for any two populations x and y. If the proposal is accepted, the population x^{(k+1)} = y; otherwise x^{(k+1)} = x^{(k)}. Note that all the crossover operators are symmetric, i.e., P((y_i, y_j)|(x_i, x_j)) = P((x_i, x_j)|(y_i, y_j)). So T(x|y)/T(y|x) = P((y_i, y_j)|y)/P((x_i, x_j)|x), which can be calculated according to (8).

Following the above selection procedure, samples with better H(·) values have a higher probability of being selected, and offspring generated by such parents are likely to be good as well. In other words, the offspring have learned from the good parents. So the crossover operator allows us to construct better proposal distributions, and new samples generated by it are more likely to have better objective H(·) values.
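The parent-selection step underlying (8) can be sketched as follows. This is an illustrative sketch, not the paper's code: `select_pair` is a hypothetical helper that picks the first parent by a roulette wheel over Boltzmann weights exp(−H_i/τ_s) and the second parent uniformly from the remaining samples.

```python
import math
import random

def select_pair(H, tau_s):
    """Roulette-wheel selection with Boltzmann weights, as in the text:
    first parent i with probability proportional to exp(-H[i]/tau_s),
    second parent j uniform over the rest of the population."""
    w = [math.exp(-h / tau_s) for h in H]
    u = random.uniform(0.0, sum(w))
    i, acc = len(H) - 1, 0.0          # fallback guards float round-off
    for idx, wi in enumerate(w):
        acc += wi
        if u <= acc:
            i = idx
            break
    j = random.choice([m for m in range(len(H)) if m != i])
    return i, j
```

Lower H (better fitness) gives an exponentially larger weight, so good samples are selected as the first parent far more often, which is the "learning from good parents" effect described above.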
A.2 Mutation

A sample, say x_i^{(k)}, is randomly selected from the current population x^{(k)} and then mutated to a new sample y_i by reversing the values of some randomly chosen bits. The new population y = {x_1^{(k)},...,y_i,...,x_n^{(k)}} is accepted with probability min(1, r_m),

$$r_m = \frac{f(y)\,T(x^{(k)} \mid y)}{f(x^{(k)})\,T(y \mid x^{(k)})} = \exp\{-(H(y_i) - H(x_i^{(k)}))/t_i\}\,\frac{T(x^{(k)} \mid y)}{T(y \mid x^{(k)})}.$$

The 1-point, 2-point, and uniform mutation are all symmetric operators, and thus T(y | x^{(k)}) = T(x^{(k)} | y). If the proposal is accepted, the population x^{(k+1)} = y; otherwise x^{(k+1)} = x^{(k)}.

A.3 Exchange

Given the current population x^{(k)} and the temperature ladder t, we try to change (x^{(k)}, t) = (x_1^{(k)}, t_1,...,x_i^{(k)}, t_i,...,x_j^{(k)}, t_j,...,x_n^{(k)}, t_n) to (x', t) = (x_1^{(k)}, t_1,...,x_j^{(k)}, t_i,...,x_i^{(k)}, t_j,...,x_n^{(k)}, t_n). The new population x' is accepted with probability min(1, r_e), where

$$r_e = \frac{f(x')\,T(x^{(k)} \mid x')}{f(x^{(k)})\,T(x' \mid x^{(k)})} = \exp\left\{(H(x_i^{(k)}) - H(x_j^{(k)}))\left(\frac{1}{t_i} - \frac{1}{t_j}\right)\right\}\frac{T(x^{(k)} \mid x')}{T(x' \mid x^{(k)})}.$$

If this proposal is accepted, x^{(k+1)} = x'; otherwise x^{(k+1)} = x^{(k)}. Typically, the exchange is performed only on states with neighboring temperature values, i.e., |i − j| = 1. Let p(x_i^{(k)}) be the probability that x_i^{(k)} is chosen to exchange with another state, and w(x_j^{(k)} | x_i^{(k)}) be the probability that x_j^{(k)} is chosen to exchange with x_i^{(k)}. So j = i ± 1, with w(x_{i−1}^{(k)} | x_i^{(k)}) = w(x_{i+1}^{(k)} | x_i^{(k)}) = .5 for 1 < i < n, and w(x_2^{(k)} | x_1^{(k)}) = w(x_{n−1}^{(k)} | x_n^{(k)}) = 1. The transition probability T(x' | x^{(k)}) = p(x_i^{(k)}) · w(x_j^{(k)} | x_i^{(k)}) + p(x_j^{(k)}) · w(x_i^{(k)} | x_j^{(k)}), and thus T(x' | x^{(k)}) = T(x^{(k)} | x').

Appendix B: Published results on the convergence of adaptive MCMC algorithms

In this section, we briefly review the results presented in Roberts and Rosenthal (2007). Let f(·) be a target probability distribution on a state space X^n, with B_{X^n} = B_X ⊗ ··· ⊗ B_X being the σ-algebra generated by measurable rectangles. Let {K_γ}_{γ∈Y} be a collection of Markov chain kernels on X^n, each of which has f(·) as a stationary distribution: (f K_γ)(·) = f(·).
Assume K_γ is φ-irreducible and aperiodic; then K_γ is ergodic for f(·), i.e., for all x ∈ X^n, lim_{k→∞} ||K_γ^k(x, ·) − f(·)|| = 0, where ||μ(·) − ν(·)|| = sup_{B ∈ B_{X^n}} |μ(B) − ν(B)| is the usual total