Design and Analysis of Computer Experiments for Screening Input Variables. Dissertation. Hyejung Moon, M.S. Graduate Program in Statistics


Design and Analysis of Computer Experiments for Screening Input Variables

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By
Hyejung Moon, M.S.
Graduate Program in Statistics

The Ohio State University
2010

Dissertation Committee:
Thomas J. Santner, Co-Adviser
Angela M. Dean, Co-Adviser
William I. Notz

© Copyright by Hyejung Moon 2010

ABSTRACT

A computer model is a computer code that implements a mathematical model of a physical process. A computer code is often complicated and can involve a large number of inputs, so it may take hours or days to produce a single response. Screening to determine the most active inputs is critical for reducing the number of future code runs required to understand the detailed input-output relationship, since the computer model is typically complex and the exact functional form of the input-output relationship is unknown. This dissertation proposes a new screening method that identifies active inputs in a computer experiment setting. It describes a Bayesian computation of sensitivity indices as screening measures. It provides algorithms for generating desirable designs for successful screening. The proposed screening method is called GSinCE (Group Screening in Computer Experiments). The GSinCE procedure is based on a two-stage group screening approach, in which groups of inputs are investigated in the first stage and then inputs within only those groups identified as active at the first stage are investigated individually at the second stage. Two-stage designs with desirable properties are constructed to implement the procedure. Sensitivity indices are used to measure the effects of inputs on the response. Inputs with large sensitivity indices are determined by comparison with a benchmark null distribution constructed from user-specified, low-impact inputs. The use of low-impact inputs is useful for screening out inputs

having small effects as well as those that are totally inert. Simulated examples show that, compared with one-stage procedures, the GSinCE procedure provides accurate screening while reducing computational effort. In this dissertation, the sensitivity indices used as screening measures are computed in a Gaussian process model framework. This approach is computationally efficient, requiring only small numbers of expensive computer code runs for the estimation of sensitivity indices. The existing approach for quantitative inputs is extended so that sensitivity indices can be computed when inputs include a qualitative input in addition to quantitative inputs. An orthogonal design, in which the design matrix has uncorrelated columns, is important for estimating the effects of inputs. Moreover, a space-filling design, for which design points are well spread out, is needed to explore the experimental region thoroughly. New algorithms for achieving such orthogonal space-filling designs are proposed in this dissertation. Three kinds of software are provided: for the proposed GSinCE procedure, for the computation of sensitivity indices, and for the design search algorithms.

This is dedicated to my daughter Moonyoung, son Nathan, husband Jungick, and parents.

ACKNOWLEDGMENTS

I would first like to express my gratitude to my co-advisors, Professor Thomas Santner and Professor Angela Dean. They have given me tremendous help in my professional development and great guidance in my life. They are very special teachers and mentors to me. I am truly grateful for the effort that they have put into my education and the time that they have shared with me. I would also like to thank Professor William Notz for helpful comments and support as a member of my dissertation committee. I want to give special thanks to my parents for their love and support. Without their help and sacrifices, my husband Jungick and I could not finish Ph.D. study at the same time. I would also like to thank Jungick for his love and for every moment that we have shared during our Ph.D. study. I am most thankful to my precious little ones, daughter Moonyoung and son Nathan. They have given me all the happiness, hope, and strength to do my best in my life.

VITA

October . . . . . . . . . . . . Born, Korea
. . . . . . . . . . . . . . . . B.S. Statistics, Korea University
2000 to . . . . . . . . . . . . Statistician, The Bank of Korea
. . . . . . . . . . . . . . . . M.S. Statistics, The Ohio State University
2005 to present . . . . . . . . Graduate Research Associate, Graduate Teaching Associate, The Ohio State University

FIELDS OF STUDY

Major Field: Statistics

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

Chapters:

1. Introduction
   1.1 Computer Experiments
   1.2 Gaussian Stochastic Process Model
   1.3 Screening Procedure
       1.3.1 Screening in Computer Experiments
       1.3.2 Group Screening in Physical Experiments
   1.4 Design of Computer Experiments
   1.5 Overview of Dissertation

2. Two-stage Sensitivity-based Group Screening in Computer Experiments
   2.1 Introduction
       2.1.1 Background
       2.1.2 Overview of the Proposed Procedure
   2.2 GSinCE Initialization Stage
   2.3 GSinCE Procedure Stage 1

       2.3.1 Stage 1 Sampling Phase
       2.3.2 Stage 1 Grouping Phase
       2.3.3 Stage 1 Analysis Phase
   2.4 GSinCE Procedure Stage 2
       2.4.1 Stage 2 Sampling Phase
       2.4.2 Stage 2 Analysis Phase

3. Performance of GSinCE
   3.1 Simulation Studies to Set τ
       3.1.1 Simulations for f = 20
       3.1.2 Simulations for f = 30
       3.1.3 Simulations for f = 10
       3.1.4 Summary of Simulation Studies
   3.2 Application of GSinCE in Least Favorable Cases
       3.2.1 Small Percentage of Active Inputs
       3.2.2 Non-linear Functions
       3.2.3 Detecting Large Effects
   3.3 Properties of Two-stage Designs
       3.3.1 Augmented Design
       3.3.2 Combined Design at Stage 2

4. Application of GSinCE
   4.1 Examples from the Literature
       4.1.1 Borehole Model
       4.1.2 A Model for the Weight of an Aircraft Wing
       4.1.3 OTL Circuit Model
       4.1.4 Piston Simulator Model
       4.1.5 Summary
   4.2 A Real Computer Experiment: FRAPCON Model
       4.2.1 Description of Code
       4.2.2 Use of GSinCE
       4.2.3 Implementations

5. Computation of Sensitivity Indices
   5.1 Sensitivity Indices of Quantitative Inputs
       5.1.1 Definition of Sensitivity Indices
       5.1.2 Estimation in Gaussian Process Framework
       5.1.3 The Integrals: sgint, dbint, mxint
       5.1.4 Example

   5.2 Sensitivity Indices of Mixed Inputs
       5.2.1 Setup
       5.2.2 Correlation Function for Mixed Inputs
       5.2.3 Estimation of Sensitivity Indices for Mixed Inputs
       5.2.4 Example

6. Algorithms for Generating Maximin Latin Hypercube and Orthogonal Designs
   6.1 Introduction
   6.2 Maximin Criteria for Space-filling Designs
   6.3 Algorithms for Space-filling Latin Hypercube Designs
       6.3.1 Complete Search and Random Generation
       6.3.2 Random Swap Methods for Maximin LHDs
       6.3.3 A Smart Swap Method for Maximin LHDs
   6.4 Algorithms for Orthogonal Maximin Designs
       6.4.1 Orthogonal Maximin LHDs
       6.4.2 Orthogonal Maximin Gram-Schmidt Designs
   6.5 Comparisons
       6.5.1 Maximin LHDs
       6.5.2 Orthogonal Maximin Designs
   6.6 Summary

7. Alternative Two-stage Designs
   7.1 Orthogonal Array-based Latin Hypercube Design
   7.2 Stage 1 Design for a Two-stage Group Screening Procedure
       7.2.1 Construction
       7.2.2 Secondary Criteria
   7.3 Stage 2 Design for a Two-stage Group Screening Procedure
   7.4 Limitations
       7.4.1 Availability of OA-based LHD
       7.4.2 Group Variable Defined by Averaging

8. Software
   8.1 GSinCE Code
   8.2 Sensitivity Code
   8.3 Maximin Code

Bibliography

LIST OF TABLES

3.1 Marginal probabilities and coefficient distributions for the simulation study
3.2 Six combinations used to recommend τ
3.3 Median and IQR values of the performance measures, and average number of groups and average total runs, over 200 test functions with about 25% of active inputs among f = 20 inputs, for each τ in each combination; the value in parentheses is the number of test functions generated with no active inputs
3.4 Modified values of q_L and q_NN to achieve about 25% of f = 30 inputs active; other probabilities are as in Table 3.1
3.5 Median values of the performance measures, and median/average values of true/claimed active inputs, over 50 test functions with about 25% of active inputs among f = 30 inputs; the value in parentheses is the number of test functions generated with no active inputs
3.6 Modified values of q_L to achieve about 25% and 35% of f = 10 inputs active, while keeping other probabilities as in Table 3.1
3.7 Median values of the performance measures, and median/average values of true/claimed active inputs, over 100 test functions with about 25% of active inputs among f = 10 inputs; the value in parentheses is the number of test functions generated with no active inputs
3.8 Median values of the performance measures, and median/average values of true/claimed active inputs, over 100 test functions with about 35% of active inputs among f = 10 inputs; the value in parentheses is the number of test functions generated with no active inputs

3.9 Median values of the performance measures, and median/average values of true/claimed active inputs, over 100 test functions with about 20% of active inputs among f = 10 inputs; the value in parentheses is the number of test functions generated with no active inputs
3.10 Median values of FDR, FNDR, specificity, and sensitivity over 30 functions having small percentages of active inputs
3.11 All coefficients of test function (3.5)
3.12 Results of automatic grouping and applying GSinCE for test function (3.5)
3.13 Results of the original and modified procedures and the one-stage method
3.14 Comparisons of median values of FDR, FNDR, specificity, and sensitivity over 30 non-linear functions for the original and modified procedures
3.15 Results of automatic grouping and applying GSinCE in favorable situations
3.16 Minimum inter-point distance and computation time of two design methods, and the estimated TESIs based on each of these designs
4.1 Grouping and active effect selection by GSinCE for the borehole model
4.2 Computation times and active effect selection of the four procedures for the borehole model using 70 runs
4.3 Grouping and active effect selection by GSinCE for the aircraft wing weight model
4.4 Computation times and active effect selection of the four procedures for the aircraft wing weight model using 65 runs
4.5 Grouping and active effect selection by GSinCE for the OTL circuit model
4.6 Computation times and active effect selection of the four procedures for the OTL circuit model using 60 runs

4.7 Grouping and active effect selection by GSinCE for the piston model
4.8 Computation times and active effect selection of the four procedures for the piston model using 85 runs
4.9 Summary of screening for all outputs based on grouping by EDA
4.10 Stage 1 grouping by EDA and selection for y1
4.11 Stage 1 grouping by EDA and selection for y2
4.12 Stage 1 grouping by EDA and selection for y3
4.13 Stage 1 grouping by EDA and selection for y4
4.14 Grouping by expert
4.15 Summary of screening for all outputs based on grouping by expert
4.16 Construction of subgroups within a group made by expert
4.17 Summary of screening for all outputs based on grouping by expert and EDA
4.18 Summary of screening for all groupings
5.1 Estimated sensitivity indices for the example function in (5.46) using different correlation functions and approaches
5.2 Estimated sensitivity indices for the example function in (5.84)
6.1 Characteristics of best (n, k) = (9, 4) designs formed using criterion d^(2)_min: ϕ_15 is the Morris and Mitchell (1995) objective function with p = 15; ρ²_ave is the average squared correlation; d^(4)_min is the minimum 4-dimensional rectangular distance; ρ_max is the maximum absolute correlation; T is the number of starting designs

6.2 Characteristics of best (n, k) = (40, 5) designs formed using criterion d^(2)_min: ϕ_15 is the Morris and Mitchell (1995) objective function with p = 15; ρ²_ave is the average squared correlation; d^(5)_min is the minimum 5-dimensional rectangular distance; ρ_max is the maximum absolute correlation; T is the number of starting designs
6.3 Best orthogonal maximin 9 × 4 designs found by the OMLHD, OSGSD-ϕ_15, and OSGSD-d^(2)_min algorithms based on 7 minutes of computational time, and scatterplot matrices of these designs
6.4 Comparisons of best designs found by the OMLHD, OSGSD-ϕ_15, and OSGSD-d^(2)_min algorithms based on 7 minutes of computational time: ϕ_15 is the Morris and Mitchell (1995) objective function with p = 15; ρ²_ave is the average squared correlation; d^(4)_min is the minimum 4-dimensional rectangular distance; ρ_max is the maximum absolute correlation; T is the number of starting designs
6.5 Distributions of ϕ_15 and d^(4)_min values in designs produced by the OSGSD-ϕ_15 algorithm (4 seconds of computation) and corresponding values of the best scaled OMLHD design indicated by horizontal lines
6.6 Comparisons of best designs found by the OMLHD, OSGSD-ϕ_15, and OSGSD-d^(2)_min algorithms based on 9 hours and 42 minutes of computational time: ϕ_15 is the Morris and Mitchell (1995) objective function with p = 15; ρ²_ave is the average squared correlation; d^(5)_min is the minimum 5-dimensional rectangular distance; ρ_max is the maximum absolute correlation; T is the number of starting designs
6.7 Distributions of ϕ_15 and d^(5)_min values in designs produced by the OSGSD-ϕ_15 algorithm (349 seconds of computation) and corresponding values of the best scaled OMLHD design indicated by horizontal lines
7.1 Secondary design criteria for the Stage 1 design
7.2 Stage 1 design X^(1)
7.3 Ranges and estimated TESIs of 2 groups under 3 different groupings

LIST OF FIGURES

3.1 Median values of FDR (line with circle), FNDR (line with asterisk), specificity (line with cross), and sensitivity (line with diamond) over 200 test functions versus τ × 100%, for functions with about 25% of active inputs among f = 20 inputs
3.2 Median values of FDR (line with circle), FNDR (line with asterisk), specificity (line with cross), and sensitivity (line with diamond) over 50 test functions versus τ × 100%, for functions with about 25% of active inputs among f = 30 inputs
3.3 Median values of FDR (line with circle), FNDR (line with asterisk), specificity (line with cross), and sensitivity (line with diamond) over 100 test functions versus τ × 100%, for functions with about 25% of active inputs among f = 10 inputs
3.4 Median values of FDR (line with circle), FNDR (line with asterisk), specificity (line with cross), and sensitivity (line with diamond) over 100 test functions versus τ × 100%, for functions with about 35% of active inputs among f = 10 inputs
3.5 Median values of FDR (line with circle), FNDR (line with asterisk), specificity (line with cross), and sensitivity (line with diamond) over 100 test functions versus τ × 100%, for functions with about 20% of active inputs among f = 10 inputs
5.1 Description of dbint of the cubic correlation function
5.2 Description of R_Y(x, η_1; θ) R_Y(x, η_2; θ) of the cubic correlation function
7.1 Rotate Method

7.2 Shrink Method with h =

CHAPTER 1

INTRODUCTION

1.1 Computer Experiments

There are many complex physical phenomena that are impossible or too expensive to study using physical experiments. However, some of these physical processes can be described by means of a mathematical model which relates inputs to output. A computer model is the implementation of such a mathematical model in computer code. A computer experiment is the use of the computer code as an experimental tool in which the experimenter seeks to determine the computational response of the code to the inputs. Computer experiments are prevalent in a wide range of studies, for example, in engineering (Fang, Li, and Sudjianto (2005)), in biomechanics (Ong, Lehman, Notz, Santner, and Bartel (2006)), in the physical sciences (Higdon, Kennedy, Cavendish, Cafeo, and Ryne (2004)), in the life sciences (Upton, Guilak, Laursen, and Setton (2006); Fogelson, Kuharsky, and Yu (2003)), in economics (Lempert, Williams, and Hendrickson (2002)), and in other areas of natural science. The output from most computer codes is deterministic; that is, two runs of a computer code at the same set of input values give an identical output value. Hence the traditional principles of blocking, randomization, and replication of the physical

experiment are not required for the design of a computer experiment. A computer code is often complicated and can involve a large number of inputs, so it may take hours or days to produce a single output. Thus there is a need for efficient screening methods for detecting inputs that have influential impacts on an input-output system. A flexible predictor (see Section 1.2) is often fitted to the outputs to provide a rapidly computable surrogate predictor (a metamodel for the code). The performance of the predictor depends upon the choice of the training design points used to develop the predictor, so there is a need for careful design. In this dissertation, a new screening method is proposed for computer experiments. The computation of sensitivity indices as screening measures is discussed, together with the construction of designs for successful screening.

1.2 Gaussian Stochastic Process Model

Although the output from a computer code is deterministic, uncertainty arises since the exact functional form of the input-output relationship is unknown from a limited number of runs. So statistical models are needed to characterize the uncertainty. The Gaussian process (GP) model has been popularly used to model the output from a computer experiment, because it provides a flexible framework by producing a large class of potential response surfaces and easily adapts to the presence of nonlinearity and interactions. In the following, the GP model (see Sacks, Welch, Mitchell, and Wynn (1989) and Santner, Williams, and Notz (2003), chapters 2 and 3) is reviewed briefly. Let y(x) be a scalar output which is a function of a k-dimensional vector of inputs, x = (x_1, x_2, ..., x_k)^T. Then the GP model treats the deterministic output y(x) as a

realization of a random function Y(x),

$$Y(x) = f^\top(x)\,\beta + Z(x) \qquad (1.1)$$

where f(x) = (f_1(x), ..., f_q(x))^T is a q × 1 vector of known regression functions at x, and β = (β_1, ..., β_q)^T is a q × 1 vector of unknown regression coefficients. Here Z(·) is a stationary Gaussian process with mean zero, variance 1/λ_Z, and covariance function

$$\mathrm{Cov}(Z(x), Z(\tilde{x})) = \frac{1}{\lambda_Z}\, R(x - \tilde{x}) \qquad (1.2)$$

where x and x̃ are two input sites, and R(x − x̃) is the correlation function of Z(·). A valid correlation function must satisfy R(0) = 1, must be symmetric about the origin, i.e., R(h) = R(−h), and must be positive definite. One popular choice is the product power exponential correlation function,

$$R(x - \tilde{x}) = \exp\Big( -\sum_{j=1}^{k} \theta_j \, |x_j - \tilde{x}_j|^{p_j} \Big) \qquad (1.3)$$

where θ_j > 0 and 0 < p_j ≤ 2. The Gaussian correlation function is the special case with p_j = 2. Cubic and Matérn correlation functions are also widely used (see Santner et al. (2003), chapter 2). Suppose that the output y(x_0) is to be predicted at a new input site x_0, based on the training data y^n = (y(x_1), ..., y(x_n))^T at the n input sites x_1, ..., x_n. By (1.1), Y_0 = Y(x_0), Y_1 = Y(x_1), ..., Y_n = Y(x_n) also form a GP, and hence the joint distribution of Y_0 and Y^n = (Y_1, ..., Y_n)^T is multivariate normal,

$$\begin{pmatrix} Y_0 \\ \boldsymbol{Y}^n \end{pmatrix} \sim N_{1+n}\left[ \begin{pmatrix} f_0^\top \\ F \end{pmatrix} \beta, \; \frac{1}{\lambda_Z} \begin{pmatrix} 1 & r_0^\top \\ r_0 & R \end{pmatrix} \right] \qquad (1.4)$$

where f_0 = f(x_0) is the q × 1 vector of regression functions at x_0, F is the n × q matrix of regression functions having (i, j)th element f_j(x_i) for 1 ≤ i ≤ n, 1 ≤ j ≤ q,

r_0 = (R(x_0 − x_1), ..., R(x_0 − x_n))^T, and R is the n × n matrix having (i, j)th element R(x_i − x_j). Let ψ be the vector of parameters of the correlation function R(·). Given the training data y^n and the model parameters (β, λ_Z, ψ), it follows from (1.4) that Y_0 has the conditional normal distribution

$$[Y_0 \mid y^n, \beta, \lambda_Z, \psi] \sim N\left[ f_0^\top \beta + r_0^\top R^{-1}(y^n - F\beta), \; \frac{1}{\lambda_Z}\left(1 - r_0^\top R^{-1} r_0\right) \right]. \qquad (1.5)$$

The minimum MSPE linear unbiased predictor, or best linear unbiased predictor (BLUP), of Y_0 (see Sacks et al. (1989)) is

$$\hat{Y}_0 = f_0^\top \hat{\beta} + r_0^\top R^{-1}(y^n - F\hat{\beta}) \qquad (1.6)$$

where β̂ = (F^T R^{-1} F)^{-1} F^T R^{-1} y^n is the generalized least squares estimator of β. In practice, the parameters ψ in the correlation function R(·) are unknown, so estimates R̂ and r̂_0 can be used instead of R and r_0 in (1.6). Such a predictor is called an empirical best linear unbiased predictor (EBLUP) of Y_0. Depending on the method used for estimating ψ, different EBLUPs can be obtained, such as the maximum likelihood EBLUP, restricted maximum likelihood EBLUP, cross-validation EBLUP, and posterior mode EBLUP. See Santner et al. (2003), chapter 3, for more details. In a fully Bayesian approach, prior distributions for the model parameters (β, λ_Z, ψ) are specified, and the predictor of Y_0 is obtained as the mean of the predictive distribution [Y_0 | y^n], i.e.,

$$E(Y_0 \mid y^n) = E_{\beta, \lambda_Z, \psi \mid y^n}\left( E(Y_0 \mid y^n, \beta, \lambda_Z, \psi) \right). \qquad (1.7)$$

To compute the Bayesian predictor numerically, one can take draws of the model parameters (β, λ_Z, ψ) from the posterior distribution [β, λ_Z, ψ | y^n] using a Markov chain Monte Carlo (MCMC) sampling method, compute E(Y_0 | y^n, β, λ_Z, ψ) = f_0^T β + r_0^T R^{-1}(y^n − Fβ) for each draw of (β, λ_Z, ψ), and take the sample mean of these values over the different draws of the parameters.

1.3 Screening Procedure

A computer code can involve a huge number of inputs and, because of the complexity of the code, may take hours or days to produce a single output at a given set of input values. Thus there is a need for efficient methods for detecting inputs that have major impacts on an input-output system. In physical experiments, too, a conventional factorial experiment may not be economically feasible when numerous factors are considered, so it is necessary to identify influential inputs at an early stage of experimentation; these can be investigated further at a later stage. Screening methods developed in the setting of computer experiments are reviewed in Section 1.3.1 and group screening in physical experiments in Section 1.3.2. A new two-stage group screening procedure that identifies active inputs in a computer experiment setting, and that borrows ideas from group screening, is proposed in Chapter 2.

1.3.1 Screening in Computer Experiments

Most of the screening methods in computer experiments are based on the GP model described in Section 1.2. For example, Sacks et al. (1989) used a decomposition of the output function y(x) into an average effect, main effects for each input, two-factor interactions, and high-order interactions, estimated the effects by replacing y(x) by the predictor based on the GP model, and plotted the estimated effects to investigate the importance of the inputs. Welch, Buck, Sacks, Wynn, Mitchell, and Morris (1992) extended the Sacks et al. (1989) method so as to build an accurate predictor and identify important inputs when there are many inputs. Oakley and O'Hagan

(2004) presented a Bayesian approach to perform probabilistic sensitivity analysis, which formulates uncertainty in the model inputs by a joint probability distribution and then analyzes the induced uncertainty in the output. Schonlau and Welch (2006) described the implementation of visualizing the estimated effects and quantifying the importance of the inputs via an ANOVA type of decomposition. Linkletter, Bingham, Hengartner, Higdon, and Ye (2006) proposed a Bayesian method to select active inputs based on the posterior distribution of the parameters of the Gaussian correlation function. Campbell, McKay, and Williams (2006) suggested sensitivity analysis for functional computer model outputs by expanding the functional outputs in terms of an appropriate set of basis functions and doing sensitivity analysis on the coefficients of the expansion. Higdon, Gattiker, Williams, and Rightley (2008) performed sensitivity analysis for high-dimensional output using basis representations to reduce the dimensionality.

1.3.2 Group Screening in Physical Experiments

Group screening methodology was first described by Dorfman (1943) for blood screening and was later adapted to the setting of physical experiments by Watson (1961) for identifying active factors in cases where there are many potentially influential input factors. In two-stage group screening, the first stage of experimentation is done on groups of factors. The individual factors within the groups identified as active in the first stage are then investigated individually in a second-stage experiment. The early work in this area considered models in which only main effects were considered, but the methodology has now been extended to handle interactions. Lewis and Dean (2001) investigated two-stage group screening strategies for detecting interactions. Vine, Lewis,

and Dean (2005) developed methodology for handling groups of unequal sizes as well as unequal probabilities for factors being active. Morris (2006) gave a survey of group screening and its use in searching for active factors. Vine, Lewis, Dean, and Brunson (2008) discussed practical aspects involved in running a two-stage group screening experiment for investigating interactions.

1.4 Design of Computer Experiments

The output from most computer codes is deterministic and hence no replications are required at any design point. Moreover, for thorough exploration of the experimental region, the design points should be spread evenly throughout the region. Designs for which the points are unreplicated and well spread out are called space-filling. McKay, Beckman, and Conover (1979) introduced Latin hypercube designs for use in computer experiments. In its simplest form, an n × k Latin hypercube design (LHD) has hth column ξ_h = [ξ_{1h}, ξ_{2h}, ..., ξ_{nh}]^T, obtained from a random permutation π_h = [π_{1h}, π_{2h}, ..., π_{nh}] of 1, ..., n by taking ξ_{ih} to be the midpoint of the interval [(π_{ih} − 1)/n, π_{ih}/n]. In a slightly more sophisticated approach, a random point in this interval may be taken; the latter procedure is used in this dissertation. All LHDs have the one-dimensional space-filling property that an observation is taken in every one of the n evenly spaced intervals over the [0, 1] range of each input. However, they need not have space-filling properties in higher dimensions. Several criteria for generating space-filling designs are described in Santner et al. (2003), chapter 5. In particular, the maximin distance criterion, which finds a design that

maximizes the minimum inter-point distance, was first introduced by Johnson, Moore, and Ylvisaker (1990) and extended by Morris and Mitchell (1995). Another desirable property of a design for a computer experiment is orthogonality, where the design matrix has uncorrelated columns. If the values of two inputs are highly correlated, then it is difficult to distinguish their effects on the output; an orthogonal design allows one to assess the effects of the different inputs independently. Tang (1993) proposed a method of constructing orthogonal array-based LHDs by combining the desirable properties of both orthogonal arrays and LHDs. Owen (1994) proposed an algorithm for generating LHDs with small pairwise correlations between input variables. Tang (1998) developed an algorithm for reducing polynomial canonical correlations of LHDs by extending Owen (1994)'s algorithm. Ye (1998) proposed a construction method for orthogonal LHDs with n = 2^m + 1 runs and k = 2m − 2 input variables, and used an improvement algorithm for selecting designs within this class under space-filling and other criteria. Butler (2001) presented a construction method for LHDs which are orthogonal with respect to models based on trigonometric functions. Steinberg and Lin (2006) constructed orthogonal LHDs with n = 2^k runs, where the number k of inputs is a power of 2, which can include more inputs than those proposed by Ye (1998). Cioppa and Lucas (2007) extended Ye (1998)'s approach to construct orthogonal LHDs that can accommodate more inputs and presented a method that improves the space-filling properties of the resulting LHD at the expense of inducing small correlations. Joseph and Hung (2008) proposed an exchange algorithm for efficient generation of LHDs under a weighted combination of orthogonality and space-filling criteria. Lin, Mukerjee, and Tang (2009) proposed a method for constructing large-dimensional orthogonal and nearly orthogonal LHDs

by using an orthogonal array and a small LHD. Bingham, Sitter, and Tang (2009) constructed a class of orthogonal designs with various choices for the number of levels and flexible run sizes by relaxing the LHD constraint. In Chapter 6, a new method for achieving orthogonal and space-filling designs, based on Gram-Schmidt orthogonalization, is proposed.

1.5 Overview of Dissertation

The rest of the dissertation is organized as follows. Chapter 2 proposes a new two-stage group screening procedure that identifies active inputs in a computer experiment setting. The performance of the proposed method is discussed in Chapter 3, and the application of the new method is demonstrated in Chapter 4. The computation of sensitivity indices as screening measures is discussed in Chapter 5. Chapter 6 presents a new algorithm for generating maximin LHDs and a new algorithm for achieving orthogonal maximin designs using Gram-Schmidt orthogonalization. Alternative approaches for creating two-stage designs are given in Chapter 7. Chapter 8 provides software for the proposed screening procedure, the computation of sensitivity indices, and the design search algorithms.
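To make the review in Section 1.2 concrete, the following sketch computes the BLUP (1.6) for a constant-mean GP (f(x) = 1) with the product power exponential correlation (1.3). This is an illustrative sketch only: the function names are ours, the correlation parameters (θ, p) are treated as known (an EBLUP would substitute estimates), and the toy "code" is an arbitrary deterministic function.

```python
import numpy as np

def power_exp_corr(a, b, theta, p):
    """Product power exponential correlation (1.3); p_j = 2 gives the Gaussian case."""
    return float(np.exp(-np.sum(theta * np.abs(a - b) ** p)))

def blup_predict(X, y, x0, theta, p):
    """BLUP of Y(x0), eq. (1.6), under a constant-mean GP with known
    correlation parameters; beta-hat is the GLS estimator of the trend."""
    n = len(y)
    R = np.array([[power_exp_corr(X[i], X[j], theta, p) for j in range(n)]
                  for i in range(n)])
    r0 = np.array([power_exp_corr(x0, X[i], theta, p) for i in range(n)])
    Rinv = np.linalg.inv(R)
    ones = np.ones(n)
    # Generalized least squares estimate of the constant trend beta.
    beta = (ones @ Rinv @ y) / (ones @ Rinv @ ones)
    return beta + r0 @ Rinv @ (y - beta * ones)

# Three training runs of a deterministic toy "code" y(x) = sin(2*pi*x).
X = np.array([[0.0], [0.4], [0.9]])
y = np.sin(2 * np.pi * X[:, 0])
theta, p = np.array([4.0]), np.array([2.0])

# The BLUP interpolates the deterministic training data at the design sites.
yhat_at_x1 = blup_predict(X, y, X[1], theta, p)
```

Because r_0 at a training site equals a row of R, the predictor reproduces the training output exactly there, reflecting the fact that deterministic code output needs no replication.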
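Similarly, the Latin hypercube construction of Section 1.4 and two design criteria used later in the dissertation (minimum inter-point distance, which a maximin design maximizes, and maximum absolute column correlation, which measures departure from orthogonality) can be sketched as below; the function names and the small (n, k) are ours.

```python
import numpy as np

def latin_hypercube(n, k, rng):
    """n x k LHD: column h places one point in each of the n equal cells
    [(pi_ih - 1)/n, pi_ih/n), at a uniform random position within the cell
    (the variant used in this dissertation; the simpler form uses midpoints)."""
    design = np.empty((n, k))
    for h in range(k):
        perm = rng.permutation(n)                      # the permutation pi_h
        design[:, h] = (perm + rng.uniform(size=n)) / n
    return design

def min_interpoint_distance(design):
    """Minimum pairwise Euclidean distance; a maximin design maximizes this."""
    n = len(design)
    return min(np.linalg.norm(design[i] - design[j])
               for i in range(n) for j in range(i + 1, n))

def max_abs_correlation(design):
    """Maximum absolute pairwise column correlation; 0 for an orthogonal design."""
    C = np.corrcoef(design, rowvar=False)
    k = C.shape[0]
    return max(abs(C[i, j]) for i in range(k) for j in range(i + 1, k))

rng = np.random.default_rng(0)
D = latin_hypercube(8, 3, rng)
```

Each column of D contains exactly one point in each of the 8 intervals [i/8, (i+1)/8), the one-dimensional space-filling property of an LHD; a search algorithm of the kind developed in Chapter 6 would repeatedly perturb such a design to increase the minimum distance or reduce the column correlations.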

CHAPTER 2

TWO-STAGE SENSITIVITY-BASED GROUP SCREENING IN COMPUTER EXPERIMENTS

This chapter proposes a new two-stage group screening procedure that identifies active inputs in computer experiments. The whole procedure is explained here in detail. Further discussion related to the performance of the procedure is given in Chapter 3, and its application to various examples is shown in Chapter 4.

2.1 Introduction

2.1.1 Background

A computer model is a numerical implementation of a mathematical description of an input-output relationship of a physical process. Modelling through computer codes is prevalent in a wide range of applications, for example, in engineering (Fang et al. (2005)), in biomechanics (Ong et al. (2006)), in the physical sciences (Higdon et al. (2004)), in the life sciences (Upton et al. (2006); Fogelson et al. (2003)), in economics (Lempert et al. (2002)), and in other areas of natural science. Over the past 20 years, the use of computer codes as experimental tools has become increasingly sophisticated. In addition to inputs that describe different treatments, computer models can allow the user to vary environmental inputs that describe the

conditions in which the process operates, and calibration inputs, which are unknown physical constants in the underlying mathematical model and for which expert-based subjective distributions are available. For example, Ong, Santner, and Bartel (2008) presented an application in the biomechanical engineering design of a prosthetic acetabular cup, in which the hip socket of a prosthetic total hip replacement rotates. In addition to inputs defining the cup geometry, their study included inputs representing environmental conditions such as the patient bone quality and loading patterns, inputs describing mis-alignments from nominal cup insertion values (which represent the level of surgeon skill), and inputs describing unknown aspects of the true physical setting such as the interface friction between the bone and prosthesis. The finite-element codes used in this and other complex applications can require up to 24 hours for a single run. Consequently, there is a need for efficient methods for detecting inputs that have major impacts on an input-output system. These are called the active or influential inputs. Once identified, researchers can restrict attention to varying only the active inputs (while setting other inputs to nominal values), thus reducing the number of future code runs needed to understand the detailed input-output relationship. The literature contains several proposals for screening inputs in computer experiments where the deterministic output is modelled as a realization of a random function. An approach that decomposes a Gaussian random function approximator of a computer model into an average effect, main effects for each input, two-factor interactions, and high-order interactions, and plots the estimated effects or quantifies the importance of the effects, has been applied by many authors (Section 1.3.1), for example, Sacks et al. (1989), Welch et al. (1992), Oakley and O'Hagan (2004), and

Schonlau and Welch (2006). Linkletter et al. (2006) proposed a Bayesian method to select active inputs based on the posterior distribution of the parameters of the Gaussian correlation function. Campbell et al. (2006) and Higdon et al. (2008) performed sensitivity analysis for multiple outputs.

For complex computer codes that are expensive to run and that must account for many inputs, the standard screening methods described in Section 1.3 can be time-consuming. A screening method is presented below that incorporates experimental design considerations and group screening, and allows the user to identify influential inputs in a computer experiment with computational efficiency. As mentioned in Section 1.3.2, group screening methodology was first described by Dorfman (1943) for blood screening and was later adapted to the setting of physical experiments by Watson (1961) for identifying active factors in cases where there are many potentially influential input factors. The early work in this area considered models containing only main effects, but the methodology has since been extended to handle interactions (see Morris and Mitchell (1983), Lewis and Dean (2001), and Vine et al. (2005); see also the reviews by Kleijnen (1987) and Morris (2006)). Group screening works well under effect sparsity, where the proportion of active main effects and interactions is small. In two-stage group screening, the first stage of experimentation is done on groups of factors. The individual factors within the groups identified as active in the first stage are then investigated individually in a second-stage experiment.

The proposed method in this dissertation provides a two-stage group screening procedure that identifies active inputs in a computer experiment setting; it eliminates non-active inputs having small effects, not merely those having zero effects. This

approach reduces the number of experimental runs needed to understand the input-output relationship because groups of inputs with small effects are dropped at an early stage of the procedure.

2.1.2 Overview of the Proposed Procedure

For clarity of exposition, the description of each stage of the proposed procedure is divided into sampling, grouping (for Stage 1 only), and analysis phases. The proposed procedure is called the GSinCE (Group Screening in Computer Experiments) procedure. An outline is given below, with further details in Sections 2.2-2.4.

Initialization: Given that n runs of the computer code are to be made in Stage 1, a matrix X with n rows and (n-1) columns satisfying certain desirable properties is generated, as described in Section 2.2. The choice of n is discussed in Section 2.3.1.

Stage 1: In the sampling phase, a set of columns from X is selected to produce a design matrix X^{(1)}. The computer code is run at the design points (rows) in X^{(1)} and a Gaussian process (GP) model is fitted to the output. In the grouping phase, the output is used to place the inputs into disjoint sets (groups). All inputs in the same group are set equal to the same level, defined by a design matrix G (Section 2.3.2), and the fitted GP model is used to predict the output at the design points in G. The analysis phase (Section 2.3.3) uses total effect sensitivity indices to determine which groups of inputs are inactive and which potentially contain active inputs. To judge whether a group is active or non-active, an additional low-impact input is created to use as a benchmark (cf. Linkletter et al. (2006), Wu, Boos, and Stefanski (2007)).

Stage 2: The inputs in the groups selected as active in Stage 1 are investigated individually in Stage 2. In the Stage 2 sampling phase (Section 2.4.1), a new design

matrix X^{(2)} is selected in such a way that the design points in the combined design (X^{(1)}, X^{(2)}) retain, as closely as possible, the desirable properties identified in Section 2.2. The computer code is run at the design points in X^{(2)}. The Stage 2 analysis phase uses the outputs from both stages in a second sensitivity analysis to make the final selection of active inputs (Section 2.4.2).

2.2 GSinCE Initialization Stage

Suppose there are f experimental inputs, where the range of the j-th input is [a_j, b_j] and a_j and b_j are known constants, for j = 1, ..., f. Assume that the domain for the vector of the f inputs is the entire hyper-rectangle \prod_{j=1}^{f} [a_j, b_j]. The design will be obtained from the scaled input space [0,1]^f, and x_{tj} \in [0,1] will denote the value of the j-th scaled input on the t-th run of the design. The computer code is then run to obtain the output using the unscaled inputs

z_{tj} = x_{tj} (b_j - a_j) + a_j, for t = 1, ..., n and j = 1, ..., f.

In the Initialization Stage, a preliminary design matrix X is constructed with n rows and n-1 columns as described below. Denote the j-th column of X by \xi_j = (\xi_{1j}, ..., \xi_{nj})', where ' denotes transpose, j = 1, ..., n-1, and the i-th row by x_i = (x_{i1}, ..., x_{i(n-1)}), i = 1, ..., n. The design matrices for the Stage 1 sampling and grouping phases will be drawn from this matrix, as will those for the low-impact inputs.

There are three requirements for the design matrix X. First, the columns of X are required to be uncorrelated, to allow independent assessment of the effects of the different inputs. Second, the minimum and maximum values in each column must be 0 and 1, respectively; if this is not the case, then those variables whose scaled input values in the design have larger ranges will have a larger impact on the response, artificially

induced by the design (see Section 3.3.2). Third, the design X should be space-filling at each stage, in the sense that the selected design maximizes the minimum inter-point distance in all 2-dimensional subspaces of the input space. This helps to ensure that all regions of the input space are explored (cf. Sacks et al. (1989), and Santner et al. (2003), Chapter 5). These three properties are referred to, respectively, as (P.1), (P.2), and (P.3). An algorithm for generating X which satisfies (P.1) and (P.2), and approximately satisfies (P.3), follows.

Step 1. Randomly generate an n x (n-1) Latin hypercube design matrix \Lambda = (\lambda_1, ..., \lambda_{n-1}) with rank n-1 (see McKay et al. (1979)).

Step 2. Center each column of \Lambda: v_h = \lambda_h - (\lambda_h' 1 / n) 1 for h = 1, ..., n-1, where 1 is a vector of n unit elements.

Step 3. Apply the Gram-Schmidt algorithm to form orthogonal columns u_h = (u_{1h}, ..., u_{nh})':

u_h = v_1, for h = 1;
u_h = v_h - \sum_{i=1}^{h-1} (u_i' v_h / \|u_i\|^2) u_i, for h = 2, ..., n-1.

Step 4. Scale the values of u_h to [0,1] to give \xi_h = (\xi_{1h}, ..., \xi_{nh})', where

\xi_{ih} = (u_{ih} - min{u_{1h}, ..., u_{nh}}) / (max{u_{1h}, ..., u_{nh}} - min{u_{1h}, ..., u_{nh}}),

for i = 1, ..., n and h = 1, ..., n-1. Set X = (\xi_1, ..., \xi_{n-1}).

Step 5. Select the design matrix X = (\xi_1, ..., \xi_{n-1}) which maximizes the minimum inter-point distance over all projections of the design into 2-dimensional space, i.e., maximizes

min_{i<j; i,j \in {1,...,n}} min_{h<l; h,l \in {1,...,n-1}} [ (\xi_{ih} - \xi_{jh})^2 + (\xi_{il} - \xi_{jl})^2 ]^{1/2}.
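As a concrete illustration (not code from the dissertation), the five steps above can be sketched in Python with NumPy, using a reduced QR factorization to carry out the Gram-Schmidt orthogonalization of Step 3; the run size n and the number of brute-force candidates are arbitrary choices for this sketch:

```python
import numpy as np

def random_lhd(n, k, rng):
    """Random n x k Latin hypercube sample (Step 1; McKay et al. style)."""
    return (np.argsort(rng.random((n, k)), axis=0) + rng.random((n, k))) / n

def candidate_design(n, rng):
    """One candidate preliminary design X satisfying (P.1) and (P.2)."""
    lam = random_lhd(n, n - 1, rng)
    v = lam - lam.mean(axis=0)       # Step 2: center each column
    u = np.linalg.qr(v)[0]           # Step 3: Gram-Schmidt via reduced QR
    # Step 4: rescale each column so its minimum is 0 and its maximum is 1
    return (u - u.min(axis=0)) / (u.max(axis=0) - u.min(axis=0))

def min_2d_distance(X):
    """Minimum inter-point distance over all 2-dimensional projections."""
    n, k = X.shape
    best = np.inf
    for h in range(k):
        for l in range(h + 1, k):
            P = X[:, [h, l]]
            d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
            d2[np.diag_indices(n)] = np.inf   # ignore self-distances
            best = min(best, d2.min())
    return float(np.sqrt(best))

# Step 5, brute force: keep the best maximin candidate among many tries
rng = np.random.default_rng(0)
n = 8
X = max((candidate_design(n, rng) for _ in range(50)), key=min_2d_distance)
```

Because the orthonormal QR columns are mean-zero linear combinations of the centered LHD columns, the rescaled columns are exactly uncorrelated, giving (P.1), while Step 4's rescaling gives (P.2); in practice Step 5 would use far more candidates or an exchange-type search.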

Step 5 can be carried out (approximately) in a brute-force manner by repeating Steps 1-4 many times and selecting the best maximin design among the candidate designs generated. Alternatively, some form of genetic exchange algorithm (see Bartz-Beielstein (2006)) could be used to find an approximate maximin design, for example, the evolutionary operation (EVOP) method used in Forrester, Sóbester, and Keane (2008), Chapter 1.

Step 4 of the algorithm guarantees (P.2), while (P.1) can be verified as follows. Let \bar{\xi}_h = \xi_h' 1 / n be the arithmetic mean of the elements in the h-th column of X. Then, by construction, the correlation of \xi_h and \xi_l, h \ne l, is

r(\xi_h, \xi_l) = (\xi_h - \bar{\xi}_h 1)' (\xi_l - \bar{\xi}_l 1) / [ (\xi_h - \bar{\xi}_h 1)' (\xi_h - \bar{\xi}_h 1) (\xi_l - \bar{\xi}_l 1)' (\xi_l - \bar{\xi}_l 1) ]^{1/2} = u_h' u_l / (u_h' u_h \, u_l' u_l)^{1/2} = 0,

where u_h and u_l are defined in Step 3 and satisfy u_h' u_l = 0, and u_h' 1 = u_l' 1 = 0 from Step 2. Alternative distance criteria, such as the average distance criterion over all low-dimensional projections, could also be used (see, for example, Welch (1985)).

2.3 GSinCE Procedure: Stage 1

2.3.1 Stage 1 Sampling Phase

The GSinCE procedure is to be used in a screening situation where it is reasonable to assume that only a small fraction (say 25% or less) of the inputs are active. Loeppky, Sacks, and Welch (2009) justified 10 times the number of inputs as a reasonable rule of thumb for the number of runs in an effective initial computer experiment. Using this base value, 5 runs for each active input in each stage is reasonable. As an example, with f = 20 inputs and a conservative assumption of a maximum of 40% active inputs, one may take n = 5 x (0.4 f) = 2f runs in Stage 1.

The Stage 1 design matrix, X^{(1)}, is taken to be the first f columns of the n x (n-1) preliminary design matrix X; thus X^{(1)} = (\xi_1, ..., \xi_f). Denote the vector of outputs from the Stage 1 code runs by y(X^{(1)}). A Bayesian GP model (see Higdon et al. (2004) and Higdon et al. (2008)),

Y(x) = Z(x) + \epsilon(x),   (2.1)

is fitted to the data y(X^{(1)}). Here Z(.) is taken to be a stationary Gaussian process with zero mean, variance 1/\lambda_Z, and covariance function

Cov(Z(x), Z(\tilde{x})) = (1/\lambda_Z) R(x, \tilde{x}) = (1/\lambda_Z) \prod_{j=1}^{f} \rho_j^{4 (x_j - \tilde{x}_j)^2},   (2.2)

where x = (x_1, ..., x_f) and \tilde{x} = (\tilde{x}_1, ..., \tilde{x}_f) are two design points. The GP model (2.1) is a special case of (1.1) with f(x)'\beta = 0. The term \epsilon(x) in (2.1) is added to represent numerical or other small-scale noise and is modeled by a white noise process that is independent of Z(.) and has mean 0 and (small) prior variance 1/\lambda_\epsilon. The output y(X^{(1)}) is centered to have sample mean 0 and unit variance to conform to the prior specification when this model is fitted. The Bayesian model can be fitted using the GPM/SA (Gaussian Process Models for Simulation Analysis) software of Gattiker (2005). The posterior distributions of the model parameters will be used to predict output as in (1.7) for the group variables in the grouping phase in Section 2.3.2.

2.3.2 Stage 1 Grouping Phase

Initial grouping of the inputs into groups that have similar effects on the response is critical for efficient group screening. The individual inputs can be divided into groups using information from subject experts, or using exploratory data analysis of the Stage 1 data, or a combination of the two. Alternatively, an automatic grouping

procedure can be used, as described below, where M is the user-selected maximum group size. The method uses Fisher-transformed Pearson correlation coefficients,

r_j = tanh^{-1}( r(\xi_j, y(X^{(1)})) ), j = 1, ..., f,

(see Fisher (1921)), where the correlation coefficient r(\xi_j, y(X^{(1)})) measures the strength of the linear relationship between the j-th input and the output.

Step 1. Set q = f.

Step 2. Compute the sample mean \bar{r} and the sample standard deviation s_r of r_1, ..., r_q. Let the reference distribution for r_1, ..., r_q be N(\bar{r}, s_r^2).

Step 3. Divide the reference distribution into \nu = \lceil q/M \rceil intervals, where the i-th boundary is defined to be the (i/\nu x 100)-th percentile of N(\bar{r}, s_r^2), i.e., \Phi^{-1}_{\bar{r}, s_r^2}(i/\nu), i = 1, ..., \nu - 1.

Step 4. Group r_1, ..., r_q into the \nu groups based on the boundaries of the reference distribution and count the number of elements observed in each group, h_1, ..., h_\nu.

Step 5. If h_1 > M and h_\nu > M, then go to Step 6. Otherwise, go to Step 7.

Step 6. Subdivide each of groups 1 and \nu, repeating Steps 2-4, first setting q = h_1 for the leftmost group, and then q = h_\nu for the rightmost group. Update \nu to the total number of groups, so that the corresponding group sizes are h_1, ..., h_\nu. Go to Step 5 with the updated groups.

Step 7. Sequentially examine h_1, h_2, ..., h_\nu. Let i be the smallest index for which h_i > M. If there is no i for which h_i > M, then stop and set m = \nu. Otherwise, sequentially examine h_\nu, h_{\nu-1}, ..., h_{\nu-(\nu-i)}. Let j be the smallest index for which h_{\nu-j} > M. Let q = h_i + h_{i+1} + ... + h_{\nu-j} and go to Step 8.

Step 8. Relabel r_1, ..., r_q corresponding to the inputs to be re-grouped and go to Step 2.
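A single-pass sketch of the grouping idea (Steps 1-4 only) is given below, assuming NumPy and the standard library's NormalDist; the recursive re-splitting of oversized end groups (Steps 5-8) is omitted, and the toy inputs, coefficients, and group-size limit are all hypothetical:

```python
import math
from statistics import NormalDist

import numpy as np

def initial_groups(X, y, M):
    """One pass of Steps 1-4: Fisher-transform each input/output
    correlation, then bin the r_j at equal-probability quantiles of the
    reference distribution N(r_bar, s_r^2), using nu = ceil(q / M) bins."""
    q = X.shape[1]
    r = np.array([np.arctanh(np.corrcoef(X[:, j], y)[0, 1]) for j in range(q)])
    nu = math.ceil(q / M)
    ref = NormalDist(mu=r.mean(), sigma=r.std(ddof=1))
    cuts = [ref.inv_cdf(i / nu) for i in range(1, nu)]
    return np.digitize(r, cuts)   # group label (0 .. nu-1) for each input

# Hypothetical toy data: 12 runs, 6 inputs, maximum group size M = 3;
# inputs 1 and 2 drive the response strongly, with opposite signs.
rng = np.random.default_rng(2)
X = rng.random((12, 6))
y = 10.0 * X[:, 0] - 10.0 * X[:, 1] + 0.01 * X[:, 2:].sum(axis=1)
labels = initial_groups(X, y, M=3)
```

In this toy example the first two inputs have Fisher-transformed correlations on opposite sides of the reference distribution, so they land in different groups.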

After the f individual inputs have been divided into the m groups, a design matrix G = (g_1, ..., g_m) is formed from a random selection of m columns of X for the group variables. From G, a design matrix X_P = (\xi_1^P, ..., \xi_f^P) is constructed in terms of the f individual inputs, where all the inputs in group i are set to the levels defined by g_i, i = 1, ..., m. For example, if inputs 1, 5, and 6 are assigned to group 1, then \xi_1^P = \xi_5^P = \xi_6^P = g_1. The design matrix X_P is used to predict the output based on the fitted GP model (Section 2.3.1). The resulting values, denoted by \hat{y}(X_P) (or, more simply, by \hat{y}(G)), are used in Section 2.3.3 to select the active groups. The training data, y(X^{(1)}), will be used again in Section 2.4.2 to select active individual inputs within the active groups.

2.3.3 Stage 1 Analysis Phase

Sensitivity Indices. In this section and in Section 2.4.2, the total effect sensitivity index (TESI) is used to detect active effects. This subsection reviews the definition of sensitivity indices when the input region is [0,1]^f. See Chapter 5 for more details.

Sobol (1993) showed that the function y(x) can be uniquely decomposed as

y(x) = y_0 + \sum_{j=1}^{f} y_j(x_j) + \sum_{1 \le j < h \le f} y_{jh}(x_j, x_h) + ... + y_{1,2,...,f}(x_1, ..., x_f),   (2.3)

where the terms are recursively defined by

y_0 = \int_{[0,1]^f} y(x_1, ..., x_f) dx_1 ... dx_f,

y_j(x_j) = \int_{[0,1]^{f-1}} y(x_1, ..., x_f) dx_{-j} - y_0,

y_{jh}(x_j, x_h) = \int_{[0,1]^{f-2}} y(x_1, ..., x_f) dx_{-jh} - y_j(x_j) - y_h(x_h) - y_0,

and so on. Here dx_{-j} denotes integration over all inputs except x_j, and dx_{-jh} denotes integration over all inputs except x_j and x_h. The individual components of Sobol's decomposition are centered; that is, they satisfy

\int_0^1 y_{j_1,...,j_s}(x_{j_1}, ..., x_{j_s}) dx_{j_k} = 0, for any 1 \le k \le s,

and orthogonal; that is, they satisfy

\int_{[0,1]^f} y_{j_1,...,j_s}(x_{j_1}, ..., x_{j_s}) y_{h_1,...,h_t}(x_{h_1}, ..., x_{h_t}) dx_1 ... dx_f = 0,

for any (j_1, ..., j_s) \ne (h_1, ..., h_t). Variance-based indices are obtained by squaring both sides of the Sobol decomposition (2.3) and integrating over [0,1]^f (Sobol (1993)). This leads to the variance decomposition

V = \sum_{j=1}^{f} V_j + \sum_{1 \le j < h \le f} V_{jh} + ... + V_{1,2,...,f},   (2.4)

where

V = \int_{[0,1]^f} y^2(x_1, ..., x_f) dx_1 ... dx_f - y_0^2, V_j = \int_0^1 y_j^2(x_j) dx_j, and V_{jh} = \int_{[0,1]^2} y_{jh}^2(x_j, x_h) dx_j dx_h,

and additional terms are defined similarly. Sensitivity indices are obtained by dividing each component in the variance decomposition (2.4) by the total variance V. The main effect sensitivity index of the j-th input is defined to be S_j = V_j / V. The two-factor sensitivity index of the j-th and h-th inputs is defined to be S_{jh} = V_{jh} / V. Higher-order sensitivity indices are defined similarly. The TESI of the j-th input (Homma and Saltelli (1996)) is the sum of all sensitivity indices involving the j-th input, i.e.,

T_j = S_j + \sum_{h \ne j} S_{jh} + ... + S_{1,2,...,f}.   (2.5)

The sensitivity indices are computed using the Bayesian method of Oakley and O'Hagan (2004) as implemented in GPM/SA; the sensitivity index is estimated by
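To make the definition of T_j concrete, the sketch below estimates TESIs by plain Monte Carlo using Jansen's estimator on a cheap analytic test function. This illustrates the index itself, not the Bayesian GP-based estimator used in GSinCE; the test function y(x) = x_1 + 2 x_2 (with a third, inert input) is hypothetical, and its exact indices are T = (0.2, 0.8, 0), since V_1 = 1/12, V_2 = 4/12, and V = 5/12:

```python
import numpy as np

def total_effect_indices(func, f, n=100_000, seed=0):
    """Monte Carlo TESIs via Jansen's estimator:
    T_j ~ E[(y(A) - y(A_B^j))^2] / (2 Var y), where A_B^j is the sample
    matrix A with its j-th column replaced by the j-th column of B."""
    rng = np.random.default_rng(seed)
    A = rng.random((n, f))
    B = rng.random((n, f))
    y_A = func(A)
    var_y = y_A.var()
    T = np.empty(f)
    for j in range(f):
        ABj = A.copy()
        ABj[:, j] = B[:, j]          # resample only input j
        T[j] = np.mean((y_A - func(ABj)) ** 2) / (2.0 * var_y)
    return T

# Additive test function of 3 inputs; the third input does not appear at
# all, so its TESI is exactly zero.
T = total_effect_indices(lambda X: X[:, 0] + 2.0 * X[:, 1], 3)
```

The inert third input returns a TESI of exactly zero here, which is the behavior that the GSinCE benchmark low-impact input is designed to mimic.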


More information

Learning Gaussian Process Models from Uncertain Data

Learning Gaussian Process Models from Uncertain Data Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada

More information

Maximin Strong Orthogonal Arrays

Maximin Strong Orthogonal Arrays Maximin Strong Orthogonal Arrays by Jiaying Weng B.Sc. Simon Fraser University 2012 Project Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in the Department of

More information

Response Surface Methodology

Response Surface Methodology Response Surface Methodology Process and Product Optimization Using Designed Experiments Second Edition RAYMOND H. MYERS Virginia Polytechnic Institute and State University DOUGLAS C. MONTGOMERY Arizona

More information

arxiv: v1 [stat.me] 24 May 2010

arxiv: v1 [stat.me] 24 May 2010 The role of the nugget term in the Gaussian process method Andrey Pepelyshev arxiv:1005.4385v1 [stat.me] 24 May 2010 Abstract The maximum likelihood estimate of the correlation parameter of a Gaussian

More information

Use of Design Sensitivity Information in Response Surface and Kriging Metamodels

Use of Design Sensitivity Information in Response Surface and Kriging Metamodels Optimization and Engineering, 2, 469 484, 2001 c 2002 Kluwer Academic Publishers. Manufactured in The Netherlands. Use of Design Sensitivity Information in Response Surface and Kriging Metamodels J. J.

More information

Generalized Latin hypercube design for computer experiments

Generalized Latin hypercube design for computer experiments Generalized Latin hypercube design for computer experiments Holger Dette Ruhr-Universität Bochum Fakultät für Mathematik 44780 Bochum, Germany e-mail: holger.dette@rub.de Andrey Pepelyshev Sheffield University

More information

Using Orthogonal Arrays in the Sensitivity Analysis of Computer Models

Using Orthogonal Arrays in the Sensitivity Analysis of Computer Models . Using Orthogonal Arrays in the Sensitivity Analysis of Computer Models Max D. Morris Department of Statistics, and Department of Industrial and Manufacturing Systems Engineering Iowa State University

More information

Tilburg University. Two-dimensional maximin Latin hypercube designs van Dam, Edwin. Published in: Discrete Applied Mathematics

Tilburg University. Two-dimensional maximin Latin hypercube designs van Dam, Edwin. Published in: Discrete Applied Mathematics Tilburg University Two-dimensional maximin Latin hypercube designs van Dam, Edwin Published in: Discrete Applied Mathematics Document version: Peer reviewed version Publication date: 2008 Link to publication

More information

Construction of column-orthogonal designs for computer experiments

Construction of column-orthogonal designs for computer experiments SCIENCE CHINA Mathematics. ARTICLES. December 2011 Vol. 54 No. 12: 2683 2692 doi: 10.1007/s11425-011-4284-8 Construction of column-orthogonal designs for computer experiments SUN FaSheng 1,2, PANG Fang

More information

Stratified Random Sampling for Dependent Inputs

Stratified Random Sampling for Dependent Inputs Stratified Random Sampling for Dependent Inputs Anirban Mondal Case Western Reserve University, Cleveland, OH 44106, USA arxiv:1904.00555v1 [stat.me] 1 Apr 2019 Abhijit Mandal Wayne State University, Detroit,

More information

Combining Experimental Data and Computer Simulations, With an Application to Flyer Plate Experiments

Combining Experimental Data and Computer Simulations, With an Application to Flyer Plate Experiments Combining Experimental Data and Computer Simulations, With an Application to Flyer Plate Experiments LA-UR-6- Brian Williams, Los Alamos National Laboratory Dave Higdon, Los Alamos National Laboratory

More information

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Linear Models 1. Isfahan University of Technology Fall Semester, 2014 Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and

More information

Polynomial chaos expansions for sensitivity analysis

Polynomial chaos expansions for sensitivity analysis c DEPARTMENT OF CIVIL, ENVIRONMENTAL AND GEOMATIC ENGINEERING CHAIR OF RISK, SAFETY & UNCERTAINTY QUANTIFICATION Polynomial chaos expansions for sensitivity analysis B. Sudret Chair of Risk, Safety & Uncertainty

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

A GENERAL THEORY FOR ORTHOGONAL ARRAY BASED LATIN HYPERCUBE SAMPLING

A GENERAL THEORY FOR ORTHOGONAL ARRAY BASED LATIN HYPERCUBE SAMPLING Statistica Sinica 26 (2016), 761-777 doi:http://dx.doi.org/10.5705/ss.202015.0029 A GENERAL THEORY FOR ORTHOGONAL ARRAY BASED LATIN HYPERCUBE SAMPLING Mingyao Ai, Xiangshun Kong and Kang Li Peking University

More information

Introduction to Gaussian Processes

Introduction to Gaussian Processes Introduction to Gaussian Processes Iain Murray murray@cs.toronto.edu CSC255, Introduction to Machine Learning, Fall 28 Dept. Computer Science, University of Toronto The problem Learn scalar function of

More information

Sensitivity analysis using the Metamodel of Optimal Prognosis. Lectures. Thomas Most & Johannes Will

Sensitivity analysis using the Metamodel of Optimal Prognosis. Lectures. Thomas Most & Johannes Will Lectures Sensitivity analysis using the Metamodel of Optimal Prognosis Thomas Most & Johannes Will presented at the Weimar Optimization and Stochastic Days 2011 Source: www.dynardo.de/en/library Sensitivity

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

PART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics

PART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics Table of Preface page xi PART I INTRODUCTION 1 1 The meaning of probability 3 1.1 Classical definition of probability 3 1.2 Statistical definition of probability 9 1.3 Bayesian understanding of probability

More information

Dynamic System Identification using HDMR-Bayesian Technique

Dynamic System Identification using HDMR-Bayesian Technique Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in

More information

A NEW CLASS OF NESTED (NEARLY) ORTHOGONAL LATIN HYPERCUBE DESIGNS

A NEW CLASS OF NESTED (NEARLY) ORTHOGONAL LATIN HYPERCUBE DESIGNS Statistica Sinica 26 (2016), 1249-1267 doi:http://dx.doi.org/10.5705/ss.2014.029 A NEW CLASS OF NESTED (NEARLY) ORTHOGONAL LATIN HYPERCUBE DESIGNS Xue Yang 1,2, Jian-Feng Yang 2, Dennis K. J. Lin 3 and

More information

Theory of Screening Procedures to Identify Robust Product Designs Using Fractional Factorial Experiments

Theory of Screening Procedures to Identify Robust Product Designs Using Fractional Factorial Experiments Theory of Screening Procedures to Identify Robust Product Designs Using Fractional Factorial Experiments Guohua Pan Biostatistics and Statistical Reporting Novartis Pharmaceuticals Corporation East Hanover,

More information

Available online: 01 Jan 2012

Available online: 01 Jan 2012 This article was downloaded by: Ohio State University Libraries] On: 30 April 0, At: :3 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 07954 Registered office:

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

Kullback-Leibler Designs

Kullback-Leibler Designs Kullback-Leibler Designs Astrid JOURDAN Jessica FRANCO Contents Contents Introduction Kullback-Leibler divergence Estimation by a Monte-Carlo method Design comparison Conclusion 2 Introduction Computer

More information

Stochastic optimization - how to improve computational efficiency?

Stochastic optimization - how to improve computational efficiency? Stochastic optimization - how to improve computational efficiency? Christian Bucher Center of Mechanics and Structural Dynamics Vienna University of Technology & DYNARDO GmbH, Vienna Presentation at Czech

More information

Kriging by Example: Regression of oceanographic data. Paris Perdikaris. Brown University, Division of Applied Mathematics

Kriging by Example: Regression of oceanographic data. Paris Perdikaris. Brown University, Division of Applied Mathematics Kriging by Example: Regression of oceanographic data Paris Perdikaris Brown University, Division of Applied Mathematics! January, 0 Sea Grant College Program Massachusetts Institute of Technology Cambridge,

More information

Gradient-enhanced kriging for high-dimensional problems

Gradient-enhanced kriging for high-dimensional problems Gradient-enhanced kriging for high-dimensional problems Mohamed A. Bouhlel mbouhlel@umich.edu Joaquim R. R. A. Martins jrram@umich.edu August 10, 2017 arxiv:1708.02663v1 [cs.lg] 8 Aug 2017 Abstract Surrogate

More information

The Pennsylvania State University The Graduate School A METHODOLOGY FOR EVALUATING SYSTEM-LEVEL UNCERTAINTY

The Pennsylvania State University The Graduate School A METHODOLOGY FOR EVALUATING SYSTEM-LEVEL UNCERTAINTY The Pennsylvania State University The Graduate School A METHODOLOGY FOR EVALUATING SYSTEM-LEVEL UNCERTAINTY IN THE CONCEPTUAL DESIGN OF COMPLEX MULTIDISCIPLINARY SYSTEMS A Thesis in Mechanical Engineering

More information

Contents. Acknowledgments. xix

Contents. Acknowledgments. xix Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables

More information

Modelling Under Risk and Uncertainty

Modelling Under Risk and Uncertainty Modelling Under Risk and Uncertainty An Introduction to Statistical, Phenomenological and Computational Methods Etienne de Rocquigny Ecole Centrale Paris, Universite Paris-Saclay, France WILEY A John Wiley

More information

Large Scale Modeling by Bayesian Updating Techniques

Large Scale Modeling by Bayesian Updating Techniques Large Scale Modeling by Bayesian Updating Techniques Weishan Ren Centre for Computational Geostatistics Department of Civil and Environmental Engineering University of Alberta Large scale models are useful

More information

Stochastic Spectral Approaches to Bayesian Inference

Stochastic Spectral Approaches to Bayesian Inference Stochastic Spectral Approaches to Bayesian Inference Prof. Nathan L. Gibson Department of Mathematics Applied Mathematics and Computation Seminar March 4, 2011 Prof. Gibson (OSU) Spectral Approaches to

More information

Hierarchical Modeling for Univariate Spatial Data

Hierarchical Modeling for Univariate Spatial Data Hierarchical Modeling for Univariate Spatial Data Geography 890, Hierarchical Bayesian Models for Environmental Spatial Data Analysis February 15, 2011 1 Spatial Domain 2 Geography 890 Spatial Domain This

More information

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Monitoring Wafer Geometric Quality using Additive Gaussian Process

Monitoring Wafer Geometric Quality using Additive Gaussian Process Monitoring Wafer Geometric Quality using Additive Gaussian Process Linmiao Zhang 1 Kaibo Wang 2 Nan Chen 1 1 Department of Industrial and Systems Engineering, National University of Singapore 2 Department

More information

Estimating percentiles of uncertain computer code outputs

Estimating percentiles of uncertain computer code outputs Appl. Statist. (2004) 53, Part 1, pp. 83 93 Estimating percentiles of uncertain computer code outputs Jeremy Oakley University of Sheffield, UK [Received June 2001. Final revision June 2003] Summary. A

More information

Spatial smoothing using Gaussian processes

Spatial smoothing using Gaussian processes Spatial smoothing using Gaussian processes Chris Paciorek paciorek@hsph.harvard.edu August 5, 2004 1 OUTLINE Spatial smoothing and Gaussian processes Covariance modelling Nonstationary covariance modelling

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

A CENTRAL LIMIT THEOREM FOR NESTED OR SLICED LATIN HYPERCUBE DESIGNS

A CENTRAL LIMIT THEOREM FOR NESTED OR SLICED LATIN HYPERCUBE DESIGNS Statistica Sinica 26 (2016), 1117-1128 doi:http://dx.doi.org/10.5705/ss.202015.0240 A CENTRAL LIMIT THEOREM FOR NESTED OR SLICED LATIN HYPERCUBE DESIGNS Xu He and Peter Z. G. Qian Chinese Academy of Sciences

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Outline Introduction OLS Design of experiments Regression. Metamodeling. ME598/494 Lecture. Max Yi Ren

Outline Introduction OLS Design of experiments Regression. Metamodeling. ME598/494 Lecture. Max Yi Ren 1 / 34 Metamodeling ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 1, 2015 2 / 34 1. preliminaries 1.1 motivation 1.2 ordinary least square 1.3 information

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information