Introduction to Design and Analysis of Computer Experiments
Thomas Santner, Department of Statistics, The Ohio State University
October 2010
Outline: Empirical Experimentation
Empirical Experimentation

Physical experiments (agriculture, industry, medicine) are the gold standard for determining the relationship between an endpoint of scientific interest and the potential factors that affect it.

Viewpoint: experimental output = true I/O relationship + noise,

    Y_p(x) = eta_true(x) + epsilon(x),

where eta_true(x) is the unknown true I/O relationship at input x.
Empirical Experimentation: Some Challenges for Physical Experiments

- There are scientifically unimportant nuisance (blocking) factors; model these factors via the x_i.
- Some factors affecting the outcome are not recognized; randomize treatments to experimental units to prevent unrecognized factors from systematically affecting treatment comparisons.
- Choice of sample size: determine right-sized experiments to detect important treatment differences.
- Stochastic models relate the inputs x to the output Y_p(x).
Empirical Experimentation: Stochastic Simulation Experiments

A popular tool for studying complex physical systems, each of whose parts behaves in a known stochastic manner but whose ensemble behavior is not understood analytically. (Heavily used in industrial engineering, e.g., to compare job-shop setups.)
Computer and Physical Experiments

A computer experiment is the use of a (deterministic) computer simulator that implements a detailed mathematical model to study an input/output relationship of scientific interest (a micro model, versus the macro model used in stochastic simulation experiments).

- The mathematical model is typically a PDE/ODE, implemented via finite elements, CFD, ...
- Some applications have data from both (one or more) computer experiment(s) and (one or more) physical experiment(s) to investigate the true input/output relationship of scientific interest. Thus computer experiments are used as adjuncts to, or surrogates for, physical experiments for studying input/output relationships.
Designing an Acetabular Cup

Ong et al. (2006) conducted an uncertainty analysis of the effects of engineering cup design, surgical, and patient variables on the stability of uncemented acetabular cups (porous ingrowth cups, which rely on growth of the bone into the prosthesis).

- Engineering (cup design) inputs: stem length, stem cross section, sphericity of the acetabular cup, etc.
- Patient inputs: elastic modulus of the bone, friction between the prosthesis stem and the bone.
- Surgical inputs: accuracy of the reamed acetabular cup in the pelvis (gouges in the area between the rear of the acetabular cup insert and the bone provide space for later debris buildup).
- Output: one of three measures of the amount of ingrowth of the bone into the acetabular cup.
Other Disciplinary Examples

- Biomechanics: Rawlinson et al. (2006) simulate the stresses and strains experienced by knee prostheses of different engineering designs.
- Aeronautical Engineering
- Drug Development
- Economics: Lempert et al. (2000) introduce the Wonderland model, which simulates world economic development under given scenarios of innovation and pollution.
- Tissue Engineering: Spilker (2009) modeled the kinetics of fluid flow in a two-phase model of cartilage under sinusoidal loading.
Features of Computer Simulators: x -> Code -> y(x)

- y(x) is a proxy for the physical process (based on simplified physics or biology).
- y(x) is presumed to be a biased representation of the true input-response relationship (the true relationship would be obtainable by a physical experiment).
- y(x) need not be replicated (the code is deterministic), in contrast to stochastic computer simulations.
- x can be high-dimensional (and is often functional).
- When x is high-dimensional, screening is important.
- Computer codes used in practice have running times that can range from minutes to days.
Prediction Based on Computer Output

Suppose we have only computer output and no data from an associated physical experiment (momentarily forget about the possibility of data from an associated physical experiment).

Warning: we can't determine the code's bias in such a setting.

Notation: let the inputs and outputs of the training data be

    (x_1^tr, y_c(x_1^tr)), ..., (x_n^tr, y_c(x_n^tr)).

Goal: predict y_c(x_0), where x_0 is an untried ("new") input.
Prediction Based on Computer Output

Idea: regard y_c(x) as a realization (a "draw") from a random function Y_c(x). This is a Bayesian model; the desired smoothness properties of y_c(x) must be built into all functions drawn from Y_c(x).
Prediction Based on Computer Output

This provides flexibility. [Figure: draws from three "urns" (Y_c(x) processes) with different correlation functions R(h)]
Prediction Based on Computer Output

- We use the training data to pick the exact Y_c(x) "urn" we draw from.
- A naive predictor: y_hat_c(x_0) = E{Y_c(x_0) | training data}.
- A measure of prediction uncertainty (due to uncertainty about the model) is Var(Y_c(x_0) | training data).
The simplest possible model for Y(x) is the regression-like model

    Y(x) = sum_{j=1}^k beta_j f_j(x) + Z(x) = beta' f(x) + Z(x)

(Y(x) = large-scale trend + smooth deviations), where

- f_1(x), ..., f_k(x) are known regression functions,
- beta_1, ..., beta_k are unknown regression coefficients, and
- usually beta' f(x) = beta_0 in computer experiments.
Y(x) = sum_{j=1}^k beta_j f_j(x) + Z(x) = beta' f(x) + Z(x)

In regression we regard deviations from the regression mean as measurement errors, meaning that Z(x_1) and Z(x_2) are unrelated ("white noise"). In computer experiments we want a Z(x) model that exhibits smooth deviations from the large-scale trend. The simplest such model for Z(x) is a stationary Gaussian stochastic process ("GaSP").
Stationary GaSPs

Z(x) is stationary if, for any x, the random variable Z(x) has constant mean and constant variance, and Cor(Z(x_1), Z(x_2)) depends only on x_1 - x_2, i.e.,

- E{Z(x)} = 0 (so E{Y(x)} = beta' f(x) (= beta_0)),
- Var(Z(x)) = sigma_Z^2 (so Var(Y(x)) = sigma_Z^2),
- Cor(Z(x_1), Z(x_2)) is determined by a correlation function, i.e., an R(.) satisfying R(0) = 1 with Cor(Z(x_1), Z(x_2)) = R(x_1 - x_2) (so Cor(Y(x_1), Y(x_2)) = R(x_1 - x_2) and Cov(Y(x_1), Y(x_2)) = sigma_Z^2 R(x_1 - x_2)).
Stationary GaSPs

For any s >= 1 and x_1, ..., x_s,

    Z = (Z_1, ..., Z_s) = (Z(x_1), ..., Z(x_s)) is multivariate normal.

The process Y(.) is completely determined once we identify (beta_0, sigma_Z^2, R(.)), i.e.,

    Y = (Y_1, ..., Y_s) = (Y(x_1), ..., Y(x_s)) ~ N_s(F beta, sigma_Z^2 R),

where F = F(x_1, ..., x_s) is s x k and R, with R_{i,j} = R(x_i - x_j), is s x s.

Typically we assume R(.) = R(. | xi) is known up to a finite vector of parameters xi, i.e., R(.) is parametric.
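The "urn" picture can be made concrete by sampling finite-dimensional draws of Y = (Y(x_1), ..., Y(x_s)) ~ N_s(beta_0 1, sigma_Z^2 R). A minimal NumPy sketch (not from the slides; the grid, the value of xi, and the jitter term are illustrative choices):

```python
import numpy as np

def gauss_corr(x1, x2, xi):
    """Gaussian correlation R(z) = exp(-xi * z^2) for 1-d inputs."""
    z = x1[:, None] - x2[None, :]
    return np.exp(-xi * z**2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
R = gauss_corr(x, x, xi=25.0)
beta0, sigma2 = 0.0, 1.0

# Each column of `draws` is one smooth realization y(x) from the process:
# draw ~ N(beta0 * 1, sigma2 * R), via a Cholesky factor of R (+ jitter
# for numerical positive-definiteness).
L = np.linalg.cholesky(R + 1e-8 * np.eye(len(x)))
draws = beta0 + np.sqrt(sigma2) * (L @ rng.standard_normal((len(x), 3)))
print(draws.shape)
```

Choosing a different correlation function (or different xi) changes how smooth and how wiggly the sampled paths are, which is exactly the flexibility the three-urn figure illustrates.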
Popular Correlation Functions

Let z = x_1 - x_2 = (z_1, ..., z_d).

White noise:

    R(z) = 1 if z = 0, and 0 if z != 0, so that R = I_s.

Product power (Gaussian) correlation function:

    R(z) = exp(-sum_{j=1}^d xi_j z_j^2) = prod_{j=1}^d exp(-xi_j z_j^2),  z in R^d.
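The product Gaussian correlation above is straightforward to evaluate for a matrix of d-dimensional inputs. A small sketch (function and input names are illustrative):

```python
import numpy as np

def prod_gauss_corr(X1, X2, xi):
    """Product power (Gaussian) correlation matrix:
    R(z) = exp(-sum_j xi_j z_j^2) for all pairs of rows of X1 and X2."""
    # z has shape (n1, n2, d): all pairwise coordinate differences
    z = X1[:, None, :] - X2[None, :, :]
    return np.exp(-np.einsum('ijk,k->ij', z**2, xi))

X = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 0.0]])
xi = np.array([4.0, 1.0])   # one scale parameter per input dimension
R = prod_gauss_corr(X, X, xi)
print(np.diag(R))  # R(0) = 1 on the diagonal
```

Note that each coordinate gets its own xi_j, so the fitted correlation can be "tight" in important directions and nearly flat in inert ones, which is one reason this family is popular for screening.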
Popular Correlation Functions

Product cubic correlation function:

    R(z | xi) = prod_{j=1}^d R(z_j | xi_j), where xi = (xi_1, ..., xi_d) > 0,

and, for xi > 0 and z in R,

    R(z | xi) = 1 - 6 (z/xi)^2 + 6 (|z|/xi)^3,   |z| <= xi/2
              = 2 (1 - |z|/xi)^3,                xi/2 < |z| <= xi
              = 0,                               xi < |z|.
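Unlike the Gaussian correlation, the cubic correlation is exactly zero beyond the range xi, so the resulting correlation matrices are sparse for spread-out designs. A sketch of the one-dimensional piece (assuming the standard piecewise-cubic form above):

```python
import numpy as np

def cubic_corr_1d(z, xi):
    """One-dimensional cubic correlation with range parameter xi > 0."""
    a = np.abs(np.asarray(z, dtype=float)) / xi
    r = np.zeros_like(a)                      # zero beyond |z| > xi
    near = a <= 0.5
    mid = (a > 0.5) & (a <= 1.0)
    r[near] = 1.0 - 6.0 * a[near]**2 + 6.0 * a[near]**3
    r[mid] = 2.0 * (1.0 - a[mid])**3
    return r

z = np.array([0.0, 0.25, 0.5, 0.75, 1.5])
print(cubic_corr_1d(z, xi=1.0))
```

The two pieces meet at |z| = xi/2 (both give 1/4 when xi = 1), so the function is continuous, and it decreases smoothly to 0 at |z| = xi.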
The Multivariate Normal Distribution

Definition: X = (X_1, ..., X_d) has the multivariate normal distribution with mean mu and covariance matrix Sigma, denoted X ~ N_d(mu, Sigma), if X has joint pdf

    (2 pi)^{-d/2} det(Sigma)^{-1/2} exp{ -(1/2) (x - mu)' Sigma^{-1} (x - mu) }.

Recall that if

    W = (W_1, W_2) ~ N_d( (mu_1, mu_2), [[Sigma_11, Sigma_12], [Sigma_21, Sigma_22]] ),

then the conditional distribution of W_1 given W_2 = w_2, denoted [W_1 | W_2 = w_2], is

    N_{d_1}( mu_1 + Sigma_12 Sigma_22^{-1} (w_2 - mu_2), Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21 ).
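The conditioning formula is the engine behind all the predictors that follow. A direct translation (assuming Sigma_21 = Sigma_12', as holds for any covariance matrix; names are illustrative):

```python
import numpy as np

def mvn_conditional(mu1, mu2, S11, S12, S22, w2):
    """Mean and covariance of W1 | W2 = w2 for a partitioned multivariate
    normal with blocks S11, S12, S22 (S21 = S12.T)."""
    K = S12 @ np.linalg.inv(S22)
    return mu1 + K @ (w2 - mu2), S11 - K @ S12.T

# Bivariate example: correlation 0.8, observe W2 = 2.
mu1, mu2 = np.array([0.0]), np.array([0.0])
S11, S12, S22 = np.array([[1.0]]), np.array([[0.8]]), np.array([[1.0]])
m, V = mvn_conditional(mu1, mu2, S11, S12, S22, np.array([2.0]))
print(m, V)  # mean 0.8 * 2 = 1.6, variance 1 - 0.8^2 = 0.36
```

Observing W2 both shifts the mean of W1 toward the data and shrinks its variance, which is exactly how training-data output will sharpen prediction of Y_c(x_0) below.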
Application

Naive predictor:

    y_hat_c(x_0) = E{Y_c(x_0) | Y^tr} = sum_{j=1}^k beta_j f_j(x_0) + r_0' R^{-1} (Y^tr - F beta).

Its uncertainty is

    Var(Y_c(x_0) | Y^tr) = sigma_Z^2 (1 - r_0' R^{-1} r_0).

Usually we don't know (beta, sigma_Z^2, xi) :-(
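With known (beta_0, sigma_Z^2, xi), the predictor and its variance are a few lines of linear algebra. A sketch for a constant-mean model with a 1-d Gaussian correlation (the test function and parameter values are illustrative, not from the slides):

```python
import numpy as np

def krige_predict(x0, Xtr, ytr, beta0, sigma2, xi):
    """Plug-in predictor y_hat(x0) and its variance for a constant-mean GaSP
    with 1-d Gaussian correlation R(z) = exp(-xi * z^2)."""
    R = np.exp(-xi * (Xtr[:, None] - Xtr[None, :])**2)
    r0 = np.exp(-xi * (x0 - Xtr)**2)          # correlations with x0
    Rinv = np.linalg.inv(R)
    yhat = beta0 + r0 @ Rinv @ (ytr - beta0)
    var = sigma2 * (1.0 - r0 @ Rinv @ r0)
    return yhat, var

Xtr = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
ytr = np.sin(2 * np.pi * Xtr)
yhat, var = krige_predict(0.5, Xtr, ytr, beta0=0.0, sigma2=1.0, xi=10.0)
print(yhat, var)
```

Note the deterministic-code behavior: at a training input the predictor reproduces the observed output exactly and the variance collapses to zero, because r_0 is then a row of R.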
Inference

Frequentist inference: the choice of "urn" is equivalent to the selection of (beta_0, sigma_Z^2, xi). Use the training data to estimate (beta_0, sigma_Z^2, xi) (the "urn"), which is the basis for the predictor

    y_hat_c(x_0) = E{Y_c(x_0) | training data, beta_0_hat, sigma_Z^2_hat, xi_hat}.

Bayesian inference places a prior distribution on (beta_0, sigma_Z^2, xi), uses the posterior [beta_0, sigma_Z^2, xi | training data] to select a collection of urns, and averages the predictors from each urn:

    y_hat_c(x_0) = E{ E{Y_c(x_0) | training data, beta_0, sigma_Z^2, xi} },

where the outer E{.} is with respect to [beta_0, sigma_Z^2, xi | training data].
Illustration: true curve (solid); n = 5 training data points (diamonds); REML-EBLUP with Gaussian correlation function (dotted). [Figure: true and predicted y(x) versus x]
- The frequentist plug-in y_hat(x_0) is the basic off-the-shelf predictor of y(x_0) based on training data.
- When (beta_0, sigma_Z^2, xi) are known, y_hat(x_0) has many theoretical optimality properties, although when these parameters are estimated from the data, the list of optimality properties is much shorter.
- How should the data be used to estimate (beta_0, sigma_Z^2, xi)? MLE? REML? PMLE? Cross-validation? No plug-in method overcomes the smoothing problem seen in the example, although Bayesian predictors can.
- Bayesian methodology can be used to make more legitimate predictors and honest estimates of prediction uncertainty.
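As one concrete instance of the estimation question, maximum likelihood for this model is usually computed by profiling: for each candidate xi, beta_0 and sigma_Z^2 have closed-form estimates, leaving a one-dimensional search over xi. A rough sketch (grid search over an illustrative range; the jitter, data, and grid are assumptions, not from the slides):

```python
import numpy as np

def neg_profile_loglik(xi, Xtr, ytr):
    """Negative concentrated log-likelihood for a constant-mean GaSP with
    Gaussian correlation: beta0 and sigma2 are profiled out in closed form."""
    n = len(Xtr)
    R = np.exp(-xi * (Xtr[:, None] - Xtr[None, :])**2) + 1e-8 * np.eye(n)
    Rinv = np.linalg.inv(R)
    one = np.ones(n)
    beta0 = (one @ Rinv @ ytr) / (one @ Rinv @ one)   # GLS estimate of beta0
    res = ytr - beta0
    sigma2 = (res @ Rinv @ res) / n                   # plug-in variance
    _, logdet = np.linalg.slogdet(R)
    return 0.5 * (n * np.log(sigma2) + logdet)

Xtr = np.linspace(0.0, 1.0, 6)
ytr = np.sin(2 * np.pi * Xtr)
grid = np.linspace(1.0, 50.0, 100)
xi_hat = grid[np.argmin([neg_profile_loglik(g, Xtr, ytr) for g in grid])]
print(xi_hat)
```

REML and penalized variants change the objective but keep the same profile-then-search structure; all of them feed the resulting estimates into the plug-in predictor.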
Other limitations: there are many cases where y(.) does not look like a draw from a stationary GaSP; we need predictors based on non-stationary models.

There are two approaches to the design of a computer experiment, i.e., the choice of {x_i^tr}_{i=1}^n:

- Approach 1: geometric — choose the design to be space-filling.
- Approach 2: criteria-based designs for computer experiments, e.g., find an input that achieves argmin_x y_c(x).
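The standard space-filling construction under Approach 1 is the Latin hypercube design: each input dimension is divided into n equal-probability strata and each stratum is used exactly once. A minimal sketch of a random Latin hypercube on [0, 1]^d (production code would typically use a library routine such as scipy.stats.qmc.LatinHypercube):

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """Random Latin hypercube design of n points in [0, 1]^d: each column
    visits each of the n strata [i/n, (i+1)/n) exactly once, with a random
    jitter inside the stratum."""
    u = rng.random((n, d))                                   # within-stratum jitter
    strata = np.array([rng.permutation(n) for _ in range(d)]).T
    return (strata + u) / n

rng = np.random.default_rng(1)
X = latin_hypercube(10, 2, rng)
# Every coordinate projection hits each of the 10 strata exactly once:
print(sorted((X[:, 0] * 10).astype(int).tolist()))
```

This one-point-per-stratum property guarantees good one-dimensional projections, which matters when only a few of the d inputs turn out to be active.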
72 Questions?
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationLinear Models for Regression
Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationBayesian Linear Regression [DRAFT - In Progress]
Bayesian Linear Regression [DRAFT - In Progress] David S. Rosenberg Abstract Here we develop some basics of Bayesian linear regression. Most of the calculations for this document come from the basic theory
More informationModelling Non-linear and Non-stationary Time Series
Modelling Non-linear and Non-stationary Time Series Chapter 2: Non-parametric methods Henrik Madsen Advanced Time Series Analysis September 206 Henrik Madsen (02427 Adv. TS Analysis) Lecture Notes September
More informationTopic 12 Overview of Estimation
Topic 12 Overview of Estimation Classical Statistics 1 / 9 Outline Introduction Parameter Estimation Classical Statistics Densities and Likelihoods 2 / 9 Introduction In the simplest possible terms, the
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationMachine Learning 4771
Machine Learning 4771 Instructor: Tony Jebara Topic 11 Maximum Likelihood as Bayesian Inference Maximum A Posteriori Bayesian Gaussian Estimation Why Maximum Likelihood? So far, assumed max (log) likelihood
More informationModeling Data with Linear Combinations of Basis Functions. Read Chapter 3 in the text by Bishop
Modeling Data with Linear Combinations of Basis Functions Read Chapter 3 in the text by Bishop A Type of Supervised Learning Problem We want to model data (x 1, t 1 ),..., (x N, t N ), where x i is a vector
More informationEcon 424 Time Series Concepts
Econ 424 Time Series Concepts Eric Zivot January 20 2015 Time Series Processes Stochastic (Random) Process { 1 2 +1 } = { } = sequence of random variables indexed by time Observed time series of length
More informationUncertainty Quantification for Inverse Problems. November 7, 2011
Uncertainty Quantification for Inverse Problems November 7, 2011 Outline UQ and inverse problems Review: least-squares Review: Gaussian Bayesian linear model Parametric reductions for IP Bias, variance
More informationCovariance function estimation in Gaussian process regression
Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian
More informationIntroduction to Gaussian Processes
Introduction to Gaussian Processes Iain Murray murray@cs.toronto.edu CSC255, Introduction to Machine Learning, Fall 28 Dept. Computer Science, University of Toronto The problem Learn scalar function of
More informationIntegrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University
Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y
More informationGAUSSIAN PROCESS REGRESSION
GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The
More informationBootstrap, Jackknife and other resampling methods
Bootstrap, Jackknife and other resampling methods Part III: Parametric Bootstrap Rozenn Dahyot Room 128, Department of Statistics Trinity College Dublin, Ireland dahyot@mee.tcd.ie 2005 R. Dahyot (TCD)
More informationWhy experimenters should not randomize, and what they should do instead
Why experimenters should not randomize, and what they should do instead Maximilian Kasy Department of Economics, Harvard University Maximilian Kasy (Harvard) Experimental design 1 / 42 project STAR Introduction
More informationRegression, Ridge Regression, Lasso
Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.
More informationAnalysing geoadditive regression data: a mixed model approach
Analysing geoadditive regression data: a mixed model approach Institut für Statistik, Ludwig-Maximilians-Universität München Joint work with Ludwig Fahrmeir & Stefan Lang 25.11.2005 Spatio-temporal regression
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationIntroduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016
Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An
More informationDATA-ADAPTIVE VARIABLE SELECTION FOR
DATA-ADAPTIVE VARIABLE SELECTION FOR CAUSAL INFERENCE Group Health Research Institute Department of Biostatistics, University of Washington shortreed.s@ghc.org joint work with Ashkan Ertefaie Department
More informationSome Concepts of Probability (Review) Volker Tresp Summer 2018
Some Concepts of Probability (Review) Volker Tresp Summer 2018 1 Definition There are different way to define what a probability stands for Mathematically, the most rigorous definition is based on Kolmogorov
More informationUncertainty quantification and calibration of computer models. February 5th, 2014
Uncertainty quantification and calibration of computer models February 5th, 2014 Physical model Physical model is defined by a set of differential equations. Now, they are usually described by computer
More informationBayesian Approaches Data Mining Selected Technique
Bayesian Approaches Data Mining Selected Technique Henry Xiao xiao@cs.queensu.ca School of Computing Queen s University Henry Xiao CISC 873 Data Mining p. 1/17 Probabilistic Bases Review the fundamentals
More informationCOMS 4771 Regression. Nakul Verma
COMS 4771 Regression Nakul Verma Last time Support Vector Machines Maximum Margin formulation Constrained Optimization Lagrange Duality Theory Convex Optimization SVM dual and Interpretation How get the
More informationIntroduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak
Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationMultivariate statistical methods and data mining in particle physics
Multivariate statistical methods and data mining in particle physics RHUL Physics www.pp.rhul.ac.uk/~cowan Academic Training Lectures CERN 16 19 June, 2008 1 Outline Statement of the problem Some general
More informationENSC327 Communications Systems 19: Random Processes. Jie Liang School of Engineering Science Simon Fraser University
ENSC327 Communications Systems 19: Random Processes Jie Liang School of Engineering Science Simon Fraser University 1 Outline Random processes Stationary random processes Autocorrelation of random processes
More informationSimulation optimization via bootstrapped Kriging: Survey
Simulation optimization via bootstrapped Kriging: Survey Jack P.C. Kleijnen Department of Information Management / Center for Economic Research (CentER) Tilburg School of Economics & Management (TiSEM)
More informationSequential adaptive designs in computer experiments for response surface model fit
Statistics and Applications Volume 6, Nos. &, 8 (New Series), pp.7-33 Sequential adaptive designs in computer experiments for response surface model fit Chen Quin Lam and William I. Notz Department of
More informationGaussian Processes (10/16/13)
STA561: Probabilistic machine learning Gaussian Processes (10/16/13) Lecturer: Barbara Engelhardt Scribes: Changwei Hu, Di Jin, Mengdi Wang 1 Introduction In supervised learning, we observe some inputs
More informationMIXED EFFECTS MODELS FOR TIME SERIES
Outline MIXED EFFECTS MODELS FOR TIME SERIES Cristina Gorrostieta Hakmook Kang Hernando Ombao Brown University Biostatistics Section February 16, 2011 Outline OUTLINE OF TALK 1 SCIENTIFIC MOTIVATION 2
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationIntroduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones
Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationMonitoring Wafer Geometric Quality using Additive Gaussian Process
Monitoring Wafer Geometric Quality using Additive Gaussian Process Linmiao Zhang 1 Kaibo Wang 2 Nan Chen 1 1 Department of Industrial and Systems Engineering, National University of Singapore 2 Department
More informationFte β, λ. N ne. ,λ 1. Z Rte R te,tr R 1
02 4 Bayesian Prediction Methodology from Y te ] ] Fte Y ϑ N ne +n s β, λ F Rte R te, R te, R 4..3 where ϑ depends on the case being studied. A second stage specifies the prior disibution of ϑ ]. The following
More informationLinear Regression and Discrimination
Linear Regression and Discrimination Kernel-based Learning Methods Christian Igel Institut für Neuroinformatik Ruhr-Universität Bochum, Germany http://www.neuroinformatik.rub.de July 16, 2009 Christian
More informationStatistical Distribution Assumptions of General Linear Models
Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions
More informationModels for models. Douglas Nychka Geophysical Statistics Project National Center for Atmospheric Research
Models for models Douglas Nychka Geophysical Statistics Project National Center for Atmospheric Research Outline Statistical models and tools Spatial fields (Wavelets) Climate regimes (Regression and clustering)
More informationmlegp: an R package for Gaussian process modeling and sensitivity analysis
mlegp: an R package for Gaussian process modeling and sensitivity analysis Garrett Dancik January 30, 2018 1 mlegp: an overview Gaussian processes (GPs) are commonly used as surrogate statistical models
More informationPhysical Experimental Design in Support of Computer Model Development. Experimental Design for RSM-Within-Model
DAE 2012 1 Physical Experimental Design in Support of Computer Model Development or Experimental Design for RSM-Within-Model Max D. Morris Department of Statistics Dept. of Industrial & Manufacturing Systems
More informationProbability and Estimation. Alan Moses
Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.
More informationMachine Learning - MT Linear Regression
Machine Learning - MT 2016 2. Linear Regression Varun Kanade University of Oxford October 12, 2016 Announcements All students eligible to take the course for credit can sign-up for classes and practicals
More information12 - Nonparametric Density Estimation
ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6
More informationModel Selection for Gaussian Processes
Institute for Adaptive and Neural Computation School of Informatics,, UK December 26 Outline GP basics Model selection: covariance functions and parameterizations Criteria for model selection Marginal
More informationMachine Learning (CS 567) Lecture 5
Machine Learning (CS 567) Lecture 5 Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More information