A Bayesian Approach to Prediction and Variable Selection Using Nonstationary Gaussian Processes


A Bayesian Approach to Prediction and Variable Selection Using Nonstationary Gaussian Processes

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Casey Davis, B.S., M.A., M.S.

Graduate Program in Statistics
The Ohio State University
2015

Dissertation Committee:
Dr. Christopher M. Hans, Co-Advisor
Dr. Thomas J. Santner, Co-Advisor
Dr. Matt Pratola

Copyright by Casey Davis, 2015

Abstract

This research proposes a Bayesian formulation of the composite Gaussian process (CGP) model of Ba and Joseph (2012). The composite Gaussian process model generalizes the regression plus stationary GP model in both a stationary and a nonstationary manner. The likelihood stage of the model combines two independent Gaussian processes, and the remaining stages place priors on the means, variances, and correlation parameters of the Gaussian processes. Markov chain Monte Carlo methods are used to estimate posterior predictions and prediction intervals, which are compared with predictions from the composite GP model, a treed GP model, and a universal kriging approach.

This research also develops screening methodology for experiments with many inputs that is based on a hierarchical Bayesian Gaussian process model. This flexible model is able to describe output functions having varying range and patterns of fluctuation. Screening is accomplished by identifying inputs with small posterior probability of being correlated with the output, using a Bayesian variable selection prior for the correlation parameters.

This is dedicated to those who have helped me get here.

Acknowledgments

I would like to thank my co-advisors, Professors Christopher Hans and Thomas Santner, for their ideas, support, patience, and willingness to help. I am extremely grateful for the effort they expended in helping me do the research for this dissertation and in editing and proofreading my writing. I truly lucked out with my advisors. I would also like to thank my parents, John and Beverly, for encouraging me through all twelve years of college. Without your support, I would not have made it through, and I might be living under a bridge somewhere. Thanks to my sister Katie for letting me whine and complain about grad school in general and dissertation writing in particular. Now it's your turn. To my brother Jamie and his wife Sara, thanks for letting me come over and play with the kids instead of putting me to work. To my stepmother Judy, thanks for making me feel welcome on my rare trips down South. I was lucky to have my cousins, Matt, Adam, and Annie, and sort-of cousin, Jackie, in Columbus during my time in school. You all made my time in school much more enjoyable. To Uncle Jim, thanks for the encouragement. To Chris, Grant, and John, thanks for the lunches, the emails, and the text messages that gave me a break in otherwise long days. To Agniva, Jingjing, Matt, and Rob, thanks for hanging out.

Vita

January 26 . . . . . . Born, Columbus, NC, USA
B.S. Mathematics, University of North Carolina Greensboro
M.A. Applied Economics, University of North Carolina Greensboro
M.S. Statistics, The Ohio State University
2-present . . . . . . Graduate Teaching Associate, The Ohio State University

Fields of Study

Major Field: Statistics

Table of Contents

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

1. Introduction
   Gaussian Process Models
      Stationary vs. Nonstationary Processes
   Applications of Gaussian Process Models
   Prediction Methodologies
      Kriging
      Treed Gaussian Processes
      Composite Gaussian Processes
   Variable Selection Methodologies
      Spike-and-Slab and Closely Related Methods
      Reference Distribution Variable Selection
      Two-Stage Sensitivity-Based Group Screening
   Markov Chain Monte Carlo Overview
   Overview of Dissertation

2. Bayesian Composite Gaussian Processes for Prediction
   The Stationary BCGP Model
      Priors
      Examples of Draws from the Stationary BCGP
      Computational Methods for Sampling the Posterior from the Stationary BCGP Model
   Prediction Based on the Stationary BCGP Model
      Stationary Examples
      Nonstationary Examples
   The Nonstationary BCGP Model
      Priors
      Examples of Draws from the Nonstationary BCGP
      Computational Methods for the Nonstationary BCGP Model
   Prediction Based on the Nonstationary BCGP Model
      Stationary Examples
      Nonstationary Examples
   Prediction Comparisons

3. Bayesian Composite Gaussian Processes for Variable Selection
   The BCGP Model for Variable Selection
      Priors
      Examples of Draws from the Nonstationary BCGP
      Computational Methods for the Nonstationary BCGP Model for Variable Selection
   Determining Those Variables That Are Active
   Examples
   Discussion

4. Contributions and Future Research
   Contributions
   Future Research

Appendices

A. Proof of Minimum MSPE Predictor and Derivations of Full Conditional Distributions
   A.1 Proof of Theorem 1
   A.2 Derivation of the Full Conditional Distribution of $\mu \mid \Lambda_{-\mu}, y$
   A.3 Derivation of the Full Conditional Distribution of $\mu_V \mid \Lambda_{-\mu_V}, y$
   A.4 Derivation of the Full Conditional Distribution of $\sigma_k^2 \mid \Lambda_{-\sigma_k^2}, y$
   A.5 Derivation of the Full Conditional Distribution of $p_i \mid \Lambda_{-p_i}, y$

B. Software Manuals
   B.1 Stationary BCGP for Prediction
      B.1.1 The MATLAB function BCGPpredStat.m
      B.1.2 Functions Called by BCGPpredStat
      B.1.3 An Example
   B.2 Nonstationary BCGP for Prediction
      B.2.1 The MATLAB function BCGPpredNonStat.m
      B.2.2 Functions Called by BCGPpredNonStat
      B.2.3 An Example
   B.3 Nonstationary BCGP for Variable Selection
      B.3.1 The MATLAB function BCGPvarSel.m
      B.3.2 Functions Called by BCGPvarSel
      B.3.3 An Example

List of Tables

2.1 MSPEs for Example Functions with Noise-Free Data

List of Figures

1.1 Ten draws from each of a stationary GP and a nonstationary GP, and sample covariances of $Y(.2)$ and $Y(.5)$ and of $Y(.5)$ and $Y(.8)$ plotted against each other
1.2 Example draws from the process in (1.7)
1.3 Example draws from the process in (1.7)
1.4 Kriging predictors for the function in (1.10) with 95% prediction intervals
1.5 Kriging predictors and plots of errors for the function in (1.11)
1.6 Kriging predictors for the function in (1.12) with 95% prediction intervals
1.7 Kriging predictors for the function in (1.13) with 95% prediction intervals
1.8 Kriging predictors and plots of errors for the function in (1.14)
1.9 An example partition
1.10 TGP predictors for the functions in (1.10), (1.12), and (1.13) with 95% prediction intervals and small $n$
1.11 TGP predictors for the functions in (1.10), (1.12), and (1.13) with 95% prediction intervals and larger $n$
1.12 TGP predictor and plot of errors on a grid of test locations for (1.11)
1.13 TGP predictor and plot of errors on a grid of test locations for (1.14)
1.14 Examples of two different fixed $v(x)$ functions and draws from the process in (1.16)
1.15 CGP predictors for the functions in (1.10), (1.12), and (1.13) with 95% prediction intervals (left column) and the estimated $v(x)$ functions (right column)
1.16 CGP predictor and plot of errors on a grid of test locations
1.17 CGP predictor and plot of errors on a grid of test locations
1.18 $N(0, v_0\gamma_i)$ and $N(0, v_1\gamma_i)$ densities, with intersections at $\pm\delta_{i\gamma}$
1.19 For (1.25), boxplots of posterior draws of correlation parameters for one iteration (left) and $m$ iterations combined (right)
1.20 For (1.26), boxplots of posterior draws of correlation parameters for one iteration (left) and $m$ iterations combined (right)
1.21 For (1.27), boxplots of posterior draws of correlation parameters for one iteration (left) and $m$ iterations combined (right)
1.22 For (1.28), boxplots of posterior draws of correlation parameters for one iteration (left) and $m$ iterations combined (right)
2.1 Example draws from the process in (2.1) for each $w \in \{.5, .75, 1\}$
2.2 An example draw from the process in (2.1) for each $w \in \{.5, .75, 1\}$
2.3 Stationary BCGP predictor and 95% prediction intervals for the function in (1.10) when the data is noise-free (left) and noisy (right)
2.4 Stationary BCGP predictor and plot of errors for the function in (1.10) when the data is noise-free (left) and noisy (right)
2.5 Stationary BCGP predictor and 95% prediction intervals for the function in (1.12) when the data is noise-free (left) and noisy (right)
2.6 Stationary BCGP predictor and 95% prediction intervals for the function in (1.13) when the data is noise-free (left) and noisy (right)
2.7 Stationary BCGP predictor and plot of errors for the function in (1.14) when the data is noise-free (left) and noisy (right)
2.8 Examples of two different fixed $\sigma^2(x)$ functions and draws from the process in (2.8) for each $w \in \{.5, .75, 1\}$
2.9 An example of a fixed $\sigma^2(x)$ function and two draws from the process in (2.8) for each $w \in \{.5, .75, 1\}$
2.10 An example draw from the process in (2.12) for $w \in \{.5, .75, 1\}$
2.11 Nonstationary BCGP predictor and 95% prediction intervals for the function in (1.10) and posterior mean of $\sigma^2(x)$ when the data is noise-free (left) and noisy (right)
2.12 Nonstationary BCGP predictor and plot of errors for the function in (1.10) when the data is noise-free (left) and noisy (right)
2.13 Posterior mean of $\sigma^2(x)$ when the data is noise-free (left) and noisy (right)
2.14 Nonstationary BCGP predictor and 95% prediction intervals for the function in (1.12) and posterior mean of $\sigma^2(x)$ when the data is noise-free (left) and noisy (right)
2.15 Nonstationary BCGP predictor and 95% prediction intervals for the function in (1.13) and posterior mean of $\sigma^2(x)$ when the data is noise-free (left) and noisy (right)
2.16 Nonstationary BCGP predictor and plot of errors for the function in (1.14) when the data is noise-free (left) and noisy (right)
2.17 Posterior mean of $\sigma^2(x)$ when the data is noise-free (left) and noisy (right)
3.1 An example of a fixed $\sigma^2(x)$ function and two draws from the process in (3.1) for each $w \in \{.5, .75, 1\}$
3.2 Two draws from the process in (3.1) for a fixed $\sigma^2(x)$ function and $w \in \{.5, .75, 1\}$
3.3 Boxplots of global correlation parameters (left) and local correlation parameters (right) for one iteration
3.4 Boxplots of global correlation parameters (left) and local correlation parameters (right) for $m$ iterations combined
3.5 Boxplots of global correlation parameters (left) and local correlation parameters (right) for one iteration
3.6 Boxplots of global correlation parameters (left) and local correlation parameters (right) for $m$ iterations combined
3.7 Boxplots of global correlation parameters (left) and local correlation parameters (right) for $m$ iterations combined
3.8 Boxplots of global correlation parameters (left) and local correlation parameters (right) for one iteration
3.9 Boxplots of global correlation parameters (left) and local correlation parameters (right) for $m$ iterations combined
3.10 Boxplots of global correlation parameters (left) and local correlation parameters (right) for one iteration
3.11 Boxplots of global correlation parameters (left) and local correlation parameters (right) for $m$ iterations combined
3.12 The true function in (3.2) in $x_1$ and $x_2$, and the training data
3.13 Boxplots of global correlation parameters (left) and local correlation parameters (right) for one iteration
3.14 Boxplots of global correlation parameters (left) and local correlation parameters (right) for $m$ iterations combined
B.1 Franke function
B.2 Franke function
B.3 True function

Chapter 1: Introduction

The goal of this chapter is to introduce Gaussian process models and some of their applications. A review of previous methods that have employed these models for both prediction and variable selection is presented in Sections 1.3 and 1.4, along with a general overview of Markov chain Monte Carlo methods in Section 1.5. This review will motivate and provide a foundation for the methodology presented later in this dissertation.

1.1 Gaussian Process Models

A Gaussian process (GP) can be thought of as an infinite-dimensional generalization of a multivariate normal distribution. A stochastic process $Y(x)$, $x \in \mathcal{X} \subseteq \mathbb{R}^d$, with underlying probability space $(\Omega, \mathcal{B}, P)$, is a GP if for any $n$ and any $x_1, \ldots, x_n$ in $\mathcal{X}$, the vector $\mathbf{Y} = (Y(x_1), \ldots, Y(x_n))^\top$ has a multivariate normal distribution (Santner et al., 2003). That is,

$$\mathbf{Y} \sim N_n(\boldsymbol{\mu}, \mathbf{C}), \qquad (1.1)$$

where $\boldsymbol{\mu} = (\mu(x_1), \ldots, \mu(x_n))^\top$ and $\mathbf{C}$ is an $n \times n$ covariance matrix whose $ij^{th}$ element is obtained from the covariance function, $C(x_i, x_j) = \mathrm{Cov}(Y(x_i), Y(x_j))$. A GP is fully specified by its mean function, $\mu(x) = E[Y(x)]$, and its covariance function. A function $C(\cdot,\cdot)$ is said to be a valid covariance function if it is a symmetric, positive semi-definite function; that is, if

$$\sum_{i=1}^{m}\sum_{j=1}^{m} \alpha_i \alpha_j C(x_i, x_j) \ge 0 \quad\text{and}\quad C(x, x') = C(x', x)$$

hold for any choice of $m$, $\alpha \in \mathbb{R}^m$, and $x_1, \ldots, x_m \in \mathcal{X}$ (see pg. 27 in Kuß, 2006). It is also common to work with correlation functions. A function $R(x, x') = \mathrm{Cor}(Y(x), Y(x'))$ is a valid correlation function if it is positive semi-definite, symmetric ($R(x, x') = R(x', x)$), and satisfies $R(x, x) = 1$.

Valid covariance and correlation functions are typically difficult to construct. However, there are some properties that make this construction easier. Let $C_i(x, x')$, $i = 1, 2$, be valid covariance functions and let $R_i(x, x')$, $i = 1, 2$, be valid correlation functions. Then:

1. $C(x, x') = C_1(x, x') + C_2(x, x')$ is a valid covariance function.
2. $C(x, x') = C_1(x, x')\,C_2(x, x')$ is a valid covariance function, and $R(x, x') = R_1(x, x')\,R_2(x, x')$ is a valid correlation function.
3. For $0 < \alpha < 1$, $C(x, x') = \alpha C_1(x, x') + (1 - \alpha) C_2(x, x')$ is a valid covariance function, and $R(x, x') = \alpha R_1(x, x') + (1 - \alpha) R_2(x, x')$ is a valid correlation function.

These three properties provide many methods for constructing valid correlation and covariance functions.
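As a quick numerical illustration of property 2, the following MATLAB sketch (not part of the dissertation's software; the grid and parameter values are arbitrary choices) builds two valid one-dimensional correlation matrices, multiplies them elementwise, and confirms that the product is still positive semi-definite by checking its eigenvalues.

```matlab
% Illustration: the elementwise product of two valid correlation matrices
% is again a valid correlation matrix (property 2 above).
x  = linspace(0, 1, 50)';            % arbitrary 1-D input grid
D2 = (x - x').^2;                    % squared distances |x_i - x_j|^2

R1 = exp(-25 * D2);                              % Gaussian correlation, theta = 25
R2 = (1 + 10*sqrt(D2)) .* exp(-10*sqrt(D2));     % Matern (nu = 3/2) correlation

Rprod  = R1 .* R2;                   % elementwise (Schur) product
minEig = min(eig((Rprod + Rprod')/2));
fprintf('smallest eigenvalue of R1 .* R2: %g\n', minEig)  % >= 0 up to round-off
```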

For example, many correlation functions used in practice are separable. A correlation function is separable if $R(x, x') = \prod_{i=1}^{d} R_i(x_i, x_i')$, where each $R_i(x_i, x_i')$ is a valid correlation function on $\mathbb{R}$. The validity of separable correlation functions follows directly from property 2 above. Some of the more common families of correlation functions include the Matérn family (very common in the geostatistical literature), the cubic correlation family, the spherical correlation family, and the power exponential family. The power exponential family has the form

$$R(x, x' \mid \theta) = \exp\left\{ -\sum_{j=1}^{d} \theta_j |x_j - x_j'|^{p_j} \right\}, \qquad 0 < p_j \le 2 \text{ and } \theta_j > 0,\ j = 1, \ldots, d. \qquad (1.2)$$

A special case of the power exponential family is when $p_j = 2$, $j = 1, \ldots, d$:

$$R(x, x' \mid \theta) = \exp\left\{ -\sum_{j=1}^{d} \theta_j (x_j - x_j')^2 \right\}. \qquad (1.3)$$

Equation (1.3) is the separable Gaussian correlation function. An equivalent parameterization for this correlation function is to let $\theta_j = -k \ln(\rho_j)$ for some positive constant $k$, so that

$$R(x, x' \mid \rho) = \prod_{j=1}^{d} \rho_j^{\,k (x_j - x_j')^2}, \qquad 0 < \rho_j < 1,\ j = 1, \ldots, d. \qquad (1.4)$$

This parameterization is often preferred because $\rho_j$ has the interpretation of the correlation between outputs at two inputs that differ only in the $j^{th}$ dimension by $1/\sqrt{k}$ of a unit (or $1/\sqrt{k}$ of the domain if the inputs have been scaled to $[0, 1]^d$). For example, this thesis will let $k = 16$, so that $\rho_j$ is the correlation between outputs at two inputs that differ only in the $j^{th}$ dimension by $1/4$ of a unit. Another common choice is $k = 4$, which has the same interpretation for $1/2$ of a unit. The Gaussian correlation function leads to smooth sample paths that are continuous and infinitely differentiable.
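A minimal MATLAB sketch of the correlation function in (1.4) is given below. The function name and the example values of $\rho$ are illustrative choices of my own; only the form of the correlation and the thesis's choice $k = 16$ come from the text.

```matlab
function R = gauss_corr_rho(X1, X2, rho, k)
% Separable Gaussian correlation in the rho-parameterization of (1.4):
%   R(x, x') = prod_j rho_j^( k * (x_j - x'_j)^2 )
% X1: n1-by-d, X2: n2-by-d, rho: 1-by-d with 0 < rho_j < 1, k: positive scalar.
if nargin < 4, k = 16; end            % the thesis's choice, so rho_j is the
                                      % correlation at a 1/4-unit separation
[n1, d] = size(X1);  n2 = size(X2, 1);
R = ones(n1, n2);
for j = 1:d
    D2 = (X1(:, j) - X2(:, j)').^2;   % squared separations in dimension j
    R  = R .* rho(j).^(k * D2);       % product over dimensions
end
end
```

For example, `gauss_corr_rho([0; 0.25], [0; 0.25], 0.6, 16)` returns a 2-by-2 matrix whose off-diagonal entries equal 0.6, the correlation of outputs 1/4 of a unit apart.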

Covariance and correlation functions may be either isotropic or anisotropic. A covariance function that satisfies $C(x, x') = C(\|x - x'\|)$, where $\|x - x'\|$ is Euclidean distance, is isotropic. This means that the covariance between two values depends only on the Euclidean distance between the two locations, so the covariance decays in the same manner in every direction. An example would be the Gaussian correlation function from (1.3) with $\theta_1 = \cdots = \theta_d = \theta$. A valid covariance function that satisfies $C(x, x') = C(\|x - x'\|_K)$, where $\|x - x'\|_K = [(x - x')^\top K (x - x')]^{1/2}$, is anisotropic. To ensure validity of the covariance function $C(\|x - x'\|_K)$, the isotropic correlation function $C(\|x - x'\|)$ must be positive definite and $K$ must be a symmetric, positive semi-definite matrix (see Abrahamsen). Commonly, $K = \mathrm{diag}(\theta_1, \ldots, \theta_d)$, $\theta_i > 0$, $i = 1, \ldots, d$. The anisotropic property is less restrictive in that the covariance does not have to decay at the same rate in every direction. An example of this is the Gaussian correlation function in (1.3) with at least one $\theta_i$ different from the others.

1.1.1 Stationary vs. Nonstationary Processes

A covariance function is stationary if, for any $x, x' \in \mathcal{X}$ and any translation $h \in \mathbb{R}^d$ such that $x + h, x' + h \in \mathcal{X}$, $C(x, x') = C(x + h, x' + h)$. A GP is stationary if, for any $n$, any $x_1, \ldots, x_n \in \mathcal{X}$, and any translation $h \in \mathbb{R}^d$ such that $x_1 + h, \ldots, x_n + h \in \mathcal{X}$, $(Y(x_1), \ldots, Y(x_n))$ and $(Y(x_1 + h), \ldots, Y(x_n + h))$ have the same mean vector and covariance matrix. This is equivalent to saying that a GP is stationary if its mean function is constant and its covariance function is stationary. In this case, $\mathrm{Var}(Y(x)) = \sigma^2$ for all $x \in \mathcal{X}$, and $C(\cdot, \cdot) = \sigma^2 R(\cdot, \cdot)$. Stationary GPs have the property that the covariance is the same for all pairs of locations that have the same relative orientation and the same Euclidean distance from each other. For an isotropic stationary GP, the covariance is a function only of the distance between points. In practice, the outputs $y = (y(x_1), \ldots, y(x_n)) = (Y(x_1, \omega), \ldots, Y(x_n, \omega))$, for some $\omega \in \Omega$, are observed from a single sample path. Stationary GPs have a property called ergodicity as long as $C(h) \to 0$ as $\|h\| \to \infty$. This property allows inference about the process as a whole based on a single draw from the process, $y(x)$. See Cressie (1993) for more details.

A GP is nonstationary if either the mean function is not constant or the covariance function is nonstationary. A nonconstant mean function means that it is not necessarily the case that $E[Y(x)] = E[Y(x + h)]$. A common technique for generating a nonconstant mean function is to let the mean depend on $x$, much like a regression model. This process has the form

$$Y(x) = \sum_{i=1}^{p} f_i(x)\beta_i + Z(x) = f^\top(x)\beta + Z(x),$$

where $f(x) = (f_1(x), \ldots, f_p(x))^\top$ is a vector of known regression functions, $\beta = (\beta_1, \ldots, \beta_p)^\top$ is a vector of unknown regression coefficients, and $Z(x)$ is a zero-mean stationary Gaussian process. If a covariance function is nonstationary, then it is not necessarily the case that $C(x, x') = C(x + h, x' + h)$. The covariance between two locations then depends not only on the orientation of and distance between the points, but also on the location of the points in $\mathcal{X}$. As mentioned previously, a stationary covariance function can be written as $\sigma^2 R(\cdot, \cdot)$. One possible method for constructing a nonstationary covariance function is to multiply the correlation function $R(\cdot, \cdot)$ by a nonconstant variance function $\sigma^2(\cdot)$ such that

$$C(x, x') = \sigma(x)\,\sigma(x')\,R(x, x'). \qquad (1.5)$$

A stationary covariance function is the special case where $\sigma^2(x) = \sigma^2$ for all $x$.
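The following MATLAB sketch illustrates the construction in (1.5). It draws sample paths from a stationary GP and from a nonstationary GP obtained by multiplying the same Gaussian correlation function by a nonconstant variance function; the particular variance function and all numeric settings here are illustrative choices of my own, not the ones used for Figure 1.1 below.

```matlab
% Draws from a stationary GP and from a nonstationary GP built as in (1.5):
%   C(x, x') = sigma(x) * sigma(x') * R(x, x')
x   = linspace(0, 1, 200)';                 % grid of input locations
rho = 0.6;  k = 16;                         % Gaussian correlation in the form (1.4)
R   = rho.^(k * (x - x').^2);               % stationary correlation matrix

sig2_stat = ones(size(x));                  % constant variance (stationary case)
sig2fun   = @(x) 0.25 + 2.0 * x.^2;         % illustrative nonconstant variance function
sig2_nons = sig2fun(x);

Cstat = sqrt(sig2_stat) .* sqrt(sig2_stat') .* R;   % sigma(x) sigma(x') R(x,x')
Cnons = sqrt(sig2_nons) .* sqrt(sig2_nons') .* R;

jit = 1e-8 * eye(numel(x));                 % small nugget for numerical stability
Ls  = chol(Cstat + jit, 'lower');
Ln  = chol(Cnons + jit, 'lower');

Z = randn(numel(x), 10);                    % ten independent standard normal vectors
Ystat = Ls * Z;                             % ten draws with constant variance
Ynons = Ln * Z;                             % ten draws with variance growing in x

plot(x, Ystat, 'b', x, Ynons, 'r');         % compare the two sets of sample paths
```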

An example is shown in Figure 1.1. The GP in the left column has a constant mean and a stationary Gaussian correlation function (1.4) with $\rho = .6$ and $\sigma^2(x) = \sigma^2 = 1$. The GP in the right column has a constant mean and correlation function (1.4) with $\rho = .6$, multiplied by the nonconstant variance function pictured in the top right panel in the manner of (1.5).

Figure 1.1: Ten draws from each of a stationary GP and a nonstationary GP, and sample covariances of $Y(.2)$ and $Y(.5)$ and of $Y(.5)$ and $Y(.8)$ plotted against each other.

The second row of Figure 1.1 shows that the variance at a given value of $x$ is the same for every $x$ in the left panel, but varies as $x$ varies in the right panel. The third row shows the values of $Y(.2)$ and $Y(.5)$ plotted against each other for repeated draws from each of the processes. Overlaid on each plot is a mean-zero bivariate normal density with covariance $C(.2, .5)$. The fourth row does the same for $x = .5$ and $x = .8$. In the left column, the empirical covariances of the samples in the third and fourth rows look the same, which is expected because the covariance there is a function only of the distance between the locations.

In the right column, the empirical covariances of the samples in the third and fourth rows look very different, emphasizing that the covariance structure varies throughout $\mathcal{X}$.

1.2 Applications of Gaussian Process Models

The general idea of modeling with GPs is to represent an unknown function as a realization of a Gaussian process. There are many possible functions that are consistent with a given dataset. A Gaussian process is used as a prior distribution over the infinite-dimensional space of functions (see Rasmussen and Williams, 2006). O'Hagan (1978) first introduced these priors over functions in a Bayesian regression context, although his approach was not fully Bayesian in that he did not use prior distributions on the correlation parameters. A fully Bayesian approach to modeling with Gaussian processes involves setting a prior for the mean function, $\mu(x)$, assuming a covariance function, $C(x, x')$, and assuming priors for the hyperparameters in $\Lambda$, where $\Lambda$ contains the correlation and variance parameters in $C(\cdot, \cdot)$. For example, for the stationary Gaussian covariance function $\sigma^2 R(\cdot, \cdot)$, where $R(\cdot, \cdot)$ is the Gaussian correlation function in (1.4), $\Lambda = (\sigma^2, \rho_1, \ldots, \rho_d)$. A mean function is often specified as a constant, $\mu(x) = \mu$, or as a linear model

$$\mu(x) = f^\top(x)\beta, \qquad (1.6)$$

where $f(x) = (f_1(x), \ldots, f_p(x))^\top$ is a vector of known regression functions and $\beta = (\beta_1, \ldots, \beta_p)^\top$ is a vector of unknown coefficients. The shape of the model in (1.6) is meant to approximate the trend in $Y(x)$. For example, if $x$ is one-dimensional, then a possible $f(x)$ might be $(1, x, x^2, \ldots, x^{p-1})^\top$, in which case the overall trend in $Y(x)$ can be approximated by a polynomial of degree $p - 1$. The constant mean is often given an improper uniform prior, $p(\mu) \propto 1$, or a $N(m, \sigma_\mu^2)$ prior distribution with $m$ and $\sigma_\mu^2$ assumed known. For a regression-type mean function, $\beta$ is often given the prior $\beta \sim N_p(b, B)$, where $b$ and $B$ are considered known, or an improper uniform prior with $p(\beta) \propto 1$.

When $\Lambda = (\sigma^2, \theta_1, \ldots, \theta_d)$, as in the Gaussian covariance function in (1.3), $\sigma^2$ can be given an inverse gamma prior, $\sigma^2 \sim IG(\alpha, \beta)$, or equivalently, if the covariance function is defined in terms of the precision $\lambda = 1/\sigma^2$, then $\lambda \sim \mathrm{Gamma}(\alpha, \beta)$. The hyperparameters in $\Lambda$ that correspond to correlation function parameters are more difficult to assign appropriate priors. It is often the case that there is little prior information available, so a vague prior is desired. However, it is inadvisable to assign improper priors to correlation parameters, since they will often produce an improper posterior (Neal, 1998; Banerjee et al., 2004). For the Gaussian correlation function (1.3), the correlation parameters are often given Gamma priors with large variances when little prior information is available. In the re-parameterized Gaussian correlation function in (1.4), a natural choice of prior for the new parameter is $\rho_i \sim \mathrm{Beta}(\alpha_i, \beta_i)$. Oakley (2002) describes the process of obtaining information from a scientific expert to form useful priors in the field of computer experiments. It involves initially making sure that a GP is appropriate, then obtaining information about the differentiability of $Y(x)$ so that an appropriate covariance function can be chosen. The expert should then propose an approximation of the shape of the function so that an appropriate $f(x)$ can be chosen. The expert can also make informed guesses about the correlation and variance of the process, leaving the statistician to formulate this information into a useful prior.
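As a small illustration of how these prior choices combine in practice, the sketch below evaluates the log prior density of $(\mu, \sigma^2, \rho_1, \ldots, \rho_d)$ under a normal prior on the constant mean, an inverse gamma prior on $\sigma^2$, and independent Beta priors on the $\rho_j$. The function name and the hyperparameter values in the example call are illustrative assumptions, not the settings used later in the dissertation; a log posterior for MCMC would add the GP log likelihood to this quantity.

```matlab
function lp = log_prior(mu, sig2, rho, hyp)
% Log prior density for a constant-mean stationary GP with the
% rho-parameterized Gaussian correlation (1.4):
%   mu ~ N(m, s2mu),  sig2 ~ IG(a, b),  rho_j ~ Beta(al_j, be_j) independently.
% hyp is a struct of assumed-known hyperparameters.
lp = -0.5 * log(2*pi*hyp.s2mu) - 0.5 * (mu - hyp.m)^2 / hyp.s2mu;      % N(m, s2mu)
lp = lp + hyp.a*log(hyp.b) - gammaln(hyp.a) ...
        - (hyp.a + 1)*log(sig2) - hyp.b/sig2;                          % IG(a, b)
for j = 1:numel(rho)                                                   % Beta(al_j, be_j)
    lp = lp + (hyp.al(j) - 1)*log(rho(j)) + (hyp.be(j) - 1)*log(1 - rho(j)) ...
            - betaln(hyp.al(j), hyp.be(j));
end
end

% Example call with illustrative hyperparameters for d = 2 inputs:
%   hyp = struct('m', 0, 's2mu', 100, 'a', 2, 'b', 1, 'al', [1 1], 'be', [1 1]);
%   lp  = log_prior(0.5, 1.2, [0.6 0.3], hyp);
```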

In Bayesian analysis, inferences are made using the posterior distribution $[\Lambda \mid y]$, where $y = (y(x_1), \ldots, y(x_n))^\top$ is the observed training data. Generally, when the model has unknown hyperparameters in the correlation function, this posterior distribution is very difficult to work with directly. Markov chain Monte Carlo (MCMC) methods can be used to sample from this posterior. An overview of MCMC in general, and of Gibbs sampling and the Metropolis-Hastings algorithm in particular, is provided in Section 1.5. Neal (1998) advocates hybrid Monte Carlo due to its efficiency when properly implemented.

Gaussian process computations can be fairly difficult. In particular, the inversion of the $n \times n$ covariance matrix of the training data causes problems. This matrix can become ill-conditioned, either because the sample size is large or because training data points are very close together, causing the computations to be unstable or unreliable due to round-off errors. The most common approach to avoiding this problem is to add a small nugget, $\sigma_\epsilon^2$, to the diagonal elements. This makes the matrix better conditioned while having a negligible effect on the model. Also, inverting an $n \times n$ matrix is computationally expensive when $n$ is large, having complexity of order $n^3$. This can be especially prohibitive in an iterative algorithm like MCMC, where the inversion must be done at each iteration. To reduce some of the computing issues, some authors have used the maximum likelihood type II (ML-II) estimate of the hyperparameters as a plug-in estimate while performing Bayesian inference on the other unknown parameters (Kuß, 2006). The ML-II estimate, $\hat{\Lambda}$, is given by

$$\hat{\Lambda} = \arg\max_{\Lambda}\, [y \mid \Lambda].$$

For more on the ML-II approach, see Berger (1985). Csato and Opper (2002) present another approach, sparse approximation, for large training data sets, which performs the time-consuming matrix operations only on a representative subset of size $p < n$ of the training data. Chapter 8 in Rasmussen and Williams (2006) presents other methods for large datasets, including a Nyström approximation to the covariance matrix.
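The numerical issues described above are usually handled by adding a nugget to the diagonal and working with a Cholesky factor instead of an explicit inverse. The sketch below (illustrative only, with an arbitrary correlation matrix and nugget size) shows the standard pattern: one $O(n^3)$ Cholesky factorization followed by cheap triangular solves for the quantities that appear in the GP log likelihood.

```matlab
% Stabilizing and speeding up GP computations: add a small nugget and use a
% Cholesky factorization rather than inv().
n   = 500;
x   = linspace(0, 1, n)';
R   = exp(-25 * (x - x').^2);          % Gaussian correlation matrix (can be ill-conditioned)
y   = sin(5 * x) + 0.01 * randn(n, 1); % some observed responses (illustrative)

nug = 1e-6;                            % small nugget sigma_eps^2
L   = chol(R + nug * eye(n), 'lower'); % one O(n^3) factorization

alpha  = L' \ (L \ y);                 % (R + nug*I)^{-1} y via two triangular solves
quadft = y' * alpha;                   % quadratic form y' C^{-1} y
logdet = 2 * sum(log(diag(L)));        % log determinant of (R + nug*I)
loglik = -0.5 * (quadft + logdet + n * log(2*pi));   % Gaussian log likelihood (unit process variance)
```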

Gaussian processes have been used in many applications, including physical experiments, computer experiments, and machine learning. Kolmogorov (1941) presented GPs for use in time series analysis. The use of GPs for prediction in a spatial context can be traced back to Matheron (1973) and is often called kriging in that context; Cressie (1993) also presents this method. GPs in a regression context were first presented by O'Hagan (1978) and have since been used in that context for models with measurement error (physical experiments) and without measurement error (computer experiments), for both prediction and uncertainty quantification. GPs are also used in classification problems, where the responses are categorical rather than continuous as in the normal linear regression context; Williams and Barber (1998) present a Bayesian method for this application. GPs in a machine learning context were presented by Williams and Rasmussen (1996) after Neal (1996) described the connection between GPs and infinite neural networks. Sacks et al. (1989) apply GPs to both prediction and design in computer experiments; they propose a sequential design strategy that involves minimizing the integrated mean squared error. GPs have also been used in sensitivity analysis (Oakley and O'Hagan, 2004) to determine how an output changes in response to changes in the inputs. Kennedy and O'Hagan (2001) used GPs in calibration, a process in which unknown parameters in the computer model are adjusted so that the simulator output fits observed data.

1.3 Prediction Methodologies

One common goal in using a GP as a model of an unknown function is to predict the value of the function at a previously unobserved location based on observed values of the process. This section presents a few of the methods that have been proposed in the past.

1.3.1 Kriging

One method for prediction using GPs is commonly known as universal kriging. The GP, $Y(x)$, is specified as follows:

$$Y(x) = \sum_{i=1}^{p} f_i(x)\beta_i + Z(x) = f^\top(x)\beta + Z(x), \qquad (1.7)$$

where $f(x) = (f_1(x), \ldots, f_p(x))^\top$ is a vector of known regression functions, $\beta = (\beta_1, \ldots, \beta_p)^\top$ is a vector of unknown regression coefficients, and $Z(x)$ is a zero-mean stationary Gaussian process with stationary covariance function $\mathrm{Cov}(Z(x), Z(x + h)) = \sigma^2 R(h)$. This model is sometimes called "regression plus stationary GP" and is nonstationary in the sense that it allows a global trend $f^\top(x)\beta$ to be fit (allowing the mean to vary across the input space) while allowing for local deviations from this trend. It should be noted that when this model has a constant trend, it reduces to a stationary model (the mean does not vary across the input space) and is often called ordinary kriging. Some examples of draws from this process can be seen in Figure 1.2 and Figure 1.3. Figure 1.2 shows three draws from the process in (1.7) for a constant trend with $f(x) = 1$, $\beta = .5$; a linear trend with $f(x) = (1, x)^\top$, $\beta = (.5, .65)^\top$;

a quadratic trend with $f(x) = (1, x, x^2)^\top$, $\beta = (.7, 2.2, .6)^\top$; and a cubic trend with $f(x) = (1, x, x^2, x^3)^\top$, $\beta = (.75, 2.4, 4.5, 2.9)^\top$, in one dimension with $\sigma^2 = 0.1$ and $\rho = .2$. This figure clearly shows how the mean of the process varies across the input space. Each draw follows roughly the overall trend with smaller, local deviations around the trend.

Figure 1.2: Example draws from the process in (1.7).

Figure 1.3 shows one draw from the process in (1.7) for a constant trend with $f(x) = 1$, $\beta = .5$; a linear trend in two dimensions with $f(x) = (1, x_1, x_2)^\top$;

a quadratic trend in two dimensions with $f(x) = (1, x_1, x_2, x_1^2, x_2^2)^\top$, $\beta = (.7, 2.2, .6, 4.5, 2.9)^\top$; and a quadratic trend in two dimensions with an interaction, $f(x) = (1, x_1, x_2, x_1^2, x_2^2, x_1 x_2)^\top$, $\beta = (.7, 2.2, .6, 4.5, 2.9, 3.5)^\top$. For these draws, $\sigma^2 = .25$ and $\rho = (.2, .4)$.

Figure 1.3: Example draws from the process in (1.7).

Now consider training data $y = (y(x_1), \ldots, y(x_n))^\top$, a set of data measured at $n$ different input settings $\{x_1, \ldots, x_n\} \subset \mathbb{R}^d$. It is often desired to make a prediction at a new input setting, $x_0$, given the training data. The following theorem gives a result about choosing a best predictor.

Theorem 1. Let $Y(x_0)$ and $\mathbf{Y} = (Y(x_1), \ldots, Y(x_n))^\top$ be jointly distributed as follows:

$$\begin{pmatrix} Y(x_0) \\ \mathbf{Y} \end{pmatrix} \sim G,$$

where $G$ is some distribution, and suppose the conditional mean $E[Y(x_0) \mid \mathbf{Y} = y] = \hat{y}(x_0)$ exists. Then $\hat{y}(x_0)$ is the minimum mean squared prediction error (MSPE) predictor.

Proof. See Appendix A.1.

In particular, Theorem 1 is true for Gaussian processes, which are being used as the model for the unknown function. In this case, $G$ will be a multivariate normal distribution. Now define the $n \times p$ matrix $F = (f(x_1), \ldots, f(x_n))^\top$, $f_0 = (f_1(x_0), \ldots, f_p(x_0))^\top$, $r(x_0) = (R(x_0 - x_1), \ldots, R(x_0 - x_n))^\top$, a vector containing the correlations between the process at the new prediction location and the process at each of the training data locations, and $R$ to be the $n \times n$ matrix with $ij^{th}$ element $R(x_i - x_j)$, the correlation of the process between the $i^{th}$ and $j^{th}$ locations. Then

$$\begin{pmatrix} Y(x_0) \\ \mathbf{Y} \end{pmatrix} \sim N_{n+1}\left( \begin{pmatrix} f_0^\top \\ F \end{pmatrix}\beta,\; \sigma^2 \begin{pmatrix} 1 & r^\top(x_0) \\ r(x_0) & R \end{pmatrix} \right),$$

and, by Theorem 1 and multivariate normal theory, the minimum MSPE predictor is

$$\hat{y}(x_0) = E[Y(x_0) \mid \mathbf{Y} = y] = f_0^\top \beta + r^\top(x_0) R^{-1} (y - F\beta).$$

Now $\beta$ is generally unknown, but, as shown in Santner et al. (2003), the best linear unbiased predictor is

$$\hat{y}_{UK}(x_0) = f_0^\top \hat{\beta} + r^\top(x_0) R^{-1} (y - F\hat{\beta}), \qquad (1.8)$$

where $\hat{\beta} = (F^\top R^{-1} F)^{-1} F^\top R^{-1} y$ is the usual generalized least squares estimator. The variance of $\hat{y}(x_0)$, as shown in Santner et al. (2003), is

$$\mathrm{Var}(\hat{y}(x_0)) = \sigma^2 \left( 1 - r^\top(x_0) R^{-1} r(x_0) + h^\top (F^\top R^{-1} F)^{-1} h \right),$$

where $h = f_0 - F^\top R^{-1} r(x_0)$. $100(1 - \alpha)\%$ prediction intervals can then be given by

$$\hat{y}_{UK}(x_0) \pm z_{\alpha/2} \sqrt{\mathrm{Var}(\hat{y}(x_0))}. \qquad (1.9)$$

This regression plus GP approach is relatively straightforward. A maximum likelihood (ML) or restricted maximum likelihood (REML) approach can be used to estimate the parameters in the model. There is a MATLAB function called MPeRK and a MATLAB toolbox called DACE that can be used to estimate the parameters and to make predictions at new input locations using this regression plus GP approach. After the correlation parameters are estimated, the kriging predictor and prediction intervals are found as in (1.8) and (1.9).
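To make (1.8) and (1.9) concrete, here is a small MATLAB sketch of an ordinary kriging predictor (constant trend, so $f(x) = 1$) with the Gaussian correlation (1.3), applied to the test function $y(x) = \sin(5x)$ used in the examples below. It is an illustrative implementation with a fixed correlation parameter rather than an ML estimate, and it is not the MPeRK or DACE code referred to above.

```matlab
% Ordinary kriging (constant trend) with the Gaussian correlation (1.3).
ytrue = @(x) sin(5 * x);
xtr   = linspace(0, 3, 12)';           % training inputs on [0, 3]
y     = ytrue(xtr);                    % noise-free training responses
x0    = linspace(0, 3, 200)';          % prediction locations

theta = 4;                             % correlation parameter (fixed, not estimated)
corr  = @(a, b) exp(-theta * (a - b').^2);

n  = numel(xtr);
R  = corr(xtr, xtr) + 1e-10 * eye(n);  % correlation matrix plus a tiny nugget
F  = ones(n, 1);  f0 = ones(size(x0)); % constant-trend regressors

Ri_y = R \ y;   Ri_F = R \ F;
bhat = (F' * Ri_F) \ (F' * Ri_y);      % generalized least squares estimate of beta

r    = corr(x0, xtr);                  % rows are r(x0)' for each prediction location
yhat = f0 * bhat + r * (R \ (y - F * bhat));        % predictor (1.8)

sig2 = (y - F * bhat)' * (R \ (y - F * bhat)) / n;  % plug-in process variance
h    = f0' - F' * (R \ r');                         % h = f0 - F' R^{-1} r(x0), per location
vhat = sig2 * (1 - sum(r .* (R \ r')', 2) + (h.^2)' / (F' * Ri_F));  % variance in (1.9)
vhat = max(vhat, 0);                                % guard against round-off
PI   = [yhat - 1.96 * sqrt(vhat), yhat + 1.96 * sqrt(vhat)];   % 95% prediction intervals
```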

Kriging Examples

Consider the simple test function

$$y(x) = \sin(5x), \qquad x \in [0, 3]. \qquad (1.10)$$

Figure 1.4 shows the true function (black), a kriging predictor (red), and 95% prediction intervals (yellow) for each of a constant trend (top left), a linear trend (top right), a quadratic trend (bottom left), and a cubic trend (bottom right); the MSPEs, calculated over a grid of test locations, were all on the order of $10^{-9}$. The training data here indicate that a stationary process is appropriate, as there appears to be no trend and a constant variance. This function is fairly easy to predict: the true function and the kriging predictor overlap nearly perfectly, as indicated by the small MSPEs. There is also very little uncertainty, as indicated by the miniscule prediction intervals in all four plots.

Figure 1.4: Kriging predictors for the function in (1.10) with 95% prediction intervals.

A two-dimensional example is the Franke function (Franke, 1979):

$$\begin{aligned} y(x) = y(x_1, x_2) = {}& .75\exp\!\left( -\frac{(9x_1 - 2)^2}{4} - \frac{(9x_2 - 2)^2}{4} \right) + .75\exp\!\left( -\frac{(9x_1 + 1)^2}{49} - \frac{9x_2 + 1}{10} \right) \\ &+ .5\exp\!\left( -\frac{(9x_1 - 7)^2}{4} - \frac{(9x_2 - 3)^2}{4} \right) - .2\exp\!\left( -(9x_1 - 4)^2 - (9x_2 - 7)^2 \right), \end{aligned} \qquad x_1, x_2 \in [0, 1]. \qquad (1.11)$$

Figure 1.5 shows the true function and the training data from a 24-run maximin Latin hypercube design (top), along with a kriging predictor with a constant trend (middle left), a kriging predictor with a cubic trend and interactions (middle right), and a plot for each that shows the degree of prediction error across the surface. The MSPEs were calculated over a grid of test locations. It is clear that the kriging predictor with a constant trend performs much better than the kriging predictor with a cubic trend and interactions; a misspecification of the overall trend can lead to poor predictions.

Figure 1.5: Kriging predictors and plots of errors for the function in (1.11).

Both of these functions could be modeled well using a GP with a stationary covariance. Now consider the test function presented in Ba and Joseph (2012), originally from Xiong et al. (2007). The true function is

$$y(x) = \sin\!\big(30(x - .9)^4\big)\cos\!\big(2(x - .9)\big) + \frac{x - .9}{2}, \qquad x \in [0, 1]. \qquad (1.12)$$

By looking at the true function in Figure 1.6, it can be seen that the mean of the function in the region $x \in [0, .4]$ is smaller than the mean in the region $x \in (.4, 1]$. Also, the volatility is larger in the region $x \in [0, .4]$ than in the region $x \in (.4, 1]$. For these reasons, it seems that a nonstationary model may be more appropriate. This figure shows the true function (black), a kriging predictor (red) for each of a constant trend (top left, MSPE = .5), a linear trend (top right, MSPE = .7), a quadratic trend (bottom left, MSPE = .6), and a cubic trend (bottom right, MSPE = .3), where the MSPEs were calculated over a grid of test locations, and 95% prediction intervals (yellow) for the kriging predictors. As mentioned previously, the volatility is smaller for large $x$ than for small $x$, and so it would make sense for the prediction intervals to narrow there to account for this. The model with stationary covariance does not allow for this adjustment, and so the prediction intervals seem to be too wide for large $x$.

Figure 1.6: Kriging predictors for the function in (1.12) with 95% prediction intervals.

This phenomenon can also be seen by considering the test function

$$y(x) = e^{-2x}\sin(4\pi x^2), \qquad x \in [0, 3]. \qquad (1.13)$$

It can be seen in Figure 1.7 that the true function has a fairly constant mean over the input space, but the volatility decreases as $x$ gets larger, so a nonstationary model might be more appropriate to account for this. This figure shows the true function (black), a kriging predictor (red) for each of a constant trend (top left, MSPE = .82), a linear trend (top right, MSPE = .82),

a quadratic trend (bottom left, MSPE = .82), and a cubic trend (bottom right, MSPE = .83), where the MSPEs were calculated over a grid of test locations, and 95% prediction intervals (yellow) for the kriging predictors. Again, it seems as though the prediction intervals should be narrower where the volatility is low (large $x$), but there is no reduction in the width of these prediction intervals.

Consider the following 2-dimensional test function:

$$y(x_1, x_2) = \sin\!\big(2(x_1 - .9)^2\big)\cos\!\big(2(x_1 - .9)^2\big)\,\frac{x_1 - .9}{2}\,\sin(2x_1)\cos\!\big(2(x_2 - .9)\big)\,\frac{x_2 - .9}{2}, \qquad x_1, x_2 \in [0, 1]. \qquad (1.14)$$

Figure 1.7: Kriging predictors for the function in (1.13) with 95% prediction intervals.

Figure 1.8 below shows this function. The function is very volatile when $x_1$ and $x_2$ are both near 0 and is smoother elsewhere in the space. A 40-run design with more densely populated design points for $x_1 < .4$ was used for this example, and the predictions were tested over a grid of test locations. Figure 1.8 shows the MPeRK predictors with a linear trend (left column, MSPE = .78) and a cubic trend with interactions (right column, MSPE = .87), along with a plot that shows the errors of each of these predictors across the input space.

Figure 1.8: Kriging predictors and plots of errors for the function in (1.14).

The stationary covariance is not able to capture the high volatility when $x_1$ and $x_2$ are small, and so the predictors are too smooth. This model also requires the specification of the overall trend (constant, linear, quadratic, interactions, etc.). A misspecification of this trend can lead to poor predictions, particularly in areas that are far from the training data, where predictions tend towards the overall trend. This can best be seen in Figure 1.5: choosing a constant trend leads to relatively good predictions, while choosing a cubic trend with interactions leads to poor predictions. The large errors in prediction can be seen particularly at the edges, where there is less training data. It may be desired to have a more flexible trend that does not need to be specified.

1.3.2 Treed Gaussian Processes

A proposed method for handling this nonstationary mean and nonstationary covariance structure is the treed Gaussian process (TGP) model of Gramacy and Lee (2008). The basic idea behind this model is to assume that the input space can be partitioned into $R$ rectangular regions such that a GP with a linear trend and stationary covariance structure is appropriate in each region. Treed partition models partition the input space by making binary splits on the value of one input variable at a time, so that the partition boundaries are parallel to the coordinate axes. Also, each new partition is a subpartition of a previous partition. The input space is partitioned into $R$ regions, $\{r_\nu\}_{\nu=1}^{R}$. In the $\nu^{th}$ region, there are $n_\nu$ training data locations and their corresponding responses, $D_\nu = \{X_\nu, Y_\nu\}$. For example, in a two-dimensional input space, $x = (x_1, x_2) \in [0, 1]^2$, a first partition may divide the input space by whether $x_1 \le .4$ or $x_1 > .4$. A second partition on whether $x_2 \le .6$ or $x_2 > .6$ will then divide only one of the previous rectangles. An example of this partitioning method is shown in Figure 1.9. The data in each region are then used to fit models independently across the regions. Classification and regression trees (CART; Breiman et al., 1984) fit a constant surface in each region. Chipman et al. (1998) fit a Bayesian hierarchical linear model in each region. The treed Gaussian process model extends the model of Chipman et al. (1998) by fitting a GP with a linear trend and stationary covariance structure in each region. This leads to different mean and covariance structures across the space as a whole.
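A minimal MATLAB sketch of the partitioning rule just described is given below; it simply assigns points of $[0,1]^2$ to the three rectangles produced by a first split at $x_1 = .4$ and a second split at $x_2 = .6$. Which rectangle is subdivided by the second split is an arbitrary choice here, and the region labels are my own, not taken from Figure 1.9.

```matlab
% Toy illustration of a treed partition of [0,1]^2:
% first split at x1 = 0.4; then split the x1 > 0.4 rectangle at x2 = 0.6.
assign_region = @(x1, x2) 1 * (x1 <= 0.4) ...
                        + 2 * (x1 >  0.4 & x2 <= 0.6) ...
                        + 3 * (x1 >  0.4 & x2 >  0.6);

% Assign a random set of inputs to regions r1, r2, r3:
X   = rand(500, 2);
reg = assign_region(X(:, 1), X(:, 2));
for nu = 1:3
    fprintf('region r%d: n_nu = %d points\n', nu, sum(reg == nu));
end
% Each region's data {X_nu, Y_nu} would then get its own linear-trend GP.
```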

Figure 1.9: An example partition.

As mentioned in the previous paragraph, each region, $r_\nu$, contains data, $D_\nu$, at $n_\nu$ locations. Let $m = d + 1$ be the number of covariates (recall that there is a linear trend term for each dimension plus an intercept). The hierarchical model is set up as follows:

$$\begin{aligned}
Y_\nu \mid \beta_\nu, \sigma_\nu^2, K_\nu &\sim N_{n_\nu}\!\big(F_\nu \beta_\nu,\ \sigma_\nu^2 K_\nu\big) \\
\beta_\nu \mid \sigma_\nu^2, \tau_\nu^2, W, \beta_0 &\sim N_m\!\big(\beta_0,\ \sigma_\nu^2 \tau_\nu^2 W\big) \\
\beta_0 &\sim N_m(\mu, B) \\
\tau_\nu^2 &\sim IG(\alpha_\tau / 2,\ q_\tau / 2) \\
\sigma_\nu^2 &\sim IG(\alpha_\sigma / 2,\ q_\sigma / 2) \\
W^{-1} &\sim W\!\big((\rho V)^{-1}, \rho\big)
\end{aligned}$$

for $\nu = 1, \ldots, R$, where $N_p$, $IG$, and $W$ are the $p$-variate normal, inverse-gamma, and Wishart distributions, respectively, $F_\nu = (1, X_\nu)$, $K_\nu = R_\nu + \sigma^2_{\epsilon_\nu} I_{n_\nu}$ is a correlation matrix defined as in (1.3) plus a nugget, $W$ is an $m \times m$ matrix, and the hyperparameters $\mu$, $B$, $V$, $\rho$, $\alpha_\sigma$, $q_\sigma$, $\alpha_\tau$, and $q_\tau$ are assumed known.

To make predictions, samples from the posterior are obtained using MCMC methods. Given a tree, $T$, that is, a specific partitioning of the input space, the parameters can be sampled from the posterior using Gibbs and Metropolis-Hastings steps. Sampling from the posterior of the tree structure is performed by reversible jump MCMC. For more details, see Gramacy and Lee (2008).

Point predictions for the TGP model have a form similar to that of universal kriging. Given a region, $r_\nu$, the point prediction has the form

$$\begin{aligned}
\hat{y}_{TGP}(x_0) &= E[Y(x_0) \mid \mathbf{Y} = y,\ x_0 \in r_\nu] = E\big[\, E[Y(x_0) \mid \mathbf{Y} = y,\ x_0 \in r_\nu, \Lambda] \,\big] \\
&= E\big[\, f_0^\top \tilde{\beta}_\nu + r_\nu^\top(x_0)\, K_\nu^{-1}\big(Y_\nu - F_\nu \tilde{\beta}_\nu\big) \,\big|\, x_0 \in r_\nu \big] \\
&\approx \frac{1}{n_{mcmc}} \sum_{i=1}^{n_{mcmc}} \Big[ f_0^\top \tilde{\beta}_\nu^{[i]} + r_\nu^{[i]\top}(x_0) \big(K_\nu^{[i]}\big)^{-1}\big(Y_\nu - F_\nu \tilde{\beta}_\nu^{[i]}\big) \Big], \qquad (1.15)
\end{aligned}$$

where $\Lambda$ contains all of the parameters, $\tilde{\beta}_\nu = \big(F_\nu^\top K_\nu^{-1} F_\nu + W^{-1}/\tau_\nu^2\big)^{-1}\big(F_\nu^\top K_\nu^{-1} Y_\nu + W^{-1}\beta_0/\tau_\nu^2\big)$, $f_0 = (1, x_{01}, \ldots, x_{0d})^\top$, $r_\nu(x_0) = \big(K_\nu(x_0 - x_1), \ldots, K_\nu(x_0 - x_{n_\nu})\big)^\top$ is a vector containing the correlations between the process at the new prediction location and the process at each of the training data locations in $r_\nu$, and the superscript $[i]$ indicates the use of the parameters from the $i^{th}$ draw of the MCMC algorithm to calculate the respective estimators.

A major advantage of this method is computational. The usual stationary GP model tends to run into trouble when the number of training data points, $n$, is large, since it requires calculating the inverse of an $n \times n$ covariance matrix. This leads to two computing issues. First, large matrices are often ill-conditioned, and so are numerically unstable.

Second, even if a large matrix is well-conditioned, inverting an $n \times n$ matrix is computationally intensive ($O(n^3)$). In an iterative algorithm such as MCMC, inverting a large matrix many times leads to slow-running computer code. TGP uses a divide-and-conquer approach, partitioning the data so that there are $R$ smaller $n_\nu \times n_\nu$, $\nu = 1, \ldots, R$, matrices. These smaller matrices are more likely to be well-conditioned and are not as computationally intensive to invert.

The tgp package in R implements the treed Gaussian process model. One of its characteristics is that it assigns zero probability to trees with partitions containing fewer than $\min\{10, n + 1\}$ data points. For example, for $x \in \mathbb{R}$, at least 20 training data points are needed for the input space to be partitioned into two regions. If there were only 19 training data points, one of the regions would have to have 9 or fewer points, and so the input space would not be partitioned at all. This is done to ensure that there is enough data in each region to make useful predictions.

TGP Examples

Figure 1.10 below shows three known functions (black), the TGP predictor (red), and 95% prediction intervals (yellow). For the function in the top row, $y(x) = \sin(5x)$, $x \in [0, 3]$, it looks as though a stationary model would be appropriate. The TGP method does not partition the input space in this example and fits a stationary model whose predictor has a very small MSPE over a grid of test locations. As mentioned previously, this function is fairly easy to predict: the true function and the TGP predictor overlap nearly perfectly, as indicated by the small MSPE. There is also very little uncertainty, as indicated by the miniscule prediction intervals.

For the functions in the second row,

$$y(x) = \sin\!\big(30(x - .9)^4\big)\cos\!\big(2(x - .9)\big) + \frac{x - .9}{2}, \qquad x \in [0, 1]$$

in the left column, and

$$y(x) = e^{-2x}\sin(4\pi x^2), \qquad x \in [0, 3]$$

in the right column, it looks as though the mean changes across the input space for the function in the left column, and the variance decreases as $x$ gets larger in both functions.

Figure 1.10: TGP predictors for the functions in (1.10), (1.12), and (1.13) with 95% prediction intervals and small $n$.

A nonstationary model might therefore seem appropriate for the data from both of these functions. In the left column there are $n = 7$ data points, and in the right column there are $n = 5$ data points. With this small amount of data, the input space cannot be partitioned into more than one region, and so a stationary model is fit for each function. It should be noted that the prediction intervals remain fairly wide over the entire space. The MSPE over a grid of test locations for the predictor in the left column is .45, and the MSPE for the predictor in the right column is .7.

Figure 1.11 below shows the same known functions but with $n = 25$ training data observations. Having more data allows the input space to be partitioned if the data call for it. The data in the plot in the first row indicate that a stationary model is appropriate, and so there is no partitioning of the data. The key thing to notice in the plots in the second row is the prediction intervals (yellow) becoming much narrower at approximately $x = .5$ in the left panel and $x = .6$ in the right panel. In both cases the input space is partitioned into two regions, one with a larger variance and one with a smaller variance.

Figure 1.12 below shows the function in (1.11) and the same 24-run maximin Latin hypercube design as in the previous section, along with the TGP predictor and a plot that shows the errors across the input space. This predictor has an MSPE of .9 over a grid of test locations. Figure 1.13 below shows the function in (1.14) and the same 40-run design as in the previous section, along with the TGP predictor and a plot that shows the errors across the input space. The predictor has an MSPE of .57 over a grid of test locations.

Figure 1.11: TGP predictors for the functions in (1.10), (1.12), and (1.13) with 95% prediction intervals and larger $n$.

It should be noted here that this model estimates a rather large nugget, and so it is not an interpolator. The nugget can be fixed to a very small number, but the predictions then often become unstable. This often leads to the model interpreting volatility in the data as noise rather than signal, so the predictor remains smooth through the area of volatility, as can be seen in both Figure 1.12 and Figure 1.13.

1.3.3 Composite Gaussian Processes

Another proposed method for handling a nonstationary mean and nonstationary covariance structure is the composite Gaussian process (CGP) model of Ba and Joseph (2012).

Figure 1.12: TGP predictor and plot of errors on a grid of test locations for (1.11).

This model incorporates a flexible global trend and a variance model to account for a changing variance throughout the input space. Given the process parameters, $\Lambda$, the CGP model is expressed as a sum of two Gaussian processes as follows:

$$Y(x) = Y_g(x) + \sigma(x)\, Y_l(x), \qquad (1.16)$$

$$Y_g(x) \mid \Lambda \sim GP\big(\mu,\ \tau^2 g(\cdot)\big), \qquad Y_l(x) \mid \Lambda \sim GP\big(0,\ l(\cdot)\big).$$

Figure 1.13: TGP predictor and plot of errors on a grid of test locations for (1.14).

Without loss of generality, write $\sigma^2(x) = \sigma^2 v(x)$. Then $\Lambda = (\mu, \tau^2, \sigma^2, \theta, \kappa, v(x))$. The functions $g(\cdot)$ and $l(\cdot)$ are Gaussian correlation functions with unknown correlation parameter vectors $\theta$ and $\kappa$, each of length equal to the number of inputs, $d$. So

$$g(h \mid \theta) = \exp\left\{-\sum_{j=1}^{d} \theta_j h_j^2\right\}, \qquad l(h \mid \kappa) = \exp\left\{-\sum_{j=1}^{d} \kappa_j h_j^2\right\},$$

and $v(x)$ is a function that allows the volatility of $Y_l(x)$ to change throughout the input space. Typically, $v(x)$ is normalized so that the average magnitude of the variance is $\sigma^2$, while $v(x)$ adjusts the magnitude of the variance at each $x \in \mathcal{X}$.

$Y_g(x)$ and $Y_l(x)$ are independent Gaussian processes; $Y_g(x)$ is a smooth, stationary process that captures the global trend, while $Y_l(x)$ makes local adjustments to the trend. To ensure that the global process, $Y_g(x)$, is smoother than the local process, $\theta$ is given an upper bound, $\kappa_l$, so that $\theta \le \kappa_l \le \kappa$. Note that $Y_l(x)$ is augmented by a variance model that allows the local variability to change throughout the input space. Informally, the model in (1.16) can be thought of as $Y(x) \mid \Lambda \sim GP\big(\mu,\ \tau^2 g + \sigma^2(x)\, l\big)$.

Let $V = \mathrm{diag}\{v(x_1), \ldots, v(x_n)\}$ contain the local variances at each of the training data sites, and let $G$ and $L$ be $n \times n$ correlation matrices with $ij^{th}$ elements $g(x_i - x_j)$ and $l(x_i - x_j)$, respectively. Then, because of the properties of Gaussian processes, the joint distribution of any $Y(x_0)$ and the training data $\mathbf{Y} = (Y(x_1), \ldots, Y(x_n))^\top$ is multivariate normal:

$$\begin{pmatrix} Y(x_0) \\ \mathbf{Y} \end{pmatrix} \Big|\, \Lambda \sim N_{1+n}\left( \begin{pmatrix} \mu \\ \mu \mathbf{1} \end{pmatrix},\ \begin{pmatrix} \tau^2 + \sigma^2 v(x_0) & C_0^\top \\ C_0 & C \end{pmatrix} \right), \qquad (1.17)$$

where $C = \tau^2 G + \sigma^2 V^{1/2} L V^{1/2}$ is the $n \times n$ covariance matrix for the training data, and $C_0 = \tau^2 g(x_0) + \sigma^2 v^{1/2}(x_0) V^{1/2} l(x_0)$, with $g(x_0) = (g(x_0 - x_1), \ldots, g(x_0 - x_n))^\top$ and $l(x_0) = (l(x_0 - x_1), \ldots, l(x_0 - x_n))^\top$.

The first row of Figure 1.14 shows two possible $v(x)$ functions. The second row shows six draws from the process in (1.16) for each $v(x)$, with $\sigma^2 = .5$ and fixed values of $\mu$, $\tau^2$, $\theta$, and $\kappa$, and the third row shows a single draw from the process along with its components, the global process and the local process. The key thing to notice in the second row of Figure 1.14 is the increase in the volatility of each draw from the process where the local volatility function, $v(x)$, is larger. In the third row of Figure 1.14, it can be seen that the global process (green) is relatively smooth, looks stationary, and captures the overall trend of $Y(x)$ fairly well. The local process (red) has varying volatility, becoming more volatile where $v(x)$ is larger.
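The sketch below illustrates the covariance structure in (1.17) by building $C = \tau^2 G + \sigma^2 V^{1/2} L V^{1/2}$ on a grid and drawing sample paths from the resulting composite process. The variance function and all parameter values are illustrative choices (they are not the settings used for Figure 1.14), and the draws are generated directly from the joint normal rather than by simulating the two component processes separately.

```matlab
% Draws from the composite GP (1.16): Y(x) = Y_g(x) + sigma(x) * Y_l(x),
% using the joint covariance C = tau2*G + sig2 * V^{1/2} * L * V^{1/2} from (1.17).
x    = linspace(0, 1, 200)';
mu   = 0;  tau2 = 1;  sig2 = 0.5;      % illustrative parameter values
th   = 2;  ka   = 80;                  % global (smooth) and local (rough) correlation parameters
vfun = @(x) 0.1 + 1.8 * (x > 0.5);     % illustrative v(x): more local variability for x > 0.5

D2 = (x - x').^2;
G  = exp(-th * D2);                    % global Gaussian correlation g(.)
L  = exp(-ka * D2);                    % local Gaussian correlation l(.)

V12 = diag(sqrt(vfun(x)));             % V^{1/2}: local variance adjustments
C   = tau2 * G + sig2 * (V12 * L * V12);            % composite covariance
Lc  = chol(C + 1e-8 * eye(numel(x)), 'lower');      % small nugget for numerical stability

Y = mu + Lc * randn(numel(x), 6);      % six draws from the composite process
plot(x, Y);                            % volatility increases where v(x) is larger
```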


More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

Nonstationary spatial process modeling Part II Paul D. Sampson --- Catherine Calder Univ of Washington --- Ohio State University

Nonstationary spatial process modeling Part II Paul D. Sampson --- Catherine Calder Univ of Washington --- Ohio State University Nonstationary spatial process modeling Part II Paul D. Sampson --- Catherine Calder Univ of Washington --- Ohio State University this presentation derived from that presented at the Pan-American Advanced

More information

Nonparametric Regression With Gaussian Processes

Nonparametric Regression With Gaussian Processes Nonparametric Regression With Gaussian Processes From Chap. 45, Information Theory, Inference and Learning Algorithms, D. J. C. McKay Presented by Micha Elsner Nonparametric Regression With Gaussian Processes

More information

Bayesian Dynamic Linear Modelling for. Complex Computer Models

Bayesian Dynamic Linear Modelling for. Complex Computer Models Bayesian Dynamic Linear Modelling for Complex Computer Models Fei Liu, Liang Zhang, Mike West Abstract Computer models may have functional outputs. With no loss of generality, we assume that a single computer

More information

Introduction to emulators - the what, the when, the why

Introduction to emulators - the what, the when, the why School of Earth and Environment INSTITUTE FOR CLIMATE & ATMOSPHERIC SCIENCE Introduction to emulators - the what, the when, the why Dr Lindsay Lee 1 What is a simulator? A simulator is a computer code

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Introduction to Gaussian Processes

Introduction to Gaussian Processes Introduction to Gaussian Processes Neil D. Lawrence GPSS 10th June 2013 Book Rasmussen and Williams (2006) Outline The Gaussian Density Covariance from Basis Functions Basis Function Representations Constructing

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Bayesian treed Gaussian process models

Bayesian treed Gaussian process models Bayesian treed Gaussian process models Robert B. Gramacy and Herbert K. H. Lee rbgramacy,herbie}@ams.ucsc.edu Department of Applied Math & Statistics University of California, Santa Cruz Abstract This

More information

Bayesian data analysis in practice: Three simple examples

Bayesian data analysis in practice: Three simple examples Bayesian data analysis in practice: Three simple examples Martin P. Tingley Introduction These notes cover three examples I presented at Climatea on 5 October 0. Matlab code is available by request to

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS 1. THE CLASS OF MODELS y t {y s, s < t} p(y t θ t, {y s, s < t}) θ t = θ(s t ) P[S t = i S t 1 = j] = h ij. 2. WHAT S HANDY ABOUT IT Evaluating the

More information

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

Prediction of double gene knockout measurements

Prediction of double gene knockout measurements Prediction of double gene knockout measurements Sofia Kyriazopoulou-Panagiotopoulou sofiakp@stanford.edu December 12, 2008 Abstract One way to get an insight into the potential interaction between a pair

More information

SEQUENTIAL ADAPTIVE DESIGNS IN COMPUTER EXPERIMENTS FOR RESPONSE SURFACE MODEL FIT

SEQUENTIAL ADAPTIVE DESIGNS IN COMPUTER EXPERIMENTS FOR RESPONSE SURFACE MODEL FIT SEQUENTIAL ADAPTIVE DESIGNS IN COMPUTER EXPERIMENTS FOR RESPONSE SURFACE MODEL FIT DISSERTATION Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate

More information

Gaussian processes for inference in stochastic differential equations

Gaussian processes for inference in stochastic differential equations Gaussian processes for inference in stochastic differential equations Manfred Opper, AI group, TU Berlin November 6, 2017 Manfred Opper, AI group, TU Berlin (TU Berlin) inference in SDE November 6, 2017

More information

Gaussian with mean ( µ ) and standard deviation ( σ)

Gaussian with mean ( µ ) and standard deviation ( σ) Slide from Pieter Abbeel Gaussian with mean ( µ ) and standard deviation ( σ) 10/6/16 CSE-571: Robotics X ~ N( µ, σ ) Y ~ N( aµ + b, a σ ) Y = ax + b + + + + 1 1 1 1 1 1 1 1 1 1, ~ ) ( ) ( ), ( ~ ), (

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

Limit Kriging. Abstract

Limit Kriging. Abstract Limit Kriging V. Roshan Joseph School of Industrial and Systems Engineering Georgia Institute of Technology Atlanta, GA 30332-0205, USA roshan@isye.gatech.edu Abstract A new kriging predictor is proposed.

More information

Some general observations.

Some general observations. Modeling and analyzing data from computer experiments. Some general observations. 1. For simplicity, I assume that all factors (inputs) x1, x2,, xd are quantitative. 2. Because the code always produces

More information

Better Simulation Metamodeling: The Why, What and How of Stochastic Kriging

Better Simulation Metamodeling: The Why, What and How of Stochastic Kriging Better Simulation Metamodeling: The Why, What and How of Stochastic Kriging Jeremy Staum Collaborators: Bruce Ankenman, Barry Nelson Evren Baysal, Ming Liu, Wei Xie supported by the NSF under Grant No.

More information

Hierarchical Modelling for Univariate Spatial Data

Hierarchical Modelling for Univariate Spatial Data Spatial omain Hierarchical Modelling for Univariate Spatial ata Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A.

More information

TAKEHOME FINAL EXAM e iω e 2iω e iω e 2iω

TAKEHOME FINAL EXAM e iω e 2iω e iω e 2iω ECO 513 Spring 2015 TAKEHOME FINAL EXAM (1) Suppose the univariate stochastic process y is ARMA(2,2) of the following form: y t = 1.6974y t 1.9604y t 2 + ε t 1.6628ε t 1 +.9216ε t 2, (1) where ε is i.i.d.

More information

Probabilistic & Unsupervised Learning

Probabilistic & Unsupervised Learning Probabilistic & Unsupervised Learning Gaussian Processes Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College London

More information

CS 195-5: Machine Learning Problem Set 1

CS 195-5: Machine Learning Problem Set 1 CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of

More information

Gaussian processes for spatial modelling in environmental health: parameterizing for flexibility vs. computational efficiency

Gaussian processes for spatial modelling in environmental health: parameterizing for flexibility vs. computational efficiency Gaussian processes for spatial modelling in environmental health: parameterizing for flexibility vs. computational efficiency Chris Paciorek March 11, 2005 Department of Biostatistics Harvard School of

More information

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Models for spatial data (cont d) Types of spatial data. Types of spatial data (cont d) Hierarchical models for spatial data

Models for spatial data (cont d) Types of spatial data. Types of spatial data (cont d) Hierarchical models for spatial data Hierarchical models for spatial data Based on the book by Banerjee, Carlin and Gelfand Hierarchical Modeling and Analysis for Spatial Data, 2004. We focus on Chapters 1, 2 and 5. Geo-referenced data arise

More information

Modeling and Interpolation of Non-Gaussian Spatial Data: A Comparative Study

Modeling and Interpolation of Non-Gaussian Spatial Data: A Comparative Study Modeling and Interpolation of Non-Gaussian Spatial Data: A Comparative Study Gunter Spöck, Hannes Kazianka, Jürgen Pilz Department of Statistics, University of Klagenfurt, Austria hannes.kazianka@uni-klu.ac.at

More information

Kriging by Example: Regression of oceanographic data. Paris Perdikaris. Brown University, Division of Applied Mathematics

Kriging by Example: Regression of oceanographic data. Paris Perdikaris. Brown University, Division of Applied Mathematics Kriging by Example: Regression of oceanographic data Paris Perdikaris Brown University, Division of Applied Mathematics! January, 0 Sea Grant College Program Massachusetts Institute of Technology Cambridge,

More information

A Process over all Stationary Covariance Kernels

A Process over all Stationary Covariance Kernels A Process over all Stationary Covariance Kernels Andrew Gordon Wilson June 9, 0 Abstract I define a process over all stationary covariance kernels. I show how one might be able to perform inference that

More information

MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

More information

On Bayesian Computation

On Bayesian Computation On Bayesian Computation Michael I. Jordan with Elaine Angelino, Maxim Rabinovich, Martin Wainwright and Yun Yang Previous Work: Information Constraints on Inference Minimize the minimax risk under constraints

More information

Computer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression

Computer Vision Group Prof. Daniel Cremers. 9. Gaussian Processes - Regression Group Prof. Daniel Cremers 9. Gaussian Processes - Regression Repetition: Regularized Regression Before, we solved for w using the pseudoinverse. But: we can kernelize this problem as well! First step:

More information

Chapter 4 - Fundamentals of spatial processes Lecture notes

Chapter 4 - Fundamentals of spatial processes Lecture notes TK4150 - Intro 1 Chapter 4 - Fundamentals of spatial processes Lecture notes Odd Kolbjørnsen and Geir Storvik January 30, 2017 STK4150 - Intro 2 Spatial processes Typically correlation between nearby sites

More information

CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes

CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes Roger Grosse Roger Grosse CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes 1 / 55 Adminis-Trivia Did everyone get my e-mail

More information

Efficient MCMC Sampling for Hierarchical Bayesian Inverse Problems

Efficient MCMC Sampling for Hierarchical Bayesian Inverse Problems Efficient MCMC Sampling for Hierarchical Bayesian Inverse Problems Andrew Brown 1,2, Arvind Saibaba 3, Sarah Vallélian 2,3 CCNS Transition Workshop SAMSI May 5, 2016 Supported by SAMSI Visiting Research

More information

On Gaussian Process Models for High-Dimensional Geostatistical Datasets

On Gaussian Process Models for High-Dimensional Geostatistical Datasets On Gaussian Process Models for High-Dimensional Geostatistical Datasets Sudipto Banerjee Joint work with Abhirup Datta, Andrew O. Finley and Alan E. Gelfand University of California, Los Angeles, USA May

More information

Models for models. Douglas Nychka Geophysical Statistics Project National Center for Atmospheric Research

Models for models. Douglas Nychka Geophysical Statistics Project National Center for Atmospheric Research Models for models Douglas Nychka Geophysical Statistics Project National Center for Atmospheric Research Outline Statistical models and tools Spatial fields (Wavelets) Climate regimes (Regression and clustering)

More information

Introduction to Geostatistics

Introduction to Geostatistics Introduction to Geostatistics Abhi Datta 1, Sudipto Banerjee 2 and Andrew O. Finley 3 July 31, 2017 1 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore,

More information

Nonparametric Bayesian Methods - Lecture I

Nonparametric Bayesian Methods - Lecture I Nonparametric Bayesian Methods - Lecture I Harry van Zanten Korteweg-de Vries Institute for Mathematics CRiSM Masterclass, April 4-6, 2016 Overview of the lectures I Intro to nonparametric Bayesian statistics

More information

GWAS V: Gaussian processes

GWAS V: Gaussian processes GWAS V: Gaussian processes Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS V: Gaussian processes Summer 2011

More information

Reliability Monitoring Using Log Gaussian Process Regression

Reliability Monitoring Using Log Gaussian Process Regression COPYRIGHT 013, M. Modarres Reliability Monitoring Using Log Gaussian Process Regression Martin Wayne Mohammad Modarres PSA 013 Center for Risk and Reliability University of Maryland Department of Mechanical

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

ST 740: Markov Chain Monte Carlo

ST 740: Markov Chain Monte Carlo ST 740: Markov Chain Monte Carlo Alyson Wilson Department of Statistics North Carolina State University October 14, 2012 A. Wilson (NCSU Stsatistics) MCMC October 14, 2012 1 / 20 Convergence Diagnostics:

More information

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu Lecture: Gaussian Process Regression STAT 6474 Instructor: Hongxiao Zhu Motivation Reference: Marc Deisenroth s tutorial on Robot Learning. 2 Fast Learning for Autonomous Robots with Gaussian Processes

More information

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Adriana Ibrahim Institute

More information

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets Abhirup Datta 1 Sudipto Banerjee 1 Andrew O. Finley 2 Alan E. Gelfand 3 1 University of Minnesota, Minneapolis,

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department

More information

Gaussian Processes for Regression. Carl Edward Rasmussen. Department of Computer Science. Toronto, ONT, M5S 1A4, Canada.

Gaussian Processes for Regression. Carl Edward Rasmussen. Department of Computer Science. Toronto, ONT, M5S 1A4, Canada. In Advances in Neural Information Processing Systems 8 eds. D. S. Touretzky, M. C. Mozer, M. E. Hasselmo, MIT Press, 1996. Gaussian Processes for Regression Christopher K. I. Williams Neural Computing

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

Dynamic System Identification using HDMR-Bayesian Technique

Dynamic System Identification using HDMR-Bayesian Technique Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in

More information

Gaussian Processes. 1 What problems can be solved by Gaussian Processes?

Gaussian Processes. 1 What problems can be solved by Gaussian Processes? Statistical Techniques in Robotics (16-831, F1) Lecture#19 (Wednesday November 16) Gaussian Processes Lecturer: Drew Bagnell Scribe:Yamuna Krishnamurthy 1 1 What problems can be solved by Gaussian Processes?

More information

Use of Design Sensitivity Information in Response Surface and Kriging Metamodels

Use of Design Sensitivity Information in Response Surface and Kriging Metamodels Optimization and Engineering, 2, 469 484, 2001 c 2002 Kluwer Academic Publishers. Manufactured in The Netherlands. Use of Design Sensitivity Information in Response Surface and Kriging Metamodels J. J.

More information

Variable Selection and Sensitivity Analysis via Dynamic Trees with an application to Computer Code Performance Tuning

Variable Selection and Sensitivity Analysis via Dynamic Trees with an application to Computer Code Performance Tuning Variable Selection and Sensitivity Analysis via Dynamic Trees with an application to Computer Code Performance Tuning Robert B. Gramacy University of Chicago Booth School of Business faculty.chicagobooth.edu/robert.gramacy

More information

CSci 8980: Advanced Topics in Graphical Models Gaussian Processes

CSci 8980: Advanced Topics in Graphical Models Gaussian Processes CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007 Gaussian Processes Outline Gaussian Processes Outline Parametric Bayesian Regression Gaussian

More information