Bayesian Prediction of Code Output. ASA Albuquerque Chapter Short Course October 2014


Abstract

This presentation summarizes Bayesian prediction methodology for the Gaussian process (GP) surrogate representation of computer model output. The conditional predictive distribution for predicting unsampled code output is derived, followed by posterior distributions for GP regression and precision parameters based on two common prior distribution assumptions. These results lead to a description of the predictive distribution itself when the GP correlation parameters are assumed known. In the case of unknown correlation parameters, their posterior distribution is provided. Markov chain Monte Carlo (MCMC) techniques are introduced as a means of sampling this posterior, and this results in a Monte Carlo method of sampling the predictive distribution when the GP correlation parameters are unknown. Two examples of Bayesian prediction of unsampled code output are provided to illustrate the methods discussed.

Bayesian Prediction: Framework

Inference is based on the predictive distribution. The training data are $y^s_{tr} = \{(x^{tr}_1, y^s(x^{tr}_1)), \ldots, (x^{tr}_n, y^s(x^{tr}_n))\}$ and the test data are $y^s_{te} = \{(x^{te}_1, y^s(x^{te}_1)), \ldots, (x^{te}_{n_p}, y^s(x^{te}_{n_p}))\}$.

Predictive distribution, with $\xi$ denoting all uncertain model parameters:
$$p(y^s_{te} \mid y^s_{tr}) = \int p(y^s_{te}, \xi \mid y^s_{tr})\, d\xi = \int p(y^s_{te} \mid y^s_{tr}, \xi)\, \pi(\xi \mid y^s_{tr})\, d\xi.$$
The conditional density $p(y^s_{te} \mid y^s_{tr}, \xi)$ is derived from process modeling assumptions, e.g. a Gaussian process; the parameter posterior $\pi(\xi \mid y^s_{tr})$ is either derived analytically (conjugate prior) or sampled via MCMC.

Bayesian Prediction: Sampling Distribution

Joint sampling distribution of the test and training data, given $(\beta, \lambda, R)$:
$$(Y^s_{te}, Y^s_{tr}) = \left(Y^s(x^{te}_1), \ldots, Y^s(x^{te}_{n_p}), Y^s(x^{tr}_1), \ldots, Y^s(x^{tr}_n)\right) \sim N_{n_p+n}\!\left(F\beta, \tfrac{1}{\lambda}R\right),$$
where $F = \begin{pmatrix} F_{te} \\ F_{tr} \end{pmatrix}$ is the $(n_p+n) \times k$ regression matrix and $R = \begin{pmatrix} R_{te} & R_{te,tr} \\ R^T_{te,tr} & R_{tr} \end{pmatrix}$ is the $(n_p+n) \times (n_p+n)$ correlation matrix.

Conditional distribution of the test data given the training data: $p(y^s_{te} \mid y^s_{tr}, \beta, \lambda)$ is $N_{n_p}\!\left(m(y^s_{te} \mid y^s_{tr}, \beta, \lambda),\, V(y^s_{te} \mid y^s_{tr}, \beta, \lambda)\right)$ with
$$m(y^s_{te} \mid y^s_{tr}, \beta, \lambda) = F_{te}\beta + R_{te,tr}R_{tr}^{-1}\left(y^s_{tr} - F_{tr}\beta\right), \qquad V(y^s_{te} \mid y^s_{tr}, \beta, \lambda) = \tfrac{1}{\lambda}\left[R_{te} - R_{te,tr}R_{tr}^{-1}R^T_{te,tr}\right].$$
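The conditional mean and covariance above involve only standard linear algebra. The following R sketch (not from the slides; the function name gp_conditional and its argument names are illustrative, and the matrices F_te, F_tr, R_te, R_tr, R_te_tr are assumed to be supplied by the user) shows one way to compute them for fixed $(\beta, \lambda, R)$.

```r
# Minimal sketch: conditional mean and covariance of the test outputs given
# the training outputs, for fixed (beta, lambda, R).  All inputs are plain
# matrices/vectors supplied by the user; names here are illustrative only.
gp_conditional <- function(y_tr, F_te, F_tr, R_te, R_tr, R_te_tr, beta, lambda) {
  Rtr_inv_resid <- solve(R_tr, y_tr - F_tr %*% beta)          # R_tr^{-1} (y_tr - F_tr beta)
  m <- F_te %*% beta + R_te_tr %*% Rtr_inv_resid               # conditional mean
  V <- (R_te - R_te_tr %*% solve(R_tr, t(R_te_tr))) / lambda   # conditional covariance
  list(mean = m, cov = V)
}
```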

Bayesian Prediction: Priors and Posteriors I

Prior distributions, Case I (informative):
$$\pi(\beta \mid \lambda) \text{ is } N_k\!\left(b_0, (\lambda V_0)^{-1}\right), \qquad \pi(\lambda) \text{ is Gamma}(a, b).$$

Posterior distributions:
$$\pi(\lambda \mid y^s_{tr}) \text{ is Gamma}(a_1, b_1), \qquad a_1 = (2a + n)/2,$$
$$b_1 = \left[\,2b + \left(y^s_{tr} - F_{tr}\hat\beta\right)^T R_{tr}^{-1}\left(y^s_{tr} - F_{tr}\hat\beta\right) + \left(\hat\beta - b_0\right)^T \left(V_0^{-1} + \left(F_{tr}^T R_{tr}^{-1} F_{tr}\right)^{-1}\right)^{-1}\left(\hat\beta - b_0\right)\right]\big/\,2,$$
where $\hat\beta = \left(F_{tr}^T R_{tr}^{-1} F_{tr}\right)^{-1} F_{tr}^T R_{tr}^{-1} y^s_{tr}$.
$$\pi(\beta \mid y^s_{tr}) \text{ is } t_k\!\left(2a_1, \hat\mu, (b_1/a_1)\hat V\right), \qquad \hat V^{-1} = V_0 + F_{tr}^T R_{tr}^{-1} F_{tr}, \qquad \hat\mu = \hat V\left[F_{tr}^T R_{tr}^{-1} F_{tr}\,\hat\beta + V_0\, b_0\right].$$
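As a worked illustration of the Case I formulas, the sketch below computes $a_1$, $b_1$, $\hat\beta$, $\hat\mu$, and $\hat V$ in R. It is a minimal sketch under the notational assumptions above; the function and argument names (posterior_case1, b0, V0, etc.) are hypothetical, not from a package or from the slides.

```r
# Sketch of the Case I (informative prior) posterior quantities.  Inputs:
# training response y_tr, regression matrix F_tr, correlation matrix R_tr,
# prior mean b0 and prior precision-scale matrix V0 for beta, Gamma(a, b)
# prior for lambda.  Names are illustrative, not from a specific package.
posterior_case1 <- function(y_tr, F_tr, R_tr, b0, V0, a, b) {
  Rinv_F <- solve(R_tr, F_tr)                      # R_tr^{-1} F_tr
  FtRF   <- t(F_tr) %*% Rinv_F                     # F^T R^{-1} F
  beta_hat <- solve(FtRF, t(Rinv_F) %*% y_tr)      # generalized least squares estimate
  resid  <- y_tr - F_tr %*% beta_hat
  quad1  <- t(resid) %*% solve(R_tr, resid)        # residual quadratic form
  W      <- solve(V0) + solve(FtRF)                # V0^{-1} + (F^T R^{-1} F)^{-1}
  quad2  <- t(beta_hat - b0) %*% solve(W, beta_hat - b0)
  a1 <- (2 * a + length(y_tr)) / 2
  b1 <- (2 * b + quad1 + quad2) / 2
  V_hat  <- solve(V0 + FtRF)                       # posterior scale for beta
  mu_hat <- V_hat %*% (FtRF %*% beta_hat + V0 %*% b0)
  list(a1 = a1, b1 = drop(b1), beta_hat = beta_hat, mu_hat = mu_hat, V_hat = V_hat)
}
```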

Bayesian Prediction: Priors and Posteriors II

Prior distributions, Case II (noninformative): $\pi(\beta) \propto 1$, independent of $\pi(\lambda)$, which is Gamma$(a, b)$.

Posterior distributions:
$$\pi(\lambda \mid y^s_{tr}) \text{ is Gamma}(a_1, b_1), \qquad a_1 = (2a + n - k)/2, \qquad b_1 = \left[\,2b + \left(y^s_{tr} - F_{tr}\hat\beta\right)^T R_{tr}^{-1}\left(y^s_{tr} - F_{tr}\hat\beta\right)\right]\big/\,2,$$
where $\hat\beta = \left(F_{tr}^T R_{tr}^{-1} F_{tr}\right)^{-1} F_{tr}^T R_{tr}^{-1} y^s_{tr}$.
$$\pi(\beta \mid y^s_{tr}) \text{ is } t_k\!\left(2a_1, \hat\beta, (b_1/a_1)\left(F_{tr}^T R_{tr}^{-1} F_{tr}\right)^{-1}\right).$$

Results for the Jeffreys prior distribution $\pi(\beta, \lambda) \propto 1/\lambda$: set $a = b = 0$.
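A corresponding sketch for the Case II (noninformative) results, again with illustrative names only; setting a = b = 0 gives the Jeffreys-prior case.

```r
# Sketch of the Case II (noninformative) posterior quantities; setting
# a = b = 0 gives the Jeffreys-prior results.  Names are illustrative.
posterior_case2 <- function(y_tr, F_tr, R_tr, a = 0, b = 0) {
  Rinv_F <- solve(R_tr, F_tr)
  FtRF   <- t(F_tr) %*% Rinv_F                     # F^T R^{-1} F
  beta_hat <- solve(FtRF, t(Rinv_F) %*% y_tr)
  resid  <- y_tr - F_tr %*% beta_hat
  a1 <- (2 * a + length(y_tr) - ncol(F_tr)) / 2
  b1 <- (2 * b + drop(t(resid) %*% solve(R_tr, resid))) / 2
  # beta | y_tr is t_k(2 * a1, beta_hat, (b1 / a1) * solve(FtRF))
  list(a1 = a1, b1 = b1, beta_hat = beta_hat, V_hat = solve(FtRF))
}
```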

Bayesian Prediction: Predictive Distribution

$$p(y^s_{te} \mid y^s_{tr}) \text{ is } t_{n_p}\!\left(2a_1, \mu_{te}, (b_1/a_1)\Sigma_{te}\right),$$
$$\mu_{te} = F_{te}\hat\mu + R_{te,tr}R_{tr}^{-1}\left(y^s_{tr} - F_{tr}\hat\mu\right), \qquad \Sigma_{te} = R_{te} - R_{te,tr}R_{tr}^{-1}R^T_{te,tr} + H_{te}\hat V H^T_{te},$$
where $H_{te} = F_{te} - R_{te,tr}R_{tr}^{-1}F_{tr}$.

For Case I priors, $\hat\mu$ and $\hat V$ are given on the Priors and Posteriors I slide. For Case II priors, $\hat\mu = \hat\beta$ and $\hat V = \left(F_{tr}^T R_{tr}^{-1} F_{tr}\right)^{-1}$. For the Jeffreys prior, $\mu_{te}$ is the BLUP and $\Sigma_{te}$ the associated prediction uncertainty.
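Draws from this multivariate t predictive distribution can be generated with the usual normal/chi-square construction. The sketch below assumes $\mu_{te}$, $\Sigma_{te}$, $a_1$, and $b_1$ have already been computed (for example, with the hypothetical helpers above); the function name draw_predictive is illustrative.

```r
# Sketch: draw one realization from the t_{n_p}(2*a1, mu_te, (b1/a1)*Sigma_te)
# predictive distribution via the standard normal / chi-square construction.
# mu_te, Sigma_te, a1, b1 are assumed to have been computed as above.
draw_predictive <- function(mu_te, Sigma_te, a1, b1) {
  np <- length(mu_te)
  df <- 2 * a1
  L  <- chol((b1 / a1) * Sigma_te)                  # upper-triangular Cholesky factor
  z  <- rnorm(np)                                   # standard normal draw
  w  <- rchisq(1, df = df)                          # chi-square mixing draw
  drop(mu_te + t(L) %*% z * sqrt(df / w))           # multivariate t draw
}
```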

Bayesian Prediction: Uncertain Correlation Parameters

Parametric correlation functions: $\phi$ denotes the uncertain correlation function parameters. For example, consider the Gaussian correlation function parameterized by correlation lengths $0 < \rho_i < 1$:
$$R(u, v) = \exp\!\left[\,4\sum_{i=1}^{k}\log(\rho_i)\,(u_i - v_i)^2\right]; \qquad u, v \in [0, 1]^k, \qquad \phi = (\rho_1, \ldots, \rho_k).$$

Predictive distribution:
$$p(y^s_{te} \mid y^s_{tr}) = \int p(y^s_{te} \mid y^s_{tr}, \phi)\, \pi(\phi \mid y^s_{tr})\, d\phi,$$
where $p(y^s_{te} \mid y^s_{tr}, \phi)$ is obtained from the results on the Predictive Distribution slide and $\pi(\phi \mid y^s_{tr})$ from the results on the next slide.
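The Gaussian correlation function above is easy to evaluate directly. The following sketch builds the training correlation matrix from an n x k design matrix with inputs scaled to [0, 1]; the function name gauss_corr_matrix is illustrative.

```r
# Sketch: Gaussian correlation R(u, v) = exp(4 * sum(log(rho) * (u - v)^2))
# for inputs scaled to [0, 1]^k.  X is an n x k design matrix, rho a length-k
# vector with 0 < rho_i < 1.  Names are illustrative.
gauss_corr_matrix <- function(X, rho) {
  n <- nrow(X)
  R <- matrix(1, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      d2 <- (X[i, ] - X[j, ])^2
      R[i, j] <- R[j, i] <- exp(4 * sum(log(rho) * d2))
    }
  }
  R
}
```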

Bayesian Prediction: Correlation Posterior

Correlation parameter posterior distribution:
$$\pi(\phi \mid y^s_{tr}) \propto \pi(\phi)\; b_1^{-a_1}\; \lvert R_{tr}\rvert^{-1/2}\; \lvert F_{tr}^T R_{tr}^{-1} F_{tr}\rvert^{-1/2}.$$
For a flat prior $\pi(\phi) \propto 1$, take $\pi(\phi) = 1$ in this expression.

Example: Gaussian correlation with $(\rho_1, \ldots, \rho_k)$ independent, $\rho_i \sim \text{Beta}(a_\rho, b_\rho)$; $a_\rho = 1$, $b_\rho = 0.1 \implies$ effect sparsity. Draws $(\phi_1, \ldots, \phi_M)$ are sampled from $\pi(\phi \mid y^s_{tr})$ via MCMC.
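For MCMC it is convenient to work with the log posterior of $\phi$. The sketch below implements the noninformative-$\beta$ form of the expression above, up to an additive constant, using the hypothetical helpers gauss_corr_matrix() and posterior_case2() from the earlier sketches and a Beta(1, 0.1) prior on each $\rho_i$ as in the example; all names are illustrative.

```r
# Sketch: log of the correlation-parameter posterior (up to an additive
# constant), for the noninformative-beta form shown above.  Uses the
# hypothetical helpers gauss_corr_matrix() and posterior_case2() from the
# earlier sketches; log_prior_phi is the log prior density of the rho's.
log_post_phi <- function(rho, X_tr, y_tr, F_tr, a = 0, b = 0,
                         log_prior_phi = function(r) sum(dbeta(r, 1, 0.1, log = TRUE))) {
  R_tr <- gauss_corr_matrix(X_tr, rho)
  post <- posterior_case2(y_tr, F_tr, R_tr, a = a, b = b)
  L_R  <- chol(R_tr)                                # for log |R_tr|
  FtRF <- t(F_tr) %*% solve(R_tr, F_tr)
  L_F  <- chol(FtRF)                                # for log |F^T R^{-1} F|
  -post$a1 * log(post$b1) - sum(log(diag(L_R))) - sum(log(diag(L_F))) +
    log_prior_phi(rho)
}
```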

MCMC: Metropolis-Hastings

Objective: generate a sample from the target $\pi(x)$.

Algorithm: repeat for $j = 1, 2, \ldots, M$:
- Generate $y$ from the proposal density $q(x_j, \cdot)$ and $u$ from Uniform$(0, 1)$.
- If $u \le \alpha(x_j, y)$, where
$$\alpha(x, y) = \begin{cases} \min\!\left[\dfrac{\pi(y)\,q(y, x)}{\pi(x)\,q(x, y)},\; 1\right] & \text{if } \pi(x)\,q(x, y) > 0, \\[4pt] 1 & \text{otherwise}, \end{cases}$$
set $x_{j+1} = y$; otherwise set $x_{j+1} = x_j$.
- Return the values $x_1, \ldots, x_M$.

Implementation: discard the initial $m_0$ samples as "burn-in".

Metropolis: for a symmetric proposal distribution, $q(y, x) = q(x, y)$, so $\alpha(x, y) = \min\left[\pi(y)/\pi(x),\, 1\right]$.

The challenge is choosing $q(x, \cdot)$ for effective mixing; rule-of-thumb target acceptance rates are 23.4% for multi-parameter random-walk proposals, 44% for a single parameter, and 57.4% for Langevin diffusion proposals. A minimal random-walk Metropolis sketch follows.
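Below is a minimal random-walk Metropolis sketch in R (symmetric Gaussian proposal, so the proposal ratio cancels). It is illustrative rather than a reference implementation: log_target is any function returning $\log \pi(x)$ up to a constant (for example, the log_post_phi sketch above), and the reported acceptance rate can be used to tune the step size toward the rule-of-thumb rates quoted above.

```r
# Sketch: random-walk Metropolis with a symmetric Gaussian proposal, so the
# q-ratio cancels.  log_target returns log pi(x) up to a constant; x0 is the
# starting value; step is the proposal standard deviation (scalar or vector).
metropolis <- function(log_target, x0, n_iter = 10000, step = 0.1, burn_in = 1000) {
  d <- length(x0)
  chain <- matrix(NA_real_, n_iter, d)
  x <- x0
  lp_x <- log_target(x)
  n_accept <- 0
  for (j in seq_len(n_iter)) {
    y <- x + rnorm(d, sd = step)            # symmetric proposal
    lp_y <- log_target(y)
    if (log(runif(1)) <= lp_y - lp_x) {     # accept with prob min(pi(y)/pi(x), 1)
      x <- y; lp_x <- lp_y; n_accept <- n_accept + 1
    }
    chain[j, ] <- x
  }
  message(sprintf("acceptance rate: %.1f%%", 100 * n_accept / n_iter))
  if (burn_in > 0) chain <- chain[-seq_len(burn_in), , drop = FALSE]  # discard burn-in
  chain
}
```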

Bayesian Prediction: Estimation of Predictive Distribution

Predictive distribution:
$$p(y^s_{te} \mid y^s_{tr}) = \int p(y^s_{te} \mid y^s_{tr}, \phi)\, \pi(\phi \mid y^s_{tr})\, d\phi.$$

Predictive distribution estimation: given $(\phi_1, \ldots, \phi_M) \sim \pi(\phi \mid y^s_{tr})$ sampled via MCMC,
$$p(y^s_{te} \mid y^s_{tr}) \approx \frac{1}{M}\sum_{i=1}^{M} p(y^s_{te} \mid y^s_{tr}, \phi_i),$$
$$\bar\mu_{te} = \frac{1}{M}\sum_{i=1}^{M} \mu^{(i)}_{te}, \qquad \bar\Sigma_{te} = \frac{1}{M}\sum_{i=1}^{M}\left[\Sigma^{(i)}_{te} + \left(\mu^{(i)}_{te} - \bar\mu_{te}\right)\left(\mu^{(i)}_{te} - \bar\mu_{te}\right)^T\right].$$
In the previous example with uncertain correlation parameters ($\xi = \phi$), the estimated predictive distribution is a mixture of t-distributions.
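The moment-matching formulas above translate directly into code. In the sketch below, mu_list and Sigma_list are assumed to hold the conditional mean vector and covariance matrix evaluated at each MCMC draw $\phi_i$ (for example, using the predictive-distribution helpers sketched earlier); the function name mixture_moments is illustrative.

```r
# Sketch: moment-matched mean and covariance of the t-mixture predictive
# distribution.  mu_list and Sigma_list hold the conditional mean vector and
# covariance matrix computed at each MCMC draw phi_i (names are illustrative).
mixture_moments <- function(mu_list, Sigma_list) {
  M <- length(mu_list)
  mu_bar <- Reduce(`+`, mu_list) / M
  Sigma_bar <- Reduce(`+`, lapply(seq_len(M), function(i) {
    dev <- mu_list[[i]] - mu_bar
    Sigma_list[[i]] + dev %*% t(dev)       # within-draw plus between-draw spread
  })) / M
  list(mean = mu_bar, cov = Sigma_bar)
}
```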

Bayesian Prediction: Damped Sine Example

$$p(y^s_{te} \mid y^s_{tr}) = \int p(y^s_{te} \mid y^s_{tr}, \phi)\, \pi(\phi \mid y^s_{tr})\, d\phi.$$

Priors: $\beta = 0$; $\lambda \sim$ Gamma$(5, 5)$; $\rho \sim$ Beta$(1, 0.1)$.

[Figure: left panel shows realizations of the fitted predictor together with the training data versus $x$; right panel shows the predictive standard error versus $x$ for the MAP and Bayes approaches.]

There is small variation in the conditional means compared with the average conditional variances:
$$s_{te} = \sqrt{\frac{1}{M}\sum_{i=1}^{M}\left[\left(\sigma^{(i)}_{te}\right)^2 + \left(\mu^{(i)}_{te} - \bar\mu_{te}\right)^2\right]}.$$

Bayesian Prediction: Sheet Metal Pockets Example

Setup: 6 code inputs, 60 training runs, 174 test runs. Prior distributions: $\beta = 0$, $\lambda \sim$ Gamma$(5, 5)$, $\rho \sim$ Beta$(5, 1)$.

[Figure: left panel plots the 174 observed failure depths against the Bayesian predictions of $y(x)$, with annotations RMSPE = 17.3, r = 0.98 and RMSPE = 17.9, r = 0.98; right panel plots the Bayesian prediction standard error against the REML-EBLUP prediction standard error.]

Bayesian prediction standard errors tend to be larger than the REML-EBLUP standard errors. Frequentist coverage of nominal 95% intervals: REML-EBLUP 61%, Bayes 71%.

Bayesian Prediction: Summary

Bayesian prediction is based on the predictive distribution $\pi(y_{new} \mid y_{current})$:
- Derive it analytically when possible.
- Otherwise, generate realizations from the parameter posterior (MCMC) and then from the conditional predictive distribution.

Many MCMC algorithms are implemented in software:
- MCMCpack, mcmc, adaptmcmc, AMCMC in R (http://cran.r-project.org)
- OpenBUGS, WinBUGS (http://www.mrc-bsu.cam.ac.uk/software/bugs/)
- Delayed Rejection Adaptive Metropolis (http://helios.fmi.fi/~lainema/dram/)

The R package coda provides a suite of MCMC diagnostic tools; an illustrative diagnostic session is sketched below.
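As an illustration (not from the slides), a typical coda diagnostic session on a chain such as the one returned by the metropolis() sketch above might look like the following; mcmc(), effectiveSize(), traceplot(), and autocorr.plot() are coda functions.

```r
# Illustrative use of the coda package for MCMC diagnostics on a chain,
# e.g. the matrix returned by the metropolis() sketch above.
library(coda)

chain <- matrix(rnorm(5000), ncol = 1)   # placeholder chain for illustration
mc <- mcmc(chain)                        # wrap as a coda mcmc object
summary(mc)                              # posterior summaries
effectiveSize(mc)                        # effective sample size
traceplot(mc)                            # trace plot for a visual mixing check
autocorr.plot(mc)                        # autocorrelation diagnostics
```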

References

Andrieu, C. and Thoms, J. (2008). A tutorial on adaptive MCMC. Statistics and Computing, 18, 343-373.
Casella, G. and George, E. (1992). Explaining the Gibbs sampler. The American Statistician, 46, 167-174.
Chib, S. and Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, 49, 327-335.
Currin, C., Mitchell, T.J., Morris, M.D., and Ylvisaker, D. (1991). Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. Journal of the American Statistical Association, 86, 953-963.
Haario, H., Laine, M., and Mira, A. (2006). DRAM: Efficient adaptive MCMC. Statistics and Computing, 16, 339-354.
O'Hagan, A. (1994). Kendall's Advanced Theory of Statistics, Volume 2B: Bayesian Inference. London: Edward Arnold.
Robert, C. and Casella, G. (2004). Monte Carlo Statistical Methods. New York: Springer.
Santner, T.J., Williams, B.J., and Notz, W.I. (2003). The Design and Analysis of Computer Experiments. New York: Springer.