Module 3: Gaussian Process Parameter Estimation, Prediction Uncertainty, and Diagnostics


Module 3: Gaussian Process Parameter Estimation, Prediction Uncertainty, and Diagnostics

Jerome Sacks and William J. Welch
National Institute of Statistical Sciences and University of British Columbia

Adapted from materials prepared by Jerry Sacks and Will Welch for various short courses

Acadia/SFU/UBC Course on Dynamic Computer Experiments, September-December 2014

J. Sacks and W. J. Welch (NISS & UBC), Module 3: Estimation and Uncertainty, Computer Experiments 2014, 1 / 20

Outline of Topics

1. Estimating the Parameters of the GP Model
2. Case Study: G-Protein Computer Experiment
3. Measuring Prediction Accuracy
4. GP Diagnostics
5. Summary
6. Appendix

Estimating the Parameters of the GP Model

Parameters of the Gaussian Process (GP) Model

Recall from Module 2 that the Gaussian process prior for $y(x) = y(x_1, \ldots, x_d)$ has hyper-parameters:
- mean, $\mu$
- variance, $\sigma^2$
- correlation parameters, e.g., $\theta_1, \ldots, \theta_d$ and $p_1, \ldots, p_d$ for the power-exponential correlation function,

$$R(x, x') = \prod_{j=1}^{d} \exp\left(-\theta_j |x_j - x'_j|^{p_j}\right)$$

Their values will be chosen to be consistent with the computer-model runs.
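As a rough illustration of the power-exponential correlation function, the sketch below evaluates $R(x, x')$ for a pair of input vectors and builds the correlation matrix for a small design. The function names `power_exp_corr` and `corr_matrix`, the three design points, and the parameter values are all made up for illustration; they are not from the course materials.

```python
import math

def power_exp_corr(x, x_prime, theta, p):
    """Power-exponential correlation R(x, x') between two input vectors."""
    total = 0.0
    for xj, xpj, tj, pj in zip(x, x_prime, theta, p):
        total += tj * abs(xj - xpj) ** pj
    # A product of exp(-a_j) terms equals exp of the negated sum of the a_j.
    return math.exp(-total)

def corr_matrix(X, theta, p):
    """n-by-n correlation matrix R for the design points in X (a list of vectors)."""
    return [[power_exp_corr(xi, xk, theta, p) for xk in X] for xi in X]

# Hypothetical design with d = 2 inputs and illustrative parameter values.
X = [[0.0, 0.0], [0.5, 0.0], [1.0, 1.0]]
theta = [1.0, 2.0]
p = [2.0, 2.0]
R = corr_matrix(X, theta, p)
```

The diagonal of `R` is 1 by construction, and a larger $\theta_j$ makes the correlation decay faster as the points move apart in input $j$.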

Maximum Likelihood

Recall also that $y(x)$ is assumed to be Gaussian. Hence, $y = [y(x^{(1)}), \ldots, y(x^{(n)})]^T$, the data from the computer model, are a sample from a multivariate-normal distribution.

The likelihood, $L(y \mid \mu, \sigma^2, \theta_1, \ldots, \theta_d, p_1, \ldots, p_d)$, is

$$\frac{1}{(2\pi\sigma^2)^{n/2} \det^{1/2}(R)} \exp\left(-\frac{1}{2\sigma^2}(y - \mu 1)^T R^{-1} (y - \mu 1)\right)$$

- Maximum likelihood estimation (MLE) chooses the hyper-parameters to maximize this.
- Or use Bayes' rule to get a posterior distribution for the hyper-parameters and for predictions of $y(x)$ (see Appendix A).
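In practice the logarithm of the likelihood is usually the quantity that is maximized, since it is numerically better behaved. Taking the log of the multivariate-normal likelihood term by term gives the following (a worked restatement, not an additional slide):

```latex
\log L(y \mid \mu, \sigma^2, \theta, p)
  = -\frac{n}{2}\log\left(2\pi\sigma^2\right)
    - \frac{1}{2}\log\det(R)
    - \frac{1}{2\sigma^2}\,(y - \mu 1)^T R^{-1} (y - \mu 1)
```

The correlation parameters enter only through $R$.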

Maximum Likelihood: Computation

For fixed correlation parameters,

$$\hat\mu = \frac{1^T R^{-1} y}{1^T R^{-1} 1}$$

and

$$\hat\sigma^2 = \frac{1}{n} (y - \hat\mu 1)^T R^{-1} (y - \hat\mu 1)$$

The likelihood function (with $\hat\mu$ and $\hat\sigma^2$ substituted) has to be numerically maximized with respect to the correlation parameters.
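The closed-form estimates can be sketched in a few lines. The code below is a minimal, dependency-free illustration that assumes a fixed correlation matrix `R` is already available; the helper names (`cholesky`, `chol_solve`, `profile_fit`) and the two-point example are hypothetical. It also returns the profile log-likelihood, using the fact that the quadratic form equals $n\hat\sigma^2$ once $\hat\mu$ and $\hat\sigma^2$ are substituted. A numerical optimizer would then maximize this value over the correlation parameters; that outer loop is not shown.

```python
import math

def cholesky(A):
    """Lower-triangular L with A = L L^T, for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def profile_fit(R, y):
    """Return (mu_hat, sigma2_hat, profile log-likelihood) for a fixed R."""
    n = len(y)
    L = cholesky(R)
    Rinv_y = chol_solve(L, y)
    Rinv_1 = chol_solve(L, [1.0] * n)
    mu_hat = sum(Rinv_y) / sum(Rinv_1)          # 1^T R^-1 y / 1^T R^-1 1
    resid = [yi - mu_hat for yi in y]
    Rinv_resid = chol_solve(L, resid)
    sigma2_hat = sum(e * q for e, q in zip(resid, Rinv_resid)) / n
    log_det_R = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    # With mu_hat and sigma2_hat plugged in, the quadratic term reduces to n/2.
    loglik = (-0.5 * n * math.log(2.0 * math.pi * sigma2_hat)
              - 0.5 * log_det_R - 0.5 * n)
    return mu_hat, sigma2_hat, loglik

# Hypothetical two-run example: correlation 0.5 between the runs.
mu_hat, sigma2_hat, loglik = profile_fit([[1.0, 0.5], [0.5, 1.0]], [0.0, 2.0])
```

The Cholesky factor is reused for every solve and for the log-determinant, which is the usual reason for preferring it over forming $R^{-1}$ explicitly.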

Case Study: G-Protein Computer Experiment

G-Protein Computer Model

Biosystems model for so-termed ligand activation of G-protein in yeast.

d = 4 input variables:
- $x$ is the concentration of ligand
- $u_1, \ldots, u_8$ is a vector of 8 kinetic parameters (only $u_1$, $u_6$, and $u_7$ are varied)

The output variable $y$ is the normalized concentration of part of the complex.

G-Protein System Dynamics: Differential Equations

1. $\dot\eta_1 = -u_1 \eta_1 x + u_2 \eta_2 - u_3 \eta_1 + u_5$
2. $\dot\eta_2 = u_1 \eta_1 x - u_2 \eta_2 - u_4 \eta_2$
3. $\dot\eta_3 = -u_6 \eta_2 \eta_3 + u_8 (G_{tot} - \eta_3 - \eta_4)(G_{tot} - \eta_3)$
4. $\dot\eta_4 = u_6 \eta_2 \eta_3 - u_7 \eta_4$
5. $y = (G_{tot} - \eta_3)/G_{tot}$

where $\eta_1, \ldots, \eta_4$ are concentrations of 4 chemical species and $\dot\eta_1 \equiv \partial\eta_1/\partial t$, etc.

$G_{tot}$ = (fixed) total concentration of G-protein complex after 30 seconds.

Inputs and Code Runs

Input variables:
- d = 4 variables
- Work with $\log(x)$, $\log(u_1)$, $\log(u_6)$, $\log(u_7)$; i.e., what we called the x vector before is $\log(x)$, $\log(u_1)$, $\log(u_6)$, and $\log(u_7)$ here
- All input variable ranges are normalized to [0, 1] on the log scale

Number of runs: n = 33 (this choice and the design for the 33 runs are described in Module 4).
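Normalizing a positive input to [0, 1] on the log scale amounts to a linear rescaling of its logarithm. The sketch below shows one way to do it; the function name `normalize_log` and the range used for `u1_range` are hypothetical, since the actual input ranges are not given in these slides.

```python
import math

def normalize_log(value, lower, upper):
    """Map a positive value in [lower, upper] to [0, 1] on the log scale."""
    lo, hi = math.log(lower), math.log(upper)
    return (math.log(value) - lo) / (hi - lo)

# Hypothetical range for one kinetic parameter (not taken from the slides).
u1_range = (1e-3, 1e1)
z = normalize_log(1e-1, *u1_range)   # the geometric midpoint of the range
```

With this scaling, the geometric midpoint of the range maps to about 0.5, so equal distances in the normalized input correspond to equal ratios in the original parameter.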

Computer Model Data

[Figure: four scatter plots of the computer-model output (ymod) against each normalized input: log u1, log u6, log u7, and log x.]

Gaussian Process (GP) Model

$y(x)$ is a realization of a Gaussian process with:
- mean $\mu$
- variance $\sigma^2$
- correlations given by

$$\mathrm{Cor}(y(x), y(x')) \equiv R(x, x') = \prod_{j=1}^{4} e^{-\theta_j |x_j - x'_j|^{p_j}}$$

The parameters highlighted in red on the original slide need to be estimated.

Maximum Likelihood Estimates

$\hat\mu = 0.36$, $\hat\sigma^2 = 0.51$

Variable      $\hat\theta$   $\hat p$
$\log(x)$     0.929          1.98
$\log(u_1)$   0.179          2
$\log(u_6)$   0.082          2
$\log(u_7)$   0.083          2

It is difficult to interpret the magnitudes of the estimates (we will revisit this example in Module 5 and do a sensitivity analysis).

Measuring Prediction Accuracy

Plug-In Prediction and Standard Error

Replace all hyper-parameters by their MLEs in the conditional mean and variance formulas:

$$\text{prediction of } y(x) = \hat y = \hat m(x) = \hat\mu + r^T(x) R^{-1} (y - \hat\mu 1)$$

and

$$\text{estimated variance of prediction} = \hat v(x) = \hat\sigma^2 \left(1 - r^T(x) R^{-1} r(x)\right)$$

($R$ and $r(x)$ are also estimates.)

The plug-in estimated variance ignores uncertainty in estimating the hyper-parameters. It can be adapted to include uncertainty from estimating $\mu$:

$$\hat v(x) = \hat\sigma^2 \left(1 - r^T(x) R^{-1} r(x) + \frac{\left[1 - 1^T R^{-1} r(x)\right]^2}{1^T R^{-1} 1}\right)$$

This plug-in formula is often used to give a standard error, i.e., $s(x) = \sqrt{\hat v(x)}$.
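The plug-in formulas can be sketched directly. The code below is a minimal standard-library illustration, assuming the correlation matrix `R` and the vector `r` of correlations between the new point and the design points have already been computed from the estimated correlation parameters. The names `solve` and `plug_in_predict` and the two-run example are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def plug_in_predict(R, r, y):
    """Return (y_hat, s_plain, s_mu): the prediction and the two standard errors."""
    n = len(y)
    Rinv_1 = solve(R, [1.0] * n)
    Rinv_r = solve(R, r)
    one_Rinv_1 = sum(Rinv_1)
    mu_hat = sum(solve(R, y)) / one_Rinv_1
    resid = [yi - mu_hat for yi in y]
    sigma2_hat = sum(e * q for e, q in zip(resid, solve(R, resid))) / n
    # R is symmetric, so r^T R^-1 (y - mu 1) equals (R^-1 r)^T (y - mu 1).
    y_hat = mu_hat + sum(w * e for w, e in zip(Rinv_r, resid))
    r_Rinv_r = sum(a * b for a, b in zip(r, Rinv_r))
    v_plain = sigma2_hat * (1.0 - r_Rinv_r)
    v_mu = sigma2_hat * (1.0 - r_Rinv_r + (1.0 - sum(Rinv_r)) ** 2 / one_Rinv_1)
    # Guard against tiny negative values caused by rounding.
    return y_hat, math.sqrt(max(v_plain, 0.0)), math.sqrt(max(v_mu, 0.0))

# Hypothetical two-run example.
R = [[1.0, 0.5], [0.5, 1.0]]
y = [0.0, 2.0]
at_design = plug_in_predict(R, [1.0, 0.5], y)   # new x equal to the first run
far_away = plug_in_predict(R, [0.0, 0.0], y)    # new x uncorrelated with both runs
```

At a design point the predictor interpolates the data and both standard errors collapse to zero; far from the design the prediction reverts to $\hat\mu$, and the version that accounts for estimating $\mu$ gives the larger standard error.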

Measures of Accuracy

We could rely on the standard error, $\sqrt{\hat v(x)}$.

If we have $m$ test data observations, the root mean squared error (RMSE) of prediction is

$$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum \left(\hat y - y(x)\right)^2}$$

where the sum runs over the $m$ test points. Test data are rarely available, however, so cross validation (CV) stands in for test points.
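The RMSE formula translates to a one-line computation. The sketch below uses a hypothetical helper name `rmse` and made-up numbers purely to show the arithmetic.

```python
import math

def rmse(y_pred, y_true):
    """Root mean squared error of predictions over m test points."""
    m = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / m)

# Made-up predictions and test outputs: every prediction is off by exactly 1.
value = rmse([0.0, 0.0, 0.0, 0.0], [1.0, -1.0, 1.0, -1.0])
```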

GP Diagnostics

Cross Validation (CV)

Let $x^{(i)}$ denote $x$ for run $i$ in the data ($i = 1, \ldots, n$). For run $i$:
- The cross-validated prediction of $y(x^{(i)})$ is $\hat y_{-i}(x^{(i)})$, i.e., $\hat y(x) = \hat m(x)$ computed from the $n - 1$ runs excluding run $i$
- The cross-validated standard error of $\hat y_{-i}(x^{(i)})$ is $s_{-i}(x^{(i)})$, i.e., $s(x) = \sqrt{\hat v(x)}$ computed from the $n - 1$ runs excluding run $i$
- The cross-validated residual for run $i$ is $y(x^{(i)}) - \hat y_{-i}(x^{(i)})$
- The standardized cross-validated residual for run $i$ is

$$\frac{y(x^{(i)}) - \hat y_{-i}(x^{(i)})}{s_{-i}(x^{(i)})}$$
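Given leave-one-out predictions and standard errors, the two kinds of residual defined above are simple element-wise computations. The sketch below assumes those leave-one-out quantities have already been produced by some GP fitting routine; the function name `cv_residuals` and the numbers are made up for illustration.

```python
def cv_residuals(y, y_loo, s_loo):
    """Cross-validated and standardized cross-validated residuals.

    y[i]     : observed output for run i
    y_loo[i] : prediction of y[i] from the n - 1 runs excluding run i
    s_loo[i] : standard error of that prediction
    """
    resid = [yi - pi for yi, pi in zip(y, y_loo)]
    std_resid = [e / s for e, s in zip(resid, s_loo)]
    return resid, std_resid

# Made-up values for two runs.
resid, std_resid = cv_residuals([0.5, 0.25], [0.25, 0.5], [0.125, 0.25])
```

A standardized residual of magnitude 2 means the prediction missed by two standard errors.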

Diagnostic Plots

- Plot the cross-validated residuals to assess the overall magnitude of error
- Plot the standardized cross-validated residuals to assess the validity of the standard error for individual predictions

G-Protein Diagnostic Plots

[Figure: two panels. Left: ymod versus predicted ymod. Right: standardized residual versus predicted ymod.]

Cross Validation: Numerical Summaries

Magnitude of error:
- The cross-validated root mean squared error is

$$\mathrm{CVRMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y(x^{(i)}) - \hat y_{-i}(x^{(i)})\right)^2} = 0.20$$

- The maximum cross-validated residual is 0.44
- Fairly accurate relative to a range of about 0.7 in $y$

Standard errors?
- The standardized residuals

$$\frac{y(x^{(i)}) - \hat y_{-i}(x^{(i)})}{s_{-i}(x^{(i)})} \quad \text{for } i = 1, \ldots, n$$

  are roughly in $(-2, 2)$
- Standard errors look reliable

Fast and Slow CV

- When run $i$ is removed, the hyper-parameters should be re-estimated
- For computational reasons the correlation parameters are often not updated (it is cheap to update the estimates of $\mu$ and $\sigma^2$), producing a fast CV
- For slow CV, do, say, 10-fold cross-validation, re-estimating all hyper-parameters
- The agreement between fast CVRMSE and slow CVRMSE is often good
- The agreement between fast CVRMSE and the RMSE from test points has been good in examples
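A fast CV of the kind described above can be sketched as a leave-one-out loop that keeps the full-data correlation matrix fixed and only re-estimates $\mu$ and $\sigma^2$ for each deleted run. The code below is a naive, standard-library illustration; it re-solves the linear systems for every deleted run rather than using a cheaper update. The names `solve` and `fast_cv` are hypothetical, and the plain plug-in variance is used here as an assumption about which standard error is wanted.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def fast_cv(R, y):
    """Leave-one-out predictions and standard errors with R held fixed."""
    n = len(y)
    y_loo, s_loo = [], []
    for i in range(n):
        keep = [k for k in range(n) if k != i]
        R_sub = [[R[a][b] for b in keep] for a in keep]   # R without run i
        r = [R[a][i] for a in keep]                       # correlations with run i
        y_sub = [y[a] for a in keep]
        Rinv_r = solve(R_sub, r)
        # Cheap updates: mu and sigma^2 are re-estimated from the n - 1 runs.
        mu = sum(solve(R_sub, y_sub)) / sum(solve(R_sub, [1.0] * (n - 1)))
        resid = [v - mu for v in y_sub]
        sigma2 = sum(e * q for e, q in zip(resid, solve(R_sub, resid))) / (n - 1)
        y_loo.append(mu + sum(w * e for w, e in zip(Rinv_r, resid)))
        var = sigma2 * (1.0 - sum(a * b for a, b in zip(r, Rinv_r)))
        s_loo.append(math.sqrt(max(var, 0.0)))
    return y_loo, s_loo

# Hypothetical check case: uncorrelated runs, so each prediction is the mean of the others.
identity = [[1.0 if a == b else 0.0 for b in range(4)] for a in range(4)]
y_loo, s_loo = fast_cv(identity, [1.0, 2.0, 3.0, 6.0])
```

A slow CV would instead re-run the numerical maximization over the correlation parameters inside the loop (or once per fold), which is where most of the cost lies.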

Summary

Module Summary

- The GP model has to be tuned to data so that its properties match those of the computer model
- Tuning (fitting) the GP by maximum likelihood is computationally feasible for up to about n = 1000 runs and d = 50 input variables
- The GP model gives an approximation and a measure of accuracy
- The measure of accuracy (standard error) can be checked for validity by cross validation

Appendix

Appendix A: Bayesian Treatment of the Hyper-parameters

Posterior distribution of the hyper-parameters ("hyper" below), $\mu, \sigma^2, \theta_1, \ldots, \theta_d$, etc., of the GP. From Bayes' rule, given the data $y$,

$$p(\text{hyper} \mid y) \propto \pi(\text{hyper}) \, L(y \mid \text{hyper})$$

- $\pi(\text{hyper})$ is the prior for hyper
- $L(y \mid \text{hyper})$ is the multivariate normal likelihood

Predictive distribution for $y(x)$ at a new $x$:

$$p(y(x) \mid y) = \int p(y(x) \mid y, \text{hyper}) \, p(\text{hyper} \mid y) \, d\,\text{hyper}$$

Usually, the integration is not carried out explicitly. Rather, properties such as the posterior predictive mean and variance of $p(y(x) \mid y)$ are obtained by MCMC sampling of the posterior distribution for the hyper-parameters, $p(\text{hyper} \mid y)$.
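One common way to turn MCMC output into a predictive mean and variance is to average over the sampled hyper-parameters, using the law of total variance. The notation below is introduced here for illustration and is not from the slides: $\text{hyper}^{(1)}, \ldots, \text{hyper}^{(S)}$ are $S$ posterior draws, and $m_s(x)$ and $v_s(x)$ are the conditional mean and variance of $y(x)$ given $y$ and $\text{hyper}^{(s)}$.

```latex
\mathrm{E}[y(x) \mid y] \approx \bar m(x) = \frac{1}{S} \sum_{s=1}^{S} m_s(x),
\qquad
\mathrm{Var}[y(x) \mid y] \approx \frac{1}{S} \sum_{s=1}^{S} v_s(x)
  + \frac{1}{S} \sum_{s=1}^{S} \left( m_s(x) - \bar m(x) \right)^2
```

The second variance term is the part that a plug-in standard error leaves out: it measures how much the prediction itself moves as the hyper-parameters vary over their posterior.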