Analysis of Regression and Bayesian Predictive Uncertainty Measures


Dan Lu, Mary C. Hill, Ming Ye
Florida State University, Tallahassee, FL, USA (dl7f@fsu.edu, mye@fsu.edu); U.S. Geological Survey, Boulder, CO, USA (mchill@usgs.gov)

ABSTRACT

Predictive uncertainty can be quantified using confidence and probability intervals constructed around predictions. Confidence intervals are based on regression inferential theory; probability intervals are based on Bayesian theory. For the confidence intervals, this work considered linear and nonlinear confidence intervals obtained using methods that require tens and hundreds of model runs, respectively. The probability intervals are obtained using Markov chain Monte Carlo (MCMC) methods that require thousands of model runs. Confidence and probability intervals are conceptually different and mathematically equivalent only under certain conditions. We use simple test cases to show that for linear models the two types of intervals are mathematically equivalent with proper choices of prior probability. For nonlinear models, however, regardless of the choice of prior probability, the two types of intervals are always different, and the discrepancy depends on the model's total nonlinearity. It is therefore inappropriate to use the two intervals to validate each other, as has been done in previous practice.

INTRODUCTION

Groundwater modeling is often used to predict the effects of future anthropogenic or natural occurrences. Since model predictions are inherently uncertain, quantification of predictive uncertainty is necessary. Confidence intervals and probability intervals constructed around the predictions can be used as measures of predictive uncertainty. Confidence intervals are based on inferential statistical theory from regression; both linear and nonlinear intervals can be calculated. Probability intervals are based on Bayesian theory, and Markov chain Monte Carlo (MCMC) has become a popular method for estimating them.
While comparative studies of the two types of predictive uncertainty measures have been conducted (e.g., Vrugt and Bouten, 2002; Gallagher and Doherty, 2007), the underlying theoretical differences remain unclear. The purpose of this work is to compare the two kinds of predictive uncertainty measures by investigating their theoretical differences. To illustrate these differences, we consider a set of simple test cases with linear and nonlinear models.

CONFIDENCE INTERVALS AND PROBABILITY INTERVALS

Confidence intervals, from the frequentist point of view, represent the percentage of the time in repeated sampling that the intervals contain the true prediction. To understand this better, consider the procedure for evaluating confidence intervals: first sample N sets of observations based on the distribution of the errors, and then calculate the confidence interval (with confidence level 1−α) for a certain prediction function from each of the N sets of observations. Of the N intervals, (1−α)×100% contain the true value. For linear intervals, the portions of the time that the true value is larger than the upper confidence limit or smaller than the lower limit are equal, each being α/2. For nonlinear intervals the portions are not necessarily equal.

Probability intervals, inferred from Bayesian theory, represent the posterior probability that the prediction lies in the interval. In Bayesian statistics, a prediction is treated as a random variable with its own distribution. The posterior distribution summarizes the state of knowledge about the unknown prediction conditional on the prior and the current data: the narrower the distribution, the greater our knowledge about the prediction. This is measured by the probability interval, a probabilistic region around a posterior statistic such as the posterior mean.
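The repeated-sampling interpretation above can be checked numerically. The sketch below is our own illustration, not from the paper: the linear model, parameter values, and random seed are arbitrary choices. It draws many synthetic data sets, builds a 95% regression confidence interval on a prediction from each, and counts how often the true prediction is covered.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Illustrative linear model y = a*x + b + e with e ~ N(0, 1); predict at x0 = 30.
x = np.arange(1, 21, dtype=float)
X = np.column_stack([x, np.ones_like(x)])     # design matrix for parameters [a, b]
beta_true = np.array([2.0, 3.0])
z0 = np.array([30.0, 1.0])                    # prediction g(beta) = z0 @ beta
g_true = z0 @ beta_true

n, p = X.shape
alpha = 0.05
tcrit = stats.t.ppf(1 - alpha / 2, df=n - p)
XtX_inv = np.linalg.inv(X.T @ X)              # weight matrix omega = I here

trials = 2000
covered = 0
for _ in range(trials):
    y = X @ beta_true + rng.standard_normal(n)      # a fresh set of observations
    beta_hat = XtX_inv @ (X.T @ y)                  # least-squares estimate
    r = y - X @ beta_hat
    s2 = r @ r / (n - p)                            # estimated error variance
    half = tcrit * np.sqrt(s2 * z0 @ XtX_inv @ z0)  # t-based half width
    g_hat = z0 @ beta_hat
    covered += (g_hat - half <= g_true <= g_hat + half)

coverage = covered / trials
print(f"empirical coverage of the 95% interval: {coverage:.3f}")
```

With enough trials the empirical coverage settles near the nominal 0.95, which is exactly the frequentist claim described above.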
They are calculated here using Markov Chain Monte Carlo (MCMC) methods that generate the entire posterior probability distribution from which the intervals are determined.

RELATIONSHIP BETWEEN CONFIDENCE AND PROBABILITY INTERVALS

First consider a linear model, y = Xβ + ε, with n observations in the vector y, p unknown true parameters in the vector β, and true random errors in the vector ε. The random error is assumed to be multivariate Gaussian, ε ~ N_n(0, C), where C = σ²ω⁻¹ and ω is the weight matrix used in the objective function of inverse modeling. The estimates of β are multivariate Gaussian, β̂ ~ N_p(β*, σ²(XᵀωX)⁻¹), where X is the sensitivity matrix. Consider a linear prediction function g(β) = Zβ. Using regression theory, the (1−α)×100% confidence interval on the prediction (assuming that the model correctly represents reality) is given for two circumstances, with σ² unknown or known. When σ² is unknown and is estimated by the calculated error variance s² = (y − Xβ̂)ᵀω(y − Xβ̂)/(n − p), the distribution of g(β̂) follows a t-distribution, and the confidence interval is (Hill and Tiedeman, 2007)

    g(β̂) ± t_{1−α/2}(n−p) · [s² Zᵀ(XᵀωX)⁻¹Z]^{1/2}    (1)

where t_{1−α/2}(n−p) is the t statistic with significance level α and degrees of freedom equal to n−p. When σ² is known, the distribution of g(β̂) is normal, and the confidence interval is (McClave and Sincich, 2000)

    g(β̂) ± z_{1−α/2} · [Zᵀ(XᵀC⁻¹X)⁻¹Z]^{1/2}    (2)

where z_{1−α/2} is the z statistic with significance level α. In Bayesian statistics with noninformative priors, for which p(β) ∝ constant and p(σ²) ∝ 1/σ², the posterior distribution of g(β) is a multivariate t-distribution. Thus, the (1−α)×100% probability intervals for g(β) are the same as those of equation (1) derived from regression theory. In the same context, for an informative conjugate prior p(β) = N_p(β_p, C_p), and assuming σ² is known, the posterior distribution of g(β) is multivariate normal. Thus, the (1−α)×100% probability interval for g(β) (assuming that the model correctly represents reality) is (McLaughlin and Townley, 1996)

    g(β′_p) ± z_{1−α/2} · [Zᵀ(XᵀC⁻¹X + C_p⁻¹)⁻¹Z]^{1/2}    (3)

where β′_p is the posterior mean. As C_p⁻¹ → 0, equation (3) reduces to equation (2).
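The relationship between equations (2) and (3) can be verified numerically. The sketch below is our own illustration (the design matrix, prior mean, noise level, and prior variances are assumed values): it computes the known-σ confidence interval of equation (2) and the conjugate-prior probability interval of equation (3), and shows that the latter approaches the former as the prior becomes uninformative (C_p⁻¹ → 0).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Assumed setup: y = X beta + e, e ~ N(0, C) with C = sigma^2 * I and sigma known.
x = np.arange(1, 21, dtype=float)
X = np.column_stack([x, np.ones_like(x)])
sigma2 = 1.0
C_inv = np.eye(len(x)) / sigma2
y = X @ np.array([2.0, 3.0]) + rng.standard_normal(len(x))
Z = np.array([30.0, 1.0])                     # linear prediction g(beta) = Z @ beta
z = stats.norm.ppf(0.975)                     # z statistic for alpha = 0.05

# Equation (2): confidence interval half-width with known sigma
A = X.T @ C_inv @ X
beta_hat = np.linalg.solve(A, X.T @ C_inv @ y)
half_ci = z * np.sqrt(Z @ np.linalg.solve(A, Z))

# Equation (3): probability interval under a conjugate prior N(beta_p, C_p)
beta_p = np.zeros(2)
for tau2 in [1.0, 1e6]:                       # prior variance; 1e6 ~ nearly noninformative
    Cp_inv = np.eye(2) / tau2
    A_post = A + Cp_inv
    beta_post = np.linalg.solve(A_post, X.T @ C_inv @ y + Cp_inv @ beta_p)
    half_pi = z * np.sqrt(Z @ np.linalg.solve(A_post, Z))
    print(f"tau2={tau2:g}: center={Z @ beta_post:.3f}, half-width={half_pi:.3f}")

print(f"equation (2): center={Z @ beta_hat:.3f}, half-width={half_ci:.3f}")
```

With an informative prior (small tau2) the probability interval is shifted toward the prior mean and narrower; with a nearly flat prior its center and half-width match equation (2) to numerical precision.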
The only difference is that the prediction is evaluated at β′_p, the posterior mean determined from Bayesian theory, instead of at β̂, the least-squares estimate determined from regression theory. For a linear problem the two parameter estimates are the same, and equations (2) and (3) produce the same intervals.

For a nonlinear model y = f(β) + ε with parameters β and errors ε ~ N_n(0, C) with known C, Bayes' theorem with a noninformative prior gives the posterior density of the parameters β as (Berger, 1985)

    p(β|y) = exp[log p(y|β)] / ∫ exp[log p(y|β)] dβ    (4)

Consider a Taylor series expansion of log p(y|β) about β̂ to the second-order term, where β̂ maximizes the log likelihood log p(y|β). Then equation (4) is approximated by:

    p(β|y) ≈ exp[log p(y|β̂) − ½(β − β̂)ᵀ I(β̂)(β − β̂)] / ∫ exp[log p(y|β̂) − ½(β − β̂)ᵀ I(β̂)(β − β̂)] dβ
           = (2π)^{−p/2} |I(β̂)|^{1/2} exp[−½(β − β̂)ᵀ I(β̂)(β − β̂)]    (5)

where I(β̂) = −∂² log p(y|β)/∂β∂βᵀ, evaluated at β = β̂, is the Fisher information matrix. That is, the posterior density is approximately p(β|y) ≈ N_p(β̂, [I(β̂)]⁻¹). When the model is linear (i.e., f(β) = Xβ), the approximation is exact, with I(β̂) = XᵀC⁻¹X. In this case, the probability interval of g(β) obtained from the posterior distribution is mathematically equivalent to its confidence interval from regression, as shown in equation (2). However, if the model is highly nonlinear, as indicated by a large total nonlinearity, ignoring the higher-order terms can cause significant error, and the confidence and probability intervals can be very different. The difference depends on the size of the higher-order terms, which is reflected in the skew of the posterior distribution.

In addition to the linear confidence intervals above, nonlinear confidence intervals are available from regression theory (Vecchia and Cooley, 1987; Cooley, 2004; Hill and Tiedeman, 2007); these can account for the higher-order terms lost in model linearization. Nonlinear intervals can be calculated using the likelihood method of Vecchia and Cooley (1987), which determines the minimum and maximum values of the prediction over a confidence region on the parameter set. The confidence region is defined in p-dimensional parameter space and has a specified probability of containing the true set of parameter values, as illustrated in Figure 1.

Figure 1: Geometry of a nonlinear confidence interval on prediction g(b). Shown are the parameter confidence region (shaded area), contours of constant g(b) (dashed lines), and the locations of the minimum (g(b) = c_1, with b = b_L) and maximum (g(b) = c_4, with b = b_U) values of the prediction on the confidence region. The lower and upper limits of the nonlinear confidence interval on g(b) are thus c_1 and c_4, respectively. (Adapted from Hill and Tiedeman, 2007, Figure 8.3.)
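The quality of the approximation in equation (5) can be illustrated with a one-parameter example. This is our own toy model, not one of the paper's test cases: the exact posterior under a flat prior is computed on a grid, and its 95% interval is compared with the Gaussian (Laplace) interval implied by equation (5).

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)

# Toy one-parameter nonlinear model (an assumption, not from the paper):
# y_i = exp(b * x_i) + e_i, with e ~ N(0, 0.1^2) and known sigma.
x = np.linspace(0.1, 1.0, 10)
sig = 0.1
y = np.exp(1.0 * x) + sig * rng.standard_normal(x.size)

def loglik(b):
    r = y - np.exp(b * x)
    return -0.5 * (r @ r) / sig**2

# Exact posterior under a flat prior, normalized on a grid (equation (4))
grid = np.linspace(0.5, 1.5, 4001)
dx = grid[1] - grid[0]
logp = np.array([loglik(b) for b in grid])
post = np.exp(logp - logp.max())
post /= post.sum() * dx
cdf = np.cumsum(post) * dx
lo_exact, hi_exact = np.interp([0.025, 0.975], cdf, grid)

# Laplace approximation (equation (5)): Gaussian at the MLE, variance = 1/information
b_hat = minimize_scalar(lambda b: -loglik(b), bounds=(0.5, 1.5), method="bounded").x
h = 1e-4
info = -(loglik(b_hat + h) - 2 * loglik(b_hat) + loglik(b_hat - h)) / h**2
sd = 1.0 / np.sqrt(info)
lo_lap, hi_lap = stats.norm.ppf([0.025, 0.975], loc=b_hat, scale=sd)

print(f"exact   95% interval: [{lo_exact:.4f}, {hi_exact:.4f}]")
print(f"Laplace 95% interval: [{lo_lap:.4f}, {hi_lap:.4f}]")
```

For this gently nonlinear model the two intervals nearly coincide; as the nonlinearity grows (e.g., by increasing the noise or steepening the model), the posterior skews and the Gaussian approximation, and hence the equivalence with linear confidence intervals, breaks down.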
The method for computing nonlinear confidence intervals involves first defining the (1−α)×100% parameter confidence region. This region is the set of parameter values b for which the objective-function values S(b) satisfy

    S(b) ≤ S(b′) + s² [t_{1−α/2}(n−p)]²    (6)

where b′ contains the optimized parameter values. Nonlinear intervals are also shown in the results below for the simple test cases.
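A minimal sketch of this computation follows, using our own toy nonlinear model (not the paper's test case) and a brute-force parameter-grid scan in place of the constrained optimization used in practice: the objective-function goal of equation (6) defines the parameter confidence region, and the nonlinear interval limits are the extreme prediction values over that region.

```python
import numpy as np
from scipy import stats
from scipy.optimize import least_squares

rng = np.random.default_rng(2)

# Toy two-parameter nonlinear model (an assumption, not the paper's test case):
# y = a * exp(b * x) + e, with the prediction taken at x_pred = 1.5.
x = np.linspace(0.1, 1.0, 10)
y = 2.0 * np.exp(1.0 * x) + 0.1 * rng.standard_normal(x.size)
x_pred = 1.5

def resid(p):
    return y - p[0] * np.exp(p[1] * x)

fit = least_squares(resid, x0=[1.0, 0.5])
p_opt = fit.x
n, npar = x.size, 2
S_min = resid(p_opt) @ resid(p_opt)          # minimized objective function (omega = I)
s2 = S_min / (n - npar)
tcrit = stats.t.ppf(0.975, df=n - npar)
goal = S_min + s2 * tcrit**2                 # objective-function goal, equation (6)

# Brute-force scan of the parameter plane for points inside the confidence region
aa = np.linspace(p_opt[0] - 0.5, p_opt[0] + 0.5, 201)
bb = np.linspace(p_opt[1] - 0.5, p_opt[1] + 0.5, 201)
A, B = np.meshgrid(aa, bb)
S_grid = ((y[:, None, None] - A * np.exp(B * x[:, None, None])) ** 2).sum(axis=0)
inside = S_grid <= goal
g = A * np.exp(B * x_pred)                   # prediction g(b) over the grid
lower, upper = g[inside].min(), g[inside].max()
print(f"95% nonlinear confidence interval on the prediction: [{lower:.3f}, {upper:.3f}]")
```

The grid scan makes the geometry of Figure 1 explicit: the interval limits are the smallest and largest prediction contours that still touch the confidence region.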

SIMPLE TEST CASES

To compare the predictive uncertainty measures of confidence intervals from regression and probability intervals from Bayesian theory, we apply the three measures (linear confidence intervals, nonlinear confidence intervals, and probability intervals) to two simple test cases. In both test cases, we use the MCMC method implemented in the MICA code (Doherty, 2003) to calculate the probability intervals.

Linear Test Case: The linear model is y = ax + b + ε, with true parameters a = 2 and b = 3 and true errors ε_i ~ N(0, 1). We consider a conjugate prior for the two parameters with covariance C_p proportional to the identity matrix. Twenty data points (x = 1, 2, …, 20) are used to calibrate the model, and the calibrated model is used to predict the value at x = 30.

Nonlinear Test Case: In the nonlinear test problem, the model is y = x/2 + a sin(abx) + ε. All other conditions are the same as in the linear test problem.

Figure 2: Cumulative distribution functions of parameters a and b and of the prediction y, based on regression and Bayesian theory, for the linear test case (panels a, b, and c) and the nonlinear test case (panels d, e, and f).

Figure 3: The nonlinear confidence interval limits (red dots), the minimum and maximum values of the prediction (red lines), and the confidence region of the parameter set bounded by the objective-function goal (black contour); and the probability interval limits (blue dots), where the upper 2.5% and lower 2.5% of prediction values correspond to the parameter samples indicated by green dots from MCMC, and the middle 95% of prediction values correspond to the samples indicated by yellow dots.
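For readers without access to MICA, the probability interval of the linear test case can be reproduced with a generic random-walk Metropolis sampler. The sketch below assumes our reading of the test case (a = 2, b = 3, σ = 1, prediction at x = 30) and uses a flat prior; the step sizes are hand-tuned, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# Linear test case as we read it (a = 2, b = 3 and x = 1..20 are assumptions):
# y = a*x + b + e, e ~ N(0, 1) with sigma known; predict at x = 30.
x = np.arange(1.0, 21.0)
y = 2.0 * x + 3.0 + rng.standard_normal(x.size)

def log_post(theta):
    # flat prior, so the log posterior is the log likelihood up to a constant
    a, b = theta
    r = y - (a * x + b)
    return -0.5 * (r @ r)

# Random-walk Metropolis
n_iter = 20000
samples = np.empty((n_iter, 2))
cur = np.array([1.0, 0.0])
cur_lp = log_post(cur)
accepted = 0
for i in range(n_iter):
    prop = cur + rng.normal(scale=[0.02, 0.3])     # hand-tuned step sizes
    lp = log_post(prop)
    if np.log(rng.random()) < lp - cur_lp:         # Metropolis acceptance rule
        cur, cur_lp = prop, lp
        accepted += 1
    samples[i] = cur

burn = samples[5000:]                               # discard burn-in
pred = burn[:, 0] * 30.0 + burn[:, 1]               # posterior samples of the prediction
lo, hi = np.percentile(pred, [2.5, 97.5])
print(f"acceptance rate: {accepted / n_iter:.2f}")
print(f"95% probability interval at x = 30: [{lo:.2f}, {hi:.2f}]")
```

Because this linear model with a flat prior satisfies the equivalence conditions above, the MCMC interval should closely match the known-σ regression interval of equation (2).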
Figure 2 plots the cumulative distribution functions (CDFs) of the parameters and the prediction for the linear and nonlinear test cases. The left column of Figure 2 confirms that the distributions of the parameters and the prediction from regression and Bayesian theory are identical in the linear model case, as the mathematical theory above indicates. Therefore, for the linear model, the confidence and probability intervals are equivalent. For the nonlinear model, however, due to nonzero higher-order derivatives of the likelihood function that are discarded in equation (5), the two intervals are distinct. In this case, the probability interval is smaller than the linear confidence interval, as shown in the right column of Figure 2, and it is also smaller than the nonlinear confidence interval, as illustrated in Figure 3.

In Figure 3, the black ellipse represents the 95% confidence region of the true parameters, and the black star marks the center of the ellipse. The red lines are model evaluations that intersect the ellipse; the intersections give the minimum and maximum values of the prediction (specific to the confidence region). The yellow and green dots are parameter samples obtained from MCMC simulation. The model predictions of these samples are first sorted, and the parameter values at the 2.5% and 97.5% percentiles of the predictions are identified; their corresponding model evaluations are plotted as blue lines in Figure 3. Figure 3 shows the discrepancy between the nonlinear confidence interval, determined by the minimum and maximum values of the prediction over a confidence region on the parameter set, and the probability interval from MCMC samples.

CONCLUSIONS

This work includes theoretical analysis and numerical experiments (using simple test cases) comparing confidence intervals based on regression theory and probability intervals based on Bayesian theory. For linear models, the two types of intervals are mathematically and numerically equivalent only with noninformative prior information. For nonlinear models, however, the confidence intervals and probability intervals are distinct both mathematically and numerically, and their discrepancy depends on the model's total nonlinearity. For groundwater models, which are always nonlinear, it is therefore not appropriate to use the confidence intervals and probability intervals to validate each other.

ACKNOWLEDGMENTS

The authors thank John Doherty for providing the MICA code for MCMC simulation. This work was supported in part by NSF-EAR grant 974 and DOE-SBR grant DE-SC687.

REFERENCES

Berger, J.O., 1985. Statistical Decision Theory and Bayesian Analysis, 2nd edition, Springer.

Cooley, R.L., 2004. A theory for modeling ground-water flow in heterogeneous media, U.S. Geological Survey Professional Paper 1679.

Doherty, J., 2003. MICA: Model-Independent Markov Chain Monte Carlo Analysis, Watermark Numerical Computing, Brisbane, Australia.

Gallagher, M., Doherty, J., 2007. Parameter estimation and uncertainty analysis for a watershed model, Environmental Modelling and Software, 22, 1000-1020.

Hill, M.C., Tiedeman, C.R., 2007. Effective Groundwater Model Calibration: With Analysis of Data, Sensitivities, Predictions, and Uncertainty, John Wiley, New York.

McClave, J.T., Sincich, T., 2000. Statistics, 8th edition, Prentice Hall.

McLaughlin, D., Townley, L.R., 1996. A reassessment of the groundwater inverse problem, Water Resources Research, 32(5), 1131-1161.

Vecchia, A.V., Cooley, R.L., 1987. Simultaneous confidence and prediction intervals for nonlinear regression models with application to a groundwater flow model, Water Resources Research, 23(7), 1237-1250.

Vrugt, J.A., Bouten, W., 2002. Validity of first-order approximations to describe parameter uncertainty in soil hydraulic models, Soil Science Society of America Journal, 66, 1740-1751.