Introduction to Statistical Methods for Understanding Prediction Uncertainty in Simulation Models


LA-UR-04-3632

Introduction to Statistical Methods for Understanding Prediction Uncertainty in Simulation Models

Michael D. McKay, formerly of the Statistical Sciences Group, Los Alamos National Laboratory

Presented at the 6th International Topical Meeting on Nuclear Reactor Thermal Hydraulics, Operations and Safety (NUTHOS-6), October 4-8, 2004, Nara, Japan

Lecture purpose: Introduce statistical theory and methods of analysis of prediction uncertainty in simulation models.

I. Introduction
II. Mathematical background
    Quantifying uncertainty
    Uncertainty importance and the correlation ratio
    Estimating the correlation ratio by $R^2$
III. Application to an environmental pathways model
    A computer experiment
    Features of model output
    Interpreting $R^2$s
    Understanding $R^2$s through data plots
    Cautions
IV. Final thoughts and Acknowledgements

Prediction uncertainty for a discrete event simulation model

Output is Cumulative Tons of Cargo Delivered. Inputs are Use Rate, Fuel Flow, ... (8 in total).

[Figure: Tons of Cargo (vertical axis, low to high) with the prediction for Day 15 marked, showing the best estimate calculation and the design bounds.]

Given all the uncertainties, will the delivered cargo fall within the design bounds?

Model predictions when inputs range over plausible values

[Figure: Tons of Cargo (vertical axis, low to high) with the spread of predictions for Day 15 marked.]

It's about connecting: connecting physical and mathematical worlds

Physical experiment
    System response w
    Physical conditions u
    Nature determines w = M(u).

Computer experiment
    Model prediction y
    Inputs x
    Calculation determines y = m(x).

System response is a complex quantity W, for which y predicts features w = Γ(W).
u may not be precisely known (or knowable).
Conditions in addition to u might be responsible for w.

Prediction-uncertainty bands and important inputs

[Figure: three panels. (1) All 8 inputs vary: full prediction-uncertainty band. (2) 1 input fixed at nominal: reduced (conditional) uncertainty band. (3) 2 inputs fixed at 2 x 2 = 4 values: reduced (conditional) uncertainty bands.]

We want to find inputs that control spread.

Uncertainty due to model inputs

[Diagram: an input x in the region $D_x$ is mapped by the model m to an output y in the region $D_y$.]

Plausible x values in $D_x$ generate the region $D_y$ for y.

LOCAL: A small perturbation of x generates a perturbation of y.

GLOBAL: A probability function $f_x(x)$ over $D_x$ provides uncertainty weighting. The probability function $f_y(y)$ induced by the model m defines the corresponding uncertainty over $D_y$.

Why statistical methods?

The space of possibilities $D_x$ that generate uncertainties is too big to be enumerated.

Suppose uncertainties are due to plausible alternative values of p inputs defined on sets (intervals) characterized by I values (low, high, etc.).

p inputs    I values    # points in input space
30          2           10^9
30          5           10^21
84          5           10^58

30 inputs means there are 435 distinct pairs, 4060 triples, etc., as candidate important subsets of inputs.
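The counts on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function name `log10_grid_points` is hypothetical and not part of the lecture. It reports the size of a full grid of I values on p inputs as a power of ten, and counts candidate subsets with the binomial coefficient.

```python
from math import comb, log10

def log10_grid_points(p: int, levels: int) -> float:
    """Base-10 logarithm of levels**p, the number of points in a full grid."""
    return log10(levels ** p)

# The three (p, I) combinations listed on the slide.
for p, levels in [(30, 2), (30, 5), (84, 5)]:
    print(f"p={p}, I={levels}: about 10^{log10_grid_points(p, levels):.1f} points")

# Candidate important subsets among 30 inputs.
pairs = comb(30, 2)     # 435
triples = comb(30, 3)   # 4060
print(pairs, triples)
```

The slide's powers of ten are order-of-magnitude figures, so the printed exponents agree with them only up to rounding.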

Quantifying uncertainty

Estimate where y is likely to be and characteristics of its probability distribution, for example:

    The region $D_y$, the mean value $\mu_y$, and the variance $\sigma_y^2$.
    Tolerance bounds $(\hat{a}, \hat{b})$ that have probability content p with confidence level $(1 - \alpha) \times 100\%$.
    Histogram or density function, and the empirical distribution function $\hat{F}_y(t) = \text{Est. } \Pr\{y \le t\}$.
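As a rough illustration of these summaries, the sketch below estimates the mean, the variance, and the empirical distribution function from a sample of model outputs. Everything specific in it is an assumption made for the example: the stand-in model, the uniform input distribution, the sample size, and the function names `summarize_output` and `ecdf`. Tolerance bounds are left out because they need a further choice of method.

```python
import numpy as np

def summarize_output(y):
    """Sample mean and (unbiased) sample variance of the outputs y."""
    y = np.asarray(y, dtype=float)
    return y.mean(), y.var(ddof=1)

def ecdf(y, t):
    """Empirical distribution function: fraction of outputs with y <= t."""
    y = np.asarray(y, dtype=float)
    return float(np.mean(y <= t))

# Hypothetical stand-in for a simulation model y = m(x) with 3 inputs.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(1000, 3))
y = x[:, 0] + 2.0 * x[:, 1] ** 2

mu_hat, var_hat = summarize_output(y)
f_hat_at_median = ecdf(y, np.median(y))
print(mu_hat, var_hat, f_hat_at_median)
```

A histogram of `y` (for example with `np.histogram`) gives the density picture the slide refers to.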

Uncertainty importance (McKay 1997, Reliability Engineering and System Safety)

Full model prediction using (all) x:

    $y(x) = m(x)$ with $x \sim f_x(x)$

Partition $x = x^s \cup x^{\bar{s}}$, where $s \subseteq \{1, 2, \ldots, p\}$ selects a subset (1 or more) of input variables and $\bar{s}$ denotes the remaining inputs.

Best restricted prediction using only $x^s$:

    $\tilde{y}(x^s) = E(y \mid x^s) = \int m(x)\, f_{x^{\bar{s}} \mid x^s}(x^{\bar{s}} \mid x^s)\, dx^{\bar{s}}$ with $x^s \sim f_{x^s}(x^s)$

Uncertainty importance (continued)

How does knowing $x^s$ reduce uncertainty, or how close is $\tilde{y}$ to $y$ (on average)?

Measuring the uncertainty importance of $x^s$ by how well it alone, compared to all x, predicts:

    $E(y - \tilde{y})^2 = \mathrm{Var}(y) - \mathrm{Var}(\tilde{y})$

leads to the Pearson (1903) correlation ratio

    $\eta^2 = \mathrm{Var}(\tilde{y}) / \mathrm{Var}(y)$

which is estimated (by $R^2$) from a sample of runs.
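The identity behind the correlation ratio is the standard decomposition of variance by conditioning. The short derivation below is a sketch added for reference rather than part of the slides. It writes $\tilde{y} = E(y \mid x^s)$ as above and uses $\bar{s}$ only implicitly, through the conditional variance.

```latex
\begin{align*}
\operatorname{Var}(y)
  &= \operatorname{Var}\bigl(E(y \mid x^s)\bigr)
   + E\bigl(\operatorname{Var}(y \mid x^s)\bigr) \\
  &= \operatorname{Var}(\tilde{y}) + E\bigl[(y - \tilde{y})^2\bigr],
\end{align*}
% since E[(y - \tilde{y})^2] = E\{ E[(y - E(y \mid x^s))^2 \mid x^s] \}
%                            = E(\operatorname{Var}(y \mid x^s)).
\[
1 - \eta^2 = \frac{E\bigl[(y - \tilde{y})^2\bigr]}{\operatorname{Var}(y)},
\qquad 0 \le \eta^2 \le 1 .
\]
```

Read this way, $\eta^2$ near 1 means $x^s$ alone predicts y almost as well as all of x does, and $\eta^2$ near 0 means fixing $x^s$ leaves the spread of y essentially unchanged.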

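Linking formulas and data

[Figure: the three prediction-uncertainty panels (all 8 inputs vary; 1 input fixed at nominal; 2 inputs fixed at 2 x 2 = 4 values), annotated with $\mathrm{Var}[y]$ for the full band and $\mathrm{Var}[\tilde{y}]$ for the conditional means.]

    $\eta^2 = \mathrm{Var}[\tilde{y}] / \mathrm{Var}[y]$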

Estimating the correlation ratio from an appropriate sample of runs

Let $\{x_1^s, \ldots, x_n^s\}$ be n values of $x^s$.

For each $x_i^s$, let $\{x_1^{\bar{s}}, \ldots, x_r^{\bar{s}}\}$ be r values of $x^{\bar{s}}$ (the inputs not in $x^s$).

Let $\{y_{ij} : i = 1, \ldots, n;\ j = 1, \ldots, r\}$ be the associated $N = n \times r$ values from the model computer runs.

Then,

    $\bar{y} = \frac{1}{nr} \sum_{i=1}^{n} \sum_{j=1}^{r} y_{ij}$ estimates $E(y)$,

    $\bar{y}_i = \frac{1}{r} \sum_{j=1}^{r} y_{ij}$ estimates $E(y \mid x^s = x_i^s) = \tilde{y}(x_i^s)$,

Estimating the correlation ratio (continued)

    $SST = \frac{1}{nr} \sum_{i=1}^{n} \sum_{j=1}^{r} (y_{ij} - \bar{y})^2$ estimates $\mathrm{Var}[y]$,

    $SSB = \frac{1}{nr} \sum_{i=1}^{n} \sum_{j=1}^{r} (\bar{y}_i - \bar{y})^2$ estimates $\mathrm{Var}\bigl(E[y \mid x^s]\bigr) = \mathrm{Var}(\tilde{y})$.

Finally,

    $R^2 = \dfrac{SSB}{SST}$ estimates $\eta^2 = \dfrac{\mathrm{Var}\bigl(E[y \mid x^s]\bigr)}{\mathrm{Var}(y)}$.

Of course, the quality (validity) of the estimates depends on a proper sampling plan / experimental design.
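The estimator can be sketched in a few lines once the outputs are arranged as an n-by-r array whose row i holds the r runs that share the value $x_i^s$. The code below is a minimal illustration under stated assumptions: the two-input `toy_model`, the uniform inputs, and simple independent random sampling (rather than the designs discussed in the lecture) are all hypothetical choices made to keep the example small.

```python
import numpy as np

def correlation_ratio_r2(y):
    """R^2 = SSB / SST for outputs y of shape (n, r).

    Row i holds the r outputs computed with the same subset value x_i^s.
    """
    y = np.asarray(y, dtype=float)
    n, r = y.shape
    grand_mean = y.mean()                 # estimates E(y)
    row_means = y.mean(axis=1)            # estimates E(y | x^s = x_i^s)
    sst = np.sum((y - grand_mean) ** 2) / (n * r)
    ssb = r * np.sum((row_means - grand_mean) ** 2) / (n * r)
    return ssb / sst

def toy_model(x1, x2):
    """Hypothetical stand-in for m(x): x1 dominates the output."""
    return 4.0 * x1 + x2

rng = np.random.default_rng(1)
n, r = 50, 20

# Subset x^s = {x1}: n values of x1, each paired with r values of x2.
x1 = rng.uniform(size=n)
x2 = rng.uniform(size=(n, r))
r2_x1 = correlation_ratio_r2(toy_model(x1[:, None], x2))

# Subset x^s = {x2}: roles reversed.
x2_fixed = rng.uniform(size=n)
x1_varied = rng.uniform(size=(n, r))
r2_x2 = correlation_ratio_r2(toy_model(x1_varied, x2_fixed[:, None]))

print(r2_x1, r2_x2)
```

For this toy model the first $R^2$ should come out much larger than the second, reflecting that x1 controls most of the spread. With small r the estimate for an unimportant input tends to sit above its true value, which is one reason the sampling plan matters.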

Experimental designs for estimating $R^2$

Replicated Latin hypercube sampling (rLHS) for subset size 1:
    Range of each input is divided into n equal-probability intervals.
    Each interval is (conditionally) sampled once.
    Values are combined at random across input variables r times, for N = n x r design points (r LHSs).

Orthogonal array sampling (OAS) for subset size > 1:
    Like an rLHS, but the values are combined in particular patterns, not at random.
    Usually, small n and larger r.
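The replicated Latin hypercube recipe described above can be sketched as follows. This is an illustrative reading of the slide, not the author's code: inputs are assumed independent and uniform on [0, 1), and the names `replicated_lhs`, `r2_for_input`, and the toy model are hypothetical. Each input gets one value per equal-probability interval, and the same n values are reshuffled independently in each of the r replicates, so every value of every input appears in exactly r runs.

```python
import numpy as np

def replicated_lhs(n, r, p, rng):
    """Return (design, levels), each of shape (r, n, p).

    levels[rep, i, k] is the interval index used for input k in run i of
    replicate rep; design holds the corresponding input values in [0, 1).
    """
    # One value per equal-probability interval, reused in every replicate.
    values = (np.arange(n)[:, None] + rng.uniform(size=(n, p))) / n
    levels = np.empty((r, n, p), dtype=int)
    for rep in range(r):
        for k in range(p):
            levels[rep, :, k] = rng.permutation(n)  # random combination
    design = values[levels, np.arange(p)]
    return design, levels

def r2_for_input(y, levels_k, n):
    """R^2 for one input: group the runs by that input's interval index."""
    r = y.shape[0]
    grand_mean = y.mean()
    group_means = np.array([y[levels_k == lev].mean() for lev in range(n)])
    ssb = r * np.sum((group_means - grand_mean) ** 2)
    sst = np.sum((y - grand_mean) ** 2)
    return ssb / sst

rng = np.random.default_rng(2)
n, r, p = 20, 10, 3
design, levels = replicated_lhs(n, r, p, rng)

# Hypothetical model: input 0 dominates, input 1 is minor, input 2 is inert.
y = 4.0 * design[:, :, 0] + design[:, :, 1]

r2 = [r2_for_input(y, levels[:, :, k], n) for k in range(p)]
print(r2)
```

For non-uniform inputs, the values in `design` would be mapped through each input's inverse distribution function; that step is omitted here.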

Application: model of environmental pathways

COMPARTMENTS
    1. Vegetation surface
    2. Vegetation interior
    3. Terrestrial invertebrates
    4. Small herbivores
    5. Large herbivores
    6. Insectivores
    7. Predators
    8. Litter

84 MODEL INPUTS

Input    Lower    Upper    Nominal
X1       0.0      41.5     15.8
X2       8.50     8.655    8.570

Experimental design for computer experiment

p = 84 continuous model inputs, discretized at 7 values or levels.

N = 343 of the possible $7^{84}$ points in $D_x$, chosen using orthogonal array sampling (OAS).

Note: In some examples that follow, there are 50 or 100 inputs, and rLHS was used instead of OAS.

Output of pathways model for Compartment 3 when inputs range over plausible values

[Figure: Concentration (vertical axis) versus Time in days (horizontal axis).]

Candidates for analysis of pathways model output

Scalar outputs:
    Equilibrium concentration (how high): $Y_1 = C_{\max}$
    Time to equilibrium (how fast): $Y_2 = t_{\max} = t$ @ $0.9\, C_{\max}$

Time-dependent outputs:
    $C(t)$
    Normalized: $C(t) / C_{\max}$, where $C_{\max} = C(t_{\max})$
    Standard time: $C^*(u) = C(u \times t_{\max}) / C_{\max}$, $0 \le u \le 1$

In the best of worlds: pattern of ordered $R^2$s for $C_{\max}$ for 100 inputs from a sample of size 5000

[Figure: $R^2$ (vertical axis) versus inputs ordered by $R^2$ (horizontal axis).]

Pattern suggests 4 groups of indistinguishable inputs.

What inputs do: patterns of $\tilde{y}$ (average y) for top 9 inputs

[Figure: nine panels of average output level, one per input: 68, 1, 69, 24, 84, 63, 35, 83, 67.]

In the not-so-best of worlds: pattern of ordered $R^2$s for $C_{\max}$ for 100 inputs from a sample of size 10

[Figure: $R^2$ (vertical axis) versus inputs ordered by $R^2$ (horizontal axis).]

Pattern suggests 1 group of indistinguishable inputs.

In a real world: $R^2$s for the two scalar outputs with N = 343

[Figure: two panels, $R^2$s for $t_{\max}$ and $R^2$s for $C_{\max}$, each plotted against inputs ordered by $R^2$.]

Histogram of values of $t_{\max}$

Values of $t_{\max}$ plotted by values of inputs

In a real world: $R^2$s for the two scalar outputs with N = 343

[Figure: two panels, $R^2$s for $t_{\max}$ and $R^2$s for $C_{\max}$, each plotted against inputs ordered by $R^2$.]

Histogram of values of $C_{\max}$

Values of $C_{\max}$ plotted by values of inputs

Joint $R^2$s for $C_{\max}$ and aliasing with OAS

$R^2$s for Concentration as a function of time

[Figure: $R^2$ (vertical axis) versus Time in days (horizontal axis), with curves labeled X24 and X63.]

Some main points

    Uncertainty quantification requires connecting physical and mathematical worlds.
    In the mathematical world, variation in inputs x is propagated by the simulation model to variation in outputs y.
    Variance of y is a measure of prediction uncertainty induced by inputs.
    Uncertainty importance of a subset of inputs refers to their contribution to prediction uncertainty.

Acknowledgements

DECOMPOSITIONS
    H. H. Panjer (1973), On the Decomposition of Moments by Conditional Moments, The American Statistician, 27, 170-171.

CORRELATION RATIO
    R. L. Iman and S. C. Hora (1990), A Robust Measure of Uncertainty Importance for Use in Fault Tree System Analysis, Risk Analysis, 10, 3, 401-406.
    B. Krzykacz (1990), SAMOS: A Computer Program for the Derivation of Empirical Sensitivity Measures of Results from Large Computer Models, GRS-A-1700, Gesellschaft für Reaktorsicherheit (GRS) mbH, Garching, Republic of Germany.
    K. Pearson (1903), Mathematical Contributions to the Theory of Evolution, Proceedings of the Royal Society of London, 71, 288-313.

SAMPLING PLANS
    M. D. McKay, W. J. Conover and R. J. Beckman (1979), A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code, Technometrics, 21, 2, 239-245. (Latin hypercube sampling)
    A. B. Owen (1992), Orthogonal Arrays for Computer Integration and Visualization, Statistica Sinica, 2, 2, 439-452.

GENERAL REFERENCES
    M. D. McKay (1995), Evaluating Prediction Uncertainty, NUREG/CR-6311, U.S. Nuclear Regulatory Commission and Los Alamos National Laboratory Report.
    M. D. McKay, J. D. Morrison and S. C. Upton (1999), Evaluating Prediction Uncertainty in Simulation Models, Computer Physics Communications, 117, 44-51.
    A. Saltelli, K. Chan and E. M. Scott (Eds.) (2000), Sensitivity Analysis, John Wiley and Sons, Ltd.