Weighted space-filling designs


Dave Woods
D.Woods@southampton.ac.uk
www.southampton.ac.uk/ davew
Joint work with Veronica Bowman (Dstl)
Supported by the Defence Threat Reduction Agency

Outline
- Motivation & background
- Designs incorporating prior information
- Illustrative examples
- Application to dispersion modelling
- Graphical method of design evaluation
- Work in progress...

Dispersion modelling
Aim: prediction of the downwind hazard generated by a chemical or biological (or other) release
- Accident response; military planning; volcanic ash; ...
Variety of models
- Gaussian plume (Clarke, 1979)
- Gaussian puff (Sykes et al., 1998)
- NAME (Jones et al., 2007)
[Figure: dosage map with sensor placement for run Flat_Skew/2007!May!10_15_17_31/Dosage/DosageRun14; axes x.cord and y.cord]


Dispersion modelling
1. The input variables are usually of two types (meteorological and source), and can be quantitative or qualitative
2. There is substantial prior information about the distribution of the input variables from, for example, empirical observations (meteorological) or expert prior knowledge (source)
3. These prior distributions are not usually independent, either within type (for example, wind direction and speed are defined via a wind rose) or between types (wind direction and source location)
4. The distributions define a joint probability density (or weight function) on the design region, which is likely to have substantial areas of low weight

Prior information
[Figure]

Aim of this work
- Reduce the number of model evaluations required through designed experiments
- Develop a class of criteria for constructing space-filling designs that take account of prior information and, particularly, relationships between input variables
- Evaluate designs from competing criteria in terms of (a) sampling properties and (b) space-filling properties

Weight function
- For any point $x \in \mathcal{X}$, define a weight function $w(x)$
- $w(x) \ge 0$ is a problem-specific weight function, e.g. defining the probability of obtaining a useful response
- We can (almost) think about this in terms of fuzzy sets: $\tilde{\mathcal{X}} = \{(x, w(x)),\ x \in \mathcal{X}\}$, where $w(x)$ is the (unnormalised) membership function of the fuzzy set $\tilde{\mathcal{X}}$ [e.g. Dubois & Prade, 1980]
- Space-filling designs for fuzzy design spaces?

Space-filling designs
Collections of combinations of values of the input variables that (attempt to) provide information on every (relevant) region of the design space
Typically, they either
- Spread the design points as far apart as possible
- Cover the design region as evenly as possible
Properties
- Only take one observation at each combination of input values (ideal for deterministic models)
- Allow a variety of statistical models to be fitted
Distance-based, Latin Hypercubes, Uniform designs [see, e.g., Santner et al., 2003]

Extensions
To apply space-filling designs to dispersion modelling, we need to include
- Qualitative variables
- Weight function, including dependencies between variables
Consider
- Quantitative variables $x_1, \ldots, x_{k_1}$
- Qualitative variables $x_{k_1+1}, \ldots, x_{k_1+k_2}$, with $x_j$ having $m_j$ levels denoted by $M_j = \{1, \ldots, m_j\}$

Distance metrics
Define the distance between two points $x, y \in \mathcal{X} = R \times \prod_j M_j$, where $R \subseteq \mathbb{R}^{k_1}$, as
  $d(x, y) = \sum_{i=1}^{k_1} (x_i - y_i)^2 + \alpha \sum_{j=k_1+1}^{k_1+k_2} I[x_j \ne y_j]$,
where $I[r \ne s]$ is the indicator function that takes the value 1 if $r \ne s$ and 0 otherwise
- In this talk, $\alpha = 1$
- Similar for ordered categorical variables [Qian et al. (2008)]
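The mixed distance above can be sketched in a few lines of Python. This is only an illustration: the function name `mixed_distance` and its argument layout (quantitative coordinates first, then qualitative ones) are assumptions, and the quantitative part is left as a plain sum of squared differences, as it appears on the slide, with no square root applied.

```python
def mixed_distance(x, y, k1, alpha=1.0):
    """Distance between two points with k1 quantitative coordinates
    followed by qualitative coordinates (hypothetical helper).

    Quantitative part: sum of squared differences (no square root assumed).
    Qualitative part: alpha times the number of coordinates that differ.
    """
    quantitative = sum((xi - yi) ** 2 for xi, yi in zip(x[:k1], y[:k1]))
    qualitative = sum(1 for xj, yj in zip(x[k1:], y[k1:]) if xj != yj)
    return quantitative + alpha * qualitative


# Two quantitative inputs and one qualitative input with levels {1, 2, 3}.
print(mixed_distance((0.0, 0.5, 1), (1.0, 0.5, 3), k1=2))  # 2.0
```

With alpha = 1, as in the talk, a mismatch in one qualitative variable counts the same as a unit squared difference in a quantitative variable.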

Space-filling criteria
Coverage: minimise
  $\phi_{um}(d) = \left\{ \int_{\mathcal{X}} \left[ \min_{x \in d} w(y) d(x, y) \right]^{m} dy \right\}^{1/m}$
Spread: minimise
  $\phi_{sp}(d) = \left\{ \sum_{i=1}^{n} \left[ \min_{x \in d \setminus \{x_i\}} w(x) w(x_i) d(x, x_i) \right]^{-p} \right\}^{1/p}$
In this talk, $m = p = 1$
[See, e.g., Johnson et al. (1990) for unweighted versions]

Space-filling criteria
- For coverage designs, we want to attract the design to relevant areas of the design space
  Note that if $w(y) = 0$, then $\min_{x \in d} w(y) d(x, y) = 0$ for all $d$
- For spread designs, we want the design points to repel away from each other
  Note that as $w(x_i) \to 0$, $\left[ \min_{x \in d \setminus \{x_i\}} w(x) w(x_i) d(x, x_i) \right]^{-p} \to \infty$
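A minimal sketch of the two criteria with m = p = 1 may help to see how the weights act. It rests on several assumptions that are not in the slides: all inputs are quantitative, the integral in the coverage criterion is approximated by an average over a set of reference points (here a regular grid), the spread criterion sums inverse weighted nearest-neighbour distances, and the function names are hypothetical.

```python
def dist(x, y):
    # Sum of squared differences; quantitative inputs only in this sketch.
    return sum((a - b) ** 2 for a, b in zip(x, y))


def coverage(design, points, w):
    # Average over reference points y of min_{x in design} w(y) * d(x, y).
    # The average stands in for the integral over the design region.
    return sum(min(w(y) * dist(x, y) for x in design) for y in points) / len(points)


def spread(design, w):
    # Sum over design points x_i of 1 / min_{x != x_i} w(x) * w(x_i) * d(x, x_i).
    total = 0.0
    for i, xi in enumerate(design):
        nearest = min(w(x) * w(xi) * dist(x, xi)
                      for j, x in enumerate(design) if j != i)
        # A zero weight (or a repeated point) makes the term infinite.
        total += float("inf") if nearest == 0 else 1.0 / nearest
    return total


uniform = lambda x: 1.0  # unweighted case, for illustration only
grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]

# A central point covers the unit square better than a corner point.
print(coverage([(0.5, 0.5)], grid, uniform), coverage([(0.0, 0.0)], grid, uniform))

# Well-separated points give a smaller spread criterion than clustered points.
print(spread([(0.0, 0.0), (1.0, 1.0)], uniform), spread([(0.0, 0.0), (0.1, 0.1)], uniform))
```

In both cases smaller values are better, so reference points with zero weight contribute nothing to coverage, while design points with very small weight are heavily penalised by the spread criterion.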

Implementation
- Computer search
  - Row exchange (e.g. Royle, 2002 and SAS)
  - Co-ordinate exchange
- Efficient storage/updating of distances, e.g. cover.design in Fields
- Quasi-random numbers (low-discrepancy sequence) used for the candidate list and also to approximate $\mathcal{X}$
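The search strategy can be illustrated with a simple exchange algorithm over a quasi-random candidate list. The sketch below is not the authors' implementation and does not reproduce cover.design: it uses a Halton sequence written in pure Python, a naive swap-and-accept loop with no distance caching, the coverage criterion with m = 1, and the same points both as candidates and as the approximation to the design region. All function names are hypothetical.

```python
def halton(n, bases=(2, 3)):
    """First n points of a Halton low-discrepancy sequence in [0, 1)^len(bases)."""
    def radical_inverse(i, b):
        f, r = 1.0, 0.0
        while i > 0:
            f /= b
            r += f * (i % b)
            i //= b
        return r
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, n + 1)]


def dist(x, y):
    # Sum of squared differences; quantitative inputs only in this sketch.
    return sum((a - b) ** 2 for a, b in zip(x, y))


def coverage(design, points, w):
    # Average over points y of min_{x in design} w(y) * d(x, y).
    return sum(min(w(y) * dist(x, y) for x in design) for y in points) / len(points)


def exchange_search(n, candidates, w, max_passes=20):
    """Swap design points for candidates whenever the criterion improves."""
    design = list(candidates[:n])
    best = coverage(design, candidates, w)
    for _ in range(max_passes):
        improved = False
        for i in range(n):
            for c in candidates:
                if c in design:
                    continue
                trial = design[:i] + [c] + design[i + 1:]
                value = coverage(trial, candidates, w)
                if value < best - 1e-12:
                    design, best, improved = trial, value, True
        if not improved:
            break
    return design, best


candidates = halton(60)
design, value = exchange_search(4, candidates, lambda x: 1.0)
print(design, value)
```

Recomputing the criterion from scratch for every trial swap is wasteful; storing and updating the point-to-design distances, as the slide suggests, is what makes larger problems feasible.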

Alternative and connected approaches
- Transform an iid (quasi-) random sample or space-filling design
  - Latin Hypercube Sample transformed to match marginal distributions and pairwise correlations [McKay et al., 1979; Iman & Conover, 1982]
- Design spaces with hard constraints on $\mathcal{X}$
  - Constrained optimisation [e.g. Kleijnen et al., 2010]
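As a rough illustration of the transformation approach, the sketch below builds a centred Latin hypercube sample and maps each column through an inverse CDF so that it matches a target marginal distribution. It covers the marginal step only; the rank-based adjustment of Iman & Conover (1982) for pairwise correlations is not implemented. The marginals used (a normal wind speed and a uniform wind direction) and the function names are hypothetical choices for the example.

```python
import random
from statistics import NormalDist


def latin_hypercube(n, k, rng):
    """Centred LHS: n points in (0, 1)^k, one point per stratum in each dimension."""
    columns = []
    for _ in range(k):
        strata = list(range(n))
        rng.shuffle(strata)
        # Use stratum midpoints so that no coordinate is exactly 0 or 1.
        columns.append([(s + 0.5) / n for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]


def to_marginals(sample, inverse_cdfs):
    """Map each coordinate through the inverse CDF of its target marginal."""
    return [tuple(f(u) for f, u in zip(inverse_cdfs, point)) for point in sample]


rng = random.Random(1)
lhs = latin_hypercube(10, 2, rng)

# Hypothetical marginals: wind speed ~ N(5, 2^2), wind direction ~ U(0, 360).
inverse_cdfs = [NormalDist(mu=5.0, sigma=2.0).inv_cdf, lambda u: 360.0 * u]
transformed = to_marginals(lhs, inverse_cdfs)
print(transformed[:3])
```

Because each column is transformed separately, the stratification of the marginals is preserved, but any dependence between inputs has to be induced by a further step.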

Graphical design assessment
1. Fraction of Design Space (FDS) with respect to the distance [Zahran et al., 2003]: assesses the space-filling properties of a design
2. Fraction of Design Points (FDP) with respect to the weight function: assesses the sampling properties of a design

Weight functions
Assume some prior distribution, $p(x)$, on $\mathcal{X}$ can be elicited from subject experts and/or derived from historical data
We consider two weight functions
(1) $w(x) = p(x)$, with $w(x) \ge 0$
(2) $w(x) = (1 - \alpha p(x))^{\gamma}$, with $w(x) \le 1$ [Joseph et al., 2010], where $\alpha < 1 / \max p(x)$ and $\gamma > 0$
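The two weight functions are easy to express as small factories that wrap a prior density. In this sketch the prior `logistic_prior` is a made-up example (it is not the prior used in the talk), and the names `weight_prior` and `weight_power` are hypothetical.

```python
import math


def logistic_prior(x):
    # Hypothetical prior for illustration only; values lie in (0, 1).
    return 1.0 / (1.0 + math.exp(-(x[0] + x[1])))


def weight_prior(p):
    """Weight function (1): w(x) = p(x)."""
    return lambda x: p(x)


def weight_power(p, alpha, gamma):
    """Weight function (2): w(x) = (1 - alpha * p(x)) ** gamma."""
    return lambda x: (1.0 - alpha * p(x)) ** gamma


w1 = weight_prior(logistic_prior)
w2 = weight_power(logistic_prior, alpha=0.9, gamma=0.75)
w_flat = weight_power(logistic_prior, alpha=0.0, gamma=1.0)

for point in [(-2.0, -2.0), (0.0, 0.0), (2.0, 2.0)]:
    print(point, w1(point), w2(point), w_flat(point))
```

Setting alpha = 0 gives a constant weight of 1, which recovers the unweighted criteria; under (2) with alpha > 0 the weight decreases as the prior density increases.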

Examples
- Ex. 1: Two quantitative variables ($x_1$, $x_2$); $p(x)$ is a logistic function
- Ex. 2: Two quantitative variables ($x_1$, $x_2$) & one qualitative ($x_3$); conditional on $x_3$, $p(x)$ is a bivariate normal pdf
- Ex. 3: Seven quantitative variables ($x_1, \ldots, x_7$); application to the dispersion example

Example 1 - coverage
[Figure: four design panels: (1); (2) α = 1, γ = 1; (2) α = 1, γ = 0.75; (2) α = 0, γ = 0]

Example 1 - coverage
[Figure: FDP plot (p(x) against % design points) and FDS plot (Dist against % design space) for designs (1); (2) α = 1, γ = 1; (2) α = 1, γ = 0.75; (2) α = 0, γ = 0]

Example 1 - spread
[Figure: four design panels: (1); (2) α = 1, γ = 1; (2) α = 0.75, γ = 0.5; (2) α = 0, γ = 0]

Example 1 - spread
[Figure: FDP plot (p(x) against % design points) and FDS plot (Dist against % design space) for designs (1); (2) α = 1, γ = 1; (2) α = 1, γ = 0.75; (2) α = 0, γ = 0]

Example 1 - comparison
[Figure: FDP plot (p(x) against % design points) and FDS plot (Dist against % design space) for (1) Coverage; (1) Spread; (2) Coverage α = 0, γ = 0; (2) Spread α = 0, γ = 0]

Example 2 - coverage (1)
[Figure: design panels for x_3 = 0, x_3 = 1, x_3 = 2]

Example 2 - coverage (2)
[Figure: design panels for x_3 = 0, x_3 = 1, x_3 = 2]

Example 2 - spread (1)
[Figure: design panels for x_3 = 0, x_3 = 1, x_3 = 2]

Example 2 - spread (2)
[Figure: design panels for x_3 = 0, x_3 = 1, x_3 = 2]

Example 2 - comparison
[Figure: FDP plot (p(x) against % design points) and FDS plot (Dist against % design space) for (1) Coverage; (2) Coverage α = 1/2.71, γ = 0.75; (1) Spread; (2) Spread α = 1/2.71, γ = 0.3]

Example 3
Seven quantitative variables $x_1, \ldots, x_7$
- Wind speed and direction
- Cloud cover
- x-y location, mass and time of release
Prior information from past data (meteorological) and subject experts (source) defines a release scenario
Compare:
- Weighted space-filling designs with weight function (1), $w(x) = p(x)$
- Latin hypercube samples transformed to match marginal distributions and pairwise correlations [McKay et al., 1979; Iman & Conover, 1982]

Example 3 - comparison
[Figure: FDP plot (p(x) against % design points) and FDS plot (Dist against % design space) for (1) Coverage; (2) Coverage α = 0, γ = 0; (1) Spread; (2) Spread α = 0, γ = 0; LHS]

Example 3 - coverage
[Figure: scatterplot matrix of the coverage design for wind speed, wind direction, x location and y location]

Example 3 - preliminary results

Squared error   Monte Carlo   LHS         Coverage
Mean            7 × 10^-3     6 × 10^-4   4 × 10^-4
Max.            4 × 10^-2     5 × 10^-3   5 × 10^-3

c.f. larger Monte Carlo study

Summary & future work
- Flexible space-filling design method incorporating prior information
- Selection of weight function allows a trade-off between space-filling and sampling high density areas
- Other applications include physical spatial experiments with covariates, and selection of subsets of meteorological ensembles
- Space-filling in many dimensions? Strong prior information can reduce the effective dimension

Selected references
Iman, R.L. and Conover, W.J. (1982). Comm. Statist. Simul. Comput., 11, 311-334.
Johnson, M.E. et al. (1990). JSPI, 26, 131-148.
Joseph, V.R. et al. (2010). Tech. report.
Lam, R.L.H. et al. (2002). Technometrics, 44, 99-109.
McKay, M.D. et al. (1979). Technometrics, 21, 239-245.
Qian, P.Z.G. et al. (2008). Technometrics, 50, 383-396.
Royle, J.A. (2002). JSPI, 100, 121-134.
Santner, T.J. et al. (2003). The Design and Analysis of Computer Experiments. Springer.
Zahran, A. et al. (2003). JQT, 35, 377-386.


References
Bedrick, E.J. et al. (2000). Biometrics, 56, 394-401.
Clarke, R.H. (1979). National Radiological Protection Board report NRPB-R91.
Dubois, D. & Prade, H. (1980). Fuzzy Sets and Systems. Academic Press.
Iman, R.L. and Conover, W.J. (1982). Comm. Statist. Simul. Comput., 11, 311-334.
Johnson, M.E. et al. (1990). JSPI, 26, 131-148.
Jones, A.R. et al. (2007). Proceedings of the 27th NATO/CCMS International Technical Meeting on Air Pollution Modelling and its Application.
Joseph, V.R. et al. (2010). Tech. report.
Kleijnen, J.P.C. et al. (2010). EJOR, 202, 164-174.
Lam, R.L.H. et al. (2002). Technometrics, 44, 99-109.
McKay, M.D. et al. (1979). Technometrics, 21, 239-245.
Qian, P.Z.G. et al. (2008). Technometrics, 50, 383-396.
Royle, J.A. (2002). JSPI, 100, 121-134.
Santner, T.J. et al. (2003). The Design and Analysis of Computer Experiments. Springer.
Sykes, R.I. et al. (1998). ARAP Report No. 718.
Zahran, A. et al. (2003). JQT, 35, 377-386.

Graphical design assessment (1)
(i) Fraction of Design Space (FDS) with respect to the weighted distance. For each point $\tilde{x} \in \mathcal{X}$, we calculate
  $\phi(\tilde{x} \mid d) = \min_{x \in d} d(\tilde{x}, x)$,
and plot the inverse of the empirical distribution function
  $\Phi_1(\nu \mid d) = \frac{1}{D} \int_{A_1} d\tilde{x}$, where $A_1 = \{\tilde{x} \in \mathcal{X} : \phi(\tilde{x} \mid d) \le \nu\}$ and $D = \int_{\mathcal{X}} d\tilde{x}$,
with $0 \le \Phi_1(\nu \mid d) \le 1$ for all $\nu \ge 0$
- Design $d_1$ dominates design $d_2$ if and only if $\Phi_1(\nu \mid d_1) \ge \Phi_1(\nu \mid d_2)$ for all $\nu$, with $\Phi_1(\nu \mid d_1) > \Phi_1(\nu \mid d_2)$ for at least one value of $\nu$
- Approximate $A_1$ for any given $\nu$ using a quasi-random sample
- See Zahran et al. (2003) for response surface designs
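One way to approximate the FDS function is to evaluate the distance to the nearest design point on a quasi-random sample and take empirical fractions. The sketch below assumes a unit-square design region, quantitative inputs only, a pure-Python Halton sequence as the quasi-random sample, and the unweighted sum-of-squares distance; the function names are hypothetical.

```python
def halton(n, bases=(2, 3)):
    """First n points of a Halton low-discrepancy sequence in [0, 1)^len(bases)."""
    def radical_inverse(i, b):
        f, r = 1.0, 0.0
        while i > 0:
            f /= b
            r += f * (i % b)
            i //= b
        return r
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, n + 1)]


def dist(x, y):
    # Sum of squared differences; quantitative inputs only in this sketch.
    return sum((a - b) ** 2 for a, b in zip(x, y))


def fds_values(design, sample):
    """Sorted distances from each sample point to its nearest design point.
    Plotting these against their ranks gives the FDS curve."""
    return sorted(min(dist(s, x) for x in design) for s in sample)


def fds_fraction(design, sample, nu):
    """Approximate Phi_1(nu | d): fraction of the region within nu of the design."""
    values = fds_values(design, sample)
    return sum(1 for v in values if v <= nu) / len(values)


sample = halton(200)
centre, corner = [(0.5, 0.5)], [(0.0, 0.0)]
for nu in (0.1, 0.3, 0.5):
    print(nu, fds_fraction(centre, sample, nu), fds_fraction(corner, sample, nu))
```

A design whose fraction is at least as large at every threshold, and strictly larger somewhere, dominates in the sense defined above; here the single central point dominates the single corner point.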

Graphical design assessment (2)
(ii) Fraction of Design Points (FDP) with respect to the sampling density $p(x)$. For each point $x$ in the design, we calculate $p(x)$ and then plot the inverse of the empirical distribution function
  $\Phi_2(\rho \mid d) = \frac{1}{n} |A_2|$, where $A_2 = \{x \in d : p(x) \le \rho\}$,
with $0 \le \Phi_2(\rho \mid d) \le 1$ for all $\rho \ge 0$
- Design $d_1$ dominates design $d_2$ if and only if $\Phi_2(\rho \mid d_1) \le \Phi_2(\rho \mid d_2)$ for all $\rho$, with $\Phi_2(\rho \mid d_1) < \Phi_2(\rho \mid d_2)$ for at least one value of $\rho$
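The FDP function and the dominance check can be sketched directly from the definition. The density used below is a made-up one-dimensional example, and the function names are hypothetical. Since the empirical distribution functions are step functions, comparing them at the observed density values is enough to check dominance.

```python
def fdp_fraction(design, p, rho):
    """Phi_2(rho | d): fraction of design points with density at most rho."""
    return sum(1 for x in design if p(x) <= rho) / len(design)


def dominates(d1, d2, p):
    """True if d1 dominates d2: Phi_2 for d1 is never larger and is smaller somewhere."""
    thresholds = sorted({p(x) for x in list(d1) + list(d2)})
    f1 = [fdp_fraction(d1, p, r) for r in thresholds]
    f2 = [fdp_fraction(d2, p, r) for r in thresholds]
    return all(a <= b for a, b in zip(f1, f2)) and any(a < b for a, b in zip(f1, f2))


# Hypothetical density on [0, 1] that increases with x, for illustration only.
p = lambda x: x[0]

high = [(0.6,), (0.8,), (0.9,)]  # all points in high-density areas
low = [(0.1,), (0.8,), (0.9,)]   # one point in a low-density area

print(fdp_fraction(low, p, 0.5))   # fraction of points with p(x) <= 0.5
print(dominates(high, low, p), dominates(low, high, p))
```

The design that places fewer points in low-density areas has the lower curve and therefore dominates.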


Weight functions
Example 1:
  ln(w / (1 - w)) = 1.2 + 0.7 … 1.8 … 1.9 … 0.8 … + 3.0 …
Example 2:
  $g(x, \Sigma, p \mid x_3) = \frac{p}{2\pi |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu(x_3))^{\top} \Sigma^{-1} (x - \mu(x_3)) \right]$
  $\mu(x_3) = (0, 0)$ if $x_3 = 0$
  $\mu(x_3) = (1, 1)$ if $x_3 = 1$
  µ(x_3) = ( 1, 1) if x_3 = 2