Weighted space-filling designs

Dave Woods
D.Woods@southampton.ac.uk
www.southampton.ac.uk/ davew
Joint work with Veronica Bowman (Dstl)
Supported by the Defence Threat Reduction Agency

Outline
  Motivation & background
  Designs incorporating prior information
  Illustrative examples
  Application to dispersion modelling
  Graphical method of design evaluation
  Work in progress...

Dispersion modelling
  Aim: prediction of the downwind hazard generated by a chemical or biological (or other) release
  Accident response; military planning; volcanic ash; ...
  Variety of models
    Gaussian plume (Clarke, 1979)
    Gaussian puff (Sykes et al., 1998)
    NAME (Jones et al., 2007)
  [Figure: dosage plot with sensor placement for run Flat_Skew/2007!May!10_15_17_31/Dosage/DosageRun14; axes x.cord and y.cord]

Dispersion modelling
  [Figure slide; navigation link: Ex. 3 results]

Dispersion modelling
  1. The input variables are usually of two types (meteorological and source), and can be quantitative or qualitative
  2. There is substantial prior information about the distribution of the input variables from, for example, empirical observations (meteorological) or expert prior knowledge (source)
  3. These prior distributions are not usually independent, either within type (for example, wind direction and speed are defined jointly via a wind rose) or between types (wind direction and source location)
  4. The distributions define a joint probability density (or weight function) on the design region, which is likely to have substantial areas of low weight

Prior information

Aim of this work
  Reduce the number of model evaluations required through designed experiments
  Develop a class of criteria for constructing space-filling designs that take account of prior information and, particularly, relationships between input variables
  Evaluate designs from competing criteria in terms of (a) sampling properties and (b) space-filling properties

Weight function
  For any point $x \in \mathcal{X}$, define a weight function $w(x)$
  $w(x) \ge 0$ is a problem-specific weight function, e.g. defining the probability of obtaining a useful response
  We can (almost) think about this in terms of fuzzy sets
    $$\tilde{\mathcal{X}} = \{(x, w(x)),\ x \in \mathcal{X}\},$$
    where $w(x)$ is the (unnormalised) membership function of the fuzzy set $\tilde{\mathcal{X}}$ [e.g. Dubois & Prade, 1980]
  Space-filling designs for fuzzy design spaces?

Space-filling designs
  Collections of combinations of values of the input variables that (attempt to) provide information on every (relevant) region of the design space
  Typically, they
    either spread the design points as far apart as possible, or cover the design region as evenly as possible
    only take one observation at each combination of input values (ideal for deterministic models)
    allow a variety of statistical models to be fitted
  Distance-based, Latin Hypercubes, Uniform designs [see, e.g., Santner et al., 2003]

Extensions
  To apply space-filling designs to dispersion modelling, we need to include
    Qualitative variables
    Weight function, including dependencies between variables
  Consider
    Quantitative variables $x_1, \ldots, x_{k_1}$
    Qualitative variables $x_{k_1+1}, \ldots, x_{k_1+k_2}$, with $x_j$ having $m_j$ levels denoted by $M_j = \{1, \ldots, m_j\}$

Distance metrics
  Define the distance between two points $x, y \in \mathcal{X} = \mathcal{R} \times \prod_j M_j$, where $\mathcal{R} \subset \mathbb{R}^{k_1}$, as
    $$d(x, y) = \sum_{i=1}^{k_1} (x_i - y_i)^2 + \alpha \sum_{j=k_1+1}^{k_1+k_2} I[x_j \neq y_j],$$
  where $I[r \neq s]$ is the indicator function that takes the value 1 if $r \neq s$ and 0 otherwise
  In this talk, $\alpha = 1$
  Similar for ordered categorical variables [Qian et al. (2008)]
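A minimal Python sketch of this mixed distance may help fix ideas. It assumes each point is a tuple whose first k1 entries are quantitative and whose remaining entries are qualitative level labels, and it takes the quantitative term as the plain sum of squared differences, as the formula survives on the slide. The name `mixed_distance` is illustrative and not from the talk.

```python
def mixed_distance(x, y, k1, alpha=1.0):
    """Distance between two points with k1 quantitative coordinates
    followed by qualitative coordinates (level labels).

    Assumption: the quantitative part is the sum of squared differences,
    read directly off the slide; wrap it in math.sqrt if a Euclidean
    norm is intended.
    """
    # Quantitative part: sum of squared coordinate differences.
    quantitative = sum((xi - yi) ** 2 for xi, yi in zip(x[:k1], y[:k1]))
    # Qualitative part: number of qualitative variables whose levels differ.
    mismatches = sum(1 for xj, yj in zip(x[k1:], y[k1:]) if xj != yj)
    return quantitative + alpha * mismatches


# Two quantitative variables and one qualitative variable.
same_level = mixed_distance((0.0, 0.0, 1), (3.0, 4.0, 1), k1=2)
diff_level = mixed_distance((0.0, 0.0, 1), (3.0, 4.0, 2), k1=2)
```

With α = 1, two points that differ in one qualitative level are one unit further apart than the same points with matching levels.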

Space-filling criteria
  Coverage: minimise
    $$\phi_{um}(d) = \left\{ \int_{\mathcal{X}} \left[ \min_{x \in d} w(y)\, d(x, y) \right]^{m} dy \right\}^{1/m}$$
  Spread: minimise
    $$\phi_{sp}(d) = \left\{ \sum_{i=1}^{n} \left[ \min_{x \in d \setminus \{x_i\}} w(x)\, w(x_i)\, d(x, x_i) \right]^{-p} \right\}^{1/p}$$
  In this talk, $m = p = 1$
  [See, e.g., Johnson et al. (1990) for unweighted versions]
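The two criteria can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the integral in the coverage criterion is replaced by an average over a finite set of points (which rescales the value by a constant but does not change the ranking of designs), the spread criterion is read with a negative inner exponent so that minimising it pushes points apart, and `sq_dist`, `coverage` and `spread` are hypothetical names.

```python
def sq_dist(x, y):
    """Sum of squared differences (stand-in for the talk's distance d)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def coverage(design, grid, w, dist=sq_dist, m=1):
    """Approximate coverage criterion: average over grid points y of
    [min_x w(y) d(x, y)]^m, raised to the power 1/m."""
    total = sum((w(y) * min(dist(x, y) for x in design)) ** m for y in grid)
    return (total / len(grid)) ** (1.0 / m)


def spread(design, w, dist=sq_dist, p=1):
    """Spread criterion, assuming the inner exponent is -p.
    Returns infinity if any weighted nearest-neighbour distance is zero."""
    total = 0.0
    for i, xi in enumerate(design):
        nearest = min(w(x) * w(xi) * dist(x, xi)
                      for j, x in enumerate(design) if j != i)
        if nearest == 0:
            return float("inf")
        total += nearest ** (-p)
    return total ** (1.0 / p)


def unit_weight(x):
    """Unweighted case: every point has weight 1."""
    return 1.0


spread_out = [(0.0, 0.0), (1.0, 1.0)]
clustered = [(0.0, 0.0), (0.1, 0.1)]
grid = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

Under this reading, `spread(spread_out, unit_weight)` is smaller than `spread(clustered, unit_weight)`, so minimisation favours the well-separated design.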

Space-filling criteria
  For coverage designs, we want to attract the design to relevant areas of the design space
    Note that if $w(y) = 0$, then $\min_{x \in d} w(y)\, d(x, y) = 0$ for all $d$
  For spread designs, we want the design points to repel each other
    Note that as $w(x_i) \to 0$, $\left[ \min_{x \in d \setminus \{x_i\}} w(x)\, w(x_i)\, d(x, x_i) \right]^{-k} \to \infty$

Implementation
  Computer search
    Row exchange (e.g. Royle, 2002 and SAS)
    Co-ordinate exchange
  Efficient storage/updating of distances, e.g. cover.design in Fields
  Quasi-random numbers (low-discrepancy sequence) used for the candidate list and also to approximate the integral over $\mathcal{X}$
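The search can be illustrated with a naive row (point) exchange over a quasi-random candidate list. The sketch below is not the implementation used in the talk: it recomputes the criterion from scratch at every swap rather than updating stored distances, it uses a Halton sequence as the low-discrepancy sequence, and the weight `w` and all function names are hypothetical.

```python
def halton(n, bases=(2, 3)):
    """First n points of a Halton low-discrepancy sequence in [0, 1)^k."""
    def radical_inverse(i, b):
        f, r = 1.0, 0.0
        while i > 0:
            f /= b
            r += f * (i % b)
            i //= b
        return r
    return [tuple(radical_inverse(i, b) for b in bases)
            for i in range(1, n + 1)]


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def weighted_coverage(design, grid, w):
    """Coverage criterion with m = 1, averaged over a quasi-random grid."""
    return sum(w(y) * min(sq_dist(x, y) for x in design)
               for y in grid) / len(grid)


def row_exchange(candidates, n, criterion, max_passes=20):
    """Swap design points for candidates while the criterion decreases."""
    design = list(candidates[:n])
    best = criterion(design)
    for _ in range(max_passes):
        improved = False
        for i in range(n):
            for c in candidates:
                if c in design:
                    continue  # keep one observation per input combination
                trial = design[:i] + [c] + design[i + 1:]
                value = criterion(trial)
                if value < best:
                    design, best, improved = trial, value, True
        if not improved:
            break
    return design, best


def w(y):
    """Hypothetical weight: more weight towards the corner (1, 1)."""
    return y[0] * y[1]


candidates = halton(40)   # candidate list
grid = halton(100)        # points approximating the design region


def criterion(d):
    return weighted_coverage(d, grid, w)


design, value = row_exchange(candidates, 5, criterion)
```

The search only accepts swaps that lower the criterion, so it stops at a local optimum; in practice several random starts would normally be compared.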

Alternative and connected approaches
  Transform an iid (quasi-)random sample or space-filling design
    Latin Hypercube Sample transformed to match marginal distributions and pairwise correlations [McKay et al., 1979; Iman & Conover, 1982]
  Design spaces with hard constraints on $\mathcal{X}$
    Constrained optimisation [e.g. Kleijnen et al., 2010]
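For comparison, the first step of the transformed Latin hypercube approach can be sketched as follows. Only the marginal transformation via inverse CDFs is shown; the rank-based step that induces pairwise correlations (Iman & Conover, 1982) is deliberately omitted. The two marginals used here (an exponential variable and a uniform angle in degrees) are made-up stand-ins, not the distributions from the dispersion example.

```python
import math
import random


def latin_hypercube(n, k, rng):
    """n-point Latin hypercube sample on [0, 1)^k: each of the n equal
    strata of every coordinate contains exactly one point."""
    columns = []
    for _ in range(k):
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]


def transform_marginals(sample, inverse_cdfs):
    """Map each uniform coordinate through the inverse CDF of its marginal."""
    return [tuple(f(u) for f, u in zip(inverse_cdfs, point))
            for point in sample]


rng = random.Random(0)
uniform_sample = latin_hypercube(10, 2, rng)

# Hypothetical marginals: exponential with rate 2, and a uniform angle.
inverse_cdfs = [
    lambda u: -math.log(1.0 - u) / 2.0,
    lambda u: 360.0 * u,
]
transformed = transform_marginals(uniform_sample, inverse_cdfs)
```

Because the transformation acts on each coordinate separately, the resulting sample matches the marginals but treats the variables as independent, which is exactly the limitation the weighted criteria are meant to address.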

Graphical design assessment
  1. Fraction of Design Space (FDS) with respect to the distance [Zahran et al., 2003]: assesses the space-filling properties of a design
  2. Fraction of Design Points (FDP) with respect to the weight function: assesses the sampling properties of a design

Weight functions
  Assume some prior distribution, $p(x)$, on $\mathcal{X}$ can be elicited from subject experts and/or derived from historical data
  We consider two weight functions
    (1) $w(x) = p(x)$, with $w(x) \ge 0$
    (2) $w(x) = (1 - \alpha p(x))^{-\gamma}$, with $w(x) \ge 1$ [Joseph et al., 2010], where $\alpha < 1/\max p(x)$ and $\gamma > 0$
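A short Python sketch shows how the two weight functions could be built from a prior density. It assumes the exponent in (2) is negative, so that the weight is at least 1 and increases with p(x); the sign did not survive extraction of the slide, so treat this as one reading rather than a confirmed formula. The logistic density `p` and its coefficients are invented for illustration.

```python
import math


def weight_density(p):
    """Weight function (1): the prior density itself."""
    return lambda x: p(x)


def weight_power(p, alpha, gamma):
    """Weight function (2), assuming w(x) = (1 - alpha * p(x)) ** (-gamma).
    Requires alpha < 1 / max p(x) so that the base stays positive."""
    def w(x):
        base = 1.0 - alpha * p(x)
        if base <= 0.0:
            raise ValueError("alpha must be smaller than 1 / max p(x)")
        return base ** (-gamma)
    return w


def p(x):
    """Hypothetical logistic prior on two quantitative variables."""
    return 1.0 / (1.0 + math.exp(-(x[0] + x[1])))


w1 = weight_density(p)
w2 = weight_power(p, alpha=1.0, gamma=1.0)
w_flat = weight_power(p, alpha=0.0, gamma=0.0)  # reduces to w(x) = 1
```

Setting α = 0 and γ = 0 gives a constant weight of 1, that is, an unweighted space-filling design; larger γ puts more emphasis on high-density regions.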

Examples
  Ex. 1: two quantitative variables $(x_1, x_2)$; $p(x)$ is a logistic function
  Ex. 2: two quantitative variables $(x_1, x_2)$ and one qualitative ($x_3$); conditional on $x_3$, $p(x)$ is a bivariate normal pdf
  Ex. 3: seven quantitative variables $(x_1, \ldots, x_7)$; application to the dispersion example

Example 1 - coverage
  [Figure: four design plots. Panels: (1); (2) α = 1, γ = 1; (2) α = 1, γ = 0.75; (2) α = 0, γ = 0]

Example 1 - coverage
  [Figure: FDP plot (p(x) against % design points) and FDS plot (Dist against % design space). Curves: (1); (2) α = 1, γ = 1; (2) α = 1, γ = 0.75; (2) α = 0, γ = 0]

Example 1 - spread
  [Figure: four design plots. Panels: (1); (2) α = 1, γ = 1; (2) α = 0.75, γ = 0.5; (2) α = 0, γ = 0]

Example 1 - spread
  [Figure: FDP plot (p(x) against % design points) and FDS plot (Dist against % design space). Curves: (1); (2) α = 1, γ = 1; (2) α = 1, γ = 0.75; (2) α = 0, γ = 0]

Example 1 - comparison
  [Figure: FDP plot (p(x) against % design points) and FDS plot (Dist against % design space). Curves: (1) Coverage; (1) Spread; (2) Coverage α = 0, γ = 0; (2) Spread α = 0, γ = 0]

Example 2 - coverage (1)
  [Figure: design plots. Panels: x_3 = 0; x_3 = 1; x_3 = 2]

Example 2 - coverage (2)
  [Figure: design plots. Panels: x_3 = 0; x_3 = 1; x_3 = 2]

Example 2 - spread (1)
  [Figure: design plots. Panels: x_3 = 0; x_3 = 1; x_3 = 2]

Example 2 - spread (2)
  [Figure: design plots. Panels: x_3 = 0; x_3 = 1; x_3 = 2]

Example 2 - comparison
  [Figure: FDP plot (p(x) against % design points) and FDS plot (Dist against % design space). Legend: (1) Coverage; (2) Coverage α = 1/2.71, γ = 0.75; (1) Spread; (2) Spread α = 1/2.71, γ = 0.3; Coverage]

Example 3
  Seven quantitative variables $x_1, \ldots, x_7$
    Wind speed and direction
    Cloud cover
    x-y location, mass and time of release
  Prior information...
    from past data (meteorological) and subject experts (source)
    defines a release scenario
  Compare:
    Weighted space-filling designs with weight function (1) $w(x) = p(x)$
    Latin hypercube samples transformed to match marginal distributions and pairwise correlations [McKay et al., 1979; Iman & Conover, 1982]

Example 3 - comparison
  [Figure: FDP plot (p(x) against % design points) and FDS plot (Dist against % design space). Curves: (1) Coverage; (2) Coverage α = 0, γ = 0; (1) Spread; (2) Spread α = 0, γ = 0; LHS]

Example 3 - coverage
  [Figure: plots of the coverage design with panels labelled Wind speed, Wind direc, x loc and y loc]

Example 3 - preliminary results
  Squared error   Monte Carlo          LHS                  Coverage
  Mean            $7 \times 10^{-3}$   $6 \times 10^{-4}$   $4 \times 10^{-4}$
  Max.            $4 \times 10^{-2}$   $5 \times 10^{-3}$   $5 \times 10^{-3}$
  c.f. larger Monte Carlo study

Summary & future work
  Flexible space-filling design method incorporating prior information
  Selection of weight function allows a trade-off between space-filling and sampling high density areas
  Other applications include physical spatial experiments with covariates, and selection of subsets of meteorological ensembles
  Space-filling in many dimensions?
    Strong prior information can reduce the effective dimension

Selected references
  Iman, R.L. and Conover, W.J. (1982). Comm. Statist. Simul. Comput., 11, 311-334.
  Johnson, M.E. et al. (1990). JSPI, 26, 131-148.
  Joseph, V.R. et al. (2010). Tech. report.
  Lam, R.L.H. et al. (2002). Technometrics, 44, 99-109.
  McKay, M.D. et al. (1979). Technometrics, 21, 239-245.
  Qian, P.Z.G. et al. (2008). Technometrics, 50, 383-396.
  Royle, J.A. (2002). JSPI, 100, 121-134.
  Santner, T.J. (2003). The Design and Analysis of Computer Experiments. Springer.
  Zahran, A. et al. (2003). JQT, 35, 377-386.


References
  Bedrick, E.J. et al. (2000). Biometrics, 56, 394-401.
  Clarke, R.H. (1979). National Radiological Protection Board report NRPB-R91.
  Dubois, D. & Prade, H. (1980). Fuzzy Sets and Systems. Academic Press.
  Iman, R.L. and Conover, W.J. (1982). Comm. Statist. Simul. Comput., 11, 311-334.
  Johnson, M.E. et al. (1990). JSPI, 26, 131-148.
  Jones, A.R. et al. (2007). Proceedings of the 27th NATO/CCMS International Technical Meeting on Air Pollution Modelling and its Application.
  Joseph, V.R. et al. (2010). Tech. report.
  Kleijnen, J.P.C. et al. (2010). EJOR, 202, 164-174.
  Lam, R.L.H. et al. (2002). Technometrics, 44, 99-109.
  McKay, M.D. et al. (1979). Technometrics, 21, 239-245.
  Qian, P.Z.G. et al. (2008). Technometrics, 50, 383-396.
  Royle, J.A. (2002). JSPI, 100, 121-134.
  Santner, T.J. (2003). The Design and Analysis of Computer Experiments. Springer.
  Sykes, R.I. et al. (1998). ARAP Report No. 718.
  Zahran, A. et al. (2003). JQT, 35, 377-386.

Graphical design assessment (1)
  (i) Fraction of Design Space (FDS) with respect to the weighted distance.
  For each point $\tilde{x} \in \mathcal{X}$, we calculate
    $$\phi(\tilde{x} \mid d) = \min_{x \in d} d(x, \tilde{x})$$
  and plot the inverse of the empirical distribution function
    $$\Phi_1(\nu \mid d) = \frac{1}{D} \int_{A_1} d\tilde{x}, \qquad A_1 = \{\tilde{x} \in \mathcal{X} : \phi(\tilde{x} \mid d) \le \nu\}, \qquad D = \int_{\mathcal{X}} dx,$$
  and $0 \le \Phi_1(\nu \mid d) \le 1$ for all $\nu \ge 0$
  Design $d_1$ dominates design $d_2$ if and only if $\Phi_1(\nu \mid d_1) \ge \Phi_1(\nu \mid d_2)$ for all $\nu$, with $\Phi_1(\nu \mid d_1) > \Phi_1(\nu \mid d_2)$ for at least one value of $\nu$
  Approximate $A_1$ for any given $\nu$ using a quasi-random sample
  See Zahran et al. (2003) for response surface designs
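The FDS curve can be approximated numerically along the lines described on this slide. The Python sketch below assumes the design region is represented by a finite sample of points (quasi-random in practice; a tiny hand-picked sample here), takes the inequality in the definition of A_1 as "at most ν", and uses sum of squared differences as a stand-in distance. The function names are hypothetical.

```python
def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def nearest_distance(point, design, dist=sq_dist):
    """phi(point | design): distance from a point to its nearest design point."""
    return min(dist(x, point) for x in design)


def fds(design, space_sample, nu, dist=sq_dist):
    """Approximate Phi_1(nu | design): fraction of the sampled design
    region lying within distance nu of the design."""
    hits = sum(1 for t in space_sample
               if nearest_distance(t, design, dist) <= nu)
    return hits / len(space_sample)


def fds_curve(design, space_sample, dist=sq_dist):
    """(fraction of design space, distance) pairs, ready for plotting
    distance against fraction as on the FDS plots."""
    phi = sorted(nearest_distance(t, design, dist) for t in space_sample)
    n = len(phi)
    return [((i + 1) / n, value) for i, value in enumerate(phi)]


design = [(0.0, 0.0)]
space_sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
curve = fds_curve(design, space_sample)
```

A design whose curve lies at smaller distances for every fraction of the design space fills the region better, which is the dominance relation stated on the slide.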

Graphical design assessment (2)
  (ii) Fraction of Design Points (FDP) with respect to the sampling density $p(x)$.
  For each point $x$ in the design, we calculate $p(x)$ and then plot the inverse of the empirical distribution function
    $$\Phi_2(\rho \mid d) = \frac{1}{n} |A_2|, \qquad A_2 = \{x \in d : p(x) \le \rho\},$$
  and $0 \le \Phi_2(\rho \mid d) \le 1$ for all $\rho \ge 0$
  Design $d_1$ dominates design $d_2$ if and only if $\Phi_2(\rho \mid d_1) \le \Phi_2(\rho \mid d_2)$ for all $\rho$, with $\Phi_2(\rho \mid d_1) < \Phi_2(\rho \mid d_2)$ for at least one value of $\rho$
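The FDP comparison is simple enough to sketch directly. The Python below assumes the inequality in A_2 is "at most ρ" and checks dominance only on a finite grid of ρ values supplied by the user; the density `p` and the two toy designs are invented for illustration.

```python
def fdp(design, p, rho):
    """Phi_2(rho | design): fraction of design points with density <= rho."""
    return sum(1 for x in design if p(x) <= rho) / len(design)


def fdp_dominates(d1, d2, p, rhos):
    """True if d1 dominates d2 on the given rho values: its FDP is never
    larger and is strictly smaller at least once."""
    v1 = [fdp(d1, p, rho) for rho in rhos]
    v2 = [fdp(d2, p, rho) for rho in rhos]
    never_worse = all(a <= b for a, b in zip(v1, v2))
    sometimes_better = any(a < b for a, b in zip(v1, v2))
    return never_worse and sometimes_better


def p(x):
    """Hypothetical density on [0, 1]: increases with the coordinate."""
    return x[0]


d1 = [(0.8,), (0.9,)]   # both points in the high-density region
d2 = [(0.2,), (0.9,)]   # one point in the low-density region
rhos = [0.1, 0.5, 1.0]
```

Here `d1` dominates `d2` because it never has a larger share of its points in low-density regions and has a strictly smaller share at ρ = 0.5.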


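Weight functions
  Example 1: logistic model for the weight, with $\ln\{w/(1-w)\}$ equal to a polynomial in $x_1, x_2$; surviving coefficients, in order: 1.2, 0.7, 1.8, 1.9, 0.8, 3.0 [signs and variable terms lost in extraction]
  Example 2:
    $$g(x, \Sigma, p \mid x_3) = \frac{p}{2\pi |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu(x_3))^{\mathsf{T}} \Sigma^{-1} (x - \mu(x_3)) \right]$$
    $$\mu(x_3) = \begin{cases} (0, 0) & \text{if } x_3 = 0 \\ (1, 1) & \text{if } x_3 = 1 \\ (-1, 1) & \text{if } x_3 = 2 \end{cases}$$
    [signs in the last case partly lost in extraction]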