Design of Screening Experiments with Partial Replication

David J. Edwards, Department of Statistical Sciences & Operations Research, Virginia Commonwealth University
Robert D. Leonard, Department of Information Systems and Analytics, Miami University

DATAWorks, March 21, 2018
Screening Experiments

Screening experiments are frequently utilized in the early stages of experimentation. The goal of screening is to identify important main effects and perhaps some important two-factor interactions. Typical screening designs are:
- Unreplicated strength-2 or strength-3 orthogonal arrays
- Designs constructed using variance-based optimality criteria (such as D-optimality)
To Pool or Not to Pool: That is the Question

Screening designs are usually unreplicated, so there is no unbiased estimate of experimental error. Inference is carried out by:
- A robust estimator of the error variance based on a fitted saturated model (e.g., Lenth's Pseudo Standard Error)
- Pooling sums of squares from terms identified as not active

If statistical inference is an integral part of the analysis, one view is that inference should be performed using a pure error estimate rather than the residual mean square from the fitted model.
Partial Replication

From Lupinacci and Pigeon (2008, JQT): even in the initial stages of an experimental program, it might be desirable or even necessary for the experimenter to have a realistic estimate of the experimental error. Since full replication of an experiment may be quite costly, partial replication offers a cost-saving alternative for obtaining a model-independent error estimate.
Motivation

Gilmour, S.G. and Trinca, L.A. (2012). Optimum Design of Experiments for Statistical Inference. Journal of the Royal Statistical Society, Series C, 61(3), 345-401.
DP-optimality

Based upon the formulation for the volume of a confidence region for β, which is proportional to (F_{p,d;1-α})^{p/2} |X'X|^{-1/2}, where p is the number of parameters in the assumed model, d is the pure error degrees of freedom, and F_{p,d;1-α} is the (1-α)-quantile of the F-distribution. The so-called DP-optimal design is then defined as that which maximizes |X'X|^{1/p} / F_{p,d;1-α}.
DP-optimality

As inference is rarely performed on the intercept, it can be treated as a nuisance parameter; hence, the DP-optimal design maximizes |X'QX|^{1/(p-1)} / F_{p-1,d;1-α}, where X is the model matrix without a column of ones, Q = I - (1/n)J, I is the n x n identity matrix, and J is the n x n matrix of ones.
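The intercept-adjusted criterion can be evaluated numerically. The sketch below is illustrative (the function name and the use of scipy are assumptions, not part of the talk):

```python
import numpy as np
from scipy.stats import f

def dp_criterion(X, d, alpha=0.05):
    """DP-criterion value (larger is better) for a design whose model
    matrix X (n x p) has a leading column of ones, with d pure-error
    degrees of freedom: |X1' Q X1|^{1/(p-1)} / F_{p-1, d; 1-alpha}."""
    n, p = X.shape
    Q = np.eye(n) - np.ones((n, n)) / n     # centers out the intercept
    X1 = X[:, 1:]                           # model matrix without the ones column
    M = X1.T @ Q @ X1
    Fq = f.ppf(1 - alpha, p - 1, d)         # (1-alpha) quantile of F_{p-1, d}
    return np.linalg.det(M) ** (1.0 / (p - 1)) / Fq
```

For example, for a twice-replicated 2^3 factorial (16 runs, 8 pure-error df, main-effects model), X1'QX1 = 16I, so the criterion reduces to 16 / F_{3,8;0.95}.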
DP-optimal Designs

DP-optimal designs tend to have an excessive number of replicate runs and seem inefficient regarding the use of available degrees of freedom in screening situations. Example: a 7-factor, 24-run DP-optimal design with an a priori main effects model has no lack-of-fit degrees of freedom! [24 x 7 design matrix of +/-1 levels not reproduced here; the 24 runs consist entirely of replicated design points, so all 16 residual degrees of freedom go to pure error.]
Compound DP-optimality

To overcome this potential deficiency and encourage the availability of degrees of freedom for testing lack of fit, Gilmour and Trinca (2012) propose incorporating degree-of-freedom efficiency (df-eff = (n - d)/n) into the formulation of a compound criterion. For example, combining the DP-criterion with degrees of freedom efficiency, we maximize |X'QX|^{κ1/(p-1)} [(n - d)/n]^{κ2} / (F_{p-1,d;1-α})^{κ1}, where κ = (κ1, κ2) are weights.
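A minimal sketch of evaluating such a compound objective (function name and the exponent placement are assumptions for illustration, consistent with the criterion above):

```python
import numpy as np
from scipy.stats import f

def compound_dp(X1, d, kappa, alpha=0.05):
    """Compound DP-criterion sketch:
    |X1' Q X1|^{k1/q} * ((n - d)/n)^{k2} / F_{q, d; 1-alpha}^{k1},
    weighting DP-efficiency (k1) against df-efficiency (k2).
    X1 is the n x q model matrix WITHOUT the intercept column."""
    n, q = X1.shape
    k1, k2 = kappa
    Q = np.eye(n) - np.ones((n, n)) / n
    M = X1.T @ Q @ X1
    Fq = f.ppf(1 - alpha, q, d)
    return (np.linalg.det(M) ** (k1 / q)) * (((n - d) / n) ** k2) / Fq ** k1
```

Sweeping κ over (1, 0), (0.5, 0.5), (0.2, 0.8) trades pure-error replication against lack-of-fit degrees of freedom, as in the comparisons that follow.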
Compound DP-optimality

This modification helps: κ = (0.5, 0.5) produces a design with 8 degrees of freedom for pure error (df_pe) and 8 for lack of fit (df_LoF), while κ = (0.2, 0.8) gives 5 df_pe and 11 df_LoF. Despite this, design construction still critically depends on the choice of a priori model.
Bayesian D-optimality

DuMouchel and Jones (1994): Primary terms are those effects assumed to be active, while potential terms may or may not be active. The joint prior distribution of the unknown model parameters is β|σ² ~ N(0, σ²R⁻¹), where R = K/τ² is the (scaled) prior precision matrix and K is block diagonal with zeros corresponding to the primary terms and an identity block corresponding to the potential terms. The posterior distribution for β given y is β|y ~ N((X'X + R)⁻¹X'y, σ²(X'X + R)⁻¹).
Bayesian D- and DP-optimality

A Bayesian D-optimal design, ξ*, maximizes |X'X + R|. Extending the formulation of the DP-optimality criterion within the Bayesian framework, we suggest the Bayesian DP-optimality criterion, which maximizes |X'QX + K/τ²|^{1/(p-1)} / F_{p-1,d;1-α}, where K is the block-diagonal matrix with zeros for the primary terms and an identity block for the potential terms (and, as before, X excludes the column of ones).
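The Bayesian DP objective can be sketched as follows; the function name, the τ² default, and the degrees of freedom used for the F quantile are illustrative assumptions:

```python
import numpy as np
from scipy.stats import f

def bayes_dp_criterion(X1, n_primary, d, tau2=1.0, alpha=0.05):
    """Bayesian DP-criterion sketch: |X1' Q X1 + K/tau^2|^{1/q} / F_{q,d;1-alpha}.

    X1: n x q model matrix WITHOUT the intercept column, with the first
        n_primary columns the primary terms and the rest potential terms.
    K:  block diagonal -- zeros for primary terms, identity for potential."""
    n, q = X1.shape
    Q = np.eye(n) - np.ones((n, n)) / n
    K = np.diag([0.0] * n_primary + [1.0] * (q - n_primary))
    M = X1.T @ Q @ X1 + K / tau2
    Fq = f.ppf(1 - alpha, q, d)
    return np.linalg.det(M) ** (1.0 / q) / Fq
```

With τ² small the potential-term prior dominates (pushing toward D-optimal behavior on the primary terms); with τ² large the criterion approaches the ordinary DP-criterion on the full model matrix.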
Some Comparisons

Design               df_pe  df_LoF  D-eff  DP-eff
Design I (D-opt)       0      16     1.00   0.00
Design II (BD-opt)     0      16     0.94   0.00
Design III (DP-opt)   16       0     1.00   1.00
Design IV (BDP-opt)    6      10     0.90   0.57
cdp(0.5, 0.5)          8       8     1.00   0.76
cdp(0.2, 0.8)          5      11     1.00   0.54
Protecting Against Model Misspecification

Design               df_pe  df_LoF  D-eff  DP-eff  Prim&Prim  Prim&Pot  Pot&Pot  tr(AA')
                                                    Corr       Corr      Corr
Design I (D-opt)       0      16     1.00   0.00    0.000      0.109     0.071    5.333
Design II (BD-opt)     0      16     0.94   0.00    0.095      0.027     0.092    1.113
Design III (DP-opt)   16       0     1.00   1.00    0.000      0.143     0.100   21.000
Design IV (BDP-opt)    6      10     0.90   0.57    0.175      0.041     0.228    1.873
cdp(0.5, 0.5)          8       8     1.00   0.76    0.000      0.116     0.081   10.333
cdp(0.2, 0.8)          5      11     1.00   0.54    0.000      0.157     0.100    7.667

A = (X1'X1)⁻¹X1'X2 is the so-called alias matrix, and tr(AA') (to be minimized) can be viewed as a measure of the potential impact of bias.
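The alias-matrix trace is straightforward to compute; a minimal sketch (the function name is an assumption):

```python
import numpy as np

def alias_trace(X1, X2):
    """tr(AA') for the alias matrix A = (X1'X1)^{-1} X1'X2, where X1 holds
    the fitted (primary) model columns and X2 the omitted (potential)
    terms. Smaller values mean less bias transmitted into the fit."""
    A = np.linalg.solve(X1.T @ X1, X1.T @ X2)
    return np.trace(A @ A.T)
```

For an orthogonal design such as a full 2^3 factorial with main effects fitted and two-factor interactions omitted, X1'X2 = 0 and the trace is exactly zero.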
Additional Comparisons

Example 1: 2 levels, 7 factors (MEs primary, 2FIs potential)
Additional Comparisons

Example 2: 3 levels, 7 factors (MEs and pure quadratics primary, 2FIs potential)
Design Choices via Multiple Criteria It is possible for multiple designs to satisfy any given optimality criterion, in which case additional criteria may be used to further discriminate between choices of optimal designs. Lu et al. (2011, Technometrics) have shown that a suite of designs can be reduced to a smaller set of Pareto-optimal choices by combining multiple design criteria as a weighted sum of scaled criteria values.
Multiple Criteria

Four sets of 100 designs are created using each of D-, DP-, Bayesian D-, and Bayesian DP-optimality. Criterion values (df_pe, D-eff, and tr(AA')) are calculated for all nondominated designs and scaled so that the maximum value is one and the minimum value is zero. Next, L1 norms are calculated as Σᵢ wᵢ |fᵢ(ξ) - Fᵢ|, where f₁(ξ), f₂(ξ), f₃(ξ) are the scaled criterion values, F = (1, 1, 0) since we wish to maximize both df_pe and D-eff and minimize tr(AA'), and the wᵢ are weights. For each choice of weights, the design which minimizes the L1 norm is selected.
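The scaled weighted-L1 selection step can be sketched as below (function name and input layout are illustrative assumptions):

```python
import numpy as np

def select_design(criteria, weights):
    """Pick the design minimizing the weighted L1 distance to the utopia
    point F = (1, 1, 0) after 0-1 scaling of each criterion.

    criteria: (n_designs, 3) array with columns (df_pe, D-eff, tr(AA')).
    weights:  nonnegative weights (w1, w2, w3)."""
    C = np.asarray(criteria, dtype=float)
    span = C.max(axis=0) - C.min(axis=0)
    span[span == 0] = 1.0                       # guard constant columns
    S = (C - C.min(axis=0)) / span              # scale: max -> 1, min -> 0
    utopia = np.array([1.0, 1.0, 0.0])          # maximize df_pe, D-eff; minimize tr(AA')
    dist = np.abs(S - utopia) @ np.asarray(weights, dtype=float)
    return int(np.argmin(dist))                 # index of the selected design
```

Sweeping the weight vector over a grid then traces out which Pareto-optimal design wins under each prioritization of the three criteria.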
Revisiting Example 1
Revisiting Example 1

Design                  df_pe  df_LoF  D-eff  DP-eff  Prim&Prim  Prim&Pot  Pot&Pot  tr(AA')
                                                       Corr       Corr      Corr
Design I (D-opt)          0      16     1.00   0.00    0.000      0.109     0.071    5.333
Design V (D-opt)          1      15     1.00   0.01    0.000      0.082     0.081    4.000
Design VI (D-opt)         0      16     1.00   0.00    0.000      0.075     0.081    3.667
Design IV (BDP-opt)       6      10     0.90   0.57    0.175      0.041     0.228    1.873
Design VII (BDP-eff)      6      10     0.94   0.59    0.111      0.057     0.205    2.550
Revisiting Example 2

Design                   df_pe  df_LoF  D-eff  DP-eff  Prim&Prim  Prim&Pot  Pot&Pot  tr(AA')
                                                        Corr       Corr      Corr
Design VIII (D-opt)        0      9     1.000  0.000    0.054      0.149     0.148   15.755
Design IX (DP-opt)         9      0     0.870  1.000    0.116      0.223     0.237   45.927
Design X (BDP-eff)         6      3     0.904  0.494    0.100      0.140     0.239   25.120
Worth Mentioning?

It is common practice in response surface experiments to replicate center-point runs in order to obtain a model-independent estimate of error. To generate the designs in this study, the full 3^7 factorial was utilized as the candidate set of points. Out of the 400 24-run designs, none includes a center-point run.
Simulation Protocol

For each of 2000 iterations:
1. Randomly assign 2-5 columns of the design matrix to be active.
2. Randomly assign 0-7 two-factor interactions to be active based on the weak effect heredity principle (i.e., a two-factor interaction is active only if one or more of its parent main effects are active).
3. Select the coefficient vector (β) for the active effects. Two scenarios are considered:
   a. Equal: randomly assign coefficients for each of the active effects from (±1, ±1.5, ±2, ±2.5, ±3, ±3.5).
   b. Smaller: randomly assign coefficients for the main effects from (±2, ±2.5, ±3, ±3.5) and for the two-factor interactions from (±1, ±1.5, ±2). This is more likely to represent what is observed in practice (effect hierarchy).
4. Generate the response vector as y = X_A β + ε, where X_A is the matrix consisting of the active effect columns and each ε is simulated from a standard normal distribution.
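Steps 1-4 can be sketched as below for the "Equal" scenario; the function names and the exact sampling calls are illustrative assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(2018)

def one_truth(k=7):
    """Steps 1-3: draw one 'true model' -- active MEs, weak-heredity 2FIs,
    and 'Equal'-scenario coefficients."""
    n_me = rng.integers(2, 6)                        # 2-5 active main effects
    active_me = rng.choice(k, size=n_me, replace=False)
    # candidate 2FIs under weak heredity: at least one parent is active
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)
             if i in active_me or j in active_me]
    n_int = rng.integers(0, min(7, len(pairs)) + 1)  # 0-7 active interactions
    idx = rng.choice(len(pairs), size=n_int, replace=False)
    active_2fi = [pairs[t] for t in idx]
    mags = np.array([1, 1.5, 2, 2.5, 3, 3.5])        # 'Equal' magnitudes
    signs = rng.choice([-1, 1], size=n_me + n_int)
    beta = signs * rng.choice(mags, size=n_me + n_int)
    return active_me, active_2fi, beta

def simulate_y(D, active_me, active_2fi, beta):
    """Step 4: y = X_A beta + eps with standard-normal noise, where X_A
    stacks the active ME and interaction columns of the +/-1 design D."""
    cols = [D[:, i] for i in active_me] + [D[:, i] * D[:, j] for i, j in active_2fi]
    XA = np.column_stack(cols)
    return XA @ beta + rng.standard_normal(D.shape[0])
```

The "Smaller" scenario differs only in the magnitude sets drawn for main effects versus interactions.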
Simulation Protocol (cont.)

5. Fit the primary model (in this case, a main-effects-only model) to the simulated data and determine the set of active effects. This is accomplished by:
   a. If a pure error estimate is available, statistical inference is performed using the mean square pure error (MSPE). The usual t-statistics are computed as β̂ᵢ / sqrt(var(β̂ᵢ)), where var(β̂ᵢ) = MSPE (X'X)⁻¹ᵢᵢ, and p-values are computed based on a t-distribution with df_pe degrees of freedom; α = 0.05 is used for selection of active effects.
   b. If a pure error estimate is not available, statistical inference is performed using the residual mean square from the fitted model.
6. If a pure error estimate is available, perform a lack-of-fit test via the usual F-statistic with df_LoF numerator and df_pe denominator degrees of freedom.
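Steps 5a and 6 can be sketched as one routine; the function name, the `groups` argument (mapping runs to distinct design points), and the assumption df_LoF > 0 are all illustrative:

```python
import numpy as np
from scipy.stats import t, f

def pure_error_inference(X, y, groups, alpha=0.05):
    """t-tests on a main-effects model using the pure-error mean square,
    plus the usual lack-of-fit F-test (assumes df_LoF > 0).

    X: n x p model matrix (intercept + MEs); y: responses;
    groups: length-n array mapping each run to its distinct design point."""
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    # pure-error SS: within-group variation around group means
    sse_pe, df_pe = 0.0, 0
    for g in np.unique(groups):
        yg = y[groups == g]
        sse_pe += np.sum((yg - yg.mean()) ** 2)
        df_pe += len(yg) - 1
    mspe = sse_pe / df_pe
    se = np.sqrt(mspe * np.diag(np.linalg.inv(X.T @ X)))
    pvals = 2 * t.sf(np.abs(beta / se), df_pe)      # t-tests on df_pe
    active = pvals < alpha
    # lack-of-fit F-test: (SSE - SSE_pe)/df_LoF against MSPE
    sse = np.sum((y - X @ beta) ** 2)
    df_lof = (n - p) - df_pe
    p_lof = f.sf(((sse - sse_pe) / df_lof) / mspe, df_lof, df_pe)
    return active, p_lof
```

When no pure error is available (step 5b), the same t-tests would instead use the residual mean square with n - p degrees of freedom.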
Simulation Protocol (cont.)

At the end of the 2000 iterations, the average proportion of correctly identified active main effects (power) and the average proportion of main effects declared active that were actually inactive (false discovery rate) are calculated. We also calculate the power for detecting lack of fit by reporting the proportion of simulations in which lack of fit was correctly identified.
Some (Abbreviated) Results
An Aside: Pure Error and False Discovery (Variable Selection)
Final Comments

Proposed a Bayesian modification to the DP-criterion to allow for the construction of optimal designs that provide some safeguard against misspecification of the a priori model as well as the ability to obtain a model-independent estimate of σ² for conducting statistical inference.

Examples illustrate that the proposed designs are a compromise between traditional Bayesian D-optimal designs, which rarely provide replicate points, and DP-optimal designs, which tend to over-replicate.

The Pareto front approach and multiple design criteria also help to demonstrate the compromising effect of Bayesian DP-optimal designs.
Thanks for your attention! Any questions?