The Planning and Analysis of Industrial Selection and Screening Experiments
Guohua Pan, Department of Mathematics and Statistics, Oakland University, Rochester, MI
Thomas J. Santner, Department of Statistics, Ohio State University, Columbus, OH
David M. Goldsman, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA

1 Introduction

The purpose of this article is to explain methodology for designing and analyzing industrial experiments when selection and screening, rather than hypothesis testing, is the goal. In rough terms, we say the goal is one of selection and/or screening when the scientific objective is to find the best treatment. Empirical investigation, in the form of physical experiments, has been a key tool in the development of many advances in industrial product design, as well as online and offline improvements in manufacturing, during the last 75 years. More recently, physical experiments have also been used to design products that
are robust to the environmental conditions in which they are used, or robust with respect to the manufacturing process itself. Broadly speaking, at least three types of experiments have evolved. Historically, physical experiments were the earliest to be conducted; the first principles of the design and analysis of such experiments were developed in response to agricultural improvements, then later to meet industrial and medical needs. Simulation experiments are an attractive alternative to physical experiments when the experimenter has a complex physical system whose parts interact in a known manner but whose ensemble is not understood analytically. Such complex interacting structural components are typically combined with specified noise distributions to produce random output. Banks (1998) gives an overview of the field, and Goldsman and Nelson (1998) provide a survey of methods useful for designing simulation experiments to identify best treatments. In the last ten to fifteen years, a third form of experiment, commonly called a computer experiment, has become popular. In a computer experiment, a deterministic output is calculated for each set of input variables. Many phenomena that could previously be studied only using physical experiments can now be studied by these computer experiments. Computer experiments are possible when the mathematical model of the physical process of interest is known and an algorithm has been developed for solving the resulting mathematical system in a reasonable time frame on (fast) computing equipment. In engineering, dynamical models of physical systems implemented using finite-element methods are often the basis for computer code used in computer experiments. Because the output is deterministic, issues such as replication, randomization, and other fundamental tools for designing physical experiments are no longer appropriate.
The special techniques used to design and analyze computer experiments are discussed in detail in the survey articles by Koehler and Owen (1996) and Sacks, Welch, Mitchell and Wynn (1989). This article will discuss the design and analysis of physical and simulation experiments for frequently occurring industrial problems involving the identification of best or near-best treatments. Another very useful approach to the problem of identifying best treatments is that of forming simultaneous confidence intervals for important parameters related to this problem. Length considerations require this review paper to focus attention on selection and screening approaches, with one exception in Section 7, but we refer the reader to the seminal book-length treatment of simultaneous confidence intervals in Hsu (1996). Sections 2 and 3 discuss the familiar one-way layout, illustrating statistical methods both for the problem of designing experiments to select best treatments
and for analysis procedures to screen a given set of data to extract a (small) set of treatments containing best treatments. Section 4 reviews such problems for the case of completely randomized full-factorial experiments. Section 5 considers screening for fractional-factorial experiments for the specific goal of finding most robust engineering designs for products or manufacturing processes. Section 6 considers the important case of designing and analyzing randomization-restricted experiments; particular attention is given to identifying treatment combinations with largest means in split-plot experiments and to selection of robust product designs for the same setup. Finally, Section 7 reviews statistical procedures for finding treatments associated with smallest variances and gives a set of simultaneous confidence intervals for comparing treatment variances.

Throughout we use the standard notation in which a bar over a quantity and a dot replacing one or more of its subscripts mean that an average has been computed with respect to that (those) subscript(s); for example, $\bar{Y}_{i\cdot} = \sum_j Y_{ij}/n_i$. Also we let $\mathbf{1}_t$ denote the $t \times 1$ (column) vector with common element unity and $\mathbf{0}_t$ denote the $t \times 1$ zero vector. Further, for $x \in \mathrm{I\!R}$, let $\lceil x \rceil$ be the smallest integer greater than or equal to $x$. Finally, the notations $\Phi(\cdot)$ and $\phi(\cdot)$ denote the cumulative distribution function and density function, respectively, of the standard normal distribution.

We conclude this section by noting several other recent summary articles concerning selection and screening procedures. These include van der Laan and Verdooren (1989) and Gupta and Panchapakesan (1996) for general overviews, and Driessen (1992) and Dourleijn (1993) for results concerning subset selection in connected experiments.

2 Designing Single-Factor Experiments to Select Treatments Based on Means

2.1 Introduction

We begin with the problem of designing an experiment to select the best treatment in the simplest possible setting, that of a single-factor experiment.
Subsequent sections will both refine the model by allowing multiple factors and weaken the stochastic (normality) assumptions of the one-way layout. The underlying question of interest is: Which of $t$ competing normal distributions (or treatments, or systems, or populations) has the largest mean? (Without loss of generality, we could also be interested in finding the normal treatment having the smallest mean.) This broad question has a tremendous number of potential applications, for example: Which of $t$ fertilizers produces the highest crop yield? Which of $t$ groups scores highest on a certain standardized test? Which of $t$ factory layouts maximizes production flow?

Throughout this section, we adopt the following assumptions on the data that we will use in our experimentation.

Statistical Assumptions. Independent random samples of normal observations are to be taken that satisfy
$$Y_{ij} = \mu_i + \epsilon_{ij}, \quad 1 \le i \le t, \; j = 1, 2, \ldots, \qquad (2.1)$$
where the index $i$ represents the treatments, the index $j$ represents the observations taken within a treatment, and the $\epsilon_{ij}$ are mutually independent and, within each treatment, identically distributed (i.i.d.) $N(0, \sigma_i^2)$ measurement errors. The variances of the measurement errors, the $\sigma_i^2$'s, may or may not be known, but the treatment means, the $\mu_i$'s, are unknown. The number of observations to be taken from each treatment depends on the goal of the experiment, the design requirement, and the particular procedure employed; accordingly, the sample size may be fixed or random.

The goal and design requirement we use are those of Bechhofer (1954) as refined by Fabian (1962). To describe the goal, let the vector of unknown treatment means be given by $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_t)$. Further, the ordered but unknown $\mu$-values are denoted by $\mu_{[1]} \le \cdots \le \mu_{[t]}$. Roughly, our goal is to identify the treatment associated with the largest mean $\mu_{[t]}$, the best treatment. However, from a practical viewpoint, any treatment with a mean that is sufficiently close to $\mu_{[t]}$ will be essentially equivalent to the best. We formalize this idea by defining a treatment $i$ to be $\delta^*$-near-best provided that $\mu_i \ge \mu_{[t]} - \delta^*$. Of course, if the best and second-best treatments have means that are more than $\delta^*$ apart, i.e., $\mu_{[t]} - \mu_{[t-1]} > \delta^*$, then only the best treatment is $\delta^*$-near-best. On the other extreme, if all the means are within $\delta^*$ of the best treatment, i.e., $\mu_{[t]} - \mu_{[1]} \le \delta^*$, then all the treatments are $\delta^*$-near-best. Motivated by this consideration, we formulate our experimental goal.
Goal 2.1. To select a treatment that is $\delta^*$-near-best.

Here, the choice of $\delta^*$ should be made on engineering or scientific considerations to quantify practical equivalence. The difficulty of achieving Goal 2.1 depends on the spread in $\boldsymbol{\mu}$, whether the variances are known, and whether the $\sigma_i^2$'s are (or can be assumed to be) equal. All of the subsequent procedures we provide to select the best treatment will satisfy the following design (aka probability) requirement.

Design Requirement. Given a desired level of correct selection (CS) probability $P^*$ with $1/t < P^* < 1$ and a desired level of equivalence defined by $\delta^* > 0$, we require
$$P(\mathrm{CS} \mid \boldsymbol{\mu}, \boldsymbol{\sigma}^2) \ge P^* \quad \text{for all } \boldsymbol{\mu} \text{ and all unknown } \boldsymbol{\sigma}^2, \qquad (2.2)$$
where the event CS means that Goal 2.1 has been achieved. To illustrate the caveat about the $\boldsymbol{\sigma}^2$ in the Design Requirement: if the variances are known, then (2.2) need only hold for all $\boldsymbol{\mu}$, whereas if the variances are all unknown and not necessarily homogeneous, then (2.2) must also hold for all $\sigma_1^2, \ldots, \sigma_t^2 > 0$. As the notation in (2.2) suggests, the left-hand-side probability depends on the differences $\mu_{[t]} - \mu_{[i]}$, the sizes of the samples taken from each treatment, and the variances. Finally, we note that we restrict $P^* > 1/t$ because $P(\mathrm{CS}) = 1/t$ can be achieved without taking any observations by rolling a fair $t$-sided die and selecting the treatment so identified as the best one.

The remainder of this section describes statistical selection procedures appropriate in two variance settings. The single-stage procedure of Bechhofer (1954) is for the case in which the variances are known and equal, i.e., $\sigma_1^2 = \cdots = \sigma_t^2 = \sigma^2$. A two-stage procedure of Rinott (1978) is for the situation where the $\sigma_i^2$ are unknown and not necessarily equal. An efficient sequential procedure of Kim and Nelson (2000) is for the same assumptions as Rinott's procedure.
Lastly, we note that any of the procedures described in this section can be used to find the treatment with the smallest mean, either by an obvious modification or, formally, by direct application of the procedure for finding the treatment with the largest mean to the negatives of the responses.
2.2 Selection of $\delta^*$-Near-Best Treatments

2.2.1 Common Known-Variance Case

Bechhofer (1954) devised a statistical procedure that can be used to select a $\delta^*$-near-best treatment when the variances of the measurement errors of the $t$ treatments are homogeneous, i.e., $\sigma_1^2 = \cdots = \sigma_t^2 = \sigma^2$, and $\sigma^2$ is known. Bechhofer's procedure is intuitive: we select the treatment corresponding to the largest of the $t$ sample means. To describe his procedure we require the one-sided upper-$P^*$ equicoordinate point $z \equiv z_{t-1,1/2}^{(P^*)}$ of the equicorrelated multivariate normal distribution; this value is defined in brief by (2.4) and in the context of other relevant critical points in Section 8.

Selection Procedure $N_B$. For the given $t$ and $\sigma^2$, and desired $(\delta^*, P^*)$, compute
$$n = \left\lceil 2 \left( \frac{z \sigma}{\delta^*} \right)^2 \right\rceil. \qquad (2.3)$$
Sampling rule: Take a random sample of $n$ observations $Y_{i1}, \ldots, Y_{in}$ from each treatment $i$ $(1 \le i \le t)$.
Terminal decision rule: Calculate the sample means $\bar{Y}_i = \sum_{j=1}^{n} Y_{ij}/n$ $(1 \le i \le t)$. Select the treatment that yielded the largest sample mean, $\max_i \bar{Y}_i$, as having a $\delta^*$-near-best mean.

Fabian (1962) showed that Procedure $N_B$ achieves Goal 2.1. The sample size $n$ can be directly computed using the FORTRAN program usenb that is described in Bechhofer, Santner and Goldsman (1995) and is available as part of the package of FORTRAN programs associated with their book at the book's web site. Alternatively, $n$ can be determined from (2.3) because the critical point can be calculated via a zero-finding routine in conjunction with the very useful public-domain FORTRAN program mvnprd from Dunnett (1989). We illustrate both methods below.

Example 2.1. Suppose that five shop layouts are to be compared with respect to their mean output. The measured output from the $i$th layout is assumed to be
$N(\mu_i, \sigma^2)$ with $\sigma^2 = 1$. Define layout $i$ to be $\delta^*$-near-best provided its mean is within 0.1 of the largest mean, i.e., $\mu_i \ge \mu_{[5]} - 0.1$. In addition, the experimenter wishes to identify such a layout with probability at least 80%. Thus $t = 5$, $\delta^* = 0.1$, $P^* = 0.80$, and $\sigma = 1$. Invoking the program usenb produces the following dialogue.

> usenb
ENTER T AND SIGMA
5 1.0
CHOICE  GIVEN                 FIND
  1     N, P-STAR             DELTA-STAR
  2     DELTA-STAR, P-STAR    N
  3     N, DELTA-STAR         P-STAR
ENTER CHOICE (1, 2, OR 3)
2
ENTER DELTA-STAR AND P-STAR
.1 .80
FOR (T, DELTA*, P*, SIGMA) = (5, 0.10, 0.800, 1.00)
THE REQUIRED SAMPLE SIZE FOR PROCEDURE N_B IS N = 422
DO YOU WANT TO CONTINUE? PLEASE ANSWER Y/N
N

One can also determine $n$ using (2.3). The FORTRAN program find-zt implements a bisection algorithm to solve
$$P\left( Z_j \le z, \; 1 \le j \le t-1 \right) = P^* \qquad (2.4)$$
for $z$, where $(Z_1, \ldots, Z_{t-1})$ has the multivariate normal distribution with mean vector zero, unit variances, and common correlation 1/2. (The program find-zt is also included in the set of programs available at the book's web site.) For example, the dialogue below computes the critical point $z = z_{4,1/2}^{(0.80)}$, which equals 1.45 to two decimal places.

> find-zt
Enter P = No. Dimensions, and D.o.F. (0 for MVN) (or 0 0 to terminate)
4 0
Enter 1 for equal-rho case, or 2 to compute rho_ij from sample sizes, or 3 if rho_ij = b_i*b_j
1
Enter common rho
.5
Enter Quantile Probability
.80
Enter 1 to repeat with new coverage or 0 for new problem
0
Enter P = No. Dimensions, and D.o.F. (0 for MVN) (or 0 0 to terminate)
0 0
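The equicoordinate point in (2.4) can also be cross-checked with a short computation: for common correlation 1/2 the coordinates can be written as $Z_j = (X_j + X_0)/\sqrt{2}$ for i.i.d. standard normals $X_0, X_1, \ldots$, so the probability in (2.4) reduces to a one-dimensional integral that simple quadrature handles well. The Python sketch below (our own illustration, not the authors' FORTRAN code; it assumes the sample-size formula in (2.3)) reproduces the setting of Example 2.1.

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def equicoordinate_prob(z, dim):
    # P(Z_1 <= z, ..., Z_dim <= z) for equicorrelated (rho = 1/2) standard
    # normals, using Z_j = (X_j + X_0)/sqrt(2):
    #   P = E[ Phi(sqrt(2) z - X_0)^dim ],
    # evaluated by trapezoidal quadrature over X_0 in [-8, 8].
    lo, hi, steps = -8.0, 8.0, 4000
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        w = 0.5 if k in (0, steps) else 1.0
        phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += w * norm_cdf(math.sqrt(2.0) * z - x) ** dim * phi
    return total * h

def critical_point(p_star, dim, tol=1e-8):
    # Bisection for the upper-P* equicoordinate point of (2.4).
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if equicoordinate_prob(mid, dim) < p_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bechhofer_n(t, sigma, delta_star, p_star):
    # Sample size per treatment for Procedure N_B: n = ceil(2 (z sigma/delta*)^2).
    z = critical_point(p_star, t - 1)
    return z, math.ceil(2.0 * (z * sigma / delta_star) ** 2)

# Example 2.1: t = 5, sigma = 1, delta* = 0.1, P* = 0.80.
z, n = bechhofer_n(5, 1.0, 0.1, 0.80)
```

With these inputs the computed $z$ should round to 1.45 and $n$ should agree with the $n = 422$ reported by usenb, up to quadrature and roundoff error.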
Thus, from (2.3), $n = \lceil 2(z\sigma/\delta^*)^2 \rceil = 422$, as produced by usenb, to roundoff error.

2.2.2 Arbitrary Unknown-Variance Case: A Two-Stage Procedure

We now consider the more-realistic heteroscedastic problem in which the variances $\sigma_1^2, \ldots, \sigma_t^2$ are assumed to be completely arbitrary, i.e., unknown and not necessarily equal. This is clearly the situation that the experimenter would most likely encounter in practice. For example, this case covers a variety of problems that arise in comparing competing simulated systems. The most widely used procedures are those based on the work of Dudewicz and Dalal (1975) and Rinott (1978). These procedures are carried out in two stages: the first stage is meant to provide variance estimates; these estimates are then used to determine the number of observations that will be needed in the second stage to achieve the probability requirement. Rinott's procedure, presented next, is a bit easier to apply than that of Dudewicz and Dalal.
Selection Procedure $P_R$. Fix a common number of observations $n_0 \ge 2$ to be taken in Stage 1 from each treatment. For the given $t$ and desired $(\delta^*, P^*)$, find the constant $h = h(t, P^*, n_0)$ that solves (2.5).

Stage 1: Take a random sample of $n_0$ observations $Y_{i1}, \ldots, Y_{in_0}$ from each treatment $i$ $(1 \le i \le t)$.

Stage 2: Calculate the sample means and variances based on the initial $n_0$ observations,
$$\bar{Y}_i^{(1)} = \sum_{j=1}^{n_0} Y_{ij}/n_0 \quad \text{and} \quad S_i^2 = \sum_{j=1}^{n_0} \left( Y_{ij} - \bar{Y}_i^{(1)} \right)^2 / (n_0 - 1),$$
the latter an unbiased estimate of $\sigma_i^2$ based on $n_0 - 1$ degrees of freedom (d.f.). Take $N_i - n_0$ additional observations from treatment $i$, where
$$N_i = \max \left\{ n_0, \; \left\lceil \left( \frac{h S_i}{\delta^*} \right)^2 \right\rceil \right\}.$$
Calculate the sample means $\bar{Y}_i$ based on the combined results of the Stage 1 and Stage 2 samples. Select the treatment associated with the largest sample mean over both stages, $\max_i \bar{Y}_i$, as having a $\delta^*$-near-best mean.

The constant $h$ is the solution to
$$\int_0^\infty \left[ \int_0^\infty \Phi\!\left( \frac{h}{\sqrt{(n_0 - 1)\left( 1/x + 1/y \right)}} \right) f(x)\, dx \right]^{t-1} f(y)\, dy = P^*, \qquad (2.5)$$
where $\Phi$ is the standard normal c.d.f. and $f$ is the p.d.f. of the $\chi^2$-distribution with $n_0 - 1$ d.f. The FORTRAN program rinott in Bechhofer et al. (1995) calculates values of $h$ for arbitrary $(t, P^*, n_0)$, or one can use the table in Wilcox (1984). Note that Procedure $P_R$ does not eliminate treatments at the end of Stage 1, but instead uses the Stage 1 data from each treatment to estimate that treatment's variance and then uses the combined sample data to make a final decision. A second characteristic of $P_R$ is that one cannot bound a priori the total number of observations that it will require; intuitively this is the result of having no initial information about the treatment variances. Rather, the number of Stage 2 (and total) observations that $P_R$ takes from each treatment is a random variable, possibly different for each treatment.
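The Stage 2 sampling rule of $P_R$ is simple to implement once $h$ is in hand. The sketch below takes $h$ as given (in practice it comes from solving (2.5), e.g., via the rinott program or Wilcox's tables; the numerical value used here is purely illustrative) and computes the total sample sizes $N_i$ from the first-stage data.

```python
import math

def rinott_stage2_sizes(stage1_samples, h, delta_star):
    """Given Stage 1 data (one list of observations per treatment), Rinott's
    constant h, and the indifference parameter delta*, return for each
    treatment the total sample size N_i = max{n0, ceil((h * S_i / delta*)^2)}."""
    sizes = []
    for y in stage1_samples:
        n0 = len(y)
        ybar = sum(y) / n0
        s2 = sum((v - ybar) ** 2 for v in y) / (n0 - 1)  # unbiased variance
        n_i = max(n0, math.ceil((h * math.sqrt(s2) / delta_star) ** 2))
        sizes.append(n_i)
    return sizes

# Illustration with made-up first-stage data and a hypothetical h.
data = [[10.1, 9.8, 10.3, 10.0], [12.0, 14.0, 13.0, 11.0]]
sizes = rinott_stage2_sizes(data, h=2.5, delta_star=1.0)  # -> [4, 11]
```

Note how the second (more variable) treatment is assigned a larger total sample size, reflecting the variance-adaptive character of the procedure.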
The form of $N_i$ shows that (stochastically) the larger the variance of a particular treatment, the greater the number of Stage 2 observations one must take, as is to be expected intuitively.

Example 2.2. The Rinott procedure is particularly useful for analyzing the output from simulation experiments. Here each treatment corresponds to a system that is being studied in the simulation experiment. The following case study was first described in Goldsman, Nelson and Schmeiser (1991). The goal is to evaluate $t = 4$ different airline-reservation systems. The measure of performance is the expected time to failure, E[TTF], with larger values of this quantity being better. The system works if either of two computers works. Computer failures are rare, repair times are fast, and the resulting E[TTF] is large. The four systems arise from variations in parameters affecting the time-to-failure and time-to-repair distributions. From experience it is known that the E[TTF]'s are roughly equal for all four systems. Suppose that we are indifferent to expected differences of less than $\delta^*$ minutes (about two days). The large E[TTF]'s, the highly variable nature of rare failures, the similarity of the systems, and (as it turns out) the relatively small indifference zone yield a problem with reasonably large computational costs. We denote the E[TTF] arising from system $i$ by $\theta_i$ $(1 \le i \le 4)$, and the associated ordered $\theta$'s by $\theta_{[1]} \le \cdots \le \theta_{[4]}$. The $\theta_i$'s, $\theta_{[i]}$'s, and their pairings are completely unknown. Since larger E[TTF] is preferred, the mean difference between the two best systems in this example is $\theta_{[4]} - \theta_{[3]}$. For this example, the smallest difference worth detecting is $\delta^*$ minutes; we shall also use a specified desired probability of correct selection $P^*$ in this study. Let $X_{ij}$ $(j = 1, 2, \ldots)$ denote the observed time to failure from the $j$th independent simulation replication of system $i$. Application of the Rinott Procedure $P_R$ requires i.i.d. normal observations from each system.
If each simulation replication is initialized from a particular system under the same operating conditions, but with independent random number seeds, the resulting $X_{ij}$ will be i.i.d. for each system. However, the $X_{ij}$ cannot be justified as being normally distributed and, in fact, are somewhat skewed to the right. Thus, instead of using the raw $X_{ij}$ in Procedure $P_R$, Goldsman et al. (1991) applied the procedure to so-called macroreplication estimators of the $\theta_i$. These estimators group the $X_{ij}$ into disjoint batches and use the batch averages as the data to which Procedure $P_R$ is applied. In particular, they fix an integer number $m$ of simulation replications that comprise each macroreplication (that is,
$m$ is the batch size) and let
$$\bar{X}_{ij} = \frac{1}{m} \sum_{k=(j-1)m+1}^{jm} X_{ik}, \quad j = 1, \ldots, r_i,$$
where $r_i$ is the number of macroreplications to be taken from system $i$. The macroreplication estimators from the $i$th system, $\bar{X}_{i1}, \ldots, \bar{X}_{ir_i}$, are i.i.d. with expectation $\theta_i$. If $m$ is sufficiently large, then the Central Limit Theorem yields approximate normality for each $\bar{X}_{ij}$. No assumptions are made concerning the variances of the macroreplications.

Table 1. Summary of First- and Second-Stage Results for the Airline Reservation Experiment Based on an Initial Common Sample of Size $n_0$, where SE is the Estimated Standard Error of the Sample Mean.

To apply Procedure $P_R$, the authors conducted a pilot study to serve as the first stage of the procedure; each system was run for $n_0$ macroreplications, with each macroreplication consisting of the average of $m$ simulations of the system. The results are summarized in the first two rows of Table 1. (Shapiro-Wilk tests on the macroreplications from each system showed no evidence of non-normality.) From the appropriate table (or from program rinott), the critical constant $h$ was found for $t = 4$ and the desired $P^*$. The total sample sizes $N_i$ were computed for each system and are displayed in row 3 of Table 1; for example, System 2 required $N_2 - n_0$ additional macroreplications in the second stage (each macroreplication again being the average of $m$ system simulations). In all, a total of about 40,000 simulations of the four systems were required to implement Procedure $P_R$. The combined sample means for each system are listed in row 4 of Table 1, and the standard errors of each mean are listed in row 5.
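The macroreplication (batch means) construction just described is easy to state in code. The following sketch is our own illustration, not the original study's implementation: it groups raw replication outputs into disjoint batches of size $m$ and returns the batch averages to which Procedure $P_R$ would then be applied.

```python
def macroreplications(raw, m):
    """Group raw replication outputs into disjoint batches of size m and
    return the batch averages; any leftover observations are discarded."""
    r = len(raw) // m  # number of complete macroreplications
    return [sum(raw[j * m:(j + 1) * m]) / m for j in range(r)]

# Example: 12 raw outputs, batch size 4 -> 3 macroreplications.
batches = macroreplications(list(range(12)), 4)  # -> [1.5, 5.5, 9.5]
```

Averaging within batches both tames the right-skew of the raw times to failure (via the Central Limit Theorem) and preserves independence across batches, which is what $P_R$ requires.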
They clearly establish System 1 as having the largest E[TTF].

2.2.3 Arbitrary Unknown-Variance Case: A Sequential Procedure

For the same case as in Subsection 2.2.2, we present a multi-stage procedure due to Kim and Nelson (2000). After an initial stage of sampling similar to that of Rinott (1978), the Kim and Nelson procedure samples observations one-at-a-time from each contending treatment until a certain stopping criterion is met. The procedure has the advantage that it can eliminate treatments that appear to be noncompetitive with the best ones. Hence, if the experimenter is willing to use a sequential procedure, the Kim and Nelson procedure is often more efficient than Rinott's. So why would one ever want to use the Rinott procedure instead of Kim and Nelson's? If each stage of sampling requires a unit of time, then logistic constraints may prevent the experimenter from sampling as the Kim and Nelson procedure requires. This is more often the case when conducting physical experiments than computer experiments.
Selection Procedure $P_{KN}$.

Initialization: Fix a number of observations $n_0 \ge 2$ to be taken in the first stage. For the given $t$ and specified $(\delta^*, P^*)$, calculate the constant
$$\eta = \frac{1}{2} \left[ \left( \frac{2(1 - P^*)}{t - 1} \right)^{-2/(n_0 - 1)} - 1 \right]$$
and set $h^2 = 2\eta(n_0 - 1)$. Further, let $I = \{1, \ldots, t\}$ denote the set of treatments still in contention.

Stage 1: Take a random sample of $n_0$ observations $Y_{i1}, \ldots, Y_{in_0}$ from each treatment $i$. For each treatment $i$, compute the sample mean $\bar{Y}_i(n_0)$ based on the $n_0$ observations. For all $i \ne j$, compute the sample variance of the difference between treatments $i$ and $j$,
$$S_{ij}^2 = \frac{1}{n_0 - 1} \sum_{k=1}^{n_0} \left( Y_{ik} - Y_{jk} - \left[ \bar{Y}_i(n_0) - \bar{Y}_j(n_0) \right] \right)^2,$$
and set
$$N_{ij} = \left\lfloor \frac{h^2 S_{ij}^2}{(\delta^*)^2} \right\rfloor,$$
where $\lfloor x \rfloor$ denotes the greatest integer less than or equal to $x$. Finally, for all $i$ set $N_i = \max_{j \ne i} N_{ij}$. If $n_0 > \max_i N_i$, then stop and select the treatment with the largest sample mean $\bar{Y}_i(n_0)$ as one having a $\delta^*$-near-best mean. Otherwise, set the sequential counter $r = n_0$ and go to the Screening phase of the procedure.

Screening: Set
$$I = \left\{ i \in I : \bar{Y}_i(r) \ge \bar{Y}_j(r) - W_{ij}(r) \text{ for all } j \in I, \; j \ne i \right\},$$
where
$$W_{ij}(r) = \max \left\{ 0, \; \frac{\delta^*}{2r} \left( \frac{h^2 S_{ij}^2}{(\delta^*)^2} - r \right) \right\}.$$

Stopping Rule: If $|I| = 1$, then stop and select the treatment with index in $I$ as having a $\delta^*$-near-best mean. If $|I| > 1$, take one additional observation $Y_{i,r+1}$ from each treatment $i \in I$ and increment $r$ by 1. If $r = \max_i N_i + 1$, then stop and select the treatment in $I$ associated with the largest sample mean $\bar{Y}_i(r)$; otherwise go to the Screening phase.
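To make the flow of $P_{KN}$ concrete, here is a compact simulation sketch following the initialization/screening/stopping structure above, with the constants $\eta$ and $h^2$ as stated. It is a simplified illustration, not production code; the sampler, true means, and parameter values below are hypothetical.

```python
import math
import random

def kim_nelson(sample, t, n0, delta, p_star):
    """Simplified sketch of Procedure P_KN.  `sample(i)` draws one observation
    from treatment i.  Returns the index of the selected treatment."""
    eta = 0.5 * ((2.0 * (1.0 - p_star) / (t - 1)) ** (-2.0 / (n0 - 1)) - 1.0)
    h2 = 2.0 * eta * (n0 - 1)

    data = [[sample(i) for _ in range(n0)] for i in range(t)]
    mean = lambda y: sum(y) / len(y)

    # Sample variances of pairwise differences, from Stage 1 data only.
    s2 = {}
    for i in range(t):
        for j in range(i + 1, t):
            d = [data[i][k] - data[j][k] for k in range(n0)]
            dbar = mean(d)
            s2[i, j] = s2[j, i] = sum((v - dbar) ** 2 for v in d) / (n0 - 1)

    n_max = max(math.floor(h2 * s2[i, j] / delta ** 2)
                for i in range(t) for j in range(t) if i != j)
    contenders = set(range(t))
    r = n0
    while len(contenders) > 1 and r <= n_max:
        means = {i: mean(data[i]) for i in contenders}
        keep = set()
        for i in contenders:
            # Keep i unless it falls too far below some other contender.
            w = lambda j: max(0.0, (delta / (2 * r)) * (h2 * s2[i, j] / delta ** 2 - r))
            if all(means[i] >= means[j] - w(j) for j in contenders if j != i):
                keep.add(i)
        contenders = keep
        if len(contenders) > 1:
            for i in contenders:       # one more observation per survivor
                data[i].append(sample(i))
            r += 1
    return max(contenders, key=lambda i: mean(data[i]))

# Hypothetical illustration: treatment 3 has the largest true mean.
random.seed(1)
true_means = [0.0, 0.0, 0.0, 3.0]
best = kim_nelson(lambda i: random.gauss(true_means[i], 0.5),
                  t=4, n0=10, delta=1.0, p_star=0.95)
```

Because the treatment with the current largest sample mean always survives the screening step, the contender set can never become empty, and the loop terminates once a single survivor remains or the observation budget $\max_i N_i$ is exhausted.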
The Kim and Nelson Procedure $P_{KN}$ is somewhat more complicated to implement than the Rinott Procedure $P_R$, but it has several distinct advantages. First, once the initial set of $n_0$ observations is collected from each treatment, $P_{KN}$ is parsimonious in taking additional observations in that they are added one-at-a-time and the data are examined to determine whether sufficient information has been collected to stop. In contrast, $P_R$ takes (potentially large) groups of observations. Second, $P_{KN}$ allows treatments to be discarded before the final decision; those treatments that appear inferior can legitimately be dropped from further consideration.

3 Single-Factor Experiments to Screen Treatments Based on Means

3.1 Introduction

Screening is concerned with the problem of identifying a subset of candidate treatments containing the best treatment from a set of potential candidates. As motivation, consider the data given in Table 2 on the mean amounts of fat absorbed by batches of doughnuts during cooking using one of four brands of shortening (A, B, C, D). These data are from a completely randomized experiment conducted by Lowe in 1935 at Iowa State University and analyzed in Snedecor and Cochran (1967). For each of the four shortenings considered, six batches of doughnuts were prepared.

Table 2. Grams of Fat Absorbed per Batch for Four Brands of Shortening (A, B, C, D), with Column Means.
Traditional hypothesis testing decides whether the mean amounts of fat absorbed by the four brands are equal. Selection procedures seek to identify the brand of shortening for which the mean amount of fat absorbed is minimal (or, in other problems, maximal, depending on the research goal). In this case, the data were not collected using an experimental design that had this objective in mind. This section introduces statistical screening procedures that allow the experimenter to identify a subset of the brands that contains the best (or a practically equivalent best) brand with at least a given confidence level.

For simplicity, we initially explain the philosophy of screening in the context of the same normal-theory single-factor experimental setup discussed in Section 2. For this model, we momentarily take the best treatment to be the treatment with the largest mean, $\mu_{[t]}$. Later, we relax this definition and consider identifying any treatment whose mean is equivalent to $\mu_{[t]}$ from a practical viewpoint, i.e., any treatment associated with a $\delta^*$-near-best mean. We also consider more complicated models where screening is natural to use.

In the one-way layout, screening procedures are concerned with identifying a subset of the test treatments that is guaranteed to contain the best treatment with high probability; these procedures are usually called subset selection procedures in the statistical literature. More accurately, the procedures discussed in this section eliminate ("screen out") inferior treatments (those that exhibit strong evidence of having small $\mu$-values) and retain all other treatments. To make clear the contrast between screening and the Bechhofer-Fabian selection formulation of Section 2, the latter is ordinarily employed when designing experiments; the choice of the sample size $n$ is central to guaranteeing the probability requirement. Screening methods can be employed when analyzing results from an experiment with arbitrary sample sizes.
However, there is no free lunch here, and the consequence of using too small an $n$ may be that a large subset is selected. Thus we describe two methods of choosing $n$ when the choice of sample size is under the control of the experimenter. Lastly, we note that unlike the Bechhofer-Fabian probability requirement (2.2), it will be seen that the subset selection probability requirement (3.1) can be guaranteed using a single-stage procedure even when the common variance is unknown.

In a more comprehensive approach, after the inferior treatments have been eliminated, the selected treatments can be subjected to further studies. For example, it might be desired to identify a $\delta^*$-near-best treatment using two stages of sampling: screening after the first stage to eliminate obviously inferior treatments, followed by additional sampling of the remaining treatments to select a single best treatment so that the overall design requirement of Bechhofer-Fabian
is satisfied. For common known measurement-error variance this can be accomplished by using, for example, the two-stage procedures of Tamhane and Bechhofer (1977) and Tamhane and Bechhofer (1979) (which use the Gupta procedure for screening), Santner and Behaxeteguy (1992) (which uses the Gupta Procedure $G$, below, for screening), or the general procedure of Nelson, Swann, Goldsman and Song (2000). For the moment, we wish to attain the following goal.

Goal 3.1. Select a (random-size) subset that contains the treatment associated with $\mu_{[t]}$.

Confidence Requirement. Given $P^*$ with $1/t < P^* < 1$, we require that
$$P(\mathrm{CS} \mid \boldsymbol{\mu}, \sigma^2) \ge P^* \qquad (3.1)$$
for all $\boldsymbol{\mu}$ and any unknown $\sigma^2$, where CS denotes the event that the subset of treatments selected includes the treatment associated with $\mu_{[t]}$.

In Subsection 3.2 we illustrate the philosophy of screening procedures to attain Goal 3.1 for the simplest case: that of the one-way layout with common known or unknown variance and equal numbers of observations from all treatments. As usual, screening for the treatment having the smallest mean can be accomplished by applying the procedures given in this section to the negatives of the original data. In Subsection 3.3, extensions of the basic method will be given that allow additional flexibility in the specification of the best treatment. Subsection 3.4 discusses screening procedures that control the number of treatments selected in the subset.

3.2 Screening for the Best Treatment

3.2.1 A Screening Procedure for the Balanced One-Way Layout

Gupta (1956) and Gupta (1965) proposed screening procedures for the balanced one-way layout in (2.1) when the variances are homogeneous and the common value $\sigma^2$ is either known or unknown. We emphasize that statistical screening procedures can be applied to data obtained either from designed experiments or from observational studies.
Screening Procedure $G$. Calculate the sample means $\bar{Y}_1, \ldots, \bar{Y}_t$ (each based on $n$ observations) from the $t$ treatments. Let $\bar{Y}_{[1]} \le \cdots \le \bar{Y}_{[t]}$ denote their ordered values. If $\sigma^2$ is unknown, also calculate $S^2$, the unbiased pooled estimate of $\sigma^2$ based on $\nu = t(n-1)$ d.f.

Case 1 ($\sigma^2$ Known): Include treatment $i$ in the selected subset if
$$\bar{Y}_i \ge \bar{Y}_{[t]} - z \, \sigma \sqrt{2/n}, \qquad (3.2)$$
where $z \equiv z_{t-1,1/2}^{(P^*)}$ is the one-sided upper-$P^*$ equicoordinate point of the equicorrelated multivariate normal distribution defined by (2.4) or (8.1).

Case 2 ($\sigma^2$ Unknown): Include treatment $i$ in the selected subset if
$$\bar{Y}_i \ge \bar{Y}_{[t]} - T \, S \sqrt{2/n}, \qquad (3.3)$$
where $T \equiv T_{t-1,1/2,\nu}^{(P^*)}$ is the one-sided upper-$P^*$ equicoordinate point of the equicorrelated multivariate central $t$-distribution defined by (8.2).

The constant required by Procedure $G$ for the unknown-$\sigma^2$ case can be computed using the program find-zt, as illustrated in Example 2.1.

Example 3.1. For the fat absorption data in Table 2, the sample variance, pooled over the four brands, is based on $t(n-1) = 4 \times 5 = 20$ d.f. Since $\sigma^2$ is unknown, the constant $T$ should be used in Procedure $G$. Notice that, for this example, the best brand is the one having the lowest mean amount of fat absorption. Therefore, the analogue of Procedure $G$ for selecting the treatment associated with the smallest mean selects brand $i$ if
$$\bar{Y}_i \le \bar{Y}_{[1]} + T \, S \sqrt{2/n}.$$
Brands A and D satisfy this criterion and are selected. From another viewpoint, it can be concluded, with confidence $P^*$, that brands B and C are inferior.
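Procedure $G$'s subset rule is a one-liner once the yardstick is known. The sketch below implements rules (3.2)/(3.3), plus the smallest-mean analogue used in Example 3.1, given a critical constant $c$ (equal to $z$ or $T$, obtained from find-zt or tables); the numerical inputs shown are hypothetical, not the doughnut data.

```python
import math

def gupta_subset(means, n, scale, c, smallest=False):
    """Subset-selection rule of Procedure G.  `means` are the t sample means
    (each based on n observations), `scale` is sigma (known-variance case) or
    the pooled S (unknown-variance case), and c is the corresponding critical
    point (z or T).  Returns the indices of the selected treatments."""
    yardstick = c * scale * math.sqrt(2.0 / n)
    if smallest:  # analogue for selecting the smallest mean (Example 3.1)
        lo = min(means)
        return [i for i, y in enumerate(means) if y <= lo + yardstick]
    hi = max(means)
    return [i for i, y in enumerate(means) if y >= hi - yardstick]

# Hypothetical illustration: three treatments, known sigma = 1, n = 5, c = 2.
subset = gupta_subset([10.0, 9.8, 8.0], n=5, scale=1.0, c=2.0)  # -> [0, 1]
```

Here the yardstick is $2 \cdot 1 \cdot \sqrt{2/5} \approx 1.26$, so treatments 0 and 1 survive while treatment 2 is screened out.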
Intuitively, Procedure $G$ works by including in the subset the treatment corresponding to the largest sample mean, plus possibly other treatments having sample means sufficiently close to the largest sample mean. The total number of additional treatments selected is not specified ahead of time but is a random variable that depends on the common sample size, the variability of the individual responses, the total number of treatments, and the distribution of the population means relative to one another. Physically, these characteristics are embodied in the length of the yardstick, $z\sigma\sqrt{2/n}$ (or $TS\sqrt{2/n}$), for the appropriate critical point. The yardstick is simply a constant times the true standard deviation of the difference between two sample means (or an estimate of this quantity). The yardstick has the appealing feature that its length increases with the noise in the data, with the desired confidence guarantee $P^*$ (through $z$ or $T$), and with the number of treatments (again, through $z$ or $T$), while its length decreases as the sample size increases. The number of additional treatments selected also depends on the relative spacing among the true population means; if the true means are close together, more treatments will tend to be selected than if the treatments are well separated.

The fact that screening procedures select a random number of treatments suggests that these procedures have some features in common with confidence intervals. In favor of this analogy, consider the confidence interval for the mean of a normal distribution with known variance, i.e., $\bar{Y} \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}$, where $\bar{Y}$ is the sample mean of the data and $z_{1-\alpha/2}$ is the upper $100(1-\alpha/2)$ percentile point of the standard normal distribution. This interval adjusts its center (and, in the analogous case when $\sigma^2$ is unknown, its width) to exactly achieve the nominal coverage for all $\mu$. The width of this interval depends on the noise $\sigma$ (or, when $\sigma$ is unknown, on the sample standard deviation) and on the sample size $n$.
However, the analogy with confidence intervals is not perfect, because screening procedures do not typically achieve their nominal confidence level exactly for all parameter configurations. It is true that the infimum over $\boldsymbol{\mu}$ of the probability of correct selection equals $P^*$ in both the known- and unknown-$\sigma^2$ cases, but in addition $P(\mathrm{CS} \mid \boldsymbol{\mu}) \to 1$ as $n \to \infty$ whenever $\mu_{[t]} > \mu_{[t-1]}$. This fact shows that $G$ is conservative: its achieved level is at least equal to its nominal level but can be arbitrarily close to unity. The same is true of all screening procedures.
3.2.2 Sample Size Determination for Screening

It is natural to consider criteria for choosing the sample size when an experimenter has (some) control over the sample size. We mention two criteria for choosing $n$ based on the number of treatments, $S$, selected by the screening procedure. Of course $S$ is random with $1 \le S \le t$. The most informative possible outcome is $S = 1$ and the least informative is $S = t$. For any fixed $\boldsymbol{\mu}$ with $\mu_{[t]} > \mu_{[t-1]}$, it can be shown that $E(S) \to 1$ as $n \to \infty$. This fact suggests that the experimenter could choose $n$ to control the expected number of treatments in the selected subset, i.e., given $\boldsymbol{\mu}$ (or a plausible configuration for it) and a bound $\gamma$, choose $n$ so that $E(S \mid \boldsymbol{\mu}) \le \gamma$.

Example 3.2. To illustrate this first strategy, consider using Procedure $G$ with a specified $P^*$ when $\sigma^2$ is known. Table 3 lists the expected number and proportion of treatments selected under the slippage mean configuration, $\boldsymbol{\mu} = (0, \ldots, 0, \delta)$, and the equi-spaced mean configuration, $\boldsymbol{\mu} = (0, \delta, 2\delta, \ldots, (t-1)\delta)$.

Table 3. Expected Number and Proportion of Treatments Selected Under the Slippage and Equi-spaced Configurations.

Notice the extent to which the operating characteristics of $G$ depend on the true underlying mean vector. The screening Procedure $G$ requires over 100 observations per treatment to reduce the expected number of treatments selected to under 2/3 of the original number if the means are as tightly clustered as in the slippage configuration. On the other hand, far fewer observations per treatment suffice to screen out roughly 90%
20 of the treatments when the components of are as spread out as the equi-spaced configuration. These two extremes give the investigator guidance as to the choice of sample size. All computations in Example 3.2 were performed using the FORTRAN program eval-ng from Bechhofer et al. 1995) that uses simulation to estimate these expected values. The following dialogue illustrates the use of this program * for $. >eval-ng ENTER T AND SIGMA ENTER NUMBER OF SIMULATION REPS AND INTEGER SEED ENTER N, DELTA AND P-STAR K P{S.GEQ.K SC} P{S.GEQ.K ES} E{S SC} = E{S ES} = As can be seen in Example 3.2, the program eval-ng also provides a second criterion for choosing $. We choose $ to force the probability that the procedure 20
21 $ $ selects more than a specified number of treatments to be small, i.e., give a true configuration small. and some, choose $ to make Example 3.2 continued) Table 4 lists the tail probabilities for Procedure when and the means are either in the slippage or equi-spaced configurations defined above for several runs of eval-ng. Notice that under the very spread.* Table 4 Tail Probabilities for when $ and ** Based on when Under the Slippage and Equi-spaced Configurations.* ** out) equi-spaced configuration, the chance that the subset contains 4 or fewer of the treatments is over 97%. In contrast, under the very pessimistic) slippage configuration, there still is about a 10% chance of selecting all treatments, a worst-case scenario. The experimenter can identify some intermediate configuration that is more realistic and require that be sufficiently small for some choice of. 21
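Both criteria can be approximated by direct Monte Carlo in the spirit of eval-ng (this sketch does not reproduce that program; the function name and the two mean configurations below are illustrative):

```python
import math
import random

def simulate_subset_size(means, sigma, n, h, reps=2000, seed=1):
    # Monte Carlo estimates of E{S} and the tail probabilities P{S >= k}
    # for the subset-selection rule with yardstick h * sigma * sqrt(2/n).
    rng = random.Random(seed)
    t = len(means)
    yard = h * sigma * math.sqrt(2.0 / n)
    sizes = []
    for _ in range(reps):
        # simulate the t sample means for one experiment
        ybars = [mu + rng.gauss(0.0, sigma / math.sqrt(n)) for mu in means]
        cutoff = max(ybars) - yard
        sizes.append(sum(1 for y in ybars if y >= cutoff))
    e_s = sum(sizes) / reps
    tail = {k: sum(1 for s in sizes if s >= k) / reps for k in range(1, t + 1)}
    return e_s, tail

# hypothetical t = 5 configurations with delta = 1:
slippage = [0.0, 0.0, 0.0, 0.0, 1.0]        # t-1 means tied, delta below best
equispaced = [-4.0, -3.0, -2.0, -1.0, 0.0]  # successive means delta apart
```

Running both configurations over a grid of $n$ values gives rough analogues of Tables 3 and 4 and shows how strongly $E\{S\}$ depends on the spacing of the means.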
3.2.3 Other One-Factor Experiments

The operation described above is common to all screening procedures. Every screening procedure estimates the characterizing quantity of interest for all treatments and bases its decisions on this estimate; treatments whose estimated characteristics are sufficiently distant from the best estimated level of the quantity are eliminated. These general features are illustrated, for example, in the screening procedures that we reference for two frequently occurring cases related to the balanced one-way layout.

First, a number of screening procedures have been proposed for the unbalanced one-way layout, a situation often arising in observational studies. (See Bechhofer et al. (1995) for a recommendation of a particular screening procedure.) A second important practical situation where screening procedures are well documented is when the experiment consists of a randomized complete block design, so that the observations have the structure

$Y_{ij} = \tau_i + \beta_j + \epsilon_{ij},$

where the measurement errors $\epsilon_{ij}$ are i.i.d., the $\tau_i$'s are treatment effects, and the $\beta_j$'s are additive block effects. This model, with batches viewed as block effects, more aptly describes the motivating doughnut example than does the one-way layout. See Bechhofer et al. (1995) or Section 3.4 of Gupta and Panchapakesan (1996), who present screening procedures for this situation. Also see Driessen (1992) and Dourleijn (1993) for screening procedures involving general connected designs.

3.3 Screening for $\delta^*$-Near-Best Treatments

This section introduces additional flexibility in screening by considering the alternative definition of best treatment that was introduced in Section 2, namely, identification of a subset containing at least one near-best treatment. Recall that the idea of this formulation is that the experimenter is willing to consider a treatment as equivalent to the best treatment if its mean is sufficiently close to the mean of the best treatment.
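Under the block model just described, the additive block effects cancel in differences of treatment means, so a yardstick rule of the same form applies with the number of blocks playing the role of the common sample size. A minimal sketch (the function name is ours; the constant $h$ would come from the appropriate table or program, not from this code):

```python
import math

def rcb_screen(y, h, sigma):
    # y[i][j]: response for treatment i in block j of a randomized
    # complete block design.  Block effects cancel in differences of
    # treatment means, so the one-way yardstick h*sigma*sqrt(2/b)
    # applies with b = number of blocks.
    b = len(y[0])
    means = [sum(row) / b for row in y]
    cutoff = max(means) - h * sigma * math.sqrt(2.0 / b)
    return [i for i, m in enumerate(means) if m >= cutoff]
```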
Recall that treatment $i$ is $\delta^*$-near-best if its mean $\mu_i$ is within a specified amount $\delta^* > 0$ of the largest treatment mean, i.e., $\mu_i > \mu_{[t]} - \delta^*$. The experimental goal and the associated probability (design) requirement considered in this section are stated in Goal 3.2 and Equation (3.4), respectively.

Goal 3.2 To select a (random-size) subset that contains at least one treatment whose mean satisfies $\mu_i > \mu_{[t]} - \delta^*$.
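For a known mean vector, identifying which treatments qualify as $\delta^*$-near-best is a one-liner (hypothetical helper; strict inequality as in the definition above):

```python
def near_best(means, delta_star):
    # Treatment i is delta*-near-best iff mu_i > max(mu) - delta*.
    cutoff = max(means) - delta_star
    return [i for i, m in enumerate(means) if m > cutoff]
```

Note that when the $t - 1$ smaller means all sit exactly $\delta^*$ below the largest, only the best treatment is near-best.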
Confidence Requirement Given $\delta^* > 0$ and $P^*$ with $1/t < P^* < 1$, and with $\mu$ unknown, we require

$P\{CS\} \ge P^*$ for all $\mu$, (3.4)

where CS denotes the event of correctly achieving Goal 3.2.

With the modified constant defined in (3.5) below, the basic procedure of Section 3.2 will guarantee probability requirement (3.4) (see Panchapakesan and Santner (1977), van der Laan (1992), and Roth (1978)). The procedure presented next is appropriate for the case where there is a common known measurement error variance $\sigma^2$; in this case the probability in (3.4) depends on $\mu$ only through the differences $\mu_{[t]} - \mu_i$.

Screening Procedure ($\delta^*$). Calculate the sample means $\bar{Y}_1, \ldots, \bar{Y}_t$ and let $\bar{Y}_{[1]} \le \cdots \le \bar{Y}_{[t]}$ denote the ordered sample means. Include treatment $i$ in the selected subset if

$\bar{Y}_i \ge \bar{Y}_{[t]} - \max\{0,\ h\sigma\sqrt{2/n} - \delta^*\},$ (3.5)

where $h$ is the one-sided upper-$P^*$ equicoordinate point of the equicorrelated multivariate normal distribution defined by (2.4) or (8.1). The minimum value of $P\{CS\}$ over all mean vectors $\mu$ occurs in the slippage configuration

$\mu_{[1]} = \cdots = \mu_{[t-1]} = \mu_{[t]} - \delta^*.$ (3.6)

When (3.6) holds, there is only one $\delta^*$-near-best treatment, namely, the treatment associated with mean $\mu_{[t]}$; hence the procedure is designed in such a way that this treatment is selected with probability at least $P^*$. Intuitively, it is easier to select a $\delta^*$-near-best treatment than to select the best treatment (the treatment associated with $\mu_{[t]}$). Thus it should be no surprise that this procedure uses a shorter yardstick to achieve Goal 3.2 than the yardstick $h\sigma\sqrt{2/n}$ used to select the best treatment.

As an illustration, Table 5 (from van der Laan (1992)) lists the probability that the procedure selects the treatment with mean $\mu_{[t]}$ when the means are in the slippage configuration (3.6) and the constant is chosen via (3.5) to guarantee $P\{CS\} \ge 0.90$.

Table 5: Probability that the Procedure Chooses the Best Treatment when the Constant is Chosen to Guarantee $P\{CS\} \ge 0.90$ Under the Slippage Configuration

By relaxing the requirement of selecting the best treatment to that of selecting a $\delta^*$-near-best treatment, the procedure increases the probability of correct selection from 0.67 (for finding the treatment with mean $\mu_{[t]}$) to 0.90 (for finding a treatment with mean greater than $\mu_{[t]} - \delta^*$). When the measurement error variance is unknown, Santner (1976) and Panchapakesan and Santner (1977) presented two-stage screening procedures for the goals of selecting a subset of treatments containing all $\delta^*$-near-best treatments and at least one $\delta^*$-near-best treatment, respectively.

3.4 Bounding the Number of Treatments Selected

This section discusses screening procedures for finding $\delta^*$-near-best treatments that bound the maximum number of treatments that the screening procedures select. Recall that the procedure of Section 3.2 can select all $t$ treatments; it is this feature that we modify in the procedure below. For the one-way layout with known measurement error variance, Gupta and Santner (1973) introduced a procedure that selects a random number of treatments subject to an a priori specified upper bound, say $Q$, on the number of treatments selected. Initially, they stressed a formulation with a goal and confidence requirement that selects a random-size subset, of size $Q$ or less, that contains the best treatment with a probability at least $P^*$ whenever $\mu$ satisfies the condition $\mu_{[t]} - \mu_{[t-1]} \ge \delta^*$. However, their procedure achieves the stronger guarantee $P\{CS\} \ge P^*$ for all $\mu$, where CS denotes the event of correctly achieving Goal 3.2 (Hooper and Santner (1979)). To illustrate the philosophy of this approach, we present their screening procedure.

Screening Procedure ($Q$). Calculate the sample means $\bar{Y}_1, \ldots, \bar{Y}_t$ and let $\bar{Y}_{[1]} \le \cdots \le \bar{Y}_{[t]}$ denote the ordered sample means. Include treatment $i$ in the selected subset if

$\bar{Y}_i \ge \max\{\bar{Y}_{[t]} - h\sigma\sqrt{2/n},\ \bar{Y}_{[t-Q+1]}\},$

where $h$ solves Equation (3.7) below. As usual, the width of the yardstick defining the procedure is a multiple of the standard error of the difference between pairs of sample means, namely $\sigma\sqrt{2/n}$. The constant $h$ is the solution of an integral equation, (3.7), whose integrand involves the incomplete beta function and the gamma function; see Gupta and Santner (1973) for its exact form.

The procedure cannot be implemented for all $t$, $Q$, $n$, $\delta^*$, and $P^*$. Intuitively, this is explained by the fact that, for the given $\delta^*$, the probability $P^*$ must be attainable for the fixed-size-subset procedure that chooses the $Q$ largest sample means. If it is, then one can choose a finite $h$ to permit a (possibly smaller, random-size) subset of size at most $Q$ to attain that $P^*$. In practice, for given $t$, $Q$, measurement error standard deviation $\sigma$, and confidence level $P^*$, one might be interested in finding

- the sample size $n$ required by a given rule (defined by $h$) to achieve a given level of practical equivalence (defined by $\delta^*$);
- the yardstick component $h$ (if it exists) required to achieve a given level of practical equivalence (defined by $\delta^*$) for a given sample size $n$;
- the level of practical equivalence $\delta^*$ (if it exists) corresponding to a given sample size $n$ and yardstick component $h$.

The FORTRAN program use-gs, available as part of the package of programs with Bechhofer et al. (1995) at tjs, can solve all three of these problems. We illustrate this program with an artificial example.

Example 3.3 In a study of fertilizing methods, an experimenter measures the yields on a set of 10 test plots planted with each of $t = 5$ fertilizers. Let $Y_{ij}$ denote the yield on the $j$th test plot that employs the $i$th fertilizing method, and let $\bar{Y}_i$ denote the mean of $Y_{i1}, \ldots, Y_{i,10}$ ($1 \le i \le 5$). We assume that the yields are independent and normally distributed with a common known standard deviation of $\sigma = 3$ bushels per test plot. Suppose that the experimenter wishes to select a subset containing any fertilizing method whose mean yield is within 2 bushels of the fertilizing method having the largest mean yield, that is, associated with any treatment having $\mu_i > \mu_{[5]} - 2$. Further assume that the experimenter would like the selected subset to contain no more than $Q = 3$ treatments.
If the experimenter wishes to guarantee this goal with probability at least 0.90 using the bounded screening procedure, then program use-gs gives the required constant $h$, as the following dialogue shows.
> use-gs
ENTER T,Q,SIGMA AND P-STAR
CHOICE   GIVEN            FIND
  1      N, H             DELTA-STAR
  2      DELTA-STAR, H    N
  3      DELTA-STAR, N    H
ENTER CHOICE (1, 2, OR 3)
3
ENTER DELTA-STAR AND N
FOR (T, Q, SIGMA, N, DELTA*) = (5, 3, 3.000, 10, 2.000)
THE CRITICAL VALUE H IS
DO YOU WANT TO CONTINUE? PLEASE ANSWER Y/N
n

The rule selects all those treatments $i$ satisfying $\bar{Y}_i \ge \max\{\bar{Y}_{[5]} - 3h\sqrt{2/10},\ \bar{Y}_{[3]}\}$.

This statistical screening procedure requires that the measurement error variance be known. Sullivan and Wilson (1984) and Sullivan and Wilson (1989) devised a bounded subset selection procedure for selecting a $\delta^*$-near-best normal treatment when the treatments have unknown and not necessarily equal variances. They also proposed a procedure to allow selection of a $\delta^*$-near-best normal treatment when the data from the treatments come from stationary normal processes with unknown means and unknown covariance structures, based on correlated sampling within each process. Santner (1975) presents bounded screening procedures for a number of other parametric families.
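Once $h$ is in hand, the bounded rule is easy to apply. A sketch with illustrative sample means and a hypothetical value of $h$ (the value printed by use-gs is not reproduced above):

```python
import math

def bounded_screen(ybars, h, sigma, n, q):
    # Keep treatment i iff it is within the yardstick h*sigma*sqrt(2/n)
    # of the largest sample mean AND among the q largest sample means,
    # so at most q treatments are ever selected.
    cutoff = max(ybars) - h * sigma * math.sqrt(2.0 / n)
    order = sorted(range(len(ybars)), key=lambda i: ybars[i], reverse=True)
    top_q = set(order[:q])
    return sorted(i for i in top_q if ybars[i] >= cutoff)
```

The second condition is what caps the subset size; without it the rule reduces to the unrestricted procedure of Section 3.2.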
4 Selection and Screening for Completely Randomized Experiments

Factorial experiments are a mainstay of industrial statistics. Many of the selection and screening goals for multi-factor experiments are direct generalizations of those for single-factor experiments. The structure of factorial experiments also yields several useful new goals, such as screening for the most robust product design or manufacturing process design. The randomization scheme involved in a factorial experiment plays an important role in formulating the statistical models. This section briefly mentions some basic issues for completely randomized factorial experiments. These issues are also relevant to randomization-restricted factorial experiments such as the split-plot experiments discussed in Section 6.

The specific goals and statistical procedures for multi-factor experiments depend on the structure of the underlying mean model. For specificity, consider a two-factor experiment where the (row) factor R has $r$ levels and the (column) factor C has $c$ levels. Let $Y_{ijk}$ be the observations from treatment combination $(i, j)$ with an unknown mean $\mu_{ij}$ ($1 \le i \le r$, $1 \le j \le c$). The treatment means can always be decomposed as

$\mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij},$

subject to the identifiability conditions $\sum_i \alpha_i = 0$, $\sum_j \beta_j = 0$, and $\sum_i (\alpha\beta)_{ij} = 0 = \sum_j (\alpha\beta)_{ij}$. The model is additive if $(\alpha\beta)_{ij} = 0$ for all $i$ and $j$, i.e.,

$\mu_{ij} = \mu + \alpha_i + \beta_j,$ (4.1)

and is non-additive otherwise. In practice, knowledge that the means are additive could be based on previous experience or knowledge of the factors being studied. When preliminary data are available, hypothesis tests for interactions can be conducted to facilitate the choice of a model. Strictly speaking, exactly additive models are rare in practice; however, non-additive models with small or negligible interactions are not uncommon. In these situations, additive models can provide good approximations to the true models and can yield increased efficiency while maintaining the validity of the analyses. Note that the null hypothesis of no interactions can be rejected based
on large samples even if the interaction effects are not large enough to invalidate conclusions based on the additivity assumption.

Suppose additivity holds. Denote the ordered values of the unknown $\alpha_i$'s and $\beta_j$'s by $\alpha_{[1]} \le \cdots \le \alpha_{[r]}$ and $\beta_{[1]} \le \cdots \le \beta_{[c]}$, respectively. The levels of factors R and C corresponding to the treatment combination having the largest mean are those associated with $\alpha_{[r]}$ and $\beta_{[c]}$, respectively. Therefore, under an additive model, selecting the best treatment combination is equivalent to simultaneously selecting the best levels of factors R and C. Chapter 6 of Bechhofer et al. (1995) provides details for several selection and screening procedures under additivity when completely randomized experiments can be run. The discussions there cover procedures that use single- and multi-stage sampling, as well as procedures having other features.

Suppose instead that the means are non-additive. Many of the procedures for single-factor experiments can be extended naturally to handle such situations. For example, to select or screen for the treatment combination associated with the largest mean, one can treat the two-factor experiment as a giant single-factor experiment with $rc$ treatments and apply the procedures discussed in the previous sections. Again, see Chapter 6 of Bechhofer et al. (1995) for details and references. The models for completely randomized factorial experiments can be viewed as special cases of those for factorial experiments with randomization restrictions. The procedures for split-plot designs presented in Section 6 can be applied to completely randomized experiments by setting the variances of the randomization-restriction errors to zero.

5 Screening to Identify Robust Products Based on Fractional-Factorial Experiments

This section considers screening procedures for the important industrial applications of determining robust product designs and robust manufacturing processes. In many quality improvement experiments, two types of factors are investigated.
The first are control factors that can be manipulated by the experimenter to form various product (or process) designs; control factors are also called design, engineering, or manufacturing factors. The second are noise factors that represent uncontrollable field or factory conditions affecting the performance or the manufacture of a product; noise factors are also called environmental factors. Because of the similarity of process improvement and product improvement, we
will use the language of product improvement for simplicity, although we keep in mind that these methods apply to both settings.

In quantifying robustness, we consider applications where it is appropriate to measure the performance or quality of a product by its worst performance under the different environments. This criterion is natural in situations where a low response at any combination of the levels of the noise factors can have potentially serious consequences. For example, engineering designs for seat belts or heart valves that fail catastrophically under rare, though non-negligible, sets of operating conditions must be eliminated early in the product design cycle. One useful goal in such applications is to identify product designs whose worst-case performances are among the best. (In a sense, this is similar to the larger-the-better criterion in robust design.) This section introduces a screening procedure to facilitate such identification. The procedure determines a random-size subset of the product designs so as to contain, with a prespecified confidence level, the product designs having the best worst-case performances.

In more detail, suppose that the mean of a response that characterizes the output quality depends on control factors and noise factors. In our notation, we distinguish the control and noise factors that interact with factors of the other type from the control factors having no interactions with noise factors and the noise factors having no interactions with control factors. Of special practical importance is the frequently occurring case when all factors are at two levels; however, nothing in the development below requires this assumption. We introduce the following notation to distinguish these two types of control and noise factors.
Notation       Interpretation
C, ..., C      Control factors that interact with noise factors
C, ..., C      Control factors that do not interact with noise factors
N, ..., N      Noise factors that interact with control factors
N, ..., N      Noise factors that do not interact with control factors

The superscripts attached to these symbols are mnemonic: the superscript on the first type of control factor C indicates that it interacts with one or more noise factors, while the superscript on the second type indicates that it interacts with no noise factors. A parallel
T E C H N I C A L B R I E F TOTAL JITTER MEASUREMENT THROUGH THE EXTRAPOLATION OF JITTER HISTOGRAMS Dr. Martin Miller, Author Chief Scientist, LeCroy Corporation January 27, 2005 The determination of total
More informationAdvanced Statistical Methods. Lecture 6
Advanced Statistical Methods Lecture 6 Convergence distribution of M.-H. MCMC We denote the PDF estimated by the MCMC as. It has the property Convergence distribution After some time, the distribution
More informationStochastic Histories. Chapter Introduction
Chapter 8 Stochastic Histories 8.1 Introduction Despite the fact that classical mechanics employs deterministic dynamical laws, random dynamical processes often arise in classical physics, as well as in
More informationSimulation. Where real stuff starts
1 Simulation Where real stuff starts ToC 1. What is a simulation? 2. Accuracy of output 3. Random Number Generators 4. How to sample 5. Monte Carlo 6. Bootstrap 2 1. What is a simulation? 3 What is a simulation?
More informationVarieties of Count Data
CHAPTER 1 Varieties of Count Data SOME POINTS OF DISCUSSION What are counts? What are count data? What is a linear statistical model? What is the relationship between a probability distribution function
More informationMore on Input Distributions
More on Input Distributions Importance of Using the Correct Distribution Replacing a distribution with its mean Arrivals Waiting line Processing order System Service mean interarrival time = 1 minute mean
More informationSTAT T&E COE-Report Reliability Test Planning for Mean Time Between Failures. Best Practice. Authored by: Jennifer Kensler, PhD STAT T&E COE
Reliability est Planning for Mean ime Between Failures Best Practice Authored by: Jennifer Kensler, PhD SA &E COE March 21, 2014 he goal of the SA &E COE is to assist in developing rigorous, defensible
More informationEstimation and sample size calculations for correlated binary error rates of biometric identification devices
Estimation and sample size calculations for correlated binary error rates of biometric identification devices Michael E. Schuckers,11 Valentine Hall, Department of Mathematics Saint Lawrence University,
More informationCOMPARING SYSTEMS. Dave Goldsman. November 15, School of ISyE Georgia Institute of Technology Atlanta, GA, USA
1 / 103 COMPARING SYSTEMS Dave Goldsman School of ISyE Georgia Institute of Technology Atlanta, GA, USA sman@gatech.edu November 15, 2017 Outline 1 Introduction and Review of Classical Confidence Intervals
More informationStatistical inference
Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall
More informationMethod A3 Interval Test Method
Method A3 Interval Test Method Description of the Methodology 003-004, Integrated Sciences Group, All Rights Reserved. Not for Resale ABSTRACT A methodology is described for testing whether a specific
More informationSIMULATION SEMINAR SERIES INPUT PROBABILITY DISTRIBUTIONS
SIMULATION SEMINAR SERIES INPUT PROBABILITY DISTRIBUTIONS Zeynep F. EREN DOGU PURPOSE & OVERVIEW Stochastic simulations involve random inputs, so produce random outputs too. The quality of the output is
More informationChange-point models and performance measures for sequential change detection
Change-point models and performance measures for sequential change detection Department of Electrical and Computer Engineering, University of Patras, 26500 Rion, Greece moustaki@upatras.gr George V. Moustakides
More informationThe t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies
The t-test: So Far: Sampling distribution benefit is that even if the original population is not normal, a sampling distribution based on this population will be normal (for sample size > 30). Benefit
More informationChapter 11. Output Analysis for a Single Model Prof. Dr. Mesut Güneş Ch. 11 Output Analysis for a Single Model
Chapter Output Analysis for a Single Model. Contents Types of Simulation Stochastic Nature of Output Data Measures of Performance Output Analysis for Terminating Simulations Output Analysis for Steady-state
More informationWooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics
Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).
More informationConfidence Intervals. Confidence interval for sample mean. Confidence interval for sample mean. Confidence interval for sample mean
Confidence Intervals Confidence interval for sample mean The CLT tells us: as the sample size n increases, the sample mean is approximately Normal with mean and standard deviation Thus, we have a standard
More informationEstimation of Quantiles
9 Estimation of Quantiles The notion of quantiles was introduced in Section 3.2: recall that a quantile x α for an r.v. X is a constant such that P(X x α )=1 α. (9.1) In this chapter we examine quantiles
More informationCHAPTER 4 THE COMMON FACTOR MODEL IN THE SAMPLE. From Exploratory Factor Analysis Ledyard R Tucker and Robert C. MacCallum
CHAPTER 4 THE COMMON FACTOR MODEL IN THE SAMPLE From Exploratory Factor Analysis Ledyard R Tucker and Robert C. MacCallum 1997 65 CHAPTER 4 THE COMMON FACTOR MODEL IN THE SAMPLE 4.0. Introduction In Chapter
More informationCONTENTS OF DAY 2. II. Why Random Sampling is Important 10 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE
1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 4 Problems with small populations 9 II. Why Random Sampling is Important 10 A myth,
More informationA Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints
Noname manuscript No. (will be inserted by the editor) A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Mai Zhou Yifan Yang Received: date / Accepted: date Abstract In this note
More informationOptimization of Muffler and Silencer
Chapter 5 Optimization of Muffler and Silencer In the earlier chapter though various numerical methods are presented, they are not meant to optimize the performance of muffler/silencer for space constraint
More informationUncertainty due to Finite Resolution Measurements
Uncertainty due to Finite Resolution Measurements S.D. Phillips, B. Tolman, T.W. Estler National Institute of Standards and Technology Gaithersburg, MD 899 Steven.Phillips@NIST.gov Abstract We investigate
More informationFeature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size
Feature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size Berkman Sahiner, a) Heang-Ping Chan, Nicholas Petrick, Robert F. Wagner, b) and Lubomir Hadjiiski
More informationChap The McGraw-Hill Companies, Inc. All rights reserved.
11 pter11 Chap Analysis of Variance Overview of ANOVA Multiple Comparisons Tests for Homogeneity of Variances Two-Factor ANOVA Without Replication General Linear Model Experimental Design: An Overview
More informationarxiv: v1 [stat.me] 14 Jan 2019
arxiv:1901.04443v1 [stat.me] 14 Jan 2019 An Approach to Statistical Process Control that is New, Nonparametric, Simple, and Powerful W.J. Conover, Texas Tech University, Lubbock, Texas V. G. Tercero-Gómez,Tecnológico
More informationMultivariate Distributions
IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate
More informationOpen book and notes. 120 minutes. Covers Chapters 8 through 14 of Montgomery and Runger (fourth edition).
IE 330 Seat # Open book and notes 10 minutes Covers Chapters 8 through 14 of Montgomery and Runger (fourth edition) Cover page and eight pages of exam No calculator ( points) I have, or will, complete
More information4 Hypothesis testing. 4.1 Types of hypothesis and types of error 4 HYPOTHESIS TESTING 49
4 HYPOTHESIS TESTING 49 4 Hypothesis testing In sections 2 and 3 we considered the problem of estimating a single parameter of interest, θ. In this section we consider the related problem of testing whether
More informationComputer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr.
Simulation Discrete-Event System Simulation Chapter 0 Output Analysis for a Single Model Purpose Objective: Estimate system performance via simulation If θ is the system performance, the precision of the
More information18.05 Final Exam. Good luck! Name. No calculators. Number of problems 16 concept questions, 16 problems, 21 pages
Name No calculators. 18.05 Final Exam Number of problems 16 concept questions, 16 problems, 21 pages Extra paper If you need more space we will provide some blank paper. Indicate clearly that your solution
More informationProtean Instrument Dutchtown Road, Knoxville, TN TEL/FAX:
Application Note AN-0210-1 Tracking Instrument Behavior A frequently asked question is How can I be sure that my instrument is performing normally? Before we can answer this question, we must define what
More informationHypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3
Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest
More informationWorst-case design of structures using stopping rules in k-adaptive random sampling approach
10 th World Congress on Structural and Multidisciplinary Optimization May 19-4, 013, Orlando, Florida, USA Worst-case design of structures using stopping rules in k-adaptive random sampling approach Makoto
More informationHardy s Paradox. Chapter Introduction
Chapter 25 Hardy s Paradox 25.1 Introduction Hardy s paradox resembles the Bohm version of the Einstein-Podolsky-Rosen paradox, discussed in Chs. 23 and 24, in that it involves two correlated particles,
More informationRigorous Science - Based on a probability value? The linkage between Popperian science and statistical analysis
Rigorous Science - Based on a probability value? The linkage between Popperian science and statistical analysis The Philosophy of science: the scientific Method - from a Popperian perspective Philosophy
More informationTest Strategies for Experiments with a Binary Response and Single Stress Factor Best Practice
Test Strategies for Experiments with a Binary Response and Single Stress Factor Best Practice Authored by: Sarah Burke, PhD Lenny Truett, PhD 15 June 2017 The goal of the STAT COE is to assist in developing
More informationForecast comparison of principal component regression and principal covariate regression
Forecast comparison of principal component regression and principal covariate regression Christiaan Heij, Patrick J.F. Groenen, Dick J. van Dijk Econometric Institute, Erasmus University Rotterdam Econometric
More informationHYPOTHESIS TESTING. Hypothesis Testing
MBA 605 Business Analytics Don Conant, PhD. HYPOTHESIS TESTING Hypothesis testing involves making inferences about the nature of the population on the basis of observations of a sample drawn from the population.
More informationBattery Life. Factory
Statistics 354 (Fall 2018) Analysis of Variance: Comparing Several Means Remark. These notes are from an elementary statistics class and introduce the Analysis of Variance technique for comparing several
More informationA lower bound for scheduling of unit jobs with immediate decision on parallel machines
A lower bound for scheduling of unit jobs with immediate decision on parallel machines Tomáš Ebenlendr Jiří Sgall Abstract Consider scheduling of unit jobs with release times and deadlines on m identical
More informationIntroduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs
Introduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs The Analysis of Variance (ANOVA) The analysis of variance (ANOVA) is a statistical technique
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationOne-Way Repeated Measures Contrasts
Chapter 44 One-Way Repeated easures Contrasts Introduction This module calculates the power of a test of a contrast among the means in a one-way repeated measures design using either the multivariate test
More information