DOE Wizard Screening Designs

Revised: 10/10/2017
© 2017 by Statgraphics Technologies, Inc.

Contents
Summary
Example
Design Creation
Design Properties
Saving the Design File
Analyzing the Results
Statistical Model
Analysis Summary
Pareto Chart
ANOVA Table
Normal Probability Plot of Effects
Analysis Options
Main Effects Plot
Interaction Plot
Regression Coefficients
Correlation Matrix
Response Plots
Predictions
Diagnostic Plots
Optimization
Extrapolation

Summary

The Experimental Design section of STATGRAPHICS contains a set of procedures that support the design and analysis of many different types of experiments. These procedures enable the analyst to construct a set of experimental runs that will yield the maximum amount of information about a process in the smallest number of trials. In contrast to haphazard experimentation, designed experiments are characterized by systematic manipulation of a process in order to determine the effects attributable to different factors.

In the early stages of an investigation, the analyst is often faced with a long list of factors that could affect the process. For example, in a typical chemical process, there could easily be dozens of factors that have an impact on the yield of the process, such as the temperature at which it is run, the amount of catalyst added, the mixing speed, and so on. Since it is difficult to study many factors in detail simultaneously, screening designs have been developed to quickly determine which factors have the greatest impact on a process.

This document describes the construction and analysis of designs that are intended to identify the most important factors. After the critical factors are determined, a more complicated experimental design involving a larger set of factor levels may be constructed to find the optimal settings for those factors.

Example

As an example, a typical screening experiment will be constructed involving 5 factors and 1 response. The example, which involves a chemical reaction, is discussed in Chapter 12 of the well-known book by Box, Hunter and Hunter (2005). The factors that will be varied are:

X1: feed rate
X2: amount of catalyst
X3: agitation rate
X4: temperature
X5: concentration

There is one response variable:

Y: percent reacted

Sample StatFolio: doewiz screening.sgp

Design Creation

To begin the design creation process, start with an empty StatFolio. Select DOE - Experimental Design Wizard to load the DOE Wizard's main window. Then push each button in sequence to create the design.

Step #1: Define Responses

The first step of the design creation process displays a dialog box used to specify the response variables. For the current example, there is a single response variable:

Name: The name for the variable is reacted.
Units: Reacted is measured as a percentage.
Analyze: The parameter of interest is the mean percent reacted.
Goal: The goal of the experiment is to maximize the reacted percentage.
Impact: The relative importance of each response (not relevant if only one response).
Sensitivity: The importance of being close to the best desired value (in this case, the maximum). Setting Sensitivity to Medium implies that the desirability attributed to the response rises linearly between the Minimum and Maximum values indicated.

Minimum and Maximum: Range of desirable values for the response.

Step #2: Define Experimental Factors

The second step displays a dialog box on which to specify the factors that will be varied. In the chemical reaction example, there are 5 factors:

Name - each factor must be assigned a unique name.
Units - units are optional.
Type - set the type of each factor to Continuous, since the factors can be set at any value within a continuous interval.
Role - the role of each factor is Controllable.
Low - the lower level $L_j$ for the factor.
High - the upper level $U_j$ for the factor.

Step #3: Select Design

The third step begins by displaying the dialog box shown below:

Since all of the factors are controllable process factors, only one Options button is enabled. Pressing that button displays a second dialog box:

Five general classes of designs are offered:

1. Screening - designs intended to select the most important factors affecting a response. Most of the designs involve only 2 levels of each factor. The factors may be quantitative or categorical.

2. Response Surface - designs intended to select the optimal settings of a set of experimental factors. The designs involve at least 3 levels of the experimental factors, which must be quantitative.
3. Multilevel Factorial - designs involving different numbers of levels for each experimental factor. The factors must be quantitative.
4. Orthogonal Array - a general class of designs developed by Genichi Taguchi. The factors may be quantitative or categorical.
5. Computer Generated - if this type of design is selected, the computer will select a set of runs during Step 5 that are optimal for the model to be fit.

If Screening Designs is selected, a third dialog box will be displayed listing all of the screening designs available for 5 experimental factors:

Name - the design name, including an abbreviation such as 2^5 if relevant. For screening designs, the following types may appear in the list, depending upon the number of experimental factors:

1. Factorial - includes runs at all combinations of the low and high levels of each factor, for a total of $2^k$ runs. Such designs are capable of estimating the main effects of all factors and all interactions amongst the factors.
2. Factorial in m blocks - includes the same runs as a full factorial design. However, the runs are divided into blocks, which are groups of runs to be done together (on the same day, or by the same operator, or from the same batch of raw material) to eliminate the effect of one or more nuisance factors. As the number of blocks increases, the ability to estimate certain interactions is lost. The Alias Structure table displayed after the design is initially created shows which interactions are confounded with block effects.
3. Half fraction (or quarter, eighth, ...) - a subset of the runs in a full factorial design, either one-half of the full $2^k$ runs, one-fourth of the runs, one-eighth of the runs, or some other regular fraction. The number of runs in the design equals $2^{k-p}$, where p = 1 for a half fraction, p = 2 for a quarter fraction, p = 3 for an eighth fraction, etc. For such designs, the Resolution field indicates what order of interactions may be estimated, as described below. As with blocked factorials, the Alias Structure table shows the confounding pattern of the design.
4. Irregular fraction - fractional factorial designs in which the number of runs is not a power of 2. Certain irregular fractions, although not completely orthogonal, have attractive confounding patterns. The designs included here are those described by Haaland (1989).
5. Mixed level fraction - in contrast to all of the other screening designs, these designs allow one factor (factor A) to be run at 3 levels rather than 2. This allows a quadratic effect to be estimated for that factor, which must be quantitative. For the other factors, the runs form a standard fractional factorial design. The designs included are those described by Haaland (1989).
6. Plackett-Burman - two-level designs intended for screening a large number of factors in a small number of runs, where the number of runs is not a power of 2. For example, a design is available for studying 11 factors in 12 runs. Main effects are confounded with two-factor interactions, so the design should only be used when interactions are either not present or are known to be small.
7. Folded Plackett-Burman - similar to Plackett-Burman designs, except the two-factor interactions are not confounded with main effects. However, two-factor interactions are heavily confounded amongst themselves and cannot be resolved.
8. Definitive screening design - two-level designs with centerpoints, capable of estimating both linear and quadratic effects. However, two-factor interactions are confounded with each other and with quadratic terms. Designs with 6 or more factors can fit a full second-order model in any 3 factors, often eliminating the need for follow-up experimentation.
9. Blocked definitive screening design - definitive screening designs in which the runs have been divided into 2 or more blocks.
10. User-specified design - allows an empty experiment datasheet to be constructed so that the analyst can enter his or her own runs. This allows the analysis procedures to be executed using a design created elsewhere. The user should be careful, when creating such a design, to enter the proper low and high values for each factor on the earlier dialog boxes, since the manner in which the effects are defined during the analysis depends on these low and high settings.

Runs - the number of runs in the base design, before adding any additional replicates or centerpoints.

Resolution - an indication of the confounding pattern of the design. Designs are classified as having one of the following resolutions:

Resolution III: designs which confound the estimates of the main effects with two-factor interactions. Such designs can be safely interpreted only if all two-factor interactions are small or nonexistent.
Resolution IV: designs which are capable of obtaining clear estimates of all main effects. However, some or all of the two-factor interactions are confounded with other two-factor interactions or block effects. The Alias Structure table described below indicates where the confounding occurs.
Resolution V: designs which are capable of obtaining clear estimates of all main effects and all two-factor interactions. Higher order interactions, however, are confounded with these effects. In most cases, this is not a problem since third-order and higher effects are usually assumed to be small or nonexistent. Resolution V designs are typically excellent selections.
Resolution V+: the design has resolution greater than 5, allowing for the estimation of 3-factor or higher order interactions if desired.

For blocked designs, an asterisk is shown next to the design resolution to indicate that the stated resolution assumes that blocking factors do not interact with experimental factors, the standard assumption when the analysis is performed.

Error d.f. - the number of degrees of freedom from which the experimental error may be estimated remaining after estimating all main effects, second-order interactions, and quadratic effects (if relevant). This is prior to any replication or addition of centerpoints. In general, at least 3 d.f. must be available if the statistical tests to be performed during the analysis are to have reasonable statistical power.

Block Size - for a design that is run in more than 1 block, the number of runs in the largest block.

For the current example, a 16-run half fraction will be selected. This design is resolution V, which means it is capable of estimating all main effects and two-factor interactions. However, there are 0 degrees of freedom remaining to estimate the experimental error. In order to do formal statistical testing, additional runs will need to be added to the base design.

The final dialog box allows the analyst to add additional runs to the design and to specify the order in which the runs will be performed:

Centerpoints (number) - the number of centerpoints to be added to the base design, which are additional experimental runs located at a point midway between the low and high level of all the factors. Each additional centerpoint adds one degree of freedom from which to estimate experimental error. If the design involves a single categorical factor, the centerpoints will be placed at a middle level of the quantitative factors and divided equally between the two levels of the categorical factor.

Centerpoints (placement) - positioning of the centerpoints with respect to the runs in the base design. They may be randomly scattered throughout the other experimental runs, spaced evenly throughout the other runs, or placed at the beginning or end of the experiment. The first two options are usually preferable.

Replicate design - if a number other than 0 is entered, the entire design will be repeated the indicated number of times.

Randomize - check this box to randomly order the runs in the experiment. Randomization is generally a good idea, since it can reduce the effect of lurking variables such as trends over time. However, when replicating the examples in this documentation, do not randomize the designs.

Generate Button - this button displays a dialog box that allows experienced analysts to change the design generators for fractional factorial designs:

In order to generate a $2^{k-p}$ fractional factorial design, a full factorial design for k - p factors is first generated. Columns for the additional p factors are then created by multiplying together various combinations of columns of the initial factorial (when expressed in standardized units where the low level equals -1 and the high level equals +1). In the current example, the column for factor E is created by multiplying column A by column B by column C by column D. This is abbreviated as

$$E = ABCD \qquad (1)$$

Alternatively, column E could have been created by multiplying together the same four columns and then changing the sign, i.e.,

$$E = -ABCD \qquad (2)$$

which would result in a different set of 16 runs. For details of these procedures, see Box, Hunter and Hunter (2005).

Note that 3 centerpoints have been added to the base design in the current example, resulting in a total of 19 runs and providing 3 degrees of freedom from which to estimate the experimental error. Since the Spaced option has been selected, the three centerpoints will be positioned at the beginning, middle, and end of the experiment.

The tentatively selected design is displayed in the Select Design dialog box:
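The generator rule described above can be sketched in a few lines of Python. This is only an illustration, not the code STATGRAPHICS uses: the function names `half_fraction` and `decode` are hypothetical, and the low and high settings are taken from the example worksheet shown later in this document.

```python
from itertools import product

def half_fraction(sign=1):
    """Return the 16 coded runs (A, B, C, D, E) of a 2^(5-1) design.

    A full 2^4 factorial is built in A..D, then E = sign * A*B*C*D.
    """
    runs = []
    # product() varies its last element fastest, so unpack in reverse
    # order to make A change fastest (standard order).
    for d, c, b, a in product((-1, 1), repeat=4):
        runs.append((a, b, c, d, sign * a * b * c * d))
    return runs

principal = half_fraction(+1)   # generator E = ABCD
alternate = half_fraction(-1)   # generator E = -ABCD

# Low/high settings assumed from the example worksheet.
LOW_HIGH = {
    "feed rate": (10.0, 15.0),
    "catalyst": (1.0, 2.0),
    "agitation": (100.0, 120.0),
    "temperature": (140.0, 180.0),
    "concentration": (3.0, 6.0),
}

def decode(run):
    """Convert a coded run (-1/+1) to the original factor settings."""
    return tuple(lo if x < 0 else hi for x, (lo, hi) in zip(run, LOW_HIGH.values()))

for run in principal[:4]:
    print(run, decode(run))

# The two half fractions share no runs and together form the full 2^5 factorial.
print(len(set(principal) | set(alternate)))
```

The two generators split the 32 runs of the full factorial into two disjoint halves, which is why changing the sign of the generator yields a different set of 16 runs.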

If the design is acceptable, press OK to save it to the STATGRAPHICS DataBook and return to the DOE Wizard's main window, which should now contain a summary of the design:

Step #4: Specify Model

Before evaluating the properties of the design, a tentative model must be specified. Pressing the fourth button on the DOE Wizard's toolbar displays a dialog box to make that choice:

The default model includes main effects for each of the 5 experimental factors, together with 10 two-factor interactions (shown as two-letter combinations). Selected terms could be excluded by double-clicking on them with the left mouse button.

Step #5: Select Runs

Since we intend to run all of the runs in the base design, this step can be omitted.
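As a quick check of the term count in the default model of Step #4, the two-letter interaction labels can be enumerated as all pairs of the five factor letters. This is a minimal sketch; the variable names are illustrative only.

```python
from itertools import combinations

factors = ["A", "B", "C", "D", "E"]

main_effects = list(factors)
# Every unordered pair of distinct factors gives one two-factor interaction.
two_factor = ["".join(pair) for pair in combinations(factors, 2)]
terms = main_effects + two_factor

print(two_factor)   # AB, AC, ..., DE
print(len(terms))   # 5 main effects + 10 interactions
```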

Design Properties

Step #6: Evaluate Design

Several of the selections presented when pressing button #6 are helpful in evaluating the selected design:

Design Worksheet

The design worksheet shows the 19 runs that have been created, in the order they are to be run:

Worksheet for <untitled> - Chemical reaction screening experiment

run   feed rate      catalyst   agitation   temperature   concentration   reacted
      (liters/min)   (%)        (rpm)       (degrees)     (%)             (%)
 1    12.5           1.5        110.0       160.0         4.5
 2    10.0           1.0        100.0       140.0         6.0
 3    15.0           1.0        100.0       140.0         3.0
 4    10.0           2.0        100.0       140.0         3.0
 5    15.0           2.0        100.0       140.0         6.0
 6    10.0           1.0        120.0       140.0         3.0
 7    15.0           1.0        120.0       140.0         6.0
 8    10.0           2.0        120.0       140.0         6.0
 9    15.0           2.0        120.0       140.0         3.0
10    12.5           1.5        110.0       160.0         4.5
11    10.0           1.0        100.0       180.0         3.0
12    15.0           1.0        100.0       180.0         6.0
13    10.0           2.0        100.0       180.0         6.0
14    15.0           2.0        100.0       180.0         3.0
15    10.0           1.0        120.0       180.0         6.0
16    15.0           1.0        120.0       180.0         3.0
17    10.0           2.0        120.0       180.0         3.0
18    15.0           2.0        120.0       180.0         6.0
19    12.5           1.5        110.0       160.0         4.5

Note that 3 centerpoints have been added to the 16 runs in the base design, one at the beginning of the experiment, one halfway through, and one at the end.

ANOVA Table

The ANOVA table shows the breakdown of the degrees of freedom in the design:

ANOVA Table
Source            D.F.
Model             15
Total Error        3
  Lack-of-fit      1
  Pure error       2
Total (corr.)     18

15 of the 18 total degrees of freedom are used to estimate the main effects and two-factor interactions.
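The degrees-of-freedom breakdown in the ANOVA table follows from simple counting. The sketch below assumes, as in this example, that the only replicated design point is the center, so pure error comes entirely from the repeated centerpoints.

```python
n_factorial = 16     # runs in the 2^(5-1) base design
n_center = 3         # added centerpoints
n_model_terms = 15   # 5 main effects + 10 two-factor interactions

n_runs = n_factorial + n_center
df_total = n_runs - 1                      # corrected for the mean
df_model = n_model_terms
df_error = df_total - df_model
df_pure_error = n_center - 1               # replicates exist only at the center
df_lack_of_fit = df_error - df_pure_error

print(df_total, df_model, df_error, df_lack_of_fit, df_pure_error)
```

The single lack-of-fit degree of freedom arises because the 17 distinct design points support one more parameter than the 16 in the model (mean plus 15 effects).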

Model Coefficients

The table of model coefficients is shown below:

Model Coefficients
              Standard                        Power at   Power at   Power at
Coefficient   Error      VIF   Ri-Squared   SN = 0.5   SN = 1.0   SN = 2.0
A             0.25       1.0   0.0          11.13%     28.88%     75.50%
B             0.25       1.0   0.0          11.13%     28.88%     75.50%
C             0.25       1.0   0.0          11.13%     28.88%     75.50%
D             0.25       1.0   0.0          11.13%     28.88%     75.50%
E             0.25       1.0   0.0          11.13%     28.88%     75.50%
AB            0.25       1.0   0.0          11.13%     28.88%     75.50%
AC            0.25       1.0   0.0          11.13%     28.88%     75.50%
AD            0.25       1.0   0.0          11.13%     28.88%     75.50%
AE            0.25       1.0   0.0          11.13%     28.88%     75.50%
BC            0.25       1.0   0.0          11.13%     28.88%     75.50%
BD            0.25       1.0   0.0          11.13%     28.88%     75.50%
BE            0.25       1.0   0.0          11.13%     28.88%     75.50%
CD            0.25       1.0   0.0          11.13%     28.88%     75.50%
CE            0.25       1.0   0.0          11.13%     28.88%     75.50%
DE            0.25       1.0   0.0          11.13%     28.88%     75.50%
alpha = 5.0%, sigma estimated from total error with 3 d.f.

Since the design is perfectly orthogonal, all of the variance inflation factors (VIF) are equal to their ideal value of 1.0. The rightmost column shows that there is a 75.5% chance of detecting any effects with a magnitude equal to 2 times the standard deviation of the experimental error.

Correlation Matrix

The correlation matrix has 0's in all the off-diagonal locations, showing that the estimates of the main effects and two-factor interactions will all be uncorrelated.

Correlation Matrix
         A      B      C      D      E     AB     AC     AD     AE     BC     BD     BE     CD     CE     DE
A    1.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
B    0.000  1.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
C    0.000  0.000  1.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
D    0.000  0.000  0.000  1.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
E    0.000  0.000  0.000  0.000  1.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
AB   0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
AC   0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
AD   0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
AE   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000  0.000  0.000  0.000  0.000
BC   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000  0.000  0.000  0.000
BD   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000  0.000  0.000
BE   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000  0.000
CD   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000  0.000
CE   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000
DE   0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000
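The orthogonality behind the unit VIFs and the zero correlations can be verified directly from the coded design. The sketch below is an illustration under stated assumptions: it rebuilds the 19 coded runs (16 factorial runs with E = ABCD plus 3 centerpoints coded as 0), forms the 15 model columns, and checks that all cross-products vanish. With orthogonal columns, the standard error of each coefficient in units of sigma is one over the square root of the column's sum of squares.

```python
from itertools import combinations, product
from math import sqrt

# 16 coded factorial runs with E = ABCD, followed by 3 centerpoints.
runs = [(a, b, c, d, a * b * c * d) for d, c, b, a in product((-1, 1), repeat=4)]
runs += [(0, 0, 0, 0, 0)] * 3

names = "ABCDE"
columns = {n: [r[i] for r in runs] for i, n in enumerate(names)}
for (i, p), (j, q) in combinations(enumerate(names), 2):
    columns[p + q] = [r[i] * r[j] for r in runs]   # interaction column

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Largest cross-product between any two different model columns.
max_off_diagonal = max(abs(dot(columns[p], columns[q]))
                       for p, q in combinations(columns, 2))
# Largest cross-product between a model column and the constant column.
max_with_intercept = max(abs(sum(col)) for col in columns.values())
# Coefficient standard errors in units of sigma.
std_errors = {n: 1 / sqrt(dot(col, col)) for n, col in columns.items()}

print(max_off_diagonal, max_with_intercept)
print(std_errors["A"], std_errors["DE"])
```

Every column has a sum of squares of 16 (the centerpoints contribute nothing), which gives the standard error of 0.25 shown for each coefficient.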

Design Points

The graph of the design points shows that each pair of factors is run at all combinations of the high and low levels:

[Figure: design points for each pair of factors. Axes: feed rate (10 to 15), catalyst (1 to 2), agitation (100 to 120), temperature (140 to 180), concentration (3 to 6).]

Prediction Variance Plot

The prediction variance plot shows that the variance of the predicted response will be fairly constant over most of the experimental region:

[Figure: Prediction Variance Plot with agitation=110.0, temperature=160.0, concentration=4.5. Stnd. error (0 to 1) plotted against feed rate (10 to 15) and catalyst (1 to 2).]

The only location where the variance is relatively high is close to the vertices.
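The shape of the prediction variance surface can be approximated by hand for this orthogonal design. The sketch below is an assumption-laden illustration, not the program's own calculation: it expresses the standard error of the predicted mean in units of sigma, holds agitation, temperature and concentration at their centers so that only the constant, A, B and AB terms contribute, and uses the hypothetical helper names `coded` and `prediction_std_error`. The vertical scale of the plot produced by the software may differ.

```python
from math import sqrt

N_RUNS = 19        # 16 factorial runs + 3 centerpoints
N_FACTORIAL = 16   # runs contributing to each effect column

def coded(value, low, high):
    """Rescale a factor setting so that low -> -1 and high -> +1."""
    return (value - (low + high) / 2) / ((high - low) / 2)

def prediction_std_error(feed_rate, catalyst):
    """Std. error of the predicted mean, in units of sigma, with the
    other three factors held at their center values."""
    a = coded(feed_rate, 10.0, 15.0)
    b = coded(catalyst, 1.0, 2.0)
    # Orthogonal design: variances of the individual terms simply add.
    variance = 1 / N_RUNS + (a ** 2 + b ** 2 + (a * b) ** 2) / N_FACTORIAL
    return sqrt(variance)

center = prediction_std_error(12.5, 1.5)
vertex = prediction_std_error(15.0, 2.0)
print(round(center, 3), round(vertex, 3))
```

Under these assumptions the standard error roughly doubles between the center of the region and a vertex, consistent with the remark that the variance is highest near the vertices.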

Saving the Design File

Step #7: Save Experiment

Once the experiment has been created and any additional runs entered, it must be saved on disk. Press the button labeled Step 7 and select a name for the experiment file:

Design files are extended data files and have the extension .sgx. They include the data together with other information that was entered on the input dialog boxes.

To reopen an experiment file, select Open Data Source from the File menu. The data will be loaded into the datasheet, and the Experimental Design Wizard window will be displayed.

Analyzing the Results

After the design file has been created and saved, the experiments would be performed. At a later date, once the results have been collected, the experimenter would return to STATGRAPHICS and reopen the saved design file using the Open Data Source selection on the main File menu. The results can then be typed into the response columns. The results for the example are displayed below:

run   feed rate      catalyst   agitation   temperature   concentration   reacted
      (liters/min)   (%)        (rpm)       (degrees)     (%)             (%)
 1    12.5           1.5        110.0       160.0         4.5             65.0
 2    10.0           1.0        100.0       140.0         6.0             56.0
 3    15.0           1.0        100.0       140.0         3.0             53.0
 4    10.0           2.0        100.0       140.0         3.0             63.0
 5    15.0           2.0        100.0       140.0         6.0             65.0
 6    10.0           1.0        120.0       140.0         3.0             53.0
 7    15.0           1.0        120.0       140.0         6.0             55.0
 8    10.0           2.0        120.0       140.0         6.0             67.0
 9    15.0           2.0        120.0       140.0         3.0             61.0
10    12.5           1.5        110.0       160.0         4.5             67.0
11    10.0           1.0        100.0       180.0         3.0             69.0
12    15.0           1.0        100.0       180.0         6.0             45.0
13    10.0           2.0        100.0       180.0         6.0             78.0
14    15.0           2.0        100.0       180.0         3.0             93.0
15    10.0           1.0        120.0       180.0         6.0             49.0
16    15.0           1.0        120.0       180.0         3.0             60.0
17    10.0           2.0        120.0       180.0         3.0             95.0
18    15.0           2.0        120.0       180.0         6.0             82.0
19    12.5           1.5        110.0       160.0         4.5             63.0

Important Notes:

1. If more than one sample was taken at each set of experimental conditions, the data values should be entered into data tables B through Z. The summary statistics in data table A will then be automatically calculated from the other tables. Do not treat the samples as replicates unless you actually reset the process between each sample.
2. If any experiments were not performed, leave the corresponding cell blank. The program will recognize the imbalance in the design and handle it properly.
3. If any experimental runs were done at conditions different than originally planned, change the entries in the experimental factor columns to correspond to the values that were actually used.
4. If additional runs were performed, you may add them to the bottom of the datasheet. They will be included in the fit.

Statistical Model

The statistical model upon which the analysis of screening designs is based expresses the response variable Y as a linear function of the experimental factors, interactions between the factors, and an error term. There are two types of models that are generally fit, illustrated below for 5 experimental factors:

1. First-order model - contains terms representing main effects only.

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon \qquad (3)$$

2. Second-order model - contains terms representing main effects and second-order interactions.

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_{12} X_1 X_2 + \beta_{13} X_1 X_3 + \beta_{14} X_1 X_4 + \beta_{15} X_1 X_5 + \beta_{23} X_2 X_3 + \beta_{24} X_2 X_4 + \beta_{25} X_2 X_5 + \beta_{34} X_3 X_4 + \beta_{35} X_3 X_5 + \beta_{45} X_4 X_5 + \varepsilon \qquad (4)$$

The experimental error $\varepsilon$ is assumed to be randomly drawn from a normal distribution with a mean of 0 and a standard deviation equal to $\sigma$. In rare cases, interactions amongst 3 or more factors may be included by adding terms consisting of cross-products of more than 2 factors. For a mixed level factorial design that has 3 levels of the first factor, a term such as $\beta_{11} X_1^2$ would also be included in the second-order model.

For quantitative variables, STATGRAPHICS represents the experimental factor $X_j$ using its original values as entered into the datasheet. For categorical factors, indicator variables are used of the form

$$X_j = \begin{cases} -1 & \text{at the low level of factor } j \\ +1 & \text{at the high level of factor } j \end{cases} \qquad (5)$$

where the low and high levels are those defined when the design was constructed.

Effects

In order to simplify the interpretation of screening designs, it is common to reexpress the above model in terms of effects. The main effect of factor j is defined as the change in the response variable Y when $X_j$ is changed from its low level to its high level, with all other factors being held constant midway between their lows and their highs. In a balanced two-level factorial design, the estimated effect of factor j equals the difference between the average response at the high level of the factor and the average response at the low level of the factor:

$$\widehat{\text{effect}}_j = \bar{y}_{j+} - \bar{y}_{j-} \qquad (6)$$

where

$\bar{y}_{j+}$ = average response at the high level of factor j
$\bar{y}_{j-}$ = average response at the low level of factor j

In an unbalanced design, the effect is a more complicated function of the coefficients, but the basic interpretation remains the same.

Two-factor interactions may also be defined. In general, a two-factor interaction may be thought of as the additional effect of one factor over and above the main effect when the second factor is held at its high level. In a balanced two-level factorial design, this interaction effect equals

$$\widehat{\text{effect}}_{jk} = 2\left(\bar{y}_{j+,k+} - \bar{y}\right) - \widehat{\text{effect}}_j - \widehat{\text{effect}}_k \qquad (7)$$

where $\bar{y}_{j+,k+}$ is the average response at the high levels of both factors j and k, and $\bar{y}$ is the grand average:

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i \qquad (8)$$

An additional important characteristic of the effects is that they are all expressed in units of the response variable, so that effects of factors can be compared directly, regardless of the units in which the factors are expressed.

Step #8: Analyze Data

Once the data have been entered, press the button labeled Step #8 on the Experimental Design Wizard toolbar. This will display a dialog box listing each of the response variables:

Response: column containing the response variable to be analyzed.

Transformation: the desired transformation to be applied before the model is fit.

Power and addend: the transformation parameters if a Power or Box-Cox transformation is selected.

Analysis Summary

The analysis of a screening design involves estimating the average or main effect of each experimental factor and interactions between the factors. The Analysis Summary displays information about the estimated effects:

Analyze Experiment - reacted
File name: chemical reaction.sfx
Comment: Chemical reactor example

Estimated effects for reacted (%)
Effect            Estimate  Stnd. Error  V.I.F.
average           65.2105   0.378313
A:feed rate       -2.0      0.824515     1.0
B:catalyst        20.5      0.824515     1.0
C:agitation       0.0       0.824515     1.0
D:temperature     12.25     0.824515     1.0
E:concentration   -6.25     0.824515     1.0
AB                1.5       0.824515     1.0
AC                0.5       0.824515     1.0
AD                -0.75     0.824515     1.0
AE                1.25      0.824515     1.0
BC                1.5       0.824515     1.0
BD                10.75     0.824515     1.0
BE                1.25      0.824515     1.0
CD                0.25      0.824515     1.0
CE                2.25      0.824515     1.0
DE                -9.5      0.824515     1.0
Standard errors are based on total error with 3 d.f.

The table shows:

Average - the estimated response at the center of the design region. For complete data from most orthogonal designs, this equals the grand average of all the data values $\bar{y}$.

Estimated main effects - the difference between the response at the high level of a factor and the response at the low level of a factor, when all other factors are held at their central values.

Estimated 2-factor interactions - the additional effect of one factor when a second is held at its high level. Interactions occur when the effect of one factor is different at different levels of another factor.

Other effects - defined as twice the coefficient associated with the corresponding term in the regression model when all variables are standardized according to:

$$X_j' = \frac{X_j - (low_j + high_j)/2}{(high_j - low_j)/2} \tag{9}$$

Standard errors - Each effect is shown by default with its estimated standard error. The standard errors are measures of the estimation error associated with each effect. The display can be changed to show confidence intervals for each effect using the Analysis Options dialog box.

V.I.F. - variance inflation factors that measure the extent to which any imbalance in the experiment has inflated the variance of the estimated effects. For a perfectly orthogonal design, the factors will equal 1.0. Any values of 10 or greater are usually taken to be a sign that serious correlation exists amongst the estimated effects, which causes the estimates to be much more variable than they would be in a well-designed experiment.

Standard errors are based on - an indication of how the experimental error has been estimated, as determined by the Analysis Options dialog box.

Pareto Chart

The Pareto Chart shows a graphical depiction of each of the effects in the above table. There are two forms of the Pareto chart: a standardized form and an unstandardized form. The unstandardized chart displays the absolute value of the effects in decreasing order:

[Figure: Pareto Chart for reacted. Bars from largest to smallest: B:catalyst, D:temperature, BD, DE, E:concentration, CE, A:feed rate, BC, AB, BE, AE, AD, AC, CD, C:agitation. Horizontal axis: Effect (0 to 24). Legend: + and -.]

The color of the bars shows whether an effect is positive or negative. From the above plot, it is easy to see that the three most important factors in the example are catalyst, temperature, and concentration.

To create a standardized Pareto chart, each effect is converted to a t-statistic by dividing it by its standard error. These standardized effects are then plotted in decreasing order of absolute magnitude:

[Figure: Standardized Pareto Chart for reacted. Bars from largest to smallest: B:catalyst, D:temperature, BD, DE, E:concentration, CE, A:feed rate, BC, AB, BE, AE, AD, AC, CD, C:agitation. Horizontal axis: Standardized effect (0 to 25). Legend: + and -.]

In addition, a line is drawn on the chart beyond which an effect is statistically significant at a specified significance level, usually 5%. In the above chart, the main effects of factors B, D, and E are significant, as are the BD and DE interactions. Noticeably absent from the list are any effects involving factors A and C.

Pane Options

Standardize: check to plot the standardized effects rather than the absolute effects.

Alpha: the significance level corresponding to the vertical line on the chart. Bars extending beyond the line are statistically significant at that significance level.
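The standardization behind the chart can be reproduced by hand. The sketch below is an illustration built from the Analysis Summary values, not the STATGRAPHICS implementation. The critical value 3.182 (two-sided 5% point of Student's t with 3 degrees of freedom) is an assumption taken from a standard t table, and all variable names are hypothetical.

```python
# Estimated effects and their common standard error, copied from the
# Analysis Summary table of the chemical reactor example.
effects = {
    "A:feed rate": -2.0, "B:catalyst": 20.5, "C:agitation": 0.0,
    "D:temperature": 12.25, "E:concentration": -6.25,
    "AB": 1.5, "AC": 0.5, "AD": -0.75, "AE": 1.25, "BC": 1.5,
    "BD": 10.75, "BE": 1.25, "CD": 0.25, "CE": 2.25, "DE": -9.5,
}
std_error = 0.824515

# Assumption: two-sided 5% critical value of Student's t with 3 d.f.,
# read from a standard t table (it is not part of the program output).
T_CRIT = 3.182

# A standardized effect is the effect divided by its standard error.
standardized = {name: est / std_error for name, est in effects.items()}

# Order by decreasing absolute magnitude, as on the Pareto chart.
ranked = sorted(standardized.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Effects whose bars would extend beyond the significance line.
significant = [name for name, t in ranked if abs(t) > T_CRIT]

for name, t in ranked:
    flag = "*" if abs(t) > T_CRIT else " "
    print(f"{flag} {name:16s} {t:8.2f}")
```

Under these assumptions the flagged effects are B, D, BD, DE, and E. Because each effect has 1 degree of freedom, the square of a standardized effect is the F-ratio for that effect (for catalyst, about 24.86 squared, or roughly 618).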

ANOVA Table

To determine the level of significance for each effect, the ANOVA Table may be used:

Analysis of Variance for reacted - Chemical reactor example
Source            Sum of Squares  Df  Mean Square  F-Ratio  P-Value
A:feed rate       16.0            1   16.0         5.88     0.0937
B:catalyst        1681.0          1   1681.0       618.17   0.0001
C:agitation       0.0             1   0.0          0.00     1.0000
D:temperature     600.25          1   600.25       220.74   0.0007
E:concentration   156.25          1   156.25       57.46    0.0048
AB                9.0             1   9.0          3.31     0.1664
AC                1.0             1   1.0          0.37     0.5870
AD                2.25            1   2.25         0.83     0.4301
AE                6.25            1   6.25         2.30     0.2268
BC                9.0             1   9.0          3.31     0.1664
BD                462.25          1   462.25       169.99   0.0010
BE                6.25            1   6.25         2.30     0.2268
CD                0.25            1   0.25         0.09     0.7815
CE                20.25           1   20.25        7.45     0.0720
DE                361.0           1   361.0        132.75   0.0014
Total error       8.15789         3   2.7193
Total (corr.)     3339.16         18

R-squared = 99.7557 percent
R-squared (adjusted for d.f.) = 98.5341 percent
Standard Error of Est. = 1.64903
Mean absolute error = 0.254848
Durbin-Watson statistic = 1.37903 (P=0.4687)
Lag 1 residual autocorrelation = 0.00827674

The ANOVA partitions the variance of the response into several components: one for each main effect, one for each interaction, and one for the experimental error. The ANOVA table shows:

Sum of Squares - the Type III sums of squares attributable to each term in the model. This measures the increase in the error sum of squares that would occur if each term were separately removed from the model. The sum of squares for total error is also included, where

$$S_{error} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{10}$$

and $e_i$ is the i-th residual, measuring the difference between the observed response for run i and the value predicted by the fitted model.

Df - the degrees of freedom associated with each term.

Mean Square - the mean square associated with each term, obtained by dividing the associated sum of squares by its degrees of freedom. The mean squared error (MSE) estimates the variance of the experimental error:

$$\hat{\sigma}^2 = MSE = \frac{S_{error}}{df_{error}} \tag{11}$$

F-Ratio - an F ratio which divides the mean square of an effect by the mean squared error:

$$F = \frac{MS_{effect}}{MSE} \tag{12}$$

The F-ratios may be used to determine the statistical significance of each effect.

P-Value - the P-Value associated with testing the null hypothesis that the coefficient for a selected effect equals 0, implying that the effect is not present. P-Values below a critical level (such as 0.05 if operating at the 5% significance level) indicate that the corresponding effect is statistically significant at that significance level.

R-squared - the percentage of the variability in the response variable that has been accounted for by the fitted model, calculated from

$$R^2 = 100 \left( 1 - \frac{S_{error}}{S_{total}} \right) \% \tag{13}$$

R-squared ranges from 0% to 100% and measures how well the model fits the observed response data.

R-squared (adjusted for d.f.) - the adjusted R-squared, which accounts for the number of degrees of freedom in the fitted model. In situations such as the current one where the number of coefficients in the fitted model is large relative to the total number of runs, the ordinary R-squared statistic may overstate the ability of the fitted model to predict the response. The adjusted R-squared compensates for this effect by

$$R^2_{adj} = 100 \left( 1 - \frac{n-1}{n-p} \cdot \frac{S_{error}}{S_{total}} \right) \% \tag{14}$$

where p is the number of estimated coefficients in the fitted model.

Standard error of est. - the estimated standard deviation of the experimental error, given by

$$\hat{\sigma} = \sqrt{MSE} \tag{15}$$

This value is used when constructing prediction intervals for the response.

Mean absolute error - the average of the absolute values of the residuals, given by

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |e_i| \tag{16}$$

This value indicates the average error in predicting the observed response using the fitted model.

Durbin-Watson statistic - a statistic calculated from the residuals according to

$$DW = \frac{\sum_{i=1}^{n-1} (e_{i+1} - e_i)^2}{\sum_{i=1}^{n} e_i^2} \tag{17}$$

The Durbin-Watson statistic measures serial correlation in the residuals to determine whether there is any dependence between successive observations. In this case, it could detect drifts over the course of the experiment. A small P-value would indicate that the analyst should take a close look at the residuals to look for any trends, which may be done using the Diagnostic Plots graph option.

There are five effects in the ANOVA table with P-values below 0.05. These are the same five effects identified as significant on the standardized Pareto chart (the two methods are equivalent). As a whole, the model accounts for at least 98% of the observed variability in the response. It is unnecessarily complicated, however, since many effects are not statistically significant. A later section illustrates how to remove selected effects using Analysis Options.

Pane Options

Include Lack-of-Fit Test: If checked, a line will be added to the ANOVA table to determine whether the current model adequately represents the observed data. Note: this option has no effect unless there are replicate experimental runs at identical settings of the experimental factors.

The resulting ANOVA table is shown below:

Analysis of Variance for reacted
Source            Sum of Squares  Df  Mean Square  F-Ratio  P-Value
A:feed rate       16.0            1   16.0         4.00     0.1835
B:catalyst        1681.0          1   1681.0       420.25   0.0024
C:agitation       0.0             1   0.0          0.00     1.0000
D:temperature     600.25          1   600.25       150.06   0.0066
E:concentration   156.25          1   156.25       39.06    0.0247
AB                9.0             1   9.0          2.25     0.2724
AC                1.0             1   1.0          0.25     0.6667
AD                2.25            1   2.25         0.56     0.5315
AE                6.25            1   6.25         1.56     0.3377
BC                9.0             1   9.0          2.25     0.2724
BD                462.25          1   462.25       115.56   0.0085
BE                6.25            1   6.25         1.56     0.3377
CD                0.25            1   0.25         0.06     0.8259
CE                20.25           1   20.25        5.06     0.1534
DE                361.0           1   361.0        90.25    0.0109
Lack-of-fit       0.157895        1   0.157895     0.04     0.8609
Pure error        8.0             2   4.0
Total (corr.)     3339.16         18

Note the lines labeled Lack-of-fit and Pure error, which provide two separate estimates of the experimental error $\sigma$:

1. Pure error: an estimate calculated by pooling the variance within sets of observations at identical levels of X. It is pure in the sense that it estimates the experimental error whether or not the proper model has been selected.

2. Lack-of-fit: an estimate calculated from the deviation between the average response for each group of replicate values and the values predicted by the fitted model. If the model is not correct, this estimates $\sigma$ plus a positive quantity that measures the lack-of-fit of the selected model.

The P-Value in the lack-of-fit line may be used to test the hypothesis that the current model is adequate. A small P-Value would indicate an inadequate model. In the current example, the P-Value is well above 0.05, so the selected model appears to be adequate.

Normal Probability Plot of Effects

When the number of degrees of freedom available for estimating the experimental error is small, the formal F tests conducted in the ANOVA table may not have much power, so that smaller effects will not appear to be significant. On the other hand, testing a large number of effects, each at a 5% significance level, may well generate more significant results than are actually present. A somewhat less rigorous way of judging which effects are real and which are probably just manifestations of noise is through the Normal Probability Plot of Effects:

[Figure: Normal Probability Plot for reacted. Axes: percentage (0.1 to 99.9) versus Standardized effects (-12 to 28).]

In this plot, the standardized effects are ordered from smallest to largest and plotted versus quantiles of a normal distribution. Any estimates which are just noise will fall approximately along a straight line. Any estimates which correspond to real signals will lie off the line to the left or right. Two types of normal probability plots are available, a full normal and a half-normal, which may be chosen using Pane Options.

Pane Options

Plot Type: Select Normal to plot each effect while retaining its positive or negative sign. Select Half-Normal to plot the absolute values of the effects.

Direction: Select Horizontal to plot percentages on the horizontal axis or Vertical to plot them on the vertical axis.

Fitted Line: If checked, a reference line is added to the plot by fitting a least squares regression line to the smallest 50% of the effects.

Label Effects: If checked, the names of the effects are added to the plot.

Example: Half-Normal Plot with Effect Labels

[Figure: Half-Normal Plot for reacted, with each point labeled by effect name. Axes: Standard deviations (0 to 2.4) versus Standardized effects (0 to 25).]

This plot has the advantage that all signals fall to the right of the noise line. The above plot confirms the conclusion that 5 significant effects are present.
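The coordinates of a half-normal plot can be sketched as follows. This is an illustration only: the plotting position (i - 0.5)/m used below is a common convention and an assumption here, since the exact formula used by STATGRAPHICS is not stated, and the variable names are hypothetical.

```python
from statistics import NormalDist

# Estimated effects and standard error from the Analysis Summary table.
std_error = 0.824515
effects = {
    "A:feed rate": -2.0, "B:catalyst": 20.5, "C:agitation": 0.0,
    "D:temperature": 12.25, "E:concentration": -6.25,
    "AB": 1.5, "AC": 0.5, "AD": -0.75, "AE": 1.25, "BC": 1.5,
    "BD": 10.75, "BE": 1.25, "CD": 0.25, "CE": 2.25, "DE": -9.5,
}

# Absolute standardized effects, ordered from smallest to largest.
ordered = sorted((abs(est) / std_error, name) for name, est in effects.items())
m = len(ordered)
std_normal = NormalDist()

points = []
for i, (abs_t, name) in enumerate(ordered, start=1):
    # Assumed plotting position for the i-th smallest absolute effect.
    prob = (i - 0.5) / m
    # Half-normal quantile: the standard normal quantile at 0.5 + prob/2.
    quantile = std_normal.inv_cdf(0.5 + 0.5 * prob)
    points.append((name, abs_t, quantile))

for name, abs_t, quantile in points:
    print(f"{name:16s} |t| = {abs_t:6.2f}   half-normal quantile = {quantile:5.3f}")
```

Effects that are only noise should fall close to a straight line through the origin when the absolute standardized effects are plotted against these quantiles, while real effects appear well to the right of that line.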

Analysis Options

The mathematical model currently being used to fit the data contains 5 main effects and 10 two-factor interactions. As seen above, many of these terms are not statistically significant. When building empirical models based solely on observed data, it is important to keep the models as simple as possible, since simple models tend to be easier to interpret and have a better chance of extrapolating to other combinations of the experimental factors. In accordance with the principle of parsimony or K.I.S.S. (Keep It Simple Statistically), insignificant effects should be removed from the model according to the following rules:

1. Remove any insignificant two-factor interactions (or other second-order effects).

2. Remove any insignificant main effects that are not involved in significant interactions.

Note that main effects corresponding to factors that are involved in significant interactions should not normally be removed even if those main effects are not significant, since doing so would place artificial constraints on the underlying polynomial models.

To eliminate effects from the model, select Analysis Options:

Maximum Order Effect - the maximum order effect to be included in the model. Set to 2 by default to request fitting of both main effects and 2-factor interactions. If set to 1, only main effects will be estimated.

Ignore Block Numbers - for designs that contain more than 1 block, indicates whether block effects should be estimated or ignored. Note that column 1 of the datasheet for any experiment file contains block numbers corresponding to each row.

Estimate Sigma From - whether the standard deviation of the experimental error is to be estimated from the experimental error, or whether the analyst will provide a known value. If a known value is provided, the statistical tests and confidence intervals will use that value.

Display - determines what is displayed for each effect on the Analysis Summary (by default the standard error; confidence intervals may be shown instead).

Confounding Pattern - specifies how the procedure determines which effects to estimate when fitting the model. The choices are:

1. From Original Design - examines the confounding pattern of the original design to determine which effects can be estimated. For example, in a resolution IV design, the program will estimate specific combinations of the two-factor interactions. This is the default choice and is appropriate in all but very special circumstances.

2. From Data - examines the X matrix of all runs performed to determine which effects can be estimated. This may be desirable when the analyst has added additional runs to the base design to clear certain interactions that would otherwise be confounded. Since the program attempts to estimate all effects when this option is chosen, the Exclude dialog box may have to be used to tell the program exactly which effects are to be estimated.

Exclude - when pressed, generates the dialog box shown below:

Effects can be excluded from the model by double-clicking on them one at a time. Double-clicking on an effect in either of the two columns moves it to the other column. In the current example, all effects other than the 5 that appeared to be statistically significant would be removed. The standardized Pareto chart for the new model shows the remaining effects:

[Figure: Standardized Pareto Chart for reacted (reduced model). Bars from largest to smallest: B:catalyst, D:temperature, BD, DE, E:concentration. Horizontal axis: Standardized effect (0 to 16). Legend: + and -.]

Main Effects Plot

Once a suitable model has been fit and checked, the results must be displayed in a manner that is understandable by all involved. Since it is often difficult to gain insights by looking at a mathematical equation, various plots are provided for displaying the fitted model. The Main Effects Plot is almost always important:

[Figure: Main Effects Plot for reacted. Vertical axis: reacted (54 to 78). Horizontal axis panels: catalyst (1.0 to 2.0), temperature (140.0 to 180.0), concentration (3.0 to 6.0).]

It shows how the predicted response Y varies when each of the factors in the model is changed from its low level to its high level, with all other factors held at the center of the experimental region (halfway between the low level and the high level). When all of the factors are plotted together as in the above plot, it is easy to judge which factors have the greatest impact. When plotted individually, the predicted response at the extremes of a selected factor is shown:

[Figure: Main Effects Plot for reacted, catalyst only. Predicted response of 54.9605 at catalyst = 1.0 and 75.4605 at catalyst = 2.0. Vertical axis: reacted (54 to 78).]

Note: In some cases, the values shown at the endpoints of the line in the above plot will be equal to the average response at the low and high level of the plotted factor. That is not the case in general, however. It is important to note that STATGRAPHICS plots the predicted response from the current model, not the observed data. This allows the plot to be used with many types of designs other than a two-level factorial.

Pane Options

Factors: factors to be included in the plot.

Interaction Plot

When significant interactions exist amongst the experimental factors, the main effects plots do not tell the whole story about the factors that interact and can even be misleading. In such cases, an Interaction Plot should be produced for each pair of factors. If more than one interaction is plotted, the display will take the following form:

[Figure: Interaction Plot for reacted, showing the BD and DE interactions. Each interaction has one line labeled - and one labeled +. Vertical axis: reacted (54 to 94).]

A pair of lines will be plotted for each interaction, corresponding to the predicted response when one factor is varied from its low value to its high value, at each level of the other factor. All factors not involved in the interaction are held at their central value. The plot is usually easier to understand if Pane Options is used to plot each interaction separately:

[Figure: Interaction Plot for reacted. Horizontal axis: catalyst (1.0 to 2.0). Vertical axis: reacted (54 to 94). One line for temperature=140.0 and one for temperature=180.0.]

The predicted response for each combination of the low and high levels of two factors is plotted at the end of each line segment. If two factors do not interact, the effect of one factor will not depend upon the level of the other and the two lines in the interaction plot will be approximately parallel. If the factors interact, as in the above figure, the lines will not be parallel and may even cross.

Interpretation of interaction plots is usually highly informative. For example, the plot above shows that temperature has little effect on the response at a low level of catalyst. However, it has a large effect at the high level of catalyst.

Pane Options

Factors: two or more factors to include on the plot. All interactions for which both factors have been checked will be included.

Reverse Factors: If checked, the first factor rather than the second will be used to define the lines on the plot.

Regression Coefficients

The underlying regression model may be displayed by selecting the Regression Coefficients pane:

Regression coeffs. for reacted - Chemical reactor example
Coefficient       Estimate
constant          9.83553
B:catalyst        -65.5
D:temperature     0.2125
E:concentration   23.25
BD                0.5375
DE                -0.158333

Interpretation
This pane displays the regression equation which has been fitted to the data. The equation of the fitted model is

reacted = 9.83553 - 65.5*catalyst + 0.2125*temperature + 23.25*concentration + 0.5375*catalyst*temperature - 0.158333*temperature*concentration

where the values of the variables are specified in their original units. To have STATGRAPHICS evaluate this function, select Predictions from the list of Tabular Options. To plot the function, select Response Plots from the list of Graphical Options.

The StatAdvisor displays the equation, which corresponds to the regression model described earlier. This is the equation that is used to predict the response at specified values of the experimental factors.

Note: In the regression equation, all factors that were defined as continuous when the experiment was initially created are expressed in their original units (e.g., temperature is expressed in degrees C). Factors that were not defined as continuous use the coding of -1 for the low level and +1 for the high level.
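The link between the estimated effects and the coefficients in original units can be checked numerically. The sketch below is an illustration, not STATGRAPHICS code. It assumes the factor ranges shown on the example plots (catalyst 1 to 2, temperature 140 to 180, concentration 3 to 6), uses half of each estimated effect as the coefficient of the corresponding -1/+1 coded term, and compares the result with the equation displayed in the pane. The function names are hypothetical.

```python
# Assumed factor ranges (low, high), read from the axes of the example plots.
RANGES = {
    "catalyst": (1.0, 2.0),
    "temperature": (140.0, 180.0),
    "concentration": (3.0, 6.0),
}

def coded(value, low, high):
    """Rescale a factor so that low maps to -1 and high maps to +1."""
    center = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return (value - center) / half_range

def predict_from_effects(catalyst, temperature, concentration):
    """Prediction from the average and the five retained effects."""
    b = coded(catalyst, *RANGES["catalyst"])
    d = coded(temperature, *RANGES["temperature"])
    e = coded(concentration, *RANGES["concentration"])
    # Each coded coefficient is half of the corresponding estimated effect.
    return (65.2105
            + (20.5 / 2) * b + (12.25 / 2) * d + (-6.25 / 2) * e
            + (10.75 / 2) * b * d + (-9.5 / 2) * d * e)

def predict_original_units(catalyst, temperature, concentration):
    """Prediction from the regression equation shown in the pane."""
    return (9.83553 - 65.5 * catalyst + 0.2125 * temperature
            + 23.25 * concentration
            + 0.5375 * catalyst * temperature
            - 0.158333 * temperature * concentration)

# Compare the two forms at the center and at an interior point.
for point in [(1.5, 160.0, 4.5), (1.2, 165.0, 3.5)]:
    print(point, round(predict_from_effects(*point), 4),
          round(predict_original_units(*point), 4))
```

The two forms agree to within the rounding of the displayed coefficients, which suggests that the displayed equation is simply the effects model re-expressed in the original units of the factors.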
Correlation Matrix

The correlation matrix displays the estimated correlation between the coefficients in the fitted regression model:

Correlation Matrix for Estimated Effects
                      (1)     (2)     (3)     (4)     (5)     (6)
(1) average           1.0000  0.0000  0.0000  0.0000  0.0000  0.0000
(2) B:catalyst        0.0000  1.0000  0.0000  0.0000  0.0000  0.0000
(3) D:temperature     0.0000  0.0000  1.0000  0.0000  0.0000  0.0000
(4) E:concentration   0.0000  0.0000  0.0000  1.0000  0.0000  0.0000
(5) BD                0.0000  0.0000  0.0000  0.0000  1.0000  0.0000
(6) DE                0.0000  0.0000  0.0000  0.0000  0.0000  1.0000

The correlations are estimated from the variance-covariance matrix of the coefficients, given by

$$s^2(b) = MSE \, (X'X)^{-1} \tag{18}$$

A diagonal matrix such as that shown above indicates that the estimates of each of the effects are uncorrelated with the other estimates, which stems from the orthogonality of the original design. If any data values were missing (indicated by leaving the corresponding cells of the response column in the spreadsheet empty), or additional runs have been added by the user, there may well be non-zero values for the off-diagonal terms. Large correlations are likely to lead to poorly defined effects that are difficult to interpret.

Response Plots

The list of graphs available in the Analyze Data procedure contains two selections labeled Response Plots that allow the predicted values of the response to be plotted in various ways. By default, the first selection creates a response surface plot and the second a contour plot. However, each is controlled by the same Pane Options dialog box, which allows the analyst to display any of six types of plots:

1. A surface plot
2. A contour plot
3. A square plot
4. A cube plot
5. A 3-D contour plot
6. A 3-D mesh plot

Surface plot

The surface plot displays a plot of the predicted response as a function of any two of the experimental factors, with the other factors held at selected values. For example, the plot below shows reacted as a function of catalyst and temperature:

[Figure: Estimated Response Surface, feed rate=12.5, agitation=110.0, concentration=4.5. Axes: catalyst (1 to 2), temperature (140 to 180), reacted (54 to 94).]

The height of the surface represents the predicted value Y, which is plotted over the range of the experimental factors.

Contour plot

Contour plots draw lines or colored regions based on values of the predicted response. For example, the plot below displays the range of the predicted values for reacted using colors extending from blue at 50 to red at 85:

[Figure: Contours of Estimated Response Surface, feed rate=12.5, agitation=110.0, concentration=4.5. Axes: catalyst (1 to 2) and temperature (140 to 180). Contour levels for reacted: 50.0 to 85.0 in steps of 5.0.]

The color ramp used is controlled by the Palette tab on the Graphics Options dialog box.

Square plot

The Square Plot shows the predicted response at combinations of the low and high levels for any 2 factors:

Square Plot for reacted
feed rate=12.5, agitation=110.0, concentration=4.5
                     catalyst=1.0  catalyst=2.0
temperature=180.0    55.7105       86.9605
temperature=140.0    54.2105       63.9605

Cube plot

The Cube Plot shows the predicted response at combinations of the low and high levels for any 3 factors:

Cube Plot for reacted
feed rate=12.5, agitation=110.0
catalyst  temperature  concentration  reacted
1.0       140.0        3.0            52.5855
2.0       140.0        3.0            62.3355
1.0       180.0        3.0            63.5855
2.0       180.0        3.0            94.8355
1.0       140.0        6.0            55.8355
2.0       140.0        6.0            65.5855
1.0       180.0        6.0            47.8355
2.0       180.0        6.0            79.0855

In the example, the highest predicted value for reacted is obtained at catalyst = 2, temperature = 180, and concentration = 3. The strong interactions amongst the factors result in a predicted value of nearly 95 at that combination of the factors.
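The corner values of the cube can be reproduced from the fitted regression equation. The following sketch is illustrative only; it evaluates the equation reported in the Regression Coefficients pane at the eight combinations of low and high levels and picks the largest prediction. The function and variable names are hypothetical.

```python
from itertools import product

def predicted_reacted(catalyst, temperature, concentration):
    """Fitted equation from the Regression Coefficients pane (original units)."""
    return (9.83553 - 65.5 * catalyst + 0.2125 * temperature
            + 23.25 * concentration
            + 0.5375 * catalyst * temperature
            - 0.158333 * temperature * concentration)

# Low and high levels of the three factors shown on the cube plot.
catalyst_levels = (1.0, 2.0)
temperature_levels = (140.0, 180.0)
concentration_levels = (3.0, 6.0)

# One prediction per corner of the cube.
corners = {
    corner: predicted_reacted(*corner)
    for corner in product(catalyst_levels, temperature_levels, concentration_levels)
}

# Corner with the highest predicted response.
best = max(corners, key=corners.get)

for corner, value in sorted(corners.items(), key=lambda kv: kv[1]):
    print(corner, round(value, 3))
print("best corner:", best)
```

Because the displayed coefficients are rounded, the computed corner values match the plotted ones only to about three decimal places.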

3-D contour plot

The 3-D Contour Plot draws contours with respect to 3 variables at a time:

[Figure: Contours of Estimated Response Surface, feed rate=12.5, agitation=110.0, concentration=4.5. Axes: catalyst (1 to 2), temperature (140 to 180), concentration (3 to 6). Contour levels for reacted: 45.0 to 85.0 in steps of 5.0.]

The top face of the block shows contours with respect to catalyst and temperature, with concentration held at its central value. The front face of the block shows contours with respect to catalyst and concentration, with temperature held at its low value. The right face of the block shows contours with respect to temperature and concentration, with catalyst held at its high value.

The Explore button on the analysis toolbar may be used to manipulate the surface. It displays the floating dialog box shown below:

The sliders may be used to change the values of the experimental factors. Note that:

1. Changing the value of a factor such as feed rate that has not been selected changes the level for that factor in the model that is evaluated.

2. Changing the factors that define the X and Y axes moves the square located on the top face of the surface. The predicted value of the model at the location of that square is displayed on the dialog box below the Extrapolate checkbox.

3. Changing the factor that corresponds to the Z axis moves the top face of the surface up or down.

4. Pushing the Ascend or Descend button will cause the square to follow the path of steepest ascent or descent until it reaches the location of the optimal value of the predicted response within the experimental region. If Extrapolate is checked, the square will continue moving outside of the experimental region.

For the sample data, pushing the Ascend button locates the point where the predicted response is maximized:

3-D mesh plot

The 3-D Mesh Plot shows the predicted response at all points in a grid throughout the experimental region (reduce the resolution using Pane Options for best results):

[Figure: Estimated Response Surface Mesh, feed rate=12.5, agitation=110.0. Axes: catalyst (1 to 2), temperature (140 to 180), concentration (3 to 6). Color levels for reacted: 45.0 to 85.0 in steps of 5.0.]

Pane Options

Type: type of response plot to create.

From: location at which the first contour line is drawn, or the start of the first region.

To: location at which the last contour line is drawn, or the end of the last region.

By: spacing between contour lines or regions.

Lines: if selected, a sequence of contour lines is drawn at selected levels of the predicted response, as on a topographical map.

Painted Regions: if selected, a set of regions is drawn covering various ranges of the predicted response.

Continuous: draws contours using a continuous range of colors.

Continuous with grid: draws contours using a continuous range of colors and adds rectangular grid lines.

Resolution: defines the resolution m of an m-by-m grid of predicted values which is used to draw the surface and contour lines. Increasing the resolution may improve the smoothness and definition of the plots, at the expense of computer time and memory.

Horizontal Divisions: the number of divisions along the first experimental axis. This determines how many lines will be drawn on the surface plot parallel to the Y-axis.

Vertical Divisions: the number of divisions along the second experimental axis. This determines how many lines will be drawn on the surface plot parallel to the X-axis.

Contours Below: requests that a contour plot, of type specified below, be drawn in the bottom face of the 3-D plot.

Show Points: requests that the observed data values $Y_i$ be added to the plot, with vertical lines drawn from each point to the surface.

Wire Frame: requests that the surface be drawn using cross-hatched lines.

Solid: requests that the surface be drawn using a solid color.

Contoured: requests that the surface be drawn showing contour levels of the response.

Factors: specifies the factors to be plotted on each axis and the levels at which the other factors will be held:

If creating a surface, contour, or square plot, two factors must be checked. If creating a cube plot, three factors must be checked. The current example plots predicted values versus catalyst and temperature, when feed rate = 12.5, agitation = 110, and concentration = 4.5.
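The roles of the Resolution, From, To, and By settings can be illustrated with a small calculation. The sketch below is an assumption about how such a grid might be built, not a description of the program's internals: it evaluates the fitted equation on an m-by-m grid of catalyst and temperature values with concentration held at 4.5, and assigns each grid value to a contour region. The helper names are hypothetical.

```python
def predicted_reacted(catalyst, temperature, concentration=4.5):
    """Fitted equation from the Regression Coefficients pane (original units)."""
    return (9.83553 - 65.5 * catalyst + 0.2125 * temperature
            + 23.25 * concentration
            + 0.5375 * catalyst * temperature
            - 0.158333 * temperature * concentration)

def axis_values(low, high, m):
    """m equally spaced values from low to high inclusive."""
    return [low + i * (high - low) / (m - 1) for i in range(m)]

def contour_levels(start, stop, step):
    """Levels corresponding to the From, To, and By settings."""
    count = int(round((stop - start) / step)) + 1
    return [start + k * step for k in range(count)]

M = 11  # hypothetical resolution of the m-by-m grid
catalyst_axis = axis_values(1.0, 2.0, M)
temperature_axis = axis_values(140.0, 180.0, M)

# grid[i][j] is the prediction at temperature_axis[i], catalyst_axis[j].
grid = [[predicted_reacted(c, t) for c in catalyst_axis]
        for t in temperature_axis]

levels = contour_levels(50.0, 85.0, 5.0)

def region_index(value):
    """Number of contour levels at or below the value (0 means below From)."""
    return sum(1 for level in levels if level <= value)

regions = [[region_index(v) for v in row] for row in grid]
for row in reversed(regions):  # print the highest temperature first
    print(" ".join(str(r) for r in row))
```

A larger M gives smoother contours at the cost of more evaluations of the model, which is the trade-off described for the Resolution setting.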

Predictions

The Predictions pane may be used to generate predictions from the fitted model:

Estimation Results for reacted
     Observed  Fitted              Studentized  Lower 95.0% CL  Upper 95.0% CL
Row  Value     Value    Residual   Residual     for Mean        for Mean
1    65.0      65.2105  -0.210526  -0.0846423   63.8485         66.5726
2    56.0      55.8355  0.164474   0.0807762    52.248          59.4231
3    53.0      52.5855  0.414474   0.203853     48.998          56.1731
4    63.0      62.3355  0.664474   0.327704     58.748          65.9231
5    65.0      65.5855  -0.585526  -0.28848     61.998          69.1731
6    53.0      52.5855  0.414474   0.203853     48.998          56.1731
7    55.0      55.8355  -0.835526  -0.413139    52.248          59.4231
8    67.0      65.5855  1.41447    0.708878     61.998          69.1731
9    61.0      62.3355  -1.33553   -0.667797    58.748          65.9231
10   67.0      65.2105  1.78947    0.735268     63.8485         66.5726
11   69.0      63.5855  5.41447    4.1464       59.998          67.1731
12   45.0      47.8355  -2.83553   -1.52039     44.248          51.4231
13   78.0      79.0855  -1.08553   -0.539401    75.498          82.6731
14   93.0      94.8355  -1.83553   -0.933357    91.248          98.4231
15   49.0      47.8355  1.16447    0.57969      44.248          51.4231
16   60.0      63.5855  -3.58553   -2.04408     59.998          67.1731
17   95.0      94.8355  0.164474   0.0807762    91.248          98.4231
18   82.0      79.0855  2.91447    1.57129      75.498          82.6731
19   63.0      65.2105  -2.21053   -0.919228    63.8485         66.5726
20             62.6605                          60.6918         64.6293

Average of 3 centerpoints = 65.0
Average of model predictions at center = 65.2105

The table may include all rows in the datasheet, or only rows for which the value of the response variable Y has not been entered. The latter feature allows the analyst to make predictions at combinations of X that were not included in the experiment. For example, the above table shows the result of adding a 20th row with feed rate = 12.5, catalyst = 1.2, agitation = 105, temperature = 165, and concentration = 3.5. The predicted value of reacted is 62.66. The 95% confidence interval for the mean value of reacted at that same combination of the factors ranges from 60.7 to 64.6.

The table also displays the average of the experimental runs performed at the center of the experimental region, together with the predicted response.
If the assumed model is correct, the two values should be close. If not, there may be unmodeled curvature with respect to one or more of the experimental factors. Determining the nature of that curvature would require performing additional runs at different levels of the factors. To add additional runs to a screening experiment, you can use the Augment Design selection on the DOE menu.

One other noticeable entry in the above table is the Studentized residual for row #11. The Studentized residual measures the difference between the observed response and the predicted response, in units of its standard error, when the observation in question is not used to fit the model. The Studentized residual for observation #11 equals 4.1. Values in excess of 3.0 are unusual and would typically require further scrutiny. If the point in question gave a desirable result (which it does not), a rerun of that set of experimental conditions might be necessary.
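The leave-one-out idea behind the Studentized residual can be illustrated with a short sketch. The Python code below is only an illustration under stated assumptions: the data are made up (one coded factor with three centerpoints, not the chemical reaction experiment), the helper names `solve`, `fit`, `quad_form`, and `studentized_deleted` are hypothetical, and it is not a description of how STATGRAPHICS performs the calculation internally. It refits the model once per observation and divides each deleted residual by its estimated standard error.

```python
# Sketch: Studentized deleted residuals by explicit leave-one-out refits.
# Data and helper names are illustrative assumptions, not the manual's data.

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(X, y):
    """Least squares fit; returns coefficients, MSE, and the X'X matrix."""
    p = len(X[0])
    xtx = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p)]
    beta = solve(xtx, xty)
    sse = sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, yi in zip(X, y))
    return beta, sse / (len(y) - p), xtx

def quad_form(xtx, x):
    """Compute x' (X'X)^-1 x without forming the inverse explicitly."""
    return sum(xi * vi for xi, vi in zip(x, solve(xtx, x)))

def studentized_deleted(X, y):
    """Refit without each observation and standardize its deleted residual."""
    out = []
    for i in range(len(y)):
        X_rest, y_rest = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        beta, mse, xtx = fit(X_rest, y_rest)
        d = y[i] - sum(b * x for b, x in zip(beta, X[i]))  # deleted residual
        s2 = mse * (1.0 + quad_form(xtx, X[i]))            # its variance
        out.append(d / s2 ** 0.5)
    return out

# Made-up example: intercept column plus one factor coded -1, 0, +1.
X = [[1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [1.0, 1.0],
     [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
y = [50.0, 52.0, 60.0, 70.0, 56.0, 55.0, 57.0]

for row, t in enumerate(studentized_deleted(X, y), start=1):
    print(row, round(t, 3))
```

In this made-up data set the fourth run sits well above its replicate, so it is the one that receives the largest Studentized residual. Explicit refitting is used here for clarity; algebraic shortcuts that avoid the refits give the same values.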

Pane Options

Include: items to include in the table:

1. Observed Y - the observed response values $Y_i$.

2. Fitted Y - the predicted values $\hat{Y}_i$ calculated from the fitted model.

3. Residuals - the residuals $e_i$.

4. Studentized Residuals - a type of standardized residual, where each residual is divided by an estimate of its standard error. STATGRAPHICS computes Studentized deleted residuals, in which each observation is removed one at a time and the model refit without that data value. The deleted residual then equals the observed response minus the value predicted from a model fit without that observation, i.e.,

$$ d_i = Y_i - \hat{Y}_{i(i)} \tag{19} $$

The Studentized residual is calculated from

$$ e_i^* = \frac{d_i}{s(d_i)} \tag{20} $$

where

$$ s^2(d_i) = MSE_{(i)} \left[ 1 + X_i' \left( X_{(i)}' X_{(i)} \right)^{-1} X_i \right] \tag{21} $$

and the subscript $(i)$ denotes a quantity computed with observation $i$ omitted.

The Studentized deleted residuals should follow a t distribution with n - p - 1 degrees of freedom, where p is the number of estimated coefficients in the fitted model.

7. Standard Errors for Forecasts - the standard error for new observations at a selected combination of the experimental factors $X_h$, given by

$$ \sqrt{ MSE \left[ 1 + X_h' (X'X)^{-1} X_h \right] } \tag{22} $$

8. Confidence Limits for Individual Forecasts - confidence limits for new observations at a selected combination of the experimental factors $X_h$, given by

$$ \hat{Y}_h \pm t_{\alpha/2,\,n-p} \sqrt{ MSE \left[ 1 + X_h' (X'X)^{-1} X_h \right] } \tag{23} $$

9. Confidence Limits for Forecast Means - confidence limits for the mean response at a selected combination of the experimental factors $X_h$, given by

$$ \hat{Y}_h \pm t_{\alpha/2,\,n-p} \sqrt{ MSE \, X_h' (X'X)^{-1} X_h } \tag{24} $$

Predict - whether forecasts are displayed for all of the runs in the experiment data file, or only for runs that have a missing value in the response column.

Confidence level - the confidence level for the intervals.

Diagnostic Plots

Several plots are also provided under Diagnostic Plots to examine the residuals from the fitted model. The Pane Options dialog box displays the various choices, which include the following:

Observed versus Predicted

This plot displays the observed response $Y_i$ versus the fitted values $\hat{Y}_i$, together with a diagonal line:

[Figure: Plot of reacted (observed versus predicted)]

If the model fits well, the values should lie close to the line, as in the example above. Curvature around the line may suggest the need to transform the values of $Y_i$ using a logarithm or similar function.

Residual versus Predicted

This plot displays the residuals $e_i$ versus the fitted values $\hat{Y}_i$, with a horizontal line at zero:

[Figure: Residual Plot for reacted (residual versus predicted)]

The residuals should vary randomly around the line. Changes in the magnitude of the residuals from left to right may signal that the variance of the experimental error varies with the mean level of the response. Such heteroscedasticity may frequently be eliminated by a variance-stabilizing transformation such as a logarithm or a square root.

Residuals versus Run Order

This plot displays the residuals $e_i$ versus run number $i$, with a horizontal line at zero:

[Figure: Residual Plot for reacted (residual versus run number)]

Any non-random pattern may indicate a time trend or other effect. In such cases, addition of a factor to account for the change may improve the fit of the model. The above plot does suggest an increase in variability during the second half of the experiment, which would be worthy of further investigation.
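The apparent increase in variability can also be checked numerically. The following Python sketch compares the root-mean-square residual for the earlier and later runs, using the residuals listed in the Predictions table. It assumes that the table rows are listed in run order and that splitting after run 9 is a reasonable cut point; both are assumptions made for illustration, and the comparison is descriptive rather than a formal test.

```python
from math import sqrt

# Residuals for rows 1-19, copied from the Predictions table.
# Assumption: the table rows are listed in run order.
residuals = [
    -0.210526, 0.164474, 0.414474, 0.664474, -0.585526,
    0.414474, -0.835526, 1.41447, -1.33553, 1.78947,
    5.41447, -2.83553, -1.08553, -1.83553, 1.16447,
    -3.58553, 0.164474, 2.91447, -2.21053,
]

def rms(values):
    """Root-mean-square of a list of residuals."""
    return sqrt(sum(v * v for v in values) / len(values))

split = 9  # assumed cut point between the "first" and "second" half
first_half, second_half = residuals[:split], residuals[split:]

print("RMS residual, runs 1-9:  ", round(rms(first_half), 3))
print("RMS residual, runs 10-19:", round(rms(second_half), 3))
```

A much larger value for the later runs is consistent with the pattern seen in the plot, although part of the difference comes from the single large residual in row #11.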

Residuals versus Factor

This plot displays the residuals $e_i$ versus the observed values of a selected experimental factor:

[Figure: Residual Plot for reacted (residual versus catalyst)]

Any curvature around the line may suggest the need for a model with quadratic effects. The above plot suggests that the variability amongst the replicated values at the centerpoint may be somewhat less than that of the residuals at the low and high levels of catalyst.

Normal Probability Plot of Residuals

This plot displays the residuals $e_i$ versus quantiles of a normal distribution, with an optional fitted line as reference:

[Figure: Normal Probability Plot for Residuals (percentage versus residuals)]

If the experimental error follows a normal distribution, the points should lie along a straight line. The above plot suggests that the largest residual (row #11) is somewhat higher than expected, since it lies off the line defined by the others. This could indicate the presence of an outlier or significant curvature.
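The coordinates of a normal probability plot can be sketched as follows. The Python code pairs each sorted residual from the Predictions table with a normal quantile. The plotting position (rank - 0.375) / (n + 0.25) is an assumption chosen for illustration; the positions used by STATGRAPHICS may differ, so the numbers are indicative only.

```python
from statistics import NormalDist

# Residuals for rows 1-19, copied from the Predictions table.
residuals = [
    -0.210526, 0.164474, 0.414474, 0.664474, -0.585526,
    0.414474, -0.835526, 1.41447, -1.33553, 1.78947,
    5.41447, -2.83553, -1.08553, -1.83553, 1.16447,
    -3.58553, 0.164474, 2.91447, -2.21053,
]

def normal_plot_points(values):
    """Return (row, residual, normal quantile) tuples sorted by residual."""
    ordered = sorted(enumerate(values, start=1), key=lambda pair: pair[1])
    n = len(ordered)
    points = []
    for rank, (row, e) in enumerate(ordered, start=1):
        p = (rank - 0.375) / (n + 0.25)  # assumed plotting position
        points.append((row, e, NormalDist().inv_cdf(p)))
    return points

points = normal_plot_points(residuals)
for row, e, z in points:
    print(f"row {row:2d}  residual {e:9.5f}  normal quantile {z:7.3f}")
```

If the residuals were normally distributed, they would increase roughly in proportion to the quantiles. The last point printed corresponds to row #11, whose residual is far larger than the spacing of the other points would suggest.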

Power Curve

The power curve shows the ability of the statistical tests to detect effects of a given magnitude:

[Figure: Power Curve for B:catalyst (probability of detection versus true effect)]

The vertical axis shows the probability that an effect will generate a statistically significant result when the data are analyzed. The horizontal axis displays the assumed value of the effect, in units of the response. The above plot shows that the current experiment has an excellent chance of detecting any effects for which the change in reacted is 4 or more.

Pane Options

Plot: the type of plot to be created.

Plot versus: selects the experimental factor to be shown in the plot, for those plots where a factor is needed.

Direction: defines the orientation of the normal probability plot.

Fitted Line: specifies whether a line should be fit to the data on the normal probability plot.

Alpha: specifies the α-risk associated with the Power Curve.

Optimization

Step #9: Optimize responses

Once a statistical model has been developed for each response, the analyst may determine what combination of factors will yield the best results. Pressing the button labeled Step #9 on the Experimental Design Wizard toolbar first displays the dialog box shown below:

Since optimization requires searching for the best conditions throughout the experimental region, it is a good idea to begin that search at many different points in order to avoid finding only a local optimum. When the optimization is complete, a message similar to that shown below will be displayed:

The dialog box indicates the Desirability of the final result, based on a metric designed to balance competing requirements of multiple responses (see the document titled DOE Wizard for full details). The value displayed in this case indicates that the predicted reactivity at the optimum factor settings is 74.18% of the distance between 80 and 100, which was the desired range specified when the design was created. If you press OK, additional information will be added to the main DOE Wizard window:

Step 9: Optimize the responses

Response Values at Optimum

Response  Prediction  Lower 95.0% Limit  Upper 95.0% Limit  Desirability
reacted   94.8355     91.6295            98.0415            0.741776

Factor Settings at Optimum

Factor         Setting
feed rate      10.0
catalyst       2.0
agitation      120.0
temperature    180.0
concentration  3.0

The table shows the estimated response at the optimal settings of the experimental factors. For the chemical reaction data, it is estimated that the mean percent reacted will equal 94.84% when the factors are set at feed rate = 10, catalyst = 2, agitation = 120, temperature = 180, and concentration = 3. The 95% confidence interval for the mean ranges between 91.63% and 98.04%.

NOTE: Since feed rate and agitation have been completely eliminated from the statistical model, the solution displayed is only one of many. Any setting of feed rate and agitation would give the same predicted response.

If you push the Tables and Graphs button on the analysis toolbar, you can display the estimated desirability throughout the experimental region by selecting Desirability Plot. An interesting type of display is the 3-D Contour plot shown below (use Pane Options and the Factors button to select the factors to plot on each axis):

[Figure: Desirability Plot, feed rate=10.0,agitation=120.0,concentration=6.0 (axes: catalyst, temperature, concentration; color scale: Desirability from 0.0 to 1.0)]

It is clear that the best place to operate is in the lower right back corner.
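The desirability reported in the table can be reproduced with simple arithmetic. The sketch below assumes a linear desirability function for a response that is to be maximized between a low value of 80 and a high value of 100, which matches the statement that the prediction lies 74.18% of the way from 80 to 100. The function name `desirability_maximize` and the `weight` parameter are hypothetical; the full definition of the metric is given in the document titled DOE Wizard.

```python
def desirability_maximize(y_hat, low, high, weight=1.0):
    """Desirability of a predicted response that is to be maximized.

    Returns 0 at or below `low`, 1 at or above `high`, and a value in
    between otherwise. With weight = 1 (an assumption) the scale is linear.
    """
    if y_hat <= low:
        return 0.0
    if y_hat >= high:
        return 1.0
    return ((y_hat - low) / (high - low)) ** weight

# Predicted percent reacted at the optimum, from the table above.
d = desirability_maximize(94.8355, low=80.0, high=100.0)
print(round(d, 4))  # 0.7418
```

The result agrees with the tabled value of 0.741776 to within the rounding of the printed prediction.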

Step 10: Save results

The button labeled Step 10 allows you to save the results in a StatFolio:

Actually, the StatFolio can be saved at any point and reloaded at a later date.

IMPORTANT: When using the Experimental Design Wizard, two files are created:

1. An experiment file with the extension .sgx, which stores information about the experimental data.

2. A StatFolio with the extension .sgp, which stores the results of the analysis.

If you move the experiment to another computer, be sure to transfer both files.

Step 11: Augment Design

Since the conclusions from the design are fairly clear, there is no need to augment the design.

Extrapolation

Step 12: Extrapolate

The maximum predicted reactivity within the design space is 94.84%. To use the statistical model to predict settings of the factors outside the experimental region that might produce even better results, press the button labeled Step 12. The following dialog box will be displayed:

Start at: the position from which to start the search.

Change: the factors you wish to consider changing. Since feed rate and agitation have been completely eliminated from the model, they have been unchecked.

Display steps of: The program will begin at the starting location and follow the path of steepest ascent in an attempt to increase the desirability of the predicted response. Specify the increment of increased desirability at which the results should be displayed.
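The general idea of following a path of steepest ascent can be sketched in a few lines of Python. The example below is a simplified illustration only: the model coefficients are hypothetical (a two-factor model with an interaction, in coded units), the search maximizes the predicted response directly rather than the desirability, and the path is reported at fixed step lengths rather than at increments of desirability as in the dialog box above.

```python
from math import hypot

# Hypothetical coefficients of a fitted model in coded units:
# y = B0 + B1*x1 + B2*x2 + B12*x1*x2
B0, B1, B2, B12 = 65.0, 5.0, 8.0, 2.0

def predict(x1, x2):
    """Predicted response at the coded factor settings (x1, x2)."""
    return B0 + B1 * x1 + B2 * x2 + B12 * x1 * x2

def gradient(x1, x2):
    """Partial derivatives of the predicted response."""
    return (B1 + B12 * x2, B2 + B12 * x1)

def steepest_ascent(start, step=0.25, n_steps=4):
    """Move repeatedly a fixed distance in the direction of the gradient."""
    path = [start]
    x1, x2 = start
    for _ in range(n_steps):
        g1, g2 = gradient(x1, x2)
        norm = hypot(g1, g2)
        if norm == 0.0:  # stationary point: no direction of ascent
            break
        x1 += step * g1 / norm
        x2 += step * g2 / norm
        path.append((x1, x2))
    return path

# Start at the corner of the coded region and step outside it.
path = steepest_ascent((1.0, 1.0))
for x1, x2 in path:
    print(round(x1, 3), round(x2, 3), round(predict(x1, x2), 2))
```

Because the path leaves the region in which data were collected, predictions along it are extrapolations and would normally be confirmed with additional runs.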