OPTIMIZATION OF FIRST ORDER MODELS


Chapter 2

OPTIMIZATION OF FIRST ORDER MODELS

"One should not multiply explanations and causes unless it is strictly necessary."
William of Baskerville in Umberto Eco's The Name of the Rose¹

In Response Surface Methods, the optimal region in which to run a process is usually determined after a sequence of experiments is conducted and a series of empirical models is obtained. As mentioned in Chapter 1, in a new or poorly understood process it is likely that a first order model will fit well. The Box-Wilson methodology suggests the use of a steepest ascent technique, coupled with lack of fit and curvature tests, to move the process from a region of little curvature to one where curvature, and hence a stationary point, exists. In this chapter we discuss, at an elementary level, steepest ascent/descent methods for optimizing a process described by a first order model. We will assume readers are familiar with the linear regression material reviewed in Appendix A. More advanced techniques for exploring a new region, which incorporate statistical inference issues into the optimization methods, are discussed in Chapter 6.

¹ Paraphrasing a non-fictional Franciscan monk of the epoch, William of Ockham (1285-1347): "Essentia non sunt multiplicanda praeter necessitatem" (Entities should not be multiplied beyond necessity), a statement that came to be known as Ockham's razor.

PROCESS OPTIMIZATION: A STATISTICAL APPROACH

2.1 New Region Exploration

A first order model will serve as a good local approximation in a small region close to the initial operating conditions and far from where the process exhibits curvature. A fitted first order polynomial model in k factors has the form

ŷ = b₀ + Σ_{j=1}^{k} b_j x_j.

Experimental designs used for fitting this type of model are discussed in more detail in Chapter 3. Usually, a two-level factorial experiment is conducted, with repeated runs at the current operating conditions, which serve as the origin of coordinates in coded factors.

The idea behind this region exploration phase of RSM is to keep experimenting along the direction of steepest ascent (or descent, as required) until there is no further improvement in the response. At that point, a new fractional factorial experiment with center runs is conducted to determine a new search direction. This process is repeated until significant curvature in ŷ or lack of fit of the first order model is detected. In the new region, the operating conditions x₁, x₂, ..., x_k are close to where a stationary point (and, hopefully, a local maximum or minimum, as required) of y occurs. When significant curvature or lack of linear fit is detected, the experimenter should proceed to fit and optimize a higher order model, in what Box and Wilson called canonical analysis. Figure 2.1 illustrates a sequence of line searches when seeking a region where curvature exists in a problem with 2 factors (i.e., k = 2). In practical problems, more than 3 iterations of steepest ascent/descent are rare.

There are two main decisions we must make in the region exploration phase of RSM:

1 determine the search direction;
2 determine the length of the step to move from the current operating conditions.

These two decisions are linked, because a search in a direction where there is strong evidence the process will improve may proceed with larger steps than a search in a less reliable direction. In Chapter 6 we present some approaches for dealing with these two aspects jointly.
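The iterative idea of this phase (step along a search direction while the response improves, then stop and re-estimate the direction) can be sketched in a few lines. The response function, direction, and step size below are all invented purely for illustration; in practice each evaluation is a physical experiment subject to noise, which is why the formal stopping rules of Chapter 6 are needed:

```python
import numpy as np

# Hypothetical true response surface with a maximum at (1.0, 2.0);
# invented only to illustrate the line-search idea of Section 2.1.
def response(x):
    return 50.0 - (x[0] - 1.0) ** 2 - (x[1] - 2.0) ** 2

direction = np.array([0.6, 0.8])  # assumed unit-length search direction
x = np.array([0.0, 0.0])          # current operating conditions (origin)
best = response(x)
path = [x.copy()]

while True:
    trial = x + 0.5 * direction   # fixed step of rho = 0.5 along the direction
    if response(trial) <= best:   # stop at the first non-improving trial
        break
    x, best = trial, response(trial)
    path.append(x.copy())

print(len(path) - 1, np.round(x, 2))  # 4 improving steps, ending near (1.2, 1.6)
```

With noisy responses a single observed drop may be spurious, which is why the example later in this chapter stops only after two consecutive drops.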

[Figure 2.1. A sequence of line searches for a 2 factor optimization problem: an initial 2-level factorial with center runs yields the first steepest ascent direction; a second 2-level factorial with center runs yields a second steepest ascent direction; at the third factorial experiment, LOF or curvature is detected, so the DOE is augmented and a 2nd order model is fit.]

2.2 Steepest Ascent/Descent Procedure

Suppose a first order model has been fitted and the model does not exhibit lack of fit due to second order terms. Then the direction of maximum improvement is given by:

1 ∇ŷ, if the objective is to maximize y (this gives the steepest ascent direction);
2 −∇ŷ, if the objective is to minimize y (this gives the steepest descent direction).

The direction of the gradient is given by the values of the parameter estimates (excluding the intercept), that is, ∇ŷ = b = (b₁, b₂, ..., b_k)′. Since the parameter estimates depend on the scaling convention for the factors, the steepest ascent (descent) direction is also scale-dependent. That is, two experimenters using different scaling conventions will follow different paths for process improvement. We note, however, that the region of the search, as given by the signs of the parameter estimates, does not change with scale.²

² For a recent proposal on a scale-independent direction-finding method, see Section 6.3.

Let us consider, without loss of generality, a maximization problem. The coordinates of the factor settings on the direction of steepest ascent, separated a distance ρ from the origin, are obtained by solving:

max b₁x₁ + b₂x₂ + ··· + b_k x_k = b′x
subject to: Σ_{i=1}^{k} x_i² ≤ ρ², or x′x ≤ ρ².

Unconstrained optimization of the first order model would evidently give an unbounded solution, hence the constraint is added. The value of ρ is the step size, which is user-defined. A value of ρ = 1 will give a point close to the edge of the experimental region where the model was fit (in coded factors). To solve the maximization problem, form the Lagrangian (see Appendix C for general results on optimality conditions):

maximize L = b′x − λ(x′x − ρ²)

where λ is a Lagrange multiplier. Compute the partial derivatives and equate them to zero:

∂L/∂x = b − 2λx = 0
∂L/∂λ = −(x′x − ρ²) = 0.

These two equations in two unknowns (the vector x and the scalar λ) can be solved to yield the desired solution:

x* = ρ b/‖b‖

or, in non-vector notation:

x_i* = ρ b_i / sqrt(Σ_{j=1}^{k} b_j²),   i = 1, 2, ..., k.   (2.1)

From this equation, we can see that any multiple ρ of the direction of the gradient (given by the unit vector b/‖b‖) will lead to points on the steepest ascent direction. For steepest descent, we use −b_i instead in the numerator of the equation above.

An equivalent and simple approach to finding the points on the steepest ascent path is as follows [110]:

1 Choose a step size in one of the controllable factors, say Δx_j. This may be the factor we are most comfortable varying, or simply the one with the largest |b_j|.
2 Take a proportional step in all other factors, i.e.,

Δx_i / b_i = Δx_j / b_j,   i = 1, 2, ..., k; i ≠ j,

from which

Δx_i = (b_i / b_j) Δx_j,   i = 1, 2, ..., k; i ≠ j.   (2.2)

These two approaches are equivalent, since the second approach results in a point on the path of steepest ascent located at a distance ρ = sqrt(Σ_{i=1}^{k} Δx_i²) (coded units) from the origin.

Starting at some given operating conditions (the origin in coded units), the recommended practice is to run a 2-level factorial experiment with replicated center points, which allow testing for lack of fit and for curvature (see Appendix A). If there is evidence of lack of fit (LOF) or of curvature, we should add points to the 2-level factorial so that a second order model is estimable. With the second order model, we can then estimate the location of a local optimum more precisely, as explained later in Chapter 4. We now illustrate the steepest ascent procedure.

Example. Optimization of a Chemical Process. It has been concluded (perhaps after a factor screening experiment) that the yield (y, in %) of a chemical process is mainly affected by the temperature (ξ₁, in °C) and by the reaction time (ξ₂, in minutes). Due to safety reasons, the region of operation is limited to

50 ≤ ξ₁ ≤ 250
150 ≤ ξ₂ ≤ 500.

The process is currently run at a temperature of 200 °C and a reaction time of 200 minutes. A process engineer decides to run a 2² full factorial experiment with factor levels at

factor   low   center   high
X₁       170   200      230
X₂       150   200      250

Five repeated runs at the center were conducted to assess lack of fit (see Appendix A for details on lack of fit tests). The orthogonally coded factors are

x₁ = (ξ₁ − 200)/30   and   x₂ = (ξ₂ − 200)/50.

The experimental results are shown in Table 2.1. The corresponding ANOVA table for a first order polynomial model is given in Table 2.2. Neither the single-degree-of-freedom test of curvature nor the lack of fit test indicates a problem with the model. Furthermore, there is evidence that the first order model is significant.

Table 2.1. First 2-level experimental design, chemical experiment example

x₁   x₂   ξ₁    ξ₂    Y (= yield)
−1   −1   170   150   32.79
 1   −1   230   150   24.07
−1    1   170   250   48.94
 1    1   230   250   52.49
 0    0   200   200   38.89
 0    0   200   200   48.29
 0    0   200   200   29.68
 0    0   200   200   46.50
 0    0   200   200   44.15

Table 2.2. ANOVA table for first DOE, chemical example

Source            SS         d.o.f.   MS         F₀       P-value
Model             503.3035   2        251.6517   4.810    0.0684
Curvature         8.1536     1        8.1536     0.1558   0.7093
Residual (Error)  261.5935   5        52.3187
Lack of Fit       37.6382    1        37.6382    0.6722   0.4583
Pure Error        223.9553   4        55.9888
Total             773.0506   8

The fitted equation in coded factors, using ordinary least squares (OLS), is

Ŷ = 39.57 − 1.2925x₁ + 11.14x₂.

Diagnostic checks (see Appendix A) show conformance to the regression assumptions, although the R² value (0.6580) is not very high. To maximize ŷ, we use the direction of steepest ascent. Suppose we select ρ = 1, since a point on the steepest ascent direction at distance one unit (in coded units) from the origin is desired. Then from equation (2.1), the coordinates of the factor levels for the next run are given by:

x₁ = ρ b₁ / sqrt(Σ_{j=1}^{2} b_j²) = (1)(−1.2925) / sqrt((−1.2925)² + (11.14)²) = −0.1152

x₂ = ρ b₂ / sqrt(Σ_{j=1}^{2} b_j²) = (1)(11.14) / sqrt((−1.2925)² + (11.14)²) = 0.9933.

This means that to improve the process, for every (−0.1152)(30) = −3.456 °C that temperature is varied (i.e., decreased), the reaction time should be varied by (0.9933)(50) = 49.66 minutes. Alternatively, we could have used the procedure that leads to expression (2.2):

1 Suppose we select Δξ₂ = 50 minutes. This can be based on process engineering considerations. It may have been felt that Δξ₂ = 50 does not move the process too far away from the current region of experimentation. This makes sense since the R² value of 0.6580 for the fitted model is quite low, making the steepest ascent direction not very reliable.
2 Δx₂ = 50/50 = 1.0.
3 Δx₁ = (−1.2925/11.14)(1.0) = −0.1160.
4 Δξ₁ = (−0.1160)(30) = −3.48 °C.

Thus the step size in original units is Δξ = (−3.48 °C, 50 minutes). To conduct experiments along the direction of maximum improvement, we simply continue selecting operating conditions using the same step size as selected before, adding the step Δξ to the last point ξᵢ on the steepest ascent/descent direction.
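As a numerical check, the calculations in this example can be reproduced directly from equations (2.1) and (2.2), using only the two fitted coefficients and the scale factors (30 °C and 50 minutes) given in the text:

```python
import numpy as np

# Coefficients of the fitted first order model (from the text); scale
# factors are 30 degC for temperature and 50 min for reaction time.
b = np.array([-1.2925, 11.14])
rho = 1.0

# Equation (2.1): coordinates at distance rho along the steepest ascent path
x_coded = rho * b / np.sqrt(np.sum(b ** 2))
print(np.round(x_coded, 4))       # close to the book's -0.1152 and 0.9933

# Equivalent rule (2.2), fixing the step in x2 at one coded unit
dx2 = 1.0
dx1 = (b[0] / b[1]) * dx2         # close to -0.1160
print(round(dx1, 4))

# Steps in natural units: degrees C and minutes
print(round(x_coded[0] * 30, 3), round(x_coded[1] * 50, 2))
```

Note that both rules give points on the same path; they differ only in how the step size is parameterized (by ρ or by a chosen Δx_j).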

1 Given current operating conditions ξ₀ = (ξ₁, ξ₂, ..., ξ_k)′ and a step size Δξ = (Δξ₁, Δξ₂, ..., Δξ_k)′, perform experiments at factor levels ξ₀ + Δξ, ξ₀ + 2Δξ, ξ₀ + 3Δξ, ... as long as improvement in the response y (decrease or increase, as desired) is observed.

2 Once a point has been reached where there is no further improvement, a new first order experiment (e.g., a 2-level factorial) should be performed with repeated center runs to assess lack of fit and curvature. If there is no significant evidence of lack of fit, the new first order model will provide a new search direction, and another iteration is performed. Otherwise, if there is evidence either of lack of fit or of (pure quadratic) curvature, the experimental design is augmented and a second order model should be fitted. That is, the experimenter should proceed with the next phase of RSM.

Example. Experimenting along the direction of maximum improvement. Let us apply the steepest ascent procedure to the Chemical Experiment analyzed earlier. Recall that a first order model was significant, and did not show lack of fit or evidence of curvature. We proceed to move the process in the direction of steepest ascent, using the step size computed earlier.

Step 1: Given ξ₀ = (200 °C, 200 minutes) and Δξ = (−3.48 °C, 50 minutes), perform experiments as shown in Table 2.3 (the step size in temperature was rounded to −3.5 °C for practical reasons). Since the goal was to maximize y, the point of maximum observed response is ξ₁ = 189.5 °C, ξ₂ = 350 minutes. Notice that the search was stopped after 2 consecutive drops in response. This was done to reassure us that the mean response was actually decreasing, and to avoid stopping the search too early due to noise. This issue calls for a formal stopping rule for steepest ascent, a topic we discuss in detail in Chapter 6.

Table 2.3. Illustration of a steepest ascent search

            ξ₁      ξ₂    x₁        x₂   y (= yield)
ξ₀          200.0   200    0        0
ξ₀ + Δξ     196.5   250   −0.1160   1    56.2
ξ₀ + 2Δξ    193.0   300   −0.2333   2    71.49
ξ₀ + 3Δξ    189.5   350   −0.3500   3    75.63
ξ₀ + 4Δξ    186.0   400   −0.4666   4    72.31
ξ₀ + 5Δξ    182.5   450   −0.5826   5    72.10

Step 2: A new 2² factorial experiment is performed with ξ = (189.5, 350)′ as the origin. Using the same scaling factors as before, the new coded controllable factors are:

x₁ = (ξ₁ − 189.5)/30   and   x₂ = (ξ₂ − 350)/50.

Five center runs (at ξ₁ = 189.5, ξ₂ = 350) were repeated to assess lack of fit and curvature. The experimental results are shown in Table 2.4. The corresponding ANOVA table for a linear model is shown in Table 2.5. From the table, the linear effects model is significant and there is no evidence of lack of fit. However, there is a significant curvature effect, which implies that we have moved the process operating conditions to a region where there is curvature, so we should proceed to the next phase in RSM, namely, fitting and optimizing a second order model.

Table 2.4. Second 2-level factorial run, chemical experiment example

x₁   x₂   ξ₁      ξ₂    y (= yield)
−1   −1   159.5   300   64.33
 1   −1   219.5   300   51.78
−1    1   159.5   400   77.30
 1    1   219.5   400   45.37
 0    0   189.5   350   62.08
 0    0   189.5   350   79.36
 0    0   189.5   350   75.29
 0    0   189.5   350   73.81
 0    0   189.5   350   69.45

Table 2.5. ANOVA table for the second DOE, chemical example

Source        SS         d.o.f.   MS        F₀      P-value
Model         505.300    2        252.650   4.731   0.0703
Curvature     336.309    1        336.309   6.297   0.0539
Residual      267.036    5        53.407
Lack of fit   93.857     1        93.857    2.168   0.2149
Pure Error    173.179    4        43.295
Total         1108.646   8

The discussants of the Box and Wilson paper [29] pointed out that no clear stopping criterion existed in the methodology, to which the authors agreed. Although the discussants were referring to the final phase of RSM, the same can be said about a steepest ascent search. Using a simple "first drop" stopping rule (i.e., stop if the response drops for the first time) may miss the true maximum along the steepest ascent direction, as the observed drop may be due to noise. Box and Wilson noted this, mentioning that the observed response values could be used to test for differences in the mean response, and warned that the steepest ascent procedure is useful under low noise conditions. Formal tests of hypothesis that consider noise in the ascent procedure were developed by later authors, and we present them in Chapter 6. A Bayesian stopping rule, much in the spirit of the discussion of the Box-Wilson paper, has been developed by Gilmour and Mead [61]. We discuss it in detail in Chapter 12.

Considerable work has been conducted since the Box and Wilson paper [29] on using search directions other than the steepest ascent direction, and on computing a confidence cone around a given direction. If the cone is too wide, this is evidence that one should proceed cautiously along that direction, so the cone provides some guidance about how large the step size along the steepest ascent direction should be, rather than selecting either ρ or Δx_j arbitrarily. Again, we refer to Chapter 6 for details. Directions other than steepest ascent, which is well known in the nonlinear programming literature to have zig-zagging problems, have been proposed as well. We review one such proposal in Chapter 6.
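The single-degree-of-freedom curvature test used above can be reproduced from the data of the second design: with n_F factorial runs and n_C center runs, SS_curvature = n_F n_C (ȳ_F − ȳ_C)² / (n_F + n_C), and the pure-error sum of squares comes from the center replicates. The small discrepancies from Table 2.5 reflect rounding of the printed data:

```python
import numpy as np

# Curvature and pure-error sums of squares for the second design (Table 2.4)
y_fact = np.array([64.33, 51.78, 77.30, 45.37])        # factorial runs
y_ctr = np.array([62.08, 79.36, 75.29, 73.81, 69.45])  # center replicates

nF, nC = len(y_fact), len(y_ctr)
# Single-d.o.f. curvature sum of squares
ss_curv = nF * nC * (y_fact.mean() - y_ctr.mean()) ** 2 / (nF + nC)
# Pure error from the center replicates (nC - 1 = 4 d.o.f.)
ss_pe = np.sum((y_ctr - y_ctr.mean()) ** 2)

print(round(ss_curv, 1), round(ss_pe, 1))  # close to 336.309 and 173.179 in Table 2.5
```

The large gap between the factorial average (about 59.7) and the center average (about 72.0) is what signals pure quadratic curvature here.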
A final comment on the steepest ascent procedure as proposed by Box and Wilson refers to the assumed model. If a fractional factorial experiment at two levels is conducted, why neglect two-factor interactions? The argument in favor of this model is that in the initial stage of experimentation interactions will be dominated by main effects. This might be true for pure quadratic terms, but it is not clear why it should be true for two-factor interactions. The argument reminds us of a point in the controversy around the Taguchi methods (see Chapter 9). Taguchi considers a priori that some interactions are insignificant, and this has been criticized, fairly in our view, as unjustified in general.

Likewise, in steepest ascent it seems that the first order model is assumed so as to fit the optimization technique (steepest ascent), because otherwise one would have to rely on relatively more complex nonlinear programming techniques. A two-level factorial of sufficient resolution may be adequate to estimate two-factor interactions, which can be utilized to optimize a process more efficiently, even at an earlier stage. The success of steepest ascent in practice, as evidenced by the many papers that have reported its application, is due to the fact that the technique will achieve some improvement, perhaps considerable improvement, with respect to an earlier, non-optimized state of the process. That this improvement is as large as could have been obtained, or that it was found at minimum cost (i.e., with a minimum number of experiments), has not been shown, and this point has not been emphasized much in the literature. Thus, there is room for improvement in this topic. We further discuss steepest ascent and related methods from a somewhat more advanced point of view in Chapter 6.

2.3 Problems

1 Consider the first order model ŷ = 100 + 5x₁ + 8x₂ − 3x₃. This model was fit using an unreplicated 2³ design in the coded variables −1 ≤ xᵢ ≤ 1, i = 1, 2, 3. The model fit was adequate, and S_b² = 2.54. The region of exploration in the natural variables was

ξ₁ = temperature (100 to 110 °C)
ξ₂ = time (1 to 2 hrs.)
ξ₃ = pressure (50 to 75 psi)

a) Using x₂ as the variable to define a step size along the steepest ascent direction, choose a step size large enough to take you to the boundary of the experimental region in that particular direction. Find and show the coordinates of this point on the path of steepest ascent in the coded variables xᵢ.
b) Find and show the coordinates of this point on the path of steepest ascent from part a) in the natural variables.
c) Find a unit vector that defines the path of steepest ascent.
d) What step size multiplier for the unit vector in c) would give the same point on the path of steepest ascent you found in parts a) and b)?

e) Find the fraction of directions excluded by the 95% confidence cone of steepest ascent.

2 Consider the first order model ŷ = 14.4 − 5.09x₁ − 13.2x₂ − 4x₃, where y denotes the life of a tool, in minutes, as a function of 3 process variables. This model was fit using an unreplicated 2³ design in the coded variables −1 ≤ xᵢ ≤ 1, i = 1, 2, 3. The model fit was adequate, with S_b² = 1.74. The region of exploration in the natural variables was

ξ₁ = speed of cut (650 to 800 units of surface per minute, sfm)
ξ₂ = cut feed (0.01 to 0.03 inches per revolution, ipr)
ξ₃ = depth of cut (0.05 to 0.20 in.)

and the current operating conditions are the center of this region.

a) For steepest ascent purposes, the Quality Engineer chooses to decrease the coded variable x₂ one unit from the origin. Find the coordinates of the resulting point on the path of steepest ascent in all other coded variables xᵢ.
b) Find and show the coordinates of this point on the path of steepest ascent from part a) in the natural variables.
c) Find a unit vector that defines the path of steepest ascent.
d) What step size multiplier for the unit vector in c) would give the same point on the path of steepest ascent you found in parts a) and b)?
e) Find the fraction of directions excluded by the 95% confidence cone of steepest ascent.

3 Consider the first order model ŷ = 52.1 − 3.1x₁ + 6.4x₂ − 1.25x₃, where the variance of the parameter estimates, s_b², equals 0.4, computed based on 5 replications of the center point of the design. Does the point x = (−0.9, 1.0, −0.3)′ generate a direction vector inside the 95% confidence cone of steepest ascent?

4 Consider the tool-life experimental data shown in Table 2.6 [44]. Two factors were varied in a replicated 2² design with three center runs. Tool life was defined by the first occurrence of 0.015 inch uniform flank wear, 0.004 inch crater depth, 0.030 inch localized wear, or catastrophic failure.

Table 2.6. Data for problems 4 and 5

                            Tool life (minutes)
Speed (sfm)   Feed (ipr)    Grade A   Grade B
500           0.015         67.0      84.4
500           0.015         101.9     91.2
500           0.015         63.6      66.7
800           0.015         23.5      16.0
800           0.015         17.6      15.2
800           0.015         21.3      17.6
500           0.027         17.9      24.6
500           0.027         25.3      15.3
500           0.027         25.4      30.4
800           0.027         0.4       1.1
800           0.027         0.6       0.5
800           0.027         0.5       0.9
650           0.021         21.4      11.8
650           0.021         19.2      8.9
650           0.021         22.6      10.6

a) Fit the best polynomial model you can to the grade A tool data. (Hint: transformations of the response may be necessary.)
b) Find the direction of steepest ascent, and determine a point that would take the process to a distance approximately equal to the boundary of the current experimental region.

5 Consider the grade B tool life data in Table 2.6.

a) Fit the best polynomial model you can.
b) Determine a point on the direction of steepest ascent that would take the process to a distance approximately equal to the boundary of the current experimental region.

For the following problems, readers may wish to consult Appendix A.

6 Consider the following one-factor experiment:

ξ₁    y (observed)
650   7
800   18
650   6
800   11
725   10

a) Using regression analysis, find the least squares estimates and the estimated variance of the parameter estimates in the model

y = β₀ + β₁x₁ + ε,

where x₁ is a coded factor corresponding to ξ₁.
b) Find Cook's Dᵢ diagnostic statistics based on the studentized residuals (rᵢ). Which point(s) seem to be outliers and/or influential?
c) Test the significance of the regression model. What are the null and the alternative hypotheses?
d) Test for lack of fit of the linear model. What are the null and the alternative hypotheses?

7 Assume a first order model is to be fit in k factors. From ∂R(β)/∂β_j = 0, j = 0, 1, 2, ..., k, find the set of normal equations. Do not use any matrix notation.

8 Consider the following unit-length coding convention:

x_iu = (ξ_iu − ξ̄_i)/s_i,   i = 1, ..., k; u = 1, 2, ..., n

where

ξ̄_i = (Σ_{u=1}^{n} ξ_iu)/n,   i = 1, ..., k

and

s_i = sqrt(Σ_{u=1}^{n} (ξ_iu − ξ̄_i)²),   i = 1, ..., k.

Show that the X′X matrix will contain the correlation matrix of the controllable factors. Assume a first order model for simplicity.

9 It is well known in regression analysis that

e = (I − H)Y = (I − H)ε.

a) Interpret this expression using the concept of projections described in Appendix A. Draw a picture of the corresponding vectors to explain.
b) Does this expression imply that Y = ε? Explain.

10 Assume we fit a linear model with an intercept. Argue why 1′H = 1′, where 1 is an n × 1 vector of ones. What does this imply about the average prediction Ŷ̄ and the average response Ȳ? Is 1′H = 1′ if the model has no intercept? Why?

11 Consider the basic ANOVA for a linear regression model. Show that the following expressions for the corrected sums of squares are true. Do not use the relation SS_total = SS_reg + SS_error:

a) SS_total = (Y − Ȳ1)′(Y − Ȳ1) = Y′Y − nȲ²;
b) SS_regression = (Ŷ − Ȳ1)′(Ŷ − Ȳ1) = Y′Xβ̂ − nȲ². (Hint: use the result in Problem 10.)

12 Show that h̄ = p/n. (Hint: use the fact that tr(AB) = tr(BA), where tr denotes the trace; see Appendix C.) Relate this fact to

Σ_{i=1}^{n} Var(ŷ(xᵢ))/σ² = p.

What is this last expression saying in favor of simpler models (i.e., models with fewer parameters)?

13 Show that a random variable W ∼ F₁,ₙ₋ₚ is equivalent to W ∼ t²ₙ₋ₚ.
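A quick numerical illustration of the hat-matrix identities appearing in Problems 10 and 12 (this is a check on an arbitrary random design matrix, not the proof the problems ask for):

```python
import numpy as np

# For a model with an intercept: 1'H = 1' and trace(H) = p, so the
# average leverage is h_bar = p/n. The design matrix here is arbitrary.
rng = np.random.default_rng(0)
n, p = 8, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix H = X (X'X)^{-1} X'

print(np.allclose(np.ones(n) @ H, np.ones(n)))   # 1'H = 1'  -> True
print(np.isclose(np.trace(H), p))                # trace(H) = p -> True
print(np.isclose(np.diag(H).mean(), p / n))      # h_bar = p/n -> True
```

The first identity holds because H projects onto the column space of X, which contains the vector of ones whenever the model has an intercept.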

http://www.springer.com/978-0-387-71434-9