Ridge analysis of mixture response surfaces

Statistics & Probability Letters 48 (2000) 3 40 Ridge analysis of mixture response surfaces Norman R. Draper a;, Friedrich Pukelsheim b a Department of Statistics, University of Wisconsin, 20 West Dayton Street, Madison, WI 53706-685, USA b Universitat Augsburg, D-8635 Augsburg, Germany Received July 999; received in revised form September 999 Abstract Previous applications of ridge analysis to second-order response surfaces based on mixture ingredients (x ;x 2;:::;x q) which sum to have required transforming from the origin (0; 0;:::;0), which is not in the mixture space, to a point inside the space, most often the centroid (=q; =q;:::; =q). We show that this transformation is not necessary for tracking the maximum ŷ and the minimum ŷ paths. In addition, we show that the application of ridge analysis is somewhat simplied if the Schee model form is replaced by the K (Kronecker) model form, an alternative, homogeneous model given elsewhere. c 2000 Elsevier Science B.V. All rights reserved MSC: 62J05 Keywords: Kronecker model; Mixture data; Ridge analysis; Response surfaces; Schee model. Introduction Ridge analysis was initially suggested by Hoerl (959, 962) in the context of tted second-order response surface models where the factors were not restricted. Some theoretical foundation for the method was later given by Draper (963). See also Hoerl (964) and Hoerl (985). The basic method denes a series of paths outward from the origin (x ;x 2 ;:::;x q )=(0; 0;:::;0) of the factor space. Suppose the tted second-order surface is written as ŷ = b 0 + x b + x Bx; () where x =(x ;x 2 ;:::;x q ); b =(b ;b 2 ;:::;b q ) Corresponding author. Tel.: +-608-262-0926; fax: +-608-262-0032. E-mail address: draper@stat.wisc.edu (N.R. Draper) 067-752/00/$ - see front matter c 2000 Elsevier Science B.V. All rights reserved PII: S067-752(99)0095-9

32 N.R. Draper, F. Pukelsheim / Statistics & Probability Letters 48 (2000) 3 40 and B = b 2 b 2 ::: 2 b q 2 b 2 b 22 ::: 2 b 2q. 2 b q..... 2 b 2q ::: b qq : Then () is the matrix format for the second-order tted equation (2) ŷ = b 0 + b x + b 2 x 2 + + b q x q + b x 2 + b 22 x 2 2 + + b qq x 2 q + b 2 x x 2 + b 3 x x 3 + + b q ;q x q x q : (3) The outward paths mentioned above are dened by imagining a sphere about the origin x =(0; 0;:::;0) of radius R, say. On such a sphere, we can nd the point of maximum response and the point of minimum response. In general, there may also be points at which ŷ has stationary values that are neither maxima nor minima. As R is increased from zero outwards, the loci of the maximum ŷ and the minimum ŷ can be followed out, thus giving us (second-order) paths of steepest ascent and descent. Typically, the paths of the intermediate stationary values, if any exist, begin at non-zero values of R which depend on the specic response surface being analyzed. 2. Mixtures ridge analysis Mixtures ridge analysis has been featured in only a few papers to date. Related references are Becker (969), Cornell and Ott (975), and Hoerl (987). In the last-mentioned paper, ridge analysis is done by rst transforming the q mixture variables to (q ) orthogonal variables, essentially removing the mixture restriction, but also unbalancing the coordinate system. This makes ridge analysis somewhat awkward to apply, and also makes it necessary to transform any conclusions back into the mixture coordinates. We now show how ridge analysis can be applied directly to mixtures applications in which x = x + x 2 + + x q =: For the moment, however, we shall continue to use the general form () and (2) for the response surface, particularizing to specic mixture forms when needed. Consider the Lagrangian function F = b 0 + x b + x Bx (x ) 2 (x x R 2 ): (4) When = 0, we fall back to the usual second-order ridge analysis Lagrangian function, as in Hoerl (959, 962) and Draper (963). Dierentiating (4) with respect to x (which can be achieved by dierentiating (4) with respect to x ;x 2 ;:::;x q in turn and rewriting these equations in matrix form) gives @F @x = b +2Bx 2 2 x: Setting (5) equal to zero leads to 2(B 2 I)x = (b ): (5) (6) Supercially, it would seem that if (B 2 I) exists, which will happen as long as 2 is not an eigenvalue of B, we obtain solutions for all the stationary points of ŷ on the sphere of radius R from the q equations x = 2 (B 2I) (b ): (7)

N.R. Draper, F. Pukelsheim / Statistics & Probability Letters 48 (2000) 3 40 33 In fact, however, solutions exist via (6) or (7) for the mixtures problem even at those eigenvalues, in general. This is because we must add, to the q equations of (7), the two restrictions x x x + x 2 + + x q = (8) which ensures that the solution lies in the mixture subspace, and x x x 2 + x 2 2 + + x 2 q = R 2 ; (9) which means that the solution is also on a sphere of radius R. Moreover, for mixtures models, b 0 ; B and b can take only certain forms, as will be explained in Section 4. As in ordinary ridge regression, we could in theory x R, substitute from (8) and (9) into (7) and then solve (7); it is far simpler to rst select a value for 2 in (7), however, whereupon we can apply (8) to obtain = x = 2 (B 2 I) (b ) (0) which implies that so that + 2 (B 2 I) b = 2 (B 2 I) () = + 2 (B 2 I) b 2 (B 2 I) : (2) We can now in general invoke (7) to give a value for x for the particular combination of 2 (chosen) and (from (2)) and evaluate R 2 = x x from (9) and ŷ from () or (3). We thus have a point on one of the stationary paths dened by ( 2 ; ; x;r;ŷ). Furthermore, all such points satisfy (8). Note that from (5) and (6), it is apparent that when we set 2 = 0, whereupon from (2), = + 2 B b 2 B ; (3) Eq. (7) will deliver the stationary point on the mixture space. How do we know which path we are on overall maximum ŷ on (9), overall minimum ŷ on (9), or intermediate stationary values, such as local maxima or minima? The usual matrix of second derivatives { @ 2 F @x i @x j } =2(B 2 I) (4) is not appropriate here, because it does not reect the fact that the solution has to be on the mixture space. Consider the (q ) by q matrix t t 2 ::: t q t 2 t 22 ::: t 2q T q =...... t q ; t q ;2 ::: t q ;q (5)

34 N.R. Draper, F. Pukelsheim / Statistics & Probability Letters 48 (2000) 3 40 say, where each row consists of a vector of orthogonal polynomial coecients, normalized so that the row sum of squares is. For q = 3 and 4, needed for our examples later, we have 0 T 3 = 2 2 2 ; (6) 6 6 6 T 4 = 3 20 20 20 3 20 2 2 2 2 3 3 20 20 20 20 : (7) [See, for example, Draper and Smith (998, p. 466).] The addition of a last row u q =(= q; = q;:::;= q)=(= q) converts T q into a transformation matrix U q =(T q; u q ) which allows the x ;x 2 ;:::;x q coordinates to be converted into values z ;z 2 ;:::;z q ; = q; via z = U q x or x = Uq z = U q(z ;z 2 ;:::;z q ; = q), due to the orthogonality of U q. It follows that, if we dene w =(z ;z 2 ;:::;z q ), so that z =(w ; = q), x Bx = z U q BU qz = w T q BT qw + 2 u q qbt qw + q u qbu q : (8) This means that we can replace (4), after dierentiating (8) twice with respect to w, by 2(T q BT q 2 I): Note that the size of this square matrix is (q ) not q because T q is (q ) q. We see that, if (9) is positive denite, we have a minimum, while if (9) is negative denite, we have a maximum. If (9) is indenite, intermediate stationary values are indicated. In fact, the theory at this point is a complete parallel of that in Draper (963). If the eigenvalues of T q BT q are 6 2 6 6 q, arranged in order with due regard to sign then on the mixture space: (a) choosing 2 q provides a locus of maximum ŷ as R changes, and (b) choosing 2 provides a locus of minimum ŷ as R changes. (c) choosing 6 2 6 q gives intermediate stationary values. As in the non-mixture case, when 2 = i exactly for i =; 2;:::;q ; Ris innite. [See Draper, 963.] Note that we do not need these eigenvalues to obtain the paths, but only to distinguish between paths. For the loci of maximum ŷ and the minimum ŷ, the eigenvalues are not necessary since choosing 2 values decreasing from gives the path of maximum ŷ, while using values increasing from gives the path of minimum ŷ. (9) 3. The geometry To see why the solution works without moving to an origin on the x = plane, we show geometrically the simplest cases; see Fig.. Imagine a sphere x 2 + x2 2 + x2 3 = R2 centered at the origin O in Fig. (b). When R = q, the sphere will not intersect the mixture space x + x 2 + x 3 = so there will be no solutions to (7). When R == q, the sphere just touches the mixtures centroid (= q; = q;:::;= q)=(= q), which will thus be the only solution point x of Eqs. (7) (9). It can easily be conrmed that, for this solution,

N.R. Draper, F. Pukelsheim / Statistics & Probability Letters 48 (2000) 3 40 35 Fig.. Ridge analysis geometry in the mixtures case for (a) q = 2, (b) q =3. 2 = ; = ; x=(= q) and R == q. [Or 2 = ; =, etc.] When R = q, the sphere will intersect the plane x + x 2 + x 3 = in a circle centered at the mixture centroid, which is exactly the way we wish to apply ridge analysis in this q = 3 mixture space. [Fig. (a) shows the more elementary q =2 case where the spheres are now circles and the circles are now pairs of points equally spaced on the line x + x 2 = around the centroid ( 2 ; 2 ).] For q 4, the picture of Fig. (b) must be mentally extended to higher dimensions. For q = 4, for example, the spheres cannot be drawn and the circles are spheres around the centroid of a pyramid. 4. Choice of mixture model In actually carrying out calculations (7) for selected 2, and with derived from (2) we have to specify the particular form of second order mixture model to t. Most readers would probably choose the Schee model (S-model) which, for second order, consists only of terms in x i and in x i x j. In such a case B in (2) has all diagonal terms zero, while b is, in general non-zero. Certainly the calculations oer no diculty if carried out in this form. A more interesting possibility, we suggest, is to employ the K-model where K stands for Kronecker. This involves using no x i terms, replacing them by xi 2 terms to obtain ŷ = b x 2 + b 22 x 2 2 + + b qq x 2 q + b 2 x x 2 + b 3 x x 3 + + b q ;q x q x q : (20) This form has been suggested by Draper and Pukelsheim (998); a comparison between the second order S- and K-models has been provided therein. The advantage in the ridge analysis formulation is that while B now contains diagonal terms, b = 0. Thus (2) becomes =2{ (B 2 I) } (2)

36 N.R. Draper, F. Pukelsheim / Statistics & Probability Letters 48 (2000) 3 40 whereupon (7) reduces to x = { (B 2 I) } (B 2 I) : (22) Our examples will be analyzed using the K-approach; either approach gives the same numerical solutions, of course. The proof of this follows from the fact that B S + 2 b + 2 b = B K, where subscript S denotes the Schee form of B and subscript K denotes the Kronecker form of b. We can now premultiply both sides by x, postmultiply both sides of the result by x, set x = x = and we obtain x B S x + 2 (x b + b x)=x B K x, the bracketed terms being identical and reducing to b x + + b q x q : All solutions x will be on the subspace x =, but depending on the 2 value chosen, some points will have coordinates that exceed or that are negative. Since the mixture space is such that 06x i 6, solutions that violate these restrictions are not relevant. We shall discuss this in our examples. 5. Examples Example (Kurotori, 966; see also Draper and Smith, 998; pp. 48 49). A set of 0 experimental runs was performed on a propellant problem with three ingredients (q = 3). In the original experiment, a restricted region with x 0:2; x 2 0:4; and x 3 0:2 was explored, and the best predictions were found to be on the x =0:20 boundary. In the present paper we do not restrict the surface to the smaller region, but follow the maximum ŷ path from the centroid (which is outside the region explored) into the restricted region. Note, in this regard, that the ridge paths may exist mathematically even when they might not be practically relevant. Fitting the surface in K-model form, we obtain the following equation: ŷ = 2:732 x 2 3:340 x 2 2 7:259 x 2 3 +3:249 x x 2 +4:694 x x 3 +28:83 x 2 x 3 ; Table Ridge path (x ;x 2 ;x 3 ) of maximum ŷ as 2 decreases and R increases, Kurotori data 2 x x 2 x 3 R ŷ 0.333 0.333 a 0.333 0.577 2.603 50 0.32 0.359 a 0.320 0.578 2.74 20 0.302 0.384 a 0.34 0.58 2.800 0 0.276 0.42 0.32 0.586 2.883 9 0.27 0.47 0.32 0.587 2.900 7 0.258 0.429 0.33 0.590 2.928 5 0.239 0.447 0.34 0.596 2.969 3 0.208 0.474 0.38 0.608 3.02 0.52 a 0.522 0.326 0.634 3.082 0 0.0 a 0.564 0.335 0.664 3.099 0.05 a 0.635 0.350 0.725 3.050 2 0:65 b 0.78 0.383 0.886 2.635 Restrictions in the original data set: x 0:2; x 2 0:4; x 3 0:2 a Point lies outside the original restricted data space. b Point lies outside the main simplex.

N.R. Draper, F. Pukelsheim / Statistics & Probability Letters 48 (2000) 3 40 37 Fig. 2. How R varies with 2. Kurotori data. whereupon 2:732 :624 7:347 B = :624 3:340 4:406 7:347 4:406 7:259 with eigenvalues ( 27:50; 4:24; 8:40). However, these eigenvalues are not the ones that aect movement on the mixture surface. With T 3 dened as in (6) we nd the eigenvalues of T 3 BT 3 to be ( 27:28; 3:86). The path for the maximum ŷ will thus be mapped out for values of 3:86. [Because this problem is not well conditioned, slightly dierent numbers may be obtained by dierent programs.] Table shows some selected calculations for this path, moving out from the centroid ( 3 ; 3 ; 3 ). We see that the path enters the restricted subspace across the x 2 =0:40 boundary, and exits it across the x =0:20 boundary later. The maximum predicted ŷ in or on the restricted subspace is at about (0.20, 0.48, 0.32), close to the 2 = 3 entry of Table. In examining the path, we can ignore points which violate the conditions of the practical problem. Note that a path could in theory pass outside the mixture space (or a dened restricted sub-region of it) and then come back in. The practical interpretation of such behavior would be to follow the path to the border and then move along the border until the path returned to the valid part of the mixture region.

38 N.R. Draper, F. Pukelsheim / Statistics & Probability Letters 48 (2000) 3 40 Fig. 3. How varies with 2. Kurotori data. Figs. 2 and 3 are helpful in understanding the calculations made for Table. Fig. 2 shows how the radius R varies with 2, leaping to innity at the eigenvalues 27:28 and 3:86 of T 3 BT 3. Table corresponds to the 2 3:86 part of Fig. 2 only, of course; this part is smooth where not shown. Since the sphere of radius R passes outside the mixture space when R, only the lower portion of Fig. 2 is meaningful in practice. The radius R = q always [here 0.577]. When 2 = or, we are at the centroid exactly. The two separate curves forming the U-shaped curve between the eigenvalues may (or may not) go as low as = q [here they do not]. Compare with a similar diagram in Draper (963). Fig. 3 shows how varies as a function of 2 ;as 2 increases, basically decreases but there are three sections in the plot (q in general) with divisions at the eigenvalues of T 3 BT 3. At each eigenvalue, the curve passes instantaneously from to. [We note again that the eigenvalues of B itself are not relevant to these calculations.] Example 2 (Draper et al., 993; Draper and Smith, 998; pp. 49 422). The data consist of 36 observations in four blocks. A second-order model with three added blocking variables was used. Fitting the surface in K-model form, we get ŷ = 4:89B 2:78B 2 20:B 3 + 400:40x 2 + 449:32x 2 2 + 398:90x 2 3 + 403:49x 2 4 + 946:7x x 2 + 988:86x x 3 + 954:56x x 4 + 8:33x 2 x 3 + 823:69x 2 x 4 + 746:39x 3 x 4 : The rst three terms do not depend on the x s and do not contribute to the ridge analysis except for the tted value calculations. For the purposes of this example, we simply choose to omit them, i.e., set B =B 2 =B 3 =0,

N.R. Draper, F. Pukelsheim / Statistics & Probability Letters 48 (2000) 3 40 39 Fig. 4. How R varies with 2. Bread data. because the ridge calculations are not thus aected [beyond ŷ levels]. Now 400:403 473:083 494:43 477:278 473:083 449:39 405:667 4:847 B = 494:43 405:667 398:903 373:94 477:278 4:847 373:94 403:486 with eigenvalues ( 28:9; 6:62; 30:8; 733:6). These eigenvalues are not needed. The eigenvalues of T 4 BT 4 where T 4 is given by (7), are ( 26:3; 6:78; 30:8). Fig. 4, parallel to Fig. 2 but for Example 2, shows that R goes to at these eigenvalues. For the path of maximum predicted response, we need 2 values for

40 N.R. Draper, F. Pukelsheim / Statistics & Probability Letters 48 (2000) 3 40 Table 2 Ridge path of maximum ŷ as R increases. Bread data 2 x x 2 x 3 x 4 R ŷ 0.250 0.250 0.250 0.250 0.500 433 600 0.289 0.255 0.230 0.226 0.503 436 200 0.335 0.272 0.205 0.87 0.54 440 00 0.369 0.308 0.87 0.36 0.533 442 50 0.389 0.44 0.82 0.06 0.596 447 48 0.389 0.425 0.84 0.003 0.605 448 46 0.389 0.437 0.86 0.03 a 0.64 448 a Point lies outside the mixture region. which 30:86 2 6. Table 2 shows some selected calculations for this path, moving out from the centroid ( 4 ; 4 ; 4 ; 4 ), until the path goes outside the mixture space [on the last line]. For both Examples and 2, diagrams of the response contours appear in Draper and Smith (998, pp. 49 and 42). In the case of the second of these diagrams, the eects of three non-signicant terms have been omitted in the equation used, but this does not have a material eect on the contours drawn. [The ridge analysis could be redone in terms of the reduced model with very similar results.] In both Examples and 2, the ridge paths obtained are consistent with the diagrams. Acknowledgements We are grateful to the Alexander von Humboldt-Stiftung for support through a Max Planck-Award for cooperative research. References Becker, N.G., 969. Regression problems when the predictor variables are proportions. J. Roy. Statist. Soc. Ser. B 3, 07 2. Cornell, J.A., Ott, L., 975. The use of gradients to aid in the interpretation of mixture response surfaces. Technometrics 7, 409 424. Draper, N.R., 963. Ridge analysis of response surfaces. Technometrics 5, 469 479. Draper, N.R., Pukelsheim, F., 998. Mixture models based on homogeneous polynomials. J. Statist. Plann. Inference 7, 303 3. Draper, N.R., Prescott, P., Lewis, S.M., Dean, A.M., John, P.W.M., Tuck, M.G., 993. Mixture designs for four components in orthogonal blocks. Technometrics 35, 268 276. See also Technometrics 37 (995) 3 32. Draper, N.R., Smith, H., 998. Applied Regression Analysis, 3rd Edition. Wiley, New York. Hoerl, A.E., 959. Optimum solution of many variables equations. Chem. Eng. Progr. 55 (), 69 78. Hoerl, A.E., 962. Applications of ridge analysis to regression problems. Chem. Eng. Progr. 58 (3), 54 59. Hoerl, A.E., 964. Ridge analysis. Chem. Eng. Progr. Symp. Ser. 60, 67 77. Hoerl, R.W., 985. Ridge analysis 25 years later. Amer. Statist. 39, 86 92. Hoerl, R.W., 987. The application of ridge techniques to mixture data: ridge analysis. Technometrics 29, 6 72. Kurotori, I.S., 966. Experiments with mixtures of components having lower bounds. Indust. Qual. Control 22, 592 596.