MATH602: APPLIED STATISTICS
Dr. Srinivas R. Chakravarthy
Department of Science and Mathematics
KETTERING UNIVERSITY
Flint, MI 48504-4898
Lecture 10

FRACTIONAL FACTORIAL DESIGNS
Complete factorial designs cannot always be conducted. Why? Seven factors at 2 levels each require $2^7 = 128$ runs, a very large number. In some cases, it is impossible to run all 128 combinations. In others, commercial production may have to be stopped for the duration of the study.

In any case, the expenses related to such experiments are prohibitively large. Often, higher-order interactions are insignificant: no distinguishable effects are noticed for them when a moderately large number of factors is used. Fractional factorial designs (FFDs) are an important alternative in such situations. The fraction is a carefully selected subset of all combinations.

In many cases, experiments that involve many factors are routinely conducted as fractional factorials to identify factor-level combinations for a future full study using complete factorial designs. It is important to note that
1. the analysis of a FFD is relatively simple and straightforward; and
2. the use of a FFD does not stop us from later completing a full factorial design.

CONFOUNDING IN FFD
In a complete factorial design involving $k$ factors at two levels, we have $2^k$ experimental units, and the analysis of such a design results in estimating $k$ main effects and $2^k - k - 1$ interaction effects. Suppose we look at a fractional factorial design, say, the $1/2^p$ fraction. That is, there will be $2^{k-p}$ experimental units. Obviously, some effects are going to be confounded with one another.

What is confounding? Two or more effects are said to be confounded if the calculated effects can only be attributed to their joint influence on the response, and not to their individual influences. That is to say, the contrasts of the effects will be the same (except possibly for the sign).
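As a small illustration of this definition, the sketch below builds a half fraction of a $2^3$ design. The generator C = AB and the response values are assumptions chosen only for illustration; they are not taken from the lecture. Because the contrast columns for C and AB coincide, the two calculated effects are necessarily identical.

```python
from itertools import product

# Full 2^2 design in A and B; C is set by the (assumed) generator C = AB.
runs = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

contrast_C = [c for _, _, c in runs]
contrast_AB = [a * b for a, b, _ in runs]

# Hypothetical responses, one per run, for illustration only.
y = [12.0, 15.0, 9.0, 20.0]

def effect(contrast, response):
    """Average response at +1 minus average response at -1."""
    half = len(response) / 2
    return sum(c * r for c, r in zip(contrast, response)) / half

effect_C = effect(contrast_C, y)
effect_AB = effect(contrast_AB, y)
print(contrast_C == contrast_AB, effect_C, effect_AB)
```

Whatever responses are used, the two numbers agree, so the data cannot tell the C effect apart from the AB interaction.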

Construction and Analysis of FFD
Here we will illustrate how to construct and analyze fractional factorial designs through a number of examples.
Defining Relation: FFDs are generated using one or more generators. The relation that defines the generator is called the defining relation. This is the key to the confounding pattern.

Analysis: The analysis of a FFD is very similar to what we saw earlier in the case of a factorial design. However, here one has to be even more careful in interpreting the results, as some factor effects are confounded with one another.

DESIGN RESOLUTION
In practice, an important tool for selecting a FFD out of the many possible ones is the concept of design resolution. This identifies the order of confounding of the main effects and the interactions.
Design Resolution: A design is said to be of resolution $R$ if no $p$-factor effect is confounded with any other effect containing fewer than $R - p$ factors.

In general, the resolution of a 2-level FFD is the length of the shortest word in the defining relation.
Note: The resolution of a design is denoted by the appropriate Roman numeral appended as a subscript. For example, $R_{III}$ refers to a design of resolution $R = 3$. Another notation commonly used to denote a FFD is $2^{k-p}_{III}$: a $1/2^p$ fraction of a 2-level, $k$-factor design of resolution III.
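The shortest-word rule can be sketched in a few lines. The function names below are hypothetical helpers, and the generators E = ABC and F = BCD are the ones used later in this lecture; the single generator D = AB is an assumed extra example. Words are multiplied by cancelling repeated letters, since any column multiplied by itself gives I.

```python
from itertools import combinations

def multiply(word1, word2):
    """Multiply two effect words; repeated letters cancel (A*A = I)."""
    return "".join(sorted(set(word1) ^ set(word2)))

def defining_relation(generator_words):
    """All products of the generator words (the complete defining relation)."""
    words = set()
    for r in range(1, len(generator_words) + 1):
        for combo in combinations(generator_words, r):
            word = ""
            for g in combo:
                word = multiply(word, g)
            words.add(word)
    return sorted(words, key=lambda w: (len(w), w))

def resolution(generator_words):
    """Length of the shortest word in the defining relation."""
    return min(len(w) for w in defining_relation(generator_words))

# E = ABC and F = BCD correspond to the words ABCE and BCDF.
words = defining_relation(["ABCE", "BCDF"])
print(words, resolution(["ABCE", "BCDF"]))

# Assumed example: D = AB corresponds to the word ABD.
print(resolution(["ABD"]))
```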

To further grasp the idea of design resolution, let us look at the following examples:
(a) A design of resolution III does not confound main effects with one another but does confound main effects with two-factor interactions.
(b) A design of resolution IV does not confound main effects with one another or with two-factor interactions but does confound two-factor interactions with other two-factor interactions.
(c) A design of resolution V does not confound main effects or two-factor interactions with one another but does confound two-factor interactions with three-factor interactions, and so on.

HOW TO SELECT A FFD?
Having seen how to construct a FFD using appropriate generator(s), and how to analyze a FFD, the question now is: how do we select a generator in practice?

That is, how do we select an appropriate FFD for the problem under study? Recall that no matter how well you choose a statistical technique for analysis, it will not do much good if the design is not properly chosen. Hence, it is imperative to be able to identify a proper design for the problem under study. After identifying the important factors that influence the response variable, use
(a) existing tables to choose a FFD;
(b) a computer to generate a FFD; or
(c) trial and error to identify a FFD.

EXAMPLE 5: ($2^{4-1}$ design - Problem 7.18): A chemical product is produced in a pressure vessel. Four factors at two levels each:
A  Temperature
B  Pressure
C  Concentration of Formaldehyde
D  Stirring Rate
Response is the filtration rate.

EXAMPLE 6: ($2^{6-2}$ design - Example 7.9): Shrinkage (Y) of parts in an injection molding process. Six factors at two levels each:
A  Molding Temperature
B  Screw Speed
C  Holding Time
D  Cycle Time
E  Gate Size
F  Holding Pressure

Use E = ABC and F = BCD as generators.
Defining relation: I = ABCE = BCDF = ADEF
Alias structure:
I + ABCE + ADEF + BCDF
A + BCE + DEF + ABCDF
B + ACE + CDF + ABDEF
C + ABE + BDF + ACDEF
D + AEF + BCF + ABCDE
E + ABC + ADF + BCDEF
F + ADE + BCD + ABCEF
ABD + ACF + BEF + CDE
ABF + ACD + BDE + CEF
AB + CE + ACDF + BDEF
AC + BE + ABDF + CDEF
AD + EF + ABCF + BCDE
AE + BC + DF + ABCDEF
AF + DE + ABCD + BCEF
BD + CF + ABEF + ACDE
BF + CD + ABDE + ACEF
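The alias structure can be reproduced by multiplying an effect by each word of the defining relation and cancelling repeated letters. The following is a minimal sketch of that rule; the helper names are hypothetical, and only a few effects are printed.

```python
def multiply(word1, word2):
    """Multiply two effect words; repeated letters cancel (A*A = I)."""
    return "".join(sorted(set(word1) ^ set(word2)))

# Complete defining relation for the generators E = ABC and F = BCD.
defining_words = ["ABCE", "BCDF", "ADEF"]

def aliases(effect):
    """Effects confounded with `effect`, shortest words first."""
    found = [multiply(effect, w) for w in defining_words]
    return sorted(found, key=lambda w: (len(w), w))

for effect in ["A", "AB", "AE", "ABD"]:
    print(effect, "+", " + ".join(aliases(effect)))
```

Note that AE, BC and DF share one alias chain, which is the two-factor-with-two-factor confounding expected from a resolution IV design.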

CONSTRUCTION OF A FFD
Construction of a $2^{k-1}$ FFD of highest resolution:
Write down a full FD for $k-1$ factors.
Add the $k$-th factor by identifying its high and low levels with the plus and minus signs of the highest-order interaction of the first $k-1$ factors.
[Note: We can use an interaction of any order to assign the $k$-th factor, but then we will not get the highest resolution.]
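The two construction steps can be sketched as follows for $k = 4$, which gives a $2^{4-1}$ design with D = ABC. The function name is a hypothetical helper, and levels are coded as -1 and 1.

```python
from itertools import product
from math import prod

def half_fraction(k):
    """2^(k-1) design: full factorial in k-1 factors, last factor = their product."""
    design = []
    for levels in product((-1, 1), repeat=k - 1):
        design.append(levels + (prod(levels),))
    return design

design = half_fraction(4)   # columns A, B, C and D = ABC
for run in design:
    print(run)
```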

A $2^{k-2}$ FFD is constructed as follows:
Write down a full FD with $k-2$ factors.
Choose two generators, say, P and Q, to define the two remaining factors.
Design generators (for the $2^{6-2}$ design of Example 6): E = ABC and F = BCD
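A minimal sketch of this construction for the generators E = ABC and F = BCD is given below. The variable and function names are illustrative only. The check at the end confirms that every word of the defining relation I = ABCE = BCDF = ADEF is a column of +1 values.

```python
from itertools import product

base = "ABCD"                           # full factorial in k - 2 = 4 factors
generators = {"E": "ABC", "F": "BCD"}   # the two generators

design = []
for levels in product((-1, 1), repeat=len(base)):
    run = dict(zip(base, levels))
    for new_factor, word in generators.items():
        value = 1
        for letter in word:
            value *= run[letter]
        run[new_factor] = value
    design.append(run)

def column(word):
    """Elementwise product of the factor columns named in `word`."""
    out = []
    for run in design:
        value = 1
        for letter in word:
            value *= run[letter]
        out.append(value)
    return out

print(len(design), set(column("ABCE")), set(column("BCDF")), set(column("ADEF")))
```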

RESPONSE SURFACE METHODOLOGY
RSM is a collection of statistical and mathematical methods used to model a response variable as a function of two or more predictor variables, with the objective of finding the settings that optimize the response. Suppose the proposed model (when $k = 2$) is of the form $Y = f(X_1, X_2) + e$, or equivalently $E(Y) = f(X_1, X_2)$. The surface represented by the graph of $E(Y) = f(X_1, X_2)$ is known as the response surface. The projection of this surface for fixed values of $E(Y)$ onto the $(X_1, X_2)$-plane is referred to as the contour plot.

Fitting RSM models using MINITAB is very similar to fitting factorial designs. In addition, there are options that are unique to RSM. These include surface and contour plots and searching for optimum settings of the parameters.

Most applications of RSM are sequential in nature.
Start with a screening experiment (reduce a long list of variables to a smaller set).
Determine whether the current levels of the factors under study result in an optimum value of the response. If not, optimize using the method of steepest ascent or descent.
When the process is near the optimum, obtain a reasonable function that describes the real situation well in the region of the optimum. Usually a second-order model will be needed.

CENTRAL COMPOSITE DESIGNS
Imagine a $2^2$ design in which a first-order model is not appropriate. Introducing quadratic terms and estimating their parameters requires running additional experiments at selected points. The central composite design (CCD) is a simple and highly efficient design for this problem. A CCD is often performed in a sequential setting: first we run a FD (or a FFD), and if there is evidence of lack of fit, we augment it with axial runs.

To generate a CCD we must specify $\alpha$, the distance of the axial runs from the center of the design, and $n_C$, the number of center points. Usually $3 \le n_C \le 5$. How do we choose $\alpha$? It is chosen so that the response surface design is rotatable. By a rotatable design, we mean a design in which the variance of the predicted response, $\sigma^2 x'(X'X)^{-1}x$, is the same at all points $x$ that are at the same distance from the center of the design. For a CCD built on a $2^k$ design, $\alpha = (2^k)^{0.25}$. Thus, if $k = 2$, $\alpha = \sqrt{2} \approx 1.414$. If $k = 4$, then $\alpha = 2$.
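The choice of the axial distance and the layout of the design points can be sketched as follows, in coded units. The function names are hypothetical, and the choice of five center points is simply one value from the usual range of 3 to 5.

```python
from itertools import product

def rotatable_alpha(k):
    """Axial distance for a rotatable CCD built on a full 2^k factorial."""
    return (2 ** k) ** 0.25

def central_composite(k, n_center):
    """Factorial, axial and center points of a CCD in coded units."""
    alpha = rotatable_alpha(k)
    factorial = [tuple(float(v) for v in p) for p in product((-1, 1), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(tuple(point))
    center = [(0.0,) * k] * n_center
    return factorial + axial + center

points = central_composite(2, n_center=5)
print(round(rotatable_alpha(2), 4), rotatable_alpha(4), len(points))
for p in points:
    print(p)
```

For $k = 2$ this gives 4 factorial points, 4 axial points and 5 center points.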

ILLUSTRATIVE EXAMPLE

TAGUCHI APPROACH TO QUALITY
So far we have seen classical DOE, some specific designs, and their analysis. Dr. Genichi Taguchi of Japan has incorporated a number of quality engineering methods that use DOE with the aim of designing high-quality systems at reduced cost. These methods provide an efficient and systematic approach to optimizing designs for performance, quality and cost. They emphasize designing quality into products and processes, as opposed to the standard method of inspecting for quality.

These methods have been very effective in improving the quality of Japanese products, and hence became popular in western industries. Although there is some controversy about Taguchi's methods in terms of their philosophical and technical aspects, here we will briefly illustrate the basic ideas and the usefulness of these designs in practice. Note that before Taguchi's methods were introduced, traditional designs were used only to assess effects on averages. Taguchi made experimenters aware of the value of using the designs to assess the impact of factors on the variability of the response variable.

In practice, one usually deals with many factors when studying a response variable. This leads to many test combinations in the case of a full study. A standard method of reducing the number of test runs is to use fractional factorial designs (recall this from the earlier lecture on DOE). Taguchi constructed a special set of general designs that consist of orthogonal arrays. These determine the least number of test runs for a given number of factors to be studied.

Taguchi's loss function $L(y)$ is $L(y) = k (y - y_0)^2$, where $y$ represents the quality characteristic (such as dimension, speed, rate, performance), $y_0$ is the target value for $y$, and $k$ is a constant that depends on the cost structure of the manufacturing process that produces the product. This function possesses the following properties:
(a) the loss is zero when the target is attained;
(b) the magnitude of the loss increases rapidly as $y$ deviates farther from the target value;
(c) the loss function is a continuous function of the deviation.
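A short numerical sketch of the loss function follows. The target of 10.0 and the cost constant of 2.5 are hypothetical values chosen only to show properties (a) and (b).

```python
def taguchi_loss(y, target, k):
    """Quadratic loss L(y) = k * (y - target)^2."""
    return k * (y - target) ** 2

# Hypothetical numbers: target 10.0 and cost constant k = 2.5.
for y in (10.0, 10.5, 11.0, 12.0):
    print(y, taguchi_loss(y, target=10.0, k=2.5))
```

Doubling the deviation from the target quadruples the loss, which is the sense in which the loss increases rapidly.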

No matter how the quality of a product is defined, the measure will fall under one of the following three characteristics:
(A) the smaller the better
(B) the larger the better
(C) the nominal is best

ORTHOGONAL ARRAYS
Earlier we saw how factorial designs (FD) and fractional factorial designs (FFD) are used in practice. A FFD is a means of reducing the number of runs needed when the number of factors is large. In choosing a FFD for a study, certain treatment conditions are chosen in order to maintain orthogonality among the various main effects and interaction effects. Orthogonal arrays were first recorded around 1897 by the French mathematician J. Hadamard, but their utility was not explored until World War II, by the British statisticians Plackett and Burman.

Taguchi has developed a family of FFDs that can be utilized in various situations. The associated design matrices are labeled $L_*$, where $*$ is a selected positive integer. L stands for Latin square design, as these designs are simply a form of well-known designs such as Plackett-Burman, FFD or Latin square designs. However, the combination of the loss function and robust designs to find optimal settings is one of the strongest aspects of Taguchi's method.

The construction and use of Latin-square orthogonal arrays date back to the period of World War II, mainly in the context of agricultural applications. Since then they have been used extensively in many applications. Taguchi constructed a new set of OAs using the orthogonal Latin squares in a unique way. This construction, along with a set of rules for choosing a particular OA, has simplified the task for many engineers and statisticians who apply DOE in practice. Below we illustrate the use of Taguchi's OAs with the $L_4$ and $L_8$ OAs.

$L_4$ Orthogonal Array
Table 1: $L_4$ Orthogonal Array
        Column number
Run      1    2    3
1       -1   -1   -1
2       -1    1    1
3        1   -1    1
4        1    1   -1

From the above table, we see that an $L_4$ experiment consists of four rows and three columns. Each row corresponds to a particular run in the experiment and each column corresponds to a factor specified in the study. Each column contains 2 low levels and 2 high levels of the factor assigned to that column. We use -1 for the low level and 1 for the high level. In the first run, for example, the three design variables are set at their low levels; in the second run, the first variable is set at its low level and the remaining two are set at their high levels; and so on.

$L_8$ Orthogonal Array
Table 2: $L_8$ Orthogonal Array
        Column number
Run      1    2    3    4    5    6    7
1       -1   -1   -1   -1   -1   -1   -1
2       -1   -1   -1    1    1    1    1
3       -1    1    1   -1   -1    1    1
4       -1    1    1    1    1   -1   -1
5        1   -1    1   -1    1   -1    1
6        1   -1    1    1   -1    1   -1
7        1    1   -1   -1    1    1   -1
8        1    1   -1    1   -1   -1    1

Notice that this is a one-sixteenth fraction, which uses only 8 of the 128 possible runs of a $2^7$ design. The seven columns are used to assign up to 7 factors. If only three factors are of interest, then this becomes a full factorial design. When every column is assigned a factor, this is known as a saturated design. In general, a FFD is said to be saturated when the design only allows for the estimation of main effects. These designs are effective as part of a screening process when there are a number of factors to be examined.

There are two sets of Taguchi's OAs available for use in practice: one set deals with 2-level factors, denoted by $L_4$, $L_8$, $L_{12}$, $L_{16}$, $L_{32}$; the second set deals with 3-level factors: $L_9$, $L_{18}$, and $L_{27}$.
SELECTION OF AN OA: The selection of an OA is easily achieved once the number of degrees of freedom in the study is determined along with the number of levels of the factors. Taguchi has tabulated a total of 18 standard orthogonal arrays, which can be found in many textbooks dealing with Taguchi methods (see some references at the end of the handout).

ASSIGNMENT OF FACTORS TO COLUMNS: As you have seen, there are several columns available in an OA. The question is how to assign the factors (and the interactions of the factors, if any) to these columns. This cannot be done arbitrarily, as confounding of the factors will result. Only if there are no significant interaction effects among the factors can we assign factors to columns arbitrarily. Taguchi devised a scheme to assign factors and interactions using linear graphs and triangular tables. The purpose of the linear graphs is to indicate which factors may be assigned to which columns, and the purpose of the triangular tables is to identify appropriate columns to which the interactions are assigned. Usually there will be more than one choice for the assignment. However, careful assignment is necessary to avoid unnecessary confounding. Any unassigned columns will be used to estimate the error sum of squares.

Example 1: Consider an $L_4$ OA. It has 3 columns, so either three factors, or two factors and their interaction, can be assigned to these columns. For example, Factor A can be assigned to column 1 and Factor B to column 2. The interaction AB is then assigned to column 3. This assignment resembles the design matrix of a $2^2$ full factorial design, except that the interaction effect is really -AB. [Why?] However, if the interaction term is negligible or of no interest, then another factor, say, Factor C, can be assigned to column 3.
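One way to see the sign is to multiply columns 1 and 2 of the $L_4$ array elementwise and compare the result with column 3. The sketch below does exactly that; the variable names are illustrative.

```python
# L4 orthogonal array in the -1/1 coding (rows are runs).
L4 = [
    (-1, -1, -1),
    (-1,  1,  1),
    ( 1, -1,  1),
    ( 1,  1, -1),
]

col_a, col_b, col_3 = zip(*L4)
ab = tuple(a * b for a, b in zip(col_a, col_b))
minus_ab = tuple(-v for v in ab)

print(ab, col_3, col_3 == minus_ab)
```

Column 3 is the product of columns 1 and 2 with every sign reversed, so an effect computed from column 3 estimates -AB.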

Example 2: Consider an $L_8$ OA. It has 7 columns, so 4 to 7 factors, or a full factorial design with 3 factors at two levels, can be studied. To study a $2^3$ full factorial design, we can assign Factors A, B and C to columns 1, 2 and 4, respectively. Note that we cannot assign these three factors to the first three columns, as this would confound the effect of Factor C with the AB interaction. One has to use the linear graphs or the triangular tables to assign the factors appropriately. This OA can also be used to screen 4 to 7 factors, by assigning them properly.
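These statements can be checked numerically. The sketch below uses the standard $L_8$ array in the -1/1 coding, in which every column has four low and four high levels. It verifies that all pairs of columns are orthogonal, that column 3 coincides with the product of columns 1 and 2 up to sign, and that columns 1, 2 and 4 contain every combination of three two-level factors.

```python
from itertools import combinations, product

# Standard L8 orthogonal array in the -1/1 coding (rows are runs).
L8 = [
    (-1, -1, -1, -1, -1, -1, -1),
    (-1, -1, -1,  1,  1,  1,  1),
    (-1,  1,  1, -1, -1,  1,  1),
    (-1,  1,  1,  1,  1, -1, -1),
    ( 1, -1,  1, -1,  1, -1,  1),
    ( 1, -1,  1,  1, -1,  1, -1),
    ( 1,  1, -1, -1,  1,  1, -1),
    ( 1,  1, -1,  1, -1, -1,  1),
]
cols = list(zip(*L8))          # cols[0] is column 1, cols[1] is column 2, ...

orthogonal = all(
    sum(x * y for x, y in zip(c1, c2)) == 0
    for c1, c2 in combinations(cols, 2)
)

# Column 3 versus the interaction of columns 1 and 2.
ab = tuple(a * b for a, b in zip(cols[0], cols[1]))
col3_is_minus_ab = cols[2] == tuple(-v for v in ab)

# Columns 1, 2 and 4 as a 2^3 full factorial.
runs_124 = {(r[0], r[1], r[3]) for r in L8}
full_factorial = runs_124 == set(product((-1, 1), repeat=3))

print(orthogonal, col3_is_minus_ab, full_factorial)
```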

INNER AND OUTER ARRAYS
Taguchi proposed a collection of techniques to identify the settings of the controlled factors that will yield robust performance. These include the selection of the DOE and the statistical analysis of the data.
First select a DOE for the controlled factors. The design matrix for this is referred to as the inner array.
Now select a DOE for the noise factors. The design matrix for this is referred to as the outer array.
For each combination of factors in the inner array, run all combinations of the noise factors in the outer array.

Taguchi classifies parameter design problems into different categories, and the effects are evaluated using the concept of the signal-to-noise (S/N) ratio.
(A) The smaller the better: The target value for the response is zero (why?). Thus, for this case,
$S/N = -10 \log_{10}\left( \frac{1}{n} \sum_{i=1}^{n} y_i^2 \right)$
The goal of the experiment here is to minimize the sum of the squared response values, which is equivalent to maximizing the S/N ratio.

(B) The larger the better: Here the goal is to maximize the response variable, which is equivalent to minimizing the reciprocal of the response value. Thus,
$S/N = -10 \log_{10}\left( \frac{1}{n} \sum_{i=1}^{n} \frac{1}{y_i^2} \right)$
The goal of the experiment here is to maximize the S/N ratio.

(C) The nominal the better: For this case Taguchi recommends using
$S/N = 10 \log_{10}\left( \frac{\bar{y}^2}{s^2} \right)$
where $\bar{y}$ is the sample mean and $s^2$ is the sample variance of the observations.
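The three ratios can be written as small functions. This is a minimal sketch: the function names are hypothetical, base-10 logarithms are assumed (the usual decibel convention), and the two observations are made-up values used only to exercise the formulas.

```python
from math import log10
from statistics import mean, variance

def sn_smaller_the_better(y):
    """-10 log10 of the mean of the squared responses."""
    return -10 * log10(sum(v ** 2 for v in y) / len(y))

def sn_larger_the_better(y):
    """-10 log10 of the mean of the squared reciprocals."""
    return -10 * log10(sum(1 / v ** 2 for v in y) / len(y))

def sn_nominal_the_best(y):
    """10 log10 of (sample mean squared / sample variance)."""
    return 10 * log10(mean(y) ** 2 / variance(y))

sample = [2.0, 4.0]   # hypothetical observations for one run
print(sn_smaller_the_better(sample))
print(sn_larger_the_better(sample))
print(sn_nominal_the_best(sample))
```

In all three cases the preferred factor settings are those that give the largest S/N ratio.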

ANALYSIS OF OA DESIGNS
The analysis of designs based on OAs is very similar to the analysis of FFDs seen earlier. But now there is also interest in the effect of the factors on the variation, which is studied using the S/N ratio. G.E.P. Box (Signal-to-Noise ratios, performance criteria, and transformations, Technometrics, 1988) states that the use of the S/N ratio concept is equivalent to an analysis of the logarithm of the data. This is because the assumption that the standard deviation is proportional to the mean calls for a logarithmic transformation to stabilize the variance.

ILLUSTRATIVE EXAMPLE 1
Run    A    B    C    y1   y2   ybar   s^2
1     -1   -1   -1    11   19    15     32
2      1   -1   -1     4    6     5      2
3     -1    1   -1    16   14    15      2
4      1    1    1     9    1     5     32
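The mean and variance columns of this table can be recomputed from the two observations per run. The sketch below also reports a nominal-the-best S/N ratio for each run; that choice of ratio is an assumption made for illustration, since the example does not state which quality characteristic applies.

```python
from math import log10
from statistics import mean, variance

# (y1, y2) for runs 1 to 4 of Illustrative Example 1.
data = [(11, 19), (4, 6), (16, 14), (9, 1)]

summary = [(mean(y), variance(y)) for y in data]

for run, (ybar, s2) in enumerate(summary, start=1):
    sn = 10 * log10(ybar ** 2 / s2)   # nominal-the-best S/N (assumed choice)
    print(run, ybar, s2, round(sn, 2))
```

Runs 1 and 3 share the same mean but differ sharply in variance, which is the kind of difference an analysis of averages alone would miss.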

ILLUSTRATIVE EXAMPLE 2
An injection molding process engineer is interested in identifying the factors that contribute to variability in part shrinkage, as well as in determining the settings of the factors that will minimize the shrinkage. Part shrinkage is measured as the amount of deviation from the desired part size. Seven controlled factors (A-Cycle time; B-Mold temperature; C-Holding pressure; D-Gate size; E-Cavity thickness; F-Holding time; G-Screw speed) and 3 noise factors (H-Ambient temperature; I-Moisture content; J-Percent regrind) were identified after a brainstorming session. The data for this study are given in the following table.

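$L_8$: OA for Illustrative Example 2
Inner array (controlled factors) and responses:
Run    A    B    C    D    E    F    G     Y1    Y2    Y3    Y4
1     -1   -1   -1   -1   -1   -1   -1    2.6   2.7   2.8   2.8
2     -1   -1   -1    1    1    1    1    0.8   3.0   0.8   3.2
3     -1    1    1   -1   -1    1    1    3.6   1.0   3.3   0.9
4     -1    1    1    1    1   -1   -1    2.5   2.4   2.5   2.3
5      1   -1    1   -1    1   -1    1    3.5   3.6   3.5   3.5
6      1   -1    1    1   -1    1   -1    2.6   4.7   3.6   1.5
7      1    1   -1   -1    1    1   -1    4.5   2.4   2.7   5.1
8      1    1   -1    1   -1   -1    1    2.4   2.5   2.3   2.4
Outer array (levels of the three noise factors, one row per noise factor, for the response columns Y1 to Y4):
                                           Y1    Y2    Y3    Y4
                                           -1     1     1    -1
                                           -1     1    -1     1
                                           -1    -1     1     1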
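Since the aim is to minimize shrinkage, the smaller-the-better S/N ratio is the natural summary for each run. The following is a minimal sketch under two assumptions: the 32 responses are read row by row, four per run (Y1 to Y4), and base-10 logarithms are used. The function name is a hypothetical helper.

```python
from math import log10
from statistics import mean, variance

# Responses Y1..Y4 for runs 1 to 8, read row by row (an assumption about the layout).
shrinkage = [
    (2.6, 2.7, 2.8, 2.8),
    (0.8, 3.0, 0.8, 3.2),
    (3.6, 1.0, 3.3, 0.9),
    (2.5, 2.4, 2.5, 2.3),
    (3.5, 3.6, 3.5, 3.5),
    (2.6, 4.7, 3.6, 1.5),
    (4.5, 2.4, 2.7, 5.1),
    (2.4, 2.5, 2.3, 2.4),
]

def sn_smaller_the_better(y):
    """-10 log10 of the mean of the squared responses."""
    return -10 * log10(sum(v ** 2 for v in y) / len(y))

results = [(mean(y), variance(y), sn_smaller_the_better(y)) for y in shrinkage]
for run, (ybar, s2, sn) in enumerate(results, start=1):
    print(f"run {run}: mean={ybar:.3f}  s2={s2:.3f}  S/N={sn:.2f}")

# Run with the largest S/N ratio among the eight tested combinations.
best_run = max(range(1, 9), key=lambda r: results[r - 1][2])
print("largest S/N at run", best_run)
```

The per-run S/N values can then be analyzed like any other response, for example by comparing their averages at the low and high levels of each controlled factor.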