Unit 12: Response Surface Methodology and Optimality Criteria


1 Unit 12: Response Surface Methodology and Optimality Criteria STA 643: Advanced Experimental Design Derek S. Young 1

2 Learning Objectives
- Revisit your knowledge of polynomial regression
- Know how to use polynomial models as an approximation to a true response surface
- Know the typical coding scheme used in response surface models
- Become familiar with contour plots
- Know how to estimate and test in the first-order and second-order models
- Understand the role of axial points in some of these experimental designs
- Understand the property of rotatability
- Know how to estimate and perform inference on quadratic response surfaces
- Understand optimality criteria

3 Outline of Topics 1 First-Order Models and Visualizations 2 Second-Order Models and Composite Designs 3 Optimality Criteria 3


5 Polynomial Regression: A Brief Review Polynomial regression models are special cases of the general linear regression model that contain squared or higher-order terms of the predictor variables, thus making the response function curvilinear. Some examples of polynomial regression models are:

Quadratic (or 2nd-order) regression model in one predictor: Y_i = β_0 + β_1 X_i + β_2 X_i^2 + ε_i

pth-order regression model in one predictor: Y_i = β_0 + β_1 X_i + β_2 X_i^2 + ... + β_p X_i^p + ε_i

2nd-order regression model in two predictors: Y_i = β_0 + β_1 X_{i,1} + β_2 X_{i,1}^2 + β_3 X_{i,2} + β_4 X_{i,2}^2 + ε_i

Polynomial regression models are linear models because they are linear in the parameters; the "linear" description does not refer to the shape of the response surface. Because they are linear models, we use ordinary least squares to estimate the parameters of polynomial regression models, but also note that maximum likelihood provides the same estimators (which hence possess the properties of minimum variance, unbiasedness, and consistency).
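As a quick sketch of the review above, a quadratic model can be fit by ordinary least squares using a design matrix with columns 1, X, and X^2. The data here are made up purely for illustration:

```python
import numpy as np

# Toy data generated from a known quadratic, Y = 2 + 0.5 X - 0.3 X^2
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 0.5 * X - 0.3 * X ** 2

# Design matrix with columns 1, X, X^2; OLS via least squares
D = np.column_stack([np.ones_like(X), X, X ** 2])
beta, *_ = np.linalg.lstsq(D, y, rcond=None)
print(np.round(beta, 3))
```

Because the toy response is noiseless, the fitted coefficients recover (2, 0.5, -0.3) exactly.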

6 Polynomial Regression: A Brief Review When we have replicates at one or more of the predictor levels, we can test for a linear lack-of-fit. The derivation of the SS quantities is similar to those found for some of our standard experimental designs, but now we have the regression model as the primary source of variation and the error partitioned into pure error and lack-of-fit error. Below is the ANOVA table for testing for a linear lack-of-fit:

Source        df     SS                          MS      F
Regression    p - 1  Σ_i Σ_j (ŷ_ij - ȳ)^2        MSR     MSR/MSE
Error         n - p  Σ_i Σ_j (y_ij - ŷ_ij)^2     MSE
  Lack of Fit  m - p  Σ_i Σ_j (ȳ_i - ŷ_ij)^2     MSLOF   MSLOF/MSPE
  Pure Error   n - m  Σ_i Σ_j (y_ij - ȳ_i)^2     MSPE
Total         n - 1  Σ_i Σ_j (y_ij - ȳ)^2

In the above, there are p regression parameters, m < n unique X values, and n_i replicates at each of the i = 1, ..., m unique predictor values; the sums run over j = 1, ..., n_i within each level. If one does not have replicates, then a general linear F-test can be used.
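The error partition in the table can be checked numerically; below is a minimal sketch with made-up replicated data (the x levels and y values are illustrative, not from the text):

```python
import numpy as np

# Replicated data at m = 3 unique x levels (illustrative values)
x = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
y = np.array([1.1, 0.9, 2.4, 2.6, 2.9, 3.1])

# Linear fit by OLS
D = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(D, y, rcond=None)
yhat = D @ beta

# Group mean at each observation's x level
ybar = np.array([y[x == v].mean() for v in x])

sse   = np.sum((y - yhat) ** 2)      # error SS, n - p df
sspe  = np.sum((y - ybar) ** 2)      # pure error SS, n - m df
sslof = np.sum((ybar - yhat) ** 2)   # lack-of-fit SS, m - p df
print(round(sse, 4), round(sspe, 4), round(sslof, 4))
```

Since the fitted value is constant within a replicate group, SSE = SSPE + SSLOF holds exactly.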

7 Example: Checking Account Data Over time, a bank recorded the number of new checking accounts it received (Y) when a certain minimum deposit size was advertised (X). In total, 11 different advertising periods were monitored. To the right is a figure of the data with a linear and a quadratic fit. To understand the notation on the previous slide, look at the values for x_2 = 100, where j = 1, 2:

y_21 = 112, y_22 = 136, ȳ_2 = 124

with common fitted values ŷ_21 = ŷ_22 under each of the linear and quadratic fits (the annotated value in the figure for the quadratic fit is ŷ_21 = ŷ_22 = 99.39).

[Figure: scatterplot of number of new accounts versus size of minimum deposit, with linear and quadratic fits and the points at X = 100 annotated.]

8 Example: Checking Account Data

Analysis of Variance Table
Response: new
              Df  Sum Sq  Mean Sq  F value  Pr(>F)
Regression
Residuals
  Lack of fit                                      **
  Pure Error
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05

Above is the test for a linear lack-of-fit. Clearly, the lack-of-fit is significant. Thus, we proceed to fit the quadratic polynomial, which has the following ANOVA (with a significant result for the quadratic fit):

Analysis of Variance Table
Response: new
              Df  Sum Sq  Mean Sq  F value  Pr(>F)
Regression                                         ***
Residuals
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05

9 Moving Towards Response Surface Methods The objective of all experiments is to characterize the relationship between a response and specified treatment factors. When the factor levels are quantitative, we can use a polynomial regression equation to test for any curvature in the response surface. The response equation can be displayed as a surface when experiments investigate the effect of two quantitative factors, such as the effect of temperature and pressure on the rate of a chemical reaction. The response surface enables the investigator to visually inspect the response over a region of interesting factor levels and to gauge the sensitivity of the response to the treatment factors. In certain industrial applications, the response surface can be explored to determine the combination of factor levels that provides an optimum operating condition; e.g., the combination of temperature and time to maximize the yield of chemical production. They can also be used for analytical studies of fundamental processes; e.g., they can characterize the interplay of factors on the response variable, such as the interaction between nitrogen and phosphorus in soil on the growth of plants. 9

10 RSMs as a Sequential Process To the right is a figure taken from the text Design and Analysis of Experiments, 8th Edition (2013) by D. C. Montgomery. This graphic illustrates a design process using a response surface model (RSM) in four dimensions: the first three dimensions are the factors (i.e., the cube illustrated in the lower and upper corners) and the fourth dimension is the response space, within which the cubes are drawn. The lower right-hand corner shows where we currently are with our designed experiment. RSMs help inform us as to which direction we need to travel to reach the region of the optimal operating conditions. Traveling along this path and defining new factor levels accordingly will typically yield improved operating conditions. Notice that an RSM is used as a process, since we adjust our experimental conditions until we identify a region of optimal operating conditions.

11 Visualizing Response Surfaces To the right is a solid surface of a fitted response surface equation from some hypothetical data. For this particular case, there are two factors (X1 and X2) and a response (Y). We would use a 3D response surface like this not only to show the fitted response surface, but to help us determine features of the underlying relationship; e.g., is it curvilinear, and what sort of effect does a given factor have on the response? [Figure: 3D surface of the response Y over X1 and X2.]

12 Visualizing Response Surface Contours To the right is a contour plot of the response surface given on the previous slide. The contour plot has lines of equal response values (in this case, in increments of 1), similar to contours of equal elevation shown on topographic maps. General trends one looks for are peaks, rising ridges, and saddle point regions, which we will illustrate later. These plots help us identify optimal regions in the design space. [Figure: contour plot of the response surface over X1 and X2.]

13 Approximating True Response Using Polynomial Models Over the ranges in which one or more of the quantitative factors are changed, it is implicitly assumed that the response changes in a regular manner that can be adequately represented by a smooth response surface. In response surface designs, we assume that the mean of the response variable, µ_y, is a function of the quantitative factor levels represented by the variables x_1, x_2, ..., x_k. The true underlying function is, of course, unknown, but polynomial functions tend to provide good approximations. Below are the mean functions for the response models of interest (in terms of two factors):

Steepest Ascent Model (first-order model): µ_y = β_0 + β_1 x_1 + β_2 x_2

Screening Response Model (first-order model with interaction): µ_y = β_0 + β_1 x_1 + β_2 x_2 + β_12 x_1 x_2

Optimization Model (second-order model with interaction): µ_y = β_0 + β_1 x_1 + β_2 x_2 + β_11 x_1^2 + β_22 x_2^2 + β_12 x_1 x_2

In RSMs, orders > 2 (e.g., cubic models) are rarely of interest.

14 Contrasting RSM with Polynomial Regression In an RSM, each of the k factors is measured at p levels (usually 2 or 3); however, more complex models can be developed outside of these constraints. The factors are treated as categorical variables, so the design matrix X will have a noticeable pattern based on the way the experiment was designed. The number of factor levels must be at least as large as the number of factors (p ≥ k). If examining a response surface with interaction terms, then the model must obey the hierarchy principle (this is not required of general polynomial models, although it is usually recommended). The number of factor levels must be greater than the order of the model (i.e., p > h, where h is the order). The number of observations (n) must be greater than the number of terms in the model, including all higher-order terms and interactions. A rule-of-thumb is to try to have at least 5 observations per term in the model, although this is not always practically feasible. Typically, RSMs only have two-way interactions, while polynomial regression models can (in theory) have k-way interactions.

15 Example: Contour Plot for Peak/Valley For these next four slides, consider a hypothetical experiment where two factors, time and temperature, are controlled and the yield for a chemical reaction is measured. One possible shape of the contour plot could be that to the right, which shows a maximum response occurring within the center contour, indicating a symmetrical surface with a peak in the center. A similar shape could be found for a minimum response, but the contours would have smaller values of the response as you approach the center contour, which would indicate a symmetrical surface with a valley in the center. Such a figure would indicate where an optimal combination of time and temperature results in a maximum (or minimum) yield. [Figure: contour plot for yield over time and temperature.]

16 Example: Contour Plot for Rising (Sloping) Ridge Another possible shape of the contour plot is to the right, which shows a rising (sloping) ridge. This is similar to the contour plot on the previous slide, but the maximum occurs outside of the design space. We can similarly have a decreasing ridge, where the contours are decreasing towards a minimum region that falls outside of the design space. [Figure: contour plot for yield over time and temperature.]

17 Example: Contour Plot for Stationary Ridge Another possible shape of the contour plot is to the right, which shows a stationary ridge. There is a decreasing trend in the contours as you move in either direction from the maximum ridge, which is at a value of 75. [Figure: contour plot for yield over time and temperature.]

18 Example: Contour Plot for Saddlepoint Region A final possible shape of the contour plot is given to the right, which shows a saddle (or minimax) contour plot. In this case, the response can increase or decrease from the center of the region, depending on the direction of movement from the center. [Figure: contour plot for yield over time and temperature.]

19 Coding Factor Levels Coded factor levels provide a uniform framework to investigate factor effects in any experimental context, since the units of different factors will usually differ. Suppose we have i = 1, ..., k factors, each measured at j = 1, ..., p levels; i.e., a p^k factorial design. Let Z_i1 < Z_i2 < ... < Z_ip be the p levels of factor Z_i, where, typically, the levels will be chosen at equal increments between the minimum factor level (Z_i1) and the maximum factor level (Z_ip). Coded levels for this design are given by

X_ij = (Z_ij - Z̄_i) / [(Z_ip - Z_i1)/2],

where Z̄_i is the average level for factor Z_i. For example, suppose we are performing an experiment with k = 2 factors where one of the factors, say Z_1, is a certain chemical concentration in a mixture. The factor levels for the chemical concentration are Z_11 = 10%, Z_12 = 20%, and Z_13 = 30% (so p = 3). The factors are then coded in the following way:

X_11 = (10 - [10 + 20 + 30]/3) / ([30 - 10]/2) = -1
X_12 = (20 - [10 + 20 + 30]/3) / ([30 - 10]/2) = 0
X_13 = (30 - [10 + 20 + 30]/3) / ([30 - 10]/2) = +1
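The coding formula above can be written as a small helper. For equally spaced levels, the average level Z̄_i equals the midpoint of the extreme levels, which is what this sketch uses:

```python
def code_level(z, z_min, z_max):
    """Code a raw factor level onto the [-1, +1] scale."""
    z_bar = (z_min + z_max) / 2        # equals the mean level when the
                                       # levels are equally spaced
    half_range = (z_max - z_min) / 2
    return (z - z_bar) / half_range

# Chemical-concentration example from the slide: levels 10%, 20%, 30%
print([code_level(z, 10, 30) for z in (10, 20, 30)])   # [-1.0, 0.0, 1.0]
```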

20 Using 2^k Factorial Designs Complete or fractional factorial designs can be used for the initial experiments conducted when studying response surfaces. When the region of optimal response is unknown, 2^k factorial or fractional factorial designs can help identify factors that affect the response variable. 2^k factorials are suitable and highly effective designs for estimating the mean responses in the first-order model. In order both to assess the adequacy of the linear RSM and to estimate experimental error, the standard practice is to include two (or more) observations at the middle of all factor levels under consideration. These are called center points or center design points.

21 Method of Steepest Ascent Ultimately, the investigator will want to characterize the region of optimum response. We don't know if a peak or hill exists in the first place, so we choose factor levels where we think the optimum exists. While we might not identify the best region for our design, we can use the first-order model to help us move towards that region. Specifically, the method of steepest ascent is a procedure developed to move the experimental region in the direction of maximum change in the response variable, toward the optimum.

22 Example: Vinylation of Methyl Glucoside Vinylation of methyl glucoside occurs when it is added to acetylene under high pressure and high temperature in the presence of a base to produce monovinyl ethers. The monovinyl ether products are useful for various industrial synthesis processes. A study was conducted to illustrate the methods to identify and evaluate important factors for response surface characterization. The ultimate goal of the project was to determine which conditions produced maximum conversion of methyl glucoside to each of several monovinyl isomers. We attempt to identify important factors with the first-order RSM µ_y = β_0 + β_1 x_1 + β_2 x_2. The treatment design was a 2^2 factorial with temperature (with levels 130°C and 160°C) and pressure (with levels 325 psi and 475 psi) as factors. Four replications were conducted in the center of the experimental region at a temperature of 145°C and a pressure of 400 psi, which are used to provide an estimate of experimental error variance and to evaluate the adequacy of the linear response model. Thus, we will fit a steepest ascent model to the 2^2 factorial portion of the design and then use the replicates at the design center for an estimate of experimental error variance. The data record the temperature, pressure, and % conversion for each run.

23 Example: Vinylation of Methyl Glucoside For these data, the coded factors are

X_11 = (130 - 145)/([160 - 130]/2) = -1
X_12 = (160 - 145)/([160 - 130]/2) = +1
X_1C = (145 - 145)/([160 - 130]/2) = 0
X_21 = (325 - 400)/([475 - 325]/2) = -1
X_22 = (475 - 400)/([475 - 325]/2) = +1
X_2C = (400 - 400)/([475 - 325]/2) = 0

The ordinary least squares estimates for the first-order RSM are

(Intercept)                  20
FO(Temp, Pressure)Temp        8
FO(Temp, Pressure)Pressure    4

24 Example: Vinylation of Methyl Glucoside While the estimates for the regression coefficients were calculated on the previous slide using ordinary least squares, we can also obtain the same estimates by calculating the treatment effects for a 2^2 design:

(intercept)         β̂_0 = ȳ = (1/4)(y_1 + y_2 + y_3 + y_4) = 20
(temperature slope) β̂_1 = (1/4)(-y_1 + y_2 - y_3 + y_4) = 8
(pressure slope)    β̂_2 = (1/4)(-y_1 - y_2 + y_3 + y_4) = 4

The signs in the above contrasts are dictated by the design matrix for a 2^2 factorial design in standard order:

Intercept  Temperature  Pressure
   +1          -1          -1
   +1          +1          -1
   +1          -1          +1
   +1          +1          +1
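These contrast calculations can be sketched in code. The four responses below are not from the original data table (which did not survive transcription); they are reconstructed to be consistent with the estimates reported in the slides (intercept 20, temperature slope 8, pressure slope 4, and, per the next slide, a zero interaction):

```python
import numpy as np

# Hypothetical responses in standard order: (-,-), (+,-), (-,+), (+,+);
# values reconstructed to match the reported effect estimates
y = np.array([8.0, 24.0, 16.0, 32.0])
temp  = np.array([-1, +1, -1, +1])   # temperature contrast signs
press = np.array([-1, -1, +1, +1])   # pressure contrast signs

b0  = y.mean()                       # intercept
b1  = (temp * y).sum() / 4           # temperature slope
b2  = (press * y).sum() / 4          # pressure slope
b12 = (temp * press * y).sum() / 4   # interaction
print(b0, b1, b2, b12)
```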

25 Example: Vinylation of Methyl Glucoside The temperature by pressure interaction measures the lack-of-fit to the linear model and is represented by including β_12 x_1 x_2 in the first-order model. The ordinary least squares estimates for the screening response model are:

(Intercept)                  20
FO(Temp, Pressure)Temp        8
FO(Temp, Pressure)Pressure    4
TWI(Temp, Pressure)           0

We see that the interaction term is 0, thus indicating that temperature and pressure act independently on the percentage of conversion. Note that we can again obtain these estimates by calculating the treatment effects for a 2^2 design. For the interaction term, we have

(interaction) β̂_12 = (1/4)(y_1 - y_2 - y_3 + y_4) = 0,

where the signs in the above contrast are dictated by the signs calculated for the interaction term using the design matrix on the previous slide: (+, -, -, +).

26 Example: Vinylation of Methyl Glucoside The variance of the n = 4 observations at the design center is s^2 = 3.33, and an estimate of the standard error for the RSM coefficients is

s_β̂ = √(s^2 / 2^k) = √(3.33/4) = 0.91.

It matters whether the experimental error variance is adequately estimated with replication only at the center of the design factor levels. If the variance of the response depends on the factor level, then replication at the low and high levels of the factor combinations is recommended to detect any heterogeneous variability among the treatment combinations. The replicate observations at the design center also allow us to measure the degree of curvature in the experimental region. Letting ȳ_f = 20 be the mean of the responses for the 2^2 factorial design and ȳ_c = 22 be the mean of the center points, the difference (ȳ_f - ȳ_c) = -2 is an estimate of (β_11 + β_22) from the second-order optimization model. The standard error of this estimate is

√(s^2 (1/4 + 1/4)) = √(3.33(0.5)) = 1.29.

Since the estimate is within two standard errors of 0, it is likely that the degree of curvature is not significant and, thus, the first-order linear response model is appropriate.
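The standard-error and curvature calculations above can be reproduced directly:

```python
import math

s2 = 3.33          # variance of the 4 center-point observations
k = 2              # number of factors
n_f, n_c = 4, 4    # number of factorial points and center points

# Standard error of the first-order RSM coefficients
se_beta = math.sqrt(s2 / 2 ** k)

# Curvature estimate (beta_11 + beta_22) and its standard error
curvature = 20.0 - 22.0                        # ybar_f - ybar_c
se_curv = math.sqrt(s2 * (1 / n_f + 1 / n_c))
print(round(se_beta, 2), curvature, round(se_curv, 2))   # 0.91 -2.0 1.29
```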

27 Example: Vinylation of Methyl Glucoside Based on the estimated linear equation ŷ = 20 + 8x_1 + 4x_2, the path of steepest ascent, perpendicular to the contours of equal response, moves 4 units in the x_2 direction for every 8 units in the x_1 direction. Equivalently, the path has a movement of 4/8 = 0.5 units in the x_2 direction for every 8/8 = 1 unit in the x_1 direction. This tells the experimenter how many units they should consider moving for future runs of the experiment to identify an optimal region for testing. To the right is the contour plot for the current design, where the open circles in the corners represent the four design points of the 2^2 design and the solid point in the center represents the center design point. [Figure: contour plot over pressure and temperature, with contours at ŷ = 15, 20, 25, 30.]

28 Example: Vinylation of Methyl Glucoside The path of steepest ascent is started at the center of the design with coded units (x_1, x_2) = (0, 0). The center of the design in uncoded units is (145, 400). A change of Δx_1 = 1 in the x_1 direction is a 15°C change in temperature. A change of Δx_2 = 0.5 in the x_2 direction is a 37.5 psi change in pressure. The experimenter will perform experiments in combination along this path of steepest ascent. Eventually, the increases in the response will become smaller until an actual decrease is observed in the response. The decrease should indicate the region of maximum response in the neighborhood of the current temperature and pressure conditions. At that point in the process, an experiment can be designed to estimate a quadratic polynomial equation to approximate the response surface. [Figure: contour plot over pressure and temperature with steps 1-4 along the path of steepest ascent marked.]

29 Example: Vinylation of Methyl Glucoside The table below gives the points (in uncoded and coded units) along the path of steepest ascent. The experimenter would continue to test in this manner until a decrease in the response is observed.

Step  x_1  x_2  Temperature  Pressure
0     0    0    145          400.0
1     1    0.5  160          437.5
2     2    1.0  175          475.0
3     3    1.5  190          512.5
4     4    2.0  205          550.0
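The path can be generated programmatically from the stated step sizes (+1 coded unit in x_1 per step, which is 15°C, and +0.5 coded unit in x_2, which is 37.5 psi):

```python
# Path of steepest ascent from the design center (145 C, 400 psi);
# each step moves +1 coded unit in x1 and +0.5 coded unit in x2.
def steepest_ascent_path(steps):
    path = []
    for s in range(steps + 1):
        x1, x2 = 1.0 * s, 0.5 * s
        temp = 145 + 15 * x1         # 15 C per coded unit of x1
        pressure = 400 + 75 * x2     # 75 psi per coded unit of x2
        path.append((x1, x2, temp, pressure))
    return path

for point in steepest_ascent_path(4):
    print(point)
```

Step 2 lands at (175, 475), matching the later slide where the new design center is placed after two steps.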

30 Outline of Topics 1 First-Order Models and Visualizations 2 Second-Order Models and Composite Designs 3 Optimality Criteria 30

31 Curvature in the Response Surface Suppose we have performed the method of steepest ascent for our experiment, which consequently allowed us to identify a region of optimum response. The surface in the region of optimum response is approximated by a quadratic equation, which usually does a good job of characterizing curvature in the response surface. While 2^k factorial and fractional factorial designs are helpful for identifying the region of optimum response, they provide insufficient information for estimating quadratic surfaces in this region. Desirable properties for a response surface estimation include the ability to: estimate experimental error variance; test for lack-of-fit; efficiently estimate model coefficients (parameters); and predict responses.

32 Central Composite Designs 3^k factorial designs can be used to estimate quadratic polynomial equations, but it is clear that as the number of factors k increases, the number of treatment combinations quickly becomes impractical. A common alternative to the 3^k factorial is the central composite design, which is a 2^k factorial design with 2k additional treatment combinations called axial points, which occur along the coordinate axes of the coded factor levels. This design is also called the circumscribed central composite design. The 2^k factorial design points are also called cube points. The coordinates of the axial points are placed at (±α, 0, 0, ..., 0), (0, ±α, 0, ..., 0), ..., (0, 0, 0, ..., ±α). How we choose α will be discussed later in this lecture. Central composite designs generally also have m replications added to the design center, (0, 0, 0, ..., 0). Therefore, the total number of design points for a central composite design is N = 2^k + 2k + m.
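The N = 2^k + 2k + m construction can be sketched as a point generator (a circumscribed design using the rotatable α discussed later in the lecture):

```python
import itertools

# Generate the design points of a circumscribed central composite design:
# 2^k cube points, 2k axial points at distance alpha, and m center points.
def central_composite(k, m):
    alpha = (2 ** k) ** 0.25                       # rotatable axial distance
    cube = list(itertools.product([-1.0, 1.0], repeat=k))
    axial = []
    for i in range(k):
        for a in (-alpha, alpha):
            point = [0.0] * k
            point[i] = a
            axial.append(tuple(point))
    center = [(0.0,) * k] * m
    return cube + axial + center

design = central_composite(k=2, m=4)
print(len(design))   # N = 2^2 + 2*2 + 4 = 12
```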

33 Our Recipe for Sequential Experimentation Assume that we are conducting a study involving k factors. Our general recipe for performing sequential experimentation is:

1. Conduct a series of tests along the path of steepest ascent until you arrive at the optimal region (or the best possible region given practical constraints).
2. Once you have identified the optimal region (or the region where you want to perform extensive testing), conduct a new 2^k factorial design with several replications at the design center.
3. Calculate the difference between the mean of the 2^k factorial design (ȳ_f) and the mean of the center points (ȳ_c). If (ȳ_f - ȳ_c) is small, then the first-order model is likely adequate for your experiment. If (ȳ_f - ȳ_c) is large, then add axial points at ±α along each coded axis and perform additional experiments there. The treatment combinations at these axial points, along with those at the center points and the 2^k factorial design, make up a central composite design.

34 Example: Central Composite Design (2 Factors) Consider the setting where we have two factors, each measured at two levels. Figure (a) is the design for a 2^2 factorial. The corners are the standard coded design points (±1, ±1). Figure (b) shows the set of axial points (0, ±α) and (±α, 0), where additional runs of experiments will be conducted. Figure (c) is the result of overlaying the axial points on the 2^2 factorial. When we add replicates at the design center, we then have a central composite design.

35 Example: Central Composite Design (2 Factors) Below are the design points for a central composite design with k = 2 factors:

2^2 Design     Axial        Center
x_1   x_2      x_1   x_2    x_1   x_2
-1    -1       -α     0      0     0
+1    -1       +α     0
-1    +1        0    -α
+1    +1        0    +α

We assume there are m replications at the center.

36 Example: Central Composite Design (3 Factors) Consider the setting where we have three factors, each measured at two levels. Figure (d) is the design for a 2^3 factorial. The corners of the cube are the standard coded design points (±1, ±1, ±1). Figure (e) shows the set of axial points (0, 0, ±α), (0, ±α, 0), and (±α, 0, 0), where additional runs of experiments will be conducted. Figure (f) is the result of overlaying the axial points on the 2^3 factorial cube. When we add replicates at the design center, we then have a central composite design.

37 Example: Central Composite Design (3 Factors) Below are the design points for a central composite design with k = 3 factors:

2^3 Design           Axial              Center
x_1  x_2  x_3        x_1  x_2  x_3      x_1  x_2  x_3
-1   -1   -1         -α    0    0        0    0    0
+1   -1   -1         +α    0    0
-1   +1   -1          0   -α    0
+1   +1   -1          0   +α    0
-1   -1   +1          0    0   -α
+1   -1   +1          0    0   +α
-1   +1   +1
+1   +1   +1

We, again, assume there are m replications at the center.

38 Example: Vinylation of Methyl Glucoside Let us return to the vinylation of methyl glucoside experiment. Recall that we are interested in the effects of temperature and pressure on a conversion percentage measurement. Suppose that after performing two steps of the method of steepest ascent, the experimenter found a decrease in the conversion percentage. The conditions for this case (in uncoded units) are 175°C (temperature, T) and 475 psi (pressure, P). This is the design center. From the original experiment, we know that the temperature levels are ±15°C from the design center and the pressure levels are ±75 psi from the design center. In coded units, the value α = √2 is used (how this is calculated will be discussed shortly). This means our axial points for temperature are

T (lower) = 175 - (√2)(15) ≈ 153.8
T (upper) = 175 + (√2)(15) ≈ 196.2

Our axial points for pressure are

P (lower) = 475 - (√2)(75) ≈ 368.9
P (upper) = 475 + (√2)(75) ≈ 581.1

39 Example: Vinylation of Methyl Glucoside To the right is a figure of the design points (on the uncoded scale) for the central composite design that the experimenter should use. Notice how the axial points are connected by a circle. There should be replications at the design center to estimate experimental error. [Figure: vinylation experimental design over pressure and temperature, showing the design center, cube points, and axial points.]

40 Example: Vinylation of Methyl Glucoside Below is a possible configuration of the experiment using this central composite design. m = 4 replications are included at the design center. Note that the runs are blocked, with the cube points and the axial points each having two runs at the design center. The run sheet lists run.order, std.order, Temp, Pressure, and Block for each run.

Data are stored in coded form using these coding formulas...
x1 ~ (Temp - 175)/15
x2 ~ (Pressure - 475)/75

41 Rotatability Equal precision for all estimates of means is a desirable property in any experimental setting. The precision of the estimated values on the response surface based on the estimated regression equation will not be constant over the entire experimental region. The property of rotatability in a central composite design requires that the variance of estimated values be constant at points equidistant from the (coded) design center. Prior to conducting a study, little or no knowledge may exist about the region that contains the optimum response; therefore, the experimental design matrix should not bias an investigation in any direction, which is why rotatability is a desirable property. 2^k factorials used as first-order designs to implement the method of steepest ascent are rotatable designs, which implies the design orientation does not hinder the method of steepest ascent.

42 Calculating Axial Points The central composite design can be made rotatable by setting the axial points at α = (2^k)^(1/4). The value of α for a two-factor design is α = (4)^(1/4) = √2 = 1.414, which we saw in the methyl glucoside example. The value of α for a three-factor design is α = (8)^(1/4) = 1.682. If there are n_e replications of the 2^k factorial and n_a replications of the axial treatment combinations, then a more general form for α is α = (n_e 2^k / n_a)^(1/4). If a 2^(k-f) fractional factorial is used, then α = (n_e 2^(k-f) / n_a)^(1/4).
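The general formula reduces to α = (2^k)^(1/4) when n_e = n_a = 1 and the factorial is full; a small helper:

```python
# Axial distance for a rotatable central composite design:
# alpha = (n_e * 2^(k - f) / n_a) ** (1/4), where f = 0 for a full factorial.
def rotatable_alpha(k, f=0, n_e=1, n_a=1):
    return (n_e * 2 ** (k - f) / n_a) ** 0.25

print(round(rotatable_alpha(2), 3))   # two factors: 1.414
print(round(rotatable_alpha(3), 3))   # three factors: 1.682
```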

43 Inscribed Central Composite Design In some situations, the limits specified for the factor levels are truly limits. In this case, we can set the factor level settings as the axial points and create a factorial (or fractional factorial) design within those limits. This type of central composite design is called an inscribed central composite design. An inscribed central composite design is a scaled-down circumscribed central composite design, where each factor level of the circumscribed design is divided by α to generate the inscribed design. This design requires five levels of each factor. 43

44 Designs for Uniform Precision in the Center As noted earlier, the variance of the estimated surface is not constant over the entire surface. However, the number of center points in a rotatable central composite design can be chosen to provide a design with uniform precision for the estimated surface within one unit of the design center coordinates on the coded scale. These can be constructed for full or fractional factorial designs, examples of which are given in the table below:

Number of Factors       2      3      4      5      5      6      6
Fractional Type         Full   Full   Full   Full   Half   Full   Half
α                       1.414  1.682  2.000  2.378  2.000  2.828  2.378
Number of Cube Points   4      8      16     32     16     64     32
Number of Axial Points  4      6      8      10     10     12     12
m                       5      6      7      10     6      15     9
N                       13     20     31     52     32     91     53

45 Face-Centered Cube Design The circumscribed and inscribed central composite designs each require five levels of each factor, coded as -α, -1, 0, +1, and +α. Five levels of each factor could be too difficult, expensive, or time-consuming. The face-centered cube design is another variation of the central composite design, but with α = 1; thus, it requires only three levels of each factor. The design is most attractive when the region of interest is a cuboidal region and not a spherical region like those for the circumscribed and inscribed designs. The face-centered design is not rotatable, but the absence of this property may be offset by the desire to have a cuboidal inference region.

46 Central Composite Designs Below are the three different central composite designs we have discussed. Figure (g) is the circumscribed central composite design, where α = √2. Figure (h) is the inscribed central composite design, where α = 1 and the factorial design points have been divided by √2. Figure (i) is the face-centered central composite design, where α = 1.

47 Box-Behnken Designs Box-Behnken designs are a class of three-level designs (i.e., three levels for each factor) that are used to estimate second-order response surfaces. The designs are rotatable (or nearly so) with a reduction in the number of experimental units compared to 3^k designs. The treatment combinations are at the midpoints of the edges of the corresponding cuboidal design space. The designs have limited capability for orthogonal blocking compared to central composite designs. Box-Behnken designs should only be used if one is not interested in predicting responses at the corners of the cuboidal region.

48 Box-Behnken Designs Below are the coordinates for a Box-Behnken three-factor design with an arbitrary number of replicates at the design center:

                        x_1  x_2  x_3
Factorial for A and B   ±1   ±1    0
Factorial for A and C   ±1    0   ±1
Factorial for B and C    0   ±1   ±1
Design Center            0    0    0

Each "factorial" row denotes the four runs of a 2^2 factorial in the indicated pair of factors, with the remaining factor held at its middle (coded 0) level.
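The coordinates above can be generated by taking a 2^2 factorial in each pair of factors with the remaining factor at its midpoint; a sketch for k = 3:

```python
import itertools

# Box-Behnken design points for three factors: the midpoints of the
# edges of the cube, plus the design center.
def box_behnken_3():
    points = []
    for i, j in itertools.combinations(range(3), 2):
        for a, b in itertools.product([-1, 1], repeat=2):
            point = [0, 0, 0]
            point[i], point[j] = a, b
            points.append(tuple(point))
    points.append((0, 0, 0))     # a single center point shown here
    return points

print(len(box_behnken_3()))   # 12 edge midpoints + 1 center = 13
```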

49 Box-Behnken Designs To the right is a figure of the Box-Behnken design given on the previous slide. The different colors have been included just to help with the visualization. As you can see, the design points all fall on the edges of the cube rather than the corners, while there is still a point in the middle of the cube for the design center. 49

50 Quadratic Response Surface Estimation After identifying the region of optimum response (e.g., through experimentation or the method of steepest ascent), we then proceed to characterize that region using a response surface. Designing experiments using the central composite designs discussed thus far allows us to obtain data for estimating a quadratic approximation to the response surface. The estimated response equation will enable us to locate a stationary response point that could be a maximum, minimum, or a saddle point on the surface. An examination of the contour plot will indicate how sensitive the response variable is to each of the factors and to what degree the factors affect the response variable. 50

51 Example: Tool Life Experiment A new cutting tool available from a vendor was going to be used by a company. The vendor claimed the new model tool would reduce production costs because it would last longer than the old model, thus reducing tool replacement costs. The life of the cutting tool depends on several operating conditions, including the speed of the lathe and the depth of the cut made by the tool. The plant engineer determined from previous studies that the maximum tool life for the current tool was achieved with a lathe velocity setting of 400 and a cutting depth setting of 0.075. The engineer wanted to determine the optimum settings required for the new tool. A circumscribed central composite design was used for an experiment to characterize the life of the new tool. The 2^2 factorial design points for lathe speed are 200 and 600, and for depth they are 0.050 and 0.100. The axial points for lathe speed are 400 - (√2)(200) = 117 and 400 + (√2)(200) = 683, while the axial points for cutting depth are 0.075 - (√2)(0.025) = 0.040 and 0.075 + (√2)(0.025) = 0.110. m = 6 replications were made at the design center (400, 0.075).

52 Example: Tool Life Experiment
Below is the data, which shows the 2² factorial design cube points (black), the axial points (red), and the center points (green).
[Data table: Original Factors (Lathe Speed, Cutting Depth), Coded Factors (x1, x2), and Tool Life]

53 Example: Tool Life Experiment
We next turn to fitting the optimization model; i.e., the second-order model with an interaction term:
y = β0 + β1x1 + β2x2 + β11x1² + β22x2² + β12x1x2 + ɛ,
where, note, the model to be estimated is in terms of the coded units. Below is the output for the least squares estimates, including the individual t-tests about the regression coefficients:
[R output: Estimate, Std. Error, t value, and Pr(>|t|) for (Intercept), speed, depth, speed:depth, speed^2, and depth^2, each significant; multiple and adjusted R-squared; F-statistic on 5 and 8 DF, p-value 9.991e-06]
Therefore, the estimated second-order response model with an interaction is
ŷ = β̂0 + β̂1x1 + β̂2x2 + β̂11x1² + β̂22x2² + β̂12x1x2,
with the numerical estimates read off the output above. Notice in the above output that all of the individual t-tests show that the regression coefficients are significantly different from 0. Moreover, the adjusted R² is large, which indicates that a high percentage of the variation in tool life is explained by the predictors (factors) used in the model.

54 Example: Tool Life Experiment
[ANOVA table for the optimization model: Df, Sum Sq, Mean Sq, F value, and Pr(>F) for FO(speed, depth), TWI(speed, depth), PQ(speed, depth), and Residuals, with Residuals partitioned into Lack of fit and Pure Error]
Above is the ANOVA table for the optimization model. The first line is the regression SS due to the first-order terms in the model; i.e., SSR(x1, x2). The second and third lines are the Type I SS for the interaction and quadratic effects, respectively; i.e., SSR(x1x2 | x1, x2) and SSR(x1², x2² | x1, x2, x1x2). Therefore, the SS due to the entire optimization model is obtained by the typical SS partitioning:
SSR(x1, x2, x1x2, x1², x2²) = SSR(x1, x2) + SSR(x1x2 | x1, x2) + SSR(x1², x2² | x1, x2, x1x2).
The df for the full optimization model are also additive, thus we have 2 + 1 + 2 = 5 df. Moreover, the mean square is
MSR(x1, x2, x1x2, x1², x2²) = SSR(x1, x2, x1x2, x1², x2²)/5.
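The Type I (sequential) partition can be reproduced numerically by fitting the nested models in order and taking differences of the error SS; the design and responses below are synthetic, made up purely for illustration (they are not the tool life data):

```python
import numpy as np

# Sequential (Type I) SS for the second-order model on a synthetic CCD:
# each increment is SSE(smaller model) - SSE(extended model).
rng = np.random.default_rng(7)
r2 = np.sqrt(2)
x1 = np.array([-1, 1, -1, 1, -r2, r2, 0, 0, 0, 0, 0, 0], float)
x2 = np.array([-1, -1, 1, 1, 0, 0, -r2, r2, 0, 0, 0, 0], float)
y = 40 + 3*x1 - 5*x2 - 4*x1**2 - 3*x2**2 + 2*x1*x2 + rng.normal(0, 1, x1.size)

def sse(cols):
    """Error SS from an OLS fit with an intercept plus the given columns."""
    X = np.column_stack([np.ones(y.size)] + cols)
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return resid @ resid

sse0 = sse([])                                   # intercept only
sse_fo = sse([x1, x2])                           # + first-order terms
sse_twi = sse([x1, x2, x1 * x2])                 # + interaction
sse_pq = sse([x1, x2, x1 * x2, x1**2, x2**2])    # + pure quadratics

ssr_fo = sse0 - sse_fo      # SSR(x1, x2)
ssr_twi = sse_fo - sse_twi  # SSR(x1x2 | x1, x2)
ssr_pq = sse_twi - sse_pq   # SSR(x1^2, x2^2 | x1, x2, x1x2)
print(ssr_fo, ssr_twi, ssr_pq, ssr_fo + ssr_twi + ssr_pq)
```

By construction the three increments telescope, so their sum equals the SS for the full optimization model, mirroring the partition on the slide.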

55 Example: Tool Life Experiment
The SSE can also be partitioned into two parts: the SS due to lack-of-fit (SSLOF) and the SS due to pure error (SSPE).
SSPE = 260.0, which has 5 df and is computed from the m = 6 replicate observations at the design center.
SSLOF = 110.8, which has 3 df, can be attributed to error in the specification of the response surface model, or rather, the lack-of-fit of the specified model.
For testing the null hypothesis of no lack-of-fit, we have
F = MSLOF/MSPE = (110.8/3)/(260.0/5) ≈ 0.71,
which follows an F3,5 distribution. The p-value is 0.586, which indicates that there is not a statistically significant lack-of-fit for the optimization model.
Note that the estimate of pure error provides an unbiased estimate of the experimental error variance σ². Therefore, we will test the different sources of variation due to the model against the pure error (and not the error due to the residuals).
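The lack-of-fit F statistic follows directly from the reported SS values and their degrees of freedom:

```python
# Lack-of-fit F statistic for the tool life example, from the SS
# partition reported in the ANOVA table:
# SSLOF = 110.8 on 3 df, SSPE = 260.0 on 5 df.
ss_lof, df_lof = 110.8, 3
ss_pe, df_pe = 260.0, 5

ms_lof = ss_lof / df_lof   # mean square for lack of fit
ms_pe = ss_pe / df_pe      # mean square for pure error
f_stat = ms_lof / ms_pe    # compared against an F(3, 5) distribution

print(f"MSLOF = {ms_lof:.2f}, MSPE = {ms_pe:.2f}, F = {f_stat:.3f}")
```

The resulting F ≈ 0.71 is well below 1, consistent with the reported p-value of 0.586 and no evidence of lack-of-fit.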

56 Example: Tool Life Experiment
A test of H0: β1 = β2 = β11 = β22 = β12 = 0 for the complete optimization model is
F = MSR(x1, x2, x1x2, x1², x2²)/MSPE = 42.10,
which follows an F5,5 distribution. The p-value is 4.344e-04, which means we reject the null hypothesis and claim that the optimization model is significant.
A test of H0: β1 = β2 = 0 for the linear component of the model is
F = MSR(x1, x2)/MSPE = 57.05,
which follows an F2,5 distribution. The p-value is 3.612e-04, which means we reject the null hypothesis and claim that the linear component of the model is significant.

57 Example: Tool Life Experiment
A test of H0: β12 = 0 for the interaction component of the model is
F = MSR(x1x2 | x1, x2)/MSPE = 17.89,
which follows an F1,5 distribution. The p-value is 8.252e-03, which means we reject the null hypothesis and claim that the interaction term is significant.
A test of H0: β11 = β22 = 0 for the quadratic component of the model is
F = MSR(x1², x2² | x1, x2, x1x2)/MSPE = 39.27,
which follows an F2,5 distribution. The p-value is 8.766e-04, which means we reject the null hypothesis and claim that the quadratic component of the model is significant.

58 Location of Coordinates for the Stationary Point
Estimates of the coordinates for the stationary point on an estimated response surface, as well as the estimated response at the stationary point, provide a more specific characterization of the response surface. For k factors, let
ŷ = β̂0 + xᵀb + xᵀBx
be the estimated quadratic response surface model with all two-way interactions. In the above, xᵀ = (x1, x2, ..., xk) and bᵀ = (β̂1, β̂2, ..., β̂k) are each k-dimensional vectors and
B = [ β̂11    β̂12/2  β̂13/2  ...  β̂1k/2
      β̂12/2  β̂22    β̂23/2  ...  β̂2k/2
      β̂13/2  β̂23/2  β̂33    ...  β̂3k/2
      ...
      β̂1k/2  β̂2k/2  β̂3k/2  ...  β̂kk ].
The stationary point is found by setting
∂ŷ/∂x = b + 2Bx = 0 ⟹ xSP = −(1/2)B⁻¹b,
which, when substituted into the estimated response function, yields the estimated response ŷSP = β̂0 + xSPᵀb/2 at the stationary point.
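A minimal numerical sketch of these formulas, using made-up coefficient values (not the tool life estimates):

```python
import numpy as np

# Locating the stationary point x_SP = -(1/2) B^{-1} b of a fitted
# quadratic surface yhat = b0 + x'b + x'Bx. The coefficients are
# illustrative values only.
b0 = 40.0
b = np.array([2.0, -1.5])          # (beta1_hat, beta2_hat)
B = np.array([[-3.0, 0.5],         # diagonal: beta_ii_hat
              [0.5, -2.0]])        # off-diagonal: beta_ij_hat / 2

x_sp = -0.5 * np.linalg.solve(B, b)

# Sanity check: the gradient b + 2 B x vanishes at the stationary point
print("gradient at x_SP:", b + 2 * B @ x_sp)

# The shortcut yhat_SP = b0 + x_SP' b / 2 matches direct evaluation
y_sp_formula = b0 + x_sp @ b / 2
y_sp_direct = b0 + x_sp @ b + x_sp @ B @ x_sp
print(y_sp_formula, y_sp_direct)
```

The check confirms the shortcut for ŷSP, which follows because BxSP = −b/2 at the stationary point.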

59 The Canonical Form
The canonical form of the quadratic response equation is given by
ŷ = ŷSP + Σᵢ₌₁ᵏ λi Zi², (1)
where the λi are the eigenvalues of the matrix B and the Zi are variables associated with the rotated axes that correspond to the axes of the contours of the response surface. The origin for the rotated coordinate system is the stationary point, with all Zi = 0 and response ŷSP.
The eigenvalues of B are the roots of the characteristic equation |B − λI| = 0. The relationship between B and the λi is Bmi = λimi, i = 1, ..., k, where the mi are the eigenvectors corresponding to the λi, and note that ‖mi‖² = 1.
The relationship between the coded versions of x and the vector of canonical variables Z is Z = Mᵀ(x − xSP), where the column vectors of M are the normalized eigenvectors mi.
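A small numerical check of the canonical form (1), again using made-up coefficients rather than the tool life estimates:

```python
import numpy as np

# Canonical analysis: rotate to Z = M'(x - x_SP) so that
# yhat = yhat_SP + sum_i lambda_i Z_i^2. Coefficients are illustrative.
b0 = 40.0
b = np.array([2.0, -1.5])
B = np.array([[-3.0, 0.5],
              [0.5, -2.0]])

x_sp = -0.5 * np.linalg.solve(B, b)
y_sp = b0 + x_sp @ b / 2.0

lam, M = np.linalg.eigh(B)   # eigenvalues and orthonormal eigenvectors of B

def yhat(x):
    return b0 + x @ b + x @ B @ x

# Verify the canonical form at an arbitrary point
x = np.array([0.3, -0.8])
Z = M.T @ (x - x_sp)
print(yhat(x), y_sp + np.sum(lam * Z**2))
```

Both printed values agree, since (x − xSP)ᵀB(x − xSP) = Σ λi Zi² when M diagonalizes B; here both eigenvalues are negative, so the stationary point is a maximum.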

60 Example: Tool Life Experiment
Recall the estimated response surface model for the tool life experiment from the earlier output. Setting the partial derivatives equal to 0 produces the following equations:
∂ŷ/∂x1 = β̂1 + 2β̂11x1 + β̂12x2 = 0
∂ŷ/∂x2 = β̂2 + β̂12x1 + 2β̂22x2 = 0.
The solutions to the above equations are x1(SP) = −0.156 and x2(SP) = 0.665. The estimated response at the stationary point, ŷSP, is obtained by substituting these values into the estimated response equation.
Given x1 = (V − 400)/200 and x2 = (D − 0.075)/0.025, the values of lathe speed (V) and cutting depth (D) at the stationary point are
V = 400 + (−0.156)(200) = 368.8
D = 0.075 + (0.665)(0.025) ≈ 0.0916.

61 Example: Tool Life Experiment
The characteristic equation is
|B − λI| = 0,
where B is the 2 × 2 matrix of estimated pure quadratic and interaction coefficients; this yields a quadratic equation in λ. The roots of this quadratic equation are the eigenvalues λ1 and λ2, both of which are negative here (λ1 = −6.92), indicating that the stationary point is a maximum. Thus, the canonical equation using ŷSP found above is
ŷ = ŷSP + λ1Z1² + λ2Z2².
The matrix M of normalized eigenvectors has columns m1 and m2. The coordinate of the stationary point is (x1,SP, x2,SP) = (−0.156, 0.665), and the relationship between the canonical variables and the coded factor variables is
(Z1, Z2)ᵀ = Mᵀ (x1 + 0.156, x2 − 0.665)ᵀ,
so that each Zi is a linear combination of (x1 + 0.156) and (x2 − 0.665).

62 Example: Tool Life Experiment
Note that the previous quantities can also be obtained in R:
[R output: the stationary point of the response surface in coded units (speed = −0.156, depth = 0.665), the stationary point in original units (Speed ≈ 368.8, Depth ≈ 0.0916), and the eigenanalysis of B (eigenvalues and eigenvectors for speed and depth)]

63 Example: Tool Life Experiment
To the right is a contour plot of the estimated tool life based on the second-order response surface model with an interaction. We have marked where the stationary point is located, which tells us the ideal levels of lathe speed and cutting depth for maximizing tool life. The orientation of the contours highlights the interaction between the two factors (otherwise we would simply see circular contours instead of elliptical contours).
[Figure: contours of tool life, with Speed on the horizontal axis and Depth on the vertical axis]

64 Example: Tool Life Experiment
To the right is a perspective plot of the estimated tool life based on the second-order response surface model with an interaction. The same conclusions can be drawn here; however, such a plot allows us to better visualize the slopes as the levels of a particular factor increase or decrease.
[Figure: perspective (surface) plot of tool life as a function of Speed and Depth]

65 Outline of Topics
1 First-Order Models and Visualizations
2 Second-Order Models and Composite Designs
3 Optimality Criteria

66 Overview
The standard response surface designs, such as the central composite design, the Box-Behnken design, and the face-centered design, are widely used because they are quite general and flexible. If the experimental region is either a cube or a sphere, a standard response surface design will typically be applicable to the problem. However, occasionally an experimenter encounters a situation where a standard response surface design is not the obvious choice. This situation leads us to optimal design theory.

67 Optimal Design Theory
Optimal design theory is a measure-theoretic approach in which an experimental design is viewed in terms of a design measure. In other words, optimal design theory sets forth criteria for us to select the best design from a class of candidate designs. Note that while the criteria are rooted in measure theory, we forego such discussions and merely present some of the more common criteria. We also note that some criteria focus on good estimation of the model parameters, while others focus on good prediction in the design region.
Design optimality criteria are characterized by letters of the alphabet and, as a result, are often called alphabetic optimality criteria. In our discussion below, let Ξ be the set of all continuous designs on Ω (our design space) and ξ ∈ Ξ be a particular design. The designs are treated as probability measures on Ω.

68 The Moment Matrix
One approach that focuses on good model parameter estimation is based on the notion that the experimental design should be chosen so as to achieve certain properties in the moment matrix:
M = XᵀX/N,
where N is, again, the design size. Note that M is not the same matrix as in our response surface model example from earlier. The inverse of M,
M⁻¹ = N(XᵀX)⁻¹,
is the scaled dispersion matrix and contains the variances and covariances of the regression coefficients scaled by N/σ². It further turns out that the determinant of the moment matrix,
|M| = |XᵀX|/Nᵖ,
where p is the number of parameters in the model, is helpful for design optimality.
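For instance, M and its determinant can be computed directly for a toy first-order design (a 2² factorial plus two center runs, chosen here purely for illustration):

```python
import numpy as np

# Moment matrix M = X'X / N for a first-order model in two factors,
# using a 2^2 factorial plus two center points (a toy design).
design = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1], [0, 0], [0, 0]],
                  dtype=float)
N = len(design)

X = np.column_stack([np.ones(N), design])  # model matrix: intercept, x1, x2
M = X.T @ X / N

print(M)
print("det(M) =", np.linalg.det(M))
```

Because the factorial columns are orthogonal and centered, M is diagonal here: diag(1, 2/3, 2/3), with determinant 4/9.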

69 D-Optimality and A-Optimality
A D-optimal design is one in which |M| = |XᵀX|/Nᵖ is maximized (or, equivalently, one may minimize Nᵖ|(XᵀX)⁻¹|); that is,
max_{ξ∈Ξ} |M(ξ)|.
The D-efficiency of a design ξ* ∈ Ξ is
D_eff = ( |M(ξ*)| / max_{ξ∈Ξ} |M(ξ)| )^{1/p}.
Recall that the variances of the regression coefficients appear on the diagonal of (XᵀX)⁻¹. An A-optimal design is one that minimizes the trace of M⁻¹; i.e.,
min_{ξ∈Ξ} tr(M(ξ))⁻¹.
The A-efficiency of a design ξ* ∈ Ξ is
A_eff = min_{ξ∈Ξ} tr(M(ξ))⁻¹ / tr(M(ξ*))⁻¹.
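A minimal sketch comparing two invented candidate designs under the D- and A-criteria (larger |M| is better for D; smaller tr M⁻¹ is better for A):

```python
import numpy as np

# Comparing two toy candidate designs for a first-order model in two
# factors by the D- and A-criteria.
def moment(design):
    N = len(design)
    X = np.column_stack([np.ones(N), design])
    return X.T @ X / N

# Candidate 1: 2^2 factorial; candidate 2: the same runs pulled halfway
# toward the center of the region.
d1 = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
d2 = 0.5 * d1

for name, d in [("factorial", d1), ("shrunken", d2)]:
    M = moment(d)
    print(name,
          "det(M) =", round(np.linalg.det(M), 4),
          "tr(M^-1) =", round(np.trace(np.linalg.inv(M)), 4))
```

Spreading the points to the boundary wins on both criteria, reflecting the general tendency of D- and A-optimal designs to push mass toward the edge of the region for first-order models.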

70 Other Optimality Criteria
D-optimality and A-optimality are both geared towards optimizing estimation of the model parameters within our design space. Some other criteria, which we merely state, include:
C-optimality: this criterion minimizes the variance of a best linear unbiased estimator of a predetermined linear combination of model parameters.
E-optimality: this criterion maximizes the minimum eigenvalue of the information matrix.
T-optimality: this criterion maximizes the trace of the information matrix.

71 Optimizing with Respect to Prediction Variance
However, we might be interested in a criterion that optimizes our design with respect to prediction variance. For such criteria, we use the scaled prediction variance as part of our measure:
v(x_h) = N Var(Ŷ_h)/σ² = N x_hᵀ(XᵀX)⁻¹x_h,
where x_h reflects a specified location in the design space as well as the nature of the model. In particular, one might be interested in designs for which the maximum of v(x) in the region of the design is not too large. We seek to protect against the worst-case prediction variance, since when we use the results for our analysis, we may wish to predict new response values anywhere in the design region.

72 G-Optimality
A G-optimal design is one in which we have
min_{ξ∈Ξ} ( max_{x∈R} v(x) ),
which is equivalent to
min_{ξ∈Ξ} ( max_{x∈R} { x_hᵀ(M(ξ))⁻¹x_h } ).
The G-efficiency of a design ξ* ∈ Ξ is
G_eff = p / max_{x∈R} v(x),
which results from the fact that v(x) is a scale-free quantity whose maximum over the design region can be no smaller than p. Some other criteria for optimizing with respect to prediction variance, which we merely state, include:
I-optimality: this criterion seeks to minimize the average prediction variance over the design space.
V-optimality: this criterion seeks to minimize the average prediction variance over a set of m specific points.
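As a sketch of computing v(x) and the G-efficiency, assume a first-order model on the square region [−1, 1]² with a 2² factorial design (a toy setup, not from the example); the maximum of v(x) is found by a grid search:

```python
import numpy as np

# Scaled prediction variance v(x) = N x_h'(X'X)^{-1} x_h for a
# first-order model on the 2^2 factorial, maximized over a grid on
# [-1, 1]^2; G-efficiency is p / max v(x).
design = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
N, p = len(design), 3
X = np.column_stack([np.ones(N), design])
XtX_inv = np.linalg.inv(X.T @ X)

grid = np.linspace(-1, 1, 21)
v_max = 0.0
for x1 in grid:
    for x2 in grid:
        xh = np.array([1.0, x1, x2])       # model vector at the point
        v_max = max(v_max, N * xh @ XtX_inv @ xh)

print("max v(x) =", v_max, " G-efficiency =", p / v_max)
```

Here v(x) = 1 + x1² + x2², so the maximum over the square is exactly p = 3 at the corners and the G-efficiency is 1: the 2² factorial is G-optimal for the first-order model on this region.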

73 Remark on Optimality
It should be emphasized that the notion of design optimality considers an experimental design as a probability measure, so most finite designs (i.e., having N < ∞ design points) will naturally have design efficiencies less than 1. As a result, smaller designs will tend to have smaller efficiencies. Moreover, numerical algorithms are used to construct optimal (or near-optimal) designs, perhaps the most common of which is Fedorov's exchange algorithm.

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model Topic 17 - Single Factor Analysis of Variance - Fall 2013 One way ANOVA Cell means model Factor effects model Outline Topic 17 2 One-way ANOVA Response variable Y is continuous Explanatory variable is

More information

Design of Engineering Experiments Part 5 The 2 k Factorial Design

Design of Engineering Experiments Part 5 The 2 k Factorial Design Design of Engineering Experiments Part 5 The 2 k Factorial Design Text reference, Special case of the general factorial design; k factors, all at two levels The two levels are usually called low and high

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Diagnostics and Transformations Part 2

Diagnostics and Transformations Part 2 Diagnostics and Transformations Part 2 Bivariate Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University Multilevel Regression Modeling, 2009 Diagnostics

More information

Appendix IV Experimental Design

Appendix IV Experimental Design Experimental Design The aim of pharmaceutical formulation and development is to develop an acceptable pharmaceutical formulation in the shortest possible time, using minimum number of working hours and

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Performing response surface analysis using the SAS RSREG procedure

Performing response surface analysis using the SAS RSREG procedure Paper DV02-2012 Performing response surface analysis using the SAS RSREG procedure Zhiwu Li, National Database Nursing Quality Indicator and the Department of Biostatistics, University of Kansas Medical

More information

Mathematics for Economics MA course

Mathematics for Economics MA course Mathematics for Economics MA course Simple Linear Regression Dr. Seetha Bandara Simple Regression Simple linear regression is a statistical method that allows us to summarize and study relationships between

More information

Lecture 10 Multiple Linear Regression

Lecture 10 Multiple Linear Regression Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable

More information

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46 BIO5312 Biostatistics Lecture 10:Regression and Correlation Methods Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/1/2016 1/46 Outline In this lecture, we will discuss topics

More information

Lecture 10. Factorial experiments (2-way ANOVA etc)

Lecture 10. Factorial experiments (2-way ANOVA etc) Lecture 10. Factorial experiments (2-way ANOVA etc) Jesper Rydén Matematiska institutionen, Uppsala universitet jesper@math.uu.se Regression and Analysis of Variance autumn 2014 A factorial experiment

More information

DESIGN AND ANALYSIS OF EXPERIMENTS Third Edition

DESIGN AND ANALYSIS OF EXPERIMENTS Third Edition DESIGN AND ANALYSIS OF EXPERIMENTS Third Edition Douglas C. Montgomery ARIZONA STATE UNIVERSITY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore Contents Chapter 1. Introduction 1-1 What

More information

Solution to Final Exam

Solution to Final Exam Stat 660 Solution to Final Exam. (5 points) A large pharmaceutical company is interested in testing the uniformity (a continuous measurement that can be taken by a measurement instrument) of their film-coated

More information

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing STAT763: Applied Regression Analysis Multiple linear regression 4.4 Hypothesis testing Chunsheng Ma E-mail: cma@math.wichita.edu 4.4.1 Significance of regression Null hypothesis (Test whether all β j =

More information

Regression With a Categorical Independent Variable: Mean Comparisons

Regression With a Categorical Independent Variable: Mean Comparisons Regression With a Categorical Independent Variable: Mean Lecture 16 March 29, 2005 Applied Regression Analysis Lecture #16-3/29/2005 Slide 1 of 43 Today s Lecture comparisons among means. Today s Lecture

More information

Regression Models. Chapter 4. Introduction. Introduction. Introduction

Regression Models. Chapter 4. Introduction. Introduction. Introduction Chapter 4 Regression Models Quantitative Analysis for Management, Tenth Edition, by Render, Stair, and Hanna 008 Prentice-Hall, Inc. Introduction Regression analysis is a very valuable tool for a manager

More information

Lecture 14 Simple Linear Regression

Lecture 14 Simple Linear Regression Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

Lecture 4 Multiple linear regression

Lecture 4 Multiple linear regression Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters

More information

ST430 Exam 2 Solutions

ST430 Exam 2 Solutions ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving

More information

Chapter 6 The 2 k Factorial Design Solutions

Chapter 6 The 2 k Factorial Design Solutions Solutions from Montgomery, D. C. (004) Design and Analysis of Experiments, Wiley, NY Chapter 6 The k Factorial Design Solutions 6.. A router is used to cut locating notches on a printed circuit board.

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

Outline Introduction OLS Design of experiments Regression. Metamodeling. ME598/494 Lecture. Max Yi Ren

Outline Introduction OLS Design of experiments Regression. Metamodeling. ME598/494 Lecture. Max Yi Ren 1 / 34 Metamodeling ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 1, 2015 2 / 34 1. preliminaries 1.1 motivation 1.2 ordinary least square 1.3 information

More information

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is

More information

Basic Business Statistics 6 th Edition

Basic Business Statistics 6 th Edition Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based

More information

Response Surface Methodology IV

Response Surface Methodology IV LECTURE 8 Response Surface Methodology IV 1. Bias and Variance If y x is the response of the system at the point x, or in short hand, y x = f (x), then we can write η x = E(y x ). This is the true, and

More information

Statistics for Managers using Microsoft Excel 6 th Edition

Statistics for Managers using Microsoft Excel 6 th Edition Statistics for Managers using Microsoft Excel 6 th Edition Chapter 13 Simple Linear Regression 13-1 Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of

More information

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Correlation and the Analysis of Variance Approach to Simple Linear Regression Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation

More information

2.2 Classical Regression in the Time Series Context

2.2 Classical Regression in the Time Series Context 48 2 Time Series Regression and Exploratory Data Analysis context, and therefore we include some material on transformations and other techniques useful in exploratory data analysis. 2.2 Classical Regression

More information

Chapter Learning Objectives. Regression Analysis. Correlation. Simple Linear Regression. Chapter 12. Simple Linear Regression

Chapter Learning Objectives. Regression Analysis. Correlation. Simple Linear Regression. Chapter 12. Simple Linear Regression Chapter 12 12-1 North Seattle Community College BUS21 Business Statistics Chapter 12 Learning Objectives In this chapter, you learn:! How to use regression analysis to predict the value of a dependent

More information

Business Statistics. Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220. Dr. Mohammad Zainal

Business Statistics. Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220. Dr. Mohammad Zainal Department of Quantitative Methods & Information Systems Business Statistics Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220 Dr. Mohammad Zainal Chapter Goals After completing

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 370 Regression models are used to study the relationship of a response variable and one or more predictors. The response is also called the dependent variable, and the predictors

More information

Variance Decomposition and Goodness of Fit

Variance Decomposition and Goodness of Fit Variance Decomposition and Goodness of Fit 1. Example: Monthly Earnings and Years of Education In this tutorial, we will focus on an example that explores the relationship between total monthly earnings

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

6. Multiple Linear Regression

6. Multiple Linear Regression 6. Multiple Linear Regression SLR: 1 predictor X, MLR: more than 1 predictor Example data set: Y i = #points scored by UF football team in game i X i1 = #games won by opponent in their last 10 games X

More information

Longitudinal Data Analysis of Health Outcomes

Longitudinal Data Analysis of Health Outcomes Longitudinal Data Analysis of Health Outcomes Longitudinal Data Analysis Workshop Running Example: Days 2 and 3 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning SMA 6304 / MIT 2.853 / MIT 2.854 Manufacturing Systems Lecture 10: Data and Regression Analysis Lecturer: Prof. Duane S. Boning 1 Agenda 1. Comparison of Treatments (One Variable) Analysis of Variance

More information