Performing response surface analysis using the SAS RSREG procedure

Similar documents
7.3 Ridge Analysis of the Response Surface

Response Surface Methodology

Data Analyses in Multivariate Regression Chii-Dean Joey Lin, SDSU, San Diego, CA

7 The Analysis of Response Surfaces

Paper: ST-161. Techniques for Evidence-Based Decision Making Using SAS Ian Stockwell, The Hilltop UMBC, Baltimore, MD

Generating Half-normal Plot for Zero-inflated Binomial Regression

SAS/STAT 15.1 User s Guide The GLMMOD Procedure

Chapter 31 The GLMMOD Procedure. Chapter Table of Contents

Simple linear regression

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6

Qinlei Huang, St. Jude Children s Research Hospital, Memphis, TN Liang Zhu, St. Jude Children s Research Hospital, Memphis, TN

7. Response Surface Methodology (Ch.10. Regression Modeling Ch. 11. Response Surface Methodology)

SAS/STAT 12.3 User s Guide. The GLMMOD Procedure (Chapter)

The program for the following sections follows.

Topic 14: Inference in Multiple Regression

Practice of SAS Logistic Regression on Binary Pharmacodynamic Data Problems and Solutions. Alan J Xiao, Cognigen Corporation, Buffalo NY

STATISTICS 479 Exam II (100 points)

Application of Ghosh, Grizzle and Sen s Nonparametric Methods in. Longitudinal Studies Using SAS PROC GLM

Handout 1: Predicting GPA from SAT

Topic 28: Unequal Replication in Two-Way ANOVA

Tornado Inflicted Damages Pattern

Statistical Techniques II EXST7015 Simple Linear Regression

Overview Scatter Plot Example

Lab # 11: Correlation and Model Fitting

Ratio of Polynomials Fit One Variable

The SEQDESIGN Procedure

Chapter 8 Quantitative and Qualitative Predictors

General Linear Model (Chapter 4)

Contents. Response Surface Designs. Contents. References.

STA 303H1F: Two-way Analysis of Variance Practice Problems

Lecture 1 Linear Regression with One Predictor Variable.p2

Implementation of Pairwise Fitting Technique for Analyzing Multivariate Longitudinal Data in SAS

11 Factors, ANOVA, and Regression: SAS versus Splus

Chapter 1 Linear Regression with One Predictor

Design and Analysis of Experiments

STOR 455 STATISTICAL METHODS I

Assignment 9 Answer Keys

Answer Keys to Homework#10

jh page 1 /6

STAT 350. Assignment 4

SAS Commands. General Plan. Output. Construct scatterplot / interaction plot. Run full model

df=degrees of freedom = n - 1

DESIGN OF EXPERIMENT ERT 427 Response Surface Methodology (RSM) Miss Hanna Ilyani Zulhaimi

Power Analysis for One-Way ANOVA

RANDOM and REPEATED statements - How to Use Them to Model the Covariance Structure in Proc Mixed. Charlie Liu, Dachuang Cao, Peiqi Chen, Tony Zagar

The ARIMA Procedure: The ARIMA Procedure

Metabolite Identification and Characterization by Mining Mass Spectrometry Data with SAS and Python

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model

Chapter 6 Multiple Regression

Six Sigma Black Belt Study Guides

Analysis of Variance. Source DF Squares Square F Value Pr > F. Model <.0001 Error Corrected Total

Math 3330: Solution to midterm Exam

DETERMINATION OF RESPONSE SURFACE FUNCTION USING REPEATED MEASURES

BE640 Intermediate Biostatistics 2. Regression and Correlation. Simple Linear Regression Software: SAS. Emergency Calls to the New York Auto Club

171:162 Design and Analysis of Biomedical Studies, Summer 2011 Exam #3, July 16th

Lecture 3: Inference in SLR

Topic 20: Single Factor Analysis of Variance

Chapter 15 The PDLREG Procedure. Chapter Table of Contents

Chapter 8 (More on Assumptions for the Simple Linear Regression)

SAS/STAT 14.3 User s Guide The GLM Procedure

Reducing Computation Time for the Analysis of Large Social Science Datasets

GMM Logistic Regression with Time-Dependent Covariates and Feedback Processes in SAS TM

MATH 644: Regression Analysis Methods

Outline. Topic 22 - Interaction in Two Factor ANOVA. Interaction Not Significant. General Plan

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat).

J. Response Surface Methodology

Chapte The McGraw-Hill Companies, Inc. All rights reserved.

Lecture 10 Multiple Linear Regression

Lecture 10: 2 k Factorial Design Montgomery: Chapter 6

A Short Introduction to Curve Fitting and Regression by Brad Morantz

Dynamic Determination of Mixed Model Covariance Structures. in Double-blind Clinical Trials. Matthew Davis - Omnicare Clinical Research

Ex-Ante Forecast Model Performance with Rolling Simulations

Name: Biostatistics 1 st year Comprehensive Examination: Applied in-class exam. June 8 th, 2016: 9am to 1pm

ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION

Unit 12: Response Surface Methodology and Optimality Criteria

Analysis of Covariance (ANCOVA) with Two Groups

Analyzing and Interpreting Continuous Data Using JMP

Statistical Modelling in Stata 5: Linear Models

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Lecture 18 Miscellaneous Topics in Multiple Regression

Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is

Outline. Topic 19 - Inference. The Cell Means Model. Estimates. Inference for Means Differences in cell means Contrasts. STAT Fall 2013

Chapter 2. Review of Mathematics. 2.1 Exponents

A SAS/AF Application For Sample Size And Power Determination

Response-surface illustration

Paper PO-27 Does SAS Distance Measurement Differ from ArcGIS Distance Measurement?

Getting Correct Results from PROC REG

Orthogonal and Non-orthogonal Polynomial Constrasts

SAS/STAT 15.1 User s Guide The SEQDESIGN Procedure

BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) STATA Users

Appendix A Summary of Tasks. Appendix Table of Contents

PROC LOGISTIC: Traps for the unwary Peter L. Flom, Independent statistical consultant, New York, NY

Multicollinearity Exercise

Model Selection Procedures

PLS205 Lab 2 January 15, Laboratory Topic 3

Basic Business Statistics 6 th Edition

Charity Quick, Rho, Inc, Chapel Hill, NC Paul Nguyen, Rho, Inc, Chapel Hill, NC

6. Multiple regression - PROC GLM

The Model Building Process Part I: Checking Model Assumptions Best Practice

Analysis of Longitudinal Data: Comparison between PROC GLM and PROC MIXED.

Transcription:

Paper DV02-2012 Performing response surface analysis using the SAS RSREG procedure Zhiwu Li, National Database Nursing Quality Indicator and the Department of Biostatistics, University of Kansas Medical Center, Kansas City, KS 66160 Ying Liu, School of dentistry, University of Missouri at Kansas City, Kansas City, MO 64110 ABSTRACT Response surface methodology is a set of technology to present the cause and/or effect relationship between factor variables and response variable. The standard procedure of response surface contains four steps: (a) design the experiment, (b) estimate the coefficient of response surface, (c) do the lack-of fit test, and (d) investigate the region of interest. This paper described the RSREG procedure, which was designed for standard response surface analysis. Application of the RSREG procedure includes estimating the coefficient of response surface, lack of fit test and canonical structure analysis and predicting the new response values. INTRODUCTION Response surface methodology is a useful tool in modeling curvature effects in many scientific areas. The general scenario requires that the response is a quantitative continuous variable; the most important function is to identify the combination of levels of factors in experimental design that leads to determine optimum conditions and save resources. The RSREG procedure is specialized for analyzing the response surface analysis. GLM procedure also can estimate the coefficient of response surface and carry out lack of fit test. The fundamental function of response surface analysis is to examine the characteristics of the fitted surface with first or second order of quantitative predictors. Give that, response surface analysis is like a regression issue. However, response surface analysis is quite different from routine regression analysis in using very unique experimental design, coded predictor variables, etc. Therefore, RSREG procedure is superior to GLM and/or REG procedure with response surface analysis because: (1) RSGRE contains canonical analysis and ridge of optimum response; (2) it requires comparably shorter model statement. The primary goal of this paper is to present an overview of RSREG procedure and how its commands used in design and analyzing response surface experiments. The second goal is to provide illustrative SAS codes in producing visual graphs to help the researcher deeply understand the design and the properties of dataset. EXAMPLE ONE The following example uses a two factor quadratic model and the data set is from Table 16.9 of Dean and Voss (1999). The experiment studied the relation between the standard deviation of a copper-plating thickness (Y) and anode-cathode separation (x1) and cathodic current density (x2) of the product. x1 x2 Y 9.5 31 5.6 9.5 41 6.45 11.5 31 4.84 11.5 41 5.19 10.5 36 4.32 10.5 36 4.25 9 36 5.76 12 36 4.42 10.5 29 5.46 10.5 43 5.81 The quadratic model for this example is written as 1

y 2 2 i = 0 + β1x1 i + β2x2i + β3x1 i + β4x2i + β5x1 ix2i β + ε We will take the following steps to analyze this data: 1. fit the model and estimate the parameter 2. use canonical analysis to examine the shape of response surface 3. use ridge analysis to look for the optimum region. i MODEL FITTING AND PARAMTER ESTIMATION The following statement invokes the RRREG procedure on data a. We require a lack of fit test on the fitted model with LACKFIT option. We will orderly illustrate the output tables from the following statement. proc rsreg data=a; model y=x1 x2/lackfit; Using an appropriate coding transformation of the data is one important aspect of responsesurface analysis. The coding way on predictors will affect the results of canonical analysis. The coding approach makes all coded variables vary over the same range and fall between -1 and 1. Therefore, each predictor has an equal share in potentially determining the dominant predictor in response surface analysis. The coded variable x1=(factor A-10.5)/1.5 and X2=(Factor B-36)/7 (Table1). statistics Coding Coefficients for the Independent Variables Factor Subtracted off Divided by x1 10.500000 1.500000 x2 36.000000 7.000000 Response Surface for Variable y Response Mean 5.210000 Root MSE 0.189920 R-Square 0.9705 Coefficient of Variation 3.6453 Table 1.1: Summary for example one Simple statistics of response of y is also showed in Table 1. R square is 0.9705, which indicates 97% of variability explained by the fitted model. Hypothesis test on linear, quadratic and crossproduct in ANOVA table indicates the linear and quadratic terms are significantly important, and interaction between x1 and x2 is not significant (Table2). 2

Type I Sum Regression DF of Squares R-Square F Value Pr > F Linear 2 2.271313 0.4645 31.49 0.0036 Quadratic 2 2.411708 0.4932 33.43 0.0032 Crossproduct 1 0.062500 0.0128 1.73 0.2584 Total Model 5 4.745521 0.9705 26.31 0.0037 Table 1.2: Analysis of Variance Parameter Estimate Standard from Coded Parameter DF Estimate Error t Value Pr > t Data Intercept 1 79.689773 13.747965 5.80 0.0044 4.293903 x1 1-7.818717 1.831147-4.27 0.0130-0.711176 x2 1-1.812593 0.330244-5.49 0.0054 0.298737 x1*x1 1 0.392600 0.080832 4.86 0.0083 0.883350 x2*x1 1-0.025000 0.018992-1.32 0.2584-0.262500 x2*x2 1 0.029413 0.003651 8.06 0.0013 1.441260 Total Model 5 4.745521 0.9705 26.31 0.0037 Sum of Residual DF Squares Mean Square F Value Pr > F Lack of Fit 3 0.1418 0.0473 19.30 0.1654 Pure Error 1 0.00245 0.00245 Total Error 4 0.144279 0.036070 Table1.3: Lack of fit test 3

The table 3 includes a breakdown of lack of fit and pure error. The test indicates the second-order model is adequate for the data (p-value=0.1654). CANONICAL ANALYSIS Canonical analysis is used to investigate the overall shape of the curvature and determine the stationary point is a maximal, minimal or saddle point. The eigenvalues and eigenvectors indicate the shape of the response surface. Positive eigenvalues direct an upwards curvature, and all and Canonical Analysis of Response Surface Based on Coded Data Critical Value Factor Coded Uncoded x1 0.392457 11.088685 x2-0.067898 35.524714 Predicted value at stationary point: 4.144209 Eigenvectors Eigenvalues x1 x2 1.470594-0.218120 0.975922 0.854015 0.975922 0.218120 Stationary point is a minimum. negative eigenvalues direct a downward curvature. Therefore, positive eigenvalue indicate an estimate stationary is a minimum, and all positive eigenvalue indicate a maximum mixture of positive and negative eigenvalues indicate a saddle point. The larger absolute eigenvalues indicate the more importance in the curvature of response surface. A canonical analysis of the surface on example one indicates that the stationary point of the fitted surface is at (11.09, 35.52) in raw data and (0.39, -0.07) (Table 4) in coded units. Both eigenvalues are positive, indicating that the stationary point is a minimum. The eigenvalue of x1(anode-cathode separation) is larger than that of x2 (cathodic current density), indicates x1 is relatively more important than x2 (Table 4). This is an ideal situation in response surface design. The previous two steps may be sufficient, since optimum point is within a range of the experimental design. If the optimum is out of the range of experiment, ridge analysis will be applied to further search for the optimum region. Example two will illustrate how to use ridge analysis. Table 1.4: Canonical Analysis for Example one 4

EXAMPLE TWO: A SADDLE SURFACE USING RIDGE ANALYSIS This data is from problem 6.15 of Kutner et al. (2004) to study the relation between patient satisfaction (Y) and three factors x1(patient age), x2(severity of illness) and x3 (anxiety level) in a hospital. There are 46 patients were collected and data set are available in the following address: http://br312.math.tntech.edu/6080/cd/kutnerdata/chapter%20%206%20data%20sets/ch06pr15.txt. The following statement invokes RSREG procedure containing LACKFIT option and ridge analysis. The statements produce Output table 2.1 through table 2.3. proc rsreg data=patient; model y=x1 x2 x3/lackfit; ridge max; Table2.1 summary statistics for example two Coding Coefficients for the Independent Variables Factor Subtracted off Divided by x1 38.500000 16.500000 x2 51.500000 10.500000 x3 2.350000 0.550000 Response Surface for Variable y: satisfaction Response Mean 61.565217 Root MSE 9.252264 R-Square 0.7695 Table2.2 shows that the model is adequate for data with p-value=0.8510. The linear regression and corssproduct showed significant contrition to the model. Note that the X1(patient s age is not significant in the analysis of variance for the model. 5

The canonical analysis (Table 2.3) indicates that the shape of predicted response surface like a saddle. The eigenvlue of -33.21 shows that the hill orientation of the saddle is more curved than valley orientation, with eigenvalues of 30.05 and 0.63, respectively. The coefficients of the associated eigenvectors show that the valley is more aligned with X3(anxiety level) and hill with X2(severity of illness). There is no a unique optimum, since a saddle point in the canonical analysis. Table2.2 Lack of fit test and analysis of variance Type I Sum Regression DF of Squares R-Square F Value Pr > F Linear 3 9120.463666 0.6822 35.51 <.0001 Quadratic 3 215.590388 0.0161 0.84 0.4812 Crossproduct 3 951.492601 0.0712 3.70 0.0202 Total Model 9 10288 0.7695 13.35 <.0001 Sum of Residual DF Squares Mean Square F Value Pr > F Lack of Fit 35 2901.257693 82.893077 0.46 0.8510 Pure Error 1 180.500000 180.500000 Total Error 36 3081.757693 85.604380 Standard Parameter DF Estimate Error t Value Pr > t Parameter Estimate from Coded Data Intercept 1 22.435576 143.810689 0.16 0.8769 58.697270 x1 1-1.308107 3.304523-0.40 0.6945-22.494445 x2 1 14.158692 7.850701 1.80 0.0797-4.121989 x3 1-218.700531 89.903816-2.43 0.0201-4.595765 x1*x1 1 0.039910 0.024817 1.61 0.1165 10.865365 x2*x1 1 0.115937 0.085696 1.35 0.1845 20.086024 x2*x2 1-0.247873 0.129187-1.92 0.0630-27.327950 x3*x1 1-3.871899 1.162199-3.33 0.0020-35.137482 x3*x2 1 2.772787 2.628167 1.06 0.2984 16.012847 x3*x3 1 46.088116 26.486581 1.74 0.0904 13.941655 Sum of Factor DF Squares Mean Square F Value Pr > F Label x1 4 3754.898605 938.724651 10.97 <.0001 patient age x2 4 357.235231 89.308808 1.04 0.3985 severity of illness x3 4 1422.924474 355.731118 4.16 0.0072 anxiety level 6

Table 2.3 Canonical analysis for example two Canonical Analysis of Response Surface Based on Coded Data Critical Value Factor Coded Uncoded Label x1 10.279168 208.106267 patient age x2 6.458802 119.317420 severity of illness x3 9.409070 7.524989 anxiety level Predicted value at stationary point: -91.847309 Eigenvectors Eigenvalues x1 x2 x3 30.054195-0.679937-0.016717 0.733080 0.633872 0.661899 0.416236 0.623407-33.208997-0.315556 0.909103-0.271949 Stationary point is a saddle point. The ridge analysis in Table 2.4 indicates that the maximum satisfaction will result from relatively young age, relatively lighter severity of illness and higher anxiety level. Table2.4 Ridge analysis for example two Estimated Ridge of Maximum Response for Variable y: satisfaction Coded Estimated Standard Uncoded Factor Values Radius Response Error x1 x2 x3 0.0 58.697270 2.658718 38.500000 51.500000 2.350000 0.1 61.121258 2.639690 36.887244 51.289162 2.346387 0.2 63.812896 2.635607 35.273517 51.085272 2.357823 0.3 66.892226 2.715762 33.724997 50.925388 2.381385 0.4 70.446485 2.978678 32.264079 50.806620 2.412237 0.5 74.527045 3.505900 30.878784 50.716552 2.446976 0.6 79.162288 4.325171 29.550451 50.645359 2.483799 0.7 84.368246 5.423290 28.263794 50.586758 2.521798 0.8 90.154432 6.775988 27.007848 50.536846 2.560501 0.9 96.526790 8.362759 25.774946 50.493141 2.599649 1.0 103.489208 10.169313 24.559691 50.454014 2.639091 GRAPHS FOR RESPONSE SURFACE The canonical analysis provided the shape of a second-order response surface, the effective graphs will make the explanation easier and help the researcher understand deeply on the data. In RSREG procedure, plot option can produce appropriate plots through ODS graphics. For example, plot =all will produce all plots in some output panels. 7

You may choose some plots, such as, plot=surface option allows you to print out contour plot(s) of all pairs of predictors, and plot=surface(3d) option will provide a three-dimension plot. The following code was used to generate plot in RSREG procedure with Example one. ods graphics on; proc rsreg data=a plots=(surface); model y=x1 x2/lackfit; ods graphics off; CREATING YOUR OWN PLOT Beyond the SAS default contour and surface graphs, you can make your own 3D graphs with GCONTOUR, G3D or G3DGRID procedures. In GCONTOUR the response to two independent variables is displayed as different contour lines. G3D is a 3-dimensional perspective representation, either as a 'sheet' of joined points or a scatter plot. In this section, we will demonstrate how to produce the various shapes of a plot. Data step allows you to create potential data points used in graphs step. The example is showed with data=grid. The predicted response would be stored in the data predict using the OUT=predict. data grid; do y=.; do x1=9 to 13 by 0.02; do x2 = 29 to 43 by 0.02; output; end; end; end; data new; set a grid; proc rsreg data=new out=predict noprint; 8

model y=x1 x2/lackfit predict; The plots from GCONTOUR procedure represent three-dimensional relationships in two dimensions. Lines or areas in a contour plot represent levels of magnitude (z) corresponding to a position (x, y) on a plane. By default, the GCONTOUR procedure automatically use seven contour levels of the contour variable, representing those levels with default colors and line types, you also can create more or less levels according to the need with levels option (see example below), meanwhile, it generates a legend that is labeled with the contour variable's name. The G3D procedure allows you to view the surface plot from different angles by rotating the X-Y plane around the Z axis, or tilting the X-Y plane. goptions /*reset=global*/ gunit=pct border cback =white colors=(black blue green red); proc gcontour data=predict; plot x1*x2=y/grid xticknum=10 yticknum=10 levels=4 to 8 by 0.5 pattern join; proc g3d data=predict; plot x1*x2=y/plot x1*x2=y/grid caxis=blue xticknum=5 yticknum=4 zticknum=6 rotate=0 to 180 by 60 tilt=0 to 90 by 15; CONCLUSION The RSREG procedure is another method to execute response surface analysis. It is easier to understand and use in graphing and canonical analysis. We would recommend learning this useful procedure when you need response surface analysis. GCONTOUR, G3D and G3dGRID are useful procedures to obtain nicer graphs. REFERENCES Dean A M and Voss D (1999). Design and Analysis of Experiments. Springer. Kutner M, Nachtsheim C, Meter J and Li W (2004) Applied Linear Statistical Methods. McGraw -Hill/Irwin. 9

CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Name: Ying Liu, Ph.D Assistant Clinical Professor, Enterprise: School of Dentistry, University of Missouri, Kansas City Kansas City, MO. 64108 Work Phone: (816)235-2066 Fax: (816) 235-5524 E-mail: liuyi2@umkc.edu SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 10