Statistics 512: Solution to Homework #11

Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat).

1. Perform the two-way ANOVA without interaction for this model. Use the results of the hypothesis tests to determine whether main effects are present (significant).

Solution: According to the hypothesis tests, main effects for temperature level are present, but not for humidity level (p = 6.78 × 10^-5 for temperature and p = 0.4326 for humidity).

                                Sum of
Source             DF          Squares     Mean Square    F Value    Pr > F
Model               5      204.3216667      40.8643333      37.23    0.0002
Error               6        6.5850000       1.0975000
Corrected Total    11      210.9066667

R-Square     Coeff Var      Root MSE    propcc Mean
0.968778      5.533185      1.047616       18.93333

Source             DF        Type I SS     Mean Square    F Value    Pr > F
humidity            2        2.1216667       1.0608333       0.97    0.4326
temperature         3      202.2000000      67.4000000      61.41    <.0001

2. Plot the data vs. the temperature factor using three different lines for the three humidity levels. Based on your graph, do you think that interaction is important for this problem?

Solution: The graph (Figure 1) shows that the lines for the three levels of humidity do seem to interact across different levels of temperature. (The interaction is less pronounced when temperature is on the x-axis.) It looks like interaction may be important for this problem. (The Tukey test for additivity, however, is not significant; p = 0.2118.)

3. Use the Tukey comparison to determine all significant differences in means in the main effect for temperature.

Solution: According to the Tukey comparison, temperatures 1 and 2 are not significantly different from each other; every other pairwise comparison is significant.

Tukey Grouping         Mean      N    temperature
             A      24.8333      3    4
             B      20.7000      3    3
             C      15.3000      3    2
             C
             C      14.9000      3    1

For Problem 4, use the case hardening assembly data described in Problem 24.6 on page 1022 of the text (CH23PR06.DAT).
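For reference, a minimal SAS sketch of the Problems 1-3 analysis follows. The variable names (propcc, humidity, temperature) and their column order in ch21pr08.dat are assumptions, not taken from the data file.

    * Two-way ANOVA without interaction (Problem 1) plus Tukey
      comparisons for the temperature main effect (Problem 3);
    * variable names and column order in ch21pr08.dat are assumed;
    data sausage;
      infile 'ch21pr08.dat';
      input propcc humidity temperature;
    run;

    proc glm data=sausage;
      class humidity temperature;
      model propcc = humidity temperature;
      means temperature / tukey;
    run;

    * Interaction plot for Problem 2: mean response at each
      temperature, one line per humidity level;
    proc sgplot data=sausage;
      vline temperature / response=propcc group=humidity stat=mean markers;
    run;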

[Figure 1: Interaction plot for Problem 1; mean response vs. levels of temperature (1-4), one line per humidity level (1-3).]

4. Run the full three-way analysis of variance for these data, and check the assumptions. Summarize the results of the hypothesis tests for main and interaction effects, and your conclusions regarding the assumptions.

Solution: The main effects for chemical agent (A), temperature (B), and duration (C) are all significant, but none of the interactions are significant.

Dependent Variable: hardness

                                Sum of
Source             DF          Squares     Mean Square    F Value    Pr > F
Model               7      4772.258333      681.751190     202.98    <.0001
Error              16        53.740000        3.358750
Corrected Total    23      4825.998333

R-Square     Coeff Var      Root MSE    hardness Mean
0.988864      3.056605      1.832689         59.95833

Source             DF        Type I SS     Mean Square    F Value    Pr > F
A                   1       788.906667      788.906667     234.88    <.0001
B                   1      1539.201667     1539.201667     458.27    <.0001
A*B                 1         0.240000        0.240000       0.07    0.7926
C                   1      2440.166667     2440.166667     726.51    <.0001
A*C                 1         0.201667        0.201667       0.06    0.8095
B*C                 1         2.940000        2.940000       0.88    0.3634
A*B*C               1         0.601667        0.601667       0.18    0.6778

The variance appears to be fairly constant, though the residual plots show that the variance at the higher levels appears somewhat smaller than at the lower levels (Figure 2). This is due to the presence of two potential outliers. A Box-Cox analysis suggests a squaring transformation. After this transformation, all effects except the three-way interaction (A*B*C) are significant.
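A minimal SAS sketch of this analysis follows; the variable names and column order in CH23PR06.DAT are assumed.

    * Full three-way ANOVA for Problem 4; variable names and
      column order in CH23PR06.DAT are assumed;
    data hard;
      infile 'CH23PR06.DAT';
      input hardness A B C;
    run;

    proc glm data=hard;
      class A B C;
      model hardness = A|B|C;    * all main effects and interactions;
    run;

    * Box-Cox analysis of the response; an estimated lambda near 2
      suggests the squaring transformation used above;
    proc transreg data=hard;
      model boxcox(hardness / lambda=-2 to 2 by 0.25) = class(A|B|C);
    run;

    * Refit on the squared response;
    data hard2;
      set hard;
      hardnesssq = hardness**2;
    run;

    proc glm data=hard2;
      class A B C;
      model hardnesssq = A|B|C;
    run;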

(All diagnostic plots in this case are very much improved.)

Dependent Variable: hardnesssq

                                Sum of
Source             DF          Squares     Mean Square    F Value    Pr > F
Model               7      68976850.93      9853835.85     278.17    <.0001
Error              16        566772.49        35423.28
Corrected Total    23      69543623.42

R-Square     Coeff Var      Root MSE    hardnesssq Mean
0.991850      4.958022      188.2107           3796.085

Source             DF        Type I SS     Mean Square    F Value    Pr > F
A                   1      11433218.22     11433218.22     322.76    <.0001
B                   1      21814783.54     21814783.54     615.83    <.0001
A*B                 1        247063.22       247063.22       6.97    0.0178
C                   1      34754721.51     34754721.51     981.13    <.0001
A*C                 1        377102.94       377102.94      10.65    0.0049
B*C                 1        339773.57       339773.57       9.59    0.0069
A*B*C               1         10187.94        10187.94       0.29    0.5991

The normality assumption appears fine (both in the original and squared data), with a fairly linear qqplot and a bell-shaped histogram. See Figure 3.

Problems 5-7 refer to the hay fever relief problem 19.14 on page 868 (CH19PR14.DAT). The two active ingredients occur in the following quantities in the study:

                     Quantity (in milligrams)
Factor Level    X1 (ingredient 1)    X2 (ingredient 2)
Low                    5.0                  7.5
Medium                10.0                 10.0
High                  15.0                 12.5

5. Treating the quantities of each ingredient as quantitative variables, analyze these data using linear regression. Include linear and centered quadratic terms for each predictor and the product of the centered linear terms:

    Y_ijk = β0 + β1 x_ijk,1 + β2 x_ijk,2 + β3 x_ijk,1^2 + β4 x_ijk,2^2 + β5 x_ijk,1 x_ijk,2 + ε_ijk.

Summarize the results of this analysis.

Solution: The levels for factor A are 5, 10, and 15, and the levels for factor B are 7.5, 10, and 12.5. With regression we find that the linear, quadratic, and interaction terms are all significant in the model. For this model, R^2 = 0.9886, indicating an excellent fit. The regression coefficients are positive for the linear terms and negative for the quadratic terms. This indicates that relief increases with increasing amounts of the ingredients, but that the increase levels off at the higher amounts. The coefficient of the interaction term is positive, suggesting that relief increases further when the levels of both A and B are increased together.
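A minimal SAS sketch of this regression follows. The column order in CH19PR14.DAT is an assumption; the centering value (10 mg for each ingredient) is the mean of the levels in the table above.

    * Regression for Problem 5; column order in CH19PR14.DAT is
      assumed.  Quadratic and cross-product terms are centered at
      the ingredient means (10 mg for both ingredients);
    data fever;
      infile 'CH19PR14.DAT';
      input relief amt1 amt2;
      amt1sq = (amt1 - 10)**2;           * centered quadratic, ingredient 1;
      amt2sq = (amt2 - 10)**2;           * centered quadratic, ingredient 2;
      amt12  = (amt1 - 10)*(amt2 - 10);  * centered cross-product;
    run;

    proc reg data=fever;
      model relief = amt1 amt2 amt1sq amt2sq amt12;
      output out=diag r=resid p=pred;    * residuals for Problem 6;
    run;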

[Figure 2: Residual plots for Problem 4; residuals vs. Factor A, Factor B, Factor C, and predicted values.]

The REG Procedure
Dependent Variable: relief

Analysis of Variance

                                Sum of            Mean
Source             DF          Squares          Square    F Value    Pr > F
Model               5        370.46063        74.09213     520.63    <.0001
Error              30          4.26937         0.14231
Corrected Total    35        374.73000

Root MSE            0.37724    R-Square    0.9886
Dependent Mean      7.18333    Adj R-Sq    0.9867
Coeff Var           5.25165

Parameter Estimates

                       Parameter       Standard
Variable        DF      Estimate          Error    t Value    Pr > |t|
Intercept        1      -6.06667        0.37197     -16.31      <.0001
amt1             1       0.59500        0.01540      38.63      <.0001
amt2             1       0.87000        0.03080      28.25      <.0001
amt1sq           1      -0.03900        0.00534      -7.31      <.0001
amt2sq           1      -0.18000        0.02134      -8.43      <.0001
amt12            1       0.10350        0.00754      13.72      <.0001

[Figure 3: Univariate plots of residuals for Problem 4; histogram and normal Q-Q plot.]

6. Check the assumptions of the regression model used in Problem 5.

Solution: Scatterplots: Since we have significant quadratic terms, we expect to see some deviations from linearity vs. the first-order terms (Figure 4). Since the quadratic terms take only two distinct X values each, it is not really helpful to check for linearity with those terms (Figures 5 and 6). There appear to be some problems with constant variance.

[Figure 4: Scatterplots for Problem 6.]

Residual Plots: The residual plots (Figures 7, 8, and 9) show that there may be some problems with the constant variance assumption. The variance appears smaller for low levels of the ingredients, and there are some strange patterns in the residual vs. predictor and residual vs. interaction plots, which indicate a poor fit (linearity problems).

Normality: The qqplot (Figure 10) shows a fairly good fit to normality, with slight deviations at the tails.
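A sketch of how these diagnostics could be produced in SAS, using the diag dataset written by the OUTPUT statement in the Problem 5 sketch above:

    * Residuals vs. predicted values;
    proc sgplot data=diag;
      scatter x=pred y=resid;
      refline 0 / axis=y;
    run;

    * Histogram and normal Q-Q plot of the residuals;
    proc univariate data=diag noprint;
      histogram resid / normal;
      qqplot resid / normal(mu=est sigma=est);
    run;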

[Figure 5: Scatterplots for Problem 6.]
[Figure 6: Scatterplot for Problem 6.]

7. Give a discussion comparing the two-way ANOVA model and the regression (Problem 5) approaches to this analysis. Include a comparison of the values of R^2 for the two analyses and the conclusions drawn concerning interactions.

Solution: Both models showed that the two ingredients and their interactions were highly significant. In addition, both models provided a good fit to the data, with R^2 = 0.995664 for the ANOVA model and R^2 = 0.9886 for the regression model. The fit is slightly better for the ANOVA model, which indicates that there is some behavior that is not captured by the linear and quadratic terms in the regression model. However, this is a very small difference, indicating that the regression model also fits very well. The ANOVA model is simpler and easier to understand than the regression model, which involves quadratic terms. Also, the assumptions of constant variance and normality seem to be better satisfied by the ANOVA model. The ANOVA model has four degrees of freedom (four parameters) for modeling the interaction, since (3 - 1)(3 - 1) = 4 in this 3 x 3 design, while the regression model has only one (the cross-product term).

[Figure 7: Residual plots for Problem 6.]
[Figure 8: Residual plots for Problem 6.]

As we saw in Problem Set 9, the interaction is of a specific form, appearing primarily at the high levels of both ingredients. Since the regression model allows only one parameter for interaction, it must necessarily represent an average interaction, as opposed to the more specific four-parameter interaction of ANOVA, which treats the high-high combination separately. Thus the ANOVA model seems to be preferable.
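For completeness, a sketch of the two-way ANOVA model referenced in this comparison, reusing the fever dataset from the Problem 5 sketch; since each ingredient takes exactly three values, the quantitative variables can serve directly as class variables.

    * Two-way ANOVA model for the hay fever data (the Problem 7
      comparison); amt1 and amt2 each take three values, so they
      can be used directly as class variables;
    proc glm data=fever;
      class amt1 amt2;
      model relief = amt1|amt2;
    run;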

[Figure 9: Residual plots for Problem 6.]
[Figure 10: Q-Q plot for Problem 6.]