STAT:5201 Homework 5 Solutions

1. (Problem 3.4 in OLRT)

The relationship of the untransformed data is shown below. There does appear to be a decrease in adenine with increased caffeine intake. This is a completely randomized design, and an initial fit of a 1-way ANOVA model to the data is very significant, but the residuals suggest that our assumption of constant variance is not met. Here is a plot of the (studentized) residuals vs. predicted values:

[Figure: studentized residuals vs. predicted values (Yhat)]

First transformed model: log(adenine) vs. caffeine

As the variance of the observations increases with the fitted value, a log transformation may rectify this problem. Here is the relationship of the transformed data:

The 1-way ANOVA fit to the data is again very significant, and the plot of the residuals vs. predicted values looks better (seems acceptable):
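To illustrate why a log transformation can help when the spread grows with the fitted value, here is a minimal Python sketch. The two groups and their values are made up for illustration (they are not the adenine data): each group is its mean times the same multiplicative factors, so the raw standard deviation scales with the mean while the standard deviation of the logs does not.

```python
import math
import statistics

# Hypothetical data (NOT the adenine measurements): the same multiplicative
# noise factors applied to a low-mean group and a high-mean group.
factors = [0.8, 1.0, 1.25]
low = [10 * f for f in factors]
high = [100 * f for f in factors]

# Spread on the raw scale grows in proportion to the group mean.
sd_raw_low = statistics.stdev(low)
sd_raw_high = statistics.stdev(high)

# Spread on the log scale is the same in both groups.
sd_log_low = statistics.stdev([math.log(v) for v in low])
sd_log_high = statistics.stdev([math.log(v) for v in high])

print("raw SD ratio (high/low):", sd_raw_high / sd_raw_low)
print("log SD ratio (high/low):", sd_log_high / sd_log_low)
```

Real data will not be this clean, so the residual plot on the log scale is still the thing to check.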

As this is a dose-response relationship, we can try to find a simpler model than the 1-way ANOVA to model the relationship. The most complex polynomial to model the data would use 8 parameters to model the mean structure. But we're trying to simplify the situation (so I'm not going to gain anything by using something that complex), and it does look like the relationship is monotonic (decreasing in caffeine). So, I'm going to compare the quadratic fit of the transformed data and the 1-way ANOVA fit of the transformed data using a full vs. reduced test.

    proc glm data=pr3_4;
      class caffeine;
      model logadenine=caffeine;
    run;

    The GLM Procedure
    Dependent Variable: logadenine

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model                                                            <.0001
    Error
    Corrected Total

    proc glm data=pr3_4;
      model logadenine=caffeine caffeine*caffeine;
    run;

    Dependent Variable: logadenine

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model                                                            <.0001
    Error
    Corrected Total

    Source               DF   Type I SS   Mean Square   F Value   Pr > F
    caffeine                                                      <.0001
    caffeine*caffeine                                             <.0001

The F-statistic = ((SSE_reduced - SSE_full)/(7-2))/MSE_full = 0.4884, which is less than 1, so we fail to reject the null hypothesis that the quadratic is sufficient in explaining the relationship between log(adenine) and caffeine. We gained very little in explaining variability when we added the 5 extra parameters for the more complex model. According to the Type I sums of squares output, it looks like a quadratic is as simple as we can go (the quadratic term is needed, and a linear model would not be complex enough). A quadratic seems to fit well. The diagnostic plots (not shown) show that our assumptions (normality and constant variance) are reasonable.
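The full vs. reduced comparison can be sketched in Python as follows. The function name and the sums of squares in the example call are hypothetical (the values from the SAS output are not reproduced here); only the structure of the statistic follows the text, with the numerator degrees of freedom equal to the number of extra parameters in the full model.

```python
def full_vs_reduced_f(sse_reduced, df_error_reduced, sse_full, df_error_full):
    """F statistic for a reduced model nested inside a full model.

    Returns (F, numerator df, denominator df).
    """
    extra_params = df_error_reduced - df_error_full   # e.g. 7 - 2 = 5 in the text
    numerator = (sse_reduced - sse_full) / extra_params
    denominator = sse_full / df_error_full            # MSE of the full model
    return numerator / denominator, extra_params, df_error_full


# Hypothetical numbers, for illustration only.
f_stat, df_num, df_den = full_vs_reduced_f(
    sse_reduced=12.0, df_error_reduced=25,
    sse_full=10.0, df_error_full=20,
)
print(f_stat, df_num, df_den)
```

An F value below 1 means the extra parameters reduce the error sum of squares by less, per parameter, than the full model's error mean square, so the reduced model is not rejected at any usual level.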

Second transformed model: log(adenine) vs. caffeine as piecewise linear model

The coefficients of the quadratic for the intercept, linear, and quadratic terms give a quadratic model whose curve has a minimum, so it predicts that the response will begin to increase at higher levels of caffeine. There is some evidence of this in the data, but if we test for a significant difference between the mean of the log(adenine) at 25 and 50 mM of caffeine, we do not find a significant difference.

Given this information, another option (and a more creative one suggested by the book solutions) would be to fit a piece-wise linear model that decreases linearly up to 25 mM, and then remains flat after 25 mM. Here, we're only interested in estimating an intercept and slope for the line (and knowing where the flat line begins). Conceptually, all the responses from a caffeine value of 25 or greater are considered to be from one group (a flat line suggests the explanatory variable has no impact on the response).

We can fit this piece-wise linear model by creating a new explanatory variable called x. We shift the caffeine values that are <= 25 to the left by 25 units (x = caffeine - 25) and set the new explanatory variable to 0 for caffeine values that are > 25 (x = 0). When we fit the model log(adenine) = b0 + b1 x, we get a model that looks like the one below on the left. The shift doesn't change our interpretation of the slope, which is the main parameter of interest in this case. To find the predicted log response at caffeine = 0, we would look at the predicted log response in this fitted model at x = -25.

[Figure: piece-wise linear model, log(adenine) vs. caffeine]

This model does give an SSE that is lower than that of the quadratic model. Alternatively, a nonlinear function for the mean structure is something that is used in biological studies, and there may be a function that works well here.
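A minimal Python sketch of the recoding described above, assuming a break point at 25. The dose levels, the log responses, and the helper names `hinge` and `fit_line` are hypothetical and chosen so that the points lie exactly on a decreasing-then-flat line; they are not the homework data.

```python
def hinge(caffeine, knot=25.0):
    """New explanatory variable: caffeine - knot up to the knot, 0 beyond it."""
    return caffeine - knot if caffeine <= knot else 0.0


def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


# Hypothetical doses and log responses (illustration only):
# the response drops 0.04 per unit of caffeine up to 25, then stays flat.
doses = [0, 5, 10, 25, 50]
log_y = [3.0, 2.8, 2.6, 2.0, 2.0]

xs = [hinge(d) for d in doses]           # [-25, -20, -15, 0, 0]
b0, b1 = fit_line(xs, log_y)
pred_at_zero = b0 + b1 * hinge(0)        # prediction at caffeine = 0, i.e. x = -25

print(b0, b1, pred_at_zero)
```

In this parameterization b0 is the height of the flat segment and b1 is the slope of the decreasing segment, which is why the shift leaves the slope's interpretation unchanged.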

Third transformed model: log(adenine) vs. sqrt(caffeine)

This is also an option, but the log(adenine) mean response at sqrt(caffeine)=5 falls a bit below where a linear model would predict:

[Figure: logadenine vs. sqrt(caffeine)]

You could, again, be creative, and fit a linear model to all the sqrt(caffeine) values except 5, and then let the log(adenine) at 5 be its own predicted value (not related to the line) using a dummy variable.

Fourth transformed model: log(adenine) vs. treatment number

If you plot log(adenine) vs. treatment number, here is the relationship we see:

This explanatory variable (treatment group number) coincides with some unspecified transformation of caffeine. It looks like a quadratic polynomial may fit pretty well. And if you fit a cubic, you'll find that a quadratic is sufficient. Here is the ANOVA output for the quadratic:

    The GLM Procedure
    Dependent Variable: logadenine

    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model                                                            <.0001
    Error
    Corrected Total

    Source         DF   Type I SS   Mean Square   F Value   Pr > F
    group                                                   <.0001
    group*group

The diagnostics in this model may show some inadequacy, but it seems like a reasonably OK model. One nice element of using the equally spaced transformed caffeine values is that the levels of 25 and 50 don't have such a huge impact (or leverage) on the fit of the curve. But predicting the responses at values between treatment groups would take some fancy interpolation.

2. Book answer is below:

Exercise 4.2 We want to compare the control to the average of the three other treatments. We can use a contrast with coefficient 1 for control and -1/3 for the other treatments. The MSE from the ANOVA is 12.15, and the sample sizes are all 3, so the standard error of the contrast is sqrt(MSE x sum of (coefficient^2 / sample size)). There are 8 df for error, so we use the t multiplier with 8 df to obtain the confidence interval.

3. In the book answer below, they perform a one-sided test, but the question really asked for a two-sided test (the p-value would've been 2P(t8 > 1.51), also not significant). Either option is OK here.

Exercise 4.3 We can use a contrast with coefficients (1, 1, -1, -1) to compare the experienced (1 and 2) and inexperienced (3 and 4) workers. I would assume that the experienced workers should produce stronger joints, so let's test that one-sided alternative. The treatment means (the second is 6.675) are each based on three units per operator (substrate is the unit!). The MSE is .4288 with 8 degrees of freedom, which gives the standard error of the contrast and a t-statistic with 8 degrees of freedom. The upper-tail p-value is about .9, so there is no evidence that the experienced workers produce stronger joints; fortunately, there is no real evidence that the inexperienced workers are better either.

If I were doing this, I would have used (0.5, 0.5, -0.5, -0.5) to represent the average of the novices vs. the average of the experienced operators. But the test statistic will be the same either way.
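The standard errors for the contrasts in Exercises 4.2 and 4.3 can be checked with a short Python sketch. It uses the MSE values (12.15 and .4288) and the sample size of 3 quoted in the answers; the minus signs on the coefficients are an assumption, since a contrast must sum to zero, and the function name `contrast_se` is hypothetical.

```python
import math


def contrast_se(mse, coefs, ns):
    """Standard error of a contrast: sqrt(MSE * sum(c_i^2 / n_i))."""
    return math.sqrt(mse * sum(c * c / n for c, n in zip(coefs, ns)))


# Exercise 4.2: control vs. the average of the other three treatments.
# Signs on the 1/3 coefficients are assumed (the contrast must sum to zero).
se_42 = contrast_se(12.15, [1, -1 / 3, -1 / 3, -1 / 3], [3, 3, 3, 3])

# Exercise 4.3: experienced (1, 2) vs. inexperienced (3, 4) operators.
se_43 = contrast_se(0.4288, [1, 1, -1, -1], [3, 3, 3, 3])

print(round(se_42, 3), round(se_43, 3))
```

Under these assumptions the standard errors come out to about 2.32 and 0.756. Dividing an observed contrast by its standard error gives the t-statistic on 8 degrees of freedom.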

4. The book answer is below:

Exercise 4.4 Cookies 1, 3, and 5 are chewy, and 2, 4, and 6 are crispy, so the contrast with coefficients (1, -1, 1, -1, 1, -1) will compare chewy and crispy. Cookies 1, 2, and 5 are expensive, whereas 3, 4, and 6 are inexpensive, so the contrast with coefficients (1, 1, -1, -1, 1, -1) compares expensive and inexpensive. To determine orthogonality, we compute the sum of the products of the corresponding coefficients. This sum is not zero, so the contrasts are not orthogonal.

5. The book answer is below:

Part a) Problem 4.1 The ANOVA table has rows for make and Error and columns DF, SS, MS, F, and P-value; the p-value for make is on the order of 10^-6. There is strong evidence against the null hypothesis.

Part b) There are many possible contrasts of interest. You were asked to specify 3. Here are a few that you might have considered:

    Contrast                            Coefficients                     Confidence interval
    Import versus export                .25, .25, -.5, .25, .25, -.5     (-1.02, -0.38)
    Cheap versus expensive              1/3, 1/3, 1/3, -1/3, -1/3, -1/3  (-1.10, -0.50)
    Cheap versus expensive (import)     0, 0, 1, 0, 0, -1                (-2.33, -1.27)
    Ford vs. GM                         .5, -.5, 0, .5, -.5, 0           (-0.27, 0.47)

In addition, all pairwise differences may be of interest.

(The following was assigned only to students in 22S:165)

Problem 4.2 I considered four contrasts. Tests for the contrasts are made against the MSE from the ANOVA, with 120 df.

(a) This compares the control flies with treated flies. On average, control flies live 7.4 days longer. The F is 4.99 with 1, 120 df, and a p-value of .027. There is moderate evidence against the null hypothesis that treatment does not change the average lifetime.

(b) This compares the average lifetime with pregnant companions to the average lifetime with receptive companions. On average, a male lives 8.4 days longer with pregnant companions than with receptive companions. The SS is 1764, the F is 8.04 with 1, 120 df, and a p-value of .005. There is fairly strong evidence that receptive females shorten the males' lifespans.

(c) This compares the average lifetime with one companion to that with eight companions. On average, a male lives 16.4 days longer with only one female companion than with eight female companions. The F is 30.8 with 1, 120 df. There is extremely strong evidence that more females shorten a male's lifespan.

(d) This determines if the one versus eight comparison differs between receptive and pregnant females. On average, the one versus eight females effect is 9.6 days greater for receptive females than pregnant females. The F is 10.6 with 1, 120 df. There is strong evidence that addition of receptive females lowers the lifespan by a greater amount than addition of pregnant females.

Question 4.1 The treatment means are independent and normally distributed. Thus the covariance between the two contrasts is zero because of orthogonality. Normally distributed random variables with covariance 0 are independent.

Computing Notes PROC GLM in SAS has a contrast statement that allows the specification and testing
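The orthogonality check in Exercise 4.4 can be sketched in Python as follows. The signs on the coefficients are an assumption reconstructed from the chewy/crispy and expensive/inexpensive groupings in the text, equal sample sizes per cookie are assumed, and the function name `orthogonality_sum` is hypothetical.

```python
def orthogonality_sum(w, v, ns=None):
    """Sum of w_i * v_i / n_i; two contrasts are orthogonal when this is 0."""
    if ns is None:
        ns = [1] * len(w)   # assumption: equal sample sizes (any common n works)
    return sum(a * b / n for a, b, n in zip(w, v, ns))


# Cookies 1, 3, 5 are chewy; 2, 4, 6 are crispy (signs assumed).
chewy_vs_crispy = [1, -1, 1, -1, 1, -1]
# Cookies 1, 2, 5 are expensive; 3, 4, 6 are inexpensive (signs assumed).
expensive_vs_inexpensive = [1, 1, -1, -1, 1, -1]

total = orthogonality_sum(chewy_vs_crispy, expensive_vs_inexpensive)
print(total)   # nonzero, so the two contrasts are not orthogonal
```

With equal sample sizes the common n only rescales the sum, so it does not affect whether the sum is zero.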

7.

    proc import datafile="\\client\h$\iowa_classes\stat_5201_design\hw\hw5\pr3-1.xlsx"
      out=operators dbms=excel replace;
    run;

    /* average over the 4 technical replicates */
    proc means data=operators;
      by operator substrate;
      var strength;
      output out=means mean=str;
    run;

    data means;
      set means;
      keep operator substrate str;
    run;

    proc glm data=means;
      class operator;
      model str = operator;
      estimate 'novice vs. expert' operator 1 1 -1 -1;
      estimate 'novice vs. expert' operator 0.5 0.5 -0.5 -0.5;
      lsmeans operator;
    run;

Below, we see the contrast of interest, shown twice: first using coefficients of (1, 1, -1, -1) and then using (0.5, 0.5, -0.5, -0.5).

    Parameter            Estimate   Standard Error   t Value   Pr > |t|
    novice vs. expert
    novice vs. expert

We do not reject the null. There is not a statistically significant difference in mean joint strength between the novice group and the experienced group.
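To see why the two sets of coefficients lead to the same test, here is a small Python sketch. The operator means, the MSE, and the number of units per operator are hypothetical values (the SAS estimates are not reproduced here), and `contrast_t` is an illustrative helper, not part of the SAS output.

```python
import math


def contrast_t(means, coefs, mse, n):
    """Estimate, standard error, and t statistic for a contrast with equal n."""
    estimate = sum(c * m for c, m in zip(coefs, means))
    se = math.sqrt(mse * sum(c * c for c in coefs) / n)
    return estimate, se, estimate / se


# Hypothetical operator means, MSE, and units per operator (illustration only).
means = [7.0, 6.5, 7.5, 8.0]
mse, n = 0.5, 3

est1, se1, t1 = contrast_t(means, [1, 1, -1, -1], mse, n)
est2, se2, t2 = contrast_t(means, [0.5, 0.5, -0.5, -0.5], mse, n)

print(est1, se1, t1)
print(est2, se2, t2)
```

Halving the coefficients halves both the estimate and its standard error, so the t value and p-value are unchanged; only the scale of the estimate differs (a difference of sums versus a difference of averages).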
