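STA 8 Applied Linear Models: Regression Analysis
Spring 2011
Solution for Homework #6

6.2
a)
\[
X = \begin{bmatrix}
X_{11} & X_{12} & X_{11}^2 \\
X_{21} & X_{22} & X_{21}^2 \\
X_{31} & X_{32} & X_{31}^2 \\
X_{41} & X_{42} & X_{41}^2 \\
X_{51} & X_{52} & X_{51}^2
\end{bmatrix},
\qquad
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}
\]

b)
\[
X = \begin{bmatrix}
1 & X_{11} & X_{12} \\
1 & X_{21} & X_{22} \\
1 & X_{31} & X_{32} \\
1 & X_{41} & X_{42} \\
1 & X_{51} & X_{52}
\end{bmatrix},
\qquad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}
\]

6.15
a) Stem-and-leaf of X1   N = 46
Leaf Unit = 1.0

   2   2  23
   3   2  5
   3   2
  10   2  8899999
  12   3  01
  17   3  22333
  19   3  44
  23   3  6667
  23   3  8
  22   4  001
  19   4  22333
  14   4  4455
  10   4  777
   7   4  9
   6   5  0
   5   5  233
   2   5  55

There are more people in the 28-29 age range than in most of the other age groups shown here, but otherwise the distribution seems reasonably uniform.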
Stem-and-leaf of X2   N = 46
Leaf Unit = 1.0

   1   4  1
   3   4  23
   4   4  4
   9   4  66667
  19   4  8888899999
 (12)  5  000011111111
  15   5  223333
   9   5  4445
   5   5  67
   3   5  8
   2   6  0
   1   6  2

About half of the patients (22 of 46) had a severity of illness index between 48 and 51.

Stem-and-leaf of X3   N = 46
Leaf Unit = 0.010

   5  18  00000
   6  19  0
  11  20  00000
  14  21  000
  21  22  0000000
  (6) 23  000000
  19  24  0000000
  12  25  0000
   8  26  000
   5  27  0
   4  28  0
   3  29  000

Most of the patients' anxiety levels are between 2.0 and 2.4 (perhaps a moderate level). There are 3 patients, each at 2.9, who appear to have a higher anxiety level than any of the others in the study.
b) Scatter plot matrix:

[Scatter plot matrix of psat, age, severity, and anxiety]

Correlation matrix:

              psat       age  severity
age         -0.787
             0.000
severity    -0.603     0.568
             0.000     0.000
anxiety     -0.645     0.570     0.671
             0.000     0.000     0.000

Cell Contents: Pearson correlation
               P-Value

The plots of patient satisfaction vs. age show a negative linear relationship, and the plots of patient satisfaction vs. severity of illness and vs. anxiety level also indicate negative relationships (though not as strong as the one with age). The correlations between patient satisfaction and the predictor variables (-0.787, -0.603 and -0.645) support these visual findings. The scatter plots of age vs. severity and age vs. anxiety do not show strong linear relationships (the relevant correlations, 0.568 and 0.570, are moderate). Finally, the plots of severity vs. anxiety level indicate a somewhat stronger positive linear relationship (the correlation between these two variables is about 0.67).

c) The regression equation is
psat = 158 - 1.14 age - 0.442 severity - 13.5 anxiety

Predictor      Coef   SE Coef      T      P
Constant     158.49     18.13   8.74  0.000
age         -1.1416    0.2148  -5.31  0.000
severity    -0.4420    0.4920  -0.90  0.374
anxiety     -13.470     7.100  -1.90  0.065

S = 10.06   R-Sq = 68.2%   R-Sq(adj) = 65.9%

Analysis of Variance

Source           DF       SS      MS      F      P
Regression        3   9120.5  3040.2  30.05  0.000
Residual Error   42   4248.8   101.2
Total            45  13369.3

Source     DF  Seq SS
age         1  8275.4
severity    1   480.9
anxiety     1   364.2

The estimated regression function is:
\[
\hat{Y} = 158 - 1.14 X_1 - 0.442 X_2 - 13.5 X_3
\]
Interpretation of $b_2$: If age and anxiety level are held constant and the severity index is increased by 1 unit, the estimated mean patient satisfaction score decreases by 0.442.

d)
[Plot of the residuals (RESI1)]

There do not appear to be any outliers among the residuals. The residual plot shows no violation of the assumptions of the regression model.
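The summary statistics in the Minitab output of part c) are linked by a few identities, and checking them is a quick way to confirm that the table was read correctly. The sketch below is only an arithmetic check: it assumes the sequential sums of squares 8275.4, 480.9 and 364.2 and the total 13369.3 from the output, and the variable names are illustrative.

```python
import math

# Values read from the Minitab output for the patient satisfaction model.
n, p = 46, 4                      # observations; parameters (intercept + 3 slopes)
seq_ss = {"age": 8275.4, "severity": 480.9, "anxiety": 364.2}
ssto = 13369.3                    # total sum of squares

ssr = sum(seq_ss.values())        # sequential SS add up to the regression SS
sse = ssto - ssr                  # error sum of squares
msr = ssr / (p - 1)               # regression mean square
mse = sse / (n - p)               # error mean square
f_star = msr / mse                # overall F statistic
s = math.sqrt(mse)                # residual standard deviation (Minitab's S)
r_sq = ssr / ssto                 # coefficient of multiple determination
r_sq_adj = 1 - (n - 1) / (n - p) * (sse / ssto)

print(f"SSR = {ssr:.1f}, SSE = {sse:.1f}")
print(f"MSR = {msr:.1f}, MSE = {mse:.1f}, F* = {f_star:.2f}")
print(f"S = {s:.2f}, R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}")
```

The printed values should agree with the Minitab table up to rounding.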
e) All of the residual plots appear to have points randomly scattered about the 0-level, so the regression function is appropriate and there is no evidence of nonconstancy of the error variance. The normal probability plot indicates that the assumption of a normal distribution for the residuals is very reasonable (the correlation between the ordered residuals and their expected values under normality is about 0.98).

f) No, a formal lack-of-fit test cannot be conducted, since there are no repeat observations with the same levels of X1, X2 and X3.

g) Regressing the squared residuals against X1, X2 and X3 gives SSR* = 21356. SSE from the original model is 4248.8.
\[
\chi^2_{BP} = \frac{SSR^*/2}{(SSE/n)^2} = \frac{21356/2}{(4248.8/46)^2} = 1.25
\]
\[
\chi^2(.99;\, 3) = 11.34
\]
$\chi^2_{BP} = 1.25 < 11.34$, so we conclude that the error variance is constant.

6.16
a) $H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_A$: not all $\beta_i = 0$, $i = 1, 2, 3$
Reject $H_0$ if $F^* > F(1-\alpha;\, p-1,\, n-p)$
$F^* = 30.05$ (from MINITAB output)
$F(.90;\, 3,\, 42) = 2.22 < 30.05$, so reject $H_0$.
The test implies that at least one of $\beta_1$, $\beta_2$ and $\beta_3$ is not zero, i.e. at least one of age, severity and anxiety is useful in predicting patient satisfaction. The p-value of the test is 0.000 to three decimal places (from MINITAB's probability distributions feature).

b) $B = t\!\left(1 - \frac{\alpha}{2g};\, n-p\right)$
Here, $g = 3$ and $\alpha = .10$, so $B = t(.9833;\, 42) = 2.2$ (standard errors are from computer output).
$\beta_1$: $-1.1416 \pm (2.2)(0.2148)$, i.e. $(-1.6142,\ -0.6690)$
$\beta_2$: $-0.442 \pm (2.2)(0.492)$, i.e. $(-1.5244,\ 0.6404)$
$\beta_3$: $-13.47 \pm (2.2)(7.1)$, i.e. $(-29.09,\ 2.15)$
$\beta_1$, $\beta_2$ and $\beta_3$ are simultaneously contained in these intervals with family confidence of at least 90%. The slope of the age variable is the only one that is significant at a family level of $\alpha = 0.10$, because its interval is the only one that does not contain 0.

c) The $R^2$ for this model is 0.682 (SSR / SSTO), so the coefficient of multiple correlation is $R = \sqrt{0.682} = 0.826$. It indicates the strength of the linear relationship between the set of X variables and patient satisfaction.
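The Bonferroni intervals in 6.16 b) can be re-derived with a few lines of arithmetic. This is a minimal sketch, assuming the estimates and standard errors from the Minitab output and taking the multiplier B = 2.2 directly from the solution rather than computing the t quantile. The function name `bonferroni_interval` is illustrative, not part of any package.

```python
# Estimates and standard errors (b_k, s{b_k}) from the Minitab output.
estimates = {
    "age": (-1.1416, 0.2148),
    "severity": (-0.4420, 0.4920),
    "anxiety": (-13.470, 7.100),
}

# Bonferroni multiple B = t(1 - alpha/(2g); n - p) with g = 3, alpha = .10.
# Taken from the solution (assumed here, not computed).
B = 2.2


def bonferroni_interval(b, se, multiple):
    """Return (lower, upper) for b +/- multiple * se."""
    half_width = multiple * se
    return (b - half_width, b + half_width)


intervals = {
    name: bonferroni_interval(b, se, B) for name, (b, se) in estimates.items()
}

# A coefficient is significant at the family level when its interval excludes 0.
significant = [name for name, (lo, hi) in intervals.items() if lo > 0 or hi < 0]

for name, (lo, hi) in intervals.items():
    print(f"{name:9s} ({lo:9.4f}, {hi:9.4f})")
print("significant:", significant)
```

Only the age interval excludes zero, which matches the conclusion drawn in part b).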
6.17
The REG Procedure
Model: MODEL1
Dependent Variable: psat

X'X Inverse, Parameter Estimates, and SSE (inverse entries shown to four decimal places)

Variable       Intercept       Age  Severity   anxiety  Satisfaction
Intercept         3.2477    0.0092   -0.0679   -0.0673       158.491
Age               0.0092    0.0005   -0.0003   -0.0047        -1.142
Severity         -0.0679   -0.0003    0.0024   -0.0177        -0.442
anxiety          -0.0673   -0.0047   -0.0177    0.4983       -13.470
Satisfaction     158.491    -1.142    -0.442   -13.470        4248.8

Note, the matrix $(X'X)^{-1}$ consists of only the first 4 rows and first 4 columns of the above matrix obtained in SAS. The fifth row and column contain the parameter estimates $b_0$, $b_1$, $b_2$ and $b_3$, and SSE is the element in the 5th row and 5th column.

a) The confidence interval for the mean response is
\[
\hat{Y}_h \pm t(1-\alpha/2;\, n-p)\, s\{\hat{Y}_h\},
\qquad
s^2\{\hat{Y}_h\} = X_h'\, s^2\{b\}\, X_h = MSE\,\bigl(X_h'(X'X)^{-1}X_h\bigr)
\]
\[
s^2\{\hat{Y}_h\} = 101.2 \times
\begin{bmatrix} 1 & 35 & 45 & 2.2 \end{bmatrix}
\begin{bmatrix}
3.2477 & 0.0092 & -0.0679 & -0.0673 \\
0.0092 & 0.0005 & -0.0003 & -0.0047 \\
-0.0679 & -0.0003 & 0.0024 & -0.0177 \\
-0.0673 & -0.0047 & -0.0177 & 0.4983
\end{bmatrix}
\begin{bmatrix} 1 \\ 35 \\ 45 \\ 2.2 \end{bmatrix}
= 7.0756
\]
$s\{\hat{Y}_h\} = 2.66$
$\alpha = .10$, so $t(1-\alpha/2;\, n-p) = t(.95;\, 42) = 1.682$
$\hat{Y}_h = 158.49 - 1.1416(35) - 0.4420(45) - 13.470(2.2) = 69.01$
$69.01 \pm (1.682)(2.66)$
When $X_{h1} = 35$, $X_{h2} = 45$ and $X_{h3} = 2.2$, the mean patient satisfaction $E\{Y_h\}$ is in the interval (64.53, 73.49) with 90% confidence.

b) $s^2\{pred\} = MSE + s^2\{\hat{Y}_h\} = 101.2 + 7.0756 = 108.2756$
$s\{pred\} = 10.4056$
$\hat{Y}_h \pm t(1-\alpha/2;\, n-p)\, s\{pred\}$
$69.01 \pm (1.682)(10.4056)$
When $X_{h1} = 35$, $X_{h2} = 45$ and $X_{h3} = 2.2$, the satisfaction score of a new patient will be in the interval (51.51, 86.51) with 90% confidence.

7.6
$H_0: \beta_2 = \beta_3 = 0$
$H_A$: at least one of $\beta_2$ and $\beta_3$ is not equal to 0
$\alpha = 0.025$
\[
SSR(X_2, X_3 \mid X_1) = SSE(X_1) - SSE(X_1, X_2, X_3) = 5093.9 - 4248.8 = 845.1
\]
\[
F^* = \frac{SSR(X_2, X_3 \mid X_1)\,/\,[(n-2)-(n-4)]}{SSE(X_1, X_2, X_3)\,/\,(n-4)} = \frac{845.1/2}{4248.8/42} = 4.18
\]
If $F^* > F(.975;\, 2,\, 42) = 4.0327$, reject $H_0$; otherwise do not reject.
$4.0327 < 4.18$, so reject $H_0$. X2 and X3 cannot be dropped from the model given that X1 is retained.
p-value = 0.022
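The partial F test in 7.6 can be checked without statistical tables, because an F distribution with 2 numerator degrees of freedom has a closed-form tail probability. The sketch below assumes the two error sums of squares quoted in the solution. The helper names `f_sf_df1_2` and `f_ppf_df1_2` are illustrative, and both are valid only when the numerator degrees of freedom equal 2.

```python
# Sums of squares quoted in the solution to 7.6.
n = 46
sse_reduced = 5093.9      # SSE(X1), reduced model with n - 2 error df
sse_full = 4248.8         # SSE(X1, X2, X3), full model with n - 4 error df
alpha = 0.025

df_reduced = n - 2
df_full = n - 4
df_num = df_reduced - df_full             # = 2 dropped coefficients

extra_ss = sse_reduced - sse_full         # SSR(X2, X3 | X1)
f_star = (extra_ss / df_num) / (sse_full / df_full)


def f_sf_df1_2(f, df_den):
    """P(F > f) for an F(2, df_den) variable (closed form, numerator df = 2 only)."""
    return (1 + 2 * f / df_den) ** (-df_den / 2)


def f_ppf_df1_2(q, df_den):
    """Quantile of order q for F(2, df_den), obtained by inverting f_sf_df1_2."""
    return (df_den / 2) * ((1 - q) ** (-2 / df_den) - 1)


critical = f_ppf_df1_2(1 - alpha, df_full)
p_value = f_sf_df1_2(f_star, df_full)
reject = f_star > critical

print(f"SSR(X2, X3 | X1) = {extra_ss:.1f}")
print(f"F* = {f_star:.2f}, F(.975; 2, 42) = {critical:.4f}")
print(f"p-value = {p_value:.3f}, reject H0: {reject}")
```

For tests that drop a different number of coefficients, these closed forms no longer apply and a statistical library or an F table would be needed.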