SMAM 314 Computer Assignment 5, due Nov 8, 2012

Data Set 1

For each of the following data sets use Minitab to
1. Make a scatterplot.
2. Fit the linear regression line.

Regression Analysis: y versus x

y = 3.62 - 0.0147 x

Predictor       Coef   SE Coef       T      P
Constant     3.62091   0.09949   36.39  0.000
x          -0.014711  0.001436  -10.24  0.000

S = 0.147890   R-Sq = 89.7%   R-Sq(adj) = 88.9%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       1  2.2947  2.2947  104.92  0.000
Residual Error  12  0.2625  0.0219
Total           13  2.5571

Unusual Observations
Obs   x       y     Fit  SE Fit  Residual  St Resid
  3  36  3.4000  3.0913  0.0559    0.3087     2.25R
R denotes an observation with a large standardized residual.

Answer the following questions for each data set on a separate sheet of paper.

(1) Based only on the scatterplot
a. Does a straight line model appear to be reasonable? Explain.
Based on the scatterplot, a straight line model appears to be reasonable because the points lie close to a straight line.
b. Does the data appear to be positively or negatively correlated? Explain.
The data appear to be negatively correlated: as x increases, y decreases.

(2) What is the equation of the linear regression line?
y = 3.62 - 0.0147x

(3) Is the linear regression line statistically significant at α = .05?
Yes. For both the t statistic and the ANOVA table the p-value is reported as 0.000, which is less than .05.

(4) What is the percentage of the variation accounted for by the regression line?
The regression line accounts for 89.7% of the variation.

(5) What is the correlation coefficient?
The correlation coefficient is r = -0.947. Its magnitude is the square root of R-Sq = 0.897, and it takes the negative sign of the slope.

Usually an adequate model
(1) Has a scatterplot reasonably close to a straight line or the curve that was fitted.
(2) Accounts for at least 80% of the variation.
(3) Has a correlation coefficient with absolute value at least 0.9. (Only relevant for a straight line.)
(4) Is statistically significant at α = .05. This in itself might not be enough though.

Write a paragraph of 3-5 sentences that indicates whether the model is adequate, with specific reference to the issues above, citing the relevant values in the computer printout.

The straight line model appears to be adequate. The scatterplot is reasonably close to a straight line. The regression line accounts for 89.7% of the variation, which exceeds 80%. The absolute value of the correlation coefficient is 0.947, which exceeds 0.9. The p-value for the test of whether the slope differs from zero is reported as 0.000, indicating that the slope is highly significant.

There is some additional work for data set 2. Please see after the data set.
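The values quoted from the printout for data set 1 can be cross-checked outside Minitab. What follows is a minimal sketch in plain Python, not part of the assignment, that applies the usual least-squares formulas to the data set 1 values listed in this handout. The variable names are illustrative assumptions.

```python
# Data set 1 as listed in the handout:
# x = fracture strength (% of ultimate tensile strength), y = attenuation (neper/cm).
x = [12, 30, 36, 40, 45, 57, 62, 67, 71, 78, 93, 94, 100, 105]
y = [3.3, 3.2, 3.4, 3.0, 2.8, 2.9, 2.7, 2.6, 2.5, 2.6, 2.2, 2.0, 2.3, 2.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross products.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope and intercept.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Correlation coefficient (carries the sign of the slope) and R-squared.
r = sxy / (sxx * syy) ** 0.5
r_squared = r ** 2

# t statistic for H0: slope = 0, using n - 2 error degrees of freedom.
sse = syy - slope * sxy
s = (sse / (n - 2)) ** 0.5
t_slope = slope / (s / sxx ** 0.5)

print(f"y = {intercept:.4f} + ({slope:.6f}) x")
print(f"R-Sq = {100 * r_squared:.1f}%   r = {r:.3f}   t = {t_slope:.2f}")
```

The printed slope, intercept, R-Sq, and t value should agree with the Minitab printout to the digits shown there, and r comes out negative because the slope is negative.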
Data set 1: These data come from an investigation of how the propagation of an ultrasonic stress wave through a substance depends on the properties of the substance. The data give fracture strength (x, as a percentage of ultimate tensile strength) and attenuation (y, in neper/cm, the decrease in amplitude of the stress wave) in fiberglass-reinforced polyester composites.

Row    x    y
  1   12  3.3
  2   30  3.2
  3   36  3.4
  4   40  3.0
  5   45  2.8
  6   57  2.9
  7   62  2.7
  8   67  2.6
  9   71  2.5
 10   78  2.6
 11   93  2.2
 12   94  2.0
 13  100  2.3
 14  105  2.1

Data set 2: For this data set the independent (predictor) variable x represents the percentage of wood in a batch of pulp. The dependent (response) variable y represents the tensile strength, measured in pounds per square inch, of Kraft paper made from the batch.

Row     x     y
  1   1.0   6.3
  2   1.5  11.1
  3   2.0  20.0
  4   3.0  24.0
  5   4.0  26.1
  6   4.5  30.0
  7   5.0  33.8
  8   5.5  34.0
  9   6.0  38.1
 10   6.5  39.9
 11   7.0  42.0
 12   8.0  46.1
 13   9.0  53.1
 14  10.0  52.0
 15  11.0  52.5
 16  12.0  48.0
 17  13.0  42.8
 18  14.0  27.8
 19  15.0  21.9
Results for: Worksheet 2

Scatterplot of y vs x

Regression Analysis: y versus x

y = 21.3 + 1.77 x

Predictor    Coef  SE Coef     T      P
Constant   21.321    5.430  3.93  0.001
x          1.7710   0.6478  2.73  0.014

S = 11.8159   R-Sq = 30.5%   R-Sq(adj) = 26.5%

Analysis of Variance
Source          DF      SS      MS     F      P
Regression       1  1043.4  1043.4  7.47  0.014
Residual Error  17  2373.5   139.6
Total           18  3416.9

Unusual Observations
Obs     x      y    Fit  SE Fit  Residual  St Resid
 19  15.0  21.90  47.89    5.70    -25.99    -2.51R

R denotes an observation with a large standardized residual.

Answer the following questions for each data set on a separate sheet of paper.

1. Based only on the scatterplot
a. Does a straight line model appear to be reasonable? Explain.
No, it does not. The points are clearly curvilinear.
b. Does the data appear to be positively or negatively correlated? Explain.
You cannot really tell. For part of the range y increases as x increases; the points then reach a maximum and decrease.

2. What is the equation of the linear regression line?
y = 21.3 + 1.77x

3. Is the linear regression line statistically significant at α = .05?
Yes, it is. The relevant p-value is .014. Note that it is not significant at α = .01.

4. What is the percentage of the variation accounted for by the regression line?
The regression line accounts for only 30.5% of the variation.

5. What is the correlation coefficient?
The correlation coefficient is r = 0.552.

Usually an adequate model
(1) Has a scatterplot reasonably close to a straight line or the curve that was fitted.
(2) Accounts for at least 80% of the variation.
(3) Has a correlation coefficient with absolute value at least 0.9. (Only relevant for a straight line.)
(4) Is statistically significant at α = .05. This in itself might not be enough though.

Write a paragraph of 3-5 sentences that indicates whether the model is adequate, with specific reference to the issues above, citing the relevant values in the computer printout.

The regression line is quite inadequate. The points appear to lie on a parabola instead of a straight line. Only 30.5% of the variation is accounted for, much less than 80%. The correlation coefficient is 0.552, suggesting at best a weak linear relationship. Although the p-value of .014 shows that the model accounts for a statistically significant amount of the variation, the scatterplot, R-Sq, and the correlation coefficient suggest that there is much room for improvement.

For data set 2, use Fitted Line Plot under the Regression menu to fit a quadratic curve. Based on criteria 1, 2 and 4 for an adequate model, does the quadratic model appear to be adequate? Explain. [Note: the correlation coefficient is not relevant here because it is only a measure of linear association.]

Polynomial Regression Analysis: y versus x

y = - 6.674 + 11.76 x - 0.6345 x**2

S = 4.42040   R-Sq = 90.9%   R-Sq(adj) = 89.7%
Analysis of Variance
Source      DF       SS       MS      F      P
Regression   2  3104.25  1552.12  79.43  0.000
Error       16   312.64    19.54
Total       18  3416.89

Sequential Analysis of Variance
Source     DF       SS       F      P
Linear      1  1043.43    7.47  0.014
Quadratic   1  2060.82  105.47  0.000

Fitted Line: y versus x

This is a definite improvement. The points in the scatterplot do follow a quadratic curve. The curve accounts for 90.9% of the variation, which exceeds 80%, and the quadratic term is highly significant (p-value reported as 0.000).
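The comparison between the straight line and the quadratic curve for data set 2 can also be reproduced without Minitab. The sketch below is an illustration under stated assumptions rather than the method Minitab uses internally: it fits both polynomials by solving the normal equations in plain Python, then computes R-Sq for each fit and the sequential F statistic for adding the quadratic term. The helper names `solve`, `polyfit`, and `sse` are hypothetical and defined here.

```python
# Data set 2 as listed in the handout:
# x = percentage of wood in the pulp, y = tensile strength (psi).
x = [1.0, 1.5, 2.0, 3.0, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5,
     7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
y = [6.3, 11.1, 20.0, 24.0, 26.1, 30.0, 33.8, 34.0, 38.1, 39.9,
     42.0, 46.1, 53.1, 52.0, 52.5, 48.0, 42.8, 27.8, 21.9]


def solve(a, b):
    """Solve the square system a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    coef = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - tail) / m[i][i]
    return coef


def polyfit(x, y, degree):
    """Least-squares polynomial fit via the normal equations; returns [b0, b1, ...]."""
    k = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(k)]
    return solve(a, b)


def sse(coef, x, y):
    """Residual sum of squares for the polynomial with coefficients [b0, b1, ...]."""
    return sum((yi - sum(c * xi ** p for p, c in enumerate(coef))) ** 2
               for xi, yi in zip(x, y))


n = len(x)
y_bar = sum(y) / n
sst = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares

linear = polyfit(x, y, 1)
quadratic = polyfit(x, y, 2)
sse_lin = sse(linear, x, y)
sse_quad = sse(quadratic, x, y)
r2_lin = 1 - sse_lin / sst
r2_quad = 1 - sse_quad / sst

# Sequential F for adding the quadratic term: 1 numerator df, n - 3 error df.
f_quad = (sse_lin - sse_quad) / (sse_quad / (n - 3))

print(f"linear:    y = {linear[0]:.3f} + {linear[1]:.4f} x   R-Sq = {100 * r2_lin:.1f}%")
print(f"quadratic: y = {quadratic[0]:.3f} + {quadratic[1]:.2f} x + ({quadratic[2]:.4f}) x**2"
      f"   R-Sq = {100 * r2_quad:.1f}%")
print(f"sequential F for the quadratic term = {f_quad:.2f}")
```

The coefficients, R-Sq values, and sequential F should match the two Minitab printouts for data set 2 up to rounding. Solving the normal equations directly is acceptable for a quadratic on this x range; for higher-degree fits a QR-based least-squares routine would be the safer choice.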