HOMEWORK ANALYSIS #2 - STOPPING DISTANCE
Total Points Possible: 35

1. In your own words, summarize the overarching problem and any specific questions that need to be answered using the stopping distance data. Discuss how statistical modeling will be able to answer the posed questions.

(a) (1 pt) Discuss the potential value of determining stopping distance from the speed of a car in determining speed limits. Safety, etc. could be mentioned.

(b) (2 pts) The main interest in this problem is predicting stopping distance of cars based on their speeds.
    -0.5 pt if there is a decent explanation, but the word prediction is missing.

(c) (2 pts) Statistical modeling can help predictions by providing a quantifiable relationship where speeds can be plugged in and stopping distance predicted.

2. Use the data to assess if a simple linear regression model (without doing any transformations) is suitable to analyze the stopping distance data. Justify your answer using any necessary graphics and relevant summary statistics. Provide discussion on why an SLR model on the raw data (not transformed) is or is not appropriate.

(a) (2.5 pts) Draw plot (e.g. a scatterplot, fitted vs. residuals or both).
    -1 pt if there are incorrect label(s), or something else is wrong with the plot(s)

(b) (2.5 pts) Discuss, in writing, why fitting a linear model is a bad idea (they only need to mention at least one of the following to receive full points):
    The scatterplot indicates the data has a curved relationship, violating the linearity assumption.
    Residuals vs. fitted-values plot shows there is more variation in stopping distances at higher speeds, violating the equal variance assumption.

[Figure: histogram of residuals (density); residuals vs. fitted values]
3. Write out (in mathematical form) a justifiable (perhaps after a transformation) SLR model that would help answer the questions in the problem. Provide an interpretation of each mathematical term (variable or parameter) included in your model. Using the mathematical form, discuss how your model, after fitting it to the data, will be able to answer the questions in this problem.

The model needs a transformation. Several transformations are possible.

(a) (2 pts) Write out their model in equation form, with $y_i$ the stopping distance and $x_i$ the speed of car $i$.

    The following are preferable transformations:
    $\log(y_i) = \beta_0 + \beta_1 \log(x_i) + \epsilon_i$ where $\epsilon_i \sim N(0, \sigma^2)$ (Model 1)
    $\sqrt{y_i} = \beta_0 + \beta_1 \sqrt{x_i} + \epsilon_i$ where $\epsilon_i \sim N(0, \sigma^2)$ (Model 2)
    $\sqrt{y_i} = \beta_0 + \beta_1 x_i + \epsilon_i$ where $\epsilon_i \sim N(0, \sigma^2)$ (Model 3)

    The following are poor transformations (-1.5 pts if one of these is used):
    $y_i = \beta_0 + \beta_1 \log(x_i) + \epsilon_i$ where $\epsilon_i \sim N(0, \sigma^2)$ (Model 4)
    $\log(y_i) = \beta_0 + \beta_1 x_i + \epsilon_i$ where $\epsilon_i \sim N(0, \sigma^2)$ (Model 5)
    $y_i = \beta_0 + \beta_1 \sqrt{x_i} + \epsilon_i$ where $\epsilon_i \sim N(0, \sigma^2)$ (Model 6)

    The following is the untransformed model (-2 pts if used):
    $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ where $\epsilon_i \sim N(0, \sigma^2)$ (Model 7)

    Subtract 0.5 pt for any missing parts, including $\epsilon_i$.

(b) (3 pts) Define $y_i$, $x_i$, and $\epsilon_i$ and interpret $\beta_0$ and $\beta_1$ correctly (depends on their transformation, 0.5 pt each). Make sure they keep interpretations in the units of the transformed variables, not the originals. If they interpret the variables in terms of the original untransformed data, but the interpretations are otherwise correct, subtract 1.5 pts.

4. List, then discuss and justify your model assumptions using appropriate graphics or summary statistics.

(a) (1 pt) List the assumptions of linearity, independence, normality, and homoskedasticity.

(b) (4 pts) Discuss and justify the assumptions of linearity, independence, normality and homoskedasticity (1 pt for each assumption).
    For linearity, a scatterplot of the transformed data should be used. The correlation could also be mentioned.
    For independence, a reasonable explanation is all that is necessary. A residuals vs. fitted values plot could also be utilized, but is not necessary.
    For normality, a histogram of standardized residuals should be used. The KS or JB test could also be used. A Q-Q plot is another option.
    For equal variance, the BP test could be used, or a discussion regarding one of the plots above could be used.
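As a rough illustration of the checks in 4(b), the Python sketch below fits Model 1 by least squares on the log scale and returns the quantities the plots would be built from: fitted values, standardized residuals, and the correlation of the transformed variables. The speed and distance values are made-up placeholders, not the assignment data, and the helper names `fit_slr` and `check_model` are hypothetical; the course itself works in R or SAS.

```python
import math


def fit_slr(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def check_model(x, y):
    """Return fitted values, standardized residuals, and correlation for an SLR."""
    n = len(x)
    b0, b1 = fit_slr(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum(r ** 2 for r in resid) / (n - 2))
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    # Leverage of each point, needed to standardize its residual.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    std_resid = [r / (s * math.sqrt(1 - h)) for r, h in zip(resid, leverage)]
    # Sample correlation, written in terms of the slope.
    corr = b1 * math.sqrt(sxx / syy)
    return fitted, std_resid, corr


# Hypothetical placeholder data (NOT the assignment's stopping distance data).
speed = [5, 10, 15, 20, 25, 30, 35, 40]
distance = [4, 11, 25, 34, 52, 70, 85, 110]

# Model 1 is fit on the log scale of both variables.
log_speed = [math.log(v) for v in speed]
log_distance = [math.log(d) for d in distance]

fitted, std_resid, corr = check_model(log_speed, log_distance)
for f, r in zip(fitted, std_resid):
    print(f"fitted={f:.3f}  standardized residual={r:+.2f}")
print(f"correlation on the log scale: {corr:.3f}")
```

Plotting `std_resid` against `fitted` gives the residuals vs. fitted values plot, and a histogram of `std_resid` gives the normality check; a fan shape or a skewed histogram would argue against the chosen transformation.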
[Figure: histogram of residuals (density); residuals vs. fitted values; transformed scatterplot of log(y) vs. log(x)]

5. Assess and interpret the fit and predictive accuracy of your model on the level of your target audience.

(a) (2 pts) Report $R^2$ (1 pt) and interpret it in context (1 pt) (% of the variation in (potentially transformed) y is explained by (potentially transformed) x).

    Model      R^2
    Model 1    0.902
    Model 2    0.906
    Model 3    0.925
    Model 4    0.723
    Model 5    0.868
    Model 6    0.816
    Model 7    0.878

(b) (3 pts) Perform cross validation to assess predictive accuracy and interpret the results. Students should report bias and RMSPE and interpret these. Note, because of random variation in the simulation, bias and RMSPE values will differ. Give full points for reasonable answers with reasonable interpretations.
    -1 pt for insufficient or unclear interpretations
    -2 pts if cross validation was attempted, but all answers are clearly wrong
    -3 pts if cross validation was not attempted

6. Fit your model in #3 to the stopping distance data and summarize the results by displaying the fitted model in equation form (do NOT just provide a screen shot of the R or SAS output). Interpret each of the fitted parameters in the context of the problem. Provide a plot of the data with a fitted regression line on the original scale of the data.

(a) (2 pts) Report coefficients in equation form.
    $\log(\hat{y}) = -1.102 + 1.568 \log(x)$ (1)
    $\sqrt{\hat{y}} = -3.117 + 2.107 \sqrt{x}$ (2)
    $\sqrt{\hat{y}} = 0.932 + 0.252 x$ (3)
    $\hat{y} = -91.022 + 46.889 \log(x)$ (4)
    $\log(\hat{y}) = 1.487 + 0.094 x$ (5)
    $\hat{y} = -67.681 + 25.540 \sqrt{x}$ (6)
    $\hat{y} = -20.131 + 3.142 x$ (7)

(b) (2 pts) Interpret coefficients in the context of the problem. E.g. as log(x) goes up by 1, log(y) goes up by 1.568 on average.

(c) (1 pt) Provide a plot on the original scale of the data like the one below (including a fitted regression line).
    -0.5 pt if a plot was attempted, but something is wrong with it (line seems off, variables are switched, etc.)
    -1 pt if there isn't a plot
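The cross validation asked for in 5(b) can be sketched as follows. This is one common Monte Carlo scheme (repeated random train/test splits), not necessarily the one used in the course, and the data, the number of splits, the test-set size, and the helper names `fit_slr` and `cross_validate` are all assumptions made for illustration. Model 1 is fit on the log scale and predictions are back-transformed with exp() so that bias and RMSPE come out in the original distance units.

```python
import math
import random


def fit_slr(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def cross_validate(speed, distance, n_splits=500, n_test=2, seed=1):
    """Monte Carlo cross validation of the log-log model.

    Returns (bias, rmspe), each averaged over the splits, where
    bias = mean(prediction - observed) on the original scale.
    """
    rng = random.Random(seed)
    idx = list(range(len(speed)))
    biases, rmspes = [], []
    for _ in range(n_splits):
        test = rng.sample(idx, n_test)
        train = [i for i in idx if i not in test]
        b0, b1 = fit_slr([math.log(speed[i]) for i in train],
                         [math.log(distance[i]) for i in train])
        # Back-transform predictions to the original distance scale.
        errors = [math.exp(b0 + b1 * math.log(speed[i])) - distance[i]
                  for i in test]
        biases.append(sum(errors) / n_test)
        rmspes.append(math.sqrt(sum(e ** 2 for e in errors) / n_test))
    return sum(biases) / n_splits, sum(rmspes) / n_splits


# Hypothetical placeholder data (NOT the assignment's stopping distance data).
speed = [5, 10, 15, 20, 25, 30, 35, 40]
distance = [4, 11, 25, 34, 52, 70, 85, 110]

bias, rmspe = cross_validate(speed, distance)
print(f"bias = {bias:.2f}, RMSPE = {rmspe:.2f}")
```

A bias near zero suggests the predictions are not systematically too high or too low, and RMSPE gives the typical size of a prediction error in distance units. Changing `seed` changes both numbers slightly, which is the random variation the rubric mentions.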
7. The local law enforcement is considering implementing a speed limit of 35 MPH. Use your model to obtain a prediction of the distance required by a vehicle to stop when traveling at 35 MPH. How much of a reduction in stopping distance would be achieved by making it a 30 MPH speed limit instead? Given that the road is a rural road with many homes, provide an argument for or against the use of 35 MPH.

(a) (3 pts) Predict at 35 MPH and then at 30 MPH (1.5 pts each).

    Model      30 MPH    35 MPH
    Model 1    68.79     87.60
    Model 2    70.95     87.38
    Model 3    72.36     95.43
    Model 4    68.46     75.68
    Model 5    73.15     116.76
    Model 6    72.21     83.41
    Model 7    74.12     89.83

(b) (2 pts) Provide some form of argument that the 30 MPH speed limit is preferred. Any reasonable argument gets full credit.
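As a check on the table in 7(a), the sketch below evaluates each fitted equation from question 6 at 30 and 35 MPH and back-transforms the result to the original scale. It assumes negative intercepts for Models 1, 2, 4, 6, and 7, since those are the signs that reproduce the tabulated predictions; the `MODELS` layout and the function names are illustrative only. With coefficients rounded to three decimals, most models agree with the table to within about 0.01, while Models 3 and 5 land further off (roughly 0.3 and 1 to 2 units), presumably because rounding matters more for their small slopes.

```python
import math


def identity(v):
    return v


def square(v):
    return v ** 2


# model number -> (transform applied to speed,
#                  inverse of the transform applied to distance,
#                  intercept, slope)
MODELS = {
    1: (math.log, math.exp, -1.102, 1.568),
    2: (math.sqrt, square, -3.117, 2.107),
    3: (identity, square, 0.932, 0.252),
    4: (math.log, identity, -91.022, 46.889),
    5: (identity, math.exp, 1.487, 0.094),
    6: (math.sqrt, identity, -67.681, 25.540),
    7: (identity, identity, -20.131, 3.142),
}


def predict(model, speed):
    """Predicted stopping distance on the original scale."""
    transform_x, inverse_y, b0, b1 = MODELS[model]
    return inverse_y(b0 + b1 * transform_x(speed))


for m in MODELS:
    at30, at35 = predict(m, 30), predict(m, 35)
    print(f"Model {m}: 30 MPH = {at30:6.2f}, 35 MPH = {at35:6.2f}, "
          f"reduction = {at35 - at30:5.2f}")
```

For Model 1 the two tabulated predictions differ by about 18.8 distance units, which is the reduction in stopping distance gained by lowering the limit from 35 to 30 MPH.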