Stat Final Exam (Regression) Summer Professor Vardeman

This exam concerns the analysis of 99 salary data for n = offensive backs in the NFL. (This is a part of the larger data set that serves as the basis of your Lab #.) Attached to this exam are a number of JMP reports for these data. Use them in answering the questions on this exam.

As on Lab #, the variables available for modeling were:

salary          99 season salary
draft           round in the player draft when the player was selected
yrs_exp         years of NFL experience for the player
played          the number of regular season games played in 99
started         the number of regular season games started in 99
citypop         the population of the city in which the player's team is located

Vardeman used these variables and created several more:

logsalary       the base logarithm of salary
1/draft         the reciprocal of draft
percentstarted  the ratio started/played

To begin, consider the problem of modeling a salary variable in terms of only a draft position variable.

a) Consider the plots of salary vs draft and logsalary vs draft. What about these suggests that (as far as using standard statistical methodology is concerned) logsalary is probably a better "y" than salary?

Ultimately, Vardeman decided to use 1/draft instead of draft in a SLR regression analysis for logsalary. So until further notice, consider a SLR analysis using the model

logsalary = β₀ + β₁(1/draft) + ε

b) What fraction of raw variability in logsalary is accounted for using 1/draft as a predictor variable?

c) Give and interpret a p-value for testing H₀: β₁ = 0. Say exactly where you found this on the printout.

p-value:
where:
interpretation:
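The SLR setup above (logsalary regressed on 1/draft, with R² read as the fraction of raw variability accounted for) can be sketched in a few lines. This is only an illustration under stated assumptions: the data below are made up and are not the exam's data, the base-10 logarithm is assumed, and `fit_slr` is a hypothetical helper name.

```python
import math

def fit_slr(x, y):
    """Least squares fit of y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1.0 - sse / sst  # fraction of raw variability accounted for
    return b0, b1, r_squared

# Hypothetical values for illustration only (NOT the exam data).
draft = [1, 2, 3, 5, 8, 12]
salary = [2_000_000, 900_000, 700_000, 450_000, 400_000, 350_000]

inv_draft = [1.0 / d for d in draft]            # the "1/draft" predictor
log_salary = [math.log10(s) for s in salary]    # base 10 is an assumption

b0, b1, r2 = fit_slr(inv_draft, log_salary)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, R^2 = {r2:.3f}")
```

On the actual exam, the corresponding numbers are read from the RSquare and Parameter Estimates entries of the JMP report rather than computed by hand.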
d) Notice that if draft is large, 1/draft is near 0. So β₀ might be interpreted as a mean logsalary for a high draft number (or perhaps even undrafted) offensive back. Give 9% confidence limits for this.

e) What does the SLR model on the previous page give as the difference in mean logsalary values for 1st and 2nd round draft picks? (Note that these are the cases 1/draft = 1 and 1/draft = .5.) Give 9% confidence limits for this difference in means.

f) A particular offensive back not included in this data set is a former first round draft pick and was offered a $, contract for 99. (The base logarithm of, is about .) On the basis of draft position alone, did this person have a good case that the offer was too low? Explain carefully.

Now consider MLR analyses of logsalary. Notice that printouts are available for two different multiple linear regressions. The first is a regression on draft, yrs_exp, played, started, citypop, 1/draft, and percentstarted. The second is a regression on only yrs_exp, 1/draft, and percentstarted.

g) Give 9% confidence limits for the standard deviation of logsalary when all of draft, yrs_exp, played, started, citypop, 1/draft, and percentstarted are held fixed.
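For question g), the usual confidence limits for an error standard deviation come from the chi-square distribution of (error df)·s²/σ². A minimal sketch, assuming the two chi-square quantiles are looked up in a table and passed in by hand; the function name and all numbers below are hypothetical and are not taken from the printouts.

```python
import math

def sigma_confidence_limits(s, df, chi2_lower, chi2_upper):
    """Confidence limits for sigma from a root mean square error s.

    s          : Root Mean Square Error from the fit
    df         : error degrees of freedom
    chi2_lower : lower-tail chi-square quantile for df (from a table)
    chi2_upper : upper-tail chi-square quantile for df (from a table)
    """
    lower = s * math.sqrt(df / chi2_upper)
    upper = s * math.sqrt(df / chi2_lower)
    return lower, upper

# Hypothetical illustration: s = 0.5 with 20 error df, using the .025 and
# .975 chi-square quantiles for 20 df (9.591 and 34.170) for 95% limits.
lo, hi = sigma_confidence_limits(s=0.5, df=20, chi2_lower=9.591, chi2_upper=34.170)
print(f"({lo:.4f}, {hi:.4f})")
```

The interval is not symmetric about s, because the chi-square distribution is skewed.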
h) What on the MLR printouts suggests that it may be feasible to model logsalary using fewer than 7 predictors?

i) There is a decrease in R² if one moves from the 7 variable regression to the 3 variable regression. Give an appropriate F value, degrees of freedom and approximate p-value to attach to the decrease.

F:
df: ,
p-value:

Henceforth consider the 3 variable regression. Besides the raw data, the JMP data table at the end of the printout has summaries of that fit. Notice that although n = cases were used in the fitting, the table includes some values for an additional ( st ) case.

j) According to this model, what increase in mean logsalary accompanies a year increase in NFL experience, if draft position and percentage of games started are held fixed? Give 9% confidence limits.

k) Dropping which of the predictors would cause the biggest decrease in R²? How do you know?

variable:
reasoning:
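The comparison in question i) is the standard partial F test for a full model versus a reduced model nested in it. A minimal sketch, assuming the two error sums of squares are read from the Analysis of Variance tables; `partial_f` is a hypothetical helper, and the sample size and sums of squares below are invented for illustration, not taken from the printouts.

```python
def partial_f(sse_reduced, sse_full, p_full, p_reduced, n):
    """Partial F statistic for dropping (p_full - p_reduced) predictors.

    sse_reduced, sse_full : error sums of squares of the two fits
    p_full, p_reduced     : numbers of predictors (intercept not counted)
    n                     : number of cases used in both fits
    Returns (F, numerator df, denominator df).
    """
    num_df = p_full - p_reduced
    den_df = n - p_full - 1
    f = ((sse_reduced - sse_full) / num_df) / (sse_full / den_df)
    return f, num_df, den_df

# Hypothetical numbers: 7 and 3 predictors match the two regressions
# described in the exam; n and the SSE values are made up.
f_stat, df1, df2 = partial_f(sse_reduced=4.6, sse_full=4.2,
                             p_full=7, p_reduced=3, n=50)
print(f"F = {f_stat:.3f} on ({df1}, {df2}) df")
```

An approximate p-value then comes from comparing F to tabled upper percentage points of the F distribution with those degrees of freedom.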
l) Player has a large hat value. What about his values of yrs_exp, 1/draft, and percentstarted makes this qualitatively plausible/expected?

m) Considering both x and y variables, which player among the in the data set was the most influential in terms of fitting the 3 variable model? Explain.

n) Make 9% prediction limits for the logsalary of player (based on the 3 variable model!).

o) Player 's actual salary was $, . Does your answer to n) provide solid statistical evidence that his salary (for unknown reasons) was atypical? Explain.
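Prediction limits of the kind asked for in n) combine the fitted value, its standard error, and the root mean square error. A minimal sketch, assuming the PredSE column in the data table is the standard error of the fitted mean and that the t quantile is taken from a table; the function name and every number below are hypothetical.

```python
import math

def prediction_limits(pred, pred_se, rmse, t_quantile):
    """Two-sided prediction limits for a single new response.

    pred       : fitted value (Pred Formula column)
    pred_se    : standard error of the fitted mean (PredSE column, assumed)
    rmse       : Root Mean Square Error of the fit
    t_quantile : upper t quantile for the error df (from a table)
    """
    se_individual = math.sqrt(rmse ** 2 + pred_se ** 2)
    half_width = t_quantile * se_individual
    return pred - half_width, pred + half_width

# Hypothetical illustration (not values from the printout).
low, high = prediction_limits(pred=5.8, pred_se=0.06, rmse=0.08, t_quantile=2.0)
print(f"({low:.3f}, {high:.3f})")
```

An observed logsalary outside such limits is the sort of evidence question o) asks about.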
[Plot: Bivariate Fit of SALARY By DRAFT (vertical axis SALARY, horizontal axis DRAFT)]
[Plot: Bivariate Fit of LogSalary By DRAFT (vertical axis LogSalary, horizontal axis DRAFT)]
[Plot: Bivariate Fit of LogSalary By 1/Draft (vertical axis LogSalary, horizontal axis 1/Draft), with Linear Fit]
[JMP report: Linear Fit of LogSalary on 1/Draft. Sections: Summary of Fit (RSquare, RSquare Adj, Root Mean Square Error, Mean of Response, Observations (or Sum Wgts)); Analysis of Variance (Source: Model, Error, C Total; DF, Sum of Squares, Mean Square, F Ratio, Prob > F); Parameter Estimates (Term: Intercept, 1/Draft; Estimate, Std Error, t Ratio, Prob>|t|). Numeric entries not recoverable.]
[JMP report: Response LogSalary, Whole Model, regression on DRAFT, YRS_EXP, PLAYED, STARTED, CITYPOP, 1/Draft, and PercentStarted. Plots: Actual by Predicted Plot (LogSalary Actual vs. LogSalary Predicted); Residual by Predicted Plot (LogSalary Residual vs. LogSalary Predicted). Sections: Summary of Fit (RSquare, RSquare Adj, Root Mean Square Error, Mean of Response, Observations (or Sum Wgts)); Analysis of Variance (Source: Model, Error, C Total; DF, Sum of Squares, Mean Square, F Ratio, Prob > F); Parameter Estimates (Term, Estimate, Std Error, t Ratio, Prob>|t|); Effect Tests (Source, Nparm, DF, Sum of Squares, F Ratio, Prob > F); Press. Numeric entries not recoverable.]
[JMP report: Response LogSalary, Whole Model, regression on YRS_EXP, 1/Draft, and PercentStarted. Plots: Actual by Predicted Plot (LogSalary Actual vs. LogSalary Predicted); Residual by Predicted Plot (LogSalary Residual vs. LogSalary Predicted); Leverage Plots (LogSalary Leverage Residuals vs. YRS_EXP Leverage, vs. PercentStarted Leverage, and vs. 1/Draft Leverage). Sections: Summary of Fit (RSquare, RSquare Adj, Root Mean Square Error, Mean of Response, Observations (or Sum Wgts)); Analysis of Variance (Source: Model, Error, C Total; DF, Sum of Squares, Mean Square, F Ratio, Prob > F); Parameter Estimates (Term, Estimate, Std Error, t Ratio, Prob>|t|); Effect Tests (Source, Nparm, DF, Sum of Squares, F Ratio, Prob > F); Press. Numeric entries not recoverable.]
[JMP data table, one row per player. Columns: Rows, SALARY, DRAFT, YRS_EXP, PLAYED, STARTED, CITYPOP, LogSalary, 1/Draft, PercentStarted, Pred Formula LogSalary, PredSE LogSalary, Residual LogSalary, Studentized Resid LogSalary, h LogSalary, Cook's D Influence LogSalary. Numeric entries not recoverable.]