Stat 328 Final Exam (Regression) Summer 2002 Professor Vardeman

Similar documents
SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning

STAT 350: Summer Semester Midterm 1: Solutions

IE 361 Exam 1 October 2004 Prof. Vardeman

a. The least squares estimators of intercept and slope are (from JMP output): b 0 = 6.25 b 1 =

Project Report for STAT571 Statistical Methods Instructor: Dr. Ramon V. Leon. Wage Data Analysis. Yuanlei Zhang

Mrs. Poyner/Mr. Page Chapter 3 page 1

Regression Models for Time Trends: A Second Example. INSR 260, Spring 2009 Bob Stine

Stat 231 Final Exam. Consider first only the measurements made on housing number 1.

2.1: Inferences about β 1

Regression Analysis IV... More MLR and Model Building

UNIVERSITY OF CALIFORNIA College of Engineering Department of Electrical Engineering and Computer Sciences. PROBLEM SET No. 5 Official Solutions

STATISTICS 110/201 PRACTICE FINAL EXAM

Exam Applied Statistical Regression. Good Luck!

Class Notes Spring 2014

Swarthmore Honors Exam 2012: Statistics

Stat 401B Exam 2 Fall 2017

Statistics 5100 Spring 2018 Exam 1

A discussion on multiple regression models

> modlyq <- lm(ly poly(x,2,raw=true)) > summary(modlyq) Call: lm(formula = ly poly(x, 2, raw = TRUE))

Stat 401B Exam 2 Fall 2015

STATISTICS 479 Exam II (100 points)

Stat 401B Exam 3 Fall 2016 (Corrected Version)

Stat 401XV Final Exam Spring 2017

Analysis of Bivariate Data

Stat 401B Exam 2 Fall 2016

Lecture 10 Multiple Linear Regression

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories.

Table 1: Fish Biomass data set on 26 streams

Categorical Predictor Variables

Multiple Linear Regression

Lecture 12 Inference in MLR

Stat 401B Final Exam Fall 2015

Unit 11: Multiple Linear Regression

Sampling Distributions in Regression. Mini-Review: Inference for a Mean. For data (x 1, y 1 ),, (x n, y n ) generated with the SRM,

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

Introduction and Single Predictor Regression. Correlation

Chapter 15 Multiple Regression

Introduction to Regression

Stat 231 Final Exam Fall 2013 Slightly Edited Version

Exponential Smoothing. INSR 260, Spring 2009 Bob Stine

Question Possible Points Score Total 100

Assumptions in Regression Modeling

Pre-Calculus Multiple Choice Questions - Chapter S8

Unit 6 - Introduction to linear regression

Psychology Seminar Psych 406 Dr. Jeffrey Leitzel

22s:152 Applied Linear Regression

MATH 644: Regression Analysis Methods

Outline. 2. Logarithmic Functional Form and Units of Measurement. Functional Form. I. Functional Form: log II. Units of Measurement

STAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis

sociology 362 regression

2. Outliers and inference for regression

7. Prediction. Outline: Read Section 6.4. Mean Prediction

Multicollinearity Exercise

Stat 231 Final Exam Fall 2011

Density Temp vs Ratio. temp

1: a b c d e 2: a b c d e 3: a b c d e 4: a b c d e 5: a b c d e. 6: a b c d e 7: a b c d e 8: a b c d e 9: a b c d e 10: a b c d e

Lecture 19: Inference for SLR & Transformations

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

2. (3.5) (iii) Simply drop one of the independent variables, say leisure: GP A = β 0 + β 1 study + β 2 sleep + β 3 work + u.

Section 3: Simple Linear Regression

Data Analysis 1 LINEAR REGRESSION. Chapter 03

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc

Analysis of Covariance. The following example illustrates a case where the covariate is affected by the treatments.

Lecture 18 Miscellaneous Topics in Multiple Regression

STAT 4385 Topic 06: Model Diagnostics

ST 512-Practice Exam I - Osborne Directions: Answer questions as directed. For true/false questions, circle either true or false.

SMAM 319 Exam1 Name. a B.The equation of a line is 3x + y =6. The slope is a. -3 b.3 c.6 d.1/3 e.-1/3

Advanced Statistical Regression Analysis: Mid-Term Exam Chapters 1-5

Simple Linear Regression

Possibly useful formulas for this exam: b1 = Corr(X,Y) SDY / SDX. confidence interval: Estimate ± (Critical Value) (Standard Error of Estimate)

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box.

Ch Inference for Linear Regression

Statistical Modelling in Stata 5: Linear Models

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3

Inference in Regression Analysis

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6

Unit 6 - Simple linear regression

Ph.D. Preliminary Examination Statistics June 2, 2014

YEAR 10 GENERAL MATHEMATICS 2017 STRAND: BIVARIATE DATA PART II CHAPTER 12 RESIDUAL ANALYSIS, LINEARITY AND TIME SERIES

28. SIMPLE LINEAR REGRESSION III

MULTIPLE REGRESSION METHODS

9 Correlation and Regression

Chapter 13 Experiments with Random Factors Solutions

SMAM 314 Exam 42 Name

ST430 Exam 1 with Answers

Leverage. the response is in line with the other values, or the high leverage has caused the fitted model to be pulled toward the observed response.

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), :00 4:15p. SOLUTIONS (Yellow)

STAT Chapter 11: Regression

Problem #1 #2 #3 #4 #5 #6 Total Points /6 /8 /14 /10 /8 /10 /56

THE ROYAL STATISTICAL SOCIETY 2008 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE (MODULAR FORMAT) MODULE 4 LINEAR MODELS

9. Linear Regression and Correlation

Lecture 3: Inference in SLR

Summarizing Data: Paired Quantitative Data

Correlation and Simple Linear Regression

CHAPTER 10. Regression and Correlation

SMAM 319 Exam 1 Name. 1.Pick the best choice for the multiple choice questions below (10 points 2 each)

Unit 10: Simple Linear Regression and Correlation

Transcription:

Stat Final Exam (Regression) Summer Professor Vardeman This exam concerns the analysis of 99 salary data for n = offensive backs in the NFL (This is a part of the larger data set that serves as the basis of your Lab #) Attached to this exam are a number of JMP reports for these data Use them in answering the questions on this exam As on Lab #, the variables available for modeling were: salary 99 season salary draft round in the player draft when the player was selected y r s _exp years of NFL experience for the player played the number of regular season games played in 99 started the number of regular season games started in 99 citypop the population of the city in which the player's team is located Vardeman used these variables and created several more: logsalary the base logarithm of salary /draft the reciprocal of draft percentstarted the ratio started/played To begin, consider the problem of modeling a salary variable in terms of only a draft position variable a) Consider the plots of salary vs draft and logsalary vs draft What about these suggests that (as far as using standard statistical methodology is concerned) logsalary is probably a better "y" than salary? Ultimately, Vardeman decided to use /draft instead of draft in a SLR regression analysis for logsalary So until further notice, consider a SLR analysis using the model log salary = β + β (/ draft) + ε b) What fraction of raw variability in logsalary is accounted for using /draft as a predictor variable? c) Give and interpret a p-value for testing H : β = Say exactly where you found this on the printout p-value: where: interpretation:

d) Notice that if draft is large, /draft is near So β might be interpreted as a mean logsalary for a high draft number (or perhaps even undrafted) offensive back Give 9% confidence limits for this e) What does the SLR model on the previous page give as the difference in mean logsalary values for st and nd round draft picks? (Note that these are the cases / draft = and / draft = ) Give 9% confidence limits for this difference in means f) A particular offensive back not included in this data set is a former first round draft pick and was offered a $, contract for 99 (The base logarithm of, is about ) On the basis of draft position alone, did this person have a good case that the offer was too low? Explain carefully Now consider MLR analyses of logsalary Notice that printouts are available for two different multiple linear regressions The first is a regression on draft, yrs_exp, played, started, citypop, /draft, and percentstarted The second is a regression on only yrs_exp, /draft, and percentstarted g) Give 9% confidence limits for the standard deviation of logsalary when all of draft, yrs_exp, played, started, citypop, /draft, and percentstarted are held fixed

h) What on the MLR printouts suggests that it may be feasible to model logsalary using fewer than predictors? i) There is a decrease in R if one moves from the variable regression to the variable regression Give an appropriate F value, degrees of freedom and approximate p-value to attach to the decrease F: df:, p-value: Henceforth consider the variable regression Besides the raw data, the JMP data table at the end of the printout has summaries of that fit Notice that although n = cases were used in the fitting, the table includes some values for an additional ( st ) case j) According to this model, what increase in mean logsalary accompanies a year increase in NFL experience, if draft position and percentage of games started are held fixed? Give 9% confidence limits k) Dropping which of the predictors would cause the biggest decrease in variable: reasoning: R? How do you know?

l) Player has a large hat value What about his values of yrs_exp, /draft, and percentstarted makes this qualitatively plausible/expected? m) Considering both x and y variables, which player among the in the data set was the most influential in terms of fitting the variable model? Explain n) Make 9% prediction limits for the logsalary of player (based on the variable model!) o) Player s actual salary was $, Does your answer to n) provide solid statistical evidence that his salary (for unknown reasons) was atypical? Explain

Bivariate Fit of SALARY By DRAFT SALARY DRAFT Bivariate Fit of LogSalary By DRAFT LogSalary DRAFT

Bivariate Fit of LogSalary By /Draft LogSalary /Draft Linear Fit Linear Fit LogSalary = 99 + /Draft Summary of Fit RSquare RSquare Adj Root Mean Square Error Mean of Response Observations (or Sum Wgts) Analysis of Variance Source Model Error C Total DF 9 Sum of Squares 999 Parameter Estimates Term Intercept /Draft Estimate 99 9 Std Error Mean Square t Ratio 9 Prob> t < F Ratio Prob > F

Response LogSalary Whole Model Actual by Predicted Plot LogSalary Actual LogSalary Predicted P< RSq= RMSE= Summary of Fit RSquare RSquare Adj Root Mean Square Error Mean of Response Observations (or Sum Wgts) Analysis of Variance Source Model Error C Total DF 9 Term Intercept DRAFT YRS_EXP PLAYED STARTED CITYPOP /Draft PercentStarted Effect Tests Source DRAFT YRS_EXP PLAYED STARTED CITYPOP /Draft PercentStarted Sum of Squares 999 Parameter Estimates Estimate 9-99 -99 - e- 9 9 Nparm DF 99 Mean Square 99 Std Error 99 e-9 t Ratio - - -9 Sum of Squares 99 9 Residual by Predicted Plot - - - - LogSalary Predicted LogSalary Residual Press 99 F Ratio Prob > F < Prob> t < 9 9 9 F Ratio 9 Prob > F 9 9 9

Response LogSalary Whole Model Actual by Predicted Plot LogSalary Actual LogSalary Predicted P< RSq= RMSE= Summary of Fit RSquare RSquare Adj Root Mean Square Error Mean of Response Observations (or Sum Wgts) Analysis of Variance Source Model Error C Total DF 9 Term Intercept YRS_EXP /Draft PercentStarted Effect Tests Source YRS_EXP /Draft PercentStarted Sum of Squares 99 99 999 Parameter Estimates Estimate 999 Nparm DF 9 Mean Square Std Error 9 9 t Ratio 9 Sum of Squares 9 Residual by Predicted Plot - - - - LogSalary Predicted LogSalary Residual Press F Ratio 9 Prob > F < Prob> t < F Ratio 99 Prob > F YRS_EXP Leverage Plot LogSalary Leverage Residuals YRS_EXP Leverage, P= PercentStarted Leverage Plot LogSalary Leverage Residuals - PercentStarted Leverage, P= /Draft Leverage Plot - /Draft Leverage, P= LogSalary Leverage Residuals

Rows SALARY DRAFT YRS_EXP PLAYED STARTED CITYPOP LogSalary /Draft PercentStarted Pred Formula LogSalary PredSE LogSalary Residual LogSalary 9 9 9 9 9 9 9 99 99 99 9 99 9 99 9 9 99 9 9 9 99 9 9 99 9 99 9 9 99 9 9999 9 9 9 9 9 9 9 9 9 9 9 99 9 999 9 99 9 9 9 9 9 9 9 9 999 99 9 9 9 9 9 9 9 9 9 9 9 9-99 - - 9 - - -9 - - -99 - - 9 - - -9-99 9 -

Rows Studentized Resid LogSalary h LogSalary Cook's D Influence LogSalary 9 9 9-9 -99-9 9-9 - - 999-9 9 - -99-9 9 - -9 9-9 -9-9 99-999 9 9 99 9 9 999 9 9 9 9 99 9 9 99 99 9 99 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9