Homework 2: Simple Linear Regression


STAT 4385 Applied Regression Analysis
Homework 2: Simple Linear Regression

(Simple Linear Regression) Data were collected on thirty (n = 30) college graduates who recently entered the job market. For each graduate, the cumulative grade-point average (CGPA, X) and the starting annual salary in thousands of dollars (SAL, Y) were recorded, as shown in the following table. Reference: Kleinbaum, D., Kupper, L., Nizam, A., and Rosenberg, E. (2013). Applied Regression Analysis and Other Multivariable Methods. Cengage Learning.

ID   CGPA (x_i)   SAL (y_i)    x_i²       y_i²      x_i y_i
 1      2.58       10.455      6.656     109.307     26.974
 2      2.31        9.680      5.336      93.702     22.361
 3      2.47        7.300      6.101      53.290     18.031
 4      2.52        9.388      6.350      88.135     23.658
 5      3.22       12.496     10.368     156.150     40.237
 6      3.37       11.812     11.357     139.523     39.806
 7      2.43        9.224      5.905      85.082     22.414
 8      3.08       11.725
 9      2.78       11.320      7.728     128.142     31.470
10      2.98       12.000      8.880     144.000     35.760
11      3.55       12.500     12.603     156.250     44.375
12      3.64       13.310     13.250     177.156     48.448
13      3.72       12.105     13.838     146.531     45.031
14      2.24        6.200      5.018      38.440     13.888
15      2.70       11.522      7.290     132.756     31.109
16      2.30        8.000      5.290      64.000     18.400
17      2.83       12.548      8.009     157.452     35.511
18      2.37        7.700      5.617      59.290     18.249
19      2.52       10.028      6.350     100.561     25.271
20      3.22       13.176     10.368     173.607     42.427
21      3.55       13.255     12.603     175.695     47.055
22      3.55       13.004     12.603     169.104     46.164
23      2.47        8.000      6.101      64.000     19.760
24      2.47        8.224      6.101      67.634     20.313
25      2.78       10.750      7.728     115.563     29.885
26      2.78       11.669      7.728     136.166     32.440
27      2.98       12.322      8.880     151.832     36.720
28      2.58       11.002      6.656     121.044     28.385
29      2.58       10.666      6.656     113.764     27.518
30      2.58       10.839      6.656     117.484     27.965
sum    85.15      322.22     247.515    3573.136    935.738

1. Complete the table above by filling in the blank cells and carrying out the preliminary calculations. Based on this worksheet, compute x̄, ȳ, SS_xx, SS_yy, and SS_xy.
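A minimal computational sketch for the question-1 quantities, assuming Python with NumPy (the tooling is an assumption of this sketch, not part of the assignment). The CGPA and SAL columns are typed in from the table above, so the printed column sums and summary statistics should be reproduced up to rounding.

```python
import numpy as np

# CGPA (x) and starting salary in $1000s (y) for the n = 30 graduates, from the table above
x = np.array([2.58, 2.31, 2.47, 2.52, 3.22, 3.37, 2.43, 3.08, 2.78, 2.98,
              3.55, 3.64, 3.72, 2.24, 2.70, 2.30, 2.83, 2.37, 2.52, 3.22,
              3.55, 3.55, 2.47, 2.47, 2.78, 2.78, 2.98, 2.58, 2.58, 2.58])
y = np.array([10.455, 9.680, 7.300, 9.388, 12.496, 11.812, 9.224, 11.725, 11.320, 12.000,
              12.500, 13.310, 12.105, 6.200, 11.522, 8.000, 12.548, 7.700, 10.028, 13.176,
              13.255, 13.004, 8.000, 8.224, 10.750, 11.669, 12.322, 11.002, 10.666, 10.839])
n = len(x)

# Column sums of the worksheet
print(x.sum(), y.sum(), (x**2).sum(), (y**2).sum(), (x * y).sum())

# Summary quantities for question 1
xbar, ybar = x.mean(), y.mean()
SSxx = ((x - xbar)**2).sum()          # = sum(x^2) - n*xbar^2
SSyy = ((y - ybar)**2).sum()
SSxy = ((x - xbar) * (y - ybar)).sum()
print(xbar, ybar, SSxx, SSyy, SSxy)   # approx. 2.838, 10.741, 5.83, 112.28, 21.17
```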

2. The following graph provides a scatterplot of the data. Interpret the graph.

[Figure: Scatter Plot of the Data, with SAL (roughly 6 to 13) on the vertical axis and CGPA (roughly 2.5 to 3.5) on the horizontal axis.]

3. Compute the sample correlation coefficient r and interpret it. Using Fisher's Z-transformation approach, construct a 95% confidence interval for the (population) correlation coefficient ρ between CGPA and SAL.

4. Simple linear regression is used to study the relationship between CGPA and SAL. The model can be stated as y_i = β_0 + β_1 x_i + ε_i, with ε_i ~ N(0, σ²). Compute the least squares (LS) estimator of (β_0, β_1) and add the fitted LS line to the scatterplot.

5. First complete the ANOVA table by filling in blanks (1) through (9), then answer the questions that follow.

Analysis of Variance Table (ANOVA)
Source   df    SS    MS    F     P-Value
Model    (1)   (4)   (7)   (9)   < .0001
Error    (2)   (5)   (8)
Total    (3)   (6)

(a) Compute the coefficient of determination and interpret it.
(b) Test the overall usefulness of the model.
(c) Provide an estimator of the error variance σ².

6. Construct a 95% confidence interval for each of β_0 and β_1, and interpret the intervals.

7. Perform a test at significance level α = 0.05 to see whether the following statement is true: if Student A has a CGPA 0.5 point higher than Student B, then Student A is expected to make more than one thousand dollars more per year than Student B.

8. Compute the fitted value ŷ_1 and the corresponding residual r_1 for the first student, based on the fitted LS model.

9. The following figure provides diagnostic plots based on the residuals r_i: (a) histogram; (b) normal Q-Q plot; (c) r_i vs. fitted value; and (d) r_i vs. x_i. Comment on the plots in terms of the model assumptions.

[Figure: four residual diagnostic panels, (a) Histogram of the residuals, (b) Normal Q-Q Plot of sample quantiles vs. theoretical quantiles, (c) residuals vs. fitted values ŷ, and (d) residuals vs. x.]

10. Construct a 95% confidence interval for the mean annual salary of students who have a CGPA of 3.5.
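A sketch of how the fitted line (questions 2 and 4) and the residual diagnostics (question 9) could be produced, assuming Python with NumPy, SciPy, and matplotlib, and reusing the x and y arrays entered in the sketch after the data table; the figures in the handout were produced by the instructor, so this only approximates them.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# x (CGPA) and y (SAL) are assumed to be the NumPy arrays entered in the earlier sketch
b1, b0 = np.polyfit(x, y, deg=1)   # LS slope and intercept; approx. 3.63 and 0.44
fitted = b0 + b1 * x
resid = y - fitted

# Scatterplot with the fitted LS line (questions 2 and 4)
grid = np.linspace(x.min(), x.max(), 100)
plt.scatter(x, y)
plt.plot(grid, b0 + b1 * grid)
plt.xlabel("CGPA"); plt.ylabel("SAL (thousands of dollars)")
plt.title("Scatter Plot of the Data with Fitted LS Line")
plt.show()

# Residual diagnostics (question 9): histogram, Q-Q plot, residuals vs. fitted, residuals vs. x
fig, ax = plt.subplots(2, 2, figsize=(8, 6))
ax[0, 0].hist(resid)
ax[0, 0].set_title("Histogram")
stats.probplot(resid, dist="norm", plot=ax[0, 1])
ax[0, 1].set_title("Normal Q-Q Plot")
ax[1, 0].scatter(fitted, resid); ax[1, 0].axhline(0)
ax[1, 0].set_xlabel("fitted value"); ax[1, 0].set_ylabel("residual")
ax[1, 1].scatter(x, resid); ax[1, 1].axhline(0)
ax[1, 1].set_xlabel("CGPA"); ax[1, 1].set_ylabel("residual")
fig.tight_layout()
plt.show()
```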

11. Suppose that Tom is graduating with a CGPA of 3.5. Construct a 95% prediction interval for his starting salary.

12. The following figure shows the fitted LS line together with the 95% Working-Hotelling confidence band. Comment on the model fit and on potential outliers.

[Figure: Working-Hotelling Confidence Bands, plotting SAL against CGPA with the fitted LS line and the Working-Hotelling confidence band.]

STAT 4385 Applied Regression Analysis
Solutions for Homework 2

1. Some preliminary calculations:
n = 30, Σx_i = 85.15, Σy_i = 322.22, Σx_i² = 247.515, Σy_i² = 3573.136, Σx_i y_i = 935.738,
so that x̄ = 2.838, ȳ = 10.741, SS_xx = 5.831, SS_yy = 112.278, and SS_xy = 21.170.

2. It can be found that r = 0.8274, indicating a strong positive linear association between college GPA and starting salary among college graduates. A 95% CI for the Z-transformed correlation, (1/2) ln{(1 + ρ)/(1 − ρ)}, is given by
(1/2) ln[(1 + r)/(1 − r)] ± z_{0.975} √(1/(n − 3))
= (1/2) ln(1.8274/0.1726) ± 1.96 √(1/27)
= 1.1797 ± 1.96/5.196
= (0.8025, 1.5569), denoted (L, U).
Now transform (L, U) back into a 95% CI for ρ:
( (exp(2L) − 1)/(exp(2L) + 1), (exp(2U) − 1)/(exp(2U) + 1) ) = (0.6655, 0.9149).

3. The completed tables are given below.

4. From the ANOVA table, R² = 76.858/112.278 = 68.45% and σ̂² = MSE = 1.265. The overall usefulness (global utility) F-test statistic is 60.758, with p-value = Pr{F(1, 28) > 60.758} < .0001. Thus we reject H_0 and conclude that the model is useful. If you prefer the critical value approach, the critical value is F^{(1,28)}_{0.95} = 4.196.
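A sketch of the solution-2 and solution-4 calculations, assuming Python with NumPy and SciPy; it works directly from the summary quantities above, so the results should match the solutions up to rounding.

```python
import numpy as np
from scipy import stats

n = 30
SSxx, SSyy, SSxy = 5.831, 112.278, 21.170   # from the preliminary calculations above

# Sample correlation and Fisher's Z-transformation CI for rho (solution item 2)
r = SSxy / np.sqrt(SSxx * SSyy)                       # approx. 0.8274
z = 0.5 * np.log((1 + r) / (1 - r))                   # Fisher's Z, approx. 1.1797
half = stats.norm.ppf(0.975) / np.sqrt(n - 3)
L, U = z - half, z + half                             # approx. (0.8025, 1.5569)
ci_rho = (np.tanh(L), np.tanh(U))                     # back-transform, approx. (0.665, 0.915)
print(r, (L, U), ci_rho)

# ANOVA quantities and the overall F test (solution item 4)
SSR = SSxy**2 / SSxx                                  # model SS, approx. 76.86
SSE = SSyy - SSR                                      # error SS, approx. 35.42
MSE = SSE / (n - 2)                                   # estimate of sigma^2, approx. 1.265
F = SSR / MSE                                         # approx. 60.8
print(SSR / SSyy, F, stats.f.sf(F, 1, n - 2))         # R^2, F statistic, p-value
```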

Parameter Estimates
Parameter   Estimate   s.e.     t       P-Value
β_0         0.4359     1.3379   0.326   0.747
β_1         3.6306     0.4658   7.795   < 0.0001

ANOVA Table
Source   df   SS        MS       F        P-Value
Model     1   76.858    76.858   60.758   < 0.0001
Error    28   35.420    1.265
Total    29   112.278

5. A 95% CI for β_0 is
β̂_0 ± t^{(n−2)}_{0.975} σ̂ √(1/n + x̄²/SS_xx) = 0.4359 ± 2.0484 × 1.3379 = (−2.3046, 3.1764).
In interpreting the CI for β_0, note the extrapolation problem, since no student has a GPA of 0. A 95% CI for β_1 is
β̂_1 ± t^{(n−2)}_{0.975} σ̂/√SS_xx = 3.6306 ± 2.0484 × 0.4658 = (2.6765, 4.5846).

6. We want to see whether a 0.5-point increase in GPA leads to more than a 1K increase in SAL; proportionally, a 1-point increase in GPA would then lead to more than a 2K increase in SAL. The alternative hypothesis, and hence the null, can therefore be stated as
H_a: 0.5 β_1 > 1, i.e., β_1 > 1/0.5 = 2, versus H_0: β_1 = 2.
The t test can be used:
t_obs = (β̂_1 − 2) / SE(β̂_1) = (3.6306 − 2)/0.4658 = 3.501.
This is an upper-tailed test, and we reject H_0 at α = 0.05 since t_obs > t^{(28)}_{0.95} = 1.701.
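A sketch of the solution-5 and solution-6 calculations, assuming Python with SciPy and using the estimates and standard errors from the parameter table above.

```python
from scipy import stats

n = 30
b0, se_b0 = 0.4359, 1.3379     # intercept estimate and its standard error
b1, se_b1 = 3.6306, 0.4658     # slope estimate and its standard error

# 95% confidence intervals for beta_0 and beta_1 (solution item 5)
tcrit = stats.t.ppf(0.975, n - 2)                     # approx. 2.0484
print((b0 - tcrit * se_b0, b0 + tcrit * se_b0))       # approx. (-2.305, 3.176)
print((b1 - tcrit * se_b1, b1 + tcrit * se_b1))       # approx. (2.677, 4.585)

# Upper-tailed test of H0: beta_1 = 2 vs Ha: beta_1 > 2 (solution item 6)
t_obs = (b1 - 2) / se_b1                              # approx. 3.501
print(t_obs,
      stats.t.ppf(0.95, n - 2),                       # critical value, approx. 1.701
      stats.t.sf(t_obs, n - 2))                       # one-sided p-value
```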

7. For the first observation, CGPA x_1 = 2.58 and SAL y_1 = 10.455. It can be found that ŷ_1 = 9.803, and the residual is r_1 = y_1 − ŷ_1 = 0.652.

8. At GPA x = 3.5, a 95% CI for estimating the mean SAL E(y | x) is given by
(β̂_0 + β̂_1 x) ± t^{(n−2)}_{0.975} σ̂ √(1/n + (x − x̄)²/SS_xx) = 13.1429 ± 2.0484 × 0.3703 = (12.384, 13.901).
A 95% PI for predicting the individual SAL is given by
(β̂_0 + β̂_1 x) ± t^{(n−2)}_{0.975} σ̂ √(1 + 1/n + (x − x̄)²/SS_xx) = 13.1429 ± 2.0484 × 1.1841 = (10.717, 15.568).

9. For the Working-Hotelling confidence band for E(y | x), the critical value is
W = √(2 F^{(2, n−2)}_{1−α}) = √(2 × F^{(2,28)}_{0.95}) = √(2 × 3.3404) = 2.5847.
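A sketch of the solution-7 through solution-9 calculations, assuming Python with NumPy and SciPy; the coefficient estimates and summary quantities are taken from the solutions above, so small rounding differences are possible.

```python
import numpy as np
from scipy import stats

n = 30
b0, b1 = 0.4359, 3.6306        # LS estimates from the parameter table
xbar, SSxx = 2.838, 5.831
MSE = 1.265
sigma_hat = np.sqrt(MSE)

# Fitted value and residual for the first student (solution item 7)
x1, y1 = 2.58, 10.455
yhat1 = b0 + b1 * x1                                             # approx. 9.80
print(yhat1, y1 - yhat1)                                         # residual approx. 0.65

# 95% CI for the mean salary and 95% PI for an individual salary at x = 3.5 (solution item 8)
x0 = 3.5
yhat0 = b0 + b1 * x0                                             # approx. 13.14
se_mean = sigma_hat * np.sqrt(1/n + (x0 - xbar)**2 / SSxx)       # approx. 0.370
se_pred = sigma_hat * np.sqrt(1 + 1/n + (x0 - xbar)**2 / SSxx)   # approx. 1.184
tcrit = stats.t.ppf(0.975, n - 2)
print((yhat0 - tcrit * se_mean, yhat0 + tcrit * se_mean))        # approx. (12.4, 13.9)
print((yhat0 - tcrit * se_pred, yhat0 + tcrit * se_pred))        # approx. (10.7, 15.6)

# Working-Hotelling multiplier for the confidence band (solution item 9)
W = np.sqrt(2 * stats.f.ppf(0.95, 2, n - 2))                     # approx. 2.585
print(W)
```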