STA 108 Applied Linear Models: Regression Analysis, Spring 2011 - Solution for Homework #6

6.2

a)

        | X11  X12  X11^2 |              | β1 |
    X = | X21  X22  X21^2 |          β = | β2 |
        | X31  X32  X31^2 |              | β3 |
        | X41  X42  X41^2 |
        | X51  X52  X51^2 |

b)

        | 1  X11  X12 |              | β0 |
    X = | 1  X21  X22 |          β = | β1 |
        | 1  X31  X32 |              | β2 |
        | 1  X41  X42 |
        | 1  X51  X52 |

6.15

a) [Stem-and-leaf display of X1 (patient age): N = 46, Leaf Unit = 1.0.]

There are more people in the 28-29 age range than in most of the other age groups shown here, but otherwise the distribution seems reasonably uniform.
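Both parts of 6.2 above are special cases of the general linear model in matrix terms; for reference (standard notation, added here, not reproduced from the original solution):

```latex
\mathbf{Y}_{5\times 1} = \mathbf{X}_{5\times 3}\,\boldsymbol{\beta}_{3\times 1} + \boldsymbol{\varepsilon}_{5\times 1},
\qquad
\mathrm{E}\{\boldsymbol{\varepsilon}\} = \mathbf{0},
\qquad
\sigma^2\{\boldsymbol{\varepsilon}\} = \sigma^2\mathbf{I},
```

where in part (a) all three columns of X hold predictor terms (no intercept), while in part (b) the first column is all ones, corresponding to the intercept β0.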

[Stem-and-leaf display of X2 (severity of illness): N = 46, Leaf Unit = 1.0.]

About half of the patients had a severity of illness index between 48 and 51.

[Stem-and-leaf display of X3 (anxiety level): N = 46, Leaf Unit = 0.010.]

Most of the patients' anxiety levels are between 2 and 2.4 (perhaps a moderate level). There are 3 patients who appear to have a higher anxiety level than any of the others in the study.

b) [Scatter plot matrix of psat, age, severity, and anxiety.]

Correlation matrix:

               psat      age  severity
age          -0.787
              0.000
severity     -0.603    0.568
              0.000    0.000
anxiety      -0.645    0.570    0.671
              0.000    0.000    0.000

Cell Contents: Pearson correlation
               P-Value

The plots of patient satisfaction vs. age show a negative linear relationship, and the plots of patient satisfaction vs. severity of illness and anxiety level also indicate negative relationships (though not as strong as for satisfaction and age). The correlations between patient satisfaction and the predictor variables support these visual findings. The scatter plots of age vs. severity and age vs. anxiety do not show strong linear relationships (note that the corresponding correlations are moderate). Finally, the plots of severity vs. anxiety level indicate a fairly strong positive linear relationship (the correlation between these two variables is 0.671).

c) The regression equation is

psat = 158 - 1.14 age - 0.442 severity - 13.5 anxiety

Predictor      Coef  SE Coef      T      P
Constant     158.49    18.13   8.74  0.000
age         -1.1416   0.2148  -5.31  0.000
severity    -0.4420   0.4920  -0.90  0.374
anxiety     -13.470    7.100  -1.90  0.065

S = 10.06   R-Sq = 68.2%   R-Sq(adj) = 65.9%

Analysis of Variance

Source          DF        SS      MS      F      P
Regression       3    9120.5  3040.2  30.05  0.000
Residual Error  42    4248.8   101.2
Total           45   13369.3

Source    DF  Seq SS
age        1  8275.4
severity   1   480.9
anxiety    1   364.2

The estimated regression function is:

Ŷ = 158 - 1.14 X1 - 0.442 X2 - 13.5 X3

Interpretation of b2: if age and anxiety level are held constant, then increasing the severity of illness index by 1 unit decreases the estimated patient satisfaction score by 0.442.

d) [Plot of the residuals against the fitted values.]

There do not appear to be any outliers among the residuals, and the residual plot shows no violation of the assumptions of the regression model.
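The ANOVA quantities printed above are internally consistent; they can be cross-checked with a few lines of arithmetic (the sums of squares are copied from the MINITAB output, so this is a verification sketch, not part of the original solution):

```python
# Check the ANOVA decomposition, mean squares, F statistic, and
# R-squared reported in the MINITAB output for the patient
# satisfaction regression (n = 46 patients, p = 4 parameters).
SSR, SSE, SSTO = 9120.5, 4248.8, 13369.3
n, p = 46, 4

assert abs((SSR + SSE) - SSTO) < 0.1            # SSTO = SSR + SSE
assert abs((8275.4 + 480.9 + 364.2) - SSR) < 0.05  # Seq SS add up to SSR

MSR = SSR / (p - 1)   # mean square for regression
MSE = SSE / (n - p)   # mean square error
F = MSR / MSE         # overall F statistic
R2 = SSR / SSTO       # coefficient of multiple determination

print(round(MSR, 1), round(MSE, 1), round(F, 2), round(R2, 3))
# 3040.2 101.2 30.05 0.682
```

The printed values match the MINITAB output line by line (MSR = 3040.2, MSE = 101.2, F = 30.05, R-Sq = 68.2%), and S = 10.06 is just the square root of MSE.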

e) All of the residual plots appear to have points randomly scattered about the 0 level, so the regression function is appropriate and there is no evidence of nonconstancy of the error variance. The normal probability plot indicates that the assumption of a normal distribution for the residuals is very reasonable (the correlation between the ordered residuals and their expected values under normality is about 0.98).

f) No, since there are no repeat observations with the same levels of X1, X2, and X3.

g) Regressing the squared residuals against X1, X2, and X3 gives SSR* = 21356. SSE from the original model is 4248.8.

χ²BP = (SSR*/2) / (SSE/n)² = (21356/2) / (4248.8/46)² = 1.25

χ²(.99; 3) = 11.34

Since χ²BP = 1.25 < 11.34, we conclude that the error variance is constant.

6.16

a) H0: β1 = β2 = β3 = 0
   HA: not all βk = 0 (k = 1, 2, 3)

Reject H0 if F* > F(1 - α; p - 1, n - p).

F* = 30.05 (from the MINITAB output). F(.90; 3, 42) = 2.22 < 30.05, so reject H0. The test implies that at least one of β1, β2 and β3 is not zero, i.e. at least one of age, severity and anxiety is useful in predicting patient satisfaction. The p-value of the test is 0.000 (from MINITAB's probability distributions feature).

b) B = t(1 - α/(2g); n - p). Here g = 3 and α = .10, so B = t(.9833; 42) = 2.20. (Standard errors are from the computer output.)

β1: -1.1416 ± (2.20)(0.2148) → (-1.614, -0.669)
β2: -0.4420 ± (2.20)(0.4920) → (-1.524, 0.640)
β3: -13.470 ± (2.20)(7.100) → (-29.09, 2.15)

β1, β2 and β3 are contained in these intervals with 90% family confidence. The slope of the age variable is the only one that is significant at α = 0.10, because its interval is the only one that does not contain 0.

c) The R² for this model is 0.682 (SSR/SSTO), so the coefficient of multiple correlation is R = √0.682 = 0.826. It indicates the strength of the linear relationship between the set of predictor variables and patient satisfaction.
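The Breusch-Pagan statistic in 6.15(g) and the Bonferroni intervals in 6.16(b) are simple arithmetic on quantities already in the output; a sketch (B = 2.20 is the tabled t(.9833; 42) value used in the solution):

```python
# Breusch-Pagan test statistic from 6.15(g): SSR* comes from the
# regression of squared residuals on X1, X2, X3; SSE from the full fit.
SSR_star, SSE, n = 21356.0, 4248.8, 46
chi2_bp = (SSR_star / 2) / (SSE / n) ** 2
print(round(chi2_bp, 2))  # 1.25, well below chi2(.99; 3) = 11.34

# Bonferroni joint 90% intervals from 6.16(b): b_k +/- B * s{b_k},
# with B = t(1 - .10/(2*3); 42) = 2.20 (tabled value).
B = 2.20
for b, s in [(-1.1416, 0.2148),    # age
             (-0.4420, 0.4920),    # severity
             (-13.470, 7.100)]:    # anxiety
    print(round(b - B * s, 3), round(b + B * s, 3))
```

Only the age interval excludes zero, which is what drives the conclusion in 6.16(b).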

6.17

The REG Procedure
Model: MODEL1
Dependent Variable: psat

X'X Inverse, Parameter Estimates, and SSE

Variable        Intercept        Age             Severity        anxiety         Satisfaction
Intercept       3.2477116535     0.00911391     -0.06793079     -0.06798817     158.4915167
Age             0.00911391       0.0004560816   -0.000318596    -0.0046671       -1.141611847
Severity       -0.06793079      -0.000318596     0.0023924814   -0.0177085       -0.4420046
anxiety        -0.06798817      -0.0046671      -0.0177085       0.498577303    -13.47016319
Satisfaction  158.4915167       -1.141611847    -0.4420046     -13.47016319    4248.8406818

Note, the matrix (X'X)^-1 consists of only the first 4 rows and first 4 columns of the above matrix obtained in SAS. The fifth row and column contain the parameter estimates b0, b1, b2 and b3, and SSE is the element in the 5th row and 5th column.

a) Ŷh ± t(1 - α/2; n - p) s{Ŷh}, where

s²{Ŷh} = x'h s²{b} xh = MSE (x'h (X'X)^-1 xh)

With x'h = (1, 35, 45, 2.2):

s²{Ŷh} = 101.2 × (1, 35, 45, 2.2) (X'X)^-1 (1, 35, 45, 2.2)' = 7.0756, so s{Ŷh} = 2.66

α = .10, so t(1 - α/2; n - p) = t(.95; 42) = 1.68

Ŷh = 158.49 - 1.1416(35) - 0.4420(45) - 13.470(2.2) = 69.01

69.01 ± (1.68)(2.66)

When Xh1 = 35, Xh2 = 45 and Xh3 = 2.2, the mean patient satisfaction E{Yh} is in the interval (64.53, 73.49) with 90% confidence.

b) s²{pred} = MSE + s²{Ŷh} = 101.2 + 7.0756 = 108.2756, so s{pred} = 10.4056

Ŷh ± t(1 - α/2; n - p) s{pred}

69.01 ± (1.68)(10.4056)

When Xh1 = 35, Xh2 = 45 and Xh3 = 2.2, a new patient satisfaction observation will be in the interval (51.51, 86.51) with 90% confidence.

7.6

H0: β2 = β3 = 0
HA: at least one of β2 and β3 is not zero
α = 0.025

SSR(X2, X3 | X1) = SSE(X1) - SSE(X1, X2, X3) = 5093.9 - 4248.8

= 845.1

F* = [SSR(X2, X3 | X1) / ((n - 2) - (n - 4))] / [SSE(X1, X2, X3) / (n - 4)] = (845.1/2) / (4248.8/42) = 4.18

If F* > F(.975; 2, 42) = 4.0327, reject H0; otherwise do not reject.

4.0327 < 4.18, so reject H0: X2 and X3 cannot be dropped from the model given that X1 is retained.

p-value = 0.022
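The interval estimates in 6.17 and the partial F statistic in 7.6 can both be reproduced from the summary quantities alone; a quick check (MSE, s²{Ŷh}, the error sums of squares, and t(.95; 42) ≈ 1.682 are taken as given from the solution, so this is a verification sketch rather than a re-fit of the data):

```python
import math

# 6.17(a, b): 90% CI for the mean response and 90% prediction
# interval at Xh = (1, 35, 45, 2.2).
MSE, s2_yhat = 101.2, 7.0756
t_95_42 = 1.682                                 # t(.95; 42)

yhat = 158.49 - 1.1416 * 35 - 0.4420 * 45 - 13.470 * 2.2  # 69.01
s_yhat = math.sqrt(s2_yhat)                     # 2.66
s_pred = math.sqrt(MSE + s2_yhat)               # 10.4056
ci = (yhat - t_95_42 * s_yhat, yhat + t_95_42 * s_yhat)
pi = (yhat - t_95_42 * s_pred, yhat + t_95_42 * s_pred)

# 7.6: partial F test for dropping X2 and X3 with X1 retained.
SSE_X1, SSE_full, n = 5093.9, 4248.8, 46
F_star = ((SSE_X1 - SSE_full) / 2) / (SSE_full / (n - 4))

print([round(v, 1) for v in ci + pi], round(F_star, 2))
# [64.5, 73.5, 51.5, 86.5] 4.18
```

Rounded to one decimal these agree with the intervals (64.53, 73.49) and (51.51, 86.51) above, and F* = 4.18 exceeds F(.975; 2, 42), reproducing the rejection in 7.6.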