Multiple Regression Examples

Similar documents
Confidence Interval for the mean response

assumes a linear relationship between mean of Y and the X s with additive normal errors the errors are assumed to be a sample from N(0, σ 2 )

Model Building Chap 5 p251

Models with qualitative explanatory variables p216

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6

Introduction to Regression

Chapter 14. Multiple Regression Models. Multiple Regression Models. Multiple Regression Models

The simple linear regression model discussed in Chapter 13 was written as

SMAM 314 Computer Assignment 5 due Nov 8,2012 Data Set 1. For each of the following data sets use Minitab to 1. Make a scatterplot.

Correlation & Simple Regression

Stat 501, F. Chiaromonte. Lecture #8

INFERENCE FOR REGRESSION

[4+3+3] Q 1. (a) Describe the normal regression model through origin. Show that the least square estimator of the regression parameter is given by

Multiple Regression Methods

Basic Business Statistics, 10/e

Stat 231 Final Exam. Consider first only the measurements made on housing number 1.

SMAM 314 Exam 42 Name

Residual Analysis for two-way ANOVA The twoway model with K replicates, including interaction,

Chapter 12: Multiple Regression

Predict y from (possibly) many predictors x. Model Criticism Study the importance of columns

Inferences for Regression

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007

1. Least squares with more than one predictor

PART I. (a) Describe all the assumptions for a normal error regression model with one predictor variable,

Analysis of Bivariate Data

23. Inference for regression

Histogram of Residuals. Residual Normal Probability Plot. Reg. Analysis Check Model Utility. (con t) Check Model Utility. Inference.

Simple Linear Regression. Steps for Regression. Example. Make a Scatter plot. Check Residual Plot (Residuals vs. X)

Steps for Regression. Simple Linear Regression. Data. Example. Residuals vs. X. Scatterplot. Make a Scatter plot Does it make sense to plot a line?

Data Set 8: Laysan Finch Beak Widths

SMAM 319 Exam 1 Name. 1.Pick the best choice for the multiple choice questions below (10 points 2 each)

One-Way Analysis of Variance (ANOVA)

Lecture 18: Simple Linear Regression

Concordia University (5+5)Q 1.

School of Mathematical Sciences. Question 1

Ch 13 & 14 - Regression Analysis

STAT 360-Linear Models

School of Mathematical Sciences. Question 1. Best Subsets Regression

STATISTICS 110/201 PRACTICE FINAL EXAM

This document contains 3 sets of practice problems.

SMAM 319 Exam1 Name. a B.The equation of a line is 3x + y =6. The slope is a. -3 b.3 c.6 d.1/3 e.-1/3

SMAM 314 Practice Final Examination Winter 2003

ANOVA: Analysis of Variation

Data files & analysis PrsnLee.out Ch9.xls

Business 320, Fall 1999, Final

Multiple Regression: Chapter 13. July 24, 2015

28. SIMPLE LINEAR REGRESSION III

Inference for Regression Inference about the Regression Model and Using the Regression Line

TAMS38 Experimental Design and Biostatistics, 4 p / 6 hp Examination on 19 April 2017, 8 12

Stat 529 (Winter 2011) A simple linear regression (SLR) case study. Mammals brain weights and body weights

1 Introduction to Minitab

TMA4255 Applied Statistics V2016 (5)

(1) The explanatory or predictor variables may be qualitative. (We ll focus on examples where this is the case.)

42 GEO Metro Japan

Analysis of Covariance. The following example illustrates a case where the covariate is affected by the treatments.

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

Contents. 2 2 factorial design 4

Chapter 15 Multiple Regression

MBA Statistics COURSE #4

STAB27-Winter Term test February 18,2006. There are 14 pages including this page. Please check to see you have all the pages.

Inference for the Regression Coefficient

Inferences for linear regression (sections 12.1, 12.2)

Interpreting the coefficients

27. SIMPLE LINEAR REGRESSION II

ACOVA and Interactions

Simple Linear Regression: A Model for the Mean. Chap 7

Institutionen för matematik och matematisk statistik Umeå universitet November 7, Inlämningsuppgift 3. Mariam Shirdel

Department of Mathematics & Statistics STAT 2593 Final Examination 17 April, 2000

22S39: Class Notes / November 14, 2000 back to start 1

Two-Way Analysis of Variance - no interaction

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

Disadvantages of using many pooled t procedures. The sampling distribution of the sample means. The variability between the sample means

MULTIPLE LINEAR REGRESSION IN MINITAB

Department of Mathematics & Statistics Stat 2593 Final Examination 19 April 2001

Lecture 9: Linear Regression

Multiple Predictor Variables: ANOVA

Examination paper for TMA4255 Applied statistics

IF YOU HAVE DATA VALUES:

Unit 6 - Introduction to linear regression

10 Model Checking and Regression Diagnostics

Is economic freedom related to economic growth?

Unit 6 - Simple linear regression

1 Use of indicator random variables. (Chapter 8)

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company

Basic Business Statistics 6 th Edition

AP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation

Statistics 5100 Spring 2018 Exam 1

Multiple Regression an Introduction. Stat 511 Chap 9

Ph.D. Preliminary Examination Statistics June 2, 2014

Chapter 9 - Correlation and Regression

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables

Multiple Linear Regression

Chapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression

Intro to Linear Regression

CHAPTER 10. Regression and Correlation

Soil Phosphorus Discussion

Lectures on Simple Linear Regression Stat 431, Summer 2012

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Intro to Linear Regression

Q Lecture Introduction to Regression

Transcription:

Multiple Regression Examples Example: Tree data. we have seen that a simple linear regression of usable volume on diameter at chest height is not suitable, but that a quadratic model y = β 0 + β 1 x + β 2 x 2 explains the curvature simple fit: MTB > regress c2 1 c1; SUBC> residuals c3. The regression equation is volume = - 191 + 11.0 diameter Predictor Coef Stdev t-ratio p Constant -191.12 16.98-11.25 0.000 diameter 11.0413 0.5752 19.19 0.000 s = 20.33 R-sq = 95.3% R-sq(adj) = 95.1% Analysis of Variance SOURCE DF SS MS F Regression 1 152259 152259 368.43 0. Error 18 7439 413 Total 19 159698

quadratic fit MTB > let c3=c1**2 MTB > name c3 diameter^2 MTB > Regress c2 2 c1 c3; SUBC> Constant; SUBC> Predict 30 900; SUBC> Brief 2. Regression Analysis: volume versus diameter, diameter^2 The regression equation is volume = 29.7-5.62 diameter + 0.290 diameter^2 Predictor Coef SE Coef T P Constant 29.74 51.39 0.58 0.570 diameter -5.620 3.792-1.48 0.157 diameter^2 0.29037 0.06572 4.42 0.000 S = 14.2715 R-Sq = 97.8% R-Sq(adj) = 97.6% Analysis of Variance Source DF SS MS F P Regression 2 156235 78118 383.54 0.000 Residual Error 17 3463 204 Total 19 159698

Source DF Seq SS diameter 1 152259 diameter^2 1 3976 Predicted Values for New Observations New Obs Fit SE Fit 95% CI 95% PI 1 122.46 5.15 (111.59, 133.33) (90.45, 154.47) Values of Predictors for New Observations New Obs diameter diameter^2 1 30.0 900 note that the estimates of β 0 and β 1 change when the quadratic term is included the DF for regression has increased to two, and the DF for residual has decreased by one the residual sum of squares for the quadratic fit is much smaller, 3463 versus 7439 the MSE = s 2 in also much smaller, 204 versus 413, (sometimes SSE decreases but

MSE does not, because the decrease in the numerator does not offset the decrease in the denominator) confidence intervals will be narrower, inferences will be more precise SST is the same, it depends on the responses only, not on the model R 2 is larger,.978 versus.953 the quadratic term is highly significant (T = 4.42, P =.000) the SEQ SS table shows the breakdown of SSR into SSR(diameter) = 152259, the same as before, and SSR(diameter 2 diameter) = 3976, the amount explained by the quadratic term, given that the linear term is already in the model the increase in SSR between the two models is the same as the drop in SSE formerly the prediction interval for a new response at diameter = 30 was (96.31, 183.92), now it is (90.45, 154.47)

this is narrower, but also shifted because the prediction has changed from 140.11 to 122.46

Example: The data set height.mtw can be found on BLS, in the Minitab Data Files folder. The variables in the data set include Y (height of student), X 1 (height of same sex parent), X 2 (height of opposite sex parent), and sex of the student (X 3 = 1 for males; X 3 = 0 for females). The following Minitab output gives the various Pearson correlations for different pairs. Results for: HEIGHTS.MTW MTB > corr c1-c4 Student Height SSP Height OSP Height SSP 0.641 0.000 Height OSP -0.193-0.408 0.205 0.005 M/F 0.692 0.606-0.539 0.000 0.000 0.000 Cell Contents: Pearson correlation P-Value the height of student is positively correlated with the height of the same sex

parent, and slightly negatively correlated with the height of the opposite sex parent the positive correlation with sex indicates that the males have greater heights than the females a pairs plot gives all pairwise scatterplots - the first row shows how student height relates to the predictors

the negative association of student height with height of opposite sex parent is surprising using separate labels for males and females, and superimposing separate regression lines shows positive associations for both males

and females!!!

the similar plot of student heights versus height of same sex parent shows nearly equal association for both males and females

There are a number of possible regression models, including regressions on single or multiple independent variables. Student height versus height of same sex parent MTB > regress c1 1 c2 Y i = β 0 + β 1 X i1 + ɛ i The regression equation is Student height = 30.7 + 0.557 SSP Predictor Coef SE Coef T P Constant 30.702 6.765 4.54 0.000 SSP 0.5567 0.1017 5.47 0.000 S = 2.74051 R-Sq = 41.1% R-Sq(adj) = 39.7% Analysis of Variance Source DF SS MS F P Regression 1 225.07 225.07 29.97 0.000 Residual Error 43 322.95 7.51 Total 44 548.02 only 41% of the variation in student height

is explained by the parent of the same height the estimate of σ is s = 2.74, so prediction intervals are very roughly ±5.5 inches - very wide! Student height versus height of opposite sex parent MTB > regress c1 1 c3 The regression equation is Student height = 81.9-0.209 OSP Predictor Coef SE Coef T P Constant 81.89 11.06 7.40 0.000 OSP -0.2094 0.1626-1.29 0.205 S = 3.50309 R-Sq = 3.7% R-Sq(adj) = 1.5% Analysis of Variance Source DF SS MS F P Regression 1 20.34 20.34 1.66 0.205 Residual Error 43 527.68 12.27 Total 44 548.02

Student height versus height of same sex parent and height of opposite sex parent Y i = β 0 + β 1 X i1 + β 2 X i2 + ɛ i MTB > regress c1 2 c2 c3 Predictor Coef SE Coef T P Constant 22.66 14.30 1.59 0.120 SSP 0.5859 0.1122 5.22 0.000 OSP 0.0897 0.1403 0.64 0.526 S = 2.75953 R-Sq = 41.6% R-Sq(adj) = 38.9% Analysis of Variance Source DF SS MS F P Regression 2 228.19 114.09 14.98 0.000 Residual Error 42 319.83 7.62 Total 44 548.02 note that the extra amount explained by the opposite sex parent is small SSR(OSP SSP ) = 228.19 225.17 = 3.02 - the T value for this variable is small and not significant

Student height versus heights of both parents taking into account their sex Y i = β 0 + β 1 X i1 + β 2 X i2 + β 3 X i3 + ɛ i MTB > regress c1 3 c2 c3 c4 Predictor Coef SE Coef T P Constant 21.50 11.70 1.84 0.073 SSP 0.3375 0.1061 3.18 0.003 OSP 0.3249 0.1254 2.59 0.013 M/F 4.4446 0.9534 4.66 0.000 S = 2.25790 R-Sq = 61.9% R-Sq(adj) = 59.1% Analysis of Variance Source DF SS MS F P Regression 3 339.00 113.00 22.16 0.000 Residual Error 41 209.02 5.10 Total 44 548.02 Source DF Seq SS Height of same sex parent 1 225.07 Height of opposite sex parent 1 3.11 Male or Female ( 1=M, 0 = F) 1 110.81

Unusual Observations Height of same sex Student Obs parent height Fit SE Fit Residual St Resid 11 66.0 71.000 66.524 0.468 4.476 2.03R 28 60.0 69.000 63.524 0.725 5.476 2.56R 44 71.3 63.400 64.999 1.381-1.599-0.89 X R denotes an observation with a large standardized residual. X denotes an observation whose X value gives it large leverage. the student s sex greatly improves the fit the R 2 has increased to 61.9 all variables are significant s has dropped to s = 2.26 - prediction intervals are still wide two outliers are large (St. Resid denotes a standardized residual, so most should be between -2 and 2 one case is has the potential to influence the fit, because it is an outlier in the space of the predictors investigation reveals this is a female student with mother 71.3 inches tall and father 59.8 inches short! it is quite possible that the sex of the parents was recorded incorrectly

note how the sequential sum of squares tables changes if the order in which the variables is entered into the model is changed Source DF Seq SS Male or Female ( 1=M, 0 = F) 1 262.23 Height of same sex parent 1 42.54 Height of opposite sex parent 1 34.23 Source DF Seq SS Male or Female ( 1=M, 0 = F) 1 262.23 Height of opposite sex parent 1 25.18 Height of same sex parent 1 51.59 Source DF Seq SS Height of opposite sex parent 1 20.34 Male or Female ( 1=M, 0 = F) 1 267.07 Height of same sex parent 1 51.59

plots of the residuals versus each of the predictors show no problems

in this plot the males, with the larger fitted values, are indicated with squares, the females are circles

the coefficients in the model for SSP and OSP are nearly the same this suggests that one can add or average the heights of the two parents the following model uses the average, and with one fewer predictor the fit to the data is nearly as good as before let c10 = (c2+c3)/2 MTB > regress c1 2 c4 c10 Regression Analysis: Student height versus Male or Female (, C10 The regression equation is Student height = 21.3 + 4.50 Male or Female ( 1=M, 0 = F) + 0.665 C10 Predictor Coef SE Coef T P Constant 21.31 11.33 1.88 0.067 Male or Female ( 1=M, 0 = F) 4.4971 0.6969 6.45 0.000 C10 0.6649 0.1693 3.93 0.000 S = 2.23104 R-Sq = 61.9% R-Sq(adj) = 60.0% Analysis of Variance Source DF SS MS F P Regression 2 338.96 169.48 34.05 0.000 Residual Error 42 209.06 4.98 Total 44 548.02 Source DF Seq SS Male or Female ( 1=M, 0 = F) 1 262.23 C10 1 76.74

Unusual Observations Male or Female ( 1=M, Student Obs 0 = F) height Fit SE Fit Residual St Resid 3 0.00 59.000 63.198 0.782-4.198-2.01R 11 0.00 71.000 66.522 0.461 4.478 2.05R 28 0.00 69.000 63.530 0.713 5.470 2.59R 33 1.00 66.000 67.362 1.022-1.362-0.69 X the R 2 for this model is the same as before, and the MSE and its square root s have been reduced