22S39: Class Notes / November 14, 2000

1. Model diagnostics
2. Interpretation of fitted regression model

Model diagnostics

Before we interpret a fitted (linear) regression model, we need to check whether the fitted model provides a good fit to the data. In other words, we need to check whether the regression model is a plausible data-generating mechanism for the data at hand. This requires checking whether the residuals satisfy the assumptions of (1) having no pattern, (2) constant variance, (3) normality, and (4) independence. The residuals may display a trend if the mean response is not linear in X. This is most readily detected by plotting the residuals against the fitted values, or against the X variable. Either plot conveys the same information. Why?
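A minimal sketch of the point about the two plots, in plain Python with simulated straight-line data (the model, sample size, and seed are illustrative assumptions, not from the notes). Because each fitted value is $b_0 + b_1 X$, sorting the cases by fitted value reproduces their ordering by X (mirrored when $b_1 < 0$), and the horizontal spacing is only rescaled, so both residual plots show the same pattern.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Illustrative data from an assumed straight-line model.
random.seed(1)
xs = [random.gauss(0, 5) for _ in range(50)]
ys = [2 + 3 * x + random.gauss(0, 1) for x in xs]

b0, b1 = fit_line(xs, ys)
fitted = [b0 + b1 * x for x in xs]
resid = [y - f for y, f in zip(ys, fitted)]

# Horizontal order of the cases in "residuals vs X" and "residuals vs fitted".
order_by_x = sorted(range(len(xs)), key=lambda i: xs[i])
order_by_fit = sorted(range(len(xs)), key=lambda i: fitted[i])
if b1 < 0:
    order_by_fit.reverse()  # a negative slope mirrors the plot left to right

print(order_by_x == order_by_fit)  # same ordering of the points
print(abs(sum(resid)) < 1e-8)      # OLS residuals (with intercept) sum to zero
```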

For example, below are data simulated from the model $Y = X - X^2 + e$, where the errors are normally distributed and the X's are drawn from $N(0, 5^2)$.
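A minimal sketch of the curvature check, in plain Python. It takes the model as $Y = X - X^2 + e$ with $X \sim N(0, 5^2)$; the error SD of 5, the sample size, and the seed are chosen only for illustration. A straight-line fit to such data leaves residuals that are, on average, positive for central X and negative at both ends, which is the curved pattern a residual plot shows.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Assumed model Y = X - X^2 + e, X ~ N(0, 5^2); error SD 5 is illustrative.
random.seed(2)
n = 300
xs = [random.gauss(0, 5) for _ in range(n)]
ys = [x - x ** 2 + random.gauss(0, 5) for x in xs]

b0, b1 = fit_line(xs, ys)
resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Average residual in the lower, middle, and upper thirds of X.
order = sorted(range(n), key=lambda i: xs[i])
thirds = [order[: n // 3], order[n // 3 : 2 * n // 3], order[2 * n // 3 :]]
means = [sum(resid[i] for i in idx) / len(idx) for idx in thirds]
print([round(m, 1) for m in means])
```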

Figure 1: y vs x; residuals vs x; residuals vs fitted values; normal q-q plot of the residuals (residual quantiles vs normal quantiles).

The scatter diagram of X and Y shows slight curvature, which is much clearer in the plot of residuals vs fitted values. Also, the q-q plot of the residuals shows non-normality, mainly due to the curvature in the residuals. When the residuals show curvature, we may fit a quadratic model $Y = \alpha_0 + \beta_1 (X - \bar{X}) + \beta_2 (X - \bar{X})^2 + e$, a cubic model, or a polynomial model of higher degree. Sometimes the mean response bears a linear relationship with X only after a suitable transformation. For example, we have simulated data from the model $Y = \exp(0.6X + e)$ with normally distributed, mean-zero errors $e$, i.e., Y grows exponentially with X on average. If we transform Y to $Z = \log(Y)$, we have $Z = 0.6X + e$, a linear regression model.
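A minimal sketch of the log transformation at work, in plain Python. The notes do not give the design, so X ~ Uniform(0, 10), an error SD of 0.25, the sample size, and the seed are all hypothetical choices. Regressing $Z = \log(Y)$ on X should recover a slope near 0.6; read multiplicatively, a unit increase in X multiplies the median of Y by about $e^{0.6} \approx 1.82$.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Hypothetical design: X ~ Uniform(0, 10), e ~ N(0, 0.25^2).
random.seed(3)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [math.exp(0.6 * x + random.gauss(0, 0.25)) for x in xs]

# Regress Z = log(Y) on X; the true slope is 0.6 and the true intercept is 0.
zs = [math.log(y) for y in ys]
b0, b1 = fit_line(xs, zs)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```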

The following figure shows the diagnostic plots from fitting a simple linear regression of Y on X.

Figure 2: y vs x; residuals vs fitted values; residuals vs x; normal q-q plot of the residuals (data quantiles vs normal quantiles).

We see that the residuals have a curved pattern, and their variance increases with the magnitude of the fitted value. That is, the residuals appear to have non-constant variance, known as heteroscedasticity. Do you expect these two phenomena? Residuals that show both curvature and heteroscedasticity call for transforming Y. Common transformations to try include

1. $Z = \sqrt{Y}$ (good to try for count responses, where the variance of the residuals increases with the fitted value),
2. $Z = \log(Y)$ (good to try if you suspect that the response increases by a certain percentage per unit change in X, or if the variance of the residuals increases with the squared fitted value), and
3. $Z = 1/Y$ (good to try if the reciprocal makes sense, e.g., automobile efficiency can be measured in miles per gallon or in gallons per mile).
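A rough numerical companion to comparing residual plots across these transformations, in plain Python. The data follow $Y = \exp(0.6X + e)$ under the same kind of hypothetical design (X ~ Uniform(0, 10), error SD 0.25, arbitrary seed). The spread ratio below, the residual SD for the upper half of fitted values divided by that for the lower half, is our own crude heuristic rather than a formal test; values near 1 suggest roughly constant variance.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def spread_ratio(xs, zs):
    """SD of residuals for the upper half of fitted values over the SD for the lower half."""
    b0, b1 = fit_line(xs, zs)
    fitted = [b0 + b1 * x for x in xs]
    resid = [z - f for z, f in zip(zs, fitted)]
    order = sorted(range(len(xs)), key=lambda i: fitted[i])
    half = len(order) // 2
    low = [resid[i] for i in order[:half]]
    high = [resid[i] for i in order[half:]]
    return statistics.stdev(high) / statistics.stdev(low)

# Hypothetical data from Y = exp(0.6 X + e).
random.seed(4)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [math.exp(0.6 * x + random.gauss(0, 0.25)) for x in xs]

transforms = {
    "raw": lambda y: y,
    "sqrt": math.sqrt,
    "log": math.log,
    "reciprocal": lambda y: 1 / y,
}
ratios = {name: spread_ratio(xs, [f(y) for y in ys]) for name, f in transforms.items()}
for name, r in ratios.items():
    print(f"{name:>10}: {r:.2f}")
```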

Figure 3: residuals vs fitted values after transforming Y to (a) sqrt(Y), (b) log(Y), and (c) 1/Y.

Based on the residual plots, the log transformation appears appropriate. Other common problems encountered in regression analysis are outliers and influential cases. Influential cases are cases with extreme X values that may unduly influence the fitted model.

Figure 4 (scatter plot of y vs x): note that if we lower the Y-value of the point with X = 10, the fitted line changes a lot.
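A minimal sketch of why the point at X = 10 matters so much, in plain Python with hypothetical data (five points with X from 1 to 5 plus one at X = 10; the Y values are invented). It uses the standard simple-regression leverage $h_i = 1/n + (x_i - \bar{x})^2 / S_{xx}$ and compares how much the slope moves when the same amount is subtracted from the Y-value of the extreme point versus a central one.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def leverages(xs):
    """Leverage h_i = 1/n + (x_i - xbar)^2 / Sxx in simple linear regression."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [1 / n + (x - xbar) ** 2 / sxx for x in xs]

# Hypothetical data with one extreme X value.
xs = [1, 2, 3, 4, 5, 10]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 10.2]

h = leverages(xs)
_, slope = fit_line(xs, ys)

def slope_after_lowering(i, amount=5.0):
    """Refit after lowering the Y-value of case i by `amount`."""
    new_ys = list(ys)
    new_ys[i] -= amount
    return fit_line(xs, new_ys)[1]

change_extreme = slope_after_lowering(5) - slope  # the point with X = 10
change_central = slope_after_lowering(2) - slope  # the point with X = 3
print([round(v, 3) for v in h])
print(round(change_extreme, 3), round(change_central, 3))
```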

Errors need not be independent. For example, if there is a learning effect, the error variance of the later runs (of the experiments) may be smaller than that of the earlier ones. Plotting the residuals against the order in which the experiments were done may reveal such problems. Outliers are cases whose residuals are unusual, e.g., with magnitude larger than $3\sqrt{\text{MSE}}$. Here is the scatter plot of the presidential election data from Florida. Clearly, Palm Beach is an outlier. Both outliers and influential cases may unduly affect the model fit, and in many cases they may be dropped from the analysis. However, the bottom line is that regression analysis can point out unusual cases, which may tell us something interesting.
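A minimal sketch of checking residuals against run order, in plain Python. The experiment is hypothetical: 60 runs in time order with an assumed learning effect in which the error SD shrinks linearly from 3 to 0.5; the regression coefficients and seed are also invented. Comparing the residual SD in the first and second halves of the run sequence is a crude numerical stand-in for looking at the residuals-vs-order plot.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Hypothetical runs, listed in the order they were carried out.
random.seed(6)
runs = 60
xs = [random.uniform(0, 10) for _ in range(runs)]
sds = [3 - 2.5 * k / (runs - 1) for k in range(runs)]  # assumed learning effect
ys = [1 + 2 * x + random.gauss(0, sd) for x, sd in zip(xs, sds)]

b0, b1 = fit_line(xs, ys)
resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # still in run order

early = statistics.stdev(resid[: runs // 2])
late = statistics.stdev(resid[runs // 2 :])
print(f"residual SD, first half of runs: {early:.2f}; second half: {late:.2f}")
```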

Interpretation of fitted regression model

Here is the fit of the regression model of PRICE on (polishing) TIME. The output gives the coefficient table (Coef, SE Coef, T, p) for the constant and TIME, with s = 20.5, R-Sq = 84.47%, and R-Sq(adj) = 84.2%, followed by the analysis of variance table (Source, DF, SS, MS, F, p) with rows Regression, Residual Error, and Total.

We shall assess this model fit later on. Accepting the model fit for the moment, we note that the intercept is not significantly different from zero, which makes sense. The slope is significantly different from 0, indicating that TIME is a useful predictor of PRICE. In particular, each unit increase in polishing time leads to an increase of about 2.5 units in price, on average. Furthermore, TIME explains about 84.5% of the variation in PRICE. Now, let's step back and check whether the above model provides a good fit to the data. Note that we shall plot the standardized residuals, defined as the residuals divided by the square root of the residual mean square (RMS), so that the standardized residuals are approximately N(0, 1) if the fitted model is appropriate for the data.
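A minimal sketch of the standardized residuals as defined here (residual divided by $\sqrt{\text{RMS}}$, with RMS = SSE/(n - 2)), in plain Python. The TIME/PRICE-like data are hypothetical, not the data in the notes: slope 2.5, error SD 4, and case 9 shifted up by 50 so that it acts as an outlier. Cases with standardized residuals larger than 3 in magnitude are flagged.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def standardized_residuals(xs, ys):
    """Residuals divided by sqrt(RMS), where RMS = SSE / (n - 2)."""
    b0, b1 = fit_line(xs, ys)
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    rms = sum(r * r for r in resid) / (len(xs) - 2)
    return [r / math.sqrt(rms) for r in resid]

# Hypothetical data with one planted outlier at index 9.
random.seed(7)
times = list(range(1, 31))
prices = [2.5 * t + random.gauss(0, 4) for t in times]
prices[9] += 50

std_res = standardized_residuals(times, prices)
flagged = [i for i, s in enumerate(std_res) if abs(s) > 3]
print(flagged)
```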

Figure 5: PRICE vs TIME; standardized residuals vs fitted values; standardized residuals vs diameter; normal q-q plot of the standardized residuals (data quantiles vs normal quantiles).

We note that there seems to be an outlier, as indicated by the case whose standardized residual has magnitude larger than 3. Also, the residuals vs fitted plot shows no particular pattern. However, the residuals vs diameter plot suggests a linear relationship; that is, we can predict the residuals from the diameter. This points to the fact that besides TIME, DIAM (the diameter of the tableware) is also an important predictor.
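A minimal sketch of this residual check, in plain Python with hypothetical data (the notes' TIME, PRICE, and DIAM values are not reproduced). PRICE is simulated to depend on both TIME and DIAM with assumed coefficients 2.5 and 4 and an error SD of 5. After regressing PRICE on TIME alone, the residuals are strongly related to DIAM, which is the pattern a residuals-vs-diameter plot reveals and the signal to add DIAM to the model.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    sab = sum((u - abar) * (v - bbar) for u, v in zip(a, b))
    saa = sum((u - abar) ** 2 for u in a)
    sbb = sum((v - bbar) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5

# Hypothetical data: PRICE depends on TIME and DIAM.
random.seed(8)
n = 40
times = [random.uniform(10, 60) for _ in range(n)]
diams = [random.uniform(5, 15) for _ in range(n)]
prices = [2.5 * t + 4 * d + random.gauss(0, 5) for t, d in zip(times, diams)]

# Step 1: regress PRICE on TIME only and keep the residuals.
b0, b1 = fit_line(times, prices)
resid = [p - (b0 + b1 * t) for t, p in zip(times, prices)]

# Step 2: check whether DIAM predicts those residuals.
r = correlation(diams, resid)
_, slope_diam = fit_line(diams, resid)
print(f"corr(resid, DIAM) = {r:.2f}, slope = {slope_diam:.2f}")
```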


Linear Regression is a very popular method in science and engineering. It lets you establish relationships between two or more numerical variables. Lab 13. Linear Regression www.nmt.edu/~olegm/382labs/lab13r.pdf Note: the things you will read or type on the computer are in the Typewriter Font. All the files mentioned can be found at www.nmt.edu/~olegm/382labs/

More information

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 LAST NAME: SOLUTIONS FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 302 STA 1001 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator.

More information

Math 141. Lecture 20: Regression Remedies. Albyn Jones 1. jones/courses/ Library 304. Albyn Jones Math 141

Math 141. Lecture 20: Regression Remedies. Albyn Jones 1.  jones/courses/ Library 304. Albyn Jones Math 141 Math 141 Lecture 20: Regression Remedies Albyn Jones 1 1 Library 304 jones@reed.edu www.people.reed.edu/ jones/courses/141 LAST TIME Formal Inference: Hypothesis tests and Confidence Intervals for regression

More information

Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression

Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression Scenario: 31 counts (over a 30-second period) were recorded from a Geiger counter at a nuclear

More information

Correlation and Regression

Correlation and Regression Correlation and Regression Dr. Bob Gee Dean Scott Bonney Professor William G. Journigan American Meridian University 1 Learning Objectives Upon successful completion of this module, the student should

More information

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont. TCELL 9/4/205 36-309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted

More information

Regression Models. Chapter 4

Regression Models. Chapter 4 Chapter 4 Regression Models To accompany Quantitative Analysis for Management, Eleventh Edition, by Render, Stair, and Hanna Power Point slides created by Brian Peterson Introduction Regression analysis

More information

unadjusted model for baseline cholesterol 22:31 Monday, April 19,

unadjusted model for baseline cholesterol 22:31 Monday, April 19, unadjusted model for baseline cholesterol 22:31 Monday, April 19, 2004 1 Class Level Information Class Levels Values TRETGRP 3 3 4 5 SEX 2 0 1 Number of observations 916 unadjusted model for baseline cholesterol

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Is economic freedom related to economic growth?

Is economic freedom related to economic growth? Is economic freedom related to economic growth? It is an article of faith among supporters of capitalism: economic freedom leads to economic growth. The publication Economic Freedom of the World: 2003

More information

Second Midterm Exam Name: Solutions March 19, 2014

Second Midterm Exam Name: Solutions March 19, 2014 Math 3080 1. Treibergs σιι Second Midterm Exam Name: Solutions March 19, 2014 (1. The article Withdrawl Strength of Threaded Nails, in Journal of Structural Engineering, 2001, describes an experiment to

More information

Related Example on Page(s) R , 148 R , 148 R , 156, 157 R3.1, R3.2. Activity on 152, , 190.

Related Example on Page(s) R , 148 R , 148 R , 156, 157 R3.1, R3.2. Activity on 152, , 190. Name Chapter 3 Learning Objectives Identify explanatory and response variables in situations where one variable helps to explain or influences the other. Make a scatterplot to display the relationship

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression 36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 22, 2015 Lecture 4: Linear Regression TCELL Simple Regression Example Male black wheatear birds carry stones to the nest as a form

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

More information

2. Outliers and inference for regression

2. Outliers and inference for regression Unit6: Introductiontolinearregression 2. Outliers and inference for regression Sta 101 - Spring 2016 Duke University, Department of Statistical Science Dr. Çetinkaya-Rundel Slides posted at http://bit.ly/sta101_s16

More information

Section Least Squares Regression

Section Least Squares Regression Section 2.3 - Least Squares Regression Statistics 104 Autumn 2004 Copyright c 2004 by Mark E. Irwin Regression Correlation gives us a strength of a linear relationship is, but it doesn t tell us what it

More information

Business 320, Fall 1999, Final

Business 320, Fall 1999, Final Business 320, Fall 1999, Final name You may use a calculator and two cheat sheets. You have 3 hours. I pledge my honor that I have not violated the Honor Code during this examination. Obvioiusly, you may

More information

Review of Regression Basics

Review of Regression Basics Review of Regression Basics When describing a Bivariate Relationship: Make a Scatterplot Strength, Direction, Form Model: y-hat=a+bx Interpret slope in context Make Predictions Residual = Observed-Predicted

More information

Analysis of Covariance. The following example illustrates a case where the covariate is affected by the treatments.

Analysis of Covariance. The following example illustrates a case where the covariate is affected by the treatments. Analysis of Covariance In some experiments, the experimental units (subjects) are nonhomogeneous or there is variation in the experimental conditions that are not due to the treatments. For example, a

More information

23. Inference for regression

23. Inference for regression 23. Inference for regression The Practice of Statistics in the Life Sciences Third Edition 2014 W. H. Freeman and Company Objectives (PSLS Chapter 23) Inference for regression The regression model Confidence

More information

Chapter 3. Diagnostics and Remedial Measures

Chapter 3. Diagnostics and Remedial Measures Chapter 3. Diagnostics and Remedial Measures So far, we took data (X i, Y i ) and we assumed Y i = β 0 + β 1 X i + ǫ i i = 1, 2,..., n, where ǫ i iid N(0, σ 2 ), β 0, β 1 and σ 2 are unknown parameters,

More information

> modlyq <- lm(ly poly(x,2,raw=true)) > summary(modlyq) Call: lm(formula = ly poly(x, 2, raw = TRUE))

> modlyq <- lm(ly poly(x,2,raw=true)) > summary(modlyq) Call: lm(formula = ly poly(x, 2, raw = TRUE)) School of Mathematical Sciences MTH5120 Statistical Modelling I Tutorial 4 Solutions The first two models were looked at last week and both had flaws. The output for the third model with log y and a quadratic

More information

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 14/11/2017 This Week Categorical Variables Categorical

More information

School of Mathematical Sciences. Question 1

School of Mathematical Sciences. Question 1 School of Mathematical Sciences MTH5120 Statistical Modelling I Practical 8 and Assignment 7 Solutions Question 1 Figure 1: The residual plots do not contradict the model assumptions of normality, constant

More information

Project Report for STAT571 Statistical Methods Instructor: Dr. Ramon V. Leon. Wage Data Analysis. Yuanlei Zhang

Project Report for STAT571 Statistical Methods Instructor: Dr. Ramon V. Leon. Wage Data Analysis. Yuanlei Zhang Project Report for STAT7 Statistical Methods Instructor: Dr. Ramon V. Leon Wage Data Analysis Yuanlei Zhang 77--7 November, Part : Introduction Data Set The data set contains a random sample of observations

More information



More information

Conditions for Regression Inference:

Conditions for Regression Inference: AP Statistics Chapter Notes. Inference for Linear Regression We can fit a least-squares line to any data relating two quantitative variables, but the results are useful only if the scatterplot shows a

More information

3. Diagnostics and Remedial Measures

3. Diagnostics and Remedial Measures 3. Diagnostics and Remedial Measures So far, we took data (X i, Y i ) and we assumed where ɛ i iid N(0, σ 2 ), Y i = β 0 + β 1 X i + ɛ i i = 1, 2,..., n, β 0, β 1 and σ 2 are unknown parameters, X i s

More information

Regression Model Building

Regression Model Building Regression Model Building Setting: Possibly a large set of predictor variables (including interactions). Goal: Fit a parsimonious model that explains variation in Y with a small set of predictors Automated

More information

Model Building Chap 5 p251

Model Building Chap 5 p251 Model Building Chap 5 p251 Models with one qualitative variable, 5.7 p277 Example 4 Colours : Blue, Green, Lemon Yellow and white Row Blue Green Lemon Insects trapped 1 0 0 1 45 2 0 0 1 59 3 0 0 1 48 4

More information

Machine Learning Linear Regression. Prof. Matteo Matteucci

Machine Learning Linear Regression. Prof. Matteo Matteucci Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares

More information

SMAM 314 Practice Final Examination Winter 2003

SMAM 314 Practice Final Examination Winter 2003 SMAM 314 Practice Final Examination Winter 2003 You may use your textbook, one page of notes and a calculator. Please hand in the notes with your exam. 1. Mark the following statements True T or False

More information

The response variable depends on the explanatory variable.

The response variable depends on the explanatory variable. A response variable measures an outcome of study. > dependent variables An explanatory variable attempts to explain the observed outcomes. > independent variables The response variable depends on the explanatory

More information

Correlation and Regression Theory 1) Multivariate Statistics

Correlation and Regression Theory 1) Multivariate Statistics Correlation and Regression Theory 1) Multivariate Statistics What is a multivariate data set? How to statistically analyze this data set? Is there any kind of relationship between different variables in

More information

Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response.

Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response. Multicollinearity Read Section 7.5 in textbook. Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response. Example of multicollinear

More information

Lecture 3: Multiple Regression. Prof. Sharyn O Halloran Sustainable Development U9611 Econometrics II

Lecture 3: Multiple Regression. Prof. Sharyn O Halloran Sustainable Development U9611 Econometrics II Lecture 3: Multiple Regression Prof. Sharyn O Halloran Sustainable Development Econometrics II Outline Basics of Multiple Regression Dummy Variables Interactive terms Curvilinear models Review Strategies

More information

Handbook of Regression Analysis

Handbook of Regression Analysis Handbook of Regression Analysis Samprit Chatterjee New York University Jeffrey S. Simonoff New York University WILEY A JOHN WILEY & SONS, INC., PUBLICATION CONTENTS Preface xi PARTI THE MULTIPLE LINEAR

More information

Regression. Marc H. Mehlman University of New Haven

Regression. Marc H. Mehlman University of New Haven Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and

More information

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c Inference About the Slope ffl As with all estimates, ^fi1 subject to sampling var ffl Because Y jx _ Normal, the estimate ^fi1 _ Normal A linear combination of indep Normals is Normal Simple Linear Regression

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. C) 2 1 3

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. C) 2 1 3 Exam Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. ) Subtract the mixed numbers. ) 4-2 A) 2 4 B) 2 C) 2 D) 4 2) If oak flooring, including

More information

Residuals from regression on original data 1

Residuals from regression on original data 1 Residuals from regression on original data 1 Obs a b n i y 1 1 1 3 1 1 2 1 1 3 2 2 3 1 1 3 3 3 4 1 2 3 1 4 5 1 2 3 2 5 6 1 2 3 3 6 7 1 3 3 1 7 8 1 3 3 2 8 9 1 3 3 3 9 10 2 1 3 1 10 11 2 1 3 2 11 12 2 1

More information



More information

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning SMA 6304 / MIT 2.853 / MIT 2.854 Manufacturing Systems Lecture 10: Data and Regression Analysis Lecturer: Prof. Duane S. Boning 1 Agenda 1. Comparison of Treatments (One Variable) Analysis of Variance

More information