Simple linear regression: estimation, diagnostics, prediction


UPPSALA UNIVERSITY
Department of Mathematics
Mathematical statistics
Regression and Analysis of Variance, Autumn 2015

COMPUTER SESSION 1: Regression

In the first computer exercise we will study the following subjects:

- Simple linear regression: estimation, diagnostics, prediction
- Multiple regression
- Examples of transformation

Let's begin!

1 Simple linear regression

For simplicity, we use the built-in data set mtcars for large parts of the session. Load the data by writing:

    data(mtcars); attach(mtcars)

In simple linear regression we assume the model

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, n,$$

where $\beta_0$ and $\beta_1$ are regression coefficients, $(x_i, y_i)$ are observed values and $\varepsilon_i \sim \mathrm{NID}(0, \sigma^2)$, that is, the errors are normally and independently distributed. Let us, to begin with, study the effect of weight on fuel consumption; that is, we take mpg as our y and wt as our x. We can plot the data by writing:

    plot(mpg ~ wt, xlab = "Weight", ylab = "Miles per gallon")

As we will see below, mpg ~ wt is interpreted as "mpg as a function of wt".

1.1 Estimation of the parameters

A useful R routine for linear regression is lm (short for linear model), which fits a linear model by the least-squares method. The model is written using a symbolic notation suggested by Wilkinson and Rogers:

    $y = \beta_0 + \beta_1 x_1$                                        corresponds to   y ~ x1
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$        corresponds to   y ~ x1 + x2 + x1*x2
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2$                        corresponds to   y ~ x1 + I(x1^2)

With the following command, the fitted model is stored in the object called m1.

    m1 <- lm(mpg ~ wt, mtcars)
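As a cross-check of what lm computes in the simple linear case, the least-squares estimates can also be obtained from the textbook closed-form expressions $\hat\beta_1 = S_{xy}/S_{xx}$ and $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$. The sketch below is only an illustration; the names x, y, Sxy, Sxx, b0, b1, fit and by_hand are introduced here and are not part of the handout.

```r
# Closed-form least-squares estimates for simple linear regression,
# compared with the coefficients returned by lm().
data(mtcars)
x <- mtcars$wt
y <- mtcars$mpg

Sxy <- sum((x - mean(x)) * (y - mean(y)))   # sum of cross-products
Sxx <- sum((x - mean(x))^2)                 # sum of squares of x

b1 <- Sxy / Sxx                  # slope estimate
b0 <- mean(y) - b1 * mean(x)     # intercept estimate

fit <- lm(mpg ~ wt, data = mtcars)
by_hand <- c(b0, b1)

# The two rows should agree up to rounding error.
print(rbind(by_hand, lm = coef(fit)))
```

The comparison uses the data argument of lm rather than the attached variables, so the block runs on its own in a fresh R session.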

The general R command str gives an overview of an object; type str(m1) in this case. Moreover, output summarizing the important information is given by summary(m1). In the output from the summary command, try to find where $R^2$ and the estimates of $\beta_i$ are located, and which variables are significant. It is possible to access the numerical values of an object. Try for instance

    m1$coefficients
    summary(m1)$coefficients[2,1]

We can add the estimated curve (actually, a straight line) in red to the figure that we plotted earlier by using the following command (note that the short form coef works for coefficients):

    abline(coef(m1), col = "red")

Compare the figure with the estimated coefficients and convince yourself that the fit seems reasonable.

Confidence interval for the regression coefficients

The standard errors of the estimators are given in the summary table. To calculate a 95% confidence interval for $\beta_1$ we need a t-quantile, which we obtain using the function qt.

    # Some preparations
    beta1 <- summary(m1)$coefficients[2,1]
    sterror <- summary(m1)$coefficients[2,2]
    f <- m1$df.residual        # Degrees of freedom
    quantile <- qt(0.975, f)   # the 97.5% (or 0.975) t-quantile
    # The interval
    c(beta1 - quantile*sterror, beta1 + quantile*sterror)

1.2 Diagnostics

To study the impact of randomness we study the residuals $y_i - \hat{y}_i$. These are extracted from the model by

    residuals(m1)

Let us, to begin with, plot the sequence of raw residuals.

    plot(residuals(m1))

Does there seem to be some evident pattern in the sequence? If so, we need to examine our data more carefully. For a (very) rough check of the normality assumption we can look at the histogram of the residuals:

    hist(residuals(m1))
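For the histogram check just described, it can help to draw a normal density on top of the histogram. The following is a minimal sketch, not part of the original handout: it assumes a normal curve with mean 0 and standard deviation equal to the residual standard error reported by summary, and the names fit, res and s are chosen here for illustration.

```r
# Histogram of the residuals on the density scale, with a normal curve
# that has mean 0 and the residual standard error as standard deviation.
fit <- lm(mpg ~ wt, data = mtcars)
res <- residuals(fit)
s <- summary(fit)$sigma   # estimated residual standard deviation

hist(res, freq = FALSE, main = "Residuals of mpg ~ wt", xlab = "Residual")
curve(dnorm(x, mean = 0, sd = s), add = TRUE, col = "red")
```

With only 32 observations the histogram is coarse, so the overlay should be read as a rough visual aid only.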

Does the histogram resemble the normal bell curve? Looking at a Q-Q plot is probably a better way to investigate the normality assumption. We get one by writing

    qqnorm(residuals(m1)); qqline(residuals(m1))

Do the points follow the line? A popular (and, usually, powerful) formal test for normality is the Shapiro-Wilk test:

    shapiro.test(residuals(m1))

Remind yourself what the null hypothesis of the test is. What conclusion does the p-value lead to?

Extracting the design matrix

Given a model from a call of lm we can extract the so-called design matrix, i.e. $X$ in the matrix formulation of the regression model, $y = X\beta + \varepsilon$. The following command extracts the matrix and calculates the useful matrix $(X^T X)^{-1}$.

    X <- model.matrix(m1); solve( t(X) %*% X )

EXERCISE. Recall from theory that the estimated covariance matrix for the regression coefficients is $\hat{\sigma}^2 (X^T X)^{-1}$. For an object m1, this can be found as vcov(m1) in R. Now, verify the elements in the covariance matrix by multiplying suitable elements of $(X^T X)^{-1}$ found above by the estimated residual variance (the square of the residual standard error in the summary table). Compare your answer to that found elsewhere in the summary table (by squaring the standard errors for the individual parameter estimates).

Diagnostics plots

Next we'll see how we can plot different residuals and leverage/influence measures. To plot some useful figures we can write

    # Plots are wanted in two rows and two columns:
    par(mfrow = c(2,2))
    # Sequence of residuals:
    plot(residuals(m1))
    # Residuals against fitted values:
    plot(fitted(m1), residuals(m1))
    # R-Student residuals:
    plot(rstudent(m1))
    # Cook's distance:
    plot(cooks.distance(m1))

Judging by the Cook's distance, it seems that some points have large influence. To find out which, use the commands below. A crosshair will appear in the figure; click on interesting points using the left mouse button to identify them and click the right mouse button to end the identification procedure.

    car.models <- row.names(mtcars)
    identify(1:32, cooks.distance(m1), car.models)
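When clicking in the figure is inconvenient (for instance when running a script non-interactively), the most influential points can also be listed directly. This is a sketch of one possible alternative to identify, not a command from the handout; the choice of showing the three largest values and the names fit, cd and top3 are assumptions made for illustration.

```r
# List the observations with the largest Cook's distances without
# clicking in the plot.
fit <- lm(mpg ~ wt, data = mtcars)
cd <- cooks.distance(fit)                 # named by car model

top3 <- sort(cd, decreasing = TRUE)[1:3]  # three largest values
print(top3)

# Row indices of these cars in mtcars
print(match(names(top3), row.names(mtcars)))
```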

If we are only interested in the index of the observation, we leave out the car.models argument of the identify call. Given an lm object, e.g. m1 in our case, R can produce several figures for residual diagnostics. Simply write

    plot(m1)

Press Enter on the keyboard to go to the next figure.

1.3 Prediction

With a model from lm we can easily do prediction:

    predict(m1, mtcars)

Compare the predicted values $\hat{y}_i$ to the original observations $y_i$. Prediction intervals are obtained by adding another argument:

    predict(m1, mtcars, interval = "prediction")

Let us now assume that we want to predict at arbitrary values and that we want to show the result in a figure. We plot confidence intervals (for the curve) as well as prediction intervals (for the values that we want to predict; these are wider, reflecting the greater uncertainty).

    # mtcars was attached at the start of the session, so wt and mpg are available
    # Sequence of x-values that we want to do prediction for
    pred.frame <- data.frame(wt = seq(1.5, 5.5, 0.5))
    # Calculate prediction and confidence intervals
    pp <- predict(m1, interval = "prediction", newdata = pred.frame)
    pc <- predict(m1, interval = "confidence", newdata = pred.frame)
    # Graphics (introducing the command matlines)
    plot(wt, mpg, ylim = range(mpg, pp, na.rm = TRUE))
    pred.wt <- pred.frame$wt
    matlines(pred.wt, pc, lty = c(1,2,2), col = "blue")
    matlines(pred.wt, pp, lty = c(1,3,3), col = "black")

For the graphical options with lty, see the help text for par. For instance, lty = 1, 2 or 3 corresponds to solid, dashed or dotted, respectively.

2 Multiple regression

Next we'll use a second explanatory variable: the power of the engine measured in horsepower (hp). An lm call storing the model in m2 is now

    m2 <- lm(mpg ~ wt + hp); summary(m2)

Feel free to examine residuals, normality and influence as we did above, this time using m2. It might also be of interest to look at the design matrix. Compare the $R^2$ of m1 with that of m2. Was it what you expected?

Prediction: Assume that we want to predict the fuel consumption of a new, rather heavy, car with a small engine: wt = 3.5, hp = 90. The commands are:

    x0 <- data.frame(wt = 3.5, hp = 90)
    yhat <- predict(m2, x0)

EXERCISE. The hat matrix $H$, satisfying $\hat{y} = Hy$, is given by $H = X(X^T X)^{-1} X^T$. Plot studentized residuals for the fitted object m2. Do you find, perhaps, three suspicious observations? Find the related values of the hat matrix, either by typing it in from the definition (using model.matrix and diag) or by using the ready-to-use command hatvalues(m2). Use the rule of thumb that hat values for leverage points exceed $2(k + 1)/n$, where $k$ is the number of explanatory variables and $n$ the number of observations, and draw conclusions for the observations.

2.1 Polynomial regression and model choice

Suppose we want to introduce a model with a quadratic term when modelling mpg as a function of wt. (Does this seem plausible from the original plot?) The commands in R are as follows:

    mqua <- lm(mpg ~ wt + I(wt^2), mtcars); summary(mqua)

Compare the $R^2$ values between models m1 and mqua (also the adjusted ones).

Now, consider an example also including a qualitative variable. An engineer wants to estimate the expected time $E[Y]$ per month (in hours) for machines out of use due to maintenance as a function of the explanatory variables machine type (1 or 2) and the age of the machine (in years). The following model is suggested:

$$E[Y] = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2$$

where $x_1$ is the age of the machine and $x_2$ the type of machine ($x_2 = 1$: Type 1; $x_2 = 0$: Type 2). Data is collected in the file shutdown.dat. Save it into a suitable directory, read it into R and study the data structure:

    mask <- read.table("shutdown.dat")
    mask
    str(mask)
    attach(mask)

EXERCISE.

(a) Estimation of parameters:

    mask0 <- lm(V1 ~ V2 + I(V2^2) + V4)
    summary(mask0)

Does the model seem reasonable? What variables are significant?

(b) A simpler model is considered, and one wants to test $\beta_1 = \beta_2 = 0$ (at the level $\alpha = 0.10$). More precisely,

    $H_0$: $\beta_1 = \beta_2 = 0$
    $H_1$: At least one $\beta_i \neq 0$, $i = 1, 2$.

We may use the fact (Sundberg, page 75) that

$$F = \frac{(R_0^2 - R_1^2)/(k - l)}{(1 - R_0^2)/(N - k)} \sim F(k - l,\, N - k)$$

where $R_0^2$ and $R_1^2$ are the $R^2$ values of the general (full) model and the hypothesis model, respectively, and we have $N = 20$, $k = 3$ and $l = 1$. If the observed value of the test quantity $F$ is larger than the $1 - \alpha$ quantile of the F distribution, we reject the hypothesis of the simpler model and conclude that the repair time depends on age. Find out if this is the case for our data. (A quantile from the F distribution is found by using qf.)

3 Transformations

Transformations are often useful in regression. We'll study the effect of heating on the strength of vegetables (e.g. carrots). Such experiments are of importance for packing strategies for food. The data comes from an experiment from Belgium. The temperature was fixed at 90 °C, the heating times were measured in seconds and the force in N. We import the data (download the file skalla.dat from the course page of the student portal) and plot the strength as a function of the length of the heating time. We also plot a figure where we've taken the logarithm of the strength:

    skalla <- read.table("skalla.dat", col.names = c("temp","time","force"))
    attach(skalla)
    plot(force ~ time)
    plot(log(force) ~ time)

Do you think that the use of the logarithm will give us a better linear regression and a more homogeneous variance? The regression models are fitted as usual:

    m3 <- lm(force ~ time)
    m4 <- lm(log(force) ~ time)

Look at the result of the regression: significant variables, $R^2$ and so on. If you have the time, use the boxcox function in R to find out whether taking the logarithm is a good transformation. When selecting a power transformation of the response variable with the Box-Cox method, a suggested exponent of 0 corresponds to taking the logarithm.

The routine is found in the MASS package; activate it by typing library(MASS). If it is not already installed in your computer environment, it can be installed by writing install.packages(), choosing Sweden and then choosing MASS. For the routine, use help(boxcox) to get instructions on its usage.
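A minimal sketch of how boxcox can be called is given below. Since skalla.dat is not bundled with R, the example uses the built-in mtcars data and the model mpg ~ wt as a stand-in; that substitution, the lambda grid and the names fit, bc and lambda_hat are assumptions made for illustration only. For the session itself, the fitted object m3 would take the place of fit.

```r
library(MASS)   # provides boxcox()

# Stand-in model on built-in data (the handout's own data is skalla.dat).
fit <- lm(mpg ~ wt, data = mtcars)

# Profile log-likelihood over a grid of exponents; plotit = FALSE returns
# the grid (x) and the log-likelihood values (y) without drawing a figure.
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)

# Exponent on the grid with the highest log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)
```

Calling boxcox with the default plotit = TRUE instead draws the log-likelihood curve together with an approximate 95% confidence interval for the exponent; if that interval covers 0, the logarithm is a reasonable choice.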


More information

Graphical Diagnosis. Paul E. Johnson 1 2. (Mostly QQ and Leverage Plots) 1 / Department of Political Science

Graphical Diagnosis. Paul E. Johnson 1 2. (Mostly QQ and Leverage Plots) 1 / Department of Political Science (Mostly QQ and Leverage Plots) 1 / 63 Graphical Diagnosis Paul E. Johnson 1 2 1 Department of Political Science 2 Center for Research Methods and Data Analysis, University of Kansas. (Mostly QQ and Leverage

More information

Warm-up Using the given data Create a scatterplot Find the regression line

Warm-up Using the given data Create a scatterplot Find the regression line Time at the lunch table Caloric intake 21.4 472 30.8 498 37.7 335 32.8 423 39.5 437 22.8 508 34.1 431 33.9 479 43.8 454 42.4 450 43.1 410 29.2 504 31.3 437 28.6 489 32.9 436 30.6 480 35.1 439 33.0 444

More information

Take-home Final. The questions you are expected to answer for this project are numbered and italicized. There is one bonus question. Good luck!

Take-home Final. The questions you are expected to answer for this project are numbered and italicized. There is one bonus question. Good luck! Take-home Final The data for our final project come from a study initiated by the Tasmanian Aquaculture and Fisheries Institute to investigate the growth patterns of abalone living along the Tasmanian

More information

STATISTICS 479 Exam II (100 points)

STATISTICS 479 Exam II (100 points) Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Polynomial Regression

Polynomial Regression Polynomial Regression Summary... 1 Analysis Summary... 3 Plot of Fitted Model... 4 Analysis Options... 6 Conditional Sums of Squares... 7 Lack-of-Fit Test... 7 Observed versus Predicted... 8 Residual Plots...

More information

BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) R Users

BIOSTATS 640 Spring 2018 Unit 2. Regression and Correlation (Part 1 of 2) R Users BIOSTATS 640 Spring 08 Unit. Regression and Correlation (Part of ) R Users Unit Regression and Correlation of - Practice Problems Solutions R Users. In this exercise, you will gain some practice doing

More information

Stat 311: HW 9, due Th 5/27/10 in your Quiz Section

Stat 311: HW 9, due Th 5/27/10 in your Quiz Section Stat 311: HW 9, due Th 5/27/10 in your Quiz Section Fritz Scholz Your returned assignment should show your name and student ID number. It should be printed or written clearly. 1. The data set ReactionTime

More information

Robustness and Distribution Assumptions

Robustness and Distribution Assumptions Chapter 1 Robustness and Distribution Assumptions 1.1 Introduction In statistics, one often works with model assumptions, i.e., one assumes that data follow a certain model. Then one makes use of methodology

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

10 Model Checking and Regression Diagnostics

10 Model Checking and Regression Diagnostics 10 Model Checking and Regression Diagnostics The simple linear regression model is usually written as i = β 0 + β 1 i + ɛ i where the ɛ i s are independent normal random variables with mean 0 and variance

More information

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line

Inference for Regression Inference about the Regression Model and Using the Regression Line Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about

More information

Multiple Linear Regression (solutions to exercises)

Multiple Linear Regression (solutions to exercises) Chapter 6 1 Chapter 6 Multiple Linear Regression (solutions to exercises) Chapter 6 CONTENTS 2 Contents 6 Multiple Linear Regression (solutions to exercises) 1 6.1 Nitrate concentration..........................

More information

R STATISTICAL COMPUTING

R STATISTICAL COMPUTING R STATISTICAL COMPUTING some R Examples Dennis Friday 2 nd and Saturday 3 rd May, 14. Topics covered Vector and Matrix operation. File Operations. Evaluation of Probability Density Functions. Testing of

More information

STAT 420: Methods of Applied Statistics

STAT 420: Methods of Applied Statistics STAT 420: Methods of Applied Statistics Model Diagnostics Transformation Shiwei Lan, Ph.D. Course website: http://shiwei.stat.illinois.edu/lectures/stat420.html August 15, 2018 Department

More information

Notes on Maxwell & Delaney

Notes on Maxwell & Delaney Notes on Maxwell & Delaney PSY710 9 Designs with Covariates 9.1 Blocking Consider the following hypothetical experiment. We want to measure the effect of a drug on locomotor activity in hyperactive children.

More information

Analysis of 2x2 Cross-Over Designs using T-Tests

Analysis of 2x2 Cross-Over Designs using T-Tests Chapter 234 Analysis of 2x2 Cross-Over Designs using T-Tests Introduction This procedure analyzes data from a two-treatment, two-period (2x2) cross-over design. The response is assumed to be a continuous

More information

MATH11400 Statistics Homepage

MATH11400 Statistics Homepage MATH11400 Statistics 1 2010 11 Homepage http://www.stats.bris.ac.uk/%7emapjg/teach/stats1/ 4. Linear Regression 4.1 Introduction So far our data have consisted of observations on a single variable of interest.

More information

13 Simple Linear Regression

13 Simple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity

More information

Ch 8. MODEL DIAGNOSTICS. Time Series Analysis

Ch 8. MODEL DIAGNOSTICS. Time Series Analysis Model diagnostics is concerned with testing the goodness of fit of a model and, if the fit is poor, suggesting appropriate modifications. We shall present two complementary approaches: analysis of residuals

More information

Lecture 1: Linear Models and Applications

Lecture 1: Linear Models and Applications Lecture 1: Linear Models and Applications Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction to linear models Exploratory data analysis (EDA) Estimation

More information

Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 07/11/2017 Structure This Week What is a linear model? How good is my model? Does

More information

STA442/2101: Assignment 5

STA442/2101: Assignment 5 STA442/2101: Assignment 5 Craig Burkett Quiz on: Oct 23 rd, 2015 The questions are practice for the quiz next week, and are not to be handed in. I would like you to bring in all of the code you used to

More information

Determination of Density 1

Determination of Density 1 Introduction Determination of Density 1 Authors: B. D. Lamp, D. L. McCurdy, V. M. Pultz and J. M. McCormick* Last Update: February 1, 2013 Not so long ago a statistical data analysis of any data set larger

More information

Lecture 9: Predictive Inference

Lecture 9: Predictive Inference Lecture 9: Predictive Inference There are (at least) three levels at which we can make predictions with a regression model: we can give a single best guess about what Y will be when X = x, a point prediction;

More information

You can use numeric categorical predictors. A categorical predictor is one that takes values from a fixed set of possibilities.

You can use numeric categorical predictors. A categorical predictor is one that takes values from a fixed set of possibilities. CONTENTS Linear Regression Prepare Data To begin fitting a regression, put your data into a form that fitting functions expect. All regression techniques begin with input data in an array X and response

More information

Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression

Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression Correlation Linear correlation and linear regression are often confused, mostly

More information

Regression Analysis. Table Relationship between muscle contractile force (mj) and stimulus intensity (mv).

Regression Analysis. Table Relationship between muscle contractile force (mj) and stimulus intensity (mv). Regression Analysis Two variables may be related in such a way that the magnitude of one, the dependent variable, is assumed to be a function of the magnitude of the second, the independent variable; however,

More information

Ratio of Polynomials Fit One Variable

Ratio of Polynomials Fit One Variable Chapter 375 Ratio of Polynomials Fit One Variable Introduction This program fits a model that is the ratio of two polynomials of up to fifth order. Examples of this type of model are: and Y = A0 + A1 X

More information

Data Science for Engineers Department of Computer Science and Engineering Indian Institute of Technology, Madras

Data Science for Engineers Department of Computer Science and Engineering Indian Institute of Technology, Madras Data Science for Engineers Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture 36 Simple Linear Regression Model Assessment So, welcome to the second lecture on

More information

1. (Problem 3.4 in OLRT)

1. (Problem 3.4 in OLRT) STAT:5201 Homework 5 Solutions 1. (Problem 3.4 in OLRT) The relationship of the untransformed data is shown below. There does appear to be a decrease in adenine with increased caffeine intake. This is

More information

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 04 Basic Statistics Part-1 (Refer Slide Time: 00:33)

More information

Inference for Single Proportions and Means T.Scofield

Inference for Single Proportions and Means T.Scofield Inference for Single Proportions and Means TScofield Confidence Intervals for Single Proportions and Means A CI gives upper and lower bounds between which we hope to capture the (fixed) population parameter

More information