Applied Regression Analysis. Section 4: Diagnostics and Transformations


Applied Regression Analysis Section 4: Diagnostics and Transformations 1

Regression Model Assumptions Y_i = β_0 + β_1 X_i + ε_i. Recall the key assumptions of our linear regression model: (i) The mean of Y is linear in the X's. (ii) The additive errors (deviations from the line) are normally distributed, independent from each other, and identically distributed (i.e., they have constant variance). In short, Y_i | X_i ~ N(β_0 + β_1 X_i, σ²). 2
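To make these assumptions concrete, here is a small R sketch (added here, not from the slides) that simulates data satisfying them and fits the line; with real data the diagnostic question is whether something like this could plausibly have generated what we see:

```r
# A minimal sketch: simulate data that satisfy the model assumptions,
# Y_i | X_i ~ N(beta0 + beta1*X_i, sigma^2), and fit the regression line.
set.seed(42)
n     <- 100
x     <- runif(n, 0, 10)
beta0 <- 2; beta1 <- 0.5; sigma <- 1
y     <- beta0 + beta1 * x + rnorm(n, mean = 0, sd = sigma)  # additive iid normal errors

fit <- lm(y ~ x)
summary(fit)              # estimates should be close to beta0, beta1, and sigma
plot(x, y); abline(fit)   # linear mean with constant spread around the line
```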

Regression Model Assumptions Inference and prediction rely on this model being true. If the model assumptions do not hold, then all bets are off: predictions can be systematically biased, and standard errors, intervals, and t-tests are wrong. We will focus on using graphical methods (plots) to detect violations of the model assumptions. 3

Example [Scatterplots of four datasets: y1 vs. x1, y2 vs. x2, y3 vs. x3, and y4 vs. x4.] Here we have four datasets... Which one looks compatible with our modeling assumptions? 4

Example Example where things can go bad... [regression output tables for datasets (1) and (2)] 5

Example The regression output values are exactly the same... [the four scatterplots again: y1 vs. x1 through y4 vs. x4] Thus, whatever decision or action we might take based on the output would be the same in both cases. 6

Example...but the residuals (plotted against Ŷ) look totally different. [Residual plots for the four fits: reg1$residuals vs. reg1$fitted through reg4$residuals vs. reg4$fitted.] Plotting e vs. Ŷ is your #1 tool for finding fit problems. 7
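The four datasets above look like Anscombe's quartet, which is built into R as the data frame anscombe; assuming that, a short sketch reproduces the near-identical fits and the very different residual plots:

```r
# Fit y_i ~ x_i for each of the four datasets (assuming Anscombe's quartet,
# available in base R as the data frame `anscombe`).
fits <- lapply(1:4, function(i) {
  lm(reformulate(paste0("x", i), paste0("y", i)), data = anscombe)
})

sapply(fits, coef)                       # near-identical intercepts and slopes

# Residuals vs. fitted values for each fit -- the plots look totally different.
par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(fitted(fits[[i]]), resid(fits[[i]]),
       xlab = paste0("reg", i, "$fitted"), ylab = paste0("reg", i, "$residuals"))
  abline(h = 0, lty = 2)
}
```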

Residual Plots We use residual plots to diagnose potential problems with the model. From the model assumptions, the error term (ε) should have a few properties... we use the residuals (e) as a proxy for the errors: ε_i = y_i − (β_0 + β_1 x_1i + β_2 x_2i + ⋯ + β_p x_pi) ≈ y_i − (b_0 + b_1 x_1i + b_2 x_2i + ⋯ + b_p x_pi) = e_i. 8

Residual Plots What kind of properties should the residuals have? e_i ~ N(0, σ²), iid and independent from the X's. We should see no pattern between e and each of the X's. This can be summarized by looking at the plot of e against Ŷ. Remember that Ŷ is pure X, i.e., a linear function of the X's. If the model is good, the regression should have pulled out of Y all of its "X-ness"... what is left over (the residuals) should have nothing to do with X. 9

Example Mid City (Housing) The MidCity housing regression. [Left: ŷ vs. y (yhat vs. price); Right: ŷ vs. e (yhat vs. residuals).] 10

Example Mid City (Housing) Size vs. e. [x = size of house vs. residuals for the MidCity multiple regression.] 11

Example Mid City (Housing) In the Mid City housing example, the residual plots (both X vs. e and Ŷ vs. e) showed no obvious problem... This is what we want! Although these plots don't guarantee that all is well, they are a very good sign that the model is doing a good job. 12

Non Linearity Example: Telemarketing How does length of employment affect productivity (number of calls per day)? 13

Non Linearity Example: Telemarketing Residual plot highlights the non-linearity 14

Non Linearity What can we do to fix this? We can use multiple regression and transform our X to create a non-linear model... Let's try Y = β_0 + β_1 X + β_2 X² + ε. The data (months, months², calls): (10, 100, 18), (10, 100, 19), (11, 121, 22), (14, 196, 23), (15, 225, 25), ... 15
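A minimal R sketch of this fit, assuming the data sit in a data frame called telemkt with columns months and calls (hypothetical names, since the slides do not show the code):

```r
# Quadratic (degree-2 polynomial) regression: calls ~ months + months^2.
# `telemkt`, `months`, and `calls` are assumed (hypothetical) names.
fit2 <- lm(calls ~ months + I(months^2), data = telemkt)
summary(fit2)

# Equivalent, and numerically more stable for higher-degree polynomials:
# fit2 <- lm(calls ~ poly(months, 2, raw = TRUE), data = telemkt)

# Plot the data with the fitted curve.
plot(telemkt$months, telemkt$calls, xlab = "months", ylab = "calls")
mo <- seq(min(telemkt$months), max(telemkt$months), length.out = 100)
lines(mo, predict(fit2, newdata = data.frame(months = mo)))
```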

Telemarketing Adding Polynomials Linear Model [scatterplot of calls vs. months with the straight-line fit] 16

Telemarketing Adding Polynomials With X² [scatterplot of calls vs. months with the quadratic fit] 17

Telemarketing Adding Polynomials 18

Telemarketing What is the marginal effect of X on Y? ∂E[Y|X]/∂X = β_1 + 2β_2 X. To better understand the impact of changes in X on Y you should evaluate different scenarios: moving from 10 to 11 months of employment raises productivity by 1.47 calls, while going from 25 to 26 months only raises the number of calls by 0.27. 19
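Continuing the hypothetical quadratic fit fit2 from the sketch above, these scenario calculations take a couple of lines in R:

```r
# Marginal effects from the (hypothetical) quadratic fit `fit2` above.
b <- coef(fit2)                                   # b[1] = b0, b[2] = b1, b[3] = b2
marginal <- function(x) b[2] + 2 * b[3] * x       # derivative: b1 + 2*b2*x
marginal(c(10, 25))                               # instantaneous slope at 10 and 25 months

# Fitted change in calls for a one-month move (the slide reports ~1.47 and ~0.27):
diff(predict(fit2, newdata = data.frame(months = c(10, 11))))
diff(predict(fit2, newdata = data.frame(months = c(25, 26))))
```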

Polynomial Regression Even though we are limited to a linear mean, it is possible to get nonlinear regression by transforming the X variable. In general, we can add powers of X to get polynomial regression: Y = β_0 + β_1 X + β_2 X² + ⋯ + β_m X^m. You can fit any mean function if m is big enough. Usually, m = 2 does the trick. 20

Closing Comments on Polynomials We can always add higher powers (cubic, etc) if necessary. Be very careful about predicting outside the data range. The curve may do unintended things beyond the observed data. Watch out for over-fitting... remember, simple models are better. 21

Be careful when extrapolating... [the quadratic fit extrapolated well beyond the observed range of months; calls vs. months] 22

...and, be careful when adding more polynomial terms. [calls vs. months with fitted polynomials of degree 2, 3, and 8] 23

Variable Interaction So far we have considered the impact of each independent variable in an additive way. We can extend this notion by the inclusion of multiplicative effects through interaction terms. This provides another way to model non-linearities: Y_i = β_0 + β_1 X_1i + β_2 X_2i + β_3 (X_1i × X_2i) + ε_i, so that ∂E[Y|X_1, X_2]/∂X_1 = β_1 + β_3 X_2. What does that mean? 24

Example: College GPA and Age Consider the connection between college and MBA grades: A model to predict McCombs GPA from college GPA could be GPA_MBA = β_0 + β_1 GPA_Bach + ε. Estimate, Std. Error, t value, Pr(>|t|): BachGPA 0.26269, 0.09244, 2.842, 0.00607 **. For every 1 point increase in college GPA, your expected GPA at McCombs increases by about 0.26 points. 25

College GPA and Age However, this model assumes that the marginal effect of College GPA is the same for any age. It seems that how you did in college should have less effect on your MBA GPA as you get older (farther from college). We can account for this intuition with an interaction term: GPA_MBA = β_0 + β_1 GPA_Bach + β_2 (Age × GPA_Bach) + ε. Now, the college effect is ∂E[GPA_MBA | GPA_Bach, Age]/∂GPA_Bach = β_1 + β_2 Age. It depends on Age! 26

College GPA and Age GPA_MBA = β_0 + β_1 GPA_Bach + β_2 (Age × GPA_Bach) + ε. Here, we have the interaction term but not the main effect of Age... what are we assuming? Estimate, Std. Error, t value, Pr(>|t|): BachGPA 0.455750, 0.103026, 4.424, 4.07e-05 ***; BachGPA:Age -0.009377, 0.002786, -3.366, 0.00132 **. 27
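In R, this model (interaction term, no Age main effect) could be fit as sketched below; the data frame grades and its column names are assumptions, not the course's actual file:

```r
# Interaction without the Age main effect, as in the slide; `grades`, `MBAGPA`,
# `BachGPA`, and `Age` are assumed (hypothetical) names.
fit_int <- lm(MBAGPA ~ BachGPA + BachGPA:Age, data = grades)
summary(fit_int)

# Marginal effect of college GPA at a given age: b1 + b2 * Age
b    <- coef(fit_int)
ages <- c(25, 30, 35, 40)
round(b["BachGPA"] + b["BachGPA:Age"] * ages, 2)   # slide reports ~0.22, 0.17, 0.13, 0.08
```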

College GPA and Age Without the interaction term, the marginal effect of College GPA is b_1 = 0.26. With the interaction term, the marginal effect is b_1 + b_2 Age = 0.46 − 0.0094 Age. Age vs. marginal effect: 25 → 0.22; 30 → 0.17; 35 → 0.13; 40 → 0.08. 28

Non-constant Variance Example... This violates our assumption that all ε_i have the same σ². 29

Non-constant Variance Consider the following relationship between Y and X: Y = γ_0 X^β_1 (1 + R), where we think about R as a random percentage error. On average we assume R is 0... but when it turns out to be 0.1, Y goes up by 10%. We often see this: the errors are multiplicative, and the variation is something like ±10% rather than ±10 units. This leads to non-constant variance (or heteroskedasticity). 30

The Log-Log Model We have data on Y and X and we still want to use a linear regression model to understand their relationship... what if we take the log (natural log) of Y? log(Y) = log[γ_0 X^β_1 (1 + R)] = log(γ_0) + β_1 log(X) + log(1 + R). Now, if we call β_0 = log(γ_0) and ε = log(1 + R), the above leads to log(Y) = β_0 + β_1 log(X) + ε, a linear regression of log(Y) on log(X). 31
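A small simulation (added here as an illustration, not the course data) shows how the multiplicative-error model above produces heteroskedasticity in levels and a clean linear fit after taking logs:

```r
# Sketch with simulated data: multiplicative errors produce non-constant
# variance in levels, which the log-log fit removes.
set.seed(1)
n      <- 200
x      <- runif(n, 1, 100)
gamma0 <- 2; beta1 <- 0.8
r      <- rnorm(n, mean = 0, sd = 0.10)         # roughly +/-10% percentage errors
y      <- gamma0 * x^beta1 * (1 + r)

fit_levels <- lm(y ~ x)                          # residuals fan out as x grows
fit_loglog <- lm(log(y) ~ log(x))                # roughly constant spread

par(mfrow = c(1, 2))
plot(fitted(fit_levels), resid(fit_levels), main = "levels: y ~ x")
plot(fitted(fit_loglog), resid(fit_loglog), main = "log-log: log(y) ~ log(x)")
coef(fit_loglog)                                 # intercept ~ log(gamma0), slope ~ beta1
```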

The Log-Log Model Consider a country's GDP as a function of IMPORTS: Since trade multiplies, we might expect %GDP to increase with %IMPORTS. [Left: GDP vs. IMPORTS; Right: log(GDP) vs. log(IMPORTS); countries labeled in both panels.] 32

Elasticity and the log-log Model In a log-log model, the slope β_1 is sometimes called the elasticity. In English: a 1% increase in X gives a β_1% increase in Y. β_1 ≈ d%Y / d%X (Why?) For example, economists often assume that GDP has an import elasticity of 1. Indeed, fitting log(GDP) = β_0 + β_1 log(IMPORTS) gives Coefficients: (Intercept) 1.8915, log(imports) 0.9693. 33
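To see where the elasticity interpretation comes from (a short derivation added here, not part of the original slides), differentiate the log-log mean function and use the fact that d log(Y) = dY/Y:

```latex
\log(Y) = \beta_0 + \beta_1 \log(X)
\;\Longrightarrow\;
\frac{dY}{Y} = \beta_1 \,\frac{dX}{X}
\;\Longrightarrow\;
\beta_1 = \frac{dY/Y}{dX/X} \approx \frac{\%\Delta Y}{\%\Delta X}
```

so a 1% change in X is associated with roughly a β_1% change in Y.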

Price Elasticity In economics, the slope coefficient β_1 in the regression log(sales) = β_0 + β_1 log(price) + ε is called the price elasticity. This is the % change in sales per 1% change in price. The model implies that E[sales] = A · price^β_1, where A = exp(β_0). 34

Price Elasticity of OJ A chain of gas-station convenience stores was interested in the dependency between Price and Sales of orange juice... They decided to run an experiment and change prices randomly at different locations. With the data in hand, let's first run a regression of Sales on Price: Sales = β_0 + β_1 Price + ε. SUMMARY OUTPUT Regression Statistics: Multiple R 0.719, R Square 0.517, Adjusted R Square 0.507, Standard Error 20.112, Observations 50. ANOVA: Regression df 1, SS 20803.071, MS 20803.071, F 51.428, Significance F 0.000; Residual df 48, SS 19416.449, MS 404.509; Total df 49, SS 40219.520. Coefficients (Standard Error, t Stat, P-value, Lower 95%, Upper 95%): Intercept 89.642 (8.610, 10.411, 0.000, 72.330, 106.955); Price -20.935 (2.919, -7.171, 0.000, -26.804, -15.065). 35

Price Elasticity of OJ [Left: fitted model, Sales vs. Price; Right: residual plot, residuals vs. Price.] No good! 36

Price Elasticity of OJ But... would you really think this relationship would be linear? Is moving a price from $1 to $2 the same as changing it from $10 to $11?? We should probably be thinking about the price elasticity of OJ... log(sales) = γ_0 + γ_1 log(price) + ε. SUMMARY OUTPUT Regression Statistics: Multiple R 0.869, R Square 0.755, Adjusted R Square 0.750, Standard Error 0.386, Observations 50. ANOVA: Regression df 1, SS 22.055, MS 22.055, F 148.187, Significance F 0.000; Residual df 48, SS 7.144, MS 0.149; Total df 49, SS 29.199. Coefficients (Standard Error, t Stat, P-value, Lower 95%, Upper 95%): Intercept 4.812 (0.148, 32.504, 0.000, 4.514, 5.109); LogPrice -1.752 (0.144, -12.173, 0.000, -2.042, -1.463). How do we interpret γ̂_1 = −1.75? (When prices go up 1%, sales go down by 1.75%.) 37
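The same pair of regressions in R would look roughly like this (the data frame oj and its column names are assumptions; the output shown on the slide is from Excel):

```r
# Levels vs. log-log regression for the OJ pricing experiment.
# `oj`, `Price`, and `Sales` are assumed (hypothetical) names.
fit_lin <- lm(Sales ~ Price, data = oj)
fit_log <- lm(log(Sales) ~ log(Price), data = oj)

summary(fit_lin)$r.squared      # ~0.52 on the slide
summary(fit_log)$r.squared      # ~0.75 on the slide

par(mfrow = c(1, 2))
plot(oj$Price, resid(fit_lin), xlab = "Price", ylab = "residuals (levels fit)")
plot(oj$Price, resid(fit_log), xlab = "Price", ylab = "residuals (log-log fit)")
```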

Price Elasticity of OJ [Left: fitted log-log model shown as Sales vs. Price; Right: residual plot, residuals vs. Price.] Much better! 38

Making Predictions What if the gas station store wants to predict its sales of OJ if it decides to price it at $1.80? The predicted log(sales) = 4.812 + (−1.752) log(1.8) = 3.78, so the predicted Sales = exp(3.78) = 43.82. How about the plug-in prediction interval? In the log scale, our prediction interval is [predicted log(sales) − 2s; predicted log(sales) + 2s] = [3.78 − 2(0.38); 3.78 + 2(0.38)] = [3.02; 4.54]. In terms of actual Sales, the interval is [exp(3.02), exp(4.54)] = [20.5; 93.7]. 39
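These calculations are easy to script; a sketch using the coefficients reported above rather than refitting from data:

```r
# Plug-in prediction at Price = $1.80 using the coefficients reported on the slide.
# With the actual data you would instead use predict(fit_log, newdata, interval = "prediction").
b0 <- 4.812; b1 <- -1.752; s <- 0.386      # intercept, slope, residual std. error

log_pred <- b0 + b1 * log(1.8)             # predicted log(sales), ~3.78
exp(log_pred)                              # predicted sales, ~43.8

log_int <- log_pred + c(-2, 2) * s         # plug-in interval on the log scale
exp(log_int)                               # back-transformed, roughly [20, 95]
```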

Making Predictions [Plug-in prediction bands: left, Sales vs. Price in the original scale; right, log(sales) vs. log(price) in the log scale.] In the log scale (right) we have [Ŷ − 2s; Ŷ + 2s]. In the original scale (left) we have [exp(Ŷ)·exp(−2s); exp(Ŷ)·exp(2s)]. 40

Some additional comments... Another useful transformation to deal with non-constant variance is to take only log(Y) and keep X the same. Clearly, the elasticity interpretation no longer holds. Always be careful in interpreting the models after a transformation. Also, be careful in using the transformed model to make predictions. 41

Summary of Transformations Coming up with a good regression model is usually an iterative procedure. Use plots of residuals vs. X or Ŷ to determine the next step. The log transform is your best friend when dealing with non-constant variance (log(X), log(Y), or both). Add polynomial terms (e.g., X²) to get nonlinear regression. The bottom line: you should combine what the plots and the regression output are telling you with your common sense and knowledge about the problem. Keep playing around with it until you get something that makes sense and has nothing obviously wrong with it. 42

Outliers Body weight vs. brain weight... X = body weight of a mammal in kilograms, Y = brain weight of a mammal in grams. [Left: brain vs. body; Right: standardized residuals vs. body.] Do additive errors make sense here? Also, what are the standardized residuals plotted above? 43

Standardized Residuals In our model, ε ~ N(0, σ²). The residuals e are a proxy for ε, and the standard error s is an estimate for σ. Call z = e/s the standardized residuals... We should expect z ~ N(0, 1). (How often should we see an observation with |z| > 3?) 44

Outliers Let's try logs... [Left: log(brain) vs. log(body); Right: standardized residuals vs. log(body).] Great, a lot better! But we see a large and positive potential outlier... the Chinchilla. 45
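These data look like the classic brain/body dataset that ships with R's MASS package as mammals; assuming that, the log-log fit and the flagged point can be reproduced in a few lines:

```r
# Log-log fit of brain weight on body weight and a look at the standardized residuals.
# Assumes the data are (or resemble) MASS::mammals, with columns `body` and `brain`.
library(MASS)

fit <- lm(log(brain) ~ log(body), data = mammals)

z_simple <- resid(fit) / summary(fit)$sigma   # the slides' z = e/s
z        <- rstandard(fit)                    # R's version, which also adjusts for leverage

plot(log(mammals$body), z, ylab = "std residuals"); abline(h = c(-3, 3), lty = 2)
rownames(mammals)[which.max(z)]               # species with the largest positive residual
```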

Outliers It turns out that the data had the brain of a Chinchilla weighing 64 grams. In reality, it is 6.4 grams... after correcting it: [Left: log(brain) vs. log(body); Right: standardized residuals vs. log(body), after the correction.] 46

How to Deal with Outliers When should you delete outliers? Only when you have a really good reason. There is nothing wrong with running the regression with and without potential outliers to see whether results are significantly impacted. Any time outliers are dropped, the reasons for removing observations should be clearly noted. 47