
W. Zhou (Colorado State University)
STAT 540, July 6th, 2015

Contents
1. Review of Residuals
2. Detecting Outliers
3. Influential Observations
4. Multicollinearity and its Effects

Model Diagnostics: An Overview
- Basic diagnostics, review
- Model adequacy for a predictor variable: added-variable plots
- Outlying Y observations and studentized/deleted residuals
- Outlying X observations and hat matrix/leverage values
- Influential cases
- Multicollinearity diagnostics and the variance inflation factor

Model Assumptions
Recall the multiple linear regression model: for i = 1, ..., n,
$$Y_i = \beta_0 + \sum_{j=1}^{p-1} \beta_j X_{ij} + \epsilon_i, \qquad \epsilon_i \overset{iid}{\sim} N(0, \sigma^2).$$
- Relationship between Y and X: $E(Y_i) = \beta_0 + \sum_{j=1}^{p-1} \beta_j X_{ij}$.
- Homogeneous variance: $Var(Y_i) = Var(\epsilon_i) = \sigma^2$.
- Independence: $Cov(\epsilon_i, \epsilon_j) = Cov(Y_i, Y_j) = 0$ for $i \neq j$.
- Normal distribution: $Y_i \sim N\big(\beta_0 + \sum_{j=1}^{p-1} \beta_j X_{ij},\ \sigma^2\big)$.

Basic Diagnostics
Exploratory data analysis
- Same as before: scatterplots, boxplots, histograms, numerical summaries.
- New: scatterplot matrices, split boxplots, brush/spin, coplots.
Linearity, homoscedasticity, normality
- Same as before: (externally studentized) residuals vs. each X, vs. $\hat{Y}$, and vs. time (also note: ACF plot); QQ plot.
- Tests: e.g., F test for lack of fit, Breusch-Pagan, etc. (see Chapter 6.8 of KNNL).
Outliers, influence, and correlated predictors
- The major focus of this set of notes.


Residuals Review
Recall that the residuals are
$$e = (e_1, \ldots, e_n)^T = Y - \hat{Y} = (I - H)Y,$$
where H is the hat (projection) matrix.
- The mean of the residuals is $\bar{e} = \frac{1}{n}\mathbf{1}^T e = 0$.
- The variance-covariance matrix of the residuals is $Var\{e\} = \sigma^2 (I - H)$, estimated by $s^2\{e\} = MSE\,(I - H)$.

Residuals Review (continued)
Denote $H = [h_{ij}]_{i,j=1}^n$. Then:
- The variance of $e_i$ is $Var\{e_i\} = \sigma^2 (1 - h_{ii})$, estimated by $s^2\{e_i\} = MSE\,(1 - h_{ii})$.
- The covariance of $e_i$ and $e_j$ ($i \neq j$) is $Cov\{e_i, e_j\} = \sigma^2 (0 - h_{ij}) = -\sigma^2 h_{ij}$, estimated by $s\{e_i, e_j\} = -MSE\, h_{ij}$.

Studentized Residuals Review
The variance of $e_i$ is not constant, and the covariance of $e_i$ and $e_j$ is not zero. An observation whose residual is large relative to its standard deviation may be outlying. To compare the n residuals, standardize them so that they are on the same scale.
Studentized residuals (a.k.a. internally studentized residuals) are defined as
$$r_i = \frac{e_i}{s\{e_i\}} = \frac{e_i}{\sqrt{MSE\,(1 - h_{ii})}}.$$
If the model is appropriate, the studentized residuals $\{r_i\}$ have constant variance, while the ordinary residuals $\{e_i\}$ do not.
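
A minimal numerical sketch (not from the notes): the hat matrix, residuals, and internally studentized residuals computed directly from the formulas above with NumPy. The simulated data, coefficient values, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3                                   # p counts the intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix H = X (X'X)^{-1} X'
e = Y - H @ Y                                  # residuals e = (I - H) Y
h = np.diag(H)                                 # leverages h_ii
MSE = e @ e / (n - p)
r = e / np.sqrt(MSE * (1 - h))                 # internally studentized residuals
```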

Deleted Residuals Review
Influence: a highly influential ith point can pull the fitted regression surface strongly toward itself, which masks that point's influence in its ordinary residual.
Strategy: define the residual for the ith point as the prediction error for that point from the model fit to the data with that point omitted.
Deleted residuals are defined as $d_i = Y_i - \hat{Y}_{i(i)}$. It can be shown that
$$d_i = \frac{e_i}{1 - h_{ii}} = \frac{Y_i - \hat{Y}_i}{1 - h_{ii}}.$$

Deleted Residuals Review (continued)
Let $X_i = (1, X_{i1}, \ldots, X_{i,p-1})$ (a row vector), and let $X_{(i)}$ and $MSE_{(i)}$ denote the design matrix and the MSE with the ith row (observation) deleted.
Recall that $s^2\{pred\} = MSE\,\big(1 + X_h (X^TX)^{-1} X_h^T\big)$. One can show that
$$s^2\{d_i\} = MSE_{(i)}\big(1 + X_i (X_{(i)}^T X_{(i)})^{-1} X_i^T\big) = \frac{MSE_{(i)}}{1 - h_{ii}}.$$

Studentized Deleted Residuals Review
The studentized deleted residuals (a.k.a. externally studentized residuals) are defined, for i = 1, ..., n, as
$$t_i = \frac{d_i}{s\{d_i\}} = \frac{e_i}{\sqrt{MSE_{(i)}\,(1 - h_{ii})}}.$$
Note that $(n - p)\,MSE = (n - p - 1)\,MSE_{(i)} + \frac{e_i^2}{1 - h_{ii}}$, so
$$t_i = e_i \sqrt{\frac{n - p - 1}{SSE\,(1 - h_{ii}) - e_i^2}},$$
and there is no need to fit n separate regressions.
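
A small sketch of the same idea in NumPy: externally studentized residuals obtained from a single fit via the identity above; the simulated data and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
e = Y - H @ Y
SSE = e @ e
# t_i = e_i * sqrt[(n - p - 1) / (SSE (1 - h_ii) - e_i^2)]: no refitting needed
t = e * np.sqrt((n - p - 1) / (SSE * (1 - h) - e**2))
```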

Outlying Observations
Outlying observations are well separated from the remainder of the data. Consider three types of outlying observations:
1. Outlying in Y|X but not in X: usually not influential.
2. Outlying in X but not in Y|X: usually not influential.
3. Outlying in both X and Y|X: can be very influential.
Goal: identify outlying and influential observations. The task is relatively straightforward with one or two predictor variables but becomes more challenging with more than two (see also hidden extrapolation, below).
Basic idea: outlying observations may have large residuals and often have a large impact on the model fit.

Identifying Outlying Y Observations
Basic idea: the ith observation is outlying in Y if $|t_i|$ is large.
Under $H_0$: "observation i is not outlying in Y", $t_i = d_i / s\{d_i\} \sim t_{n-p-1}$.
A Bonferroni adjustment is needed. Why? Because n comparisons are made, one per observation. The decision rule flags observation i if $|t_i| > t_{1-\alpha/(2n);\, n-p-1}$.
For most n and p, the cutoff $t_{1-\alpha/(2n);\, n-p-1}$ at the $\alpha = 5\%$ level is greater than 3, so in practice: if $|t_i| > 3$, observation i is a possible outlier.
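
A short sketch of the Bonferroni cutoff, assuming SciPy is available; n, p, and alpha are placeholder values.

```python
import numpy as np
from scipy import stats

n, p, alpha = 50, 3, 0.05
cutoff = stats.t.ppf(1 - alpha / (2 * n), df=n - p - 1)   # t_{1 - alpha/(2n); n-p-1}
print(cutoff)                                  # usually a bit above 3 for moderate n and p
# flagged = np.where(np.abs(t) > cutoff)[0]    # t: studentized deleted residuals from earlier
```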

Hat Matrix and Leverages
Basic idea: use the hat matrix to identify observations that are outlying in X.
Recall that $H = [h_{ij}]_{i,j=1}^n$ with $h_{ii} = X_i (X^TX)^{-1} X_i^T$. The diagonal elements $h_{ii}$ are called leverages.
Properties of the leverages $h_{ii}$:
1. $0 \le h_{ii} \le 1$ (can you show this?).
2. $\sum_{i=1}^n h_{ii} = p$, so $\bar{h} = \frac{1}{n}\sum_{i=1}^n h_{ii} = p/n$ (show it).
3. $h_{ii}$ measures the distance between the X values of the ith observation and the means of the X values over all n observations: $h_{ii} = 1/n + (x_i - \bar{x})^T (X_c^T X_c)^{-1} (x_i - \bar{x})$, where $X_c$ is the centered design matrix (show it).

Identifying Outlying X Observations
Effects of the hat values: if the ith data point is outlying in X with a high leverage $h_{ii}$, it can strongly influence the fitted value $\hat{Y}_i$.
- A higher leverage $h_{ii}$ gives $Y_i$ more weight in determining $\hat{Y}_i$ (since $\hat{Y} = HY$).
- A higher leverage $h_{ii}$ results in a smaller $s\{e_i\}$, as $\hat{Y}_i$ is pulled closer to $Y_i$.
- There are connections to nonparametric smoothing.
What is a bad hat value?
1. If $h_{ii} > 2p/n$, observation i is considered outlying in X.
2. Leverage is moderate if $h_{ii} \in [0.2, 0.5)$ and high if $h_{ii} \in [0.5, 1]$.
3. Draw a histogram, stem-and-leaf, or other plot of the $h_{ii}$. Outlying observations tend to have large leverages, with a gap between the outlying group and the other leverage values.
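
A sketch of the 2p/n rule in NumPy; the planted high-leverage point and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
X[0, 1:] = 6.0                                 # plant one point far out in X

h = np.diag(X @ np.linalg.solve(X.T @ X, X.T)) # leverages h_ii
print(np.where(h > 2 * p / n)[0])              # candidates outlying in X
```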

Hidden Extrapolation
H can also be used to detect hidden extrapolation when p is large.
It is possible for a point $X_{new}$ to have each coordinate $X_{new,i}$ (i = 1, ..., p) within the corresponding marginal range of X, yet for the p-dimensional point $X_{new}$ to lie outside the support region of the empirical joint distribution of X. This can be very difficult to detect, especially if no two-way scatterplot or three-way brush/spin display reveals it.
Consider $h_{new,new} = X_{new} (X^TX)^{-1} X_{new}^T$. If $h_{new,new} \le \max_i h_{ii}$, then it is fine to make predictions at $X_{new}$.
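
A sketch of the leverage check for a candidate prediction point, using NumPy on simulated data; the point x_new and the dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)    # diagonal of the hat matrix

x_new = np.array([1.0, 2.5, -2.5])             # hypothetical new point (intercept term included)
h_new = x_new @ XtX_inv @ x_new
print(h_new, h.max())                          # h_new > max_i h_ii signals hidden extrapolation
```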


Identifying Influential Observations
An observation is influential if its deletion leads to major changes in the fitted regression. Not all outlying observations are influential.
Main idea: a leave-one-out approach, as with the deleted residuals. Consider three measures:
1. DFFITS
2. Cook's distance
3. DFBETAS
No diagnostic identifies all possible problems; for example, leave-one-out methods do not address groups of multiple influential observations. More elaborate methods (e.g., bootstrap-based) exist for such cases and for high-dimensional situations.

DFFITS
DFFITS measures the effect of the ith case on the fitted value of $Y_i$:
$$DFFITS_i = \frac{\hat{Y}_i - \hat{Y}_{i(i)}}{\sqrt{MSE_{(i)}\, h_{ii}}},$$
and we can show that
$$DFFITS_i = t_i \sqrt{\frac{h_{ii}}{1 - h_{ii}}},$$
where $t_i$ is the ith studentized deleted residual.
- For small to medium data sets, $|DFFITS_i| > 1$ implies that the ith observation may be influential.
- For large data sets, $|DFFITS_i| > 2\sqrt{p/n}$ implies that the ith observation may be influential.
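
A sketch computing DFFITS from a single fit via the identity above (NumPy; the simulated data and names are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
e = Y - H @ Y
SSE = e @ e
t = e * np.sqrt((n - p - 1) / (SSE * (1 - h) - e**2))    # studentized deleted residuals
dffits = t * np.sqrt(h / (1 - h))
print(np.where(np.abs(dffits) > 2 * np.sqrt(p / n))[0])  # large-data rule of thumb
```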

Cook's Distance
Cook's distance measures the influence of the ith observation on all n fitted values:
$$D_i = \frac{\sum_{j=1}^n (\hat{Y}_j - \hat{Y}_{j(i)})^2}{p\, MSE},$$
and one can show that
$$D_i = \left(\frac{r_i^2}{p}\right)\left(\frac{h_{ii}}{1 - h_{ii}}\right),$$
where $r_i$ is the (internally) studentized residual.

Cook's Distance (continued)
Cook's D is large when both $r_i$ and $h_{ii}$ are large.
- $D_i < F_{p,\, n-p;\, 0.2}$ (the 20th percentile): little concern.
- $D_i > F_{p,\, n-p;\, 0.5}$ (the 50th percentile): substantial influence.
- What about values in between? A crude rule of thumb: if $D_i > 1$, investigate the ith observation as possibly influential.
- What happens as p grows?
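
A sketch of Cook's distance using the closed form above, compared with F(p, n - p) percentiles (NumPy/SciPy on simulated data; all names are illustrative assumptions).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
e = Y - H @ Y
MSE = e @ e / (n - p)
r = e / np.sqrt(MSE * (1 - h))                 # internally studentized residuals
D = (r**2 / p) * h / (1 - h)                   # Cook's distance

f20, f50 = stats.f.ppf([0.2, 0.5], p, n - p)   # 20th and 50th percentiles of F(p, n - p)
print(np.where(D > f50)[0])                    # substantial influence
```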

DFBETAS
DFBETAS measures the influence of the ith observation on a single coefficient $\beta_k$:
$$DFBETAS_{k(i)} = \frac{\hat{\beta}_k - \hat{\beta}_{k(i)}}{\sqrt{MSE_{(i)}\, c_{kk}}}, \qquad c_{kk} = \big[(X^TX)^{-1}\big]_{kk}.$$
Recall that $Var(\hat{\beta}) = \sigma^2 (X^TX)^{-1}$. A larger $|DFBETAS_{k(i)}|$ indicates a larger impact of observation i on $\hat{\beta}_k$.
- For small to medium data sets, if $|DFBETAS_{k(i)}| > 1$, the ith observation may be influential.
- For large data sets, if $|DFBETAS_{k(i)}| > 2/\sqrt{n}$, the ith observation may be influential.
- The sign of $DFBETAS_{k(i)}$ tells whether inclusion of observation i leads to an increase (+) or decrease (-) in $\hat{\beta}_k$.
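
A brute-force sketch of DFBETAS that refits the model without each observation in turn; slower than closed-form updates but transparent. The simulated data and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta = np.linalg.lstsq(X, Y, rcond=None)[0]
c = np.diag(np.linalg.inv(X.T @ X))            # c_kk = [(X'X)^{-1}]_kk

dfbetas = np.empty((n, p))
for i in range(n):
    keep = np.arange(n) != i
    Xi, Yi = X[keep], Y[keep]
    beta_i = np.linalg.lstsq(Xi, Yi, rcond=None)[0]
    resid_i = Yi - Xi @ beta_i
    MSE_i = resid_i @ resid_i / (n - 1 - p)    # MSE with the ith observation deleted
    dfbetas[i] = (beta - beta_i) / np.sqrt(MSE_i * c)

print(np.where(np.abs(dfbetas) > 2 / np.sqrt(n)))   # large-data rule of thumb
```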


Multicollinearity
When the predictor variables are correlated among themselves, multicollinearity among them is said to exist. Consider two extreme cases:
- Uncorrelated predictor variables.
- Predictor variables that are perfectly correlated.

Linearly Independent Predictor Variables
Consider $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$, and suppose $X_1 \perp X_2$, i.e., $\widehat{Corr}(X_1, X_2) = 0$. We can show that
$$\hat{\beta}_1 = \frac{\sum_{i=1}^n (Y_i - \bar{Y})(X_{i1} - \bar{X}_1)}{\sum_{i=1}^n (X_{i1} - \bar{X}_1)^2}, \qquad \hat{\beta}_2 = \frac{\sum_{i=1}^n (Y_i - \bar{Y})(X_{i2} - \bar{X}_2)}{\sum_{i=1}^n (X_{i2} - \bar{X}_2)^2}.$$
- The LS estimate of $\beta_1$ is not affected by $X_2$, and vice versa.
- The order in which the predictor variables enter the model is inconsequential.
- The interpretation of the regression coefficients is clear: $\beta_1$ is the expected change in Y for a one-unit increase in $X_1$ with $X_2$ held constant.

Predictor Variables are Linearly Dependent
Again suppose $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$, but now $X_2 = 2X_1 + 1$, with $\beta_0 = 3$, $\beta_1 = 2$, $\beta_2 = 5$. Then all of the following models give the same fit for Y:
- $Y = 3 + 2X_1 + 5X_2 + \epsilon$
- $Y = 8 + 12X_1 + \epsilon$
- $Y = 2 + 6X_2 + \epsilon$
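
A tiny sketch verifying the rank deficiency numerically, using the relation $X_2 = 2X_1 + 1$ from this slide; the simulated $X_1$ and all other names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 30
X1 = rng.normal(size=n)
X2 = 2 * X1 + 1
X = np.column_stack([np.ones(n), X1, X2])
Y = 3 + 2 * X1 + 5 * X2 + rng.normal(size=n)

print(np.linalg.matrix_rank(X))                # 2, not 3: the columns are linearly dependent
print(np.linalg.cond(X.T @ X))                 # enormous: X'X is numerically singular
# lstsq still returns one fit (the minimum-norm solution), but infinitely many
# coefficient vectors give exactly the same fitted values.
```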

What is still fine, and what is not
Still fine:
- Prediction of Y within the model/data scope; it is unreliable outside the model/data scope.
Not fine:
- The $\beta$'s are not unique, because X has reduced rank (why?) and $X^TX$ is not invertible.
- Interpreting the effect of the jth predictor "holding all other variables constant" is difficult. A regression coefficient may no longer reflect the effect of its corresponding predictor variable.
- Even worse: multicollinearity does not violate any model assumptions!

Concerns with Multicollinearity
Multicollinearity can involve three or more variables rather than just a correlated pair, which makes it harder to detect.
Effects of multicollinearity on inference about the regression coefficients:
- Large changes in the fitted $\hat{\beta}_k$ when another X is added or deleted.
- Small changes in the data lead to very large changes in $\hat{\beta}$.
- Large $s\{\hat{\beta}_k\}$, which makes the $\hat{\beta}_k$ seem non-significant even though the predictors are jointly significant and $R^2$ is large.
- It is more difficult to interpret $\hat{\beta}_k$ as the effect of $X_k$ on Y, because the other X's cannot be held constant.
- Estimated coefficients may have the wrong sign or implausible magnitudes.

Some Diagnostics for Multicollinearity
Multicollinearity is harmless for estimation of the mean response and prediction of a new observation at $X_h$, assuming no extrapolation!
Diagnosing multicollinearity:
- Large changes in the $\hat{\beta}$'s when a predictor (or an observation) is added or deleted.
- Important predictors are not statistically significant (large p-values) in individual tests.
- Wide confidence intervals for the $\beta$'s corresponding to important predictor variables.
- The sign of a $\hat{\beta}$ is counter-intuitive.
- Predictors are highly correlated.

Variance Inflation Factor (VIF)
The variance inflation factor (VIF) for $\hat{\beta}_k$ is
$$VIF_k = \frac{1}{1 - R_k^2}, \qquad k = 1, \ldots, p - 1,$$
where $R_k^2$ is the $R^2$ from regressing $X_k$ on the other predictor variables.
- The VIF measures the increase in the variance (and hence the standard error) of $\hat{\beta}_k$ due to the presence of the other variables.
- If $\max_k VIF_k > 10$, multicollinearity may have a large impact on the inference.
- If $\sum_{j=1}^{p-1} VIF_j$ is much larger than $p - 1$ (i.e., the average VIF is much larger than 1), there may be serious multicollinearity problems (for large p).

Variance Inflation Factor (VIF) (continued)
Here $R_k^2$ is the coefficient of multiple determination $R^2$ of the model
$$X_{ik} = \beta_0 + \sum_{j \neq k} \beta_j X_{ij} + \epsilon_i,$$
and
$$\sigma^2\{\hat{\beta}_k\} \propto \sigma^2\, VIF_k = \frac{\sigma^2}{1 - R_k^2}.$$
1. When $R_k^2$ decreases, $\sigma^2\{\hat{\beta}_k\}$ decreases.
2. When $R_k^2$ increases, $\sigma^2\{\hat{\beta}_k\}$ increases.
In fact, $VIF_k = (n - 1)\big[(X_c^T X_c)^{-1}\big]_{kk}$, where $X_c$ is the scaled design matrix. (Can you show this?)
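
A sketch computing VIFs directly from the definition, by regressing each predictor on the others (NumPy; the near-collinear simulated design and all names are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
Z = rng.normal(size=(n, 3))
X = np.column_stack([Z[:, 0], Z[:, 0] + 0.1 * Z[:, 1], Z[:, 2]])  # first two predictors nearly collinear

def vif(X):
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        Xj = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])   # regress X_j on the others
        fitted = Xj @ np.linalg.lstsq(Xj, X[:, j], rcond=None)[0]
        r2 = 1 - np.sum((X[:, j] - fitted) ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1 / (1 - r2)                                          # VIF_j = 1 / (1 - R_j^2)
    return out

print(vif(X))                                  # the first two VIFs are large, the third is near 1
```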

Some Remedial Measures for Multicollinearity
Classical methods:
- Drop one or more predictor variables from the model (variable selection, a frontier of statistics).
- For polynomial or interaction regression models, use centered predictor variables $X_{ik} - \bar{X}_k$ to reduce multicollinearity (cf. the Gram-Schmidt transformation; why?).
Modern methods:
- Create new predictor variables: principal component regression, PLSR, and other dimension-reduction methods.
- Use shrinkage regression such as ridge, LASSO, SCAD, group LASSO, or adaptive LASSO. The ridge estimator is
$$\hat{\beta}_R = (X^TX + \lambda I)^{-1} X^T Y.$$
Although $\hat{\beta}_R$ has a smaller variance, it is a biased estimator of $\beta$. These ideas lead into the frontier of statistical machine learning and high-dimensional inference.
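
A sketch of the ridge estimator from this slide on centered and scaled predictors, leaving the intercept unpenalized; the simulated data, the value of lambda, and all names are illustrative assumptions, and in practice lambda would be chosen by, e.g., cross-validation.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100
Z = rng.normal(size=(n, 3))
X = np.column_stack([Z[:, 0], Z[:, 0] + 0.05 * Z[:, 1], Z[:, 2]])  # near-collinear predictors
Y = 1 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

Xc = (X - X.mean(axis=0)) / X.std(axis=0)      # center and scale the predictors
Yc = Y - Y.mean()
lam = 1.0                                      # tuning parameter (assumed; tune by cross-validation)
beta_ridge = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]), Xc.T @ Yc)
beta_ols = np.linalg.lstsq(Xc, Yc, rcond=None)[0]
print(beta_ols)                                # unstable under near-collinearity
print(beta_ridge)                              # shrunk toward zero: smaller variance, but biased
```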