Diagnostics and Remedial Measures: An Overview


W. Zhou (Colorado State University), STAT 540, July 6, 2015

Contents: residuals; model diagnostics (graphical techniques, hypothesis testing); remedial measures (transformation). Later: more about all this for multiple regression.

Model Assumptions

Recall the simple linear regression model:

  Y_i = β_0 + β_1 X_i + ε_i,  ε_i iid N(0, σ²),  for i = 1, ..., n.

The model carries four assumptions:
- Linearity: a straight-line relationship between E(Y) and X, i.e. E(Y_i) = β_0 + β_1 X_i.
- Homogeneous variance: Var(ε_i) = σ² for all i.
- Independence: the ε_i are mutually independent.
- Normal distribution: the ε_i are normally distributed.

Ramifications If Assumptions Are Violated

Recall the simple linear regression model.
- Nonlinearity: the linear model will fit poorly, and parameter estimates may be meaningless.
- Non-independence: parameter estimates are still unbiased, but standard errors are a problem, and thus so is inference.
- Nonconstant variance: parameter estimates are still unbiased, but standard errors are a problem.
- Non-normality: least important. Why? Inference is fairly robust to non-normality, though there are important effects on prediction intervals.

Model Diagnostics

Reliable inference hinges on reasonable adherence to model assumptions, so it is important to evaluate the FOUR model assumptions, that is, to perform model diagnostics. The main approach to model diagnostics is to examine the residuals (thanks to the additive model assumption). Consider two approaches:
- Graphical techniques: more subjective, but quick and very informative for an expert.
- Hypothesis tests: more objective and comfortable for amateurs, but outcomes depend on assumptions and on sensitivity, and there is a tendency to use them as a crutch.

Graphical Techniques

At this point in the analysis, you have already done EDA: 1D exploration of X and Y, and 2D exploration of X and Y. These are not very effective for model diagnostics except in drastic cases.

Recall the definition of the residual: e_i = Y_i − Ŷ_i, for i = 1, ..., n.
- e_i can be treated as an estimate of the true error ε_i = Y_i − E(Y_i), where the ε_i are iid N(0, σ²).
- e_i can be used to check normality, homoscedasticity, linearity, and independence.

Properties of Residuals

- Mean: ē = Σ e_i / n = 0, always, by the least squares normal equations.
- Variance: estimated by MSE = SSE/(n − 2) = Σ_{i=1}^n e_i² / (n − 2) = Σ_{i=1}^n (e_i − ē)² / (n − 2) = s².
- Nonindependence: the residuals are not independent, since they satisfy the constraints Σ e_i = 0 and Σ X_i e_i = 0. When the sample size n is large, however, the residuals can be treated as approximately independent.

Standardized Residuals

For diagnostics there are superior choices to the ordinary residuals. Standardized (KNNL: "semi-studentized") residuals: since Var(ε_i) = σ², it is natural to apply the standardization

  e_i* = e_i / √MSE.

But each e_i in fact has a different variance; we use this fact to derive a superior type of residual below.

Hat Values

The fitted values are linear combinations of the observations:

  Ŷ_i = Σ_j h_ij Y_j,

where for simple linear regression h_ij = 1/n + (X_i − X̄)(X_j − X̄) / Σ_k (X_k − X̄)². The h_ij are called hat values.

Deriving the Variance of Residuals

Using Ŷ_i = Σ_j h_ij Y_j, we obtain

  e_i = Y_i − Ŷ_i = (1 − h_ii) Y_i − Σ_{j≠i} h_ij Y_j.

Therefore (since the Y's are independent)

  Var{e_i} = σ² [ (1 − h_ii)² + Σ_{j≠i} h_ij² ].

Continuing to Derive the Variance of Residuals

Using Var{e_i} = σ² [ (1 − h_ii)² + Σ_{j≠i} h_ij² ] and the identity Σ_j h_ij² = h_ii (show it in HW), we have

  Var{e_i} = σ² [ (1 − h_ii)² + Σ_{j≠i} h_ij² ]
           = σ² [ 1 − 2h_ii + h_ii² + Σ_{j≠i} h_ij² ]
           = σ² [ 1 − 2h_ii + Σ_j h_ij² ]
           = σ² (1 − 2h_ii + h_ii)
           = σ² (1 − h_ii).
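As a numerical sanity check (not part of the original slides), the identity Var{e_i} = σ²(1 − h_ii) can be verified by simulation in base R; the design points, coefficients, and replicate count below are arbitrary illustrative choices:

  # Monte Carlo check of Var{e_i} = sigma^2 * (1 - h_ii)
  set.seed(1)
  x <- c(4.1, 5.1, 6.3, 7.0, 7.9, 8.6, 9.4, 10.2)             # fixed design points
  sigma <- 2
  h <- 1 / length(x) + (x - mean(x))^2 / sum((x - mean(x))^2)  # h_ii from the slide formula
  res <- replicate(5000,                                       # residuals from 5000 simulated fits
    resid(lm(3 + 0.5 * x + rnorm(length(x), sd = sigma) ~ x)))
  cbind(empirical = apply(res, 1, var),                        # empirical Var{e_i} ...
        theoretical = sigma^2 * (1 - h))                       # ... vs sigma^2 (1 - h_ii)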

Studentized Residuals

Now we may scale each residual separately by its own standard deviation. The (internally) studentized residual is

  r_i = e_i / √( MSE (1 − h_ii) ).

There is still a problem: imagine that Y_i is a severe outlier.
- Y_i will strongly pull the regression line toward it.
- e_i will understate the distance between Y_i and the true regression line.
The solution is to use externally studentized residuals...

Studentized Deleted Residuals

To eliminate the influence of Y_i on the misfit at the ith point, fit the regression line based on all points except the ith. Define the prediction at X_i using this deleted regression as Ŷ_i(i). The deleted residual is

  d_i = Y_i − Ŷ_i(i).

The studentized deleted residual is

  t_i = d_i / ŝ{d_i} = (Y_i − Ŷ_i(i)) / √( MSE_(i) / (1 − h_ii) ).

There is no need to fit n deleted regressions; we can show that

  d_i = e_i / (1 − h_ii)   and   (n − 2) MSE = (n − 3) MSE_(i) + e_i² / (1 − h_ii).

Also, t_i has a t-distribution: t_i ~ t_{n−3}. (In R these residuals are available directly; see the sketch below.)
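A minimal R sketch of these quantities, using made-up data (the data frame and its contents are purely illustrative; rstandard(), rstudent(), and hatvalues() are base R functions):

  dat <- data.frame(x = 1:10)                   # toy data, purely illustrative
  dat$y <- 2 + 3 * dat$x + rnorm(10)
  fit <- lm(y ~ x, data = dat)
  r_i <- rstandard(fit)                         # internally studentized residuals r_i
  t_i <- rstudent(fit)                          # studentized deleted residuals t_i
  d_i <- resid(fit) / (1 - hatvalues(fit))      # deleted residuals d_i = e_i / (1 - h_ii)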

Residual Plots

The residual plot is the primary graphical diagnostic method; departures from model assumptions can be difficult to detect directly from X and Y. Use the externally studentized residuals. Some key residual plots (see the R sketch below):
- Plot t_i against the predicted values Ŷ_i (not Y_i): detect nonconstant variance, nonlinearity, and outliers.
- Plot t_i against X_i. In simple linear regression this is the same as above (why?); in multiple regression it is useful for detecting partial correlation.
- Plot t_i versus other possible predictors (e.g., time): detect important lurking variables.
- Plot t_i versus lagged residuals: detect correlated errors.
- QQ-plot or normal probability (PP-) plot of t_i: detect non-normality.
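Continuing the sketch above (fit and dat as before), these plots can be drawn with base R graphics:

  t_i <- rstudent(fit)                       # externally studentized residuals
  plot(fitted(fit), t_i,                     # residuals vs fitted values
       xlab = "Fitted values", ylab = "Studentized deleted residual")
  abline(h = 0, lty = 2)                     # funnel -> nonconstant variance; banana -> nonlinearity
  plot(dat$x, t_i)                           # same pattern as above in simple linear regression
  plot(head(t_i, -1), tail(t_i, -1))         # residual vs lagged residual: correlated errors
  qqnorm(t_i); qqline(t_i)                   # QQ plot: non-normality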

Nonlinearity of the Regression Function

Plot t_i against Ŷ_i (and against each X_i for multiple linear regression).
- Random scatter indicates no serious departure from linearity.
- A "banana" shape indicates departure from linearity.
- A nonparametric smoother fitted to the residual plot can aid detection.
Example: curved relationship (KNNL Figure 3.4(a)). Plotting Y vs. X is not nearly as effective for detecting nonlinearity because the trend has not been removed; logically, you are investigating model assumptions, not the marginal effect.

Nonconstant Error Variance

Plot t_i against Ŷ_i (and against each X_i for multiple linear regression).
- Random scatter indicates no serious departure from constant variance.
- A nonparametric smoother fitted to this plot can aid detection.
- A funnel shape indicates nonconstant variance. Example: KNNL Figure 3.4(c).
Often both nonconstant variance and nonlinearity exist.

Nonindependence of Error Terms

Possible causes of nonindependence:
- Observations collected over time and/or across space.
- A study done on sets of siblings.
Departures from independence include, for example, a trend effect (KNNL Figures 3.4(d), 3.8(a)) and cyclical nonindependence (KNNL Figure 3.8(b)). Plot t_i against another covariate, such as time, or use an autocorrelation function plot (acf()); see the sketch below.
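Continuing the earlier sketch, and assuming (hypothetically) that the rows of dat are in time order:

  plot(t_i, type = "b")        # trend or cycles suggest nonindependence
  acf(t_i)                     # spikes outside the bands suggest autocorrelation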

Nonnormality of Error Terms

Box plot, histogram, or stem-and-leaf plot of the t_i; or a QQ (quantile-quantile) plot:
1. Order the residuals: t_(1) ≤ t_(2) ≤ ... ≤ t_(n).
2. Find the corresponding "rankits" z_(1) ≤ z_(2) ≤ ... ≤ z_(n), where for k = 1, ..., n,

     z_(k) = √MSE · z( (k − 0.375) / (n + 0.25) )

   is an approximation of the expected value of the kth smallest observation in a normal random sample (z(·) denotes the standard normal quantile function).
3. Plot t_(k) against z_(k).

The QQ plot should be approximately linear if normality holds. An "S" shape means the distribution of residuals has light ("short") tails; a backwards "S" means heavy tails; a "C" or backwards "C" means skew. It is a good idea to examine other possible problems first.
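A sketch of the rankit computation in base R, mirroring the formula above (the 0.375 and 0.25 constants are those on the slide; qqnorm() uses a similar convention internally):

  e <- resid(fit)                                   # ordinary residuals
  n <- length(e)
  rankit <- sqrt(sum(e^2) / df.residual(fit)) *     # sqrt(MSE) times ...
    qnorm((rank(e) - 0.375) / (n + 0.25))           # ... the approximate normal order statistic
  plot(rankit, e)                                   # roughly linear under normality
  qqnorm(e); qqline(e)                              # built-in near-equivalent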

Presence of Outliers

An outlier is an extreme observation. Some diagnostic methods:
- Box plot of the t_i.
- Plot t_i against Ŷ_i (and X_i); t_i that are very unlikely under the reference t-distribution could be called outliers.
- Modern cluster analysis methods.
Outliers may convey important information: an error, a different mechanism at work, or a significant discovery. There is a temptation to throw away outliers because they may strongly influence parameter estimates, but influence doesn't mean that the model is right and the data point is wrong; perhaps the data point is right and the model is wrong.

Graphical Techniques: Remarks

We generally do not plot residuals (t_i) against the response (Y_i). Why? Residual plots may provide evidence against model assumptions, but they do not generally validate assumptions. For data analysis in practice:
- Fit the model and check model assumptions (an iterative process).
- Generally do not include residual plots in a report, but include a sentence or two such as "Standard diagnostics did not indicate any violations of the assumptions for this model."
- For this class, always include residual plots in homework assignments so you can learn the methods.
There are no magic formulas; the decision may be difficult for small sample sizes. It is as much art as science.

Diagnostic Methods Based on Hypothesis Testing

- Tests for linearity: F test for lack of fit (Section 3.7).
- Tests for constancy of variance (Section 3.6): Brown-Forsythe test, Breusch-Pagan test, Levene's test, Bartlett's test.
- Tests for independence (Chapter 12): runs test, Durbin-Watson test.
- Tests for normality (Section 3.5): χ² test, Kolmogorov-Smirnov test.
- Tests for outliers (Chapter 10).
Several of these are available in R; see the sketch below.
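A hedged sketch of some of these tests in R. shapiro.test() is base R; bptest() and dwtest() come from the contributed lmtest package (an assumption about your setup, not something the slides prescribe); the Brown-Forsythe test is also available as car::leveneTest() with its default center = median, given a grouping factor with repeats:

  library(lmtest)                    # contributed package for bptest() and dwtest()
  fit <- lm(y ~ x, data = dat)       # dat: hypothetical data frame as in earlier sketches
  bptest(fit)                        # Breusch-Pagan: constancy of variance
  dwtest(fit)                        # Durbin-Watson: independence (rows in time order)
  shapiro.test(resid(fit))           # Shapiro-Wilk: normality (base R)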

F Test for Lack of Fit

Residual plots can be used to assess the adequacy of a simple linear regression model; a more formal procedure is a test for lack of fit using pure error. This requires repeat groups. For a given data set, suppose we have fitted a simple linear regression model and computed the error sum of squares

  SSE = Σ_{i=1}^n (Y_i − Ŷ_i)².

These deviations Y_i − Ŷ_i could be due either to random fluctuation around the line or to an inadequate model.

Pure Error and Lack of Fit

The main idea is to take several independent observations on Y at the same X, in order to distinguish the error due to random fluctuation around the line from the error due to lack of fit of the simple linear regression model. The variation among the repeated measurements is called pure error; the remaining error variation is called lack of fit. Thus we can partition the regression SSE into two parts:

  SSE = SSPE + SSLF,

where SSPE = SS(pure error) and SSLF = SS(lack of fit). In effect, we are comparing the linear model with the more flexible separate-means model.

Pure Error and Lack of Fit

One possibility is that pure error is comparatively large, so the linear model seems adequate; that is, pure error is a large part of SSE. The other possibility is that pure error is comparatively small, so the linear model seems inadequate; that is, pure error is a small part of the error and the lack-of-fit component is a large part of SSE. In the latter case there may be significant evidence of lack of fit.

Notation

Models (R notation):
- Null (N): Y ~ 1, the common-mean model.
- Reduced (R): Y ~ X, the linear regression model.
- Full (F): Y ~ factor(X), the separate-means (one-way ANOVA) model.
Notation: Y_ij are the data, where j indexes groups and i indexes individuals (sums are taken over all available indices).
- Ȳ is the grand mean.
- Ȳ_j is the jth group mean.
- Ŷ_ij are the fitted values using the regression line.
Note that the Ȳ_j are the fitted values under the ANOVA model that fits group means, Y ~ factor(X).

Sums of Squares

Recall (all sums are over both i and j except as noted):

  SSTO  = Σ (Y_ij − Ȳ)²
  SSR_R = Σ (Ŷ_ij − Ȳ)²
  SSE_R = Σ (Y_ij − Ŷ_ij)²
  SSTO  = SSR_R + SSE_R

  SSPE  = SSE_F = Σ (Y_ij − Ȳ_j)²
  SSLF  = Σ (Ȳ_j − Ŷ_ij)² = Σ_j n_j (Ȳ_j − Ŷ_j)²
  SSE_R = SSPE + SSLF

LOF ANOVA Table

One way to summarize the LOF test is by ANOVA (r is the number of distinct X values):

  Source        df      SS     MS
  Regression    1       SSR    MSR = SSR/1
  Lack of fit   r − 2   SSLF   MS_LF = SSLF/(r − 2)
  Pure error    n − r   SSPE   MS_PE = SSPE/(n − r)
  Total         n − 1   SSTO

The expected mean squares are

  E(MS_PE) = σ²  and  E(MS_LF) = σ² + Σ_{i=1}^r n_i ( µ_i − (β_0 + β_1 X_i) )² / (r − 2).

The F-test for lack of fit is therefore F = MS_LF / MS_PE.

LOF as Model Comparison

In fact, the lack-of-fit test above is a model comparison of our desired model Y ~ X to the potentially better model Y ~ factor(X), which would be required if the linear model fit poorly. Apply the general linear test (GLT) to compare these two models (are they nested?):

  F_LOF = [ (SSE_R − SSE_F) / (df_R − df_F) ] / [ SSE_F / df_F ].

Notice SSE_R − SSE_F = SSE_R − SSPE = SSLF and SSE_F = SSPE, so F_LOF = MS_LF / MS_PE, and the LOF ANOVA F-test is the same as the model comparison by F_LOF.

Lack of Fit in R

  anova(reduced.lm)
              Df Sum Sq Mean Sq F value    Pr(>F)
  x            1  60.95  60.950  193.07 1.395e-09   <-- SSR_R
  Residuals   14   4.42   0.316                     <-- SSE_R

  full.lm <- lm(y ~ factor(x), purerr)
  anova(full.lm)
              Df Sum Sq Mean Sq F value    Pr(>F)
  factor(x)    7 65.272  9.3245   758.6 1.198e-10   <-- SSR_F
  Residuals    8  0.098  0.0123                     <-- SSE_F = SSPE

  anova(reduced.lm, full.lm)
  Model 1: y ~ x
  Model 2: y ~ factor(x)
    Res.Df    RSS Df Sum of Sq      F    Pr(>F)
  1     14 4.4196
  2      8 0.0983  6    4.3213 58.594 3.546e-06     <-- F_LOF (= F_GLT here)

Therefore

  F_LOF = [ (SSE_R − SSE_F) / (df_SSE,R − df_SSE,F) ] / [ SSE_F / df_SSE,F ]
        = [ (4.42 − 0.098) / (14 − 8) ] / (0.098/8) = (4.3213/6) / (0.0983/8) = 58.594.

LOF p-value

Compare F = 58.594 with the F(6, 8) distribution: p-value = P( F(6, 8) ≥ 58.594 ) < 0.001. There is very strong evidence of lack of fit.

ANOVA table for LOF:

  Source        df   SS             MS
  Regression    1    SSR  = 60.950  MSR  = 60.950
  Lack of fit   6    SSLF = 4.322   MSLF = 0.720
  Pure error    8    SSPE = 0.098   MSPE = 0.0123
  Total         15   SSTO = 65.370

LOF by Hand

The data consist of 16 observations, with X repeated at several values:

  X: 4.1 5.1 5.1 5.1 6.3 6.3 7.0  7.9  7.9  7.9  8.6  9.4  9.4  9.4  10.2 10.2
  Y: 6.3 7.3 7.4 7.4 7.8 7.7 8.4 10.8 10.6 10.9 11.0 11.1 10.9 11.0 12.6 12.8

[Figure: scatterplot of Y against X.]

Computing SS Pure Error by Hand

There are 5 repeat groups out of 8 groups:

  Group j:  1              2          3                4                5            (singletons)
  X:        5.1            6.3        7.9              9.4              10.2         4.1   7.0   8.6
  Y:        7.3, 7.4, 7.4  7.8, 7.7   10.8, 10.6,10.9  11.1, 10.9,11.0  12.6, 12.8   6.3   8.4   11.0
  n_j:      3              2          3                3                2            1     1     1

Compute SSPE as

  SSPE = Σ_{i=1}^3 (Y_i1 − Ȳ_1)² + Σ_{i=1}^2 (Y_i2 − Ȳ_2)² + Σ_{i=1}^3 (Y_i3 − Ȳ_3)²
       + Σ_{i=1}^3 (Y_i4 − Ȳ_4)² + Σ_{i=1}^2 (Y_i5 − Ȳ_5)².

Computing SS Pure Error by Hand

Let Y_ij be the ith observation in the jth group, n_j the number of observations in the jth group, and c the number of repeat groups. In general,

  SSPE = Σ_{j=1}^c Σ_{i=1}^{n_j} (Y_ij − Ȳ_j)²,
  df_PE = Σ_{j=1}^c (n_j − 1) = n − r.

In the example, n_1 = 3, n_2 = 2, n_3 = 3, n_4 = 3, n_5 = 2, c = 5, and hence df_PE = 2 + 1 + 2 + 2 + 1 = 8 and SSPE = 0.098.
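A sketch of this computation in base R, with the data values transcribed from the slide (ave() returns the group mean Ȳ_j for each observation):

  x <- c(4.1, 5.1, 5.1, 5.1, 6.3, 6.3, 7.0, 7.9,
         7.9, 7.9, 8.6, 9.4, 9.4, 9.4, 10.2, 10.2)
  y <- c(6.3, 7.3, 7.4, 7.4, 7.8, 7.7, 8.4, 10.8,
         10.6, 10.9, 11.0, 11.1, 10.9, 11.0, 12.6, 12.8)
  sspe  <- sum((y - ave(y, x))^2)          # SS pure error, about 0.098
  df_pe <- length(y) - length(unique(x))   # n - r = 16 - 8 = 8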

Computing SS Lack of Fit by Hand

By subtraction. In the example, SSE = 4.42 from the regression ANOVA table, on df = 14, and SSPE = 0.098 on df = 8. Thus

  SSLF = SSE − SSPE = 4.42 − 0.098 = 4.322,  df_LF = 14 − 8 = 6.

Note that

  SSLF = Σ_{j=1}^c Σ_{i=1}^{n_j} (Ȳ_j − Ŷ_ij)²,

where c denotes the total number of groups (here c = 8).

Lack of Fit Test by Hand

Test H_0: no lack of fit (here, simple linear regression is adequate) versus H_a: lack of fit (here, simple linear regression is inadequate). Use the fact that, under H_0,

  F = (SSLF / df_LF) / (SSPE / df_PE) ~ F(df_LF, df_PE).

In the example, the observed F test statistic is

  F = (4.322/6) / (0.098/8) ≈ 58.8;

with the unrounded sums of squares (4.3213 and 0.0983) this is 58.594, matching the R output above.
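The reference distribution and p-value can be checked in base R with pf():

  f_lof <- (4.3213 / 6) / (0.0983 / 8)              # 58.594, as in the anova() output
  pf(f_lof, df1 = 6, df2 = 8, lower.tail = FALSE)   # p-value, about 3.5e-06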

Lack of Fit Test: Remarks

Note that R² = 93.2% is high, yet according to the LOF test the model is still inadequate; a possible remedy is polynomial regression. The repeats need to be independent measurements. If there are no repeats at all, some practitioners form approximate repeat groups by binning X values that are close to one another; in that case the LOF test is an approximate test.

Remedial Measures

For simple linear regression, consider two basic approaches: abandon the current model and look for a better one, or transform the data so that the simple linear regression model is appropriate.
- Nonlinearity of the regression function: transformation (of X or Y or both); polynomial regression; nonlinear regression.
- Nonconstancy of error variance: transformation (of Y); weighted least squares.

Remedial Measures

- Simultaneous nonlinearity and nonconstant variance: it sometimes works to transform Y to fix the variance, then transform X to fix linearity.
- Nonindependence of error terms: first-order differencing; models with correlated error terms.
- Nonnormality of error terms: transformation; generalized linear models.
- Presence of outliers: removal of outliers (with extreme caution); analyze both with and without the outliers; robust estimation; a new model.

Example: Bacteria

The data consist of the number of surviving bacteria after exposure to X-rays for different periods of time. Let t denote time (in number of 6-minute intervals) and let n denote the number of surviving bacteria (in 100s) after exposure to X-rays for time t.

  t: 1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  n: 355 211 197 166 142 166 104 60  56  38  36  32  21  19  15

Example: Bacteria

We fit a simple linear regression model to the data. [Figure: scatterplot of n against t with the fitted line, and a residuals-vs-fitted plot flagging observations 1, 8, and 15; the residuals show clear curvature.] It appears that the straight-line model is not adequate: the assumption of a correct model seems to be violated. What to do?

Example: Bacteria

Consider the nonlinear model (why nonlinear?)

  n_t = n_0 e^{βt},

where t is time, n_t is the number of bacteria at time t, n_0 is the number of bacteria at t = 0, and β < 0 is a decay rate. Taking natural logs of both sides of the model and setting α = ln(n_0), we have

  ln(n_t) = ln(n_0) + ln(e^{βt}) = ln(n_0) + βt = α + βt.

That is, we log-transform n_t and the result is an ordinary straight-line model!

Example: Bacteria

The transformed data are as follows:

  t:     1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
  ln(n): 5.87 5.35 5.28 5.11 4.96 5.11 4.64 4.09 4.03 3.64 3.58 3.47 3.04 2.94 2.71

[Figure: scatterplot of log(n) against t with the fitted line, and a residuals-vs-fitted plot flagging observations 2, 6, and 10; the plots now look consistent with a straight-line model.]

Example: Bacteria

Based on the log-transformed counts, we fit the model and obtain the LS estimates

  α̂ = 6.029,  β̂ = −0.222,  s_{ln(n)|t} = 0.1624,  R² = 0.9757.

- Inference for β is straightforward. [Units? Interpretation?]
- Inference for α is straightforward. [Units? Interpretation?]
- Inference for n_0 is not straightforward: since α = ln(n_0), we have n_0 = e^α. Given α̂ = 6.029, we obtain the estimate n̂_0 = e^{α̂} = 415.30. But the estimate n̂_0 is biased (i.e., E(n̂_0) ≠ n_0). (A sketch of this fit in R follows.)
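A sketch reproducing this fit in base R, with the data transcribed from the slide:

  t <- 1:15
  n <- c(355, 211, 197, 166, 142, 166, 104, 60, 56, 38, 36, 32, 21, 19, 15)
  fit_log <- lm(log(n) ~ t)        # straight-line model for ln(n_t)
  coef(fit_log)                    # alpha-hat ~ 6.029, beta-hat ~ -0.222
  summary(fit_log)$r.squared       # ~ 0.9757
  exp(coef(fit_log)[1])            # naive (biased) estimate of n_0, ~ 415.3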

Transformation: Remarks

The purpose of transformation is to meet the assumptions of the linear regression analysis.

  Linear models:               Nonlinear models:
  (L1) Y = β_0 + β_1 X         (N1) Y = α e^{βX}   →  ln(Y) = ln(α) + βX
  (L2) Y = β_0 + β_1 X²        (N2) Y = α X^β      →  ln(Y) = ln(α) + β ln(X)
  (L3) Y = β_0 + β_1 e^X       (N3) Y = α + e^{βX}

In (L2) and (L3), the relationship between X and Y is not a straight line, but the model is linear in the parameters and hence the model is linear. In (N1) and (N2), the model is nonlinear but can be log-transformed to a linear model (i.e., linearized). In (N3), the model is nonlinear and cannot be linearized.

Transformation: Remarks

Transformation can be applied to X, or Y, or both. Common transformations are log_10(Z), ln(Z), √Z, and Z²; less common transformations include 1/Z, 1/Z², arcsin(√Z), and log_2(Z). Another use of transformation is to control unequal variance:
- If the Y are counts, then larger variances are often associated with larger counts; a √Y transformation can help stabilize the variance.
- If the Y are proportions (of successes among trials), then Var(Y) = π(1 − π)/n, which depends on the true success rate π. Residual plots would reveal the unequal-variance problem; an arcsin(√Y) transformation can help stabilize the variance.
Rules of thumb: positive data, use log Y; proportions, use arcsin(√Y); counts, use √Y.

Transformation: Remarks

Ideally, theory should dictate which transformation to use, as in the bacteria count example; in practice, the transformation is usually chosen empirically. Transforming Y can affect both linearity and variance homogeneity, while transforming X can affect only linearity. Sometimes solving one problem creates another: for example, transforming Y to stabilize the variance may cause a curved relationship. Usually it is best to start with a simple transformation and experiment; it often happens that a simple transformation allows the use of the linear regression model. When needed, use more complicated methods such as nonlinear regression. Transformations are useful not only for simple linear regression, but also for multiple linear regression and designed experiments.

Variance Stabilizing Transformations

If Var(Y) = g(E(Y)), then a variance-stabilizing transform is

  h(y) ∝ ∫ (g(z))^{−1/2} dz.

Examples:
- If var ∝ mean, then g(z) = z and h(y) = √y.
- If var ∝ mean², then g(z) = z² and h(y) = ln(y).
- If var ∝ mean(1 − mean), then g(z) = z(1 − z) and h(y) = sin^{−1}(√y).

Box-Cox Transformation

Consider a transformation ladder for Z = X or Y:

  λ:  −2     −1    −0.5    0       0.5   1   2
  Z:  1/Z²   1/Z   1/√Z    log(Z)  √Z    Z   Z²

Moving up or down the ladder (starting at 1) changes the residual plots in a consistent manner; use only these choices for a manual search, too. The Box-Cox method is a formal approach to selecting λ for transforming Y. The idea is to consider

  Y_i^λ = β_0 + β_1 X_i + ε_i

and estimate λ (along with β_0, β_1, σ²) using maximum likelihood. The Box-Cox method may give, say, λ̂ = 0.512; round to the nearest interpretable value, λ̂ = 0.5. If λ̂ ≈ 1, do not transform. In R, boxcox gives a 95% CI for λ; choose an interpretable λ̂ within the CI. (A sketch follows.)
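A sketch of the Box-Cox selection in R. boxcox() lives in the MASS package (shipped with R); the data frame and variable names are hypothetical, and the response must be positive:

  library(MASS)                       # for boxcox()
  fit <- lm(y ~ x, data = dat)        # dat, y, x: hypothetical; y > 0 required
  bc <- boxcox(fit)                   # plots the profile log-likelihood with a 95% CI
  bc$x[which.max(bc$y)]               # ML estimate of lambda; round to an interpretable value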

Box-Cox Transformation

The Box-Cox family of transformations is

  Z = Y^λ if λ ≠ 0;  Z = ln(Y) if λ = 0.

Estimate λ using maximum likelihood, or use the variance-mean relationship. Using the variance-mean relationship to estimate g(Y) works for 2 groups or many groups (ANOVA, presented in the near future):
- Compute Ȳ and S_Y for each group.
- Regress log(S_Y) on log(Ȳ) and estimate the slope β.
- Use the transformation Y^λ with λ = 1 − β.
(See the sketch below.)
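A sketch of the variance-mean regression in base R, assuming (hypothetically) a data frame dat with a response y and a grouping variable grp:

  ybar <- tapply(dat$y, dat$grp, mean)            # group means Ybar_j
  s_y  <- tapply(dat$y, dat$grp, sd)              # group standard deviations S_Y,j
  beta_hat <- coef(lm(log(s_y) ~ log(ybar)))[2]   # slope of log(S_Y) on log(Ybar)
  lambda <- 1 - beta_hat                          # suggested exponent for Y^lambda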

Box-Cox Transformation: When Does This Work?

Suppose a model for the variability is

  σ = √Var(Y) = k µ^β,  i.e.,  Var(Y) = σ² = (k µ^β)² =: g(µ).

Use the delta method to obtain the transformation Z = h(Y) = Y^λ. Consider the Taylor expansion

  Z = h(Y) ≈ h(µ) + (Y − µ) h′(µ);

then an approximation for the variance of h(Y) is

  Var(h(Y)) ≈ [h′(µ)]² Var(Y),

which is the famous delta method.

Box-Cox Transformation

For Z = h(Y) = Y^λ we have

  dZ/dY = h′(Y) = λ Y^{λ−1}.

From the delta method,

  Var(Z) ≈ (λ µ^{λ−1})² (k µ^β)² = k² λ² µ^{2(λ − 1 + β)}.

When λ = 1 − β, Var(Z) ≈ k² λ² is approximately constant.

Analyze the transformed data: e.g., Z_11 = ln(Y_11), Z_12 = ln(Y_12), ..., Z_{2,n_2} = ln(Y_{2,n_2}). Usually round λ to a reasonable value if β is not an integer, as discussed above.

Caution: some researchers estimate the slope from the regression of ln(Var(Y)) on ln(Ȳ) and then use the transform Z = Y^λ with λ = 1 − β/2.