Chapter 3. Diagnostics and Remedial Measures

So far, we took data $(X_i, Y_i)$ and assumed

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \quad i = 1, 2, \ldots, n,$$

where $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$; $\beta_0$, $\beta_1$, and $\sigma^2$ are unknown parameters; and the $X_i$'s are fixed constants.

Question: What are the possible mistakes or violations of these assumptions?
1. The regression function is not linear ($E(Y) \neq \beta_0 + \beta_1 X$).
2. The error terms do not have constant variance.
3. The error terms are not independent.

We will use residual plots to diagnose the problems.

Residuals: $e_i = Y_i - \hat{Y}_i = Y_i - (b_0 + b_1 X_i)$

Sample mean: $\bar{e} = \frac{1}{n} \sum_i e_i = 0$

Sample variance: $\frac{1}{n-1} \sum_i (e_i - \bar{e})^2 = \frac{1}{n-1} \sum_i e_i^2 \approx MSE$

We will sometimes use standardized (semistudentized) residuals, $e_i^* = e_i / \sqrt{MSE}$.
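As a minimal sketch (not from the notes), here is how these quantities can be computed in Python; the simulated `x` and `y` stand in for real data.

```python
# Minimal sketch: residuals and semistudentized residuals for an SLR fit.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 25)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=x.size)   # simulated data

# Least-squares estimates b0, b1
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

e = y - (b0 + b1 * x)                 # residuals e_i = Y_i - (b0 + b1 X_i)
mse = np.sum(e ** 2) / (len(e) - 2)   # MSE = SSE / (n - 2)

print(abs(e.mean()) < 1e-10)          # sample mean of residuals is 0
e_star = e / np.sqrt(mse)             # semistudentized residuals
```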
Nonlinearity of Regression Function (1.)

Diagnose with a residual plot against the predictor variable, $X$, or a residual plot against the fitted values, $\hat{Y}$. Look for systematic tendencies!

Example: [Figure: plant growth vs. water/week, and residuals vs. water/week; the residuals run negative, then positive, then negative again across the range of water/week, a systematic pattern.]
Nonconstancy of Error Variance (2.)

We diagnose nonconstant error variance by observing a residual plot against $X$ and looking for structure.

Example: [Figure: entertainment spending vs. salary, and residuals vs. salary; the spread of the residuals around 0 changes with salary.]
Modified Levene Test

1. Divide the residuals into two groups. For this example, low- and high-salary groups, because the variance is suspected to depend on salary.
2. Calculate $d_{i1} = |e_{i1} - \tilde{e}_1|$ and $d_{i2} = |e_{i2} - \tilde{e}_2|$, where $e_{ij}$ is the $i$th residual in group $j$ and $\tilde{e}_j$ is the median of the residuals in group $j$.
3. Conduct a two-sample $t$-test on the $d_{ij}$ (see the sketch below).
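A minimal sketch of these three steps, assuming the residuals have already been split into the two salary groups (the function name and grouping are mine, purely illustrative):

```python
# Minimal sketch of the modified Levene (Brown-Forsythe) test; e_low and
# e_high are assumed to hold the residuals of the two salary groups.
import numpy as np
from scipy import stats

def modified_levene(e_low, e_high):
    # Step 2: absolute deviations from each group's median residual
    d1 = np.abs(e_low - np.median(e_low))
    d2 = np.abs(e_high - np.median(e_high))
    # Step 3: two-sample t-test on the deviations
    return stats.ttest_ind(d1, d2)

# Hypothetical usage, splitting residuals e at the median salary:
# t, p = modified_levene(e[salary <= np.median(salary)],
#                        e[salary > np.median(salary)])
```

`scipy.stats.levene(e_low, e_high, center='median')` is a built-in alternative based on the same median-centered deviations.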
Nonindependence of Error Terms (3.)

We diagnose nonindependence of errors over time or in some sequence by observing a residual plot against time (or the sequence) and looking for a trend (see textbook, p. 101, for typical plots).

Example: [Figure: #parts produced vs. #hours worked, and residuals vs. #hours.]
But if the data are collected over time, say

day 1: $(X_1, Y_1)$
day 2: $(X_2, Y_2)$
$\vdots$
day $n$: $(X_n, Y_n)$

then we can see the effect of learning. [Figure: residuals vs. #hours show no pattern, but residuals vs. day show a trend.]
Model fits all but a few observations (4.)

Example: LS estimates with 2 outlying points (solid) and without them (dashed). [Figure: scatterplot of $y$ vs. $x$ with the two fitted lines, and a plot of $e_i/\sqrt{MSE}$ vs. $x$ with reference lines at $-3$, $0$, and $+3$.]

Rule of Thumb: Outliers are detected by observing a plot of $e_i$ vs. $X_i$; semistudentized residuals beyond the $\pm 3$ reference lines stand out.
Errors not normally distributed (5.)

We assumed $\epsilon_1, \ldots, \epsilon_n \overset{iid}{\sim} N(0, \sigma^2)$, but we can't observe these error terms! We will be convinced that this assumption is reasonable if $e_1, \ldots, e_n$ appear to be iid $N(0, MSE)$.

Fact: If $e_1, \ldots, e_n \overset{iid}{\sim} N(0, MSE)$, then one can show that the expected value of the $i$th smallest residual is approximately

$$\sqrt{MSE}\; z\!\left[\frac{i - 3/8}{n + 1/4}\right], \quad i = 1, 2, \ldots, n,$$

where $z(\cdot)$ is the standard normal quantile (inverse CDF). Then we have the pairs

residual              expected residual
$e_{\min}$            $\sqrt{MSE}\; z\!\left(\frac{1 - 0.375}{n + 0.25}\right)$
$e_{\text{2nd smallest}}$   $\sqrt{MSE}\; z\!\left(\frac{2 - 0.375}{n + 0.25}\right)$
$\vdots$              $\vdots$
$e_{\max}$            $\sqrt{MSE}\; z\!\left(\frac{n - 0.375}{n + 0.25}\right)$
Notice: If $Y_1, \ldots, Y_4 \overset{iid}{\sim} N(0, \sigma^2)$, then $E(Y_1) = \cdots = E(Y_4) = 0$ and $E(\bar{Y}) = 0$, but

$E(Y_{\min}) = \sigma\, z\!\left(\frac{1 - 0.375}{4 + 0.25}\right) = \sigma\, z(0.147) = -1.05\,\sigma$
$E(Y_{\text{2nd}}) = \sigma\, z\!\left(\frac{2 - 0.375}{4 + 0.25}\right) = \sigma\, z(0.382) = -0.30\,\sigma$
$E(Y_{\text{3rd}}) = \sigma\, z\!\left(\frac{3 - 0.375}{4 + 0.25}\right) = \sigma\, z(0.618) = +0.30\,\sigma$
$E(Y_{\max}) = \sigma\, z\!\left(\frac{4 - 0.375}{4 + 0.25}\right) = \sigma\, z(0.853) = +1.05\,\sigma$
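A minimal sketch verifying these four values with the $(i - 3/8)/(n + 1/4)$ plotting positions (here with $\sigma = 1$):

```python
# Minimal sketch: expected order statistics for n = 4 standard normals,
# using the (i - 3/8)/(n + 1/4) approximation from above.
import numpy as np
from scipy.stats import norm

n = 4
i = np.arange(1, n + 1)
expected = norm.ppf((i - 0.375) / (n + 0.25))   # z(.) = normal quantile
print(expected.round(2))                        # [-1.05 -0.3   0.3   1.05]
```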
Points on a straight line: errors are normal (left). Points on a curve: errors are not normal (right). [Figure: two normal probability plots of semistudentized residuals vs. expected residuals; the left panel is roughly linear, the right panel is curved.]
Omission of important predictors (6.)

Example: $X_i$ = #years of education, $Y_i$ = salary. [Figure: salary vs. #years of education, and semistudentized residuals vs. #years in job; the residuals show structure in #years in job.] This means that a better model would include #years in job as an additional predictor (a multiple regression model).
Lack of Fit Test

Suppose we want to test whether the relationship between $X$ and $Y$ is linear versus the possibility that it is NOT linear. Test:

$H_0$: $E(Y) = \beta_0 + \beta_1 X$ versus $H_a$: not $H_0$

Here, $H_0$ includes the cases when either or both of $\beta_0$ and $\beta_1$ are zero. We can't use this test unless multiple $Y$'s are observed at at least one value of $X$.
[Figure: scatterplot with replicated $Y$'s at levels $X_1, X_2, X_3, X_4$, marking an individual observation $Y_{2j}$, the group mean $\bar{Y}_2$, the fitted value $\hat{Y}_2$, and the fitted line $\hat{E}(Y) = b_0 + b_1 X$.]

Can we use this test when $X$ = day and $Y$ = stock price? Can we use this test when $X$ = weight and $Y$ = height, and those are measured with a super accurate measure?
New Notation: $Y$ values are observed at $c$ different levels of $X$, say $X_1, X_2, \ldots, X_c$. At level $X_j$ we observe $n_j$ such $Y$ values, say $Y_{1j}, Y_{2j}, \ldots, Y_{n_j j}$, for $j = 1, 2, \ldots, c$, with $n_j \geq 1$. Let $\bar{Y}_j = \frac{1}{n_j} \sum_i Y_{ij}$ be the average of the $Y$'s at $X_j$ and $\hat{Y}_j = b_0 + b_1 X_j$ the fitted mean under the SLR. The data now look like

at $X_1$: $(X_1, Y_{11}), (X_1, Y_{21}), \ldots, (X_1, Y_{n_1 1})$, with mean $\bar{Y}_1$
at $X_2$: $(X_2, Y_{12}), (X_2, Y_{22}), \ldots, (X_2, Y_{n_2 2})$, with mean $\bar{Y}_2$
$\vdots$
at $X_c$: $(X_c, Y_{1c}), (X_c, Y_{2c}), \ldots, (X_c, Y_{n_c c})$, with mean $\bar{Y}_c$
The less restrictive model puts no structure on the means at each level of $X$ (the full model).

Full model: $Y_{ij} = \mu_j + \epsilon_{ij}$, where $\hat{\mu}_j = \bar{Y}_j$
Reduced model: $Y_{ij} = \beta_0 + \beta_1 X_j + \epsilon_{ij}$

F-test!!!!
Note that $Y_{ij} - \hat{Y}_j = (Y_{ij} - \bar{Y}_j) + (\bar{Y}_j - \hat{Y}_j)$. Let's partition the SSE into 2 pieces,

$$SSE = SSPE + SSLF,$$

where

$$\sum_{j=1}^{c} \sum_{i=1}^{n_j} (Y_{ij} - \hat{Y}_j)^2 = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2 + \sum_{j=1}^{c} \sum_{i=1}^{n_j} (\bar{Y}_j - \hat{Y}_j)^2.$$

If $SSPE \approx SSE$, the means ($\bar{Y}_j$) are close to the fitted values ($\hat{Y}_j$). That is, even if we fit the less restrictive model, we can't reduce the amount of unexplained variability. If $SSLF \approx SSE$, the means ($\bar{Y}_j$) are far away from the fitted values ($\hat{Y}_j$), and the (linear) restriction seems unreasonable.
Formal Test of $H_0$: $E(Y) = \beta_0 + \beta_1 X$ versus $H_A$: $E(Y) \neq \beta_0 + \beta_1 X$.

Let $MSLF = \frac{SSLF}{c-2}$ and $MSPE = \frac{SSPE}{n-c}$.

Test Statistic:

$$F = \frac{\dfrac{SSE(R) - SSE(F)}{df_R - df_F}}{\dfrac{SSE(F)}{df_F}} = \frac{\dfrac{SSE - SSPE}{(n-2) - (n-c)}}{\dfrac{SSPE}{n-c}} = \frac{SSLF/(c-2)}{SSPE/(n-c)} = \frac{MSLF}{MSPE} \sim F_{c-2,\, n-c} \text{ under } H_0$$

Rejection Rule: reject $H_0$ at level $\alpha$ if $F > F(1-\alpha;\, c-2,\, n-c)$. A large $F$, that is, $SSLF = SSE - SSPE$ large relative to $SSPE$, says the linear model is bad.
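A minimal sketch of the whole computation, assuming `x` and `y` are NumPy arrays with at least one replicated $X$ level and `b0`, `b1` the fitted SLR coefficients (the function name is mine, not from the notes):

```python
# Minimal sketch of the lack-of-fit F test; requires c > 2 distinct levels
# of x and at least one replicated level so that n - c > 0.
import numpy as np
from scipy.stats import f as f_dist

def lack_of_fit_test(x, y, b0, b1):
    levels = np.unique(x)
    c, n = len(levels), len(y)
    # SSPE: squared deviations of Y from its group mean at each X level
    sspe = sum(np.sum((y[x == xj] - y[x == xj].mean()) ** 2) for xj in levels)
    sse = np.sum((y - (b0 + b1 * x)) ** 2)
    sslf = sse - sspe                        # SSLF = SSE - SSPE
    F = (sslf / (c - 2)) / (sspe / (n - c))  # MSLF / MSPE
    p_value = f_dist.sf(F, c - 2, n - c)
    return F, p_value
```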
The decomposition fits nicely into our ANOVA table:

Source of Variation    SS       df     MS
Regression             SSR      1      MSR
Error                  SSE      n-2    MSE
  Lack of Fit          SSLF     c-2    MSLF
  Pure Error           SSPE     n-c    MSPE
Total                  SSTO     n-1

Example: Suppose that house prices follow a SLR in #bedrooms. The estimated regression function is

$$\hat{E}(\text{price}/1{,}000) = 37.2 + 43.0\,(\#\text{bedrooms})$$

Source of Variation    SS         df    MS
Regression             62,578     1     62,578
Error                  117,028    91    1,286
  Lack of Fit          4,296      3     1,432
  Pure Error           112,728    88    1,281
Total                  179,606    92

(The lack-of-fit and pure-error rows follow, up to rounding, from the $MSLF$ and $MSPE$ values used below; with $n - 2 = 91$ and $n - c = 88$, there are $c = 5$ levels of #bedrooms and $n = 93$ houses.)
Because

$$F = \frac{MSLF}{MSPE} = \frac{1{,}432}{1{,}281} = 1.12 < F(0.95;\, 3,\, 88) = 2.71,$$

we do not reject $H_0$. [Figure: price (50 to 300, in $1,000s) vs. bedrooms (1 to 5), with replicated prices at each number of bedrooms.]
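As a quick numeric check of this arithmetic (a sketch using SciPy for the critical value):

```python
# Quick check of the house-price example above.
from scipy.stats import f

F = 1432 / 1281
print(round(F, 2))                    # 1.12
print(round(f.ppf(0.95, 3, 88), 2))   # 2.71, so F is below the cutoff
```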
Transformations

Motivation: Consider the function $y = x^2$:

 x    y        x^2    y
 0    0         0     0
 1    1         1     1
 2    4         4     4
 3    9         9     9
 4   16        16    16

[Figure: $y$ vs. $x$ is a parabola; $y$ vs. $x^2$ is a straight line.]

If you have $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ and you know $y = f(x)$, then $(f(x_1), y_1), (f(x_2), y_2), \ldots, (f(x_n), y_n)$ will lie on a straight line, as the sketch below illustrates.
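A minimal sketch of this linearization using the table above: regressing $y$ on the transformed predictor $x^2$ gives a perfect straight line.

```python
# Minimal sketch: y = x^2 becomes exactly linear in the transformed
# predictor t = x^2 (intercept 0, slope 1).
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = x ** 2
t = x ** 2                              # transformed predictor f(x)

b1 = np.sum((t - t.mean()) * (y - y.mean())) / np.sum((t - t.mean()) ** 2)
b0 = y.mean() - b1 * t.mean()
print(b0, b1)                           # 0.0 1.0 -- a perfect line in t
```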
What follows are two situations in which transformations may help.

Situation 1: nonlinear regression function with constant error variance (1.)

[Figure: $X$ vs. $Y$; the points follow a curve with constant spread.] Note that $E(Y)$ doesn't appear to be a linear function of $X$; that is, the points do not seem to lie on a line. The spread of the $Y$'s at each level of $X$ appears to be constant, however.

Typical remedy: transform $X$. We consider $\sqrt{X}$ vs. $Y$. [Figure: $\sqrt{X}$ vs. $Y$; the points now lie close to a line.] Do not transform $Y$, because this will disturb the spread of the $Y$'s at each level of $X$.
Situation 2: nonlinear regression function with nonconstant error variance (1. with 2.)

[Figure: $X$ vs. $Y$; the points follow a curve and fan out.] Note that $E(Y)$ isn't a linear function of $X$, and the variance of the $Y$'s at each level of $X$ is increasing with $X$.

Typical remedy: transform $Y$ (or maybe both $X$ and $Y$). We consider $X$ vs. $\sqrt{Y}$, and hope that both problems are fixed. [Figure: $X$ vs. $\sqrt{Y}$; the points now lie close to a line with roughly constant spread.]
Prototypes for Transforming Y

[Figure: three prototype scatterplots of $Y$ vs. $X$.] Try $\sqrt{Y}$, $\log_{10} Y$, or $1/Y$.

Prototypes for Transforming X

[Figure: three prototype scatterplots of $Y$ vs. $X$.] Use $\sqrt{X}$ or $\log_{10} X$ (left); $X^2$ or $\exp(X)$ (middle); $1/X$ or $\exp(-X)$ (right).
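A minimal sketch of trying the prototype transformations of $Y$ on simulated data and comparing straight-line fits by $R^2$ (the data and names are mine, purely illustrative; the simulated trend is exponential, so $\log_{10} Y$ should linearize best):

```python
# Minimal sketch: compare prototype transformations of Y by the R^2 of a
# straight-line fit; simulated exponential trend with multiplicative noise.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 40)
y = np.exp(0.3 * x) * rng.lognormal(0, 0.1, size=x.size)

def r2(u, v):
    return np.corrcoef(u, v)[0, 1] ** 2   # squared correlation

for name, f in [("Y", lambda v: v), ("sqrt(Y)", np.sqrt),
                ("log10(Y)", np.log10), ("1/Y", lambda v: 1.0 / v)]:
    print(name, round(r2(x, f(y)), 3))    # log10(Y) gives the highest R^2
```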