Topic 19: Remedies
Outline
- Review regression diagnostics
- Remedial measures
- Weighted regression
- Ridge regression
- Robust regression
- Bootstrapping
Regression Diagnostics Summary
- Check normality of the residuals with a normal quantile plot or histogram
- Plot the residuals versus predicted values, versus each of the Xs, and (when appropriate) versus time/space
- Examine the partial regression plots
- Use a graphics smoother to see if there appears to be a curvilinear pattern
Regression Diagnostics Summary
- Examine the studentized deleted residuals (RSTUDENT in the output)
- Examine the hat matrix diagonals
- Examine DFFITS, Cook's D, and the DFBETAS
- Check observations that are extreme on these measures relative to the other observations
Regression Diagnostics Summary
- Examine the tolerance for each X
- If there are variables with low tolerance, you need to do some model building:
  - Recode variables
  - Variable selection
Remedial measures
- Weighted least squares
- Ridge regression
- Robust regression
- Nonparametric regression
- Bootstrapping
Maximum Likelihood
$Y_i \sim N(\beta_0 + \beta_1 X_i,\ \sigma^2)$
$f_i = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(Y_i - \beta_0 - \beta_1 X_i)^2}$
$L = f_1 f_2 \cdots f_n$ (likelihood function)
Find $\beta_0$ and $\beta_1$ which maximize $L$.
Maximum Likelihood
What if the $Y_i$ have different (but known) variances?
$Y_i \sim N(\beta_0 + \beta_1 X_i,\ \sigma_i^2)$
$f_i = \frac{1}{\sqrt{2\pi\sigma_i^2}}\, e^{-\frac{1}{2\sigma_i^2}(Y_i - \beta_0 - \beta_1 X_i)^2}$
$L = f_1 f_2 \cdots f_n$ (likelihood function)
Find $\beta_0$ and $\beta_1$ which maximize $L$.
Weighted regression
Maximization of $L$ with respect to the $\beta$'s is equivalent to minimization of
$\sum_i \frac{1}{\sigma_i^2}\left(Y_i - \beta_0 - \beta_1 X_{i,1} - \cdots - \beta_{p-1} X_{i,p-1}\right)^2$
The weight of each case is $w_i = 1/\sigma_i^2$.
Weighted least squares
- The least squares problem is to minimize the sum of w_i times the squared residual for case i
- Computations are easy: use the weight statement in proc reg
- b_w = (X'WX)^(-1)(X'WY), where W is a diagonal matrix of the weights
- The problem in practice now becomes determining the weights
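The matrix form of the weighted least squares estimator can be sketched directly. This is a minimal illustration in Python/NumPy, not part of the SAS workflow; the data are simulated with made-up coefficients and known standard deviations:

```python
import numpy as np

# Hypothetical data: the spread of y grows with x (nonconstant variance)
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 50)
sigma = 0.5 * x                               # known standard deviations
y = 2.0 + 3.0 * x + rng.normal(scale=sigma)   # true beta0=2, beta1=3

X = np.column_stack([np.ones_like(x), x])     # design matrix with intercept
W = np.diag(1.0 / sigma**2)                   # weights w_i = 1 / sigma_i^2

# b_w = (X'WX)^(-1) (X'WY)
bw = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(bw)   # estimates of (beta0, beta1)
```

The weight statement in proc reg performs this same computation; here W is formed explicitly only to mirror the formula.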
Determination of weights
- Find a relationship between the absolute residual and another variable and use this as a model for the standard deviation
- Similar approach: use the squared residual to model the variance
- Or use grouped data (or approximately grouped data) to estimate the variance for all cases in the group
Determination of weights
- With a model for the standard deviation or the variance, we can approximate the optimal weights
- Optimal weights are proportional to the inverse of the variance
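The two-step procedure (fit OLS, model the absolute residuals to estimate the standard deviation, then weight by the inverse of the squared fitted standard deviation) can be sketched as follows. This is an illustrative Python/NumPy version with simulated data; the variable names and simulation setup are mine, not from KNNL:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(20, 60, 54)                      # ages, echoing the example setup
y = 56 + 0.58 * x + rng.normal(scale=0.2 * x)    # spread increases with x

X = np.column_stack([np.ones_like(x), x])

# Step 1: ordinary least squares
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b_ols

# Step 2: regress |residual| on x to model the standard deviation
c = np.linalg.lstsq(X, np.abs(resid), rcond=None)[0]
shat = X @ c                  # predicted standard deviations
wt = 1.0 / shat**2            # optimal weights proportional to 1/variance

# Step 3: weighted least squares with the estimated weights
W = np.diag(wt)
b_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(b_ols, b_wls)
```

This mirrors the SAS steps that follow: proc reg for the OLS fit, a second proc reg of the absolute residuals on age to get shat, and a final weighted proc reg.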
KNNL Example
- KNNL p 427
- Y is diastolic blood pressure
- X is age
- n = 54 healthy adult women aged 20 to 60 years
Get the data and check it

data a1;
  infile '../data/ch11ta01.txt';
  input age diast;
proc print data=a1;
run;
Plot the relationship

symbol1 v=circle i=sm70;
proc gplot data=a1;
  plot diast*age / frame;
run;
Diastolic bp vs age: strong linear relationship, no skewness, but nonconstant variance
Run the regression

proc reg data=a1;
  model diast=age;
  output out=a2 r=resid;
run;
Regression output

Analysis of Variance
Source            DF  Sum of Squares  Mean Square  F Value  Pr > F
Model              1      2374.96833   2374.96833    35.79  <.0001
Error             52      3450.36501     66.35317
Corrected Total   53      5825.33333

Root MSE         8.14575   R-Square  0.4077
Dependent Mean  79.11111   Adj R-Sq  0.3963
Coeff Var       10.29659
Regression output

Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept   1            56.15693         3.99367    14.06    <.0001
age         1             0.58003         0.09695     5.98    <.0001

With nonconstant variance, the OLS estimators are still unbiased but no longer have minimum variance, and prediction interval coverage is often lower or higher than 95%.
Use the output data set a2 to get the absolute and squared residuals

data a2;
  set a2;
  absr=abs(resid);
  sqrr=resid*resid;
run;
Generate plots with a smooth

proc gplot data=a2;
  plot (resid absr sqrr)*age;
run;
[Figure: absolute value of the residuals (absr) vs age]
Squared residuals vs age
Model the std dev vs age (using the absolute value of the residual)

proc reg data=a2;
  model absr=age;
  output out=a3 p=shat;
run;

Note that a3 contains the predicted standard deviations (shat)
Compute the weights

data a3;
  set a3;
  wt=1/(shat*shat);
run;
Regression with weights

proc reg data=a3;
  model diast=age / clb;
  weight wt;
run;
Output

Analysis of Variance
Source            DF  Sum of Squares  Mean Square  F Value  Pr > F
Model              1        83.34082     83.34082    56.64  <.0001
Error             52        76.51351      1.47141
Corrected Total   53       159.85432

Root MSE         1.21302   R-Square  0.5214
Dependent Mean  73.55134   Adj R-Sq  0.5122
Coeff Var        1.64921
Output

Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  95% Confidence Limits
Intercept   1            55.56577         2.52092    22.04    <.0001   50.5072   60.6244
age         1             0.59634         0.07924     7.53    <.0001   0.43734   0.75534

Note the reduction in the standard error of the age coefficient (0.07924 vs 0.09695 unweighted).
Ridge regression
- If (X'X) is difficult to invert (nearly singular), approximate by inverting (X'X + kI)
- The estimators of the coefficients are now biased but more stable
- For some value of k, the ridge regression estimator has a smaller mean squared error than the ordinary least squares estimator
- Can be used to reduce the number of predictors
- ridge=k is an option for the model statement
- Cross-validation / ridge plots are used to determine k
Ridge Regression
Can express the ridge constraint in terms of finding b to minimize
$(Y - Z\beta)'(Y - Z\beta) + \lambda \sum_j \beta_j^2$
where $Z$ is the standardized $X$.
Note: the LASSO is a variation of this approach in which you minimize
$(Y - Z\beta)'(Y - Z\beta)$ subject to $\sum_j |\beta_j| \le t$
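The ridge estimator b_R = (Z'Z + kI)^(-1) Z'Y on standardized predictors can be sketched directly. This is a minimal Python/NumPy illustration with simulated near-collinear data; the simulation setup and the value k = 5 are mine, chosen only for demonstration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)    # x2 nearly collinear with x1
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

# Standardize the predictors and center y, so no intercept is needed
Z = np.column_stack([x1, x2])
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
yc = y - y.mean()

def ridge(k):
    """Ridge estimator: invert Z'Z + kI instead of the near-singular Z'Z."""
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + k * np.eye(p), Z.T @ yc)

b_ols = ridge(0.0)   # k = 0 gives ordinary least squares
b_r = ridge(5.0)     # biased but more stable
print(b_ols, b_r)
```

With near-singular Z'Z the OLS coefficients are unstable; the ridge estimate is shrunk toward zero, which is the stabilizing effect the slide describes.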
Ridge Regression in SAS

proc reg data=a1 outest=b;
  model fat=skinfold thigh midarm / ridge=0 to .1 by .001;
run;
Robust regression
- Basic idea is to have a procedure that is not sensitive to outliers
- Alternatives to least squares: minimize the sum of the absolute values of the residuals, or the median of the squared residuals
- Or do weighted regression with weights based on the residuals, and iterate (iteratively reweighted least squares)
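The "weight by residuals and iterate" idea can be sketched with Huber-type weights. This is an illustrative Python/NumPy version; the data, the MAD scale estimate, and the tuning constant 1.345 are my choices for the sketch, not prescribed by the slides:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 30)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=30)
y[5] += 25.0                                 # one gross outlier

X = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]     # start from the OLS fit

for _ in range(20):                          # iterate weighted fits
    r = y - X @ b
    s = np.median(np.abs(r)) / 0.6745        # robust scale estimate (MAD)
    u = np.abs(r) / s
    w = np.where(u <= 1.345, 1.0, 1.345 / u) # Huber weights: downweight large residuals
    W = np.diag(w)
    b = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

print(b)   # stays close to (1, 2) despite the outlier
```

Each pass refits weighted least squares with weights computed from the previous residuals, so the outlier's influence shrinks as the iterations proceed.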
Nonparametric regression
- Several versions; we have used i=sm70
- Interesting theory
- All versions have some smoothing or penalty parameter, similar to the 70 in i=sm70
Regression trees
- Standard approach in the area of data mining, replacing multiple regression
- Basically partition the X space into rectangles
- Repeatedly split the data into two nodes based on a single predictor
- The predicted value is the mean of the responses in each rectangle
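The splitting step can be sketched for a single predictor: try every split point and keep the one that minimizes the within-node sum of squares. This is a toy Python version of one split on made-up data; real tree software applies this search recursively and across predictors:

```python
import numpy as np

def best_split(x, y):
    """Find the split x <= c minimizing the total within-node SSE."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (None, np.inf)
    for i in range(1, len(xs)):
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if sse < best[1]:
            best = ((xs[i-1] + xs[i]) / 2, sse)   # midpoint split value
    return best

# Hypothetical data with an obvious jump between x = 4 and x = 10
x = np.array([1., 2., 3., 4., 10., 11., 12., 13.])
y = np.array([5., 6., 5., 6., 20., 21., 20., 21.])
c, sse = best_split(x, y)
print(c)   # -> 7.0; the predictions are the node means (5.5 and 20.5)
```

The predicted value in each resulting rectangle is simply the mean response of the cases falling in it, as the slide states.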
Bootstrap
- Very important theoretical development that has had a major impact on applied statistics
- Uses resampling to approximate the sampling distribution
- Sample with replacement from the data (or from the residuals), repeatedly refit the model, and use the distribution of the quantity of interest
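Bootstrapping a regression slope by resampling cases can be sketched as follows. This is an illustrative Python/NumPy version; the simulated data and the choice of B = 1000 resamples are mine:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # true slope is 2
X = np.column_stack([np.ones(n), x])

B = 1000
slopes = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)          # sample cases with replacement
    Xb, yb = X[idx], y[idx]
    slopes[b] = np.linalg.lstsq(Xb, yb, rcond=None)[0][1]

# Percentile bootstrap 95% confidence interval for the slope
lo, hi = np.percentile(slopes, [2.5, 97.5])
print(lo, hi)
```

The empirical distribution of the B refitted slopes approximates the sampling distribution of the slope estimator; the percentile interval is one common way to summarize it.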
Background Reading
- We used the program topic19.sas
- This completes Chapter 11