Multicollinearity Exercise

Use the attached SAS output to answer the questions. [OPTIONAL: Copy the SAS program below into the SAS editor window and run it.] You do not need to submit any output, so there is no need to print anything.

1. Identify which variables are key participants in the most serious near-linear dependency in the data. Hint: Look at the Variance Decomposition Proportions associated with the smallest eigenvalue of X'X.
2. Which variable has the wrong sign for its coefficient in this regression? Explain why its sign is wrong.
3. What is the smallest value of the ridge constant (k) that fixes the sign of the coefficient you named in #2?
4. What is the smallest value of the ridge constant (k) that reduces all VIFs below the guideline of 10?
5. What is the smallest value of k that seems (in your opinion) to stabilize the coefficients?
6. If one principal component is removed, give the estimated coefficients for X1, X2, X3, X4. Does this fix the one with the wrong sign?
*******************************************************************
************ LAW SCHOOL ADMISSION DATA ****************************
************ PARTLY FROM PAGE 599 OF SMITH ************************
*******************************************************************;
**** DATA FOR 20 STUDENTS ******
     Y  IS THE LAW SCHOOL GPA
     X1 IS THE UNDERGRADUATE SCHOOL GPA
     X2 IS THE LMAT PERCENTILE
     X3 IS A RATING OF THE UNDERGRADUATE SCHOOL QUALITY
     X4 IS THE GRE SCORE;
DATA LAW;
  INPUT Y X1 X2 X3 X4 NO $;
  CARDS;
3.42 3.28 .96  6 1330 1
3.60 3.18 .97  7 1370 2
3.28 2.89 .93  5   40 3
3.75 3.72 .99  8 1520 4
3.36 3.18 .95  6 1270 5
3.96 3.50 .98  8 1450 6
3.31 3.04 .94  5 1200 7
3.33 3.87 .95  5 1340 8
3.60 3.54 .96  7 1350 9
4.00 3.27 .99 10 1480 a
3.28 3.30 .95  5 1280 b
3.44 3.29 .91  7 1080 c
3.25 3.17 .93  5   70 d
3.75 3.62 .97  8   40 e
3.30 3.34 .96  5 1330 f
3.20 3.08 .90  4   00 g
3.50 3.37 .96  6 1340 h
3.28 3.16 .94  5 1220 i
3.17 3.20 .95  4 1270 j
3.31 3.10 .94  5   20 k
;
TITLE 'LAW SCHOOL ADMISSIONS DATA';
PROC CORR;
  VAR Y X1 X2 X3 X4;
PROC REG;
  MODEL Y=X1 X2 X3 X4 / COLLIN VIF;
PROC REG RIDGE=0 TO .01 BY .001 OUTEST=B;
  MODEL Y=X1 X2 X3 X4;
PROC PRINT;
PROC PLOT;
  PLOT (X1 X2 X3 X4)*_RIDGE_ / VREF=0 VPOS=25 HPOS=45;
PROC REG DATA=LAW RIDGE=0 TO .01 BY .001 OUTEST=C OUTVIF;
  MODEL Y=X1 X2 X3 X4;
PROC PRINT;
PROC REG DATA=LAW PCOMIT=1 2 3 OUTEST=C;
  MODEL Y=X1 X2 X3 X4;
PROC PRINT;
RUN;
The CORR Procedure

5 Variables: Y X1 X2 X3 X4

Simple Statistics
Variable   N      Mean   Std Dev         Sum   Minimum    Maximum
Y         20   3.45450   0.24522    69.09000   3.17000    4.00000
X1        20   3.30500   0.24150    66.10000   2.89000    3.87000
X2        20   0.95150   0.02346    19.03000   0.90000    0.99000
X3        20   6.05000   1.57196   121.00000   4.00000   10.00000
X4        20      1289     3.598       25770        00       1520

Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho=0

            Y         X1         X2         X3         X4
Y     1.00000     0.4733    0.76094    0.95925    0.76574
                  0.0350     <.0001     <.0001     <.0001
X1     0.4733    1.00000      0.529    0.42078    0.65377
       0.0350                0.0164     0.0647     0.0018
X2    0.76094      0.529    1.00000    0.69724     0.9878
       <.0001     0.0164                0.0006     <.0001
X3    0.95925    0.42078    0.69724    1.00000    0.69983
       <.0001     0.0647     0.0006                0.0006
X4    0.76574    0.65377     0.9878    0.69983    1.00000
       <.0001     0.0018     <.0001     0.0006
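The VIF column that PROC REG prints later can be cross-checked against a predictor correlation matrix like this one: with the Y row and column removed, each predictor's VIF equals the matching diagonal element of the inverse of the X1, X2, X3, X4 correlation matrix, which is the same quantity as 1/(1 - R_j^2) from regressing X_j on the other predictors. A minimal standard-library Python sketch of that identity follows; the function names are hypothetical, and the example correlations are made up rather than taken from the table above.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Build the augmented matrix [A | I].
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-15:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs_from_correlation(corr):
    """VIF of each predictor: the diagonal of the inverse predictor correlation matrix."""
    inv = invert(corr)
    return [inv[j][j] for j in range(len(corr))]


if __name__ == "__main__":
    # Hypothetical example: two predictors correlated at 0.9,
    # so each VIF should be 1 / (1 - 0.9**2), about 5.26.
    print(vifs_from_correlation([[1.0, 0.9], [0.9, 1.0]]))
```

Passing the 4 x 4 block of predictor correlations from a CORR listing into `vifs_from_correlation` should give values close to the Variance Inflation column, up to the rounding of the printed correlations.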
The REG Procedure
Model: MODEL1
Dependent Variable: Y

Number of Observations Read   20
Number of Observations Used   20

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              4          1.07143       0.26786     56.54   <.0001
Error             15          0.07106       0.00474
Corrected Total   19          1.14249

Root MSE          0.06883   R-Square   0.9378
Dependent Mean    3.45450   Adj R-Sq   0.9212
Coeff Var         1.99243

Parameter Estimates
                Parameter     Standard                         Variance
Variable   DF    Estimate        Error   t Value   Pr > |t|    Inflation
Intercept   1    -2.37864     24.38266     -0.10     0.9236            0
X1          1     0.12572      0.64288      0.20     0.8476     96.67364
X2          1     6.05826      32.4872      0.19      0.853   2280.94358
X3          1     0.12977      0.01417      9.16     <.0001       1.9905
X4          1 -0.00087848      0.00646     -0.14     0.8937   2880.98384

Collinearity Diagnostics
                                         ---------------- Proportion of Variation ----------------
Number   Eigenvalue   Condition Index    Intercept    X1           X2            X3        X4
1        4.95347      1.00000            .6469E-8     0.0000022    2.027655E-8   0.0020    1.375895E-7
2        0.04096      0.9976             0.000005     0.00006334   4.377599E-7   0.56236   6.249609E-8
3        0.00348      37.70759           0.00003532   0.00216      0.0000063     0.31407   0.00041449
4        0.00209      48.668             3.79043E-7   0.01504      0.00000729    0.11173   0.00052977
5        1.475688E-7  5793.782           0.99996      0.98274      0.99999       0.0064    0.99906
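The condition indices and variance decomposition proportions in the Collinearity Diagnostics table can be reproduced in outline from an eigen-decomposition. The sketch below assumes Belsley-style scaling (every column of X, including the intercept column of 1s, scaled to unit length without centering), which is how the COLLIN diagnostics are usually described; that assumption, the cyclic Jacobi eigen-solver, and all function names are illustrative choices, not PROC REG's implementation. The condition index for eigenvalue k is sqrt(largest eigenvalue / eigenvalue k), and the proportion for coefficient j on eigenvalue k is (v_jk^2 / lambda_k) divided by the sum of v_jm^2 / lambda_m over all m.

```python
import math


def jacobi_eigen(a, tol=1e-24, max_sweeps=100):
    """Eigenvalues and eigenvectors of a small symmetric matrix (cyclic Jacobi).

    Returns (values, vecs); vecs[i][k] is component i of the eigenvector
    that belongs to values[k].
    """
    n = len(a)
    a = [[float(x) for x in row] for row in a]
    vecs = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation that zeroes a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = vecs[k][p], vecs[k][q]
                    vecs[k][p], vecs[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], vecs


def collinearity_diagnostics(columns):
    """Eigenvalues, condition indices and variance decomposition proportions.

    columns: the columns of X, including a column of 1s for the intercept.
    props[j][k] is the share of var(b_j) tied to eigenvalue k; the SAS
    listing shows the transpose (one row per eigenvalue).
    """
    scaled = []
    for col in columns:  # unit length, no centering
        norm = math.sqrt(sum(v * v for v in col))
        scaled.append([v / norm for v in col])
    p = len(scaled)
    xtx = [[sum(u * w for u, w in zip(scaled[i], scaled[j])) for j in range(p)]
           for i in range(p)]
    values, vecs = jacobi_eigen(xtx)
    order = sorted(range(p), key=lambda k: values[k], reverse=True)
    values = [values[k] for k in order]
    vecs = [[row[k] for k in order] for row in vecs]
    cond_index = [math.sqrt(values[0] / lam) for lam in values]
    props = []
    for j in range(p):
        phi = [vecs[j][k] ** 2 / values[k] for k in range(p)]
        total = sum(phi)
        props.append([f / total for f in phi])
    return values, cond_index, props


if __name__ == "__main__":
    # Hypothetical data: intercept column plus two nearly collinear predictors.
    ones = [1.0] * 5
    x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
    x2 = [1.01, 2.0, 2.99, 4.01, 5.0]
    vals, ci, props = collinearity_diagnostics([ones, x1, x2])
    for k, (lam, eta) in enumerate(zip(vals, ci)):
        print(k + 1, f"{lam:.6g}", f"{eta:.2f}", [f"{row[k]:.5f}" for row in props])
```

In the toy data, the last row (smallest eigenvalue, largest condition index) carries nearly all of the variance of the two near-duplicate predictors, which is the same reading the hint in question 1 asks for.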
Obs  _MODEL_  _TYPE_  _DEPVAR_  _RIDGE_  _RMSE_    Intercept  X1       X2       X3       X4
  1  MODEL1   PARMS   Y         .        0.068829   -2.37864  0.12572  6.05826  0.12977  -.00087848
  2  MODEL1   RIDGE   Y         0.000    0.068829   -2.37864  0.12572  6.05826  0.12977  -.00087848
  3  MODEL1   RIDGE   Y         0.001    0.068871    0.89844  0.03989  1.73602  0.12932  -.000007758
  4  MODEL1   RIDGE   Y         0.002    0.068880    1.17880  0.03249  1.36524  0.12907  .000068635
  5  MODEL1   RIDGE   Y         0.003    0.068886    1.28047  0.02977  1.23008  0.12883  .000097650
  6  MODEL1   RIDGE   Y         0.004    0.068892    1.33140  0.02838  1.16182  0.12859  .000113203
  7  MODEL1   RIDGE   Y         0.005    0.068899    1.36094  0.02755  1.12178  0.12836  .000123073
  8  MODEL1   RIDGE   Y         0.006    0.068907    1.37947  0.02700  1.09627  0.12813  .000130011
  9  MODEL1   RIDGE   Y         0.007    0.068915    1.39160  0.02663  1.07920  0.12790  .000135238
 10  MODEL1   RIDGE   Y         0.008    0.068925    1.39968  0.02637  1.06748  0.12767  .000139379
 11  MODEL1   RIDGE   Y         0.009    0.068935    1.40504  0.02617  1.05935  0.12744  .000142787
 12  MODEL1   RIDGE   Y         0.010    0.068947    1.40850  0.02603  1.05373  0.12721  .000145676
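The RIDGE rows trace how each estimate moves as k grows. A common textbook definition, assumed here rather than taken from the PROC REG documentation, adds k to the diagonal of the predictors' correlation matrix R, solves (R + kI) b* = r_xy for standardized coefficients, rescales each b*_j by s_y / s_j, and recomputes the intercept from the means. A minimal standard-library Python sketch under that assumption (all names are hypothetical):

```python
import math


def _mean(v):
    return sum(v) / len(v)


def _sd(v):
    m = _mean(v)
    return math.sqrt(sum((t - m) ** 2 for t in v) / (len(v) - 1))


def _solve(a, b):
    """Solve a x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [[float(x) for x in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ridge(xs, y, k):
    """Ridge estimates [intercept, b1, ..., bp] on the original scale.

    xs: list of predictor columns; y: response values; k: ridge constant
    added to the diagonal of the predictor correlation matrix.
    """
    n, p = len(y), len(xs)
    # Correlation-form scaling: each scaled column has mean 0 and sum of squares 1.
    zs = [[(t - _mean(c)) / (_sd(c) * math.sqrt(n - 1)) for t in c] for c in xs]
    zy = [(t - _mean(y)) / (_sd(y) * math.sqrt(n - 1)) for t in y]
    r_xx = [[sum(u * w for u, w in zip(zs[i], zs[j])) + (k if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    r_xy = [sum(u * w for u, w in zip(zs[i], zy)) for i in range(p)]
    b_std = _solve(r_xx, r_xy)
    # Undo the scaling: b_j = b*_j * s_y / s_j; intercept from the means.
    slopes = [b_std[j] * _sd(y) / _sd(xs[j]) for j in range(p)]
    intercept = _mean(y) - sum(b * _mean(c) for b, c in zip(slopes, xs))
    return [intercept] + slopes
```

At k = 0 this reduces to ordinary least squares, which is why the PARMS row and the k = 0.000 RIDGE row in the table above carry the same estimates.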
Ridge trace plots (PROC PLOT, PLOT (X1 X2 X3 X4)*_RIDGE_ / VREF=0): one plot each for X1, X2, X3 and X4, with the coefficient on the vertical axis and the ridge regression control value (0.000 to 0.010) on the horizontal axis. Legend: A = 1 obs, B = 2 obs, etc. NOTE: 1 obs had missing values.
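Question 3 asks for the smallest k that fixes a sign. Given a ridge trace as (k, coefficient) pairs, such as one column of the RIDGE rows, one way to make "fixes" precise is: the smallest grid value from which the coefficient keeps the wanted sign at every larger k in the grid. The helper below is a hypothetical sketch of that rule, and the trace it is demonstrated on is invented.

```python
def smallest_k_fixing_sign(trace, want_positive=True):
    """Smallest k from which the coefficient keeps the wanted sign.

    trace: iterable of (k, coefficient) pairs from a ridge trace.
    Returns None if the coefficient has the wrong sign at the largest k.
    """
    sign = 1.0 if want_positive else -1.0
    answer = None
    # Walk from the largest k downward and stop at the first wrong sign.
    for k, coef in sorted(trace, reverse=True):
        if sign * coef <= 0:
            break
        answer = k
    return answer


if __name__ == "__main__":
    # Invented trace, not the LAW data.
    demo = [(0.000, -0.5), (0.001, -0.01), (0.002, 0.03), (0.003, 0.05)]
    print(smallest_k_fixing_sign(demo))
```

Requiring the sign to hold for all larger k, rather than taking the first k where it happens to be right, guards against a trace that wobbles back across zero.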
LAW SCHOOL ADMISSIONS DATA

Obs  _MODEL_  _TYPE_    _DEPVAR_  _RIDGE_  _PCOMIT_  _RMSE_    Intercept  X1       X2       X3       X4
  1  MODEL1   PARMS     Y         .        .         0.068829   -2.37864  0.1257   6.06     0.12977  0.00
  2  MODEL1   RIDGEVIF  Y         0.000    .         .          .         96.6736  2280.94  1.9905   2880.98
  3  MODEL1   RIDGE     Y         0.000    .         0.068829   -2.37864  0.1257   6.06     0.12977  0.00
  4  MODEL1   RIDGEVIF  Y         0.001    .         .          .         3.9053   59.05    1.95543  74.08
  5  MODEL1   RIDGE     Y         0.001    .         0.068871    0.89844  0.0399   1.74     0.12932  0.00
  6  MODEL1   RIDGEVIF  Y         0.002    .         .          .         2.86     17.99    1.94548  22.2
  7  MODEL1   RIDGE     Y         0.002    .         0.068880    1.17880  0.0325   1.37     0.12907  0.00
  8  MODEL1   RIDGEVIF  Y         0.003    .         .          .         1.802    8.89     1.93597  10.72
  9  MODEL1   RIDGE     Y         0.003    .         0.068886    1.28047  0.0298   1.23     0.12883  0.00
 10  MODEL1   RIDGEVIF  Y         0.004    .         .          .         1.6537   5.48     1.92660  6.4
 11  MODEL1   RIDGE     Y         0.004    .         0.068892    1.33140  0.0284   1.16     0.12859  0.00
 12  MODEL1   RIDGEVIF  Y         0.005    .         .          .         1.5803   3.84     1.91732  4.34
 13  MODEL1   RIDGE     Y         0.005    .         0.068899    1.36094  0.0275   1.12     0.12836  0.00
 14  MODEL1   RIDGEVIF  Y         0.006    .         .          .         1.5373   2.93     1.90811  3.19
 15  MODEL1   RIDGE     Y         0.006    .         0.068907    1.37947  0.0270   1.10     0.12813  0.00
 16  MODEL1   RIDGEVIF  Y         0.007    .         .          .         1.5090   2.37     1.89898  2.48
 17  MODEL1   RIDGE     Y         0.007    .         0.068915    1.39160  0.0266   1.08     0.12790  0.00
 18  MODEL1   RIDGEVIF  Y         0.008    .         .          .         1.4887   2.00     1.88992  2.02
 19  MODEL1   RIDGE     Y         0.008    .         0.068925    1.39968  0.0264   1.07     0.12767  0.00
 20  MODEL1   RIDGEVIF  Y         0.009    .         .          .         1.4731   1.74     1.88093  1.70
 21  MODEL1   RIDGE     Y         0.009    .         0.068935    1.40504  0.0262   1.06     0.12744  0.00
 22  MODEL1   RIDGEVIF  Y         0.010    .         .          .         1.4606   1.55     1.8720   1.47
 23  MODEL1   RIDGE     Y         0.010    .         0.068947    1.40850  0.0260   1.05     0.12721  0.00
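The RIDGEVIF rows are variance inflation factors for the ridge estimates. A standard formula from the ridge literature, assumed here to correspond to what OUTVIF reports on the correlation scale, takes the diagonal of (R + kI)^-1 R (R + kI)^-1, which reduces to the ordinary VIFs at k = 0. The sketch scans a grid of k values for the first one at which every VIF falls below a limit such as the guideline of 10 in question 4; the names and the example matrix are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-15:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Product of two small dense matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def ridge_vifs(corr, k):
    """Diagonal of (R + kI)^-1 R (R + kI)^-1 for a predictor correlation matrix R."""
    p = len(corr)
    shifted = [[corr[i][j] + (k if i == j else 0.0) for j in range(p)] for i in range(p)]
    m = invert(shifted)
    prod = matmul(matmul(m, corr), m)
    return [prod[j][j] for j in range(p)]


def smallest_k_all_vifs_below(corr, ks, limit=10.0):
    """First k in the grid at which every ridge VIF is below `limit` (None if none)."""
    for k in sorted(ks):
        if max(ridge_vifs(corr, k)) < limit:
            return k
    return None


if __name__ == "__main__":
    # Hypothetical pair of predictors correlated at 0.99 (VIF about 50 at k = 0).
    corr = [[1.0, 0.99], [0.99, 1.0]]
    for k in (0.0, 0.01, 0.02):
        print(k, ridge_vifs(corr, k))
    print(smallest_k_all_vifs_below(corr, [0.0, 0.01, 0.02, 0.03]))
```

Because ridge VIFs usually fall steadily as k grows, returning the first grid value below the limit is normally enough; a stricter rule would also check that they stay below it at larger k.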
Obs  _MODEL_  _TYPE_  _DEPVAR_  _PCOMIT_  _RMSE_   Intercept  X1      X2      X3      X4
  1  MODEL1   PARMS   Y         .         0.06883  -2.3786    0.1257  6.0583  0.1298  -.0008785
  2  MODEL1   IPC     Y         1         0.06670  1.5276     0.0235  0.9076  0.1295  0.0001569
  3  MODEL1   IPC     Y         2         0.0775   0.6633     0.375   3.6579  0.0658  0.0005384
  4  MODEL1   IPC     Y         3         0.3095   0.7603     0.2087  2.782   0.0357  0.000538
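PCOMIT=m asks PROC REG for incomplete principal component (IPC) estimates that drop the m components with the smallest eigenvalues. On the standardized scale, each kept component k contributes (v_k . r_xy / lambda_k) v_k to the coefficient vector, so dropping a component with a tiny lambda_k removes the term most inflated by collinearity. The sketch below assumes the eigenpairs of the predictor correlation matrix are already available (for example from a symmetric eigen-solver); it illustrates the technique rather than reproducing PROC REG's code, and it leaves out the back-transformation to original units (multiply each coefficient by s_y / s_j). The names and the two-predictor example are hypothetical.

```python
def ipc_coefficients(eigvals, eigvecs, r_xy, omit):
    """Standardized IPC estimates after dropping the `omit` smallest components.

    eigvals[k] is an eigenvalue of the predictor correlation matrix and
    eigvecs[k] its unit-length eigenvector (a list); r_xy holds the
    correlation of each predictor with the response.
    """
    if not 0 <= omit < len(eigvals):
        raise ValueError("omit must leave at least one component")
    # Rank components from largest to smallest eigenvalue and keep the top ones.
    order = sorted(range(len(eigvals)), key=lambda k: eigvals[k], reverse=True)
    kept = order[: len(order) - omit]
    coef = [0.0] * len(r_xy)
    for k in kept:
        v = eigvecs[k]
        # Projection of r_xy on v_k, divided by lambda_k.
        weight = sum(vi * ci for vi, ci in zip(v, r_xy)) / eigvals[k]
        coef = [b + weight * vi for b, vi in zip(coef, v)]
    return coef


if __name__ == "__main__":
    # Hypothetical predictors correlated at 0.8: eigenvalues 1.8 and 0.2.
    s = 2 ** -0.5
    vals, vecs = [1.8, 0.2], [[s, s], [s, -s]]
    r_xy = [0.5, 0.3]
    print("all components:", ipc_coefficients(vals, vecs, r_xy, omit=0))
    print("drop smallest:", ipc_coefficients(vals, vecs, r_xy, omit=1))
```

In this invented example, keeping both components gives standardized coefficients of about 0.722 and -0.278, so the second predictor gets a negative coefficient even though its correlation with the response is positive. Dropping the lambda = 0.2 component leaves both at about 0.222.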