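Chap 10: Diagnostics, p384

Multicollinearity 10.5 p406

Definition: Multicollinearity exists when two or more independent variables used in a regression are moderately or highly correlated.

- When multicollinearity exists, regression results can be confusing and misleading. For example, in a multiple regression model all partial slopes may be nonsignificant even though the global F-test is significant. Signs of the regression coefficients might not make sense.
- Variance inflation factors

Tolerance: $T_i = 1 - R_i^2$

$VIF_i = \frac{1}{T_i} = \frac{1}{1 - R_i^2}$

$VIF_i = \left[ r_{XX}^{-1} \right]_{ii}$

Here $R_i^2$ is the R-square from regressing the ith independent variable on the other independent variables, and $r_{XX}$ is the correlation matrix of the independent variables.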
/*Multicollinearity*/
options ls=75;
data cigar;
  infile 'cigar.txt' firstobs=2;   /* skip the header row */
  input Row co tar nicotine weight;
proc reg;
  /* full model, with tolerance and variance inflation factors */
  model co = tar nicotine weight / tol vif;
  /* each independent variable regressed on the other two */
  model tar = nicotine weight;
  model nicotine = tar weight;
  model weight = tar nicotine;
proc corr;
  var tar nicotine weight;
run;

Data file cigar.txt:

Row    co   tar  nicotine  weight
  1  13.6  14.1      0.86  0.9853
  2  16.6  16.0      1.06  1.0938
  3  23.5  29.8      2.03  1.1650
  4  10.2   8.0      0.67  0.9280
  5   5.4   4.1      0.40  0.9462
  6  15.0  15.0      1.04  0.8885
  7   9.0   8.8      0.76  1.0267
  8  12.3  12.4      0.95  0.9225
  9  16.3  16.6      1.12  0.9372
 10  15.4  14.9      1.02  0.8858
 11  13.0  13.7      1.01  0.9643
 12  14.4  15.1      0.90  0.9316
 13  10.0   7.8      0.57  0.9705
 14  10.2  11.4      0.78  1.1240
 15   9.5   9.0      0.74  0.8517
 16   1.5   1.0      0.13  0.7851
 17  18.5  17.0      1.26  0.9186
 18  12.6  12.8      1.08  1.0395
 19  17.5  15.8      0.96  0.9573
 20   4.9   4.5      0.42  0.9106
 21  15.9  14.5      1.01  1.0070
 22   8.5   7.3      0.61  0.9806
 23  10.6   8.6      0.69  0.9693
 24  13.9  15.2      1.02  0.9496
 25  14.9  12.0      0.82  1.1184
Example (Cigarette)

The REG Procedure
Model: MODEL1
Dependent Variable: co

Analysis of Variance
                          Sum of        Mean
Source             DF    Squares      Square   F Value   Pr > F
Model               3  495.25781   165.08594     78.98   <.0001
Error              21   43.89259     2.09012
Corrected Total    24  539.15040

Parameter Estimates
                 Parameter   Standard
Variable    DF    Estimate      Error   t Value   Pr > |t|   Tolerance
Intercept    1     3.20219    3.46175      0.93     0.3655           .
tar          1     0.96257    0.24224      3.97     0.0007     0.04623
nicotine     1    -2.63166    3.90056     -0.67     0.5072     0.04566
weight       1    -0.13048    3.88534     -0.03     0.9735     0.74970

Parameter Estimates
                  Variance
Variable    DF   Inflation
Intercept    1           0
tar          1    21.63071
nicotine     1    21.89992
weight       1     1.33386
Model: MODEL2
Dependent Variable: tar

Analysis of Variance
                          Sum of        Mean
Source             DF    Squares      Square   F Value   Pr > F
Model               2  734.81601   367.40801    226.94   <.0001
Error              22   35.61759     1.61898
Corrected Total    24  770.43360

Root MSE           1.27239    R-Square   0.9538
Dependent Mean    12.21600    Adj R-Sq   0.9496

The REG Procedure
Model: MODEL3
Dependent Variable: nicotine

Analysis of Variance
                          Sum of        Mean
Source             DF    Squares      Square   F Value   Pr > F
Model               2    2.87120     1.43560    229.90   <.0001
Error              22    0.13738     0.00624
Corrected Total    24    3.00858

Root MSE           0.07902    R-Square   0.9543
Dependent Mean     0.87640    Adj R-Sq   0.9502

Model: MODEL4
Dependent Variable: weight

Analysis of Variance
                          Sum of        Mean
Source             DF    Squares      Square   F Value   Pr > F
Model               2    0.04622     0.02311      3.67   0.0421
Error              22    0.13846     0.00629
Corrected Total    24    0.18468

Root MSE           0.07933    R-Square   0.2503
Dependent Mean     0.97028    Adj R-Sq   0.1821
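The R-Square values of MODEL2 to MODEL4 can be turned into tolerances and variance inflation factors by hand. The following is a minimal sketch in Python (an assumption of this example; the notes themselves use SAS), with the R-Square values copied from the output above and the helper names `tolerance` and `vif` chosen only for illustration.

```python
# R-Square values copied from the MODEL2-MODEL4 output above, where each
# independent variable is regressed on the other two.
r_square = {"tar": 0.9538, "nicotine": 0.9543, "weight": 0.2503}


def tolerance(r2):
    """Tolerance T = 1 - R^2."""
    return 1.0 - r2


def vif(r2):
    """Variance inflation factor VIF = 1 / T = 1 / (1 - R^2)."""
    return 1.0 / tolerance(r2)


vifs = {name: vif(r2) for name, r2 in r_square.items()}

for name, r2 in r_square.items():
    print(f"{name:9s} T = {tolerance(r2):.4f}  VIF = {vifs[name]:.2f}")
```

Because the printed R-Square values are rounded to four decimals, the results should agree with the Tolerance and Variance Inflation columns of MODEL1 only approximately.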
Pearson Correlation Coefficients, N = 25
Prob > |r| under H0: Rho=0

               tar   nicotine     weight
tar        1.00000    0.97661    0.49077
                       <.0001     0.0127
nicotine   0.97661    1.00000    0.50018
            <.0001                0.0109
weight     0.49077    0.50018    1.00000
            0.0127     0.0109

$r_{XX} = \begin{pmatrix} 1.00000 & 0.97661 & 0.49077 \\ 0.97661 & 1.00000 & 0.50018 \\ 0.49077 & 0.50018 & 1.00000 \end{pmatrix}$

$r_{XX}^{-1} = \begin{pmatrix} 21.6307 & -21.0918 & -0.06586 \\ -21.0918 & 21.8999 & -0.60285 \\ -0.0659 & -0.6028 & 1.33386 \end{pmatrix}$
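The link between the inverse correlation matrix and the variance inflation factors can be checked numerically. The sketch below inverts the 3x3 correlation matrix with the cofactor formula and reads the VIFs off the diagonal. It is written in plain Python for illustration (an assumption; the notes use SAS), and `inverse_3x3` is a hypothetical helper, not a library function.

```python
# Pearson correlations among tar, nicotine and weight, copied from the
# PROC CORR output above.
r_xx = [
    [1.00000, 0.97661, 0.49077],
    [0.97661, 1.00000, 0.50018],
    [0.49077, 0.50018, 1.00000],
]


def inverse_3x3(m):
    """Invert a 3x3 matrix using the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]


r_inv = inverse_3x3(r_xx)

# The diagonal of the inverse holds the VIFs for tar, nicotine, weight.
vif_diag = [r_inv[k][k] for k in range(3)]
print([round(v, 2) for v in vif_diag])
```

The correlations are rounded to five decimals and the matrix is close to singular, so the diagonal should match the reported VIFs only to about two decimal places.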
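Outliers 10.2 p390

Deleted residuals

$d_i = y_i - \hat{y}_{i(i)}$

$y_i$ = the ith observed response
$\hat{y}_{i(i)}$ = the predicted value of the ith response when the data for the ith observation is deleted from the analysis.

Studentized deleted residuals p396

$s^2_{d_i} = \frac{MSE_{(i)}}{1 - h_{ii}}$ and $t_i = \frac{d_i}{s_{d_i}}$

$t_i = e_i \left[ \frac{n - p - 1}{SSE\,(1 - h_{ii}) - e_i^2} \right]^{1/2}$

where $e_i = y_i - \hat{y}_i$ is the ordinary residual and $MSE_{(i)}$ is the error mean square with the ith observation deleted.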
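The two expressions for the studentized deleted residual can be connected in a few steps. The sketch below relies on two standard least-squares identities that these notes do not state and that are therefore assumptions here: $d_i = e_i/(1 - h_{ii})$, and $SSE = (n - p - 1)\,MSE_{(i)} + e_i^2/(1 - h_{ii})$, where $e_i$ is the ordinary residual.

```latex
\begin{aligned}
t_i &= \frac{d_i}{s_{d_i}}
     = \frac{e_i/(1-h_{ii})}{\sqrt{MSE_{(i)}/(1-h_{ii})}}
     = \frac{e_i}{\sqrt{MSE_{(i)}\,(1-h_{ii})}} \\[4pt]
MSE_{(i)} &= \frac{SSE - e_i^2/(1-h_{ii})}{n-p-1}
\quad\Longrightarrow\quad
MSE_{(i)}\,(1-h_{ii}) = \frac{SSE\,(1-h_{ii}) - e_i^2}{n-p-1} \\[4pt]
t_i &= e_i \left[ \frac{n-p-1}{SSE\,(1-h_{ii}) - e_i^2} \right]^{1/2}
\end{aligned}
```

The practical point is that every $t_i$ can be obtained from a single fit of the full model, without refitting the regression n times.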
The studentized deleted residual $t_i$ has a distribution that is approximated by a t-distribution with (n-1)-p d.f. The appropriate Bonferroni critical value therefore is $t(1 - \alpha/2n;\ n - 1 - p)$ (p396).

(n-1)-p = (24-1)-(4+1) = 18 and $t(1 - 0.025/24;\ 18) = 3.59213$

Example

Row  city  traffic  sales  city1  city2  city3
  1     1     59.3    6.3      1      0      0
  2     1     60.3    6.6      1      0      0
  3     1     82.1    7.6      1      0      0
  4     1     32.3    3.0      1      0      0
  5     1     98.0    9.5      1      0      0
  6     1     54.1    5.9      1      0      0
  7     1     54.4    6.1      1      0      0
  8     1     51.3    5.0      1      0      0
  9     1     36.7    3.6      1      0      0
 10     2     23.6    2.8      0      1      0
 11     2     57.6    6.7      0      1      0
 12     2     44.6    5.2      0      1      0
 13     3     75.8   82.0      0      0      1
 14     3     48.3    5.0      0      0      1
 15     3     41.4    3.9      0      0      1
 16     3     52.5    5.4      0      0      1
 17     3     41.0    4.1      0      0      1
 18     3     29.6    3.1      0      0      1
 19     3     49.5    5.4      0      0      1
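 20     4     73.1    8.4      0      0      0
 21     4     81.3    9.5      0      0      0
 22     4     72.4    8.7      0      0      0
 23     4     88.4   10.6      0      0      0
 24     4     23.2    3.3      0      0      0

/*outliers*/
options ls=75;
data influ;
  infile 'influ.txt' firstobs=2;   /* skip the header row */
  input Row city traffic sales city1 city2 city3;
proc reg;
  /* city1-city3 are indicator variables; city 4 is the baseline */
  model sales = city1 city2 city3 traffic / influence;
  /* save Cook's distance, leverage and studentized deleted residuals */
  output out=a cookd=cook h=h rstudent=tres;
proc print data=a;
  var TRes h cook;
run;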
The REG Procedure
Model: MODEL1
Dependent Variable: sales

Analysis of Variance
                          Sum of         Mean
Source             DF     Squares      Square   F Value   Pr > F
Model               4  1469.76287   367.44072      1.66   0.1996
Error              19  4194.22671   220.74877
Corrected Total    23  5663.98958

Root MSE          14.85762    R-Square   0.2595
Dependent Mean     9.07083    Adj R-Sq   0.1036
Coeff Var        163.79550

Model: MODEL1
Dependent Variable: sales

Output Statistics
                               Hat Diag      Cov
Obs   Residual    RStudent            H    Ratio     DFFITS
  1     0.1348    0.009366       0.1112   1.4742     0.0033
  2     0.0719    0.004998       0.1114   1.4747     0.0018
  3    -6.8387     -0.4984       0.1809   1.4939    -0.2342
  4     6.6324      0.4891       0.2003   1.5339     0.2447
  5   -10.7084     -0.8606       0.3081   1.5482    -0.5743
  6     1.6217      0.1129       0.1138   1.4735     0.0405
  7     1.7129      0.1192       0.1135   1.4724     0.0427
  8     1.7378      0.1213       0.1181   1.4799     0.0444
  9     5.6357      0.4079       0.1731   1.5134     0.1866
 10     4.5527      0.3791       0.3763   2.0190     0.2945
 11    -3.8850     -0.3202       0.3647   2.0048    -0.2426
 12    -0.6677     -0.0536       0.3342   1.9667    -0.0380
 13    56.4638    179.3101       0.2394   0.0000   100.6096
 14   -10.5571     -0.7589       0.1429   1.3061    -0.3098
 15    -9.1533     -0.6578       0.1489   1.3673    -0.2752
 16   -11.6812     -0.8439       0.1451   1.2625    -0.3477
 17    -8.8082     -0.6327       0.1497   1.3806    -0.2654
 18    -5.6714     -0.4141       0.1875   1.5381    -0.1990
 19   -10.5926     -0.7616       0.1430   1.3049    -0.3111
 20    -1.6668     -0.1224       0.2038   1.6389    -0.0619
 21    -3.5423     -0.2639       0.2237   1.6557    -0.1417
 22    -1.1128     -0.0817       0.2028   1.6408    -0.0412
 23    -5.0187     -0.3824       0.2548   1.6888    -0.2236
 24    11.3406      1.0336       0.4527   1.7946     0.9400
Obs      TRes        h      cook
  1     0.009  0.11115   0.00000
  2     0.005  0.11143   0.00000
  3    -0.498  0.18091   0.01143
  4     0.489  0.20027   0.01248
  5    -0.861  0.30814   0.06688
  6     0.113  0.11384   0.00035
  7     0.119  0.11350   0.00038
  8     0.121  0.11815   0.00042
  9     0.408  0.17305   0.00728
 10     0.379  0.37626   0.01816
 11    -0.320  0.36468   0.01236
 12    -0.054  0.33424   0.00030
 13   179.310  0.23944   1.19566
 14    -0.759  0.14286   0.01963
 15    -0.658  0.14894   0.01561
 16    -0.844  0.14511   0.02455
 17    -0.633  0.14966   0.01455
 18    -0.414  0.18752   0.00828
 19    -0.762  0.14304   0.01980
 20    -0.122  0.20375   0.00081
 21    -0.264  0.22369   0.00422
 22    -0.082  0.20285   0.00036
 23    -0.382  0.25483   0.01047
 24     1.034  0.45268   0.17608
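The studentized deleted residuals in the output can be reproduced from the ordinary residuals, the leverages and the error sum of squares of the full fit, using the closed form $t_i = e_i [(n-p-1)/(SSE(1-h_{ii}) - e_i^2)]^{1/2}$. The sketch below does this for three observations in Python (an assumption of this example; the notes use SAS). The function name `rstudent` is hypothetical and simply mirrors the SAS keyword.

```python
import math

# Values copied from the output above (fast-food sales example).
n, p = 24, 5          # observations; parameters (intercept + 4 predictors)
sse = 4194.22671      # Error sum of squares
t_crit = 3.59213      # Bonferroni critical value quoted in the notes

# obs: (residual e_i, leverage h_ii, RStudent reported by SAS)
rows = {
    5: (-10.7084, 0.30814, -0.8606),
    24: (11.3406, 0.45268, 1.0336),
    13: (56.4638, 0.23944, 179.3101),
}


def rstudent(e, h, sse, n, p):
    """t_i = e_i * sqrt((n - p - 1) / (SSE * (1 - h_ii) - e_i^2))."""
    return e * math.sqrt((n - p - 1) / (sse * (1.0 - h) - e * e))


computed = {obs: rstudent(e, h, sse, n, p) for obs, (e, h, _) in rows.items()}

# Flag observations whose |t_i| exceeds the Bonferroni critical value.
outliers = [obs for obs, t in computed.items() if abs(t) > t_crit]

for obs, (_, _, reported) in rows.items():
    print(f"obs {obs:2d}: computed {computed[obs]:10.4f}  reported {reported:10.4f}")
print("outliers:", outliers)
```

For observation 13 the denominator is a difference of two nearly equal numbers, so the value computed from rounded inputs should be expected to differ from the SAS value in the leading digits while still far exceeding the critical value.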
Leverage values 10.3 p398

- $h_{ii}$ = leverage of the ith observation.

The leverage values are the diagonal elements of the hat matrix

$H = X (X'X)^{-1} X'$

Observations with $h_{ii} > \frac{2p}{n}$ are considered by this rule to indicate outlying cases with regard to their X values.

Note: $\frac{p}{n}$ is the average leverage.
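Applied to the fast-food sales example (n = 24, p = 5), the 2p/n rule can be checked directly against the h column printed earlier. The following is a small Python sketch (an assumption; the notes use SAS) with the leverages copied from the PROC PRINT output.

```python
# Leverages h_ii for observations 1..24, copied from the PROC PRINT output.
h = [0.11115, 0.11143, 0.18091, 0.20027, 0.30814, 0.11384, 0.11350, 0.11815,
     0.17305, 0.37626, 0.36468, 0.33424, 0.23944, 0.14286, 0.14894, 0.14511,
     0.14966, 0.18752, 0.14304, 0.20375, 0.22369, 0.20285, 0.25483, 0.45268]

n = len(h)   # 24 observations
p = 5        # parameters: intercept + city1, city2, city3, traffic

mean_leverage = sum(h) / n   # equals p / n, because the h_ii sum to p
cutoff = 2 * p / n           # rule-of-thumb threshold 2p/n

# Observation numbers (1-based) whose leverage exceeds the cutoff.
high_leverage = [obs for obs, h_ii in enumerate(h, start=1) if h_ii > cutoff]

print(f"average leverage = {mean_leverage:.4f}, cutoff = {cutoff:.4f}")
print("high-leverage observations:", high_leverage)
```

Note that the rule looks only at the X values: observation 13, with its extreme sales value, is not flagged by leverage because its traffic and city values are unremarkable.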
Cook's distance, p402

- A measure of the overall influence of an observation on the estimated β coefficients.

Cook's distance:

$D_i = \frac{(y_i - \hat{y}_i)^2 \, h_{ii}}{p \, MSE \, (1 - h_{ii})^2}$

- Note that $D_i$ depends on both the residual $(y_i - \hat{y}_i)$ and the leverage $h_{ii}$.
- A large value of $D_i$ indicates that the ith observation has a strong influence on the estimated β coefficients.
- Values of $D_i$ can be compared to the values of the F(p, n-p) distribution. Usually an observation whose $D_i$ falls above the 50th percentile of the F distribution is considered to be an influential observation.

In the fast-food sales example n = 24 and p = 5, so numerator d.f. = 5 and denominator d.f. = 24-5 = 19, and $F_{0.50} = 0.9020$.
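The cook column printed earlier can be reproduced from the residuals, the leverages and the error mean square, and then compared with the F percentile above. The sketch below is in Python for illustration (an assumption; the notes use SAS), and `cooks_distance` is a hypothetical helper name.

```python
n, p = 24, 5
mse = 220.74877      # Error mean square from the REG output
f_median = 0.9020    # 50th percentile of F(5, 19) quoted in the notes

# obs: (residual e_i, leverage h_ii, Cook's D reported by SAS)
rows = {
    5: (-10.7084, 0.30814, 0.06688),
    13: (56.4638, 0.23944, 1.19566),
    24: (11.3406, 0.45268, 0.17608),
}


def cooks_distance(e, h, p, mse):
    """D_i = e_i^2 * h_ii / (p * MSE * (1 - h_ii)^2)."""
    return (e * e * h) / (p * mse * (1.0 - h) ** 2)


cook = {obs: cooks_distance(e, h, p, mse) for obs, (e, h, _) in rows.items()}

# Flag observations whose D_i exceeds the median of the F(p, n - p) distribution.
influential = [obs for obs, d in cook.items() if d > f_median]

for obs, (_, _, reported) in rows.items():
    print(f"obs {obs:2d}: computed {cook[obs]:.5f}  reported {reported:.5f}")
print("influential:", influential)
```

Observation 24 has the largest leverage but a modest residual, so its D stays well below the cutoff, whereas observation 13 combines a moderate leverage with a very large residual.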