DESMATICS, INC.
P. O. Box 618, State College, Pa. 16801    Phone: (814) 238-9621
Applied Research in Statistics - Mathematics - Operations Research

RIDGE REGRESSION FOR NONSTANDARDIZED MODELS

by

Terry L. King
Dennis E. Smith

This study was supported by the Office of Naval Research under Contract N00014-75-C-1054, Task No. NR 042-334. Reproduction in whole or in part is permitted for any purpose of the United States Government.

Approved for public release; distribution unlimited.
TABLE OF CONTENTS

                                              Page
I.   INTRODUCTION .........................      1
II.  STANDARDIZATION IN RIDGE ESTIMATION ..      5
III. CONCLUSIONS ..........................     17
IV.  REFERENCES ...........................     18
I. INTRODUCTION

Consider the usual regression model

    y = Xβ + ε                                                (1.1)

where y is an n × 1 vector of observations, β = (β₀, β₁, ..., β_p)' is a (p + 1) × 1 vector of unknown parameters, ε is an n × 1 vector of random errors, and X is a fixed n × (p + 1) matrix. It will be assumed that X is of full rank and that the first column of X is a column of ones, denoted by 1. It will also be assumed that ε has expectation 0, Var(ε) = σ²I, and ε follows a normal distribution.

The ordinary least squares (OLS) estimator of β is given by

    β̂ = (X'X)⁻¹X'y                                           (1.2)

The properties of β̂ are well known. Namely,

    (1) β̂ minimizes (y − Xβ)'(y − Xβ)
    (2) β̂ is the best linear unbiased estimator of β
    (3) β̂ is the maximum likelihood estimator of β
    (4) β̂ ~ N(β, σ²(X'X)⁻¹)
    (5) MSE(β̂) = E[(β̂ − β)'(β̂ − β)] = σ² Σᵢ 1/λᵢ, where λ₁, λ₂, ..., λ_(p+1) are the eigenvalues of X'X.

Upon examination of the last of these properties, it is clear that when the minimum eigenvalue is close to zero, the mean squared error (MSE) becomes unsatisfactorily large. This led Hoerl and
Kennard (1970a) to recommend the use of the ridge estimator

    β* = (X'X + kI)⁻¹X'y,    k > 0                            (1.3)

when X'X, in correlation form, is an ill-conditioned matrix. If k is treated as a constant, the following properties are known:

    (1) For k = 0, the ridge estimator is identical to the OLS estimator.
    (2) For k > 0, the ridge estimator is shorter than the OLS estimator.
    (3) β* ~ N(WX'Xβ, σ²WX'XW), where W = (X'X + kI)⁻¹.
    (4) MSE(β*) = σ² Σᵢ λᵢ/(λᵢ + k)² + k²β'(X'X + kI)⁻²β.
    (5) There always exists a k > 0 such that MSE(β*) < MSE(β̂).

Hoerl and Kennard (1970b) also suggested a graphical technique to determine the value of k. In this technique, one first constructs a plot of the components of β* versus k, called the ridge trace. Then, using this ridge trace, one chooses a value of k for which the estimates have stabilized. Analytic methods for choosing k have also been proposed; for example, see McDonald and Galarneau (1975) or Hoerl, Kennard, and Baldwin (1975). In any event, once the value of k has been determined in some manner, (1.3) gives the estimate of β to be reported.

Research papers appearing since the Hoerl and Kennard articles have not always used this estimator as their ridge estimator because the form of the regression model is varied. Marquardt and Snee (1975)
advocate using the model with only X standardized. However, they suggest choosing the value of k with y also standardized. Hemmerle (1975) standardizes the design matrix X, but not the observation vector y. McDonald and Galarneau (1975) standardize both X and y, but Guilkey and Murphy (1975) make no standardizations at all. Obenchain (1975) centers both X and y and warns against scaling or reparameterizing unless the results are ultimately to be reported in that form. Thisted (1976) gives a good discussion of the problems involved with the standardization of the variables.

Comparison of the results of one investigator with those of another may be hampered by the lack of uniformity in the form of the model. Most authors have recognized this problem, but have not investigated it further. In the case of the OLS solution, the same estimator is produced from any of the forms of the model. Thus, by solving a problem stated in any form, the same OLS estimator of β will be produced when measured in the same parameter space. This nice property does not hold for ridge estimators, however. In this paper, formulae are given for different forms of the model, and comparisons are made among them.

In order to examine the effect of standardization on the ridge estimator, first consider the model. If no transformations are made, the model is given in (1.1). If the X matrix and β are partitioned as X = [1 G] and β = (β₀, θ')', and the columns of G are centered about their means, the model can be written as
    E[y] = 1α + CGθ                                           (1.4)

where C is the symmetric idempotent centering matrix (I_n − n⁻¹11') and α = E(ȳ). If the vectors in CG are also scaled to have unit length, then the model can be written as

    E[y] = 1α + Hγ                                            (1.5)

where H = CGD^(−1/2), γ = D^(1/2)θ, and D is the diagonal matrix of scaling factors. In the case of the OLS estimator, the three forms of the model, (1.1), (1.4), and (1.5), are known to give equivalent estimators of β.
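The quantities defined so far can be illustrated numerically. The sketch below is ours, not from the report, and uses invented data: it computes the OLS estimator (1.2) and ridge estimator (1.3), checks ridge properties (1) and (2), and constructs the centering matrix C, scaling matrix D, and standardized matrix H of models (1.4) and (1.5).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # first column is 1
y = rng.normal(size=n)

# OLS estimator (1.2): beta_hat = (X'X)^(-1) X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

def ridge(k):
    """Ridge estimator (1.3): (X'X + kI)^(-1) X'y."""
    return np.linalg.solve(X.T @ X + k * np.eye(p + 1), X.T @ y)

# Property (1): k = 0 recovers OLS; property (2): k > 0 shortens the estimate.
assert np.allclose(ridge(0.0), beta_hat)
assert np.linalg.norm(ridge(0.5)) < np.linalg.norm(beta_hat)

# Centered and standardized forms (1.4) and (1.5)
G = X[:, 1:]
C = np.eye(n) - np.ones((n, n)) / n        # symmetric idempotent centering matrix
d = np.sum((C @ G) ** 2, axis=0)           # diagonal of D: squared column lengths of CG
H = C @ G @ np.diag(d ** -0.5)             # columns of H are centered, unit length

assert np.allclose(C @ C, C)                                # C is idempotent
assert np.allclose(H.T @ H, np.corrcoef(G, rowvar=False))   # H'H is the correlation matrix
```

A side effect worth noting: because the columns of H are centered and scaled to unit length, H'H is exactly the correlation matrix of the predictors, which is the "correlation form" in which Hoerl and Kennard posed the problem.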
II. STANDARDIZATION IN RIDGE ESTIMATION

For the model (1.1) the ridge estimator is given by

    β₁* = (X'X + k₁I)⁻¹X'y                                    (2.1)

In order to make the comparison easier, partition X and β as before. Then β₁* can be written as

    β₁* = (β₀₁*, θ₁*')'                                       (2.2)

where

    θ₁* = (G'(I − (n + k₁)⁻¹J)G + k₁I)⁻¹G'(I − (n + k₁)⁻¹J)y
    β₀₁* = (n/(n + k₁))(ȳ − ḡ'θ₁*)

J = 11', and ḡ is the vector of column means of G.

For the centered model (1.4), the ridge estimator is

    β₂* = (β₀₂*, θ₂*')'                                       (2.3)

where

    θ₂* = (G'CG + k₂I)⁻¹G'Cy
    β₀₂* = ȳ

In general, this is a distinct estimator from that given in (2.2). If θ₁* ≠ θ₂*, there is nothing to prove. If θ₁* = θ₂*, then
the intercept estimates satisfy

    β₀₁* = (n/(n + k₁))(ȳ − ḡ'θ₂*),

which differs from the intercept estimate ȳ − ḡ'θ₂* implied by (2.3) unless k₁ = 0; i.e., unless the estimator β₁* is the OLS estimator. Hence, these estimators are not the same.

For the third form of the model (1.5), which will be referred to as the standardized form of the model, the estimator of the original β is

    β₃* = (β₀₃*, θ₃*')'                                       (2.4)

where

    θ₃* = D^(−1/2)γ₃*
    γ₃* = (D^(−1/2)G'CGD^(−1/2) + k₃I)⁻¹D^(−1/2)G'Cy
    β₀₃* = ȳ

The estimator (2.4) can be seen to be different from that given in (2.2) in the same way as (2.3) was. The estimators for the centered model are equivalent to the estimators for the standardized model only when D is a scalar multiple of the identity; in this case, there do exist values of k₂ and k₃ that would make the estimators equivalent.

To this point in the development, the error structure has remained the same. However, if transformations are made on the observation vector, the error structure is also changed. It is not too difficult to show that if the design matrix is centered, the same ridge estimator is produced by using y in its original form or in its centered form, even if the equations to be solved do not take into account the error structure. It can also be shown that the centering and scaling of y has no effect on the estimator if X has also been centered and scaled.
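The distinctness of the three estimators of θ can be checked numerically. This is our own sketch with invented data, not the authors' code; the names follow the text, and θ₃* is formed by mapping γ₃* back with D^(−1/2), which is algebraically the same as (G'CG + k₃D)⁻¹G'Cy.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 3
G = rng.normal(size=(n, p))
y = rng.normal(size=n)

J = np.ones((n, n))                 # J = 11'
C = np.eye(n) - J / n               # centering matrix
d = np.sum((C @ G) ** 2, axis=0)    # diagonal of D

def theta1(k):
    """theta_1* from the raw model, as in (2.2)."""
    A = np.eye(n) - J / (n + k)
    return np.linalg.solve(G.T @ A @ G + k * np.eye(p), G.T @ A @ y)

def theta2(k):
    """theta_2* from the centered model, as in (2.3)."""
    return np.linalg.solve(G.T @ C @ G + k * np.eye(p), G.T @ C @ y)

def theta3(k):
    """theta_3* = D^(-1/2) gamma_3* from the standardized model, as in (2.4)."""
    H = C @ G / np.sqrt(d)
    gamma = np.linalg.solve(H.T @ H + k * np.eye(p), H.T @ y)
    return gamma / np.sqrt(d)

# At k = 0 all three reduce to the same OLS estimate of theta ...
assert np.allclose(theta1(0.0), theta2(0.0))
assert np.allclose(theta2(0.0), theta3(0.0))

# ... but for k > 0 they are pairwise distinct estimators, as the text argues
# (theta_2* and theta_3* would coincide only if D were a scalar multiple of I).
k = 0.5
assert not np.allclose(theta1(k), theta2(k))
assert not np.allclose(theta2(k), theta3(k))
```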
In order to evaluate the three classes of estimators, squared error (squared Euclidean distance) was used as a basis of comparison in this study. While it is possible to find explicit formulae for the MSE (i.e., the expected squared error) of each estimator, the distribution of the squared error also should be investigated. This led to a simulation study.

As the discussion above has emphasized, there are three parameter spaces in which such evaluation may be made. Comparisons can be made based on the parameterization of (1) the original model, (2) the centered model, or (3) the standardized model. In this study, comparisons were made in all three parameter spaces and also in the observation (y) space.

For this simulation study, a design discussed in Marquardt and Snee (1975) was used. In the design matrix there are three vectors orthogonal to 1. Two of these vectors are highly correlated (r = 0.989), and the other correlations are zero. This design matrix, which has eight observations and four parameters to be estimated, is given in Figure 1.

The eigenvalues and eigenvectors for the correlation matrix were calculated. The eigenvectors (1, 1, 0)' and (1, -1, 0)', corresponding to the maximum and minimum eigenvalues respectively, were multiplied by σ and 3σ, where σ = 0.8, the same standard deviation value used by Marquardt and Snee. The four resulting vectors were transformed from the standardized space back to the original space, and the constant term in the model was set equal to 0. Using each of these four vectors as the true β, 1000
[Figure 1: Design Matrix Used in the Simulation Study. An 8 × 4 matrix consisting of the column of ones and three predictor columns, with entries ±1 and ±0.8; the individual signs are not legible in the scanned original.]
replicates of y were generated from model (1.1) using σ = 0.8. Since σ and β are known, the value of k was chosen by a computer routine to minimize the MSE of each estimator, using formulae similar to those given in Hoerl and Kennard (1970a). Hence, in each case, the optimal¹ value of k was used, i.e., conditional on σ and β.

At this point, the MSE of each estimator in each space could be calculated theoretically, but doing so would not tell very much about the distribution of the squared error of the estimators. Therefore, the simulation study was conducted. In this simulation the true value of β is defined such that:

    (1) the constant term β₀ is equal to 0;
    (2) β, when transformed to the standardized space, is equal to av, where a denotes σ or 3σ, and v denotes (1, 1, 0)', an eigenvector corresponding to the maximum eigenvalue, or (1, -1, 0)', an eigenvector corresponding to the minimum eigenvalue.

The values of a and v are used in Figures 2 through 6 to indicate the true value of β used in the simulation.

Estimated MSE values for each of the estimators are presented in Figure 2. Figures 3 through 6 show how the estimators compare based on the simulations. Each entry in these figures is the number of times out of 1000 that the estimator at the top of that column

____________
¹ Subject to computing error not exceeding 1.0 × 10⁻⁵.
[Figure 2: Estimated MSE Values for Each Estimator in Each of the Three Parameter Spaces and the Observation Space. The numerical entries are not legible in the scanned original.]
                              β₁*          β₂*          β₃*
             Multiple       σ     3σ     σ     3σ     σ     3σ
    Vector

    β̂   (1,1,0)           981   965    979   954    979   954
        (1,-1,0)           787   573    771   573    771   573

    β₁* (1,1,0)                         178   150    201   195
        (1,-1,0)                         49   296     55   305

    β₂* (1,1,0)                                      748   786
        (1,-1,0)                                     606   597

Note 1: The true value of the parameter β, when transformed to the standardized space, is equal to av. (See text for full discussion.)

Note 2: Each entry in this figure is the number of times out of 1000 replicates that the estimator at the top of the column had smaller squared error than the estimator at the left of the row.

Figure 3: Comparison of Estimates of β from the Simulation.
                              θ₁*          θ₂*          θ₃*
             Multiple       σ     3σ     σ     3σ     σ     3σ
    Vector

    θ̂   (1,1,0)           969   953    979   954    979   954
        (1,-1,0)           771   573    771   573    771   573

    θ₁* (1,1,0)                         508   470    591   605
        (1,-1,0)                        459   575    477   619

    θ₂* (1,1,0)                                      748   786
        (1,-1,0)                                     609   597

Note 1: The true value of the parameter β, when transformed to the standardized space, is equal to av. (See text for full discussion.)

Note 2: Each entry in this figure is the number of times out of 1000 replicates that the estimator at the top of the column had smaller squared error than the estimator at the left of the row.

Figure 4: Comparison of Estimates of θ from the Simulation.
                              γ₁*          γ₂*          γ₃*
             Multiple       σ     3σ     σ     3σ     σ     3σ
    Vector

    γ̂   (1,1,0)           971   953    979   954    980   954
        (1,-1,0)           771   573    771   573    771   573

    γ₁* (1,1,0)                         495   467    580   610
        (1,-1,0)                        452   575    472   622

    γ₂* (1,1,0)                                      755   791
        (1,-1,0)                                     609   612

Note 1: The true value of the parameter β, when transformed to the standardized space, is equal to av. (See text for full discussion.)

Note 2: Each entry in this figure is the number of times out of 1000 replicates that the estimator at the top of the column had smaller squared error than the estimator at the left of the row.

Figure 5: Comparison of Estimates of γ from the Simulation.
                             Xβ₁*         Xβ₂*         Xβ₃*
             Multiple       σ     3σ     σ     3σ     σ     3σ
    Vector

    Xβ̂   (1,1,0)            0     0      0     0      0     0
         (1,-1,0)            0     0      0     0      0     0

    Xβ₁* (1,1,0)                       1000  1000   1000  1000
         (1,-1,0)                      1000  1000   1000   949

    Xβ₂* (1,1,0)                                       0     0
         (1,-1,0)                                    545     0

Note 1: The true value of the parameter β, when transformed to the standardized space, is equal to av. (See text for full discussion.)

Note 2: Each entry in this figure is the number of times out of 1000 replicates that the estimator at the top of the column had smaller squared error than the estimator at the left of the row.

Figure 6: Comparison of Estimates of Xβ from the Simulation.
had smaller squared error than the estimator given at the left of that row. Figure 3 compares the estimators in the original space, Figure 4 in the centered space, Figure 5 in the standardized space, and Figure 6 in the observation space.

In the simulation the OLS estimator performed as expected. When OLS was compared to the ridge estimators separately, each of the ridge estimators yielded estimates with squared error smaller than that of OLS in a minimum of 57% of the simulations in each of the three parameter spaces for all values of β considered. Since the OLS estimator is a special case of all three ridge estimators, and the ridge estimators were chosen with optimal properties, this result is not at all surprising. Also as expected in theory, the OLS estimates of Xβ produced the smallest squared error for every y generated.

For the original parameter β (Figure 3), the ridge estimates from model (1.1), when compared with transformed estimates from either of the other two models, (1.4) and (1.5), were closest to the true β in more than 69% of the simulations regardless of the magnitude or orientation of the parameter vector. Since these results were based on the use of optimal k values, it does not necessarily follow that similar results will hold for nonoptimal (stochastic) k. Nonetheless, when a ridge estimator is to be used to estimate β, it appears that the estimator β₁* derived from the original model may perform the best.

For the centered parameter θ and the standardized parameter γ, the results were not as striking. However, by examining Figures 4
and 5, it appears that in each of these cases it might be well to base the estimation process on β₃*, i.e., by using ridge estimation with the standardized model. An estimate of Xβ can be obtained by first estimating the parameter β and then premultiplying the estimate by X. As can be seen from Figure 6, the estimator based on the centered model appears to be the most likely candidate of the three ridge estimators considered when comparisons are made in the observation space. Of course, since the OLS estimator minimizes (y − Xβ)'(y − Xβ), none of the ridge estimators ever resulted in a smaller squared error when measured on this basis.
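The logic of the simulation described above can be sketched in outline as follows. This is our reconstruction, not the original code: the design matrix is invented (Figure 1's signs are illegible), only one ridge estimator is compared against OLS, and the report's explicit MSE-minimizing routine is emulated by a grid search over k using the theoretical MSE, conditional on the known σ and β.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p1, sigma, reps = 8, 4, 0.8, 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, p1 - 1))])
beta = np.array([0.0, 0.8, 0.8, 0.0])   # constant term 0; slopes illustrative

def ridge(k, y):
    """Ridge estimate (X'X + kI)^(-1) X'y; k = 0 gives OLS."""
    return np.linalg.solve(X.T @ X + k * np.eye(p1), X.T @ y)

def mse(k):
    """Theoretical MSE of the ridge estimator, conditional on sigma and beta:
    variance term sigma^2 tr(W X'X W) plus squared bias, W = (X'X + kI)^(-1)."""
    W = np.linalg.inv(X.T @ X + k * np.eye(p1))
    B = W @ X.T @ X                      # the ridge estimator has mean B beta
    bias = (B - np.eye(p1)) @ beta
    return sigma**2 * np.trace(B @ W) + bias @ bias

# Near-optimal k by grid search (never worse than OLS, since k = 0 is on the grid).
ks = np.linspace(0.0, 2.0, 201)
k_opt = ks[np.argmin([mse(k) for k in ks])]
assert mse(k_opt) <= mse(0.0)

# Count how often the optimal-k ridge estimate beats OLS in squared error.
wins = 0
for _ in range(reps):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    wins += int(np.sum((ridge(k_opt, y) - beta)**2) < np.sum((ridge(0.0, y) - beta)**2))
print(f"ridge beat OLS in {wins} of {reps} replicates at k = {k_opt:.2f}")
```

Extending this skeleton to the report's full study means repeating the loop for each of the three ridge estimators and each true β, and tallying pairwise wins in all four spaces, as in Figures 3 through 6.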
III. CONCLUSIONS

The research described in this technical report has been concerned with three ridge estimators. Since each was chosen to be an optimal estimator within a particular transformation of the parameter space with respect to the problem being considered, the results do not address the question of when ridge estimation rather than OLS estimation should be used. Instead, given that ridge regression is to be used, these results do address the question of which ridge estimator to use.

While the results of this small scale simulation are not conclusive, they do indicate that care needs to be taken in deciding which estimator to use. The choice of estimator will depend on the criteria used to measure the goodness of the estimation process. For example, a ridge estimator with good squared error properties in the standardized parameter space may not do as well as another ridge estimator when transformed back to the original parameter space. In this case, the data analyst must decide whether measurement of squared error is more appropriate in the standardized space or in the original space.
IV. REFERENCES

Guilkey, D. K. and Murphy, J. L. (1975). Directed ridge regression techniques in cases of multicollinearity. JASA, 70, 769-776.

Hemmerle, W. J. (1975). An explicit solution for generalized ridge regression. Technometrics, 17, 309-314.

Hoerl, A. E. and Kennard, R. W. (1970a). Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12, 55-67.

Hoerl, A. E. and Kennard, R. W. (1970b). Ridge regression: applications to nonorthogonal problems. Technometrics, 12, 69-82.

Hoerl, A. E., Kennard, R. W., and Baldwin, K. F. (1975). Ridge regression: some simulations. Comm. Stat., 4, 105-123.

Marquardt, D. W. and Snee, R. D. (1975). Ridge regression in practice. American Statistician, 29, 3-20.

McDonald, G. C. and Galarneau, D. I. (1975). A Monte Carlo evaluation of some ridge-type estimators. JASA, 70, 407-416.

Obenchain, R. L. (1975). Ridge analysis following a preliminary test of a shrunken hypothesis. Technometrics, 17, 431-441.

Thisted, R. A. (1976). Ridge regression, minimax estimation, and empirical Bayes methods. Division of Biostatistics, Stanford University, Technical Report No. 28.
UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE

1.  REPORT NUMBER: 106-6
4.  TITLE: RIDGE REGRESSION FOR NONSTANDARDIZED MODELS
5.  TYPE OF REPORT & PERIOD COVERED: Technical Report
7.  AUTHORS: Terry L. King and Dennis E. Smith
8.  CONTRACT OR GRANT NUMBER: N00014-75-C-1054
9.  PERFORMING ORGANIZATION NAME AND ADDRESS: Desmatics, Inc., P. O. Box 618, State College, Pa. 16801
10. PROGRAM ELEMENT, PROJECT, TASK: NR 042-334
11. CONTROLLING OFFICE NAME AND ADDRESS: Statistics and Probability Program (Code 436), Office of Naval Research, Arlington, VA 22217
12. REPORT DATE: October 1977
13. NUMBER OF PAGES: 18
15. SECURITY CLASS (of this report): Unclassified
16. DISTRIBUTION STATEMENT (of this report): Approved for public release; distribution unlimited.
19. KEY WORDS: Ridge regression; Standardization; Linear model

20. ABSTRACT: In the usual regression model y = Xβ + ε, with X of full rank and ε ~ N(0, σ²I), the ordinary least squares estimator β̂ = (X'X)⁻¹X'y has extremely large mean square error when X'X is an ill-conditioned matrix. Hoerl and Kennard (Technometrics, Vol. 12, No. 1, pp. 55-67 (1970)) have proposed replacing β̂ by the ridge estimator β* = (X'X + kI)⁻¹X'y in these cases. Although Hoerl and Kennard considered the regression problem in correlation form, there is nothing in their procedure that makes this mandatory. This paper compares ridge estimators for β that arise when the biasing factor (k) is applied at different stages of standardization (i.e., centering and scaling), and shows which estimators are identical and which are different. In addition, results of a small-scale simulation are discussed.

UNCLASSIFIED