ECONOMETRIC INSTITUTE

THE EXACT MSE EFFICIENCY OF THE GENERAL RIDGE ESTIMATOR RELATIVE TO OLS

R. TEEKENS and P.M.C. DE BOER

REPORT 770/ES

ERASMUS UNIVERSITY ROTTERDAM, P.O. BOX 1738, ROTTERDAM, THE NETHERLANDS
Report 770/ES Corrigendum

page 3: the formula in the sixth line from the bottom should read as

    c = X*'y = Λα + X*'ε

page 5: equation (1.4) should read as

    X*'X* = P'X'XP = Λ

and in the line preceding eq. (1.6) "education" should be "equation".

page 8: the second part of eq. (2.5) should read as

    α̃*_i = ½ α̂_i (1 + √(1 − 4σ̂_i²/α̂_i²))    for α̂_i² ≥ 4σ̂_i²

page 9: the 9th line from the bottom should read as: "is normally distributed and m σ̂²/σ² has a χ²-distribution with m = n − p degrees of"

page B.1: eq. (B.1) should read as

    f(x,z) = C z^(m/2 − 1) exp{−½[z + ((x − θ₁)/θ₂)²]}    for z > 0, −∞ < x < ∞
    f(x,z) = 0                                             elsewhere

with C = [θ₂ √(2π) 2^(m/2) Γ(m/2)]⁻¹

page B.2: the sixth line from the top should read as: C' = [√(2π) 2^(m/2) Γ(m/2)]⁻¹

page B.3: in the 4th line "h = (w,y;θ)" should be "h(w,y;θ)"

page C.2: the remark should be: "φ₁(θ) = 1 for θ = 1.59668"
The Exact MSE-Efficiency of the General Ridge Estimator Relative to OLS

by Rudolf Teekens and Paul de Boer*

ABSTRACT

In this paper it is pointed out that the merits of the ridge procedure as proposed by Hoerl and Kennard tend to be overvalued due to an incorrect analysis of the associated mean square error. For the case of the so-called general ridge estimator it is then shown how the exact MSE can be derived, and finally it is seen that the general ridge estimator dominates the OLS estimator only in a limited interval of the parameter space.

Contents

1. Introduction
2. The explicit general ridge estimator
3. The exact mean square error efficiency
4. Concluding remarks
References
Appendices

* The authors are indebted to Mr A.S. Louter for performing the calculations for appendix C.
I. Introduction

Since Hoerl and Kennard first published their so-called Ridge Regression method in 1970, a considerable amount of research has been devoted to this subject. Some econometricians as well seem to be taken with the ridge method, witness the publications of Vinod (1976a, 1976b) and Moulaert (1976). In our opinion, however, the Ridge Regression Method a) is based on a dubious method, which consists of optimizing an unknown loss function, and b) dominates the OLS estimator in MSE only in a limited range of parameter values. The subsequent analysis is more or less analogous to the one carried out by Feldstein (1973), who studied the mean square error efficiency of COV (conditional omitted variable) and WTD (weighted average) estimators relative to the OLS estimator.

As Hoerl and Kennard (1970a) we consider the standard linear model

(1.1)    y = Xβ + ε

where
y is an observable random vector of n elements,
X is an observable fixed matrix of order n×p with rank p,
β is a vector of p unknown parameters,
ε is a non-observable random vector of n elements which has a multivariate normal distribution: ε ~ N(0, σ²I).

We also follow Hoerl and Kennard in reducing the above model to a canonical form in which the X'X matrix is diagonal. This may be achieved by applying the following orthogonal transformation to X and β. Let

(1.2)    α = P'β

and

(1.3)    X* = XP
where the columns of P are the eigenvectors of X'X and P'P = I, so that

(1.4)    X*'X* = P'X'XP = Λ

with Λ the diagonal matrix of (positive) eigenvalues of X'X. Then model (1.1) may be rewritten as

(1.5)    y = X*α + ε

where it should be noted that α'α = β'β. The OLS estimator of α is α̂ = (X*'X*)⁻¹X*'y. If we define the vector c as

(1.6)    c = X*'y = P'X'y

α̂ may be written as

(1.7)    α̂ = Λ⁻¹c

Moreover, it is easily verified that

(1.8)    β̂ = Pα̂

The general ridge procedure is defined from

(1.9)    α* = [X*'X* + K]⁻¹X*'y = [Λ + K]⁻¹c

with K a diagonal matrix with non-negative elements.
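In the canonical form both estimators are componentwise, since X*'X* = Λ is diagonal. A minimal numerical sketch of (1.7) and (1.9); the variable names (lam, c, k) and all numbers below are ours, invented for illustration:

```python
# Hedged sketch: componentwise OLS and general ridge estimates in the
# canonical form, where X*'X* = Lambda is diagonal. Illustrative values.

lam = [4.0, 1.0, 0.25]   # eigenvalues lambda_i of X'X
c   = [8.0, 3.0, 0.5]    # c = X*'y
k   = [0.5, 0.5, 0.5]    # ridge constants k_i (diagonal of K)

# (1.7): OLS estimator alpha_hat = Lambda^{-1} c
alpha_hat = [ci / li for ci, li in zip(c, lam)]

# (1.9): general ridge estimator alpha* = (Lambda + K)^{-1} c
alpha_star = [ci / (li + ki) for ci, li, ki in zip(c, lam, k)]
```

With positive k_i every ridge coefficient is shrunk toward zero relative to its OLS counterpart, which is the source of the bias term appearing in (1.12) below.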
The corresponding ridge estimator in model (1.1) is then defined as

(1.10)    β* = Pα*

It should be noted here that β* equals [X'X + K]⁻¹X'y only if K = kI. If K is not a scalar matrix it follows from (1.10) that β* = [X'X + PKP']⁻¹X'y.

In order to determine the matrix K, we minimize the mean square error (MSE) of α* relative to α; this function will be denoted as

(1.11)    π(α*, α) = E[(α* − α)'(α* − α)] = Σ_{i=1}^{p} E(α*_i − α_i)²

Writing (1.9) in scalar terms, we obtain

    α*_i = c_i / (λ_i + k_i),    i = 1, ..., p

Realizing furthermore that c = X*'y = Λα + X*'ε has a multivariate normal distribution, c ~ N(Λα, σ²Λ), or c_i ~ N(λ_i α_i, λ_i σ²), we can write (1.11) as
(1.12)    π(α*, α) = Σ_{i=1}^{p} E[c_i/(λ_i + k_i) − α_i]²
                   = Σ_{i=1}^{p} (σ²λ_i + α_i²k_i²) / (λ_i + k_i)²

Minimizing π(α*, α) with respect to the k_i's yields the following optimal values of k_i:

(1.13)    k_i = σ²/α_i²,    i = 1, ..., p

Inspection of the second order conditions shows that (1.13) indeed constitutes a minimum for (1.12). Obviously this solution for k_i, i = 1, ..., p is useless for estimation purposes, since k_i depends on the unknown parameters α_i, i = 1, ..., p and σ². Therefore Hoerl and Kennard propose the following method for approximating the theoretical optimal values of k_i:

(i) determine the OLS estimates α̂_i and σ̂²;
(ii) determine k_i(0) from k_i(0) = σ̂²/α̂_i²;
(iii) continue the process as follows

    α̃*_i(t) = c_i / (λ_i + k_i(t−1)),    k_i(t) = σ̂²/α̃*_i(t)²,    t = 1, 2, ...

until stability is achieved in k_i.
In the remainder of this paper we will limit ourselves to this "general form of the ridge regression", i.e. K a non-scalar diagonal matrix. But we will conclude this introduction with a number of remarks about the case of a scalar K-matrix, to which Hoerl and Kennard devote the main part of their paper.

First, it is observed that in case of a unique k the MSE-function of α* becomes

(1.14)    π(α*, α) = Σ_{i=1}^{p} (σ²λ_i + α_i²k²) / (λ_i + k)²

so that in this case the first order condition for a minimum of π(α*, α) does not yield an explicit value of k. This first order condition reads as:

(1.15)    Σ_{i=1}^{p} λ_i(α_i²k − σ²) / (λ_i + k)³ = 0

Unlike the case of different k_i, i = 1, ..., p, Hoerl and Kennard do not consider the possibility of applying an iterative method for approximating the theoretical optimal value of k. Such a method could consist of substituting for the unknown parameter α_i the value of its ridge estimate into (1.15),

    α̃*_i = c_i / (λ_i + k)

and then solving by a numerical method the resulting equation

(1.16)    Σ_{i=1}^{p} λ_i [k² + (2λ_i − c_i²/σ̂²)k + λ_i²] / (k + λ_i)⁵ = 0

The alternative method as presented by Hoerl and Kennard, viz. the use of the so-called ridge trace, has already been criticized by Conniffe
and Stone (1973) and Farebrother (1975). Their stability criterion leads them to choose a too high value of k. As indicated by Conniffe and Stone (1973), Conniffe, Stone and O'Neill (1976) and Newhouse and Oman (1971), the proof given by Hoerl and Kennard, that for some fixed k α* has lower MSE than the OLS estimator of α, is inapplicable for the practical case where k is a random variable.

However, for a unique k it seems impossible to determine analytically the true MSE of α*, since k cannot be determined explicitly from (1.16). Fortunately, the case of general ridge regression as presented earlier in the introduction lends itself much better to an analysis of the MSE of α*, and this will be the subject of the rest of this paper.

II. The explicit general ridge estimator

The iterative method as proposed by Hoerl and Kennard to solve for α̃*_i, which has been described in the previous section, makes things unnecessarily complicated and unclear. This method provides a solution of the following two equations

(2.1)    α̃*_i = c_i / (λ_i + k̃_i)

(2.2)    k̃_i = σ̂² / α̃*_i²

But these equations may easily be solved analytically; substitution of (2.2) into (2.1) yields the following quadratic equation in α̃*_i:

(2.3)    λ_i α̃*_i² − c_i α̃*_i + σ̂² = 0

and the roots of α̃*_i are 1)

(2.4)    α̃*_i = (c_i ± √(c_i² − 4λ_i σ̂²)) / (2λ_i)

provided that c_i² ≥ 4λ_i σ̂². The corresponding roots of k̃_i follow immediately from substitution of (2.4) into (2.2).

1) Hemmerle (1975) found independently the same solution.
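The equivalence between the iteration (i)-(iii) and the closed-form roots (2.4) can be checked numerically. A hedged sketch for a single coordinate (the values of λ_i, c_i and σ̂² are ours, chosen so that c_i² > 4λ_i σ̂²); it runs the Hoerl-Kennard iteration and compares the limit with the "+" root of (2.3):

```python
import math

# Illustrative values for one coordinate; c_i^2 > 4*lam_i*sigma2, so the
# general ridge estimator exists.
lam_i, c_i, sigma2 = 2.0, 6.0, 0.5

# Hoerl-Kennard iteration: start from OLS (k_i = 0) and alternate
# (2.2) k = sigma^2/alpha^2 with (2.1) alpha = c/(lam + k).
alpha = c_i / lam_i
for _ in range(200):
    k = sigma2 / alpha ** 2
    alpha_new = c_i / (lam_i + k)
    if abs(alpha_new - alpha) < 1e-13:
        break
    alpha = alpha_new

# Closed form (2.4): the '+' root corresponds to the smallest k_i.
disc = c_i ** 2 - 4.0 * lam_i * sigma2
root_plus = (c_i + math.sqrt(disc)) / (2.0 * lam_i)
```

The iteration converges to the "+" root, which is the root with the smallest k̃_i singled out in remark (iii) below.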
Given this analytical solution a number of remarks can be made:

(i) The general ridge estimator does not always exist; it is defined only if c_i² ≥ 4λ_i σ̂².

(ii) The general ridge estimator is not unique, since for each i there exist two pairs (α̃*_i, k̃_i) which satisfy (2.1) and (2.2) simultaneously, provided that c_i² > 4λ_i σ̂².

(iii) The iteration procedure as proposed by Hoerl and Kennard selects by definition the set of roots (α̃*_i, k̃_i) with the smallest k̃_i (or the highest |α̃*_i|). This may easily be seen from figure 1, in which the two functions α = c_i/(λ_i + k_i) and α = σ̂/√k_i are plotted for the case where c_i² > 4λ_i σ̂². From the graph it is clear that, provided the two curves intersect (i.e. provided that c_i² > 4λ_i σ̂²), starting the iteration with k_i = 0 (i.e. choosing α_i(0) = α̂_i) one always moves to the point of intersection with the smallest k_i.

In the sequel we shall concentrate on the general ridge estimator as proposed by Hoerl and Kennard, i.e. the α̃*_i corresponding to the smallest k̃_i. From
the previous paragraphs it is clear that this estimator is not defined if c_i² < 4λ_i σ̂². Since it is our intention to compare the MSE-performance of α̃*_i with that of the OLS estimator of α_i, it seems fair to complete the definition of α̃*_i by defining it for c_i² < 4λ_i σ̂² to be identical to the OLS estimator of α_i. The general ridge estimator is then defined for the entire sample space. If, moreover, we realize that the OLS estimator α̂_i is defined as c_i/λ_i, we may write α̃*_i as

(2.5)    α̃*_i = α̂_i                                   for α̂_i² < 4σ̂_i²
         α̃*_i = ½ α̂_i (1 + √(1 − 4σ̂_i²/α̂_i²))        for α̂_i² ≥ 4σ̂_i²

where σ̂_i² = σ̂²/λ_i is the estimated variance of α̂_i.

III. The exact mean square error efficiency

We define the MSE-efficiency of α̃*_i with respect to α̂_i (the OLS estimator of α_i) as the ratio of the Mean Square Errors of the two estimators

(3.1)    π(α̃*_i; α_i) / π(α̂_i; α_i)

First we consider the simple case, where the variance of the system, σ², is known. In that case the general ridge estimator α̃*_i, as defined in (2.5), becomes

(3.2)    α̃*_i = α̂_i                                   for |α̂_i| < 2σ_i
         α̃*_i = ½ α̂_i (1 + √(1 − 4σ_i²/α̂_i²))        for |α̂_i| > 2σ_i

Since α̂_i is normally distributed with mean α_i and variance σ_i² = σ²/λ_i
we may apply the result of appendix A and conclude that

    π(α̃*_i; α_i) / π(α̂_i; α_i) = φ₁(α_i/σ_i)

where φ₁(.) is defined in appendix A and tabulated in appendix C. In figure 2 we have given φ₁(.) for positive arguments only, since φ₁(.) is symmetric about the origin. From figure 2 it can be seen that the general ridge estimator of α_i dominates 1) the OLS estimator for |α_i| < 1.59668 σ_i, provided that the variance of the system is known.

Next, we consider the realistic case where the variance σ² is unknown. In that case the general ridge estimator as defined in (2.5) applies. Since α̂_i is normally distributed and m σ̂²/σ² has a χ²-distribution with m = n − p degrees of freedom, we may apply the result of appendix B and conclude that the MSE-efficiency of α̃*_i with respect to α̂_i equals

    π(α̃*_i; α_i) / π(α̂_i; α_i) = φ₂(α_i/σ_i, m)

where φ₂(.,.) is defined in appendix B and tabulated in appendix C. From this table it can be seen that φ₂(.,.) is very close to φ₁(.) for different values of m. Therefore we may consider φ₁(.) as a good approximation to the mean square error efficiency of the general ridge estimator for unknown variance.

1) in the sense of having a lower mean square error
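The estimator (2.5) is a direct transcription into code; the sketch below is ours (the function and argument names are invented), with s2 standing for σ̂_i². With s2 replaced by the known variance σ_i² the same function is exactly (3.2):

```python
import math

# Sketch of the completed general ridge estimator (2.5): fall back to
# OLS when a_hat^2 < 4*s2, where s2 = sigma_hat^2/lam_i is the estimated
# variance of the OLS coefficient; otherwise take the '+' root of (2.4).

def general_ridge(a_hat, s2):
    if a_hat ** 2 < 4.0 * s2:
        return a_hat                        # OLS branch of (2.5)
    return 0.5 * a_hat * (1.0 + math.sqrt(1.0 - 4.0 * s2 / a_hat ** 2))
```

At the boundary α̂_i² = 4σ̂_i² the square root vanishes and the estimator jumps from α̂_i to ½α̂_i, so the estimator is discontinuous there, which is what makes its exact MSE non-trivial to derive.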
[Figure 2: the function φ₁ plotted for positive arguments]
IV. Concluding remarks

(i) Contrary to the Stein-rule estimation procedures (see Baranchik (1973), James and Stein (1961) and Stein (1960)), the Hoerl and Kennard procedure is based on the minimization of an objective function with unknown parameters. Therefore, any resulting optimal value of k depends on the unknown parameters and should be redefined in terms of estimated parameters in order to obtain an estimator. But then the original MSE properties of α* are no longer valid and the redefined α* should be reconsidered with respect to its MSE.

(ii) The exact MSE has been calculated for the case of the so-called general ridge estimator, with a non-scalar diagonal matrix K. Here the function φ₁(.) as tabulated in appendix C and pictured in figure 2 turns out to be a good approximation to the mean square error of α̃*_i. This MSE is a function of α_i√λ_i/σ only.

(iii) The general ridge estimator α̃*_i dominates the ordinary least squares estimator α̂_i only if |α_i|√λ_i/σ < 1.59668, a condition which can be made the subject of a pre-test.
References

Baranchik, A.J. (1973). Inadmissibility of Maximum Likelihood Estimators in Some Multiple Regression Problems with Three or More Independent Variables. The Annals of Statistics, Vol. 1, No. 2.

Conniffe, D. and J. Stone (1973). A Critical View of Ridge Regression. The Statistician, 22.

Conniffe, D. and J. Stone (1975). A Reply to Smith and Goldstein. The Statistician, 24, No. 1, p. 67-68.

Conniffe, D., Stone, J. and F. O'Neill (1976). Is Ridge Regression a Useful Technique in Economic Analysis. Unpublished Research Memorandum.

Farebrother, R.W. (1975). The Minimum Mean Square Error Linear Estimator and Ridge Regression. Technometrics, Vol. 17, No. 1.

Feldstein, M.S. (1973). Multicollinearity and the Mean Square Error of Alternative Estimators. Econometrica, Vol. 41, No. 2.

Hemmerle, W.J. (1975). An Explicit Solution for Generalized Ridge Regression. Technometrics, 17.

Hoerl, A.E. and R.W. Kennard (1970a). Ridge Regression: Biased Estimation for Non-orthogonal Problems. Technometrics, Vol. 12, No. 1, p. 55-67.

Hoerl, A.E. and R.W. Kennard (1970b). Ridge Regression: Applications to Non-orthogonal Problems. Technometrics, Vol. 12, No. 1, p. 69-82.

James, W. and C. Stein (1961). Estimation with Quadratic Loss. Proc. 4th Berkeley Symposium, p. 361-379.

Moulaert, F. (1976). Ridge Regression: a Geometrical Revisitation. Regional Science Research Paper no. 5, Centrum voor Economische Studien van de Katholieke Universiteit te Leuven.
Newhouse, J.P. and S.D. Oman (1971). The Evaluation of Ridge Estimators. Report R-716-PR prepared for project Rand.

Smith, A.F.M. and M. Goldstein (1975). Ridge Regression: Some Comments on a Paper of Stone and Conniffe.

Stein, C.M. (1960). Multiple Regression. Chapter 37 in Essays in Honour of Harold Hotelling. Stanford Univ. Press.

Vinod, H.D. (1976a). Canonical Ridge and Econometrics of Joint Production. Journal of Econometrics, 4.

Vinod, H.D. (1976b). Application of New Ridge Regression Methods to a Study of Bell System Scale Economies. JASA, Vol. 71, No. 356.
Appendix A

Consider the random variable X, which is normally distributed with mean θ₁ and variance θ₂², and the random variable Y which is defined as

(A.1)    Y = X                                   for |X| < 2θ₂
         Y = ½X(1 + √(1 − 4θ₂²/X²))              for |X| > 2θ₂

The mean square error of Y with respect to θ₁ is defined as

(A.2)    π(Y; θ₁) = E[Y − θ₁]²

and the mean square error of X with respect to θ₁ is

(A.3)    π(X; θ₁) = E[X − θ₁]²

We shall now derive an integral expression for the ratio π(Y; θ₁)/π(X; θ₁) and show that this ratio is a function of θ₁/θ₂ only. From (A.2) and (A.3) it follows that

(A.4)    π(X; θ₁) = θ₂²

and since X is normally distributed we may write for the ratio

(A.5)    π(Y; θ₁)/π(X; θ₁) = (1/(θ₂³√(2π))) [ ∫_{|x|<2θ₂} (x − θ₁)² exp{−(x − θ₁)²/(2θ₂²)} dx
         + ∫_{|x|>2θ₂} (½x(1 + √(1 − 4θ₂²/x²)) − θ₁)² exp{−(x − θ₁)²/(2θ₂²)} dx ]
This expression can be rewritten if we apply the transformation z = x/θ₂; we then have

(A.6)    φ₁(θ) = (1/√(2π)) [ ∫_{|z|<2} (z − θ)² exp{−½(z − θ)²} dz
         + ∫_{|z|>2} (½z(1 + √(1 − 4/z²)) − θ)² exp{−½(z − θ)²} dz ]

with θ = θ₁/θ₂.

It can easily be seen that the function φ₁(θ) is symmetric about the origin; therefore we confine ourselves to θ ≥ 0 for its numerical evaluation. The function φ₁(θ) has been tabulated in appendix C.
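The integral (A.6) is straightforward to evaluate numerically. The following sketch uses a simple midpoint rule (our own illustration, not the authors' computation for appendix C) and reproduces the qualitative behaviour of φ₁: below one near the origin, above one for large |θ|:

```python
import math

# Midpoint-rule evaluation of phi_1(theta) from (A.6): the MSE of the
# shrunken variable Y relative to that of X ~ N(theta, 1), in the
# standardized units z = x/theta_2, theta = theta_1/theta_2.

def phi1(theta, n=20000, lo=-12.0, hi=12.0):
    h = (hi - lo) / n
    total = 0.0
    for j in range(n):
        z = lo + (j + 0.5) * h
        if abs(z) < 2.0:
            y = z                                   # first branch of (A.1)
        else:
            y = 0.5 * z * (1.0 + math.sqrt(1.0 - 4.0 / z ** 2))
        density = math.exp(-0.5 * (z - theta) ** 2) / math.sqrt(2.0 * math.pi)
        total += (y - theta) ** 2 * density * h
    return total    # pi(X; theta) = 1 in these units, so this is the ratio
```

The symmetry of φ₁ about the origin also follows directly from the construction: replacing z by −z maps Y to −Y.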
Appendix B

Consider the independently distributed random variables X and Z which have as marginal distributions a normal distribution with mean θ₁ and variance θ₂², and a chi-square distribution with m degrees of freedom, respectively. Then the joint density of X and Z is

(B.1)    f(x,z) = C z^(m/2 − 1) exp{−½[z + ((x − θ₁)/θ₂)²]}    for −∞ < x < ∞, z > 0
         f(x,z) = 0                                             elsewhere

with

    C = [θ₂ √(2π) 2^(m/2) Γ(m/2)]⁻¹

In this appendix we shall investigate the mean square error with respect to θ₁ of the following function of X and Z:

(B.2)    Y = X                                        for X² < 4θ₂² Z/m
         Y = ½X(1 + √(1 − 4θ₂² Z/(m X²)))             for X² > 4θ₂² Z/m

and we shall consider again the mean square error of Y as a proportion of the mean square error of X; we are thus looking for

(B.3)    π(Y; θ₁)/π(X; θ₁) = E[Y − θ₁]²/θ₂²
         = (C/θ₂²) [ ∫∫_{x²<4θ₂²z/m} (x − θ₁)² z^(m/2 − 1) exp{−½[z + ((x − θ₁)/θ₂)²]} dz dx
         + ∫∫_{x²>4θ₂²z/m} (½x(1 + √(1 − 4θ₂²z/(m x²))) − θ₁)² z^(m/2 − 1) exp{−½[z + ((x − θ₁)/θ₂)²]} dz dx ]
After the transformation w = x/θ₂ the above expression can be rewritten in terms of θ = θ₁/θ₂ and the constant

    C' = [√(2π) 2^(m/2) Γ(m/2)]⁻¹

It may readily be seen that the third integral in the resulting expression equals (C')⁻¹ and that the first two integrals may be taken together. If, moreover, we apply a further transformation of the inner variable, we obtain the double-integral representation (B.4) of π(Y; θ₁)/π(X; θ₁), with an integrand involving a function h(w, y; θ).

Hence π(Y; θ₁)/π(X; θ₁) can be written as a function of θ = θ₁/θ₂ and m alone. This function, labelled φ₂(θ, m), has been tabulated in appendix C for positive values of θ and even values of m. 1)

1) It should be noted that the integral expression (B.4) is relatively easy to calculate for even m and, since the function φ₂(θ, m) is not sensitive to m, it did not seem worthwhile to devote more time and effort to the calculation of φ₂(θ, m) for odd values of m.
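The closeness of φ₂(θ, m) to φ₁(θ) can also be checked by simulation. A rough Monte Carlo sketch (our own check, not the authors' tabulation): draw X ~ N(θ, 1) and Z ~ χ²_m independently, apply (B.2) with θ₂ = 1, and estimate the MSE ratio:

```python
import math
import random

# Monte Carlo estimate of phi_2(theta, m) = pi(Y)/pi(X) with theta_2 = 1:
# X ~ N(theta, 1) and Z ~ chi-square(m) independent, Y the shrunken
# variable of (B.2). Illustrative check only.

def phi2_mc(theta, m, n=20000, seed=7):
    rng = random.Random(seed)
    mse_y = mse_x = 0.0
    for _ in range(n):
        x = rng.gauss(theta, 1.0)
        z = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))  # chi-square(m)
        s2 = z / m            # plays the role of the estimated variance
        if x ** 2 < 4.0 * s2:
            y = x
        else:
            y = 0.5 * x * (1.0 + math.sqrt(1.0 - 4.0 * s2 / x ** 2))
        mse_y += (y - theta) ** 2
        mse_x += (x - theta) ** 2
    return mse_y / mse_x
```

For θ = 0 the shrinkage can only reduce each squared error, so the ratio comes out below one for any m, in line with the behaviour of φ₁ near the origin.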
Appendix C

[Table: the function φ₁(θ) and the function φ₂(θ, m) for m = 2, 6, 10, 14, 18, 22, 26, 30, 40 and 50, tabulated for θ ≥ 0; the numerical entries are illegible in this copy.]

Remark: φ₁(θ) = 1 for θ = 1.59668
REPORTS

List of Reprints, nos ; List of Reports

/M "Triangular - Square - Pentagonal Numbers", by R.J. Stroeker.
770/ES "The Exact MSE-Efficiency of the General Ridge Estimator Relative to OLS", by R. Teekens and P.M.C. de Boer.
More informationSTAT 100C: Linear models
STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 56 Table of Contents Multiple linear regression Linear model setup Estimation of β Geometric interpretation Estimation of σ 2 Hat matrix Gram matrix
More information1 Data Arrays and Decompositions
1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is
More informationTightening Durbin-Watson Bounds
The Economic and Social Review, Vol. 28, No. 4, October, 1997, pp. 351-356 Tightening Durbin-Watson Bounds DENIS CONNIFFE* The Economic and Social Research Institute Abstract: The null distribution of
More informationLecture Note 1: Probability Theory and Statistics
Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 1: Probability Theory and Statistics Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 For this and all future notes, if you would
More informationImproved Liu Estimators for the Poisson Regression Model
www.ccsenet.org/isp International Journal of Statistics and Probability Vol., No. ; May 202 Improved Liu Estimators for the Poisson Regression Model Kristofer Mansson B. M. Golam Kibria Corresponding author
More information2 Eigenvectors and Eigenvalues in abstract spaces.
MA322 Sathaye Notes on Eigenvalues Spring 27 Introduction In these notes, we start with the definition of eigenvectors in abstract vector spaces and follow with the more common definition of eigenvectors
More informationAppendix A: Matrices
Appendix A: Matrices A matrix is a rectangular array of numbers Such arrays have rows and columns The numbers of rows and columns are referred to as the dimensions of a matrix A matrix with, say, 5 rows
More informationOur point of departure, as in Chapter 2, will once more be the outcome equation:
Chapter 4 Instrumental variables I 4.1 Selection on unobservables Our point of departure, as in Chapter 2, will once more be the outcome equation: Y Dβ + Xα + U, 4.1 where treatment intensity will once
More informationXβ is a linear combination of the columns of X: Copyright c 2010 Dan Nettleton (Iowa State University) Statistics / 25 X =
The Gauss-Markov Linear Model y Xβ + ɛ y is an n random vector of responses X is an n p matrix of constants with columns corresponding to explanatory variables X is sometimes referred to as the design
More informationEconomics 573 Problem Set 5 Fall 2002 Due: 4 October b. The sample mean converges in probability to the population mean.
Economics 573 Problem Set 5 Fall 00 Due: 4 October 00 1. In random sampling from any population with E(X) = and Var(X) =, show (using Chebyshev's inequality) that sample mean converges in probability to..
More informationEFFICIENCY of the PRINCIPAL COMPONENT LIU- TYPE ESTIMATOR in LOGISTIC REGRESSION
EFFICIENCY of the PRINCIPAL COMPONEN LIU- YPE ESIMAOR in LOGISIC REGRESSION Authors: Jibo Wu School of Mathematics and Finance, Chongqing University of Arts and Sciences, Chongqing, China, linfen52@126.com
More informationPrincipal component analysis
Principal component analysis Angela Montanari 1 Introduction Principal component analysis (PCA) is one of the most popular multivariate statistical methods. It was first introduced by Pearson (1901) and
More informationChapter 14 Stein-Rule Estimation
Chapter 14 Stein-Rule Estimation The ordinary least squares estimation of regression coefficients in linear regression model provides the estimators having minimum variance in the class of linear and unbiased
More informationThe Statistical Property of Ordinary Least Squares
The Statistical Property of Ordinary Least Squares The linear equation, on which we apply the OLS is y t = X t β + u t Then, as we have derived, the OLS estimator is ˆβ = [ X T X] 1 X T y Then, substituting
More informationCOMBINING THE LIU-TYPE ESTIMATOR AND THE PRINCIPAL COMPONENT REGRESSION ESTIMATOR
Noname manuscript No. (will be inserted by the editor) COMBINING THE LIU-TYPE ESTIMATOR AND THE PRINCIPAL COMPONENT REGRESSION ESTIMATOR Deniz Inan Received: date / Accepted: date Abstract In this study
More informationThe Multivariate Gaussian Distribution
The Multivariate Gaussian Distribution Chuong B. Do October, 8 A vector-valued random variable X = T X X n is said to have a multivariate normal or Gaussian) distribution with mean µ R n and covariance
More information01 Probability Theory and Statistics Review
NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement
More informationThe Linear Regression Model
The Linear Regression Model Carlo Favero Favero () The Linear Regression Model 1 / 67 OLS To illustrate how estimation can be performed to derive conditional expectations, consider the following general
More informationEconomics 620, Lecture 5: exp
1 Economics 620, Lecture 5: The K-Variable Linear Model II Third assumption (Normality): y; q(x; 2 I N ) 1 ) p(y) = (2 2 ) exp (N=2) 1 2 2(y X)0 (y X) where N is the sample size. The log likelihood function
More information1. Background: The SVD and the best basis (questions selected from Ch. 6- Can you fill in the exercises?)
Math 35 Exam Review SOLUTIONS Overview In this third of the course we focused on linear learning algorithms to model data. summarize: To. Background: The SVD and the best basis (questions selected from
More informationBiostatistics Advanced Methods in Biostatistics IV
Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results
More informationJournal of Econometrics 3 (1975) North-Holland Publishing Company
Journal of Econometrics 3 (1975) 395-404. 0 North-Holland Publishing Company ESTJMATION OF VARIANCE AFTER A PRELIMINARY TEST OF HOMOGENEITY AND OPTIMAL LEVELS OF SIGNIFICANCE FOR THE PRE-TEST T. TOYODA
More informationTopic 4: Model Specifications
Topic 4: Model Specifications Advanced Econometrics (I) Dong Chen School of Economics, Peking University 1 Functional Forms 1.1 Redefining Variables Change the unit of measurement of the variables will
More informationAlternative Biased Estimator Based on Least. Trimmed Squares for Handling Collinear. Leverage Data Points
International Journal of Contemporary Mathematical Sciences Vol. 13, 018, no. 4, 177-189 HIKARI Ltd, www.m-hikari.com https://doi.org/10.1988/ijcms.018.8616 Alternative Biased Estimator Based on Least
More informationRegression. Oscar García
Regression Oscar García Regression methods are fundamental in Forest Mensuration For a more concise and general presentation, we shall first review some matrix concepts 1 Matrices An order n m matrix is
More informationLinear models. Linear models are computationally convenient and remain widely used in. applied econometric research
Linear models Linear models are computationally convenient and remain widely used in applied econometric research Our main focus in these lectures will be on single equation linear models of the form y
More informationLagrange Multipliers
Optimization with Constraints As long as algebra and geometry have been separated, their progress have been slow and their uses limited; but when these two sciences have been united, they have lent each
More informationMeasuring Local Influential Observations in Modified Ridge Regression
Journal of Data Science 9(2011), 359-372 Measuring Local Influential Observations in Modified Ridge Regression Aboobacker Jahufer 1 and Jianbao Chen 2 1 South Eastern University and 2 Xiamen University
More informationAn Introduction to Multivariate Statistical Analysis
An Introduction to Multivariate Statistical Analysis Third Edition T. W. ANDERSON Stanford University Department of Statistics Stanford, CA WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents
More informationPrincipal Component Analysis (PCA) Our starting point consists of T observations from N variables, which will be arranged in an T N matrix R,
Principal Component Analysis (PCA) PCA is a widely used statistical tool for dimension reduction. The objective of PCA is to find common factors, the so called principal components, in form of linear combinations
More informationPart IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015
Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)
More informationStrongly connected graphs and polynomials
Strongly connected graphs and polynomials Jehanne Dousse August 27, 2011 Abstract In this report, we give the exact solutions ofthe equation P(A) = 0 where P is a polynomial of degree2 with integer coefficients,
More informationMulticollinearity and A Ridge Parameter Estimation Approach
Journal of Modern Applied Statistical Methods Volume 15 Issue Article 5 11-1-016 Multicollinearity and A Ridge Parameter Estimation Approach Ghadban Khalaf King Khalid University, albadran50@yahoo.com
More informationMulticollinearity Problem and Some Hypothetical Tests in Regression Model
ISSN: 3-9653; IC Value: 45.98; SJ Impact Factor :6.887 Volume 5 Issue XII December 07- Available at www.iraset.com Multicollinearity Problem and Some Hypothetical Tests in Regression Model R.V.S.S. Nagabhushana
More informationMaximum Likelihood Estimation
Maximum Likelihood Estimation Merlise Clyde STA721 Linear Models Duke University August 31, 2017 Outline Topics Likelihood Function Projections Maximum Likelihood Estimates Readings: Christensen Chapter
More informationNonlinear Inequality Constrained Ridge Regression Estimator
The International Conference on Trends and Perspectives in Linear Statistical Inference (LinStat2014) 24 28 August 2014 Linköping, Sweden Nonlinear Inequality Constrained Ridge Regression Estimator Dr.