
PARTIAL RIDGE REGRESSION¹

by

D. Raghavarao² and K.J.C. Smith

Department of Statistics
University of North Carolina at Chapel Hill

Institute of Statistics Mimeo Series No. 863
February 1973

ABSTRACT

A partial ridge estimator is proposed as a modification of the Hoerl and Kennard ridge regression estimator. It is shown that the proposed estimator has certain advantages over the ridge estimator. The problem of taking an additional observation to meet certain optimality criteria is also discussed.

¹ This work was supported by NSF Grants GU-2059 and GU-19568 and by U.S. Air Force Grant No. AFOSR-68-1415. Reproduction in whole or in part is permitted for any purpose of the United States Government.

² On leave from Punjab Agricultural University (India).

1. Introduction.

Consider the problem of fitting a linear model $y = X\beta + \epsilon$, where $y' = (y_1, y_2, \ldots, y_n)$ is a vector of $n$ observations on the dependent variable; $X = (x_{ij})$ is an $n \times p$ matrix of rank $p$, $x_i' = (x_{i1}, x_{i2}, \ldots, x_{ip})$ being the vector of the $i$-th observations on the independent variables ($i = 1, 2, \ldots, n$); $\beta' = (\beta_1, \beta_2, \ldots, \beta_p)$ is the vector of parameters to be estimated; and $\epsilon$ is an $n$-dimensional vector of random errors assumed to be distributed with mean vector $0$ and dispersion matrix $\sigma^2 I_n$, $0$ being a zero vector and $I_n$ the identity matrix of order $n$. Without loss of generality we assume that the dependent and independent variables are standardized so that $X'X$ is a correlation matrix.

Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ be the eigenvalues of $X'X$ and let $\xi_1, \xi_2, \ldots, \xi_p$ be a set of orthonormal eigenvectors associated with the eigenvalues $\lambda_i$ ($i = 1, 2, \ldots, p$). Let $\alpha_i = \xi_i'\beta$ for $i = 1, 2, \ldots, p$.

The usual least squares estimator of $\beta$ is given by

(1.1) $\hat{\beta} = (X'X)^{-1} X'y$

and has the unsatisfactory property, when $X'X$ differs substantially from an identity matrix, that the mean squared error, or expected squared distance from $\hat{\beta}$ to $\beta$, tends to be large compared to that of an orthogonal system. Often an investigator is interested in obtaining a variance balanced design in which each parameter $\beta_i$ is estimated with equal precision. The departure of a design from variance balancedness increases the more $X'X$ differs from an identity matrix.

The ridge regression method proposed by Hoerl and Kennard (1970) estimates $\beta$ by the ridge estimator given by

(1.2) $\beta^* = (X'X + k I_p)^{-1} X'y$,

where $k$ is a positive real number satisfying

(1.3) $k < \sigma^2 / \alpha_{\max}^2$,

$\alpha_{\max}$ being the maximum of the $\alpha_i$ ($i = 1, 2, \ldots, p$).
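For concreteness, the following sketch (illustrative only, not part of the original paper; the near-collinear design, the coefficients, and the value of $k$ are arbitrary choices, subject only to the bound (1.3)) computes the least squares estimator (1.1) and the ridge estimator (1.2):

```python
# Illustrative sketch: least squares (1.1) vs. ridge (1.2) on a
# standardized design, so that X'X is a correlation matrix as assumed above.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3

# A deliberately near-collinear design; each column is then centered and
# scaled to unit sum of squares, making X'X a correlation matrix.
Z = rng.standard_normal((n, p))
Z[:, 2] = 0.95 * Z[:, 0] + 0.05 * Z[:, 2]
X = (Z - Z.mean(axis=0)) / (Z.std(axis=0) * np.sqrt(n))

beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + 0.1 * rng.standard_normal(n)

XtX = X.T @ X
beta_ls = np.linalg.solve(XtX, X.T @ y)                     # (1.1)
k = 0.1                                                     # illustrative choice
beta_ridge = np.linalg.solve(XtX + k * np.eye(p), X.T @ y)  # (1.2)
print(beta_ls, beta_ridge)
```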

The estimator $\beta^*$ is a biased estimator of $\beta$ but has a smaller mean squared error than the least squares estimator $\hat{\beta}$.

We propose here, as an alternative to the ridge estimator $\beta^*$, the estimator

(1.4) $\hat{\beta}_p = (X'X + k_p \xi_p \xi_p')^{-1} X'y$,

where

(1.5) $k_p = \sigma^2 / \alpha_p^2$.

This estimator may be called the partial ridge estimator of $\beta$. We show in Section 2 that the partial ridge estimator estimates $\xi_p'\beta$ with minimum mean squared error and estimates the $\xi_i'\beta$ ($i = 1, 2, \ldots, p-1$) unbiasedly. In Section 3 we consider the problem of taking an additional observation so as to remove the bias of the partial ridge estimator and to attain certain optimality criteria.

2. Partial Ridge Estimator.

To control the mean squared error of the estimator of the coefficient vector $\beta$ in the model $y = X\beta + \epsilon$, Hoerl and Kennard (1970) proposed the ridge estimator $\beta^*$ defined by (1.2) and showed that the mean squared error of $\beta^*$ is less than that of the least squares estimator $\hat{\beta}$ of $\beta$. Specifically, the mean squared error of $\beta^*$ is

(2.1) $E[(\beta^* - \beta)'(\beta^* - \beta)] = \sigma^2 \sum_{i=1}^{p} \frac{\lambda_i}{(\lambda_i + k)^2} + k^2 \sum_{i=1}^{p} \frac{\alpha_i^2}{(\lambda_i + k)^2} = \gamma_1(k) + \gamma_2(k)$, say,

where $E[\,\cdot\,]$ denotes the expected value of the term in braces. The term $\gamma_1(k)$ is the sum of the variances of the components of $\beta^*$, and the term $\gamma_2(k)$ is the bias component of the mean squared error. When $k = 0$, the ridge estimator coincides with the least squares estimator.
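In code, the partial ridge estimator (1.4)-(1.5) and the decomposition (2.1) might be sketched as follows. Since $\alpha_p$ and $\sigma^2$ are unknown in practice, this illustration plugs in the least squares estimate of $\alpha_p$; that substitution is an assumption of the sketch, not a prescription of the paper.

```python
# Sketch: partial ridge estimator (1.4)-(1.5), and the ridge mean squared
# error (2.1) split into the variance term gamma_1(k) and bias term gamma_2(k).
import numpy as np

def partial_ridge(X, y, sigma2):
    XtX = X.T @ X
    lam, V = np.linalg.eigh(XtX)        # eigenvalues in ascending order
    xi_p = V[:, 0]                      # eigenvector of the smallest eigenvalue
    alpha_p = xi_p @ np.linalg.solve(XtX, X.T @ y)  # plug-in estimate of alpha_p
    k_p = sigma2 / alpha_p ** 2                     # (1.5)
    return np.linalg.solve(XtX + k_p * np.outer(xi_p, xi_p), X.T @ y)  # (1.4)

def ridge_mse(lam, alpha, sigma2, k):
    gamma1 = sigma2 * np.sum(lam / (lam + k) ** 2)         # variance term of (2.1)
    gamma2 = k ** 2 * np.sum(alpha ** 2 / (lam + k) ** 2)  # bias term of (2.1)
    return gamma1 + gamma2
```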

We propose as an alternative to the ridge estimator of $\beta$ a partial ridge estimator of $\beta$, denoted by $\hat{\beta}_p$ and defined by (1.4). The partial ridge estimator has the following property:

Theorem 2.1. The partial ridge estimator $\hat{\beta}_p = (X'X + k_p \xi_p \xi_p')^{-1} X'y$, where $k_p = \sigma^2/\alpha_p^2$, is such that $\xi_p'\hat{\beta}_p$ is the linear estimator of $\xi_p'\beta$ with minimum mean squared error, and $\xi_i'\hat{\beta}_p$ is the best linear unbiased estimator of $\xi_i'\beta$ ($i = 1, 2, \ldots, p-1$).

Proof. Since $\xi_1, \xi_2, \ldots, \xi_p$ are a set of orthonormal eigenvectors associated with the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ of $X'X$, the vectors $X\xi_i$ will be eigenvectors associated with the eigenvalues $\lambda_i$ of $XX'$ ($i = 1, 2, \ldots, p$). Let $\eta_1, \eta_2, \ldots, \eta_{n-p}$ be a set of orthogonal eigenvectors associated with the zero eigenvalue, of multiplicity $n-p$, of $XX'$. The vectors $X\xi_i$ ($i = 1, 2, \ldots, p$) and $\eta_j$ ($j = 1, 2, \ldots, n-p$) form a basis of an $n$-dimensional vector space. Without loss of generality any linear estimator of $t = \xi_p'\beta$ can be taken to be

(2.2) $\hat{t} = \sum_{i=1}^{p} c_i \xi_i' X'y + \sum_{j=1}^{n-p} d_j \eta_j' y$,

where the $c_i$ and $d_j$ are scalars. The mean squared error of $\hat{t}$ as an estimator of $t$ can be shown to be

(2.3) $E[(\hat{t} - t)^2] = \sum_{i=1}^{p-1} c_i^2 (\lambda_i^2 \alpha_i^2 + \sigma^2 \lambda_i) + (c_p \lambda_p - 1)^2 \alpha_p^2 + c_p^2 \sigma^2 \lambda_p + \sigma^2 \sum_{j=1}^{n-p} d_j^2$.
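Before carrying out the minimization, (2.3) can be checked numerically; in the sketch below (illustrative values of $\lambda_i$, $\alpha_i$, $\sigma^2$, and $n$, chosen by us), a general-purpose optimizer recovers the coefficients (2.4) derived next.

```python
# Sketch: minimize the MSE expression (2.3) numerically and compare the
# optimal c_p with the closed form 1/(lambda_p + sigma^2/alpha_p^2) of (2.4).
import numpy as np
from scipy.optimize import minimize

lam = np.array([2.0, 0.8, 0.2])       # lambda_1 >= lambda_2 >= lambda_3
alpha = np.array([1.0, -0.5, 0.3])
sigma2, p, n = 0.25, 3, 10

def mse(z):                           # z = (c_1, ..., c_p, d_1, ..., d_{n-p})
    c, d = z[:p], z[p:]
    head = np.sum(c[:-1] ** 2 * (lam[:-1] ** 2 * alpha[:-1] ** 2 + sigma2 * lam[:-1]))
    return (head
            + (c[-1] * lam[-1] - 1) ** 2 * alpha[-1] ** 2
            + c[-1] ** 2 * sigma2 * lam[-1]
            + sigma2 * np.sum(d ** 2))

c_p_closed = 1.0 / (lam[-1] + sigma2 / alpha[-1] ** 2)   # (2.4)
res = minimize(mse, np.zeros(n))
print(res.x[p - 1], c_p_closed)       # the two values agree
```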

Minimizing (2.3) with respect to the coefficients $c_i$ and $d_j$, we have

(2.4) $c_1 = \cdots = c_{p-1} = d_1 = \cdots = d_{n-p} = 0, \qquad c_p = \dfrac{1}{\lambda_p + \sigma^2/\alpha_p^2}$.

Choosing $k_p = \sigma^2/\alpha_p^2$, the linear estimator of $\xi_p'\beta$ with least mean squared error is therefore $(\lambda_p + k_p)^{-1} \xi_p' X'y$. The best linear unbiased estimators of the $\xi_i'\beta$ are the least squares estimators $\lambda_i^{-1} \xi_i' X'y$ ($i = 1, 2, \ldots, p-1$). Making a one-to-one correspondence of estimators of $\xi_i'\beta$ with estimators of $\beta_i$, the required estimator $\hat{\beta}_p$ of $\beta$ is given by

$\hat{\beta}_p = \left( \sum_{i=1}^{p-1} \frac{1}{\lambda_i}\, \xi_i \xi_i' + \frac{1}{\lambda_p + k_p}\, \xi_p \xi_p' \right) X'y = (X'X + k_p \xi_p \xi_p')^{-1} X'y$.

This completes the proof of Theorem 2.1.

The problem of estimating $k_p$ can be solved either by graphical or by iterative procedures, as described by Hoerl and Kennard (1970). From (2.1) and (2.3) we note that the bias component in the mean squared error of the partial ridge estimator is smaller than that of the ridge estimator.
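The final equality in the proof rests on the spectral decomposition $X'X = \sum_{i=1}^{p} \lambda_i \xi_i \xi_i'$, so that adding $k_p \xi_p \xi_p'$ shifts only the smallest eigenvalue. A quick numerical check of this identity (our illustration, using an arbitrary full-rank design and an arbitrary $k_p > 0$):

```python
# Sketch verifying that the spectral form
#   sum_{i<p} lambda_i^{-1} xi_i xi_i' + (lambda_p + k_p)^{-1} xi_p xi_p'
# equals (X'X + k_p xi_p xi_p')^{-1}.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 4))
XtX = X.T @ X
lam, V = np.linalg.eigh(XtX)          # ascending: lam[0] is lambda_p
k_p = 0.7

weights = 1.0 / lam
weights[0] = 1.0 / (lam[0] + k_p)     # shrink only the smallest-eigenvalue direction
spectral = (V * weights) @ V.T        # sum_i w_i xi_i xi_i'
direct = np.linalg.inv(XtX + k_p * np.outer(V[:, 0], V[:, 0]))
print(np.allclose(spectral, direct))  # True
```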

3. Optimum choice of an additional observation.

The equation (1.4) defining the partial ridge estimator suggests taking an additional observation $y_{n+1}$ on the dependent variable corresponding to some choice of values of the independent variables. Let us assume without loss of generality that the design matrix with an additional observation is

(3.1) $X_1 = \begin{pmatrix} X \\ w\, x_{n+1}' \end{pmatrix}$,

where $x_{n+1}' x_{n+1} = 1$ and $w$ is a non-zero scalar. The least squares estimator of $\beta$ using the additional observation is

(3.2) $\hat{\beta}_1 = (X'X + w^2 x_{n+1} x_{n+1}')^{-1} X_1' y_1$,

which is an unbiased estimator of $\beta$, $y_1$ denoting the extended vector of observations. Before discussing the optimal choice of the additional observation, we shall introduce the following:

Definition 3.1. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ be the eigenvalues of $X'X$, where $X$ is an $n \times p$ design matrix. The departure from variance balancedness of the design $X$ is measured by

(3.3) $Q(X) = \sum_{i=1}^{p} (\lambda_i - \bar{\lambda})^2$, where $\bar{\lambda} = \frac{1}{p} \sum_{i=1}^{p} \lambda_i$.

An equivalent expression for $Q(X)$ is

(3.4) $Q(X) = \mathrm{tr}[(X'X)^2] - \frac{1}{p} \left( \mathrm{tr}[X'X] \right)^2$,

where $\mathrm{tr}[A]$ denotes the trace of the matrix $A$.

Definition 3.2. [Kiefer (1959)] Of the class of all $n \times p$ design matrices $X$, the design $X$ is A-optimal if $\mathrm{tr}[(X'X)^{-1}]$ is minimum.

Definition 3.3. [Kiefer (1959)] Of the class of all $n \times p$ design matrices $X$, the design $X$ is D-optimal if $\det[(X'X)^{-1}]$ is minimum, where $\det[\,\cdot\,]$ denotes the determinant of the matrix in braces.
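These definitions translate directly into code. The sketch below (ours, not the paper's) checks the equivalence of (3.3) and (3.4) and evaluates the A- and D-criteria for any full-rank design:

```python
# Sketch of Definitions 3.1-3.3: Q(X) computed both from the eigenvalues
# (3.3) and from traces (3.4), plus the A- and D-optimality criteria.
import numpy as np

def Q(X):
    XtX = X.T @ X
    lam = np.linalg.eigvalsh(XtX)
    q_eig = np.sum((lam - lam.mean()) ** 2)                       # (3.3)
    q_tr = np.trace(XtX @ XtX) - np.trace(XtX) ** 2 / X.shape[1]  # (3.4)
    assert np.isclose(q_eig, q_tr)
    return q_eig

def a_criterion(X):
    return np.trace(np.linalg.inv(X.T @ X))       # minimized by an A-optimal design

def d_criterion(X):
    return np.linalg.det(np.linalg.inv(X.T @ X))  # minimized by a D-optimal design
```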

The following theorem gives the optimum choice of $w$ and $x_{n+1}$ for an additional observation:

Theorem 3.1. Given the $n \times p$ design matrix $X$, among possible choices of $w$ and $x_{n+1}$ in (3.1), the design

(3.5) $X^* = \begin{pmatrix} X \\ w\, \xi_p' \end{pmatrix}$,

with $w$ given by (3.7) below, has the following properties:

(i) $Q(X^*) < Q(X)$;

(ii) among the class of designs $X_1$ in (3.1), $Q(X^*) \le Q(X_1)$;

(iii) among the class of designs $X_1$ in (3.1) and subject to $Q(X_1)$ minimum, $X^*$ is A- and D-optimal.

Proof. For the design $X_1$ of (3.1),

(3.6) $Q(X_1) = Q(X) + 2w^2\, x_{n+1}' X'X\, x_{n+1} + w^4 \left( 1 - \frac{1}{p} \right) - 2w^2 \bar{\lambda}$.

The quadratic form $x_{n+1}'(X'X)\, x_{n+1}$ is minimized when $x_{n+1} = \xi_p$, and the minimum value is $\lambda_p$. Substituting this least value of $x_{n+1}' X'X\, x_{n+1}$ in (3.6) and minimizing with respect to $w$, we obtain the stationary value of $w$ to be

(3.7) $w^2 = \dfrac{\bar{\lambda} - \lambda_p}{1 - \frac{1}{p}}$.

Substituting into (3.6), the minimum value of $Q(X_1)$ is

(3.8) $Q_{\min}(X_1) = Q(X^*) = Q(X) - \dfrac{(\bar{\lambda} - \lambda_p)^2}{1 - \frac{1}{p}}$.

Thus $Q(X^*) < Q(X)$. Moreover, $Q(X^*)$ is the minimum value of $Q(X_1)$.

Now

(3.9) $\det[(X_1'X_1)^{-1}] = \det[(X'X)^{-1}] \left( 1 + w^2\, x_{n+1}'(X'X)^{-1} x_{n+1} \right)^{-1}$.

The maximum value of $x_{n+1}'(X'X)^{-1} x_{n+1}$ with respect to $x_{n+1}$ is $1/\lambda_p$, attained when $x_{n+1} = \xi_p$. In order that $Q(X_1)$ be least, $w$ must be given by (3.7). Hence $X^*$ is D-optimal among the class of designs $X_1$ with minimum $Q(X_1)$.

To prove the A-optimality of $X^*$ among the class of designs $X_1$ with minimum $Q(X_1)$, we observe that

(3.10) $\mathrm{tr}[(X_1'X_1)^{-1}] = \mathrm{tr}[(X'X)^{-1}] - \dfrac{w^2\, x_{n+1}'(X'X)^{-2} x_{n+1}}{1 + w^2\, x_{n+1}'(X'X)^{-1} x_{n+1}}$.

The maximum value of the second term on the right hand side of (3.10) is the maximum of

(3.11) $\mu = \dfrac{1}{\lambda \left( \frac{\lambda}{w^2} + 1 \right)}$,

where $\lambda$ ranges over the eigenvalues of $X'X$. In order that $Q(X_1)$ be least, $w$ is given by (3.7), and the maximum $\mu$ is attained when $\lambda = \lambda_p$ and $x_{n+1}' = \xi_p'$. Thus $X^*$ is A-optimal among the class of $X_1$ matrices with minimum $Q(X_1)$.
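Theorem 3.1 can also be checked numerically. The following sketch (ours; it reuses the function `Q` from the earlier snippet, with an arbitrary standardized design) confirms that augmenting $X$ by the row $w\,\xi_p'$ reduces $Q$ by exactly the amount in (3.8):

```python
# Sketch checking (3.5)-(3.8): the augmented design X* lowers Q(X) by
# (lambda_bar - lambda_p)^2 / (1 - 1/p) when w is chosen as in (3.7).
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 4))
X = A - A.mean(axis=0)
X /= np.sqrt((X ** 2).sum(axis=0))            # X'X is a correlation matrix

lam, V = np.linalg.eigh(X.T @ X)
lam_p, xi_p, lam_bar = lam[0], V[:, 0], lam.mean()
p = X.shape[1]

w = np.sqrt((lam_bar - lam_p) / (1 - 1 / p))  # (3.7)
X_star = np.vstack([X, w * xi_p])             # (3.5)

drop = (lam_bar - lam_p) ** 2 / (1 - 1 / p)   # reduction predicted by (3.8)
print(np.isclose(Q(X), Q(X_star) + drop))     # True; Q as defined above
```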

References

Hoerl, Arthur E. and Kennard, Robert W. (1970). "Ridge Regression: Biased Estimation for Nonorthogonal Problems." Technometrics, 12, 55-67.

Kiefer, J. (1959). "Optimum Experimental Designs." J. Roy. Statist. Soc., Ser. B, 21, 272-304.