Chapter 14 Stein-Rule Estimation


Alexandra Lorin Dickerson
5 years ago

The ordinary least squares estimation of regression coefficients in the linear regression model provides estimators having minimum variance in the class of linear and unbiased estimators. The criterion of linearity is desirable because such estimators involve less mathematical complexity, they are easy to compute, and it is easier to investigate their statistical properties. The criterion of unbiasedness is attractive because it is intuitively desirable to have an estimator whose expected value, i.e., the mean of the estimator, is the same as the parameter being estimated. Insisting on linearity and unbiasedness may, however, exact an unacceptably high price in terms of variability around the true parameter, and it is possible to have a nonlinear estimator with better properties. Note that one of the main objectives of estimation is to find an estimator whose values are highly concentrated around the true parameter. Under some mild restrictions, it is possible to have a nonlinear and biased estimator whose variability is smaller than that of the best linear unbiased estimator of the parameter.

In the multiple regression model
$$y = X\beta + \varepsilon, \qquad E(\varepsilon) = 0, \qquad V(\varepsilon) = \sigma^2 I,$$
where $y$ is $n \times 1$, $X$ is $n \times k$, $\beta$ is $k \times 1$ and $\varepsilon$ is $n \times 1$, the ordinary least squares estimator (OLSE) of $\beta$ is
$$b = (X'X)^{-1}X'y,$$
which is the best linear unbiased estimator of $\beta$ in the sense that it is linear in $y$, $E(b) = \beta$, and $b$ has the smallest variance among all linear and unbiased estimators of $\beta$. Its covariance matrix is
$$V(b) = E(b-\beta)(b-\beta)' = \sigma^2 (X'X)^{-1}.$$

The weighted mean squared error of any estimator $\hat\beta$ is defined as
$$E(\hat\beta - \beta)'W(\hat\beta - \beta) = \sum_i \sum_j w_{ij}\, E(\hat\beta_i - \beta_i)(\hat\beta_j - \beta_j),$$
where $W$ is a $k \times k$ fixed positive definite matrix of weights $w_{ij}$. The two popular choices of the weight matrix $W$ are

(i) $W$ is an identity matrix, i.e., $W = I$; then $E(\hat\beta - \beta)'(\hat\beta - \beta)$ is called the total mean squared error (MSE) of $\hat\beta$.

(ii) $W = X'X$; then
$$E(\hat\beta - \beta)'X'X(\hat\beta - \beta) = E(X\hat\beta - X\beta)'(X\hat\beta - X\beta)$$
is called the predictive mean squared error of $\hat\beta$. Note that $X\hat\beta$ is the predictor of the average value $E(y) = X\beta$ and $X\hat\beta - X\beta$ is the corresponding prediction error.

There can be other choices of $W$; it depends entirely on the analyst how to define the loss function so that the variability is minimum.

If a random vector $Z$ with $k$ elements ($k \geq 3$) is normally distributed as $N(\mu, I)$, $\mu$ being the mean vector, then Stein established that if linearity and unbiasedness are dropped, it is possible to improve upon the maximum likelihood estimator of $\mu$ under the criterion of total MSE. Later, this result was generalized by James and Stein to the linear regression model. They demonstrated that if the criteria of linearity and unbiasedness of the estimators are dropped, then a nonlinear estimator can be obtained which performs better than the best linear unbiased estimator under the criterion of predictive MSE. In other words, James and Stein established that OLSE is inadmissible for $k \geq 3$ under the predictive MSE criterion, i.e., for $k \geq 3$ there exists an estimator $\hat\beta$ such that
$$E(\hat\beta - \beta)'X'X(\hat\beta - \beta) \leq E(b - \beta)'X'X(b - \beta)$$
for all values of $\beta$, with strict inequality holding for some values of $\beta$. For $k \leq 2$, no such estimator exists; for $k \geq 3$ we therefore say that "$b$ can be beaten" in this sense. So a nonlinear and biased estimator can be defined which performs better than OLSE. Such an estimator is the Stein-rule estimator given by
$$\hat\beta = \left(1 - \frac{c\,\sigma^2}{b'X'Xb}\right) b \quad \text{when } \sigma^2 \text{ is known}$$
and
$$\hat\beta = \left(1 - c\,\frac{e'e}{b'X'Xb}\right) b \quad \text{when } \sigma^2 \text{ is unknown}.$$
Here $c$ is a fixed positive characterizing scalar, $e'e$ is the residual sum of squares based on OLSE, and $e = y - Xb$ is the residual vector. By assigning different values to $c$, we can generate different estimators. So a class of estimators characterized by $c$ can be defined. This is called a family of Stein-rule estimators.
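The family of Stein-rule estimators can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy; the function names `ols` and `stein_rule`, the chosen values of $c$ and the simulated data are hypothetical and serve only to show how $c$, $\sigma^2$ (or $e'e$) and $b'X'Xb$ enter the estimator.

```python
import numpy as np


def ols(X, y):
    """Return the OLSE b = (X'X)^{-1} X'y and the residual vector e = y - Xb."""
    b = np.linalg.solve(X.T @ X, X.T @ y)  # solve the normal equations
    e = y - X @ b
    return b, e


def stein_rule(X, y, c, sigma2=None):
    """One member of the Stein-rule family, indexed by the scalar c.

    sigma2 given:    (1 - c * sigma2 / b'X'Xb) * b
    sigma2 is None:  (1 - c * e'e / b'X'Xb) * b
    """
    b, e = ols(X, y)
    Xb = X @ b
    fitted_ss = Xb @ Xb                       # b'X'Xb
    scale = sigma2 if sigma2 is not None else e @ e
    delta = 1.0 - c * scale / fitted_ss       # scalar multiplying every component of b
    return delta * b


# Hypothetical simulated data with n = 50 observations and k = 5 regressors.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = X @ np.array([0.5, -0.2, 0.1, 0.0, 0.3]) + rng.standard_normal(50)

b, e = ols(X, y)
beta_known = stein_rule(X, y, c=3.0, sigma2=1.0)   # sigma^2 treated as known
beta_unknown = stein_rule(X, y, c=0.05)            # sigma^2 unknown: e'e is used
```

Because the same scalar multiplies every component, each Stein-rule estimate is parallel to $b$; setting $c = 0$ recovers OLSE.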

Let
$$\delta = 1 - \frac{c\,\sigma^2}{b'X'Xb}$$
be a scalar quantity. Then $\hat\beta = \delta b$. So, essentially, instead of estimating $\beta_1, \beta_2, \ldots, \beta_k$ by $b_1, b_2, \ldots, b_k$, we estimate them by $\delta b_1, \delta b_2, \ldots, \delta b_k$, respectively. So, in order to increase efficiency, the OLSE is multiplied by the scalar $\delta$ (which is itself random, since it depends on $b$). Thus $\delta$ is called the shrinkage factor. As Stein-rule estimators attempt to shrink the components of $b$ towards zero, these estimators are known as shrinkage estimators.

First, we discuss a result which is used to prove the dominance of the Stein-rule estimator over OLSE.

Result: Suppose a random vector $Z$ of order $(k \times 1)$ is normally distributed as $N(\mu, I)$, where $\mu$ is the mean vector and $I$ is the covariance matrix. Then
$$E\left[\frac{Z'(Z-\mu)}{Z'Z}\right] = (k-2)\, E\left[\frac{1}{Z'Z}\right].$$
An important point to be noted in this result is that the left-hand side involves $\mu$ explicitly, whereas the right-hand side does not.

Now we consider the Stein-rule estimator when $\sigma^2$ is known. Note that
$$E(\hat\beta) = E(b) - c\,\sigma^2 E\left[\frac{b}{b'X'Xb}\right] = \beta - c\,\sigma^2 E\left[\frac{b}{b'X'Xb}\right] \neq \beta \quad \text{in general},$$
since the last expectation is, in general, a non-zero quantity. Thus the Stein-rule estimator is biased, while OLSE $b$ is unbiased for $\beta$.
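A quick Monte Carlo check can make the Result concrete. The sketch below, assuming NumPy, draws $Z \sim N(\mu, I_k)$ for an arbitrarily chosen $\mu$ and $k = 6$ (both illustrative assumptions) and compares sample averages of the two sides. It is a numerical illustration, not a proof; the Result itself is a special case of Stein's identity applied to $g(Z) = Z/Z'Z$.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n_draws = 6, 200_000
mu = np.ones(k)                                  # arbitrary illustrative mean vector
Z = mu + rng.standard_normal((n_draws, k))       # each row is one draw of Z ~ N(mu, I_k)

zz = np.einsum("ij,ij->i", Z, Z)                 # Z'Z for each draw
lhs = np.mean(np.einsum("ij,ij->i", Z, Z - mu) / zz)   # estimate of E[Z'(Z - mu) / Z'Z]
rhs = (k - 2) * np.mean(1.0 / zz)                      # estimate of (k - 2) E[1 / Z'Z]
print(f"LHS = {lhs:.4f}, RHS = {rhs:.4f}")
```

Changing `mu` changes both averages, but they continue to agree up to simulation error.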

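The predictive risks of $b$ and $\hat\beta$ are
$$PR(b) = E(b-\beta)'X'X(b-\beta), \qquad PR(\hat\beta) = E(\hat\beta-\beta)'X'X(\hat\beta-\beta).$$
The Stein-rule estimator is better than OLSE $b$ under the criterion of predictive risk if $PR(\hat\beta) < PR(b)$.

Solving the expressions, and using $b - \beta = (X'X)^{-1}X'\varepsilon$, we get
$$
\begin{aligned}
PR(b) &= E\left[\varepsilon'X(X'X)^{-1}X'X(X'X)^{-1}X'\varepsilon\right] = E\left[\varepsilon'X(X'X)^{-1}X'\varepsilon\right] \\
&= E\,\mathrm{tr}\left[(X'X)^{-1}X'\varepsilon\varepsilon'X\right] = \sigma^2\,\mathrm{tr}\left[(X'X)^{-1}X'X\right] = \sigma^2\,\mathrm{tr}(I_k) = \sigma^2 k,
\end{aligned}
$$
and
$$
\begin{aligned}
PR(\hat\beta) &= E\left[(b-\beta) - \frac{c\,\sigma^2}{b'X'Xb}\,b\right]'X'X\left[(b-\beta) - \frac{c\,\sigma^2}{b'X'Xb}\,b\right] \\
&= E(b-\beta)'X'X(b-\beta) - 2c\,\sigma^2 E\left[\frac{b'X'X(b-\beta)}{b'X'Xb}\right] + c^2\sigma^4 E\left[\frac{1}{b'X'Xb}\right] \\
&= \sigma^2 k - 2c\,\sigma^2 E\left[\frac{b'X'X(b-\beta)}{b'X'Xb}\right] + c^2\sigma^4 E\left[\frac{1}{b'X'Xb}\right].
\end{aligned}
$$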
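The identity $PR(b) = \sigma^2 k$ does not depend on $X$ or $\beta$, which a simulation makes visible. The sketch below assumes NumPy; the design matrix, the "true" $\beta$, $\sigma$ and the number of replications are all hypothetical choices. It averages the predictive loss $(b-\beta)'X'X(b-\beta)$ over repeated samples drawn from the model.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, sigma, reps = 40, 5, 1.5, 20_000
X = rng.standard_normal((n, k))                   # fixed, hypothetical design matrix
beta = np.array([1.0, -0.5, 0.25, 0.0, 2.0])      # hypothetical true coefficients
XtX = X.T @ X

# Each row of Y is one replication of y = X beta + eps with eps ~ N(0, sigma^2 I).
Y = X @ beta + sigma * rng.standard_normal((reps, n))
B = np.linalg.solve(XtX, X.T @ Y.T).T             # row r holds the OLSE for replication r
D = B - beta
losses = np.einsum("ri,ij,rj->r", D, XtX, D)      # (b - beta)'X'X(b - beta) per replication

print(f"simulated PR(b) = {losses.mean():.3f}, sigma^2 k = {sigma**2 * k:.3f}")
```

Swapping in a different `X` or `beta` should leave the simulated average near $\sigma^2 k$.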

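Suppose
$$Z = \frac{1}{\sigma}(X'X)^{1/2} b \quad \text{or} \quad b = \sigma (X'X)^{-1/2} Z,$$
$$\theta = \frac{1}{\sigma}(X'X)^{1/2} \beta \quad \text{or} \quad \beta = \sigma (X'X)^{-1/2} \theta,$$
and $Z \sim N(\theta, I)$, i.e., $Z_1, Z_2, \ldots, Z_k$ are independent. Then $b'X'Xb = \sigma^2 Z'Z$ and $b'X'X(b-\beta) = \sigma^2 Z'(Z-\theta)$. Substituting these values in the expression for $PR(\hat\beta)$, we get
$$
\begin{aligned}
PR(\hat\beta) &= \sigma^2\left\{k - 2c\,E\left[\frac{Z'(Z-\theta)}{Z'Z}\right] + c^2 E\left[\frac{1}{Z'Z}\right]\right\} \\
&= \sigma^2\left\{k - 2c(k-2)\,E\left[\frac{1}{Z'Z}\right] + c^2 E\left[\frac{1}{Z'Z}\right]\right\} \quad \text{(using the Result)} \\
&= PR(b) - \sigma^2 c\,[2(k-2) - c]\,E\left[\frac{1}{Z'Z}\right].
\end{aligned}
$$
Thus $PR(\hat\beta) < PR(b)$ if and only if
$$\sigma^2 c\,[2(k-2) - c]\,E\left[\frac{1}{Z'Z}\right] > 0.$$
Since $Z \sim N(\theta, I)$, $Z'Z$ has a non-central chi-square distribution, so $E\left[\frac{1}{Z'Z}\right] > 0$ (and it is finite for $k \geq 3$). As $\sigma^2 > 0$, the condition reduces to
$$c\,[2(k-2) - c] > 0.$$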
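The closed form $PR(\hat\beta) = \sigma^2\{k - c[2(k-2)-c]E(1/Z'Z)\}$ can be checked against a direct simulation of the Stein-rule estimator with $\sigma^2$ known. This sketch assumes NumPy; the design, the small "true" $\beta$, $\sigma = 1$ and $c = 4$ are hypothetical choices. $E(1/Z'Z)$ is estimated from the same draws through $Z'Z = b'X'Xb/\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, sigma, reps, c = 40, 6, 1.0, 50_000, 4.0
X = rng.standard_normal((n, k))                   # hypothetical design matrix
beta = np.full(k, 0.1)                            # hypothetical, small true coefficients
XtX = X.T @ X


def pred_loss(est):
    """Predictive loss (est - beta)'X'X(est - beta) for each row of est."""
    d = est - beta
    return np.einsum("ri,ij,rj->r", d, XtX, d)


Y = X @ beta + sigma * rng.standard_normal((reps, n))
B = np.linalg.solve(XtX, X.T @ Y.T).T             # OLSE for each replication
fitted_ss = np.einsum("ri,ij,rj->r", B, XtX, B)   # b'X'Xb = sigma^2 Z'Z
S = (1.0 - c * sigma**2 / fitted_ss)[:, None] * B # Stein-rule estimates, sigma^2 known

pr_stein_direct = pred_loss(S).mean()
inv_zz = np.mean(sigma**2 / fitted_ss)            # Monte Carlo estimate of E[1/Z'Z]
pr_stein_formula = sigma**2 * (k - c * (2 * (k - 2) - c) * inv_zz)
print(pr_stein_direct, pr_stein_formula, sigma**2 * k)
```

With $0 < c < 2(k-2)$ both estimates of $PR(\hat\beta)$ fall below $PR(b) = \sigma^2 k$.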

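Since $c > 0$ is assumed, this inequality holds when $2(k-2) - c > 0$, i.e., $0 < c < 2(k-2)$, provided $k > 2$. So as long as $0 < c < 2(k-2)$ is satisfied, the Stein-rule estimator has smaller predictive risk than OLSE. This inequality cannot be satisfied for $k = 1$ and $k = 2$.

To find the value of $c$ for which $PR(\hat\beta)$ is minimum, we differentiate
$$PR(\hat\beta) = PR(b) - \sigma^2 c\,[2(k-2) - c]\,E\left[\frac{1}{Z'Z}\right]$$
with respect to $c$. Since $PR(b) = \sigma^2 k$ and $E\left[\frac{1}{Z'Z}\right]$ do not depend on $c$, this gives
$$\frac{d\,PR(\hat\beta)}{dc} = -\sigma^2 E\left[\frac{1}{Z'Z}\right]\frac{d}{dc}\left\{c\,[2(k-2) - c]\right\} = -2\sigma^2\,[(k-2) - c]\,E\left[\frac{1}{Z'Z}\right] = 0,$$
or $c = k-2$. Further,
$$\frac{d^2\,PR(\hat\beta)}{dc^2} = 2\sigma^2 E\left[\frac{1}{Z'Z}\right] > 0,$$
so $c = k-2$ is a minimum. The largest gain in efficiency arises when $c = k-2$. So if the number of explanatory variables is more than two, it is always possible to construct an estimator which is better than OLSE. The optimum Stein-rule estimator, or James-Stein rule estimator, of $\beta$ in this case is given by
$$\hat\beta = \left[1 - \frac{(k-2)\,\sigma^2}{b'X'Xb}\right] b \quad \text{when } \sigma^2 \text{ is known}.$$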
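Because $E(1/Z'Z)$ does not involve $c$, the gain $PR(b) - PR(\hat\beta)$ is a downward-opening quadratic in $c$. The sketch below assumes NumPy and evaluates this gain on a grid of $c$ values; the value `m = 0.17` standing in for $E(1/Z'Z)$ is a hypothetical placeholder, and any positive value yields the same optimum and the same sign pattern.

```python
import numpy as np


def risk_gain(c, k, m, sigma2=1.0):
    """PR(b) - PR(beta_hat) = sigma^2 * c * (2(k-2) - c) * m, with m standing for E[1/Z'Z] > 0."""
    return sigma2 * c * (2 * (k - 2) - c) * m


k, m = 6, 0.17                          # m: hypothetical positive value of E[1/Z'Z]
cs = np.linspace(0.0, 10.0, 1001)       # grid of c values with step 0.01
gain = risk_gain(cs, k, m)

c_best = cs[np.argmax(gain)]            # grid point with the largest gain
dominates = cs[gain > 0]                # c values where the Stein-rule estimator beats OLSE
print(f"optimal c ~ {c_best:.2f}; gain > 0 for c in [{dominates.min():.2f}, {dominates.max():.2f}]")
```

The grid maximum lands at $c = k-2$, the gain is positive only strictly inside $(0, 2(k-2))$, and it turns negative beyond $2(k-2)$.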

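To avoid a change of sign in this estimator, its positive-part version, called the positive-part Stein-rule estimator, is given by
$$
\hat\beta^{+} =
\begin{cases}
\left[1 - \dfrac{(k-2)\,\sigma^2}{b'X'Xb}\right] b & \text{when } 0 < \dfrac{(k-2)\,\sigma^2}{b'X'Xb} < 1, \\[2ex]
0 & \text{when } \dfrac{(k-2)\,\sigma^2}{b'X'Xb} \geq 1.
\end{cases}
$$

When $\sigma^2$ is unknown, it can be shown that the Stein-rule estimator
$$\hat\beta = \left(1 - c\,\frac{e'e}{b'X'Xb}\right) b$$
is better than OLSE $b$ if and only if
$$0 < c < \frac{2(k-2)}{n-k+2}, \qquad k > 2.$$
The optimum choice of $c$, giving the largest gain in efficiency, is
$$c = \frac{k-2}{n-k+2}.$$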
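The optimum choices of $c$ and the positive-part truncation can be combined in a single sketch. It assumes NumPy, and the function name `james_stein` is hypothetical. The data are deliberately contrived: $y$ is pure noise with standard deviation 0.1 while $\sigma^2 = 1$ is passed as known, solely to push the plain shrinkage factor below zero so that the positive-part rule has something to truncate.

```python
import numpy as np


def james_stein(X, y, sigma2=None, positive_part=False):
    """Optimum Stein-rule estimator (illustrative sketch).

    sigma2 given:   c = k - 2,            factor = 1 - c * sigma2 / b'X'Xb
    sigma2 None:    c = (k-2)/(n-k+2),    factor = 1 - c * e'e / b'X'Xb
    positive_part:  truncate the factor at zero so the sign of b is never flipped.
    """
    n, k = X.shape
    if k <= 2:
        raise ValueError("no Stein-rule improvement over OLSE for k <= 2")
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    Xb = X @ b
    if sigma2 is not None:
        c, scale = k - 2, sigma2
    else:
        c, scale = (k - 2) / (n - k + 2), e @ e
    factor = 1.0 - c * scale / (Xb @ Xb)
    if positive_part:
        factor = max(factor, 0.0)
    return factor * b


# Contrived weak-signal data: b'X'Xb is tiny relative to the stated sigma^2 = 1.
rng = np.random.default_rng(4)
X = rng.standard_normal((30, 5))
y = 0.1 * rng.standard_normal(30)

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
js = james_stein(X, y, sigma2=1.0)                      # factor negative: signs of b flipped
js_plus = james_stein(X, y, sigma2=1.0, positive_part=True)   # truncated to the zero vector
js_unknown = james_stein(X, y)                          # sigma^2 handled through e'e
```

Here the plain James-Stein estimate points opposite to $b$, whereas the positive-part version returns the zero vector, matching the second case of $\hat\beta^{+}$.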
