Ridge Regression Estimation Approach to Measurement Error Model


A. K. Md. Ehsanes Saleh, Carleton University, Ottawa (Canada)
Shalabh, Department of Mathematics & Statistics, Indian Institute of Technology, Kanpur (India), shalab@iitk.ac.in

Abstract: This paper considers the estimation of the parameters of measurement error models when the estimated covariance matrix of the regression parameters is ill conditioned. We consider Hoerl and Kennard (1970) type ridge regression (RR) modifications of the five quasi-empirical Bayes estimators of the regression parameters of a measurement error model when it is suspected that the parameters may belong to a linear subspace. The modifications are based on the estimated covariance matrix of the estimators of the regression parameters. The estimators are compared, and the dominance conditions as well as the regions of optimality of the proposed estimators are determined on the basis of quadratic risks.

Keywords: linear regression model, measurement error, multicollinearity, reliability matrix, ridge regression estimators, shrinkage estimation, Stein-type estimators, preliminary test estimator, asymptotic relative efficiency.

1 Introduction

One of the basic assumptions of linear regression analysis is that all the explanatory variables are linearly independent, i.e., the matrix of observations on the explanatory variables has full column rank. When this assumption is violated, the problem of multicollinearity enters the data, and it inflates the variance of the ordinary least squares estimator of the regression coefficient, which is minimal in the absence of multicollinearity. The problem of multicollinearity has attracted many researchers, see Silvey (1969), and a large amount of research has been devoted to diagnosing the presence and degree of multicollinearity; see Belsley (1991), Belsley, Kuh and Welsh (2004) and Rao, Toutenburg, Shalabh and Heumann (2008) for more details. A complete bibliography on multicollinearity is not one of the objectives of this paper.

Obtaining estimators from multicollinear data is an important problem in the literature, and several approaches have been proposed for its solution. Among them, the ridge regression estimation approach due to Hoerl and Kennard (1970) has turned out to be the most popular among researchers as well as practitioners. Ridge estimators under normally distributed random errors in the regression model have been studied by Gibbons (1981), Sarker (1992), Saleh and Kibria (1993), Gruber (1998), Malthouse (1999), Singh and Tracy (1999), Wencheko (2000), Inoue (2001), Kibria and Saleh (2003, 2004), Arashi, Tabatabaey and Iranmanesh (2010), Hassanzadeh, Arashi and Tabatabaey (2011a, 2011b), Arashi, Tabatabaey and Soleimani (2012), and Bashtian, Arashi and Tabatabaey (2011a, 2011b), among others. The development of other approaches and the literature related to ridge regression are not within the scope of this paper.

Another fundamental assumption in all statistical analysis is that all the observations are correctly observed. This assumption is rather often violated in real data, and measurement errors then creep into the data: the variables are not correctly observable, and the observations are contaminated with measurement errors. The usual statistical tools tend to lose their validity when the data are contaminated with measurement errors; see Fuller (1987) and Cheng and van Ness (1999) for more details on measurement error models. In the context of linear regression models, the ordinary least squares estimator of the regression coefficient becomes biased as well as inconsistent in the presence of

measurement errors in the data. An important issue in the area of measurement errors is to find consistent estimators of the parameters. Consistent estimators of the regression coefficient can be obtained by utilizing additional information from outside the sample. In the context of multiple linear regression models, additional information in the form of a known covariance matrix of the measurement errors associated with the explanatory variables, or a known matrix of reliability ratios of the explanatory variables, has been extensively utilized in various contexts; see, e.g., Gleser (1992, 1993), Kim and Saleh (2002, 2003a, 2003b, 2005, 2008, 2011), Saleh (2010), Sen and Saleh (2010), Jurečková, Sen and Saleh (2010), and Shalabh, Garg and Misra (2009).

In many practical situations, the observations are not only measurement error ridden but also exhibit some linear relationships among them, so the problem of multicollinearity enters the measurement error ridden data. An important issue that then crops up is how to obtain estimators of the regression coefficients in such a situation. One simple idea is to apply ridge regression estimation to the measurement error ridden data; the obvious question is what happens then. In this paper, we attempt to answer such questions.

It is well known that Stein (1956) and James and Stein (1961) initially proposed the Stein and positive-rule estimators, which were later justified by the empirical Bayes methodology of Efron (1975, 1977) and Efron and Morris (1972, 1973a, 1973b, 1975). The preliminary test estimators were proposed by Bancroft (1944). On the other hand, ridge regression estimators were proposed by Hoerl and Kennard (1970), and they combat the problem of multicollinearity in the estimation of regression parameters. Saleh (2006, Chapter 4) proposed quasi-empirical Bayes estimators.

We have therefore considered five quasi-empirical Bayes estimators, obtained by weighting the unrestricted, restricted, preliminary test and Stein-type estimators by the ridge weight function. The resulting estimators are studied in measurement error models. The quadratic risks of these estimators are obtained, and the optimal regions of superiority of the estimators are determined.

The plan of the paper is as follows. We describe the model set-up in Section 2. The details and development of the estimators are presented in Section 3. The comparison of

estimators over each other is studied and their dominance conditions are reported in Section 4. The summary and conclusions are placed in Section 5, followed by the references.

2 The Model Description

Consider the multiple regression model with measurement errors

$$Y_t = \beta_0 + x_t'\beta + e_t, \qquad X_t = x_t + u_t, \qquad t = 1, 2, \ldots, n, \tag{2.1}$$

where $\beta_0$ is the intercept term, $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$ is the $p \times 1$ vector of regression coefficients, $x_t = (x_{1t}, x_{2t}, \ldots, x_{pt})'$ is the $p \times 1$ vector of the $t$-th observations on the true but unobservable explanatory variables, which are observed as $X_t = (X_{1t}, X_{2t}, \ldots, X_{pt})'$ with $p \times 1$ measurement error vector $u_t = (u_{1t}, u_{2t}, \ldots, u_{pt})'$, $u_{it}$ being the measurement error in the $i$-th explanatory variable $x_{it}$, and $e_t$ is the response error in the observed response variable $Y_t$. We assume that

$$(x_t', e_t, u_t')' \sim N_{2p+1}\{(\mu_x', 0, 0')',\ \mathrm{BlockDiag}(\Sigma_{xx}, \sigma_{ee}, \Sigma_{uu})\} \tag{2.2}$$

with $\mu_x = (\mu_{x_1}, \mu_{x_2}, \ldots, \mu_{x_p})'$, where $\sigma_{ee}$ is the variance of the $e_t$'s, and $\Sigma_{xx}$ and $\Sigma_{uu}$ are the covariance matrices of the $x_t$'s and the $u_t$'s, respectively. Clearly, $(Y_t, X_t')'$ follows a $(p+1)$-variate normal distribution with mean vector $(\beta_0 + \beta'\mu_x, \mu_x')'$ and covariance matrix

$$\begin{pmatrix} \sigma_{ee} + \beta'\Sigma_{xx}\beta & \beta'\Sigma_{xx} \\ \Sigma_{xx}\beta & \Sigma_{xx} + \Sigma_{uu} \end{pmatrix}. \tag{2.3}$$

Then the conditional distribution of $Y_t$ given $X_t$ is

$$N\{\gamma_0 + \gamma'X_t;\ \sigma_{zz}\}, \qquad \sigma_{zz} = \sigma_{ee} + \beta'\Sigma_{xx}(I_p - K_{xx})\beta, \tag{2.4}$$

where

$$K_{xx} = \Sigma_{XX}^{-1}\Sigma_{xx} = (\Sigma_{xx} + \Sigma_{uu})^{-1}\Sigma_{xx} \tag{2.5}$$

is the $p \times p$ matrix of reliability ratios of $X$; see Gleser (1992).
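The reliability matrix in (2.5) is straightforward to compute numerically. The following sketch (with illustrative covariance values, not taken from the paper; numpy assumed available) evaluates $K_{xx}$ for a small $p = 2$ example:

```python
import numpy as np

# Hypothetical covariance matrices for p = 2 true regressors and their
# measurement errors (illustrative values only).
Sigma_xx = np.array([[2.0, 0.5],
                     [0.5, 1.0]])
Sigma_uu = np.array([[0.4, 0.0],
                     [0.0, 0.2]])

# Reliability matrix of (2.5): K_xx = (Sigma_xx + Sigma_uu)^{-1} Sigma_xx
Sigma_XX = Sigma_xx + Sigma_uu
K_xx = np.linalg.solve(Sigma_XX, Sigma_xx)

# With no measurement error (Sigma_uu = 0), K_xx would be the identity;
# measurement error pulls its eigenvalues below one.
print(K_xx)
```

The eigenvalues of $K_{xx}$ lie in $(0, 1)$, which is the multivariate analogue of the familiar scalar attenuation (reliability) ratio.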

Further, we note that

$$\gamma_0 = \beta_0 + \beta'(I_p - K_{xx})\mu_x, \qquad \gamma = K_{xx}\beta, \qquad \beta = K_{xx}^{-1}\gamma. \tag{2.6}$$

Our basic problem is the estimation of $\beta$ under various situations, beginning with the primary estimation of $\beta$ assuming $\Sigma_{uu}$ is known, which is as follows. Let

$$S = \begin{pmatrix} S_{YY} & S_{YX} \\ S_{XY} & S_{XX} \end{pmatrix}, \tag{2.7}$$

where

(i) $S_{YY} = (Y - \bar{Y}1_n)'(Y - \bar{Y}1_n)$, with $Y = (Y_1, Y_2, \ldots, Y_n)'$ and $1_n = (1, 1, \ldots, 1)'$;
(ii) $S_{XX} = ((S_{X_iX_j}))$, $S_{X_iX_j} = (X_i - \bar{X}_i 1_n)'(X_j - \bar{X}_j 1_n)$;
(iii) $S_{X_iY} = (X_i - \bar{X}_i 1_n)'(Y - \bar{Y}1_n)$, $S_{XY} = (S_{X_1Y}, S_{X_2Y}, \ldots, S_{X_pY})'$;
(iv) $\bar{X}_i = \frac{1}{n}\sum_{t=1}^n X_{it}$, $\bar{Y} = \frac{1}{n}\sum_{t=1}^n Y_t$.

Clearly, $\frac{1}{n-1}S_{XX}$ is an unbiased estimator of $\Sigma_{XX}$, and $\frac{1}{n}S_{XX} \stackrel{P}{\to} \Sigma_{XX}$ as $n \to \infty$, where $\stackrel{P}{\to}$ denotes convergence in probability. Gleser (1992) showed that the maximum likelihood estimators of $\gamma_0$, $\gamma$ and $\sigma_{zz}$ are just the naive least squares estimators, viz.,

$$\tilde{\gamma}_{0n} = \bar{Y} - \tilde{\gamma}_n'\bar{X}, \qquad \tilde{\gamma}_n = S_{XX}^{-1}S_{XY}, \tag{2.8}$$

$$\tilde{\sigma}_{zz} = \frac{1}{n}\,(Y - \tilde{\gamma}_{0n}1_n - X\tilde{\gamma}_n)'(Y - \tilde{\gamma}_{0n}1_n - X\tilde{\gamma}_n),$$

provided

$$\tilde{\sigma}_{ee} = \tilde{\sigma}_{zz} - \tilde{\gamma}_n'K_{xx}^{-1}\Sigma_{uu}\tilde{\gamma}_n \ge 0. \tag{2.9}$$

When $\Sigma_{uu}$ is known and $K_{xx}$ is unknown, $K_{xx}$ is estimated by

$$\hat{K}_{xx} = S_{XX}^{-1}(S_{XX} - n\Sigma_{uu}). \tag{2.10}$$

Further, $\hat{K}_{xx} \stackrel{P}{\to} K_{xx}$ as $n \to \infty$. Thus, the maximum likelihood estimates of $\beta_0$, $\beta$

and $\sigma_{ee}$ are given by

$$\tilde{\beta}_{0n} = \tilde{\gamma}_{0n} - \tilde{\beta}_n'(I_p - \hat{K}_{xx})\bar{X}, \qquad \tilde{\beta}_n = \hat{K}_{xx}^{-1}\tilde{\gamma}_n, \qquad \tilde{\sigma}_{ee} = \tilde{\sigma}_{zz} - \tilde{\beta}_n'\Sigma_{uu}\hat{K}_{xx}\tilde{\beta}_n, \tag{2.11}$$

respectively. Finally, $\tilde{\beta}_{0n}$ reduces to $\bar{Y} - \tilde{\beta}_n'\bar{X}$, and

$$\tilde{\beta}_n = (S_{XX} - n\Sigma_{uu})^{-1}S_{XY}, \tag{2.12}$$

provided $\tilde{\sigma}_{ee} \ge 0$ as in (2.9). These estimators will be designated as the unrestricted estimators of $\beta_0$ and $\beta$. Then, by Theorem 2 of Fuller (1987), we have as $n \to \infty$:

(i) $\sqrt{n}(\tilde{\gamma}_n - \gamma)$ is normally distributed with mean $0$ and covariance matrix $\sigma_{zz}\Sigma_{XX}^{-1}$;
(ii) $\sqrt{n}(\tilde{\beta}_n - \beta)$ is normally distributed with mean $0$ and covariance matrix $\sigma_{zz}C^{-1}$, where

$$C = K_{xx}'\Sigma_{XX}K_{xx}. \tag{2.13}$$

Let $C_n$ be a consistent estimator of $C$ given by

$$C_n = \hat{K}_{xx}'\hat{\Sigma}_{XX}\hat{K}_{xx} = \frac{1}{n}(S_{XX} - n\Sigma_{uu})'S_{XX}^{-1}(S_{XX} - n\Sigma_{uu}).$$

Thus, using (2.10), we may write $C_n = C + o_p(1)$.

In case $\beta$ is suspected to belong to the linear subspace $H\beta = h$, where $H$ is a $q \times p$ matrix and $h$ is a $q \times 1$ vector of known numbers, the restricted estimator of $\beta$ is defined by

$$\hat{\beta}_n = \tilde{\beta}_n - C_n^{-1}H'(HC_n^{-1}H')^{-1}(H\tilde{\beta}_n - h); \tag{2.14}$$

see Saleh and Shiraishi (1989). Under $H_0$, as $n \to \infty$, we have

$$\begin{pmatrix} \sqrt{n}(\tilde{\beta}_n - \beta) \\ \sqrt{n}(\hat{\beta}_n - \beta) \\ \sqrt{n}(\tilde{\beta}_n - \hat{\beta}_n) \end{pmatrix} \stackrel{D}{\to} N_{3p}\left\{ \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},\ \sigma_{zz}\begin{pmatrix} C^{-1} & C^{-1} - A & A \\ C^{-1} - A & C^{-1} - A & 0 \\ A & 0 & A \end{pmatrix} \right\}, \tag{2.15}$$
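As a quick numerical illustration of why the correction in (2.12) matters, the sketch below (a simulation with illustrative parameter values, not from the paper) compares the naive least squares estimator of (2.8) with the measurement-error-corrected estimator of (2.12):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5000, 2
beta = np.array([1.0, -2.0])          # true slope vector (illustrative)
Sigma_uu = np.diag([0.3, 0.3])        # known measurement-error covariance

x = rng.normal(size=(n, p))           # true (unobserved) regressors
X = x + rng.multivariate_normal(np.zeros(p), Sigma_uu, size=n)  # observed
Y = x @ beta + rng.normal(scale=0.5, size=n)

Xc = X - X.mean(axis=0)
Yc = Y - Y.mean()
S_XX = Xc.T @ Xc                      # centered cross-products as in (2.7)
S_XY = Xc.T @ Yc

gamma_n = np.linalg.solve(S_XX, S_XY)               # naive LS (2.8): attenuated
beta_n = np.linalg.solve(S_XX - n * Sigma_uu, S_XY) # corrected estimator (2.12)

# beta_n should land much closer to beta than the naive estimate does.
print(gamma_n, beta_n)
```

With these settings the naive slopes are visibly shrunk toward zero, while the corrected estimator is consistent.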

where $A = C^{-1}H'(HC^{-1}H')^{-1}HC^{-1}$. Since it is suspected that the restriction $H\beta = h$ may hold, we remove the suspicion by testing the hypothesis $H_0$ with the Wald-type statistic

$$L_n = \frac{(H\tilde{\beta}_n - h)'(HC_n^{-1}H')^{-1}(H\tilde{\beta}_n - h)}{\tilde{\sigma}_{zz}}. \tag{2.16}$$

Under $H_0$, as $n \to \infty$, $L_n \stackrel{D}{\to} \chi_q^2$, a chi-square variable with $q$ degrees of freedom, where $\stackrel{D}{\to}$ denotes convergence in distribution.

3 Ridge Regression Estimators of $\beta$

In this section, we introduce the ridge regression estimators of $\beta$. For this, we first consider the conditional set-up of the least squares method with known reliability matrix $K_{xx}$ and minimize the quadratic form with the ridge penalty,

$$(XK_{xx}\beta + \gamma_0 1_n - Y)'(XK_{xx}\beta + \gamma_0 1_n - Y) + k\beta'\beta.$$

This minimization yields the normal equations for $\beta$ as

$$\left[K_{xx}'S_{XX}K_{xx} + kI_p\right]\beta = K_{xx}'X'Y.$$

Thus the ridge regression estimator of $\beta$ is given by

$$\tilde{\beta}_n(k) = \left[I_p + k\left(\hat{K}_{xx}'S_{XX}\hat{K}_{xx}\right)^{-1}\right]^{-1}\tilde{\beta}_n, \tag{3.1}$$

substituting the estimator of $K_{xx}$ given by (2.10), with $\tilde{\beta}_n = (S_{XX} - n\Sigma_{uu})^{-1}S_{XY}$. Here, the ridge factor of the ridge estimator is given by

$$R_n(k) = \left[I_p + kC_n^{-1}\right]^{-1}, \qquad C_n = \hat{K}_{xx}'S_{XX}\hat{K}_{xx}, \tag{3.2}$$

which is a consistent estimator of

$$R(k) = \left[I_p + kC^{-1}\right]^{-1}, \qquad C = K_{xx}'\Sigma_{XX}K_{xx}.$$
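The ridge factor (3.2) and the estimator (3.1) can be sketched in a few lines. The numbers below are illustrative stand-ins (a nearly collinear cross-product matrix), not values from the paper:

```python
import numpy as np

# Illustrative inputs: small n, nearly collinear observed regressors.
n = 100
S_XX = np.array([[120.0, 110.0],
                 [110.0, 115.0]])
S_XY = np.array([35.0, 30.0])
Sigma_uu = np.diag([0.05, 0.05])

beta_n = np.linalg.solve(S_XX - n * Sigma_uu, S_XY)   # unrestricted estimator (2.12)

K_hat = np.linalg.solve(S_XX, S_XX - n * Sigma_uu)    # estimated reliability (2.10)
C_n = K_hat.T @ S_XX @ K_hat                          # C_n of (3.2)

def ridge_estimator(k):
    """R_n(k) beta_n with ridge factor R_n(k) = [I_p + k C_n^{-1}]^{-1}."""
    p = C_n.shape[0]
    R_k = np.linalg.inv(np.eye(p) + k * np.linalg.inv(C_n))
    return R_k @ beta_n

print(ridge_estimator(0.0), ridge_estimator(5.0))
```

At $k = 0$ the ridge estimator reduces to the unrestricted estimator; as $k$ grows, the eigenvalues $\lambda_i/(\lambda_i + k)$ of $R_n(k)$ shrink the estimate toward zero.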

Based on this factor, the unrestricted ridge regression estimator $\tilde{\beta}_n(k)$ is defined by $\tilde{\beta}_n(k) = R_n(k)\tilde{\beta}_n$. It is easy to verify that, as $n \to \infty$, the bias, MSE matrix and trace of the MSE matrix of $\tilde{\beta}_n(k)$ are given by

$$b_1(\tilde{\beta}_n(k)) = -kC_1(k)\beta, \qquad C_1(k) = (C + kI_p)^{-1},$$
$$M_1(\tilde{\beta}_n(k)) = \sigma_{zz}R(k)C^{-1}R(k)' + k^2 C_1(k)\beta\beta'C_1(k),$$
$$\mathrm{tr}\,M_1(\tilde{\beta}_n(k)) = \sigma_{zz}\,\mathrm{tr}\{R(k)C^{-1}R(k)'\} + k^2\beta'C_1^2(k)\beta.$$

Further, since $\beta$ is suspected to belong to the subspace $H\beta = h$, we shall consider four more estimators, viz.:

(i) the restricted ridge regression estimator of $\beta$, given by

$$\hat{\beta}_n(k) = R_n(k)\hat{\beta}_n; \tag{3.3}$$

(ii) the preliminary test estimator (PTE) of $\beta$, given by

$$\hat{\beta}_n^{PT}(k) = R_n(k)\hat{\beta}_n^{PT}, \tag{3.4}$$

where

$$\hat{\beta}_n^{PT} = \tilde{\beta}_n - (\tilde{\beta}_n - \hat{\beta}_n)\,I(L_n < \chi_q^2(\alpha)) \tag{3.5}$$

and $\chi_q^2(\alpha)$ denotes the $\alpha$-level critical value of a chi-square distribution with $q$ degrees of freedom. Preliminary test estimation under the assumption of normally distributed random errors was pioneered by Bancroft (1944) and considered later by Bancroft (1964), Han and Bancroft (1968), Judge and Bock (1978), Giles (1991), Benda (1996), Kibria and Saleh (2006), Saleh (2006), Arashi (2009), and Arashi, Tabatabaey and Soleimani (2012), among others. In the set-up of measurement error models, Kim and Saleh (2002, 2003a, 2003b, 2005, 2008, 2011) have considered preliminary test and Stein-type estimation;

(iii) the James-Stein type shrinkage estimator (SE) of $\beta$, due to James and Stein (1961), given by

$$\hat{\beta}_n^S(k) = R_n(k)\hat{\beta}_n^S, \tag{3.6}$$
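The restriction, test, and switch in (2.14), (2.16) and (3.5) can be sketched directly. All inputs below are illustrative stand-ins; the 5% critical value of $\chi_1^2$ is hardcoded to avoid a scipy dependency:

```python
import numpy as np

# Illustrative inputs for a p = 2, q = 1 preliminary test estimator.
beta_n = np.array([1.2, 0.9])                 # unrestricted estimate
C_n_inv = np.array([[0.04, 0.01],
                    [0.01, 0.05]])            # estimated C_n^{-1}
H = np.array([[1.0, -1.0]])                   # suspected restriction beta_1 = beta_2
h = np.array([0.0])
sigma_zz = 1.0
chi2_crit = 3.841                             # chi^2_1(0.05), hardcoded

# Restricted estimator (2.14): project the unrestricted estimate onto H beta = h
HCH = H @ C_n_inv @ H.T
adj = C_n_inv @ H.T @ np.linalg.solve(HCH, H @ beta_n - h)
beta_restricted = beta_n - adj

# Wald-type statistic (2.16)
resid = H @ beta_n - h
L_n = float(resid @ np.linalg.solve(HCH, resid)) / sigma_zz

# PTE (3.5): keep the restricted estimator unless the test rejects
beta_PT = beta_restricted if L_n < chi2_crit else beta_n
print(L_n, beta_PT)
```

Note that the restricted estimator satisfies $H\hat{\beta}_n = h$ exactly, and the PTE switches between the two candidates according to the outcome of the preliminary test.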

where

$$\hat{\beta}_n^S = \tilde{\beta}_n - (q-2)(\tilde{\beta}_n - \hat{\beta}_n)\,L_n^{-1}. \tag{3.7}$$

The Stein-rule estimation technique has been considered in various models by several researchers; see, e.g., Ohtani (1993), Gruber (1998), Saleh (2006), Shalabh (1998, 2001), Tabatabaey and Arashi (2008), Arashi and Tabatabaey (2009, 2010a, 2010b), Hassanzadeh, Arashi and Tabatabaey (2011) and Arashi (2012), among many others;

(iv) the positive-rule Stein estimator (PRSE) of $\beta$, given by

$$\hat{\beta}_n^{S+}(k) = R_n(k)\hat{\beta}_n^{S+}, \tag{3.8}$$

where

$$\hat{\beta}_n^{S+} = \hat{\beta}_n\,I(L_n < q-2) + \hat{\beta}_n^S\,I(L_n \ge q-2). \tag{3.9}$$

Now we present the asymptotic distributional properties of the five ridge regression estimators. It may be verified that the test based on $L_n$ for testing $H\beta = h$ is consistent as $n \to \infty$. Thus all the quasi-empirical Bayes estimators $\hat{\beta}_n^{PT}$, $\hat{\beta}_n^S$ and $\hat{\beta}_n^{S+}$ are asymptotically equivalent to $\tilde{\beta}_n$, while the asymptotic distribution of $\hat{\beta}_n$ degenerates as $n \to \infty$. To bypass this problem, we consider the asymptotic distribution under the sequence of local alternatives

$$K_{(n)}: H\beta = h + n^{-1/2}\xi, \qquad \xi \in \mathbb{R}^q. \tag{3.10}$$

Thus, using Saleh (2006), we obtain the following theorem concerning the five basic estimators of $\beta$ defined by (2.12), (2.14), (3.5), (3.7) and (3.9).

Theorem 1. Under $\{K_{(n)}\}$ and the basic assumptions of the measurement error model, the following hold:

(a)
$$\begin{pmatrix} \sqrt{n}(\tilde{\beta}_n - \beta) \\ \sqrt{n}(\hat{\beta}_n - \beta) \\ \sqrt{n}(\tilde{\beta}_n - \hat{\beta}_n) \end{pmatrix} \stackrel{D}{\to} N_{3p}\left\{ \begin{pmatrix} 0 \\ -\delta \\ \delta \end{pmatrix},\ \sigma_{zz}\begin{pmatrix} C^{-1} & C^{-1} - A & A \\ C^{-1} - A & C^{-1} - A & 0 \\ A & 0 & A \end{pmatrix} \right\}; \tag{3.11}$$

(b)
$$\lim_{n\to\infty} P\{L_n \le x \mid K_{(n)}\} = H_q(x; \Delta^2), \qquad \Delta^2 = \frac{1}{\sigma_{zz}}\,\delta'C\delta, \qquad \delta = C^{-1}H'(HC^{-1}H')^{-1}\xi,$$
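The shrinkage step in (3.7) and the positive-rule truncation in (3.9) can be sketched as follows; the numbers are illustrative stand-ins for the estimators and the test statistic:

```python
import numpy as np

# Illustrative inputs for the Stein-type estimators (q > 2 required).
q = 4                                         # number of restrictions
L_n = 10.0                                    # value of the Wald statistic (2.16)
beta_n = np.array([1.0, 2.0, 0.5, -1.0])      # unrestricted estimate
beta_restricted = np.array([0.8, 1.6, 0.4, -0.8])

# James-Stein type estimator (3.7): shrink toward the restricted estimator
# by the data-driven factor (q - 2) / L_n.
beta_S = beta_n - (q - 2) * (beta_n - beta_restricted) / L_n

# Positive-rule version (3.9): when L_n < q - 2 the shrinkage factor would
# exceed one (overshoot), so the restricted estimator is used outright.
if L_n < q - 2:
    beta_S_plus = beta_restricted
else:
    beta_S_plus = beta_S
print(beta_S, beta_S_plus)
```

Here $L_n = 10 \ge q - 2 = 2$, so the positive-rule estimator coincides with the Stein estimator; a small $L_n$ (strong evidence for the restriction) would trigger the truncation.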

where $H_q(x; \Delta^2)$ is the c.d.f. of a noncentral chi-square distribution with $q$ degrees of freedom and noncentrality parameter $\Delta^2$;

(c)
$$\lim_{n\to\infty} P\{\sqrt{n}(\hat{\beta}_n^{PT} - \beta) \le x \mid K_{(n)}\} = H_q(\chi_q^2(\alpha); \Delta^2)\, G_p\!\left(x + \delta;\ 0;\ \sigma_{zz}[C^{-1} - A]\right)$$
$$\qquad + \int_{E(\Delta^2)} G_p\!\left(x - C^{-1}H'(HC^{-1}H')^{-1}Z;\ 0;\ \sigma_{zz}[C^{-1} - A]\right) dG_q\!\left(Z;\ 0;\ \sigma_{zz}[HC^{-1}H']\right),$$

where $G_p(x; \mu; \Sigma)$ is the c.d.f. of a $p$-variate normal distribution with mean $\mu$ and covariance matrix $\Sigma$, and

$$E(\Delta^2) = \left\{ Z : (Z + \xi)'(HC^{-1}H')^{-1}(Z + \xi) \ge \chi_q^2(\alpha) \right\};$$

(d) with $Z \sim N_p(0, \sigma_{zz}C^{-1})$,
$$\sqrt{n}(\hat{\beta}_n^S - \beta) \stackrel{D}{\to} Z - (q-2)\,\frac{C^{-1}H'(HC^{-1}H')^{-1}(HZ + \xi)}{(HZ + \xi)'(HC^{-1}H')^{-1}(HZ + \xi)};$$

(e)
$$\sqrt{n}(\hat{\beta}_n^{S+} - \beta) \stackrel{D}{\to} Z - (q-2)\,\frac{C^{-1}H'(HC^{-1}H')^{-1}(HZ + \xi)}{(HZ + \xi)'(HC^{-1}H')^{-1}(HZ + \xi)}\,I\!\left[(HZ + \xi)'(HC^{-1}H')^{-1}(HZ + \xi) \ge q-2\right]$$
$$\qquad - C^{-1}H'(HC^{-1}H')^{-1}(HZ + \xi)\,I\!\left[(HZ + \xi)'(HC^{-1}H')^{-1}(HZ + \xi) < q-2\right].$$

Then we have the following theorem, giving the characteristics of the five basic estimators of $\beta$. The risk functions are computed using a quadratic loss function with a positive semi-definite matrix $Q$. Here $b(\hat{\theta})$, $M(\hat{\theta})$ and $R(\hat{\theta}; Q)$ denote the bias, MSE matrix and risk, respectively, of an estimator $\hat{\theta}$ of a parameter $\theta$.

Theorem 2. Under $\{K_{(n)}\}$ and the assumed regularity conditions, as $n \to \infty$ the following hold:

(a) $b_1(\tilde{\beta}_n) = 0$, $\quad M_1(\tilde{\beta}_n) = \sigma_{zz}C^{-1}$, $\quad R_1(\tilde{\beta}_n; Q) = \sigma_{zz}\,\mathrm{tr}(C^{-1}Q)$.

(b) $b_2(\hat{\beta}_n) = -\delta$, $\quad M_2(\hat{\beta}_n) = \sigma_{zz}[C^{-1} - A] + \delta\delta'$, $\quad R_2(\hat{\beta}_n; Q) = \sigma_{zz}\,\mathrm{tr}[(C^{-1} - A)Q] + \delta'Q\delta$.

(c)
$$b_3(\hat{\beta}_n^{PT}) = -\delta\,H_{q+2}[\chi_q^2(\alpha); \Delta^2],$$
$$M_3(\hat{\beta}_n^{PT}) = \sigma_{zz}C^{-1} - \sigma_{zz}A\,H_{q+2}[\chi_q^2(\alpha); \Delta^2] + \delta\delta'\left\{2H_{q+2}[\chi_q^2(\alpha); \Delta^2] - H_{q+4}[\chi_q^2(\alpha); \Delta^2]\right\},$$
$$R_3(\hat{\beta}_n^{PT}; Q) = \sigma_{zz}\,\mathrm{tr}(C^{-1}Q) - \sigma_{zz}\,\mathrm{tr}(AQ)\,H_{q+2}[\chi_q^2(\alpha); \Delta^2] + (\delta'Q\delta)\left\{2H_{q+2}[\chi_q^2(\alpha); \Delta^2] - H_{q+4}[\chi_q^2(\alpha); \Delta^2]\right\}.$$

(d)
$$b_4(\hat{\beta}_n^S) = -(q-2)\,\delta\,E\!\left[\chi_{q+2}^{-2}(\Delta^2)\right],$$
$$M_4(\hat{\beta}_n^S) = \sigma_{zz}C^{-1} - \sigma_{zz}(q-2)A\left\{2E\!\left[\chi_{q+2}^{-2}(\Delta^2)\right] - (q-2)E\!\left[\chi_{q+2}^{-4}(\Delta^2)\right]\right\}$$
$$\qquad + (q-2)\,\delta\delta'\left\{2E\!\left[\chi_{q+2}^{-2}(\Delta^2)\right] - 2E\!\left[\chi_{q+4}^{-2}(\Delta^2)\right] + (q-2)E\!\left[\chi_{q+4}^{-4}(\Delta^2)\right]\right\},$$
$$R_4(\hat{\beta}_n^S; Q) = \sigma_{zz}\,\mathrm{tr}(C^{-1}Q) - \sigma_{zz}(q-2)\,\mathrm{tr}(AQ)\left\{2E\!\left[\chi_{q+2}^{-2}(\Delta^2)\right] - (q-2)E\!\left[\chi_{q+2}^{-4}(\Delta^2)\right]\right\}$$
$$\qquad + (q-2)\,(\delta'Q\delta)\left\{2E\!\left[\chi_{q+2}^{-2}(\Delta^2)\right] - 2E\!\left[\chi_{q+4}^{-2}(\Delta^2)\right] + (q-2)E\!\left[\chi_{q+4}^{-4}(\Delta^2)\right]\right\}.$$

(e)
$$b_5(\hat{\beta}_n^{S+}) = b_4(\hat{\beta}_n^S) - \delta\left\{H_{q+2}(q-2; \Delta^2) - (q-2)E\!\left[\chi_{q+2}^{-2}(\Delta^2)\,I\!\left(\chi_{q+2}^2(\Delta^2) < q-2\right)\right]\right\},$$
$$M_5(\hat{\beta}_n^{S+}) = M_4(\hat{\beta}_n^S) - \sigma_{zz}A\,E\!\left[\left(1 - (q-2)\chi_{q+2}^{-2}(\Delta^2)\right)^2 I\!\left(\chi_{q+2}^2(\Delta^2) < q-2\right)\right]$$
$$\qquad + \delta\delta'\left\{2E\!\left[\left(1 - (q-2)\chi_{q+4}^{-2}(\Delta^2)\right) I\!\left(\chi_{q+4}^2(\Delta^2) < q-2\right)\right] - E\!\left[\left(1 - (q-2)\chi_{q+4}^{-2}(\Delta^2)\right)^2 I\!\left(\chi_{q+4}^2(\Delta^2) < q-2\right)\right]\right\},$$
$$R_5(\hat{\beta}_n^{S+}; Q) = \mathrm{tr}\!\left[M_4(\hat{\beta}_n^S)Q\right] - \sigma_{zz}\,\mathrm{tr}(AQ)\,E\!\left[\left(1 - (q-2)\chi_{q+2}^{-2}(\Delta^2)\right)^2 I\!\left(\chi_{q+2}^2(\Delta^2) < q-2\right)\right]$$
$$\qquad + (\delta'Q\delta)\left\{2E\!\left[\left(1 - (q-2)\chi_{q+4}^{-2}(\Delta^2)\right) I\!\left(\chi_{q+4}^2(\Delta^2) < q-2\right)\right] - E\!\left[\left(1 - (q-2)\chi_{q+4}^{-2}(\Delta^2)\right)^2 I\!\left(\chi_{q+4}^2(\Delta^2) < q-2\right)\right]\right\}.$$

The following theorem gives the asymptotic distributional bias, MSE matrices and risk expressions for the ridge regression estimators given by (3.3), (3.4), (3.6) and (3.8).

Theorem 3. Under $\{K_{(n)}\}$ and the basic assumptions of the measurement error model, the following hold:

(a)
$$\begin{pmatrix} \sqrt{n}(\tilde{\beta}_n(k) - \beta) \\ \sqrt{n}(\hat{\beta}_n(k) - \beta) \\ \sqrt{n}(\tilde{\beta}_n(k) - \hat{\beta}_n(k)) \end{pmatrix} \stackrel{D}{\to} N_{3p}\left\{ \begin{pmatrix} -kC_1(k)\beta \\ -kC_1(k)\beta - R(k)\delta \\ R(k)\delta \end{pmatrix},\ \sigma_{zz}\Sigma^* \right\}, \tag{3.12}$$

where $C_1(k) = (C + kI_p)^{-1}$ and

$$\Sigma^* = \begin{pmatrix} R(k)C^{-1}R(k)' & R(k)(C^{-1} - A)R(k)' & R(k)AR(k)' \\ R(k)(C^{-1} - A)R(k)' & R(k)(C^{-1} - A)R(k)' & 0 \\ R(k)AR(k)' & 0 & R(k)AR(k)' \end{pmatrix}.$$

(b) The asymptotic distributional bias, MSE matrices and risk expressions are given by:

(i)
$$b_1(\tilde{\beta}_n(k)) = -kC_1(k)\beta,$$
$$M_1(\tilde{\beta}_n(k)) = \sigma_{zz}R(k)C^{-1}R(k)' + k^2 C_1(k)\beta\beta'C_1(k),$$
$$R_1(\tilde{\beta}_n(k); W) = \sigma_{zz}\,\mathrm{tr}\!\left[W R(k)C^{-1}R(k)'\right] + k^2\beta'C_1(k)WC_1(k)\beta.$$

(ii)
$$b_2(\hat{\beta}_n(k)) = -kC_1(k)\beta - R(k)\delta,$$
$$M_2(\hat{\beta}_n(k)) = \sigma_{zz}R(k)[C^{-1} - A]R(k)' + \left[kC_1(k)\beta + R(k)\delta\right]\left[kC_1(k)\beta + R(k)\delta\right]',$$
$$R_2(\hat{\beta}_n(k); W) = \sigma_{zz}\,\mathrm{tr}\!\left[W R(k)(C^{-1} - A)R(k)'\right] + \left[kC_1(k)\beta + R(k)\delta\right]'W\left[kC_1(k)\beta + R(k)\delta\right].$$

(iii)
$$b_3(\hat{\beta}_n^{PT}(k)) = -kC_1(k)\beta - R(k)\delta\,H_{q+2}[\chi_q^2(\alpha); \Delta^2],$$
$$M_3(\hat{\beta}_n^{PT}(k)) = \sigma_{zz}R(k)C^{-1}R(k)' - \sigma_{zz}R(k)AR(k)'\,H_{q+2}[\chi_q^2(\alpha); \Delta^2] + R(k)\delta\delta'R(k)'\left\{2H_{q+2}[\chi_q^2(\alpha); \Delta^2] - H_{q+4}[\chi_q^2(\alpha); \Delta^2]\right\}$$
$$\qquad + k^2 C_1(k)\beta\beta'C_1(k) + k\left[R(k)\delta\beta'C_1(k) + C_1(k)\beta\delta'R(k)'\right]H_{q+2}[\chi_q^2(\alpha); \Delta^2],$$
$$R_3(\hat{\beta}_n^{PT}(k); W) = \sigma_{zz}\,\mathrm{tr}\!\left[W R(k)C^{-1}R(k)'\right] - \sigma_{zz}\,\mathrm{tr}\!\left[W R(k)AR(k)'\right]H_{q+2}[\chi_q^2(\alpha); \Delta^2]$$
$$\qquad + \delta'R(k)'WR(k)\delta\left\{2H_{q+2}[\chi_q^2(\alpha); \Delta^2] - H_{q+4}[\chi_q^2(\alpha); \Delta^2]\right\} + k^2\beta'C_1(k)WC_1(k)\beta + 2k\,\delta'R(k)'WC_1(k)\beta\,H_{q+2}[\chi_q^2(\alpha); \Delta^2].$$

(iv)
$$b_4(\hat{\beta}_n^S(k)) = -kC_1(k)\beta - (q-2)R(k)\delta\,E\!\left[\chi_{q+2}^{-2}(\Delta^2)\right],$$
$$M_4(\hat{\beta}_n^S(k)) = \sigma_{zz}R(k)C^{-1}R(k)' - \sigma_{zz}(q-2)R(k)AR(k)'\left\{2E\!\left[\chi_{q+2}^{-2}(\Delta^2)\right] - (q-2)E\!\left[\chi_{q+2}^{-4}(\Delta^2)\right]\right\}$$
$$\qquad + (q^2-4)R(k)\delta\delta'R(k)'\,E\!\left[\chi_{q+4}^{-4}(\Delta^2)\right] + k^2 C_1(k)\beta\beta'C_1(k) + k\left[R(k)\delta\beta'C_1(k) + C_1(k)\beta\delta'R(k)'\right]E\!\left[\chi_{q+2}^{-2}(\Delta^2)\right],$$
$$R_4(\hat{\beta}_n^S(k); W) = \sigma_{zz}\,\mathrm{tr}\!\left[W R(k)C^{-1}R(k)'\right] - \sigma_{zz}(q-2)\,\mathrm{tr}\!\left[W R(k)AR(k)'\right]\left\{2E\!\left[\chi_{q+2}^{-2}(\Delta^2)\right] - (q-2)E\!\left[\chi_{q+2}^{-4}(\Delta^2)\right]\right\}$$
$$\qquad + (q^2-4)\,\mathrm{tr}\!\left[W R(k)\delta\delta'R(k)'\right]E\!\left[\chi_{q+4}^{-4}(\Delta^2)\right] + k^2\beta'C_1(k)WC_1(k)\beta + k\,\mathrm{tr}\!\left\{W\left[R(k)\delta\beta'C_1(k) + C_1(k)\beta\delta'R(k)'\right]\right\}E\!\left[\chi_{q+2}^{-2}(\Delta^2)\right].$$

(v)
$$b_5(\hat{\beta}_n^{S+}(k)) = b_4(\hat{\beta}_n^S(k)) - R(k)\delta\left\{H_{q+2}(q-2; \Delta^2) - (q-2)E\!\left[\chi_{q+2}^{-2}(\Delta^2)\,I\!\left(\chi_{q+2}^2(\Delta^2) < q-2\right)\right]\right\},$$
$$M_5(\hat{\beta}_n^{S+}(k)) = M_4(\hat{\beta}_n^S(k)) - \sigma_{zz}R(k)AR(k)'\,E\!\left[\left(1-(q-2)\chi_{q+2}^{-2}(\Delta^2)\right)^2 I\!\left(\chi_{q+2}^2(\Delta^2) < q-2\right)\right]$$
$$\qquad + R(k)\delta\delta'R(k)'\left\{2E\!\left[\left(1-(q-2)\chi_{q+2}^{-2}(\Delta^2)\right) I\!\left(\chi_{q+2}^2(\Delta^2) < q-2\right)\right] - E\!\left[\left(1-(q-2)\chi_{q+4}^{-2}(\Delta^2)\right)^2 I\!\left(\chi_{q+4}^2(\Delta^2) < q-2\right)\right]\right\}$$
$$\qquad - \left[R(k)\delta\beta'C_1(k) + C_1(k)\beta\delta'R(k)'\right]E\!\left[\left(1-(q-2)\chi_{q+2}^{-2}(\Delta^2)\right) I\!\left(\chi_{q+4}^2(\Delta^2) < q-2\right)\right],$$
$$R_5(\hat{\beta}_n^{S+}(k); W) = R_4(\hat{\beta}_n^S(k); W) - \sigma_{zz}\,\mathrm{tr}\!\left[W R(k)AR(k)'\right]E\!\left[\left(1-(q-2)\chi_{q+2}^{-2}(\Delta^2)\right)^2 I\!\left(\chi_{q+2}^2(\Delta^2) < q-2\right)\right]$$
$$\qquad + \left(\delta'R(k)'WR(k)\delta\right)\left\{2E\!\left[\left(1-(q-2)\chi_{q+2}^{-2}(\Delta^2)\right) I\!\left(\chi_{q+2}^2(\Delta^2) < q-2\right)\right] - E\!\left[\left(1-(q-2)\chi_{q+4}^{-2}(\Delta^2)\right)^2 I\!\left(\chi_{q+4}^2(\Delta^2) < q-2\right)\right]\right\}$$
$$\qquad + 2k\,\delta'R(k)'WC_1(k)\beta\,E\!\left[\left(1-(q-2)\chi_{q+2}^{-2}(\Delta^2)\right) I\!\left(\chi_{q+2}^2(\Delta^2) < q-2\right)\right].$$

4 Comparison of the Estimators of $\beta$

First note that the asymptotic covariance matrix of the unrestricted estimator of $\beta$ is $\sigma_{zz}C^{-1}$, where $C = K_{xx}'\Sigma_{XX}K_{xx}$ is a positive definite matrix. Thus we can find an orthogonal matrix $\Gamma$ such that

$$\Gamma'(K_{xx}'\Sigma_{XX}K_{xx})\Gamma = \Gamma'C\Gamma = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p), \tag{4.1}$$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$ are the characteristic roots of $C$. It is easy to see that the characteristic roots of

$$R(k) = \left[I_p + k\left(K_{xx}'\Sigma_{XX}K_{xx}\right)^{-1}\right]^{-1} \qquad \text{and of} \qquad C + kI_p \tag{4.2}$$

are $\left(\dfrac{\lambda_1}{\lambda_1+k}, \ldots, \dfrac{\lambda_p}{\lambda_p+k}\right)$ and $(\lambda_1+k, \ldots, \lambda_p+k)$, respectively, while those of $R(k)C^{-1}R(k)'$ are $\dfrac{\lambda_i}{(\lambda_i+k)^2}$, $i = 1, \ldots, p$.

Hence we obtain the following identities:

$$\mathrm{tr}\!\left[R(k)C^{-1}R(k)'\right] = \sum_{i=1}^p \frac{\lambda_i}{(\lambda_i+k)^2}, \tag{4.3}$$

$$\beta'C_1^2(k)\beta = \sum_{i=1}^p \frac{\theta_i^2}{(\lambda_i+k)^2}, \qquad \theta = \Gamma'\beta = (\theta_1, \theta_2, \ldots, \theta_p)', \tag{4.4}$$

$$\mathrm{tr}\!\left[R(k)AR(k)'\right] = \sum_{i=1}^p \frac{a_{ii}^*\lambda_i^2}{(\lambda_i+k)^2}, \tag{4.5}$$

where $a_{ii}^* \ge 0$ is the $i$-th diagonal element of $A^* = \Gamma'A\Gamma$, and

$$\delta'R(k)'R(k)\delta = \sum_{i=1}^p \frac{\lambda_i^2\,\delta_i^{*2}}{(\lambda_i+k)^2}, \tag{4.6}$$

where $\delta_i^*$ is the $i$-th element of $\delta^* = \Gamma'\delta$. Similarly,

$$\delta'R(k)'C_1(k)\beta = \sum_{i=1}^p \frac{\theta_i\lambda_i\delta_i^*}{(\lambda_i+k)^2}. \tag{4.7}$$

4.1 Comparison of $\tilde{\beta}_n(k)$ and $\tilde{\beta}_n$

In this case we have the risk of $\tilde{\beta}_n(k)$ as

$$R_1(\tilde{\beta}_n(k); I_p) = \sigma_{zz}\sum_{i=1}^p \frac{\lambda_i}{(\lambda_i+k)^2} + k^2\sum_{i=1}^p \frac{\theta_i^2}{(\lambda_i+k)^2}. \tag{4.8}$$

Clearly, for $k = 0$ the risk equals the risk of $\tilde{\beta}_n$. The first term of (4.8) is a continuous, monotonically decreasing function of $k$, while the second term is a continuous, monotonically increasing function of $k$ whose derivative with respect to $k$ tends to zero as $k \to 0^+$ and which approaches $\beta'\beta$ as $k \to \infty$. Differentiating with respect to $k$, we get

$$\frac{\partial}{\partial k}R_1(\tilde{\beta}_n(k); I_p) = 2\sum_{i=1}^p \frac{\lambda_i}{(\lambda_i+k)^3}\left(k\theta_i^2 - \sigma_{zz}\right). \tag{4.9}$$

Thus a sufficient condition for (4.9) to be negative is that $0 < k < k_0$, where

$$k_0 = \frac{\sigma_{zz}}{\theta_{\max}^2}, \qquad \theta_{\max}^2 = \max_{1\le i\le p}\theta_i^2. \tag{4.10}$$

Thus $R_1(\tilde{\beta}_n; I_p) \ge R_1(\tilde{\beta}_n(k); I_p)$ for $0 < k \le k_0$.
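The dominance condition of Section 4.1 is easy to verify numerically through the eigenvalue form of the risk. In the sketch below, the eigenvalues $\lambda_i$ of $C$ and the rotated coefficients $\theta_i = (\Gamma'\beta)_i$ are illustrative values, not taken from the paper:

```python
import numpy as np

# Illustrative spectrum: one small eigenvalue mimics near-collinearity.
sigma_zz = 1.0
lam = np.array([5.0, 1.0, 0.1])      # eigenvalues lambda_i of C
theta = np.array([0.5, 1.0, 0.2])    # theta = Gamma' beta

def risk(k):
    """R_1(beta_n(k); I_p) as in (4.8):
    sigma_zz * sum lam_i/(lam_i+k)^2 + k^2 * sum theta_i^2/(lam_i+k)^2."""
    return (sigma_zz * np.sum(lam / (lam + k) ** 2)
            + k ** 2 * np.sum(theta ** 2 / (lam + k) ** 2))

k0 = sigma_zz / np.max(theta ** 2)   # sufficient bound (4.10): k_0 = sigma_zz / theta_max^2
# Every k in (0, k0] improves on the k = 0 (unrestricted) risk:
improved = all(risk(k) < risk(0.0) for k in np.linspace(1e-3, k0, 25))
print(k0, improved)
```

The derivative (4.9) is negative term by term on this interval, so the whole curve $R_1(\tilde{\beta}_n(k); I_p)$ sits below its $k = 0$ value there.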

4.2 Comparison of $\hat{\beta}_n(k)$ and $\hat{\beta}_n$

In this case, assume first that $H\beta = h$ holds. Then

$$R_2(\hat{\beta}_n(k); I_p) = \sigma_{zz}\sum_{i=1}^p \frac{\lambda_i}{(\lambda_i+k)^2} - \sigma_{zz}\sum_{i=1}^p \frac{a_{ii}^*}{(\lambda_i+k)^2} + k^2\sum_{i=1}^p \frac{\theta_i^2}{(\lambda_i+k)^2} = \sum_{i=1}^p \frac{\sigma_{zz}(\lambda_i - a_{ii}^*) + k^2\theta_i^2}{(\lambda_i+k)^2}. \tag{4.11}$$

Differentiating with respect to $k$, we have

$$\frac{\partial}{\partial k}R_2(\hat{\beta}_n(k); I_p) = 2\sum_{i=1}^p \frac{k\theta_i^2\lambda_i - \sigma_{zz}(\lambda_i - a_{ii}^*)}{(\lambda_i+k)^3}. \tag{4.12}$$

It is obvious that a sufficient condition for the derivative (4.12) to be negative is that $0 < k < k_1^*$, where

$$k_1^* = \frac{\sigma_{zz}\min_{1\le i\le p}(\lambda_i - a_{ii}^*)}{\max_{1\le j\le p}(\theta_j^2\lambda_j)}. \tag{4.13}$$

It is clear that $k_1^* \le k_0$, since $(\lambda_i - a_{ii}^*)/\lambda_i \le 1$ for all $a_{ii}^* \ge 0$ and $\lambda_i > 0$, $i = 1, 2, \ldots, p$. Hence the range of values of $k$ for the dominance of $\hat{\beta}_n(k)$ over $\hat{\beta}_n$ is smaller than that for the dominance of $\tilde{\beta}_n(k)$ over $\tilde{\beta}_n$. Thus, one concludes that the restricted ridge regression estimator has smaller risk than the unrestricted ridge regression estimator under $H\beta = h$.

Now, under the alternative hypothesis $H\beta \ne h$, we may write

$$R_2(\hat{\beta}_n(k); I_p) = \sum_{i=1}^p \frac{\sigma_{zz}(\lambda_i - a_{ii}^*) + k^2\theta_i^2 + \lambda_i^2\delta_i^{*2} + 2k\theta_i\delta_i^*\lambda_i}{(\lambda_i+k)^2}. \tag{4.14}$$

Thus, differentiating (4.14) with respect to $k$, we obtain a sufficient condition for $\partial R_2(\hat{\beta}_n(k); I_p)/\partial k$ to be negative as $k \in (0, k_2^*)$, where

$$k_2^* = \frac{\min_{1\le i\le p}\left[\sigma_{zz}(\lambda_i - a_{ii}^*) + \lambda_i^2\delta_i^*(\delta_i^* - \theta_i)\right]}{\max_{1\le i\le p}\left[\lambda_i\theta_i(\theta_i - \delta_i^*)\right]}. \tag{4.15}$$

Thus a sufficient condition for the restricted ridge regression estimator to have smaller risk than the unrestricted ridge regression estimator is that there exists a value of $k$ such that $0 < k < \bar{k}_1$, where

$$\bar{k}_1 = \frac{\min_{1\le i\le p}\left[\sigma_{zz}a_{ii}^* - \lambda_i^2\delta_i^{*2}\right]}{\max_{1\le i\le p}\left(2\theta_i\delta_i^*\lambda_i\right)}. \tag{4.16}$$
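The eigenvalue form of the risk difference between the unrestricted and restricted ridge estimators (equations (4.16)-(4.17)) can be checked numerically. All inputs below are illustrative stand-ins for $\lambda_i$, $a_{ii}^*$, $\delta_i^*$ and $\theta_i$:

```python
import numpy as np

# Illustrative spectral quantities (not from the paper).
sigma_zz = 1.0
lam = np.array([4.0, 0.5])           # eigenvalues lambda_i of C
a_star = np.array([0.8, 0.3])        # diagonal of Gamma' A Gamma
delta_star = np.array([0.1, 0.05])   # rotated restriction bias Gamma' delta
theta = np.array([0.6, 0.9])         # rotated coefficients Gamma' beta

def risk_gap(k):
    """R_1(beta_n(k); I_p) - R_2(beta_hat_n(k); I_p), the eigenvalue form (4.17)."""
    num = sigma_zz * a_star - lam ** 2 * delta_star ** 2 - 2 * k * theta * lam * delta_star
    return float(np.sum(num / (lam + k) ** 2))

# Bound (4.16): the restricted ridge estimator dominates for 0 < k < k1_bar.
k1_bar = (np.min(sigma_zz * a_star - lam ** 2 * delta_star ** 2)
          / np.max(2 * theta * delta_star * lam))
print(k1_bar, risk_gap(0.5 * k1_bar))
```

For any $k$ below the bound, each summand of the gap is positive, so the restricted ridge estimator has the smaller risk even off the restriction, as long as the restriction bias $\delta^*$ is small.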

We conclude that $R_1(\tilde{\beta}_n(k); I_p) - R_2(\hat{\beta}_n(k); I_p) \ge 0$ for all $k$ such that $0 < k < \bar{k}_1$, since

$$R_1(\tilde{\beta}_n(k); I_p) - R_2(\hat{\beta}_n(k); I_p) = \sum_{i=1}^p \frac{\sigma_{zz}a_{ii}^* - \lambda_i^2\delta_i^{*2} - 2k\theta_i\lambda_i\delta_i^*}{(\lambda_i+k)^2}. \tag{4.17}$$

4.3 Comparison of $\hat{\beta}_n^{PT}(k)$ and $\hat{\beta}_n^{PT}$

Under $H_0: H\beta = h$, we obtain

$$R_3(\hat{\beta}_n^{PT}(k); I_p) = \sigma_{zz}\sum_{i=1}^p \frac{\lambda_i - a_{ii}^*H_{q+2}[\chi_q^2(\alpha); 0]}{(\lambda_i+k)^2} + k^2\sum_{i=1}^p \frac{\theta_i^2}{(\lambda_i+k)^2}.$$

Then, taking the derivative with respect to $k$, we have

$$\frac{\partial}{\partial k}R_3(\hat{\beta}_n^{PT}(k); I_p) = 2\sum_{i=1}^p \frac{1}{(\lambda_i+k)^3}\left\{k\theta_i^2\lambda_i - \sigma_{zz}\left[\lambda_i - a_{ii}^*H_{q+2}(\chi_q^2(\alpha); 0)\right]\right\}. \tag{4.18}$$

A sufficient condition for (4.18) to be negative is that $k \in (0, k_3^*(\alpha))$, where

$$k_3^*(\alpha) = \frac{\sigma_{zz}\min_{1\le i\le p}\left[\lambda_i - a_{ii}^*H_{q+2}(\chi_q^2(\alpha); 0)\right]}{\max_{1\le i\le p}\left(\theta_i^2\lambda_i\right)}, \qquad 0 \le \alpha \le 1. \tag{4.19}$$

Note that if $\alpha = 0$, then $R_1(\tilde{\beta}_n(k); I_p) < R_1(\tilde{\beta}_n; I_p)$ under $H_0: H\beta = h$, while for $\alpha = 1$, $R_2(\hat{\beta}_n(k); I_p) < R_2(\hat{\beta}_n; I_p)$.

When the null hypothesis does not hold, we consider the risk difference

$$R_3(\hat{\beta}_n^{PT}; I_p) - R_3(\hat{\beta}_n^{PT}(k); I_p) = \sigma_{zz}\,\mathrm{tr}\!\left[C^{-1} - R(k)C^{-1}R(k)'\right] - \sigma_{zz}\,\mathrm{tr}\!\left[A - R(k)AR(k)'\right]H_{q+2}[\chi_q^2(\alpha); \Delta^2]$$
$$\qquad + \delta'\!\left[I_p - R(k)'R(k)\right]\!\delta\left\{2H_{q+2}[\chi_q^2(\alpha); \Delta^2] - H_{q+4}[\chi_q^2(\alpha); \Delta^2]\right\} - 2k\,\delta'R(k)'C_1(k)\beta\,H_{q+2}[\chi_q^2(\alpha); \Delta^2] - k^2\beta'C_1^2(k)\beta. \tag{4.20}$$

The right-hand side of (4.20) is $\ge 0$ whenever

$$\delta'\!\left[I_p - R(k)'R(k)\right]\!\delta\left\{2H_{q+2}[\chi_q^2(\alpha); \Delta^2] - H_{q+4}[\chi_q^2(\alpha); \Delta^2]\right\} \ge f_1(k, \alpha, \Delta^2), \tag{4.21}$$

where

$$f_1(k, \alpha, \Delta^2) = \sigma_{zz}\,\mathrm{tr}\!\left[A - R(k)AR(k)'\right]H_{q+2}[\chi_q^2(\alpha); \Delta^2] - \sigma_{zz}\,\mathrm{tr}\!\left[C^{-1} - R(k)C^{-1}R(k)'\right]$$
$$\qquad + 2k\,H_{q+2}[\chi_q^2(\alpha); \Delta^2]\,\delta'R(k)'C_1(k)\beta + k^2\beta'C_1^2(k)\beta. \tag{4.22}$$

Thus it can be shown by the Courant-Fisher theorem (Rao (1973)) that $\hat{\beta}_n^{PT}(k)$ is superior to $\hat{\beta}_n^{PT}$ with respect to the criterion of risk whenever $\Delta_1^2(\alpha) \le \Delta^2 < \infty$, where

$$\Delta_1^2(\alpha) = \frac{f_1(k, \alpha, \Delta^2)}{\mathrm{Ch}_{\max}\!\left[\left(I_p - R^2(k)\right)C^{-1}\right]\left\{2H_{q+2}[\chi_q^2(\alpha); \Delta^2] - H_{q+4}[\chi_q^2(\alpha); \Delta^2]\right\}} \tag{4.23}$$

and $\mathrm{Ch}_{\max}(M)$ is the largest characteristic root of the matrix $M$.

Now we consider $R_3(\hat{\beta}_n^{PT}(k); I_p)$, which as a function of the eigenvalues and $k$ is given by

$$R_3(\hat{\beta}_n^{PT}(k); I_p) = \sum_{i=1}^p \frac{1}{(\lambda_i+k)^2}\Big[\sigma_{zz}\left\{\lambda_i - a_{ii}^*H_{q+2}[\chi_q^2(\alpha); \Delta^2]\right\} + k^2\theta_i^2$$
$$\qquad + 2k\theta_i\lambda_i\delta_i^*H_{q+2}[\chi_q^2(\alpha); \Delta^2] + \lambda_i^2\delta_i^{*2}\left\{2H_{q+2}[\chi_q^2(\alpha); \Delta^2] - H_{q+4}[\chi_q^2(\alpha); \Delta^2]\right\}\Big]. \tag{4.24}$$

Differentiating it with respect to $k$, we obtain

$$\frac{\partial}{\partial k}R_3(\hat{\beta}_n^{PT}(k); I_p) = 2\sum_{i=1}^p \frac{1}{(\lambda_i+k)^3}\Big[k\lambda_i\theta_i\left\{\theta_i - \delta_i^*H_{q+2}[\chi_q^2(\alpha); \Delta^2]\right\}$$
$$\qquad - \Big(\sigma_{zz}\left\{\lambda_i - a_{ii}^*H_{q+2}[\chi_q^2(\alpha); \Delta^2]\right\} + \lambda_i^2\delta_i^{*2}\left\{2H_{q+2}[\chi_q^2(\alpha); \Delta^2] - H_{q+4}[\chi_q^2(\alpha); \Delta^2]\right\} - \theta_i\lambda_i^2\delta_i^*H_{q+2}[\chi_q^2(\alpha); \Delta^2]\Big)\Big]. \tag{4.25}$$

Hence we find a sufficient condition on $k$, namely $0 < k < k_2(\alpha, \Delta^2)$, for the derivative $\partial R_3(\hat{\beta}_n^{PT}(k); I_p)/\partial k$ to be negative, where

$$k_2(\alpha, \Delta^2) = \frac{f_2(\alpha, \Delta^2)}{g_1(\alpha, \Delta^2)}, \tag{4.26}$$

with

$$f_2(\alpha, \Delta^2) = \min_{1\le i\le p}\Big[\sigma_{zz}\left(\lambda_i - a_{ii}^*H_{q+2}[\chi_q^2(\alpha); \Delta^2]\right) + \lambda_i^2\delta_i^{*2}\left\{2H_{q+2}[\chi_q^2(\alpha); \Delta^2] - H_{q+4}[\chi_q^2(\alpha); \Delta^2]\right\} - \theta_i\lambda_i^2\delta_i^*H_{q+2}[\chi_q^2(\alpha); \Delta^2]\Big] \tag{4.27}$$

and

$$g_1(\alpha, \Delta^2) = \max_{1\le i\le p}\Big[\lambda_i\theta_i\left\{\theta_i - \delta_i^*H_{q+2}[\chi_q^2(\alpha); \Delta^2]\right\}\Big]. \tag{4.28}$$

Suppose $k > 0$; then the following statements hold true, following Kaciranlar, Sakallioglu, Akdeniz, Styan and Werner (1999):

1. If $g_1(\alpha, \Delta^2) > 0$, it follows that for each $k > 0$ with $k < k_2(\alpha, \Delta^2)$, $\hat{\beta}_n^{PT}(k)$ has smaller risk than $\hat{\beta}_n^{PT}$.
2. If $g_1(\alpha, \Delta^2) < 0$, it follows that for each $k > 0$ with $k > k_2(\alpha, \Delta^2)$, $\hat{\beta}_n^{PT}(k)$ has smaller risk than $\hat{\beta}_n^{PT}$.

4.4 Comparison of $\hat{\beta}_n^{PT}(k)$, $\tilde{\beta}_n(k)$ and $\hat{\beta}_n(k)$

Consider first the comparison of $\hat{\beta}_n^{PT}(k)$ and $\hat{\beta}_n(k)$. Under the alternative hypothesis $H\beta \ne h$, the difference between the risks of $\hat{\beta}_n^{PT}(k)$ and $\hat{\beta}_n(k)$ is

$$R_3(\hat{\beta}_n^{PT}(k); I_p) - R_2(\hat{\beta}_n(k); I_p) = \sigma_{zz}\,\mathrm{tr}\!\left[R(k)AR(k)'\right]\left[1 - H_{q+2}(\chi_q^2(\alpha); \Delta^2)\right]$$
$$\qquad - \delta'R(k)'R(k)\delta\left[1 - 2H_{q+2}(\chi_q^2(\alpha); \Delta^2) + H_{q+4}(\chi_q^2(\alpha); \Delta^2)\right] - 2k\,\delta'R(k)'C_1(k)\beta\left[1 - H_{q+2}(\chi_q^2(\alpha); \Delta^2)\right]. \tag{4.29}$$

The expression is nonnegative whenever

$$\delta'R(k)'R(k)\delta \le \frac{\left\{\sigma_{zz}\,\mathrm{tr}\!\left[R(k)AR(k)'\right] - 2k\,\delta'R(k)'C_1(k)\beta\right\}\left[1 - H_{q+2}(\chi_q^2(\alpha); \Delta^2)\right]}{1 - 2H_{q+2}(\chi_q^2(\alpha); \Delta^2) + H_{q+4}(\chi_q^2(\alpha); \Delta^2)}. \tag{4.30}$$

Using the Courant-Fisher theorem, we obtain that the right-hand side of (4.30) holds according as

$$\Delta^2 \le \Delta_2^2(\alpha, k) = \frac{\left\{\sigma_{zz}\,\mathrm{tr}\!\left[R(k)AR(k)'\right] - 2k\,\delta'R(k)'C_1(k)\beta\right\}\left[1 - H_{q+2}(\chi_q^2(\alpha); \Delta^2)\right]}{\mathrm{Ch}_{\max}\!\left[R(k)'R(k)C^{-1}\right]\left[1 - 2H_{q+2}(\chi_q^2(\alpha); \Delta^2) + H_{q+4}(\chi_q^2(\alpha); \Delta^2)\right]}. \tag{4.31}$$

Thus $\hat{\beta}_n^{PT}(k)$ is dominated by $\hat{\beta}_n(k)$ whenever $\Delta^2 \in (0, \Delta_2^2(\alpha, k))$. When $\alpha = 1$, $\hat{\beta}_n^{PT}(k)$ is dominated by $\hat{\beta}_n(k)$ whenever $\Delta^2 \in (0, \Delta_3^2(1))$, where

$$\Delta_3^2(1) = \frac{\sigma_{zz}\,\mathrm{tr}\!\left[R(k)AR(k)'\right] - 2k\,\delta'R(k)'C_1(k)\beta}{\mathrm{Ch}_{\max}\!\left[R(k)'R(k)C^{-1}\right]}. \tag{4.32}$$

Rewriting the expression (4.29) in terms of the eigenvalues and $k$, we obtain

$$R_3(\hat{\beta}_n^{PT}(k); I_p) - R_2(\hat{\beta}_n(k); I_p) = \sum_{i=1}^p \frac{1}{(\lambda_i+k)^2}\Big[\sigma_{zz}a_{ii}^*\left[1 - H_{q+2}(\chi_q^2(\alpha); \Delta^2)\right]$$
$$\qquad - \lambda_i^2\delta_i^{*2}\left[1 - 2H_{q+2}(\chi_q^2(\alpha); \Delta^2) + H_{q+4}(\chi_q^2(\alpha); \Delta^2)\right] - 2k\theta_i\lambda_i\delta_i^*\left[1 - H_{q+2}(\chi_q^2(\alpha); \Delta^2)\right]\Big],$$

and the right-hand side is negative when $k > k_3(\Delta^2, \alpha)$, where

$$k_3(\Delta^2, \alpha) = \frac{\max_{1\le i\le p}\Big[\sigma_{zz}a_{ii}^*\left[1 - H_{q+2}(\chi_q^2(\alpha); \Delta^2)\right] - \lambda_i^2\delta_i^{*2}\left[1 - 2H_{q+2}(\chi_q^2(\alpha); \Delta^2) + H_{q+4}(\chi_q^2(\alpha); \Delta^2)\right]\Big]}{\min_{1\le i\le p}\Big[2\theta_i\lambda_i\delta_i^*\left[1 - H_{q+2}(\chi_q^2(\alpha); \Delta^2)\right]\Big]}.$$

Thus $\hat{\beta}_n^{PT}(k)$ dominates $\hat{\beta}_n(k)$ whenever $k_3(\Delta^2, \alpha) < k$; otherwise the reverse holds true. For $\alpha = 1$, we find that $\hat{\beta}_n^{PT}(k)$ is dominated by $\hat{\beta}_n(k)$ when $k < k_4(\Delta^2, 1)$, where

$$k_4(\Delta^2, 1) = \frac{\max_{1\le i\le p}\left[\sigma_{zz}a_{ii}^* - \lambda_i^2\delta_i^{*2}\right]}{\min_{1\le i\le p}\left(2\theta_i\lambda_i\delta_i^*\right)}. \tag{4.33}$$

Under the null hypothesis $H_0: H\beta = h$, $\hat{\beta}_n(k)$ is superior to $\hat{\beta}_n^{PT}(k)$, since the risk difference equals

$$\sum_{i=1}^p \frac{\sigma_{zz}a_{ii}^*\left\{1 - H_{q+2}[\chi_q^2(\alpha); 0]\right\}}{(\lambda_i+k)^2} \ge 0.$$

Now we consider the comparison between the risks of $\hat{\beta}_n^{PT}(k)$ and $\tilde{\beta}_n(k)$ as follows:

$$R_3(\hat{\beta}_n^{PT}(k); I_p) - R_1(\tilde{\beta}_n(k); I_p) = -\sigma_{zz}\,\mathrm{tr}\!\left[R(k)AR(k)'\right]H_{q+2}[\chi_q^2(\alpha); \Delta^2]$$
$$\qquad + \delta'R(k)'R(k)\delta\left\{2H_{q+2}[\chi_q^2(\alpha); \Delta^2] - H_{q+4}[\chi_q^2(\alpha); \Delta^2]\right\} + 2k\,\delta'R(k)'C_1(k)\beta\,H_{q+2}[\chi_q^2(\alpha); \Delta^2].$$

The expression on the right-hand side is nonnegative whenever

$$\delta'R(k)'R(k)\delta \ge \frac{\left\{\sigma_{zz}\,\mathrm{tr}\!\left[R(k)AR(k)'\right] - 2k\,\delta'R(k)'C_1(k)\beta\right\}H_{q+2}[\chi_q^2(\alpha); \Delta^2]}{2H_{q+2}[\chi_q^2(\alpha); \Delta^2] - H_{q+4}[\chi_q^2(\alpha); \Delta^2]}. \tag{4.34}$$

The use of the Courant-Fisher theorem once again yields that (4.34) holds according as

$$\Delta^2 \ge \Delta_4^2(\alpha, k) = \frac{\left\{\sigma_{zz}\,\mathrm{tr}\!\left[R(k)AR(k)'\right] - 2k\,\delta'R(k)'C_1(k)\beta\right\}H_{q+2}[\chi_q^2(\alpha); \Delta^2]}{\mathrm{Ch}_{\max}\!\left[R(k)'R(k)C^{-1}\right]\left\{2H_{q+2}[\chi_q^2(\alpha); \Delta^2] - H_{q+4}[\chi_q^2(\alpha); \Delta^2]\right\}}. \tag{4.35}$$

Thus $\hat{\beta}_n^{PT}(k)$ is dominated by $\tilde{\beta}_n(k)$ whenever $\Delta^2 \in (0, \Delta_4^2(\alpha, k))$. Rewriting the expression in (4.34) in terms of the eigenvalues and $k$, we obtain

$$R_3(\hat{\beta}_n^{PT}(k); I_p) - R_1(\tilde{\beta}_n(k); I_p) = \sum_{i=1}^p \frac{1}{(\lambda_i+k)^2}\Big[\sigma_{zz}\lambda_i^2 a_{ii}^*\,H_{q+2}[\chi_q^2(\alpha); \Delta^2]$$
$$\qquad - \lambda_i^2\delta_i^{*2}\left\{2H_{q+2}[\chi_q^2(\alpha); \Delta^2] - H_{q+4}[\chi_q^2(\alpha); \Delta^2]\right\} - 2k\theta_i\lambda_i\delta_i^*\,H_{q+2}[\chi_q^2(\alpha); \Delta^2]\Big], \tag{4.36}$$

and the right-hand side is negative, meaning thereby that $\hat{\beta}_n^{PT}(k)$ dominates $\tilde{\beta}_n(k)$, when $k_5(\Delta^2, \alpha) < k$, where

$$k_5(\Delta^2, \alpha) = \frac{\max_{1\le i\le p}\Big[\sigma_{zz}\lambda_i^2 a_{ii}^*\,H_{q+2}[\chi_q^2(\alpha); \Delta^2] - \lambda_i^2\delta_i^{*2}\left\{2H_{q+2}[\chi_q^2(\alpha); \Delta^2] - H_{q+4}[\chi_q^2(\alpha); \Delta^2]\right\}\Big]}{\min_{1\le i\le p}\Big[2\theta_i\lambda_i\delta_i^*\,H_{q+2}[\chi_q^2(\alpha); \Delta^2]\Big]}, \tag{4.37}$$

and $\tilde{\beta}_n(k)$ dominates $\hat{\beta}_n^{PT}(k)$ whenever $k < k_5(\Delta^2, \alpha)$.

Now we consider the relative efficiency (RE) of $\hat{\beta}_n^{PT}(k)$ compared with $\tilde{\beta}_n(k)$. Accordingly, we provide a maximum-and-minimum (Max and Min) rule for the optimum choice of the level of significance of $\hat{\beta}_n^{PT}(k)$ for testing the null hypothesis $H_0: H\beta = h$. For a fixed value of $k$ $(k > 0)$, this RE is a function of $\alpha$ and $\Delta^2$. Let us denote it by

$$E(\alpha, \Delta^2, k) = \frac{R_1(\tilde{\beta}_n(k); I_p)}{R_3(\hat{\beta}_n^{PT}(k); I_p)} = \left[1 - \frac{f_{14}(\alpha, k, \Delta^2)}{\sigma_{zz}\,\mathrm{tr}\!\left[R(k)C^{-1}R(k)'\right] + k^2\beta'C_1^2(k)\beta}\right]^{-1}, \tag{4.38}$$

where

$$f_{14}(\alpha, k, \Delta^2) = \sigma_{zz}\,\mathrm{tr}\!\left[R(k)AR(k)'\right]H_{q+2}[\chi_q^2(\alpha); \Delta^2]$$
$$\qquad - \delta'R(k)'R(k)\delta\left\{2H_{q+2}[\chi_q^2(\alpha); \Delta^2] - H_{q+4}[\chi_q^2(\alpha); \Delta^2]\right\} - 2k\,H_{q+2}[\chi_q^2(\alpha); \Delta^2]\,\delta'R(k)'C_1(k)\beta. \tag{4.39}$$

For a given $k$, the function $E(\alpha, \Delta^2, k)$ is a function of $\alpha$ and $\Delta^2$. For $\alpha \ne 0$, this function attains its maximum under the null hypothesis, with value

$$E_{\max}(\alpha, 0, k) = \left[1 - \frac{\sigma_{zz}\,\mathrm{tr}\!\left[R(k)AR(k)'\right]H_{q+2}[\chi_q^2(\alpha); 0]}{\sigma_{zz}\,\mathrm{tr}\!\left[R(k)C^{-1}R(k)'\right] + k^2\beta'C_1^2(k)\beta}\right]^{-1}. \tag{4.40}$$

For given $k$, $E_{\max}(\alpha, 0, k)$ is a decreasing function of $\alpha$, while the minimum efficiency $E_{\min}$ is an increasing function of $\alpha$. For $\alpha \ne 0$, as $\Delta^2$ varies, the graphs of $E(0, \Delta^2, k)$ and $E(1, \Delta^2, k)$ intersect in the range $0 < \Delta^2 < \Delta_3^2(1)$, given in (4.32). Therefore, in order to choose an estimator with optimum relative efficiency, we adopt the following rule for fixed values of $k$: if $0 < \Delta^2 < \Delta_3^2(1)$, we choose $\hat{\beta}_n(k)$, since $E(0, \Delta^2, k)$ is the largest in this interval. However, $\Delta^2$ is unknown, and there is no way of choosing a uniformly best estimator. Therefore, following Saleh (2006), we use the following criterion for selecting the significance level of the preliminary test. Suppose the experimenter does not know the size $\alpha$ and wants an estimator whose relative efficiency is not less than $E_{\min}$. Then, among the set of estimators with $\alpha \in A$, where $A = \{\alpha : E(\alpha, \Delta^2, k) \ge E_{\min} \text{ for all } \Delta^2\}$, the estimator is chosen to maximize $E(\alpha, \Delta^2, k)$ over all $\alpha \in A$ and all $\Delta^2$. Thus we solve for $\alpha$ from the equation

$$\max_{0 \le \alpha \le 1}\ \min_{\Delta^2}\ E(\alpha, \Delta^2, k) = E_{\min}. \tag{4.41}$$

The solution $\alpha^*$ of (4.41) gives the optimum choice of $\alpha$, and the value $\Delta^2 = \Delta^2_{\min}$ for which (4.41) is satisfied. The maximum and minimum guaranteed efficiencies for $p = 6$, $q = 4$, $\sigma_{zz} = 1$ and different values of $k$ are presented in Table 1. We observe that the maximum guaranteed efficiency is a decreasing function of $\alpha$, the size of the test, whereas the minimum guaranteed efficiency is an increasing function of $\alpha$.
Suppose, for example, that $p = 6$, $q = 4$, $\sigma_{zz} = 1$ and $k = 0.01$, and the experimenter wishes to have an estimator with a prescribed minimum guaranteed efficiency. When $p = 6$, $q = 4$, $\sigma_{zz} = 1$ and $k = 0.50$, the size of the test is $\alpha = 0.15$, with the corresponding minimum and maximum guaranteed efficiencies given in Table 1; with these conditions, we choose $\alpha^* = \min(0.05, 0.15) = 0.05$. Similarly, when $p = 6$, $q = 4$, $k = 0.01$ and $\sigma_{zz} = 1$, the size of the test is $\alpha = 0.50$, with the corresponding minimum and maximum guaranteed efficiencies given in Table 1; with these conditions, we choose $\alpha^* = \min(0.01, 0.50) = 0.01$.

Table 1: Maximum and minimum guaranteed efficiencies of the PTRRRE for $p = 6$, $q = 4$ and selected values of $k$, $\sigma_{zz}$ and $\alpha$ ($\alpha$ = 1%, 5%, 10%, 15%, 20%, 25%, 30%, 50%). [For each combination of $k$ and $\sigma_{zz}$ the table reports $E_{\max}$, $E_{\min}$ and $\Delta^2_{\min}$; numerical entries not reproduced.]

4.5 Comparison of $\hat{\beta}_n^S(k)$ with $\hat{\beta}_n^S$, $\tilde{\beta}_n(k)$ and $\hat{\beta}_n^{PT}(k)$

Comparison of $\hat{\beta}_n^S(k)$ and $\hat{\beta}_n^S$.

Case 1: under the null hypothesis $H_0: H\beta = h$. Using Theorem 3, the risk function of $\hat{\beta}_n^S(k)$ can be expressed as

$$R_4(\hat{\beta}_n^S(k); I_p) = \sum_{i=1}^p \frac{\sigma_{zz}\left[\lambda_i - (q-2)a_{ii}^*\right] + k^2\theta_i^2}{(\lambda_i+k)^2}. \tag{4.42}$$

Differentiating (4.42) with respect to $k$ gives

$$\frac{\partial}{\partial k}R_4(\hat{\beta}_n^S(k); I_p) = 2\sum_{i=1}^p \frac{k\theta_i^2\lambda_i - \sigma_{zz}\left[\lambda_i - (q-2)a_{ii}^*\right]}{(\lambda_i+k)^3}. \tag{4.43}$$

A sufficient condition for (4.43) to be negative is that $k \in (0, k_6)$, where

$$k_6 = \frac{\sigma_{zz}\min_{1\le i\le p}\left[\lambda_i - (q-2)a_{ii}^*\right]}{\max_{1\le i\le p}\left(\theta_i^2\lambda_i\right)}. \tag{4.44}$$

Thus $\hat{\beta}_n^S(k)$ dominates $\hat{\beta}_n^S$ for $k \in (0, k_6)$.

Case 2: under the alternative hypothesis $H_1: H\beta \ne h$. Using Theorem 3 again, the risk function of $\hat{\beta}_n^S(k)$ can be expressed as

$$R_4(\hat{\beta}_n^S(k); I_p) = \sum_{i=1}^p \frac{1}{(\lambda_i+k)^2}\Big[\sigma_{zz}\left\{\lambda_i - (q-2)a_{ii}^*\left(2E\!\left[\chi_{q+2}^{-2}(\Delta^2)\right] - (q-2)E\!\left[\chi_{q+2}^{-4}(\Delta^2)\right]\right)\right\}$$
$$\qquad + (q^2-4)\lambda_i^2\delta_i^{*2}E\!\left[\chi_{q+4}^{-4}(\Delta^2)\right] + 2(q-2)k\theta_i\lambda_i\delta_i^*E\!\left[\chi_{q+2}^{-2}(\Delta^2)\right] + k^2\theta_i^2\Big]. \tag{4.45}$$

Differentiating (4.45) with respect to $k$ gives

$$\frac{\partial}{\partial k}R_4(\hat{\beta}_n^S(k); I_p) = 2\sum_{i=1}^p \frac{1}{(\lambda_i+k)^3}\Big[k\lambda_i\theta_i\left\{\theta_i - (q-2)\delta_i^*E\!\left[\chi_{q+2}^{-2}(\Delta^2)\right]\right\}$$
$$\qquad - \Big(\sigma_{zz}\left\{\lambda_i - (q-2)a_{ii}^*\left(2E\!\left[\chi_{q+2}^{-2}(\Delta^2)\right] - (q-2)E\!\left[\chi_{q+2}^{-4}(\Delta^2)\right]\right)\right\} + (q^2-4)\lambda_i^2\delta_i^{*2}E\!\left[\chi_{q+4}^{-4}(\Delta^2)\right] - (q-2)\theta_i\lambda_i^2\delta_i^*E\!\left[\chi_{q+2}^{-2}(\Delta^2)\right]\Big)\Big]. \tag{4.46}$$

A sufficient condition for (4.46) to be negative is that $k < k_7(\Delta^2)$, where
$$
k_7(\Delta^2)=\frac{f_3(\Delta^2)}{g_2(\Delta^2)},
\tag{4.47}
$$
$$
f_3(\Delta^2)=\min_{1\le i\le p}\Big\{\sigma_{zz}\lambda_i
-(q-2)\sigma_{zz}a_{ii}\lambda_i^2\big[2E[\chi^{-2}_{q+2}(\Delta^2)]-(q-2)E[\chi^{-4}_{q+2}(\Delta^2)]\big]
+(q^2-4)\lambda_i^2\delta_i^2E[\chi^{-4}_{q+4}(\Delta^2)]
-(q-2)\theta_i\lambda_i^2\delta_iE[\chi^{-2}_{q+2}(\Delta^2)]\Big\}
\tag{4.48}
$$
and
$$
g_2(\Delta^2)=\max_{1\le i\le p}\Big[\lambda_i\theta_i\big\{\theta_i-(q-2)\delta_iE[\chi^{-2}_{q+2}(\Delta^2)]\big\}\Big].
\tag{4.49}
$$
Suppose $k > 0$; then the following statements hold true, following Kaciranlar, Sakallioglu, Akdeniz, Styan and Werner (1999):

1. If $g_2(\Delta^2) > 0$, it follows that for each $k > 0$ with $k < k_7(\Delta^2)$, $\hat\beta_n^S(k)$ has smaller risk than $\hat\beta_n^S$.

2. If $g_2(\Delta^2) < 0$, it follows that for each $k > 0$ with $k > k_7(\Delta^2)$, $\hat\beta_n^S(k)$ has smaller risk than $\hat\beta_n^S$.

To find a sufficient condition on $\Delta^2$, consider the difference in the risks of $\hat\beta_n^S(k)$ and $\hat\beta_n^S$:
$$
R(\hat\beta_n^S(k);I_p)-R(\hat\beta_n^S;I_p)
=\sigma_{zz}\big[\operatorname{tr}(R(k)C^{-1}R(k)')-\operatorname{tr}(C^{-1})\big]
+(q-2)\sigma_{zz}\big[\operatorname{tr}(A)-\operatorname{tr}(R(k)AR(k)')\big]
\big\{2E[\chi^{-2}_{q+2}(\Delta^2)]-(q-2)E[\chi^{-4}_{q+2}(\Delta^2)]\big\}
-(q^2-4)\,\delta'[I_p-R(k)'R(k)]\delta\,E[\chi^{-4}_{q+4}(\Delta^2)]
+k^2\beta'C^{-2}(k)\beta
+2(q-2)k\,\delta'R(k)'C^{-1}(k)\beta\,E[\chi^{-2}_{q+2}(\Delta^2)].
\tag{4.50}
$$
The difference will be non-positive when
$$
\delta'[I_p-R(k)'R(k)]\delta \;\ge\; \frac{f_4(\Delta^2,k)}{(q^2-4)E[\chi^{-4}_{q+4}(\Delta^2)]},
\tag{4.51}
$$

where
$$
f_4(\Delta^2,k)=\sigma_{zz}\big[\operatorname{tr}(R(k)C^{-1}R(k)')-\operatorname{tr}(C^{-1})\big]
+(q-2)\sigma_{zz}\big[\operatorname{tr}(A)-\operatorname{tr}(R(k)AR(k)')\big]
\big\{2E[\chi^{-2}_{q+2}(\Delta^2)]-(q-2)E[\chi^{-4}_{q+2}(\Delta^2)]\big\}
+k^2\beta'C^{-2}(k)\beta
+2(q-2)k\,\delta'R(k)'C^{-1}(k)\beta\,E[\chi^{-2}_{q+2}(\Delta^2)].
$$
Since $\Delta^2 > 0$, we assume that the numerator of (4.51) is positive. Then $\hat\beta_n^S$ is superior to $\hat\beta_n^S(k)$ when
$$
\Delta^2 \le \frac{f_4(\Delta^2,k)}{(q^2-4)\,\mathrm{Ch}_{\max}\big[(I_p-R(k)'R(k))C^{-1}\big]\,E[\chi^{-4}_{q+4}(\Delta^2)]}=\Delta^2_5(k),\ \text{say},
$$
where $\mathrm{Ch}_{\max}(M)$ is the maximum characteristic root of the matrix $M$. However, $\hat\beta_n^S(k)$ is superior to $\hat\beta_n^S$ when
$$
\Delta^2 \ge \frac{f_4(\Delta^2,k)}{(q^2-4)\,\mathrm{Ch}_{\min}\big[(I_p-R(k)'R(k))C^{-1}\big]\,E[\chi^{-4}_{q+4}(\Delta^2)]},
$$
where $\mathrm{Ch}_{\min}(M)$ is the minimum characteristic root of the matrix $M$.

4.6 Comparison of $\hat\beta_n^S(k)$ with $\tilde\beta_n(k)$ and $\hat\beta_n(k)$:

4.6.1 Comparison of $\hat\beta_n^S(k)$ and $\tilde\beta_n(k)$:

First we compare $\hat\beta_n^S(k)$ and $\tilde\beta_n(k)$. Consider the difference in the risks of $\hat\beta_n^S(k)$ and $\tilde\beta_n(k)$:
$$
R_1(\tilde\beta_n(k);I_p)-R_4(\hat\beta_n^S(k);I_p)
=\sum_{i=1}^{p}\frac{1}{(\lambda_i+k)^2}
\Big\{(q-2)\sigma_{zz}\lambda_i^2a_{ii}\big[2E[\chi^{-2}_{q+2}(\Delta^2)]-(q-2)E[\chi^{-4}_{q+2}(\Delta^2)]\big]
-(q^2-4)\lambda_i^2\delta_i^2E[\chi^{-4}_{q+4}(\Delta^2)]
-2(q-2)k\theta_i\lambda_i\delta_iE[\chi^{-2}_{q+2}(\Delta^2)]\Big\}.
\tag{4.52}
$$

A sufficient condition for the risk difference (4.52) to be non-negative is that $0 < k < k_8(\Delta^2)$, where
$$
k_8(\Delta^2)=\frac{f_5(\Delta^2)}{g_3(\Delta^2)},
\tag{4.53}
$$
$$
f_5(\Delta^2)=\min_{1\le i\le p}\Big\{\sigma_{zz}a_{ii}\lambda_i^2\big[2E[\chi^{-2}_{q+2}(\Delta^2)]-(q-2)E[\chi^{-4}_{q+2}(\Delta^2)]\big]
-(q+2)\lambda_i^2\delta_i^2E[\chi^{-4}_{q+4}(\Delta^2)]\Big\}
\tag{4.54}
$$
and
$$
g_3(\Delta^2)=\max_{1\le i\le p}\Big[2\theta_i\lambda_i\delta_iE[\chi^{-2}_{q+2}(\Delta^2)]\Big].
\tag{4.55}
$$
Suppose $k > 0$; then the following statements hold true, following Kaciranlar, Sakallioglu, Akdeniz, Styan and Werner (1999):

1. If $g_3(\Delta^2) > 0$, it follows that for each $k > 0$ with $k < k_8(\Delta^2)$, $\hat\beta_n^S(k)$ has smaller risk than $\tilde\beta_n(k)$.

2. If $g_3(\Delta^2) < 0$, it follows that for each $k > 0$ with $k > k_8(\Delta^2)$, $\hat\beta_n^S(k)$ has smaller risk than $\tilde\beta_n(k)$.

Note that the risk difference (4.52) under $H_0: H\beta = h$ is
$$
\frac{(q-2)\sigma_{zz}}{q}\sum_{i=1}^{p}\frac{\lambda_i^2a_{ii}}{(\lambda_i+k)^2}\;\ge\;0.
$$
Therefore $\hat\beta_n^S(k)$ always dominates $\tilde\beta_n(k)$ under the null hypothesis for $q \ge 3$.
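The thresholds $k_7(\Delta^2)$ and $k_8(\Delta^2)$ involve expectations of inverse (noncentral) chi-square variables such as $E[\chi^{-2}_{q+2}(\Delta^2)]$ and $E[\chi^{-4}_{q+2}(\Delta^2)]$. These have no elementary closed form for $\Delta^2 > 0$, but they are easy to approximate by Monte Carlo; a sketch, with illustrative $q$ and $\Delta^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
q, delta2 = 4, 2.5  # illustrative values, not taken from the paper

# Draws from a noncentral chi-square with q+2 d.f. and noncentrality delta2.
draws = rng.noncentral_chisquare(df=q + 2, nonc=delta2, size=200_000)
E_inv = np.mean(1.0 / draws)      # approximates E[chi^{-2}_{q+2}(Delta^2)]
E_inv2 = np.mean(1.0 / draws**2)  # approximates E[chi^{-4}_{q+2}(Delta^2)]

# Sanity check in the central case (Delta^2 = 0): E[chi^{-2}_nu] = 1/(nu - 2),
# so for nu = q + 2 the exact value is 1/q.
central = rng.chisquare(df=q + 2, size=200_000)
E_inv_central = np.mean(1.0 / central)
```

Noncentrality pushes the distribution to the right, so `E_inv` falls below the central value $1/q$; plugging such estimates into (4.48), (4.49), (4.54) and (4.55) gives numerical values for $k_7$ and $k_8$.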

4.6.2 Comparison of $\hat\beta_n^S(k)$ and $\hat\beta_n(k)$:

Now consider the comparison of $\hat\beta_n^S(k)$ and $\hat\beta_n(k)$. Consider the difference in the risks of $\hat\beta_n^S(k)$ and $\hat\beta_n(k)$:
$$
R_4(\hat\beta_n^S(k);I_p)-R_2(\hat\beta_n(k);I_p)
=\sum_{i=1}^{p}\frac{1}{(\lambda_i+k)^2}
\Big\{\sigma_{zz}\lambda_i^2a_{ii}\big[1-(q-2)\big(2E[\chi^{-2}_{q+2}(\Delta^2)]-(q-2)E[\chi^{-4}_{q+2}(\Delta^2)]\big)\big]
+(q^2-4)\lambda_i^2\delta_i^2E[\chi^{-4}_{q+4}(\Delta^2)]
-\lambda_i^2\delta_i^2
-2k\theta_i\lambda_i\delta_i\big[1-(q-2)E[\chi^{-2}_{q+2}(\Delta^2)]\big]\Big\}.
\tag{4.56}
$$
A sufficient condition for the difference to be non-negative is that $0 < k < k_9(\Delta^2)$, where
$$
k_9(\Delta^2)=\frac{f_6(\Delta^2)}{g_4(\Delta^2)},
\tag{4.57}
$$
$$
f_6(\Delta^2)=\min_{1\le i\le p}\Big\{\sigma_{zz}a_{ii}\lambda_i^2\big[1-(q-2)\big(2E[\chi^{-2}_{q+2}(\Delta^2)]-(q-2)E[\chi^{-4}_{q+2}(\Delta^2)]\big)\big]
+(q^2-4)\lambda_i^2\delta_i^2E[\chi^{-4}_{q+4}(\Delta^2)]
-\lambda_i^2\delta_i^2\Big\}
\tag{4.58}
$$
and
$$
g_4(\Delta^2)=\max_{1\le i\le p}\Big[2\theta_i\lambda_i\delta_i\big\{1-(q-2)E[\chi^{-2}_{q+2}(\Delta^2)]\big\}\Big].
\tag{4.59}
$$
Suppose $k > 0$; then the following statements hold true:

1. If $g_4(\Delta^2) > 0$, it follows that for each $k > 0$ with $k < k_9(\Delta^2)$, $\hat\beta_n^S(k)$ has smaller risk than $\hat\beta_n(k)$.

2. If $g_4(\Delta^2) < 0$, it follows that for each $k > 0$ with $k > k_9(\Delta^2)$, $\hat\beta_n(k)$ has smaller risk than $\hat\beta_n^S(k)$.

Note that the risk difference (4.56) under $H_0: H\beta = h$ is
$$
\frac{2\sigma_{zz}}{q}\sum_{i=1}^{p}\frac{\lambda_i^2a_{ii}}{(\lambda_i+k)^2}\;\ge\;0.
$$

Thus $\hat\beta_n(k)$ is superior to $\hat\beta_n^S(k)$ under $H_0: H\beta = h$.

Consider the difference in the risks of $\hat\beta_n^S(k)$ and $\hat\beta_n^{PT}(k)$ from Theorem 3(iii) and Theorem 3(iv):
$$
R(\hat\beta_n^S(k);I_p)-R(\hat\beta_n^{PT}(k);I_p)
=\sum_{i=1}^{p}\frac{1}{(\lambda_i+k)^2}
\Big\{\sigma_{zz}a_{ii}\lambda_i^2\big[H_{q+2}(\chi^2_q(\alpha);\Delta^2)-(q-2)\big(2E[\chi^{-2}_{q+2}(\Delta^2)]-(q-2)E[\chi^{-4}_{q+2}(\Delta^2)]\big)\big]
+(q^2-4)\lambda_i^2\delta_i^2E[\chi^{-4}_{q+4}(\Delta^2)]
-\lambda_i^2\delta_i^2\big[2H_{q+2}(\chi^2_q(\alpha);\Delta^2)-H_{q+4}(\chi^2_q(\alpha);\Delta^2)\big]
-2k\theta_i\lambda_i\delta_i\big[H_{q+2}(\chi^2_q(\alpha);\Delta^2)-(q-2)E[\chi^{-2}_{q+2}(\Delta^2)]\big]\Big\}.
\tag{4.60}
$$
We define
$$
k_{10}(\alpha,\Delta^2)=\frac{f_8(\alpha,\Delta^2)}{g_5(\alpha,\Delta^2)},
\tag{4.61}
$$
$$
f_8(\alpha,\Delta^2)=\max_{1\le i\le p}\Big\{\sigma_{zz}a_{ii}\lambda_i^2\big[H_{q+2}(\chi^2_q(\alpha);\Delta^2)-(q-2)\big(2E[\chi^{-2}_{q+2}(\Delta^2)]-(q-2)E[\chi^{-4}_{q+2}(\Delta^2)]\big)\big]
+(q^2-4)\lambda_i^2\delta_i^2E[\chi^{-4}_{q+4}(\Delta^2)]
-\lambda_i^2\delta_i^2\big[2H_{q+2}(\chi^2_q(\alpha);\Delta^2)-H_{q+4}(\chi^2_q(\alpha);\Delta^2)\big]\Big\}
\tag{4.62}
$$
and
$$
g_5(\alpha,\Delta^2)=\min_{1\le i\le p}\Big[2\theta_i\lambda_i\delta_i\big\{H_{q+2}(\chi^2_q(\alpha);\Delta^2)-(q-2)E[\chi^{-2}_{q+2}(\Delta^2)]\big\}\Big].
\tag{4.63}
$$
Suppose $k > 0$; then the following statements hold true:

1. If $g_5(\alpha,\Delta^2) > 0$, it follows that for each $k > 0$ with $k > k_{10}(\alpha,\Delta^2)$, $\hat\beta_n^S(k)$ has smaller risk than $\hat\beta_n^{PT}(k)$.

2. If $g_5(\alpha,\Delta^2) < 0$, it follows that for each $k > 0$ with $k < k_{10}(\alpha,\Delta^2)$, $\hat\beta_n^S(k)$ has smaller risk than $\hat\beta_n^{PT}(k)$.

Remark: Now we obtain the conditions for the superiority of $\hat\beta_n^S(k)$ over $\hat\beta_n(k)$ when $\alpha = 0$ and of $\hat\beta_n^S(k)$ over $\tilde\beta_n(k)$ when $\alpha = 1$.
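The preliminary-test comparisons above are driven by the noncentral chi-square cdf $H_{q+2}(\chi^2_q(\alpha);\Delta^2)$ evaluated at the critical value, and under $H_0$ the Stein-versus-PT comparison reduces to comparing $H_{q+2}(\chi^2_q(\alpha);0)$ with $(q-2)/q$. A sketch of evaluating these quantities with SciPy; the values of $q$, $\alpha$ and $\Delta^2$ are illustrative:

```python
from scipy.stats import chi2, ncx2

q, alpha, delta2 = 4, 0.05, 2.5  # illustrative choices

crit = chi2.ppf(1 - alpha, df=q)             # upper-alpha critical value chi2_q(alpha)
H_null = chi2.cdf(crit, df=q + 2)            # H_{q+2}(chi2_q(alpha); 0)
H_alt = ncx2.cdf(crit, df=q + 2, nc=delta2)  # H_{q+2}(chi2_q(alpha); Delta^2)

# The quantity compared with (q-2)/q in the under-H0 analysis:
stein_better_under_H0 = H_null <= (q - 2) / q
```

With $\alpha = 0.05$ the critical value is large, so `H_null` comfortably exceeds $(q-2)/q$ and the preliminary-test version keeps the edge under $H_0$; only a rather large $\alpha$ reverses this.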

The risk difference (4.60) under $H_0: H\beta = h$ reduces to
$$
\sigma_{zz}\Big[H_{q+2}(\chi^2_q(\alpha);0)-\frac{q-2}{q}\Big]\sum_{i=1}^{p}\frac{\lambda_i^2a_{ii}}{(\lambda_i+k)^2}.
$$
Therefore the risk of $\hat\beta_n^S(k)$ is smaller than the risk of $\hat\beta_n^{PT}(k)$ when
$$
\chi^2_q(\alpha)\;\le\;H^{-1}_{q+2}\Big(\frac{q-2}{q};\,0\Big),
\tag{4.64}
$$
where $\chi^2_q(\alpha)$ is the upper $\alpha$-level critical value from the chi-square distribution with $q$ degrees of freedom. Otherwise the risk of $\hat\beta_n^{PT}(k)$ is smaller than the risk of $\hat\beta_n^S(k)$.

4.7 Comparison of $\hat\beta_n^{S+}(k)$ with $\hat\beta_n^{S+}$, $\tilde\beta_n(k)$, $\hat\beta_n(k)$, $\hat\beta_n^{PT}(k)$ and $\hat\beta_n^S(k)$:

4.7.1 Comparison of $\hat\beta_n^{S+}(k)$ and $\hat\beta_n^{S+}$:

Case 1: Under the null hypothesis $H_0: H\beta = h$.

The risk function of $\hat\beta_n^{S+}(k)$ can be expressed as
$$
R(\hat\beta_n^{S+}(k);I_p)=\sum_{i=1}^{p}\frac{1}{(\lambda_i+k)^2}
\Big\{\sigma_{zz}\lambda_i-\sigma_{zz}a_{ii}\lambda_i^2\Big[\frac{q-2}{q}
+E\big[\big(1-(q-2)\chi^{-2}_{q+2}(0)\big)^2 I\big(\chi^2_{q+2}(0)<q-2\big)\big]\Big]
+k^2\theta_i^2\Big\}.
\tag{4.65}
$$
Differentiating (4.65) with respect to $k$ gives
$$
\frac{\partial R(\hat\beta_n^{S+}(k);I_p)}{\partial k}
=2\sum_{i=1}^{p}\frac{1}{(\lambda_i+k)^3}
\Big\{k\lambda_i\theta_i^2-\sigma_{zz}\lambda_i
+\sigma_{zz}a_{ii}\lambda_i^2\Big[\frac{q-2}{q}
+E\big[\big(1-(q-2)\chi^{-2}_{q+2}(0)\big)^2 I\big(\chi^2_{q+2}(0)<q-2\big)\big]\Big]\Big\}.
\tag{4.66}
$$
A sufficient condition for (4.66) to be negative is that $k \in (0, k_{11})$, where
$$
k_{11}=\frac{\sigma_{zz}\min_{1\le i\le p}\Big\{\lambda_i-a_{ii}\lambda_i^2\Big[\frac{q-2}{q}+E\big[\big(1-(q-2)\chi^{-2}_{q+2}(0)\big)^2I\big(\chi^2_{q+2}(0)<q-2\big)\big]\Big]\Big\}}{\max_{1\le i\le p}\lambda_i\theta_i^2}.
\tag{4.67}
$$

Suppose the numerator of (4.67) is positive; then $\hat\beta_n^{S+}(k)$ dominates $\hat\beta_n^{S+}$ when $k > 0$ belongs to the region $(0, k_{11})$.

Case 2: Under the alternative hypothesis $H_1: H\beta \ne h$.

The risk function of $\hat\beta_n^{S+}(k)$ can be expressed as
$$
R(\hat\beta_n^{S+}(k);I_p)=R(\hat\beta_n^S(k);I_p)
-\sum_{i=1}^{p}\frac{1}{(\lambda_i+k)^2}
\Big\{\sigma_{zz}a_{ii}\lambda_i^2E\big[\big(1-(q-2)\chi^{-2}_{q+2}(\Delta^2)\big)^2I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]
+\lambda_i^2\delta_i^2E\big[\big(1-(q-2)\chi^{-2}_{q+4}(\Delta^2)\big)^2I\big(\chi^2_{q+4}(\Delta^2)<q-2\big)\big]
+2\lambda_i^2\delta_i^2E\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]
+2k\theta_i\lambda_i\delta_iE\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]\Big\},
\tag{4.68}
$$
where $R(\hat\beta_n^S(k);I_p)$ is given by (4.45). Differentiating (4.68) with respect to $k$, with $\partial R(\hat\beta_n^S(k);I_p)/\partial k$ given by (4.46), and collecting terms as before, we define
$$
k_{12}(\Delta^2)=\frac{f_9(\Delta^2)}{g_6(\Delta^2)},
\tag{4.69}
$$

where
$$
f_9(\Delta^2)=\min_{1\le i\le p}\Big\{\sigma_{zz}\lambda_i
-(q-2)\sigma_{zz}a_{ii}\lambda_i^2\big[2E[\chi^{-2}_{q+2}(\Delta^2)]-(q-2)E[\chi^{-4}_{q+2}(\Delta^2)]\big]
+(q^2-4)\lambda_i^2\delta_i^2E[\chi^{-4}_{q+4}(\Delta^2)]
-(q-2)\theta_i\lambda_i^2\delta_iE[\chi^{-2}_{q+2}(\Delta^2)]
-\sigma_{zz}a_{ii}\lambda_i^2E\big[\big(1-(q-2)\chi^{-2}_{q+2}(\Delta^2)\big)^2I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]
-\lambda_i^2\delta_i^2E\big[\big(1-(q-2)\chi^{-2}_{q+4}(\Delta^2)\big)^2I\big(\chi^2_{q+4}(\Delta^2)<q-2\big)\big]
-2\lambda_i^2\delta_i^2E\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]
+\theta_i\lambda_i^2\delta_iE\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]\Big\}
\tag{4.70}
$$
and
$$
g_6(\Delta^2)=\max_{1\le i\le p}\Big[\lambda_i\theta_i\Big\{\theta_i-(q-2)\delta_iE[\chi^{-2}_{q+2}(\Delta^2)]
+\delta_iE\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]\Big\}\Big].
\tag{4.72}
$$
Suppose $k > 0$; then the following statements hold true, following Kaciranlar, Sakallioglu, Akdeniz, Styan and Werner (1999):

1. If $g_6(\Delta^2) > 0$, it follows that for each $k > 0$ with $k < k_{12}(\Delta^2)$, $\hat\beta_n^{S+}(k)$ has smaller risk than $\hat\beta_n^{S+}$.

2. If $g_6(\Delta^2) < 0$, it follows that for each $k > 0$ with $k > k_{12}(\Delta^2)$, $\hat\beta_n^{S+}$ has smaller risk than $\hat\beta_n^{S+}(k)$.

To obtain a condition on $\Delta^2$, we consider the risk difference between $\hat\beta_n^{S+}(k)$ and $\hat\beta_n^{S+}$

as follows:
$$
R(\hat\beta_n^{S+}(k);I_p)-R(\hat\beta_n^{S+};I_p)
=\sigma_{zz}\big[\operatorname{tr}(R(k)C^{-1}R(k)')-\operatorname{tr}(C^{-1})\big]
+\sigma_{zz}\big[\operatorname{tr}(A)-\operatorname{tr}(R(k)AR(k)')\big]
\Big\{(q-2)\big(2E[\chi^{-2}_{q+2}(\Delta^2)]-(q-2)E[\chi^{-4}_{q+2}(\Delta^2)]\big)
+E\big[\big(1-(q-2)\chi^{-2}_{q+2}(\Delta^2)\big)^2I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]\Big\}
+k^2\beta'C^{-2}(k)\beta
+2k\,\delta'R(k)'C^{-1}(k)\beta\Big\{(q-2)E[\chi^{-2}_{q+2}(\Delta^2)]
-E\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]\Big\}
-\delta'[I_p-R(k)'R(k)]\delta\,E^*(\Delta^2),
\tag{4.73}
$$
where
$$
E^*(\Delta^2)=(q^2-4)E[\chi^{-4}_{q+4}(\Delta^2)]
-E\big[\big(1-(q-2)\chi^{-2}_{q+4}(\Delta^2)\big)^2I\big(\chi^2_{q+4}(\Delta^2)<q-2\big)\big]
-2E\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big].
$$
The right-hand side of (4.73) is non-positive when
$$
\delta'[I_p-R(k)'R(k)]\delta\;\ge\;\frac{f_{10}(\Delta^2,k)}{E^*(\Delta^2)},
$$
where
$$
f_{10}(\Delta^2,k)=\sigma_{zz}\big[\operatorname{tr}(R(k)C^{-1}R(k)')-\operatorname{tr}(C^{-1})\big]
+\sigma_{zz}\big[\operatorname{tr}(A)-\operatorname{tr}(R(k)AR(k)')\big]
\Big\{(q-2)\big(2E[\chi^{-2}_{q+2}(\Delta^2)]-(q-2)E[\chi^{-4}_{q+2}(\Delta^2)]\big)
+E\big[\big(1-(q-2)\chi^{-2}_{q+2}(\Delta^2)\big)^2I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]\Big\}
+k^2\beta'C^{-2}(k)\beta
+2k\,\delta'R(k)'C^{-1}(k)\beta\Big\{(q-2)E[\chi^{-2}_{q+2}(\Delta^2)]
-E\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]\Big\}.
\tag{4.74}
$$
Since $\Delta^2 > 0$, assume that the numerator and the denominator of (4.74) are both positive or both negative, respectively. Then $\hat\beta_n^{S+}$ dominates $\hat\beta_n^{S+}(k)$ when
$$
\Delta^2<\Delta^2_{9}(k)=\frac{f_{10}(\Delta^2,k)}{\mathrm{Ch}_{\max}\big[(I_p-R(k)'R(k))C^{-1}\big]\,E^*(\Delta^2)}
\tag{4.75}
$$
and $\hat\beta_n^{S+}(k)$ dominates $\hat\beta_n^{S+}$ when
$$
\Delta^2>\Delta^2_{10}(k)=\frac{f_{10}(\Delta^2,k)}{\mathrm{Ch}_{\min}\big[(I_p-R(k)'R(k))C^{-1}\big]\,E^*(\Delta^2)}.
\tag{4.76}
$$

4.7.2 Comparison of $\hat\beta_n^{S+}(k)$ with $\tilde\beta_n(k)$ and $\hat\beta_n(k)$:

Since $\tilde\beta_n(k)$ and $\hat\beta_n(k)$ are particular cases of $\hat\beta_n^{PT}(k)$, the comparison between $\tilde\beta_n(k)$ and $\hat\beta_n^{S+}(k)$ as well as between $\hat\beta_n(k)$ and $\hat\beta_n^{S+}(k)$ can be skipped.

4.7.3 Comparison between $\hat\beta_n^{S+}(k)$ and $\hat\beta_n^{PT}(k)$:

Case 1: Under the null hypothesis $H_0: H\beta = h$.

The risk difference is
$$
R(\hat\beta_n^{S+}(k);I_p)-R(\hat\beta_n^{PT}(k);I_p)
=\sum_{i=1}^{p}\frac{\sigma_{zz}a_{ii}\lambda_i^2}{(\lambda_i+k)^2}
\Big\{H_{q+2}(\chi^2_q(\alpha);0)-\frac{q-2}{q}
-E\big[\big(1-(q-2)\chi^{-2}_{q+2}(0)\big)^2I\big(\chi^2_{q+2}(0)<q-2\big)\big]\Big\}
\;\ge\;0
\tag{4.77}
$$
for all $\alpha$ satisfying the condition
$$
\Big\{\alpha:\ \chi^2_q(\alpha)\ \ge\ H^{-1}_{q+2}\Big(\frac{q-2}{q}
+E\big[\big(1-(q-2)\chi^{-2}_{q+2}(0)\big)^2I\big(\chi^2_{q+2}(0)<q-2\big)\big];\,0\Big)\Big\}.
\tag{4.78}
$$
Thus the risk of $\hat\beta_n^{PT}(k)$ is smaller than the risk of $\hat\beta_n^{S+}(k)$ when the critical value $\chi^2_q(\alpha)$ satisfies the condition (4.78). However, the risk of $\hat\beta_n^{S+}(k)$ is smaller than the risk of $\hat\beta_n^{PT}(k)$ when the critical value $\chi^2_q(\alpha)$ satisfies (4.78) with the reversed inequality sign.

Case 2: Under the alternative hypothesis $H_1: H\beta \ne h$.

Under the alternative hypothesis, the difference in the risks of $\hat\beta_n^{S+}(k)$ and $\hat\beta_n^{PT}(k)$ is
$$
R(\hat\beta_n^{S+}(k);I_p)-R(\hat\beta_n^{PT}(k);I_p)
=\sigma_{zz}\operatorname{tr}\big[R(k)AR(k)'\big]
\Big\{H_{q+2}(\chi^2_q(\alpha);\Delta^2)
-(q-2)\big(2E[\chi^{-2}_{q+2}(\Delta^2)]-(q-2)E[\chi^{-4}_{q+2}(\Delta^2)]\big)
-E\big[\big(1-(q-2)\chi^{-2}_{q+2}(\Delta^2)\big)^2I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]\Big\}
-\delta'R(k)'R(k)\delta
\Big\{2H_{q+2}(\chi^2_q(\alpha);\Delta^2)-H_{q+4}(\chi^2_q(\alpha);\Delta^2)
-(q^2-4)E[\chi^{-4}_{q+4}(\Delta^2)]
+E\big[\big(1-(q-2)\chi^{-2}_{q+4}(\Delta^2)\big)^2I\big(\chi^2_{q+4}(\Delta^2)<q-2\big)\big]
+2E\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]\Big\}
+2k\,\delta'R(k)'C^{-1}(k)\beta
\Big\{(q-2)E[\chi^{-2}_{q+2}(\Delta^2)]-H_{q+2}(\chi^2_q(\alpha);\Delta^2)
-E\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]\Big\}.
\tag{4.79}
$$
The right-hand side of (4.79) will be non-positive when
$$
\delta'R(k)'R(k)\delta\;\ge\;\frac{f_{11}(\alpha,\Delta^2)}{g_7(\alpha,\Delta^2)},
\tag{4.80}
$$
where
$$
f_{11}(\alpha,\Delta^2)=\sigma_{zz}\operatorname{tr}\big[R(k)AR(k)'\big]
\Big\{H_{q+2}(\chi^2_q(\alpha);\Delta^2)
-(q-2)\big(2E[\chi^{-2}_{q+2}(\Delta^2)]-(q-2)E[\chi^{-4}_{q+2}(\Delta^2)]\big)
-E\big[\big(1-(q-2)\chi^{-2}_{q+2}(\Delta^2)\big)^2I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]\Big\}
+2k\,\delta'R(k)'C^{-1}(k)\beta
\Big\{(q-2)E[\chi^{-2}_{q+2}(\Delta^2)]-H_{q+2}(\chi^2_q(\alpha);\Delta^2)
-E\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]\Big\}
\tag{4.81}
$$
and
$$
g_7(\alpha,\Delta^2)=2H_{q+2}(\chi^2_q(\alpha);\Delta^2)-H_{q+4}(\chi^2_q(\alpha);\Delta^2)
-(q^2-4)E[\chi^{-4}_{q+4}(\Delta^2)]
+E\big[\big(1-(q-2)\chi^{-2}_{q+4}(\Delta^2)\big)^2I\big(\chi^2_{q+4}(\Delta^2)<q-2\big)\big]
+2E\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big].
\tag{4.82}
$$

Since $\Delta^2 > 0$, assume that the numerator and the denominator of (4.80) are both positive or both negative, respectively. Then $\hat\beta_n^{S+}(k)$ dominates $\hat\beta_n^{PT}(k)$ when
$$
\Delta^2\ge\Delta^2_{11}(\alpha,k)=\frac{f_{11}(\alpha,\Delta^2)}{\mathrm{Ch}_{\min}\big[R(k)'R(k)C^{-1}\big]\,g_7(\alpha,\Delta^2)}
\tag{4.83}
$$
and $\hat\beta_n^{PT}(k)$ dominates $\hat\beta_n^{S+}(k)$ when
$$
\Delta^2\le\Delta^2_{12}(\alpha,k)=\frac{f_{11}(\alpha,\Delta^2)}{\mathrm{Ch}_{\max}\big[R(k)'R(k)C^{-1}\big]\,g_7(\alpha,\Delta^2)}.
\tag{4.84}
$$
Now consider the difference in the risk functions of $\hat\beta_n^{S+}(k)$ and $\hat\beta_n^{PT}(k)$ as a function of the eigenvalues:
$$
R(\hat\beta_n^{S+}(k);I_p)-R(\hat\beta_n^{PT}(k);I_p)
=\sum_{i=1}^{p}\frac{1}{(\lambda_i+k)^2}
\Big\{\sigma_{zz}a_{ii}\lambda_i^2\big[H_{q+2}(\chi^2_q(\alpha);\Delta^2)-(q-2)\big(2E[\chi^{-2}_{q+2}(\Delta^2)]-(q-2)E[\chi^{-4}_{q+2}(\Delta^2)]\big)\big]
+(q^2-4)\lambda_i^2\delta_i^2E[\chi^{-4}_{q+4}(\Delta^2)]
-\lambda_i^2\delta_i^2\big[2H_{q+2}(\chi^2_q(\alpha);\Delta^2)-H_{q+4}(\chi^2_q(\alpha);\Delta^2)\big]
-\sigma_{zz}a_{ii}\lambda_i^2E\big[\big(1-(q-2)\chi^{-2}_{q+2}(\Delta^2)\big)^2I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]
-\lambda_i^2\delta_i^2E\big[\big(1-(q-2)\chi^{-2}_{q+4}(\Delta^2)\big)^2I\big(\chi^2_{q+4}(\Delta^2)<q-2\big)\big]
-2\lambda_i^2\delta_i^2E\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]
-2k\theta_i\lambda_i\delta_i\big[H_{q+2}(\chi^2_q(\alpha);\Delta^2)-(q-2)E[\chi^{-2}_{q+2}(\Delta^2)]
+E\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]\big]\Big\}.
\tag{4.85}
$$
Now we define
$$
k_{13}(\alpha,\Delta^2)=\frac{f_{12}(\alpha,\Delta^2)}{g_8(\alpha,\Delta^2)},
\tag{4.86}
$$

where
$$
f_{12}(\alpha,\Delta^2)=\max_{1\le i\le p}\Big\{\sigma_{zz}a_{ii}\lambda_i^2\big[H_{q+2}(\chi^2_q(\alpha);\Delta^2)-(q-2)\big(2E[\chi^{-2}_{q+2}(\Delta^2)]-(q-2)E[\chi^{-4}_{q+2}(\Delta^2)]\big)\big]
+(q^2-4)\lambda_i^2\delta_i^2E[\chi^{-4}_{q+4}(\Delta^2)]
-\lambda_i^2\delta_i^2\big[2H_{q+2}(\chi^2_q(\alpha);\Delta^2)-H_{q+4}(\chi^2_q(\alpha);\Delta^2)\big]
-\sigma_{zz}a_{ii}\lambda_i^2E\big[\big(1-(q-2)\chi^{-2}_{q+2}(\Delta^2)\big)^2I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]
-\lambda_i^2\delta_i^2E\big[\big(1-(q-2)\chi^{-2}_{q+4}(\Delta^2)\big)^2I\big(\chi^2_{q+4}(\Delta^2)<q-2\big)\big]
-2\lambda_i^2\delta_i^2E\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]\Big\}
\tag{4.87}
$$
and
$$
g_8(\alpha,\Delta^2)=\min_{1\le i\le p}\Big[2\lambda_i\theta_i\delta_i\big\{H_{q+2}(\chi^2_q(\alpha);\Delta^2)-(q-2)E[\chi^{-2}_{q+2}(\Delta^2)]
+E\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]\big\}\Big].
\tag{4.88}
$$
Suppose $k > 0$; then the following statements hold true, following Kaciranlar, Sakallioglu, Akdeniz, Styan and Werner (1999):

1. If $g_8(\alpha,\Delta^2) > 0$, it follows that for each $k > 0$ with $k < k_{13}(\alpha,\Delta^2)$, $\hat\beta_n^{S+}(k)$ has smaller risk than $\hat\beta_n^{PT}(k)$.

2. If $g_8(\alpha,\Delta^2) < 0$, it follows that for each $k > 0$ with $k > k_{13}(\alpha,\Delta^2)$, $\hat\beta_n^{S+}(k)$ has smaller risk than $\hat\beta_n^{PT}(k)$.

Remark: For $\alpha = 0$, we obtain the condition for the superiority of $\hat\beta_n^{S+}(k)$ over $\hat\beta_n(k)$, and for $\alpha = 1$ we obtain the superiority condition of $\hat\beta_n^{S+}(k)$ over $\tilde\beta_n(k)$.

4.7.4 Comparison of $\hat\beta_n^{S+}(k)$ and $\hat\beta_n^S(k)$:

The risk difference of $\hat\beta_n^{S+}(k)$ and $\hat\beta_n^S(k)$ is
$$
R(\hat\beta_n^{S+}(k);I_p)-R(\hat\beta_n^S(k);I_p)
=-\Big\{\sigma_{zz}\operatorname{tr}\big[R(k)AR(k)'\big]E\big[\big(1-(q-2)\chi^{-2}_{q+2}(\Delta^2)\big)^2I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]
+\delta'R(k)'R(k)\delta\,E\big[\big(1-(q-2)\chi^{-2}_{q+4}(\Delta^2)\big)^2I\big(\chi^2_{q+4}(\Delta^2)<q-2\big)\big]\Big\}
-2\,\delta'R(k)'R(k)\delta\,E\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]
-2k\,\delta'R(k)'C^{-1}(k)\beta\,E\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big].
\tag{4.89}
$$
Case 1: Suppose $\delta'R(k)'C^{-1}(k)\beta > 0$; then the right-hand side of (4.89) is negative, since the expectation of a positive random variable is positive. Thus for all $\Delta^2$ and $k$,
$$
R(\hat\beta_n^{S+}(k);I_p)\;\le\;R(\hat\beta_n^S(k);I_p).
$$
Therefore, under this condition, $\hat\beta_n^{S+}(k)$ not only confirms the inadmissibility of $\hat\beta_n^S(k)$ but also provides a simple superior estimator for ill-conditioned data.

Case 2: Suppose $\delta'R(k)'C^{-1}(k)\beta < 0$; then the right-hand side of (4.89) is positive when
$$
\delta'R(k)'R(k)\delta\;\le\;\frac{f_{13}(\Delta^2,k)}{g_9(\Delta^2)},
\tag{4.90}
$$
where
$$
f_{13}(\Delta^2,k)=-2k\,\delta'R(k)'C^{-1}(k)\beta\,E\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]
-\sigma_{zz}\operatorname{tr}\big[R(k)AR(k)'\big]E\big[\big(1-(q-2)\chi^{-2}_{q+2}(\Delta^2)\big)^2I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big]
$$
and
$$
g_9(\Delta^2)=E\big[\big(1-(q-2)\chi^{-2}_{q+4}(\Delta^2)\big)^2I\big(\chi^2_{q+4}(\Delta^2)<q-2\big)\big]
+2E\big[\big((q-2)\chi^{-2}_{q+2}(\Delta^2)-1\big)I\big(\chi^2_{q+2}(\Delta^2)<q-2\big)\big].
$$

Since $\Delta^2 > 0$, assume that the numerator and the denominator of (4.90) are both positive or both negative, respectively. Then $\hat\beta_n^{S+}(k)$ dominates $\hat\beta_n^S(k)$ when
$$
\Delta^2>\Delta^2_{13}(k)=\frac{f_{13}(\Delta^2,k)}{\mathrm{Ch}_{\min}\big[R(k)'R(k)C^{-1}\big]\,g_9(\Delta^2)}
\tag{4.91}
$$
and $\hat\beta_n^S(k)$ dominates $\hat\beta_n^{S+}(k)$ when
$$
\Delta^2<\Delta^2_{14}(k)=\frac{f_{13}(\Delta^2,k)}{\mathrm{Ch}_{\max}\big[R(k)'R(k)C^{-1}\big]\,g_9(\Delta^2)}.
\tag{4.92}
$$
Thus, it is observed that $\hat\beta_n^{S+}(k)$ does not uniformly dominate $\tilde\beta_n(k)$, $\hat\beta_n(k)$, $\hat\beta_n^{PT}(k)$ and $\hat\beta_n^S(k)$.

5 Summary and Conclusions

In this paper, we have combined the ideas of preliminary-test and Stein-rule estimation with the ridge regression approach to obtain better estimators of the regression parameter $\beta$ in a multiple measurement error model. Accordingly, we considered five ridge regression estimators, namely $\tilde\beta_n(k)$, $\hat\beta_n(k)$, $\hat\beta_n^{PT}(k)$, $\hat\beta_n^S(k)$ and $\hat\beta_n^{S+}(k)$, for estimating $\beta$ when it is suspected that the parameter may belong to the linear subspace defined by $H\beta = h$. The performances of the estimators are compared on the basis of the quadratic risk function under both the null and the alternative hypotheses. Under the restriction $H_0$, $\hat\beta_n(k)$ performed best compared with the other estimators; however, it performed worst as $\Delta^2$ moves away from the origin. Note that under the alternative hypothesis the risk of $\tilde\beta_n(k)$ is constant, while the risk of $\hat\beta_n(k)$ is unbounded as $\Delta^2$ goes to infinity. Also, under $H_0$ the risk of $\hat\beta_n^{PT}(k)$ is smaller than the risks of $\hat\beta_n^S(k)$ and $\hat\beta_n^{S+}(k)$ for $\alpha$ satisfying (4.78) and $q \ge 3$. Thus, none of $\hat\beta_n^{PT}(k)$, $\hat\beta_n^{S+}(k)$ and $\hat\beta_n^S(k)$ dominates the others uniformly. Note that the application of $\hat\beta_n^{S+}(k)$ and $\hat\beta_n^S(k)$ is constrained by the requirement $q \ge 3$, while $\hat\beta_n^{PT}(k)$ does not need such a constraint. However, the choice of the level of significance of the preliminary test has a dramatic impact on the nature of the risk function of the $\hat\beta_n^{PT}(k)$ estimator. Thus, when $q \ge 3$ one would use $\hat\beta_n^{S+}(k)$; otherwise one uses $\hat\beta_n^{PT}(k)$ with some optimum size $\alpha$.
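Section 4.7.4 showed that truncating the Stein shrinkage factor at zero can only help near the null. The mechanism is easy to see in the canonical normal-means problem, a deliberately simplified stand-in for the measurement error model; the dimension and replication count below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
q, reps = 6, 50_000
theta = np.zeros(q)  # H0-type configuration: true mean at the origin

x = rng.normal(theta, 1.0, size=(reps, q))
ssq = np.sum(x**2, axis=1, keepdims=True)
shrink = 1.0 - (q - 2) / ssq                 # Stein shrinkage factor
stein = shrink * x                           # Stein-rule estimate
stein_plus = np.clip(shrink, 0.0, None) * x  # positive-rule (truncated) estimate

# Empirical quadratic risks
risk_stein = np.mean(np.sum((stein - theta) ** 2, axis=1))
risk_plus = np.mean(np.sum((stein_plus - theta) ** 2, axis=1))
```

Empirically `risk_plus` comes out below `risk_stein` at the origin, mirroring the Case 1 dominance of $\hat\beta_n^{S+}(k)$ over $\hat\beta_n^S(k)$: whenever the shrinkage factor goes negative, the positive-rule estimator replaces an overshooting estimate with the restricted one.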


More information

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff IEOR 165 Lecture 7 Bias-Variance Tradeoff 1 Bias-Variance Tradeoff Consider the case of parametric regression with β R, and suppose we would like to analyze the error of the estimate ˆβ in comparison to

More information

A Modern Look at Classical Multivariate Techniques

A Modern Look at Classical Multivariate Techniques A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

ECE 275A Homework 6 Solutions

ECE 275A Homework 6 Solutions ECE 275A Homework 6 Solutions. The notation used in the solutions for the concentration (hyper) ellipsoid problems is defined in the lecture supplement on concentration ellipsoids. Note that θ T Σ θ =

More information

Intermediate Econometrics

Intermediate Econometrics Intermediate Econometrics Heteroskedasticity Text: Wooldridge, 8 July 17, 2011 Heteroskedasticity Assumption of homoskedasticity, Var(u i x i1,..., x ik ) = E(u 2 i x i1,..., x ik ) = σ 2. That is, the

More information

Matrix Algebra, part 2

Matrix Algebra, part 2 Matrix Algebra, part 2 Ming-Ching Luoh 2005.9.12 1 / 38 Diagonalization and Spectral Decomposition of a Matrix Optimization 2 / 38 Diagonalization and Spectral Decomposition of a Matrix Also called Eigenvalues

More information

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

Empirical Economic Research, Part II

Empirical Economic Research, Part II Based on the text book by Ramanathan: Introductory Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 7, 2011 Outline Introduction

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 2 Jakub Mućk Econometrics of Panel Data Meeting # 2 1 / 26 Outline 1 Fixed effects model The Least Squares Dummy Variable Estimator The Fixed Effect (Within

More information

EFFICIENCY of the PRINCIPAL COMPONENT LIU- TYPE ESTIMATOR in LOGISTIC REGRESSION

EFFICIENCY of the PRINCIPAL COMPONENT LIU- TYPE ESTIMATOR in LOGISTIC REGRESSION EFFICIENCY of the PRINCIPAL COMPONEN LIU- YPE ESIMAOR in LOGISIC REGRESSION Authors: Jibo Wu School of Mathematics and Finance, Chongqing University of Arts and Sciences, Chongqing, China, linfen52@126.com

More information

Merging Gini s Indices under Quadratic Loss

Merging Gini s Indices under Quadratic Loss Merging Gini s Indices under Quadratic Loss S. E. Ahmed, A. Hussein Department of Mathematics and Statistics University of Windsor Windsor, Ontario, CANADA N9B 3P4 E-mail: ahmed@uwindsor.ca M. N. Goria

More information

Ridge Estimator in Logistic Regression under Stochastic Linear Restrictions

Ridge Estimator in Logistic Regression under Stochastic Linear Restrictions British Journal of Mathematics & Computer Science 15(3): 1-14, 2016, Article no.bjmcs.24585 ISSN: 2231-0851 SCIENCEDOMAIN international www.sciencedomain.org Ridge Estimator in Logistic Regression under

More information

Review of Econometrics

Review of Econometrics Review of Econometrics Zheng Tian June 5th, 2017 1 The Essence of the OLS Estimation Multiple regression model involves the models as follows Y i = β 0 + β 1 X 1i + β 2 X 2i + + β k X ki + u i, i = 1,...,

More information

Brief Review on Estimation Theory

Brief Review on Estimation Theory Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012 Problem Set #6: OLS Economics 835: Econometrics Fall 202 A preliminary result Suppose we have a random sample of size n on the scalar random variables (x, y) with finite means, variances, and covariance.

More information

Quick Review on Linear Multiple Regression

Quick Review on Linear Multiple Regression Quick Review on Linear Multiple Regression Mei-Yuan Chen Department of Finance National Chung Hsing University March 6, 2007 Introduction for Conditional Mean Modeling Suppose random variables Y, X 1,

More information

Frequentist-Bayesian Model Comparisons: A Simple Example

Frequentist-Bayesian Model Comparisons: A Simple Example Frequentist-Bayesian Model Comparisons: A Simple Example Consider data that consist of a signal y with additive noise: Data vector (N elements): D = y + n The additive noise n has zero mean and diagonal

More information

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation Patrick Breheny February 8 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/27 Introduction Basic idea Standardization Large-scale testing is, of course, a big area and we could keep talking

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Lecture 20 May 18, Empirical Bayes Interpretation [Efron & Morris 1973]

Lecture 20 May 18, Empirical Bayes Interpretation [Efron & Morris 1973] Stats 300C: Theory of Statistics Spring 2018 Lecture 20 May 18, 2018 Prof. Emmanuel Candes Scribe: Will Fithian and E. Candes 1 Outline 1. Stein s Phenomenon 2. Empirical Bayes Interpretation of James-Stein

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Econ 510 B. Brown Spring 2014 Final Exam Answers

Econ 510 B. Brown Spring 2014 Final Exam Answers Econ 510 B. Brown Spring 2014 Final Exam Answers Answer five of the following questions. You must answer question 7. The question are weighted equally. You have 2.5 hours. You may use a calculator. Brevity

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1 Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

More information

18.S096 Problem Set 3 Fall 2013 Regression Analysis Due Date: 10/8/2013

18.S096 Problem Set 3 Fall 2013 Regression Analysis Due Date: 10/8/2013 18.S096 Problem Set 3 Fall 013 Regression Analysis Due Date: 10/8/013 he Projection( Hat ) Matrix and Case Influence/Leverage Recall the setup for a linear regression model y = Xβ + ɛ where y and ɛ are

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Machine Learning And Applications: Supervised Learning-SVM

Machine Learning And Applications: Supervised Learning-SVM Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

MSE Performance and Minimax Regret Significance Points for a HPT Estimator when each Individual Regression Coefficient is Estimated

MSE Performance and Minimax Regret Significance Points for a HPT Estimator when each Individual Regression Coefficient is Estimated This article was downloaded by: [Kobe University] On: 8 April 03, At: 8:50 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 07954 Registered office: Mortimer House,

More information

Advanced Econometrics

Advanced Econometrics Based on the textbook by Verbeek: A Guide to Modern Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna May 16, 2013 Outline Univariate

More information

Singular Value Decomposition Compared to cross Product Matrix in an ill Conditioned Regression Model

Singular Value Decomposition Compared to cross Product Matrix in an ill Conditioned Regression Model International Journal of Statistics and Applications 04, 4(): 4-33 DOI: 0.593/j.statistics.04040.07 Singular Value Decomposition Compared to cross Product Matrix in an ill Conditioned Regression Model

More information

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor - p. 1/15 Today s class Bias-Variance tradeoff. Penalized regression. Cross-validation. - p. 2/15 Bias-variance

More information

SGN Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection

SGN Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection SG 21006 Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection Ioan Tabus Department of Signal Processing Tampere University of Technology Finland 1 / 28

More information

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives

More information

Spatial Process Estimates as Smoothers: A Review

Spatial Process Estimates as Smoothers: A Review Spatial Process Estimates as Smoothers: A Review Soutir Bandyopadhyay 1 Basic Model The observational model considered here has the form Y i = f(x i ) + ɛ i, for 1 i n. (1.1) where Y i is the observed

More information

Linear Model Under General Variance

Linear Model Under General Variance Linear Model Under General Variance We have a sample of T random variables y 1, y 2,, y T, satisfying the linear model Y = X β + e, where Y = (y 1,, y T )' is a (T 1) vector of random variables, X = (T

More information

ECON 4160, Autumn term Lecture 1

ECON 4160, Autumn term Lecture 1 ECON 4160, Autumn term 2017. Lecture 1 a) Maximum Likelihood based inference. b) The bivariate normal model Ragnar Nymoen University of Oslo 24 August 2017 1 / 54 Principles of inference I Ordinary least

More information

Introduction to Estimation Methods for Time Series models. Lecture 1

Introduction to Estimation Methods for Time Series models. Lecture 1 Introduction to Estimation Methods for Time Series models Lecture 1 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 1 SNS Pisa 1 / 19 Estimation

More information

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1 Variable Selection in Restricted Linear Regression Models Y. Tuaç 1 and O. Arslan 1 Ankara University, Faculty of Science, Department of Statistics, 06100 Ankara/Turkey ytuac@ankara.edu.tr, oarslan@ankara.edu.tr

More information

ECON The Simple Regression Model

ECON The Simple Regression Model ECON 351 - The Simple Regression Model Maggie Jones 1 / 41 The Simple Regression Model Our starting point will be the simple regression model where we look at the relationship between two variables In

More information

Advanced Econometrics I

Advanced Econometrics I Lecture Notes Autumn 2010 Dr. Getinet Haile, University of Mannheim 1. Introduction Introduction & CLRM, Autumn Term 2010 1 What is econometrics? Econometrics = economic statistics economic theory mathematics

More information

High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis. Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi

High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis. Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi Faculty of Science and Engineering, Chuo University, Kasuga, Bunkyo-ku,

More information

Projektpartner. Sonderforschungsbereich 386, Paper 163 (1999) Online unter:

Projektpartner. Sonderforschungsbereich 386, Paper 163 (1999) Online unter: Toutenburg, Shalabh: Estimation of Regression Coefficients Subject to Exact Linear Restrictions when some Observations are Missing and Balanced Loss Function is Used Sonderforschungsbereich 386, Paper

More information

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75

More information