Selection of Model Selection Criteria for Multivariate Ridge Regression
(Last Modified: January, 24)

Isamu Nagai
Department of Mathematics, Graduate School of Science, Hiroshima University
1-3-1 Kagamiyama, Higashi-Hiroshima, Hiroshima 739-8626, Japan
E-mail: d09348@hiroshima-u.ac.jp

Abstract

In the present study, we consider the selection of model selection criteria for multivariate ridge regression. There are several model selection criteria for selecting the ridge parameter in multivariate ridge regression, e.g., the C_p criterion and the modified C_p (MC_p) criterion. We propose the generalized C_p (GC_p) criterion, which includes the C_p and MC_p criteria as special cases. The GC_p criterion is specified by a non-negative parameter λ, which is referred to as the penalty parameter. We attempt to select an optimal penalty parameter such that the predictive mean square error (PMSE) of the predictor of ridge regression after optimizing the ridge parameter is minimized. Through numerical experiments, we verify that the proposed optimization methods exhibit better performance than the conventional optimization methods, i.e., optimizing only the ridge parameter by minimizing the C_p or MC_p criterion.

Key words: Asymptotic expansion; Generalized C_p criterion; Model selection criterion; Multivariate linear regression model; Ridge regression; Selection of the model selection criterion.

1. Introduction

In the present paper, we deal with a multivariate linear regression model with n observations of a p-dimensional vector of response variables and a k-dimensional vector of regressors (for more detailed information, see, for example, Srivastava, 2002, Chapter 9; Timm, 2002, Chapter 4). Let Y = (y_1, ..., y_n)', X, and E = (ε_1, ..., ε_n)' be the n × p matrix of response variables, the n × k matrix of non-stochastic centered explanatory variables (i.e., X'1_n = 0_k) of rank(X) = k (< n), and the n × p matrix of error variables, respectively, where n is the sample size, 1_n is an n-dimensional vector of ones, and 0_k is a k-dimensional vector of zeros. Suppose that n − k − p − 2 > 0 and ε_1, ..., ε_n are i.i.d. N_p(0_p, Σ), where Σ

is a p × p unknown covariance matrix. Then, the matrix form of the multivariate linear regression model is expressed as Y = 1_n μ' + XΞ + E, where μ is a p-dimensional unknown location vector and Ξ is a k × p unknown regression coefficient matrix. This model can also be expressed as Y ~ N_{n×p}(1_n μ' + XΞ, Σ ⊗ I_n). Note that X is centered. The maximum likelihood or least squares (LS) estimators of μ and Ξ are given by μ̂ = Y'1_n/n and Ξ̂ = (X'X)^{-1}X'Y, respectively. Since μ̂ and Ξ̂ are simple and are the unbiased estimators of μ and Ξ, the LS estimators are widely used in actual data analysis (see, e.g., Dien et al., 2006; Sârbu et al., 2008; Saxén and Sundell, 2006; Skagerberg, MacGregor, and Kiparissides, 1992; Yoshimoto, Yanagihara, and Ninomiya, 2005). However, it is well known that the estimator of Ξ becomes unstable when multicollinearity occurs in X. In order to avoid this problem, ridge regression was proposed by Hoerl and Kennard (1970) for the case p = 1. Several studies extended this univariate ridge regression to the multivariate case, e.g., Brown and Zidek (1980), Haitovsky (1987), and Yanagihara and Satoh (2010). The ridge regression estimator of Ξ is given as

Ξ̂_θ = M_θ^{-1} X'Y, where M_θ = X'X + θ I_k,

and θ is a nonnegative value, which is referred to as a ridge parameter. Since the estimate Ξ̂_θ depends strongly on the value of the ridge parameter, the optimization of θ is an important problem in ridge regression. An optimal θ is commonly determined by minimizing the predictive mean square error (PMSE) of the predictor of Y, i.e., Ŷ_θ = 1_n μ̂' + X Ξ̂_θ. However, we cannot directly use the PMSE to optimize θ, because unknown parameters are included in the PMSE. Hence, we adopt an optimization method using a model selection criterion, i.e., an estimator of the PMSE, instead of the unknown PMSE. As an estimator of the PMSE, Yanagihara and Satoh (2010) proposed a C_p criterion. This criterion includes, as special cases, the C_p criterion for selecting variables in a univariate linear model proposed by Mallows (1973; 1995) and that for selecting variables in a multivariate linear model proposed by Sparks, Coutsourides, and Troskie (1983). Yanagihara and Satoh (2010) also proposed the modified C_p (MC_p) criterion such that the bias of the C_p criterion for choosing the ridge parameter with respect to the PMSE is completely corrected under a fixed θ. This criterion coincides with the bias-corrected C_p criterion proposed by Fujikoshi and Satoh (1997) when θ = 0.
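To make the estimator concrete, here is a minimal NumPy sketch of the multivariate ridge computation described above; it assumes a column-centered design matrix X (n × k) and response matrix Y (n × p), and the function and variable names are ours, not from the paper.

```python
import numpy as np

def multivariate_ridge(X, Y, theta):
    """Multivariate ridge estimate Xi_theta = (X'X + theta I_k)^{-1} X'Y.

    X is assumed to be centered column-wise (X'1_n = 0_k); theta >= 0.
    """
    n, k = X.shape
    M_theta = X.T @ X + theta * np.eye(k)        # M_theta = X'X + theta I_k
    Xi_hat = np.linalg.solve(M_theta, X.T @ Y)   # ridge coefficient matrix (k x p)
    mu_hat = Y.mean(axis=0)                      # location estimate, Y'1_n / n
    Y_pred = mu_hat + X @ Xi_hat                 # predictor 1_n mu' + X Xi_theta
    return Xi_hat, mu_hat, Y_pred
```

Setting theta = 0 recovers the least squares estimator (X'X)^{-1}X'Y, while larger theta shrinks the coefficients toward zero.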

The MC_p criterion has several desirable properties as an estimator of the PMSE, as described by, e.g., Fujikoshi, Yanagihara, and Wakaki (2005) and Yanagihara and Satoh (2010). Unfortunately, optimizing θ by minimizing MC_p, i.e., an unbiased estimator of the PMSE, does not always minimize the PMSE of Ŷ_θ. This indicates that there will be an optimal model selection criterion for selecting θ. Thus, we propose a generalized C_p (GC_p) criterion that includes the C_p and MC_p criteria as special cases (originally, the GC_p criterion was proposed by Atkinson, 1980, for selecting variables in the univariate linear model). The GC_p criterion is specified by a non-negative parameter λ, which is referred to as the penalty parameter. From the viewpoint of making the PMSE of the predictor of Y after optimizing θ small, we select the optimal penalty parameter λ; this is essentially the selection of the model selection criterion. In the present paper, we optimize λ by the following three methods:

(Double optimization): We optimize θ and λ simultaneously by minimizing GC_p and the penalty selection criterion, respectively.
(Optimization of λ with an approximated value of an optimal θ): We optimize λ by minimizing the penalty selection criterion constructed from the approximated value of the optimal θ.
(Asymptotic optimization of λ): We calculate an asymptotically optimal λ from an asymptotic expansion of the PMSE, and then estimate this asymptotically optimal λ.

Through the optimization of the model selection criterion, we can perform a reasonable optimization of θ. The remainder of the present paper is organized as follows: In Section 2, we propose the GC_p criterion, which includes the criteria proposed by Yanagihara and Satoh (2010) as special cases. In Section 3, we propose three optimization methods for λ. In Section 4, we compare the optimization methods by conducting numerical studies. Finally, technical details are provided in the Appendix.

2. Generalized C_p Criterion

In this section, we propose the GC_p criterion for optimizing the ridge parameter θ, which includes the C_p and MC_p criteria proposed by Yanagihara and Satoh (2010). Moreover, we present several mathematical properties of the θ obtained by minimizing the GC_p criterion. The PMSE of Ŷ_θ is defined as

PMSE[Ŷ_θ] = E_Y[E_U[tr{(U − Ŷ_θ)'(U − Ŷ_θ) Σ^{-1}}]],

where U is a random variable matrix that is independent of Y and has the same distribution as Y. The C_p criterion proposed by Yanagihara and Satoh (2010) is a rough estimator of the PMSE of Ŷ_θ, defined by

C_p(θ) = tr(W_θ S^{-1}) + 2p tr(M_θ^{-1} M_0),

where W_θ is the residual sum of squares matrix defined by W_θ = (Y − Ŷ_θ)'(Y − Ŷ_θ), and S is an unbiased estimator of Σ defined by S = W_0/(n − k − 1). From the definition of the C_p criterion, the first term of C_p measures the closeness of the ridge regression to the data, and the second term evaluates the penalty for the complexity of the ridge regression. However, the C_p criterion has a bias with respect to the PMSE. The MC_p criterion proposed by Yanagihara and Satoh (2010) is an exact unbiased estimator of the PMSE. By neglecting terms that are independent of θ, MC_p is defined as

MC_p(θ) = c_M tr(W_θ S^{-1}) + 2p tr(M_θ^{-1} M_0),

where c_M = 1 − (p + 1)/(n − k − 1). By comparing the two criteria, we can see that the difference between C_p and MC_p is the coefficient in front of tr(W_θ S^{-1}).

Thus, we can generalize the model selection criterion for optimizing the ridge parameter as

GC_p(θ, λ) = λ tr(W_θ S^{-1}) + 2p tr(M_θ^{-1} M_0),   (2.1)

where λ is a non-negative parameter. Note that GC_p(θ, 1) = C_p(θ) and GC_p(θ, c_M) = MC_p(θ). In this criterion, the penalty for the complexity of the model, which is the second term of (2.1), becomes large when λ becomes small. This means that λ controls the penalty for the complexity of the model in the criterion (2.1). Hence, we can regard λ as a penalty parameter. In the present paper, we consider the optimization of λ in order to obtain an optimal θ that further reduces the PMSE. When λ is fixed, the optimized ridge parameter θ̂(λ) is obtained as

θ̂(λ) = arg min_{θ ∈ [0, ∞]} GC_p(θ, λ).   (2.2)

Since θ̂(λ) is a minimizer of GC_p(θ, λ), the following equation holds:

∂GC_p(θ, λ)/∂θ |_{θ = θ̂(λ)} = 0.   (2.3)

Note that θ̂(λ) changes with λ.
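The criterion (2.1) and the one-dimensional minimization (2.2) can be carried out numerically; the following sketch, under the same assumptions as the previous one, evaluates GC_p(θ, λ) and minimizes it over θ on a bounded interval with SciPy. The finite upper bound stands in for θ = ∞ and is an implementation choice, not part of the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gcp_criterion(theta, lam, X, Y):
    """GC_p(theta, lambda) = lambda * tr(W_theta S^{-1}) + 2p tr(M_theta^{-1} M_0)."""
    n, k = X.shape
    p = Y.shape[1]
    Yc = Y - Y.mean(axis=0)                       # remove the location term 1_n mu'
    M0 = X.T @ X
    M_theta = M0 + theta * np.eye(k)
    resid = Yc - X @ np.linalg.solve(M_theta, X.T @ Yc)   # Y - Yhat_theta
    W_theta = resid.T @ resid
    resid0 = Yc - X @ np.linalg.solve(M0, X.T @ Yc)
    S = (resid0.T @ resid0) / (n - k - 1)         # unbiased estimator of Sigma
    return lam * np.trace(np.linalg.solve(S, W_theta)) \
        + 2 * p * np.trace(np.linalg.solve(M_theta, M0))

def theta_hat(lam, X, Y, upper=1e6):
    """theta_hat(lambda): minimizer of GC_p(., lambda) over a bounded interval."""
    res = minimize_scalar(gcp_criterion, bounds=(0.0, upper),
                          args=(lam, X, Y), method='bounded')
    return res.x
```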

Here, we obtain the following mathematical properties of θ̂(λ) (the proof is provided in Appendix A.1):

Theorem 2.1. Let

(z_1, ..., z_k)' = Q'X'Y S^{-1/2},   (2.4)
r_{λ,j} = (λ ||z_j||² − p d_j)/(p d_j²), (j = 1, ..., k),   (2.5)

where z_j is a p-dimensional vector, Q is a k × k orthogonal matrix that diagonalizes X'X, i.e., Q'X'XQ = D = diag(d_1, ..., d_k), d_i (i = 1, ..., k) are the eigenvalues of X'X, and r_{λ,1}^+ ≥ ... ≥ r_{λ,m}^+ (m ≤ k) are the positive values among r_{λ,1}, ..., r_{λ,k}. Then, θ̂(λ) has the following properties:

1. θ̂(λ) is a monotonic decreasing function with respect to λ.
2. θ̂(λ) is not 0 when λ ∈ [0, ∞).
3. θ̂(λ) > (r_{λ,1}^+)^{-1} when r_{λ,1}^+ exists; θ̂(λ) = ∞ when r_{λ,1}^+ does not exist, i.e., when max_{j=1,...,k} r_{λ,j} ≤ 0.
4. θ̂(∞) = 0 and θ̂(0) = ∞.
5. θ̂(λ) = ∞ for any λ < min_{j=1,...,k} p d_j/||z_j||².

We suppose that d_i = O(n). However, we must use an iterative computational algorithm to optimize θ because we cannot obtain θ̂(λ) in closed form. In order to reduce the number of computational tasks, we consider approximating θ̂(λ) using an asymptotic expansion. Equation (2.3) implies an asymptotic expansion of the GC_p criterion, and from this expansion we obtain the asymptotic expansion of θ̂(λ) as follows:

Theorem 2.2. θ̂(λ) can be expanded as θ̂(λ) = θ_{(L)}(λ) + O_p(n^{-L}), where

θ_{(L)}(λ) = p b_1/(λ a_1) + (1/(2λ a_1)) Σ_{l=1}^{L−1} n^{-l} (−1)^{l+1} (l + 1) θ_{(L−1)}^l(λ) {λ(l + 2) a_{l+1} θ_{(L−1)}(λ) − 2p b_{l+1}}.   (2.6)

Here, θ_{(0)}(λ) = 0, V = X'Y S^{-1} Y'X, a_j = n^j tr(V M_0^{-(j+2)}), b_j = n^j tr(M_0^{-j}), and θ_{(L)}^l(λ) denotes {θ_{(L)}(λ)}^l. The proof of this theorem is given in Appendix A.2. Note that θ_{(L)}(λ) can be used as an approximated value of θ̂(λ). There is a one-to-one correspondence between θ_{(1)}(λ) = p b_1/(λ a_1) and λ, and θ_{(1)}(λ) satisfies properties 1, 2, and 4 in Theorem 2.1.
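A sketch of the first-order approximation θ_{(1)}(λ) = p b_1/(λ a_1) from Theorem 2.2, with a_1 and b_1 computed from their definitions; as above, the helper names are illustrative and the assumptions are the same as in the earlier sketches.

```python
import numpy as np

def theta_first_order(lam, X, Y):
    """First-order approximation theta_(1)(lambda) = p * b1 / (lambda * a1)."""
    n, k = X.shape
    p = Y.shape[1]
    Yc = Y - Y.mean(axis=0)
    M0_inv = np.linalg.inv(X.T @ X)
    resid0 = Yc - X @ M0_inv @ X.T @ Yc
    S = (resid0.T @ resid0) / (n - k - 1)
    V = X.T @ Yc @ np.linalg.solve(S, Yc.T @ X)   # V = X'Y S^{-1} Y'X
    a1 = n * np.trace(V @ np.linalg.matrix_power(M0_inv, 3))   # a_1 = n tr(V M_0^{-3})
    b1 = n * np.trace(M0_inv)                                   # b_1 = n tr(M_0^{-1})
    return p * b1 / (lam * a1)
```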

3. Optimization of the Penalty in the GC_p Criterion

3.1. Double optimization of θ and λ

In the previous section, we considered the model selection criterion for selecting θ, which can be regarded as an estimator of PMSE[Ŷ_θ]. By minimizing this estimator of the PMSE of Ŷ_θ, we expect to reduce the PMSE of Ŷ_θ. However, since the optimal ridge parameter changes with the data, it is important to reduce not the PMSE of Ŷ_θ but rather the PMSE of Ŷ_θ̂, i.e., of the predictor of Y after optimizing θ. In this section, we consider optimizing λ using PMSE[Ŷ_θ̂(λ)], where Ŷ_θ̂(λ) = 1_n μ̂' + X Ξ̂_θ̂(λ). Without loss of generality, we can assume that the covariance matrix of y_i is I_p in PMSE[Ŷ_θ̂(λ)]. Therefore, from Efron (2004), we obtain PMSE[Ŷ_θ̂(λ)], which is a function of λ, as follows:

PMSE[Ŷ_θ̂(λ)] = E_Y[tr(W_θ̂(λ) Σ^{-1})] + 2 E_Y[Σ_{i=1}^n Σ_{j=1}^p ∂(Ŷ_θ̂(λ))_{ij}/∂(Y)_{ij}],

where (A)_{ij} is the (i, j)th element of A. Since θ̂(λ) depends on (Y)_{ij}, we can see that

∂(Ŷ_θ̂(λ))_{ij}/∂(Y)_{ij} = ∂(Ŷ_θ)_{ij}/∂(Y)_{ij} |_{θ=θ̂(λ)} + ∂(Ŷ_θ)_{ij}/∂θ |_{θ=θ̂(λ)} · ∂θ̂(λ)/∂(Y)_{ij}.   (3.1)

The first term of the above equation is calculated as

∂(Ŷ_θ)_{ij}/∂(Y)_{ij} |_{θ=θ̂(λ)} = ∂(1_n μ̂')_{ij}/∂(Y)_{ij} + ∂(X M_θ^{-1} X'Y)_{ij}/∂(Y)_{ij} |_{θ=θ̂(λ)} = ∂(1_n μ̂')_{ij}/∂(Y)_{ij} + (X M_θ̂(λ)^{-1} X')_{ii}.

Note that Σ_{i=1}^n Σ_{j=1}^p ∂(1_n μ̂')_{ij}/∂(Y)_{ij} = p and Σ_{i=1}^n Σ_{j=1}^p (X M_θ̂(λ)^{-1} X')_{ii} = p tr(M_θ̂(λ)^{-1} M_0). Next, we consider the second term of (3.1). Note that

∂(Ŷ_θ)_{ij}/∂θ = ∂(X M_θ^{-1} X'Y)_{ij}/∂θ = −(X M_θ^{-2} X'Y)_{ij}.

Hence, we derive

PMSE[Ŷ_θ̂(λ)] = E_Y[tr(W_θ̂(λ) Σ^{-1})] + 2p{E_Y[tr(M_θ̂(λ)^{-1} M_0)] + 1} − 2 E_Y[Σ_{i=1}^n Σ_{j=1}^p (X M_θ̂(λ)^{-2} X'Y)_{ij} ∂θ̂(λ)/∂(Y)_{ij}].   (3.2)

Based on this result, we need only obtain ∂θ̂(λ)/∂(Y)_{ij} in order to calculate PMSE[Ŷ_θ̂(λ)]. This derivative leads to the following theorem (the proof is given in Appendix A.3):

Theorem 3.1. The PMSE of Ŷ_θ̂(λ) is expressed as

PMSE[Ŷ_θ̂(λ)] = E_Y[tr(W_θ̂(λ) Σ^{-1}) + 2p{tr(M_θ̂(λ)^{-1} M_0) + 1} + 4B(θ̂(λ))],   (3.3)

where

B(θ) = λθ tr(V M_θ^{-5} M_0) / {λ tr(V M_θ^{-3}) − 3λθ tr(V M_θ^{-4}) + 2p tr(M_θ^{-3} M_0)}.

By neglecting the terms that are independent of λ, we define the penalty selection criteria for optimizing λ as follows:

Definition 3.1. The penalty selection criteria for optimizing λ are defined as

C_p^#(λ) = tr(W_θ̂(λ) S^{-1}) + 2p tr(M_θ̂(λ)^{-1} M_0) + 4B(θ̂(λ)),
MC_p^#(λ) = c_M tr(W_θ̂(λ) S^{-1}) + 2p tr(M_θ̂(λ)^{-1} M_0) + 4B(θ̂(λ)),

where θ̂(λ) is given by (2.2) and c_M = 1 − (p + 1)/(n − k − 1).

Here, note that C_p^#(λ) is obtained by substituting S for Σ when we neglect the terms that are independent of λ in (3.3). However, a bias arises because S^{-1} is not an unbiased estimator of Σ^{-1} (see, e.g., Siotani, Hayakawa, and Fujikoshi, 1985). Based on the results reported by Yanagihara and Satoh (2010), we correct the bias of E_Y[tr(W_θ̂(λ) S^{-1})] to E_Y[tr(W_θ̂(λ) Σ^{-1})] under fixed θ̂(λ). Finally, we define MC_p^#(λ) by neglecting terms that are independent of θ and λ. Using these criteria, λ and θ are optimized as follows:

λ̂_C^# = arg min_{λ ∈ [0, ∞]} C_p^#(λ) and θ̂(λ̂_C^#) = arg min_{θ ∈ [0, ∞]} GC_p(θ, λ̂_C^#),
λ̂_M^# = arg min_{λ ∈ [0, ∞]} MC_p^#(λ) and θ̂(λ̂_M^#) = arg min_{θ ∈ [0, ∞]} GC_p(θ, λ̂_M^#).

These optimization methods are similar to those reported by Ye (1998) and Shen and Ye (2002).

3.2. Optimization of λ with the approximated θ̂(λ)

In the previous subsection, we proposed penalty selection criteria for selecting λ. These criteria are constructed from the optimal θ obtained by minimizing the GC_p criterion. This indicates that we need to repeat the optimization of θ until the optimal λ is obtained; hence, a large number of computational tasks are required for such an optimization. In this subsection, we try to reduce the number of computational tasks by using the approximated θ̂(λ) given by (2.6). Thus, we propose the penalty selection criterion obtained when the approximated θ̂(λ) is used. For this purpose, we calculate ∂θ_{(L)}(λ)/∂(Y)_{ij}. The following lemma is useful for obtaining such a derivative (the proof is provided in Appendix A.4):

Lemma 3.1. For any l ≥ 1, the first derivative of a_l with respect to (Y)_{ij} is calculated as

∂a_l/∂(Y)_{ij} = 2 n^l (S^{-1} Y'X M_0^{-(l+2)} X' (I_n − Y S^{-1} Y' H/(n − k − 1)))_{ji},

where H = I_n − 1_n 1_n'/n − X M_0^{-1} X'.

By using this lemma and (2.6), we obtain the following theorem:

Theorem 3.2. The PMSE of Ŷ_{θ_{(L)}(λ)} is expressed as

PMSE[Ŷ_{θ_{(L)}(λ)}] = E_Y[tr(W_{θ_{(L)}(λ)} Σ^{-1}) + 2p{tr(M_{θ_{(L)}(λ)}^{-1} M_0) + 1}] + 2 E_Y[B_1(θ_{(L)}(λ))],

where B_1(θ_{(L)}(λ)) = −Σ_{i=1}^n Σ_{j=1}^p (X M_{θ_{(L)}(λ)}^{-2} X'Y)_{ij} ∂θ_{(L)}(λ)/∂(Y)_{ij}; its explicit form, which follows from (2.6) and Lemma 3.1, is written in terms of a_l, b_l, and traces of the form tr(M_0^{-(l+2)} M_{θ_{(L)}(λ)}^{-2} V).

The proof of this theorem is presented in Appendix A.5. When θ_{(1)}(λ) is used, we obtain

PMSE[Ŷ_{θ_{(1)}(λ)}] = E_Y[tr(W_{θ_{(1)}(λ)} Σ^{-1}) + 2p{tr(M_{θ_{(1)}(λ)}^{-1} M_0) + 1}] + E_Y[(4n/a_1) θ_{(1)}(λ) tr(M_0^{-2} M_{θ_{(1)}(λ)}^{-2} V)].

Thus, by neglecting terms that are independent of λ, the penalty selection criteria with θ_{(1)}(λ) are defined as follows:

Definition 3.2. The penalty selection criteria for optimizing λ when θ_{(1)}(λ) is used are defined as

C_p^{(1)}(λ) = tr(W_{θ_{(1)}(λ)} S^{-1}) + 2p tr(M_{θ_{(1)}(λ)}^{-1} M_0) + (4n/a_1) θ_{(1)}(λ) tr(M_0^{-2} M_{θ_{(1)}(λ)}^{-2} V),
MC_p^{(1)}(λ) = c_M tr(W_{θ_{(1)}(λ)} S^{-1}) + 2p tr(M_{θ_{(1)}(λ)}^{-1} M_0) + (4n/a_1) θ_{(1)}(λ) tr(M_0^{-2} M_{θ_{(1)}(λ)}^{-2} V),

where θ_{(1)}(λ) = p b_1/(λ a_1) and c_M = 1 − (p + 1)/(n − k − 1).
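The criteria of Definition 3.2 can be evaluated without any inner optimization over θ; the sketch below computes C_p^{(1)}(λ) and MC_p^{(1)}(λ) for a given λ, under the same conventions and assumptions as the earlier sketches (illustrative only).

```python
import numpy as np

def cp1_criteria(lam, X, Y):
    """C_p^{(1)}(lambda) and MC_p^{(1)}(lambda) from Definition 3.2."""
    n, k = X.shape
    p = Y.shape[1]
    Yc = Y - Y.mean(axis=0)
    M0 = X.T @ X
    M0_inv = np.linalg.inv(M0)
    resid0 = Yc - X @ M0_inv @ X.T @ Yc
    S = (resid0.T @ resid0) / (n - k - 1)
    V = X.T @ Yc @ np.linalg.solve(S, Yc.T @ X)
    a1 = n * np.trace(V @ np.linalg.matrix_power(M0_inv, 3))
    b1 = n * np.trace(M0_inv)
    theta1 = p * b1 / (lam * a1)                  # theta_(1)(lambda)
    M_theta_inv = np.linalg.inv(M0 + theta1 * np.eye(k))
    resid = Yc - X @ M_theta_inv @ X.T @ Yc
    W = resid.T @ resid
    fit = np.trace(np.linalg.solve(S, W))         # tr(W_theta S^{-1})
    penalty = 2 * p * np.trace(M_theta_inv @ M0)  # 2p tr(M_theta^{-1} M_0)
    correction = 4 * n * theta1 * np.trace(
        np.linalg.matrix_power(M0_inv, 2)
        @ np.linalg.matrix_power(M_theta_inv, 2) @ V) / a1
    c_M = 1 - (p + 1) / (n - k - 1)
    return fit + penalty + correction, c_M * fit + penalty + correction
```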

Similar to MC_p^#(λ), MC_p^{(1)}(λ) can be regarded as a simply bias-corrected C_p^{(1)}(λ). At least when θ_{(1)}(λ) = 0, MC_p^{(1)}(λ) completely corrects the bias of C_p^{(1)}(λ). If we use a θ_{(L)}(λ) other than θ_{(1)}(λ), the penalty selection criteria become more complicated as L increases. As an example, we describe the penalty selection criteria for θ_{(2)}(λ) in Appendix A.6. From the viewpoint of application, C_p^{(1)}(λ) and MC_p^{(1)}(λ) are useful because they are the simplest criteria among all L. When we use C_p^{(1)}(λ) and MC_p^{(1)}(λ), the optimal θ and λ are given as follows:

λ̂_C^{(1)} = arg min_{λ ∈ [0, ∞]} C_p^{(1)}(λ) and θ̂(λ̂_C^{(1)}) = θ_{(1)}(λ̂_C^{(1)}),
λ̂_M^{(1)} = arg min_{λ ∈ [0, ∞]} MC_p^{(1)}(λ) and θ̂(λ̂_M^{(1)}) = θ_{(1)}(λ̂_M^{(1)}).

3.3. Asymptotic optimization of λ

In the previous subsections, we proposed penalty selection criteria. When such criteria are used to optimize λ, we must perform an iterative procedure. In this subsection, we consider the non-iterative optimization of λ. This requires the calculation of an asymptotically optimal λ, which minimizes an asymptotic expansion of PMSE[Ŷ_θ̂(λ)] over λ ∈ [0, ∞]. The following theorem gives such an asymptotically optimal value of λ (the proof is provided in Appendix A.7):

Theorem 3.3. An asymptotically optimal λ, i.e., a λ that minimizes PMSE[Ŷ_θ̂(λ)] asymptotically, is given by

λ* = E_Y[a_1*/a_1²] / {E_Y[1/a_1] − 2 E_Y[a_2/a_1²]/(p b_1)},

where V* = X'Y Σ^{-1} Y'X and a_j* = n^j tr(V* M_0^{-(j+2)}).

By replacing a_1* with a_1, we estimate λ* as follows:

λ̂_0 = {1 − 2 tr(M_0^{-4} V)/(p tr(M_0^{-3} V) tr(M_0^{-1}))}^{-1}.   (3.4)

Note that E_Y[a_1*] = c_M E_Y[a_1] holds. Hence, we can estimate E_Y[a_1*] by c_M a_1. This implies a new estimator of λ*, given by λ̂_M = c_M λ̂_0. When we use λ̂_0 and λ̂_M, the optimal θ is given by

θ̂(λ̂_0) = arg min_{θ ∈ [0, ∞]} GC_p(θ, λ̂_0), where λ̂_0 is given in (3.4),
θ̂(λ̂_M) = arg min_{θ ∈ [0, ∞]} GC_p(θ, λ̂_M), where λ̂_M = c_M λ̂_0.
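The non-iterative estimates λ̂_0 of (3.4) and λ̂_M = c_M λ̂_0 require only traces of fixed matrices; a small sketch follows, with the same assumptions as before.

```python
import numpy as np

def lambda_hat0(X, Y):
    """lambda_hat0 = {1 - 2 tr(M0^{-4} V) / (p tr(M0^{-3} V) tr(M0^{-1}))}^{-1},
    together with lambda_hat_M = c_M * lambda_hat0."""
    n, k = X.shape
    p = Y.shape[1]
    Yc = Y - Y.mean(axis=0)
    M0_inv = np.linalg.inv(X.T @ X)
    resid0 = Yc - X @ M0_inv @ X.T @ Yc
    S = (resid0.T @ resid0) / (n - k - 1)
    V = X.T @ Yc @ np.linalg.solve(S, Yc.T @ X)
    num = 2 * np.trace(np.linalg.matrix_power(M0_inv, 4) @ V)
    den = p * np.trace(np.linalg.matrix_power(M0_inv, 3) @ V) * np.trace(M0_inv)
    lam0 = 1.0 / (1.0 - num / den)
    lam_M = (1 - (p + 1) / (n - k - 1)) * lam0    # lambda_hat_M = c_M * lambda_hat0
    return lam0, lam_M
```

The resulting λ̂_0 or λ̂_M can then be passed to the GC_p minimization sketched in Section 2 to obtain θ̂(λ̂_0) or θ̂(λ̂_M).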

4. Numerical Study

In this section, we conduct numerical studies to compare the PMSEs of the predictors of Y consisting of the ridge regression estimators with optimized ridge and penalty parameters. Let R_q and Δ_q(ρ) be q × q matrices defined by R_q = diag(1, ..., q) and (Δ_q(ρ))_{ij} = ρ^{|i−j|}. The explanatory matrix X was generated from X = W Ψ^{1/2}, where Ψ = R_k^{1/2} Δ_k(ρ_x) R_k^{1/2} and W is an n × k matrix whose elements were generated independently from the uniform distribution on (−1, 1). The k × p unknown regression coefficient matrix Ξ was defined by Ξ = δ F Ξ_i, where δ is a constant, F = diag(1_κ', 0_{k−κ}') is a k × k matrix, and Ξ_i is the first five rows of Ξ_0 when k = 5, Ξ_0 when k = 10, and Ξ_1 when k = 15, where (entries listed row by row)

Ξ_0:
0.850 0.657 0.259
0.2753 0.2432 0.87
0.393 0.2926 0.67
0.2754 0.2608 0.766
0.2693 0.264 0.2066
0.0676 0.0663 0.056
0.2239 0.297 0.880
0.0352 0.0346 0.0305
0.3240 0.399 0.2868
0.3747 0.3727 0.3554

Ξ_1:
.3794 0.0645 0.0330
0.0766 0.024 0.043
0.268 0.396 0.095
0.469 0.2589 0.798
0.238 0.488 0.082
0.240 0.463 0.2
0.3002 0.2364 0.950
0.55 0.0953 0.082
0.2774 0.2395 0.209
0.3392 0.3072 0.2807
0.006 0.007 0.000
0.0438 0.0408 0.038
0.387 0.3039 0.2904
0.0529 0.050 0.0493
0.2505 0.245 0.2399

Here, δ controls the scale of the regression coefficient matrix, and F controls the number of non-zero regression coefficients via κ (the dimension of the true model). The values of the elements of Ξ_0 and Ξ_1, which are the essential regression coefficient matrices, are the same as in Lawless (1981). Simulated data values Y were generated repeatedly from N_{n×3}(XΞ, Σ ⊗ I_n) under several choices of n, k, κ, δ, ρ_y, and ρ_x, where Σ = R_3^{1/2} Δ_3(ρ_y) R_3^{1/2}, and the number of repetitions was 1,000. At each repetition, we evaluated r(XΞ, Ŷ_θ̂) = tr{(XΞ − Ŷ_θ̂)'(XΞ − Ŷ_θ̂) Σ^{-1}}, where Ŷ_θ̂ = 1_n μ̂' + X Ξ̂_θ̂ is the predicted value of Y obtained by each method. The average of np + r(XΞ, Ŷ_θ̂) across the 1,000 repetitions was regarded as the PMSE of Ŷ_θ̂. In the simulation, a standardized X was used to estimate the regression coefficients.
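For reference, the following sketch generates one data set according to the design just described, assuming p = 3 and a user-supplied coefficient matrix Xi; the Cholesky factor is used as one convenient choice of square root for Ψ, and the random-number handling is ours, not the paper's.

```python
import numpy as np

def delta_mat(q, rho):
    """(Delta_q(rho))_{ij} = rho^{|i-j|}."""
    idx = np.arange(q)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def generate_data(n, k, Xi, rho_x, rho_y, rng):
    """Generate (X, Y) with X = W Psi^{1/2} and Y ~ N_{n x 3}(X Xi, Sigma (x) I_n)."""
    p = 3
    Rk_half = np.diag(np.sqrt(np.arange(1, k + 1)))   # R_k^{1/2}, R_k = diag(1,...,k)
    Psi = Rk_half @ delta_mat(k, rho_x) @ Rk_half
    R3_half = np.diag(np.sqrt(np.arange(1, p + 1)))
    Sigma = R3_half @ delta_mat(p, rho_y) @ R3_half
    W = rng.uniform(-1.0, 1.0, size=(n, k))
    X = W @ np.linalg.cholesky(Psi).T                 # one choice of square root of Psi
    X = X - X.mean(axis=0)                            # center the design, X'1_n = 0_k
    E = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    Y = X @ Xi + E
    return X, Y

# Example: rng = np.random.default_rng(0); X, Y = generate_data(30, 5, Xi, 0.2, 0.2, rng)
```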

Recall that GC_p(θ, λ) is defined in (2.1). Here, λ and θ are optimized by the following methods:

Method 1: θ̂ = arg min_{θ ∈ [0, ∞]} GC_p(θ, λ̂) with λ̂ = λ̂_0, where λ̂_0 is defined in (3.4).
Method 2: θ̂ = arg min_{θ ∈ [0, ∞]} GC_p(θ, λ̂) with λ̂ = λ̂_M = c_M λ̂_0, where c_M = 1 − (p + 1)/(n − k − 1).
Method 3: θ̂ = arg min_{θ ∈ [0, ∞]} GC_p(θ, λ̂) with λ̂ = arg min_{λ ∈ [0, ∞]} C_p^#(λ), where C_p^#(λ) is given by Definition 3.1.
Method 4: θ̂ = arg min_{θ ∈ [0, ∞]} GC_p(θ, λ̂) with λ̂ = arg min_{λ ∈ [0, ∞]} MC_p^#(λ), where MC_p^#(λ) is given by Definition 3.1.
Method 5: θ̂ = θ_{(1)}(λ̂) with λ̂ = arg min_{λ ∈ [0, ∞]} C_p^{(1)}(λ), where θ_{(1)}(λ) and C_p^{(1)}(λ) are defined in (2.6) and Definition 3.2, respectively.
Method 6: θ̂ = θ_{(1)}(λ̂) with λ̂ = arg min_{λ ∈ [0, ∞]} MC_p^{(1)}(λ), where MC_p^{(1)}(λ) is given by Definition 3.2.

For the purpose of comparison with the proposed methods, we prepare the conventional optimization methods:

Method 7: θ̂_C = arg min_{θ ∈ [0, ∞]} C_p(θ) = arg min_{θ ∈ [0, ∞]} GC_p(θ, 1).
Method 8: θ̂_M = arg min_{θ ∈ [0, ∞]} MC_p(θ) = arg min_{θ ∈ [0, ∞]} GC_p(θ, c_M), where c_M = 1 − (p + 1)/(n − k − 1).

In Methods 3 through 8, the fminsearch function in Matlab is used to find the minimizer of the penalty selection criterion or the model selection criterion. In the fminsearch function, the Nelder-Mead simplex method (see, e.g., Lagarias et al., 1998) is used to search for the minimizing value. When Methods 1 through 4 are used, the optimal θ is also searched for using the fminsearch function. The computational speeds of Methods 1 and 2 are the same as those of Methods 5 and 6. Furthermore, the computational speeds of Methods 5 and 6 are almost the same as those of Methods 7 and 8, because these four methods optimize one parameter. It is easy to predict that the computational speeds of Methods 3 and 4 are slower than those of the other methods, because Methods 3 and 4 optimize two parameters simultaneously. In this paper, we propose Methods 1 through 6, which can be regarded as estimation methods for the optimal λ. To obtain the optimal λ, denoted λ*, which minimizes the PMSE, we divided [0, 2] into 100 parts and used each grid point; we then computed r(XΞ, Ŷ_θ̂) for each grid point in each repetition. After the 1,000 repetitions, we computed the average of these values for each grid point, which is regarded as the main term of the PMSE of Ŷ_θ̂, and λ* was obtained by comparing these averages. To compare λ̂, i.e., the estimate of λ* obtained by each of Methods 1 through 6, Figure 1 shows box plots of λ̂ over the 1,000 repetitions for several situations; the horizontal line indicates λ*.

means the λ. Tables through 4 show the averages of (ˆλ λ ) 2 across 000 repetitions, which is referred as the mean squared error (MSE) of ˆλ, for each method. In the Theorem 2.2, we derived the expansion for ˆ(λ) and we suggested to use the first term of the expansion which is referred as () (λ). To compare the ˆ(λ) and () (λ), we show the scatter plots in some situations when we fix λ as or 2 in Figure 2. In each scatter plot, the line means the 45-degree line which means the line of ˆ(λ) = () (λ). When the scatter plot close up this line, () (λ) closes to ˆ(λ). Tables 7 through 2 show the simulation results obtained for PMSE[Ŷˆ]/{p(n +k + )} 00 for the cases in which (k, n) = (5, 30), (5, 50), (0, 30), (0, 50), (5, 30), and (5, 50), respectively, where p(n + k + ) is the PMSE of the predictor of Y derived using the LS estimators. We note p = 3 in numerical studies. In the tables, bold font indicates the minimized PMSE, and italic font indicates the second-smallest PMSE. Please insert figures and 2 around here Please insert tables, 2, 3, 4, 5, 6, 7, 8, 9, 0, and 2 around here From the figure shows the box plots for ˆλ for each method, we can see that the dispersion of ˆλ and the differences between ˆλ and λ. Methods 2, 4 and 6 are always smaller value than Methods, 3 and 5. The dispersion of Methods 2, 4 and 6 are smaller than Methods, 3 and 5. This facts mean that the correction of each method make the optimized value and dispersion smaller. We note that the dispersions of Methods and 2 are smaller than other methods. When ρ y and ρ x are small, our optimization method is nearly equal to λ. From the tables through 6, we can see that the numerical evaluation for each method. When k and δ are zeros, Method 6 is the best and Method 5 is the second best. Methods 2 and 4 are the best and the second best when κ and δ is small. When κ is equal to k = 0 and δ is large, Method 6 is the best method. On the other hand, when k = κ = 5 and δ is large, Method 6 or 2 is the best in ρ y is small or large. Consequently, Method 6 and 5 was, on average, the best and the second best method except k = 5. When k = 5, Method 5 was the best. Hence we recommend using Method 6 to optimize the penalty parameter λ. From the figure 2 shows the scatter plot for () (λ) and ˆ(λ), we can see the dispersion in each situation. We note that λ become large, the difference between ˆ(λ) and () (λ) becomes small. Also when ρ y or n become large, the difference between ˆ(λ) and () (λ) becomes small. On the other hand, the difference between ˆ(λ) and () (λ) becomes large ρ x is large. In almost case, () (λ) is smaller than ˆ(λ). This fact is corresponding the result in the Theorem 2.2 since ˆ(λ) = () (λ) + O(n ). When the ρ y or δ become large, the dispersion of ˆ(λ) becomes small. The each value of ˆ(λ) and () (λ) become small when ρ y, ρ x, δ or λ becomes large. 2

Based on the simulations, we can see that all of the methods improved the PMSEs of the LS estimators in almost all cases. All of the methods greatly improved the PMSE when n becomes small or k becomes large. Moreover, the improvement in the PMSE of the proposed method increases as ρ_y decreases. The improvement in the PMSE when κ ≠ 0 and δ ≠ 0 of the proposed method increases as ρ_x increases. Comparison of the several methods reveals that Methods 2 and 4 were better than Methods 1 and 3, respectively, in almost all cases when ρ_x is large. When k and ρ_x become large, Methods 5 and 6 provide a greater improvement in PMSE than Methods 3 and 4. When k becomes small and n becomes large, Methods 4 and 6 improve the PMSE more than Methods 2 and 4 in most cases. Occasionally, Method 7 improves the PMSE more than Method 8, especially when κ and δ become large. Consequently, Method 6 was, on average, the best method. In particular, it strongly improved the PMSE when δ and κ are small. Based on these results, we recommend using Method 6 to optimize the multivariate ridge regression.

Acknowledgments

The author wishes to express his deepest gratitude to Dr. Hirokazu Yanagihara of Hiroshima University for his valuable advice and encouragement and for introducing the author to various fields of mathematical statistics. In addition, the author would like to thank Professor Hirofumi Wakaki for previous comments and Professor Yasunori Fujikoshi of Hiroshima University for providing helpful comments and suggestions. The author also thanks Dr. Kengo Kato of Hiroshima University for his encouragement and comments. Furthermore, the author sincerely thanks Professor Megu Ohtaki, Dr. Kenichi Satoh, and Dr. Tetsuji Tonda of Hiroshima University for their encouragement and valuable comments and for introducing the author to various fields of mathematical statistics. The author would also like to thank Dr. Tomoyuki Akita, Dr. Shinobu Fujii, and Dr. Kunihiko Taniguchi of Hiroshima University for their advice and encouragement. Special thanks are due to Professor Takashi Kanda of Hiroshima Institute of Technology for his encouragement and his recommendation to attend Hiroshima University.

Appendix

A.1. Proof of Theorem 2.1

In this subsection, we prove Theorem 2.1, which shows the properties of θ̂(λ). Using d_j

and z j in (2.4), we can write tr(w S ) and tr(m M 0 ) in (2.) as g() = tr(w S ) = tr(y Y S ) 2 h() = tr(m M 0 ) = k j= Since d j > 0 and 0, we have d j d j +. k j= z j 2 d j + + k j= z j 2 d j (d j + ) 2 n ˆµ S ˆµ, ġ() = g() = 2 k j= z j 2 0, (A.) (d j + ) 3 with equality if and only if = 0 or, and ḣ() = h() = k j= d j 0, (A.2) (d j + ) 2 with equality if and only if. Therefore, g() and h() are strictly monotonic increasing and decreasing functions of [0, ], respectively. Since GC p (, λ) = λg() + 2ph(), these results imply that ˆ( ) = arg min [0, ] GC p (, ) = arg min [0, ] tr(w S ) = 0, ˆ(0) = arg min GC p (, 0) = arg min tr(m M 0 ) =. [0, ] [0, ] On the other hand, from (A.) and (A.2), we derive GC p (, λ) = GC p (, λ) = 2 k j= pd 2 j(r λ,j ) (d j + ) 3 = k ϕ j ( λ), j= where r λ,j is given by (2.5). Note that ϕ j ( λ) 0 when [0, (r + λ, ) ]. Therefore, GC p (, λ) < 0 when [0, (r + λ, ) ]. These imply that GC p (, λ) is a monotonic decreasing function with respect to [0, (r + λ, ) ]. Thus, ˆ(λ) > (r + λ, ). On the other hand, if max j=,...,k r λ,j 0 is satisfied, ϕ j ( λ) 0 holds for any. This fact means GC p (, λ) is a monotonic decreasing function with respect to [0, ]. Hence, we can see that ˆ(λ) = when max j=,...,k r λ,j 0. Since max j=,...,k r λ,j 0 holds when λ < min j=,...,k pd j / z j 2. Thus, ˆ(λ) = when λ < min j=,...,k pd j / z j 2 is satisfied. Using Equation (2.3) and GC p (, λ) = λġ() + 2pḣ(), we have where GC λ p (ˆ(λ), λ) = ġ(ˆ(λ)) + ˆ(λ) GC p (ˆ(λ), λ) = 0, λ (A.3) GC p (, λ) = 2 GC p (, λ)/( 2 ). Since ˆ(λ) satisfies (2.2), i.e., ˆ(λ) is the minimizer of GC p (, λ), GC p (, λ) is a convex function around the neighborhood of ˆ(λ). Hence, we 4

have GC p (ˆ(λ), λ) > 0. Using this result and Equation (A.3), we obtain ˆ(λ) λ = ġ(ˆ(λ)) GC p (ˆ(λ), λ). We derive ġ(ˆ(λ)) 0 because ġ() is a strictly monotonic decreasing function of [0, ]. Hence, ˆ(λ)/( λ) 0 is obtained. This implies that ˆ(λ) is a monotonic decreasing function with respect to λ. A.2. Proof of Theorem 2.2 In this subsection, we present the proof of Theorem 2.2, which describes the expansion of ˆ(λ). In order to prove this theorem, we expand the GC p criterion in (2.) under fixed λ. Recall that X n = 0 k, (n k )S = Y (I n n n/n XM 0 X )Y, and Q X XQ = D. Hence, we derive W S = Y (I n n n/n XM X ) 2 Y S = (n k )I p + Y X(M 0 2M + M M 0 M )X Y S = (n k )I p + Y XQD /2 {I k D(D + I k ) } 2 D /2 Q X Y S. Based on this result, the GC p criterion is expressed as GC p (, λ) = λ(n k )p + λ 2 tr{d /2 Q V QD /2 (D + I k ) 2 } + 2p{D(D + I k ) }. Letting t j = (D /2 Q V QD /2 ) jj, we obtain GC p (, λ) = λ(n k )p + { k ( λ + ) 2 ) } 2 t i + 2p ( + di. d i d 2 i i= By Taylor expansion around = 0, we have GC p (, λ) = λ(n k )p + λ k i= = (λ(n k ) + 2k)p + = (λ(n k ) + 2k)p + ( ) l+ l l+ t i + 2p l= d l+ i { λ( ) l+ l l+ l= l= k i= ( k i= ) ( ) l+ l l= d l i t i 2p( ) l+ l d l+ i k i= ( ) l+ l {λltr(v M (l+2) 0 ) 2ptr(M l 0 )}. Recall that a j = n j tr(v M (j+2) 0 ) and b j = n j tr(m j 0 ). It follows that GC p (, λ) = (λ(n k ) + 2k)p + lim L L ( ) l+ l {λla n l l 2pb l }. l= } d l i 5

Then, the following equation is derived: GC p(, λ) = lim L L ( ) l+ l l {λ(l + )a n l l 2pb l }. l= Using the above equation and ˆ(λ) satisfying (2.3), we obtain the equation in Theorem 2.2. A.3. Proof of Theorem 3. In this subsection, we prove Theorem 3., which shows the risk function with respect to λ. Recall that g() = tr(w S ), h() = tr(m M 0 ) and GC p (, λ) = λg() + 2ph(). Since ˆ(λ) satisfies (2.3), we obtain ( 0 = λ g() + 2p h() ) =ˆ(λ) =ˆ(λ) = ˆ(λ) { } λ g(ˆ(λ)) + 2pḧ(ˆ(λ)) + λ ġ(ˆ(λ)), where ġ(ˆ(λ)) = g()/( ) =ˆ(λ), g(ˆ(λ)) = 2 g()/( ) 2 =ˆ(λ), and ḣ(ˆ(λ)) = h()/( ) =ˆ(λ). Thus, we obtain because, from Appendix A., ˆ(λ) = λ( ġ(ˆ(λ)))/( ), GC p (ˆ(λ), λ) By simple calculation, we have ġ(ˆ(λ)) = 2ˆ(λ)tr(M 3 ˆ(λ) V ). GC p (ˆ(λ), λ) = λ g(ˆ(λ)) + 2pḧ(ˆ(λ)) > 0. As in the proof of Lemma 3., which is given in Appendix A.4, we obtain tr(m 3 ˆ(λ) V )/() = 2(S Y XM 3 ˆ(λ) X {I n Y S Y H/(n k )}) ji. Hence, we derive n i= p j= because X n = 0 k and M 3 ˆ(λ) M 0M 2 ˆ(λ) GC p (, λ) = λġ() + (XM 2 ˆ(λ) X Y ) ˆ(λ) 4λˆ(λ)tr(V M 5 ˆ(λ) M 0) ij =, GC p (ˆ(λ), λ) 2pḣ() = 2{λtr(M 3 = M 5 ˆ(λ) M 0. V ) ptr(m 2 M 0 )} and By simple calculation, we have GC p (ˆ(λ), λ) = 2{λtr(M 3 4 3 V ) 3λˆ(λ)tr(M V ) + 2ptr(M ˆ(λ) ˆ(λ) ˆ(λ) M 0)}. Thus, the theorem is proved. A.4. Proof of Lemma 3. In this subsection, we prove Lemma 3., which shows the derivative of a l for any l. Since a l = n l tr(v M (l+2) 0 ), M 0 = X X, and V = X Y S Y X, we need only obtain the 6

derivative of V. We can see that S = S S S = n k S (e j p e i nhy + Y He i n e j p)s, where e i n is the n-dimensional vector, the ith element of which is one and other elements of which are zeros. Thus, we obtain V = X e i n e (Y ) j ps Y X + X Y S e j p e i nx ij n k X Y S (e j p e i nhy + Y He i n e j p)s Y X. From a l /( ) = n l tr{m (l+2) 0 ( V )/( )}, we derive this lemma. From (3.2), we obtain A.5. Proof of Theorem 3.2 PMSE[Ŷ (L) (λ) ] = E Y [tr(w (L) (λ) Σ ) + 2p{tr(M M (L) (λ) 0) + }] [ n p 2E Y (XM 2 (L) (λ) X Y ) ] (L) (λ) ij. (Y ) i= j= ij Hence, we need only calculate (L) (λ)/( ). Using Theorem 2.2, we derive the derivative as follows: (L) (λ) = () (λ) + [ 2λa L l=0 n l ( )l+ (l + ) ] (L )(λ){λ(l l + 2)a l+ (L ) (λ) 2pb l+ }. Recall that (0) (λ) = 0. From Lemma 3., we have () (λ) = pb λa 2 a = 2n () (λ) a ( )) S Y XM0 3 X (I n n k Y S Y H, ji 7

and [ = n λa 2 L l=0 + L 2λa L ] n l ( )l+ (l + ) (L )(λ){λ(l l + 2)a l+ (L ) (λ) 2pb l+ } )) 2λa l=0 ( ( S Y XM0 3 X I n Y S Y H n k n l ( )l+ (l + ) l (L )(λ){λ(l + 2)a l+ (L ) (λ) 2pb l+ } l=0 n l ( )l+ (l + ) l (L ) {λ(l + )(l + 2)a l+ (L ) (λ) 2plb l+ } (L ) + n L ( ) l+ (l + )(l + 2) l+ a l=0 Thus, we obtain n p (XM 2 (L) (λ) X Y ) ij i= j= [ 2λa = n tr(m λa 2 0 2 M 2 i= j= L l=0 (L ) ji ( S Y XM (l+3) 0 X (I n n k Y S Y H n l ( )l+ (l + ) ] (L )(λ){λ(l l + 2)a l+ (L ) (λ) 2pb l+ } V ) L (L) (λ) l=0 )) n l ( )l+ (l + ) l (L )(λ){λ(l + 2)a l+ (L ) (λ) 2pb l+ } + L 2λa n l ( )l+ (l + ) l (L ) {λ(l + )(l + 2)a l+ (L ) (λ) 2plb l+ } l=0 n p (XM 2 (L) (λ) X Y ) (L ) ij + n L ( ) l+ (l + )(l + 2) l+ a and l=0 n i= p j= (l+2) (L ) tr(m0 M 2 (XM 2 (L) (λ) X Y ) ij () (λ) (L) (λ) V ), We derive this theorem by substituting these results into (3.2). = 2n () (λ) tr(m0 2 M 2 a V ). (L) (λ) A.6. The criteria for optimizing λ when we use (2) (λ) In this subsection, we calculate the criterion for optimizing λ when we use (2) (λ). From (2.6), we obtain (2) (λ) = () (λ) + () (λ){3λa 2 () (λ) 2pb 2 } nλa = () (λ) + { n ()(λ) 2 3 a 2 2 b } 2. a b 8. ji

Since b l = n l tr(m0 l ) does not depend on (Y ) ij, we derive { ()(λ) 2 (2) (λ) = () (λ) = () (λ) = () (λ) + n { 2 () (λ) + n { + 2 () (λ) n ( 3 a 2 ( 3 a 2 a 2 b 2 b ( 3 a 2 a 2 b 2 b )} 2 b ) } 2 + 3 2 a b ()(λ) a 2/a ) } + 3 () 2 (λ) a 2 /a. n Hence, the third term of (3.) is obtained using the result of () (λ)/( ) as follows: n p (XM 2 (2) (λ) X Y ) (2) (λ) ij (Y ) i= j= ij = 2n { () (λ) + 2 ( () (λ) 3 a 2 2 b ) } ( ) 2 tr M0 2 M 2 a n a b V (2) (λ) + 3 () 2 (λ) n p ( ) (XM 2 a 2 a na 2 (2) (λ) X Y ) ij a a 2. i= j= From Lemma 3., we obtain 3 () 2 (λ) n p ( ) (XM 2 a 2 a (2) (λ) X Y ) ij a a 2 na 2 i= = 6 2 () (λ) ( Thus, we have n a 2 j= n i= p (XM 2 (2) (λ) X Y ) ij j= S Y XM 3 0 ( ( ) na M0 a 2 I k X I n = 6 () 2 (λ) tr{xm 2 a 2 V M (2) (λ) 0 3 (na M0 a 2 I k )X } = 6 () 2 (λ) tr{m a 2 0 2 M 2 V (na (2) (λ) M0 a 2 I k )}. i= p j= (XM 2 (2) (λ) X Y ) ij (2) (λ) = 2n () (λ) tr(m0 2 M 2 a V ) + 4 () 2 (λ) (2) (λ) a + 6 () 2 (λ) tr{m a 2 0 2 M 2 V (a (2) (λ) 2I k na M0 )} = 2n () (λ) tr(m0 2 M 2 a 8 () 2 (λ)b 2 tr(m0 2 M 2 a b (2) (λ) V ) + 8 2 () (λ)a 2 a 2 (2) (λ) V ) 6n 2 () (λ) 9 a )) n k Y S Y H ji ( 3 a 2 2 b ) 2 tr(m0 2 M 2 a b V ) (2) (λ) tr(m 2 0 M 2 (2) (λ) V ) tr(m 3 0 M 2 (2) (λ) V ).

When we use Theorem 3.2, we obtain the same result. Using this result, we obtain the C p type criteria for optimizing λ are defined as C p (2) (λ) = tr(w () (λ) S ) + 2ptr(M M (2) λ 0) and + 4n () (λ) tr(m0 2 M 2 a 6 () 2 (λ)b 2 tr(m0 2 M 2 a b (2) (λ) V ) + 36 2 () (λ)a 2 a 2 (2) (λ) V ) 2n 2 () (λ) a tr(m 2 0 M 2 (2) (λ) V ) tr(m 3 0 M 2 (2) (λ) V ), MC p (2) (λ) = c M tr(w () (λ) S ) + 2ptr(M M (2) λ 0) + 4n () (λ) tr(m0 2 M 2 a 6 () 2 (λ)b 2 tr(m0 2 M 2 a b (2) (λ) V ) + 36 2 () (λ)a 2 a 2 (2) (λ) V ) 2n 2 () (λ) a A.7. Proof of Theorem 3.3 tr(m 2 0 M 2 (2) (λ) V ) tr(m 3 0 M 2 (2) (λ) V ). In this subsection, we show an asymptotic expansion PMSE[Ŷˆ(λ) ] of and calculation to obtain ˆλ 0. Since PMSE[Ŷˆ(λ) ] is obtained as (3.3), we consider expanding each term for obtaining ˆλ 0. We obtain Wˆ(λ) Σ = Y (I n n n/n XM ˆ(λ) X ) 2 Y Σ = (n k )SΣ + Y X(M 0 2M ˆ(λ) + M ˆ(λ) M 0M ˆ(λ) )X Y Σ = (n k )SΣ + Y XQD /2 (I k D{D + ˆ(λ)I k ) } 2 D /2 Q X Y Σ, because Y (I n n n/n XM 0 X )Y = (n k )S, X n = 0 k and Q X XQ = D. Hence, we obtain tr(wˆ(λ) Σ ) = (n k )tr(sσ ) + 2 tr{(d + I k ) 2 D Q V Q}. Since S is an unbiased estimator of Σ, we have E Y [tr(wˆ(λ) Σ )] = (n k )p + E Y ( ) 2 k ˆ(λ) (Q V Q) jj d j + ˆ(λ). d j j= Then, since d i = O(n) and V = O p (n 2 ), we can expand the above equation as follows: [ k ] ˆ E Y [tr(wˆ(λ) Σ 2 (λ) )] = (n k )p + E Y (Q V Q) d 3 jj + O p (n 2 ) j= j [ ] a = (n k )p + E ˆ 2 (λ) Y + O p (n 2 ). n 20

From a simple calculation by Taylor expansion, and noting that a_j = O_p(1) and b_j = O(1), we derive

tr(M_θ̂(λ)^{-1} M_0) = k − b_1 θ̂(λ)/n + O_p(n^{-2}),
λ θ̂(λ) tr(V M_θ̂(λ)^{-5} M_0) = λ θ̂(λ) a_2/n² + O_p(n^{-3}),
λ tr(V M_θ̂(λ)^{-3}) = λ a_1/n + O_p(n^{-2}),
λ θ̂(λ) tr(V M_θ̂(λ)^{-4}) = λ θ̂(λ) a_2/n² + O_p(n^{-3}),
tr(M_θ̂(λ)^{-3} M_0) = b_2/n² + O_p(n^{-3}).

By substituting these results into PMSE[Ŷ_θ̂(λ)] in (3.3), we obtain the asymptotic expansion of PMSE[Ŷ_θ̂(λ)] as follows:

PMSE[Ŷ_θ̂(λ)] = (n + k + 1)p + E_Y[a_1* θ̂²(λ)/n − 2p b_1 θ̂(λ)/n + 4 a_2 θ̂(λ)/(n a_1)] + O_p(n^{-2}).

From (2.6), which is proved in Appendix A.2, θ̂(λ) = p b_1/(λ a_1) + O_p(n^{-1}). Hence, we consider minimizing the following approximated PMSE:

PMSE[Ŷ_θ̂(λ)] = (n + k + 1)p + E_Y[(p b_1/(n a_1)){p b_1 a_1*/(λ² a_1) − 2p b_1/λ + 4 a_2/(λ a_1)}] + O(n^{-2}).

Hence, we obtain the asymptotically optimal λ, which minimizes the second term of the above equation, as in Theorem 3.3.

References

[1] A. C. Atkinson, A note on the generalized information criterion for choice of a model, Biometrika, 67 (1980), 413-418.
[2] P. J. Brown and J. V. Zidek, Adaptive multivariate ridge regression, Ann. Statist., 8 (1980), 64-74.
[3] B. Efron, The estimation of prediction error: covariance penalties and cross-validation, J. Amer. Statist. Assoc., 99 (2004), 619-632.
[4] S. J. V. Dien, S. Iwatani, Y. Usuda and K. Matsui, Theoretical analysis of amino acid-producing Escherichia coli using a stoichiometric model and multivariate linear regression, J. Biosci. Bioeng., 102 (2006), 34-40.
[5] Y. Fujikoshi and K. Satoh, Modified AIC and C_p in multivariate linear regression, Biometrika, 84 (1997), 707-716.

[6] Y. Fujikoshi, H. Yanagihara and H. Wakaki, Bias corrections of some criteria for selecting multivariate linear models in a general nonnormal case, Amer. J. Math. Management Sci., 25 (2005), 221-258.
[7] Y. Haitovsky, On multivariate ridge regression, Biometrika, 74 (1987), 563-570.
[8] A. E. Hoerl and R. W. Kennard, Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 12 (1970), 55-67.
[9] J. C. Lagarias, J. A. Reeds, M. H. Wright and P. E. Wright, Convergence properties of the Nelder-Mead simplex method in low dimensions, SIAM J. Optim., 9 (1998), 112-147.
[10] J. F. Lawless, Mean squared error properties of generalized ridge estimators, J. Amer. Statist. Assoc., 76 (1981), 462-466.
[11] C. L. Mallows, Some comments on C_p, Technometrics, 15 (1973), 661-675.
[12] C. L. Mallows, More comments on C_p, Technometrics, 37 (1995), 362-372.
[13] C. Sârbu, C. Onisor, M. Posa, S. Kevresan and K. Kuhajda, Modeling and prediction (correction) of partition coefficients of bile acids and their derivatives by multivariate regression methods, Talanta, 75 (2008), 651-657.
[14] R. Saxén and J. Sundell, 137Cs in freshwater fish in Finland since 1986: a statistical analysis with multivariate linear regression models, J. Environ. Radioactiv., 87 (2006), 62-76.
[15] X. Shen and J. Ye, Adaptive model selection, J. Amer. Statist. Assoc., 97 (2002), 210-221.
[16] M. Siotani, T. Hayakawa and Y. Fujikoshi, Modern Multivariate Statistical Analysis: A Graduate Course and Handbook, American Sciences Press, Columbus, Ohio, 1985.
[17] B. Skagerberg, J. MacGregor and C. Kiparissides, Multivariate data analysis applied to low-density polyethylene reactors, Chemometr. Intell. Lab. Syst., 14 (1992), 341-356.
[18] R. S. Sparks, D. Coutsourides and L. Troskie, The multivariate C_p, Comm. Statist. A-Theory Methods, 12 (1983), 775-793.
[19] M. S. Srivastava, Methods of Multivariate Statistics, John Wiley & Sons, New York, 2002.

[20] N. H. Timm, Applied Multivariate Analysis, Springer-Verlag, New York, 2002.
[21] H. Yanagihara and K. Satoh, An unbiased C_p criterion for multivariate ridge regression, J. Multivariate Anal., 101 (2010), 1226-1238.
[22] J. Ye, On measuring and correcting the effects of data mining and model selection, J. Amer. Statist. Assoc., 93 (1998), 120-131.
[23] A. Yoshimoto, H. Yanagihara and Y. Ninomiya, Finding factors affecting a forest stand growth through multivariate linear modeling, J. Jpn. For. Res., 87 (2005), 504-512 (in Japanese).

[Figure 1 appears here: box plots of λ̂ for Methods 1 through 6 in eight settings of (n, k, κ, δ, ρ_y, ρ_x), including (30, 5, 0, 1, 0.2, 0.2), (30, 5, 0, 1, 0.2, 0.95), (30, 5, 0, 1, 0.95, 0.2), (30, 5, 0, 1, 0.95, 0.95), (30, 5, 5, 1, 0.2, 0.2), (30, 5, 0, 3, 0.2, 0.2), (50, 5, 0, 1, 0.2, 0.2), and (30, 10, 5, 3, 0.2, 0.2).]

Figure 1. The box plots for λ̂ in the case of (n, k, κ, δ, ρ_y, ρ_x).

Table. MSE of ˆλ in which (k, n) = (5, 30) Method κ δ ρ y ρ x 2 3 4 5 6 0 0 0.2 0.2.470.04.475.45 0.039 0.04 0.95.644.35 2.599 2.5 0.036 0.03 0.95 0.2.327 0.896.239 0.907 0.033 0.03 0.95.593.092 2.482 2.050 0.035 0.02 5 0.2 0.2 0.030 0.002 0.009 0.05 0.8 0.33 0.95 0.305 0.4 0.424 0.266 0.378 0.445 0.95 0.2 0.006 0.06 0.00 0.032 0.055 0.67 0.95 0.340 0.37 0.327 0.69 0.270 0.338 3 0.2 0.2 0.008 0.02 0.004 0.07 0.00 0.030 0.95.577.095.26 0.85 0.303 0.54 0.95 0.2.322 0.98.293 0.894.253 0.86 0.95.56.053.359 0.927 0.677 0.397 Average 0.928 0.624.039 0.782 0.272 0.23 Table 2. MSE of ˆλ in which (k, n) = (5, 50) Method κ δ ρ y ρ x 2 3 4 5 6 0 0 0.2 0.2.394.48.504.277 0.027 0.05 0.95.654.363 2.645 2.397 0.022 0.02 0.95 0.2.64 0.94.28 0.998 0.023 0.07 0.95.656.364 2.680 2.436 0.020 0.00 5 0.2 0.2 0.00 0.00 0.003 0.003 0.076 0.36 0.95 0.288 0.76 0.43 0.30 0.406 0.442 0.95 0.2 0.003 0.003 0.00 0.008 0.05 0.048 0.95 0.232 0.33 0.87 0.2 0.322 0.377 3 0.2 0.2 0.000 0.03 0.00 0.06 0.002 0.02 0.95.550.28.28.040 0.460 0.337 0.95 0.2.289.065.276.054.253.033 0.95.483.226.335.00 0.928 0.73 Average 0.894 0.726.045 0.897 0.296 0.265 25

Table 3. MSE of ˆλ in which (k, n) = (0, 30) Method κ δ ρ y ρ x 2 3 4 5 6 0 0 0.2 0.2.293 0.798 0.977 0.773 0.06 0.003 0.95.375 0.849.60.26 0.06 0.004 0.95 0.2.292 0.798 0.930 0.70 0.07 0.004 0.95.332 0.85.539. 0.07 0.004 5 0.2 0.2 0.070 0.00 0.028 0.0 0.3 0.46 0.95 0.328 0.04 0.375 0.202 0.3 0.355 0.95 0.2 0.056 0.000 0.020 0.007 0.45 0.284 0.95 0.303 0.090 0.283 0.27 0.308 0.36 3 0.2 0.2 0.030 0.004 0.0 0.05 0.07 0.088 0.95 0.8 0.03 0.0 0.029 0.329 0.438 0.95 0.2 0.08 0.0 0.007 0.022 0.002 0.054 0.95.369 0.853.30 0.673 0.24 0.06 0 0.2 0.2 0.035 0.003 0.00 0.09 0.267 0.450 0.95 0.283 0.079 0.277 0.4 0.339 0.392 0.95 0.2 0.043 0.002 0.05 0.04 0.2 0.277 0.95 0.303 0.090 0.25 0.07 0.282 0.348 3 0.2 0.2.299 0.80.230 0.755 0.936 0.535 0.95.4 0.879.5 0.657 0.22 0.043 0.95 0.2.307 0.85.259 0.777.059 0.624 0.95.399 0.872.59 0.693 0.233 0.094 Average 0.686 0.395 0.65 0.403 0.255 0.246 26

Table 4. MSE of ˆλ in which (k, n) = (0, 50) Method κ δ ρ y ρ x 2 3 4 5 6 0 0 0.2 0.2.73 0.940 0.977 0.862 0.06 0.007 0.95.282.028.562.409 0.008 0.004 0.95 0.2.29 0.902 0.890 0.75 0.04 0.007 0.95.285.03.598.403 0.007 0.003 5 0.2 0.2 0.02 0.000 0.005 0.004 0.200 0.287 0.95 0.203 0. 0.83 0.26 0.407 0.437 0.95 0.2 0.007 0.00 0.003 0.004 0.058 0.5 0.95 0.82 0.096 0.20 0.067 0.372 0.44 3 0.2 0.2 0.005 0.002 0.003 0.003 0.00 0.05 0.95 0.04 0.000 0.03 0.024 0.492 0.60 0.95 0.2 0.006 0.00 0.004 0.002 0.00 0.006 0.95.26.06.38 0.97 0.455 0.328 0 0.2 0.2 0.004 0.002 0.00 0.008 0.58 0.248 0.95 0.7 0.087 0.02 0.054 0.426 0.465 0.95 0.2 0.007 0.00 0.004 0.003 0.047 0.04 0.95 0.23 0.054 0.063 0.022 0.430 0.487 3 0.2 0.2.224 0.986.27 0.979.066 0.8444 0.95.325.067.6 0.92 0.229 0.47 0.95 0.2.2 0.976.206 0.97.24 0.897 0.95.304.050.94 0.95 0.459 0.325 Average 0.596 0.468 0.572 0.474 0.298 0.287 27

Table 5. MSE of ˆλ in which (k, n) = (5, 30) Method κ δ ρ y ρ x 2 3 4 5 6 0 0 0.2 0.2.3 0.660.046 0.76 0.04 0.002 0.95.54 0.762.364 0.940 0.0 0.00 0.95 0.2.27 0.594 0.922 0.540 0.02 0.004 0.95.462 0.726.326 0.86 0.03 0.003 5 0.2 0.2 0.285 0.042 0.63 0.036 0.242 0.328 0.95 0.864 0.328 0.77 0.356 0.080 0.096 0.95 0.2 0.030 0.024 0.03 0.048 0.207 0.47 0.95 0.36 0.044 0.20 0.049 0.338 0.42 3 0.2 0.2 0.02 0.047 0.005 0.069 0.8 0.473 0.95 0.232 0.07 0.4 0.048 0.427 0.522 0.95 0.2 0.004 0.063 0.003 0.068 0.00 0. 0.95 0.39 0.002 0.043 0.08 0.224 0.390 0 0.2 0.2 0.099 0.00 0.038 0.020 0.336 0.506 0.95 0.475 0. 0.340 0.099 0.253 0.296 0.95 0.2 0.044 0.04 0.08 0.033 0.097 0.30 0.95 0.35 0.044 0.69 0.038 0.35 0.398 3 0.2 0.2.308 0.667.39 0.562 0.62 0.249 0.95.542 0.787.45 0.547 0.057 0.03 0.95 0.2 0.002 0.073 0.00 0.084 0.002 0.8 0.95 0.4 0.002 0.026 0.024 0.20 0.376 5 0.2 0.2 0.07 0.005 0.025 0.023 0.260 0.462 0.95 0.468 0.09 0.307 0.074 0.244 0.292 0.95 0.2 0.033 0.020 0.03 0.040 0.074 0.267 0.95.540 0.786.42 0.540 0.050 0.0 3 0.2 0.2.259 0.642.46 0.57 0.84 0.374 0.95.527 0.779.2 0.556 0.04 0.029 0.95 0.2 0.006 0.056 0.004 0.063 0.00 0.083 0.95 0.2 0.002 0.033 0.08 0.49 0.305 Average 0.583 0.264 0.450 0.25 0.92 0.246 28

Table 6. MSE of ˆλ in which (k, n) = (5, 50) Method κ δ ρ y ρ x 2 3 4 5 6 0 0 0.2 0.2.66 0.903 0.867 0.753 0.006 0.002 0.95.256 0.973.247.059 0.004 0.00 0.95 0.2.67 0.904 0.86 0.70 0.007 0.003 0.95.254 0.972.63 0.997 0.004 0.002 5 0.2 0.2 0.023 0.00 0.023 0.025 0.520 0.625 0.95 0.289 0.63 0.20 0.4 0.38 0.340 0.95 0.2 0.002 0.007 0.00 0.009 0.073 0.54 0.95 0.044 0.006 0.00 0.006 0.555 0.630 3 0.2 0.2 0.000 0.05 0.000 0.07 0.053 0.26 0.95 0.035 0.003 0.0 0.007 0.557 0.645 0.95 0.2 0.006 0.04 0.006 0.042 0.03 0.059 0.95 0.004 0.005 0.000 0.05 0.7 0.205 0 0.2 0.2 0.02 0.000 0.004 0.005 0.278 0.392 0.95 0.74 0.080 0.078 0.036 0.44 0.472 0.95 0.2 0.004 0.004 0.003 0.006 0.034 0.095 0.95 0.063 0.04 0.02 0.005 0.452 0.532 3 0.2 0.2.7 0.92.58 0.900 0.946 0.74 0.95.26 0.982.069 0.87 0.26 00.073 0.95 0.2 0.003 0.034 0.003 0.034 0.008 0.046 0.95 0.004 0.005 0.000 0.04 0.088 0.68 5 0.2 0.2 0.006 0.003 0.00 0.0 0.232 0.345 0.95 0.42 0.059 0.060 0.020 0.449 0.494 0.95 0.2 0.002 0.007 0.00 0.009 0.030 0.088 0.95 0.053 0.00 0.05 0.002 0.442 0.523 3 0.2 0.2.97 0.932.84 0.920.002 0.76 0.95.270 0.989.056 0.82 0.92 0.22 0.95 0.2 0.007 0.002 0.007 0.002 0.003 0.005 0.95 0.06 0.000 0.006 0.003 0.05 0.5 Average 0.380 0.287 0.322 0.263 0.250 0.276 29

[Figure 2 appears here: scatter plots of θ_{(1)}(λ) against θ̂(λ) in eight settings of (n, k, κ, δ, ρ_y, ρ_x, λ), including (30, 5, 0, 1, 0.2, 0.2, 1), (30, 5, 0, 1, 0.2, 0.2, 2), (30, 5, 0, 1, 0.95, 0.2, 1), (30, 5, 0, 3, 0.2, 0.2, 1), (50, 5, 0, 1, 0.2, 0.2, 1), (30, 5, 0, 1, 0.2, 0.95, 1), (30, 5, 0, 1, 0.95, 0.95, 2), and (30, 10, 5, 3, 0.2, 0.2, 1); each panel shows the 45-degree line.]

Figure 2. The scatter plots for θ_{(1)}(λ) and θ̂(λ) in the case of (n, k, κ, δ, ρ_y, ρ_x, λ).

Table 7. Simulation results for the case in which (k, n) = (5, 30) Method κ δ ρ y ρ x 2 3 4 5 6 7 8 0 0 0.2 0.2 88.34 87.53 88.00 87.6 87.30 86.76 87.44 86.88 0.95 90.24 89.05 90.52 89.32 88.34 87.46 88.63 87.73 0.95 0.2 88.49 87.64 88.5 87.28 87.39 86.82 87.53 86.94 0.95 90.25 89.03 90.37 89.5 88.28 87.4 88.56 87.67 5 0.2 0.2 94.80 94.72 94.8 94.90 94.87 95.0 94.72 94.9 0.95 92.7 9.54 92.09 9.48 9.25 90.96 9.34 9.02 0.95 0.2 97.34 97.3 97.34 97.36 97.37 97.46 97.29 97.4 0.95 93.47 92.94 93.47 93.00 92.74 92.43 92.76 92.45 3 0.2 0.2 98.92 98.93 98.92 98.94 98.92 98.95 98.9 98.98 0.95 95.44 95.23 95.69 95.75 95.40 95.43 95.9 95.2 0.95 0.2 99.37 99.37 99.37 99.38 99.37 99.38 99.37 99.40 0.95 97.6 97.54 97.66 97.82 97.72 97.86 97.54 97.67 Average 93.87 93.40 93.87 93.46 93.25 93.00 93.27 93.02 Table 8. Simulation results for the case in which (k, n) = (5, 50) Method κ δ ρ y ρ x 2 3 4 5 6 7 8 0 0 0.2 0.2 92.22 9.94 92.06 9.70 9.63 9.45 9.73 9.54 0.95 93.29 92.9 93.52 93.08 92.0 9.80 92.27 9.98 0.95 0.2 92.22 9.96 92.08 9.72 9.67 9.49 9.76 9.58 0.95 93.24 92.83 93.42 93.0 92.05 9.78 92.2 9.93 5 0.2 0.2 97.46 97.45 97.46 97.48 97.52 97.57 97.46 97.52 0.95 94.96 94.79 95.00 94.82 94.54 94.46 94.57 94.48 0.95 0.2 98.75 98.76 98.76 98.77 98.77 98.79 98.77 98.8 0.95 96.9 96.06 96.28 96.20 95.96 95.92 95.9 95.87 3 0.2 0.2 99.57 99.57 99.57 99.57 99.57 99.57 99.57 99.58 0.95 97.70 97.67 97.80 97.84 97.78 97.83 97.66 97.70 0.95 0.2 99.79 99.80 99.79 99.80 99.79 99.80 99.80 99.8 0.95 98.73 98.73 98.74 98.75 98.77 98.80 98.74 98.78 Average 96.8 96.04 96.2 96.06 95.84 95.77 95.87 95.80 3

Table 9. Simulation results for the case in which (k, n) = (0, 30) Method κ δ ρ y ρ x 2 3 4 5 6 7 8 0 0 0.2 0.2 79.07 77.26 78.0 76.70 77.69 76.47 77.86 76.56 0.95 8.9 78.6 80.54 78.24 78.86 77.4 79.3 77.28 0.95 0.2 78.78 77.07 77.90 76.56 77.48 76.33 77.64 76.4 0.95 8.0 78.44 80.29 77.72 78.72 76.89 79.03 77.08 5 0.2 0.2 87.4 86.60 87.2 87.07 86.75 86.88 86.72 86.76 0.95 83.49 8.63 82.99 8.4 8.83 80.74 82.0 80.86 0.95 0.2 9.60 9.36 9.45 9.52 9.46 9.66 9.38 9.56 0.95 85.2 83.47 84.66 83.2 83.70 82.59 83.87 82.68 3 0.2 0.2 95.74 95.7 95.7 95.80 95.72 95.85 95.67 95.83 0.95 88.4 87.52 88.34 87.83 87.67 87.2 87.73 87.23 0.95 0.2 97.50 97.5 97.50 97.56 97.50 97.56 97.47 97.58 0.95 9.85 9.67 9.9 92.63 9.77 92.8 9.66 92.02 0 0.2 0.2 92.02 9.74 9.93 9.93 9.85 92.2 9.76 9.98 0.95 84.62 82.96 84.27 82.9 83.4 82.8 83.29 82.28 0.95 0.2 94.53 94.37 94.50 94.57 94.47 94.68 94.36 94.57 0.95 86.9 84.74 85.8 84.66 84.9 84.02 85.03 84.0 3 0.2 0.2 98.57 98.64 98.59 98.7 98.59 98.72 98.57 98.77 0.95 9.50 90.99 9.59 9.67 9.2 9.23 9.06 9.2 0.95 0.2 99.69 99.73 99.70 99.76 99.70 99.76 99.70 99.8 0.95 94.5 93.83 94.20 94.37 94.00 94.9 93.87 94.02 Average 89. 88.9 88.85 88.24 88.35 87.92 88.39 87.93 32

Table 0. Simulation results for the case in which (k, n) = (0, 50) Method κ δ ρ y ρ x 2 3 4 5 6 7 8 0 0 0.2 0.2 84.93 84.48 84.60 84.09 84.4 84.09 84.5 84.7 0.95 86.22 85.43 85.59 84.96 84.99 84.48 85.9 84.63 0.95 0.2 84.98 84.50 84.62 84.3 84.44 84.0 84.54 84.8 0.95 86.20 85.42 85.58 85.00 85.04 84.60 85.20 84.68 5 0.2 0.2 92.65 92.60 92.67 92.72 92.66 92.74 92.60 92.66 0.95 88.43 87.94 88.4 87.77 87.70 87.40 87.82 87.48 0.95 0.2 95.43 95.43 95.44 95.47 95.45 95.5 95.42 95.50 0.95 90.03 89.69 89.94 89.68 89.55 89.3 89.6 89.37 3 0.2 0.2 97.90 97.92 97.90 97.93 97.90 97.93 97.90 97.95 0.95 93.7 93.09 93.36 93.52 93.5 93.22 93.09 93.4 0.95 0.2 99.27 99.28 99.27 99.28 99.27 99.28 99.27 99.30 0.95 95.5 95.47 95.49 95.53 95.54 95.6 95.48 95.54 0 0.2 0.2 96.50 96.47 96.53 96.55 96.52 96.57 96.47 96.53 0.95 89.7 89.33 89.58 89.32 89.6 88.94 89.23 89.00 0.95 0.2 97.64 97.64 97.66 97.68 97.66 97.70 97.64 97.70 0.95 9.07 90.75 9.09 90.9 90.63 90.48 90.67 90.5 3 0.2 0.2 99.44 99.44 99.44 99.44 99.44 99.44 99.44 99.45 0.95 95.66 95.64 95.85 95.97 95.75 95.89 95.65 95.77 0.95 0.2 99.46 99.46 99.46 99.47 99.46 99.47 99.46 99.47 0.95 97.3 97.29 97.35 97.38 97.36 97.43 97.30 97.37 Average 93.08 92.86 92.98 92.84 92.80 92.7 92.83 92.72 33