Selection of Model Selection Criteria for Multivariate Ridge Regression


Selection of Model Selection Criteria for Multivariate Ridge Regression (Last Modified: January 24)

Isamu Nagai, Department of Mathematics, Graduate School of Science, Hiroshima University, 1-3-1 Kagamiyama, Higashi-Hiroshima, Hiroshima, Japan

Abstract

In the present study, we consider the selection of model selection criteria for multivariate ridge regression. There are several model selection criteria for selecting the ridge parameter in multivariate ridge regression, e.g., the $C_p$ criterion and the modified $C_p$ ($MC_p$) criterion. We propose the generalized $C_p$ ($GC_p$) criterion, which includes the $C_p$ and $MC_p$ criteria as special cases. The $GC_p$ criterion is specified by a non-negative parameter $\lambda$, which is referred to as the penalty parameter. We attempt to select an optimal penalty parameter such that the predictive mean square error (PMSE) of the predictor of ridge regression after optimizing the ridge parameter is minimized. Through numerical experiments, we verify that the proposed optimization methods exhibit better performance than conventional optimization methods, i.e., optimizing only the ridge parameter by minimizing the $C_p$ or $MC_p$ criterion.

Key words: Asymptotic expansion; Generalized $C_p$ criterion; Model selection criterion; Multivariate linear regression model; Ridge regression; Selection of the model selection criterion.

1. Introduction

In the present paper, we deal with a multivariate linear regression model with $n$ observations of a $p$-dimensional vector of response variables and a $k$-dimensional vector of regressors (for more detailed information, see, for example, Srivastava, 2002, Chapter 9; Timm, 2002, Chapter 4). Let $Y = (y_1, \ldots, y_n)'$, $X$, and $E = (\varepsilon_1, \ldots, \varepsilon_n)'$ be the $n \times p$ matrix of response variables, the $n \times k$ matrix of non-stochastic centerized explanatory variables (i.e., $X' 1_n = 0_k$) of $\mathrm{rank}(X) = k\ (< n)$, and the $n \times p$ matrix of error variables, respectively, where $n$ is the sample size, $1_n$ is an $n$-dimensional vector of ones, and $0_k$ is a $k$-dimensional vector of zeros. Suppose that $n - k - p - 2 > 0$ and $\varepsilon_1, \ldots, \varepsilon_n \sim \text{i.i.d. } N_p(0_p, \Sigma)$, where $\Sigma$ is a $p \times p$ unknown covariance matrix.

Then, the matrix form of the multivariate linear regression model is expressed as $Y = 1_n \mu' + X \Xi + E$, where $\mu$ is a $p$-dimensional unknown location vector and $\Xi$ is a $k \times p$ unknown regression coefficient matrix. This model can also be expressed as $Y \sim N_{n \times p}(1_n \mu' + X \Xi, \Sigma \otimes I_n)$. Note that $X$ is centerized. The maximum likelihood or least squares (LS) estimators of $\mu$ and $\Xi$ are given by $\hat{\mu} = Y' 1_n / n$ and $\hat{\Xi} = (X'X)^{-1} X' Y$, respectively. Since $\hat{\mu}$ and $\hat{\Xi}$ are simple and are unbiased estimators of $\mu$ and $\Xi$, LS estimators are widely used in actual data analysis (see, e.g., Dien et al., 2006; Sârbu et al., 2008; Saxén and Sundell, 2006; Skagerberg, MacGregor, and Kiparissides, 1992; Yoshimoto, Yanagihara, and Ninomiya, 2005). However, it is well known that the estimator of $\Xi$ becomes unstable when multicollinearity occurs in $X$. In order to avoid this problem, ridge regression was proposed by Hoerl and Kennard (1970) for the case $p = 1$. Several studies extended this univariate ridge regression to the multivariate case, e.g., Brown and Zidek (1980), Haitovsky (1987), and Yanagihara and Satoh (2010). The ridge regression estimator of $\Xi$ is given as
\[ \hat{\Xi}_\theta = M_\theta^{-1} X' Y, \]
where $M_\theta = X'X + \theta I_k$ and $\theta$ is a nonnegative value, which is referred to as a ridge parameter. Since the estimate $\hat{\Xi}_\theta$ depends strongly on the value of the ridge parameter, the optimization of $\theta$ is an important problem in ridge regression. An optimal $\theta$ is commonly determined by minimizing the predictive mean square error (PMSE) of the predictor of $Y$, i.e., $\hat{Y}_\theta = 1_n \hat{\mu}' + X \hat{\Xi}_\theta$. However, we cannot directly use the PMSE to optimize $\theta$, because unknown parameters are included in the PMSE. Hence, we adopt an optimization method using a model selection criterion, i.e., an estimator of the PMSE, instead of the unknown PMSE. As an estimator of the PMSE, Yanagihara and Satoh (2010) proposed a $C_p$ criterion. This criterion includes, as special cases, the $C_p$ criterion for selecting variables in a univariate linear model, which was proposed by Mallows (1973; 1995), and that for selecting variables in a multivariate linear model, which was proposed by Sparks, Coutsourides, and Troskie (1983). Yanagihara and Satoh (2010) also proposed the modified $C_p$ ($MC_p$) criterion such that the bias of the $C_p$ criterion for choosing the ridge parameter with respect to the PMSE is completely corrected under a fixed $\theta$. This criterion coincides with the bias-corrected $C_p$ criterion proposed by Fujikoshi and Satoh (1997) when $\theta = 0$.
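The estimators above are straightforward to compute. The following minimal sketch (in Python with NumPy; the function names are ours, not from the paper) computes $\hat{\mu}$, $\hat{\Xi}_\theta$, and the predictor $\hat{Y}_\theta$ for a given ridge parameter $\theta$, assuming a centered $X$ as in the model above.

```python
import numpy as np

def ridge_coefficients(X, Y, theta):
    """Multivariate ridge estimator as defined above (a minimal sketch).

    X : (n, k) centered explanatory matrix, Y : (n, p) response matrix,
    theta : nonnegative ridge parameter.  Returns (mu_hat, Xi_hat_theta).
    """
    n, k = X.shape
    mu_hat = Y.T @ np.ones(n) / n                 # location estimate Y'1_n / n
    M_theta = X.T @ X + theta * np.eye(k)         # M_theta = X'X + theta I_k
    Xi_hat = np.linalg.solve(M_theta, X.T @ Y)    # M_theta^{-1} X'Y
    return mu_hat, Xi_hat

def ridge_predictor(X, Y, theta):
    """Predictor Y_hat_theta = 1_n mu_hat' + X Xi_hat_theta."""
    n = X.shape[0]
    mu_hat, Xi_hat = ridge_coefficients(X, Y, theta)
    return np.outer(np.ones(n), mu_hat) + X @ Xi_hat
```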

The $MC_p$ criterion has several desirable properties as an estimator of the PMSE, as described by, e.g., Fujikoshi, Yanagihara, and Wakaki (2005) and Yanagihara and Satoh (2010). Unfortunately, optimizing $\theta$ by minimizing $MC_p$, i.e., an unbiased estimator of the PMSE, does not always minimize the PMSE of $\hat{Y}_\theta$. This indicates that there will be an optimal model selection criterion for selecting $\theta$. Thus, we propose a generalized $C_p$ ($GC_p$) criterion that includes the $C_p$ and $MC_p$ criteria as special cases (originally, the $GC_p$ criterion was proposed by Atkinson, 1980, for selecting variables in the univariate linear model). The $GC_p$ criterion is specified by a non-negative parameter $\lambda$, which is referred to as the penalty parameter. From the viewpoint of making the PMSE of the predictor of $Y$ after optimizing $\theta$ small, we select the optimal penalty parameter $\lambda$; this is essentially the selection of the model selection criterion. In the present paper, we optimize $\lambda$ by the following three methods:

(Double optimization): We optimize $\theta$ and $\lambda$ simultaneously by minimizing $GC_p$ and the penalty selection criterion, respectively.

(Optimization of $\lambda$ with an approximated value of an optimal $\theta$): We optimize $\lambda$ by minimizing the penalty selection criterion constructed from the approximated value of the optimal $\theta$.

(Asymptotic optimization of $\lambda$): We calculate an asymptotically optimal $\lambda$ from an asymptotic expansion of the PMSE. We then estimate the asymptotically optimal $\lambda$.

Through the optimization of the model selection criterion, we can perform a reasonable optimization of $\theta$. The remainder of the present paper is organized as follows: In Section 2, we propose the $GC_p$ criterion, which includes the criteria proposed by Yanagihara and Satoh (2010) as special cases. In Section 3, we propose three optimization methods for $\lambda$. In Section 4, we compare the optimization methods by conducting numerical studies. Finally, technical details are provided in the Appendix.

2. Generalized $C_p$ Criterion

In this section, we propose the $GC_p$ criterion for optimizing the ridge parameter $\theta$, which includes the $C_p$ and $MC_p$ criteria proposed by Yanagihara and Satoh (2010). Moreover, we present several mathematical properties of the $\theta$ obtained by minimizing the $GC_p$ criterion.

The PMSE of $\hat{Y}_\theta$ is defined as
\[ \mathrm{PMSE}[\hat{Y}_\theta] = E_Y[E_U[\mathrm{tr}\{(U - \hat{Y}_\theta)'(U - \hat{Y}_\theta)\Sigma^{-1}\}]], \]
where $U$ is a random variable matrix that is independent of $Y$ and has the same distribution as $Y$. The $C_p$ criterion proposed by Yanagihara and Satoh (2010) is a rough estimator of the PMSE of $\hat{Y}_\theta$, which is defined by
\[ C_p(\theta) = \mathrm{tr}(W_\theta S^{-1}) + 2p\,\mathrm{tr}(M_\theta^{-1} M_0), \]
where $W_\theta$ is the residual sum of squares matrix defined by $W_\theta = (Y - \hat{Y}_\theta)'(Y - \hat{Y}_\theta)$, and $S$ is an unbiased estimator of $\Sigma$ defined by $S = W_0/(n-k-1)$. From the definition of the $C_p$ criterion, the first term of $C_p$ measures the closeness of the ridge regression to the data, and the second term evaluates the penalty for the complexity of the ridge regression. However, the $C_p$ criterion has a bias with respect to the PMSE. The $MC_p$ criterion proposed by Yanagihara and Satoh (2010) is an exact unbiased estimator of the PMSE. By neglecting terms that are independent of $\theta$, $MC_p$ is defined as
\[ MC_p(\theta) = c_M \mathrm{tr}(W_\theta S^{-1}) + 2p\,\mathrm{tr}(M_\theta^{-1} M_0), \]
where $c_M = 1 - (p+1)/(n-k-1)$. By comparing the two criteria, we can see that the difference between $C_p$ and $MC_p$ is the coefficient before $\mathrm{tr}(W_\theta S^{-1})$.

Thus, we can generalize the model selection criterion for optimizing the ridge parameter as
\[ GC_p(\theta, \lambda) = \lambda\,\mathrm{tr}(W_\theta S^{-1}) + 2p\,\mathrm{tr}(M_\theta^{-1} M_0), \quad (2.1) \]
where $\lambda$ is a non-negative parameter. Note that $GC_p(\theta, 1) = C_p(\theta)$ and $GC_p(\theta, c_M) = MC_p(\theta)$. In this criterion, the penalty for the complexity of the model, which is the second term of (2.1), becomes relatively large when $\lambda$ becomes small. This means that $\lambda$ controls the penalty for the complexity of the model in the criterion (2.1). Hence, we can regard $\lambda$ as a penalty parameter. In the present paper, we consider the optimization of $\lambda$ to obtain the optimal $\theta$ that further reduces the PMSE. When $\lambda$ is fixed, the optimized ridge parameter $\hat{\theta}(\lambda)$ is obtained by
\[ \hat{\theta}(\lambda) = \arg\min_{\theta \in [0, \infty]} GC_p(\theta, \lambda). \quad (2.2) \]
Since $\hat{\theta}(\lambda)$ is a minimizer of $GC_p(\theta, \lambda)$, the following equation holds:
\[ \left.\frac{\partial}{\partial \theta} GC_p(\theta, \lambda)\right|_{\theta = \hat{\theta}(\lambda)} = 0. \quad (2.3) \]
Note that $\hat{\theta}(\lambda)$ changes with $\lambda$.
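As a concrete illustration of (2.1) and (2.2), the following hedged sketch evaluates $GC_p(\theta, \lambda)$ and minimizes it over $\theta$ numerically. It is only a sketch under our own naming, with a large finite upper bound standing in for $\theta = \infty$; the paper itself uses Matlab's fminsearch for this kind of step (see Section 4).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gcp_criterion(theta, lam, X, Y):
    """GC_p(theta, lambda) = lambda * tr(W_theta S^{-1}) + 2p tr(M_theta^{-1} M_0)."""
    n, k = X.shape
    p = Y.shape[1]
    ones = np.ones(n)
    Yc = Y - np.outer(ones, Y.T @ ones / n)                 # remove mu_hat (Y is centered)
    M0 = X.T @ X
    M_theta = M0 + theta * np.eye(k)
    resid = Yc - X @ np.linalg.solve(M_theta, X.T @ Yc)     # Y - Y_hat_theta
    W_theta = resid.T @ resid
    W0 = Yc.T @ Yc - Yc.T @ X @ np.linalg.solve(M0, X.T @ Yc)
    S = W0 / (n - k - 1)
    fit = np.trace(W_theta @ np.linalg.inv(S))
    penalty = 2 * p * np.trace(np.linalg.solve(M_theta, M0))
    return lam * fit + penalty

def optimize_theta(lam, X, Y, upper=1e6):
    """theta_hat(lambda): minimize GC_p over theta on a bounded interval (sketch)."""
    res = minimize_scalar(gcp_criterion, bounds=(0.0, upper),
                          args=(lam, X, Y), method="bounded")
    return res.x
```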

Here, we obtain the following mathematical properties of $\hat{\theta}(\lambda)$ (the proof is provided in Appendix A.1):

Theorem 2.1. Let
\[ (z_1, \ldots, z_k)' = Q' X' Y S^{-1/2}, \quad (2.4) \]
\[ r_{\lambda,j} = \frac{\lambda \|z_j\|^2 - p d_j}{p d_j^2} \quad (j = 1, \ldots, k), \quad (2.5) \]
where $z_j$ is a $p$-dimensional vector, $Q$ is a $k \times k$ orthogonal matrix which diagonalizes $X'X$, i.e., $Q' X' X Q = D = \mathrm{diag}(d_1, \ldots, d_k)$, $d_j\ (j = 1, \ldots, k)$ are the eigenvalues of $X'X$, and $r^+_{\lambda,1} \ge \cdots \ge r^+_{\lambda,m}$ $(m \le k)$ are the positive values among $r_{\lambda,1}, \ldots, r_{\lambda,k}$. Then, $\hat{\theta}(\lambda)$ has the following properties:

1. $\hat{\theta}(\lambda)$ is a monotonic decreasing function with respect to $\lambda$.
2. $\hat{\theta}(\lambda)$ is not 0 when $\lambda \in [0, \infty)$.
3. $\hat{\theta}(\lambda) > (r^+_{\lambda,1})^{-1}$ when $r^+_{\lambda,1}$ exists; $\hat{\theta}(\lambda) = \infty$ when $r^+_{\lambda,1}$ does not exist, i.e., when $\max_{j=1,\ldots,k} r_{\lambda,j} \le 0$.
4. $\hat{\theta}(\infty) = 0$ and $\hat{\theta}(0) = \infty$.
5. $\hat{\theta}(\lambda) = \infty$ for any $\lambda \le \min_{j=1,\ldots,k} p d_j / \|z_j\|^2$.

We suppose that $d_j = O(n)$. However, we must use an iterative computational algorithm to optimize $\theta$ because we cannot obtain $\hat{\theta}(\lambda)$ in closed form. In order to reduce the number of computational tasks, we consider approximating $\hat{\theta}(\lambda)$ using an asymptotic expansion. Equation (2.3) yields an asymptotic expansion of the $GC_p$ criterion, and from this expansion we obtain the asymptotic expansion of $\hat{\theta}(\lambda)$ as follows:

Theorem 2.2. $\hat{\theta}(\lambda)$ can be expanded as $\hat{\theta}(\lambda) = \theta_{(L)}(\lambda) + O_p(n^{-L})$, where
\[ \theta_{(L)}(\lambda) = \frac{p b_1}{\lambda a_1} + \frac{1}{2 \lambda a_1} \sum_{l=1}^{L-1} \frac{(-1)^{l+1} (l+1)}{n^l}\, \theta_{(L-1)}(\lambda)^l \{\lambda (l+2) a_{l+1} \theta_{(L-1)}(\lambda) - 2 p b_{l+1}\}. \quad (2.6) \]
Here, $\theta_{(0)}(\lambda) = 0$, $V = X' Y S^{-1} Y' X$, $a_j = n^j \mathrm{tr}(V M_0^{-(j+2)})$, $b_j = n^j \mathrm{tr}(M_0^{-j})$, and $\theta_{(L)}(\lambda)^l$ refers to $\{\theta_{(L)}(\lambda)\}^l$.

The proof of this theorem is given in Appendix A.2. Note that $\theta_{(L)}(\lambda)$ can be used as an approximated value of $\hat{\theta}(\lambda)$. There is a one-to-one correspondence between $\theta_{(1)}(\lambda) = p b_1/(\lambda a_1)$ and $\lambda$, and $\theta_{(1)}(\lambda)$ satisfies properties 1, 2, and 4 in Theorem 2.1.
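The leading term $\theta_{(1)}(\lambda) = p b_1/(\lambda a_1)$ is available in closed form, which is the main computational appeal of Theorem 2.2. A minimal sketch follows (our own naming; the factors of $n$ in $a_1$ and $b_1$ cancel in the ratio, so unnormalized traces are used).

```python
import numpy as np

def theta_approx_first_order(lam, X, Y):
    """theta_(1)(lambda) = p*b_1 / (lambda*a_1), the leading term of Theorem 2.2."""
    n, k = X.shape
    p = Y.shape[1]
    ones = np.ones(n)
    Yc = Y - np.outer(ones, Y.T @ ones / n)
    M0 = X.T @ X
    M0_inv = np.linalg.inv(M0)
    W0 = Yc.T @ Yc - Yc.T @ X @ M0_inv @ X.T @ Yc
    S = W0 / (n - k - 1)
    V = X.T @ Yc @ np.linalg.solve(S, Yc.T @ X)              # V = X'Y S^{-1} Y'X
    a1 = np.trace(V @ np.linalg.matrix_power(M0_inv, 3))     # tr(V M_0^{-3})
    b1 = np.trace(M0_inv)                                    # tr(M_0^{-1})
    return p * b1 / (lam * a1)
```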

3. Optimization of the Penalty in the $GC_p$ Criterion

3.1. Double optimization of $\theta$ and $\lambda$

In the previous section, we considered the model selection criterion for selecting $\theta$, which can be regarded as an estimator of $\mathrm{PMSE}[\hat{Y}_\theta]$. By minimizing this estimator, we expect to reduce the PMSE of $\hat{Y}_\theta$. However, since the optimal ridge parameter changes with the data, it is important to reduce not the PMSE of $\hat{Y}_\theta$ but rather the PMSE of $\hat{Y}_{\hat{\theta}}$, i.e., the predictor of $Y$ after optimizing $\theta$. In this section, we consider optimizing $\lambda$ using $\mathrm{PMSE}[\hat{Y}_{\hat{\theta}(\lambda)}]$, where $\hat{Y}_{\hat{\theta}(\lambda)} = 1_n \hat{\mu}' + X \hat{\Xi}_{\hat{\theta}(\lambda)}$. Without loss of generality, we can assume that the covariance matrix of $y_i$ is $I_p$ in $\mathrm{PMSE}[\hat{Y}_{\hat{\theta}(\lambda)}]$. Therefore, from Efron (2004), we obtain $\mathrm{PMSE}[\hat{Y}_{\hat{\theta}(\lambda)}]$, which is a function of $\lambda$, as follows:
\[ \mathrm{PMSE}[\hat{Y}_{\hat{\theta}(\lambda)}] = E_Y[\mathrm{tr}(W_{\hat{\theta}(\lambda)} \Sigma^{-1})] + 2 E_Y\!\left[\sum_{i=1}^n \sum_{j=1}^p \frac{\partial (\hat{Y}_{\hat{\theta}(\lambda)})_{ij}}{\partial (Y)_{ij}}\right], \]
where $(A)_{ij}$ is the $(i,j)$th element of $A$. Since $\hat{\theta}(\lambda)$ depends on $(Y)_{ij}$, we can see that
\[ \frac{\partial (\hat{Y}_{\hat{\theta}(\lambda)})_{ij}}{\partial (Y)_{ij}} = \left.\frac{\partial (\hat{Y}_\theta)_{ij}}{\partial (Y)_{ij}}\right|_{\theta = \hat{\theta}(\lambda)} + \left.\frac{\partial (\hat{Y}_\theta)_{ij}}{\partial \theta}\right|_{\theta = \hat{\theta}(\lambda)} \frac{\partial \hat{\theta}(\lambda)}{\partial (Y)_{ij}}. \quad (3.1) \]
The first term of the above equation is calculated as
\[ \left.\frac{\partial (\hat{Y}_\theta)_{ij}}{\partial (Y)_{ij}}\right|_{\theta = \hat{\theta}(\lambda)} = \frac{\partial (1_n \hat{\mu}')_{ij}}{\partial (Y)_{ij}} + (X M_{\hat{\theta}(\lambda)}^{-1} X')_{ii}. \]
Note that $\sum_{i=1}^n \sum_{j=1}^p \partial(1_n \hat{\mu}')_{ij}/\partial (Y)_{ij} = p$ and $\sum_{i=1}^n \sum_{j=1}^p (X M_{\hat{\theta}(\lambda)}^{-1} X')_{ii} = p\,\mathrm{tr}(M_{\hat{\theta}(\lambda)}^{-1} M_0)$. Next, we consider the second term of (3.1). Note that
\[ \frac{\partial (\hat{Y}_\theta)_{ij}}{\partial \theta} = \frac{\partial}{\partial \theta}(X M_\theta^{-1} X' Y)_{ij} = -(X M_\theta^{-2} X' Y)_{ij}. \]
Hence, we derive
\[ \mathrm{PMSE}[\hat{Y}_{\hat{\theta}(\lambda)}] = E_Y[\mathrm{tr}(W_{\hat{\theta}(\lambda)} \Sigma^{-1})] + 2p\{E_Y[\mathrm{tr}(M_{\hat{\theta}(\lambda)}^{-1} M_0)] + 1\} - 2 E_Y\!\left[\sum_{i=1}^n \sum_{j=1}^p (X M_{\hat{\theta}(\lambda)}^{-2} X' Y)_{ij} \frac{\partial \hat{\theta}(\lambda)}{\partial (Y)_{ij}}\right]. \quad (3.2) \]
Based on this result, we need only obtain $\partial \hat{\theta}(\lambda)/\partial (Y)_{ij}$ in order to calculate $\mathrm{PMSE}[\hat{Y}_{\hat{\theta}(\lambda)}]$. This derivative leads to the following theorem (the proof is given in Appendix A.3):

Theorem 3.1. The PMSE of $\hat{Y}_{\hat{\theta}(\lambda)}$ is expressed as
\[ \mathrm{PMSE}[\hat{Y}_{\hat{\theta}(\lambda)}] = E_Y[\mathrm{tr}(W_{\hat{\theta}(\lambda)} \Sigma^{-1}) + 2p\{\mathrm{tr}(M_{\hat{\theta}(\lambda)}^{-1} M_0) + 1\} + 4 B(\hat{\theta}(\lambda))], \quad (3.3) \]
where
\[ B(\theta) = \frac{\lambda \theta\, \mathrm{tr}(V M_\theta^{-5} M_0)}{\lambda\, \mathrm{tr}(V M_\theta^{-3}) - 3 \lambda \theta\, \mathrm{tr}(V M_\theta^{-4}) + 2p\, \mathrm{tr}(M_\theta^{-3} M_0)}. \]

By neglecting the terms that are independent of $\lambda$, we define the penalty selection criteria for optimizing $\lambda$ as follows:

Definition 3.1. The penalty selection criteria to optimize $\lambda$ are defined as
\[ C_p^\#(\lambda) = \mathrm{tr}(W_{\hat{\theta}(\lambda)} S^{-1}) + 2p\, \mathrm{tr}(M_{\hat{\theta}(\lambda)}^{-1} M_0) + 4 B(\hat{\theta}(\lambda)), \]
\[ MC_p^\#(\lambda) = c_M \mathrm{tr}(W_{\hat{\theta}(\lambda)} S^{-1}) + 2p\, \mathrm{tr}(M_{\hat{\theta}(\lambda)}^{-1} M_0) + 4 B(\hat{\theta}(\lambda)), \]
where $\hat{\theta}(\lambda)$ is given by (2.2) and $c_M = 1 - (p+1)/(n-k-1)$.

Note that $C_p^\#(\lambda)$ is obtained by substituting $S$ for $\Sigma$ when we neglect the terms that are independent of $\lambda$ in (3.3). However, a bias remains because $S^{-1}$ is not an unbiased estimator of $\Sigma^{-1}$ (see, e.g., Siotani, Hayakawa, and Fujikoshi, 1985). Based on the results reported by Yanagihara and Satoh (2010), we correct the bias of $E_Y[\mathrm{tr}(W_{\hat{\theta}(\lambda)} S^{-1})]$ to $E_Y[\mathrm{tr}(W_{\hat{\theta}(\lambda)} \Sigma^{-1})]$ under fixed $\hat{\theta}(\lambda)$. Finally, we define $MC_p^\#(\lambda)$ by neglecting terms that are independent of $\theta$ and $\lambda$. Using these criteria, $\lambda$ and $\theta$ are optimized as follows:
\[ \hat{\lambda}_C^\# = \arg\min_{\lambda \in [0, \infty]} C_p^\#(\lambda) \quad \text{and} \quad \hat{\theta}(\hat{\lambda}_C^\#) = \arg\min_{\theta \in [0, \infty]} GC_p(\theta, \hat{\lambda}_C^\#), \]
\[ \hat{\lambda}_M^\# = \arg\min_{\lambda \in [0, \infty]} MC_p^\#(\lambda) \quad \text{and} \quad \hat{\theta}(\hat{\lambda}_M^\#) = \arg\min_{\theta \in [0, \infty]} GC_p(\theta, \hat{\lambda}_M^\#). \]
These optimization methods are similar to those reported by Ye (1998) and Shen and Ye (2002).
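The double optimization can be organized as an outer search over $\lambda$ in which every evaluation of $C_p^\#(\lambda)$ re-optimizes $\theta$ by minimizing $GC_p(\theta, \lambda)$. The sketch below is illustrative only: the names are ours, $B(\theta)$ follows the expression in Theorem 3.1 as reconstructed above, and a bounded scalar search replaces the paper's fminsearch.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def _stats(X, Y):
    """Centered Y, M_0, S, and V = X'Y S^{-1} Y'X used by the criteria."""
    n, k = X.shape
    Yc = Y - np.outer(np.ones(n), Y.T @ np.ones(n) / n)
    M0 = X.T @ X
    W0 = Yc.T @ Yc - Yc.T @ X @ np.linalg.solve(M0, X.T @ Yc)
    S = W0 / (n - k - 1)
    V = X.T @ Yc @ np.linalg.solve(S, Yc.T @ X)
    return Yc, M0, S, V

def gcp(theta, lam, X, Y):
    """GC_p(theta, lambda) as in (2.1)."""
    Yc, M0, S, _ = _stats(X, Y)
    k, p = X.shape[1], Y.shape[1]
    Minv = np.linalg.inv(M0 + theta * np.eye(k))
    resid = Yc - X @ Minv @ X.T @ Yc
    return (lam * np.trace(resid.T @ resid @ np.linalg.inv(S))
            + 2 * p * np.trace(Minv @ M0))

def cp_sharp(lam, X, Y, upper=1e6):
    """C_p^#(lambda): re-optimize theta by GC_p, then evaluate Definition 3.1."""
    theta = minimize_scalar(gcp, bounds=(0.0, upper), args=(lam, X, Y),
                            method="bounded").x
    Yc, M0, S, V = _stats(X, Y)
    k, p = X.shape[1], Y.shape[1]
    Minv = np.linalg.inv(M0 + theta * np.eye(k))
    M3, M4, M5 = (np.linalg.matrix_power(Minv, j) for j in (3, 4, 5))
    B = (lam * theta * np.trace(V @ M5 @ M0)
         / (lam * np.trace(V @ M3) - 3 * lam * theta * np.trace(V @ M4)
            + 2 * p * np.trace(M3 @ M0)))
    resid = Yc - X @ Minv @ X.T @ Yc
    return (np.trace(resid.T @ resid @ np.linalg.inv(S))
            + 2 * p * np.trace(Minv @ M0) + 4 * B)

# Double optimization (Method 3 style): outer search over lambda, inner over theta.
# lam_hat = minimize_scalar(cp_sharp, bounds=(0.0, 2.0), args=(X, Y), method="bounded").x
# theta_hat = minimize_scalar(gcp, bounds=(0.0, 1e6), args=(lam_hat, X, Y), method="bounded").x
```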

3.2. Optimization of $\lambda$ with approximated $\hat{\theta}(\lambda)$

In the previous subsection, we proposed penalty selection criteria for selecting $\lambda$. These criteria are constructed from the optimal $\theta$ obtained by minimizing the $GC_p$ criterion. This indicates that we need to repeat the optimization of $\theta$ until the optimal $\lambda$ is obtained, and hence a large number of computational tasks are required. In this subsection, we try to reduce the number of computational tasks by using the approximated $\hat{\theta}(\lambda)$ given by (2.6). Thus, we propose the penalty selection criterion based on the approximated $\hat{\theta}(\lambda)$. To do so, we calculate $\partial \theta_{(L)}(\lambda)/\partial (Y)_{ij}$. The following lemma is useful for obtaining such a derivative (the proof is provided in Appendix A.4):

Lemma 3.1. For any $l$, the first derivative of $a_l$ with respect to $(Y)_{ij}$ is calculated as
\[ \frac{\partial a_l}{\partial (Y)_{ij}} = 2 n^l \left(S^{-1} Y' X M_0^{-(l+2)} X' \left(I_n - \frac{1}{n-k-1} Y S^{-1} Y' H\right)\right)_{ji}, \]
where $H = I_n - 1_n 1_n'/n - X M_0^{-1} X'$.

By using this lemma and (2.6), we obtain the following theorem:

Theorem 3.2. The PMSE of $\hat{Y}_{\theta_{(L)}(\lambda)}$ is expressed as
\[ \mathrm{PMSE}[\hat{Y}_{\theta_{(L)}(\lambda)}] = E_Y[\mathrm{tr}(W_{\theta_{(L)}(\lambda)} \Sigma^{-1}) + 2p\{\mathrm{tr}(M_{\theta_{(L)}(\lambda)}^{-1} M_0) + 1\}] + 2 E_Y[B_1(\theta_{(L)}(\lambda))], \]
where $B_1(\theta_{(L)}(\lambda))$ is an explicit function of $a_1, \ldots, a_L$, $b_1, \ldots, b_L$, $\theta_{(L)}(\lambda)$, and $\theta_{(L-1)}(\lambda)$; its full expression is obtained in Appendix A.5.

The proof of this theorem is presented in Appendix A.5. When $\theta_{(1)}(\lambda)$ is used, we obtain
\[ \mathrm{PMSE}[\hat{Y}_{\theta_{(1)}(\lambda)}] = E_Y[\mathrm{tr}(W_{\theta_{(1)}(\lambda)} \Sigma^{-1}) + 2p\{\mathrm{tr}(M_{\theta_{(1)}(\lambda)}^{-1} M_0) + 1\}] + E_Y\!\left[\frac{4 n\, \theta_{(1)}(\lambda)}{a_1} \mathrm{tr}(M_0^{-2} M_{\theta_{(1)}(\lambda)}^{-2} V)\right]. \]
Thus, by neglecting terms that are independent of $\lambda$, the penalty selection criteria with $\theta_{(1)}(\lambda)$ are defined as follows:

Definition 3.2. The penalty selection criteria to optimize $\lambda$ when $\theta_{(1)}(\lambda)$ is used are defined as
\[ C_p^{(1)}(\lambda) = \mathrm{tr}(W_{\theta_{(1)}(\lambda)} S^{-1}) + 2p\, \mathrm{tr}(M_{\theta_{(1)}(\lambda)}^{-1} M_0) + \frac{4 n\, \theta_{(1)}(\lambda)}{a_1} \mathrm{tr}(M_0^{-2} M_{\theta_{(1)}(\lambda)}^{-2} V), \]
\[ MC_p^{(1)}(\lambda) = c_M \mathrm{tr}(W_{\theta_{(1)}(\lambda)} S^{-1}) + 2p\, \mathrm{tr}(M_{\theta_{(1)}(\lambda)}^{-1} M_0) + \frac{4 n\, \theta_{(1)}(\lambda)}{a_1} \mathrm{tr}(M_0^{-2} M_{\theta_{(1)}(\lambda)}^{-2} V), \]
where $\theta_{(1)}(\lambda) = p b_1/(\lambda a_1)$ and $c_M = 1 - (p+1)/(n-k-1)$.
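Because $\theta_{(1)}(\lambda)$ is available in closed form, $C_p^{(1)}(\lambda)$ and $MC_p^{(1)}(\lambda)$ can be evaluated without any inner optimization over $\theta$. A hedged sketch (our own naming; the powers of $n$ in $a_1$ and $b_1$ cancel, so unnormalized traces are used) is:

```python
import numpy as np

def cp1_criteria(lam, X, Y):
    """C_p^(1)(lambda) and MC_p^(1)(lambda) from Definition 3.2 (sketch)."""
    n, k = X.shape
    p = Y.shape[1]
    ones = np.ones(n)
    Yc = Y - np.outer(ones, Y.T @ ones / n)
    M0 = X.T @ X
    M0_inv = np.linalg.inv(M0)
    W0 = Yc.T @ Yc - Yc.T @ X @ M0_inv @ X.T @ Yc
    S = W0 / (n - k - 1)
    V = X.T @ Yc @ np.linalg.solve(S, Yc.T @ X)
    trv3 = np.trace(V @ np.linalg.matrix_power(M0_inv, 3))   # tr(V M_0^{-3})
    theta1 = p * np.trace(M0_inv) / (lam * trv3)             # theta_(1)(lambda)
    M_inv = np.linalg.inv(M0 + theta1 * np.eye(k))
    resid = Yc - X @ M_inv @ X.T @ Yc
    fit = np.trace(resid.T @ resid @ np.linalg.inv(S))
    pen = 2 * p * np.trace(M_inv @ M0)
    corr = 4 * theta1 * np.trace(M0_inv @ M0_inv @ M_inv @ M_inv @ V) / trv3
    c_M = 1 - (p + 1) / (n - k - 1)
    return fit + pen + corr, c_M * fit + pen + corr           # (C_p^(1), MC_p^(1))
```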

Similar to $MC_p^\#(\lambda)$, $MC_p^{(1)}(\lambda)$ can be regarded as a simple bias-corrected $C_p^{(1)}(\lambda)$. At least when $\theta_{(1)}(\lambda) = 0$, $MC_p^{(1)}(\lambda)$ completely corrects the bias of $C_p^{(1)}(\lambda)$. If we use $\theta_{(L)}(\lambda)$ with $L$ other than 1, the penalty selection criteria become more complicated as $L$ increases. As an example, we describe the penalty selection criteria for $\theta_{(2)}(\lambda)$ in Appendix A.6. From the viewpoint of application, $C_p^{(1)}(\lambda)$ and $MC_p^{(1)}(\lambda)$ are useful because they are the simplest criteria among all $L$. When we use $C_p^{(1)}(\lambda)$ and $MC_p^{(1)}(\lambda)$, the optimal $\theta$ and $\lambda$ are given as follows:
\[ \hat{\lambda}_C^{(1)} = \arg\min_{\lambda \in [0, \infty]} C_p^{(1)}(\lambda) \quad \text{and} \quad \hat{\theta}(\hat{\lambda}_C^{(1)}) = \theta_{(1)}(\hat{\lambda}_C^{(1)}), \]
\[ \hat{\lambda}_M^{(1)} = \arg\min_{\lambda \in [0, \infty]} MC_p^{(1)}(\lambda) \quad \text{and} \quad \hat{\theta}(\hat{\lambda}_M^{(1)}) = \theta_{(1)}(\hat{\lambda}_M^{(1)}). \]

3.3. Asymptotic optimization of $\lambda$

In the previous subsections, we proposed penalty selection criteria. When such criteria are used to optimize $\lambda$, we must perform an iterative procedure. In this subsection, we consider the non-iterative optimization of $\lambda$. This requires the calculation of an asymptotically optimal $\lambda$, which minimizes an asymptotic expansion of $\mathrm{PMSE}[\hat{Y}_{\hat{\theta}(\lambda)}]$ among $\lambda \in [0, \infty]$. The following theorem gives such an asymptotically optimal value of $\lambda$ (the proof is provided in Appendix A.7):

Theorem 3.3. An asymptotically optimal $\lambda$, which minimizes $\mathrm{PMSE}[\hat{Y}_{\hat{\theta}(\lambda)}]$ asymptotically, is given by
\[ \lambda^* = \frac{E_Y[a_1^*/a_1^2]}{E_Y[1/a_1] - 2 E_Y[a_2/a_1^2]/(p b_1)}, \]
where $V^* = X' Y \Sigma^{-1} Y' X$ and $a_j^* = n^j \mathrm{tr}(V^* M_0^{-(j+2)})$.

By replacing $a_1^*$ with $a_1$, we estimate $\lambda^*$ as follows:
\[ \hat{\lambda}_0 = \left\{1 - \frac{2\,\mathrm{tr}(M_0^{-4} V)}{p\,\mathrm{tr}(M_0^{-3} V)\,\mathrm{tr}(M_0^{-1})}\right\}^{-1}. \quad (3.4) \]
Note that $E_Y[a_1^*] = c_M E_Y[a_1]$ holds. Hence, we can estimate $E_Y[a_1^*]$ by $c_M a_1$. This implies a new estimator of $\lambda^*$ given by $\hat{\lambda}_M = c_M \hat{\lambda}_0$. When we use $\hat{\lambda}_0$ and $\hat{\lambda}_M$, the optimal $\theta$ is given by
\[ \hat{\theta}(\hat{\lambda}_0) = \arg\min_{\theta \in [0, \infty]} GC_p(\theta, \hat{\lambda}_0), \qquad \hat{\theta}(\hat{\lambda}_M) = \arg\min_{\theta \in [0, \infty]} GC_p(\theta, \hat{\lambda}_M), \]
where $\hat{\lambda}_0$ is given in (3.4) and $\hat{\lambda}_M = c_M \hat{\lambda}_0$.
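Since $\hat{\lambda}_0$ in (3.4) and $\hat{\lambda}_M = c_M \hat{\lambda}_0$ are non-iterative, they can be computed directly from the data. A minimal sketch (our own naming) is:

```python
import numpy as np

def lambda_hat_0(X, Y):
    """Asymptotic penalty estimate from (3.4):
    {1 - 2 tr(M_0^{-4} V) / (p tr(M_0^{-3} V) tr(M_0^{-1}))}^{-1}."""
    n, k = X.shape
    p = Y.shape[1]
    ones = np.ones(n)
    Yc = Y - np.outer(ones, Y.T @ ones / n)
    M0 = X.T @ X
    M0_inv = np.linalg.inv(M0)
    W0 = Yc.T @ Yc - Yc.T @ X @ M0_inv @ X.T @ Yc
    S = W0 / (n - k - 1)
    V = X.T @ Yc @ np.linalg.solve(S, Yc.T @ X)
    t3 = np.trace(np.linalg.matrix_power(M0_inv, 3) @ V)
    t4 = np.trace(np.linalg.matrix_power(M0_inv, 4) @ V)
    return 1.0 / (1.0 - 2.0 * t4 / (p * t3 * np.trace(M0_inv)))

def lambda_hat_M(X, Y):
    """Bias-adjusted variant: lambda_hat_M = c_M * lambda_hat_0."""
    n, k = X.shape
    p = Y.shape[1]
    c_M = 1.0 - (p + 1) / (n - k - 1)
    return c_M * lambda_hat_0(X, Y)
```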

4. Numerical Study

In this section, we conduct numerical studies to compare the PMSEs of predictors of $Y$ consisting of the ridge regression estimators with optimized ridge and penalty parameters. Let $R_q$ and $\Delta_q(\rho)$ be $q \times q$ matrices defined by $R_q = \mathrm{diag}(1, \ldots, q)$ and $(\Delta_q(\rho))_{ij} = \rho^{|i-j|}$. The explanatory matrix $X$ was generated from $X = W \Psi^{1/2}$, where $\Psi = R_k^{1/2} \Delta_k(\rho_x) R_k^{1/2}$ and $W$ is an $n \times k$ matrix whose elements were generated independently from the uniform distribution on $(-1, 1)$. The $k \times p$ unknown regression coefficient matrix $\Xi$ was defined by $\Xi = \delta F \Xi_i$, where $\delta$ is a constant, $F = \mathrm{diag}(1_\kappa', 0_{k-\kappa}')$ is a $k \times k$ matrix, and $\Xi_i$ is defined as the first five rows of $\Xi_0$ when $k = 5$, as $\Xi_0$ when $k = 10$, and as $\Xi_1$ when $k = 15$. Here, $\delta$ controls the scale of the regression coefficient matrix, and $F$ controls the number of non-zero regression coefficients via $\kappa$ (the dimension of the true model). The values of the elements of $\Xi_0$ and $\Xi_1$, which are the essential regression coefficient matrices, are the same as in Lawless (1981). Simulated data values $Y$ were repeatedly generated from $N_{n \times 3}(X \Xi, \Sigma \otimes I_n)$ under several selections of $n$, $k$, $\kappa$, $\delta$, $\rho_y$, and $\rho_x$, where $\Sigma = R_3^{1/2} \Delta_3(\rho_y) R_3^{1/2}$, and the number of repetitions was 1,000. At each repetition, we evaluated $r(X\Xi, \hat{Y}_{\hat{\theta}}) = \mathrm{tr}\{(X\Xi - \hat{Y}_{\hat{\theta}})'(X\Xi - \hat{Y}_{\hat{\theta}})\Sigma^{-1}\}$, where $\hat{Y}_{\hat{\theta}} = 1_n \hat{\mu}' + X \hat{\Xi}_{\hat{\theta}}$ is the predicted value of $Y$ obtained from each method. The average of $np + r(X\Xi, \hat{Y}_{\hat{\theta}})$ across the 1,000 repetitions was regarded as the PMSE of $\hat{Y}_{\hat{\theta}}$. In the simulation, a standardized $X$ was used to estimate the regression coefficients.
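The following sketch reproduces the data-generating design described above (our own naming; the entries of $\Xi_0$ and $\Xi_1$, taken from Lawless (1981), are not reproduced here, so the coefficient matrix is left as an input).

```python
import numpy as np

def make_design(n, k, rho_x, rng):
    """X = W Psi^{1/2}, Psi = R_k^{1/2} Delta_k(rho_x) R_k^{1/2},
    W with independent Uniform(-1, 1) entries; columns are then centered."""
    R_sqrt = np.diag(np.sqrt(np.arange(1, k + 1)))
    Delta = rho_x ** np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
    Psi = R_sqrt @ Delta @ R_sqrt
    w, Q = np.linalg.eigh(Psi)                     # symmetric square root of Psi
    Psi_half = Q @ np.diag(np.sqrt(w)) @ Q.T
    W = rng.uniform(-1.0, 1.0, size=(n, k))
    X = W @ Psi_half
    return X - X.mean(axis=0)                      # centerized explanatory variables

def simulate_Y(X, Xi, rho_y, rng):
    """Draw Y ~ N_{n x p}(X Xi, Sigma (x) I_n), Sigma = R_p^{1/2} Delta_p(rho_y) R_p^{1/2}."""
    n, p = X.shape[0], Xi.shape[1]
    R_sqrt = np.diag(np.sqrt(np.arange(1, p + 1)))
    Delta = rho_y ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    Sigma = R_sqrt @ Delta @ R_sqrt
    E = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    return X @ Xi + E

# Example usage (Xi must be supplied, e.g., delta * F @ Xi_i as in the text):
# rng = np.random.default_rng(0)
# X = make_design(30, 5, 0.2, rng)
```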

Recall that $GC_p(\theta, \lambda)$ is defined in (2.1). Here, $\lambda$ and $\theta$ are optimized by the following methods:

Method 1: $\hat{\theta} = \arg\min_{\theta \in [0,\infty]} GC_p(\theta, \hat{\lambda})$ and $\hat{\lambda} = \hat{\lambda}_0$, where $\hat{\lambda}_0$ is defined in (3.4).

Method 2: $\hat{\theta} = \arg\min_{\theta \in [0,\infty]} GC_p(\theta, \hat{\lambda})$ and $\hat{\lambda} = \hat{\lambda}_M = c_M \hat{\lambda}_0$, where $c_M = 1 - (p+1)/(n-k-1)$.

Method 3: $\hat{\theta} = \arg\min_{\theta \in [0,\infty]} GC_p(\theta, \hat{\lambda})$ and $\hat{\lambda} = \arg\min_{\lambda \in [0,\infty]} C_p^\#(\lambda)$, where $C_p^\#(\lambda)$ is given by Definition 3.1.

Method 4: $\hat{\theta} = \arg\min_{\theta \in [0,\infty]} GC_p(\theta, \hat{\lambda})$ and $\hat{\lambda} = \arg\min_{\lambda \in [0,\infty]} MC_p^\#(\lambda)$, where $MC_p^\#(\lambda)$ is given by Definition 3.1.

Method 5: $\hat{\theta} = \theta_{(1)}(\hat{\lambda})$ and $\hat{\lambda} = \arg\min_{\lambda \in [0,\infty]} C_p^{(1)}(\lambda)$, where $\theta_{(1)}(\lambda)$ and $C_p^{(1)}(\lambda)$ are defined in (2.6) and Definition 3.2.

Method 6: $\hat{\theta} = \theta_{(1)}(\hat{\lambda})$ and $\hat{\lambda} = \arg\min_{\lambda \in [0,\infty]} MC_p^{(1)}(\lambda)$, where $MC_p^{(1)}(\lambda)$ is given by Definition 3.2.

For the purpose of comparison with the proposed methods, we prepare conventional optimization methods, which are as follows:

Method 7: $\hat{\theta}_C = \arg\min_{\theta \in [0,\infty]} C_p(\theta) = \arg\min_{\theta \in [0,\infty]} GC_p(\theta, 1)$.

Method 8: $\hat{\theta}_M = \arg\min_{\theta \in [0,\infty]} MC_p(\theta) = \arg\min_{\theta \in [0,\infty]} GC_p(\theta, c_M)$, where $c_M = 1 - (p+1)/(n-k-1)$.

In Methods 3 through 8, the fminsearch function in Matlab is used to find the minimizer of the penalty selection criterion or model selection criterion. In the fminsearch function, the Nelder-Mead simplex method (see, e.g., Lagarias et al., 1998) is used to search for the value that minimizes the function. When Methods 1 through 4 are used, an optimal $\theta$ is also searched for using the fminsearch function. The computational speeds of Methods 1 and 2 are about the same as those of Methods 5 and 6. Furthermore, the computational speeds of Methods 5 and 6 are almost the same as those of Methods 7 and 8, because these four methods optimize one parameter. It is easy to predict that the computational speeds of Methods 3 and 4 are slower than those of the other methods because Methods 3 and 4 optimize two parameters simultaneously. In this paper, we proposed Methods 1 through 6 as described above, and these methods can be regarded as estimation methods for the optimal $\lambda$. To obtain the optimal $\lambda$, called $\lambda^*$, which minimizes the PMSE, we divided $[0, 2]$ into 100 parts and used each grid point. Then we computed $r(X\Xi, \hat{Y}_{\hat{\theta}})$ for each grid point in each repetition. After 1,000 repetitions, we computed the averages of these values for each grid point, which are regarded as the main term of the PMSE of $\hat{Y}_{\hat{\theta}}$; by comparing these average values, $\lambda^*$ is obtained. To compare the $\hat{\lambda}$ estimated by each of Methods 1 through 6 with $\lambda^*$, Figure 1 shows box plots of $\hat{\lambda}$ over the 1,000 repetitions in several situations. The horizontal line represents $\lambda^*$.

Tables 1 through 6 show the averages of $(\hat{\lambda} - \lambda^*)^2$ across the 1,000 repetitions, referred to as the mean squared error (MSE) of $\hat{\lambda}$, for each method. In Theorem 2.2, we derived the expansion of $\hat{\theta}(\lambda)$ and suggested using the first term of the expansion, referred to as $\theta_{(1)}(\lambda)$. To compare $\hat{\theta}(\lambda)$ and $\theta_{(1)}(\lambda)$, Figure 2 shows scatter plots in several situations with $\lambda$ fixed at 1 or 2. In each scatter plot, the line is the 45-degree line, i.e., the line $\hat{\theta}(\lambda) = \theta_{(1)}(\lambda)$; the closer the points lie to this line, the closer $\theta_{(1)}(\lambda)$ is to $\hat{\theta}(\lambda)$. Tables 7 through 12 show the simulation results for $\mathrm{PMSE}[\hat{Y}_{\hat{\theta}}]/\{p(n+k+1)\} \times 100$ for the cases in which $(k, n) = (5, 30)$, $(5, 50)$, $(10, 30)$, $(10, 50)$, $(15, 30)$, and $(15, 50)$, respectively, where $p(n+k+1)$ is the PMSE of the predictor of $Y$ derived using the LS estimators. We note that $p = 3$ in the numerical studies. In the tables, bold font indicates the smallest PMSE, and italic font indicates the second-smallest PMSE.

Please insert Figures 1 and 2 around here. Please insert Tables 1 through 12 around here.

From Figure 1, which shows the box plots of $\hat{\lambda}$ for each method, we can see the dispersion of $\hat{\lambda}$ and the differences between $\hat{\lambda}$ and $\lambda^*$. Methods 2, 4, and 6 always give smaller values than Methods 1, 3, and 5, and the dispersions of Methods 2, 4, and 6 are also smaller than those of Methods 1, 3, and 5. This means that the correction in each method makes both the optimized value and its dispersion smaller. We note that the dispersions of Methods 1 and 2 are smaller than those of the other methods. When $\rho_y$ and $\rho_x$ are small, our optimized value is nearly equal to $\lambda^*$.

Tables 1 through 6 give the numerical evaluation of each method. When $\kappa$ and $\delta$ are zero, Method 6 is the best and Method 5 is the second best. Methods 2 and 4 are the best and the second best when $\kappa$ and $\delta$ are small. When $\kappa$ is equal to $k = 10$ and $\delta$ is large, Method 6 is the best method. On the other hand, when $k = \kappa = 15$ and $\delta$ is large, Method 6 or Method 2 is the best according to whether $\rho_y$ is small or large. Consequently, Methods 6 and 5 were, on average, the best and the second-best methods except when $k = 5$; when $k = 5$, Method 5 was the best. Hence, we recommend using Method 6 to optimize the penalty parameter $\lambda$.

Figure 2, which shows the scatter plots of $\theta_{(1)}(\lambda)$ versus $\hat{\theta}(\lambda)$, indicates the dispersion in each situation. We note that as $\lambda$ becomes large, the difference between $\hat{\theta}(\lambda)$ and $\theta_{(1)}(\lambda)$ becomes small. Also, when $\rho_y$ or $n$ becomes large, the difference between $\hat{\theta}(\lambda)$ and $\theta_{(1)}(\lambda)$ becomes small. On the other hand, the difference becomes large when $\rho_x$ is large. In almost all cases, $\theta_{(1)}(\lambda)$ is smaller than $\hat{\theta}(\lambda)$. This observation is consistent with Theorem 2.2, since $\hat{\theta}(\lambda) = \theta_{(1)}(\lambda) + O_p(n^{-1})$. When $\rho_y$ or $\delta$ becomes large, the dispersion of $\hat{\theta}(\lambda)$ becomes small. Both $\hat{\theta}(\lambda)$ and $\theta_{(1)}(\lambda)$ become small when $\rho_y$, $\rho_x$, $\delta$, or $\lambda$ becomes large.

Based on the simulations, we can see that all of the methods improved the PMSEs of the LS estimators in almost all cases. All of the methods greatly improved the PMSE when $n$ becomes small or $k$ becomes large. Moreover, the improvement in the PMSE of the proposed methods increases as $\rho_y$ decreases. The improvement in the PMSE of the proposed methods when $\kappa \neq 0$ and $\delta \neq 0$ increases as $\rho_x$ increases. Comparison of the methods reveals that Methods 2 and 4 were better than Methods 1 and 3, respectively, in almost all cases when $\rho_x$ is large. When $k$ and $\rho_x$ become large, Methods 5 and 6 provide a greater improvement in PMSE than Methods 3 and 4. When $k$ becomes small and $n$ becomes large, Methods 4 and 6 improve the PMSE more than Methods 2 and 4 in most cases. Occasionally, Method 7 improves the PMSE more than Method 8, especially when $\kappa$ and $\delta$ become large. Consequently, Method 6 was, on average, the best method. In particular, it strongly improved the PMSE when $\delta$ and $\kappa$ are small. Based on these results, we recommend using Method 6 to optimize the multivariate ridge regression.

Acknowledgments

I wish to express my deepest gratitude to Dr. Hirokazu Yanagihara of Hiroshima University for his valuable advice and encouragement and for introducing me to various fields of mathematical statistics. In addition, I would like to thank Professor Hirofumi Wakaki for his comments and Professor Yasunori Fujikoshi of Hiroshima University for providing helpful comments and suggestions. I also thank Dr. Kengo Kato of Hiroshima University for his encouragement and comments. Furthermore, I sincerely thank Professor Megu Ohtaki, Dr. Kenichi Satoh, and Dr. Tetsuji Tonda of Hiroshima University for their encouragement and valuable comments and for introducing me to various fields of mathematical statistics. I also would like to thank Dr. Tomoyuki Akita, Dr. Shinobu Fujii, and Dr. Kunihiko Taniguchi of Hiroshima University for their advice and encouragement. Special thanks are due to Professor Takashi Kanda of Hiroshima Institute of Technology for his encouragement and his recommendation to attend Hiroshima University.

Appendix

A.1. Proof of Theorem 2.1

In this subsection, we prove Theorem 2.1, which shows the properties of $\hat{\theta}(\lambda)$.

Using $d_j$ and $z_j$ in (2.4), we can write $\mathrm{tr}(W_\theta S^{-1})$ and $\mathrm{tr}(M_\theta^{-1} M_0)$ in (2.1) as
\[ g(\theta) = \mathrm{tr}(W_\theta S^{-1}) = \mathrm{tr}(Y'Y S^{-1}) - n \hat{\mu}' S^{-1} \hat{\mu} - 2 \sum_{j=1}^k \frac{\|z_j\|^2}{d_j + \theta} + \sum_{j=1}^k \frac{\|z_j\|^2 d_j}{(d_j + \theta)^2}, \]
\[ h(\theta) = \mathrm{tr}(M_\theta^{-1} M_0) = \sum_{j=1}^k \frac{d_j}{d_j + \theta}. \]
Since $d_j > 0$ and $\theta \ge 0$, we have
\[ \dot{g}(\theta) = \frac{\partial g(\theta)}{\partial \theta} = 2 \theta \sum_{j=1}^k \frac{\|z_j\|^2}{(d_j + \theta)^3} \ge 0, \quad (\mathrm{A.1}) \]
with equality if and only if $\theta = 0$ or $\infty$, and
\[ \dot{h}(\theta) = \frac{\partial h(\theta)}{\partial \theta} = -\sum_{j=1}^k \frac{d_j}{(d_j + \theta)^2} \le 0, \quad (\mathrm{A.2}) \]
with equality if and only if $\theta = \infty$. Therefore, $g(\theta)$ and $h(\theta)$ are strictly monotonic increasing and decreasing functions of $\theta \in [0, \infty]$, respectively. Since $GC_p(\theta, \lambda) = \lambda g(\theta) + 2p h(\theta)$, these results imply that
\[ \hat{\theta}(\infty) = \arg\min_{\theta \in [0,\infty]} GC_p(\theta, \infty) = \arg\min_{\theta \in [0,\infty]} \mathrm{tr}(W_\theta S^{-1}) = 0, \qquad \hat{\theta}(0) = \arg\min_{\theta \in [0,\infty]} GC_p(\theta, 0) = \arg\min_{\theta \in [0,\infty]} \mathrm{tr}(M_\theta^{-1} M_0) = \infty. \]
On the other hand, from (A.1) and (A.2), we derive
\[ \frac{\partial}{\partial \theta} GC_p(\theta, \lambda) = 2 \sum_{j=1}^k \frac{p d_j^2 (\theta r_{\lambda,j} - 1)}{(d_j + \theta)^3} = \sum_{j=1}^k \varphi_j(\theta; \lambda), \]
where $r_{\lambda,j}$ is given by (2.5). Note that $\varphi_j(\theta; \lambda) \le 0$ when $\theta \in [0, (r^+_{\lambda,1})^{-1}]$. Therefore, $\partial GC_p(\theta, \lambda)/\partial \theta < 0$ when $\theta \in [0, (r^+_{\lambda,1})^{-1}]$, which implies that $GC_p(\theta, \lambda)$ is a monotonic decreasing function with respect to $\theta \in [0, (r^+_{\lambda,1})^{-1}]$. Thus, $\hat{\theta}(\lambda) > (r^+_{\lambda,1})^{-1}$. On the other hand, if $\max_{j=1,\ldots,k} r_{\lambda,j} \le 0$ is satisfied, $\varphi_j(\theta; \lambda) \le 0$ holds for any $\theta$. This fact means that $GC_p(\theta, \lambda)$ is a monotonic decreasing function with respect to $\theta \in [0, \infty]$. Hence, we can see that $\hat{\theta}(\lambda) = \infty$ when $\max_{j=1,\ldots,k} r_{\lambda,j} \le 0$. Since $\max_{j=1,\ldots,k} r_{\lambda,j} \le 0$ holds when $\lambda \le \min_{j=1,\ldots,k} p d_j/\|z_j\|^2$, we have $\hat{\theta}(\lambda) = \infty$ when $\lambda \le \min_{j=1,\ldots,k} p d_j/\|z_j\|^2$ is satisfied. Using Equation (2.3) and $\partial GC_p(\theta, \lambda)/\partial \theta = \lambda \dot{g}(\theta) + 2p \dot{h}(\theta)$, we have
\[ \frac{\partial}{\partial \lambda}\left\{\left.\frac{\partial}{\partial \theta} GC_p(\theta, \lambda)\right|_{\theta = \hat{\theta}(\lambda)}\right\} = \dot{g}(\hat{\theta}(\lambda)) + \frac{\partial \hat{\theta}(\lambda)}{\partial \lambda} \ddot{GC}_p(\hat{\theta}(\lambda), \lambda) = 0, \quad (\mathrm{A.3}) \]
where $\ddot{GC}_p(\theta, \lambda) = \partial^2 GC_p(\theta, \lambda)/\partial \theta^2$. Since $\hat{\theta}(\lambda)$ satisfies (2.2), i.e., $\hat{\theta}(\lambda)$ is the minimizer of $GC_p(\theta, \lambda)$, $GC_p(\theta, \lambda)$ is a convex function in a neighborhood of $\hat{\theta}(\lambda)$. Hence, we have $\ddot{GC}_p(\hat{\theta}(\lambda), \lambda) > 0$.

Using this result and Equation (A.3), we obtain
\[ \frac{\partial \hat{\theta}(\lambda)}{\partial \lambda} = -\frac{\dot{g}(\hat{\theta}(\lambda))}{\ddot{GC}_p(\hat{\theta}(\lambda), \lambda)}. \]
We have $\dot{g}(\hat{\theta}(\lambda)) \ge 0$ because $g(\theta)$ is a monotonic increasing function of $\theta \in [0, \infty]$. Hence, $\partial \hat{\theta}(\lambda)/\partial \lambda \le 0$ is obtained. This implies that $\hat{\theta}(\lambda)$ is a monotonic decreasing function with respect to $\lambda$.

A.2. Proof of Theorem 2.2

In this subsection, we present the proof of Theorem 2.2, which describes the expansion of $\hat{\theta}(\lambda)$. In order to prove this theorem, we expand the $GC_p$ criterion in (2.1) under fixed $\lambda$. Recall that $X'1_n = 0_k$, $(n-k-1)S = Y'(I_n - 1_n 1_n'/n - X M_0^{-1} X')Y$, and $Q'X'XQ = D$. Hence, we derive
\[ W_\theta S^{-1} = Y'(I_n - 1_n 1_n'/n - X M_\theta^{-1} X')^2 Y S^{-1} = (n-k-1) I_p + Y'X(M_0^{-1} - 2 M_\theta^{-1} + M_\theta^{-1} M_0 M_\theta^{-1}) X'Y S^{-1} = (n-k-1) I_p + Y'X Q D^{-1/2} \{I_k - D(D + \theta I_k)^{-1}\}^2 D^{-1/2} Q' X'Y S^{-1}. \]
Based on this result, the $GC_p$ criterion is expressed as
\[ GC_p(\theta, \lambda) = \lambda (n-k-1) p + \lambda \theta^2 \mathrm{tr}\{D^{-1/2} Q' V Q D^{-1/2} (D + \theta I_k)^{-2}\} + 2p\, \mathrm{tr}\{D(D + \theta I_k)^{-1}\}. \]
Letting $t_i = (D^{-1/2} Q' V Q D^{-1/2})_{ii}$, we obtain
\[ GC_p(\theta, \lambda) = \lambda (n-k-1) p + \sum_{i=1}^k \left\{\lambda \frac{\theta^2}{d_i^2}\left(1 + \frac{\theta}{d_i}\right)^{-2} t_i + 2p \left(1 + \frac{\theta}{d_i}\right)^{-1}\right\}. \]
By Taylor expansion around $\theta = 0$, we have
\[ GC_p(\theta, \lambda) = \lambda (n-k-1) p + \lambda \sum_{i=1}^k \sum_{l=1}^{\infty} \frac{(-1)^{l+1} l\, \theta^{l+1}}{d_i^{l+1}} t_i + 2p \sum_{i=1}^k \left\{1 - \sum_{l=1}^{\infty} \frac{(-1)^{l+1} \theta^l}{d_i^{l}}\right\} = (\lambda (n-k-1) + 2k) p + \sum_{l=1}^{\infty} (-1)^{l+1} \theta^l \{\lambda l \theta\, \mathrm{tr}(V M_0^{-(l+2)}) - 2p\, \mathrm{tr}(M_0^{-l})\}. \]
Recall that $a_j = n^j \mathrm{tr}(V M_0^{-(j+2)})$ and $b_j = n^j \mathrm{tr}(M_0^{-j})$. It follows that
\[ GC_p(\theta, \lambda) = (\lambda (n-k-1) + 2k) p + \lim_{L \to \infty} \sum_{l=1}^{L} (-1)^{l+1} \frac{\theta^l}{n^l} \{\lambda l \theta a_l - 2p b_l\}. \]

Then, the following equation is derived:
\[ \frac{\partial}{\partial \theta} GC_p(\theta, \lambda) = \lim_{L \to \infty} \sum_{l=1}^{L} (-1)^{l+1} \frac{l\, \theta^{l-1}}{n^l} \{\lambda (l+1) \theta a_l - 2p b_l\}. \]
Using the above equation and the fact that $\hat{\theta}(\lambda)$ satisfies (2.3), we obtain the equation in Theorem 2.2.

A.3. Proof of Theorem 3.1

In this subsection, we prove Theorem 3.1, which gives the risk function with respect to $\lambda$. Recall that $g(\theta) = \mathrm{tr}(W_\theta S^{-1})$, $h(\theta) = \mathrm{tr}(M_\theta^{-1} M_0)$, and $GC_p(\theta, \lambda) = \lambda g(\theta) + 2p h(\theta)$. Since $\hat{\theta}(\lambda)$ satisfies (2.3), we obtain
\[ 0 = \frac{\partial}{\partial (Y)_{ij}}\left\{\lambda \frac{\partial g(\theta)}{\partial \theta} + 2p \frac{\partial h(\theta)}{\partial \theta}\right\}_{\theta = \hat{\theta}(\lambda)} = \frac{\partial \hat{\theta}(\lambda)}{\partial (Y)_{ij}}\{\lambda \ddot{g}(\hat{\theta}(\lambda)) + 2p \ddot{h}(\hat{\theta}(\lambda))\} + \lambda \frac{\partial \dot{g}(\hat{\theta}(\lambda))}{\partial (Y)_{ij}}, \]
where $\dot{g}(\hat{\theta}(\lambda)) = \partial g(\theta)/\partial \theta|_{\theta = \hat{\theta}(\lambda)}$, $\ddot{g}(\hat{\theta}(\lambda)) = \partial^2 g(\theta)/\partial \theta^2|_{\theta = \hat{\theta}(\lambda)}$, and $\ddot{h}(\hat{\theta}(\lambda)) = \partial^2 h(\theta)/\partial \theta^2|_{\theta = \hat{\theta}(\lambda)}$. Thus, we obtain
\[ \frac{\partial \hat{\theta}(\lambda)}{\partial (Y)_{ij}} = -\lambda \frac{\partial \dot{g}(\hat{\theta}(\lambda))/\partial (Y)_{ij}}{\ddot{GC}_p(\hat{\theta}(\lambda), \lambda)}, \]
because, from Appendix A.1, $\ddot{GC}_p(\hat{\theta}(\lambda), \lambda) = \lambda \ddot{g}(\hat{\theta}(\lambda)) + 2p \ddot{h}(\hat{\theta}(\lambda)) > 0$. By simple calculation, we have $\dot{g}(\hat{\theta}(\lambda)) = 2\hat{\theta}(\lambda) \mathrm{tr}(M_{\hat{\theta}(\lambda)}^{-3} V)$, so that $\partial GC_p(\theta, \lambda)/\partial \theta = \lambda \dot{g}(\theta) + 2p \dot{h}(\theta) = 2\{\lambda \theta\, \mathrm{tr}(M_\theta^{-3} V) - p\, \mathrm{tr}(M_\theta^{-2} M_0)\}$. As in the proof of Lemma 3.1, which is given in Appendix A.4, we obtain
\[ \frac{\partial\, \mathrm{tr}(M_{\hat{\theta}(\lambda)}^{-3} V)}{\partial (Y)_{ij}} = 2 \left(S^{-1} Y' X M_{\hat{\theta}(\lambda)}^{-3} X' \left\{I_n - \frac{1}{n-k-1} Y S^{-1} Y' H\right\}\right)_{ji}. \]
Hence, we derive
\[ \sum_{i=1}^n \sum_{j=1}^p (X M_{\hat{\theta}(\lambda)}^{-2} X' Y)_{ij} \frac{\partial \hat{\theta}(\lambda)}{\partial (Y)_{ij}} = -\frac{4 \lambda \hat{\theta}(\lambda)\, \mathrm{tr}(V M_{\hat{\theta}(\lambda)}^{-5} M_0)}{\ddot{GC}_p(\hat{\theta}(\lambda), \lambda)}, \]
because $X'1_n = 0_k$ and $M_{\hat{\theta}(\lambda)}^{-3} M_0 M_{\hat{\theta}(\lambda)}^{-2} = M_{\hat{\theta}(\lambda)}^{-5} M_0$. By simple calculation, we have
\[ \ddot{GC}_p(\hat{\theta}(\lambda), \lambda) = 2\{\lambda\, \mathrm{tr}(M_{\hat{\theta}(\lambda)}^{-3} V) - 3 \lambda \hat{\theta}(\lambda)\, \mathrm{tr}(M_{\hat{\theta}(\lambda)}^{-4} V) + 2p\, \mathrm{tr}(M_{\hat{\theta}(\lambda)}^{-3} M_0)\}. \]
Thus, the theorem is proved.

A.4. Proof of Lemma 3.1

In this subsection, we prove Lemma 3.1, which gives the derivative of $a_l$ for any $l$. Since $a_l = n^l \mathrm{tr}(V M_0^{-(l+2)})$, $M_0 = X'X$, and $V = X'Y S^{-1} Y'X$, we need only obtain the derivative of $V$.

We can see that
\[ \frac{\partial S^{-1}}{\partial (Y)_{ij}} = -S^{-1} \frac{\partial S}{\partial (Y)_{ij}} S^{-1} = -\frac{1}{n-k-1} S^{-1} (e_j^p e_i^{n\prime} H Y + Y' H e_i^n e_j^{p\prime}) S^{-1}, \]
where $e_i^n$ is the $n$-dimensional vector whose $i$th element is one and whose other elements are zero. Thus, we obtain
\[ \frac{\partial V}{\partial (Y)_{ij}} = X' e_i^n e_j^{p\prime} S^{-1} Y' X + X' Y S^{-1} e_j^p e_i^{n\prime} X - \frac{1}{n-k-1} X' Y S^{-1} (e_j^p e_i^{n\prime} H Y + Y' H e_i^n e_j^{p\prime}) S^{-1} Y' X. \]
From $\partial a_l/\partial (Y)_{ij} = n^l \mathrm{tr}\{M_0^{-(l+2)}\, \partial V/\partial (Y)_{ij}\}$, we derive this lemma.

A.5. Proof of Theorem 3.2

From (3.2), we obtain
\[ \mathrm{PMSE}[\hat{Y}_{\theta_{(L)}(\lambda)}] = E_Y[\mathrm{tr}(W_{\theta_{(L)}(\lambda)} \Sigma^{-1}) + 2p\{\mathrm{tr}(M_{\theta_{(L)}(\lambda)}^{-1} M_0) + 1\}] - 2 E_Y\!\left[\sum_{i=1}^n \sum_{j=1}^p (X M_{\theta_{(L)}(\lambda)}^{-2} X' Y)_{ij} \frac{\partial \theta_{(L)}(\lambda)}{\partial (Y)_{ij}}\right]. \]
Hence, we need only calculate $\partial \theta_{(L)}(\lambda)/\partial (Y)_{ij}$. Using Theorem 2.2, this derivative is obtained by differentiating the leading term $\theta_{(1)}(\lambda) = p b_1/(\lambda a_1)$ and the higher-order terms of the recursion (2.6) term by term with the help of Lemma 3.1; recall that $\theta_{(0)}(\lambda) = 0$. From Lemma 3.1, we have
\[ \frac{\partial \theta_{(1)}(\lambda)}{\partial (Y)_{ij}} = -\frac{p b_1}{\lambda a_1^2} \frac{\partial a_1}{\partial (Y)_{ij}} = -\frac{2 n\, \theta_{(1)}(\lambda)}{a_1} \left(S^{-1} Y' X M_0^{-3} X' \left(I_n - \frac{1}{n-k-1} Y S^{-1} Y' H\right)\right)_{ji},
\]

and the derivative of the remaining sum in (2.6) is obtained in the same manner by applying Lemma 3.1 to each $a_{l+1}$ and to $\theta_{(L-1)}(\lambda)$; the resulting expression, although lengthy, consists only of the quantities $a_1, \ldots, a_L$, $b_1, \ldots, b_L$, $\theta_{(L)}(\lambda)$, and $\theta_{(L-1)}(\lambda)$. In particular, since $HX = 0$, for $L = 1$ we obtain
\[ \sum_{i=1}^n \sum_{j=1}^p (X M_{\theta_{(1)}(\lambda)}^{-2} X' Y)_{ij} \frac{\partial \theta_{(1)}(\lambda)}{\partial (Y)_{ij}} = -\frac{2 n\, \theta_{(1)}(\lambda)}{a_1} \mathrm{tr}(M_0^{-2} M_{\theta_{(1)}(\lambda)}^{-2} V). \]
We derive the theorem by substituting these results into (3.2).

A.6. The criteria for optimizing $\lambda$ when we use $\theta_{(2)}(\lambda)$

In this subsection, we calculate the criteria for optimizing $\lambda$ when we use $\theta_{(2)}(\lambda)$. From (2.6), we obtain
\[ \theta_{(2)}(\lambda) = \theta_{(1)}(\lambda) + \frac{\theta_{(1)}(\lambda)\{3 \lambda a_2 \theta_{(1)}(\lambda) - 2 p b_2\}}{n \lambda a_1} = \theta_{(1)}(\lambda) + \frac{\theta_{(1)}(\lambda)^2}{n}\left(\frac{3 a_2}{a_1} - \frac{2 b_2}{b_1}\right). \]
Since $b_l = n^l \mathrm{tr}(M_0^{-l})$ does not depend on $(Y)_{ij}$, we derive
\[ \frac{\partial \theta_{(2)}(\lambda)}{\partial (Y)_{ij}} = \frac{\partial \theta_{(1)}(\lambda)}{\partial (Y)_{ij}}\left\{1 + \frac{2 \theta_{(1)}(\lambda)}{n}\left(\frac{3 a_2}{a_1} - \frac{2 b_2}{b_1}\right)\right\} + \frac{3 \theta_{(1)}(\lambda)^2}{n} \frac{\partial (a_2/a_1)}{\partial (Y)_{ij}}. \]
Hence, the third term of (3.2) is obtained using the result for $\partial \theta_{(1)}(\lambda)/\partial (Y)_{ij}$ together with Lemma 3.1 applied to $a_2/a_1$.

When we use Theorem 3.2, we obtain the same result. Using this result, and neglecting terms that are independent of $\lambda$, we obtain the $C_p$-type criteria for optimizing $\lambda$ based on $\theta_{(2)}(\lambda)$, namely $C_p^{(2)}(\lambda)$ and $MC_p^{(2)}(\lambda)$; they consist of $\mathrm{tr}(W_{\theta_{(2)}(\lambda)} S^{-1})$ (multiplied by 1 or $c_M$, respectively), $2p\,\mathrm{tr}(M_{\theta_{(2)}(\lambda)}^{-1} M_0)$, and correction terms involving $\mathrm{tr}(M_0^{-2} M_{\theta_{(2)}(\lambda)}^{-2} V)$ and $\mathrm{tr}(M_0^{-3} M_{\theta_{(2)}(\lambda)}^{-2} V)$.

A.7. Proof of Theorem 3.3

In this subsection, we derive an asymptotic expansion of $\mathrm{PMSE}[\hat{Y}_{\hat{\theta}(\lambda)}]$ and the calculation used to obtain $\hat{\lambda}_0$. Since $\mathrm{PMSE}[\hat{Y}_{\hat{\theta}(\lambda)}]$ is given by (3.3), we expand each term. We obtain
\[ W_{\hat{\theta}(\lambda)} \Sigma^{-1} = Y'(I_n - 1_n 1_n'/n - X M_{\hat{\theta}(\lambda)}^{-1} X')^2 Y \Sigma^{-1} = (n-k-1) S \Sigma^{-1} + Y'X(M_0^{-1} - 2 M_{\hat{\theta}(\lambda)}^{-1} + M_{\hat{\theta}(\lambda)}^{-1} M_0 M_{\hat{\theta}(\lambda)}^{-1}) X'Y \Sigma^{-1} = (n-k-1) S \Sigma^{-1} + Y'X Q D^{-1/2} \{I_k - D(D + \hat{\theta}(\lambda) I_k)^{-1}\}^2 D^{-1/2} Q' X'Y \Sigma^{-1}, \]
because $Y'(I_n - 1_n 1_n'/n - X M_0^{-1} X')Y = (n-k-1)S$, $X'1_n = 0_k$, and $Q'X'XQ = D$. Hence, we obtain
\[ \mathrm{tr}(W_{\hat{\theta}(\lambda)} \Sigma^{-1}) = (n-k-1)\, \mathrm{tr}(S \Sigma^{-1}) + \hat{\theta}(\lambda)^2\, \mathrm{tr}\{(D + \hat{\theta}(\lambda) I_k)^{-2} D^{-1} Q' V^* Q\}. \]
Since $S$ is an unbiased estimator of $\Sigma$, we have
\[ E_Y[\mathrm{tr}(W_{\hat{\theta}(\lambda)} \Sigma^{-1})] = (n-k-1) p + E_Y\!\left[\sum_{j=1}^k \left(\frac{\hat{\theta}(\lambda)}{d_j + \hat{\theta}(\lambda)}\right)^2 \frac{(Q' V^* Q)_{jj}}{d_j}\right]. \]
Then, since $d_j = O(n)$ and $V^* = O_p(n^2)$, we can expand the above equation as follows:
\[ E_Y[\mathrm{tr}(W_{\hat{\theta}(\lambda)} \Sigma^{-1})] = (n-k-1) p + E_Y\!\left[\sum_{j=1}^k \frac{\hat{\theta}(\lambda)^2 (Q' V^* Q)_{jj}}{d_j^3}\right] + O(n^{-2}) = (n-k-1) p + E_Y\!\left[\frac{a_1^* \hat{\theta}(\lambda)^2}{n}\right] + O(n^{-2}). \]

From simple calculation by Taylor expansion and noting that $a_j = O_p(1)$ and $b_j = O(1)$, we derive
\[ \mathrm{tr}(M_{\hat{\theta}(\lambda)}^{-1} M_0) = k - \frac{b_1 \hat{\theta}(\lambda)}{n} + O_p(n^{-2}), \]
\[ \lambda \hat{\theta}(\lambda)\, \mathrm{tr}(V M_{\hat{\theta}(\lambda)}^{-5} M_0) = \frac{\lambda \hat{\theta}(\lambda) a_2}{n^2} + O_p(n^{-3}), \qquad \lambda\, \mathrm{tr}(V M_{\hat{\theta}(\lambda)}^{-3}) = \frac{\lambda a_1}{n} + O_p(n^{-2}), \]
\[ \lambda\, \mathrm{tr}(V M_{\hat{\theta}(\lambda)}^{-4}) = \frac{\lambda a_2}{n^2} + O_p(n^{-3}), \qquad \mathrm{tr}(M_{\hat{\theta}(\lambda)}^{-3} M_0) = \frac{b_2}{n^2} + O(n^{-3}). \]
By substituting these results into $\mathrm{PMSE}[\hat{Y}_{\hat{\theta}(\lambda)}]$ in (3.3), we obtain the asymptotic expansion of $\mathrm{PMSE}[\hat{Y}_{\hat{\theta}(\lambda)}]$ as follows:
\[ \mathrm{PMSE}[\hat{Y}_{\hat{\theta}(\lambda)}] = (n+k+1) p + E_Y\!\left[\frac{a_1^* \hat{\theta}(\lambda)^2}{n} - \frac{2 p b_1 \hat{\theta}(\lambda)}{n} + \frac{4 a_2 \hat{\theta}(\lambda)}{n a_1}\right] + O(n^{-2}). \]
From (2.6), which is proved in Appendix A.2, $\hat{\theta}(\lambda) = p b_1/(\lambda a_1) + O_p(n^{-1})$. Hence, we consider minimizing the following approximated PMSE:
\[ \mathrm{PMSE}[\hat{Y}_{\hat{\theta}(\lambda)}] = (n+k+1) p + E_Y\!\left[\frac{p b_1}{n a_1}\left(\frac{p b_1 a_1^*}{a_1 \lambda^2} - \frac{2 p b_1}{\lambda} + \frac{4 a_2}{\lambda a_1}\right)\right] + O(n^{-2}). \]
Hence we obtain the asymptotically optimal $\lambda$, which minimizes the second term of the above equation, as given in Theorem 3.3.

References

[1] A. C. Atkinson, A note on the generalized information criterion for choice of a model, Biometrika, 67 (1980).

[2] P. J. Brown and J. V. Zidek, Adaptive multivariate ridge regression, Ann. Statist., 8 (1980).

[3] B. Efron, The estimation of prediction error: covariance penalties and cross-validation, J. Amer. Statist. Assoc., 99 (2004).

[4] S. J. V. Dien, S. Iwatani, Y. Usuda and K. Matsui, Theoretical analysis of amino acid-producing Escherichia coli using a stoichiometric model and multivariate linear regression, J. Biosci. Bioeng., 102 (2006).

[5] Y. Fujikoshi and K. Satoh, Modified AIC and C_p in multivariate linear regression, Biometrika, 84 (1997).

[6] Y. Fujikoshi, H. Yanagihara and H. Wakaki, Bias corrections of some criteria for selecting multivariate linear models in a general nonnormal case, Amer. J. Math. Management Sci., 25 (2005).

[7] Y. Haitovsky, On multivariate ridge regression, Biometrika, 74 (1987).

[8] A. E. Hoerl and R. W. Kennard, Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 12 (1970).

[9] J. C. Lagarias, J. A. Reeds, M. H. Wright and P. E. Wright, Convergence properties of the Nelder-Mead simplex method in low dimensions, SIAM J. Optim., 9 (1998).

[10] J. F. Lawless, Mean squared error properties of generalized ridge estimators, J. Amer. Statist. Assoc., 76 (1981).

[11] C. L. Mallows, Some comments on C_p, Technometrics, 15 (1973).

[12] C. L. Mallows, More comments on C_p, Technometrics, 37 (1995).

[13] C. Sârbu, C. Onisor, M. Posa, S. Kevresan and K. Kuhajda, Modeling and prediction (correction) of partition coefficients of bile acids and their derivatives by multivariate regression methods, Talanta, 75 (2008).

[14] R. Saxén and J. Sundell, 137Cs in freshwater fish in Finland since 1986 - a statistical analysis with multivariate linear regression models, J. Environ. Radioactiv., 87 (2006).

[15] X. Shen and J. Ye, Adaptive model selection, J. Amer. Statist. Assoc., 97 (2002).

[16] M. Siotani, T. Hayakawa and Y. Fujikoshi, Modern Multivariate Statistical Analysis: A Graduate Course and Handbook, American Sciences Press, Columbus, Ohio, 1985.

[17] B. Skagerberg, J. MacGregor and C. Kiparissides, Multivariate data analysis applied to low-density polyethylene reactors, Chemometr. Intell. Lab. Syst., 14 (1992).

[18] R. S. Sparks, D. Coutsourides and L. Troskie, The multivariate C_p, Comm. Statist. A - Theory Methods, 12 (1983).

[19] M. S. Srivastava, Methods of Multivariate Statistics, John Wiley & Sons, New York, 2002.

[20] N. H. Timm, Applied Multivariate Analysis, Springer-Verlag, New York, 2002.

[21] H. Yanagihara and K. Satoh, An unbiased C_p criterion for multivariate ridge regression, J. Multivariate Anal., 101 (2010).

[22] J. Ye, On measuring and correcting the effects of data mining and model selection, J. Amer. Statist. Assoc., 93 (1998).

[23] A. Yoshimoto, H. Yanagihara and Y. Ninomiya, Finding factors affecting a forest stand growth through multivariate linear modeling, J. Jpn. For. Res., 87 (2005) (in Japanese).

Figure 1. The box plots for $\hat{\lambda}$ (Methods 1 through 6) in several cases of $(n, k, \kappa, \delta, \rho_y, \rho_x)$.

25 Table. MSE of ˆλ in which (k, n) = (5, 30) Method κ δ ρ y ρ x Average Table 2. MSE of ˆλ in which (k, n) = (5, 50) Method κ δ ρ y ρ x Average

Table 3. MSE of $\hat{\lambda}$ in which $(k, n) = (10, 30)$.

Table 4. MSE of $\hat{\lambda}$ in which $(k, n) = (10, 50)$.

Table 5. MSE of $\hat{\lambda}$ in which $(k, n) = (15, 30)$.

Table 6. MSE of $\hat{\lambda}$ in which $(k, n) = (15, 50)$.

Figure 2. The scatter plots for $\theta_{(1)}(\lambda)$ and $\hat{\theta}(\lambda)$ in several cases of $(n, k, \kappa, \delta, \rho_y, \rho_x, \lambda)$.

Table 7. Simulation results for the case in which $(k, n) = (5, 30)$ (by Method, $\kappa$, $\delta$, $\rho_y$, $\rho_x$, and Average).

Table 8. Simulation results for the case in which $(k, n) = (5, 50)$.

Table 9. Simulation results for the case in which $(k, n) = (10, 30)$.

Table 10. Simulation results for the case in which $(k, n) = (10, 50)$.


EFFICIENCY of the PRINCIPAL COMPONENT LIU- TYPE ESTIMATOR in LOGISTIC REGRESSION EFFICIENCY of the PRINCIPAL COMPONEN LIU- YPE ESIMAOR in LOGISIC REGRESSION Authors: Jibo Wu School of Mathematics and Finance, Chongqing University of Arts and Sciences, Chongqing, China, linfen52@126.com

More information

A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when the Covariance Matrices are Unknown but Common

A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when the Covariance Matrices are Unknown but Common Journal of Statistical Theory and Applications Volume 11, Number 1, 2012, pp. 23-45 ISSN 1538-7887 A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when

More information

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS Communications in Statistics - Simulation and Computation 33 (2004) 431-446 COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS K. Krishnamoorthy and Yong Lu Department

More information

Using Ridge Least Median Squares to Estimate the Parameter by Solving Multicollinearity and Outliers Problems

Using Ridge Least Median Squares to Estimate the Parameter by Solving Multicollinearity and Outliers Problems Modern Applied Science; Vol. 9, No. ; 05 ISSN 9-844 E-ISSN 9-85 Published by Canadian Center of Science and Education Using Ridge Least Median Squares to Estimate the Parameter by Solving Multicollinearity

More information

Least Squares Model Averaging. Bruce E. Hansen University of Wisconsin. January 2006 Revised: August 2006

Least Squares Model Averaging. Bruce E. Hansen University of Wisconsin. January 2006 Revised: August 2006 Least Squares Model Averaging Bruce E. Hansen University of Wisconsin January 2006 Revised: August 2006 Introduction This paper developes a model averaging estimator for linear regression. Model averaging

More information

Effects of Outliers and Multicollinearity on Some Estimators of Linear Regression Model

Effects of Outliers and Multicollinearity on Some Estimators of Linear Regression Model 204 Effects of Outliers and Multicollinearity on Some Estimators of Linear Regression Model S. A. Ibrahim 1 ; W. B. Yahya 2 1 Department of Physical Sciences, Al-Hikmah University, Ilorin, Nigeria. e-mail:

More information

Department of Mathematics, Graduate School of Science Hiroshima University Kagamiyama, Higashi Hiroshima, Hiroshima , Japan.

Department of Mathematics, Graduate School of Science Hiroshima University Kagamiyama, Higashi Hiroshima, Hiroshima , Japan. High-Dimensional Properties of Information Criteria and Their Efficient Criteria for Multivariate Linear Regression Models with Covariance Structures Tetsuro Sakurai and Yasunori Fujikoshi Center of General

More information

Modfiied Conditional AIC in Linear Mixed Models

Modfiied Conditional AIC in Linear Mixed Models CIRJE-F-895 Modfiied Conditional AIC in Linear Mixed Models Yuki Kawakubo Graduate School of Economics, University of Tokyo Tatsuya Kubokawa University of Tokyo July 2013 CIRJE Discussion Papers can be

More information

ISyE 691 Data mining and analytics

ISyE 691 Data mining and analytics ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results

More information

Introduction to Statistical modeling: handout for Math 489/583

Introduction to Statistical modeling: handout for Math 489/583 Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect

More information

Multivariate Regression (Chapter 10)

Multivariate Regression (Chapter 10) Multivariate Regression (Chapter 10) This week we ll cover multivariate regression and maybe a bit of canonical correlation. Today we ll mostly review univariate multivariate regression. With multivariate

More information

Journal of Asian Scientific Research COMBINED PARAMETERS ESTIMATION METHODS OF LINEAR REGRESSION MODEL WITH MULTICOLLINEARITY AND AUTOCORRELATION

Journal of Asian Scientific Research COMBINED PARAMETERS ESTIMATION METHODS OF LINEAR REGRESSION MODEL WITH MULTICOLLINEARITY AND AUTOCORRELATION Journal of Asian Scientific Research ISSN(e): 3-1331/ISSN(p): 6-574 journal homepage: http://www.aessweb.com/journals/5003 COMBINED PARAMETERS ESTIMATION METHODS OF LINEAR REGRESSION MODEL WITH MULTICOLLINEARITY

More information

MIVQUE and Maximum Likelihood Estimation for Multivariate Linear Models with Incomplete Observations

MIVQUE and Maximum Likelihood Estimation for Multivariate Linear Models with Incomplete Observations Sankhyā : The Indian Journal of Statistics 2006, Volume 68, Part 3, pp. 409-435 c 2006, Indian Statistical Institute MIVQUE and Maximum Likelihood Estimation for Multivariate Linear Models with Incomplete

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

RIDGE REGRESSION REPRESENTATIONS OF THE GENERALIZED HODRICK-PRESCOTT FILTER

RIDGE REGRESSION REPRESENTATIONS OF THE GENERALIZED HODRICK-PRESCOTT FILTER J. Japan Statist. Soc. Vol. 45 No. 2 2015 121 128 RIDGE REGRESSION REPRESENTATIONS OF THE GENERALIZED HODRICK-PRESCOTT FILTER Hiroshi Yamada* The Hodrick-Prescott (HP) filter is a popular econometric tool

More information

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1 Variable Selection in Restricted Linear Regression Models Y. Tuaç 1 and O. Arslan 1 Ankara University, Faculty of Science, Department of Statistics, 06100 Ankara/Turkey ytuac@ankara.edu.tr, oarslan@ankara.edu.tr

More information

T 2 Type Test Statistic and Simultaneous Confidence Intervals for Sub-mean Vectors in k-sample Problem

T 2 Type Test Statistic and Simultaneous Confidence Intervals for Sub-mean Vectors in k-sample Problem T Type Test Statistic and Simultaneous Confidence Intervals for Sub-mean Vectors in k-sample Problem Toshiki aito a, Tamae Kawasaki b and Takashi Seo b a Department of Applied Mathematics, Graduate School

More information

MEANINGFUL REGRESSION COEFFICIENTS BUILT BY DATA GRADIENTS

MEANINGFUL REGRESSION COEFFICIENTS BUILT BY DATA GRADIENTS Advances in Adaptive Data Analysis Vol. 2, No. 4 (2010) 451 462 c World Scientific Publishing Company DOI: 10.1142/S1793536910000574 MEANINGFUL REGRESSION COEFFICIENTS BUILT BY DATA GRADIENTS STAN LIPOVETSKY

More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

Research Article An Unbiased Two-Parameter Estimation with Prior Information in Linear Regression Model

Research Article An Unbiased Two-Parameter Estimation with Prior Information in Linear Regression Model e Scientific World Journal, Article ID 206943, 8 pages http://dx.doi.org/10.1155/2014/206943 Research Article An Unbiased Two-Parameter Estimation with Prior Information in Linear Regression Model Jibo

More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

Robust model selection criteria for robust S and LT S estimators

Robust model selection criteria for robust S and LT S estimators Hacettepe Journal of Mathematics and Statistics Volume 45 (1) (2016), 153 164 Robust model selection criteria for robust S and LT S estimators Meral Çetin Abstract Outliers and multi-collinearity often

More information

TAMS39 Lecture 2 Multivariate normal distribution

TAMS39 Lecture 2 Multivariate normal distribution TAMS39 Lecture 2 Multivariate normal distribution Martin Singull Department of Mathematics Mathematical Statistics Linköping University, Sweden Content Lecture Random vectors Multivariate normal distribution

More information

DIMENSION REDUCTION OF THE EXPLANATORY VARIABLES IN MULTIPLE LINEAR REGRESSION. P. Filzmoser and C. Croux

DIMENSION REDUCTION OF THE EXPLANATORY VARIABLES IN MULTIPLE LINEAR REGRESSION. P. Filzmoser and C. Croux Pliska Stud. Math. Bulgar. 003), 59 70 STUDIA MATHEMATICA BULGARICA DIMENSION REDUCTION OF THE EXPLANATORY VARIABLES IN MULTIPLE LINEAR REGRESSION P. Filzmoser and C. Croux Abstract. In classical multiple

More information

Focused fine-tuning of ridge regression

Focused fine-tuning of ridge regression Focused fine-tuning of ridge regression Kristoffer Hellton Department of Mathematics, University of Oslo May 9, 2016 K. Hellton (UiO) Focused tuning May 9, 2016 1 / 22 Penalized regression The least-squares

More information

On corrections of classical multivariate tests for high-dimensional data

On corrections of classical multivariate tests for high-dimensional data On corrections of classical multivariate tests for high-dimensional data Jian-feng Yao with Zhidong Bai, Dandan Jiang, Shurong Zheng Overview Introduction High-dimensional data and new challenge in statistics

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Bias-corrected Estimators of Scalar Skew Normal

Bias-corrected Estimators of Scalar Skew Normal Bias-corrected Estimators of Scalar Skew Normal Guoyi Zhang and Rong Liu Abstract One problem of skew normal model is the difficulty in estimating the shape parameter, for which the maximum likelihood

More information

Measuring Local Influential Observations in Modified Ridge Regression

Measuring Local Influential Observations in Modified Ridge Regression Journal of Data Science 9(2011), 359-372 Measuring Local Influential Observations in Modified Ridge Regression Aboobacker Jahufer 1 and Jianbao Chen 2 1 South Eastern University and 2 Xiamen University

More information

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS Jian Huang 1, Joel L. Horowitz 2, and Shuangge Ma 3 1 Department of Statistics and Actuarial Science, University

More information

Weighted Sums of Orthogonal Polynomials Related to Birth-Death Processes with Killing

Weighted Sums of Orthogonal Polynomials Related to Birth-Death Processes with Killing Advances in Dynamical Systems and Applications ISSN 0973-5321, Volume 8, Number 2, pp. 401 412 (2013) http://campus.mst.edu/adsa Weighted Sums of Orthogonal Polynomials Related to Birth-Death Processes

More information

Econ 582 Nonparametric Regression

Econ 582 Nonparametric Regression Econ 582 Nonparametric Regression Eric Zivot May 28, 2013 Nonparametric Regression Sofarwehaveonlyconsideredlinearregressionmodels = x 0 β + [ x ]=0 [ x = x] =x 0 β = [ x = x] [ x = x] x = β The assume

More information

Estimation of the large covariance matrix with two-step monotone missing data

Estimation of the large covariance matrix with two-step monotone missing data Estimation of the large covariance matrix with two-ste monotone missing data Masashi Hyodo, Nobumichi Shutoh 2, Takashi Seo, and Tatjana Pavlenko 3 Deartment of Mathematical Information Science, Tokyo

More information

Improved Ridge Estimator in Linear Regression with Multicollinearity, Heteroscedastic Errors and Outliers

Improved Ridge Estimator in Linear Regression with Multicollinearity, Heteroscedastic Errors and Outliers Journal of Modern Applied Statistical Methods Volume 15 Issue 2 Article 23 11-1-2016 Improved Ridge Estimator in Linear Regression with Multicollinearity, Heteroscedastic Errors and Outliers Ashok Vithoba

More information

Ridge Estimation and its Modifications for Linear Regression with Deterministic or Stochastic Predictors

Ridge Estimation and its Modifications for Linear Regression with Deterministic or Stochastic Predictors Ridge Estimation and its Modifications for Linear Regression with Deterministic or Stochastic Predictors James Younker Thesis submitted to the Faculty of Graduate and Postdoctoral Studies in partial fulfillment

More information

COMBINING THE LIU-TYPE ESTIMATOR AND THE PRINCIPAL COMPONENT REGRESSION ESTIMATOR

COMBINING THE LIU-TYPE ESTIMATOR AND THE PRINCIPAL COMPONENT REGRESSION ESTIMATOR Noname manuscript No. (will be inserted by the editor) COMBINING THE LIU-TYPE ESTIMATOR AND THE PRINCIPAL COMPONENT REGRESSION ESTIMATOR Deniz Inan Received: date / Accepted: date Abstract In this study

More information

Alternative Biased Estimator Based on Least. Trimmed Squares for Handling Collinear. Leverage Data Points

Alternative Biased Estimator Based on Least. Trimmed Squares for Handling Collinear. Leverage Data Points International Journal of Contemporary Mathematical Sciences Vol. 13, 018, no. 4, 177-189 HIKARI Ltd, www.m-hikari.com https://doi.org/10.1988/ijcms.018.8616 Alternative Biased Estimator Based on Least

More information

Asymptotic inference for a nonstationary double ar(1) model

Asymptotic inference for a nonstationary double ar(1) model Asymptotic inference for a nonstationary double ar() model By SHIQING LING and DONG LI Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong maling@ust.hk malidong@ust.hk

More information

Likelihood Ratio Tests in Multivariate Linear Model

Likelihood Ratio Tests in Multivariate Linear Model Likelihood Ratio Tests in Multivariate Linear Model Yasunori Fujikoshi Department of Mathematics, Graduate School of Science, Hiroshima University, 1-3-1 Kagamiyama, Higashi Hiroshima, Hiroshima 739-8626,

More information

Dr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines)

Dr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines) Dr. Maddah ENMG 617 EM Statistics 11/28/12 Multiple Regression (3) (Chapter 15, Hines) Problems in multiple regression: Multicollinearity This arises when the independent variables x 1, x 2,, x k, are

More information

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1 Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

Comparison of Estimators in GLM with Binary Data

Comparison of Estimators in GLM with Binary Data Journal of Modern Applied Statistical Methods Volume 13 Issue 2 Article 10 11-2014 Comparison of Estimators in GLM with Binary Data D. M. Sakate Shivaji University, Kolhapur, India, dms.stats@gmail.com

More information

Feature selection with high-dimensional data: criteria and Proc. Procedures

Feature selection with high-dimensional data: criteria and Proc. Procedures Feature selection with high-dimensional data: criteria and Procedures Zehua Chen Department of Statistics & Applied Probability National University of Singapore Conference in Honour of Grace Wahba, June

More information

Some Approximations of the Logistic Distribution with Application to the Covariance Matrix of Logistic Regression

Some Approximations of the Logistic Distribution with Application to the Covariance Matrix of Logistic Regression Working Paper 2013:9 Department of Statistics Some Approximations of the Logistic Distribution with Application to the Covariance Matrix of Logistic Regression Ronnie Pingel Working Paper 2013:9 June

More information

Optimization 2. CS5240 Theoretical Foundations in Multimedia. Leow Wee Kheng

Optimization 2. CS5240 Theoretical Foundations in Multimedia. Leow Wee Kheng Optimization 2 CS5240 Theoretical Foundations in Multimedia Leow Wee Kheng Department of Computer Science School of Computing National University of Singapore Leow Wee Kheng (NUS) Optimization 2 1 / 38

More information

A NOTE ON ESTIMATION UNDER THE QUADRATIC LOSS IN MULTIVARIATE CALIBRATION

A NOTE ON ESTIMATION UNDER THE QUADRATIC LOSS IN MULTIVARIATE CALIBRATION J. Japan Statist. Soc. Vol. 32 No. 2 2002 165 181 A NOTE ON ESTIMATION UNDER THE QUADRATIC LOSS IN MULTIVARIATE CALIBRATION Hisayuki Tsukuma* The problem of estimation in multivariate linear calibration

More information

Some optimal criteria of model-robustness for two-level non-regular fractional factorial designs

Some optimal criteria of model-robustness for two-level non-regular fractional factorial designs Some optimal criteria of model-robustness for two-level non-regular fractional factorial designs arxiv:0907.052v stat.me 3 Jul 2009 Satoshi Aoki July, 2009 Abstract We present some optimal criteria to

More information

ASYMPTOTICS FOR PENALIZED SPLINES IN ADDITIVE MODELS

ASYMPTOTICS FOR PENALIZED SPLINES IN ADDITIVE MODELS Mem. Gra. Sci. Eng. Shimane Univ. Series B: Mathematics 47 (2014), pp. 63 71 ASYMPTOTICS FOR PENALIZED SPLINES IN ADDITIVE MODELS TAKUMA YOSHIDA Communicated by Kanta Naito (Received: December 19, 2013)

More information

Consistent Bivariate Distribution

Consistent Bivariate Distribution A Characterization of the Normal Conditional Distributions MATSUNO 79 Therefore, the function ( ) = G( : a/(1 b2)) = N(0, a/(1 b2)) is a solu- tion for the integral equation (10). The constant times of

More information

On the Efficiencies of Several Generalized Least Squares Estimators in a Seemingly Unrelated Regression Model and a Heteroscedastic Model

On the Efficiencies of Several Generalized Least Squares Estimators in a Seemingly Unrelated Regression Model and a Heteroscedastic Model Journal of Multivariate Analysis 70, 8694 (1999) Article ID jmva.1999.1817, available online at http:www.idealibrary.com on On the Efficiencies of Several Generalized Least Squares Estimators in a Seemingly

More information

CS 195-5: Machine Learning Problem Set 1

CS 195-5: Machine Learning Problem Set 1 CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of

More information

ON D-OPTIMAL DESIGNS FOR ESTIMATING SLOPE

ON D-OPTIMAL DESIGNS FOR ESTIMATING SLOPE Sankhyā : The Indian Journal of Statistics 999, Volume 6, Series B, Pt. 3, pp. 488 495 ON D-OPTIMAL DESIGNS FOR ESTIMATING SLOPE By S. HUDA and A.A. AL-SHIHA King Saud University, Riyadh, Saudi Arabia

More information