A Fast Algorithm for Optimizing Ridge Parameters in a Generalized Ridge Regression by Minimizing an Extended GCV Criterion

Mineaki Ohishi, Hirokazu Yanagihara and Yasunori Fujikoshi

Department of Mathematics, Graduate School of Science, Hiroshima University, Kagamiyama, Higashi-Hiroshima, Hiroshima, Japan

Abstract

In this paper, we deal with the optimization of ridge parameters in a generalized ridge regression by minimizing a model selection criterion. Optimization methods based on minimization of a generalized $C_p$ ($GC_p$) criterion and of the GCV criterion have important advantages and major disadvantages. The most important advantage of the two methods is that the optimization is very fast because the minimizers of the two criteria are given in closed form. The most serious disadvantage of the two methods is that they do not work well when the number of explanatory variables is larger than the sample size. In order to remove this disadvantage while maintaining the advantage, we first extend the GCV criterion, calling the result the extended GCV (EGCV), by changing the second power in its denominator to the $\alpha$-power, where $\alpha$ is some positive number larger than 2. Next, we propose a fast optimization algorithm to minimize the EGCV by using a finite set of specific candidate minimizers of the EGCV. By conducting numerical examinations, we verify that the optimization method based on the minimization of the EGCV performs better than those based on the minimization of existing criteria. (Last Modified: April 14, 2017)

Key words: Generalized ridge regression, Generalized cross-validation criterion, Linear regression model, Model selection criterion, Ridge parameters, Optimization of ridge parameters.

E-mail address: mineaki-ohishi@hiroshima-u.ac.jp (Mineaki Ohishi)

1. Introduction

The linear regression model is one of the most important tools for clarifying the relationship between a scalar response variable and one or more explanatory variables. Let $y = (y_1, \dots, y_n)'$ be an $n$-dimensional vector of response variables

and $X$ be an $n \times k$ matrix of nonstochastic centered explanatory variables ($X'1_n = 0_k$) with $\mathrm{rank}(X) = m \le \min\{n-1, k\}$, where $n$ is the sample size, $k$ is the number of explanatory variables, $1_n$ is an $n$-dimensional vector of ones, and $0_k$ is a $k$-dimensional vector of zeros. The equation of the linear regression model is
$$y = \mu 1_n + X\beta + \varepsilon,$$
where $\mu$ is an unknown location parameter, $\beta$ is a $k$-dimensional vector of unknown regression coefficients, and $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)'$ is an $n$-dimensional vector of independent error variables from a distribution with mean $0$ and variance $\sigma^2$. One method for avoiding the serious problem of multicollinearity among explanatory variables is the generalized ridge regression (GRR) proposed by Hoerl and Kennard (1970), which gives a shrinkage estimate of $\beta$ towards $0_k$ via multiple ridge parameters $\theta_1, \dots, \theta_k$, where $\theta_j \in R_+ = \{\theta \in \mathbb{R} \mid \theta \ge 0\}$ ($j = 1, \dots, k$). Although the GRR was proposed 45 years ago, it is still used in many research fields (e.g., Liu et al., 2011; Shen et al., 2013). Since the GRR estimator (GRRE) of $\beta$ changes depending on $\theta_1, \dots, \theta_k$, how to optimize $\theta_1, \dots, \theta_k$ is an important problem. An optimization method based on minimizing a model selection criterion (MSC) with respect to $\theta_1, \dots, \theta_k$ is one of the common approaches (see, e.g., Lawless, 1981; Walker & Page, 2001; Yanagihara, 2013). Most MSCs consist of two terms: a discrepancy term that measures the goodness of fit of a model, and a penalty term that penalizes the complexity of a model. The generalized $C_p$ ($GC_p$) criterion (see Atkinson, 1980; Nagai et al., 2012) and the generalized cross-validation (GCV) criterion (see Craven & Wahba, 1979; Ye, 1998; Yanagihara, 2013) measure the goodness of fit of a model via the residual sum of squares. The optimization method based on minimizing the $GC_p$ criterion (called the $GC_p$-minimization method) has the following advantages and disadvantage:

Advantages of the $GC_p$-minimization method.
(1) The optimization is very fast because the minimizers of the $GC_p$ criterion can be derived in closed form (see Nagai et al., 2012).
(2) The strength of the penalty term can be changed freely.

A disadvantage of the $GC_p$-minimization method.
(1) This method cannot be used when $n$ is smaller than $k$ because the $GC_p$ requires an asymptotically unbiased estimator of $\sigma^2$.

Since we frequently encounter data with a huge number of explanatory variables, i.e., high-dimensional explanatory variables, we have to say that the $GC_p$-minimization method has a serious disadvantage.

Since this serious problem arises from the requirement of an asymptotically unbiased estimator of $\sigma^2$, it can be solved by using a model selection criterion that does not require such an estimator. The GCV criterion is one of the model selection criteria that do not require an asymptotically unbiased estimator of $\sigma^2$. However, the optimization method based on minimizing the GCV criterion (called the GCV-minimization method) also has the following advantages and disadvantage:

Advantages of the GCV-minimization method.
(1) The optimization is very fast because the minimizers of the GCV criterion can be derived in closed form (see Yanagihara, 2013).
(2) This method can be used even when $n$ is smaller than $k$ because the GCV does not require an asymptotically unbiased estimator of $\sigma^2$.

A disadvantage of the GCV-minimization method.
(1) The strength of the penalty term is fixed.

Although the GCV-minimization method can be used when $n$ is smaller than $k$, it does not work well in that case. This is because the GRRE then hardly shrinks towards $0_k$ (see Yanagihara, 2013). This problem would be avoided if the strength of the penalty term in the GCV could be changed freely. On the other hand, the goodness of fit of a model can also be measured by the Kullback-Leibler (KL) discrepancy (Kullback & Leibler, 1951). Hence, we can define a generalized information criterion (GIC; Nishii, 1984) for the GRR under a normal distribution assumption. The optimization method based on minimizing the GIC (called the GIC-minimization method) also has the following advantages and disadvantage:

Advantages of the GIC-minimization method.
(1) This method can be used even when $n$ is smaller than $k$ because the GIC does not require an asymptotically unbiased estimator of $\sigma^2$.
(2) The strength of the penalty on the complexity of a model can be changed freely.

A disadvantage of the GIC-minimization method.
(1) The optimization is slow because minimizers of the GIC must be derived by iterative calculation, e.g., the Newton-Raphson method.

In this paper, we first propose a fast optimization algorithm for the minimization problem of the GIC by using a finite set of specific candidate minimizers of the GIC. Unfortunately, we also show that the GIC-minimization method does not work well when $n$ is smaller than $k$, even if the strength of the penalty term is increased. Therefore, we extend the GCV so that the strength of the penalty term can be changed freely. We call this criterion the extended GCV (EGCV), and the optimization method based on minimizing this criterion the EGCV-minimization method. We also propose a fast optimization algorithm to solve the minimization problem of the EGCV by using a finite set of specific candidate minimizers of the EGCV.

This paper is organized as follows: In Section 2, we describe several results needed for deriving the main theorems. In Section 3, we give a fast optimization algorithm for the GIC-minimization method, and show why the GIC-minimization method does not work well when $k$ is larger than $n$. In Section 4, we propose the EGCV criterion and a fast optimization algorithm to solve the minimization problem of the EGCV. Numerical examinations are conducted in Section 5. Technical details are provided in the Appendix.

2. Preliminaries

It follows from the results in Yanagihara (2013) that the GRRE of $\beta$ and the hat matrix of the GRR are invariant to any changes in $\theta_{m+1}, \dots, \theta_k$ when $m < k$. From this fact, we set $\theta_{m+1} = \cdots = \theta_k = \infty$ to reduce the variance of the GRRE of $\beta$. Therefore, it is enough to minimize an MSC with respect to $\theta_1, \dots, \theta_m$ for optimizing the ridge parameters in the GRR. Let $Q$ be a $k \times k$ orthogonal matrix that diagonalizes $X'X$, i.e.,
$$Q'X'XQ = \begin{pmatrix} D & O_{m,k-m} \\ O_{k-m,m} & O_{k-m,k-m} \end{pmatrix}, \quad D = \mathrm{diag}(d_1, \dots, d_m), \quad (2.1)$$
where $D$ is an $m \times m$ diagonal matrix, $d_1 \ge \cdots \ge d_m$ are the nonzero eigenvalues of $X'X$, and $O_{k,m}$ is a $k \times m$ matrix of zeros. Moreover, let $\theta$ be an $m$-dimensional vector given by $\theta = (\theta_1, \dots, \theta_m)' \in R_+^m$, and let $\Theta$ be a $k \times k$ diagonal matrix given by $\Theta = \mathrm{diag}(\theta_1, \dots, \theta_m, 0, \dots, 0)$, where $R_+^m$ is the $m$th Cartesian power of $R_+$. Notice that $d_1, \dots, d_m$ are positive because $X'X$ is a positive semidefinite matrix. Moreover, let $M_\theta$ be a $k \times k$ matrix defined by $M_\theta = X'X + Q\Theta Q'$. In particular, we write $M_\theta = M$ when $\theta = 0_m$. Then, the GRRE of $\beta$ is expressed as
$$\hat\beta_\theta = M_\theta^+ X'y, \quad (2.2)$$
where $A^+$ denotes the Moore-Penrose inverse of a matrix $A$ (for details of the Moore-Penrose inverse, see, e.g., Harville, 1997, chap. 20).
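To make (2.1) and (2.2) concrete, the following is a minimal Python sketch (our own illustration, not code from the paper; the function name and tolerance are assumptions) that forms $Q$ and $D$ by an eigendecomposition of $X'X$ and evaluates the GRRE for finite ridge parameters. With theta set to all zeros it reduces to the LSE of (2.3) below.

```python
import numpy as np

def grr_estimator(X, y, theta, tol=1e-10):
    """Sketch of the GRRE (2.2): beta_theta = M_theta^+ X'y with
    M_theta = X'X + Q Theta Q', Theta = diag(theta_1,...,theta_m,0,...,0).
    X is assumed column-centered; theta holds the m finite ridge parameters."""
    k = X.shape[1]
    d_all, Q = np.linalg.eigh(X.T @ X)        # eigenvalues in ascending order
    order = np.argsort(d_all)[::-1]           # reorder so that d_1 >= ... >= d_k
    d_all, Q = d_all[order], Q[:, order]
    m = int(np.sum(d_all > tol))              # m = rank(X)
    diag = d_all.copy()
    diag[:m] += theta                         # diagonal of Q'M_theta Q: d_j + theta_j, then zeros
    inv_diag = np.zeros(k)
    inv_diag[:m] = 1.0 / diag[:m]             # Moore-Penrose inverse inverts only the nonzero entries
    return Q @ (inv_diag * (Q.T @ (X.T @ y)))
```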

It is a well-known fact that the least squares estimator (LSE) of $\beta$ is given by
$$\hat\beta = M^+X'y. \quad (2.3)$$
It is clear that (2.2) coincides with (2.3) when $\theta = 0_m$. Let $\hat\mu$ be the LSE of $\mu$ given by $\hat\mu = \bar y$, where $\bar y$ is the sample mean of $y_1, \dots, y_n$, i.e., $\bar y = \sum_{i=1}^n y_i/n$. Equation (2.2) and $\hat\mu$ yield a predictive value of $y$ as
$$\hat y_\theta = \hat\mu 1_n + X\hat\beta_\theta = (J_n + XM_\theta^+X')y,$$
where $J_n$ is an $n \times n$ projection matrix defined by $J_n = 1_n1_n'/n$. By using $M_\theta^+$ and $\hat y_\theta$, we define an estimator of $\sigma^2$ and a generalized degrees of freedom (df) of the GRR, which are the main bodies of the discrepancy and penalty terms of an MSC, respectively, as
$$\hat\sigma^2(\theta) = \frac{1}{n}(y - \hat y_\theta)'(y - \hat y_\theta) = \frac{1}{n}y'(I_n - J_n - XM_\theta^+X')^2y, \quad (2.4)$$
$$\mathrm{df}(\theta) = \mathrm{tr}(J_n + XM_\theta^+X') = 1 + \mathrm{tr}(M_\theta^+M). \quad (2.5)$$
Most MSCs for the GRR are defined by a bivariate function of $\hat\sigma^2(\theta)$ and $\mathrm{df}(\theta)$. Let $z$ be an $m$-dimensional vector defined by
$$z = (z_1, \dots, z_m)' = (D^{-1/2}, O_{m,k-m})Q'X'y = D^{-1/2}Q_1'X'y, \quad (2.6)$$
where $Q_1$ is a $k \times m$ matrix defined by $Q = (Q_1, Q_2)$. Here, we assume that all of $z_1, \dots, z_m$ are not 0. If $z_j$ is accidentally 0, then we set $d_j = 0$ and use $(z_1, \dots, z_{j-1}, z_{j+1}, \dots, z_m)'$ as $z$. By using the result in Yanagihara (2013), $\hat\sigma^2(\theta)$ in (2.4) and $\mathrm{df}(\theta)$ in (2.5) can be rewritten as
$$\hat\sigma^2(\theta) = \frac{1}{n}\left\{n\hat\sigma_0^2 + \sum_{j=1}^m\left(\frac{\theta_j}{d_j+\theta_j}\right)^2z_j^2\right\}, \quad \mathrm{df}(\theta) = 1 + m - \sum_{j=1}^m\frac{\theta_j}{d_j+\theta_j}, \quad (2.7)$$
where $\hat\sigma_0^2$ is the maximum likelihood estimator of $\sigma^2$ under normality, which is given by
$$\hat\sigma_0^2 = \frac{1}{n}y'(I_n - J_n - XM^+X')y. \quad (2.8)$$
It should be kept in mind that $\hat\sigma_0^2 = 0$ holds in most cases of linear regression with high-dimensional explanatory variables. Notice that $\theta_j/(d_j+\theta_j)$ is monotonically increasing in $\theta_j$, that $\hat\sigma^2(0_m) = \hat\sigma_0^2$ and $\mathrm{df}(0_m) = m+1$, and that
$$\lim_{\theta_1,\dots,\theta_m\to\infty}\hat\sigma^2(\theta) = \hat\sigma_\infty^2 = \frac{1}{n}y'(I_n - J_n)y, \quad \lim_{\theta_1,\dots,\theta_m\to\infty}\mathrm{df}(\theta) = 1. \quad (2.9)$$
Hence, the ranges of $\hat\sigma^2(\theta)$ and $\mathrm{df}(\theta)$ are given by $\hat\sigma^2(\theta) \in [\hat\sigma_0^2, \hat\sigma_\infty^2]$ and $\mathrm{df}(\theta) \in [1, m+1]$.
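The quantities in (2.6)-(2.8) are what a fast implementation precomputes. The sketch below (helper names are ours) obtains $d$, $z$ and $\hat\sigma_0^2$ from the eigendecomposition of $X'X$ and then evaluates (2.7) for a finite $\theta$.

```python
import numpy as np

def canonical_quantities(X, y, tol=1e-10):
    """Return d, z and sigma0^2 of (2.1), (2.6) and (2.8) for column-centered X."""
    n = len(y)
    d_all, Q = np.linalg.eigh(X.T @ X)
    order = np.argsort(d_all)[::-1]
    d_all, Q = d_all[order], Q[:, order]
    m = int(np.sum(d_all > tol))
    d, Q1 = d_all[:m], Q[:, :m]
    z = (Q1.T @ X.T @ y) / np.sqrt(d)            # (2.6): z = D^{-1/2} Q_1' X' y
    yc = y - y.mean()                            # (I_n - J_n) y
    sigma0_sq = (yc @ yc - z @ z) / n            # (2.8), using y'X M^+ X'y = z'z
    return d, z, sigma0_sq

def sigma_sq_and_df(theta, d, z, sigma0_sq, n):
    """Evaluate (2.7) for finite ridge parameters theta (length m)."""
    delta = theta / (d + theta)                  # shrinkage factors theta_j/(d_j + theta_j)
    return (n * sigma0_sq + np.sum(delta**2 * z**2)) / n, 1 + len(d) - np.sum(delta)
```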

Let $f(r,u)$ be a bivariate function which satisfies the following assumptions:

(A1) $f(r,u)$ is a continuous function at any $(r,u) \in (0, \hat\sigma_\infty^2] \times [1, n)$.
(A2) $f(r,u) > 0$ for any $(r,u) \in (0, \hat\sigma_\infty^2] \times [1, n)$.
(A3) $f(r,u)$ is first-order partially differentiable at any $(r,u) \in (0, \hat\sigma_\infty^2] \times [1, n)$, and
$$f_r(r,u) = \frac{\partial}{\partial r}f(r,u) > 0, \quad f_u(r,u) = \frac{\partial}{\partial u}f(r,u) > 0, \quad \forall(r,u) \in (0, \hat\sigma_\infty^2] \times [1, n).$$

Let $\mathcal{D}$ be the domain of $f$. It is easy to see that $m = n-1$ implies $\hat\sigma_0^2 = 0$. Recall that $m \le n-1$. Hence, when $\hat\sigma_0^2 \neq 0$, we have $(\hat\sigma^2(\theta), \mathrm{df}(\theta)) \in (0, \hat\sigma_\infty^2] \times [1, n)$. By using the bivariate function $f$, an MSC for the GRR can be expressed as
$$\mathrm{MSC}(\theta) = f(\hat\sigma^2(\theta), \mathrm{df}(\theta)). \quad (2.10)$$
The relationship between the existing criteria and the bivariate function $f$ is as follows:
$$f(r,u) = \begin{cases} f_{GC_p}(r,u) = nr/s_0^2 + \alpha u & (GC_p: \text{ when } \hat\sigma_0^2 \neq 0) \\ f_{GCV}(r,u) = r(1-u/n)^{-2} & (GCV: \text{ when } u \neq n) \\ f_{GIC}(r,u) = r\exp(\alpha u/n) & (GIC) \end{cases}, \quad (2.11)$$
where $\alpha$ is some positive value expressing the strength of the penalty term, and $s_0^2$ is an ordinary unbiased estimator of $\sigma^2$ when $m < n-1$, which is given by $s_0^2 = n\hat\sigma_0^2/(n-m-1)$. The $GC_p$ includes several famous MSCs, e.g., the $GC_p$ coincides with the $C_p$ proposed by Mallows (1973) when $\alpha = 2$, and with the modified $C_p$ ($MC_p$) proposed by Fujikoshi and Satoh (1997) and Yanagihara et al. (2009) when $\alpha = 1 + 2/(n-m-1)$. Moreover, the original GIC under normality is expressed as $n\log r + \alpha u$. The GIC in this paper is defined by an exponential transformation of the original GIC divided by $n$. The GIC includes several famous MSCs, e.g., the GIC coincides with the AIC proposed by Akaike (1973) when $\alpha = 2$, with the HQC proposed by Hannan and Quinn (1979) when $\alpha = 2\log\log n$, and with the BIC proposed by Schwarz (1978) when $\alpha = \log n$. In order to express an optimized GRRE of $\beta$ systematically, we prepare the following class of $\theta$ for any $h \in R_+$:
$$\hat\theta(h) = (\hat\theta_1(h), \dots, \hat\theta_m(h))', \quad \hat\theta_j(h) = \begin{cases} \dfrac{d_jh}{z_j^2-h} & (h \le z_j^2) \\ \infty & (h > z_j^2) \end{cases}. \quad (2.12)$$
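Under our reading of (2.12), the candidate ridge parameters $\hat\theta(h)$ and the induced shrinkage factors $d_j/\{d_j+\hat\theta_j(h)\}$ can be computed as follows (a small Python sketch; the function names are hypothetical).

```python
import numpy as np

def theta_hat(h, d, z):
    """Candidate ridge parameters of (2.12): d_j*h/(z_j^2 - h) if h <= z_j^2, infinity otherwise."""
    z2 = z**2
    with np.errstate(divide="ignore"):
        return np.where(h <= z2, d * h / (z2 - h), np.inf)

def shrinkage_factors(h, z):
    """The factors d_j/(d_j + theta_hat_j(h)), which reduce to max(1 - h/z_j^2, 0)."""
    return np.maximum(1.0 - h / z**2, 0.0)
```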

The GRRE of $\beta$ based on $\hat\theta(h)$ is given by
$$\hat\beta_{\hat\theta(h)} = \hat\beta(h) = Q_1V(h)Q_1'\hat\beta, \quad (2.13)$$
where $\hat\beta$ is the LSE of $\beta$ given by (2.3), and $V(h)$ is an $m \times m$ diagonal matrix whose $j$th diagonal element is given by
$$v_j(h) = \frac{d_j}{d_j + \hat\theta_j(h)} = \begin{cases} 1 - h/z_j^2 & (h \le z_j^2) \\ 0 & (h > z_j^2) \end{cases}.$$
The optimized $\theta$ obtained by minimizing $\mathrm{MSC}(\theta)$ when the domain of $f$ is a subset of $(0, \hat\sigma_\infty^2] \times [1, n)$ is expressed as in the following lemma (the proof is given in Appendix A.1):

Lemma 1. Let
$$\phi(h) = \mathrm{MSC}(\hat\theta(h)) \quad (h \in R_+\backslash\{0\}). \quad (2.14)$$
Suppose that $\mathcal{D} \subseteq (0, \hat\sigma_\infty^2] \times [1, n)$ and that there exists $\xi > 0$ such that $\phi(\xi) < \lim_{h\to0}\phi(h)$. Then, the optimized ridge parameters obtained by minimizing $\mathrm{MSC}(\theta)$ are given by $\hat\theta(\hat h)$, where $\hat h$ is given by
$$\hat h = \arg\min_{h \in R_+\backslash\{0\}}\phi(h).$$

Let $t_j$ ($j = 1, \dots, m$) be the $j$th-order statistic of $z_1^2, \dots, z_m^2$, i.e.,
$$t_j = \begin{cases} \min\{z_1^2, \dots, z_m^2\} & (j = 1) \\ \min\left(\{z_1^2, \dots, z_m^2\} \backslash \{t_1, \dots, t_{j-1}\}\right) & (j = 2, \dots, m) \end{cases},$$
and let $R_j$ ($j = 0, \dots, m$) be a range defined by
$$R_j = \begin{cases} (0, t_1] & (j = 0) \\ (t_j, t_{j+1}] & (j = 1, \dots, m-1) \\ (t_m, \infty) & (j = m) \end{cases}. \quad (2.15)$$
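Lemma 1 reduces the $m$-dimensional optimization over $\theta$ to a one-dimensional search over $h$. The following sketch (ours) evaluates $\phi(h)$ of (2.14) for any criterion of the form (2.11), using the fact, immediate from (2.12), that $\hat\theta_j(h)/\{d_j+\hat\theta_j(h)\} = \min(h/z_j^2, 1)$.

```python
import numpy as np

def phi(h, z, sigma0_sq, n, f):
    """phi(h) = MSC(theta_hat(h)) = f(sigma^2(theta_hat(h)), df(theta_hat(h))) of (2.14)."""
    m = len(z)
    delta = np.minimum(h / z**2, 1.0)                # theta_j/(d_j + theta_j) at theta = theta_hat(h)
    r = sigma0_sq + np.sum(delta**2 * z**2) / n      # sigma^2(theta_hat(h)) by (2.7)
    u = 1 + m - np.sum(delta)                        # df(theta_hat(h)) by (2.7)
    return f(r, u)

# examples of f from (2.11), for a given n and penalty strength alpha
def make_f_gcv(n):
    return lambda r, u: r / (1.0 - u / n)**2

def make_f_gic(n, alpha):
    return lambda r, u: r * np.exp(alpha * u / n)
```

For example, phi(h, z, sigma0_sq, n, make_f_gic(n, np.log(n))) evaluates the GIC along the path $\hat\theta(h)$.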

By using $t_j$ and $R_j$, we can clarify the properties of $\phi(h)$ given in (2.14) as in the following lemma (the proof is given in Appendix A.2):

Lemma 2. The function $\phi(h)$ satisfies the following properties:
(P1) $\phi(h)$ is continuous at any $h \in R_+\backslash\{0\}$.
(P2) $\phi(h) = f(\hat\sigma_\infty^2, 1)$ for every $h \ge t_m$.
(P3) $\phi(h)$ is a piecewise function such that $\phi(h) = \phi_a(h)$ ($h \in R_a$, $a = 0, 1, \dots, m$). The specific form of $\phi_a(h)$ can be expressed as
$$\phi_a(h) = f\left(\hat\sigma_0^2 + (c_{1,a} + h^2c_{2,a})/n,\ 1 + m - a - hc_{2,a}\right),$$
where $c_{1,a}$ and $c_{2,a}$ are given by
$$c_{1,a} = \sum_{j=1}^a t_j, \quad c_{2,a} = \sum_{j=a+1}^m\frac{1}{t_j}. \quad (2.16)$$

Let $s_a^2$ ($a = 0, 1, \dots, m$) be a statistic defined by
$$s_a^2 = \frac{n\hat\sigma_0^2 + c_{1,a}}{n - m - 1 + a}, \quad (2.17)$$
and let $a_*$ be an integer defined by
$$a_* \in \{0, 1, \dots, m-1\} \ \text{s.t.}\ s_{a_*}^2 \in R_{a_*}. \quad (2.18)$$
From Lemma 2.1 in Yanagihara (2013), we can see that $a_*$ exists uniquely when $\hat\sigma_0^2 \neq 0$. Let $GC_p(\theta) = f_{GC_p}(\hat\sigma^2(\theta), \mathrm{df}(\theta))$ and $GCV(\theta) = f_{GCV}(\hat\sigma^2(\theta), \mathrm{df}(\theta))$. From the results in Nagai et al. (2012) and Yanagihara (2013), the minimizers of the $GC_p$ and GCV can be expressed as in the following theorem (the proof is omitted because it is easy to obtain from the results in Nagai et al., 2012, and Yanagihara, 2013):

Theorem 1. The optimized ridge parameters minimizing $GC_p(\theta)$ and $GCV(\theta)$ are $\hat\theta(\hat h_{GC_p})$ and $\hat\theta(\hat h_{GCV})$, respectively, where $\hat h_{GC_p}$ and $\hat h_{GCV}$ are defined as
$$\hat h_{GC_p} = \frac{1}{2}\alpha s_0^2 \ (\hat\sigma_0^2 \neq 0), \quad \hat h_{GCV} = \begin{cases} s_{a_*}^2 & (\text{when } \hat\sigma_0^2 \neq 0) \\ t_1 & (\text{when } \hat\sigma_0^2 = 0 \text{ and } m = n-1) \\ 0 & (\text{when } \hat\sigma_0^2 = 0 \text{ and } m < n-1) \end{cases},$$
where $s_a^2$ is given by (2.17) and $a_*$ is the integer given by (2.18).

By using the results in Theorem 1, the optimized GRREs of $\beta$ obtained by minimizing the $GC_p$ and GCV can be derived as $\hat\beta(\hat h_{GC_p})$ and $\hat\beta(\hat h_{GCV})$, respectively.
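As a rough illustration of how the closed forms in Theorem 1 might be used, here is a Python sketch based on our reading of (2.17), (2.18) and Theorem 1 (so the exact expressions should be checked against the theorem); it assumes $\hat\sigma_0^2 \neq 0$.

```python
import numpy as np

def h_gcp_gcv(z, sigma0_sq, n, alpha=2.0):
    """Sketch of the closed-form minimizers of Theorem 1 (assuming sigma0^2 != 0).
    Returns (h_GCp, h_GCV) with h_GCp = alpha*s_0^2/2 and h_GCV = s_{a*}^2, where
    a* is the unique index with s_{a*}^2 in R_{a*} (eqs. (2.17)-(2.18))."""
    m = len(z)
    t = np.sort(z**2)                                   # order statistics t_1 <= ... <= t_m
    s0_sq = n * sigma0_sq / (n - m - 1)
    h_gcp = 0.5 * alpha * s0_sq
    edges = np.concatenate(([0.0], t))                  # R_a = (edges[a], edges[a+1]] for a < m
    h_gcv = None
    for a in range(m):                                  # search a*: s_a^2 in R_a
        s_a_sq = (n * sigma0_sq + t[:a].sum()) / (n - m - 1 + a)
        if edges[a] < s_a_sq <= edges[a + 1]:
            h_gcv = s_a_sq
            break
    return h_gcp, h_gcv
```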

3. Algorithm to Solve the Minimization Problem of the GIC

Theorem 1 in the previous section indicates that the $GC_p$-minimization method cannot be used when $\hat\sigma_0^2$ in (2.8) is 0, i.e., in most cases of linear regression with high-dimensional explanatory variables, and that the GCV-minimization method hardly shrinks the GRRE of $\beta$ towards $0_k$ when $\hat\sigma_0^2 = 0$. On the other hand, the GIC is one of the model selection criteria that can be defined even if $\hat\sigma_0^2 = 0$ and whose penalty strength can be changed freely. However, at the moment, there is no efficient algorithm to solve the minimization problem of the GIC. Hence, we propose a fast algorithm for the GIC-minimization method. Let
$$GIC(\theta) = f_{GIC}(\hat\sigma^2(\theta), \mathrm{df}(\theta)), \quad (3.1)$$
where $f_{GIC}(r,u)$ is given by (2.11), and $\hat\sigma^2(\theta)$ and $\mathrm{df}(\theta)$ are given by (2.7), and let
$$\phi_{GIC}(h) = f_{GIC}(\hat\sigma^2(\hat\theta(h)), \mathrm{df}(\hat\theta(h))) \quad (h \in R_+\backslash\{0\}),$$
where $\hat\theta(h)$ is given by (2.12). From Lemma 2, we can see that $\phi_{GIC}(h)$ is a piecewise function such that
$$\phi_{GIC}(h) = \phi_{GIC,a}(h) \quad (h \in R_a,\ a = 0, 1, \dots, m), \quad (3.2)$$
where $R_a$ is the range given by (2.15) and $\phi_{GIC,a}(h)$ is defined by
$$\phi_{GIC,a}(h) = \left\{\hat\sigma_0^2 + \frac{1}{n}(c_{1,a} + h^2c_{2,a})\right\}\exp\left\{\frac{\alpha}{n}(1 + m - a - hc_{2,a})\right\}.$$
Here $c_{1,a}$ and $c_{2,a}$ ($a = 0, 1, \dots, m$) are given by (2.16). Furthermore, we define the function $\psi_{GIC}(h)$ ($h \in (0, t_m]$), which is a piecewise function such that
$$\psi_{GIC}(h) = \psi_{GIC,a}(h) \quad (h \in R_a,\ a = 0, 1, \dots, m-1), \quad (3.3)$$
where $\psi_{GIC,a}(h)$ is defined by
$$\psi_{GIC,a}(h) = \frac{n^2}{c_{2,a}\exp\{\alpha(1 + m - a - hc_{2,a})/n\}}\frac{\partial}{\partial h}\phi_{GIC,a}(h) = -\alpha c_{2,a}h^2 + 2nh - \alpha(n\hat\sigma_0^2 + c_{1,a}). \quad (3.4)$$
The minimizer of the GIC can be expressed as in the following theorem (the proof is given in Appendix A.3):

Theorem 2. Let $\xi_{GIC,a}$ be one of the two distinct roots or the double root of the quadratic equation $\psi_{GIC,a}(h) = 0$, given by
$$\xi_{GIC,a} = \frac{n - \sqrt{n^2 - \alpha^2c_{2,a}(n\hat\sigma_0^2 + c_{1,a})}}{\alpha c_{2,a}}, \quad (3.5)$$
and let $A_{GIC}$ and $T_{GIC}$ be sets defined by
$$A_{GIC} = \left\{a \in \{0, 1, \dots, m-1\} \mid \xi_{GIC,a} \in R_a\right\}, \quad T_{GIC} = \begin{cases} \{t_m\} & (\text{when } \hat\sigma_\infty^2 > 2t_m/\alpha) \\ \emptyset & (\text{when } \hat\sigma_\infty^2 \le 2t_m/\alpha) \end{cases}, \quad (3.6)$$
where $\hat\sigma_\infty^2$ is given by (2.9). We define a set of candidate minimizers of $\phi_{GIC}(h)$ as
$$S_{GIC} = \bigcup_{a \in A_{GIC}}\{\xi_{GIC,a}\} \cup T_{GIC}. \quad (3.7)$$
Then, the optimized ridge parameters obtained by minimizing $GIC(\theta)$ are given by $\hat\theta(\hat h_{GIC})$, where $\hat h_{GIC}$ is given by
$$\hat h_{GIC} = \begin{cases} \arg\min_{h \in S_{GIC}}\phi_{GIC}(h) & (\text{when } \hat\sigma_0^2 \neq 0) \\ 0 & (\text{when } \hat\sigma_0^2 = 0) \end{cases}. \quad (3.8)$$

It should be emphasized that the elements of $S_{GIC}$ can be written in closed form, and that the number of elements of $S_{GIC}$ is equal to or smaller than $m+1$, i.e., $\#(S_{GIC}) \le m+1$. Hence, when $\hat\sigma_0^2 \neq 0$, by using the results in Theorem 2, we can optimize $\theta$ quickly as follows (see the sketch below):

A Fast Algorithm to Derive the Optimized GRRE of $\beta$ by Minimizing the GIC
(1) Calculate the elements of $S_{GIC}$ by using $\xi_{GIC,a}$, $A_{GIC}$ and $T_{GIC}$.
(2) Determine $\hat h_{GIC}$ by comparing the values of $\phi_{GIC}(h)$ at the points in $S_{GIC}$.
(3) Calculate the optimized GRRE of $\beta$ obtained by minimizing the GIC as $\hat\beta(\hat h_{GIC})$, where $\hat\beta(h)$ is given by (2.13).

Unfortunately, Theorem 2 also indicates that the optimized GRRE of $\beta$ obtained by minimizing the GIC is always equal to the LSE of $\beta$ when $\hat\sigma_0^2 = 0$. Thus, we cannot help but say that the GIC-minimization method does not work well in the case of high-dimensional explanatory variables.
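The following Python sketch (our own; it is not the authors' implementation, and variable names are assumptions) carries out steps (1) and (2) of the algorithm above for the case $\hat\sigma_0^2 \neq 0$, building the candidate set $S_{GIC}$ from the closed-form roots $\xi_{GIC,a}$ and the threshold defining $T_{GIC}$.

```python
import numpy as np

def h_gic(z, sigma0_sq, n, alpha):
    """Sketch of the fast GIC minimization (Theorem 2), assuming sigma0^2 != 0."""
    m = len(z)
    t = np.sort(z**2)                                       # t_1 <= ... <= t_m
    sigma_inf_sq = sigma0_sq + t.sum() / n                  # sigma^2_infinity = sigma0^2 + z'z/n
    edges = np.concatenate(([0.0], t))                      # R_a = (edges[a], edges[a+1]]

    def phi_gic(h):
        delta = np.minimum(h / t, 1.0)                      # delta_j at theta_hat(h); t is sorted z^2
        r = sigma0_sq + np.sum(delta**2 * t) / n
        u = 1 + m - np.sum(delta)
        return r * np.exp(alpha * u / n)

    candidates = []
    for a in range(m):                                      # xi_GIC,a: smaller root of psi_GIC,a = 0
        c1 = t[:a].sum()
        c2 = np.sum(1.0 / t[a:])
        disc = n**2 - alpha**2 * c2 * (n * sigma0_sq + c1)
        if disc >= 0:
            xi = (n - np.sqrt(disc)) / (alpha * c2)
            if edges[a] < xi <= edges[a + 1]:               # keep only roots lying in R_a
                candidates.append(xi)
    if sigma_inf_sq > 2 * t[-1] / alpha:                    # T_GIC = {t_m}
        candidates.append(t[-1])
    return min(candidates, key=phi_gic)
```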

When $\hat\sigma_0^2 \neq 0$, we cannot derive $\#(S_{GIC})$ without comparing the values of $\phi_{GIC}(h)$. However, $\#(S_{GIC})$ becomes 1 under a specific $\alpha$ when $\hat\sigma_0^2 \neq 0$. The result is summarized as in the following corollary (the proof is given in Appendix A.4):

Corollary 1. Suppose that $\alpha \le n/m$ and $\hat\sigma_0^2 \neq 0$. Let $a_*$ be an integer defined by
$$a_* \in \{0, 1, \dots, m-1\} \ \text{s.t.}\ \xi_{GIC,a_*} \in R_{a_*}.$$
The integer $a_*$ exists uniquely when $\hat\sigma_\infty^2 \le 2t_m/\alpha$. Then $\hat h_{GIC}$ in (3.8) can be rewritten as
$$\hat h_{GIC} = \begin{cases} \xi_{GIC,a_*} & (\text{when } \hat\sigma_\infty^2 \le 2t_m/\alpha) \\ t_m & (\text{when } \hat\sigma_\infty^2 > 2t_m/\alpha) \end{cases}.$$

4. Algorithm to Solve the Minimization Problem of the EGCV

Theorem 2 in the previous section indicates that the GIC-minimization method does not shrink the GRRE of $\beta$ towards $0_k$ at all when $\hat\sigma_0^2$ in (2.8) is 0, i.e., in most cases of linear regression with high-dimensional explanatory variables. From Theorems 1 and 2, we regrettably have to say that no existing MSC works well when $\hat\sigma_0^2 = 0$. Hence, we have to propose a new MSC which works well even when $\hat\sigma_0^2 = 0$. On the other hand, the cause of the problem in the GCV is that the strength of the penalty on the complexity of a model is small. Hence, we extend the GCV so that the strength of the penalty on the complexity of a model can be changed freely. It is called the extended GCV (EGCV) in this paper. Let
$$f_{EGCV}(r,u) = \frac{r}{(1-u/n)^\alpha}, \quad \alpha > 2 \text{ and } u < n. \quad (4.1)$$
Then, the EGCV for the GRR is defined as
$$\mathrm{EGCV}(\theta) = f_{EGCV}(\hat\sigma^2(\theta), \mathrm{df}(\theta)), \quad (4.2)$$
where $\hat\sigma^2(\theta)$ and $\mathrm{df}(\theta)$ are given by (2.7). It is easy to see that the EGCV coincides with the GCV if $\alpha = 2$. Let
$$\phi_{EGCV}(h) = f_{EGCV}(\hat\sigma^2(\hat\theta(h)), \mathrm{df}(\hat\theta(h))) \quad (h \in R_+),$$
where $\hat\theta(h)$ is given by (2.12). From Lemma 2, we can see that $\phi_{EGCV}(h)$ is a piecewise function such that
$$\phi_{EGCV}(h) = \phi_{EGCV,a}(h) \quad (h \in R_a,\ a = 0, 1, \dots, m), \quad (4.3)$$
where $R_a$ is the range given by (2.15) and $\phi_{EGCV,a}(h)$ is defined by
$$\phi_{EGCV,a}(h) = \frac{\hat\sigma_0^2 + c_{1,a}/n + h^2c_{2,a}/n}{(b + a/n + hc_{2,a}/n)^\alpha}. \quad (4.4)$$
Here $c_{1,a}$ and $c_{2,a}$ ($a = 0, 1, \dots, m$) are given by (2.16), and $b$ is the constant defined by
$$b = 1 - \frac{1}{n}(m+1). \quad (4.5)$$
Furthermore, we define the function $\psi_{EGCV}(h)$ ($h \in (0, t_m]$), which is a piecewise function such that

$$\psi_{EGCV}(h) = \psi_{EGCV,a}(h) \quad (h \in R_a,\ a = 0, 1, \dots, m-1), \quad (4.6)$$
where $\psi_{EGCV,a}(h)$ is defined by
$$\psi_{EGCV,a}(h) = \frac{n^2(b + a/n + c_{2,a}h/n)^{\alpha+1}}{c_{2,a}}\frac{\partial}{\partial h}\phi_{EGCV,a}(h) = -(\alpha-2)c_{2,a}h^2 + 2(a + nb)h - \alpha(n\hat\sigma_0^2 + c_{1,a}). \quad (4.7)$$
The minimizer of the EGCV can be expressed as in the following theorem (the proof is given in Appendix A.5):

Theorem 3. Let $\xi_{EGCV,a}$ be one of the two distinct roots or the double root of the quadratic equation $\psi_{EGCV,a}(h) = 0$, given by
$$\xi_{EGCV,a} = \frac{(a + nb) - \sqrt{(a + nb)^2 - \alpha(\alpha-2)c_{2,a}(n\hat\sigma_0^2 + c_{1,a})}}{(\alpha-2)c_{2,a}},$$
and let $A_{EGCV}$ and $T_{EGCV}$ be sets defined by
$$A_{EGCV} = \left\{a \in \{0, 1, \dots, m-1\} \mid \xi_{EGCV,a} \in R_a\right\}, \quad T_{EGCV} = \begin{cases} \{t_m\} & (\text{when } \hat\sigma_\infty^2 > 2(1-n^{-1})t_m/\alpha) \\ \emptyset & (\text{when } \hat\sigma_\infty^2 \le 2(1-n^{-1})t_m/\alpha) \end{cases},$$
where $\hat\sigma_\infty^2$ is given by (2.9). We define a set of candidate minimizers of $\phi_{EGCV}(h)$ as
$$S_{EGCV} = \bigcup_{a \in A_{EGCV}}\{\xi_{EGCV,a}\} \cup T_{EGCV}. \quad (4.8)$$
Then, the optimized ridge parameters obtained by minimizing $\mathrm{EGCV}(\theta)$ are given by $\hat\theta(\hat h_{EGCV})$, where $\hat h_{EGCV}$ is given by
$$\hat h_{EGCV} = \begin{cases} \arg\min_{h \in S_{EGCV}}\phi_{EGCV}(h) & (\text{when } \hat\sigma_0^2 \neq 0 \text{ or } b = 0) \\ 0 & (\text{when } \hat\sigma_0^2 = 0 \text{ and } b \neq 0) \end{cases}. \quad (4.9)$$

It should be emphasized that the elements of $S_{EGCV}$ can be written in closed form, and that the number of elements of $S_{EGCV}$ is equal to or smaller than $m+1$, i.e., $\#(S_{EGCV}) \le m+1$. Hence, when $\hat\sigma_0^2 \neq 0$ or $b = 0$, by using the results in Theorem 3, we can optimize $\theta$ quickly as follows (see the sketch below):

A Fast Algorithm to Derive the Optimized GRRE of $\beta$ by Minimizing the EGCV
(1) Calculate the elements of $S_{EGCV}$ by using $\xi_{EGCV,a}$, $A_{EGCV}$ and $T_{EGCV}$.
(2) Determine $\hat h_{EGCV}$ by comparing the values of $\phi_{EGCV}(h)$ at the points in $S_{EGCV}$.
(3) Calculate the optimized GRRE of $\beta$ obtained by minimizing the EGCV as $\hat\beta(\hat h_{EGCV})$, where $\hat\beta(h)$ is given by (2.13).
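Analogously to the GIC case, the EGCV algorithm admits the following Python sketch (ours, under the assumptions $\alpha > 2$ and $\hat\sigma_0^2 \neq 0$ or $b = 0$).

```python
import numpy as np

def h_egcv(z, sigma0_sq, n, alpha):
    """Sketch of the fast EGCV minimization (Theorem 3)."""
    m = len(z)
    t = np.sort(z**2)
    b = 1.0 - (m + 1) / n
    sigma_inf_sq = sigma0_sq + t.sum() / n
    edges = np.concatenate(([0.0], t))

    def phi_egcv(h):
        delta = np.minimum(h / t, 1.0)
        r = sigma0_sq + np.sum(delta**2 * t) / n
        u = 1 + m - np.sum(delta)
        return r / (1.0 - u / n)**alpha

    candidates = []
    for a in range(m):                                   # xi_EGCV,a: smaller root of psi_EGCV,a = 0
        c1 = t[:a].sum()
        c2 = np.sum(1.0 / t[a:])
        disc = (a + n * b)**2 - alpha * (alpha - 2) * c2 * (n * sigma0_sq + c1)
        if disc >= 0:
            xi = (a + n * b - np.sqrt(disc)) / ((alpha - 2) * c2)
            if edges[a] < xi <= edges[a + 1]:
                candidates.append(xi)
    if sigma_inf_sq > 2 * (1 - 1 / n) * t[-1] / alpha:   # T_EGCV = {t_m}
        candidates.append(t[-1])
    return min(candidates, key=phi_egcv)
```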

When $\hat\sigma_0^2 \neq 0$ or $m = n-1$, we cannot derive $\#(S_{EGCV})$ without comparing the values of $\phi_{EGCV}(h)$. However, $\#(S_{EGCV})$ becomes 1 under a specific $\alpha$ when $\hat\sigma_0^2 \neq 0$ or $m = n-1$. The result is summarized as in the following corollary (the proof is given in Appendix A.6):

Corollary 2. Suppose that $\alpha \le (n+m-1)/m$ and that $\{\hat\sigma_0^2 \neq 0\} \cup \{b = 0\}$ holds. Let $a_*$ be an integer defined by
$$a_* \in \{0, 1, \dots, m-1\} \ \text{s.t.}\ \xi_{EGCV,a_*} \in R_{a_*}.$$
The integer $a_*$ exists uniquely when $\hat\sigma_\infty^2 \le 2(1-n^{-1})t_m/\alpha$. Then $\hat h_{EGCV}$ in (4.9) can be rewritten as
$$\hat h_{EGCV} = \begin{cases} \xi_{EGCV,a_*} & (\text{when } \hat\sigma_\infty^2 \le 2(1-n^{-1})t_m/\alpha) \\ t_m & (\text{when } \hat\sigma_\infty^2 > 2(1-n^{-1})t_m/\alpha) \end{cases}.$$

5. Numerical Study

In this section, we compared the performances of the $GC_p$-, GIC- and EGCV-minimization methods with $\alpha = 2$, $2\log\log n$ and $\log n$ by conducting numerical examinations. We generated data from the simulation model $y \sim N_n(X\beta, I_n)$, where $X = (I_n - J_n)X_0\Phi(\rho)^{1/2}$ and $\beta = M^+X'\eta$. Here $X_0$ is an $n \times k$ matrix whose elements are independently and identically distributed according to $U(-1, 1)$, $\Phi(\rho)$ is a $k \times k$ symmetric matrix whose $(i,j)$th element is $\rho^{|i-j|}$, and $\eta$ is an $n$-dimensional vector whose $j$th element is given by
$$\left\{\frac{12n(n-1)}{4n^2+6n-1}\right\}^{1/2}\left\{(-1)^{j-1}\left(1 - \frac{j-1}{n}\right) - \frac{1}{2n}\right\}.$$
The simulation model is the same as that in Yanagihara (2013). Let $\hat\theta$ be the ridge parameters optimized by minimizing an MSC, $\hat y$ be the predictive value of $y$ based on the LSE of $\beta$, and $\hat y_{\hat\theta}$ be the predictive value of $y$ in the GRR based on $\hat\theta$, i.e.,
$$\hat y = (J_n + XM^+X')y, \quad \hat y_{\hat\theta} = (J_n + XM_{\hat\theta}^+X')y.$$
We used the following relative mean square error (MSE) as a measure of the performance of each method:
$$\frac{E\left[(\hat y_{\hat\theta} - X\beta)'(\hat y_{\hat\theta} - X\beta)\right]}{E\left[(\hat y - X\beta)'(\hat y - X\beta)\right]} \times 100\ (\%).$$
Notice that $E[(\hat y - X\beta)'(\hat y - X\beta)] = m+1$ in this case. The above expectation was evaluated by Monte Carlo simulation with 10,000 iterations.
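For reference, a minimal Python sketch of generating one dataset from the simulation design above is given here; the expression for $\eta$ is taken from the text above and a Cholesky factor is used in place of $\Phi(\rho)^{1/2}$, both of which are assumptions, and the seed and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, rho = 50, 25, 0.99

# one draw from the simulation design of Section 5
X0 = rng.uniform(-1.0, 1.0, size=(n, k))
Phi = rho ** np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
X = (X0 - X0.mean(axis=0)) @ np.linalg.cholesky(Phi).T   # (I_n - J_n) X_0 Phi(rho)^{1/2}
j = np.arange(1, n + 1)
eta = np.sqrt(12 * n * (n - 1) / (4 * n**2 + 6 * n - 1)) * ((-1.0)**(j - 1) * (1 - (j - 1) / n) - 1 / (2 * n))
beta = np.linalg.pinv(X.T @ X) @ X.T @ eta                # beta = M^+ X' eta
y = X @ beta + rng.standard_normal(n)                     # y ~ N_n(X beta, I_n)
```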

Figure 1. Relative MSE (%) of the EGCV-, GIC- and $GC_p$-minimization methods plotted against the number of explanatory variables in the case of $n = 50$: (a) $\alpha = 2$, (b) $\alpha = 2\log\log n$, (c) $\alpha = \log n$.

Figure 2. Relative MSE (%) of the EGCV-, GIC- and $GC_p$-minimization methods plotted against the number of explanatory variables in the case of $n = 200$: (a) $\alpha = 2$, (b) $\alpha = 2\log\log n$, (c) $\alpha = \log n$.

Figure 3. Relative MSE (%) of the EGCV-, GIC- and $GC_p$-minimization methods plotted against the number of explanatory variables in the case of $n = 500$: (a) $\alpha = 2$, (b) $\alpha = 2\log\log n$, (c) $\alpha = \log n$.

Figures 1, 2 and 3 show the results when $n = 50$, 200 and 500, respectively, with $\rho = 0.99$ and $k = k_1, \dots, k_{16}$, where
$$k_j = \begin{cases} n(4+j)/10 & (j = 1, \dots, 5) \\ n-2 & (j = 6) \\ n\{1 + (j-6)/10\} & (j = 7, \dots, 16) \end{cases}.$$
In the figures, there were no results for the $GC_p$-minimization method when $k = k_7, \dots, k_{16}$ because the $GC_p$ was not definable when $k \ge k_7$. From the figures, we can see that the EGCV-minimization method with $\alpha = \log n$ performed well in most cases. On the other hand, the performance of the GIC-minimization method was very bad when $k \ge k_7$ because the GIC-minimization method does not shrink the GRRE of $\beta$ towards $0_k$ at all when $k \ge k_7$. The performance of the GCV-minimization method was also bad when $k \ge k_7$ because the GCV-minimization method hardly shrinks the GRRE of $\beta$ towards $0_k$ when $k \ge k_7$. Moreover, as for the number of candidate minimizers, the following results were obtained:
$$\mathrm{mode}\{\#(S_{GIC})\} = 2, \quad \min\{\#(S_{GIC})\} = 2, \quad \max\{\#(S_{GIC})\} = 7,$$
and
$$\mathrm{mode}\{\#(S_{EGCV})\} = \begin{cases} 3 & (k \ge n-1 \text{ and } \alpha = 2\log\log n) \\ 2 & (\text{otherwise}) \end{cases}, \quad \min\{\#(S_{EGCV})\} = 2, \quad \max\{\#(S_{EGCV})\} = \begin{cases} 3 & (k < n-1) \\ 12 & (k \ge n-1) \end{cases}.$$
Hence, our algorithms are efficient because the number of candidate minimizers is small.

Acknowledgment

The authors thank Dr. Shintaro Hashimoto, Mr. Tomoyuki Nakagawa, Mr. Ryoya Oda and Mr. Shota Ochiai, of Hiroshima University, for helpful comments.

References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In 2nd International Symposium on Information Theory (eds. B. N. Petrov & F. Csáki). Akadémiai Kiadó, Budapest.
Atkinson, A. C. (1980). A note on the generalized information criterion for choice of a model. Biometrika, 67.
Craven, P. & Wahba, G. (1979). Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math., 31.
Fujikoshi, Y. & Satoh, K. (1997). Modified AIC and C_p in multivariate linear regression. Biometrika, 84.
Hannan, E. J. & Quinn, B. G. (1979). The determination of the order of an autoregression. J. Roy. Stat. Soc. Ser. B, 41.

Harville, D. A. (1997). Matrix Algebra from a Statistician's Perspective. Springer-Verlag, New York.
Hoerl, A. E. & Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12.
Kullback, S. & Leibler, R. A. (1951). On information and sufficiency. Ann. Math. Statistics, 22.
Lawless, J. F. (1981). Mean squared error properties of generalized ridge regression. J. Amer. Statist. Assoc., 76.
Liu, Z., Shen, Y. & Ott, J. (2011). Multilocus association mapping using generalized ridge logistic regression. BMC Bioinformatics, 12.
Mallows, C. L. (1973). Some comments on C_p. Technometrics, 15.
Nagai, I., Yanagihara, H. & Satoh, K. (2012). Optimization of ridge parameters in multivariate generalized ridge regression by plug-in methods. Hiroshima Math. J., 42.
Nishii, R. (1984). Asymptotic properties of criteria for selection of variables in multiple regression. Ann. Statist., 12.
Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist., 6.
Shen, X., Alam, M., Fikse, F. & Rönnegård, L. (2013). A novel generalized ridge regression method for quantitative genetics. Genetics, 193.
Walker, S. G. & Page, C. J. (2001). Generalized ridge regression and a generalization of the C_p statistic. J. Appl. Statist., 28.
Yanagihara, H. (2013). Explicit solution to the minimization problem of generalized cross-validation criterion for selecting ridge parameters in generalized ridge regression. Technical Report, Hiroshima Statistical Research Group, Hiroshima University.
Yanagihara, H., Nagai, I. & Satoh, K. (2009). A bias-corrected C_p criterion for optimizing ridge parameters in multivariate generalized ridge regression. Japanese J. Appl. Statist., 38 (in Japanese).
Ye, J. (1998). On measuring and correcting the effects of data mining and model selection. J. Amer. Statist. Assoc., 93.

Appendix

A.1. Proof of Lemma 1

Let $\delta = (\delta_1, \dots, \delta_m)'$ be an $m$-dimensional vector whose $j$th element $\delta_j \in [0, 1]$ ($j = 1, \dots, m$) is defined by
$$\delta_j = \frac{\theta_j}{d_j + \theta_j},$$
where $\theta_1, \dots, \theta_m$ are ridge parameters, and $d_j$ is given by (2.1). Since the map $\theta \mapsto \delta$ is one-to-one, we minimize the MSC via $\delta$ instead of $\theta$. By using $\delta$, we rewrite $\hat\sigma^2(\theta)$ and $\mathrm{df}(\theta)$ in (2.7) as functions of $\delta$ as
$$\hat\sigma^2(\theta) = r(\delta) = \frac{1}{n}\left(n\hat\sigma_0^2 + \sum_{j=1}^m\delta_j^2z_j^2\right), \quad \mathrm{df}(\theta) = u(\delta) = 1 + m - \sum_{j=1}^m\delta_j, \quad (A.1)$$
where $z_j^2$ and $\hat\sigma_0^2$ are given by (2.6) and (2.8), respectively. By using $r(\delta)$ and $u(\delta)$, we rewrite $\mathrm{MSC}(\theta)$ in (2.10) as a function of $\delta$ as
$$\mathrm{MSC}(\theta) = g(\delta) = f(r(\delta), u(\delta)). \quad (A.2)$$

Since $\mathcal{D} \subseteq (0, \hat\sigma_\infty^2] \times [1, n)$, we know that $\delta \neq 0_m$. From the assumptions on $f$, we have $f_r(r(\delta), u(\delta)) > 0$ and $f_u(r(\delta), u(\delta)) > 0$, where $f_r(r,u)$ and $f_u(r,u)$ are the first partial derivatives of $f$ with respect to $r$ and $u$, respectively. Let
$$\tau(\delta) = \frac{nf_u(r(\delta), u(\delta))}{2f_r(r(\delta), u(\delta))}.$$
It is clear that $\tau(\delta) > 0$ when $\mathcal{D} \subseteq (0, \hat\sigma_\infty^2] \times [1, n)$. Notice that
$$\frac{\partial}{\partial\delta_j}g(\delta) = \frac{\partial r(\delta)}{\partial\delta_j}\left.\frac{\partial f(r,u)}{\partial r}\right|_{(r,u)=(r(\delta),u(\delta))} + \frac{\partial u(\delta)}{\partial\delta_j}\left.\frac{\partial f(r,u)}{\partial u}\right|_{(r,u)=(r(\delta),u(\delta))} = \frac{1}{n}2z_j^2\delta_jf_r(r(\delta), u(\delta)) - f_u(r(\delta), u(\delta)) = \frac{2}{n}z_j^2f_r(r(\delta), u(\delta))\left\{\delta_j - \frac{\tau(\delta)}{z_j^2}\right\}. \quad (A.3)$$
Let $\delta^* = (\delta_1^*, \dots, \delta_m^*)'$ be the minimizer of $g(\delta)$, i.e.,
$$\delta^* = \arg\min_{\delta \in [0,1]^m\backslash\{0_m\}}g(\delta),$$
where $[0,1]^m$ is the $m$th Cartesian power of the set $[0,1]$. From (A.3), a necessary condition for $\delta^*$ is given by
$$\delta_j^* = \begin{cases} \tau(\delta^*)/z_j^2 & (\tau(\delta^*) \le z_j^2) \\ 1 & (\tau(\delta^*) > z_j^2) \end{cases} \quad (j = 1, \dots, m).$$
Let $G$ be a set defined by
$$G = \left\{\delta = (\delta_1, \dots, \delta_m)' \in [0,1]^m \mid \delta = \hat\delta(h),\ h \in R_+\backslash\{0\}\right\},$$
where $\hat\delta(h)$ is the $m$-dimensional vector whose $j$th element is defined by
$$\hat\delta_j(h) = \begin{cases} h/z_j^2 & (h \le z_j^2) \\ 1 & (h > z_j^2) \end{cases} \quad (j = 1, \dots, m). \quad (A.4)$$
It follows from the fact $G \subseteq [0,1]^m\backslash\{0_m\}$ that
$$g(\delta^*) = \min_{\delta \in [0,1]^m\backslash\{0_m\}}g(\delta) \le \min_{\delta \in G}g(\delta) = \min_{h \in R_+\backslash\{0\}}g(\hat\delta(h)).$$
Moreover, it is clear that $\delta^* \in G$. This implies that
$$g(\delta^*) \ge \min_{h \in R_+\backslash\{0\}}g(\hat\delta(h)).$$

Therefore, we have $\delta^* = \hat\delta(\hat h)$, where
$$\hat h = \arg\min_{h \in R_+\backslash\{0\}}g(\hat\delta(h)).$$
Recall that $g(\delta) = \mathrm{MSC}(\theta)$ and
$$\theta_j = \begin{cases} \dfrac{d_j\delta_j}{1-\delta_j} & (\text{if } \delta_j \neq 1) \\ \infty & (\text{if } \delta_j = 1) \end{cases}.$$
Moreover, $\hat\beta_\theta$ is rewritten as
$$\hat\beta_\theta = (X'X + Q\Theta Q')^+X'y = Q(D + \Theta)^+Q'X'y = Q(D + \Theta)^+DQ'\,QD^+Q'X'y = QVQ'\hat\beta,$$
where $V = (D + \Theta)^+D$, $D$ and $\Theta$ are here regarded as $k \times k$ diagonal matrices padded with zeros, and $\hat\beta$ is the LSE of $\beta$ given by (2.3). For $j = 1, \dots, m$, the $j$th diagonal element of $V$ is given by
$$v_j = \begin{cases} \dfrac{d_j}{d_j + \theta_j} & (\text{when } \theta_j < \infty) \\ 0 & (\text{when } \theta_j = \infty) \end{cases}.$$
It follows from a simple calculation that $d_j/\{d_j + \hat\theta_j(h)\} = 1 - h/z_j^2$ when $h \le z_j^2$. Consequently, Lemma 1 is proved.

A.2. Proof of Lemma 2

From (A.2), we can see that
$$\phi(h) = \mathrm{MSC}(\hat\theta(h)) = f(r(\hat\delta(h)), u(\hat\delta(h))),$$
where $\hat\theta(h)$ is given by (2.12), $\hat\delta(h)$ is given by (A.4), and $r(\delta)$ and $u(\delta)$ are given by (A.1). Since $t_1, \dots, t_m$ are the order statistics of $z_1^2, \dots, z_m^2$, we have
$$r(\hat\delta(h)) = \hat\sigma_0^2 + \frac{1}{n}\sum_{j=1}^m\left[I(z_j^2 \ge h)\frac{h^2}{z_j^2} + \{1 - I(z_j^2 \ge h)\}z_j^2\right] = \hat\sigma_0^2 + \frac{1}{n}\sum_{j=1}^m\left[I(t_j \ge h)\left\{\left(\frac{h}{t_j}\right)^2 - 1\right\} + 1\right]t_j,$$
$$u(\hat\delta(h)) = 1 + m - \sum_{j=1}^m\left[I(z_j^2 \ge h)\frac{h}{z_j^2} + \{1 - I(z_j^2 \ge h)\}\right] = 1 + m - \sum_{j=1}^m\left\{I(t_j \ge h)\left(\frac{h}{t_j} - 1\right) + 1\right\},$$
where $I(x \ge h)$ is an indicator function, i.e., $I(x \ge h) = 1$ if $x \ge h$ and $I(x \ge h) = 0$ if $x < h$. It is clear that $r(\hat\delta(h))$ and $u(\hat\delta(h))$ are continuous functions of $h \in R_+\backslash\{0\}$. Therefore, property (P1) is proved. Notice that when $h \in R_a$, we have

$$r(\hat\delta(h)) = r_a(\hat\delta(h)) = \hat\sigma_0^2 + \frac{1}{n}\left(\sum_{j=1}^at_j + h^2\sum_{j=a+1}^m\frac{1}{t_j}\right) = \hat\sigma_0^2 + \frac{1}{n}(c_{1,a} + h^2c_{2,a}),$$
$$u(\hat\delta(h)) = u_a(\hat\delta(h)) = 1 + m - \left(\sum_{j=1}^a1 + h\sum_{j=a+1}^m\frac{1}{t_j}\right) = 1 + m - a - hc_{2,a},$$
where $c_{1,a}$ and $c_{2,a}$ are given by (2.16). Property (P3) is proved by these equations. Notice that $c_{2,m} = 0$ and
$$n\hat\sigma_0^2 + c_{1,m} = n\hat\sigma_0^2 + z'z = y'(I_n - J_n - XM^+X')y + y'XM^+X'y = n\hat\sigma_\infty^2.$$
These imply that $r_m(\hat\delta(h)) = \hat\sigma_\infty^2$ and $u_m(\hat\delta(h)) = 1$ when $h \in R_m$. Hence, property (P2) is proved.

A.3. Proof of Theorem 2

First, we derive the minimizer of $GIC(\theta)$ in (3.1) when $\hat\sigma_0^2$ in (2.8) is 0. It is easy to see that $\hat\sigma^2(0_m) = 0$ when $\hat\sigma_0^2 = 0$, and that $f_{GIC}(0, u) = 0$ holds for any $u \in [1, n]$, where $\hat\sigma^2(\theta)$ is given by (2.4) and $f_{GIC}(r,u)$ is given by (2.11). Moreover, $f_{GIC}(r,u) > 0$ holds for any $(r,u) \in (0, \hat\sigma_\infty^2] \times [1, n]$, where $\hat\sigma_\infty^2$ is given by (2.9). Notice that $\hat\sigma^2(\theta) = 0$ holds if and only if $\theta = 0_m$ and $\hat\sigma_0^2 = 0$, and that $\hat\theta(0) = 0_m$ holds, where $\hat\theta(h)$ is given by (2.12). These results imply that the minimizer of $GIC(\theta)$ is $\hat\theta(0)$ when $\hat\sigma_0^2 = 0$.

Next, we derive the minimizer of $GIC(\theta)$ when $\hat\sigma_0^2 \neq 0$. Then the domain of $f_{GIC}$ is included in $(0, \hat\sigma_\infty^2] \times [1, n)$. Moreover,
$$\lim_{h\to0}\psi_{GIC,0}(h) = -\alpha n\hat\sigma_0^2 < 0,$$
where $\psi_{GIC,a}(h)$ is given by (3.4). This indicates that some positive number $\xi$ exists such that $\phi_{GIC}(\xi) < \lim_{h\to0}\phi_{GIC}(h)$, where $\phi_{GIC}(h)$ is given by (3.2). Therefore, by using Lemma 1, the minimizer of $GIC(\theta)$ is given by $\hat\theta(\hat h_{GIC})$, where $\hat h_{GIC}$ is the minimizer of $\phi_{GIC}(h)$, i.e.,
$$\hat h_{GIC} = \arg\min_{h \in R_+\backslash\{0\}}\phi_{GIC}(h).$$
Hence, in order to prove Theorem 2 when $\hat\sigma_0^2 \neq 0$, it is enough to show $\hat h_{GIC} \in S_{GIC}$, where $S_{GIC}$ is the set of candidate minimizers of $\phi_{GIC}(h)$ given by (3.7). From (P1) and (P2) in Lemma 2, we can see that $\phi_{GIC}(h)$ is continuous at any $h \in R_+\backslash\{0\}$ and that $\phi_{GIC}(h) = \hat\sigma_\infty^2\exp(\alpha/n)$ when $h \in R_m$. The function $\psi_{GIC}(h)$ in (3.3) is also continuous at any $h \in (0, t_m]$ because
$$\psi_{GIC,a}(t_{a+1}) = -\alpha c_{2,a}t_{a+1}^2 + 2nt_{a+1} - \alpha(n\hat\sigma_0^2 + c_{1,a}) = -\alpha\left(c_{2,a} - \frac{1}{t_{a+1}}\right)t_{a+1}^2 + 2nt_{a+1} - \alpha(n\hat\sigma_0^2 + c_{1,a} + t_{a+1}) = -\alpha c_{2,a+1}t_{a+1}^2 + 2nt_{a+1} - \alpha(n\hat\sigma_0^2 + c_{1,a+1}) = \psi_{GIC,a+1}(t_{a+1}) \quad (a = 0, \dots, m-2),$$

where $c_{1,a}$ and $c_{2,a}$ are given by (2.16). It should be emphasized that the sign of $\partial\phi_{GIC,a}(h)/\partial h$ is equal to that of $\psi_{GIC,a}(h)$. Hence, we have the following necessary condition for $\hat h_{GIC}$:
$$\left[\{\psi_{GIC}(\hat h_{GIC}) = 0\} \cap \{\exists\epsilon_0 > 0 \ \text{s.t.}\ \forall\epsilon \in (0, \epsilon_0),\ \psi_{GIC}(\hat h_{GIC} - \epsilon) < 0\}\right] \cup \{\hat h_{GIC} = t_m \ \text{s.t.}\ \psi_{GIC,m-1}(t_m) < 0\}. \quad (A.5)$$
Notice that
$$\psi_{GIC,m-1}(t_m) = -\frac{\alpha}{t_m}t_m^2 + 2nt_m - \alpha(n\hat\sigma_\infty^2 - t_m) = 2nt_m - \alpha n\hat\sigma_\infty^2.$$
Hence, we derive
$$\psi_{GIC,m-1}(t_m) < 0 \iff \hat\sigma_\infty^2 > \frac{2}{\alpha}t_m. \quad (A.6)$$
Recall that $\psi_{GIC,a}(h)$ is a concave quadratic function. Hence, we have
$$\{\psi_{GIC}(\hat h_{GIC}) = 0\} \cap \{\exists\epsilon_0 > 0 \ \text{s.t.}\ \forall\epsilon \in (0, \epsilon_0),\ \psi_{GIC}(\hat h_{GIC} - \epsilon) < 0\} \iff \{\hat h_{GIC} \text{ is the smaller of the two real distinct roots or the double root of } \psi_{GIC,a}(h) = 0 \text{ which is included in } R_a\ (a = 0, 1, \dots, m-1)\}. \quad (A.7)$$
Notice that $\xi_{GIC,a}$ in (3.5) is the smaller of the two real distinct roots or the double root of $\psi_{GIC,a}(h) = 0$ when $\alpha^2c_{2,a}(n\hat\sigma_0^2 + c_{1,a}) \le n^2$. Therefore, from (A.5), (A.6) and (A.7), $\hat h_{GIC} \in S_{GIC}$ can be shown. Consequently, Theorem 2 is proved.

A.4. Proof of Corollary 1

It follows from (3.4) that
$$\psi_{GIC,a}(h) = -\alpha c_{2,a}\left(h - \frac{n}{\alpha c_{2,a}}\right)^2 + \frac{n^2}{\alpha c_{2,a}} - \alpha(n\hat\sigma_0^2 + c_{1,a}),$$
where $\hat\sigma_0^2$ is given by (2.8), and $c_{1,a}$ and $c_{2,a}$ are given by (2.16). This indicates that the $h$-coordinate of the vertex of $\psi_{GIC,a}(h)$ is $n/(\alpha c_{2,a})$. It follows from the inequalities $t_1 \le \cdots \le t_m$ that for any $a \in \{0, 1, \dots, m-1\}$,
$$c_{2,a} = \sum_{j=a+1}^m\frac{1}{t_j} \le \sum_{j=a+1}^m\frac{1}{t_{a+1}} = \frac{m-a}{t_{a+1}} \le \frac{m}{t_{a+1}}.$$
Hence, we have
$$\frac{n}{\alpha c_{2,a}} \ge \frac{n}{\alpha m}t_{a+1} \quad (a = 0, 1, \dots, m-1). \quad (A.8)$$
From this result, we can see that $n/(\alpha c_{2,a}) \ge t_{a+1}$ if $\alpha \le n/m$. This indicates that $\psi_{GIC,a}(h)$ is a strictly increasing function of $h \in R_a$ in (2.15) if $\alpha \le n/m$, because the $h$-coordinate of the vertex of $\psi_{GIC,a}(h)$ is larger than or equal to $t_{a+1}$, which is the right end point of $R_a$.

Hence, it should be emphasized that $\psi_{GIC}(h)$ is a strictly increasing function of $h \in (0, t_m]$ when $\alpha \le n/m$. Recall that $\psi_{GIC}(h)$ is continuous at any $h \in (0, t_m]$ and that $\lim_{h\to0}\psi_{GIC,0}(h) < 0$ when $\hat\sigma_0^2 \neq 0$, where $\hat\sigma_0^2$ is given by (2.8). Therefore, when $\hat\sigma_0^2 \neq 0$ and $\alpha \le n/m$, we have
$$\psi_{GIC,m-1}(t_m) \ge 0 \implies \text{the unique solution of } \psi_{GIC}(h) = 0 \text{ exists in } h \in (0, t_m] \implies \{\#(A_{GIC}) = 1\} \cap \{T_{GIC} = \emptyset\},$$
$$\psi_{GIC,m-1}(t_m) < 0 \implies \text{there is no solution of } \psi_{GIC}(h) = 0 \text{ in } h \in (0, t_m] \implies \{A_{GIC} = \emptyset\} \cap \{T_{GIC} = \{t_m\}\},$$
where $A_{GIC}$ and $T_{GIC}$ are given by (3.6). From the above results and (A.6), Corollary 1 can be proved.

A.5. Proof of Theorem 3

First, we derive the minimizer of $\mathrm{EGCV}(\theta)$ in (4.2) when $\hat\sigma_0^2 = 0$ and $b \neq 0$, where $\hat\sigma_0^2$ and $b$ are given by (2.8) and (4.5), respectively. It is easy to see that $\hat\sigma^2(0_m) = 0$ when $\hat\sigma_0^2 = 0$, that $\mathrm{df}(\theta) < n$ when $b \neq 0$, and that $f_{EGCV}(0, u) = 0$ holds for any $u \in [1, n)$, where $\hat\sigma^2(\theta)$ is given by (2.4) and $f_{EGCV}(r,u)$ is given by (4.1). Moreover, $f_{EGCV}(r,u) > 0$ holds for any $(r,u) \in (0, \hat\sigma_\infty^2] \times [1, n)$, where $\hat\sigma_\infty^2$ is given by (2.9). Notice that $\hat\sigma^2(\theta) = 0$ holds if and only if $\theta = 0_m$ and $\hat\sigma_0^2 = 0$, and that $\hat\theta(0) = 0_m$ holds, where $\hat\theta(h)$ is given by (2.12). These results imply that the minimizer of $\mathrm{EGCV}(\theta)$ is $\hat\theta(0)$ when $\hat\sigma_0^2 = 0$ and $b \neq 0$.

Next, we consider the case where $\hat\sigma_0^2 \neq 0$ or $b = 0$. Since $u < n$ is required in $f_{EGCV}(r,u)$, it is necessary to satisfy the inequality $\mathrm{df}(\theta) < n$, where $\mathrm{df}(\theta)$ is given by (2.5). From (2.7), we can see that when $b = 0$, $\mathrm{df}(\theta) < n$ implies $\theta \neq 0_m$ and hence $\hat\sigma^2(\theta) \neq 0$ (when $\hat\sigma_0^2 \neq 0$, we have $\hat\sigma^2(\theta) \ge \hat\sigma_0^2 > 0$ directly). This indicates that the domain of $f_{EGCV}$ is included in $(0, \hat\sigma_\infty^2] \times [1, n)$. Notice that
$$\phi_{EGCV,0}(h) = \frac{\hat\sigma_0^2 + h^2c_{2,0}/n}{(b + hc_{2,0}/n)^\alpha}, \quad \psi_{EGCV,0}(h) = -(\alpha-2)c_{2,0}h^2 + 2nbh - \alpha n\hat\sigma_0^2,$$
where $\phi_{EGCV,a}(h)$, $\psi_{EGCV,a}(h)$ and $c_{2,a}$ are given by (4.4), (4.7) and (2.16), respectively. Recall that $b = 0$ implies $\hat\sigma_0^2 = 0$. It follows from the above equations and the assumption $\alpha > 2$ that
$$\lim_{h\to0}\phi_{EGCV,0}(h) = \begin{cases} \hat\sigma_0^2/b^\alpha & (\hat\sigma_0^2 \neq 0) \\ \infty & (b = 0) \end{cases}, \quad \lim_{h\to0}\psi_{EGCV,0}(h) = \begin{cases} -\alpha n\hat\sigma_0^2 < 0 & (\hat\sigma_0^2 \neq 0) \\ 0 & (b = 0) \end{cases}.$$
These results imply that some positive number $\xi$ exists such that $\phi_{EGCV}(\xi) < \lim_{h\to0}\phi_{EGCV}(h)$ when $\hat\sigma_0^2 \neq 0$ or $b = 0$, where $\phi_{EGCV}(h)$ is given by (4.3). Therefore, by using Lemma 1, the minimizer of $\mathrm{EGCV}(\theta)$ is given by $\hat\theta(\hat h_{EGCV})$, where $\hat h_{EGCV}$ is the minimizer of $\phi_{EGCV}(h)$, i.e.,
$$\hat h_{EGCV} = \arg\min_{h \in R_+\backslash\{0\}}\phi_{EGCV}(h).$$

From (P1) and (P2) in Lemma 2, we can see that $\phi_{EGCV}(h)$ is continuous at any $h \in R_+\backslash\{0\}$ and that $\phi_{EGCV}(h) = \hat\sigma_\infty^2/(1-n^{-1})^\alpha$ when $h \in R_m$, where $\hat\sigma_\infty^2$ is given by (2.9). The function $\psi_{EGCV}(h)$ given by (4.6) is also continuous at any $h \in (0, t_m]$ because, using $c_{2,a} = c_{2,a+1} + 1/t_{a+1}$ and $c_{1,a} = c_{1,a+1} - t_{a+1}$,
$$\psi_{EGCV,a}(t_{a+1}) = -(\alpha-2)c_{2,a}t_{a+1}^2 + 2(a + nb)t_{a+1} - \alpha(n\hat\sigma_0^2 + c_{1,a}) = -(\alpha-2)c_{2,a+1}t_{a+1}^2 + 2(a + 1 + nb)t_{a+1} - \alpha(n\hat\sigma_0^2 + c_{1,a+1}) = \psi_{EGCV,a+1}(t_{a+1}) \quad (a = 0, \dots, m-2),$$
where $c_{1,a}$ is given by (2.16). Notice that
$$\psi_{EGCV,m-1}(t_m) = -\frac{\alpha-2}{t_m}t_m^2 + 2(n-2)t_m - \alpha(n\hat\sigma_\infty^2 - t_m) = 2(n-1)t_m - \alpha n\hat\sigma_\infty^2.$$
Hence, in the same way as in the proof of Theorem 2, we can show that the set of candidate minimizers of $\phi_{EGCV}(h)$ is given by $S_{EGCV}$ in (4.8). Consequently, Theorem 3 is proved.

A.6. Proof of Corollary 2

It follows from (4.7) that
$$\psi_{EGCV,a}(h) = -(\alpha-2)c_{2,a}\left\{h - \frac{a+nb}{(\alpha-2)c_{2,a}}\right\}^2 + \frac{(a+nb)^2}{(\alpha-2)c_{2,a}} - \alpha(n\hat\sigma_0^2 + c_{1,a}),$$
where $\hat\sigma_0^2$ and $b$ are given by (2.8) and (4.5), respectively, and $c_{1,a}$ and $c_{2,a}$ are given by (2.16). This indicates that the $h$-coordinate of the vertex of $\psi_{EGCV,a}(h)$ is $(a+nb)/\{(\alpha-2)c_{2,a}\}$. From (A.8), we derive
$$\frac{a+nb}{(\alpha-2)c_{2,a}} \ge \frac{a+nb}{(\alpha-2)m}t_{a+1} \ge \frac{n-m-1}{(\alpha-2)m}t_{a+1} \quad (a = 0, 1, \dots, m-1).$$
From this result, we can see that $(a+nb)/\{(\alpha-2)c_{2,a}\} \ge t_{a+1}$ if $\alpha \le (n+m-1)/m$. This indicates that $\psi_{EGCV,a}(h)$ is a strictly increasing function of $h \in R_a$ in (2.15) if $\alpha \le (n+m-1)/m$, because the $h$-coordinate of the vertex of $\psi_{EGCV,a}(h)$ is larger than or equal to $t_{a+1}$, which is the right end point of $R_a$. It should be emphasized that $\psi_{EGCV}(h)$ is a strictly increasing function of $h \in (0, t_m]$ when $\alpha \le (n+m-1)/m$. Hence, in the same way as in the proof of Corollary 1, we can prove Corollary 2.


More information

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS020) p.3863 Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Jinfang Wang and

More information

Regression I: Mean Squared Error and Measuring Quality of Fit

Regression I: Mean Squared Error and Measuring Quality of Fit Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving

More information

Business Statistics. Tommaso Proietti. Model Evaluation and Selection. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Model Evaluation and Selection. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Model Evaluation and Selection Predictive Ability of a Model: Denition and Estimation We aim at achieving a balance between parsimony

More information

APPLICATION OF RIDGE REGRESSION TO MULTICOLLINEAR DATA

APPLICATION OF RIDGE REGRESSION TO MULTICOLLINEAR DATA Journal of Research (Science), Bahauddin Zakariya University, Multan, Pakistan. Vol.15, No.1, June 2004, pp. 97-106 ISSN 1021-1012 APPLICATION OF RIDGE REGRESSION TO MULTICOLLINEAR DATA G. R. Pasha 1 and

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Variable selection in linear regression through adaptive penalty selection

Variable selection in linear regression through adaptive penalty selection PREPRINT 國立臺灣大學數學系預印本 Department of Mathematics, National Taiwan University www.math.ntu.edu.tw/~mathlib/preprint/2012-04.pdf Variable selection in linear regression through adaptive penalty selection

More information

Nonparametric Regression. Badr Missaoui

Nonparametric Regression. Badr Missaoui Badr Missaoui Outline Kernel and local polynomial regression. Penalized regression. We are given n pairs of observations (X 1, Y 1 ),...,(X n, Y n ) where Y i = r(x i ) + ε i, i = 1,..., n and r(x) = E(Y

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Ridge Estimation and its Modifications for Linear Regression with Deterministic or Stochastic Predictors

Ridge Estimation and its Modifications for Linear Regression with Deterministic or Stochastic Predictors Ridge Estimation and its Modifications for Linear Regression with Deterministic or Stochastic Predictors James Younker Thesis submitted to the Faculty of Graduate and Postdoctoral Studies in partial fulfillment

More information

Estimation of parametric functions in Downton s bivariate exponential distribution

Estimation of parametric functions in Downton s bivariate exponential distribution Estimation of parametric functions in Downton s bivariate exponential distribution George Iliopoulos Department of Mathematics University of the Aegean 83200 Karlovasi, Samos, Greece e-mail: geh@aegean.gr

More information

On the equivalence of confidence interval estimation based on frequentist model averaging and least-squares of the full model in linear regression

On the equivalence of confidence interval estimation based on frequentist model averaging and least-squares of the full model in linear regression Working Paper 2016:1 Department of Statistics On the equivalence of confidence interval estimation based on frequentist model averaging and least-squares of the full model in linear regression Sebastian

More information

Theorems. Least squares regression

Theorems. Least squares regression Theorems In this assignment we are trying to classify AML and ALL samples by use of penalized logistic regression. Before we indulge on the adventure of classification we should first explain the most

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

Regularization: Ridge Regression and the LASSO

Regularization: Ridge Regression and the LASSO Agenda Wednesday, November 29, 2006 Agenda Agenda 1 The Bias-Variance Tradeoff 2 Ridge Regression Solution to the l 2 problem Data Augmentation Approach Bayesian Interpretation The SVD and Ridge Regression

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Institute of Statistics and Econometrics Georg-August-University Göttingen Department of Statistics

More information

2.2 Classical Regression in the Time Series Context

2.2 Classical Regression in the Time Series Context 48 2 Time Series Regression and Exploratory Data Analysis context, and therefore we include some material on transformations and other techniques useful in exploratory data analysis. 2.2 Classical Regression

More information

An Information Criteria for Order-restricted Inference

An Information Criteria for Order-restricted Inference An Information Criteria for Order-restricted Inference Nan Lin a, Tianqing Liu 2,b, and Baoxue Zhang,2,b a Department of Mathematics, Washington University in Saint Louis, Saint Louis, MO 633, U.S.A. b

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

The Multiple Regression Model Estimation

The Multiple Regression Model Estimation Lesson 5 The Multiple Regression Model Estimation Pilar González and Susan Orbe Dpt Applied Econometrics III (Econometrics and Statistics) Pilar González and Susan Orbe OCW 2014 Lesson 5 Regression model:

More information

High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis. Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi

High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis. Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi Faculty of Science and Engineering, Chuo University, Kasuga, Bunkyo-ku,

More information

Discrepancy-Based Model Selection Criteria Using Cross Validation

Discrepancy-Based Model Selection Criteria Using Cross Validation 33 Discrepancy-Based Model Selection Criteria Using Cross Validation Joseph E. Cavanaugh, Simon L. Davies, and Andrew A. Neath Department of Biostatistics, The University of Iowa Pfizer Global Research

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

High-dimensional regression

High-dimensional regression High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and

More information

MEANINGFUL REGRESSION COEFFICIENTS BUILT BY DATA GRADIENTS

MEANINGFUL REGRESSION COEFFICIENTS BUILT BY DATA GRADIENTS Advances in Adaptive Data Analysis Vol. 2, No. 4 (2010) 451 462 c World Scientific Publishing Company DOI: 10.1142/S1793536910000574 MEANINGFUL REGRESSION COEFFICIENTS BUILT BY DATA GRADIENTS STAN LIPOVETSKY

More information

Econometrics I, Estimation

Econometrics I, Estimation Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of

More information

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population

More information

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods Chapter 4 Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods 4.1 Introduction It is now explicable that ridge regression estimator (here we take ordinary ridge estimator (ORE)

More information

Order Selection for Vector Autoregressive Models

Order Selection for Vector Autoregressive Models IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 51, NO. 2, FEBRUARY 2003 427 Order Selection for Vector Autoregressive Models Stijn de Waele and Piet M. T. Broersen Abstract Order-selection criteria for vector

More information

Introduction to Statistical modeling: handout for Math 489/583

Introduction to Statistical modeling: handout for Math 489/583 Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 6: Model complexity scores (v3) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 34 Estimating prediction error 2 / 34 Estimating prediction error We saw how we can estimate

More information

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor - p. 1/15 Today s class Bias-Variance tradeoff. Penalized regression. Cross-validation. - p. 2/15 Bias-variance

More information

Holzmann, Min, Czado: Validating linear restrictions in linear regression models with general error structure

Holzmann, Min, Czado: Validating linear restrictions in linear regression models with general error structure Holzmann, Min, Czado: Validating linear restrictions in linear regression models with general error structure Sonderforschungsbereich 386, Paper 478 (2006) Online unter: http://epub.ub.uni-muenchen.de/

More information

Optimum designs for model. discrimination and estimation. in Binary Response Models

Optimum designs for model. discrimination and estimation. in Binary Response Models Optimum designs for model discrimination and estimation in Binary Response Models by Wei-Shan Hsieh Advisor Mong-Na Lo Huang Department of Applied Mathematics National Sun Yat-sen University Kaohsiung,

More information

Föreläsning /31

Föreläsning /31 1/31 Föreläsning 10 090420 Chapter 13 Econometric Modeling: Model Speci cation and Diagnostic testing 2/31 Types of speci cation errors Consider the following models: Y i = β 1 + β 2 X i + β 3 X 2 i +

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 05 Full points may be obtained for correct answers to eight questions Each numbered question (which may have several parts) is worth

More information

Estimating prediction error in mixed models

Estimating prediction error in mixed models Estimating prediction error in mixed models benjamin saefken, thomas kneib georg-august university goettingen sonja greven ludwig-maximilians-university munich 1 / 12 GLMM - Generalized linear mixed models

More information

Large Sample Theory For OLS Variable Selection Estimators

Large Sample Theory For OLS Variable Selection Estimators Large Sample Theory For OLS Variable Selection Estimators Lasanthi C. R. Pelawa Watagoda and David J. Olive Southern Illinois University June 18, 2018 Abstract This paper gives large sample theory for

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

Empirical Economic Research, Part II

Empirical Economic Research, Part II Based on the text book by Ramanathan: Introductory Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 7, 2011 Outline Introduction

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results

More information

Tuning parameter selectors for the smoothly clipped absolute deviation method

Tuning parameter selectors for the smoothly clipped absolute deviation method Tuning parameter selectors for the smoothly clipped absolute deviation method Hansheng Wang Guanghua School of Management, Peking University, Beijing, China, 100871 hansheng@gsm.pku.edu.cn Runze Li Department

More information

Comparison of Some Improved Estimators for Linear Regression Model under Different Conditions

Comparison of Some Improved Estimators for Linear Regression Model under Different Conditions Florida International University FIU Digital Commons FIU Electronic Theses and Dissertations University Graduate School 3-24-2015 Comparison of Some Improved Estimators for Linear Regression Model under

More information

Key Words: first-order stationary autoregressive process, autocorrelation, entropy loss function,

Key Words: first-order stationary autoregressive process, autocorrelation, entropy loss function, ESTIMATING AR(1) WITH ENTROPY LOSS FUNCTION Małgorzata Schroeder 1 Institute of Mathematics University of Białystok Akademicka, 15-67 Białystok, Poland mszuli@mathuwbedupl Ryszard Zieliński Department

More information

AR-order estimation by testing sets using the Modified Information Criterion

AR-order estimation by testing sets using the Modified Information Criterion AR-order estimation by testing sets using the Modified Information Criterion Rudy Moddemeijer 14th March 2006 Abstract The Modified Information Criterion (MIC) is an Akaike-like criterion which allows

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Model selection using penalty function criteria

Model selection using penalty function criteria Model selection using penalty function criteria Laimonis Kavalieris University of Otago Dunedin, New Zealand Econometrics, Time Series Analysis, and Systems Theory Wien, June 18 20 Outline Classes of models.

More information

Spatial Process Estimates as Smoothers: A Review

Spatial Process Estimates as Smoothers: A Review Spatial Process Estimates as Smoothers: A Review Soutir Bandyopadhyay 1 Basic Model The observational model considered here has the form Y i = f(x i ) + ɛ i, for 1 i n. (1.1) where Y i is the observed

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information