A Fast Algorithm for Optimizing Ridge Parameters in a Generalized Ridge Regression by Minimizing an Extended GCV Criterion

Mineaki Ohishi, Hirokazu Yanagihara and Yasunori Fujikoshi

Department of Mathematics, Graduate School of Science, Hiroshima University, Kagamiyama, Higashi-Hiroshima, Hiroshima, Japan

Abstract

In this paper, we deal with the optimization of ridge parameters in a generalized ridge regression by minimizing a model selection criterion. Optimization methods based on minimization of a generalized $C_p$ ($GC_p$) criterion and of the GCV criterion have important advantages and major disadvantages. The most important advantage of the two methods is that the optimization is very fast because the minimizers of the two criteria are given in closed form. The most serious disadvantage of the two methods is that they do not work well when the number of explanatory variables is larger than the sample size. In order to remove this disadvantage while maintaining the advantage, we first extend the GCV criterion, calling the result the extended GCV (EGCV), by changing the second power in its denominator to the $\alpha$-power, where $\alpha$ is some positive number larger than 2. Next, we propose a fast optimization algorithm to minimize the EGCV by using a finite set of specific candidate minimizers of the EGCV. By conducting numerical examinations, we verify that the optimization method based on the minimization of the EGCV performs better than those based on the minimization of existing criteria. (Last Modified: April 14, 2017)

Key words: Generalized ridge regression, Generalized cross-validation criterion, Linear regression model, Model selection criterion, Ridge parameters, Optimization of ridge parameters.

E-mail address: mineaki-ohishi@hiroshima-u.ac.jp (Mineaki Ohishi)

1. Introduction

The linear regression model is one of the most important tools for clarifying the relationship between a scalar response variable and one or more explanatory variables. Let $y = (y_1, \dots, y_n)'$ be an $n$-dimensional vector of response variables

and $X$ be an $n \times k$ matrix of nonstochastic centered explanatory variables ($X'1_n = 0_k$) with $\mathrm{rank}(X) = m \le \min\{n-1, k\}$, where $n$ is the sample size, $k$ is the number of explanatory variables, $1_n$ is an $n$-dimensional vector of ones, and $0_k$ is a $k$-dimensional vector of zeros. The equation of the linear regression model is
$$y = \mu 1_n + X\beta + \varepsilon,$$
where $\mu$ is an unknown location parameter, $\beta$ is a $k$-dimensional vector of unknown regression coefficients, and $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)'$ is an $n$-dimensional vector of independent error variables from a distribution with mean $0$ and variance $\sigma^2$. One method for avoiding the serious problem of multicollinearity among explanatory variables is the generalized ridge regression (GRR) proposed by Hoerl and Kennard (1970), which gives a shrinkage estimate of $\beta$ towards $0_k$ via multiple ridge parameters $\theta_1, \dots, \theta_k$, where $\theta_j \in R_+ = \{\theta \in \mathbb{R} \mid \theta \ge 0\}$ ($j = 1, \dots, k$). Although the GRR was proposed 45 years ago, it is still used in many research fields (e.g., Liu et al., 2011; Shen et al., 2013). Since the GRR estimator (GRRE) of $\beta$ changes depending on $\theta_1, \dots, \theta_k$, how to optimize $\theta_1, \dots, \theta_k$ is an important problem. An optimization method based on minimizing a model selection criterion (MSC) with respect to $\theta_1, \dots, \theta_k$ is one of the common approaches (see, e.g., Lawless, 1981; Walker & Page, 2001; Yanagihara, 2013). Most MSCs consist of two terms: a discrepancy term that measures the goodness of fit of a model, and a penalty term that penalizes the complexity of a model. The generalized $C_p$ ($GC_p$) criterion (see Atkinson, 1980; Nagai et al., 2012) and the generalized cross-validation (GCV) criterion (see Craven & Wahba, 1979; Ye, 1998; Yanagihara, 2013) measure the goodness of fit of a model via the residual sum of squares. The optimization method based on minimizing the $GC_p$ criterion (called the $GC_p$-minimization method) has the following advantages and disadvantage:

Advantages of the $GC_p$-minimization method.
(1) The optimization is very fast because the minimizers of the $GC_p$ criterion can be derived in closed form (see Nagai et al., 2012).
(2) The strength of the penalty term can be changed freely.

A disadvantage of the $GC_p$-minimization method.
(1) This method cannot be used when $n$ is smaller than $k$ because the $GC_p$ requires an asymptotically unbiased estimator of $\sigma^2$.

Since we frequently encounter data with a huge number of explanatory variables, i.e., high-dimensional explanatory variables, we have to say that the $GC_p$-minimization method has a serious disadvantage.

Since this serious problem arises from the requirement of an asymptotically unbiased estimator of $\sigma^2$, it can be solved by using a model selection criterion that does not require such an estimator. The GCV criterion is one of the model selection criteria that do not require an asymptotically unbiased estimator of $\sigma^2$. However, the optimization method based on minimizing the GCV criterion (called the GCV-minimization method) also has the following advantages and disadvantage:

Advantages of the GCV-minimization method.
(1) The optimization is very fast because the minimizers of the GCV criterion can be derived in closed form (see Yanagihara, 2013).
(2) This method can be used even when $n$ is smaller than $k$ because the GCV does not require an asymptotically unbiased estimator of $\sigma^2$.

A disadvantage of the GCV-minimization method.
(1) The strength of the penalty term is fixed.

Although the GCV-minimization method can be used when $n$ is smaller than $k$, it does not work well in that case. This is because the GRRE then hardly shrinks towards $0_k$ (see Yanagihara, 2013). This problem would be avoided if the strength of the penalty term in the GCV could be changed freely. On the other hand, the goodness of fit of a model can also be measured by the Kullback-Leibler (KL) discrepancy (Kullback & Leibler, 1951). Hence, we can define a generalized information criterion (GIC; Nishii, 1984) for the GRR under a normal distribution assumption. The optimization method based on minimizing the GIC (called the GIC-minimization method) also has the following advantages and disadvantage:

Advantages of the GIC-minimization method.
(1) This method can be used even when $n$ is smaller than $k$ because the GIC does not require an asymptotically unbiased estimator of $\sigma^2$.
(2) The strength of the penalty on the complexity of a model can be changed freely.

A disadvantage of the GIC-minimization method.
(1) The optimization is slow because minimizers of the GIC must be derived by iterative calculation, e.g., the Newton-Raphson method.

In this paper, we first propose a fast optimization algorithm for the minimization problem of the GIC by using a finite set of specific candidate minimizers of the GIC. Unfortunately, we also show that the GIC-minimization method does not work well when $n$ is smaller than $k$, even if the strength of the penalty term is increased. Therefore, we extend the GCV so that the strength of the penalty term can be changed freely. We call this criterion the extended GCV (EGCV), and the optimization method based on minimizing this criterion the EGCV-minimization method. We also propose a fast optimization algorithm to solve the minimization problem of the EGCV by using a finite set of specific candidate minimizers of the EGCV.

This paper is organized as follows: In Section 2, we describe several results needed for deriving the main theorems. In Section 3, we give a fast optimization algorithm for the GIC-minimization method, and show why the GIC-minimization method does not work well when $k$ is larger than $n$. In Section 4, we propose the EGCV criterion and a fast optimization algorithm to solve the minimization problem of the EGCV. Numerical examinations are conducted in Section 5. Technical details are provided in the Appendix.

2. Preliminaries

It follows from the results in Yanagihara (2013) that the GRRE of $\beta$ and the hat matrix of the GRR are invariant to any changes in $\theta_{m+1}, \dots, \theta_k$ when $m < k$. From this fact, we set $\theta_{m+1} = \cdots = \theta_k = \infty$ to reduce the variance of the GRRE of $\beta$. Therefore, it is enough to minimize an MSC with respect to $\theta_1, \dots, \theta_m$ for optimizing the ridge parameters in the GRR. Let $Q$ be a $k \times k$ orthogonal matrix that diagonalizes $X'X$, i.e.,
$$Q'X'XQ = \begin{pmatrix} D & O_{m,k-m} \\ O_{k-m,m} & O_{k-m,k-m} \end{pmatrix}, \quad D = \mathrm{diag}(d_1, \dots, d_m), \quad (2.1)$$
where $D$ is an $m \times m$ diagonal matrix, $d_1 \ge \cdots \ge d_m$ are the nonzero eigenvalues of $X'X$, and $O_{k,m}$ is a $k \times m$ matrix of zeros. Moreover, let $\theta$ be an $m$-dimensional vector given by $\theta = (\theta_1, \dots, \theta_m)' \in R_+^m$, and let $\Theta$ be a $k \times k$ diagonal matrix given by $\Theta = \mathrm{diag}(\theta_1, \dots, \theta_m, 0, \dots, 0)$, where $R_+^m$ is the $m$th Cartesian power of $R_+$. Notice that $d_1, \dots, d_m$ are positive because $X'X$ is a positive semidefinite matrix. Moreover, let $M_\theta$ be a $k \times k$ matrix defined by $M_\theta = X'X + Q\Theta Q'$. In particular, we write $M_\theta = M$ when $\theta = 0_m$. Then, the GRRE of $\beta$ is expressed as
$$\hat\beta_\theta = M_\theta^+ X'y, \quad (2.2)$$
where $A^+$ denotes the Moore-Penrose inverse of a matrix $A$ (for details of the Moore-Penrose inverse, see, e.g., Harville, 1997, chap. 20).
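To make (2.1) and (2.2) concrete, the following is a minimal Python sketch (our own illustration, not code from the paper; the function name and tolerance are assumptions) that forms $Q$ and $D$ by an eigendecomposition of $X'X$ and evaluates the GRRE for finite ridge parameters. With theta set to all zeros it reduces to the LSE of (2.3) below.

```python
import numpy as np

def grr_estimator(X, y, theta, tol=1e-10):
    """Sketch of the GRRE (2.2): beta_theta = M_theta^+ X'y with
    M_theta = X'X + Q Theta Q', Theta = diag(theta_1,...,theta_m,0,...,0).
    X is assumed column-centered; theta holds the m finite ridge parameters."""
    k = X.shape[1]
    d_all, Q = np.linalg.eigh(X.T @ X)        # eigenvalues in ascending order
    order = np.argsort(d_all)[::-1]           # reorder so that d_1 >= ... >= d_k
    d_all, Q = d_all[order], Q[:, order]
    m = int(np.sum(d_all > tol))              # m = rank(X)
    diag = d_all.copy()
    diag[:m] += theta                         # diagonal of Q'M_theta Q: d_j + theta_j, then zeros
    inv_diag = np.zeros(k)
    inv_diag[:m] = 1.0 / diag[:m]             # Moore-Penrose inverse inverts only the nonzero entries
    return Q @ (inv_diag * (Q.T @ (X.T @ y)))
```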

It is a well-known fact that the least squares estimator (LSE) of $\beta$ is given by
$$\hat\beta = M^+X'y. \quad (2.3)$$
It is clear that (2.2) coincides with (2.3) when $\theta = 0_m$. Let $\hat\mu$ be the LSE of $\mu$ given by $\hat\mu = \bar y$, where $\bar y$ is the sample mean of $y_1, \dots, y_n$, i.e., $\bar y = \sum_{i=1}^n y_i/n$. Equation (2.2) and $\hat\mu$ yield a predictive value of $y$ as
$$\hat y_\theta = \hat\mu 1_n + X\hat\beta_\theta = (J_n + XM_\theta^+X')y,$$
where $J_n$ is an $n \times n$ projection matrix defined by $J_n = 1_n1_n'/n$. By using $M_\theta^+$ and $\hat y_\theta$, we define an estimator of $\sigma^2$ and a generalized degrees of freedom (df) of the GRR, which are the main bodies of the discrepancy and penalty terms of an MSC, respectively, as
$$\hat\sigma^2(\theta) = \frac{1}{n}(y - \hat y_\theta)'(y - \hat y_\theta) = \frac{1}{n}y'(I_n - J_n - XM_\theta^+X')^2y, \quad (2.4)$$
$$\mathrm{df}(\theta) = \mathrm{tr}(J_n + XM_\theta^+X') = 1 + \mathrm{tr}(M_\theta^+M). \quad (2.5)$$
Most MSCs for the GRR are defined by a bivariate function of $\hat\sigma^2(\theta)$ and $\mathrm{df}(\theta)$. Let $z$ be an $m$-dimensional vector defined by
$$z = (z_1, \dots, z_m)' = (D^{-1/2}, O_{m,k-m})Q'X'y = D^{-1/2}Q_1'X'y, \quad (2.6)$$
where $Q_1$ is a $k \times m$ matrix defined by $Q = (Q_1, Q_2)$. Here, we assume that all of $z_1, \dots, z_m$ are not 0. If $z_j$ is accidentally 0, then we set $d_j = 0$ and use $(z_1, \dots, z_{j-1}, z_{j+1}, \dots, z_m)'$ as $z$. By using the result in Yanagihara (2013), $\hat\sigma^2(\theta)$ in (2.4) and $\mathrm{df}(\theta)$ in (2.5) can be rewritten as
$$\hat\sigma^2(\theta) = \frac{1}{n}\left\{n\hat\sigma_0^2 + \sum_{j=1}^m\left(\frac{\theta_j}{d_j+\theta_j}\right)^2z_j^2\right\}, \quad \mathrm{df}(\theta) = 1 + m - \sum_{j=1}^m\frac{\theta_j}{d_j+\theta_j}, \quad (2.7)$$
where $\hat\sigma_0^2$ is the maximum likelihood estimator of $\sigma^2$ under normality, which is given by
$$\hat\sigma_0^2 = \frac{1}{n}y'(I_n - J_n - XM^+X')y. \quad (2.8)$$
It should be kept in mind that $\hat\sigma_0^2 = 0$ holds in most cases of linear regression with high-dimensional explanatory variables. Notice that $\theta_j/(d_j+\theta_j)$ is monotonically increasing in $\theta_j$, that $\hat\sigma^2(0_m) = \hat\sigma_0^2$ and $\mathrm{df}(0_m) = m+1$, and that
$$\lim_{\theta_1,\dots,\theta_m\to\infty}\hat\sigma^2(\theta) = \hat\sigma_\infty^2 = \frac{1}{n}y'(I_n - J_n)y, \quad \lim_{\theta_1,\dots,\theta_m\to\infty}\mathrm{df}(\theta) = 1. \quad (2.9)$$
Hence, the ranges of $\hat\sigma^2(\theta)$ and $\mathrm{df}(\theta)$ are given by $\hat\sigma^2(\theta) \in [\hat\sigma_0^2, \hat\sigma_\infty^2]$ and $\mathrm{df}(\theta) \in [1, m+1]$.
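The quantities in (2.6)-(2.8) are what a fast implementation precomputes. The sketch below (helper names are ours) obtains $d$, $z$ and $\hat\sigma_0^2$ from the eigendecomposition of $X'X$ and then evaluates (2.7) for a finite $\theta$.

```python
import numpy as np

def canonical_quantities(X, y, tol=1e-10):
    """Return d, z and sigma0^2 of (2.1), (2.6) and (2.8) for column-centered X."""
    n = len(y)
    d_all, Q = np.linalg.eigh(X.T @ X)
    order = np.argsort(d_all)[::-1]
    d_all, Q = d_all[order], Q[:, order]
    m = int(np.sum(d_all > tol))
    d, Q1 = d_all[:m], Q[:, :m]
    z = (Q1.T @ X.T @ y) / np.sqrt(d)            # (2.6): z = D^{-1/2} Q_1' X' y
    yc = y - y.mean()                            # (I_n - J_n) y
    sigma0_sq = (yc @ yc - z @ z) / n            # (2.8), using y'X M^+ X'y = z'z
    return d, z, sigma0_sq

def sigma_sq_and_df(theta, d, z, sigma0_sq, n):
    """Evaluate (2.7) for finite ridge parameters theta (length m)."""
    delta = theta / (d + theta)                  # shrinkage factors theta_j/(d_j + theta_j)
    return (n * sigma0_sq + np.sum(delta**2 * z**2)) / n, 1 + len(d) - np.sum(delta)
```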

Let $f(r,u)$ be a bivariate function which satisfies the following assumptions:

(A1) $f(r,u)$ is a continuous function at any $(r,u) \in (0, \hat\sigma_\infty^2] \times [1, n)$.
(A2) $f(r,u) > 0$ for any $(r,u) \in (0, \hat\sigma_\infty^2] \times [1, n)$.
(A3) $f(r,u)$ is first-order partially differentiable at any $(r,u) \in (0, \hat\sigma_\infty^2] \times [1, n)$, and
$$f_r(r,u) = \frac{\partial}{\partial r}f(r,u) > 0, \quad f_u(r,u) = \frac{\partial}{\partial u}f(r,u) > 0, \quad \forall(r,u) \in (0, \hat\sigma_\infty^2] \times [1, n).$$

Let $\mathcal{D}$ be the domain of $f$. It is easy to see that $m = n-1$ implies $\hat\sigma_0^2 = 0$. Recall that $m \le n-1$. Hence, when $\hat\sigma_0^2 \neq 0$, we have $(\hat\sigma^2(\theta), \mathrm{df}(\theta)) \in (0, \hat\sigma_\infty^2] \times [1, n)$. By using the bivariate function $f$, an MSC for the GRR can be expressed as
$$\mathrm{MSC}(\theta) = f(\hat\sigma^2(\theta), \mathrm{df}(\theta)). \quad (2.10)$$
The relationship between the existing criteria and the bivariate function $f$ is as follows:
$$f(r,u) = \begin{cases} f_{GC_p}(r,u) = nr/s_0^2 + \alpha u & (GC_p: \text{ when } \hat\sigma_0^2 \neq 0) \\ f_{GCV}(r,u) = r(1-u/n)^{-2} & (GCV: \text{ when } u \neq n) \\ f_{GIC}(r,u) = r\exp(\alpha u/n) & (GIC) \end{cases}, \quad (2.11)$$
where $\alpha$ is some positive value expressing the strength of the penalty term, and $s_0^2$ is an ordinary unbiased estimator of $\sigma^2$ when $m < n-1$, which is given by $s_0^2 = n\hat\sigma_0^2/(n-m-1)$. The $GC_p$ includes several famous MSCs, e.g., the $GC_p$ coincides with the $C_p$ proposed by Mallows (1973) when $\alpha = 2$, and with the modified $C_p$ ($MC_p$) proposed by Fujikoshi and Satoh (1997) and Yanagihara et al. (2009) when $\alpha = 1 + 2/(n-m-1)$. Moreover, the original GIC under normality is expressed as $n\log r + \alpha u$. The GIC in this paper is defined by an exponential transformation of the original GIC divided by $n$. The GIC includes several famous MSCs, e.g., the GIC coincides with the AIC proposed by Akaike (1973) when $\alpha = 2$, with the HQC proposed by Hannan and Quinn (1979) when $\alpha = 2\log\log n$, and with the BIC proposed by Schwarz (1978) when $\alpha = \log n$. In order to express an optimized GRRE of $\beta$ systematically, we prepare the following class of $\theta$ for any $h \in R_+$:
$$\hat\theta(h) = (\hat\theta_1(h), \dots, \hat\theta_m(h))', \quad \hat\theta_j(h) = \begin{cases} \dfrac{d_jh}{z_j^2-h} & (h \le z_j^2) \\ \infty & (h > z_j^2) \end{cases}. \quad (2.12)$$
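Under our reading of (2.12), the candidate ridge parameters $\hat\theta(h)$ and the induced shrinkage factors $d_j/\{d_j+\hat\theta_j(h)\}$ can be computed as follows (a small Python sketch; the function names are hypothetical).

```python
import numpy as np

def theta_hat(h, d, z):
    """Candidate ridge parameters of (2.12): d_j*h/(z_j^2 - h) if h <= z_j^2, infinity otherwise."""
    z2 = z**2
    with np.errstate(divide="ignore"):
        return np.where(h <= z2, d * h / (z2 - h), np.inf)

def shrinkage_factors(h, z):
    """The factors d_j/(d_j + theta_hat_j(h)), which reduce to max(1 - h/z_j^2, 0)."""
    return np.maximum(1.0 - h / z**2, 0.0)
```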

The GRRE of $\beta$ based on $\hat\theta(h)$ is given by
$$\hat\beta_{\hat\theta(h)} = \hat\beta(h) = Q_1V(h)Q_1'\hat\beta, \quad (2.13)$$
where $\hat\beta$ is the LSE of $\beta$ given by (2.3), and $V(h)$ is an $m \times m$ diagonal matrix whose $j$th diagonal element is given by
$$v_j(h) = \frac{d_j}{d_j + \hat\theta_j(h)} = \begin{cases} 1 - h/z_j^2 & (h \le z_j^2) \\ 0 & (h > z_j^2) \end{cases}.$$
The optimized $\theta$ obtained by minimizing $\mathrm{MSC}(\theta)$ when the domain of $f$ is a subset of $(0, \hat\sigma_\infty^2] \times [1, n)$ is expressed as in the following lemma (the proof is given in Appendix A.1):

Lemma 1. Let
$$\phi(h) = \mathrm{MSC}(\hat\theta(h)) \quad (h \in R_+\backslash\{0\}). \quad (2.14)$$
Suppose that $\mathcal{D} \subseteq (0, \hat\sigma_\infty^2] \times [1, n)$ and that there exists $\xi > 0$ such that $\phi(\xi) < \lim_{h\to0}\phi(h)$. Then, the optimized ridge parameters obtained by minimizing $\mathrm{MSC}(\theta)$ are given by $\hat\theta(\hat h)$, where $\hat h$ is given by
$$\hat h = \arg\min_{h \in R_+\backslash\{0\}}\phi(h).$$

Let $t_j$ ($j = 1, \dots, m$) be the $j$th-order statistic of $z_1^2, \dots, z_m^2$, i.e.,
$$t_j = \begin{cases} \min\{z_1^2, \dots, z_m^2\} & (j = 1) \\ \min\left(\{z_1^2, \dots, z_m^2\} \backslash \{t_1, \dots, t_{j-1}\}\right) & (j = 2, \dots, m) \end{cases},$$
and let $R_j$ ($j = 0, \dots, m$) be a range defined by
$$R_j = \begin{cases} (0, t_1] & (j = 0) \\ (t_j, t_{j+1}] & (j = 1, \dots, m-1) \\ (t_m, \infty) & (j = m) \end{cases}. \quad (2.15)$$
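Lemma 1 reduces the $m$-dimensional optimization over $\theta$ to a one-dimensional search over $h$. The following sketch (ours) evaluates $\phi(h)$ of (2.14) for any criterion of the form (2.11), using the fact, immediate from (2.12), that $\hat\theta_j(h)/\{d_j+\hat\theta_j(h)\} = \min(h/z_j^2, 1)$.

```python
import numpy as np

def phi(h, z, sigma0_sq, n, f):
    """phi(h) = MSC(theta_hat(h)) = f(sigma^2(theta_hat(h)), df(theta_hat(h))) of (2.14)."""
    m = len(z)
    delta = np.minimum(h / z**2, 1.0)                # theta_j/(d_j + theta_j) at theta = theta_hat(h)
    r = sigma0_sq + np.sum(delta**2 * z**2) / n      # sigma^2(theta_hat(h)) by (2.7)
    u = 1 + m - np.sum(delta)                        # df(theta_hat(h)) by (2.7)
    return f(r, u)

# examples of f from (2.11), for a given n and penalty strength alpha
def make_f_gcv(n):
    return lambda r, u: r / (1.0 - u / n)**2

def make_f_gic(n, alpha):
    return lambda r, u: r * np.exp(alpha * u / n)
```

For example, phi(h, z, sigma0_sq, n, make_f_gic(n, np.log(n))) evaluates the GIC along the path $\hat\theta(h)$.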

By using $t_j$ and $R_j$, we can clarify the properties of $\phi(h)$ given in (2.14) as in the following lemma (the proof is given in Appendix A.2):

Lemma 2. The function $\phi(h)$ satisfies the following properties:
(P1) $\phi(h)$ is continuous at any $h \in R_+\backslash\{0\}$.
(P2) $\phi(h) = f(\hat\sigma_\infty^2, 1)$ for every $h \ge t_m$.
(P3) $\phi(h)$ is a piecewise function such that $\phi(h) = \phi_a(h)$ ($h \in R_a$, $a = 0, 1, \dots, m$). The specific form of $\phi_a(h)$ can be expressed as
$$\phi_a(h) = f\left(\hat\sigma_0^2 + (c_{1,a} + h^2c_{2,a})/n,\ 1 + m - a - hc_{2,a}\right),$$
where $c_{1,a}$ and $c_{2,a}$ are given by
$$c_{1,a} = \sum_{j=1}^a t_j, \quad c_{2,a} = \sum_{j=a+1}^m\frac{1}{t_j}. \quad (2.16)$$

Let $s_a^2$ ($a = 0, 1, \dots, m$) be a statistic defined by
$$s_a^2 = \frac{n\hat\sigma_0^2 + c_{1,a}}{n - m - 1 + a}, \quad (2.17)$$
and let $a_*$ be an integer defined by
$$a_* \in \{0, 1, \dots, m-1\} \ \text{s.t.}\ s_{a_*}^2 \in R_{a_*}. \quad (2.18)$$
From Lemma 2.1 in Yanagihara (2013), we can see that $a_*$ exists uniquely when $\hat\sigma_0^2 \neq 0$. Let $GC_p(\theta) = f_{GC_p}(\hat\sigma^2(\theta), \mathrm{df}(\theta))$ and $GCV(\theta) = f_{GCV}(\hat\sigma^2(\theta), \mathrm{df}(\theta))$. From the results in Nagai et al. (2012) and Yanagihara (2013), the minimizers of the $GC_p$ and GCV can be expressed as in the following theorem (the proof is omitted because it is easy to obtain from the results in Nagai et al., 2012, and Yanagihara, 2013):

Theorem 1. The optimized ridge parameters minimizing $GC_p(\theta)$ and $GCV(\theta)$ are $\hat\theta(\hat h_{GC_p})$ and $\hat\theta(\hat h_{GCV})$, respectively, where $\hat h_{GC_p}$ and $\hat h_{GCV}$ are defined as
$$\hat h_{GC_p} = \frac{1}{2}\alpha s_0^2 \ (\hat\sigma_0^2 \neq 0), \quad \hat h_{GCV} = \begin{cases} s_{a_*}^2 & (\text{when } \hat\sigma_0^2 \neq 0) \\ t_1 & (\text{when } \hat\sigma_0^2 = 0 \text{ and } m = n-1) \\ 0 & (\text{when } \hat\sigma_0^2 = 0 \text{ and } m < n-1) \end{cases},$$
where $s_a^2$ is given by (2.17) and $a_*$ is the integer given by (2.18).

By using the results in Theorem 1, the optimized GRREs of $\beta$ obtained by minimizing the $GC_p$ and GCV can be derived as $\hat\beta(\hat h_{GC_p})$ and $\hat\beta(\hat h_{GCV})$, respectively.
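As a rough illustration of how the closed forms in Theorem 1 might be used, here is a Python sketch based on our reading of (2.17), (2.18) and Theorem 1 (so the exact expressions should be checked against the theorem); it assumes $\hat\sigma_0^2 \neq 0$.

```python
import numpy as np

def h_gcp_gcv(z, sigma0_sq, n, alpha=2.0):
    """Sketch of the closed-form minimizers of Theorem 1 (assuming sigma0^2 != 0).
    Returns (h_GCp, h_GCV) with h_GCp = alpha*s_0^2/2 and h_GCV = s_{a*}^2, where
    a* is the unique index with s_{a*}^2 in R_{a*} (eqs. (2.17)-(2.18))."""
    m = len(z)
    t = np.sort(z**2)                                   # order statistics t_1 <= ... <= t_m
    s0_sq = n * sigma0_sq / (n - m - 1)
    h_gcp = 0.5 * alpha * s0_sq
    edges = np.concatenate(([0.0], t))                  # R_a = (edges[a], edges[a+1]] for a < m
    h_gcv = None
    for a in range(m):                                  # search a*: s_a^2 in R_a
        s_a_sq = (n * sigma0_sq + t[:a].sum()) / (n - m - 1 + a)
        if edges[a] < s_a_sq <= edges[a + 1]:
            h_gcv = s_a_sq
            break
    return h_gcp, h_gcv
```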

3. Algorithm to Solve the Minimization Problem of the GIC

Theorem 1 in the previous section indicates that the $GC_p$-minimization method cannot be used when $\hat\sigma_0^2$ in (2.8) is 0, i.e., in most cases of linear regression with high-dimensional explanatory variables, and that the GCV-minimization method hardly shrinks the GRRE of $\beta$ towards $0_k$ when $\hat\sigma_0^2 = 0$. On the other hand, the GIC is one of the model selection criteria that can be defined even if $\hat\sigma_0^2 = 0$ and whose penalty strength can be changed freely. However, at the moment, there is no efficient algorithm to solve the minimization problem of the GIC. Hence, we propose a fast algorithm for the GIC-minimization method. Let
$$GIC(\theta) = f_{GIC}(\hat\sigma^2(\theta), \mathrm{df}(\theta)), \quad (3.1)$$
where $f_{GIC}(r,u)$ is given by (2.11), and $\hat\sigma^2(\theta)$ and $\mathrm{df}(\theta)$ are given by (2.7), and let
$$\phi_{GIC}(h) = f_{GIC}(\hat\sigma^2(\hat\theta(h)), \mathrm{df}(\hat\theta(h))) \quad (h \in R_+\backslash\{0\}),$$
where $\hat\theta(h)$ is given by (2.12). From Lemma 2, we can see that $\phi_{GIC}(h)$ is a piecewise function such that
$$\phi_{GIC}(h) = \phi_{GIC,a}(h) \quad (h \in R_a,\ a = 0, 1, \dots, m), \quad (3.2)$$
where $R_a$ is the range given by (2.15) and $\phi_{GIC,a}(h)$ is defined by
$$\phi_{GIC,a}(h) = \left\{\hat\sigma_0^2 + \frac{1}{n}(c_{1,a} + h^2c_{2,a})\right\}\exp\left\{\frac{\alpha}{n}(1 + m - a - hc_{2,a})\right\}.$$
Here $c_{1,a}$ and $c_{2,a}$ ($a = 0, 1, \dots, m$) are given by (2.16). Furthermore, we define the function $\psi_{GIC}(h)$ ($h \in (0, t_m]$), which is a piecewise function such that
$$\psi_{GIC}(h) = \psi_{GIC,a}(h) \quad (h \in R_a,\ a = 0, 1, \dots, m-1), \quad (3.3)$$
where $\psi_{GIC,a}(h)$ is defined by
$$\psi_{GIC,a}(h) = \frac{n^2}{c_{2,a}\exp\{\alpha(1 + m - a - hc_{2,a})/n\}}\frac{\partial}{\partial h}\phi_{GIC,a}(h) = -\alpha c_{2,a}h^2 + 2nh - \alpha(n\hat\sigma_0^2 + c_{1,a}). \quad (3.4)$$
The minimizer of the GIC can be expressed as in the following theorem (the proof is given in Appendix A.3):

Theorem 2. Let $\xi_{GIC,a}$ be one of the two distinct roots or the double root of the quadratic equation $\psi_{GIC,a}(h) = 0$, given by
$$\xi_{GIC,a} = \frac{n - \sqrt{n^2 - \alpha^2c_{2,a}(n\hat\sigma_0^2 + c_{1,a})}}{\alpha c_{2,a}}, \quad (3.5)$$
and let $A_{GIC}$ and $T_{GIC}$ be sets defined by
$$A_{GIC} = \left\{a \in \{0, 1, \dots, m-1\} \mid \xi_{GIC,a} \in R_a\right\}, \quad T_{GIC} = \begin{cases} \{t_m\} & (\text{when } \hat\sigma_\infty^2 > 2t_m/\alpha) \\ \emptyset & (\text{when } \hat\sigma_\infty^2 \le 2t_m/\alpha) \end{cases}, \quad (3.6)$$
where $\hat\sigma_\infty^2$ is given by (2.9). We define a set of candidate minimizers of $\phi_{GIC}(h)$ as
$$S_{GIC} = \bigcup_{a \in A_{GIC}}\{\xi_{GIC,a}\} \cup T_{GIC}. \quad (3.7)$$
Then, the optimized ridge parameters obtained by minimizing $GIC(\theta)$ are given by $\hat\theta(\hat h_{GIC})$, where $\hat h_{GIC}$ is given by
$$\hat h_{GIC} = \begin{cases} \arg\min_{h \in S_{GIC}}\phi_{GIC}(h) & (\text{when } \hat\sigma_0^2 \neq 0) \\ 0 & (\text{when } \hat\sigma_0^2 = 0) \end{cases}. \quad (3.8)$$

It should be emphasized that the elements of $S_{GIC}$ can be written in closed form, and that the number of elements of $S_{GIC}$ is equal to or smaller than $m+1$, i.e., $\#(S_{GIC}) \le m+1$. Hence, when $\hat\sigma_0^2 \neq 0$, by using the results in Theorem 2, we can optimize $\theta$ quickly as follows (see the sketch below):

A Fast Algorithm to Derive the Optimized GRRE of $\beta$ by Minimizing the GIC
(1) Calculate the elements of $S_{GIC}$ by using $\xi_{GIC,a}$, $A_{GIC}$ and $T_{GIC}$.
(2) Determine $\hat h_{GIC}$ by comparing the values of $\phi_{GIC}(h)$ at the points in $S_{GIC}$.
(3) Calculate the optimized GRRE of $\beta$ obtained by minimizing the GIC as $\hat\beta(\hat h_{GIC})$, where $\hat\beta(h)$ is given by (2.13).

Unfortunately, Theorem 2 also indicates that the optimized GRRE of $\beta$ obtained by minimizing the GIC is always equal to the LSE of $\beta$ when $\hat\sigma_0^2 = 0$. Thus, we cannot help but say that the GIC-minimization method does not work well in the case of high-dimensional explanatory variables.
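The following Python sketch (our own; it is not the authors' implementation, and variable names are assumptions) carries out steps (1) and (2) of the algorithm above for the case $\hat\sigma_0^2 \neq 0$, building the candidate set $S_{GIC}$ from the closed-form roots $\xi_{GIC,a}$ and the threshold defining $T_{GIC}$.

```python
import numpy as np

def h_gic(z, sigma0_sq, n, alpha):
    """Sketch of the fast GIC minimization (Theorem 2), assuming sigma0^2 != 0."""
    m = len(z)
    t = np.sort(z**2)                                       # t_1 <= ... <= t_m
    sigma_inf_sq = sigma0_sq + t.sum() / n                  # sigma^2_infinity = sigma0^2 + z'z/n
    edges = np.concatenate(([0.0], t))                      # R_a = (edges[a], edges[a+1]]

    def phi_gic(h):
        delta = np.minimum(h / t, 1.0)                      # delta_j at theta_hat(h); t is sorted z^2
        r = sigma0_sq + np.sum(delta**2 * t) / n
        u = 1 + m - np.sum(delta)
        return r * np.exp(alpha * u / n)

    candidates = []
    for a in range(m):                                      # xi_GIC,a: smaller root of psi_GIC,a = 0
        c1 = t[:a].sum()
        c2 = np.sum(1.0 / t[a:])
        disc = n**2 - alpha**2 * c2 * (n * sigma0_sq + c1)
        if disc >= 0:
            xi = (n - np.sqrt(disc)) / (alpha * c2)
            if edges[a] < xi <= edges[a + 1]:               # keep only roots lying in R_a
                candidates.append(xi)
    if sigma_inf_sq > 2 * t[-1] / alpha:                    # T_GIC = {t_m}
        candidates.append(t[-1])
    return min(candidates, key=phi_gic)
```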

When $\hat\sigma_0^2 \neq 0$, we cannot derive $\#(S_{GIC})$ without comparing the values of $\phi_{GIC}(h)$. However, $\#(S_{GIC})$ becomes 1 under a specific $\alpha$ when $\hat\sigma_0^2 \neq 0$. The result is summarized as in the following corollary (the proof is given in Appendix A.4):

Corollary 1. Suppose that $\alpha \le n/m$ and $\hat\sigma_0^2 \neq 0$. Let $a_*$ be an integer defined by
$$a_* \in \{0, 1, \dots, m-1\} \ \text{s.t.}\ \xi_{GIC,a_*} \in R_{a_*}.$$
The integer $a_*$ exists uniquely when $\hat\sigma_\infty^2 \le 2t_m/\alpha$. Then $\hat h_{GIC}$ in (3.8) can be rewritten as
$$\hat h_{GIC} = \begin{cases} \xi_{GIC,a_*} & (\text{when } \hat\sigma_\infty^2 \le 2t_m/\alpha) \\ t_m & (\text{when } \hat\sigma_\infty^2 > 2t_m/\alpha) \end{cases}.$$

4. Algorithm to Solve the Minimization Problem of the EGCV

Theorem 2 in the previous section indicates that the GIC-minimization method does not shrink the GRRE of $\beta$ towards $0_k$ at all when $\hat\sigma_0^2$ in (2.8) is 0, i.e., in most cases of linear regression with high-dimensional explanatory variables. From Theorems 1 and 2, we regrettably have to say that no existing MSC works well when $\hat\sigma_0^2 = 0$. Hence, we have to propose a new MSC which works well even when $\hat\sigma_0^2 = 0$. On the other hand, the cause of the problem in the GCV is that the strength of the penalty on the complexity of a model is small. Hence, we extend the GCV so that the strength of the penalty on the complexity of a model can be changed freely. It is called the extended GCV (EGCV) in this paper. Let
$$f_{EGCV}(r,u) = \frac{r}{(1-u/n)^\alpha}, \quad \alpha > 2 \text{ and } u < n. \quad (4.1)$$
Then, the EGCV for the GRR is defined as
$$\mathrm{EGCV}(\theta) = f_{EGCV}(\hat\sigma^2(\theta), \mathrm{df}(\theta)), \quad (4.2)$$
where $\hat\sigma^2(\theta)$ and $\mathrm{df}(\theta)$ are given by (2.7). It is easy to see that the EGCV coincides with the GCV if $\alpha = 2$. Let
$$\phi_{EGCV}(h) = f_{EGCV}(\hat\sigma^2(\hat\theta(h)), \mathrm{df}(\hat\theta(h))) \quad (h \in R_+),$$
where $\hat\theta(h)$ is given by (2.12). From Lemma 2, we can see that $\phi_{EGCV}(h)$ is a piecewise function such that
$$\phi_{EGCV}(h) = \phi_{EGCV,a}(h) \quad (h \in R_a,\ a = 0, 1, \dots, m), \quad (4.3)$$
where $R_a$ is the range given by (2.15) and $\phi_{EGCV,a}(h)$ is defined by
$$\phi_{EGCV,a}(h) = \frac{\hat\sigma_0^2 + c_{1,a}/n + h^2c_{2,a}/n}{(b + a/n + hc_{2,a}/n)^\alpha}. \quad (4.4)$$
Here $c_{1,a}$ and $c_{2,a}$ ($a = 0, 1, \dots, m$) are given by (2.16), and $b$ is the constant defined by
$$b = 1 - \frac{1}{n}(m+1). \quad (4.5)$$
Furthermore, we define the function $\psi_{EGCV}(h)$ ($h \in (0, t_m]$), which is a piecewise function such that

$$\psi_{EGCV}(h) = \psi_{EGCV,a}(h) \quad (h \in R_a,\ a = 0, 1, \dots, m-1), \quad (4.6)$$
where $\psi_{EGCV,a}(h)$ is defined by
$$\psi_{EGCV,a}(h) = \frac{n^2(b + a/n + c_{2,a}h/n)^{\alpha+1}}{c_{2,a}}\frac{\partial}{\partial h}\phi_{EGCV,a}(h) = -(\alpha-2)c_{2,a}h^2 + 2(a + nb)h - \alpha(n\hat\sigma_0^2 + c_{1,a}). \quad (4.7)$$
The minimizer of the EGCV can be expressed as in the following theorem (the proof is given in Appendix A.5):

Theorem 3. Let $\xi_{EGCV,a}$ be one of the two distinct roots or the double root of the quadratic equation $\psi_{EGCV,a}(h) = 0$, given by
$$\xi_{EGCV,a} = \frac{(a + nb) - \sqrt{(a + nb)^2 - \alpha(\alpha-2)c_{2,a}(n\hat\sigma_0^2 + c_{1,a})}}{(\alpha-2)c_{2,a}},$$
and let $A_{EGCV}$ and $T_{EGCV}$ be sets defined by
$$A_{EGCV} = \left\{a \in \{0, 1, \dots, m-1\} \mid \xi_{EGCV,a} \in R_a\right\}, \quad T_{EGCV} = \begin{cases} \{t_m\} & (\text{when } \hat\sigma_\infty^2 > 2(1-n^{-1})t_m/\alpha) \\ \emptyset & (\text{when } \hat\sigma_\infty^2 \le 2(1-n^{-1})t_m/\alpha) \end{cases},$$
where $\hat\sigma_\infty^2$ is given by (2.9). We define a set of candidate minimizers of $\phi_{EGCV}(h)$ as
$$S_{EGCV} = \bigcup_{a \in A_{EGCV}}\{\xi_{EGCV,a}\} \cup T_{EGCV}. \quad (4.8)$$
Then, the optimized ridge parameters obtained by minimizing $\mathrm{EGCV}(\theta)$ are given by $\hat\theta(\hat h_{EGCV})$, where $\hat h_{EGCV}$ is given by
$$\hat h_{EGCV} = \begin{cases} \arg\min_{h \in S_{EGCV}}\phi_{EGCV}(h) & (\text{when } \hat\sigma_0^2 \neq 0 \text{ or } b = 0) \\ 0 & (\text{when } \hat\sigma_0^2 = 0 \text{ and } b \neq 0) \end{cases}. \quad (4.9)$$

It should be emphasized that the elements of $S_{EGCV}$ can be written in closed form, and that the number of elements of $S_{EGCV}$ is equal to or smaller than $m+1$, i.e., $\#(S_{EGCV}) \le m+1$. Hence, when $\hat\sigma_0^2 \neq 0$ or $b = 0$, by using the results in Theorem 3, we can optimize $\theta$ quickly as follows (see the sketch below):

A Fast Algorithm to Derive the Optimized GRRE of $\beta$ by Minimizing the EGCV
(1) Calculate the elements of $S_{EGCV}$ by using $\xi_{EGCV,a}$, $A_{EGCV}$ and $T_{EGCV}$.
(2) Determine $\hat h_{EGCV}$ by comparing the values of $\phi_{EGCV}(h)$ at the points in $S_{EGCV}$.
(3) Calculate the optimized GRRE of $\beta$ obtained by minimizing the EGCV as $\hat\beta(\hat h_{EGCV})$, where $\hat\beta(h)$ is given by (2.13).
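Analogously to the GIC case, the EGCV algorithm admits the following Python sketch (ours, under the assumptions $\alpha > 2$ and $\hat\sigma_0^2 \neq 0$ or $b = 0$).

```python
import numpy as np

def h_egcv(z, sigma0_sq, n, alpha):
    """Sketch of the fast EGCV minimization (Theorem 3)."""
    m = len(z)
    t = np.sort(z**2)
    b = 1.0 - (m + 1) / n
    sigma_inf_sq = sigma0_sq + t.sum() / n
    edges = np.concatenate(([0.0], t))

    def phi_egcv(h):
        delta = np.minimum(h / t, 1.0)
        r = sigma0_sq + np.sum(delta**2 * t) / n
        u = 1 + m - np.sum(delta)
        return r / (1.0 - u / n)**alpha

    candidates = []
    for a in range(m):                                   # xi_EGCV,a: smaller root of psi_EGCV,a = 0
        c1 = t[:a].sum()
        c2 = np.sum(1.0 / t[a:])
        disc = (a + n * b)**2 - alpha * (alpha - 2) * c2 * (n * sigma0_sq + c1)
        if disc >= 0:
            xi = (a + n * b - np.sqrt(disc)) / ((alpha - 2) * c2)
            if edges[a] < xi <= edges[a + 1]:
                candidates.append(xi)
    if sigma_inf_sq > 2 * (1 - 1 / n) * t[-1] / alpha:   # T_EGCV = {t_m}
        candidates.append(t[-1])
    return min(candidates, key=phi_egcv)
```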

When $\hat\sigma_0^2 \neq 0$ or $m = n-1$, we cannot derive $\#(S_{EGCV})$ without comparing the values of $\phi_{EGCV}(h)$. However, $\#(S_{EGCV})$ becomes 1 under a specific $\alpha$ when $\hat\sigma_0^2 \neq 0$ or $m = n-1$. The result is summarized as in the following corollary (the proof is given in Appendix A.6):

Corollary 2. Suppose that $\alpha \le (n+m-1)/m$ and that $\{\hat\sigma_0^2 \neq 0\} \cup \{b = 0\}$ holds. Let $a_*$ be an integer defined by
$$a_* \in \{0, 1, \dots, m-1\} \ \text{s.t.}\ \xi_{EGCV,a_*} \in R_{a_*}.$$
The integer $a_*$ exists uniquely when $\hat\sigma_\infty^2 \le 2(1-n^{-1})t_m/\alpha$. Then $\hat h_{EGCV}$ in (4.9) can be rewritten as
$$\hat h_{EGCV} = \begin{cases} \xi_{EGCV,a_*} & (\text{when } \hat\sigma_\infty^2 \le 2(1-n^{-1})t_m/\alpha) \\ t_m & (\text{when } \hat\sigma_\infty^2 > 2(1-n^{-1})t_m/\alpha) \end{cases}.$$

5. Numerical Study

In this section, we compared the performances of the $GC_p$-, GIC- and EGCV-minimization methods with $\alpha = 2$, $2\log\log n$ and $\log n$ by conducting numerical examinations. We generated data from the simulation model $y \sim N_n(X\beta, I_n)$, where $X = (I_n - J_n)X_0\Phi(\rho)^{1/2}$ and $\beta = M^+X'\eta$. Here $X_0$ is an $n \times k$ matrix whose elements are independently and identically distributed according to $U(-1, 1)$, $\Phi(\rho)$ is a $k \times k$ symmetric matrix whose $(i,j)$th element is $\rho^{|i-j|}$, and $\eta$ is an $n$-dimensional vector whose $j$th element is given by
$$\left\{\frac{12n(n-1)}{4n^2+6n-1}\right\}^{1/2}\left\{(-1)^{j-1}\left(1 - \frac{j-1}{n}\right) - \frac{1}{2n}\right\}.$$
The simulation model is the same as that in Yanagihara (2013). Let $\hat\theta$ be the ridge parameters optimized by minimizing an MSC, $\hat y$ be the predictive value of $y$ based on the LSE of $\beta$, and $\hat y_{\hat\theta}$ be the predictive value of $y$ in the GRR based on $\hat\theta$, i.e.,
$$\hat y = (J_n + XM^+X')y, \quad \hat y_{\hat\theta} = (J_n + XM_{\hat\theta}^+X')y.$$
We used the following relative mean square error (MSE) as a measure of the performance of each method:
$$\frac{E\left[(\hat y_{\hat\theta} - X\beta)'(\hat y_{\hat\theta} - X\beta)\right]}{E\left[(\hat y - X\beta)'(\hat y - X\beta)\right]} \times 100\ (\%).$$
Notice that $E[(\hat y - X\beta)'(\hat y - X\beta)] = m+1$ in this case. The above expectation was evaluated by Monte Carlo simulation with 10,000 iterations.
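For reference, a minimal Python sketch of generating one dataset from the simulation design above is given here; the expression for $\eta$ is taken from the text above and a Cholesky factor is used in place of $\Phi(\rho)^{1/2}$, both of which are assumptions, and the seed and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, rho = 50, 25, 0.99

# one draw from the simulation design of Section 5
X0 = rng.uniform(-1.0, 1.0, size=(n, k))
Phi = rho ** np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
X = (X0 - X0.mean(axis=0)) @ np.linalg.cholesky(Phi).T   # (I_n - J_n) X_0 Phi(rho)^{1/2}
j = np.arange(1, n + 1)
eta = np.sqrt(12 * n * (n - 1) / (4 * n**2 + 6 * n - 1)) * ((-1.0)**(j - 1) * (1 - (j - 1) / n) - 1 / (2 * n))
beta = np.linalg.pinv(X.T @ X) @ X.T @ eta                # beta = M^+ X' eta
y = X @ beta + rng.standard_normal(n)                     # y ~ N_n(X beta, I_n)
```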

Figure 1. Relative MSE (%) of the EGCV-, GIC- and $GC_p$-minimization methods plotted against the number of explanatory variables in the case of $n = 50$: (a) $\alpha = 2$, (b) $\alpha = 2\log\log n$, (c) $\alpha = \log n$.

Figure 2. Relative MSE (%) of the EGCV-, GIC- and $GC_p$-minimization methods plotted against the number of explanatory variables in the case of $n = 200$: (a) $\alpha = 2$, (b) $\alpha = 2\log\log n$, (c) $\alpha = \log n$.

Figure 3. Relative MSE (%) of the EGCV-, GIC- and $GC_p$-minimization methods plotted against the number of explanatory variables in the case of $n = 500$: (a) $\alpha = 2$, (b) $\alpha = 2\log\log n$, (c) $\alpha = \log n$.

Figures 1, 2 and 3 show the results when $n = 50$, 200 and 500, respectively, with $\rho = 0.99$ and $k = k_1, \dots, k_{16}$, where
$$k_j = \begin{cases} n(4+j)/10 & (j = 1, \dots, 5) \\ n-2 & (j = 6) \\ n\{1 + (j-6)/10\} & (j = 7, \dots, 16) \end{cases}.$$
In the figures, there were no results for the $GC_p$-minimization method when $k = k_7, \dots, k_{16}$ because the $GC_p$ was not definable when $k \ge k_7$. From the figures, we can see that the EGCV-minimization method with $\alpha = \log n$ performed well in most cases. On the other hand, the performance of the GIC-minimization method was very bad when $k \ge k_7$ because the GIC-minimization method does not shrink the GRRE of $\beta$ towards $0_k$ at all when $k \ge k_7$. The performance of the GCV-minimization method was also bad when $k \ge k_7$ because the GCV-minimization method hardly shrinks the GRRE of $\beta$ towards $0_k$ when $k \ge k_7$. Moreover, as for the number of candidate minimizers, the following results were obtained:
$$\mathrm{mode}\{\#(S_{GIC})\} = 2, \quad \min\{\#(S_{GIC})\} = 2, \quad \max\{\#(S_{GIC})\} = 7,$$
and
$$\mathrm{mode}\{\#(S_{EGCV})\} = \begin{cases} 3 & (k \ge n-1 \text{ and } \alpha = 2\log\log n) \\ 2 & (\text{otherwise}) \end{cases}, \quad \min\{\#(S_{EGCV})\} = 2, \quad \max\{\#(S_{EGCV})\} = \begin{cases} 3 & (k < n-1) \\ 12 & (k \ge n-1) \end{cases}.$$
Hence, our algorithms are efficient because the number of candidate minimizers is small.

Acknowledgment

The authors thank Dr. Shintaro Hashimoto, Mr. Tomoyuki Nakagawa, Mr. Ryoya Oda and Mr. Shota Ochiai, of Hiroshima University, for helpful comments.

References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In 2nd International Symposium on Information Theory (eds. B. N. Petrov & F. Csáki). Akadémiai Kiadó, Budapest.
Atkinson, A. C. (1980). A note on the generalized information criterion for choice of a model. Biometrika, 67.
Craven, P. & Wahba, G. (1979). Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math., 31.
Fujikoshi, Y. & Satoh, K. (1997). Modified AIC and C_p in multivariate linear regression. Biometrika, 84.
Hannan, E. J. & Quinn, B. G. (1979). The determination of the order of an autoregression. J. Roy. Stat. Soc. Ser. B, 41.

Harville, D. A. (1997). Matrix Algebra from a Statistician's Perspective. Springer-Verlag, New York.
Hoerl, A. E. & Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12.
Kullback, S. & Leibler, R. A. (1951). On information and sufficiency. Ann. Math. Statistics, 22.
Lawless, J. F. (1981). Mean squared error properties of generalized ridge regression. J. Amer. Statist. Assoc., 76.
Liu, Z., Shen, Y. & Ott, J. (2011). Multilocus association mapping using generalized ridge logistic regression. BMC Bioinformatics, 12.
Mallows, C. L. (1973). Some comments on C_p. Technometrics, 15.
Nagai, I., Yanagihara, H. & Satoh, K. (2012). Optimization of ridge parameters in multivariate generalized ridge regression by plug-in methods. Hiroshima Math. J., 42.
Nishii, R. (1984). Asymptotic properties of criteria for selection of variables in multiple regression. Ann. Statist., 12.
Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist., 6.
Shen, X., Alam, M., Fikse, F. & Rönnegård, L. (2013). A novel generalized ridge regression method for quantitative genetics. Genetics, 193.
Walker, S. G. & Page, C. J. (2001). Generalized ridge regression and a generalization of the C_p statistic. J. Appl. Statist., 28.
Yanagihara, H. (2013). Explicit solution to the minimization problem of generalized cross-validation criterion for selecting ridge parameters in generalized ridge regression. Technical Report, Hiroshima Statistical Research Group, Hiroshima University.
Yanagihara, H., Nagai, I. & Satoh, K. (2009). A bias-corrected C_p criterion for optimizing ridge parameters in multivariate generalized ridge regression. Japanese J. Appl. Statist., 38 (in Japanese).
Ye, J. (1998). On measuring and correcting the effects of data mining and model selection. J. Amer. Statist. Assoc., 93.

Appendix

A.1. Proof of Lemma 1

Let $\delta = (\delta_1, \dots, \delta_m)'$ be an $m$-dimensional vector whose $j$th element $\delta_j \in [0, 1]$ ($j = 1, \dots, m$) is defined by
$$\delta_j = \frac{\theta_j}{d_j + \theta_j},$$
where $\theta_1, \dots, \theta_m$ are ridge parameters, and $d_j$ is given by (2.1). Since the map $\theta \mapsto \delta$ is one-to-one, we minimize the MSC via $\delta$ instead of $\theta$. By using $\delta$, we rewrite $\hat\sigma^2(\theta)$ and $\mathrm{df}(\theta)$ in (2.7) as functions of $\delta$ as
$$\hat\sigma^2(\theta) = r(\delta) = \frac{1}{n}\left(n\hat\sigma_0^2 + \sum_{j=1}^m\delta_j^2z_j^2\right), \quad \mathrm{df}(\theta) = u(\delta) = 1 + m - \sum_{j=1}^m\delta_j, \quad (A.1)$$
where $z_j^2$ and $\hat\sigma_0^2$ are given by (2.6) and (2.8), respectively. By using $r(\delta)$ and $u(\delta)$, we rewrite $\mathrm{MSC}(\theta)$ in (2.10) as a function of $\delta$ as
$$\mathrm{MSC}(\theta) = g(\delta) = f(r(\delta), u(\delta)). \quad (A.2)$$

Since $\mathcal{D} \subseteq (0, \hat\sigma_\infty^2] \times [1, n)$, we know that $\delta \neq 0_m$. From the assumptions on $f$, we have $f_r(r(\delta), u(\delta)) > 0$ and $f_u(r(\delta), u(\delta)) > 0$, where $f_r(r,u)$ and $f_u(r,u)$ are the first partial derivatives of $f$ with respect to $r$ and $u$, respectively. Let
$$\tau(\delta) = \frac{nf_u(r(\delta), u(\delta))}{2f_r(r(\delta), u(\delta))}.$$
It is clear that $\tau(\delta) > 0$ when $\mathcal{D} \subseteq (0, \hat\sigma_\infty^2] \times [1, n)$. Notice that
$$\frac{\partial}{\partial\delta_j}g(\delta) = \frac{\partial r(\delta)}{\partial\delta_j}\left.\frac{\partial f(r,u)}{\partial r}\right|_{(r,u)=(r(\delta),u(\delta))} + \frac{\partial u(\delta)}{\partial\delta_j}\left.\frac{\partial f(r,u)}{\partial u}\right|_{(r,u)=(r(\delta),u(\delta))} = \frac{1}{n}2z_j^2\delta_jf_r(r(\delta), u(\delta)) - f_u(r(\delta), u(\delta)) = \frac{2}{n}z_j^2f_r(r(\delta), u(\delta))\left\{\delta_j - \frac{\tau(\delta)}{z_j^2}\right\}. \quad (A.3)$$
Let $\delta^* = (\delta_1^*, \dots, \delta_m^*)'$ be the minimizer of $g(\delta)$, i.e.,
$$\delta^* = \arg\min_{\delta \in [0,1]^m\backslash\{0_m\}}g(\delta),$$
where $[0,1]^m$ is the $m$th Cartesian power of the set $[0,1]$. From (A.3), a necessary condition for $\delta^*$ is given by
$$\delta_j^* = \begin{cases} \tau(\delta^*)/z_j^2 & (\tau(\delta^*) \le z_j^2) \\ 1 & (\tau(\delta^*) > z_j^2) \end{cases} \quad (j = 1, \dots, m).$$
Let $G$ be a set defined by
$$G = \left\{\delta = (\delta_1, \dots, \delta_m)' \in [0,1]^m \mid \delta = \hat\delta(h),\ h \in R_+\backslash\{0\}\right\},$$
where $\hat\delta(h)$ is the $m$-dimensional vector whose $j$th element is defined by
$$\hat\delta_j(h) = \begin{cases} h/z_j^2 & (h \le z_j^2) \\ 1 & (h > z_j^2) \end{cases} \quad (j = 1, \dots, m). \quad (A.4)$$
It follows from the fact $G \subseteq [0,1]^m\backslash\{0_m\}$ that
$$g(\delta^*) = \min_{\delta \in [0,1]^m\backslash\{0_m\}}g(\delta) \le \min_{\delta \in G}g(\delta) = \min_{h \in R_+\backslash\{0\}}g(\hat\delta(h)).$$
Moreover, it is clear that $\delta^* \in G$. This implies that
$$g(\delta^*) \ge \min_{h \in R_+\backslash\{0\}}g(\hat\delta(h)).$$

Therefore, we have $\delta^* = \hat\delta(\hat h)$, where
$$\hat h = \arg\min_{h \in R_+\backslash\{0\}}g(\hat\delta(h)).$$
Recall that $g(\delta) = \mathrm{MSC}(\theta)$ and
$$\theta_j = \begin{cases} \dfrac{d_j\delta_j}{1-\delta_j} & (\text{if } \delta_j \neq 1) \\ \infty & (\text{if } \delta_j = 1) \end{cases}.$$
Moreover, $\hat\beta_\theta$ is rewritten as
$$\hat\beta_\theta = (X'X + Q\Theta Q')^+X'y = Q(D + \Theta)^+Q'X'y = Q(D + \Theta)^+DQ'\,QD^+Q'X'y = QVQ'\hat\beta,$$
where $V = (D + \Theta)^+D$, $D$ and $\Theta$ are here regarded as $k \times k$ diagonal matrices padded with zeros, and $\hat\beta$ is the LSE of $\beta$ given by (2.3). For $j = 1, \dots, m$, the $j$th diagonal element of $V$ is given by
$$v_j = \begin{cases} \dfrac{d_j}{d_j + \theta_j} & (\text{when } \theta_j < \infty) \\ 0 & (\text{when } \theta_j = \infty) \end{cases}.$$
It follows from a simple calculation that $d_j/\{d_j + \hat\theta_j(h)\} = 1 - h/z_j^2$ when $h \le z_j^2$. Consequently, Lemma 1 is proved.

A.2. Proof of Lemma 2

From (A.2), we can see that
$$\phi(h) = \mathrm{MSC}(\hat\theta(h)) = f(r(\hat\delta(h)), u(\hat\delta(h))),$$
where $\hat\theta(h)$ is given by (2.12), $\hat\delta(h)$ is given by (A.4), and $r(\delta)$ and $u(\delta)$ are given by (A.1). Since $t_1, \dots, t_m$ are the order statistics of $z_1^2, \dots, z_m^2$, we have
$$r(\hat\delta(h)) = \hat\sigma_0^2 + \frac{1}{n}\sum_{j=1}^m\left[I(z_j^2 \ge h)\frac{h^2}{z_j^2} + \{1 - I(z_j^2 \ge h)\}z_j^2\right] = \hat\sigma_0^2 + \frac{1}{n}\sum_{j=1}^m\left[I(t_j \ge h)\left\{\left(\frac{h}{t_j}\right)^2 - 1\right\} + 1\right]t_j,$$
$$u(\hat\delta(h)) = 1 + m - \sum_{j=1}^m\left[I(z_j^2 \ge h)\frac{h}{z_j^2} + \{1 - I(z_j^2 \ge h)\}\right] = 1 + m - \sum_{j=1}^m\left\{I(t_j \ge h)\left(\frac{h}{t_j} - 1\right) + 1\right\},$$
where $I(x \ge h)$ is an indicator function, i.e., $I(x \ge h) = 1$ if $x \ge h$ and $I(x \ge h) = 0$ if $x < h$. It is clear that $r(\hat\delta(h))$ and $u(\hat\delta(h))$ are continuous functions of $h \in R_+\backslash\{0\}$. Therefore, property (P1) is proved. Notice that when $h \in R_a$, we have

$$r(\hat\delta(h)) = r_a(\hat\delta(h)) = \hat\sigma_0^2 + \frac{1}{n}\left(\sum_{j=1}^at_j + h^2\sum_{j=a+1}^m\frac{1}{t_j}\right) = \hat\sigma_0^2 + \frac{1}{n}(c_{1,a} + h^2c_{2,a}),$$
$$u(\hat\delta(h)) = u_a(\hat\delta(h)) = 1 + m - \left(\sum_{j=1}^a1 + h\sum_{j=a+1}^m\frac{1}{t_j}\right) = 1 + m - a - hc_{2,a},$$
where $c_{1,a}$ and $c_{2,a}$ are given by (2.16). Property (P3) is proved by these equations. Notice that $c_{2,m} = 0$ and
$$n\hat\sigma_0^2 + c_{1,m} = n\hat\sigma_0^2 + z'z = y'(I_n - J_n - XM^+X')y + y'XM^+X'y = n\hat\sigma_\infty^2.$$
These imply that $r_m(\hat\delta(h)) = \hat\sigma_\infty^2$ and $u_m(\hat\delta(h)) = 1$ when $h \in R_m$. Hence, property (P2) is proved.

A.3. Proof of Theorem 2

First, we derive the minimizer of $GIC(\theta)$ in (3.1) when $\hat\sigma_0^2$ in (2.8) is 0. It is easy to see that $\hat\sigma^2(0_m) = 0$ when $\hat\sigma_0^2 = 0$, and that $f_{GIC}(0, u) = 0$ holds for any $u \in [1, n]$, where $\hat\sigma^2(\theta)$ is given by (2.4) and $f_{GIC}(r,u)$ is given by (2.11). Moreover, $f_{GIC}(r,u) > 0$ holds for any $(r,u) \in (0, \hat\sigma_\infty^2] \times [1, n]$, where $\hat\sigma_\infty^2$ is given by (2.9). Notice that $\hat\sigma^2(\theta) = 0$ holds if and only if $\theta = 0_m$ and $\hat\sigma_0^2 = 0$, and that $\hat\theta(0) = 0_m$ holds, where $\hat\theta(h)$ is given by (2.12). These results imply that the minimizer of $GIC(\theta)$ is $\hat\theta(0)$ when $\hat\sigma_0^2 = 0$.

Next, we derive the minimizer of $GIC(\theta)$ when $\hat\sigma_0^2 \neq 0$. Then the domain of $f_{GIC}$ is included in $(0, \hat\sigma_\infty^2] \times [1, n)$. Moreover,
$$\lim_{h\to0}\psi_{GIC,0}(h) = -\alpha n\hat\sigma_0^2 < 0,$$
where $\psi_{GIC,a}(h)$ is given by (3.4). This indicates that some positive number $\xi$ exists such that $\phi_{GIC}(\xi) < \lim_{h\to0}\phi_{GIC}(h)$, where $\phi_{GIC}(h)$ is given by (3.2). Therefore, by using Lemma 1, the minimizer of $GIC(\theta)$ is given by $\hat\theta(\hat h_{GIC})$, where $\hat h_{GIC}$ is the minimizer of $\phi_{GIC}(h)$, i.e.,
$$\hat h_{GIC} = \arg\min_{h \in R_+\backslash\{0\}}\phi_{GIC}(h).$$
Hence, in order to prove Theorem 2 when $\hat\sigma_0^2 \neq 0$, it is enough to show $\hat h_{GIC} \in S_{GIC}$, where $S_{GIC}$ is the set of candidate minimizers of $\phi_{GIC}(h)$ given by (3.7). From (P1) and (P2) in Lemma 2, we can see that $\phi_{GIC}(h)$ is continuous at any $h \in R_+\backslash\{0\}$ and that $\phi_{GIC}(h) = \hat\sigma_\infty^2\exp(\alpha/n)$ when $h \in R_m$. The function $\psi_{GIC}(h)$ in (3.3) is also continuous at any $h \in (0, t_m]$ because
$$\psi_{GIC,a}(t_{a+1}) = -\alpha c_{2,a}t_{a+1}^2 + 2nt_{a+1} - \alpha(n\hat\sigma_0^2 + c_{1,a}) = -\alpha\left(c_{2,a} - \frac{1}{t_{a+1}}\right)t_{a+1}^2 + 2nt_{a+1} - \alpha(n\hat\sigma_0^2 + c_{1,a} + t_{a+1}) = -\alpha c_{2,a+1}t_{a+1}^2 + 2nt_{a+1} - \alpha(n\hat\sigma_0^2 + c_{1,a+1}) = \psi_{GIC,a+1}(t_{a+1}) \quad (a = 0, \dots, m-2),$$

where $c_{1,a}$ and $c_{2,a}$ are given by (2.16). It should be emphasized that the sign of $\partial\phi_{GIC,a}(h)/\partial h$ is equal to that of $\psi_{GIC,a}(h)$. Hence, we have the following necessary condition for $\hat h_{GIC}$:
$$\left[\{\psi_{GIC}(\hat h_{GIC}) = 0\} \cap \{\exists\epsilon_0 > 0 \ \text{s.t.}\ \forall\epsilon \in (0, \epsilon_0),\ \psi_{GIC}(\hat h_{GIC} - \epsilon) < 0\}\right] \cup \{\hat h_{GIC} = t_m \ \text{s.t.}\ \psi_{GIC,m-1}(t_m) < 0\}. \quad (A.5)$$
Notice that
$$\psi_{GIC,m-1}(t_m) = -\frac{\alpha}{t_m}t_m^2 + 2nt_m - \alpha(n\hat\sigma_\infty^2 - t_m) = 2nt_m - \alpha n\hat\sigma_\infty^2.$$
Hence, we derive
$$\psi_{GIC,m-1}(t_m) < 0 \iff \hat\sigma_\infty^2 > \frac{2}{\alpha}t_m. \quad (A.6)$$
Recall that $\psi_{GIC,a}(h)$ is a concave quadratic function. Hence, we have
$$\{\psi_{GIC}(\hat h_{GIC}) = 0\} \cap \{\exists\epsilon_0 > 0 \ \text{s.t.}\ \forall\epsilon \in (0, \epsilon_0),\ \psi_{GIC}(\hat h_{GIC} - \epsilon) < 0\} \iff \{\hat h_{GIC} \text{ is the smaller of the two real distinct roots or the double root of } \psi_{GIC,a}(h) = 0 \text{ which is included in } R_a\ (a = 0, 1, \dots, m-1)\}. \quad (A.7)$$
Notice that $\xi_{GIC,a}$ in (3.5) is the smaller of the two real distinct roots or the double root of $\psi_{GIC,a}(h) = 0$ when $\alpha^2c_{2,a}(n\hat\sigma_0^2 + c_{1,a}) \le n^2$. Therefore, from (A.5), (A.6) and (A.7), $\hat h_{GIC} \in S_{GIC}$ can be shown. Consequently, Theorem 2 is proved.

A.4. Proof of Corollary 1

It follows from (3.4) that
$$\psi_{GIC,a}(h) = -\alpha c_{2,a}\left(h - \frac{n}{\alpha c_{2,a}}\right)^2 + \frac{n^2}{\alpha c_{2,a}} - \alpha(n\hat\sigma_0^2 + c_{1,a}),$$
where $\hat\sigma_0^2$ is given by (2.8), and $c_{1,a}$ and $c_{2,a}$ are given by (2.16). This indicates that the $h$-coordinate of the vertex of $\psi_{GIC,a}(h)$ is $n/(\alpha c_{2,a})$. It follows from the inequalities $t_1 \le \cdots \le t_m$ that for any $a \in \{0, 1, \dots, m-1\}$,
$$c_{2,a} = \sum_{j=a+1}^m\frac{1}{t_j} \le \sum_{j=a+1}^m\frac{1}{t_{a+1}} = \frac{m-a}{t_{a+1}} \le \frac{m}{t_{a+1}}.$$
Hence, we have
$$\frac{n}{\alpha c_{2,a}} \ge \frac{n}{\alpha m}t_{a+1} \quad (a = 0, 1, \dots, m-1). \quad (A.8)$$
From this result, we can see that $n/(\alpha c_{2,a}) \ge t_{a+1}$ if $\alpha \le n/m$. This indicates that $\psi_{GIC,a}(h)$ is a strictly increasing function of $h \in R_a$ in (2.15) if $\alpha \le n/m$, because the $h$-coordinate of the vertex of $\psi_{GIC,a}(h)$ is larger than or equal to $t_{a+1}$, which is the right end point of $R_a$.

Hence, it should be emphasized that $\psi_{GIC}(h)$ is a strictly increasing function of $h \in (0, t_m]$ when $\alpha \le n/m$. Recall that $\psi_{GIC}(h)$ is continuous at any $h \in (0, t_m]$ and that $\lim_{h\to0}\psi_{GIC,0}(h) < 0$ when $\hat\sigma_0^2 \neq 0$, where $\hat\sigma_0^2$ is given by (2.8). Therefore, when $\hat\sigma_0^2 \neq 0$ and $\alpha \le n/m$, we have
$$\psi_{GIC,m-1}(t_m) \ge 0 \implies \text{the unique solution of } \psi_{GIC}(h) = 0 \text{ exists in } h \in (0, t_m] \implies \{\#(A_{GIC}) = 1\} \cap \{T_{GIC} = \emptyset\},$$
$$\psi_{GIC,m-1}(t_m) < 0 \implies \text{there is no solution of } \psi_{GIC}(h) = 0 \text{ in } h \in (0, t_m] \implies \{A_{GIC} = \emptyset\} \cap \{T_{GIC} = \{t_m\}\},$$
where $A_{GIC}$ and $T_{GIC}$ are given by (3.6). From the above results and (A.6), Corollary 1 can be proved.

A.5. Proof of Theorem 3

First, we derive the minimizer of $\mathrm{EGCV}(\theta)$ in (4.2) when $\hat\sigma_0^2 = 0$ and $b \neq 0$, where $\hat\sigma_0^2$ and $b$ are given by (2.8) and (4.5), respectively. It is easy to see that $\hat\sigma^2(0_m) = 0$ when $\hat\sigma_0^2 = 0$, that $\mathrm{df}(\theta) < n$ when $b \neq 0$, and that $f_{EGCV}(0, u) = 0$ holds for any $u \in [1, n)$, where $\hat\sigma^2(\theta)$ is given by (2.4) and $f_{EGCV}(r,u)$ is given by (4.1). Moreover, $f_{EGCV}(r,u) > 0$ holds for any $(r,u) \in (0, \hat\sigma_\infty^2] \times [1, n)$, where $\hat\sigma_\infty^2$ is given by (2.9). Notice that $\hat\sigma^2(\theta) = 0$ holds if and only if $\theta = 0_m$ and $\hat\sigma_0^2 = 0$, and that $\hat\theta(0) = 0_m$ holds, where $\hat\theta(h)$ is given by (2.12). These results imply that the minimizer of $\mathrm{EGCV}(\theta)$ is $\hat\theta(0)$ when $\hat\sigma_0^2 = 0$ and $b \neq 0$.

Next, we consider the case where $\hat\sigma_0^2 \neq 0$ or $b = 0$. Since $u < n$ is required in $f_{EGCV}(r,u)$, it is necessary to satisfy the inequality $\mathrm{df}(\theta) < n$, where $\mathrm{df}(\theta)$ is given by (2.5). From (2.7), we can see that when $b = 0$, $\mathrm{df}(\theta) < n$ implies $\theta \neq 0_m$ and hence $\hat\sigma^2(\theta) \neq 0$ (when $\hat\sigma_0^2 \neq 0$, we have $\hat\sigma^2(\theta) \ge \hat\sigma_0^2 > 0$ directly). This indicates that the domain of $f_{EGCV}$ is included in $(0, \hat\sigma_\infty^2] \times [1, n)$. Notice that
$$\phi_{EGCV,0}(h) = \frac{\hat\sigma_0^2 + h^2c_{2,0}/n}{(b + hc_{2,0}/n)^\alpha}, \quad \psi_{EGCV,0}(h) = -(\alpha-2)c_{2,0}h^2 + 2nbh - \alpha n\hat\sigma_0^2,$$
where $\phi_{EGCV,a}(h)$, $\psi_{EGCV,a}(h)$ and $c_{2,a}$ are given by (4.4), (4.7) and (2.16), respectively. Recall that $b = 0$ implies $\hat\sigma_0^2 = 0$. It follows from the above equations and the assumption $\alpha > 2$ that
$$\lim_{h\to0}\phi_{EGCV,0}(h) = \begin{cases} \hat\sigma_0^2/b^\alpha & (\hat\sigma_0^2 \neq 0) \\ \infty & (b = 0) \end{cases}, \quad \lim_{h\to0}\psi_{EGCV,0}(h) = \begin{cases} -\alpha n\hat\sigma_0^2 < 0 & (\hat\sigma_0^2 \neq 0) \\ 0 & (b = 0) \end{cases}.$$
These results imply that some positive number $\xi$ exists such that $\phi_{EGCV}(\xi) < \lim_{h\to0}\phi_{EGCV}(h)$ when $\hat\sigma_0^2 \neq 0$ or $b = 0$, where $\phi_{EGCV}(h)$ is given by (4.3). Therefore, by using Lemma 1, the minimizer of $\mathrm{EGCV}(\theta)$ is given by $\hat\theta(\hat h_{EGCV})$, where $\hat h_{EGCV}$ is the minimizer of $\phi_{EGCV}(h)$, i.e.,
$$\hat h_{EGCV} = \arg\min_{h \in R_+\backslash\{0\}}\phi_{EGCV}(h).$$

From (P1) and (P2) in Lemma 2, we can see that $\phi_{EGCV}(h)$ is continuous at any $h \in R_+\backslash\{0\}$ and that $\phi_{EGCV}(h) = \hat\sigma_\infty^2/(1-n^{-1})^\alpha$ when $h \in R_m$, where $\hat\sigma_\infty^2$ is given by (2.9). The function $\psi_{EGCV}(h)$ given by (4.6) is also continuous at any $h \in (0, t_m]$ because, using $c_{2,a} = c_{2,a+1} + 1/t_{a+1}$ and $c_{1,a} = c_{1,a+1} - t_{a+1}$,
$$\psi_{EGCV,a}(t_{a+1}) = -(\alpha-2)c_{2,a}t_{a+1}^2 + 2(a + nb)t_{a+1} - \alpha(n\hat\sigma_0^2 + c_{1,a}) = -(\alpha-2)c_{2,a+1}t_{a+1}^2 + 2(a + 1 + nb)t_{a+1} - \alpha(n\hat\sigma_0^2 + c_{1,a+1}) = \psi_{EGCV,a+1}(t_{a+1}) \quad (a = 0, \dots, m-2),$$
where $c_{1,a}$ is given by (2.16). Notice that
$$\psi_{EGCV,m-1}(t_m) = -\frac{\alpha-2}{t_m}t_m^2 + 2(n-2)t_m - \alpha(n\hat\sigma_\infty^2 - t_m) = 2(n-1)t_m - \alpha n\hat\sigma_\infty^2.$$
Hence, in the same way as in the proof of Theorem 2, we can show that the set of candidate minimizers of $\phi_{EGCV}(h)$ is given by $S_{EGCV}$ in (4.8). Consequently, Theorem 3 is proved.

A.6. Proof of Corollary 2

It follows from (4.7) that
$$\psi_{EGCV,a}(h) = -(\alpha-2)c_{2,a}\left\{h - \frac{a+nb}{(\alpha-2)c_{2,a}}\right\}^2 + \frac{(a+nb)^2}{(\alpha-2)c_{2,a}} - \alpha(n\hat\sigma_0^2 + c_{1,a}),$$
where $\hat\sigma_0^2$ and $b$ are given by (2.8) and (4.5), respectively, and $c_{1,a}$ and $c_{2,a}$ are given by (2.16). This indicates that the $h$-coordinate of the vertex of $\psi_{EGCV,a}(h)$ is $(a+nb)/\{(\alpha-2)c_{2,a}\}$. From (A.8), we derive
$$\frac{a+nb}{(\alpha-2)c_{2,a}} \ge \frac{a+nb}{(\alpha-2)m}t_{a+1} \ge \frac{n-m-1}{(\alpha-2)m}t_{a+1} \quad (a = 0, 1, \dots, m-1).$$
From this result, we can see that $(a+nb)/\{(\alpha-2)c_{2,a}\} \ge t_{a+1}$ if $\alpha \le (n+m-1)/m$. This indicates that $\psi_{EGCV,a}(h)$ is a strictly increasing function of $h \in R_a$ in (2.15) if $\alpha \le (n+m-1)/m$, because the $h$-coordinate of the vertex of $\psi_{EGCV,a}(h)$ is larger than or equal to $t_{a+1}$, which is the right end point of $R_a$. It should be emphasized that $\psi_{EGCV}(h)$ is a strictly increasing function of $h \in (0, t_m]$ when $\alpha \le (n+m-1)/m$. Hence, in the same way as in the proof of Corollary 1, we can prove Corollary 2.


More information

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS020) p.3863 Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Jinfang Wang and

More information

Regression I: Mean Squared Error and Measuring Quality of Fit

Regression I: Mean Squared Error and Measuring Quality of Fit Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving

More information

Business Statistics. Tommaso Proietti. Model Evaluation and Selection. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Model Evaluation and Selection. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Model Evaluation and Selection Predictive Ability of a Model: Denition and Estimation We aim at achieving a balance between parsimony

More information

APPLICATION OF RIDGE REGRESSION TO MULTICOLLINEAR DATA

APPLICATION OF RIDGE REGRESSION TO MULTICOLLINEAR DATA Journal of Research (Science), Bahauddin Zakariya University, Multan, Pakistan. Vol.15, No.1, June 2004, pp. 97-106 ISSN 1021-1012 APPLICATION OF RIDGE REGRESSION TO MULTICOLLINEAR DATA G. R. Pasha 1 and

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Variable selection in linear regression through adaptive penalty selection

Variable selection in linear regression through adaptive penalty selection PREPRINT 國立臺灣大學數學系預印本 Department of Mathematics, National Taiwan University www.math.ntu.edu.tw/~mathlib/preprint/2012-04.pdf Variable selection in linear regression through adaptive penalty selection

More information

Nonparametric Regression. Badr Missaoui

Nonparametric Regression. Badr Missaoui Badr Missaoui Outline Kernel and local polynomial regression. Penalized regression. We are given n pairs of observations (X 1, Y 1 ),...,(X n, Y n ) where Y i = r(x i ) + ε i, i = 1,..., n and r(x) = E(Y

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Ridge Estimation and its Modifications for Linear Regression with Deterministic or Stochastic Predictors

Ridge Estimation and its Modifications for Linear Regression with Deterministic or Stochastic Predictors Ridge Estimation and its Modifications for Linear Regression with Deterministic or Stochastic Predictors James Younker Thesis submitted to the Faculty of Graduate and Postdoctoral Studies in partial fulfillment

More information

Estimation of parametric functions in Downton s bivariate exponential distribution

Estimation of parametric functions in Downton s bivariate exponential distribution Estimation of parametric functions in Downton s bivariate exponential distribution George Iliopoulos Department of Mathematics University of the Aegean 83200 Karlovasi, Samos, Greece e-mail: geh@aegean.gr

More information

On the equivalence of confidence interval estimation based on frequentist model averaging and least-squares of the full model in linear regression

On the equivalence of confidence interval estimation based on frequentist model averaging and least-squares of the full model in linear regression Working Paper 2016:1 Department of Statistics On the equivalence of confidence interval estimation based on frequentist model averaging and least-squares of the full model in linear regression Sebastian

More information

Theorems. Least squares regression

Theorems. Least squares regression Theorems In this assignment we are trying to classify AML and ALL samples by use of penalized logistic regression. Before we indulge on the adventure of classification we should first explain the most

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

Regularization: Ridge Regression and the LASSO

Regularization: Ridge Regression and the LASSO Agenda Wednesday, November 29, 2006 Agenda Agenda 1 The Bias-Variance Tradeoff 2 Ridge Regression Solution to the l 2 problem Data Augmentation Approach Bayesian Interpretation The SVD and Ridge Regression

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Institute of Statistics and Econometrics Georg-August-University Göttingen Department of Statistics

More information

2.2 Classical Regression in the Time Series Context

2.2 Classical Regression in the Time Series Context 48 2 Time Series Regression and Exploratory Data Analysis context, and therefore we include some material on transformations and other techniques useful in exploratory data analysis. 2.2 Classical Regression

More information

An Information Criteria for Order-restricted Inference

An Information Criteria for Order-restricted Inference An Information Criteria for Order-restricted Inference Nan Lin a, Tianqing Liu 2,b, and Baoxue Zhang,2,b a Department of Mathematics, Washington University in Saint Louis, Saint Louis, MO 633, U.S.A. b

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

The Multiple Regression Model Estimation

The Multiple Regression Model Estimation Lesson 5 The Multiple Regression Model Estimation Pilar González and Susan Orbe Dpt Applied Econometrics III (Econometrics and Statistics) Pilar González and Susan Orbe OCW 2014 Lesson 5 Regression model:

More information

High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis. Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi

High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis. Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi Faculty of Science and Engineering, Chuo University, Kasuga, Bunkyo-ku,

More information

Discrepancy-Based Model Selection Criteria Using Cross Validation

Discrepancy-Based Model Selection Criteria Using Cross Validation 33 Discrepancy-Based Model Selection Criteria Using Cross Validation Joseph E. Cavanaugh, Simon L. Davies, and Andrew A. Neath Department of Biostatistics, The University of Iowa Pfizer Global Research

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

High-dimensional regression

High-dimensional regression High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and

More information

MEANINGFUL REGRESSION COEFFICIENTS BUILT BY DATA GRADIENTS

MEANINGFUL REGRESSION COEFFICIENTS BUILT BY DATA GRADIENTS Advances in Adaptive Data Analysis Vol. 2, No. 4 (2010) 451 462 c World Scientific Publishing Company DOI: 10.1142/S1793536910000574 MEANINGFUL REGRESSION COEFFICIENTS BUILT BY DATA GRADIENTS STAN LIPOVETSKY

More information

Econometrics I, Estimation

Econometrics I, Estimation Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of

More information

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population

More information

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods Chapter 4 Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods 4.1 Introduction It is now explicable that ridge regression estimator (here we take ordinary ridge estimator (ORE)

More information

Order Selection for Vector Autoregressive Models

Order Selection for Vector Autoregressive Models IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 51, NO. 2, FEBRUARY 2003 427 Order Selection for Vector Autoregressive Models Stijn de Waele and Piet M. T. Broersen Abstract Order-selection criteria for vector

More information

Introduction to Statistical modeling: handout for Math 489/583

Introduction to Statistical modeling: handout for Math 489/583 Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 6: Model complexity scores (v3) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 34 Estimating prediction error 2 / 34 Estimating prediction error We saw how we can estimate

More information

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor - p. 1/15 Today s class Bias-Variance tradeoff. Penalized regression. Cross-validation. - p. 2/15 Bias-variance

More information

Holzmann, Min, Czado: Validating linear restrictions in linear regression models with general error structure

Holzmann, Min, Czado: Validating linear restrictions in linear regression models with general error structure Holzmann, Min, Czado: Validating linear restrictions in linear regression models with general error structure Sonderforschungsbereich 386, Paper 478 (2006) Online unter: http://epub.ub.uni-muenchen.de/

More information

Optimum designs for model. discrimination and estimation. in Binary Response Models

Optimum designs for model. discrimination and estimation. in Binary Response Models Optimum designs for model discrimination and estimation in Binary Response Models by Wei-Shan Hsieh Advisor Mong-Na Lo Huang Department of Applied Mathematics National Sun Yat-sen University Kaohsiung,

More information

Föreläsning /31

Föreläsning /31 1/31 Föreläsning 10 090420 Chapter 13 Econometric Modeling: Model Speci cation and Diagnostic testing 2/31 Types of speci cation errors Consider the following models: Y i = β 1 + β 2 X i + β 3 X 2 i +

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 05 Full points may be obtained for correct answers to eight questions Each numbered question (which may have several parts) is worth

More information

Estimating prediction error in mixed models

Estimating prediction error in mixed models Estimating prediction error in mixed models benjamin saefken, thomas kneib georg-august university goettingen sonja greven ludwig-maximilians-university munich 1 / 12 GLMM - Generalized linear mixed models

More information

Large Sample Theory For OLS Variable Selection Estimators

Large Sample Theory For OLS Variable Selection Estimators Large Sample Theory For OLS Variable Selection Estimators Lasanthi C. R. Pelawa Watagoda and David J. Olive Southern Illinois University June 18, 2018 Abstract This paper gives large sample theory for

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

Empirical Economic Research, Part II

Empirical Economic Research, Part II Based on the text book by Ramanathan: Introductory Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 7, 2011 Outline Introduction

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results

More information

Tuning parameter selectors for the smoothly clipped absolute deviation method

Tuning parameter selectors for the smoothly clipped absolute deviation method Tuning parameter selectors for the smoothly clipped absolute deviation method Hansheng Wang Guanghua School of Management, Peking University, Beijing, China, 100871 hansheng@gsm.pku.edu.cn Runze Li Department

More information

Comparison of Some Improved Estimators for Linear Regression Model under Different Conditions

Comparison of Some Improved Estimators for Linear Regression Model under Different Conditions Florida International University FIU Digital Commons FIU Electronic Theses and Dissertations University Graduate School 3-24-2015 Comparison of Some Improved Estimators for Linear Regression Model under

More information

Key Words: first-order stationary autoregressive process, autocorrelation, entropy loss function,

Key Words: first-order stationary autoregressive process, autocorrelation, entropy loss function, ESTIMATING AR(1) WITH ENTROPY LOSS FUNCTION Małgorzata Schroeder 1 Institute of Mathematics University of Białystok Akademicka, 15-67 Białystok, Poland mszuli@mathuwbedupl Ryszard Zieliński Department

More information

AR-order estimation by testing sets using the Modified Information Criterion

AR-order estimation by testing sets using the Modified Information Criterion AR-order estimation by testing sets using the Modified Information Criterion Rudy Moddemeijer 14th March 2006 Abstract The Modified Information Criterion (MIC) is an Akaike-like criterion which allows

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Model selection using penalty function criteria

Model selection using penalty function criteria Model selection using penalty function criteria Laimonis Kavalieris University of Otago Dunedin, New Zealand Econometrics, Time Series Analysis, and Systems Theory Wien, June 18 20 Outline Classes of models.

More information

Spatial Process Estimates as Smoothers: A Review

Spatial Process Estimates as Smoothers: A Review Spatial Process Estimates as Smoothers: A Review Soutir Bandyopadhyay 1 Basic Model The observational model considered here has the form Y i = f(x i ) + ɛ i, for 1 i n. (1.1) where Y i is the observed

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information