A Fast Algorithm for Optimizing Ridge Parameters in a Generalized Ridge Regression by Minimizing an Extended GCV Criterion
TR-No., Hiroshima Statistical Research Group, pp. 1–24

Mineaki Ohishi, Hirokazu Yanagihara and Yasunori Fujikoshi
Department of Mathematics, Graduate School of Science, Hiroshima University
Kagamiyama, Higashi-Hiroshima, Hiroshima, Japan

Abstract

In this paper, we deal with the optimization of ridge parameters in a generalized ridge regression by minimizing a model selection criterion. Optimization methods based on minimizing the generalized $C_p$ ($GC_p$) criterion and the GCV criterion have important advantages and serious disadvantages. The most important advantage shared by the two methods is that the optimization is very fast, because the minimizers of the two criteria are given in closed form. Their serious disadvantage is that they do not work well when the number of explanatory variables is larger than the sample size. In order to remove this disadvantage while keeping the advantage, we first extend the GCV criterion by changing the second power in its denominator to an $\alpha$-power, where $\alpha$ is a positive number larger than 2; we call the result the extended GCV (EGCV) criterion. Next, we propose a fast optimization algorithm that minimizes the EGCV by using a finite set of specific candidate minimizers of the EGCV. Numerical examinations verify that the optimization method based on minimizing the EGCV performs better than those based on minimizing the existing criteria. (Last Modified: April 14, 2017)

Key words: Generalized ridge regression, Generalized cross-validation criterion, Linear regression model, Model selection criterion, Ridge parameters, Optimization of ridge parameters.

E-mail address: mineaki-ohishi@hiroshima-u.ac.jp (Mineaki Ohishi)
1. Introduction

A linear regression model is one of the most important tools for clarifying the relationship between a scalar response variable and one or more explanatory variables. Let $y = (y_1, \ldots, y_n)'$ be an $n$-dimensional vector of response variables and $X$ an $n \times k$ matrix of nonstochastic centralized explanatory variables ($X'1_n = 0_k$) with ${\rm rank}(X) = m \le \min\{n-1, k\}$, where $n$ is the sample size, $k$ is the number of explanatory variables, $1_n$ is an $n$-dimensional vector of ones, and $0_k$ is a $k$-dimensional vector of zeros. The linear regression model is
$$ y = \mu 1_n + X\beta + \varepsilon, $$
where $\mu$ is an unknown location parameter, $\beta$ is a $k$-dimensional vector of unknown regression coefficients, and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)'$ is an $n$-dimensional vector of independent error variables from a distribution with mean $0$ and variance $\sigma^2$.

One way to avoid the serious problem of multicollinearity among explanatory variables is the generalized ridge regression (GRR) proposed by Hoerl and Kennard (1970), which shrinks an estimate of $\beta$ towards $0_k$ via multiple ridge parameters $\theta_1, \ldots, \theta_k$, where $\theta_j \in R_+ = \{\theta \in \mathbb{R} \mid \theta \ge 0\}$ ($j = 1, \ldots, k$). Although the GRR was proposed 45 years ago, it is still used in many research fields (e.g., Liu et al., 2011; Shen et al., 2013). Since the GRR estimator (GRRE) of $\beta$ changes depending on $\theta_1, \ldots, \theta_k$, how to optimize $\theta_1, \ldots, \theta_k$ is an important problem. An optimization method based on minimizing a model selection criterion (MSC) with respect to $\theta_1, \ldots, \theta_k$ is one of the common approaches (see, e.g., Lawless, 1981; Walker & Page, 2001; Yanagihara, 2013). Most MSCs consist of two terms: a discrepancy term that measures the goodness of fit of a model, and a penalty term that penalizes the complexity of a model. The generalized $C_p$ ($GC_p$) criterion (see Atkinson, 1980; Nagai et al., 2012) and the generalized cross-validation (GCV) criterion (see Craven & Wahba, 1979; Ye, 1998; Yanagihara, 2013) measure the goodness of fit of a model via the residual sum of squares.
An optimization method based on minimizing the $GC_p$ criterion (called the $GC_p$-minimization method) has the following advantages and disadvantage:

Advantages of the $GC_p$-minimization method.
(1) The optimization is very fast because the minimizers of the $GC_p$ criterion can be derived in closed form (see Nagai et al., 2012).
(2) The strength of the penalty term can be changed freely.

A disadvantage of the $GC_p$-minimization method.
(1) The method cannot be used when $n$ is smaller than $k$, because the $GC_p$ requires an asymptotically unbiased estimator of $\sigma^2$.

Since we encounter many opportunities to analyze data with a huge number of explanatory variables, i.e., high-dimensional explanatory variables, we have to say that the $GC_p$-minimization method has a serious disadvantage. Since this problem arises from the requirement of an asymptotically unbiased estimator of $\sigma^2$, it can be solved by using a model selection criterion that does not require such an estimator. The GCV criterion is one of the model selection criteria that do not require an asymptotically unbiased estimator of $\sigma^2$. However, an optimization method based on minimizing the GCV criterion (called the GCV-minimization method) also has the following advantages and disadvantage:

Advantages of the GCV-minimization method.
(1) The optimization is very fast because the minimizers of the GCV criterion can be derived in closed form (see Yanagihara, 2013).
(2) The method can be used even when $n$ is smaller than $k$, because the GCV does not require an asymptotically unbiased estimator of $\sigma^2$.

A disadvantage of the GCV-minimization method.
(1) The strength of the penalty term is fixed.

Although the GCV-minimization method can be used when $n$ is smaller than $k$, it does not work well in that case, because the GRRE is then hardly shrunk towards $0_k$ (see Yanagihara, 2013). This problem would be avoided if the strength of the penalty term in the GCV could be changed freely. On the other hand, the goodness of fit of a model can also be measured by the Kullback-Leibler (KL) discrepancy (Kullback & Leibler, 1951). Hence, we can define a generalized information criterion (GIC; Nishii, 1984) for the GRR under a normality assumption. An optimization method based on minimizing the GIC (called the GIC-minimization method) also has the following advantages and disadvantage:

Advantages of the GIC-minimization method.
(1) The method can be used even when $n$ is smaller than $k$, because the GIC does not require an asymptotically unbiased estimator of $\sigma^2$.
(2) The strength of the penalty on the complexity of a model can be changed freely.

A disadvantage of the GIC-minimization method.
(1) The optimization is slow because the minimizers of the GIC are derived by an iterative calculation, e.g., the Newton-Raphson method.
In this paper, we first propose a fast optimization algorithm that solves the minimization problem of the GIC by using a finite set of specific candidate minimizers of the GIC. Unfortunately, we also show that the GIC-minimization method does not work well when $n$ is smaller than $k$, even if the strength of the penalty term is increased. Therefore, we extend the GCV so that the strength of the penalty term can be changed freely. We call this criterion the extended GCV (EGCV), and the optimization method based on minimizing it the EGCV-minimization method. We also propose a fast optimization algorithm that solves the minimization problem of the EGCV by using a finite set of specific candidate minimizers of the EGCV.

This paper is organized as follows: In Section 2, we describe several results needed for deriving the main theorems. In Section 3, we give a fast optimization algorithm for the GIC-minimization method and show why the GIC-minimization method does not work well when $k$ is larger than $n$. In Section 4, we propose the EGCV criterion and a fast optimization algorithm that solves the minimization problem of the EGCV. Numerical examinations are conducted in Section 5. Technical details are provided in the Appendix.

2. Preliminaries

It follows from the results in Yanagihara (2013) that the GRRE of $\beta$ and the hat matrix of the GRR are invariant to any changes in $\theta_{m+1}, \ldots, \theta_k$ when $m < k$. From this fact, we set $\theta_{m+1} = \cdots = \theta_k = \infty$ to reduce the variance of the GRRE of $\beta$. Therefore, it is enough to minimize an MSC with respect to $\theta_1, \ldots, \theta_m$ in order to optimize the ridge parameters in the GRR.
Let $Q$ be a $k \times k$ orthogonal matrix that diagonalizes $X'X$, i.e.,
$$ Q' X'X Q = \begin{pmatrix} D & O_{m,k-m} \\ O_{k-m,m} & O_{k-m,k-m} \end{pmatrix}, $$
where $D$ is the $m \times m$ diagonal matrix defined by
$$ D = {\rm diag}(d_1, \ldots, d_m), \quad d_1 \ge \cdots \ge d_m \ \mbox{are the nonzero eigenvalues of}\ X'X, \quad (2.1) $$
and $O_{k,m}$ is a $k \times m$ matrix of zeros. Let $\theta$ be an $m$-dimensional vector given by $\theta = (\theta_1, \ldots, \theta_m)' \in R_+^m$, and let $\Theta$ be a $k \times k$ diagonal matrix given by $\Theta = {\rm diag}(\theta_1, \ldots, \theta_m, 0, \ldots, 0)$, where $R_+^m$ is the $m$th Cartesian power of $R_+$. Notice that $d_1, \ldots, d_m$ are positive because $X'X$ is a positive semidefinite matrix. Moreover, let $M_\theta$ be a $k \times k$ matrix defined by $M_\theta = X'X + Q \Theta Q'$. In particular, we write $M_\theta = M$ when $\theta = 0_m$. Then, the GRRE of $\beta$ is expressed as
$$ \hat{\beta}_\theta = M_\theta^+ X'y, \quad (2.2) $$
where $A^+$ is the Moore-Penrose inverse of a matrix $A$ (for details of the Moore-Penrose inverse, see, e.g., Harville, 1997, chap. 20). It is a well-known fact that the least squares estimator (LSE) of $\beta$ is given by
$$ \hat{\beta} = M^+ X'y. \quad (2.3) $$
It is clear that (2.2) coincides with (2.3) when $\theta = 0_m$. Let $\hat{\mu}$ be the LSE of $\mu$ given by $\hat{\mu} = \bar{y}$, where $\bar{y}$ is the sample mean of $y_1, \ldots, y_n$, i.e., $\bar{y} = \sum_{i=1}^n y_i/n$. The equation (2.2) and $\hat{\mu}$ imply the predictive value of $y$:
$$ \hat{y}_\theta = \hat{\mu} 1_n + X \hat{\beta}_\theta = (J_n + X M_\theta^+ X') y, $$
where $J_n$ is the $n \times n$ projection matrix defined by $J_n = 1_n 1_n'/n$. By using $M_\theta^+$ and $\hat{y}_\theta$, we define an estimator of $\sigma^2$ and a generalized degrees of freedom (df) of the GRR, which are the main bodies of the discrepancy and penalty terms of an MSC, respectively, as
$$ \hat{\sigma}^2(\theta) = \frac{1}{n}(y - \hat{y}_\theta)'(y - \hat{y}_\theta) = \frac{1}{n} y'(I_n - J_n - X M_\theta^+ X')^2 y, \quad (2.4) $$
$$ {\rm df}(\theta) = {\rm tr}(J_n + X M_\theta^+ X') = 1 + {\rm tr}(M_\theta^+ M). \quad (2.5) $$
Most MSCs for the GRR are defined as a bivariate function of $\hat{\sigma}^2(\theta)$ and ${\rm df}(\theta)$. Let $z$ be an $m$-dimensional vector defined by
$$ z = (z_1, \ldots, z_m)' = (D^{-1/2}, O_{m,k-m}) Q' X' y = D^{-1/2} Q_1' X' y, \quad (2.6) $$
where $Q_1$ is the $k \times m$ matrix defined by $Q = (Q_1, Q_2)$. Here, we assume that none of $z_1, \ldots, z_m$ is $0$. If $z_j$ is accidentally $0$, then we set $d_j = 0$ and use $(z_1, \ldots, z_{j-1}, z_{j+1}, \ldots, z_m)'$ as $z$. By using the result in Yanagihara (2013), $\hat{\sigma}^2(\theta)$ in (2.4) and ${\rm df}(\theta)$ in (2.5) can be rewritten as
$$ \hat{\sigma}^2(\theta) = \frac{1}{n}\left( n \hat{\sigma}_0^2 + \sum_{j=1}^m z_j^2 \frac{\theta_j^2}{(d_j + \theta_j)^2} \right), \quad {\rm df}(\theta) = 1 + m - \sum_{j=1}^m \frac{\theta_j}{d_j + \theta_j}, \quad (2.7) $$
where $\hat{\sigma}_0^2$ is the maximum likelihood estimator of $\sigma^2$ under normality, which is given by
$$ \hat{\sigma}_0^2 = \frac{1}{n} y'(I_n - J_n - X M^+ X') y. \quad (2.8) $$
It should be kept in mind that $\hat{\sigma}_0^2 = 0$ holds in most cases of linear regression with high-dimensional explanatory variables. Notice that $\theta_j/(d_j + \theta_j)$ monotonically increases in $\theta_j$, that $\hat{\sigma}^2(0_m) = \hat{\sigma}_0^2$ and ${\rm df}(0_m) = m + 1$, and that
$$ \lim_{\theta_1, \ldots, \theta_m \to \infty} \hat{\sigma}^2(\theta) = \hat{\sigma}^2 = \frac{1}{n} y'(I_n - J_n) y, \quad \lim_{\theta_1, \ldots, \theta_m \to \infty} {\rm df}(\theta) = 1. \quad (2.9) $$
Hence, the ranges of $\hat{\sigma}^2(\theta)$ and ${\rm df}(\theta)$ are given by
$$ \hat{\sigma}^2(\theta) \in [\hat{\sigma}_0^2, \hat{\sigma}^2], \quad {\rm df}(\theta) \in [1, m + 1]. $$
Let $f(r, u)$ be a bivariate function which satisfies the following assumptions:
(A1) $f(r, u)$ is continuous at any $(r, u) \in (0, \hat{\sigma}^2] \times [1, n)$.
(A2) $f(r, u) > 0$ for any $(r, u) \in (0, \hat{\sigma}^2] \times [1, n)$.
(A3) $f(r, u)$ is first-order partially differentiable at any $(r, u) \in (0, \hat{\sigma}^2] \times [1, n)$, and
$$ f_r(r, u) = \frac{\partial}{\partial r} f(r, u) > 0, \quad f_u(r, u) = \frac{\partial}{\partial u} f(r, u) > 0, \quad (r, u) \in (0, \hat{\sigma}^2] \times [1, n). $$
Let $\mathcal{D}$ be the domain of $f$. It is easy to see that $m = n - 1$ implies $\hat{\sigma}_0^2 = 0$. Recall that $m \le n - 1$. Hence, we have $\mathcal{D} \subseteq (0, \hat{\sigma}^2] \times [1, n)$. By using the bivariate function $f$, an MSC for the GRR can be expressed as
$$ {\rm MSC}(\theta) = f(\hat{\sigma}^2(\theta), {\rm df}(\theta)). \quad (2.10) $$
The relationship between the existing criteria and the bivariate function $f$ is as follows:
$$ f(r, u) = \begin{cases} f_{GC_p}(r, u) = nr/s_0^2 + \alpha u & (GC_p:\ \mbox{when}\ \hat{\sigma}_0^2 \ne 0) \\ f_{\rm GCV}(r, u) = r(1 - u/n)^{-2} & ({\rm GCV}:\ \mbox{when}\ u \ne n) \\ f_{\rm GIC}(r, u) = r \exp(\alpha u/n) & ({\rm GIC}) \end{cases}, \quad (2.11) $$
where $\alpha$ is some positive value expressing the strength of the penalty term, and $s_0^2$ is the ordinary unbiased estimator of $\sigma^2$ when $m < n - 1$, which is given by $s_0^2 = n \hat{\sigma}_0^2/(n - m - 1)$. The $GC_p$ includes several famous MSCs; e.g., the $GC_p$ coincides with the $C_p$ proposed by Mallows (1973) when $\alpha = 2$, and with the modified $C_p$ ($MC_p$) proposed by Fujikoshi and Satoh (1997) and Yanagihara et al. (2009) when $\alpha = 1 + 2/(n - m - 1)$. Moreover, the original GIC under normality is expressed as $n \log r + \alpha u$; the GIC in this paper is defined by an exponential transformation of the original GIC divided by $n$. The GIC also includes several famous MSCs; e.g., the GIC coincides with the AIC proposed by Akaike (1973) when $\alpha = 2$, with the HQC proposed by Hannan and Quinn (1979) when $\alpha = 2 \log \log n$, and with the BIC proposed by Schwarz (1978) when $\alpha = \log n$. In order to express an optimized GRRE of $\beta$ systematically, we prepare the following class of $\theta$ for any $h \in R_+$:
$$ \hat{\theta}(h) = (\hat{\theta}_1(h), \ldots, \hat{\theta}_m(h))', \quad \hat{\theta}_j(h) = \begin{cases} d_j h/(z_j^2 - h) & (h \le z_j^2) \\ \infty & (h > z_j^2) \end{cases}. \quad (2.12) $$
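To make the preceding definitions concrete, the spectral quantities in (2.6)-(2.8), the path $\hat{\theta}(h)$ of (2.12), and the criteria in (2.11) can be computed as follows. This is a minimal numerical sketch under our own (hypothetical) function names, assuming a column-centered $X$; along $\hat{\theta}(h)$ we use $\theta_j/(d_j + \theta_j) = \min(h/z_j^2, 1)$, which follows directly from (2.12):

```python
import numpy as np

def spectral_parts(X, y):
    """Eigenvalues d_1 >= ... >= d_m of X'X, the vector z of (2.6),
    and sigma_0^2 of (2.8) via n*sigma_0^2 = y'(I-J)y - sum_j z_j^2."""
    d, Q = np.linalg.eigh(X.T @ X)            # ascending eigenvalues
    order = np.argsort(d)[::-1]               # reorder: d_1 >= ... >= d_m
    d, Q = d[order], Q[:, order]
    m = int(np.sum(d > 1e-10))                # m = rank(X)
    D, Q1 = d[:m], Q[:, :m]
    z = (Q1.T @ (X.T @ y)) / np.sqrt(D)       # z = D^{-1/2} Q_1' X' y
    n = len(y)
    yc = y - y.mean()
    sig2_0 = (yc @ yc - z @ z) / n
    return D, z, sig2_0, n

def sig2_df(h, d, z, sig2_0, n):
    """sigma^2(theta_hat(h)) and df(theta_hat(h)) by (2.7) along (2.12);
    there, theta_j/(d_j + theta_j) = min(h/z_j^2, 1)."""
    delta = np.minimum(h / z**2, 1.0)
    return sig2_0 + np.sum(z**2 * delta**2) / n, 1.0 + len(z) - np.sum(delta)

def msc(h, d, z, sig2_0, n, which="GCV", alpha=2.0):
    """The criteria of (2.11) evaluated along the path theta_hat(h)."""
    r, u = sig2_df(h, d, z, sig2_0, n)
    if which == "GCp":                        # needs sigma_0^2 != 0, m < n-1
        s2_0 = n * sig2_0 / (n - len(z) - 1)
        return n * r / s2_0 + alpha * u
    if which == "GCV":
        return r / (1.0 - u / n) ** 2
    return r * np.exp(alpha * u / n)          # GIC
```

As $h$ ranges over $(0, \infty)$, `sig2_df` moves from $({\rm df} = m + 1)$ at $h \to 0$ to $({\rm df} = 1,\ \hat{\sigma}^2)$ for $h \ge z_{(m)}^2$, matching (2.9).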
The GRRE of $\beta$ based on $\hat{\theta}(\hat{h})$ is given by
$$ \hat{\beta}_{\hat{\theta}(\hat{h})} = \hat{\beta}(\hat{h}) = Q_1 V(\hat{h}) Q_1' \hat{\beta}, \quad (2.13) $$
where $\hat{\beta}$ is the LSE of $\beta$ given by (2.3), and $V(h)$ is the $m \times m$ diagonal matrix whose $j$th diagonal element is given by
$$ v_j(h) = \frac{d_j}{d_j + \hat{\theta}_j(h)} = \begin{cases} 1 - h/z_j^2 & (h \le z_j^2) \\ 0 & (h > z_j^2) \end{cases}. $$
An optimized $\theta$ obtained by minimizing ${\rm MSC}(\theta)$ when the domain of $f$ is a subset of $(0, \hat{\sigma}^2] \times [1, n)$ is expressed as in the following lemma (the proof is given in Appendix A.1):

Lemma 1. Let
$$ \phi(h) = {\rm MSC}(\hat{\theta}(h)) \quad (h \in R_+ \backslash \{0\}). \quad (2.14) $$
Suppose that $\mathcal{D} \subseteq (0, \hat{\sigma}^2] \times [1, n)$ and that $\exists \xi > 0$ s.t. $\phi(\xi) < \lim_{h \to 0} \phi(h)$. Then, the optimized ridge parameters obtained by minimizing ${\rm MSC}(\theta)$ are given by $\hat{\theta}(\hat{h})$, where $\hat{h}$ is given by
$$ \hat{h} = \arg\min_{h \in R_+ \backslash \{0\}} \phi(h). $$

Let $t_j$ ($j = 1, \ldots, m$) be the $j$th-order statistic of $z_1^2, \ldots, z_m^2$, i.e.,
$$ t_j = \begin{cases} \min\{z_1^2, \ldots, z_m^2\} & (j = 1) \\ \min\left\{\{z_1^2, \ldots, z_m^2\} \backslash \{t_1, \ldots, t_{j-1}\}\right\} & (j = 2, \ldots, m) \end{cases}, $$
and let $R_j$ ($j = 0, \ldots, m$) be a range defined by
$$ R_j = \begin{cases} (0, t_1] & (j = 0) \\ (t_j, t_{j+1}] & (j = 1, \ldots, m-1) \\ (t_m, \infty) & (j = m) \end{cases}. \quad (2.15) $$
By using $t_j$ and $R_j$, we can clarify the properties of $\phi(h)$ given in (2.14) as in the following lemma (the proof is given in Appendix A.2):

Lemma 2. The function $\phi(h)$ satisfies the following properties:
(P1) $\phi(h)$ is continuous at any $h \in R_+ \backslash \{0\}$.
(P2) $\phi(h) = f(\hat{\sigma}^2, 1)$ for every $h \ge t_m$.
(P3) $\phi(h)$ is a piecewise function, i.e., $\phi(h) = \phi_a(h)$ ($h \in R_a$, $a = 0, 1, \ldots, m$). The specific form of $\phi_a(h)$ can be expressed as
$$ \phi_a(h) = f\left( \hat{\sigma}_0^2 + (c_{1,a} + h^2 c_{2,a})/n,\ 1 + m - a - h c_{2,a} \right), $$
where $c_{1,a}$ and $c_{2,a}$ are given by
$$ c_{1,a} = \sum_{j=1}^a t_j, \quad c_{2,a} = \sum_{j=a+1}^m \frac{1}{t_j}. \quad (2.16) $$

Let $s_a^2$ ($a = 0, 1, \ldots, m$) be a statistic defined by
$$ s_a^2 = \frac{n \hat{\sigma}_0^2 + c_{1,a}}{n - m - 1 + a}, \quad (2.17) $$
and let $a^*$ be an integer defined by
$$ a^* \in \{0, 1, \ldots, m-1\} \ \mbox{s.t.}\ s_{a^*}^2 \in R_{a^*}. \quad (2.18) $$
From Lemma 2.1 in Yanagihara (2013), we can see that $a^*$ exists uniquely when $\hat{\sigma}_0^2 \ne 0$. Let ${GC_p}(\theta) = f_{GC_p}(\hat{\sigma}^2(\theta), {\rm df}(\theta))$ and ${\rm GCV}(\theta) = f_{\rm GCV}(\hat{\sigma}^2(\theta), {\rm df}(\theta))$. From the results in Nagai et al. (2012) and Yanagihara (2013), the minimizers of the $GC_p$ and the GCV can be expressed as in the following theorem (the proof is omitted because it is easily obtained from the results in Nagai et al., 2012 and Yanagihara, 2013):

Theorem 1. The optimized ridge parameters minimizing ${GC_p}(\theta)$ and ${\rm GCV}(\theta)$ are $\hat{\theta}(\hat{h}_{GC_p})$ and $\hat{\theta}(\hat{h}_{\rm GCV})$, respectively, where $\hat{h}_{GC_p}$ and $\hat{h}_{\rm GCV}$ are defined as
$$ \hat{h}_{GC_p} = \frac{1}{2} \alpha s_0^2 \quad (\mbox{when}\ \hat{\sigma}_0^2 \ne 0), \quad \hat{h}_{\rm GCV} = \begin{cases} s_{a^*}^2 & (\mbox{when}\ \hat{\sigma}_0^2 \ne 0) \\ t_1 & (\mbox{when}\ \hat{\sigma}_0^2 = 0\ \mbox{and}\ m = n-1) \\ 0 & (\mbox{when}\ \hat{\sigma}_0^2 = 0\ \mbox{and}\ m < n-1) \end{cases}, $$
where $s_a^2$ is given by (2.17) and $a^*$ is the integer given by (2.18). By using the results in Theorem 1, the optimized GRREs of $\beta$ obtained by minimizing the $GC_p$ and the GCV can be derived as $\hat{\beta}(\hat{h}_{GC_p})$ and $\hat{\beta}(\hat{h}_{\rm GCV})$, respectively.
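Theorem 1 reduces the GCV optimization to locating the statistic $s_a^2$ of (2.17) in its range $R_a$. The following is a sketch of the resulting closed-form search (function names are ours, not the paper's; it assumes $\hat{\sigma}_0^2 \ne 0$, so that $m < n - 1$ and $a^*$ is unique by Lemma 2.1 of Yanagihara, 2013):

```python
import numpy as np

def h_gcp(m, sig2_0, n, alpha=2.0):
    """h_GCp = alpha * s_0^2 / 2 (Theorem 1), s_0^2 = n*sigma_0^2/(n-m-1)."""
    return alpha * (n * sig2_0 / (n - m - 1)) / 2.0

def h_gcv(z2, sig2_0, n):
    """h_GCV = s_{a*}^2, with a* the unique a such that s_a^2 lies in
    R_a = (t_a, t_{a+1}] of (2.15) (Theorem 1; assumes sigma_0^2 != 0)."""
    t = np.sort(z2)                  # order statistics t_1 <= ... <= t_m
    m = len(t)
    for a in range(m):               # a = 0, 1, ..., m-1
        s2 = (n * sig2_0 + t[:a].sum()) / (n - m - 1 + a)    # (2.17)
        lo = 0.0 if a == 0 else t[a - 1]
        if lo < s2 <= t[a]:          # s_a^2 in R_a
            return s2
    return float(t[-1])              # not reached when sigma_0^2 != 0
```

The loop costs $O(m)$ after sorting, which is why the minimization is fast: no iterative optimization over the $m$-dimensional $\theta$ is needed.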
3. Algorithm to Solve the Minimization Problem of the GIC

Theorem 1 in the previous section indicates that the $GC_p$-minimization method cannot be used when $\hat{\sigma}_0^2$ in (2.8) is $0$, i.e., in most cases of linear regression with high-dimensional explanatory variables, and that the GCV-minimization method hardly shrinks the GRRE of $\beta$ towards $0_k$ when $\hat{\sigma}_0^2 = 0$. On the other hand, the GIC is one of the model selection criteria that can be defined even if $\hat{\sigma}_0^2 = 0$ and whose penalty term can be changed freely. However, at the moment, there is no efficient algorithm to solve the minimization problem of the GIC. Hence, we propose a fast algorithm for the GIC-minimization method. Let
$$ {\rm GIC}(\theta) = f_{\rm GIC}(\hat{\sigma}^2(\theta), {\rm df}(\theta)), \quad (3.1) $$
where $f_{\rm GIC}(r, u)$ is given by (2.11), and $\hat{\sigma}^2(\theta)$ and ${\rm df}(\theta)$ are given by (2.7), and let
$$ \phi_{\rm GIC}(h) = f_{\rm GIC}(\hat{\sigma}^2(\hat{\theta}(h)), {\rm df}(\hat{\theta}(h))) \quad (h \in R_+ \backslash \{0\}), $$
where $\hat{\theta}(h)$ is given by (2.12). From Lemma 2, we can see that $\phi_{\rm GIC}(h)$ is a piecewise function, i.e.,
$$ \phi_{\rm GIC}(h) = \phi_{{\rm GIC},a}(h) \quad (h \in R_a,\ a = 0, 1, \ldots, m), \quad (3.2) $$
where $R_a$ is the range given by (2.15) and $\phi_{{\rm GIC},a}(h)$ is defined by
$$ \phi_{{\rm GIC},a}(h) = \left\{ \hat{\sigma}_0^2 + \frac{1}{n}(c_{1,a} + h^2 c_{2,a}) \right\} \exp\left\{ \frac{\alpha}{n}(1 + m - a - h c_{2,a}) \right\}. $$
Here $c_{1,a}$ and $c_{2,a}$ ($a = 0, 1, \ldots, m$) are given by (2.16). Furthermore, we define the function $\psi_{\rm GIC}(h)$ ($h \in (0, t_m]$), which is a piecewise function, i.e.,
$$ \psi_{\rm GIC}(h) = \psi_{{\rm GIC},a}(h) \quad (h \in R_a,\ a = 0, 1, \ldots, m-1), \quad (3.3) $$
where $\psi_{{\rm GIC},a}(h)$ is defined by
$$ \psi_{{\rm GIC},a}(h) = \frac{n^2}{c_{2,a} \exp\{\alpha(1 + m - a - h c_{2,a})/n\}} \frac{\partial}{\partial h} \phi_{{\rm GIC},a}(h) = -\alpha c_{2,a} h^2 + 2nh - \alpha(n \hat{\sigma}_0^2 + c_{1,a}). \quad (3.4) $$
The minimizer of the GIC can be expressed as in the following theorem (the proof is given in Appendix A.3):

Theorem 2. Let $\xi_{{\rm GIC},a}$ be one of the two distinct roots or the double root of the quadratic equation $\psi_{{\rm GIC},a}(h) = 0$, namely
$$ \xi_{{\rm GIC},a} = \frac{n - \sqrt{n^2 - \alpha^2 c_{2,a}(n \hat{\sigma}_0^2 + c_{1,a})}}{\alpha c_{2,a}}, \quad (3.5) $$
and let $A_{\rm GIC}$ and $T_{\rm GIC}$ be sets defined by
$$ A_{\rm GIC} = \{ a \in \{0, 1, \ldots, m-1\} \mid \xi_{{\rm GIC},a} \in R_a \}, \quad T_{\rm GIC} = \begin{cases} \{t_m\} & (\mbox{when}\ \hat{\sigma}^2 > 2 t_m/\alpha) \\ \emptyset & (\mbox{when}\ \hat{\sigma}^2 \le 2 t_m/\alpha) \end{cases}, \quad (3.6) $$
where $\hat{\sigma}^2$ is given by (2.9). We define a set of candidate minimizers of $\phi_{\rm GIC}(h)$ as
$$ S_{\rm GIC} = \{ \xi_{{\rm GIC},a} \mid a \in A_{\rm GIC} \} \cup T_{\rm GIC}. \quad (3.7) $$
Then, the optimized ridge parameters obtained by minimizing ${\rm GIC}(\theta)$ are given by $\hat{\theta}(\hat{h}_{\rm GIC})$, where $\hat{h}_{\rm GIC}$ is given by
$$ \hat{h}_{\rm GIC} = \begin{cases} \arg\min_{h \in S_{\rm GIC}} \phi_{\rm GIC}(h) & (\mbox{when}\ \hat{\sigma}_0^2 \ne 0) \\ 0 & (\mbox{when}\ \hat{\sigma}_0^2 = 0) \end{cases}. \quad (3.8) $$

It should be emphasized that the elements of $S_{\rm GIC}$ can be written in closed form, and that the number of elements of $S_{\rm GIC}$ is equal to or smaller than $m + 1$, i.e., $\#(S_{\rm GIC}) \le m + 1$. Hence, when $\hat{\sigma}_0^2 \ne 0$, by using the results in Theorem 2, we can optimize $\theta$ quickly as follows:

A Fast Algorithm to Derive the Optimized GRRE of $\beta$ by Minimizing the GIC
(1) Calculate the elements of $S_{\rm GIC}$ by using $\xi_{{\rm GIC},a}$, $A_{\rm GIC}$ and $T_{\rm GIC}$.
(2) Determine $\hat{h}_{\rm GIC}$ by comparing the values of $\phi_{\rm GIC}(h)$ at the points in $S_{\rm GIC}$.
(3) Calculate the optimized GRRE of $\beta$ as $\hat{\beta}(\hat{h}_{\rm GIC})$, where $\hat{\beta}(h)$ is given by (2.13).

Unfortunately, Theorem 2 also indicates that the optimized GRRE of $\beta$ obtained by minimizing the GIC is always equal to the LSE of $\beta$ when $\hat{\sigma}_0^2 = 0$. Thus, we cannot help but say that the GIC-minimization method does not work well in the case of high-dimensional explanatory variables. When $\hat{\sigma}_0^2 \ne 0$, we cannot derive $\hat{h}_{\rm GIC}$ without comparing the values of $\phi_{\rm GIC}(h)$. However, $\#(S_{\rm GIC})$ becomes $1$ under a specific $\alpha$ when $\hat{\sigma}_0^2 \ne 0$. The result is summarized as in the following corollary (the proof is given in Appendix A.4):

Corollary 1. Suppose that $\alpha \ge n/m$ and $\hat{\sigma}_0^2 \ne 0$. Let $a^*$ be an integer defined by
$$ a^* \in \{0, 1, \ldots, m-1\} \ \mbox{s.t.}\ \xi_{{\rm GIC},a^*} \in R_{a^*}. $$
The integer $a^*$ exists uniquely when $\hat{\sigma}^2 \le 2 t_m/\alpha$. Then $\hat{h}_{\rm GIC}$ in (3.8) can be rewritten as
$$ \hat{h}_{\rm GIC} = \begin{cases} \xi_{{\rm GIC},a^*} & (\mbox{when}\ \hat{\sigma}^2 \le 2 t_m/\alpha) \\ t_m & (\mbox{when}\ \hat{\sigma}^2 > 2 t_m/\alpha) \end{cases}. $$

4. Algorithm to Solve the Minimization Problem of the EGCV

Theorem 2 in the previous section indicates that the GIC-minimization method does not shrink the GRRE of $\beta$ towards $0_k$ at all when $\hat{\sigma}_0^2$ in (2.8) is $0$, i.e., in most cases of linear regression with high-dimensional explanatory variables. From Theorems 1 and 2, we regrettably have to say that no existing MSC works well when $\hat{\sigma}_0^2 = 0$. Hence, we have to propose a new MSC that works well even when $\hat{\sigma}_0^2 = 0$. The cause of the problem in the GCV is that the penalty on the complexity of a model is too small. Hence, we extend the GCV so that the size of the penalty on the complexity of a model can be changed freely. The result is called the extended GCV (EGCV) in this paper. Let
$$ f_{\rm EGCV}(r, u) = \frac{r}{(1 - u/n)^\alpha}, \quad \alpha \ge 2 \ \mbox{and}\ u < n. \quad (4.1) $$
Then, the EGCV for the GRR is defined as
$$ {\rm EGCV}(\theta) = f_{\rm EGCV}(\hat{\sigma}^2(\theta), {\rm df}(\theta)), \quad (4.2) $$
where $\hat{\sigma}^2(\theta)$ and ${\rm df}(\theta)$ are given by (2.7). It is easy to see that the EGCV coincides with the GCV if $\alpha = 2$. Let
$$ \phi_{\rm EGCV}(h) = f_{\rm EGCV}(\hat{\sigma}^2(\hat{\theta}(h)), {\rm df}(\hat{\theta}(h))) \quad (h \in R_+), $$
where $\hat{\theta}(h)$ is given by (2.12). From Lemma 2, we can see that $\phi_{\rm EGCV}(h)$ is a piecewise function, i.e.,
$$ \phi_{\rm EGCV}(h) = \phi_{{\rm EGCV},a}(h) \quad (h \in R_a,\ a = 0, 1, \ldots, m), \quad (4.3) $$
where $R_a$ is the range given by (2.15) and $\phi_{{\rm EGCV},a}(h)$ is defined by
$$ \phi_{{\rm EGCV},a}(h) = \frac{\hat{\sigma}_0^2 + c_{1,a}/n + h^2 c_{2,a}/n}{(b + a/n + h c_{2,a}/n)^\alpha}. \quad (4.4) $$
Here $c_{1,a}$ and $c_{2,a}$ ($a = 0, 1, \ldots, m$) are given by (2.16), and $b$ is the constant defined by
$$ b = 1 - \frac{m + 1}{n}. \quad (4.5) $$
Furthermore, we define the function $\psi_{\rm EGCV}(h)$ ($h \in (0, t_m]$), which is a piecewise function as
$$ \psi_{\rm EGCV}(h) = \psi_{{\rm EGCV},a}(h) \quad (h \in R_a,\ a = 0, 1, \ldots, m-1), \quad (4.6) $$
where $\psi_{{\rm EGCV},a}(h)$ is defined by
$$ \psi_{{\rm EGCV},a}(h) = \frac{n^2 (b + a/n + c_{2,a} h/n)^{\alpha+1}}{c_{2,a}} \frac{\partial}{\partial h} \phi_{{\rm EGCV},a}(h) = -(\alpha - 2) c_{2,a} h^2 + 2(a + nb)h - \alpha(n \hat{\sigma}_0^2 + c_{1,a}). \quad (4.7) $$
The minimizer of the EGCV can be expressed as in the following theorem (the proof is given in Appendix A.5):

Theorem 3. Let $\xi_{{\rm EGCV},a}$ be one of the two distinct roots or the double root of the quadratic equation $\psi_{{\rm EGCV},a}(h) = 0$, namely
$$ \xi_{{\rm EGCV},a} = \frac{(a + nb) - \sqrt{(a + nb)^2 - \alpha(\alpha - 2) c_{2,a}(n \hat{\sigma}_0^2 + c_{1,a})}}{(\alpha - 2) c_{2,a}}, $$
and let $A_{\rm EGCV}$ and $T_{\rm EGCV}$ be sets defined by
$$ A_{\rm EGCV} = \{ a \in \{0, 1, \ldots, m-1\} \mid \xi_{{\rm EGCV},a} \in R_a \}, \quad T_{\rm EGCV} = \begin{cases} \{t_m\} & (\mbox{when}\ \hat{\sigma}^2 > 2(1 - n^{-1}) t_m/\alpha) \\ \emptyset & (\mbox{when}\ \hat{\sigma}^2 \le 2(1 - n^{-1}) t_m/\alpha) \end{cases}, $$
where $\hat{\sigma}^2$ is given by (2.9). We define a set of candidate minimizers of $\phi_{\rm EGCV}(h)$ as
$$ S_{\rm EGCV} = \{ \xi_{{\rm EGCV},a} \mid a \in A_{\rm EGCV} \} \cup T_{\rm EGCV}. \quad (4.8) $$
Then, the optimized ridge parameters obtained by minimizing ${\rm EGCV}(\theta)$ are given by $\hat{\theta}(\hat{h}_{\rm EGCV})$, where $\hat{h}_{\rm EGCV}$ is given by
$$ \hat{h}_{\rm EGCV} = \begin{cases} \arg\min_{h \in S_{\rm EGCV}} \phi_{\rm EGCV}(h) & (\mbox{when}\ \hat{\sigma}_0^2 \ne 0\ \mbox{or}\ b = 0) \\ 0 & (\mbox{when}\ \hat{\sigma}_0^2 = 0\ \mbox{and}\ b \ne 0) \end{cases}. \quad (4.9) $$

It should be emphasized that the elements of $S_{\rm EGCV}$ can be written in closed form, and that the number of elements of $S_{\rm EGCV}$ is equal to or smaller than $m + 1$, i.e., $\#(S_{\rm EGCV}) \le m + 1$. Hence, when $\hat{\sigma}_0^2 \ne 0$ or $b = 0$, by using the results in Theorem 3, we can optimize $\theta$ quickly as follows:

A Fast Algorithm to Derive the Optimized GRRE of $\beta$ by Minimizing the EGCV
(1) Calculate the elements of $S_{\rm EGCV}$ by using $\xi_{{\rm EGCV},a}$, $A_{\rm EGCV}$ and $T_{\rm EGCV}$.
(2) Determine $\hat{h}_{\rm EGCV}$ by comparing the values of $\phi_{\rm EGCV}(h)$ at the points in $S_{\rm EGCV}$.
(3) Calculate the optimized GRRE of $\beta$ as $\hat{\beta}(\hat{h}_{\rm EGCV})$, where $\hat{\beta}(h)$ is given by (2.13).

When $\hat{\sigma}_0^2 \ne 0$ or $m = n - 1$, we cannot derive $\hat{h}_{\rm EGCV}$ without comparing the values of $\phi_{\rm EGCV}(h)$. However, $\#(S_{\rm EGCV})$ becomes $1$ under a specific $\alpha$ when $\hat{\sigma}_0^2 \ne 0$ or $m = n - 1$. The result is summarized as in the following corollary (the proof is given in Appendix A.6):

Corollary 2. Suppose that $\alpha \ge (n + m - 1)/m$ and that $\hat{\sigma}_0^2 \ne 0$ or $b = 0$. Let $a^*$ be an integer defined by
$$ a^* \in \{0, 1, \ldots, m-1\} \ \mbox{s.t.}\ \xi_{{\rm EGCV},a^*} \in R_{a^*}. $$
The integer $a^*$ exists uniquely when $\hat{\sigma}^2 \le 2(1 - n^{-1}) t_m/\alpha$. Then $\hat{h}_{\rm EGCV}$ in (4.9) can be rewritten as
$$ \hat{h}_{\rm EGCV} = \begin{cases} \xi_{{\rm EGCV},a^*} & (\mbox{when}\ \hat{\sigma}^2 \le 2(1 - n^{-1}) t_m/\alpha) \\ t_m & (\mbox{when}\ \hat{\sigma}^2 > 2(1 - n^{-1}) t_m/\alpha) \end{cases}. $$

5. Numerical Study

In this section, we compare the performances of the $GC_p$-, GIC- and EGCV-minimization methods with $\alpha = 2$, $2 \log \log n$ and $\log n$ by conducting numerical examinations. We generated data from the simulation model $y \sim N_n(X\beta, I_n)$, where $X = (I_n - J_n) X_0 \Phi(\rho)^{1/2}$ and $\beta = M^+ X' \eta$. Here $X_0$ is an $n \times k$ matrix whose elements are independently and identically distributed according to $U(-1, 1)$, $\Phi(\rho)$ is a $k \times k$ symmetric matrix whose $(i, j)$th element is $\rho^{|i-j|}$, and $\eta$ is an $n$-dimensional vector whose $j$th element is given by
$$ \sqrt{\frac{12 n (n - 1)}{4 n^2 + 6 n - 1}} \left\{ (-1)^{j-1} \left( 1 - \frac{j - 1}{n} \right) - \frac{1}{2n} \right\}. $$
The simulation model is the same as that in Yanagihara (2013). Let $\hat{\theta}$ be the ridge parameters optimized by minimizing an MSC, $\hat{y}$ the predictive value of $y$ based on the LSE of $\beta$, and $\hat{y}_{\hat{\theta}}$ the predictive value of $y$ in the GRR based on $\hat{\theta}$, i.e.,
$$ \hat{y} = (J_n + X M^+ X') y, \quad \hat{y}_{\hat{\theta}} = (J_n + X M_{\hat{\theta}}^+ X') y. $$
We used the following relative mean square error (MSE) as a measure of the performance of each method:
$$ \frac{E[(\hat{y}_{\hat{\theta}} - X\beta)'(\hat{y}_{\hat{\theta}} - X\beta)]}{E[(\hat{y} - X\beta)'(\hat{y} - X\beta)]} \times 100\ (\%). $$
Notice that $E[(\hat{y} - X\beta)'(\hat{y} - X\beta)] = m + 1$ in this case. The above expectation was evaluated by Monte Carlo simulation with 10,000 iterations.
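The EGCV-minimization method evaluated in this study (Theorem 3 together with the three-step algorithm of Section 4) can be sketched as follows; function names are ours, and the GIC version is analogous with the roots $\xi_{{\rm GIC},a}$ of (3.5). When $\alpha = 2$, the quadratic (4.7) degenerates to a linear equation whose root is $s_a^2$, recovering the GCV solution of Theorem 1:

```python
import numpy as np

def h_egcv(z2, sig2_0, n, alpha):
    """Fast EGCV minimization: compare phi_EGCV over the finite candidate
    set S_EGCV of (4.8).  z2 holds z_1^2, ..., z_m^2."""
    t = np.sort(z2)                       # order statistics t_1 <= ... <= t_m
    m = len(t)
    b = 1.0 - (m + 1) / n                 # (4.5); b = 0 iff m = n - 1
    if sig2_0 == 0.0 and b != 0.0:
        return 0.0                        # second case of (4.9)
    sig2_lim = sig2_0 + t.sum() / n       # sigma^2 of (2.9)

    def phi(h):                           # phi_EGCV(h) via (2.7) and (4.1)
        delta = np.minimum(h / t, 1.0)
        r = sig2_0 + np.sum(t * delta**2) / n
        u = 1 + m - np.sum(delta)
        return r / (1.0 - u / n) ** alpha

    cands = []
    for a in range(m):                    # a = 0, ..., m-1
        c1, c2 = t[:a].sum(), np.sum(1.0 / t[a:])
        lin = a + n * b                   # coefficient (a + nb) in (4.7)
        if alpha == 2.0:                  # quadratic degenerates: root = s_a^2
            if lin <= 0:
                continue
            xi = alpha * (n * sig2_0 + c1) / (2.0 * lin)
        else:
            disc = lin**2 - alpha * (alpha - 2.0) * c2 * (n * sig2_0 + c1)
            if disc < 0:
                continue
            xi = (lin - np.sqrt(disc)) / ((alpha - 2.0) * c2)
        lo = 0.0 if a == 0 else t[a - 1]
        if lo < xi <= t[a]:               # xi in R_a, so a is in A_EGCV
            cands.append(xi)
    if sig2_lim > 2.0 * (1.0 - 1.0 / n) * t[-1] / alpha:
        cands.append(t[-1])               # T_EGCV = {t_m}
    return min(cands, key=phi) if cands else float(t[-1])
```

Because $\#(S_{\rm EGCV}) \le m + 1$, the whole search costs $O(m)$ closed-form root evaluations after sorting $z_1^2, \ldots, z_m^2$, which is the source of the speed observed in the examinations.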
Figure 1. Case of $n = 50$. Relative MSE (%) versus the number of explanatory variables for the EGCV-, GIC- and $GC_p$-minimization methods: (a) $\alpha = 2$; (b) $\alpha = 2 \log \log n$; (c) $\alpha = \log n$.
Figure 2. Case of $n = 200$. Relative MSE (%) versus the number of explanatory variables for the EGCV-, GIC- and $GC_p$-minimization methods: (a) $\alpha = 2$; (b) $\alpha = 2 \log \log n$; (c) $\alpha = \log n$.
Figure 3. Case of $n = 500$. Relative MSE (%) versus the number of explanatory variables for the EGCV-, GIC- and $GC_p$-minimization methods: (a) $\alpha = 2$; (b) $\alpha = 2 \log \log n$; (c) $\alpha = \log n$.
Figures 1, 2 and 3 show the results for $n = 50$, $200$ and $500$, respectively, with $\rho = 0.99$ and $k = k_1, \ldots, k_{16}$, where
$$ k_j = \begin{cases} n(4 + j)/10 & (j = 1, \ldots, 5) \\ n - 2 & (j = 6) \\ n\{1 + (j - 6)/10\} & (j = 7, \ldots, 16) \end{cases}. $$
In the figures, there are no results for the $GC_p$-minimization method when $k = k_7, \ldots, k_{16}$, because the $GC_p$ is not definable when $k \ge k_7$. From the figures, we can see that the EGCV-minimization method with $\alpha = \log n$ performed well in most cases. On the other hand, the performance of the GIC-minimization method was very bad when $k \ge k_7$, because the GIC-minimization method does not shrink the GRRE of $\beta$ towards $0_k$ at all when $k \ge k_7$. The performance of the GCV-minimization method was also bad when $k \ge k_7$, because the GCV-minimization method hardly shrinks the GRRE of $\beta$ towards $0_k$ when $k \ge k_7$. Moreover, as for the number of candidate minimizers, the following results were obtained:
$$ {\rm mode}\{\#(S_{\rm GIC})\} = 2, \quad \min\{\#(S_{\rm GIC})\} = 2, \quad \max\{\#(S_{\rm GIC})\} = 7, $$
and
$$ {\rm mode}\{\#(S_{\rm EGCV})\} = \begin{cases} 3 & (k \ge n - 1\ \&\ \alpha = 2 \log \log n) \\ 2 & ({\rm otherwise}) \end{cases}, \quad \min\{\#(S_{\rm EGCV})\} = 2, \quad \max\{\#(S_{\rm EGCV})\} = \begin{cases} 3 & (k < n - 1) \\ 12 & (k \ge n - 1) \end{cases}. $$
Hence, our algorithm will be efficient because the number of candidate minimizers is small.

Acknowledgment

The authors thank Dr. Shintaro Hashimoto, Mr. Tomoyuki Nakagawa, Mr. Ryoya Oda and Mr. Shota Ochiai, of Hiroshima University, for helpful comments.

References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In 2nd International Symposium on Information Theory (eds. B. N. Petrov & F. Csáki), Akadémiai Kiadó, Budapest.
Atkinson, A. C. (1980). A note on the generalized information criterion for choice of a model. Biometrika, 67.
Craven, P. & Wahba, G. (1979). Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math., 31.
Fujikoshi, Y. & Satoh, K. (1997).
Modified AIC and $C_p$ in multivariate linear regression. Biometrika, 84.
Hannan, E. J. & Quinn, B. G. (1979). The determination of the order of an autoregression. J. Roy. Stat. Soc. Ser. B, 41.
Harville, D. A. (1997). Matrix Algebra from a Statistician's Perspective. Springer-Verlag, New York.
Hoerl, A. E. & Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12.
Kullback, S. & Leibler, R. A. (1951). On information and sufficiency. Ann. Math. Statistics, 22.
Lawless, J. F. (1981). Mean squared error properties of generalized ridge regression. J. Amer. Statist. Assoc., 76.
Liu, Z., Shen, Y. & Ott, J. (2011). Multilocus association mapping using generalized ridge logistic regression. BMC Bioinformatics, 12.
Mallows, C. L. (1973). Some comments on $C_p$. Technometrics, 15.
Nagai, I., Yanagihara, H. & Satoh, K. (2012). Optimization of ridge parameters in multivariate generalized ridge regression by plug-in methods. Hiroshima Math. J., 42.
Nishii, R. (1984). Asymptotic properties of criteria for selection of variables in multiple regression. Ann. Statist., 12.
Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist., 6.
Shen, X., Alam, M., Fikse, F. & Rönnegård, L. (2013). A novel generalized ridge regression method for quantitative genetics. Genetics, 193.
Walker, S. G. & Page, C. J. (2001). Generalized ridge regression and a generalization of the $C_p$ statistic. J. Appl. Statist., 28.
Yanagihara, H. (2013). Explicit solution to the minimization problem of generalized cross-validation criterion for selecting ridge parameters in generalized ridge regression. TR-No., Hiroshima Statistical Research Group, Hiroshima University.
Yanagihara, H., Nagai, I. & Satoh, K. (2009). A bias-corrected $C_p$ criterion for optimizing ridge parameters in multivariate generalized ridge regression. Japanese J. Appl. Statist., 38 (in Japanese).
Ye, J. (1998). On measuring and correcting the effects of data mining and model selection. J. Amer. Statist. Assoc., 93.

Appendix

A.1.
Proof of Lemma 1

Let $\delta = (\delta_1, \ldots, \delta_m)'$ be an $m$-dimensional vector whose $j$th element $\delta_j \in [0, 1]$ ($j = 1, \ldots, m$) is defined by
$$ \delta_j = \frac{\theta_j}{d_j + \theta_j}, $$
where $\theta_1, \ldots, \theta_m$ are the ridge parameters and $d_j$ is given by (2.1). Since the map $\theta \mapsto \delta$ is one-to-one, we minimize the MSC via $\delta$ instead of $\theta$. By using $\delta$, we rewrite $\hat{\sigma}^2(\theta)$ and ${\rm df}(\theta)$ in (2.7) as functions of $\delta$:
$$ \hat{\sigma}^2(\theta) = r(\delta) = \frac{1}{n}\left( n \hat{\sigma}_0^2 + \sum_{j=1}^m \delta_j^2 z_j^2 \right), \quad {\rm df}(\theta) = u(\delta) = 1 + m - \sum_{j=1}^m \delta_j, \quad ({\rm A.1}) $$
where $z_j^2$ and $\hat{\sigma}_0^2$ are given by (2.6) and (2.8), respectively. By using $r(\delta)$ and $u(\delta)$, we rewrite ${\rm MSC}(\theta)$ in (2.10) as a function of $\delta$:
$$ {\rm MSC}(\theta) = g(\delta) = f(r(\delta), u(\delta)). \quad ({\rm A.2}) $$
Since $\mathcal{D} \subseteq (0, \hat{\sigma}^2] \times [1, n)$, we know that $\delta \ne 0_m$. From the assumptions on $f$, we have $f_r(r(\delta), u(\delta)) > 0$ and $f_u(r(\delta), u(\delta)) > 0$, where $f_r(r, u)$ and $f_u(r, u)$ are the first partial derivatives of $f$ with respect to $r$ and $u$, respectively. Let
$$ \tau(\delta) = \frac{n f_u(r(\delta), u(\delta))}{2 f_r(r(\delta), u(\delta))}. $$
It is clear that $\tau(\delta) > 0$ when $\mathcal{D} \subseteq (0, \hat{\sigma}^2] \times [1, n)$. Notice that
$$ \frac{\partial}{\partial \delta_j} g(\delta) = \frac{\partial r(\delta)}{\partial \delta_j} f_r(r(\delta), u(\delta)) + \frac{\partial u(\delta)}{\partial \delta_j} f_u(r(\delta), u(\delta)) = \frac{2 z_j^2 \delta_j}{n} f_r(r(\delta), u(\delta)) - f_u(r(\delta), u(\delta)) = \frac{2 z_j^2}{n} f_r(r(\delta), u(\delta)) \left( \delta_j - \frac{\tau(\delta)}{z_j^2} \right). \quad ({\rm A.3}) $$
Let $\delta^* = (\delta_1^*, \ldots, \delta_m^*)'$ be the minimizer of $g(\delta)$, i.e.,
$$ \delta^* = \arg\min_{\delta \in [0,1]^m \backslash \{0_m\}} g(\delta), $$
where $[0, 1]^m$ is the $m$th Cartesian power of the set $[0, 1]$. From (A.3), a necessary condition for $\delta^*$ is
$$ \delta_j^* = \begin{cases} \tau(\delta^*)/z_j^2 & (\tau(\delta^*) \le z_j^2) \\ 1 & (\tau(\delta^*) > z_j^2) \end{cases} \quad (j = 1, \ldots, m). $$
Let $G$ be a set defined by
$$ G = \left\{ \delta = (\delta_1, \ldots, \delta_m)' \in [0, 1]^m \mid \delta = \hat{\delta}(h),\ h \in R_+ \backslash \{0\} \right\}, $$
where $\hat{\delta}(h)$ is the $m$-dimensional vector whose $j$th element is defined by
$$ \hat{\delta}_j(h) = \begin{cases} h/z_j^2 & (h \le z_j^2) \\ 1 & (h > z_j^2) \end{cases} \quad (j = 1, \ldots, m). \quad ({\rm A.4}) $$
It follows from the fact $G \subseteq [0, 1]^m \backslash \{0_m\}$ that
$$ g(\delta^*) = \min_{\delta \in [0,1]^m \backslash \{0_m\}} g(\delta) \le \min_{\delta \in G} g(\delta) = \min_{h \in R_+ \backslash \{0\}} g(\hat{\delta}(h)). $$
Moreover, it is clear that $\delta^* \in G$. This implies that
$$ g(\delta^*) \ge \min_{h \in R_+ \backslash \{0\}} g(\hat{\delta}(h)). $$
Therefore, we have $\delta^* = \hat{\delta}(\hat{h})$, where
\[
\hat{h} = \arg\min_{h \in \mathbb{R}_+ \setminus \{0\}} g(\hat{\delta}(h)).
\]
Recall that $g(\delta) = \mathrm{MSC}(\theta)$ and
\[
\theta_j =
\begin{cases}
d_j \delta_j / (1 - \delta_j) & (\text{if } \delta_j \neq 1) \\
\infty & (\text{if } \delta_j = 1).
\end{cases}
\]
Moreover, $\hat{\beta}_\theta$ is rewritten as
\[
\hat{\beta}_\theta = (X'X + Q \Theta Q')^{+} X' y = Q (D + \Theta)^{+} Q' X' y = Q (D + \Theta)^{+} D Q' Q D^{+} Q' X' y = Q V Q' \hat{\beta},
\]
where $V = (D + \Theta)^{+} D$ and $\hat{\beta}$ is the LSE of $\beta$ given by (2.3). When $\theta \neq 0_m$, the $j$th diagonal element of $V$ is given by
\[
v_j =
\begin{cases}
d_j / (d_j + \theta_j) & (\text{when } \theta_j < \infty) \\
0 & (\text{when } \theta_j = \infty).
\end{cases}
\]
It follows from a simple calculation that $d_j / \{d_j + \hat{\theta}_j(h)\} = 1 - h/z_j^2$ when $h \le z_j^2$. Consequently, Lemma 1 is proved.

A.2.
Proof of Lemma 2

From (A.2), we can see that
\[
\phi(h) = \mathrm{MSC}(\hat{\theta}(h)) = f(r(\hat{\delta}(h)), u(\hat{\delta}(h))),
\]
where $\hat{\theta}(h)$ is given by (2.12), $\hat{\delta}(h)$ is given by (A.4), and $r(\delta)$ and $u(\delta)$ are given by (A.1). Since $t_1, \ldots, t_m$ are the order statistics of $z_1^2, \ldots, z_m^2$, we have
\[
r(\hat{\delta}(h)) = \hat{\sigma}_0^2 + \frac{1}{n} \sum_{j=1}^m \left[ I(z_j^2 \ge h) \left\{ \left( \frac{h}{z_j^2} \right)^2 - 1 \right\} + 1 \right] z_j^2
= \hat{\sigma}_0^2 + \frac{1}{n} \sum_{j=1}^m \left[ I(t_j \ge h) \left\{ \left( \frac{h}{t_j} \right)^2 - 1 \right\} + 1 \right] t_j,
\]
\[
u(\hat{\delta}(h)) = 1 + m - \sum_{j=1}^m \left[ I(z_j^2 \ge h) \left( \frac{h}{z_j^2} - 1 \right) + 1 \right]
= 1 + m - \sum_{j=1}^m \left[ I(t_j \ge h) \left( \frac{h}{t_j} - 1 \right) + 1 \right],
\]
where $I(x \ge h)$ is the indicator function, i.e., $I(x \ge h) = 1$ if $x \ge h$ and $I(x \ge h) = 0$ if $x < h$. It is clear that $r(\hat{\delta}(h))$ and $u(\hat{\delta}(h))$ are continuous functions on $h \in \mathbb{R}_+ \setminus \{0\}$. Therefore, the property (P1) is proved. Notice that when $h \in R_a$, we have
\[
r(\hat{\delta}(h)) = r_a(\hat{\delta}(h)) = \hat{\sigma}_0^2 + \frac{1}{n} \Bigl( \sum_{j=1}^a t_j + h^2 \sum_{j=a+1}^m \frac{1}{t_j} \Bigr) = \hat{\sigma}_0^2 + \frac{1}{n} (c_{1,a} + h^2 c_{2,a}),
\]
\[
u(\hat{\delta}(h)) = u_a(\hat{\delta}(h)) = 1 + m - a - h \sum_{j=a+1}^m \frac{1}{t_j} = 1 + m - a - h c_{2,a},
\]
where $c_{1,a}$ and $c_{2,a}$ are given by (2.16). The property (P3) is proved by these equations. Notice that $c_{2,m} = 0$ and
\[
n \hat{\sigma}_0^2 + c_{1,m} = n \hat{\sigma}_0^2 + \sum_{j=1}^m z_j^2 = y'(I_n - J_n - X M^{+} X') y + y' X M^{+} X' y = n \hat{\sigma}^2.
\]
These imply that $r_m(\hat{\delta}(h)) = \hat{\sigma}^2$ and $u_m(\hat{\delta}(h)) = 1$ when $h \in R_m$. Hence, the property (P2) is proved.

A.3.
Proof of Theorem 2

At first, we derive the minimizer of $\mathrm{GIC}(\theta)$ in (3.1) when $\hat{\sigma}_0^2$ in (2.8) is 0. It is easy to see that $\hat{\sigma}^2(0_m) = 0$ when $\hat{\sigma}_0^2 = 0$, and that $f_{\mathrm{GIC}}(0, u) = 0$ holds for any $u \in [1, n]$, where $\hat{\sigma}^2(\theta)$ is given by (2.4) and $f_{\mathrm{GIC}}(r, u)$ is given by (2.11). Moreover, $f_{\mathrm{GIC}}(r, u) > 0$ holds for any $(r, u) \in (0, \hat{\sigma}^2] \times [1, n]$, where $\hat{\sigma}^2$ is given by (2.9). Notice that $\hat{\sigma}^2(\theta) = 0$ holds if and only if $\theta = 0_m$ and $\hat{\sigma}_0^2 = 0$, and that $\hat{\theta}(0) = 0_m$ holds, where $\hat{\theta}(h)$ is given by (2.12). These results imply that the minimizer of $\mathrm{GIC}(\theta)$ is $\hat{\theta}(0)$ when $\hat{\sigma}_0^2 = 0$.

Next, we derive the minimizer of $\mathrm{GIC}(\theta)$ when $\hat{\sigma}_0^2 \neq 0$. Then the domain of $f_{\mathrm{GIC}}$ is included in $(0, \hat{\sigma}^2] \times [1, n)$. Moreover,
\[
\lim_{h \to 0} \psi_{\mathrm{GIC},0}(h) = -\alpha n \hat{\sigma}_0^2 < 0,
\]
where $\psi_{\mathrm{GIC},a}(h)$ is given by (3.4). This indicates that some positive number $\xi$ exists such that $\phi_{\mathrm{GIC}}(\xi) < \lim_{h \to 0} \phi_{\mathrm{GIC}}(h)$, where $\phi_{\mathrm{GIC}}(h)$ is given by (3.2). Therefore, by using Lemma 1, the minimizer of $\mathrm{GIC}(\theta)$ is given by $\hat{\theta}(\hat{h}_{\mathrm{GIC}})$, where $\hat{h}_{\mathrm{GIC}}$ is the minimizer of $\phi_{\mathrm{GIC}}(h)$:
\[
\hat{h}_{\mathrm{GIC}} = \arg\min_{h \in \mathbb{R}_+ \setminus \{0\}} \phi_{\mathrm{GIC}}(h).
\]
Hence, in order to prove Theorem 2 when $\hat{\sigma}_0^2 \neq 0$, it is enough to show $\hat{h}_{\mathrm{GIC}} \in S_{\mathrm{GIC}}$, where $S_{\mathrm{GIC}}$ is the set of candidate minimizers of $\phi_{\mathrm{GIC}}(h)$ given by (3.7). From (P1) and (P2) in Lemma 2, we can see that $\phi_{\mathrm{GIC}}(h)$ is continuous at any $h \in \mathbb{R}_+ \setminus \{0\}$ and $\phi_{\mathrm{GIC}}(h) = \hat{\sigma}^2 \exp(\alpha/n)$ when $h \in R_m$.
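The piecewise closed forms above can be checked numerically. The sketch below uses synthetic values of $t_j$, $n$, $\alpha$ and $\hat{\sigma}_0^2$ (all illustrative), and assumes, as the limiting value $\hat{\sigma}^2 \exp(\alpha/n)$ on $R_m$ suggests, that $f_{\mathrm{GIC}}(r, u) = r \exp(\alpha u / n)$; it verifies that adjacent pieces $\phi_{\mathrm{GIC},a}$ agree at the knots $t_{a+1}$, i.e., the continuity claimed in (P1):

```python
import numpy as np

# Synthetic inputs (illustration only): sorted t_1 <= ... <= t_m and a noise level.
t = np.array([0.5, 1.5, 4.0, 9.0])    # order statistics of z_j^2
m, n, alpha, sigma2_0 = len(t), 20, 2.0, 0.8

def coeffs(a):
    """c_{1,a} = sum_{j<=a} t_j and c_{2,a} = sum_{j>a} 1/t_j, as in (2.16)."""
    return t[:a].sum(), (1.0 / t[a:]).sum()

def phi_gic(h, a):
    """Piece of the GIC curve on R_a, assuming f_GIC(r, u) = r * exp(alpha * u / n)."""
    c1, c2 = coeffs(a)
    r = sigma2_0 + (c1 + h * h * c2) / n
    u = 1 + m - a - h * c2
    return r * np.exp(alpha * u / n)

# Adjacent pieces agree at every knot t_{a+1}, so phi_GIC is continuous.
for a in range(m - 1):
    assert np.isclose(phi_gic(t[a], a), phi_gic(t[a], a + 1))
```

On $R_m$ the piece is constant in $h$, which mirrors (P2): once $h$ exceeds $t_m$, all coordinates of $\hat{\delta}(h)$ equal 1 and the curve no longer changes.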
The function $\psi_{\mathrm{GIC}}(h)$ in (3.3) is also continuous at any $h \in (0, t_m]$ because
\[
\psi_{\mathrm{GIC},a}(t_{a+1}) = -\alpha c_{2,a} t_{a+1}^2 + 2 n t_{a+1} - \alpha (n \hat{\sigma}_0^2 + c_{1,a})
= -\alpha \left( c_{2,a+1} + \frac{1}{t_{a+1}} \right) t_{a+1}^2 + 2 n t_{a+1} - \alpha (n \hat{\sigma}_0^2 + c_{1,a+1} - t_{a+1})
= -\alpha c_{2,a+1} t_{a+1}^2 + 2 n t_{a+1} - \alpha (n \hat{\sigma}_0^2 + c_{1,a+1})
= \psi_{\mathrm{GIC},a+1}(t_{a+1}) \quad (a = 0, \ldots, m-2),
\]
where $c_{1,a}$ and $c_{2,a}$ are given by (2.16). It should be emphasized that the sign of $\partial \phi_{\mathrm{GIC},a}(h) / \partial h$ is equal to that of $\psi_{\mathrm{GIC},a}(h)$. Hence, we have the following necessary condition for $\hat{h}_{\mathrm{GIC}}$:
\[
\Bigl[ \bigl\{ \psi_{\mathrm{GIC}}(\hat{h}_{\mathrm{GIC}}) = 0 \bigr\} \wedge \bigl\{ \exists \epsilon_0 > 0 \text{ s.t. } \forall \epsilon \in (0, \epsilon_0),\ \psi_{\mathrm{GIC}}(\hat{h}_{\mathrm{GIC}} - \epsilon) < 0 \bigr\} \Bigr]
\vee \bigl\{ \hat{h}_{\mathrm{GIC}} = t_m \text{ s.t. } \psi_{\mathrm{GIC},m-1}(t_m) < 0 \bigr\}.
\tag{A.5}
\]
Notice that
\[
\psi_{\mathrm{GIC},m-1}(t_m) = -\frac{\alpha}{t_m} t_m^2 + 2 n t_m - \alpha (n \hat{\sigma}^2 - t_m) = 2 n t_m - \alpha n \hat{\sigma}^2.
\]
Hence, we derive
\[
\psi_{\mathrm{GIC},m-1}(t_m) < 0 \iff \hat{\sigma}^2 > \frac{2}{\alpha} t_m.
\tag{A.6}
\]
Recall that $\psi_{\mathrm{GIC},a}(h)$ is a concave quadratic function. Hence, we have
\[
\bigl\{ \psi_{\mathrm{GIC}}(\hat{h}_{\mathrm{GIC}}) = 0 \bigr\} \wedge \bigl\{ \exists \epsilon_0 > 0 \text{ s.t. } \forall \epsilon \in (0, \epsilon_0),\ \psi_{\mathrm{GIC}}(\hat{h}_{\mathrm{GIC}} - \epsilon) < 0 \bigr\}
\Longrightarrow
\bigl\{ \hat{h}_{\mathrm{GIC}} \text{ is the smaller of the two distinct real roots, or the double root, of } \psi_{\mathrm{GIC},a}(h) = 0 \text{ which is included in } R_a\ (a = 0, 1, \ldots, m-1) \bigr\}.
\tag{A.7}
\]
Notice that $\xi_{\mathrm{GIC},a}$ in (3.5) is the smaller of the two distinct real roots, or the double root, of $\psi_{\mathrm{GIC},a}(h) = 0$ when $\alpha^2 c_{2,a} (n \hat{\sigma}_0^2 + c_{1,a}) \le n^2$. Therefore, from (A.5), (A.6) and (A.7), $\hat{h}_{\mathrm{GIC}} \in S_{\mathrm{GIC}}$ can be shown. Consequently, Theorem 2 is proved.

A.4.
Proof of Corollary 1

It follows from (3.4) that
\[
\psi_{\mathrm{GIC},a}(h) = -\alpha c_{2,a} \left( h - \frac{n}{\alpha c_{2,a}} \right)^2 + \frac{n^2}{\alpha c_{2,a}} - \alpha (n \hat{\sigma}_0^2 + c_{1,a}),
\]
where $\hat{\sigma}_0^2$ is given by (2.8), and $c_{1,a}$ and $c_{2,a}$ are given by (2.16). This indicates that the $h$-coordinate of the vertex of $\psi_{\mathrm{GIC},a}(h)$ is $n / (\alpha c_{2,a})$. It follows from the ordering $t_1 \le \cdots \le t_m$ that for any $a \in \{0, 1, \ldots, m-1\}$,
\[
c_{2,a} = \sum_{j=a+1}^m \frac{1}{t_j} \le \sum_{j=a+1}^m \frac{1}{t_{a+1}} = \frac{m - a}{t_{a+1}} \le \frac{m}{t_{a+1}}.
\]
Hence, we have
\[
\frac{n}{\alpha c_{2,a}} \ge \frac{n}{\alpha m} t_{a+1} \quad (a = 0, 1, \ldots, m-1).
\tag{A.8}
\]
From this result, we can see that $n / (\alpha c_{2,a}) \ge t_{a+1}$ if $\alpha \le n/m$. This indicates that $\psi_{\mathrm{GIC},a}(h)$ is a strictly increasing function at $h \in R_a$ in (2.15) if $\alpha \le n/m$, because the $h$-coordinate of the vertex of $\psi_{\mathrm{GIC},a}(h)$ is larger than or equal to $t_{a+1}$, which is the right endpoint of $R_a$. Hence, $\psi_{\mathrm{GIC}}(h)$ is a strictly increasing function at $h \in (0, t_m]$ when $\alpha \le n/m$. Recall that $\psi_{\mathrm{GIC}}(h)$ is continuous at any $h \in (0, t_m]$ and $\lim_{h \to 0} \psi_{\mathrm{GIC},0}(h) < 0$ when $\hat{\sigma}_0^2 \neq 0$, where $\hat{\sigma}_0^2$ is given by (2.8). Therefore, when $\hat{\sigma}_0^2 \neq 0$ and $\alpha \le n/m$, we have
\[
\psi_{\mathrm{GIC},m-1}(t_m) \ge 0 \Longrightarrow \text{the unique solution of } \psi_{\mathrm{GIC}}(h) = 0 \text{ exists in } h \in (0, t_m] \Longrightarrow \{\#(A_{\mathrm{GIC}}) = 1\} \wedge \{T_{\mathrm{GIC}} = \emptyset\},
\]
\[
\psi_{\mathrm{GIC},m-1}(t_m) < 0 \Longrightarrow \text{there is no solution of } \psi_{\mathrm{GIC}}(h) = 0 \text{ in } h \in (0, t_m] \Longrightarrow \{A_{\mathrm{GIC}} = \emptyset\} \wedge \{T_{\mathrm{GIC}} = \{t_m\}\},
\]
where $A_{\mathrm{GIC}}$ and $T_{\mathrm{GIC}}$ are given by (3.6). From the above results and (A.6), Corollary 1 can be proved.

A.5.
Proof of Theorem 3

At first, we derive the minimizer of $\mathrm{EGCV}(\theta)$ in (4.2) when $\hat{\sigma}_0^2 = 0$ and $b \neq 0$, where $\hat{\sigma}_0^2$ and $b$ are given by (2.8) and (4.5), respectively. It is easy to see that $\hat{\sigma}^2(0_m) = 0$ when $\hat{\sigma}_0^2 = 0$, that $\mathrm{df}(\theta) < n$ when $b \neq 0$, and that $f_{\mathrm{EGCV}}(0, u) = 0$ holds for any $u \in [1, n)$, where $\hat{\sigma}^2(\theta)$ is given by (2.4) and $f_{\mathrm{EGCV}}(r, u)$ is given by (4.1). Moreover, $f_{\mathrm{EGCV}}(r, u) > 0$ holds for any $(r, u) \in (0, \hat{\sigma}^2] \times [1, n)$, where $\hat{\sigma}^2$ is given by (2.9). Notice that $\hat{\sigma}^2(\theta) = 0$ holds if and only if $\theta = 0_m$ and $\hat{\sigma}_0^2 = 0$, and that $\hat{\theta}(0) = 0_m$ holds, where $\hat{\theta}(h)$ is given by (2.12). These results imply that the minimizer of $\mathrm{EGCV}(\theta)$ is $\hat{\theta}(0)$ when $\hat{\sigma}_0^2 = 0$ and $b \neq 0$.

Next, we consider the case that $\hat{\sigma}_0^2 \neq 0$ or $b = 0$. Since $u < n$ in $f_{\mathrm{EGCV}}(r, u)$, it is necessary to satisfy the inequality $\mathrm{df}(\theta) < n$, where $\mathrm{df}(\theta)$ is given by (2.5). From (2.7), we can see that, when $\hat{\sigma}_0^2 \neq 0$ or $b = 0$, $\mathrm{df}(\theta) < n$ implies $\hat{\sigma}^2(\theta) \neq 0$, where $\hat{\sigma}^2(\theta)$ is given by (2.4). This indicates that the domain of $f_{\mathrm{EGCV}}$ is included in $(0, \hat{\sigma}^2] \times [1, n)$. Notice that
\[
\phi_{\mathrm{EGCV},0}(h) = \frac{\hat{\sigma}_0^2 + h^2 c_{2,0}/n}{(b + h c_{2,0}/n)^\alpha}, \qquad
\psi_{\mathrm{EGCV},0}(h) = -(\alpha - 2) c_{2,0} h^2 + 2 n b h - \alpha n \hat{\sigma}_0^2,
\]
where $\phi_{\mathrm{EGCV},a}(h)$, $\psi_{\mathrm{EGCV},a}(h)$ and $c_{2,a}$ are given by (4.4), (4.7) and (2.16), respectively. Recall that $b = 0 \Longrightarrow \hat{\sigma}_0^2 = 0$. It follows from the above equations and the assumption $\alpha > 2$ that
\[
\lim_{h \to 0} \phi_{\mathrm{EGCV},0}(h) =
\begin{cases}
\hat{\sigma}_0^2 / b^\alpha & (\hat{\sigma}_0^2 \neq 0) \\
\infty & (b = 0)
\end{cases}, \qquad
\lim_{h \to 0} \psi_{\mathrm{EGCV},0}(h) =
\begin{cases}
-\alpha n \hat{\sigma}_0^2 < 0 & (\hat{\sigma}_0^2 \neq 0) \\
0 & (b = 0).
\end{cases}
\]
These results imply that some positive number $\xi$ exists such that $\phi_{\mathrm{EGCV}}(\xi) < \lim_{h \to 0} \phi_{\mathrm{EGCV}}(h)$ when $\hat{\sigma}_0^2 \neq 0$ or $b = 0$, where $\phi_{\mathrm{EGCV}}(h)$ is given by (4.3). Therefore, by using Lemma 1, the minimizer of $\mathrm{EGCV}(\theta)$ is given by $\hat{\theta}(\hat{h}_{\mathrm{EGCV}})$, where $\hat{h}_{\mathrm{EGCV}}$ is the minimizer of $\phi_{\mathrm{EGCV}}(h)$:
\[
\hat{h}_{\mathrm{EGCV}} = \arg\min_{h \in \mathbb{R}_+ \setminus \{0\}} \phi_{\mathrm{EGCV}}(h).
\]
From (P1) and (P2) in Lemma 2, we can see that $\phi_{\mathrm{EGCV}}(h)$ is continuous at any $h \in \mathbb{R}_+ \setminus \{0\}$ and $\phi_{\mathrm{EGCV}}(h) = \hat{\sigma}^2 / (1 - n^{-1})^\alpha$ when $h \in R_m$, where $\hat{\sigma}^2$ is given by (2.9). The function $\psi_{\mathrm{EGCV}}(h)$ given by (4.6) is also continuous at any $h \in (0, t_m]$ because
\[
\psi_{\mathrm{EGCV},a}(t_{a+1}) = -(\alpha - 2) c_{2,a} t_{a+1}^2 + 2 (a + n b) t_{a+1} - \alpha (n \hat{\sigma}_0^2 + c_{1,a})
= -(\alpha - 2) \left( c_{2,a+1} + \frac{1}{t_{a+1}} \right) t_{a+1}^2 + 2 (a + n b) t_{a+1} - \alpha (n \hat{\sigma}_0^2 + c_{1,a+1} - t_{a+1})
= -(\alpha - 2) c_{2,a+1} t_{a+1}^2 + 2 (a + 1 + n b) t_{a+1} - \alpha (n \hat{\sigma}_0^2 + c_{1,a+1})
= \psi_{\mathrm{EGCV},a+1}(t_{a+1}) \quad (a = 0, \ldots, m-2),
\]
where $c_{1,a}$ is given by (2.16). Notice that, since $m - 1 + n b = n - 2$,
\[
\psi_{\mathrm{EGCV},m-1}(t_m) = -\frac{\alpha - 2}{t_m} t_m^2 + 2 (n - 2) t_m - \alpha (n \hat{\sigma}^2 - t_m) = 2 (n - 1) t_m - \alpha n \hat{\sigma}^2.
\]
Hence, in the same way as in the proof of Theorem 2, we can show that the set of candidate minimizers of $\phi_{\mathrm{EGCV}}(h)$ is given by $S_{\mathrm{EGCV}}$ in (4.8). Consequently, Theorem 3 is proved.

A.6.
Proof of Corollary 2

It follows from (4.7) that
\[
\psi_{\mathrm{EGCV},a}(h) = -(\alpha - 2) c_{2,a} \left\{ h - \frac{a + n b}{(\alpha - 2) c_{2,a}} \right\}^2 + \frac{(a + n b)^2}{(\alpha - 2) c_{2,a}} - \alpha (n \hat{\sigma}_0^2 + c_{1,a}),
\]
where $\hat{\sigma}_0^2$ and $b$ are given by (2.8) and (4.5), respectively, and $c_{1,a}$ and $c_{2,a}$ are given by (2.16). This indicates that the $h$-coordinate of the vertex of $\psi_{\mathrm{EGCV},a}(h)$ is $(a + n b) / \{(\alpha - 2) c_{2,a}\}$. From (A.8), we derive
\[
\frac{a + n b}{(\alpha - 2) c_{2,a}} \ge \frac{a + n b}{(\alpha - 2) m} t_{a+1} \ge \frac{n - m - 1}{(\alpha - 2) m} t_{a+1} \quad (a = 0, 1, \ldots, m-1).
\]
From this result, we can see that $(a + n b) / \{(\alpha - 2) c_{2,a}\} \ge t_{a+1}$ if $\alpha \le (n + m - 1)/m$. This indicates that $\psi_{\mathrm{EGCV},a}(h)$ is a strictly increasing function at $h \in R_a$ in (2.15) if $\alpha \le (n + m - 1)/m$, because the $h$-coordinate of the vertex of $\psi_{\mathrm{EGCV},a}(h)$ is larger than or equal to $t_{a+1}$, which is the right endpoint of $R_a$. Hence, $\psi_{\mathrm{EGCV}}(h)$ is a strictly increasing function at $h \in (0, t_m]$ when $\alpha \le (n + m - 1)/m$. Therefore, in the same way as in the proof of Corollary 1, we can prove Corollary 2.
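The candidate-set construction used in these proofs yields a finite search, which is what makes the optimization fast. The sketch below is only an illustration with synthetic inputs; it assumes the reconstructed forms $f_{\mathrm{EGCV}}(r, u) = r / (1 - u/n)^\alpha$ and $n b = n - 1 - m$ (neither is restated explicitly in this appendix), and it collects, for each interval $R_a$, the smaller root of the concave quadratic $\psi_{\mathrm{EGCV},a}(h) = 0$ that lies in $R_a$, plus the boundary point $t_m$ when $\psi_{\mathrm{EGCV},m-1}(t_m) < 0$:

```python
import numpy as np

# Synthetic inputs (illustration only): sorted t_1 <= ... <= t_m and a noise level.
t = np.array([0.5, 1.5, 4.0, 9.0])         # order statistics of z_j^2
m, n, alpha, sigma2_0 = len(t), 20, 3.0, 0.8
b = (n - 1 - m) / n                        # assumed form of b, so that n*b = n - 1 - m

def coeffs(a):
    """c_{1,a} = sum_{j<=a} t_j and c_{2,a} = sum_{j>a} 1/t_j, as in (2.16)."""
    return t[:a].sum(), (1.0 / t[a:]).sum()

def phi_egcv(h):
    """EGCV curve phi_EGCV(h), assuming f_EGCV(r, u) = r / (1 - u/n)^alpha."""
    a = int(np.searchsorted(t, h))         # h lies in R_a = (t_a, t_{a+1}]
    c1, c2 = coeffs(a)
    r = sigma2_0 + (c1 + h * h * c2) / n
    u = 1 + m - a - h * c2
    return r / (1.0 - u / n) ** alpha

def candidate_set():
    """Finite candidate set for the minimizer of phi_EGCV (cf. S_EGCV in (4.8))."""
    cand, lo = [], 0.0
    for a in range(m):
        c1, c2 = coeffs(a)
        A = (alpha - 2) * c2               # psi_{EGCV,a}(h) = -A h^2 + 2 B h - C
        B = a + n * b
        C = alpha * (n * sigma2_0 + c1)
        disc = B * B - A * C
        if disc >= 0:
            root = (B - np.sqrt(disc)) / A  # smaller root: psi changes from - to +
            if lo < root <= t[a]:
                cand.append(root)
        lo = t[a]
    sigma2 = sigma2_0 + t.sum() / n
    if 2 * (n - 1) * t[-1] - alpha * n * sigma2 < 0:
        cand.append(t[-1])                 # boundary candidate t_m
    return cand

cand = candidate_set()
h_hat = min(cand, key=phi_egcv)            # minimizer of EGCV over the candidates
```

When $\psi_{\mathrm{EGCV}}(h)$ is strictly increasing on $(0, t_m]$ (here $\alpha = 3 \le (n + m - 1)/m$), the candidate set contains at most one interior root, so after sorting the $z_j^2$ the whole optimization costs only $O(m)$ evaluations.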
Holzmann, Min, Czado: Validating linear restrictions in linear regression models with general error structure Sonderforschungsbereich 386, Paper 478 (2006) Online unter: http://epub.ub.uni-muenchen.de/
More informationOptimum designs for model. discrimination and estimation. in Binary Response Models
Optimum designs for model discrimination and estimation in Binary Response Models by Wei-Shan Hsieh Advisor Mong-Na Lo Huang Department of Applied Mathematics National Sun Yat-sen University Kaohsiung,
More informationFöreläsning /31
1/31 Föreläsning 10 090420 Chapter 13 Econometric Modeling: Model Speci cation and Diagnostic testing 2/31 Types of speci cation errors Consider the following models: Y i = β 1 + β 2 X i + β 3 X 2 i +
More informationMaster s Written Examination
Master s Written Examination Option: Statistics and Probability Spring 05 Full points may be obtained for correct answers to eight questions Each numbered question (which may have several parts) is worth
More informationEstimating prediction error in mixed models
Estimating prediction error in mixed models benjamin saefken, thomas kneib georg-august university goettingen sonja greven ludwig-maximilians-university munich 1 / 12 GLMM - Generalized linear mixed models
More informationLarge Sample Theory For OLS Variable Selection Estimators
Large Sample Theory For OLS Variable Selection Estimators Lasanthi C. R. Pelawa Watagoda and David J. Olive Southern Illinois University June 18, 2018 Abstract This paper gives large sample theory for
More informationData Mining Stat 588
Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic
More informationEmpirical Economic Research, Part II
Based on the text book by Ramanathan: Introductory Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 7, 2011 Outline Introduction
More informationBiostatistics Advanced Methods in Biostatistics IV
Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results
More informationTuning parameter selectors for the smoothly clipped absolute deviation method
Tuning parameter selectors for the smoothly clipped absolute deviation method Hansheng Wang Guanghua School of Management, Peking University, Beijing, China, 100871 hansheng@gsm.pku.edu.cn Runze Li Department
More informationComparison of Some Improved Estimators for Linear Regression Model under Different Conditions
Florida International University FIU Digital Commons FIU Electronic Theses and Dissertations University Graduate School 3-24-2015 Comparison of Some Improved Estimators for Linear Regression Model under
More informationKey Words: first-order stationary autoregressive process, autocorrelation, entropy loss function,
ESTIMATING AR(1) WITH ENTROPY LOSS FUNCTION Małgorzata Schroeder 1 Institute of Mathematics University of Białystok Akademicka, 15-67 Białystok, Poland mszuli@mathuwbedupl Ryszard Zieliński Department
More informationAR-order estimation by testing sets using the Modified Information Criterion
AR-order estimation by testing sets using the Modified Information Criterion Rudy Moddemeijer 14th March 2006 Abstract The Modified Information Criterion (MIC) is an Akaike-like criterion which allows
More informationLinear models and their mathematical foundations: Simple linear regression
Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction
More informationModel selection using penalty function criteria
Model selection using penalty function criteria Laimonis Kavalieris University of Otago Dunedin, New Zealand Econometrics, Time Series Analysis, and Systems Theory Wien, June 18 20 Outline Classes of models.
More informationSpatial Process Estimates as Smoothers: A Review
Spatial Process Estimates as Smoothers: A Review Soutir Bandyopadhyay 1 Basic Model The observational model considered here has the form Y i = f(x i ) + ɛ i, for 1 i n. (1.1) where Y i is the observed
More informationMath 423/533: The Main Theoretical Topics
Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)
More information