A high-dimensional bias-corrected AIC for selecting response variables in multivariate calibration


Ryoya Oda, Yoshie Mima, Hirokazu Yanagihara and Yasunori Fujikoshi
Department of Mathematics, Graduate School of Science, Hiroshima University, 1-3-1 Kagamiyama, Higashi-Hiroshima, Hiroshima, Japan

Last Modified: December 2, 2018

Abstract

In a multivariate linear regression with a $p$-dimensional response vector $y$ and a $q$-dimensional explanatory vector $x$, we consider a multivariate calibration problem requiring the estimation of an unknown explanatory vector $x_0$ corresponding to a response vector $y_0$, based on $y_0$ and $n$ samples of $x$ and $y$. We propose a high-dimensional bias-corrected Akaike's information criterion ($HAIC_C$) for selecting response variables. To correct the bias between a risk function and its estimator, we use a hybrid-high-dimensional asymptotic framework in which $n$ tends to $\infty$ but $p/n$ does not exceed $1$. Through numerical experiments, we verify that the $HAIC_C$ performs better than the formal AIC.

1 Introduction

Multivariate linear regression is very widely used in various fields. Let $y = (y_1, \ldots, y_p)'$ and $x = (x_1, \ldots, x_q)'$ be a $p$-dimensional response vector and a $q$-dimensional nonstochastic explanatory vector, respectively. A multivariate linear regression is written as

$y = \alpha + B'x + \varepsilon$, (1)

where $\alpha$ is a $p$-dimensional unknown vector of intercept coefficients, $B$ is a $q \times p$ unknown matrix of regression coefficients, and $\varepsilon$ is a $p$-dimensional error vector. We assume that $\varepsilon$ is distributed according to the $p$-dimensional normal distribution with mean vector $0_p$, which is a $p$-dimensional vector of zeros, and $p \times p$ covariance matrix $\Sigma$. Further, assume that there are $n$ observations $(y_1, x_1), \ldots, (y_n, x_n)$, which are expressed as $Y = (y_1, \ldots, y_n)'$ ($n \times p$) and $X = (x_1, \ldots, x_n)'$ ($n \times q$) in matrix notation, and that $X$ is centered, i.e., $X'1_n = 0_q$, where $1_n$ is the $n$-dimensional vector of ones. Our model selection criterion is defined for the case that $\mathrm{rank}(X) = q$ and $n - p - q - 2 > 0$. Corresponding author:
oda.stat@gmail.com. Current address: Institute of Educational Foundation Fukuyama Akenohoshi, 3-4-1 Nishifukatsucho, Fukuyama-shi, Hiroshima, Japan.

In actual empirical contexts, calibration is often needed and is widely applied (e.g., Alves and Poppi, 2016; Bro, 2003; Carvalho et al., 2015). A typical calibration problem can be described as follows. Suppose that a value $y_0$ of $y$ in (1) has been observed, but the corresponding value $x_0$ of $x$ is unknown. The problem is then to make an inference about the unknown vector $x_0$ based on $y_0$, $Y$, and $X$. Brown (1982) summarized various aspects of this problem, and there is considerable literature on point estimators of $x_0$ and confidence regions for $x_0$ in the controlled calibration model.

Another calibration problem involves removing redundant response variables when estimating $x_0$. Generally, fitting a reduced model to a subset of the available data leads to an inferior fit; this would be the case even if all the parameters were known. However, it is understood that when some response variables are redundant, we can expect better prediction by discarding these redundant response variables. In this vein, two approaches are salient: one based on test procedures for a redundancy hypothesis, and the other based on model selection procedures for selecting response variables. Brown (1982) proposed a procedure based on a test for redundancy of response variables. In the context of model selection, Fujikoshi and Nishii (1986) proposed a formal Akaike's information criterion (AIC) for the redundancy of response variables in estimating $x_0$. In the AIC, the goodness of fit of a model is measured by the Kullback-Leibler (KL) discrepancy function. The best model is defined as the one whose risk function, defined by the expected KL loss function, is lowest among all the redundancy models. The AIC is defined by adding an estimator of the bias between the risk function and the expectation of the negative twofold maximum log-likelihood to the negative twofold maximum log-likelihood, so the AIC is regarded as an estimator of the risk function.
Following Akaike (1973), Fujikoshi and Nishii (1986) formally regarded twice the number of parameters in the redundancy model as an estimator of the bias. However, it is known that when the sample size $n$ is small or the number of redundancy models is large, a formal AIC does not estimate the risk function well and tends to choose a model in which many response variables are not redundant. Therefore, it is important to correct the bias in order to estimate the risk function better. For selecting explanatory variables in an ordinary linear regression, Bedrick and Tsai (1994), Hurvich and Tsai (1989), and Sugiura (1978) corrected the bias for overspecified models. However, these corrections did not consider the selection of response variables in multivariate calibration.

In recent years, it has become commonplace to analyze high-dimensional data in which not only the sample size $n$ but also the dimension $p$ is large. Bedrick and Tsai (1994), Hurvich and Tsai (1989), and Sugiura (1978) used a large-sample (LS) asymptotic framework, in which only $n$ tends to $\infty$, to correct the bias. However, corrections based on the LS asymptotic framework become inadequate when the response dimension $p$ is large. It is known that corrections using the high-dimensional (HD) asymptotic framework, in which both $n$ and $p$ tend to $\infty$, are better than those using the LS asymptotic framework. Fujikoshi et al. (2014) offered a bias correction using the HD asymptotic framework in the context of selecting explanatory variables in a multivariate linear regression. Such a correction can be expected to perform well when both $n$ and $p$ are large, but it is inadequate when $p$ is small. In practice, it can be difficult for analysts to decide whether $p$ is large or small; thus, when $p$ is moderate, it can be burdensome to determine whether an AIC corrected under the LS or the HD asymptotic framework should be applied. In this paper, we correct the bias under a hybrid-high-dimensional (HHD) asymptotic framework:

$n \to \infty$, $p/n \to c \in [0, 1)$.

It should be noted that the LS and HD asymptotic frameworks are included in the HHD asymptotic framework as special cases. Thus, the bias-corrected AIC obtained under the HHD asymptotic framework is expected to be a better estimator of the risk function regardless of the size of $p$. We refer to this bias-corrected AIC as the high-dimensional bias-corrected AIC ($HAIC_C$).

The remainder of the paper is organized as follows. In Section 2, we introduce a framework for selecting response variables in multivariate calibration. In Section 3, the $HAIC_C$ is proposed and an asymptotic property of the $HAIC_C$ is obtained. In Section 4, we explore and verify the performance of the $HAIC_C$ through numerical simulations. Technical details are relegated to the Appendix.

2 A framework for selecting response variables

2.1 A redundancy hypothesis of response variables

Let $Z = (1_n, X)$ and $\Theta = (\alpha, B')'$ be $n \times (q+1)$ and $(q+1) \times p$ matrices. Then, the multivariate linear regression for $Y$ and $X$ is written as

$Y = Z\Theta + E$, (2)

where $E = (\varepsilon_1, \ldots, \varepsilon_n)'$ is the $n \times p$ error matrix and $\varepsilon_1, \ldots, \varepsilon_n$ are mutually independent and identically distributed according to $N_p(0_p, \Sigma)$, assuming that $\Sigma$ is positive definite. To introduce the redundancy hypothesis of response variables of Fujikoshi and Nishii (1986), we prepare a classical estimator of $x_0$. If the population parameters $\alpha$, $B$, and $\Sigma$ are known, then the classical estimator $\hat{x}_0$ is written as

$\hat{x}_0 = \arg\min_{x_0}\{(y_0 - \alpha - B'x_0)'\Sigma^{-1}(y_0 - \alpha - B'x_0)\} = (B\Sigma^{-1}B')^{-1}B\Sigma^{-1}(y_0 - \alpha)$. (3)

We partition $y = (y_1', y_2')'$. Without loss of generality, let $y_1$ and $y_2$ be the $p_1$-dimensional and $(p - p_1)$-dimensional vectors of the non-redundant variables and the candidate redundant variables, respectively. To define the redundancy hypothesis, we consider only the case $q \leq p_1 < p$.
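As a small numerical illustration (not from the paper), the classical estimator in (3) can be computed directly when $\alpha$, $B$, and $\Sigma$ are known; the dimensions and parameter values below are hypothetical, and $y_0$ is taken noiseless so that $x_0$ is exactly recoverable:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 5, 2
alpha = rng.normal(size=p)                        # hypothetical intercept vector
B = rng.normal(size=(q, p))                       # hypothetical q x p coefficient matrix
Sigma = 0.8 * np.eye(p) + 0.2 * np.ones((p, p))   # hypothetical positive definite covariance

x0_true = np.array([0.5, -1.0])
y0 = alpha + B.T @ x0_true                        # noiseless response for illustration

# x0_hat = (B Sigma^{-1} B')^{-1} B Sigma^{-1} (y0 - alpha), as in (3)
Si = np.linalg.inv(Sigma)
x0_hat = np.linalg.solve(B @ Si @ B.T, B @ Si @ (y0 - alpha))
print(x0_hat)  # recovers x0_true up to floating-point error
```

Because $y_0 - \alpha = B'x_0$ here, the generalized-least-squares solution reproduces $x_0$ exactly; with a noisy $y_0$ it is only an estimate.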
We express the partitions of $y_0$, $\alpha$, $B$, and $\Sigma$ corresponding to the division $y = (y_1', y_2')'$ as follows:

$y_0 = (y_{0,1}', y_{0,2}')'$, $\alpha = (\alpha_1', \alpha_2')'$, $B = (B_1, B_2)$, $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$, (4)

where $y_{0,1}$ and $y_{0,2}$ are $p_1$-dimensional and $(p-p_1)$-dimensional vectors, $\alpha_1$ and $\alpha_2$ are $p_1$-dimensional and $(p-p_1)$-dimensional vectors, $B_1$ and $B_2$ are $q \times p_1$ and $q \times (p-p_1)$ matrices, $\Sigma_{11}$ and $\Sigma_{22}$ are the $p_1 \times p_1$ and $(p-p_1) \times (p-p_1)$ covariance matrices of $y_1$ and $y_2$, and $\Sigma_{12}$ is the $p_1 \times (p-p_1)$ covariance matrix of $y_1$ and $y_2$. Let $C = (B\Sigma^{-1}B')^{-1}B\Sigma^{-1}$. We also partition the classical estimator $\hat{x}_0$ in (3) as

$\hat{x}_0 = C(y_0 - \alpha) = C_1(y_{0,1} - \alpha_1) + C_2(y_{0,2} - \alpha_2)$,

where $C_1$ and $C_2$ are the $q \times p_1$ and $q \times (p-p_1)$ submatrices of $C = (C_1, C_2)$. From Fujikoshi and Nishii (1986), the hypothesis that $y_2$ is redundant for estimating $x_0$ is written as

$C_2 = O_{q, p-p_1}$, or equivalently $B_2 - B_1\Gamma = O_{q, p-p_1}$, (5)

where $\Gamma = \Sigma_{11}^{-1}\Sigma_{12}$ and $O_{q, p-p_1}$ is the $q \times (p-p_1)$ matrix of zeros.

2.2 Models, risk, and bias

Let $Y_1$ and $Y_2$ be the $n \times p_1$ and $n \times (p-p_1)$ partitioned matrices of $Y = (Y_1, Y_2)$, and let $\Theta_1 = (\alpha_1, B_1')'$, $\Theta_2 = (\alpha_2, B_2')'$, and $\Theta_{2 \cdot 1} = \Theta_2 - \Theta_1\Gamma$. From a property of the conditional distribution of a multivariate normal distribution (e.g., Srivastava and Khatri, 1979), we can express (2) as follows:

$Y_1 \sim N_{n \times p_1}(Z\Theta_1, \Sigma_{11} \otimes I_n)$, $Y_2 \mid Y_1 \sim N_{n \times (p-p_1)}(Z\Theta_{2 \cdot 1} + Y_1\Gamma, \Sigma_{22 \cdot 1} \otimes I_n)$,

where $\Sigma_{ab \cdot c} = \Sigma_{ab} - \Sigma_{ac}\Sigma_{cc}^{-1}\Sigma_{cb}$. Under hypothesis (5), it is straightforward to observe that $Z\Theta_{2 \cdot 1} = 1_n\delta'$, where $\delta = \alpha_2 - \Gamma'\alpha_1$. Therefore, the candidate model $M$ such that $y_2$ is redundant is expressed as

$Y_1 \sim N_{n \times p_1}(Z\Theta_1, \Sigma_{11} \otimes I_n)$, $Y_2 \mid Y_1 \sim N_{n \times (p-p_1)}(1_n\delta' + Y_1\Gamma, \Sigma_{22 \cdot 1} \otimes I_n)$. (6)

Let $S = (Y_1, Y_2, X)'(I_n - J_n)(Y_1, Y_2, X)$ be $n$ times the sample covariance matrix of $(y', x')'$, where $J_n = n^{-1}1_n1_n'$. We partition $S$ as follows:

$S = \begin{pmatrix} S_{11} & S_{12} & S_{1x} \\ S_{21} & S_{22} & S_{2x} \\ S_{x1} & S_{x2} & S_{xx} \end{pmatrix}$,

where the sizes of $S_{11}$, $S_{22}$, and $S_{xx}$ are $p_1 \times p_1$, $(p-p_1) \times (p-p_1)$, and $q \times q$, respectively. Let $f(Y; \Theta, \Sigma)$ be the probability density function of $N_{n \times p}(Z\Theta, \Sigma \otimes I_n)$. Then, the log-likelihood function is derived as

$l(\Theta, \Sigma; Y, X) = \log f(Y; \Theta, \Sigma) = -\frac{1}{2}\left[np\log 2\pi + n\log|\Sigma| + \mathrm{tr}\{\Sigma^{-1}(Y - Z\Theta)'(Y - Z\Theta)\}\right]$.

By maximizing $l(\Theta, \Sigma; Y, X)$ under model (6), we can obtain the maximum likelihood estimators (MLEs) of $\Sigma_{11}$, $\Sigma_{22 \cdot 1}$, $\alpha_1$, $\alpha_2$, $B_1$, $B_2$, $\Theta_1$, $\Theta_{2 \cdot 1}$, $\Gamma$, and $\delta$ as follows (the proof is given in Appendix A):

$\hat{\Sigma}_{11} = n^{-1}Y_1'(I_n - P_Z)Y_1$, $\hat{\Sigma}_{22 \cdot 1} = n^{-1}Y_2'(I_n - P_{(1_n, Y_1)})Y_2$,
$\hat{\alpha}_1 = \bar{y}_1 - S_{1x}S_{xx}^{-1}\bar{x}$, $\hat{B}_1 = S_{xx}^{-1}S_{x1}$, $\hat{\Theta}_1 = (Z'Z)^{-1}Z'Y_1 = (\hat{\alpha}_1, \hat{B}_1')'$,
$\hat{\alpha}_2 = \bar{y}_2 - S_{21}S_{11}^{-1}S_{1x}S_{xx}^{-1}\bar{x}$, $\hat{B}_2 = S_{xx}^{-1}S_{x1}S_{11}^{-1}S_{12}$, $\hat{\Theta}_2 = (\hat{\alpha}_2, \hat{B}_2')'$, $\hat{\Theta}_{2 \cdot 1} = \hat{\Theta}_2 - \hat{\Theta}_1\hat{\Gamma}$,
$\hat{\Gamma} = S_{11}^{-1}S_{12}$, $\hat{\delta} = \bar{y}_2 - \hat{\Gamma}'\bar{y}_1$,
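A minimal numerical sketch of the residual-based MLEs $\hat{\Sigma}_{11}$ and $\hat{\Sigma}_{22 \cdot 1}$ above; the dimensions and data are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, p1, q = 50, 6, 3, 2

X0 = rng.uniform(-1.0, 1.0, size=(n, q))
X = X0 - X0.mean(axis=0)                    # centered so that X'1_n = 0_q
Z = np.hstack([np.ones((n, 1)), X])         # Z = (1_n, X)
Y = rng.normal(size=(n, p))                 # placeholder data for illustration
Y1, Y2 = Y[:, :p1], Y[:, p1:]

def residual_projector(A):
    """I_n - P_A with P_A = A (A'A)^{-1} A'."""
    return np.eye(A.shape[0]) - A @ np.linalg.solve(A.T @ A, A.T)

Sigma11_hat = Y1.T @ residual_projector(Z) @ Y1 / n
W = np.hstack([np.ones((n, 1)), Y1])        # (1_n, Y_1)
Sigma221_hat = Y2.T @ residual_projector(W) @ Y2 / n
```

Both estimators are residual cross-product matrices, so they are symmetric and positive semi-definite by construction.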

where $P_A = A(A'A)^{-1}A'$ for a matrix $A$, $\bar{y}_1 = n^{-1}Y_1'1_n$, $\bar{y}_2 = n^{-1}Y_2'1_n$, and $\bar{x} = n^{-1}X'1_n$. We assume that the data are generated from the following true model $M_*$:

$Y \sim N_{n \times p}(\Psi_*, \Sigma_* \otimes I_n)$, $y_0 \sim N_p(\xi_*, \Sigma_*)$,

where $\Psi_*$ is an $n \times p$ true mean matrix, $\xi_*$ is a $p$-dimensional true mean vector, and $\Sigma_*$ is a $p \times p$ true covariance matrix assumed to be positive definite. Under (6), we partition the true covariance matrix $\Sigma_*$ in the same way as $\Sigma$:

$\Sigma_* = \begin{pmatrix} \Sigma_{11*} & \Sigma_{12*} \\ \Sigma_{21*} & \Sigma_{22*} \end{pmatrix}$.

Let $\Gamma_* = \Sigma_{11*}^{-1}\Sigma_{12*}$. We say that the candidate model $M$ is an overspecified model when $M$ satisfies

$\Psi_* = Z\Theta_*$, $B_{2*} - B_{1*}\Gamma_* = O_{q, p-p_1}$, (7)

where $\Theta_* = (\alpha_*, B_*')'$ is a $(q+1) \times p$ true unknown matrix, $\alpha_*$ is the $p$-dimensional true unknown vector of intercept coefficients, $B_*$ is the $q \times p$ true unknown matrix of regression coefficients, and $B_{1*}$ and $B_{2*}$ are the $q \times p_1$ and $q \times (p-p_1)$ submatrices of $B_* = (B_{1*}, B_{2*})$. Let $L(\Theta, \Sigma)$ be the expected negative twofold log-likelihood function:

$L(\Theta, \Sigma) = E_{Y*}[-2l(\Theta, \Sigma; Y, X)] = -2l(\Theta, \Sigma; \Psi_*, X) + n\,\mathrm{tr}(\Sigma^{-1}\Sigma_*)$, (8)

where $E_{Y*}$ denotes the expectation with respect to $Y$ under the true model $M_*$. Then, we define the risk function $R_{KL}$ as $R_{KL} = E_{Y*}[L(\hat{\Theta}, \hat{\Sigma})]$. This type of risk function is essentially the same as the expected KL discrepancy function between the true model and a redundancy model. Although the best model is defined as the candidate model with the smallest risk function, the risk function includes unknown parameters and therefore must be estimated. We usually estimate $R_{KL}$ by $-2l(\hat{\Theta}, \hat{\Sigma}; Y, X)$, which, under a candidate model $M$, is expressed as

$-2l(\hat{\Theta}, \hat{\Sigma}; Y, X) = np\{\log 2\pi + 1\} + n(\log|\hat{\Sigma}_{11}| + \log|\hat{\Sigma}_{22 \cdot 1}|)$ (9)

(the proof of (9) is given in Appendix A). However, when $R_{KL}$ is estimated by (9), the following bias arises between the risk function and $-2l(\hat{\Theta}, \hat{\Sigma}; Y, X)$:

$B_{KL} = R_{KL} - E_{Y*}[-2l(\hat{\Theta}, \hat{\Sigma}; Y, X)]$. (10)

Although $-2l(\hat{\Theta}, \hat{\Sigma}; Y, X) + B_{KL}$ may be considered as a criterion, note that the bias $B_{KL}$ also includes unknown parameters. Therefore, we consider estimating $B_{KL}$ by an estimator.

3 Main results

3.1 High-dimensional bias-corrected AIC

Let the number of parameters in the candidate model $M$ be $h_M = p + qp_1 + p(p+1)/2$. Following Akaike (1973), approximating $B_{KL}$ by $2h_M$ gives the formal AIC:

$\mathrm{AIC} = np\{\log 2\pi + 1\} + n(\log|\hat{\Sigma}_{11}| + \log|\hat{\Sigma}_{22 \cdot 1}|) + 2h_M \quad (q \leq p_1 < p)$,
$\mathrm{AIC} = np\{\log 2\pi + 1\} + n\log|\hat{\Sigma}_\omega| + 2h_M \quad (p_1 = p)$, (11)

where $\hat{\Sigma}_\omega = n^{-1}Y'(I_n - P_Z)Y$ is the MLE of $\Sigma$ in the model where none of the response variables $y$ are redundant. This formal AIC was given by Fujikoshi and Nishii (1986), and the model that minimizes the AIC is selected. However, since the AIC does not approximate the risk function well when $n$ is small or when $n$ and $p$ are both large, the AIC may not perform adequately in general. It is therefore important to consider a new criterion with a better approximation than the AIC in such cases. Correcting the bias $B_{KL}$ under the HHD asymptotic framework, we propose a high-dimensional bias-corrected AIC ($HAIC_C$) as follows:

$HAIC_C = np\{\log 2\pi + 1\} + n(\log|\hat{\Sigma}_{11}| + \log|\hat{\Sigma}_{22 \cdot 1}|) + m(n, p_1) \quad (q \leq p_1 < p)$,
$HAIC_C = np\log 2\pi + n\log|\hat{\Sigma}_\omega| + \dfrac{np(n+q+1)}{n-p-q-2} \quad (p_1 = p)$. (12)

Here, $m(n, p_1) = m_1(n, p_1) + \hat{m}_2(n, p_1)$, where $m_1(n, p_1)$ is an explicit constant depending only on $n$, $p$, $p_1$, and $q$, collecting the terms of the bias expansion that involve no unknown parameters, and $\hat{m}_2(n, p_1)$ is obtained from the remaining terms by replacing the unknown quantities with the statistics $\hat{\tau}_1$, $\hat{\tau}_2$, and $\hat{\tau}_3$ given by

$\hat{\tau}_1 = \dfrac{p - p_1}{n - p}\left[\mathrm{tr}\{S_e^{-1}(S_e + S_h)\} - (p - q)\right]$, $\hat{\tau}_2 = n^{-1}\hat{\tau}_1^2$, (14)

$\hat{\tau}_3 = \dfrac{n(p - p_1)}{(n - p)^2}\left[\mathrm{tr}[\{S_e^{-1}(S_e + S_h)\}^2] - (p - q)\right]$, (15)
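For concreteness, the formal AIC in (11) for a candidate split $(p_1, p - p_1)$ can be evaluated from the residual covariance determinants. The following standalone sketch uses hypothetical simulated data; only the formula $np\{\log 2\pi + 1\} + n(\log|\hat{\Sigma}_{11}| + \log|\hat{\Sigma}_{22 \cdot 1}|) + 2h_M$ is taken from the text:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, p1, q = 50, 6, 3, 2

X0 = rng.uniform(-1.0, 1.0, size=(n, q))
X = X0 - X0.mean(axis=0)
Z = np.hstack([np.ones((n, 1)), X])
Y = rng.normal(size=(n, p))                 # placeholder data
Y1, Y2 = Y[:, :p1], Y[:, p1:]

def resid_cross(A, M):
    """M'(I_n - P_A)M via least-squares residuals."""
    R = M - A @ np.linalg.solve(A.T @ A, A.T @ M)
    return M.T @ R

Sigma11_hat = resid_cross(Z, Y1) / n
Sigma221_hat = resid_cross(np.hstack([np.ones((n, 1)), Y1]), Y2) / n

h_M = p + q * p1 + p * (p + 1) // 2         # number of parameters in model M
aic = (n * p * (np.log(2 * np.pi) + 1)
       + n * (np.linalg.slogdet(Sigma11_hat)[1] + np.linalg.slogdet(Sigma221_hat)[1])
       + 2 * h_M)
```

Evaluating this quantity for every candidate $p_1$ and taking the minimizer is the selection rule of Fujikoshi and Nishii (1986); the $HAIC_C$ differs only in replacing $2h_M$ by the corrected penalty.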

where $S_e = n\hat{\Sigma}_{11} = Y_1'(I_n - P_Z)Y_1$ and $S_h = Y_1'P_XY_1$. In practice, the $HAIC_C$ for $q \leq p_1 < p$ was obtained as follows. We expanded the bias $B_{KL}$ in overspecified models (the result is given in Appendix B, Proposition B.2). Since unknown parameters are included in the expanded terms, we replaced them with the statistics $\hat{\tau}_1$, $\hat{\tau}_2$, and $\hat{\tau}_3$. The $HAIC_C$ is therefore defined using $m(n, p_1)$ instead of $B_{KL}$. When $p_1 = p$, the redundancy model is the usual multivariate linear regression in which no response variables are redundant; therefore, from Bedrick and Tsai (1994), the bias for $p_1 = p$ is exactly equal to $-np + np(n+q+1)/(n-p-q-2)$.

3.2 Asymptotic property of $HAIC_C$

To present an asymptotic property of the $HAIC_C$, we begin with some notation and assumptions. Let $\Xi_*$ be the $p \times p$ constant matrix given by

$\Xi_* = B_*'X'XB_*$. (16)

Note that $\Sigma_*^{-1/2}\Xi_*\Sigma_*^{-1/2}$ is a $p \times p$ symmetric matrix and $\mathrm{rank}(\Sigma_*^{-1/2}\Xi_*\Sigma_*^{-1/2}) \leq \min\{p, q\} = q$. Therefore, its spectral decomposition can be expressed as

$\Sigma_*^{-1/2}\Xi_*\Sigma_*^{-1/2} = H_*\begin{pmatrix} \Lambda_* & O_{q, p-q} \\ O_{p-q, q} & O_{p-q, p-q} \end{pmatrix}H_*'$, (17)

where $H_*$ is a $p \times p$ orthogonal matrix and $\Lambda_*$ is a $q \times q$ diagonal matrix whose $a$-th diagonal element is a singular value $\lambda_a$, i.e., $\Lambda_* = \mathrm{diag}(\lambda_1, \ldots, \lambda_q)$ with $\lambda_1 \geq \cdots \geq \lambda_q \geq 0$. Let $G$ be a $q \times q$ random matrix distributed according to the non-central Wishart distribution with $n$ degrees of freedom, covariance matrix $\Phi^{-1}$, and non-centrality matrix $\Lambda_*\Phi^{-1}$, that is,

$G \sim W_q(n, \Phi^{-1}; \Lambda_*\Phi^{-1})$, (18)

where $\Phi$ is defined by

$\Phi = nI_q + \Lambda_*$. (19)

We prepare the following assumptions for the moments of $G$, in terms of $\tilde{G} = (n+\lambda_q)^{1/2}(G - I_q)$:

Assumption A1. $E[\mathrm{tr}(\tilde{G}^2)] = O(1)$.

Assumption A2. $E[\mathrm{tr}(\tilde{G}^4)] = O(1)$, $E[\{\mathrm{tr}(\tilde{G}^2)\}^2] = O(n+\lambda_q)$, $E[\mathrm{tr}(\tilde{G}^8)] = O((n+\lambda_q)^3)$.

Note that $\|\Phi\| = O(n+\lambda_1)$ and $\|\Lambda_*\Phi^{-1}\| = O(1)$ because

$\|\Phi\|^2 = \sum_{i=1}^q (n+\lambda_i)^2 \leq q(n+\lambda_1)^2 = O((n+\lambda_1)^2)$, $\|\Lambda_*\Phi^{-1}\|^2 = \sum_{i=1}^q \frac{\lambda_i^2}{(n+\lambda_i)^2} \leq \frac{q\lambda_1^2}{(n+\lambda_1)^2} = O(1)$,

where $\|\cdot\|$ is the Frobenius norm. Therefore, Assumptions A1 and A2 are natural. We obtain the asymptotic unbiasedness of the $HAIC_C$ for $R_{KL}$ as Theorem 3.1, which is derived directly from (10), Proposition B.2 in Appendix B, and Proposition C.2 in Appendix C.

Theorem 3.1. Suppose that $q \leq p_1 < p$ and that Assumptions A1 and A2 hold. Then, under overspecified models (7), we have

$R_{KL} - E_{Y*}[HAIC_C] = O(pn^{-1}(n+\lambda_q)^{-1/2})$,

as $n \to \infty$ and $p/n \to c \in [0, 1)$, where $\lambda_q$ is the $q$-th singular value of $\Sigma_*^{-1/2}\Xi_*\Sigma_*^{-1/2}$.

From Theorem 3.1, the $HAIC_C$ approximates the risk function well both when $n$ is not large and when $n$ and $p$ are both large. Thus, the $HAIC_C$ can be expected to work well.

4 Numerical simulations

We conduct numerical simulations to show that the $HAIC_C$ in (12) works better than the AIC in (11). The $p$ redundancy models $M_j$ ($j = 1, \ldots, p$) were prepared for Monte Carlo simulations with 10,000 iterations. Here, model $M_j$ denotes the model in which $y_1, \ldots, y_j$ are not redundant and $y_{j+1}, \ldots, y_p$ are redundant. Suppose that the true model is the model in which the $p_*$ response variables $y_1, \ldots, y_{p_*}$ are not redundant. The data $Y$ were generated from the true model $N_{n \times p}(Z\Theta_*, \Sigma_* \otimes I_n)$, where $Z = (1_n, X)$ and $\Theta_* = (\alpha_*, B_*')'$, and we set $\alpha_* = 1_p$ and $\Sigma_* = 0.8I_p + 1_p1_p'$. We constructed $X$ and $B_*$ as follows. First, we independently generated $u_1, \ldots, u_n$ from $U(-1, 1)$, where $U(a, b)$ denotes the uniform distribution on $(a, b)$. Using $u_1, \ldots, u_n$, we constructed $X$ as $X = (I_n - J_n)X_0$, where the $(i, j)$-th element of $X_0$ is defined by $u_i^j$ ($1 \leq i \leq n$, $1 \leq j \leq q$). Second, let $B_{1*}$ and $B_{2*}$ be the submatrices of $B_*$ as in (4) with $p_1 = p_*$; similarly, $\Sigma_{11*}$ and $\Sigma_{12*}$ are submatrices of $\Sigma_*$. Using this notation, we constructed $B_{2*}$ as $B_{2*} = B_{1*}\Sigma_{11*}^{-1}\Sigma_{12*}$, where the $l$-th ($1 \leq l \leq p_* - 1$) column vectors and the $p_*$-th column vector of $B_{1*}$ are defined by $(1, 2, \ldots, 2)'$ and $(1, 2p, \ldots, 2p)'$, respectively. Next, we set $q = 2$, and data were generated for various combinations of $n$, $p$, and $p_*$. In order to compare the performances of the $HAIC_C$ and the AIC, the following three properties were considered. The ratio of the information criterion (IC) average to the risk function $R_{KL}$: $E_{Y*}[\mathrm{IC}]/R_{KL}$. The probability of selecting the true model: the frequency with which the true model is chosen as the best model.
The KL information of the predicted values of the best model chosen by the information criterion, defined by

$\mathrm{KL} = (np)^{-1}\{E_{Y*}[L(\hat{\Theta}_{best}, \hat{\Sigma}_{best})] - L(\Theta_*, \Sigma_*)\}$,

where $L(\Theta, \Sigma)$ is given by (8) and $\hat{\Theta}_{best}$ and $\hat{\Sigma}_{best}$ are the MLEs of $\Theta$ and $\Sigma$, respectively, under the best model. A high-performance model selector is a model selection criterion for which the ratio of the information criterion average to the risk function is close to 1, the probability of selecting the true model is high, and the KL information is small.

Figures 1-3 show the ratio of the average of each criterion, i.e., the $HAIC_C$ and the AIC, to $R_{KL}$. From Figures 1-3, it can be observed that the values for the $HAIC_C$ are closer to 1 than those for

the AIC. Therefore, the $HAIC_C$ approximates the risk accurately, whereas the AIC underestimates it. Moreover, as the dimension $p$ increases, the bias of the AIC increases. Table 1 shows the probabilities of selecting the true model and the KL information for the $HAIC_C$ and the AIC. It is clear that the probability for the $HAIC_C$ is higher than that for the AIC. On the other hand, the KL information for the $HAIC_C$ nearly equals that for the AIC, except when $p = 80$ and $p_* = 60$, where it is significantly smaller than that for the AIC.

Figure 1: Ratio of the average of each criterion to the risk function with $n = 50$ (panels: $(p, p_*) = (10, 5), (20, 5), (20, 10), (40, 5), (40, 20)$)

Figure 2: Ratio of the average of each criterion to the risk function with $n = 100$ (panels: $(p, p_*) = (10, 5), (40, 5), (40, 20), (80, 5), (80, 40)$)

Figure 3: Ratio of the average of each criterion to the risk function with $n = 200$ (panels: $(p, p_*) = (10, 5), (80, 5), (80, 40), (160, 5), (160, 80)$)

Table 1: Probabilities of selecting the true model and the KL information of the predicted values of the best model. Columns: $n$, $p$, $p_*$; probabilities (%) for the $HAIC_C$ and the AIC; KL for the $HAIC_C$ and the AIC.

Acknowledgments

Ryoya Oda was supported by a Research Fellowship for Young Scientists from the Japan Society for the Promotion of Science. Hirokazu Yanagihara and Yasunori Fujikoshi were both partially supported by the Ministry of Education, Science, Sports, and Culture through Grants-in-Aid for Scientific Research (C), #8K0345 and #6K00047, respectively.

Appendix

A Deriving MLEs and proof of (9)

To derive the MLEs, we use the following lemma (e.g., Fujikoshi et al., 2010):

Lemma A.1. Let $Y$ be an $n \times p$ known matrix and $A$ an $n \times k$ known matrix of rank $k$. Consider a function of a $p \times p$ positive definite matrix $\Sigma$ and a $k \times p$ matrix $\Theta$ given by

$g(\Theta, \Sigma) = m\log|\Sigma| + \mathrm{tr}\{(Y - A\Theta)\Sigma^{-1}(Y - A\Theta)'\}$,

where $m > 0$ and all elements of $\Theta$ are bounded. Then, $g(\Theta, \Sigma)$ attains its minimum at

$\Theta = \hat{\Theta} = (A'A)^{-1}A'Y$, $\Sigma = \hat{\Sigma} = m^{-1}Y'(I_n - P_A)Y$,

13 and the minimum value is given by m log Σ + mp. First, we derive MLEs of Σ, Σ 22, α, α 2, B, B 2, Θ, Θ 2, Γ, and δ. From 6, we have 2lΘ, Σ; Y, X = np log 2π + n log Σ + tr{y ZΘ Σ Y ZΘ } + n log Σ 22 + tr{y 2 n δ Y ΓΣ 22 Y 2 n δ Y Γ }. Let g Θ, Σ = n log Σ +tr{y ZΘ Σ Y ZΘ } and g 2 δ, Γ, Σ 22 = n log Σ 22 + tr{y 2 n δ Y ΓΣ 22 Y 2 n δ Y Γ }. The set of unknown parameters {Θ, Σ} exhibits one-to-one correspondence with the set {α, B, C 2, δ, Γ, Σ, Σ 22 }, and each set of parameters for g and g 2 is separated. Thus, we only consider each minimization of g and g 2. Note that rankz = q + and rank{ n, Y } = p +. Then, from Lemma A., we can state that min g Θ, Σ = n log Σ + tr{y Z Θ Σ Y Z Θ } Θ,Σ = n log Σ + np, min δ,γ,σ 22 g 2 δ, Γ, Σ 22 = n log Σ 22 + tr{y 2 n δ Y Γ Σ 22 Y 2 n δ Y Γ } = n log Σ 22 + np p, where Θ = Z Z Z Y, Σ = n Y I n P Z Y, δ, Γ = { n, Y n, Y } n, Y Y 2 and Σ 22 = n Y 2 I n P n,y Y 2. Therefore, the MLEs of α, α 2, B, B 2, and Θ 2 can also be derived from Θ and δ. Next, we show 9. From Lemma A., it is clearly that 2l Θ, Σ; Y, X = np{log 2π + } + nlog Σ + log Σ 22. B Calculating and expanding bias Here, we present propositions for calculating and expanding the bias B KL in 0. B. Calculating bias and its proof Proposition B.. Suppose that q p < p. In overspecified models 7, we obtain the exact expression of the bias as follows: B KL = np + np n + q + n p q 2 + nn + p p n p 2 where Ξ is defined in 6. + nn + p p EY [trs n p 2 Σ ] + np p n p 2 E Y [trs Ξ ], B. The proof of Proposition B. is presented as follows. To calculate the bias, we describe a lemma for distributions of some statistics the proof is given in Appendix D. 3

14 Lemma B.. Suppose that q p < p. Let Σ 22 be the MLE of Σ 22 under model 6, and let [ Γ ] T = {Y I n J n Y } /2 {Y I n J n Y } Y I n J n Ω Γ Σ /2 22, B.2 ] [ δ U = { n, Y n, Y } /2 { n, Y Γ n, Y } n, Y Ω + Y Γ Σ /2 22, B.3 where Ω = 2 Γ, and and 2 are the n p and n p p partitioned matrices of =, 2. Then, T and U are independent of Y, and n Σ 22 and δ, Γ are mutually conditional independent under Y, and n Σ 22 Y W p p n p, Σ 22 ; Ω, T N p p p O p,p p, I p p I p, U N p+ p p O p +,p p, I p p I p +, where Ω = Ω I n P n,y Ω. Moreover, if the model is overspecified model 7, then n Σ 22 and Y are mutually independent, n Σ 22 and δ, Γ are also mutually independent, and n Σ 22 W p p n p, Σ 22, T = {Y I n J n Y } /2 Γ Γ Σ /2 U = { n, Y n, Y } /2 δ δ Γ Γ 22, Σ /2 22, where δ = α 2 Γ α and α and α 2 are the p - and p p -dimensional subvectors of α = α, α 2. From the general formula of the determinant of a partitioned matrix e.g., Lütkepohl, 997, , log Σ = log Σ + log Σ 22 holds. Under overspecified models 7, L Θ, Σ can be expressed as L Θ, Σ = np log 2π + nlog Σ + log Σ 22 + ntrσ Σ + tr{ Σ Θ Θ Z Z Θ Θ }. Thus, from the above equation and 9, the bias B KL in 0 can be expressed as [ B KL = np + EY ntrσ Σ + tr{ Σ Θ Θ Z Z Θ ] Θ }. We separate the second and third terms in the above equation into four terms in order to calculate the bias. It follows from the general formula for the inverse of a block matrix e.g., Harville, 997, Theorem 8.5. that Σ Σ O p,p p = Γ + O p p,p O p p,p p I p p Σ 22 Γ, I p p. Then, we can derive another expression of the bias B KL as follows: [ { }] B KL = np + ney [trσ Σ ] + ne Y tr Σ 22 Γ Γ, I p p Σ I p p 4

15 [ + EY tr{ Σ Θ Θ Z Z Θ ] Θ } [ { + E Y tr Σ 22 Γ, I p p Θ Θ Z Z Θ Θ Γ I p p }] = np + i + ii + iii + iv say, B.4 where Θ and Θ 2 are the q + p and q + p p submatrices of Θ = Θ, Θ 2. Now, we calculate i, ii, iii, and iv in B.4. Starting with i in B.4, it can be stated that nσ /2 Σ Σ /2 is distributed according to W p n q, I p. Thus, by using the expectation of an inverted Wishart distribution see, Watamori, 990, we have i = ne Y [trσ Σ ] = n 2 p n p q 2. B.5 Now we calculate ii in B.4. From Lemma B., by using the expectation of an inverted Wishart distribution e.g., Fujikoshi et al., 200, Theorem 2.2.7, the expectation of EY [ Σ 22 ] can be calculated as EY [ Σ 22 ] = ne Y [n Σ 22 n ] = n p 2 Σ 22. From the above equation and the fact that Σ 22 is independent of Γ, ii can be expressed as [ }] n 2 Γ ii = n p 2 E Y tr {Σ Σ 22 Γ, I p p. By the definition of T in B.2, we have Γ Σ Σ 22 Γ, I p p I p p = Σ Γ I p p Σ 22 Γ, I p p I p p {Y I n J n Y } /2 T Σ / Σ Γ {Y I n J n Y } /2 T Σ /2 22 O p p,p O p p,p p {Y I n J n Y } /2 T Σ / Σ Γ {Y I n J n Y } /2 T Σ /2 22 O p p,p O p p,p p {Y I n J n Y } /2 T T {Y I n J n Y } /2 O p,p p + Σ O p p,p O p p,p p = ii.a + ii.b + ii.c + ii.d say. From the general formula for the inverse of a block matrix, it is straightforward to observe that tr{ii.a} = p p. Since T and Y are mutually independent, we can calculate the expectations of tr{ii.b} and tr{ii.c} as follows: E Y [tr{ii.b}] = 0, E Y [tr{ii.c}] = 0. 5

16 Note that E Y [T T ] = p p I p. Then, from the independence of T and Y, we can calculate tr{ii.d} as E Y [tr{ii.d}] = p p E Y [ tr { Σ {Y I n J n Y } }]. Hence, ii can be expressed as ii = n2 p p n p 2 { + E Y [ tr { Σ {Y I n J n Y } }]}. B.6 Now we move on to calculating iii in B.4. Recall that n Σ = Y I n P Z Y and Θ = Z Z Z Y. Since it is straightforward to observe that Z Z Z I n P Z = O q+,n, n Σ and Θ are mutually independent from Cochran s Theorem e.g., Fujikoshi et al., 200, Theorem Thus, by using a property of conditional expectations, we obtain n iii = n p q 2 E Y = n n p q 2 trp ZtrI p = np q + n p q 2. [ tr{σ P ZY ZΘ P Z Y ZΘ } ] B.7 Finally, we calculate iv in B.4. Recall that Θ is expressed by α, α 2, B, and B 2. Then, we can express Γ, I p p Θ Θ as Γ, I p p Θ Θ = Γ α α B B, I p p α 2 α 2 B 2 B 2 = α 2 α 2 Γ α α, B 2 B 2 Γ B B. From the definitions of δ, α, and α 2, we have δ = α 2 Γ α. Thus, α 2 α 2 Γ α α = α 2 Γ α α 2 Γ α = δ δ + α 2 Γ α α 2 Γ α B.8 = δ δ + Γ Γ α { } =, α δ δ. B.9 Γ Γ Since it is straightforward that B 2 B Γ = Oq,p p and it is prudent that B 2 B Γ = O q,p p under overspecified models, we obtain B 2 B 2 B B Γ = B 2 B Γ B2 B Γ + B Γ Γ δ δ = 0 q, B. B.0 Γ Γ By using B.9 and B.0, we can express B.8 as Γ, I p p Θ Θ δ δ α = Γ Γ 0 q B 6

17 = Σ /2 22 U { n, Y n, Y } /2 α, 0 q B where U is defined in B.3. Thus, by using the above equation and the independence of U and Σ 22, iv can be expressed as follows: [ }] iv = np p n p 2 E Y tr {{ n, Y n, Y } n nα nα nα α + B. X XB From the general formula for the inverse of a block matrix, { n, Y n, Y } is expressed as { n, Y n, Y } n + ȳ = S ȳ ȳ S S ȳ S. Note that S and ȳ are mutually independent and E Y [ȳ ] = α. Thus, we can calculate iv as follows: iv = np p n p 2 = np p n p 2 [ [ { + E Y tr {nȳ α ȳ α + B X XB }S }]] [ [ { + E Y tr {Σ + B X XB }S }]]. B. Therefore, from B.4, B.5, B.6, B.7, and B., Proposition B. can be derived. B.2 Results for expanding bias and its proof Proposition B.2. Suppose that q p < p and Assumption A both hold. Then, under overspecified models 7, we have B KL = mn, p + Op max {n 2, n + λ q 3/2 }, B.2 as n, p/n c [0,, where λ q is the q-th diagonal element of Λ defined in 7. Here, mn, p is the constant given by mn, p = m n, p + m 2 n, p, where m n, p is defined in 3, and m 2 n, p is given by m 2 n, p = n2 p p { 2q + trφ ntrφ 2 ntrφ 2 }, n pn p in which Φ is defined by 9. The proof of Proposition B.2 is presented as follows. We use the following lemma the proof is given in Appendix D. Lemma B.2. Suppose that n q 2 > 0. Let G be random matrices defined by G = n + λ q G I q, 7

18 where G is defined by 8, λ q are given by 7, and Φ = n I q + Λ is defined by 9. Then, using the HHD asymptotic framework, we have G = O p, and G can be expanded as G = I q G + n + λq n + λ G2 + R G, q B.3 where R G = n + λ q 3/2 G G3 = O p n + λ q 3/2. Further, when Assumption A holds, we have E[trD Φ R G ] = On n + λ q 3/2, B.4 where D = + n I q + n Λ and Λ is given in 7. We focus on the case that q < p < p because the proof is simpler where p = q. Let V = H Σ /2 S Σ /2 H, B.5 where H is the p p orthogonal matrix given by 7. It is straightforward that V is distributed according to W p n, I p ; Λ, where Λ is given by Λ O q,p q Λ =. B.6 O p q,q O p q,p q Further, we partition V as V V 2 V =, V 2 V 22 where the sizes of V and V 22 are q q and p q p q, respectively. Let Ṽ be the p p random matrix given by Ṽ = Φ /2 V Φ /2. B.7 Since Φ = E Y [V ], it is straightforward that Ṽ is distributed according to W q n, Φ ; ΛΦ. Now, we calculate the parts of the expectations in B.. Using the definitions in 7 and B.5, we have nn + p p EY [trs n p 2 Σ ] + np p n p 2 E Y [trs Ξ ] = n2 p p [ n p 2 E Y trdv ], B.8 where D and the partitioned expression are given by D = + n I p + n Λ = D O q,p q O p q,q D 22 By applying the general formula for the inverse of a block matrix to V, trdv can be expressed as trdv = trd V + trd V V 2V 22 V 2V + trd 22V 22. Recall that V W p n, I p ; Λ. By using the properties of a partitioned non-central Wishart matrix see Kabe, 964, we can state that V W q n, I q ; Λ, V 22 W p qn q. 8

19 , I p q, V 2 V /2 N p q p O p q,p, I p I p q, and V, V 22 and V 2 V /2 are mutually independent. Thus, we can calculate the expectations of trd V V 2V22 V 2V and trd 22 V22 as follows: E Y [trd V V 2V 22 V 2V ] = p 2 n p 2 E Y [trd V ], EY [trd 22 V22 ] = n p 2 trd 22. Hence, from the above equations, E Y [ trdv ] is expressed as E Y [ trdv ] = n q 2 n p 2 E Y [trd V ] + n p 2 trd 22. B.9 From the definition of Ṽ in B.7, EY [trd V ] is expressed as E Y [trd Φ Ṽ ]. Let L = n + λ q Ṽ I q. Then, by using B.3 and B.4 in Lemma B.2, we can expand EY [trd V ] as E Y [trd V ] = trd Φ + n + λ q E Y [trd Φ L 2 ] + On n + λ q 3/2. B.20 From the expectation of a non-central Wishart distribution, n + λ q E Y [trd Φ L 2 ] can be calculated as EY [trd Φ L 2 ] n + λ q = 2 trφ 3 n 5 trφ 2 + n From B.2, we can express B.20 as E Y [trd V ] = trφ 2 trφ trφ 2 2 trφ trφ 2 n n 2q + trφ + q n n trd Φ. B.2 2q + trφ + q n n + On n + λ q 3/2. B.22 Thus, by using B.8, B.9, B.22, and Proposition B., we can expand the bias as follows: B KL = np + np n + q + n p q 2 + nn + p p n p 2 + n2 n q 2p p n p 2n p 2 + n 2 p p n p 2n p 2 { trφ 2 trφ 2 + 2q + trφ + q } n n + p q + Opn + λ q 3/2. n Note that n p q 2 = n p n p 2 = n p { + q + 2 n p + } q + 22 n p 2 + On 3, }, { + 2 n p + 4 n p 2 + On 3 9

20 n p 2 = { + 2 n p n p 2n p 2 = n p + } 4 n p 2 + On 3, { + 2 n p + 2 n p + n pn p 4 + n pn p + 8 n p n p 3 8 n p 2 n p + 8 n pn p 2 + On 4 4 n p n p 2 }. Therefore, by using the above equations, we can derive B.2. C Asymptotic properties of τ, τ 2 and τ 3 Here we present the results for expanding τ, τ 2 and τ 3 defined by 4 and 5. C. Expanding τ, τ 2, and τ 3 and the proof Proposition C.. Suppose that q p < p. Then, we have τ = p p trφ + O p pn /2 n + λ q, τ 2 = np p trφ 2 + O p pn /2 n + λ q 2, τ 3 = np p trφ 2 + O p pn /2 n + λ q 2, as n, p/n c [0,, where Φ is defined by 9. The proof of Proposition C. is presented as follows. To expand τ, τ 2 and τ 3 by using a HHD asymptotic framework, we use the following lemma which was essentially obtained in Wakaki et al. 204: Lemma C.. Let F h = F F and F e be independently distributed according to W p q, I p ; M M and W p n, I p, respectively. Here, F is a q p random matrix distributed according to N q p M, I p I q. Put B = F F, W = B /2 F F e F B /2. Then, B and W are independently distributed according to W q p, I q ; MM and W q n p+q, I q, respectively. Further, the nonzero eigenvalues of F h Fe we have tr{f e F e + F h } = tr{w W + B } + p q, are the same as those of BW, and tr [ {F e F e + F h } 2] = tr [ {W W + B } 2] + p q. We consider the distributions of S e and S h defined in 5. Since Y is distributed according to N n p ZΘ, Σ I n, it is straightforward that S e and S h are mutually independent and S e W p n q, Σ, S h W p q, Σ ; Ξ, 20

where Ξ is given by (16). Let

  T_e = H′ Σ^{−1/2} S_e Σ^{−1/2} H,  T_h = H′ Σ^{−1/2} S_h Σ^{−1/2} H,

where H is the p × p orthogonal matrix defined in (17). It follows from the properties of non-central Wishart matrices that

  T_e ∼ W_p(n − q, I_p),  T_h ∼ W_p(q, I_p; Λ),

where Λ is defined by (B.16). Then, we can express tr{S_e(S_e + S_h)^{−1}} and tr[{S_e(S_e + S_h)^{−1}}^2] as

  tr{S_e(S_e + S_h)^{−1}} = tr{T_e(T_e + T_h)^{−1}},  tr[{S_e(S_e + S_h)^{−1}}^2] = tr[{T_e(T_e + T_h)^{−1}}^2].

First, we express tr{S_e(S_e + S_h)^{−1}} and tr[{S_e(S_e + S_h)^{−1}}^2] as functions of q × q matrices in order to examine their asymptotic behaviors. From the definition of the non-central Wishart distribution, a different expression for T_h is given by T_h = U_h′ U_h, where U_h ∼ N_{q×p}(Λ*, I_p ⊗ I_q) and Λ* is the q × p partitioned matrix Λ* = (Λ_1, O_{q,p−q}) satisfying Λ*Λ*′ = Λ_1. Let

  W_1 = U_h U_h′,  W_2 = W_1^{1/2}(U_h T_e^{−1} U_h′)^{−1} W_1^{1/2}.   (C.1)

Then, from Lemma C.1, W_1 and W_2 are independently distributed according to W_q(p, I_q; Λ_1) and W_q(n − p, I_q), respectively, and we have

  tr{S_e(S_e + S_h)^{−1}} = tr{W_2(W_1 + W_2)^{−1}} + p − q,
  tr[{S_e(S_e + S_h)^{−1}}^2] = tr[{W_2(W_1 + W_2)^{−1}}^2] + p − q.

Next, we expand tr{W_2(W_1 + W_2)^{−1}} and tr[{W_2(W_1 + W_2)^{−1}}^2]. Let

  W_3 = Φ^{1/2}(W_1 + W_2)Φ^{1/2}.   (C.2)

Since W_1 and W_2 are mutually independent, we can state that W_3 ∼ W_q(n, Φ; Λ_1Φ). Therefore, using (B.13) in Lemma B.2, W_3 can be expanded as

  W_3 = I_q + O_p{(n + λ_q)^{−1/2}}.   (C.3)

By calculating the expectation and variance of W_2, we can also expand W_2 as

  W_2 = (n − p) I_q + O_p(n^{1/2}).

Thus, Φ^{1/2} W_2 Φ^{1/2} is expressed as

  Φ^{1/2} W_2 Φ^{1/2} = (n − p)Φ + O_p{n^{1/2}(n + λ_q)^{−1}}.   (C.4)

From (C.3) and (C.4), the following equations can be derived:

  tr{W_2(W_1 + W_2)^{−1}} = tr(Φ^{1/2} W_2 Φ^{1/2} W_3^{−1}) = (n − p) tr(Φ) + O_p{n^{1/2}(n + λ_q)^{−1}},
  [tr{W_2(W_1 + W_2)^{−1}}]^2 = (n − p)^2 (tr Φ)^2 + O_p{n^{3/2}(n + λ_q)^{−2}},

  tr[{W_2(W_1 + W_2)^{−1}}^2] = tr[{Φ^{1/2} W_2 Φ^{1/2} W_3^{−1}}^2] = (n − p)^2 tr(Φ^2) + O_p{n^{3/2}(n + λ_q)^{−2}}.

Thus, we can derive the following equations:

  (p − p_1)^{−1}(n − p)^{−1} tr{W_2(W_1 + W_2)^{−1}} = (p − p_1)^{−1} tr(Φ) + O_p{(p − p_1)^{−1} n^{−1/2}(n + λ_q)^{−1}},
  n(p − p_1)^{−1}(n − p)^{−2} [tr{W_2(W_1 + W_2)^{−1}}]^2 = n(p − p_1)^{−1} (tr Φ)^2 + O_p{(p − p_1)^{−1} n^{1/2}(n + λ_q)^{−2}},
  n(p − p_1)^{−1}(n − p)^{−2} tr[{W_2(W_1 + W_2)^{−1}}^2] = n(p − p_1)^{−1} tr(Φ^2) + O_p{(p − p_1)^{−1} n^{1/2}(n + λ_q)^{−2}}.

Therefore, we can derive the result of Proposition C.1.

C.2 Expansions of the expectations of τ̂_1, τ̂_2 and τ̂_3

Proposition C.2. Suppose that q + p_1 < p and Assumptions A1 and A2 hold. Then, we have

  E_Y[τ̂_1] = (p − p_1)^{−1} tr(Φ) + O{(p − p_1)^{−1} n^{−1}(n + λ_q)^{−1/2}},
  E_Y[τ̂_2] = n(p − p_1)^{−1} (tr Φ)^2 + O{(p − p_1)^{−1} n^{−1}(n + λ_q)^{−1/2}},
  E_Y[τ̂_3] = n(p − p_1)^{−1} tr(Φ^2) + O{(p − p_1)^{−1} n^{−1}(n + λ_q)^{−1/2}},

as n → ∞, p/n → c ∈ [0, 1), where Φ is defined by (19).

The proof of Proposition C.2 is presented as follows. Let W̃_3 = (n + λ_q)^{1/2}(W_3 − I_q), where W_3 is given by (C.2). Then, from Lemma B.2, we can observe that W_3^{−1} is expanded as W_3^{−1} = I_q − R_{W_3}, where R_{W_3} = (n + λ_q)^{−1/2} W_3^{−1} W̃_3. We consider E_Y[tr{W_2(W_1 + W_2)^{−1}}], E_Y[(tr{W_2(W_1 + W_2)^{−1}})^2] and E_Y[tr[{W_2(W_1 + W_2)^{−1}}^2]], where W_1 and W_2 are given by (C.1). By using R_{W_3}, we have

  E_Y[tr{W_2(W_1 + W_2)^{−1}}] = E_Y[tr(Φ^{1/2} W_2 Φ^{1/2})] − E_Y[tr(Φ^{1/2} W_2 Φ^{1/2} R_{W_3})],   (C.5)
  E_Y[(tr{W_2(W_1 + W_2)^{−1}})^2] = E_Y[{tr(Φ^{1/2} W_2 Φ^{1/2})}^2] − 2 E_Y[tr(Φ^{1/2} W_2 Φ^{1/2}) tr(Φ^{1/2} W_2 Φ^{1/2} R_{W_3})] + E_Y[{tr(Φ^{1/2} W_2 Φ^{1/2} R_{W_3})}^2],   (C.6)
  E_Y[tr[{W_2(W_1 + W_2)^{−1}}^2]] = E_Y[tr{(Φ^{1/2} W_2 Φ^{1/2})^2}] − 2 E_Y[tr{(Φ^{1/2} W_2 Φ^{1/2})^2 R_{W_3}}] + E_Y[tr{(Φ^{1/2} W_2 Φ^{1/2} R_{W_3})^2}].   (C.7)

Since Φ^{1/2} W_2 Φ^{1/2} is distributed according to W_q(n − p, Φ), we can calculate the expectations as follows:

  E_Y[tr(Φ^{1/2} W_2 Φ^{1/2})] = (n − p) tr(Φ) + O{(n + λ_q)^{−1}},   (C.8)
  E_Y[{tr(Φ^{1/2} W_2 Φ^{1/2})}^2] = (n − p)^2 (tr Φ)^2 + O{n(n + λ_q)^{−2}},   (C.9)
  E_Y[tr{(Φ^{1/2} W_2 Φ^{1/2})^2}] = (n − p)^2 tr(Φ^2) + O{n(n + λ_q)^{−2}}.   (C.10)
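The Wishart trace moments used in (C.8)–(C.10) can be sanity-checked numerically. The following Monte Carlo sketch (numpy) uses the identity scale I_q in place of Φ; the values q = 3, m = 30 and the replication count are illustrative choices, not values from the paper:

```python
import numpy as np

# Monte Carlo sanity check of the Wishart trace moments behind (C.8)-(C.10),
# with scale I_q in place of Phi. For W ~ W_q(m, I_q):
#   E[tr W]     = m*q
#   E[(tr W)^2] = (m*q)^2 + 2*m*q
#   E[tr(W^2)]  = m*(m + 1)*q + m*q^2
rng = np.random.default_rng(0)
q, m, reps = 3, 30, 100_000

X = rng.standard_normal((reps, m, q))
W = X.transpose(0, 2, 1) @ X                  # reps independent draws of W ~ W_q(m, I_q)
tr_W = np.trace(W, axis1=1, axis2=2)
tr_W2 = np.einsum('rij,rji->r', W, W)         # tr(W^2) for each draw

print(tr_W.mean())          # close to m*q = 90
print((tr_W ** 2).mean())   # close to (m*q)^2 + 2*m*q = 8280
print(tr_W2.mean())         # close to m*(m+1)*q + m*q^2 = 3060
```

Replacing I_q by any positive definite Φ checks the general formulas E[tr W] = m tr(Φ), Var(tr W) = 2m tr(Φ²) and E[tr(W²)] = m(m + 1) tr(Φ²) + m(tr Φ)², which give the leading terms and error orders quoted in (C.8)–(C.10).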

By using the Cauchy–Schwarz inequality, we have

  |E_Y[tr(Φ^{1/2} W_2 Φ^{1/2} R_{W_3})]| = (n + λ_q)^{−1/2} |E_Y[tr(Φ^{1/2} W_2 Φ^{1/2} W_3^{−1} W̃_3)]|
    ≤ (n + λ_q)^{−1/2} {E_Y[tr{(Φ^{1/2} W_2 Φ^{1/2})^4}]}^{1/2} {E_Y[tr(W̃_3^4)]}^{1/2} {E_Y[tr(W_3^{−2})]}^{1/2} = O{(n + λ_q)^{−1/2}},   (C.11)

  |E_Y[tr(Φ^{1/2} W_2 Φ^{1/2}) tr(Φ^{1/2} W_2 Φ^{1/2} R_{W_3})]|
    ≤ (n + λ_q)^{−1/2} {E_Y[tr{(Φ^{1/2} W_2 Φ^{1/2})^8}]}^{1/4} {E_Y[tr(W̃_3^4)]}^{1/4} {E_Y[tr(W_3^{−2})]}^{1/2} = O{(n + λ_q)^{−1/2}},   (C.12)

  E_Y[{tr(Φ^{1/2} W_2 Φ^{1/2} R_{W_3})}^2]
    ≤ (n + λ_q)^{−1} {E_Y[{tr{(Φ^{1/2} W_2 Φ^{1/2})^4}}^2]}^{1/4} {E_Y[{tr(W̃_3^4)}^2]}^{1/4} {E_Y[{tr(W_3^{−2})}^2]}^{1/2} = O{(n + λ_q)^{−1/2}},   (C.13)

  |E_Y[tr{(Φ^{1/2} W_2 Φ^{1/2})^2 R_{W_3}}]|
    ≤ (n + λ_q)^{−1/2} {E_Y[tr{(Φ^{1/2} W_2 Φ^{1/2})^4}]}^{1/2} {E_Y[tr(W̃_3^8)]}^{1/4} {E_Y[tr(W_3^{−4})]}^{1/4} = O{(n + λ_q)^{−1/2}},   (C.14)

  E_Y[tr{(Φ^{1/2} W_2 Φ^{1/2} R_{W_3})^2}]
    ≤ (n + λ_q)^{−1} {E_Y[tr{(Φ^{1/2} W_2 Φ^{1/2})^4}]}^{1/2} {E_Y[tr(W̃_3^8)]}^{1/2} {E_Y[tr(W_3^{−8})]}^{1/2} = O{(n + λ_q)^{−1/2}}.   (C.15)

Therefore, from (C.5)–(C.15), we have

  E_Y[τ̂_1] = (p − p_1)^{−1} tr(Φ) + O{(p − p_1)^{−1} n^{−1}(n + λ_q)^{−1/2}},
  E_Y[τ̂_2] = n(p − p_1)^{−1} (tr Φ)^2 + O{(p − p_1)^{−1} n^{−1}(n + λ_q)^{−1/2}},
  E_Y[τ̂_3] = n(p − p_1)^{−1} tr(Φ^2) + O{(p − p_1)^{−1} n^{−1}(n + λ_q)^{−1/2}}.

D Proofs of Lemmas B.1 and B.2

D.1 Proof of Lemma B.1

From a property of the conditional distribution of a multivariate normal distribution (e.g., Srivastava and Khatri, 1979), we have

  Y_2 | Y_1 ∼ N_{n×(p−p_1)}(Ω + Y_1 Γ, Σ_{22·1} ⊗ I_n).

Then, another expression of the true model M is given as follows:

  Y_1 ∼ N_{n×p_1}(ZΘ_1, Σ_{11} ⊗ I_n),  Y_2 | Y_1 ∼ N_{n×(p−p_1)}(Ω + Y_1 Γ, Σ_{22·1} ⊗ I_n).
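The conditional-distribution property quoted above has covariance equal to the Schur complement Σ_{22·1} = Σ_{22} − Σ_{21}Σ_{11}^{−1}Σ_{12}. A minimal Monte Carlo check of that formula (the dimensions p_1 = 2, p − p_1 = 2 and the covariance matrix below are illustrative choices, not the paper's):

```python
import numpy as np

# Monte Carlo check of the conditional covariance used at the start of the
# proof of Lemma B.1: for (y1, y2) jointly normal,
#   Cov(y2 | y1) = Sigma22 - Sigma21 Sigma11^{-1} Sigma12.
rng = np.random.default_rng(1)
p1, p2, n = 2, 2, 500_000

A = rng.standard_normal((p1 + p2, p1 + p2))
Sigma = A @ A.T + (p1 + p2) * np.eye(p1 + p2)    # a well-conditioned covariance
S11, S12 = Sigma[:p1, :p1], Sigma[:p1, p1:]
S21, S22 = Sigma[p1:, :p1], Sigma[p1:, p1:]
S22_1 = S22 - S21 @ np.linalg.inv(S11) @ S12     # Schur complement Sigma_{22.1}

Y = rng.multivariate_normal(np.zeros(p1 + p2), Sigma, size=n)
Y1, Y2 = Y[:, :p1], Y[:, p1:]
resid = Y2 - Y1 @ (np.linalg.inv(S11) @ S12)     # y2 minus its conditional mean
emp = resid.T @ resid / n                        # empirical conditional covariance

print(np.abs(emp - S22_1).max())                 # small: sampling error only
```

The empirical covariance of the residual y_2 − Σ_{21}Σ_{11}^{−1}y_1 matches Σ_{22·1} up to Monte Carlo error, which is the distributional fact that the re-expression of the true model above relies on.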

On the other hand, we can express Y_2 as

  Y_2 = Ω + Y_1 Γ + A Σ_{22·1}^{1/2},   (D.1)

where A and Y_1 are mutually independent, and A is distributed according to N_{n×(p−p_1)}(O_{n,p−p_1}, I_{p−p_1} ⊗ I_n). By using (D.1), nΣ̂_{22·1}, Γ̂ and (δ̂, Γ̂) are expressed as

  nΣ̂_{22·1} = (Ω + A Σ_{22·1}^{1/2})′(I_n − P_{(1_n, Y_1)})(Ω + A Σ_{22·1}^{1/2}),   (D.2)
  Γ̂ = Γ + {Y_1′(I_n − J_n)Y_1}^{−1} Y_1′(I_n − J_n)(Ω + A Σ_{22·1}^{1/2}),
  (δ̂, Γ̂′)′ = {(1_n, Y_1)′(1_n, Y_1)}^{−1}(1_n, Y_1)′(Ω + Y_1 Γ + A Σ_{22·1}^{1/2}).

Hence, the conditional distributions of nΣ̂_{22·1}, Γ̂ and (δ̂, Γ̂) given Y_1 are

  nΣ̂_{22·1} | Y_1 ∼ W_{p−p_1}(n − p_1 − 1, Σ_{22·1}; Ω′(I_n − P_{(1_n, Y_1)})Ω),
  Γ̂ | Y_1 ∼ N_{p_1×(p−p_1)}({Y_1′(I_n − J_n)Y_1}^{−1} Y_1′(I_n − J_n)Ω + Γ, Σ_{22·1} ⊗ {Y_1′(I_n − J_n)Y_1}^{−1}),
  (δ̂, Γ̂′)′ | Y_1 ∼ N_{(p_1+1)×(p−p_1)}({(1_n, Y_1)′(1_n, Y_1)}^{−1}(1_n, Y_1)′(Ω + Y_1 Γ), Σ_{22·1} ⊗ {(1_n, Y_1)′(1_n, Y_1)}^{−1}).

From the above equations, we can state that T and U are distributed according to N_{p_1×(p−p_1)}(O_{p_1,p−p_1}, I_{p−p_1} ⊗ I_{p_1}) and N_{(p_1+1)×(p−p_1)}(O_{p_1+1,p−p_1}, I_{p−p_1} ⊗ I_{p_1+1}), respectively, and are independent of Y_1. Next, we show that nΣ̂_{22·1} and (δ̂, Γ̂) are mutually conditionally independent given Y_1. It is straightforward that

  {(1_n, Y_1)′(1_n, Y_1)}^{−1}(1_n, Y_1)′(I_n − P_{(1_n, Y_1)}) = O_{p_1+1,n}.   (D.3)

Hence, from Cochran's theorem, the conditional independence of nΣ̂_{22·1} and (δ̂, Γ̂) is obtained. Finally, we consider the case that the model is overspecified. By the definition of overspecified models in (17), the equation Ω = 1_n δ′ holds. Thus, from (D.2), we can state that

  nΣ̂_{22·1} = Σ_{22·1}^{1/2} A′(I_n − P_{(1_n, Y_1)}) A Σ_{22·1}^{1/2} ∼ W_{p−p_1}(n − p_1 − 1, Σ_{22·1}),

and nΣ̂_{22·1} and Y_1 are mutually independent. Also, T and U are expressed as

  T = {Y_1′(I_n − J_n)Y_1}^{1/2}(Γ̂ − Γ)Σ_{22·1}^{−1/2},
  U = {(1_n, Y_1)′(1_n, Y_1)}^{1/2}{(δ̂, Γ̂′)′ − (δ, Γ′)′}Σ_{22·1}^{−1/2},

and nΣ̂_{22·1} and (δ̂, Γ̂) are mutually independent from (D.3).

D.2 Proof of Lemma B.2

This proof is based on an idea in Hashiyama et al. (2014). From the expectation of a non-central Wishart distribution, we can calculate the expectations as E[G] = I_q and

  E[tr{(G − I_q)^2}] = (1/n){n tr(Φ^2) − n (tr Φ)^2 + 2q + tr Φ}

= O{(n + λ_q)^{−1}}.

The above equation implies that G̃ = O_p(1). By applying a Taylor expansion to G^{−1}, we derive

  G^{−1} = {I_q + (n + λ_q)^{−1/2} G̃}^{−1} = I_q − (n + λ_q)^{−1/2} G̃ + (n + λ_q)^{−1} G̃^2 + R_G,   (D.4)

where R_G is the remainder term of the Taylor expansion. First, we calculate an explicit form of R_G. From (D.4), the following equation can be derived:

  I_q = G G^{−1} = {I_q + (n + λ_q)^{−1/2} G̃}{I_q − (n + λ_q)^{−1/2} G̃ + (n + λ_q)^{−1} G̃^2} + G R_G = I_q + (n + λ_q)^{−3/2} G̃^3 + G R_G.

From the above equation, R_G can be expressed as

  R_G = −(n + λ_q)^{−3/2} G^{−1} G̃^3.

Next, we show (B.14). By using the Cauchy–Schwarz inequality, we have

  E[|tr(D_Φ R_G)|] ≤ (n + λ_q)^{−3/2} {E[tr(D_Φ^2 G̃^6)] E[tr(G^{−2})]}^{1/2}.

We can observe that D_Φ = O(n^{−1}) because

  n tr(D_Φ^2) = Σ_{i=1}^q n/(n + 1 + λ_i)^2 = O(1).

By using results for expectations of the multivariate normal random vector (e.g., Gupta and Nagar, 2000), we can calculate E[tr(D_Φ^2 G̃^6)] = O(n^{−2}). Therefore, we derive (B.14).

References

[1] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In 2nd International Symposium on Information Theory (eds. B. N. Petrov & F. Csáki), pp. 267–281. Akadémiai Kiadó, Budapest.

[2] Alves, J. C. L. & Poppi, R. J. (2016). Quantification of conventional and advanced biofuels contents in diesel fuel blends using near-infrared spectroscopy and multivariate calibration. Fuel, 165.

[3] Bedrick, E. J. & Tsai, C.-L. (1994). Model selection for multivariate regression in small samples. Biometrics, 50, 226–231.

[4] Bro, R. (2003). Multivariate calibration: What is in chemometrics for the analytical chemist? Anal. Chim. Acta, 500.

[5] Brown, P. J. (1982). Multivariate calibration. J. R. Statist. Soc. Ser. B, 44, 287–321.

[6] Carvalho, B. M. A., Carvalho, L. M., Coimbra, J. S. R., Minim, L. A., Barcellos, E. S., Júnior, W. F. S., Detmann, E. & Carvalho, G. G. P. (2015). Rapid detection of whey in milk powder samples by spectrophotometric and multivariate calibration. Food Chem., 174, 1–7.

[7] Fujikoshi, Y. & Nishii, R. (1984). Selection of variables in multivariate regression. Hiroshima Math. J., 14.

[8] Fujikoshi, Y., Sakurai, T. & Yanagihara, H. (2014). Consistency of high-dimensional AIC-type and C_p-type criteria in multivariate linear regression. J. Multivariate Anal., 123.

[9] Fujikoshi, Y., Ulyanov, V. V. & Shimizu, R. (2010). Multivariate Statistics: High-Dimensional and Large-Sample Approximations. John Wiley & Sons, Inc., Hoboken, New Jersey.

[10] Gupta, A. K. & Nagar, D. K. (2000). Matrix Variate Distributions. Chapman and Hall/CRC, Boca Raton, FL.

[11] Harville, D. A. (1997). Matrix Algebra from a Statistician's Perspective. Springer-Verlag, New York.

[12] Hashiyama, Y., Yanagihara, H. & Fujikoshi, Y. (2014). Jackknife bias correction of the AIC for selecting variables in canonical correlation analysis under model misspecification. Linear Algebra Appl., 455.

[13] Hurvich, C. M. & Tsai, C.-L. (1989). Regression and time series model selection in small samples. Biometrika, 76, 297–307.

[14] Kabe, D. G. (1964). A note on the Bartlett decomposition of a Wishart matrix. J. Roy. Statist. Soc. Ser. B, 26.

[15] Lütkepohl, H. (1996). Handbook of Matrices. Wiley, Chichester.

[16] Srivastava, M. S. & Khatri, C. G. (1979). An Introduction to Multivariate Statistics. North-Holland, New York.

[17] Sugiura, N. (1978). Further analysis of the data by Akaike's information criterion and the finite corrections. Comm. Statist. Theory Methods, A7, 13–26.

[18] Wakaki, H., Fujikoshi, Y. & Ulyanov, V. V. (2014). Asymptotic expansions of the distributions of MANOVA test statistics when the dimension is large. Hiroshima Math. J., 44.

[19] Watamori, Y. (1990). On the moments of traces of Wishart and inverted Wishart matrices. South African Statist. J., 24.


More information

ABOUT PRINCIPAL COMPONENTS UNDER SINGULARITY

ABOUT PRINCIPAL COMPONENTS UNDER SINGULARITY ABOUT PRINCIPAL COMPONENTS UNDER SINGULARITY José A. Díaz-García and Raúl Alberto Pérez-Agamez Comunicación Técnica No I-05-11/08-09-005 (PE/CIMAT) About principal components under singularity José A.

More information

The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models

The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population Health

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Modfiied Conditional AIC in Linear Mixed Models

Modfiied Conditional AIC in Linear Mixed Models CIRJE-F-895 Modfiied Conditional AIC in Linear Mixed Models Yuki Kawakubo Graduate School of Economics, University of Tokyo Tatsuya Kubokawa University of Tokyo July 2013 CIRJE Discussion Papers can be

More information

Multivariate Least Weighted Squares (MLWS)

Multivariate Least Weighted Squares (MLWS) () Stochastic Modelling in Economics and Finance 2 Supervisor : Prof. RNDr. Jan Ámos Víšek, CSc. Petr Jonáš 23 rd February 2015 Contents 1 2 3 4 5 6 7 1 1 Introduction 2 Algorithm 3 Proof of consistency

More information

Consistent Bivariate Distribution

Consistent Bivariate Distribution A Characterization of the Normal Conditional Distributions MATSUNO 79 Therefore, the function ( ) = G( : a/(1 b2)) = N(0, a/(1 b2)) is a solu- tion for the integral equation (10). The constant times of

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

A Variant of AIC Using Bayesian Marginal Likelihood

A Variant of AIC Using Bayesian Marginal Likelihood CIRJE-F-971 A Variant of AIC Using Bayesian Marginal Likelihood Yuki Kawakubo Graduate School of Economics, The University of Tokyo Tatsuya Kubokawa The University of Tokyo Muni S. Srivastava University

More information

Statistical Inference with Monotone Incomplete Multivariate Normal Data

Statistical Inference with Monotone Incomplete Multivariate Normal Data Statistical Inference with Monotone Incomplete Multivariate Normal Data p. 1/4 Statistical Inference with Monotone Incomplete Multivariate Normal Data This talk is based on joint work with my wonderful

More information

Thomas J. Fisher. Research Statement. Preliminary Results

Thomas J. Fisher. Research Statement. Preliminary Results Thomas J. Fisher Research Statement Preliminary Results Many applications of modern statistics involve a large number of measurements and can be considered in a linear algebra framework. In many of these

More information

ON EXACT INFERENCE IN LINEAR MODELS WITH TWO VARIANCE-COVARIANCE COMPONENTS

ON EXACT INFERENCE IN LINEAR MODELS WITH TWO VARIANCE-COVARIANCE COMPONENTS Ø Ñ Å Ø Ñ Ø Ð ÈÙ Ð Ø ÓÒ DOI: 10.2478/v10127-012-0017-9 Tatra Mt. Math. Publ. 51 (2012), 173 181 ON EXACT INFERENCE IN LINEAR MODELS WITH TWO VARIANCE-COVARIANCE COMPONENTS Júlia Volaufová Viktor Witkovský

More information

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Linear Models 1. Isfahan University of Technology Fall Semester, 2014 Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and

More information

CONDITIONS FOR ROBUSTNESS TO NONNORMALITY ON TEST STATISTICS IN A GMANOVA MODEL

CONDITIONS FOR ROBUSTNESS TO NONNORMALITY ON TEST STATISTICS IN A GMANOVA MODEL J. Japan Statist. Soc. Vol. 37 No. 1 2007 135 155 CONDITIONS FOR ROBUSTNESS TO NONNORMALITY ON TEST STATISTICS IN A GMANOVA MODEL Hirokazu Yanagihara* This paper presents the conditions for robustness

More information

Regression and Statistical Inference

Regression and Statistical Inference Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF

More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

Lecture 3. Inference about multivariate normal distribution

Lecture 3. Inference about multivariate normal distribution Lecture 3. Inference about multivariate normal distribution 3.1 Point and Interval Estimation Let X 1,..., X n be i.i.d. N p (µ, Σ). We are interested in evaluation of the maximum likelihood estimates

More information

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population

More information

A note on profile likelihood for exponential tilt mixture models

A note on profile likelihood for exponential tilt mixture models Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

Random Matrices and Multivariate Statistical Analysis

Random Matrices and Multivariate Statistical Analysis Random Matrices and Multivariate Statistical Analysis Iain Johnstone, Statistics, Stanford imj@stanford.edu SEA 06@MIT p.1 Agenda Classical multivariate techniques Principal Component Analysis Canonical

More information

arxiv: v1 [math.st] 2 Apr 2016

arxiv: v1 [math.st] 2 Apr 2016 NON-ASYMPTOTIC RESULTS FOR CORNISH-FISHER EXPANSIONS V.V. ULYANOV, M. AOSHIMA, AND Y. FUJIKOSHI arxiv:1604.00539v1 [math.st] 2 Apr 2016 Abstract. We get the computable error bounds for generalized Cornish-Fisher

More information

Lecture Stat Information Criterion

Lecture Stat Information Criterion Lecture Stat 461-561 Information Criterion Arnaud Doucet February 2008 Arnaud Doucet () February 2008 1 / 34 Review of Maximum Likelihood Approach We have data X i i.i.d. g (x). We model the distribution

More information

High-dimensional two-sample tests under strongly spiked eigenvalue models

High-dimensional two-sample tests under strongly spiked eigenvalue models 1 High-dimensional two-sample tests under strongly spiked eigenvalue models Makoto Aoshima and Kazuyoshi Yata University of Tsukuba Abstract: We consider a new two-sample test for high-dimensional data

More information

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data Journal of Multivariate Analysis 78, 6282 (2001) doi:10.1006jmva.2000.1939, available online at http:www.idealibrary.com on Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone

More information

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015 Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

Bootstrap Variants of the Akaike Information Criterion for Mixed Model Selection

Bootstrap Variants of the Akaike Information Criterion for Mixed Model Selection Bootstrap Variants of the Akaike Information Criterion for Mixed Model Selection Junfeng Shang, and Joseph E. Cavanaugh 2, Bowling Green State University, USA 2 The University of Iowa, USA Abstract: Two

More information

1 Data Arrays and Decompositions

1 Data Arrays and Decompositions 1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

ON VARIANCE COVARIANCE COMPONENTS ESTIMATION IN LINEAR MODELS WITH AR(1) DISTURBANCES. 1. Introduction

ON VARIANCE COVARIANCE COMPONENTS ESTIMATION IN LINEAR MODELS WITH AR(1) DISTURBANCES. 1. Introduction Acta Math. Univ. Comenianae Vol. LXV, 1(1996), pp. 129 139 129 ON VARIANCE COVARIANCE COMPONENTS ESTIMATION IN LINEAR MODELS WITH AR(1) DISTURBANCES V. WITKOVSKÝ Abstract. Estimation of the autoregressive

More information

STRONG CONSISTENCY OF THE AIC, BIC, C P KOO METHODS IN HIGH-DIMENSIONAL MULTIVARIATE LINEAR REGRESSION

STRONG CONSISTENCY OF THE AIC, BIC, C P KOO METHODS IN HIGH-DIMENSIONAL MULTIVARIATE LINEAR REGRESSION STRONG CONSISTENCY OF THE AIC, BIC, C P KOO METHODS IN HIGH-DIMENSIONAL MULTIVARIATE LINEAR REGRESSION AND By Zhidong Bai,, Yasunori Fujikoshi, and Jiang Hu, Northeast Normal University and Hiroshima University

More information

Finite Population Sampling and Inference

Finite Population Sampling and Inference Finite Population Sampling and Inference A Prediction Approach RICHARD VALLIANT ALAN H. DORFMAN RICHARD M. ROYALL A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

On testing the equality of mean vectors in high dimension

On testing the equality of mean vectors in high dimension ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA Volume 17, Number 1, June 2013 Available online at www.math.ut.ee/acta/ On testing the equality of mean vectors in high dimension Muni S.

More information

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:

More information

TAMS39 Lecture 2 Multivariate normal distribution

TAMS39 Lecture 2 Multivariate normal distribution TAMS39 Lecture 2 Multivariate normal distribution Martin Singull Department of Mathematics Mathematical Statistics Linköping University, Sweden Content Lecture Random vectors Multivariate normal distribution

More information

Obtaining simultaneous equation models from a set of variables through genetic algorithms

Obtaining simultaneous equation models from a set of variables through genetic algorithms Obtaining simultaneous equation models from a set of variables through genetic algorithms José J. López Universidad Miguel Hernández (Elche, Spain) Domingo Giménez Universidad de Murcia (Murcia, Spain)

More information

Orthogonal decompositions in growth curve models

Orthogonal decompositions in growth curve models ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA Volume 4, Orthogonal decompositions in growth curve models Daniel Klein and Ivan Žežula Dedicated to Professor L. Kubáček on the occasion

More information

STAT 730 Chapter 5: Hypothesis Testing

STAT 730 Chapter 5: Hypothesis Testing STAT 730 Chapter 5: Hypothesis Testing Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 28 Likelihood ratio test def n: Data X depend on θ. The

More information