Consistency of test-based method for selection of variables in high-dimensional two-group discriminant analysis


ORIGINAL PAPER

Consistency of test-based method for selection of variables in high-dimensional two-group discriminant analysis

Yasunori Fujikoshi [1] · Tetsuro Sakurai [2]

Received: 19 July 2018 / Accepted: 16 January 2019
© Japanese Federation of Statistical Science Associations 2019

Abstract
This paper is concerned with the selection of variables in two-group discriminant analysis with a common covariance matrix. We propose a test-based method (TM) drawing on the significance of each variable. Sufficient conditions for the test-based method to be consistent are provided when the dimension and the sample size are large. For the case in which the dimension is larger than the sample size, a ridge-type method is proposed. Our results and the tendencies therein are explored numerically through a Monte Carlo simulation. It is pointed out that our selection method can be applied to high-dimensional data.

Keywords: Consistency · Discriminant analysis · High-dimensional framework · Selection of variables · Test-based method

1 Introduction

This paper is concerned with the variable selection problem in two-group discriminant analysis with a common covariance matrix. In a variable selection problem under such a discriminant model, one of the goals is to find the subset of variables whose coefficients in the linear discriminant function are not zero. Several methods, including the model selection criteria AIC (Akaike information criterion; Akaike 1973) and BIC (Bayesian information criterion; Schwarz 1978), have been developed. It is known (see, e.g., Fujikoshi 1985; Nishii et al. 1988) that in a large-sample framework AIC is not consistent but BIC is consistent. On the other hand, in the multivariate regression model, it has been shown (Fujikoshi et al. 2014; Yanagihara et al. 2015) that in a high-dimensional framework AIC has consistency properties under some conditions, but BIC is not necessarily

* Yasunori Fujikoshi, fujikoshi_y@yahoo.co.jp
1 Department of Mathematics, Graduate School of Science, Hiroshima University, Kagamiyama, Higashi Hiroshima, Hiroshima, Japan
2 School of General and Management Studies, Suwa University of Science, Toyohira, Chino, Nagano, Japan

consistent. In our discriminant model, there are methods based on misclassification errors, by McLachlan (1976), Fujikoshi (1985), Hyodo and Kubokawa (2014), and Yamada et al. (2017), for the high-dimensional case as well as the large-sample case. It is known (see, e.g., Fujikoshi 1985) that the two methods based on the misclassification error rate and on AIC are asymptotically equivalent under a large-sample framework. In our discriminant model, Sakurai et al. (2013) derived an asymptotically unbiased estimator of the risk function for a high-dimensional case. On the other hand, these selection methods are based on the minimization of the criteria over all subsets of variables, and so become computationally onerous when the dimension is large. Though some stepwise methods have been proposed, their optimality is not known. For high-dimensional data, the Lasso and other regularization methods have been extended to discriminant analysis; see, e.g., Clemmensen et al. (2011), Witten and Tibshirani (2011), and Hao et al. (2015). In this paper, we propose a test-based method based on a significance test of each variable, which is useful for high-dimensional data as well as large-sample data. The idea is essentially the same as in Zhao et al. (1986). Our criterion involves a constant term which should be determined from the viewpoint of some optimality. We propose a class of constants for which the method is consistent when the dimension and the sample size are large. For the case in which the dimension is larger than the sample size, a regularized method is numerically examined. Our results and the tendencies therein are explored numerically through a Monte Carlo simulation. The remainder of the present paper is organized as follows. In Sect. 2, we present the relevant notation and the test-based method. In Sect. 3, we derive sufficient conditions for the test-based criterion to be consistent in a high-dimensional case. In Sect. 4, we study the test-based criterion through a Monte Carlo simulation. In Sect.
5, we propose the ridge-type criteria, whose consistency properties are numerically examined. In Sect. 6, conclusions are offered. All proofs of our results are provided in the Appendix.

2 Test-based method

In two-group discriminant analysis, suppose that we have independent samples $y^{(i)}_1, \dots, y^{(i)}_{n_i}$ of $y = (y_1, \dots, y_p)'$ from the $p$-dimensional normal distributions $\Pi_i : N_p(\mu^{(i)}, \Sigma)$, $i = 1, 2$. Let $Y$ be the total sample matrix defined by $Y = (y^{(1)}_1, \dots, y^{(1)}_{n_1}, y^{(2)}_1, \dots, y^{(2)}_{n_2})'$. The coefficients of the population discriminant function are given by

  $\beta = \Sigma^{-1}(\mu^{(1)} - \mu^{(2)}) = (\beta_1, \dots, \beta_p)'. \quad (1)$

Let $\Delta$ and $D$ be the population and the sample Mahalanobis distances defined by

  $\Delta = \{ (\mu^{(1)} - \mu^{(2)})' \Sigma^{-1} (\mu^{(1)} - \mu^{(2)}) \}^{1/2}$ and $D = \{ (\bar y^{(1)} - \bar y^{(2)})' S^{-1} (\bar y^{(1)} - \bar y^{(2)}) \}^{1/2},$

respectively. Here, $\bar y^{(1)}$ and $\bar y^{(2)}$ are the sample mean vectors, and $S$ is the pooled sample covariance matrix based on the $n = n_1 + n_2$ samples. Suppose that $j$ denotes a subset of $\omega = \{1, \dots, p\}$ containing $p_j$ elements, and $y_j$ denotes the $p_j$-vector consisting of the elements of $y$ indexed by the elements of $j$. We use the notation $D_j$ and $D_\omega$ for $D$ based on $y_j$ and $y_\omega$ $(= y)$, respectively. Let $M_j$ be the variable selection model defined by

  $M_j : \beta_i \ne 0$ if $i \in j$, and $\beta_i = 0$ if $i \notin j$. $\quad (2)$

The model $M_j$ is equivalent to $\Delta_j = \Delta_\omega$, i.e., the Mahalanobis distance based on $y_j$ is the same as the one based on the full set of variables $y$. We identify the selection of $M_j$ with the selection of $y_j$. Let $\mathrm{AIC}_j$ be the AIC for $M_j$. Then, it is known (see, e.g., Fujikoshi 1985) that

  $A_j = \mathrm{AIC}_j - \mathrm{AIC}_\omega = n \log\left\{ 1 + \frac{g^2 (D_\omega^2 - D_j^2)}{n - 2 + g^2 D_j^2} \right\} - 2(p - p_j), \quad (3)$

where $g = \{(n_1 n_2)/n\}^{1/2}$. Similarly, let $\mathrm{BIC}_j$ be the BIC for $M_j$; then

  $B_j = \mathrm{BIC}_j - \mathrm{BIC}_\omega = n \log\left\{ 1 + \frac{g^2 (D_\omega^2 - D_j^2)}{n - 2 + g^2 D_j^2} \right\} - (\log n)(p - p_j). \quad (4)$

In a large-sample framework, it is known (see Fujikoshi 1985; Nishii et al. 1988) that AIC is not consistent but BIC is consistent. The variable selection methods based on AIC and BIC are given as $\min_j \mathrm{AIC}_j$ and $\min_j \mathrm{BIC}_j$, respectively, and therefore such criteria become computationally onerous when $p$ is large. To circumvent this issue, we consider a test-based method (TM) drawing on the significance of each variable. A critical region for $\beta_i = 0$ based on the likelihood ratio principle is expressed (see, e.g., Rao 1973; Fujikoshi et al. 2010) as

  $T_{d,i} = n \log\left\{ 1 + \frac{g^2 (D_\omega^2 - D_{(-i)}^2)}{n - 2 + g^2 D_{(-i)}^2} \right\} - d > 0, \quad (5)$

where $(-i)$, $i = 1, \dots, p$, is the subset of $\omega = \{1, \dots, p\}$ obtained by omitting $i$ from $\omega$, and $d$ is a positive constant which may depend on $p$ and $n$. Note that $T_{2,i} > 0 \iff \mathrm{AIC}_{(-i)} - \mathrm{AIC}_\omega > 0$.
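As a concrete illustration, the statistic in (5) can be computed directly from the two sample matrices. The following is a minimal numpy sketch; the helper names `mahalanobis_sq` and `T_stat` are ours, not the paper's.

```python
import numpy as np

def mahalanobis_sq(y1, y2, idx):
    """Squared sample Mahalanobis distance D_j^2 for the variable subset idx.

    y1, y2 are (n_i, p) sample matrices for the two groups; the pooled
    covariance S is based on n = n1 + n2 samples, as in Sect. 2.
    """
    a, b = y1[:, idx], y2[:, idx]
    n1, n2 = len(a), len(b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    S = ((n1 - 1) * np.cov(a, rowvar=False) +
         (n2 - 1) * np.cov(b, rowvar=False)) / (n1 + n2 - 2)
    return float(diff @ np.linalg.solve(np.atleast_2d(S), diff))

def T_stat(y1, y2, i, d):
    """Statistic T_{d,i} of (5): variable i is retained when T_{d,i} > 0."""
    n1, n2 = len(y1), len(y2)
    n = n1 + n2
    g2 = n1 * n2 / n                        # g^2 = n1*n2/n
    p = y1.shape[1]
    D2_full = mahalanobis_sq(y1, y2, list(range(p)))
    D2_drop = mahalanobis_sq(y1, y2, [k for k in range(p) if k != i])
    return n * np.log1p(g2 * (D2_full - D2_drop) / (n - 2 + g2 * D2_drop)) - d
```

Computing the selected set then amounts to collecting the indices with `T_stat(y1, y2, i, d) > 0`, one pass over the $p$ variables.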
We consider a test-based method for the selection of variables defined by selecting the set of suffixes, or the set of variables, given by

  $\mathrm{TM}_d = \{ i \in \omega \mid T_{d,i} > 0 \}, \quad (6)$

or $\{ y_i \in \{y_1, \dots, y_p\} \mid T_{d,i} > 0 \}$. The notation $\hat j_{\mathrm{TM}_d}$ is also used for $\mathrm{TM}_d$. This idea is closely related to an approach considered by Zhao et al. (1986) and Nishii et al.

(1988), who used a selection method based on the comparison of $\mathrm{IC}_{(-i)}$ and $\mathrm{IC}_\omega$ for a general information criterion. In general, if $d$ is large, a small number of variables is selected; if $d$ is small, a large number of variables is selected. Ideally, we want to select only the true variables, whose discriminant coefficients are not zero. For a test-based method, an important problem is how to decide the constant term $d$. Nishii et al. (1988) used the special case $d = \log n$. They noted that under a large-sample framework, $\mathrm{TM}_{\log n}$ is consistent. However, we note through a simulation experiment that $\mathrm{TM}_{\log n}$ will not be consistent in a high-dimensional case. We propose a class of $d$ satisfying a high-dimensional consistency, including $d = \sqrt{n}$, in Sect. 3.

3 Consistency of TM_d under a high-dimensional framework

For studying consistency of the variable selection criterion $\mathrm{TM}_d$, it is assumed that the true model $M_*$ is included in the full model. Let the minimum model including $M_*$ be $M_{j_*}$. For notational simplicity, we regard the true model $M_*$ as $M_{j_*}$. Let $\mathcal{F}$ be the entire suite of candidate models, that is, $\mathcal{F} = \{\{1\}, \dots, \{p\}, \{1, 2\}, \dots, \{1, \dots, p\}\}$. A subset $j$ of $\omega$ is called an overspecified model if $j$ includes the true model; otherwise, $j$ is called an underspecified model. Then, $\mathcal{F}$ is separated into two sets: the set of overspecified models, $\mathcal{F}_+ = \{ j \in \mathcal{F} \mid j_* \subseteq j \}$, and the set of underspecified models, $\mathcal{F}_- = \mathcal{F}_+^c$. It is said that a model selection criterion $\hat j$ has high-dimensional consistency if

  $\lim_{p/n \to c \in (0,1)} \Pr(\hat j = j_*) = 1.$

Here, we list some of our main assumptions:

A1 (The true model): $M_{j_*} \in \mathcal{F}$.
A2 (The high-dimensional asymptotic framework): $p \to \infty$, $n \to \infty$, $p/n \to c \in (0, 1)$, $n_i/n \to k_i > 0$ $(i = 1, 2)$.

For the dimensionality $p_*$ of the true model and the Mahalanobis distance $\Delta$, first the following case is considered:

A3: $p_*$ is finite, and $\Delta^2 = O(1)$.

For the constant $d$ of the test-based statistic $T_{d,i}$ in (5), we consider the following assumptions:

B1: $d/n \to 0$.
B2: $h \equiv d/n - (n - p - 3)^{-1} > 0$, and $h = O(n^{-a})$, where $0 < a < 1$.
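Condition B2 can be inspected numerically. Under $p/n \to c$, the term $(n-p-3)^{-1}$ behaves like $\{(1-c)n\}^{-1}$; taking $c = 1/2$ for illustration, $d = 2$ leaves $h$ negative, $d = \log n$ leaves $h$ of the too-small order $(\log n)/n$, while $d = n^r$ gives $h$ of the exact order $n^{-(1-r)}$ required by B2. A quick check (the function name and the chosen grid of $n$ are ours):

```python
import numpy as np

def h_value(d, n, p):
    """h = d/n - 1/(n - p - 3) from condition B2; B2 needs h > 0, h = O(n^{-a})."""
    return d / n - 1.0 / (n - p - 3)

for n in (200, 2000, 20000):
    p = n // 2                                   # high-dimensional regime p/n -> 1/2
    for label, d in (("d = 2", 2.0),
                     ("d = log n", np.log(n)),
                     ("d = sqrt n", np.sqrt(n))):
        print(f"n = {n:6d}, {label:10s}: h = {h_value(d, n, p):+.5f}")
```

Running this shows $h < 0$ for $d = 2$ and $h > 0$ but shrinking like $n^{-1/2}$ for $d = \sqrt{n}$, in line with the discussion of Theorems 1 and 3 below.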

The consistency of $\mathrm{TM}_d$ in (6) for some $d > 0$ shall be shown along the following outline. In general, we have

  $\mathrm{TM}_d = j_* \iff T_{d,i} > 0$ for $i \in j_*$, and $T_{d,i} \le 0$ for $i \notin j_*$.

Therefore,

  $P(\mathrm{TM}_d = j_*) = P\left( \bigcap_{i \in j_*} \{T_{d,i} > 0\} \cap \bigcap_{i \notin j_*} \{T_{d,i} < 0\} \right) = 1 - P\left( \bigcup_{i \in j_*} \{T_{d,i} \le 0\} \cup \bigcup_{i \notin j_*} \{T_{d,i} \ge 0\} \right) \ge 1 - \sum_{i \in j_*} P(T_{d,i} \le 0) - \sum_{i \notin j_*} P(T_{d,i} \ge 0).$

We shall show that

  [F1] $\sum_{i \in j_*} P(T_{d,i} \le 0) \to 0, \quad (7)$

  [F2] $\sum_{i \notin j_*} P(T_{d,i} \ge 0) \to 0. \quad (8)$

[F1] concerns the probability that the true variables are not selected; [F2] concerns the probability that the non-true variables are selected. The square of the Mahalanobis distance of $y$ is decomposed as the sum of the squared Mahalanobis distance of $y_{(-i)}$ and the conditional Mahalanobis distance of $y_{\{i\}}$ given $y_{(-i)}$ as follows:

  $\Delta^2 = \Delta^2_{(-i)} + \Delta^2_{\{i\} \cdot (-i)}. \quad (9)$

When $i \in j_*$, $(-i) \notin \mathcal{F}_+$, and hence

  $\Delta^2_{\{i\} \cdot (-i)} = \Delta^2 - \Delta^2_{(-i)} > 0.$

Related to the consistency of $\mathrm{TM}_d$ under assumption A3, we consider the following assumption:

A4: For $i \in j_*$, $\lim (n_1 n_2 / n^2)\, \Delta^2_{\{i\} \cdot (-i)} \left\{ 1 + (n_1 n_2 / n^2) \Delta^2_{(-i)} \right\}^{-1} > 0$.

Note that A4 is satisfied if $\lim \Delta^2_{\{i\} \cdot (-i)} > 0$ and $\lim \Delta^2_{(-i)} > 0$, under $\Delta^2 = O(1)$.
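The decomposition (9) is the usual conditional-distance identity: the conditional term is the squared mean difference of $y_i$ after regressing out $y_{(-i)}$, divided by the conditional variance. It can be verified numerically for any positive definite $\Sigma$; the sketch below uses synthetic population quantities of our own choosing.

```python
import numpy as np

rng = np.random.default_rng(1)
p, i = 5, 2
mu_diff = rng.normal(size=p)            # mu^(1) - mu^(2), synthetic
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)         # positive definite covariance

def maha_sq(delta, S):
    """Squared population Mahalanobis distance delta' S^{-1} delta."""
    return float(delta @ np.linalg.solve(S, delta))

rest = [k for k in range(p) if k != i]
full = maha_sq(mu_diff, Sigma)
minus_i = maha_sq(mu_diff[rest], Sigma[np.ix_(rest, rest)])

# Conditional quantities of y_i given y_(-i): regression coefficients b,
# conditional mean difference, and conditional variance.
b = np.linalg.solve(Sigma[np.ix_(rest, rest)], Sigma[rest, i])
delta_cond = mu_diff[i] - b @ mu_diff[rest]
var_cond = Sigma[i, i] - b @ Sigma[rest, i]
cond = delta_cond ** 2 / var_cond       # Delta^2_{{i}.(-i)}

assert np.isclose(full, minus_i + cond)  # decomposition (9)
```

The assertion holds exactly (up to floating point), since (9) is the standard Schur-complement decomposition of the quadratic form.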

Theorem 1 Suppose that assumptions A1, A2, A3 and A4 are satisfied. Then, the test-based method $\mathrm{TM}_d$ is consistent if B1 and B2 are satisfied.

Let $d = n^r$, where $0 < r < 1$. Then $h = O(n^{-(1-r)})$, and condition B2 is satisfied. Therefore, the test-based criteria with $d = n^{3/4}, n^{2/3}, n^{1/2}, n^{1/3}$ or $n^{1/4}$ have high-dimensional consistency. Among them, we have numerically seen that the one with $d = \sqrt{n}$ behaves well. Note that $\mathrm{TM}_2$ and $\mathrm{TM}_{\log n}$ do not satisfy B2. As a special case, under the assumptions of Theorem 1, we have seen that the probability of selecting overspecified models tends to zero, since

  $\sum_{i \notin j_*} P(i \in \mathrm{TM}_d) = \sum_{i \notin j_*} P(T_{d,i} \ge 0) \to 0.$

The proof given there is applicable also to the case in which assumption A3 is replaced by assumption A5:

A5: $p_* = O(p)$, and $\Delta^2 = O(p)$.

In other words, such a property holds regardless of whether the dimension of $j_*$ is finite or not. Furthermore, it does not depend on the order of the Mahalanobis distance. Related to the consistency of $\mathrm{TM}_d$ under assumption A5, we consider the following assumption:

A6: For $i \in j_*$, $\theta_i^2 = \Delta^2_{\{i\} \cdot (-i)} \{ \Delta^2_{(-i)} \}^{-1} = O(p^{b-1})$, $0 < b < 1$.

Theorem 2 Suppose that assumptions A1, A2, A5 and A6 are satisfied. Then, the test-based criterion $\mathrm{TM}_d$ is consistent if $d = n^r$, $0 < r < 1$, $r < b$ and $(3/4)(1 + \delta) < b$ are satisfied, where $\delta$ is any small positive number.

From our proof, it is conjectured that the condition $3/4 < b$ can be replaced by $1/2 < b$. From a practical point of view, it is natural for $b$ to be small, and then the sufficient condition requires $r \le 1/2$.

Theorem 3 Suppose that assumption A1 is satisfied, $p$ is fixed, $n_i/n \to k_i > 0$ $(i = 1, 2)$, $d \to \infty$, and $d/n \to 0$. Then, the test-based criterion $\mathrm{TM}_d$ is consistent as $n$ tends to infinity.

From Theorem 3, we can see that $\mathrm{TM}_{\log n}$ and $\mathrm{TM}_{\sqrt{n}}$ are consistent under a large-sample framework. However, $\mathrm{TM}_2$ does not satisfy the sufficient condition.

4 Numerical study

In this section, we numerically explore the validity of our claims through three test-based criteria: $\mathrm{TM}_2$, $\mathrm{TM}_{\log n}$, and $\mathrm{TM}_{\sqrt{n}}$. Note that $\mathrm{TM}_{\sqrt{n}}$ satisfies the sufficient conditions B1 and B2 for consistency, but $\mathrm{TM}_2$ and $\mathrm{TM}_{\log n}$ do not. The true model was assumed as follows: the true dimension is $p_* = 3, 6$; the true mean vectors are $\mu^{(1)} = \alpha(1, \dots, 1, 0, \dots, 0)'$ and $\mu^{(2)} = \alpha(-1, \dots, -1, 0, \dots, 0)'$, where $\alpha = 1, 2, 3$; and the true covariance matrix is $\Sigma = I_p$. The selection rates associated with these criteria are given in Tables 1, 2, and 3, where "Under", "True", and "Over" denote the underspecified models, the true model, and the overspecified models, respectively. The selection rates are based on $10^3$ replications in Tables 1 and 2, and on $10^2$ replications in Table 3. From Table 1, we can identify the following tendencies. The selection probabilities of the true model by $\mathrm{TM}_2$ are relatively large when the dimension is small, as in the case $p = 5$. However, the values do not approach 1 as $n$ increases, and it seems that $\mathrm{TM}_2$ is not consistent even in the large-sample case. The selection probabilities of the true model by $\mathrm{TM}_{\log n}$ are near 1, indicating consistency in the large-sample case; however, these probabilities decrease as $p$ increases, so $\mathrm{TM}_{\log n}$ will not be consistent in a high-dimensional case. The selection probabilities of the true model by $\mathrm{TM}_{\sqrt{n}}$ approach 1 as $n$ becomes large, even if $p$ is small. Furthermore, if $p$ is large but $n$ is also large, as in the high-dimensional framework, it remains consistent, although the probabilities decrease as the ratio $p/n$ approaches 1. As the quantity $\alpha$, representing the distance between the two groups, becomes large, the selection probabilities of the true model by $\mathrm{TM}_2$ and $\mathrm{TM}_{\log n}$ increase for large sample sizes, as in the case $p = 5$.
However, the effect becomes small when $p$ is large. On the other hand, the selection probabilities of the true model by $\mathrm{TM}_{\sqrt{n}}$ increase to a certain extent in both the large-sample and the high-dimensional cases. In Table 2, we examine the case where the dimension $p_*$ of the true model is larger than the one in Table 1. The following tendencies can be identified. As in Table 1, $\mathrm{TM}_2$ and $\mathrm{TM}_{\log n}$ are not consistent as $p$ increases, but $\mathrm{TM}_{\sqrt{n}}$ is. In general, the probability of selecting the true model decreases as the dimension of the true model becomes large.
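The simulation design above can be reproduced in miniature. The sketch below is our own minimal implementation (far fewer replications and a much smaller design than the paper's tables); it tallies "Under"/"True"/"Over" outcomes for $\mathrm{TM}_{\sqrt{n}}$ on data generated as in Sect. 4.

```python
import numpy as np

def select_tm(y1, y2, d):
    """Return the index set TM_d = {i : T_{d,i} > 0} (test-based method of (6))."""
    n1, n2 = len(y1), len(y2)
    n = n1 + n2
    g2 = n1 * n2 / n
    p = y1.shape[1]

    def d2(idx):
        a, b = y1[:, idx], y2[:, idx]
        diff = a.mean(0) - b.mean(0)
        S = ((n1 - 1) * np.cov(a, rowvar=False) +
             (n2 - 1) * np.cov(b, rowvar=False)) / (n - 2)
        return float(diff @ np.linalg.solve(np.atleast_2d(S), diff))

    D2_full = d2(list(range(p)))
    chosen = []
    for i in range(p):
        D2_i = d2([k for k in range(p) if k != i])
        T = n * np.log1p(g2 * (D2_full - D2_i) / (n - 2 + g2 * D2_i)) - d
        if T > 0:
            chosen.append(i)
    return set(chosen)

# Small illustration: p = 5, true set {0, 1, 2}, alpha = 2, Sigma = I_p.
rng = np.random.default_rng(2)
p, true_set, alpha = 5, {0, 1, 2}, 2.0
mu = np.zeros(p)
mu[:3] = alpha
counts = {"Under": 0, "True": 0, "Over": 0}
for _ in range(50):
    y1 = rng.normal(size=(50, p)) + mu
    y2 = rng.normal(size=(50, p)) - mu
    sel = select_tm(y1, y2, d=np.sqrt(100))   # TM_{sqrt(n)}, n = 100
    if not sel >= true_set:
        counts["Under"] += 1
    elif sel == true_set:
        counts["True"] += 1
    else:
        counts["Over"] += 1
print(counts)
```

With this strong signal, the "True" rate should dominate, in line with the tendencies reported for $\mathrm{TM}_{\sqrt{n}}$ in Tables 1 and 2.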

[Table 1 Selection rates of $\mathrm{TM}_2$, $\mathrm{TM}_{\log n}$ and $\mathrm{TM}_{\sqrt{n}}$ for $p_* = 3$ and $\alpha = 1, 2, 3$: "Under"/"True"/"Over" rates by $(n_1, n_2, p)$; table entries not recovered.]

In Table 3, we examine the case where the dimension $p_*$ of the true model is relatively large, especially the case $p_* = p/4$. The following tendencies can be identified. When $p_* = p/4$, the consistency of $\mathrm{TM}_2$ and $\mathrm{TM}_{\log n}$ cannot be seen, while the consistency of $\mathrm{TM}_{\sqrt{n}}$ can be seen when $n$ is large.

[Table 2 Selection rates of $\mathrm{TM}_2$, $\mathrm{TM}_{\log n}$ and $\mathrm{TM}_{\sqrt{n}}$ for $p_* = 6$ and $\alpha = 1, 2, 3$: "Under"/"True"/"Over" rates by $(n_1, n_2, p)$; table entries not recovered.]

[Table 3 Selection rates of $\mathrm{TM}_2$, $\mathrm{TM}_{\log n}$ and $\mathrm{TM}_{\sqrt{n}}$ for $p_* = p/4$ ($p = 100, 200$): "Under"/"True"/"Over" rates by $(n_1, n_2, \alpha)$; table entries not recovered.]

5 Ridge-type methods

When $p > n - 2$, $S$ becomes singular, and so we cannot use $\mathrm{TM}_d$. One way to overcome this problem is to use the ridge-type estimator of $\Sigma$ defined by

  $\hat\Sigma_\lambda = \frac{1}{n}\left\{ (n - 2) S + \lambda I_p \right\}, \quad (10)$

where $\lambda = (n - 2)(np)^{-1} \mathrm{tr}\, S$. This estimator was used in the multivariate regression model by Kubokawa and Srivastava (2012) and Fujikoshi and Sakurai (2016), among others. The numerical experiment was done for $p_* = 3$, $\mu^{(1)} = \alpha(1, 1, 1, 0, \dots, 0)'$, $\mu^{(2)} = \alpha(-1, -1, -1, 0, \dots, 0)'$, and $\Sigma = I_p$, where $\alpha = 1, 2, 3$. The selection rates are based on $10^2$ replications in Table 4.

[Table 4 Selection rates of $\mathrm{TM}_2$, $\mathrm{TM}_{\log n}$ and $\mathrm{TM}_{\sqrt{n}}$ with the ridge-type estimator, $\alpha = 1$: "Under"/"True"/"Over" rates by $(n_1, n_2, p)$; table entries not recovered.]

From Table 4, we can identify the following tendencies. $\mathrm{TM}_2$ is not consistent. On the other hand, it seems that $\mathrm{TM}_{\log n}$ and $\mathrm{TM}_{\sqrt{n}}$ are consistent when the dimension $p$ and the total sample size $n$ are well separated. Recently, the use of a ridge-type estimator for $\Sigma^{-1}$ has been proposed by Ito and Kubokawa (2015) and Van Wieringen and Peeters (2016). By the use of such estimators, it is expected that the consistency property can be improved, but this is left as future work.

6 Concluding remarks

In this paper, we propose a test-based method (TM) for the variable selection problem, drawing on the significance of each variable. The method involves a constant term $d$, and is denoted by $\mathrm{TM}_d$. When $d = 2$ and $d = \log n$, the corresponding TMs are related to AIC and BIC, respectively. However, the usual model selection criteria such as AIC and BIC need to examine all the subsets, whereas $\mathrm{TM}_d$ need examine only the $p$ subsets $(-i)$, $i = 1, \dots, p$. This circumvents the computational complexity associated

with AIC and BIC. Furthermore, it was identified that $\mathrm{TM}_d$ has a high-dimensional consistency property for some $d$, including $d = \sqrt{n}$, when (i) $p_*$ is finite and $\Delta^2 = O(1)$, and (ii) $p_*$ is infinite and $\Delta^2 = O(p)$. When the dimension is larger than the sample size, we propose the ridge-type methods; however, a study of their theoretical properties is left as future work.

In discriminant analysis, it is important to select a set of variables that minimizes the expected probability of misclassification. When we use the linear discriminant function with cutoff point 0, its expected probability of misclassification is

  $R = \rho_1 P(2|1) + \rho_2 P(1|2),$

where $P(2|1)$ is the probability of allocating a new observation to $\Pi_2$ although it comes from $\Pi_1$, and similarly $P(1|2)$ is the probability of allocating a new observation to $\Pi_1$ although it comes from $\Pi_2$. An estimator of $R$ can be used as a measure for the selection of variables. Usually, the prior probabilities $\rho_i$ are unknown, so as a conventional approach we use an estimator with $\rho_1 = \rho_2 = 1/2$. Then, we have a variable selection method based on an estimator of $R$ when a subset of variables $y_j$ is used. For example, under a large-sample asymptotic framework, an estimator is given (see, e.g., Fujikoshi 1985) by $\bar R_j = \Phi(Q_j)$, where $\Phi$ is the distribution function of the standard normal distribution,

  $Q_j = -\frac{1}{2} D_j - (p - 1)\left\{\frac{n}{n_1 n_2}\right\} D_j^{-1} + (32m)^{-1} D_j \left\{ 4(4p - 1) - D_j^2 \right\}, \quad (11)$

$m = n_1 + n_2 - 2$, and $D_j$ is the sample Mahalanobis distance based on $y_j$. For estimators under a high-dimensional asymptotic framework, see, e.g., Fujikoshi (2000), Hyodo and Kubokawa (2014), and Yamada et al. (2017). A usual variable selection method is to select a subset minimizing $Q_j$. However, such a method has no consistency, as in Fujikoshi (1985), and becomes computationally onerous when the dimension is large. On the other hand, along the same lines as the test-based method, we can propose a variable selection method defined by selecting the set of suffixes given by

  $\mathrm{MM}_d = \left\{ i \in \omega \mid Q_{(-i)} - Q_\omega > d \right\},$

where $d$ is a constant, though the problem of looking for an optimum $d$ is left as a future subject. Furthermore, as pointed out by one of the reviewers, the variable selection method introduced in this paper should be compared with the methods based on estimators of the misclassification probability.

Acknowledgements We thank two referees for careful reading of our manuscript and many helpful comments which improved the presentation of this paper. The first author's research is partially supported by the Ministry of Education, Science, Sports, and Culture, a Grant-in-Aid for Scientific Research (C), 16K00047.
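For illustration, the $\mathrm{MM}_d$ rule sketched in Sect. 6 can be coded using only the leading term of (11), $Q_j \approx -D_j/2$ (our simplification; the higher-order correction terms are dropped), so that $Q_{(-i)} - Q_\omega$ reduces to $(D_\omega - D_{(-i)})/2$. The helper names below are ours.

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def D_sub(y1, y2, idx):
    """Sample Mahalanobis distance D_j for the variable subset idx."""
    a, b = y1[:, idx], y2[:, idx]
    n1, n2 = len(a), len(b)
    diff = a.mean(0) - b.mean(0)
    S = ((n1 - 1) * np.cov(a, rowvar=False) +
         (n2 - 1) * np.cov(b, rowvar=False)) / (n1 + n2 - 2)
    return sqrt(diff @ np.linalg.solve(np.atleast_2d(S), diff))

def mm_select(y1, y2, d):
    """MM_d = {i : Q_(-i) - Q_omega > d}, with the leading-term approximation
    Q_j ~ -D_j/2; keeps variables whose removal noticeably shrinks D."""
    p = y1.shape[1]
    Q_full = -0.5 * D_sub(y1, y2, list(range(p)))
    out = []
    for i in range(p):
        Q_i = -0.5 * D_sub(y1, y2, [k for k in range(p) if k != i])
        if Q_i - Q_full > d:
            out.append(i)
    return set(out)
```

The error-rate estimate itself would then be `Phi(-0.5 * D_sub(y1, y2, idx))` for the chosen subset; as in the text, the choice of the threshold `d` is left open.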

Appendix: Proofs of Theorems 1, 2 and 3

Preliminary lemmas

First, we study distributional results related to the test statistic $T_{d,i}$ in (5). For notational simplicity, consider the decomposition $y = (y_1', y_2')'$, with $y_1 ; p_1 \times 1$ and $y_2 ; p_2 \times 1$. Similarly, decompose $\beta = (\beta_1', \beta_2')'$ and

  $S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}, \quad S_{12} ; p_1 \times p_2.$

Let $\lambda$ be the likelihood ratio criterion for testing the hypothesis $\beta_2 = 0$; then

  $-2 \log \lambda = n \log\left\{ 1 + \frac{g^2 (D^2 - D_1^2)}{n - 2 + g^2 D_1^2} \right\}, \quad (12)$

where $g = \{ (n_1 n_2)/n \}^{1/2}$. The following lemma (see, e.g., Fujikoshi et al. 2010) is used.

Lemma 1 Let $D_1$ and $D$ be the sample Mahalanobis distances based on $y_1$ and $y$, respectively, and let $D_{2 \cdot 1}^2 = D^2 - D_1^2$. Similarly, the corresponding population quantities are expressed as $\Delta_1$, $\Delta$ and $\Delta_{2 \cdot 1}^2$. Then, it holds that

(1) $D_1^2 = (n - 2)\, g^{-2} R$, where $R = \chi^2_{p_1}(g^2 \Delta_1^2) \{ \chi^2_{n - p_1 - 1} \}^{-1}$;

(2) $D^2 = (n - 2)\, g^{-2} \left[ \chi^2_{p_2}\!\left( g^2 \Delta_{2 \cdot 1}^2 (1 + R)^{-1} \right) \{ \chi^2_{n - p - 1} \}^{-1} (1 + R) + R \right]$;

(3) $\dfrac{g^2 (D^2 - D_1^2)}{n - 2 + g^2 D_1^2} = \chi^2_{p_2}\!\left( g^2 \Delta_{2 \cdot 1}^2 (1 + R)^{-1} \right) \{ \chi^2_{n - p - 1} \}^{-1}$.

Here, $\chi^2_{p_1}(\cdot)$, $\chi^2_{n - p_1 - 1}$, $\chi^2_{p_2}(\cdot)$, and $\chi^2_{n - p - 1}$ are independent chi-square variates.

Related to the conditional distribution of the right-hand side of (3) with $p_2 = 1$ and $m = n - p - 1$, consider the random variable defined by

  $V = \frac{\chi^2_1(\lambda^2)}{\chi^2_m} - \frac{1 + \lambda^2}{m - 2}, \quad (13)$

where $\chi^2_1(\lambda^2)$ and $\chi^2_m$ are independent. We can express $V$ as

  $V = U_1 U_2 + (m - 2)^{-1} U_1 + (1 + \lambda^2) U_2, \quad (14)$

in terms of the centralized variables $U_1$ and $U_2$ defined by

  $U_1 = \chi^2_1(\lambda^2) - (1 + \lambda^2), \quad U_2 = \frac{1}{\chi^2_m} - \frac{1}{m - 2}. \quad (15)$

It is well known (see, e.g., Tiku 1985) that

  $E(U_1) = 0, \quad E(U_1^2) = 2(1 + 2\lambda^2), \quad E(U_1^3) = 8(1 + 3\lambda^2), \quad E(U_1^4) = 48(1 + 4\lambda^2) + 12(1 + 2\lambda^2)^2.$

Furthermore,

  $E(U_2^k) = E\left\{ \left( \frac{1}{\chi^2_m} - \frac{1}{m - 2} \right)^{k} \right\} = \sum_{i=0}^{k} {}_k C_i\, E\left\{ \left( \frac{1}{\chi^2_m} \right)^{i} \right\} \left( -\frac{1}{m - 2} \right)^{k - i} = \sum_{i=1}^{k} {}_k C_i \frac{1}{(m - 2) \cdots (m - 2i)} \left( -\frac{1}{m - 2} \right)^{k - i} + \left( -\frac{1}{m - 2} \right)^{k}.$

These give the first four moments of $V$. In particular, we use the following results.

Lemma 2 Let $V$ be the random variable defined by (13). Suppose that $\lambda^2 = O(m)$. Then

  $E(V) = 0, \quad E(V^2) = \frac{2\{ (m - 1)(1 + 2\lambda^2) + \lambda^4 \}}{(m - 2)^2 (m - 4)}, \quad E(V^3) = O(m^{-2}), \quad E(V^4) = O(m^{-2}).$

Proof of Theorem 1 First, we show [F1] $\to 0$. Let $i \in j_*$. Then $(-i) \notin \mathcal{F}_+$, and hence

  $\Delta^2_{(-i)} < \Delta^2, \quad \Delta^2_{\{i\} \cdot (-i)} > 0.$

Using (12) and Lemma 1(3),

  $T_{d,i} = n \log\left\{ 1 + \frac{\chi^2_1\!\left( g^2 \Delta^2_{\{i\} \cdot (-i)} (1 + R_i)^{-1} \right)}{\chi^2_{n - p - 1}} \right\} - d,$

where $R_i = \chi^2_{p - 1}(g^2 \Delta^2_{(-i)}) \{ \chi^2_{n - p} \}^{-1}$. Here, since $j_*$ is finite, by showing $n^{-1} T_{d,i} \xrightarrow{p} t_i > 0$ or $T_{d,i} \xrightarrow{p} \infty$, we obtain $P(T_{d,i} \le 0) \to 0$, and hence [F1] $\to 0$. It is easily seen that

  $R_i \simeq \frac{p + g^2 \Delta^2_{(-i)}}{n - p},$

where $g = \{(n_1 n_2)/n\}^{1/2}$ and $\simeq$ means "asymptotically equivalent", and hence

  $(1 + R_i)^{-1} \simeq \frac{n - p}{n + g^2 \Delta^2_{(-i)}}.$

Therefore, we obtain

  $\frac{1}{n} T_{d,i} \xrightarrow{p} \lim \log\left( 1 + \frac{g^2 \Delta^2_{\{i\} \cdot (-i)}}{n + g^2 \Delta^2_{(-i)}} \right) > 0,$

which implies our assertion. Next, consider showing [F2] $\to 0$. For any $i \notin j_*$, $\Delta^2_{(-i)} = \Delta^2$. Therefore, using Lemma 1(3), we have

  $T_{d,i} = n \log\left( 1 + \frac{\chi^2_1}{\chi^2_{n - p - 1}} \right) - d, \quad (16)$

whose distribution does not depend on $i$. Here, $\chi^2_1$ and $\chi^2_{n - p - 1}$ are independent chi-square variates with 1 and $n - p - 1$ degrees of freedom. This implies that

  $T_{d,i} > 0 \iff \frac{\chi^2_1}{\chi^2_{n - p - 1}} > e^{d/n} - 1.$

Noting that $E[\chi^2_1 / \chi^2_{n - p - 1}] = (n - p - 3)^{-1}$, let

  $U = \frac{\chi^2_1}{\chi^2_{n - p - 1}} - \frac{1}{n - p - 3}.$

Then, since $e^{d/n} - 1 - (n - p - 3)^{-1} > h$,

  $P(T_{d,i} > 0) = P\left( U > e^{d/n} - 1 - \frac{1}{n - p - 3} \right) \le P(U > h).$

Furthermore, using the Markov inequality, we have

  $P(T_{d,i} > 0) \le P(|U| > h) \le h^{-2\ell} E(U^{2\ell}), \quad \ell = 1, 2, \dots$

Furthermore, it is easily seen that $E(U^{2\ell}) = O(n^{-2\ell})$, using, e.g., a theorem in Fujikoshi et al. (2010). When $h = O(n^{-a})$,

  $h^{-2\ell} E(U^{2\ell}) = O(n^{-2(1 - a)\ell}).$

Choosing $\ell$ such that $\ell > (1 - a)^{-1}$, we have [F2] $\to 0$.

Proof of Theorem 2 First, note that in the proof of [F2] $\to 0$ in Theorem 1, assumption A3 is not used. This implies the assertion [F2] $\to 0$ in Theorem 2. Now, we consider showing [F1] $\to 0$ when $p_* = O(p)$ and $\Delta^2 = O(p)$. In this case, $p_*$ tends to $\infty$. Based on the proof of Theorem 1, we can express $T_{d,i}$ for $i \in j_*$ as

  $T_{d,i} = n \log\left\{ 1 + \frac{\chi^2_1(\hat\lambda^2_i)}{\chi^2_{n - p - 1}} \right\} - d,$

where $\hat\lambda^2_i = g^2 \Delta^2_{\{i\} \cdot (-i)} (1 + R_i)^{-1}$ and $R_i = \chi^2_{p - 1}(g^2 \Delta^2_{(-i)}) \{ \chi^2_{n - p} \}^{-1}$. Note that $\chi^2_1$ and $\chi^2_{n - p - 1}$ are independent of $R_i$, and hence of $\hat\lambda^2_i$. Then, we have

  $P(T_{d,i} \le 0) = P(\hat V \le \hat h), \quad (17)$

where

  $\hat V = \frac{\chi^2_1(\hat\lambda^2_i)}{\chi^2_{n - p - 1}} - \frac{1 + \hat\lambda^2_i}{n - p - 3}, \quad \hat h = e^{d/n} - 1 - \frac{1 + \hat\lambda^2_i}{n - p - 3}.$

Considering the conditional distribution of the right-hand side in (17), we have

  $P(\hat V \le \hat h) = E_{\hat\lambda^2_i}\left\{ Q(\hat\lambda^2_i) \right\}, \quad (18)$

where $Q(\lambda^2_i) = P(\hat V \le \hat h \mid \hat\lambda^2_i = \lambda^2_i) = P(\tilde V \le \tilde h)$. Here,

  $\tilde V = \frac{\chi^2_1(\lambda^2_i)}{\chi^2_{n - p - 1}} - \frac{1 + \lambda^2_i}{n - p - 3}, \quad \tilde h = e^{d/n} - 1 - \frac{1 + \lambda^2_i}{n - p - 3}.$

Using assumption A6, it can be seen that

  $\hat\lambda^2_i \simeq (1 - c)\, c^{-1} \theta^2_i\, p \equiv \lambda^2_{i0}, \quad \text{and} \quad \lambda^2_i = O(p^b).$

Now, we consider the probability $P(\tilde V \le \tilde h)$ when $\lambda^2_i = \lambda^2_{i0}$. From the assumption $r < b$, for large $n$, $\tilde h < 0$. Therefore, for large $n$, we have

  $P(\tilde V \le \tilde h) \le P(|\tilde V| \ge |\tilde h|) \le \tilde h^{-4} E(\tilde V^4).$

From Lemma 2, $E(\tilde V^4) = O(n^{-2})$. Noting that $\tilde h = O(n^{-(1 - b)})$, we have

  $\tilde h^{-4} E(\tilde V^4) = O(n^{4(1 - b) - 2}),$

whose order is $O(n^{-(1 + 3\delta)})$ if we choose $b$ such that $b > (3/4)(1 + \delta)$. Therefore, we have $P(T_{d,i} \le 0) = O(n^{-(1 + 3\delta)})$, which implies [F1] $\to 0$.

Proof of Theorem 3 The assertion [F1] $\to 0$ follows from the proof of [F1] $\to 0$ in Theorem 1. For a proof of [F2] $\to 0$, it is enough to show that $T_{d,i} \xrightarrow{p} -\infty$ for $i \notin j_*$, since $p$ is fixed. From (16), the limiting distribution of $T_{d,i} + d$ is $\chi^2_1$. Since $d \to \infty$, this implies [F2] $\to 0$.

References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csáki (Eds.), 2nd International Symposium on Information Theory (pp. –). Budapest: Akadémiai Kiadó.
Clemmensen, L., Hastie, T., Witten, D. M., & Ersbøll, B. (2011). Sparse discriminant analysis. Technometrics, 53.
Fujikoshi, Y. (1985). Selection of variables in two-group discriminant analysis by error rate and Akaike's information criteria. Journal of Multivariate Analysis, 17.
Fujikoshi, Y. (2000). Error bounds for asymptotic approximations of the linear discriminant function when the sample size and dimensionality are large. Journal of Multivariate Analysis, 73.
Fujikoshi, Y., & Sakurai, T. (2016). High-dimensional consistency of rank estimation criteria in multivariate linear model. Journal of Multivariate Analysis, 149.
Fujikoshi, Y., Ulyanov, V. V., & Shimizu, R. (2010). Multivariate statistics: High-dimensional and large-sample approximations. Hoboken, NJ: Wiley.
Fujikoshi, Y., Sakurai, T., & Yanagihara, H. (2014). Consistency of high-dimensional AIC-type and Cp-type criteria in multivariate linear regression. Journal of Multivariate Analysis, 144.
Hao, N., Dong, B., & Fan, J. (2015). Sparsifying the Fisher linear discriminant by rotation. Journal of the Royal Statistical Society: Series B, 77.
Hyodo, M., & Kubokawa, T. (2014). A variable selection criterion for linear discriminant rule and its optimality in high dimensional and large sample data. Journal of Multivariate Analysis, 123.
Ito, T., & Kubokawa, T. (2015). Linear ridge estimator of high-dimensional precision matrix using random matrix theory. Discussion Paper Series, CIRJE-F-995.
Kubokawa, T., & Srivastava, M. S. (2012). Selection of variables in multivariate regression models for large dimensions. Communications in Statistics - Theory and Methods, 41.
McLachlan, G. J. (1976). A criterion for selecting variables for the linear discriminant function. Biometrics, 32.
Nishii, R., Bai, Z. D., & Krishnaiah, P. R. (1988). Strong consistency of the information criterion for model selection in multivariate analysis. Hiroshima Mathematical Journal, 18.
Rao, C. R. (1973). Linear statistical inference and its applications (2nd ed.). New York: Wiley.
Sakurai, T., Nakada, T., & Fujikoshi, Y. (2013). High-dimensional AICs for selection of variables in discriminant analysis. Sankhya, Series A, 75.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6.
Tiku, M. (1985). Noncentral chi-square distribution. In S. Kotz & N. L. Johnson (Eds.), Encyclopedia of Statistical Sciences, vol. 6 (pp. –). New York: Wiley.
Van Wieringen, W. N., & Peeters, C. F. (2016). Ridge estimation of inverse covariance matrices from high-dimensional data. Computational Statistics & Data Analysis, 103.
Witten, D. W., & Tibshirani, R. (2011). Penalized classification using Fisher's linear discriminant. Journal of the Royal Statistical Society: Series B, 73.
Yamada, T., Sakurai, T., & Fujikoshi, Y. (2017). High-dimensional asymptotic results for EPMCs of W- and Z-rules. Hiroshima Statistical Research Group Technical Report.
Yanagihara, H., Wakaki, H., & Fujikoshi, Y. (2015). A consistency property of the AIC for multivariate linear models when the dimension and the sample size are large. Electronic Journal of Statistics, 9.
Zhao, L. C., Krishnaiah, P. R., & Bai, Z. D. (1986). On determination of the number of signals in presence of white noise. Journal of Multivariate Analysis, 20.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Comparison with RSS-Based Model Selection Criteria for Selecting Growth Functions TR-No. 14-07, Hiroshima Statistical Research Group, 1 12 Comparison with RSS-Based Model Selection Criteria for Selecting Growth Functions Keisuke Fukui 2, Mariko Yamamura 1 and Hirokazu Yanagihara 2 1

More information

A Study on the Bias-Correction Effect of the AIC for Selecting Variables in Normal Multivariate Linear Regression Models under Model Misspecification

A Study on the Bias-Correction Effect of the AIC for Selecting Variables in Normal Multivariate Linear Regression Models under Model Misspecification TR-No. 13-08, Hiroshima Statistical Research Group, 1 23 A Study on the Bias-Correction Effect of the AIC for Selecting Variables in Normal Multivariate Linear Regression Models under Model Misspecification

More information

Modfiied Conditional AIC in Linear Mixed Models

Modfiied Conditional AIC in Linear Mixed Models CIRJE-F-895 Modfiied Conditional AIC in Linear Mixed Models Yuki Kawakubo Graduate School of Economics, University of Tokyo Tatsuya Kubokawa University of Tokyo July 2013 CIRJE Discussion Papers can be

More information

STRONG CONSISTENCY OF THE AIC, BIC, C P KOO METHODS IN HIGH-DIMENSIONAL MULTIVARIATE LINEAR REGRESSION

STRONG CONSISTENCY OF THE AIC, BIC, C P KOO METHODS IN HIGH-DIMENSIONAL MULTIVARIATE LINEAR REGRESSION STRONG CONSISTENCY OF THE AIC, BIC, C P KOO METHODS IN HIGH-DIMENSIONAL MULTIVARIATE LINEAR REGRESSION AND By Zhidong Bai,, Yasunori Fujikoshi, and Jiang Hu, Northeast Normal University and Hiroshima University

More information

Expected probabilities of misclassification in linear discriminant analysis based on 2-Step monotone missing samples

Expected probabilities of misclassification in linear discriminant analysis based on 2-Step monotone missing samples Expected probabilities of misclassification in linear discriminant analysis based on 2-Step monotone missing samples Nobumichi Shutoh, Masashi Hyodo and Takashi Seo 2 Department of Mathematics, Graduate

More information

Likelihood Ratio Tests in Multivariate Linear Model

Likelihood Ratio Tests in Multivariate Linear Model Likelihood Ratio Tests in Multivariate Linear Model Yasunori Fujikoshi Department of Mathematics, Graduate School of Science, Hiroshima University, 1-3-1 Kagamiyama, Higashi Hiroshima, Hiroshima 739-8626,

More information

Bias-corrected AIC for selecting variables in Poisson regression models

Bias-corrected AIC for selecting variables in Poisson regression models Bias-corrected AIC for selecting variables in Poisson regression models Ken-ichi Kamo (a), Hirokazu Yanagihara (b) and Kenichi Satoh (c) (a) Corresponding author: Department of Liberal Arts and Sciences,

More information

Comparison with Residual-Sum-of-Squares-Based Model Selection Criteria for Selecting Growth Functions

Comparison with Residual-Sum-of-Squares-Based Model Selection Criteria for Selecting Growth Functions c 215 FORMATH Research Group FORMATH Vol. 14 (215): 27 39, DOI:1.15684/formath.14.4 Comparison with Residual-Sum-of-Squares-Based Model Selection Criteria for Selecting Growth Functions Keisuke Fukui 1,

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population

More information

Testing Some Covariance Structures under a Growth Curve Model in High Dimension

Testing Some Covariance Structures under a Growth Curve Model in High Dimension Department of Mathematics Testing Some Covariance Structures under a Growth Curve Model in High Dimension Muni S. Srivastava and Martin Singull LiTH-MAT-R--2015/03--SE Department of Mathematics Linköping

More information

Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices

Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices Testing linear hypotheses of mean vectors for high-dimension data with unequal covariance matrices Takahiro Nishiyama a,, Masashi Hyodo b, Takashi Seo a, Tatjana Pavlenko c a Department of Mathematical

More information

Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives

Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives TR-No. 14-06, Hiroshima Statistical Research Group, 1 11 Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives Mariko Yamamura 1, Keisuke Fukui

More information

Analysis Methods for Supersaturated Design: Some Comparisons

Analysis Methods for Supersaturated Design: Some Comparisons Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs

More information

Statistical Inference On the High-dimensional Gaussian Covarianc

Statistical Inference On the High-dimensional Gaussian Covarianc Statistical Inference On the High-dimensional Gaussian Covariance Matrix Department of Mathematical Sciences, Clemson University June 6, 2011 Outline Introduction Problem Setup Statistical Inference High-Dimensional

More information

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1 Variable Selection in Restricted Linear Regression Models Y. Tuaç 1 and O. Arslan 1 Ankara University, Faculty of Science, Department of Statistics, 06100 Ankara/Turkey ytuac@ankara.edu.tr, oarslan@ankara.edu.tr

More information

A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data

A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data Yujun Wu, Marc G. Genton, 1 and Leonard A. Stefanski 2 Department of Biostatistics, School of Public Health, University of Medicine

More information

ISyE 691 Data mining and analytics

ISyE 691 Data mining and analytics ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)

More information

Jackknife Bias Correction of the AIC for Selecting Variables in Canonical Correlation Analysis under Model Misspecification

Jackknife Bias Correction of the AIC for Selecting Variables in Canonical Correlation Analysis under Model Misspecification Jackknife Bias Correction of the AIC for Selecting Variables in Canonical Correlation Analysis under Model Misspecification (Last Modified: May 22, 2012) Yusuke Hashiyama, Hirokazu Yanagihara 1 and Yasunori

More information

A Fast Algorithm for Optimizing Ridge Parameters in a Generalized Ridge Regression by Minimizing an Extended GCV Criterion

A Fast Algorithm for Optimizing Ridge Parameters in a Generalized Ridge Regression by Minimizing an Extended GCV Criterion TR-No. 17-07, Hiroshima Statistical Research Group, 1 24 A Fast Algorithm for Optimizing Ridge Parameters in a Generalized Ridge Regression by Minimizing an Extended GCV Criterion Mineaki Ohishi, Hirokazu

More information

ASYMPTOTIC RESULTS OF A HIGH DIMENSIONAL MANOVA TEST AND POWER COMPARISON WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE

ASYMPTOTIC RESULTS OF A HIGH DIMENSIONAL MANOVA TEST AND POWER COMPARISON WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE J Jaan Statist Soc Vol 34 No 2004 9 26 ASYMPTOTIC RESULTS OF A HIGH DIMENSIONAL MANOVA TEST AND POWER COMPARISON WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE Yasunori Fujikoshi*, Tetsuto Himeno

More information

High-dimensional regression modeling

High-dimensional regression modeling High-dimensional regression modeling David Causeur Department of Statistics and Computer Science Agrocampus Ouest IRMAR CNRS UMR 6625 http://www.agrocampus-ouest.fr/math/causeur/ Course objectives Making

More information

1. Introduction Over the last three decades a number of model selection criteria have been proposed, including AIC (Akaike, 1973), AICC (Hurvich & Tsa

1. Introduction Over the last three decades a number of model selection criteria have been proposed, including AIC (Akaike, 1973), AICC (Hurvich & Tsa On the Use of Marginal Likelihood in Model Selection Peide Shi Department of Probability and Statistics Peking University, Beijing 100871 P. R. China Chih-Ling Tsai Graduate School of Management University

More information

EXPLICIT EXPRESSIONS OF PROJECTORS ON CANONICAL VARIABLES AND DISTANCES BETWEEN CENTROIDS OF GROUPS. Haruo Yanai*

EXPLICIT EXPRESSIONS OF PROJECTORS ON CANONICAL VARIABLES AND DISTANCES BETWEEN CENTROIDS OF GROUPS. Haruo Yanai* J. Japan Statist. Soc. Vol. 11 No. 1 1981 43-53 EXPLICIT EXPRESSIONS OF PROJECTORS ON CANONICAL VARIABLES AND DISTANCES BETWEEN CENTROIDS OF GROUPS Haruo Yanai* Generalized expressions of canonical correlation

More information

A Confidence Region Approach to Tuning for Variable Selection

A Confidence Region Approach to Tuning for Variable Selection A Confidence Region Approach to Tuning for Variable Selection Funda Gunes and Howard D. Bondell Department of Statistics North Carolina State University Abstract We develop an approach to tuning of penalized

More information

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data Sri Lankan Journal of Applied Statistics (Special Issue) Modern Statistical Methodologies in the Cutting Edge of Science Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

STAT 100C: Linear models

STAT 100C: Linear models STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 21 Model selection Choosing the best model among a collection of models {M 1, M 2..., M N }. What is a good model? 1. fits the data well (model

More information

Testing Block-Diagonal Covariance Structure for High-Dimensional Data

Testing Block-Diagonal Covariance Structure for High-Dimensional Data Testing Block-Diagonal Covariance Structure for High-Dimensional Data MASASHI HYODO 1, NOBUMICHI SHUTOH 2, TAKAHIRO NISHIYAMA 3, AND TATJANA PAVLENKO 4 1 Department of Mathematical Sciences, Graduate School

More information

On Autoregressive Order Selection Criteria

On Autoregressive Order Selection Criteria On Autoregressive Order Selection Criteria Venus Khim-Sen Liew Faculty of Economics and Management, Universiti Putra Malaysia, 43400 UPM, Serdang, Malaysia This version: 1 March 2004. Abstract This study

More information

The Role of "Leads" in the Dynamic Title of Cointegrating Regression Models. Author(s) Hayakawa, Kazuhiko; Kurozumi, Eiji

The Role of Leads in the Dynamic Title of Cointegrating Regression Models. Author(s) Hayakawa, Kazuhiko; Kurozumi, Eiji he Role of "Leads" in the Dynamic itle of Cointegrating Regression Models Author(s) Hayakawa, Kazuhiko; Kurozumi, Eiji Citation Issue 2006-12 Date ype echnical Report ext Version publisher URL http://hdl.handle.net/10086/13599

More information

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS020) p.3863 Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Jinfang Wang and

More information

Discrepancy-Based Model Selection Criteria Using Cross Validation

Discrepancy-Based Model Selection Criteria Using Cross Validation 33 Discrepancy-Based Model Selection Criteria Using Cross Validation Joseph E. Cavanaugh, Simon L. Davies, and Andrew A. Neath Department of Biostatistics, The University of Iowa Pfizer Global Research

More information

T 2 Type Test Statistic and Simultaneous Confidence Intervals for Sub-mean Vectors in k-sample Problem

T 2 Type Test Statistic and Simultaneous Confidence Intervals for Sub-mean Vectors in k-sample Problem T Type Test Statistic and Simultaneous Confidence Intervals for Sub-mean Vectors in k-sample Problem Toshiki aito a, Tamae Kawasaki b and Takashi Seo b a Department of Applied Mathematics, Graduate School

More information

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania Submitted to the Annals of Statistics DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING By T. Tony Cai and Linjun Zhang University of Pennsylvania We would like to congratulate the

More information

On testing the equality of mean vectors in high dimension

On testing the equality of mean vectors in high dimension ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA Volume 17, Number 1, June 2013 Available online at www.math.ut.ee/acta/ On testing the equality of mean vectors in high dimension Muni S.

More information

SIMULTANEOUS CONFIDENCE INTERVALS AMONG k MEAN VECTORS IN REPEATED MEASURES WITH MISSING DATA

SIMULTANEOUS CONFIDENCE INTERVALS AMONG k MEAN VECTORS IN REPEATED MEASURES WITH MISSING DATA SIMULTANEOUS CONFIDENCE INTERVALS AMONG k MEAN VECTORS IN REPEATED MEASURES WITH MISSING DATA Kazuyuki Koizumi Department of Mathematics, Graduate School of Science Tokyo University of Science 1-3, Kagurazaka,

More information

On the conservative multivariate multiple comparison procedure of correlated mean vectors with a control

On the conservative multivariate multiple comparison procedure of correlated mean vectors with a control On the conservative multivariate multiple comparison procedure of correlated mean vectors with a control Takahiro Nishiyama a a Department of Mathematical Information Science, Tokyo University of Science,

More information

MS-C1620 Statistical inference

MS-C1620 Statistical inference MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Presented by Yang Zhao March 5, 2010 1 / 36 Outlines 2 / 36 Motivation

More information

A NOTE ON ESTIMATION UNDER THE QUADRATIC LOSS IN MULTIVARIATE CALIBRATION

A NOTE ON ESTIMATION UNDER THE QUADRATIC LOSS IN MULTIVARIATE CALIBRATION J. Japan Statist. Soc. Vol. 32 No. 2 2002 165 181 A NOTE ON ESTIMATION UNDER THE QUADRATIC LOSS IN MULTIVARIATE CALIBRATION Hisayuki Tsukuma* The problem of estimation in multivariate linear calibration

More information

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30 MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS Communications in Statistics - Simulation and Computation 33 (2004) 431-446 COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS K. Krishnamoorthy and Yong Lu Department

More information

Model Selection. Frank Wood. December 10, 2009

Model Selection. Frank Wood. December 10, 2009 Model Selection Frank Wood December 10, 2009 Standard Linear Regression Recipe Identify the explanatory variables Decide the functional forms in which the explanatory variables can enter the model Decide

More information

On High-Dimensional Cross-Validation

On High-Dimensional Cross-Validation On High-Dimensional Cross-Validation BY WEI-CHENG HSIAO Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan hsiaowc@stat.sinica.edu.tw 5 WEI-YING

More information

Marcia Gumpertz and Sastry G. Pantula Department of Statistics North Carolina State University Raleigh, NC

Marcia Gumpertz and Sastry G. Pantula Department of Statistics North Carolina State University Raleigh, NC A Simple Approach to Inference in Random Coefficient Models March 8, 1988 Marcia Gumpertz and Sastry G. Pantula Department of Statistics North Carolina State University Raleigh, NC 27695-8203 Key Words

More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

Estimation of the large covariance matrix with two-step monotone missing data

Estimation of the large covariance matrix with two-step monotone missing data Estimation of the large covariance matrix with two-ste monotone missing data Masashi Hyodo, Nobumichi Shutoh 2, Takashi Seo, and Tatjana Pavlenko 3 Deartment of Mathematical Information Science, Tokyo

More information

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Journal of Data Science 9(2011), 549-564 Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Masaru Kanba and Kanta Naito Shimane University Abstract: This paper discusses the

More information

Regularization: Ridge Regression and the LASSO

Regularization: Ridge Regression and the LASSO Agenda Wednesday, November 29, 2006 Agenda Agenda 1 The Bias-Variance Tradeoff 2 Ridge Regression Solution to the l 2 problem Data Augmentation Approach Bayesian Interpretation The SVD and Ridge Regression

More information

Obtaining simultaneous equation models from a set of variables through genetic algorithms

Obtaining simultaneous equation models from a set of variables through genetic algorithms Obtaining simultaneous equation models from a set of variables through genetic algorithms José J. López Universidad Miguel Hernández (Elche, Spain) Domingo Giménez Universidad de Murcia (Murcia, Spain)

More information

Large Sample Theory For OLS Variable Selection Estimators

Large Sample Theory For OLS Variable Selection Estimators Large Sample Theory For OLS Variable Selection Estimators Lasanthi C. R. Pelawa Watagoda and David J. Olive Southern Illinois University June 18, 2018 Abstract This paper gives large sample theory for

More information

Ken-ichi Kamo Department of Liberal Arts and Sciences, Sapporo Medical University, Hokkaido, Japan

Ken-ichi Kamo Department of Liberal Arts and Sciences, Sapporo Medical University, Hokkaido, Japan A STUDY ON THE BIAS-CORRECTION EFFECT OF THE AIC FOR SELECTING VARIABLES IN NOR- MAL MULTIVARIATE LINEAR REGRESSION MOD- ELS UNDER MODEL MISSPECIFICATION Authors: Hirokazu Yanagihara Department of Mathematics,

More information

Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77

Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77 Linear Regression Chapter 3 September 27, 2016 Chapter 3 September 27, 2016 1 / 77 1 3.1. Simple linear regression 2 3.2 Multiple linear regression 3 3.3. The least squares estimation 4 3.4. The statistical

More information

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED by W. Robert Reed Department of Economics and Finance University of Canterbury, New Zealand Email: bob.reed@canterbury.ac.nz

More information

Outline. Mixed models in R using the lme4 package Part 3: Longitudinal data. Sleep deprivation data. Simple longitudinal data

Outline. Mixed models in R using the lme4 package Part 3: Longitudinal data. Sleep deprivation data. Simple longitudinal data Outline Mixed models in R using the lme4 package Part 3: Longitudinal data Douglas Bates Longitudinal data: sleepstudy A model with random effects for intercept and slope University of Wisconsin - Madison

More information

Decomposition of Parsimonious Independence Model Using Pearson, Kendall and Spearman s Correlations for Two-Way Contingency Tables

Decomposition of Parsimonious Independence Model Using Pearson, Kendall and Spearman s Correlations for Two-Way Contingency Tables International Journal of Statistics and Probability; Vol. 7 No. 3; May 208 ISSN 927-7032 E-ISSN 927-7040 Published by Canadian Center of Science and Education Decomposition of Parsimonious Independence

More information

AR-order estimation by testing sets using the Modified Information Criterion

AR-order estimation by testing sets using the Modified Information Criterion AR-order estimation by testing sets using the Modified Information Criterion Rudy Moddemeijer 14th March 2006 Abstract The Modified Information Criterion (MIC) is an Akaike-like criterion which allows

More information

Linear Models 1. Isfahan University of Technology Fall Semester, 2014

Linear Models 1. Isfahan University of Technology Fall Semester, 2014 Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and

More information

CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA

CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA STATISTICS IN MEDICINE, VOL. 17, 59 68 (1998) CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA J. K. LINDSEY AND B. JONES* Department of Medical Statistics, School of Computing Sciences,

More information

arxiv: v1 [stat.me] 30 Dec 2017

arxiv: v1 [stat.me] 30 Dec 2017 arxiv:1801.00105v1 [stat.me] 30 Dec 2017 An ISIS screening approach involving threshold/partition for variable selection in linear regression 1. Introduction Yu-Hsiang Cheng e-mail: 96354501@nccu.edu.tw

More information

On Properties of QIC in Generalized. Estimating Equations. Shinpei Imori

On Properties of QIC in Generalized. Estimating Equations. Shinpei Imori On Properties of QIC in Generalized Estimating Equations Shinpei Imori Graduate School of Engineering Science, Osaka University 1-3 Machikaneyama-cho, Toyonaka, Osaka 560-8531, Japan E-mail: imori.stat@gmail.com

More information

Bootstrap Variants of the Akaike Information Criterion for Mixed Model Selection

Bootstrap Variants of the Akaike Information Criterion for Mixed Model Selection Bootstrap Variants of the Akaike Information Criterion for Mixed Model Selection Junfeng Shang, and Joseph E. Cavanaugh 2, Bowling Green State University, USA 2 The University of Iowa, USA Abstract: Two

More information

An Akaike Criterion based on Kullback Symmetric Divergence in the Presence of Incomplete-Data

An Akaike Criterion based on Kullback Symmetric Divergence in the Presence of Incomplete-Data An Akaike Criterion based on Kullback Symmetric Divergence Bezza Hafidi a and Abdallah Mkhadri a a University Cadi-Ayyad, Faculty of sciences Semlalia, Department of Mathematics, PB.2390 Marrakech, Moroco

More information

On the conservative multivariate Tukey-Kramer type procedures for multiple comparisons among mean vectors

On the conservative multivariate Tukey-Kramer type procedures for multiple comparisons among mean vectors On the conservative multivariate Tukey-Kramer type procedures for multiple comparisons among mean vectors Takashi Seo a, Takahiro Nishiyama b a Department of Mathematical Information Science, Tokyo University

More information

Estimation of multivariate 3rd moment for high-dimensional data and its application for testing multivariate normality

Estimation of multivariate 3rd moment for high-dimensional data and its application for testing multivariate normality Estimation of multivariate rd moment for high-dimensional data and its application for testing multivariate normality Takayuki Yamada, Tetsuto Himeno Institute for Comprehensive Education, Center of General

More information

Generalized Elastic Net Regression

Generalized Elastic Net Regression Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1

More information

Outlier detection in ARIMA and seasonal ARIMA models by. Bayesian Information Type Criteria

Outlier detection in ARIMA and seasonal ARIMA models by. Bayesian Information Type Criteria Outlier detection in ARIMA and seasonal ARIMA models by Bayesian Information Type Criteria Pedro Galeano and Daniel Peña Departamento de Estadística Universidad Carlos III de Madrid 1 Introduction The

More information

arxiv: v1 [math.st] 2 Apr 2016

arxiv: v1 [math.st] 2 Apr 2016 NON-ASYMPTOTIC RESULTS FOR CORNISH-FISHER EXPANSIONS V.V. ULYANOV, M. AOSHIMA, AND Y. FUJIKOSHI arxiv:1604.00539v1 [math.st] 2 Apr 2016 Abstract. We get the computable error bounds for generalized Cornish-Fisher

More information

Summary and discussion of: Exact Post-selection Inference for Forward Stepwise and Least Angle Regression Statistics Journal Club

Summary and discussion of: Exact Post-selection Inference for Forward Stepwise and Least Angle Regression Statistics Journal Club Summary and discussion of: Exact Post-selection Inference for Forward Stepwise and Least Angle Regression Statistics Journal Club 36-825 1 Introduction Jisu Kim and Veeranjaneyulu Sadhanala In this report

More information

Sparse Covariance Selection using Semidefinite Programming
