On testing the equality of mean vectors in high dimension


ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA
Volume 17, Number 1, June 2013
Available online at

On testing the equality of mean vectors in high dimension

Muni S. Srivastava

Abstract. In this article, we review various tests that have been proposed in the literature for testing the equality of several mean vectors. In particular, this includes testing the equality of two mean vectors, the so-called two-sample problem, as well as testing the equality of several mean vectors, the so-called multivariate analysis of variance or MANOVA problem. The total sample size, however, may be less than the dimension of the mean vectors, so the usual tests cannot be used. Powers of these tests are compared using simulation.

Received January 12.
Mathematics Subject Classification. 62H10, 62H15.
Key words and phrases. Equality of two mean vectors, high dimensional, multivariate analysis of variance, sample smaller than dimension, inequality of two covariance matrices.

1. Introduction

In this article, we review various tests that have been proposed in the literature for testing the equality of several mean vectors. We begin with the comparison of the mean vectors of two groups with independently distributed p-dimensional observation vectors x_ij, mean vectors µ_i, and covariance matrices Σ_i, i = 1, 2, j = 1, ..., N_i. The total number of p-dimensional observation vectors, N = N_1 + N_2, is less than p. Various tests have been proposed in the literature. For normally distributed observation vectors, and when Σ_1 = Σ_2, a test has been proposed by Dempster [3]. Bai and Saranadasa [1] proposed another test which does not require the assumption of normality but has asymptotically the same power as the one proposed by Dempster [3]. Srivastava [7] proposed a Hotelling's T^2-type test, denoted by T^{+2}, using the Moore-Penrose inverse of S in place of S^{-1}, since S is singular when n < p, n = N_1 + N_2 - 2. It may be noted that all three of the above tests, namely T_D, T_BS and T^{+2}, are invariant under the group of orthogonal matrices. A test that is invariant under the group of non-singular p × p diagonal

matrices has recently been proposed by Srivastava and Du [9] and Srivastava [8]. It may be noted that this test is not invariant under transformation by orthogonal matrices. The power comparison of these tests will be given in this article. Bai and Saranadasa's [1] test as well as the test proposed by Srivastava and Du [9] can be generalized to the case when the covariance matrices Σ_1 and Σ_2 of the two groups are not equal; for testing the equality of two covariance matrices, see Srivastava and Yanagihara [13]. The generalized versions of these two tests have been considered by Chen and Qin [2] and Srivastava, Katayama and Kano [11].

For testing the equality of the mean vectors of several groups, the so-called multivariate analysis of variance or simply MANOVA, it is assumed that all the groups have the same covariance matrix. Under the assumption that (p/n) → c, c ∈ (0, ∞), Fujikoshi, Himeno, and Wakaki [4] and Schott [5] have given generalized versions of the Dempster [3] and Bai-Saranadasa [1] two-sample tests for the MANOVA problem. Tests that do not require the above assumption have been proposed by Srivastava and Fujikoshi [10]. The two tests considered by Fujikoshi et al. [4], Schott [5], and Srivastava and Fujikoshi [10] require the assumption of normality to obtain the asymptotic distributions of the statistics. Following Srivastava [8], it can, however, be shown that these two tests are robust under general non-normal distributions. A third test, based on the Moore-Penrose inverse of the sample covariance matrix, has been proposed by Srivastava [7] under the assumption of normality. For a comparison of the asymptotic powers of these three tests, see Srivastava and Fujikoshi [10]. The above three tests are, however, not invariant under transformation by non-singular diagonal matrices. A test that has this property has recently been proposed by Yamada and Srivastava [14] under the normality assumption.
The organization of this article is as follows. Since it is assumed that N < p, we show in Section 2 that there does not exist a test that is invariant under transformation of the observation vectors by an arbitrary p × p non-singular matrix. Thus, we shall consider tests that are invariant under smaller groups. Two such groups are the group of orthogonal matrices and the group of non-singular diagonal matrices. Such tests for the equality of two mean vectors will be given in Section 4 for the non-normal model described in Section 3. In Section 5, the problem of testing the equality of two mean vectors is considered again, under normality only, but with the covariance matrices of the two groups not equal. The MANOVA problem is considered in Section 6. The paper concludes in Section 7.

2. Consequences of N < p, Σ non-singular

In this section, we show that no invariant test exists under transformation of the observation vectors by a non-singular matrix. Let

X = (x_1, x_2, ..., x_N) : p × N,  N < p,
X* = (x*_1, x*_2, ..., x*_N) : p × N,  N < p,

be two sample points. Let

Z = (X, X_1)  and  Z* = (X*, X*_1),

where X_1 and X*_1 are arbitrary p × (p − N) matrices chosen such that Z and Z* are non-singular (n.s.). Thus

I_p = (Z*)^{-1} Z* = (Z*)^{-1}(X*, X*_1) = [(Z*)^{-1} X*, (Z*)^{-1} X*_1] = ( I_N, 0 ; 0, I_{p−N} ).

Hence

(Z*)^{-1} X* = ( I_N ; 0 ),   Z (Z*)^{-1} X* = (X, X_1) ( I_N ; 0 ) = X,

that is,

X = A X*,   A = Z (Z*)^{-1},

where A is n.s. Thus, for any two sample points X and X* in the sample space, there exists a non-singular matrix taking one point to the other. That is, the whole sample space is a single orbit; the group of non-singular transformations is transitive. Hence, no invariant test under the group of non-singular transformations exists. Clearly, then, we need to consider smaller groups of transformations, namely invariance under the group of orthogonal matrices and invariance under the group of non-singular diagonal matrices. Thus, tests have been proposed in the literature that are invariant under the orthogonal group or under the group of non-singular diagonal matrices. The latter tests appear to perform better than the ones that are invariant under the orthogonal group.

3. A non-normal model

To show that the two-sample tests and the tests in multivariate analysis of variance (MANOVA) remain valid when the observation vectors are not necessarily normally distributed, we consider a general model for the independently distributed p-dimensional vectors x_ij, j = 1, ..., N_i, i = 1, 2. We call this model Model M and describe it next.

Model M. In this model, we assume that the observation vectors x_ij satisfy (a), (b), and (c) below:

(a) x_ij = µ_i + F_i u_ij, j = 1, ..., N_i, i = 1, 2, Cov(x_ij) = Σ_i = F_i^2, i = 1, 2,

(b) E[ Π_{k=1}^p u_ijk^{ν_k} ] = Π_{k=1}^p E(u_ijk^{ν_k}), u_ij = (u_ij1, ..., u_ijp)', ν_k ≥ 0, ν_1 + · · · + ν_p ≤ 4, the ν_k integers,

(c) E(u_ijk^4) = K_4 + 3, K_4 ≥ −2.

It may be noted that F_i is the unique p × p positive definite matrix such that Σ_i = F_i F_i. The results, however, hold for a general factorization Σ_i = C_i C_i', where C_i is a p × p non-singular matrix (see Srivastava [8]). For simplicity of presentation we have, however, chosen Σ_i = F_i^2. Bai and Saranadasa [1] have chosen a model in which Σ_i = Γ_i Γ_i', where Γ_i is a p × m matrix, m ≥ p; in our model m = p. However, the assumptions needed to prove the asymptotic normality of the test statistic proposed by Bai and Saranadasa [1] under their model are much stronger than those for Model M; see Section 4.

4. Two-sample tests: Σ_1 = Σ_2

Let x_11, ..., x_1N_1 be independent and identically distributed (i.i.d.) vectors with p-variate normal distribution N_p(µ_1, Σ_1), and let x_21, ..., x_2N_2 be i.i.d. N_p(µ_2, Σ_2), the two samples being independent. The sample mean vectors are, respectively,

x̄_1 = (1/N_1) Σ_{j=1}^{N_1} x_1j,   x̄_2 = (1/N_2) Σ_{j=1}^{N_2} x_2j.

The sample covariance matrices are, respectively,

S_1 = (1/n_1) Σ_{j=1}^{N_1} (x_1j − x̄_1)(x_1j − x̄_1)',  n_1 = N_1 − 1,
S_2 = (1/n_2) Σ_{j=1}^{N_2} (x_2j − x̄_2)(x_2j − x̄_2)',  n_2 = N_2 − 1.

When Σ_1 = Σ_2 = Σ, an unbiased estimator of Σ is given by

S = (n_1 S_1 + n_2 S_2)/n,   n = n_1 + n_2 = N_1 + N_2 − 2.

4.1. Dempster's test. For testing the hypothesis H : µ_1 = µ_2 versus A : µ_1 ≠ µ_2, four tests have been proposed in the literature, by Dempster [3], Bai and Saranadasa [1], Srivastava [7], and Srivastava and Du [9]. The first three tests are invariant under orthogonal transformations and the fourth one is invariant under transformation by any p × p non-singular diagonal matrix. We begin with the test proposed by Dempster [3] under the assumption of normality. In the notation of Section 4, Dempster's test statistic is given by

T_D = (1/N_1 + 1/N_2)^{-1} (x̄_1 − x̄_2)'(x̄_1 − x̄_2)/(tr S).   (4.1)

Let G be an N × N orthogonal matrix given by

G = [ N^{-1/2} 1_N,  ( (N_2/(N_1 N))^{1/2} 1'_{N_1}, −(N_1/(N_2 N))^{1/2} 1'_{N_2} )',  g_3, ..., g_N ],   N = N_1 + N_2,

where GG' = G'G = I_N, g_i'g_i = 1, i = 3, ..., N, and g_i'g_j = 0, i ≠ j. Let X = (x_11, ..., x_1N_1, x_21, ..., x_2N_2) : p × N be the matrix of the observation vectors from the two groups, and let

XG = (y_1, ..., y_N).

Then

E(y_1) = N^{-1/2}(N_1 µ_1 + N_2 µ_2),
E(y_2) = (N_1 N_2 / N)^{1/2} (µ_1 − µ_2),
E(y_j) = 0,  j = 3, ..., N.

Hence, we can write T_D in terms of the (N − 1) independent random vectors y_2, ..., y_N as

T_D = n y_2'y_2 / (y_3'y_3 + · · · + y_N'y_N),   n = N_1 + N_2 − 2.   (4.2)

If Σ = γ^2 I_p, then y_2, ..., y_N are i.i.d. N_p(0, γ^2 I_p) under µ_1 = µ_2. Hence T_D ~ F_{p,np}. It may be noted that when Σ = γ^2 I_p, and under the assumption of normality, Dempster's test T_D is uniformly most powerful among all tests whose power depends on µ'µ/γ^2, µ = µ_1 − µ_2. To obtain the distribution of T_D when Σ ≠ γ^2 I_p, it is assumed that under the null hypothesis the y_i'y_i, i = 2, ..., N, are independently distributed as m χ^2_r, where m > 0 and r > 0 are unknown scalar quantities. Clearly, the distribution of T_D will not depend on m, and thus we need to find only r. Dempster

gave two iterative methods to obtain an estimate of r. Bai and Saranadasa [1] obtained an expression for r as follows:

E(y_i'y_i) = tr Σ = m E(χ^2_r) = m r

and

Var(y_i'y_i) = 2 tr Σ^2 = Var(m χ^2_r) = 2 m^2 r.

Hence

r = (tr Σ)^2 / (tr Σ^2).

Bai and Saranadasa [1], however, did not provide an estimator of r, as they provided only a ratio-consistent estimator of tr Σ^2, which cannot be used here. An estimator of r is obtained as follows. Let b = a_1^2/a_2, where a_i = tr Σ^i/p, i = 1, 2. It may be noted from the Cauchy-Schwarz inequality that 0 < (tr Σ)^2 ≤ p tr Σ^2, where the equality on the right-hand side holds if and only if Σ = γ^2 I_p, γ > 0. Hence 0 < b ≤ 1 and r ≤ p. Under the assumption of normality of the observation vectors, and assuming that 0 < lim_{p→∞} a_i < ∞, i = 1, ..., 4, it has been shown by Srivastava [6] that

â_1 = (tr S)/p,   â_2 = (1/p)[ tr S^2 − (1/n)(tr S)^2 ]

are consistent estimators of a_1 and a_2. Thus b̂ = â_1^2/â_2 is a consistent estimator of b, giving r̂ = p b̂, and T_D ≈ F_{[r̂],[n r̂]}. Thus, under the hypothesis that µ_1 = µ_2, Dempster's T_D statistic for normally distributed observations is approximately distributed as an F-distribution with [r̂] and [n r̂] degrees of freedom, where [a] denotes the largest integer ≤ a. Clearly, then, the hypothesis of equality of the two mean vectors is rejected if T_D > F_{[r̂],[n r̂],1−α}, the 100(1 − α)% point of the F-distribution with the degrees of freedom mentioned above. The asymptotic power of the T_D test has been derived by Bai and Saranadasa [1], who proposed another test and showed that the asymptotic power of the T_D test is the same as that of the one proposed by them. This test will be called the Bai-Saranadasa test in our discussion, and we describe it next.
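As a concrete illustration, the orthogonally invariant statistics of this section can be assembled in a few lines. The sketch below, under assumptions of our own (columns of X1 and X2 are observations; all function and variable names are ours, not from the paper), computes Dempster's T_D of (4.1) together with the estimates â_1, â_2, and b̂ = â_1²/â_2 that give r̂ = pb̂, as well as the Bai-Saranadasa statistic T_BS of (4.4) and Srivastava's T^{+2} of (4.8) scaled by b̂p/n as in (4.11):

```python
import numpy as np

def two_sample_tests(X1, X2):
    """Sketch (not an official implementation) of T_D (4.1), T_BS (4.4),
    and the scaled Moore-Penrose statistic (b_hat*p/n)*T^{+2} of (4.11).
    X1, X2 are p x N_i data matrices whose columns are observations."""
    p, N1 = X1.shape
    N2 = X2.shape[1]
    n = N1 + N2 - 2
    xbar1, xbar2 = X1.mean(axis=1), X2.mean(axis=1)
    C1, C2 = X1 - xbar1[:, None], X2 - xbar2[:, None]
    S = (C1 @ C1.T + C2 @ C2.T) / n              # pooled covariance estimator
    d = xbar1 - xbar2
    tau = N1 * N2 / (N1 + N2)                    # = (1/N1 + 1/N2)^{-1}
    trS, trS2 = np.trace(S), np.trace(S @ S)
    a1_hat = trS / p                             # estimates tr(Sigma)/p
    a2_hat = (trS2 - trS ** 2 / n) / p           # estimates tr(Sigma^2)/p
    b_hat = a1_hat ** 2 / a2_hat                 # r_hat = p * b_hat
    T_D = tau * (d @ d) / trS                    # compare with F_[r_hat],[n r_hat]
    T_BS = (tau * (d @ d) - trS) / np.sqrt(2 * (trS2 - trS ** 2 / n))
    Tp2 = tau * d @ np.linalg.pinv(S) @ d        # uses the Moore-Penrose S^+
    return T_D, T_BS, b_hat * p / n * Tp2        # last one approx. chi^2_n under H
```

The last value would be compared with a chi-square quantile with n degrees of freedom, and T_BS with z_{1-α}, as described in the text.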

4.2. Bai-Saranadasa test. Consider an asymptotic version of Dempster's statistic. The mean of the numerator of T_D is

E[ (1/N_1 + 1/N_2)^{-1} (x̄_1 − x̄_2)'(x̄_1 − x̄_2) ]
= (1/N_1 + 1/N_2)^{-1} E( x̄_1'x̄_1 − 2 x̄_1'x̄_2 + x̄_2'x̄_2 )
= (1/N_1 + 1/N_2)^{-1} [ tr Σ/N_1 + µ_1'µ_1 − 2 µ_1'µ_2 + tr Σ/N_2 + µ_2'µ_2 ]
= tr Σ + τ (µ_1 − µ_2)'(µ_1 − µ_2),   τ = N_1 N_2/(N_1 + N_2).

So τ(µ_1 − µ_2)'(µ_1 − µ_2) may be estimated by

(1/N_1 + 1/N_2)^{-1} (x̄_1 − x̄_2)'(x̄_1 − x̄_2) − tr S.

The asymptotic variance of this estimator is 2 tr Σ^2 + o(1). Under the assumptions that lim (p/n) = c, 0 < c < ∞, and λ_max(Σ) = o(p^{1/2}), Bai and Saranadasa [1] showed that, as (n, p) → ∞,

(tr Σ^2)^{-1} [ tr S^2 − (1/n)(tr S)^2 ] →^p 1.   (4.3)

Thus, using the ratio-consistent estimator of tr Σ^2 given in (4.3), Bai and Saranadasa [1] proposed the statistic

T_BS = [ τ (x̄_1 − x̄_2)'(x̄_1 − x̄_2) − tr S ] / { 2 [ tr S^2 − (1/n)(tr S)^2 ] }^{1/2}   (4.4)

for testing the hypothesis that µ_1 = µ_2. They also showed that under the hypothesis T_BS is asymptotically normally distributed with mean zero and variance 1 for the general model described in Section 3, which includes the normal model as a special case. To obtain the asymptotic distribution of T_BS under the alternative hypothesis A : µ_1 ≠ µ_2, it is assumed that the difference between the two mean vectors satisfies the following conditions:

(1/N_1 + 1/N_2)^{-1} (µ_1 − µ_2)' Σ (µ_1 − µ_2) = o(tr Σ^2),   (4.5)

(p/n) → c > 0  and  λ_max(Σ) = o( (tr Σ^2)^{1/2} ).   (4.6)

Under conditions (4.5) and (4.6), Bai and Saranadasa [1] showed that the asymptotic powers of Dempster's test T_D and Bai and Saranadasa's test

T_BS are equal and given by

β(T_D) = β(T_BS) = Φ( −z_{1−α} + τ (µ_1 − µ_2)'(µ_1 − µ_2) / (2 p a_2)^{1/2} ),   τ = N_1 N_2/(N_1 + N_2),   (4.7)

where Φ is the distribution function of the standard normal random variable with mean 0 and variance 1, and Φ(z_{1−α}) = 1 − α, 0 < α < 1. It may be recalled that both tests T_D and T_BS are one-sided tests: reject H : µ_1 = µ_2 if T_D > z_{1−α}; similarly, reject H if T_BS > z_{1−α}.

4.3. Srivastava's T^{+2} test. Using the Moore-Penrose inverse of S, Srivastava [7] proposed the statistic

T^{+2} = (1/N_1 + 1/N_2)^{-1} (x̄_1 − x̄_2)' S^+ (x̄_1 − x̄_2),   (4.8)

where S^+ is the Moore-Penrose inverse of S, which is unique and satisfies the following conditions: S^+ S S^+ = S^+, S S^+ S = S, and S S^+ and S^+ S are symmetric matrices. Let S = V/n, n = N_1 + N_2 − 2. It can be shown that the T^{+2} statistic is invariant under the transformations x_ij → c Γ x_ij, c ≠ 0, Γ Γ' = I_p. Thus, without loss of generality, we may assume that Σ = Λ, a diagonal matrix. Note that

S^+ = n V^+ = n H' L^{-1} H,   H H' = I_n,

where L = diag(l_1, ..., l_n) contains the non-zero eigenvalues of V. Thus

T^{+2} = n (1/N_1 + 1/N_2)^{-1} (x̄_1 − x̄_2)' H' L^{-1} H (x̄_1 − x̄_2).   (4.9)

Let

z = (1/N_1 + 1/N_2)^{-1/2} A H (x̄_1 − x̄_2),  where  A = (H Λ H')^{-1/2}.

Hence, under the hypothesis that µ_1 = µ_2,

T^{+2} = n z' (A L A)^{-1} z,   z ~ N_n(0, I_n).   (4.10)

In order to obtain the asymptotic distribution of T^{+2}, it is assumed that 0 < lim_{p→∞} a_i < ∞, i = 1, ..., 4, where a_i = tr Σ^i/p. Under the above assumption it has been shown in Srivastava [7] that, as p → ∞,

p^{-1} A L A →^p b I_n,   b = a_1^2/a_2.

Thus, under the hypothesis, as p → ∞,

(b p/n) T^{+2} →^d z'z,

where →^d denotes convergence in distribution. Since, under the hypothesis, z'z has a chi-square distribution with n degrees of freedom, denoted by χ^2_n, the statistic (bp/n) T^{+2} is asymptotically distributed as χ^2_n as p → ∞. Although b is unknown, a consistent estimator of b is given by b̂ = â_1^2/â_2 as p and n go to infinity. Thus (b̂p/n) T^{+2} is approximately distributed as χ^2_n, and the hypothesis is rejected if

(b̂p/n) T^{+2} > χ^2_{n,1−α},   (4.11)

where P(χ^2_n < χ^2_{n,1−α}) = 1 − α. Alternatively, we may consider the asymptotic distribution of (b̂p/n) T^{+2} as p → ∞ and then n → ∞. That is, we consider the standardized statistic

T^+_S = [ (b̂p/n) T^{+2} − n ] / (2n)^{1/2},   (4.12)

which is asymptotically distributed as standard normal. Thus, the hypothesis is rejected if T^+_S > z_{1−α}, where Φ(z_{1−α}) = 1 − α. It may be noted that for moderate sample sizes the approximate distribution in (4.11) may give a better approximation than (4.12). To obtain the asymptotic power of the T^+_S test, we consider local alternatives in which

(µ_1 − µ_2) = (τ n)^{−1/2} δ,   (4.13)

where δ is fixed, n = N_1 + N_2 − 2, and τ = N_1 N_2/(N_1 + N_2). The asymptotic power of the T^+_S statistic is given by

β(T^+_S) = Φ[ −z_{1−α} + n^{1/2} τ (µ_1 − µ_2)' Λ (µ_1 − µ_2) / (2^{1/2} p a_2) ].   (4.14)

It may be noted that all three of the above tests, namely T_D, T_BS, and T^{+2} (or equivalently T^+_S), are invariant under the transformations x_ij → a Γ x_ij, a ≠ 0, Γ Γ' = I_p. It has been shown by Bai and Saranadasa [1] that the tests T_D and T_BS have the same asymptotic power, given in (4.7). Comparing it with the power of T^+_S given in (4.14), we find that the test T^+_S may be preferred if

( n/(p a_2) )^{1/2} θ' Λ θ > θ' θ,   θ = µ_1 − µ_2.   (4.15)

For example, if θ ~ N_p(0, Λ), then on average (4.15) implies that

( n/(p a_2) )^{1/2} tr Λ^2 > tr Σ,   that is,   n > (a_1^2/a_2) p = b p.   (4.16)

Since a_1^2 ≤ a_2, many such n exist for large p. Similarly, if θ = (λ_1^{1/2}, ..., λ_p^{1/2})', where λ_1 > · · · > λ_p are the eigenvalues of Σ, the same inequality given in (4.16) is obtained.

4.4. Srivastava-Du test. All three of the above tests are invariant under the transformations x_ij → a Γ x_ij, a ≠ 0, Γ Γ' = I_p, but not under the transformations x_ij → D x_ij, where D = diag(d_1, ..., d_p) is a p × p non-singular diagonal matrix. This implies that a change in the units of measurement will affect all three statistics. Srivastava and Du [9] proposed a statistic that is invariant under transformation by any p × p non-singular diagonal matrix. This test statistic is given by

T_SD = [ (1/N_1 + 1/N_2)^{-1} (x̄_1 − x̄_2)' D_S^{-1} (x̄_1 − x̄_2) − np/(n − 2) ] / { 2 [ tr R̂^2 − p^2/n ] c_{p,n} }^{1/2},

where

R̂ = D_S^{-1/2} S D_S^{-1/2},   D_S = diag(S_11, ..., S_pp),   S = (S_ij),
c_{p,n} = 1 + (tr R̂^2)/p^{3/2} → 1,   as (n, p) → ∞.

Under normality assumptions this test was proposed by Srivastava and Du [9]. It has been shown to be robust by Srivastava [8]. Let

Σ = (σ_ij) = D_σ^{1/2} R D_σ^{1/2},   (4.17)

where D_σ^{1/2} = diag(σ_11^{1/2}, ..., σ_pp^{1/2}). Then R is the correlation matrix, which can be estimated by R̂ defined above. In order to obtain the distribution of T_SD, Srivastava and Du [9] assumed the following:

(i) 0 < lim_{p→∞} (tr R^i)/p < ∞,   i = 1, ..., 4,   (4.18)

(ii) lim_{p→∞} max_{1≤i≤p} λ_ip / p^{1/2} = 0,   (4.19)

where λ_1p, ..., λ_pp are the eigenvalues of the correlation matrix R. Under the hypothesis of equality of the two mean vectors, T_SD has an asymptotically standard

normal distribution. Under the local alternative defined by

µ_1 − µ_2 = (τ n)^{−1/2} δ,   τ = N_1 N_2/(N_1 + N_2),   n = N_1 + N_2 − 2,

where δ is fixed, the asymptotic power of the statistic T_SD is given by

lim_{(n,p)→∞} P(T_SD > z_{1−α}) = Φ[ −z_{1−α} + τ (µ_1 − µ_2)' D_σ^{-1} (µ_1 − µ_2) / (2 tr R^2)^{1/2} ].

It may be noted that Yamada and Srivastava [14] have shown that condition (4.19) is not needed in deriving the asymptotic distribution of the statistic T_SD. Srivastava and Du [9] showed theoretically that under the condition

0 < lim_{p→∞} δ'δ/p = lim_{p→∞} δ' D_σ^{-1} δ / tr(D_σ^{-1}) < ∞,   (4.20)

the T_SD test is superior to the T_BS test. It may be noted that condition (4.20) is satisfied for δ = (δ_1, ..., δ_p)' in which δ_i = δ_0, i = 1, ..., p, that is, when all the components of the mean difference are the same. In the next two subsections, we compare the attained significance level and power of the two robust tests, namely T_D and T_SD, through simulation.

4.5. Attained significance level (ASL). To compare the tests, we need to define the attained significance level and the empirical power. Let z_{1−α} be the 100(1 − α)% quantile of the asymptotic null distribution of the test statistic T, which is N(0, 1) in our case; thus z_{1−α} is the 100(1 − α)% quantile of N(0, 1). With m replications of the data set simulated under the null hypothesis, the ASL is

α̂ = (# of t_H ≥ z_{1−α}) / m,   m = 10,000,   α = 0.05,

where the t_H represent the values of the test statistic T based on the data sets simulated under the null hypothesis. Here m α̂ is approximately distributed as Binomial(10000, 0.05), so the standard deviation of α̂ is estimated by se(α̂) = (0.05 × 0.95/10,000)^{1/2} ≈ 0.0022. For simplicity of presentation, we shall consider the one-sample case, in which we test that the mean vector is zero.

4.6. Empirical power. To compute the empirical powers, we shall use empirical critical points.
Specifically, we first simulate m replications of the data set under the null hypothesis, then select the (mα)-th largest value of the test statistic as the empirical critical point, denoted ẑ_{1−α}; that is, the 100(1 − α)% quantile of the empirical null distribution of the test statistic

obtained from the m replications. Then another m replications of the data set are simulated under the alternative with the given choice of µ. The empirical power is calculated as

β̂ = (# of t_A ≥ ẑ_{1−α}) / m.

Table 1. Attained significance levels of T_SD and T_D under the null hypothesis, when R = I_p and R = R_1, respectively. Columns: p, N, and (T_SD, T_D) for D_σ = I_p, D_σ = D_σ,1, and D_σ = D_σ,2; row blocks for R = I_p and R = R_1. [Table entries not recovered.]
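The ASL and empirical-power recipe of Sections 4.5-4.6 is a generic Monte Carlo loop. The sketch below implements it for an arbitrary user-supplied statistic and samplers (all names are ours, and m is kept small for illustration; the hard-coded critical point corresponds to α = 0.05):

```python
import numpy as np

def asl_and_power(stat, sample_null, sample_alt, m=2000, alpha=0.05):
    """Monte Carlo sketch of Sections 4.5-4.6: ASL against the asymptotic
    N(0,1) critical point, empirical power against the empirical one.
    `stat`, `sample_null`, `sample_alt` are user-supplied callables."""
    z = 1.6448536269514722                       # 95% quantile of N(0,1)
    t_null = np.sort([stat(sample_null()) for _ in range(m)])
    asl = np.mean(t_null >= z)                   # attained significance level
    z_emp = t_null[int(round((1 - alpha) * m))]  # empirical critical point
    t_alt = np.array([stat(sample_alt()) for _ in range(m)])
    power = np.mean(t_alt >= z_emp)              # empirical power
    return asl, power
```

For example, with the one-sample statistic sqrt(N) x̄ for N(θ, 1) data, the ASL should be close to 0.05 and the power close to 1 for a large shift θ.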

Table 2. Empirical powers of T_SD and T_D under the alternative hypothesis, when R = I_p and R = R_1, respectively. Columns: p, N, and (T_SD, T_D) for D_σ = I_p, D_σ = D_σ,1, and D_σ = D_σ,2. [Table entries not recovered.]

4.7. Parameter selection: one-sample case. We consider both the independence structure R = I_p = diag(1, 1, ..., 1) and the equal-correlation structure R = R_1 = (ρ_ij), ρ_ij = 0.25 for i ≠ j. We also consider different scale matrices D_σ = diag(σ_11, ..., σ_pp). We select D_σ = I_p; D_σ = D_σ,1 with σ_11^{1/2}, ..., σ_pp^{1/2} i.i.d. Unif(2, 3); and D_σ = D_σ,2 with σ_11, ..., σ_pp i.i.d. χ^2_3. For the alternative hypothesis, we choose µ = ν = (ν_1, ..., ν_p)' with ν_{2k−1} = 0 and ν_{2k} i.i.d. Unif(−1/2, 1/2), k = 1, ..., p/2.
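Under the same conventions as before (columns of X1, X2 are observations; helper names are ours), the scale-invariant statistic T_SD of Section 4.4 can be sketched as follows. Only the ratios (x̄_1i − x̄_2i)²/S_ii and the sample correlation matrix R̂ enter, which is exactly the diagonal invariance the test is built around:

```python
import numpy as np

def srivastava_du(X1, X2):
    """Sketch of T_SD (Section 4.4): pooled S, D_S = diag(S),
    R_hat = D_S^{-1/2} S D_S^{-1/2}, centering np/(n-2), factor c_{p,n}."""
    p, N1 = X1.shape
    N2 = X2.shape[1]
    n = N1 + N2 - 2
    xbar1, xbar2 = X1.mean(axis=1), X2.mean(axis=1)
    C1, C2 = X1 - xbar1[:, None], X2 - xbar2[:, None]
    S = (C1 @ C1.T + C2 @ C2.T) / n
    d = xbar1 - xbar2
    tau = N1 * N2 / (N1 + N2)
    inv_sqrt = 1.0 / np.sqrt(np.diag(S))
    R_hat = inv_sqrt[:, None] * S * inv_sqrt[None, :]   # sample correlation matrix
    trR2 = np.trace(R_hat @ R_hat)
    c_pn = 1 + trR2 / p ** 1.5                          # c_{p,n} -> 1 as (n,p) -> inf
    num = tau * (d ** 2 / np.diag(S)).sum() - n * p / (n - 2)
    return num / np.sqrt(2 * (trR2 - p ** 2 / n) * c_pn)
```

A quick numerical check of the invariance is to rescale every coordinate by a positive diagonal matrix and verify that the statistic is unchanged.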

5. Two-sample tests: Σ_1 ≠ Σ_2

In this section, we consider the problem of testing the equality of the mean vectors of two groups when the covariance matrices of the two groups are not equal. For normally distributed observation vectors, the equality of the two covariance matrices can be ascertained using a test proposed by Srivastava and Yanagihara [13]. If it is found that the covariance matrices of the two groups are not equal, the tests given in this section should be used. We begin with a test proposed by Chen and Qin [2].

5.1. Chen-Qin test statistic. In the notation of Section 4, this test statistic is given by

T_cq = [ (1/(N_1 n_1)) Σ_{i≠j} x_1i' x_1j + (1/(N_2 n_2)) Σ_{i≠j} x_2i' x_2j − 2 x̄_1' x̄_2 ]
       / [ (2/(N_1 n_1)) tr Σ̂_1^2 + (2/(N_2 n_2)) tr Σ̂_2^2 + (4/(N_1 N_2)) tr Σ̂_1 Σ̂_2 ]^{1/2},

where

tr Σ̂_i^2 = (1/(N_i n_i)) tr Σ_{j≠k} (x_ij − x̄_i(j,k)) x_ij' (x_ik − x̄_i(j,k)) x_ik',   i = 1, 2,

tr Σ̂_1 Σ̂_2 = (1/(N_1 N_2)) tr Σ_{j=1}^{N_1} Σ_{k=1}^{N_2} (x_1j − x̄_1(j)) x_1j' (x_2k − x̄_2(k)) x_2k',

x̄_i(j,k) = (N_i x̄_i − x_ij − x_ik)/(N_i − 2),   i = 1, 2;   j, k = 1, ..., N_i,
x̄_i(k) = (N_i x̄_i − x_ik)/n_i,   i = 1, 2;   k = 1, ..., N_i.

In order to derive the asymptotic distribution of the statistic T_cq, Chen and Qin [2] made the following assumptions.

Assumption (A)

A(1) x_ij = Γ_i z_ij + µ_i, j = 1, ..., N_i, i = 1, 2, where each Γ_i is a p × m matrix for some m ≥ p such that Γ_i Γ_i' = Σ_i, and {z_ij}_{j=1}^{N_i} are m-variate independent and identically distributed random vectors satisfying E(z_ij) = 0, Cov(z_ij) = I_m; and, writing z_ij = (z_ij1, ..., z_ijm)', it is assumed that E(z_ijk^4) = K < ∞ and

E( z_ijl_1^{α_1} z_ijl_2^{α_2} · · · z_ijl_q^{α_q} ) = Π_{r=1}^q E( z_ijl_r^{α_r} ),   Σ_{r=1}^q α_r ≤ 8,   l_1 ≠ l_2 ≠ · · · ≠ l_q,

A(2) lim_{p→∞} tr( Σ_i Σ_j Σ_k Σ_l ) / { tr(Σ_1 + Σ_2)^2 }^2 = 0,   i, j, k, l = 1 or 2,

A(3) N (µ_1 − µ_2)' Σ_i (µ_1 − µ_2) = o{ tr(Σ_1 + Σ_2)^2 }.

Chen and Qin [2] proved the asymptotic normality of T_cq under the hypothesis that µ_1 = µ_2 and under Assumptions A(1)-A(2) given above. They also obtained the asymptotic power under Assumptions A(1)-A(3). It is given by

lim_{(N_1,N_2,p)→∞} P_1( T_cq > z_{1−α} ) = Φ( −z_{1−α} + (µ_1 − µ_2)'(µ_1 − µ_2) / σ_{n,p} ),

where

σ_{n,p}^2 = (2/(N_1 n_1)) tr(Σ_1^2) + (2/(N_2 n_2)) tr(Σ_2^2) + (4/(N_1 N_2)) tr(Σ_1 Σ_2).

It may be noted that σ_{n,p}^2 is the variance of the numerator of T_cq, which can also be estimated using the usual consistent estimators of tr Σ_i^2/p and tr Σ_1 Σ_2/p, i = 1, 2. Most importantly, note that the numerator of T_cq satisfies

(1/(N_1 n_1)) Σ_{i≠j} x_1i' x_1j + (1/(N_2 n_2)) Σ_{i≠j} x_2i' x_2j − 2 x̄_1' x̄_2
= (x̄_1 − x̄_2)'(x̄_1 − x̄_2) − (1/N_1) tr S_1 − (1/N_2) tr S_2,

which is identical to the numerator obtained by generalizing the Bai-Saranadasa test to the case Σ_1 ≠ Σ_2, and much simpler to compute. Thus, Srivastava, Katayama and Kano [11] proposed a simpler test, given in the next subsection, in which a different consistent estimator of the variance of the numerator is used.

5.2. A simpler test than T_cq. The Chen-Qin test T_cq has rather complicated expressions and takes much longer to compute, with no apparent advantage in terms of power over the corresponding simpler test

T_2 = p^{-1/2} [ (x̄_1 − x̄_2)'(x̄_1 − x̄_2) − N_1^{-1} tr S_1 − N_2^{-1} tr S_2 ]
      / [ 2 N_1^{-2} â_21 + 2 N_2^{-2} â_22 + 4 tr(S_1 S_2)/(N_1 N_2 p) ]^{1/2},

where

â_2i = (1/p)[ tr S_i^2 − (1/n_i)(tr S_i)^2 ],   n_i = N_i − 1,   i = 1, 2.

The test T_2 has the same asymptotic distribution as T_cq. The power and ASL of the two tests are indistinguishable, as seen in the attached tables obtained from simulation.
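The simpler statistic T_2 just described involves only the sample means, the traces of S_1, S_2 and their products. A sketch under our usual conventions (columns are observations; names are ours):

```python
import numpy as np

def t2_unequal_cov(X1, X2):
    """Sketch of the statistic T_2 of Section 5.2, the
    Srivastava-Katayama-Kano simplification of T_cq."""
    p, N1 = X1.shape
    N2 = X2.shape[1]
    n1, n2 = N1 - 1, N2 - 1
    xbar1, xbar2 = X1.mean(axis=1), X2.mean(axis=1)
    S1 = (X1 - xbar1[:, None]) @ (X1 - xbar1[:, None]).T / n1
    S2 = (X2 - xbar2[:, None]) @ (X2 - xbar2[:, None]).T / n2
    d = xbar1 - xbar2
    num = (d @ d - np.trace(S1) / N1 - np.trace(S2) / N2) / np.sqrt(p)
    a21 = (np.trace(S1 @ S1) - np.trace(S1) ** 2 / n1) / p
    a22 = (np.trace(S2 @ S2) - np.trace(S2) ** 2 / n2) / p
    var_hat = (2 * a21 / N1 ** 2 + 2 * a22 / N2 ** 2
               + 4 * np.trace(S1 @ S2) / (N1 * N2 * p))
    return num / np.sqrt(var_hat)                 # compare with z_{1-alpha}
```

Unlike T_cq, no leave-two-out means are needed, which is the computational saving mentioned above.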

5.3. Srivastava-Katayama-Kano test. It may be noted that the tests T_cq and T_2 are both invariant under transformation by an orthogonal matrix but not under transformation by a non-singular diagonal matrix. In this subsection we describe a test which is invariant under the latter group. Let

S_i = (s_ijk),   D̂_i = diag(s_i11, ..., s_ipp),   i = 1, 2,
D̂ = N_1^{-1} D̂_1 + N_2^{-1} D̂_2,
R̂ = D̂^{-1/2} ( N_1^{-1} S_1 + N_2^{-1} S_2 ) D̂^{-1/2} = (r̂_ij),
q_n = (x̄_1 − x̄_2)' D̂^{-1} (x̄_1 − x̄_2) − p.

Let σ^2(q_n) = Var(q_n) = 2 tr R^2, where R is the population version of the sample correlation-type matrix R̂. Then the statistic proposed by Srivastava, Katayama and Kano [11] is given by

T_1 = q_n / [ c_{p,n} σ̂^2(q_n) ]^{1/2},   n_i = N_i − 1,   i = 1, 2,

where

c_{p,n} = 1 + (tr R̂^2)/p^{3/2} → 1,   as (n, p) → ∞,

and

σ̂^2(q_n) = 2 [ tr R̂^2 − (1/(n_1 N_1^2)) (tr D̂^{-1} S_1)^2 − (1/(n_2 N_2^2)) (tr D̂^{-1} S_2)^2 ].

It can be shown that under Assumptions (B1)-(B3) stated below, σ̂^2(q_n) is a ratio-consistent estimator of σ^2(q_n), namely σ̂^2(q_n)/σ^2(q_n) →^p 1.

Assumption (B)

(B1) 0 < c_1 < min_{1≤k≤p} σ_ikk ≤ max_{1≤k≤p} σ_ikk < c_2 < ∞ uniformly in p,
(B2) lim_{p→∞} tr R^4/(tr R^2)^2 = 0,
(B3) (N_1/N) → k ∈ (0, 1) as N = N_1 + N_2 → ∞,
(B4) N_m = O(p^δ), δ > 1/2, N_m = min(N_1, N_2),
(B5) (µ_1 − µ_2)' D^{-1/2} R D^{-1/2} (µ_1 − µ_2) = o(tr R^2),

where

R = D^{-1/2} ( N_1^{-1} Σ_1 + N_2^{-1} Σ_2 ) D^{-1/2},   Σ_i = (σ_ijk),   i = 1, 2,
D = N_1^{-1} D_1 + N_2^{-1} D_2,   D_i = diag(σ_i11, ..., σ_ipp).

It can be shown that when µ_1 = µ_2, and under Assumptions (B1)-(B4), we get the following theorem.

Theorem 5.1. Under Assumptions (B1)-(B4), when µ_1 = µ_2,

P_0{ T_1 ≤ z_{1−α} } → Φ(z_{1−α})   as (N_1, N_2, p) → ∞.

Srivastava, Katayama and Kano [11] obtained this result assuming normality of the observation vectors. To obtain the distribution under the alternative hypothesis, we choose the local alternative given in (B5) and obtain the following theorem.

Theorem 5.2. Under Assumption (B), the distribution of the statistic T_1 under the local alternative (B5) is given by

P_1{ T_1 > z_{1−α} } → Φ{ −z_{1−α} + (µ_1 − µ_2)' D^{-1} (µ_1 − µ_2) / (2 tr R^2)^{1/2} }.

5.4. Comparison of the T_cq, T_1 and T_2 tests: simulation. The parameters of the simulation are given by

N_1 = N_2 = N/2,
µ_1 = µ_2 = 0 for the ASL;   µ_1 = 0, µ_2 = (u_1, ..., u_p)', u_i i.i.d. U(1/2, 3/2), for the power,
Σ_1 = diag(d_1, ..., d_p) R_1 diag(d_1, ..., d_p),
Σ_2 = diag(ψ_1, ..., ψ_p) R_2 diag(ψ_1, ..., ψ_p),
d_i = 2 + (p − i + 1)/p,   ψ_i^2 i.i.d. χ^2_3,
R_1 = (r_ij) with r_ij = (−1)^{i+j} (0.2)^{|i−j|^{0.1}},
R_2 = (ρ_ij) with ρ_ij = (−1)^{i+j} (0.4)^{|i−j|^{0.1}}.
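The diagonal-invariant statistic T_1 of Section 5.3 can be sketched from its ingredients q_n, R̂, c_{p,n}, and σ̂²(q_n); as before, the data layout and helper names are ours:

```python
import numpy as np

def skk_t1(X1, X2):
    """Sketch of the Srivastava-Katayama-Kano statistic T_1 (Section 5.3).
    Columns of the p x N_i matrices X1, X2 are observations."""
    p, N1 = X1.shape
    N2 = X2.shape[1]
    n1, n2 = N1 - 1, N2 - 1
    xbar1, xbar2 = X1.mean(axis=1), X2.mean(axis=1)
    S1 = (X1 - xbar1[:, None]) @ (X1 - xbar1[:, None]).T / n1
    S2 = (X2 - xbar2[:, None]) @ (X2 - xbar2[:, None]).T / n2
    Dhat = np.diag(S1) / N1 + np.diag(S2) / N2          # diagonal of D_hat
    q_n = ((xbar1 - xbar2) ** 2 / Dhat).sum() - p
    inv_sqrt = 1.0 / np.sqrt(Dhat)
    R_hat = inv_sqrt[:, None] * (S1 / N1 + S2 / N2) * inv_sqrt[None, :]
    trR2 = np.trace(R_hat @ R_hat)
    c_pn = 1 + trR2 / p ** 1.5
    tr1 = (np.diag(S1) / Dhat).sum()                    # tr(D_hat^{-1} S1)
    tr2 = (np.diag(S2) / Dhat).sum()                    # tr(D_hat^{-1} S2)
    sig2 = 2 * (trR2 - tr1 ** 2 / (n1 * N1 ** 2) - tr2 ** 2 / (n2 * N2 ** 2))
    return q_n / np.sqrt(c_pn * sig2)
```

Note that R̂ has unit diagonal by construction, since diag(S_1/N_1 + S_2/N_2) is exactly D̂.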

Table 3. ASL and power for the normal distribution: 1000 simulations. Columns: p, N_1 = N_2, ASL (T_1, T_cq, T_2), and power (T_1, T_cq, T_2). [Table entries not recovered.]

Table 4. ASL and power for the χ^2_8 distribution: 10,000 simulations. Columns: p, N_1 = N_2, ASL T_1, ASL T_2, power T_1, power T_2. [Table entries not recovered.]

Table 5. ASL and power for the χ^2_32 distribution: 1,000 simulations. Columns: p, N_1 = N_2, ASL T_1, ASL T_2, power T_1, power T_2. [Table entries not recovered.]

6. Multivariate analysis of variance (MANOVA)

The problem of testing the equality of (q + 1) mean vectors, q ≥ 2, is a special case of the multivariate regression model

Y = X Θ + U Λ^{1/2},
Y = (y_1, ..., y_N)' : N × p,   U = (u_1, ..., u_N)' : N × p,   u_i = (u_i1, ..., u_ip)',

where y_1, ..., y_N are independently distributed p-dimensional observation vectors, X : N × k is a matrix of constants of rank k, and Θ : k × p is a matrix of parameters. Rather than assuming normality of the y_i, or equivalently of the u_i, we shall assume that

E(u_i) = 0,   Cov(u_i) = I_p,   E(u_ik^4) = K_4 + 3,   (6.1)
Cov(y_i) = Λ = Λ^{1/2} Λ^{1/2} = (λ_ij).

We also assume that for ν_k ≥ 0 with Σ_{k=1}^p ν_k ≤ 4,

E[ Π_{k=1}^p u_ik^{ν_k} ] = Π_{k=1}^p E( u_ik^{ν_k} ).   (6.2)

Under the assumptions (6.1) and (6.2), we consider the problem of testing the hypothesis H : C Θ = 0 versus A : C Θ ≠ 0, where C is a q × k matrix with q < k and rank(C) = q. Let

B = Y' G Y = (C Θ̂)' [ C (X'X)^{-1} C' ]^{-1} C Θ̂,

where

G = X (X'X)^{-1} C' [ C (X'X)^{-1} C' ]^{-1} C (X'X)^{-1} X',
Θ̂ = (X'X)^{-1} X' Y,
S = n^{-1} Y' (I_N − H) Y,   H = X (X'X)^{-1} X',   n = N − k.

Then, under the hypothesis C Θ = 0, E(B) = q Σ, and thus B measures the departure from the hypothesis. On the other hand, E(S) = Σ, irrespective of whether the hypothesis is true or not. Thus, a test is usually constructed by comparing B with S in some manner. Since n < p, the inverse of S does not exist, and thus the likelihood ratio test, which uses the inverse of S, does not exist. Fujikoshi, Himeno and Wakaki [4] proposed a test by generalizing Dempster's two-sample test, and Srivastava [7] proposed a test using the Moore-Penrose inverse of S. Srivastava and Fujikoshi [10] and Schott [5] proposed tests by generalizing Bai and Saranadasa's two-sample test. These tests are described in the following subsections.

6.1. Generalization of Dempster's test. Assume

(p/n) → c ∈ (0, ∞),   (6.3)

and let

T_D = p^{1/2} [ tr B / tr S − q ],
â_1 = (tr S)/p,   â_2 = p^{-1} [ tr S^2 − (1/n)(tr S)^2 ],
σ̂_D^2 = 2 q â_2 / â_1^2.

Then Fujikoshi, Himeno and Wakaki [4] showed that, under the hypothesis H, assumption (6.1), and normality, (T_D/σ̂_D) → N(0, 1). The distribution under the alternative hypothesis is also given.

6.2. Generalization of the Bai-Saranadasa test. Let

T_2 = p^{-1/2} [ tr B − q tr S ] / (2 q â_2)^{1/2}.   (6.4)

This test was proposed by Srivastava and Fujikoshi [10] and by Schott [5]. Under the hypothesis that C Θ = 0, the asymptotic distribution of T_2 is

normal with mean 0 and variance 1. To obtain the asymptotic distribution when C Θ ≠ 0, we consider local alternatives. For this, let

(C Θ)' [ C (X'X)^{-1} C' ]^{-1} (C Θ) = M M',   (6.5)

where M is a p × q matrix of rank q < n when C Θ ≠ 0. We shall assume that q is finite and that

lim_{p→∞} (tr Λ M M')/p = 0.   (6.6)

Then Srivastava and Fujikoshi [10] have shown that the power of the T_2 test is given by

β(T_2) ≈ Φ( −z_{1−α} + tr M M' / (2 p q a_2)^{1/2} )   (6.7)

for local alternatives satisfying (6.6) and finite q.

6.3. Srivastava's test. Let V = nS, let V^+ be its Moore-Penrose inverse, and let d_1, ..., d_q be the non-zero eigenvalues of B V^+, q < n. Then Srivastava [7] proposed the statistic

U^+ = Π_{i=1}^q (1 + d_i)^{-1} = |I + B V^+|^{-1}.   (6.8)

When Σ = γ^2 I, or when Σ is of rank r ≤ n, Srivastava [7] obtained distributions of chi-square type, similar to the likelihood ratio test. It may be noted that the hypothesis Σ = γ^2 I can be tested by a test proposed by Srivastava [6], which has been shown to be robust under some departures from normality by Srivastava, Kollo, and von Rosen [12]. For general Σ, however, he showed that under the null hypothesis C Θ = 0,

lim_{(n,p)→∞} P[ ( p b̂ log U^{+ −1} − qn ) / (2qn)^{1/2} < z_{1−α} ] = Φ(z_{1−α}).   (6.9)

Srivastava and Fujikoshi [10] considered more general forms of T_D and T_2 and obtained their distributions without condition (6.3), which was required by Schott [5]. Next we give the asymptotic distribution of the statistic U^+ under local alternatives given by

M = n^{−1/2} ∆,   (6.10)

where ∆ is O(1). The asymptotic power of the test based on U^+ is given by

β(U^+) = lim_{n→∞} lim_{p→∞} P[ ( p b̂ log U^{+ −1} − qn ) / (2qn)^{1/2} ≥ z_{1−α} ]
       = lim_{n→∞} lim_{p→∞} Φ[ −z_{1−α} + n tr Λ M M' / ( p a_2 (2qn)^{1/2} ) ].
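The MANOVA ingredients B, S, and the statistic T_2 of (6.4) can be computed directly from Y, X, and C. The sketch below assumes our own names and a design matrix supplied by the caller; it is an illustration, not the authors' code:

```python
import numpy as np

def manova_t2(Y, X, C):
    """Sketch of the generalized Bai-Saranadasa statistic T_2 of (6.4):
    B = (C Theta_hat)'[C(X'X)^{-1}C']^{-1} C Theta_hat, S = Y'(I-H)Y/n."""
    N, p = Y.shape
    k, q = X.shape[1], C.shape[0]
    n = N - k
    XtX_inv = np.linalg.inv(X.T @ X)
    Theta_hat = XtX_inv @ X.T @ Y                      # k x p least-squares estimator
    CT = C @ Theta_hat                                 # q x p
    B = CT.T @ np.linalg.inv(C @ XtX_inv @ C.T) @ CT   # E(B) = q*Sigma under H
    H = X @ XtX_inv @ X.T
    S = Y.T @ (np.eye(N) - H) @ Y / n
    a2_hat = (np.trace(S @ S) - np.trace(S) ** 2 / n) / p
    return (np.trace(B) - q * np.trace(S)) / np.sqrt(p) / np.sqrt(2 * q * a2_hat)
```

For the three-group problem of Section 6.5, X is block-diagonal in the three vectors of ones and C is a 2 x 3 contrast matrix.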

The U+ test should be preferred over T_2 if

n tr(ΛMM')/(p a_2 √(2qn)) > tr(MM')/√(2pq a_2),

that is, if tr(ΛMM') > tr(MM') (p a_2/n)^{1/2}. For example, if M = (m_1, ..., m_q) and m_i = (√λ_1, ..., √λ_p)', i = 1, ..., q, where λ_1, ..., λ_p are the eigenvalues of Σ given in the same order as in the matrix Λ of the eigenvalues of Σ, then the test U+ should be preferred if

n < p(a_1²/a_2) = pb.

Since 0 < b ≤ 1, many such n exist for large p. It may be noted that the U+ test assumes normality and has yet to be generalized to non-normal models such as Model M. Thus, it is not included in the tables of comparison by simulation.

These tests are invariant under the group of orthogonal transformations, but not under the group of p × p non-singular diagonal matrices. A test that is invariant under the latter transformations has been proposed by Yamada and Srivastava [14], assuming normality of the observation vectors. We describe this test in the next subsection.

Invariant test statistics. As mentioned above, the T_D and T_2 tests are not invariant under non-singular diagonal matrices. Thus, Yamada and Srivastava [14] proposed, under normality, the statistic

T_1 = [tr(B D_S^{-1}) − npq/(n − 2)] / [2 c_{p,n} q (tr R² − n^{-1}p²)]^{1/2},  R = D_S^{-1/2} S D_S^{-1/2},

where c_{p,n} = 1 + (tr R²/p^{3/2}) and D_S is the diagonal matrix formed by the diagonal elements of S. The asymptotic distribution under the null hypothesis is standard normal, and under local alternatives similar to (6.6), the asymptotic power of the T_1-test is given by

β(T_1) ≈ Φ[−z_{1−α} + tr(MM'D_Σ^{-1})/√(2q tr R̄²)],

where R̄ = D_Σ^{-1/2} Σ D_Σ^{-1/2} and D_Σ = diag(σ_11, ..., σ_pp), σ_11, ..., σ_pp being the diagonal elements of Σ. For details see Yamada and Srivastava [14], where a theoretical comparison between T_1 and T_2, similar to that of Srivastava and Du [9], is also given. In the next section, we compare the power of the T_1-test with that of the T_2-test by simulation.
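A minimal sketch of T_1, coded directly from the displayed formula, may help here. This is our own implementation, not the authors' code, and all names are ours; the diagonal rescaling at the end illustrates the claimed invariance under non-singular diagonal matrices.

```python
import numpy as np

# Sketch of the scale-invariant statistic T_1 (our own transcription of the
# displayed formula; function and variable names are hypothetical).
def t1_statistic(B, S, n, q):
    p = S.shape[0]
    d2 = np.diag(S)                          # diagonal elements s_ii, i.e. D_S
    R = S / np.sqrt(np.outer(d2, d2))        # R = D_S^{-1/2} S D_S^{-1/2}
    trR2 = np.trace(R @ R)
    c_pn = 1.0 + trR2 / p**1.5               # c_{p,n} = 1 + tr(R^2)/p^{3/2}
    num = np.sum(np.diag(B) / d2) - n * p * q / (n - 2.0)  # tr(B D_S^{-1}) - npq/(n-2)
    den = np.sqrt(2.0 * c_pn * q * (trR2 - p**2 / n))
    return num / den

# Toy one-way MANOVA data with p > N (same construction as in the text).
rng = np.random.default_rng(1)
k, Ni, p, q = 3, 10, 40, 2
N = k * Ni
n = N - k
X = np.kron(np.eye(k), np.ones((Ni, 1)))
C = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]])
Y = rng.standard_normal((N, p))
XtX_inv = np.linalg.inv(X.T @ X)
CT = C @ (XtX_inv @ X.T @ Y)
B = CT.T @ np.linalg.inv(C @ XtX_inv @ C.T) @ CT
S = Y.T @ (np.eye(N) - X @ XtX_inv @ X.T) @ Y / n

t1 = t1_statistic(B, S, n, q)
D = np.diag(rng.uniform(0.5, 2.0, size=p))   # arbitrary diagonal rescaling
t1_scaled = t1_statistic(D @ B @ D, D @ S @ D, n, q)   # unchanged value
```

Rescaling each coordinate of the data by a diagonal matrix D maps B to DBD and S to DSD, and `t1_scaled` equals `t1` up to floating-point error, which is exactly the invariance that T_D and T_2 lack.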

Power comparison by simulation. For the simulation, we consider the problem of testing the equality of 3 mean vectors, that is, k = q + 1 = 3 and q = 2, where N_1 = N_2 = N_3 = N, and the cases (N, p) = (10, 40), (20, 80), (30, 120) and (40, 200) are treated. Note that n = N_1 + N_2 + N_3 − k = 3(N − 1). For testing the equality of the three mean vectors, we write Θ = (µ_1, µ_2, µ_3)' : 3 × p,

C = ( 1 0 −1 ; 0 1 −1 ),  CΘ = ( (µ_1 − µ_3)' ; (µ_2 − µ_3)' ).

The observation matrix is

Y = (y_1^(1), ..., y_N^(1); y_1^(2), ..., y_N^(2); y_1^(3), ..., y_N^(3))',

and the design matrix is X = diag(1_N, 1_N, 1_N) : 3N × 3, where 1_N = (1, ..., 1)' : N × 1, so that the total number of observations is 3N. For the hypothesis, without loss of generality, we choose µ_1 = µ_2 = µ_3 = 0. For the alternative hypothesis, we choose

µ_1 = 0,  µ_2 = 3n^{-1/2}p^{-1/4} 1_p,  µ_3 = −µ_2.

To generate the Y matrix from a non-normal distribution, we generate 3N × p i.i.d. random variables u_ij from three kinds of chi-square distributions, namely χ²_2, χ²_8 and χ²_32 with 2, 8 and 32 degrees of freedom, respectively, and centre and scale them as ν_ij = (u_ij − m)/√(2m) for u_ij ~ χ²_m, m = 2, 8, 32. Since the skewness and kurtosis (K_4 + 3) of χ²_m are, respectively, (8/m)^{1/2} and 3 + 12/m, it is noted that χ²_2 has higher skewness and kurtosis than χ²_8 and χ²_32. Write the standardized variables as

V = (ν_1^(1), ..., ν_N^(1); ν_1^(2), ..., ν_N^(2); ν_1^(3), ..., ν_N^(3))',

where the ν_j^(i) are p-vectors, j = 1, ..., N, i = 1, 2, 3. For the covariance matrix, we consider two cases:

Case 1: Σ = I_p,  Case 2: Σ = D_a = diag(a_1², ..., a_p²),

where the a_i are i.i.d. chi-square random variables with 3 degrees of freedom. For Case 1 we define

Y = V + X(0, µ_2, µ_3)',

where under the hypothesis Y = V, and under the alternative µ_2 and µ_3 are replaced by the vectors mentioned above. For Case 2, let

Y = V D_a^{1/2} + X(0, µ_2, µ_3)',

where under the hypothesis Y = V D_a^{1/2}, and under the alternative µ_2 and µ_3 are replaced by the vectors mentioned above. The simulation results under the χ²_m distributions for m = 2, 8 and 32 are presented in Tables 6, 7 and 8, respectively. The critical values are computed from 100,000 replications and the powers from 10,000 replications. The three tables report the critical values and the power under the hypothesis of the two tests, and it is seen that the critical values are appropriate.

Table 6. Critical values and powers of the tests T_1 and T_2 in the case of the χ²_2-distribution with skewness 2 and kurtosis 9. (Columns: N, p; critical value, power in H and power in A, each for T_1 and T_2; rows for Case 1 and Case 2. The numerical entries have not been recovered.)

Table 7. Critical values and powers of the tests T_1 and T_2 in the case of the χ²_8-distribution with skewness 1 and kurtosis 4.5. (Same layout as Table 6; numerical entries not recovered.)
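The data-generating scheme of this section can be sketched as follows. This is our own code, not the authors'; `generate_Y` and its arguments are hypothetical names, and the standardization follows ν_ij = (u_ij − m)/√(2m).

```python
import numpy as np

# Sketch of the simulation's data generation (our own code): standardized
# chi-square errors with m degrees of freedom, Case 1 (Sigma = I_p) and
# Case 2 (Sigma = D_a = diag(a_1^2, ..., a_p^2) with a_i i.i.d. chi^2_3).
def generate_Y(N, p, m, case, rng, alternative=False):
    k = 3
    n = 3 * (N - 1)                            # n = N1 + N2 + N3 - k
    U = rng.chisquare(m, size=(k * N, p))      # u_ij ~ chi^2_m
    V = (U - m) / np.sqrt(2.0 * m)             # nu_ij: mean 0, variance 1
    if case == 2:
        a = rng.chisquare(3, size=p)           # a_i i.i.d. chi^2_3
        V = V * a                              # V D_a^{1/2}, D_a^{1/2} = diag(a_i)
    if not alternative:
        return V                               # under H: Y = V (or V D_a^{1/2})
    X = np.kron(np.eye(k), np.ones((N, 1)))    # design matrix, 3N x 3
    mu2 = 3.0 / (np.sqrt(n) * p**0.25) * np.ones(p)
    Theta = np.vstack([np.zeros(p), mu2, -mu2])  # mu1 = 0, mu3 = -mu2
    return V + X @ Theta

rng = np.random.default_rng(2)
Y0 = generate_Y(10, 40, 8, 1, rng)                     # null, Case 1
Y1 = generate_Y(10, 40, 2, 2, rng, alternative=True)   # alternative, Case 2
```

Feeding such replicates through the T_1 and T_2 statistics, with critical values estimated from null replicates, would reproduce the design of Tables 6 to 8.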

Table 8. Critical values and powers of the tests T_1 and T_2 in the case of the χ²_32-distribution with skewness 0.5 and kurtosis 3.375. (Same layout as Table 6; numerical entries not recovered.)

As reported in the tables, the two tests perform similarly in Case 1, but the proposed test T_1 has much higher power than T_2 in Case 2. For the χ²_2-distribution, which has higher skewness and kurtosis, T_1 has slightly higher power than T_2 in Case 1. Clearly, when Σ = I_p, all the components have the same unit of measurement and hence both tests perform equally well; but when the units of measurement are not the same, as in Case 2, the proposed test performs much better than the test based on T_2.

Concluding remarks

In this article we reviewed several tests for the equality of two mean vectors, including the case when the covariance matrices of the two groups may be unequal. The asymptotic distributions are given under non-normal models; thus, the tests are robust against departures from normality. In MANOVA, we assume that the covariance matrices are all equal. We have shown through simulation that the tests that are invariant under non-singular diagonal matrices perform better than those that are not.

Acknowledgements

I would like to thank a referee and the editor handling this paper for their suggestions, which led to a great improvement in the presentation. The research for this paper was supported by NSERC of Canada.

References

[1] Z. Bai and H. Saranadasa, Effect of high dimension: by an example of a two sample problem, Statist. Sinica 6 (1996).
[2] S. X. Chen and Y. L. Qin, A two-sample test for high-dimensional data with applications to gene-set testing, Ann. Statist. 38 (2010).
[3] A. P. Dempster, A high dimensional two-sample significance test, Ann. Math. Statist. 29 (1958).
[4] Y. Fujikoshi, T. Himeno, and H. Wakaki, Asymptotic results of a high dimensional MANOVA test and power comparison when the dimension is large compared to the sample size, J. Japan Statist. Soc. 34 (2004).
[5] J. R. Schott, Some high-dimensional tests for a one-way MANOVA, J. Multivariate Anal. 98 (2007).
[6] M. S. Srivastava, Some tests concerning the covariance matrix in high-dimensional data, J. Japan Statist. Soc. 35 (2005).
[7] M. S. Srivastava, Multivariate theory for analysing high dimensional data, J. Japan Statist. Soc. 37 (2007).
[8] M. S. Srivastava, A test of the mean vector with fewer observations than the dimension under non-normality, J. Multivariate Anal. 100 (2009).
[9] M. S. Srivastava and M. Du, A test for the mean vector with fewer observations than the dimension, J. Multivariate Anal. 99 (2008).
[10] M. S. Srivastava and Y. Fujikoshi, Multivariate analysis of variance with fewer observations than the dimension, J. Multivariate Anal. 97 (2006).
[11] M. S. Srivastava, S. Katayama, and Y. Kano, A two sample test in high dimension with fewer observations than the dimension, J. Multivariate Anal. 114 (2013).
[12] M. S. Srivastava, T. Kollo, and D. von Rosen, Some tests for the covariance matrix with fewer observations than the dimension under non-normality, J. Multivariate Anal. 102 (2011).
[13] M. S. Srivastava and H. Yanagihara, Testing the equality of several covariance matrices with fewer observations than the dimension, J. Multivariate Anal. 101 (2010).
[14] T. Yamada and M. S. Srivastava, A test for the multivariate analysis of variance in high-dimension, Commun. Statist. Theory Methods 41 (2012), no. 13–14.

Department of Statistical Sciences, University of Toronto, 100 St George Street, Toronto, Canada
E-mail address: srivasta@utstat.toronto.edu


More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

1 Data Arrays and Decompositions

1 Data Arrays and Decompositions 1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is

More information

arxiv: v2 [math.st] 24 Jul 2016

arxiv: v2 [math.st] 24 Jul 2016 SCIENCE CHINA Mathematics. ARTICLES. January 05 Vol.55 No.: XX doi: 0.007/s45-000-0000-0 In memory of 0 years since the passing of Professor Xiru Chen - a great Chinese statistician arxiv:603.0003v [math.st]

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

Consistency of Distance-based Criterion for Selection of Variables in High-dimensional Two-Group Discriminant Analysis

Consistency of Distance-based Criterion for Selection of Variables in High-dimensional Two-Group Discriminant Analysis Consistency of Distance-based Criterion for Selection of Variables in High-dimensional Two-Group Discriminant Analysis Tetsuro Sakurai and Yasunori Fujikoshi Center of General Education, Tokyo University

More information

ASYMPTOTIC RESULTS OF A HIGH DIMENSIONAL MANOVA TEST AND POWER COMPARISON WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE

ASYMPTOTIC RESULTS OF A HIGH DIMENSIONAL MANOVA TEST AND POWER COMPARISON WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE J Jaan Statist Soc Vol 34 No 2004 9 26 ASYMPTOTIC RESULTS OF A HIGH DIMENSIONAL MANOVA TEST AND POWER COMPARISON WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE Yasunori Fujikoshi*, Tetsuto Himeno

More information

General Linear Test of a General Linear Hypothesis. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 35

General Linear Test of a General Linear Hypothesis. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 35 General Linear Test of a General Linear Hypothesis Copyright c 2012 Dan Nettleton (Iowa State University) Statistics 611 1 / 35 Suppose the NTGMM holds so that y = Xβ + ε, where ε N(0, σ 2 I). opyright

More information

STA 437: Applied Multivariate Statistics

STA 437: Applied Multivariate Statistics Al Nosedal. University of Toronto. Winter 2015 1 Chapter 5. Tests on One or Two Mean Vectors If you can t explain it simply, you don t understand it well enough Albert Einstein. Definition Chapter 5. Tests

More information

Multivariate Regression

Multivariate Regression Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the

More information

CHANGE DETECTION IN TIME SERIES

CHANGE DETECTION IN TIME SERIES CHANGE DETECTION IN TIME SERIES Edit Gombay TIES - 2008 University of British Columbia, Kelowna June 8-13, 2008 Outline Introduction Results Examples References Introduction sunspot.year 0 50 100 150 1700

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Statistics Ph.D. Qualifying Exam: Part I October 18, 2003

Statistics Ph.D. Qualifying Exam: Part I October 18, 2003 Statistics Ph.D. Qualifying Exam: Part I October 18, 2003 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. 1 2 3 4 5 6 7 8 9 10 11 12 2. Write your answer

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

MLES & Multivariate Normal Theory

MLES & Multivariate Normal Theory Merlise Clyde September 6, 2016 Outline Expectations of Quadratic Forms Distribution Linear Transformations Distribution of estimates under normality Properties of MLE s Recap Ŷ = ˆµ is an unbiased estimate

More information

EPMC Estimation in Discriminant Analysis when the Dimension and Sample Sizes are Large

EPMC Estimation in Discriminant Analysis when the Dimension and Sample Sizes are Large EPMC Estimation in Discriminant Analysis when the Dimension and Sample Sizes are Large Tetsuji Tonda 1 Tomoyuki Nakagawa and Hirofumi Wakaki Last modified: March 30 016 1 Faculty of Management and Information

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

Large-Scale Multiple Testing of Correlations

Large-Scale Multiple Testing of Correlations Large-Scale Multiple Testing of Correlations T. Tony Cai and Weidong Liu Abstract Multiple testing of correlations arises in many applications including gene coexpression network analysis and brain connectivity

More information

Finite Singular Multivariate Gaussian Mixture

Finite Singular Multivariate Gaussian Mixture 21/06/2016 Plan 1 Basic definitions Singular Multivariate Normal Distribution 2 3 Plan Singular Multivariate Normal Distribution 1 Basic definitions Singular Multivariate Normal Distribution 2 3 Multivariate

More information

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B Simple Linear Regression 35 Problems 1 Consider a set of data (x i, y i ), i =1, 2,,n, and the following two regression models: y i = β 0 + β 1 x i + ε, (i =1, 2,,n), Model A y i = γ 0 + γ 1 x i + γ 2

More information

Estimation of Variances and Covariances

Estimation of Variances and Covariances Estimation of Variances and Covariances Variables and Distributions Random variables are samples from a population with a given set of population parameters Random variables can be discrete, having a limited

More information

Lecture 5: Hypothesis tests for more than one sample

Lecture 5: Hypothesis tests for more than one sample 1/23 Lecture 5: Hypothesis tests for more than one sample Måns Thulin Department of Mathematics, Uppsala University thulin@math.uu.se Multivariate Methods 8/4 2011 2/23 Outline Paired comparisons Repeated

More information

STATISTICS SYLLABUS UNIT I

STATISTICS SYLLABUS UNIT I STATISTICS SYLLABUS UNIT I (Probability Theory) Definition Classical and axiomatic approaches.laws of total and compound probability, conditional probability, Bayes Theorem. Random variable and its distribution

More information

STT 843 Key to Homework 1 Spring 2018

STT 843 Key to Homework 1 Spring 2018 STT 843 Key to Homework Spring 208 Due date: Feb 4, 208 42 (a Because σ = 2, σ 22 = and ρ 2 = 05, we have σ 2 = ρ 2 σ σ22 = 2/2 Then, the mean and covariance of the bivariate normal is µ = ( 0 2 and Σ

More information

STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method.

STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. Rebecca Barter May 5, 2015 Linear Regression Review Linear Regression Review

More information

Multivariate-sign-based high-dimensional tests for sphericity

Multivariate-sign-based high-dimensional tests for sphericity Biometrika (2013, xx, x, pp. 1 8 C 2012 Biometrika Trust Printed in Great Britain Multivariate-sign-based high-dimensional tests for sphericity BY CHANGLIANG ZOU, LIUHUA PENG, LONG FENG AND ZHAOJUN WANG

More information

THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay. Solutions to Final Exam

THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay. Solutions to Final Exam THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay Solutions to Final Exam 1. (13 pts) Consider the monthly log returns, in percentages, of five

More information

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)

More information

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA Hypothesis Testing Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA An Example Mardia et al. (979, p. ) reprint data from Frets (9) giving the length and breadth (in

More information

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Takeshi Emura and Hisayuki Tsukuma Abstract For testing the regression parameter in multivariate

More information