SHOTA KATAYAMA AND YUTAKA KANO. Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka, Japan


A New Test on High-Dimensional Mean Vector Without Any Assumption on Population Covariance Matrix

SHOTA KATAYAMA AND YUTAKA KANO

Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka, Japan

Abstract

We propose a new test for the equality of the mean vectors of two groups with the same number of observations in high-dimensional data. Existing tests for this problem require a strong condition on the population covariance matrix; the test proposed in this paper requires no such condition. The test is obtained in a general model, that is, the data need not be normally distributed.

1 Introduction

Advances in information technology and database systems make it possible to collect and store high-dimensional data, which have fewer observations than, or as many observations as, the dimension. High-dimensional data appear in various fields, such as DNA microarray data analysis and marketing data analysis. For such data, however, traditional multivariate analysis is often not applicable.

In this paper, we propose a new test for the equality of two mean vectors based on n observations (X_p^{(i1)}, X_p^{(i2)}), i = 1, ..., n, where the X_p^{(ij)} are p-dimensional random vectors. Within group j = 1, 2, the X_p^{(ij)} (i = 1, ..., n) are independent, while the two groups may be dependent. In a typical situation, we may regard X_p^{(i1)} and X_p^{(i2)} as two measurements on the same subject i before and after a treatment. Let X_p^{(i)} = X_p^{(i1)} − X_p^{(i2)}, μ_p = E(X_p^{(i)}) and Σ_p = Var(X_p^{(i)}); then the problem reduces to a one-sample problem. We wish to test the hypothesis

H_0: μ_p = 0 versus H_1: μ_p ≠ 0.   (1.1)

A traditional method for testing the null hypothesis is Hotelling's T^2 test. However, Hotelling's T^2 test is not applicable when n ≤ p since the inverse of the

sample covariance matrix does not exist. Bai and Saranadasa (1996) propose a test for the two-sample problem with equal population covariance matrices. For the one-sample problem in (1.1), their test is based on the statistic

F_{n,p} = X̄_{n,p}^T X̄_{n,p} − (1/n) tr S_{n,p},   (1.2)

where X̄_{n,p} is the sample mean vector and S_{n,p} is the sample covariance matrix. More recently, Chen and Qin (2010) propose a test based on the statistic

G_{n,p} = {1/(n(n−1))} Σ_{i≠j} X_p^{(i)T} X_p^{(j)}   (1.3)

for the one-sample problem in (1.1). They also propose a test for the two-sample problem with unequal population covariance matrices. The asymptotic normality of their test statistic is obtained under the null hypothesis H_0 in (1.1) and the condition trΣ_p^4 = o{(trΣ_p^2)^2} as the sample size n and the dimension p tend to infinity. It is easily shown that F_{n,p} = G_{n,p}. Hence, for the one-sample problem in (1.1), the asymptotic normality of Bai and Saranadasa's test statistic also holds under the condition trΣ_p^4 = o{(trΣ_p^2)^2}, which is weaker than their original one.

However, this condition is still restrictive. Indeed, it excludes some typical situations, such as Σ_p = (1 − ρ)I_p + ρ 1_p 1_p^T with ρ ∈ (0, 1), where I_p is the p × p identity matrix and 1_p is the p-vector with all entries one, or the case where the maximum eigenvalue of Σ_p has order larger than p^{1/2}. Moreover, because of the high dimensionality, it is hard to assess from the data whether the condition is satisfied. When it is not, the population covariance matrix has an enormous effect on the asymptotic null distributions of the two test statistics. Katayama et al. (2010) show that the type of the asymptotic null distribution of Bai and Saranadasa's test statistic depends on the population covariance matrix, and hence so does that of Chen and Qin's test statistic.

To illustrate this phenomenon, we give a simple example. Suppose temporarily that the X_p^{(i)} are independently and identically distributed (i.i.d.) as a p-dimensional multivariate normal distribution with mean vector 0_p and covariance matrix Σ_p = (1 − ρ_p)I_p + ρ_p 1_p 1_p^T, ρ_p = p^{−α}, α ∈ (0, 1). Here, 0_p denotes the p-vector with all entries zero. Note that the condition is satisfied only when 1/2 < α < 1. Since the two test statistics are equivalent, we consider Chen

and Qin's test statistic only. We define the standardized version of the test statistic as

T = G_{n,p} / {2 trΣ_p^2 / (n(n−1))}^{1/2}.

The distributions of T are plotted for (n, p) = (80, 300), (80, 500), based on 10,000 simulations, in Figure 1. The figure shows that the type of the asymptotic null distribution of T depends on α, and hence on Σ_p. When the condition is not satisfied, i.e., 0 < α ≤ 1/2, the asymptotic distribution of T is clearly not normal for the two cases considered. Indeed, the asymptotic distribution is a standardized chi-square distribution with 1 degree of freedom for 0 < α < 1/2, and the convolution of normal and chi-square distributions for α = 1/2. See Katayama et al. (2010) for more details. Hence the type I error of Chen and Qin's test would be asymptotically incorrect.

Figure 1: Distributions of T, when n = 80 and p = 300, 500, based on 10,000 Monte Carlo simulations. Each panel shows the estimated density of T for α = 0.1, 0.3, 0.5, 0.7, 0.9 together with the standard normal density.

Motivated by these findings, we construct in this paper a test statistic which is asymptotically normal under H_0 without any assumption on the population covariance matrix. Hence, the type I error of the proposed test is always asymptotically correct when we use the percentile point of the normal

distribution. The asymptotic normality is obtained without assuming that the random vectors are i.i.d. as a multivariate normal distribution.

The paper is organized as follows. In Section 2, we introduce the proposed test and obtain its asymptotic normality. Several simulations and a real data example are given in Section 3. All proofs are given in Section 4.

2 Proposed Test

In this section, we propose a test statistic for (1.1). The asymptotic normality of the test statistic is obtained under H_0 without any assumption on the population covariance matrix. Like Srivastava (2009), we assume

X_p^{(i)} = μ_p + C_p Z_p^{(i)}, i = 1, ..., n,   (2.1)

where C_p is a p × p matrix with Σ_p = C_p C_p^T and Z_p^{(i)} = (z_{i1}, ..., z_{ip})^T. Assume that the z_{ij} are independent random variables which satisfy

E(z_{ij}) = 0, E(z_{ij}^2) = 1, E(z_{ij}^4) = γ < ∞.

For the high-dimensional inference, we assume the following condition:

Condition A. n = n(p) → ∞ as p → ∞.

Now we propose the test statistic

T_{n,p} = tr(X_{n,p}^T M_n X_{n,p}),   (2.2)

where X_{n,p} = (X_p^{(1)}, ..., X_p^{(n)})^T is the data matrix and M_n is a known n × n symmetric matrix which satisfies the following condition:

Condition B. M_n = (m_ij) with m_ij ≥ 0 and m_ii = 0, 1_n^T M_n 1_n > 0, and trM_n^4 = o{(trM_n^2)^2}.

Since m_ii = 0, the proposed test statistic can also be written as

T_{n,p} = Σ_{i≠j} m_ij X_p^{(i)T} X_p^{(j)}.

The mean and variance of T_{n,p} are given by

E(T_{n,p}) = (1_n^T M_n 1_n) μ_p^T μ_p,   (2.3)

Var(T_{n,p}) = 2 trM_n^2 trΣ_p^2 + 4 (1_n^T M_n^2 1_n) μ_p^T Σ_p μ_p.   (2.4)
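For concreteness, the equality of the trace form (2.2) and the double-sum form can be checked numerically. The following sketch is ours, not from the paper (the function name T_stat and the random test matrices are hypothetical); it computes T_{n,p} from the n × p data matrix without ever forming a p × p matrix:

```python
import numpy as np

def T_stat(X, M):
    """T_{n,p} = tr(X^T M X) for an n x p data matrix X.  Since m_ii = 0,
    this equals sum_{i != j} m_ij x_i^T x_j.  Computed as sum((M @ X) * X),
    which avoids forming the p x p matrix X^T M X."""
    return float(((M @ X) * X).sum())

rng = np.random.default_rng(0)
n, p = 6, 10
X = rng.standard_normal((n, p))
M = rng.random((n, n))          # nonnegative entries, as Condition B requires
M = (M + M.T) / 2               # symmetric ...
np.fill_diagonal(M, 0.0)        # ... with zero diagonal

# Brute-force double sum over i != j for comparison.
double_sum = sum(M[i, j] * X[i] @ X[j]
                 for i in range(n) for j in range(n) if i != j)
```

Computing (M @ X) * X costs O(n^2 p), which matters once p is in the thousands, as in the microarray example of Section 3.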

The derivation of (2.4) is a little complicated, while equation (2.3) is easily obtained. We give the derivation of (2.4) in Section 4. Since 1_n^T M_n 1_n > 0 under Condition B, T_{n,p} is a valid statistic for testing the null hypothesis in (1.1). In view of trM_n^2 > 0, we also note that Var(T_{n,p}) > 0 for any population mean vector μ_p. For the asymptotic normality of T_{n,p} under the null hypothesis H_0 in (1.1), we have the following theorem.

Theorem 2.1 Under the null hypothesis H_0 and Conditions A and B, we have

T̃_{n,p} = T_{n,p} / (2 trM_n^2 trΣ_p^2)^{1/2} →_d N(0, 1), p → ∞,

where →_d denotes convergence in distribution and N(0, 1) denotes the standard normal distribution.

Since trΣ_p^2 is generally unknown, we need to estimate it. Define

U_2 = (1/P_n^2) Σ_{i≠j} (X_p^{(i)T} X_p^{(j)})^2,

U_3 = (1/P_n^3) Σ_{i≠j≠k} X_p^{(i)T} X_p^{(j)} X_p^{(i)T} X_p^{(k)},

U_4 = (1/P_n^4) Σ_{i≠j≠k≠l} X_p^{(i)T} X_p^{(j)} X_p^{(k)T} X_p^{(l)},

where P_n^r = n!/(n − r)! and the summations run over mutually distinct indices. Following Chen et al. (2010), an unbiased estimator \widehat{trΣ_p^2} of trΣ_p^2 is given by

\widehat{trΣ_p^2} = U_2 − 2U_3 + U_4.   (2.5)

In addition to unbiasedness, the estimator \widehat{trΣ_p^2} is ratio-consistent, as the following theorem shows:

Theorem 2.2 Under Condition A, we have

\widehat{trΣ_p^2} / trΣ_p^2 →_P 1, p → ∞,   (2.6)

where →_P denotes convergence in probability.
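The estimator (2.5) can be evaluated directly by summing over mutually distinct indices. The sketch below is ours (the name tr_sigma2_hat is hypothetical); it does this naively for small n, and also illustrates that the estimator is invariant under the location shift X_i → X_i + c_p, a fact used later in the proof of Theorem 2.2:

```python
import numpy as np
from itertools import permutations
from math import perm

def tr_sigma2_hat(X):
    """Direct evaluation of (2.5): U_2 - 2 U_3 + U_4, summing over mutually
    distinct indices (O(n^4); fine for small n)."""
    n = X.shape[0]
    A = X @ X.T                                            # a_ij = x_i^T x_j
    U2 = sum(A[i, j] ** 2 for i, j in permutations(range(n), 2)) / perm(n, 2)
    U3 = sum(A[i, j] * A[i, k]
             for i, j, k in permutations(range(n), 3)) / perm(n, 3)
    U4 = sum(A[i, j] * A[k, l]
             for i, j, k, l in permutations(range(n), 4)) / perm(n, 4)
    return U2 - 2.0 * U3 + U4

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))
est = tr_sigma2_hat(X)
# Location invariance: shifting every observation by the same constant
# vector c_p (here c_p = 3 * 1_p) leaves the estimator unchanged.
est_shifted = tr_sigma2_hat(X + 3.0)
```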

From Theorems 2.1 and 2.2, under H_0 and Conditions A and B, we have

T_{n,p} / (2 trM_n^2 \widehat{trΣ_p^2})^{1/2} →_d N(0, 1), p → ∞.

This result does not require any condition on the population covariance matrix. The asymptotic normality is attained by using a known matrix M_n which satisfies Condition B. For instance, M_n may be given by

(i) M_n = (ρ^{|i−j|} I(|i−j| > 0)) with ρ ∈ (0, 1);
(ii) M_n = (|i−j|^{−s} I(|i−j| > 0)) with s > 1;
(iii) M_n = (m_{|i−j|} I(|i−j| > 0)) with lim_{n→∞} Σ_{k=1}^{n−1} m_k < ∞,

where I(E) for an event E equals 1 when E is true and 0 when E is false. The matrices (i) and (ii) are special cases of (iii). We apply the Szegö theorem (see, e.g., Grenander and Szegö, 1958) to show that (iii) meets Condition B.

For the asymptotic power of T_{n,p}, we assume the following condition:

Condition C. (1_n^T M_n^2 1_n) μ_p^T Σ_p μ_p = o(trM_n^2 trΣ_p^2).

The following theorem establishes the asymptotic power of T_{n,p} under the local alternative defined by Condition C.

Theorem 2.3 Under the alternative H_1 and Conditions A–C, we have

P(T̃_{n,p} > z_{1−α} | H_1) − Φ( −z_{1−α} + (1_n^T M_n 1_n) μ_p^T μ_p / (2 trM_n^2 trΣ_p^2)^{1/2} ) → 0, p → ∞,

where Φ denotes the distribution function of the standard normal distribution, z_{1−α} is its upper α-percentile, and P(· | H_1) denotes the probability under the alternative H_1.

From Theorem 2.3, the asymptotic power of T_{n,p} depends on λ_n = 1_n^T M_n 1_n / (trM_n^2)^{1/2}. Hence, we expect that selecting a matrix M_n with larger λ_n leads to larger power of T_{n,p}. Obviously, the matrix which maximizes λ_n is c 1_n 1_n^T, where c is any constant. However, this matrix satisfies neither m_ii = 0 nor trM_n^4 = o{(trM_n^2)^2} in Condition B. We need to maximize λ_n over the matrices which satisfy Condition B. Unfortunately, it is hard to find such a matrix even in case (iii), since the

number of parameters we need to determine to maximize λ_n increases with the sample size. For this reason, we shall compare T_{n,p} using (i) and (ii) numerically in the next section.

It should be mentioned that when we use the matrix M_n which maximizes λ_n subject only to m_ii = 0 in Condition B, the proposed statistic T_{n,p} is equivalent to the statistic G_{n,p} given in (1.3), and hence to F_{n,p} given in (1.2). Thus the power of the proposed test would be lower than that of those two tests. In this view, we should use the existing tests when Σ_p is known and the asymptotic distribution can be specified using the result of Katayama et al. (2010). Nevertheless, the proposed test is useful for obtaining the correct type I error when we have no information about Σ_p.

In the rest of this section, we consider the consistency of the proposed test. Assume that the matrix M_n is given by (iii). Then we have

1_n^T M_n 1_n = 2 Σ_{k=1}^{n−1} (n − k) m_k = O(n)

and

trM_n^2 = 2 Σ_{k=1}^{n−1} (n − k) m_k^2 = O(n),

since lim_{n→∞} Σ_{k=1}^{n−1} m_k < ∞ and Σ_{k=1}^{n−1} m_k^2 ≤ (Σ_{k=1}^{n−1} m_k)^2. Hence

1_n^T M_n 1_n / (trM_n^2)^{1/2} = O(√n).   (2.7)

Theorem 2.3 and (2.7) yield the following theorem:

Theorem 2.4 Assume Conditions A–C and that M_n has the form given by (iii). Let δ_{n,p} = n^{1/2} μ_p^T μ_p / (trΣ_p^2)^{1/2}. Under the alternative H_1, if δ_{n,p} → ∞ as p → ∞, then we have

P(T̃_{n,p} > z_{1−α} | H_1) → 1,

where P(· | H_1) denotes the probability under the alternative H_1.

3 Numerical Results

In this section, we compare the performance of the proposed test statistic T_{n,p} with that of the existing statistic T by computing the empirical significance level and power. The performance on a DNA microarray data set is also evaluated.
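The quantities appearing in Condition B and in the power discussion of Section 2 are easy to compute for the example matrices (i) and (ii). The following sketch is ours, not from the paper (helper names are hypothetical); it builds M_n of form (iii) and evaluates trM_n^4/(trM_n^2)^2, which Condition B requires to vanish, and λ_n = 1_n^T M_n 1_n/(trM_n^2)^{1/2}, which by (2.7) grows like √n:

```python
import numpy as np

def toeplitz_M(n, weights):
    """M_n of form (iii): m_ij = w_{|i-j|} off the diagonal, m_ii = 0."""
    w = np.concatenate(([0.0], np.asarray(weights, dtype=float)))
    return w[np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])]

def cond_b_ratio(M):
    """tr M_n^4 / (tr M_n^2)^2, which Condition B requires to go to 0."""
    M2 = M @ M
    return np.trace(M2 @ M2) / np.trace(M2) ** 2

def lambda_n(M):
    """lambda_n = 1_n^T M_n 1_n / (tr M_n^2)^{1/2}, which drives the power."""
    return M.sum() / np.sqrt(np.trace(M @ M))

ratio, lam = {}, {}
for n in (50, 200):
    k = np.arange(1, n)
    for name, w in (("i", 0.5 ** k),         # example (i) with rho = 0.5
                    ("ii", 1.0 / k ** 2)):   # example (ii) with s = 2
        M = toeplitz_M(n, w)
        ratio[(name, n)] = cond_b_ratio(M)
        lam[(name, n)] = lambda_n(M)
```

For both examples the Condition B ratio behaves like a constant over n, while λ_n roughly doubles when n is quadrupled, in line with the O(√n) rate in (2.7).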

3.1 Computational Complexity of \widehat{trΣ_p^2}

Before conducting the simulations and the DNA microarray data analysis, we need to mention the computational complexity of \widehat{trΣ_p^2}. A large amount of time is required to calculate it directly, since it includes a summation over sets of up to four distinct indices, i.e., {(i, j, k, l) : 1 ≤ i, j, k, l ≤ n}. Let a_ij = X_p^{(i)T} X_p^{(j)} for simplicity. Then it follows that

P_n^3 U_3 = Σ_{i≠j≠k} a_ij a_ik = Σ_{i=1}^n (Σ_{j=1}^n a_ij)^2 − Σ_{i,j=1}^n a_ij^2 − 2 Σ_{i=1}^n a_ii Σ_{j=1}^n a_ij + 2 Σ_{i=1}^n a_ii^2   (3.1)

and

P_n^4 U_4 = Σ_{i≠j≠k≠l} a_ij a_kl = (Σ_{i,j=1}^n a_ij)^2 − 2 (Σ_{i=1}^n a_ii)(Σ_{j,k=1}^n a_jk) + (Σ_{i=1}^n a_ii)^2 + 2 Σ_{i,j=1}^n a_ij^2 − 4 Σ_{i=1}^n (Σ_{j=1}^n a_ij)^2 + 8 Σ_{i=1}^n a_ii Σ_{j=1}^n a_ij − 6 Σ_{i=1}^n a_ii^2.   (3.2)

Thus, once the a_ij (i, j = 1, ..., n) are given, only O(n^2) operations are required to calculate \widehat{trΣ_p^2} through equations (3.1) and (3.2), while calculating it directly costs as many as O(n^4) operations.

3.2 Simulations

We consider three population distributions in the model (2.1): the z_ij are i.i.d. as the standard normal distribution, the standardized t distribution with 5 degrees of freedom, and the standardized gamma distribution with shape and scale parameters 2 and 2. For the population covariance matrix,

we choose

Σ_1 = (σ_i σ_j ρ^{|i−j|}), Σ_2 = (σ_i σ_j ρ), Σ_3 = (σ_i σ_j u_ij),

where ρ = 0.5, σ_i^2 = 2 + (p − i + 1)/p, u_ii = 1 and u_ij = u_ji (i ≠ j) are i.i.d. as U(0, 1), the uniform distribution on (0, 1). We note that T_{n,p} and T include the unknown parameter trΣ_p^2; the estimator (2.5) is used in the following simulations. The empirical significance level and power are calculated based on 10,000 Monte Carlo simulations.

Table 1: Empirical significance levels for Σ_1, Σ_2 and Σ_3 at 5% and 1% significance based on the normal approximation. The table reports New Test (1), New Test (2) and T for the normal, t and gamma populations with p = 100, 300, 500.
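The O(n^2) identities (3.1) and (3.2) of Section 3.1 can be verified against brute-force summation over mutually distinct indices. This is our own sketch (function names hypothetical), not code from the paper:

```python
import numpy as np
from itertools import permutations

def p3u3_p4u4_fast(A):
    """O(n^2) evaluation of P_n^3 U_3 and P_n^4 U_4 via (3.1) and (3.2),
    where A = (a_ij) with a_ij = x_i^T x_j."""
    d = np.diag(A)               # a_ii
    R = A.sum(axis=1)            # row sums, diagonal included
    S, D, Q = A.sum(), d.sum(), (A ** 2).sum()
    p3u3 = (R ** 2).sum() - Q - 2.0 * (d * R).sum() + 2.0 * (d ** 2).sum()
    p4u4 = (S ** 2 - 2.0 * D * S + D ** 2 + 2.0 * Q
            - 4.0 * (R ** 2).sum() + 8.0 * (d * R).sum() - 6.0 * (d ** 2).sum())
    return p3u3, p4u4

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
A = X @ X.T
fast3, fast4 = p3u3_p4u4_fast(A)
# Brute-force O(n^3) / O(n^4) sums over mutually distinct indices.
naive3 = sum(A[i, j] * A[i, k] for i, j, k in permutations(range(6), 3))
naive4 = sum(A[i, j] * A[k, l] for i, j, k, l in permutations(range(6), 4))
```

Together with P_n^2 U_2 = Σ_{i,j} a_ij^2 − Σ_i a_ii^2, this gives the whole estimator (2.5) in O(n^2) once A is available.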

In Table 1, we evaluate the empirical significance levels of the three statistics based on the normal approximation at 5% and 1% significance. We choose n = 80 and p = 100, 300, 500. In the table, New Test (1) and (2) denote the proposed test using (i) and (ii) as M_n, respectively. Considering trM_n^4 = o{(trM_n^2)^2} in Condition B for the asymptotic normality of T_{n,p}, we determine ρ and s in (i) and (ii) such that trM_n^4/(trM_n^2)^2 takes a prescribed small value. T denotes the test proposed by Chen and Qin (2010). It is found that the proposed test has a better approximation for the significance level, particularly at 1%, for all the populations and population covariance matrices considered.

We denote η = μ_p^T μ_p/(trΣ_p^2)^{1/2} for comparing the empirical powers of the three tests. This simulation is conducted at significance level 5%. In Figure 2, we show the empirical power curves for n = 80, p = 300, and Σ_1, Σ_2 and Σ_3 with the standard normal, t and gamma distributions. It is found that the power of the proposed test using (ii) as M_n is slightly higher than that using (i). As described in Section 2, the power of the proposed test is theoretically lower than that of Chen and Qin's test; we also observe this in the figure. To improve the power of the proposed test, we may need to find an optimal matrix M_n among the matrices which satisfy Condition B. This is a future problem.

3.3 A Real Data Example

We evaluate the performance of the proposed test using the prostate cancer data (Singh et al., 2002). These data contain 12,600 genes for 102 patients, 52 of whom are prostate tumor patients and 50 of whom are normal patients. Following Dudoit et al. (2002), we preprocess the data by first truncating intensities to the interval [1, 5000], then removing genes with max/min ≤ 5 or max − min ≤ 150, where max and min denote the maximum and minimum intensities over the 102 patients, and finally transforming all intensities to base-10 logarithms. The number of genes is thereby reduced to p = 2,745.

We use n = 50 observations from each of the tumor and normal patients. This setup leads to the situation of (1.1). In Figure 3, we plot the distributions of T̃_{n,p} and T under H_0 using the parametric bootstrap method; that is, we draw 50 samples from N_p(0_p, S_{n,p}) ten thousand times and compute the statistics each time. For the proposed test statistic, the matrix M_n is given by (ii), whose parameter s is determined such that trM_n^4/(trM_n^2)^2 takes the same prescribed value as in the simulations. The figure shows that the distribution of the proposed test statistic is much closer to the standard normal distribution

Figure 2: Power curves of the proposed test and Chen and Qin's test for n = 80, p = 300, Σ_1, Σ_2 and Σ_3 with normal, t and gamma distributions. Each panel plots the power against η for New Test (1), New Test (2) and T.

than the existing one.

Figure 3: Distributions of T̃_{n,p} and T under H_0 based on the parametric bootstrap method, together with the standard normal density.

4 Proofs

In this section, we give proofs of equation (2.4) and Theorems 2.1–2.3 of Section 2. Throughout this section, we write X_i and Z_i for X_p^{(i)} and Z_p^{(i)}, respectively.

4.1 Proof of equation (2.4)

Note that

T_{n,p} = Σ_{i≠j} m_ij (X_i − μ_p)^T (X_j − μ_p) + 2 Σ_{i≠j} m_ij μ_p^T X_j − Σ_{i≠j} m_ij μ_p^T μ_p
       = Σ_{i≠j} m_ij Z_i^T C_p^T C_p Z_j + 2 Σ_{i≠j} m_ij μ_p^T C_p Z_j + Σ_{i≠j} m_ij μ_p^T μ_p.

Hence, we have

Var(T_{n,p}) = Var(Σ_{i≠j} m_ij Z_i^T C_p^T C_p Z_j) + 4 Var(Σ_{i≠j} m_ij μ_p^T C_p Z_j) + 4 Σ_{i≠j} Σ_{k≠l} m_ij m_kl Cov(Z_i^T C_p^T C_p Z_j, μ_p^T C_p Z_l) = V_1 + 4V_2 + 4V_3, say.

Clearly, V_3 = 0 since the Z_i are independent with E(Z_i) = 0_p. For the first term V_1, using the vec-operator vec(·) and the Kronecker product ⊗ (see, e.g., Schott, 2005), it follows that

Σ_{i≠j} m_ij Z_i^T C_p^T C_p Z_j = vec(Z_{n,p})^T (C_p^T C_p ⊗ M_n) vec(Z_{n,p}),

where Z_{n,p} = (Z_1, ..., Z_n)^T. Then, from Lemma 2.1 in Srivastava (2009),

V_1 = 2 tr(C_p^T C_p ⊗ M_n)^2 = 2 trM_n^2 trΣ_p^2.

For the second term V_2, let Θ = 1_n μ_p^T. Note that

Σ_{i≠j} m_ij μ_p^T C_p Z_j = vec(Θ)^T (C_p ⊗ M_n) vec(Z_{n,p}).

Then we obtain

V_2 = vec(Θ)^T (C_p ⊗ M_n)(C_p ⊗ M_n)^T vec(Θ) = (μ_p ⊗ 1_n)^T (Σ_p ⊗ M_n^2)(μ_p ⊗ 1_n) = (1_n^T M_n^2 1_n) μ_p^T Σ_p μ_p.

Thus, equation (2.4) is derived.
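As a small numerical check (ours, not part of the paper), the vec/Kronecker identity used for V_1 can be confirmed for random matrices, with vec(·) taken as column-stacking:

```python
import numpy as np

# Check:  sum_{i,j} m_ij Z_i^T (C^T C) Z_j
#       = vec(Z_{n,p})^T (C^T C kron M_n) vec(Z_{n,p}),
# where Z_{n,p} = (Z_1, ..., Z_n)^T and vec(.) stacks columns.
rng = np.random.default_rng(0)
n, p = 5, 4
C = rng.standard_normal((p, p))
D = C.T @ C                                  # C_p^T C_p
M = rng.standard_normal((n, n))
M = (M + M.T) / 2
np.fill_diagonal(M, 0.0)                     # m_ii = 0, as in Condition B
Z = rng.standard_normal((n, p))              # rows are Z_1, ..., Z_n

lhs = sum(M[i, j] * (Z[i] @ D @ Z[j]) for i in range(n) for j in range(n))
vecZ = Z.flatten(order="F")                  # column-major (column-stacking) vec
rhs = vecZ @ np.kron(D, M) @ vecZ
```

Note the ordering np.kron(D, M): with column-stacking vec of the n × p matrix Z_{n,p}, the p × p factor comes first in the Kronecker product.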

4.2 Proof of Theorem 2.1

This proof is based on the results in Srivastava (2009) and Chen and Qin (2010). Let m̃_ij = m_ij/(2 trM_n^2 trΣ_p^2)^{1/2} and M̃_n = (m̃_ij). Then we have

T̃_{n,p} = Σ_{i≠j} m_ij X_i^T X_j / (2 trM_n^2 trΣ_p^2)^{1/2} = Σ_{i≠j} m̃_ij X_i^T X_j = 2 Σ_{j=2}^n Σ_{i=1}^{j−1} m̃_ij X_i^T X_j = 2 Σ_{j=2}^n Y_j,

where Y_j = Σ_{i=1}^{j−1} m̃_ij X_i^T X_j. We need to show that

Q_n = Σ_{j=2}^n Y_j →_d N(0, 1/4), p → ∞.   (4.1)

We apply the martingale central limit theorem (Hall and Heyde, 1980) to Q_n to show (4.1). Let F_j be the σ-algebra generated by {X_1, ..., X_j}. Then F_{j−1} ⊂ F_j for any j = 2, ..., n. It is easily seen that E(Q_n) = 0 and E(Q_n^2) < ∞. We note for n > m that Q_n = Q_m + Σ_{j=m+1}^n Y_j and E(Σ_{j=m+1}^n Y_j | F_m) = 0, since for m + 1 ≤ j ≤ n we have E(X_i^T X_j | F_m) = X_i^T E(X_j) = 0 for 1 ≤ i ≤ m and E(X_i^T X_j | F_m) = E(X_i^T X_j) = 0 for m + 1 ≤ i ≤ j − 1. Hence the sequence {Q_n, F_n} is a zero-mean, square-integrable martingale array. It suffices to show that, under Conditions A and B,

Σ_{j=2}^n E(Y_j^2 | F_{j−1}) →_P 1/4, p → ∞,   (4.2)

Σ_{j=2}^n E(Y_j^4) → 0, p → ∞.   (4.3)

We first show (4.2). Since E(X_j X_j^T) = Σ_p under H_0, we have

Σ_{j=2}^n E(Y_j^2 | F_{j−1}) = Σ_{j=2}^n E[(Σ_{i=1}^{j−1} m̃_ij X_i^T X_j)^2 | F_{j−1}] = Σ_{i<j} m̃_ij^2 X_i^T Σ_p X_i + Σ_{j=2}^n Σ_{i_1≠i_2<j} m̃_{i_1 j} m̃_{i_2 j} X_{i_1}^T Σ_p X_{i_2} = A_1 + A_2, say.

Also we note that

E(A_1) = Σ_{i<j} m̃_ij^2 E(X_i^T Σ_p X_i) = Σ_{i<j} m̃_ij^2 trΣ_p^2 = (1/2) trM̃_n^2 trΣ_p^2 = 1/4,

and we have

Var(A_1) = Var(Σ_{i<j} m̃_ij^2 X_i^T Σ_p X_i) = Σ_{i=1}^{n−1} (Σ_{j=i+1}^n m̃_ij^2)^2 Var(X_i^T Σ_p X_i) = Σ_{i=1}^{n−1} (Σ_{j=i+1}^n m̃_ij^2)^2 [(γ − 3) Σ_{k=1}^p v_kk^2 + 2 trΣ_p^4],

where v_kk is the k-th diagonal element of C_p^T Σ_p C_p = (C_p^T C_p)(C_p^T C_p). The last equation is derived from Lemma 2.1 in Srivastava (2009). Let D_p = C_p^T C_p = (d_ij). Following Lemma 2.6 in Srivastava (2009), we note that

Σ_{k=1}^p v_kk^2 = Σ_{k=1}^p (Σ_{i=1}^p d_ik^2)^2 ≤ tr(D_p^T D_p)^2 = trΣ_p^4.   (4.4)

Also we note that, under Condition B,

Σ_{i=1}^{n−1} (Σ_{j=i+1}^n m̃_ij^2)^2 ≤ Σ_{i_1, i_2} (Σ_{j=1}^n m̃_{i_1 j} m̃_{i_2 j})^2 = trM̃_n^4.

Hence, we have Var(A_1) = O(trM̃_n^4 trΣ_p^4). Since trΣ_p^4/(trΣ_p^2)^2 ≤ 1, we obtain

trM̃_n^4 trΣ_p^4 = trM_n^4 trΣ_p^4 / {4 (trM_n^2)^2 (trΣ_p^2)^2} → 0, p → ∞,   (4.5)

under Conditions A and B. Thus A_1 →_P 1/4 by Chebyshev's inequality.

Next, we show A_2 →_P 0. Since

A_2 = Σ_{i_1≠i_2} X_{i_1}^T Σ_p X_{i_2} Σ_{j=max{i_1,i_2}+1}^n m̃_{i_1 j} m̃_{i_2 j},

and E(X_{i_1}^T Σ_p X_{i_2} X_{i_3}^T Σ_p X_{i_4}) = 0 when at least one index differs between (i_1, i_2) and (i_3, i_4), we have E(A_2) = 0 and

E(A_2^2) = Σ_{i_1≠i_2} E(X_{i_1}^T Σ_p X_{i_2})^2 (Σ_{j=max{i_1,i_2}+1}^n m̃_{i_1 j} m̃_{i_2 j})^2.

Note that E(X_{i_1}^T Σ_p X_{i_2})^2 = trΣ_p^4 for i_1 ≠ i_2, and that, since m̃_ij ≥ 0 under Condition B,

Σ_{i_1≠i_2} (Σ_{j=max{i_1,i_2}+1}^n m̃_{i_1 j} m̃_{i_2 j})^2 ≤ Σ_{i_1≠i_2} (Σ_{j=1}^n m̃_{i_1 j} m̃_{i_2 j})^2 + Σ_{i=1}^n (Σ_{j=1}^n m̃_ij^2)^2 = trM̃_n^4.

Then we have E(A_2^2) = O(trM̃_n^4 trΣ_p^4). From (4.5), E(A_2^2) → 0 as p → ∞ under Conditions A and B, and Chebyshev's inequality yields A_2 →_P 0.

Next, we verify (4.3). Note that

E(Y_j^4) = E(Σ_{i=1}^{j−1} m̃_ij X_i^T X_j)^4 = E[Σ_{i=1}^{j−1} m̃_ij^2 (X_i^T X_j)^2 + Σ_{i_1≠i_2<j} m̃_{i_1 j} m̃_{i_2 j} X_{i_1}^T X_j X_{i_2}^T X_j]^2 = Σ_{i=1}^{j−1} m̃_ij^4 E(X_i^T X_j)^4 + 3 Σ_{i_1≠i_2<j} m̃_{i_1 j}^2 m̃_{i_2 j}^2 E[(X_{i_1}^T X_j)^2 (X_{i_2}^T X_j)^2],

since the cross terms, which contain an odd power of some X_i, vanish. Following Chen and Qin (2010), we have

E(X_i^T X_j)^4 = O(trΣ_p^4) + O{(trΣ_p^2)^2}, i ≠ j.

Also we have, from (4.4),

E[(X_{i_1}^T X_j)^2 (X_{i_2}^T X_j)^2] = E[(X_j^T Σ_p X_j)^2] = (γ − 3) Σ_{k=1}^p v_kk^2 + 2 trΣ_p^4 + (trΣ_p^2)^2 = O(trΣ_p^4) + O{(trΣ_p^2)^2}, i_1 ≠ i_2 ≠ j.

Following Corollary 2.7 in Srivastava (2009), we have

Σ_{j=2}^n Σ_{i=1}^{j−1} m̃_ij^4 ≤ Σ_{i≠j} m̃_ij^4 ≤ trM̃_n^4

and

Σ_{j=2}^n Σ_{i_1≠i_2<j} m̃_{i_1 j}^2 m̃_{i_2 j}^2 ≤ Σ_{j=1}^n (Σ_{i=1}^n m̃_ij^2)^2 ≤ trM̃_n^4.

From (4.5) and the fact that, under Conditions A and B,

trM̃_n^4 (trΣ_p^2)^2 = trM_n^4 / {4 (trM_n^2)^2} → 0, p → ∞,

we obtain Σ_{j=2}^n E(Y_j^4) → 0 as p → ∞. This ends the proof.

4.3 Proof of Theorem 2.2

Note that \widehat{trΣ_p^2} is invariant under the location transformations X_i → X_i + c_p, where c_p is an arbitrary constant p-vector. Hence we may assume μ_p = 0 without loss of generality. Obviously, we have E(U_2) = trΣ_p^2 and E(U_3) = E(U_4) = 0. It follows from Chen et al. (2010) that

Var(U_2) = (4/n^2)(trΣ_p^2)^2 + (4/n){2 trΣ_p^4 + (γ − 3) tr[(C_p^T C_p)^2 ∘ (C_p^T C_p)^2]} + O[(1/n^3)(trΣ_p^2)^2 + (1/n^2) trΣ_p^4],

Var(U_3) = (2/n^3)(trΣ_p^2)^2 + O[(1/n^2) trΣ_p^4 + (1/n^4)(trΣ_p^2)^2]

and

Var(U_4) = (8/n^4)(trΣ_p^2)^2 + O[(1/n^5)(trΣ_p^2)^2 + (1/n^4) trΣ_p^4],

where ∘ denotes the Hadamard product. Using the fact that

tr[(C_p^T C_p)^2 ∘ (C_p^T C_p)^2] ≤ tr(C_p^T C_p)^4 = trΣ_p^4 ≤ (trΣ_p^2)^2,

we have Var(U_i / trΣ_p^2) → 0 for i = 2, 3, 4 under Condition A. This and Chebyshev's inequality yield (2.6).

4.4 Proof of Theorem 2.3

Let σ_{n,p} = (2 trM_n^2 trΣ_p^2)^{1/2}. Applying Theorem 2.1 to the centered vectors X_i − μ_p, we have, under Conditions A and B,

Σ_{i≠j} m_ij (X_i − μ_p)^T (X_j − μ_p) / σ_{n,p} →_d N(0, 1), p → ∞.

Note that

T_{n,p} = Σ_{i≠j} m_ij (X_i − μ_p)^T (X_j − μ_p) + 2 Σ_{i≠j} m_ij μ_p^T X_j − Σ_{i≠j} m_ij μ_p^T μ_p.

Hence it remains to show that

{2 Σ_{i≠j} m_ij μ_p^T X_j − 2 Σ_{i≠j} m_ij μ_p^T μ_p} / σ_{n,p} →_P 0, p → ∞.   (4.6)

We notice that E(2 Σ_{i≠j} m_ij μ_p^T X_j) = 2 Σ_{i≠j} m_ij μ_p^T μ_p and

Var(2 Σ_{i≠j} m_ij μ_p^T X_j) = 4 (1_n^T M_n^2 1_n) μ_p^T Σ_p μ_p = o(σ_{n,p}^2)

under Condition C. Chebyshev's inequality yields (4.6). Slutsky's theorem and (4.6) complete the proof.
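To close the proofs, here is an end-to-end numerical sanity check (ours, not from the paper; draw_tilde_T is a hypothetical name). It draws the standardized statistic of Theorem 2.1 under H_0 with M_n of form (i) and a compound-symmetry Σ_p that violates trΣ_p^4 = o{(trΣ_p^2)^2}, using the known value of trΣ_p^2. By (2.3)–(2.4) the draws have mean 0 and variance 1 exactly, and by Theorem 2.1 they are approximately standard normal even for this Σ_p:

```python
import numpy as np

def draw_tilde_T(n, p, rho_x=0.5, rho_m=0.5, n_rep=200, seed=0):
    """Draws of tilde T_{n,p} = T_{n,p} / sqrt(2 tr M_n^2 tr Sigma_p^2)
    under H0, with Sigma_p = (1 - rho_x) I_p + rho_x 1_p 1_p^T (which
    violates tr Sigma^4 = o{(tr Sigma^2)^2}) and M_n of form (i)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n)
    M = rho_m ** np.abs(idx[:, None] - idx[None, :])
    np.fill_diagonal(M, 0.0)
    tr_m2 = np.trace(M @ M)
    # Sigma_p has one eigenvalue 1 + (p-1) rho_x and p-1 eigenvalues 1 - rho_x.
    tr_s2 = (1 + (p - 1) * rho_x) ** 2 + (p - 1) * (1 - rho_x) ** 2
    denom = np.sqrt(2.0 * tr_m2 * tr_s2)
    out = np.empty(n_rep)
    for r in range(n_rep):
        # X_i = sqrt(1 - rho_x) Z_i + sqrt(rho_x) g_i 1_p has Var(X_i) = Sigma_p.
        Z = rng.standard_normal((n, p))
        g = rng.standard_normal((n, 1))
        X = np.sqrt(1 - rho_x) * Z + np.sqrt(rho_x) * g
        out[r] = ((M @ X) * X).sum() / denom       # tr(X^T M_n X) / denom
    return out

draws = draw_tilde_T(n=60, p=300)
```

Repeating this with the statistic T of Section 1 in place of T̃_{n,p} reproduces the skewed null distributions of Figure 1.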

References

[1] Bai, Z. and Saranadasa, H. (1996). Effect of high dimension: by an example of a two sample problem. Statistica Sinica 6.

[2] Chen, S. X. and Qin, Y. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics 38.

[3] Chen, S. X., Zhang, L. X. and Zhong, P. S. (2010). Tests for high-dimensional covariance matrices. Journal of the American Statistical Association 105.

[4] Dudoit, S., Fridlyand, J. and Speed, T. P. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 97.

[5] Grenander, U. and Szegö, G. (1958). Toeplitz Forms and Their Applications, 2nd edition. New York: Chelsea Publishing Company.

[6] Hall, P. and Heyde, C. (1980). Martingale Limit Theory and Its Application. New York: Academic Press.

[7] Katayama, S., Kano, Y. and Srivastava, M. S. (2010). Asymptotic distributions of some test criteria for the mean vector with fewer observations than the dimension. Submitted.

[8] Schott, J. R. (2005). Matrix Analysis for Statistics, 2nd edition. New York: Wiley.

[9] Singh, D., Febbo, P. G., Ross, K., Jackson, D. G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A. A., D'Amico, A. V., Richie, J. P., Lander, E. S., Loda, M., Kantoff, P. W., Golub, T. R. and Sellers, W. R. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1.

[10] Srivastava, M. S. (2009). A test for the mean vector with fewer observations than the dimension under non-normality. Journal of Multivariate Analysis 100.


Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data Sri Lankan Journal of Applied Statistics (Special Issue) Modern Statistical Methodologies in the Cutting Edge of Science Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations

More information

KRUSKAL-WALLIS ONE-WAY ANALYSIS OF VARIANCE BASED ON LINEAR PLACEMENTS

KRUSKAL-WALLIS ONE-WAY ANALYSIS OF VARIANCE BASED ON LINEAR PLACEMENTS Bull. Korean Math. Soc. 5 (24), No. 3, pp. 7 76 http://dx.doi.org/34/bkms.24.5.3.7 KRUSKAL-WALLIS ONE-WAY ANALYSIS OF VARIANCE BASED ON LINEAR PLACEMENTS Yicheng Hong and Sungchul Lee Abstract. The limiting

More information

Empirical Likelihood Tests for High-dimensional Data

Empirical Likelihood Tests for High-dimensional Data Empirical Likelihood Tests for High-dimensional Data Department of Statistics and Actuarial Science University of Waterloo, Canada ICSA - Canada Chapter 2013 Symposium Toronto, August 2-3, 2013 Based on

More information

Testing Block-Diagonal Covariance Structure for High-Dimensional Data Under Non-normality

Testing Block-Diagonal Covariance Structure for High-Dimensional Data Under Non-normality Testing Block-Diagonal Covariance Structure for High-Dimensional Data Under Non-normality Yuki Yamada, Masashi Hyodo and Takahiro Nishiyama Department of Mathematical Information Science, Tokyo University

More information

Modified Simes Critical Values Under Positive Dependence

Modified Simes Critical Values Under Positive Dependence Modified Simes Critical Values Under Positive Dependence Gengqian Cai, Sanat K. Sarkar Clinical Pharmacology Statistics & Programming, BDS, GlaxoSmithKline Statistics Department, Temple University, Philadelphia

More information

Statistica Sinica Preprint No: SS R2

Statistica Sinica Preprint No: SS R2 Statistica Sinica Preprint No: SS-2016-0063R2 Title Two-sample tests for high-dimension, strongly spiked eigenvalue models Manuscript ID SS-2016-0063R2 URL http://www.stat.sinica.edu.tw/statistica/ DOI

More information

Testing Some Covariance Structures under a Growth Curve Model in High Dimension

Testing Some Covariance Structures under a Growth Curve Model in High Dimension Department of Mathematics Testing Some Covariance Structures under a Growth Curve Model in High Dimension Muni S. Srivastava and Martin Singull LiTH-MAT-R--2015/03--SE Department of Mathematics Linköping

More information

TAMS39 Lecture 2 Multivariate normal distribution

TAMS39 Lecture 2 Multivariate normal distribution TAMS39 Lecture 2 Multivariate normal distribution Martin Singull Department of Mathematics Mathematical Statistics Linköping University, Sweden Content Lecture Random vectors Multivariate normal distribution

More information

Testing equality of two mean vectors with unequal sample sizes for populations with correlation

Testing equality of two mean vectors with unequal sample sizes for populations with correlation Testing equality of two mean vectors with unequal sample sizes for populations with correlation Aya Shinozaki Naoya Okamoto 2 and Takashi Seo Department of Mathematical Information Science Tokyo University

More information

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline. Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,

More information

High-dimensional asymptotic expansions for the distributions of canonical correlations

High-dimensional asymptotic expansions for the distributions of canonical correlations Journal of Multivariate Analysis 100 2009) 231 242 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva High-dimensional asymptotic

More information

On Expected Gaussian Random Determinants

On Expected Gaussian Random Determinants On Expected Gaussian Random Determinants Moo K. Chung 1 Department of Statistics University of Wisconsin-Madison 1210 West Dayton St. Madison, WI 53706 Abstract The expectation of random determinants whose

More information

Approximate interval estimation for EPMC for improved linear discriminant rule under high dimensional frame work

Approximate interval estimation for EPMC for improved linear discriminant rule under high dimensional frame work Hiroshima Statistical Research Group: Technical Report Approximate interval estimation for PMC for improved linear discriminant rule under high dimensional frame work Masashi Hyodo, Tomohiro Mitani, Tetsuto

More information

Multivariate Time Series

Multivariate Time Series Multivariate Time Series Notation: I do not use boldface (or anything else) to distinguish vectors from scalars. Tsay (and many other writers) do. I denote a multivariate stochastic process in the form

More information

Statistical Inference of Covariate-Adjusted Randomized Experiments

Statistical Inference of Covariate-Adjusted Randomized Experiments 1 Statistical Inference of Covariate-Adjusted Randomized Experiments Feifang Hu Department of Statistics George Washington University Joint research with Wei Ma, Yichen Qin and Yang Li Email: feifang@gwu.edu

More information

2. Matrix Algebra and Random Vectors

2. Matrix Algebra and Random Vectors 2. Matrix Algebra and Random Vectors 2.1 Introduction Multivariate data can be conveniently display as array of numbers. In general, a rectangular array of numbers with, for instance, n rows and p columns

More information

On a General Two-Sample Test with New Critical Value for Multivariate Feature Selection in Chest X-Ray Image Analysis Problem

On a General Two-Sample Test with New Critical Value for Multivariate Feature Selection in Chest X-Ray Image Analysis Problem Applied Mathematical Sciences, Vol. 9, 2015, no. 147, 7317-7325 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2015.510687 On a General Two-Sample Test with New Critical Value for Multivariate

More information

Consistency of Distance-based Criterion for Selection of Variables in High-dimensional Two-Group Discriminant Analysis

Consistency of Distance-based Criterion for Selection of Variables in High-dimensional Two-Group Discriminant Analysis Consistency of Distance-based Criterion for Selection of Variables in High-dimensional Two-Group Discriminant Analysis Tetsuro Sakurai and Yasunori Fujikoshi Center of General Education, Tokyo University

More information

Tests for Covariance Matrices, particularly for High-dimensional Data

Tests for Covariance Matrices, particularly for High-dimensional Data M. Rauf Ahmad Tests for Covariance Matrices, particularly for High-dimensional Data Technical Report Number 09, 200 Department of Statistics University of Munich http://www.stat.uni-muenchen.de Tests for

More information

Composite Hotelling s T-Square Test for High-Dimensional Data. 1. Introduction

Composite Hotelling s T-Square Test for High-Dimensional Data. 1. Introduction A^VÇÚO 1 33 ò 1 4 Ï 2017 c 8 Chinese Journal of Applied Probability and Statistics Aug., 2017, Vol. 33, No. 4, pp. 349-368 doi: 10.3969/j.issn.1001-4268.2017.04.003 Composite Hotelling s T-Square Test

More information

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j. Chapter 9 Pearson s chi-square test 9. Null hypothesis asymptotics Let X, X 2, be independent from a multinomial(, p) distribution, where p is a k-vector with nonnegative entries that sum to one. That

More information

Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis

Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis Yasunori Fujikoshi and Tetsuro Sakurai Department of Mathematics, Graduate School of Science,

More information

Advanced Statistics II: Non Parametric Tests

Advanced Statistics II: Non Parametric Tests Advanced Statistics II: Non Parametric Tests Aurélien Garivier ParisTech February 27, 2011 Outline Fitting a distribution Rank Tests for the comparison of two samples Two unrelated samples: Mann-Whitney

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

Comparison of Discrimination Methods for High Dimensional Data

Comparison of Discrimination Methods for High Dimensional Data Comparison of Discrimination Methods for High Dimensional Data M.S.Srivastava 1 and T.Kubokawa University of Toronto and University of Tokyo Abstract In microarray experiments, the dimension p of the data

More information

Consistency of test based method for selection of variables in high dimensional two group discriminant analysis

Consistency of test based method for selection of variables in high dimensional two group discriminant analysis https://doi.org/10.1007/s42081-019-00032-4 ORIGINAL PAPER Consistency of test based method for selection of variables in high dimensional two group discriminant analysis Yasunori Fujikoshi 1 Tetsuro Sakurai

More information

Asymptotic Theory. L. Magee revised January 21, 2013

Asymptotic Theory. L. Magee revised January 21, 2013 Asymptotic Theory L. Magee revised January 21, 2013 1 Convergence 1.1 Definitions Let a n to refer to a random variable that is a function of n random variables. Convergence in Probability The scalar a

More information

Effects of data dimension on empirical likelihood

Effects of data dimension on empirical likelihood Biometrika Advance Access published August 8, 9 Biometrika (9), pp. C 9 Biometrika Trust Printed in Great Britain doi:.9/biomet/asp7 Effects of data dimension on empirical likelihood BY SONG XI CHEN Department

More information

Course information: Instructor: Tim Hanson, Leconte 219C, phone Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment.

Course information: Instructor: Tim Hanson, Leconte 219C, phone Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment. Course information: Instructor: Tim Hanson, Leconte 219C, phone 777-3859. Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment. Text: Applied Linear Statistical Models (5th Edition),

More information

Bootstrapping high dimensional vector: interplay between dependence and dimensionality

Bootstrapping high dimensional vector: interplay between dependence and dimensionality Bootstrapping high dimensional vector: interplay between dependence and dimensionality Xianyang Zhang Joint work with Guang Cheng University of Missouri-Columbia LDHD: Transition Workshop, 2014 Xianyang

More information

TESTING FOR EQUAL DISTRIBUTIONS IN HIGH DIMENSION

TESTING FOR EQUAL DISTRIBUTIONS IN HIGH DIMENSION TESTING FOR EQUAL DISTRIBUTIONS IN HIGH DIMENSION Gábor J. Székely Bowling Green State University Maria L. Rizzo Ohio University October 30, 2004 Abstract We propose a new nonparametric test for equality

More information

Canonical Correlation Analysis of Longitudinal Data

Canonical Correlation Analysis of Longitudinal Data Biometrics Section JSM 2008 Canonical Correlation Analysis of Longitudinal Data Jayesh Srivastava Dayanand N Naik Abstract Studying the relationship between two sets of variables is an important multivariate

More information

2.1 Linear regression with matrices

2.1 Linear regression with matrices 21 Linear regression with matrices The values of the independent variables are united into the matrix X (design matrix), the values of the outcome and the coefficient are represented by the vectors Y and

More information

Lecture Note 1: Probability Theory and Statistics

Lecture Note 1: Probability Theory and Statistics Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 1: Probability Theory and Statistics Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 For this and all future notes, if you would

More information

Lecture 3 - Expectation, inequalities and laws of large numbers

Lecture 3 - Expectation, inequalities and laws of large numbers Lecture 3 - Expectation, inequalities and laws of large numbers Jan Bouda FI MU April 19, 2009 Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersapril 19, 2009 1 / 67 Part

More information

Probability Lecture III (August, 2006)

Probability Lecture III (August, 2006) robability Lecture III (August, 2006) 1 Some roperties of Random Vectors and Matrices We generalize univariate notions in this section. Definition 1 Let U = U ij k l, a matrix of random variables. Suppose

More information

High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis. Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi

High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis. Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi Faculty of Science and Engineering, Chuo University, Kasuga, Bunkyo-ku,

More information

Chp 4. Expectation and Variance

Chp 4. Expectation and Variance Chp 4. Expectation and Variance 1 Expectation In this chapter, we will introduce two objectives to directly reflect the properties of a random variable or vector, which are the Expectation and Variance.

More information

Large-Scale Multiple Testing of Correlations

Large-Scale Multiple Testing of Correlations University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 5-5-2016 Large-Scale Multiple Testing of Correlations T. Tony Cai University of Pennsylvania Weidong Liu Follow this

More information

CS281A/Stat241A Lecture 17

CS281A/Stat241A Lecture 17 CS281A/Stat241A Lecture 17 p. 1/4 CS281A/Stat241A Lecture 17 Factor Analysis and State Space Models Peter Bartlett CS281A/Stat241A Lecture 17 p. 2/4 Key ideas of this lecture Factor Analysis. Recall: Gaussian

More information

01 Probability Theory and Statistics Review

01 Probability Theory and Statistics Review NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement

More information

Large-Scale Multiple Testing of Correlations

Large-Scale Multiple Testing of Correlations Large-Scale Multiple Testing of Correlations T. Tony Cai and Weidong Liu Abstract Multiple testing of correlations arises in many applications including gene coexpression network analysis and brain connectivity

More information

Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek

Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek J. Korean Math. Soc. 41 (2004), No. 5, pp. 883 894 CONVERGENCE OF WEIGHTED SUMS FOR DEPENDENT RANDOM VARIABLES Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek Abstract. We discuss in this paper the strong

More information

Multivariate Analysis and Likelihood Inference

Multivariate Analysis and Likelihood Inference Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS

COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS Communications in Statistics - Simulation and Computation 33 (2004) 431-446 COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS K. Krishnamoorthy and Yong Lu Department

More information

Statistica Sinica Preprint No: SS

Statistica Sinica Preprint No: SS Statistica Sinica Preprint No: SS-2017-0275 Title Testing homogeneity of high-dimensional covariance matrices Manuscript ID SS-2017-0275 URL http://www.stat.sinica.edu.tw/statistica/ DOI 10.5705/ss.202017.0275

More information

Economics 583: Econometric Theory I A Primer on Asymptotics

Economics 583: Econometric Theory I A Primer on Asymptotics Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:

More information

6 The normal distribution, the central limit theorem and random samples

6 The normal distribution, the central limit theorem and random samples 6 The normal distribution, the central limit theorem and random samples 6.1 The normal distribution We mentioned the normal (or Gaussian) distribution in Chapter 4. It has density f X (x) = 1 σ 1 2π e

More information

A new method to bound rate of convergence

A new method to bound rate of convergence A new method to bound rate of convergence Arup Bose Indian Statistical Institute, Kolkata, abose@isical.ac.in Sourav Chatterjee University of California, Berkeley Empirical Spectral Distribution Distribution

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 3 for Applied Multivariate Analysis Outline 1 Reprise-Vectors, vector lengths and the angle between them 2 3 Partial correlation

More information

A new test for the proportionality of two large-dimensional covariance matrices. Citation Journal of Multivariate Analysis, 2014, v. 131, p.

A new test for the proportionality of two large-dimensional covariance matrices. Citation Journal of Multivariate Analysis, 2014, v. 131, p. Title A new test for the proportionality of two large-dimensional covariance matrices Authors) Liu, B; Xu, L; Zheng, S; Tian, G Citation Journal of Multivariate Analysis, 04, v. 3, p. 93-308 Issued Date

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

Probability and Statistics Notes

Probability and Statistics Notes Probability and Statistics Notes Chapter Five Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Five Notes Spring 2011 1 / 37 Outline 1

More information

A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when the Covariance Matrices are Unknown but Common

A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when the Covariance Matrices are Unknown but Common Journal of Statistical Theory and Applications Volume 11, Number 1, 2012, pp. 23-45 ISSN 1538-7887 A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when

More information

Asymptotic Statistics-III. Changliang Zou

Asymptotic Statistics-III. Changliang Zou Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Testing Homogeneity Of A Large Data Set By Bootstrapping

Testing Homogeneity Of A Large Data Set By Bootstrapping Testing Homogeneity Of A Large Data Set By Bootstrapping 1 Morimune, K and 2 Hoshino, Y 1 Graduate School of Economics, Kyoto University Yoshida Honcho Sakyo Kyoto 606-8501, Japan. E-Mail: morimune@econ.kyoto-u.ac.jp

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information

4. Distributions of Functions of Random Variables

4. Distributions of Functions of Random Variables 4. Distributions of Functions of Random Variables Setup: Consider as given the joint distribution of X 1,..., X n (i.e. consider as given f X1,...,X n and F X1,...,X n ) Consider k functions g 1 : R n

More information

Principal Components Analysis (PCA) and Singular Value Decomposition (SVD) with applications to Microarrays

Principal Components Analysis (PCA) and Singular Value Decomposition (SVD) with applications to Microarrays Principal Components Analysis (PCA) and Singular Value Decomposition (SVD) with applications to Microarrays Prof. Tesler Math 283 Fall 2015 Prof. Tesler Principal Components Analysis Math 283 / Fall 2015

More information

large number of i.i.d. observations from P. For concreteness, suppose

large number of i.i.d. observations from P. For concreteness, suppose 1 Subsampling Suppose X i, i = 1,..., n is an i.i.d. sequence of random variables with distribution P. Let θ(p ) be some real-valued parameter of interest, and let ˆθ n = ˆθ n (X 1,..., X n ) be some estimate

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Central Limit Theorems for Classical Likelihood Ratio Tests for High-Dimensional Normal Distributions

Central Limit Theorems for Classical Likelihood Ratio Tests for High-Dimensional Normal Distributions Central Limit Theorems for Classical Likelihood Ratio Tests for High-Dimensional Normal Distributions Tiefeng Jiang 1 and Fan Yang 1, University of Minnesota Abstract For random samples of size n obtained

More information

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics The candidates for the research course in Statistics will have to take two shortanswer type tests

More information

Stat 206: Sampling theory, sample moments, mahalanobis

Stat 206: Sampling theory, sample moments, mahalanobis Stat 206: Sampling theory, sample moments, mahalanobis topology James Johndrow (adapted from Iain Johnstone s notes) 2016-11-02 Notation My notation is different from the book s. This is partly because

More information

TESTING SERIAL CORRELATION IN SEMIPARAMETRIC VARYING COEFFICIENT PARTIALLY LINEAR ERRORS-IN-VARIABLES MODEL

TESTING SERIAL CORRELATION IN SEMIPARAMETRIC VARYING COEFFICIENT PARTIALLY LINEAR ERRORS-IN-VARIABLES MODEL Jrl Syst Sci & Complexity (2009) 22: 483 494 TESTIG SERIAL CORRELATIO I SEMIPARAMETRIC VARYIG COEFFICIET PARTIALLY LIEAR ERRORS-I-VARIABLES MODEL Xuemei HU Feng LIU Zhizhong WAG Received: 19 September

More information

18Ï È² 7( &: ÄuANOVAp.O`û5 571 Based on this ANOVA model representation, Sobol (1993) proposed global sensitivity index, S i1...i s = D i1...i s /D, w

18Ï È² 7( &: ÄuANOVAp.O`û5 571 Based on this ANOVA model representation, Sobol (1993) proposed global sensitivity index, S i1...i s = D i1...i s /D, w A^VÇÚO 1 Êò 18Ï 2013c12 Chinese Journal of Applied Probability and Statistics Vol.29 No.6 Dec. 2013 Optimal Properties of Orthogonal Arrays Based on ANOVA High-Dimensional Model Representation Chen Xueping

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

Stochastic Convergence, Delta Method & Moment Estimators

Stochastic Convergence, Delta Method & Moment Estimators Stochastic Convergence, Delta Method & Moment Estimators Seminar on Asymptotic Statistics Daniel Hoffmann University of Kaiserslautern Department of Mathematics February 13, 2015 Daniel Hoffmann (TU KL)

More information

A note on mean testing for high dimensional multivariate data under non-normality

A note on mean testing for high dimensional multivariate data under non-normality A note on mean testing for high dimensional multivariate data under non-normality M. Rauf Ahmad, Dietrich von Rosen and Martin Singull Linköping University Post Print N.B.: When citing this work, cite

More information

A TWO SAMPLE TEST FOR HIGH DIMENSIONAL DATA WITH APPLICATIONS TO GENE-SET TESTING. By Song Xi Chen,Ying-LiQin Iowa State University

A TWO SAMPLE TEST FOR HIGH DIMENSIONAL DATA WITH APPLICATIONS TO GENE-SET TESTING. By Song Xi Chen,Ying-LiQin Iowa State University Submitted to the Annals of Statistics A TWO SAMPLE TEST FOR HIGH DIMENSIONAL DATA WITH APPLICATIONS TO GENE-SET TESTING By Song Xi Chen,Ying-LiQin Iowa State University We proposed a two sample test for

More information

4.1 Order Specification

4.1 Order Specification THE UNIVERSITY OF CHICAGO Booth School of Business Business 41914, Spring Quarter 2009, Mr Ruey S Tsay Lecture 7: Structural Specification of VARMA Models continued 41 Order Specification Turn to data

More information

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case

The largest eigenvalues of the sample covariance matrix. in the heavy-tail case The largest eigenvalues of the sample covariance matrix 1 in the heavy-tail case Thomas Mikosch University of Copenhagen Joint work with Richard A. Davis (Columbia NY), Johannes Heiny (Aarhus University)

More information

BTRY 4090: Spring 2009 Theory of Statistics

BTRY 4090: Spring 2009 Theory of Statistics BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)

More information

Thomas J. Fisher. Research Statement. Preliminary Results

Thomas J. Fisher. Research Statement. Preliminary Results Thomas J. Fisher Research Statement Preliminary Results Many applications of modern statistics involve a large number of measurements and can be considered in a linear algebra framework. In many of these

More information

STA205 Probability: Week 8 R. Wolpert

STA205 Probability: Week 8 R. Wolpert INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and

More information