SHOTA KATAYAMA AND YUTAKA KANO. Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka, Japan
A New Test on High-Dimensional Mean Vector Without Any Assumption on Population Covariance Matrix

Abstract

We propose a new test for the equality of the mean vectors of two groups with the same number of observations in high-dimensional data. Existing tests for this problem require a strong condition on the population covariance matrix; the test proposed in this paper does not. The test is obtained in a general model; that is, the data need not be normally distributed.

1 Introduction

Advances in information technology and database systems make it possible to collect and store high-dimensional data, which have fewer observations than, or as many observations as, the dimension. High-dimensional data appear in various fields, such as DNA microarray data analysis and marketing data analysis. For such data, however, traditional multivariate analysis is often not applicable.

In this paper, we propose a new test for the equality of two mean vectors based on $n$ paired observations $(X_p^{(i1)}, X_p^{(i2)})$, $i = 1, \ldots, n$, where the $X_p^{(ij)}$ are $p$-dimensional random vectors. Within each group $j = 1, 2$, the $X_p^{(ij)}$ ($i = 1, \ldots, n$) are independent, while the two groups may be dependent. In a typical situation, $X_p^{(i1)}$ and $X_p^{(i2)}$ are the two measurements on the same subject $i$ before and after a treatment. Let $X_p^{(i)} = X_p^{(i1)} - X_p^{(i2)}$, $\mu_p = E(X_p^{(i)})$ and $\Sigma_p = \mathrm{Var}(X_p^{(i)})$; then the problem reduces to a one-sample problem. We wish to test

    H_0 : \mu_p = 0 \quad \text{versus} \quad H_1 : \mu_p \neq 0.    (1.1)

A traditional method for testing the null hypothesis is Hotelling's $T^2$ test. However, Hotelling's $T^2$ test is not applicable when $n \le p$, since the inverse of the sample covariance matrix does not exist. Bai and Saranadasa (1996) proposed a test for the two-sample problem with equal population covariance matrices. For the one-sample problem (1.1), their test is based on the statistic

    F_{n,p} = \bar X_{n,p}^T \bar X_{n,p} - \frac{1}{n} \mathrm{tr}\, S_{n,p},    (1.2)

where $\bar X_{n,p}$ is the sample mean vector and $S_{n,p}$ is the sample covariance matrix. More recently, Chen and Qin (2010) proposed a test based on the statistic

    G_{n,p} = \frac{1}{n(n-1)} \sum_{i \neq j} X_p^{(i)T} X_p^{(j)}    (1.3)

for the one-sample problem (1.1); they also proposed a test for the two-sample problem with unequal population covariance matrices. The asymptotic normality of their statistic is obtained under the null hypothesis $H_0$ in (1.1) and the condition $\mathrm{tr}\,\Sigma_p^4 = o\{(\mathrm{tr}\,\Sigma_p^2)^2\}$ as the sample size $n$ and the dimension $p$ tend to infinity. It is easily shown that $F_{n,p} = G_{n,p}$. Hence, for the one-sample problem (1.1), the asymptotic normality of Bai and Saranadasa's statistic also holds under the condition $\mathrm{tr}\,\Sigma_p^4 = o\{(\mathrm{tr}\,\Sigma_p^2)^2\}$, which is weaker than their original assumption.

However, this condition is still restrictive. Indeed, it excludes some typical situations, such as $\Sigma_p = (1-\rho) I_p + \rho 1_p 1_p^T$ with $\rho \in (0,1)$, where $I_p$ is the $p \times p$ identity matrix and $1_p$ is the $p$-vector with all entries one, or the case where the maximum eigenvalue of $\Sigma_p$ has order larger than $p^{1/2}$. Moreover, because of the high dimensionality it is hard to judge from the data whether the condition is satisfied. When the condition fails, the population covariance matrix has an enormous effect on the asymptotic null distributions of the two test statistics. Katayama et al. (2010) showed that the type of the asymptotic null distribution of Bai and Saranadasa's statistic depends on the population covariance matrix, and hence so does that of Chen and Qin's statistic.

To illustrate this phenomenon, we give a simple example. Suppose temporarily that the $X_p^{(i)}$ are independently and identically distributed (i.i.d.) as the $p$-dimensional multivariate normal distribution with mean vector $0_p$ and covariance matrix $\Sigma_p = (1-\rho_p) I_p + \rho_p 1_p 1_p^T$, $\rho_p = p^{-\alpha}$, $\alpha \in (0,1)$, where $0_p$ denotes the $p$-vector with all entries zero. Note that the condition above is satisfied only when $1/2 < \alpha < 1$. Since the two statistics are equivalent, we consider Chen and Qin's statistic only, and define its standardized version

    T = \frac{G_{n,p}}{\sqrt{2\,\mathrm{tr}\,\Sigma_p^2 / \{n(n-1)\}}}.

The distributions of $T$, estimated from 10,000 simulations, are plotted in Figure 1 for $(n, p) = (80, 300)$ and $(80, 500)$. The figure shows that the type of the asymptotic null distribution of $T$ depends on $\alpha$, hence on $\Sigma_p$. When the condition is not satisfied, i.e., $0 < \alpha \le 1/2$, the asymptotic distribution of $T$ is clearly not normal in either case. Indeed, the asymptotic distribution is the standardized chi-square distribution with 1 degree of freedom for $0 < \alpha < 1/2$, and a convolution of normal and chi-square distributions for $\alpha = 1/2$; see Katayama et al. (2010) for details. Hence the type I error of Chen and Qin's test would be asymptotically incorrect.

[Figure 1: Distribution of T for n = 80 and p = 300, 500, based on 10,000 Monte Carlo simulations; estimated densities are shown for α = 0.1, 0.3, 0.5, 0.7, 0.9 together with the standard normal density.]

Motivated by these findings, we construct in this paper a test statistic which is asymptotically normal under $H_0$ without any assumption on the population covariance matrix. Hence the type I error of the proposed test is always asymptotically correct when the percentile points of the normal distribution are used. The asymptotic normality is obtained without assuming that the random vectors are i.i.d. multivariate normal.

The paper is organized as follows. In Section 2, we introduce the proposed test and establish its asymptotic normality. Simulations and a real data example are given in Section 3. All proofs are given in Section 4.

2 Proposed Test

In this section, we propose a test statistic for (1.1). Its asymptotic normality under $H_0$ is obtained without any assumption on the population covariance matrix. As in Srivastava (2009), we assume the model

    X_p^{(i)} = \mu_p + C_p Z_p^{(i)}, \quad i = 1, \ldots, n,    (2.1)

where $C_p$ is a $p \times p$ matrix with $\Sigma_p = C_p C_p^T$ and $Z_p^{(i)} = (z_{i1}, \ldots, z_{ip})^T$. The $z_{ij}$ are assumed to be independent random variables satisfying $E(z_{ij}) = 0$, $E(z_{ij}^2) = 1$ and $E(z_{ij}^4) = \gamma < \infty$. For high-dimensional inference, we assume the following condition:

Condition A. $n = n(p) \to \infty$ as $p \to \infty$.

We propose the test statistic

    T_{n,p} = \mathrm{tr}\, X_{n,p}^T M_n X_{n,p},    (2.2)

where $X_{n,p} = (X_p^{(1)}, \ldots, X_p^{(n)})^T$ is the $n \times p$ data matrix and $M_n$ is a known $n \times n$ symmetric matrix satisfying the following condition:

Condition B. $M_n = (m_{ij})$ with $m_{ij} \ge 0$ and $m_{ii} = 0$; $1_n^T M_n 1_n \neq 0$; $\mathrm{tr}\, M_n^4 = o\{(\mathrm{tr}\, M_n^2)^2\}$.

The proposed statistic can equivalently be written as $T_{n,p} = \sum_{i \neq j} m_{ij} X_p^{(i)T} X_p^{(j)}$. Its mean and variance are given by

    E(T_{n,p}) = (1_n^T M_n 1_n)\, \mu_p^T \mu_p,    (2.3)
    \mathrm{Var}(T_{n,p}) = 2\, \mathrm{tr}\, M_n^2\, \mathrm{tr}\, \Sigma_p^2 + 4 (1_n^T M_n^2 1_n)\, \mu_p^T \Sigma_p \mu_p.    (2.4)
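To make the construction concrete, here is a minimal numerical sketch of the statistic (2.2) (hypothetical helper names, assuming numpy; the banded weight matrix follows example (i) discussed later in this section, with $\rho$ a tuning choice not fixed by the paper):

```python
import numpy as np

def banded_M(n, rho=0.5):
    """Weight matrix with m_ij = rho^{|i-j|} for i != j and m_ii = 0,
    i.e. example (i) of Section 2 (rho in (0,1) is a tuning choice)."""
    idx = np.arange(n)
    M = rho ** np.abs(idx[:, None] - idx[None, :])
    np.fill_diagonal(M, 0.0)   # Condition B requires a zero diagonal
    return M

def T_stat(X, M):
    """T_{n,p} = tr(X^T M_n X) = sum_{i != j} m_ij X_i^T X_j, eq. (2.2);
    X is the n x p data matrix whose rows are the X_i."""
    return float(np.trace(X.T @ M @ X))
```

Because $m_{ii} = 0$, the trace form automatically excludes the squared-norm terms $X_i^T X_i$, which is consistent with the mean formula (2.3) containing no $\mathrm{tr}\,\Sigma_p$ term.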
The derivation of (2.4) is a little involved, while (2.3) is easily obtained; we give the derivation of (2.4) in Section 4. Since $1_n^T M_n 1_n \neq 0$ under Condition B, $T_{n,p}$ is a valid statistic for testing the null hypothesis in (1.1). In view of $\mathrm{tr}\, M_n^2 \neq 0$, we also note that $\mathrm{Var}(T_{n,p}) \neq 0$ for any population mean vector $\mu_p$. For the asymptotic normality of $T_{n,p}$ under the null hypothesis $H_0$ in (1.1), we have the following theorem.

Theorem 2.1. Under the null hypothesis $H_0$ and Conditions A and B,

    \tilde T_{n,p} = \frac{T_{n,p}}{\sqrt{2\, \mathrm{tr}\, M_n^2\, \mathrm{tr}\, \Sigma_p^2}} \xrightarrow{d} N(0,1), \quad p \to \infty,

where $\xrightarrow{d}$ denotes convergence in distribution and $N(0,1)$ denotes the standard normal distribution.

Since $\mathrm{tr}\,\Sigma_p^2$ is generally unknown, we need to estimate it. Define

    U_2 = \frac{1}{P_n^2} \sum_{i \neq j} \big( X_p^{(i)T} X_p^{(j)} \big)^2,
    U_3 = \frac{1}{P_n^3} \sum_{i \neq j \neq k} X_p^{(i)T} X_p^{(j)}\, X_p^{(i)T} X_p^{(k)},
    U_4 = \frac{1}{P_n^4} \sum_{i \neq j \neq k \neq l} X_p^{(i)T} X_p^{(j)}\, X_p^{(k)T} X_p^{(l)},

where $P_n^r = n!/(n-r)!$ and each sum runs over mutually distinct indices. Following Chen et al. (2010), an unbiased estimator $\widehat{\mathrm{tr}\,\Sigma_p^2}$ of $\mathrm{tr}\,\Sigma_p^2$ is given by

    \widehat{\mathrm{tr}\,\Sigma_p^2} = U_2 - 2 U_3 + U_4.    (2.5)

In addition to unbiasedness, the estimator $\widehat{\mathrm{tr}\,\Sigma_p^2}$ is ratio-consistent, as stated in the following theorem.

Theorem 2.2. Under Condition A, we have

    \widehat{\mathrm{tr}\,\Sigma_p^2} \,/\, \mathrm{tr}\,\Sigma_p^2 \xrightarrow{P} 1, \quad p \to \infty,    (2.6)

where $\xrightarrow{P}$ denotes convergence in probability.
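The estimator (2.5) can be sketched by brute force as follows (a naive $O(n^4)$ implementation assuming numpy; only suitable for small $n$, but it makes the definition of the $U_r$ unambiguous):

```python
import numpy as np

def tr_sigma2_hat(X):
    """Unbiased estimator (2.5): U_2 - 2*U_3 + U_4, with each U_r summed
    directly over tuples of mutually distinct indices (O(n^4))."""
    n = X.shape[0]
    A = X @ X.T                      # Gram matrix: a_ij = X_i^T X_j
    P2 = n * (n - 1)
    P3 = P2 * (n - 2)
    P4 = P3 * (n - 3)
    U2 = sum(A[i, j] ** 2
             for i in range(n) for j in range(n) if i != j) / P2
    U3 = sum(A[i, j] * A[i, k]
             for i in range(n) for j in range(n) for k in range(n)
             if len({i, j, k}) == 3) / P3
    U4 = sum(A[i, j] * A[k, l]
             for i in range(n) for j in range(n)
             for k in range(n) for l in range(n)
             if len({i, j, k, l}) == 4) / P4
    return U2 - 2 * U3 + U4
```

The estimator is exactly invariant under the location shift $X_i \to X_i + c_p$, a property used in the proof of Theorem 2.2 in Section 4.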
From Theorems 2.1 and 2.2, under $H_0$ and Conditions A and B, we have

    \frac{T_{n,p}}{\sqrt{2\, \mathrm{tr}\, M_n^2\, \widehat{\mathrm{tr}\,\Sigma_p^2}}} \xrightarrow{d} N(0,1), \quad p \to \infty.

This result requires no conditions on the population covariance matrix. The asymptotic normality is attained by using a known matrix $M_n$ satisfying Condition B. For instance, one may take

(i) $M_n = (\rho^{|i-j|}\, I(|i-j| > 0))$ with $\rho \in (0,1)$;
(ii) $M_n = (|i-j|^{-s}\, I(|i-j| > 0))$ with $s > 1$;
(iii) $M_n = (m_{|i-j|}\, I(|i-j| > 0))$ with $\lim_{n \to \infty} \sum_{k=1}^{n-1} m_k < \infty$,

where $I(E)$ for an event $E$ equals 1 when $E$ is true and 0 when $E$ is false. The matrices in (i) and (ii) are special cases of (iii). The Szegö theorem (see, e.g., Grenander and Szegö, 1958) can be applied to show that (iii) meets Condition B.

For the asymptotic power of $T_{n,p}$, we assume the following condition:

Condition C. $(1_n^T M_n^2 1_n)\, \mu_p^T \Sigma_p \mu_p = o(\mathrm{tr}\, M_n^2\, \mathrm{tr}\, \Sigma_p^2)$.

The following theorem establishes the asymptotic power of $T_{n,p}$ under the local alternative defined by Condition C.

Theorem 2.3. Under the alternative $H_1$ and Conditions A–C, we have

    P\big( \tilde T_{n,p} > z_{1-\alpha} \mid H_1 \big) - \Phi\Big( -z_{1-\alpha} + \frac{(1_n^T M_n 1_n)\, \mu_p^T \mu_p}{\sqrt{2\, \mathrm{tr}\, M_n^2\, \mathrm{tr}\, \Sigma_p^2}} \Big) \to 0, \quad p \to \infty,

where $\Phi$ denotes the distribution function of the standard normal distribution, $z_{1-\alpha}$ is its upper $\alpha$ percentile point, and $P(\cdot \mid H_1)$ denotes probability under the alternative $H_1$.

From Theorem 2.3, the asymptotic power of $T_{n,p}$ depends on $\lambda_n = 1_n^T M_n 1_n / (\mathrm{tr}\, M_n^2)^{1/2}$. Hence selecting a matrix $M_n$ with larger $\lambda_n$ should lead to larger power of $T_{n,p}$. The matrix maximizing $\lambda_n$ is obviously $c\, 1_n 1_n^T$ for any constant $c$; however, this matrix satisfies neither $m_{ii} = 0$ nor $\mathrm{tr}\, M_n^4 = o\{(\mathrm{tr}\, M_n^2)^2\}$ in Condition B. We therefore need to maximize $\lambda_n$ over the matrices satisfying Condition B. Unfortunately, it is hard to find such a matrix even within the class (iii), since the number of parameters to be determined in maximizing $\lambda_n$ grows with the sample size. For this reason, we compare $T_{n,p}$ using (i) and (ii) numerically in the next section.

It should be mentioned that when we use the matrix $M_n$ maximizing $\lambda_n$ subject only to $m_{ii} = 0$ in Condition B, the proposed statistic $T_{n,p}$ is equivalent to the statistic $G_{n,p}$ given in (1.3), and hence to $F_{n,p}$ given in (1.2). Thus the power of the proposed test is expected to be lower than that of those two tests. In this view, the existing tests should be used when $\Sigma_p$ is known and the asymptotic distribution can be specified using the results of Katayama et al. (2010). Nevertheless, the proposed test is useful for obtaining the correct type I error when no information about $\Sigma_p$ is available.

In the rest of this section, we consider the consistency of the proposed test. Assume that $M_n$ is given by (iii). Then we have

    1_n^T M_n 1_n = 2 \sum_{k=1}^{n-1} (n-k)\, m_k = O(n) \quad \text{and} \quad \mathrm{tr}\, M_n^2 = 2 \sum_{k=1}^{n-1} (n-k)\, m_k^2 = O(n),

since $\lim_{n \to \infty} \sum_{k=1}^{n-1} m_k < \infty$ and $\sum_{k=1}^{n-1} m_k^2 \le (\sum_{k=1}^{n-1} m_k)^2$. Hence

    1_n^T M_n 1_n \,/\, (\mathrm{tr}\, M_n^2)^{1/2} = O(\sqrt{n}).    (2.7)

Theorem 2.3 and (2.7) yield the following theorem.

Theorem 2.4. Assume Conditions A–C and that $M_n$ has the form given by (iii). Let $\delta_{n,p} = n^{1/2}\, \mu_p^T \mu_p / (\mathrm{tr}\, \Sigma_p^2)^{1/2}$. Under the alternative $H_1$, if $\delta_{n,p} \to \infty$ as $p \to \infty$, then $P(\tilde T_{n,p} > z_{1-\alpha} \mid H_1) \to 1$.

3 Numerical Results

In this section, we evaluate the performance of the proposed test statistic $T_{n,p}$ against the existing statistic $T$ by computing empirical significance levels and power. Performance on a DNA microarray data set is also evaluated.
3.1 Computational Complexity of $\widehat{\mathrm{tr}\,\Sigma_p^2}$

Before conducting the simulations and the DNA microarray data analysis, we need to mention the computational complexity of $\widehat{\mathrm{tr}\,\Sigma_p^2}$. A large amount of time is required to calculate it directly, since it involves summation over index sets of up to quadruplets, i.e., $\{(i,j,k,l) : 1 \le i,j,k,l \le n\}$. Let $a_{ij} = X_p^{(i)T} X_p^{(j)}$ for the sake of simplicity, and write $r_i = \sum_{j=1}^n a_{ij}$ for the row sums. Then it follows that

    P_n^2 U_2 = \sum_{i,j=1}^n a_{ij}^2 - \sum_{i=1}^n a_{ii}^2,

    P_n^3 U_3 = \sum_{i=1}^n r_i^2 - \sum_{i,j=1}^n a_{ij}^2 - 2 \sum_{i=1}^n a_{ii} r_i + 2 \sum_{i=1}^n a_{ii}^2,    (3.1)

and

    P_n^4 U_4 = \Big( \sum_{i,j=1}^n a_{ij} \Big)^2 - 2 \Big( \sum_{i=1}^n a_{ii} \Big) \Big( \sum_{i,j=1}^n a_{ij} \Big) + \Big( \sum_{i=1}^n a_{ii} \Big)^2 - 4 \sum_{i=1}^n r_i^2 + 8 \sum_{i=1}^n a_{ii} r_i + 2 \sum_{i,j=1}^n a_{ij}^2 - 6 \sum_{i=1}^n a_{ii}^2.    (3.2)

Thus, when the $a_{ij}$ ($i, j = 1, \ldots, n$) are given, only $O(n^2)$ operations are required to calculate $\widehat{\mathrm{tr}\,\Sigma_p^2}$ through (3.1) and (3.2), while direct calculation costs as many as $O(n^4)$ operations.

3.2 Simulations

We consider three population distributions in model (2.1): the $z_{ij}$ are i.i.d. as the standard normal distribution, the standardized $t$ distribution with 5 degrees of freedom, or the standardized gamma distribution with shape parameter 2 and scale parameter 2. For the population covariance matrix, we choose

    \Sigma_1 = (\sigma_i \sigma_j \rho^{|i-j|}), \quad \Sigma_2 = (\sigma_i \sigma_j \rho), \quad \Sigma_3 = (\sigma_i \sigma_j u_{ij}),

where $\rho = 0.5$, $\sigma_i^2 = 2 + (p - i + 1)/p$, $u_{ii} = 1$ and $u_{ij} = u_{ji}$ ($i \neq j$) are i.i.d. $U(0,1)$, the uniform distribution on $(0,1)$. We note that $T_{n,p}$ and $T$ involve the unknown parameter $\mathrm{tr}\,\Sigma_p^2$; the estimator (2.5) is used in the following simulations. The empirical significance level and power are calculated from 10,000 Monte Carlo replications.

[Table 1: Empirical significance levels for Σ1, Σ2 and Σ3 at 5% and 1% significance based on the normal approximation; rows are the covariance matrices Σ1–Σ3 crossed with the normal, t and gamma populations, and columns correspond to New Test (1), New Test (2) and Chen and Qin's test. The numerical entries are not recoverable from this copy.]
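The $O(n^2)$ reduction of Section 3.1 can be sketched as follows. The overlap-counting identities used below are an algebraically equivalent form of (3.1)–(3.2), obtained by first zeroing the diagonal of the Gram matrix, rather than a verbatim transcription of the paper's display:

```python
import numpy as np

def tr_sigma2_hat_fast(X):
    """O(n^2) evaluation of the estimator (2.5) from a_ij = X_i^T X_j.
    B is the Gram matrix with its diagonal zeroed, so the sums over
    distinct index tuples reduce to row sums and Frobenius quantities."""
    n = X.shape[0]
    B = X @ X.T
    np.fill_diagonal(B, 0.0)             # b_ij = a_ij for i != j, b_ii = 0
    F = np.sum(B ** 2)                   # sum_{i != j} a_ij^2
    rho = B.sum(axis=1)                  # row sums of B
    s = B.sum()
    P2 = n * (n - 1)
    P3 = P2 * (n - 2)
    P4 = P3 * (n - 3)
    U2 = F / P2
    U3 = (np.sum(rho ** 2) - F) / P3                    # one shared index
    U4 = (s ** 2 - 4 * np.sum(rho ** 2) + 2 * F) / P4   # all indices distinct
    return U2 - 2 * U3 + U4
```

For moderate $n$ this agrees with the direct $O(n^4)$ summation over distinct index tuples, while scaling far better in the sample size.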
In Table 1, we evaluate the empirical significance levels of the three statistics based on the normal approximation at the 5% and 1% levels, with $n = 80$ and $p = 100, 300, 500$. In the table, New Test (1) and New Test (2) denote the proposed test using (i) and (ii) as $M_n$, respectively, and $T$ denotes the test proposed by Chen and Qin (2010). In view of the requirement $\mathrm{tr}\, M_n^4 = o\{(\mathrm{tr}\, M_n^2)^2\}$ in Condition B for the asymptotic normality of $T_{n,p}$, we determine $\rho$ and $s$ in (i) and (ii) so that $\mathrm{tr}\, M_n^4 / (\mathrm{tr}\, M_n^2)^2$ equals a common target value (the value itself is lost in this copy). The proposed test is found to give the better approximation to the significance level, particularly at 1%, for all populations and population covariance matrices considered.

We write $\eta = \mu_p^T \mu_p / (\mathrm{tr}\,\Sigma_p^2)^{1/2}$ for comparing the empirical powers of the three tests; this simulation is conducted at the 5% level. In Figure 2, we show the empirical power curves for $n = 80$, $p = 300$ and $\Sigma_1$, $\Sigma_2$, $\Sigma_3$ with the standard normal, $t$ and gamma distributions. The power of the proposed test using (ii) as $M_n$ is slightly higher than that using (i). As described in Section 2, the power of the proposed test is theoretically lower than that of Chen and Qin's test, and this is also seen in the figure. To improve the power of the proposed test, one may need to find an optimal matrix $M_n$ among the matrices satisfying Condition B; this is left as a future problem.

3.3 A Real Data Example

We evaluate the performance of the proposed test on the prostate cancer data (Singh et al., 2002). The data contain 12,600 genes for 102 patients, 52 of whom are prostate tumor patients and 50 of whom are prostate normal patients. Following Dudoit et al. (2002), we preprocess the data by first truncating intensities to the closed interval [1, 5000], then removing genes with max/min ≤ 5 or max − min ≤ 150, where max and min denote the maximum and minimum intensities over the 102 patients, and finally transforming all intensities to base-10 logarithms. This reduces the number of genes to $p = 2{,}745$.
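The preprocessing steps above can be sketched as follows (a hypothetical helper assuming numpy; the thresholds are taken from the text as reproduced here):

```python
import numpy as np

def preprocess(expr, low=1.0, high=5000.0, fold=5.0, spread=150.0):
    """Gene filtering in the style of Dudoit et al. (2002), Section 3.3:
    (1) truncate intensities to [low, high];
    (2) drop genes with max/min <= fold or max - min <= spread,
        where max/min are taken over patients (rows);
    (3) take base-10 logarithms of the remaining intensities.
    expr: (n_patients, n_genes) array of raw intensities."""
    Xc = np.clip(expr, low, high)
    gmax = Xc.max(axis=0)
    gmin = Xc.min(axis=0)
    keep = (gmax / gmin > fold) & (gmax - gmin > spread)
    return np.log10(Xc[:, keep])
```

The gene filter removes near-constant genes, so the retained intensity matrix has both adequate fold change and absolute spread before the log transform.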
We use $n = 50$ observations from each of the tumor and normal groups, which leads to the situation of (1.1). In Figure 3, we plot the distributions of $T_{n,p}$ and $T$ under $H_0$ using the parametric bootstrap method; that is, we draw 50 samples from $N_p(0_p, S_{n,p})$ ten thousand times and compute both statistics each time. For the proposed statistic $T_{n,p}$, the matrix $M_n$ is given by (ii), with the parameter $s$ determined so that $\mathrm{tr}\, M_n^4 / (\mathrm{tr}\, M_n^2)^2$ equals the same target value as in the simulations (the value is lost in this copy). The figure shows that the distribution of the proposed test statistic is much closer to the standard normal distribution than that of the existing one.

[Figure 2: Power curves of the proposed test and Chen and Qin's test for n = 80, p = 300, Σ1, Σ2 and Σ3 with normal, t and gamma distributions.]

[Figure 3: Distributions of T_{n,p} and T under H_0 based on the parametric bootstrap method.]

4 Proofs

In this section, we give the proofs of equation (2.4) and of the theorems in Section 2. Throughout this section, we write $X_i$ and $Z_i$ for $X_p^{(i)}$ and $Z_p^{(i)}$, respectively.
4.1 Proof of equation (2.4)

Note that

    T_{n,p} = \sum_{i \neq j} m_{ij} (X_i - \mu_p)^T (X_j - \mu_p) + 2 \sum_{i \neq j} m_{ij}\, \mu_p^T X_j - \sum_{i \neq j} m_{ij}\, \mu_p^T \mu_p
            = \sum_{i \neq j} m_{ij}\, Z_i^T C_p^T C_p Z_j + 2 \sum_{i \neq j} m_{ij}\, \mu_p^T C_p Z_j + \sum_{i \neq j} m_{ij}\, \mu_p^T \mu_p.

Hence we have

    \mathrm{Var}(T_{n,p}) = \mathrm{Var}\Big( \sum_{i \neq j} m_{ij} Z_i^T C_p^T C_p Z_j \Big) + 4\, \mathrm{Var}\Big( \sum_{i \neq j} m_{ij}\, \mu_p^T C_p Z_j \Big) + 4 \sum_{i \neq j} \sum_{k \neq l} m_{ij} m_{kl}\, \mathrm{Cov}\big( Z_i^T C_p^T C_p Z_j,\; \mu_p^T C_p Z_l \big) = V_1 + 4 V_2 + 4 V_3, \text{ say}.

Clearly $V_3 = 0$, since the $Z_i$ are independent and $E(Z_i) = 0_p$. For the first term $V_1$, using the vec operator $\mathrm{vec}(\cdot)$ and the Kronecker product (see, e.g., Schott, 2005), it follows that

    \sum_{i \neq j} m_{ij}\, Z_i^T C_p^T C_p Z_j = \mathrm{vec}(Z_{n,p})^T \big( C_p^T C_p \otimes M_n \big)\, \mathrm{vec}(Z_{n,p}),

where $Z_{n,p} = (Z_1, \ldots, Z_n)^T$. Then, from Lemma 2.1 of Srivastava (2009), noting that the diagonal of $C_p^T C_p \otimes M_n$ vanishes because $m_{ii} = 0$,

    V_1 = 2\, \mathrm{tr}\big( C_p^T C_p \otimes M_n \big)^2 = 2\, \mathrm{tr}\, M_n^2\, \mathrm{tr}\, \Sigma_p^2.

For the second term $V_2$, since the $Z_j$ are independent with identity covariance matrix,

    V_2 = \mathrm{Var}\Big( \sum_{j=1}^n \Big( \sum_{i} m_{ij} \Big) \mu_p^T C_p Z_j \Big) = \sum_{j=1}^n \Big( \sum_i m_{ij} \Big)^2 \mu_p^T C_p C_p^T \mu_p = (1_n^T M_n^2 1_n)\, \mu_p^T \Sigma_p \mu_p.

Thus equation (2.4) is derived.
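The vec–Kronecker identity used for $V_1$ can be checked numerically. The small script below verifies $\sum_{i \neq j} m_{ij} Z_i^T C_p^T C_p Z_j = \mathrm{vec}(Z_{n,p})^T (C_p^T C_p \otimes M_n)\, \mathrm{vec}(Z_{n,p})$, assuming the column-stacking convention for vec (an assumption made explicit in the code):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 5, 3
Z = rng.normal(size=(n, p))              # Z_{n,p} = (Z_1, ..., Z_n)^T
C = rng.normal(size=(p, p))
D = C.T @ C                              # D = C_p^T C_p (symmetric)
M = rng.normal(size=(n, n))
M = (M + M.T) / 2                        # symmetric weight matrix
np.fill_diagonal(M, 0.0)                 # m_ii = 0, as in Condition B

# left-hand side: the double sum over i, j (diagonal terms vanish anyway)
lhs = sum(M[i, j] * float(Z[i] @ D @ Z[j])
          for i in range(n) for j in range(n))

# right-hand side: quadratic form in vec(Z) with the Kronecker product
vecZ = Z.flatten(order="F")              # column-stacking vec
rhs = float(vecZ @ np.kron(D, M) @ vecZ)

assert np.isclose(lhs, rhs)
```

Because $m_{ii} = 0$, the diagonal of $D \otimes M_n$ vanishes, which is why the fourth-moment term of the quadratic-form variance drops out and only $2\,\mathrm{tr}(D \otimes M_n)^2 = 2\,\mathrm{tr}\,M_n^2\,\mathrm{tr}\,\Sigma_p^2$ remains.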
4.2 Proof of Theorem 2.1

This proof is based on results in Srivastava (2009) and Chen and Qin (2010). Let $\tilde m_{ij} = m_{ij} / (2\, \mathrm{tr}\, M_n^2\, \mathrm{tr}\, \Sigma_p^2)^{1/2}$ and $\tilde M_n = (\tilde m_{ij})$. Then we have

    \tilde T_{n,p} = \frac{\sum_{i \neq j} m_{ij} X_i^T X_j}{\sqrt{2\, \mathrm{tr}\, M_n^2\, \mathrm{tr}\, \Sigma_p^2}} = \sum_{i \neq j} \tilde m_{ij} X_i^T X_j = 2 \sum_{j=2}^n \sum_{i=1}^{j-1} \tilde m_{ij} X_i^T X_j = 2 \sum_{j=2}^n Y_j,

where $Y_j = \sum_{i=1}^{j-1} \tilde m_{ij} X_i^T X_j$. We need to show that

    Q_n = \sum_{j=2}^n Y_j \xrightarrow{d} N(0, 1/4), \quad p \to \infty.    (4.1)

We apply the martingale central limit theorem (Hall and Heyde, 1980) to $Q_n$ to show (4.1). Let $\mathcal{F}_j$ be the $\sigma$-algebra generated by $\{X_1, \ldots, X_j\}$; then $\mathcal{F}_{j-1} \subset \mathcal{F}_j$ for $j = 2, \ldots, n$. It is easily obtained that $E(Q_n) = 0$ and $E(Q_n^2) < \infty$. For $n > m$ we have $Q_n = Q_m + \sum_{j=m+1}^n Y_j$ and $E(\sum_{j=m+1}^n Y_j \mid \mathcal{F}_m) = 0$, since for $m+1 \le j \le n$, $E(X_i^T X_j \mid \mathcal{F}_m) = X_i^T E(X_j) = 0$ for $1 \le i \le m$ and $E(X_i^T X_j \mid \mathcal{F}_m) = E(X_i^T X_j) = 0$ for $m+1 \le i \le j-1$. Hence $\{Q_n, \mathcal{F}_n\}$ is a zero-mean, square-integrable martingale array. It suffices to show that, under Conditions A and B,

    \sum_{j=2}^n E(Y_j^2 \mid \mathcal{F}_{j-1}) \xrightarrow{P} \frac{1}{4}, \quad p \to \infty,    (4.2)
    \sum_{j=2}^n E(Y_j^4) \to 0, \quad p \to \infty.    (4.3)

We first show (4.2). Note that

    \sum_{j=2}^n E(Y_j^2 \mid \mathcal{F}_{j-1}) = \sum_{j=2}^n E\Big\{ \Big( \sum_{i=1}^{j-1} \tilde m_{ij} X_i^T X_j \Big)^2 \,\Big|\, \mathcal{F}_{j-1} \Big\} = \sum_{i<j} \tilde m_{ij}^2\, X_i^T \Sigma_p X_i + \sum_{j=2}^n \sum_{i_1 \neq i_2}^{j-1} \tilde m_{i_1 j} \tilde m_{i_2 j}\, X_{i_1}^T \Sigma_p X_{i_2} = A_1 + A_2, \text{ say}.

Also note that

    E(A_1) = \sum_{i<j} \tilde m_{ij}^2\, E(X_i^T \Sigma_p X_i) = \sum_{i<j} \tilde m_{ij}^2\, \mathrm{tr}\, \Sigma_p^2 = \frac{1}{2}\, \mathrm{tr}\, \tilde M_n^2\, \mathrm{tr}\, \Sigma_p^2 = \frac{1}{4},

and we have

    \mathrm{Var}(A_1) = \sum_{i=1}^{n-1} \Big( \sum_{j=i+1}^n \tilde m_{ij}^2 \Big)^2 \mathrm{Var}(X_i^T \Sigma_p X_i) = \sum_{i=1}^{n-1} \Big( \sum_{j=i+1}^n \tilde m_{ij}^2 \Big)^2 \Big[ (\gamma - 3) \sum_{k=1}^p v_{kk}^2 + 2\, \mathrm{tr}\, \Sigma_p^4 \Big],

where the $v_{kk}$ are the diagonal elements of $C_p^T \Sigma_p C_p = (C_p^T C_p)(C_p^T C_p)$; the last equality is derived from Lemma 2.1 of Srivastava (2009). Let $D_p = C_p^T C_p = (d_{ij})$. Following Lemma 2.6 of Srivastava (2009), we note that

    \sum_{k=1}^p v_{kk}^2 = \sum_{k=1}^p \Big( \sum_{i=1}^p d_{ik}^2 \Big)^2 \le \mathrm{tr}(D_p^T D_p)^2 = \mathrm{tr}\, \Sigma_p^4.    (4.4)

Also, under Condition B,

    \sum_{i=1}^{n-1} \Big( \sum_{j=i+1}^n \tilde m_{ij}^2 \Big)^2 \le \sum_{i=1}^n \big\{ (\tilde M_n^2)_{ii} \big\}^2 \le \mathrm{tr}\, \tilde M_n^4.

Hence $\mathrm{Var}(A_1) = O(\mathrm{tr}\, \tilde M_n^4\, \mathrm{tr}\, \Sigma_p^4)$. Since $\mathrm{tr}\, \Sigma_p^4 / (\mathrm{tr}\, \Sigma_p^2)^2 \le 1$, we obtain

    \mathrm{tr}\, \tilde M_n^4\, \mathrm{tr}\, \Sigma_p^4 = \frac{\mathrm{tr}\, M_n^4\, \mathrm{tr}\, \Sigma_p^4}{4 (\mathrm{tr}\, M_n^2)^2 (\mathrm{tr}\, \Sigma_p^2)^2} \to 0, \quad p \to \infty,    (4.5)

under Conditions A and B. Thus $A_1 \xrightarrow{P} 1/4$ by Chebyshev's inequality.

Next, we shall show $A_2 \xrightarrow{P} 0$. Since

    A_2 = \sum_{i_1 \neq i_2} X_{i_1}^T \Sigma_p X_{i_2} \sum_{j = \max\{i_1, i_2\} + 1}^n \tilde m_{i_1 j} \tilde m_{i_2 j},

and $E(X_{i_1}^T \Sigma_p X_{i_2}\, X_{i_3}^T \Sigma_p X_{i_4}) = 0$ when at least one index differs between $\{i_1, i_2\}$ and $\{i_3, i_4\}$, we have $E(A_2) = 0$ and

    E(A_2^2) = 2 \sum_{i_1 \neq i_2} E(X_{i_1}^T \Sigma_p X_{i_2})^2 \Big( \sum_{j = \max\{i_1, i_2\} + 1}^n \tilde m_{i_1 j} \tilde m_{i_2 j} \Big)^2.

Note that $E(X_{i_1}^T \Sigma_p X_{i_2})^2 = \mathrm{tr}\, \Sigma_p^4$ for $i_1 \neq i_2$ and, since $\tilde m_{ij} \ge 0$ under Condition B,

    \sum_{i_1 \neq i_2} \Big( \sum_{j = \max\{i_1, i_2\} + 1}^n \tilde m_{i_1 j} \tilde m_{i_2 j} \Big)^2 \le \sum_{i_1 \neq i_2} \Big( \sum_{j=1}^n \tilde m_{i_1 j} \tilde m_{i_2 j} \Big)^2 + \sum_i \Big( \sum_j \tilde m_{ij}^2 \Big)^2 = \mathrm{tr}\, \tilde M_n^4.

Then we have $E(A_2^2) = O(\mathrm{tr}\, \tilde M_n^4\, \mathrm{tr}\, \Sigma_p^4)$, and from (4.5), $E(A_2^2) \to 0$ as $p \to \infty$ under Conditions A and B. Chebyshev's inequality yields $A_2 \xrightarrow{P} 0$.

Next, we verify (4.3). Expanding the fourth power and noting that cross terms containing an odd power of some $X_i^T X_j$ have zero expectation,

    E(Y_j^4) = E\Big( \sum_{i=1}^{j-1} \tilde m_{ij} X_i^T X_j \Big)^4 = \sum_{i=1}^{j-1} \tilde m_{ij}^4\, E(X_i^T X_j)^4 + 3 \sum_{i_1 \neq i_2}^{j-1} \tilde m_{i_1 j}^2 \tilde m_{i_2 j}^2\, E\big[ (X_{i_1}^T X_j)^2 (X_{i_2}^T X_j)^2 \big].

Following Chen and Qin (2010), we have

    E(X_i^T X_j)^4 = O(\mathrm{tr}\, \Sigma_p^4) + O\{(\mathrm{tr}\, \Sigma_p^2)^2\}, \quad i \neq j.

Also we have, from (4.4),

    E\big[ (X_{i_1}^T X_j)^2 (X_{i_2}^T X_j)^2 \big] = E\big[ (X_j^T \Sigma_p X_j)^2 \big] = (\gamma - 3) \sum_{k=1}^p v_{kk}^2 + 2\, \mathrm{tr}\, \Sigma_p^4 + (\mathrm{tr}\, \Sigma_p^2)^2 = O(\mathrm{tr}\, \Sigma_p^4) + O\{(\mathrm{tr}\, \Sigma_p^2)^2\}, \quad i_1 \neq i_2 \neq j.

Following Corollary 2.7 of Srivastava (2009), we have

    \sum_{j=2}^n \sum_{i=1}^{j-1} \tilde m_{ij}^4 \le \sum_{i,j} \tilde m_{ij}^4 \le \mathrm{tr}\, \tilde M_n^4

and

    \sum_{j=2}^n \sum_{i_1 \neq i_2}^{j-1} \tilde m_{i_1 j}^2 \tilde m_{i_2 j}^2 \le \sum_j \Big( \sum_i \tilde m_{ij}^2 \Big)^2 \le \mathrm{tr}\, \tilde M_n^4.

From (4.5) and the fact that

    \mathrm{tr}\, \tilde M_n^4\, (\mathrm{tr}\, \Sigma_p^2)^2 = \frac{\mathrm{tr}\, M_n^4}{4 (\mathrm{tr}\, M_n^2)^2} \to 0, \quad p \to \infty,

under Conditions A and B, we obtain $\sum_{j=2}^n E(Y_j^4) \to 0$ as $p \to \infty$. This ends the proof.

4.3 Proof of Theorem 2.2

Note that $\widehat{\mathrm{tr}\,\Sigma_p^2}$ is invariant under the location transformations $X_i \to X_i + c_p$, where $c_p$ is an arbitrary constant $p$-vector. Hence we may assume $\mu_p = 0$ without loss of generality. Obviously, we have $E(U_2) = \mathrm{tr}\, \Sigma_p^2$ and $E(U_3) = E(U_4) = 0$. It follows from Chen et al. (2010) that

    \mathrm{Var}(U_2) = \frac{4}{n^2} (\mathrm{tr}\, \Sigma_p^2)^2 + \frac{\gamma - 3}{n} \Big\{ \mathrm{tr}\, \Sigma_p^4 + \mathrm{tr}\big[ (C_p^T C_p)^2 \circ (C_p^T C_p)^2 \big] \Big\} + O\Big\{ \frac{1}{n^3} (\mathrm{tr}\, \Sigma_p^2)^2 + \frac{1}{n^2}\, \mathrm{tr}\, \Sigma_p^4 \Big\},

    \mathrm{Var}(U_3) = \frac{2}{n^3} (\mathrm{tr}\, \Sigma_p^2)^2 + O\Big\{ \frac{1}{n^2}\, \mathrm{tr}\, \Sigma_p^4 + \frac{1}{n^4} (\mathrm{tr}\, \Sigma_p^2)^2 \Big\}

and

    \mathrm{Var}(U_4) = \frac{8}{n^4} (\mathrm{tr}\, \Sigma_p^2)^2 + O\Big\{ \frac{1}{n^5} (\mathrm{tr}\, \Sigma_p^2)^2 + \frac{1}{n^4}\, \mathrm{tr}\, \Sigma_p^4 \Big\},

where the symbol $\circ$ denotes the Hadamard product. Using the facts that $\mathrm{tr}[(C_p^T C_p)^2 \circ (C_p^T C_p)^2] \le \mathrm{tr}(C_p^T C_p)^4 = \mathrm{tr}\, \Sigma_p^4$ and $\mathrm{tr}\, \Sigma_p^4 \le (\mathrm{tr}\, \Sigma_p^2)^2$, we have $\mathrm{Var}(U_i / \mathrm{tr}\, \Sigma_p^2) \to 0$ for $i = 2, 3, 4$ under Condition A. This and Chebyshev's inequality yield (2.6).

4.4 Proof of Theorem 2.3

Applying Theorem 2.1 to the centered vectors $X_i - \mu_p$, we have

    \frac{\sum_{i \neq j} m_{ij} (X_i - \mu_p)^T (X_j - \mu_p)}{\sigma_{n,p}} \xrightarrow{d} N(0, 1), \quad p \to \infty,

under Conditions A and B, where $\sigma_{n,p} = (2\, \mathrm{tr}\, M_n^2\, \mathrm{tr}\, \Sigma_p^2)^{1/2}$. Note that

    T_{n,p} = \sum_{i \neq j} m_{ij} (X_i - \mu_p)^T (X_j - \mu_p) + 2 \sum_{i \neq j} m_{ij}\, \mu_p^T X_j - (1_n^T M_n 1_n)\, \mu_p^T \mu_p.

Hence it remains to show that

    \frac{2 \sum_{i \neq j} m_{ij}\, \mu_p^T X_j - 2 (1_n^T M_n 1_n)\, \mu_p^T \mu_p}{\sigma_{n,p}} \xrightarrow{P} 0, \quad p \to \infty.    (4.6)

We notice that $E(2 \sum_{i \neq j} m_{ij}\, \mu_p^T X_j) = 2 (1_n^T M_n 1_n)\, \mu_p^T \mu_p$ and

    \mathrm{Var}\Big( 2 \sum_{i \neq j} m_{ij}\, \mu_p^T X_j \Big) = 4 (1_n^T M_n^2 1_n)\, \mu_p^T \Sigma_p \mu_p = o(\sigma_{n,p}^2)

under Condition C. Chebyshev's inequality yields (4.6). Slutsky's theorem together with (4.6) completes the proof.
References

[1] Bai, Z. and Saranadasa, H. (1996). Effect of high dimension: by an example of a two sample problem. Statistica Sinica 6.

[2] Chen, S. X. and Qin, Y. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics 38.

[3] Chen, S. X., Zhang, L. X. and Zhong, P. S. (2010). Tests for high-dimensional covariance matrices. Journal of the American Statistical Association 105.

[4] Dudoit, S., Fridlyand, J. and Speed, T. P. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 97.

[5] Grenander, U. and Szegö, G. (1958). Toeplitz Forms and Their Applications, 2nd edition. New York: Chelsea Publishing Company.

[6] Hall, P. and Heyde, C. (1980). Martingale Limit Theory and Its Application. New York: Academic Press.

[7] Katayama, S., Kano, Y. and Srivastava, M. S. (2010). Asymptotic distributions of some test criteria for the mean vector with fewer observations than the dimension. Submitted.

[8] Schott, J. R. (2005). Matrix Analysis for Statistics, 2nd edition. New York: Wiley.

[9] Singh, D., Febbo, P. G., Ross, K., Jackson, D. G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A. A., D'Amico, A. V., Richie, J. P., Lander, E. S., Loda, M., Kantoff, P. W., Golub, T. R. and Sellers, W. R. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1.

[10] Srivastava, M. S. (2009). A test for the mean vector with fewer observations than the dimension under non-normality. Journal of Multivariate Analysis 100.
More informationTesting equality of two mean vectors with unequal sample sizes for populations with correlation
Testing equality of two mean vectors with unequal sample sizes for populations with correlation Aya Shinozaki Naoya Okamoto 2 and Takashi Seo Department of Mathematical Information Science Tokyo University
More informationDependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.
Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,
More informationHigh-dimensional asymptotic expansions for the distributions of canonical correlations
Journal of Multivariate Analysis 100 2009) 231 242 Contents lists available at ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva High-dimensional asymptotic
More informationOn Expected Gaussian Random Determinants
On Expected Gaussian Random Determinants Moo K. Chung 1 Department of Statistics University of Wisconsin-Madison 1210 West Dayton St. Madison, WI 53706 Abstract The expectation of random determinants whose
More informationApproximate interval estimation for EPMC for improved linear discriminant rule under high dimensional frame work
Hiroshima Statistical Research Group: Technical Report Approximate interval estimation for PMC for improved linear discriminant rule under high dimensional frame work Masashi Hyodo, Tomohiro Mitani, Tetsuto
More informationMultivariate Time Series
Multivariate Time Series Notation: I do not use boldface (or anything else) to distinguish vectors from scalars. Tsay (and many other writers) do. I denote a multivariate stochastic process in the form
More informationStatistical Inference of Covariate-Adjusted Randomized Experiments
1 Statistical Inference of Covariate-Adjusted Randomized Experiments Feifang Hu Department of Statistics George Washington University Joint research with Wei Ma, Yichen Qin and Yang Li Email: feifang@gwu.edu
More information2. Matrix Algebra and Random Vectors
2. Matrix Algebra and Random Vectors 2.1 Introduction Multivariate data can be conveniently display as array of numbers. In general, a rectangular array of numbers with, for instance, n rows and p columns
More informationOn a General Two-Sample Test with New Critical Value for Multivariate Feature Selection in Chest X-Ray Image Analysis Problem
Applied Mathematical Sciences, Vol. 9, 2015, no. 147, 7317-7325 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2015.510687 On a General Two-Sample Test with New Critical Value for Multivariate
More informationConsistency of Distance-based Criterion for Selection of Variables in High-dimensional Two-Group Discriminant Analysis
Consistency of Distance-based Criterion for Selection of Variables in High-dimensional Two-Group Discriminant Analysis Tetsuro Sakurai and Yasunori Fujikoshi Center of General Education, Tokyo University
More informationTests for Covariance Matrices, particularly for High-dimensional Data
M. Rauf Ahmad Tests for Covariance Matrices, particularly for High-dimensional Data Technical Report Number 09, 200 Department of Statistics University of Munich http://www.stat.uni-muenchen.de Tests for
More informationComposite Hotelling s T-Square Test for High-Dimensional Data. 1. Introduction
A^VÇÚO 1 33 ò 1 4 Ï 2017 c 8 Chinese Journal of Applied Probability and Statistics Aug., 2017, Vol. 33, No. 4, pp. 349-368 doi: 10.3969/j.issn.1001-4268.2017.04.003 Composite Hotelling s T-Square Test
More informationThe purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.
Chapter 9 Pearson s chi-square test 9. Null hypothesis asymptotics Let X, X 2, be independent from a multinomial(, p) distribution, where p is a k-vector with nonnegative entries that sum to one. That
More informationConsistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis
Consistency of Test-based Criterion for Selection of Variables in High-dimensional Two Group-Discriminant Analysis Yasunori Fujikoshi and Tetsuro Sakurai Department of Mathematics, Graduate School of Science,
More informationAdvanced Statistics II: Non Parametric Tests
Advanced Statistics II: Non Parametric Tests Aurélien Garivier ParisTech February 27, 2011 Outline Fitting a distribution Rank Tests for the comparison of two samples Two unrelated samples: Mann-Whitney
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2
MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and
More informationComparison of Discrimination Methods for High Dimensional Data
Comparison of Discrimination Methods for High Dimensional Data M.S.Srivastava 1 and T.Kubokawa University of Toronto and University of Tokyo Abstract In microarray experiments, the dimension p of the data
More informationConsistency of test based method for selection of variables in high dimensional two group discriminant analysis
https://doi.org/10.1007/s42081-019-00032-4 ORIGINAL PAPER Consistency of test based method for selection of variables in high dimensional two group discriminant analysis Yasunori Fujikoshi 1 Tetsuro Sakurai
More informationAsymptotic Theory. L. Magee revised January 21, 2013
Asymptotic Theory L. Magee revised January 21, 2013 1 Convergence 1.1 Definitions Let a n to refer to a random variable that is a function of n random variables. Convergence in Probability The scalar a
More informationEffects of data dimension on empirical likelihood
Biometrika Advance Access published August 8, 9 Biometrika (9), pp. C 9 Biometrika Trust Printed in Great Britain doi:.9/biomet/asp7 Effects of data dimension on empirical likelihood BY SONG XI CHEN Department
More informationCourse information: Instructor: Tim Hanson, Leconte 219C, phone Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment.
Course information: Instructor: Tim Hanson, Leconte 219C, phone 777-3859. Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment. Text: Applied Linear Statistical Models (5th Edition),
More informationBootstrapping high dimensional vector: interplay between dependence and dimensionality
Bootstrapping high dimensional vector: interplay between dependence and dimensionality Xianyang Zhang Joint work with Guang Cheng University of Missouri-Columbia LDHD: Transition Workshop, 2014 Xianyang
More informationTESTING FOR EQUAL DISTRIBUTIONS IN HIGH DIMENSION
TESTING FOR EQUAL DISTRIBUTIONS IN HIGH DIMENSION Gábor J. Székely Bowling Green State University Maria L. Rizzo Ohio University October 30, 2004 Abstract We propose a new nonparametric test for equality
More informationCanonical Correlation Analysis of Longitudinal Data
Biometrics Section JSM 2008 Canonical Correlation Analysis of Longitudinal Data Jayesh Srivastava Dayanand N Naik Abstract Studying the relationship between two sets of variables is an important multivariate
More information2.1 Linear regression with matrices
21 Linear regression with matrices The values of the independent variables are united into the matrix X (design matrix), the values of the outcome and the coefficient are represented by the vectors Y and
More informationLecture Note 1: Probability Theory and Statistics
Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 1: Probability Theory and Statistics Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 For this and all future notes, if you would
More informationLecture 3 - Expectation, inequalities and laws of large numbers
Lecture 3 - Expectation, inequalities and laws of large numbers Jan Bouda FI MU April 19, 2009 Jan Bouda (FI MU) Lecture 3 - Expectation, inequalities and laws of large numbersapril 19, 2009 1 / 67 Part
More informationProbability Lecture III (August, 2006)
robability Lecture III (August, 2006) 1 Some roperties of Random Vectors and Matrices We generalize univariate notions in this section. Definition 1 Let U = U ij k l, a matrix of random variables. Suppose
More informationHigh-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis. Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi
High-Dimensional AICs for Selection of Redundancy Models in Discriminant Analysis Tetsuro Sakurai, Takeshi Nakada and Yasunori Fujikoshi Faculty of Science and Engineering, Chuo University, Kasuga, Bunkyo-ku,
More informationChp 4. Expectation and Variance
Chp 4. Expectation and Variance 1 Expectation In this chapter, we will introduce two objectives to directly reflect the properties of a random variable or vector, which are the Expectation and Variance.
More informationLarge-Scale Multiple Testing of Correlations
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 5-5-2016 Large-Scale Multiple Testing of Correlations T. Tony Cai University of Pennsylvania Weidong Liu Follow this
More informationCS281A/Stat241A Lecture 17
CS281A/Stat241A Lecture 17 p. 1/4 CS281A/Stat241A Lecture 17 Factor Analysis and State Space Models Peter Bartlett CS281A/Stat241A Lecture 17 p. 2/4 Key ideas of this lecture Factor Analysis. Recall: Gaussian
More information01 Probability Theory and Statistics Review
NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement
More informationLarge-Scale Multiple Testing of Correlations
Large-Scale Multiple Testing of Correlations T. Tony Cai and Weidong Liu Abstract Multiple testing of correlations arises in many applications including gene coexpression network analysis and brain connectivity
More informationHan-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek
J. Korean Math. Soc. 41 (2004), No. 5, pp. 883 894 CONVERGENCE OF WEIGHTED SUMS FOR DEPENDENT RANDOM VARIABLES Han-Ying Liang, Dong-Xia Zhang, and Jong-Il Baek Abstract. We discuss in this paper the strong
More informationMultivariate Analysis and Likelihood Inference
Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density
More informationSpring 2012 Math 541B Exam 1
Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote
More informationCOMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS
Communications in Statistics - Simulation and Computation 33 (2004) 431-446 COMPARISON OF FIVE TESTS FOR THE COMMON MEAN OF SEVERAL MULTIVARIATE NORMAL POPULATIONS K. Krishnamoorthy and Yong Lu Department
More informationStatistica Sinica Preprint No: SS
Statistica Sinica Preprint No: SS-2017-0275 Title Testing homogeneity of high-dimensional covariance matrices Manuscript ID SS-2017-0275 URL http://www.stat.sinica.edu.tw/statistica/ DOI 10.5705/ss.202017.0275
More informationEconomics 583: Econometric Theory I A Primer on Asymptotics
Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:
More information6 The normal distribution, the central limit theorem and random samples
6 The normal distribution, the central limit theorem and random samples 6.1 The normal distribution We mentioned the normal (or Gaussian) distribution in Chapter 4. It has density f X (x) = 1 σ 1 2π e
More informationA new method to bound rate of convergence
A new method to bound rate of convergence Arup Bose Indian Statistical Institute, Kolkata, abose@isical.ac.in Sourav Chatterjee University of California, Berkeley Empirical Spectral Distribution Distribution
More informationMultivariate Statistical Analysis
Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 3 for Applied Multivariate Analysis Outline 1 Reprise-Vectors, vector lengths and the angle between them 2 3 Partial correlation
More informationA new test for the proportionality of two large-dimensional covariance matrices. Citation Journal of Multivariate Analysis, 2014, v. 131, p.
Title A new test for the proportionality of two large-dimensional covariance matrices Authors) Liu, B; Xu, L; Zheng, S; Tian, G Citation Journal of Multivariate Analysis, 04, v. 3, p. 93-308 Issued Date
More informationEC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)
1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For
More informationProbability and Statistics Notes
Probability and Statistics Notes Chapter Five Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Five Notes Spring 2011 1 / 37 Outline 1
More informationA Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when the Covariance Matrices are Unknown but Common
Journal of Statistical Theory and Applications Volume 11, Number 1, 2012, pp. 23-45 ISSN 1538-7887 A Test for Order Restriction of Several Multivariate Normal Mean Vectors against all Alternatives when
More informationAsymptotic Statistics-III. Changliang Zou
Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (
More informationEconometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018
Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate
More informationTesting Homogeneity Of A Large Data Set By Bootstrapping
Testing Homogeneity Of A Large Data Set By Bootstrapping 1 Morimune, K and 2 Hoshino, Y 1 Graduate School of Economics, Kyoto University Yoshida Honcho Sakyo Kyoto 606-8501, Japan. E-Mail: morimune@econ.kyoto-u.ac.jp
More informationAsymptotic Statistics-VI. Changliang Zou
Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous
More information4. Distributions of Functions of Random Variables
4. Distributions of Functions of Random Variables Setup: Consider as given the joint distribution of X 1,..., X n (i.e. consider as given f X1,...,X n and F X1,...,X n ) Consider k functions g 1 : R n
More informationPrincipal Components Analysis (PCA) and Singular Value Decomposition (SVD) with applications to Microarrays
Principal Components Analysis (PCA) and Singular Value Decomposition (SVD) with applications to Microarrays Prof. Tesler Math 283 Fall 2015 Prof. Tesler Principal Components Analysis Math 283 / Fall 2015
More informationlarge number of i.i.d. observations from P. For concreteness, suppose
1 Subsampling Suppose X i, i = 1,..., n is an i.i.d. sequence of random variables with distribution P. Let θ(p ) be some real-valued parameter of interest, and let ˆθ n = ˆθ n (X 1,..., X n ) be some estimate
More informationSTATS 200: Introduction to Statistical Inference. Lecture 29: Course review
STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout
More informationCentral Limit Theorems for Classical Likelihood Ratio Tests for High-Dimensional Normal Distributions
Central Limit Theorems for Classical Likelihood Ratio Tests for High-Dimensional Normal Distributions Tiefeng Jiang 1 and Fan Yang 1, University of Minnesota Abstract For random samples of size n obtained
More informationTest Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics
Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics The candidates for the research course in Statistics will have to take two shortanswer type tests
More informationStat 206: Sampling theory, sample moments, mahalanobis
Stat 206: Sampling theory, sample moments, mahalanobis topology James Johndrow (adapted from Iain Johnstone s notes) 2016-11-02 Notation My notation is different from the book s. This is partly because
More informationTESTING SERIAL CORRELATION IN SEMIPARAMETRIC VARYING COEFFICIENT PARTIALLY LINEAR ERRORS-IN-VARIABLES MODEL
Jrl Syst Sci & Complexity (2009) 22: 483 494 TESTIG SERIAL CORRELATIO I SEMIPARAMETRIC VARYIG COEFFICIET PARTIALLY LIEAR ERRORS-I-VARIABLES MODEL Xuemei HU Feng LIU Zhizhong WAG Received: 19 September
More information18Ï È² 7( &: ÄuANOVAp.O`û5 571 Based on this ANOVA model representation, Sobol (1993) proposed global sensitivity index, S i1...i s = D i1...i s /D, w
A^VÇÚO 1 Êò 18Ï 2013c12 Chinese Journal of Applied Probability and Statistics Vol.29 No.6 Dec. 2013 Optimal Properties of Orthogonal Arrays Based on ANOVA High-Dimensional Model Representation Chen Xueping
More informationInference For High Dimensional M-estimates: Fixed Design Results
Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49
More informationStochastic Convergence, Delta Method & Moment Estimators
Stochastic Convergence, Delta Method & Moment Estimators Seminar on Asymptotic Statistics Daniel Hoffmann University of Kaiserslautern Department of Mathematics February 13, 2015 Daniel Hoffmann (TU KL)
More informationA note on mean testing for high dimensional multivariate data under non-normality
A note on mean testing for high dimensional multivariate data under non-normality M. Rauf Ahmad, Dietrich von Rosen and Martin Singull Linköping University Post Print N.B.: When citing this work, cite
More informationA TWO SAMPLE TEST FOR HIGH DIMENSIONAL DATA WITH APPLICATIONS TO GENE-SET TESTING. By Song Xi Chen,Ying-LiQin Iowa State University
Submitted to the Annals of Statistics A TWO SAMPLE TEST FOR HIGH DIMENSIONAL DATA WITH APPLICATIONS TO GENE-SET TESTING By Song Xi Chen,Ying-LiQin Iowa State University We proposed a two sample test for
More information4.1 Order Specification
THE UNIVERSITY OF CHICAGO Booth School of Business Business 41914, Spring Quarter 2009, Mr Ruey S Tsay Lecture 7: Structural Specification of VARMA Models continued 41 Order Specification Turn to data
More informationThe largest eigenvalues of the sample covariance matrix. in the heavy-tail case
The largest eigenvalues of the sample covariance matrix 1 in the heavy-tail case Thomas Mikosch University of Copenhagen Joint work with Richard A. Davis (Columbia NY), Johannes Heiny (Aarhus University)
More informationBTRY 4090: Spring 2009 Theory of Statistics
BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)
More informationThomas J. Fisher. Research Statement. Preliminary Results
Thomas J. Fisher Research Statement Preliminary Results Many applications of modern statistics involve a large number of measurements and can be considered in a linear algebra framework. In many of these
More informationSTA205 Probability: Week 8 R. Wolpert
INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and
More information