High-dimensional two-sample tests under strongly spiked eigenvalue models

Makoto Aoshima and Kazuyoshi Yata
University of Tsukuba

Abstract: We consider a new two-sample test for high-dimensional data under the strongly spiked eigenvalue (SSE) model. We provide a general test statistic as a function of positive-semidefinite matrices. We investigate the test statistic under the SSE model by considering strongly spiked eigenstructures, and we create a new effective test procedure for the SSE model.

Key words and phrases: Asymptotic normality, eigenstructure estimation, large p small n, noise reduction methodology, spiked model.

1. Introduction

A common feature of high-dimensional data is that the data dimension is high while the sample size is relatively small. This is the so-called HDLSS or "large p, small n" situation, where p is the data dimension, n is the sample size and p/n → ∞. Statistical inference on this type of data is becoming increasingly relevant, especially in medical diagnostics, engineering and other big-data areas.

Suppose we have independent samples of p-variate random variables from two populations, π_i, i = 1, 2, where π_i has an unknown mean vector µ_i and an unknown positive-definite covariance matrix Σ_i. We do not assume that the population distributions are Gaussian. The eigen-decomposition of Σ_i (i = 1, 2) is given by

    Σ_i = H_i Λ_i H_i^T = Σ_{j=1}^p λ_{ij} h_{ij} h_{ij}^T,

where Λ_i = diag(λ_{i1}, ..., λ_{ip}) is a diagonal matrix of eigenvalues, λ_{i1} ≥ ⋯ ≥ λ_{ip} > 0, and H_i = [h_{i1}, ..., h_{ip}] is an orthogonal matrix of the corresponding eigenvectors. Note that λ_{i1} is the largest eigenvalue of Σ_i for i = 1, 2. Having recorded i.i.d. samples, x_{ij}, j = 1, ..., n_i, from each π_i, let

    x_{ij} = H_i Λ_i^{1/2} z_{ij} + µ_i,

where z_{ij} = (z_{i1j}, ..., z_{ipj})^T is considered as a sphered data vector having the zero mean vector and the identity covariance matrix. We assume that the fourth moments of each variable in z_{ij} are uniformly bounded. When Σ_1 = Σ_2, we simply omit the population index from Σ_i and from the λ_{ij}'s and h_{ij}'s; for example, we write the covariance matrix as Σ when Σ_1 = Σ_2.
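To make the data model concrete, the following minimal sketch (an illustration added here, not part of the original analysis) draws a sample x_{ij} = H_i Λ_i^{1/2} z_{ij} + µ_i from one population. Gaussian coordinates for z_{ij} are just one admissible choice; the model only requires a sphered vector with uniformly bounded fourth moments.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_population(n, mu, Sigma):
    """Draw n i.i.d. p-variate vectors x_j = H Lambda^{1/2} z_j + mu."""
    p = len(mu)
    lam, H = np.linalg.eigh(Sigma)        # eigenvalues in ascending order
    lam, H = lam[::-1], H[:, ::-1]        # reorder so lam[0] is the largest
    Z = rng.standard_normal((p, n))       # sphered data vectors z_j (Gaussian here)
    scale = np.sqrt(np.clip(lam, 0.0, None))      # guard tiny negative round-off
    return H @ (scale[:, None] * Z) + mu[:, None]  # p x n sample matrix
```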

In this paper, we consider the two-sample test:

    H_0: µ_1 = µ_2  vs.  H_1: µ_1 ≠ µ_2.    (1.1)

Having recorded i.i.d. samples, x_{ij}, j = 1, ..., n_i, from each π_i, we define x̄_{in_i} = Σ_{j=1}^{n_i} x_{ij}/n_i and S_{in_i} = Σ_{j=1}^{n_i} (x_{ij} − x̄_{in_i})(x_{ij} − x̄_{in_i})^T/(n_i − 1) for i = 1, 2. We assume n_i ≥ 4 for i = 1, 2. Hotelling's T^2-statistic is defined by

    T^2 = {n_1 n_2/(n_1 + n_2)} (x̄_{1n_1} − x̄_{2n_2})^T S^{−1} (x̄_{1n_1} − x̄_{2n_2}),

where S = {(n_1 − 1) S_{1n_1} + (n_2 − 1) S_{2n_2}}/(n_1 + n_2 − 2). However, S^{−1} does not exist in the HDLSS context in which p/n_i → ∞, i = 1, 2. In such situations, Dempster (1958, 1960) and Srivastava (2007) considered the test when π_1 and π_2 are Gaussian. When π_1 and π_2 are non-Gaussian, Bai and Saranadasa (1996) and Cai et al. (2014) considered the test under homoscedasticity, Σ_1 = Σ_2. On the other hand, Chen and Qin (2010) and Aoshima and Yata (2011, 2015) considered the distance-based two-sample test under heteroscedasticity, Σ_1 ≠ Σ_2. As discussed in Section 2 of Aoshima and Yata (2015), the distance-based two-sample test is quite flexible for high-dimension, non-Gaussian data.

We note that those two-sample tests were constructed under the following eigenvalue condition:

    λ_{i1}^2/tr(Σ_i^2) → 0 as p → ∞ for i = 1, 2.    (1.2)

However, if (1.2) is not met, one cannot use those two-sample tests. See Aoshima and Yata (2016) for the details. Aoshima and Yata (2016) called (1.2) the non-strongly spiked eigenvalue (NSSE) model. On the other hand, Aoshima and Yata (2016) considered the strongly spiked eigenvalue (SSE) model:

    lim inf_{p→∞} λ_{i1}^2/tr(Σ_i^2) > 0 for i = 1 or 2.    (1.3)

We emphasize that high-dimensional data often fit the SSE model. See Fig. 1 in Yata and Aoshima (2013) and Section 8 in Aoshima and Yata (2016). For the SSE model, Katayama et al. (2013) considered a one-sample test when the population distribution is Gaussian. Ishii et al. (2016) considered the one-sample test for non-Gaussian cases. Ma et al. (2015) considered a two-sample test for the factor model when Σ_1 = Σ_2.
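The eigenvalue ratio in (1.2) and (1.3) is easy to inspect numerically. The sketch below (ours; the equal-correlation matrix is a hypothetical example) contrasts an SSE-type eigenstructure with an NSSE-type one.

```python
def spike_ratio(Sigma):
    """lambda_1^2 / tr(Sigma^2): tends to 0 under the NSSE model (1.2),
    stays bounded away from 0 under the SSE model (1.3)."""
    lam = np.linalg.eigvalsh(Sigma)            # ascending order
    return lam[-1] ** 2 / np.sum(lam ** 2)

p, rho = 500, 0.3
Sigma_eq = (1 - rho) * np.eye(p) + rho * np.ones((p, p))   # equal correlation
print(spike_ratio(Sigma_eq))     # close to 1: an SSE-type spike
print(spike_ratio(np.eye(p)))    # equals 1/p: NSSE-type
```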

In this paper, we propose a new effective test procedure for the SSE model. In Section 2, we provide a general test statistic as a function of positive-semidefinite matrices. We investigate the test statistic under the SSE model by considering strongly spiked eigenstructures. In Section 3, we create a new test procedure by estimating the eigenstructures for the SSE model.

2. Test statistic using eigenstructures

In this paper, we consider the divergence condition p → ∞, n_1 → ∞ and n_2 → ∞, which is equivalent to m → ∞, where m = min{p, n_min} with n_min = min{n_1, n_2}. Let

    Ψ_{i(s)} = Σ_{j=s}^p λ_{ij}^2 for i = 1, 2; s = 1, ..., p.

We consider the following model:

(A-i) For i = 1, 2, there exists a positive fixed integer k_i such that λ_{i1}, ..., λ_{ik_i} are distinct in the sense that lim inf_{p→∞} (λ_{ij}/λ_{ij+1} − 1) > 0 for 1 ≤ j < k_i when 1 < k_i, and λ_{ik_i} and λ_{ik_i+1} satisfy

    lim inf_{p→∞} λ_{ik_i}^2/Ψ_{i(k_i)} > 0 and λ_{ik_i+1}^2/Ψ_{i(k_i+1)} → 0 as p → ∞.

Note that (A-i) implies (1.3); that is, (A-i) is one of the SSE models. (A-i) is also a power spiked model given by Yata and Aoshima (2013).

We consider the following test statistic with positive-semidefinite matrices, A_i, i = 1, 2, of dimension p:

    T(A_1, A_2) = 2 Σ_{i=1}^2 Σ_{j<j'}^{n_i} x_{ij}^T A_i x_{ij'} / {n_i(n_i − 1)} − 2 x̄_{1n_1}^T A_1^{1/2} A_2^{1/2} x̄_{2n_2}.

Let I_p denote the identity matrix of dimension p. Note that T(I_p, I_p) is equivalent to the distance-based two-sample test statistic.
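A direct implementation of T(A_1, A_2) follows (our illustrative sketch, not part of the original paper). It uses the identity 2 Σ_{j<j'} x_j^T A x_{j'} = s^T A s − Σ_j x_j^T A x_j with s = Σ_j x_j to avoid a double loop, and computes the matrix square roots numerically, which is adequate for a sketch.

```python
from scipy.linalg import sqrtm

def T_stat(X1, X2, A1, A2):
    """General statistic T(A1, A2); X_i is a p x n_i sample matrix."""
    def within(X, A):
        n = X.shape[1]
        s = X.sum(axis=1)
        # 2 * sum_{j<j'} x_j^T A x_{j'} = s^T A s - sum_j x_j^T A x_j
        cross = s @ A @ s - np.einsum('ij,ij->', A @ X, X)
        return cross / (n * (n - 1))
    B = np.real(sqrtm(A1)) @ np.real(sqrtm(A2))   # A1^{1/2} A2^{1/2}
    return (within(X1, A1) + within(X2, A2)
            - 2 * X1.mean(axis=1) @ B @ X2.mean(axis=1))
```

For A_1 = A_2 = I_p this reproduces the distance-based statistic T(I_p, I_p).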

Let us write µ_{A12} = A_1^{1/2} µ_1 − A_2^{1/2} µ_2 and Σ_{i,A_i} = A_i^{1/2} Σ_i A_i^{1/2}, i = 1, 2. Let Δ(A_1, A_2) = ||µ_{A12}||^2 and K(A_1, A_2) = K_1(A_1, A_2) + K_2(A_1, A_2), where

    K_1(A_1, A_2) = 2 Σ_{i=1}^2 tr(Σ_{i,A_i}^2)/{n_i(n_i − 1)} + 4 tr(Σ_{1,A_1} Σ_{2,A_2})/(n_1 n_2)

and

    K_2(A_1, A_2) = 4 Σ_{i=1}^2 µ_{A12}^T Σ_{i,A_i} µ_{A12}/n_i.

Note that E{T(A_1, A_2)} = Δ(A_1, A_2) and Var{T(A_1, A_2)} = K(A_1, A_2). Let λ_max(B) denote the largest eigenvalue of any positive-semidefinite matrix, B. We consider the following condition:

    {λ_max(Σ_{i,A_i})}^2 / tr(Σ_{i,A_i}^2) → 0 as p → ∞ for i = 1, 2.    (2.1)

Then, Aoshima and Yata (2016) showed that, as m → ∞,

    {T(A_1, A_2) − Δ(A_1, A_2)} / {K(A_1, A_2)}^{1/2} ⇒ N(0, 1)    (2.2)

under (2.1), lim sup_{m→∞} {Δ(A_1, A_2)}^2/K_1(A_1, A_2) < ∞ and some regularity conditions. Here, "⇒" denotes convergence in distribution and N(0, 1) denotes a random variable distributed as the standard normal distribution.

We consider the A_i's as

    A_{i(k_i)} = I_p − Σ_{j=1}^{k_i} h_{ij} h_{ij}^T = Σ_{j=k_i+1}^p h_{ij} h_{ij}^T for i = 1, 2.

Note that A_{i(k_i)} = A_{i(k_i)}^{1/2}. Let Σ_{i*} = A_{i(k_i)}^{1/2} Σ_i A_{i(k_i)}^{1/2} = Σ_{j=k_i+1}^p λ_{ij} h_{ij} h_{ij}^T for i = 1, 2. Then, it holds that tr(Σ_{i*}^2) = Ψ_{i(k_i+1)} and λ_max(Σ_{i*}) = λ_{ik_i+1} for i = 1, 2, so that (2.1) is met when A_i = A_{i(k_i)}, i = 1, 2, under (A-i). Hence, for A_i = A_{i(k_i)}, i = 1, 2, we can claim (2.2) under (A-i) instead of (2.1). Hereafter, we simply write T_* = T(A_{1(k_1)}, A_{2(k_2)}), µ_{i*} = A_{i(k_i)} µ_i for i = 1, 2, Δ_* = Δ(A_{1(k_1)}, A_{2(k_2)}) = ||µ_{1*} − µ_{2*}||^2, K_* = K(A_{1(k_1)}, A_{2(k_2)}) and

    K_{*1} = K_1(A_{1(k_1)}, A_{2(k_2)}) = 2 Σ_{i=1}^2 tr(Σ_{i*}^2)/{n_i(n_i − 1)} + 4 tr(Σ_{1*} Σ_{2*})/(n_1 n_2).

Note that tr(Σ_{i*}^2) = Ψ_{i(k_i+1)} for i = 1, 2. Let

    x_{ijl} = h_{ij}^T x_{il} = λ_{ij}^{1/2} z_{ijl} + µ_{i(j)} for all i, j, l,

where µ_{i(j)} = h_{ij}^T µ_i. Then, we can write

    T_* = 2 Σ_{i=1}^2 Σ_{l<l'}^{n_i} (x_{il}^T x_{il'} − Σ_{j=1}^{k_i} x_{ijl} x_{ijl'}) / {n_i(n_i − 1)}
          − 2 Σ_{l=1}^{n_1} Σ_{l'=1}^{n_2} (x_{1l} − Σ_{j=1}^{k_1} x_{1jl} h_{1j})^T (x_{2l'} − Σ_{j=1}^{k_2} x_{2jl'} h_{2j}) / (n_1 n_2).
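For the projection choice A_{i(k_i)}, the statistic simplifies because A_{i(k_i)}^{1/2} = A_{i(k_i)}. A small sketch of the oracle computation (ours; it assumes the true leading eigenvectors are known, which the next section replaces with estimates):

```python
def tail_projection(H_lead):
    """A_(k) = I_p - sum_{j=1}^k h_j h_j^T, built from a p x k matrix whose
    columns are the leading eigenvectors. A_(k) is a projection, so
    A_(k)^{1/2} = A_(k) and T_* can reuse T_stat directly."""
    p = H_lead.shape[0]
    return np.eye(p) - H_lead @ H_lead.T

# Oracle evaluation of T_* (true eigenvectors H1[:, :k1], H2[:, :k2] assumed known):
# T_star = T_stat(X1, X2, tail_projection(H1[:, :k1]), tail_projection(H2[:, :k2]))
```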

In order to use T_*, it is necessary to estimate the x_{ijl}'s and the h_{ij}'s.

3. Test procedure using eigenstructures for the SSE model

In this section, we assume (A-i) and the following assumption for π_i, i = 1, 2:

(A-ii) E(z_{isl}^2 z_{itl}^2) = E(z_{isl}^2) E(z_{itl}^2), E(z_{isl} z_{itl} z_{iul}) = 0 and E(z_{isl} z_{itl} z_{iul} z_{ivl}) = 0 for all s ≠ t, u, v, where z_{isl} denotes the s-th element of the sphered data vector defined in Section 1.

When the π_i's are Gaussian, (A-ii) naturally holds. First, we discuss estimation of the eigenvalues and eigenvectors in the SSE model.

3.1. Estimation of eigenvalues and eigenvectors

Throughout this section, we omit the subscript with regard to the population for the sake of simplicity, and we write the sample covariance matrix as S_n. Let λ̂_1 ≥ ⋯ ≥ λ̂_p ≥ 0 be the eigenvalues of S_n. Let us write the eigen-decomposition of S_n as S_n = Σ_{j=1}^p λ̂_j ĥ_j ĥ_j^T, where ĥ_j denotes a unit eigenvector corresponding to λ̂_j. We assume h_j^T ĥ_j ≥ 0 w.p.1 for all j without loss of generality. Let X = [x_1, ..., x_n] and X̄ = [x̄_n, ..., x̄_n]. Then, we define the n × n dual sample covariance matrix by S_D = (n − 1)^{−1} (X − X̄)^T (X − X̄). Note that S_n and S_D share the non-zero eigenvalues. Let us write the eigen-decomposition of S_D as S_D = Σ_{j=1}^{n−1} λ̂_j û_j û_j^T, where û_j = (û_{j1}, ..., û_{jn})^T denotes a unit eigenvector corresponding to λ̂_j. Note that ĥ_j can be calculated by ĥ_j = {(n − 1) λ̂_j}^{−1/2} (X − X̄) û_j. Let δ = Σ_{j=k+1}^p λ_j/(n − 1) and m_0 = min{p, n}. First, we have the following result.

Proposition 1 (Aoshima and Yata, 2016). Assume (A-i) and (A-ii). It holds for j = 1, ..., k that, as m_0 → ∞,

    λ̂_j/λ_j = 1 + δ/λ_j + O_P(n^{−1/2}) and (ĥ_j^T h_j)^2 = (1 + δ/λ_j)^{−1} + O_P(n^{−1/2}).

If δ/λ_j → ∞ as m_0 → ∞, then λ̂_j and ĥ_j are strongly inconsistent in the sense that λ_j/λ̂_j = o_P(1) and (ĥ_j^T h_j)^2 = o_P(1).
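The dual formulation is what makes these quantities computable when p ≫ n: an n × n eigen-decomposition yields the first n − 1 sample eigenvalues and, via the map below, the eigenvectors of S_n. A sketch (ours; indices are 0-based in the code, and ĥ_j requires λ̂_j > 0):

```python
def dual_eigen(X):
    """Eigen-structure of S_n via the n x n dual matrix
    S_D = (X - Xbar)^T (X - Xbar)/(n - 1), which shares the non-zero
    eigenvalues of S_n. Returns descending eigenvalues lam_hat, dual
    eigenvectors u_hat_j (columns of U), and the centered data Xc."""
    n = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    lam_hat, U = np.linalg.eigh(Xc.T @ Xc / (n - 1))
    return lam_hat[::-1], U[:, ::-1], Xc

def h_hat(Xc, lam_hat, U, j):
    """h_hat_j = {(n - 1) lam_hat_j}^{-1/2} (X - Xbar) u_hat_j."""
    n = Xc.shape[1]
    return Xc @ U[:, j] / np.sqrt((n - 1) * lam_hat[j])
```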

In order to overcome this curse of dimensionality, Yata and Aoshima (2012) proposed an eigenvalue estimation method called the noise-reduction (NR) methodology, which was brought about by a geometric representation of S_D. If one applies the NR methodology, the λ_j's are estimated by

    λ̃_j = λ̂_j − {tr(S_D) − Σ_{l=1}^j λ̂_l}/(n − 1 − j)  (j = 1, ..., n − 2).    (3.1)

Note that λ̃_j ≥ 0 w.p.1 for j = 1, ..., n − 2, and the second term in (3.1) is an estimator of δ. When applying the NR methodology to the PC direction vectors, one obtains

    h̃_j = {(n − 1) λ̃_j}^{−1/2} (X − X̄) û_j    (3.2)

for j = 1, ..., n − 2. Then, we have the following result.

Proposition 2 (Aoshima and Yata, 2016). Assume (A-i) and (A-ii). It holds for j = 1, ..., k that, as m_0 → ∞,

    λ̃_j/λ_j = 1 + O_P(n^{−1/2}) and (h̃_j^T h_j)^2 = 1 + O_P(n^{−1}).

We note that h̃_j is a consistent estimator of h_j in terms of the inner product even when δ/λ_j → ∞ as m_0 → ∞. On the other hand, we note that h_j^T (x_l − µ) = λ_j^{1/2} z_{jl} for all j, l. For ĥ_j and h̃_j, we have the following result.

Proposition 3 (Aoshima and Yata, 2016). Assume (A-i) and (A-ii). It holds for j = 1, ..., k (l = 1, ..., n) that, as m_0 → ∞,

    λ_j^{−1/2} ĥ_j^T (x_l − µ) = {z_{jl} + (n − 1)^{1/2} û_{jl} λ_j^{−1} δ (1 + o_P(1))}/(1 + λ_j^{−1} δ)^{1/2} + O_P(n^{−1/2});
    λ_j^{−1/2} h̃_j^T (x_l − µ) = z_{jl} + (n − 1)^{1/2} û_{jl} λ_j^{−1} δ {1 + o_P(1)} + O_P(n^{−1/2}).

Let us consider the standard deviation of the above quantities. Note that [Σ_{l=1}^n {(n − 1)^{1/2} û_{jl} δ/λ_j}^2/n]^{1/2} = O(δ/λ_j) and δ = O(p/n) for λ_{k+1} = O(1). Hence, in Proposition 3, the inner products are heavily biased when p is large.
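The NR estimates themselves are straightforward to compute. A direct transcription of (3.1) and (3.2) (our sketch; 0-based indexing, so array position j − 1 holds λ̃_j):

```python
def nr_eigenvalues(lam_hat, n):
    """Noise-reduction estimates (3.1), for j = 1, ..., n-2:
    lam_tilde_j = lam_hat_j - (tr(S_D) - sum_{l<=j} lam_hat_l)/(n - 1 - j).
    The subtracted term estimates delta = sum_{j>k} lambda_j/(n - 1)."""
    tr_SD = lam_hat.sum()
    js = np.arange(1, n - 1)                  # j = 1, ..., n-2
    csum = np.cumsum(lam_hat[:n - 2])         # sum_{l=1}^{j} lam_hat_l
    return lam_hat[:n - 2] - (tr_SD - csum) / (n - 1 - js)

def h_tilde(Xc, lam_tilde, U, j):
    """NR direction (3.2): the dual-to-primal map with lam_tilde_j
    in place of lam_hat_j."""
    n = Xc.shape[1]
    return Xc @ U[:, j] / np.sqrt((n - 1) * lam_tilde[j])
```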

Now, we explain the main reason why the inner products involve such large bias terms. Let P_n = I_n − 1_n 1_n^T/n, where 1_n = (1, ..., 1)^T. Note that 1_n^T û_j = 0 and P_n û_j = û_j when λ̂_j > 0, since 1_n^T S_D 1_n = 0. Also, when λ̂_j > 0, note that

    {(n − 1) λ̃_j}^{1/2} h̃_j = (X − X̄) û_j = (X − M) P_n û_j = (X − M) û_j,

where M = [µ, ..., µ]. Thus it holds that

    {(n − 1) λ̃_j}^{1/2} h̃_j^T (x_l − µ) = û_j^T (X − M)^T (x_l − µ) = û_{jl} ||x_l − µ||^2 + Σ_{s≠l}^n û_{js} (x_s − µ)^T (x_l − µ),

so that û_{jl} ||x_l − µ||^2 is heavily biased since E(||x_l − µ||^2)/{(n − 1)^{1/2} λ_j} ≥ (n − 1)^{1/2} δ/λ_j. Hence, one should not apply the ĥ_j's or the h̃_j's to the estimation of the inner product.

Here, we consider a bias-reduced estimation of the inner product. Let us write ǔ_{jl} = (û_{j1}, ..., û_{jl−1}, −û_{jl}/(n − 1), û_{jl+1}, ..., û_{jn})^T, whose l-th element is −û_{jl}/(n − 1), for all j, l. Note that ǔ_{jl} = û_j − (0, ..., 0, û_{jl} n/(n − 1), 0, ..., 0)^T. Let

    h̃_{jl} = {(n − 1) λ̃_j}^{−1/2} (X − X̄) ǔ_{jl}    (3.3)

for all j, l. When λ̂_j > 0, we note that {(n − 1) λ̃_j}^{1/2} h̃_{jl} = (X − M) P_n ǔ_{jl} = (X − M) û_{j(l)} since 1_n^T û_j = Σ_{l=1}^n û_{jl} = 0, where û_{j(l)} = (û_{j1}, ..., û_{jl−1}, 0, û_{jl+1}, ..., û_{jn})^T + (n − 1)^{−1} û_{jl} 1_{n(l)} for l = 1, ..., n. Here, 1_{n(l)} = (1, ..., 1, 0, 1, ..., 1)^T, whose l-th element is 0. Thus it holds that

    {(n − 1) λ̃_j}^{1/2} h̃_{jl}^T (x_l − µ) = û_{j(l)}^T (X − M)^T (x_l − µ) = Σ_{s≠l}^n {û_{js} + (n − 1)^{−1} û_{jl}} (x_s − µ)^T (x_l − µ),

so that the large bias term, û_{jl} ||x_l − µ||^2, has vanished. Then, we have the following result.

Proposition 4 (Aoshima and Yata, 2016). Assume (A-i) and (A-ii). It holds for j = 1, ..., k (l = 1, ..., n) that, as m_0 → ∞,

    λ_j^{−1/2} h̃_{jl}^T (x_l − µ) = z_{jl} + û_{jl} O_P{(n^{1/2} λ_j)^{−1} λ_1} + O_P(n^{−1/2}).

Note that [Σ_{l=1}^n {û_{jl} λ_1/(n^{1/2} λ_j)}^2/n]^{1/2} = λ_1/(λ_j n). The bias term is small when λ_1/λ_j is not large.
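A sketch of the bias-reduced scores h̃_{jl}^T x_l from (3.3) (ours; the O(n) loop re-forms the modified dual vector ǔ_{jl} for each l, following the definition reconstructed above):

```python
def bias_reduced_scores(Xc, X, lam_tilde, U, j):
    """Scores h_tilde_{jl}^T x_l via (3.3): for each l, the l-th entry of
    u_hat_j is replaced by -u_hat_{jl}/(n - 1) before mapping back to p
    dimensions, which removes the u_hat_{jl} * ||x_l - mu||^2 bias term."""
    n = X.shape[1]
    scale = 1.0 / np.sqrt((n - 1) * lam_tilde[j])
    scores = np.empty(n)
    for l in range(n):
        u_mod = U[:, j].copy()
        u_mod[l] = -U[l, j] / (n - 1)        # the modified vector in (3.3)
        scores[l] = scale * (Xc @ u_mod) @ X[:, l]
    return scores
```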

3.2. Test procedure using eigenstructures

Let x̃_{ijl} = h̃_{ijl}^T x_{il} for all i, j, l, where the h̃_{ijl}'s are defined by (3.3) for each π_i. From Propositions 2 and 4, we consider the following test statistic for (1.1):

    T̃ = 2 Σ_{i=1}^2 Σ_{l<l'}^{n_i} (x_{il}^T x_{il'} − Σ_{j=1}^{k_i} x̃_{ijl} x̃_{ijl'}) / {n_i(n_i − 1)}
        − 2 Σ_{l=1}^{n_1} Σ_{l'=1}^{n_2} (x_{1l} − Σ_{j=1}^{k_1} x̃_{1jl} h̃_{1j})^T (x_{2l'} − Σ_{j=1}^{k_2} x̃_{2jl'} h̃_{2j}) / (n_1 n_2),

where the h̃_{ij}'s are defined by (3.2). Then, we have the following result.

Theorem 1 (Aoshima and Yata, 2016). Assume (A-i) and (A-ii). Assume also lim sup_{m→∞} Δ_*^2/K_{*1} < ∞. Then, it holds that, as m → ∞,

    (T̃ − Δ_*)/K_*^{1/2} ⇒ N(0, 1)

under some regularity conditions.

Let z_c be a constant such that P{N(0, 1) > z_c} = c for c ∈ (0, 1). We note that K_{*1}/K_* = 1 + o(1) as m → ∞ under (A-i) and lim sup_{m→∞} Δ_*^2/K_{*1} < ∞. Then, for a given α ∈ (0, 1/2), we consider testing the hypothesis in (1.1) by

    rejecting H_0 ⟺ T̃/K̂_{*1}^{1/2} > z_α,    (3.4)

where K̂_{*1} is an estimator of K_{*1} defined in Section 5.2 of Aoshima and Yata (2016). Let power(Δ_*) denote the power of the test (3.4). Then, we have the following result.

Theorem 2 (Aoshima and Yata, 2016). Assume (A-i) and (A-ii). Then, as m → ∞, the test (3.4) has

    size = α + o(1) and power(Δ_*) − Φ(Δ_*/K_*^{1/2} − (K_{*1}/K_*)^{1/2} z_α) = o(1)

under some regularity conditions, where Φ(·) denotes the cumulative distribution function of N(0, 1).

In general, the k_i's are unknown in T̃. See Section 6.2 in Aoshima and Yata (2016) for the estimation of the k_i's.
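Once an estimate of K_{*1} is in hand, the decision rule (3.4) is immediate. The sketch below (ours) only evaluates the K_{*1} formula from supplied trace estimates; the actual estimator K̂_{*1} of Section 5.2 of Aoshima and Yata (2016) is not reproduced in this summary.

```python
from scipy.stats import norm

def K1_star(tr1, tr2, tr12, n1, n2):
    """Evaluate K_*1 = 2 sum_i tr(Sigma_{i*}^2)/{n_i(n_i - 1)}
    + 4 tr(Sigma_{1*} Sigma_{2*})/(n1 n2) from (estimated) traces."""
    return (2 * tr1 / (n1 * (n1 - 1)) + 2 * tr2 / (n2 * (n2 - 1))
            + 4 * tr12 / (n1 * n2))

def reject_H0(T_tilde, K1_hat, alpha=0.05):
    """Decision rule (3.4): reject H_0 iff T_tilde / sqrt(K1_hat) > z_alpha."""
    return T_tilde / np.sqrt(K1_hat) > norm.ppf(1 - alpha)
```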

Acknowledgements

The research of the first author was partially supported by Grants-in-Aid for Scientific Research (A) and Challenging Exploratory Research, Japan Society for the Promotion of Science (JSPS), under Contract Number 15H01678. The research of the second author was partially supported by a Grant-in-Aid for Young Scientists (B), JSPS.

References

Aoshima, M. and Yata, K. (2011). Two-stage procedures for high-dimensional data. Sequential Anal. (Editor's special invited paper) 30.

Aoshima, M. and Yata, K. (2015). Asymptotic normality for inference on multisample, high-dimensional mean vectors under mild conditions. Methodol. Comput. Appl. Probab. 17.

Aoshima, M. and Yata, K. (2016). Two-sample tests for high-dimension, strongly spiked eigenvalue models. Statist. Sinica, revised (Preprint No. SS-2016-0063R2).

Bai, Z. and Saranadasa, H. (1996). Effect of high dimension: by an example of a two sample problem. Statist. Sinica 6.

Cai, T. T., Liu, W. and Xia, Y. (2014). Two-sample test of high dimensional means under dependence. J. R. Statist. Soc. Ser. B 76.

Chen, S. X. and Qin, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Statist. 38.

Dempster, A. P. (1958). A high dimensional two sample significance test. Ann. Math. Statist. 29.

Dempster, A. P. (1960). A significance test for the separation of two highly multivariate small samples. Biometrics 16.

Ishii, A., Yata, K. and Aoshima, M. (2016). Asymptotic properties of the first principal component and equality tests of covariance matrices in high-dimension, low-sample-size context. J. Statist. Plan. Infer. 170.

Katayama, S., Kano, Y. and Srivastava, M. S. (2013). Asymptotic distributions of some test criteria for the mean vector with fewer observations than the dimension. J. Multivariate Anal. 116.

Ma, Y., Lan, W. and Wang, H. (2015). A high dimensional two-sample test under a low dimensional factor structure. J. Multivariate Anal. 140.

Srivastava, M. S. (2007). Multivariate theory for analyzing high dimensional data. J. Japan Statist. Soc. 37, 53-86.

Yata, K. and Aoshima, M. (2012). Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations. J. Multivariate Anal. 105.

Yata, K. and Aoshima, M. (2013). PCA consistency for the power spiked model in high-dimensional settings. J. Multivariate Anal. 122.

Institute of Mathematics, University of Tsukuba, Ibaraki, Japan.
E-mail: aoshima@math.tsukuba.ac.jp

Institute of Mathematics, University of Tsukuba, Ibaraki, Japan.
E-mail: yata@math.tsukuba.ac.jp
