ECA: High Dimensional Elliptical Component Analysis in Non-Gaussian Distributions


ECA: High Dimensional Elliptical Component Analysis in Non-Gaussian Distributions

Fang Han and Han Liu

arXiv: v4 [stat.ML] 3 Oct 2016

Abstract

We present a robust alternative to principal component analysis (PCA), called elliptical component analysis (ECA), for analyzing high dimensional, elliptically distributed data. ECA estimates the eigenspace of the covariance matrix of the elliptical data. To cope with heavy-tailed elliptical distributions, a multivariate rank statistic is exploited. At the model level, we consider two settings: either the leading eigenvectors of the covariance matrix are non-sparse or they are sparse. Methodologically, we propose ECA procedures for both the non-sparse and sparse settings. Theoretically, we provide both non-asymptotic and asymptotic analyses quantifying the theoretical performance of ECA. In the non-sparse setting, we show that ECA's performance is highly related to the effective rank of the covariance matrix. In the sparse setting, the results are twofold: (i) we show that the sparse ECA estimator based on a combinatoric program attains the optimal rate of convergence; (ii) based on some recent developments in estimating sparse leading eigenvectors, we show that a computationally efficient sparse ECA estimator attains the optimal rate of convergence under a suboptimal scaling.

Keywords: multivariate Kendall's tau, elliptical component analysis, sparse principal component analysis, optimality property, robust estimators, elliptical distribution.

1 Introduction

Principal component analysis (PCA) plays important roles in many different areas. For example, it is one of the most useful techniques for data visualization in studying brain imaging data (Lindquist, 2008). This paper considers a problem closely related to PCA, namely estimating the leading eigenvectors of the covariance matrix. Let X_1, ..., X_n be n data points of a random vector X in R^d. Denote Σ to be the covariance matrix of X, and u_1, ..., u_m to be its top m leading eigenvectors. We want to find estimators û_1, ..., û_m that estimate u_1, ..., u_m accurately.

Fang Han: Department of Statistics, University of Washington, Seattle, WA 98115, USA; fanghan@uw.edu. Han Liu: Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA; hanliu@princeton.edu. Fang Han's research was supported by NIH-R01-EB01547 and a UW faculty start-up grant. Han Liu's research was supported by the NSF CAREER Award DMS, NSF IIS, NSF IIS, NSF IIS, NIH R01-MH10339, NIH R01-GM083084, and NIH R01-HG

This paper is focused on the high dimensional setting where the dimension d could be comparable to, or even larger than, the sample size n, and the data could be heavy-tailed (especially, non-Gaussian). A motivating example for considering such high dimensional, non-Gaussian, heavy-tailed data is our study on functional magnetic resonance imaging (fMRI). In particular, in Section 6 we examine an fMRI dataset, with 116 regions of interest (ROIs), from the Autism Brain Imaging Data Exchange (ABIDE) project containing 544 normal subjects. There, the dimension d = 116 is comparable to the sample size n = 544. In addition, Table 3 shows that the data we consider cannot pass any normality test, and Figure 5 further indicates that the data are heavy-tailed.

In high dimensions, the performance of PCA, using the leading eigenvectors of the Pearson's sample covariance matrix, has been studied for subgaussian data. In particular, for any matrix M in R^{d x d}, letting Tr(M) and σ_i(M) be the trace and the i-th largest singular value of M, Lounici (2014) showed that PCA could be consistent when r*(Σ) := Tr(Σ)/σ_1(Σ) satisfies r*(Σ) log d/n → 0. Here r*(Σ) is referred to as the effective rank of Σ in the literature (Vershynin, 2010; Lounici, 2014). When r*(Σ) log d/n does not converge to 0, PCA might not be consistent. The inconsistency phenomenon of PCA in high dimensions has been pointed out by Johnstone and Lu (2009). In particular, they showed that the angle between the PCA estimator and u_1 may not converge to 0 if d/n → c for some constant c > 0.

To avoid this curse of dimensionality, certain types of sparsity assumptions are needed. For example, in estimating the leading eigenvector u_1 := (u_11, ..., u_1d)^T, we may assume that u_1 is sparse, i.e., s := card{j : u_1j ≠ 0} is much smaller than n. We call the setting where u_1 is sparse the sparse setting and the setting where u_1 is not necessarily sparse the non-sparse setting. In the sparse setting, different variants of sparse PCA methods have been proposed. For example, d'Aspremont et al. (2007) proposed formulating a convex semidefinite program for calculating the sparse leading eigenvectors. Jolliffe et al. (2003) and Zou et al. (2006) connected PCA to regression and proposed using lasso-type estimators for parameter estimation. Shen and Huang (2008) and Witten et al. (2009) connected PCA to singular value decomposition (SVD) and proposed iterative algorithms for estimating the left and right singular vectors. Journée et al. (2010) and Zhang and El Ghaoui (2011) proposed greedily searching the principal submatrices of the covariance matrix. Ma (2013) and Yuan and Zhang (2013) proposed using modified versions of the power method to estimate eigenvectors and principal subspaces.

Theoretical properties of these methods have been analyzed under both Gaussian and subgaussian assumptions. On one hand, in terms of computationally efficient methods, under the spike covariance Gaussian model, Amini and Wainwright (2009) showed consistency in parameter estimation and model selection for sparse PCA computed via the semidefinite program proposed in d'Aspremont et al. (2007). Ma (2013) justified the use of a modified iterative thresholding method in estimating principal subspaces. By exploiting a convex program using the Fantope projection (Overton and Womersley, 1992; Dattorro, 2005), Vu et al. (2013) showed that there exist computationally efficient estimators that attain, under various settings, the O_P(s√(log d/n)) rate of convergence for possibly non-spike models. In Section 5.1, we will discuss the Fantope projection in more detail.
On the other hand, there exists another line of research focusing on sparse PCA conducted via combinatoric programs. For example, Vu and Lei (2012), Lounici (2013), and Vu and Lei (2013) studied leading eigenvector and principal subspace estimation problems via exhaustively searching over all submatrices. They showed that the optimal O_P(√(s log(ed/s)/n)) (up to some

other parameters of Σ) rate of convergence can be attained using this computationally expensive approach. Such a global search was also studied in Cai et al. (2015), where they established the upper and lower bounds in both covariance matrix and principal subspace estimation. Barriers between the aforementioned statistically efficient method and computationally efficient methods in sparse PCA were pointed out by Berthet and Rigollet (2013) using the principal component detection problem. Such barriers were also studied in Ma and Wu (2015).

One limitation of the PCA and sparse PCA theories is that they rely heavily on the Gaussian or subgaussian assumption. If the Gaussian assumption is correct, accurate estimation can be expected; otherwise, the obtained result may be misleading. To relax the Gaussian assumption, Han and Liu (2014) generalized the Gaussian family to the semiparametric transelliptical family (called the meta-elliptical in their paper) for modeling the data. The transelliptical family assumes that, after unspecified increasing marginal transformations, the data are elliptically distributed. By resorting to the marginal Kendall's tau statistic, Han and Liu (2014) proposed a semiparametric alternative to scale-invariant PCA, named transelliptical component analysis (TCA), for estimating the leading eigenvector of the latent generalized correlation matrix Σ_0. In follow-up works, Wegkamp and Zhao (2016) and Han and Liu (2016) showed that, under various settings: (i) in the non-sparse case, TCA attains the O_P(√(r*(Σ_0) log d/n)) rate of convergence in parameter estimation, which is the same rate of convergence as for PCA under the subgaussian assumption (Lounici, 2014; Bunea and Xiao, 2015); (ii) in the sparse case, sparse TCA, formulated as a combinatoric program, can attain the optimal O_P(√(s log(ed/s)/n)) rate of convergence under the sign subgaussian condition. More recently, Vu et al. (2013) showed that sparse TCA, via the Fantope projection, can attain the O_P(s√(log d/n)) rate of convergence.

Despite all these efforts, there are two remaining problems for the aforementioned works exploiting the marginal Kendall's tau statistic. First, using marginal ranks, they can only estimate the leading eigenvectors of the correlation matrix instead of the covariance matrix. Secondly, the sign subgaussian condition is not easy to verify. In this paper we show that, under the elliptical model and various settings (see Corollaries 3.1 and 4.1 and Theorems 5.4 and 3.5 for details), the O_P(√(s log(ed/s)/n)) rate of convergence for estimating the leading eigenvector of Σ can be attained without the need of the sign subgaussian condition. In particular, we present an alternative procedure, called elliptical component analysis (ECA), to directly estimate the eigenvectors of Σ, treating the corresponding eigenvalues as nuisance parameters. ECA exploits the multivariate Kendall's tau for estimating the eigenspace of Σ. When the target parameter is sparse, the corresponding procedure is called sparse ECA. We show that ECA and sparse ECA, under various settings, have the following properties. (i) In the non-sparse setting, ECA attains the efficient O_P(√(r*(Σ) log d/n)) rate of convergence. (ii) In the sparse setting, sparse ECA, via a combinatoric program, attains the minimax optimal O_P(√(s log(ed/s)/n)) rate of convergence. (iii) In the sparse setting, sparse ECA, via a computationally efficient program which combines the Fantope projection (Vu et al., 2013) and the truncated power algorithm (Yuan and Zhang, 2013), attains the optimal O_P(√(s log(ed/s)/n)) rate of convergence under a suboptimal scaling (s^2 log d/n → 0).
Of note, for presentation clarity, the rates presented here omit a variety of parameters regarding Σ and K. The reader should refer to Corollaries 3.1 and 4.1 and Theorem 5.4 for accurate descriptions.

Table 1: An illustration of the results for sparse PCA, sparse TCA, and sparse ECA in leading eigenvector estimation. Similar results also hold for principal subspace estimation. Here Σ is the covariance matrix, Σ_0 is the latent generalized correlation matrix, r*(M) := Tr(M)/σ_1(M) represents the effective rank of M, "r.c." stands for rate of convergence, "n-s setting" stands for the non-sparse setting, "sparse setting 1" stands for the sparse setting where the estimation procedure is conducted via a combinatoric program, and "sparse setting 2" stands for the sparse setting where the estimation procedure is conducted by combining the Fantope projection (Vu et al., 2013) and the truncated power method (Yuan and Zhang, 2013). For presentation clarity, the rates presented here omit a variety of parameters regarding Σ and K. The reader should refer to Corollaries 3.1 and 4.1 and Theorem 5.4 for accurate descriptions.

                        | sparse PCA                              | sparse TCA                                                                          | sparse ECA
working model           | subgaussian family                      | transelliptical family                                                              | elliptical family
parameter of interest   | eigenvectors of Σ                       | eigenvectors of Σ_0                                                                 | eigenvectors of Σ
input statistic         | Pearson's sample covariance matrix      | Kendall's tau                                                                       | multivariate Kendall's tau
n-s setting r.c.        | √(r*(Σ) log d/n)                        | √(r*(Σ_0) log d/n)                                                                  | √(r*(Σ) log d/n)
sparse setting 1 r.c.   | √(s log(ed/s)/n)                        | s√(log d/n) (general); √(s log(ed/s)/n) (sign subgaussian)                          | √(s log(ed/s)/n)
sparse setting 2 r.c.   | √(s log(ed/s)/n), given s^2 log d/n → 0 | s√(log d/n) (general); √(s log(ed/s)/n) (sign subgaussian), given s^2 log d/n → 0   | √(s log(ed/s)/n), given s^2 log d/n → 0

We compare sparse PCA, sparse TCA, and sparse ECA in Table 1.

1.1 Related Works

The multivariate Kendall's tau statistic was first introduced in Choi and Marden (1998) for testing independence and has further been used in estimating low-dimensional covariance matrices (Visuri et al., 2000; Oja, 2010) and principal components (Marden, 1999; Croux et al., 2002; Jackson and Chen, 2004). In particular, Marden (1999) showed that the population multivariate Kendall's tau, K, shares the same eigenspace as the covariance matrix Σ. Croux et al. (2002) illustrated the asymptotic efficiency of ECA compared to PCA for Gaussian data when d = 2 and 3. Taskinen et al. (2012) characterized the robustness and efficiency properties of ECA in low dimensions. Some related methods using multivariate rank-based statistics are discussed in Tyler (1982), Tyler (1987), Taskinen et al. (2003), Oja and Randles (2004), Oja and Paindaveine (2005), Oja et al. (2006), and Sirkiä et al. (2007). Theoretical analysis in low dimensions is provided in Hallin and Paindaveine (2002a,b, 2004, 2005, 2006) and Hallin et al. (2006, 2010, 2014), and some new extensions to high dimensional settings are provided in Croux et al. (2013) and Feng (2015).

Our paper makes significantly new contributions to the high dimensional robust statistics literature. Theoretically, we study the use of the multivariate Kendall's tau in high dimensions, provide new properties of the multivariate rank statistic, and characterize the performance of ECA in both non-sparse and sparse settings. Computationally, we provide an efficient algorithm for conducting sparse ECA and highlight the "optimal rate, suboptimal scaling" phenomenon in understanding the behavior of the proposed algorithm.

1.2 Notation

Let M = [M_jk] in R^{d x d} be a symmetric matrix and v = (v_1, ..., v_d)^T in R^d be a vector. We denote by v_I the subvector of v whose entries are indexed by a set I, and by M_{I,J} the submatrix of M whose rows are indexed by I and whose columns are indexed by J. We denote supp(v) := {j : v_j ≠ 0}. For 0 < q < ∞, we define the ℓ_q and ℓ_∞ vector norms as ‖v‖_q := (Σ_{i=1}^d |v_i|^q)^{1/q} and ‖v‖_∞ := max_{1≤i≤d} |v_i|. We denote ‖v‖_0 := card(supp(v)). We define the matrix entrywise maximum norm and Frobenius norm as ‖M‖_max := max{|M_jk|} and ‖M‖_F := (Σ_{j,k} M_jk^2)^{1/2}. Let λ_j(M) be the j-th largest eigenvalue of M. If there are ties, λ_j(M) is any one of the eigenvalues such that any eigenvalue larger than it has rank smaller than j, and any eigenvalue smaller than it has rank larger than j. Let u_j(M) be any unit vector v such that v^T M v = λ_j(M). Without loss of generality, we assume that the first nonzero entry of u_j(M) is positive. We denote by ‖M‖_2 the spectral norm of M and by S^{d-1} := {v in R^d : ‖v‖_2 = 1} the d-dimensional unit sphere. We define the restricted spectral norm ‖M‖_{2,s} := sup_{v in S^{d-1}, ‖v‖_0 ≤ s} |v^T M v|, so for s = d we have ‖M‖_{2,s} = ‖M‖_2. We denote by f(M) the matrix with entries [f(M)]_jk = f(M_jk). We denote by diag(M) the diagonal matrix with the same diagonal entries as M. Let I_d represent the d-by-d identity matrix. For any two numbers a, b in R, we denote a ∧ b := min{a, b} and a ∨ b := max{a, b}. For any two sequences of positive numbers {a_n} and {b_n}, we write a_n ≍ b_n if a_n = O(b_n) and b_n = O(a_n). We write b_n = Ω(a_n) if a_n = O(b_n), and b_n = Ω_o(a_n) if b_n = Ω(a_n) and b_n is not of the same order as a_n.

1.3 Paper Organization

The rest of this paper is organized as follows. In the next section, we briefly introduce the elliptical distribution and review the marginal and multivariate Kendall's tau statistics. In Section 3, in the non-sparse setting, we propose the ECA method and study its theoretical performance. In Section 4, in the sparse setting, we propose a sparse ECA method via a combinatoric program and study its theoretical performance. A computationally efficient algorithm for conducting sparse ECA is provided in Section 5. Experiments on both synthetic and brain imaging data are provided in Section 6. More simulation results and all technical proofs are relegated to the supplementary materials.

2 Background

This section briefly reviews the elliptical distribution and the marginal and multivariate Kendall's tau statistics. In the sequel, we write X =_d Y if random vectors X and Y have the same distribution.

2.1 Elliptical Distribution

The elliptical distribution is defined as follows. Let μ in R^d and Σ in R^{d x d} with rank(Σ) = q ≤ d. A d-dimensional random vector X has an elliptical distribution, denoted by X ~ EC_d(μ, Σ, ξ), if

it has a stochastic representation

X =_d μ + ξ A U,    (2.1)

where U is a uniform random vector on the unit sphere in R^q, ξ ≥ 0 is a scalar random variable independent of U, and A in R^{d x q} is a deterministic matrix satisfying A A^T = Σ. Here Σ is called the scatter matrix. In this paper, we only consider continuous elliptical distributions with P(ξ = 0) = 0.

An equivalent definition of the elliptical distribution is through the characteristic function exp(i t^T μ) ψ(t^T Σ t), where ψ is a properly defined characteristic function and i := √(-1). Here ξ and ψ are mutually determined. In this setting, we denote X ~ EC_d(μ, Σ, ψ).

The elliptical family is closed under independent sums, and the marginal and conditional distributions of an elliptical distribution are also elliptically distributed. Compared to the Gaussian family, the elliptical family provides more flexibility in modeling complex data. First, the elliptical family can model heavy-tailed distributions (in contrast, the Gaussian is light-tailed with exponential tail bounds). Secondly, the elliptical family can be used to model nontrivial tail dependence between variables (Hult and Lindskog, 2002), i.e., different variables tend to go to extremes together (in contrast, the Gaussian family cannot capture any tail dependence). The capability to handle heavy-tailed distributions and tail dependence is important for modeling many datasets, including: (1) financial data (almost all financial data are heavy-tailed with nontrivial tail dependence (Rachev, 2003; Čižek et al., 2005)); (2) genomics data (Liu et al., 2003; Posekany et al., 2011); (3) bioimaging data (Ruttimann et al., 1998).

In the sequel, we assume that E(ξ^2) < ∞ so that the covariance matrix Cov(X) is well defined. For model identifiability, we further assume that E(ξ^2) = q so that Cov(X) = Σ. Of note, the assumption that the covariance matrix of X exists is added only for presentation clarity. In particular, the follow-up theorems still hold without any requirement on the moments of ξ. Actually, ECA still works even when E(ξ^2) = ∞ due to its construction (see, for example, Equation (2.6) and the follow-up discussions).

2.2 Marginal Rank-Based Estimators

In this section we briefly review the marginal rank-based estimator using the Kendall's tau statistic. This statistic plays a vital role in estimating the leading eigenvectors of the generalized correlation matrix Σ_0 in Han and Liu (2014). Letting X := (X_1, ..., X_d)^T in R^d, with X̃ := (X̃_1, ..., X̃_d)^T an independent copy of X, the population Kendall's tau statistic is defined as

τ(X_j, X_k) := Cov(sign(X_j - X̃_j), sign(X_k - X̃_k)).

Let X_1, ..., X_n in R^d, with X_i := (X_{i1}, ..., X_{id})^T, be n independent observations of X. The sample Kendall's tau statistic is defined as

τ̂_jk(X_1, ..., X_n) := 2/(n(n-1)) Σ_{1≤i<i'≤n} sign(X_{ij} - X_{i'j}) sign(X_{ik} - X_{i'k}).

It is easy to verify that E[τ̂_jk(X_1, ..., X_n)] = τ(X_j, X_k). Let R̂ = [R̂_jk] in R^{d x d}, with R̂_jk = sin(π/2 · τ̂_jk(X_1, ..., X_n)), be the Kendall's tau correlation matrix. The marginal rank-based estimator θ̂_1 used by TCA is obtained by plugging R̂ into the optimization formulation in Vu and Lei (2012).
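To make the marginal rank-based construction above concrete, the following is a minimal Python sketch (our own illustration, not code from the paper) of the Kendall's tau correlation matrix R̂ with entries R̂_jk = sin(π/2 · τ̂_jk). The function name and the use of scipy.stats.kendalltau are our own choices, and the data are assumed continuous so that ties do not occur.

```python
# A minimal sketch of the Kendall's tau correlation matrix R_hat used by TCA,
# with entries R_hat[j, k] = sin(pi/2 * tau_hat_jk); see Section 2.2.
# Function name is illustrative, not from the paper.
import numpy as np
from scipy.stats import kendalltau

def kendall_correlation_matrix(X):
    """X: (n, d) data matrix with continuous entries; returns the d x d matrix R_hat."""
    _, d = X.shape
    R = np.eye(d)
    for j in range(d):
        for k in range(j + 1, d):
            tau, _ = kendalltau(X[:, j], X[:, k])  # sample Kendall's tau over all pairs
            R[j, k] = R[k, j] = np.sin(0.5 * np.pi * tau)
    return R
```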

When X ~ EC_d(μ, Σ, ξ) and under mild conditions, Han and Liu (2014) showed that

E sin∠(θ̂_1, u_1(Σ_0)) = O(√(s log d / n)),

where s := ‖u_1(Σ_0)‖_0 and Σ_0 is the generalized correlation matrix of X. However, TCA is a variant of scale-invariant PCA and can only estimate the leading eigenvectors of the correlation matrix. How, then, to estimate the leading eigenvector of the covariance matrix in high dimensional elliptical models? A straightforward approach is to exploit a covariance matrix estimator Ŝ := [Ŝ_jk], defined as

Ŝ_jk = R̂_jk σ̂_j σ̂_k,    (2.2)

where {σ̂_j}_{j=1}^d are the sample standard deviations. However, since the elliptical distribution can be heavy-tailed, estimating the standard deviations is challenging and requires strong moment conditions. In this paper, we solve this problem by resorting to the multivariate rank-based method.

2.3 Multivariate Kendall's tau

Let X ~ EC_d(μ, Σ, ξ) and let X̃ be an independent copy of X. The population multivariate Kendall's tau matrix, denoted by K in R^{d x d}, is defined as

K := E[ (X - X̃)(X - X̃)^T / ‖X - X̃‖_2^2 ].    (2.3)

Let X_1, ..., X_n in R^d be n independent data points of a random vector X ~ EC_d(μ, Σ, ξ). The definition of the multivariate Kendall's tau in (2.3) motivates the following sample version multivariate Kendall's tau estimator, which is a second-order U-statistic:

K̂ := 2/(n(n-1)) Σ_{i<i'} (X_i - X_{i'})(X_i - X_{i'})^T / ‖X_i - X_{i'}‖_2^2.    (2.4)

It is obvious that E K̂ = K, and both K and K̂ are positive semidefinite (PSD) matrices of trace 1. Moreover, the kernel of the U-statistic, k_MK : R^d x R^d → R^{d x d},

k_MK(X_i, X_{i'}) := (X_i - X_{i'})(X_i - X_{i'})^T / ‖X_i - X_{i'}‖_2^2,    (2.5)

is bounded under the spectral norm, i.e., ‖k_MK‖_2 ≤ 1. Intuitively, such a boundedness property makes the U-statistic K̂ more amenable to theoretical analysis. Moreover, it is worth noting that k_MK(X_i, X_{i'}) is a distribution-free kernel, i.e., for any continuous X ~ EC_d(μ, Σ, ξ) with generating variable ξ,

k_MK(X_i, X_{i'}) =_d k_MK(Z_i, Z_{i'}),    (2.6)

where Z_i and Z_{i'} are independent and follow Z ~ N_d(μ, Σ). This can be proved using the closedness of the elliptical family under independent sums and the property that Z is a stochastic scaling of X (check Lemma B.7 for details). Accordingly, as will be shown later, the convergence of K̂ to K does not depend on the generating variable ξ, and hence K̂ enjoys the same distribution-free property as Tyler's M estimator (Tyler, 1987). However, the multivariate Kendall's tau can be directly extended to analyze high dimensional data, while Tyler's M estimator cannot.¹

The multivariate Kendall's tau can be viewed as the covariance matrix of the self-normalized data {(X_i - X_{i'})/‖X_i - X_{i'}‖_2}_{i > i'}. It is immediate to see that K is not identical or proportional to the covariance matrix Σ of X. However, the following proposition, essentially coming from Marden (1999) and Croux et al. (2002) (also explicitly stated as Theorem 4.4 in Oja (2010)), states that the eigenspace of the multivariate Kendall's tau statistic K is identical to the eigenspace of the covariance matrix Σ. Its proof is given in the supplementary materials for completeness.

Proposition 2.1. Let X ~ EC_d(μ, Σ, ξ) be a continuous distribution and K be the population multivariate Kendall's tau statistic. Then if rank(Σ) = q, we have

λ_j(K) = E[ λ_j(Σ) Y_j^2 / (λ_1(Σ) Y_1^2 + ... + λ_q(Σ) Y_q^2) ],    (2.7)

where Y := (Y_1, ..., Y_q)^T ~ N_q(0, I_q) is a standard multivariate Gaussian vector. In addition, K and Σ share the same eigenspace with the same descending order of the eigenvalues.

Proposition 2.1 shows that, to recover the eigenspace of the covariance matrix Σ, we can resort to recovering the eigenspace of K, which, as discussed above, can be more efficiently estimated using K̂.

Remark 2.2. Proposition 2.1 shows that the eigenspaces of K and Σ are identical and that the eigenvalues of K only depend on the eigenvalues of Σ. Therefore, if we can theoretically calculate the relationships between {λ_j(K)}_{j=1}^d and {λ_j(Σ)}_{j=1}^d, we can recover Σ using K̂. When, for example, λ_1(Σ) = ... = λ_q(Σ), this relationship is calculable. In particular, it can be shown (check, for example, Section 3 in Bilodeau and Brenner (1999)) that

Y_j^2 / (Y_1^2 + ... + Y_q^2) ~ Beta(1/2, (q-1)/2), for j = 1, ..., q,

where Beta(α, β) is the beta distribution with parameters α and β. Accordingly, λ_j(K) = E[Y_j^2/(Y_1^2 + ... + Y_q^2)] = 1/q. The general relationship between {λ_j(K)}_{j=1}^d and {λ_j(Σ)}_{j=1}^d is non-linear. For example, when d = 2, Croux et al. (2002) showed that

λ_j(K) = √λ_j(Σ) / (√λ_1(Σ) + √λ_2(Σ)), for j = 1, 2.

¹ Tyler's M estimator cannot be directly applied to study high dimensional data for both theoretical and empirical reasons. Theoretically, to the authors' knowledge, the sharpest sufficient condition guaranteeing its consistency still requires d = o(n^{1/2}) (Duembgen, 1998). Empirically, our simulations show that Tyler's M estimator always fails to converge when d > n.
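As a quick numerical illustration of the estimator in (2.4) and of Proposition 2.1, the following Python sketch (our own illustration, not code from the paper) computes the sample multivariate Kendall's tau K̂ over all pairs and compares its leading eigenvector with the top eigenvector of Σ on multivariate-t data, which is elliptical and heavy-tailed. The helper names, the toy covariance, and the sample sizes are all hypothetical choices.

```python
# Illustrative sketch: sample multivariate Kendall's tau (Equation 2.4) and a
# Monte Carlo check of Proposition 2.1 (K and Sigma share leading eigenvectors).
import numpy as np

def multivariate_kendall_tau(X):
    """X: (n, d) data matrix; returns the d x d U-statistic K_hat of Equation (2.4)."""
    n, d = X.shape
    K = np.zeros((d, d))
    for i in range(n):
        diffs = X[i + 1:] - X[i]                      # X_i' - X_i for all i' > i
        norms2 = np.sum(diffs ** 2, axis=1)           # ||X_i - X_i'||_2^2
        K += (diffs / norms2[:, None]).T @ diffs      # sum of the rank-one kernels (2.5)
    return 2.0 * K / (n * (n - 1))

rng = np.random.default_rng(0)
d, n, nu = 20, 500, 3.0                               # heavy-tailed: t with 3 degrees of freedom
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))  # a toy scatter matrix
A = np.linalg.cholesky(Sigma)
Z = rng.standard_normal((n, d)) @ A.T
w = rng.chisquare(nu, size=n) / nu
X = Z / np.sqrt(w)[:, None]                           # multivariate-t_nu(0, Sigma): elliptical

K_hat = multivariate_kendall_tau(X)
u1_K = np.linalg.eigh(K_hat)[1][:, -1]                # leading eigenvector of K_hat (the ECA estimate)
u1_S = np.linalg.eigh(Sigma)[1][:, -1]                # leading eigenvector of Sigma
print("|<u1(K_hat), u1(Sigma)>| =", abs(u1_K @ u1_S)) # close to 1, as Proposition 2.1 suggests
```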

3 ECA: Non-Sparse Setting

In this section we propose and study the ECA method in the non-sparse setting when λ_1(Σ) is distinct. In particular, we do not assume sparsity of u_1(Σ). Without the sparsity assumption, we propose to use the leading eigenvector u_1(K̂) to estimate u_1(K) = u_1(Σ):

The ECA estimator: u_1(K̂), the leading eigenvector of K̂, where K̂ is defined in (2.4).

For notational simplicity, in the sequel we assume that the sample size n is even. When n is odd, we can always use n - 1 data points without affecting the obtained rate of convergence.

The approximation error of u_1(K̂) to u_1(K) is related to the convergence of K̂ to K under the spectral norm via the Davis-Kahan inequality (Davis and Kahan, 1970; Wedin, 1972). In detail, for any two vectors v_1, v_2 in R^d, let sin∠(v_1, v_2) be the sine of the angle between v_1 and v_2, with sin∠(v_1, v_2) := √(1 - (v_1^T v_2)^2). The Davis-Kahan inequality states that the approximation error of u_1(K̂) to u_1(K) is controlled by ‖K̂ - K‖_2 divided by the eigengap between λ_1(K) and λ_2(K):

sin∠(u_1(K̂), u_1(K)) ≤ 2 ‖K̂ - K‖_2 / (λ_1(K) - λ_2(K)).    (3.1)

Accordingly, to analyze the convergence rate of u_1(K̂) to u_1(K), we can focus on the convergence rate of K̂ to K under the spectral norm. The next theorem shows that, for the elliptical distribution family, the convergence rate of K̂ to K under the spectral norm is of order ‖K‖_2 √(r*(K) log d/n), where r*(K) = Tr(K)/λ_1(K) is the effective rank of K, which must be less than or equal to d.

Theorem 3.1. Let X_1, ..., X_n be n independent observations of X ~ EC_d(μ, Σ, ξ). Let K̂ be the sample version of the multivariate Kendall's tau statistic defined in Equation (2.4). Provided that n is sufficiently large such that

n ≥ (16/3) (r*(K) + 1)(log d + log(1/α)),    (3.2)

we have, with probability larger than 1 - α,

‖K̂ - K‖_2 ≤ 16 ‖K‖_2 √( (r*(K) + 1)(log d + log(1/α)) / (3n) ).

Remark 3.2. The scaling requirement on n in (3.2) is posed largely for presentation clarity. As a matter of fact, we could withdraw this scaling condition by posing a different upper bound on ‖K̂ - K‖_2. This is via employing a similar argument as in Wegkamp and Zhao (2016). However, we note that such a scaling requirement is necessary for proving consistency in our analysis, and it was also enforced in the related literature (see, for example, Theorem 3.1 in Han and Liu (2016)).
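The quantities driving Theorem 3.1 and inequality (3.1) are easy to evaluate numerically. Below is a small Python sketch (our own, with hypothetical helper names) that computes the effective rank r*(M) = Tr(M)/λ_1(M) and the Davis-Kahan bound 2‖K̂ - K‖_2/(λ_1(K) - λ_2(K)), which can be compared against the realized angle between u_1(K̂) and u_1(K).

```python
# Sketch: effective rank and the Davis-Kahan bound of Equation (3.1).
import numpy as np

def effective_rank(M):
    """r*(M) = Tr(M) / lambda_1(M) for a symmetric PSD matrix M."""
    return np.trace(M) / np.linalg.eigvalsh(M)[-1]

def davis_kahan_bound(K_hat, K):
    """Right-hand side of (3.1): 2 ||K_hat - K||_2 / (lambda_1(K) - lambda_2(K))."""
    eigs = np.linalg.eigvalsh(K)
    gap = eigs[-1] - eigs[-2]                    # eigengap lambda_1(K) - lambda_2(K)
    err = np.linalg.norm(K_hat - K, 2)           # spectral norm of the difference
    return 2.0 * err / gap

def sin_angle(u, v):
    """Sine of the angle between unit vectors u and v."""
    return np.sqrt(max(0.0, 1.0 - float(u @ v) ** 2))

# Whenever the eigengap is positive, sin_angle(u1(K_hat), u1(K)) is at most
# davis_kahan_bound(K_hat, K), which is what Corollary 3.1 builds on.
```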

There is a vast literature on bounding the spectral norm of a random matrix (see, for example, Vershynin (2010) and the references therein), and our proof relies on the matrix Bernstein inequality proposed in Tropp (2012), with a generalization to U-statistics following similar arguments as in Wegkamp and Zhao (2016) and Han and Liu (2016). We defer the proof to Section B.2.

Combining (3.1) and Theorem 3.1, we immediately have the following corollary, which characterizes the explicit rate of convergence for sin∠(u_1(K̂), u_1(K)).

Corollary 3.1. Under the conditions of Theorem 3.1, provided that n is sufficiently large such that n ≥ (16/3)(r*(K) + 1)(log d + log(1/α)), we have, with probability larger than 1 - α,

sin∠(u_1(K̂), u_1(K)) ≤ (2 λ_1(K) / (λ_1(K) - λ_2(K))) · 16 √( (r*(K) + 1)(log d + log(1/α)) / (3n) ).

Remark 3.3. Corollary 3.1 indicates that it is not necessary to require d/n → 0 for u_1(K̂) to be a consistent estimator of u_1(K). For example, when λ_2(K)/λ_1(K) is upper bounded by an absolute constant strictly smaller than 1, r*(K) log d/n → 0 is sufficient to make u_1(K̂) a consistent estimator of u_1(K). Such an observation is consistent with the observations in the PCA theory (Lounici, 2014; Bunea and Xiao, 2015). On the other hand, Theorem 4.1 in the next section provides a rate of convergence O_P(λ_1(K)√(d/n)) for ‖K̂ - K‖_2. Therefore, the final rate of convergence for ECA, under various settings, can be expressed as O_P(√(r*(K) log d/n) ∧ √(d/n)).

Remark 3.4. We note that Theorem 3.1 can also help to quantify the subspace estimation error via a variation of the Davis-Kahan inequality. In particular, let P_m(K) and P_m(K̂) be the projection matrices onto the span of the m leading eigenvectors of K and K̂. Using Lemma 4.2 in Vu and Lei (2013), we have

‖P_m(K̂) - P_m(K)‖_F ≤ (2√m / (λ_m(K) - λ_{m+1}(K))) ‖K̂ - K‖_2,    (3.3)

so that ‖P_m(K̂) - P_m(K)‖_F can be controlled via a similar argument as in Corollary 3.1.

The above bounds are all related to the eigenvalues of K. The next theorem connects the eigenvalues of K to the eigenvalues of Σ, so that we can directly bound ‖K̂ - K‖_2 and sin∠(u_1(K̂), u_1(K)) using Σ. In the sequel, let us denote r**(Σ) := ‖Σ‖_F/λ_1(Σ) ≤ √d to be the second-order effective rank of the matrix Σ.

Theorem 3.5 (The upper and lower bounds of λ_j(K)). Letting X ~ EC_d(μ, Σ, ξ), we have

λ_j(K) ≥ 2 λ_j(Σ) / ( 3 { Tr(Σ) + 4 ‖Σ‖_F √(log d) + 8 ‖Σ‖_2 log d } ) - 1/d^2,

and, when Tr(Σ) > 4 ‖Σ‖_F √(log d),

λ_j(K) ≤ λ_j(Σ) / ( Tr(Σ) - 4 ‖Σ‖_F √(log d) ) + 1/d^4.

11 Using Theorem 3.5 and recalling that the trace of K is always 1, we can replace r K by r Σ+4r Σ log d+8 log d 1 3d 1 in Theorem 3.1. We also note that Theorem 3.5 can help understand the scaling of λ j K with regard to λ j Σ. Actually, when Σ F log d = TrΣ o1, we have λ j K λ j Σ/TrΣ, and accordingly, we can continue to write λ 1 K λ 1 K λ K λ 1 Σ λ 1 Σ λ Σ. In practice, Σ F log d = TrΣ o1 is a mild condition. For example, when the condition number of Σ is upper bounded by an absolute constant, we have TrΣ Σ F d. For later purpose see, for example, Theorem 5.3, sometimes we also need to connect the elementwise maximum norm K max to that of Σ max. The next corollary gives such a connection. Corollary 3. The upper and lower bounds of K max. Letting X EC d µ, Σ, ξ, we have K max and when TrΣ > 4 Σ F log d, Σ max 1 TrΣ + 4 Σ F log d + 8 Σ log d 3 d, K max Σ max TrΣ 4 Σ F log d + 1 d 4. 4 Sparse via a Combinatoric Program We analyze the theoretical properties of in the sparse setting, where we assume λ 1 Σ is distinct and u 1 Σ 0 s < d n. In this section we study the method using a combinatoric program. For any matrix M R d d, we define the best s-sparse vector approximating u 1 M as u 1,s M := We propose to estimate u 1 Σ = u 1 K via a combinatoric program: arg max v T Mv. 4.1 v 0 s, v 1 Sparse estimator via a combinatoric program : u 1,s K, where K is defined in.4. Under the sparse setting, by definition we have u 1,s K = u 1 K = u 1 Σ. On the other hand, u 1,s K can be calculated via a combinatoric program by exhaustively searching over all s by s submatrices of K. This global search is not computationally efficient. However, the result in quantifying the approximation error of u 1,s K to u 1 K is of strong theoretical interest. Similar algorithms were also studied in Vu and Lei 01, Lounici 013, Vu and Lei 013, and Cai et al Moreover, as will be seen in the next section, this will help clarify that a computationally efficient sparse algorithm can attain the same convergence rate, though under a suboptimal scaling of n, d, s. In the following we study the performance of u 1,s K in conducting sparse. The approximation error of u 1,s K to u 1 K is connected to the approximation error of K to K under the 11

12 restricted spectral norm. This is due to the following Davis-Kahan type inequality provided in Vu and Lei 01: sin u 1,s K, u 1,s K λ 1 K λ K K K,s. 4. Accordingly, for studying sin u 1,s K, u 1,s K, we focus on studying the approximation error K K,s. Before presenting the main results, we provide some extra notation. For any random variable X R, we define the subgaussian ψ and sub-exponential norms ψ1 of X as follows: X ψ := sup k 1 k 1/ E X k 1/k and X ψ1 := sup k 1 E X k 1/k. 4.3 k 1 Any d-dimensional random vector X R d is said to be subgaussian distributed with the subgaussian constant σ if v T X ψ σ, for any v S d 1. Moreover, we define the self-normalized operator S for any random vector to be SX := X X/ X X where X is an independent copy of X. 4.4 It is immediate that K = ESXSX T. The next theorem provides a general result in quantifying the approximation error of K to K with regard to the restricted spectral norm. Theorem 4.1. Let X 1,..., X n be n observations of X EC d µ, Σ, ξ. Let K be the sample version multivariate Kendall s tau statistic defined in Equation.4. We have, when s loged/s + log1/α/n 0, for n sufficiently large, with probability larger than 1 α, K K,s sup v S d 1 v T SX ψ + K for some absolute constant C 0 > 0. Here sup v S d 1 v T X ψ sup v T SX ψ = v S d 1 sup v S d 1 where v := v 1,..., v d T and Y 1,..., Y d T N d 0, I d. s3 + logd/s + log1/α C 0, n can be further written as d i=1 v iλ 1/ i ΣY i d i=1 λ 1, 4.5 iσyi ψ It is obvious that SX is subgaussian with variance proxy 1. However, typically, a sharper upper bound can be obtained. The next theorem shows, under various settings, the upper bound can be of the same order as 1/q, which is much smaller than 1. Combined with Theorem 4.1, these results give an upper bound of K K,s. 1
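Before turning to that theorem, it may help to see what the combinatoric estimator in (4.1) does computationally. The following Python sketch (our own illustration, only feasible for small d and s) exhaustively searches over all size-s supports; for each support, the maximum of v^T M v over unit vectors supported there equals the top eigenvalue of the corresponding s x s submatrix, so the global search recovers the best s-sparse approximating vector.

```python
# Sketch of the combinatoric sparse ECA program in (4.1): for each candidate
# support J of size s, max over unit vectors supported on J of v^T M v equals
# the top eigenvalue of M[J, J]. Exhaustive search is exponential in d, so
# this is for illustration on small problems only.
import numpy as np
from itertools import combinations

def best_s_sparse_eigenvector(M, s):
    """Return u_{1,s}(M): an s-sparse unit vector maximizing v^T M v."""
    d = M.shape[0]
    best_val, best_vec = -np.inf, None
    for J in combinations(range(d), s):
        idx = np.array(J)
        vals, vecs = np.linalg.eigh(M[np.ix_(idx, idx)])
        if vals[-1] > best_val:
            best_val = vals[-1]
            v = np.zeros(d)
            v[idx] = vecs[:, -1]
            best_vec = v
    return best_vec

# Usage: u1s_hat = best_s_sparse_eigenvector(K_hat, s), with K_hat from Equation (2.4).
```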

13 Theorem 4.. Let X 1,..., X n be n observations of X EC d µ, Σ, ξ with rankσ = q and u 1 Σ 0 s. Let K be the sample version multivariate Kendall s tau statistic defined in Equation.4. We have, sup v T SX ψ v S d 1 λ 1 Σ λ q Σ q 1, and accordingly, when s loged/s + log1/α/n 0, with probability at least 1 α, { } K 4λ1 Σ s3 + logd/s + log1/α K,s C 0 qλ q Σ 1 + λ 1 K. n Similar to Theorem 3.1, we wish to show that K K,s = O P λ 1 K s loged/s/n. In the following, we provide several examples such that sup v v T SX ψ is of the same order as λ 1 K, so that, via Theorem 4., the desired rate is attained. Condition number controlled: Bickel and Levina 008 considered the covariance matrix model where the condition number of the covariance matrix Σ, λ 1 Σ/λ d Σ, is upper bounded by an absolute constant. Under this condition, we have and applying Theorem 3.5 we also have sup v T SX ψ d 1, v λ j K λ jσ TrΣ d 1. Accordingly, we conclude that sup v v T SX ψ and λ 1 K are of the same order. Spike covariance model: Johnstone and Lu 009 considered the following simple spike covariance model: Σ = βvv T + a I d, where β, a > 0 are two positive real numbers and v S d 1. In this case, we have, when β = oda / log d or β = Ωda, sup v T SX ψ β + a v da 1 and λ 1 K β + a β + da. A simple calculation shows that sup v v T SX ψ and λ 1 K are of the same order. Multi-Factor Model: Fan et al. 008 considered a multi-factor model, which is also related to the general spike covariance model Ma, 013: Σ = m β j v j vj T + Σ u, j=1 13

14 where we have β 1 β β m > 0, v 1,..., v m S d 1 are orthogonal to each other, and Σ u is a diagonal matrix. For simplicity, we assume that Σ u = a I d. When β j = od a 4 / log d, we have sup v T SX ψ β 1 + a v da 1 and λ 1 K β 1 + a m j=1 β j + da, and sup v v T SX ψ and λ 1 K are of the same order if, for example, m j=1 β j = Oda. Equation 4. and Theorem 4. together give the following corollary, which quantifies the convergence rate of the sparse estimator calculated via the combinatoric program in 4.1. Corollary 4.1. Under the condition of Theorem 4., if we have s loged/s + log1/α/n 0, for n sufficiently large, with probability larger than 1 α, sin u 1,s K, u 1,s K C 04λ 1 Σ/qλ q Σ 1 + λ 1 K s3 + logd/s + log1/α. λ 1 K λ K n Remark 4.3. The restricted spectral norm convergence result obtained in Theorem 4. is also applicable to analyzing principal subspace estimation accuracy. Following the notation in Vu and Lei 013, we define the principal subspace estimator to the space spanned by the top m eigenvectors of any given matrix M R d d as U m,s M := arg max V R d m M, VV T, subject to d 1IV j 0 s, 4.6 j=1 where V j is the j-th row of M and the indicator function returns 0 if and only if V j = 0. We then have U m,s KU m,s K T U m,s KU m,s K T F m λ m K λ m+1 K K K,ms. An explicit statement of the above inequality can be found in Wang et al Sparse via a Computationally Efficient Program There is a vast literature studying computationally efficient algorithms for estimating sparse u 1 Σ. In this section we focus on one such algorithm for conducting sparse by combining the Fantope projection Vu et al., 013 with the truncated power method Yuan and Zhang, Fantope Projection In this section we first review the algorithm and theory developed in Vu et al. 013 for sparse subspace estimation, and then provide some new analysis in obtaining the sparse leading eigenvector estimators. 14

15 Let Π m := V m Vm T where V m is the combination of the m leading eigenvectors of K. It is well known that Π m is the optimal rank-m projection onto K. Similarly as in 4.6, we define s Π to be the number of nonzero columns in Π m. We then introduce the sparse principal subspace estimator X m corresponding to the space spanned by the first m leading eigenvectors of the multivariate Kendall s tau matrix K. To induce sparsity, X m is defined to be the solution to the following convex program: X m := arg max K, M λ M jk, subject to 0 M I d and TrM = m, 5.1 M R d d j,k where for any two matrices A, B R d d, A B represents that B A is positive semidefinite. Here {M : 0 M I d, TrM = m} is a convex set called the Fantope. We then have the following deterministic theorem to quantify the approximation error of X m to Π m. Theorem 5.1 Vu et al If the tuning parameter λ in 5.1 satisfies that λ K K max, we have 4s Π λ X m Π m F λ m K λ m+1 K, where we remind that s Π is the number of nonzero columns in Π m. It is easy to see that X m is symmetric and the rank of X m must be greater than or equal to m, but is not necessarily exactly m. However, in various cases, dimension reduction for example, it is desired to estimate the top m leading eigenvectors of Σ, or equivalently, to estimate an exactly rank m projection matrix. Noticing that X m is a real symmetric matrix, we propose to use the following estimate X m R d d : X m := j m u j X m [u j X m ] T. 5. We then have the next theorem, which quantifies the distance between X m and Π m. Theorem 5.. If λ K K max, we have X m Π m F 4 X m Π m F 5. A Computationally Efficient Algorithm 16s Π λ λ m K λ m+1 K. In this section we propose a computationally efficient algorithm to conduct sparse via combining the Fantope projection with the truncated power algorithm proposed in Yuan and Zhang 013. We focus on estimating the leading eigenvector of K since the rest can be iteratively estimated using the deflation method Mackey, 008. The main idea here is to exploit the Fantope projection for constructing a good initial parameter for the truncated power algorithm and then perform iterative thresholding as in Yuan and Zhang 013. We call this the Fantope-truncated power algorithm, or FM, for abbreviation. Before 15

16 proceeding to the main algorithm, we first introduce some extra notation. For any vector v R d and an index set J {1,..., d}, we define the truncation function TRC, to be TRCv, J := v 1 1I1 J,..., v d 1Id J T, 5.3 where 1I is the indicator function. The initial parameter v 0, then, is the normalized vector consisting of the largest entries in u 1 X 1, where X 1 is calculated in 5.1: v 0 = w 0 / w 0, where w 0 = TRCu 1 X 1, J δ and J δ = {j : u 1 X 1 j δ}. 5.4 We have v 0 0 = supp{j : u 1 X 1 j > 0}. Algorithm 1 then provides the detailed FM algorithm and the final FM estimator is denoted as û FT 1,k. Algorithm 1 The FM algorithm. Within each iteration, a new sparse vector v t with v t 0 k is updated. The algorithm terminates when v t v t 1 is less than a given threshold ɛ. Algorithm: û FT 1,k K FM K, k, ɛ Initialize: X 1 calculated by 5.1 with m = 1, v 0 is calculated using 5.4, and t 0 Repeat: t t + 1 X y Kv t 1 If X t 0 k, then v t = X t / X t Else, let A t be the indices of the elements in X t with the k largest absolute values v t = TRCX t, A t / TRCX t, A t Until convergence: v t v t 1 ɛ û FT 1,k K v t Output: û FT 1,k K In the rest of this section we study the approximation accuracy of û FT 1,k to u 1K. Via observing Theorems 5.1 and 5., it is immediate that the approximation accuracy of u 1 X 1 is related to K K max. The next theorem gives a nonasymptotic upper bound of K K max, and accordingly, combined with Theorems 5.1 and 5., gives an upper bound on sin u 1 X 1, u 1 K. Theorem 5.3. Let X 1,..., X n be n observations of X EC d µ, Σ, ξ with rankσ = q and u 1 Σ 0 s. Let K be the sample version multivariate Kendall s tau statistic defined in Equation.4. If log d + log1/α/n 0, we have there exists some positive absolute constant C 1 such that for sufficiently large n, with probability at least 1 α, Accordingly, if K 8λ1 Σ K max C 1 qλ q Σ + K log d + log1/α max. n 8λ1 Σ λ C 1 qλ q Σ + K log d + log1/α max, 5.5 n 16

17 we have, with probability at least 1 α, sin u 1 X 1, u 1 K 8 sλ λ 1 K λ K. Theorem 5.3 builds sufficient conditions under which u 1 X 1 is a consistent estimator of u 1 K. In multiple settings the condition number controlled, spike covariance model, and multifactor model settings considered in Section 4 for example when λ λ 1 K log d/n, we have sin u 1 X 1, u 1 K = O P s log d/n. This is summarized in the next corollary. Corollary 5.1. Under the conditions of Theorem 5.3, if we further have λ 1 Σ/qλ q Σ = Oλ 1 K, Σ F log d = TrΣ o1, λ Σ/λ 1 Σ is upper bounded by an absolute constant less than 1, and λ λ 1 K log d/n, then sin u 1 X 1, u 1 K = O P s log d Corollary 5.1 is a direct consequence of Theorem 5.3 and Theorem 3.5, and its proof is omitted. We then turn to study the estimation error of û FT 1,k K. By examining Theorem 4 in Yuan and Zhang 013, for theoretical guarantee of fast rate of convergence, it is enough to show that v 0 T u 1 K is lower bounded by an absolute constant larger than zero. In the next theorem, we show that, under mild conditions, this is true with high probability, and accordingly we can exploit the result in Yuan and Zhang 013 to show that û FT 1,k K attains the same optimal convergence rate as that of u 1,s K. Theorem 5.4. Under the conditions of Corollary 5.1, let J 0 := {j : u 1 K j = Ω 0 s log d/ n}. Set δ in 5.4 to be δ = C slog d/ n for some positive absolute constant C. If s log d/n 0, and u 1 K J0 C 3 > 0 is lower bounded by an absolute positive constant, then, with probability tending to 1, v 0 0 s and v 0 T u 1 K is lower bounded by C 3 /. Accordingly under the condition of Theorem 4 in Yuan and Zhang 013, for k s, we have n. sin û FT 1,k K, k + s log d u 1 K = O P. n Remark 5.5. Although a similar second step truncation is performed, the assumption that the largest entries in u 1 K satisfy u 1 K J0 C 3 is much weaker than the assumption in Theorem 3. of Vu et al. 013 because we allow a lot of entries in the leading eigenvector to be small and not detectable. This is permissible since our aim is parameter estimation instead of guaranteeing the model selection consistency. Remark 5.6. In practice, we can adaptively select the tuning parameter k in Algorithm 1. One possible way is to use the criterion of Yuan and Zhang 013, selecting k that maximizes û FT 1,k K T K val û FT 1,k K, where K val is an independent empirical multivariate Kendall s tau statistic based on a separated sample set of the data. Yuan and Zhang 013 showed that such a heuristic performed quite well in applications. 17
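To make the iterative step of Algorithm 1 concrete, here is a minimal Python sketch (our own, not the authors' implementation) of the truncate-and-normalize power iteration applied to K̂. The initial vector v0 is assumed to be supplied externally, for example from the Fantope-based initialization in (5.4), and the function names are hypothetical.

```python
# Sketch of the truncated power iteration at the core of the FM algorithm
# (Algorithm 1): multiply by K_hat, keep the k largest-magnitude coordinates,
# renormalize, and stop when successive iterates are within a tolerance eps.
import numpy as np

def truncate(x, k):
    """Zero out all but the k largest-magnitude entries of x."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]
    out[keep] = x[keep]
    return out

def truncated_power_method(K_hat, v0, k, eps=1e-6, max_iter=1000):
    """v0: initial unit vector (e.g., from the Fantope step); returns a k-sparse unit vector."""
    v = v0 / np.linalg.norm(v0)
    for _ in range(max_iter):
        x = K_hat @ v
        x = x if np.count_nonzero(x) <= k else truncate(x, k)   # the truncation step of Algorithm 1
        v_new = x / np.linalg.norm(x)
        if np.linalg.norm(v_new - v) <= eps:
            return v_new
        v = v_new
    return v
```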

Remark 5.7. In Corollary 5.1 and Theorem 5.4, we assume that λ is of the same scale as λ_1(K)√(log d/n). In practice, λ is a tuning parameter. Here we can select λ using similar data-driven estimation procedures as proposed in Lounici (2013) and Wegkamp and Zhao (2016). The main idea is to replace the population quantities in (5.5) with their corresponding empirical versions. We conjecture that similar theoretical behaviors can be anticipated for the data-driven choice.

6 Numerical Experiments

In this section we use both synthetic and real data to investigate the empirical usefulness of ECA. We use the FM algorithm described in Algorithm 1 for parameter estimation. To estimate more than one leading eigenvector, we exploit the deflation method proposed in Mackey (2008). Here the cardinalities of the support sets of the leading eigenvectors are treated as tuning parameters. The following three methods are considered: PCA, the sparse PCA method on the Pearson's sample covariance matrix; TCA, transelliptical component analysis based on the transformed Kendall's tau covariance matrix shown in Equation (2.2); and ECA, elliptical component analysis based on the multivariate Kendall's tau matrix. For fairness of comparison, TCA and PCA also exploit the FM algorithm, while using the Kendall's tau covariance matrix and the Pearson's sample covariance matrix as the input matrix. The tuning parameter λ in (5.1) is selected using the method discussed in Remark 5.7, and the truncation value δ in (5.4) is selected such that ‖v^(0)‖_0 = s for the pre-specified sparsity level s.

6.1 Simulation Study

In this section, we conduct a simulation study to back up the theoretical results and further investigate the empirical performance of ECA.

6.1.1 Dependence on Sample Size and Dimension

We first illustrate the dependence of the estimation accuracy of the sparse ECA estimator on the triplet (n, d, s). We adopt the data generating schemes of Yuan and Zhang (2013) and Han and Liu (2014). More specifically, we first create a covariance matrix Σ whose first two eigenvectors v_j := (v_j1, ..., v_jd)^T are specified to be sparse:

v_1j = 1/√10 for 1 ≤ j ≤ 10 and 0 otherwise, and v_2j = 1/√10 for 11 ≤ j ≤ 20 and 0 otherwise.

Then we let Σ be

Σ = 5 v_1 v_1^T + 2 v_2 v_2^T + I_d,

where I_d in R^{d x d} is the identity matrix. We have λ_1(Σ) = 6, λ_2(Σ) = 3, λ_3(Σ) = ... = λ_d(Σ) = 1. Using Σ as the covariance matrix, we generate n data points from a Gaussian distribution or a multivariate-t distribution with 3 degrees of freedom. Here the dimension d varies from 64 to 256 and the sample size n varies from 10 to 500.

Figure 1 plots the averaged angle distances sin∠(ṽ_1, v_1) between the sparse ECA estimate ṽ_1 and the true parameter v_1, for dimensions d = 64, 100, 256, over 1,000 replications. In each setting, s := ‖v_1‖_0 is fixed to be a constant (s = 10).

[Figure 1 about here: panels A and B, with curves for d = 64, 100, 256 plotted against log d/n.]

Figure 1: Simulation for two different distributions (normal and multivariate-t) with varying dimension d and sample size n. Plots of averaged angle distances between the estimators and the true parameters are conducted over 1,000 replications. (A) Normal distribution; (B) Multivariate-t distribution.

By examining the curves in Figure 1 (A) and (B), the angle distance between v_1 and ṽ_1 starts at almost zero when the sample size n is large enough, and then transits to almost one as the sample size decreases (in other words, as 1/n increases). Figure 1 shows that all curves almost overlap with each other when the angle distances are plotted against log d/n. This phenomenon confirms the results in Theorem 5.4. Consequently, the ratio n/log d acts as an effective sample size in controlling the prediction accuracy of the eigenvectors. In the supplementary materials, we further provide results when s is set to be 5 and 20. There one will see that the conclusions drawn here still hold.

6.1.2 Estimating the Leading Eigenvector of the Covariance Matrix

We now focus on estimating the leading eigenvector of the covariance matrix Σ. The first three rows in Table 2 list the simulation schemes of n, d, and Σ. In detail, let ω_1 > ω_2 > ω_3 = ... = ω_d be the eigenvalues and v_1, ..., v_d be the eigenvectors of Σ with v_j := (v_j1, ..., v_jd)^T. The top m leading eigenvectors v_1, ..., v_m of Σ are specified to be sparse such that s_j := ‖v_j‖_0 is small and

v_jk = 1/√(s_j) if 1 + Σ_{i=1}^{j-1} s_i ≤ k ≤ Σ_{i=1}^{j} s_i, and 0 otherwise.

Table 2: Simulation schemes with different n, d, and Σ. Here the eigenvalues of Σ are set to be ω_1 > ... > ω_m > ω_{m+1} = ... = ω_d, and the top m leading eigenvectors v_1, ..., v_m of Σ are specified to be sparse with s_j := ‖v_j‖_0 and v_jk = 1/√(s_j) for k in [1 + Σ_{i=1}^{j-1} s_i, Σ_{i=1}^{j} s_i] and zero for all the others. Σ is generated as Σ = Σ_{j=1}^m (ω_j - ω_d) v_j v_j^T + ω_d I_d. The column "Cardinalities" shows the cardinalities of the support sets of {v_j} in the form s_1, s_2, ..., s_m, ..., and the column "Eigenvalues" shows the eigenvalues of Σ in the form ω_1, ω_2, ..., ω_m, ω_d, ω_d, .... In the first three schemes, m is set to be 2; in the second three schemes, m is set to be 4.

Scheme   | n   | d   | Cardinalities      | Eigenvalues
Scheme 1 | ... | ... | ..., 10, ...       | 6, 3, 1, 1, 1, 1, ...
Scheme 2 | ... | ... | ..., 10, ...       | 6, 3, 1, 1, 1, 1, ...
Scheme 3 | ... | ... | ..., 10, ...       | 6, 3, 1, 1, 1, 1, ...
Scheme 4 | ... | ... | ..., 8, 6, 5, ...  | 8, 4, 2, 1, 0.01, 0.01, ...
Scheme 5 | ... | ... | ..., 8, 6, 5, ...  | 8, 4, 2, 1, 0.01, 0.01, ...
Scheme 6 | ... | ... | ..., 8, 6, 5, ...  | 8, 4, 2, 1, 0.01, 0.01, ...

Accordingly, Σ is generated as

Σ = Σ_{j=1}^m (ω_j - ω_d) v_j v_j^T + ω_d I_d.

Table 2 shows the cardinalities s_1, ..., s_m and the eigenvalues ω_1, ..., ω_m and ω_d. In this section we set m = 2 for the first three schemes and m = 4 for the later three schemes. We consider the following four different elliptical distributions:

Normal: X ~ EC_d(0, Σ, ξ_1 √(d/Eξ_1^2)) with ξ_1 =_d χ_d. Here χ_d is the chi distribution with d degrees of freedom: for Y_1, ..., Y_d i.i.d. N(0, 1), we have √(Y_1^2 + ... + Y_d^2) =_d χ_d. In this setting, X follows a Gaussian distribution (Fang et al., 1990).

Multivariate-t: X ~ EC_d(0, Σ, ξ_2 √(d/Eξ_2^2)) with ξ_2 =_d √κ · ξ_1'/ξ_2'. Here ξ_1' =_d χ_d and ξ_2' =_d χ_κ with κ a positive integer. In this setting, X follows a multivariate-t distribution with κ degrees of freedom (Fang et al., 1990). Here we consider κ = 3.

EC1: X ~ EC_d(0, Σ, ξ_3) with ξ_3 =_d F_{d,1}, i.e., ξ_3 follows an F-distribution with degrees of freedom d and 1. Here ξ_3 has no finite mean, but ECA can still estimate the eigenvectors of the scatter matrix and is thus robust.

EC2: X ~ EC_d(0, Σ, ξ_4 √(d/Eξ_4^2)) with ξ_4 =_d Exp(1), i.e., ξ_4 follows an exponential distribution with rate parameter 1.

We generate n data points according to Schemes 1 to 3 and the four distributions discussed above, 1,000 times each. To show the estimation accuracy, Figure 2 plots the angle distances between the estimate v̂_1 and the true parameter v_1, defined as sin∠(v̂_1, v_1), against the number of estimated nonzero entries, defined as ‖v̂_1‖_0, for three different methods: PCA, TCA, and ECA.
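The following Python sketch (our own illustration) shows how a covariance matrix of the form Σ = Σ_j (ω_j - ω_d) v_j v_j^T + ω_d I_d with sparse leading eigenvectors can be built, and how elliptical samples can be drawn through the stochastic representation (2.1). The particular choices of d, n, the cardinalities, and the eigenvalues below are illustrative only and are not the paper's scheme settings.

```python
# Sketch of the simulation design in Section 6.1.2: a covariance matrix with
# sparse leading eigenvectors and elliptical samples X = xi * A * U drawn via
# the stochastic representation (2.1). All numeric settings are illustrative.
import numpy as np

def make_sigma(d, cardinalities, eigenvalues):
    """cardinalities: support sizes s_1..s_m; eigenvalues: (w_1, ..., w_m, w_d)."""
    *w_top, w_d = eigenvalues
    Sigma = w_d * np.eye(d)
    start = 0
    for s_j, w_j in zip(cardinalities, w_top):
        v = np.zeros(d)
        v[start:start + s_j] = 1.0 / np.sqrt(s_j)     # v_jk = 1/sqrt(s_j) on its support
        Sigma += (w_j - w_d) * np.outer(v, v)
        start += s_j
    return Sigma

def sample_elliptical(n, Sigma, xi_sampler, rng):
    """Draw n points of X = xi * A * U with A A^T = Sigma and U uniform on the sphere."""
    d = Sigma.shape[0]
    A = np.linalg.cholesky(Sigma)
    G = rng.standard_normal((n, d))
    U = G / np.linalg.norm(G, axis=1, keepdims=True)  # uniform on the unit sphere
    xi = xi_sampler(n, rng)
    return xi[:, None] * (U @ A.T)

rng = np.random.default_rng(1)
Sigma = make_sigma(d=100, cardinalities=[10, 10], eigenvalues=[6.0, 3.0, 1.0])
# Multivariate-t-type generating variable: xi proportional to sqrt(kappa) * chi_d / chi_kappa.
xi_t = lambda n, rng, d=100, kappa=3: np.sqrt(kappa * rng.chisquare(d, n) / rng.chisquare(kappa, n))
X = sample_elliptical(500, Sigma, xi_t, rng)
```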

[Figure 2 about here: a grid of panels, "Scheme 1 / Scheme 2 / Scheme 3" (columns) for the normal, multivariate-t, EC1, and EC2 distributions (rows).]

Figure 2: Curves of angle distances between the estimates and true parameters for different schemes and distributions (normal, multivariate-t, EC1, and EC2, from top to bottom) using the FM algorithm. Here we are interested in estimating the leading eigenvector. The horizontal axis represents the cardinalities of the estimates' support sets and the vertical axis represents the angle distances.


Learning discrete graphical models via generalized inverse covariance matrices Learning discrete graphical models via generalized inverse covariance matrices Duzhe Wang, Yiming Lv, Yongjoon Kim, Young Lee Department of Statistics University of Wisconsin-Madison {dwang282, lv23, ykim676,

More information

Causal Inference: Discussion

Causal Inference: Discussion Causal Inference: Discussion Mladen Kolar The University of Chicago Booth School of Business Sept 23, 2016 Types of machine learning problems Based on the information available: Supervised learning Reinforcement

More information

Estimating Structured High-Dimensional Covariance and Precision Matrices: Optimal Rates and Adaptive Estimation

Estimating Structured High-Dimensional Covariance and Precision Matrices: Optimal Rates and Adaptive Estimation Estimating Structured High-Dimensional Covariance and Precision Matrices: Optimal Rates and Adaptive Estimation T. Tony Cai 1, Zhao Ren 2 and Harrison H. Zhou 3 University of Pennsylvania, University of

More information

Random projections. 1 Introduction. 2 Dimensionality reduction. Lecture notes 5 February 29, 2016

Random projections. 1 Introduction. 2 Dimensionality reduction. Lecture notes 5 February 29, 2016 Lecture notes 5 February 9, 016 1 Introduction Random projections Random projections are a useful tool in the analysis and processing of high-dimensional data. We will analyze two applications that use

More information

Lecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26

Lecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26 Principal Component Analysis Brett Bernstein CDS at NYU April 25, 2017 Brett Bernstein (CDS at NYU) Lecture 13 April 25, 2017 1 / 26 Initial Question Intro Question Question Let S R n n be symmetric. 1

More information

8.1 Concentration inequality for Gaussian random matrix (cont d)

8.1 Concentration inequality for Gaussian random matrix (cont d) MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration

More information

1 Data Arrays and Decompositions

1 Data Arrays and Decompositions 1 Data Arrays and Decompositions 1.1 Variance Matrices and Eigenstructure Consider a p p positive definite and symmetric matrix V - a model parameter or a sample variance matrix. The eigenstructure is

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation

Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Adam J. Rothman School of Statistics University of Minnesota October 8, 2014, joint work with Liliana

More information

14 Singular Value Decomposition

14 Singular Value Decomposition 14 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

Statistical Machine Learning for Structured and High Dimensional Data

Statistical Machine Learning for Structured and High Dimensional Data Statistical Machine Learning for Structured and High Dimensional Data (FA9550-09- 1-0373) PI: Larry Wasserman (CMU) Co- PI: John Lafferty (UChicago and CMU) AFOSR Program Review (Jan 28-31, 2013, Washington,

More information

Computational and Statistical Boundaries for Submatrix Localization in a Large Noisy Matrix

Computational and Statistical Boundaries for Submatrix Localization in a Large Noisy Matrix University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 2017 Computational and Statistical Boundaries for Submatrix Localization in a Large Noisy Matrix Tony Cai University

More information

Properties of Matrices and Operations on Matrices

Properties of Matrices and Operations on Matrices Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,

More information

Sparse PCA with applications in finance

Sparse PCA with applications in finance Sparse PCA with applications in finance A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon 1 Introduction

More information

Singular Value Decomposition

Singular Value Decomposition Chapter 6 Singular Value Decomposition In Chapter 5, we derived a number of algorithms for computing the eigenvalues and eigenvectors of matrices A R n n. Having developed this machinery, we complete our

More information

Lecture 8: Linear Algebra Background

Lecture 8: Linear Algebra Background CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 8: Linear Algebra Background Lecturer: Shayan Oveis Gharan 2/1/2017 Scribe: Swati Padmanabhan Disclaimer: These notes have not been subjected

More information

Estimation of large dimensional sparse covariance matrices

Estimation of large dimensional sparse covariance matrices Estimation of large dimensional sparse covariance matrices Department of Statistics UC, Berkeley May 5, 2009 Sample covariance matrix and its eigenvalues Data: n p matrix X n (independent identically distributed)

More information

Inverse Power Method for Non-linear Eigenproblems

Inverse Power Method for Non-linear Eigenproblems Inverse Power Method for Non-linear Eigenproblems Matthias Hein and Thomas Bühler Anubhav Dwivedi Department of Aerospace Engineering & Mechanics 7th March, 2017 1 / 30 OUTLINE Motivation Non-Linear Eigenproblems

More information

sparse and low-rank tensor recovery Cubic-Sketching

sparse and low-rank tensor recovery Cubic-Sketching Sparse and Low-Ran Tensor Recovery via Cubic-Setching Guang Cheng Department of Statistics Purdue University www.science.purdue.edu/bigdata CCAM@Purdue Math Oct. 27, 2017 Joint wor with Botao Hao and Anru

More information

High-dimensional covariance estimation based on Gaussian graphical models

High-dimensional covariance estimation based on Gaussian graphical models High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,

More information

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28 Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:

More information

Minimax Rates of Estimation for High-Dimensional Linear Regression Over -Balls

Minimax Rates of Estimation for High-Dimensional Linear Regression Over -Balls 6976 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Minimax Rates of Estimation for High-Dimensional Linear Regression Over -Balls Garvesh Raskutti, Martin J. Wainwright, Senior

More information

Permutation-invariant regularization of large covariance matrices. Liza Levina

Permutation-invariant regularization of large covariance matrices. Liza Levina Liza Levina Permutation-invariant covariance regularization 1/42 Permutation-invariant regularization of large covariance matrices Liza Levina Department of Statistics University of Michigan Joint work

More information

arxiv: v3 [stat.ml] 31 Aug 2018

arxiv: v3 [stat.ml] 31 Aug 2018 Sparse Generalized Eigenvalue Problem: Optimal Statistical Rates via Truncated Rayleigh Flow Kean Ming Tan, Zhaoran Wang, Han Liu, and Tong Zhang arxiv:1604.08697v3 [stat.ml] 31 Aug 018 September 5, 018

More information

Convergence of Eigenspaces in Kernel Principal Component Analysis

Convergence of Eigenspaces in Kernel Principal Component Analysis Convergence of Eigenspaces in Kernel Principal Component Analysis Shixin Wang Advanced machine learning April 19, 2016 Shixin Wang Convergence of Eigenspaces April 19, 2016 1 / 18 Outline 1 Motivation

More information

ELE 538B: Mathematics of High-Dimensional Data. Spectral methods. Yuxin Chen Princeton University, Fall 2018

ELE 538B: Mathematics of High-Dimensional Data. Spectral methods. Yuxin Chen Princeton University, Fall 2018 ELE 538B: Mathematics of High-Dimensional Data Spectral methods Yuxin Chen Princeton University, Fall 2018 Outline A motivating application: graph clustering Distance and angles between two subspaces Eigen-space

More information

Upper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1

Upper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1 Upper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1 Feng Wei 2 University of Michigan July 29, 2016 1 This presentation is based a project under the supervision of M. Rudelson.

More information

SVD, PCA & Preprocessing

SVD, PCA & Preprocessing Chapter 1 SVD, PCA & Preprocessing Part 2: Pre-processing and selecting the rank Pre-processing Skillicorn chapter 3.1 2 Why pre-process? Consider matrix of weather data Monthly temperatures in degrees

More information

High Dimensional Inverse Covariate Matrix Estimation via Linear Programming

High Dimensional Inverse Covariate Matrix Estimation via Linear Programming High Dimensional Inverse Covariate Matrix Estimation via Linear Programming Ming Yuan October 24, 2011 Gaussian Graphical Model X = (X 1,..., X p ) indep. N(µ, Σ) Inverse covariance matrix Σ 1 = Ω = (ω

More information

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin 1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley A. d Aspremont, INFORMS, Denver,

More information

Recall the convention that, for us, all vectors are column vectors.

Recall the convention that, for us, all vectors are column vectors. Some linear algebra Recall the convention that, for us, all vectors are column vectors. 1. Symmetric matrices Let A be a real matrix. Recall that a complex number λ is an eigenvalue of A if there exists

More information

Robustness of Principal Components

Robustness of Principal Components PCA for Clustering An objective of principal components analysis is to identify linear combinations of the original variables that are useful in accounting for the variation in those original variables.

More information

A Generalized Least Squares Matrix Decomposition

A Generalized Least Squares Matrix Decomposition A Generalized Least Squares Matrix Decomposition Genevera I. Allen Department of Pediatrics-Neurology, Baylor College of Medicine, & Department of Statistics, Rice University, Houston, TX. Logan Grosenick

More information

Lecture 7: Positive Semidefinite Matrices

Lecture 7: Positive Semidefinite Matrices Lecture 7: Positive Semidefinite Matrices Rajat Mittal IIT Kanpur The main aim of this lecture note is to prepare your background for semidefinite programming. We have already seen some linear algebra.

More information

Basic Concepts in Matrix Algebra

Basic Concepts in Matrix Algebra Basic Concepts in Matrix Algebra An column array of p elements is called a vector of dimension p and is written as x p 1 = x 1 x 2. x p. The transpose of the column vector x p 1 is row vector x = [x 1

More information

Methods for sparse analysis of high-dimensional data, II

Methods for sparse analysis of high-dimensional data, II Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional

More information

Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora

Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora Scribe: Today we continue the

More information

PCA, Kernel PCA, ICA

PCA, Kernel PCA, ICA PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per

More information

Does l p -minimization outperform l 1 -minimization?

Does l p -minimization outperform l 1 -minimization? Does l p -minimization outperform l -minimization? Le Zheng, Arian Maleki, Haolei Weng, Xiaodong Wang 3, Teng Long Abstract arxiv:50.03704v [cs.it] 0 Jun 06 In many application areas ranging from bioinformatics

More information

Least squares under convex constraint

Least squares under convex constraint Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption

More information

Conditions for Robust Principal Component Analysis

Conditions for Robust Principal Component Analysis Rose-Hulman Undergraduate Mathematics Journal Volume 12 Issue 2 Article 9 Conditions for Robust Principal Component Analysis Michael Hornstein Stanford University, mdhornstein@gmail.com Follow this and

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

Lecture 5 : Projections

Lecture 5 : Projections Lecture 5 : Projections EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Up until now, we have seen convergence rates of unconstrained gradient descent. Now, we consider a constrained minimization

More information

arxiv: v3 [stat.me] 8 Jun 2018

arxiv: v3 [stat.me] 8 Jun 2018 Between hard and soft thresholding: optimal iterative thresholding algorithms Haoyang Liu and Rina Foygel Barber arxiv:804.0884v3 [stat.me] 8 Jun 08 June, 08 Abstract Iterative thresholding algorithms

More information

CSC 576: Variants of Sparse Learning

CSC 576: Variants of Sparse Learning CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in

More information

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)

SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular

More information

Factor Analysis (10/2/13)

Factor Analysis (10/2/13) STA561: Probabilistic machine learning Factor Analysis (10/2/13) Lecturer: Barbara Engelhardt Scribes: Li Zhu, Fan Li, Ni Guan Factor Analysis Factor analysis is related to the mixture models we have studied.

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Methods for sparse analysis of high-dimensional data, II

Methods for sparse analysis of high-dimensional data, II Methods for sparse analysis of high-dimensional data, II Rachel Ward May 26, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 55 High dimensional

More information

arxiv: v1 [math.pr] 22 May 2008

arxiv: v1 [math.pr] 22 May 2008 THE LEAST SINGULAR VALUE OF A RANDOM SQUARE MATRIX IS O(n 1/2 ) arxiv:0805.3407v1 [math.pr] 22 May 2008 MARK RUDELSON AND ROMAN VERSHYNIN Abstract. Let A be a matrix whose entries are real i.i.d. centered

More information

(Part 1) High-dimensional statistics May / 41

(Part 1) High-dimensional statistics May / 41 Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2

More information

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao

More information

The Nonparanormal skeptic

The Nonparanormal skeptic The Nonpara skeptic Han Liu Johns Hopkins University, 615 N. Wolfe Street, Baltimore, MD 21205 USA Fang Han Johns Hopkins University, 615 N. Wolfe Street, Baltimore, MD 21205 USA Ming Yuan Georgia Institute

More information

arxiv: v2 [math.st] 12 Feb 2008

arxiv: v2 [math.st] 12 Feb 2008 arxiv:080.460v2 [math.st] 2 Feb 2008 Electronic Journal of Statistics Vol. 2 2008 90 02 ISSN: 935-7524 DOI: 0.24/08-EJS77 Sup-norm convergence rate and sign concentration property of Lasso and Dantzig

More information

SUPPLEMENTAL NOTES FOR ROBUST REGULARIZED SINGULAR VALUE DECOMPOSITION WITH APPLICATION TO MORTALITY DATA

SUPPLEMENTAL NOTES FOR ROBUST REGULARIZED SINGULAR VALUE DECOMPOSITION WITH APPLICATION TO MORTALITY DATA SUPPLEMENTAL NOTES FOR ROBUST REGULARIZED SINGULAR VALUE DECOMPOSITION WITH APPLICATION TO MORTALITY DATA By Lingsong Zhang, Haipeng Shen and Jianhua Z. Huang Purdue University, University of North Carolina,

More information

Parallel Singular Value Decomposition. Jiaxing Tan

Parallel Singular Value Decomposition. Jiaxing Tan Parallel Singular Value Decomposition Jiaxing Tan Outline What is SVD? How to calculate SVD? How to parallelize SVD? Future Work What is SVD? Matrix Decomposition Eigen Decomposition A (non-zero) vector

More information

Dimensionality Reduction Using the Sparse Linear Model: Supplementary Material

Dimensionality Reduction Using the Sparse Linear Model: Supplementary Material Dimensionality Reduction Using the Sparse Linear Model: Supplementary Material Ioannis Gkioulekas arvard SEAS Cambridge, MA 038 igkiou@seas.harvard.edu Todd Zickler arvard SEAS Cambridge, MA 038 zickler@seas.harvard.edu

More information

Sparse representation classification and positive L1 minimization

Sparse representation classification and positive L1 minimization Sparse representation classification and positive L1 minimization Cencheng Shen Joint Work with Li Chen, Carey E. Priebe Applied Mathematics and Statistics Johns Hopkins University, August 5, 2014 Cencheng

More information

Math 408 Advanced Linear Algebra

Math 408 Advanced Linear Algebra Math 408 Advanced Linear Algebra Chi-Kwong Li Chapter 4 Hermitian and symmetric matrices Basic properties Theorem Let A M n. The following are equivalent. Remark (a) A is Hermitian, i.e., A = A. (b) x

More information

Short Course Robust Optimization and Machine Learning. Lecture 4: Optimization in Unsupervised Learning

Short Course Robust Optimization and Machine Learning. Lecture 4: Optimization in Unsupervised Learning Short Course Robust Optimization and Machine Machine Lecture 4: Optimization in Unsupervised Laurent El Ghaoui EECS and IEOR Departments UC Berkeley Spring seminar TRANSP-OR, Zinal, Jan. 16-19, 2012 s

More information

Lecture 5 Singular value decomposition

Lecture 5 Singular value decomposition Lecture 5 Singular value decomposition Weinan E 1,2 and Tiejun Li 2 1 Department of Mathematics, Princeton University, weinan@princeton.edu 2 School of Mathematical Sciences, Peking University, tieli@pku.edu.cn

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions

Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions Solution-recovery in l 1 -norm for non-square linear systems: deterministic conditions and open questions Yin Zhang Technical Report TR05-06 Department of Computational and Applied Mathematics Rice University,

More information

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear

More information

Sparse principal component analysis via random projections

Sparse principal component analysis via random projections Sparse principal component analysis via random projections Milana Gataric, Tengyao Wang and Richard J. Samworth Statistical Laboratory, University of Cambridge {m.gataric,t.wang,r.samworth}@statslab.cam.ac.uk

More information

15 Singular Value Decomposition

15 Singular Value Decomposition 15 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information