Large Covariance Estimation by Thresholding Principal Orthogonal Complements


Jianqing Fan, Yuan Liao and Martina Mincheva

Department of Operations Research and Financial Engineering, Princeton University

Abstract

This paper deals with the estimation of a high-dimensional covariance with a conditional sparsity structure, which is the composition of a low-rank matrix plus a sparse matrix. By assuming a sparse error covariance matrix in a multi-factor model, we allow for the presence of cross-sectional correlation even after taking out common but unobservable factors. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as specific examples. We provide mathematical insights into when the factor analysis is approximately the same as the principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms, including the spectral norm. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies.

Keywords: High dimensionality, approximate factor model, unknown factors, principal components, sparse matrix, low-rank matrix, thresholding, cross-sectional correlation.

Address: Department of ORFE, Sherrerd Hall, Princeton University, Princeton, NJ 08544, USA; e-mail: jqfan@princeton.edu, yuanliao@princeton.edu, mincheva@princeton.edu. The research was partially supported by NIH grants (including R01-GM072611) and an NSF DMS grant.

1 Introduction

Information and technology make large data sets widely available for scientific discovery. Much statistical analysis of such high-dimensional data involves the estimation of a covariance matrix or its inverse (the precision matrix). Examples include portfolio management and risk assessment (Fan, Fan and Lv, 2008), high-dimensional classification such as the Fisher discriminant (Hastie, Tibshirani and Friedman, 2009), graphical models (Meinshausen and Bühlmann, 2006), statistical inference such as controlling false discoveries in multiple testing (Leek and Storey, 2008; Efron, 2010), finding quantitative trait loci based on longitudinal data (Yap, Fan, and Wu, 2009; Xiong et al., 2011), and testing the capital asset pricing model (Sentana, 2009), among others. See Section 4 for some of these applications. Yet the dimensionality is often comparable to the sample size, or even larger, in which case the sample covariance matrix is known to perform poorly because of its diverging spectrum, and some regularization is needed.

Realizing the importance of estimating large covariance matrices and the challenges brought by high dimensionality, researchers have in recent years proposed various regularization techniques to estimate $\Sigma$ consistently. One of the key assumptions is that the covariance matrix is sparse, namely, many entries are zero (Bickel and Levina, 2008; Rothman et al., 2009; Lam and Fan, 2009; Cai and Zhou, 2010; Cai and Liu, 2011). In many applications, however, the sparsity assumption directly on $\Sigma$ is not appropriate. For example, financial returns depend on equity market risks, housing prices depend on economic health, and gene expressions can be stimulated by cytokines. Due to the presence of common factors, it is unrealistic to assume that many outcomes are uncorrelated. An alternative method is to assume a factor model structure, as in Fan, Fan and Lv (2008). However, they restrict themselves to strict factor models with known factors. A natural extension is conditional sparsity: given the common factors, the outcomes are only weakly correlated. This approximate factor model is frequently used in economic and financial studies (Chamberlain and Rothschild, 1983; Fama and French, 1993; Bai and Ng, 2002). A factor model typically takes the following form:

$$y_{it} = b_i' f_t + u_{it}, \quad (1.1)$$

where $y_{it}$ is the observed datum for the $i$th ($i = 1, \dots, p$) variable at time $t = 1, \dots, T$; $b_i$ is a $K \times 1$ vector of factor loadings; $f_t$ is a $K \times 1$ vector of common factors; and $u_{it}$ is the idiosyncratic error component, uncorrelated with $f_t$. In a data-rich environment, $p$ can diverge at a rate faster than $T$.
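To fix ideas, the following minimal Python sketch simulates data from model (1.1); the sizes, the i.i.d. factor draws, and the banded choice of a sparse error covariance are illustrative assumptions rather than the calibrated design of Section 5.

```python
import numpy as np

rng = np.random.default_rng(0)
p, T, K = 100, 200, 3            # illustrative dimensions (assumptions)

B = rng.normal(size=(p, K))      # loadings b_i stacked as rows of B
F = rng.normal(size=(T, K))      # i.i.d. factors, for simplicity

# Sparse idiosyncratic covariance: identity plus one off-diagonal band.
Sigma_u = np.eye(p)
for i in range(p - 1):
    Sigma_u[i, i + 1] = Sigma_u[i + 1, i] = 0.2

U = rng.multivariate_normal(np.zeros(p), Sigma_u, size=T)
Y = F @ B.T + U                  # row t of Y is y_t' = f_t' B' + u_t'
```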

The factor model (1.1) can be put in the matrix form

$$y_t = B f_t + u_t, \quad (1.2)$$

where $y_t = (y_{1t}, \dots, y_{pt})'$, $B = (b_1, \dots, b_p)'$ and $u_t = (u_{1t}, \dots, u_{pt})'$. Under model (1.1), the covariance matrix $\Sigma$ is given by

$$\Sigma = B\, \mathrm{cov}(f_t) B' + \Sigma_u, \quad (1.3)$$

where $\Sigma_u$ is the covariance matrix of $u_t$. We assume that $\Sigma_u$ is sparse (instead of diagonal) and refer to the model as the approximate factor model.

The conditional sparsity of form (1.2) was explored by Fan, Liao and Mincheva (2011) in estimating the covariance matrix when the factors $\{f_t\}$ are observable, which allows them to use regression analysis to estimate $\{u_t\}$. This paper deals with the situation in which the factors are unobservable and have to be inferred. Our approach is very simple: run the singular value decomposition on the sample covariance matrix, keep the covariance matrix formed by the first $K$ principal components, and apply the thresholding procedure to the remaining covariance matrix. This results in the Principal Orthogonal complEment Thresholding (POET) estimator. See Section 2 for additional details. We will investigate various properties of POET under the assumption that the data are serially dependent, which includes independent observations as a specific example.

The high-dimensional approximate factor model (1.2) is innately related to principal component analysis, as clearly elucidated in Section 2. This is a distinguished feature not shared by finite-dimensional models. Bai (2003) established the large sample properties of the principal components as estimators of the factor loadings as well as of the common factors when they are unobservable. Forni, Hallin, Lippi and Reichlin (2000) established consistency for the estimated common components $b_i' f_t$. In addition, Doz, Giannone and Reichlin (2006) proposed quasi-maximum likelihood estimation and investigated the effect of misspecification on the estimation of common factors, and Wang (2010) studied the large sample theory of high-dimensional factor models with a multi-level factor structure. Stock and Watson (2002) considered time-varying factor loadings. See Bai and Shi (2011) for a recent review.

There has been an extensive literature in recent years on sparse principal components, which have been widely used to enhance the convergence of the principal components in high-dimensional space. d'Aspremont, Bach and El Ghaoui (2008) proposed a semidefinite relaxation to this problem and derived a greedy algorithm that computes a full set of good solutions for all target numbers of non-vanishing coefficients.

Shen and Huang (2008) proposed the sPCA-rSVD algorithm, and Witten, Tibshirani, and Hastie (2009) used the SPC algorithm for computing regularized principal components. The idea was further extended by Ma (2011), who iteratively applied thresholding and the QR decomposition to find sparse principal components and derived their rates of convergence. Johnstone and Lu (2009) screened the variables first by their marginal variances and then applied PCA to the screened variables to obtain a sparse principal component; they proved the consistency of such a method. Amini and Wainwright (2009) analyzed semidefinite relaxations for sparse principal components. Zhang and El Ghaoui (2011) proposed a fast block coordinate ascent algorithm for computing sparse PCA.

The rest of the paper is organized as follows. Section 2 gives our estimation procedures and builds the relationship between principal components analysis and factor analysis in high-dimensional space. Section 3 provides the asymptotic theory for various estimated quantities. Specific applications of the regularized covariance matrix are given in Section 4. Numerical results are reported in Section 5. Finally, Section 6 presents a real data example. All proofs are given in the appendix.

Throughout the paper, we use $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ to denote the minimum and maximum eigenvalues of a matrix $A$. We denote by $\|A\|_F$, $\|A\|$, $\|A\|_1$ and $\|A\|_{\max}$ the Frobenius norm, spectral norm (also called the operator norm), $L_1$-norm, and elementwise norm of a matrix $A$, defined respectively as $\|A\|_F = \mathrm{tr}^{1/2}(A'A)$, $\|A\| = \lambda_{\max}^{1/2}(A'A)$, $\|A\|_1 = \max_j \sum_i |a_{ij}|$ and $\|A\|_{\max} = \max_{i,j} |a_{ij}|$. Note that when $A$ is a vector, $\|A\|$ equals the Euclidean norm.

2 Regularized Covariance Matrix via PCA

2.1 POET

The sparsity assumption directly on $\Sigma$ is inappropriate in many applications due to the presence of unobserved factors. Instead, we propose a nonparametric estimator of $\Sigma$ based on principal components analysis. Let $\hat\lambda_1 \ge \hat\lambda_2 \ge \dots \ge \hat\lambda_p$ be the ordered eigenvalues of the sample covariance matrix $\hat\Sigma_{\mathrm{sam}}$ and $\{\hat\xi_i\}_{i=1}^p$ be their corresponding eigenvectors. Then the sample covariance has the following spectral decomposition:

$$\hat\Sigma_{\mathrm{sam}} = \sum_{i=1}^K \hat\lambda_i \hat\xi_i \hat\xi_i' + \hat R, \quad (2.1)$$

where $\hat R = \sum_{i=K+1}^p \hat\lambda_i \hat\xi_i \hat\xi_i' = (\hat r_{ij})_{p \times p}$ is the principal orthogonal complement, and $K$ is the number of principal components.
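The decomposition (2.1) is straightforward to compute. The sketch below, under the assumption that the de-meaned data are supplied as a $T \times p$ matrix, returns the leading eigenpairs and the principal orthogonal complement $\hat R$; the function name is ours.

```python
import numpy as np

def principal_orthogonal_complement(Y, K):
    """Spectral decomposition (2.1): sample covariance = low-rank part + R-hat."""
    S = np.cov(Y, rowvar=False)                      # p x p sample covariance
    eigval, eigvec = np.linalg.eigh(S)               # ascending eigenvalues
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]   # reorder to descending
    low_rank = (eigvec[:, :K] * eigval[:K]) @ eigvec[:, :K].T
    R = S - low_rank                                 # principal orthogonal complement
    return eigval, eigvec, low_rank, R
```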

Now we apply thresholding on $\hat R$ (Bickel and Levina, 2008). Define

$$\hat R^\tau = (\hat r_{ij}^\tau)_{p \times p}, \qquad \hat r_{ij}^\tau = \hat r_{ij}\, I(|\hat r_{ij}| \ge \tau_{ij}), \quad (2.2)$$

where $\tau_{ij} > 0$ is an entry-dependent adaptive threshold (Cai and Liu, 2011). In particular, the constant threshold $\tau_{ij} = \delta$ is allowed. An example of the adaptive threshold is

$$\tau_{ij} = \tau (\hat r_{ii} \hat r_{jj})^{1/2}, \quad \text{for a given } \tau > 0, \quad (2.3)$$

where $\hat r_{ii}$ is the $i$th diagonal element of $\hat R$. This corresponds to applying the threshold with parameter $\tau$ to the correlation matrix of $\hat R$. Moreover, instead of hard thresholding, one can also use the more general thresholding functions of Antoniadis and Fan (2001), as in Rothman et al. (2009) and Cai and Liu (2011). The estimator of $\Sigma$ is then defined as

$$\hat\Sigma = \sum_{i=1}^K \hat\lambda_i \hat\xi_i \hat\xi_i' + \hat R^\tau. \quad (2.4)$$

We call this the Principal Orthogonal complEment Thresholding (POET) estimator. It is obtained by thresholding the remaining components of the sample covariance matrix after taking out the first $K$ principal components. In practice, the number of principal components can be estimated based on the sample covariance matrix. Determining $K$ in a data-driven way is an important topic and is well understood in the literature on latent factor models; see, for example, Hallin and Liška (2007), Kapetanios (2010) and Onatski (2010).

With the choice of $\tau_{ij}$ in (2.3), our estimator encompasses many popular estimators as specific cases. When $\tau = 0$, the estimator is the sample covariance matrix, and when $\tau = 1$, the estimator becomes one based on the strict factor model. When $K = 0$, our estimator is the same as the thresholding estimator of Bickel and Levina (2008), or the adaptive thresholding estimator of Cai and Liu (2011) with a slight modification of the thresholding parameter that takes into account the standard error of the sample covariance.
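Putting (2.2)-(2.4) together gives the whole POET estimator in a few lines. In this sketch the constant $\tau$ of the correlation threshold (2.3) is passed in by the user; the data-driven choice of the threshold is the subject of Section 3, and the code is an illustration rather than the authors' implementation.

```python
import numpy as np

def poet(Y, K, tau):
    """POET estimator (2.4): top-K principal components plus thresholded remainder."""
    S = np.cov(Y, rowvar=False)
    eigval, eigvec = np.linalg.eigh(S)
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]
    low_rank = (eigvec[:, :K] * eigval[:K]) @ eigvec[:, :K].T
    R = S - low_rank

    scale = np.sqrt(np.outer(np.diag(R), np.diag(R)))    # (r_ii r_jj)^{1/2} in (2.3)
    R_thr = np.where(np.abs(R) >= tau * scale, R, 0.0)   # hard thresholding (2.2)
    np.fill_diagonal(R_thr, np.diag(R))                  # diagonal entries are kept
    return low_rank + R_thr
```

With `tau = 0` this returns the sample covariance, and with `tau = 1` it returns the strict-factor-model estimator, matching the special cases noted above.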

2.2 High-dimensional PCA and factor model

We now elucidate why PCA can be used for factor analysis. The main reason is that when the dimensionality $p$ is large, the covariance matrix $\Sigma$ has very spiked eigenvalues. For $\Sigma_u = (\sigma_{ij})_{p \times p}$ as in (1.3), define

$$m_p = \max_{i \le p} \sum_{j \le p} I(\sigma_{ij} \ne 0), \quad (2.5)$$

which is assumed to grow slowly in $p$; hence $\Sigma_u$ is a sparse covariance matrix. The rationale behind this assumption is that, after the common factors are taken out, many pairs of the cross-sectional units become uncorrelated.

As we now demonstrate, we need the following lemma.

Lemma 2.1. Let $\{\lambda_i\}_{i=1}^p$ be the eigenvalues of $\Sigma$ in descending order and $\{\xi_i\}_{i=1}^p$ be their associated eigenvectors. Correspondingly, let $\{\hat\lambda_i\}_{i=1}^p$ be the eigenvalues of $\hat\Sigma$ in descending order and $\{\hat\xi_i\}_{i=1}^p$ be their associated eigenvectors.

1. (Weyl's theorem) $|\hat\lambda_i - \lambda_i| \le \|\hat\Sigma - \Sigma\|$.

2. ($\sin\theta$ theorem, Davis and Kahan, 1970) $\|\hat\xi_i - \xi_i\| \le \dfrac{\sqrt{2}\, \|\hat\Sigma - \Sigma\|}{\min(|\hat\lambda_{i-1} - \lambda_i|,\ |\lambda_i - \hat\lambda_{i+1}|)}$.

When the factor loadings $\{b_i\}_{i=1}^p$ are a random sample from a certain population, we then have

$$\frac{1}{p} \sum_{i=1}^p b_i b_i' = \frac{1}{p} B'B \to E\, b_i b_i', \quad \text{as } p \to \infty, \quad (2.6)$$

under some mild conditions. Note that the linear space spanned by the first $K$ principal components of $B\, \mathrm{cov}(f_t) B'$ is the same as that spanned by the columns of $B$ when $\mathrm{cov}(f_t)$ is non-degenerate. Thus, we can assume without loss of generality that the columns of $B$ are orthogonal and $\mathrm{cov}(f_t) = I_K$, the identity matrix. These assumptions correspond to the identifiability condition in decomposition (1.3). Let $\tilde b_1, \dots, \tilde b_K$ be the columns of $B$, ordered such that $\{\|\tilde b_j\|\}_{j=1}^K$ is non-increasing. Then $\{\tilde b_j / \|\tilde b_j\|\}_{j=1}^K$ are the principal components of the matrix $BB'$ with eigenvalues $\{\|\tilde b_j\|^2\}_{j=1}^K$, and the rest are zero. Let $\{\bar\lambda_j\}_{j=1}^K$ be the eigenvalues (in non-increasing order) of $p^{-1} B'B$, which are bounded from below and above by (2.6), assuming $K \ll p$. Since the non-vanishing eigenvalues of the matrix $BB'$ are the same as those of $B'B$, the non-vanishing eigenvalues of $BB'$ are $\{p \bar\lambda_j\}_{j=1}^K$, and $p \bar\lambda_j = \|\tilde b_j\|^2$. Correspondingly, let $\{\lambda_i\}_{i=1}^p$ be the eigenvalues of $\Sigma$ in descending order and $\{\xi_j\}_{j=1}^p$ be their corresponding eigenvectors. Then an application of Weyl's theorem yields the following proposition.

Proposition 2.1. For the factor model (1.3) with the normalization condition

$$\mathrm{cov}(f_t) = I_K \quad \text{and} \quad B'B \text{ is diagonal}, \quad (2.7)$$

we have $|\lambda_j - \|\tilde b_j\|^2| \le \|\Sigma_u\|$ for $j \le K$, and $|\lambda_j| \le \|\Sigma_u\|$ for $j > K$. If in addition (2.6) holds with $\lambda_{\min}(E\, b_i b_i')$ bounded away from zero, then $\|\tilde b_j\|^2 / p$ is bounded away from zero for all large $p$.

The above proposition reveals that the first $K$ eigenvalues of $\Sigma$ are of order $p$, whereas the rest are at most of order $\|\Sigma_u\|$. Under the sparsity assumption (2.5), with the notation $\Sigma_u = (\sigma_{u,ij})$, we have

$$\|\Sigma_u\| \le \|\Sigma_u\|_1 \le m_p \max_i \sigma_{u,ii}.$$

Thus, when $m_p = o(p)$ and $\max_i \sigma_{u,ii}$ is bounded, the eigenvalues of the principal components are well separated from those of the remaining components. More generally,

$$\|\Sigma_u\| \le \|\Sigma_u\|_1 \le \max_{i \le p} \sum_{j=1}^p |\sigma_{u,ij}|^q (\sigma_{u,ii} \sigma_{u,jj})^{(1-q)/2}, \quad \text{for } q \le 1. \quad (2.8)$$

If we assume that the right-hand side of (2.8) is bounded by $c_p$ with $c_p = o(p)$, then the conclusion continues to hold. This generalizes the notion of sparsity defined by (2.5), which corresponds to $q = 0$ and was used in Bickel and Levina (2008) and Cai and Liu (2011).

Using Proposition 2.1 and the $\sin\theta$ theorem of Davis and Kahan (1970), we have the following proposition.

Proposition 2.2. Under the normalization (2.7), if $\{\|\tilde b_j\|\}_{j=1}^K$ are distinct, then

$$\|\xi_j - \tilde b_j / \|\tilde b_j\|\| = O(p^{-1} \|\Sigma_u\|), \quad \text{for } j \le K.$$

Proposition 2.2 reveals that when $p$ is large, the principal components $\{\xi_j\}_{j=1}^K$ are close to the normalized vectors $\{\tilde b_j / \|\tilde b_j\|\}_{j=1}^K$, and the space spanned by the first $K$ principal components is close to the space spanned by the columns of the factor loading matrix $B$. This provides the mathematics for using the principal components as a proxy for the space spanned by the columns of the loading matrix. Our conditions for the consistency of the principal components are much weaker than those in Jung and Marron (2009).
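A quick numerical check of Proposition 2.1, under the simplifying assumptions $\Sigma_u = I_p$ and i.i.d. standard normal loadings (so that $p^{-1} B'B \to I_K$), shows the first $K$ eigenvalues growing linearly in $p$ while the $(K+1)$st stays bounded:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3
for p in (100, 400, 1600):
    B = rng.normal(size=(p, K))            # loadings; p^{-1} B'B is close to I_K
    Sigma = B @ B.T + np.eye(p)            # cov(f_t) = I_K and Sigma_u = I_p
    lam = np.linalg.eigvalsh(Sigma)[::-1]  # eigenvalues in descending order
    print(p, lam[:K] / p, lam[K])          # lam[:K]/p stabilizes; lam[K] stays near 1
```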

A penalized least-squares approach to the low-rank plus sparse decomposition is to minimize, with respect to $B$ and $S$, the objective function

$$\|\hat\Sigma_{\mathrm{sam}} - BB' - S\|_F^2 + \lambda\, \mathrm{rank}(B) + \sum_{i \ne j} p_\lambda(|s_{ij}|). \quad (2.9)$$

For example, Luo (2011) takes $p_\lambda(|\theta|) = \lambda |\theta|$ (but includes the penalty on the diagonal terms) and relaxes the penalty on $\mathrm{rank}(B)$ by its nuclear norm (the $L_1$-norm of the singular values). The advantage of our approach is that no optimization is needed. Even when the rank of $B$ is known and $p_\lambda(|\theta|) = \lambda^2 - (\lambda - |\theta|)_+^2$ is the hard thresholding penalty (Antoniadis and Fan, 2001), the minimization of (2.9) still consists of two-step iterations: given $B$, $S$ is the thresholded matrix of $\hat\Sigma_{\mathrm{sam}} - BB'$, and given $S$, $B$ consists of the un-normalized principal components of $\hat\Sigma_{\mathrm{sam}} - S$. With POET, these iterations are avoided, and Propositions 2.1 and 2.2 provide some mathematical justification.

2.3 Least squares point of view

The POET estimator (2.4) has an equivalent representation using a constrained least squares method, due to the low-rank decomposition (1.3). In the latent factor model (1.1), the least squares method seeks $\hat\Lambda = (\hat b_1, \dots, \hat b_p)'$ and $\hat f_t$ such that

$$(\hat\Lambda, \hat f_t) = \arg\min_{b_i, f_t} \sum_{i=1}^p \sum_{t=1}^T (y_{it} - b_i' f_t)^2, \quad (2.10)$$

subject to the normalization

$$\frac{1}{T} \sum_{t=1}^T f_t f_t' = I_K, \quad \text{and} \quad \frac{1}{p} \sum_{i=1}^p b_i b_i' \text{ is diagonal}. \quad (2.11)$$

The constraints (2.11) correspond to the normalization (2.7). Here, through the inclusion of intercept terms $a_i$ in (2.10) if necessary, we assume that the mean of each variable $\{y_{it}\}$ has been removed, and so have the factors $\{f_t\}$. Put in matrix form, the optimization problem can be written as

$$\arg\min_{B, F} \|Y - BF'\|_F^2, \qquad \frac{1}{T} F'F = I_K, \quad B'B \text{ diagonal}, \quad (2.12)$$

where $Y = (y_1, \dots, y_T)$ and $F' = (f_1, \dots, f_T)$. For each given $F$, the least-squares estimator of the loadings is $\hat\Lambda = T^{-1} Y F$, using the constraint (2.11) on the factors. Substituting this into (2.12), the objective function becomes $\|Y - T^{-1} Y F F'\|_F^2 = \mathrm{tr}[(I_T - T^{-1} F F') Y'Y]$. The minimizer is now clear: the columns of $\hat F / \sqrt{T}$ are the eigenvectors corresponding to the $K$ largest eigenvalues of the matrix $Y'Y$, and $\hat\Lambda = T^{-1} Y \hat F$.
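The solution of (2.12) just described can be computed directly; a sketch (with $Y$ stored as $p \times T$, as in the text) follows.

```python
import numpy as np

def ls_factors(Y, K):
    """Constrained least squares (2.12): F-hat from the top-K eigenvectors of Y'Y."""
    p, T = Y.shape
    w, v = np.linalg.eigh(Y.T @ Y)             # T x T eigenproblem, ascending order
    F_hat = np.sqrt(T) * v[:, ::-1][:, :K]     # scaling gives F'F / T = I_K, as in (2.11)
    Lambda_hat = Y @ F_hat / T                 # loadings: Lambda-hat = T^{-1} Y F-hat
    return Lambda_hat, F_hat
```

Note that the $T \times T$ eigenproblem is cheaper than the $p \times p$ one when $p > T$, which is one practical appeal of the least squares point of view.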

We will show that under some mild regularity conditions, as $p, T \to \infty$, $\hat b_i' \hat f_t$ consistently estimates $b_i' f_t$ for each $i$ and $t$. Since $\Sigma_u$ is assumed to be sparse, we can construct an estimator of $\Sigma_u$ using the adaptive thresholding method of Cai and Liu (2011) as follows. For some pre-determined decreasing sequence $\omega_T > 0$, let

$$\hat u_{it} = y_{it} - \hat b_i' \hat f_t, \qquad \hat\sigma_{ij} = \frac{1}{T} \sum_{t=1}^T \hat u_{it} \hat u_{jt}, \qquad \hat\theta_{ij} = \frac{1}{T} \sum_{t=1}^T (\hat u_{it} \hat u_{jt} - \hat\sigma_{ij})^2,$$

and define

$$\tilde\Sigma_u = (\tilde\sigma_{ij})_{p \times p}, \qquad \tilde\sigma_{ij} = \hat\sigma_{ij}\, I\big(|\hat\sigma_{ij}| \ge \sqrt{\hat\theta_{ij}}\, \omega_T\big). \quad (2.13)$$

Analogous to the decomposition (1.3), we obtain the following substitution estimators:

$$\tilde\Sigma = \hat\Lambda \hat\Lambda' + \tilde\Sigma_u, \quad (2.14)$$

and

$$\tilde\Sigma^{-1} = \tilde\Sigma_u^{-1} - \tilde\Sigma_u^{-1} \hat\Lambda \big[I_K + \hat\Lambda' \tilde\Sigma_u^{-1} \hat\Lambda\big]^{-1} \hat\Lambda' \tilde\Sigma_u^{-1}, \quad (2.15)$$

by the Sherman-Morrison-Woodbury formula, noting that $T^{-1} \sum_{t=1}^T \hat f_t \hat f_t' = I_K$.
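Formula (2.15) reduces the inversion of a $p \times p$ matrix to a $K \times K$ system once $\tilde\Sigma_u^{-1}$ is available; a sketch:

```python
import numpy as np

def poet_precision(Lambda_hat, Sigma_u_inv):
    """Inverse (2.15) via the Sherman-Morrison-Woodbury formula."""
    K = Lambda_hat.shape[1]
    A = Sigma_u_inv @ Lambda_hat                  # p x K
    core = np.eye(K) + Lambda_hat.T @ A           # I_K + Lambda' Sigma_u^{-1} Lambda
    return Sigma_u_inv - A @ np.linalg.solve(core, A.T)
```

When $\tilde\Sigma_u$ is sparse, `Sigma_u_inv` would in practice be applied through sparse solves rather than formed densely; the dense version above is only for clarity.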

The following theorem shows that the two estimators, based on either regularized PCA or least squares substitution, are equivalent.

Theorem 2.1. If the entry-dependent threshold in (2.2) is the same as the thresholding parameter used in (2.13), then $\tilde\Sigma_u = \hat R^\tau$ and the POET estimator (2.4) is equivalent to the substitution estimator (2.14); that is, $\hat\Sigma = \tilde\Sigma$.

3 Asymptotic Properties

3.1 Assumptions

This section gives the assumptions for the convergence of $\hat\Sigma - \Sigma$ as well as $\hat\Sigma^{-1} - \Sigma^{-1}$ under model (1.2), in which only $\{y_t\}$ is observable. Recall that we impose the identifiability condition (2.7). We make the following assumptions.

Assumption 3.1. (i) $\{u_t\}_{t \ge 1}$ is stationary and ergodic with mean vector zero and covariance matrix $\Sigma_u$. (ii) There exist constants $c_1, c_2 > 0$ such that $c_1 < \lambda_{\min}(\Sigma_u) \le \lambda_{\max}(\Sigma_u) < c_2$. (iii) There exist $r_1 > 0$ and $b_1 > 0$ such that for any $s > 0$ and $i \le p$, $P(|u_{it}| > s) \le \exp(-(s/b_1)^{r_1})$.

Condition (ii) requires that $\Sigma_u$ be well-conditioned, as in Chamberlain and Rothschild (1983). As noted in Bickel and Levina (2004), this condition is satisfied if, for each $t$, $\{u_{it}\}_{i=1}^\infty$ is a stationary and ergodic process with spectral density bounded away from both zero and infinity. Condition (iii) requires the distributions of the idiosyncratic terms to have exponential-type tails, which allows us to apply large deviation theory to $T^{-1} \sum_{t=1}^T u_{it} u_{jt} - \sigma_{u,ij}$.

Assumption 3.2. (i) $\{f_t\}_{t \ge 1}$ is stationary and ergodic. (ii) $\{u_t\}_{t \ge 1}$ and $\{f_t\}_{t \ge 1}$ are independent.

We introduce strong mixing conditions in order to conduct the asymptotic analysis of the least squares estimates. Let $\mathcal{F}_{-\infty}^0$ and $\mathcal{F}_T^\infty$ denote the $\sigma$-algebras generated by $\{(f_t, u_t) : t \le 0\}$ and $\{(f_t, u_t) : t \ge T\}$ respectively. Define the mixing coefficient

$$\alpha(T) = \sup_{A \in \mathcal{F}_{-\infty}^0,\ B \in \mathcal{F}_T^\infty} |P(A)P(B) - P(AB)|. \quad (3.1)$$

Assumption 3.3. (i) Exponential tail: there exist $b_2 > 0$ and $r_2 > 0$ such that for all $i \le K$, $t \le T$ and $s > 0$, $P(|f_{it}| > s) \le \exp(-(s/b_2)^{r_2})$. (ii) Strong mixing: there exists $r_3 > 0$ satisfying $3r_1^{-1} + 1.5 r_2^{-1} + r_3^{-1} > 1$, and $C > 0$ such that, for all $T \in \mathbb{Z}^+$,

$$\alpha(T) \le \exp(-C T^{r_3}).$$

In addition, we assume $\max_{t \le T} \sum_{s=1}^T |E(u_s' u_t)| / p = O(1)$.

We also impose the following regularity conditions.

Assumption 3.4. There exists $M > 0$ such that for all $i \le p$ and $t \le T$: (i) $\|b_i\|_{\max} < M$ and $E\|f_t\|^4 < K^2 M$; (ii) $E[p^{-1/2}(u_s' u_t - E u_s' u_t)]^4 < M$; (iii) $E\|(pK)^{-1/2} \sum_{i=1}^p b_i u_{it}\|^4 < M$.

This assumption provides regularity conditions for consistently estimating the transformed common factors as well as the factor loadings. Conditions (ii) and (iii) are satisfied, for example, if $\{u_{it}\}_{i=1}^\infty$ and $\{b_{ik} u_{it}\}_{i=1}^\infty$ are stationary and ergodic processes with finite fourth moments for each $t$ and $k$. For simplicity, we only consider nonrandom factor loadings and a known number of factors. Note that the conditions here are weaker than those in Bai (2003), as there is no need to establish asymptotic normality in order to estimate the covariance matrix.

The following assumption requires that the factors be pervasive, that is, that they impact every individual time series (Harding, 2009). See also (2.6).

Assumption 3.5. $\|p^{-1} B'B - \Omega\| = o(1)$ for some $K \times K$ symmetric positive definite matrix $\Omega$ such that $\lambda_{\min}(\Omega)$ is bounded away from both zero and infinity.

3.2 Convergence of the idiosyncratic covariance

Estimating the covariance matrix $\Sigma_u$ of the idiosyncratic components $\{u_t\}$ is important for many statistical inference purposes. For example, it is needed for large sample inference on the unknown factors and loadings in Bai (2003) and Wang (2010), for testing the capital asset pricing model in Sentana (2009), and for large-scale testing in Fan, Han and Gu (2012). See Section 4 for the last two applications.

We estimate $\Sigma_u$ by applying the adaptive thresholding given in (2.13); by Theorem 2.1, this is the same as $\hat R^\tau$ given by (2.2). We apply the POET estimator with the adaptive threshold

$$\tau_{ij} = C \sqrt{\hat\theta_{ij}}\, \omega_T, \quad (3.2)$$

where $C > 0$ is a sufficiently large constant, and throughout the paper we denote

$$\omega_T = K \sqrt{\frac{\log p}{T}} + \frac{K^2}{\sqrt{T}} + \frac{K^3}{\sqrt{p}} \quad \text{and} \quad \delta_T = m_p \omega_T. \quad (3.3)$$

The following theorem shows that $\tilde\Sigma_u$ is asymptotically nonsingular as $T \to \infty$ and $p \to \infty$. The rate of convergence under the spectral norm is also derived. Let $\gamma^{-1} = 3r_1^{-1} + 1.5 r_2^{-1} + r_3^{-1}$.

Theorem 3.1. Suppose $\max\{(\log p)^{6/\gamma}, K^4 (\log(pT))^2\} = o(T)$ and $T^{1/4} K^3 = o(\sqrt{p})$. Under Assumptions 3.1-3.5, the POET estimator defined in (2.13) satisfies

$$\|\tilde\Sigma_u - \Sigma_u\| = O_p(\delta_T).$$

If further $\delta_T = o(1)$, then $\tilde\Sigma_u$ is invertible with probability approaching one, and

$$\|\tilde\Sigma_u^{-1} - \Sigma_u^{-1}\| = O_p(\delta_T).$$

When estimating $\Sigma_u$, $p$ is allowed to grow exponentially fast in $T$: as long as $\log p = o(T^{\gamma/6})$, $\tilde\Sigma_u$ can be made consistent under the spectral norm. In addition, $\tilde\Sigma_u$ is invertible even when $p > T$, while the classical sample covariance matrix based on the residuals is not.

Remark 3.1. Fan, Liao and Mincheva (2011) showed that when $\{f_t\}$ are observable, the rate of convergence of the adaptive thresholding estimator is given by

$$\|\tilde\Sigma_u - \Sigma_u\| = O_p\Big(m_p K \sqrt{\frac{\log p}{T}}\Big) = \|\tilde\Sigma_u^{-1} - \Sigma_u^{-1}\|.$$

Hence when the common factors are unobservable, the rate of convergence has an additional term $m_p K^3 / \sqrt{p}$, coming from the impact of estimating the unknown factors. This impact vanishes when $p \log p \gg K^4 T$. As $p$ increases, more information about the common factors is collected, which results in more accurate estimation of the common factors $\{f_t\}$; the estimation error becomes negligible when $p$ is significantly larger than $T$. If in addition $K$ is bounded, the rate of convergence achieves the minimax rate of Cai and Zhou (2010).

3.3 Convergence of the POET estimator

As was found by Fan et al. (2008), deriving the rate of convergence of $\hat\Sigma - \Sigma$ under the spectral norm is inappropriate. We illustrate this with a simple example.

Example 3.1. Consider an ideal case where we know $b_i = (1, 0, \dots, 0)'$ for each $i = 1, \dots, p$, $\Sigma_u = I_p$, and $\{f_t\}$ are observable. Then when estimating $\Sigma$, we only need to estimate $\mathrm{cov}(f_t)$ using the sample covariance $\widehat{\mathrm{cov}}(f_t)$, which yields the estimator

$$\hat\Sigma = B\, \widehat{\mathrm{cov}}(f_t) B' + I_p.$$

Simple calculations yield

$$\hat\Sigma - \Sigma = \Big[\frac{1}{T} \sum_{t=1}^T (f_t - \bar f)^2 - \mathrm{var}(f_t)\Big] \mathbf{1}_p \mathbf{1}_p',$$

where $\mathbf{1}_p$ denotes the $p$-dimensional column vector of ones, with $\|\mathbf{1}_p \mathbf{1}_p'\| = p$. Therefore $\|\hat\Sigma - \Sigma\| = O_p(p / \sqrt{T})$, which is $o_p(1)$ only if $p = o(\sqrt{T})$.

Alternatively, Fan et al. (2008) suggested using the weighted quadratic loss

$$\|A\|_\Sigma = p^{-1/2} \|\Sigma^{-1/2} A \Sigma^{-1/2}\|_F$$

to study the rate of convergence in high-dimensional factor models; it is closely related to the entropy loss introduced by James and Stein (1961). Here $p^{-1/2}$ is a normalization factor such that $\|\Sigma\|_\Sigma = 1$. Technically, the impact of high dimensionality on the convergence rate of $\hat\Sigma - \Sigma$ is via the number of factor loadings in $B$. It was shown by Fan et al. (2008) that $B$ appears in $\|\hat\Sigma - \Sigma\|_\Sigma$ through $B' \Sigma^{-1} B$, and that $\lambda_{\max}(B' \Sigma^{-1} B)$ is bounded, which successfully cancels out the curse of high dimensionality introduced by $B$.

The following theorem gives the rate of convergence under various norms.

Theorem 3.2. Under the assumptions of Theorem 3.1, the POET estimator defined in (2.4) satisfies

$$\|\hat\Sigma - \Sigma\|_\Sigma = O_p\Big(\frac{K \sqrt{p}\, \log p}{T} + K^2 \sqrt{\frac{\log p}{T}} + \delta_T\Big), \qquad \|\hat\Sigma - \Sigma\|_{\max} = O_p\Big(K^3 \sqrt{\frac{\log K}{T}} + \omega_T\Big).$$

In addition, if $\delta_T = o(1)$, then $\hat\Sigma$ is nonsingular with probability approaching one, with

$$\|\hat\Sigma^{-1} - \Sigma^{-1}\| = O_p(\delta_T).$$

In practical applications, $K$ is typically small compared to $p$ and $T$; it can even be thought of as a constant. For example, in the Fama-French model, $K = 3$. Our result also covers the case $K = 0$, when the target covariance $\Sigma = \Sigma_u$ is a sparse matrix, and the POET estimator is then the same as either the thresholding estimator of Bickel and Levina (2008) or the adaptive thresholding estimator of Cai and Liu (2011). Theorem 3.2 then implies

$$\|\hat\Sigma - \Sigma\| = O_p\Big(m_p \sqrt{\frac{\log p}{T}}\Big) = \|\hat\Sigma^{-1} - \Sigma^{-1}\|.$$
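The weighted quadratic norm introduced above is simple to evaluate; the helper below (our naming) assumes $\Sigma$ is positive definite.

```python
import numpy as np

def sigma_norm(A, Sigma):
    """||A||_Sigma = p^{-1/2} || Sigma^{-1/2} A Sigma^{-1/2} ||_F."""
    p = Sigma.shape[0]
    w, v = np.linalg.eigh(Sigma)              # Sigma assumed positive definite
    S_inv_half = (v / np.sqrt(w)) @ v.T       # symmetric inverse square root
    return np.linalg.norm(S_inv_half @ A @ S_inv_half, "fro") / np.sqrt(p)
```

In particular, `sigma_norm(Sigma, Sigma)` returns 1, matching the normalization noted above.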

Remark 3.2. When estimating $\Sigma^{-1}$, $p$ is allowed to grow exponentially fast in $T$, and the estimator has the same rate of convergence as the estimator $\tilde\Sigma_u^{-1}$ in Theorem 3.1. When $p$ becomes much larger than $T$, the precision matrix can be estimated as if the factors were observable. This fact was also shown in the asymptotic expansions obtained by Bai (2003) when $K$ is finite and $\Sigma_u$ is diagonal. Therefore, in the approximate factor model, we can do a much better job in estimating $\Sigma^{-1}$ than in estimating $\Sigma$. The intuition follows from the fact that $\Sigma^{-1}$ has bounded eigenvalues whereas $\Sigma$ has diverging eigenvalues.

3.4 Convergence of the unknown factors and factor loadings

Many applications of factor models involve latent factors, and as a result the common factors need to be estimated by the method of principal components. In this case, the factor loadings in $B$ and the common factors $f_t$ are not separably identifiable: for any nonsingular $K \times K$ matrix $H$, $B f_t = B H^{-1} \cdot H f_t$, so $(B, f_t)$ cannot be identified from $(B H^{-1}, H f_t)$. This ambiguity is eliminated by the identifiability condition (2.7), subject to a permutation. Note that the linear space spanned by the columns of $B$ is the same as that spanned by those of $B H^{-1}$; in practice, it often does not matter which one is used.

Let $V$ denote the $K \times K$ diagonal matrix of the first $K$ largest eigenvalues of the sample covariance matrix in decreasing order. Recall that $F' = (f_1, \dots, f_T)$ and let $H = T^{-1} V^{-1} \hat F' F B' B$. Then for $t = 1, \dots, T$,

$$H f_t = T^{-1} V^{-1} \hat F' (B f_1, \dots, B f_T)' B f_t.$$

Note that $H f_t$ depends only on the data $V^{-1} \hat F'$ and the identifiable part of the parameters $\{B f_t\}$. Therefore, there is no identifiability issue in $H f_t$, regardless of whether the identifiability condition (2.7) is imposed.

Bai (2003) showed that when $K$ is fixed and $\Sigma_u$ is diagonal, $\hat b_i$ and $\hat f_t$ are consistent estimators of $(H')^{-1} b_i$ and $H f_t$ respectively. The following theorem allows $K$ to grow slowly and $\Sigma_u$ to be a sparse non-diagonal matrix.

Theorem 3.3. Under the assumptions of Theorem 3.1,

$$\max_{i \le p} \|\hat b_i - (H')^{-1} b_i\| = O_p\Big(\frac{\omega_T}{\sqrt{K}}\Big), \quad \text{and} \quad \max_{t \le T} \|\hat f_t - H f_t\| = O_p(\delta_T'),$$

where $\delta_T' = K / \sqrt{T} + K^{3/2} T^{1/4} / \sqrt{p}$.

As a consequence of Theorem 3.3, we obtain the following corollary.

Corollary 3.1. Under the assumptions of Theorem 3.1,

$$\max_{i \le p,\, t \le T} |\hat b_i' \hat f_t - b_i' f_t| = O_p\big(K \delta_T' + \omega_T (\log T)^{1/r_2}\big).$$

4 Applications

We give four examples which are immediate applications of the results in Theorems 3.1-3.3.

Example 4.1 (Large-scale hypothesis testing). Controlling the false discovery rate in large-scale hypothesis testing based on correlated test statistics is an important and challenging problem in statistics (Leek and Storey, 2008; Efron, 2010). Suppose that the test statistic for each of the hypotheses $H_{i0}: \mu_i = 0$ vs. $H_{i1}: \mu_i \ne 0$ is $Z_i \sim N(\mu_i, 1)$. These test statistics $Z$ are correlated and follow $N(\mu, \Sigma)$ with an unknown covariance matrix $\Sigma$. For a given critical value $x$, the false discovery proportion is defined as $\mathrm{FDP}(x) = V(x)/R(x)$, where

$$V(x) = \sum_{i:\, \mu_i = 0} I(|Z_i| > x) \quad \text{and} \quad R(x) = \sum_{i=1}^p I(|Z_i| > x)$$

are the number of false discoveries and the total number of discoveries, respectively. Our interest is to estimate $\mathrm{FDP}(x)$ for each given $x$. Note that $R(x)$ is an observable quantity; only $V(x)$ needs to be estimated. If the covariance $\Sigma$ admits the approximate factor structure (1.3), then the test statistics can be stochastically decomposed as

$$Z = \mu + B f + u, \quad \text{where } \Sigma_u \text{ is sparse}. \quad (4.1)$$

By the principal factor approximation (Theorem 1, Fan, Han and Gu, 2012),

$$V(x) = \sum_{i=1}^p \{\Phi(a_i(z_{x/2} + \eta_i)) + \Phi(a_i(z_{x/2} - \eta_i))\} + o_P(p), \quad (4.2)$$

when $m_p = o(p)$ and the number of true significant hypotheses $\{i : \mu_i \ne 0\}$ is $o(p)$, where $z_x$ is the upper $x$-quantile of the standard normal distribution, $\eta_i = (Bf)_i$, and $a_i = \mathrm{var}(u_i)^{-1/2}$. Now suppose that we have $n$ repeated measurements from the model (4.1). Then, by Corollary 3.1, $\{\eta_i\}$ can be uniformly estimated, hence so can $p^{-1} V(x)$ and, in turn, $\mathrm{FDP}(x)$. Efron (2010) obtained these repeated test statistics based on bootstrap samples from the original raw data. Our theory gives a formal justification to the seminal work of Efron (2007, 2010).

Example 4.2 (Risk management). The maximum elementwise estimation error $\|\hat\Sigma - \Sigma\|_{\max}$ appears in risk assessment, as in Fan, Zhang and Yu (2008). For a fixed portfolio allocation vector $w$, the true and estimated portfolio variances are $w' \Sigma w$ and $w' \hat\Sigma w$ respectively, and the estimation error is bounded by

$$|w' \hat\Sigma w - w' \Sigma w| \le \|\hat\Sigma - \Sigma\|_{\max} \|w\|_1^2,$$

where $\|w\|_1$, the $\ell_1$-norm of $w$, is the gross exposure of the portfolio. Usually a constraint is placed on the total percentage of short positions, in which case we have a restriction $\|w\|_1 \le c$ for some $c > 0$; in particular, $c = 1$ corresponds to a portfolio with no short positions (all weights are nonnegative). Theorem 3.2 quantifies the maximum approximation error.
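The gross-exposure bound of Example 4.2 is a one-line computation; the sketch below returns both sides of the inequality so that the bound can be verified numerically.

```python
import numpy as np

def risk_error_and_bound(Sigma_hat, Sigma, w):
    """|w' Sigma_hat w - w' Sigma w| and its bound ||Sigma_hat - Sigma||_max * ||w||_1^2."""
    error = abs(w @ Sigma_hat @ w - w @ Sigma @ w)
    bound = np.max(np.abs(Sigma_hat - Sigma)) * np.sum(np.abs(w)) ** 2
    return error, bound          # error <= bound, as in Example 4.2
```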

Example 4.3 (Panel regression with a factor structure in the errors). Consider the panel regression model

$$Y_{it} = x_{it}' \beta + \varepsilon_{it}, \qquad \varepsilon_{it} = b_i' f_t + u_{it}, \qquad i \le p,\ t \le T,$$

where $x_{it}$ is a vector of observable regressors of fixed dimension. The regression error $\varepsilon_{it}$ has a factor structure, but $b_i$, $f_t$ and $u_{it}$ are all unobservable. Assuming $x_{it}$ to be independent of $\varepsilon_{it}$, we are interested in the common regression coefficients $\beta$. This panel regression model has been considered by many researchers, such as Ahn, Lee and Schmidt (2001) and Pesaran (2006), and has broad applications in the social sciences. For example, in income studies, $Y_{it}$ represents the income of individual $i$ at age $t$, and $x_{it}$ is a vector of observable characteristics associated with income. Here $b_i$ represents a vector of unmeasured skills, such as innate ability, motivation, and diligence, while $f_t$ is a vector of unobservable prices for those unmeasured skills, which is assumed to be time-varying.

Although OLS (ordinary least squares) produces a consistent estimator of $\beta$, a more efficient estimator is obtained by GLS (generalized least squares). The GLS method depends on an estimator of $\Sigma_\varepsilon^{-1}$, the inverse of the covariance matrix of $\varepsilon_t = (\varepsilon_{1t}, \dots, \varepsilon_{pt})'$, which should be consistent under the spectral norm. In a large panel, $p$ can be larger than $T$; as a result, the traditional GLS, which estimates $\Sigma_\varepsilon$ based on the sample residual covariance, is infeasible. By assuming the covariance matrix of $(u_{1t}, \dots, u_{pt})$ to be sparse, we can successfully solve this problem by applying Theorem 3.2. Although $\varepsilon_{it}$ is unobservable, it can be replaced by the regression residual $\hat\varepsilon_{it}$, obtained by first regressing $Y_{it}$ on $x_{it}$. We then apply the POET estimator to $T^{-1} \sum_{t=1}^T \hat\varepsilon_t \hat\varepsilon_t'$, as described in Section 2.1. By Theorem 3.2, the inverse of the resulting estimator is a consistent estimator of $\Sigma_\varepsilon^{-1}$ under the spectral norm. A slight difference lies in the fact that when we apply POET, $\varepsilon_t \varepsilon_t'$ is replaced by $\hat\varepsilon_t \hat\varepsilon_t'$, which introduces an additional term $O_p(\sqrt{\log p / T})$ in the estimation error.

Example 4.4 (Testing asset pricing theory). A celebrated financial economic theory is the capital asset pricing model (CAPM; Sharpe, 1964), which won William Sharpe the Nobel Prize in Economics in 1990 and whose extension is the multi-factor model (Ross, 1976; Chamberlain and Rothschild, 1983). It states that in a frictionless market, the excess return of any financial asset equals the excess returns of the risk factors plus an idiosyncratic noise that is only related to the asset itself. In the multi-period model, the excess return $y_{it}$ of firm $i$ at time $t$ follows model (1.1), in which $f_t$ is the vector of excess returns of the risk factors at time $t$ and $u_{it}$ is the idiosyncratic noise. To test the null hypothesis of the pricing theory, one embeds the model into the multivariate linear model

$$y_t = \mu + B f_t + u_t, \qquad t = 1, \dots, T, \quad (4.3)$$

and wishes to test $H_0: \mu = 0$. The F-test statistic involves the estimation of the covariance matrix $\Sigma_u$, whose estimates are degenerate without regularization when $p \ge T$, even if the $f_t$ are observable risk factors. Therefore the literature (Sentana, 2009, and references therein) focuses on the case $p \ll T$; the typical choices are $T = 60$ monthly observations and $p = 5$, $10$ or $25$ assets. However, the CAPM should hold for all tradeable assets, not just a small fraction of them. With our regularization technique, a non-degenerate estimate $\tilde\Sigma_u$ can be obtained, and the F-test or likelihood-ratio test statistics can be employed even when $p \ge T$.

5 Monte Carlo Experiments

In this section we examine the finite-sample performance of the POET method and demonstrate the effect of this estimator on asset allocation and risk assessment.

Similarly to Fan et al. (2008, 2011), we simulate from a standard Fama-French three-factor model, assuming a sparse error covariance matrix and three unobserved factors. Throughout this section, the time span is fixed at $T = 300$, and the dimensionality $p$ increases from 1 to 600. We assume that the excess return of each of the $p$ stocks over the risk-free interest rate follows the model

$$y_{it} = b_{i1} f_{1t} + b_{i2} f_{2t} + b_{i3} f_{3t} + u_{it}.$$

The factor loadings are drawn from a trivariate normal distribution $b \sim N_3(\mu_B, \Sigma_B)$, the idiosyncratic errors from $u_t \sim N_p(0, \Sigma_u)$, and the factor returns $f_t$ follow a VAR(1) model. To make the simulation more realistic, the model parameters are calibrated from financial returns, as detailed in the following section.

5.1 Calibration

To calibrate the model, we use the data on annualized returns of 100 industrial portfolios from the website of Kenneth French, and the data on 3-month Treasury bill rates from the CRSP database. These industrial portfolios are formed as the intersection of 10 portfolios based on size (market equity) and 10 portfolios based on the ratio of book equity to market equity. Their excess returns $(\tilde y_t)$ are computed for the period from January 1st, 2009 to December 31st, 2010. Here we present a short outline of the calibration procedure.

1. Given $\{\tilde y_t\}_{t=1}^{500}$ as the input data, we calculate a $100 \times 3$ matrix $\hat B$ and a $500 \times 3$ matrix $\hat F$, using the principal components method described in Section 2.3.

2. We calculate the sample mean vector $\mu_B$ and sample covariance matrix $\Sigma_B$ of the rows of $\hat B$, which are reported in Table 1. The factor loadings $b_i = (b_{i1}, b_{i2}, b_{i3})'$ for $i = 1, \dots, p$ are drawn from $N_3(\mu_B, \Sigma_B)$.

Table 1: Mean and covariance matrix used to generate $b$ (calibrated values of $\mu_B$ and $\Sigma_B$).

3. We assume that the factors $f_t$ follow the stationary vector autoregressive model $f_t = \mu + \Phi f_{t-1} + \varepsilon_t$, a VAR(1) model. We obtain the multivariate least squares estimators of $\mu$ and $\Phi$, and estimate $\Sigma_\varepsilon$, by autoregressing the columns of $\hat F$. All eigenvalues of $\hat\Phi$ fall within the unit circle, so our model is stationary. The covariance matrix $\mathrm{cov}(f_t)$ can then be obtained by solving the linear equation

$$\mathrm{cov}(f_t) = \Phi\, \mathrm{cov}(f_t)\, \Phi' + \Sigma_\varepsilon.$$

The estimated parameters are depicted in Table 2, and give the data generating process for $f_t$.

Table 2: Parameters of the $f_t$ generating process (calibrated values of $\mu$, $\mathrm{cov}(f_t)$ and $\Phi$).

4. For each value of $p$, we generate a sparse covariance matrix $\Sigma_u$ of the form $\Sigma_u = D \Sigma_0 D$. Here $\Sigma_0$ is the error correlation matrix, and $D$ is the diagonal matrix of the standard deviations of the errors. We set $D = \mathrm{diag}(\sigma_1, \dots, \sigma_p)$, where each $\sigma_i$ is generated independently from a Gamma distribution $G(\alpha, \beta)$, with $\alpha$ and $\beta$ chosen to match the sample mean and sample standard deviation of the standard deviations of the errors; an approach similar to Fan et al. (2011) is used in this calibration step. The off-diagonal entries of $\Sigma_0$ are generated independently from a normal distribution, with mean and standard deviation equal to the sample mean and sample standard deviation of the sample correlations among the estimated residuals, conditional on their absolute values being no larger than a given cutoff. We then employ hard thresholding to make $\Sigma_0$ sparse, where the threshold is found as the smallest constant that preserves the positive definiteness of $\Sigma_0$. More precisely, we start with the threshold value 1, which gives $\Sigma_0 = I_p$, and then decrease the threshold value on a grid until positive definiteness is violated.
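The equation $\mathrm{cov}(f_t) = \Phi\, \mathrm{cov}(f_t)\, \Phi' + \Sigma_\varepsilon$ in step 3 is a discrete Lyapunov equation, which standard scientific libraries solve directly. The parameter values below are placeholders for the calibrated values of Table 2.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

Phi = np.array([[0.5, 0.1, 0.0],     # placeholder VAR(1) matrix (assumption)
                [0.0, 0.4, 0.1],
                [0.1, 0.0, 0.3]])
Sigma_eps = 0.1 * np.eye(3)          # placeholder innovation covariance

# Solves cov_f = Phi @ cov_f @ Phi.T + Sigma_eps.
cov_f = solve_discrete_lyapunov(Phi, Sigma_eps)
```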

5.2 Simulation

For the simulation, we fix $T = 300$ and let $p$ increase from 1 to 600. For each fixed $p$, we repeat the following steps $N = 200$ times, and record the means and standard deviations of each respective norm. A condensed sketch of steps 1-7 is given at the end of this subsection.

1. Generate independently $\{b_i\}_{i=1}^p \sim N_3(\mu_B, \Sigma_B)$, and set $B = (b_1, \dots, b_p)'$.

2. Generate independently $\{u_t\}_{t=1}^T \sim N_p(0, \Sigma_u)$.

3. Generate $\{f_t\}_{t=1}^T$ as a vector autoregressive sequence of the form $f_t = \mu + \Phi f_{t-1} + \varepsilon_t$.

4. Calculate $\{y_t\}_{t=1}^T$ from $y_t = B f_t + u_t$.

5. Calculate $\hat\Lambda$ and $\hat F$ based on $\{y_t\}$, using the PCA method described in Section 2.3.

6. Set $\omega_T = 2\big(K \sqrt{\log p / T} + K^3 / \sqrt{p}\big)$ with $K = 3$ as the threshold for creating $\tilde\Sigma_u$, and calculate $\hat\Sigma$ and $\hat\Sigma^{-1}$ using the POET method.

7. Calculate the sample covariance matrix $\hat\Sigma_{\mathrm{sam}}$.

In the graphs below, we plot the averages and standard deviations of the distances from $\hat\Sigma$ and $\hat\Sigma_{\mathrm{sam}}$ to the true covariance matrix $\Sigma$, under the norms $\|\cdot\|_\Sigma$, $\|\cdot\|$ and $\|\cdot\|_{\max}$. We also plot the means and standard deviations of the distances from $\hat\Sigma^{-1}$ and $\hat\Sigma_{\mathrm{sam}}^{-1}$ to $\Sigma^{-1}$ under the spectral norm. We plot values of $p$ from 20 to 600 in increments of 20; due to invertibility, the spectral norm for $\hat\Sigma_{\mathrm{sam}}^{-1}$ is plotted only up to $p = 280$. We also zoom into these graphs by plotting the values of $p$ from 1 to 100, this time in increments of 1. Notice that we also plot the distance from $\hat\Sigma_{\mathrm{obs}}$ to $\Sigma$ for comparison, where $\hat\Sigma_{\mathrm{obs}}$ is the estimated covariance matrix proposed by Fan et al. (2011), assuming the factors are observable.
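The condensed sketch of steps 1-7 follows. It assumes the hypothetical helper `poet` from the Section 2.1 sketch is in scope, and passes $\omega_T$ as the thresholding constant, a simplification of the adaptive rule (3.2).

```python
import numpy as np

rng = np.random.default_rng(2)
T, K = 300, 3

def one_replication(p, mu_B, Sigma_B, Sigma_u, mu, Phi, Sigma_eps):
    B = rng.multivariate_normal(mu_B, Sigma_B, size=p)          # step 1
    U = rng.multivariate_normal(np.zeros(p), Sigma_u, size=T)   # step 2
    F = np.zeros((T, K))                                        # step 3: VAR(1)
    f = np.zeros(K)
    for t in range(T):
        f = mu + Phi @ f + rng.multivariate_normal(np.zeros(K), Sigma_eps)
        F[t] = f
    Y = F @ B.T + U                                             # step 4: y_t = B f_t + u_t
    omega = 2 * (K * np.sqrt(np.log(p) / T) + K**3 / np.sqrt(p))  # step 6 threshold
    Sigma_poet = poet(Y, K, omega)                              # steps 5-6
    Sigma_sam = np.cov(Y, rowvar=False)                         # step 7
    return Sigma_poet, Sigma_sam
```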

5.3 Results

From the simulation results, reported in Figures 1-4, we observe that our estimator under the unobservable factor model performs just as well as the estimator in Fan et al. (2011) under the observable factor model when $p$ is large enough. The cost of not knowing the factors is approximately of order $O_p(1/\sqrt{p})$; as can be seen in Figures 1 and 2, this cost vanishes for $p > 300$. To give better insight into the impact of estimating the unknown factors for small $p$, a separate set of simulations is conducted for $p \le 100$. As can be seen from Figure 1 (bottom panel) and Figure 2 (middle and bottom panels), the impact decreases quickly. In addition, when estimating $\Sigma^{-1}$, it is hard to distinguish the estimators with known and unknown factors, whose performances are quite stable compared to the sample covariance matrix.

Figure 1: Averages (left panel) and standard deviations (right panel) of $\|\hat\Sigma - \Sigma\|_\Sigma$ with known factors (solid red curve) and unknown factors (solid blue curve), and of $\|\hat\Sigma_{\mathrm{sam}} - \Sigma\|_\Sigma$ (dashed curve), over 200 simulations, as a function of the dimensionality $p$. Top panel: $p$ ranges from 20 to 600 with increment 20; bottom panel: $p$ ranges from 1 to 100 with increment 1.

Figure 2: Averages (left panel) and standard deviations (right panel) of $\|\hat\Sigma^{-1} - \Sigma^{-1}\|$ with known (solid red curve) and unknown (solid blue curve) factors, and of $\|\hat\Sigma_{\mathrm{sam}}^{-1} - \Sigma^{-1}\|$ (dashed curve), over 200 simulations, as a function of the dimensionality $p$. Top panel: $p$ ranges from 20 to 600 with increment 20; middle panel: $p$ ranges from 1 to 100 with increment 1; bottom panel: the same as the top panel with the dashed curve excluded.

Figure 3: Averages (left panel) and standard deviations (right panel) of $\|\hat\Sigma - \Sigma\|_{\max}$ with known (solid red curve) and unknown (solid blue curve) factors, and of $\|\hat\Sigma_{\mathrm{sam}} - \Sigma\|_{\max}$ (dashed curve), over 200 simulations, as a function of the dimensionality $p$. The curves are nearly indistinguishable.

Also, the maximum absolute elementwise error (Figure 3) of our estimator performs very similarly to that of the sample covariance matrix, which coincides with our asymptotic result. Figure 4 shows that the performances of the three methods are indistinguishable in the spectral norm, as expected.

Figure 4: Averages of $\|\hat\Sigma - \Sigma\|$ with known (solid red curve) and unknown (solid blue curve) factors, and of $\|\hat\Sigma_{\mathrm{sam}} - \Sigma\|$ (dashed curve), over 200 simulations, as a function of the dimensionality $p$. The three curves are plotted but are hardly distinguishable.

5.4 Portfolio allocation

We demonstrate the improvement of our method over the sample covariance and over the estimator based on the strict factor model in a problem of portfolio allocation for risk minimization. Let $\hat\Sigma$ be a generic estimator of the covariance matrix of $y_t$, and let $w$ be the allocation vector of a portfolio consisting of the corresponding $p$ financial securities. Then the theoretical and empirical risks of the given portfolio are $R(w) = w' \Sigma w$ and $\hat R(w) = w' \hat\Sigma w$, respectively.

Now define

$$\hat w = \arg\min_{w' \mathbf{1} = 1} w' \hat\Sigma w,$$

the estimated (minimum variance) portfolio. The actual risk of the estimated portfolio is $R(\hat w) = \hat w' \Sigma \hat w$, and the estimated risk (also called the empirical risk) equals $\hat R(\hat w) = \hat w' \hat\Sigma \hat w$. In practice, the actual risk is unknown, and only the empirical risk can be calculated.

For each fixed $p$, the population $\Sigma$ was generated in the same way as described in Section 5.1, with a sparse but non-diagonal error covariance. We use three different methods to estimate $\Sigma$ and obtain $\hat w$: the strict factor model $\hat\Sigma_{\mathrm{diag}}$ (which estimates $\Sigma_u$ by a diagonal matrix), our POET estimator $\hat\Sigma$, both with unknown factors, and the sample covariance $\hat\Sigma_{\mathrm{sam}}$. We then calculate the corresponding actual and empirical risks.

It is interesting to examine the accuracy and the performance of the actual risk of our portfolio $\hat w$ in comparison with the oracle risk $R^* = \min_{w' \mathbf{1} = 1} w' \Sigma w$, the theoretical risk of the portfolio we would have created if we knew the true covariance matrix $\Sigma$. We thus compare the regret $R(\hat w) - R^*$, which is always nonnegative, for the three estimators of $\Sigma$; they are summarized using box plots over the 200 simulations, reported in Figure 5. In practice, we are also concerned with the difference between the actual and empirical risks of the chosen portfolio $\hat w$; hence, in Figure 6, we also compare the average difference $|R(\hat w) - \hat R(\hat w)|$ over the 200 simulations. When $\hat w$ is obtained from the strict factor model, both the difference between the actual and oracle risks and the difference between the actual and empirical risks are persistently greater than the corresponding differences for the approximate factor estimator.

Figure 5: Box plots of the regrets $R(\hat w) - R^*$ for $p = 80$ and $p = 140$. In each panel, the box plots from left to right correspond to $\hat w$ obtained using $\hat\Sigma$ based on the approximate factor model, the strict factor model, and the sample covariance, respectively.
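For a weight vector constrained only to sum to one, the minimum variance portfolio has the familiar closed form $\hat\Sigma^{-1} \mathbf{1} / (\mathbf{1}' \hat\Sigma^{-1} \mathbf{1})$, so $\hat w$ and both risks can be computed as follows (a sketch; the paper does not prescribe a particular solver).

```python
import numpy as np

def min_variance_portfolio(Sigma_hat, Sigma):
    """w-hat = argmin_{w'1=1} w' Sigma_hat w, with its actual and empirical risks."""
    ones = np.ones(Sigma_hat.shape[0])
    x = np.linalg.solve(Sigma_hat, ones)
    w_hat = x / (ones @ x)                      # closed-form minimum variance weights
    actual_risk = w_hat @ Sigma @ w_hat         # R(w-hat) = w' Sigma w
    empirical_risk = w_hat @ Sigma_hat @ w_hat  # R-hat(w-hat) = w' Sigma-hat w
    return w_hat, actual_risk, empirical_risk
```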

Figure 6: Error of risk estimation $|R(\hat w) - \hat R(\hat w)|$ as $p$ increases, where $\hat w$ and $\hat R$ are obtained based on three estimators of $\Sigma$. Blue line: approximate factor model $\hat\Sigma$; red line: strict factor model $\hat\Sigma_{\mathrm{diag}}$; dotted line: sample covariance $\hat\Sigma_{\mathrm{sam}}$.

6 Real Data Example

We demonstrate an application of the approximate factor model on real data. The data, obtained from the CRSP database, consist of $p = 50$ stocks and their annualized daily returns for the period January 1st, 2010 to December 31st, 2010 ($T = 252$). The stocks are chosen from 5 different industry sectors (Consumer Goods - Textile & Apparel Clothing; Financial - Credit Services; Healthcare - Hospitals; Services - Restaurants; Utilities - Water Utilities), with 10 stocks from each sector. The few largest eigenvalues of the sample covariance (the largest being 0.002) clearly dominate the rest, and hence we consider $K = 1$, $2$ and $3$ factors respectively. The threshold has been chosen using a leave-one-out cross-validation procedure.

Figure 7 shows the heatmap of the thresholded error correlation matrix. We compare the level of sparsity (the percentage of non-zero off-diagonal elements) for the 5 diagonal blocks of size $10 \times 10$ with the sparsity of the rest of the matrix. For $K = 2$, our method results in 25.8% non-zero off-diagonal elements in the 5 diagonal blocks, as opposed to 7.3% non-zero elements in the rest of the covariance matrix. Moreover, of the non-zero elements in the central 5 blocks, 100% are positive, as opposed to a distribution of 60.3% positive and 39.7% negative among the non-zero elements in the off-diagonal blocks. There is a strong positive correlation between the returns of companies in the same industry after the common factors are taken out, and the thresholding has preserved it. The results for $K = 1$ and $K = 3$ show the same characteristics. These findings provide stark evidence that the strict factor model is not appropriate.

Figure 7: Heatmap of the thresholded error correlation matrix for number of factors $K = 1$, $K = 2$ and $K = 3$.

7 Conclusion

We study the problem of estimating a high-dimensional covariance matrix with conditional sparsity. Realizing that an unconditional sparsity assumption is inappropriate in many applications, we introduce a factor model that has a conditional sparsity feature, and propose the POET estimator to take advantage of this structure.

This expands considerably the scope of the model based on the strict factor model, which assumes independent idiosyncratic noise and is too restrictive in practice. By assuming a sparse error covariance matrix, we allow for the presence of cross-sectional correlation even after taking out the common factors. The sparse covariance is estimated by the adaptive thresholding technique.

It is found that the rates of convergence of the estimators have an extra term of approximate order $O_p(p^{-1/2})$ in addition to the results based on observable factors by Fan et al. (2008) and Fan et al. (2011), which arises from the effect of estimating the unobservable factors. This effect vanishes as the dimensionality increases, as more information about the common factors becomes available. When $p$ gets large enough, the effect of estimating the unknown factors is negligible, and we can estimate the covariance matrices as if we knew the factors. This fact was also shown in the asymptotic expansions obtained by Bai (2003).

The sparse covariance is estimated using adaptive hard thresholding. Recently, Cai and Liu (2011) studied the more general thresholding functions of Antoniadis and Fan (2001), which admit the form $\tilde\sigma_{ij} = s(\hat\sigma_{ij})$ and also allow for soft thresholding. It is easy to apply the more general thresholding here as well, and the rates of convergence of the resulting covariance matrix estimators should be straightforward to derive.

A Proofs for Section 2

Proof of Theorem 2.1

Proof. The sample covariance matrix of the residuals from the least squares method is given by

$$\hat\Sigma_u = \frac{1}{T}(Y - \hat\Lambda \hat F')(Y - \hat\Lambda \hat F')'$$

$$= \frac{1}{T} Y Y' - \hat\Lambda \hat\Lambda',$$

where we used $\hat\Lambda = T^{-1} Y \hat F$ and the normalization condition $T^{-1} \hat F' \hat F = I_K$. If we show that $\hat\Lambda \hat\Lambda' = \sum_{i=1}^K \hat\lambda_i \hat\xi_i \hat\xi_i'$, then from the decomposition of the sample covariance

$$\frac{1}{T} Y Y' = \hat\Lambda \hat\Lambda' + \hat\Sigma_u = \sum_{i=1}^K \hat\lambda_i \hat\xi_i \hat\xi_i' + \hat R,$$

we have $\hat R = \hat\Sigma_u$. Consequently, applying thresholding to $\hat\Sigma_u$ is equivalent to applying thresholding to $\hat R$, which gives the desired result.

We now show that $\hat\Lambda \hat\Lambda' = \sum_{i=1}^K \hat\lambda_i \hat\xi_i \hat\xi_i'$ indeed holds. Consider again the least squares problem (2.10), but with the following alternative normalization constraints:

$$\frac{1}{p} \sum_{i=1}^p b_i b_i' = I_K, \qquad \frac{1}{T} \sum_{t=1}^T f_t f_t' \text{ is diagonal}.$$

Let $(\tilde\Lambda, \tilde F)$ be the solution to the new optimization problem. Switching the roles of $B$ and $F$, the solution of (2.12) is $\tilde\Lambda = \sqrt{p}\,(\hat\xi_1, \dots, \hat\xi_K)$ and $\tilde F = p^{-1} Y' \tilde\Lambda$. In addition, $T^{-1} \tilde F' \tilde F = p^{-1} \mathrm{diag}(\hat\lambda_1, \dots, \hat\lambda_K)$. From $\hat\Lambda \hat F' = \tilde\Lambda \tilde F'$, it follows that

$$\hat\Lambda \hat\Lambda' = \frac{1}{T} \hat\Lambda \hat F' \hat F \hat\Lambda' = \frac{1}{T} \tilde\Lambda \tilde F' \tilde F \tilde\Lambda' = \sum_{i=1}^K \hat\lambda_i \hat\xi_i \hat\xi_i'. \qquad \text{Q.E.D.}$$

B Proofs for Section 3

We will proceed by showing Theorems 3.3, 3.1 and 3.2, in that order.

B.1 Preliminary lemmas

The following results are used subsequently. The proofs of Lemmas B.1 and B.2 can be found in Fan, Liao and Mincheva (2011).

Lemma B.1. Suppose $A$ and $B$ are symmetric semi-positive definite matrices, and $\lambda_{\min}(B) > c_T$ for a sequence $c_T > 0$. If $\|A - B\| = o_p(c_T)$, then $\lambda_{\min}(A) > c_T / 2$ with probability approaching one, and

$$\|A^{-1} - B^{-1}\| = O_p(c_T^{-2}) \|A - B\|.$$

Lemma B.2. Suppose that the random variables $Z_1, Z_2$ both satisfy the exponential-type tail condition: there exist $r_1, r_2 \in (0, 1)$ and $b_1, b_2 > 0$ such that, for all $s > 0$,

$$P(|Z_i| > s) \le \exp(-(s/b_i)^{r_i}), \quad i = 1, 2.$$

Then for some $r_3$ and $b_3 > 0$, and any $s > 0$,

$$P(|Z_1 Z_2| > s) \le \exp(1 - (s/b_3)^{r_3}). \quad (B.1)$$

Lemma B.3. Under the assumptions of Theorem 3.1,

(i) $\max_{i,j \le K} |T^{-1} \sum_{t=1}^T f_{it} f_{jt} - E f_{it} f_{jt}| = O_p(\sqrt{(\log K)/T})$;

(ii) $\max_{i,j \le p} |T^{-1} \sum_{t=1}^T u_{it} u_{jt} - E u_{it} u_{jt}| = O_p(\sqrt{(\log p)/T})$;

(iii) $\max_{i \le K,\, j \le p} |T^{-1} \sum_{t=1}^T f_{it} u_{jt}| = O_p(\sqrt{(\log p)/T})$.

Proof. The proof follows from the Bernstein inequality for weakly dependent data (Merlevède et al., 2009, Theorem 1). See Lemmas A.3 and B.1 of Fan, Liao and Mincheva (2011).

Lemma B.4. Let $\hat\lambda_K$ denote the $K$th largest eigenvalue of $\hat\Sigma_{\mathrm{sam}} = T^{-1} \sum_{t=1}^T y_t y_t'$. Then $\hat\lambda_K > C p$ with probability approaching one, for some $C > 0$.

Proof. First of all, by Proposition 2.1, under Assumption 3.5 the $K$th largest eigenvalue $\lambda_K$ of $\Sigma$ satisfies

$$\lambda_K \ge \|\tilde b_K\|^2 - |\lambda_K - \|\tilde b_K\|^2| \ge p \lambda_{\min}(\Omega) - \|\Sigma_u\| \ge p \lambda_{\min}(\Omega)/2$$

for sufficiently large $p$. Using Weyl's theorem, we need only prove that $\|\hat\Sigma_{\mathrm{sam}} - \Sigma\| = o_p(p)$. Without loss of generality, we prove the result under the identifiability condition (2.7). Using model (1.2),

$$\hat\Sigma_{\mathrm{sam}} = \frac{1}{T} \sum_{t=1}^T (B f_t + u_t)(B f_t + u_t)'.$$


More information

Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor

Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor Y. Wang M. J. Daniels wang.yanpin@scrippshealth.org mjdaniels@austin.utexas.edu

More information

arxiv: v2 [math.st] 2 Jul 2017

arxiv: v2 [math.st] 2 Jul 2017 A Relaxed Approach to Estimating Large Portfolios Mehmet Caner Esra Ulasan Laurent Callot A.Özlem Önder July 4, 2017 arxiv:1611.07347v2 [math.st] 2 Jul 2017 Abstract This paper considers three aspects

More information

Factor Models for Asset Returns. Prof. Daniel P. Palomar

Factor Models for Asset Returns. Prof. Daniel P. Palomar Factor Models for Asset Returns Prof. Daniel P. Palomar The Hong Kong University of Science and Technology (HKUST) MAFS6010R- Portfolio Optimization with R MSc in Financial Mathematics Fall 2018-19, HKUST,

More information

FE670 Algorithmic Trading Strategies. Stevens Institute of Technology

FE670 Algorithmic Trading Strategies. Stevens Institute of Technology FE670 Algorithmic Trading Strategies Lecture 3. Factor Models and Their Estimation Steve Yang Stevens Institute of Technology 09/12/2012 Outline 1 The Notion of Factors 2 Factor Analysis via Maximum Likelihood

More information

Regression: Ordinary Least Squares

Regression: Ordinary Least Squares Regression: Ordinary Least Squares Mark Hendricks Autumn 2017 FINM Intro: Regression Outline Regression OLS Mathematics Linear Projection Hendricks, Autumn 2017 FINM Intro: Regression: Lecture 2/32 Regression

More information

Log Covariance Matrix Estimation

Log Covariance Matrix Estimation Log Covariance Matrix Estimation Xinwei Deng Department of Statistics University of Wisconsin-Madison Joint work with Kam-Wah Tsui (Univ. of Wisconsin-Madsion) 1 Outline Background and Motivation The Proposed

More information

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility The Slow Convergence of OLS Estimators of α, β and Portfolio Weights under Long Memory Stochastic Volatility New York University Stern School of Business June 21, 2018 Introduction Bivariate long memory

More information

Journal of Multivariate Analysis. Consistency of sparse PCA in High Dimension, Low Sample Size contexts

Journal of Multivariate Analysis. Consistency of sparse PCA in High Dimension, Low Sample Size contexts Journal of Multivariate Analysis 5 (03) 37 333 Contents lists available at SciVerse ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva Consistency of sparse PCA

More information

MFE Financial Econometrics 2018 Final Exam Model Solutions

MFE Financial Econometrics 2018 Final Exam Model Solutions MFE Financial Econometrics 2018 Final Exam Model Solutions Tuesday 12 th March, 2019 1. If (X, ε) N (0, I 2 ) what is the distribution of Y = µ + β X + ε? Y N ( µ, β 2 + 1 ) 2. What is the Cramer-Rao lower

More information

Estimating Latent Asset-Pricing Factors

Estimating Latent Asset-Pricing Factors Estimating Latent Asset-Pricing Factors Martin Lettau and Markus Pelger April 9, 8 Abstract We develop an estimator for latent factors in a large-dimensional panel of financial data that can explain expected

More information

Estimating Structured High-Dimensional Covariance and Precision Matrices: Optimal Rates and Adaptive Estimation

Estimating Structured High-Dimensional Covariance and Precision Matrices: Optimal Rates and Adaptive Estimation Estimating Structured High-Dimensional Covariance and Precision Matrices: Optimal Rates and Adaptive Estimation T. Tony Cai 1, Zhao Ren 2 and Harrison H. Zhou 3 University of Pennsylvania, University of

More information

Sufficient Forecasting Using Factor Models

Sufficient Forecasting Using Factor Models Sufficient Forecasting Using Factor Models arxiv:1505.07414v2 [math.st] 24 Dec 2015 Jianqing Fan, Lingzhou Xue and Jiawei Yao Princeton University and Pennsylvania State University Abstract We consider

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Principal Component Analysis (PCA) Our starting point consists of T observations from N variables, which will be arranged in an T N matrix R,

Principal Component Analysis (PCA) Our starting point consists of T observations from N variables, which will be arranged in an T N matrix R, Principal Component Analysis (PCA) PCA is a widely used statistical tool for dimension reduction. The objective of PCA is to find common factors, the so called principal components, in form of linear combinations

More information

Factor Modelling for Time Series: a dimension reduction approach

Factor Modelling for Time Series: a dimension reduction approach Factor Modelling for Time Series: a dimension reduction approach Clifford Lam and Qiwei Yao Department of Statistics London School of Economics q.yao@lse.ac.uk p.1 Econometric factor models: a brief survey

More information

GARCH Models. Eduardo Rossi University of Pavia. December Rossi GARCH Financial Econometrics / 50

GARCH Models. Eduardo Rossi University of Pavia. December Rossi GARCH Financial Econometrics / 50 GARCH Models Eduardo Rossi University of Pavia December 013 Rossi GARCH Financial Econometrics - 013 1 / 50 Outline 1 Stylized Facts ARCH model: definition 3 GARCH model 4 EGARCH 5 Asymmetric Models 6

More information

High Dimensional Inverse Covariate Matrix Estimation via Linear Programming

High Dimensional Inverse Covariate Matrix Estimation via Linear Programming High Dimensional Inverse Covariate Matrix Estimation via Linear Programming Ming Yuan October 24, 2011 Gaussian Graphical Model X = (X 1,..., X p ) indep. N(µ, Σ) Inverse covariance matrix Σ 1 = Ω = (ω

More information

Portfolio Allocation using High Frequency Data. Jianqing Fan

Portfolio Allocation using High Frequency Data. Jianqing Fan Portfolio Allocation using High Frequency Data Princeton University With Yingying Li and Ke Yu http://www.princeton.edu/ jqfan September 10, 2010 About this talk How to select sparsely optimal portfolio?

More information

(Part 1) High-dimensional statistics May / 41

(Part 1) High-dimensional statistics May / 41 Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Estimation of the Global Minimum Variance Portfolio in High Dimensions

Estimation of the Global Minimum Variance Portfolio in High Dimensions Estimation of the Global Minimum Variance Portfolio in High Dimensions Taras Bodnar, Nestor Parolya and Wolfgang Schmid 07.FEBRUARY 2014 1 / 25 Outline Introduction Random Matrix Theory: Preliminary Results

More information

Statistical Machine Learning for Structured and High Dimensional Data

Statistical Machine Learning for Structured and High Dimensional Data Statistical Machine Learning for Structured and High Dimensional Data (FA9550-09- 1-0373) PI: Larry Wasserman (CMU) Co- PI: John Lafferty (UChicago and CMU) AFOSR Program Review (Jan 28-31, 2013, Washington,

More information

Multivariate GARCH models.

Multivariate GARCH models. Multivariate GARCH models. Financial market volatility moves together over time across assets and markets. Recognizing this commonality through a multivariate modeling framework leads to obvious gains

More information

Homogeneity Pursuit. Jianqing Fan

Homogeneity Pursuit. Jianqing Fan Jianqing Fan Princeton University with Tracy Ke and Yichao Wu http://www.princeton.edu/ jqfan June 5, 2014 Get my own profile - Help Amazing Follow this author Grace Wahba 9 Followers Follow new articles

More information

Cross-Sectional Regression after Factor Analysis: Two Applications

Cross-Sectional Regression after Factor Analysis: Two Applications al Regression after Factor Analysis: Two Applications Joint work with Jingshu, Trevor, Art; Yang Song (GSB) May 7, 2016 Overview 1 2 3 4 1 / 27 Outline 1 2 3 4 2 / 27 Data matrix Y R n p Panel data. Transposable

More information

Online Appendix. j=1. φ T (ω j ) vec (EI T (ω j ) f θ0 (ω j )). vec (EI T (ω) f θ0 (ω)) = O T β+1/2) = o(1), M 1. M T (s) exp ( isω)

Online Appendix. j=1. φ T (ω j ) vec (EI T (ω j ) f θ0 (ω j )). vec (EI T (ω) f θ0 (ω)) = O T β+1/2) = o(1), M 1. M T (s) exp ( isω) Online Appendix Proof of Lemma A.. he proof uses similar arguments as in Dunsmuir 979), but allowing for weak identification and selecting a subset of frequencies using W ω). It consists of two steps.

More information

Methods for sparse analysis of high-dimensional data, II

Methods for sparse analysis of high-dimensional data, II Methods for sparse analysis of high-dimensional data, II Rachel Ward May 26, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 55 High dimensional

More information

arxiv: v1 [math.st] 13 Feb 2012

arxiv: v1 [math.st] 13 Feb 2012 Sparse Matrix Inversion with Scaled Lasso Tingni Sun and Cun-Hui Zhang Rutgers University arxiv:1202.2723v1 [math.st] 13 Feb 2012 Address: Department of Statistics and Biostatistics, Hill Center, Busch

More information

Methods for sparse analysis of high-dimensional data, II

Methods for sparse analysis of high-dimensional data, II Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional

More information

Robust Factor Models with Explanatory Proxies

Robust Factor Models with Explanatory Proxies Robust Factor Models with Explanatory Proxies Jianqing Fan Yuan Ke Yuan Liao March 21, 2016 Abstract We provide an econometric analysis for the factor models when the latent factors can be explained partially

More information

Thomas J. Fisher. Research Statement. Preliminary Results

Thomas J. Fisher. Research Statement. Preliminary Results Thomas J. Fisher Research Statement Preliminary Results Many applications of modern statistics involve a large number of measurements and can be considered in a linear algebra framework. In many of these

More information

NUCLEAR NORM PENALIZED ESTIMATION OF INTERACTIVE FIXED EFFECT MODELS. Incomplete and Work in Progress. 1. Introduction

NUCLEAR NORM PENALIZED ESTIMATION OF INTERACTIVE FIXED EFFECT MODELS. Incomplete and Work in Progress. 1. Introduction NUCLEAR NORM PENALIZED ESTIMATION OF IERACTIVE FIXED EFFECT MODELS HYUNGSIK ROGER MOON AND MARTIN WEIDNER Incomplete and Work in Progress. Introduction Interactive fixed effects panel regression models

More information

Composite Loss Functions and Multivariate Regression; Sparse PCA

Composite Loss Functions and Multivariate Regression; Sparse PCA Composite Loss Functions and Multivariate Regression; Sparse PCA G. Obozinski, B. Taskar, and M. I. Jordan (2009). Joint covariate selection and joint subspace selection for multiple classification problems.

More information

Network Connectivity and Systematic Risk

Network Connectivity and Systematic Risk Network Connectivity and Systematic Risk Monica Billio 1 Massimiliano Caporin 2 Roberto Panzica 3 Loriana Pelizzon 1,3 1 University Ca Foscari Venezia (Italy) 2 University of Padova (Italy) 3 Goethe University

More information

Confounder Adjustment in Multiple Hypothesis Testing

Confounder Adjustment in Multiple Hypothesis Testing in Multiple Hypothesis Testing Department of Statistics, Stanford University January 28, 2016 Slides are available at http://web.stanford.edu/~qyzhao/. Collaborators Jingshu Wang Trevor Hastie Art Owen

More information

Applications of Random Matrix Theory to Economics, Finance and Political Science

Applications of Random Matrix Theory to Economics, Finance and Political Science Outline Applications of Random Matrix Theory to Economics, Finance and Political Science Matthew C. 1 1 Department of Economics, MIT Institute for Quantitative Social Science, Harvard University SEA 06

More information

INFERENTIAL THEORY FOR FACTOR MODELS OF LARGE DIMENSIONS. Jushan Bai. May 21, 2002

INFERENTIAL THEORY FOR FACTOR MODELS OF LARGE DIMENSIONS. Jushan Bai. May 21, 2002 IFEREIAL HEORY FOR FACOR MODELS OF LARGE DIMESIOS Jushan Bai May 2, 22 Abstract his paper develops an inferential theory for factor models of large dimensions. he principal components estimator is considered

More information

arxiv: v2 [stat.me] 10 Sep 2018

arxiv: v2 [stat.me] 10 Sep 2018 Factor-Adjusted Regularized Model Selection arxiv:1612.08490v2 [stat.me] 10 Sep 2018 Jianqing Fan Department of ORFE, Princeton University Yuan Ke Department of Statistics, University of Georgia and Kaizheng

More information

Nowcasting Norwegian GDP

Nowcasting Norwegian GDP Nowcasting Norwegian GDP Knut Are Aastveit and Tørres Trovik May 13, 2007 Introduction Motivation The last decades of advances in information technology has made it possible to access a huge amount of

More information

Estimating Covariance Using Factorial Hidden Markov Models

Estimating Covariance Using Factorial Hidden Markov Models Estimating Covariance Using Factorial Hidden Markov Models João Sedoc 1,2 with: Jordan Rodu 3, Lyle Ungar 1, Dean Foster 1 and Jean Gallier 1 1 University of Pennsylvania Philadelphia, PA joao@cis.upenn.edu

More information

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao

More information

Applied Microeconometrics (L5): Panel Data-Basics

Applied Microeconometrics (L5): Panel Data-Basics Applied Microeconometrics (L5): Panel Data-Basics Nicholas Giannakopoulos University of Patras Department of Economics ngias@upatras.gr November 10, 2015 Nicholas Giannakopoulos (UPatras) MSc Applied Economics

More information

Understanding Regressions with Observations Collected at High Frequency over Long Span

Understanding Regressions with Observations Collected at High Frequency over Long Span Understanding Regressions with Observations Collected at High Frequency over Long Span Yoosoon Chang Department of Economics, Indiana University Joon Y. Park Department of Economics, Indiana University

More information

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics

More information

Empirical Market Microstructure Analysis (EMMA)

Empirical Market Microstructure Analysis (EMMA) Empirical Market Microstructure Analysis (EMMA) Lecture 3: Statistical Building Blocks and Econometric Basics Prof. Dr. Michael Stein michael.stein@vwl.uni-freiburg.de Albert-Ludwigs-University of Freiburg

More information

14 Singular Value Decomposition

14 Singular Value Decomposition 14 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

This paper develops a test of the asymptotic arbitrage pricing theory (APT) via the maximum squared Sharpe

This paper develops a test of the asymptotic arbitrage pricing theory (APT) via the maximum squared Sharpe MANAGEMENT SCIENCE Vol. 55, No. 7, July 2009, pp. 1255 1266 issn 0025-1909 eissn 1526-5501 09 5507 1255 informs doi 10.1287/mnsc.1090.1004 2009 INFORMS Testing the APT with the Maximum Sharpe Ratio of

More information

arxiv: v5 [math.na] 16 Nov 2017

arxiv: v5 [math.na] 16 Nov 2017 RANDOM PERTURBATION OF LOW RANK MATRICES: IMPROVING CLASSICAL BOUNDS arxiv:3.657v5 [math.na] 6 Nov 07 SEAN O ROURKE, VAN VU, AND KE WANG Abstract. Matrix perturbation inequalities, such as Weyl s theorem

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

Delta Theorem in the Age of High Dimensions

Delta Theorem in the Age of High Dimensions Delta Theorem in the Age of High Dimensions Mehmet Caner Department of Economics Ohio State University December 15, 2016 Abstract We provide a new version of delta theorem, that takes into account of high

More information

Matrix Factorizations

Matrix Factorizations 1 Stat 540, Matrix Factorizations Matrix Factorizations LU Factorization Definition... Given a square k k matrix S, the LU factorization (or decomposition) represents S as the product of two triangular

More information

Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility

Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility American Economic Review: Papers & Proceedings 2016, 106(5): 400 404 http://dx.doi.org/10.1257/aer.p20161082 Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility By Gary Chamberlain*

More information

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Sílvia Gonçalves and Benoit Perron Département de sciences économiques,

More information

Generalized Power Method for Sparse Principal Component Analysis

Generalized Power Method for Sparse Principal Component Analysis Generalized Power Method for Sparse Principal Component Analysis Peter Richtárik CORE/INMA Catholic University of Louvain Belgium VOCAL 2008, Veszprém, Hungary CORE Discussion Paper #2008/70 joint work

More information

DYNAMIC AND COMPROMISE FACTOR ANALYSIS

DYNAMIC AND COMPROMISE FACTOR ANALYSIS DYNAMIC AND COMPROMISE FACTOR ANALYSIS Marianna Bolla Budapest University of Technology and Economics marib@math.bme.hu Many parts are joint work with Gy. Michaletzky, Loránd Eötvös University and G. Tusnády,

More information

A multiple testing approach to the regularisation of large sample correlation matrices

A multiple testing approach to the regularisation of large sample correlation matrices A multiple testing approach to the regularisation of large sample correlation matrices Natalia Bailey Department of Econometrics and Business Statistics, Monash University M. Hashem Pesaran University

More information

arxiv: v2 [stat.me] 21 Jun 2017

arxiv: v2 [stat.me] 21 Jun 2017 Factor Models for Matrix-Valued High-Dimensional Time Series Dong Wang arxiv:1610.01889v2 [stat.me] 21 Jun 2017 Department of Operations Research and Financial Engineering, Princeton University, Princeton,

More information

Forecast comparison of principal component regression and principal covariate regression

Forecast comparison of principal component regression and principal covariate regression Forecast comparison of principal component regression and principal covariate regression Christiaan Heij, Patrick J.F. Groenen, Dick J. van Dijk Econometric Institute, Erasmus University Rotterdam Econometric

More information

Statistical Inference

Statistical Inference Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park

More information

Hands-on Matrix Algebra Using R

Hands-on Matrix Algebra Using R Preface vii 1. R Preliminaries 1 1.1 Matrix Defined, Deeper Understanding Using Software.. 1 1.2 Introduction, Why R?.................... 2 1.3 Obtaining R.......................... 4 1.4 Reference Manuals

More information

High-dimensional Ordinary Least-squares Projection for Screening Variables

High-dimensional Ordinary Least-squares Projection for Screening Variables 1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor

More information

High-dimensional Covariance Estimation Based On Gaussian Graphical Models

High-dimensional Covariance Estimation Based On Gaussian Graphical Models High-dimensional Covariance Estimation Based On Gaussian Graphical Models Shuheng Zhou, Philipp Rutimann, Min Xu and Peter Buhlmann February 3, 2012 Problem definition Want to estimate the covariance matrix

More information

ROBUST FACTOR MODELS WITH COVARIATES

ROBUST FACTOR MODELS WITH COVARIATES Submitted to the Annals of Statistics ROBUS FACOR MODELS WIH COVARIAES By Jianqing Fan, Yuan Ke and Yuan Liao Princeton University and Rutgers University We study factor models when the latent factors

More information

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

LECTURE 2 LINEAR REGRESSION MODEL AND OLS SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

Variable Screening in High-dimensional Feature Space

Variable Screening in High-dimensional Feature Space ICCM 2007 Vol. II 735 747 Variable Screening in High-dimensional Feature Space Jianqing Fan Abstract Variable selection in high-dimensional space characterizes many contemporary problems in scientific

More information

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47 ECON2228 Notes 2 Christopher F Baum Boston College Economics 2014 2015 cfb (BC Econ) ECON2228 Notes 2 2014 2015 1 / 47 Chapter 2: The simple regression model Most of this course will be concerned with

More information

On singular values distribution of a matrix large auto-covariance in the ultra-dimensional regime. Title

On singular values distribution of a matrix large auto-covariance in the ultra-dimensional regime. Title itle On singular values distribution of a matrix large auto-covariance in the ultra-dimensional regime Authors Wang, Q; Yao, JJ Citation Random Matrices: heory and Applications, 205, v. 4, p. article no.

More information

Forecasting 1 to h steps ahead using partial least squares

Forecasting 1 to h steps ahead using partial least squares Forecasting 1 to h steps ahead using partial least squares Philip Hans Franses Econometric Institute, Erasmus University Rotterdam November 10, 2006 Econometric Institute Report 2006-47 I thank Dick van

More information

Estimation of Graphical Models with Shape Restriction

Estimation of Graphical Models with Shape Restriction Estimation of Graphical Models with Shape Restriction BY KHAI X. CHIONG USC Dornsife INE, Department of Economics, University of Southern California, Los Angeles, California 989, U.S.A. kchiong@usc.edu

More information