arxiv: v2 [math.st] 7 Aug 2014

Size: px
Start display at page:

Download "arxiv: v2 [math.st] 7 Aug 2014"

Transcription

1 Sparse and Low-Rank Covariance Matrices Estimation Shenglong Zhou, Naihua Xiu, Ziyan Luo +, Lingchen Kong Department of Applied Mathematics + State Key Laboratory of Rail Traffic Control and Safety arxiv: v2 [math.st] 7 Aug Beijing Jiaotong University, Beijing 00044, P. R. China Abstract This paper aims at achieving a simultaneously sparse and low-rank estimator from the semidefinite population covariance matrices.we first benefit from a convex optimization which develops l -norm penalty to encourage the sparsity and nuclear norm to favor the low-rank property. For the proposed estimator, we then prove that with large probability, the Frobenious norm of the estimation rate can be of order O ( (s logr /n under a mild case, where s and r denote the number of sparse entries and the rank of the population covariance respectively, n notes the sample capacity. Finally an efficient alternating direction method of multipliers with global convergence is proposed to tackle this problem, and meantime merits of the approach are also illustrated by practicing numerical simulations. Keywords: Covariance matrices, Sparse, Low-rank, Rate of estimation, Alternating direction method of multipliers Introduction Estimation of population covariance matrices from samples of multivariate data has draw many attentions in the last decade owing to its fundamental importance in multivariate analysis. With dramatic advances in technology in recent years, various research fields, such as genetic data, brain imaging, spectroscopic imaging, climate data and so on, have been used to deal with massive highdimensional data sets, whose sample sizes can be very small relative to dimension. In such settings, the standard and the most usual sample covariance matrices often performs poorly [, 2, ]. Fortunately, regularization as a class of new methods to estimate covariance matrices has recently emerged to overcome those shortages of using traditional sample covariance matrices. These methods encompass several specified forms, banding [, 6, 7], tapering [4, 0] and thresholding [2, 5, 8, 6] for instance. Aug 7, longnan_zsl@63.com, nhxiu@bjtu.edu.cn, zyluo@bjtu.edu.cn,konglchen@26.com

2 Moreover, there are many cases where the model is known to be structured in several ways at the same time. In recent years, one of research contents is to estimate a covariance matrix possessing both sparsity and positive definiteness. For instance, Rothman [5] gave the following model: ˆΣ + = argmin Σ 0 2 Σ Σ n 2 F τlogdet(σ + λ Σ, ( where Σ n is the sample covariance matrix, F is the Frobenious norm, is the element-wise l -norm and Σ := Σ Diag(Σ. From the optimization viewpoint, ( is similar to the graphical lasso criterion [9] which also has a log-determinant part and the element-wise l -penalty. Rothman [5] derived an iterative procedure to solve (. While Xue, Ma and Zou [8] omitted the log-determinant part and considered the positive definite constraint {Σ εi} for some arbitrarily small ε > 0: ˆΣ ε = argmin Σ εi 2 Σ Σ n 2 F + λ Σ, (2 They utilized an efficient alternating direction method (ADM to solve the challenging problem (2 and established its convergence properties. Most of the literatures, e.g.,[2, 5, 8], required the population covariance matrices being positive definite, and thus there is no essence of pursuing the low-rank of the estimator. By contrast, newly appeared research topic is to consider simultaneously the sparsity and low-rank of a structured model, which implies that the population covariance matrices are no longer restricted to the positive definite matrix cone and can be relaxed to the positive semidefinite cone. In addition, the models with structure of being simultaneously the sparsity and low-rank are widely applied into practice, such as sparse signal recovery from quadratic measurements and sparse phase retrieval, see [3] for example. Moreover, Richard et al. [4] showed that both sparse and low-rank model can be derived in covariance matrix when the random variables are highly correlated in groups, which means this covariance matrix has a block diagonal structure. With stimulations of those ideas, we construct the following convex model encompassing the l - norm and nuclear norm for estimating the covariance matrix: ˆΣ = argmin Σ 0 2 Σ Σ n 2 F + λ Σ + τ Σ, (3 where λ 0,τ 0 are tuning parameters. The l -norm penalty Σ = i,j σ i j is also called lassotype penalty and is used to encourage sparse solutions. The nuclear norm, Σ = i λ i (Σ with λ i (Σ being the eigenvalue of Σ, is the trace norm when Σ 0 and ensures low-rank solutions of (3. Here we inroduce the approximate rank to interpret the low-rank, which is defined as ar (A := ar {,2,, p} being the smallest number such that σ ar + (A γ, (4 σ (A 2

3 where σ i (A(i =,2,, p are the singular value of A with σ (A σ 2 (A σ p (A, and γ > 0 could be chosen based on the needs, throughout our paper we fix γ = 0.00 for simplicity. The contributions of this paper mainly center on two aspects. For one thing, being different from [3, 4], we establish the theoretical statistical theory under different assumptions rather than giving the generalized error bound of the estimation. Especially, we acquire the estimation rate O ( (s logr /n under the Frobenious norm error, which improves the optimal rate O ( (s log p/n where the low-rank property of the estimator does not be considered [7, 5, 8] and p is the samples dimension with p > max{n,r }. For another, we take advantage of the alternating direction method of multipliers (ADMM, also can be seen in [8, 9], to combat our problem (3. The organization of this paper is as follows. In Section 2 we will present some theoretical properties of the estimator derived by the proposed model (3. After that the alternating direction method of multipliers (ADMM is going to be introduced to combat the problem, and numerical experiments are projected to show the performance of this method in Sections 3 and 4 respectively. We make a conclusion in the last section. 2 A Sparse and Low-Rank Covariance Estimator Before the main part, we hereafter introduce some notations. E(X and P(A denote the expectation of X and the probability of the incident A occurring respectively. Card(S is the number of entries of the set S. Normal distribution with mean µ and covariance Σ is written as N ( µ,σ. Say Y n = O P ( if for every ε > 0, there is a C > 0 such that P{ Y n > C } < ε for all n n 0 (ε, and say Y n = O P (a n if Y n /a n = O P (. If there are two constants C C 2 such that C X n /Y n C 2, we write as X n Y n. For given observed independently and identically distributed (i.i.d. for short p-variate random variables X,, X n with covariance matrix Σ 0 and p > n, the goal is to estimate the unknown matrix Σ 0 based on the sample {X l : l =,,n}. This problem is called covariance matrix estimation which is of fundamental importance in multivariate analysis. Given a random sample {X,, X n } from E(X = 0 (without loss of generality and a population covariance matrix Σ 0 = ( σ 0i j i,j p = E( X X, the sample covariance matrix is Σ n = ( σ ni j i,j p = n ( Xl X ( X l X, n where X = n n l= X l. Denote S the support set of the population covariance matrix Σ 0 as l= S = { (i, j : σ 0i j 0 }, s = Card(S, r = rank(σ 0. Assumption 2. For all p, 0 λ min (Σ 0 λ max (Σ 0 λ <, where λ is a constant. 3

4 Assumption 2. is a common used condition in covariance matrix estimation, on which a useful lemma based is recalled here for the sequel analysis. One can also refer it in []. Lemma 2.2 Let X l R p,l =,,n be i.i.d. N (0,Σ 0 and Assumption 2. holds, (i.e., for all p, λ max (Σ 0 λ <. Then, if Σ 0 = ( σ 0i j i,j p, n ( P{ } Xl i X l j σ 0i j nν C exp { C 2 nν 2}, f or ν δ (5 l= where constants C,C 2 and δ depend on λ only. Another lemma which plays an important role in our main results is stated below. Lemma 2.3 Suppose that Assumption 2. holds, λ ε 6 s,τ ε 8. Then for ε > 0 sufficiently small, r max σ0i j σ ni j λ i mpl i es ˆΣ Σ 0 F ε, (6 where ˆΣ is defined as (3. i,j p Proof First make the eigenvalue decomposition of Σ 0 (r = rank(σ 0 as ( Σ 0 = UΛ Σ0 U Diag(λ(Σ0 U U, 0 where U R p p with UU = U U = I p is the matrix composed of eigenvectors, Diag(λ(Σ 0 R r r is a diagonal matrix generated by eigenvalues with λ(σ 0 = (λ (Σ 0,,λ r (Σ 0 and λ j (Σ 0 > 0, j =,,r. By denoting Σ := U U + Σ 0 with = consider the model ˆ := argmin +Λ Σ0 0 argmin +Λ Σ0 0 F ( = ( argmin +Diag(λ(Σ 0 0 and R r r, which implies = U ΣU Λ Σ0, we F ( (7 U U + Σ 0 Σ n 2 F 2 + λ U U + Σ 0 + τ U U + Σ 0. Clearly, from (3 we have ˆΣ = U ˆ U +Σ 0 which implies ˆ = U ˆΣU Λ Σ0. For a given ε > 0 sufficiently small and any F = ε (i.e., F = ε, we compute F ( F (0 = U U + Σ 0 Σ n 2 F 2 + λ U U + Σ 0 + τ U U + Σ 0 2 Σ 0 Σ n 2 F + λ Σ 0 + τ Σ 0 = 2 2 F + U U,Σ 0 Σ n + λ ( U U + Σ 0 Σ 0 +τ ( U U + Σ 0 Σ F + I + II + III. 4

5 For convenience we denote = U U. Then for I, it holds I =,Σ 0 Σ n ( tr (Σ n Σ 0 = ( i j σ0i j σ ni j max σ0i j σ ni j, i,j i,j p For II, we obtain by noting S = { (i, j : σ 0i j 0 } and s = Card(S that, II = λ ( Σ 0 + Σ 0 = λ ( Σ 0S + S + S C Σ 0S ( λ Σ 0S + S + S C ( Σ 0S + S + S = λ ( S C S. From the Hölder Inequality, one can prove that = = = For III, combining with (8 we get that r λ i ( r F = r F, (8 i= S s S F. (9 III = τ ( Σ 0 + Σ 0 τ = τ τ r F. Since max i,j σ0i j σ ni j λ and (9, G( F ( F (0 = 2 2 F + I + II + III 2 2 F λ + λ ( S C S τ r F = 2 2 F ( λ S + S C ( S C S τ r F = 2 2 F 2λ S τ r F 2 2 F 2λ s S F τ r F 2 2 F 2λ s F τ r F Therefore, by λ = 2 2 F 2λ s F τ r F. ε 6 s,τ ε 8 r G( 2 2 F 2λ s S F τ r F ε2 2 ε2 8 ε2 8 = ε2 4 > 0. 5

6 Hence we prove that if λ ε 6 s,τ ε 8 r and max i,j σ0i j σ ni j λ, for any F = ε, it holds G( F ( F (0 > 0. In addition, from (7, we have ˆ = argmin F ( F (0 = +Diag(λ Σ0 0 argmin +Diag(λ Σ0 0 G(, (0 which implies that ˆ F ε. Otherwise, we suppose ˆ F > ε, then ε ˆ / ˆ F F = ε. Since for any F = ε, it follows G( > 0 = G(0 which is contradicted with the fact G( is a convex function and G( ˆ G(0 = 0, because ( ε 0 < G ˆ ˆ F ( = G ε ˆ + ˆ F ε G( ˆ + ˆ F ( ε ˆ ( F ε ˆ F 0 G(0 0. Finally ˆ F ε indicates that ˆΣ Σ 0 F ε due to ˆ F = U ˆ U F = ˆΣ Σ 0 F. Hence the desired result is obtained. Then in order to acquiring the rate of the estimation, the two following commonly used assumptions are needed to introduced, and also can be seen [, 5]. Assumption 2.4 holds, for example, if X l,, X l p are Gaussian. } Assumption 2.4 E {exp(t X 2l j C < hold for all j =,, p and 0 < t < t 0, where t 0 and C are two constants. Assumption 2.5 E{ Xl j 2α } C 2 < hold for all j =,, p, where some α 2 and C 2 is a constant. Built on the two assumptions, we give our main results with regard to rates of the estimator of (3. Theorem 2.6 For some δ [,2, let K be ( a sufficiently large constant and suppose Assumptions 2. logr and 2.4 hold. If λ = K n + log p s logr,τ = O nk δ P nr + s log p and s logr + (s log p/k δ nr K δ = o(n, then ˆΣ Σ 0 F = O P ( s logr n + s log p nk δ. ( Proof Since Assumptions 2. and 2.4 hold, a fact employed by Rothman et al. [6] is that for ν > 0 sufficiently small, { P max i,j n } σ0i j σ ni j ν C 3 p 2 exp { C 4 nν 2}, 6

7 where C 3 and C 4 are some constants. We then apply the bound s logr +(s log p/k δ = o(n and Lemma 2.3 with ε = 6K s logr n i,j n + s log p logr nk δ, λ = K n + log p nk δ to obtain that P { ˆΣ Σ 0 F ε } { } P max σ0i j σ ni j λ C 3 p 2 C 4K 2 δ r C 4K 2. Evidently, C 3 p 2 C 4K 2 δ r C 4K 2 can be arbitrarily close to one by choosing K sufficiently large. Corollary 2.7 For some δ [,2, let K be a sufficiently large constant such that K δ logr log p, and ( logr suppose Assumptions 2., 2.4 hold. If λ = K n,τ = O s logr P nr and s logr = o(n, then ˆΣ Σ 0 F = O P s logr. n (2 ( s logr Remark 2.8 Clearly, if the λ min (Σ 0 > 0 in Assumption 2., the better rate O P n would reduce ( s log p to O P n. It is worth mentioning that under the the Assumption 2., the minimax optimal rate ( s log p of convergence under the Frobenius norm in Theorem 4 of [7] is O P n which also has been obtained by [8]. However, to attain the same rate in the presence of the log-determinant barrier term (, Rothman [5] instead would require that λ min, the minimal eigenvalue of the true covariance matrix, should be bounded away from zero by some positive constant, and also that the barrier parameter should be bounded by some positive quantity. [8] illustrated this theory requiring a lower bound on λ min is not very appealing. Theorem 2.9 Let K 2 be a sufficiently large constant and suppose that Assumptions 2. and 2.5 hold. If ( p λ = K 4/α 2 n,τ = O sp 4/α P nr and sp 4/α = o(n, then ˆΣ Σ 0 F = O P sp 4/α n. (3 Proof Since Assumptions 2. and 2.5 hold, one can modify a result of Bickel & Levina (2008a and show that for ν > 0 sufficiently small { P max i,j n } σ0i j σ ni j ν p 2 C 5 n α/2 ν α, 7

8 where C 5 is a constant. We then apply the bound sp 4/α sp = o(n and Lemma 2.3 with ε = 6K 4/α 2 n,λ = p K 4/α 2 n to get P { ˆΣ Σ F 0 ε } { } P max σ0i j σ ni j λ C 6 K α i,j n Apparently, the bound C 6 K α 2 can be arbitrarily close to one by taking K 2 sufficiently large Alternating Direction Method of multipliers In this section, we will construct the alternating direction method of multipliers (ADMM to solve problem (3. By introducing an auxiliary variable Γ, problem (3 can be rewritten as min 2 Σ Σ n 2 F + λ Σ + τ Γ, (4 s.t. Γ 0, Σ Γ = 0. The constraint Γ 0 can be put into the objective function by using an indicator function: { 0, Γ 0 I (Γ 0 = +, otherwise. This leads to the following equivalent reformulation of (4: min 2 Σ Σ n 2 F + λ Σ + τ Γ + I (Γ 0, (5 s.t. Σ Γ = 0. Recently, the alternating direction method of multipliers (ADMM has been studied extensively for solving (5. A typical iteration of ADMM for solving (5 can be described as Γ k+ := argmin Γ L µ (Σ k,γ,λ k (6 Σ k+ := argmin Σ L µ (Σ,Γ k+,λ k (7 Λ k+ := Λ k ( Γ k+ Σ k+, (8 µ where the augmented Lagrangian function L µ (Σ,Γ,Λ is defined as L µ (Σ,Γ,Λ := 2 Σ Σ n 2 F + λ Σ + τ Γ +I (Γ 0 Λ,Γ Σ + 2µ Σ Γ 2 F, (9 in which Λ is the Lagrange multiplier and µ > 0 is a penalty parameter. Note that ADMM (6-8 can be written explicitly as 8

9 Γ k+ := argmin Γ 0 τ Γ + 2µ Γ (Σk + µλ k 2 F (20 Σ k+ := argmin Σ λ Σ + µ + 2µ Σ µ µ + (Σ n + µ Γk+ Λ k 2 F (2 Λ k+ := Λ k (Γ k+ Σ k+ /µ. (22 We now show that the two subproblems (20 and (2 can be easily solved. For the subproblem (20, Γ k+ = argmin τ Γ + Γ 0 2µ Γ (Σk + µλ k 2 F = argmin τ Γ, I + Γ 0 2µ Γ (Σk + µλ k 2 F = argmin Γ (Σ k + µλ k µτi 2 F Γ 0 = (Σ k + µλ k µτi +, (23 where (X + denote the projection of a matrix X onto the convex positive semidefinite cone S n +. Namely (X + = U Diag(max{λ(X,0}U, where X = U Diag(λ(X U and λ(x = (λ (X,λ 2 (X,,λ p (X. The solution of the second subproblem (2 is given by the l -shrinkage operation Σ k+ = = µ µ + Shrink(Σ n + µ Γk+ Λ k,λ µ ( { } ( max Pi j λ,0 sign Pi j µ + i,j p, (24 where P = ( P i j i,j p := Σ n + µ Γk+ Λ k and sign( is a sign function. Therefore, combining with (22-(24, the whole algorithm is written as follows Table : The framework of the ADMM. ADMM: Alternating Direction Method of Multipliers Initialize µ, λ, τ, Σ 0, Λ 0 ; Repeat Compute Γ k+ = (Σ k + µλ k µτi + ; Compute Σ k+ = µ µ+ Shrink(Σ n + µ Γk+ Λ k,λ; Compute Λ k+ := Λ k µ (Γk+ Σ k+. Untill Convergence Return To end this section, we prove that the sequence (Γ k,σ k,λ k produced by the alternating direction method of multipliers (Table converges to (ˆΓ, ˆΣ, ˆΛ, where (ˆΓ, ˆΣ is an optimal solution of (5 and 9

10 ˆΛ is the optimal dual variable. Now we label some necessary notations for the ease of presentation. Let H be a 2p 2p matrix defined as H = ( µ I p p 0 0 µi p p, the weighted norm 2 H stands for V 2 H := V, HV and the corresponding inner product, H is U,V H := U, HV. Before presenting the main theorem with regard to the global convergence of ADMM, we introduce the following lemma. Lemma 3. Assume that (ˆΓ, ˆΣ is an optimal solution of (5 and ˆΛ is the corresponding optimal dual variable associated with the equality constraint Σ Γ = 0. Then the sequence { (Γ k,σ k,λ k } produced by ADMM satisfies ˆV V k 2 H ˆV V k+ 2 H V k+ V k 2 H, (25 where V k = (Λ k,σ k and ˆV = ( ˆΛ, ˆΣ. Based on the lemma above, the convergent theorem can be derived immediately. Theorem 3.2 The sequence { (Γ k,σ k,λ k } generated by Algorithm from any starting point converges to an optimal solution of (5. 4 Numerical Simulations In this section we will exploit the proposed method ADMM to tackle two examples, one of which possessed the block structured population covariance matrix, and another utilized the banded population covariance matrix. Actually as the constraint Σ 0, our proposed model (3 is equivalent to ˆΣ = argmin Σ 0 2 Σ (Σ n τi 2 F + λ Σ. (26 So similar to the method in [8], one can solve the soft-thresholding estimator Σ st := argmin Σ 2 Σ (Σ n τi 2 F + λ Σ to initialize the Σ 0. If the derived Σ st 0 then the recovered sparse and low-rank semidefinite estimator ˆΣ = Σ st. In our stimulation, we uniformly initialize Λ 0 as the matrix with all entries being, Σ 0 as zero matrix and Σ st respectively. Unlike λ and τ, µ does not change the final covariance estimator, thus we fixed µ = just for simplicity and the stop criteria is set as { } max Γ k+ Γ k F, Σ k+ Σ k F For the sample dimensions, we always take n = 50 and p = 00,200,500,000. 0

11 4. Example I: Block Structure Analogous to the model, modified slightly here, emerged in [4] who synthesized n samples X l N (0,Σ 0 for a block diagonal population covariance matrix Σ 0 R p p, we will use K (= 5,0,20 blocks of random sizes, and each block is generated by v v where the entries of v are drawn i.i.d. from the uniform distribution on [,]. Evidently, the rank of Σ 0 produced in the way is K. Corresponding MATLAB code of generating X l is X l = Σ /2 0 r andn(p,, l =,2,,n, thereby deriving the sample covariance matrix Σ n = n ( Xl X ( X l X. n l= What is worth mentioning is that if Σ 0 is a positive definite matrix, the solution our method obtains would be a positive definite matrix with full rank. But fortunately, compared to some largest singular values of Σ 0, the left are relatively small so that can be ignored. Here, therefore, we consider the approximate rank (4. In addition, we say the sparsity of a matrix A = ( a i j R n p by sp(a = Card{ (i, j : a i j 0 } n p. Figure : Covariance estimation with K = 5, λ = 0.5,τ = 0.2 and p = 200. In the row above, F PR = 0.092,T PR =, whilst F PR = 0.08,T PR = in the row below.

12 Apart from the approximate ar and the sparsity sp of the sparse and low-rank semidefinite estimator ˆΣ = ( σ i j i,j p, we also take advantage of other two types of errors to show the selection performance of our proposed method ADMM: F PR := Card{ (i, j : σ 0i j 0& σ i j = 0 } { Card { Card (i, j : σ0i j 0& σ i j 0 } }, T PR := (i, j : σ i j = 0 Card { (i, j : σ i j 0 }, where F PR stands for the false positive rate, which means the rate of significant variables that are unselected over the whole zero entries, and T PR denotes the true positive rate, which implies the ratio of significant variables that are selected over the entire none zero elements. For more visualized purpose, we plot the Population Covariance, Sample Covariance an the Recovered Covariance. From Figures and 2, the left Population Covariance is Σ 0, the median Sample Covariance stands for Σ n and the right Recovered Covariance denotes ˆΣ. The yellow region stands for the sparse area in which the values are quite close (or most of them equal to zero, while the green zone is the place where the none zero entries locate. Moreover the deeper the green color is, the larger the value stands. Evidently the recovered covariance matrices are quite dependent on the sample covariance matrix. Figure 2: Covariance estimation with K = 5, λ = 0.5,τ = 0.2 and p = 500. In the row above, F PR = 0.098,T PR = 0.99, whilst F PR = 0.07,T PR = in the row below. 2

13 We then report average results over 00 runs. Timing (in seconds was carried out on a CPU 2.6GHz desktop. In computation, the sparse parameter λ and the low-rank parameter τ are given in corresponding stimulations respectively. Σ 0 ia taken as the zero matrix. As we mentioned before the rank of the produced Σ 0 actually is given, i.e., r (Σ 0 = ar (Σ 0 = K. Table 2 shows the performance of our approach under different dimensions p = 00,200,500,000 and various blocks K. Obviously, all the ar ( ˆΣ nearly tends to the true rank K. The values of F PR and T PR are quite desirable, which manifests that the selection performance of ADMM is very well. Moreover, the CPU time reveals the method runs relatively fast. In addition, with the K rising, for example p = 500, the F PR is decreasing while F PR is increasing, and the time spent by the method is also ascending. In addition, for small dimension such as p = 200 and large block such as K = 50, ADMM performs unstably because various stimulations can not be recovered. Table 2: The performance of ADMM under different dimensions p and blocks K. n = 50, ar (Σ 0 = K, λ = 0.25, τ = 0.5, Σ 0 = 0 p ar ( ˆΣ sp(σ 0 sp( ˆΣ F PR T PR Time K = K = K = K = To simply observe the performance of our proposed method under different parameters λ,τ and initialized Σ 0, we fix n = 50, p = 200 and ar (Σ 0 = K = 5. From Table 3, results of left columns of ar ( ˆΣ,F PR,T PR and Time are generated from the initialized Σ 0 = 0, and results of right columns are produced with Σ 0 = Σ st. One can easily to discern that when λ = 0.05 and Σ 0 = Σ st, the performance of the method is relatively bad regardless of what τ is taken due to the ar ( ˆΣ and T PR are undesirable. By contrast, when λ = 0.25 and 0.50, it behaves much better, particularly when τ = Moreover, with the increasing of λ, the rate F PR of significant variables that are unselected is rising, even though the percentage T PR of significant variables that are selected is ascending as well. By comparing the effectiveness of those two initialized point Σ 0 = 0 and Σ st, as shown in the table, ar ( ˆΣ,F PR,T PR and Time generated from Σ 0 = 0 are basically same, which means method with zero starting point 3

14 performs more stable. But when λ = 0.25 and 0.50, ADMM with Σ 0 = Σ st generates larger T PR than that from Σ 0 = 0, moreover it needs less computational time in all stimulations regardless of the parameters. Table 3: The performance of ADMM under distinct parameters λ,τ and initialized Σ 0. n = 50, p = 200, ar (Σ 0 = K = 5 λ τ ar ( ˆΣ F PR T PR Time Example II: Banded Structure In this part we consider the population covariance matrix with banded structure which has been emerged in [, 3, 8]. To be more exact, the population covariance matrix Σ 0 = (σ 0i j i,j p R p p has the following formula { σ 0i j = max } ( 0 i j, 0 = 0 i j. + We first report average results over 00 replicators and take the sparse parameter λ = 0.5 and the low-rank parameter τ = 0.75 respectively. Information listed in Table 4 shows the performance of our approach under different dimensions p = 00,200,500,000 and two distinct starting point Σ 0 = 0 and Σ st. As we can discern in Table 4, compared with Σ 0, the ar ( ˆΣ and sp( ˆΣ are relatively small, and the former ascends while the latter descends with the rise of p. In addition, in Example 4. the block structured Σ 0 whose r ank(σ 0 = ar (Σ 0 = K leads to the estimator the rank of ˆΣ is also close to K. Being distinct with that, in this example, the ar (Σ 0 increases with the dimension p and is not lowrank, but the recovered solution ˆΣ has been rendered the relatively low-rank property. The values of F PR and T PR are both quite desirable, which manifests that the selection performance of ADMM is very well in this example. Moreover, the CPU time reveals the method runs extremely fast as well. In addition, under such parameters Zλ = 0.5, τ = 0.75, ADMM behaves nearly identically even though the starting point Σ 0 are different. 4

15 Table 4: The performance of ADMM over 00 simulations under different dimensions p and Σ 0. n = 50, λ = 0.5, τ = 0.75 p ar (Σ 0 ar ( ˆΣ sp(σ 0 sp( ˆΣ F PR T PR Time Σ 0 = Σ 0 = Σ st Figure 3: Covariance estimation with λ = 0.5,τ = 0.75 and p = 00. In the row above, F PR = 0.062,T PR =, whilst F PR = 0.054,T PR = in the row below. Then from Figure 3, one can check that the recovered covariance matrices ˆΣ are quite dependent on the sample covariance matrix Σ n, and the selection performance are relatively well because F PR is pretty small while T PR is close to. 5

16 To simply observe the behavior of ADMM under different parameters λ,τ and initialized Σ 0, we fix n = 50, p = 200 and ar (Σ 0 = 76. As indicated in Table 5, results of left columns of ar ( ˆΣ,F PR,T PR and Time are generated from the initialized point Σ 0 = 0, and results of right columns are produced with Σ 0 = Σ st. It is clear that the CPU time cost by the method with starting point Σ st is almost less than that from zero initialization. With the λ rising, TPR generated by the method with Σ 0 = Σ st is increasing to, while that with Σ 0 = 0 basically stabilizes at In terms of the ar ( ˆΣ, the proposed method with starting point Σ st will not create a lower rank solution, comparing with Σ 0 = 0. The reason for this phenomenon probably is that Σ st is a sparse point but not low rank; from (26 if Σ st is an approximately semidefinite positive matrix, algorithm will stop after a few iterations which results in the solution is not a desired low-rank one. Table 5: The performance of ADMM under distinct parameters λ,τ and initialized Σ 0. n = 50, p = 200, ar (Σ 0 = 76 λ τ ar ( ˆΣ F PR T PR Time Conclusion We have acquired a positive semidefinite estimator, being simultaneously sparse and low-rank, from samples of the covariance matrices through utilizing l norm and nuclear norm penalties. The theoretical properties manifest that in high-dimensional settings the estimator we have constructed performs very well. Meantime, the efficient ADMM with global convergence has possessed several merits illustrated by the numerical simulations, such as less computational time and beautiful recovered effectiveness. 6

17 Acknowledgement The work was supported in part by the National Basic Research Program of China (200CB73250, the National Natural Science Foundation of China (708, 72702, References [] Bickel, P. and Levina, E.: Regularized estimation of large covariance matrices. Ann. Statist. 36, (2008. [2] Bickel, P. and Levina, E.: Covariance regularization by thresholding. Ann. Statist. 36, (2008. [3] Cai, T. and Liu, W.: Adaptive thresholding for sparse covariance matrix estimation. J. Amer. Statist. Assoc. 06, -3 (20. [4] Cai, T., Zhang, C.: and Zhou, H., Optimal rates of convergence for covariance matrix estimation. Ann. Statist. 38, (200. [5] Cai T T, Yuan M.: Adaptive covariance matrix estimation through block thresholding. Ann.Statist. 40, (202. [6] Cai, T. and Zhou, H.: Minimax estimation of large covariance matrices under l -norm. Statist. Sinica 22, (202. [7] Cai, T. and Zhou, H.: Optimal rates of convergence for sparse covariance matrix estimation. Ann.Statist. 40, (202. [8] El Karoui, N.: Operator norm consistent estimation of large dimensional sparse covariance matrices. Ann. Statist. 36, (2008. [9] Friedman, J., Hastie, T. and Tibshirani, R., Sparse inverse covariance estimation with the graphical lasso. Biostat. 9, 432 (2008. [0] Furrer, R. and Bengtsson, T.: Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J. Multivariate Anal. 98, (2007. [] Johnstone, I.: On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist., (200. [2] Liu H, Wang L, Zhao T.: Sparse Covariance Matrix Estimation with Eigenvalue Constraints. Journal of Computational and Graphical Statistics, (203 accepted. 7

18 [3] Oymak, S., Jalali, A., Fazel, M., Eldar, Y. C. and Hassibi, B.: Simultaneously structured models with application to sparse and low-rank matrices. arxiv preprint arxiv: (202 to appear. [4] Richard, E., Savalle, P. A., and Vayatis, N.: Estimation of simultaneously sparse and low rank matrices. arxiv preprint arxiv: (202 to appear. [5] Rothman, A.: Positive definite estimators of large covariance matrices. Biometrika 99, (202. [6] Rothman, A., Levina, E., and Zhu, J.: Generalized thresholding of large covariance matrices. J. Amer. Statist. Assoc. 04, (2009. [7] Wu, W. and Pourahmadi, M.: Nonparametric estimation of large covariance matrices of longitudinal data. Biomet. 90, (2003. [8] Xue, L., Ma, S. and Zou, H.: Positive Definite l Penalized Estimation of Large Covariance Matrices. J. Amer. Statist. Assoc., 07, (202. [9] Yuan X.M.: Alternating direction method of multipliers for covariance selection models. Journal of Scientific Computing 5, (202. Appendix Proof of Lemma 3. For convenience, we denote that g (Γ := τ Γ + I (Γ 0. Clearly, g : R p p R is a convex function. Since (ˆΓ, ˆΣ is an optimal solution of (5, which satisfies the following KKT conditions: ˆΛ ( τ ˆΓ + I (ˆΓ 0 = g (ˆΓ, (27 ( ˆΛ ˆΣ + Σ n ˆΣ, (28 λ ˆΣ = ˆΓ. (29 Note that the optimality conditions for the first subproblem in ADMM, i.e., the subproblem with respect to Γ in (6, are given by Λ k µ (Γk+ Σ k g (Γ k+, (30 this together with (8, i.e., Λ k = Λ k+ + µ ( Γ k+ Σ k+, we have Λ k+ µ (Σk+ Σ k g (Γ k+, (3 8

19 Combining (27 and (3 and using the fact that g ( is a monotone operator, we get ˆΓ Γ k+, ˆΛ Λ k+ + µ (Σk+ Σ k 0. (32 The optimality conditions for the second subproblem (i.e., the subproblem with respect to Σ in (7 are given by λ ( Λ k Σ k+ + Σ n µ (Σk+ Γ k+ Σ k+, (33 this together with (8, i.e., Λ k = Λ k+ + µ λ ( Γ k+ Σ k+, we have ( Λ k+ Σ k+ + Σ n Σ k+, (34 Similarly, combining (28 and (34, using the fact that is a monotone operator, we get ˆΣ Σ k+, ˆΛ + Λ k+ ( ˆΣ Σ k+ 0, (35 The summation of (32 and (35 gives ˆΣ Σ k+ 2 F ˆΣ Σ k+, ˆΛ + Λ k+ + ˆΓ Γ k+, ˆΛ Λ k+ + µ ˆΓ Γ k+,σ k+ Σ k = ˆΣ Σ k+, ˆΛ + Λ k+ + ˆΣ Σ k+ + µ(λ k+ Λ k, ˆΛ Λ k+ + µ ˆΣ Σ k+ + µ(λ k+ Λ k,σ k+ Σ k, (36 where the equality because of Γ k+ = Σ k+ µ(λ k+ Λ k and ˆΣ = ˆΓ. Simple algebraic derivation from (36 yields the following inequality: ˆΣ Σ k+ 2 F Λk+ Λ k,σ k+ Σ k µ ˆΛ Λ k+,λ k+ Λ k + µ ˆΣ Σ k+,σ k+ Σ k, (37 Rearranging the right hand side of (37 using ˆΛ Λ k+ = ˆΛ Λ k + Λ k Λ k+ and ˆΣ Σ k+ = ˆΣ Σ k + Σ k Σ k+, then (37 can be reduced to µ ˆΛ Λ k,λ k+ Λ k + µ ˆΣ Σ k,σ k+ Σ k µ Λ k+ Λ k 2 F + µ Σk+ Σ k 2 F + ˆΣ Σ k+ 2 F Λk+ Λ k,σ k+ Σ k, Using the notation of V k and ˆV, the inequality above can be rewritten as ˆV V k,v k+ V k H V k+ V k 2 H + ˆΣ Σ k+ 2 F Λk+ Λ k,σ k+ Σ k, (38 9

20 Combining (38 with the following identity we get ˆV V k+ 2 H = V k+ V k 2 H 2 ˆV V k,v k+ V k H + ˆV V k 2 H, ˆV V k 2 H ˆV V k+ 2 H = 2 ˆV V k,v k+ V k H V k+ V k 2 H 2 V k+ V k 2 H + 2 ˆΣ Σ k+ 2 F 2 Σk+ Σ k,λ k+ Λ k V k+ V k 2 H = V k+ V k 2 H + 2 ˆΣ Σ k+ 2 F 2 Σk+ Σ k,λ k+ Λ k, (39 Now, using (34 for k instead of k +, we get, λ ( Λk Σ k + Σ n Σ k, (40 Combining (34,(40 and using the fact that is a monotone function, we obtain, Σ k+ Σ k, Λ k+ + Λ k (Σ k+ Σ k 0, which means Σ k+ Σ k,λ k+ Λ k Σ k+ Σ k 2 F 0, (4 By substituting (4 into (39, we get the desired result (25. Proof of Theorem 3.2 From Lemma 3., we can get that (a V k+ V k 2 H 0; (b {V k } lies in a compact region; (c V k+ ˆV 2 H is monotonically non-increasing and thus converges. Connecting with notations of H,V and (a, it holds that Λ k+ Λ k 0 and Σ k+ Σ k 0, which together with (8 imply that Γ k+ Γ k 0 and Σ k Γ k 0. From (b, {V k } has a subsequence {V k j } that converges to ˇV = ( ˇΛ, ˇΣ, i.e., Λ k j ˇΛ and Σ k j ˇΣ. Also we have Γ k j ˇΓ(:= ˇΣ from Σ k Γ k 0. Therefore, (ˇΓ, ˇΣ, ˇΛ is a limit point of { (Γ k,σ k,λ k }. Note that (30 and (33 respectively imply that those together with ˇΓ = ˇΣ, it reduces that ˇΛ + µ (ˇΓ ˇΣ g (ˇΓ, λ ( ˇΛ ˇΣ + Σ n µ ( ˇΣ ˇΓ ˇΣ, ˇΛ g (ˇΓ, (42 ( ˇΛ ˇΣ + Σ n ˇΣ, (43 λ (42,(43 and ˇΓ = ˇΣ mean that (ˇΓ, ˇΣ, ˇΛ is an optimal solution to (5. Therefore, we showed that any limit point of { (Γ k,σ k,λ k } is an optimal solution to (5. 20

Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation

Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Adam J. Rothman School of Statistics University of Minnesota October 8, 2014, joint work with Liliana

More information

Permutation-invariant regularization of large covariance matrices. Liza Levina

Permutation-invariant regularization of large covariance matrices. Liza Levina Liza Levina Permutation-invariant covariance regularization 1/42 Permutation-invariant regularization of large covariance matrices Liza Levina Department of Statistics University of Michigan Joint work

More information

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao

More information

Regularized Estimation of High Dimensional Covariance Matrices. Peter Bickel. January, 2008

Regularized Estimation of High Dimensional Covariance Matrices. Peter Bickel. January, 2008 Regularized Estimation of High Dimensional Covariance Matrices Peter Bickel Cambridge January, 2008 With Thanks to E. Levina (Joint collaboration, slides) I. M. Johnstone (Slides) Choongsoon Bae (Slides)

More information

High Dimensional Covariance and Precision Matrix Estimation

High Dimensional Covariance and Precision Matrix Estimation High Dimensional Covariance and Precision Matrix Estimation Wei Wang Washington University in St. Louis Thursday 23 rd February, 2017 Wei Wang (Washington University in St. Louis) High Dimensional Covariance

More information

Tuning Parameter Selection in Regularized Estimations of Large Covariance Matrices

Tuning Parameter Selection in Regularized Estimations of Large Covariance Matrices Tuning Parameter Selection in Regularized Estimations of Large Covariance Matrices arxiv:1308.3416v1 [stat.me] 15 Aug 2013 Yixin Fang 1, Binhuan Wang 1, and Yang Feng 2 1 New York University and 2 Columbia

More information

High-dimensional covariance estimation based on Gaussian graphical models

High-dimensional covariance estimation based on Gaussian graphical models High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,

More information

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement

More information

Posterior convergence rates for estimating large precision. matrices using graphical models

Posterior convergence rates for estimating large precision. matrices using graphical models Biometrika (2013), xx, x, pp. 1 27 C 2007 Biometrika Trust Printed in Great Britain Posterior convergence rates for estimating large precision matrices using graphical models BY SAYANTAN BANERJEE Department

More information

Sparse Covariance Matrix Estimation with Eigenvalue Constraints

Sparse Covariance Matrix Estimation with Eigenvalue Constraints Sparse Covariance Matrix Estimation with Eigenvalue Constraints Han Liu and Lie Wang 2 and Tuo Zhao 3 Department of Operations Research and Financial Engineering, Princeton University 2 Department of Mathematics,

More information

Sparse Permutation Invariant Covariance Estimation: Motivation, Background and Key Results

Sparse Permutation Invariant Covariance Estimation: Motivation, Background and Key Results Sparse Permutation Invariant Covariance Estimation: Motivation, Background and Key Results David Prince Biostat 572 dprince3@uw.edu April 19, 2012 David Prince (UW) SPICE April 19, 2012 1 / 11 Electronic

More information

Gaussian Graphical Models and Graphical Lasso

Gaussian Graphical Models and Graphical Lasso ELE 538B: Sparsity, Structure and Inference Gaussian Graphical Models and Graphical Lasso Yuxin Chen Princeton University, Spring 2017 Multivariate Gaussians Consider a random vector x N (0, Σ) with pdf

More information

arxiv: v1 [math.st] 31 Jan 2008

arxiv: v1 [math.st] 31 Jan 2008 Electronic Journal of Statistics ISSN: 1935-7524 Sparse Permutation Invariant arxiv:0801.4837v1 [math.st] 31 Jan 2008 Covariance Estimation Adam Rothman University of Michigan Ann Arbor, MI 48109-1107.

More information

Chapter 3 Transformations

Chapter 3 Transformations Chapter 3 Transformations An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Linear Transformations A function is called a linear transformation if 1. for every and 2. for every If we fix the bases

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon

More information

Sparse Permutation Invariant Covariance Estimation

Sparse Permutation Invariant Covariance Estimation Sparse Permutation Invariant Covariance Estimation Adam J. Rothman University of Michigan, Ann Arbor, USA. Peter J. Bickel University of California, Berkeley, USA. Elizaveta Levina University of Michigan,

More information

Sparse estimation of high-dimensional covariance matrices

Sparse estimation of high-dimensional covariance matrices Sparse estimation of high-dimensional covariance matrices by Adam J. Rothman A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Statistics) in The

More information

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania Submitted to the Annals of Statistics DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING By T. Tony Cai and Linjun Zhang University of Pennsylvania We would like to congratulate the

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized

More information

Sparse PCA with applications in finance

Sparse PCA with applications in finance Sparse PCA with applications in finance A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon 1 Introduction

More information

Tuning-parameter selection in regularized estimations of large covariance matrices

Tuning-parameter selection in regularized estimations of large covariance matrices Journal of Statistical Computation and Simulation, 2016 Vol. 86, No. 3, 494 509, http://dx.doi.org/10.1080/00949655.2015.1017823 Tuning-parameter selection in regularized estimations of large covariance

More information

Regularized Parameter Estimation in High-Dimensional Gaussian Mixture Models

Regularized Parameter Estimation in High-Dimensional Gaussian Mixture Models LETTER Communicated by Clifford Lam Regularized Parameter Estimation in High-Dimensional Gaussian Mixture Models Lingyan Ruan lruan@gatech.edu Ming Yuan ming.yuan@isye.gatech.edu School of Industrial and

More information

Variable Selection for Highly Correlated Predictors

Variable Selection for Highly Correlated Predictors Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu arxiv:1709.04840v1 [stat.me] 14 Sep 2017 Abstract Penalty-based variable selection methods are powerful in selecting relevant covariates

More information

A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem

A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem Kangkang Deng, Zheng Peng Abstract: The main task of genetic regulatory networks is to construct a

More information

Latent Variable Graphical Model Selection Via Convex Optimization

Latent Variable Graphical Model Selection Via Convex Optimization Latent Variable Graphical Model Selection Via Convex Optimization The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published

More information

Adaptive estimation of the copula correlation matrix for semiparametric elliptical copulas

Adaptive estimation of the copula correlation matrix for semiparametric elliptical copulas Adaptive estimation of the copula correlation matrix for semiparametric elliptical copulas Department of Mathematics Department of Statistical Science Cornell University London, January 7, 2016 Joint work

More information

Homework 1. Yuan Yao. September 18, 2011

Homework 1. Yuan Yao. September 18, 2011 Homework 1 Yuan Yao September 18, 2011 1. Singular Value Decomposition: The goal of this exercise is to refresh your memory about the singular value decomposition and matrix norms. A good reference to

More information

CSC 576: Variants of Sparse Learning

CSC 576: Variants of Sparse Learning CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in

More information

Exact Low-rank Matrix Recovery via Nonconvex M p -Minimization

Exact Low-rank Matrix Recovery via Nonconvex M p -Minimization Exact Low-rank Matrix Recovery via Nonconvex M p -Minimization Lingchen Kong and Naihua Xiu Department of Applied Mathematics, Beijing Jiaotong University, Beijing, 100044, People s Republic of China E-mail:

More information

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear

More information

Sparse PCA in High Dimensions

Sparse PCA in High Dimensions Sparse PCA in High Dimensions Jing Lei, Department of Statistics, Carnegie Mellon Workshop on Big Data and Differential Privacy Simons Institute, Dec, 2013 (Based on joint work with V. Q. Vu, J. Cho, and

More information

Discussion of High-dimensional autocovariance matrices and optimal linear prediction,

Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Electronic Journal of Statistics Vol. 9 (2015) 1 10 ISSN: 1935-7524 DOI: 10.1214/15-EJS1007 Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Xiaohui Chen University

More information

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013. The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment Two Caramanis/Sanghavi Due: Tuesday, Feb. 19, 2013. Computational

More information

Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor

Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor Y. Wang M. J. Daniels wang.yanpin@scrippshealth.org mjdaniels@austin.utexas.edu

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley A. d Aspremont, INFORMS, Denver,

More information

Sparse inverse covariance estimation with the lasso

Sparse inverse covariance estimation with the lasso Sparse inverse covariance estimation with the lasso Jerome Friedman Trevor Hastie and Robert Tibshirani November 8, 2007 Abstract We consider the problem of estimating sparse graphs by a lasso penalty

More information

Estimation of Graphical Models with Shape Restriction

Estimation of Graphical Models with Shape Restriction Estimation of Graphical Models with Shape Restriction BY KHAI X. CHIONG USC Dornsife INE, Department of Economics, University of Southern California, Los Angeles, California 989, U.S.A. kchiong@usc.edu

More information

Sparse Gaussian conditional random fields

Sparse Gaussian conditional random fields Sparse Gaussian conditional random fields Matt Wytock, J. ico Kolter School of Computer Science Carnegie Mellon University Pittsburgh, PA 53 {mwytock, zkolter}@cs.cmu.edu Abstract We propose sparse Gaussian

More information

Reconstruction from Anisotropic Random Measurements

Reconstruction from Anisotropic Random Measurements Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013

More information

Sparse Permutation Invariant Covariance Estimation: Final Talk

Sparse Permutation Invariant Covariance Estimation: Final Talk Sparse Permutation Invariant Covariance Estimation: Final Talk David Prince Biostat 572 dprince3@uw.edu May 31, 2012 David Prince (UW) SPICE May 31, 2012 1 / 19 Electronic Journal of Statistics Vol. 2

More information

A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices

A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices A Multiple Testing Approach to the Regularisation of Large Sample Correlation Matrices Natalia Bailey 1 M. Hashem Pesaran 2 L. Vanessa Smith 3 1 Department of Econometrics & Business Statistics, Monash

More information

Sparse Covariance Selection using Semidefinite Programming

Sparse Covariance Selection using Semidefinite Programming Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support

More information

arxiv: v1 [stat.me] 16 Feb 2018

arxiv: v1 [stat.me] 16 Feb 2018 Vol., 2017, Pages 1 26 1 arxiv:1802.06048v1 [stat.me] 16 Feb 2018 High-dimensional covariance matrix estimation using a low-rank and diagonal decomposition Yilei Wu 1, Yingli Qin 1 and Mu Zhu 1 1 The University

More information

Research Article Solving the Matrix Nearness Problem in the Maximum Norm by Applying a Projection and Contraction Method

Research Article Solving the Matrix Nearness Problem in the Maximum Norm by Applying a Projection and Contraction Method Advances in Operations Research Volume 01, Article ID 357954, 15 pages doi:10.1155/01/357954 Research Article Solving the Matrix Nearness Problem in the Maximum Norm by Applying a Projection and Contraction

More information

Lecture 5 : Projections

Lecture 5 : Projections Lecture 5 : Projections EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Up until now, we have seen convergence rates of unconstrained gradient descent. Now, we consider a constrained minimization

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.11

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.11 XI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.11 Alternating direction methods of multipliers for separable convex programming Bingsheng He Department of Mathematics

More information

Tighten after Relax: Minimax-Optimal Sparse PCA in Polynomial Time

Tighten after Relax: Minimax-Optimal Sparse PCA in Polynomial Time Tighten after Relax: Minimax-Optimal Sparse PCA in Polynomial Time Zhaoran Wang Huanran Lu Han Liu Department of Operations Research and Financial Engineering Princeton University Princeton, NJ 08540 {zhaoran,huanranl,hanliu}@princeton.edu

More information

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of

More information

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR 1. Definition Existence Theorem 1. Assume that A R m n. Then there exist orthogonal matrices U R m m V R n n, values σ 1 σ 2... σ p 0 with p = min{m, n},

More information

Analysis of Robust PCA via Local Incoherence

Analysis of Robust PCA via Local Incoherence Analysis of Robust PCA via Local Incoherence Huishuai Zhang Department of EECS Syracuse University Syracuse, NY 3244 hzhan23@syr.edu Yi Zhou Department of EECS Syracuse University Syracuse, NY 3244 yzhou35@syr.edu

More information

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28 Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:

More information

Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /

Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization / Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57

More information

Supplementary Material of A Novel Sparsity Measure for Tensor Recovery

Supplementary Material of A Novel Sparsity Measure for Tensor Recovery Supplementary Material of A Novel Sparsity Measure for Tensor Recovery Qian Zhao 1,2 Deyu Meng 1,2 Xu Kong 3 Qi Xie 1,2 Wenfei Cao 1,2 Yao Wang 1,2 Zongben Xu 1,2 1 School of Mathematics and Statistics,

More information

sparse and low-rank tensor recovery Cubic-Sketching

sparse and low-rank tensor recovery Cubic-Sketching Sparse and Low-Ran Tensor Recovery via Cubic-Setching Guang Cheng Department of Statistics Purdue University www.science.purdue.edu/bigdata CCAM@Purdue Math Oct. 27, 2017 Joint wor with Botao Hao and Anru

More information

Adaptive Corrected Procedure for TVL1 Image Deblurring under Impulsive Noise

Adaptive Corrected Procedure for TVL1 Image Deblurring under Impulsive Noise Adaptive Corrected Procedure for TVL1 Image Deblurring under Impulsive Noise Minru Bai(x T) College of Mathematics and Econometrics Hunan University Joint work with Xiongjun Zhang, Qianqian Shao June 30,

More information

arxiv: v1 [math.ra] 11 Aug 2014

arxiv: v1 [math.ra] 11 Aug 2014 Double B-tensors and quasi-double B-tensors Chaoqian Li, Yaotang Li arxiv:1408.2299v1 [math.ra] 11 Aug 2014 a School of Mathematics and Statistics, Yunnan University, Kunming, Yunnan, P. R. China 650091

More information

arxiv: v1 [math.st] 13 Feb 2012

arxiv: v1 [math.st] 13 Feb 2012 Sparse Matrix Inversion with Scaled Lasso Tingni Sun and Cun-Hui Zhang Rutgers University arxiv:1202.2723v1 [math.st] 13 Feb 2012 Address: Department of Statistics and Biostatistics, Hill Center, Busch

More information

arxiv: v2 [math.st] 2 Jul 2017

arxiv: v2 [math.st] 2 Jul 2017 A Relaxed Approach to Estimating Large Portfolios Mehmet Caner Esra Ulasan Laurent Callot A.Özlem Önder July 4, 2017 arxiv:1611.07347v2 [math.st] 2 Jul 2017 Abstract This paper considers three aspects

More information

EE 227A: Convex Optimization and Applications October 14, 2008

EE 227A: Convex Optimization and Applications October 14, 2008 EE 227A: Convex Optimization and Applications October 14, 2008 Lecture 13: SDP Duality Lecturer: Laurent El Ghaoui Reading assignment: Chapter 5 of BV. 13.1 Direct approach 13.1.1 Primal problem Consider

More information

Semidefinite Programming

Semidefinite Programming Semidefinite Programming Notes by Bernd Sturmfels for the lecture on June 26, 208, in the IMPRS Ringvorlesung Introduction to Nonlinear Algebra The transition from linear algebra to nonlinear algebra has

More information

Solving Corrupted Quadratic Equations, Provably

Solving Corrupted Quadratic Equations, Provably Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin

More information

Analysis Methods for Supersaturated Design: Some Comparisons

Analysis Methods for Supersaturated Design: Some Comparisons Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs

More information

Log Covariance Matrix Estimation

Log Covariance Matrix Estimation Log Covariance Matrix Estimation Xinwei Deng Department of Statistics University of Wisconsin-Madison Joint work with Kam-Wah Tsui (Univ. of Wisconsin-Madsion) 1 Outline Background and Motivation The Proposed

More information

arxiv: v1 [stat.co] 22 Jan 2019

arxiv: v1 [stat.co] 22 Jan 2019 A Fast Iterative Algorithm for High-dimensional Differential Network arxiv:9.75v [stat.co] Jan 9 Zhou Tang,, Zhangsheng Yu,, and Cheng Wang Department of Bioinformatics and Biostatistics, Shanghai Jiao

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

High Dimensional Inverse Covariate Matrix Estimation via Linear Programming

High Dimensional Inverse Covariate Matrix Estimation via Linear Programming High Dimensional Inverse Covariate Matrix Estimation via Linear Programming Ming Yuan October 24, 2011 Gaussian Graphical Model X = (X 1,..., X p ) indep. N(µ, Σ) Inverse covariance matrix Σ 1 = Ω = (ω

More information

Minimax Rate-Optimal Estimation of High- Dimensional Covariance Matrices with Incomplete Data

Minimax Rate-Optimal Estimation of High- Dimensional Covariance Matrices with Incomplete Data University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 9-2016 Minimax Rate-Optimal Estimation of High- Dimensional Covariance Matrices with Incomplete Data T. Tony Cai University

More information

https://goo.gl/kfxweg KYOTO UNIVERSITY Statistical Machine Learning Theory Sparsity Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT OF INTELLIGENCE SCIENCE AND TECHNOLOGY 1 KYOTO UNIVERSITY Topics:

More information

Computational Statistics and Data Analysis. SURE-tuned tapering estimation of large covariance matrices

Computational Statistics and Data Analysis. SURE-tuned tapering estimation of large covariance matrices Computational Statistics and Data Analysis 58(2013) 339 351 Contents lists available at SciVerse ScienceDirect Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda

More information

RECOVERING LOW-RANK AND SPARSE COMPONENTS OF MATRICES FROM INCOMPLETE AND NOISY OBSERVATIONS. December

RECOVERING LOW-RANK AND SPARSE COMPONENTS OF MATRICES FROM INCOMPLETE AND NOISY OBSERVATIONS. December RECOVERING LOW-RANK AND SPARSE COMPONENTS OF MATRICES FROM INCOMPLETE AND NOISY OBSERVATIONS MIN TAO AND XIAOMING YUAN December 31 2009 Abstract. Many applications arising in a variety of fields can be

More information

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data Sri Lankan Journal of Applied Statistics (Special Issue) Modern Statistical Methodologies in the Cutting Edge of Science Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations

More information

The Direct Extension of ADMM for Multi-block Convex Minimization Problems is Not Necessarily Convergent

The Direct Extension of ADMM for Multi-block Convex Minimization Problems is Not Necessarily Convergent The Direct Extension of ADMM for Multi-block Convex Minimization Problems is Not Necessarily Convergent Caihua Chen Bingsheng He Yinyu Ye Xiaoming Yuan 4 December, Abstract. The alternating direction method

More information

An algorithm for the multivariate group lasso with covariance estimation

An algorithm for the multivariate group lasso with covariance estimation An algorithm for the multivariate group lasso with covariance estimation arxiv:1512.05153v1 [stat.co] 16 Dec 2015 Ines Wilms and Christophe Croux Leuven Statistics Research Centre, KU Leuven, Belgium Abstract

More information

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms

Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms François Caron Department of Statistics, Oxford STATLEARN 2014, Paris April 7, 2014 Joint work with Adrien Todeschini,

More information

Lecture Notes 10: Matrix Factorization

Lecture Notes 10: Matrix Factorization Optimization-based data analysis Fall 207 Lecture Notes 0: Matrix Factorization Low-rank models. Rank- model Consider the problem of modeling a quantity y[i, j] that depends on two indices i and j. To

More information

Robust and sparse Gaussian graphical modelling under cell-wise contamination

Robust and sparse Gaussian graphical modelling under cell-wise contamination Robust and sparse Gaussian graphical modelling under cell-wise contamination Shota Katayama 1, Hironori Fujisawa 2 and Mathias Drton 3 1 Tokyo Institute of Technology, Japan 2 The Institute of Statistical

More information

Sparse orthogonal factor analysis

Sparse orthogonal factor analysis Sparse orthogonal factor analysis Kohei Adachi and Nickolay T. Trendafilov Abstract A sparse orthogonal factor analysis procedure is proposed for estimating the optimal solution with sparse loadings. In

More information

Estimating Structured High-Dimensional Covariance and Precision Matrices: Optimal Rates and Adaptive Estimation

Estimating Structured High-Dimensional Covariance and Precision Matrices: Optimal Rates and Adaptive Estimation Estimating Structured High-Dimensional Covariance and Precision Matrices: Optimal Rates and Adaptive Estimation T. Tony Cai 1, Zhao Ren 2 and Harrison H. Zhou 3 University of Pennsylvania, University of

More information

Recovery of Simultaneously Structured Models using Convex Optimization

Recovery of Simultaneously Structured Models using Convex Optimization Recovery of Simultaneously Structured Models using Convex Optimization Maryam Fazel University of Washington Joint work with: Amin Jalali (UW), Samet Oymak and Babak Hassibi (Caltech) Yonina Eldar (Technion)

More information

Generalized Power Method for Sparse Principal Component Analysis

Generalized Power Method for Sparse Principal Component Analysis Generalized Power Method for Sparse Principal Component Analysis Peter Richtárik CORE/INMA Catholic University of Louvain Belgium VOCAL 2008, Veszprém, Hungary CORE Discussion Paper #2008/70 joint work

More information

Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint

Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint Marc Teboulle School of Mathematical Sciences Tel Aviv University Joint work with Ronny Luss Optimization and

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2017 LECTURE 5

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2017 LECTURE 5 STAT 39: MATHEMATICAL COMPUTATIONS I FALL 17 LECTURE 5 1 existence of svd Theorem 1 (Existence of SVD) Every matrix has a singular value decomposition (condensed version) Proof Let A C m n and for simplicity

More information

Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition

Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition Gongguo Tang and Arye Nehorai Department of Electrical and Systems Engineering Washington University in St Louis

More information

Low Rank Matrix Completion Formulation and Algorithm

Low Rank Matrix Completion Formulation and Algorithm 1 2 Low Rank Matrix Completion and Algorithm Jian Zhang Department of Computer Science, ETH Zurich zhangjianthu@gmail.com March 25, 2014 Movie Rating 1 2 Critic A 5 5 Critic B 6 5 Jian 9 8 Kind Guy B 9

More information

Sparse Approximation via Penalty Decomposition Methods

Sparse Approximation via Penalty Decomposition Methods Sparse Approximation via Penalty Decomposition Methods Zhaosong Lu Yong Zhang February 19, 2012 Abstract In this paper we consider sparse approximation problems, that is, general l 0 minimization problems

More information

(k, q)-trace norm for sparse matrix factorization

(k, q)-trace norm for sparse matrix factorization (k, q)-trace norm for sparse matrix factorization Emile Richard Department of Electrical Engineering Stanford University emileric@stanford.edu Guillaume Obozinski Imagine Ecole des Ponts ParisTech guillaume.obozinski@imagine.enpc.fr

More information

Supplement to A Generalized Least Squares Matrix Decomposition. 1 GPMF & Smoothness: Ω-norm Penalty & Functional Data

Supplement to A Generalized Least Squares Matrix Decomposition. 1 GPMF & Smoothness: Ω-norm Penalty & Functional Data Supplement to A Generalized Least Squares Matrix Decomposition Genevera I. Allen 1, Logan Grosenic 2, & Jonathan Taylor 3 1 Department of Statistics and Electrical and Computer Engineering, Rice University

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

High-dimensional two-sample tests under strongly spiked eigenvalue models

High-dimensional two-sample tests under strongly spiked eigenvalue models 1 High-dimensional two-sample tests under strongly spiked eigenvalue models Makoto Aoshima and Kazuyoshi Yata University of Tsukuba Abstract: We consider a new two-sample test for high-dimensional data

More information

Two Results on the Schatten p-quasi-norm Minimization for Low-Rank Matrix Recovery

Two Results on the Schatten p-quasi-norm Minimization for Low-Rank Matrix Recovery Two Results on the Schatten p-quasi-norm Minimization for Low-Rank Matrix Recovery Ming-Jun Lai, Song Li, Louis Y. Liu and Huimin Wang August 14, 2012 Abstract We shall provide a sufficient condition to

More information

Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization

Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization Yuan Shen Zaiwen Wen Yin Zhang January 11, 2011 Abstract The matrix separation problem aims to separate

More information

Multi-stage convex relaxation approach for low-rank structured PSD matrix recovery

Multi-stage convex relaxation approach for low-rank structured PSD matrix recovery Multi-stage convex relaxation approach for low-rank structured PSD matrix recovery Department of Mathematics & Risk Management Institute National University of Singapore (Based on a joint work with Shujun

More information

Fantope Regularization in Metric Learning

Fantope Regularization in Metric Learning Fantope Regularization in Metric Learning CVPR 2014 Marc T. Law (LIP6, UPMC), Nicolas Thome (LIP6 - UPMC Sorbonne Universités), Matthieu Cord (LIP6 - UPMC Sorbonne Universités), Paris, France Introduction

More information

Homework 3. Convex Optimization /36-725

Homework 3. Convex Optimization /36-725 Homework 3 Convex Optimization 10-725/36-725 Due Friday October 14 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

Low-Rank Factorization Models for Matrix Completion and Matrix Separation

Low-Rank Factorization Models for Matrix Completion and Matrix Separation for Matrix Completion and Matrix Separation Joint work with Wotao Yin, Yin Zhang and Shen Yuan IPAM, UCLA Oct. 5, 2010 Low rank minimization problems Matrix completion: find a low-rank matrix W R m n so

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu

SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu LITIS - EA 48 - INSA/Universite de Rouen Avenue de l Université - 768 Saint-Etienne du Rouvray

More information