arxiv: v2 [math.st] 7 Aug 2014
Sparse and Low-Rank Covariance Matrices Estimation

Shenglong Zhou, Naihua Xiu, Ziyan Luo +, Lingchen Kong
Department of Applied Mathematics
+ State Key Laboratory of Rail Traffic Control and Safety
Beijing Jiaotong University, Beijing 100044, P. R. China
longnan_zsl@63.com, nhxiu@bjtu.edu.cn, zyluo@bjtu.edu.cn, konglchen@26.com

Abstract

This paper aims to obtain a simultaneously sparse and low-rank estimator of a positive semidefinite population covariance matrix. We first formulate a convex optimization problem that uses an $\ell_1$-norm penalty to encourage sparsity and a nuclear norm penalty to favor low rank. For the proposed estimator, we then prove that, with large probability, the estimation error in the Frobenius norm is of order $O\big(\sqrt{(s\log r)/n}\big)$ under a mild condition, where $s$ and $r$ denote the number of nonzero entries and the rank of the population covariance matrix respectively, and $n$ denotes the sample size. Finally, an efficient alternating direction method of multipliers with global convergence is proposed to solve the problem, and the merits of the approach are illustrated by numerical simulations.

Keywords: Covariance matrices, Sparse, Low-rank, Rate of estimation, Alternating direction method of multipliers

1 Introduction

Estimation of population covariance matrices from samples of multivariate data has drawn much attention in the last decade owing to its fundamental importance in multivariate analysis. With dramatic advances in technology in recent years, various research fields, such as genetic data, brain imaging, spectroscopic imaging and climate data, have to deal with massive high-dimensional data sets whose sample sizes can be very small relative to the dimension. In such settings, the standard sample covariance matrix often performs poorly [, 2, ]. Fortunately, regularization has recently emerged as a class of methods for estimating covariance matrices that overcomes these shortcomings of the traditional sample covariance matrix. These methods take several specific forms, for instance banding [, 6, 7], tapering [4, 10] and thresholding [2, 5, 8, 6].
Moreover, there are many cases where the model is known to be structured in several ways at the same time. In recent years, one line of research has been to estimate a covariance matrix that is both sparse and positive definite. For instance, Rothman [15] proposed the following model:

$$\hat\Sigma^{+} = \arg\min_{\Sigma \succ 0}\ \frac12\|\Sigma-\Sigma_n\|_F^2 - \tau\log\det(\Sigma) + \lambda\|\Sigma^{-}\|_1, \qquad (1)$$

where $\Sigma_n$ is the sample covariance matrix, $\|\cdot\|_F$ is the Frobenius norm, $\|\cdot\|_1$ is the element-wise $\ell_1$-norm and $\Sigma^{-} := \Sigma - \mathrm{Diag}(\Sigma)$. From the optimization viewpoint, (1) is similar to the graphical lasso criterion [9], which also has a log-determinant part and an element-wise $\ell_1$-penalty. Rothman [15] derived an iterative procedure to solve (1). Xue, Ma and Zou [18] omitted the log-determinant part and considered the positive definite constraint $\{\Sigma \succeq \varepsilon I\}$ for some arbitrarily small $\varepsilon > 0$:

$$\hat\Sigma^{\varepsilon} = \arg\min_{\Sigma \succeq \varepsilon I}\ \frac12\|\Sigma-\Sigma_n\|_F^2 + \lambda\|\Sigma^{-}\|_1. \qquad (2)$$

They used an efficient alternating direction method (ADM) to solve the challenging problem (2) and established its convergence properties.

Most of the literature, e.g., [2, 5, 8], requires the population covariance matrix to be positive definite, so there is no point in pursuing a low-rank estimator. By contrast, a recent research topic is to consider sparsity and low rank simultaneously in a structured model, which means that the population covariance matrix is no longer restricted to the positive definite cone and can be relaxed to the positive semidefinite cone. In addition, models that are simultaneously sparse and low-rank are widely applied in practice, for example in sparse signal recovery from quadratic measurements and sparse phase retrieval; see [13]. Moreover, Richard et al. [14] showed that a covariance matrix is both sparse and low-rank when the random variables are highly correlated in groups, which means that the covariance matrix has a block diagonal structure.
Motivated by these ideas, we construct the following convex model, which combines the $\ell_1$-norm and the nuclear norm, for estimating the covariance matrix:

$$\hat\Sigma = \arg\min_{\Sigma \succeq 0}\ \frac12\|\Sigma-\Sigma_n\|_F^2 + \lambda\|\Sigma\|_1 + \tau\|\Sigma\|_*, \qquad (3)$$

where $\lambda \ge 0$, $\tau \ge 0$ are tuning parameters. The $\ell_1$-norm penalty $\|\Sigma\|_1 = \sum_{i,j}|\sigma_{ij}|$ is also called a lasso-type penalty and is used to encourage sparse solutions. The nuclear norm $\|\Sigma\|_* = \sum_i \lambda_i(\Sigma)$, with $\lambda_i(\Sigma)$ being the eigenvalues of $\Sigma$, is the trace norm when $\Sigma \succeq 0$ and encourages low-rank solutions of (3). Here we introduce the approximate rank to interpret the low-rank property: $ar(A) := ar \in \{1,2,\dots,p\}$ is the smallest number such that

$$\frac{\sigma_{ar+1}(A)}{\sigma_1(A)} \le \gamma, \qquad (4)$$
where $\sigma_i(A)$ $(i = 1,2,\dots,p)$ are the singular values of $A$ with $\sigma_1(A) \ge \sigma_2(A) \ge \dots \ge \sigma_p(A)$, and $\gamma > 0$ can be chosen as needed; throughout the paper we fix $\gamma = 0.001$ for simplicity.

The contributions of this paper center on two aspects. First, unlike [13, 14], we establish statistical theory under different assumptions rather than giving a general error bound for the estimation. In particular, we obtain the estimation rate $O\big(\sqrt{(s\log r)/n}\big)$ in the Frobenius norm, which improves the optimal rate $O\big(\sqrt{(s\log p)/n}\big)$ obtained when the low-rank property of the estimator is not considered [7, 15, 18]; here $p$ is the dimension of the samples with $p > \max\{n, r\}$. Second, we use the alternating direction method of multipliers (ADMM), which can also be found in [18, 19], to solve problem (3).

The paper is organized as follows. In Section 2 we present some theoretical properties of the estimator derived from the proposed model (3). The alternating direction method of multipliers (ADMM) for solving the problem is introduced in Section 3, and numerical experiments showing the performance of this method are reported in Section 4. We conclude in the last section.

2 A Sparse and Low-Rank Covariance Estimator

Before the main part, we introduce some notation. $E(X)$ and $P(A)$ denote the expectation of $X$ and the probability of the event $A$ respectively. $\mathrm{Card}(S)$ is the number of elements of the set $S$. The normal distribution with mean $\mu$ and covariance $\Sigma$ is written as $N(\mu,\Sigma)$. We say $Y_n = O_P(1)$ if for every $\varepsilon > 0$ there is a $C > 0$ such that $P\{|Y_n| > C\} < \varepsilon$ for all $n \ge n_0(\varepsilon)$, and $Y_n = O_P(a_n)$ if $Y_n/a_n = O_P(1)$. If there are two constants $C_1 \le C_2$ such that $C_1 \le X_n/Y_n \le C_2$, we write $X_n \asymp Y_n$.

Given observed independent and identically distributed (i.i.d.
for short) $p$-variate random variables $X_1,\dots,X_n$ with covariance matrix $\Sigma_0$ and $p > n$, the goal is to estimate the unknown matrix $\Sigma_0$ based on the sample $\{X_l : l = 1,\dots,n\}$. This problem is called covariance matrix estimation and is of fundamental importance in multivariate analysis. Given a random sample $\{X_1,\dots,X_n\}$ with $E(X) = 0$ (without loss of generality) and a population covariance matrix $\Sigma_0 = (\sigma_{0ij})_{1\le i,j\le p} = E(XX^\top)$, the sample covariance matrix is

$$\Sigma_n = (\sigma_{nij})_{1\le i,j\le p} = \frac1n\sum_{l=1}^{n}(X_l-\bar X)(X_l-\bar X)^\top,$$

where $\bar X = \frac1n\sum_{l=1}^{n}X_l$. Denote by $S$ the support set of the population covariance matrix $\Sigma_0$, and set

$$S = \{(i,j) : \sigma_{0ij} \ne 0\}, \qquad s = \mathrm{Card}(S), \qquad r = \mathrm{rank}(\Sigma_0).$$

Assumption 2.1 For all $p$, $0 \le \lambda_{\min}(\Sigma_0) \le \lambda_{\max}(\Sigma_0) \le \bar\lambda < \infty$, where $\bar\lambda$ is a constant.
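As an illustration of the quantities just defined, the following NumPy sketch computes $\Sigma_n$ with the $1/n$ normalization and reads off $s$ and $r$ from a given $\Sigma_0$. The function names and the `tol` argument are hypothetical helpers introduced here for illustration, not part of the paper.

```python
import numpy as np

def sample_covariance(X):
    """Sample covariance Sigma_n with the 1/n normalization used in the text.

    X is an (n, p) array whose rows are the observations X_1, ..., X_n.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centered = X - X.mean(axis=0)          # subtract the sample mean X-bar
    return centered.T @ centered / n

def support_size_and_rank(Sigma0, tol=0.0):
    """Return s = Card(S) and r = rank(Sigma0).

    `tol` is an illustrative threshold (assumption): entries with
    |sigma_0ij| <= tol are treated as zero.
    """
    Sigma0 = np.asarray(Sigma0, dtype=float)
    s = int(np.count_nonzero(np.abs(Sigma0) > tol))
    r = int(np.linalg.matrix_rank(Sigma0))
    return s, r
```

With this normalization, `sample_covariance(X)` should agree with `np.cov(X, rowvar=False, bias=True)`.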
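Assumption 2.1 is a commonly used condition in covariance matrix estimation. A useful lemma based on it is recalled here for the subsequent analysis; one can also find it in [].

Lemma 2.2 Let $X_l \in \mathbb{R}^p$, $l = 1,\dots,n$, be i.i.d. $N(0,\Sigma_0)$ and let Assumption 2.1 hold (i.e., for all $p$, $\lambda_{\max}(\Sigma_0) \le \bar\lambda < \infty$). Then, if $\Sigma_0 = (\sigma_{0ij})_{1\le i,j\le p}$,

$$P\Big\{\Big|\sum_{l=1}^{n}\big(X_{li}X_{lj}-\sigma_{0ij}\big)\Big| \ge n\nu\Big\} \le C_1\exp\{-C_2 n\nu^2\}, \quad \text{for } |\nu| \le \delta, \qquad (5)$$

where the constants $C_1$, $C_2$ and $\delta$ depend on $\bar\lambda$ only.

Another lemma, which plays an important role in our main results, is stated below.

Lemma 2.3 Suppose that Assumption 2.1 holds, $\lambda \le \dfrac{\varepsilon}{16\sqrt{s}}$ and $\tau \le \dfrac{\varepsilon}{8\sqrt{r}}$. Then for $\varepsilon > 0$ sufficiently small,

$$\max_{1\le i,j\le p}|\sigma_{0ij}-\sigma_{nij}| \le \lambda \quad \text{implies} \quad \|\hat\Sigma-\Sigma_0\|_F \le \varepsilon, \qquad (6)$$

where $\hat\Sigma$ is defined by (3).

Proof First write the eigenvalue decomposition of $\Sigma_0$ (with $r = \mathrm{rank}(\Sigma_0)$) as

$$\Sigma_0 = U\Lambda_{\Sigma_0}U^\top = U\begin{pmatrix}\mathrm{Diag}(\lambda(\Sigma_0)) & 0\\ 0 & 0\end{pmatrix}U^\top,$$

where $U \in \mathbb{R}^{p\times p}$ with $UU^\top = U^\top U = I_p$ is the matrix composed of eigenvectors, and $\mathrm{Diag}(\lambda(\Sigma_0)) \in \mathbb{R}^{r\times r}$ is the diagonal matrix generated by the eigenvalues $\lambda(\Sigma_0) = (\lambda_1(\Sigma_0),\dots,\lambda_r(\Sigma_0))$ with $\lambda_j(\Sigma_0) > 0$, $j = 1,\dots,r$. Denoting $\Sigma := U\Delta U^\top + \Sigma_0$ with $\Delta = \begin{pmatrix}\Delta_1 & 0\\ 0 & 0\end{pmatrix}$ and $\Delta_1 \in \mathbb{R}^{r\times r}$, which implies $\Delta = U^\top\Sigma U - \Lambda_{\Sigma_0}$, we consider the model

$$\hat\Delta := \arg\min_{\Delta+\Lambda_{\Sigma_0}\succeq 0} F(\Delta) = \arg\min_{\Delta_1+\mathrm{Diag}(\lambda(\Sigma_0))\succeq 0} F(\Delta), \qquad (7)$$

$$F(\Delta) := \frac12\|U\Delta U^\top+\Sigma_0-\Sigma_n\|_F^2 + \lambda\|U\Delta U^\top+\Sigma_0\|_1 + \tau\|U\Delta U^\top+\Sigma_0\|_*.$$

Clearly, from (3) we have $\hat\Sigma = U\hat\Delta U^\top + \Sigma_0$, which implies $\hat\Delta = U^\top\hat\Sigma U - \Lambda_{\Sigma_0}$. For a given $\varepsilon > 0$ sufficiently small and any $\|\Delta\|_F = \varepsilon$ (i.e., $\|U\Delta U^\top\|_F = \varepsilon$), we compute

$$\begin{aligned}
F(\Delta)-F(0) &= \frac12\|U\Delta U^\top+\Sigma_0-\Sigma_n\|_F^2 + \lambda\|U\Delta U^\top+\Sigma_0\|_1 + \tau\|U\Delta U^\top+\Sigma_0\|_* \\
&\quad - \frac12\|\Sigma_0-\Sigma_n\|_F^2 - \lambda\|\Sigma_0\|_1 - \tau\|\Sigma_0\|_* \\
&= \frac12\|\Delta\|_F^2 + \langle U\Delta U^\top, \Sigma_0-\Sigma_n\rangle + \lambda\big(\|U\Delta U^\top+\Sigma_0\|_1-\|\Sigma_0\|_1\big) + \tau\big(\|U\Delta U^\top+\Sigma_0\|_*-\|\Sigma_0\|_*\big) \\
&=: \frac12\|\Delta\|_F^2 + \mathrm{I} + \mathrm{II} + \mathrm{III}.
\end{aligned}$$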
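For convenience we denote $\tilde\Delta = U\Delta U^\top$. Then for I, it holds that

$$\mathrm{I} = \langle\tilde\Delta, \Sigma_0-\Sigma_n\rangle \ge -\big|\mathrm{tr}\big(\tilde\Delta(\Sigma_n-\Sigma_0)\big)\big| = -\Big|\sum_{i,j}\tilde\Delta_{ij}(\sigma_{0ij}-\sigma_{nij})\Big| \ge -\max_{1\le i,j\le p}|\sigma_{0ij}-\sigma_{nij}|\,\|\tilde\Delta\|_1.$$

For II, noting that $S = \{(i,j) : \sigma_{0ij} \ne 0\}$ and $s = \mathrm{Card}(S)$, we obtain

$$\begin{aligned}
\mathrm{II} &= \lambda\big(\|\Sigma_0+\tilde\Delta\|_1 - \|\Sigma_0\|_1\big) = \lambda\big(\|\Sigma_{0S}+\tilde\Delta_S\|_1 + \|\tilde\Delta_{S^C}\|_1 - \|\Sigma_{0S}\|_1\big) \\
&\ge \lambda\big(\|\Sigma_{0S}+\tilde\Delta_S\|_1 + \|\tilde\Delta_{S^C}\|_1 - (\|\Sigma_{0S}+\tilde\Delta_S\|_1 + \|\tilde\Delta_S\|_1)\big) = \lambda\big(\|\tilde\Delta_{S^C}\|_1 - \|\tilde\Delta_S\|_1\big).
\end{aligned}$$

From the Hölder inequality, one can prove that

$$\|\tilde\Delta\|_* = \|U\Delta U^\top\|_* = \|\Delta\|_* = \sum_{i=1}^{r}|\lambda_i(\Delta)| \le \sqrt{r}\,\|\Delta\|_F = \sqrt{r}\,\|\tilde\Delta\|_F, \qquad (8)$$

$$\|\tilde\Delta_S\|_1 \le \sqrt{s}\,\|\tilde\Delta_S\|_F. \qquad (9)$$

For III, combining with (8) we get

$$\mathrm{III} = \tau\big(\|\Sigma_0+\tilde\Delta\|_* - \|\Sigma_0\|_*\big) \ge -\tau\|\tilde\Delta\|_* = -\tau\|\Delta\|_* \ge -\tau\sqrt{r}\,\|\Delta\|_F.$$

Since $\max_{i,j}|\sigma_{0ij}-\sigma_{nij}| \le \lambda$, and using (9),

$$\begin{aligned}
G(\Delta) := F(\Delta)-F(0) &= \frac12\|\Delta\|_F^2 + \mathrm{I} + \mathrm{II} + \mathrm{III} \\
&\ge \frac12\|\Delta\|_F^2 - \lambda\|\tilde\Delta\|_1 + \lambda\big(\|\tilde\Delta_{S^C}\|_1-\|\tilde\Delta_S\|_1\big) - \tau\sqrt{r}\,\|\Delta\|_F \\
&= \frac12\|\Delta\|_F^2 - \lambda\big(\|\tilde\Delta_S\|_1+\|\tilde\Delta_{S^C}\|_1\big) + \lambda\big(\|\tilde\Delta_{S^C}\|_1-\|\tilde\Delta_S\|_1\big) - \tau\sqrt{r}\,\|\Delta\|_F \\
&= \frac12\|\Delta\|_F^2 - 2\lambda\|\tilde\Delta_S\|_1 - \tau\sqrt{r}\,\|\Delta\|_F \\
&\ge \frac12\|\Delta\|_F^2 - 2\lambda\sqrt{s}\,\|\tilde\Delta_S\|_F - \tau\sqrt{r}\,\|\Delta\|_F \\
&\ge \frac12\|\Delta\|_F^2 - 2\lambda\sqrt{s}\,\|\tilde\Delta\|_F - \tau\sqrt{r}\,\|\Delta\|_F \\
&= \frac12\|\Delta\|_F^2 - 2\lambda\sqrt{s}\,\|\Delta\|_F - \tau\sqrt{r}\,\|\Delta\|_F.
\end{aligned}$$

Therefore, by $\lambda \le \dfrac{\varepsilon}{16\sqrt{s}}$ and $\tau \le \dfrac{\varepsilon}{8\sqrt{r}}$,

$$G(\Delta) \ge \frac12\|\Delta\|_F^2 - 2\lambda\sqrt{s}\,\|\Delta\|_F - \tau\sqrt{r}\,\|\Delta\|_F \ge \frac{\varepsilon^2}{2} - \frac{\varepsilon^2}{8} - \frac{\varepsilon^2}{8} = \frac{\varepsilon^2}{4} > 0.$$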
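Hence we have proved that if $\lambda \le \dfrac{\varepsilon}{16\sqrt{s}}$, $\tau \le \dfrac{\varepsilon}{8\sqrt{r}}$ and $\max_{i,j}|\sigma_{0ij}-\sigma_{nij}| \le \lambda$, then for any $\|\Delta\|_F = \varepsilon$ it holds that $G(\Delta) = F(\Delta)-F(0) > 0$. In addition, from (7), we have

$$\hat\Delta = \arg\min_{\Delta_1+\mathrm{Diag}(\lambda(\Sigma_0))\succeq 0} F(\Delta)-F(0) = \arg\min_{\Delta_1+\mathrm{Diag}(\lambda(\Sigma_0))\succeq 0} G(\Delta), \qquad (10)$$

which implies that $\|\hat\Delta\|_F \le \varepsilon$. Otherwise, suppose $\|\hat\Delta\|_F > \varepsilon$; then $\big\|\varepsilon\hat\Delta/\|\hat\Delta\|_F\big\|_F = \varepsilon$. Since $G(\Delta) > 0 = G(0)$ for any $\|\Delta\|_F = \varepsilon$, this contradicts the fact that $G$ is a convex function with $G(\hat\Delta) \le G(0) = 0$, because

$$0 < G\Big(\frac{\varepsilon}{\|\hat\Delta\|_F}\hat\Delta\Big) = G\Big(\frac{\varepsilon}{\|\hat\Delta\|_F}\hat\Delta + \Big(1-\frac{\varepsilon}{\|\hat\Delta\|_F}\Big)0\Big) \le \frac{\varepsilon}{\|\hat\Delta\|_F}G(\hat\Delta) + \Big(1-\frac{\varepsilon}{\|\hat\Delta\|_F}\Big)G(0) \le 0.$$

Finally, $\|\hat\Delta\|_F \le \varepsilon$ indicates that $\|\hat\Sigma-\Sigma_0\|_F \le \varepsilon$, since $\|\hat\Delta\|_F = \|U\hat\Delta U^\top\|_F = \|\hat\Sigma-\Sigma_0\|_F$. Hence the desired result is obtained.

To obtain the rate of estimation, the following two commonly used assumptions are needed; see also [, 5]. Assumption 2.4 holds, for example, if $X_{l1},\dots,X_{lp}$ are Gaussian.

Assumption 2.4 $E\{\exp(tX_{lj}^2)\} \le C_1 < \infty$ holds for all $j = 1,\dots,p$ and $0 < t < t_0$, where $t_0$ and $C_1$ are two constants.

Assumption 2.5 $E\{|X_{lj}|^{2\alpha}\} \le C_2 < \infty$ holds for all $j = 1,\dots,p$, where $\alpha \ge 2$ and $C_2$ is a constant.

Building on these two assumptions, we give our main results on the rates of the estimator (3).

Theorem 2.6 For some $\delta \in [1,2)$, let $K$ be a sufficiently large constant and suppose Assumptions 2.1 and 2.4 hold. If

$$\lambda = K\sqrt{\frac{\log r}{n}+\frac{\log p}{nK^{\delta}}}, \qquad \tau = O_P\Big(\sqrt{\frac{s\log r}{nr}+\frac{s\log p}{nrK^{\delta}}}\Big)$$

and $s\log r + (s\log p)/K^{\delta} = o(n)$, then

$$\|\hat\Sigma-\Sigma_0\|_F = O_P\Big(\sqrt{\frac{s\log r}{n}+\frac{s\log p}{nK^{\delta}}}\Big). \qquad (11)$$

Proof Since Assumptions 2.1 and 2.4 hold, a fact employed by Rothman et al. [16] is that for $\nu > 0$ sufficiently small,

$$P\Big\{\max_{1\le i,j\le p}|\sigma_{0ij}-\sigma_{nij}| \ge \nu\Big\} \le C_3\,p^2\exp\{-C_4 n\nu^2\},$$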
where $C_3$ and $C_4$ are some constants. We then apply the bound $s\log r + (s\log p)/K^{\delta} = o(n)$ and Lemma 2.3 with

$$\varepsilon = 16K\sqrt{\frac{s\log r}{n}+\frac{s\log p}{nK^{\delta}}}, \qquad \lambda = K\sqrt{\frac{\log r}{n}+\frac{\log p}{nK^{\delta}}}$$

to obtain that

$$P\big\{\|\hat\Sigma-\Sigma_0\|_F \le \varepsilon\big\} \ge P\Big\{\max_{1\le i,j\le p}|\sigma_{0ij}-\sigma_{nij}| \le \lambda\Big\} \ge 1 - C_3\,p^{\,2-C_4K^{2-\delta}}\,r^{-C_4K^2}.$$

Evidently, $1 - C_3\,p^{\,2-C_4K^{2-\delta}}\,r^{-C_4K^2}$ can be made arbitrarily close to one by choosing $K$ sufficiently large.

Corollary 2.7 For some $\delta \in [1,2)$, let $K$ be a sufficiently large constant such that $K^{\delta}\log r \ge \log p$, and suppose Assumptions 2.1 and 2.4 hold. If $\lambda = K\sqrt{\dfrac{\log r}{n}}$, $\tau = O_P\Big(\sqrt{\dfrac{s\log r}{nr}}\Big)$ and $s\log r = o(n)$, then

$$\|\hat\Sigma-\Sigma_0\|_F = O_P\Big(\sqrt{\frac{s\log r}{n}}\Big). \qquad (12)$$

Remark 2.8 Clearly, if $\lambda_{\min}(\Sigma_0) > 0$ in Assumption 2.1, the better rate $O_P\big(\sqrt{(s\log r)/n}\big)$ would reduce to $O_P\big(\sqrt{(s\log p)/n}\big)$. It is worth mentioning that under Assumption 2.1, the minimax optimal rate of convergence under the Frobenius norm in Theorem 4 of [7] is $O_P\big(\sqrt{(s\log p)/n}\big)$, which has also been obtained by [18]. However, to attain the same rate in the presence of the log-determinant barrier term in (1), Rothman [15] would instead require that $\lambda_{\min}$, the minimal eigenvalue of the true covariance matrix, be bounded away from zero by some positive constant, and also that the barrier parameter be bounded by some positive quantity. As [18] pointed out, a theory requiring a lower bound on $\lambda_{\min}$ is not very appealing.

Theorem 2.9 Let $K_2$ be a sufficiently large constant and suppose that Assumptions 2.1 and 2.5 hold. If $\lambda = K_2\sqrt{\dfrac{p^{4/\alpha}}{n}}$, $\tau = O_P\Big(\sqrt{\dfrac{s\,p^{4/\alpha}}{nr}}\Big)$ and $s\,p^{4/\alpha} = o(n)$, then

$$\|\hat\Sigma-\Sigma_0\|_F = O_P\Big(\sqrt{\frac{s\,p^{4/\alpha}}{n}}\Big). \qquad (13)$$

Proof Since Assumptions 2.1 and 2.5 hold, one can modify a result of Bickel & Levina (2008a) and show that for $\nu > 0$ sufficiently small

$$P\Big\{\max_{1\le i,j\le p}|\sigma_{0ij}-\sigma_{nij}| \ge \nu\Big\} \le p^2\,C_5\,n^{-\alpha/2}\,\nu^{-\alpha},$$
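where $C_5$ is a constant. We then apply the bound $s\,p^{4/\alpha} = o(n)$ and Lemma 2.3 with $\varepsilon = 16K_2\sqrt{\dfrac{s\,p^{4/\alpha}}{n}}$, $\lambda = K_2\sqrt{\dfrac{p^{4/\alpha}}{n}}$ to get

$$P\big\{\|\hat\Sigma-\Sigma_0\|_F \le \varepsilon\big\} \ge P\Big\{\max_{1\le i,j\le p}|\sigma_{0ij}-\sigma_{nij}| \le \lambda\Big\} \ge 1 - C_6K_2^{-\alpha}.$$

Clearly, the bound $1 - C_6K_2^{-\alpha}$ can be made arbitrarily close to one by taking $K_2$ sufficiently large.

3 Alternating Direction Method of Multipliers

In this section, we construct the alternating direction method of multipliers (ADMM) to solve problem (3). By introducing an auxiliary variable $\Gamma$, problem (3) can be rewritten as

$$\min\ \frac12\|\Sigma-\Sigma_n\|_F^2 + \lambda\|\Sigma\|_1 + \tau\|\Gamma\|_*, \quad \text{s.t. } \Gamma \succeq 0,\ \Sigma-\Gamma = 0. \qquad (14)$$

The constraint $\Gamma \succeq 0$ can be put into the objective function by using an indicator function:

$$I(\Gamma \succeq 0) = \begin{cases} 0, & \Gamma \succeq 0, \\ +\infty, & \text{otherwise.} \end{cases}$$

This leads to the following equivalent reformulation of (14):

$$\min\ \frac12\|\Sigma-\Sigma_n\|_F^2 + \lambda\|\Sigma\|_1 + \tau\|\Gamma\|_* + I(\Gamma \succeq 0), \quad \text{s.t. } \Sigma-\Gamma = 0. \qquad (15)$$

Recently, ADMM has been studied extensively for solving problems of the form (15). A typical iteration of ADMM for solving (15) can be described as

$$\Gamma^{k+1} := \arg\min_{\Gamma} L_\mu(\Sigma^k,\Gamma,\Lambda^k), \qquad (16)$$
$$\Sigma^{k+1} := \arg\min_{\Sigma} L_\mu(\Sigma,\Gamma^{k+1},\Lambda^k), \qquad (17)$$
$$\Lambda^{k+1} := \Lambda^k - \frac1\mu\big(\Gamma^{k+1}-\Sigma^{k+1}\big), \qquad (18)$$

where the augmented Lagrangian function $L_\mu(\Sigma,\Gamma,\Lambda)$ is defined as

$$L_\mu(\Sigma,\Gamma,\Lambda) := \frac12\|\Sigma-\Sigma_n\|_F^2 + \lambda\|\Sigma\|_1 + \tau\|\Gamma\|_* + I(\Gamma \succeq 0) - \langle\Lambda,\Gamma-\Sigma\rangle + \frac{1}{2\mu}\|\Sigma-\Gamma\|_F^2, \qquad (19)$$

in which $\Lambda$ is the Lagrange multiplier and $\mu > 0$ is a penalty parameter. Note that ADMM (16)-(18) can be written explicitly as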
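$$\Gamma^{k+1} := \arg\min_{\Gamma \succeq 0}\ \tau\|\Gamma\|_* + \frac{1}{2\mu}\big\|\Gamma-(\Sigma^k+\mu\Lambda^k)\big\|_F^2, \qquad (20)$$
$$\Sigma^{k+1} := \arg\min_{\Sigma}\ \lambda\|\Sigma\|_1 + \frac{\mu+1}{2\mu}\Big\|\Sigma-\frac{\mu}{\mu+1}\Big(\Sigma_n+\frac1\mu\Gamma^{k+1}-\Lambda^k\Big)\Big\|_F^2, \qquad (21)$$
$$\Lambda^{k+1} := \Lambda^k - (\Gamma^{k+1}-\Sigma^{k+1})/\mu. \qquad (22)$$

We now show that the two subproblems (20) and (21) can be easily solved. For the subproblem (20),

$$\begin{aligned}
\Gamma^{k+1} &= \arg\min_{\Gamma \succeq 0}\ \tau\|\Gamma\|_* + \frac{1}{2\mu}\big\|\Gamma-(\Sigma^k+\mu\Lambda^k)\big\|_F^2 \\
&= \arg\min_{\Gamma \succeq 0}\ \tau\langle\Gamma, I\rangle + \frac{1}{2\mu}\big\|\Gamma-(\Sigma^k+\mu\Lambda^k)\big\|_F^2 \\
&= \arg\min_{\Gamma \succeq 0}\ \big\|\Gamma-(\Sigma^k+\mu\Lambda^k-\mu\tau I)\big\|_F^2 \\
&= (\Sigma^k+\mu\Lambda^k-\mu\tau I)_+, \qquad (23)
\end{aligned}$$

where $(X)_+$ denotes the projection of a matrix $X$ onto the convex positive semidefinite cone $S^n_+$. Namely, $(X)_+ = U\,\mathrm{Diag}(\max\{\lambda(X),0\})\,U^\top$, where $X = U\,\mathrm{Diag}(\lambda(X))\,U^\top$ and $\lambda(X) = (\lambda_1(X),\lambda_2(X),\dots,\lambda_p(X))$.

The solution of the second subproblem (21) is given by the $\ell_1$-shrinkage operation

$$\Sigma^{k+1} = \frac{\mu}{\mu+1}\,\mathrm{Shrink}\Big(\Sigma_n+\frac1\mu\Gamma^{k+1}-\Lambda^k,\ \lambda\Big) = \frac{\mu}{\mu+1}\Big(\max\{|P_{ij}|-\lambda,0\}\,\mathrm{sign}(P_{ij})\Big)_{1\le i,j\le p}, \qquad (24)$$

where $P = (P_{ij})_{1\le i,j\le p} := \Sigma_n+\frac1\mu\Gamma^{k+1}-\Lambda^k$ and $\mathrm{sign}(\cdot)$ is the sign function. Therefore, combining (22)-(24), the whole algorithm is written as follows.

Table 1: The framework of the ADMM.
ADMM: Alternating Direction Method of Multipliers
Initialize $\mu$, $\lambda$, $\tau$, $\Sigma^0$, $\Lambda^0$;
Repeat
  Compute $\Gamma^{k+1} = (\Sigma^k+\mu\Lambda^k-\mu\tau I)_+$;
  Compute $\Sigma^{k+1} = \frac{\mu}{\mu+1}\,\mathrm{Shrink}(\Sigma_n+\frac1\mu\Gamma^{k+1}-\Lambda^k,\ \lambda)$;
  Compute $\Lambda^{k+1} := \Lambda^k - \frac1\mu(\Gamma^{k+1}-\Sigma^{k+1})$.
Until convergence
Return

To end this section, we prove that the sequence $(\Gamma^k,\Sigma^k,\Lambda^k)$ produced by the alternating direction method of multipliers (Table 1) converges to $(\hat\Gamma,\hat\Sigma,\hat\Lambda)$, where $(\hat\Gamma,\hat\Sigma)$ is an optimal solution of (15) and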
$\hat\Lambda$ is the optimal dual variable. We first introduce some notation for ease of presentation. Let $H$ be the $2p\times 2p$ matrix defined as

$$H = \begin{pmatrix}\mu I_{p\times p} & 0\\ 0 & \frac1\mu I_{p\times p}\end{pmatrix};$$

the weighted norm $\|\cdot\|_H^2$ stands for $\|V\|_H^2 := \langle V, HV\rangle$ and the corresponding inner product $\langle\cdot,\cdot\rangle_H$ is $\langle U,V\rangle_H := \langle U, HV\rangle$. Before presenting the main theorem on the global convergence of ADMM, we introduce the following lemma.

Lemma 3.1 Assume that $(\hat\Gamma,\hat\Sigma)$ is an optimal solution of (15) and $\hat\Lambda$ is the corresponding optimal dual variable associated with the equality constraint $\Sigma-\Gamma = 0$. Then the sequence $\{(\Gamma^k,\Sigma^k,\Lambda^k)\}$ produced by ADMM satisfies

$$\|\hat V-V^k\|_H^2 - \|\hat V-V^{k+1}\|_H^2 \ge \|V^{k+1}-V^k\|_H^2, \qquad (25)$$

where $V^k = (\Lambda^k,\Sigma^k)$ and $\hat V = (\hat\Lambda,\hat\Sigma)$.

Based on the lemma above, the convergence theorem can be derived immediately.

Theorem 3.2 The sequence $\{(\Gamma^k,\Sigma^k,\Lambda^k)\}$ generated by Algorithm 1 from any starting point converges to an optimal solution of (15).

4 Numerical Simulations

In this section we apply the proposed ADMM to two examples, one with a block-structured population covariance matrix and the other with a banded population covariance matrix. Because of the constraint $\Sigma \succeq 0$, our proposed model (3) is equivalent to

$$\hat\Sigma = \arg\min_{\Sigma \succeq 0}\ \frac12\|\Sigma-(\Sigma_n-\tau I)\|_F^2 + \lambda\|\Sigma\|_1. \qquad (26)$$

So, similarly to the method in [18], one can solve for the soft-thresholding estimator

$$\Sigma_{st} := \arg\min_{\Sigma}\ \frac12\|\Sigma-(\Sigma_n-\tau I)\|_F^2 + \lambda\|\Sigma\|_1$$

to initialize $\Sigma^0$. If the derived $\Sigma_{st} \succeq 0$, then the recovered sparse and low-rank semidefinite estimator is $\hat\Sigma = \Sigma_{st}$. In our simulations, we uniformly initialize $\Lambda^0$ as the matrix with all entries equal to 1, and $\Sigma^0$ as the zero matrix or $\Sigma_{st}$ respectively. Unlike $\lambda$ and $\tau$, $\mu$ does not change the final covariance estimator, so we fix $\mu = 1$ for simplicity, and the stopping criterion is based on $\max\{\|\Gamma^{k+1}-\Gamma^k\|_F,\ \|\Sigma^{k+1}-\Sigma^k\|_F\}$. For the sample dimensions, we always take $n = 50$ and $p = 100, 200, 500, 1000$.
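The iteration in Table 1, together with the initialization and stopping rule described above, can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names, the tolerance `tol` and the iteration cap `max_iter` are assumptions made here (the tolerance used in the paper is not recoverable from the text).

```python
import numpy as np

def psd_projection(X):
    """(X)_+ : projection onto the positive semidefinite cone via eigendecomposition."""
    X = (X + X.T) / 2.0                      # symmetrize against round-off
    w, U = np.linalg.eigh(X)
    return (U * np.maximum(w, 0.0)) @ U.T

def shrink(P, lam):
    """Element-wise soft-thresholding: max(|P_ij| - lam, 0) * sign(P_ij)."""
    return np.sign(P) * np.maximum(np.abs(P) - lam, 0.0)

def soft_threshold_estimator(Sigma_n, lam, tau):
    """Sigma_st: minimizer of (1/2)||S - (Sigma_n - tau I)||_F^2 + lam ||S||_1."""
    p = Sigma_n.shape[0]
    return shrink(Sigma_n - tau * np.eye(p), lam)

def admm_sparse_lowrank(Sigma_n, lam, tau, mu=1.0, Sigma_init=None,
                        tol=1e-8, max_iter=5000):
    """ADMM iteration of Table 1; tol and max_iter are illustrative assumptions."""
    p = Sigma_n.shape[0]
    Sigma = np.zeros((p, p)) if Sigma_init is None else np.array(Sigma_init, dtype=float)
    Gamma = Sigma.copy()
    Lam = np.ones((p, p))                    # Lambda^0: all entries equal to 1
    eye = np.eye(p)
    for _ in range(max_iter):
        Gamma_new = psd_projection(Sigma + mu * Lam - mu * tau * eye)                 # (23)
        Sigma_new = mu / (mu + 1.0) * shrink(Sigma_n + Gamma_new / mu - Lam, lam)     # (24)
        Lam = Lam - (Gamma_new - Sigma_new) / mu                                      # (22)
        change = max(np.linalg.norm(Gamma_new - Gamma),
                     np.linalg.norm(Sigma_new - Sigma))
        Gamma, Sigma = Gamma_new, Sigma_new
        if change <= tol:                    # stop when successive iterates barely move
            break
    return Sigma, Gamma, Lam
```

For a diagonal $\Sigma_n$ the minimizer of (3) is available in closed form (each diagonal entry is reduced by $\lambda+\tau$ and truncated at zero), which gives a simple sanity check for such a sketch.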
4.1 Example I: Block Structure

Following the model in [14], slightly modified here, which synthesized $n$ samples $X_l \sim N(0,\Sigma_0)$ for a block diagonal population covariance matrix $\Sigma_0 \in \mathbb{R}^{p\times p}$, we use $K$ $(= 5, 10, 20)$ blocks of random sizes, and each block is generated as $vv^\top$ where the entries of $v$ are drawn i.i.d. from the uniform distribution on $[-1,1]$. Evidently, the rank of $\Sigma_0$ produced in this way is $K$. The corresponding MATLAB code for generating $X_l$ is

$$X_l = \Sigma_0^{1/2}\,\mathrm{randn}(p,1), \quad l = 1,2,\dots,n,$$

from which the sample covariance matrix is derived as

$$\Sigma_n = \frac1n\sum_{l=1}^{n}(X_l-\bar X)(X_l-\bar X)^\top.$$

It is worth mentioning that if $\Sigma_0$ is a positive definite matrix, the solution obtained by our method would be a positive definite matrix with full rank. Fortunately, compared with the largest singular values of $\Sigma_0$, the remaining ones are relatively small and can be ignored. Here, therefore, we consider the approximate rank (4). In addition, we define the sparsity of a matrix $A = (a_{ij}) \in \mathbb{R}^{n\times p}$ by

$$sp(A) = \frac{\mathrm{Card}\{(i,j) : a_{ij} \ne 0\}}{np}.$$

Figure 1: Covariance estimation with $K = 5$, $\lambda = 0.5$, $\tau = 0.2$ and $p = 200$. In the row above, FPR = 0.092, TPR = 1, whilst FPR = 0.08, TPR = 1 in the row below.
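The data generation of Example I and the two summary quantities $ar(\cdot)$ and $sp(\cdot)$ can be sketched as below. The paper does not say how the random block sizes are drawn, so the uniform choice of cut points here is an assumption, as are all function names; returning 0 for the zero matrix in `approximate_rank` is likewise a convention chosen only for this sketch.

```python
import numpy as np

def block_covariance(p, K, rng):
    """Block diagonal Sigma_0 with K rank-one blocks v v^T, v ~ Uniform[-1, 1].

    Block sizes come from K - 1 distinct random cut points (an assumption).
    """
    cuts = np.sort(rng.choice(np.arange(1, p), size=K - 1, replace=False))
    edges = np.concatenate(([0], cuts, [p]))
    Sigma0 = np.zeros((p, p))
    for a, b in zip(edges[:-1], edges[1:]):
        v = rng.uniform(-1.0, 1.0, size=b - a)
        Sigma0[a:b, a:b] = np.outer(v, v)
    return Sigma0

def sample_from(Sigma0, n, rng):
    """Draw n rows X_l = Sigma0^{1/2} z_l with z_l ~ N(0, I)."""
    w, U = np.linalg.eigh(Sigma0)
    root = (U * np.sqrt(np.maximum(w, 0.0))) @ U.T   # symmetric square root
    return rng.standard_normal((n, Sigma0.shape[0])) @ root

def approximate_rank(A, gamma=1e-3):
    """Smallest ar in {1, ..., p} with sigma_{ar+1}(A) / sigma_1(A) <= gamma, as in (4)."""
    sv = np.linalg.svd(A, compute_uv=False)          # sorted in descending order
    if sv[0] == 0.0:
        return 0                                     # zero matrix: convention of this sketch
    for ar in range(1, len(sv)):
        if sv[ar] / sv[0] <= gamma:                  # sv[ar] is sigma_{ar+1}
            return ar
    return len(sv)

def sparsity(A):
    """sp(A): fraction of nonzero entries of A."""
    return np.count_nonzero(A) / A.size
```

A typical use would be `rng = np.random.default_rng(0)`, then `Sigma0 = block_covariance(200, 5, rng)` and `X = sample_from(Sigma0, 50, rng)`.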
Apart from the approximate rank $ar$ and the sparsity $sp$ of the sparse and low-rank semidefinite estimator $\hat\Sigma = (\hat\sigma_{ij})_{1\le i,j\le p}$, we also use two other measures to show the selection performance of our proposed method ADMM:

$$\mathrm{FPR} := \frac{\mathrm{Card}\{(i,j) : \sigma_{0ij} \ne 0\ \&\ \hat\sigma_{ij} = 0\}}{\mathrm{Card}\{(i,j) : \hat\sigma_{ij} = 0\}}, \qquad \mathrm{TPR} := \frac{\mathrm{Card}\{(i,j) : \sigma_{0ij} \ne 0\ \&\ \hat\sigma_{ij} \ne 0\}}{\mathrm{Card}\{(i,j) : \hat\sigma_{ij} \ne 0\}},$$

where FPR stands for the false positive rate, that is, the proportion of significant variables that are unselected among all zero entries, and TPR denotes the true positive rate, that is, the proportion of significant variables that are selected among all nonzero entries.

For visualization, we plot the Population Covariance, the Sample Covariance and the Recovered Covariance. In Figures 1 and 2, the left panel (Population Covariance) is $\Sigma_0$, the middle panel (Sample Covariance) is $\Sigma_n$ and the right panel (Recovered Covariance) is $\hat\Sigma$. The yellow region is the sparse area, in which the values are equal or very close to zero, while the green zone is where the nonzero entries are located; the deeper the green, the larger the value. Evidently, the recovered covariance matrices depend strongly on the sample covariance matrix.

Figure 2: Covariance estimation with $K = 5$, $\lambda = 0.5$, $\tau = 0.2$ and $p = 500$. In the row above, FPR = 0.098, TPR = 0.99, whilst FPR = 0.07, TPR = 1 in the row below.
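The two selection measures can be computed as in the following sketch, which follows the verbal description given in this section: both rates are taken relative to the zero and nonzero entries of the estimator. The function name and the `tol` threshold are hypothetical, and the `nan` returned for an empty denominator is a convention of this sketch only.

```python
import numpy as np

def selection_rates(Sigma0, Sigma_hat, tol=0.0):
    """FPR and TPR as described in this section.

    FPR: share of the estimator's zero entries whose true value is nonzero.
    TPR: share of the estimator's nonzero entries whose true value is nonzero.
    `tol` (assumption) treats entries with absolute value <= tol as zero.
    """
    true_nonzero = np.abs(np.asarray(Sigma0)) > tol
    est_nonzero = np.abs(np.asarray(Sigma_hat)) > tol
    est_zero = ~est_nonzero
    n_zero, n_nonzero = int(est_zero.sum()), int(est_nonzero.sum())
    fpr = (true_nonzero & est_zero).sum() / n_zero if n_zero else float("nan")
    tpr = (true_nonzero & est_nonzero).sum() / n_nonzero if n_nonzero else float("nan")
    return float(fpr), float(tpr)
```

Note that these definitions differ from the more common convention in which both rates are normalized by the zero and nonzero entries of the true matrix $\Sigma_0$.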
We then report average results over 100 runs. Timing (in seconds) was carried out on a desktop with a 2.6GHz CPU. In the computations, the sparsity parameter $\lambda$ and the low-rank parameter $\tau$ are given in the corresponding simulations, and $\Sigma^0$ is taken as the zero matrix. As mentioned before, the rank of the generated $\Sigma_0$ is known, i.e., $\mathrm{rank}(\Sigma_0) = ar(\Sigma_0) = K$. Table 2 shows the performance of our approach under different dimensions $p = 100, 200, 500, 1000$ and various numbers of blocks $K$. All values of $ar(\hat\Sigma)$ are close to the true rank $K$. The values of FPR and TPR are quite desirable, which shows that the selection performance of ADMM is very good. Moreover, the CPU time shows that the method runs relatively fast. In addition, as $K$ rises, for example with $p = 500$, FPR and TPR move in opposite directions (one decreases while the other increases), and the time spent by the method also increases. Furthermore, for a small dimension such as $p = 200$ and a large number of blocks such as $K = 50$, ADMM performs unstably because some simulations cannot be recovered.

Table 2: The performance of ADMM under different dimensions $p$ and blocks $K$. $n = 50$, $ar(\Sigma_0) = K$, $\lambda = 0.25$, $\tau = 0.5$, $\Sigma^0 = 0$
$p$ | $ar(\hat\Sigma)$ | $sp(\Sigma_0)$ | $sp(\hat\Sigma)$ | FPR | TPR | Time
K = K = K = K =

To observe the performance of our proposed method under different parameters $\lambda$, $\tau$ and initial points $\Sigma^0$, we fix $n = 50$, $p = 200$ and $ar(\Sigma_0) = K = 5$. In Table 3, the left columns of $ar(\hat\Sigma)$, FPR, TPR and Time are generated from the initial point $\Sigma^0 = 0$, and the right columns are produced with $\Sigma^0 = \Sigma_{st}$. One can easily see that when $\lambda = 0.05$ and $\Sigma^0 = \Sigma_{st}$, the performance of the method is relatively poor regardless of $\tau$, since $ar(\hat\Sigma)$ and TPR are undesirable. By contrast, when $\lambda = 0.25$ and $0.50$, it behaves much better, particularly when $\tau = $ . Moreover, as $\lambda$ increases, the rate FPR of significant variables that are unselected rises, even though the percentage TPR of significant variables that are selected increases as well.
Comparing the two initial points $\Sigma^0 = 0$ and $\Sigma_{st}$, as shown in the table, the values of $ar(\hat\Sigma)$, FPR, TPR and Time generated from $\Sigma^0 = 0$ are basically the same, which means that the method with the zero starting point
performs more stably. But when $\lambda = 0.25$ and $0.50$, ADMM with $\Sigma^0 = \Sigma_{st}$ generates a larger TPR than that from $\Sigma^0 = 0$; moreover, it needs less computational time in all simulations regardless of the parameters.

Table 3: The performance of ADMM under distinct parameters $\lambda$, $\tau$ and initial points $\Sigma^0$. $n = 50$, $p = 200$, $ar(\Sigma_0) = K = 5$
$\lambda$ | $\tau$ | $ar(\hat\Sigma)$ | FPR | TPR | Time

4.2 Example II: Banded Structure

In this part we consider the population covariance matrix with banded structure, which has appeared in [, 3, 8]. More precisely, the population covariance matrix $\Sigma_0 = (\sigma_{0ij})_{1\le i,j\le p} \in \mathbb{R}^{p\times p}$ is given by

$$\sigma_{0ij} = \max\Big\{1-\frac{|i-j|}{10},\ 0\Big\} = \Big(1-\frac{|i-j|}{10}\Big)_+.$$

We first report average results over 100 replications and take the sparsity parameter $\lambda = 0.5$ and the low-rank parameter $\tau = 0.75$. Table 4 shows the performance of our approach under different dimensions $p = 100, 200, 500, 1000$ and the two starting points $\Sigma^0 = 0$ and $\Sigma_{st}$. As Table 4 shows, compared with $\Sigma_0$, the values of $ar(\hat\Sigma)$ and $sp(\hat\Sigma)$ are relatively small, and the former increases while the latter decreases as $p$ rises. In Example 4.1, the block-structured $\Sigma_0$ with $\mathrm{rank}(\Sigma_0) = ar(\Sigma_0) = K$ leads to an estimator $\hat\Sigma$ whose rank is also close to $K$. In contrast, in this example $ar(\Sigma_0)$ increases with the dimension $p$ and $\Sigma_0$ is not low-rank, but the recovered solution $\hat\Sigma$ still has relatively low rank. The values of FPR and TPR are both quite desirable, which shows that the selection performance of ADMM is very good in this example. Moreover, the CPU time shows that the method runs extremely fast as well. In addition, under the parameters $\lambda = 0.5$, $\tau = 0.75$, ADMM behaves nearly identically even though the starting points $\Sigma^0$ are different.
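The banded population covariance of Example II can be generated with a few lines of NumPy. The sketch below assumes the bandwidth constant 10 from the formula above and exposes it as a hypothetical `width` parameter.

```python
import numpy as np

def banded_covariance(p, width=10):
    """Sigma_0 with entries (1 - |i - j| / width)_+ ; width = 10 matches Example II."""
    idx = np.arange(p)
    distance = np.abs(idx[:, None] - idx[None, :])   # |i - j| for all pairs
    return np.maximum(1.0 - distance / width, 0.0)
```

Each interior row then has $2\cdot 10 - 1 = 19$ nonzero entries, so $sp(\Sigma_0)$ decreases as $p$ grows.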
Table 4: The performance of ADMM over 100 simulations under different dimensions $p$ and $\Sigma^0$. $n = 50$, $\lambda = 0.5$, $\tau = 0.75$
$p$ | $ar(\Sigma_0)$ | $ar(\hat\Sigma)$ | $sp(\Sigma_0)$ | $sp(\hat\Sigma)$ | FPR | TPR | Time
Row groups: $\Sigma^0 = 0$ and $\Sigma^0 = \Sigma_{st}$

Figure 3: Covariance estimation with $\lambda = 0.5$, $\tau = 0.75$ and $p = 100$. In the row above, FPR = 0.062, TPR = 1, whilst FPR = 0.054, TPR = 1 in the row below.

From Figure 3, one can check that the recovered covariance matrices $\hat\Sigma$ depend strongly on the sample covariance matrix $\Sigma_n$, and that the selection performance is relatively good because FPR is quite small while TPR is close to 1.
To observe the behavior of ADMM under different parameters $\lambda$, $\tau$ and initial points $\Sigma^0$, we fix $n = 50$, $p = 200$ and $ar(\Sigma_0) = 76$. In Table 5, the left columns of $ar(\hat\Sigma)$, FPR, TPR and Time are generated from the initial point $\Sigma^0 = 0$, and the right columns are produced with $\Sigma^0 = \Sigma_{st}$. The CPU time cost by the method with starting point $\Sigma_{st}$ is almost always less than that from the zero initialization. As $\lambda$ rises, the TPR generated by the method with $\Sigma^0 = \Sigma_{st}$ increases to 1, while that with $\Sigma^0 = 0$ basically stabilizes. In terms of $ar(\hat\Sigma)$, the proposed method with starting point $\Sigma_{st}$ does not produce a lower-rank solution than with $\Sigma^0 = 0$. The reason for this phenomenon is probably that $\Sigma_{st}$ is a sparse point but not a low-rank one; from (26), if $\Sigma_{st}$ is approximately positive semidefinite, the algorithm stops after a few iterations, so the solution is not the desired low-rank one.

Table 5: The performance of ADMM under distinct parameters $\lambda$, $\tau$ and initial points $\Sigma^0$. $n = 50$, $p = 200$, $ar(\Sigma_0) = 76$
$\lambda$ | $\tau$ | $ar(\hat\Sigma)$ | FPR | TPR | Time

5 Conclusion

We have obtained a positive semidefinite estimator that is simultaneously sparse and low-rank from samples, by using $\ell_1$-norm and nuclear norm penalties. The theoretical properties show that the estimator we have constructed performs very well in high-dimensional settings. Meanwhile, the efficient ADMM with global convergence has several merits illustrated by the numerical simulations, such as short computational time and good recovery.
Acknowledgement

The work was supported in part by the National Basic Research Program of China (200CB73250, the National Natural Science Foundation of China (708, 72702,

References

[1] Bickel, P. and Levina, E.: Regularized estimation of large covariance matrices. Ann. Statist. 36 (2008).
[2] Bickel, P. and Levina, E.: Covariance regularization by thresholding. Ann. Statist. 36 (2008).
[3] Cai, T. and Liu, W.: Adaptive thresholding for sparse covariance matrix estimation. J. Amer. Statist. Assoc. 106, -3 (2011).
[4] Cai, T., Zhang, C. and Zhou, H.: Optimal rates of convergence for covariance matrix estimation. Ann. Statist. 38 (2010).
[5] Cai, T. T. and Yuan, M.: Adaptive covariance matrix estimation through block thresholding. Ann. Statist. 40 (2012).
[6] Cai, T. and Zhou, H.: Minimax estimation of large covariance matrices under $\ell_1$-norm. Statist. Sinica 22 (2012).
[7] Cai, T. and Zhou, H.: Optimal rates of convergence for sparse covariance matrix estimation. Ann. Statist. 40 (2012).
[8] El Karoui, N.: Operator norm consistent estimation of large dimensional sparse covariance matrices. Ann. Statist. 36 (2008).
[9] Friedman, J., Hastie, T. and Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostat. 9, 432 (2008).
[10] Furrer, R. and Bengtsson, T.: Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J. Multivariate Anal. 98 (2007).
[11] Johnstone, I.: On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. (2001).
[12] Liu, H., Wang, L. and Zhao, T.: Sparse covariance matrix estimation with eigenvalue constraints. Journal of Computational and Graphical Statistics (2013), accepted.
[13] Oymak, S., Jalali, A., Fazel, M., Eldar, Y. C. and Hassibi, B.: Simultaneously structured models with application to sparse and low-rank matrices. arXiv preprint arXiv: (2012), to appear.
[14] Richard, E., Savalle, P. A. and Vayatis, N.: Estimation of simultaneously sparse and low rank matrices. arXiv preprint arXiv: (2012), to appear.
[15] Rothman, A.: Positive definite estimators of large covariance matrices. Biometrika 99 (2012).
[16] Rothman, A., Levina, E. and Zhu, J.: Generalized thresholding of large covariance matrices. J. Amer. Statist. Assoc. 104 (2009).
[17] Wu, W. and Pourahmadi, M.: Nonparametric estimation of large covariance matrices of longitudinal data. Biomet. 90 (2003).
[18] Xue, L., Ma, S. and Zou, H.: Positive definite $\ell_1$ penalized estimation of large covariance matrices. J. Amer. Statist. Assoc. 107 (2012).
[19] Yuan, X. M.: Alternating direction method of multipliers for covariance selection models. Journal of Scientific Computing 51 (2012).

Appendix

Proof of Lemma 3.1 For convenience, we denote $g(\Gamma) := \tau\|\Gamma\|_* + I(\Gamma \succeq 0)$. Clearly, $g : \mathbb{R}^{p\times p} \to \mathbb{R}$ is a convex function. Since $(\hat\Gamma,\hat\Sigma)$ is an optimal solution of (15), it satisfies the following KKT conditions:

$$\hat\Lambda \in \partial\big(\tau\|\hat\Gamma\|_* + I(\hat\Gamma \succeq 0)\big) = \partial g(\hat\Gamma), \qquad (27)$$
$$\frac1\lambda\big(-\hat\Lambda-\hat\Sigma+\Sigma_n\big) \in \partial\|\hat\Sigma\|_1, \qquad (28)$$
$$\hat\Sigma = \hat\Gamma. \qquad (29)$$

Note that the optimality conditions for the first subproblem in ADMM, i.e., the subproblem with respect to $\Gamma$ in (16), are given by

$$\Lambda^k - \frac1\mu\big(\Gamma^{k+1}-\Sigma^k\big) \in \partial g(\Gamma^{k+1}). \qquad (30)$$

This together with (18), i.e., $\Lambda^k = \Lambda^{k+1} + \frac1\mu\big(\Gamma^{k+1}-\Sigma^{k+1}\big)$, gives

$$\Lambda^{k+1} - \frac1\mu\big(\Sigma^{k+1}-\Sigma^k\big) \in \partial g(\Gamma^{k+1}). \qquad (31)$$
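Combining (27) and (31) and using the fact that $\partial g(\cdot)$ is a monotone operator, we get

$$\Big\langle\hat\Gamma-\Gamma^{k+1},\ \hat\Lambda-\Lambda^{k+1}+\frac1\mu\big(\Sigma^{k+1}-\Sigma^k\big)\Big\rangle \ge 0. \qquad (32)$$

The optimality conditions for the second subproblem (i.e., the subproblem with respect to $\Sigma$ in (17)) are given by

$$\frac1\lambda\Big(-\Lambda^k-\Sigma^{k+1}+\Sigma_n-\frac1\mu\big(\Sigma^{k+1}-\Gamma^{k+1}\big)\Big) \in \partial\|\Sigma^{k+1}\|_1. \qquad (33)$$

This together with (18), i.e., $\Lambda^k = \Lambda^{k+1} + \frac1\mu\big(\Gamma^{k+1}-\Sigma^{k+1}\big)$, gives

$$\frac1\lambda\big(-\Lambda^{k+1}-\Sigma^{k+1}+\Sigma_n\big) \in \partial\|\Sigma^{k+1}\|_1. \qquad (34)$$

Similarly, combining (28) and (34) and using the fact that $\partial\|\cdot\|_1$ is a monotone operator, we get

$$\big\langle\hat\Sigma-\Sigma^{k+1},\ -\hat\Lambda+\Lambda^{k+1}-(\hat\Sigma-\Sigma^{k+1})\big\rangle \ge 0. \qquad (35)$$

The summation of (32) and (35) gives

$$\begin{aligned}
\|\hat\Sigma-\Sigma^{k+1}\|_F^2 &\le \big\langle\hat\Sigma-\Sigma^{k+1}, -\hat\Lambda+\Lambda^{k+1}\big\rangle + \big\langle\hat\Gamma-\Gamma^{k+1}, \hat\Lambda-\Lambda^{k+1}\big\rangle + \frac1\mu\big\langle\hat\Gamma-\Gamma^{k+1}, \Sigma^{k+1}-\Sigma^k\big\rangle \\
&= \big\langle\hat\Sigma-\Sigma^{k+1}, -\hat\Lambda+\Lambda^{k+1}\big\rangle + \big\langle\hat\Sigma-\Sigma^{k+1}+\mu(\Lambda^{k+1}-\Lambda^k), \hat\Lambda-\Lambda^{k+1}\big\rangle \\
&\quad + \frac1\mu\big\langle\hat\Sigma-\Sigma^{k+1}+\mu(\Lambda^{k+1}-\Lambda^k), \Sigma^{k+1}-\Sigma^k\big\rangle, \qquad (36)
\end{aligned}$$

where the equality follows from $\Gamma^{k+1} = \Sigma^{k+1}-\mu(\Lambda^{k+1}-\Lambda^k)$ and $\hat\Sigma = \hat\Gamma$. Simple algebraic manipulation of (36) yields the following inequality:

$$\|\hat\Sigma-\Sigma^{k+1}\|_F^2 - \big\langle\Lambda^{k+1}-\Lambda^k, \Sigma^{k+1}-\Sigma^k\big\rangle \le \mu\big\langle\hat\Lambda-\Lambda^{k+1}, \Lambda^{k+1}-\Lambda^k\big\rangle + \frac1\mu\big\langle\hat\Sigma-\Sigma^{k+1}, \Sigma^{k+1}-\Sigma^k\big\rangle. \qquad (37)$$

Rearranging the right-hand side of (37) using $\hat\Lambda-\Lambda^{k+1} = \hat\Lambda-\Lambda^k+\Lambda^k-\Lambda^{k+1}$ and $\hat\Sigma-\Sigma^{k+1} = \hat\Sigma-\Sigma^k+\Sigma^k-\Sigma^{k+1}$, (37) can be reduced to

$$\begin{aligned}
&\mu\big\langle\hat\Lambda-\Lambda^k, \Lambda^{k+1}-\Lambda^k\big\rangle + \frac1\mu\big\langle\hat\Sigma-\Sigma^k, \Sigma^{k+1}-\Sigma^k\big\rangle \\
&\quad \ge \mu\|\Lambda^{k+1}-\Lambda^k\|_F^2 + \frac1\mu\|\Sigma^{k+1}-\Sigma^k\|_F^2 + \|\hat\Sigma-\Sigma^{k+1}\|_F^2 - \big\langle\Lambda^{k+1}-\Lambda^k, \Sigma^{k+1}-\Sigma^k\big\rangle.
\end{aligned}$$

Using the notation of $V^k$ and $\hat V$, the inequality above can be rewritten as

$$\big\langle\hat V-V^k, V^{k+1}-V^k\big\rangle_H \ge \|V^{k+1}-V^k\|_H^2 + \|\hat\Sigma-\Sigma^{k+1}\|_F^2 - \big\langle\Lambda^{k+1}-\Lambda^k, \Sigma^{k+1}-\Sigma^k\big\rangle. \qquad (38)$$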
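Combining (38) with the identity

$$\|\hat V-V^{k+1}\|_H^2 = \|V^{k+1}-V^k\|_H^2 - 2\big\langle\hat V-V^k, V^{k+1}-V^k\big\rangle_H + \|\hat V-V^k\|_H^2,$$

we get

$$\begin{aligned}
\|\hat V-V^k\|_H^2 - \|\hat V-V^{k+1}\|_H^2 &= 2\big\langle\hat V-V^k, V^{k+1}-V^k\big\rangle_H - \|V^{k+1}-V^k\|_H^2 \\
&\ge 2\|V^{k+1}-V^k\|_H^2 + 2\|\hat\Sigma-\Sigma^{k+1}\|_F^2 - 2\big\langle\Sigma^{k+1}-\Sigma^k, \Lambda^{k+1}-\Lambda^k\big\rangle - \|V^{k+1}-V^k\|_H^2 \\
&= \|V^{k+1}-V^k\|_H^2 + 2\|\hat\Sigma-\Sigma^{k+1}\|_F^2 - 2\big\langle\Sigma^{k+1}-\Sigma^k, \Lambda^{k+1}-\Lambda^k\big\rangle. \qquad (39)
\end{aligned}$$

Now, using (34) for $k$ instead of $k+1$, we get

$$\frac1\lambda\big(-\Lambda^k-\Sigma^k+\Sigma_n\big) \in \partial\|\Sigma^k\|_1. \qquad (40)$$

Combining (34) and (40) and using the fact that $\partial\|\cdot\|_1$ is a monotone operator, we obtain

$$\big\langle\Sigma^{k+1}-\Sigma^k,\ -\Lambda^{k+1}+\Lambda^k-(\Sigma^{k+1}-\Sigma^k)\big\rangle \ge 0,$$

which means

$$-\big\langle\Sigma^{k+1}-\Sigma^k, \Lambda^{k+1}-\Lambda^k\big\rangle \ge \|\Sigma^{k+1}-\Sigma^k\|_F^2 \ge 0. \qquad (41)$$

By substituting (41) into (39), we get the desired result (25).

Proof of Theorem 3.2 From Lemma 3.1, we obtain that

(a) $\|V^{k+1}-V^k\|_H^2 \to 0$;
(b) $\{V^k\}$ lies in a compact region;
(c) $\|V^{k+1}-\hat V\|_H^2$ is monotonically non-increasing and thus converges.

From the definitions of $H$ and $V$ together with (a), it holds that $\Lambda^{k+1}-\Lambda^k \to 0$ and $\Sigma^{k+1}-\Sigma^k \to 0$, which together with (18) imply that $\Gamma^{k+1}-\Gamma^k \to 0$ and $\Sigma^k-\Gamma^k \to 0$. From (b), $\{V^k\}$ has a subsequence $\{V^{k_j}\}$ that converges to $\check V = (\check\Lambda,\check\Sigma)$, i.e., $\Lambda^{k_j} \to \check\Lambda$ and $\Sigma^{k_j} \to \check\Sigma$. We also have $\Gamma^{k_j} \to \check\Gamma\ (:= \check\Sigma)$ from $\Sigma^k-\Gamma^k \to 0$. Therefore, $(\check\Gamma,\check\Sigma,\check\Lambda)$ is a limit point of $\{(\Gamma^k,\Sigma^k,\Lambda^k)\}$. Note that (30) and (33) respectively imply that

$$\check\Lambda - \frac1\mu(\check\Gamma-\check\Sigma) \in \partial g(\check\Gamma), \qquad \frac1\lambda\Big(-\check\Lambda-\check\Sigma+\Sigma_n-\frac1\mu(\check\Sigma-\check\Gamma)\Big) \in \partial\|\check\Sigma\|_1;$$

together with $\check\Gamma = \check\Sigma$, these reduce to

$$\check\Lambda \in \partial g(\check\Gamma), \qquad (42)$$
$$\frac1\lambda\big(-\check\Lambda-\check\Sigma+\Sigma_n\big) \in \partial\|\check\Sigma\|_1. \qquad (43)$$

(42), (43) and $\check\Gamma = \check\Sigma$ mean that $(\check\Gamma,\check\Sigma,\check\Lambda)$ is an optimal solution to (15). Therefore, we have shown that any limit point of $\{(\Gamma^k,\Sigma^k,\Lambda^k)\}$ is an optimal solution to (15).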