Estimation of Graphical Models with Shape Restriction


Estimation of Graphical Models with Shape Restriction

BY KHAI X. CHIONG
USC Dornsife INET, Department of Economics, University of Southern California, Los Angeles, California, U.S.A.

AND HYUNGSIK ROGER MOON
Department of Economics, USC Dornsife INET, University of Southern California, Los Angeles, California, U.S.A., and School of Economics, Yonsei University, Seoul, Korea.

SUMMARY

Gaussian graphical models have recently been used in economics to obtain networks of dependence among agents. The standard estimator is the Graphical Lasso (GLASSO), which amounts to maximum likelihood estimation regularized using the L_{1,1} matrix norm on the precision matrix Ω. The L_{1,1} norm is a lasso penalty that controls for sparsity, i.e., the number of zeros in Ω. We propose a new estimator called Structured Graphical Lasso (SGLASSO) that uses the L_{1,2} mixed norm instead of the L_{1,1} lasso norm. The L_{1,2} penalty controls for the structure of the sparsity in Ω. We show that SGLASSO is asymptotically equivalent to an infeasible problem which prioritizes the sparsity recovery of high-degree nodes. Monte Carlo simulations show that SGLASSO outperforms GLASSO both in estimating the overall precision matrix and in estimating the structure of the graphical model. In an empirical application to a classic firm-investment dataset, we obtain a network of firms' dependence that exhibits a core-periphery structure, with General Motors, General Electric and U.S. Steel forming the core group of firms.

Date: April 2017.
Some key words: Gaussian graphical models; Glasso; Inverse covariance matrices; Lasso; Precision matrices; Sparsity.

1. INTRODUCTION

The Gaussian graphical model is a graph summarizing the conditional independence relationships among a group of random variables. Two nodes i and j in the graph are not linked if and only if i and j are statistically independent conditional on all other nodes besides i and j.

As an example, consider the setting of our empirical application: the variables (X_1, ..., X_p) represent the annual investment of p firms, labeled 1, ..., p. The goal is to recover a graph (the graphical model) that describes the pairwise conditional independence relationships among the random variables (X_1, ..., X_p). If two firms i and j are linked in this graphical model, then the investment decisions of firms i and j are dependent on each other, conditional on all other firms X_k, k ≠ i, j. Hence the investment decisions of firms i and j have direct effects on each other, without the mediation of other firms. The graphical model is a popular tool in biostatistics that has seen recent applications in economics; for instance, Giudici and Spelta (2016) use graphical modeling to obtain the network of international financial flows. In our own empirical application here, we use graphical models

to obtain the network of dependence among firms' investment decisions, using a classic dataset of firms' investment. Interestingly, we recover a core-periphery network structure among the firms, with General Motors, General Electric and U.S. Steel forming the core group of firms. In a core-periphery network, there is a core group of nodes that all other nodes (including other nodes in the core) are linked to. In addition, the periphery nodes do not have links to each other; they only have links to the core.

A widely-used technique to estimate graphical models is the Graphical Lasso (GLASSO), which aims to estimate the precision matrix Ω while imposing a sparsity constraint on Ω. To see the intuition, suppose that (X_1, ..., X_p) is distributed as N(0, Σ_0). If the (i, j)-th entry of the precision matrix Σ_0^{-1} ≡ Ω_0 is zero, then it is known that X_i and X_j are independent conditional on all other X_k, k ≠ i, j. The sparsity constraint is needed, as otherwise the sample precision matrix produces a dense matrix without any zeros, and hence does not identify the non-links in the graphical model.

In this note, we propose an estimation method for graphical models which works by estimating the precision matrix Ω taking into account the structure of sparsity in Ω. The standard approach estimates the precision matrix while controlling for the sparsity of Ω, i.e., the number of zeros in Ω. However, it does not consider that these zeros could be distributed in a certain pattern or structure in Ω. For instance, in many economic applications it is reasonable to think that in the graphical model some nodes have very few links, while some hub-like nodes have many links to other nodes. Such settings are common in economic applications: for instance, we commonly observe the so-called small-world properties in social and economic networks (Jackson (2008)).

We adapt the well-known GLASSO method (e.g., Banerjee et al. (2008); Friedman et al. (2008); Yuan and Lin (2007); Fan et al. (2009)) to estimate a precision matrix taking into account the hidden structure of its sparsity. The main feature of our estimation method is the L_{1,2}-norm penalty, which corresponds to the second moment of the weighted degrees of the nodes, where the weighted degree of node i is d_i = \sum_{k=1}^{p} |\Omega_{ik}|, for i = 1, ..., p. Notice that the GLASSO uses the L_{1,1}-norm penalty, which is proportional to the first moment of the weighted degree distribution.

We derive the asymptotic distribution of the new estimator, called Structured Graphical Lasso (SGLASSO), when the sample size T goes to infinity and the dimension of Ω, p, is fixed. The main theoretical finding is that SGLASSO is asymptotically equivalent to an infeasible problem which penalizes an entry of Ω proportionally to the influence of that entry, where the influence of the (i, j)-th entry is defined as the sum of the actual weighted degrees of nodes i and j in the graphical model. Specifically, SGLASSO is asymptotically equivalent to a modified GLASSO where the element-by-element penalty factor for the (i, j)-th entry of Ω is proportional to (d_i + d_j), where d_i = \sum_{k=1}^{p} |\Omega_{0,ik}| is the true weighted degree of node i. Note that d_i is unknown to us, and hence this modified problem is infeasible to implement as an estimator. This has two implications. First, SGLASSO is similar to GLASSO in the sense that both achieve sparsity: the non-differentiability of the penalty function at zero results in entries that are estimated to be exactly zero. Second, SGLASSO differs from GLASSO in that it prioritizes recovering the zeros between two high-degree nodes.
For instance, going back to the example of the network of dependence among firms, consider two firms i, j with high weighted degrees. If Ω_{0,ij} = 0, then SGLASSO tends to estimate Ω̂_{ij} = 0 correctly with higher probability than GLASSO (Figure 6). Nodes with higher weighted degrees can be viewed as being more influential and important, and we might be more interested in uncovering the relationships between more influential agents.

In practice, we can also combine both SGLASSO and GLASSO by using both the L_{1,1} and the L_{1,2} norms as penalty terms, essentially controlling for both the first and the second moments of the weighted degree distribution.

We show by means of simulations that SGLASSO performs better than GLASSO. That is, SGLASSO achieves lower overall Kullback-Leibler and Frobenius losses in terms of estimating the overall precision matrix. In addition, in terms of recovering the structure of the graphical model, we also show that SGLASSO outperforms GLASSO.

The paper is organized as follows. Section 2 introduces the model, the L_{1,2}-norm penalty, and a motivating example. Section 3 presents asymptotic results. Section 4 contains Monte Carlo simulation results. Section 5 contains an empirical application. The appendix contains technical proofs and derivations.

1.1. Related literature

This paper is related to the literature on estimating high-dimensional inverse covariance matrices. An overview of this topic can be found in Cai et al. (2016) and Fan et al. (2016). The standard sample covariance estimator performs poorly when the dimension of the true covariance matrix is high relative to the sample size. One approach is to impose a sparsity constraint on the precision matrix, and to estimate the sparse precision matrix by maximizing a penalized likelihood. Our estimator follows this approach. A different class of procedures frames the estimation of the precision matrix as nodewise or pairwise regressions, where the Lasso or the Dantzig selector can then be used to achieve variable selection and high-dimensional regularization. Relevant papers in this class are Cai et al. (2011, 2016), Meinshausen and Bühlmann (2006), Peng et al. (2009) and Ren et al. (2015). A recent approach to estimating high-dimensional Gaussian graphical models is the Innovated Scalable Efficient Estimation proposed by Fan and Lv (2015). Instead of estimating the inverse covariance through penalized likelihoods, they convert the original problem into that of estimating a large covariance matrix.

Another type of mixed norm that is widely used in the high-dimensional linear regression setting is the L_{2,1}-norm, also known as the Group Lasso (Friedman et al. (2010); Yuan and Lin (2006)), which is useful when parameters are naturally partitioned into disjoint groups and we would like to achieve sparsity with respect to whole groups. In contrast, our proposed L_{1,2}-norm is distinct from the L_{2,1}-norm, and has not been studied in the context of covariance estimation.

2. SETUP

Let X = (X_1, ..., X_p) ∈ R^p be a p-dimensional vector distributed according to a multivariate distribution with mean zero and covariance matrix Σ_0. Our goal is to obtain the graphical model corresponding to X, which is defined as follows. The (Gaussian) graphical model of X is an undirected graph G = (V, E). The set of nodes (or vertices) of this graph is V = {1, ..., p}. Each node i corresponds to a random variable X_i. The set of links (or edges) E is such that {i, j} ∉ E if and only if X_i is independent of X_j conditional on all X_k, k ∈ V \ {i, j}. That is, the graphical model is defined such that there is no link between nodes i and j in the graph if and only if X_i and X_j are independent conditional on all other nodes besides i and j. As such, the graphical model G summarizes the pairwise conditional independence relationships among (X_1, ..., X_p). The existence of a link in G amounts to conditional dependence between the two nodes given all other ones.
Estimation of graphical models is based on the following known fact: if X is Gaussian, then the non-zeros in its precision matrix (or inverse covariance matrix) Ω_0 := Σ_0^{-1} correspond exactly

to links in the graphical model (see Hastie et al. (2015); Whittaker (1990)). That is, there is a link between nodes i and j if and only if the (i, j)-th entry of Ω_0 is non-zero. Similarly, the lack of a link {i, j} between a pair of nodes i and j is equivalent to the (i, j)-th entry of Ω_0 being zero.

Now suppose we observe i.i.d. samples X_t, t = 1, ..., T, from X = (X_1, ..., X_p). We now introduce the GLASSO estimator, as well as our proposed SGLASSO estimator, which allow us to recover the graphical model from the samples. The output of these estimators is a sparse precision matrix Ω̂. From Ω̂, the estimated graphical model is then constructed by including a link between nodes i and j if and only if Ω̂_{ij} ≠ 0.

For an m × n matrix A, define the mixed L_{p,q} norm as follows:

\|A\|_{p,q} = \left( \sum_{j=1}^{n} \left( \sum_{i=1}^{m} |a_{ij}|^{p} \right)^{q/p} \right)^{1/q}.

The following L_{1,2} norm in equation (1) will play a crucial role in our estimator:

\|A\|_{1,2} = \left( \sum_{j=1}^{n} \left( \sum_{i=1}^{m} |a_{ij}| \right)^{2} \right)^{1/2}.   (1)

Our Structured-GLASSO (SGLASSO) estimator is defined in equation (2) below, where Σ̂ is the sample covariance matrix calculated from the sample (X_t)_{t=1}^{T}:

\hat{\Omega}_{\mathrm{sglasso}} := \arg\max_{\Omega} \left\{ \log\det\Omega - \mathrm{tr}(\Omega\hat{\Sigma}) - \lambda \|\Omega\|_{1,2} \right\}.   (2)

In comparison, the GLASSO estimator (Banerjee et al. (2008); Friedman et al. (2008); Yuan and Lin (2007); Fan et al. (2009)) is defined in equation (3). While SGLASSO uses the L_{1,2} mixed norm as a penalty to the likelihood, GLASSO in equation (3) uses the L_{1,1} norm, which is the familiar lasso penalty (the sum of the absolute values of the entries of Ω):

\hat{\Omega}_{\mathrm{glasso}} := \arg\max_{\Omega} \left\{ \log\det\Omega - \mathrm{tr}(\Omega\hat{\Sigma}) - \lambda \|\Omega\|_{1,1} \right\}.   (3)

Another definition of the GLASSO is one where only Ω^{-} is penalized, where Ω^{-} denotes the matrix Ω with its diagonal entries set to zero. While Yuan and Lin (2007) and Rothman et al. (2008) use equation (3), other authors such as Banerjee et al. (2008) and Friedman et al. (2008) use the latter. For various expositional reasons, we will penalize Ω instead of Ω^{-}. However, in the Monte Carlo simulation we will additionally show results for the variants of both GLASSO and SGLASSO where the diagonals are not penalized.

Although we are comparing SGLASSO with GLASSO in this paper, we could also combine them. That is,

\hat{\Omega} := \arg\max_{\Omega} \left\{ \log\det\Omega - \mathrm{tr}(\Omega\hat{\Sigma}) - \lambda_{1} \|\Omega\|_{1,1} - \lambda_{2} \|\Omega\|_{1,2} \right\},

and use cross-validation procedures to tune the parameters λ_1 and λ_2.

We do not require that the data-generating process be Gaussian. Therefore our estimator can be seen as maximizing a Gaussian quasi-likelihood subject to regularization. When the DGP is non-Gaussian, the graphical model estimated using GLASSO or SGLASSO is still useful: it corresponds to the structure of pairwise conditional correlations (partialling out the other variables). When there is no penalty (λ = 0), the Ω̂ defined in both (2) and (3) corresponds to the quasi maximum likelihood estimator of the inverse covariance matrix, which is given by the inverse of the empirical sample covariance matrix.
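For concreteness, the mixed norms in equations (1)-(3) can be written out directly. The following is a minimal numpy sketch (our own illustration, not code from the paper), using the convention of the display above with the inner sum taken over rows within each column.

```python
import numpy as np

def mixed_norm(A, p, q):
    """||A||_{p,q} = ( sum_j ( sum_i |a_ij|^p )^{q/p} )^{1/q}, as in equation (1)."""
    inner = np.sum(np.abs(A) ** p, axis=0)      # one inner sum per column j
    return float(np.sum(inner ** (q / p)) ** (1.0 / q))

def glasso_penalty(A):
    return mixed_norm(A, 1, 1)                  # L_{1,1}: sum of absolute entries

def sglasso_penalty(A):
    return mixed_norm(A, 1, 2)                  # L_{1,2}: Euclidean norm of the column absolute sums
```

For a symmetric Ω the column absolute sums are exactly the weighted degrees d_j, so the L_{1,2} penalty is the square root of the sum of squared weighted degrees, i.e., the second-moment object referred to in the introduction.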

It is well-known that the unpenalized sample estimator behaves poorly, and it is unsuitable for graphical modeling when the zeros of the estimates are important.

2.1. Motivating example

We illustrate the difference between the L_{1,2} norm and the conventional L_{1,1} lasso norm. Define Ω_1 and Ω_2 as the precision matrices given in equations (4) and (5): Ω_1 is the precision matrix whose non-zero pattern is the star graph of Figure 1, and Ω_2 is the precision matrix whose non-zero pattern is the AR(1) chain of Figure 2. The precision matrices Ω_1 and Ω_2 give rise respectively to the graphical models in Figures 1 and 2, where links in the graphs represent non-zero entries in the precision matrices.

Fig. 1: Ω_1, star graph.   Fig. 2: Ω_2, AR(1) model.

Both the standard lasso and Frobenius norms do not distinguish between Ω_1 and Ω_2, while our proposed L_{1,2}-norm does. The reason is clear: Ω_1 and Ω_2 have the same number of zeros. They have the same sparsity, but the structure of this sparsity is very different. More concretely, for the L_{1,1} lasso norm we have L_{1,1}(Ω_1) = L_{1,1}(Ω_2), since the two matrices have the same number of non-zero off-diagonal entries of equal magnitude and identical diagonals. Similarly, for the L_{2,2} Frobenius norm we have L_{2,2}(Ω_1) = L_{2,2}(Ω_2). On the other hand, the L_{1,2}-norm can distinguish between these two structures: the L_{1,2} norm evaluated at Ω_1 (the star graph) differs from the L_{1,2} norm evaluated at Ω_2 (the AR(1) graph), because the star graph concentrates the weighted degrees on the hub node.

2.2. Remarks

Lam and Fan (2009) study a very general class of estimators that nests the GLASSO. In particular, they consider

\arg\max_{\Omega} \left\{ \log\det\Omega - \mathrm{tr}(\Omega\hat{\Sigma}) - \sum_{i \neq j} p_{\lambda}(|\Omega_{ij}|) \right\},

where p_λ(·) is a penalty function that depends on a regularization parameter λ. This general framework nests many estimators, including the conventional GLASSO (by setting p_λ(|ω|) = λ|ω|). The SGLASSO estimator does not belong to this class because ‖Ω‖_{1,2} is not additively separable in the entries of Ω. Hence, SGLASSO cannot be analyzed using the framework proposed in Lam and Fan (2009).

It is possible to incorporate other penalty functions into our estimator. Consider

\hat{\Omega}_{\mathrm{sc\text{-}glasso}} := \arg\max_{\Omega} \left\{ \log\det\Omega - \mathrm{tr}(\Omega\hat{\Sigma}) - \|p_{\lambda}(\Omega)\|_{1,2} \right\},

where p_λ(Ω) denotes the matrix (of the same dimension as Ω) such that p_λ(Ω)_{ij} = p_λ(|Ω_{ij}|). For instance, we can then let p_λ(·) be the Smoothly Clipped Absolute Deviation (SCAD) penalty function (Fan and Li (2001)).

For instance, when p/T → c > 0 and the samples are drawn i.i.d. from a multivariate Gaussian distribution, we know from Johnstone (2001) that neither the eigenvalues nor the eigenvectors of the sample covariance matrix are consistent estimators of those of the true covariance matrix.
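Returning to the motivating example of §2.1, the comparison can be reproduced numerically. The sketch below is our own illustration: the dimension p = 5, the unit diagonal and the common off-diagonal value 0.2 are assumed values chosen for illustration, since the exact entries of Ω_1 and Ω_2 are given in displays (4)-(5).

```python
import numpy as np

def precision_from_links(links, p, offdiag=0.2):
    """Unit diagonal; a common (assumed) off-diagonal value on linked pairs."""
    Omega = np.eye(p)
    for i, j in links:
        Omega[i, j] = Omega[j, i] = offdiag
    return Omega

p = 5
star = precision_from_links([(0, j) for j in range(1, p)], p)        # node 0 is the hub
ar1  = precision_from_links([(i, i + 1) for i in range(p - 1)], p)   # chain, AR(1) pattern

l11 = lambda A: np.sum(np.abs(A))                                    # lasso norm
l22 = lambda A: np.sqrt(np.sum(A ** 2))                              # Frobenius norm
l12 = lambda A: np.sqrt(np.sum(np.sum(np.abs(A), axis=0) ** 2))      # L_{1,2} norm

print(l11(star) == l11(ar1))                 # True: same lasso penalty
print(np.isclose(l22(star), l22(ar1)))       # True: same Frobenius norm
print(l12(star), l12(ar1))                   # differ: the star concentrates weighted degree
```

Both graphs have four links of equal magnitude, so the L_{1,1} and Frobenius norms coincide, while the L_{1,2} norm is larger for the star graph, whose hub carries a larger weighted degree.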

3. THEORETICAL PROPERTIES

In this section we derive the limiting distribution of SGLASSO when the number of samples T goes to infinity while p is fixed (following Yuan and Lin (2007)). The main finding of this section is that SGLASSO is asymptotically equivalent to an infeasible problem which prioritizes the sparsity recovery of high-degree nodes. More precisely, SGLASSO is asymptotically equivalent to a modified GLASSO estimator where each row of Ω is penalized proportionally to its true degree. The true degree of row (or node) i is defined by d_i = \sum_{j=1}^{p} |\Omega_{0,ij}|.

This asymptotic equivalence result has two implications. First, SGLASSO inherits the sparsity-discovery property of GLASSO, where zeros in the precision matrix are estimated to be exactly zero. Therefore SGLASSO estimates are also sparse. Secondly, SGLASSO differs from GLASSO in that it seeks out higher-degree nodes and prioritizes recovering the zero pattern associated with these nodes.

To derive the limiting distribution of our SGLASSO estimator, we impose a high-level assumption on the weak limit of the sample covariance matrix.

Assumption 1. We assume that \mathrm{vech}\!\left(\sqrt{T}(\hat{\Sigma} - \Sigma_0)\right) \Rightarrow \mathrm{vech}(W) \sim N(0, \Lambda), where vech is the half-vectorization operator that takes only the lower-triangular part of a symmetric matrix.

This high-level condition holds when the products X_{j,t} X_{k,t} have sufficiently high moments and the serial dependence is weak. When X_t is i.i.d. N(0, Σ_0), then \mathrm{vech}\!\left(\sqrt{T}(\hat{\Sigma} - \Sigma_0)\right) \Rightarrow \mathrm{vech}(W) \sim N(0, \Lambda), where Λ is such that \mathrm{Cov}(w_{ij}, w_{i'j'}) = \mathrm{Cov}(X_{i,t}X_{j,t}, X_{i',t}X_{j',t}).

Now let D_p be a p × p matrix whose (i, j) entry is d_j = \sum_{k=1}^{p} |\Omega_{0,jk}|, the true weighted degree of node j. Define

\hat{\Omega}_{D_p} = \arg\max_{\Omega} \left\{ \log\det\Omega - \mathrm{tr}(\Omega\hat{\Sigma}) - \lambda_T \|D_p \circ \Omega\|_{1,1} \right\}   (6)

to be a variant of the GLASSO estimator. Here, the operator ∘ denotes element-wise multiplication. The SGLASSO estimator is defined as

\hat{\Omega}_{\lambda_T} = \arg\max_{\Omega} \left\{ \log\det\Omega - \mathrm{tr}(\Omega\hat{\Sigma}) - \lambda_T \|\Omega\|_{1,2} \right\}.

THEOREM 1. Suppose that, as T → ∞, \sqrt{T}\,\lambda_T → λ_0 ≥ 0. Then

\sqrt{T}(\hat{\Omega}_{\lambda_T} - \Omega_0) = \sqrt{T}(\hat{\Omega}_{D_p} - \Omega_0) + o_p(1).

In addition, suppose that Assumption 1 holds. Then \sqrt{T}(\hat{\Omega}_{\lambda_T} - \Omega_0) and \sqrt{T}(\hat{\Omega}_{D_p} - \Omega_0) converge in distribution to \arg\min_{U \in R^{p\times p}} V(U), where

V(U) = \tfrac{1}{2}\,\mathrm{tr}(U\Sigma_0 U\Sigma_0) + \mathrm{tr}(UW) + \lambda_0 \sum_{i=1}^{p} \sum_{j=1}^{p} g(u_{ij})\, d_j   (7)

as T → ∞. Here Σ_0 = Ω_0^{-1} is the true covariance matrix of X, and g(u_{ij}) := u_{ij}\,\mathrm{sign}(\Omega_{0,ij})\,\mathbf{1}(\Omega_{0,ij} \neq 0) + |u_{ij}|\,\mathbf{1}(\Omega_{0,ij} = 0).

Compared to the limit in Theorem 1, the GLASSO estimator has the following limiting distribution (Yuan and Lin (2007)): \sqrt{T}(\hat{\Omega}_{\mathrm{glasso}} - \Omega_0) converges in distribution to \arg\min_{U \in R^{p\times p}} V_{\mathrm{glasso}}(U), where

V_{\mathrm{glasso}}(U) = \tfrac{1}{2}\,\mathrm{tr}(U\Sigma_0 U\Sigma_0) + \mathrm{tr}(UW) + \lambda_0 \sum_{i=1}^{p} \sum_{j=1}^{p} g(u_{ij})   (8)

and \sqrt{T}\,\lambda_T → λ_0. Comparing equations (7) and (8), we see that the SGLASSO obtains the same limiting distribution as that of a modified GLASSO with element-wise lasso penalty given by ‖D_p ∘ Ω‖_{1,1}. Since D_p is constructed using the true, unknown values of Ω_0, SGLASSO cannot simply be implemented as a GLASSO. Moreover, from Proposition 1 below, we can further say that SGLASSO is asymptotically equivalent to a variant of GLASSO where the penalty term is \tfrac{1}{2}\sum_{i=1}^{p}\sum_{j=1}^{p}(d_i + d_j)|\Omega_{ij}|. That is, each entry Ω_{ij} is given a penalty factor equal to the sum of the (true) weighted degrees of nodes i and j.

PROPOSITION 1. Let D_p be a p × p matrix whose (i, j) entry is d_j = \sum_{k=1}^{p} |\Omega_{0,jk}|. Then, for any symmetric Ω,

\|D_p \circ \Omega\|_{1,1} = \tfrac{1}{2} \sum_{i=1}^{p} \sum_{j=1}^{p} (d_i + d_j)\,|\Omega_{ij}|.

3.1. Illustrating the asymptotic distribution

We consider a true precision matrix Ω_0 whose non-zero pattern is an AR(1) model. The corresponding graph representing Ω_0 is depicted in Figure 3.

Fig. 3: Graph of Ω_0, a chain (AR(1)) graphical model.

In Figure 9 in the appendix, we illustrate the asymptotic distribution for both GLASSO and SGLASSO, setting λ_0 in Theorem 1 to a fixed positive value. Each circle in the plot represents a draw from the distribution over Ω̂ given by Theorem 1. The precision matrix Ω_0 contains zero entries, one of which involves a node of relatively higher degree; the estimated probability of correctly recovering that zero is higher for SGLASSO than for GLASSO. Therefore, SGLASSO does a better job at estimating a non-link that involves a node of relatively higher degree. In Figure 10 (appendix), we show the asymptotic distribution of Ω̂ without any penalty or regularization, which corresponds to the maximum likelihood estimator of the inverse covariance matrix. Comparing the two figures, we see that both the L_{1,1} and L_{1,2} norms are able to estimate entries of Ω_0 to be exactly zero, while the MLE without regularization fails to recover any sparsity structure.

Yuan and Lin (2007) consider the GLASSO estimator where the diagonals are not penalized, so that V_{\mathrm{glasso}}(U) = \tfrac{1}{2}\,\mathrm{tr}(U\Sigma_0 U\Sigma_0) + \mathrm{tr}(UW) + \lambda_0 \sum_{i=1}^{p} \sum_{j \neq i} g(u_{ij}) in their paper.
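Proposition 1 is a purely algebraic identity and is easy to check numerically. The following is our own sanity check, not code from the paper; it draws an arbitrary symmetric matrix and compares the two sides.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 6
Omega = rng.normal(size=(p, p))
Omega = (Omega + Omega.T) / 2                      # symmetry is all that Proposition 1 requires

d = np.sum(np.abs(Omega), axis=1)                  # weighted degrees d_i = sum_k |Omega_ik|
D_p = np.tile(d, (p, 1))                           # (i, j) entry equals d_j

lhs = np.sum(np.abs(D_p * Omega))                  # ||D_p o Omega||_{1,1}
rhs = 0.5 * np.sum((d[:, None] + d[None, :]) * np.abs(Omega))
print(np.isclose(lhs, rhs))                        # True
```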

4. NUMERICAL RESULTS

We consider ten different graphical models, as depicted in Figures 4 and 5. The models in Figure 4 have a smaller number of nodes than the models in Figure 5. From a given graphical model, we generate the true precision matrix Ω_0 such that Ω_{0,ij} ≠ 0 if and only if there is a link between nodes i and j; otherwise we set Ω_{0,ij} = 0. The non-zero off-diagonal entries are set to a common value, as are the diagonal entries Ω_{0,ii}. For each model, we draw independent datasets from N(0, Ω_0^{-1}). That is, each dataset comprises (X_t)_{t=1}^{T}, where X_t ∈ R^p is randomly drawn from N(0, Ω_0^{-1}); a larger sample size T is used for the models with the larger p. For each dataset, we use our SGLASSO estimator, as well as the GLASSO, to estimate the underlying inverse covariance matrix.

Fig. 4: The five graphical models with the smaller number of nodes (Models 1-5, panels (a)-(e)).

Fig. 5: The five graphical models with the larger number of nodes (Models 6-10, panels (a)-(e)).

Both estimators involve choosing the tuning parameter λ. First, we present the results when cross-validation is used to tune λ. More precisely, at each replication we estimate Ω_0 using both SGLASSO and GLASSO, where the optimal λ is determined using two-fold cross-validation. During each cross-validation, we use the Kullback-Leibler (KL) loss in equation (9), averaged over the two folds, to evaluate predictive accuracy. Equation (9) gives the KL loss between the Ω̂ estimated from the training set and the Ω̂ estimated from the validation set.

\mathrm{KL}(\lambda) = \log\det(\tilde{\Omega}) - \log\det(\hat{\Omega}) + \mathrm{tr}(\hat{\Omega}\tilde{\Omega}^{-1}) - p,   (9)

where Ω̂ is the estimate from the training fold and Ω̃ is the estimate from the validation fold.

Table 1: Averages and standard errors across the replications for Models 1-10. Columns 1(a) and 1(b) report the optimal λ as determined from cross-validation for (a) SGLASSO and (b) GLASSO; columns 2(a) and 2(b) report the average Kullback-Leibler losses; columns 3(a) and 3(b) report the average Frobenius losses; the last two columns report the F1 scores, a metric for how accurately the graphical model is estimated.

We report the simulation results in Table 1. In the table, the (a) columns correspond to our SGLASSO estimator, whereas the (b) columns refer to GLASSO. In columns 1(a) and 1(b), we report the optimal λ as determined by cross-validation, averaged across the replications. In columns 2(a) and 2(b), we report the Kullback-Leibler loss (averaged across replications) between the SGLASSO and GLASSO estimates Ω̂ and the true precision matrix Ω_0, given by

\mathrm{KL}(\hat{\Omega}, \Omega_0) = \log\det(\Omega_0) - \log\det(\hat{\Omega}) + \mathrm{tr}(\hat{\Omega}\Omega_0^{-1}) - p.

In columns 3(a) and 3(b), we report the average Frobenius loss between Ω̂ and Ω_0. In the last two columns of Table 1, we report the accuracy of graph recovery using SGLASSO and GLASSO. The Kullback-Leibler loss and the Frobenius norm may not fully capture how accurately the zeros are recovered. We therefore introduce an additional metric, the F1 score:

F_1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}},

where precision is the ratio of true positives (TP) to all predicted positives (TP + FP), and recall is the ratio of true positives to all actual positives (TP + FN). The notation is further explained in the confusion matrix in Table 2. Alternatively, the F1 score can be written as F_1 = 2\mathrm{TP}/(2\mathrm{TP} + \mathrm{FP} + \mathrm{FN}).
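The two losses reported in Table 1 translate directly into code. The following is our own numpy transcription of the KL loss above and of the Frobenius loss (the cross-validation loss in equation (9) has the same form, with the validation-fold estimate in place of Ω_0); it is an illustration, not the authors' code.

```python
import numpy as np

def kl_loss(Omega_hat, Omega_ref):
    """log det(Omega_ref) - log det(Omega_hat) + tr(Omega_hat @ Omega_ref^{-1}) - p."""
    p = Omega_ref.shape[0]
    _, logdet_ref = np.linalg.slogdet(Omega_ref)
    _, logdet_hat = np.linalg.slogdet(Omega_hat)
    return logdet_ref - logdet_hat + np.trace(Omega_hat @ np.linalg.inv(Omega_ref)) - p

def frobenius_loss(Omega_hat, Omega_ref):
    return np.linalg.norm(Omega_hat - Omega_ref, ord="fro")
```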

Table 2: Confusion matrix.

                        Predicted value is 1     Predicted value is 0
Actual value is 1       True positive (TP)       False negative (FN)
Actual value is 0       False positive (FP)      True negative (TN)

Hence, the F1 score measures the quality of a binary classifier by balancing the precision and the recall of the classifier equally. The larger the F1 score, the better the classifier. The F1 score is commonly used in machine learning to evaluate binary classifiers; for instance, the Yelp competition uses the F1 score as a metric to rank competing models. The F1 score is favored over the metric Accuracy = (TP + TN)/(TP + TN + FP + FN), especially in our current setting where the graphical models are sparse. This is because a model that naively predicts all negatives will obtain a high Accuracy score simply because there are many actual negatives, so that the TN term dominates the Accuracy score.

The conclusion from Table 1 is that SGLASSO achieves significantly lower Kullback-Leibler losses across all models. In terms of Frobenius losses, SGLASSO also slightly outperforms GLASSO. Moreover, in terms of graph accuracy, SGLASSO also outperforms GLASSO, as indicated by the F1 scores. In Table 4 in the appendix, we compare SGLASSO with the version of GLASSO where the diagonals are not penalized; the conclusion that SGLASSO outperforms GLASSO still holds.

4.1. Minimum Kullback-Leibler losses

Now, for each estimator, we choose the λ that minimizes the Kullback-Leibler loss (equation (10)) between the estimated and the true inverse covariance matrix. Therefore, we are comparing the lowest Kullback-Leibler losses that can be achieved by our estimator versus the GLASSO:

\mathrm{KL}(\lambda) = \log\det(\Omega_0) - \log\det(\hat{\Omega}) + \mathrm{tr}(\hat{\Omega}\Omega_0^{-1}) - p.   (10)

We conclude from Table 3 that the superior performance of SGLASSO over GLASSO persists after taking away the randomness due to cross-validation. In particular, the lowest possible KL loss achievable by our estimator is smaller, across all models, than the corresponding KL loss for the GLASSO.

4.2. Recovering high-degree sparsities

Theorem 1 says that the SGLASSO prioritizes recovering the sparsity between nodes that have higher degrees. Here we show clear numerical evidence that SGLASSO performs better than GLASSO in terms of recovering the zeros corresponding to two high-degree nodes. In particular, for each of the models described above, we examine the pair of nodes (i, j) such that d_i + d_j is largest and Ω_{0,ij} = 0, where d_i = \sum_{k=1}^{p} |\Omega_{0,ik}| is the true weighted degree of node i. In some models, more than one pair of nodes attains this maximum; for simplicity, if there are multiple pairs of nodes that maximize d_i + d_j, we pick the pair that comes first when the adjacency matrix is vectorized. In each of the models, we calculate the fraction of replications in which GLASSO and SGLASSO correctly recover Ω_{0,ij} = 0 for the pair with the largest d_i + d_j. We plot this result in Figure 6, which shows that across all models, the probability of correctly recovering Ω_{0,ij} = 0 for high-degree nodes is greater when SGLASSO is used, compared to GLASSO.
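As a concrete reading of Table 2, the F1 score of a recovered graph can be computed from the off-diagonal zero patterns of the estimated and true precision matrices. The helper below is our own illustration (the function name and the tolerance are assumptions), following the definitions above.

```python
import numpy as np

def f1_score_graph(Omega_hat, Omega_0, tol=1e-8):
    """F1 = 2TP / (2TP + FP + FN) over the off-diagonal entries."""
    p = Omega_0.shape[0]
    off = ~np.eye(p, dtype=bool)                 # ignore the diagonal
    pred = np.abs(Omega_hat[off]) > tol          # predicted links
    true = np.abs(Omega_0[off]) > tol            # actual links
    tp = np.sum(pred & true)
    fp = np.sum(pred & ~true)
    fn = np.sum(~pred & true)
    return 2 * tp / (2 * tp + fp + fn)
```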

Table 3: Minimum possible KL and Frobenius losses across all λ, for Models 1-10. The numbers reported are the average KL and Frobenius losses evaluated at the λ that results in the lowest possible values; standard deviations across replications are reported in parentheses. In the last column ("Percentage Dominance"), we report the fraction of replications in which SGLASSO performed better than GLASSO in terms of KL loss at the corresponding λ.

In practical applications, we are often interested in the relationships between influential agents (as measured by their weighted degrees in the true graphical model). This paper suggests that we can better uncover the dependence relationships between influential agents using SGLASSO.

4.3. Computational details

Computing the SGLASSO estimator in equation (2) is a convex optimization problem. More precisely, the problem consists of minimizing a smooth convex term given by −log det Ω + tr(ΩΣ̂), and a non-differentiable convex term corresponding to λ‖Ω‖_{1,2}. Since the objective can be formulated as minimizing the sum of a differentiable convex function and a non-differentiable convex function, we can use the proximal gradient method (Beck and Teboulle (2009)) to compute SGLASSO. In this paper, we use CVX, a Matlab package for specifying and solving convex programs (Grant et al. (2008)). CVX supports convex objective functions that are non-smooth (Grant and Boyd (2008)). CVX also allows us to easily enforce symmetry and positive-definiteness of the matrix minimand, which is needed here. The code is readily available from the authors upon request.
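As noted above, the authors solve the problem with CVX in Matlab. For readers working in Python, the following is a minimal sketch of the same convex program in cvxpy, assuming a conic solver that handles log_det (for example, SCS); it is an illustrative stand-in for, not a copy of, the authors' implementation.

```python
import cvxpy as cp

def sglasso(Sigma_hat, lam):
    """Gaussian quasi-likelihood with the L_{1,2} penalty of equation (2)."""
    p = Sigma_hat.shape[0]
    Omega = cp.Variable((p, p), symmetric=True)
    degrees = cp.sum(cp.abs(Omega), axis=0)          # column-wise absolute sums
    penalty = cp.norm(degrees, 2)                    # ||Omega||_{1,2}
    objective = cp.Maximize(cp.log_det(Omega)
                            - cp.trace(Omega @ Sigma_hat)
                            - lam * penalty)
    cp.Problem(objective).solve()                    # log_det keeps Omega positive definite
    return Omega.value
```

Replacing the penalty with cp.sum(cp.abs(Omega)) gives the GLASSO penalty of equation (3); the tuning parameter lam can then be chosen by the two-fold cross-validation scheme described above.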

Fig. 6: SGLASSO more accurately recovers the zeros belonging to high-degree nodes. Panels (a)-(j) correspond to Models 1-10. Each panel shows the estimated probability of recovering the non-link (i, j) such that (d_i + d_j) is the largest in each model, as a function of the tuning parameter λ. More precisely, for a given λ, we calculate the fraction of replications in which SGLASSO and GLASSO successfully estimate the entry Ω̂_{ij} as zero, where (i, j) is such that Ω_{0,ij} = 0 and d_i + d_j has the highest value among all non-links.

5. EMPIRICAL EXAMPLE

For a real-world empirical application, we use the Grunfeld investment data, a classic dataset that has been used in undergraduate and graduate econometrics. It is widely used as a teaching example in Greene (2012), and the data can be freely downloaded from the companion website of Greene (2012). The data consist of time series of T = 20 yearly observations on p = 10 firms. The three variables are: I_{it}, the real gross investment of firm i in year t; F_{it}, the real market value of the firm; and C_{it}, the real value of its capital stock, such as plant and equipment. For each firm i, we first estimate the following linear equation (separately):

I_{it} = \beta_{0i} + \beta_{1i} F_{it} + \beta_{2i} C_{it} + \epsilon_{it}.

Fig. 7: Network of firms' dependence estimated using SGLASSO. The result using GLASSO is identical. The estimated graphical model exhibits a clear core-periphery structure. The core firms are General Motors, U.S. Steel and General Electric. With this classification, all periphery firms do not have links to each other, but are linked only to the core firms. All core firms are linked to each other.

Fig. 8: Network of firms' dependence, as estimated using the sample inverse covariance matrix of the residuals. This network is a complete graph: every node is linked to all other nodes. There are no zeros in the estimated inverse covariance matrix. Similarly, the sample covariance matrix has no zeros either.

Greene then uses the residual estimates ε̂_{it} to calculate the sample inverse covariance matrix of the firms' investment residuals. The graphical model produced by this sample inverse covariance matrix is in Figure 8. We see that the graphical model representing the firms' error dependence structure is a complete graph, where every node is linked to all other nodes.

We now use our proposed SGLASSO and the conventional GLASSO to estimate the precision matrix of the residuals ε̂_{it}, i = 1, ..., p, t = 1, ..., T. The tuning parameters are determined using a two-fold cross-validation procedure. The result is shown in Figure 7. It turns out that both SGLASSO and GLASSO yield identical graphical models. Moreover, the graphical model exhibits a clear core-periphery structure: there is a group of core firms that are linked to each other as well as to periphery firms. The periphery firms are not linked to each other, but are linked to the core firms. The core firms here are General Motors, U.S. Steel and General Electric. (With this classification, periphery firms do not have links to each other, but only to the core firms.)
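A sketch of this pipeline in Python, reusing the sglasso helper sketched earlier; the array layout (years in rows, firms in columns) and the variable names are our own assumptions for illustration, not the authors' code.

```python
import numpy as np

def firm_residuals(I, F, C):
    """OLS residuals of I_it = b0_i + b1_i*F_it + b2_i*C_it, estimated firm by firm.
    I, F, C are T x p arrays (years in rows, firms in columns)."""
    T, p = I.shape
    resid = np.empty((T, p))
    for i in range(p):
        X = np.column_stack([np.ones(T), F[:, i], C[:, i]])
        beta, *_ = np.linalg.lstsq(X, I[:, i], rcond=None)
        resid[:, i] = I[:, i] - X @ beta
    return resid

# eps = firm_residuals(I, F, C)
# Sigma_hat = np.cov(eps, rowvar=False, bias=True)   # sample covariance of the residuals
# Omega_hat = sglasso(Sigma_hat, lam)                # lam tuned by two-fold cross-validation
```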

Without using SGLASSO or GLASSO, we would not be able to recover this core-periphery structure: neither the sample covariance matrix nor the sample inverse covariance matrix contains any zeros.

6. CONCLUDING REMARKS

We propose a new estimator for precision matrices. This estimator is called Structured Graphical Lasso (SGLASSO), because it uses the L_{1,2}-norm to control for the structure of sparsity in the precision matrix. In contrast, the standard Graphical Lasso (GLASSO) uses the L_{1,1} lasso norm to control for the amount of sparsity, i.e., the number of zeros in the precision matrix. We show that SGLASSO is asymptotically equivalent to an infeasible problem which prioritizes the sparsity recovery of high-degree nodes. In other words, SGLASSO penalizes an entry of the precision matrix proportionally to the influence of that entry. SGLASSO is potentially valuable in real-world settings where we are interested in the relationships between influential agents. In a variety of numerical simulations, SGLASSO is shown to perform better than GLASSO.

APPENDIX

A.1. Proof of Theorem 1

We will use the following important result concerning the asymptotics of the minimizer of a convex random function (Geyer (1994); Hjort and Pollard (2011); Kato (2009); Pollard (1991)).

LEMMA A1. Let f_n(x, ω) : R^d × Ω → R be a random convex function, that is, f_n(x, ·) is a random variable for each x ∈ R^d. Moreover, let x*_n(ω) be the unique minimizer of f_n(x, ω) with respect to x. Suppose that f_n(x, ·) converges in distribution to some random convex function f_∞(x, ·) for each x. Then, if x*_∞(ω) is the unique minimizer of f_∞(x, ω), the random variable x*_n converges in distribution to the random variable x*_∞.

To prove Theorem 1, define the following convex random functions by the re-parameterization U = \sqrt{T}(\Omega - \Omega_0):

V_T(U) = T\left[ -\log\det\left(\Omega_0 + \tfrac{U}{\sqrt{T}}\right) + \mathrm{tr}\left\{\left(\Omega_0 + \tfrac{U}{\sqrt{T}}\right)\hat{\Sigma}\right\} + \lambda_T \left\|\Omega_0 + \tfrac{U}{\sqrt{T}}\right\|_{1,2} + \log\det\Omega_0 - \mathrm{tr}(\Omega_0\hat{\Sigma}) - \lambda_T \|\Omega_0\|_{1,2} \right],   (A1)

and

V_T^{l}(U) = T\left[ -\log\det\left(\Omega_0 + \tfrac{U}{\sqrt{T}}\right) + \mathrm{tr}\left\{\left(\Omega_0 + \tfrac{U}{\sqrt{T}}\right)\hat{\Sigma}\right\} + \lambda_T \left\|D_p \circ \left(\Omega_0 + \tfrac{U}{\sqrt{T}}\right)\right\|_{1,1} + \log\det\Omega_0 - \mathrm{tr}(\Omega_0\hat{\Sigma}) - \lambda_T \|D_p \circ \Omega_0\|_{1,1} \right].   (A2)

These functions are random because of their dependence on the sample covariance matrix Σ̂. By definition, our SGLASSO estimator Ω̂_{λ_T} satisfies

\sqrt{T}(\hat{\Omega}_{\lambda_T} - \Omega_0) = \hat{U}_T := \arg\min_{U} V_T(U).

By Lemma A1, the limit distribution of \sqrt{T}(\hat{\Omega}_{\lambda_T} - \Omega_0) follows if we show

V_T(U) \Rightarrow V(U)   (A3)

as T → ∞ and \sqrt{T}\lambda_T → λ_0, where the random limit function V(U) is defined in equation (7).

Following the argument in the proof of Theorem 1 of Yuan and Lin (2007), we have

\log\det\left(\Omega_0 + \tfrac{U}{\sqrt{T}}\right) - \log\det\Omega_0 = \tfrac{1}{\sqrt{T}}\,\mathrm{tr}(U\Sigma_0) - \tfrac{1}{2T}\,\mathrm{tr}(U\Sigma_0 U\Sigma_0) + o\!\left(\tfrac{1}{T}\right),   (A4)

\mathrm{tr}\left\{\left(\Omega_0 + \tfrac{U}{\sqrt{T}}\right)\hat{\Sigma}\right\} - \mathrm{tr}(\Omega_0\hat{\Sigma}) = \tfrac{1}{\sqrt{T}}\,\mathrm{tr}(U\hat{\Sigma}) = \tfrac{1}{\sqrt{T}}\,\mathrm{tr}(U\Sigma_0) + \tfrac{1}{T}\,\mathrm{tr}\!\left(U\sqrt{T}(\hat{\Sigma} - \Sigma_0)\right).   (A5)

Moreover, we know that when T is large enough,

\left|\Omega_{0,ij} + \tfrac{u_{ij}}{\sqrt{T}}\right| - |\Omega_{0,ij}| = \tfrac{g_{ij}}{\sqrt{T}}, \qquad g_{ij} = u_{ij}\,\mathrm{sign}(\Omega_{0,ij})\,\mathbf{1}(\Omega_{0,ij} \neq 0) + |u_{ij}|\,\mathbf{1}(\Omega_{0,ij} = 0).

Therefore the difference of the squared L_{1,2} penalty terms can be written as

\left\|\Omega_0 + \tfrac{U}{\sqrt{T}}\right\|_{1,2}^{2} - \|\Omega_0\|_{1,2}^{2}
 = \sum_{j=1}^{p}\left[\left(\sum_{i=1}^{p}\left|\Omega_{0,ij} + \tfrac{u_{ij}}{\sqrt{T}}\right|\right)^{2} - \left(\sum_{i=1}^{p}|\Omega_{0,ij}|\right)^{2}\right]
 = \tfrac{1}{T}\sum_{j=1}^{p}\sum_{i,i'} g_{ij}\,g_{i'j} + \tfrac{2}{\sqrt{T}}\sum_{j=1}^{p}\sum_{i,i'} g_{ij}\,|\Omega_{0,i'j}|.   (A6)

The second term of equation (A6) can be rewritten, using the symmetry of Ω_0, as

\sum_{j=1}^{p}\sum_{i,i'} g_{ij}\,|\Omega_{0,i'j}| = \sum_{j=1}^{p}\sum_{i=1}^{p} g_{ij}\left(\sum_{i'=1}^{p}|\Omega_{0,ji'}|\right) = \sum_{j=1}^{p}\sum_{i=1}^{p} g_{ij}\,d_j,

where d_j is the true weighted degree of node j, d_j = \sum_{k=1}^{p}|\Omega_{0,jk}|. Similarly, the difference of the penalty terms in V_T^{l}(U) is

\left\|D_p \circ \left(\Omega_0 + \tfrac{U}{\sqrt{T}}\right)\right\|_{1,1} - \|D_p \circ \Omega_0\|_{1,1} = \tfrac{1}{\sqrt{T}}\sum_{i=1}^{p}\sum_{j=1}^{p} g_{ij}\,d_j.

Combining (A4), (A5) and (A6), collecting terms, and using \sqrt{T}\lambda_T \to \lambda_0, in the limit as T → ∞ we then have

V_T(U) = \tfrac{1}{2}\,\mathrm{tr}(U\Sigma_0 U\Sigma_0) + \mathrm{tr}\!\left(U\sqrt{T}(\hat{\Sigma} - \Sigma_0)\right) + \lambda_0 \sum_{i=1}^{p}\sum_{j=1}^{p} g_{ij}\,d_j + o_p(1).

Similarly, in the limit as T → ∞, we have

V_T^{l}(U) = \tfrac{1}{2}\,\mathrm{tr}(U\Sigma_0 U\Sigma_0) + \mathrm{tr}\!\left(U\sqrt{T}(\hat{\Sigma} - \Sigma_0)\right) + \lambda_0 \sum_{i=1}^{p}\sum_{j=1}^{p} g_{ij}\,d_j + o_p(1).

Therefore, we have the first result of the theorem. Under Assumption 1, \sqrt{T}(\hat{\Sigma} - \Sigma_0) \Rightarrow W, and the second result of the theorem follows by the continuous mapping theorem and Lemma A1.

A.2. Proof of Proposition 1

Proof. Since Ω is symmetric, we have

\|D_p \circ \Omega\|_{1,1} = \sum_{i=1}^{p}\sum_{j=1}^{p} d_j\,|\Omega_{ij}|
 = \sum_{i=1}^{p}\sum_{j<i} d_j\,|\Omega_{ij}| + \sum_{j=1}^{p}\sum_{i>j} d_j\,|\Omega_{ij}| + \sum_{i=1}^{p} d_i\,|\Omega_{ii}|
 = \sum_{i=1}^{p}\sum_{j<i} d_j\,|\Omega_{ij}| + \sum_{i=1}^{p}\sum_{j<i} d_i\,|\Omega_{ij}| + \sum_{i=1}^{p} d_i\,|\Omega_{ii}|
 = \sum_{i=1}^{p}\sum_{j<i} (d_i + d_j)\,|\Omega_{ij}| + \tfrac{1}{2}\sum_{i=1}^{p} (d_i + d_i)\,|\Omega_{ii}|   (A7)
 = \tfrac{1}{2}\sum_{i=1}^{p}\sum_{j=1}^{p} (d_i + d_j)\,|\Omega_{ij}|.   (A8)

A.3. Additional tables and figures

Fig. 9: Asymptotic distributions of the two estimators, with a common λ_0 > 0. The left-hand side is GLASSO, the right-hand side is SGLASSO. For each estimator, the estimated probabilities P̂r(Ω̂_{ij} = 0) of recovering the zero entries of Ω_0 are reported; the probability of recovering the zero entry that involves the higher-degree node is larger for SGLASSO. The standard deviations of both estimators are comparable.

Fig. 10: Asymptotic distribution of the maximum likelihood estimator (MLE). This corresponds to the case where λ_0 = 0.

Table 4: Identical to Table 1, except that we use the version of GLASSO where the diagonals are not penalized. Averages and standard errors across the replications: the first pair of columns reports the optimal λ as determined from cross-validation for (a) SGLASSO and (b) GLASSO, the second pair the average Kullback-Leibler losses, and the last pair the average Frobenius losses.

REFERENCES

Banerjee, O., El Ghaoui, L., and d'Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research.
Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences.

Cai, T., Liu, W., and Luo, X. (2011). A constrained L1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association.
Cai, T. T., Liu, W., Zhou, H. H., et al. (2016). Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation. The Annals of Statistics.
Fan, J., Feng, Y., and Wu, Y. (2009). Network exploration via the adaptive lasso and SCAD penalties. The Annals of Applied Statistics.
Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association.
Fan, J., Liao, Y., and Liu, H. (2015). An overview on the estimation of large covariance and precision matrices. arXiv preprint.
Fan, J., Liao, Y., and Liu, H. (2016). An overview of the estimation of large covariance and precision matrices. The Econometrics Journal.
Fan, Y. and Lv, J. (2015). Innovated scalable efficient estimation in ultra-large Gaussian graphical models. arXiv preprint.
Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics.
Friedman, J., Hastie, T., and Tibshirani, R. (2010). A note on the group lasso and a sparse group lasso. arXiv preprint.
Geyer, C. J. (1994). On the asymptotics of constrained M-estimation. The Annals of Statistics.
Giudici, P. and Spelta, A. (2016). Graphical network models for international financial flows. Journal of Business & Economic Statistics.
Grant, M., Boyd, S., and Ye, Y. (2008). CVX: Matlab software for disciplined convex programming.
Grant, M. C. and Boyd, S. P. (2008). Graph implementations for nonsmooth convex programs. In Recent Advances in Learning and Control. Springer.
Greene, W. H. (2012). Econometric Analysis. Prentice Hall, 7th edition.
Hastie, T., Tibshirani, R., and Wainwright, M. (2015). Statistical Learning with Sparsity. CRC Press.
Hjort, N. L. and Pollard, D. (2011). Asymptotics for minimisers of convex processes. arXiv preprint.
Jackson, M. O. (2008). Social and Economic Networks. Princeton University Press.
Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics.
Kato, K. (2009). Asymptotics for argmin processes: Convexity arguments. Journal of Multivariate Analysis.
Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. The Annals of Statistics.
Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics.
Peng, J., Wang, P., Zhou, N., and Zhu, J. (2009). Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association.
Pollard, D. (1991). Asymptotics for least absolute deviation regression estimators. Econometric Theory.
Ren, Z., Sun, T., Zhang, C.-H., Zhou, H. H., et al. (2015). Asymptotic normality and optimalities in estimation of large Gaussian graphical models. The Annals of Statistics.
Rothman, A. J., Bickel, P. J., Levina, E., Zhu, J., et al. (2008). Sparse permutation invariant covariance estimation. Electronic Journal of Statistics.
Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. Wiley.
Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B.
Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika.


More information

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania Submitted to the Annals of Statistics DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING By T. Tony Cai and Linjun Zhang University of Pennsylvania We would like to congratulate the

More information

Convex relaxation for Combinatorial Penalties

Convex relaxation for Combinatorial Penalties Convex relaxation for Combinatorial Penalties Guillaume Obozinski Equipe Imagine Laboratoire d Informatique Gaspard Monge Ecole des Ponts - ParisTech Joint work with Francis Bach Fête Parisienne in Computation,

More information

Sparse Permutation Invariant Covariance Estimation

Sparse Permutation Invariant Covariance Estimation Sparse Permutation Invariant Covariance Estimation Adam J. Rothman University of Michigan Ann Arbor, MI 48109-1107 e-mail: ajrothma@umich.edu Peter J. Bickel University of California Berkeley, CA 94720-3860

More information

Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models

Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider variable

More information

MATH 829: Introduction to Data Mining and Analysis Graphical Models II - Gaussian Graphical Models

MATH 829: Introduction to Data Mining and Analysis Graphical Models II - Gaussian Graphical Models 1/13 MATH 829: Introduction to Data Mining and Analysis Graphical Models II - Gaussian Graphical Models Dominique Guillot Departments of Mathematical Sciences University of Delaware May 4, 2016 Recall

More information

Sparse Covariance Matrix Estimation with Eigenvalue Constraints

Sparse Covariance Matrix Estimation with Eigenvalue Constraints Sparse Covariance Matrix Estimation with Eigenvalue Constraints Han Liu and Lie Wang 2 and Tuo Zhao 3 Department of Operations Research and Financial Engineering, Princeton University 2 Department of Mathematics,

More information

Log Covariance Matrix Estimation

Log Covariance Matrix Estimation Log Covariance Matrix Estimation Xinwei Deng Department of Statistics University of Wisconsin-Madison Joint work with Kam-Wah Tsui (Univ. of Wisconsin-Madsion) 1 Outline Background and Motivation The Proposed

More information

Sparse and Robust Optimization and Applications

Sparse and Robust Optimization and Applications Sparse and and Statistical Learning Workshop Les Houches, 2013 Robust Laurent El Ghaoui with Mert Pilanci, Anh Pham EECS Dept., UC Berkeley January 7, 2013 1 / 36 Outline Sparse Sparse Sparse Probability

More information

Sparse Graph Learning via Markov Random Fields

Sparse Graph Learning via Markov Random Fields Sparse Graph Learning via Markov Random Fields Xin Sui, Shao Tang Sep 23, 2016 Xin Sui, Shao Tang Sparse Graph Learning via Markov Random Fields Sep 23, 2016 1 / 36 Outline 1 Introduction to graph learning

More information

Learning Local Dependence In Ordered Data

Learning Local Dependence In Ordered Data Journal of Machine Learning Research 8 07-60 Submitted 4/6; Revised 0/6; Published 4/7 Learning Local Dependence In Ordered Data Guo Yu Department of Statistical Science Cornell University, 73 Comstock

More information

Exact Hybrid Covariance Thresholding for Joint Graphical Lasso

Exact Hybrid Covariance Thresholding for Joint Graphical Lasso Exact Hybrid Covariance Thresholding for Joint Graphical Lasso Qingming Tang Chao Yang Jian Peng Jinbo Xu Toyota Technological Institute at Chicago Massachusetts Institute of Technology Abstract. This

More information

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute

More information

Variable Selection for Highly Correlated Predictors

Variable Selection for Highly Correlated Predictors Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu Department of Statistics, University of Illinois at Urbana-Champaign WHOA-PSI, Aug, 2017 St. Louis, Missouri 1 / 30 Background Variable

More information

Journal of Multivariate Analysis. Consistency of sparse PCA in High Dimension, Low Sample Size contexts

Journal of Multivariate Analysis. Consistency of sparse PCA in High Dimension, Low Sample Size contexts Journal of Multivariate Analysis 5 (03) 37 333 Contents lists available at SciVerse ScienceDirect Journal of Multivariate Analysis journal homepage: www.elsevier.com/locate/jmva Consistency of sparse PCA

More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

Theory and Applications of High Dimensional Covariance Matrix Estimation

Theory and Applications of High Dimensional Covariance Matrix Estimation 1 / 44 Theory and Applications of High Dimensional Covariance Matrix Estimation Yuan Liao Princeton University Joint work with Jianqing Fan and Martina Mincheva December 14, 2011 2 / 44 Outline 1 Applications

More information

TECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection

TECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection DEPARTMENT OF STATISTICS University of Wisconsin 1210 West Dayton St. Madison, WI 53706 TECHNICAL REPORT NO. 1091r April 2004, Revised December 2004 A Note on the Lasso and Related Procedures in Model

More information

Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables

Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables LIB-MA, FSSM Cadi Ayyad University (Morocco) COMPSTAT 2010 Paris, August 22-27, 2010 Motivations Fan and Li (2001), Zou and Li (2008)

More information

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent

More information

Lecture 25: November 27

Lecture 25: November 27 10-725: Optimization Fall 2012 Lecture 25: November 27 Lecturer: Ryan Tibshirani Scribes: Matt Wytock, Supreeth Achar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have

More information

Learning Markov Network Structure using Brownian Distance Covariance

Learning Markov Network Structure using Brownian Distance Covariance arxiv:.v [stat.ml] Jun 0 Learning Markov Network Structure using Brownian Distance Covariance Ehsan Khoshgnauz May, 0 Abstract In this paper, we present a simple non-parametric method for learning the

More information

NUCLEAR NORM PENALIZED ESTIMATION OF INTERACTIVE FIXED EFFECT MODELS. Incomplete and Work in Progress. 1. Introduction

NUCLEAR NORM PENALIZED ESTIMATION OF INTERACTIVE FIXED EFFECT MODELS. Incomplete and Work in Progress. 1. Introduction NUCLEAR NORM PENALIZED ESTIMATION OF IERACTIVE FIXED EFFECT MODELS HYUNGSIK ROGER MOON AND MARTIN WEIDNER Incomplete and Work in Progress. Introduction Interactive fixed effects panel regression models

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Robust Inverse Covariance Estimation under Noisy Measurements

Robust Inverse Covariance Estimation under Noisy Measurements Jun-Kun Wang Intel-NTU, National Taiwan University, Taiwan Shou-de Lin Intel-NTU, National Taiwan University, Taiwan WANGJIM123@GMAIL.COM SDLIN@CSIE.NTU.EDU.TW Abstract This paper proposes a robust method

More information

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models Jingyi Jessica Li Department of Statistics University of California, Los

More information

Scalable robust hypothesis tests using graphical models

Scalable robust hypothesis tests using graphical models Scalable robust hypothesis tests using graphical models Umamahesh Srinivas ipal Group Meeting October 22, 2010 Binary hypothesis testing problem Random vector x = (x 1,...,x n ) R n generated from either

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley A. d Aspremont, INFORMS, Denver,

More information

Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models

Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models Han Liu Kathryn Roeder Larry Wasserman Carnegie Mellon University Pittsburgh, PA 15213 Abstract A challenging

More information

Statistica Sinica Preprint No: SS R2

Statistica Sinica Preprint No: SS R2 Statistica Sinica Preprint No: SS-2017-0076.R2 Title Graph Estimation for Matrix-variate Gaussian Data Manuscript ID SS-2017-0076.R2 URL http://www.stat.sinica.edu.tw/statistica/ DOI 10.5705/ss.202017.0076

More information

Generalized Elastic Net Regression

Generalized Elastic Net Regression Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1

More information

Estimating Structured High-Dimensional Covariance and Precision Matrices: Optimal Rates and Adaptive Estimation

Estimating Structured High-Dimensional Covariance and Precision Matrices: Optimal Rates and Adaptive Estimation Estimating Structured High-Dimensional Covariance and Precision Matrices: Optimal Rates and Adaptive Estimation T. Tony Cai 1, Zhao Ren 2 and Harrison H. Zhou 3 University of Pennsylvania, University of

More information

Posterior convergence rates for estimating large precision. matrices using graphical models

Posterior convergence rates for estimating large precision. matrices using graphical models Biometrika (2013), xx, x, pp. 1 27 C 2007 Biometrika Trust Printed in Great Britain Posterior convergence rates for estimating large precision matrices using graphical models BY SAYANTAN BANERJEE Department

More information

Big & Quic: Sparse Inverse Covariance Estimation for a Million Variables

Big & Quic: Sparse Inverse Covariance Estimation for a Million Variables for a Million Variables Cho-Jui Hsieh The University of Texas at Austin NIPS Lake Tahoe, Nevada Dec 8, 2013 Joint work with M. Sustik, I. Dhillon, P. Ravikumar and R. Poldrack FMRI Brain Analysis Goal:

More information

Sparse Covariance Selection using Semidefinite Programming

Sparse Covariance Selection using Semidefinite Programming Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

Adaptive First-Order Methods for General Sparse Inverse Covariance Selection

Adaptive First-Order Methods for General Sparse Inverse Covariance Selection Adaptive First-Order Methods for General Sparse Inverse Covariance Selection Zhaosong Lu December 2, 2008 Abstract In this paper, we consider estimating sparse inverse covariance of a Gaussian graphical

More information

Causal Inference: Discussion

Causal Inference: Discussion Causal Inference: Discussion Mladen Kolar The University of Chicago Booth School of Business Sept 23, 2016 Types of machine learning problems Based on the information available: Supervised learning Reinforcement

More information

8.1 Concentration inequality for Gaussian random matrix (cont d)

8.1 Concentration inequality for Gaussian random matrix (cont d) MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration

More information

CSC 576: Variants of Sparse Learning

CSC 576: Variants of Sparse Learning CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in

More information

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28 Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:

More information

CS295: Convex Optimization. Xiaohui Xie Department of Computer Science University of California, Irvine

CS295: Convex Optimization. Xiaohui Xie Department of Computer Science University of California, Irvine CS295: Convex Optimization Xiaohui Xie Department of Computer Science University of California, Irvine Course information Prerequisites: multivariate calculus and linear algebra Textbook: Convex Optimization

More information

Lasso: Algorithms and Extensions

Lasso: Algorithms and Extensions ELE 538B: Sparsity, Structure and Inference Lasso: Algorithms and Extensions Yuxin Chen Princeton University, Spring 2017 Outline Proximal operators Proximal gradient methods for lasso and its extensions

More information