A Penalized Regression Model for the Joint Estimation of eQTL Associations and Gene Network Structure


Micol Marchetti-Bowick
Machine Learning Department, Carnegie Mellon University

Abstract

Background: A critical task in the study of biological systems is understanding how gene expression is regulated within the cell. This problem is typically divided into multiple separate tasks, including performing eQTL mapping to identify SNP-gene relationships and estimating gene network structure to identify gene-gene relationships.

Aim: In this work, we pursue a holistic approach to discovering the patterns of gene regulation in the cell. We present a new method for jointly performing eQTL mapping and gene network estimation while encouraging a transfer of information between the two tasks.

Data: We evaluate our approach on both synthetic data and on a real yeast eQTL dataset that contains 1,157 SNP genotypes and 1,409 gene expression measurements for 114 yeast samples.

Methods: To construct a unified model for jointly performing eQTL mapping and gene network inference, we formulate the problem as a multiple-output regression task in which we aim to learn the regression coefficients while simultaneously estimating the conditional independence relationships among the set of response variables. The approach we develop uses structured sparsity penalties to encourage the sharing of information between the regression coefficients and the output network in a mutually beneficial way. Our model, the inverse-covariance-fused lasso, is formulated as a biconvex optimization problem that we solve via alternating minimization. We derive new, efficient optimization routines, based on existing state-of-the-art methods, to solve each convex sub-problem.

Results: We demonstrate the value of our approach by applying our method to both simulated data and a real yeast eQTL dataset. Experimental results demonstrate that our approach outperforms a large number of existing methods on the recovery of the true sparse structure of both the eQTL associations and the gene network.

Conclusions: We show that the inverse-covariance-fused lasso can be used to perform joint eQTL mapping and gene network estimation on a yeast dataset, yielding more biologically coherent results than previous work. Furthermore, the same problem setting appears in many different applications, and therefore our model can be deployed in a wide range of domains.

1 Introduction

A critical task in the study of biological systems is understanding how gene expression is regulated within the cell. Although this problem has been studied extensively over the past few decades, it has recently gained momentum due to the rapid advancement in techniques for high-throughput data acquisition. Within this broad task, two sub-problems of particular interest are (a) understanding how various genetic loci regulate gene expression, a problem known as eQTL mapping (Rockman and Kruglyak, 2006), and (b) determining which genes have a direct influence on the expression of other genes, a problem known as gene network estimation (Gardner and Faith, 2005). Prior work on learning regulatory associations has largely treated eQTL mapping and gene network estimation as completely separate problems.

In this work, we pursue a holistic approach to discovering the patterns of gene regulation in the cell by integrating eQTL mapping and gene network estimation into a single model and performing joint inference. Specifically, given a dataset that contains both genotype information for a set of single nucleotide polymorphisms (SNPs) and mRNA expression measurements for a set of genes, we aim to simultaneously learn the SNP-gene and gene-gene relationships. The key element of our approach is that we enable the transfer of knowledge between these two tasks in order to yield more accurate solutions to both problems. In particular, we assume that two genes that are tightly linked in a regulatory network are likely to be associated with similar sets of SNPs in an eQTL map, and vice versa. This allows us to use information about gene-gene relationships to learn more accurate eQTL associations, and similarly to use information about SNP-gene relationships to learn a more accurate gene network.

We construct a unified model for this problem by formulating it as a multiple-output regression task in which we jointly estimate the regression coefficients and the inverse covariance structure among the response variables. Specifically, given SNPs x = (x_1, ..., x_p) and genes y = (y_1, ..., y_q), our goal is to regress y on x and simultaneously estimate the inverse covariance of y. Under this model, the matrix of regression coefficients encodes the SNP-gene relationships in the eQTL map, whereas the inverse covariance matrix captures the gene-gene relationships in the gene network. In order to ensure that information is transferred between the two components of the model, we incorporate a regularization penalty that explicitly encourages pairs of genes that have a high weight in the inverse covariance matrix to also have similar regression coefficient values. This structured penalty enables the two estimates to learn from one another as well as from the data.

A large number of techniques have been developed to address eQTL mapping and gene network estimation in isolation. The traditional approach to eQTL mapping is to examine each SNP-gene pair independently and perform a univariate statistical test to determine whether an association exists between the two. Recently, a series of more complex multivariate models have been utilized for this task in order to jointly capture the effects of several SNPs on one gene (Michaelson et al., 2010). Going one step further, a few approaches have been developed that jointly model multiple SNPs and multiple genes (Kim and Xing, 2009; Kim et al., 2012).
Conversely, the simplest method for gene network inference is to construct a graph in which each edge is weighted by the marginal correlation between the corresponding genes. Another related method that has recently gained popularity, known equivalently as Gaussian graphical model estimation or inverse covariance estimation, produces a network in which the edges are instead weighted by the conditional correlations (Dempster, 1972). Other approaches include using Bayesian networks or ordinary differential equations to model the relationships between genes (Marbach et al., 2010).

In this work, we will use a multiple-output (also called multi-task) regression model to perform eQTL mapping and incorporate a Gaussian graphical model over the genes in order to simultaneously infer the gene network. However, it is well known that model estimation is challenging in the

high-dimensional setting in which we have more variables in the model than samples from which to learn. This is precisely the setting frequently encountered in genomics, where we see datasets with at most a few hundred samples but as many as a million SNPs and thousands of genes. As a result, we will incorporate several regularization penalties in order to facilitate model estimation.

The challenge of high-dimensional data has already inspired a remarkable number of useful methods for penalized multi-task regression. Parameter estimation can be improved by leveraging sparse regularizers to encourage the outputs to share an underlying representation. For example, mixed-norm penalties such as the ℓ1,2 or ℓ1,∞ norm can be used to encourage all outputs to have nonzero coefficients for the same set of inputs (Obozinski et al., 2008). In other cases, structured sparsity can be used to encourage similarities only among responses that are known to be related. Examples of such approaches include the group lasso (Yuan and Lin, 2006), which encourages groups of related outputs to have nonzero coefficients for the same subset of inputs, and the graph-guided fused lasso (Kim and Xing, 2009), which encourages pairs of outputs that are connected in a graph to have similar coefficient values. These penalties leverage prior knowledge of the relationships among the outputs (e.g., group or graph structures) to enforce similarity constraints.

The principal limitation of using such structured sparsity in multi-task regression is that it typically requires extensive prior knowledge of the relationships among the outputs, which is not always readily available. To circumvent this, another class of models has been developed that jointly learns the regression coefficients along with the output dependency structure (Rothman et al., 2010; Zhang and Yeung, 2010; Lee and Liu, 2012; Sohn and Kim, 2012; Rai et al., 2012). However, none of these approaches use a regularization penalty to explicitly encourage shared structure between the estimate of the regression parameters and the output network. Furthermore, the majority of these methods do not learn the covariance structure of the outputs y but rather the conditional covariance of the outputs given the inputs, y | x. This can be interpreted as the covariance of the noise matrix, and does not capture the true relationships between outputs.

In high-dimensional settings, inverse covariance estimation models can also benefit from penalties that induce sparsity or encourage a particular structure. For example, Tan et al. (2014) use a row-column overlap norm penalty to encourage a network structure with hubs. When the goal is to learn a series of related networks for different conditions or time points, regularization can be used to encourage similarities between the networks (Mohan et al., 2014; Peterson et al., 2015).

Here we present a novel approach that jointly performs eQTL mapping and gene network inference while simultaneously leveraging structured sparsity.
In particular, our work has three significant contributions: (1) we construct a novel multiple-output regression model that jointly estimates the regression coefficients and the output network while encouraging shared structure; (2) we develop an efficient alternating minimization procedure for our model, which has a biconvex objective, and derive nontrivial extensions to two state-of-the-art optimization techniques to solve each sub-problem; (3) we perform synthetic and real data experiments to demonstrate that our approach leads to more accurate estimation of the sparsity pattern of the regression coefficients, more accurate estimation of the sparse network structure, and a lower prediction error than baseline methods. To the best of our knowledge, there are no existing approaches that jointly estimate regression coefficients and output relationships while also incorporating regularization penalties to encourage the explicit sharing of information between the two estimates.

The remainder of this article is organized as follows. We begin by providing some background in Section 2, and then present our model and optimization technique in Sections 3 and 4. Next we demonstrate the value of our approach in Sections 5 and 6 by applying our model to both synthetic and real data, and comparing the results to several baselines. Finally, we conclude in Section 7 by discussing the implications of our findings.

2 Background

Before introducing our approach, we provide some background on the problems of penalized multiple-output regression and sparse inverse covariance estimation, which will form the building blocks of our unified model. In what follows, we assume X is an n-by-p dimensional matrix of SNP genotypes, which we also call inputs, and Y is an n-by-q dimensional matrix of gene expression values, which we also call outputs. Here n is the number of samples, p is the number of SNPs, and q is the number of genes. The element x_ij ∈ {0, 1, 2} represents the genotype value of sample i at SNP j, encoded as 0 for two copies of the major allele, 1 for one copy of the minor allele, and 2 for two copies of the minor allele. Similarly, y_ik ∈ R represents the expression value of sample i in gene k. We assume that the expression values for each gene are mean-centered.

2.1 Multiple-Output Lasso

Given input matrix X and output matrix Y, the standard ℓ1-penalized multiple-output regression problem, also known as the multi-task lasso (Tibshirani, 1996), is given by

    min_B (1/n) ||Y - XB||_F^2 + λ ||B||_1    (1)

where B is a p-by-q dimensional matrix and β_jk is the regression coefficient that maps SNP x_j to gene y_k. Here ||B||_1 is an ℓ1-norm penalty that induces sparsity among the estimated coefficients, and λ is a regularization parameter that controls the degree of sparsity. The objective function given above is derived from the penalized negative log likelihood of a multivariate Gaussian distribution, assuming y | x ~ N(x^T B, ε^2 I), where we let ε^2 = 1 for simplicity. Although this problem is formulated in a multiple-output framework, the ℓ1-norm penalty merely encourages sparsity, and does not enforce any shared structure between the regression coefficients of different outputs. As a result, the objective function given in (1) decomposes into q independent regression problems.

2.2 Graph-Guided Fused Lasso

Given a weighted graph W ∈ R^{q×q} that encodes a set of pairwise relationships among the outputs, we can modify the regression problem by imposing an additional fusion penalty that encourages genes y_k and y_m to have similar parameter vectors β_k and β_m when the weight of the edge connecting them is large. This problem is known as the graph-guided fused lasso (Kim et al., 2009; Kim and Xing, 2009; Chen et al., 2010) and is given by

    min_B (1/n) ||Y - XB||_F^2 + λ ||B||_1 + γ Σ_{(k,m)} |w_km| ||β_k - sgn(w_km) β_m||_1    (2)

Here the ℓ1-norm penalty again encourages sparsity in the estimated coefficient matrix. In contrast, the second penalty term, known as a graph-guided fusion penalty, encourages similarity among the regression parameters for all pairs of outputs. The weight of each term in the fusion penalty is dictated by w_km, which encodes the strength of the relationship between y_k and y_m. Furthermore, the sign of w_km determines whether to encourage a positive or negative relationship between parameters; if w_km > 0 (i.e., genes y_k and y_m are positively correlated), then we encourage β_k to be equal to β_m, but if w_km < 0 (i.e., genes y_k and y_m are negatively correlated), we encourage β_k to be equal to -β_m. If w_km = 0, then genes y_k and y_m are unrelated, and so we don't fuse their respective regression coefficients.
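As a concrete illustration of the two objectives above, the following sketch (not code from the paper; the variable names X, Y, B, W and the helper functions are assumptions made for illustration) evaluates the multi-task lasso objective in (1) and the graph-guided fused lasso objective in (2) for a given coefficient matrix.

```python
import numpy as np

def multitask_lasso_objective(X, Y, B, lam):
    """Eq. (1): squared Frobenius loss plus an l1 penalty on B."""
    n = X.shape[0]
    squared_loss = np.sum((Y - X @ B) ** 2) / n        # (1/n) ||Y - XB||_F^2
    return squared_loss + lam * np.sum(np.abs(B))      # + lambda ||B||_1

def graph_guided_fused_lasso_objective(X, Y, B, W, lam, gamma):
    """Eq. (2): adds a fusion penalty over output pairs weighted by the graph W."""
    obj = multitask_lasso_objective(X, Y, B, lam)
    q = B.shape[1]
    for k in range(q):
        for m in range(k + 1, q):
            w = W[k, m]
            if w != 0:
                obj += gamma * abs(w) * np.sum(np.abs(B[:, k] - np.sign(w) * B[:, m]))
    return obj
```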

2.3 Sparse Inverse Covariance Estimation

In the graph-guided fused lasso model defined in (2), the graph W must be known ahead of time. However, it is also possible to learn a network over the set of genes. One way to do this is to estimate their pairwise conditional independence relationships. If we assume y ~ N(μ, Σ), where we let μ = 0 for simplicity, then these conditional independencies are encoded in the inverse covariance matrix, or precision matrix, defined as Θ = Σ^{-1}. We can obtain a sparse estimate of the precision matrix using the graphical lasso (Friedman et al., 2008), given by

    min_Θ (1/n) tr(Y^T Y Θ) - log det(Θ) + λ ||Θ||_1    (3)

This objective is again derived from the penalized negative log likelihood of a Gaussian distribution, where this time the ℓ1 penalty term encourages sparsity among the entries of the precision matrix.

3 The Inverse-Covariance-Fused Lasso

In this section, we introduce a new approach for jointly estimating the coefficients in a multiple-output regression problem and the edges of a network over the regression outputs. We apply this technique to the problem of simultaneously learning an eQTL map and a gene regulatory network from genome (SNP) data and transcriptome (gene expression) data. Although we focus exclusively on this application, the same problem formulation appears in other domains as well, such as the task of modeling the dependencies between a set of economic indicators and the stock prices of various companies while also understanding the relationships among the companies.

3.1 A Joint Regression and Network Estimation Model

Given SNPs x ∈ R^p and genes y ∈ R^q, in order to jointly model the p-by-q regression parameter matrix B and the q-by-q inverse covariance matrix Θ, we begin with two core modeling assumptions,

    x ~ N(0, T)    (4)
    y | x ~ N(x^T B, E)    (5)

where T is the covariance of x and E is the conditional covariance of y | x. Given the above model, we can also derive the marginal distribution of y. To do this, we first use the fact that the marginal distribution p(y) is Gaussian. We can then use the law of total expectation and the law of total variance to derive the mean and covariance of y, as follows.

    E_y(y) = E_x(E_{y|x}(y | x)) = 0    (6)
    Cov_y(y) = E_x(Cov_{y|x}(y | x)) + Cov_x(E_{y|x}(y | x)) = E + B^T T B    (7)

Using these facts, we conclude that the distribution of y is given by

    y ~ N(0, Θ^{-1})    (8)

where Θ^{-1} = E + B^T T B denotes the marginal covariance of y. This allows us to explicitly relate Θ, the inverse covariance of y, to B, the matrix of regression parameters (we refer the reader to Equation B.44 of Appendix B in Bishop, 2006). Lastly, we simplify our model by assuming T = τ^2 I_{p×p} and E = ε^2 I_{q×q}. With this change, the relationship between B and Θ can be summarized as Θ^{-1} ∝ B^T B, because B is now the only term that contributes to the off-diagonal entries of Θ^{-1} and hence determines the inverse covariance structure among the genes.
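The marginal covariance identity in (7)-(8) can be checked numerically. The sketch below (toy dimensions, a fixed random seed, and the variable names are assumptions, not taken from the paper) samples from the generative model in (4)-(5) and compares the empirical covariance of y against E + B^T T B.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n = 5, 4, 200000
tau2, eps2 = 0.5, 0.3
T = tau2 * np.eye(p)                                 # covariance of x
E = eps2 * np.eye(q)                                 # conditional covariance of y given x
B = rng.normal(size=(p, q))

X = rng.multivariate_normal(np.zeros(p), T, size=n)           # x ~ N(0, T)
Y = X @ B + rng.multivariate_normal(np.zeros(q), E, size=n)   # y | x ~ N(x^T B, E)

empirical = np.cov(Y, rowvar=False)
analytic = E + B.T @ T @ B                           # marginal covariance from Eq. (7)
print(np.max(np.abs(empirical - analytic)))          # small for large n
```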

3.2 Estimating Model Parameters with a Fusion Penalty

Now that we have a model that explicitly captures B and Θ, we want to jointly estimate these parameters from the data while encouraging the relationship Θ^{-1} ∝ B^T B. To do this, we formulate our model as a convex optimization problem with an objective function of the form

    loss_{y|x}(B) + loss_y(Θ) + penalty(B, Θ)    (9)

where loss_{y|x}(B) is a loss function derived from the negative log likelihood of y | x, loss_y(Θ) is a loss function derived from the negative log likelihood of y, and penalty(B, Θ) is a penalty term that encourages shared structure between the estimates of B and Θ. Given n i.i.d. observations of x and y, let X be a matrix that contains one observation of x per row and let Y be a matrix that contains one observation of y per row. Then we define the inverse-covariance-fused lasso optimization problem as

    min_{B,Θ} (1/n) ||Y - XB||_F^2 + (1/n) tr(Y^T Y Θ) - log det(Θ) + λ_1 ||B||_1 + λ_2 ||Θ||_1 + γ Σ_{(k,m)} |θ_km| ||β_k + sgn(θ_km) β_m||_1    (10)

This objective effectively boils down to a combination of problems (1) and (3), with the addition of a graph-guided fusion penalty from (2) to encourage transfer learning between the estimates of B and Θ. We deconstruct the above objective by describing the role of each term in the model:

- The first term (1/n) ||Y - XB||_F^2 is the standard squared error loss for multiple-output regression, and is derived from the log likelihood of y | x ~ N(x^T B, ε^2 I). Its role is to encourage the coefficients B to map X to Y.
- The second term (1/n) tr(Y^T Y Θ) - log det(Θ) is the loss function for inverse covariance estimation over Y, and is derived from the log likelihood of y ~ N(0, Θ^{-1}). Its role is to encourage the network Θ to reflect the partial correlations among the outputs.
- The third term λ_1 ||B||_1 = λ_1 Σ_{j,k} |β_jk| is an ℓ1-norm penalty over the matrix of regression coefficients that induces sparsity in B.
- The fourth term λ_2 ||Θ||_1 = λ_2 Σ_{k,m} |θ_km| is an ℓ1-norm penalty over the precision matrix that induces sparsity in Θ.
- The final term γ Σ_{(k,m)} |θ_km| ||β_k + sgn(θ_km) β_m||_1 is a graph-guided fusion penalty that encourages similarity between the coefficients of closely related outputs; specifically, when y_k and y_m have a positive partial correlation and β_jk ≠ β_jm for any j, it imposes a penalty proportional to |θ_km|, and when y_k and y_m have a negative partial correlation and β_jk ≠ -β_jm for any j, it imposes a penalty proportional to |θ_km|.

Note that θ_km is negatively proportional to the partial correlation between y_k and y_m, meaning that a negative value of θ_km indicates a positive partial correlation and a positive value of θ_km indicates a negative partial correlation. The partial correlation coefficient between y_j and y_k is given by ρ_jk = -θ_jk / sqrt(θ_jj θ_kk) and is defined as the correlation between the errors of the best linear predictions of y_j and y_k when conditioned on the covariates {y_m : m ≠ j, k} (see, e.g., Peng et al. (2009)). This explains why the sign is flipped in the fusion penalty in (10) relative to the one in (2).
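To make the role of each term explicit, the following sketch (an illustrative evaluation routine, not the authors' implementation; all names are assumptions) computes the value of the inverse-covariance-fused lasso objective in (10) at a given pair (B, Θ). Following (10), the fusion term uses β_k + sgn(θ_km) β_m, i.e., the sign convention is flipped relative to (2).

```python
import numpy as np

def iclasso_objective(X, Y, B, Theta, lam1, lam2, gamma):
    n, q = Y.shape
    reg_loss = np.sum((Y - X @ B) ** 2) / n                       # (1/n) ||Y - XB||_F^2
    cov_loss = np.trace(Y.T @ Y @ Theta) / n - np.log(np.linalg.det(Theta))
    sparsity = lam1 * np.sum(np.abs(B)) + lam2 * np.sum(np.abs(Theta))
    fusion = 0.0
    for k in range(q):                                            # graph-guided fusion term
        for m in range(k + 1, q):
            t = Theta[k, m]
            if t != 0:
                fusion += abs(t) * np.sum(np.abs(B[:, k] + np.sign(t) * B[:, m]))
    return reg_loss + cov_loss + sparsity + gamma * fusion
```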

A derivation of the two loss functions can be found in Appendix A. These come directly out of the modeling assumptions given in (5) and (8). The two ℓ1-norm penalties induce sparsity in the estimates of B and Θ, which makes estimation feasible in the high-dimensional setting where p, q > n, and furthermore is necessary for interpreting the eQTL map and gene network. Lastly, we show that the graph-guided fused lasso penalty encourages the structure Θ^{-1} ∝ B^T B, thereby linking the two estimates. Note that λ_1, λ_2, and γ are regularization parameters that determine the relative weight of each penalty term.

First we consider the optimization problem Θ̂ = argmin_Θ f(Θ) := tr(B^T B Θ) - log det(Θ). We can solve this problem in closed form by taking the gradient ∇_Θ f(Θ) = B^T B - Θ^{-1} and setting it to 0, which yields the solution Θ̂ = (B^T B)^{-1}. This suggests that the penalty tr(B^T B Θ) encourages the desired structure, while the log determinant term enforces the constraint that Θ be positive semidefinite, which is necessary for Θ to be a valid inverse covariance matrix. Instead of directly using this penalty in our model, we demonstrate its equivalence to the graph-guided fused lasso penalty shown in (2) when we substitute Θ for W. We compare the trace penalty, denoted TRP, and the graph-guided fused lasso penalty, denoted GFL, below.

    TRP(B, Θ) = tr(B^T B Θ) = Σ_{k=1}^q Σ_{m=1}^q θ_km β_k^T β_m    (11)
    GFL(B, Θ) = Σ_{k=1}^q Σ_{m=1}^q |θ_km| ||β_k + sgn(θ_km) β_m||_1    (12)

We show how these penalties are equivalent by considering three different cases.

Case of θ_km = 0. When θ_km = 0, the corresponding term in (11) becomes 0 · β_k^T β_m = 0 and the corresponding term in (12) becomes 0 · ||β_k + sgn(θ_km) β_m||_1 = 0. Therefore there is nothing linking β_k and β_m when θ_km = 0 in either penalty.

Case of θ_km < 0. When θ_km < 0, the corresponding term in (11) is minimized when β_k^T β_m is large and positive, which occurs when β_k and β_m point in the same direction. Similarly, the corresponding term in (12) becomes |θ_km| ||β_k - β_m||_1, and the penalty is minimized when β_k = β_m. Therefore when θ_km is negative, both penalties encourage similarity between β_k and β_m with strength proportional to the magnitude of θ_km.

Case of θ_km > 0. When θ_km > 0, the corresponding term in (11) is minimized when β_k^T β_m is large and negative, which occurs when β_k and β_m point in opposite directions. Similarly, the corresponding term in (12) becomes θ_km ||β_k + β_m||_1, and the penalty is minimized when β_k = -β_m. Therefore when θ_km is positive, both penalties encourage similarity between β_k and -β_m with strength proportional to the magnitude of θ_km.

We choose to use the graph-guided fused lasso penalty instead of the trace penalty because it more strictly enforces the relationship between B and Θ by fusing the regression parameter values of highly correlated genes.
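The case analysis above can be seen on a toy example. In the sketch below (purely illustrative; the numbers are arbitrary), both the trace-penalty term θ_km β_k^T β_m from (11) and the fusion term |θ_km| ||β_k + sgn(θ_km) β_m||_1 from (12) are smallest when the two coefficient vectors agree in the direction favored by the sign of θ_km.

```python
import numpy as np

def trace_term(theta_km, bk, bm):
    return theta_km * (bk @ bm)                    # single term from Eq. (11)

def fusion_term(theta_km, bk, bm):
    return abs(theta_km) * np.sum(np.abs(bk + np.sign(theta_km) * bm))  # term from Eq. (12)

bk = np.array([1.0, -2.0, 0.5])
for theta_km in (-0.8, 0.8):                       # negative vs. positive theta_km
    for bm in (bk, -bk):                           # aligned vs. anti-aligned coefficients
        print(f"theta={theta_km:+.1f}  bm==bk: {np.allclose(bm, bk)}  "
              f"trace={trace_term(theta_km, bk, bm):+.2f}  "
              f"fusion={fusion_term(theta_km, bk, bm):.2f}")
```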

3.3 Relationship to Other Methods

There are currently two existing approaches that jointly estimate regression coefficients and network structure: multivariate regression with covariance estimation (MRCE), from Rothman et al. (2010), and conditional Gaussian graphical models (CGGMs), originally from Sohn and Kim (2012) and further developed by Wytock and Kolter (2013) and Yuan and Zhang (2014). In this section, we describe how our approach differs from these others. All three methods (including ours) assume that the inputs X and outputs Y are related according to the basic linear model Y = XB + E, where E is a matrix of Gaussian noise. However, each approach imposes a different set of additional assumptions on top of this, which we discuss below.

MRCE: This method assumes that E ~ N(0, Ω^{-1}), which leads to Y | X ~ N(XB, Ω^{-1}). MRCE estimates B and Ω by solving the following objective:

    min_{B,Ω} (1/n) tr((Y - XB)^T (Y - XB) Ω) - log det(Ω) + λ_1 ||B||_1 + λ_2 ||Ω||_1    (13)

It's very important to note that Ω is the conditional inverse covariance of Y | X, which actually corresponds to the inverse covariance of the noise matrix E rather than the inverse covariance of the output matrix Y. We therefore argue that Ω doesn't capture any patterns that are shared with the regression coefficients B, since by definition Ω encodes the structure in Y that cannot be explained by XB.

CGGM: This approach makes an initial assumption that X and Y are jointly Gaussian with the following distribution,

    (x, y) ~ N( 0, [ Γ  Λ ; Λ^T  Ω ]^{-1} )

where the block matrix is the joint precision (inverse covariance) matrix. In this formulation, the distribution of Y | X is given by Y | X ~ N(-XΛΩ^{-1}, Ω^{-1}). This corresponds to the reparameterization of B as -ΛΩ^{-1}, where Ω is the conditional inverse covariance matrix and Λ represents the direct influence of X on Y. CGGMs estimate Λ and Ω by solving the following optimization problem, where sparsity penalties are applied to Λ and Ω instead of B and Ω as was the case in (13):

    min_{Λ,Ω} (1/n) tr((Y + XΛΩ^{-1})^T (Y + XΛΩ^{-1}) Ω) - log det(Ω) + λ_1 ||Λ||_1 + λ_2 ||Ω||_1    (14)

Here the meaning of Ω has not changed, and it once again represents the inverse covariance of the noise matrix.

ICLasso: Our method implicitly assumes two underlying models: Y | X ~ N(XB, I) and Y ~ N(0, Θ^{-1}). In this case, Θ represents the marginal inverse covariance of Y rather than the conditional inverse covariance of Y | X, which was captured by Ω in (13) and (14). The optimization problem in (10) is obtained by combining the loss functions derived from the log likelihood of each model and then incorporating sparsity penalties over B and Θ and an additional graph-guided fusion penalty to encourage shared structure.

Both MRCE and CGGMs have two important drawbacks that are not shared by our approach. First, both of these methods estimate Ω, the precision matrix of the noise term, rather than Θ, the precision matrix of the outputs Y. This means that they are estimating a network over the outputs in which each edge represents a partial correlation after conditioning on both the xs and all other ys. Second, neither method incorporates a structured sparsity penalty that explicitly encourages shared structure between the network and the regression coefficients. In fact, it would not make sense for these methods to apply a joint penalty over B and Ω because, as discussed above, we wouldn't expect these parameters to have any shared structure. By comparison, our method learns the true output network Θ and uses a graph-guided fused lasso penalty to explicitly encourage outputs that are closely related in Θ to have similar parameter values in B.

4 Optimization via Alternating Minimization

In this section we present an efficient algorithm to solve the inverse-covariance-fused lasso problem defined in (10). We start by rewriting the fusion penalty as follows:

    GFL(B, Θ) = γ Σ_{(k,m)} |θ_km| ||β_k + sgn(θ_km) β_m||_1
              = γ Σ_{(k,m)} max{θ_km, 0} ||β_k + β_m||_1 + γ Σ_{(k,m)} max{-θ_km, 0} ||β_k - β_m||_1,

from which it is clear that GFL is a bi-convex function, meaning that with fixed B, it is convex in Θ, and vice versa. Thus, upon defining

    g(B) = (1/n) ||Y - XB||_F^2 + λ_1 ||B||_1
    h(Θ) = (1/n) tr(Y^T Y Θ) - log det(Θ) + λ_2 ||Θ||_1,

we can rewrite the original objective as

    min_{B,Θ} g(B) + h(Θ) + GFL(B, Θ).    (15)

The function g is the usual lasso formulation in (1), the function h is the graphical lasso formulation in (3), and the graph-guided fusion penalty couples the two problems. Fortunately, since GFL is bi-convex, we can solve the joint problem (15) using an alternating minimization strategy. Next we leverage and extend state-of-the-art convex optimization routines to solve each convex sub-problem.

Fix Θ, Minimize B. When Θ is fixed, minimizing the objective over B reduces to the well-known graph-guided fused lasso problem,

    f_Θ(B) = g(B) + GFL(B, Θ),    (16)

which we optimize using the proximal-average proximal gradient descent (PA-PG) algorithm from Yu (2013). This algorithm is very simple. On each iteration, we first take a gradient step of the form B ← B - η X^T (XB - Y) using some small step size η. Then we compute the weighted average of the component proximal operators for each pair of outputs, where the prox that corresponds to pair (k, m) is given by

    B̂ = argmin_B (1/(2η)) ||B - Z||_F^2 + ||β_{·k} + sgn(θ_km) β_{·m}||_1    (17)

and the weight of this term is given by |θ_km| / θ_tot, where θ_tot = Σ_{(k,m)} |θ_km|. Due to the separability of (17) over the rows of B, we can solve for each row β_j independently. Furthermore, it's clear that for any i ∉ {k, m}, we have β̂_ji = z_ji. Solving for the remaining elements β_jk and β_jm leads to the following two-dimensional subproblem,

    (β̂_jk, β̂_jm) = argmin_{β_jk, β_jm} (1/(2η)) [ (β_jk - z_jk)^2 + (β_jm - z_jm)^2 ] + |β_jk + sgn(θ_km) β_jm|,    (18)

which can be solved in closed form. Therefore the full solution to the prox operator can be written compactly as follows, where d_km = z_{·k} + sgn(θ_km) z_{·m} and the operations are applied element-wise:

    β̂_{·i} = z_{·i} for i ∉ {k, m}
    β̂_{·k} = z_{·k} - sgn(d_km) min{η, |d_km| / 2}
    β̂_{·m} = z_{·m} - sgn(θ_km) sgn(d_km) min{η, |d_km| / 2}
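The closed-form prox above is simple to implement. The sketch below (a minimal illustration under the notation of (17)-(18); the function and variable names are assumptions) applies the proximal operator for a single output pair (k, m) to every row of a matrix Z at once.

```python
import numpy as np

def pair_fusion_prox(Z, k, m, sign_km, eta):
    """Prox of eta * ||beta_k + sign_km * beta_m||_1 at Z; only columns k and m change."""
    Zp = Z.copy()
    d = Z[:, k] + sign_km * Z[:, m]                          # d_km, computed row-wise
    step = np.sign(d) * np.minimum(eta, np.abs(d) / 2.0)
    Zp[:, k] = Z[:, k] - step
    Zp[:, m] = Z[:, m] - sign_km * step
    return Zp
```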

From these formulas, we can see that β_jk and sgn(θ_km) β_jm are always fused towards each other. For example, when sgn(θ_km) < 0, we want to push β_jk and β_jm towards the same value. In this case, the larger of z_jk and z_jm will be decremented and the smaller value will be incremented by the same quantity. In contrast, when sgn(θ_km) > 0, the solution pushes β_jk towards -β_jm. We summarize the procedure in Algorithm 1 of Appendix B. In practice, we use the accelerated version of the algorithm, PA-APG. Using the argument from Yu (2013), we can prove that this accelerated algorithm converges to an ε-optimal solution in at most O(1/ε) steps, which is significantly better than the O(1/ε^2) convergence rate of subgradient descent.

Fix B, Minimize Θ. When B is fixed, minimizing the objective over Θ reduces to a variation of the well-known graphical lasso problem,

    f_B(Θ) = h(Θ) + GFL(B, Θ),    (19)

which can be optimized by adapting the block coordinate descent (BCD) algorithm of Friedman et al. (2008). Indeed, we can rewrite the objective by introducing two q-by-q dimensional coefficient matrices U and L whose elements are defined as

    U_km = (1/n) Y_{·k}^T Y_{·m} + λ_2 + γ ||β_{·k} + β_{·m}||_1    (20)
    L_km = (1/n) Y_{·k}^T Y_{·m} - λ_2 - γ ||β_{·k} - β_{·m}||_1.    (21)

Using this notation, we collect all linear terms involving Θ_+ := max{Θ, 0} and Θ_- := max{-Θ, 0} and reformulate the objective given in (19) as

    min_Θ -log det(Θ) + ⟨Θ_+, U⟩ - ⟨Θ_-, L⟩.    (22)

The graphical lasso is recovered as the special case γ = 0, in which U and L differ only by the constant 2λ_2; in our case, U and L differ entry-wise because of the structure of the GFL penalty. Nevertheless, we can derive a block coordinate algorithm for this more general setting. First we dualize (22) to get the following problem,

    max_{L ≤ Ξ ≤ U} log det(Ξ),    (23)

where Θ = Ξ^{-1}. Then it can be shown that the diagonal of the covariance Ξ must attain the upper bound, i.e., we must have Ξ_jj = U_jj for j = 1, ..., q. Next, we perform block coordinate descent by cycling through each column (or row, due to symmetry) of Ξ. We denote an arbitrary column of Ξ by ξ_j, with corresponding columns u_j and l_j in U and L, respectively. Let Ξ_{-j} be the submatrix of Ξ obtained by deleting column j and row j. Then, by applying Schur's complement, maximizing (23) with respect to ξ_j with all other columns fixed amounts to

    min_{l_j ≤ ξ_j ≤ u_j} (1/2) ξ_j^T Ξ_{-j}^{-1} ξ_j.    (24)

Dualizing again, with ξ_j = Ξ_{-j} α, we obtain

    min_α (1/2) α^T Ξ_{-j} α + u_j^T α_- - l_j^T α_+,    (25)

which is essentially a lasso problem that we can solve using any known algorithm. Algorithm 2 of Appendix B outlines our procedure for solving (19). We use coordinate descent and apply a variant of the soft-thresholding operator to solve for each coordinate. This algorithm converges very quickly because there is no tuning of the step size, and each iteration involves only a matrix-vector product.
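The column subproblem can be solved with a simple coordinate-descent loop in which each update is an asymmetric variant of soft-thresholding. The sketch below is an illustration under the sign conventions used in the reconstruction of (25) above, not the authors' Algorithm 2; the function and variable names are assumptions, and the routine minimizes exactly the objective stated in its docstring.

```python
import numpy as np

def solve_column_subproblem(Xi_sub, u, l, n_sweeps=100):
    """Minimize 0.5 * a^T Xi_sub a + u^T max(-a, 0) - l^T max(a, 0) by coordinate descent."""
    d = len(u)
    a = np.zeros(d)
    for _ in range(n_sweeps):
        for i in range(d):
            c = Xi_sub[i] @ a - Xi_sub[i, i] * a[i]    # contribution of the other coordinates
            if l[i] > c:                               # optimum lies in the positive orthant
                a[i] = (l[i] - c) / Xi_sub[i, i]
            elif u[i] < c:                             # optimum lies in the negative orthant
                a[i] = (u[i] - c) / Xi_sub[i, i]
            else:                                      # asymmetric soft-threshold to zero
                a[i] = 0.0
    return a
```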

Figure 1: A comparison of results on a single synthetic dataset with p = 60, q = 60, and a block-structured output graph. The far left panel contains the ground truth for B and Θ. The remaining panels show the estimates of the regression coefficients for each method (top) along with the graph structure that was used or estimated by the method (bottom).

5 Simulation Study

We first evaluate our method on synthetic data with known values of B and Θ so that we can directly measure how well the true parameter values are recovered. We compare our regression estimates B̂ to several baselines, including the standard multi-task lasso (Lasso), the graph-guided fused lasso using a sparse covariance matrix as its graph (GFLasso), the graph-guided fused lasso using a sparse precision matrix as its graph (GFLasso2), sparse multivariate regression with covariance estimation (MRCE), and the conditional Gaussian graphical model (CGGM). We also compare our network estimates Θ̂ to the graphical lasso (GLasso). For each method, we selected hyperparameter values by minimizing the error on a held-out validation set.

For this analysis, we used two different synthetic data settings, each imposing a different network structure over the outputs:

- block-structured network: the outputs are divided into non-overlapping groups, and each group forms a fully connected sub-graph in the network
- tree-structured network: the outputs are related to one another according to a tree with a single root node and a branching factor of three, and all nodes share edges with their parents and children in the network

Each synthetic dataset is generated using p = 60 inputs, q ∈ {30, 60, 90, 120} outputs, and n = 50 samples. After constructing a network over the outputs using one of the structures described above, we generate the sparsity pattern and parameter values for the coefficient matrix B. For the block-structured network, we choose a random 5% of inputs for each group, and assign positive weights between these inputs and all nodes in the group. We draw coefficient values according to v_j ~ Unif(0.2, 0.8) and set β_jk = v_j for all k in the group, such that all members of the group have the same coefficient value for these inputs. We also randomly select 5% additional inputs for each group that will be shared across all members of that group and one other group, and generate coefficients in the same way. For the tree-structured network, we choose a random 5% of inputs for each node, and assign a positive weight between those inputs and both the node and each of its children. We similarly generate coefficient values according to v_j ~ Unif(0.2, 0.8) and set β_jk = v_j for all k in {parent, children}. Next we generate X ~ N(0, Σ) where Σ_jk = 0.6^{|j-k|}. Finally, given X and B, we generate Y ~ N(XB, I). We create multiple synthetic datasets by varying the network structure and the number of outputs, and for each setting we average our results over 15 synthetic datasets.
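For reference, the sketch below generates a simplified version of the block-structured setting described above (it omits the extra 5% of inputs shared between neighboring groups and the tree-structured case; the dimensions, random seed, and variable names are assumptions made for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n, n_groups = 60, 60, 50, 6
groups = np.array_split(np.arange(q), n_groups)

B = np.zeros((p, q))
for group in groups:
    chosen = rng.choice(p, size=max(1, p // 20), replace=False)   # ~5% of inputs per group
    for j in chosen:
        B[j, group] = rng.uniform(0.2, 0.8)      # one shared coefficient value per input

Sigma = 0.6 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # Sigma_jk = 0.6^|j-k|
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Y = X @ B + rng.normal(size=(n, q))              # Y ~ N(XB, I)
```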

A synthetic data example is shown in Figure 1. The ground truth for both B and Θ is given in the far left panel. The next three columns show the estimated values of B for three competing methods, and the results of our method are shown on the far right. In this example, the drawbacks of each of the baseline methods are evident. The covariance matrix used for the network structure in GFLasso captures many spurious patterns in Y that don't correspond to true patterns in the regression map, which confuses the estimate of B. The precision matrix used for the network structure in GFLasso2 does not accurately capture the true inverse covariance structure because of the low signal-to-noise ratio in Y. This prevents the fusion penalty from effectively influencing the estimate of B. Finally, although CGGM gets a reasonable estimate of the network, this structure is not explicitly enforced in B, which still leads to a poor estimate of the regression parameters. In contrast, the cleanest estimate of both B̂ and Θ̂ comes from ICLasso.

Next we quantitatively evaluate how well each method is able to estimate the sparsity structure of B and Θ. Since the hyperparameter values selected by minimizing the error on a validation set do not usually correspond to the values that lead to the most accurate sparsity, we calculate a precision-recall curve on top of B̂ and Θ̂. To do this, we vary the threshold at which each parameter value is considered zero or nonzero and we calculate the precision and recall at each threshold.
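The thresholding procedure can be written out directly. The sketch below (helper names are illustrative, not from the paper) sweeps a threshold over the entries of |B̂| and computes the precision and recall of the recovered support against the true support of B; the same routine applies to Θ̂.

```python
import numpy as np

def support_precision_recall(B_true, B_hat, n_thresholds=50):
    truth = (B_true != 0).ravel()
    scores = np.abs(B_hat).ravel()
    curve = []
    for t in np.linspace(0.0, scores.max(), n_thresholds):
        pred = scores > t                    # entries considered nonzero at this threshold
        tp = np.sum(pred & truth)
        precision = tp / max(pred.sum(), 1)
        recall = tp / max(truth.sum(), 1)
        curve.append((t, precision, recall))
    return curve
```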

Figure 2: Average precision-recall curves for B (left) and Θ (right) for synthetic datasets with two different parameter settings: p = 60, q = 120 with a block-structured graph (top), and p = 60, q = 60 with a tree-structured graph (bottom).

Figure 2 shows the precision-recall curves for B and Θ averaged over 15 datasets in two different parameter settings. The top two plots show results for a block-structured network with q = 120. Here it is evident that ICLasso significantly outperforms all of the baselines in its estimate of both the regression map and network structure. The bottom two plots show results for a tree-structured network with q = 60. Although ICLasso wins on B by a small margin, its true advantage in this setting is its estimate of Θ. To further quantify these results, the average area under the PR curve for B is 0.87 ± 0.01 for ICLasso and 0.79 ± 0.02 for GFLasso2, which is the closest baseline. The average area under the PR curve for Θ is 0.91 ± 0.01 for ICLasso and 0.82 ± 0.06 for GLasso.

Figure 3: Area under the precision-recall curve and best F1 score over all regularization parameter values for B and Θ, shown for different numbers of outputs q and averaged over 30 synthetic datasets generated with both graph types.

Figure 4: Test set prediction error on synthetic data with p = 60, q = 120, and a block-structured graph. Each box shows the mean (red), standard error interval (blue), and standard deviation (black) over 15 random datasets.

Figure 3 shows the average area under the precision-recall curve and best F1 score (over all hyperparameter values) for B and Θ, averaged over both network structures for each value of q. This shows that ICLasso consistently outperforms all baselines over multiple random datasets generated with multiple parameter settings. Finally, we examine the prediction error of each model when using B̂ to predict Y from X. Figure 4 shows a box plot of the error values obtained in one parameter setting. This demonstrates that ICLasso obtains lower error than the other methods. Overall, the results on synthetic data demonstrate that ICLasso performs significantly better than competing methods on a wide variety of metrics (accurate recovery of the sparse structure in B, accurate recovery of the sparse structure in Θ, and prediction of Y) when our data assumptions hold.
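The two summary statistics reported in Figure 3 can be computed from the same scores. The sketch below (an illustrative implementation using scikit-learn, not the paper's code) returns the area under the precision-recall curve and the best F1 score over all thresholds for an estimated coefficient matrix.

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve

def aucpr_and_best_f1(B_true, B_hat):
    y_true = (B_true != 0).ravel().astype(int)
    scores = np.abs(B_hat).ravel()
    precision, recall, _ = precision_recall_curve(y_true, scores)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return auc(recall, precision), f1.max()
```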

6 Yeast eQTL Mapping

In order to evaluate our approach in a real-world setting, we applied ICLasso to a yeast eQTL dataset from Brem and Kruglyak (2005) that consists of 2,956 SNP genotypes and 5,637 gene expression measurements across 114 yeast samples. Before running our method, we removed 1,799 SNPs with duplicate genotypes and retained only the top 25% of genes with the highest variance in expression, leaving p = 1,157 SNPs and q = 1,409 genes in our analysis.

We used our approach to jointly perform eQTL mapping and gene network inference on the yeast dataset, treating the SNPs as inputs X and the genes as outputs Y. We trained our model on 91 samples and used the remaining 23 samples as a validation set for tuning the hyperparameters, which yielded λ_1 = 0.54, λ_2 = 0.57, and γ = 0.8. Given the trained model, we read the eQTL associations from the regression coefficient matrix B̂, which encodes SNP-gene relationships, and obtained the gene network from the inverse covariance matrix Θ̂, which encodes gene-gene relationships. In addition to ICLasso, we ran Lasso and GFLasso on the yeast data to obtain two additional estimates of B, and ran the graphical lasso to obtain another estimate of Θ (the hyperparameters for each of these methods were selected using the same validation set, which yielded λ = 0.23 for GFLasso and λ = 0.47 for GLasso).

Table 1 shows the density of B̂ obtained with each method, along with the prediction error of Y on the training set and on the held-out validation set, which were calculated using ||Y_train - X_train B̂||_F^2 and ||Y_valid - X_valid B̂||_F^2, respectively. We chose not to sacrifice any data for a test set, but these results indicate that ICLasso achieves an equivalent or better fit to the training and validation sets than Lasso and GFLasso.

Table 1: Regression error on yeast data (density of B̂, training error, and validation error for each method).

    method     density
    Lasso      1.65%
    GFLasso    2.87%
    ICLasso    6.88%

6.1 Quantitative Analysis

Because the true yeast eQTLs and gene network structure are not known, there is no ground truth for this problem. We instead analyzed the output of each method by performing a series of enrichment analyses that together provide a comprehensive picture of the biological coherence of the results. An enrichment analysis uses gene annotations to identify specific biological processes, functions, or structures that are over-represented among a group of genes relative to the full set of genes that is examined (Subramanian et al., 2005). To evaluate our yeast data results, we performed three types of enrichment analyses: biological process and molecular function enrichment using annotations from the Gene Ontology (GO) database (Ashburner et al., 2000), and pathway enrichment using annotations from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Kanehisa and Goto, 2000). We used a hypergeometric test to compute a p-value for each term, and then adjusted the values to account for multiple hypothesis testing. Significance was determined using a cutoff on the adjusted p-values.
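The per-term test used here is a standard hypergeometric test. The sketch below is illustrative; the Benjamini-Hochberg adjustment is an assumption, since the text does not name the exact multiple-testing correction used.

```python
import numpy as np
from scipy.stats import hypergeom

def term_enrichment_pvalue(n_background, n_term, n_selected, n_overlap):
    # P(overlap >= n_overlap) when n_selected genes are drawn from a background of
    # n_background genes, n_term of which carry the annotation term.
    return hypergeom.sf(n_overlap - 1, n_background, n_term, n_selected)

def benjamini_hochberg(pvals):
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    scaled = p[order] * len(p) / (np.arange(len(p)) + 1)
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]   # enforce monotonicity
    out = np.empty_like(p)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out
```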

Table 2: GO and KEGG enrichment analysis on the yeast eQTL map. For each method (Lasso, GFLasso, ICLasso), the table reports the number of enriched terms and the number of enriched SNPs in each category (GO-BP, GO-MF, KEGG), along with the average increase.

Table 3: GO and KEGG enrichment analysis on the yeast gene network. For each method (GLasso, ICLasso), the table reports the number of enriched terms and the number of enriched clusters in each category (GO-BP, GO-MF, KEGG), along with the average increase.

We first analyzed B̂ by performing a per-SNP enrichment analysis. For each SNP j, we used the nonzero elements in β_j to identify the set of genes associated with the SNP. Next we performed GO and KEGG enrichment analyses on this group of genes by comparing their annotations to the full set of 1,409 genes that we included in our study. We repeated this procedure for each SNP, and calculated the total number of terms that were enriched over all SNPs to obtain a global measure of enrichment for B̂. In addition, we calculated the total number of SNPs that were enriched for at least one term in each category. These results are summarized in Table 2. It is evident that ICLasso vastly outperforms both GFLasso and Lasso on estimating the regression coefficients, since it has more than twice as many enriched terms in GO biological process, GO molecular function, and KEGG than either baseline method.

Next we used a similar approach to evaluate the structure present in Θ̂. We first obtained groups of genes by using spectral clustering to perform community detection among the genes based on the inferred network structure. After clustering the genes into 100 groups (we also clustered with 25, 50, and 200 groups and obtained similar results), we performed GO and KEGG enrichment analyses on each cluster and calculated the total number of enriched terms along with the total number of clusters that were enriched for at least one term. These results are summarized in Table 3. Once again, our approach has more enrichment than the baseline in every category, which implies that the gene network estimated by ICLasso has a much more biologically correct structure than the network estimated by GLasso.

6.2 Qualitative Analysis

The quantitative results in Tables 2 and 3 indicate that, compared to other methods, our approach identifies more eQTLs that are associated with genes significantly enriched in certain biological processes and pathways. A more detailed examination of our results revealed that many of the enriched terms correspond to metabolic pathways, and that the eQTLs we identified agree with those discovered in a previous study that analyzed the effect of genetic variations on the yeast metabolome.
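The community-detection step described in Section 6.1 can be reproduced with off-the-shelf spectral clustering. The sketch below (an illustrative choice of tools, not necessarily the one used by the authors) clusters the genes using the magnitudes of the estimated precision-matrix entries as a precomputed affinity.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_gene_network(Theta_hat, n_clusters=100, seed=0):
    affinity = np.abs(Theta_hat).astype(float)
    np.fill_diagonal(affinity, 0.0)                 # keep only gene-gene edge weights
    model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                               random_state=seed)
    return model.fit_predict(affinity)              # cluster label for each gene
```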

Breunig et al. (2014) identified the metabolite quantitative trait loci (mQTLs) for 34 metabolites and then examined each mQTL for the presence of metabolic genes in the same pathway as the linked metabolite. We found that 10 of these 34 metabolites were linked to metabolic genes where our identified eQTLs reside. For example, Breunig et al. determined that the metabolite valine is linked to an mQTL in a region spanned by the ILV6 gene, which encodes a protein involved in valine biosynthesis. In our study, we also identified an eQTL located in ILV6. Moreover, we found that the eQTL in ILV6 is associated with 365 genes that are significantly enriched for pathways involved in the metabolism and biosynthesis of various amino acids. This is consistent with the fact that the metabolism and biosynthesis of amino acids in the cell need to be coordinated.

Furthermore, our enrichment analysis shows that the eQTL-associated genes we identified are enriched for various metabolic pathways (e.g., sulfur, riboflavin, protein, starch, and sucrose metabolism, oxidative phosphorylation, glycolysis), as well as more general pathways, such as cell cycle pathways and MAPK pathways. This is consistent with the roles of the mQTLs identified by Breunig et al. Interestingly, among these genes, SAM1, encoding an S-adenosylmethionine synthetase, is also among the eQTLs in our list. Our results show that the eQTL we found in SAM1 is associated with 252 genes that are enriched for cytoplasmic translation and ribosome functions, consistent with the fact that S-adenosylmethionine is the methyl donor in most methylation reactions and is essential for the methylation of proteins, nucleic acids, and lipids (Roberts and Selker, 1995).

7 Conclusion

In this work, we propose a new model called the inverse-covariance-fused lasso, which jointly estimates regression coefficients B and an output network Θ while using a graph-guided fused lasso penalty to explicitly encourage shared structure. Our model is formulated as a biconvex optimization problem, and we derive new, efficient optimization routines for each convex sub-problem based on existing methods. Our results on both synthetic and real data unequivocally demonstrate that our model achieves significantly better performance on recovery of the structure of B, recovery of the structure of Θ, and prediction error than all six baselines that we evaluated.

In this paper, we demonstrated that our approach can effectively be used to perform joint eQTL mapping and gene network estimation on a yeast dataset, yielding more biologically coherent results than previous work. However, the same problem setting appears in many different applications, and the inverse-covariance-fused lasso model can therefore be effectively used within a wide range of domains.

References

Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics, 25(1):25-29.

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.

Brem, R. B. and Kruglyak, L. (2005). The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proceedings of the National Academy of Sciences, 102(5):1572-1577.

Breunig, J. S., Hackett, S. R., Rabinowitz, J. D., and Kruglyak, L. (2014). Genetic basis of metabolome variation in yeast. PLoS Genetics, 10(3).

Chen, X., Kim, S., Lin, Q., Carbonell, J. G., and Xing, E. P. (2010). Graph-structured multi-task regression and an efficient optimization method for general fused lasso. arXiv preprint.

Dempster, A. P. (1972). Covariance selection. Biometrics, 28(1):157-175.

Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432-441.

Gardner, T. S. and Faith, J. J. (2005). Reverse-engineering transcription control networks. Physics of Life Reviews, 2(1).

Kanehisa, M. and Goto, S. (2000). KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research, 28(1):27-30.

Kim, S., Sohn, K.-A., and Xing, E. P. (2009). A multivariate regression approach to association analysis of a quantitative trait network. Bioinformatics, 25(12):i204-i212.

Kim, S. and Xing, E. P. (2009). Statistical estimation of correlated genome associations to a quantitative trait network. PLoS Genetics, 5(8).

Kim, S., Xing, E. P., et al. (2012). Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping. The Annals of Applied Statistics, 6(3).

Lee, W. and Liu, Y. (2012). Simultaneous multiple response regression and inverse covariance matrix estimation via penalized Gaussian maximum likelihood. Journal of Multivariate Analysis, 111.

Marbach, D., Prill, R. J., Schaffter, T., Mattiussi, C., Floreano, D., and Stolovitzky, G. (2010). Revealing strengths and weaknesses of methods for gene network inference. Proceedings of the National Academy of Sciences, 107(14).

Michaelson, J. J., Alberts, R., Schughart, K., and Beyer, A. (2010). Data-driven assessment of eQTL mapping methods. BMC Genomics, 11(1):502.

Mohan, K., London, P., Fazel, M., Witten, D., and Lee, S.-I. (2014). Node-based learning of multiple Gaussian graphical models. Journal of Machine Learning Research, 15(1).


Genetic Networks. Korbinian Strimmer. Seminar: Statistical Analysis of RNA-Seq Data 19 June IMISE, Universität Leipzig Genetic Networks Korbinian Strimmer IMISE, Universität Leipzig Seminar: Statistical Analysis of RNA-Seq Data 19 June 2012 Korbinian Strimmer, RNA-Seq Networks, 19/6/2012 1 Paper G. I. Allen and Z. Liu.

More information

Hub Gene Selection Methods for the Reconstruction of Transcription Networks

Hub Gene Selection Methods for the Reconstruction of Transcription Networks for the Reconstruction of Transcription Networks José Miguel Hernández-Lobato (1) and Tjeerd. M. H. Dijkstra (2) (1) Computer Science Department, Universidad Autónoma de Madrid, Spain (2) Institute for

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Introduction to Bioinformatics

Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics

More information

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models 02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput

More information

Linear Methods for Regression. Lijun Zhang

Linear Methods for Regression. Lijun Zhang Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived

More information

Learning discrete graphical models via generalized inverse covariance matrices

Learning discrete graphical models via generalized inverse covariance matrices Learning discrete graphical models via generalized inverse covariance matrices Duzhe Wang, Yiming Lv, Yongjoon Kim, Young Lee Department of Statistics University of Wisconsin-Madison {dwang282, lv23, ykim676,

More information

Chapter 17: Undirected Graphical Models

Chapter 17: Undirected Graphical Models Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)

More information

Optimization methods

Optimization methods Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to

More information

Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression

Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression Due: Monday, February 13, 2017, at 10pm (Submit via Gradescope) Instructions: Your answers to the questions below,

More information

GWAS IV: Bayesian linear (variance component) models

GWAS IV: Bayesian linear (variance component) models GWAS IV: Bayesian linear (variance component) models Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS IV: Bayesian

More information

High-dimensional covariance estimation based on Gaussian graphical models

High-dimensional covariance estimation based on Gaussian graphical models High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao

More information

10708 Graphical Models: Homework 2

10708 Graphical Models: Homework 2 10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves

More information

Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /

Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization / Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods

More information

ECE521 lecture 4: 19 January Optimization, MLE, regularization

ECE521 lecture 4: 19 January Optimization, MLE, regularization ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity

More information

Structure estimation for Gaussian graphical models

Structure estimation for Gaussian graphical models Faculty of Science Structure estimation for Gaussian graphical models Steffen Lauritzen, University of Copenhagen Department of Mathematical Sciences Minikurs TUM 2016 Lecture 3 Slide 1/48 Overview of

More information

Stochastic Proximal Gradient Algorithm

Stochastic Proximal Gradient Algorithm Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 7, 04 Reading: See class website Eric Xing @ CMU, 005-04

More information

Learning Gaussian Graphical Models with Unknown Group Sparsity

Learning Gaussian Graphical Models with Unknown Group Sparsity Learning Gaussian Graphical Models with Unknown Group Sparsity Kevin Murphy Ben Marlin Depts. of Statistics & Computer Science Univ. British Columbia Canada Connections Graphical models Density estimation

More information

Node-Based Learning of Multiple Gaussian Graphical Models

Node-Based Learning of Multiple Gaussian Graphical Models Journal of Machine Learning Research 5 (04) 445-488 Submitted /; Revised 8/; Published /4 Node-Based Learning of Multiple Gaussian Graphical Models Karthik Mohan Palma London Maryam Fazel Department of

More information

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Lucas Janson, Stanford Department of Statistics WADAPT Workshop, NIPS, December 2016 Collaborators: Emmanuel

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

DATA MINING AND MACHINE LEARNING

DATA MINING AND MACHINE LEARNING DATA MINING AND MACHINE LEARNING Lecture 5: Regularization and loss functions Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Loss functions Loss functions for regression problems

More information

Sparse Additive Functional and kernel CCA

Sparse Additive Functional and kernel CCA Sparse Additive Functional and kernel CCA Sivaraman Balakrishnan* Kriti Puniyani* John Lafferty *Carnegie Mellon University University of Chicago Presented by Miao Liu 5/3/2013 Canonical correlation analysis

More information

Lecture 25: November 27

Lecture 25: November 27 10-725: Optimization Fall 2012 Lecture 25: November 27 Lecturer: Ryan Tibshirani Scribes: Matt Wytock, Supreeth Achar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have

More information

Learning Markov Network Structure using Brownian Distance Covariance

Learning Markov Network Structure using Brownian Distance Covariance arxiv:.v [stat.ml] Jun 0 Learning Markov Network Structure using Brownian Distance Covariance Ehsan Khoshgnauz May, 0 Abstract In this paper, we present a simple non-parametric method for learning the

More information

Sparse Graph Learning via Markov Random Fields

Sparse Graph Learning via Markov Random Fields Sparse Graph Learning via Markov Random Fields Xin Sui, Shao Tang Sep 23, 2016 Xin Sui, Shao Tang Sparse Graph Learning via Markov Random Fields Sep 23, 2016 1 / 36 Outline 1 Introduction to graph learning

More information

Diversity Regularization of Latent Variable Models: Theory, Algorithm and Applications

Diversity Regularization of Latent Variable Models: Theory, Algorithm and Applications Diversity Regularization of Latent Variable Models: Theory, Algorithm and Applications Pengtao Xie, Machine Learning Department, Carnegie Mellon University 1. Background Latent Variable Models (LVMs) are

More information

Bi-level feature selection with applications to genetic association

Bi-level feature selection with applications to genetic association Bi-level feature selection with applications to genetic association studies October 15, 2008 Motivation In many applications, biological features possess a grouping structure Categorical variables may

More information

The lasso: some novel algorithms and applications

The lasso: some novel algorithms and applications 1 The lasso: some novel algorithms and applications Newton Institute, June 25, 2008 Robert Tibshirani Stanford University Collaborations with Trevor Hastie, Jerome Friedman, Holger Hoefling, Gen Nowak,

More information

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression

More information

Bayesian Inference of Multiple Gaussian Graphical Models

Bayesian Inference of Multiple Gaussian Graphical Models Bayesian Inference of Multiple Gaussian Graphical Models Christine Peterson,, Francesco Stingo, and Marina Vannucci February 18, 2014 Abstract In this paper, we propose a Bayesian approach to inference

More information

Proximity-Based Anomaly Detection using Sparse Structure Learning

Proximity-Based Anomaly Detection using Sparse Structure Learning Proximity-Based Anomaly Detection using Sparse Structure Learning Tsuyoshi Idé (IBM Tokyo Research Lab) Aurelie C. Lozano, Naoki Abe, and Yan Liu (IBM T. J. Watson Research Center) 2009/04/ SDM 2009 /

More information

Robust Inverse Covariance Estimation under Noisy Measurements

Robust Inverse Covariance Estimation under Noisy Measurements .. Robust Inverse Covariance Estimation under Noisy Measurements Jun-Kun Wang, Shou-De Lin Intel-NTU, National Taiwan University ICML 2014 1 / 30 . Table of contents Introduction.1 Introduction.2 Related

More information

Stability and the elastic net

Stability and the elastic net Stability and the elastic net Patrick Breheny March 28 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/32 Introduction Elastic Net Our last several lectures have concentrated on methods for

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Permutation-invariant regularization of large covariance matrices. Liza Levina

Permutation-invariant regularization of large covariance matrices. Liza Levina Liza Levina Permutation-invariant covariance regularization 1/42 Permutation-invariant regularization of large covariance matrices Liza Levina Department of Statistics University of Michigan Joint work

More information

Gene Regulatory Networks II Computa.onal Genomics Seyoung Kim

Gene Regulatory Networks II Computa.onal Genomics Seyoung Kim Gene Regulatory Networks II 02-710 Computa.onal Genomics Seyoung Kim Goal: Discover Structure and Func;on of Complex systems in the Cell Identify the different regulators and their target genes that are

More information

Predicting causal effects in large-scale systems from observational data

Predicting causal effects in large-scale systems from observational data nature methods Predicting causal effects in large-scale systems from observational data Marloes H Maathuis 1, Diego Colombo 1, Markus Kalisch 1 & Peter Bühlmann 1,2 Supplementary figures and text: Supplementary

More information

Mixtures of Gaussians with Sparse Regression Matrices. Constantinos Boulis, Jeffrey Bilmes

Mixtures of Gaussians with Sparse Regression Matrices. Constantinos Boulis, Jeffrey Bilmes Mixtures of Gaussians with Sparse Regression Matrices Constantinos Boulis, Jeffrey Bilmes {boulis,bilmes}@ee.washington.edu Dept of EE, University of Washington Seattle WA, 98195-2500 UW Electrical Engineering

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 5, 06 Reading: See class website Eric Xing @ CMU, 005-06

More information

Robust and sparse Gaussian graphical modelling under cell-wise contamination

Robust and sparse Gaussian graphical modelling under cell-wise contamination Robust and sparse Gaussian graphical modelling under cell-wise contamination Shota Katayama 1, Hironori Fujisawa 2 and Mathias Drton 3 1 Tokyo Institute of Technology, Japan 2 The Institute of Statistical

More information

CSC 576: Variants of Sparse Learning

CSC 576: Variants of Sparse Learning CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in

More information

Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics

Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics arxiv:1603.08163v1 [stat.ml] 7 Mar 016 Farouk S. Nathoo, Keelin Greenlaw,

More information

Heterogeneous Multitask Learning with Joint Sparsity Constraints

Heterogeneous Multitask Learning with Joint Sparsity Constraints Heterogeneous ultitas Learning with Joint Sparsity Constraints Xiaolin Yang Department of Statistics Carnegie ellon University Pittsburgh, PA 23 xyang@stat.cmu.edu Seyoung Kim achine Learning Department

More information

Grouping of correlated feature vectors using treelets

Grouping of correlated feature vectors using treelets Grouping of correlated feature vectors using treelets Jing Xiang Department of Machine Learning Carnegie Mellon University Pittsburgh, PA 15213 jingx@cs.cmu.edu Abstract In many applications, features

More information

Exact Hybrid Covariance Thresholding for Joint Graphical Lasso

Exact Hybrid Covariance Thresholding for Joint Graphical Lasso Exact Hybrid Covariance Thresholding for Joint Graphical Lasso Qingming Tang Chao Yang Jian Peng Jinbo Xu Toyota Technological Institute at Chicago Massachusetts Institute of Technology Abstract. This

More information

OWL to the rescue of LASSO

OWL to the rescue of LASSO OWL to the rescue of LASSO IISc IBM day 2018 Joint Work R. Sankaran and Francis Bach AISTATS 17 Chiranjib Bhattacharyya Professor, Department of Computer Science and Automation Indian Institute of Science,

More information

Learning in Bayesian Networks

Learning in Bayesian Networks Learning in Bayesian Networks Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table

More information

Optimization for Compressed Sensing

Optimization for Compressed Sensing Optimization for Compressed Sensing Robert J. Vanderbei 2014 March 21 Dept. of Industrial & Systems Engineering University of Florida http://www.princeton.edu/ rvdb Lasso Regression The problem is to solve

More information

An Introduction to Graphical Lasso

An Introduction to Graphical Lasso An Introduction to Graphical Lasso Bo Chang Graphical Models Reading Group May 15, 2015 Bo Chang (UBC) Graphical Lasso May 15, 2015 1 / 16 Undirected Graphical Models An undirected graph, each vertex represents

More information

6. Regularized linear regression

6. Regularized linear regression Foundations of Machine Learning École Centrale Paris Fall 2015 6. Regularized linear regression Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr

More information

Within Group Variable Selection through the Exclusive Lasso

Within Group Variable Selection through the Exclusive Lasso Within Group Variable Selection through the Exclusive Lasso arxiv:1505.07517v1 [stat.me] 28 May 2015 Frederick Campbell Department of Statistics, Rice University and Genevera Allen Department of Statistics,

More information

Discovering molecular pathways from protein interaction and ge

Discovering molecular pathways from protein interaction and ge Discovering molecular pathways from protein interaction and gene expression data 9-4-2008 Aim To have a mechanism for inferring pathways from gene expression and protein interaction data. Motivation Why

More information

Biostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences

Biostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences Biostatistics-Lecture 16 Model Selection Ruibin Xi Peking University School of Mathematical Sciences Motivating example1 Interested in factors related to the life expectancy (50 US states,1969-71 ) Per

More information

A Divide-and-Conquer Procedure for Sparse Inverse Covariance Estimation

A Divide-and-Conquer Procedure for Sparse Inverse Covariance Estimation A Divide-and-Conquer Procedure for Sparse Inverse Covariance Estimation Cho-Jui Hsieh Dept. of Computer Science University of Texas, Austin cjhsieh@cs.utexas.edu Inderjit S. Dhillon Dept. of Computer Science

More information

Sparse Principal Component Analysis via Alternating Maximization and Efficient Parallel Implementations

Sparse Principal Component Analysis via Alternating Maximization and Efficient Parallel Implementations Sparse Principal Component Analysis via Alternating Maximization and Efficient Parallel Implementations Martin Takáč The University of Edinburgh Joint work with Peter Richtárik (Edinburgh University) Selin

More information

Predicting Workplace Incidents with Temporal Graph-guided Fused Lasso

Predicting Workplace Incidents with Temporal Graph-guided Fused Lasso Predicting Workplace Incidents with Temporal Graph-guided Fused Lasso Keerthiram Murugesan 1 and Jaime Carbonell 1 1 Language Technologies Institute Carnegie Mellon University Pittsburgh, USA CMU-LTI-15-??

More information

A note on the group lasso and a sparse group lasso

A note on the group lasso and a sparse group lasso A note on the group lasso and a sparse group lasso arxiv:1001.0736v1 [math.st] 5 Jan 2010 Jerome Friedman Trevor Hastie and Robert Tibshirani January 5, 2010 Abstract We consider the group lasso penalty

More information

Factor Analysis (10/2/13)

Factor Analysis (10/2/13) STA561: Probabilistic machine learning Factor Analysis (10/2/13) Lecturer: Barbara Engelhardt Scribes: Li Zhu, Fan Li, Ni Guan Factor Analysis Factor analysis is related to the mixture models we have studied.

More information

SVRG++ with Non-uniform Sampling

SVRG++ with Non-uniform Sampling SVRG++ with Non-uniform Sampling Tamás Kern András György Department of Electrical and Electronic Engineering Imperial College London, London, UK, SW7 2BT {tamas.kern15,a.gyorgy}@imperial.ac.uk Abstract

More information

Correlate. A method for the integrative analysis of two genomic data sets

Correlate. A method for the integrative analysis of two genomic data sets Correlate A method for the integrative analysis of two genomic data sets Sam Gross, Balasubramanian Narasimhan, Robert Tibshirani, and Daniela Witten February 19, 2010 Introduction Sparse Canonical Correlation

More information

Joint Gaussian Graphical Model Review Series I

Joint Gaussian Graphical Model Review Series I Joint Gaussian Graphical Model Review Series I Probability Foundations Beilun Wang Advisor: Yanjun Qi 1 Department of Computer Science, University of Virginia http://jointggm.org/ June 23rd, 2017 Beilun

More information

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore

More information

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS023) p.3938 An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Vitara Pungpapong

More information

A New Perspective on Boosting in Linear Regression via Subgradient Optimization and Relatives

A New Perspective on Boosting in Linear Regression via Subgradient Optimization and Relatives A New Perspective on Boosting in Linear Regression via Subgradient Optimization and Relatives Paul Grigas May 25, 2016 1 Boosting Algorithms in Linear Regression Boosting [6, 9, 12, 15, 16] is an extremely

More information

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania

DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania Submitted to the Annals of Statistics DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING By T. Tony Cai and Linjun Zhang University of Pennsylvania We would like to congratulate the

More information

Inferring Transcriptional Regulatory Networks from High-throughput Data

Inferring Transcriptional Regulatory Networks from High-throughput Data Inferring Transcriptional Regulatory Networks from High-throughput Data Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20

More information