Differential network analysis from cross-platform gene expression data: Supplementary Information


Xiao-Fei Zhang, Le Ou-Yang, Xing-Ming Zhao, and Hong Yan

Contents

1 Supplementary Figures
2 Supplementary Text
  2.1 Brief review of ADMM algorithms
  2.2 ADMM algorithm for fused graphical lasso with weighted ℓ_1 penalty
    2.2.1 Θ^{k1} and Θ^{k2} update
    2.2.2 Z^{k1} and Z^{k2} update
    2.2.3 Complete ADMM algorithm
    2.2.4 Stop criteria
    2.2.5 Varying penalty parameter
  2.3 Complete algorithm of the TDJGL model
  2.4 Model selection
  2.5 Criteria for platinum response groups
  2.6 Comparison with other graphical lasso models on the ovarian cancer data

1 Supplementary Figures

Figure S1: Examples of the three types of networks used to generate the simulated data sets: a) Erdös-Rényi network, b) scale-free network, and c) community network.

Figure S2: Performance of the compared methods on the Erdös-Rényi network with p = , K = 3, τ = 1% and a) n = 5, b) n = , c) n = . Each colored line corresponds to a fixed value of λ_2 (ω_2 for GGL), as λ_1 (ω_1 for GGL) is varied. The variables corresponding to the axes (FP edges, FP differential edges, positive edges) are explained in Table 1 in the main text. Results are averaged over random generations of the data.

Figure S3: Performance of the compared methods on the community network with p = , K = 3, τ = 1% and a) n = 5, b) n = , c) n = . Each colored line corresponds to a fixed value of λ_2 (ω_2 for GGL), as λ_1 (ω_1 for GGL) is varied. The variables corresponding to the axes are explained in Table 1 in the main text. Results are averaged over random generations of the data.

Figure S4: Performance of the compared models on the scale-free network with p = , K = 3, τ = % and a) n = 5, b) n = , c) n = . Each colored line corresponds to a fixed value of λ_2 (ω_2 for GGL), as λ_1 (ω_1 for GGL) is varied. The variables corresponding to the axes (FP edges, FP differential edges, positive edges) are explained in Table 1 in the main text. Results are averaged over random generations of the data.

Figure S5: Performance of the compared models on the scale-free network with p = , K = 3, τ = 5% and a) n = 5, b) n = , c) n = . Each colored line corresponds to a fixed value of λ_2 (ω_2 for GGL), as λ_1 (ω_1 for GGL) is varied. The variables corresponding to the axes are explained in Table 1 in the main text. Results are averaged over random generations of the data.

Figure S6: Overlaps between the edges (and differential edges) detected by TDJGL from the three platforms (G4502, U133 and HuEx) for a) platinum-resistant tumors, b) platinum-sensitive tumors and c) differential networks.

2 Supplementary Text

2.1 Brief review of ADMM algorithms

In this section, we briefly review the standard alternating direction method of multipliers (ADMM) algorithm [1]. ADMM is a technique for solving optimization problems of the following form [2]:

    min_X f(X) + g(X)   subject to X ∈ 𝒳.   (1)

ADMM is attractive when the proximal operator of f(X) + g(X) cannot be obtained easily, but the proximal operators of f(X) and of g(X) can each be computed easily. The approach consists of the following three steps [2]:

1. Rewrite problem (1) as

    min_{X,Z} f(X) + g(Z)   subject to X ∈ 𝒳, X = Z,   (2)

where the functions f and g are decoupled by introducing a new optimization variable Z.

2. Form the augmented Lagrangian

    L_ρ(X, Z, Λ) = f(X) + g(Z) + ⟨Λ, X − Z⟩ + (ρ/2)‖X − Z‖_F²,   (3)

where Λ is the Lagrange multiplier and ρ > 0 is a penalty parameter.

3. Iterate the following two steps until convergence:

a) Update each primal variable in turn by minimizing the augmented Lagrangian (3) with respect to that variable, while holding all other variables fixed. The updates in the t-th iteration are

    X^{t+1} ← argmin_{X ∈ 𝒳} L_ρ(X, Z^t, Λ^t),
    Z^{t+1} ← argmin_Z L_ρ(X^{t+1}, Z, Λ^t).

b) Update the dual variable using a dual-ascent update:

    Λ^{t+1} ← Λ^t + ρ(X^{t+1} − Z^{t+1}).
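To make the three steps above concrete, the following short sketch (Python/NumPy; an illustrative rendering of the generic scheme, not code from this paper) carries out the two primal updates and the dual-ascent step for the split problem (2), with the proximal operators of f and g supplied by the caller.

import numpy as np

def admm(prox_f, prox_g, shape, rho=1.0, max_iter=200, tol=1e-6):
    # Generic ADMM for min_X f(X) + g(Z) subject to X = Z.
    # prox_f(V, rho) is assumed to return argmin_X f(X) + (rho/2)*||X - V||_F^2,
    # and prox_g(V, rho) the analogous minimizer for g.
    X = np.zeros(shape)
    Z = np.zeros(shape)
    Lam = np.zeros(shape)                    # Lagrange multiplier of equation (3)
    for _ in range(max_iter):
        X = prox_f(Z - Lam / rho, rho)       # X-update: minimize (3) over X
        Z_new = prox_g(X + Lam / rho, rho)   # Z-update: minimize (3) over Z
        Lam = Lam + rho * (X - Z_new)        # dual-ascent update of the multiplier
        if np.linalg.norm(X - Z_new) < tol and np.linalg.norm(Z_new - Z) < tol:
            Z = Z_new
            break
        Z = Z_new
    return X, Z

When g is an ℓ_1-type penalty, prox_g is elementwise soft-thresholding; this is exactly the role the fused lasso signal approximator plays in the Z-update derived in Section 2.2.2 below.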

2.2 ADMM algorithm for fused graphical lasso with weighted ℓ_1 penalty

Following the method of [3], we use the ADMM algorithm to solve the fused graphical lasso with weighted ℓ_1 penalty (problem (5) in the main text):

    min_{Θ^{k1}, Θ^{k2} ∈ S^p_{++}}  Σ_{c=1}^{2} n_c ( tr(S^{kc} Θ^{kc}) − log det(Θ^{kc}) ) + λ_1 Σ_{c=1}^{2} Σ_{i≠j} ω_ij |θ^{kc}_ij| + λ_2 Σ_{i,j} ψ_ij |θ^{k1}_ij − θ^{k2}_ij|.   (4)

To solve problem (4), we reformulate it by introducing new variables Z^{k1} and Z^{k2}, so as to decouple some of the terms in the objective function that are difficult to optimize jointly:

    min_{Θ^{k1}, Θ^{k2} ∈ S^p_{++}, Z^{k1}, Z^{k2}}  Σ_{c=1}^{2} n_c ( tr(S^{kc} Θ^{kc}) − log det(Θ^{kc}) ) + λ_1 Σ_{c=1}^{2} Σ_{i≠j} ω_ij |z^{kc}_ij| + λ_2 Σ_{i,j} ψ_ij |z^{k1}_ij − z^{k2}_ij|
    subject to Θ^{k1} = Z^{k1}, Θ^{k2} = Z^{k2}.   (5)

The augmented Lagrangian to (5) is given by

    L_ρ(Θ^{k1}, Θ^{k2}, Z^{k1}, Z^{k2}, U^{k1}, U^{k2}) = Σ_{c=1}^{2} n_c ( tr(S^{kc} Θ^{kc}) − log det(Θ^{kc}) ) + λ_1 Σ_{c=1}^{2} Σ_{i≠j} ω_ij |z^{kc}_ij| + λ_2 Σ_{i,j} ψ_ij |z^{k1}_ij − z^{k2}_ij|
        + Σ_{c=1}^{2} ( ⟨U^{kc}, Θ^{kc} − Z^{kc}⟩ + (ρ/2)‖Θ^{kc} − Z^{kc}‖_F² ),   (6)

where U^{k1} and U^{k2} are dual variables and ρ serves as the penalty parameter. The ADMM algorithm updates each primal variable while holding the other variables fixed, and then updates the dual variables using a dual-ascent rule. We now derive the update rules for the variables.

2.2.1 Θ^{k1} and Θ^{k2} update

Before introducing the update rules for Θ^{k1} and Θ^{k2}, we first define the Expand operator:

    Expand(A, ρ, n) = argmin_{Θ ∈ S^p_{++}} { −n log det(Θ) + (ρ/2)‖Θ − A‖_F² } = (1/2) U ( D + ( D² + (4n/ρ) I )^{1/2} ) U^T,   (7)

where U D U^T is the eigenvalue decomposition of the symmetric matrix A. The Expand operator has been used to solve the graphical lasso problem in previous studies [2, 4, 3]. Note that

    Θ^{k1} = argmin_{Θ^{k1} ∈ S^p_{++}} L_ρ(Θ^{k1}, Θ^{k2}, Z^{k1}, Z^{k2}, U^{k1}, U^{k2})
           = argmin_{Θ^{k1} ∈ S^p_{++}} { −n_1 log det(Θ^{k1}) + (ρ/2)‖Θ^{k1} − ( Z^{k1} − (1/ρ)(n_1 S^{k1} + U^{k1}) )‖_F² }.   (8)

It now follows from the definition of the Expand operator that

    Θ^{k1} ← Expand( Z^{k1} − (1/ρ)(n_1 S^{k1} + U^{k1}), ρ, n_1 ).   (9)

The update for Θ^{k2} can be derived in a similar way:

    Θ^{k2} ← Expand( Z^{k2} − (1/ρ)(n_2 S^{k2} + U^{k2}), ρ, n_2 ).   (10)

2.2.2 Z^{k1} and Z^{k2} update

Minimizing the augmented Lagrangian (6) with respect to Z^{k1} and Z^{k2} can be written as follows:

    {Z^{k1}, Z^{k2}} = argmin_{Z^{k1}, Z^{k2}} L_ρ(Θ^{k1}, Θ^{k2}, Z^{k1}, Z^{k2}, U^{k1}, U^{k2})
                     = argmin_{Z^{k1}, Z^{k2}} Σ_{c=1}^{2} (ρ/2)‖Z^{kc} − ( Θ^{kc} + U^{kc}/ρ )‖_F² + λ_1 Σ_{c=1}^{2} Σ_{i≠j} ω_ij |z^{kc}_ij| + λ_2 Σ_{i,j} ψ_ij |z^{k1}_ij − z^{k2}_ij|.   (11)

Problem (11) is completely separable with respect to each pair of matrix elements (i, j). That is, we can solve, for each (i, j),

    {z^{k1}_ij, z^{k2}_ij} = argmin_{z^{k1}_ij, z^{k2}_ij} (ρ/2) Σ_{c=1}^{2} ( z^{kc}_ij − a^{kc}_ij )² + λ_1 ω_ij Σ_{c=1}^{2} |z^{kc}_ij| + λ_2 ψ_ij |z^{k1}_ij − z^{k2}_ij|,   (12)

where a^{kc}_ij = θ^{kc}_ij + u^{kc}_ij/ρ. This is a special case of the fused lasso signal approximator [3, 5] and it has a very simple closed-form solution. When λ_1 = 0, the solution to (12) takes the form

    (z^{k1}_ij, z^{k2}_ij) =
        ( a^{k1}_ij − λ_2 ψ_ij/ρ,  a^{k2}_ij + λ_2 ψ_ij/ρ )         if a^{k1}_ij > a^{k2}_ij + 2λ_2 ψ_ij/ρ,
        ( a^{k1}_ij + λ_2 ψ_ij/ρ,  a^{k2}_ij − λ_2 ψ_ij/ρ )         if a^{k2}_ij > a^{k1}_ij + 2λ_2 ψ_ij/ρ,
        ( (a^{k1}_ij + a^{k2}_ij)/2,  (a^{k1}_ij + a^{k2}_ij)/2 )   if |a^{k1}_ij − a^{k2}_ij| ≤ 2λ_2 ψ_ij/ρ.   (13)

When λ_1 > 0, the solution to (12) can be obtained by soft-thresholding (13) by λ_1 ω_ij/ρ. Here soft-thresholding is defined as S(x, c) = sign(x)(|x| − c)_+, where (a)_+ = max(a, 0).
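As an illustration of the two update steps just derived, the sketch below (Python/NumPy; our own hedged rendering of equations (7) and (11)-(13), not the authors' code) implements the Expand operator via an eigendecomposition and the element-wise Z-update, i.e., the fused-lasso step (13) followed by soft-thresholding with the weighted ℓ_1 penalty.

import numpy as np

def expand(A, rho, n):
    # Expand operator of equation (7): argmin over positive-definite Theta of
    # -n*logdet(Theta) + (rho/2)*||Theta - A||_F^2, via the eigendecomposition of A.
    d, U = np.linalg.eigh((A + A.T) / 2.0)                 # symmetrize for numerical safety
    d_new = (d + np.sqrt(d ** 2 + 4.0 * n / rho)) / 2.0
    return (U * d_new) @ U.T

def soft_threshold(x, c):
    # S(x, c) = sign(x) * (|x| - c)_+ applied elementwise.
    return np.sign(x) * np.maximum(np.abs(x) - c, 0.0)

def z_update(Theta1, Theta2, U1, U2, rho, lam1, lam2, omega, psi):
    # Element-wise closed form of (12)-(13): fused-lasso step, then weighted soft-thresholding.
    # omega and psi are (p x p) weight matrices; for brevity this sketch thresholds every entry,
    # whereas the weighted l1 term in (4) only covers i != j (set omega's diagonal to zero).
    A1 = Theta1 + U1 / rho
    A2 = Theta2 + U2 / rho
    shift = lam2 * psi / rho
    fuse = np.abs(A1 - A2) <= 2.0 * shift                  # third case of equation (13)
    Z1 = np.where(fuse, (A1 + A2) / 2.0, np.where(A1 > A2, A1 - shift, A1 + shift))
    Z2 = np.where(fuse, (A1 + A2) / 2.0, np.where(A1 > A2, A2 + shift, A2 - shift))
    Z1 = soft_threshold(Z1, lam1 * omega / rho)            # lambda_1 > 0 case
    Z2 = soft_threshold(Z2, lam1 * omega / rho)
    return Z1, Z2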

2.2.3 Complete ADMM algorithm

Based on the augmented Lagrangian, the complete ADMM algorithm for (4) is given in Algorithm 1. We find it useful in practice to vary the value of the penalty parameter ρ across iterations; therefore, we update ρ in each iteration based on the primal and dual residuals. We present the update rule for ρ at the end of this section. Please note that, according to the ADMM algorithm, the estimates Θ̂^{k1} and Θ̂^{k2} are not exactly sparse, whereas Ẑ^{k1} and Ẑ^{k2}, which are obtained through soft-thresholding, are sparse.

Algorithm 1: ADMM algorithm for optimization problem (4)
Inputs: sample covariance matrices S^{k1} and S^{k2}, sample sizes n_1 and n_2, penalty parameter ρ = 1, and adaptation parameters µ = 10, τ^incr = 2 and τ^decr = 2
Output: estimated precision matrices Θ̂^{k1} and Θ̂^{k2}, and their sparse approximations Ẑ^{k1} and Ẑ^{k2}
Initialize: Θ^{kc} = I, Z^{kc} = I, U^{kc} = 0 for c = 1, 2
while not converged do
  1. Θ^{k1} ← Expand( Z^{k1} − (1/ρ)(n_1 S^{k1} + U^{k1}), ρ, n_1 );
  2. Θ^{k2} ← Expand( Z^{k2} − (1/ρ)(n_2 S^{k2} + U^{k2}), ρ, n_2 );
  3. Update Z^{k1} and Z^{k2} by solving problem (11);
  4. U^{k1} ← U^{k1} + ρ( Θ^{k1} − Z^{k1} );
  5. U^{k2} ← U^{k2} + ρ( Θ^{k2} − Z^{k2} );
  6. Update ρ according to (18).
Output Θ^{k1}, Θ^{k2}, Z^{k1} and Z^{k2}

2.2.4 Stop criteria

Let Θ^{kc}_t, Z^{kc}_t and U^{kc}_t denote the estimates at the t-th iteration. The primal residual [1] is defined as

    P_t = [ Θ^{k1}_t ; Θ^{k2}_t ] − [ Z^{k1}_t ; Z^{k2}_t ],   (14)

and the dual residual [1] is defined as

    D_t = ρ ( [ Z^{k1}_t ; Z^{k2}_t ] − [ Z^{k1}_{t−1} ; Z^{k2}_{t−1} ] ).   (15)

We consider the process to have converged if both the primal residual and the dual residual are sufficiently small [1]. More specifically, we introduce small positive constants ε^abs and ε^rel, and declare P_t and D_t small if

    ‖P_t‖_F ≤ √2 p ε^abs + ε^rel max{ ‖[ Θ^{k1}_t ; Θ^{k2}_t ]‖_F, ‖[ Z^{k1}_t ; Z^{k2}_t ]‖_F }   (16)

and

    ‖D_t‖_F ≤ √2 p ε^abs + ε^rel ‖[ U^{k1}_t ; U^{k2}_t ]‖_F.   (17)

We set ε^abs = 10^{−3} and ε^rel = 10^{−3}, as suggested by Boyd et al. [1]. The choice of the value of ε^abs depends on the scale of the variable values.

2.2.5 Varying penalty parameter

In practice, it is useful to use a different penalty parameter ρ for each iteration, which can improve convergence and make performance less dependent on the initial value of the penalty parameter [1]. Therefore, we update the value of ρ in the t-th iteration according to the primal and dual residuals:

    ρ_{t+1} = τ^incr ρ_t     if ‖P_t‖_F > µ ‖D_t‖_F,
              ρ_t / τ^decr   if ‖D_t‖_F > µ ‖P_t‖_F,
              ρ_t            otherwise,   (18)

where µ > 1, τ^incr > 1 and τ^decr > 1 are adaptation parameters. We set µ = 10, τ^incr = 2 and τ^decr = 2, according to [1].
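The residual computations (14)-(15), the stopping rule (16)-(17) and the adaptive update (18) can be restated compactly; the sketch below (Python/NumPy; an illustration following the recipe of Boyd et al. [1], not the paper's implementation) treats the two blocks as a stacked pair.

import numpy as np

def residuals(Theta1, Theta2, Z1, Z2, Z1_prev, Z2_prev, rho):
    # Primal residual (14) and dual residual (15), as Frobenius norms of the stacked blocks.
    primal = np.sqrt(np.linalg.norm(Theta1 - Z1) ** 2 + np.linalg.norm(Theta2 - Z2) ** 2)
    dual = rho * np.sqrt(np.linalg.norm(Z1 - Z1_prev) ** 2 + np.linalg.norm(Z2 - Z2_prev) ** 2)
    return primal, dual

def has_converged(primal, dual, Theta1, Theta2, Z1, Z2, U1, U2, eps_abs=1e-3, eps_rel=1e-3):
    # Stopping rule (16)-(17).
    p = Theta1.shape[0]
    stack_norm = lambda A, B: np.sqrt(np.linalg.norm(A) ** 2 + np.linalg.norm(B) ** 2)
    eps_pri = np.sqrt(2.0) * p * eps_abs + eps_rel * max(stack_norm(Theta1, Theta2),
                                                         stack_norm(Z1, Z2))
    eps_dual = np.sqrt(2.0) * p * eps_abs + eps_rel * stack_norm(U1, U2)
    return primal <= eps_pri and dual <= eps_dual

def update_rho(rho, primal, dual, mu=10.0, tau_incr=2.0, tau_decr=2.0):
    # Adaptive penalty update of equation (18).
    if primal > mu * dual:
        return rho * tau_incr
    if dual > mu * primal:
        return rho / tau_decr
    return rho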

2.3 Complete algorithm of the TDJGL model

By plugging Equation (4) into Equation (3) in the main text, the TDJGL model can be written as follows:

    min_{Θ^{kc}}  Σ_{k=1}^{K} Σ_{c=1}^{2} n_c ( tr(S^{kc} Θ^{kc}) − log det(Θ^{kc}) ) + λ_1 Σ_{i≠j} P_1( {θ^{kc}_ij}_{k=1,...,K; c=1,2} ) + λ_2 Σ_{i,j} P_2( {θ^{k1}_ij − θ^{k2}_ij}_{k=1,...,K} )
    s.t. Θ^{kc} ∈ S^p_{++}, for k = 1, ..., K and c = 1, 2,   (19)

where P_1 and P_2 are the group penalties defined in the main text, which couple, respectively, the entries θ^{kc}_ij and the differences θ^{k1}_ij − θ^{k2}_ij across the K platforms. Given the sample covariance matrices {S^{kc}}_{k=1,...,K; c=1,2} and the two parameters λ_1 and λ_2, we can find the estimates of the precision matrices {Θ̂^{kc}} and their sparse approximations {Ẑ^{kc}} using Algorithm 2. We first decompose (19) into K individual subproblems of the form (4) using the method of local linear approximation (see the main text), and then iteratively solve each subproblem until convergence. Because the estimates {Θ̂^{kc}} are not exactly sparse when they are obtained by the ADMM algorithm, in the experiments we use {Ẑ^{kc}}, which are introduced as sparse approximations of {Θ̂^{kc}} by the ADMM algorithm (please refer to Algorithm 1).

Algorithm 2: Complete algorithm of the TDJGL model (19)
Inputs: the sample covariance matrices {S^{kc}}_{k=1,...,K; c=1,2}, and the regularization parameters λ_1 and λ_2
Output: the estimated precision matrices {Θ̂^{kc}} and their sparse approximations {Ẑ^{kc}}
Main algorithm:
  1. Initialize Θ̂^{kc} for k = 1, ..., K and c = 1, 2.
  2. Compute the weights ω_ij = ( Σ_{k=1}^{K} Σ_{c=1}^{2} |θ̂^{kc}_ij| )^{−1} and ψ_ij = ( Σ_{k=1}^{K} |θ̂^{k1}_ij − θ̂^{k2}_ij| )^{−1} for i, j = 1, ..., p.
  3. Update Θ̂^{k1} and Θ̂^{k2} for all k = 1, ..., K by solving problem (4) using Algorithm 1.
  4. Repeat steps 2 and 3 until the convergence condition is achieved.
  5. Output {Θ̂^{kc}} and {Ẑ^{kc}}.

2.4 Model selection

For the TDJGL model, the parameter λ_1 controls the sparsity of the estimated networks, and the parameter λ_2 influences the sparsity of the resulting differential networks. Therefore, the choice of λ_1 and λ_2 is critical. We determine the values of the parameters in a data-driven manner via stability selection [6]. Stability selection, which seeks the parameters leading to the most stable set of edges, has been reported to give better results for network inference than other model selection methods, including cross-validation, the Akaike information criterion and the Bayesian information criterion [7, 8, 9, 10]. We choose λ_1 and λ_2 so as to use the least amount of regularization that simultaneously makes the networks sparse and stable. Here we resort to a recently developed stability selection method called StARS [7]. Because λ_1 mainly influences the sparsity and stability of the resulting gene networks, while λ_2 mainly controls the sparsity and stability of the estimated differential networks, we determine their values separately. We first determine the value of λ_1 while setting λ_2 = 0. Then we determine the value of λ_2 while fixing λ_1 at the value chosen in the previous step.

We draw S random sample sets D_1, ..., D_S from the n = n_1 + n_2 patients, each of size 0.8n. We first choose λ_1 from a given vector of regularization parameters Λ_1, with λ_2 = 0. We estimate 2K networks {Ê^{kc}_s(λ_1, λ_2)} for each D_s and each λ_1 from Λ_1. The optimal value of λ_1 is chosen according to the average variance over the edges of the networks inferred from the sub-sampled data:

    λ^{(λ_2)}_{1,opt} = min { γ ∈ Λ_1 :  max_{λ_1 ≥ γ}  (1/(2K)) Σ_{k=1}^{K} Σ_{c=1}^{2} [ Σ_{i<j} 2 ā^{kc}_ij(λ_1, λ_2) ( 1 − ā^{kc}_ij(λ_1, λ_2) ) / ( p(p−1)/2 ) ]  ≤ β },   (20)

where ā^{kc}_ij(λ_1, λ_2) = (1/S) Σ_{s=1}^{S} I( (i, j) ∈ Ê^{kc}_s(λ_1, λ_2) ). Here we present StARS for completeness; interested readers are referred to [7] for details.

After determining λ_1, we choose λ_2 from a given vector of regularization parameters Λ_2 according to the stability of the inferred differential networks. We now set λ_1 = λ^{(λ_2)}_{1,opt}, which is determined above. We estimate 2K networks {Ê^{kc}_s(λ_1, λ_2)} for each D_s and each λ_2 from Λ_2. Then we construct K differential networks {DE^k_s(λ_1, λ_2)} based on the estimated networks. The optimal value of λ_2 is chosen according to the average variance over the differential edges inferred from the sub-sampled data:

    λ^{(λ_1)}_{2,opt} = min { γ ∈ Λ_2 :  max_{λ_2 ≥ γ}  (1/K) Σ_{k=1}^{K} [ Σ_{i<j} 2 b̄^k_ij(λ_1, λ_2) ( 1 − b̄^k_ij(λ_1, λ_2) ) / ( p(p−1)/2 ) ]  ≤ β },   (21)

where b̄^k_ij(λ_1, λ_2) = (1/S) Σ_{s=1}^{S} I( (i, j) ∈ DE^k_s(λ_1, λ_2) ). In this study, we set the number of random sample sets S = and the stability parameter β = 0.1.
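Operationally, the rule (20) amounts to computing, for every candidate λ_1, the edge-selection frequencies ā over the S sub-samples and the resulting average instability, and then picking the smallest λ_1 whose monotonised instability stays below β. The sketch below (Python/NumPy; a hedged illustration of this StARS-style rule, assuming a user-supplied fit_networks function that returns binary adjacency matrices, not the authors' implementation) makes this explicit; the rule (21) for λ_2 is analogous, with the differential networks in place of the networks.

import numpy as np

def stars_select_lambda1(fit_networks, subsamples, lambda1_grid, p, beta=0.1):
    # fit_networks(D_s, lam1) is assumed to return a list of 2K binary (p x p) adjacency
    # matrices estimated on sub-sample D_s with penalty lam1 (and lambda_2 = 0).
    lambda1_grid = np.sort(np.asarray(lambda1_grid, dtype=float))
    S = len(subsamples)
    n_pairs = p * (p - 1) / 2.0
    iu = np.triu_indices(p, k=1)
    instability = []
    for lam1 in lambda1_grid:
        freqs = None                                  # edge-selection frequencies a-bar
        for D_s in subsamples:
            nets = np.asarray(fit_networks(D_s, lam1), dtype=float)
            freqs = nets if freqs is None else freqs + nets
        freqs = freqs / S
        # Average (over the 2K networks) total edge variance, as in equation (20).
        per_net = [np.sum(2.0 * f[iu] * (1.0 - f[iu])) / n_pairs for f in freqs]
        instability.append(np.mean(per_net))
    instability = np.array(instability)
    # Monotonise: require instability <= beta at this and every larger penalty.
    ok = np.array([instability[i:].max() <= beta for i in range(len(lambda1_grid))])
    # Smallest qualifying lambda_1 (fall back to the largest penalty if none qualifies).
    return lambda1_grid[np.argmax(ok)] if ok.any() else lambda1_grid[-1]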

2.5 Criteria for platinum response groups

The criterion used in [11, 12] is adopted to define the platinum-based chemotherapy response groups. In particular, we download the clinical information (Biotab format) of the ovarian tumors from the TCGA website. We obtain the drug information from the nationwidechildrens.org_clinical_drug_ov.txt file. Here we only consider platinum-based drugs (carboplatin, cisplatin, carbo) with regimen indication ADJUVANT and OTHER, SPECIFY IN NOTES. The cancer progression information is obtained from the nationwidechildrens.org_clinical_follow_up_v1.0_nte_ov.txt file. New tumors with new tumor event dx evidence = [Not Available] and new neoplasm event type = [Unknown] are not considered. The follow-up information of tumors with no progression is obtained from the nationwidechildrens.org_clinical_follow_up_v1.0_ov.txt file. For a tumor with progression, it is defined as platinum-resistant if the new tumor occurs within 6 months of the end of primary treatment (days to new tumor event after initial treatment − days to drug therapy end ≤ 180 days), and it is defined as platinum-sensitive otherwise (days to new tumor event after initial treatment − days to drug therapy end > 180 days). For a tumor with no progression, it is defined as platinum-sensitive if the follow-up interval is at least 6 months from the date of last primary treatment (days to last followup − days to drug therapy end > 180 days). Among the 514 tumors that have all three types of gene expression profiles, 302 have explicit platinum response status, with 204 platinum-sensitive tumors and 98 platinum-resistant tumors. The sensitive/resistant label for each sample is provided in the Supplementary information.
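Stated as a rule, the grouping above depends on three time fields from the Biotab files; the sketch below (Python; an illustrative restatement with assumed argument names, not the authors' processing script) applies the 180-day (6-month) threshold.

def platinum_response(days_to_drug_therapy_end, days_to_new_tumor_event=None,
                      days_to_last_followup=None):
    # Classify one tumor by the 6-month (180-day) rule described above. Argument names
    # mirror the TCGA Biotab fields; returns 'resistant', 'sensitive', or None when the
    # rule cannot be applied (e.g., follow-up too short to call a tumor sensitive).
    if days_to_new_tumor_event is not None:                      # tumor with progression
        gap = days_to_new_tumor_event - days_to_drug_therapy_end
        return 'resistant' if gap <= 180 else 'sensitive'
    if days_to_last_followup is not None:                        # no progression recorded
        gap = days_to_last_followup - days_to_drug_therapy_end
        return 'sensitive' if gap > 180 else None
    return None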
2.6 Comparison with other graphical lasso models on the ovarian cancer data

We compare TDJGL with FGL, GGL and GL on the ovarian cancer data. For FGL, we run it separately for each platform, and each time it is applied across the two patient groups. For GGL, we run it separately for each patient group, and each time it is applied across all three platforms. For GL, we run it separately for each patient group and each platform. In order to provide interpretable results, we select the tuning parameters of the compared methods to give similar numbers of edges and differential edges to those of TDJGL. Unlike TDJGL and FGL, GGL and GL cannot control the similarity of the precision matrices between different patient groups, and they tend to identify too many differential edges. To better interpret the results of GGL and GL, we sort the absolute values of their differential scores in decreasing order and take the top #DE (the number of differential edges identified by TDJGL) edges for each model.

A common challenge in evaluating gene network inference and differential network analysis using real data is the lack of gold standards. That is, in our ovarian cancer data analysis, we cannot obtain the true gene networks of the platinum-resistant tumors and the platinum-sensitive tumors. Therefore, it is difficult to compare different methods in terms of the accuracy of identifying group-specific gene networks and differential networks. In this study, we adopt an alternative way to evaluate performance. First, we compare the methods based on the overlaps between edges inferred from different platforms, which assesses consistency: a method that produces a greater number of edges shared by different platforms is more consistent. Then, we compare the hub nodes in the differential networks in terms of known drug resistance-related genes and cancer-related genes: a method that better captures known functionally important genes in the differential networks might have better performance in inferring the differential networks.

We observe that the overlaps between edges (and differential edges) identified by FGL and GL from the different platforms are quite low (Figures S7 and S9). For GGL, which encourages a similar network structure across all platforms, more than half of the identified edges are shared by all three platforms (Figure S8 a)-b)). Because GGL does not consider the similarity of the differential networks across platforms, a great number of differential edges detected by GGL are supported by only one platform (Figure S8 c)). As mentioned in the main text, both the gene networks and the differential networks inferred by TDJGL share a great number of edges across all three platforms (Figure S6).

We also compare the hub nodes in the differential networks inferred by the different methods. For FGL, GGL and GL, we consider the 18 genes with the largest degree of connectivity as hub genes. From Table S1, we find that the set of hub genes determined by TDJGL includes more cisplatin resistance-related genes, drug resistance-related genes and cancer-related genes than those determined by the other three methods.

Table S1: The number of hub genes that have been reported as platinum resistance-related genes (GEAR cisplatin), drug resistance-related genes (GEAR drug) and cancer-related genes (CGC)

Methods    GEAR cisplatin    GEAR drug    CGC
TDJGL
FGL        3                 6
GGL
GL

Figure S7: Overlaps between the edges (and differential edges) detected by FGL from the three platforms for a) platinum-resistant tumors, b) platinum-sensitive tumors and c) differential networks.

Figure S8: Overlaps between the edges (and differential edges) detected by GGL from the three platforms for a) platinum-resistant tumors, b) platinum-sensitive tumors and c) differential networks.

Figure S9: Overlaps between the edges (and differential edges) detected by GL from the three platforms for a) platinum-resistant tumors, b) platinum-sensitive tumors and c) differential networks.

References

[1] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122, 2011.

[2] Karthik Mohan, Palma London, Maryam Fazel, Daniela Witten, and Su-In Lee. Node-based learning of multiple Gaussian graphical models. Journal of Machine Learning Research, 15(1):445-488, 2014.

[3] Patrick Danaher, Pei Wang, and Daniela M. Witten. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(2):373-397, 2014.

[4] Daniela M. Witten and Robert Tibshirani. Covariance-regularized regression and classification for high dimensional problems. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(3):615-636, 2009.

[5] Holger Hoefling. A path algorithm for the fused lasso signal approximator. Journal of Computational and Graphical Statistics, 19(4):984-1006, 2010.

[6] Nicolai Meinshausen and Peter Bühlmann. Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4):417-473, 2010.

[7] Han Liu, Kathryn Roeder, and Larry Wasserman. Stability approach to regularization selection (StARS) for high dimensional graphical models. In Advances in Neural Information Processing Systems, 2010.

[8] Genevera I. Allen and Zhandong Liu. A local Poisson graphical model for inferring networks from sequencing data. IEEE Transactions on NanoBioscience, 12(3):189-198, 2013.

[9] Marinka Žitnik and Blaž Zupan. Gene network inference by fusing data from diverse distributions. Bioinformatics, 31(12):i230-i239, 2015.

[10] Zachary D. Kurtz, Christian L. Müller, Emily R. Miraldi, Dan R. Littman, Martin J. Blaser, and Richard A. Bonneau. Sparse and compositionally robust inference of microbial ecological networks. PLOS Computational Biology, 11(5):e1004226, 2015.

[11] The Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature, 474(7353):609-615, 2011.

[12] Sheida Nabavi, Daniel Schmolze, Mayinuer Maitituoheti, Sadhika Malladi, and Andrew H. Beck. EMDomics: a robust and powerful method for the identification of genes differentially expressed between heterogeneous classes. Bioinformatics, page btv634, 2015.
