Differential network analysis from cross-platform gene expression data: Supplementary Information


Xiao-Fei Zhang, Le Ou-Yang, Xing-Ming Zhao, and Hong Yan

Contents

1 Supplementary Figures
2 Supplementary Text
  2.1 Brief review of ADMM algorithms
  2.2 ADMM algorithm for fused graphical lasso with weighted ℓ_1 penalty
    2.2.1 Θ^{k1} and Θ^{k2} update
    2.2.2 Z^{k1} and Z^{k2} update
    2.2.3 Complete ADMM algorithm
    2.2.4 Stop criteria
    2.2.5 Varying penalty parameter
  2.3 Complete algorithm of the TDJGL model
  2.4 Model selection
  2.5 Criteria for platinum response groups
  2.6 Comparison with other graphical lasso models on the ovarian cancer data

1 Supplementary Figures

Figure S1: Examples of the three types of networks used to generate the simulated data sets: a) Erdös-Rényi network, b) scale-free network, and c) community network.

Figure S2: Performance of the compared methods on the Erdös-Rényi network with p = , K = 3, τ = 1% and a) n = 5, b) n = , c) n = . Each colored line corresponds to a fixed value of λ_2 (ω_2 for GGL), as λ_1 (ω_1 for GGL) is varied. The variables corresponding to the axes (FP edges, FP differential edges, positive edges) are explained in Table 1 in the main text. Results are averaged over random generations of the data.

Figure S3: Performance of the compared methods on the community network with p = , K = 3, τ = 1% and a) n = 5, b) n = , c) n = . Each colored line corresponds to a fixed value of λ_2 (ω_2 for GGL), as λ_1 (ω_1 for GGL) is varied. The variables corresponding to the axes are explained in Table 1 in the main text. Results are averaged over random generations of the data.

Figure S4: Performance of the compared models on the scale-free network with p = , K = 3, τ = % and a) n = 5, b) n = , c) n = . Each colored line corresponds to a fixed value of λ_2 (ω_2 for GGL), as λ_1 (ω_1 for GGL) is varied. The variables corresponding to the axes (FP edges, FP differential edges, positive edges) are explained in Table 1 in the main text. Results are averaged over random generations of the data.

Figure S5: Performance of the compared models on the scale-free network with p = , K = 3, τ = 5% and a) n = 5, b) n = , c) n = . Each colored line corresponds to a fixed value of λ_2 (ω_2 for GGL), as λ_1 (ω_1 for GGL) is varied. The variables corresponding to the axes are explained in Table 1 in the main text. Results are averaged over random generations of the data.

Figure S6: Overlaps between the edges (and differential edges) detected by TDJGL from the three platforms (G4502, U133 and HuEx) for a) platinum-resistant tumors, b) platinum-sensitive tumors and c) differential networks.

2 Supplementary Text

2.1 Brief review of ADMM algorithms

In this section, we briefly review the standard alternating direction method of multipliers (ADMM) algorithm [1]. ADMM is a technique for solving optimization problems of the following form [2]:

    min_X f(X) + g(X)   subject to X ∈ 𝒳.   (1)

ADMM is attractive when the proximal operator of f(X) + g(X) cannot be obtained easily, but the proximal operators of f(X) and of g(X) can each be computed easily. The approach consists of the following three steps [2]:

1. Rewrite problem (1) as

    min_{X,Z} f(X) + g(Z)   subject to X ∈ 𝒳, X = Z,   (2)

where the functions f and g are decoupled by introducing a new optimization variable Z.

2. Form the augmented Lagrangian

    L_ρ(X, Z, Λ) = f(X) + g(Z) + ⟨Λ, X − Z⟩ + (ρ/2)‖X − Z‖_F²,   (3)

where Λ is the Lagrange multiplier and ρ > 0 is a penalty parameter.

3. Iterate the following two steps until convergence:

a) Update each primal variable in turn by minimizing the augmented Lagrangian (3) with respect to that variable, while holding all other variables fixed. The updates in the t-th iteration are

    X^{t+1} ← argmin_{X ∈ 𝒳} L_ρ(X, Z^t, Λ^t),
    Z^{t+1} ← argmin_Z L_ρ(X^{t+1}, Z, Λ^t).

b) Update the dual variable using a dual-ascent update:

    Λ^{t+1} ← Λ^t + ρ(X^{t+1} − Z^{t+1}).
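To make the three steps above concrete, the following short sketch (Python/NumPy; an illustrative rendering of the generic scheme, not code from this paper) carries out the two primal updates and the dual-ascent step for the split problem (2), with the proximal operators of f and g supplied by the caller.

import numpy as np

def admm(prox_f, prox_g, shape, rho=1.0, max_iter=200, tol=1e-6):
    # Generic ADMM for min_X f(X) + g(Z) subject to X = Z.
    # prox_f(V, rho) is assumed to return argmin_X f(X) + (rho/2)*||X - V||_F^2,
    # and prox_g(V, rho) the analogous minimizer for g.
    X = np.zeros(shape)
    Z = np.zeros(shape)
    Lam = np.zeros(shape)                    # Lagrange multiplier of equation (3)
    for _ in range(max_iter):
        X = prox_f(Z - Lam / rho, rho)       # X-update: minimize (3) over X
        Z_new = prox_g(X + Lam / rho, rho)   # Z-update: minimize (3) over Z
        Lam = Lam + rho * (X - Z_new)        # dual-ascent update of the multiplier
        if np.linalg.norm(X - Z_new) < tol and np.linalg.norm(Z_new - Z) < tol:
            Z = Z_new
            break
        Z = Z_new
    return X, Z

When g is an ℓ_1-type penalty, prox_g is elementwise soft-thresholding; this is exactly the role the fused lasso signal approximator plays in the Z-update derived in Section 2.2.2 below.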

2.2 ADMM algorithm for fused graphical lasso with weighted ℓ_1 penalty

Following the method of [3], we use the ADMM algorithm to solve the fused graphical lasso with weighted ℓ_1 penalty (problem (5) in the main text):

    min_{Θ^{k1}, Θ^{k2} ∈ S^p_{++}}  Σ_{c=1}^{2} n_c ( tr(S^{kc} Θ^{kc}) − log det(Θ^{kc}) ) + λ_1 Σ_{c=1}^{2} Σ_{i≠j} ω_ij |θ^{kc}_ij| + λ_2 Σ_{i,j} ψ_ij |θ^{k1}_ij − θ^{k2}_ij|.   (4)

To solve problem (4), we reformulate it by introducing new variables Z^{k1} and Z^{k2}, so as to decouple some of the terms in the objective function that are difficult to optimize jointly:

    min_{Θ^{k1}, Θ^{k2} ∈ S^p_{++}, Z^{k1}, Z^{k2}}  Σ_{c=1}^{2} n_c ( tr(S^{kc} Θ^{kc}) − log det(Θ^{kc}) ) + λ_1 Σ_{c=1}^{2} Σ_{i≠j} ω_ij |z^{kc}_ij| + λ_2 Σ_{i,j} ψ_ij |z^{k1}_ij − z^{k2}_ij|
    subject to Θ^{k1} = Z^{k1}, Θ^{k2} = Z^{k2}.   (5)

The augmented Lagrangian to (5) is given by

    L_ρ(Θ^{k1}, Θ^{k2}, Z^{k1}, Z^{k2}, U^{k1}, U^{k2}) = Σ_{c=1}^{2} n_c ( tr(S^{kc} Θ^{kc}) − log det(Θ^{kc}) ) + λ_1 Σ_{c=1}^{2} Σ_{i≠j} ω_ij |z^{kc}_ij| + λ_2 Σ_{i,j} ψ_ij |z^{k1}_ij − z^{k2}_ij|
        + Σ_{c=1}^{2} ( ⟨U^{kc}, Θ^{kc} − Z^{kc}⟩ + (ρ/2)‖Θ^{kc} − Z^{kc}‖_F² ),   (6)

where U^{k1} and U^{k2} are dual variables and ρ serves as the penalty parameter. The ADMM algorithm updates each primal variable while holding the other variables fixed, and then updates the dual variables using a dual-ascent rule. We now derive the update rules for the variables.

2.2.1 Θ^{k1} and Θ^{k2} update

Before introducing the update rules for Θ^{k1} and Θ^{k2}, we first define the Expand operator:

    Expand(A, ρ, n) = argmin_{Θ ∈ S^p_{++}} { −n log det(Θ) + (ρ/2)‖Θ − A‖_F² } = (1/2) U ( D + ( D² + (4n/ρ) I )^{1/2} ) U^T,   (7)

where U D U^T is the eigenvalue decomposition of the symmetric matrix A. The Expand operator has been used to solve the graphical lasso problem in previous studies [2, 4, 3]. Note that

    Θ^{k1} = argmin_{Θ^{k1} ∈ S^p_{++}} L_ρ(Θ^{k1}, Θ^{k2}, Z^{k1}, Z^{k2}, U^{k1}, U^{k2})
           = argmin_{Θ^{k1} ∈ S^p_{++}} { −n_1 log det(Θ^{k1}) + (ρ/2)‖Θ^{k1} − ( Z^{k1} − (1/ρ)(n_1 S^{k1} + U^{k1}) )‖_F² }.   (8)

It now follows from the definition of the Expand operator that

    Θ^{k1} ← Expand( Z^{k1} − (1/ρ)(n_1 S^{k1} + U^{k1}), ρ, n_1 ).   (9)

The update for Θ^{k2} can be derived in a similar way:

    Θ^{k2} ← Expand( Z^{k2} − (1/ρ)(n_2 S^{k2} + U^{k2}), ρ, n_2 ).   (10)

2.2.2 Z^{k1} and Z^{k2} update

Minimizing the augmented Lagrangian (6) with respect to Z^{k1} and Z^{k2} can be written as follows:

    {Z^{k1}, Z^{k2}} = argmin_{Z^{k1}, Z^{k2}} L_ρ(Θ^{k1}, Θ^{k2}, Z^{k1}, Z^{k2}, U^{k1}, U^{k2})
                     = argmin_{Z^{k1}, Z^{k2}} Σ_{c=1}^{2} (ρ/2)‖Z^{kc} − ( Θ^{kc} + U^{kc}/ρ )‖_F² + λ_1 Σ_{c=1}^{2} Σ_{i≠j} ω_ij |z^{kc}_ij| + λ_2 Σ_{i,j} ψ_ij |z^{k1}_ij − z^{k2}_ij|.   (11)

Problem (11) is completely separable with respect to each pair of matrix elements (i, j). That is, we can solve, for each (i, j),

    {z^{k1}_ij, z^{k2}_ij} = argmin_{z^{k1}_ij, z^{k2}_ij} (ρ/2) Σ_{c=1}^{2} ( z^{kc}_ij − a^{kc}_ij )² + λ_1 ω_ij Σ_{c=1}^{2} |z^{kc}_ij| + λ_2 ψ_ij |z^{k1}_ij − z^{k2}_ij|,   (12)

where a^{kc}_ij = θ^{kc}_ij + u^{kc}_ij/ρ. This is a special case of the fused lasso signal approximator [3, 5] and it has a very simple closed-form solution. When λ_1 = 0, the solution to (12) takes the form

    (z^{k1}_ij, z^{k2}_ij) =
        ( a^{k1}_ij − λ_2 ψ_ij/ρ,  a^{k2}_ij + λ_2 ψ_ij/ρ )         if a^{k1}_ij > a^{k2}_ij + 2λ_2 ψ_ij/ρ,
        ( a^{k1}_ij + λ_2 ψ_ij/ρ,  a^{k2}_ij − λ_2 ψ_ij/ρ )         if a^{k2}_ij > a^{k1}_ij + 2λ_2 ψ_ij/ρ,
        ( (a^{k1}_ij + a^{k2}_ij)/2,  (a^{k1}_ij + a^{k2}_ij)/2 )   if |a^{k1}_ij − a^{k2}_ij| ≤ 2λ_2 ψ_ij/ρ.   (13)

When λ_1 > 0, the solution to (12) can be obtained by soft-thresholding (13) by λ_1 ω_ij/ρ. Here soft-thresholding is defined as S(x, c) = sign(x)(|x| − c)_+, where (a)_+ = max(a, 0).
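As an illustration of the two update steps just derived, the sketch below (Python/NumPy; our own hedged rendering of equations (7) and (11)-(13), not the authors' code) implements the Expand operator via an eigendecomposition and the element-wise Z-update, i.e., the fused-lasso step (13) followed by soft-thresholding with the weighted ℓ_1 penalty.

import numpy as np

def expand(A, rho, n):
    # Expand operator of equation (7): argmin over positive-definite Theta of
    # -n*logdet(Theta) + (rho/2)*||Theta - A||_F^2, via the eigendecomposition of A.
    d, U = np.linalg.eigh((A + A.T) / 2.0)                 # symmetrize for numerical safety
    d_new = (d + np.sqrt(d ** 2 + 4.0 * n / rho)) / 2.0
    return (U * d_new) @ U.T

def soft_threshold(x, c):
    # S(x, c) = sign(x) * (|x| - c)_+ applied elementwise.
    return np.sign(x) * np.maximum(np.abs(x) - c, 0.0)

def z_update(Theta1, Theta2, U1, U2, rho, lam1, lam2, omega, psi):
    # Element-wise closed form of (12)-(13): fused-lasso step, then weighted soft-thresholding.
    # omega and psi are (p x p) weight matrices; for brevity this sketch thresholds every entry,
    # whereas the weighted l1 term in (4) only covers i != j (set omega's diagonal to zero).
    A1 = Theta1 + U1 / rho
    A2 = Theta2 + U2 / rho
    shift = lam2 * psi / rho
    fuse = np.abs(A1 - A2) <= 2.0 * shift                  # third case of equation (13)
    Z1 = np.where(fuse, (A1 + A2) / 2.0, np.where(A1 > A2, A1 - shift, A1 + shift))
    Z2 = np.where(fuse, (A1 + A2) / 2.0, np.where(A1 > A2, A2 + shift, A2 - shift))
    Z1 = soft_threshold(Z1, lam1 * omega / rho)            # lambda_1 > 0 case
    Z2 = soft_threshold(Z2, lam1 * omega / rho)
    return Z1, Z2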

2.2.3 Complete ADMM algorithm

Based on the augmented Lagrangian, the complete ADMM algorithm for (4) is given in Algorithm 1. We find it useful in practice to vary the value of the penalty parameter ρ across iterations; therefore, we update ρ in each iteration based on the primal and dual residuals. We present the update rule for ρ at the end of this section. Please note that, according to the ADMM algorithm, the estimates Θ̂^{k1} and Θ̂^{k2} are not exactly sparse, whereas Ẑ^{k1} and Ẑ^{k2}, which are obtained through soft-thresholding, are sparse.

Algorithm 1: ADMM algorithm for optimization problem (4)
Inputs: sample covariance matrices S^{k1} and S^{k2}, sample sizes n_1 and n_2, penalty parameter ρ = 1, and adaptation parameters µ = 10, τ^incr = 2 and τ^decr = 2
Output: estimated precision matrices Θ̂^{k1} and Θ̂^{k2}, and their sparse approximations Ẑ^{k1} and Ẑ^{k2}
Initialize: Θ^{kc} = I, Z^{kc} = I, U^{kc} = 0 for c = 1, 2
while not converged do
  1. Θ^{k1} ← Expand( Z^{k1} − (1/ρ)(n_1 S^{k1} + U^{k1}), ρ, n_1 );
  2. Θ^{k2} ← Expand( Z^{k2} − (1/ρ)(n_2 S^{k2} + U^{k2}), ρ, n_2 );
  3. Update Z^{k1} and Z^{k2} by solving problem (11);
  4. U^{k1} ← U^{k1} + ρ( Θ^{k1} − Z^{k1} );
  5. U^{k2} ← U^{k2} + ρ( Θ^{k2} − Z^{k2} );
  6. Update ρ according to (18).
Output Θ^{k1}, Θ^{k2}, Z^{k1} and Z^{k2}

2.2.4 Stop criteria

Let Θ^{kc}_t, Z^{kc}_t and U^{kc}_t denote the estimates at the t-th iteration. The primal residual [1] is defined as

    P_t = [ Θ^{k1}_t ; Θ^{k2}_t ] − [ Z^{k1}_t ; Z^{k2}_t ],   (14)

and the dual residual [1] is defined as

    D_t = ρ ( [ Z^{k1}_t ; Z^{k2}_t ] − [ Z^{k1}_{t−1} ; Z^{k2}_{t−1} ] ).   (15)

We consider the process to have converged if both the primal residual and the dual residual are sufficiently small [1]. More specifically, we introduce small positive constants ε^abs and ε^rel, and declare P_t and D_t small if

    ‖P_t‖_F ≤ √2 p ε^abs + ε^rel max{ ‖[ Θ^{k1}_t ; Θ^{k2}_t ]‖_F, ‖[ Z^{k1}_t ; Z^{k2}_t ]‖_F }   (16)

and

    ‖D_t‖_F ≤ √2 p ε^abs + ε^rel ‖[ U^{k1}_t ; U^{k2}_t ]‖_F.   (17)

We set ε^abs = 10^{−3} and ε^rel = 10^{−3}, as suggested by Boyd et al. [1]. The choice of the value of ε^abs depends on the scale of the variable values.

2.2.5 Varying penalty parameter

In practice, it is useful to use a different penalty parameter ρ for each iteration, which can improve convergence and make performance less dependent on the initial value of the penalty parameter [1]. Therefore, we update the value of ρ in the t-th iteration according to the primal and dual residuals:

    ρ_{t+1} = τ^incr ρ_t     if ‖P_t‖_F > µ ‖D_t‖_F,
              ρ_t / τ^decr   if ‖D_t‖_F > µ ‖P_t‖_F,
              ρ_t            otherwise,   (18)

where µ > 1, τ^incr > 1 and τ^decr > 1 are adaptation parameters. We set µ = 10, τ^incr = 2 and τ^decr = 2, according to [1].
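The residual computations (14)-(15), the stopping rule (16)-(17) and the adaptive update (18) can be restated compactly; the sketch below (Python/NumPy; an illustration following the recipe of Boyd et al. [1], not the paper's implementation) treats the two blocks as a stacked pair.

import numpy as np

def residuals(Theta1, Theta2, Z1, Z2, Z1_prev, Z2_prev, rho):
    # Primal residual (14) and dual residual (15), as Frobenius norms of the stacked blocks.
    primal = np.sqrt(np.linalg.norm(Theta1 - Z1) ** 2 + np.linalg.norm(Theta2 - Z2) ** 2)
    dual = rho * np.sqrt(np.linalg.norm(Z1 - Z1_prev) ** 2 + np.linalg.norm(Z2 - Z2_prev) ** 2)
    return primal, dual

def has_converged(primal, dual, Theta1, Theta2, Z1, Z2, U1, U2, eps_abs=1e-3, eps_rel=1e-3):
    # Stopping rule (16)-(17).
    p = Theta1.shape[0]
    stack_norm = lambda A, B: np.sqrt(np.linalg.norm(A) ** 2 + np.linalg.norm(B) ** 2)
    eps_pri = np.sqrt(2.0) * p * eps_abs + eps_rel * max(stack_norm(Theta1, Theta2),
                                                         stack_norm(Z1, Z2))
    eps_dual = np.sqrt(2.0) * p * eps_abs + eps_rel * stack_norm(U1, U2)
    return primal <= eps_pri and dual <= eps_dual

def update_rho(rho, primal, dual, mu=10.0, tau_incr=2.0, tau_decr=2.0):
    # Adaptive penalty update of equation (18).
    if primal > mu * dual:
        return rho * tau_incr
    if dual > mu * primal:
        return rho / tau_decr
    return rho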

2.3 Complete algorithm of the TDJGL model

By plugging Equation (4) into Equation (3) in the main text, the TDJGL model can be written as follows:

    min_{Θ^{kc}}  Σ_{k=1}^{K} Σ_{c=1}^{2} n_c ( tr(S^{kc} Θ^{kc}) − log det(Θ^{kc}) ) + λ_1 Σ_{i≠j} P_1( {θ^{kc}_ij}_{k=1,...,K; c=1,2} ) + λ_2 Σ_{i,j} P_2( {θ^{k1}_ij − θ^{k2}_ij}_{k=1,...,K} )
    s.t. Θ^{kc} ∈ S^p_{++}, for k = 1, ..., K and c = 1, 2,   (19)

where P_1 and P_2 are the group penalties defined in the main text, which couple, respectively, the entries θ^{kc}_ij and the differences θ^{k1}_ij − θ^{k2}_ij across the K platforms. Given the sample covariance matrices {S^{kc}}_{k=1,...,K; c=1,2} and the two parameters λ_1 and λ_2, we can find the estimates of the precision matrices {Θ̂^{kc}} and their sparse approximations {Ẑ^{kc}} using Algorithm 2. We first decompose (19) into K individual subproblems of the form (4) using the method of local linear approximation (see the main text), and then iteratively solve each subproblem until convergence. Because the estimates {Θ̂^{kc}} are not exactly sparse when they are obtained by the ADMM algorithm, in the experiments we use {Ẑ^{kc}}, which are introduced as sparse approximations of {Θ̂^{kc}} by the ADMM algorithm (please refer to Algorithm 1).

Algorithm 2: Complete algorithm of the TDJGL model (19)
Inputs: the sample covariance matrices {S^{kc}}_{k=1,...,K; c=1,2}, and the regularization parameters λ_1 and λ_2
Output: the estimated precision matrices {Θ̂^{kc}} and their sparse approximations {Ẑ^{kc}}
Main algorithm:
  1. Initialize Θ̂^{kc} for k = 1, ..., K and c = 1, 2.
  2. Compute the weights ω_ij = ( Σ_{k=1}^{K} Σ_{c=1}^{2} |θ̂^{kc}_ij| )^{−1} and ψ_ij = ( Σ_{k=1}^{K} |θ̂^{k1}_ij − θ̂^{k2}_ij| )^{−1} for i, j = 1, ..., p.
  3. Update Θ̂^{k1} and Θ̂^{k2} for all k = 1, ..., K by solving problem (4) using Algorithm 1.
  4. Repeat steps 2 and 3 until the convergence condition is achieved.
  5. Output {Θ̂^{kc}} and {Ẑ^{kc}}.

2.4 Model selection

For the TDJGL model, the parameter λ_1 controls the sparsity of the estimated networks, and the parameter λ_2 influences the sparsity of the resulting differential networks. Therefore, the choice of λ_1 and λ_2 is critical. We determine the values of the parameters in a data-driven manner via stability selection [6]. Stability selection, which seeks the parameters leading to the most stable set of edges, has been reported to give better results for network inference than other model selection methods, including cross-validation, the Akaike information criterion and the Bayesian information criterion [7, 8, 9, 10]. We choose λ_1 and λ_2 so as to use the least amount of regularization that simultaneously makes the networks sparse and stable. Here we resort to a recently developed stability selection method called StARS [7]. Because λ_1 mainly influences the sparsity and stability of the resulting gene networks, while λ_2 mainly controls the sparsity and stability of the estimated differential networks, we determine their values separately. We first determine the value of λ_1 while setting λ_2 = 0. Then we determine the value of λ_2 while fixing λ_1 at the value chosen in the previous step.

We draw S random sample sets D_1, ..., D_S from the n = n_1 + n_2 patients, each of size 0.8n. We first choose λ_1 from a given vector of regularization parameters Λ_1, with λ_2 = 0. We estimate 2K networks {Ê^{kc}_s(λ_1, λ_2)} for each D_s and each λ_1 from Λ_1. The optimal value of λ_1 is chosen according to the average variance over the edges of the networks inferred from the sub-sampled data:

    λ^{(λ_2)}_{1,opt} = min { γ ∈ Λ_1 :  max_{λ_1 ≥ γ}  (1/(2K)) Σ_{k=1}^{K} Σ_{c=1}^{2} [ Σ_{i<j} 2 ā^{kc}_ij(λ_1, λ_2) ( 1 − ā^{kc}_ij(λ_1, λ_2) ) / ( p(p−1)/2 ) ]  ≤ β },   (20)

where ā^{kc}_ij(λ_1, λ_2) = (1/S) Σ_{s=1}^{S} I( (i, j) ∈ Ê^{kc}_s(λ_1, λ_2) ). Here we present StARS for completeness; interested readers are referred to [7] for details.

After determining λ_1, we choose λ_2 from a given vector of regularization parameters Λ_2 according to the stability of the inferred differential networks. We now set λ_1 = λ^{(λ_2)}_{1,opt}, which is determined above. We estimate 2K networks {Ê^{kc}_s(λ_1, λ_2)} for each D_s and each λ_2 from Λ_2. Then we construct K differential networks {DE^k_s(λ_1, λ_2)} based on the estimated networks. The optimal value of λ_2 is chosen according to the average variance over the differential edges inferred from the sub-sampled data:

    λ^{(λ_1)}_{2,opt} = min { γ ∈ Λ_2 :  max_{λ_2 ≥ γ}  (1/K) Σ_{k=1}^{K} [ Σ_{i<j} 2 b̄^k_ij(λ_1, λ_2) ( 1 − b̄^k_ij(λ_1, λ_2) ) / ( p(p−1)/2 ) ]  ≤ β },   (21)

where b̄^k_ij(λ_1, λ_2) = (1/S) Σ_{s=1}^{S} I( (i, j) ∈ DE^k_s(λ_1, λ_2) ). In this study, we set the number of random sample sets S = and the stability parameter β = 0.1.
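Operationally, the rule (20) amounts to computing, for every candidate λ_1, the edge-selection frequencies ā over the S sub-samples and the resulting average instability, and then picking the smallest λ_1 whose monotonised instability stays below β. The sketch below (Python/NumPy; a hedged illustration of this StARS-style rule, assuming a user-supplied fit_networks function that returns binary adjacency matrices, not the authors' implementation) makes this explicit; the rule (21) for λ_2 is analogous, with the differential networks in place of the networks.

import numpy as np

def stars_select_lambda1(fit_networks, subsamples, lambda1_grid, p, beta=0.1):
    # fit_networks(D_s, lam1) is assumed to return a list of 2K binary (p x p) adjacency
    # matrices estimated on sub-sample D_s with penalty lam1 (and lambda_2 = 0).
    lambda1_grid = np.sort(np.asarray(lambda1_grid, dtype=float))
    S = len(subsamples)
    n_pairs = p * (p - 1) / 2.0
    iu = np.triu_indices(p, k=1)
    instability = []
    for lam1 in lambda1_grid:
        freqs = None                                  # edge-selection frequencies a-bar
        for D_s in subsamples:
            nets = np.asarray(fit_networks(D_s, lam1), dtype=float)
            freqs = nets if freqs is None else freqs + nets
        freqs = freqs / S
        # Average (over the 2K networks) total edge variance, as in equation (20).
        per_net = [np.sum(2.0 * f[iu] * (1.0 - f[iu])) / n_pairs for f in freqs]
        instability.append(np.mean(per_net))
    instability = np.array(instability)
    # Monotonise: require instability <= beta at this and every larger penalty.
    ok = np.array([instability[i:].max() <= beta for i in range(len(lambda1_grid))])
    # Smallest qualifying lambda_1 (fall back to the largest penalty if none qualifies).
    return lambda1_grid[np.argmax(ok)] if ok.any() else lambda1_grid[-1]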

2.5 Criteria for platinum response groups

The criterion used in [11, 12] is adopted to define the platinum-based chemotherapy response groups. In particular, we download the clinical information (Biotab format) of the ovarian tumors from the TCGA website. We obtain the drug information from the nationwidechildrens.org_clinical_drug_ov.txt file. Here we only consider platinum-based drugs (carboplatin, cisplatin, carbo) with regimen indication ADJUVANT and OTHER, SPECIFY IN NOTES. The cancer progression information is obtained from the nationwidechildrens.org_clinical_follow_up_v1.0_nte_ov.txt file. New tumors with new tumor event dx evidence = [Not Available] and new neoplasm event type = [Unknown] are not considered. The follow-up information of tumors with no progression is obtained from the nationwidechildrens.org_clinical_follow_up_v1.0_ov.txt file. For a tumor with progression, it is defined as platinum-resistant if the new tumor occurs within 6 months of the end of primary treatment (days to new tumor event after initial treatment − days to drug therapy end ≤ 180 days), and it is defined as platinum-sensitive otherwise (days to new tumor event after initial treatment − days to drug therapy end > 180 days). For a tumor with no progression, it is defined as platinum-sensitive if the follow-up interval is at least 6 months from the date of last primary treatment (days to last followup − days to drug therapy end > 180 days). Among the 514 tumors that have all three types of gene expression profiles, 302 have explicit platinum response status, with 204 platinum-sensitive tumors and 98 platinum-resistant tumors. The sensitive/resistant label for each sample is provided in the Supplementary information.
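Stated as a rule, the grouping above depends on three time fields from the Biotab files; the sketch below (Python; an illustrative restatement with assumed argument names, not the authors' processing script) applies the 180-day (6-month) threshold.

def platinum_response(days_to_drug_therapy_end, days_to_new_tumor_event=None,
                      days_to_last_followup=None):
    # Classify one tumor by the 6-month (180-day) rule described above. Argument names
    # mirror the TCGA Biotab fields; returns 'resistant', 'sensitive', or None when the
    # rule cannot be applied (e.g., follow-up too short to call a tumor sensitive).
    if days_to_new_tumor_event is not None:                      # tumor with progression
        gap = days_to_new_tumor_event - days_to_drug_therapy_end
        return 'resistant' if gap <= 180 else 'sensitive'
    if days_to_last_followup is not None:                        # no progression recorded
        gap = days_to_last_followup - days_to_drug_therapy_end
        return 'sensitive' if gap > 180 else None
    return None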
2.6 Comparison with other graphical lasso models on the ovarian cancer data

We compare TDJGL with FGL, GGL and GL on the ovarian cancer data. For FGL, we run it separately for each platform, and each time it is applied across the two patient groups. For GGL, we run it separately for each patient group, and each time it is applied across all three platforms. For GL, we run it separately for each patient group and each platform. In order to provide interpretable results, we select the tuning parameters of the compared methods to give similar numbers of edges and differential edges to those of TDJGL. Unlike TDJGL and FGL, GGL and GL cannot control the similarity of the precision matrices between different patient groups, and they tend to identify too many differential edges. To better interpret the results of GGL and GL, we sort the absolute values of their differential scores in decreasing order and take the top #DE (the number of differential edges identified by TDJGL) edges for each model.

A common challenge in evaluating gene network inference and differential network analysis using real data is the lack of gold standards. That is, in our ovarian cancer data analysis, we cannot obtain the true gene networks of the platinum-resistant tumors and the platinum-sensitive tumors. Therefore, it is difficult to compare different methods in terms of the accuracy of identifying group-specific gene networks and differential networks. In this study, we adopt an alternative way to evaluate performance. First, we compare the methods based on the overlaps between edges inferred from different platforms, which assesses consistency: a method that produces a greater number of edges shared by different platforms is more consistent. Then, we compare the hub nodes in the differential networks in terms of known drug resistance-related genes and cancer-related genes: a method that better captures known functionally important genes in the differential networks might have better performance in inferring the differential networks.

We observe that the overlaps between edges (and differential edges) identified by FGL and GL from the different platforms are quite low (Figures S7 and S9). For GGL, which encourages a similar network structure across all platforms, more than half of the identified edges are shared by all three platforms (Figure S8 a)-b)). Because GGL does not consider the similarity of the differential networks across platforms, a great number of differential edges detected by GGL are supported by only one platform (Figure S8 c)). As mentioned in the main text, both the gene networks and the differential networks inferred by TDJGL share a great number of edges across all three platforms (Figure S6).

We also compare the hub nodes in the differential networks inferred by the different methods. For FGL, GGL and GL, we consider the 18 genes with the largest degree of connectivity as hub genes. From Table S1, we find that the set of hub genes determined by TDJGL includes more cisplatin resistance-related genes, drug resistance-related genes and cancer-related genes than those determined by the other three methods.

Table S1: The number of hub genes that have been reported as platinum resistance-related genes (GEAR cisplatin), drug resistance-related genes (GEAR drug) and cancer-related genes (CGC)

Methods    GEAR cisplatin    GEAR drug    CGC
TDJGL
FGL        3                 6
GGL
GL

Figure S7: Overlaps between the edges (and differential edges) detected by FGL from the three platforms for a) platinum-resistant tumors, b) platinum-sensitive tumors and c) differential networks.

Figure S8: Overlaps between the edges (and differential edges) detected by GGL from the three platforms for a) platinum-resistant tumors, b) platinum-sensitive tumors and c) differential networks.

Figure S9: Overlaps between the edges (and differential edges) detected by GL from the three platforms for a) platinum-resistant tumors, b) platinum-sensitive tumors and c) differential networks.

References

[1] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122, 2011.

[2] Karthik Mohan, Palma London, Maryam Fazel, Daniela Witten, and Su-In Lee. Node-based learning of multiple Gaussian graphical models. Journal of Machine Learning Research, 15(1):445-488, 2014.

[3] Patrick Danaher, Pei Wang, and Daniela M. Witten. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(2):373-397, 2014.

[4] Daniela M. Witten and Robert Tibshirani. Covariance-regularized regression and classification for high dimensional problems. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(3):615-636, 2009.

[5] Holger Hoefling. A path algorithm for the fused lasso signal approximator. Journal of Computational and Graphical Statistics, 19(4):984-1006, 2010.

[6] Nicolai Meinshausen and Peter Bühlmann. Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4):417-473, 2010.

[7] Han Liu, Kathryn Roeder, and Larry Wasserman. Stability approach to regularization selection (StARS) for high dimensional graphical models. In Advances in Neural Information Processing Systems, 2010.

[8] Genevera I. Allen and Zhandong Liu. A local Poisson graphical model for inferring networks from sequencing data. IEEE Transactions on NanoBioscience, 12(3):189-198, 2013.

[9] Marinka Žitnik and Blaž Zupan. Gene network inference by fusing data from diverse distributions. Bioinformatics, 31(12):i230-i239, 2015.

[10] Zachary D. Kurtz, Christian L. Müller, Emily R. Miraldi, Dan R. Littman, Martin J. Blaser, and Richard A. Bonneau. Sparse and compositionally robust inference of microbial ecological networks. PLOS Computational Biology, 11(5):e1004226, 2015.

[11] The Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature, 474(7353):609-615, 2011.

[12] Sheida Nabavi, Daniel Schmolze, Mayinuer Maitituoheti, Sadhika Malladi, and Andrew H. Beck. EMDomics: a robust and powerful method for the identification of genes differentially expressed between heterogeneous classes. Bioinformatics, page btv634, 2015.
