25: Graphical induced structured input/output models
10-708: Probabilistic Graphical Models, Spring
Lecture 25: Graphical induced structured input/output models
Lecturer: Eric P. Xing        Scribes: Meghana Kshirsagar (mkshirsa), Yiwen Chen (yiwenche)

1 Graph-structured fused lasso

So far we have seen how probabilistic graphical models can be used to express structure, i.e., the dependence between the input and output variables. This lecture discusses alternative graph-based models which, instead of modeling probabilistic assumptions about the data, encode the input/output structure of the data in an objective function via novel regularizers. The basic form of such a graph-based objective function is an empirical loss term plus a regularizer that captures the graphical dependencies between the parameters.

Such graph-induced models have a large range of applications, particularly in computational biology. One such application is in genetics, where the goal is to find SNPs (variations in the genome) that are relevant to a disease. The hypothesis is that multiple such mutations/variations are jointly responsible for a disease. However, since the number of SNPs is very large (a few million), checking all combinations of SNPs for correlation with the disease is computationally hard. The problem is further exacerbated by the presence of multiple phenotypes, i.e., characteristics relevant to the disease. This makes it an association-mapping problem, where the goal is to find strong associations or correlations between several SNPs (also called the genotype) and multiple phenotypes (such as allergy, blood pressure, etc.). In addition, we want to consider the multiple correlated phenotypes jointly while finding the association with the genotype.

The statistical challenge is: given multivariate input $X$ (the SNPs) and multivariate output $Y$ (the phenotypes), identify the associations between $X$ and $Y$, where the output covariates $Y$ may carry structure such as a graph connecting the phenotypes or a tree structure connecting genes. The work in [1] proposes a formulation for the case where the output $Y$ is a graph of traits, with edges between traits indicating their correlation. A further challenge is that the number of available examples is on the order of a few thousand, whereas the number of covariates/features in $X$ is on the order of a few million; this makes the problem statistically under-determined.

Let $X$ be an $N \times J$ design matrix of genotypes for $N$ individuals and $J$ SNPs, and let $Y$ denote an $N \times K$ matrix of quantitative-trait (i.e., phenotype) measurements over the same set of individuals. We use $y_k$ to denote the $k$-th column (i.e., the $k$-th trait) of $Y$. Let $G$, with vertices $V$ and edges $E$, be the graph representing the relationships between the traits (called the Quantitative Trait Network). Due to the multivariate output, the parameters form a matrix $B \in \mathbb{R}^{J \times K}$ instead of a vector. This matrix $B = (\beta_1, \beta_2, \ldots, \beta_K)$, where $\beta_k = (\beta_{1k}, \ldots, \beta_{Jk})^T \in \mathbb{R}^J$, encodes the structure and strength of the associations; the entry $\beta_{jk}$ represents the association strength between SNP $j$ and trait $k$. See Figure 1 for an illustration.

We want to bias our learning towards finding sparse associations, because these are biologically more meaningful and at the same time make the problem statistically viable. The standard approach to finding sparse models is the lasso. However, the lasso cannot be applied directly here, as it fails to capture the dependence among the parameters $\beta_k$.
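For concreteness, the following is a minimal sketch of the data shapes and of the naive per-trait lasso baseline just described; the toy dimensions and the use of scikit-learn's Lasso are illustrative assumptions, not part of [1].

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
N, J, K = 100, 500, 10                               # individuals, SNPs, traits (toy sizes)
X = rng.integers(0, 3, size=(N, J)).astype(float)    # genotypes coded as 0/1/2 minor-allele counts
Y = rng.normal(size=(N, K))                          # quantitative traits

# The plain lasso fits each trait independently: column k of B comes from y_k alone.
# It produces sparsity, but ignores the correlation structure among the traits.
B = np.column_stack([
    Lasso(alpha=0.1, max_iter=10000).fit(X, Y[:, k]).coef_
    for k in range(K)
])
print(B.shape, np.mean(B != 0))                      # (J, K) and the fraction of nonzero entries
```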
The work in [1] uses two regularizer terms: the first is the lasso penalty and enforces the sparsity constraint; the second enforces the graph-structure constraints and is called the graph-constrained fusion penalty. Two objective functions are proposed to express the graph structure.
Model GcFlasso: This model uses the graph without the edge weights. The objective function, with the two regularizers and a squared-error loss term, has the form

$$\hat{B} = \arg\min_{B} \sum_k (y_k - X\beta_k)^T (y_k - X\beta_k) + \lambda \sum_k \sum_j |\beta_{jk}| + \gamma \sum_{(m,l) \in E} \sum_j \big|\beta_{jm} - \mathrm{sign}(r_{ml})\,\beta_{jl}\big| \qquad (1)$$

The fusion penalty $|\beta_{jm} - \mathrm{sign}(r_{ml})\,\beta_{jl}|$, where $\beta_{jm}$ and $\beta_{jl}$ are the association strengths of SNP $j$ for traits $m$ and $l$ respectively, tries to enforce similar association strengths for two correlated traits. See Figure 1 for an illustration.

Model GwFlasso: This model uses the edge weights of the graph. The objective function has the form

$$\hat{B} = \arg\min_{B} \sum_k (y_k - X\beta_k)^T (y_k - X\beta_k) + \lambda \sum_k \sum_j |\beta_{jk}| + \gamma \sum_{(m,l) \in E} f(r_{ml}) \sum_j \big|\beta_{jm} - \mathrm{sign}(r_{ml})\,\beta_{jl}\big| \qquad (2)$$

[Figure 1: An illustration of the association strengths for two correlated traits k and m.]

If two traits $m$ and $l$ are highly correlated in the graph $G$, with a relatively large edge weight, the fusion effect over the two traits intensifies, and as a result the difference between the two corresponding regression coefficients $\beta_{jm}$ and $\beta_{jl}$ is penalized more heavily than for pairs of traits with weaker correlation. Compared to GcFlasso, GwFlasso is significantly more flexible because it uses the edge weights to incorporate the strength of correlation. For example, when two groups of highly correlated traits show a relatively weaker correlation across the two subnetworks, GwFlasso can handle the hierarchical subgroup structure and adjust the amount of fusion accordingly by weighting each fusion term.
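To make the two penalties concrete, here is a minimal NumPy sketch that evaluates the GwFlasso objective of Eqn (2); the choice $f(r) = |r|$ and all variable names are illustrative assumptions (setting $f$ to the constant 1 recovers GcFlasso in Eqn (1)).

```python
import numpy as np

def gw_flasso_objective(B, X, Y, edges, r, lam, gamma, f=np.abs):
    """Evaluate Eqn (2): squared-error loss + lasso penalty + weighted fusion penalty.

    B     : (J, K) coefficient matrix whose k-th column is beta_k
    edges : list of trait-index pairs (m, l) forming the trait network E
    r     : dict mapping each edge (m, l) to its correlation r_ml
    f     : edge-weight function of r_ml (f = |.| here; f = lambda x: 1.0 gives GcFlasso)
    """
    residual = Y - X @ B                     # stacks (y_k - X beta_k) over all k
    loss = np.sum(residual ** 2)             # sum_k (y_k - X beta_k)^T (y_k - X beta_k)
    lasso = lam * np.sum(np.abs(B))          # lambda * sum_{k,j} |beta_jk|
    fusion = gamma * sum(
        f(r[(m, l)]) * np.sum(np.abs(B[:, m] - np.sign(r[(m, l)]) * B[:, l]))
        for (m, l) in edges
    )
    return loss + lasso + fusion
```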
The optimization problems in Eqn (1) and Eqn (2) are convex and can be formulated as quadratic programs, using an approach similar to the one used for the fused lasso. Although many publicly available software packages efficiently solve such quadratic programs, these approaches do not scale in computation time to large problems involving hundreds or thousands of traits. Since the main difficulty in directly optimizing Eqn (1) and Eqn (2) arises from the non-smooth nature of the $\ell_1$ norm, the problem can be transformed into an equivalent form that involves only smooth functions, and one can then use a fast coordinate-descent algorithm to find the estimates of the regression coefficients. It can be shown that solving the optimization problem in Eqn (2) is equivalent to solving the following problem, which involves only smooth functions of squared $\ell_2$ terms:

$$\min_{\beta_k,\, d_{jk},\, d_{j,ml}} \;\; \sum_k (y_k - X\beta_k)^T (y_k - X\beta_k) + \lambda \sum_k \sum_j \frac{(\beta_{jk})^2}{d_{jk}} + \gamma \sum_{(m,l) \in E} \sum_j f(r_{ml})^2 \, \frac{\big(\beta_{jm} - \mathrm{sign}(r_{ml})\,\beta_{jl}\big)^2}{d_{j,ml}} \qquad (3)$$

$$\text{subject to} \;\; \sum_{j,k} d_{jk} = 1, \quad \sum_{(m,l) \in E} \sum_j d_{j,ml} = 1, \quad d_{jk} \ge 0 \;\, \forall j,k, \quad d_{j,ml} \ge 0 \;\, \forall j,\, (m,l) \in E,$$

where the $d_{jk}$ and $d_{j,ml}$ are additional variables that we need to estimate. We solve this problem using a coordinate-descent approach that iteratively updates the variables of interest, the $\beta_{jk}$ and the $(d_{jk}, d_{j,ml})$, until there is little improvement in the value of the objective function. In each iteration we first fix the values of $d_{jk}$ and $d_{j,ml}$, and find the update equation for $\beta_{jk}$ by differentiating the objective in Eqn (3) with respect to each $\beta_{jk}$ and setting the derivative to zero. The coordinate-descent procedure finds the optimum for fixed regularization parameters $\lambda$ and $\gamma$; these parameters can be determined by cross-validation or by using a validation set.
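The sketch below implements one version of this alternating scheme; it is a sketch under stated assumptions, not the authors' reference implementation. It assumes $f(r) = |r|$, uses small epsilons to guard against zero denominators, and updates each column $\beta_k$ by solving the regularized normal equations obtained by setting the gradient of Eqn (3) to zero while the other columns and the $d$ variables are held fixed.

```python
import numpy as np

def gflasso_fit(X, Y, edges, r, lam, gamma, n_iter=50, eps=1e-8):
    """Alternating coordinate-descent updates for the smooth reformulation in Eqn (3)."""
    J, K = X.shape[1], Y.shape[1]
    B = np.zeros((J, K))
    XtX, XtY = X.T @ X, X.T @ Y
    f2 = {e: r[e] ** 2 for e in edges}                 # with f(r) = |r|, f(r)^2 = r^2

    for _ in range(n_iter):
        # d-step: the optimal d's are proportional to the magnitudes they bound
        # (each block is normalized over its own simplex constraint).
        d = np.abs(B) + eps
        d /= d.sum()
        d_ml = {e: np.abs(r[e]) * np.abs(B[:, e[0]] - np.sign(r[e]) * B[:, e[1]]) + eps
                for e in edges}
        total = sum(v.sum() for v in d_ml.values())
        d_ml = {e: v / total for e, v in d_ml.items()}

        # beta-step: for each trait k, solve (XtX + lam*D_k + gamma*F_k) beta_k = rhs,
        # holding the other columns of B fixed.
        for k in range(K):
            A = XtX + lam * np.diag(1.0 / d[:, k])
            rhs = XtY[:, k].copy()
            for (m, l) in edges:
                if k not in (m, l):
                    continue
                other = l if k == m else m
                w = gamma * f2[(m, l)] / d_ml[(m, l)]  # per-SNP fusion weights
                A += np.diag(w)
                rhs += w * np.sign(r[(m, l)]) * B[:, other]
            B[:, k] = np.linalg.solve(A, rhs)
    return B
```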
2 Tree-guided group lasso

2.1 Motivation

In a univariate-output regression setting, sparse regression methods that extend the lasso have been proposed to allow the recovered relevant inputs to reflect underlying structural information among the inputs. The group lasso applies an $\ell_1$ norm of the lasso penalty over groups of inputs, while using an $\ell_2$ norm over the input variables within each group. This $\ell_1/\ell_2$ norm has been extended to a more general setting that encodes prior knowledge about various sparsity patterns, where the key idea is to allow the groups to overlap. However, the overlapping groups in these regularization methods can cause an imbalance among different outputs, because the regression coefficients for an output that appears in a large number of groups are more heavily penalized than those for outputs with memberships in fewer groups. The tree-guided group lasso for multi-task regression with structured sparsity has been proposed to address this. It uses a novel weighting scheme that systematically weights each group in the tree-guided group-lasso penalty such that clusters of strongly correlated outputs are more strongly encouraged to share common covariates than clusters of weakly correlated outputs. This model is also motivated by the genetic association-mapping problem, where the goal is to identify a small number of SNPs (inputs), out of millions of SNPs, that influence phenotypes (outputs) such as gene-expression measurements.

2.2 Background on sparse regression and multi-task learning

The basic linear model for multi-task regression is

$$y_k = X\beta_k + \epsilon_k, \qquad k = 1, 2, \ldots, K,$$

where $\beta_k$ is a vector of $J$ regression coefficients for the $k$-th output and $\epsilon_k$ is a vector of $N$ independent error terms with mean 0 and constant variance. $X$ denotes the $N \times J$ input matrix and $Y$ the $N \times K$ output matrix. When $J$ is large and the number of inputs relevant to the output is small, the lasso offers an effective feature-selection method for this model. Let $B = (\beta_1, \beta_2, \ldots, \beta_K)$ denote the $J \times K$ matrix of regression coefficients for all $K$ outputs. The lasso obtains $\hat{B}^{\mathrm{lasso}}$ by solving

$$\hat{B}^{\mathrm{lasso}} = \arg\min_B \sum_k (y_k - X\beta_k)^T (y_k - X\beta_k) + \lambda \sum_k \sum_j |\beta_{jk}|,$$

where $\lambda$ is a tuning parameter that controls the amount of sparsity in the solution. In multi-task learning, where the goal is to select input variables that are relevant to at least one task, an $\ell_1/\ell_2$ penalty has been used to take advantage of the relatedness of the outputs. The $\ell_1/\ell_2$-penalized multi-task regression is defined as

$$\hat{B}^{\ell_1/\ell_2} = \arg\min_B \sum_k (y_k - X\beta_k)^T (y_k - X\beta_k) + \lambda \sum_j \|\beta^j\|_2,$$

where $\beta^j$ denotes the $j$-th row of $B$. The $\ell_1$ part of the penalty selects inputs relevant to at least one task, and the $\ell_2$ part combines information across tasks. Since the $\ell_2$ penalty does not encourage sparsity within a row, if the $j$-th input is selected as relevant, all of the elements of $\beta^j$ take non-zero values. Thus, the estimate $\hat{B}^{\ell_1/\ell_2}$ is sparse only across inputs, not across outputs.

2.3 Tree-guided group lasso for sparse multiple-output regression

We assume that the relationships among the outputs can be represented as a tree $T$ with a set of vertices $V$ of size $|V|$. Given this tree $T$ over the outputs, we generalize the $\ell_1/\ell_2$ regularization to a tree regularization as follows. We expand the $\ell_2$ part of the $\ell_1/\ell_2$ penalty into a group-lasso penalty, where the groups are defined from the tree $T$: each node $v \in V$ is associated with a group $G_v$ whose members consist of all of the output variables (the leaf nodes) in the subtree rooted at $v$. Given these groups of outputs, the tree-guided group lasso can be written as

$$\hat{B}^{\mathrm{Tree}} = \arg\min_B \sum_k (y_k - X\beta_k)^T (y_k - X\beta_k) + \lambda \sum_j \sum_{v \in V} w_v \, \big\|\beta^j_{G_v}\big\|_2,$$

where $\beta^j_{G_v}$ is the vector of regression coefficients $(\beta_{jk} : k \in G_v)$. Each group of regression coefficients $\beta^j_{G_v}$ is weighted by $w_v$, which reflects the strength of correlation within the group. In order to define the weights $w_v$, we first associate each internal node $v$ of the tree $T$ with two quantities $s_v$ and $g_v$ that satisfy $s_v + g_v = 1$: $s_v$ represents the weight for selecting the output variables associated with each of the children of $v$ separately, and $g_v$ represents the weight for selecting them jointly. Given an arbitrary tree $T$, we recursively apply this operation starting from the root node towards the leaf nodes:

$$\sum_{v \in V} w_v \big\|\beta^j_{G_v}\big\|_2 = W(v_{\mathrm{root}}), \qquad W(v) = \begin{cases} s_v \sum_{c \in \mathrm{Children}(v)} W(c) + g_v \big\|\beta^j_{G_v}\big\|_2 & \text{if } v \text{ is an internal node}, \\ |\beta^j_v| & \text{if } v \text{ is a leaf node}. \end{cases}$$

It can be shown that the following relationship holds between the $w_v$ and the $(s_v, g_v)$:

$$w_v = \begin{cases} g_v \prod_{m \in \mathrm{Ancestors}(v)} s_m & \text{if } v \text{ is an internal node}, \\ \prod_{m \in \mathrm{Ancestors}(v)} s_m & \text{if } v \text{ is a leaf node}. \end{cases}$$
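The following sketch makes the group construction and the closed-form weights concrete; the children-dictionary tree representation and all names are illustrative assumptions.

```python
def tree_groups_and_weights(children, s, g, root):
    """Compute each node's group G_v (the leaves of its subtree) and weight w_v.

    children : dict mapping an internal node to its list of children
    s, g     : dicts of the per-internal-node quantities with s_v + g_v = 1
    Returns (groups, weights) where
        w_v = g_v * prod_{m in Ancestors(v)} s_m   for internal nodes,
        w_v =       prod_{m in Ancestors(v)} s_m   for leaf nodes.
    """
    groups, weights = {}, {}

    def visit(v, ancestor_prod):
        kids = children.get(v, [])
        if not kids:                          # leaf: its group is the output itself
            groups[v], weights[v] = [v], ancestor_prod
            return [v]
        leaves = []
        for c in kids:                        # descendants accumulate the factor s_v
            leaves += visit(c, ancestor_prod * s[v])
        groups[v], weights[v] = leaves, g[v] * ancestor_prod
        return leaves

    visit(root, 1.0)
    return groups, weights

# The tree of Section 2.4 below: root v5 with children v4 and v3; v4 has leaves v1, v2.
children = {"v5": ["v4", "v3"], "v4": ["v1", "v2"]}
s, g = {"v5": 0.3, "v4": 0.6}, {"v5": 0.7, "v4": 0.4}
groups, weights = tree_groups_and_weights(children, s, g, "v5")
# groups["v4"] == ["v1", "v2"], and weights["v1"] == s_v5 * s_v4 == 0.18
```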
The above weighting scheme extends an elastic-net-like penalty hierarchically. At each internal node $v$, a high value of $s_v$ encourages a separate selection of inputs for the outputs associated with the children of $v$, whereas a high value of $g_v$ encourages a joint covariate selection across those outputs. If $s_v = 1$ and $g_v = 0$ for all $v \in V$, only separate selections are performed, and the tree-guided group-lasso penalty reduces to the lasso penalty. On the other hand, if $s_v = 0$ and $g_v = 1$ for all $v \in V$, the penalty reduces to the $\ell_1/\ell_2$ penalty, which performs only a joint covariate selection across all outputs.

2.4 Example

[Figure 2: An example tree for the tree-guided group lasso: leaves v1, v2, v3; internal node v4 with children v1 and v2; root v5 with children v4 and v3.]

Given the tree in the figure above, the tree-guided group-lasso penalty for the $j$-th input is given as follows:

$$W(v_{\mathrm{root}}) = W(v_5) = g_{v_5} \big\|\beta^j_{G_{v_5}}\big\|_2 + s_{v_5} \big( W(v_4) + W(v_3) \big)$$
$$= g_{v_5} \big\|\beta^j_{G_{v_5}}\big\|_2 + s_{v_5} \Big( g_{v_4} \big\|\beta^j_{G_{v_4}}\big\|_2 + s_{v_4} \big( W(v_1) + W(v_2) \big) \Big) + s_{v_5} |\beta^j_3|$$
$$= g_{v_5} \big\|\beta^j_{G_{v_5}}\big\|_2 + s_{v_5} g_{v_4} \big\|\beta^j_{G_{v_4}}\big\|_2 + s_{v_5} s_{v_4} \big( |\beta^j_1| + |\beta^j_2| \big) + s_{v_5} |\beta^j_3|.$$
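As a sanity check on this expansion, the short sketch below evaluates the penalty for one input $j$ both through the recursive definition of $W(v)$ and through the closed-form weights, using arbitrary illustrative values for the coefficients and for $(s_v, g_v)$:

```python
import numpy as np

beta = {"v1": 0.5, "v2": -1.0, "v3": 2.0}      # beta^j_k for the three leaf outputs
s = {"v5": 0.3, "v4": 0.6}
g = {"v5": 0.7, "v4": 0.4}

def norm(leaves):                               # ||beta^j_{G_v}||_2 for a group of leaves
    return np.sqrt(sum(beta[k] ** 2 for k in leaves))

# Recursive form W(v_root) from Section 2.3:
W1, W2, W3 = abs(beta["v1"]), abs(beta["v2"]), abs(beta["v3"])
W4 = s["v4"] * (W1 + W2) + g["v4"] * norm(["v1", "v2"])
W5 = s["v5"] * (W4 + W3) + g["v5"] * norm(["v1", "v2", "v3"])

# Expanded form with the closed-form weights w_v:
expanded = (g["v5"] * norm(["v1", "v2", "v3"])
            + s["v5"] * g["v4"] * norm(["v1", "v2"])
            + s["v5"] * s["v4"] * (W1 + W2)
            + s["v5"] * W3)
assert np.isclose(W5, expanded)
```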
2.5 Parameter estimation

In order to estimate the regression coefficients in the tree-guided group lasso, we use an alternative formulation of the problem that was previously introduced for the group lasso, given as

$$\hat{B}^{\mathrm{Tree}} = \arg\min_B \sum_k (y_k - X\beta_k)^T (y_k - X\beta_k) + \lambda \sum_j \Big( \sum_{v \in V} w_v \big\|\beta^j_{G_v}\big\|_2 \Big)^2.$$

Since the $\ell_1/\ell_2$ norm in the above equation is a non-smooth function, it is not trivial to optimize directly. We make use of the fact that the variational formulation of a mixed-norm regularization is bounded by a weighted $\ell_2$ regularization,

$$\Big( \sum_{v \in V} w_v \big\|\beta^j_{G_v}\big\|_2 \Big)^2 \le \sum_{v \in V} \frac{w_v^2 \, \big\|\beta^j_{G_v}\big\|_2^2}{d_{j,v}}, \qquad \sum_{v \in V} d_{j,v} = 1, \;\; d_{j,v} \ge 0 \;\, \forall j, v,$$

where equality holds for

$$d_{j,v} = \frac{w_v \big\|\beta^j_{G_v}\big\|_2}{\sum_{v' \in V} w_{v'} \big\|\beta^j_{G_{v'}}\big\|_2}.$$

Thus, we can rewrite the problem so that it contains only smooth functions, as follows:

$$\hat{B}^{\mathrm{Tree}} = \arg\min_B \sum_k (y_k - X\beta_k)^T (y_k - X\beta_k) + \lambda \sum_j \sum_{v \in V} \frac{w_v^2 \, \big\|\beta^j_{G_v}\big\|_2^2}{d_{j,v}}$$

$$\text{subject to} \;\; \sum_{v \in V} d_{j,v} = 1 \;\, \forall j, \qquad d_{j,v} \ge 0 \;\, \forall j, v,$$

where we have introduced additional variables $d_{j,v}$ that need to be estimated. We solve this problem by optimizing the $\beta_k$ and the $d_{j,v}$ alternately over iterations until convergence. In each iteration, we first fix the values of the $\beta_k$ and update the $d_{j,v}$; then we hold the $d_{j,v}$ constant and optimize for the $\beta_k$. Differentiating the objective with respect to each $\beta_k$, setting it to zero, and solving for $\beta_k$ gives the update equation

$$\beta_k = \left( X^T X + \lambda D \right)^{-1} X^T y_k,$$

where $D$ is a $J \times J$ diagonal matrix with $\sum_{v \in V} w_v^2 / d_{j,v}$ as the $j$-th element of its diagonal. Finally, the regularization parameter $\lambda$ can be selected by cross-validation.
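A minimal sketch of this alternating scheme is given below, following the update equations stated above; the ridge-style warm start and the epsilon smoothing are implementation conveniences, not part of the notes.

```python
import numpy as np

def tree_lasso_fit(X, Y, tree_groups, lam, n_iter=50, eps=1e-8):
    """Alternate the closed-form d-step and beta-step of Section 2.5.

    tree_groups : list of pairs (w_v, G_v), with G_v a list of output indices
    Returns the (J, K) coefficient matrix B.
    """
    J = X.shape[1]
    XtX, XtY = X.T @ X, X.T @ Y
    B = np.linalg.solve(XtX + np.eye(J), XtY)   # warm start so the first d-step is defined

    for _ in range(n_iter):
        # d-step: d_{j,v} = w_v ||beta^j_{G_v}|| / sum_{v'} w_{v'} ||beta^j_{G_{v'}}||
        norms = np.stack([w * np.linalg.norm(B[:, Gv], axis=1)
                          for (w, Gv) in tree_groups])           # shape (|V|, J)
        d = (norms + eps) / (norms + eps).sum(axis=0)

        # beta-step: beta_k = (X^T X + lam * D)^{-1} X^T y_k, with
        # D_jj = sum_v w_v^2 / d_{j,v}; D is shared by all k, so solve once for all columns.
        D = sum((w ** 2) / d[i] for i, (w, Gv) in enumerate(tree_groups))
        B = np.linalg.solve(XtX + lam * np.diag(D), XtY)
    return B

# Usage with the Section 2.4 tree over K = 3 outputs (indices 0, 1, 2), assuming the
# weights w_v were computed as in the earlier sketch:
# tree_groups = [(w_v1, [0]), (w_v2, [1]), (w_v3, [2]), (w_v4, [0, 1]), (w_v5, [0, 1, 2])]
```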
References

[1] Seyoung Kim and Eric P. Xing. Statistical Estimation of Correlated Genome Associations to a Quantitative Trait Network. PLoS Genetics, 2009.

[2] Seyoung Kim and Eric P. Xing. Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity. ICML, 2010.