The lasso: some novel algorithms and applications
|
|
- Marilynn York
- 5 years ago
- Views:
Transcription
1 1 The lasso: some novel algorithms and applications Newton Institute, June 25, 2008 Robert Tibshirani Stanford University Collaborations with Trevor Hastie, Jerome Friedman, Holger Hoefling, Gen Nowak, Pei Wang, Daniela Witten
2 2 Tibs Lab Genevera Allen Holger Hoefling Gen Nowak Daniella Witten
3 3 Jerome Friedman Trevor Hastie
4 From MyHeritage.Com 4
5 5 Jerome Friedman Trevor Hastie
6 6 Outline New fast algorithm for lasso- Pathwise coordinate descent Three examples of generalizations of the lasso: 1. Logistic/multinomial for classification. Example later of cancer classification from microarray data 2. Indirected graphical models- graphical lasso 3. Estimation of signals along chromosomes- Comparative Genomic Hybridization - Fused lasso
7 7 Linear regression via the Lasso (Tibshirani, 1995) Outcome variable y i, for cases i = 1, 2,... n, features x ij, j = 1, 2,... p Minimize n i=1 (y i j x ijβ j ) 2 + λ β j Equivalent to minimizing sum of squares with constraint βj s. Similar to ridge regression, which has constraint j β2 j t Lasso does variable selection and shrinkage, while ridge only shrinks. See also Basis Pursuit (Chen, Donoho and Saunders, 1998).
8 8 Picture of Lasso and Ridge regression β 2. ^ β. β 2 ^ β β 1 β 1
9 9 Example: Prostate Cancer Data y i = log (PSA), x ij measurements on a man and his prostate lcavol Coefficients svi lweight pgg45 lbph gleason age lcp Shrinkage Factor s
10 Example: Graphical lasso for cell signalling proteins 10
11 11 L1 norm= L1 norm= L1 norm= L1 norm= L1 norm= L1 norm= L1 norm= L1 norm= L1 norm= L1 norm= L1 norm= L1 norm= 7e 05
12 12 Fused lasso for CGH data log2 ratio log2 ratio genome order genome order
13 13 Algorithms for the lasso Standard convex optimizer Least angle regression (LAR) - Efron et al computes entire path of solutions. State-of-the-Art until 2008 Pathwise coordinate descent- new
14 14 Pathwise coordinate descent for the lasso Coordinate descent: optimize one parameter (coordinate) at a time. How? suppose we had only one predictor. Problem is to minimize (y i x i β) 2 + λ β Solution is the soft-thresholded estimate i sign( ˆβ)( β λ) + where ˆβ is usual least squares estimate. Idea: with multiple predictors, cycle through each predictor in turn. We compute residuals r i = y i j k x ij ˆβ k and applying univariate soft-thresholding, pretending that our data is (x ij, r i ).
15 15 Turns out that this is coordinate descent for the lasso criterion (y i x ij β j ) 2 + λ β j i j like skiing to the bottom of a hill, going north-south, east-west, north-south, etc. [Show movie] Too simple?!
16 16 A brief history of coordinate descent for the lasso 1997: Tibshirani s student Wenjiang Fu at University of Toronto develops the shooting algorithm for the lasso. Tibshirani doesn t fully appreciate it 2002 Ingrid Daubechies gives a talk at Stanford, describes a one-at-a-time algorithm for the lasso. Hastie implements it, makes an error, and Hastie +Tibshirani conclude that the method doesn t work 2006: Friedman is the external examiner at the PhD oral of Anita van der Kooij (Leiden) who uses the coordinate descent idea for the Elastic net. Friedman wonders whether it works for the lasso. Friedman, Hastie + Tibshirani start working on this problem. See also Wu and Lange (2008)!
17 17 Pathwise coordinate descent for the lasso Start with large value for λ (very sparse model) and slowly decrease it most coordinates that are zero never become non-zero coordinate descent code for Lasso is just 73 lines of Fortran!
18 18 Extensions Pathwise coordinate descent can be generalized to many other models: logistic/multinomial for classification, graphical lasso for undirected graphs, fused lasso for signals. Its speed and simplicity are quite remarkable. glmnet R package now available on CRAN
19 19 Logistic regression Outcome Y = 0 or 1; Logistic regression model log( P r(y = 1) 1 P r(y = 1) ) = β 0 + β 1 X 1 + β 2 X 2... Criterion is binomial log-likelihood +absolute value penalty Example: sparse data. N = 50, 000, p = 700, 000. State-of-the-art interior point algorithm (Stephen Boyd, Stanford), exploiting sparsity of features : 3.5 hours for 100 values along path Pathwise coordinate descent: 1 minute
20 20 Multiclass classification Microarray classification: 16,000 genes, 144 training samples 54 test samples, 14 cancer classes. Multinomial regression model. Methods CV errors Test errors # of out of 144 out of 54 genes used 1. Nearest shrunken centroids 35 (5) L 2 -penalized discriminant analysis 25 (4.1) Support vector classifier 26 (4.2) Lasso regression (one vs all) 30.7 (1.8) K-nearest neighbors 41 (4.6) L 2 -penalized multinomial 26 (4.2) Lasso-penalized multinomial 17 (2.8) Elastic-net penalized multinomial 22 (3.7)
21 21 Graphical lasso for undirected graphs p variables measured on N observations- eg p proteins measured in N cells goal is to estimate the best undirected graph on the variables: a missing link means that the variables are conditionally independent
22 22 X2 X1 X3 X5 X4
23 23 Naive procedure for graphs Meinhausen and Buhlmann (2006) The coefficient of the jth predictor in a regression measures the partial correlation between the response variable and the jth predictor Idea: apply lasso to graph problem by treating each node in turn as the response variable include an edge i j in the graph if either coefficient of jth predictor in regression for x i is non-zero, or ith predictor in regression for x j is non-zero.
24 24 More exact formulation Assume x N(0, Σ) and let l(σ; X) be the log-likelihood. X j, X k are conditionally independent if (Σ 1 ) jk = 0 let Θ = Σ 1, and let S be the empirical covariance matrix, the problem is to maximize the penalized log-likelihood log det Θ tr(sθ) subject to Θ 1 t, (1) Proposed in Yuan and Lin (2007) Convex problem! How to maximize? lasso(x [, j], X j, t) Naive lasso((x T X) [ j, j], X T [ j, j] X j, t) lasso(ˆσ [ j, j], X T [ j, j] X j, t) Just a modified lasso Exact! Naive using inner products
25 25 Why does this work? Considered one row and column at a time, the log-likelihood and the lasso problem have the same subgradient Banerjee et al (2007) found this connection, but didn t fully exploit it Resulting algorithm is a blockwise coordinate descent procedure
26 26 Speed comparisons Timings for p = 400 variables Method CPU time State-of-the-art convex optimizer Graphical lasso 27 min 6.2 sec All of this can be generalized to binary Markov networks; but computation of the partition function remains the major hurdle
27 Cell signalling proteins 27
28 28 L1 norm= L1 norm= L1 norm= L1 norm= L1 norm= L1 norm= L1 norm= L1 norm= L1 norm= L1 norm= L1 norm= L1 norm= 7e 05
29 29 Fitting graphs with missing edges A new solution to an old problem X2 X1 X3 X5 X4 Do a modified linear regression of each node on the other nodes, leaving out variables that represent missing edges; cycle through the nodes until convergence this is a simple, apparently new algorithm for this problem [according to Whittaker, Lauritzen]
30 30 Another application of the graphical lasso Covariance-regularized regression and classification for high-dimensional problems- the Scout procedure. Daniela Witten - ongoing Stanford Ph.D. thesis Idea is to penalize the joint distribution of the features and response, to obtain a better estimate of the regression parameters.
31 31 Fused lasso 1 min β 2 n p p (y j β j ) 2 + λ 1 β j + λ 2 β j β j 1. j=1 j=1 j=2 (Tibshirani, Saunders, Rosset, Zhu and Knight, 2004) Coordinate descent: although criterion is convex, it is not differentiable, and coordinate descent can get stuck in the cusps modification is needed- have to move pairs of parameters together
32 32 Comparative Genomic Hybridization (CGH) data log2 ratio log2 ratio genome order genome order
33 33 lambda1= 0.01, lambda2= 0 lambda1= 0.01, lambda2= 0.21 lambda1= 0.01, lambda2= o o oo o o o ooo o o oo o o ooo o o oo o oo o oo o o oo oo o oo oo o oo o o oo oo ooo o oo o oo o oo o o o o o o o ooo o o o oo o o oo o o o oo o oo o o ooo o oo o o o o o oo oo o oo o ooo ooo o o oo oo o o oo oo o o o oo o o o ooo o o oo o o ooo o o oo o oo o oo o o oo oo o oo oo o oo o o oo oo ooo o oo o oo o oo o o o o o o o ooo o o o oo o o oo o o o oo o oo o o ooo o oo o o o o o oo oo o oo o ooo ooo o o oo oo o o oo oo o o o oo o o o ooo o o oo o o ooo o o oo o oo o oo o o oo oo o oo oo o oo o o oo oo ooo o oo o oo o oo o o o o o o o ooo o o o oo o o oo o o o oo o oo o o ooo o oo o o o o o oo oo o oo o ooo ooo o o oo oo o o oo oo o lambda1= 1.56, lambda2= 0 lambda1= 1.56, lambda2= 0.21 lambda1= 1.56, lambda2= o o oo o o o ooo o o oo o o ooo o o oo o oo o oo o o oo oo o oo oo o oo o o oo oo ooo o oo o oo o oo o o o o o o o ooo o o o oo o o oo o o o oo o oo o o ooo o oo o o o o o oo oo o oo o ooo ooo o o oo oo o o oo oo o o o oo o o o ooo o o oo o o ooo o o oo o oo o oo o o oo oo o oo oo o oo o o oo oo ooo o oo o oo o oo o o o o o o o ooo o o o oo o o oo o o o oo o oo o o ooo o oo o o o o o oo oo o oo o ooo ooo o o oo oo o o oo oo o o o oo o o o ooo o o oo o o ooo o o oo o oo o oo o o oo oo o oo oo o oo o o oo oo ooo o oo o oo o oo o o o o o o o ooo o o o oo o o oo o o o oo o oo o o ooo o oo o o o o o oo oo o oo o ooo ooo o o oo oo o o oo oo o lambda1= 3.11, lambda2= 0 lambda1= 3.11, lambda2= 0.21 lambda1= 3.11, lambda2= o o oo o o o ooo o o oo o o ooo o o oo o oo o oo o o oo oo o oo oo o oo o o oo oo ooo o oo o oo o oo o o o o o o o ooo o o o oo o o oo o o o oo o oo o o ooo o oo o o o o o oo oo o oo o ooo ooo o o oo oo o o oo oo o o o oo o o o ooo o o oo o o ooo o o oo o oo o oo o o oo oo o oo oo o oo o o oo oo ooo o oo o oo o oo o o o o o o o ooo o o o oo o o oo o o o oo o oo o o ooo o oo o o o o o oo oo o oo o ooo ooo o o oo oo o o oo oo o o o oo o o o ooo o o oo o o ooo o o oo o oo o oo o o oo oo o oo oo o oo o o oo oo ooo o oo o oo o oo o o o o o o o ooo o o o oo o o oo o o o oo o oo o o ooo o oo o o o o o oo oo o oo o ooo ooo o o oo oo o o oo oo o
34 34 Window size=5 Window size=10 TPR Fused.Lasso CGHseg CLAC CBS TPR Fused.Lasso CGHseg CLAC CBS FPR FPR Window size=20 Window size=40 TPR Fused.Lasso CGHseg CLAC CBS TPR Fused.Lasso CGHseg CLAC CBS FPR FPR
35 CGH- multisample fused lasso 35
36 36 Marginal projections have limitations...
37 37 Our idea Patient profiles (cakes) Underlying patterns (raw ingredients) weights(recipes)
38 Common patterns that were discovered 38
39 39 Two-dimensional fused lasso problem has the form 1 px px min (y ii β ii ) 2 β 2 i=1 i =1 subject to px px β ii s 1, i=1 px i=1 px i =1 i =2 i=2 i =1 px β i,i β i,i 1 s 2, px β i,i β i 1,i s 3. we consider fusions of pixels one unit away in horizontal or vertical directions. after a number of fusions, fused regions can become very irregular in shape. Required some impressive programming by Friedman.
40 40 Examples Two-fold validation (even and odd pixels) was used to estimate λ 1, λ 2.
41 41 Discussion lasso penalties are useful for fitting a wide variety of models to large datasets Pathwise coordinate descent enables to fit these models to large datasets for the first time Software in R language: LARS, cghflasso, glasso, glm.net
42 42 Finally Lasso can be used to prove Efron s last theorem. [Wild Applause]
43 1 Original image Fused lasso reconstruction Noisy image Fused lasso reconstruction
Pathwise coordinate optimization
Stanford University 1 Pathwise coordinate optimization Jerome Friedman, Trevor Hastie, Holger Hoefling, Robert Tibshirani Stanford University Acknowledgements: Thanks to Stephen Boyd, Michael Saunders,
More informationThe lasso: some novel algorithms and applications
1 The lasso: some novel algorithms and applications Robert Tibshirani Stanford University ASA Bay Area chapter meeting Collaborations with Trevor Hastie, Jerome Friedman, Ryan Tibshirani, Daniela Witten,
More informationFast Regularization Paths via Coordinate Descent
August 2008 Trevor Hastie, Stanford Statistics 1 Fast Regularization Paths via Coordinate Descent Trevor Hastie Stanford University joint work with Jerry Friedman and Rob Tibshirani. August 2008 Trevor
More informationFast Regularization Paths via Coordinate Descent
KDD August 2008 Trevor Hastie, Stanford Statistics 1 Fast Regularization Paths via Coordinate Descent Trevor Hastie Stanford University joint work with Jerry Friedman and Rob Tibshirani. KDD August 2008
More informationFast Regularization Paths via Coordinate Descent
user! 2009 Trevor Hastie, Stanford Statistics 1 Fast Regularization Paths via Coordinate Descent Trevor Hastie Stanford University joint work with Jerome Friedman and Rob Tibshirani. user! 2009 Trevor
More informationLearning with Sparsity Constraints
Stanford 2010 Trevor Hastie, Stanford Statistics 1 Learning with Sparsity Constraints Trevor Hastie Stanford University recent joint work with Rahul Mazumder, Jerome Friedman and Rob Tibshirani earlier
More informationSparse inverse covariance estimation with the lasso
Sparse inverse covariance estimation with the lasso Jerome Friedman Trevor Hastie and Robert Tibshirani November 8, 2007 Abstract We consider the problem of estimating sparse graphs by a lasso penalty
More informationGraphical Model Selection
May 6, 2013 Trevor Hastie, Stanford Statistics 1 Graphical Model Selection Trevor Hastie Stanford University joint work with Jerome Friedman, Rob Tibshirani, Rahul Mazumder and Jason Lee May 6, 2013 Trevor
More informationLeast Angle Regression, Forward Stagewise and the Lasso
January 2005 Rob Tibshirani, Stanford 1 Least Angle Regression, Forward Stagewise and the Lasso Brad Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani Stanford University Annals of Statistics,
More informationCoordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /
Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods
More informationA significance test for the lasso
1 First part: Joint work with Richard Lockhart (SFU), Jonathan Taylor (Stanford), and Ryan Tibshirani (Carnegie-Mellon Univ.) Second part: Joint work with Max Grazier G Sell, Stefan Wager and Alexandra
More informationRegularization Paths
December 2005 Trevor Hastie, Stanford Statistics 1 Regularization Paths Trevor Hastie Stanford University drawing on collaborations with Brad Efron, Saharon Rosset, Ji Zhu, Hui Zhou, Rob Tibshirani and
More informationRegularization Paths. Theme
June 00 Trevor Hastie, Stanford Statistics June 00 Trevor Hastie, Stanford Statistics Theme Regularization Paths Trevor Hastie Stanford University drawing on collaborations with Brad Efron, Mee-Young Park,
More informationESL Chap3. Some extensions of lasso
ESL Chap3 Some extensions of lasso 1 Outline Consistency of lasso for model selection Adaptive lasso Elastic net Group lasso 2 Consistency of lasso for model selection A number of authors have studied
More informationA significance test for the lasso
1 Gold medal address, SSC 2013 Joint work with Richard Lockhart (SFU), Jonathan Taylor (Stanford), and Ryan Tibshirani (Carnegie-Mellon Univ.) Reaping the benefits of LARS: A special thanks to Brad Efron,
More informationPost-selection inference with an application to internal inference
Post-selection inference with an application to internal inference Robert Tibshirani, Stanford University November 23, 2015 Seattle Symposium in Biostatistics, 2015 Joint work with Sam Gross, Will Fithian,
More informationA Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression
A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent
More informationGaussian Graphical Models and Graphical Lasso
ELE 538B: Sparsity, Structure and Inference Gaussian Graphical Models and Graphical Lasso Yuxin Chen Princeton University, Spring 2017 Multivariate Gaussians Consider a random vector x N (0, Σ) with pdf
More informationA note on the group lasso and a sparse group lasso
A note on the group lasso and a sparse group lasso arxiv:1001.0736v1 [math.st] 5 Jan 2010 Jerome Friedman Trevor Hastie and Robert Tibshirani January 5, 2010 Abstract We consider the group lasso penalty
More informationChapter 3. Linear Models for Regression
Chapter 3. Linear Models for Regression Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Linear
More informationAn Introduction to Graphical Lasso
An Introduction to Graphical Lasso Bo Chang Graphical Models Reading Group May 15, 2015 Bo Chang (UBC) Graphical Lasso May 15, 2015 1 / 16 Undirected Graphical Models An undirected graph, each vertex represents
More informationChapter 17: Undirected Graphical Models
Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)
More informationCovariance-regularized regression and classification for high-dimensional problems
Covariance-regularized regression and classification for high-dimensional problems Daniela M. Witten Department of Statistics, Stanford University, 390 Serra Mall, Stanford CA 94305, USA. E-mail: dwitten@stanford.edu
More informationRegression Shrinkage and Selection via the Elastic Net, with Applications to Microarrays
Regression Shrinkage and Selection via the Elastic Net, with Applications to Microarrays Hui Zou and Trevor Hastie Department of Statistics, Stanford University December 5, 2003 Abstract We propose the
More informationCS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning
CS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning Professor Erik Sudderth Brown University Computer Science October 4, 2016 Some figures and materials courtesy
More informationVariable Selection for High-Dimensional Data with Spatial-Temporal Effects and Extensions to Multitask Regression and Multicategory Classification
Variable Selection for High-Dimensional Data with Spatial-Temporal Effects and Extensions to Multitask Regression and Multicategory Classification Tong Tong Wu Department of Epidemiology and Biostatistics
More informationMachine Learning CSE546 Carlos Guestrin University of Washington. October 7, Efficiency: If size(w) = 100B, each prediction is expensive:
Simple Variable Selection LASSO: Sparse Regression Machine Learning CSE546 Carlos Guestrin University of Washington October 7, 2013 1 Sparsity Vector w is sparse, if many entries are zero: Very useful
More informationLecture 5: Soft-Thresholding and Lasso
High Dimensional Data and Statistical Learning Lecture 5: Soft-Thresholding and Lasso Weixing Song Department of Statistics Kansas State University Weixing Song STAT 905 October 23, 2014 1/54 Outline Penalized
More informationMS-C1620 Statistical inference
MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents
More informationLecture 25: November 27
10-725: Optimization Fall 2012 Lecture 25: November 27 Lecturer: Ryan Tibshirani Scribes: Matt Wytock, Supreeth Achar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have
More informationMultivariate Normal Models
Case Study 3: fmri Prediction Graphical LASSO Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Emily Fox February 26 th, 2013 Emily Fox 2013 1 Multivariate Normal Models
More informationAutomatic Response Category Combination. in Multinomial Logistic Regression
Automatic Response Category Combination in Multinomial Logistic Regression Bradley S. Price, Charles J. Geyer, and Adam J. Rothman arxiv:1705.03594v1 [stat.me] 10 May 2017 Abstract We propose a penalized
More informationUnivariate shrinkage in the Cox model for high dimensional data
Univariate shrinkage in the Cox model for high dimensional data Robert Tibshirani January 6, 2009 Abstract We propose a method for prediction in Cox s proportional model, when the number of features (regressors)
More informationBi-level feature selection with applications to genetic association
Bi-level feature selection with applications to genetic association studies October 15, 2008 Motivation In many applications, biological features possess a grouping structure Categorical variables may
More informationProperties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation
Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Adam J. Rothman School of Statistics University of Minnesota October 8, 2014, joint work with Liliana
More informationMultivariate Normal Models
Case Study 3: fmri Prediction Coping with Large Covariances: Latent Factor Models, Graphical Models, Graphical LASSO Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February
More informationLecture 6: Methods for high-dimensional problems
Lecture 6: Methods for high-dimensional problems Hector Corrada Bravo and Rafael A. Irizarry March, 2010 In this Section we will discuss methods where data lies on high-dimensional spaces. In particular,
More informationRegularization Path Algorithms for Detecting Gene Interactions
Regularization Path Algorithms for Detecting Gene Interactions Mee Young Park Trevor Hastie July 16, 2006 Abstract In this study, we consider several regularization path algorithms with grouped variable
More informationJournal of Statistical Software
JSS Journal of Statistical Software March 2011, Volume 39, Issue 5. http://www.jstatsoft.org/ Regularization Paths for Cox s Proportional Hazards Model via Coordinate Descent Noah Simon Stanford University
More informationPermutation-invariant regularization of large covariance matrices. Liza Levina
Liza Levina Permutation-invariant covariance regularization 1/42 Permutation-invariant regularization of large covariance matrices Liza Levina Department of Statistics University of Michigan Joint work
More informationIntroduction to the genlasso package
Introduction to the genlasso package Taylor B. Arnold, Ryan Tibshirani Abstract We present a short tutorial and introduction to using the R package genlasso, which is used for computing the solution path
More informationRegularization and Variable Selection via the Elastic Net
p. 1/1 Regularization and Variable Selection via the Elastic Net Hui Zou and Trevor Hastie Journal of Royal Statistical Society, B, 2005 Presenter: Minhua Chen, Nov. 07, 2008 p. 2/1 Agenda Introduction
More informationLASSO Review, Fused LASSO, Parallel LASSO Solvers
Case Study 3: fmri Prediction LASSO Review, Fused LASSO, Parallel LASSO Solvers Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade May 3, 2016 Sham Kakade 2016 1 Variable
More informationSTAT 462-Computational Data Analysis
STAT 462-Computational Data Analysis Chapter 5- Part 2 Nasser Sadeghkhani a.sadeghkhani@queensu.ca October 2017 1 / 27 Outline Shrinkage Methods 1. Ridge Regression 2. Lasso Dimension Reduction Methods
More informationClassification. Classification is similar to regression in that the goal is to use covariates to predict on outcome.
Classification Classification is similar to regression in that the goal is to use covariates to predict on outcome. We still have a vector of covariates X. However, the response is binary (or a few classes),
More informationSparse Permutation Invariant Covariance Estimation: Motivation, Background and Key Results
Sparse Permutation Invariant Covariance Estimation: Motivation, Background and Key Results David Prince Biostat 572 dprince3@uw.edu April 19, 2012 David Prince (UW) SPICE April 19, 2012 1 / 11 Electronic
More informationMSA220/MVE440 Statistical Learning for Big Data
MSA220/MVE440 Statistical Learning for Big Data Lecture 7/8 - High-dimensional modeling part 1 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Classification
More informationapplication in microarrays
Biostatistics Advance Access published April 7, 2006 Regularized linear discriminant analysis and its application in microarrays Yaqian Guo, Trevor Hastie and Robert Tibshirani Abstract In this paper,
More informationPrediction & Feature Selection in GLM
Tarigan Statistical Consulting & Coaching statistical-coaching.ch Doctoral Program in Computer Science of the Universities of Fribourg, Geneva, Lausanne, Neuchâtel, Bern and the EPFL Hands-on Data Analysis
More informationSimultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR
Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR Howard D. Bondell and Brian J. Reich Department of Statistics, North Carolina State University,
More informationLecture 2 Part 1 Optimization
Lecture 2 Part 1 Optimization (January 16, 2015) Mu Zhu University of Waterloo Need for Optimization E(y x), P(y x) want to go after them first, model some examples last week then, estimate didn t discuss
More informationRegularized Discriminant Analysis and Its Application in Microarrays
Biostatistics (2005), 1, 1, pp. 1 18 Printed in Great Britain Regularized Discriminant Analysis and Its Application in Microarrays By YAQIAN GUO Department of Statistics, Stanford University Stanford,
More informationMSA220/MVE440 Statistical Learning for Big Data
MSA220/MVE440 Statistical Learning for Big Data Lecture 9-10 - High-dimensional regression Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Recap from
More informationGenetic Networks. Korbinian Strimmer. Seminar: Statistical Analysis of RNA-Seq Data 19 June IMISE, Universität Leipzig
Genetic Networks Korbinian Strimmer IMISE, Universität Leipzig Seminar: Statistical Analysis of RNA-Seq Data 19 June 2012 Korbinian Strimmer, RNA-Seq Networks, 19/6/2012 1 Paper G. I. Allen and Z. Liu.
More informationCorrelate. A method for the integrative analysis of two genomic data sets
Correlate A method for the integrative analysis of two genomic data sets Sam Gross, Balasubramanian Narasimhan, Robert Tibshirani, and Daniela Witten February 19, 2010 Introduction Sparse Canonical Correlation
More informationRegularized Discriminant Analysis and Its Application in Microarray
Regularized Discriminant Analysis and Its Application in Microarray Yaqian Guo, Trevor Hastie and Robert Tibshirani May 5, 2004 Abstract In this paper, we introduce a family of some modified versions of
More information10708 Graphical Models: Homework 2
10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves
More informationA Survey of L 1. Regression. Céline Cunen, 20/10/2014. Vidaurre, Bielza and Larranaga (2013)
A Survey of L 1 Regression Vidaurre, Bielza and Larranaga (2013) Céline Cunen, 20/10/2014 Outline of article 1.Introduction 2.The Lasso for Linear Regression a) Notation and Main Concepts b) Statistical
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In
More informationSimultaneous variable selection and class fusion for high-dimensional linear discriminant analysis
Biostatistics (2010), 11, 4, pp. 599 608 doi:10.1093/biostatistics/kxq023 Advance Access publication on May 26, 2010 Simultaneous variable selection and class fusion for high-dimensional linear discriminant
More informationAn efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss
An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao
More informationUndirected Graphical Models
17 Undirected Graphical Models 17.1 Introduction A graph consists of a set of vertices (nodes), along with a set of edges joining some pairs of the vertices. In graphical models, each vertex represents
More informationSparse Graph Learning via Markov Random Fields
Sparse Graph Learning via Markov Random Fields Xin Sui, Shao Tang Sep 23, 2016 Xin Sui, Shao Tang Sparse Graph Learning via Markov Random Fields Sep 23, 2016 1 / 36 Outline 1 Introduction to graph learning
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationSparse Ridge Fusion For Linear Regression
University of Central Florida Electronic Theses and Dissertations Masters Thesis (Open Access) Sparse Ridge Fusion For Linear Regression 2013 Nozad Mahmood University of Central Florida Find similar works
More informationChris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010
Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,
More informationBAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage
BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement
More informationLecture 14: Variable Selection - Beyond LASSO
Fall, 2017 Extension of LASSO To achieve oracle properties, L q penalty with 0 < q < 1, SCAD penalty (Fan and Li 2001; Zhang et al. 2007). Adaptive LASSO (Zou 2006; Zhang and Lu 2007; Wang et al. 2007)
More informationRobust Variable Selection Methods for Grouped Data. Kristin Lee Seamon Lilly
Robust Variable Selection Methods for Grouped Data by Kristin Lee Seamon Lilly A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree
More informationThe picasso Package for Nonconvex Regularized M-estimation in High Dimensions in R
The picasso Package for Nonconvex Regularized M-estimation in High Dimensions in R Xingguo Li Tuo Zhao Tong Zhang Han Liu Abstract We describe an R package named picasso, which implements a unified framework
More informationTuning Parameter Selection in L1 Regularized Logistic Regression
Virginia Commonwealth University VCU Scholars Compass Theses and Dissertations Graduate School 2012 Tuning Parameter Selection in L1 Regularized Logistic Regression Shujing Shi Virginia Commonwealth University
More informationGeneralized Elastic Net Regression
Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationProteomics and Variable Selection
Proteomics and Variable Selection p. 1/55 Proteomics and Variable Selection Alex Lewin With thanks to Paul Kirk for some graphs Department of Epidemiology and Biostatistics, School of Public Health, Imperial
More informationLearning Gaussian Graphical Models with Unknown Group Sparsity
Learning Gaussian Graphical Models with Unknown Group Sparsity Kevin Murphy Ben Marlin Depts. of Statistics & Computer Science Univ. British Columbia Canada Connections Graphical models Density estimation
More informationLinear Regression Models. Based on Chapter 3 of Hastie, Tibshirani and Friedman
Linear Regression Models Based on Chapter 3 of Hastie, ibshirani and Friedman Linear Regression Models Here the X s might be: p f ( X = " + " 0 j= 1 X j Raw predictor variables (continuous or coded-categorical
More informationHigh-dimensional covariance estimation based on Gaussian graphical models
High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,
More informationA Robust Approach to Regularized Discriminant Analysis
A Robust Approach to Regularized Discriminant Analysis Moritz Gschwandtner Department of Statistics and Probability Theory Vienna University of Technology, Austria Österreichische Statistiktage, Graz,
More informationShrinkage Methods: Ridge and Lasso
Shrinkage Methods: Ridge and Lasso Jonathan Hersh 1 Chapman University, Argyros School of Business hersh@chapman.edu February 27, 2019 J.Hersh (Chapman) Ridge & Lasso February 27, 2019 1 / 43 1 Intro and
More informationRegression Shrinkage and Selection via the Lasso
Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,
More informationPost-selection Inference for Forward Stepwise and Least Angle Regression
Post-selection Inference for Forward Stepwise and Least Angle Regression Ryan & Rob Tibshirani Carnegie Mellon University & Stanford University Joint work with Jonathon Taylor, Richard Lockhart September
More informationBiostatistics Advanced Methods in Biostatistics IV
Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results
More informationSparse statistical modelling
Sparse statistical modelling Tom Bartlett Sparse statistical modelling Tom Bartlett 1 / 28 Introduction A sparse statistical model is one having only a small number of nonzero parameters or weights. [1]
More informationLEAST ANGLE REGRESSION 469
LEAST ANGLE REGRESSION 469 Specifically for the Lasso, one alternative strategy for logistic regression is to use a quadratic approximation for the log-likelihood. Consider the Bayesian version of Lasso
More informationData Mining Stat 588
Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic
More informationApproximation. Inderjit S. Dhillon Dept of Computer Science UT Austin. SAMSI Massive Datasets Opening Workshop Raleigh, North Carolina.
Using Quadratic Approximation Inderjit S. Dhillon Dept of Computer Science UT Austin SAMSI Massive Datasets Opening Workshop Raleigh, North Carolina Sept 12, 2012 Joint work with C. Hsieh, M. Sustik and
More informationInstitute of Statistics Mimeo Series No Simultaneous regression shrinkage, variable selection and clustering of predictors with OSCAR
DEPARTMENT OF STATISTICS North Carolina State University 2501 Founders Drive, Campus Box 8203 Raleigh, NC 27695-8203 Institute of Statistics Mimeo Series No. 2583 Simultaneous regression shrinkage, variable
More informationDifferent types of regression: Linear, Lasso, Ridge, Elastic net, Ro
Different types of regression: Linear, Lasso, Ridge, Elastic net, Robust and K-neighbors Faculty of Mathematics, Informatics and Mechanics, University of Warsaw 04.10.2009 Introduction We are given a linear
More informationStatistics for high-dimensional data: Group Lasso and additive models
Statistics for high-dimensional data: Group Lasso and additive models Peter Bühlmann and Sara van de Geer Seminar für Statistik, ETH Zürich May 2012 The Group Lasso (Yuan & Lin, 2006) high-dimensional
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationMachine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February 4 th, Emily Fox 2014
Case Study 3: fmri Prediction Fused LASSO LARS Parallel LASSO Solvers Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February 4 th, 2014 Emily Fox 2014 1 LASSO Regression
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationSome new ideas for post selection inference and model assessment
Some new ideas for post selection inference and model assessment Robert Tibshirani, Stanford WHOA!! 2018 Thanks to Jon Taylor and Ryan Tibshirani for helpful feedback 1 / 23 Two topics 1. How to improve
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationDirect Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina
Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:
More informationBoosting as a Regularized Path to a Maximum Margin Classifier
Journal of Machine Learning Research () Submitted 5/03; Published Boosting as a Regularized Path to a Maximum Margin Classifier Saharon Rosset Data Analytics Research Group IBM T.J. Watson Research Center
More informationCS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS
CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Mingon Kang, Ph.D. Computer Science, Kennesaw State University Problems
More informationComparisons of penalized least squares. methods by simulations
Comparisons of penalized least squares arxiv:1405.1796v1 [stat.co] 8 May 2014 methods by simulations Ke ZHANG, Fan YIN University of Science and Technology of China, Hefei 230026, China Shifeng XIONG Academy
More informationA Significance Test for the Lasso
A Significance Test for the Lasso Lockhart R, Taylor J, Tibshirani R, and Tibshirani R Ashley Petersen June 6, 2013 1 Motivation Problem: Many clinical covariates which are important to a certain medical
More information