Statistics for high-dimensional data: Group Lasso and additive models


1 Statistics for high-dimensional data: Group Lasso and additive models
Peter Bühlmann and Sara van de Geer, Seminar für Statistik, ETH Zürich, May 2012

2 The Group Lasso (Yuan & Lin, 2006)
the high-dimensional parameter vector is structured into $q$ groups or partitions (known a priori): $G_1, \ldots, G_q \subseteq \{1, \ldots, p\}$, disjoint, with $\bigcup_g G_g = \{1, \ldots, p\}$
corresponding coefficients: $\beta_G = \{\beta_j;\ j \in G\}$

3 Example: categorical covariates
$X^{(1)}, \ldots, X^{(p)}$ are factors (categorical variables), each with 4 levels (e.g. letters from DNA)
encoding a main effect: 3 parameters
encoding a first-order interaction: 9 parameters
and so on...
the parameterization (e.g. with sum contrasts) is structured as follows:
intercept: no penalty
main effect of $X^{(1)}$: group $G_1$ with df = 3
main effect of $X^{(2)}$: group $G_2$ with df = 3
...
first-order interaction of $X^{(1)}$ and $X^{(2)}$: group $G_{p+1}$ with df = 9
...
often, we want sparsity on the group level: either all parameters of an effect are zero or they enter the model as a whole

4 often, we want sparsity on the group level: either all parameters of an effect are zero or they enter the model as a whole
this can be achieved with the Group-Lasso penalty
$$\lambda \sum_{g=1}^{q} m_g \|\beta_{G_g}\|_2$$
typically with weights $m_g = \sqrt{|G_g|}$
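
As a concrete illustration, a minimal numpy sketch of evaluating this penalty (the group partition and coefficient values below are hypothetical):

```python
import numpy as np

def group_lasso_penalty(beta, groups, lam):
    """Group-Lasso penalty: lam * sum_g m_g * ||beta_{G_g}||_2,
    with the usual weights m_g = sqrt(|G_g|)."""
    return lam * sum(np.sqrt(len(g)) * np.linalg.norm(beta[g]) for g in groups)

# hypothetical example: p = 7 coefficients in q = 3 disjoint groups
beta = np.array([0.0, 0.0, 0.0, 1.2, -0.5, 0.3, 0.0])
groups = [[0, 1, 2], [3, 4, 5], [6]]  # union = {0, ..., 6}
print(group_lasso_penalty(beta, groups, lam=0.1))  # first group contributes 0
```

Note the norm is not squared: that is what makes whole groups drop out of the model exactly.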

5 properties of the Group-Lasso penalty
for group sizes $|G_g| \equiv 1$: standard Lasso penalty
convex penalty: convex optimization for standard likelihoods (exponential family models)
either $(\hat\beta_G(\lambda))_j = 0$ for all $j \in G$ or $\neq 0$ for all $j \in G$
the penalty is invariant under orthonormal transformations, e.g. invariant when requiring an orthonormal parameterization for factors

6 DNA splice site detection: (mainly) a prediction problem
DNA sequence ... ACGGC [E E E | GC | I I I I] AAC ..., where the potential donor site consists of 3 exon positions, the conserved GC, and 4 intron positions
response $Y \in \{0, 1\}$: splice or non-splice site
predictor variables: 7 factors, each having 4 levels (full dimension: $4^7 = 16384$)
data: a training set and a test set of true splice sites and non-splice sites, plus an unbalanced validation set

7 logistic regression:
$$\log\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \text{main effects} + \text{first-order interactions} + \ldots$$
up to second-order interactions: 1156 parameters
use the Group Lasso, which selects whole terms

8 [Figure: $\ell_2$-norm of each fitted term (main effects and first- and second-order interactions between DNA positions) for the estimators GL, GL/R and GL/MLE]
mainly neighboring DNA positions show interactions (this has been known and debated)
no interactions between exon and intron positions (with the Group Lasso method)
no second-order interactions (with the Group Lasso method)

9 predictive power: competitive with the state-of-the-art maximum entropy modeling of Yeo and Burge (2004)
[Table: correlation between true and predicted class for the Logistic Group Lasso and for maximum entropy (Yeo and Burge)]
our model (not necessarily the method/algorithm) is simple and has a clear interpretation

10 Generalized group Lasso penalty
$$\lambda \sum_{j=1}^{q} m_j \sqrt{\beta_{G_j}^T A_j \beta_{G_j}},$$
where the $A_j$ are $T_j \times T_j$ positive definite matrices
generalized group Lasso:
$$\hat\beta = \mathrm{argmin}_\beta \Big( \|Y - X\beta\|_2^2/n + \lambda \sum_{j=1}^{q} m_j \sqrt{\beta_{G_j}^T A_j \beta_{G_j}} \Big)$$

11 reparameterize: $\tilde\beta_{G_j} = A_j^{1/2} \beta_{G_j}$, $\tilde X_{G_j} = X_{G_j} A_j^{-1/2}$
then
$$\hat\beta = \mathrm{argmin}_\beta \Big( \|Y - X\beta\|_2^2/n + \lambda \sum_{j=1}^{q} m_j \sqrt{\beta_{G_j}^T A_j \beta_{G_j}} \Big)$$
can be derived from $\hat\beta_{G_j} = A_j^{-1/2} \hat{\tilde\beta}_{G_j}$, where
$$\hat{\tilde\beta} = \mathrm{argmin}_{\tilde\beta} \Big( \|Y - \tilde X \tilde\beta\|_2^2/n + \lambda \sum_{j=1}^{q} m_j \|\tilde\beta_{G_j}\|_2 \Big)$$
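
A sketch of this reduction for a single block, assuming a symmetric positive definite $A_j$; any factorization $A_j = C^T C$ (e.g. Cholesky) would work equally well in place of the matrix square root:

```python
import numpy as np
from scipy.linalg import sqrtm

def to_standard_group_lasso(X_g, A_g):
    """Map one block of a generalized group-Lasso problem to standard form:
    beta~ = A^{1/2} beta and X~ = X A^{-1/2}, so that
    sqrt(beta' A beta) = ||beta~||_2 and X beta = X~ beta~."""
    A_half = np.real(sqrtm(A_g))        # A^{1/2} (real for s.p.d. A)
    A_half_inv = np.linalg.inv(A_half)  # A^{-1/2}
    X_tilde = X_g @ A_half_inv
    # after solving the standard group Lasso in beta~, map back:
    back_transform = lambda beta_tilde: A_half_inv @ beta_tilde
    return X_tilde, back_transform
```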

12 Groupwise prediction penalty and parameterization invariance
$$\lambda \sum_{j=1}^{q} m_j \|X_{G_j} \beta_{G_j}\|_2 = \lambda \sum_{j=1}^{q} m_j \sqrt{\beta_{G_j}^T X_{G_j}^T X_{G_j} \beta_{G_j}}$$
is a generalized group Lasso penalty if the $X_{G_j}^T X_{G_j}$ are positive definite (i.e. necessarily $|G_j| \le n$)
this penalty is invariant under any (invertible) transformation within groups, i.e. we can use $\tilde\beta_{G_j} = B_j \beta_{G_j}$, where $B_j$ is any $T_j \times T_j$ invertible matrix:
$$X_{G_j} \hat\beta_{G_j} = \tilde X_{G_j} \hat{\tilde\beta}_{G_j}, \quad \{j;\ \hat\beta_{G_j} \neq 0\} = \{j;\ \hat{\tilde\beta}_{G_j} \neq 0\}$$

13 Some aspects from theory
again: optimal prediction and estimation (oracle inequality)
group screening: $\hat S \supseteq S_0$ with high probability, where $S_0$ is the set of active groups
but listen to Sara in a few minutes
interesting case: the $G_j$'s are large and the $\beta_{G_j}$'s are smooth

14 example: high-dimensional additive model
$$Y = \sum_{j=1}^{p} f_j(X^{(j)}) + \epsilon,$$
with smooth functions $f_j(\cdot)$, and expand
$$f_j(x^{(j)}) = \sum_{k=1}^{n} \underbrace{\beta_k^{(j)}}_{(\beta_{G_j})_k} \underbrace{B_k(x^{(j)})}_{\text{basis fct.s}},$$
so that smoothness of $f_j(\cdot)$ corresponds to smoothness of $\beta_{G_j}$

15 Computation and KKT
criterion function
$$Q_\lambda(\beta) = \underbrace{n^{-1} \sum_{i=1}^{n} \rho_\beta(x_i, Y_i)}_{\text{loss fct.}} + \lambda \sum_{g=1}^{q} m_g \|\beta_{G_g}\|_2,$$
with loss function $\rho_\beta(\cdot, \cdot)$ convex in $\beta$
KKT conditions:
$$\nabla\rho(\hat\beta)_{G_g} + \lambda m_g \frac{\hat\beta_{G_g}}{\|\hat\beta_{G_g}\|_2} = 0 \quad \text{if } \hat\beta_{G_g} \neq 0 \text{ (not the 0-vector)},$$
$$\|\nabla\rho(\hat\beta)_{G_g}\|_2 \le \lambda m_g \quad \text{if } \hat\beta_{G_g} \equiv 0$$
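
For the squared error loss $\rho(\beta) = \|Y - X\beta\|_2^2/n$ the gradient block is $\nabla\rho(\hat\beta)_{G_g} = -2 X_{G_g}^T (Y - X\hat\beta)/n$, so the conditions are easy to verify numerically; a minimal sketch (function and variable names are hypothetical):

```python
import numpy as np

def check_kkt(X, Y, beta_hat, groups, lam, tol=1e-6):
    """Check the group-Lasso KKT conditions for the squared error loss
    rho(beta) = ||Y - X beta||_2^2 / n."""
    n = len(Y)
    grad = -2.0 * X.T @ (Y - X @ beta_hat) / n     # gradient of the loss
    ok = True
    for g in groups:
        m_g = np.sqrt(len(g))
        b_g, grad_g = beta_hat[g], grad[g]
        if np.linalg.norm(b_g) > tol:              # active group: equality
            resid = grad_g + lam * m_g * b_g / np.linalg.norm(b_g)
            ok = ok and np.linalg.norm(resid) < tol
        else:                                      # zero group: inequality
            ok = ok and np.linalg.norm(grad_g) <= lam * m_g + tol
    return ok
```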

16–25 Block coordinate descent algorithm
generic description for both Lasso and Group-Lasso problems:
cycle through the coordinates $j = 1, \ldots, p, 1, 2, \ldots$ (Lasso) or the groups $j = 1, \ldots, q, 1, 2, \ldots$ (Group Lasso)
optimize the penalized log-likelihood w.r.t. $\beta_j$ (or $\beta_{G_j}$), keeping all other coefficients $\beta_k$, $k \neq j$ (or $k \notin G_j$), fixed
Lasso: the iterate passes through the states (the coordinate written without a superscript is the one currently being optimized)
$$(\beta_1, \beta_2^{(0)}, \ldots, \beta_p^{(0)}) \to (\beta_1^{(1)}, \beta_2, \ldots, \beta_p^{(0)}) \to \cdots \to (\beta_1^{(1)}, \ldots, \beta_{p-1}^{(1)}, \beta_p) \to (\beta_1, \beta_2^{(1)}, \ldots, \beta_p^{(1)}) \to \cdots$$
Group Lasso: analogously, blockwise:
$$(\beta_{G_1}, \beta_{G_2}^{(0)}, \ldots, \beta_{G_q}^{(0)}) \to (\beta_{G_1}^{(1)}, \beta_{G_2}, \ldots, \beta_{G_q}^{(0)}) \to \cdots \to (\beta_{G_1}, \beta_{G_2}^{(1)}, \ldots, \beta_{G_q}^{(1)}) \to \cdots$$

26 for the Gaussian log-likelihood (squared error loss): blockwise updates are easy, and closed-form solutions exist (use the KKT conditions)
for other loss functions (e.g. logistic loss): blockwise updates have no closed-form solution
a fast strategy: improve every coordinate/group numerically, but not until numerical convergence (using a quadratic approximation of the log-likelihood for improving/optimizing a single block), plus further tricks... (still allowing provable numerical convergence)
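
In the Gaussian case the closed form is a groupwise soft-thresholding: if each block is orthonormalized so that $n^{-1} X_{G_g}^T X_{G_g} = I$, the KKT conditions give $\hat\beta_{G_g} = (1 - \lambda m_g/(2\|Z_g\|_2))_+ Z_g$ with $Z_g = X_{G_g}^T r/n$ for the partial residual $r$. A minimal sketch under that orthonormality assumption:

```python
import numpy as np

def block_coordinate_descent(X, Y, groups, lam, n_cycles=100):
    """Block coordinate descent for
        ||Y - X beta||_2^2 / n + lam * sum_g m_g ||beta_{G_g}||_2,
    assuming each block satisfies X_g' X_g / n = I (orthonormalized)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_cycles):
        for g in groups:
            m_g = np.sqrt(len(g))
            r = Y - X @ beta + X[:, g] @ beta[g]   # partial residual
            Z_g = X[:, g].T @ r / n
            nz = np.linalg.norm(Z_g)
            if nz <= lam * m_g / 2.0:              # whole group set to zero
                beta[g] = 0.0
            else:                                  # groupwise soft-threshold
                beta[g] = (1.0 - lam * m_g / (2.0 * nz)) * Z_g
    return beta
```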

27 How fast?
logistic case: $p = 10^6$, $n = 100$, group size = 20, sparsity: 2 active groups, i.e. 40 parameters
for 10 different $\lambda$-values: CPU time using grplasso: about 3.5 minutes (dual-core processor with 2.6 GHz and 32 GB RAM)
we can easily deal today with predictors in the millions, i.e. $p \approx 10^6$


29 The sparsity-smoothness penalty (SSP)
(whose corresponding optimization becomes again a Group-Lasso problem...)
for additive modeling in high dimensions:
$$Y_i = \sum_{j=1}^{p} f_j(x_i^{(j)}) + \varepsilon_i \quad (i = 1, \ldots, n),$$
with $f_j: \mathbb{R} \to \mathbb{R}$ smooth univariate functions and $p \gg n$

30 in principle: basis expansion for every $f_j(\cdot)$ with basis functions $h_{j,1}, \ldots, h_{j,K}$, where $K = O(n)$ (or e.g. $K = O(n^{1/2})$), $j = 1, \ldots, p$
represent
$$\sum_{j=1}^{p} f_j(x^{(j)}) = \sum_{j=1}^{p} \sum_{k=1}^{K} \beta_{j,k} h_{j,k}(x^{(j)})$$
a high-dimensional parametric problem; use the Group-Lasso penalty to ensure sparsity of whole functions:
$$\lambda \sum_{j=1}^{p} \|\beta_j\|_2, \quad \beta_j := (\beta_{j,1}, \ldots, \beta_{j,K})^T$$
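
A sketch of this construction, using a simple truncated power basis as a stand-in for whatever spline basis one prefers (helper name and basis choice are illustrative, not from the slides):

```python
import numpy as np

def additive_design(X, K):
    """Expand each column of X (n x p) into K basis functions, returning the
    n x (p*K) design H = [H_1, ..., H_p] and the group indices of each f_j."""
    n, p = X.shape
    blocks, groups = [], []
    for j in range(p):
        x = X[:, j]
        knots = np.quantile(x, np.linspace(0, 1, K + 1)[1:-1])  # K-1 knots
        # degree-1 truncated power basis: x plus hinges (x - t)_+ -> K columns
        H_j = np.column_stack([x] + [np.maximum(x - t, 0.0) for t in knots])
        blocks.append(H_j)
        groups.append(list(range(j * K, (j + 1) * K)))
    return np.hstack(blocks), groups
```

Running the group Lasso on this design with the groups above then zeroes out entire functions $\hat f_j$.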

31 drawback: this does not exploit smoothness (except through the choice of $K$, which is bad if different $f_j$'s have different complexity)
when using a large number of basis functions (large $K$) to achieve a high degree of flexibility, we need additional control of smoothness

32 Sparsity-Smoothness Penalty (SSP)
$$\lambda_1 \sum_{j=1}^{p} \underbrace{\|f_j\|_n}_{\|H_j \beta_j\|_2/\sqrt{n}} + \lambda_2 \sum_{j=1}^{p} I(f_j), \quad I^2(f_j) = \int (f_j''(x))^2\, dx = \beta_j^T W_j \beta_j,$$
where $f_j = (f_j(X_1^{(j)}), \ldots, f_j(X_n^{(j)}))^T$ and $(W_j)_{k,l} = \int h_{j,k}''(x)\, h_{j,l}''(x)\, dx$
the SSP penalty does variable selection ($\hat f_j \equiv 0$ for some $j$)

33 Orthogonal basis and diagonal smoothing matrices
if $n^{-1} H_j^T H_j = I$ and $W_j = \mathrm{diag}(d_1^2, \ldots, d_K^2) := D^2$ with $d_k = k^m$ $(m > 1/2)$, then the penalty becomes
$$\lambda_1 \sum_{j=1}^{p} \|\beta_j\|_2 + \lambda_2 \sum_{j=1}^{p} \|D\beta_j\|_2$$
and
$$\hat\beta(\lambda_1, \lambda_2) = \mathrm{argmin}_\beta\, \|Y - \sum_{j=1}^{p} H_j \beta_j\|_2^2/n + \lambda_1 \sum_{j=1}^{p} \|\beta_j\|_2 + \lambda_2 \sum_{j=1}^{p} \|D\beta_j\|_2$$
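
Under these assumptions the penalty is cheap to evaluate; a small sketch with the diagonal $d_k = k^m$ (the default $m$ below is an arbitrary illustrative choice):

```python
import numpy as np

def ssp_penalty_diag(betas, lam1, lam2, m=2.0):
    """SSP with orthogonal basis and diagonal smoothing:
    lam1 * sum_j ||beta_j||_2 + lam2 * sum_j ||D beta_j||_2,
    where D = diag(1^m, 2^m, ..., K^m), m > 1/2."""
    K = len(betas[0])
    d = np.arange(1, K + 1) ** m      # diagonal of D
    return sum(lam1 * np.linalg.norm(b) + lam2 * np.linalg.norm(d * b)
               for b in betas)
```

Note this is a sum of two norms per function, so it is not yet a plain Group-Lasso penalty; hence the computational difficulty mentioned next.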

34 the difficulty is the computation, although it is still a convex optimization problem; see the corresponding section in the book

35 A modified SSP penalty
$$\lambda_1 \sum_{j=1}^{p} \sqrt{\|f_j\|_n^2 + \lambda_2 I^2(f_j)}$$
for additive modeling:
$$\hat f_1, \ldots, \hat f_p = \mathrm{argmin}_{f_1, \ldots, f_p}\, \|Y - \sum_{j=1}^{p} f_j\|_2^2/n + \lambda_1 \sum_{j=1}^{p} \sqrt{\|f_j\|_n^2 + \lambda_2 I^2(f_j)}$$
assuming the $f_j$ are twice differentiable, the solution is a natural cubic spline with knots at $X_i^{(j)}$
finite-dimensional parameterization with e.g. B-splines: $f = \sum_{j=1}^{p} f_j$, $f_j = H_j \beta_j$

36 the penalty becomes:
$$\lambda_1 \sum_{j=1}^{p} \sqrt{\|f_j\|_n^2 + \lambda_2 I^2(f_j)} = \lambda_1 \sum_{j=1}^{p} \sqrt{\beta_j^T \underbrace{B_j^T B_j}_{\Sigma_j} \beta_j + \lambda_2\, \beta_j^T \underbrace{W_j}_{\text{integr. 2nd derivatives}} \beta_j} = \lambda_1 \sum_{j=1}^{p} \sqrt{\beta_j^T \underbrace{(\Sigma_j + \lambda_2 W_j)}_{A_j = A_j(\lambda_2)} \beta_j}$$
re-parameterize: $\tilde\beta_j = \tilde\beta_j(\lambda_2) = R_j \beta_j$ with $R_j^T R_j = A_j = A_j(\lambda_2)$ (Cholesky)
the penalty becomes
$$\lambda_1 \sum_{j=1}^{p} \|\tilde\beta_j\|_2$$
with $\tilde\beta_j$ depending on $\lambda_2$, i.e., a Group-Lasso penalty
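
A sketch of this Cholesky reparameterization for one function, assuming $\Sigma_j$ and $W_j$ have already been computed from the B-spline basis:

```python
import numpy as np

def reparameterize(Sigma_j, W_j, H_j, lam2):
    """Turn sqrt(beta' (Sigma_j + lam2 * W_j) beta) into ||beta~||_2 via
    the Cholesky factorization R_j' R_j = A_j(lam2), beta~ = R_j beta."""
    A_j = Sigma_j + lam2 * W_j
    R_j = np.linalg.cholesky(A_j).T        # upper triangular, R' R = A
    R_inv = np.linalg.inv(R_j)
    H_tilde = H_j @ R_inv                  # transformed design for beta~
    # fit a plain group Lasso in beta~, then recover beta_j = R_inv @ beta~_j
    return H_tilde, R_inv
```

This makes the two-tuning-parameter problem solvable with standard Group-Lasso software, at the cost of redoing the factorization for each value of $\lambda_2$.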

37 HIF1α motif additive regression
for finding HIF1α transcription factor binding sites on DNA sequences; $n = 287$, $p = 196$
the additive model with SSP has 20% better prediction performance than the linear model with the Lasso
bootstrap stability analysis: select the variables (functions) which occur in at least 50% of all bootstrap runs
only 2 stable variables/candidate motifs remain
[Figure: partial effect plots for the two stable candidate motifs]

38 [Figure: partial effect plots for the two stable candidate motifs]
right panel: the variable corresponds to a true, known motif
the variable/motif in the left panel: good additional support for relevance (nearness to the transcriptional start site of important genes, ...)
