Sparse statistical modelling


1 Sparse statistical modelling
Tom Bartlett

2 Introduction
- A sparse statistical model is one having only a small number of nonzero parameters or weights. [1]
- The number of features or variables measured on a person or object can be very large (e.g., expression levels of genes).
- These measurements are often highly correlated, i.e., contain much redundant information.
- This scenario is particularly relevant in the age of big data.
[1] Trevor Hastie, Robert Tibshirani, and Martin Wainwright. Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press, 2015.

3 Outline
- Sparse linear models
- Sparse PCA
- Sparse SVD
- Sparse CCA
- Sparse LDA
- Sparse clustering

4 Sparse linear models
A linear model can be written as
    y_i = \alpha + \sum_{j=1}^{p} x_{ij} \beta_j + \epsilon_i,   i = 1, ..., n
        = \alpha + x_i^\top \beta + \epsilon_i.
Hence, the model can be fit by minimising the objective function
    \min_{\alpha, \beta} \left\{ \sum_{i=1}^{N} (y_i - \alpha - x_i^\top \beta)^2 \right\}.
Adding a penalisation term to the objective function makes the solution more sparse:
    \min_{\alpha, \beta} \left\{ \frac{1}{2N} \sum_{i=1}^{N} (y_i - \alpha - x_i^\top \beta)^2 + \lambda \|\beta\|_q^q \right\},  where q = 1 or 2.

5 Sparse linear models
The penalty term \lambda \|\beta\|_q^q means that only the bare minimum of the information available in the p predictor variables x_{ij}, j = 1, ..., p, is used:
    \min_{\alpha, \beta} \left\{ \frac{1}{2N} \sum_{i=1}^{N} (y_i - \alpha - x_i^\top \beta)^2 + \lambda \|\beta\|_q^q \right\}.
- q is typically chosen as q = 1 or q = 2, because these give convex optimisation problems and hence are computationally much nicer!
- q = 1 is called the lasso; it tends to set as many elements of \beta as possible to zero.
- q = 2 is called ridge regression; it tends to minimise the size of all the elements of \beta.
- Penalisation is equally applicable to other types of linear model: logistic regression, generalised linear models, etc.
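As an illustrative aside (not part of the original slides), both penalties can be fitted in R with the glmnet package, where alpha = 1 corresponds to the lasso and alpha = 0 to ridge; the data frame crime_df and its column names below are hypothetical placeholders standing in for the crime data on the next slide.

```r
## Minimal sketch of lasso vs ridge with glmnet (assumes a data frame
## 'crime_df' with columns crime, funding, hs, not_hs, college, college4;
## these names are placeholders, not the slides' actual data set).
library(glmnet)

x <- as.matrix(crime_df[, c("funding", "hs", "not_hs", "college", "college4")])
y <- crime_df$crime

fit_lasso <- glmnet(x, y, alpha = 1)  # q = 1: lasso penalty
fit_ridge <- glmnet(x, y, alpha = 0)  # q = 2: ridge penalty

par(mfrow = c(1, 2))
plot(fit_lasso, xvar = "norm", label = TRUE)  # coefficient paths vs L1 norm
plot(fit_ridge, xvar = "norm", label = TRUE)
```

Plotting the two coefficient paths side by side reproduces the qualitative behaviour described above: the lasso paths hit exactly zero one by one, while the ridge paths shrink smoothly without ever vanishing.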

6 Sparse linear models - simple example
[Figure: lasso and ridge coefficient paths for the five predictors, plotted against \|\hat\beta\|_1 / \|\beta\|_1 and \|\hat\beta\|_2 / \|\beta\|_2 respectively.]
Crime rate modelled according to 5 predictors: annual police funding in dollars per resident (funding), percent of people 25 years and older with four years of high school (hs), percent of 16- to 19-year-olds not in high school and not high school graduates (not-hs), percent of 18- to 24-year-olds in college (college), and percent of people 25 years and older with at least four years of college (college4).

7 Sparse linear models - genomics example
- Gene expression data for p genes, for n_c = 530 cancer samples + n_h = 61 healthy-tissue samples.
- Fit a logistic (i.e., 2-class, cancer/healthy) lasso model using the R package glmnet, selecting \lambda by cross-validation.
- Out of all the possible genes for prediction, the lasso chooses just these 25 (shown on the slide with their fitted model coefficients): ADAMTS, HPD, NUP, ADH, HS3ST, PAFAH1B, CA, IGSF, TACC, CCDC, LRRTM, TESC, CDH, LRRC3B, TRPM, CES, MEG, TSLP, COL10A, MMP, WDR51A, DPP, NUAK, WISP, HHATL.
- Caveat: these are not necessarily the only predictive genes. If we removed these genes from the data-set and fitted the model again, the lasso would choose an entirely new set of genes which might be almost as good at predicting!
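A hedged sketch of this kind of analysis with glmnet's cross-validation; the objects expr (an n x p expression matrix) and status (a 0/1 healthy/cancer label) are hypothetical placeholders, not the slide's actual data.

```r
## Sketch of a cross-validated logistic lasso with glmnet
## (expr: n x p expression matrix, status: 0 = healthy, 1 = cancer;
##  both are hypothetical placeholders for the slide's data).
library(glmnet)

cvfit <- cv.glmnet(expr, status, family = "binomial", alpha = 1)

## Coefficients at the cross-validated lambda; most are exactly zero
beta_hat <- coef(cvfit, s = "lambda.min")
selected <- beta_hat[beta_hat[, 1] != 0, , drop = FALSE]
selected  # the handful of genes (plus intercept) retained by the lasso
```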

8 Sparse PCA
Ordinary PCA finds v by carrying out the optimisation:
    \max_{\|v\|_2 = 1} \left\{ \frac{v^\top X^\top X v}{n} \right\},
with X \in R^{n \times p} (i.e., n samples and p variables).
With p >> n, the eigenvectors of the sample covariance matrix X^\top X / n are not necessarily close to those of the population covariance matrix [2]. Hence ordinary PCA can fail in this context.
This motivates sparse PCA, in which many entries of v are encouraged to be zero, by finding v via the optimisation:
    \max_{\|v\|_2 = 1} \left\{ v^\top X^\top X v \right\},  subject to \|v\|_1 \le t.
In effect this discards some variables, such that p is closer to n.
[2] Iain M. Johnstone. On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics (2001).

9 Sparse SVD
The SVD of a matrix X \in R^{n \times p}, with n > p, can be expressed as
    X = U D V^\top,
where U \in R^{n \times p} and V \in R^{p \times p} have orthonormal columns and D \in R^{p \times p} is diagonal.
The SVD can hence be found by carrying out the optimisation:
    \min_{U \in R^{n \times p}, V \in R^{p \times p}, D \in R^{p \times p}} \|X - U D V^\top\|^2.
Hence, a sparse SVD with rank r can be obtained by carrying out the optimisation:
    \min_{U \in R^{n \times r}, V \in R^{p \times r}, D \in R^{r \times r}} \left\{ \|X - U D V^\top\|^2 + \lambda_1 \|U\|_1 + \lambda_2 \|V\|_1 \right\}.
This allows SVD to be applied to the p > n scenario.

10 Sparse PCA and SVD - an algorithm
SVD is a generalisation of PCA. Hence, algorithms to solve the SVD problem can be applied to the PCA problem.
The sparse PCA problem can thus be re-formulated as:
    \max_{\|u\|_2 = \|v\|_2 = 1} \left\{ u^\top X v \right\},  subject to \|v\|_1 \le t,
which is biconvex in u and v and can be solved by alternating between the updates:
    u \leftarrow \frac{Xv}{\|Xv\|_2},   v \leftarrow \frac{S_\lambda(X^\top u)}{\|S_\lambda(X^\top u)\|_2},   (1)
where S_\lambda is the soft-thresholding operator S_\lambda(x) = \mathrm{sign}(x)(|x| - \lambda)_+.
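The updates in (1) are straightforward to implement directly. The following base-R sketch computes a single sparse factor with the soft-threshold level lambda fixed in advance; packaged implementations such as PMA instead search for the threshold that meets a given L1 bound.

```r
## Rank-1 sparse PCA/SVD by the alternating updates in (1).
## X: n x p data matrix (columns centred); lambda: soft-threshold level.
## Minimal sketch only; packaged implementations choose lambda so that an
## explicit L1 constraint is satisfied, rather than fixing it directly.
soft <- function(x, lambda) sign(x) * pmax(abs(x) - lambda, 0)

sparse_rank1 <- function(X, lambda, n_iter = 100, tol = 1e-6) {
  v <- svd(X, nu = 1, nv = 1)$v[, 1]       # warm start from the ordinary SVD
  for (it in seq_len(n_iter)) {
    u <- drop(X %*% v)
    u <- u / sqrt(sum(u^2))                # u <- Xv / ||Xv||_2
    v_new <- soft(drop(crossprod(X, u)), lambda)
    if (all(v_new == 0)) stop("lambda too large: v shrunk entirely to zero")
    v_new <- v_new / sqrt(sum(v_new^2))    # v <- S_lambda(X'u) / ||S_lambda(X'u)||_2
    if (sum((v_new - v)^2) < tol) { v <- v_new; break }
    v <- v_new
  }
  d <- drop(t(u) %*% X %*% v)              # singular value for this factor
  list(u = u, v = v, d = d)
}
```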

11 Sparse PCA - simulation study
- Define \Sigma as a p x p block-diagonal matrix, with p = 200 and 10 blocks of 1s along the diagonal. Hence, we would expect there to be 10 independent components of variation in the corresponding distribution.
- Generate n samples x ~ Normal(0, \Sigma).
- Estimate \hat\Sigma = (x - \bar x)(x - \bar x)^\top / n.
- Correlate the eigenvectors of \hat\Sigma with the eigenvectors of \Sigma.
- Repeat 100 times for each different value of n.
[Figure: eigenvector correlation for the top 10 PCs, plotted against n/p.]
The plot shows the means of these correlations over the 100 repetitions for different values of n.
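A base-R sketch of this simulation. The 20 x 20 block size is an assumption (10 equal blocks tiling a 200 x 200 matrix), and the eigenvector-correlation summary used here is one reasonable choice rather than necessarily the exact one behind the slide's figure.

```r
## Simulation sketch: agreement between sample and population eigenvectors.
## Assumes 10 diagonal blocks of 1s, each 20 x 20 (block size is an assumption).
library(MASS)
set.seed(1)
p <- 200; n_blocks <- 10; block_size <- p / n_blocks
Sigma <- matrix(0, p, p)
for (b in seq_len(n_blocks)) {
  idx <- ((b - 1) * block_size + 1):(b * block_size)
  Sigma[idx, idx] <- 1                                 # blocks of 1s on the diagonal
}
eig_true <- eigen(Sigma, symmetric = TRUE)$vectors[, 1:n_blocks]

eigvec_cor <- function(n) {
  x <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)        # x ~ Normal(0, Sigma)
  Sigma_hat <- crossprod(scale(x, scale = FALSE)) / n   # (x - xbar)'(x - xbar)/n
  eig_hat <- eigen(Sigma_hat, symmetric = TRUE)$vectors[, 1:n_blocks]
  mean(apply(abs(cor(eig_hat, eig_true)), 2, max))      # top-10 eigenvector agreement
}

## 20 repetitions per n here for speed (the slide uses 100)
sapply(c(50, 100, 200, 400), function(n) mean(replicate(20, eigvec_cor(n))))
```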

12 Sparse PCA - simulation study
An implementation of sparse PCA is available in the R package PMA as the function spca. It proceeds similarly to the algorithm described earlier, which is presented in more detail by Witten, Tibshirani and Hastie [3].
I applied this function to the same simulation as described on the previous slide. The scale of the penalisation is in terms of \|u\|_1, with \|u\|_1 = \sqrt{p} giving the minimum and \|u\|_1 = 1 the maximum permissible penalisation.
[Figure: eigenvector correlation for the top 10 PCs, plotted against n/p.]
The plot shows the result with \|u\|_1 = \sqrt{p}.
[3] Daniela M. Witten, Robert Tibshirani, and Trevor Hastie. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics (2009), kxp008.
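For reference, a hedged sketch of calling the PMA package directly: in the PMA versions I am familiar with the sparse-PCA routine is exported as SPC (taking an L1 bound sumabsv between 1 and sqrt(p)), so the call below is an assumption about the API rather than a transcript of the analysis on the slide.

```r
## Hedged sketch of sparse PCA via the PMA package; the function name SPC
## and its arguments reflect my understanding of current PMA versions and
## may differ from the version used for the slides. X is an n x p matrix.
library(PMA)

p <- ncol(X)
fit <- SPC(X,
           sumabsv = sqrt(p) / 2,   # L1 bound on the loadings, in (1, sqrt(p)]
           K = 10)                  # number of sparse components

str(fit$v)           # p x 10 matrix of sparse loadings (many entries exactly zero)
colSums(fit$v != 0)  # how many variables each component actually uses
```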

13 Sparse PCA - simulation study
[Figure: eigenvector correlation for the top 10 PCs, plotted against n/p, for two penalisation levels.]
One plot shows the result with \|u\|_1 = \sqrt{p}/2; the other shows the result with \|u\|_1 = \sqrt{p}/3.

14 Sparse PCA - real data example
- I carried out PCA on expression levels of genes in individual cells from developing brains.
- There are many different cell types in the data - some mature, some immature, and some in between.
- Different cell types are characterised by different gene expression profiles.
- We would therefore expect to be able to visualise some separation of the cell types by dimensionality reduction to three dimensions.
The plot shows the cells plotted in terms of the top three (standard) PCA components.

15 Sparse PCA - real data example
One plot shows the cells in terms of the top three sparse PCA components with \|u\|_1 = 0.1\sqrt{p} (i.e., a high level of regularisation); the other shows the cells in terms of the top three sparse PCA components with \|u\|_1 = 0.8\sqrt{p} (i.e., a low level of regularisation).

16 Sparse CCA
In CCA, the aim is to find coefficient vectors u \in R^p and v \in R^q which project the data matrices X \in R^{n \times p} and Y \in R^{n \times q} so as to maximise the correlation between these projections.
Whereas PCA aims to find the direction of maximum variance in a single data matrix, CCA aims to find the directions in the two data matrices in which the variances best explain each other.
The CCA problem can be solved by carrying out the optimisation:
    \max_{u \in R^p, v \in R^q} \mathrm{Cor}(Xu, Yv).
This problem is not well posed for n < max(p, q), in which case u and v can be found which trivially give Cor(Xu, Yv) = 1.
Sparse CCA solves this problem by carrying out the optimisation:
    \max_{u \in R^p, v \in R^q} \mathrm{Cor}(Xu, Yv),  subject to \|u\|_1 < t_1 and \|v\|_1 < t_2.
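A hedged sketch of sparse CCA via the PMA package; the function and argument names reflect my understanding of PMA's CCA interface and should be checked against ?CCA, and X and Y stand for the two data matrices described on the next slide.

```r
## Hedged sketch of sparse CCA with the PMA package (argument names as I
## recall them; check ?CCA in your PMA version). X holds expression of the
## non-cell-cycle genes and Y the cell-cycle genes, as on the next slide.
library(PMA)

fit <- CCA(X, Y,
           typex = "standard", typez = "standard",
           penaltyx = 0.1, penaltyz = 0.3,  # L1 penalties in (0, 1]; smaller = sparser
           K = 1)                           # first canonical pair only

which(fit$u != 0)  # the few non-cell-cycle genes entering the projection Xu
which(fit$v != 0)  # the cell-cycle genes entering the projection Yv
```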

17 Sparse CCA - real data example
- The cell cycle is a biological process involved in the replication of cells.
- The cell cycle can be thought of as a latent process which is not directly observable in genomics data.
- It is driven by a small set of genes (particularly cyclins and cyclin-dependent kinases) from which it may be inferred.
- It has an effect on the expression of very many genes: hence it can also tend to act as a confounding factor when modelling many other biological processes.
- I used CCA here as an exploratory tool, with Y the data for the cell-cycle genes, and X the data for all the other genes.

18 Sparse LDA
LDA assigns item i to a group G based on a corresponding data vector x_i, according to the posterior probability:
    P(G = k | x_i) = \frac{\pi_k f_k(x_i)}{\sum_{l=1}^{K} \pi_l f_l(x_i)},  with
    f_k(x_i) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x_i - \mu_k)^\top \Sigma^{-1} (x_i - \mu_k) \right\},
with prior \pi_k and mean \mu_k for group k, and covariance \Sigma.
This assignment takes place by constructing decision boundaries between classes k and l:
    \log \frac{P(G = k | x_i)}{P(G = l | x_i)} = \log \frac{\pi_k}{\pi_l} + x_i^\top \Sigma^{-1} (\mu_k - \mu_l) - \frac{1}{2} (\mu_k + \mu_l)^\top \Sigma^{-1} (\mu_k - \mu_l).
Because this boundary is linear in x_i, we get the name LDA.

19 Sparse LDA
The decision boundary
    \log \frac{P(G = k | x_i)}{P(G = l | x_i)} = \log \frac{\pi_k}{\pi_l} + x_i^\top \Sigma^{-1} (\mu_k - \mu_l) - \frac{1}{2} (\mu_k + \mu_l)^\top \Sigma^{-1} (\mu_k - \mu_l)
then naturally leads to the decision rule:
    G(x_i) = \mathrm{argmax}_k \left\{ \log \pi_k + x_i^\top \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^\top \Sigma^{-1} \mu_k \right\}.
By assuming \Sigma is diagonal, i.e., there is no covariance between the p dimensions, this decision rule can be reduced to the nearest centroids classifier:
    G(x_i) = \mathrm{argmin}_k \left\{ \sum_{j=1}^{p} \frac{(x_{ij} - \mu_{jk})^2}{\sigma_j^2} - \log \pi_k \right\}.
Typically, \Sigma (or the \sigma_j) are estimated from the data as \hat\Sigma (or \hat\sigma_j), and the \mu_k are estimated as \hat\mu_k, whilst training the classifier.
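A minimal base-R sketch of the diagonal-covariance nearest centroids rule above, with pooled within-class variances; x_train, y_train and x_new are hypothetical inputs.

```r
## Nearest centroids (diagonal LDA) sketch in base R.
## x_train: n x p matrix, y_train: vector of class labels, x_new: m x p matrix.
nearest_centroids <- function(x_train, y_train, x_new) {
  y_train <- factor(y_train)
  classes <- levels(y_train)
  mu <- sapply(classes, function(k) colMeans(x_train[y_train == k, , drop = FALSE]))
  resid <- x_train - t(mu)[as.integer(y_train), , drop = FALSE]
  sigma2 <- colSums(resid^2) / (nrow(x_train) - length(classes))  # pooled, diagonal Sigma
  prior <- table(y_train) / length(y_train)
  scores <- sapply(classes, function(k) {
    # sum_j (x_ij - mu_jk)^2 / sigma_j^2  -  log(pi_k), for each new sample
    rowSums(sweep(sweep(x_new, 2, mu[, k]), 2, sqrt(sigma2), "/")^2) - log(prior[[k]])
  })
  classes[apply(scores, 1, which.min)]
}
```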

20 Sparse LDA
The nearest centroids classifier
    \hat G(x_i) = \mathrm{argmin}_k \left\{ \sum_{j=1}^{p} \frac{(x_{ij} - \hat\mu_{jk})^2}{\hat\sigma_j^2} - \log \hat\pi_k \right\}
will typically use all p variables. This is often unnecessary and can lead to overfitting in high-dimensional contexts. The nearest shrunken centroids classifier deals with this issue.
Define \mu_k = \bar x + \alpha_k, where \bar x is the data mean across all classes, and \alpha_k is the class-specific deviation of the mean from \bar x. Then, the nearest shrunken centroids classifier proceeds with the optimisation:
    \min_{\alpha_k \in R^p, k \in \{1,...,K\}} \left\{ \frac{1}{2n} \sum_{k=1}^{K} \sum_{i \in C_k} \sum_{j=1}^{p} \frac{(x_{ij} - \bar x_j - \alpha_{jk})^2}{\hat\sigma_j^2} + \lambda \sum_{k=1}^{K} \sum_{j=1}^{p} \sqrt{\frac{n_k}{\hat\sigma_j^2}} \, |\alpha_{jk}| \right\},
where C_k and n_k are the set and number of samples in group k.

21 Sparse LDA
Hence, the \hat\alpha_k estimated from the optimisation
    \min_{\alpha_k \in R^p, k \in \{1,...,K\}} \left\{ \frac{1}{2n} \sum_{k=1}^{K} \sum_{i \in C_k} \sum_{j=1}^{p} \frac{(x_{ij} - \bar x_j - \alpha_{jk})^2}{\hat\sigma_j^2} + \lambda \sum_{k=1}^{K} \sum_{j=1}^{p} \sqrt{\frac{n_k}{\hat\sigma_j^2}} \, |\alpha_{jk}| \right\}
can be used to estimate the shrunken centroids \hat\mu_k = \bar x + \hat\alpha_k, thus training the classifier:
    \hat G(x_i) = \mathrm{argmin}_k \left\{ \sum_{j=1}^{p} \frac{(x_{ij} - \hat\mu_{jk})^2}{\hat\sigma_j^2} - \log \hat\pi_k \right\}.
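A base-R sketch of the shrinkage idea: the class-specific deviations alpha_k are soft-thresholded towards zero, so variables whose deviations vanish in every class drop out of the classifier. The threshold scaling below follows the spirit of the criterion above; the pamr package's exact standardisation and offsets differ in detail.

```r
## Nearest shrunken centroids sketch (base R): soft-threshold the per-class
## deviations of the centroids from the overall mean. Scaling follows the
## penalised criterion above in spirit; pamr's implementation differs in detail.
soft <- function(x, lambda) sign(x) * pmax(abs(x) - lambda, 0)

shrunken_centroids <- function(x_train, y_train, lambda) {
  y_train <- factor(y_train)
  classes <- levels(y_train)
  xbar <- colMeans(x_train)
  mu <- sapply(classes, function(k) colMeans(x_train[y_train == k, , drop = FALSE]))
  resid <- x_train - t(mu)[as.integer(y_train), , drop = FALSE]
  sigma <- sqrt(colSums(resid^2) / (nrow(x_train) - length(classes)))
  n_k <- table(y_train)
  alpha <- sweep(mu, 1, xbar)                 # alpha_jk = mu_jk - xbar_j
  alpha_shrunk <- sapply(seq_along(classes), function(k)
    sigma * soft(alpha[, k] / sigma, lambda / sqrt(n_k[[k]])))  # standardise, threshold
  mu_shrunk <- xbar + alpha_shrunk            # shrunken centroids mu_k = xbar + alpha_k
  list(centroids = mu_shrunk, sigma = sigma,
       active = which(rowSums(alpha_shrunk != 0) > 0))  # variables still in use
}
```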

22 Sparse LDA - real data example
- I applied nearest (shrunken) centroids to expression data for 14349 genes, for 347 cells of different types: leukocytes (54); lymphoblastic cells (88); fetal brain cells (16wk, 26; 21wk, 24); fibroblasts (37); ductal carcinoma (22); keratinocytes (40); B lymphoblasts (17); iPS cells (24); neural progenitors (15).
- Used the R packages MASS and pamr [4].
- Carried out 100 repetitions of 3-fold CV.
- Plots show normalised mutual information (NMI), adjusted Rand index (ARI) and prediction accuracy against the sparsity threshold, as quantiles (over 300 predictions) for sparse LDA and for regular LDA.
[4] Robert Tibshirani et al. Class prediction by nearest shrunken centroids, with applications to DNA microarrays. Statistical Science (2003).
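A hedged sketch of the same kind of analysis with pamr, which expects genes in rows and samples in columns; expr and cell_type are placeholders, and the argument conventions should be checked against the pamr documentation.

```r
## Hedged sketch of nearest shrunken centroids with the pamr package
## (argument conventions as I recall them; pamr expects genes in rows).
## expr is a p x n expression matrix and cell_type a vector of labels;
## both are placeholders for the slide's data set.
library(pamr)

dat <- list(x = expr, y = cell_type)
fit <- pamr.train(dat)
cvfit <- pamr.cv(fit, dat)            # cross-validated error across thresholds
pamr.plotcv(cvfit)

thr <- 2                              # sparsity threshold, chosen from the CV plot
pred <- pamr.predict(fit, expr, threshold = thr)
table(pred, cell_type)
```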

23 Sparse clustering
Many clustering methods, such as hierarchical clustering, are based on a dissimilarity measure
    D_{i,i'} = \sum_{j=1}^{p} d_{i,i',j}
between samples i and i'. One popular choice of dissimilarity measure is the Euclidean distance.
In high dimensions, it is often unnecessary to use information from all of the p dimensions. A weighted dissimilarity measure
    \tilde D_{i,i'} = \sum_{j=1}^{p} w_j d_{i,i',j}
can be a useful approach to this problem. This can be obtained by the sparse matrix decomposition:
    \max_{u \in R^{n^2}, w \in R^p} u^\top \Delta w,  subject to \|u\|_2 \le 1, \|w\|_2 \le 1, \|w\|_1 \le t, and w_j \ge 0, j \in \{1,...,p\},
where w is the vector of the weights w_j, j \in \{1,...,p\}, and \Delta \in R^{n^2 \times p} holds the dissimilarity components, arranged such that each row of \Delta corresponds to the d_{i,i',j}, j \in \{1,...,p\}, for a pair of samples i, i'.
This weighted dissimilarity measure can then be used for sparse clustering, such as sparse hierarchical clustering.

24 Sparse clustering
Some clustering methods, such as K-means, need a slightly modified approach. K-means seeks to minimise the within-cluster sum of squares
    \sum_{k=1}^{K} \sum_{i \in C_k} \|x_i - \bar x_k\|_2^2 = \sum_{k=1}^{K} \frac{1}{2 n_k} \sum_{i,i' \in C_k} \|x_i - x_{i'}\|_2^2,
where C_k is the set of samples in cluster k and \bar x_k is the corresponding centroid.
Hence, a weighted K-means could proceed according to the optimisation:
    \min_{w \in R^p} \sum_{j=1}^{p} w_j \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i,i' \in C_k} d_{i,i',j},
where d_{i,i',j} = (x_{ij} - x_{i'j})^2, and n_k is the number of samples in cluster k.

25 Sparse clustering
However, for the optimisation
    \min_{w \in R^p} \sum_{j=1}^{p} w_j \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i,i' \in C_k} d_{i,i',j},
it is not possible to choose a set of constraints which guarantees a non-pathological solution as well as convexity. Instead, the between-cluster sum of squares can be maximised:
    \max_{w \in R^p} \sum_{j=1}^{p} w_j \left( \frac{1}{n} \sum_{i=1}^{n} \sum_{i'=1}^{n} d_{i,i',j} - \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i,i' \in C_k} d_{i,i',j} \right),
subject to \|w\|_2 \le 1, \|w\|_1 \le t, and w_j \ge 0, j \in \{1,...,p\}.

26 Sparse clustering - real data examples
- Applied (sparse) hierarchical clustering to the same benchmark expression data-set (14349 genes, for 347 cells of different types).
- Used the R package sparcl [5] for the sparse clustering.
- Plots show normalised mutual information (NMI) and adjusted Rand index (ARI) against the L1 bound, comparing sparse hierarchical clustering with standard hierarchical clustering.
[5] Daniela M. Witten and Robert Tibshirani. A framework for feature selection in clustering. Journal of the American Statistical Association (2012).
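A hedged sketch of sparse hierarchical clustering with sparcl; the function and argument names (HierarchicalSparseCluster, wbound) reflect my understanding of the package and should be checked against its documentation, and expr_t (cells in rows, genes in columns) is a placeholder.

```r
## Hedged sketch of sparse hierarchical clustering with the sparcl package
## (function/argument names as I recall them; see ?HierarchicalSparseCluster).
## expr_t is an n x p matrix (cells in rows, genes in columns), a placeholder.
library(sparcl)

perm <- HierarchicalSparseCluster.permute(expr_t, wbounds = c(2, 5, 10, 20))
fit <- HierarchicalSparseCluster(x = expr_t,
                                 wbound = perm$bestw,   # L1 bound on feature weights
                                 method = "complete")

sum(fit$ws != 0)                   # number of genes given non-zero weight
clusters <- cutree(fit$hc, k = 9)  # cut the sparse dendrogram into 9 groups
```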

27 Sparse clustering - real data examples
- Applied (sparse) k-means to the same benchmark expression data-set (14349 genes, for 347 cells of different types).
- Used the R package sparcl for the sparse clustering.
- Plots show normalised mutual information (NMI) and adjusted Rand index (ARI) against the L1 bound, comparing sparse k-means with standard k-means.
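Similarly, a hedged sketch of sparse k-means with sparcl (KMeansSparseCluster and its permutation-based tuning), again with expr_t as a placeholder and with the argument names to be checked against the package documentation.

```r
## Hedged sketch of sparse k-means with sparcl (argument names as I recall
## them; see ?KMeansSparseCluster). expr_t is an n x p matrix placeholder.
library(sparcl)

perm <- KMeansSparseCluster.permute(expr_t, K = 9, wbounds = seq(2, 20, by = 2))
fit <- KMeansSparseCluster(expr_t, K = 9, wbounds = perm$bestw)

fit[[1]]$Cs            # cluster assignments at the chosen L1 bound
sum(fit[[1]]$ws != 0)  # genes with non-zero weight
```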

28 Sparse clustering - real data examples
- Spectral clustering essentially uses k-means clustering (or similar) in a dimensionally-reduced (e.g., PCA) space.
- Applied standard k-means in sparse-PCA space to the same benchmark expression data-set (14349 genes, for 347 cells of different types).
- This offers computational advantages, running in 9 seconds on a 2.8GHz Macbook, compared with 19 seconds for standard k-means, and 35 seconds for sparse k-means.
- Plots show NMI and ARI against the L1 bound / sqrt(n), comparing sparse-spectral k-means with standard k-means.
