Sparse survival regression


1 Sparse survival regression. Anders Gorst-Rasmussen, Department of Mathematics, Aalborg University. November.

2 Outline. Penalized survival regression: the semiparametric additive risk model; theoretical results; software. Ultra-high dimension: independence screening.

3 Recap: The semiparametric additive risk (SAR) model. Assume the hazard function given covariates is $\lambda(t\mid Z_i) = \lambda_0(t) + Z_i^\top\beta_0$, for some unspecified baseline $\lambda_0$ and covariates $Z_i \in \mathbb{R}^p$. Lin & Ying (1994): estimate $\beta_0$ as the solution to $S_n\beta = s_n$, where
$$S_n = n^{-1}\sum_{i=1}^n \int_0^\tau Y_i(t)\{Z_i - \bar Z(t)\}^{\otimes 2}\,dt, \qquad s_n = n^{-1}\sum_{i=1}^n \int_0^\tau \{Z_i - \bar Z(t)\}\,dN_i(t);$$
with $\bar Z(t)$ the at-risk average of the $Z_i$'s and $Y_i$ the at-risk indicator. Asymptotics are easy using the signal + error decomposition
$$s_n = S_n\beta_0 + n^{-1}\sum_{i=1}^n \int_0^\tau \{Z_i - \bar Z(t)\}\,dM_i(t), \qquad M_i \text{ a martingale}.$$
Notice the similarity with the linear model, $X^\top y = X^\top X\beta + X^\top\varepsilon$.

4 What do we do if p > n? The course of action depends on the purpose: 1. Model selection? 2. Low prediction error? A useful guideline is how practitioners prefer to think: few features convey all of the effect, and it is nice if statistical models can reflect this. Classical sparsity modeling: multiple testing; all-subsets/forward stepwise regression; ridge regression and truncation; etc. These suffer from computational/practical issues and weak theoretical justification.

5 Lasso for the SAR model. Least Absolute Shrinkage and Selection Operator. SAR estimation is equivalent to minimization of $L(\beta) = \beta^\top S_n\beta - 2\beta^\top s_n$. Lasso-penalized version:
$$L_1(\beta) = L(\beta) + \lambda\sum_{i=1}^p |\beta_i|,$$
with $\lambda \ge 0$ a regularization parameter (compare with ridge regression, where the penalty is the squared $L_2$-norm of $\beta$). Convex problem (so computationally feasible). Simple form (so theoretically tractable). Key property: the lasso does variable selection.

6 Why lasso does variable selection. [Figure: two panels in the $(\beta_1,\beta_2)$-plane showing constraint regions of size $t$.] Equivalent formulation: minimize $L(\beta)$ subject to $\sum_{i=1}^p |\beta_i| \le t$. Continuous subset selection. No hypothesis testing, only optimization.

7 Lasso is (typically) a shrinkage estimator. Suppose $Z_i = (Z_{i1},\dots,Z_{ip})$ with $Z_{i1},\dots,Z_{ip}$ independent. Lasso solutions with diagonal $S_n$:
$$\hat\beta_j = \mathcal{S}\big(\hat\beta_j^{LS},\ \lambda/S_{n,jj}\big),$$
with $\hat\beta_j^{LS} = S_{n,jj}^{-1} s_{n,j}$ the least-squares estimate and $\mathcal{S}(x,y) = \mathrm{sign}(x)(|x| - y)_+$ the soft-thresholding operator. So
$$S_{n,jj}|\hat\beta_j^{LS}| \le \lambda \;\Rightarrow\; \hat\beta_j = 0; \qquad S_{n,jj}|\hat\beta_j^{LS}| > \lambda \;\Rightarrow\; 0 < |\hat\beta_j| < |\hat\beta_j^{LS}| \text{ (with correct sign)}; \qquad \lambda = 0 \;\Rightarrow\; \hat\beta_j = \hat\beta_j^{LS}.$$
This is also how we like to think of the lasso in the non-orthogonal case.
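
To make the thresholding concrete, here is a minimal R sketch of the soft-thresholding operator applied coordinatewise for a diagonal $S_n$, following the convention on the slide; the inputs below are simulated toy values, not data from the talk.

soft <- function(x, y) sign(x) * pmax(abs(x) - y, 0)   # soft-thresholding operator
set.seed(1)
Snd <- runif(5, 0.5, 2)                     # diagonal entries of S_n (orthogonal case)
sn  <- rnorm(5)                             # corresponding entries of s_n
beta.ls <- sn / Snd                         # unpenalized (least-squares type) estimates
lambda <- 0.5
beta.lasso <- soft(beta.ls, lambda / Snd)   # coordinatewise lasso solutions
cbind(beta.ls, beta.lasso)                  # shrinkage towards zero, some exact zeros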

8 Example: lasso regularization path. [Figure: standardized coefficient profiles plotted against $|\beta|/\max|\beta|$.]

9 From regularization paths to models. Selecting a model requires a good $\lambda$; how to choose? K-fold cross-validation using the least-squares loss: i.e. with $L(\beta) = \beta^\top S_n\beta - 2\beta^\top s_n$, choose $\lambda$ to minimize
$$CV(\lambda) = K^{-1}\sum_{i=1}^K L^{(i)}\big(\hat\beta^{(-i)}(\lambda)\big),$$
with $L^{(i)}$ calculated from the $i$th fold and $\hat\beta^{(-i)}(\lambda)$ from the remaining folds. For lasso, 5-fold CV is often considered sufficient. Note that CV is a stochastic procedure (stability may be poor for small $n$) and focuses on prediction optimality, so it generally overfits. Alternatives: generalized cross-validation; AIC, BIC, etc.; bootstrapping/subsampling.

10 Computation. Good: choose some $\lambda$'s; fit using e.g. quadprog in R. Better: path-following algorithm (stagewise/conjugate gradient). Start with $\beta_1 = \cdots = \beta_p = 0$; set $r = s_n$.
1. Find $j$ such that $|r_j|$ is maximal.
2. Set $\beta_j \leftarrow \beta_j + \varepsilon_j\,\mathrm{sign}(r_j)$.
3. Set $r \leftarrow s_n - S_n\beta$.
Least Angle Regression (LARS), Efron et al. (2004): compute $\varepsilon_j$ so that we get a new $j$ in step 1 every time. Yields the complete regularization path in min{n, p} steps (piecewise linear!). Implemented in the timereg function surv.lars. Expensive: we need both $s_n$ and $S_n$. Best: cyclic coordinate descent (we need only some of $S_n$).
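
As a rough illustration of the stagewise idea, here is a self-contained R sketch assuming $S_n$ and $s_n$ are already available (they are simulated below from a linear-model analogue); the step size, number of steps and data are arbitrary illustrative choices, not the LARS algorithm itself.

stagewise <- function(Sn, sn, eps = 0.01, nsteps = 500) {
  p <- length(sn)
  beta <- numeric(p)
  path <- matrix(0, nsteps, p)
  r <- sn                                   # residual correlation
  for (step in 1:nsteps) {
    j <- which.max(abs(r))                  # coordinate with largest residual correlation
    beta[j] <- beta[j] + eps * sign(r[j])   # small step in that coordinate
    r <- sn - Sn %*% beta                   # update residual correlation
    path[step, ] <- beta
  }
  path
}
set.seed(1)
Z <- matrix(rnorm(50 * 5), 50, 5)
Sn <- crossprod(Z) / 50
sn <- drop(crossprod(Z, Z %*% c(2, -1, 0, 0, 0) + rnorm(50)) / 50)
matplot(stagewise(Sn, sn), type = "l")      # rough regularization path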

11 The experimental package ahaz for R. Cyclic coordinate descent: Friedman et al. (2007); the fairly recent R package glmnet. Minimize a convex $L\colon \mathbb{R}^p\to\mathbb{R}$ by initializing $\theta_1 = \cdots = \theta_p = 0$, say, and sequentially iterating coordinatewise updates until convergence:
$$\theta_i \leftarrow \arg\min_{\theta_i} L(\tilde\theta_1,\dots,\tilde\theta_{i-1},\theta_i,\tilde\theta_{i+1},\dots,\tilde\theta_p).$$
Many iterations are needed, but they are cheap. The SAR model (ongoing work): coordinatewise updates of the form
$$\beta_j \leftarrow \mathcal{S}\Big(s_j - \sum_{i\in A}\beta_i S_{ij},\ \lambda\Big)S_{jj}^{-1}, \qquad A := \{k : \beta_k \ne 0\},$$
with $\mathcal{S}(x,y) = \mathrm{sign}(x)(|x|-y)_+$. We need only the rows of $S_n$ for the active variables. Very stable (an issue for nonlinear models and $p \gg n$).
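
A minimal sketch of a cyclic coordinate descent for the penalized quadratic SAR loss, following the update form on the slide (here with the usual exclusion of the coordinate being updated from the sum); this is an illustration with toy inputs, not the ahaz implementation.

soft <- function(x, y) sign(x) * pmax(abs(x) - y, 0)
ccd <- function(Sn, sn, lambda, tol = 1e-8, maxit = 1000) {
  p <- length(sn)
  beta <- numeric(p)
  for (it in 1:maxit) {
    beta.old <- beta
    for (j in 1:p) {
      rj <- sn[j] - sum(Sn[j, -j] * beta[-j])   # remove contribution of the other coordinates
      beta[j] <- soft(rj, lambda) / Sn[j, j]    # coordinatewise soft-thresholded update
    }
    if (max(abs(beta - beta.old)) < tol) break  # stop when updates stabilize
  }
  beta
}
# Toy usage with simulated S_n and s_n
set.seed(1)
Z <- matrix(rnorm(50 * 5), 50, 5)
Sn <- crossprod(Z) / 50
sn <- drop(crossprod(Z, rnorm(50)) / 50)
ccd(Sn, sn, lambda = 0.05)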

12 ahaz in action: Metzeler et al. (2008) data. Predict survival time in acute myeloid leukaemia from gene expression; p = …, n = 242. Additional test set of 79 patients. A 100-grid lasso path takes about 10 seconds on a standard laptop. Vanilla lasso + 5-fold cross-validation: 14 nonzero parameters. The continuous risk predictor has HR = 1.55 (p = 0.003); Metzeler et al. (2008), using 86 genes: HR = 1.8 (p = 0.001). [Figure: lasso coefficient paths against the $L_1$ norm, with the number of nonzero parameters along the top axis.]

13 Lasso asymptotics (no IID decompositions, sorry). When does the lasso select the right model? I.e., when does there exist a sequence $\lambda_n$ such that $P(\hat M(\lambda_n) = M) \to 1$ as $n\to\infty$, where $\hat M(\lambda_n)$ is the estimated model and $M$ the true model? Sufficient: the strong irrepresentable condition (Zhao & Yu 2006). This is a technical condition on $S_n$; it holds, for example, for an almost orthogonal design (depending on $M$), constant correlation, or power-decay correlation. Restrictive, but close to necessary. Meinshausen & Yu (2009): under much weaker conditions,
$$\|\hat\beta(\lambda_n) - \beta^0\|^2 = \sum_{i=1}^p\big(\hat\beta_i(\lambda_n) - \beta_i^0\big)^2 \xrightarrow{P} 0.$$
So the lasso should recover large effects with large probability. Greenshtein & Ritov (2004): prediction consistency.

14 The much-sought-after oracle property. If we insist on selection consistency, we sacrifice $\sqrt{n}$-consistency. Consider the adaptive lasso (Zou 2006) with criterion
$$L_1(\beta) = \beta^\top S_n\beta - 2\beta^\top s_n + \lambda\sum_{i=1}^p w_i|\beta_i|,$$
where $w_i = |\tilde\beta_i|^{-1}$ for $\tilde\beta$ some $\sqrt{n}$-consistent estimator of $\beta^0$. Then we can choose $\lambda_n$ such that the oracle property holds:
1. $P(\hat M(\lambda_n) = M) \to 1$;
2. $n^{1/2}(\hat\beta_M - \beta^0_M)$ is asymptotically normal with the correct variance.
See Ma & Leng (2007); Martinussen & Scheike (2009). So the adaptive lasso is as good as an oracle. But: how to get a good $\sqrt{n}$-consistent estimator? Fixed-parameter asymptotics can be deceiving! A two-stage approach can be useful: tune the 1st-stage lasso for prediction; use weights $w_i = |\hat\beta^{\text{lasso, 1st stage}}_i|^{-1}$ in the 2nd stage.

15 Example: lasso and friends for the Sorlie data

library(survival); library(ahaz)   # packages providing Surv() and ahazpen()
set.seed(17)
X <- as.matrix(sorlie[,11:ncol(sorlie)])
surv <- Surv(sorlie$time+1e-3*runif(nrow(sorlie)),sorlie$status)

# Lasso
m <- ahazpen(surv,X); plot(m)
cvla <- cv.ahazpen(surv,X,dfmax=75); plot(cvla)
fitla <- ahazpen(surv,X,lambda=cvla$lambda.min); fitla

# Weighted lasso
cvala <- cv.ahazpen(surv,X,penalty.factor=1/abs(fitla$beta),lambda.min=1e-4)
fitala <- ahazpen(surv,X,penalty.factor=1/abs(fitla$beta),lambda=cvala$lambda.min); fitala

# Lasso with a non-penalized predictor (grade)
summary(ahaz(surv,sorlie$grade))
m <- ahazpen(surv,cbind(sorlie$grade,X),keep=1)
cvgra <- cv.ahazpen(surv,cbind(sorlie$grade,X),keep=1)
fitgra <- ahazpen(surv,cbind(sorlie$grade,X),lambda=cvgra$lambda.min,keep=1); fitgra

# Risk scores
risksc.las <- predict(fitla,X,"lp")   # (or just X%*%fitla$beta)
risksc.ala <- predict(fitala,X,"lp")
risksc.gra <- scale(cbind(sorlie$grade,X)%*%fitgra$beta)

# Compare model fit
summary(coxph(surv~risksc.las))$rsq
summary(coxph(surv~risksc.ala))$rsq
summary(coxph(surv~risksc.gra))$rsq
f <- function(x) as.numeric(x>median(x))
plot(survfit(surv~f(risksc.las)))
lines(survfit(surv~f(risksc.ala)),col=2)
lines(survfit(surv~f(risksc.gra)),col=3)
legend("bottomleft",c("lasso","adaptive","w/grade"),lty=1,col=1:3)

16 How certain are the lasso estimates? Standard errors: sandwich estimators/bootstrapping. Consistent only for the nonzero parameters; and how should they be used? Monte Carlo methods may be more useful, e.g. stability selection (Meinshausen & Bühlmann 2010): run the lasso on subsamples and calculate empirical selection probabilities. [Figure: selection probability of each variable as a function of $\lambda$.]
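
A minimal sketch of stability selection by subsampling, reusing the ahazpen interface from the Sorlie example above (so surv and X are assumed to exist); the number of subsamples and the fixed penalty value are illustrative choices, not defaults of the method.

library(ahaz)
B <- 50                                   # number of random half-samples
lambda <- 0.05                            # illustrative fixed penalty value
n <- nrow(X)
sel <- numeric(ncol(X))
for (b in 1:B) {
  idx <- sample(n, floor(n / 2))          # random subsample of size n/2
  fit <- ahazpen(surv[idx], X[idx, ], lambda = lambda)
  sel <- sel + as.numeric(as.matrix(fit$beta) != 0)   # record which variables were selected
}
selprob <- sel / B                        # empirical selection probabilities
head(sort(selprob, decreasing = TRUE))    # most stably selected variables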

17 Lasso is a great screening method: example. Simulations from a Cox model with 5 nonzero parameters (indices randomly chosen) and normally distributed covariates; n = 200, p = 1000; block structure on the covariance matrix. Average TPR/FPR over 25 independent simulations. [Figure: true positive rate vs. false positive rate for the SAR lasso and for univariate Cox p-values.]

18 Additional useful knowledge. Beyond the lasso: elastic net: combines lasso (L1) and ridge (L2) penalties, yielding joint selection of correlated predictors. SCAD penalty: the convexity of the lasso penalty is responsible for its poor model selection; replace it with a non-convex penalty to obtain the oracle property. MC+. Dantzig selector. Etc. Beyond the SAR model: Cox model (including path-following algorithms), e.g. glmnet, penalized, glcoxph in R. Accelerated failure time models. But computation and theoretical analysis can be difficult.
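
As a brief pointer, here is a sketch of an elastic-net penalized Cox model using glmnet (one of the R packages named above); the data are simulated purely for illustration, and alpha = 0.5 is an arbitrary mix of the L1 and L2 penalties.

library(glmnet)
set.seed(1)
n <- 100; p <- 500
X <- matrix(rnorm(n * p), n, p)
time <- rexp(n, rate = exp(X[, 1] - X[, 2]))         # survival times depending on two covariates
status <- rbinom(n, 1, 0.7)                          # crude censoring indicator, for illustration only
y <- cbind(time = time, status = status)             # glmnet's expected response format for family="cox"
fit <- glmnet(X, y, family = "cox", alpha = 0.5)     # elastic-net Cox regularization path
cv <- cv.glmnet(X, y, family = "cox", alpha = 0.5, nfolds = 5)
coef(fit, s = cv$lambda.min)[1:5]                    # first few coefficients at the CV-chosen lambda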

19 The case of ultra-high dimension. $p$ of order exponential in $n$; e.g. 2nd-order interactions in microarray studies. Penalized variable selection is computationally too intensive, even with fast coordinate descent methods. Most lasso theory works only when $p = O(n^\alpha)$ (at most!). Alternatives? Prediction: anything goes, so just pick variables marginally correlated with survival time (and apply a model of choice). Model selection: much harder, but can we do something similar?

20 Sure independence screening (SIS): linear regression. Assume $y = X\beta + \varepsilon$ (with standardized predictors). Estimated model: $\hat M_n = \{1\le j\le p : |e_j^\top X^\top y| > \gamma_n\}$; simple hard thresholding of regression coefficients. Fan & Lv (2008): when $\gamma_n \to 0$ at a suitable rate and $M$ denotes the true model, we have the sure screening property $P(M \subseteq \hat M_n) \to 1$, even with $p$ exponential in $n$, assuming (with $\Sigma = E X^\top X$):
1. Semi-orthogonality: $e_j^\top\Sigma\beta$ and $\beta_j^0$ are large for $j\in M$.
2. $\Sigma$ and $X^\top X$ are sufficiently regular.
Iterated SIS: condition (1) easily fails. Heuristic iterative procedure: set $r_1 := X^\top y$. For $i = 1,2,\dots$:
1. Calculate $\hat M_i$ by SIS with $r_i$.
2. Estimate $\beta^0_{\hat M_i}$ by (penalized) regression, assuming the linear model.
3. Take $r_{i+1} = r_i - (X^\top X)\hat\beta_{\hat M_i}$ (residual correlation).
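
The vanilla screening step is easy to code directly; below is a minimal self-contained R sketch for the linear model, with simulated data and the common (but here arbitrary) model-size choice d = n/log(n).

set.seed(1)
n <- 100; p <- 2000
X <- scale(matrix(rnorm(n * p), n, p))              # standardized predictors
beta0 <- c(rep(2, 5), rep(0, p - 5))                # five true signals
y <- X %*% beta0 + rnorm(n)
omega <- abs(drop(crossprod(X, y)))                 # marginal correlations |e_j' X' y|
d <- floor(n / log(n))                              # screened model size
Mhat <- order(omega, decreasing = TRUE)[1:d]        # indices surviving the screen
all(1:5 %in% Mhat)                                  # did the true signals survive?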

21 SIS: generally. SIS is a study of misspecification: assuming some joint model, when can marginal parameters be used to decide the sign of the (joint) parameter? Fan et al. (2009): suppose model fitting corresponds to minimizing the pseudo-likelihood $L(\beta) = \sum_{i=1}^n Q(Z_i^\top\beta)$.
1. Vanilla SIS: rank according to the marginal utilities
$$L_j = \min_{\beta_j} n^{-1}\sum_{i=1}^n Q(Z_{ij}\beta_j), \qquad j = 1,\dots,p;$$
the smaller, the more important.
2. Iterated SIS: given a 1st-stage model $\hat M$, calculate
$$L^{(2)}_j = \min_{\beta_j,\beta_{\hat M}} n^{-1}\sum_{i=1}^n Q\big(Z_{ij}\beta_j + Z_{i\hat M}^\top\beta_{\hat M}\big), \qquad j\in\{1,\dots,p\}\setminus\hat M;$$
and combine the 2nd-stage model based on $\{L^{(2)}_j\}$ with $\hat M$ using penalized regression (this allows deletion of features). Iterate until stable. Refer to Fan et al. (2009) for a bag of tricks.

22 Independence screening: the Cox model. R package SIS. Uses a Cox adaptation of SCAD penalization for the iterated variant. Everything is so far completely heuristic, but works well. Caution is advised: even for independent covariates, marginal Cox estimates are well known to be inconsistent. SIS example:

library(SIS); library(survival)
# Read and break ties
X <- as.matrix(sorlie[,11:ncol(sorlie)])
surv <- Surv(sorlie$time+1e-3*runif(nrow(sorlie)),sorlie$status)
# Example ISIS
cox.van.sis <- COXvanISISscad(X, surv[,1], surv[,2])
cox.van.sis$isisind

23 Theoretically justified survival SIS: the SAR model. Ongoing work. Recall the linear model similarity
$$\underbrace{s_n}_{X^\top y} = \underbrace{S_n\beta_0}_{(X^\top X)\beta_0} + \underbrace{\text{martingale integral}}_{X^\top\varepsilon}.$$
This suggests screening based on the correlations $s_n$. Formally sensible beyond the SAR model; with administrative censoring at $t = \tau$ and centered $Z_i$'s:
$$E s_{nj} = \mathrm{Cov}\big(Z_{1j}, F_T(\tau\mid Z_1)\big) + \int_0^\tau \mathrm{Cov}\big(Z_{1j}, F_T(t\mid Z_1)\big)K(t)\,dt,$$
with $F_T(t\mid Z_1) = P(T_1 \le t\mid Z_1)$ and $K$ a strictly positive function. So $|E s_{nj}|$ is large if $\mathrm{Cov}(Z_{1j}, F_T(t\mid Z_1))$ is consistently large. Checkable if e.g. $F_T(t\mid Z_1) = \Lambda(t, Z_1^\top\alpha)$ with $\Lambda(t,\cdot)$ monotone.
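
In R, a screening step based on the entries of $s_n$ could look as follows, assuming the univariate ahaz interface mentioned in the exercise on slide 26 (the $s component) and the surv and X objects from the Sorlie example; the model size n/log(n) is again just an illustrative choice.

library(ahaz)
s.n <- drop(ahaz(surv, X, univariate = TRUE)$s)      # componentwise s_n (one entry per covariate)
d <- floor(nrow(X) / log(nrow(X)))                   # illustrative screened model size
screened <- order(abs(s.n), decreasing = TRUE)[1:d]  # keep covariates with largest |s_nj|
colnames(X)[screened][1:10]                          # the ten top-ranked features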

24 SAR iterated SIS. Within the SAR model, the sure screening property will hold if $e_j^\top E S_n\beta^0$ and $\beta_j^0$ are both large whenever $\beta_j^0 \ne 0$. Such semi-orthogonality may fail (although it is not clear when). Iterative screening: set $s_1 := s_n$. For $i = 1,2,\dots$:
1. Estimate $\hat M_i$ by independence screening with $s_i$.
2. Estimate $\beta^0_{\hat M_i}$ by (penalized) regression, assuming the SAR model.
3. Take $s_{i+1} = s_n - S_n\hat\beta_{\hat M_i}$ ("residual correlation").
Currently: what does $E S_n$ mean? How do we deal with censoring? (How) does it work in practice?

25 Concluding remarks. We can deal with $p \gg n$ under the assumption of sparsity. Modern sparsity modeling is a very elaborate exercise in not ignoring the correlation structure. It is not a silver bullet: we can make good, interpretable prediction models, but model selection is primarily model filtering, and sample size still matters. (How important/meaningful is theoretical model selection?) Difficult to translate to the applied sciences, but there is progress!

26 Exercise: survival predictions for DLBCL data.
1. Load the data set bair.rda, which consists of 240 patients diagnosed with diffuse large B-cell lymphoma. Variables are as follows: time and status are self-explanatory; train (binary) indicates whether a subject is in the training set; X4,...,X7998 (continuous) are gene expressions.
2. Build and validate survival prediction models based on the gene expression data. Specifically, basing estimation on the training set, predict (linear) risk scores in the test set based on: prediction-tuned SAR lasso; SAR PLS with 1 component (a scaled version of $\hat\beta_{\mathrm{PLS}}$ suffices: use ahaz(surv,X,univariate=TRUE)$s); Cox iterated SIS: the standard Cox model including the relevant covariates obtained from COXvanISISscad.
3. Dichotomize the three different risk scores and compare their log-rank test p-values from coxph. Also plot the corresponding survival curves.
4. Obtain test-set risk scores based on the truncated, scaled 1-component PLS estimator $s_{nj} I(|s_{nj}| > \gamma)$ for different $\gamma$ (e.g. quantiles of $|s_n|$). Calculate the log-rank test p-values for the corresponding collection of continuous predictors; plot versus the number of nonzero entries in the truncated PLS estimator. Interpret.

27 Good night stories
1. Lin DY & Ying Z (1994). Semiparametric analysis of the additive risk model. Biometrika, 81.
2. Tibshirani R (1996). Regression shrinkage and selection via the lasso. JRSS B, 58.
3. Efron B et al. (2004). Least angle regression. Ann. Statist., 32.
4. Friedman et al. (2007). Pathwise coordinate optimization. Ann. Appl. Stat., 1.
5. Metzeler et al. (2008). An 86-probe-set gene-expression signature predicts survival in cytogenetically normal acute myeloid leukemia. Blood, 112.
6. Zhao P & Yu B (2006). On model selection consistency of LASSO. J. Machine Learning Research, 7.
7. Meinshausen N & Yu B (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist., 37.
8. Ma S & Leng C (2007). Path consistent model selection in additive risk model via lasso. Statist. Med., 26.
9. Martinussen T & Scheike TH (2009). Covariate selection for the semiparametric additive risk model. Scand. J. Statist., 36.
10. Greenshtein E & Ritov Y (2004). Persistence in high-dimensional linear predictor selection and the virtue of over-parametrization. Bernoulli, 10.
11. Zou H (2006). The adaptive lasso and its oracle properties. JASA, 101.
12. Meinshausen N & Bühlmann P (2010). Stability selection. JRSS B, 72.
13. Fan J & Lv J (2008). Sure independence screening for ultra-high dimensional feature space. JRSS B, 70.
14. Fan J, Samworth R & Wu Y (2009). Ultrahigh dimensional feature selection: beyond the linear model. J. Machine Learning Research, 10.
15. Fan J, Feng Y & Wu Y (2010). High-dimensional variable selection for Cox's proportional hazards model. Preprint.
