Sparse survival regression
1 Sparse survival regression Anders Gorst-Rasmussen Department of Mathematics Aalborg University November / 27
2 Outline
- Penalized survival regression
  - The semiparametric additive risk model.
  - Theoretical results.
  - Software.
- Ultra-high dimension
  - Independence screening.
3 Recap: The semiparametric additive risk (SAR) model
Assume the hazard function given covariates is
$$\lambda(t \mid Z_i) = \lambda_0(t) + Z_i^\top \beta^0,$$
for some unspecified baseline $\lambda_0$ and covariates $Z_i \in \mathbb{R}^p$.
Lin & Ying (1994): estimate $\beta^0$ as the solution to $S_n \beta = s_n$, where
$$S_n = n^{-1} \sum_{i=1}^n \int_0^\tau Y_i(t)\,\big(Z_i - \bar Z(t)\big)^{\otimes 2}\,dt, \qquad s_n = n^{-1} \sum_{i=1}^n \int_0^\tau \big(Z_i - \bar Z(t)\big)\,dN_i(t),$$
with $\bar Z(t)$ the at-risk average of the $Z_i$'s and $Y_i$ the at-risk indicator.
Asymptotics are easy using the "signal + error" decomposition
$$s_n = S_n \beta^0 + n^{-1} \sum_{i=1}^n \int_0^\tau \big(Z_i - \bar Z(t)\big)\,dM_i(t), \qquad M_i \text{ a martingale}.$$
Notice the similarity with the linear model, $X^\top y = X^\top X \beta + X^\top \varepsilon$.
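A minimal base R sketch of how $S_n$ and $s_n$ can be computed, assuming time-fixed covariates, distinct observation times, and $\tau$ equal to the largest observed time. The function name `lin_ying` and the simulated data are illustrative assumptions, not part of any package mentioned in the talk.

```r
# Lin & Ying quantities S_n and s_n (sketch; assumes no tied times)
lin_ying <- function(time, status, Z) {
  Z <- as.matrix(Z)
  n <- nrow(Z); p <- ncol(Z)
  ord <- order(time)
  time <- time[ord]; status <- status[ord]; Z <- Z[ord, , drop = FALSE]
  S <- matrix(0, p, p)
  s <- numeric(p)
  t_prev <- 0
  for (k in seq_len(n)) {
    at_risk <- Z[k:n, , drop = FALSE]             # subjects with time >= time[k]
    Zc <- sweep(at_risk, 2, colMeans(at_risk))    # Z_i - Zbar(t) on (t_prev, time[k]]
    S <- S + (time[k] - t_prev) * crossprod(Zc)   # integral of the outer products
    if (status[k] == 1) s <- s + Zc[1, ]          # jump of N_k at an observed event
    t_prev <- time[k]
  }
  list(S = S / n, s = s / n)
}

# Simulated data with constant hazard lambda_0 + Z'beta (hypothetical values)
set.seed(1)
n <- 500
Z <- cbind(runif(n), runif(n))
beta0 <- c(1, 0)
T_event <- rexp(n, rate = 0.5 + drop(Z %*% beta0))
C <- rexp(n, rate = 0.3)
time <- pmin(T_event, C)
status <- as.numeric(T_event <= C)

fit <- lin_ying(time, status, Z)
beta_hat <- solve(fit$S, fit$s)   # solves S_n beta = s_n
beta_hat
```

Because the estimating equation is linear in $\beta$, the estimate is a single call to `solve`, which is what makes the linear-model analogy on this slide useful.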
4 What do we do if p > n?
The course of action depends on the purpose:
1. Model selection?
2. Low prediction error?
A useful guideline is how practitioners prefer to think: "Few features convey all of the effect." It is nice if statistical models can reflect this.
Classical sparsity modeling:
- Multiple testing.
- All subsets/forward stepwise regression.
- Ridge regression and truncation.
- Etc.
These have computational/practical issues and weak theoretical justification.
5 Lasso for the SAR model
"Least Absolute Shrinkage and Selection Operator."
SAR estimation $\Leftrightarrow$ minimization of $L(\beta) = \beta^\top S_n \beta - 2\beta^\top s_n$.
Lasso penalized version:
$$L_1(\beta) = L(\beta) + \lambda \sum_{i=1}^p |\beta_i|,$$
with $\lambda \ge 0$ a regularization parameter (compare with ridge regression, where the penalty is the $L_2$-norm of $\beta$).
- Convex problem (so computationally feasible).
- Simple form (so theoretically tractable).
Key property: lasso does variable selection.
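6 Why lasso does variable selection
[Figure: two panels in the $(\beta_1, \beta_2)$ plane, each marking the constraint bound $t$ on both axes.]
Equivalent formulation: minimize $L(\beta)$ subject to $\sum_{i=1}^p |\beta_i| \le t$.
Continuous subset selection. No hypothesis testing, only optimization.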
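7 Lasso is (typically) a shrinkage estimator
Suppose $Z_i = (Z_{i1},\ldots,Z_{ip})$ with $Z_{i1},\ldots,Z_{ip}$ independent.
Lasso solutions with diagonal $S_n$:
$$\hat\beta_j = \mathcal{S}\big(\hat\beta_j^{LS}, \lambda / S_{n,jj}\big),$$
with $\hat\beta_j^{LS} = S_{n,jj}^{-1} s_{n,j}$ the least-squares estimate and $\mathcal{S}$ the soft-thresholding operator,
$$\mathcal{S}(x,y) = \operatorname{sign}(x)\,(|x| - y)_+ .$$
So:
- $S_{n,jj}\,|\hat\beta_j^{LS}| \le \lambda \implies \hat\beta_j = 0$;
- $S_{n,jj}\,|\hat\beta_j^{LS}| > \lambda \implies 0 < |\hat\beta_j| < |\hat\beta_j^{LS}|$ (with correct sign);
- $\lambda = 0 \implies \hat\beta = \hat\beta^{LS}$.
This is also how we like to think of lasso in the non-orthogonal case.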
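The three cases above can be checked numerically. The sketch below implements the soft-thresholding operator in base R and applies it to a diagonal $S_n$; the numbers in `S_diag`, `s` and `lambda` are made up for illustration.

```r
# Soft-thresholding operator S(x, y) = sign(x) * (|x| - y)_+
soft_threshold <- function(x, y) sign(x) * pmax(abs(x) - y, 0)

S_diag <- c(2, 1, 4)      # hypothetical diagonal entries S_{n,jj}
s      <- c(3, 0.5, -6)   # hypothetical entries s_{n,j}
lambda <- 1

beta_ls    <- s / S_diag                                # least-squares estimates
beta_lasso <- soft_threshold(beta_ls, lambda / S_diag)  # lasso with diagonal S_n

beta_ls      # 1.5  0.5 -1.5
beta_lasso   # 1.0  0.0 -1.25
```

The second coefficient has $S_{n,jj}|\hat\beta_j^{LS}| = 0.5 \le \lambda$ and is set to zero; the other two are shrunk towards zero but keep their sign.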
8 Example: lasso regularization path
[Figure: lasso path; standardized coefficients versus |beta|/max|beta|.]
9 From regularization paths to models
Selecting a model requires a good $\lambda$. How to choose?
K-fold cross-validation using the "least squares loss": with $L(\beta) = \beta^\top S_n \beta - 2\beta^\top s_n$, choose $\lambda$ to minimize
$$\mathrm{CV}(\lambda) = K^{-1} \sum_{i=1}^K L^{(i)}\big(\hat\beta^{(-i)}(\lambda)\big),$$
with $L^{(i)}$ calculated from the $i$th fold and $\hat\beta^{(-i)}(\lambda)$ from the remaining folds.
For lasso, 5-fold CV is often considered sufficient.
Note that CV...
- ... is a stochastic procedure; stability may be poor for small $n$.
- ... focuses on prediction optimality and generally overfits.
Alternatives:
- Generalized cross-validation.
- AIC, BIC etc.
- Bootstrapping/subsampling.
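A rough base R sketch of the cross-validation criterion. To stay self-contained it uses the linear-model stand-ins $S = X^\top X/n$ and $s = X^\top y/n$ instead of the survival quantities, and a small coordinate descent solver; the function names (`cd_lasso`, `quad_loss`, `summaries`, `cv_lasso`) and the simulated data are assumptions made for illustration only.

```r
soft_threshold <- function(x, y) sign(x) * pmax(abs(x) - y, 0)

# Coordinate descent for beta'S beta / 2 - beta's + lambda * sum(|beta|)
cd_lasso <- function(S, s, lambda, n_sweeps = 200) {
  beta <- numeric(length(s))
  for (sweep_no in seq_len(n_sweeps)) {
    for (j in seq_along(s)) {
      partial <- s[j] - sum(S[j, -j] * beta[-j])
      beta[j] <- soft_threshold(partial, lambda) / S[j, j]
    }
  }
  beta
}

# Least squares loss L(beta) = beta'S beta - 2 beta's
quad_loss <- function(beta, S, s) sum(beta * drop(S %*% beta)) - 2 * sum(beta * s)

# Linear-model stand-ins for S_n and s_n on a subset of the data
summaries <- function(X, y) {
  list(S = crossprod(X) / nrow(X), s = drop(crossprod(X, y)) / nrow(X))
}

cv_lasso <- function(X, y, lambdas, K = 5) {
  fold <- sample(rep(seq_len(K), length.out = nrow(X)))   # random fold labels
  cv <- sapply(lambdas, function(lam) {
    mean(sapply(seq_len(K), function(i) {
      train <- summaries(X[fold != i, , drop = FALSE], y[fold != i])
      test  <- summaries(X[fold == i, , drop = FALSE], y[fold == i])
      quad_loss(cd_lasso(train$S, train$s, lam), test$S, test$s)  # L^(i) at beta^(-i)
    }))
  })
  list(lambda = lambdas, cv = cv, lambda.min = lambdas[which.min(cv)])
}

set.seed(2)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 1] + rnorm(n)
res <- cv_lasso(X, y, lambdas = c(0.01, 0.1, 0.5, 1, 2))
res$lambda.min
```

Rerunning with a different seed changes the fold assignment and may change `lambda.min`, which is the stochasticity the slide warns about.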
10 Computation
Good: Choose some $\lambda$'s; fit using e.g. quadprog in R.
Better: Path following algorithm (stagewise/conjugate gradient). Start with $\beta_1 = \cdots = \beta_p = 0$; set $r = s_n$.
1. Find $j$ such that $|r_j|$ is maximal.
2. Set $\beta_j \leftarrow \beta_j + \varepsilon_j \operatorname{sign}(r_j)$.
3. Set $r \leftarrow s_n - S_n \beta$.
Least Angle Regression (LARS), Efron et al. (2004): compute $\varepsilon_j$ so that we get a new $j$ in step 1 every time. Yields the complete regularization path in $\min\{n, p\}$ steps (piecewise linear!). Implemented in the timereg function surv.lars.
Expensive: we need both $s_n$ and $S_n$.
Best: Cyclic coordinate descent (we need only some of $S_n$).
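The three-step path-following recipe can be sketched in base R as follows. This version uses a fixed small step `eps` for every coordinate (plain forward stagewise), not the adaptive LARS step sizes, so it only approximates the path; `stagewise`, `S` and `s` are illustrative names and values.

```r
# Forward stagewise path for L(beta) = beta'S beta - 2 beta's (fixed step size)
stagewise <- function(S, s, eps = 1e-3, n_steps = 5000) {
  p <- length(s)
  beta <- numeric(p)
  path <- matrix(0, n_steps, p)
  for (k in seq_len(n_steps)) {
    r <- s - drop(S %*% beta)               # step 3: current residual correlation
    j <- which.max(abs(r))                  # step 1: most correlated coordinate
    beta[j] <- beta[j] + eps * sign(r[j])   # step 2: small move in that direction
    path[k, ] <- beta
  }
  path
}

S <- matrix(c(1, 0.5, 0.5, 1), 2, 2)   # hypothetical S_n
s <- c(1, 0.2)                         # hypothetical s_n
path <- stagewise(S, s)

path[1, ]                # first step moves only the first coordinate
path[nrow(path), ]       # end of the path, close to the unpenalized solution
solve(S, s)              # unpenalized solution for comparison
```

Each row of `path` is one point on the approximate regularization path, ordered from the fully shrunk solution towards the unpenalized one.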
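11 The experimental package ahaz for R
Cyclic coordinate descent
- Friedman et al. (2007); fairly recent R package glmnet.
- Minimize convex $L\colon \mathbb{R}^p \to \mathbb{R}$ by initializing $\tilde\theta_1 = \cdots = \tilde\theta_p = 0$, say, and sequentially iterating coordinatewise updates until convergence:
$$\tilde\theta_i \leftarrow \operatorname*{argmin}_{\theta_i} L(\tilde\theta_1,\ldots,\tilde\theta_{i-1},\theta_i,\tilde\theta_{i+1},\ldots,\tilde\theta_p).$$
- Many iterations are needed, but they are cheap.
The SAR model (ongoing work)
- Coordinatewise updates of the form
$$\beta_j \leftarrow \mathcal{S}\Big(s_j - \sum_{i \in A \setminus \{j\}} \beta_i S_{ij},\ \lambda\Big)\, S_{jj}^{-1}, \qquad A := \{k : \beta_k \ne 0\},$$
with $\mathcal{S}(x,y) = \operatorname{sign}(x)(|x| - y)_+$.
- We need only the rows of $S_n$ for active variables.
- Very stable (an issue for nonlinear models and $p \gg n$).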
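A minimal base R sketch of the coordinatewise update, written for a small dense $S_n$; it is not the ahaz implementation and, for simplicity, loops over all coordinates instead of tracking the active set. With threshold $\lambda$ the update minimizes $\tfrac12\beta^\top S\beta - \beta^\top s + \lambda\sum_i|\beta_i|$, that is, $L(\beta)$ halved. The names `cd_lasso`, `S` and `s` are illustrative.

```r
soft_threshold <- function(x, y) sign(x) * pmax(abs(x) - y, 0)

# Cyclic coordinate descent for beta'S beta / 2 - beta's + lambda * sum(|beta|)
cd_lasso <- function(S, s, lambda, tol = 1e-10, max_iter = 10000) {
  beta <- numeric(length(s))
  for (it in seq_len(max_iter)) {
    beta_old <- beta
    for (j in seq_along(s)) {
      partial <- s[j] - sum(S[j, -j] * beta[-j])          # s_j minus other coordinates
      beta[j] <- soft_threshold(partial, lambda) / S[j, j]
    }
    if (max(abs(beta - beta_old)) < tol) break            # stop when a sweep changes nothing
  }
  beta
}

S <- matrix(c(1, 0.5, 0.5, 1), 2, 2)   # hypothetical S_n
s <- c(1, 0.2)                         # hypothetical s_n

cd_lasso(S, s, lambda = 0.3)   # 0.7 0.0: the second variable is dropped
cd_lasso(S, s, lambda = 0)     # matches solve(S, s), i.e. 1.2 -0.4
```

Only `S[j, -j] * beta[-j]` enters each update, so rows of $S_n$ belonging to coefficients that stay at zero never need to be computed, which is the saving the slide refers to.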
12 ahaz in action
Metzeler et al. (2008) data:
- Predict time to acute myeloid leukaemia from gene expression.
- p = ; n = 242. Additional test set of 79 patients.
- 100-grid lasso path: 10 seconds on a standard laptop.
- Vanilla lasso + 5-fold cross-validation: 14 nonzero parameters.
- Continuous risk predictor has HR = 1.55 (p = 0.003).
- Metzeler et al. (2008): using 86 genes, HR = 1.8 (p = 0.001).
[Figure: lasso coefficient paths; axes labelled coefficients, L1 norm and # nonzero parameters.]
13 Lasso asymptotics (no IID decompositions, sorry)
When does lasso select the right model? I.e. when does there exist a sequence $\lambda_n$ such that
$$P\big(\underbrace{\hat M(\lambda_n)}_{\text{Est. model}} = \underbrace{M}_{\text{True model}}\big) \to 1, \qquad n \to \infty?$$
Sufficient: the strong irrepresentable condition (Zhao & Yu 2006). This is a technical condition on $S_n$ which holds, for example, under:
- Almost orthogonal design (depending on $M$).
- Constant correlation.
- Power decay correlation.
Restrictive but close to necessary.
Meinshausen & Yu (2009): under much weaker conditions,
$$\|\hat\beta(\lambda_n) - \beta^0\|_2^2 = \sum_{i=1}^p \big(\hat\beta_i(\lambda_n) - \beta_i^0\big)^2 \xrightarrow{P} 0.$$
So lasso should get large effects with large probability.
Greenshtein & Ritov (2004): prediction consistency.
14 The much-sought-after oracle property
If we have selection consistency, we sacrifice $\sqrt{n}$-consistency.
Consider the adaptive lasso (Zou 2006) with criterion
$$L_1(\beta) = \beta^\top S_n \beta - 2\beta^\top s_n + \lambda \sum_{i=1}^p w_i |\beta_i|,$$
where $w_i = |\tilde\beta_i|^{-1}$ for $\tilde\beta$ some $\sqrt{n}$-consistent estimator of $\beta^0$.
Then we can choose $\lambda_n$ such that the oracle property holds:
1. $P(\hat M(\lambda_n) = M) \to 1$.
2. $n^{1/2}(\hat\beta_M - \beta_M^0)$ is asymptotically normal with correct variance.
See Ma & Leng (2007); Martinussen & Scheike (2009).
So adaptive lasso is as good as an oracle. But:
- How to get a good $\sqrt{n}$-consistent estimator?
- Fixed-parameter asymptotics can be deceiving!
A two-stage approach can be useful: tune the 1st-stage lasso to prediction; use weights $w_i = |\hat\beta_i^{\text{lasso, 1st stage}}|^{-1}$ in the 2nd stage.
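The two-stage idea can be sketched in base R by giving each coefficient its own penalty level inside a coordinate descent loop. This is an illustration under made-up inputs, not the ahaz code path; `cd_weighted`, `S`, `s` and both penalty levels are assumptions, and no tuning by cross-validation is attempted.

```r
soft_threshold <- function(x, y) sign(x) * pmax(abs(x) - y, 0)

# Coordinate descent with a separate penalty level for each coefficient
cd_weighted <- function(S, s, penalty, n_sweeps = 200) {
  beta <- numeric(length(s))
  for (sweep_no in seq_len(n_sweeps)) {
    for (j in seq_along(s)) {
      partial <- s[j] - sum(S[j, -j] * beta[-j])
      beta[j] <- soft_threshold(partial, penalty[j]) / S[j, j]
    }
  }
  beta
}

S <- matrix(c(1, 0.3, 0,
              0.3, 1, 0,
              0, 0, 1), 3, 3)   # hypothetical S_n
s <- c(2, 1.2, 0.1)             # hypothetical s_n

# Stage 1: ordinary lasso, same penalty for every coefficient
beta1 <- cd_weighted(S, s, penalty = rep(0.5, 3))

# Stage 2: weights w_i = 1 / |beta1_i|; coefficients that were zero get an
# infinite penalty and therefore stay at zero
w <- 1 / abs(beta1)
beta2 <- cd_weighted(S, s, penalty = 0.05 * w)

rbind(stage1 = beta1, stage2 = beta2)
```

In this toy example the third coefficient is removed in both stages, while the two retained coefficients are shrunk less in the second stage because large first-stage estimates receive small weights.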
15 Example: lasso and friends for Sorlie data
    library(survival)  # Surv, coxph, survfit
    library(ahaz)      # ahaz, ahazpen, cv.ahazpen
    data(sorlie)       # assumed to ship with the ahaz package
    set.seed(17)
    X <- as.matrix(sorlie[, 11:ncol(sorlie)])
    # Add a small random jitter to the event times to break ties
    surv <- Surv(sorlie$time + 1e-3 * runif(nrow(sorlie)), sorlie$status)
    # Lasso
    m <- ahazpen(surv, X); plot(m)
    cvla <- cv.ahazpen(surv, X, dfmax = 75); plot(cvla)
    fitla <- ahazpen(surv, X, lambda = cvla$lambda.min); fitla
    # Weighted (adaptive) lasso, with weights from the first-stage lasso
    cvala <- cv.ahazpen(surv, X, penalty.factor = 1 / abs(fitla$beta), lambda.min = 1e-4)
    fitala <- ahazpen(surv, X, penalty.factor = 1 / abs(fitla$beta), lambda = cvala$lambda.min); fitala
    # Lasso with a non-penalized predictor (grade)
    summary(ahaz(surv, sorlie$grade))
    m <- ahazpen(surv, cbind(sorlie$grade, X), keep = 1)
    cvgra <- cv.ahazpen(surv, cbind(sorlie$grade, X), keep = 1)
    fitgra <- ahazpen(surv, cbind(sorlie$grade, X), lambda = cvgra$lambda.min, keep = 1); fitgra
    # Risk scores
    risksc.las <- predict(fitla, X, "lp")  # (or just X %*% fitla$beta)
    risksc.ala <- predict(fitala, X, "lp")
    risksc.gra <- scale(cbind(sorlie$grade, X) %*% fitgra$beta)
    # Compare model fit
    summary(coxph(surv ~ risksc.las))$rsq
    summary(coxph(surv ~ risksc.ala))$rsq
    summary(coxph(surv ~ risksc.gra))$rsq
    # Dichotomize each risk score at its median and plot survival curves
    f <- function(x) as.numeric(x > median(x))
    plot(survfit(surv ~ f(risksc.las)))
    lines(survfit(surv ~ f(risksc.ala)), col = 2)
    lines(survfit(surv ~ f(risksc.gra)), col = 3)
    legend("bottomleft", c("lasso", "adaptive", "w/grade"), lty = 1, col = 1:3)
16 How certain are the lasso estimates?
- Standard errors; sandwich estimators/bootstrapping. Consistent only for nonzero parameters; and how to use?
- Monte Carlo methods may be more useful. E.g. stability selection (Meinshausen & Bühlmann 2010): lasso on subsamples; calculate the empirical selection probability.
[Figure: probability of selection versus lambda.]
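A simplified base R sketch of the subsampling idea: fit a lasso on random half-samples and record how often each variable is selected. It uses linear-model stand-ins for $S_n$ and $s_n$ and a single value of $\lambda$, whereas stability selection proper tracks a whole grid of $\lambda$ values; `cd_lasso`, `selection_prob` and the simulated data are illustrative assumptions.

```r
soft_threshold <- function(x, y) sign(x) * pmax(abs(x) - y, 0)

# Coordinate descent for beta'S beta / 2 - beta's + lambda * sum(|beta|)
cd_lasso <- function(S, s, lambda, n_sweeps = 100) {
  beta <- numeric(length(s))
  for (sweep_no in seq_len(n_sweeps)) {
    for (j in seq_along(s)) {
      partial <- s[j] - sum(S[j, -j] * beta[-j])
      beta[j] <- soft_threshold(partial, lambda) / S[j, j]
    }
  }
  beta
}

# Empirical selection probability over B subsamples of size n/2
selection_prob <- function(X, y, lambda, B = 50) {
  n <- nrow(X)
  hits <- numeric(ncol(X))
  for (b in seq_len(B)) {
    idx <- sample(n, floor(n / 2))               # subsample without replacement
    Xb <- X[idx, , drop = FALSE]
    S <- crossprod(Xb) / length(idx)
    s <- drop(crossprod(Xb, y[idx])) / length(idx)
    hits <- hits + (cd_lasso(S, s, lambda) != 0) # count nonzero coefficients
  }
  hits / B
}

set.seed(4)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 1] + rnorm(n)
probs <- selection_prob(X, y, lambda = 0.5)
probs
```

Variables with selection probability near 1 are selected regardless of which half of the data is used, which gives a more usable notion of certainty than a standard error on a coefficient that may be exactly zero.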
17 Lasso is a great screening method: example
- Simulations from a Cox model with 5 nonzero parameters (indices randomly chosen), normally distributed covariates.
- n = 200, p = 1000; block structure on the covariance matrix.
- Average TPR/FPR over 25 independent simulations.
[Figure: true positive rate versus false positive rate for SAR lasso and univariate Cox p-values.]
18 Additional useful knowledge
Beyond the lasso:
- Elastic net: combine lasso (L1) and ridge (L2) penalties. Yields joint selection of correlated predictors.
- SCAD penalty: convexity of the lasso penalty is responsible for poor model selection. Replace it with a non-convex penalty to get the oracle property.
- MC+
- Dantzig selector
- Etc.
Beyond the SAR model:
- Cox model (including path following algorithms). E.g. glmnet, penalized, glcoxph in R.
- Accelerated failure time models.
But computation and theoretical analysis can be difficult.
19 The case of ultra-high dimension
- $p$ of order exponential in $n$; e.g. 2nd-order interactions in microarray studies.
- Penalized variable selection is computationally too intensive, even with fast coordinate descent methods.
- Most lasso theory works only when $p = O(n^\alpha)$ (at most!)
Alternatives?
- Prediction: Anything goes, so just pick variables marginally correlated with survival time (and apply a model of choice).
- Model selection: Much harder, but can we do something similar?
20 Sure independence screening (SIS): linear regression
Assume $y = X\beta + \varepsilon$ (with standardized predictors).
Estimated model:
$$\hat M_n = \{1 \le j \le p : |e_j^\top X^\top y| > \gamma_n\};$$
simple hard thresholding of regression coefficients.
Fan & Lv (2008): when $\gamma_n \to 0$ at a suitable rate, if $M$ denotes the true model, we have the sure screening property
$$P(M \subseteq \hat M_n) \to 1,$$
even with $p$ exponential in $n$, assuming (with $\Sigma = E X^\top X$):
1. "Semi-orthogonality": $|e_j^\top \Sigma \beta|$ and $|\beta_j^0|$ are large for $j \in M$.
2. $\Sigma$ and $X^\top X$ are sufficiently regular.
Iterated SIS: condition (1) easily fails. Heuristic iterative procedure: set $r_1 := X^\top y$. For $i = 1, 2, \ldots$
1. Calculate $\hat M_i$ by SIS with $r_i$.
2. Estimate $\beta^0_{\hat M_i}$ by (penalized) regression, assuming SAR model.
3. Take $r_{i+1} = r_i - (X^\top X)\hat\beta_{\hat M_i}$ (residual correlation).
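A minimal base R sketch of vanilla (non-iterated) SIS for the linear model. Instead of a threshold $\gamma_n$ it keeps the `d` largest marginal statistics, which amounts to choosing $\gamma_n$ as the `d`-th largest value; the function name `sis`, the choice `d = 20` and the simulated data are illustrative assumptions.

```r
# Keep the d predictors with the largest absolute marginal correlation statistic
sis <- function(X, y, d) {
  Xs <- scale(X)                                            # standardized predictors
  omega <- abs(drop(crossprod(Xs, y - mean(y)))) / nrow(X)  # |e_j' X'y| / n
  sort(order(omega, decreasing = TRUE)[seq_len(d)])
}

set.seed(3)
n <- 100; p <- 1000
X <- matrix(rnorm(n * p), n, p)
y <- 3 * X[, 1] - 3 * X[, 2] + rnorm(n)   # only the first two predictors matter

keep <- sis(X, y, d = 20)
keep
```

The screened set has far fewer than $n$ variables, so a penalized regression restricted to `keep` becomes computationally easy; with independent predictors as simulated here, the two true predictors are expected to survive the screening.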
21 SIS: generally
SIS $\approx$ study of misspecification: assuming some joint model, when can marginal parameters be used to decide the sign of the (joint) parameter?
Fan et al. (2009): if model fitting corresponds to minimizing the pseudo-likelihood
$$L(\beta) = \sum_{i=1}^n Q(Z_i^\top \beta):$$
1. Vanilla SIS according to marginal utilities
$$L_j = \min_{\beta_j} n^{-1} \sum_{i=1}^n Q(Z_{ij}\beta_j), \qquad j = 1,\ldots,p;$$
rank according to size, the smaller the more important.
2. Iterated SIS: if the 1st-stage model is $\hat M$, calculate
$$L_j^{(2)} = \min_{\beta_j,\,\beta_{\hat M}} n^{-1} \sum_{i=1}^n Q\big(Z_{ij}\beta_j + Z_{i,\hat M}^\top \beta_{\hat M}\big), \qquad j \in \{1,\ldots,p\} \setminus \hat M;$$
and combine the 2nd-stage model based on $\{L_j^{(2)}\}$ with $\hat M$ using penalized regression (allows deletion of features). Iterate until stable.
Refer to Fan et al. (2009) for a bag of tricks.
22 Independence screening: the Cox model
R package SIS. Uses a Cox adaptation of SCAD penalization for the iterated variant.
Everything is so far completely heuristic, but it works well. Caution is advised: even for independent covariates, marginal Cox estimates are well known to be inconsistent.
SIS example:
    library(survival)  # Surv
    library(SIS)       # COXvanISISscad
    # Read and break ties (sorlie as in the slide 15 example)
    X <- as.matrix(sorlie[, 11:ncol(sorlie)])
    surv <- Surv(sorlie$time + 1e-3 * runif(nrow(sorlie)), sorlie$status)
    # Example ISIS
    cox.van.sis <- COXvanISISscad(X, surv[, 1], surv[, 2])
    cox.van.sis$isisind
23 Theoretically justified survival SIS: the SAR model
Ongoing work. Recall the linear model similarity
$$\underbrace{s_n}_{X^\top y} = \underbrace{S_n \beta^0}_{(X^\top X)\beta^0} + \underbrace{\text{martingale integral}}_{X^\top \varepsilon}.$$
This suggests screening based on the "correlation" $s_n$.
Formally sensible beyond the SAR model; with administrative censoring at $t = \tau$ and centered $Z_i$'s:
$$E s_{nj} = \operatorname{Cov}\big(Z_{1j}, F_T(\tau \mid Z_1)\big) + \int_0^\tau \operatorname{Cov}\big(Z_{1j}, F_T(t \mid Z_1)\big) K(t)\,dt,$$
with $F_T(t \mid Z_1) = P(T_1 \le t \mid Z_1)$ and $K$ a strictly positive function.
So $|E s_{nj}|$ is large if $\operatorname{Cov}(Z_{1j}, F_T(t \mid Z_1))$ is "consistently large".
Checkable if e.g. $F_T(t \mid Z_1) = \Lambda(t, Z_1^\top \alpha)$ with $\Lambda(t, \cdot)$ monotone.
24 SAR iterated SIS
Within the SAR model, the sure screening property will hold if $|e_j^\top E S_n \beta^0|$ and $|\beta_j^0|$ are both large whenever $\beta_j^0 \ne 0$.
Such "semi-orthogonality" may fail (although it is not clear when).
Iterative screening: set $s_1 := s_n$. For $i = 1, 2, \ldots$
1. Estimate $\hat M_i$ by independence screening with $s_i$.
2. Estimate $\beta^0_{\hat M_i}$ by (penalized) regression, assuming SAR model.
3. Take $s_{i+1} = s_i - S_n \hat\beta_{\hat M_i}$ ("residual correlation").
Currently:
- What does $E S_n$ mean?
- How do we deal with censoring?
- (How) does it work in practice?
25 Concluding remarks
- We can deal with $p \gg n$ under the assumption of sparsity.
- Modern sparsity modeling is a very elaborate exercise in not ignoring the correlation structure.
- It is not a silver bullet:
  - We can make good, interpretable prediction models.
  - But model selection is primarily model filtering.
  - Sample size still matters.
  - (How important/meaningful is theoretical model selection?)
- Difficult to translate to applied sciences, but there is progress!
26 Exercise: survival predictions for DLBCL data
1. Load the data set bair.rda, which consists of 240 patients diagnosed with diffuse large B-cell lymphoma. Variables are as follows: time and status are self-explanatory; train (binary) indicates whether the subject is in the training set; X4,...,X7998 (continuous) are gene expressions.
2. Build and validate survival prediction models based on the gene expression data. Specifically, basing estimation on the training set, predict (linear) risk scores in the test set based on:
   - Prediction-tuned SAR lasso.
   - SAR PLS with 1 component (a scaled version of $\hat\beta^{PLS}$ suffices: use ahaz(surv,x,univariate=TRUE)$s).
   - Cox iterated SIS: the standard Cox model including the relevant covariates obtained from COXvanISISscad.
3. Dichotomize the three different risk scores and compare their log-rank test p-values from coxph. Also plot the corresponding survival curves.
4. Obtain test set risk scores based on the truncated, scaled 1-component PLS estimator $s_n I(|s_n| > \gamma)$ for different $\gamma$ (e.g. quantiles of $|s_n|$). Calculate the log-rank test p-values for the corresponding collection of continuous predictors; plot them versus the number of nonzero entries in the truncated PLS estimator. Interpret.
27 Good night stories
1. Lin DY & Ying Z (1994). Semiparametric analysis of the additive risk model. Biometrika, 81.
2. Tibshirani R (1996). Regression shrinkage and selection via the lasso. JRSS B, 58.
3. Efron B et al. (2004). Least angle regression. Ann. Statist., 32.
4. Friedman et al. (2007). Pathwise coordinate optimization. Ann. Appl. Stat., 1.
5. Metzeler et al. (2008). An 86-probe-set gene-expression signature predicts survival in cytogenetically normal acute myeloid leukemia. Blood, 112.
6. Zhao P & Yu B (2006). On model selection consistency of LASSO. J. Machine Learning Research, 7.
7. Meinshausen N & Yu B (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist., 37.
8. Ma S & Leng C (2007). Path consistent model selection in additive risk model via lasso. Statist. Med., 26.
9. Martinussen T & Scheike TH (2009). Covariate selection for the semiparametric additive risk model. Scand. J. Statist., 36.
10. Greenshtein E & Ritov Y (2004). Persistence in high dimensional linear predictor-selection and the virtue of over-parametrization. Bernoulli, 10.
11. Zou H (2006). The adaptive lasso and its oracle properties. JASA, 101.
12. Meinshausen N & Bühlmann P (2010). Stability selection. JRSS B, 72.
13. Fan J & Lv J (2008). Sure independence screening for ultra-high dimensional feature space. JRSS B, 70.
14. Fan J, Samworth R & Wu Y (2009). Ultrahigh dimensional feature selection: beyond the linear model. J. Machine Learning Research, 10.
15. Fan J, Feng Y & Wu Y (2010). High-dimensional variable selection for Cox's proportional hazards model. Preprint.
More informationBi-level feature selection with applications to genetic association
Bi-level feature selection with applications to genetic association studies October 15, 2008 Motivation In many applications, biological features possess a grouping structure Categorical variables may
More informationEnsemble estimation and variable selection with semiparametric regression models
Ensemble estimation and variable selection with semiparametric regression models Sunyoung Shin Department of Mathematical Sciences University of Texas at Dallas Joint work with Jason Fine, Yufeng Liu,
More informationPrediction & Feature Selection in GLM
Tarigan Statistical Consulting & Coaching statistical-coaching.ch Doctoral Program in Computer Science of the Universities of Fribourg, Geneva, Lausanne, Neuchâtel, Bern and the EPFL Hands-on Data Analysis
More informationTwo Tales of Variable Selection for High Dimensional Regression: Screening and Model Building
Two Tales of Variable Selection for High Dimensional Regression: Screening and Model Building Cong Liu, Tao Shi and Yoonkyung Lee Department of Statistics, The Ohio State University Abstract Variable selection
More informationIs the test error unbiased for these programs? 2017 Kevin Jamieson
Is the test error unbiased for these programs? 2017 Kevin Jamieson 1 Is the test error unbiased for this program? 2017 Kevin Jamieson 2 Simple Variable Selection LASSO: Sparse Regression Machine Learning
More informationStatistical Inference
Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park
More informationMachine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February 4 th, Emily Fox 2014
Case Study 3: fmri Prediction Fused LASSO LARS Parallel LASSO Solvers Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February 4 th, 2014 Emily Fox 2014 1 LASSO Regression
More informationShrinkage Methods: Ridge and Lasso
Shrinkage Methods: Ridge and Lasso Jonathan Hersh 1 Chapman University, Argyros School of Business hersh@chapman.edu February 27, 2019 J.Hersh (Chapman) Ridge & Lasso February 27, 2019 1 / 43 1 Intro and
More informationFast Regularization Paths via Coordinate Descent
user! 2009 Trevor Hastie, Stanford Statistics 1 Fast Regularization Paths via Coordinate Descent Trevor Hastie Stanford University joint work with Jerome Friedman and Rob Tibshirani. user! 2009 Trevor
More informationRegression Shrinkage and Selection via the Lasso
Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,
More informationHomogeneity Pursuit. Jianqing Fan
Jianqing Fan Princeton University with Tracy Ke and Yichao Wu http://www.princeton.edu/ jqfan June 5, 2014 Get my own profile - Help Amazing Follow this author Grace Wahba 9 Followers Follow new articles
More informationGeneralized Elastic Net Regression
Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1
More informationPermutation-invariant regularization of large covariance matrices. Liza Levina
Liza Levina Permutation-invariant covariance regularization 1/42 Permutation-invariant regularization of large covariance matrices Liza Levina Department of Statistics University of Michigan Joint work
More informationPaper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)
Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Presented by Yang Zhao March 5, 2010 1 / 36 Outlines 2 / 36 Motivation
More informationAn Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models
Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS023) p.3938 An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Vitara Pungpapong
More informationExploratory quantile regression with many covariates: An application to adverse birth outcomes
Exploratory quantile regression with many covariates: An application to adverse birth outcomes June 3, 2011 eappendix 30 Percent of Total 20 10 0 0 1000 2000 3000 4000 5000 Birth weights efigure 1: Histogram
More informationCross-Validation with Confidence
Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation
More informationLecture 25: November 27
10-725: Optimization Fall 2012 Lecture 25: November 27 Lecturer: Ryan Tibshirani Scribes: Matt Wytock, Supreeth Achar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have
More informationMidterm exam CS 189/289, Fall 2015
Midterm exam CS 189/289, Fall 2015 You have 80 minutes for the exam. Total 100 points: 1. True/False: 36 points (18 questions, 2 points each). 2. Multiple-choice questions: 24 points (8 questions, 3 points
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In
More informationModel-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate
Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Lucas Janson, Stanford Department of Statistics WADAPT Workshop, NIPS, December 2016 Collaborators: Emmanuel
More informationNon-linear Supervised High Frequency Trading Strategies with Applications in US Equity Markets
Non-linear Supervised High Frequency Trading Strategies with Applications in US Equity Markets Nan Zhou, Wen Cheng, Ph.D. Associate, Quantitative Research, J.P. Morgan nan.zhou@jpmorgan.com The 4th Annual
More informationStatistical aspects of prediction models with high-dimensional data
Statistical aspects of prediction models with high-dimensional data Anne Laure Boulesteix Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie February 15th, 2017 Typeset by
More informationCOMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017
COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University UNDERDETERMINED LINEAR EQUATIONS We
More informationStatistical Learning with the Lasso, spring The Lasso
Statistical Learning with the Lasso, spring 2017 1 Yeast: understanding basic life functions p=11,904 gene values n number of experiments ~ 10 Blomberg et al. 2003, 2010 The Lasso fmri brain scans function
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationShrinkage Tuning Parameter Selection in Precision Matrices Estimation
arxiv:0909.1123v1 [stat.me] 7 Sep 2009 Shrinkage Tuning Parameter Selection in Precision Matrices Estimation Heng Lian Division of Mathematical Sciences School of Physical and Mathematical Sciences Nanyang
More informationLeast Angle Regression, Forward Stagewise and the Lasso
January 2005 Rob Tibshirani, Stanford 1 Least Angle Regression, Forward Stagewise and the Lasso Brad Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani Stanford University Annals of Statistics,
More informationModelling geoadditive survival data
Modelling geoadditive survival data Thomas Kneib & Ludwig Fahrmeir Department of Statistics, Ludwig-Maximilians-University Munich 1. Leukemia survival data 2. Structured hazard regression 3. Mixed model
More informationGenomics, Transcriptomics and Proteomics in Clinical Research. Statistical Learning for Analyzing Functional Genomic Data. Explanation vs.
Genomics, Transcriptomics and Proteomics in Clinical Research Statistical Learning for Analyzing Functional Genomic Data German Cancer Research Center, Heidelberg, Germany June 16, 6 Diagnostics signatures
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More informationHigh-dimensional covariance estimation based on Gaussian graphical models
High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,
More informationHIGH-DIMENSIONAL VARIABLE SELECTION WITH THE GENERALIZED SELO PENALTY
Vol. 38 ( 2018 No. 6 J. of Math. (PRC HIGH-DIMENSIONAL VARIABLE SELECTION WITH THE GENERALIZED SELO PENALTY SHI Yue-yong 1,3, CAO Yong-xiu 2, YU Ji-chang 2, JIAO Yu-ling 2 (1.School of Economics and Management,
More informationRegularization and Variable Selection via the Elastic Net
p. 1/1 Regularization and Variable Selection via the Elastic Net Hui Zou and Trevor Hastie Journal of Royal Statistical Society, B, 2005 Presenter: Minhua Chen, Nov. 07, 2008 p. 2/1 Agenda Introduction
More information