Path consistent model selection in additive risk model via Lasso

STATISTICS IN MEDICINE
Statist. Med. 2007; 26:3753–3770
Published online 16 February 2007 in Wiley InterScience (www.interscience.wiley.com)

Path consistent model selection in additive risk model via Lasso

Chenlei Leng 1 and Shuangge Ma 2,*,†

1 Department of Statistics and Applied Probability, National University of Singapore, Singapore
2 Department of Epidemiology and Public Health, Yale University, U.S.A.

SUMMARY

As a flexible alternative to the Cox model, the additive risk model assumes that the hazard function is the sum of the baseline hazard and a regression function of covariates. For right censored survival data where variable selection is needed along with model estimation, we propose a path consistent model selector using a modified Lasso approach, under the additive risk model assumption. We show that the proposed estimator possesses the oracle variable selection and estimation property. Applications of the proposed approach to three right censored survival data sets show that the proposed modified Lasso yields parsimonious models with satisfactory estimation and prediction results. Copyright © 2007 John Wiley & Sons, Ltd.

KEY WORDS: additive risk model; Lasso; oracle properties; variable selection

1. INTRODUCTION

In clinical studies, it is common to have multiple covariates measured along with censored survival outcomes. Our study is partly motivated by examples like the famous primary biliary cirrhosis (PBC) study [1], where 17 covariates along with a right censored survival outcome were recorded for 312 patients. For variable selection, Fleming and Harrington [1] applied a step-down procedure using the Wald test under the Cox model. It was shown that only a subset of the 17 covariates is in fact associated with the censored clinical outcome. From a clinical point of view, separating important covariates from unimportant ones may lead to a better understanding of the causal relationships and more interpretable results. Statistically speaking, including covariates not associated with the outcome in the model fitting leads to unstable estimates and loss of power. So for data sets like the PBC, variable selection is needed along with model estimation. In this article, we consider variable selection for right censored survival data with multiple covariates, under the additive risk model assumption.

* Correspondence to: Shuangge Ma, Department of Epidemiology and Public Health, Division of Biostatistics, Yale University, New Haven, CT, U.S.A.
† E-mail: shuangge.ma@yale.edu

Contract/grant sponsor: NUS
Contract/grant sponsor: NIH; contract/grant number: RR

Received 13 March 2006
Accepted 18 December 2006
Copyright © 2007 John Wiley & Sons, Ltd.

The additive risk model for time to event data assumes that the conditional hazard function for the failure time T of interest is associated with a p-vector covariate Z(·) via

$$\lambda(t; Z) = \lambda_0(t) + (\beta^0)' Z(t)$$

where $\lambda_0$ is the unknown baseline hazard function and $\beta^0$ is the unknown regression parameter. The additive risk model assumes that the risk factors contribute to the hazard in an additive manner and provides a useful alternative to the Cox model [2], especially when the proportional hazards assumption is violated. The additive risk model has been extensively studied and demonstrated to have satisfactory biological and statistical implications [3-8]. With right censored data and the additive risk model, Lin and Ying [7] demonstrated that the estimating equation has a simple least-squares form. This property, which is not shared by the Cox model, makes estimation with multiple covariates computationally affordable.

We are most interested in the case where some components of $\beta^0$ are extremely small or exactly zero, which are referred to as zero covariate effects (as opposed to non-zero covariate effects) in this article. In this case, it is essential to carry out variable selection and identify the important non-zero covariate effects. A comprehensive account of variable selection in linear regression can be found in [9]. Breiman [10, 11] demonstrated that traditional methods, such as best subset selection or stepwise approaches, may suffer from instability and lack of accuracy. A number of penalized approaches, where variable selection is achieved by penalizing the complexity of models, were proposed to tackle these two problems. In particular, Breiman [10] recommended the non-negative garrote estimate, which was further extended to the Lasso [12]. In right censored survival analysis, the Lasso method for the Cox model was investigated by Tibshirani [13]. Variable selection in the accelerated failure time (AFT) model via the Lasso and gradient descent regularization was studied in [14]. Fan and Li [15] proposed the SCAD estimator, which possesses the oracle property; namely, the model can be estimated as if the true sub-model were known in advance. Compared with traditional methods like the step-down approach, penalization methods have better theoretical grounding and more satisfactory finite sample performance.

Compared with the Cox model and the AFT model, studies of the additive risk model remain rare. In the gene selection context, Ma and Huang [16] proposed a simple Lasso approach to select relevant predictors for right censored survival data with multiple covariates under the additive risk model assumption. A careful examination shows that their implementation is a hybrid Lasso, in the sense that the Lasso is operated on the estimating equation. Recent theoretical studies show that the simple Lasso method is not generally path consistent [17-19], in the sense that, with probability greater than zero, the whole solution path of the Lasso may not contain the true model even if there is one, if the design matrix satisfies certain conditions. More troublesome is the fact that the Lasso does not select the right model when tuned by a prediction criterion, even if the solution path contains the true model [20]. In order to remedy those problems, Zou [17] and Wang et al.
[21] proposed a modified Lasso procedure that adaptively penalizes each coefficient and showed that the resulting estimator has the oracle property. Zou [17] further discussed various choices of adaptive penalties and demonstrated convincingly the usefulness of the improved Lasso method. In a related article, Yuan and Lin [18] showed that the non-negative garrote estimator yields path consistent models. Compared with other penalization methods, Lasso-based methods may be preferred since the objective function is convex and the computational cost is relatively small.

In this article, we consider variable selection for data like the PBC example, under the additive risk model assumption. We propose an estimating equation-based Lasso method to select relevant predictors. We show that a simple modification of the Lasso yields a path consistent estimate in terms of model selection and the root-n convergence rate for estimating the regression coefficients. The asymptotic properties of the estimator proposed by Ma and Huang [16] can in fact be established in a similar manner. Compared with [16], the proposed approach is theoretically path consistent without losing computational simplicity or sacrificing prediction power.

The rest of the article is organized as follows. Section 2 presents Lin and Ying's [7] estimator for the additive risk model, the Lasso and the modified Lasso estimates. In Section 3, we show that the modified Lasso estimator possesses the oracle property. In Section 4, we outline the fast Lars-Lasso algorithm to compute the whole solution path, which is significantly different from the L1 boosting in [16]. In addition, we propose choosing the penalty based on V-fold cross-validation. In Section 5, we analyse the PBC data and two other survival data sets. Section 6 gives a short conclusion. Proofs are relegated to Section 7.

2. ADDITIVE RISK MODEL WITH MODIFIED LASSO

2.1. Additive risk model

Consider right censored survival data, where the event time of interest T is subject to random censoring C. We also assume a length-p covariate Z is present. Consider a set of n independent subjects such that the observed counting process $\{N_i(t), t \ge 0\}$ counts the number of events for the ith subject up to time t. Let $\{Y_i(t), t \ge 0\}$ be the at-risk process for the ith individual. Further denote the cumulative baseline hazard function for $\lambda_0(t)$ as $\Lambda_0(t) = \int_0^t \lambda_0(u)\,du$. The intensity for $N_i(t)$ is then given by

$$Y_i(t)\,d\Lambda(t; Z_i) = Y_i(t)\{d\Lambda_0(t) + (\beta^0)' Z_i(t)\,dt\}$$

where $\Lambda$ is the conditional cumulative hazard function. For detailed data and model assumptions, please refer to [7]. Lin and Ying [7] proposed solving the following estimating equation to estimate $\beta^0$:

$$U(\beta) = \sum_{i=1}^n \int_0^\infty Z_i(t)\{dN_i(t) - Y_i(t)\,d\hat\Lambda_0(\beta, t) - Y_i(t)\beta' Z_i(t)\,dt\}$$

where

$$\hat\Lambda_0(\beta, t) = \int_0^t \frac{\sum_{j=1}^n \{dN_j(u) - Y_j(u)\beta' Z_j(u)\,du\}}{\sum_{j=1}^n Y_j(u)}$$

is an estimate of $\Lambda_0(t)$. Thus, the estimating equation can be written as

$$U(\beta) = \sum_{i=1}^n \int_0^\infty \{Z_i(t) - \bar Z(t)\}\{dN_i(t) - Y_i(t)\beta' Z_i(t)\,dt\}$$

where

$$\bar Z(t) = \sum_{j=1}^n Y_j(t) Z_j(t) \Big/ \sum_{j=1}^n Y_j(t)$$

The resulting estimator takes the explicit form

$$\hat\beta = \left[\sum_{i=1}^n \int_0^\infty Y_i(t)\{Z_i(t) - \bar Z(t)\}^{\otimes 2}\,dt\right]^{-1} \left[\sum_{i=1}^n \int_0^\infty \{Z_i(t) - \bar Z(t)\}\,dN_i(t)\right] = A_n^{-1} b_n \quad (1)$$

where $a^{\otimes 2} = aa'$ is a rank-one $p \times p$ matrix. Under mild regularity conditions, Lin and Ying [7] showed that the random vector $n^{-1/2} U(\beta^0)$ converges weakly to a normal variable with mean zero and a covariance matrix B, which can be consistently estimated by

$$\hat B = n^{-1} \sum_{i=1}^n \int_0^\infty \{Z_i(t) - \bar Z(t)\}^{\otimes 2}\,dN_i(t)$$

Moreover, $n^{1/2}(\hat\beta - \beta^0)$ converges weakly to a p-variate normally distributed variable with mean zero and a covariance matrix $\Sigma = A^{-1} B A^{-1}$, which can be consistently estimated by $\hat\Sigma = \hat A^{-1} \hat B \hat A^{-1}$, where

$$\hat A = n^{-1} \sum_{i=1}^n \int_0^\infty Y_i(t)\{Z_i(t) - \bar Z(t)\}^{\otimes 2}\,dt$$
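To make the estimator (1) concrete, the following is a minimal numerical sketch (our own illustration, not code from the paper), assuming time-independent covariates so that $\bar Z(t)$ is piecewise constant between the observed times; the function name lin_ying and its interface are hypothetical.

```python
import numpy as np

def lin_ying(time, event, Z):
    """Lin-Ying closed-form estimator beta_hat = A_n^{-1} b_n for the
    additive risk model, assuming time-independent covariates.
    time : (n,) observed times X_i = min(T_i, C_i)
    event: (n,) event indicators delta_i (1 = failure, 0 = censored)
    Z    : (n, p) covariate matrix
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    Z = np.asarray(Z, dtype=float)
    n, p = Z.shape

    s = np.unique(time)                          # distinct observed times
    widths = np.diff(np.concatenate(([0.0], s))) # interval lengths

    A_n = np.zeros((p, p))
    b_n = np.zeros(p)
    for sk, dk in zip(s, widths):
        at_risk = time >= sk                     # Y_i(t) = 1 on (s_{k-1}, s_k]
        Zbar = Z[at_risk].mean(axis=0)           # risk-set average Zbar(t)
        D = Z[at_risk] - Zbar
        A_n += dk * D.T @ D                      # sum_i int Y_i {Z_i - Zbar}^{ox2} dt
        fail = (time == sk) & (event == 1)       # dN_i jumps at observed failures
        b_n += (Z[fail] - Zbar).sum(axis=0)      # sum_i int {Z_i - Zbar} dN_i
    beta_hat = np.linalg.solve(A_n, b_n)
    return beta_hat, A_n, b_n
```

Since the Lasso and mLasso criteria below depend on the data only through $A_n$ and $b_n$, the routine returns them alongside $\hat\beta$, e.g. `beta_hat, A_n, b_n = lin_ying(time, event, Z)`.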

2.2. Modified Lasso

In the case of multiple covariates with the additive risk model, especially when the number of covariates is large compared with the sample size, empirical studies reveal that the estimated variances for different coefficients may differ by several orders of magnitude, which indicates an unstable estimate (results not shown). Thus, dimension reduction or variable selection is needed along with estimation to remove zero covariate effects. Various dimension reduction methods such as principal component analysis [22] or partial least squares [23] can be applied. However, with dimension reduction methods, linear combinations of all covariates are usually used. This can lead to unclear interpretation of the model fitting results. Moreover, if certain covariates are not related to the outcome, it is important to remove those covariates from the model fitting. In this sense, variable selection techniques may be preferred.

Variable selection can be achieved with penalization methods. Inspired by the successes of the Lasso for the Cox and AFT models, we adopt the Lasso procedure and propose estimating $\beta$ by minimizing

$$\frac12[\beta' A_n \beta - 2\beta' b_n] + n\lambda_n \sum_{j=1}^p |\beta_j| \quad (2)$$

where $\lambda_n$ is the tuning parameter that essentially governs the bias-variance trade-off of the estimate. With a slight abuse of notation, we still denote the minimizer of (2) as $\hat\beta$. When $\lambda_n = 0$, we recover the Lin-Ying estimator in (1); when $\lambda_n = \infty$, $\hat\beta = 0$. The first part of (2) mimics the loss function in linear regression, especially if we can write $A_n = X'X$ and $b_n = X'y$, where X is the design matrix and y is the response vector. If $\beta$ is Lin-Ying's estimate, then the first part attains its minimum. So, loosely speaking, it measures the goodness-of-fit, and with a slight abuse of notation we refer to it as the loss function.

Since $A_n$ is positive definite, we can decompose it via the singular value decomposition as $A_n = V'QV$, where V is an orthogonal matrix and $Q = \mathrm{diag}(\sigma_1, \ldots, \sigma_p)$ with $\sigma_1 \ge \cdots \ge \sigma_p > 0$. By denoting $X = Q^{1/2} V$ and $y = Q^{-1/2} V b_n$, (2) takes the familiar Lasso form up to a constant:

$$\frac12 \|y - X\beta\|^2 + n\lambda_n \sum_{j=1}^p |\beta_j|$$

Note here that X is a $p \times p$ rather than an $n \times p$ matrix, and y is a p-vector, which is significantly different from the standard linear regression formulation. Ma and Huang [16] suggested replacing X by $A_n$ and y by $b_n$ and minimizing the objective function

$$\frac{1}{2n} \|b_n - A_n\beta\|^2 + n\lambda_n \sum_{j=1}^p |\beta_j| \quad (3)$$

Up to a scale constant, the objective function (3) is equivalent to

$$\frac{1}{2n} \|X'y - X'X\beta\|^2 + n\lambda_n \sum_{j=1}^p |\beta_j|$$

which can easily be seen as a variant of the proposed Lasso estimate. As pointed out in the Introduction, the formulation in (2) or (3) may fail to give a path consistent model selection result under certain assumptions on the design matrix [19]. In particular, it is possible that (1) with probability greater than zero, the whole path may not include the true parameter value; or (2) even if it is included, a prediction-based approach cannot identify the true model. If either of these scenarios happens, we conclude that the variable selection result is not consistent. A simple remedy is to replace the penalty $\sum_{j=1}^p |\beta_j|$ by a data-dependent weighted L1 norm $\sum_{j=1}^p w_j |\beta_j|$, resulting in

$$\frac12[\beta' A_n \beta - 2\beta' b_n] + n\lambda_n \sum_{j=1}^p w_j |\beta_j| \quad (4)$$

where the $w_j$'s are non-negative weights to be specified later. We refer to the estimate defined by the minimizer of (4) as the mLasso (modified Lasso) estimate and focus on its theoretical and empirical properties in this article. In the linear regression paradigm, the mLasso has been investigated in [17]. The proposed estimate for the additive risk model is partly motivated by that study. However, significant differences exist. For example, the design matrix X no longer contains independent covariates, and its dimension no longer depends on the sample size. Moreover, the mLasso estimate with the additive risk model does not have the simplified form of [12] even under an orthogonality assumption on the covariates.
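As a sketch of how (4) can be reduced to a standard Lasso fit, the snippet below builds the $p \times p$ pseudo-design from the decomposition $A_n = V'QV$, absorbs the weights by rescaling columns, and calls scikit-learn's Lasso as a generic inner solver (the paper's own algorithm, given in Section 4, is Lars-based); the function mlasso and the calibration of alpha are our assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def mlasso(A_n, b_n, beta0, lam, n):
    """Sketch of the mLasso estimate (4): turn the quadratic loss
    0.5*(beta' A_n beta - 2 beta' b_n) into a p x p least-squares problem
    and absorb the adaptive weights w_j = 1/|beta0_j| into the design.
    Assumes A_n is positive definite and beta0 has no zero entries."""
    w = 1.0 / np.abs(beta0)                # adaptive weights
    # eigendecomposition A_n = V' Q V (A_n symmetric positive definite)
    evals, evecs = np.linalg.eigh(A_n)
    V = evecs.T
    X = np.sqrt(evals)[:, None] * V        # X = Q^{1/2} V, so X'X = A_n
    y = (V @ b_n) / np.sqrt(evals)         # y = Q^{-1/2} V b_n, so X'y = b_n
    # weighted L1 penalty via column rescaling: beta_j = d_j / w_j
    Xw = X / w[None, :]
    # sklearn's Lasso minimizes (1/(2*m))||y - X d||^2 + alpha*||d||_1 with
    # m = p rows here, so alpha = n*lam/p matches (4) up to a constant
    p = len(b_n)
    fit = Lasso(alpha=n * lam / p, fit_intercept=False).fit(Xw, y)
    return fit.coef_ / w
```

The column rescaling is exactly the reparametrization $d_j = w_j\beta_j$ used in Section 4, so this sketch and the Lars-type algorithm there solve the same problem.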

3. PROPERTIES OF THE MLASSO ESTIMATE

As pointed out by Fan and Li [15], a good penalized approach should result in an estimator with the oracle property; namely, the estimator should perform as well as if the true underlying model were known in advance. We show in this section that the proposed mLasso estimator does possess the oracle property if the $w_j$'s are appropriately chosen. Thus, the proposed mLasso is capable of consistently identifying the true non-zero covariate effects, and can achieve the root-n convergence rate for the estimated non-zero coefficients.

Without loss of generality, write $\beta^0 = (\beta^0_1, \ldots, \beta^0_r, \beta^0_{r+1}, \ldots, \beta^0_p)^T$, where $\beta^0_{r+1}, \ldots, \beta^0_p$ are zero. Define $I = \{j : \beta^0_j \ne 0\}$. We propose setting $w_j = 1/|\hat\beta^0_j|$, where $\hat\beta^0$ is an initial non-zero, consistent estimate of $\beta^0$.

Theorem 1 (Oracle property)
If $\max_j |\hat\beta^0_j - \beta^0_j| = O_p(\delta_n)$, $\sqrt{n}\,\lambda_n \to 0$ and $\sqrt{n}\,\lambda_n/\delta_n \to \infty$, then the minimizer of (4) satisfies the oracle property as $n \to \infty$, i.e. $\hat\beta_j = 0$ for $j \notin I$, and $\{\hat\beta_j : j \in I\}$ has the same limiting distribution as when $\lambda_n = 0$.

Similarly, for the estimate in (3), one has

Corollary 2
The minimizer of (3) has the oracle property if $\max_j |\hat\beta^0_j - \beta^0_j| = O_p(\delta_n)$, $\sqrt{n}\,\lambda_n \to 0$, $\sqrt{n}\,\lambda_n/\delta_n \to \infty$, and $\sum_j |\beta_j|$ is replaced by $\sum_j |\beta_j|/|\hat\beta^0_j|$.

Intuitively, by choosing $w_j = 1/|\hat\beta^0_j|$, the amount of penalization each coefficient receives is inversely proportional to its pre-determined significance. Therefore, the important coefficients are penalized to a lesser extent than the unimportant (smaller) ones. In this article, we assume certain estimation consistency of $\hat\beta^0$, and hence of w, to simplify the asymptotic proof. Careful inspection of the proof reveals that the consistency assumption can be replaced by the looser zero-consistency assumption, i.e. it is only needed that the $w_j$'s go to infinity for the zero components of $\beta^0$ and are asymptotically bounded for the non-zero components. We do not pursue this issue further in this article.

The above results establish that the mLasso estimates have the oracle property as long as the initial estimate $\hat\beta^0$ is $\delta_n$-consistent. The initial estimate does not have to be the optimal $\sqrt{n}$-consistent estimator. As a consequence, $\hat\beta^0$ can be obtained via many familiar estimating methods, for example, the standard Lin-Ying approach or ridge regression. For ridge regression with a penalty $\tau$, $\hat\beta^0$ is estimated as $\hat\beta^0 = (A_n + \tau I)^{-1} b_n$. For practical data analyses, if the initial estimate has zero or extremely small components, we propose setting the corresponding mLasso components to zero.

As in [12, 15], we can estimate the covariance matrix of $\hat\beta_C$ ($C = \{j : \hat\beta_j \ne 0\}$) by the sandwich formula

$$\hat\Sigma_{CC} = \{\hat A_{CC} + n\lambda_n D\}^{-1} \hat B_{CC} \{\hat A_{CC} + n\lambda_n D\}^{-1}$$

where $D = \mathrm{diag}(w_{C,1}/|\hat\beta_{C,1}|, \ldots, w_{C,d}/|\hat\beta_{C,d}|)$, $d = \#\{C\}$, and $\hat A_{CC}$ and $\hat B_{CC}$ are the sub-matrices of $\hat A$ and $\hat B$ corresponding to the set C.
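A short sketch of the two ingredients just described, the ridge initial estimate $(A_n + \tau I)^{-1} b_n$ and the sandwich covariance for the selected coefficients; the function names are ours, and the code assumes $\hat A$ and $\hat B$ have already been formed as in Section 2.1.

```python
import numpy as np

def ridge_initial(A_n, b_n, tau):
    """Ridge initial estimate beta0 = (A_n + tau*I)^{-1} b_n."""
    return np.linalg.solve(A_n + tau * np.eye(len(b_n)), b_n)

def sandwich_cov(A_hat, B_hat, beta_hat, w, lam, n):
    """Sandwich covariance estimate for the selected coefficients,
    with C = {j : beta_hat_j != 0} and D as defined above."""
    C = np.flatnonzero(beta_hat != 0)
    D = np.diag(w[C] / np.abs(beta_hat[C]))
    M = np.linalg.inv(A_hat[np.ix_(C, C)] + n * lam * D)
    return M @ B_hat[np.ix_(C, C)] @ M
```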

4. COMPUTATION AND TUNING

By letting $d_j = w_j \beta_j$, $j = 1, \ldots, p$, the mLasso problem is equivalent to

$$\min_d\ \frac12 [d' \tilde A_n d - 2 d' \tilde b_n] + n\lambda_n \sum_{j=1}^p |d_j| \quad (5)$$

where $\tilde A_n = \{\mathrm{diag}(w_1, \ldots, w_p)\}^{-1} A_n \{\mathrm{diag}(w_1, \ldots, w_p)\}^{-1}$ and $\tilde b_n = \{\mathrm{diag}(w_1, \ldots, w_p)\}^{-1} b_n$. The fast Lars-Lasso algorithm [24] can then be used to solve for d, with the slight twist that only $\tilde A_n$ and $\tilde b_n$ are available instead of X and y. If the estimate minimizing (5) is $\hat d$, then the estimate solving (4) is simply $\hat\beta_j = |\hat\beta^0_j| \hat d_j$. This is analogous to the non-negative garrote, except that the $d_j$'s are not constrained to be non-negative. Mimicking [25], we propose the following computational algorithm.

Algorithm for computing (5)
1. Start with $d = 0$, the active set $C = \arg\max_j |\tilde b_{n,j}|$, and the direction $\gamma$, a p-vector with $\gamma_C = \mathrm{sgn}(\tilde b_n)_C$ and $\gamma_{C^c} = 0$.
2. Compute how far the algorithm can proceed before a new variable joins the active set:
$$a_1 = \min\{a > 0 : |\{\tilde A_n(d + a\gamma) - \tilde b_n\}_j| = |\{\tilde A_n(d + a\gamma) - \tilde b_n\}_C|,\ j \notin C\}$$
3. Compute how far the algorithm can proceed before some $d_j$ in C hits zero:
$$a_2 = \min\{a > 0 : (d + a\gamma)_j = 0,\ j \in C\}$$
4. Let $a = \min(a_1, a_2)$; if $a = a_1$, add the variable attaining equality at a to C; if $a = a_2$, remove the variable attaining 0 at a from C. Update $d \leftarrow d + a\gamma$ and compute $\gamma$ as $\gamma_C = \tilde A_{n,CC}^{-1}\,\mathrm{sgn}(d_C)$ and $\gamma_{C^c} = 0$.
5. Go to step 2 until $\tilde A_n d = \tilde b_n$.

In order to find the $\lambda_n$ corresponding to an estimate $\hat d(\lambda_n)_C$, we note that

$$\hat d(\lambda_n)_C = \tilde A_{n,CC}^{-1}\{\tilde b_{n,C} - n\lambda_n\,\mathrm{sgn}(\hat d(\lambda_n))_C\}$$

and thus

$$\lambda_n = \frac{[\tilde b_{n,C} - \tilde A_{n,CC}\,\hat d(\lambda_n)_C]_i}{n\,[\mathrm{sgn}(\hat d(\lambda_n))_C]_i}$$

where $[a]_i$ is the ith entry of a.

Denote $L(\beta) = \frac12(\beta' A_n \beta - 2 b_n' \beta)$. The optimal $\lambda$ can be determined by V-fold cross-validation, which works as follows. Denote the full data set by S, so that the training and testing sets are $S - S_v$ and $S_v$, respectively, for $v = 1, \ldots, V$. For each $\lambda$, estimate $\hat\beta^v(\lambda)$ using the training data $S - S_v$ and compute the loss $L(\hat\beta^v(\lambda))$ on the testing set $S_v$. Define the cross-validation score as

$$\mathrm{CV}(\lambda) = \sum_{v=1}^V L(\hat\beta^v(\lambda))$$

We then choose $\lambda$ to minimize $\mathrm{CV}(\lambda)$. The same V-fold cross-validation is used for $\tau$ if $\hat\beta^0$ is estimated via ridge regression.

In [16], the Lasso estimate is obtained using an L1 boosting algorithm. The Lars algorithm yields an exact solution, instead of the approximate solution from boosting. The L1 boosting-based approach is appropriate when the number of covariates is extremely large (for example, in microarray studies). However, in clinical studies such as the PBC example, the Lars approach is computationally faster and better behaved.
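The cross-validation step can be sketched as follows, assuming (as we read the description above) that $\beta$ is fitted on each training split and the quadratic loss L is evaluated with the held-out fold's $(A_n, b_n)$; the data structures and names are our own.

```python
import numpy as np

def cv_choose_lambda(folds, lams, fit):
    """V-fold cross-validation for lambda_n, a sketch.
    folds: list of V pairs ((A_tr, b_tr, n_tr), (A_va, b_va)), each (A, b)
           computed by lin_ying() on the corresponding subset of subjects
    lams : candidate values of lambda_n
    fit  : solver closed over the initial estimate, e.g.
           lambda A, b, lam, n: mlasso(A, b, beta0, lam, n)
    Returns the lambda minimizing CV(lam) = sum_v L_v(beta_v(lam))."""
    cv = np.zeros(len(lams))
    for k, lam in enumerate(lams):
        for (A_tr, b_tr, n_tr), (A_va, b_va) in folds:
            beta = fit(A_tr, b_tr, lam, n_tr)                      # fit on S - S_v
            cv[k] += 0.5 * (beta @ A_va @ beta - 2 * b_va @ beta)  # loss on S_v
    return lams[int(np.argmin(cv))]
```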

5. DATA ANALYSIS

We apply the proposed approach to three well-investigated public data sets, chosen partly to facilitate comparison with published results. All three data sets consist of right censored survival data with multiple covariates. We refer to the corresponding publications for more experimental and data details.

5.1. PBC data

The Mayo Clinic has established a database of 424 patients having primary biliary cirrhosis (PBC) of the liver. Seventeen covariates were collected for 312 randomized patients. We focus on the 276 patients with no missing values in the covariates. A more detailed account of the PBC data can be found in [26]. Following [13], we standardize all the regressors so that they are more comparable in the penalization scheme.

We calculate the Lasso estimate with Tibshirani's Lasso formulation, and two mLasso estimates, with the initial estimates obtained from Lin-Ying's estimating equation (mLasso 1) and from ridge regression (mLasso 2). The tuning parameter $\lambda$ (and $\tau$ for mLasso 2) is chosen by 10-fold cross-validation. The estimation results are summarized in Table I. The Lasso formulation has the advantage of generating the whole solution path for varying $\lambda$, and we present the solution paths, together with the cross-validation scores, in Figure 1. It can be seen that the cross-validation scores are well defined as a function of the tuning parameter and there exist well-separated, unique minimizers.

Several interesting observations arise. Firstly, 14 estimated coefficients of the full additive risk model have the same signs as those of the Cox model; we refer to Table I of [13] for detailed estimates from the Cox model. The three covariates with different signs are alk, chol and hep. In the full additive model, these three variables are not marginally significant. Moreover, they are estimated as unimportant variables if a Lasso procedure is applied to either the Cox or the additive risk model. This suggests that the Cox and additive models lead to similar biological conclusions. Secondly, compared with the 12 selected covariates in Table I of [13], the Lasso-additive formulation gives a final model with 10 predictors, a subset of the Lasso-Cox model excluding chol and sex. We note that the Lasso-Cox estimates of chol and sex in [13] are very small and marginally not significant. Thirdly, in terms of the signs of the estimated non-zero coefficients, the Lasso, mLasso 1 and mLasso 2 agree with the full additive model, so no biological conclusions are significantly changed by carrying out variable selection. Fourthly, as can be seen in the top row of Figure 1, the Lasso seems to over-shrink the estimates towards zero, and the mLasso (especially mLasso 1) applies shrinkage to a smaller degree. For example, variables such as bili and asc, which eventually have larger coefficients than others, have larger coefficients in the early part of the mLasso 1 and mLasso 2 paths than in the case of the regular Lasso. Since mLasso 2 employs a shrinkage estimator, the ridge regression, as the initial estimate, mLasso 1 has the least shrinkage, which can be seen clearly from Figure 1. There is one fewer covariate (spid) in mLasso 2 than in the Lasso, and two fewer covariates (spid and prot) in mLasso 1 than in the Lasso.

For model fitting comparison, we propose using the time-dependent receiver operating characteristic (ROC) approach for censored data.
The time-dependent ROC technique was investigated in [27] in the context of medical diagnosis and has been used as a criterion for censored data regression with high-dimensional covariates [28]. The essential idea is to treat the event indicator as a binary outcome at each time point and evaluate the classification performance at each time using the standard ROC technique.

Table I. Estimation results for the PBC data: the Lasso refers to the additive model with the original Lasso penalty; mLasso 1 refers to the Lasso solution with $\hat\beta^0$ being Lin-Ying's estimator; mLasso 2 refers to the Lasso solution with $\hat\beta^0$ being the ridge regression estimator. For each of the full model, Lasso, mLasso 1 and mLasso 2, the columns give the coefficient (x 10^-4), SE (x 10^-4) and Z statistic for the variables age, alb, alk, asc, bili, chol, ed, hep, plat, prot, sex, sgot, spid, stage, trt, trig and cop. [Numerical entries not reproduced.]

Figure 1. Analysis of PBC data. Top row: Lasso; middle row: mLasso 1; bottom row: mLasso 2. Left panels: the cross-validation scores; right panels: the solution paths. Dotted lines: optimal tuning parameter chosen by cross-validation. Labelled covariates on the paths include bili, asc, ed, age, cop, spid, prot, stage, sgot and alb.

In the ROC approach, the area under the curve (AUC) can be used as the evaluation/comparison criterion; a larger AUC at time t indicates better model fitting of the survival outcome at time t, as measured by the sensitivity and specificity evaluated at time t.
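For illustration only, here is a deliberately naive version of the time-dependent AUC idea: at each t, subjects with an observed event by t are treated as cases and subjects still at risk as controls. The estimators of [27] additionally adjust for censoring, which this sketch omits.

```python
import numpy as np

def naive_auc_at_t(risk_score, time, event, t):
    """Naive time-dependent AUC at time t: cases are observed failures
    by t, controls are subjects still at risk after t. Illustrates the
    binary-outcome-at-each-t idea only; not censoring-adjusted."""
    cases = risk_score[(time <= t) & (event == 1)]
    controls = risk_score[time > t]
    if len(cases) == 0 or len(controls) == 0:
        return np.nan
    # P(case score > control score), with ties counted as 1/2
    diff = cases[:, None] - controls[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()
```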

Figure 2. Kernel density estimates of the p-values (log10-transformed) and the time-dependent ROC curves for the three data sets (PBC, UIS and BMT). The ROC panels show the full additive model, the Lasso, mLasso 1 and mLasso 2; for PBC, the Lasso-Cox fit is also included.

We can see from Figure 2 that the proposed mLasso approaches have model fitting performance similar to the full model, while using only a subset of the covariates. To investigate the prediction performance of the proposed estimators, we randomly split the 276 observations into training sets of size 200 and testing sets of size 76. The estimation is carried out using the training sets only. The linear risk scores $\hat\beta' Z$ for the testing sets are calculated using the estimates

$\hat\beta$ obtained from the training sets. We generate two hypothetical risk groups according to the median of the linear risk scores in the testing set. The survival functions of the two groups are compared: a chi-squared statistic (with one degree of freedom) and the corresponding p-value are computed. A similar evaluation method is adopted by Ma and Huang [16] and Li and Gui [28]. We repeat this procedure 100 times and summarize the log10 p-values in Figure 2. We see that mLasso 1 and mLasso 2 yield p-values similar to the Lasso, while overall, the penalized additive risk models give smaller p-values than the full additive risk model. Out of the 100 random training/testing splits, the regular Lasso gives 75 p-values smaller than those of the full model; mLasso 1 yields 63 and mLasso 2 yields 73 p-values smaller than those of the full model. To compare whether the p-values differ significantly in location, we use pairwise Wilcoxon rank-sum tests. The p-value between the Lasso and the full model is 0.008, between mLasso 1 and the full model it is 0.10, and between mLasso 2 and the full model it is … . So the Lasso-based approaches are significantly better than the full model. However, there is no significant difference between the Lasso and mLasso 1 (p-value 0.26), or between the Lasso and mLasso 2 (p-value 0.88). The mean model size for the regular Lasso is 9.55 with standard error 0.16; for mLasso 1 it is 7.20 with standard error 0.33; for mLasso 2 it is 8.37 with standard error … . The two mLasso estimates give smaller models while maintaining the same differentiation power between the low and high risk groups.

As a simple comparison, we also consider the PBC data with the Lasso-Cox estimate. We show in Figure 2 the time-dependent ROC using the estimation results in [13]. We can see that the additive models have dominating AUCs, which suggests that the additive models fit better than the Cox model.
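The random-split evaluation used here (and again for the UIS and BMT data below) can be sketched with the lifelines package's log-rank test; the helper below is hypothetical and takes a fitted coefficient vector plus the held-out data.

```python
import numpy as np
from lifelines.statistics import logrank_test

def split_evaluation(beta_hat, Z_test, time_test, event_test):
    """One train/test replicate of the evaluation above: dichotomize the
    test set at the median linear risk score beta' Z and compare the two
    survival curves with a log-rank (chi-squared, 1 d.f.) test."""
    score = Z_test @ beta_hat
    high = score > np.median(score)
    res = logrank_test(time_test[high], time_test[~high],
                       event_observed_A=event_test[high],
                       event_observed_B=event_test[~high])
    return res.test_statistic, res.p_value
```

Repeating this over many random splits and collecting the p-values reproduces the kind of summary shown in Figure 2.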
5.2. UIS data

This data set is a subset of data from the University of Massachusetts Aids Research Unit Impact Study [29]. Observations with missing entries are excluded and we focus on the 528 samples with complete records. The purpose of this study was to compare treatment programs of different planned durations designed to reduce drug abuse and to prevent high-risk HIV behaviour. The UIS sought to study how the covariates determine the time to return to drug use. As suggested by Hosmer and Lemeshow [29], we create three dummy variables for hercoc (heroin or cocaine use) and two dummy variables for ivhx (IV drug use history at admission). A total of 12 covariates are considered for the additive risk model. More discussion of the UIS data set can be found in [29].

We apply the proposed Lasso approach. The plot of the model features is close to Figure 1 and is omitted here. The fitted coefficients and their standard errors are listed in Table II. We note that all the estimated coefficients agree in sign with their counterparts from the Cox model (results not shown), which indicates similar biological conclusions. The Lasso gives a model with 10 covariates, while both mLasso 1 and mLasso 2 give a final model with 8 covariates. Model fitting comparison based on the time-dependent ROC is shown in Figure 2; results similar to those for the PBC data are observed.

In order to assess the prediction performance of the three regularization methods, we randomly split the data into training sets of sample size 400 and testing sets of size 128. As in the PBC analysis, we create hypothetical risk groups by ranking the testing samples according to their estimated risk scores. We repeat the splitting 100 times and summarize the results. It is observed that over the 100 runs, the regular Lasso gives a model with average size 9.12, mLasso 1 gives 7.64 and mLasso 2 gives … . Again, we see that the two mLasso algorithms give smaller models. Pairwise Wilcoxon tests for differences between the p-values from the three methods show that there is no significant difference among the three methods. See Figure 2 for the kernel density estimates of the log10 p-values.

Table II. Estimation results for the UIS data: the Lasso refers to the additive model with the original Lasso penalty; mLasso 1 refers to the Lasso solution with $\hat\beta^0$ being Lin-Ying's estimator; mLasso 2 refers to the Lasso solution with $\hat\beta^0$ being the ridge regression estimator. For each of the full model, Lasso, mLasso 1 and mLasso 2, the columns give the coefficient (x 10^-4), SE (x 10^-4) and Z statistic for the variables age, becktota, ndrugtx, race, treat, site, los, the three hercoc dummies and the two ivhx dummies. [Numerical entries not reproduced.]

Table III. Estimation results for the BMT data: the Lasso refers to the additive model with the original Lasso penalty; mLasso 1 refers to the Lasso solution with $\hat\beta^0$ being Lin-Ying's estimator; mLasso 2 refers to the Lasso solution with $\hat\beta^0$ being the ridge regression estimator. For each of the full model, Lasso, mLasso 1 and mLasso 2, the columns give the coefficient (x 10^-4), SE (x 10^-4) and Z statistic for the 14 covariates Z1-Z14. [Numerical entries not reproduced.]

5.3. BMT data

The BMT data are provided and analysed in [30]. We study the disease-free time after bone marrow transplant for 137 patients. Of particular interest is detecting influential covariate effects among the 14 measured covariates. We employ the proposed Lasso approach, and the fitted coefficients are given in Table III. In this example, we observe that the full Cox model (results not shown) and the full additive model give 11 coefficients with the same signs. The three coefficients with different signs are Z2, Z10 and Z11, which again are not important variables when a Lasso procedure is applied. The time-dependent ROC curves are shown in Figure 2. The time-dependent AUCs are reasonably large, which suggests satisfactory model fitting.

The prediction performance is examined by splitting the total sample into training sets (with 100 samples) and testing sets (with 37 samples); we repeat this 100 times. The average model size for the regular Lasso is 4.90; for mLasso 1 it is 3.97; for mLasso 2 it is … . Pairwise Wilcoxon tests confirm that the three penalized regression methods yield significantly better separation between the low and high risk groups, while there are no significant differences among the three Lasso approaches (results not shown).

6. CONCLUDING REMARKS

The additive risk model provides a useful alternative to the Cox model. For data like the PBC, the additive risk model may provide a better fit than the Cox model, as evaluated with the time-dependent ROC. In the presence of multiple covariates, the Lasso methodology can be applied to build reliable parsimonious models. In particular, we propose the mLasso approach and show that it yields path consistent model selection. The fast Lars-Lasso algorithm is applied to compute the whole solution path of the estimates, which greatly facilitates the adaptive choice of the tuning parameter. The proposed algorithm can easily be implemented using existing software; development of a public software package will be pursued in the future. We also discuss a sandwich-type formula for inferring the standard errors of the estimated parameters. We analyse three survival data sets using the proposed approach. Comparisons with the full additive model and the Cox model suggest satisfactory estimation and prediction performance. Extensive simulation studies will be pursued in a separate study to systematically investigate the small sample performance of the Lasso-based approaches under the additive risk model.

Compared with previous variable selection studies with censored data, we consider the widely used but less-investigated additive risk model. Combined with the previous studies in [13, 14], a more comprehensive framework for variable selection in survival analysis is now available. Compared with [16], the proposed mLasso has a better theoretical basis and similar computational cost. Empirical studies show that the mLasso can yield models smaller than those from the regular Lasso, yet with similar prediction performance. Compared with the regular Lasso, the mLasso has the disadvantage of requiring a consistent initial estimate, although this usually does not pose a serious problem. In this article, we assume that the covariates act linearly on the additive hazard, but this need not be true. We are currently investigating the extension to the non-parametric additive model

$$\lambda(t; Z) = \lambda_0(t) + \eta(Z(t))$$

where $\eta$ is a non-parametric function which will be fitted using popular non-parametric methods such as smoothing splines [31] or local polynomials [32]. This study will be reported in a later manuscript.

7. PROOFS

Proof of Theorem 1
Write the initial estimate of $\beta^0$ as $\hat\beta^0$, where $\max_j |\hat\beta^0_j - \beta^0_j| = O_p(\delta_n)$. Denote $\beta = \beta^0 + u/\sqrt{n}$ and define

$$V_n^{(0)}(u) = \frac12(\beta' A_n \beta - 2 b_n' \beta) + n\lambda_n \sum_{j=1}^p \frac{|\beta_j|}{|\hat\beta^0_j|} = \frac12\left[\left(\beta^0 + \frac{u}{\sqrt n}\right)' A_n \left(\beta^0 + \frac{u}{\sqrt n}\right) - 2 b_n'\left(\beta^0 + \frac{u}{\sqrt n}\right)\right] + n\lambda_n \sum_{j=1}^p \frac{|\beta^0_j + u_j/\sqrt n|}{|\hat\beta^0_j|}$$

and $V_n(u) = V_n^{(0)}(u) - V_n^{(0)}(0)$. Note that $V_n(u)$ is minimized at $\sqrt n(\hat\beta - \beta^0)$. First note that

$$V_n(u) = \frac12 u^T \frac{A_n}{n} u - \frac{(b_n - A_n\beta^0)^T}{\sqrt n}\, u + \sum_{j=1}^p \frac{\sqrt n\,\lambda_n}{|\hat\beta^0_j|}\,\sqrt n\left(|\beta^0_j + u_j/\sqrt n| - |\beta^0_j|\right)$$

By standard counting process arguments [33, 7], $A_n/n \to A$ and $(b_n - A_n\beta^0)/\sqrt n$ converges weakly to $w \sim N(0, B)$. It is easy to see that

$$\sqrt n\left(|\beta_j + u_j/\sqrt n| - |\beta_j|\right) \to u_j\,\mathrm{sgn}(\beta_j)\,I(\beta_j \ne 0) + |u_j|\,I(\beta_j = 0)$$

Since $\sqrt n\,\lambda_n \to 0$ and $\sqrt n\,\lambda_n/\delta_n \to \infty$, it follows that

$$\sum_{j=1}^p \frac{\sqrt n\,\lambda_n}{|\hat\beta^0_j|}\,\sqrt n\left(|\beta^0_j + u_j/\sqrt n| - |\beta^0_j|\right) \to \begin{cases} o_p(1) & \text{if } u_j = 0 \text{ for all } j \notin I \\ +\infty & \text{otherwise} \end{cases}$$

Therefore, $V_n(\cdot) \to_d V(\cdot)$, where

$$V(u) = \begin{cases} \dfrac12 u^T A u - u^T w & \text{if } u_j = 0 \text{ for all } j \notin I \\ +\infty & \text{otherwise} \end{cases}$$

Now partition the matrix A, w and u as

$$A = \begin{pmatrix} A_{II} & A_{II^c} \\ A_{I^cI} & A_{I^cI^c} \end{pmatrix} \quad (6)$$

where $A_{II}$ is $r \times r$, $A_{I^cI^c}$ is $(p - r) \times (p - r)$ and $A_{I^cI} = A_{II^c}^T$, and

$$w = \begin{pmatrix} w_I \\ w_{I^c} \end{pmatrix}, \qquad u = \begin{pmatrix} u_I \\ u_{I^c} \end{pmatrix} \quad (7)$$

Since $V(\cdot)$ is convex, by applying the arguments in [34, 35] (see also [36]), $\sqrt n(\hat\beta - \beta^0) \to_d \arg\min(V)$. That is, $\hat u_I \to_d A_{II}^{-1} w_I$ and $\hat u_{I^c} \to_d 0$. Thus, the $\sqrt n$-consistency part is proved, and we have $P(\mathcal A = r) \to 1$, where $\mathcal A = \#\{j : \hat\beta_j \ne 0,\ \beta^0_j \ne 0\}$.

It remains to show that $P(\mathcal B > 0) \to 0$ as $n \to \infty$, where $\mathcal B = \#\{j : \hat\beta_j \ne 0,\ \beta^0_j = 0\}$. We have shown that $\hat\beta - \beta^0 = O_p(1/\sqrt n)$. It is sufficient to show that, for some small $\epsilon_n = C/\sqrt n$ and $j = r + 1, \ldots, p$,

$$\frac{\partial V_n^{(0)}(\beta)}{\partial \beta_j} > 0 \quad \text{for } 0 < \beta_j < \epsilon_n \quad (8)$$

$$\frac{\partial V_n^{(0)}(\beta)}{\partial \beta_j} < 0 \quad \text{for } -\epsilon_n < \beta_j < 0 \quad (9)$$

To show (8), by differentiation we have, for $\beta_j > 0$,

$$\frac{\partial V_n^{(0)}(\beta)}{\partial \beta_j} = \sqrt n\,\frac{A_{n,j}^T \beta - b_{n,j}}{\sqrt n} + \frac{n\lambda_n}{|\hat\beta^0_j|} \quad (10)$$

where $A_{n,j}$ is the jth column of $A_n$ and $b_{n,j}$ is the jth entry of $b_n$. Note that

$$\frac{A_{n,j}^T \beta - b_{n,j}}{\sqrt n} = \frac{A_{n,j}^T \beta^0 - b_{n,j}}{\sqrt n} + \frac{A_{n,j}^T (\beta - \beta^0)}{\sqrt n} = O_p(1)$$

by the assumption that $\|\beta - \beta^0\| = O_P(1/\sqrt n)$. Since $\sqrt n\,\lambda_n/\delta_n \to \infty$, (10) is dominated by $n\lambda_n/|\hat\beta^0_j| > 0$. Similarly we can prove (9). This completes the proof of $P(\mathcal B > 0) \to 0$.

ACKNOWLEDGEMENTS

We would like to thank the associate editor and two referees for insightful comments that have led to significant improvement of this paper. Chenlei Leng's research is partially supported by NUS research grants. Shuangge Ma would like to thank the Yale Center for High Performance Computation in Biology and Biomedicine (NIH grant RR, which funded the instrumentation) for computing support.

REFERENCES

1. Fleming TR, Harrington DP. Counting Processes and Survival Analysis. Wiley: New York, 1991.
2. Cox DR. Regression models and life-tables (with discussion). Journal of the Royal Statistical Society, Series B 1972; 34:187-220.
3. Aalen O. A Model for Regression Analysis of Counting Processes. Lecture Notes in Statistics, vol. 2. Springer: Berlin, 1980.
4. Cox DR, Oakes D. Analysis of Survival Data. Chapman & Hall: London, 1984.
5. McKeague IW, Sasieni PD. A partly parametric additive risk model. Biometrika 1994; 81:501-514.
6. Thomas DC. Use of auxiliary information in fitting nonproportional hazards models. In Modern Statistical Methods in Chronic Disease Epidemiology. Wiley: New York, 1986.
7. Lin D, Ying Z. Semiparametric analysis of the additive risk model. Biometrika 1994; 81:61-71.
8. Breslow NE, Day NE. Statistical Methods in Cancer Research, vol. 2. IARC: Lyon, 1987.
9. Miller A. Subset Selection in Regression. Chapman & Hall: London.
10. Breiman L. Better subset regression using the nonnegative garrote. Technometrics 1995; 37:373-384.
11. Breiman L. Heuristics of instability and stabilization in model selection. The Annals of Statistics 1996; 24:2350-2383.

12. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 1996; 58:267-288.
13. Tibshirani R. The lasso method for variable selection in the Cox model. Statistics in Medicine 1997; 16:385-395.
14. Huang J, Ma S, Xie H. Regularized estimation in the accelerated failure time model with high dimensional covariates. Biometrics 2006; 62:813-820.
15. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 2001; 96:1348-1360.
16. Ma S, Huang J. Lasso method for additive risk models with high dimensional covariates. Technical Report 347, Department of Statistics and Actuarial Science, University of Iowa, Iowa City.
17. Zou H. The adaptive lasso and its oracle properties. Technical Report, University of Minnesota, Minneapolis.
18. Yuan M, Lin Y. On the nonnegative garrote estimator. Technical Report, School of Industrial and Systems Engineering, Georgia Institute of Technology.
19. Zhao P, Yu B. On model selection consistency of lasso. Manuscript.
20. Leng C, Lin Y, Wahba G. A note on the lasso and related procedures in model selection. Statistica Sinica 2006; 16(4):1273-1284.
21. Wang H, Li G, Jiang G. Robust regression shrinkage and consistent variable selection via the LAD-lasso. Manuscript.
22. Ma S, Kosorok MR, Fine JP. Additive risk models for survival data with high dimensional covariates. Biometrics 2006; 62:202-210.
23. Nguyen D, Rocke DM. Partial least squares proportional hazard regression for application to DNA microarray data. Bioinformatics 2002; 18:1625-1632.
24. Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression (with discussion). The Annals of Statistics 2004; 32:407-499.
25. Rosset S, Zhu J. Piecewise linear regularized solution paths. Manuscript.
26. Dickson E, Grambsch P, Fleming T, Fisher LD, Langworthy A. Prognosis in primary biliary cirrhosis: model for decision making. Hepatology 1989; 10:1-7.
27. Heagerty PJ, Lumley T, Pepe M. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 2000; 56:337-344.
28. Li H, Gui J. Partial Cox regression analysis for high-dimensional microarray gene expression data. Bioinformatics 2004; 20(Suppl. 1):i208-i215.
29. Hosmer DW, Lemeshow S. Applied Survival Analysis. Wiley: New York, 1999.
30. Klein JP, Moeschberger ML. Survival Analysis: Techniques for Censored and Truncated Data. Springer: Berlin.
31. Wahba G. Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 59. SIAM: Philadelphia, PA, 1990.
32. Fan J, Gijbels I. Local Polynomial Modelling and its Applications. Chapman & Hall: London, 1996.
33. Andersen PK, Gill RD. Cox's regression model for counting processes: a large sample study. The Annals of Statistics 1982; 10:1100-1120.
34. Geyer C. On the asymptotics of constrained M-estimation. The Annals of Statistics 1994; 22:1993-2010.
35. Geyer C. On the asymptotics of convex stochastic optimization. Manuscript.
36. Knight K, Fu W. Asymptotics for lasso-type estimators. The Annals of Statistics 2000; 28:1356-1378.


More information

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What? You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?) I m not goin stop (What?) I m goin work harder (What?) Sir David

More information

Simultaneous variable selection and class fusion for high-dimensional linear discriminant analysis

Simultaneous variable selection and class fusion for high-dimensional linear discriminant analysis Biostatistics (2010), 11, 4, pp. 599 608 doi:10.1093/biostatistics/kxq023 Advance Access publication on May 26, 2010 Simultaneous variable selection and class fusion for high-dimensional linear discriminant

More information

arxiv: v2 [stat.ml] 22 Feb 2008

arxiv: v2 [stat.ml] 22 Feb 2008 arxiv:0710.0508v2 [stat.ml] 22 Feb 2008 Electronic Journal of Statistics Vol. 2 (2008) 103 117 ISSN: 1935-7524 DOI: 10.1214/07-EJS125 Structured variable selection in support vector machines Seongho Wu

More information

Nonnegative Garrote Component Selection in Functional ANOVA Models

Nonnegative Garrote Component Selection in Functional ANOVA Models Nonnegative Garrote Component Selection in Functional ANOVA Models Ming Yuan School of Industrial and Systems Engineering Georgia Institute of Technology Atlanta, GA 3033-005 Email: myuan@isye.gatech.edu

More information

Lecture 5: Soft-Thresholding and Lasso

Lecture 5: Soft-Thresholding and Lasso High Dimensional Data and Statistical Learning Lecture 5: Soft-Thresholding and Lasso Weixing Song Department of Statistics Kansas State University Weixing Song STAT 905 October 23, 2014 1/54 Outline Penalized

More information

On Algorithms for Solving Least Squares Problems under an L 1 Penalty or an L 1 Constraint

On Algorithms for Solving Least Squares Problems under an L 1 Penalty or an L 1 Constraint On Algorithms for Solving Least Squares Problems under an L 1 Penalty or an L 1 Constraint B.A. Turlach School of Mathematics and Statistics (M19) The University of Western Australia 35 Stirling Highway,

More information

NEW METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO SURVIVAL ANALYSIS AND STATISTICAL REDUNDANCY ANALYSIS USING GENE EXPRESSION DATA SIMIN HU

NEW METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO SURVIVAL ANALYSIS AND STATISTICAL REDUNDANCY ANALYSIS USING GENE EXPRESSION DATA SIMIN HU NEW METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO SURVIVAL ANALYSIS AND STATISTICAL REDUNDANCY ANALYSIS USING GENE EXPRESSION DATA by SIMIN HU Submitted in partial fulfillment of the requirements

More information

Survival Prediction Under Dependent Censoring: A Copula-based Approach

Survival Prediction Under Dependent Censoring: A Copula-based Approach Survival Prediction Under Dependent Censoring: A Copula-based Approach Yi-Hau Chen Institute of Statistical Science, Academia Sinica 2013 AMMS, National Sun Yat-Sen University December 7 2013 Joint work

More information

Survival Regression Models

Survival Regression Models Survival Regression Models David M. Rocke May 18, 2017 David M. Rocke Survival Regression Models May 18, 2017 1 / 32 Background on the Proportional Hazards Model The exponential distribution has constant

More information

Univariate shrinkage in the Cox model for high dimensional data

Univariate shrinkage in the Cox model for high dimensional data Univariate shrinkage in the Cox model for high dimensional data Robert Tibshirani January 6, 2009 Abstract We propose a method for prediction in Cox s proportional model, when the number of features (regressors)

More information

Consistent Group Identification and Variable Selection in Regression with Correlated Predictors

Consistent Group Identification and Variable Selection in Regression with Correlated Predictors Consistent Group Identification and Variable Selection in Regression with Correlated Predictors Dhruv B. Sharma, Howard D. Bondell and Hao Helen Zhang Abstract Statistical procedures for variable selection

More information

Multi-state Models: An Overview

Multi-state Models: An Overview Multi-state Models: An Overview Andrew Titman Lancaster University 14 April 2016 Overview Introduction to multi-state modelling Examples of applications Continuously observed processes Intermittently observed

More information

Direct Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina

Direct Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:

More information

Regularization Paths

Regularization Paths December 2005 Trevor Hastie, Stanford Statistics 1 Regularization Paths Trevor Hastie Stanford University drawing on collaborations with Brad Efron, Saharon Rosset, Ji Zhu, Hui Zhou, Rob Tibshirani and

More information

Variable Selection for Highly Correlated Predictors

Variable Selection for Highly Correlated Predictors Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu Department of Statistics, University of Illinois at Urbana-Champaign WHOA-PSI, Aug, 2017 St. Louis, Missouri 1 / 30 Background Variable

More information

An Improved 1-norm SVM for Simultaneous Classification and Variable Selection

An Improved 1-norm SVM for Simultaneous Classification and Variable Selection An Improved 1-norm SVM for Simultaneous Classification and Variable Selection Hui Zou School of Statistics University of Minnesota Minneapolis, MN 55455 hzou@stat.umn.edu Abstract We propose a novel extension

More information

Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation

Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation Patrick J. Heagerty PhD Department of Biostatistics University of Washington 166 ISCB 2010 Session Four Outline Examples

More information

Marginal Screening and Post-Selection Inference

Marginal Screening and Post-Selection Inference Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2

More information

A Sparse Solution Approach to Gene Selection for Cancer Diagnosis Using Microarray Data

A Sparse Solution Approach to Gene Selection for Cancer Diagnosis Using Microarray Data A Sparse Solution Approach to Gene Selection for Cancer Diagnosis Using Microarray Data Yoonkyung Lee Department of Statistics The Ohio State University http://www.stat.ohio-state.edu/ yklee May 13, 2005

More information

On the Breslow estimator

On the Breslow estimator Lifetime Data Anal (27) 13:471 48 DOI 1.17/s1985-7-948-y On the Breslow estimator D. Y. Lin Received: 5 April 27 / Accepted: 16 July 27 / Published online: 2 September 27 Springer Science+Business Media,

More information

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1 Variable Selection in Restricted Linear Regression Models Y. Tuaç 1 and O. Arslan 1 Ankara University, Faculty of Science, Department of Statistics, 06100 Ankara/Turkey ytuac@ankara.edu.tr, oarslan@ankara.edu.tr

More information

Issues on quantile autoregression

Issues on quantile autoregression Issues on quantile autoregression Jianqing Fan and Yingying Fan We congratulate Koenker and Xiao on their interesting and important contribution to the quantile autoregression (QAR). The paper provides

More information

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Takeshi Emura and Hisayuki Tsukuma Abstract For testing the regression parameter in multivariate

More information

Lecture 5 Models and methods for recurrent event data

Lecture 5 Models and methods for recurrent event data Lecture 5 Models and methods for recurrent event data Recurrent and multiple events are commonly encountered in longitudinal studies. In this chapter we consider ordered recurrent and multiple events.

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data Xuelin Huang Department of Biostatistics M. D. Anderson Cancer Center The University of Texas Joint Work with Jing Ning, Sangbum

More information

Model Selection and Estimation in Regression with Grouped Variables 1

Model Selection and Estimation in Regression with Grouped Variables 1 DEPARTMENT OF STATISTICS University of Wisconsin 1210 West Dayton St. Madison, WI 53706 TECHNICAL REPORT NO. 1095 November 9, 2004 Model Selection and Estimation in Regression with Grouped Variables 1

More information

Published online: 10 Apr 2012.

Published online: 10 Apr 2012. This article was downloaded by: Columbia University] On: 23 March 215, At: 12:7 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 172954 Registered office: Mortimer

More information

Variable Selection in Cox s Proportional Hazards Model Using a Parallel Genetic Algorithm

Variable Selection in Cox s Proportional Hazards Model Using a Parallel Genetic Algorithm Variable Selection in Cox s Proportional Hazards Model Using a Parallel Genetic Algorithm Mu Zhu and Guangzhe Fan Department of Statistics and Actuarial Science University of Waterloo Waterloo, Ontario

More information

Survival Analysis Math 434 Fall 2011

Survival Analysis Math 434 Fall 2011 Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup

More information

A Magiv CV Theory for Large-Margin Classifiers

A Magiv CV Theory for Large-Margin Classifiers A Magiv CV Theory for Large-Margin Classifiers Hui Zou School of Statistics, University of Minnesota June 30, 2018 Joint work with Boxiang Wang Outline 1 Background 2 Magic CV formula 3 Magic support vector

More information

Iterative Selection Using Orthogonal Regression Techniques

Iterative Selection Using Orthogonal Regression Techniques Iterative Selection Using Orthogonal Regression Techniques Bradley Turnbull 1, Subhashis Ghosal 1 and Hao Helen Zhang 2 1 Department of Statistics, North Carolina State University, Raleigh, NC, USA 2 Department

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

Estimation in the l 1 -Regularized Accelerated Failure Time Model

Estimation in the l 1 -Regularized Accelerated Failure Time Model Estimation in the l 1 -Regularized Accelerated Failure Time Model by Brent Johnson, PhD Technical Report 08-01 May 2008 Department of Biostatistics Rollins School of Public Health 1518 Clifton Road, N.E.

More information

Efficiency Comparison Between Mean and Log-rank Tests for. Recurrent Event Time Data

Efficiency Comparison Between Mean and Log-rank Tests for. Recurrent Event Time Data Efficiency Comparison Between Mean and Log-rank Tests for Recurrent Event Time Data Wenbin Lu Department of Statistics, North Carolina State University, Raleigh, NC 27695 Email: lu@stat.ncsu.edu Summary.

More information

Saharon Rosset 1 and Ji Zhu 2

Saharon Rosset 1 and Ji Zhu 2 Aust. N. Z. J. Stat. 46(3), 2004, 505 510 CORRECTED PROOF OF THE RESULT OF A PREDICTION ERROR PROPERTY OF THE LASSO ESTIMATOR AND ITS GENERALIZATION BY HUANG (2003) Saharon Rosset 1 and Ji Zhu 2 IBM T.J.

More information

Proportional hazards regression

Proportional hazards regression Proportional hazards regression Patrick Breheny October 8 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/28 Introduction The model Solving for the MLE Inference Today we will begin discussing regression

More information

Survival Analysis. Lu Tian and Richard Olshen Stanford University

Survival Analysis. Lu Tian and Richard Olshen Stanford University 1 Survival Analysis Lu Tian and Richard Olshen Stanford University 2 Survival Time/ Failure Time/Event Time We will introduce various statistical methods for analyzing survival outcomes What is the survival

More information

Classification Ensemble That Maximizes the Area Under Receiver Operating Characteristic Curve (AUC)

Classification Ensemble That Maximizes the Area Under Receiver Operating Characteristic Curve (AUC) Classification Ensemble That Maximizes the Area Under Receiver Operating Characteristic Curve (AUC) Eunsik Park 1 and Y-c Ivan Chang 2 1 Chonnam National University, Gwangju, Korea 2 Academia Sinica, Taipei,

More information

ISyE 691 Data mining and analytics

ISyE 691 Data mining and analytics ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

Lecture 6: Methods for high-dimensional problems

Lecture 6: Methods for high-dimensional problems Lecture 6: Methods for high-dimensional problems Hector Corrada Bravo and Rafael A. Irizarry March, 2010 In this Section we will discuss methods where data lies on high-dimensional spaces. In particular,

More information

arxiv: v1 [stat.me] 30 Dec 2017

arxiv: v1 [stat.me] 30 Dec 2017 arxiv:1801.00105v1 [stat.me] 30 Dec 2017 An ISIS screening approach involving threshold/partition for variable selection in linear regression 1. Introduction Yu-Hsiang Cheng e-mail: 96354501@nccu.edu.tw

More information