
NEW METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO SURVIVAL ANALYSIS AND STATISTICAL REDUNDANCY ANALYSIS USING GENE EXPRESSION DATA

by

SIMIN HU

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Dissertation Adviser: Dr. J. SUNIL RAO

Department of Epidemiology and Biostatistics

CASE WESTERN RESERVE UNIVERSITY

January 2007

CASE WESTERN RESERVE UNIVERSITY
SCHOOL OF GRADUATE STUDIES

We hereby approve the dissertation of candidate for the Ph.D. degree*.

(signed) (chair of the committee) (date)

*We also certify that written approval has been obtained for any proprietary material contained therein.

Table of Contents

List of Tables
List of Figures
Acknowledgements
Abstract

Penalized Weighted Least Squares Method for Estimation and Variable Selection in Accelerated Failure Time Models with High Dimensional Gene Expression Data
    Introduction
    Survival analysis and right censored data, AFT models, weighted least squares method and common regularization techniques
        Survival analysis and right censored data
        AFT models
        Stute's weighted least squares method for AFT models
    Penalized weighted least squares method
        Censoring constraints
        Ridge and LASSO penalization
    Algorithms for finding the penalized solutions
        Quadratic programming
        Variable selection
        Algorithms for finding the penalized solutions
        Parameter tuning
    Simulation studies
        Simulation 1: comparison of model estimation between algorithm QP and algorithm QPCC when p < n
        Simulation 2: comparison of the performances in variable selection and prediction accuracy between algorithms QP, QPCC, QPLASSO and QPCCLASSO when p < n
        Simulation 3: comparison of the performances in variable selection between algorithms QP, QPCC, QPLASSO and QPCCLASSO when p > n
        Simulation 4: comparison of the performances in variable selection and prediction accuracy between algorithms QP, QPCC, QPLASSO and QPCCLASSO when p > n
    Real data examples
        Stanford heart transplant data (p < n)
        Diffuse large B-cell lymphoma survival data (p > n)
    Discussion

Statistical Redundancy Testing for Improved Gene Selection in Cancer Classification Using Microarray Data
    Introduction
    Background on gene expression data and related algorithms
        DIANA (Divisive Analysis) clustering algorithm
        FLDA
    Statistical redundancy testing and gene selection algorithms
        Test statistic: eigenvalue-ratio
        Bootstrap re-sampling test for statistical redundancy
        Gene selection methods
    Simulations
        Simulation 1
        Simulation 2
        Simulation 3
    Gene selection for microarray datasets
        Gene selection for leukemia study
        Gene selection for colon cancer study
    Discussion

Appendix I
Appendix II
Appendix III
Bibliography

List of Tables

Table 1.1 Distributions for AFT models
Table 1.2 Simulated data
Table 1.3 Model fitted under different λ0
Table 1.4 Estimate of β2 and standard deviation
Table 1.5 Estimates of βs with standard deviations using QP and QPCC when the censoring percentage is 30%
Table 1.6 Estimates of βs with standard deviations using QP and QPCC when the censoring percentage is 70%
Table 1.7 Mean-squared errors for test data and model selection for training data at 2 different censoring percentages, based on 100 replications. The 3rd and 5th columns show the frequencies of selecting exactly the correct model; the numbers in parentheses are the frequencies of retaining all 6 relevant variables in the final model
Table 1.8 Number of variables selected and number of relevant variables selected at different censoring percentages. The numbers in parentheses are the numbers of relevant variables selected in the final model
Table 1.9 Mean mean-squared errors for test data at 2 different censoring percentages, based on 50 replications. The 3rd and 5th columns show the frequencies of selecting exactly the correct model; the numbers in parentheses are the frequencies of retaining all 20 relevant variables in the final model
Table 1.10 Survival analysis results of Stanford heart transplant data with different estimation methods
Table 1.11 DLBCL WLSE and MSE of selected models using different algorithms
Table 1.12 DLBCL log-rank test results for groups defined according to linear risk scores
Table 1.13 DLBCL log-rank test results for groups defined according to Cox scores
Table 1.14 Genes selected by the four algorithms that belong to the 17 genes used to build the Cox model in Rosenwald et al. (2002)
Table 1.15 Genes selected with the QPCCLASSO algorithm
Table 1.16 Number of common genes between the lists of genes selected by the four algorithms
Table 2.1 Gene p-values for statistical redundancy test and t-test
Table 2.2 Gene filtering process in Algorithm
Table 2.3 Genes selected under different pair-wise correlations
Table 2.4 Selected genes for leukemia data
Table 2.5 Golub's 50 genes that are selected and/or filtered
Table 2.6 LOOCV errors of different classification models and gene selection methods for leukemia data
Table 2.7 The 46 selected genes for colon cancer data
Table 2.8 LOOCV errors of different classification models and gene selection methods for colon cancer data
Table I.1 DLBCL survival training data results with the four algorithms

List of Figures

Figure 1.1 Line plot in calendar time for four subjects in a hypothetical follow-up study
Figure 1.2 Line plot in the time scale for four subjects
Figure 1.3 Plot of models fitted under different λ0 (λ0 = 0, 0.1, 0.5, ...). o uncensored data; censored data
Figure 1.4 Slope and intercept vs. λ0
Figure 1.5 Data: n = 30, p = 10, corr(i, j) = 0.5, σ = 2, and censoring percentage = 30%. Tuning parameters: λ0 = 1 and λ2. Each plot has an overlaid line. o uncensored data; censored data. The overlaid lines have intercept 0 and slope 1. The plots in the left panel show the predicted survival times versus the observed survival times; the plots in the right panel show the predicted survival times versus the true survival times
Figure 1.6 Data: n = 30, p = 10, corr(i, j) = 0.5, σ = 2, and censoring percentage = 70%. Tuning parameters: λ0 = 1 and λ2. Each plot has an overlaid line. o uncensored data; censored data. The overlaid lines have intercept 0 and slope 1. The plots in the left panel show the predicted survival times versus the observed survival times; the plots in the right panel show the predicted survival times versus the true survival times
Figure 1.7 Box-plots of mean-squared error on test data when p = 8 and n = 50. Methods 1, 2, 3 and 4 represent algorithms QP, QPCC, QPLASSO and QPCCLASSO, respectively. The left panel shows the results when the censoring percentage is 30%; the right panel shows the results when the censoring percentage is 70%
Figure 1.8 Scatter plots of predicted log-survival time vs. actual log-survival time. The top left panel is for the model selected by QP; the top right panel for QPCC; the bottom left panel for QPLASSO; the bottom right panel for QPCCLASSO. o uncensored data; censored data. The overlaid lines have intercept 0 and slope 1. Censoring percentage = 30%
Figure 1.9 Scatter plots of predicted log-survival time vs. actual log-survival time. The top left panel is for the model selected by QP; the top right panel for QPCC; the bottom left panel for QPLASSO; the bottom right panel for QPCCLASSO. o uncensored data; censored data. The overlaid lines have intercept 0 and slope 1. Censoring percentage = 50%
Figure 1.10 Scatter plots of predicted log-survival time vs. actual log-survival time. The top left panel is for the model selected by QP; the top right panel for QPCC; the bottom left panel for QPLASSO; the bottom right panel for QPCCLASSO. o uncensored data; censored data. The overlaid lines have intercept 0 and slope 1. Censoring percentage = 70%
Figure 1.11 Box-plots of mean-squared error on test data when p = 200 and n = 100. Methods 1, 2, 3 and 4 represent algorithms QP, QPCC, QPLASSO and QPCCLASSO, respectively. The left panel shows the results when the censoring percentage is 30%; the right panel shows the results when the censoring percentage is 70%
Figure 1.12 Scatter plots of predicted log-survival time vs. log-survival time for the training data. The top left panel is for the model selected by QP; the top right panel for QPCC; the bottom left panel for QPLASSO; the bottom right panel for QPCCLASSO. o uncensored data; censored data. The overlaid lines have intercept 0 and slope 1
Figure 1.13 Scatter plots of predicted log-survival time vs. log-survival time for the validation data. The top left panel is for the model selected by QP; the top right panel for QPCC; the bottom left panel for QPLASSO; the bottom right panel for QPCCLASSO. o uncensored data; censored data. The overlaid lines have intercept 0 and slope 1
Figure I.1 Scatter plots of predicted log-survival time vs. log-survival time for the training data. The top left panel is for the model selected by QP; the top right panel for QPCC; the bottom left panel for QPLASSO; the bottom right panel for QPCCLASSO. o uncensored data; censored data. The overlaid lines have intercept 0 and slope 1
Figure I.2 Scatter plots of predicted log-survival time vs. log-survival time for the validation data. The top left panel is for the model selected by QP; the top right panel for QPCC; the bottom left panel for QPLASSO; the bottom right panel for QPCCLASSO. o uncensored data; censored data. The overlaid lines have intercept 0 and slope 1

Acknowledgements

My greatest gratitude goes to Professor J. Sunil Rao, my thesis advisor. His insightful guidance, persistent support and encouragement have given me the greatest experience. Thanks also go to Professors Mark Adams, Jeffrey M. Albert and Pingfu Fu; I am honored to have them on my Ph.D. committee, and their advice has greatly improved this work. Thanks to Dr. Albert for being my academic advisor and supporting me throughout these years, and to Dr. Fu for many inspiring discussions and constructive suggestions on my research work. I am also grateful to Guan and Jinjing for their friendship and support. This dissertation is dedicated to my mother Defeng Liang, my husband Baofu Duan and my daughter Kelly Duan; I give them my deepest love and appreciation for their unconditional love and constant support.

New Methods for Variable Selection with Applications to Survival Analysis and Statistical Redundancy Analysis Using Gene Expression Data

Abstract

by

SIMIN HU

An important application of microarray research is to develop cancer diagnostic and prognostic tools based on tumor genetic profiles. For ease of interpretation, such studies aim to identify, from at least thousands of genes, a small fraction of genes with which to build molecular predictors of clinical outcomes, and thus require methodologies that can model high dimensional covariates and accomplish variable selection simultaneously. One interesting area is modeling cancer patients' survival time, or time to cancer recurrence, with gene expression data. In the first part of this dissertation, we propose a new penalized weighted least squares method for model estimation and variable selection in accelerated failure time models. In this method, right censored observations are used as censoring constraints in optimizing the weighted least squares objective function. We also include a ridge penalty to deal with singularity caused by collinearity and high dimensionality, and use the least absolute shrinkage and selection operator to achieve model parsimony. Simulation studies demonstrate that adding censoring constraints improves model estimation and variable selection, especially for data with high dimensional covariates. Real data examples show that our method is able to identify genes that are relevant to patient survival times.

Another interesting area is cancer subtype classification using gene expression profiles. One important issue is to reduce the redundancy caused by correlation among genes. Since genes with correlated expression levels may be co-expressed or belong to the same biological pathway related to the disease, including such genes in classifiers provides very little additional information. In the second part of the dissertation, we define an eigenvalue-ratio statistic to measure a gene's contribution to the joint discriminability of a set of genes. Based on this eigenvalue-ratio statistic, we define a novel hypothesis test for gene statistical redundancy and propose two gene selection methods. Simulation studies illustrate the agreement between the statistical redundancy test and the gene selection methods. Real data examples show the effectiveness of our eigenvalue-ratio-based gene selection methods. We also demonstrate that the selected compact gene subsets can not only be used to build high quality cancer classifiers but also have biological relevance.

Penalized Weighted Least Squares Method for Estimation and Variable Selection in Accelerated Failure Time Models with High Dimensional Gene Expression Data

Introduction

Recently there has been a great deal of interest in survival analysis with right censored data and high-dimensional covariates. This is mainly driven by studies using gene expression profiles to predict patient survival time or time to cancer recurrence (e.g., Rosenwald et al., 2002; van't Veer et al., 2002; Beer et al., 2002). For ease of interpretation, such studies aim to identify the best predictors of survival times from at least thousands of genes, and thus need methodologies that can simultaneously address the censoring problem, deal with high dimensionality and accomplish variable selection. Dimension reduction techniques such as supervised principal component analysis, sliced inverse regression, the partial least squares method and kernel methods have been extended to Cox models (e.g., Bair and Tibshirani, 2004; Li and Li, 2004; Li and Gui, 2004; Park et al., 2003; Li and Luan, 2003) and accelerated failure time models (e.g., Huang and Harrington, 2002). There are also some survival ensemble methods developed for survival prediction with gene expression data. Li and Luan (2005) proposed a smoothing spline based boosting algorithm for the non-parametric additive Cox model. Hothorn et al. (2006) introduced a random forest algorithm and a generic gradient boosting algorithm to construct prognostic accelerated failure time models. None of these methods is a rigorous variable selection method, since all the covariates are used to build compound covariates or ensemble models.

Tibshirani (1997) used the LASSO method for variable selection and shrinkage in Cox's proportional hazards model by minimizing the log partial likelihood subject to the sum of the absolute values of the parameters being bounded by a constant. The LASSO imposes an L1-penalty on the regression coefficients and tends to produce some exactly zero coefficients, and thus can produce a parsimonious model (Tibshirani, 1996). This LASSO method for the Cox model cannot be directly applied to high-dimensional gene expression data. Some efforts have been made in the literature for modeling and variable selection with right censored data and high-dimensional covariates. Gui and Li (2005) proposed L1 penalized estimation for the Cox model to select genes that are associated with patients' survival. They used the Least Angle Regression method (LARS, Efron et al., 2004) to overcome the computational difficulty for high-dimensional data. Their method can in principle select n − 1 genes, where n is the sample size.

Although the Cox model (Cox, 1972) is the most widely used procedure for modeling the relationship of covariates to a right censored outcome, a drawback of this method is that it assumes the relative hazard for any two individuals to be independent of time when the covariates are fixed over time. When the proportional hazards assumption is questionable, a very attractive practical alternative to the Cox model for failure time regression is the accelerated failure time model (AFT, Kalbfleisch and Prentice, 1980). Another desirable property is that AFT models (e.g., James, 1998) reduce to an ordinary linear regression model in the absence of censoring, whereas the Cox model does not. Furthermore, an AFT model can provide information for assessing the baseline hazard function if it models the data reasonably well, but a Cox model cannot. In the literature, the estimation approaches developed for AFT models include semi-parametric rank-based methods (e.g., Tsiatis, 1990; Wei et al., 1990; Lai & Ying, 1992; Ying, 1993; Fygenson & Ritov, 1994; Jin et al., 2003), least squares based and M-estimation methods (e.g., Miller, 1976; Buckley & James, 1979; Ritov, 1990; Lai & Ying, 1991), and the weighted least squares method (Stute, 1993, 1996). Wang et al. (2006) proposed a doubly penalized Buckley-James method with both L1-norm and L2-norm penalties to relate high-dimensional genomic data to censored survival outcomes. Huang et al. (2005a) used two regularization approaches based on Stute's method, the LASSO (Tibshirani, 1994) and the threshold gradient directed regularization (TGDR, Friedman & Popescu, 2003), for estimation and variable selection in the accelerated failure time model with high-dimensional covariates. In another paper, Huang et al. (2005b) used the Kaplan-Meier weights in a least-absolute-deviations objective function. Datta et al. (2006) applied the LARS method in AFT modeling for variable selection after censored data imputation with re-weighting, mean imputation, and multiple imputation approaches.

Besides the estimation problem, another very important issue is model evaluation, because prediction is an important inferential goal in survival analysis. A key piece of information given by right censored data is that, for censored observations, the censored time is less than the true survival time. It is therefore natural to use the predictions for censored observations as a goodness measure for AFT models that are built on uncensored data only, with censored observations assigned zero Kaplan-Meier weights. In other words, if an AFT model describes the survival times of the underlying population very well, it should not underestimate too many censored observations, nor underestimate some censored observations too much. On the other hand, we may expect to improve model prediction accuracy by making use of this information. This is the main motivation of the method proposed in this dissertation work.

To handle the aforementioned estimation problem for accelerated failure time models and the high dimensionality of gene expression data, we consider a new penalized weighted least squares estimator for the accelerated failure time model. First, we propose to include the censored observations in our penalized weighted least squares estimator as constraints, which we call censoring constraints, to limit the model space and thus improve model prediction accuracy. Second, besides the censoring constraints, our penalized weighted least squares estimator also includes both L1 and L2 penalties. Such a combination of LASSO and ridge penalties has been used in Zou & Hastie's (2004) regularization and variable selection method, the elastic net, for the linear regression model; it simultaneously performs automatic variable selection and continuous shrinkage and can thus select groups of correlated variables. We use quadratic programming (Goldfarb and Idnani, 1982, 1983) to solve Stute's weighted least squares objective function with censoring constraints and L1 and L2 penalties. Although this method is motivated by the aforementioned problems in survival analysis with high dimensional covariates, it can equally be applied to cases where p < n.

In Section 2, we describe survival analysis, right censored data and the usual setup of such data, and briefly summarize the AFT model, Stute's weighted least squares estimator, the LASSO and ridge regression. In Section 3 we define our new penalized weighted least squares method. In Section 4, we discuss computational issues and solutions: K-fold cross validation is used for tuning parameter selection and the bootstrap for variance estimation. Section 5 presents some simulation studies, and the analyses of real datasets are carried out in Section 6. The last section summarizes the proposed method and discusses future work in this research area.

Survival analysis and right censored data, AFT models, weighted least squares method and common regularization techniques

Survival analysis and right censored data

Survival analysis was originally the analysis of survival times for medical patients diagnosed with some fatal disease; it is now a well-developed field of statistical methodology, widely applied to modeling and testing hypotheses about time-to-event data in many scientific fields including biology, medicine, public health, epidemiology, engineering, and economics. In medical studies, time-to-event data are generically referred to as survival data, and the event of interest may be death, the end of a period spent in remission from a disease, relief from symptoms, or the recurrence of a particular condition.

A special feature of survival data is that survival times are frequently censored, i.e., the end-point of interest has not been observed for some subjects. Mechanisms that can lead to incomplete observation of survival time include censoring and truncation. Common types of censoring are right censoring, left censoring and interval censoring. Right censoring occurs after a subject has entered a study, i.e., to the right of the last known survival time; a right censored survival time is thus less than the actual, but unknown, survival time. A typical example of right censoring in medical studies is that subjects who are still alive at the end of the study, or who have been lost to follow-up, only have follow-up times, which are referred to as censored times. Only the subjects who died have actually observed survival times.

Since this dissertation work focuses on right censored data, we describe such data with an example from Hosmer and Lemeshow (1999). Consider a hypothetical 2-year-long study in which four patients are enrolled during the first year. Patient 1 entered the study on 01/01/1990 and died on 03/01/1991; patient 2 entered the study on 02/01/1990 and was lost to follow-up on 02/01/1991; patient 3 entered the study on 06/01/1990 and was still alive on 12/01/1991, the end of the study; and patient 4 entered the study on 09/01/1990 and died on 04/01/1991. The calendar times for these four patients are shown in Figure 1.1 and the survival times are shown in Figure 1.2. In this study, patients 1 and 4 have actual observed survival times while patients 2 and 3 have right-censored observations of survival time.

Figure 1.1 Line plot in calendar time for four subjects in a hypothetical follow-up study

Figure 1.2 Line plot in the time scale for four subjects

We consider the usual form of right censored data (Y, δ, X), where Y = min{T, C}, with T the response variable (the true survival time) and C the censoring variable (the censoring time), δ = I{T ≤ C} the censoring indicator, and X the p-dimensional covariates. For survival time and gene expression data, X is an n × p matrix with x_i = (x_i1, ..., x_ip), i = 1, ..., n, the vector of expression levels of the p genes for the ith sample.

Modeling of survival time is based on non-parametric and parametric approaches. Non-parametric and semi-parametric methods include the Kaplan-Meier estimate of survival (Kaplan & Meier, 1958), the Cox proportional hazards regression model (Cox, 1972) and extensions due to Andersen and Gill (1982). Parametric accelerated failure time models (Kalbfleisch and Prentice, 1980) assume an underlying distribution for the survival time such as the exponential, Weibull, gamma, Gompertz, log-normal, or log-logistic distribution.
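As a small, self-contained illustration of this data structure (not taken from the dissertation's simulations), the Python sketch below generates hypothetical right censored data in the (Y, δ, X) form, assuming standard normal covariates, a log-linear survival model and exponentially distributed censoring times.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5                                  # hypothetical sample size and number of genes
X = rng.normal(size=(n, p))                    # n x p covariate (expression) matrix
beta = np.array([1.0, -0.5, 0.0, 0.0, 0.0])    # assumed true coefficients

T = np.exp(0.5 + X @ beta + 0.5 * rng.normal(size=n))    # true survival times, AFT form
C = rng.exponential(scale=np.exp(1.0), size=n)           # assumed censoring times

Y = np.minimum(T, C)                           # observed time Y = min{T, C}
delta = (T <= C).astype(int)                   # censoring indicator delta = I{T <= C}
print("censoring percentage:", round(100 * (1 - delta.mean()), 1))
```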

AFT models

AFT models specify that the effect of a fixed covariate X_j, j = 1, ..., p, acts multiplicatively on the failure time T, or additively on the logarithm or another known monotone transformation of T. An AFT model can be written as

t(y_i) = β_0 + x_i β + ε_i    (1)

where i = 1, ..., n, β_0 is the intercept term, β is the unknown p × 1 vector of true regression coefficients, and the ε_i (i = 1, ..., n) are unobservable independent random errors with mean zero and a common distribution function. Table 1.1 displays the distributions assumed for ε_i and the corresponding distributions of T.

Table 1.1 Distributions for AFT models

Distribution of ε                Distribution of T
Extreme value (2 parameters)     Weibull
Extreme value (1 parameter)      Exponential
Log-gamma                        Gamma
Logistic                         Log-logistic
Normal                           Log-normal

The usual choice for t is log(y_i), so the AFT model has the following form, which is the same as that of a linear regression model:

log(y_i) = β_0 + x_i β + ε_i    (2)

For simplicity, we hereafter write Equation (2) as

y_i = β_0 + x_i β + ε_i    (3)

where y_i denotes the logarithm of the response variable.
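The correspondence in Table 1.1 can be checked by simulation. The sketch below, with arbitrary assumed values for the intercept and the error scale, draws extreme-value (Gumbel) errors for an AFT model without covariates and compares the resulting survival times with a directly sampled Weibull distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
mu, sigma = 0.5, 0.8                     # assumed intercept and error scale

# log T = mu + sigma * W, where W follows the standard minimum extreme-value law;
# -W is a standard Gumbel (maximum) variable, which numpy samples directly.
eps = -sigma * rng.gumbel(size=n)
T_aft = np.exp(mu + eps)

# Weibull with shape 1/sigma and scale exp(mu), sampled directly for comparison.
T_weibull = np.exp(mu) * rng.weibull(1.0 / sigma, size=n)

print(np.quantile(T_aft, [0.25, 0.5, 0.75]))      # the two sets of quantiles should
print(np.quantile(T_weibull, [0.25, 0.5, 0.75]))  # agree up to Monte Carlo error
```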

Stute's weighted least squares method for AFT models

Suppose t_1, ..., t_n are independent observations from an unknown distribution function F and c_1, ..., c_n are independent observations from an unknown distribution function G. Let y_(1) ≤ ... ≤ y_(n) be the ordered observations, δ_(1), ..., δ_(n) the concomitant censoring indicators, and x_(1), ..., x_(n) the concomitant covariates. For a right censored response with distribution function F, the Kaplan-Meier estimator F̂_KM(t) (Kaplan & Meier, 1958) is defined by

F̂_KM(t) = 1 − ∏_{y_(i) ≤ t} ( (n − i) / (n − i + 1) )^{δ_(i)}    (4)

Under this non-parametric maximum likelihood estimator, the mass attached to y_(i) is

w_ni = ( δ_(i) / (n − i + 1) ) · ∏_{j=1}^{i−1} ( (n − j) / (n − j + 1) )^{δ_(j)}    (5)

Stute (1993, 1996) used this attached mass to weight the observations in linear regression for right censored data, and extended F̂_KM to a multivariate distribution function F̂_n^0(x, t), which serves as an estimator of the joint distribution function F^0 of (X, T) when covariates X are available:

F̂_n^0(x, t) = Σ_{i=1}^{n} w_ni · I{ X_(i) ≤ x, y_(i) ≤ t }    (6)

Without censoring, F̂_KM(t) reduces to the usual sample distribution function F̂, which assigns weight 1/n to each observation, and the weighted least squares estimator then reduces to ordinary least squares.

Stute (1993, 1996) defined the weighted least squares estimate β̂ of β as

β̂ = argmin_β (1/2) Σ_{i=1}^{n} w_ni ( y_(i) − β_0 − x_(i) β )²    (7)

Stute (1993, 1996) proved the consistency and asymptotic normality of this weighted least squares estimator under some integrability assumptions, E(ε | X) = 0, and the following assumptions on the random censoring mechanism:

1. T and C are independent, and F and G have no jumps in common;
2. P(T ≤ C | X, T) = P(T ≤ C | T), i.e., δ and X are independent conditionally on T; and
3. τ_T < τ_C, where τ_T and τ_C are respectively the least upper bounds of the supports of the distribution functions of T and C.
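A minimal Python sketch of the Kaplan-Meier weights of Equation (5) and the weighted least squares fit of Equation (7) is given below. It is an illustration only, not the dissertation's implementation; ties are handled simply through stable ordering, and the response is assumed to be on the log-time scale already.

```python
import numpy as np

def kaplan_meier_weights(y, delta):
    """Kaplan-Meier weights w_ni of Equation (5), returned in the original ordering."""
    n = len(y)
    order = np.argsort(y, kind="stable")
    d = np.asarray(delta, dtype=float)[order]
    w_sorted = np.zeros(n)
    prod = 1.0                       # running product of ((n - j)/(n - j + 1))^delta_(j), j < i
    for i in range(n):               # 0-based i corresponds to i + 1 in Equation (5)
        w_sorted[i] = prod * d[i] / (n - i)
        prod *= ((n - i - 1) / (n - i)) ** d[i]
    w = np.zeros(n)
    w[order] = w_sorted
    return w

def stute_wls(X, y, delta):
    """Stute's weighted least squares estimate of Equation (7), computed as ordinary
    least squares on sqrt(w)-scaled data with an intercept column."""
    w = kaplan_meier_weights(y, delta)
    Xd = np.column_stack([np.ones(len(y)), np.asarray(X)])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(sw[:, None] * Xd, sw * y, rcond=None)
    return beta, w
```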

Penalized weighted least squares method

Although the information from censored observations is incorporated into the Kaplan-Meier weights in Stute's weighted least squares method, in real applications with p < n we sometimes observe that the predicted survival times of too many censored observations are less than their censoring times, or that some censored observations are predicted very poorly, i.e., some censored samples have predicted survival time T̂ << C. The problem is more severe when using Stute's method in the case p >> n, e.g., in survival analysis with high dimensional gene expression data. Another problem is singularity caused by collinearity and the high dimensionality of gene expression data. These problems motivate the new variable selection method in this dissertation work.

To address these problems, two issues have to be considered when using Stute's weighted least squares method. One is variable selection, i.e., how to select only the small fraction of variables that are really important predictors of survival time; the other is to pay more attention to the censored data in both variable and model selection, so that they are not underestimated too much. We propose a new penalized weighted least squares method to address these two issues. In this method, we minimize the weighted least squares objective function while adding the censored observations as constraints. L1 and L2 penalties are also added, and a quadratic programming algorithm is used to optimize the objective function. In the following, we elaborate how to add the censoring constraints and the L1 and L2 penalties within a quadratic programming framework.

The properties of the censoring constraints and of their combination with the LASSO and ridge penalties are discussed below, and the results are developed in a series of lemmas.

Censoring constraints

Since the Kaplan-Meier weights for censored observations are 0, we can rewrite the objective function of Stute's weighted least squares method as

(1/2) (Y_U − β_0 − X_U β)ᵀ W_U (Y_U − β_0 − X_U β)    (8)

where X_U and Y_U are the predictor and response observations for the uncensored data, and W_U is a diagonal matrix holding the Kaplan-Meier weights of the uncensored data. Denote by X_C and Y_C the predictors and responses of the censored data. Since Y_C ≤ T_C for right censored data, we can add the constraints Y_C ≤ X_C β + β_0 to the weighted least squares method. These constraints may be too stringent due to random noise; instead, we relax them to Y_C ≤ X_C β + β_0 + ξ, where ξ is a vector of non-negative values measuring the severity of the violations of the constraints. The weighted least squares estimate β̂ with censoring constraints (CC) is defined as

β̂_CC = argmin_{β, ξ} (1/2) (Y_U − β_0 − X_U β)ᵀ W_U (Y_U − β_0 − X_U β) + (λ_0 / 2n) ξᵀ ξ
subject to Y_C ≤ X_C β + β_0 + ξ    (9)

where λ_0 is a positive value accounting for the penalties on violations of the constraints, and n appears in the penalty term only for scaling, to match W_U.

Equation (9) is equivalent to the following:

β̂_CC = argmin_β (1/2) (Y_U − β_0 − X_U β)ᵀ W_U (Y_U − β_0 − X_U β) + (λ_0 / 2) (Y_C − β_0 − X_C β)ᵀ W_C (Y_C − β_0 − X_C β)    (10)

where W_C is a diagonal matrix with diagonal element (W_C)_{i,i} = 1/n if Y_Ci > X_Ci β + β_0 and 0 otherwise. As can be seen from the objective function in Equation (10), for a censored observation (X_Ci, Y_Ci) whose estimated survival time is no less than the censoring time, the difference between the estimated survival time and the censoring time is ignored; otherwise, the difference (which we call the censoring error) enters the objective function just like the residuals of the uncensored observations.
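To make the role of the censoring errors concrete, the following sketch simply evaluates the criterion of Equation (10) for a given coefficient vector; it illustrates the criterion only, not the estimation algorithm (the next paragraph explains why the constrained form (9) is preferred for optimization).

```python
import numpy as np

def penalized_wls_objective(beta0, beta, Xu, Yu, wu, Xc, Yc, lam0):
    """Criterion of Equation (10): Kaplan-Meier weighted squared error on the uncensored
    data plus a penalty on censored observations whose predicted survival time falls
    below the censoring time (the "censoring errors")."""
    n = len(Yu) + len(Yc)
    res_u = Yu - beta0 - Xu @ beta                   # residuals of the uncensored data
    fit_term = 0.5 * np.sum(wu * res_u ** 2)
    viol = np.maximum(Yc - beta0 - Xc @ beta, 0.0)   # only under-predicted censored cases count
    cc_term = 0.5 * lam0 * np.sum(viol ** 2) / n     # W_C places mass 1/n on violated rows
    return fit_term + cc_term
```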

However, Equation (10) is not easy to solve with traditional regression algorithms because its objective function is not continuously differentiable over the model space. Instead we use Equation (9), in which the dimension of the model space is expanded to include the additional parameters ξ, making the objective function continuously differentiable over a constrained (β, ξ) space. Equation (9) can be solved using the quadratic programming algorithm, as discussed later. Notice that in Equation (9) the constraints that ξ be non-negative, i.e., ξ ≥ 0, are not imposed explicitly in the optimization problem, but they are guaranteed by the penalty term, as the following Lemma 1 shows.

Lemma 1. If the pair (β̂, ξ̂) is the minimizer of Equation (9) with λ_0 > 0, then ξ̂ ≥ 0.

Proof. Define

f(β, ξ) = (1/2) (Y_U − β_0 − X_U β)ᵀ W_U (Y_U − β_0 − X_U β) + (λ_0 / 2n) ξᵀ ξ

Assume that ξ̂_j < 0 for some j. Choose ν with 0 < ν < −ξ̂_j and let V be the vector with V_j = ν and V_i = 0 for i ≠ j. Then

f(β̂, ξ̂ + V) = f(β̂, ξ̂) + (λ_0 / 2n) ν (2 ξ̂_j + ν) < f(β̂, ξ̂)

and

Y_C ≤ X_C β̂ + β_0 + ξ̂ ≤ X_C β̂ + β_0 + ξ̂ + V

This contradicts (β̂, ξ̂) being the minimizer. Hence the assumption that ξ̂_j < 0 for some j is false.

The censoring constraints in Equation (9) limit the model space using the right-censored data. To illustrate how the censoring constraints affect model estimation, we consider the following simple example. For simplicity, assume the model Y = 1 + X + ε, with a total of 4 observations available of which 2 are censored. Table 1.2 shows the 4 observations.

Table 1.2 Simulated data

i    X_i    Y_i    δ_i

We choose different values for the censoring parameter λ_0 and fit the models. Table 1.3 presents the models fitted under the different λ_0.

Table 1.3 Model fitted under different λ_0

λ_0    intercept    slope

Figure 1.3 plots these fitted models, and Figure 1.4 plots the estimated intercept and slope as functions of λ_0. We can see that if λ_0 = 0, i.e., no censoring constraints are used, a perfect model is fitted to the two uncensored observations. However, due to random noise, this perfect model does not predict censored observation 4 very well: its predicted survival time is less than the censored time. If censoring constraints are used, the estimate increases the prediction for observation 4 to reduce the amount of violation of the censoring constraints; at the same time, it also changes the slope to balance the residuals of the uncensored data. The amount of adjustment depends on the censoring parameter λ_0. As λ_0 increases, the model intercept increases slightly, but the model slope moves closer to that of the true model. The other censored observation, observation 2, has no effect on the estimate since its predicted survival time is greater than its censoring time.

Figure 1.3 Plot of models fitted under different λ_0 (λ_0 = 0, 0.1, 0.5, ...). o uncensored data; censored data

Figure 1.4 Slope and intercept vs. λ_0

The next example illustrates how censoring constraints affect model selection. We generate data from the model

Y = β_0 + x_1 β_1 + x_2 β_2 + ε_Y

where β_0 = 1, β_1 = 1, β_2 = 0, corr(x_1, x_2) = 0, ε_Y ~ N(0, σ²), and n = 10. Table 1.4 shows the estimate of β_2 and its standard deviation, based on 500 replications, under various combinations of censoring percentage and σ for the two modeling methods. Under each combination of censoring percentage and σ, the estimate obtained with censoring constraints is closer to 0 and has a lower standard deviation than the estimate obtained without censoring constraints, especially at high censoring percentages.

Since the true model has β_2 = 0 in this simulation, this example shows that adding censoring constraints makes correct model selection more likely, especially when the available uncensored data are very limited.

Table 1.4 Estimate of β_2 and standard deviation

Censoring percentage    σ    Without censoring constraints    With censoring constraints
30%                          (0.295)                          0.009 (0.274)
                             (0.590)                          (0.519)
                             (1.180)                          (0.987)
50%                          (0.446)                          (0.349)
                             (0.893)                          (0.659)
                             (1.785)                          (1.247)
70%                          (2.911)                          0.021 (0.801)
                             (5.821)                          (1.513)
                             (11.643)                         0.076 (2.941)

Ridge and LASSO penalization

Ridge regression (Hastie et al., 2001) minimizes a penalized residual sum of squares,

β̂_ridge = argmin_β (1/2) Σ_{i=1}^{n} ( y_i − β_0 − Σ_{j=1}^{p} x_ij β_j )² + (λ/2) Σ_{j=1}^{p} β_j²    (11)

which can be written in matrix form as

RSS(λ) = (1/2) (y − β_0 − Xβ)ᵀ (y − β_0 − Xβ) + (λ/2) βᵀ β    (12)

The ridge regression estimate can be written as

β̂_ridge = (XᵀX + λI)⁻¹ Xᵀ y    (13)

where I is the p × p identity matrix. So the ridge penalty is equivalent to adding a positive constant to the diagonal of XᵀX before inversion. When there is collinearity between the covariates, or when XᵀX is not of full rank because p >> n, adding λI makes XᵀX non-singular. The effect of this penalty is to constrain the size of the coefficients by shrinking them toward zero; a more subtle effect is that coefficients of correlated variables are shrunk toward each other as well as toward zero (Hastie & Tibshirani, 2004).

To deal with singularity caused by collinearity between the covariates, or by X_UᵀX_U not being of full rank because p >> n_u, where n_u is the number of uncensored observations, we add the ridge penalty, and the penalized weighted least squares objective function becomes

β̂_CC = argmin_{β, ξ} (1/2) (Y_U − β_0 − X_U β)ᵀ W_U (Y_U − β_0 − X_U β) + (λ_0 / 2n) ξᵀ ξ + (λ_2 / 2) βᵀ β
subject to Y_C ≤ X_C β + β_0 + ξ    (14)

where λ_2 is a positive value accounting for the ridge penalty.

Tibshirani (1994) proposed the L1 LASSO penalty Σ_{j=1}^{p} |β_j| for estimation in linear regression. The LASSO estimate is defined by

β̂_LASSO = argmin_β (1/2) Σ_{i=1}^{n} ( y_i − β_0 − Σ_{j=1}^{p} x_ij β_j )²  subject to  Σ_{j=1}^{p} |β_j| ≤ s    (15)

where the L1 LASSO penalty is added as a constraint: Σ_j |β_j| ≤ s. Because of the nature of this constraint, it tends to produce some coefficients that are exactly zero. The constraint also makes the solutions nonlinear in the y_i, and further modifications are needed in order to use it within the quadratic programming algorithm.
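For reference, the closed form of Equation (13) is a one-line computation; this minimal sketch assumes the data have already been centered so that no intercept is needed.

```python
import numpy as np

def ridge_fit(X, y, lam2):
    """Closed-form ridge estimate of Equation (13): (X'X + lam2 * I)^(-1) X'y,
    assuming X and y are already centered."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam2 * np.eye(p), X.T @ y)
```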

One possible way, used in Tibshirani (1996), to add the LASSO constraint is to rewrite β = β⁺ − β⁻, where β⁺ and β⁻ are non-negative. The LASSO constraint then becomes: β⁺ ≥ 0, β⁻ ≥ 0 and Σ_j (β_j⁺ + β_j⁻) ≤ s. The weighted least squares estimate β̂ with censoring and LASSO constraints (CC-LASSO) is now

(β̂⁺, β̂⁻)_CC-LASSO = argmin_{β⁺, β⁻, ξ} (1/2) [Y_U − β_0 − X_U (β⁺ − β⁻)]ᵀ W_U [Y_U − β_0 − X_U (β⁺ − β⁻)] + (λ_0 / 2n) ξᵀ ξ + (λ_2 / 2) β⁺ᵀ β⁺ + (λ_2 / 2) β⁻ᵀ β⁻

subject to Y_C ≤ X_C (β⁺ − β⁻) + β_0 + ξ,  β⁺ ≥ 0,  β⁻ ≥ 0  and  Σ_j (β_j⁺ + β_j⁻) ≤ s    (16)

In Equation (16), the ridge penalty is written as (λ_2 / 2) β⁺ᵀ β⁺ + (λ_2 / 2) β⁻ᵀ β⁻ instead of (λ_2 / 2) (β⁺ − β⁻)ᵀ (β⁺ − β⁻), which is justified by the following Lemma 2.

Lemma 2. If the pair (β̂⁺, β̂⁻) is the minimizer of Equation (16) with λ_2 > 0, then β̂_j⁺ β̂_j⁻ = 0, j = 1, ..., p.

Proof. Define

f(β⁺, β⁻, ξ) = (1/2) [Y_U − β_0 − X_U (β⁺ − β⁻)]ᵀ W_U [Y_U − β_0 − X_U (β⁺ − β⁻)] + (λ_0 / 2n) ξᵀ ξ + (λ_2 / 2) β⁺ᵀ β⁺ + (λ_2 / 2) β⁻ᵀ β⁻

Assume that β̂_j⁺ β̂_j⁻ > 0 for some j, i.e., both are positive. Choose ν with 0 < ν < min(β̂_j⁺, β̂_j⁻), and let V be the vector with V_j = ν and V_i = 0 for i ≠ j. Then β̂ = (β̂⁺ − V) − (β̂⁻ − V), so the pair (β̂⁺ − V, β̂⁻ − V) satisfies all the constraints in Equation (16), and

f(β̂⁺ − V, β̂⁻ − V, ξ̂) = f(β̂⁺, β̂⁻, ξ̂) + λ_2 ν (ν − β̂_j⁺ − β̂_j⁻) < f(β̂⁺, β̂⁻, ξ̂)

since 0 < ν < min(β̂_j⁺, β̂_j⁻). This contradicts (β̂⁺, β̂⁻) being the minimizer. Hence the assumption that β̂_j⁺ β̂_j⁻ > 0 for some j is false.

Up to now we have formulated the optimization problem for the penalized weighted least squares method with censoring constraints and L1 and L2 penalties. The optimization problem in this new form (16) can be solved using the standard quadratic programming algorithm.

Algorithms for finding the penalized solutions

Typically an intercept β_0 is included in the model; however, in all the estimators described above the intercept β_0 has been left out for notational simplicity, and we do not penalize β_0. In least squares regression, this is achieved by simply centering the predictors and responses. Similarly, in weighted least squares regression the samples can be centered by their weighted means X_W = Σ_i w_ni X_(i) / Σ_i w_ni and Y_W = Σ_i w_ni y_(i) / Σ_i w_ni (Tibshirani, 1994). If we denote the weighted data by (Y_(i)^(w), δ_(i), X_(i)^(w)), where Y_(i)^(w) = w_ni^{1/2} (Y_(i) − Y_W) and X_(i)^(w) = w_ni^{1/2} (X_(i) − X_W), it is easy to see that the weighted least squares estimator in Equation (7) is equivalent to the ordinary least squares estimator, without intercept, applied to the weighted data with Kaplan-Meier weights. For censored observations, whose Kaplan-Meier weights are 0, we simply use the weight 1/n and then use the weighted censored data in the censoring constraints. For simplicity, we denote the weighted data by (Y_(i), δ_(i), X_(i)) hereafter; Equations (14) and (16) then simplify to

β̂_CC = argmin_{β, ξ} (1/2) (Y_U − X_U β)ᵀ (Y_U − X_U β) + (λ_0 / 2) ξᵀ ξ + (λ_2 / 2) βᵀ β
subject to Y_C ≤ X_C β + ξ    (17)

(β̂⁺, β̂⁻)_CC-LASSO = argmin_{β⁺, β⁻, ξ} (1/2) [Y_U − X_U (β⁺ − β⁻)]ᵀ [Y_U − X_U (β⁺ − β⁻)] + (λ_0 / 2) ξᵀ ξ + (λ_2 / 2) β⁺ᵀ β⁺ + (λ_2 / 2) β⁻ᵀ β⁻
subject to Y_C ≤ X_C (β⁺ − β⁻) + ξ,  β⁺ ≥ 0,  β⁻ ≥ 0  and  Σ_j (β_j⁺ + β_j⁻) ≤ s    (18)
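A small sketch of the weighted centering and square-root-weight scaling described above (an illustration only, not the dissertation's code):

```python
import numpy as np

def weighted_center_and_scale(X, y, w):
    """Center X and y by their Kaplan-Meier weighted means and scale each observation
    by sqrt(w_ni), so that weighted least squares becomes ordinary least squares
    without an intercept."""
    w = np.asarray(w, dtype=float)
    x_bar = (w[:, None] * X).sum(axis=0) / w.sum()   # weighted column means X_W
    y_bar = (w * y).sum() / w.sum()                  # weighted mean Y_W
    Xw = np.sqrt(w)[:, None] * (X - x_bar)
    yw = np.sqrt(w) * (y - y_bar)
    return Xw, yw, x_bar, y_bar
```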

Quadratic programming

A standard quadratic program is defined as

QP:  argmin_b  dᵀ b + (1/2) bᵀ D b   subject to  A b ≥ b_0    (19)

where b ∈ R^p. If D is positive semi-definite, i.e., bᵀ D b ≥ 0 for all b, the problem is convex. If D is positive definite, i.e., bᵀ D b > 0 for all b ≠ 0, the problem is strictly convex, and for a strictly convex problem the minimizer is unique if it exists.

The weighted least squares estimator with censoring constraints β̂_CC defined in Equation (17) can be converted to the standard quadratic program by defining

b = (β, ξ)ᵀ ∈ R^{p+n_c},   d = (−X_Uᵀ Y_U, 0_{n_c})ᵀ,

D = [ X_Uᵀ X_U + λ_2 I_p      0
      0                       λ_0 I_{n_c} ]   ((p+n_c) × (p+n_c)),

A = [ X_C   I_{n_c} ]   (n_c × (p+n_c)),    b_0 = Y_C    (20)

where n_c is the number of censored observations.
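The sketch below assembles d, D, A and b_0 as in Equation (20) for weighted, centered data and solves the resulting quadratic program. SciPy's general-purpose SLSQP solver is used here only as a stand-in; the dissertation relies on the dedicated Goldfarb-Idnani quadratic programming algorithm.

```python
import numpy as np
from scipy.optimize import minimize

def solve_qpcc(Xu, Yu, Xc, Yc, lam0=1.0, lam2=1.0):
    """Assemble and solve the QP of Equation (20) for weighted, centered data."""
    p, nc = Xu.shape[1], Xc.shape[0]
    d = np.concatenate([-Xu.T @ Yu, np.zeros(nc)])
    D = np.zeros((p + nc, p + nc))
    D[:p, :p] = Xu.T @ Xu + lam2 * np.eye(p)
    D[p:, p:] = lam0 * np.eye(nc)
    A = np.hstack([Xc, np.eye(nc)])                  # constraint rows: Xc @ beta + xi >= Yc

    obj = lambda b: d @ b + 0.5 * b @ D @ b
    grad = lambda b: d + D @ b
    cons = {"type": "ineq", "fun": lambda b: A @ b - Yc, "jac": lambda b: A}
    b_start = np.concatenate([np.zeros(p), Yc])      # the feasible point (0, Y_C) of Lemma 3
    res = minimize(obj, b_start, jac=grad, constraints=[cons], method="SLSQP")
    return res.x[:p], res.x[p:]                      # beta_hat, xi_hat
```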

Lemma 3. If λ_0 and λ_2 are both positive and finite, there is a unique global minimizer of the quadratic program with censoring constraints and ridge penalty defined in Equation (20).

Proof. In Equation (20), D is positive definite if λ_0 and λ_2 are both positive. Also, there is a feasible solution

b_f = (0_p, Y_C)ᵀ    (21)

Therefore there is a unique global minimizer of the quadratic program with censoring constraints and ridge penalty defined in Equation (20).

Similarly, the weighted least squares estimator with censoring and LASSO constraints (β̂⁺, β̂⁻)_CC-LASSO defined in Equation (18) can be converted to the standard quadratic program by defining

b = (β⁺, β⁻, ξ)ᵀ ∈ R^{2p+n_c},   d = (−X_Uᵀ Y_U, X_Uᵀ Y_U, 0_{n_c})ᵀ,

D = [ X_Uᵀ X_U + λ_2 I_p    −X_Uᵀ X_U               0
      −X_Uᵀ X_U             X_Uᵀ X_U + λ_2 I_p      0
      0                     0                       λ_0 I_{n_c} ]   ((2p+n_c) × (2p+n_c)),

A = [ X_C    −X_C    I_{n_c}
      I_p     0       0
      0       I_p     0
      −1ᵀ    −1ᵀ      0 ],    b_0 = (Y_C, 0_{2p}, −s)ᵀ    (22)

where the last row of A and the last entry of b_0 encode the LASSO constraint Σ_j (β_j⁺ + β_j⁻) ≤ s.

Lemma 4. If λ_0 and λ_2 are both positive and finite, there is a unique global minimizer of the quadratic program with censoring and LASSO constraints and ridge penalty defined in Equation (22).

Proof. D in Equation (22) is positive definite if λ_0 and λ_2 are both positive, and a feasible solution is

b_f = (0_{2p}, Y_C)ᵀ    (23)

Therefore there also exists a unique global minimizer of the quadratic program with censoring and LASSO constraints and ridge penalty defined in Equation (22).

Variable selection

For problems with p > n, it is desirable to build a parsimonious model. If we use Equation (14) to fit the AFT model, one possible approach is to recursively remove variables with very small coefficients and rebuild the model. We remove one variable at a time, the one with the smallest absolute coefficient value, if p is small; when p is large, we can remove a small set of variables with the smallest absolute coefficient values at a time to speed up the process. If we use Equation (16) to fit the AFT model, the LASSO bound can be used to achieve model parsimony and select important covariates, since the LASSO constraint tends to produce zero coefficients. However, unlike the model selection algorithm LARS (Efron et al., 2004), whose lasso solution paths grow piecewise linearly in a predictable one-at-a-time way and can thus produce exactly zero coefficients, our method is found to have the same issue as discussed in Tibshirani (1996): like model selection, the LASSO is a tool for achieving parsimony, but in actuality an exactly zero coefficient is unlikely to occur in the numerical solution.

For this reason, in our variable selection algorithms with the LASSO constraint, discussed in the next section, we choose a very small ε (ε = 1e-8 in this study) and set to zero the coefficients whose absolute values are less than ε, thus producing parsimonious models.

There are several well-known model selection criteria such as Cp (Mallows, 1973), AIC (Akaike, 1973, 1974) and BIC (Schwartz, 1978). We use AIC (Akaike's Information Criterion) in this dissertation work. Akaike (1973, 1974) defined AIC as

AIC = −2 log L(θ̂ | y) + 2K    (24)

where log L(θ̂ | y) is the maximized log-likelihood and K is the number of estimable parameters in the model. For model selection here, we replace the log-likelihood term with the weighted K-fold cross-validation error, i.e., the sum of squared residuals of the uncensored data multiplied by the Kaplan-Meier weights, and take K to be the number of variables with non-zero coefficients. Thus the AIC criterion is formulated as

AIC = 2 · n_u · log(CV-WLSE) + 2K    (25)

where n_u is the number of uncensored observations and CV-WLSE is the weighted K-fold cross-validation error. This AIC criterion is appropriate when the total number of covariates is not too large relative to the sample size; we use AIC in all the simulation studies. AIC may perform poorly and drastically underestimate the corresponding risk function, i.e., lead to over-fitting, when a candidate model is over-specified and there are too many parameters relative to the sample size (Sugiura 1978, Sakamoto et al. 1986).

Hurvich and Tsai (1989) studied this small-sample (second order) bias adjustment and derived a corrected criterion, AIC_c:

AIC_c = −2 log L(θ̂ | y) + 2K · N / (N − K − 1)    (26)

where N is the sample size. For datasets with high dimensional covariates, such as survival time and gene expression data, we use AIC_c, which here is formulated as

AIC_c = n_u · log(CV-WLSE) + 2K · n_u / (n_u − K − 1)    (27)
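The corrected criterion of Equation (27) is a one-line computation; a small sketch:

```python
import numpy as np

def aic_c(cv_wlse, n_u, k):
    """AIC_c of Equation (27), where cv_wlse is the weighted K-fold cross-validation
    error, n_u the number of uncensored observations and k the number of variables
    with non-zero coefficients."""
    return n_u * np.log(cv_wlse) + 2.0 * k * n_u / (n_u - k - 1)
```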

Algorithms for finding the penalized solutions

Based on the above discussion, we provide below a few algorithms for building AFT models with different kinds of penalization.

Algorithm 1: Quadratic Programming with Censoring Constraints (QPCC)

1) Initialize the variable set FS = {1, ..., p}.
2) Divide the data into K folds. Leave out one fold at a time and solve the problem defined in Equation (14) using the standard quadratic programming algorithm.
3) Average the K models built in 2) by simply averaging their coefficients. Evaluate model quality by computing the weighted least squares error of the averaged model. Evaluate the AIC or AIC_c score, and save the averaged model if it has the lowest AIC or AIC_c score so far.
4) Stop if there is only one variable left, and return the model with the lowest AIC or AIC_c score.
5) Remove from FS the variable, or a small set of variables, with the smallest absolute coefficient values. Go to 2).

(A schematic sketch of this backward-elimination loop is given after Algorithm 2 below.)

Algorithm 2: Quadratic Programming (QP)

This algorithm is used for comparison purposes only. It is the same as Algorithm 1 except that the censoring constraints are excluded when solving Equation (14) in step 2).
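The following is a schematic Python sketch of the backward-elimination loop of Algorithm 1. The fold-wise QPCC fit and the weighted cross-validation error are supplied by the caller as functions (hypothetical interfaces, not part of the dissertation), and the AIC_c of Equation (27) is computed inline.

```python
import numpy as np

def qpcc_backward_selection(X, y, delta, fit_fold, cv_error, n_folds=5, seed=0):
    """Schematic sketch of Algorithm 1 (QPCC).

    fit_fold(X, y, delta, test_idx): solves Equation (14) with the rows in test_idx
        left out and returns one coefficient per column of X (caller-supplied).
    cv_error(X, y, delta, coef): weighted K-fold cross-validation error of an
        averaged model (caller-supplied).
    """
    rng = np.random.default_rng(seed)
    n_u = int(np.sum(delta))                         # number of uncensored observations
    active = list(range(X.shape[1]))                 # step 1: start from all variables
    best_score, best_model = np.inf, None
    while active:
        folds = np.array_split(rng.permutation(len(y)), n_folds)
        coefs = [fit_fold(X[:, active], y, delta, idx) for idx in folds]    # step 2
        avg_coef = np.mean(coefs, axis=0)            # step 3: average the K fold models
        cv = cv_error(X[:, active], y, delta, avg_coef)
        k = len(active)
        score = n_u * np.log(cv) + 2.0 * k * n_u / (n_u - k - 1)   # AIC_c, Equation (27)
        if score < best_score:
            best_score, best_model = score, (list(active), avg_coef)
        if len(active) == 1:                         # step 4: stop with one variable left
            break
        active.pop(int(np.argmin(np.abs(avg_coef)))) # step 5: drop the smallest |coefficient|
    return best_model, best_score
```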

Algorithm 3: Quadratic Programming with Censoring and LASSO Constraints (QPCCLASSO)

1) For each of a given set of LASSO bounds, use the standard quadratic programming algorithm to solve Equation (16).
2) From the solution, form the variable set FS of variables whose absolute coefficient values are larger than a small ε (ε = 1e-8 in this study).
3) Using the variable set from step 2), divide the data into K folds. Leave out one fold at a time and solve the problem defined in Equation (16) using the standard quadratic programming algorithm.
4) Average the K models built in 3) by simply averaging their coefficients. Evaluate model quality by computing the weighted least squares error of the averaged model. Evaluate the AIC or AIC_c score, and save the averaged model if it has the lowest AIC or AIC_c score so far.
5) Repeat steps 1) to 4) until all the bounds are exhausted. Return the model with the best AIC or AIC_c score.

(A schematic sketch of this procedure is given after Algorithm 4 below.)

Algorithm 4: Quadratic Programming with LASSO Constraints (QPLASSO)

This algorithm is used for comparison purposes only. It is the same as Algorithm 3 except that the censoring constraints are excluded when solving Equation (16) in steps 1) and 3).
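A schematic sketch of Algorithm 3 in the same style, again with the Equation (16) solver and the fold-wise refit supplied by the caller as hypothetical interfaces:

```python
import numpy as np

def qpcclasso_selection(X, y, delta, lasso_bounds, fit_full, refit_cv, eps=1e-8):
    """Schematic sketch of Algorithm 3 (QPCCLASSO).

    fit_full(X, y, delta, s): coefficients from solving Equation (16) on all the data
        with LASSO bound s (caller-supplied).
    refit_cv(X, y, delta, s): (averaged coefficients, weighted K-fold CV error) from
        the fold-wise refits of steps 3)-4) (caller-supplied).
    """
    n_u = int(np.sum(delta))
    best_score, best_model = np.inf, None
    for s in lasso_bounds:                           # step 1: loop over the LASSO bounds
        coef = fit_full(X, y, delta, s)
        active = np.flatnonzero(np.abs(coef) > eps)  # step 2: keep coefficients above eps
        if active.size == 0:
            continue
        avg_coef, cv = refit_cv(X[:, active], y, delta, s)          # steps 3)-4)
        k = active.size
        score = n_u * np.log(cv) + 2.0 * k * n_u / (n_u - k - 1)    # AIC_c, Equation (27)
        if score < best_score:
            best_score, best_model = score, (active, avg_coef)
    return best_model, best_score                    # step 5: model with the best score
```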

Parameter tuning

When solving Equation (14) or (16), two parameters need to be tuned. One is λ_2, which accounts for the ridge penalty; there are already many discussions of how to choose appropriate ridge penalties, so we do not discuss this here. The advantage of using λ_2 in the algorithms is that it makes the quadratic programming problem strictly convex, so that a unique global optimizer can be found.

The other parameter is λ_0, which accounts for the penalties on violations of the censoring constraints. The value of this parameter depends on how stringently one wants the model to satisfy the censoring constraints: the larger the value, the more likely the model is to satisfy the censoring constraints, but the poorer its predictions for uncensored samples may become. A good practice is to balance the model quality for the uncensored data against the censoring constraints. We can build a model by simply setting λ_0 to 1 and then checking how the model predicts both censored and uncensored samples. If the uncensored samples are predicted very well but most censoring constraints are severely violated, we may want to increase λ_0 and start over; if most censoring constraints are satisfied but the uncensored samples are predicted very poorly, we may want to decrease λ_0 and start over. In our study, we find that setting λ_0 to 1 is sufficient for most problems.


More information

Institute of Statistics Mimeo Series No Simultaneous regression shrinkage, variable selection and clustering of predictors with OSCAR

Institute of Statistics Mimeo Series No Simultaneous regression shrinkage, variable selection and clustering of predictors with OSCAR DEPARTMENT OF STATISTICS North Carolina State University 2501 Founders Drive, Campus Box 8203 Raleigh, NC 27695-8203 Institute of Statistics Mimeo Series No. 2583 Simultaneous regression shrinkage, variable

More information

ESL Chap3. Some extensions of lasso

ESL Chap3. Some extensions of lasso ESL Chap3 Some extensions of lasso 1 Outline Consistency of lasso for model selection Adaptive lasso Elastic net Group lasso 2 Consistency of lasso for model selection A number of authors have studied

More information

Simultaneous variable selection and class fusion for high-dimensional linear discriminant analysis

Simultaneous variable selection and class fusion for high-dimensional linear discriminant analysis Biostatistics (2010), 11, 4, pp. 599 608 doi:10.1093/biostatistics/kxq023 Advance Access publication on May 26, 2010 Simultaneous variable selection and class fusion for high-dimensional linear discriminant

More information

High-Dimensional Statistical Learning: Introduction

High-Dimensional Statistical Learning: Introduction Classical Statistics Biological Big Data Supervised and Unsupervised Learning High-Dimensional Statistical Learning: Introduction Ali Shojaie University of Washington http://faculty.washington.edu/ashojaie/

More information

Biostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences

Biostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences Biostatistics-Lecture 16 Model Selection Ruibin Xi Peking University School of Mathematical Sciences Motivating example1 Interested in factors related to the life expectancy (50 US states,1969-71 ) Per

More information

Prediction & Feature Selection in GLM

Prediction & Feature Selection in GLM Tarigan Statistical Consulting & Coaching statistical-coaching.ch Doctoral Program in Computer Science of the Universities of Fribourg, Geneva, Lausanne, Neuchâtel, Bern and the EPFL Hands-on Data Analysis

More information

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1 Variable Selection in Restricted Linear Regression Models Y. Tuaç 1 and O. Arslan 1 Ankara University, Faculty of Science, Department of Statistics, 06100 Ankara/Turkey ytuac@ankara.edu.tr, oarslan@ankara.edu.tr

More information

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

More information

Statistics 262: Intermediate Biostatistics Model selection

Statistics 262: Intermediate Biostatistics Model selection Statistics 262: Intermediate Biostatistics Model selection Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Today s class Model selection. Strategies for model selection.

More information

Lecture 6: Methods for high-dimensional problems

Lecture 6: Methods for high-dimensional problems Lecture 6: Methods for high-dimensional problems Hector Corrada Bravo and Rafael A. Irizarry March, 2010 In this Section we will discuss methods where data lies on high-dimensional spaces. In particular,

More information

Regularization Paths

Regularization Paths December 2005 Trevor Hastie, Stanford Statistics 1 Regularization Paths Trevor Hastie Stanford University drawing on collaborations with Brad Efron, Saharon Rosset, Ji Zhu, Hui Zhou, Rob Tibshirani and

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

High-dimensional regression with unknown variance

High-dimensional regression with unknown variance High-dimensional regression with unknown variance Christophe Giraud Ecole Polytechnique march 2012 Setting Gaussian regression with unknown variance: Y i = f i + ε i with ε i i.i.d. N (0, σ 2 ) f = (f

More information

Power and Sample Size Calculations with the Additive Hazards Model

Power and Sample Size Calculations with the Additive Hazards Model Journal of Data Science 10(2012), 143-155 Power and Sample Size Calculations with the Additive Hazards Model Ling Chen, Chengjie Xiong, J. Philip Miller and Feng Gao Washington University School of Medicine

More information

Linear Regression Linear Regression with Shrinkage

Linear Regression Linear Regression with Shrinkage Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle

More information

A Survey of L 1. Regression. Céline Cunen, 20/10/2014. Vidaurre, Bielza and Larranaga (2013)

A Survey of L 1. Regression. Céline Cunen, 20/10/2014. Vidaurre, Bielza and Larranaga (2013) A Survey of L 1 Regression Vidaurre, Bielza and Larranaga (2013) Céline Cunen, 20/10/2014 Outline of article 1.Introduction 2.The Lasso for Linear Regression a) Notation and Main Concepts b) Statistical

More information

TGDR: An Introduction

TGDR: An Introduction TGDR: An Introduction Julian Wolfson Student Seminar March 28, 2007 1 Variable Selection 2 Penalization, Solution Paths and TGDR 3 Applying TGDR 4 Extensions 5 Final Thoughts Some motivating examples We

More information

Longitudinal + Reliability = Joint Modeling

Longitudinal + Reliability = Joint Modeling Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTED-HAROSA International Workshop November 21-22, 2013 Barcelona Mainly from Rizopoulos,

More information

High-dimensional regression modeling

High-dimensional regression modeling High-dimensional regression modeling David Causeur Department of Statistics and Computer Science Agrocampus Ouest IRMAR CNRS UMR 6625 http://www.agrocampus-ouest.fr/math/causeur/ Course objectives Making

More information

Focused fine-tuning of ridge regression

Focused fine-tuning of ridge regression Focused fine-tuning of ridge regression Kristoffer Hellton Department of Mathematics, University of Oslo May 9, 2016 K. Hellton (UiO) Focused tuning May 9, 2016 1 / 22 Penalized regression The least-squares

More information

Robust Variable Selection Methods for Grouped Data. Kristin Lee Seamon Lilly

Robust Variable Selection Methods for Grouped Data. Kristin Lee Seamon Lilly Robust Variable Selection Methods for Grouped Data by Kristin Lee Seamon Lilly A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree

More information

Linear Methods for Regression. Lijun Zhang

Linear Methods for Regression. Lijun Zhang Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived

More information

Introduction to Statistical modeling: handout for Math 489/583

Introduction to Statistical modeling: handout for Math 489/583 Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

Gene Expression Data Classification with Revised Kernel Partial Least Squares Algorithm

Gene Expression Data Classification with Revised Kernel Partial Least Squares Algorithm Gene Expression Data Classification with Revised Kernel Partial Least Squares Algorithm Zhenqiu Liu, Dechang Chen 2 Department of Computer Science Wayne State University, Market Street, Frederick, MD 273,

More information

The prediction of house price

The prediction of house price 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Multicategory Vertex Discriminant Analysis for High-Dimensional Data

Multicategory Vertex Discriminant Analysis for High-Dimensional Data Multicategory Vertex Discriminant Analysis for High-Dimensional Data Tong Tong Wu Department of Epidemiology and Biostatistics University of Maryland, College Park October 8, 00 Joint work with Prof. Kenneth

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results

More information

Compressed Sensing in Cancer Biology? (A Work in Progress)

Compressed Sensing in Cancer Biology? (A Work in Progress) Compressed Sensing in Cancer Biology? (A Work in Progress) M. Vidyasagar FRS Cecil & Ida Green Chair The University of Texas at Dallas M.Vidyasagar@utdallas.edu www.utdallas.edu/ m.vidyasagar University

More information

Linear Regression Linear Regression with Shrinkage

Linear Regression Linear Regression with Shrinkage Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle

More information

Iterative Selection Using Orthogonal Regression Techniques

Iterative Selection Using Orthogonal Regression Techniques Iterative Selection Using Orthogonal Regression Techniques Bradley Turnbull 1, Subhashis Ghosal 1 and Hao Helen Zhang 2 1 Department of Statistics, North Carolina State University, Raleigh, NC, USA 2 Department

More information

PENALIZED PRINCIPAL COMPONENT REGRESSION. Ayanna Byrd. (Under the direction of Cheolwoo Park) Abstract

PENALIZED PRINCIPAL COMPONENT REGRESSION. Ayanna Byrd. (Under the direction of Cheolwoo Park) Abstract PENALIZED PRINCIPAL COMPONENT REGRESSION by Ayanna Byrd (Under the direction of Cheolwoo Park) Abstract When using linear regression problems, an unbiased estimate is produced by the Ordinary Least Squares.

More information

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor - p. 1/15 Today s class Bias-Variance tradeoff. Penalized regression. Cross-validation. - p. 2/15 Bias-variance

More information

Model Selection. Frank Wood. December 10, 2009

Model Selection. Frank Wood. December 10, 2009 Model Selection Frank Wood December 10, 2009 Standard Linear Regression Recipe Identify the explanatory variables Decide the functional forms in which the explanatory variables can enter the model Decide

More information

Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models

Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider variable

More information

Generalized Elastic Net Regression

Generalized Elastic Net Regression Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1

More information

Regularization Paths. Theme

Regularization Paths. Theme June 00 Trevor Hastie, Stanford Statistics June 00 Trevor Hastie, Stanford Statistics Theme Regularization Paths Trevor Hastie Stanford University drawing on collaborations with Brad Efron, Mee-Young Park,

More information

Regularized Discriminant Analysis and Its Application in Microarray

Regularized Discriminant Analysis and Its Application in Microarray Regularized Discriminant Analysis and Its Application in Microarray Yaqian Guo, Trevor Hastie and Robert Tibshirani May 5, 2004 Abstract In this paper, we introduce a family of some modified versions of

More information

Joint Modeling of Longitudinal Item Response Data and Survival

Joint Modeling of Longitudinal Item Response Data and Survival Joint Modeling of Longitudinal Item Response Data and Survival Jean-Paul Fox University of Twente Department of Research Methodology, Measurement and Data Analysis Faculty of Behavioural Sciences Enschede,

More information

Survival Prediction Under Dependent Censoring: A Copula-based Approach

Survival Prediction Under Dependent Censoring: A Copula-based Approach Survival Prediction Under Dependent Censoring: A Copula-based Approach Yi-Hau Chen Institute of Statistical Science, Academia Sinica 2013 AMMS, National Sun Yat-Sen University December 7 2013 Joint work

More information

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Journal of Data Science 9(2011), 549-564 Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Masaru Kanba and Kanta Naito Shimane University Abstract: This paper discusses the

More information

On High-Dimensional Cross-Validation

On High-Dimensional Cross-Validation On High-Dimensional Cross-Validation BY WEI-CHENG HSIAO Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan hsiaowc@stat.sinica.edu.tw 5 WEI-YING

More information

Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes. Donglin Zeng, Department of Biostatistics, University of North Carolina

Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes. Donglin Zeng, Department of Biostatistics, University of North Carolina Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes Introduction Method Theoretical Results Simulation Studies Application Conclusions Introduction Introduction For survival data,

More information

Exam: high-dimensional data analysis January 20, 2014

Exam: high-dimensional data analysis January 20, 2014 Exam: high-dimensional data analysis January 20, 204 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question not the subquestions on a separate piece of paper. - Finish

More information

The lasso: some novel algorithms and applications

The lasso: some novel algorithms and applications 1 The lasso: some novel algorithms and applications Newton Institute, June 25, 2008 Robert Tibshirani Stanford University Collaborations with Trevor Hastie, Jerome Friedman, Holger Hoefling, Gen Nowak,

More information

Statistical Methods for Data Mining

Statistical Methods for Data Mining Statistical Methods for Data Mining Kuangnan Fang Xiamen University Email: xmufkn@xmu.edu.cn Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

Regularized Discriminant Analysis and Its Application in Microarrays

Regularized Discriminant Analysis and Its Application in Microarrays Biostatistics (2005), 1, 1, pp. 1 18 Printed in Great Britain Regularized Discriminant Analysis and Its Application in Microarrays By YAQIAN GUO Department of Statistics, Stanford University Stanford,

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

CMSC858P Supervised Learning Methods

CMSC858P Supervised Learning Methods CMSC858P Supervised Learning Methods Hector Corrada Bravo March, 2010 Introduction Today we discuss the classification setting in detail. Our setting is that we observe for each subject i a set of p predictors

More information

A simulation study of model fitting to high dimensional data using penalized logistic regression

A simulation study of model fitting to high dimensional data using penalized logistic regression A simulation study of model fitting to high dimensional data using penalized logistic regression Ellinor Krona Kandidatuppsats i matematisk statistik Bachelor Thesis in Mathematical Statistics Kandidatuppsats

More information

Grouped variable selection in high dimensional partially linear additive Cox model

Grouped variable selection in high dimensional partially linear additive Cox model University of Iowa Iowa Research Online Theses and Dissertations Fall 2010 Grouped variable selection in high dimensional partially linear additive Cox model Li Liu University of Iowa Copyright 2010 Li

More information

Extensions of Cox Model for Non-Proportional Hazards Purpose

Extensions of Cox Model for Non-Proportional Hazards Purpose PhUSE Annual Conference 2013 Paper SP07 Extensions of Cox Model for Non-Proportional Hazards Purpose Author: Jadwiga Borucka PAREXEL, Warsaw, Poland Brussels 13 th - 16 th October 2013 Presentation Plan

More information

Building a Prognostic Biomarker

Building a Prognostic Biomarker Building a Prognostic Biomarker Noah Simon and Richard Simon July 2016 1 / 44 Prognostic Biomarker for a Continuous Measure On each of n patients measure y i - single continuous outcome (eg. blood pressure,

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Sparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda

Sparse regression. Optimization-Based Data Analysis.   Carlos Fernandez-Granda Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic

More information

TECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection

TECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection DEPARTMENT OF STATISTICS University of Wisconsin 1210 West Dayton St. Madison, WI 53706 TECHNICAL REPORT NO. 1091r April 2004, Revised December 2004 A Note on the Lasso and Related Procedures in Model

More information

Stability and the elastic net

Stability and the elastic net Stability and the elastic net Patrick Breheny March 28 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/32 Introduction Elastic Net Our last several lectures have concentrated on methods for

More information

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS023) p.3938 An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Vitara Pungpapong

More information

arxiv: v1 [stat.me] 7 Dec 2013

arxiv: v1 [stat.me] 7 Dec 2013 VARIABLE SELECTION FOR SURVIVAL DATA WITH A CLASS OF ADAPTIVE ELASTIC NET TECHNIQUES arxiv:1312.2079v1 [stat.me] 7 Dec 2013 By Md Hasinur Rahaman Khan University of Dhaka and By J. Ewart H. Shaw University

More information

Semiparametric Regression

Semiparametric Regression Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under

More information

Simultaneous Coefficient Penalization and Model Selection in Geographically Weighted Regression: The Geographically Weighted Lasso

Simultaneous Coefficient Penalization and Model Selection in Geographically Weighted Regression: The Geographically Weighted Lasso Simultaneous Coefficient Penalization and Model Selection in Geographically Weighted Regression: The Geographically Weighted Lasso by David C. Wheeler Technical Report 07-08 October 2007 Department of

More information

Linear Regression Models. Based on Chapter 3 of Hastie, Tibshirani and Friedman

Linear Regression Models. Based on Chapter 3 of Hastie, Tibshirani and Friedman Linear Regression Models Based on Chapter 3 of Hastie, ibshirani and Friedman Linear Regression Models Here the X s might be: p f ( X = " + " 0 j= 1 X j Raw predictor variables (continuous or coded-categorical

More information

Lecture 2 Machine Learning Review

Lecture 2 Machine Learning Review Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things

More information

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment

More information

Least Absolute Deviations Estimation for the Accelerated Failure Time Model. University of Iowa. *

Least Absolute Deviations Estimation for the Accelerated Failure Time Model. University of Iowa. * Least Absolute Deviations Estimation for the Accelerated Failure Time Model Jian Huang 1,2, Shuangge Ma 3, and Huiliang Xie 1 1 Department of Statistics and Actuarial Science, and 2 Program in Public Health

More information

CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS

CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Mingon Kang, Ph.D. Computer Science, Kennesaw State University Problems

More information