Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes. Donglin Zeng, Department of Biostatistics, University of North Carolina
1 Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes
2 Introduction Method Theoretical Results Simulation Studies Application Conclusions
3 Introduction
4-6 Introduction For survival data, one important goal is to use a subject's baseline information to predict the exact timing of an event. A variety of regression models focus on estimating the survival function and evaluating covariate effects, but not on predicting event times. We consider supervised learning algorithms for prediction: they aim directly at prediction; they are nonparametric; and they are powerful and flexible in handling a large number of predictors.
7-8 Introduction Many learning methods exist for predicting non-censored outcomes; among them, support vector machines (SVMs) are commonly used for binary outcomes. They have a simple geometric interpretation; training is a convex quadratic programming problem; non-linearity is incorporated through kernels; and they are adapted to regression with a continuous response via the ε-insensitive loss.
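For reference before the censored-data extensions below, a minimal numpy sketch of the ε-insensitive loss; the function name and example values are ours, not from the talk.

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """SVR loss: zero inside the eps-tube, linear outside it."""
    return np.maximum(np.abs(y_true - y_pred) - eps, 0.0)

# Residuals smaller than eps incur no loss; larger ones grow linearly.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 2.0])
print(eps_insensitive_loss(y_true, y_pred))  # [0.  0.4 0.9]
```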
9-10 Introduction For survival outcomes, censoring poses a new challenge for SVMs. Most existing methods focus on modifying support vector regression. Shivaswamy et al. (2007) and Khan and Zubek (2008) generalized the ε-insensitive loss function.
11-12 Introduction Van Belle et al. (2009, 2011a) adopted the concordance index to handle censored data, considering all comparable pairs. For observed times $y_i < y_j$, ranking constraints are added so that misranking of the predicted values is penalized: $f(x_j) - f(x_i) \ge 1 - \zeta_{ij}$, $i < j$; a sketch follows below. Van Belle et al. (2011b) combined regression (loss-function modification) and ranking constraints. Limitations: the prediction rule is not clear; there is no theoretical justification; the observed information may not be fully used; and censoring is assumed completely random.
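A hedged sketch of the ranking-penalty idea behind these constraints, written as a hinge loss over comparable pairs; function and variable names are ours, and the comparability rule (the earlier time must be an observed event) is the usual concordance-index convention.

```python
import numpy as np

def ranking_hinge(scores, times, events):
    """Sum of [1 - (f(x_j) - f(x_i))]_+ over comparable pairs with t_i < t_j.

    A pair is comparable when the earlier subject's time is an observed
    event (events[i] == 1), so we know subject i truly failed first.
    """
    loss = 0.0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue
        for j in range(n):
            if times[i] < times[j]:
                loss += max(0.0, 1.0 - (scores[j] - scores[i]))
    return loss
```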
13-14 Introduction Goldberg and Kosorok (2013) used inverse probability of censoring weighting (IPCW) to adapt standard support vector methods. Their method may suffer severe bias when the censoring distribution is misspecified; the learning uses only uncensored observations; and large weights (under heavy censoring) often make the algorithms numerically unstable or even infeasible.
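A minimal sketch of the IPCW weights described here, assuming the censoring distribution is estimated by Kaplan-Meier (the IPCW-KM variant compared in the simulations later); names are ours. Note how the weights blow up when the estimated censoring survival is small, which is exactly the instability point above.

```python
import numpy as np

def ipcw_weights(obs_times, events):
    """Weights Delta_i / G(T_i-), with G the Kaplan-Meier estimate of the
    censoring survival function (censorings treated as the 'events').
    Censored subjects get weight 0; uncensored subjects are up-weighted."""
    order = np.argsort(obs_times)
    n = len(obs_times)
    G_left = np.empty(n)
    surv = 1.0
    for k in range(n):
        G_left[order[k]] = surv            # left limit, just before this time
        surv *= 1.0 - (1 - events[order[k]]) / (n - k)   # censoring reduces G
    return np.where(events == 1, 1.0 / np.maximum(G_left, 1e-12), 0.0)
```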
15 Method
16-17 Method We represent the survival times in the counting-process framework. At each event time, we use a support vector machine to separate the event from the non-events.
18-21 Method A risk score $f(t, X) = \alpha(t) + g(X)$ is used to classify the binary outcomes (event vs. non-event) in a maximal-separation sense. $\alpha(t)$ depends on $t$ and stratifies times for the risk sets; $g(x)$ is the covariate function used for the actual prediction, e.g. $g(x) = X^T\beta$ for the linear kernel. Note the imbalance between events and non-events at each $t$ (1 vs. $n$).
22 Method Notation: counting process $N_i(t) = I(T_i \le C_i, T_i \le t)$; at-risk process $Y_i(t) = I(T_i \wedge C_i \ge t)$; total number of events $d = \sum_{i=1}^n I(T_i \le C_i)$. Primal form:
$$\min_{\alpha, g}\ \frac{1}{2}\|g\|^2 + C_n \sum_{i=1}^n \sum_{j=1}^d Y_i(t_j)\, w_i(t_j)\, \zeta_i(t_j)$$
subject to
$$Y_i(t_j)\,\delta N_i(t_j)\{\alpha(t_j) + g(X_i)\} \ge Y_i(t_j)\{1 - \zeta_i(t_j)\}, \qquad Y_i(t_j)\,\zeta_i(t_j) \ge 0,$$
where $\delta N_i(t_j) = 2\{N_i(t_j) - N_i(t_j-)\} - 1$ and
$$w_i(t_j) = I\{\delta N_i(t_j) = 1\}\left\{1 - \frac{1}{\sum_{k=1}^n Y_k(t_j)}\right\} + I\{\delta N_i(t_j) = -1\}\,\frac{1}{\sum_{k=1}^n Y_k(t_j)}.$$
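To make the primal concrete, here is a sketch (ours) that expands raw data into the counting-process representation of slide 22: one classification instance per at-risk subject per distinct event time, with label $\delta N_i(t_j)$ and the balancing weight $w_i(t_j)$.

```python
import numpy as np

def expand_counting_process(X, obs_times, events):
    """One row per (at-risk subject i, event time t_j): features X_i,
    label dN = +1 if subject i fails at t_j else -1, and a weight w_i(t_j)
    balancing the single event against the rest of the risk set."""
    event_times = np.sort(np.unique(obs_times[events == 1]))
    rows, labels, weights = [], [], []
    for tj in event_times:
        at_risk = obs_times >= tj                 # Y_i(t_j) = 1
        n_risk = at_risk.sum()
        for i in np.where(at_risk)[0]:
            is_event = events[i] == 1 and obs_times[i] == tj
            rows.append(X[i])
            labels.append(1 if is_event else -1)
            weights.append(1 - 1 / n_risk if is_event else 1 / n_risk)
    return np.array(rows), np.array(labels), np.array(weights)
```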
23-24 Method Dual form:
$$L_D = \sum_{i=1}^n \sum_{j=1}^d \gamma_{ij} Y_i(t_j) - \frac{1}{2} \sum_{i=1}^n \sum_{i'=1}^n \sum_{j=1}^d \sum_{j'=1}^d \gamma_{ij}\gamma_{i'j'}\, Y_i(t_j) Y_{i'}(t_{j'})\, \delta N_i(t_j)\, \delta N_{i'}(t_{j'})\, K(X_i, X_{i'})$$
subject to $0 \le \gamma_{ij} \le w_i(t_j) C_n$ and $\sum_{i=1}^n \gamma_{ij} Y_i(t_j)\, \delta N_i(t_j) = 0$ for each $j$. Here $K(X_i, X_{i'})$ is the kernel function, the inner product of the feature maps of $X_i$ and $X_{i'}$. The predictive score is $\hat g(x) = \sum_{i=1}^n \sum_{j=1}^d \hat\gamma_{ij}\, Y_i(t_j)\, \delta N_i(t_j)\, K(X_i, x)$.
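Given dual solutions $\hat\gamma_{ij}$ from a QP solver (not shown), the predictive score on slide 24 is a kernel expansion. A minimal numpy sketch, with names of our choosing:

```python
import numpy as np

def predictive_score(x_new, X, gamma_hat, Y_risk, dN, kernel):
    """g_hat(x) = sum_{i,j} gamma_ij Y_i(t_j) dN_i(t_j) K(X_i, x).

    gamma_hat, Y_risk, dN are (n, d) arrays indexed by subject i and
    distinct event time t_j; `kernel` maps two feature vectors to K(., .).
    """
    coef = (gamma_hat * Y_risk * dN).sum(axis=1)   # collapse over event times
    return sum(c * kernel(X[i], x_new) for i, c in enumerate(coef))

linear_kernel = lambda u, v: float(np.dot(u, v))
```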
25 Method Prediction: use the fitted scores $\hat g(x)$ and the distinct event times in the training set to predict the survival outcome of a future subject. Compute $\hat g(x_{\text{new}})$ and find the training subject whose fitted score is closest; predict that subject's event time. For example, if $\hat g(x_{\text{new}})$ is closest to $\hat g(x_2)$, we predict the survival outcome of this subject to be $T_2$.
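The nearest-score prediction rule of slide 25 as a sketch, assuming the fitted scores of training subjects with observed events are paired with their event times; names are ours.

```python
import numpy as np

def predict_event_time(g_new, g_train, event_times_train):
    """Predict the event time of the training subject whose fitted
    score is closest to the new subject's score g_new."""
    idx = np.argmin(np.abs(g_train - g_new))
    return event_times_train[idx]
```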
26 Theoretical Results
27-29 Empirical risk Regularization form: $\min_f R_n(f) + \lambda_n \|g\|^2$, where
$$R_n(f) = n^{-1} \sum_{i=1}^n \sum_{j=1}^d w_i(t_j)\, Y_i(t_j)\, \big[1 - \{\alpha(t_j) + g(X_i)\}\,\delta N_i(t_j)\big]_+.$$
After substituting $\hat\alpha(t_j)$ into $R_n(f)$, we obtain a profile empirical risk $PR_n(g)$ for $g(\cdot)$:
$$PR_n(g) = \frac{1}{n} \sum_{i=1}^n \Delta_i\, \frac{\sum_{k=1}^n I(Y_k \ge Y_i)\,[2 - g(X_i) + g(X_k)]_+}{\sum_{k=1}^n I(Y_k \ge Y_i)} - \frac{2}{n} \sum_{i=1}^n \frac{\Delta_i}{\sum_{k=1}^n I(Y_k \ge Y_i)},$$
where $Y_i$ here denotes the observed time and $\Delta_i$ the event indicator. $\hat g(x)$ minimizes $PR_n(g) + \lambda_n \|g\|^2$, and $R_n(\hat f) = PR_n(\hat g)$. $PR_n(g)$ takes a form similar to the partial likelihood function in survival analysis, under a different loss function.
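A direct numpy translation of the profile risk as reconstructed above, a sketch with names ours. One detail worth noting: the subtracted $2/m$ term exactly cancels the self-comparison $k = i$, whose hinge contribution is always $[2 - g(X_i) + g(X_i)]_+ = 2$.

```python
import numpy as np

def profile_risk(g, obs_times, events):
    """Profile empirical risk PR_n(g): for each observed event i, average
    the hinge [2 - g(X_i) + g(X_k)]_+ over the risk set {k : Y_k >= Y_i},
    then subtract 2 / |risk set| (which removes the k = i self-pair)."""
    n = len(obs_times)
    total = 0.0
    for i in range(n):
        if events[i] != 1:
            continue
        risk = obs_times >= obs_times[i]
        m = risk.sum()
        hinge = np.maximum(2.0 - g[i] + g[risk], 0.0).sum()
        total += hinge / m - 2.0 / m
    return total / n
```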
30-31 Risk Function and Optimal Decision Rule We consider another empirical risk $R_{0n}(f)$, the analogue of the 0-1 loss in standard SVMs:
$$R_{0n}(f) = n^{-1} \sum_{i=1}^n \sum_{j=1}^d w_i(t_j)\, Y_i(t_j)\, I\big(\delta N_i(t_j)\, f(t_j, X_i) < 0\big).$$
Denoting the asymptotic limits of $R_n(f)$ and $R_{0n}(f)$ by $R(f)$ and $R_0(f)$, we find the optimal decision rule $f^*(t, x)$ that minimizes both $R(f)$ and $R_0(f)$, and the minimal risk $R_0(f^*)$.
32 Risk Function and Optimal Decision Rule Theorem. Let $h(t, x)$ denote the conditional hazard rate of $T$ at $t$ given $X = x$, and let $\bar h(t) = E[dN(t)/dt]/E[Y(t)] = E[h(t, X) \mid Y(t) = 1]$ be the average hazard rate at time $t$. Then $f^*(t, x) = \operatorname{sign}\{h(t, x) - \bar h(t)\}$ minimizes $R(f)$. Furthermore, $f^*(t, x)$ also minimizes $R_0(f)$, and
$$R_0(f^*) = P(T \le C) - \frac{1}{2}\, E\left[\int E\{Y(t) \mid X = x\}\,\big|h(t, x) - \bar h(t)\big|\, dt\right].$$
In addition, for any $f(t, x) \in [-1, 1]$,
$$R_0(f) - R_0(f^*) \le c\,\{R(f) - R(f^*)\}$$
for some constant $c$.
33 Asymptotic Properties We consider the profile risk $PR(g)$ and the reproducing kernel Hilbert space $H_n$ generated by a Gaussian kernel with bandwidth parameter $\sigma_n$, and derive the asymptotic learning rate. Theorem. Assume that the support of $X$ is compact and $E[Y(\tau) \mid X]$ is bounded away from zero, where $\tau$ is the study duration. Furthermore, assume $\lambda_n \to 0$, $\sigma_n \to 0$, and $n\lambda_n \sigma_n^{(2/p - 1/2)d} \to \infty$ for some $p \in (0, 2)$. Then
$$\lambda_n \|\hat g\|_{H_n}^2 + PR(\hat g) - \inf_g PR(g) \le O_p\left(\lambda_n + \sigma_n^{d/2} + n^{-1/2}\,\lambda_n^{-1/2}\,\sigma_n^{-(1/p - 1/4)d}\right),$$
where the last term vanishes under the stated rate condition on $n\lambda_n \sigma_n^{(2/p - 1/2)d}$.
34 Simulation Studies
35 Simulation Setup Our method is compared with Van Belle et al. (2011) (Modified SVR) and Goldberg and Kosorok (2013) (IPCW). Five baseline covariates X, marginally normal with mean 0. Survival times are generated from a Cox model with a Weibull baseline distribution and β = (2, 1.6, 1.2, 0.8, 0.4). Censoring times depend on the covariates X, with censoring ratios 40% and 60%: Case 1 uses AFT models for censoring; Case 2 uses Cox models for censoring. A data-generation sketch follows below.
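A hedged sketch of a data-generating mechanism matching this description. The Weibull baseline parameters, the independence of the covariates, and the censoring-model coefficients below are illustrative assumptions; in practice they would be calibrated to the stated 40%/60% censoring ratios.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
beta = np.array([2.0, 1.6, 1.2, 0.8, 0.4])

X = rng.normal(size=(n, p))          # five mean-zero normal covariates (independence assumed)

# Cox model with Weibull baseline: H0(t) = (t / scale) ** shape, so by
# inversion T = scale * (-log U / exp(x'beta)) ** (1 / shape).
shape, scale = 2.0, 1.0              # illustrative baseline parameters
U = rng.uniform(size=n)
T = scale * (-np.log(U) / np.exp(X @ beta)) ** (1 / shape)

# Covariate-dependent censoring, e.g. an AFT-type model (Case 1);
# tune the constants to hit the target censoring ratio.
C = np.exp(0.5 * X[:, 0] + rng.normal(scale=1.0, size=n))
obs_times = np.minimum(T, C)
events = (T <= C).astype(int)
print("censoring rate:", 1 - events.mean())
```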
36 Simulation Setup Linear kernel; sample sizes 100 and 200; 500 replicates. The tuning parameter $C_n$ is selected by 5-fold cross-validation over the grid $2^{-16}, 2^{-15}, \ldots$ (see the sketch below). Prediction performance is evaluated on an independent testing data set using correlation and RMSE.
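A sketch of the tuning loop: 5-fold cross-validation over the dyadic grid. The upper endpoint $2^{16}$ is an assumption (the transcription truncates the grid), and `fit_svhr` / `cv_score` are hypothetical hooks standing in for the solver and the held-out evaluation metric.

```python
import numpy as np
from sklearn.model_selection import KFold

grid = [2.0 ** k for k in range(-16, 17)]   # 2^-16, 2^-15, ..., 2^16 (upper end assumed)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

def cv_select_C(X, obs_times, events, fit_svhr, cv_score):
    """5-fold CV over the grid; fit_svhr and cv_score are hypothetical
    hooks for fitting the model and scoring held-out predictions."""
    best_C, best = None, -np.inf
    for C in grid:
        scores = []
        for tr, te in kf.split(X):
            model = fit_svhr(X[tr], obs_times[tr], events[tr], C)
            scores.append(cv_score(model, X[te], obs_times[te], events[te]))
        if np.mean(scores) > best:
            best_C, best = C, float(np.mean(scores))
    return best_C
```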
37 Simulation Results Table: Case 1, censoring times following the AFT model, 40% censoring. For each number of noise covariates and each of n = 100 and n = 200, the table reports correlation, RMSE (SE), and the RMSE ratio relative to SVHR for Modified SVR, IPCW-KM, IPCW-Cox, and SVHR. SVHR has the smallest RMSE throughout (ratio 1.00); competing ratios range from 1.09 to 1.27. [Remaining numeric entries not recoverable from the transcription.]
38 Simulation Results Table: Case 1, censoring times following the AFT model, 60% censoring. Same layout as the previous table. SVHR again has the smallest RMSE (ratio 1.00); competing ratios range from 1.10 to 1.33. [Remaining numeric entries not recoverable from the transcription.]
39 Simulation Results Table: Case 2, censoring times following the Cox model, 40% censoring. Same layout as the previous tables. SVHR has the smallest RMSE (ratio 1.00); competing ratios range from 1.04 to 1.18. [Remaining numeric entries not recoverable from the transcription.]
40 Simulation Results Table: Case 2, censoring times following the Cox model, 60% censoring. Same layout as the previous tables. SVHR has the smallest RMSE (ratio 1.00); competing ratios range from 1.03 to 1.20. [Remaining numeric entries not recoverable from the transcription.]
41 Application
42 Huntington's Disease Study Data The study aims to identify and combine clinical and biological markers to detect early indicators of disease progression. The outcome is the age at onset, observed exactly for events and censored otherwise. There are 705 subjects, 126 of whom are uncensored. We study the prediction capability of 15 covariates for the age at onset of Huntington's disease. Three-fold cross-validation is used to choose the tuning parameter $C_n$, and both linear and Gaussian kernels are considered. To evaluate prediction capability, we assess the usefulness of the combined score for risk stratification.
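A sketch of the risk-stratification evaluation used in the application sections, assuming the lifelines package: split subjects at a percentile of the predicted score, then report the log-rank chi-square and the hazard ratio between the two groups. Function and column names are ours.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.statistics import logrank_test

def stratify_and_test(scores, obs_times, events, pct=50):
    """Split at a percentile of the predicted score; return the log-rank
    chi-square statistic and the hazard ratio between the two groups."""
    cut = np.percentile(scores, pct)
    high = scores > cut
    lr = logrank_test(obs_times[high], obs_times[~high],
                      event_observed_A=events[high],
                      event_observed_B=events[~high])
    df = pd.DataFrame({"time": obs_times, "event": events,
                       "high_risk": high.astype(int)})
    cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
    hr = float(np.exp(cph.params_["high_risk"]))
    return lr.test_statistic, hr
```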
43 Huntington's Disease Study Data Table: Normalized coefficient estimates using the linear kernel for the 15 markers (Total Motor Score, CAP, Stroop Color, Stroop Word, SDMT, Stroop Interference, FRSBE Total, UHDRS Psychiatric, SCL90 Depression, SCL90 GSI, SCL90 PST, SCL90 PSDI, TFC, Education, Male Gender), alongside the corresponding Cox model estimates. Estimates significant in the Cox model (p < 0.05, marked *): Total Motor Score, CAP, and Male Gender. [Numeric entries not recoverable from the transcription.]
44 Huntington's Disease Study Data Table: Comparison of prediction capability for the different methods (Modified SVR, IPCW-KM, IPCW-Cox, SVHR) under linear and Gaussian kernels: C-index, plus log-rank χ² statistics and hazard ratios comparing the two groups separated at the 25th, 50th, and 75th percentiles of the predicted values. [Numeric entries not recoverable from the transcription.]
45 Huntington's Disease Study Data Figure: Hazard ratios comparing two groups separated using percentiles of the predicted values as cut points, for the linear kernel and the Gaussian kernel. Dotted curve: Modified SVR; dashed curve: IPCW-KM; dash-dotted curve: IPCW-Cox; solid curve: SVHR.
46 Atherosclerosis Risk in Communities Study Data A prospective epidemiologic study investigating the etiology of atherosclerosis and its cardiovascular risk factors. The baseline examination enrolled 15,792 participants aged 45-64 from four U.S. communities. In this example, we apply our method to part of the baseline data: African-American male participants with hypertension living in Jackson, Mississippi. There are 624 participants, 133 of whom are uncensored. We assess the prediction capability of some common cardiovascular risk factors for incident heart failure over follow-up, analyzing the data with the same procedure as before.
47 Atherosclerosis Risk in Communities Study Data Table: Normalized coefficient estimates using the linear kernel for Age (years), Diabetes, BMI (kg/m²), SBP (mm Hg), Fasting glucose (mg/dL), Serum albumin (g/dL), Serum creatinine (mg/dL), Heart rate (beats/minute), Left ventricular hypertrophy, Bundle branch block, Prevalent CHD, Valvular heart disease, HDL (mg/dL), LDL (mg/dL), Pack-years of smoking, Current smoking status, and Former smoking status, alongside the corresponding Cox model estimates. Significant in the Cox model (p < 0.05, marked *): Age, Diabetes, Serum albumin, Left ventricular hypertrophy, Bundle branch block, Prevalent CHD, Valvular heart disease, HDL, Pack-years of smoking, and Former smoking status. [Numeric entries not recoverable from the transcription.]
48 Atherosclerosis Risk in Communities Study Data Table: Comparison of prediction capability for the different methods (Modified SVR, IPCW-KM, IPCW-Cox, and our method) under linear and Gaussian kernels: C-index, plus log-rank χ² statistics and hazard ratios comparing the two groups separated at the 25th, 50th, and 75th percentiles of the predicted values. [Numeric entries not recoverable from the transcription.]
49 Atherosclerosis Risk in Communities Study Data Figure: Hazard ratios comparing two groups separated using percentiles of the predicted values as cut points, for the linear kernel and the Gaussian kernel. Dotted curve: Modified SVR; dashed curve: IPCW-KM; dash-dotted curve: IPCW-Cox; solid curve: SVHR.
50 Conclusions
51 Concluding Remarks We adapted support vector machines to predict event times within the counting-process framework. The proposed method is optimal in discriminating the covariate-specific hazard from the population-average hazard. Censored data are handled appropriately without specifying the censoring distribution. Numerical studies showed the superiority of SVHR, especially under heavy censoring and in the presence of noise variables. One potential challenge is that the dimension of the quadratic programming problem grows quickly with the sample size.
More information