Data splitting. INSERM Workshop: Evaluation of predictive models: goodness-of-fit and predictive power


1 Data splitting. INSERM Workshop: Evaluation of predictive models: goodness-of-fit and predictive power. Thomas Alexander Gerds, Department of Biostatistics, University of Copenhagen. 30 May

2 Outline: * Introduction * Parameters of interest * Estimation methods * Simulation results * Guidelines & pitfalls * Summary

3 This talk is not about - how to deal with censored data - how to deal with competing risks - the choice of the prediction metric - the choice of the modelling algorithm. This talk is about methods for internal validation of risk prediction models based on repeated data splitting. [Introduction]

4 The estimation problem: a mission impossible. A statistical risk prediction model which only works in its own training data is practically useless. Aim: to estimate how well the model generalizes to new data, i.e. how it will perform in *yet unseen patients*. Dilemma: there are no new data! [Introduction]

5 For the purpose of illustration. The Copenhagen stroke study: ten years of follow-up of 518 patients after stroke until death. Cox regression:

Factor          Unit            Hazard ratio  95% CI        P-value
Age             per year        1.06          [1.04;1.07]   <
Sex             female vs male  1.47          [1.18;1.83]
Hypertension    no vs yes       1.24          [1.00;1.53]
Stroke history  no vs yes       1.27          [0.98;1.63]
Other disease   no vs yes       1.13          [0.87;1.47]   0.35
Alcohol         no vs yes       0.91          [0.72;1.16]
Diabetes        no vs yes       1.38          [1.04;1.81]
Smoking         no vs yes       1.30          [1.04;1.62]
Stroke score    scale           0.98          [0.97;0.98]   <
Cholesterol     mg/dl           1.00          [0.92;1.08]

In what follows: simulate data that are "alike" the real data, based on Weibull regression models. [Introduction]
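The "alike" simulation idea can be sketched with a Weibull proportional-hazards model; the shape, scale, and age coefficient below are hypothetical illustration values, not the parameters fitted to the stroke data.

```python
import math
import random

random.seed(1)

def simulate_weibull_ph(n, shape=1.2, scale=8.0, beta_age=0.05):
    """Draw (age, event time) pairs from a Weibull proportional-hazards model.

    Survival function: S(t | x) = exp(-(t/scale)^shape * exp(lp)), so by
    inverse transform sampling T = scale * (-log(U) / exp(lp))**(1/shape).
    """
    data = []
    for _ in range(n):
        age = random.uniform(40, 90)
        lp = beta_age * (age - 65)      # linear predictor, centered at age 65
        u = random.random()
        t = scale * (-math.log(u) / math.exp(lp)) ** (1 / shape)
        data.append((age, t))
    return data

sample = simulate_weibull_ph(500)
```

A censoring time would be drawn the same way from a second Weibull model and the observed time taken as the minimum of the two.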

6 Notation (part I).
Predictors: X ∈ ℝ^p (age, sex, diabetes, stroke score, ...)
Outcome at time t: Y(t) ∈ {0,1}, 0 = survived, 1 = dead
Risk prediction model: (X, t) ↦ [0,1], R̂_n(t|X) ≈ P(Y(t) = 1 | X)
Performance metric: 1 − E[{Y(t) − R̂_n(t|X)}²] [Introduction]
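The performance metric above is one minus the expected squared prediction error (the Brier score). A minimal sketch with made-up outcomes and risks:

```python
def brier_score(y, risk):
    """Mean squared difference between binary outcomes and predicted risks."""
    assert len(y) == len(risk)
    return sum((yi - ri) ** 2 for yi, ri in zip(y, risk)) / len(y)

# Toy data: status Y(t) at time t (1 = dead) and predicted risks (hypothetical).
y = [0, 1, 1, 0, 1]
risk = [0.2, 0.8, 0.6, 0.3, 0.9]

bs = brier_score(y, risk)
performance = 1 - bs     # the slide's performance metric
```

A coin flipper (risk 0.5 for everyone) has Brier score 0.25, which is the benchmark a useful model must beat.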

7 Fundamental idea. Data splitting is very intuitive: we hide one part of the data, learn on the rest, and then check our knowledge on what was hidden. There is a hidden parameter here: how much we hide and how much we show. [Introduction]

8 Sketch: 3-fold CV. [Figure] [Introduction]

9 Effect of learning sample size on predictions. [Figure: 5-year survival probabilities for three patients (nr. 17, 18, 19) from the Copenhagen stroke study (n=518, p=11, n.event=404), Cox regression with B=100 repetitions, plotted against the percentage of hidden data, up to leave-one-out.] [Introduction]

10 Effect of test set size on model performance. [Figure: performance of 5-year predictions for Cox regression (Copenhagen stroke study) and for a coin flipper, plotted against the size of the simulated validation sample.] [Introduction]

11 Effect of split-ratio. No matter how often we split the data: - The smaller the size of the learning samples, the higher the variability of the risk predictions. - The smaller the size of the validation samples, the higher the variability of the estimate of model performance. *Dilemma* *Tradeoff* *#%&Grr!!?! Wait a minute: what do we want to estimate? [Introduction]

14 Dietterich (1998): a frequently applied strategy is to convert Question 2 into Question 6. [Parameters of interest]

15 Notation (part II).
The one and only data set: D_n = {(Y_1(t), X_1), ..., (Y_n(t), X_n)}
The available risk prediction model: R̂_n = R(D_n) [trained in this data set]
The average model at n: r_n = E_{D_n} R(D_n) [trained in a data set of size n]
Note: the function R is the model selection algorithm. [Parameters of interest]

16 Definition:
a) Conditional performance at D_n: E_{Y,X}[{Y(t) − R̂_n(t|X)}² | D_n]
b) Expected performance at sample size n: E_{D_n}(E_{Y,X}[{Y(t) − R̂_n(t|X)}² | D_n])
Efron & Tibshirani (1997): "Note, however, that although the conditional error rate is often what we would like to obtain, none of the methods correlates very well with it on a sample by sample basis." [Parameters of interest]

18 Decomposition of the expected prediction performance:
E_{D_n}(E_{Y,X}[{Y(t) − R̂_n(t|X)}² | D_n])
  = E_{X,Y}[{Y(t) − r_n(t|X)}²]                                  (model accuracy)
  + E_{D_n} E_X[{R̂_n(t|X) − r_n(t|X)}²]                          (model uncertainty)
  − 2 E_{X,Y}[{Y(t) − r_n(t|X)} E_{D_n}{R̂_n(t|X) − r_n(t|X)}]    (= 0, since E_{D_n} R̂_n = r_n)
Note: the model accuracy is the conditional error of the average model r_n = E_{D_n} R(D_n) at size n. [Parameters of interest]

19 Learning curve. [Figure: true performance versus learning sample size, comparing the data generating model, a useless model, an overfitting model, and "Marty McFly"; both the average performance across learning sets and the conditional performance for a single learning set are shown.] [Parameters of interest]

20 Overview. - Cross-validation: + leave-one-out (LOOCV) + k-fold (repeated B times) + leave-k-out (repeated random sub-sampling). - Bootstrap: + optimism corrected bootstrap [Efron -> Harrell] + bootstrap cross-validation + leave-one-out bootstrap [Efron & Tibshirani 1997] + adjusted bootstrap [Jiang & Simon 2007]. Note: all estimates are practically subject to Monte-Carlo variation (except perhaps LOOCV). [Estimation methods]

21 Leave-one-out cross-validation. Denote R̂_n^{−i} = R(D_n \ {(X_i, Y_i)}) for the model trained in the data without subject i:
LOOCV = (1/n) Σ_{i ∈ D_n} {Y_i(t) − R̂_n^{−i}(t|X_i)}²
Advantages: - We expect R(D_n)(t|x) = R̂_n(t|x) ≈ R̂_n^{−i}(t|x). - The result does not depend on a random seed. Disadvantages: - The whole modelling strategy has to be applied n times. - Bias? Variance? For which parameter? [Estimation methods]

23 K-fold cross-validation. Split the data into K disjoint subsets D_n^1, ..., D_n^K of approximately equal size and denote R̂_n^{−k} = R(D_n \ D_n^k):
CV(K) = (1/n) Σ_{k=1}^{K} Σ_{i ∈ D_n^k} {Y_i(t) − R̂_n^{−k}(t|X_i)}²
Advantages: - CV(10) is a frequently used procedure. - Useful if model selection is time consuming. - Has (asymptotic) oracle properties!? Disadvantages: - Usually high Monte-Carlo variation. - Negative bias for the performance at n. [Estimation methods]
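A minimal sketch of CV(K), assuming squared prediction error and a deliberately trivial stand-in model (predict the training mean of the outcome) in place of a Cox regression; LOOCV is the special case K = n.

```python
import random

def kfold_cv(data, K, fit, predict, seed=0):
    """CV(K): split the data into K folds, train on the other K-1 folds,
    and average the squared error over the held-out observations."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[k::K] for k in range(K)]        # K disjoint, near-equal subsets
    total = 0.0
    for k in range(K):
        held_out = set(folds[k])
        train = [data[i] for i in idx if i not in held_out]
        model = fit(train)
        total += sum((data[i][1] - predict(model, data[i][0])) ** 2
                     for i in folds[k])
    return total / len(data)

# Toy "model": ignore x, predict the training mean of y (hypothetical stand-in).
fit = lambda train: sum(y for _, y in train) / len(train)
predict = lambda model, x: model

data = [(x, x % 2) for x in range(20)]           # (covariate, binary outcome)
cv10 = kfold_cv(data, K=10, fit=fit, predict=predict)
```

Repeating the call with different seeds and averaging the results is the "repeated B times" variant from the overview slide, which reduces the Monte-Carlo variation.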

25 Bootstrap cross-validation. 1. Draw B bootstrap training data sets D_b^train from D_n, either with replacement of size m = n or without replacement of size m < n. 2. Fit the model in each training set: R̂_b^train = R(D_b^train). 3. Use the left-out data to compute the performance, then average:
BootCV = (1/B) Σ_{b=1}^{B} (1/n_b) Σ_{i ∉ D_b^train} {Y_i(t) − R̂_b^train(t|X_i)}²
where n_b is the number of subjects left out of D_b^train. [Estimation methods]
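The three steps above can be sketched as follows, again with a trivial stand-in model (predict the training mean) and drawing with replacement of size m = n:

```python
import random

def boot_cv(data, B, fit, predict, seed=0):
    """Bootstrap cross-validation: train on a bootstrap sample, score the
    squared error on the observations left out of that sample, average over B."""
    rng = random.Random(seed)
    n = len(data)
    errs = []
    for _ in range(B):
        in_bag = [rng.randrange(n) for _ in range(n)]   # with replacement, m = n
        out = [i for i in range(n) if i not in set(in_bag)]
        if not out:                                     # rare: no one was left out
            continue
        model = fit([data[i] for i in in_bag])
        err = sum((data[i][1] - predict(model, data[i][0])) ** 2
                  for i in out) / len(out)
        errs.append(err)
    return sum(errs) / len(errs)

fit = lambda train: sum(y for _, y in train) / len(train)   # toy model
predict = lambda model, x: model
data = [(x, x % 2) for x in range(20)]
bootcv = boot_cv(data, B=100, fit=fit, predict=predict)
```

On average about 36.8% of the subjects are left out of each bootstrap sample, so every model is validated on roughly a third of the data.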

26 Leave-one-out bootstrap. 1. Draw B bootstrap training data sets D_b^train from D_n, either with replacement of size m = n or without replacement of size m < n. 2. Fit the model in each training set: R̂_b^train = R(D_b^train). 3. For each subject, average over the models trained without that subject:
LOOBOOT = (1/n) Σ_{i=1}^{n} (1/K_i) Σ_{b: i ∉ D_b^train} {Y_i(t) − R̂_b^train(t|X_i)}²
where K_i is the number of bootstrap training sets not containing subject i. [Estimation methods]
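The difference from BootCV is the order of averaging: first over the K_i out-of-bag models for each subject, then over subjects. A sketch under the same toy-model assumptions as before:

```python
import random

def loo_boot(data, B, fit, predict, seed=0):
    """Leave-one-out bootstrap: each subject i is scored only against the
    bootstrap models whose training sample did not contain i, and the
    per-subject averages are then averaged over all n subjects."""
    rng = random.Random(seed)
    n = len(data)
    models = []
    oob = [[] for _ in range(n)]        # indices b with i not in D_b^train
    for b in range(B):
        draw = [rng.randrange(n) for _ in range(n)]   # with replacement, m = n
        models.append(fit([data[i] for i in draw]))
        in_bag = set(draw)
        for i in range(n):
            if i not in in_bag:
                oob[i].append(b)
    total = 0.0
    for i, (x, y) in enumerate(data):
        if not oob[i]:                  # K_i = 0: subject was in every sample
            continue
        errs = [(y - predict(models[b], x)) ** 2 for b in oob[i]]
        total += sum(errs) / len(errs)  # inner average over the K_i models
    return total / n

fit = lambda train: sum(y for _, y in train) / len(train)   # toy model
predict = lambda model, x: model
data = [(x, x % 2) for x in range(20)]
looboot = loo_boot(data, B=100, fit=fit, predict=predict)
```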

27 Leave-one-out bootstrap. Advantages: - includes more model variability - is less variable than LOOCV. Disadvantages: - assesses the expected performance rather than the conditional performance - underestimates the performance at n - depends on the random seed unless B is large. Notes: - If B is small then LOOBOOT is preferable to BootCV. - The bias depends on the slope of the learning curve. [Estimation methods]

28 Decomposition of the leave-one-out bootstrap:
LOOBOOT = (1/n) Σ_{i=1}^{n} {Y_i(t) − r̄_{K_i}^train(t|X_i)}²                                      (estimated model accuracy)
        + (1/n) Σ_{i=1}^{n} (1/K_i) Σ_{b: i ∉ D_b^train} {R̂_b^train(t|X_i) − r̄_{K_i}^train(t|X_i)}²   (estimated model uncertainty)
Here r̄_{K_i}^train(t|X_i) = (1/K_i) Σ_{b: i ∉ D_b^train} R̂_b^train(t|X_i) is the average prediction at X_i of the bootstrap training models which did not include subject i. [Estimation methods]

29 Apparent performance. The apparent performance (aka re-substitution performance) is obtained by validating the model in its own training data:
App = (1/n) Σ_{i ∈ D_n} {Y_i(t) − R̂_n(t|X_i)}²
Disadvantage: overestimates the prediction performance. [Estimation methods]

30 Optimism corrected bootstrap. Two problems: 1. The bootstrap cross-validation estimate underestimates the expected performance. 2. The apparent performance overestimates the expected performance. A compromise: BootCV + ω (App − BootCV). Remaining question: how to choose ω? [Estimation methods]

31 The .632+ bootstrap estimate. Efron & Tibshirani: define the relative overfit
R̂ = (BootCV − App) / (NoInf − App)
the weight ω̂ = .632 / (1 − .368 R̂), and
Boot632+ = (1 − ω̂) App + ω̂ BootCV
The no-information performance assesses the overfitting by permutation:
NoInf = (1/n²) Σ_{j=1}^{n} Σ_{i=1}^{n} {Y_i(t) − R̂_n(t|X_j)}² [Estimation methods]

32 Design. For different sample sizes we simulate data that are "alike" the Copenhagen stroke study data, based on parametric models for survival and censoring. 1. In each sample we fit a Cox model after automated backward elimination. 2. We generate a huge independent test set ( records) to compute the conditional performance. 3. In each sample we compute LOOCV, App and, using 1000 bootstrap samples, also BootCV and the .632+. Steps 1-3 are repeated 360 times. [Simulation results]

33 Cost study based simulation results. [Figure: learning curve and variation of the conditional performance (45% to 65%) as a function of the learning sample size.] [Simulation results]

34 Apparent performance. [Figure: apparent performance estimates (45% to 65%) as a function of the learning sample size.] [Simulation results]

35 LOOCV versus Bootstrap. [Figure: LOOCV and bootstrap estimates as a function of the learning sample size.] [Simulation results]

36 BootCV: hiding (subsampling) different portions. [Figure: BootCV estimates with subsample sizes for learning of 50%, 36.8%, 20% and 10%, as a function of the learning sample size.] [Simulation results]

37 Next. Guidelines & pitfalls: - comparison of risk prediction models does not always work - the superlearner - practical hints. [Guidelines & pitfalls]

38 Comparison of risk prediction models. We want to assess if R̂_n^(2) has significantly better prediction performance than R̂_n^(1). Define paired residual differences:
Δ_i(t) = {Y_i(t) − R̂_n^(1)(t|X_i)}² − {Y_i(t) − R̂_n^(2)(t|X_i)}²
van de Wiel (2009) proposed a statistical test of H_0: F(δ) + F(−δ) = 1 for all δ, where Δ_i ~ F for fixed training and test set. 1. Use a paired test in each split, and report the median p-value. 2. Requires equal size of the validation sets! 3. There are some unsolved issues in right censored data. [Guidelines & pitfalls]
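The split-and-test recipe can be sketched as follows. An exact sign test serves here as a simple stand-in for van de Wiel's test, and the predictions are precomputed rather than refitted per split, which is a simplification of the actual procedure:

```python
import math
import random
import statistics

def sign_test_p(deltas):
    """Exact two-sided sign test for median(delta) = 0 (hypothetical
    stand-in for the test applied to the paired residual differences)."""
    pos = sum(d > 0 for d in deltas)
    m = sum(d != 0 for d in deltas)
    k = min(pos, m - pos)
    tail = sum(math.comb(m, j) for j in range(k + 1)) / 2 ** m
    return min(1.0, 2 * tail)

def median_p_over_splits(y, r1, r2, n_splits=51, test_frac=0.5, seed=0):
    """Repeat the data split, apply the paired test in each validation set
    (all of equal size), and report the median p-value."""
    rng = random.Random(seed)
    n = len(y)
    pvals = []
    for _ in range(n_splits):
        test = rng.sample(range(n), int(n * test_frac))
        deltas = [(y[i] - r1[i]) ** 2 - (y[i] - r2[i]) ** 2 for i in test]
        pvals.append(sign_test_p(deltas))
    return statistics.median(pvals)

# Toy example: model 2 tracks the truth, model 1 is uninformative.
y = [i % 2 for i in range(40)]
r1 = [0.5] * 40
r2 = [0.9 if yi else 0.1 for yi in y]
mp = median_p_over_splits(y, r1, r2)
```

An odd number of splits guarantees that the reported median is itself one of the computed p-values.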

41 The .632+ bootstrap does not seem to work for random forest. [Figure: prediction error curves over time for Cox regression (COX) and random forest (RF) in the Copenhagen stroke study, estimation method BootCV.] [Guidelines & pitfalls]

42 The .632+ bootstrap does not seem to work for random forest. [Figure: prediction error curves over time for COX and RF in the Copenhagen stroke study, estimation methods BootCV and App.] [Guidelines & pitfalls]

43 The .632+ bootstrap likes random forest. [Figure: prediction error curves over time for COX and RF in the Copenhagen stroke study, estimation methods BootCV and App.] [Guidelines & pitfalls]

44 The SuperLearner has oracle properties (van der Laan et al. 2007). Validation? Interpretation? Risk of an endless loop! [Guidelines & pitfalls]

45 Finally: some practical hints. - Do many splits (or repeat cross-validation several times) to avoid Monte-Carlo variation. - Repeat all model specification steps in each split. - Use the same splits to compare modelling strategies. - Prefer LOOBOOT over BootCV when model fitting is slow. - Do not sample with replacement: + if the learning algorithm uses cross-validation to select a hyperparameter + in high dimensions (see Binder & Schumacher). - UseR! my packages: + pec (continuous, survival, competing risks) + ModelGood (binary). [Guidelines & pitfalls]

46 Citations. RJ Hyndman (blog): "Every statistician knows that the model fit statistics are not a good guide to how well a model will predict." J Shao (1993): "Using the LOOCV method can be compared to using a telescope to see some objects (i.e. different models) 10,000 meters away, whereas using the BOOTCV method is more like using the same telescope to see the same objects only 100 meters away." Efron & Tibshirani (1997): "The same set of bootstrap replications that gives a point estimate of prediction error can also be used to assess the variability of that estimate." [Summary]

49 Last slide. [Summary]

50 References.
1. Shao (1993). Linear model selection by cross-validation. Journal of the American Statistical Association, 88.
2. Dietterich (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10(7).
3. Efron & Tibshirani (1997). Improvement on cross-validation: The .632+ bootstrap method. Journal of the American Statistical Association, 92.
4. van der Laan, Polley & Hubbard (2008). Super Learner. Statistical Applications in Genetics and Molecular Biology, 6.
5. Binder & Schumacher (2008). Adapting prediction error estimates for biased complexity selection in high-dimensional bootstrap samples. Statistical Applications in Genetics and Molecular Biology, 7.
6. van de Wiel, Berkhof & van Wieringen (2009). Testing the prediction error difference between 2 predictors. Biostatistics, 10.
7. Mogensen, Ishwaran & Gerds (2012). Evaluating random forests for survival analysis using prediction error curves. Journal of Statistical Software, 50(11). [Summary]


Machine Learning. Lecture 9: Learning Theory. Feng Li. Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation

More information

Linear Regression. CSL603 - Fall 2017 Narayanan C Krishnan

Linear Regression. CSL603 - Fall 2017 Narayanan C Krishnan Linear Regression CSL603 - Fall 2017 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis Regularization

More information

Linear Regression. CSL465/603 - Fall 2016 Narayanan C Krishnan

Linear Regression. CSL465/603 - Fall 2016 Narayanan C Krishnan Linear Regression CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis

More information

Marginal Screening and Post-Selection Inference

Marginal Screening and Post-Selection Inference Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2

More information

ABC random forest for parameter estimation. Jean-Michel Marin

ABC random forest for parameter estimation. Jean-Michel Marin ABC random forest for parameter estimation Jean-Michel Marin Université de Montpellier Institut Montpelliérain Alexander Grothendieck (IMAG) Institut de Biologie Computationnelle (IBC) Labex Numev! joint

More information

Machine Learning Recitation 8 Oct 21, Oznur Tastan

Machine Learning Recitation 8 Oct 21, Oznur Tastan Machine Learning 10601 Recitation 8 Oct 21, 2009 Oznur Tastan Outline Tree representation Brief information theory Learning decision trees Bagging Random forests Decision trees Non linear classifier Easy

More information

Marginal Structural Cox Model for Survival Data with Treatment-Confounder Feedback

Marginal Structural Cox Model for Survival Data with Treatment-Confounder Feedback University of South Carolina Scholar Commons Theses and Dissertations 2017 Marginal Structural Cox Model for Survival Data with Treatment-Confounder Feedback Yanan Zhang University of South Carolina Follow

More information

How do we compare the relative performance among competing models?

How do we compare the relative performance among competing models? How do we compare the relative performance among competing models? 1 Comparing Data Mining Methods Frequent problem: we want to know which of the two learning techniques is better How to reliably say Model

More information

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements [Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements Aasthaa Bansal PhD Pharmaceutical Outcomes Research & Policy Program University of Washington 69 Biomarkers

More information

Decision Trees. Tirgul 5

Decision Trees. Tirgul 5 Decision Trees Tirgul 5 Using Decision Trees It could be difficult to decide which pet is right for you. We ll find a nice algorithm to help us decide what to choose without having to think about it. 2

More information

A Bias Correction for the Minimum Error Rate in Cross-validation

A Bias Correction for the Minimum Error Rate in Cross-validation A Bias Correction for the Minimum Error Rate in Cross-validation Ryan J. Tibshirani Robert Tibshirani Abstract Tuning parameters in supervised learning problems are often estimated by cross-validation.

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Analysis of MALDI-TOF Data: from Data Preprocessing to Model Validation for Survival Outcome

Analysis of MALDI-TOF Data: from Data Preprocessing to Model Validation for Survival Outcome Analysis of MALDI-TOF Data: from Data Preprocessing to Model Validation for Survival Outcome Heidi Chen, Ph.D. Cancer Biostatistics Center Vanderbilt University School of Medicine March 20, 2009 Outline

More information

Look before you leap: Some insights into learner evaluation with cross-validation

Look before you leap: Some insights into learner evaluation with cross-validation Look before you leap: Some insights into learner evaluation with cross-validation Gitte Vanwinckelen and Hendrik Blockeel Department of Computer Science, KU Leuven, Belgium, {gitte.vanwinckelen,hendrik.blockeel}@cs.kuleuven.be

More information

Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation

Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation Patrick J. Heagerty PhD Department of Biostatistics University of Washington 166 ISCB 2010 Session Four Outline Examples

More information

Introduction to Machine Learning and Cross-Validation

Introduction to Machine Learning and Cross-Validation Introduction to Machine Learning and Cross-Validation Jonathan Hersh 1 February 27, 2019 J.Hersh (Chapman ) Intro & CV February 27, 2019 1 / 29 Plan 1 Introduction 2 Preliminary Terminology 3 Bias-Variance

More information

WALD LECTURE II LOOKING INSIDE THE BLACK BOX. Leo Breiman UCB Statistics

WALD LECTURE II LOOKING INSIDE THE BLACK BOX. Leo Breiman UCB Statistics 1 WALD LECTURE II LOOKING INSIDE THE BLACK BOX Leo Breiman UCB Statistics leo@stat.berkeley.edu ORIGIN OF BLACK BOXES 2 Statistics uses data to explore problems. Think of the data as being generated by

More information

Lecture 3: Statistical Decision Theory (Part II)

Lecture 3: Statistical Decision Theory (Part II) Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical

More information

Probability and Statistical Decision Theory

Probability and Statistical Decision Theory Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Probability and Statistical Decision Theory Many slides attributable to: Erik Sudderth (UCI) Prof. Mike Hughes

More information

Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity

Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity P. Richard Hahn, Jared Murray, and Carlos Carvalho June 22, 2017 The problem setting We want to estimate

More information

Chapter 7: Model Assessment and Selection

Chapter 7: Model Assessment and Selection Chapter 7: Model Assessment and Selection DD3364 April 20, 2012 Introduction Regression: Review of our problem Have target variable Y to estimate from a vector of inputs X. A prediction model ˆf(X) has

More information

Estimating Explained Variation of a Latent Scale Dependent Variable Underlying a Binary Indicator of Event Occurrence

Estimating Explained Variation of a Latent Scale Dependent Variable Underlying a Binary Indicator of Event Occurrence International Journal of Statistics and Probability; Vol. 4, No. 1; 2015 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education Estimating Explained Variation of a Latent

More information

Statistical Inference for Data Adaptive Target Parameters

Statistical Inference for Data Adaptive Target Parameters Statistical Inference for Data Adaptive Target Parameters Mark van der Laan, Alan Hubbard Division of Biostatistics, UC Berkeley December 13, 2013 Mark van der Laan, Alan Hubbard ( Division of Biostatistics,

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

Known unknowns : using multiple imputation to fill in the blanks for missing data

Known unknowns : using multiple imputation to fill in the blanks for missing data Known unknowns : using multiple imputation to fill in the blanks for missing data James Stanley Department of Public Health University of Otago, Wellington james.stanley@otago.ac.nz Acknowledgments Cancer

More information

Train the model with a subset of the data. Test the model on the remaining data (the validation set) What data to choose for training vs. test?

Train the model with a subset of the data. Test the model on the remaining data (the validation set) What data to choose for training vs. test? Train the model with a subset of the data Test the model on the remaining data (the validation set) What data to choose for training vs. test? In a time-series dimension, it is natural to hold out the

More information

Holdout and Cross-Validation Methods Overfitting Avoidance

Holdout and Cross-Validation Methods Overfitting Avoidance Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest

More information

An Empirical Study of Building Compact Ensembles

An Empirical Study of Building Compact Ensembles An Empirical Study of Building Compact Ensembles Huan Liu, Amit Mandvikar, and Jigar Mody Computer Science & Engineering Arizona State University Tempe, AZ 85281 {huan.liu,amitm,jigar.mody}@asu.edu Abstract.

More information

Variance Reduction and Ensemble Methods

Variance Reduction and Ensemble Methods Variance Reduction and Ensemble Methods Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Last Time PAC learning Bias/variance tradeoff small hypothesis

More information

Empirical Risk Minimization, Model Selection, and Model Assessment

Empirical Risk Minimization, Model Selection, and Model Assessment Empirical Risk Minimization, Model Selection, and Model Assessment CS6780 Advanced Machine Learning Spring 2015 Thorsten Joachims Cornell University Reading: Murphy 5.7-5.7.2.4, 6.5-6.5.3.1 Dietterich,

More information

Estimating Optimal Dynamic Treatment Regimes from Clustered Data

Estimating Optimal Dynamic Treatment Regimes from Clustered Data Estimating Optimal Dynamic Treatment Regimes from Clustered Data Bibhas Chakraborty Department of Biostatistics, Columbia University bc2425@columbia.edu Society for Clinical Trials Annual Meetings Boston,

More information

Part III Measures of Classification Accuracy for the Prediction of Survival Times

Part III Measures of Classification Accuracy for the Prediction of Survival Times Part III Measures of Classification Accuracy for the Prediction of Survival Times Patrick J Heagerty PhD Department of Biostatistics University of Washington 102 ISCB 2010 Session Three Outline Examples

More information

BAGGING PREDICTORS AND RANDOM FOREST

BAGGING PREDICTORS AND RANDOM FOREST BAGGING PREDICTORS AND RANDOM FOREST DANA KANER M.SC. SEMINAR IN STATISTICS, MAY 2017 BAGIGNG PREDICTORS / LEO BREIMAN, 1996 RANDOM FORESTS / LEO BREIMAN, 2001 THE ELEMENTS OF STATISTICAL LEARNING (CHAPTERS

More information

Decision Trees & Random Forests

Decision Trees & Random Forests Decision Trees & Random Forests BUGS Meeting Daniel Pimentel-Alarcón Computer Science, GSU Decision Trees Goal: Predict Will I get El Cáncer? Will I develop Diabetes? Is my boyfriend/girlfriend cheating

More information

Multi-state models: prediction

Multi-state models: prediction Department of Medical Statistics and Bioinformatics Leiden University Medical Center Course on advanced survival analysis, Copenhagen Outline Prediction Theory Aalen-Johansen Computational aspects Applications

More information

Linear Regression 1 / 25. Karl Stratos. June 18, 2018

Linear Regression 1 / 25. Karl Stratos. June 18, 2018 Linear Regression Karl Stratos June 18, 2018 1 / 25 The Regression Problem Problem. Find a desired input-output mapping f : X R where the output is a real value. x = = y = 0.1 How much should I turn my

More information

Linear regression. Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear.

Linear regression. Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear. Linear regression Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear. 1/48 Linear regression Linear regression is a simple approach

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Learning with multiple models. Boosting.

Learning with multiple models. Boosting. CS 2750 Machine Learning Lecture 21 Learning with multiple models. Boosting. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Learning with multiple models: Approach 2 Approach 2: use multiple models

More information

SCMA292 Mathematical Modeling : Machine Learning. Krikamol Muandet. Department of Mathematics Faculty of Science, Mahidol University.

SCMA292 Mathematical Modeling : Machine Learning. Krikamol Muandet. Department of Mathematics Faculty of Science, Mahidol University. SCMA292 Mathematical Modeling : Machine Learning Krikamol Muandet Department of Mathematics Faculty of Science, Mahidol University February 9, 2016 Outline Quick Recap of Least Square Ridge Regression

More information

Ph.D. course: Regression models. Introduction. 19 April 2012

Ph.D. course: Regression models. Introduction. 19 April 2012 Ph.D. course: Regression models Introduction PKA & LTS Sect. 1.1, 1.2, 1.4 19 April 2012 www.biostat.ku.dk/~pka/regrmodels12 Per Kragh Andersen 1 Regression models The distribution of one outcome variable

More information

Decision trees COMS 4771

Decision trees COMS 4771 Decision trees COMS 4771 1. Prediction functions (again) Learning prediction functions IID model for supervised learning: (X 1, Y 1),..., (X n, Y n), (X, Y ) are iid random pairs (i.e., labeled examples).

More information

Bayesian Nonparametric Accelerated Failure Time Models for Analyzing Heterogeneous Treatment Effects

Bayesian Nonparametric Accelerated Failure Time Models for Analyzing Heterogeneous Treatment Effects Bayesian Nonparametric Accelerated Failure Time Models for Analyzing Heterogeneous Treatment Effects Nicholas C. Henderson Thomas A. Louis Gary Rosner Ravi Varadhan Johns Hopkins University September 28,

More information

Machine Learning. Ensemble Methods. Manfred Huber

Machine Learning. Ensemble Methods. Manfred Huber Machine Learning Ensemble Methods Manfred Huber 2015 1 Bias, Variance, Noise Classification errors have different sources Choice of hypothesis space and algorithm Training set Noise in the data The expected

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

More information

Math 6330: Statistical Consulting Class 5

Math 6330: Statistical Consulting Class 5 Math 6330: Statistical Consulting Class 5 Tony Cox tcoxdenver@aol.com University of Colorado at Denver Course web site: http://cox-associates.com/6330/ What is a predictive model? The probability that

More information

Stochastic Gradient Descent. CS 584: Big Data Analytics

Stochastic Gradient Descent. CS 584: Big Data Analytics Stochastic Gradient Descent CS 584: Big Data Analytics Gradient Descent Recap Simplest and extremely popular Main Idea: take a step proportional to the negative of the gradient Easy to implement Each iteration

More information

CSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression

CSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html

More information

Individual Treatment Effect Prediction Using Model-Based Random Forests

Individual Treatment Effect Prediction Using Model-Based Random Forests Individual Treatment Effect Prediction Using Model-Based Random Forests Heidi Seibold, Achim Zeileis, Torsten Hothorn https://eeecon.uibk.ac.at/~zeileis/ Motivation: Overall treatment effect Base model:

More information

Supervised Learning via Decision Trees

Supervised Learning via Decision Trees Supervised Learning via Decision Trees Lecture 4 1 Outline 1. Learning via feature splits 2. ID3 Information gain 3. Extensions Continuous features Gain ratio Ensemble learning 2 Sequence of decisions

More information

Ph.D. course: Regression models. Regression models. Explanatory variables. Example 1.1: Body mass index and vitamin D status

Ph.D. course: Regression models. Regression models. Explanatory variables. Example 1.1: Body mass index and vitamin D status Ph.D. course: Regression models Introduction PKA & LTS Sect. 1.1, 1.2, 1.4 25 April 2013 www.biostat.ku.dk/~pka/regrmodels13 Per Kragh Andersen Regression models The distribution of one outcome variable

More information

UC Berkeley UC Berkeley Electronic Theses and Dissertations

UC Berkeley UC Berkeley Electronic Theses and Dissertations UC Berkeley UC Berkeley Electronic Theses and Dissertations Title Super Learner Permalink https://escholarship.org/uc/item/4qn0067v Author Polley, Eric Publication Date 2010-01-01 Peer reviewed Thesis/dissertation

More information

Prediction Performance of Survival Models

Prediction Performance of Survival Models Prediction Performance of Survival Models by Yan Yuan A thesis presented to the University of Waterloo in fulfilment of the thesis requirement for the degree of Doctor of Philosophy in Statistics Waterloo,

More information

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models Nicholas C. Henderson Thomas A. Louis Gary Rosner Ravi Varadhan Johns Hopkins University July 31, 2018

More information

Transformations The bias-variance tradeoff Model selection criteria Remarks. Model selection I. Patrick Breheny. February 17

Transformations The bias-variance tradeoff Model selection criteria Remarks. Model selection I. Patrick Breheny. February 17 Model selection I February 17 Remedial measures Suppose one of your diagnostic plots indicates a problem with the model s fit or assumptions; what options are available to you? Generally speaking, you

More information

Data Mining und Maschinelles Lernen

Data Mining und Maschinelles Lernen Data Mining und Maschinelles Lernen Ensemble Methods Bias-Variance Trade-off Basic Idea of Ensembles Bagging Basic Algorithm Bagging with Costs Randomization Random Forests Boosting Stacking Error-Correcting

More information