Classification of Longitudinal Data Using Tree-Based Ensemble Methods


Classification of Longitudinal Data Using Tree-Based Ensemble Methods
W. Adler and B. Lausen
29.06.2009

Overview
1 Ensemble classification of dependent observations

Classification of dependent observations

Dependent data are common in medicine:
- paired organs
- longitudinal data / repeated measurements

Common practice for classification: use only one observation per organ (e.g. a randomly drawn eye) or only the newest observation per subject.
When error estimation is performed correctly, all observations can be used (Brenning & Lausen, 2008).

Modified bootstrap

Examined classification methods: bootstrap-based tree ensembles
- bagged classification trees (bagging) (Breiman, 1996)
- random forest (Breiman, 2001)

Usually, the dependency between observations is ignored when bootstrap samples are drawn; this can have negative effects on classification performance.
Extreme example: drawing 2N from N observations (correlation = 1) yields (1 - exp(-2)) N ≈ 0.865 N expected distinct observations, in contrast to (1 - exp(-1)) N ≈ 0.632 N distinct observations when drawing N from N.
The consequence is higher correlation between the trees, whereas tree ensembles work well when the variation between single trees is high.
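The 0.632 and 0.865 fractions follow from the expected coverage of sampling with replacement, 1 - (1 - 1/N)^m ≈ 1 - exp(-m/N) for m draws from N observations. A minimal Python simulation (not part of the original talk) that reproduces both numbers:

import numpy as np

rng = np.random.default_rng(0)

def distinct_fraction(n_draws, n_obs, n_sim=2000):
    # Monte Carlo estimate of the expected fraction of distinct observations
    # when drawing n_draws times with replacement from n_obs observations.
    fractions = [
        np.unique(rng.integers(0, n_obs, size=n_draws)).size / n_obs
        for _ in range(n_sim)
    ]
    return float(np.mean(fractions))

N = 1000
print(distinct_fraction(N, N))      # standard bootstrap: about 1 - exp(-1) = 0.632
print(distinct_fraction(2 * N, N))  # perfectly correlated pairs: about 1 - exp(-2) = 0.865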

Modified bootstrap

Modification: patient-based generation of bootstrap samples (in contrast to observation-based).

The learning data set L consists of N persons with 1 or more (repeated) measurements from the left and/or right eye:

L = \{ (y_i^{L_{j(i)}}, x_i^{L_{j(i)}}, y_i^{R_{k(i)}}, x_i^{R_{k(i)}}),\; i = 1, \ldots, N,\; j(i) = 1, \ldots, J_i,\; k(i) = 1, \ldots, K_i \}

where:
- x_i^{s_j}: p-dimensional predictor variable, x_i^{s_j} = (x_{i1}^{s_j}, \ldots, x_{ip}^{s_j}) \in \mathbb{R}^p, with s_j \in \{L_1, \ldots, L_{J_i}, R_1, \ldots, R_{K_i}\}
- J_i / K_i: number of repeated measurements per person i for the left / right eye
- y_i^{s_j} \in \{0, 1\}: class membership

Modified bootstrap

Base classifier of the ensemble:

C^{base}(\cdot, L^b): x \mapsto \tilde{y}, with \tilde{y} \in \{0, 1\}

The base classifier is trained on the bootstrap samples L^b, b = 1, \ldots, B, consisting of observations (y_i, x_i), i = 1, \ldots, n.
These bootstrap samples are drawn following strategies τ1 and τ2, after N subjects are drawn with replacement.

Modified bootstrap

Strategy τ1: draw one randomly selected observation per drawn person

L^{\tau_1, b} = \{ (y_\nu, x_\nu),\; \nu = 1, \ldots, N \} with p(L_j) = p(R_k) = \frac{1}{J_{i_\nu} + K_{i_\nu}},\; j = 1, \ldots, J_{i_\nu},\; k = 1, \ldots, K_{i_\nu}

Strategy τ2: draw all observations per drawn person

L^{\tau_2, b} = \{ (y_\nu, x_\nu),\; \nu = 1, \ldots, M \} with M = \sum_i (J_i + K_i), summing over the drawn subjects i

Ensemble classification by majority voting:

C^{ensemble}_{\tau}(x^{s_j}, L) = I\left( \frac{1}{B} \sum_{b=1}^{B} C^{base}(x^{s_j}, L^{\tau, b}) > \frac{1}{2} \right)
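A minimal Python sketch (not the authors' code; names such as patient_bootstrap and ensemble_predict are hypothetical) of the two patient-based sampling strategies and the majority vote, with a plain decision tree standing in as base classifier:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

def patient_bootstrap(groups, strategy="tau1"):
    # groups[i] holds the row indices of all observations (both eyes, all
    # time points) of subject i; subjects are drawn with replacement first.
    drawn = rng.integers(0, len(groups), size=len(groups))
    idx = []
    for s in drawn:
        rows = groups[s]
        if strategy == "tau1":   # one randomly selected observation per drawn subject
            idx.append(rng.choice(rows))
        else:                    # "tau2": all observations of each drawn subject
            idx.extend(rows)
    return np.asarray(idx)

def ensemble_predict(X, y, groups, X_new, B=100, strategy="tau1"):
    # Majority vote over B trees, each grown on one patient-based bootstrap sample.
    votes = np.zeros(len(X_new))
    for _ in range(B):
        idx = patient_bootstrap(groups, strategy)
        votes += DecisionTreeClassifier().fit(X[idx], y[idx]).predict(X_new)
    return (votes / B > 0.5).astype(int)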

Modified bootstrap

Longitudinal observations covering long time periods are often less common than longitudinal observations covering shorter time periods.
Weighted bootstrap strategy τ1a increases the probability of drawing older observations:
w_{\tau_{1a}} = 1 / (1 + e^{-t}), where t is the time difference to the newest observation per subject.
Progression of the disease makes newer observations more typical.
Weighted bootstrap strategy τ1b increases the probability of drawing newer observations:
w_{\tau_{1b}} = 0.5 + 1 / (1 + e^{t})
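A small sketch (hypothetical helper names; the sign conventions of the weight functions are reconstructed here from the described behaviour, with t >= 0 being the time lag to a subject's newest observation) of how the weighted drawing of one observation per subject could be implemented:

import numpy as np

def tau1a_weight(t):
    # favours older observations: weight grows with the time lag t
    return 1.0 / (1.0 + np.exp(-t))

def tau1b_weight(t):
    # favours newer observations: weight shrinks towards 0.5 as t grows
    return 0.5 + 1.0 / (1.0 + np.exp(t))

def draw_one_weighted(rows, lags, weight_fn, rng):
    # draw one observation of a subject with probabilities proportional to the weights
    w = weight_fn(np.asarray(lags, dtype=float))
    return rng.choice(rows, p=w / w.sum())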

Medical Problem

Glaucoma: one of the most common causes of blindness worldwide; early detection is important.
Important diagnostic instrument: Heidelberg Retina Tomography (HRT)
- depth images of the eye background
- calculation of geometric parameters for classification
Erlangen glaucoma registry: longitudinal measurements of glaucoma patients and healthy controls

Data set

- 61 HRT variables from N = 372 subjects (182 healthy controls, 190 glaucoma patients)
- 951 observations from 592 eyes (152 subjects with 1 eye, 220 subjects with 2 eyes; the class is the same for all observations of one subject)
- reference data set: newest observation of one randomly selected eye per subject (N_ref = N = 372)

Longitudinal measurements

Number of examinations

Glaucoma classification

Examined classification methods:
- bagged classification trees (B = 100 trees)
- random forest (B = 1000 trees)

with the following strategies:
- training with the reference data set
- training ignoring the dependency between observations
- bootstrap strategies τ1, τ1a, τ1b, τ2

Performance estimation: 50 bootstrap samples, 20 replications

Classes: normal / glaucomatous (0 / 1)
More detailed diagnoses: ocular hypertension (o), normal (n), preperimetric glaucoma (p), perimetric glaucoma (g); o/n and p/g are pooled for performance estimation (4 classes for training / 2 classes for testing).
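A hedged sketch (hypothetical names, not the authors' implementation) of subject-level bootstrap performance estimation as described above, where the AUC is computed on the observations of out-of-bag subjects:

import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc(fit_score, X, y, subject_ids, n_boot=50, n_rep=20, seed=0):
    # Subjects (not single observations) are resampled with replacement;
    # the model is trained on the in-bag subjects and evaluated on the
    # observations of the out-of-bag subjects.
    rng = np.random.default_rng(seed)
    subjects = np.unique(subject_ids)
    aucs = []
    for _ in range(n_rep):
        for _ in range(n_boot):
            in_bag = rng.choice(subjects, size=subjects.size, replace=True)
            train = np.isin(subject_ids, in_bag)
            test = ~train
            scores = fit_score(X[train], y[train], X[test])  # user-supplied fit + score function
            aucs.append(roc_auc_score(y[test], scores))
    return float(np.mean(aucs)), float(np.std(aucs))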

ROC analysis (RF)

ROC analysis (Bagging)

ROC analysis (RF) - subclasses

ROC analysis (Bagging) - subclasses

AUC (RF): bootstrap estimation (B = 50), 20 replications

AUC (Bagging): bootstrap estimation (B = 50), 20 replications

AUC (RF) - subclasses: bootstrap estimation (B = 50), 20 replications

AUC (Bagging) - subclasses: bootstrap estimation (B = 50), 20 replications

Classification performance can be increased when all observations of longitudinal data are used for training the classifier.
Modified bootstrapping (drawing one observation per subject) increases classification performance and reduces computational costs.
The introduction of subclasses based on expert knowledge leads to increased classification performance in glaucoma detection with HRT variables.

References

Adler W, Lausen B: Bootstrap estimated sensitivities, specificities and ROC curve. Computational Statistics & Data Analysis, 2009.
Adler W, Brenning A, Potapov S, Schmid M, Lausen B: Ensemble classification of paired data. Submitted.
Breiman L: Bagging predictors. Machine Learning 24:123-140, 1996.
Breiman L: Random forests. Machine Learning 45:5-32, 2001.
Brenning A, Lausen B: Estimating error rates in the classification of paired organs. Statistics in Medicine, 2008.

Future research

- classification performance when progression is present
  - without modelling the progression
  - modelling the progression (e.g. bundling: modelling the progression by an additional method which calculates parameters for the trees)
- classification using all observations per subject as one observation vector
  - missing examinations
  - varying time differences between examinations
  - changing class membership over time
- iterations similar to boosting