Classification of Longitudinal Data Using Tree-Based Ensemble Methods

1 Classification of Longitudinal Data Using Tree-Based Ensemble Methods
W. Adler and B. Lausen

2 Overview 1 Ensemble classification of dependent observations 2 3 4

3 Classification of dependent observations
- Dependent data are common in medicine:
  - paired organs
  - longitudinal data / repeated measurements
- Common practice for classification: use one observation per organ (e.g. a randomly drawn eye), or examine only the newest observations.
- When error estimation is performed correctly, all observations can be used (Brenning & Lausen, 2008).

4 Modified bootstrap
- Examined classification method: bootstrap-based tree ensembles
  - bagged classification trees (bagging) (Breiman, 1996)
  - random forest (Breiman, 2001)
- Usually, the dependency between observations is ignored when bootstrap samples are drawn.
- This can have negative effects on classification performance.
- Extreme example: drawing 2N from N observations (correlation = 1) yields $(1 - e^{-2})N \approx 0.865N$ expected different observations, in contrast to $(1 - e^{-1})N \approx 0.632N$ different observations when drawing N from N.
  - The result is a higher correlation between trees.
  - Tree ensembles work when the variation between single trees is high.
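The extreme example can be checked numerically. The sketch below is an illustration only, not part of the original slides: it compares the large-N limit $1 - e^{-c}$ (for $cN$ draws with replacement from N distinct observations) with a small simulation. The function names, the number of repetitions and the seed are assumptions chosen for the example.

```python
import math
import random


def expected_distinct_fraction(draws_per_item: float) -> float:
    """Large-N limit of (expected distinct observations) / N when
    draws_per_item * N draws are made with replacement from N observations."""
    return 1.0 - math.exp(-draws_per_item)


def simulate_distinct_fraction(n_items: int, n_draws: int,
                               n_rep: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of the same fraction for finite n_items."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_rep):
        # A set keeps each drawn index once, so its size counts distinct draws.
        total += len({rng.randrange(n_items) for _ in range(n_draws)})
    return total / (n_rep * n_items)


if __name__ == "__main__":
    print("N from N, limit:   ", round(expected_distinct_fraction(1), 3))
    print("2N from N, limit:  ", round(expected_distinct_fraction(2), 3))
    print("2N from N, simulated:", round(simulate_distinct_fraction(500, 1000), 3))
```

With perfectly correlated duplicates, drawing 2N rows from the 2N available rows is equivalent to drawing 2N times from the N distinct observations, which is why the second case applies.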

5 Modified bootstrap
- Modification: patient-based generation of bootstrap samples (in contrast to observation-based).
- The learning data set $L$ consists of $N$ persons with one or more (repeated) measurements from the left and/or right eye:
  $$L = \{(y_i^{L_{j(i)}}, x_i^{L_{j(i)}}, y_i^{R_{k(i)}}, x_i^{R_{k(i)}}),\ i = 1,\dots,N,\ j(i) = 1,\dots,J_i,\ k(i) = 1,\dots,K_i\}$$
  where:
  - $x_i^{s_j}$: p-dimensional predictor variable, $x_i^{s_j} = (x_{i1}^{s_j},\dots,x_{ip}^{s_j}) \in \mathbb{R}^p$, $s_j \in \{L_1,\dots,L_{J_i}, R_1,\dots,R_{K_i}\}$
  - $J_i$ / $K_i$: number of repeated measurements per person $i$ and eye
  - $y_i^{s_j} \in \{0, 1\}$: class membership

6 Modified bootstrap
- Base classifier of the ensemble: $C_{\text{base}}(x, L_b): x \mapsto \tilde{y}$ with $\tilde{y} \in \{0, 1\}$.
- The base classifier is trained with the bootstrap samples $L_b$, $b = 1,\dots,B$, consisting of observations $(y_i, x_i)$, $i = 1,\dots,n$.
- These bootstrap samples are drawn following strategies $\tau_1$ and $\tau_2$ after $N$ subjects are drawn with replacement.

7 Modified bootstrap
- Strategy $\tau_1$: draw one randomly selected observation per drawn person:
  $$L_{\tau_1,b} = \{(y_\nu, x_\nu),\ \nu = 1,\dots,N\} \quad \text{with} \quad p(L_j) = p(R_k) = \frac{1}{J_{i_\nu} + K_{i_\nu}},\ j = 1,\dots,J_{i_\nu},\ k = 1,\dots,K_{i_\nu}$$
- Strategy $\tau_2$: draw all observations per drawn person:
  $$L_{\tau_2,b} = \{(y_\nu, x_\nu),\ \nu = 1,\dots,M\} \quad \text{with} \quad M = \sum_i (J_i + K_i),\ i: \text{drawn subjects}$$
- Ensemble classification by majority voting:
  $$C_\tau^{\text{ensemble}}(x^{s_j}, L) = I\left(\frac{1}{B}\sum_{b=1}^{B} C_{\text{base}}(x^{s_j}, L_{\tau,b}) > \frac{1}{2}\right)$$
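The two sampling strategies and the voting rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data layout (a dictionary from subject id to a list of `(y, x)` observations pooled over both eyes) and all function names are hypothetical, and the base classifier is left out.

```python
import random
from typing import Dict, List, Sequence, Tuple

# One observation: (class label y, predictor vector x). Hypothetical layout.
Observation = Tuple[int, Tuple[float, ...]]
# Subject id -> all observations (both eyes, all examinations) of that subject.
Subjects = Dict[str, List[Observation]]


def draw_subjects(subjects: Subjects, rng: random.Random) -> List[str]:
    """Draw N subject ids with replacement, N being the number of subjects."""
    ids = sorted(subjects)
    return [rng.choice(ids) for _ in ids]


def bootstrap_tau1(subjects: Subjects, rng: random.Random) -> List[Observation]:
    """Strategy tau_1: one uniformly chosen observation per drawn subject,
    i.e. each of the J_i + K_i observations has probability 1 / (J_i + K_i)."""
    return [rng.choice(subjects[i]) for i in draw_subjects(subjects, rng)]


def bootstrap_tau2(subjects: Subjects, rng: random.Random) -> List[Observation]:
    """Strategy tau_2: all observations of every drawn subject."""
    sample: List[Observation] = []
    for i in draw_subjects(subjects, rng):
        sample.extend(subjects[i])
    return sample


def majority_vote(votes: Sequence[int]) -> int:
    """Return 1 if strictly more than half of the base classifiers vote 1."""
    return int(sum(votes) / len(votes) > 0.5)
```

Under $\tau_1$ every bootstrap sample has exactly N observations, whereas under $\tau_2$ the sample size M varies with the subjects drawn. Ties in the vote resolve to class 0 because the indicator uses a strict inequality.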

8 Modified bootstrap
- Often, longitudinal observations over long time periods are less common than longitudinal observations over shorter time periods.
  - Weighted bootstrap strategy $\tau_{1a}$ increases the probability of drawing older observations: $w_{\tau_{1a}} = 1/(1 + e^{-t})$, where $t$ is the time difference to the newest observation per subject.
- Progression of disease makes newer observations more typical.
  - Weighted bootstrap strategy $\tau_{1b}$ increases the probability of drawing newer observations: w τ 1b = /(1 + e t )
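A weighted draw of one observation per subject might look as follows. This is a sketch under assumptions: the weight for favouring older observations is taken to be the logistic form $1/(1 + e^{-t})$ (the sign of the exponent is inferred from the requirement that the weight grows with $t$), the time unit is arbitrary, and the function names are hypothetical.

```python
import math
import random
from typing import Callable, Sequence


def weight_older(t: float) -> float:
    """Assumed logistic weight, increasing in the time difference t >= 0."""
    return 1.0 / (1.0 + math.exp(-t))


def draw_weighted_index(exam_times: Sequence[float],
                        weight: Callable[[float], float],
                        rng: random.Random) -> int:
    """Draw the index of one observation of a single subject.

    exam_times holds the examination time of each observation; the weight
    is evaluated on the time difference to the subject's newest observation.
    """
    newest = max(exam_times)
    weights = [weight(newest - time) for time in exam_times]
    # random.choices normalises the weights internally.
    return rng.choices(range(len(exam_times)), weights=weights, k=1)[0]
```

Passing a weight function that decreases in $t$ would favour newer observations instead; no particular form for that case is assumed here.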

9 Medical Problem
- Glaucoma: one of the most common causes of blindness worldwide
- Important: early detection
- Important diagnostic instrument: Heidelberg Retina Tomograph (HRT)
  - depth images of the eye background (fundus)
  - calculation of geometric parameters for classification
- Erlangen glaucoma registry: longitudinal measurements of glaucoma patients and healthy controls

10 Data set
- 61 HRT variables from N = 372 subjects (182 healthy controls, 190 glaucoma patients)
- 951 observations from 592 eyes (152 subjects with 1 eye, 220 subjects with 2 eyes; classes are equal for all observations of one subject)
- Reference data set: newest observations of one randomly selected eye per subject ($N_{\text{ref}} = N = 372$)

11 Longitudinal measurements

12 Number of examinations

13 Glaucoma classification
- Examined classification methods:
  - bagged classification trees (B = 100 trees)
  - random forest (B = 1000 trees)
- with the following strategies:
  - training with the reference data set
  - training ignoring the dependency between observations
  - bootstrap strategies $\tau_1$, $\tau_{1a}$, $\tau_{1b}$, $\tau_2$
- Performance estimation: 50 bootstrap samples, 20 replications
- Classes: normal / glaucomatous (0 / 1)
- More detailed diagnoses: ocular hypertension (o), normal (n), preperimetric glaucoma (p), perimetric glaucoma (g); o / n and p / g are pooled for performance estimation (4 classes for training / 2 classes for testing)
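Pooling four training classes into two test classes can be illustrated as below. The slides do not say how a four-class prediction is turned into a binary score, so summing the predicted probabilities of p and g is an assumption made for this sketch; the function names are hypothetical, and the AUC is computed with the standard pairwise (Mann-Whitney) definition.

```python
from typing import Dict, Sequence

# Subclasses pooled into the glaucomatous class (1); o and n form class 0.
POSITIVE_SUBCLASSES = ("p", "g")


def pool_label(subclass: str) -> int:
    """Map a detailed diagnosis (o, n, p, g) to the binary test class."""
    return 1 if subclass in POSITIVE_SUBCLASSES else 0


def pooled_score(probabilities: Dict[str, float]) -> float:
    """Assumed pooling rule: add the predicted probabilities of p and g."""
    return sum(probabilities.get(c, 0.0) for c in POSITIVE_SUBCLASSES)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a bootstrap performance estimate, such an AUC would be computed on the subjects not drawn into the training sample and then averaged over samples and replications.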

14-19 ROC analysis (RF)

20 ROC analysis (Bagging)

21 ROC analysis (RF) - subclasses

22 ROC analysis (Bagging) - subclasses

23 AUC (RF): bootstrap estimation (B = 50), 20 replications

24 AUC (Bagging): bootstrap estimation (B = 50), 20 replications

25 AUC (RF) - subclasses: bootstrap estimation (B = 50), 20 replications

26 AUC (Bagging) - subclasses: bootstrap estimation (B = 50), 20 replications

27
- Classification performance can be increased when all observations of longitudinal data are used for training the classifier.
- Modified bootstrapping (drawing one observation per subject) increases classification performance and reduces computational costs.
- Introducing subclasses based on expert knowledge leads to increased classification performance in glaucoma detection with HRT variables.

28
- Adler W, Lausen B: Bootstrap estimated sensitivities, specificities and ROC curve. Computational Statistics & Data Analysis.
- Adler W, Brenning A, Potapov S, Schmid M, Lausen B: Ensemble Classification of Paired Data. Submitted.
- Breiman L: Bagging Predictors. Machine Learning 24:123-140, 1996.
- Breiman L: Random Forests. Machine Learning 45:5-32, 2001.
- Brenning A, Lausen B: Estimating error rates in the classification of paired organs. Statistics in Medicine, 2008.

29 Future research
- Classification performance when progression is given:
  - without modelling the progression
  - modelling the progression (e.g. bundling: modelling progression by an additional method which calculates parameters for the trees)
- Classification using all observations per subject as one observation vector:
  - missing examinations
  - varying time differences between examinations
  - changing class membership over time
- Iterations similar to boosting
