Correction for Tuning Bias in Resampling Based Error Rate Estimation

Size: px

Start display at page:

Download "Correction for Tuning Bias in Resampling Based Error Rate Estimation"

Gary Black
5 years ago
Views:

1 Correction for Tuning Bias in Resampling Based Error Rate Estimation Christoph Bernau & Anne-Laure Boulesteix Institute of Medical Informatics, Biometry and Epidemiology (IBE), Ludwig-Maximilians University, Munich, Germany Workshop on Validation in Statistics and Machine Learning Berlin / 24

2 1 Bias induced by tuning & classifier selection 2 Alternative approaches for bias correction 3 Simulation study 4 Outlook 2 / 24

3 Bias induced by tuning & classifier selection Bias induced by tuning & classifier selection 3 / 24

Bias induced by tuning & classifier selection Tuning & Classifier Selection a lot of different classifiers available for the data at hand no gold standard for classifier selection established in the

4 Bias induced by tuning & classifier selection Tuning & Classifier Selection a lot of different classifiers available for the data at hand no gold standard for classifier selection established in the case of highdimensional data (e.g. microarray data) selection of the classifier performed according to a specific performance measure, commonly obtained by resampling or bootstrap similar situation: optimization of tuning parameters, e.g. the cost parameter of support vector machines 4 / 24

5 Bias induced by tuning & classifier selection Raw Data Subsampling SVM PAM... 5NN LDA Iter. 1 e 11 e e 1,K 1 e 1K Iter. 2 e 12 e e 2,K 1 e 2K Iter. B 1 e B 1,1 e B 1,2... e B 1,K 1 e B 1,K Iter. B e B1 e B2... e B,K 1 e BK Average ē 1 ē 2... ē K 1 ē K e bk : test error of the kth clasifier in the bth resampling iteration ē k : average test error of classifier k in the whole resampling procedure 5 / 24

6 Bias induced by tuning & classifier selection Selection/Tuning Bias downward bias induced by selection/tuning process ([7]) authors in biomedical research inclined to report best performance only information on performance of all classifiers needed in order to avoid overoptimism Is there a way to use the information on the performance of other candidate classifiers in order to estimate the actual performance of the optimal classifier on independent data? 6 / 24

7 Bias induced by tuning & classifier selection Nested Cross-Validation [7] MCR nest = 1 B B b=1 e bk. (1) b two nested loops: inner tuning/selection loop and outer performance estimation loop performs an extra cross-validation on each training set of the outer loop in order to find the most appropriate model for the specific training set (k b ) mimicks the procedure that is actually applied to the whole data set computationally intensive 7 / 24

8 Alternative approaches for bias correction Alternative approaches for bias correction 8 / 24

9 Alternative approaches for bias correction Estimator by Tibshirani & Tibshirani [6] Bias = 1 B B Bias b = 1 B b=1 B ) (e bk e bk b b=1 (2) kb denotes the index corresponding to the classifier performing best on resampling test set b k denotes the index corresponding to the classifier performing best on the whole resampling procedure uses differences between locally and globally optimal classifiers on the different testsets 9 / 24

10 Alternative approaches for bias correction Our new weighted approach MCR WM = K w k ē k (3) k=1 weighted mean of the resampling error rates of all classifiers sensible bounds (worst and optimal MCR) computationally less expensive than NCV theoretical motivation (see slide 12) uses all the information obtained in the classifier selection/tuning process 10 / 24

11 Alternative approaches for bias correction Comparison of NCV and the weighted MCR approach Nested Cross-Validation New Approach Results in Outer Loop Inner Loop Outer Loop Results in Outer Loop Estimating Weights Combined MCR Smooth Weighted MCR Similar Result 11 / 24

12 Alternative approaches for bias correction Theoretical Motivation estimator for E Pn [ε(k (S) S)] rather than for a specific classifier S: whole sample, ɛ: true generalization error decompose the mean E Pn [ε(k (S) S)] into: K P (k (S) = k) E Pn [ε(k S) k (S) = k] k=1 K P (k (S) = k) E Pn (ε(k S)) k=1 crucial assumption: ε(k S) k (S) for each classifier 12 / 24

13 Alternative approaches for bias correction Estimating P(k (S) = k) using a parametric approach approximate the probabilities by a Monte Carlo simulation with normality assumption: (ē 1,..., ē K ) N(µ, Σ), (4) where µ and Σ are estimated from the matrix (e bk ) k=1...k b=1...b use these probablities as weights in the new estimator vector of mean MCRs (ē) plugged in as µ 13 / 24

Alternative approaches for bias correction The problem of variance estimation proof of nonexistence of an unbiased estimator ([1],[3]) several low biased variance estimators in the literature

14 Alternative approaches for bias correction The problem of variance estimation proof of nonexistence of an unbiased estimator ([1],[3]) several low biased variance estimators in the literature problem of dependencies between testsets good estimator for ρ(α, β) = Cor(ē b1 k, ē b2 k) required sensible estimator ([3]), if ˆρ(α, β) is provided: ( 1 B + ρ 1 ρ ) 1 B 1 B (e kb ē k ) 2 b=1 14 / 24

15 Alternative approaches for bias correction A new approach for estimating ρ simuation of several independent non-informative response vectors conditional and unconditional error rates known to be 0.5 simulation of R replicates (e.g. 1000) of the response vector with y i B(1, 0.5) for each r: computation of the average test errors for two resampling steps (ē r1 and ē r2 ) estimation of ρ by: ˆρ = Ĉov(ē 1, ē 2 ) Var(ē 1 ) Var(ē 2 ) (5) 15 / 24

16 Alternative approaches for bias correction Estimating ρ Results: Colon Resampling Colon CV rho knncma scdacma svmcma knncma dldacma pls_ldacma plrcma nad & RO rho knncma scdacma svmcma knncma dldacma pls_ldacma plrcma nad RO %Trainingsdaten number of folds n t n ([3]) as a good approximation which ignores the differences between classifiers 16 / 24

17 Simulation study Simulation study 17 / 24

18 Simulation study Setup seven different classifier algorithms (PAM ( = 0.5), linear SVM (cost = 50), knn (k = 1), knn (k = 18), DLDA, PLSLDA (3 components) and PLR (λ = 0.01) ) feature selection according to t-test three different real data sets (Golub, Colon, Singh) 100 resampling iterations with 0.63% or 0.8% and LOOCV 50 replications of the whole procedure in order to asses variability of the different bias correction methods 18 / 24

19 Simulation study Results Singh real data 0.8 Corrected MCR Sim Based Rho Nadeau Nested CV Raw Mean Tibshirani Min. / Max. MCR replication 19 / 24

20 Simulation study Results Golub real data LOOCV Golub noninformative data Corrected MCR Sim Based Rho Nadeau Nested CV Raw Mean Tibshirani Min. / Max. MCR replication Corrected MCR replication 20 / 24

21 Outlook Outlook 21 / 24

22 Outlook Outlook improvement of the estimate of Σ in MC-simulation better estimator for correlation between test sets (ρ) better estimator for correlations between classifiers evaluation of weighted mean approach on independent real data sets and further analysis on simulated data alternative approach: Generalized Degrees of Freedom [2,4,8] tries to correct apparent error [5] provides information on prediction stability for individual observations 22 / 24

23 Outlook Thank you for your attention 23 / 24

24 Outlook References 1 Yoshua Bengio and Yves Grandvalet. No unbiased estimator of the variance of K-Fold Cross-Validation. J. Mach. Learn. Res., 5: , Bradley Efron. The estimation of prediction error. Journal of the American Statistical Association, 99(467): , Claude Nadeau and Yoshua Bengio. Inference for the generalization error. Mach. Learn., 52(3): , Xiaotong Shen and Jianming Ye. Adaptive model selection. Journal of the American Statistical Association, 97(457): , Robert Tibshirani and Keith Knight. The covariance inflation criterion for adaptive model selection. J. ROY. STATIST. SOC. B, 55: , Ryan J Tibshirani and Robert Tibshirani. A bias correction for the minimum error rate in cross-validation. Annals of Applied Statistics 2009, Vol. 3, No. 2, , Sudhir Varma and Richard Simon. Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics, 7(1):91, Jianming Ye. On measuring and correcting the effects of data mining and model selection. Journal of the American Statistical Association, 93(441): , March / 24

Full versus incomplete cross-validation: measuring the impact of imperfect separation between training and test sets in prediction error estimation

Full versus incomplete cross-validation: measuring the impact of imperfect separation between training and test sets in prediction error estimation cross-validation: measuring the impact of imperfect separation between training and test sets in prediction error estimation IIM Joint work with Christoph Bernau, Caroline Truntzer, Thomas Stadler and