Area Under the Precision-Recall Curve: Point Estimates and Confidence Intervals

Kendrick Boyd (1), Kevin H. Eng (2), C. David Page (1)
(1) University of Wisconsin-Madison, Madison, WI
(2) Roswell Park Cancer Institute, Buffalo, NY
Boyd et al. (ECML 2013), Area under PR Curve, September 26, 2013

Binary Classification Evaluation

Receiver-operating characteristic (ROC) curves
- Preferred over accuracy alone [Provost et al., 1998]
- Insensitive to skew (π = proportion of positives)
- Area under ROC curve (AUCROC); plotted as TPR vs. FPR

Precision-recall (PR) curves
- Alternative to ROC curves when π is near 0 [Davis and Goadrich, 2006; Goadrich et al., 2006]
- Sensitive to skew
- Area under PR curve (AUCPR); plotted as precision vs. recall
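
The skew claims above can be checked numerically. The sketch below (illustrative code, not from the slides) computes AUCROC as the probability that a random positive outranks a random negative, and AUCPR via the average precision estimator; replicating the negatives nine times changes π from 0.5 to 0.1, which moves average precision but leaves AUCROC exactly unchanged.

```python
import random

def auc_roc(pos, neg):
    # Probability that a random positive outranks a random negative
    # (ties count half); this equals the area under the ROC curve.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def avg_prec(pos, neg):
    # Average precision: mean of the precision at each positive's rank.
    ranked = sorted([(s, 1) for s in pos] + [(s, 0) for s in neg],
                    key=lambda t: -t[0])
    tp, fp, precs = 0, 0, []
    for _, label in ranked:
        if label:
            tp += 1
            precs.append(tp / (tp + fp))
        else:
            fp += 1
    return sum(precs) / len(precs)

random.seed(0)
pos = [random.gauss(1, 1) for _ in range(100)]
neg = [random.gauss(0, 1) for _ in range(100)]
print(auc_roc(pos, neg), avg_prec(pos, neg))          # pi = 0.5
print(auc_roc(pos, neg * 9), avg_prec(pos, neg * 9))  # pi = 0.1; AUCROC unchanged
```

Duplicating each negative repeats every pairwise comparison the same number of times, so the AUCROC ratio is untouched, while every false positive now counts nine times against precision.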

The Difficulty

Given a classifier's PR curve, we want to:
- Estimate the area, e.g., AUCPR = 0.5
- Obtain a confidence interval, e.g., [0.4, 0.6]

Empirical PR Points

Toy data: 5 positives (n), 15 negatives (m), so π = 0.25. Examples are ranked by classifier score; the top of the ranking begins:

score  true label
0.95   pos
0.90   neg
0.85   neg
0.80   pos
0.75   pos
0.70   neg
0.65   neg
...

Sweeping a threshold down this ranking produces one empirical PR point per threshold; for example, admitting only the top example gives recall = 1/5 with precision = 1/1.

Continuing the sweep through the full ranking (recall = 2/5, 3/5, and so on) yields the complete set of empirical PR points, which trace out the empirical PR curve in (recall, precision) space.
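
The threshold sweep described above can be sketched in a few lines. The first seven (score, label) pairs follow the slide's table; the remaining thirteen are hypothetical filler so the list has the slide's 5 positives and 15 negatives.

```python
def pr_points(scored):
    # scored: list of (score, label) with label 1 = positive, 0 = negative.
    # Sweep the threshold down the ranking, emitting one (recall, precision)
    # point after each example is admitted.
    ranked = sorted(scored, key=lambda t: -t[0])
    n_pos = sum(label for _, label in ranked)
    tp = fp = 0
    points = []
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((tp / n_pos, tp / (tp + fp)))
    return points

# First rows from the slide; the rest are made-up padding (5 pos, 15 neg total).
data = [(0.95, 1), (0.90, 0), (0.85, 0), (0.80, 1), (0.75, 1), (0.70, 0),
        (0.65, 0), (0.60, 0), (0.55, 1), (0.50, 0), (0.45, 0), (0.40, 0),
        (0.35, 0), (0.30, 1), (0.25, 0), (0.20, 0), (0.15, 0), (0.10, 0),
        (0.05, 0), (0.02, 0)]
points = pr_points(data)
print(points[0])   # first positive: recall 1/5, precision 1/1
print(points[-1])  # full ranking: recall 1, precision = pi = 0.25
```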

AUCPR Estimators

Many existing methods estimate AUCPR from the empirical PR points:
- Lower trapezoid [Abeel et al., 2009]
- Upper trapezoid [Davis and Goadrich, 2006]
- Average precision (avg prec) [Manning et al., 2008]
- Interpolated convex hull (interp conv) [Davis and Goadrich, 2006]
- Interpolated max, median, and mean (interp max, interp med, interp mean)
- Binormal
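
Two of these are easy to sketch. Average precision is the standard retrieval definition; the trapezoid function below is a plain linear-interpolation integral over (recall, precision) points and is only a stand-in for the lower/upper trapezoid variants, which differ in which precision they assign between achievable points.

```python
def average_precision(scored):
    # avg prec: mean of the precision at the rank of each positive example.
    ranked = sorted(scored, key=lambda t: -t[0])
    tp, fp, precs = 0, 0, []
    for _, label in ranked:
        if label:
            tp += 1
            precs.append(tp / (tp + fp))
        else:
            fp += 1
    return sum(precs) / len(precs)

def trapezoid_area(points):
    # Trapezoidal rule over (recall, precision) points sorted by recall.
    # Linear interpolation in PR space can be optimistic
    # [Davis and Goadrich, 2006], so treat this as one choice among several.
    pts = sorted(points)
    return sum((r2 - r1) * (p1 + p2) / 2
               for (r1, p1), (r2, p2) in zip(pts, pts[1:]))

# Perfect ranking: every positive scored above every negative.
perfect = [(0.9, 1), (0.8, 1), (0.7, 1), (0.3, 0), (0.2, 0), (0.1, 0)]
print(average_precision(perfect))  # 1.0
```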

Estimator Evaluation Goals

Estimator desiderata:
- Unbiased: average estimate equals the true AUCPR
- Robust to different output distributions
- Robust to various skews (π) and data sizes (n + m)

Simulation Setup

Assume example scores are drawn from known distributions, similar to binormal analysis of ROC curves [Pepe, 2004; Bamber, 1975]. This allows calculation of the true PR curve and true AUCPR.

Analyzed score distributions:
- Binormal: negatives ~ N(0, 1), positives ~ N(1, 1)
- Bibeta: negatives ~ Beta(2, 5), positives ~ Beta(5, 2)
- Offset uniform: negatives ~ U(0, 1), positives ~ U(0.5, 1.5)

Additional parameters: number of examples (n + m) and skew (e.g., π = 0.1).
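
Under these assumptions the simulation is short. The sketch below uses the binormal model at skew π = 0.1; the large-sample run is only a proxy for the true AUCPR, which the authors compute exactly from the distributions.

```python
import random

def simulate(n_pos, n_neg, rng):
    # Binormal model from the slides: negatives ~ N(0,1), positives ~ N(1,1).
    pos = [rng.gauss(1, 1) for _ in range(n_pos)]
    neg = [rng.gauss(0, 1) for _ in range(n_neg)]
    return pos, neg

def avg_prec(pos, neg):
    # Average precision estimator over the pooled ranking.
    ranked = sorted([(s, 1) for s in pos] + [(s, 0) for s in neg],
                    key=lambda t: -t[0])
    tp, fp, precs = 0, 0, []
    for _, label in ranked:
        if label:
            tp += 1
            precs.append(tp / (tp + fp))
        else:
            fp += 1
    return sum(precs) / len(precs)

rng = random.Random(0)
# Large-sample stand-in for the true AUCPR at skew pi = 0.1.
truth = avg_prec(*simulate(20000, 180000, rng))
# Distribution of the estimator on small data sets (n = 10, n + m = 100).
small = [avg_prec(*simulate(10, 90, rng)) for _ in range(200)]
print(truth, sum(small) / len(small))
```

Comparing the small-sample average against the large-sample value is the bias-ratio idea behind the results figures that follow.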

AUCPR Estimator Results

[Figure: bias ratio of each estimator vs. data set size n + m (with n = 10, 100, 1000) for the binormal, bibeta, and offset uniform distributions; estimators shown: interp conv, upper trap, interp max, avg prec, lower trap, interp mean, interp med, binormal.]

AUCPR Confidence Intervals

Definition: a 1 - α confidence interval is an interval that contains the true value with probability at least 1 - α [Wasserman, 2004].

Empirical methods:
- Cross-validation: compute the interval from the mean and variance of the K estimates from K-fold cross-validation
- Bootstrap: choose the interval that contains the central 1 - α fraction of the empirical distribution of bootstrapped AUCPR estimates

Parametric methods, where θ̂ is the estimated AUCPR, n is the number of positives, and Φ_{1-α/2} is the 1 - α/2 standard normal quantile:
- Binomial: θ̂ ± Φ_{1-α/2} · sqrt(θ̂(1 - θ̂)/n)
- Logit: with η̂ = log(θ̂ / (1 - θ̂)) and τ̂ = (n θ̂ (1 - θ̂))^(-1/2), take
  [ e^(η̂ - Φ_{1-α/2} τ̂) / (1 + e^(η̂ - Φ_{1-α/2} τ̂)),  e^(η̂ + Φ_{1-α/2} τ̂) / (1 + e^(η̂ + Φ_{1-α/2} τ̂)) ]
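
The two parametric intervals transcribe directly into code. This sketch assumes θ̂ lies strictly in (0, 1) and uses Python's statistics.NormalDist for the normal quantile; n is the number of positive examples, as in the slides.

```python
import math
from statistics import NormalDist

def binomial_ci(theta, n, alpha=0.05):
    # theta_hat +/- Phi_{1-alpha/2} * sqrt(theta_hat * (1 - theta_hat) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(theta * (1 - theta) / n)
    return theta - half, theta + half

def logit_ci(theta, n, alpha=0.05):
    # Build the interval on the logit scale, then map back through the
    # inverse logit so the endpoints always stay inside (0, 1).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    eta = math.log(theta / (1 - theta))           # eta_hat = logit(theta_hat)
    tau = 1 / math.sqrt(n * theta * (1 - theta))  # tau_hat
    expit = lambda x: 1 / (1 + math.exp(-x))
    return expit(eta - z * tau), expit(eta + z * tau)

print(binomial_ci(0.9, 10))  # can spill past 1
print(logit_ci(0.9, 10))     # always stays inside (0, 1)
```

Keeping the interval inside (0, 1) is the practical advantage of the logit construction over the plain binomial one near the boundaries.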

Confidence Interval Evaluation Goals

Confidence interval desiderata:
- Valid: at least 1 - α coverage
- Prefer narrower intervals (while still valid)
- Robust to different output distributions
- Robust to various skews (π) and data sizes (n + m)

AUCPR Confidence Interval Results

[Figure: coverage relative to the 0.95 target vs. data set size n + m (with n = 20, 100, 500) for the binormal, bibeta, and offset uniform distributions; estimators: avg prec, lower trap, interp med; intervals: logit, binomial, bootstrap, cross-validation.]

Conclusion

Choice of AUCPR estimator and confidence interval is important, particularly for small data sets.

Recommended estimators:
- Lower trapezoid
- Average precision
- Interpolated median

Recommended confidence intervals:
- Binomial
- Logit

What about cross-validation and bootstrap? They converge to proper coverage, but from below, which is problematic for small data sets and low numbers of positive examples.
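
The small-sample failure mode of the bootstrap is easy to glimpse in code. This sketch (the data and the choice of average precision as the estimator are illustrative assumptions) builds a percentile bootstrap interval; note that resamples containing no positives must be discarded, exactly the kind of distortion that appears when positives are scarce.

```python
import random

def avg_prec(scored):
    # Average precision over a list of (score, label) pairs, label 1 = positive.
    ranked = sorted(scored, key=lambda t: -t[0])
    tp, fp, precs = 0, 0, []
    for _, label in ranked:
        if label:
            tp += 1
            precs.append(tp / (tp + fp))
        else:
            fp += 1
    return sum(precs) / len(precs)

def bootstrap_ci(scored, estimator, alpha=0.05, reps=2000, seed=0):
    # Percentile bootstrap: resample the data with replacement and take
    # the central 1 - alpha span of the re-estimates.
    rng = random.Random(seed)
    stats = []
    while len(stats) < reps:
        sample = [scored[rng.randrange(len(scored))] for _ in scored]
        if any(label for _, label in sample):  # skip all-negative resamples
            stats.append(estimator(sample))
    stats.sort()
    return stats[int(alpha / 2 * reps)], stats[int((1 - alpha / 2) * reps) - 1]

# Hypothetical scores: 5 positives among 20 examples (pi = 0.25).
rng = random.Random(1)
data = ([(rng.random() + 0.5, 1) for _ in range(5)]
        + [(rng.random(), 0) for _ in range(15)])
print(bootstrap_ci(data, avg_prec))
```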

Thank You. Questions?

Acknowledgments: NIGMS grant R01GM, NLM grant R01LM, UW Carbone Cancer Center, ICTR NIH NCATS grant UL1TR, CIBM Training Program grant 5T15LM, Roswell Park Cancer Institute NCI grant P30 CA

AUCPR Estimator Results by π

[Figure: bias ratio of each estimator vs. skew π (with n = 10 to 500) for the binormal, bibeta, and offset uniform distributions; estimators: interp conv, lower trap, upper trap, interp mean, interp max, avg prec, interp med, binormal.]

AUCPR Confidence Interval Widths

[Figure: width ratio of each confidence interval method (logit, binomial, bootstrap, cross-validation) for the avg prec, lower trap, and interp med estimators on the binormal, bibeta, and offset uniform distributions.]

AUCPR Confidence Interval Locations

[Figure: bias ratio of interval locations vs. data set size n + m (with n = 20, 100, 500) for the binormal, bibeta, and offset uniform distributions; estimators: avg prec, lower trap, interp med; intervals: binomial, bootstrap, cross-validation.]

References I

Thomas Abeel, Yves Van de Peer, and Yvan Saeys. Toward a gold standard for promoter prediction evaluation. Bioinformatics, 25(12):i313-i320, 2009.

Donald Bamber. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology, 12(4), 1975.

Jesse Davis and Mark Goadrich. The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, ICML '06, New York, NY, USA, 2006. ACM.

Mark Goadrich, Louis Oliphant, and Jude Shavlik. Gleaner: Creating ensembles of first-order clauses to improve recall-precision curves. Machine Learning, 64, 2006.

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA, 2008.

Margaret Sullivan Pepe. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press, USA, 2004.

References II

Foster J. Provost, Tom Fawcett, and Ron Kohavi. The case against accuracy estimation for comparing induction algorithms. In ICML, volume 98, 1998.

Larry Wasserman. All of Statistics: A Concise Course in Statistical Inference. Springer, 2004.


More information

Nonparametric and Semiparametric Methods in Medical Diagnostics

Nonparametric and Semiparametric Methods in Medical Diagnostics Nonparametric and Semiparametric Methods in Medical Diagnostics by Eunhee Kim A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements

More information

Distribution-Free Distribution Regression

Distribution-Free Distribution Regression Distribution-Free Distribution Regression Barnabás Póczos, Alessandro Rinaldo, Aarti Singh and Larry Wasserman AISTATS 2013 Presented by Esther Salazar Duke University February 28, 2014 E. Salazar (Reading

More information

SYSTEMATIC CONSTRUCTION OF ANOMALY DETECTION BENCHMARKS FROM REAL DATA. Outlier Detection And Description Workshop 2013

SYSTEMATIC CONSTRUCTION OF ANOMALY DETECTION BENCHMARKS FROM REAL DATA. Outlier Detection And Description Workshop 2013 SYSTEMATIC CONSTRUCTION OF ANOMALY DETECTION BENCHMARKS FROM REAL DATA Outlier Detection And Description Workshop 2013 Authors Andrew Emmott emmott@eecs.oregonstate.edu Thomas Dietterich tgd@eecs.oregonstate.edu

More information

Economics 573 Problem Set 4 Fall 2002 Due: 20 September

Economics 573 Problem Set 4 Fall 2002 Due: 20 September Economics 573 Problem Set 4 Fall 2002 Due: 20 September 1. Ten students selected at random have the following "final averages" in physics and economics. Students 1 2 3 4 5 6 7 8 9 10 Physics 66 70 50 80

More information

Machine Learning for natural language processing

Machine Learning for natural language processing Machine Learning for natural language processing Classification: k nearest neighbors Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 28 Introduction Classification = supervised method

More information

Model Accuracy Measures

Model Accuracy Measures Model Accuracy Measures Master in Bioinformatics UPF 2017-2018 Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA Barcelona, Spain Variables What we can measure (attributes) Hypotheses

More information

Empirical Risk Minimization, Model Selection, and Model Assessment

Empirical Risk Minimization, Model Selection, and Model Assessment Empirical Risk Minimization, Model Selection, and Model Assessment CS6780 Advanced Machine Learning Spring 2015 Thorsten Joachims Cornell University Reading: Murphy 5.7-5.7.2.4, 6.5-6.5.3.1 Dietterich,

More information

Module 22: Bayesian Methods Lecture 9 A: Default prior selection

Module 22: Bayesian Methods Lecture 9 A: Default prior selection Module 22: Bayesian Methods Lecture 9 A: Default prior selection Peter Hoff Departments of Statistics and Biostatistics University of Washington Outline Jeffreys prior Unit information priors Empirical

More information

Odds ratio estimation in Bernoulli smoothing spline analysis-ofvariance

Odds ratio estimation in Bernoulli smoothing spline analysis-ofvariance The Statistician (1997) 46, No. 1, pp. 49 56 Odds ratio estimation in Bernoulli smoothing spline analysis-ofvariance models By YUEDONG WANG{ University of Michigan, Ann Arbor, USA [Received June 1995.

More information

Better Bootstrap Confidence Intervals

Better Bootstrap Confidence Intervals by Bradley Efron University of Washington, Department of Statistics April 12, 2012 An example Suppose we wish to make inference on some parameter θ T (F ) (e.g. θ = E F X ), based on data We might suppose

More information

Resampling Methods CAPT David Ruth, USN

Resampling Methods CAPT David Ruth, USN Resampling Methods CAPT David Ruth, USN Mathematics Department, United States Naval Academy Science of Test Workshop 05 April 2017 Outline Overview of resampling methods Bootstrapping Cross-validation

More information

Introduction to Supervised Learning. Performance Evaluation

Introduction to Supervised Learning. Performance Evaluation Introduction to Supervised Learning Performance Evaluation Marcelo S. Lauretto Escola de Artes, Ciências e Humanidades, Universidade de São Paulo marcelolauretto@usp.br Lima - Peru Performance Evaluation

More information

Z score indicates how far a raw score deviates from the sample mean in SD units. score Mean % Lower Bound

Z score indicates how far a raw score deviates from the sample mean in SD units. score Mean % Lower Bound 1 EDUR 8131 Chat 3 Notes 2 Normal Distribution and Standard Scores Questions Standard Scores: Z score Z = (X M) / SD Z = deviation score divided by standard deviation Z score indicates how far a raw score

More information

Class Prior Estimation from Positive and Unlabeled Data

Class Prior Estimation from Positive and Unlabeled Data IEICE Transactions on Information and Systems, vol.e97-d, no.5, pp.1358 1362, 2014. 1 Class Prior Estimation from Positive and Unlabeled Data Marthinus Christoffel du Plessis Tokyo Institute of Technology,

More information

Performance Evaluation

Performance Evaluation Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Example:

More information

Algorithm-Independent Learning Issues

Algorithm-Independent Learning Issues Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning

More information

Statistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart

Statistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart Statistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart 1 Motivation and Problem In Lecture 1 we briefly saw how histograms

More information

AUC Maximizing Support Vector Learning

AUC Maximizing Support Vector Learning Maximizing Support Vector Learning Ulf Brefeld brefeld@informatik.hu-berlin.de Tobias Scheffer scheffer@informatik.hu-berlin.de Humboldt-Universität zu Berlin, Department of Computer Science, Unter den

More information

Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring /

Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring / Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Combining Classifiers Empirical view Theoretical

More information

Estimating False Negatives for Classification Problems with Cluster Structure

Estimating False Negatives for Classification Problems with Cluster Structure Estimating False Negatives for Classification Problems with Cluster Structure György J. Simon Vipin Kumar Zhi-Li Zhang Department of Computer Science, University of Minnesota {gsimon,kumar,zhzhang}@cs.umn.edu

More information

Statistical Debugging with Latent Topic Models

Statistical Debugging with Latent Topic Models Statistical Debugging with Latent Topic Models David Andrzejewski, Anne Mulhern, Ben Liblit, Xiaojin Zhu Department of Computer Sciences University of Wisconsin Madison European Conference on Machine Learning,

More information

Smart Home Health Analytics Information Systems University of Maryland Baltimore County

Smart Home Health Analytics Information Systems University of Maryland Baltimore County Smart Home Health Analytics Information Systems University of Maryland Baltimore County 1 IEEE Expert, October 1996 2 Given sample S from all possible examples D Learner L learns hypothesis h based on

More information

Confidence Intervals for the Area under the ROC Curve

Confidence Intervals for the Area under the ROC Curve Confidence Intervals for the Area under the ROC Curve Corinna Cortes Google Research 1440 Broadway New York, NY 10018 corinna@google.com Mehryar Mohri Courant Institute, NYU 719 Broadway New York, NY 10003

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Performance Evaluation and Hypothesis Testing

Performance Evaluation and Hypothesis Testing Performance Evaluation and Hypothesis Testing 1 Motivation Evaluating the performance of learning systems is important because: Learning systems are usually designed to predict the class of future unlabeled

More information

Applied Machine Learning Annalisa Marsico

Applied Machine Learning Annalisa Marsico Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 22 April, SoSe 2015 Goals Feature Selection rather than Feature

More information

Classification of Publications Based on Statistical Analysis of Scientific Terms Distributions

Classification of Publications Based on Statistical Analysis of Scientific Terms Distributions AUSTRIAN JOURNAL OF STATISTICS Volume 37 (2008), Number 1, 109 118 Classification of Publications Based on Statistical Analysis of Scientific Terms Distributions Vaidas Balys and Rimantas Rudzkis Institute

More information

ABC random forest for parameter estimation. Jean-Michel Marin

ABC random forest for parameter estimation. Jean-Michel Marin ABC random forest for parameter estimation Jean-Michel Marin Université de Montpellier Institut Montpelliérain Alexander Grothendieck (IMAG) Institut de Biologie Computationnelle (IBC) Labex Numev! joint

More information

A plug-in approach to maximising precision at the top and recall at the top

A plug-in approach to maximising precision at the top and recall at the top A plug-in approach to maximising precision at the top and recall at the top Dirk Tasche arxiv:1804.03077v1 [stat.ml] 9 Apr 2018 For information retrieval and binary classification, we show that precision

More information

Generalization Bounds for the Area Under the ROC Curve

Generalization Bounds for the Area Under the ROC Curve Generalization Bounds for the Area Under the ROC Curve Shivani Agarwal Thore Graepel Sariel Har-Peled Ralf Herbrich Dan Roth April 3, 005 Abstract We study generalization properties of the area under the

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Regularization. CSCE 970 Lecture 3: Regularization. Stephen Scott and Vinod Variyam. Introduction. Outline

Regularization. CSCE 970 Lecture 3: Regularization. Stephen Scott and Vinod Variyam. Introduction. Outline Other Measures 1 / 52 sscott@cse.unl.edu learning can generally be distilled to an optimization problem Choose a classifier (function, hypothesis) from a set of functions that minimizes an objective function

More information

Introduction to Probability and Statistics (Continued)

Introduction to Probability and Statistics (Continued) Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Statistical Analysis of List Experiments

Statistical Analysis of List Experiments Statistical Analysis of List Experiments Graeme Blair Kosuke Imai Princeton University December 17, 2010 Blair and Imai (Princeton) List Experiments Political Methodology Seminar 1 / 32 Motivation Surveys

More information

Tuning as Linear Regression

Tuning as Linear Regression Tuning as Linear Regression Marzieh Bazrafshan, Tagyoung Chung and Daniel Gildea Department of Computer Science University of Rochester Rochester, NY 14627 Abstract We propose a tuning method for statistical

More information

Class 4: Classification. Quaid Morris February 11 th, 2011 ML4Bio

Class 4: Classification. Quaid Morris February 11 th, 2011 ML4Bio Class 4: Classification Quaid Morris February 11 th, 211 ML4Bio Overview Basic concepts in classification: overfitting, cross-validation, evaluation. Linear Discriminant Analysis and Quadratic Discriminant

More information

Confidence Intervals for a Ratio of Binomial Proportions Based on Unbiased Estimators

Confidence Intervals for a Ratio of Binomial Proportions Based on Unbiased Estimators Proceedings of The 6th Sino-International Symposium Date published: October 3, 2009 on Probability, Statistics, and Quantitative Management pp. 2-5 Conference held on May 30, 2009 at Fo Guang Univ., Taiwan,

More information

Data Mining und Maschinelles Lernen

Data Mining und Maschinelles Lernen Data Mining und Maschinelles Lernen Ensemble Methods Bias-Variance Trade-off Basic Idea of Ensembles Bagging Basic Algorithm Bagging with Costs Randomization Random Forests Boosting Stacking Error-Correcting

More information

A multiple testing procedure for input variable selection in neural networks

A multiple testing procedure for input variable selection in neural networks A multiple testing procedure for input variable selection in neural networks MicheleLaRoccaandCiraPerna Department of Economics and Statistics - University of Salerno Via Ponte Don Melillo, 84084, Fisciano

More information

P leiades: Subspace Clustering and Evaluation

P leiades: Subspace Clustering and Evaluation P leiades: Subspace Clustering and Evaluation Ira Assent, Emmanuel Müller, Ralph Krieger, Timm Jansen, and Thomas Seidl Data management and exploration group, RWTH Aachen University, Germany {assent,mueller,krieger,jansen,seidl}@cs.rwth-aachen.de

More information

Estimation of the Conditional Variance in Paired Experiments

Estimation of the Conditional Variance in Paired Experiments Estimation of the Conditional Variance in Paired Experiments Alberto Abadie & Guido W. Imbens Harvard University and BER June 008 Abstract In paired randomized experiments units are grouped in pairs, often

More information

Lehmann Family of ROC Curves

Lehmann Family of ROC Curves Memorial Sloan-Kettering Cancer Center From the SelectedWorks of Mithat Gönen May, 2007 Lehmann Family of ROC Curves Mithat Gonen, Memorial Sloan-Kettering Cancer Center Glenn Heller, Memorial Sloan-Kettering

More information

Anomaly Detection. Jing Gao. SUNY Buffalo

Anomaly Detection. Jing Gao. SUNY Buffalo Anomaly Detection Jing Gao SUNY Buffalo 1 Anomaly Detection Anomalies the set of objects are considerably dissimilar from the remainder of the data occur relatively infrequently when they do occur, their

More information

JACKKNIFE EMPIRICAL LIKELIHOOD BASED CONFIDENCE INTERVALS FOR PARTIAL AREAS UNDER ROC CURVES

JACKKNIFE EMPIRICAL LIKELIHOOD BASED CONFIDENCE INTERVALS FOR PARTIAL AREAS UNDER ROC CURVES Statistica Sinica 22 (202), 457-477 doi:http://dx.doi.org/0.5705/ss.20.088 JACKKNIFE EMPIRICAL LIKELIHOOD BASED CONFIDENCE INTERVALS FOR PARTIAL AREAS UNDER ROC CURVES Gianfranco Adimari and Monica Chiogna

More information

A Deep Interpretation of Classifier Chains

A Deep Interpretation of Classifier Chains A Deep Interpretation of Classifier Chains Jesse Read and Jaakko Holmén http://users.ics.aalto.fi/{jesse,jhollmen}/ Aalto University School of Science, Department of Information and Computer Science and

More information

Accouncements. You should turn in a PDF and a python file(s) Figure for problem 9 should be in the PDF

Accouncements. You should turn in a PDF and a python file(s) Figure for problem 9 should be in the PDF Accouncements You should turn in a PDF and a python file(s) Figure for problem 9 should be in the PDF Please do not zip these files and submit (unless there are >5 files) 1 Bayesian Methods Machine Learning

More information