Breiman, L., Friedman, J., Olshen, R. and Stone, C., (1984), Classification and Regression Trees, Wadsworth, Belmont, CA.

Bibliography:

Breiman, L., Friedman, J., Olshen, R. and Stone, C., (1984), Classification and Regression Trees, Wadsworth, Belmont, CA.

Breiman, L., (1999), Random Forests, Statistics Department, University of California, Berkeley, CA.

Breiman, L., (2001), Statistical modeling: The two cultures, Statist. Sci. 16, No. 3,

Breiman, L., (2005), Correspondence.

Gamel, J., McLean, I. and Greenberg, R., (1988), Interval-by-interval Cox model analysis of 3680 cases of intraocular melanomas shows a decline in the prognostic value of size and cell type over time and tumor excision, Cancer 61:

Gamel, J., Greenberg, R. and McLean, I., (1998), A stable linear algorithm for fitting the lognormal model to survival data, Computers and Biomedical Research, No. 31: 38-47.

Gamel, J., George, S., Edwards, M. and Seigler, H., (2002), The long-term clinical course of patients with cutaneous melanoma, Cancer, 95, No. 6:

Hofstadter, D., (1979), Gödel, Escher, Bach: an Eternal Golden Braid, Basic Books Inc, New York.

Seigler, (2005), Personal Communication.

Slingluff, C., Vollmer, R., Reintgen, D. and Seigler, H., (1988), Lethal thin malignant melanoma: Identifying patients at risk, Ann. Surg. 208, No. 2,

Stadelmann, W., Rapaport, D., Soong, S. et al., (1998), Prognostic factors that influence melanoma outcome. In: Balch, C., Houghton, A., Sober, A., eds. Cutaneous Melanoma, 3rd ed., St. Louis, MO: Quality Medical Publishing;

Venables, W. and Ripley, B., Modern Applied Statistics with S-PLUS, Springer-Verlag New York, Inc.

Vollmer, R. and Seigler, H., (2001A), A model for pretest probability of lymph node metastasis from cutaneous melanoma, Am. J. Clin. Pathol. 114:

Vollmer, R. and Seigler, H., (2001B), Using a continuous transformation of the Breslow thickness for prognosis in cutaneous melanoma, Am. J. Clin. Pathol. 115:

R 2.01 (2005), The R Foundation for Statistical Computing, cran@r-project.org

Packages

Sarkar, D., (2004), Lattice Graphics, Implementation of Trellis Graphics.

Breiman, L., Cutler, A., Liaw, A. and Wiener, M., (2005), randomForest: Breiman and Cutler's random forests for classification and regres

Ripley, B., (2005), tree: Classification and regression trees.

APPENDIX

Fig.1 Cross Validation Deviance Plot (axes: size, deviance)

Fig.2 Cross Validation Misclassification Plot (axes: size, misclass)

Fig.3. Plot of Full Tree without Text, All Patients, All Variables and Any Recurrence

Fig.4. Plot of Tree All Patients, All Variables and Any Recurrence. Pruned Tree k=30
Split labels: dextent:abc, stggrp:ab, typcas:a

Fig.5. Plot of Tree All Patients All Variables and Any Recurrence Pruned Tree k=8
Split labels: dextent:abc, stggrp:ab, stggrp:bc, typcas:a, prisite:bcde, clark:f, anyimm:a, side:b, prisite:abcfghi, hist:abcegh, clark:ac, AGE < 37.2, prisite:bcg, anyimm:a, clark:bc, clark:bce, AGE <

Fig.6. The Variable Importance Plot of All Patients, All Variables and Any Recurrence (L). The Variable Importance Plot of All Patients, All Variables and Recurrence More Than Local (R).
Axis: MeanDecreaseAccuracy (Importance)
Variables as listed (L): dextent, stggrp, hist, histgrp, clark, satel, race, prisite, ulcer, AGE, THICK, side, typcas, sex, anyimm
Variables as listed (R): dextent, stggrp, histgrp, hist, race, satel, prisite, clark, AGE, ulcer, side, typcas, sex, anyimm, THICK

Fig.7. The Variable Importance Plot of All Patients, Leave Out STGGRP, DEXTENT and TYPCAS, Any Recurrence (L). The Variable Importance Plot of All Patients, Leave Out STGGRP, DEXTENT and TYPCAS, Recurrence More Than Local (R).
Axis: MeanDecreaseAccuracy (Importance)
Variables as listed (L): prisite, hist, histgrp, clark, race, side, ulcer, AGE, sex, anyimm, THICK, satel
Variables as listed (R): prisite, AGE, hist, race, side, histgrp, ulcer, sex, anyimm, satel, clark, THICK

Fig.8. The Variable Importance Plot of Limited Patients, All Variables and Any Recurrence (L). The Variable Importance Plot of Limited Patients, All Variables and Recurrence More Than Local (R).
Axis: MeanDecreaseAccuracy (Importance)
Variables as listed (L): dextent, hist, histgrp, clark, AGE, race, prisite, ulcer, satel, side, THICK, typcas, sex, anyimm, stggrp
Variables as listed (R): dextent, hist, histgrp, ulcer, race, side, prisite, satel, sex, stggrp, anyimm, AGE, THICK, clark

TEXT FULL TREE ALL PATIENTS ALL VARIABLES ANY RECURRENCE
1) root ( )
2) dextent: 0,1, ( )
4) stggrp: 0, ( )
8) typcas: ( )
16) anyimm: ( )
32) prisite: 1,2,3,6,7,12, ( )
64) AGE < ( )
128) satel: ( ) *
129) satel: ( )
258) AGE < ( )
516) AGE < ( ) *
517) AGE > ( ) *
259) AGE > ( ) *
65) AGE > ( )
130) AGE < ( ) *
131) AGE > ( )
262) histgrp: 1,2,3, ( )
524) prisite: 1,2,3,7, ( )
1048) AGE < ( )
2096) AGE < ( ) *
2097) AGE > ( )
4194) side: ( ) *
4195) side: ( ) *
1049) AGE > ( ) *
525) prisite: ( )
1050) sex: ( ) *
1051) sex: ( )
2102) AGE < ( ) *
2103) AGE > ( ) *
263) histgrp: 4, ( )
526) sex: ( )
1052) AGE < ( ) *
1053) AGE > ( ) *
527) sex: ( ) *
33) prisite: 4, ( )
66) histgrp: 2,4, ( )
132) AGE < ( ) *
133) AGE > ( ) *
67) histgrp: 1, ( )
134) clark: ( ) *
135) clark: 1, ( ) *
17) anyimm: ( )
34) hist: 1,2,3,6,12, ( )
68) prisite: 2,3, ( )
136) clark: 2,4,5, ( )
272) side: ( )
Fig.9. Text Full Tree All Patients All Variables Any Recurrence

544) prisite: 2, ( ) *
545) prisite: ( )
1090) THICK < ( ) *
1091) THICK > ( )
2182) THICK < ( ) *
2183) THICK > ( )
4366) AGE < ( ) *
4367) AGE > ( )
8734) AGE < ( ) *
8735) AGE > ( ) *
273) side: ( ) *
137) clark: 1, ( )
274) THICK < ( )
548) sex: ( ) *
549) sex: ( )
1098) AGE < ( )
2196) AGE < ( ) *
2197) AGE > ( ) *
1099) AGE > ( ) *
275) THICK > ( )
550) THICK < ( ) *
551) THICK > ( )
1102) histgrp: 1,3, ( ) *
1103) histgrp: ( )
2206) AGE < ( )
4412) AGE < ( )
8824) AGE < ( )
17648) AGE < ( )
35296) AGE < ( )
70592) side: ( )
) THICK < ( )
) satel: 1, ( ) *
) satel: ( ) *
) THICK > ( )
) ulcer: 1, ( )
) THICK < ( ) *
) THICK > ( )
) THICK < ( ) *
) THICK > ( ) *
) ulcer: ( )
) THICK < ( ) *
) THICK > ( ) *
70593) side: ( )
) THICK < ( )
) THICK < ( )
) THICK < ( ) *
) THICK > ( )
) AGE < ( ) *
) AGE > ( ) *
Fig.9. cont

282373) THICK > ( ) *
) THICK > ( )
) THICK < ( )
) AGE < ( ) *
) AGE > ( ) *
) THICK > ( )
) ulcer: 2, ( )
) sex: ( )
) THICK < ( ) *
) THICK > ( )
) THICK < ( )
) THICK < ( )
) AGE < ( ) *
) AGE > ( ) *
) THICK > ( ) *
) THICK > ( ) *
) sex: ( )
) THICK < ( )
) AGE < ( ) *
) AGE > ( )
) AGE < ( )
) AGE < ( )
) AGE < ( )
) AGE > ( ) *
) AGE > ( ) *
) AGE > ( )
) AGE < ( ) *
) AGE > ( ) *
) THICK > ( ) *
) ulcer: ( ) *
35297) AGE > ( ) *
17649) AGE > ( ) *
8825) AGE > ( ) *
4413) AGE > ( ) *
2207) AGE > ( )
4414) AGE < ( ) *
4415) AGE > ( ) *
69) prisite: 1,6,12,13, ( )
138) clark: 2,3, ( )
276) prisite: 1,13, ( )
552) AGE < ( ) *
553) AGE > ( )
1106) AGE < ( )
2212) prisite: ( )
4424) histgrp: ( )
8848) clark: ( ) *
8849) clark: ( ) *
4425) histgrp: ( ) *
2213) prisite: ( ) *
Fig.9. cont

1107) AGE > ( )
2214) side: 0, ( )
4428) THICK < ( )
8856) AGE < ( ) *
8857) AGE > ( )
17714) AGE < ( )
35428) THICK < ( ) *
35429) THICK > ( )
70858) AGE < ( )
) AGE < ( )
) sex: ( ) *
) sex: ( ) *
) AGE > ( ) *
70859) AGE > ( ) *
17715) AGE > ( ) *
4429) THICK > ( )
8858) histgrp: 1,2,3, ( )
17716) AGE < ( )
35432) AGE < ( )
70864) sex: ( ) *
70865) sex: ( )
) AGE < ( ) *
) AGE > ( )
) histgrp: ( ) *
) histgrp: ( )
) AGE < ( ) *
) AGE > ( ) *
35433) AGE > ( )
70866) AGE < ( ) *
70867) AGE > ( )
) AGE < ( ) *
) AGE > ( ) *
17717) AGE > ( )
35434) AGE < ( ) *
35435) AGE > ( )
70870) AGE < ( ) *
70871) AGE > ( ) *
8859) histgrp: ( ) *
2215) side: ( )
4430) AGE < ( )
8860) AGE < ( )
17720) THICK < ( )
35440) sex: ( )
70880) AGE < ( ) *
70881) AGE > ( )
) THICK < ( ) *
) THICK > ( ) *
35441) sex: ( )
70882) AGE < ( ) *
Fig.9. cont

70883) AGE > ( )
) AGE < ( )
) ulcer: ( ) *
) ulcer: 1, ( ) *
) AGE > ( ) *
17721) THICK > ( ) *
8861) AGE > ( )
17722) prisite: ( )
35444) satel: ( ) *
35445) satel: ( )
70890) AGE < ( ) *
70891) AGE > ( ) *
17723) prisite: ( ) *
4431) AGE > ( )
8862) sex: ( )
17724) AGE < ( )
35448) AGE < ( )
70896) AGE < ( )
) ulcer: 2, ( )
) prisite: 1, ( ) *
) prisite: ( )
) AGE < ( ) *
) AGE > ( ) *
) ulcer: ( ) *
70897) AGE > ( ) *
35449) AGE > ( ) *
17725) AGE > ( )
35450) AGE < ( )
70900) AGE < ( ) *
70901) AGE > ( )
) AGE < ( ) *
) AGE > ( ) *
70901) AGE > ( )
) AGE < ( ) *
) AGE > ( ) *
35451) AGE > ( ) *
8863) sex: ( ) *
277) prisite: 6, ( )
554) AGE < ( ) *
555) AGE > ( )
1110) THICK < ( ) *
1111) THICK > ( ) *
139) clark: 1, ( )
278) AGE < ( ) *
279) AGE > ( )
558) clark: ( )
1116) ulcer: 1, ( ) *
1117) ulcer: ( ) *
559) clark: ( )
Fig.9. cont

1118) ulcer: ( )
2236) THICK < ( )
4472) histgrp: 1,2, ( )
8944) prisite: 1,12, ( )
17888) AGE < ( )
35776) sex: ( )
71552) THICK < ( )
) THICK < ( ) *
) THICK > ( )
) AGE < ( ) *
) AGE > ( ) *
71553) THICK > ( ) *
35777) sex: ( ) *
17889) AGE > ( ) *
8945) prisite: ( ) *
4473) histgrp: ( ) *
2237) THICK > ( )
4474) AGE < ( ) *
4475) AGE > ( )
8950) prisite: 1, ( ) *
8951) prisite: ( ) *
1119) ulcer: 1, ( )
2238) sex: ( ) *
2239) sex: ( ) *
35) hist: 4,10, ( )
70) clark: 2,4, ( )
140) THICK < ( ) *
141) THICK > ( ) *
9) typcas: ( )
18) side: ( )
36) clark: 1, ( )
72) AGE < ( ) *
73) AGE > ( )
146) hist: 2, ( )
292) THICK < ( )
584) AGE < ( ) *
585) AGE > ( ) *
293) THICK > ( )
586) AGE < ( )
1172) sex: ( ) *
1173) sex: ( ) *
587) AGE > ( ) *
147) hist: 3, ( ) *
37) clark: 2, ( )
74) anyimm: ( ) *
75) anyimm: ( ) *
19) side: 0, ( )
38) AGE < ( )
76) clark: 2, ( )
Fig.9. cont

152) AGE < ( ) *
153) AGE > ( ) *
77) clark: 1, ( ) *
39) AGE > ( )
78) AGE < ( )
156) AGE < ( ) *
157) AGE > ( )
314) prisite: 3, ( ) *
315) prisite: 1, ( )
630) AGE < ( ) *
631) AGE > ( ) *
79) AGE > ( ) *
5) stggrp: 4, ( )
10) prisite: 2,3,4, ( )
20) THICK < ( )
40) sex: ( )
80) clark: 2,3, ( ) *
81) clark: ( ) *
41) sex: ( ) *
21) THICK > ( ) *
11) prisite: 1,12,13, ( )
22) satel: ( )
44) prisite: 1, ( )
88) AGE < ( )
176) AGE < ( )
352) AGE < ( )
704) THICK < ( ) *
705) THICK > ( )
1410) AGE < ( ) *
1411) AGE > ( )
2822) THICK < ( )
5644) anyimm: ( ) *
5645) anyimm: ( )
11290) AGE < ( )
22580) AGE < ( ) *
22581) AGE > ( ) *
11291) AGE > ( ) *
2823) THICK > ( ) *
353) AGE > ( ) *
177) AGE > ( ) *
89) AGE > ( )
178) AGE < ( ) *
179) AGE > ( )
358) AGE < ( ) *
359) AGE > ( )
718) AGE < ( ) *
719) AGE > ( ) *
45) prisite: 12, ( ) *
23) satel: 1, ( ) *
Fig.9. cont

3) dextent: 3,4,5,6, ( )
6) stggrp: 1, ( )
12) prisite: ( )
24) AGE < ( ) *
25) AGE > ( ) *
13) prisite: 2,3,13, ( ) *
7) stggrp: ( )
14) clark: ( )
28) prisite: ( ) *
29) prisite: 2, ( ) *
15) clark: 1,2,3,4, ( ) *
Fig.9. cont

Table Consecutive Random Forests 10 Trees
SAMPLE # MTRY 4 VOTES VOTES N Y N Y TOTAL PERCENT ERROR

Table Consecutive Random Forests 25 Trees
SAMPLE # MTRY 4 VOTES VOTES N Y N Y TOTAL PERCENT ERROR

Table Consecutive Random Forests 50 Trees
SAMPLE # MTRY 4 VOTES VOTES N Y N Y TOTAL PERCENT ERROR

Table Consecutive Random Forests 100 Trees
SAMPLE # MTRY 4 VOTES VOTES N Y N Y TOTAL PERCENT ERROR

Table Consecutive Random Forests 200 Trees
SAMPLE # MTRY 4 VOTES VOTES N Y N Y TOTAL PERCENT ERROR

Table Consecutive Random Forests 300 Trees
SAMPLE # MTRY 4 VOTES VOTES N Y N Y TOTAL PERCENT ERROR

Table Consecutive Random Forests 500 Trees
SAMPLE # MTRY 4 VOTES VOTES N Y N Y TOTAL PERCENT ERROR

Table Consecutive Random Forests 1000 Trees
MTRY 4 SAMPLE # VOTES VOTES TOTAL PERCENT N Y N Y ERROR

Table 9. Consecutive Larger Trees
NUMBER TREES: 5 Consecutive runs 2000 trees; 3 Consecutive runs 3000 trees; 2 Consecutive runs 5000 trees
MTRY 4 VOTES VOTES N Y N Y TOTAL PERCENT ERROR

Table 10. Single Tree Results All Patients All Variables Any Recurrence
SINGLE TREE RESULTS ALL PATIENTS ALL VARIABLES ANY RECURRENCE
TOTAL SET TOTAL PERCENT NUMBER OF ACTUAL, AND PERCENT IN RANDOM SAMPLES PERCENT PERCENT PREDICTED WITH >=.50 IS >=.15 IS >=.10 IS >=.05 IS
THIS IS RESULTS WITH >=.50 FULL TREE k=5 k=30 of of TOTAL
THIS IS RESULTS WITH >=.15 FULL TREE k=5 k=30 of of TOTAL
THIS IS RESULTS WITH >=.10 FULL TREE k=5 k=30 of of TOTAL
THIS IS RESULTS WITH >=.05 FULL TREE k=5 k=30 of of TOTAL

Table 11. Single Tree Results All Patients All Variables Recurrence More Than Local
SINGLE TREE RESULTS ALL PATIENTS ALL VARIABLES RECURRENCE MORE THAN LOCAL
TOTAL SET TOTAL PERCENT NUMBER OF ACTUAL, AND PERCENT IN RANDOM SAMPLES PERCENT PERCENT PREDICTED WITH >=.50 IS >=.15 IS >=.10 IS >=.05 IS
THIS IS RESULTS WITH >=.50 FULL TREE k=5 k=30 of of TOTAL
THIS IS RESULTS WITH >.15 FULL TREE k=5 k=30 of of TOTAL
THIS IS RESULTS WITH >.1 FULL TREE k=5 k=30 of of TOTAL
THIS IS RESULTS WITH >.05 FULL TREE k=5 k=30 of of TOTAL

Table 12. Single Tree Results All Patients All Variables Recurrence More Than Local 2nd Run
SINGLE TREE ALL PATIENTS ALL VARIABLES RECURRENCE MORE THAN LOCAL 2ND RUN
TOTAL SET TOTAL PERCENT NUMBER OF ACTUAL, AND PERCENT IN RANDOM SAMPLES PERCENT PERCENT PREDICTED WITH >=.50 IS >=.15 IS >=.10 IS >=.05 IS
THIS IS RESULTS WITH >=.50 FULL TREE k=5 k=30 of of TOTAL M.S.E
THIS IS RESULTS WITH >.15 FULL TREE k=5 k=30 of of TOTAL M.S.E
THIS IS RESULTS WITH >.10 FULL TREE k=5 k=30 of of TOTAL M.S.E
THIS IS RESULTS WITH >.05 FULL TREE k=5 k=30 of of TOTAL M.S.E

Table 13. Single Tree Results All Patients Leave Out Variables Any Recurrence
SINGLE TREE RESULTS ALL PATIENTS LEAVE OUT VARIABLES ANY RECURRENCE
TOTAL SET PERCENT NUMBER OF ACTUAL AND PERCENT IN RANDOM SAMPLES PERCENT PERCENT PREDICTED WITH >=.50 IS >=.15 IS >=.10 IS >=.05 IS
THIS IS RESULTS WITH >=.50 FULL TREE k=5 k=30 of of for Full
THIS IS RESULTS WITH >=.15 FULL TREE k=5 k=30 of of
THIS IS RESULTS WITH >=.1 FULL TREE k=5 k=30 of of for Full
THIS IS RESULTS WITH >=.05 FULL TREE k=5 k=30 of of for Full

Table 14. Single Tree Results All Patients Leave Out Variables Any Recurrence 2nd Run
SINGLE TREE RESULTS ALL PATIENTS LEAVE OUT VARIABLES RECURRENCE MORE THAN LOCAL 2ND RUN
TOTAL SET PERCENT NUMBER OF ACTUAL AND PERCENT IN RANDOM SAMPLES PERCENT PERCENT PREDICTED WITH >=.50 IS >=.15 IS >=.10 IS >=.05 IS
THIS IS RESULTS WITH >=.5 FULL TREE k=5 k=30 of of for Full
THIS IS RESULTS WITH >=.15 FULL TREE k=5 k=30 of of for Full
THIS IS RESULTS WITH >=.10 FULL TREE k=5 k=30 of of for Full
THIS IS RESULTS WITH >=.05 FULL TREE k=5 k=30 of of for Full

Table 15. Single Tree Results All Patients Leave Out Variables Recurrence More Than Local
SINGLE TREE ALL PATIENTS LEAVE OUT VARIABLES ANY RECURRENCE
TOTAL SET PERCENT NUMBER OF ACTUAL AND PERCENT IN RANDOM SAMPLES PERCENT PERCENT PREDICTED WITH >=.50 IS >=.15 IS >=.10 IS >=.05 IS
THIS IS RESULTS WITH >=.5 FULL TREE k=5 k=30 of of for Full
THIS IS RESULTS WITH >=.15 FULL TREE k=5 k=30 of of for Full
THIS IS RESULTS WITH >=.1 FULL TREE k=5 k=30 of of for Full
THIS IS RESULTS WITH >=.05 FULL TREE k=5 k=30 of of for Full

Table 16. Single Tree Results All Patients Leave Out Variables Recurrence More Than Local 2nd Run
SINGLE TREE ALL PATIENTS LEAVE OUT VARIABLES RECURRENCE MORE THAN LOCAL 2ND RUN
TOTAL SET PERCENT NUMBER OF ACTUAL AND PERCENT IN RANDOM SAMPLE PERCENT PERCENT PREDICTED WITH >=.50 IS >=.15 IS >=.10 IS >=.05 IS
THIS IS RESULTS WITH >=.50 FULL TREE k=5 k=30 of of for Full
THIS IS RESULTS WITH >=.15 FULL TREE k=5 k=30 of of for Full
THIS IS RESULTS WITH >=.10 FULL TREE k=5 k=30 of of for Full
THIS IS RESULTS WITH >=.05 FULL TREE k=5 k=30 of of for Full

Table 17. Single Tree Results Limited Patients All Variables Any Recurrence
SINGLE TREE RESULTS LIMITED PATIENTS ANY RECURRENCE
TOTAL SET TOTAL PERCENT NUMBER OF ACTUAL, AND PERCENT IN RANDOM SAMPLES PERCENT PERCENT PREDICTED WITH >=.50 IS >=.15 IS >=.10 IS >=.05 IS
THIS IS RESULTS WITH >=.5 FULL TREE k=5 k=30 of of TOTAL M.S.E
THIS IS RESULTS WITH >=.15 FULL TREE k=5 k=30 of of TOTAL M.S.E
THIS IS RESULTS WITH >=.1 FULL TREE k=5 k=30 of of TOTAL M.S.E
THIS IS RESULTS WITH >=.05 FULL TREE k=5 k=30 of of TOTAL M.S.E

Table 18. Single Tree Results Limited Patients All Variables Recurrence More Than Local
SINGLE TREE RESULTS LIMITED PATIENTS RECURRENCE MORE THAN LOCAL
TOTAL SET TOTAL PERCENT NUMBER OF ACTUAL, AND PERCENT IN RANDOM SAMPLES PERCENT PERCENT PREDICTED WITH >=.50 IS >=.15 IS >=.10 IS >=.05 IS
THIS IS RESULTS WITH >=.5 FULL TREE k=5 k=30 of of TOTAL M.S.E
THIS IS RESULTS WITH >=.15 FULL TREE k=5 k=30 of of TOTAL M.S.E
THIS IS RESULTS WITH >=.1 FULL TREE k=5 k=30 of of TOTAL M.S.E
THIS IS RESULTS WITH >=.5 FULL TREE k=5 k=30 of of TOTAL M.S.E
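
Tables 10 to 18 report single-tree results at cutoffs of .50, .15, .10 and .05, some with an M.S.E column. As a rough illustration only, the Python sketch below shows one way such cutoffs could be applied, assuming each cutoff is a threshold on a predicted recurrence probability and that M.S.E is the mean squared difference between that probability and the 0/1 outcome. The function names and the six example cases are hypothetical and are not taken from the study data.

```python
# Cutoffs named in the table headers.
CUTOFFS = (0.50, 0.15, 0.10, 0.05)


def summarize_at_cutoff(probs, actual, cutoff):
    """Flag a case as predicted recurrence when its probability is >= cutoff.

    probs  : predicted recurrence probabilities (assumed to come from a tree)
    actual : observed outcomes, 1 = recurrence, 0 = no recurrence
    """
    predicted = [p >= cutoff for p in probs]
    n = len(probs)
    n_predicted = sum(predicted)
    n_wrong = sum(pred != bool(a) for pred, a in zip(predicted, actual))
    return {
        "cutoff": cutoff,
        "n_predicted": n_predicted,
        "percent_predicted": 100.0 * n_predicted / n,
        "percent_error": 100.0 * n_wrong / n,
    }


def mean_squared_error(probs, actual):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - a) ** 2 for p, a in zip(probs, actual)) / len(probs)


# Hypothetical example: six cases, not real patient data.
probs = [0.02, 0.07, 0.12, 0.20, 0.55, 0.90]
actual = [0, 0, 1, 0, 1, 1]

for cutoff in CUTOFFS:
    print(summarize_at_cutoff(probs, actual, cutoff))
print("M.S.E:", mean_squared_error(probs, actual))
```

Lowering the cutoff flags more cases as predicted recurrences, which is presumably why the tables repeat the same layout once per cutoff.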

Table 19. RF All Patients All Variables Any Recurrence
TREES 300
MTRY = 15
T.N. CUTOFF N Y TOTAL ERROR

MTRY 13
T.N. CUTOFF N Y TOTAL ERROR
Table 19 cont

MTRY 10
T.N. CUTOFF N Y TOTAL ERROR
Table 19 cont

MTRY 7
T.N. CUTOFF N Y TOTAL ERROR
Table 19 cont

MTRY 4
T.N. CUTOFF TOTAL N Y ERROR
Table 19 cont

MTRY 2
T.N. CUTOFF TOTAL N Y ERROR
Table 19 cont

MTRY 1
T.N. CUTOFF TOTAL N Y ERROR
Table 19 cont

Table 20. RF All Patients All Variables Recurrence More Than Local
TREES 300
MTRY = 15
T.N. CUTOFF N Y M.S.E TOTAL ERROR

MTRY 13
T.N. CUTOFF M.S.E TOTAL N Y ERROR
Table 20 cont

MTRY 10
T.N. CUTOFF N Y M.S.E TOTAL ERROR
Table 20 cont

MTRY 7
T.N. CUTOFF N Y M.S.E TOTAL ERROR
Table 20 cont

MTRY 4
T.N. CUTOFF N Y M.S.E TOTAL ERROR
Table 20 cont

MTRY 2
T.N. CUTOFF N Y M.S.E TOTAL ERROR
Table 20 cont

MTRY 1
T.N. CUTOFF N Y M.S.E TOTAL ERROR
Table 20 cont

Table 21. RF All Patients Leave Out STAGE GROUP, DEXTENT and TYPCAS Any Recurrence
TREES 300
MTRY = 12
T.N. CUTOFF N Y TOTAL ERROR
Table 21 cont

MTRY 10
T.N. CUTOFF TOTAL N Y ERROR
Table 21 cont

MTRY 4
T.N. CUTOFF N Y TOTAL ERROR
Table 21 cont

MTRY 2
T.N. CUTOFF N Y TOTAL ERROR
Table 21 cont

Table 22. RF All Patients Leave Out STAGE GROUP, DEXTENT and TYPCAS Recurrence More Than Local
TREES 300
MTRY = 12
T.N. CUTOFF N Y TOTAL ERROR

MTRY 10
T.N. CUTOFF TOTAL N Y ERROR
Table 22 cont


More information

Machine Learning - TP

Machine Learning - TP Machine Learning - TP Nathalie Villa-Vialaneix - nathalie.villa@univ-paris1.fr http://www.nathalievilla.org IUT STID (Carcassonne) & SAMM (Université Paris 1) Formation INRA, Niveau 3 Formation INRA (Niveau

More information

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES Cox s regression analysis Time dependent explanatory variables Henrik Ravn Bandim Health Project, Statens Serum Institut 4 November 2011 1 / 53

More information

An asymmetric entropy measure for decision trees

An asymmetric entropy measure for decision trees An asymmetric entropy measure for decision trees Simon Marcellin Laboratoire ERIC Université Lumière Lyon 2 5 av. Pierre Mendès-France 69676 BRON Cedex France simon.marcellin@univ-lyon2.fr Djamel A. Zighed

More information

The Design and Analysis of Benchmark Experiments Part II: Analysis

The Design and Analysis of Benchmark Experiments Part II: Analysis The Design and Analysis of Benchmark Experiments Part II: Analysis Torsten Hothorn Achim Zeileis Friedrich Leisch Kurt Hornik Friedrich Alexander Universität Erlangen Nürnberg http://www.imbe.med.uni-erlangen.de/~hothorn/

More information

Variable importance measures in regression and classification methods

Variable importance measures in regression and classification methods MASTER THESIS Variable importance measures in regression and classification methods Institute for Statistics and Mathematics Vienna University of Economics and Business under the supervision of Univ.Prof.

More information

Technical Report - 7/87 AN APPLICATION OF COX REGRESSION MODEL TO THE ANALYSIS OF GROUPED PULMONARY TUBERCULOSIS SURVIVAL DATA

Technical Report - 7/87 AN APPLICATION OF COX REGRESSION MODEL TO THE ANALYSIS OF GROUPED PULMONARY TUBERCULOSIS SURVIVAL DATA Technical Report - 7/87 AN APPLICATION OF COX REGRESSION MODEL TO THE ANALYSIS OF GROUPED PULMONARY TUBERCULOSIS SURVIVAL DATA P. VENKATESAN* K. VISWANATHAN + R. PRABHAKAR* * Tuberculosis Research Centre,

More information

Model Testing for Future Reintroductions of Desert Bighorn Sheep at Capitol Reef National Park

Model Testing for Future Reintroductions of Desert Bighorn Sheep at Capitol Reef National Park University of Wyoming National Park Service Research Center Annual Report Volume 13 13th Annual Report, 1989 Article 7 1-1-1989 Model Testing for Future Reintroductions of Desert Bighorn Sheep at Capitol

More information

BAGGING PREDICTORS AND RANDOM FOREST

BAGGING PREDICTORS AND RANDOM FOREST BAGGING PREDICTORS AND RANDOM FOREST DANA KANER M.SC. SEMINAR IN STATISTICS, MAY 2017 BAGIGNG PREDICTORS / LEO BREIMAN, 1996 RANDOM FORESTS / LEO BREIMAN, 2001 THE ELEMENTS OF STATISTICAL LEARNING (CHAPTERS

More information

Machine Learning. Nathalie Villa-Vialaneix - Formation INRA, Niveau 3

Machine Learning. Nathalie Villa-Vialaneix -  Formation INRA, Niveau 3 Machine Learning Nathalie Villa-Vialaneix - nathalie.villa@univ-paris1.fr http://www.nathalievilla.org IUT STID (Carcassonne) & SAMM (Université Paris 1) Formation INRA, Niveau 3 Formation INRA (Niveau

More information

RANDOM FORESTS FOR CLASSIFICATION IN ECOLOGY

RANDOM FORESTS FOR CLASSIFICATION IN ECOLOGY Ecology, 88(11), 2007, pp. 2783 2792 Ó 2007 by the Ecological Society of America RANDOM FORESTS FOR CLASSIFICATION IN ECOLOGY D. RICHARD CUTLER, 1,7 THOMAS C. EDWARDS, JR., 2 KAREN H. BEARD, 3 ADELE CUTLER,

More information

Classification of Longitudinal Data Using Tree-Based Ensemble Methods

Classification of Longitudinal Data Using Tree-Based Ensemble Methods Classification of Longitudinal Data Using Tree-Based Ensemble Methods W. Adler, and B. Lausen 29.06.2009 Overview 1 Ensemble classification of dependent observations 2 3 4 Classification of dependent observations

More information

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH The First Step: SAMPLE SIZE DETERMINATION THE ULTIMATE GOAL The most important, ultimate step of any of clinical research is to do draw inferences;

More information

Holdout and Cross-Validation Methods Overfitting Avoidance

Holdout and Cross-Validation Methods Overfitting Avoidance Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest

More information

Generalized Additive Models

Generalized Additive Models By Trevor Hastie and R. Tibshirani Regression models play an important role in many applied settings, by enabling predictive analysis, revealing classification rules, and providing data-analytic tools

More information

RECSM Working Paper Number 54 January 2018

RECSM Working Paper Number 54 January 2018 Machine learning for propensity score matching and weighting: comparing different estimation techniques and assessing different balance diagnostics Massimo Cannas Department of Economic and Business Sciences,

More information

Personalities. Charles Darwin John Maynard Smith Alan Turing John von Neumann Sewell Wright

Personalities. Charles Darwin John Maynard Smith Alan Turing John von Neumann Sewell Wright Personalities Charles Darwin John Maynard Smith Alan Turing John von Neumann Sewell Wright Good questions Major transitions [JMSmith & ESzathmary95] Replicating molecules Chromosomes linkage among self-replicating

More information

TREE-BASED METHODS FOR SURVIVAL ANALYSIS AND HIGH-DIMENSIONAL DATA. Ruoqing Zhu

TREE-BASED METHODS FOR SURVIVAL ANALYSIS AND HIGH-DIMENSIONAL DATA. Ruoqing Zhu TREE-BASED METHODS FOR SURVIVAL ANALYSIS AND HIGH-DIMENSIONAL DATA Ruoqing Zhu A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements

More information

BINARY TREE-STRUCTURED PARTITION AND CLASSIFICATION SCHEMES

BINARY TREE-STRUCTURED PARTITION AND CLASSIFICATION SCHEMES BINARY TREE-STRUCTURED PARTITION AND CLASSIFICATION SCHEMES DAVID MCDIARMID Abstract Binary tree-structured partition and classification schemes are a class of nonparametric tree-based approaches to classification

More information

Regression techniques provide statistical analysis of relationships. Research designs may be classified as experimental or observational; regression

Regression techniques provide statistical analysis of relationships. Research designs may be classified as experimental or observational; regression LOGISTIC REGRESSION Regression techniques provide statistical analysis of relationships. Research designs may be classified as eperimental or observational; regression analyses are applicable to both types.

More information

Linear Recurrent Subsequences of Meta-Fibonacci Sequences

Linear Recurrent Subsequences of Meta-Fibonacci Sequences Linear Recurrent Subsequences of Meta-Fibonacci Sequences Nathan Fox arxiv:1508.01840v1 [math.nt] 7 Aug 2015 Abstract In a recent paper, Frank Ruskey asked whether every linear recurrent sequence can occur

More information

Multimodal Deep Learning for Predicting Survival from Breast Cancer

Multimodal Deep Learning for Predicting Survival from Breast Cancer Multimodal Deep Learning for Predicting Survival from Breast Cancer Heather Couture Deep Learning Journal Club Nov. 16, 2016 Outline Background on tumor histology & genetic data Background on survival

More information

Logic Regression. Ingo Ruczinski. Department of Biostatistics Johns Hopkins University.

Logic Regression. Ingo Ruczinski. Department of Biostatistics Johns Hopkins University. Logic Regression Ingo Ruczinski Department of Biostatistics Jons Hopkins University Email: ingo@ju.edu ttp://biosun.biostat.jsp.edu/ iruczins Wit Carles Kooperberg Micael LeBlanc, FHCRC Introduction Motivation

More information

Variable Selection in Random Forest with Application to Quantitative Structure-Activity Relationship

Variable Selection in Random Forest with Application to Quantitative Structure-Activity Relationship Variable Selection in Random Forest with Application to Quantitative Structure-Activity Relationship Vladimir Svetnik, Andy Liaw, and Christopher Tong Biometrics Research, Merck & Co., Inc. P.O. Box 2000

More information

The coxvc_1-1-1 package

The coxvc_1-1-1 package Appendix A The coxvc_1-1-1 package A.1 Introduction The coxvc_1-1-1 package is a set of functions for survival analysis that run under R2.1.1 [81]. This package contains a set of routines to fit Cox models

More information

Multivariable Fractional Polynomials

Multivariable Fractional Polynomials Multivariable Fractional Polynomials Axel Benner September 7, 2015 Contents 1 Introduction 1 2 Inventory of functions 1 3 Usage in R 2 3.1 Model selection........................................ 3 4 Example

More information

FACTORIZATION MACHINES AS A TOOL FOR HEALTHCARE CASE STUDY ON TYPE 2 DIABETES DETECTION

FACTORIZATION MACHINES AS A TOOL FOR HEALTHCARE CASE STUDY ON TYPE 2 DIABETES DETECTION SunLab Enlighten the World FACTORIZATION MACHINES AS A TOOL FOR HEALTHCARE CASE STUDY ON TYPE 2 DIABETES DETECTION Ioakeim (Kimis) Perros and Jimeng Sun perros@gatech.edu, jsun@cc.gatech.edu COMPUTATIONAL

More information

Multi-state Models: An Overview

Multi-state Models: An Overview Multi-state Models: An Overview Andrew Titman Lancaster University 14 April 2016 Overview Introduction to multi-state modelling Examples of applications Continuously observed processes Intermittently observed

More information

Censoring Unbiased Regression Trees and Ensembles

Censoring Unbiased Regression Trees and Ensembles Johns Hopkins University, Dept. of Biostatistics Working Papers 1-31-216 Censoring Unbiased Regression Trees and Ensembles Jon Arni Steingrimsson Department of Biostatistics, Johns Hopkins Bloomberg School

More information

Nonlinear Knowledge-Based Classification

Nonlinear Knowledge-Based Classification Nonlinear Knowledge-Based Classification Olvi L. Mangasarian Edward W. Wild Abstract Prior knowledge over general nonlinear sets is incorporated into nonlinear kernel classification problems as linear

More information

Statistics and learning: Big Data

Statistics and learning: Big Data Statistics and learning: Big Data Learning Decision Trees and an Introduction to Boosting Sébastien Gadat Toulouse School of Economics February 2017 S. Gadat (TSE) SAD 2013 1 / 30 Keywords Decision trees

More information

Supervised Learning! Algorithm Implementations! Inferring Rudimentary Rules and Decision Trees!

Supervised Learning! Algorithm Implementations! Inferring Rudimentary Rules and Decision Trees! Supervised Learning! Algorithm Implementations! Inferring Rudimentary Rules and Decision Trees! Summary! Input Knowledge representation! Preparing data for learning! Input: Concept, Instances, Attributes"

More information

Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Data Mining and Knowledge Discovery: Practice Notes dr. Petra Kralj Novak Petra.Kralj.Novak@ijs.si 7.11.2017 1 Course Prof. Bojan Cestnik Data preparation Prof. Nada Lavrač: Data mining overview Advanced

More information

Advanced Statistical Methods: Beyond Linear Regression

Advanced Statistical Methods: Beyond Linear Regression Advanced Statistical Methods: Beyond Linear Regression John R. Stevens Utah State University Notes 3. Statistical Methods II Mathematics Educators Worshop 28 March 2009 1 http://www.stat.usu.edu/~jrstevens/pcmi

More information

Anatrytone logan. Species Distribution Model (SDM) assessment metrics and metadata Common name: Delaware Skipper Date: 17 Nov 2017 Code: anatloga

Anatrytone logan. Species Distribution Model (SDM) assessment metrics and metadata Common name: Delaware Skipper Date: 17 Nov 2017 Code: anatloga Anatrytone logan Species Distribution Model (SDM) assessment metrics and metadata Common name: Delaware Skipper Date: 17 Nov 2017 Code: anatloga fair TSS=0.74 ability to find new sites This SDM incorporates

More information

Does Modeling Lead to More Accurate Classification?

Does Modeling Lead to More Accurate Classification? Does Modeling Lead to More Accurate Classification? A Comparison of the Efficiency of Classification Methods Yoonkyung Lee* Department of Statistics The Ohio State University *joint work with Rui Wang

More information

arxiv: v1 [stat.me] 5 Dec 2018

arxiv: v1 [stat.me] 5 Dec 2018 Joint latent class trees: A Tree-Based Approach to Joint Modeling of Time-to-event and Longitudinal Data Ningshan Zhang and Jeffrey S. Simonoff IOMS Department, Leonard N. Stern School of Business, New

More information

A Study of Relative Efficiency and Robustness of Classification Methods

A Study of Relative Efficiency and Robustness of Classification Methods A Study of Relative Efficiency and Robustness of Classification Methods Yoonkyung Lee* Department of Statistics The Ohio State University *joint work with Rui Wang April 28, 2011 Department of Statistics

More information

Defining Statistically Significant Spatial Clusters of a Target Population using a Patient-Centered Approach within a GIS

Defining Statistically Significant Spatial Clusters of a Target Population using a Patient-Centered Approach within a GIS Defining Statistically Significant Spatial Clusters of a Target Population using a Patient-Centered Approach within a GIS Efforts to Improve Quality of Care Stephen Jones, PhD Bio-statistical Research

More information

Methods for Predicting an Ordinal Response with High-Throughput Genomic Data

Methods for Predicting an Ordinal Response with High-Throughput Genomic Data Virginia Commonwealth University VCU Scholars Compass Theses and Dissertations Graduate School 2016 Methods for Predicting an Ordinal Response with High-Throughput Genomic Data Kyle L. Ferber Virginia

More information

Influence measures for CART

Influence measures for CART Jean-Michel Poggi Orsay, Paris Sud & Paris Descartes Joint work with Avner Bar-Hen Servane Gey (MAP5, Paris Descartes ) CART CART Classification And Regression Trees, Breiman et al. (1984) Learning set

More information

Ensemble Methods. Charles Sutton Data Mining and Exploration Spring Friday, 27 January 12

Ensemble Methods. Charles Sutton Data Mining and Exploration Spring Friday, 27 January 12 Ensemble Methods Charles Sutton Data Mining and Exploration Spring 2012 Bias and Variance Consider a regression problem Y = f(x)+ N(0, 2 ) With an estimate regression function ˆf, e.g., ˆf(x) =w > x Suppose

More information

Variable importance in binary regression trees and forests

Variable importance in binary regression trees and forests Electronic Journal of Statistics Vol. 1 (2007) 519 537 ISSN: 1935-7524 DOI: 10.1214/07-EJS039 Variable importance in binary regression trees and forests Hemant Ishwaran Department of Quantitative Health

More information

BET: Bayesian Ensemble Trees for Clustering and Prediction in Heterogeneous Data

BET: Bayesian Ensemble Trees for Clustering and Prediction in Heterogeneous Data BET: Bayesian Ensemble Trees for Clustering and Prediction in Heterogeneous Data arxiv:1408.4140v1 [stat.ml] 18 Aug 2014 Leo L. Duan 1, John P. Clancy 2 and Rhonda D. Szczesniak 3 4 Summary We propose

More information

Definitions and examples Simple estimation and testing Regression models Goodness of fit for the Cox model. Recap of Part 1. Per Kragh Andersen

Definitions and examples Simple estimation and testing Regression models Goodness of fit for the Cox model. Recap of Part 1. Per Kragh Andersen Recap of Part 1 Per Kragh Andersen Section of Biostatistics, University of Copenhagen DSBS Course Survival Analysis in Clinical Trials January 2018 1 / 65 Overview Definitions and examples Simple estimation

More information

Title. Citation Remote Sensing Letters, 5(2): Issue Date Doc URLhttp://hdl.handle.net/2115/ Type.

Title. Citation Remote Sensing Letters, 5(2): Issue Date Doc URLhttp://hdl.handle.net/2115/ Type. Title Random Forest classification of crop type usin Author(s) Sonobe, Rei; Tani, Hiroshi; Wang, Xiufeng; Kob Citation Remote Sensing Letters, 5(2): 157-164 Issue Date 2014-02 Doc URLhttp://hdl.handle.net/2115/57984

More information

Package penalized. February 21, 2018

Package penalized. February 21, 2018 Version 0.9-50 Date 2017-02-01 Package penalized February 21, 2018 Title L1 (Lasso and Fused Lasso) and L2 (Ridge) Penalized Estimation in GLMs and in the Cox Model Author Jelle Goeman, Rosa Meijer, Nimisha

More information

Checking model assumptions with regression diagnostics

Checking model assumptions with regression diagnostics @graemeleehickey www.glhickey.com graeme.hickey@liverpool.ac.uk Checking model assumptions with regression diagnostics Graeme L. Hickey University of Liverpool Conflicts of interest None Assistant Editor

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

Forecasting Casino Gaming Traffic with a Data Mining Alternative to Croston s Method

Forecasting Casino Gaming Traffic with a Data Mining Alternative to Croston s Method Forecasting Casino Gaming Traffic with a Data Mining Alternative to Croston s Method Barry King Abstract Other researchers have used Croston s method to forecast traffic at casino game tables. Our data

More information

First Aid Kit for Survival. Hypoxia cohort. Goal. DFS=Clinical + Marker 1/21/2015. Two analyses to exemplify some concepts of survival techniques

First Aid Kit for Survival. Hypoxia cohort. Goal. DFS=Clinical + Marker 1/21/2015. Two analyses to exemplify some concepts of survival techniques First Aid Kit for Survival Melania Pintilie pintilie@uhnres.utoronto.ca Two analyses to exemplify some concepts of survival techniques Checking linearity Checking proportionality of hazards Predicted curves:

More information

Maximally selected chi-square statistics for at least ordinal scaled variables

Maximally selected chi-square statistics for at least ordinal scaled variables Maximally selected chi-square statistics for at least ordinal scaled variables Anne-Laure Boulesteix anne-laure.boulesteix@stat.uni-muenchen.de Department of Statistics, University of Munich, Akademiestrasse

More information

Generalization to Multi-Class and Continuous Responses. STA Data Mining I

Generalization to Multi-Class and Continuous Responses. STA Data Mining I Generalization to Multi-Class and Continuous Responses STA 5703 - Data Mining I 1. Categorical Responses (a) Splitting Criterion Outline Goodness-of-split Criterion Chi-square Tests and Twoing Rule (b)

More information

Log-linearity for Cox s regression model. Thesis for the Degree Master of Science

Log-linearity for Cox s regression model. Thesis for the Degree Master of Science Log-linearity for Cox s regression model Thesis for the Degree Master of Science Zaki Amini Master s Thesis, Spring 2015 i Abstract Cox s regression model is one of the most applied methods in medical

More information

Regression tree-based diagnostics for linear multilevel models

Regression tree-based diagnostics for linear multilevel models Regression tree-based diagnostics for linear multilevel models Jeffrey S. Simonoff New York University May 11, 2011 Longitudinal and clustered data Panel or longitudinal data, in which we observe many

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 7, 04 Reading: See class website Eric Xing @ CMU, 005-04

More information