Analyzing dynamic ensemble selection techniques using dissimilarity analysis

1 Analyzing dynamic ensemble selection techniques using dissimilarity analysis. George D. C. Cavalcanti, Centro de Informática, Universidade Federal de Pernambuco (UFPE), Brazil. gdcc@cin.ufpe.br

2 Overview: 1. Introduction; 2. Objective; 3. Meta-Learning for Dynamic Ensemble Selection; 4. Experimental Study; 5. Conclusion.

3 Introduction. There is no clear guideline to choose a good learning method.

4 Introduction. There is no clear guideline to choose a good learning method. Selecting the best current classifier can lead to the choice of the worst classifier for future data.

5 Introduction. There is no clear guideline to choose a good learning method. Selecting the best current classifier can lead to the choice of the worst classifier for future data. No free lunch theorem: no dominant classifier exists for all data distributions, and the data distribution of the task at hand is usually unknown.

6 Combination of Classifiers (Introduction). Figure: a query x_q is presented to a pool of classifiers L_1, L_2, ..., L_m, whose outputs are merged by a combiner to produce the decision. Combination of classifiers consists of combining the opinions of an ensemble of classifiers in the hope that the new opinion will be better than the individual ones. Vox populi, vox Dei.
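
To make the fusion idea concrete, here is a minimal sketch (not from the slides) of a majority-vote combiner over a pool of already-fitted classifiers; `pool` and `X` are assumed placeholders, and labels are assumed to be non-negative integers.

```python
import numpy as np

def majority_vote(pool, X):
    """Combine a pool of fitted classifiers by plurality voting.
    `pool` is a list of fitted scikit-learn-style estimators, `X` a 2-D array."""
    votes = np.array([clf.predict(X) for clf in pool])   # shape: (n_classifiers, n_samples)
    # For each sample, pick the label that received the most votes.
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```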

7 Combination of Classifiers (Introduction). Is it worth combining classifiers? Three reasons to combine classifiers (following Dietterich): Statistical, Computational, Representational.

8 Statistical (or worst case) motivation (Introduction). Two options:

9 Statistical (or worst case) motivation (Introduction). Two options: 1. Pick any classifier: risk of making a bad choice.

10 Statistical (or worst case) motivation (Introduction). Two options: 1. Pick any classifier: risk of making a bad choice. 2. Average them: no guarantee of performing better than the single best classifier, but probably little interference from bad classifiers. Averaging several classifiers avoids the worst one.

11 Computational motivation (Introduction). Different algorithms lead to different local optima: training algorithms such as hill-climbing and random search leave each classifier D_i at a different point of the error surface, close to the optimal classifier D. Aggregation may lead to a classifier that is a better approximation of D than any single D_i.

12 Representational (or best case) motivation (Introduction). The optimal classifier D might not be in the classifier space. Example: the classifier space contains only linear classifiers; even so, an ensemble of linear classifiers can approximate any decision boundary with any predefined accuracy. The best classifier may lie outside the classifier space.

13 Multiple Classifier System: competition results. Netflix Prize; KDD Cup; Gödel Prize 2003: the AdaBoost algorithm; ImageNet.

14 Multiple Classifier System: Fusion versus Selection. Figure: a query x_q, a pool of classifiers L_1, ..., L_m, a combiner, and the final decision.

15-16 Multiple Classifier System: two main phases, Generation and Combination. Figure: the pool of classifiers L_1, ..., L_m feeds the combiner, which outputs the decision for the query x_q.

17 Dynamic Ensemble Selection (DES) (Introduction). DES techniques assume that each base classifier is a local expert and measure the level of competence of the classifiers.

18-23 Dynamic Ensemble Selection (DES) (Introduction): illustrative figures only.

24 Dynamic Ensemble Selection (DES): general architecture. Figure: for a query x_q, the dynamic selection step picks a subset L' of the pool of classifiers L = {L_1, ..., L_m}; the combiner then merges the outputs of L' to produce the decision.
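
A hedged sketch of this selection-then-combination loop: given per-classifier competence estimates for the query (however they are computed), keep the classifiers judged competent and let only that subset L' vote. The threshold of 0.5 is an illustrative choice, not taken from the slides.

```python
import numpy as np

def des_predict(pool, competences, x_q, threshold=0.5):
    """competences[i] is the estimated competence of pool[i] for the query x_q.
    Classifiers below the threshold are discarded; the surviving subset votes."""
    selected = [clf for clf, c in zip(pool, competences) if c >= threshold]
    if not selected:                      # degenerate case: fall back to the whole pool
        selected = list(pool)
    votes = [clf.predict(x_q.reshape(1, -1))[0] for clf in selected]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]
```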

25 DES Criterion (Introduction). Given a test pattern x_q: select the most competent classifiers to predict the label of x_q. How to estimate the competence level of the base classifiers? The region of competence of x_q is defined by its neighbours.

26-28 DES Criterion: Region of Competence. Figures: the region of competence of x_q for k = 1, k = 3 and k = 5 nearest neighbours.

29 DES Criterion: the OLA algorithm, an example. Three classifiers: {C_1, C_2, C_3}. Figure: for the k = 5 neighbours of x_q, green means that C_i correctly classified the neighbour and orange means that it did not. The resulting competences are C_1: 2/5 = 0.4, C_2: 4/5 = 0.8, C_3: 3/5 = 0.6.
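
A sketch of the OLA criterion under the same setup: the competence of each classifier is its accuracy over the k nearest neighbours of x_q in a validation (DSEL) set. Variable names are illustrative; with the counts from the example above (2/5, 4/5, 3/5) it would return [0.4, 0.8, 0.6].

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def ola_competence(pool, X_dsel, y_dsel, x_q, k=5):
    """Overall Local Accuracy: accuracy of each base classifier over the
    region of competence, i.e. the k nearest neighbours of x_q in DSEL."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_dsel)
    _, idx = nn.kneighbors(x_q.reshape(1, -1))
    roc_X, roc_y = X_dsel[idx[0]], y_dsel[idx[0]]
    return np.array([np.mean(clf.predict(roc_X) == roc_y) for clf in pool])
```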

30 Oracle [Kuncheva, PAMI 2002]. An ideal DES technique that always selects the classifier that predicts the correct label for x_q, and rejects otherwise. Oracle definition: δ_{i,j} = 1 if c_i correctly classifies x_j; δ_{i,j} = 0 otherwise.
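
The definition translates directly into a competence matrix. Below is a small sketch (placeholder names) that also reports the Oracle accuracy, i.e. the fraction of samples for which at least one classifier in the pool is correct.

```python
import numpy as np

def oracle_matrix(pool, X, y):
    """delta[i, j] = 1 if classifier c_i correctly classifies sample x_j, else 0."""
    delta = np.array([(clf.predict(X) == y).astype(int) for clf in pool])
    # The Oracle is correct on x_j whenever some classifier is correct on it,
    # so its accuracy is an upper bound for any selection scheme over this pool.
    oracle_accuracy = delta.max(axis=0).mean()
    return delta, oracle_accuracy
```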

31 Objective. Compare the criteria used to estimate the level of competence of classifiers, using dissimilarity representation.

32 Objective. Compare the criteria used to estimate the level of competence of classifiers, using dissimilarity representation. The purpose of the dissimilarity analysis is twofold: understand the relationship between different DES criteria, and determine which DES criterion behaves most similarly to the Oracle.

33 Objective. Compare the criteria used to estimate the level of competence of classifiers, using dissimilarity representation. The purpose of the dissimilarity analysis is twofold: understand the relationship between different DES criteria, and determine which DES criterion behaves most similarly to the Oracle. Hypothesis: techniques closer to the Oracle in the dissimilarity space achieve higher recognition accuracy.

34 Dynamic Ensemble Selection: Literature.
OLA: Overall Local Accuracy (Woods et al., PAMI 1997)
LCA: Local Classifier Accuracy (Woods et al., PAMI 1997)
MCB: Multiple Classifier Behaviour (Giacinto, 2001)
MLA: Modified Local Accuracy (Smits, 2002)
KNORA: K-Nearest Oracles Eliminate (Ko et al., Pattern Recognition 2008)
KNOP: K-Nearest Output Profiles (Cavalin et al., Pattern Recognition 2013)
Meta-DES: On Meta-Learning for Dynamic Ensemble Selection (Cruz et al., ICPR 2014)

35 Meta-DES: Meta-Features. Multiple criteria are used to estimate the competence of base classifiers; these criteria are encoded as meta-features (MF). RoC: region of competence.
f_1: local accuracy in the RoC (paradigm: classifier accuracy over a local region)
f_2: extent of consensus in the RoC (paradigm: classifier consensus)
f_3: overall accuracy in the RoC (paradigm: classifier accuracy over a local region)
f_4: accuracy in the decision space (paradigm: output profiles)
f_5: degree of confidence for the input sample (paradigm: classifier confidence)
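
A simplified sketch of how some of these meta-features could be computed for one (classifier, query) pair; it covers rough analogues of f_1, f_3 and f_5 only, the base classifier is assumed to expose predict_proba, and the exact definitions in the Meta-DES paper may differ.

```python
import numpy as np

def meta_features(clf, x_q, roc_X, roc_y):
    """roc_X, roc_y: the region of competence of x_q (its k nearest neighbours).
    Returns per-neighbour hits (f1-like), overall local accuracy (f3-like)
    and the classifier's confidence on x_q itself (f5-like)."""
    hits = (clf.predict(roc_X) == roc_y).astype(float)        # f1: hard hits in the RoC
    local_acc = hits.mean()                                   # f3: overall accuracy in the RoC
    confidence = clf.predict_proba(x_q.reshape(1, -1)).max()  # f5: degree of confidence
    return np.concatenate([hits, [local_acc, confidence]])
```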

36 Meta-DES: Meta-Classifier. A meta-classifier λ is trained to distinguish between competent and not competent classifiers. For each pair (classifier c_i, query pattern x_q), the meta-features f_1, ..., f_5 are fed to λ, whose output is Yes (competent) or No (not competent).
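
A hedged sketch of the meta-training step: one meta-feature vector per (classifier, sample) pair, labelled competent (1) when that classifier gets the sample right. The choice of Gaussian Naive Bayes for λ and the `extract` routine (e.g. a closure around a region-of-competence lookup) are assumptions for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def train_meta_classifier(pool, X_meta, y_meta, extract):
    """`extract(clf, x)` returns the meta-feature vector v_{i,j} for a
    (classifier, sample) pair; lambda learns to map it to Yes/No (1/0)."""
    V, labels = [], []
    for clf in pool:
        preds = clf.predict(X_meta)
        for x, pred, y in zip(X_meta, preds, y_meta):
            V.append(extract(clf, x))
            labels.append(int(pred == y))        # 1 = competent, 0 = not competent
    return GaussianNB().fit(np.array(V), np.array(labels))
```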

37 Meta-DES architecture. Figure: three phases. (3.2.1) Overproduction: the classifier generation process builds the pool C = {c_1, ..., c_m} from the training data. (3.2.2) Meta-Training: sample selection, extraction of the meta-feature vectors v_{i,j}, and training of the selector λ. (3.2.3) Generalization: meta-features are extracted for the test sample, the selector performs dynamic selection, and the selected classifiers are combined by majority vote. Reference: Rafael M. O. Cruz, Robert Sabourin and George D. C. Cavalcanti. On Meta-Learning for Dynamic Ensemble Selection. International Conference on Pattern Recognition (ICPR), 2014.

38 Experimental protocol. Pool of classifiers: 10 Perceptrons (generated using Bagging). Number of replications: 20. Data split into Training, Meta-training, Selection, and Test sets. Size of the region of competence: 7. Datasets (source): Pima (UCI), Liver Disorders (UCI), Breast (WDBC) (UCI), Vehicle (UCI), Blood Transfusion (UCI), Sonar (UCI), Ionosphere (UCI), Wine (UCI), Haberman (UCI), Banana (PRTOOLS), Lithuanian (PRTOOLS).
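
The pool described in the protocol can be approximated with scikit-learn as below; the dataset is a stand-in for the UCI/PRTOOLS sets, and the parameter is named `base_estimator` in older scikit-learn releases.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)        # stand-in for one of the listed datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Pool of 10 Perceptrons generated with Bagging, as in the experimental protocol.
bagging = BaggingClassifier(estimator=Perceptron(), n_estimators=10, random_state=0)
bagging.fit(X_train, y_train)
pool = bagging.estimators_                         # base classifiers handed to the DES step
```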

39 Dissimilarity Analysis. The first step is to compute the dissimilarity matrix D. D is an 8 × 8 symmetric matrix in which d_{A,B} is the dissimilarity between two DES techniques A and B. The dissimilarity d_{A,B} is computed as

d_{A,B} = \frac{1}{NM} \sum_{j=1}^{N} \sum_{i=1}^{M} \left( \delta^{A}_{i,j} - \delta^{B}_{i,j} \right)^{2}

where δ^A_{i,j} is the level of competence assigned by technique A, δ^B_{i,j} is the level of competence assigned by technique B, N is the size of the validation dataset, and M is the size of the pool of classifiers.
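
A direct sketch of the metric: given the competence matrices δ^A and δ^B of two techniques over the same validation set and pool (shape M × N, classifiers by samples), d_{A,B} is simply the mean squared difference; a helper then assembles the full matrix D for a set of techniques. Names are placeholders.

```python
import numpy as np

def dissimilarity(delta_a, delta_b):
    """d_{A,B} = (1/(N*M)) * sum_{j,i} (delta^A_{i,j} - delta^B_{i,j})**2,
    with delta arrays of shape (M, N): M classifiers, N validation samples."""
    return float(np.mean((delta_a - delta_b) ** 2))

def dissimilarity_matrix(deltas):
    """deltas: dict mapping technique name -> competence matrix.
    Returns the technique names and the symmetric matrix D."""
    names = list(deltas)
    D = np.zeros((len(names), len(names)))
    for a, name_a in enumerate(names):
        for b, name_b in enumerate(names):
            D[a, b] = dissimilarity(deltas[name_a], deltas[name_b])
    return names, D
```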

40 Dissimilarity Matrix. For each dataset, we compute a dissimilarity matrix: D_Pima, D_Liver, ..., D_Lithuanian. The average dissimilarity matrix D is computed as the mean of the per-dataset matrices: D = mean(D_Pima, D_Liver, ..., D_Lithuanian). Entries are mean (standard deviation):

                Meta-Learning  KNORA        MCB          LCA          OLA          MLA          KNOP         Oracle
Meta-Learning   0              0.36(0.06)   0.46(0.15)   0.40(0.07)   0.36(0.06)   0.40(0.04)   0.53(0.08)   0.54(0.03)
KNORA           0.36(0.06)     0            0.89(0.06)   0.42(0.01)   0.44(0.01)   0.71(0.04)   0.74(0.11)   0.68(0.01)
MCB             0.46(0.15)     0.89(0.06)   0            0.58(0.01)   0.89(0.06)   1.06(0.07)   0.75(0.03)   0.72(0.08)
LCA             0.40(0.07)     0.42(0.01)   0.58(0.01)   0            0.42(0.01)   0.45(0.02)   0.31(0.04)   0.60(0.06)
OLA             0.36(0.06)     0.44(0.01)   0.89(0.06)   0.42(0.01)   0            0.71(0.04)   0.74(0.11)   0.68(0.11)
MLA             0.40(0.04)     0.71(0.04)   1.06(0.07)   0.45(0.02)   0.71(0.04)   0            0.54(0.01)   0.63(0.07)
KNOP            0.53(0.08)     0.74(0.11)   0.75(0.03)   0.31(0.04)   0.74(0.11)   0.54(0.01)   0            0.86(0.12)
Oracle          0.54(0.03)     0.68(0.01)   0.72(0.08)   0.60(0.06)   0.68(0.11)   0.63(0.07)   0.86(0.12)   0

41 Dissimilarity Matrix (the same average matrix D as on the previous slide). How to show this matrix D in a 2D plot?

42 Classifier Projection Space (CPS) [Pekalska et al., 2002]. The CPS is an R^n space in which each DES technique is represented as a point and the Euclidean distance between two techniques equals their dissimilarity. A 2-dimensional projection is obtained with non-linear multidimensional scaling (Sammon mapping).
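
A sketch of the projection step. scikit-learn does not ship a Sammon mapping, so metric MDS on the precomputed dissimilarity matrix is used here as a stand-in; the resulting picture is qualitatively similar but not identical to the slides' Sammon-based CPS.

```python
import matplotlib.pyplot as plt
from sklearn.manifold import MDS

def plot_cps(names, D):
    """Embed the techniques in 2-D so that inter-point distances approximate
    the entries of the (precomputed) dissimilarity matrix D."""
    coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(D)
    plt.scatter(coords[:, 0], coords[:, 1])
    for (x, y), name in zip(coords, names):
        plt.annotate(name, (x, y))
    plt.title("Classifier Projection Space (approximate)")
    plt.show()
```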

43 Classifier Projection Space (CPS). Figure: CPS for the average dissimilarity matrix D, showing the positions of MCB, KNOP, MLA, OLA, LCA, KNORA, Meta-Learning, and the Oracle. Reference: Rafael M. O. Cruz, George D. C. Cavalcanti, Tsang Ing Ren and Robert Sabourin. Feature representation selection based on Classifier Projection Space and Oracle analysis. Expert Systems with Applications, v. 40.

44 Average Distance to the Oracle. For each classification problem, the dissimilarity between each DES technique and the Oracle (d_{A,Oracle}): mean and standard deviation.

Database            Meta-Learning  KNORA-E      MCB          LCA          OLA          MLA          KNOP
Pima                0.32(0.04)     0.43(0.01)   0.47(0.08)   0.36(0.06)   0.43(0.01)   0.44(0.07)   0.41(0.02)
Liver Disorders     0.50(0.04)     0.61(0.01)   0.67(0.008)  0.56(0.06)   0.61(0.01)   0.60(0.07)   0.51(0.02)
Breast Cancer       0.59(0.35)     1.22(0.10)   1.20(0.10)   0.69(0.01)   1.20(0.10)   0.77(0.03)   1.20(0.10)
Blood Transfusion   0.33(0.03)     0.40(0.01)   0.46(0.01)   0.36(0.003)  0.40(0.01)   0.44(0.08)   0.40(0.01)
Banana              0.33(0.10)     0.29(0.01)   0.36(0.01)   0.24(0.01)   0.29(0.01)   0.36(0.01)   0.34(0.01)
Vehicle             0.36(0.07)     0.49(0.01)   0.48(0.02)   0.36(0.04)   0.49(0.01)   0.37(0.05)   0.47(0.02)
Lithuanian Classes  0.47(0.14)     0.49(0.02)   0.56(0.02)   0.39(0.04)   0.49(0.02)   0.54(0.01)   0.51(0.03)
Sonar               0.58(0.10)     0.91(0.04)   0.88(0.01)   0.70(0.01)   0.91(0.04)   0.85(0.02)   0.84(0.06)
Ionosphere          0.62(0.22)     0.89(0.05)   0.88(0.06)   0.70(0.07)   0.89(0.05)   0.68(0.02)   0.88(0.06)
Wine                1.03(0.20)     0.88(0.11)   0.98(0.11)   0.73(0.02)   0.88(0.11)   0.93(0.06)   0.82(0.14)
Haberman            0.79(0.04)     0.89(0.05)   1.01(0.05)   0.82(0.02)   0.89(0.05)   0.92(0.04)   0.86(0.06)
Mean                0.54(0.05)     0.68(0.01)   0.72(0.08)   0.60(0.06)   0.68(0.11)   0.63(0.07)   0.86(0.12)

45 Average Distance to the Oracle (same table as the previous slide). The meta-learning framework is closer to the Oracle for the majority of the datasets, followed by the LCA technique. Reference: Rafael M. O. Cruz, Robert Sabourin and George D. C. Cavalcanti. Analyzing Dynamic Ensemble Selection Techniques Using Dissimilarity Analysis. Workshop on Artificial Neural Networks in Pattern Recognition, v. 8774.

46 Comparative Results. Accuracy rate: mean and standard deviation.

Database            Meta-Learning  KNORA-E      MCB          LCA          OLA          MLA          KNOP          Oracle
Pima                77.74(2.34)    73.16(1.86)  73.05(2.21)  72.86(2.98)  73.14(2.56)  73.96(2.31)  73.42(2.11)   95.10(1.19)
Liver Disorders     (5.57)         63.86(3.28)  63.19(2.39)  62.24(4.01)  62.05(3.27)  57.10(3.29)  65.23(2.29)   90.07(2.41)
Breast Cancer       97.41(1.07)    96.93(1.10)  96.83(1.35)  97.15(1.58)  96.85(1.32)  96.66(1.34)  95.42(0.89)   99.13(0.52)
Blood Transfusion   79.14(1.88)    74.59(2.62)  72.59(3.20)  72.20(2.87)  72.33(2.36)  70.17(3.05)  77.54(2.03)   94.20(2.08)
Banana              90.16(2.09)    88.83(1.67)  88.17(3.37)  89.28(1.89)  89.40(2.15)  80.83(6.15)  85.73(10.65)  94.75(2.09)
Vehicle             82.50(2.07)    81.19(1.54)  80.20(4.05)  80.33(1.84)  81.50(3.24)  71.15(3.50)  80.09(1.47)   96.80(0.94)
Lithuanian Classes  90.26(2.78)    88.83(2.50)  89.17(2.30)  88.10(2.20)  87.95(1.85)  77.67(3.20)  89.33(2.29)   (0.57)
Sonar               79.72(1.86)    74.95(2.79)  75.20(3.35)  76.51(2.06)  74.52(1.54)  74.85(1.34)  75.72(2.82)   94.46(1.63)
Ionosphere          89.31(0.95)    87.37(3.07)  85.71(2.12)  86.56(1.98)  86.56(1.98)  87.35(1.34)  85.71(5.52)   96.20(1.72)
Wine                96.94(4.08)    95.00(1.53)  95.55(2.30)  95.85(2.25)  96.16(3.02)  96.66(3.36)  95.00(4.14)   (0.21)
Haberman            76.71(3.52)    71.23(4.16)  72.86(3.65)  70.16(3.56)  72.26(4.17)  65.01(3.20)  75.00(3.40)   97.36(3.34)

The best results are in bold; results that are significantly better (p < 0.05) are underlined. The accuracy of the proposed meta-learning framework is statistically superior in 8 out of 11 datasets.

47 Conclusion. We conducted a study of the dissimilarity between different DES techniques using the Classifier Projection Space (CPS).

48 Conclusion. We conducted a study of the dissimilarity between different DES techniques using the Classifier Projection Space (CPS). Techniques that use the same kind of information, such as LCA, OLA and MLA, are likely to present similar results.

49 Conclusion. We conducted a study of the dissimilarity between different DES techniques using the Classifier Projection Space (CPS). Techniques that use the same kind of information, such as LCA, OLA and MLA, are likely to present similar results. The combination of multiple criteria using meta-learning achieves a result close to the Oracle in the dissimilarity space, and also achieves higher recognition rates.

50 Analyzing dynamic ensemble selection techniques using dissimilarity analysis. George D. C. Cavalcanti, Centro de Informática, Universidade Federal de Pernambuco (UFPE), Brazil. gdcc@cin.ufpe.br
