Data Dependence in Combining Classifiers


1 Data Dependence in Combining Classifiers Mohamed Kamel, Nayer Wanas, Pattern Analysis and Machine Intelligence Lab, University of Waterloo, Canada

2 Outline ! Data Dependence ! Data Dependence Architecture ! Training Algorithm

3 Pattern Recognition Systems ! Best possible classification rates. ! Increased efficiency and accuracy. Multiple Classifier Systems ! Evidence of improving performance. ! Problems decompose naturally when using various sensors. ! Avoid making commitments to arbitrary initial conditions or parameters.

4 Categorization of MCS ! Architecture ! Input/Output Mapping ! Representation ! Types of classifiers

5 Categorization of MCS (cntd) Architecture ! Parallel [Dasarathy, 94]: each input is processed by its own classifier (Input 1 → Classifier 1, ..., Input N → Classifier N), and all outputs feed a fusion stage that produces the final output. ! Serial [Dasarathy, 94]: classifiers are applied in sequence (Input → Classifier 1 → Classifier 2 → ... → Classifier N → Output).

6 Categorization of MCS (cntd) Architectures [Lam, 00] ! Conditional Topology: once a classifier is unable to classify the input, the following classifier is deployed. ! Hierarchical Topology: classifiers are applied in succession, with various levels of generalization. ! Hybrid Topology: the choice of the classifier to use is based on the input pattern (selection). ! Multiple (Parallel) Topology.

7 Categorization of MCS (cntd) Input/Output Mapping ! Linear Mapping: Sum Rule; Weighted Average [Hashem, 97]. ! Non-linear Mapping: Maximum; Product; Hierarchical Mixture of Experts [Jordan and Jacobs, 94]; Stacked Generalization [Wolpert, 92].

8 Categorization of MCS (cntd) Representation ! Similar representations: the classifiers need to be different. ! Different representations: use of different sensors, or different features extracted from the same data set [Ho, 98; Skurichina & Duin, 02].

9 Categorization of MCS (cntd) Types of Classifiers ! Specialized classifiers: encourage specialization in areas of the feature space; all classifiers must contribute to achieve a final decision; e.g., Hierarchical Mixture of Experts [Jordan and Jacobs, 94], Cooperative Modular Neural Networks [Auda and Kamel, 98]. ! Ensemble of classifiers: a set of redundant classifiers. ! Competitive versus cooperative combining [Sharkey, 99].

10 Categorization of MCS (cntd) Data Dependence ! Classifiers are inherently dependent on the data. ! Describes how the final aggregation uses the information present in the input pattern. ! Describes the relationship between the final output Q(x) and the pattern under classification x.

11 Data Dependence ! Data Independent ! Implicitly Dependent ! Explicitly Dependent

12 Data Independence Rely solely on the output of the classifiers to determine the final classification output: Q(x) = argmax_j F_j(C_j(x)) where Q(x) is the final class assigned to pattern x; C_j(x) is the vector of outputs of the various classifiers in the ensemble, {c_1j, c_2j, ..., c_Nj}, for a given class y_j; c_ij is the confidence classifier i has in pattern x belonging to class y_j; and the mapping F_j can be linear or non-linear.

13 Data Independence (cntd) Simple voting techniques are data independent: ! Average ! Maximum ! Majority These are susceptible to incorrect estimates of the confidence.
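A minimal NumPy sketch of these three data-independent rules, assuming C is the matrix of ensemble confidences c_ij for one pattern (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def combine_independent(C, rule="average"):
    """Data-independent combining: Q(x) = argmax_j F(C_j(x)).

    C is an (n_classifiers, n_classes) array with C[i, j] = c_ij, the
    confidence of classifier i that pattern x belongs to class y_j.
    A minimal sketch; names are illustrative, not from the slides.
    """
    if rule == "average":
        support = C.mean(axis=0)                 # average rule
    elif rule == "maximum":
        support = C.max(axis=0)                  # maximum rule
    elif rule == "majority":
        votes = C.argmax(axis=1)                 # each classifier's top class
        support = np.bincount(votes, minlength=C.shape[1])
    else:
        raise ValueError(f"unknown rule: {rule}")
    return int(np.argmax(support))               # final class Q(x)
```

For example, combine_independent(np.array([[0.7, 0.3], [0.4, 0.6], [0.8, 0.2]]), "majority") returns 0, since two of the three classifiers vote for class 0.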

14 Implicit Data Dependence Train the combiner on the global performance of the data: Q(x) = argmax_j F_j(W(C_j(x)), C_j(x)) where W(C_j(x)) is the weighting matrix composed of elements w_ij, and w_ij is the weight assigned to class j in classifier i.

15 Implicit Data Dependence (cntd) Implicitly data dependent approaches include: ! Weighted Average [Hashem, 97] ! Fuzzy Measures [Gader et al., 96] ! Belief Theory [Xu et al., 92] ! Behavior Knowledge Space (BKS) [Huang et al., 95] ! Decision Templates [Kuncheva et al., 01] ! Modular approaches [Auda and Kamel, 98] ! Stacked Generalization [Wolpert, 92] ! Boosting [Schapire, 90] These lack consideration for the local superiority of classifiers.
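As one concrete reading of the implicit case, the sketch below learns a fixed weight matrix W from global validation accuracy and applies a weighted-average rule. The listed methods (BKS, decision templates, boosting, ...) each define W and F differently, so this is only an assumption-laden illustration:

```python
import numpy as np

def fit_global_weights(val_outputs, y_val):
    """Learn w_ij from global performance: here, the accuracy of
    classifier i on validation samples of class j. One simple stand-in
    for W(C(x)); names and the accuracy-based choice are assumptions.

    val_outputs: (n_classifiers, n_samples, n_classes) confidences.
    y_val: (n_samples,) true labels.
    """
    n_clf, _, n_classes = val_outputs.shape
    preds = val_outputs.argmax(axis=2)            # hard decisions
    W = np.zeros((n_clf, n_classes))
    for j in range(n_classes):
        mask = y_val == j
        if mask.any():
            W[:, j] = (preds[:, mask] == j).mean(axis=1)
    return W

def combine_implicit(C, W):
    """Q(x) = argmax_j F_j(W, C_j(x)) with a weighted-average F:
    support_j = sum_i w_ij * c_ij. W is fixed per ensemble, not per
    pattern, which is what makes the dependence implicit."""
    return int(np.argmax((W * C).sum(axis=0)))
```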

16 Explicit Data Dependence Classifier selection or combining is performed based on the sub-space to which the input pattern belongs; the final classification is dependent on the pattern being classified: Q(x) = argmax_j F_j(W(x), C_j(x))

17 Explicit Data Dependence (cntd) Explicitly data dependent approaches include: ! Dynamic Classifier Selection (DCS): DCS with Local Accuracy (DCS_LA) [Woods et al., 97]; DCS based on Multiple Classifier Behavior (DCS_MCB) [Giacinto and Roli, 01] ! Hierarchical Mixture of Experts [Jordan and Jacobs, 94] ! Feature-based approach [Wanas et al., 99] The weights demonstrate dependence on the input pattern, so these methods should intuitively perform better than the others.
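A minimal sketch in the spirit of DCS with local accuracy [Woods et al., 97], not the exact published algorithm: the classifier that is most accurate on the k validation samples nearest to x decides alone, making the weights an explicit function of x. The scikit-learn-style .predict() interface and the parameter k are assumptions:

```python
import numpy as np

def dcs_la(x, classifiers, X_val, y_val, k=10):
    """Pick the classifier with the best accuracy in the local region
    of x (its k nearest validation neighbours) and let it classify x
    alone; the local accuracies play the role of the explicit W(x)."""
    dist = np.linalg.norm(X_val - x, axis=1)       # distances from x
    nn = np.argsort(dist)[:k]                      # k nearest neighbours
    local_acc = np.array([
        (clf.predict(X_val[nn]) == y_val[nn]).mean() for clf in classifiers
    ])
    best = int(local_acc.argmax())                 # locally most accurate
    return classifiers[best].predict(x[None, :])[0]
```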

18 Architectures A methodology to incorporate multiple classifiers in a dynamically adapting system, where the aggregation adapts to the behavior of the ensemble: ! Detectors generate weights for each classifier that reflect the degree of confidence in that classifier for a given input. ! A trained aggregation learns to combine the different decisions.

19 Architectures (cntd) Architecture I: N. Wanas, M. Kamel, G. Auda, and F. Karray, Decision Aggregation in Modular Neural Network Classifiers, Pattern Recognition Letters, 20(11-13), 1999.

20 Architectures (cntd) Classifiers ! Each individual classifier, C_i, produces some output representing its interpretation of the input x. ! Utilizes sub-optimal classifiers. ! The collection of classifier outputs for class y_j is represented as C_j(x). Detector ! Detector D_l is a classifier that uses the input features to extract information useful for aggregation. ! It does not aim to solve the classification problem. ! Detector output d_lg(x) is the probability that input pattern x is categorized into group g. ! The output of all the detectors is represented by D(x).

21 Architectures (cntd) Aggregation ! A fusion layer for all the classifiers. ! Trained to adapt to the behavior of the various modules. ! Explicitly data dependent: Q(x) = argmax_j F_j(D(x), C_j(x)) The weights depend on the input pattern being classified.
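A sketch of Architecture I's decision rule under stated assumptions: the detector maps the input features alone to one weight per classifier, and a weighted sum stands in for the trained fusion layer F:

```python
import numpy as np

def architecture_one(x, classifiers, detector):
    """Sketch of Architecture I: Q(x) = argmax_j F_j(D(x), C_j(x)).

    classifiers: callables mapping x to a confidence vector over classes.
    detector:    callable mapping x (input features only) to one weight
                 per classifier, D(x).
    The weighted sum stands in for the trained aggregation F of the
    slides; all names and the choice of F are assumptions.
    """
    C = np.stack([clf(x) for clf in classifiers])  # (n_classifiers, n_classes)
    d = detector(x)                                # D(x): one weight per classifier
    support = d @ C                                # fuse: sum_i d_i * c_ij
    return int(np.argmax(support))
```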

22 Architectures (cntd) Architecture II

23 Architectures (cntd) Classifiers ! Each individual classifier, C_i, produces some output representing its interpretation of the input x. ! Utilizes sub-optimal classifiers. ! The collection of classifier outputs for class y_j is represented as C_j(x). Detector ! Appends the input to the output of the classifier ensemble. ! Produces a weighting factor, w_ij, for each class in a classifier output. ! The dependence of the weights on both the classifier output and the input pattern is represented by W(x, C_j(x)).

24 Architectures (cntd) Aggregation ! A fusion layer for all the classifiers. ! Trained to adapt to the behavior of the various modules. ! Combines implicit and explicit data dependence: Q(x) = argmax_j F_j(W(x, C_j(x)), C_j(x)) The weights depend on the input pattern and on the performance of the classifiers.
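A matching sketch for Architecture II: the detector now sees the input pattern appended to the ensemble outputs, so the resulting weights w_ij carry both explicit and implicit dependence. The elementwise weighting again stands in for the trained F, and all names are assumptions:

```python
import numpy as np

def architecture_two(x, classifiers, detector):
    """Sketch of Architecture II: Q(x) = argmax_j F_j(W(x, C(x)), C(x)).

    detector: callable mapping the concatenation of x and the flattened
    ensemble outputs to a flat vector of n_classifiers * n_classes
    weights (an assumed interface, not the authors' code).
    """
    C = np.stack([clf(x) for clf in classifiers])  # (n_classifiers, n_classes)
    features = np.concatenate([x, C.ravel()])      # input appended to C(x)
    W = detector(features).reshape(C.shape)        # w_ij = W(x, C(x))
    support = (W * C).sum(axis=0)                  # sum_i w_ij * c_ij
    return int(np.argmax(support))
```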

25 ! Five one-hidden-layer BP classifiers using partially disjoint data sets. ! No optimization is performed on the trained networks. ! The same network parameters are maintained for all the classifiers trained. ! Three data sets: 20 Class Gaussian, Satimages, Clouds.

26 (cntd) Results table: classification error, mean ± std, on the 20 Class, Clouds, and Satimages data sets for the Singlenet and Oracle references; the data-independent approaches Majority, Maximum, Average, and Borda; the implicitly data dependent approaches Weighted Average, Bayesian, and Fuzzy Integral; and the explicitly data dependent Feature-based approach. Numeric entries not recoverable.

27 Training each component independently: ! Optimizing individual components may not lead to overall improvement. ! Collinearity: high correlation between classifiers. ! Components may be under-trained or over-trained.

28 (cntd) Adaptive training ! Selective: reduces correlation between components. ! Focused: re-training focuses on misclassified patterns. ! Efficient: controls the duration of training.

29 Adaptive Training: Main Loop ! Increase diversity among the ensemble. ! Incremental learning. ! Evaluation of training to determine the re-training set.

30 Adaptive Training: ! Save a classifier if it performs well on the evaluation set. ! Determine when to terminate training for each module.

31 Adaptive Training: Evaluation ! Train the aggregation modules. ! Evaluate the training sets for each classifier. ! Compose new training data.

32 Adaptive Training: Data Selection New training data are composed by concatenating: ! Error_i: the misclassified entries of the training data for classifier i. ! Correct_i: a random choice of a fraction P_ratio of the correctly classified entries of the training data for classifier i.
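The sketch below implements this data-selection step and wraps it in the main loop of the preceding slides (incremental training, checkpointing on the evaluation set, and focused re-training). The p_ratio value, the round count, and the scikit-learn-style partial_fit/predict interface are assumptions:

```python
import copy
import numpy as np

rng = np.random.default_rng(0)

def select_training_data(X, y, preds, p_ratio=0.3):
    """New training data for a classifier: Error_i (all misclassified
    entries) concatenated with a random fraction p_ratio of Correct_i.
    p_ratio's value is an assumption; the slides leave it open."""
    wrong = preds != y
    error_idx = np.flatnonzero(wrong)                      # Error_i
    correct_idx = np.flatnonzero(~wrong)                   # Correct_i
    n_keep = int(p_ratio * correct_idx.size)
    keep = rng.choice(correct_idx, size=n_keep, replace=False)
    idx = np.concatenate([error_idx, keep])
    return X[idx], y[idx]

def adaptive_training(classifiers, X, y, X_eval, y_eval, rounds=20):
    """Main-loop sketch: train each module incrementally, checkpoint it
    when it improves on the evaluation set, then re-compose its
    training data from its own mistakes (focused re-training)."""
    best = [(0.0, None)] * len(classifiers)
    data = [(X, y)] * len(classifiers)
    for _ in range(rounds):
        for i, clf in enumerate(classifiers):
            clf.partial_fit(*data[i])                      # incremental learning
            score = (clf.predict(X_eval) == y_eval).mean()
            if score > best[i][0]:                         # save if it performs
                best[i] = (score, copy.deepcopy(clf))      # well on evaluation
            preds = clf.predict(X)                         # evaluate training set
            data[i] = select_training_data(X, y, preds)    # new re-training set
    return [saved if saved is not None else clf
            for (_, saved), clf in zip(best, classifiers)]
```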

33 ! Five one-hidden-layer BP classifiers using partially disjoint data sets. ! No optimization is performed on the trained networks. ! The same network parameters are maintained for all the classifiers trained. ! Three data sets: 20 Class Gaussian, Satimages, Clouds.

34 (cntd) Results table: classification error, mean ± std, of the Singlenet, Best Classifier, and Oracle on the 20 Class, Clouds, and Satimages data sets under normal training, with the architecture trained adaptively, and with the ensemble trained adaptively using the weighted average (WA) as the evaluation function. Numeric entries not recoverable.

35 Conclusions Categorization of the various combining approaches based on data dependence: ! Independent: vulnerable to incorrect confidence estimates. ! Implicitly dependent: does not take into account the local superiority of classifiers. ! Explicitly dependent: the literature focuses on selection rather than combining.

36 (cntd) Feature-based approach ! Combines implicit and explicit data dependence. ! Uses an evolving training algorithm to enhance diversity amongst classifiers. ! Reduces harmful correlation. ! Determines the duration of training. ! Improved classification accuracy.

37 References
[Kittler et al., 98] J. Kittler, M. Hatef, R. Duin, and J. Matas, On Combining Classifiers, IEEE Trans. PAMI, 20(3), 1998.
[Dasarathy, 94] B. Dasarathy, Decision Fusion, IEEE Computer Soc. Press, 1994.
[Lam, 00] L. Lam, Classifier Combinations: Implementations and Theoretical Issues, MCS 2000, LNCS 1857, 77-86, 2000.
[Hashem, 97] S. Hashem, Algorithms for Optimal Linear Combinations of Neural Networks, Int. Conf. on Neural Networks, Vol. 1, 1997.
[Jordan and Jacobs, 94] M. Jordan and R. Jacobs, Hierarchical Mixtures of Experts and the EM Algorithm, Neural Computation, 1994.
[Wolpert, 92] D. Wolpert, Stacked Generalization, Neural Networks, Vol. 5, 1992.
[Auda and Kamel, 98] G. Auda and M. Kamel, Modular Neural Network Classifiers: A Comparative Study, J. Int. Rob. Sys., Vol. 21, 1998.
[Gader et al., 96] P. Gader, M. Mohamed, and J. Keller, Fusion of Handwritten Word Classifiers, Patt. Reco. Lett., 17(6), 1996.
[Xu et al., 92] L. Xu, A. Krzyzak, and C. Suen, Methods of Combining Multiple Classifiers and their Applications to Handwritten Recognition, IEEE Trans. Sys. Man and Cyb., 22(3), 1992.
[Kuncheva et al., 01] L. Kuncheva, J. Bezdek, and R. Duin, Decision Templates for Multiple Classifier Fusion: An Experimental Comparison, Patt. Reco., Vol. 34, 2001.
[Huang et al., 95] Y. Huang, K. Liu, and C. Suen, The Combination of Multiple Classifiers by a Neural Network Approach, J. Patt. Reco. and Art. Int., Vol. 9, 1995.
[Schapire, 90] R. Schapire, The Strength of Weak Learnability, Mach. Lear., Vol. 5, 1990.
[Giacinto and Roli, 01] G. Giacinto and F. Roli, Dynamic Classifier Selection based on Multiple Classifier Behavior, Patt. Reco., Vol. 34, 2001.
[Wanas et al., 99] N. Wanas, M. Kamel, G. Auda, and F. Karray, Decision Aggregation in Modular Neural Network Classifiers, Patt. Reco. Lett., 20(11-13), 1999.

