Ensemble Based on Data Envelopment Analysis
So Young Sohn & Hong Choi
Department of Computer Science & Industrial Systems Engineering, Yonsei University, Seoul, Korea

Abstract. There has been much research to evaluate the efficiency of various data fusion/ensemble approaches. However, when combining individual classifiers for fusion or ensemble purposes, typically only the misclassification rate has been considered as a performance measure. This can be risky, especially when the class distribution is skewed or when the costs associated with Type I and Type II errors differ significantly from each other. In such situations, additional performance measures such as sensitivity, specificity, and false negative or false positive errors need to be considered. In this paper, we propose to use DEA to find the weights attached to the multi-attribute performance of each classifier as an element of a data ensemble algorithm. The algorithm is expected to serve general classification purposes.

1. Introduction

Data mining is the process of extracting valid, previously unknown, and ultimately comprehensible information from large databases and using it to make crucial business decisions. Effectively mining the necessary information and knowledge from a large database has been recognized as a key research topic by many practitioners in the field of data-based marketing. Algorithms often used for data mining fall into one of the following areas: artificial neural networks, machine learning, and classical statistical models. It has been reported that the classification accuracy of an individual algorithm can be improved by combining the results of several classifiers. Data fusion techniques combine classification results obtained from several single classifiers and are known to improve classification accuracy when the results of relatively uncorrelated classifiers are combined.
Data ensemble combines the results obtained from a single classifier fitted repeatedly to several bootstrap resamples. The resulting performance is known to be more stable than that of a single classifier. There has been much research to evaluate the efficiency of various data fusion/ensemble approaches. However, when combining individual classifiers for fusion or ensemble purposes, typically only one attribute, the misclassification rate, has been considered as a performance measure. This can be risky, especially when the class distribution is skewed or when the costs associated with Type I and Type II errors differ significantly from each other. In such situations, additional performance measures such as sensitivity, specificity, and false
negative or false positive errors need to be considered. The problem then becomes a multi-attribute decision-making problem: finding the class of a given case based on different weights on various performance measures. The subsequent question is how to find such weights. In this paper, we propose to use DEA to find the weights attached to the multi-attribute performance of each classifier as an element of a data ensemble algorithm. Data Envelopment Analysis (DEA) has frequently been applied to assess the efficiency of several decision making units (DMUs) which have multiple inputs as well as outputs. By way of DEA, one can find the efficiency score of each DMU and identify the set of efficient DMUs based on the set of non-dominated solutions. In addition, DEA provides inefficient DMUs with a benchmarking point and has advantages over alternative parametric approaches such as regression or ratio analysis [2].

The organization of this paper is as follows. In section 2, we briefly summarize the established data ensemble techniques along with related literature. In section 3, we introduce the ensemble based on DEA. In section 4, we illustrate the proposed method using road traffic accident data. In section 5, we summarize our findings.

2. Literature Review

We first review the data ensemble literature. Ensemble algorithms can be divided into two types: those that adaptively change the distribution of the bootstrap training set based on the performance of previous classifiers, as in Boosting or Arcing (Adaptive Resampling and Combining), and those that do not, as in Bagging (Bootstrap AGGregatING). The Bagging algorithm introduced by Breiman [1] votes classifiers generated by different bootstrap samples. A bootstrap sample is generated by uniformly sampling N instances from the training set with replacement. The detailed procedure is as follows:

Step 1. Suppose f_m is a classifier producing a vector output with a single 1 (one) in the position of the predicted class and 0 (zero) elsewhere, at the input point x.
Step 2. To bag f, draw M bootstrap samples T_m = (t_1, t_2, ..., t_N), each of size N, with replacement from the training data, m = 1, ..., M.
Step 3. Classify the input point x to the class k with the largest vote in f_bagging^k, computed as

    f_bagging^k = (1/M) * sum_{m=1}^{M} f_m^k                                   (1)
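As an illustration (not part of the paper), the bagging procedure in Steps 1–3 and Eq. (1) can be sketched in Python; the decision-stump base learner and the two-class 0/1 setting are assumptions made only to keep the example runnable.

```python
import numpy as np

rng = np.random.default_rng(0)

def stump_fit(X, y):
    # Pick the (feature, threshold, sign) rule with the lowest training error.
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = (sign * (X[:, j] - t) > 0).astype(int)
                err = np.mean(pred != y)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best[1:]

def stump_predict(model, X):
    j, t, sign = model
    return (sign * (X[:, j] - t) > 0).astype(int)

def bag(X, y, M=10):
    # Step 2: draw M bootstrap samples of size N with replacement,
    # and fit one classifier per sample.
    N = len(y)
    return [stump_fit(X[idx], y[idx])
            for idx in (rng.integers(0, N, N) for _ in range(M))]

def bagged_vote(models, X):
    # Eq. (1): average the classifiers' 0/1 votes and pick the majority class.
    votes = np.mean([stump_predict(f, X) for f in models], axis=0)
    return (votes > 0.5).astype(int)
```

On a toy data set where the class is simply x_0 > 0, the bagged vote recovers the rule almost exactly, since the bootstrap classifiers disagree only near the decision boundary.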
The basic idea of Bagging is to reduce the variation among several classifiers by voting over the results obtained from bootstrap resamples. Arcing (Adaptive Resampling and Combining) is designed exclusively for classification problems; it was developed by Freund and Schapire [3] under the name boosting, but Breiman renamed it arcing. Its basic idea is much like that of bagging, which tries to reduce the variation among several classifiers by voting over the results obtained from bootstrap resamples. Arcing, however, takes unequal-probability bootstrap samples: the probability of a training example being sampled is not uniform, but depends on the training errors of the previous predictors. In general, the classification procedure of arcing (AdaBoost.M1) can be summarized as follows:

Step 1. Sample with replacement from T with probabilities P_m(i) (where P_1(i) = 1/N, i = 1, 2, ..., N, and m = 1, ..., M) and construct the classifier f_m using the resampled set T_m of size N.
Step 2. Classify T using f_m and let d(i) = 1 if example i is incorrectly classified, else d(i) = 0.
Step 3. Calculate epsilon_m and beta_m as follows:

    epsilon_m = sum_{i=1}^{N} P_m(i) d(i),   beta_m = (1 - epsilon_m) / epsilon_m       (2)

Step 4. Update the probabilities P_{m+1}(i) using the following formula:

    P_{m+1}(i) = P_m(i) beta_m^{d(i)} / sum_{i=1}^{N} P_m(i) beta_m^{d(i)}              (3)

Step 5. Let m = m + 1 and go to Step 1 if m < M.
Step 6. Take a weighted vote of the classifications, with weights log(beta_m).

Quinlan [7] reported results of applying both bagging and boosting with decision trees (C4.5) on 8 data sets. Although boosting generally increases accuracy more than bagging does, it also produces severe degradation on some data sets. The author's further experiments showed that such deterioration in the general performance of boosting results from over-fitting after a large number of trials, which allows the composite classifier to become very complex. Instead of using a fixed weight for the vote of each classifier, Quinlan suggested letting the voting weight of each classifier vary in response to the confidence with which the instance is classified.
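The arcing procedure in Steps 1–6 can be sketched as follows (again an illustration, not the paper's implementation; a one-dimensional threshold classifier stands in for the base learner, and the clipping of epsilon is a numerical safeguard I have added):

```python
import numpy as np

rng = np.random.default_rng(1)

def stump_fit(x, y):
    # Best single-threshold 0/1 rule on a 1-D feature.
    best = None
    for t in np.unique(x):
        for sign in (1, -1):
            pred = (sign * (x - t) > 0).astype(int)
            err = np.mean(pred != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best[1:]

def stump_predict(model, x):
    t, sign = model
    return (sign * (x - t) > 0).astype(int)

def arc(x, y, M=10):
    N = len(y)
    P = np.full(N, 1.0 / N)                       # Step 1: P_1(i) = 1/N
    models, alphas = [], []
    for _ in range(M):
        idx = rng.choice(N, size=N, p=P)          # resample T_m with probabilities P_m
        f = stump_fit(x[idx], y[idx])
        d = (stump_predict(f, x) != y).astype(int)  # Step 2: d(i) = 1 if misclassified
        eps = np.clip(np.sum(P * d), 1e-10, 1 - 1e-10)  # Step 3, guarded against 0/1
        beta = (1 - eps) / eps
        P = P * beta ** d                         # Step 4: upweight the errors
        P = P / P.sum()
        models.append(f)
        alphas.append(np.log(beta))               # Step 6 weights: log(beta_m)
    return models, alphas

def arc_predict(models, alphas, x):
    # Weighted vote: map 0/1 votes to -1/+1, sum with weights log(beta_m),
    # and return class 1 when the weighted vote is positive.
    score = sum(a * (2 * stump_predict(f, x) - 1) for f, a in zip(models, alphas))
    return (score > 0).astype(int)
```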
Trials over a diverse collection of data sets under this suggestion reduced the downside in classification accuracy and also led to slightly better results on most of the data sets considered. Opitz and Maclin [6] presented an empirical evaluation of Bagging and Boosting as methods for creating ensembles of neural networks and decision trees on 4 data sets. The authors found that Bagging is appropriate for most problems, but that, when properly applied, Boosting may produce even larger gains in accuracy. Their results
also showed that the advantages and disadvantages of both Bagging and Boosting depend on the domain to which they are applied rather than on the type of classifier. Hansen [4] compared five meta machine learning methods which employ neural networks as ensemble members: three ensemble methods (Simple, Bagging, and AdaBoost) and two mixture-of-experts methods (XuME and DynCo). The empirical results showed that the cooperative error function of DynCo is superior to the competitive error functions of the others. Kohavi and Wolpert [5] proposed a bias and variance decomposition for the misclassification error when there are only two class levels. The authors showed how estimating the terms in the decomposition using frequency counts leads to biased estimates, and explained how to obtain unbiased estimators that overcome major shortcomings such as potentially negative variance estimates. They then gave some examples of the bias-variance tradeoff using two machine learning algorithms applied to data sets available in the UC Irvine repository.

3. Ensemble Based on DEA

When multiple classifiers are obtained, we suggest combining their results using weights that reflect multi-attribute performance measures such as sensitivity, specificity, and the bias and variance of the misclassification rate, defined as follows:

    Sensitivity = (Number of observations that predict the event 1 correctly)
                  / (Number of observations that represent the event 1)          (4)

    Specificity = (Number of observations that predict the event 0 correctly)
                  / (Number of observations that represent the event 0)          (5)

    bias^2 = (1/2) * sum_{y in Y} [P(Y_F = y | x) - P(Y_H = y | x)]^2            (6)

    variance = (1/2) * [1 - sum_{y in Y} P(Y_H = y | x)^2]                       (7)

where P(Y_F = y | x) is the probability that the outcome of a given case with input x is y, while P(Y_H = y | x) is the probability that the outcome of a given case with input x is classified as y. As a means to obtain an individual weight for each performance measure, DEA is proposed.
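The four measures in Eqs. (4)–(7) can be computed directly; the sketch below is illustrative (the array layout of the probability tables is my assumption), with the bias and variance terms averaged over the cases:

```python
import numpy as np

def sensitivity(y_true, y_pred):
    # Eq. (4): fraction of actual 1s that are predicted as 1.
    pos = y_true == 1
    return np.mean(y_pred[pos] == 1)

def specificity(y_true, y_pred):
    # Eq. (5): fraction of actual 0s that are predicted as 0.
    neg = y_true == 0
    return np.mean(y_pred[neg] == 0)

def kw_bias2_variance(p_true, p_hat):
    # Eqs. (6)-(7), averaged over cases:
    # p_true[i, y] = P(Y_F = y | x_i), p_hat[i, y] = P(Y_H = y | x_i).
    bias2 = 0.5 * np.mean(np.sum((p_true - p_hat) ** 2, axis=1))
    variance = 0.5 * np.mean(1.0 - np.sum(p_hat ** 2, axis=1))
    return bias2, variance
```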
Data Envelopment Analysis (DEA) has frequently been applied to assess the efficiency of several decision making units (DMUs) which have multiple inputs as well as outputs. In our case, each individual classifier is treated as a DMU, with the four performance measures of the classifier as its outputs and a constant input. By way of DEA, one can find the efficiency score of each DMU and identify the set of efficient DMUs based on the set of non-dominated solutions.
In addition, DEA provides inefficient DMUs with benchmarking points and has advantages over alternative parametric approaches such as regression or ratio analysis [2]. We propose the DEA ensemble as follows:

Step 1. Choose the machine learning algorithm to be used as a classifier.
Step 2. Generate a training set T_m by sampling from T with replacement, where the probability of sampling training case i is P(i).
Step 3. Construct the classifier f_m using the resampled set T_m of size N.
Step 4. Evaluate the classifier f_m in terms of the multi-attribute outputs y_rm, where y_rm denotes the rth output of the mth classifier. In this study we consider four attributes: sensitivity (y_1), specificity (y_2), the squared bias of misclassification (y_3), and the variance of misclassification (y_4). Then, for each unit o, we find the best weights v_r that maximize the weighted output by solving the following mathematical programming model:

    max   h_o = sum_{r=1}^{4} v_r y_ro
    s.t.  sum_{r=1}^{4} v_r y_rm <= 1   for m = 1, ..., M                        (8)
          v_r > 0,  r = 1, 2, 3, 4

Each DMU o is assigned the highest efficiency score that the constraints allow from the available data by choosing appropriate virtual multipliers (weights) for the outputs. Let h_o* denote the optimal value of h_o, where 0 <= h_o* <= 1. Unit o is efficient relative to the other units considered if and only if h_o* = 1 and the complementary slackness conditions of linear programming are met. On the other hand, if h_o* < 1, the unit is considered inefficient: it could not achieve a higher rating relative to the reference set to which it is compared.

Step 5. Once the set of efficiency scores of the M classifiers, h = (h_1, ..., h_M), is found,
normalize it to h_hat = (h_hat_1, h_hat_2, ..., h_hat_M) with h_hat_m = h_m / sum_{m=1}^{M} h_m, so that sum_{m=1}^{M} h_hat_m = 1, and take a weighted vote of the classifications with weights h_hat_m:

    f^k = sum_{m=1}^{M} h_hat_m f_m^k                                            (9)

4. Numerical Example

In this section, we apply the proposed DEA ensemble algorithm to actual data for illustration. Sohn and Shin [9] used individual algorithms such as neural networks and decision trees to classify the severity of road accidents that occurred in Seoul, Korea in 1996. The input variables used for classification of the two levels of severity (bodily injury and property damage) are road width, shape of car body, accident category, speed before the accident, violent driving, and protective device. The detailed levels of these input variables are displayed in Table 1. These variables were selected using a decision tree, and all turned out to have better explanatory power than variables representing weather conditions. A sample of 564 accidents was taken; 60% of them were used for training, while the rest were used for validation. The correct classification rates obtained by the two classification models were not significantly different. To increase the classification performance, we use the DEA ensemble introduced in the previous section.

Table 1. Input Variable Description (severity levels: Death, Major Injury, Minor Injury, Injury Report, Property Damage; accident category includes Car Alone)
(Table 1, continued: Accident Type levels Man to Car and Car to Car; Velocity levels Below 20k and Below 10k; Road Width; Car Shape levels Service Area, Over 6, Unknown, and Bus)

Figure 1. Decision tree based on the first bootstrap-resampled training data set.

We generate 10 bootstrap resamples and fit a decision tree to each resample. Figure 1 shows one fitted tree. Each fitted tree classifier is evaluated in terms of the sensitivity, specificity, and bias and variance of its misclassification rate. The obtained values are used as the outputs for DEA. The values for all 10 fitted tree classifiers are displayed in Table 2. We use SAS/OR [8] to solve the mathematical program for DEA, and the efficiency score obtained for each classifier is summarized in Table 3. Classifiers 2, 5, 6, 8, and 9 turn out to be efficient, and accordingly their weights are higher than those of the remaining classifiers. We then apply these weights to the results of the individual classifiers to obtain the DEA ensemble outcome. Finally, the DEA ensemble is compared with a single-tree result with respect to multiple performance measures.
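The efficiency-score computation of model (8) can be sketched as a small linear program per classifier. This is an illustration only: the paper solves the model with SAS/OR, and here SciPy's linprog is assumed as a stand-in solver, with a small lower bound eps enforcing v_r > 0.

```python
import numpy as np
from scipy.optimize import linprog

def dea_scores(Y, eps=1e-6):
    # Y[m, r] = rth output measure of the mth classifier (DMU); constant input.
    # Model (8): for each unit o, maximize sum_r v_r * Y[o, r]
    # subject to sum_r v_r * Y[m, r] <= 1 for every m, and v_r >= eps > 0.
    M, R = Y.shape
    scores = np.empty(M)
    for o in range(M):
        res = linprog(c=-Y[o],                 # linprog minimizes, so negate
                      A_ub=Y, b_ub=np.ones(M), # one <= 1 constraint per DMU
                      bounds=[(eps, None)] * R,
                      method="highs")
        scores[o] = -res.fun                   # h_o* in [0, 1]
    return scores
```

A DMU whose output vector is dominated by another DMU's can never reach a score of 1, which is exactly the non-dominated-set interpretation given above.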
Table 2. Performance Measures of Each Bootstrap Classifier

Table 3. Efficiency Score and Weight for Each Classifier

The DEA ensemble appears to be better in terms of sensitivity and variance. In our case, there was not much variation among the ten efficiency scores, and therefore the weights were almost even. Note that when the weights are all the same, the DEA ensemble is equivalent to Bagging.

Table 4. Comparison of the Classification Results between the DEA Ensemble and a Decision Tree
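The final combination step, Step 5 with the normalization and the weighted vote of Eq. (9), can be sketched as follows for the two-class case (the 0/1 vote layout and the 0.5 decision threshold are assumptions for illustration):

```python
import numpy as np

def dea_ensemble_vote(scores, votes):
    # Step 5 / Eq. (9): normalize the M efficiency scores to weights that sum
    # to one, then take a weighted vote of the M classifiers' 0/1 predictions.
    # scores has shape (M,); votes has shape (M, n_cases).
    w = np.asarray(scores, dtype=float)
    w = w / w.sum()                       # h_hat_m = h_m / sum_m h_m
    combined = w @ np.asarray(votes)      # weighted vote for class 1, per case
    return (combined > 0.5).astype(int)   # class with the larger weighted vote
```

When all efficiency scores are equal, the weights reduce to 1/M and this is exactly Bagging's majority vote, matching the remark above.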
5. Conclusion

In this paper, we propose the DEA ensemble, which reflects more than one performance measure in voting. It is a generalized version of Bagging and is expected to be most useful when there is much variation in the efficiency scores of the individual classifiers. One drawback of the proposed DEA ensemble, shared with other ensembles, is that the combined rule cannot be stated explicitly. The DEA ensemble concept can be extended to Arcing by reflecting multiple performance measures when assigning new selection probabilities. This is left for further study.

References

[1] Breiman, L., Bagging, Boosting, and C4.5, ftp://ftp.stat.berkeley.edu/pub/users/breiman, 1996.
[2] Charnes, A., Cooper, W. W., and Rhodes, E., Measuring the efficiency of decision making units, European Journal of Operational Research, 2(6):429-444, 1978.
[3] Freund, Y. and Schapire, R. E., A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, 55:119-139, 1995.
[4] Hansen, J. V., Combining predictors: comparison of five meta machine learning methods, Information Sciences, 119:91-105, 1999.
[5] Kohavi, R. and Wolpert, D. H., Bias plus variance decomposition for zero-one loss functions, Proceedings of the Thirteenth International Conference on Machine Learning, 1996.
[6] Opitz, D. W. and Maclin, R. F., An empirical evaluation of bagging and boosting for artificial neural networks, International Conference on Neural Networks, 3:1401-1405, 1997.
[7] Quinlan, J. R., Bagging, Boosting, and C4.5, Proceedings of the Thirteenth National Conference on Artificial Intelligence, 1996.
[8] SAS/OR Manual, SAS Institute, Cary, NC, USA, 1992.
[9] Sohn, S. Y. and Shin, H. W., Pattern recognition for road traffic accident severity in Korea, Ergonomics, 44(1):107-117, 2001.
Support Vector Machines Machine Learning Series Jerry Jeychandra Bloh Lab Outline Main goal: To understand how support vector achines (SVMs) perfor optial classification for labelled data sets, also a
More informationThe Weierstrass Approximation Theorem
36 The Weierstrass Approxiation Theore Recall that the fundaental idea underlying the construction of the real nubers is approxiation by the sipler rational nubers. Firstly, nubers are often deterined
More informationSoft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis
Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES
More informationExperimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis
City University of New York (CUNY) CUNY Acadeic Works International Conference on Hydroinforatics 8-1-2014 Experiental Design For Model Discriination And Precise Paraeter Estiation In WDS Analysis Giovanna
More informationare equal to zero, where, q = p 1. For each gene j, the pairwise null and alternative hypotheses are,
Page of 8 Suppleentary Materials: A ultiple testing procedure for ulti-diensional pairwise coparisons with application to gene expression studies Anjana Grandhi, Wenge Guo, Shyaal D. Peddada S Notations
More informationEstimating Parameters for a Gaussian pdf
Pattern Recognition and achine Learning Jaes L. Crowley ENSIAG 3 IS First Seester 00/0 Lesson 5 7 Noveber 00 Contents Estiating Paraeters for a Gaussian pdf Notation... The Pattern Recognition Proble...3
More informationTracking using CONDENSATION: Conditional Density Propagation
Tracking using CONDENSATION: Conditional Density Propagation Goal Model-based visual tracking in dense clutter at near video frae rates M. Isard and A. Blake, CONDENSATION Conditional density propagation
More informationSoft-margin SVM can address linearly separable problems with outliers
Non-linear Support Vector Machines Non-linearly separable probles Hard-argin SVM can address linearly separable probles Soft-argin SVM can address linearly separable probles with outliers Non-linearly
More information3.3 Variational Characterization of Singular Values
3.3. Variational Characterization of Singular Values 61 3.3 Variational Characterization of Singular Values Since the singular values are square roots of the eigenvalues of the Heritian atrices A A and
More informationPh 20.3 Numerical Solution of Ordinary Differential Equations
Ph 20.3 Nuerical Solution of Ordinary Differential Equations Due: Week 5 -v20170314- This Assignent So far, your assignents have tried to failiarize you with the hardware and software in the Physics Coputing
More information1 Proof of learning bounds
COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #4 Scribe: Akshay Mittal February 13, 2013 1 Proof of learning bounds For intuition of the following theore, suppose there exists a
More informationElliptic Curve Scalar Point Multiplication Algorithm Using Radix-4 Booth s Algorithm
Elliptic Curve Scalar Multiplication Algorith Using Radix-4 Booth s Algorith Elliptic Curve Scalar Multiplication Algorith Using Radix-4 Booth s Algorith Sangook Moon, Non-eber ABSTRACT The ain back-bone
More informationGeneralized Augmentation for Control of the k-familywise Error Rate
International Journal of Statistics in Medical Research, 2012, 1, 113-119 113 Generalized Augentation for Control of the k-failywise Error Rate Alessio Farcoeni* Departent of Public Health and Infectious
More informationCh 12: Variations on Backpropagation
Ch 2: Variations on Backpropagation The basic backpropagation algorith is too slow for ost practical applications. It ay take days or weeks of coputer tie. We deonstrate why the backpropagation algorith
More informationPattern Classification using Simplified Neural Networks with Pruning Algorithm
Pattern Classification using Siplified Neural Networks with Pruning Algorith S. M. Karuzzaan 1 Ahed Ryadh Hasan 2 Abstract: In recent years, any neural network odels have been proposed for pattern classification,
More informationProbability Distributions
Probability Distributions In Chapter, we ephasized the central role played by probability theory in the solution of pattern recognition probles. We turn now to an exploration of soe particular exaples
More informationCompression and Predictive Distributions for Large Alphabet i.i.d and Markov models
2014 IEEE International Syposiu on Inforation Theory Copression and Predictive Distributions for Large Alphabet i.i.d and Markov odels Xiao Yang Departent of Statistics Yale University New Haven, CT, 06511
More informationNOTES AND CORRESPONDENCE. Two Extra Components in the Brier Score Decomposition
752 W E A T H E R A N D F O R E C A S T I N G VOLUME 23 NOTES AND CORRESPONDENCE Two Extra Coponents in the Brier Score Decoposition D. B. STEPHENSON School of Engineering, Coputing, and Matheatics, University
More informationFairness via priority scheduling
Fairness via priority scheduling Veeraruna Kavitha, N Heachandra and Debayan Das IEOR, IIT Bobay, Mubai, 400076, India vavitha,nh,debayan}@iitbacin Abstract In the context of ulti-agent resource allocation
More informationarxiv: v3 [cs.ds] 22 Mar 2016
A Shifting Bloo Filter Fraewor for Set Queries arxiv:1510.03019v3 [cs.ds] Mar 01 ABSTRACT Tong Yang Peing University, China yangtongeail@gail.co Yuanun Zhong Nanjing University, China un@sail.nju.edu.cn
More informationREDUCTION OF FINITE ELEMENT MODELS BY PARAMETER IDENTIFICATION
ISSN 139 14X INFORMATION TECHNOLOGY AND CONTROL, 008, Vol.37, No.3 REDUCTION OF FINITE ELEMENT MODELS BY PARAMETER IDENTIFICATION Riantas Barauskas, Vidantas Riavičius Departent of Syste Analysis, Kaunas
More information1 Generalization bounds based on Rademacher complexity
COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #0 Scribe: Suqi Liu March 07, 08 Last tie we started proving this very general result about how quickly the epirical average converges
More informationBest Procedures For Sample-Free Item Analysis
Best Procedures For Saple-Free Ite Analysis Benjain D. Wright University of Chicago Graha A. Douglas University of Western Australia Wright s (1969) widely used "unconditional" procedure for Rasch saple-free
More informationIdentical Maximum Likelihood State Estimation Based on Incremental Finite Mixture Model in PHD Filter
Identical Maxiu Lielihood State Estiation Based on Increental Finite Mixture Model in PHD Filter Gang Wu Eail: xjtuwugang@gail.co Jing Liu Eail: elelj20080730@ail.xjtu.edu.cn Chongzhao Han Eail: czhan@ail.xjtu.edu.cn
More informationPAC-Bayes Analysis Of Maximum Entropy Learning
PAC-Bayes Analysis Of Maxiu Entropy Learning John Shawe-Taylor and David R. Hardoon Centre for Coputational Statistics and Machine Learning Departent of Coputer Science University College London, UK, WC1E
More informationOn Constant Power Water-filling
On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives
More informationWhen Short Runs Beat Long Runs
When Short Runs Beat Long Runs Sean Luke George Mason University http://www.cs.gu.edu/ sean/ Abstract What will yield the best results: doing one run n generations long or doing runs n/ generations long
More informationW-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS
W-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS. Introduction When it coes to applying econoetric odels to analyze georeferenced data, researchers are well
More informationPattern Recognition and Machine Learning. Artificial Neural networks
Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016/2017 Lessons 9 11 Jan 2017 Outline Artificial Neural networks Notation...2 Convolutional Neural Networks...3
More informationDepartment of Electronic and Optical Engineering, Ordnance Engineering College, Shijiazhuang, , China
6th International Conference on Machinery, Materials, Environent, Biotechnology and Coputer (MMEBC 06) Solving Multi-Sensor Multi-Target Assignent Proble Based on Copositive Cobat Efficiency and QPSO Algorith
More informationStochastic Subgradient Methods
Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods
More informationMulti-view Discriminative Manifold Embedding for Pattern Classification
Multi-view Discriinative Manifold Ebedding for Pattern Classification X. Wang Departen of Inforation Zhenghzou 450053, China Y. Guo Departent of Digestive Zhengzhou 450053, China Z. Wang Henan University
More informationLONG-TERM PREDICTIVE VALUE INTERVAL WITH THE FUZZY TIME SERIES
Journal of Marine Science and Technology, Vol 19, No 5, pp 509-513 (2011) 509 LONG-TERM PREDICTIVE VALUE INTERVAL WITH THE FUZZY TIME SERIES Ming-Tao Chou* Key words: fuzzy tie series, fuzzy forecasting,
More informationIAENG International Journal of Computer Science, 42:2, IJCS_42_2_06. Approximation Capabilities of Interpretable Fuzzy Inference Systems
IAENG International Journal of Coputer Science, 4:, IJCS_4 6 Approxiation Capabilities of Interpretable Fuzzy Inference Systes Hirofui Miyajia, Noritaka Shigei, and Hiroi Miyajia 3 Abstract Many studies
More informationDeep Boosting. Abstract. 1. Introduction
Corinna Cortes Google Research, 8th Avenue, New York, NY Mehryar Mohri Courant Institute and Google Research, 25 Mercer Street, New York, NY 2 Uar Syed Google Research, 8th Avenue, New York, NY Abstract
More informationUsing a De-Convolution Window for Operating Modal Analysis
Using a De-Convolution Window for Operating Modal Analysis Brian Schwarz Vibrant Technology, Inc. Scotts Valley, CA Mark Richardson Vibrant Technology, Inc. Scotts Valley, CA Abstract Operating Modal Analysis
More informationFast Structural Similarity Search of Noncoding RNAs Based on Matched Filtering of Stem Patterns
Fast Structural Siilarity Search of Noncoding RNs Based on Matched Filtering of Ste Patterns Byung-Jun Yoon Dept. of Electrical Engineering alifornia Institute of Technology Pasadena, 91125, S Eail: bjyoon@caltech.edu
More informationUnderstanding Machine Learning Solution Manual
Understanding Machine Learning Solution Manual Written by Alon Gonen Edited by Dana Rubinstein Noveber 17, 2014 2 Gentle Start 1. Given S = ((x i, y i )), define the ultivariate polynoial p S (x) = i []:y
More information