Study of Selective Ensemble Learning Methods Based on Support Vector Machine


Available online at www.sciencedirect.com

Physics Procedia 33 (2012) 1518-1525

2012 International Conference on Medical Physics and Biomedical Engineering

Kai Li, Zhibin Liu, Yanxia Han
School of Mathematics and Computer, Hebei University, Baoding, Hebei Province 071002, China
lika_ntu@163.com, chenglong_lzb@163.com

Published by Elsevier B.V. Selection and/or peer review under responsibility of the ICMPBE International Committee. Open access under CC BY-NC-ND license. doi:10.1016/j.phpro.2012.05.247

Abstract

Diversity among base classifiers is an important factor for improving ensemble learning performance. In this paper, we choose the support vector machine as the base classifier and study four selective ensemble learning methods: hill-climbing, ensemble forward sequential selection, ensemble backward sequential selection and clustering selection. To measure the diversity among the base classifiers, the entropy E is used. The experimental results show that the diversity measure affects ensemble performance to some extent and that the first three selective strategies have similar generalization performance. Meanwhile, when the clustering selective strategy is used, selecting different numbers of clusters in this experiment does not affect the ensemble performance, except on some data sets.

Keywords: Diversity; Selective Ensemble; Generalization Error; Support Vector Machine

1. Introduction

Diversity has been recognized as a very important characteristic for improving the generalization performance of ensemble learning, so researchers have proposed diversity measures and ensemble learning methods that use different strategies to raise the diversity among components. In generating ensemble members, selective ensemble learning is a common approach, and different selective strategies lead to different ensemble learning methods. For example, for neural networks, Giacinto applied clustering technology to select ensemble members [1]. Zhou et al. used a genetic algorithm to select ensemble members and obtained better generalization performance [2]. After that, Li et al. also applied clustering techniques and genetic algorithms to select ensemble models [3]. Moreover, we have studied selective ensemble learning based on neural networks and decision trees [4].

In the aspect of diversity, Kuncheva et al. studied the measures of diversity in classifier ensembles and their relationship with ensemble accuracy [5]. Researchers are still studying diversity measures and different ensemble learning algorithms [6-13]. Motivated by the above, this paper studies selective ensemble methods based on the support vector machine.

This paper is organized as follows. The support vector machine and the diversity measure used in this paper are summarized in Section 2, and the selective ensemble methods are introduced in Section 3. Section 4 gives the results and analysis of the experiments, and the conclusions are given in Section 5.

2. Support vector machine and diversity measure

In this section, we briefly review the support vector machine for binary classification problems. Given a dataset of labeled training points (x_1, y_1), (x_2, y_2), ..., (x_l, y_l), where (x_i, y_i) \in R^N \times \{-1, +1\}, i = 1, 2, ..., l, and the training data are linearly separable, there exists a hyperplane which correctly separates the positive and negative examples. A point x lying on the hyperplane satisfies \langle w, x \rangle + b = 0, where w is normal to the hyperplane. If the training set is linearly separable, the support vector algorithm finds the optimal separating hyperplane with the maximal margin; if the training set is only approximately separable, a trade-off parameter has to be introduced; and if the training data are not linearly separable, the SVM learning algorithm maps the input data with a nonlinear mapping x \mapsto \varphi(x) into a high-dimensional feature space Z in which the data are linearly or approximately separable. In two-class classification, all training data satisfy the following decision function:

f(x_i) = \operatorname{sgn}(\langle w, x_i \rangle + b) = \begin{cases} +1, & \text{if } y_i = +1 \\ -1, & \text{if } y_i = -1 \end{cases}    (1)

For a linearly separable training set, all training points satisfy the inequalities

\langle w, x_i \rangle + b \ge +1 \ \text{if } y_i = +1, \qquad \langle w, x_i \rangle + b \le -1 \ \text{if } y_i = -1,    (2)

which can be written as y_i(\langle w, x_i \rangle + b) \ge 1, i = 1, 2, ..., l. Finding the hyperplane is equivalent to maximizing the margin by minimizing \|w\|^2 subject to constraints (2). The primal optimization problem is

\min_{w, b} \ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \ y_i(\langle w, x_i \rangle + b) \ge 1, \ i = 1, 2, ..., l.    (3)

As solving (3) directly is rather difficult, Lagrange multipliers are introduced to transform the primal problem into its dual, which is the following quadratic programming (QP) problem:

\min_{\alpha} \ \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^{l} \alpha_i \quad \text{s.t.} \ \sum_{i=1}^{l} \alpha_i y_i = 0, \ \alpha_i \ge 0, \ i = 1, 2, ..., l.    (4)

For a nonlinear classifier, the solution in the feature space is obtained by simply replacing the dot products x_i \cdot x_j by the inner products \varphi(x_i) \cdot \varphi(x_j). The mapping function satisfies \varphi(x_i) \cdot \varphi(x_j) = k(x_i, x_j), where k is called the kernel function, so in the training algorithm we never need to know \varphi explicitly. The SVM decision function for a given test point x is obtained by computing the sign of

f(x) = \sum_{i=1}^{N_s} \alpha_i^* y_i \, \varphi(s_i) \cdot \varphi(x) + b = \sum_{i=1}^{N_s} \alpha_i^* y_i \, k(s_i, x) + b,    (5)

where the coefficients \alpha_i^* are positive, the s_i are the support vectors, and N_s is the number of support vectors.
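The paper uses many such SVMs as base classifiers, each trained on a randomly drawn data subset (see Section 3). A minimal sketch, assuming scikit-learn's SVC with an RBF kernel (the paper does not state which implementation or kernel was used):

```python
# Hedged sketch: train a pool of SVM base classifiers, each on a random data subset.
# scikit-learn's SVC and the RBF kernel are assumptions, not choices stated in the paper.
import numpy as np
from sklearn.svm import SVC

def train_svm_pool(X, y, n_models=100, subset_ratio=0.7, seed=0):
    """Train n_models SVMs, each on a randomly drawn subset of the training data."""
    rng = np.random.default_rng(seed)
    n = len(X)
    pool = []
    for _ in range(n_models):
        idx = rng.choice(n, size=int(subset_ratio * n), replace=True)
        clf = SVC(kernel="rbf", C=1.0)  # C is the soft-margin trade-off parameter of Section 2
        clf.fit(X[idx], y[idx])
        pool.append(clf)
    return pool
```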

Next, we briefly introduce the non-pairwise diversity measure used in the experiments, the entropy measure E [6], which is defined as follows:

E = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{L - \lceil L/2 \rceil} \min\{\, l(z_j), \ L - l(z_j) \,\},    (6)

where L is the number of classifiers, N is the number of instances in the data set, z_j is an instance, and l(z_j) is the number of classifiers that correctly recognize z_j. E varies between 0 and 1. In addition to this measure, we use the pairwise diversity measures of [5].

3. Algorithms of selective ensemble learning

Selective ensemble learning selects ensemble members with some selective strategy after generating many different base models; different selective strategies yield different ensemble learning algorithms. In the past, researchers mainly improved the diversity of ensemble members with feature subset methods [6-11]. In this paper, we use a data subset method and give four different selective ensemble learning approaches: hill-climbing, ensemble backward sequential selection, ensemble forward sequential selection and clustering selection. The base models are support vector machines (SVMs). The trained classifiers differ from one another because each is trained on a randomly extracted data subset, so each training run explores a different solution space. In order to measure how these differences affect classifier accuracy, we introduce a formula to study how diversity affects ensemble accuracy. It is defined as follows:

\mathrm{Fun} = \frac{acc}{allacc} + r \cdot \frac{div}{alldiv}.    (7)

In formula (7), acc is the ensemble accuracy, allacc is the accuracy of all classifiers, div is the ensemble diversity, and alldiv is the diversity of all classifiers. Accuracy is computed with the majority vote, and the entropy E is used as the diversity measure. In the following, we briefly introduce the four ensemble learning methods; see [4].

3.1 Hill-Climbing (HC) Method

The proposed hill-climbing ensemble is composed of two major phases, namely construction of an initial ensemble by randomly selecting base models, and iterative refinement of the ensemble members. The initial ensemble members are formed by random selection; the second phase aims to improve the value of the fitness function of the ensemble of classifiers. For all the learning models, an attempt is made to switch (add or delete) each model; if the result produces a larger fitness value, that change is kept. This process continues until no further improvement is possible.
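A minimal sketch of the entropy measure (6), the fitness (7) and the hill-climbing refinement, assuming the base-model predictions are stored as an (L, N) label matrix and accuracy is taken by majority vote (function and variable names are illustrative, not taken from the paper):

```python
# Hedged sketch of Eq. (6), Eq. (7) and the hill-climbing selection of Section 3.1.
import numpy as np
from scipy.stats import mode

def entropy_E(correct):
    """Eq. (6): correct is an (L, N) 0/1 matrix, correct[i, j] = 1 iff model i classifies instance j correctly."""
    L, N = correct.shape
    if L < 2:
        return 0.0
    l_z = correct.sum(axis=0)                       # l(z_j): number of models correct on instance j
    return float(np.mean(np.minimum(l_z, L - l_z) / (L - np.ceil(L / 2))))

def majority_vote_acc(preds, y):
    """Ensemble accuracy under majority vote; preds is an (L, N) matrix of predicted labels."""
    voted = mode(preds, axis=0, keepdims=False).mode
    return float(np.mean(voted == y))

def fitness(subset, preds, y, r, all_acc, all_div):
    """Eq. (7): Fun = acc/allacc + r * div/alldiv, evaluated on the chosen subset of models."""
    sub = preds[list(subset)]
    acc = majority_vote_acc(sub, y)
    div = entropy_E((sub == y).astype(int))
    return acc / all_acc + r * (div / all_div)

def hill_climb(preds, y, r, init):
    """Section 3.1: toggle each model in or out of the ensemble while the fitness improves."""
    all_acc = majority_vote_acc(preds, y)
    all_div = entropy_E((preds == y).astype(int))
    selected = set(init)
    best = fitness(selected, preds, y, r, all_acc, all_div)
    improved = True
    while improved:
        improved = False
        for i in range(len(preds)):
            trial = selected ^ {i}                  # add model i if absent, remove it if present
            if not trial:
                continue
            f = fitness(trial, preds, y, r, all_acc, all_div)
            if f > best:
                selected, best, improved = trial, f, True
    return sorted(selected)
```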

3.2 Ensemble Backward Sequential Selection (EBSS) Method

EBSS begins with all learning models and repeatedly removes the model whose removal yields the largest improvement in fitness. The cycle repeats until no further improvement is obtained.

3.3 Ensemble Forward Sequential Selection (EFSS) Method

EFSS begins with an empty set, evaluates every base model on its own, and selects the one with the best performance. It then repeatedly adds to the result set the model that yields the best performance for the ensemble of the next larger size. The cycle repeats until no further improvement is obtained.

3.4 Clustering Selection Ensemble

Clustering is an important data analysis tool through which the structure of data can be discovered. There exist many different kinds of clustering algorithms; the most common are hierarchical clustering algorithms and k-means clustering. In the following, we study the clustering of models based on these algorithms. For any two models n_m and n_n, the distance between them is defined as the entropy measure computed over just this pair of models (so L = 2 in Eq. (6)):

d(n_m, n_n) = E_{\{n_m, n_n\}} = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{L - \lceil L/2 \rceil} \min\{\, l(z_j), \ L - l(z_j) \,\}.    (8)

This distance measure aims to group the models according to diversity; that is, within the same cluster we select base models such that the value of diversity over the whole cluster is maximal. Moreover, in the hierarchical clustering algorithm, similar clusters are merged using the following distance between any two clusters E_s and E_t:

d(E_s, E_t) = \max_{n_i \in E_s, \ n_j \in E_t} \{ d(n_i, n_j) \}, \quad s \neq t.    (9)

Given a learning algorithm L, a data set S and T trained models, the selective ensemble method based on hierarchical clustering clusters the T models using the distances (8) and (9) and then selects representative models from the resulting clusters.
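A minimal sketch of this clustering-based selection, assuming SciPy's complete-linkage clustering for Eq. (9) and a simple one-model-per-cluster pick (the exact within-cluster selection rule is an assumption):

```python
# Hedged sketch of the clustering selection of Section 3.4 using Eqs. (8)-(9).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def pairwise_diversity(correct):
    """Eq. (8) for L = 2: fraction of instances on which exactly one of the two models is correct."""
    L, _ = correct.shape
    D = np.zeros((L, L))
    for m in range(L):
        for n in range(L):
            D[m, n] = np.mean(correct[m] != correct[n])
    return D

def hierarchical_selection(correct, n_clusters):
    """Cluster the models with complete linkage (Eq. (9)) and keep one model per cluster."""
    D = pairwise_diversity(correct)
    Z = linkage(squareform(D, checks=False), method="complete")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    selected = []
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        # assumed pick rule: the member with the largest mean distance to all other models
        selected.append(int(members[np.argmax(D[members].mean(axis=1))]))
    return sorted(selected)
```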

4. Experiments

4.1 Experimental data and methods

A number of ensemble techniques that solve the integration problem can improve the generalization performance of ensemble learning, and the theoretical basis of ensemble learning is the diversity of the base models. Selective integration aims to select, with some strategy, the ensemble models that have the largest diversity. In this paper these strategies are hill-climbing, ensemble forward sequential selection, ensemble backward sequential selection and clustering technology, where the clustering technology includes hierarchical clustering algorithms and k-means clustering. These methods have one point in common: many base models are first trained using decision tree, neural network and support vector machine algorithms, and the ensemble models are then constructed with the above selective ensemble methods. The data sets used are listed in Table I.

TABLE I. Features of the data sets

  Number  Data set  Number of data  Number of features  Number of classes
  1       Balance         625               5                  3
  2       Car            1728               6                  4
  3       Cmc            1473               9                  3
  4       Ecoli           336               8                  8
  5       Glass           214              11                  7
  6       Hayes           132               6                  3
  7       Iris            150               4                  3
  8       Pima            768               8                  2
  9       Wine            178              13                  3
  10      Zoo             101              18                  7

4.2 Experimental results and analysis for HC, EFSS and EBSS

First, many base models are trained with the decision tree, BP neural network and support vector machine algorithms. Then hill-climbing, ensemble forward sequential selection and ensemble backward sequential selection are used to select base models to form the ensemble members. Finally, the performance and the diversity of the integrated model are obtained using a majority vote. In order to study the effect of diversity on ensemble accuracy, accuracy and diversity are considered together, and diversity is controlled by altering the parameter r in (7), which takes the values 0, 1/5, 1/3 and 1. In Fig. 1 and Fig. 2, the classifiers are trained using the support vector machine, the ensemble members are obtained with the above three selective methods, and the ensemble accuracy is computed with the vote method. From Fig. 2, we can see that when parameter r is zero, most ensemble accuracies are higher than for the other values, while for some data sets the ensemble accuracy grows with the value of r. So diversity affects the ensemble accuracy to some extent. Overall, the three ensemble methods HC, EFSS and EBSS have similar generalization performance based on support vector machines under the vote strategy.

Fig. 1 Experimental results with the HC, EFSS and EBSS methods.
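A minimal sketch of this evaluation loop (the train/test split and the wrapper around the selection strategy are assumptions; the pool of fitted base SVMs and the selection function are taken as inputs, e.g. from the sketches above):

```python
# Hedged sketch of the evaluation in Section 4.2: for each value of r, select an ensemble
# with a given strategy on training predictions and score it on test data by majority vote.
import numpy as np
from scipy.stats import mode

def vote_accuracy(preds, y):
    """Majority-vote accuracy; preds is an (L, N) matrix of predicted labels."""
    return float(np.mean(mode(preds, axis=0, keepdims=False).mode == y))

def evaluate(pool, X_tr, y_tr, X_te, y_te, select_fn, r_values=(0, 1/5, 1/3, 1)):
    """pool is a list of fitted base classifiers; select_fn(preds, y, r) wraps one of the
    selection sketches above, e.g. hill_climb."""
    preds_tr = np.array([m.predict(X_tr) for m in pool])
    preds_te = np.array([m.predict(X_te) for m in pool])
    return {r: vote_accuracy(preds_te[select_fn(preds_tr, y_tr, r)], y_te)
            for r in r_values}
```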

Fig. 2 Experimental results with the HC, EFSS and EBSS methods for different values of r (r = 0, 1/5, 1/3 and 1; ensemble accuracy plotted for data sets 1-10).

4.3 Experimental results with clustering technology

One hundred base models are created using the support vector machine algorithm, and different clusterings are then obtained using k-means and hierarchical clustering technology. In this experiment, the number of clusters is 4, 6, 9, 15, 25, 30 and 35. Fig. 3 shows the average result of the ensemble models selected using k-means clustering technology and hierarchical clustering technology for the different numbers of clusters. Ensemble accuracy is computed by the vote method, and diversity is measured by the entropy E, the fail/non-fail measure, the double-fault measure and the plain disagreement measure. In the clustering selective technology, we select the classifiers that have the larger diversity. From Fig. 3, we can see that there is no clearly regular pattern of change, but using the clustering strategy can yield better ensemble performance.

Fig. 3 Experimental results with the clustering methods.
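A minimal sketch of the k-means variant of this experiment (clustering the models by their 0/1 correctness vectors with scikit-learn's KMeans is an assumption; the paper does not state how the models are represented for k-means, nor the within-cluster pick rule):

```python
# Hedged sketch of the k-means selection of Section 4.3: cluster the base models by
# their correctness vectors and keep one representative per cluster.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_selection(correct, n_clusters, seed=0):
    """correct is an (L, N) 0/1 correctness matrix of the base models."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(correct)
    # pairwise diversity of Eq. (8): fraction of instances where exactly one of two models is correct
    D = (correct[:, None, :] != correct[None, :, :]).mean(axis=2)
    selected = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        if len(members):
            # assumed pick rule: the member with the largest mean diversity to all models
            selected.append(int(members[np.argmax(D[members].mean(axis=1))]))
    return sorted(selected)

# The experiment repeats this for cluster counts 4, 6, 9, 15, 25, 30 and 35 and scores
# each selected ensemble by majority vote, as in Section 4.2.
```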

4.4 Analysis of experimental results with generalization error

We consider the generalization error of ensemble learning when combining the outputs of a number of different classifiers; with regard to the generalization error, it has been studied by Martin [14]. In the experiment, we first select some base models using clustering technology, including k-means and hierarchical clustering, and then use these base models to estimate the generalization error on test data. We regard the data set as a sample s = ((x_1, b_1), ..., (x_m, b_m)) \in Z^m with b_i \in \{0, 1\}, and suppose that the selected base models are \{C_1, C_2, ..., C_L\}. For a test sample x_i we compute

Count_i = |\{\, C_j : C_j(x_i) \neq b_i, \ j = 1, 2, ..., L \,\}|,

and set

er_i = 1 if Count_i > L/2, and er_i = 0 if Count_i \le L/2,

so that er_i indicates that the majority vote of the selected models misclassifies x_i. Finally, all er_i are summed, er_count = \sum_{i=1}^{m} er_i, and the generalization error is

error = er_count / m.

The experimental results are shown in Fig. 4. From these results, we can see that the selective strategies can reduce the ensemble generalization error and thus improve the ensemble generalization performance.

Fig. 4 Experimental results with the clustering methods.
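A minimal sketch of this estimate (a direct transcription of the counting rule above; variable names are illustrative):

```python
# Hedged sketch of the generalization-error estimate of Section 4.4:
# er_i = 1 when more than half of the selected models misclassify x_i, error = sum(er_i) / m.
import numpy as np

def generalization_error(selected_preds, b):
    """selected_preds is an (L, m) matrix of predictions of the selected models C_1..C_L
    on the m test samples; b is the vector of true labels in {0, 1}."""
    L, m = selected_preds.shape
    count = (selected_preds != b).sum(axis=0)   # Count_i: number of models that misclassify x_i
    er = (count > L / 2).astype(int)            # er_i
    return er.sum() / m
```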

5. Conclusions

This paper has primarily studied selective ensemble methods and the diversity of ensemble models. We first introduced the diversity measures, including pairwise and non-pairwise measures: the pairwise measures are the fail/non-fail, double-fault and plain disagreement measures, and the non-pairwise measure is the entropy measure E, which is used as the diversity measure in this paper. We then studied four selective ensemble technologies, namely hill-climbing, ensemble forward sequential selection, ensemble backward sequential selection and clustering technologies, the latter including hierarchical clustering algorithms and k-means clustering. From the analysis of hill-climbing, ensemble forward sequential selection and ensemble backward sequential selection, we can see that using a selection strategy achieves a certain performance advantage for the integrated ensemble.

6. Acknowledgment

The authors would like to thank the Hebei Natural Science Foundation for its financial support (No. F2009000236).

References

[1] Giacinto, G., Roli, F. Design of effective neural network ensembles for image classification purposes. Image and Vision Computing, 2001, 19(9/10): 699-707.
[2] Zhi-Hua Zhou, Jianxin Wu, Wei Tang. Ensembling neural networks: Many could be better than all. Artificial Intelligence, 2002, 137(1/2): 239-263.
[3] Li Guo-zheng, Yang Jie, et al. Clustering algorithms based selective ensemble. Journal of Fudan University (Natural Science), 2004, 43(5): 689-691.
[4] Li Kai, Han Yanxia. Study of selective ensemble learning method and its diversity based on decision tree and neural network. 2010 Chinese Control and Decision Conference (CCDC 2010), pp. 1310-1315.
[5] L. I. Kuncheva, C. J. Whitaker. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning, 2003, 51: 181-207.
[6] A. Tsymbal, M. Pechenizkiy, P. Cunningham. Diversity in search strategies for ensemble feature selection. Information Fusion, 2005, 6: 83-98.
[7] A. Tsymbal, S. Puuronen, D. Patterson. Ensemble feature selection with the simple Bayesian classification. Information Fusion, 2003, 4(2): 87-100.
[8] P. Delimata, Z. Suraj. Feature selection algorithm for multiple classifier systems: a hybrid approach. Fundamenta Informaticae, 2008, 85: 97-110.
[9] E. M. Dos Santos, R. Sabourin, P. Maupin. A dynamic overproduce-and-choose strategy for the selection of classifier ensembles. Pattern Recognition, 2008, 41: 2993-3009.
[10] E. M. Dos Santos, R. Sabourin, P. Maupin. Overfitting cautious selection of classifier ensembles with genetic algorithms. Information Fusion, 2009, 10: 150-162.
[11] G. Martinez-Munoz, A. Suarez. Using boosting to prune bagging ensembles. Pattern Recognition Letters, 2007, 28(1): 156-165.
[12] I. Partalas, G. Tsoumakas, I. Vlahavas. Pruning an ensemble of classifiers via reinforcement learning. Neurocomputing, 2009, 72(7-9): 1900-1909.
[13] D. Skalak. The sources of increased accuracy for two proposed boosting algorithms. American Association for Artificial Intelligence, AAAI-96, 1996.
[14] A. Martin. On the generalization error of fixed combinations of classifiers. Journal of Computer and System Sciences, 2007, 73: 725-734.