An Improved Support Vector Machine Using Class-Median Vectors *


Zhenzhen Kou, Jianhua Xu, Xuegong Zhang and Liang Ji
State Key Laboratory of Intelligent Technology and Systems, Department of Automation, Tsinghua University, Beijing 100084, P.R.C.
zhenzhenkou00@mails.tsinghua.edu.cn

Abstract

Support vector machine builds the final discriminant function on only a small part of the training samples, which may make the decision rule too sensitive to noise and outliers. Inspired by the idea of the central support vector machine, or CSVM, we present an improved method based on the class median, called the Median Support Vector Machine, or MSVM, in this paper. The experiment results show that MSVM is a promising and robust algorithm, especially when the outliers are far from the class center.

Keywords: Pattern Classification, Support Vector Machine, Median Support Vector Machine

* This work is supported by the National Natural Science Foundation of China (Project number 69885004).

1 Introduction

Support vector machine, or SVM, is a new pattern recognition technique developed by Dr. Vapnik and his co-researchers [1-3]. The basic idea of SVM is to design a linear classifier with maximal classification margin while minimizing the training error. Maximizing the margin plays the role of capacity control, so that the learning machine will not only have small empirical risk but also hold good generalization ability [1-3]. Usually the final discriminant function of SVM depends on only part of the training samples, which are called support vectors. This property makes the final decision function sensitive to certain specific samples in the set. Thus the decision function obtained by SVM may be badly contorted if the samples are polluted by noise and outliers and some of these specific samples are unfortunately taken as support vectors, which is often true in practical applications. This problem has been attracting more and more attention [4-6]. In [5] Zhang proposed a modified version named the central support vector machine, or CSVM, which not only takes advantage of capacity control but also introduces the class center to overcome this weakness of SVM. It was proved effective and promising in some practical cases [5-6]. However, CSVM may not always work.
In some cases, especially when the outliers stand far from the class center, the mean vector will be pulled away from the center by these outliers. But the class median is a more robust representation of the class than the class center. In this paper we present a method which follows the important ideas of SVM and CSVM but tries to prevent the classifier from becoming too sensitive to outliers, by building the classifier on both the class-median vectors and the support vectors. The method is referred to as the Median Support Vector Machine, or MSVM. Experiments on toy data show its usefulness and its advantages over SVM, and also over CSVM in certain cases.

The remaining part of the paper is arranged in the following way. The idea and algorithm of MSVM are presented in sections 2 and 3. Section 4 discusses MSVM in non-separable cases and section 5 applies the idea of kernels to make MSVM nonlinear. A simplified implementation of MSVM is given in section 6 to utilize previous algorithms for SVM. Section 7 illustrates some experiment results to evaluate MSVM. More discussions and conclusions are given in section 8.

2 Median Support Vector Machine

In this paper the class median is defined as a vector each of whose components is the median of the corresponding component of all samples. In practice, if the number of samples is odd, the components of the class median are the middle values; otherwise its components are the averages of the two middle values. When outliers occur in the sample set, they change the median little but affect the class center severely, as shown in Fig. 1.
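The robustness of the component-wise median can be checked numerically. The sketch below uses toy data of our own (not the paper's) to compare how far a single distant outlier moves the class center versus the class median:

```python
import numpy as np

# Five clean samples of one class, plus one outlier far from the class center.
clean = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.0], [0.9, 0.8]])
polluted = np.vstack([clean, [[10.0, 10.0]]])

# Component-wise mean (class center) and median (class median), as defined above;
# with an even sample count, np.median averages the two middle values.
center_shift = np.linalg.norm(polluted.mean(axis=0) - clean.mean(axis=0))
median_shift = np.linalg.norm(np.median(polluted, axis=0) - np.median(clean, axis=0))

print(center_shift)  # about 2.13: the center is dragged toward the outlier
print(median_shift)  # 0.05: the median barely moves
```

A single outlier moves the center by a distance comparable to the class spread, while the median shifts only by half the gap between the two middle samples.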

If there exists no outlier, the central vector is close to the median vector. However, when an outlier occurs, the central vector is pulled far from the true central vector while the median vector changes only a little. So it can be concluded that the class median is more robust than the class center, especially when the outliers are located far from the class center.

Fig. 1: Diagram of center and median

Like SVM and CSVM, MSVM is to find an optimal separating hyperplane

(w · x) + b = 0    (1)

according to the sample set

(x_i, y_i), i = 1, ..., n, x_i ∈ R^d, y_i ∈ {+1, -1}    (2)

The decision function is then

f(x) = sgn{(w · x) + b}    (3)

For the sake of simplicity, we first consider linearly separable cases. According to [1-3], for the linear classifier to be optimal, all samples in (2) should be correctly classified by (3) and the separation margin should be maximized. The former requirement can be written as:

y_i[(w · x_i) + b] ≥ ξ > 0, i = 1, ..., n    (4)

where ξ > 0 is a slack variable that controls the training errors. This requirement guarantees that the empirical risk can be minimized. The latter requirement performs the role of capacity control, which makes the learning machine generalize well [1-3].

In SVM, the separation margin is defined by the distances from the nearest samples to the classification boundary. Actually it is this definition of the separation margin that decides the famous property that the trained classifier depends on only a small part of the training samples. To avoid this, we define another kind of margin, namely the distances from the two class medians to the separation boundary. Figure 2 illustrates the basic idea of MSVM. Denote the median of class +1 as x⁺ and the median of class -1 as x⁻. The distances from the class medians to the classification boundary are:

d⁺ = y⁺((w · x⁺) + b) / ||w||    (5a)
d⁻ = y⁻((w · x⁻) + b) / ||w||    (5b)

where y⁺ = +1, y⁻ = -1 are the class labels of x⁺ and x⁻. Then we can formulate our version of the margin as:

d = d⁺ + d⁻ = (w · (x⁺ - x⁻)) / ||w||    (6)

Fig. 2: Median separation margin

We call the margin defined by (6) the median separation margin, and call the classifier that maximizes this margin the Median Support Vector Machine, or MSVM. It is not easy to maximize (6) directly, so following the scheme used in standard SVM, we normalize the numerator of the median separation margin d to 1, i.e.

(w · (x⁺ - x⁻)) = 1    (7)

The reason why we can do this is that the magnitude of w can be scaled arbitrarily without affecting the classification result.
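Equations (5)-(6) are straightforward to evaluate for a candidate hyperplane. A minimal sketch (function name and example data our own):

```python
import numpy as np

def median_margin(w, b, med_pos, med_neg):
    """Median separation margin of eqs. (5)-(6):
    d = d+ + d- = (w . (x+ - x-)) / ||w||; the offset b cancels in the sum."""
    w = np.asarray(w, dtype=float)
    d_pos = (np.dot(w, med_pos) + b) / np.linalg.norm(w)   # eq. (5a), y+ = +1
    d_neg = -(np.dot(w, med_neg) + b) / np.linalg.norm(w)  # eq. (5b), y- = -1
    return d_pos + d_neg

# Hyperplane x1 = 0.5 with class medians at (2, 0) and (-1, 0): margin is 3.
print(median_margin([1.0, 0.0], -0.5, [2.0, 0.0], [-1.0, 0.0]))  # 3.0
```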
Thus we construct the optimization problem of our MSVM method as

min ψ(w) = (1/2)||w||² = (1/2)(w · w)    (8)

subject to constraints (4) and (7). This form is similar to the primal problem of SVM, with one more constraint. The optimization goal of (8) can be explained as maximizing the median margin while keeping all the training data not only correctly classified but also away from the separating hyperplane. We can name the two hyperplanes (w · x) + b = ±ξ the separation boundaries and call the region between them the boundary zone.
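Problem (8) is a quadratic program; before handing it to a solver, the constraints (4) and (7) for a candidate (w, b) can be checked directly. A small sketch under our own naming:

```python
import numpy as np

def msvm_feasible(w, b, X, y, med_pos, med_neg, xi=1e-6, tol=1e-9):
    """Check constraint (4), y_i[(w . x_i) + b] >= xi, and the
    normalization (7), (w . (x+ - x-)) = 1, for a candidate (w, b)."""
    w = np.asarray(w, dtype=float)
    margins = np.asarray(y, dtype=float) * (np.asarray(X, dtype=float) @ w + b)
    c4 = bool(np.all(margins >= xi))                                         # eq. (4)
    diff = np.asarray(med_pos, dtype=float) - np.asarray(med_neg, dtype=float)
    c7 = bool(abs(np.dot(w, diff) - 1.0) <= tol)                             # eq. (7)
    return c4 and c7

X = [[1.0, 0.0], [-1.0, 0.0]]
y = [1, -1]
print(msvm_feasible([0.5, 0.0], 0.0, X, y, [1.0, 0.0], [-1.0, 0.0]))  # True
```

With w = (0.5, 0) the median difference (2, 0) gives (w · (x⁺ - x⁻)) = 1, so (7) holds; doubling w would violate it.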

3 The Dual Form of MSVM

Following a scheme similar to SVM, the Lagrange function of the primal problem can be written as:

L_p = (1/2)(w · w) - Σ_{i=1}^{n} α_i { y_i[(w · x_i) + b] - ξ } - β { (w · (x⁺ - x⁻)) - 1 }    (9)

And the dual form of the above problem is:

max L_D = ξ Σ_{i=1}^{n} α_i + β - (1/2) Σ_{i,j=1}^{n} α_i α_j y_i y_j (x_i · x_j) - β Σ_{i=1}^{n} α_i y_i (x_i · (x⁺ - x⁻)) - (1/2) β² ((x⁺ - x⁻) · (x⁺ - x⁻))    (10)

subject to:

Σ_{i=1}^{n} α_i y_i = 0    (11a)
α_i ≥ 0, i = 1, ..., n    (11b)

which is a quadratic programming problem in α_i and β. The solution weight vector is a linear combination of the training samples:

w = Σ_{i=1}^{n} α_i y_i x_i + β (x⁺ - x⁻)    (12)

And the decision function is

f(x) = sgn{(w · x) + b}    (13)

in which the threshold b can be decided from any sample for which the equality in (4) holds, or be calculated from the two class medians. Similarly to the SVM case, from the Kuhn-Tucker conditions we can also find that only the α_i corresponding to those samples which make the equality in (4) hold are non-zero. These samples are also the ones nearest to the separation boundary, corresponding to the support vectors. From (12) we can see that the final classifier of MSVM is decided by both the support vectors and the two class-median vectors. This can lead to a simplified implementation of MSVM, discussed in section 6.

4 Considerations for Nonseparable Cases

When considering the nonseparable case, again following the SVM ideas, we introduce slack variables into condition (4) to include the samples that violate this condition:

y_i[(w · x_i) + b] ≥ ξ - ξ_i, ξ > 0, ξ_i ≥ 0, i = 1, ..., n    (14)

and define the objective function as:

minimize ψ(w, ξ) = (1/2)(w · w) + C Σ_{i=1}^{n} ξ_i    (15)

Parameter C controls the penalty on the errors and also controls the trade-off between the classification information contained in the medians of the samples and in the individual samples near the boundary. For this new problem, the dual problem turns out to be the same as defined by (10) and (11), except that condition (11b) is modified to:

0 ≤ α_i < C, i = 1, ..., n    (16)

5 Kernel Version

Like SVM, we can also apply the idea of kernels to make MSVM nonlinear, by substituting (x_i · x_j) with some kernel K(x_i, x_j) to realize the mapping from the attribute space to the high-dimensional feature space. In this case, the final decision function will be

f(x) = sgn{ Σ_{i=1}^{n} α_i y_i K(x_i, x) + β (K(x⁺, x) - K(x⁻, x)) + b }    (17)

Here we should pay especial attention to x⁺ and x⁻. Can the median vectors in the attribute space be used directly here?
From section 2 it can be seen that the basic idea of using class medians to build the support vector machine is that the class median is less sensitive to outliers and thus can be a more robust representation of the class. Since this is true in the attribute space, we believe that the class median in the attribute space can be a good representative of the samples. At the same time, the medians in the feature space are usually unknown, because the feature space may not be, or need not be, known. Therefore it is reasonable and feasible to directly use the median vectors in the attribute space here; that is, to use the image of the median vectors in the attribute space as the medians in the feature space. Thus MSVM can be implemented in the nonlinear case in a simplified way.

6 A Simplified Implementation of MSVM

From the expression of the solution in (12), we can find that the final decision function can be considered as the combination of two parts: the support vectors and the class-median vectors. This fact leads to a simplified implementation of MSVM, which can be obtained through modification of the standard SVM result. That is, after getting the weight w_svm of a standard SVM classifier, we can adjust the hyperplane by adding the difference vector between the two class medians with a certain factor, e.g.,

w_new = (1 - λ) w_svm + λ (x⁺ - x⁻)    (18)

where w_svm is the weight obtained by the standard SVM and 0 ≤ λ ≤ 1 is a constant controlling the balance between the classification information contained in the support vectors and in the class medians. By taking this shortcut, previous algorithms for SVM can be utilized. In practice, λ can be selected according to an a priori estimation of our belief in the specific samples and the class medians. For example, if there is severe noise in the sample set, a larger λ can be chosen to make the class medians play a more important role. If λ = 0, the standard SVM is obtained. In practice, since w_svm is scaled in the standard SVM, (x⁺ - x⁻) may need scaling to be comparable with w_svm. When a kernel is adopted for the nonlinear transform, we can use the image of the median vectors in the attribute space as the medians in the feature space, such that

f(x) = sgn{ (1 - λ) Σ_{i=1}^{n} α_i y_i K(x_i, x) + λ (K(x⁺, x) - K(x⁻, x)) + b }    (19)

This is our simplified implementation of the kernel version of MSVM.

7 Experiment Results and Analysis

In order to evaluate the performance of MSVM and compare MSVM with SVM and CSVM, we designed two toy data sets. Experiment results and analysis are presented as follows.

As shown in Fig. 3, the first example is to compare the performance of MSVM with that of CSVM and SVM under the existence of outliers not far from the class center. In Fig. 3 (a), there are two outliers in the training set and the separating hyperplane is obviously not the optimal one. Fig. 3 (b) shows the optimal result when the training set is noiseless; the separating hyperplane changes a lot from the one in (a). Fig. 3 (c) is the result obtained by CSVM, using its simplified implementation [5]. Fig. 3 (d) illustrates the result obtained by MSVM, using its simplified implementation, with the same λ as in (c). We can find that although the outliers exist in the training set, the result of MSVM is almost the same as the one obtained when the training set is clean. Note that here we did not attempt to identify and remove the outliers, but simply made the result less sensitive to outliers by taking the class-median information into account. It can also be seen that when the outliers are not far from the class centers, MSVM and CSVM achieve similar performance.

Fig. 3: A comparison experiment of SVM, CSVM and MSVM. Pluses and circles denote the training samples of the two classes. (a) the standard SVM result in the existence of outliers; (b) the standard SVM result excluding the outliers; (c) and (d) the hyperplanes of CSVM and MSVM when the outliers exist, respectively.

Figure 4 shows the experiment results of the comparison between CSVM and MSVM under the existence of outliers far from the class center. Fig. 4 (a) shows the result obtained by CSVM and Fig. 4 (b) the result by MSVM.

Fig. 4: A comparison experiment between CSVM and MSVM. Pluses and circles denote the training samples of the two classes. (a) and (b) are the results of CSVM and MSVM when the outliers are far from the class centers, respectively.
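The simplified implementation of eq. (18) used in these experiments amounts to blending the standard-SVM weight with the median difference vector. A sketch of our own (the rescaling step the paper mentions is done here by normalizing both directions, which is our implementation choice):

```python
import numpy as np

def msvm_weight(w_svm, med_pos, med_neg, lam):
    """Eq. (18): w_new = (1 - lam) * w_svm + lam * (x+ - x-), 0 <= lam <= 1.
    w_svm comes from any standard SVM trainer; lam = 0 recovers the standard SVM.
    Both directions are normalized first so the two terms are comparable in
    scale (the paper notes such rescaling may be needed)."""
    w_svm = np.asarray(w_svm, dtype=float)
    diff = np.asarray(med_pos, dtype=float) - np.asarray(med_neg, dtype=float)
    w_svm = w_svm / np.linalg.norm(w_svm)
    diff = diff / np.linalg.norm(diff)
    return (1.0 - lam) * w_svm + lam * diff

# lam = 0 keeps the (normalized) SVM direction; lam = 1 uses only the medians.
print(msvm_weight([2.0, 0.0], [1.0, 1.0], [1.0, -1.0], 0.0))  # [1. 0.]
print(msvm_weight([2.0, 0.0], [1.0, 1.0], [1.0, -1.0], 1.0))  # [0. 1.]
```

Intermediate λ values interpolate between the two directions, matching the trade-off described above.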

It can be seen that when the outliers are far from the class center, they change the class center greatly, so that the result obtained by CSVM is far from the really optimal one. But the class median is less sensitive to such outliers, and therefore the result is more reliable. So it can be concluded that MSVM is a more robust algorithm.

8 Discussion and Conclusion

Support vector machine, or SVM, as a new technique for pattern recognition, possesses good generalization ability and thus has been receiving more and more attention. However, some authors have observed that in cases where there exist severe noise and outliers in the training data, SVM makes its result too sensitive to a few specific samples and thus less reliable. As a step toward a more robust and more practically applicable modification of SVM, we proposed the median support vector machine, or MSVM. We endeavor to combine the class-median information with the standard SVM to make the algorithm more robust. Further, according to the form of the MSVM solution, a simpler implementation is introduced, which is simply the combination of the standard SVs and the class medians by a factor. Experiments illustrate the advantage of MSVM over standard SVM, and also over CSVM in certain cases. However, further analysis of the method is deserved, and we are also trying to apply it to some practical problems.

9 Acknowledgement

The authors would like to thank Dr. Jianhua Xu and Han Ke for their helpful discussions.

References

[1] Vapnik V N. Statistical Learning Theory, New York: John Wiley & Sons, 1998.
[2] Vapnik V N. The Nature of Statistical Learning Theory, NY: Springer-Verlag, 1995.
[3] X. Zhang, Introduction to statistical learning theory and support vector machines, Acta Automatica Sinica, 2000, 26(1):32-42 (in Chinese).
[4] B. Boser, I. Guyon, V. Vapnik. A training algorithm for optimal margin classifiers, presented at the 5th Annual Workshop on Computational Learning Theory, Pittsburgh: ACM Press, 1992.
[5] X. Zhang, Using class-center vectors to build support vector machines, in Yu-Hen Hu, Jan Larsen, et al. (eds.), Neural Networks for Signal Processing IX, Proceedings of the 1999 IEEE Signal Processing Society Workshop, New York: The Institute of Electrical and Electronics Engineers, Inc., pp. 3-11.
[6] X. Zhang, H. Ke, Central support vector machines and its application to cancer classification. Submitted to IEEE Trans. on Neural Networks.
[7] Steve Gunn, Support Vector Machines for Classification and Regression, ISIS Technical Report, University of Southampton, 1998.