The Influences of Smooth Approximation Functions for SPTSVM


Xinxin Zhang, Liaocheng University, School of Mathematics Sciences, Liaocheng, 252059, P.R. China, ldzhangxin008@126.com
Liya Fan, Liaocheng University, School of Mathematics Sciences, Liaocheng, 252059, P.R. China, fanliya63@126.com

Abstract: The recently proposed smooth projection twin support vector machine (SPTSVM) gains a good generalization ability and is suitable for many binary classification problems. But we know that different smooth approximation functions may bring different classification accuracies. In order to study the influence of smooth approximation functions on SPTSVM, in this paper we first overview eight known smooth approximation functions and describe their differentiability and error ranges in five lemmas and one theorem. Then we perform a series of comparative experiments on classification accuracy and running time by using SPTSVM with the Newton-Armijo method on 10 UCI datasets and 6 NDC datasets. From the experimental results, we can obtain a general choice order of the eight approximation functions.

Key Words: Smooth projection TSVM; plus function; smooth approximation function; choice order

1 Introduction

Recently, nonparallel hyperplane support vector machine (NHSVM) classification methods, as extensions of the classical SVM, have become a research hot spot in the field of machine learning. The study of NHSVM classification methods originates from the generalized eigenvalue proximal SVM (GEPSVM) [1], the twin support vector machine (TSVM) [2] and the projection twin support vector machine (PTSVM) [3]. For binary data classification problems, NHSVM methods aim to find a hyperplane for each class, such that each hyperplane is proximal to the data points of one class and far from the data points of the other class. GEPSVM, TSVM and PTSVM are three representative algorithms of NHSVM, and all the other NHSVM methods are improved versions based on them. GEPSVM obtains each of the nonparallel hyperplanes by solving for the eigenvector corresponding to the smallest eigenvalue of a generalized eigenvalue problem, so that each hyperplane is as close as possible to the points of its own class and, at the same time, as far as possible from the points of the other class. The twin support vector machine (TSVM) constructs a pair of nonparallel hyperplanes by solving two smaller-sized QPPs rather than a single quadratic programming problem (QPP), such that each hyperplane is as close as possible to one class and as far as possible from the other class. A new input is assigned to one of the classes depending on which hyperplane it is closer to. Experiments show that TSVM is faster than SVM [2,4]. Different from GEPSVM and TSVM, the central idea of PTSVM is to find a projection axis for each class, such that the within-class variance of the projected samples of its own class is minimized while the projected samples of the other class scatter away as far as possible. PTSVM is an improvement and extension of the multi-weight vector projection SVM (MVSVM) [5]. In order to further enhance the performance of PTSVM, Shao et al. [6] proposed a least squares version of PTSVM, called least squares PTSVM (LSPTSVM). LSPTSVM works much faster than PTSVM because the solutions of LSPTSVM can be obtained by solving two systems of linear equations, whereas PTSVM needs to solve two QPPs. Because of this, the least squares method has also received great attention in support tensor machines.

Later, Shao et al. [7-9] proposed a simple and reasonable variant of PTSVM from a theoretical point of view, called PTSVM with regularization term (RPTSVM), in which the regularized risk principle is implemented and the nonlinear classification ignored in PTSVM is also considered. Ding and Hua [10] formulated a nonlinear version of LSPTSVM for binary nonlinear classification by introducing a nonlinear kernel into LSPTSVM. This formulation leads to a novel nonlinear algorithm, called nonlinear LSPTSVM (NLSPTSVM). Ding et al. reviewed many known nonparallel hyperplane support vector machine algorithms in [11]. In addition, by means of the idea of smooth TSVM in [12], the authors of this paper introduced the smoothing technique into PTSVM and proposed smooth PTSVM (SPTSVM) in [13]. We know that by using smoothing techniques we can solve primal unconstrained differentiable optimization problems rather than dual QPPs, which means that many optimization methods can be used in smooth versions of the various variants of TSVM, such as the Newton method, the quasi-Newton method, the Newton-Armijo method and so on. But we find that different smooth approximation functions have different impacts on the classification results even when the same classifier is used. So, in this paper, we first overview eight smooth approximation functions proposed in [14-21] and then compare their influences on SPTSVM on 16 datasets taken from the UCI database and the NDC database. Taking into account the length of the paper, we only discuss the influences of smooth approximation functions for the linear version of SPTSVM. By means of the kernel trick, the influences of smooth approximation functions for nonlinear SPTSVM can be discussed in a similar way.

2 Linear PTSVM and SPTSVM

In this section, we briefly recall linear PTSVM and linear SPTSVM; for details see [3,13]. Let T = {(x_j^(i), y_j^(i))}_{j=1}^{m_i}, i = 1, 2, be a set of data samples for a binary classification problem, where i = 1 denotes the positive class, i = 2 denotes the negative class, m_i denotes the number of samples belonging to class i, and x_j^(i) in R^n and y_j^(i) in {±1} are respectively the input and the class label of the j-th sample in class i. Let μ^(i) = (1/m_i) Σ_{j=1}^{m_i} x_j^(i) be the mean of class i for i = 1, 2, let A = [x_1^(1), ..., x_{m_1}^(1)]^T in R^{m_1×n} and B = [x_1^(2), ..., x_{m_2}^(2)]^T in R^{m_2×n} denote the input matrices of the positive and negative classes, respectively, and let m = m_1 + m_2. Let e_1 in R^{m_1} and e_2 in R^{m_2} be vectors of ones.

2.1 Linear PTSVM

The central idea of linear PTSVM is to find a projection axis for each class such that the within-class variance of the projected samples of its own class is minimized while the projected samples of the other class scatter away as far as possible. This leads to the following two optimization problems:

min_{w_1, ξ} (1/2) Σ_{i=1}^{m_1} (w_1^T x_i^(1) - w_1^T μ^(1))^2 + c_1 Σ_{k=1}^{m_2} ξ_k
s.t. w_1^T x_k^(2) - w_1^T μ^(1) + ξ_k ≥ 1, ξ_k ≥ 0, k = 1, 2, ..., m_2,     (1)

min_{w_2, η} (1/2) Σ_{i=1}^{m_2} (w_2^T x_i^(2) - w_2^T μ^(2))^2 + c_2 Σ_{k=1}^{m_1} η_k
s.t. -(w_2^T x_k^(1) - w_2^T μ^(2)) + η_k ≥ 1, η_k ≥ 0, k = 1, 2, ..., m_1,     (2)

where c_1, c_2 > 0 are trade-off parameters and {ξ_k}_{k=1}^{m_2}, {η_k}_{k=1}^{m_1} are slack variables. Put

S_1 = Σ_{j=1}^{m_1} (x_j^(1) - μ^(1))(x_j^(1) - μ^(1))^T in R^{n×n},
S_2 = Σ_{j=1}^{m_2} (x_j^(2) - μ^(2))(x_j^(2) - μ^(2))^T in R^{n×n}.

Then problems (1) and (2) can be written in the following matrix forms, respectively:

min_{w_1, ξ} (1/2) w_1^T S_1 w_1 + c_1 e_2^T ξ
s.t. B w_1 - (1/m_1) e_2 e_1^T A w_1 + ξ ≥ e_2, ξ ≥ 0,     (3)

min_{w_2, η} (1/2) w_2^T S_2 w_2 + c_2 e_1^T η
s.t. -(A w_2 - (1/m_2) e_1 e_2^T B w_2) + η ≥ e_1, η ≥ 0.     (4)

By solving the Wolfe dual problems of (3) and (4), respectively,

min_α (1/2) α^T (B - (1/m_1) e_2 e_1^T A) S_1^{-1} (B^T - (1/m_1) A^T e_1 e_2^T) α - e_2^T α
s.t. 0 ≤ α ≤ c_1 e_2,

min_β (1/2) β^T (A - (1/m_2) e_1 e_2^T B) S_2^{-1} (A^T - (1/m_2) B^T e_2 e_1^T) β - e_1^T β
s.t. 0 ≤ β ≤ c_2 e_1,

we obtain the optimal Lagrange multiplier vectors α* and β*. Without loss of generality, we can assume that S_1 and S_2 are nonsingular. Otherwise, since they are symmetric nonnegative definite matrices, we can regularize them by using S_1 + εI_n and S_2 + εI_n instead of S_1 and S_2, respectively, where ε > 0 is a sufficiently small number and I_n denotes the identity matrix of order n. Consequently, we can deduce that

w_1* = S_1^{-1} (B^T - (1/m_1) A^T e_1 e_2^T) α*,
w_2* = -S_2^{-1} (A^T - (1/m_2) B^T e_2 e_1^T) β*,

and then the class label of a new input x in R^n is assigned by

class(x) = arg min_{i=1,2} |(w_i*)^T x - (w_i*)^T μ^(i)|.
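To make these quantities concrete, a minimal NumPy sketch of the class means, the within-class scatter matrices S_1, S_2, the matrices appearing in the dual problems, and the PTSVM decision rule is given below; the function names and array conventions are illustrative assumptions, not taken from [3].

import numpy as np

def ptsvm_statistics(A, B):
    """Class means, within-class scatter matrices and dual-constraint
    matrices of linear PTSVM (problems (1)-(4)).
    A: (m1, n) positive-class inputs, B: (m2, n) negative-class inputs."""
    mu1, mu2 = A.mean(axis=0), B.mean(axis=0)
    S1 = (A - mu1).T @ (A - mu1)                 # sum_j (x_j^(1)-mu1)(x_j^(1)-mu1)^T
    S2 = (B - mu2).T @ (B - mu2)
    G = B - np.outer(np.ones(B.shape[0]), mu1)   # B - (1/m1) e2 e1^T A
    H = A - np.outer(np.ones(A.shape[0]), mu2)   # A - (1/m2) e1 e2^T B
    return mu1, mu2, S1, S2, G, H

def ptsvm_predict(X, w1, w2, mu1, mu2):
    """class(x) = argmin_i |w_i^T x - w_i^T mu^(i)|; returns +1/-1 labels."""
    d1 = np.abs(X @ w1 - mu1 @ w1)
    d2 = np.abs(X @ w2 - mu2 @ w2)
    return np.where(d1 <= d2, 1, -1)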

2.2 Linear SPTSVM

The main idea of linear SPTSVM is to introduce the smoothing technique into PTSVM, which results in solving a pair of primal unconstrained differentiable optimization problems rather than a pair of dual QPPs. By introducing the plus function

x_+ = max{x, 0}, x in R,   and   x_+ = ((x_1)_+, ..., (x_n)_+)^T for x in R^n,

the constraints of the primal problems (1) and (2) can be rewritten as follows, respectively:

ξ_k = (1 - w_1^T x_k^(2) + w_1^T μ^(1))_+, k = 1, 2, ..., m_2,
η_k = (1 + w_2^T x_k^(1) - w_2^T μ^(2))_+, k = 1, 2, ..., m_1,

that is,

ξ = (ξ_1, ..., ξ_{m_2})^T = (e_2 + e_2 Ã w_1 - B w_1)_+ in R^{m_2},
η = (η_1, ..., η_{m_1})^T = (e_1 - e_1 B̃ w_2 + A w_2)_+ in R^{m_1},

where Ã = (μ^(1))^T and B̃ = (μ^(2))^T. In order to avoid the singularity of the matrices S_1 and S_2 involved in linear PTSVM, we add the regularization terms (c_3/2)||w_1||^2 and (c_4/2)||w_2||^2 to problems (1) and (2), respectively. In addition, in order to obtain differentiable optimization problems, we replace the 1-norm penalty by the 2-norm penalty for the slack vectors ξ and η. Consequently, we get two improved unconstrained optimization problems:

min_{w_1} (c_1/2)||(e_2 + e_2 Ã w_1 - B w_1)_+||^2 + (1/2)||A w_1 - e_1 Ã w_1||^2 + (c_3/2)||w_1||^2,     (5)

min_{w_2} (c_2/2)||(e_1 - e_1 B̃ w_2 + A w_2)_+||^2 + (1/2)||B w_2 - e_2 B̃ w_2||^2 + (c_4/2)||w_2||^2.     (6)

Because the plus function x_+ is non-differentiable, in order to solve problems (5) and (6) effectively and quickly by the known optimization methods we need to introduce a smooth approximation function ρ(x, c) of the plus function x_+, where c is a smoothing parameter. Consequently, problems (5) and (6) can be further turned into the following two unconstrained differentiable optimization problems:

min_{w_1} f_1(w_1) = (1/2)||A w_1 - e_1 Ã w_1||^2 + (c_1/2)||ρ(e_2 + e_2 Ã w_1 - B w_1, c)||^2 + (c_3/2)||w_1||^2,     (7)

min_{w_2} f_2(w_2) = (1/2)||B w_2 - e_2 B̃ w_2||^2 + (c_2/2)||ρ(e_1 + A w_2 - e_1 B̃ w_2, c)||^2 + (c_4/2)||w_2||^2.     (8)

In this paper, we mainly use the Newton-Armijo method to solve problems (7) and (8).
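The smoothed objective f_1 of problem (7) can be evaluated along the following lines; the sketch assumes an arbitrary smooth approximation ρ(·, c) is supplied as a callable, and the helper names are illustrative.

import numpy as np

def f1(w1, A, B, c1, c3, rho, c):
    """Smoothed SPTSVM objective of problem (7).
    rho(z, c) must approximate the plus function z_+ componentwise."""
    mu1 = A.mean(axis=0)                  # A~ w1 = mu1^T w1
    proj_var = A @ w1 - mu1 @ w1          # (A - e1 A~) w1
    margins = 1.0 + mu1 @ w1 - B @ w1     # e2 + e2 A~ w1 - B w1
    return (0.5 * proj_var @ proj_var
            + 0.5 * c1 * np.sum(rho(margins, c) ** 2)
            + 0.5 * c3 * w1 @ w1)

# example choice: the sigmoid-integral smoothing function reviewed in Section 3
rho1 = lambda z, c: z + np.logaddexp(0.0, -c * z) / c

The objective f_2 of problem (8) is obtained in the same way with the roles of the two classes exchanged.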

2.3 Newton-Armijo method

The Newton-Armijo method is one of the most popular iterative algorithms for solving unconstrained smooth optimization problems and has been shown to be quadratically convergent (see [15]). In order to use the Newton-Armijo method, we first need to calculate the gradient vectors ∇f_i(w_i) and the Hessian matrices ∇²f_i(w_i) of the objective functions of problems (7) and (8):

∇f_1(w_1) = c_1 Σ_{i=1}^{m_2} ρ(z_{1i}, c) ρ'(z_{1i}, c)(Ã^T - x_i^(2)) + (A - e_1 Ã)^T (A - e_1 Ã) w_1 + c_3 w_1,

∇²f_1(w_1) = c_1 Σ_{i=1}^{m_2} (ρ'(z_{1i}, c)^2 + ρ(z_{1i}, c) ρ''(z_{1i}, c))(Ã^T - x_i^(2))(Ã^T - x_i^(2))^T + (A - e_1 Ã)^T (A - e_1 Ã) + c_3 I,

∇f_2(w_2) = c_2 Σ_{i=1}^{m_1} ρ(z_{2i}, c) ρ'(z_{2i}, c)(x_i^(1) - B̃^T) + (B - e_2 B̃)^T (B - e_2 B̃) w_2 + c_4 w_2,

∇²f_2(w_2) = c_2 Σ_{i=1}^{m_1} (ρ'(z_{2i}, c)^2 + ρ(z_{2i}, c) ρ''(z_{2i}, c))(x_i^(1) - B̃^T)(x_i^(1) - B̃^T)^T + (B - e_2 B̃)^T (B - e_2 B̃) + c_4 I,

where z_{1i} = 1 + Ã w_1 - (x_i^(2))^T w_1, z_{2i} = 1 - B̃ w_2 + (x_i^(1))^T w_2, and I is the identity matrix of appropriate dimension. Then, at the t-th iteration, we calculate the search direction d^t by the Newton method (the Newton direction) and the stepsize λ_t by the Armijo method (the Armijo stepsize). The specific procedure is as follows, in which we only solve problem (7); problem (8) can be solved in the same way.

Algorithm 1. The Newton-Armijo algorithm for solving linear SPTSVM

Step 1. Initialization. For given parameter values c_1, c_3, c and the maximum number of iterations T, let t = 0, choose ε > 0 small enough, and take an arbitrary nonzero vector w_1^0 in R^n.
Step 2. Calculate the Newton direction d^t by solving the system of linear equations ∇²f_1(w_1^t) d^t = -∇f_1(w_1^t).
Step 3. Calculate the Armijo stepsize λ_t by inexact line search, that is, choose λ_t = max{1, 1/2, 1/4, ...} satisfying f_1(w_1^t) - f_1(w_1^t + λ_t d^t) ≥ -(λ_t/4) ∇f_1(w_1^t)^T d^t.
Step 4. Update w_1^t. Calculate the next iterate by w_1^{t+1} = w_1^t + λ_t d^t.
Step 5. If ||w_1^{t+1} - w_1^t|| < ε or the maximum number of iterations T is reached, stop the iteration and take w_1* = w_1^{t+1}; otherwise, set t ← t + 1 and return to Step 2.
Step 6. The class label of a new input x in R^n is assigned by class(x) = arg min_{i=1,2} |(w_i*)^T x - (w_i*)^T (1/m_i) Σ_{j=1}^{m_i} x_j^(i)|.
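A compact sketch of Algorithm 1 follows, assuming callables for the objective, gradient and Hessian built from the formulas above; it is an illustrative reading of the procedure rather than the MATLAB implementation used in the experiments.

import numpy as np

def newton_armijo(f, grad, hess, w0, T=50, eps=1e-3):
    """Algorithm 1: Newton direction plus Armijo backtracking line search."""
    w = w0.astype(float)
    for _ in range(T):
        g, H = grad(w), hess(w)
        d = np.linalg.solve(H, -g)              # Step 2: solve H d = -grad f
        lam = 1.0                               # Step 3: lam in {1, 1/2, 1/4, ...}
        while f(w) - f(w + lam * d) < -(lam / 4.0) * (g @ d):
            lam *= 0.5
            if lam < 1e-12:                     # safeguard against stalling
                break
        w_new = w + lam * d                     # Step 4
        if np.linalg.norm(w_new - w) < eps:     # Step 5: stopping test
            return w_new
        w = w_new
    return w

With f = f_1 and the corresponding gradient and Hessian, the returned vector plays the role of w_1*; the same routine solves problem (8).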

3 Smooth approximation functions

In this section, we briefly overview eight smooth approximation functions for the plus function x_+, taken from [14-21], and describe the differentiability and error ranges of these approximation functions in five lemmas and one theorem. In all approximation functions, c > 0 denotes the smoothing parameter.

In 1980, Zang [14] introduced a smooth approximation function for the plus function x_+ as the integral of the sigmoid function, defined as follows:

ρ_1(x, c) = x + (1/c) ln(1 + e^{-cx}), x in R.     (9)

Later, this approximation function was used in many SVM models, such as [23-25]. It is evident that lim_{c→+∞} ρ_1(x, c) = x_+ for every x in R. This indicates that the approximation effect becomes better and better as the value of c increases. The first- and second-order derivatives of ρ_1(x, c) are respectively

ρ_1'(x, c) = 1/(1 + e^{-cx}),   ρ_1''(x, c) = c e^{-cx}/(1 + e^{-cx})^2,

where ln(·) is the natural logarithm and e is the base of the natural logarithm.

Lemma 3.1 [22]. Let ρ_1(x, c) be defined by (9). Then
(1) ρ_1(x, c) is arbitrary-order smooth with respect to x;
(2) ρ_1(x, c) ≥ x_+ for all x in R;
(3) for arbitrary k > 0 with |x| < k, one has ρ_1(x, c)^2 - x_+^2 ≤ (ln 2 / c)^2 + (2k/c) ln 2.

In 2005, Yuan et al. [15] proposed the following quadratic piecewise polynomial smooth approximation function for x_+ and obtained a quadratic polynomial smooth support vector machine (QPSSVM) model:

ρ_2(x, c) = x,                           x ≥ 1/c,
            (c/4)x^2 + x/2 + 1/(4c),     -1/c < x < 1/c,
            0,                           x ≤ -1/c.     (10)

The first- and second-order derivatives of ρ_2(x, c) are respectively

ρ_2'(x, c) = 1,            x ≥ 1/c,
             (cx + 1)/2,   -1/c < x < 1/c,
             0,            x ≤ -1/c,

ρ_2''(x, c) = c/2, |x| < 1/c;   0, |x| > 1/c.

In the same year, Yuan et al. [16] introduced the following fourth-order piecewise polynomial smooth approximation function for x_+ and obtained a fourth polynomial smooth support vector machine (FPSSVM) model:

ρ_3(x, c) = x,                                     x ≥ 1/c,
            -(c^3/16)(x + 1/c)^3 (x - 3/c),        -1/c < x < 1/c,
            0,                                     x ≤ -1/c.     (11)

The first- and second-order derivatives of ρ_3(x, c) are respectively

ρ_3'(x, c) = 1,                                    x ≥ 1/c,
             -(c^3/4)(x + 1/c)^2 (x - 2/c),        -1/c < x < 1/c,
             0,                                    x ≤ -1/c,

ρ_3''(x, c) = -(3c^3/4)(x + 1/c)(x - 1/c), |x| < 1/c;   0, |x| > 1/c.

Lemma 3.2 [16]. Let ρ_2(x, c) and ρ_3(x, c) be defined by (10) and (11), respectively. Then
(1) ρ_2(x, c) is 1-order smooth and ρ_3(x, c) is 2-order smooth with respect to x;
(2) ρ_2(x, c) ≥ x_+ and ρ_3(x, c) ≥ x_+ for all x in R;
(3) for any x in R, one has ρ_2(x, c) - x_+ ≤ 1/(4c) and ρ_3(x, c) - x_+ ≤ 3/(16c).

In 2007, Xiong et al. [17] derived an important recursive integration equation and, by means of an interpolation technique, proposed a class of smooth approximation functions ρ_4^d(x, c), d = 2, 3, ..., where d is the number of iterations of the recursion. The larger the parameter d is, the higher the approximation accuracy, but this generates additional computation cost; so we only consider the case d = 2, that is, the quartic polynomial approximation ρ_4^2(x, c).

In the same year, Yuan et al. [18] proposed a three-order spline interpolation polynomial approximation function and obtained a three-order spline smooth support vector machine (TSSVM) model:

ρ_5(x, c) = x,                                          x > 1/c,
            -(c^2/6)x^3 + (c/2)x^2 + x/2 + 1/(6c),      0 < x ≤ 1/c,
            (c^2/6)x^3 + (c/2)x^2 + x/2 + 1/(6c),       -1/c < x ≤ 0,
            0,                                          x ≤ -1/c.     (12)

The first- and second-order derivatives of ρ_5(x, c) are respectively

ρ_5'(x, c) = 1,                            x > 1/c,
             -(c^2/2)x^2 + cx + 1/2,       0 < x ≤ 1/c,
             (c^2/2)x^2 + cx + 1/2,        -1/c < x ≤ 0,
             0,                            x ≤ -1/c,

ρ_5''(x, c) = c - c^2|x| ≥ 0, |x| < 1/c;   0, |x| ≥ 1/c.

Lemma 3.3 [18]. Let ρ_5(x, c) be defined by (12). Then
(1) ρ_5(x, c) is 2-order smooth with respect to x;
(2) ρ_5(x, c) ≥ x_+ for all x in R;
(3) for any x in R, one has ρ_5(x, c)^2 - x_+^2 ≤ 1/(24c^2).

In 2013, Wu et al. [19] introduced a three-order piecewise polynomial approximation function:

ρ_6(x, c) = 0,                                   x < -1/(4c),
            (8c^2/3)(x + 1/(4c))^3,              -1/(4c) ≤ x ≤ 0,
            x + (8c^2/3)(1/(4c) - x)^3,          0 < x ≤ 1/(4c),
            x,                                   x > 1/(4c).     (13)

The first- and second-order derivatives of ρ_6(x, c) are respectively

ρ_6'(x, c) = 0,                              x < -1/(4c),
             8c^2(x + 1/(4c))^2,             -1/(4c) ≤ x ≤ 0,
             1 - 8c^2(x - 1/(4c))^2,         0 < x ≤ 1/(4c),
             1,                              x > 1/(4c),

ρ_6''(x, c) = 0, |x| > 1/(4c);   16c^2(1/(4c) - |x|) ≥ 0, |x| ≤ 1/(4c).

Lemma 3.4 [19]. Let ρ_6(x, c) be defined by (13). Then
(1) ρ_6(x, c) is 2-order smooth with respect to x;
(2) ρ_6(x, c) ≥ x_+ for all x in R;
(3) for any x in R, one has ρ_6(x, c)^2 - x_+^2 ≤ 1/(384c^2).
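For reference, a short sketch implementing three of the reviewed approximation functions and checking numerically that each dominates the plus function, as Lemmas 3.1-3.3 state; the helper names are illustrative.

import numpy as np

def plus(x):
    return np.maximum(x, 0.0)

def rho1(x, c):                      # sigmoid-integral smoothing, eq. (9)
    return x + np.logaddexp(0.0, -c * x) / c

def rho2(x, c):                      # quadratic piecewise polynomial, eq. (10)
    return np.where(x >= 1/c, x,
           np.where(x <= -1/c, 0.0, (c/4)*x**2 + x/2 + 1/(4*c)))

def rho5(x, c):                      # cubic spline smoothing, eq. (12)
    mid_pos = -(c**2/6)*x**3 + (c/2)*x**2 + x/2 + 1/(6*c)
    mid_neg =  (c**2/6)*x**3 + (c/2)*x**2 + x/2 + 1/(6*c)
    return np.where(x > 1/c, x,
           np.where(x > 0, mid_pos,
           np.where(x > -1/c, mid_neg, 0.0)))

c = 10.0
x = np.linspace(-1.0, 1.0, 2001)
for rho in (rho1, rho2, rho5):
    gap = rho(x, c) - plus(x)
    assert np.all(gap >= -1e-12)     # rho(x, c) >= x_+
    print(rho.__name__, "max gap:", gap.max())

With c = 10 the printed maximum gaps are ln 2/c, 1/(4c) and 1/(6c), respectively, all attained at x = 0, so the approximations tighten as c grows.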

In the same year, Ding et al. [20] introduced a cluster of polynomial approximation functions and obtained a polynomial smooth twin support vector regression model:

ρ_7^n(x, c) = x,                                                                     x ≥ 1/c,
              x/2 + (1/(2c))(1 - Σ_{l=1}^{n} ((2l-3)!!/(2l)!!)(1 - c^2 x^2)^l),      |x| < 1/c,
              0,                                                                     x ≤ -1/c,     (14)

where n = 2, 3, .... Its first-order derivative is

(ρ_7^n)'(x, c) = 1,                                                                  x ≥ 1/c,
                 1/2 + (cx/2) Σ_{l=1}^{n} ((2l-3)!!/(2l-2)!!)(1 - c^2 x^2)^{l-1},    |x| < 1/c,
                 0,                                                                  x ≤ -1/c,

and the higher-order derivatives are obtained by differentiating the middle polynomial piece once more.

Lemma 3.5 [25]. Let ρ_7^n(x, c) be defined by (14). Then
(1) ρ_7^n(x, c) is n-order smooth with respect to x;
(2) lim_{n→∞} max_{x in R} (ρ_7^n(x, c) - x_+) = 0.

In 2014, a quadratic polynomial smooth approximation function was proposed in [21] as follows:

ρ(x, α) = (1/(4α)) x^2 + x/2 + α/4, x in R,

where α > 0 is a smoothing parameter. Letting c = 1/α > 0, one has

ρ_8(x, c) = (c/4) x^2 + x/2 + 1/(4c) = (1/(4c))(cx + 1)^2, x in R.     (15)

The first- and second-order derivatives of ρ_8(x, c) are respectively

ρ_8'(x, c) = (cx + 1)/2,   ρ_8''(x, c) = c/2.

Theorem 3.1. Let ρ_8(x, c) be defined by (15). Then
(1) ρ_8(x, c) is arbitrary-order smooth with respect to x;
(2) ρ_8(x, c) ≥ x_+ for all x in R;
(3) lim_{x→1/c} (ρ_8(x, c) - x_+) = 0 and lim_{x→-1/c} (ρ_8(x, c) - x_+) = 0.

Proof. The first conclusion is obvious, since ρ_8(x, c) is a quadratic polynomial in x. Since

ρ_8(x, c) - x_+ = (c/4)(x - 1/c)^2, x > 0;   (c/4)(x + 1/c)^2, x ≤ 0,     (16)

we obtain the second conclusion. From (16), we get lim_{x→1/c}(ρ_8(x, c) - x_+) = lim_{x→1/c} (c/4)(x - 1/c)^2 = 0 for x > 0 and lim_{x→-1/c}(ρ_8(x, c) - x_+) = lim_{x→-1/c} (c/4)(x + 1/c)^2 = 0 for x ≤ 0, which indicates that the third conclusion is true.
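Unlike the piecewise functions above, ρ_8 is a single global parabola, so by (16) its gap to x_+ vanishes only at x = ±1/c and grows quadratically elsewhere; the short check below is an illustrative sketch of this behaviour.

import numpy as np

c = 10.0
rho8 = lambda x: (c/4)*x**2 + x/2 + 1/(4*c)
x = np.array([-1.0, -1/c, 0.0, 1/c, 1.0])
gap = rho8(x) - np.maximum(x, 0.0)      # equals (c/4)(x -/+ 1/c)^2, eq. (16)
print(dict(zip(x, gap)))                # zero at +-1/c, large at +-1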

4 The Influences for SPTSVM

In this section, in order to illustrate the influences of the eight smooth approximation functions on linear SPTSVM, we perform a series of comparative experiments on binary classification problems, in terms of classification accuracy and running time, using 10 datasets taken from the UCI database [26] and 6 datasets taken from the NDC database [27]; the NDC datasets are described in Table 1. Among the 10 UCI datasets, the Iris, Vehicle, Waveform and Balance datasets have 3 classes, and we choose the latter two classes of each for the experiments.

Table 1: Description of NDC datasets

Dataset     Training data  Test data  Features
NDC-200     200            40         32
NDC-500     500            100        32
NDC-700     700            140        32
NDC-1000    1000           200        32
NDC-2000    2000           400        32
NDC-3000    3000           600        32

All experiments are implemented in the Matlab 7.11.0 (R2010b) environment on a PC with an Intel P4 processor and 4 GB RAM. SPTSVM is implemented by the Newton-Armijo algorithm, that is, Algorithm 1, together with the five-fold cross-validation method. We know that the choice of parameters has a great impact on the performance of a classifier; in order to facilitate comparison, we take ε = 10^{-3} and T = 50 in Algorithm 1 and set c_1 = c_2 = c_3 = c_4 = c after a grid search over {2^{-8}, ..., 2^{8}}. The classification accuracy is defined by

Accuracy = (TP + TN) / (TP + FP + TN + FN),

where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively.

The experimental results on the 10 UCI datasets are listed in Table 2 and those on the 6 NDC datasets in Table 3, in which the seventh smooth approximation function ρ_7^n(x, c) is taken as ρ_7^4(x, c). In addition, in order to examine the influence of the order n of ρ_7^n(x, c), we perform comparative experiments with n = 2, 3, 4, respectively; the results are listed in Table 4. It should also be pointed out that the fourth smooth approximation function ρ_4^d(x, c) is not commonly used, so we only take d = 2, that is, ρ_4^2(x, c).
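The evaluation protocol just described (five-fold cross-validation with a common penalty value chosen from {2^{-8}, ..., 2^{8}}) can be sketched as follows; train_sptsvm and predict stand for SPTSVM training and prediction routines and are assumptions of this example.

import numpy as np

def cv_accuracy(X, y, train_sptsvm, predict, n_folds=5, seed=0):
    """Five-fold cross-validated accuracy over the grid {2^-8, ..., 2^8}."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    best = (None, -1.0)
    for c in 2.0 ** np.arange(-8, 9):            # common value for c1..c4
        accs = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            model = train_sptsvm(X[train], y[train], c=c)
            y_hat = predict(model, X[test])
            accs.append(np.mean(y_hat == y[test]))   # (TP+TN)/(TP+FP+TN+FN)
        if np.mean(accs) > best[1]:
            best = (c, float(np.mean(accs)))
    return best                                   # (best parameter, accuracy)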
From Table 2 we can see that, in terms of classification accuracy, (1) ρ_1(x, c) is the best on the Breast and Waveform datasets and the next best on the remaining datasets except the Liver dataset; (2) ρ_2(x, c) and ρ_3(x, c) behave completely the same: they are the best on the Breast, Pima and Waveform datasets, slightly worse than ρ_1(x, c) on the remaining datasets, and the worst on the Liver dataset; (3) ρ_4^2(x, c) is the best on the Balance, Liver, Iris and Vehicle datasets and the worst on the Breast, Pima and Waveform datasets; (4) ρ_5(x, c) is almost the same as ρ_2(x, c) and ρ_3(x, c); (5) ρ_7^4(x, c) is the same as ρ_2(x, c) and ρ_3(x, c) except on the Liver dataset, where it is clearly better than ρ_2(x, c), ρ_3(x, c) and ρ_5(x, c); (6) although ρ_6(x, c) and ρ_8(x, c) are the best on the WBC and Vote datasets, respectively, generally speaking ρ_6(x, c), ρ_8(x, c) and ρ_7^4(x, c) are comparable. In terms of running time, ρ_8(x, c) costs the least time on these datasets except for the Vehicle and Waveform datasets, while ρ_4^2(x, c) costs the most time except for the Liver dataset.

From Table 3 we can see that (1) ρ_1(x, c) has the highest classification accuracy on all datasets; (2) ρ_2(x, c), ρ_3(x, c), ρ_5(x, c) and ρ_7^4(x, c) have the same classification accuracies on all datasets, which are slightly lower than the corresponding accuracies of ρ_1(x, c); (3) ρ_6(x, c) and ρ_8(x, c) have almost the same classification accuracies on all datasets, which are comparable with those of ρ_2(x, c), ρ_3(x, c), ρ_5(x, c) and ρ_7^4(x, c); (4) ρ_4^2(x, c) has the worst classification accuracies except for the NDC-700 dataset and the longest running times except for the NDC-700 and NDC-1000 datasets; (5) ρ_2(x, c) and ρ_3(x, c) have the shortest running times on the NDC-200, NDC-700 and NDC-3000 datasets.

From Table 4 we can see that ρ_7^2(x, c), ρ_7^3(x, c) and ρ_7^4(x, c) have the same classification accuracies on the 10 UCI datasets and differ only in running time, which indicates that the classification accuracy of SPTSVM changes very little as the order n increases.

On the basis of the above analysis, we can conclude that, when choosing a smooth approximation function in order to improve the classification accuracy of SPTSVM, in general one can first consider ρ_1(x, c), secondly consider one of ρ_2(x, c), ρ_3(x, c) and ρ_5(x, c), thirdly consider one of ρ_6(x, c), ρ_8(x, c) and ρ_7^n(x, c), and finally consider ρ_4^2(x, c).

Table 2: Comparison results on 10 UCI datasets. Each entry is accuracy (%) / time (s); some digits were lost in transcription.

Dataset              ρ_1              ρ_2              ρ_3              ρ_4^2            ρ_5              ρ_6              ρ_8              ρ_7^4
Balance (625 4)      9.8/.6683        90.3509/0.9795   90.3509/.64      9.985/3.5855     90.3509/0.877    90.877/.469      9.8/0.703        90.3509/0.803
Breast (277 9)       7.3637/0.8860    7.6464/0.687     7.6364/0.7097    59.6364/.643     7.0000/0.5080    7.6364/0.694     70.5455/0.453    7.6364/.087
Heart (303 13)       8.5000/0.83      8.967/.8574      8.967/0.5638     75.83333/.5075   8.967/0.5008     8.6667/0.6890    8.6667/0.46      8.967/0.9584
Pima (768 8)         76.0784/.96      76.09/.074       76.09/.668       7.94/3.495       76.09/.0680      75.94/.8637      75.448/0.70      76.09/.6805
Vote (435 16)        96.0000/0.6796   94.0000/0.505    94.0000/0.5667   94.0000/.3384    94.0000/0.4773   96.0000/0.5500   96.50000/0.443   94.0000/.900
Liver (345 6)        59.655/.5990     53.793/.038      53.793/.403      6.7586/0.976     54.379/.973      60.0000/0.8570   60.3448/0.7598   60.0000/.370
WBC (600 9)          97.49/4.595      96.4706/0.749    96.4706/0.74     95.6/.956        96.4706/0.6794   97.309/.05       44.0336/0.5944   96.4706/.7458
Iris (150 4)         93.0000/0.968    93.0000/0.665    93.0000/0.788    96.0000/0.54     93.0000/0.774    9.0000/0.70      9.0000/0.55      93.0000/0.750
Vehicle (50 18)      86.0000/0.504    86.0000/0.4405   86.0000/0.3836   93.0000/.4558    86.0000/0.4668   86.0000/0.563    86.0000/0.569    86.0000/0.6887
Waveform (50 21)     93.0000/0.8      93.0000/0.4066   93.0000/0.7379   90.0000/0.9567   93.0000/0.7906   9.0000/0.4984    9.0000/0.4839    93.0000/0.5858

Table 3: Comparison results on the NDC datasets with the eight smoothing approximation functions. Each entry is accuracy (%) / time (s).

Dataset     ρ_1              ρ_2              ρ_3              ρ_4^2            ρ_5              ρ_6              ρ_8              ρ_7^4
NDC-200     94.3590/0.778    9.3077/0.5599    9.3077/0.554     8.053/.0489      9.3077/0.6099    9.8/0.865        9.8/0.597        9.3077/0.679
NDC-500     93.9394/.987     9.33/.045        9.33/.3064       9.77/3.093       9.33/.378        93.33/.3987      9.773/.93        9.33/.7377
NDC-700     95.857/.446      94.743/.3848     94.743/.93       95.743/3.9989    94.743/6.389     95.0000/5.8373   94.743/4.338     94.743/.963
NDC-1000    96.484/3.7086    96.84/4.3996     96.84/5.998      93.9698/9.0389   96.84/6.449      96.389/7.9403    95.9799/.893     96.84/.90
NDC-2000    97.0500/0.9988   96.60000/3.9     96.6000/3.64     8.053/3.084      96.6000/.0087    96.5000/8.650    96.5500/5.649    96.6000/9.376
NDC-3000    97.566/7.7454    97.084/6.330     97.084/7.09      95.864/47.598    97.084/33.484    97.6/34.4930     97.095/9.605     97.084/37.665

Table 4: Comparison results with ρ_7^2, ρ_7^3 and ρ_7^4. Each entry is accuracy (%) / time (s).

Dataset              ρ_7^2             ρ_7^3             ρ_7^4
Balance (625 4)      90.3509/.4        90.3509/.9469     90.3509/0.803
Breast (277 9)       7.6364/0.438      7.6364/0.646      7.6364/.087
Heart (303 13)       8.967/0.760       8.967/.4943       8.967/0.9585
Pima (768 8)         76.09/.08         76.09/.044        76.09/.6805
Vote (435 16)        94.0000/0.7866    94.0000/0.76      94.0000/.900
Liver (345 6)        60.0000/.5035     60.0000/.339      60.0000/.370
WBC (600 9)          96.4706/.548      96.4706/.584      96.4706/.7459
Iris (150 4)         93.0000/0.557     93.0000/0.673     93.0000/0.750
Vehicle (50 18)      86.0000/0.579     86.0000/0.655     86.0000/0.6887
Waveform (50 21)     93.0000/0.6079    93.0000/0.6053    93.0000/0.5858

Of course, different smooth approximation functions will bring different classification accuracies. So, we should choose a suitable smooth approximation function for the underlying dataset.

5 Conclusions

In this paper, we study the influence of eight smooth approximation functions on SPTSVM in terms of classification accuracy and running time by means of 10 UCI datasets and 6 NDC datasets. From the experimental results, we can obtain a general choice order of the eight approximation functions. But we know that different approximation functions may bring different classification accuracies, so we should choose a suitable smooth approximation function for the underlying dataset. As stated in the Introduction, GEPSVM, TSVM and PTSVM are three representative methods of NHSVM, and all the other NHSVM methods are improved versions based on them. In this paper, we only discuss the influence of eight known smooth approximation functions on the smooth version of PTSVM and only discuss the linear version of SPTSVM. In the next step of our work, we will, firstly, investigate the influence of these approximation functions on the smooth versions of GEPSVM and TSVM, respectively; secondly, consider the nonlinear version of SPTSVM; and thirdly, be committed to finding more smooth approximation functions and comparing their accuracies in approximating the plus function.

References:

[1] O.L. Mangasarian, E.W. Wild, Multisurface proximal support vector machine classification via generalized eigenvalues, IEEE Trans. Pattern Anal. Mach. Intell. 28(1), 2006, pp. 69-74.
[2] Jayadeva, R. Khemchandani, S. Chandra, Twin support vector machines for pattern classification, IEEE Trans. Pattern Anal. Mach. Intell. 29(5), 2007, pp. 905-910.
[3] X. Chen, J. Yang, Q. Ye, J. Liang, Recursive projection twin support vector machine via within-class variance minimization, Pattern Recognition 44, 2011, pp. 2643-2655.
[4] Y.H. Shao, C.H. Zhang, X.B. Wang, N.Y. Deng, Improvements on twin support vector machines, IEEE Transactions on Neural Networks 22(6), 2011, pp. 962-968.
[5] Q. Ye, C. Zhao, N. Ye, Y. Chen, Multi-weight vector projection support vector machines, Pattern Recognition Letters 31(13), 2010, pp. 2006-2011.
[6] Y.H. Shao, N.Y. Deng, Z.M. Yang, Least squares recursive projection twin support vector machine for classification, Pattern Recognition 45(6), 2012, pp. 2299-2307.
[7] Y.H. Shao, W.J. Chen, W.B. Huang, Z.M. Yang, N.Y. Deng, The best separating decision tree twin support vector machine for multi-class classification, Procedia Computer Science 17, 2013, pp. 1032-1038.
[8] Y.H. Shao, Z. Wang, W.J. Chen, N.Y. Deng, A regularization for the projection twin support vector machine, Knowledge-Based Systems 37, 2013, pp. 203-210.
[9] Y.H. Shao, C.H. Zhang, Z.M. Yang, L. Jing, N.Y. Deng, An ε-twin support vector machine for regression, Neural Computing and Applications 23(1), 2013, pp. 175-185.
[10] S.F. Ding, X.P. Hua, Recursive least squares projection twin support vector machines for nonlinear classification, Neurocomputing 130, 2014, pp. 3-9.
[11] S.F. Ding, X.P. Hua, J.Z. Yu, An overview on nonparallel hyperplane support vector machine algorithms, Neural Computing and Applications 25, 2014, pp. 975-982.
[12] M. Arun Kumar, M. Gopal, Application of smoothing technique on twin support vector machines, Pattern Recognition Letters 29, 2008, pp. 1842-1848.
[13] X.X. Zhang, L.Y. Fan, Application of smoothing technique on projective TSVM, International Journal of Applied Mathematics and Machine Learning, 2015, pp. 7-45.
[14] I. Zang, A smoothing-out technique for min-max optimization, Mathematical Programming 19, 1980, pp. 61-77.
[15] Y.B. Yuan, J. Yan, C.X. Xu, Polynomial smooth support vector machine, Chinese Journal of Computers 28(1), 2005, pp. 9-17.

[16] Y.B. Yuan, T.Z. Huang, A polynomial smooth support vector machine for classification, Proceedings of the 1st International Conference on Advanced Data Mining and Applications (ADMA 2005), 2005, pp. 157-164.
[17] J.Z. Xiong, J.L. Hu, H.Q. Yuan, Research on a new class of functions for smoothing support vector machines, Acta Electronica Sinica 35(2), 2007, pp. 366-370.
[18] Y.B. Yuan, W.G. Fan, D.M. Pu, Spline function smooth support vector machine for classification, Journal of Industrial and Management Optimization 3(3), 2007, pp. 529-542.
[19] Q. Wu, J.L. Fan, Smooth support vector machine based on piecewise function, ScienceDirect 20(5), 2013.
[20] S.F. Ding, H.J. Huang, R. Nie, Forecasting method of stock price based on polynomial smooth twin support vector regression, Springer-Verlag Berlin Heidelberg, LNCS 7995, 2013, pp. 96-105.
[21] S. Balasundaram, D. Gupta, Kapil, Lagrangian support vector regression via unconstrained convex minimization, Neural Networks 51, 2014, pp. 67-79.
[22] Y.J. Lee, O.L. Mangasarian, SSVM: A smooth support vector machine for classification, Computational Optimization and Applications 20(1), 2001, pp. 5-22.
[23] X. Chen, J. Yang, J. Liang, Q. Ye, Smooth twin support vector regression, Neural Computing and Applications 21(3), 2012, pp. 505-513.
[24] Z. Wang, Y. Shao, T. Wu, A GA-based model selection for smooth twin parametric-margin support vector machine, Pattern Recognition 46, 2013, pp. 2267-2277.
[25] Y.Q. Liu, S.Y. Liu, M.G. Gu, Self-training polynomial smooth semi-supervised support vector machines, Journal of System Simulation 21(18), 2009, pp. 5740-5743.
[26] C.L. Blake, C.J. Merz, UCI Repository of Machine Learning Databases, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html.
[27] D.R. Musicant, NDC: Normally distributed clustered datasets, 1998. http://www.cs.wisc.edu/~musicant/data/ndc.