Rough Margin based Core Vector Machine

Gang Niu¹, Bo Dai², Lin Shang¹, and Yangsheng Ji¹

¹ State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, P.R. China. {niugang,jiyangsheng,lshang}@ai.nju.edu.cn
² NLPR/LIAMA, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, P.R. China. bdai@nlpr.ia.ac.cn

Abstract. The recently proposed rough margin based support vector machine (RMSVM) can effectively tackle the overfitting problem caused by outliers with the help of rough margins. However, the standard solvers for it are time consuming and not feasible for large datasets. On the other hand, the core vector machine (CVM) is an optimization technique based on the minimum enclosing ball that can scale up an SVM to handle very large datasets. While the 2-norm error used in the CVM might make it theoretically less robust against outliers, the rough margin can make up for this deficiency. We therefore propose rough margin based core vector machine algorithms. Experimental results show that our algorithms retain generalization performance almost as good as the RMSVM on large scale datasets and improve the accuracy of the CVM significantly on extremely noisy datasets, while costing much less computational resources and often running faster than the CVM.

1 Introduction

Researchers in computer science have long been searching for faster algorithms. As for techniques for solving SVMs, there are several approaches, ranging from the chunking method [1] to sequential minimal optimization [2], as well as scaling down the training data and low-rank kernel matrix approximations [3]. The Core Vector Machine (CVM) algorithms [4, 5, 6, 7] take this to an extreme: they have linear asymptotic time complexity and constant asymptotic space complexity, since they transform the quadratic program (QP) involved in SVMs into a minimum enclosing ball (MEB) problem. In order to perform this transformation the CVM adopts the 2-norm error, which may make it less robust and thus hurt its accuracy. Fortunately, the notion of the rough margin in the Rough Margin based Support Vector Machine (RMSVM) [8] can make SVMs less sensitive to noise and outliers, and consequently reduce the negative effects of outliers. For this reason we propose the Rough Margin based Core Vector Machine (RMCVM), which unites the merits of the two aforementioned methods.

Supported by the National Natural Science Foundation of China (No. 60775046) and the Natural Science Foundation of Jiangsu Province (No. BK2009233).

After brief introductions to the CVM and the RMSVM in Sections 2 and 3, we first define the primal problem of the RMCVM in Section 4. We then elaborate how to solve an RMCVM through the approximate MEB finding algorithm, and investigate the loss functions used by the RMSVM and the RMCVM. Finally, experimental results are reported in Section 5.

2 Core Vector Machine with Minimum Enclosing Ball

Given a set of points $S = \{x_1, \dots, x_m\}$, where $x_i \in \mathbb{R}^d$ for some integer $d$, the minimum enclosing ball of $S$ (denoted MEB($S$)) is the smallest ball that contains all the points in $S$ [5]. Formally, let $\varphi$ be a kernel induced feature map,

$$ \min_{R,c} R^2 \quad \text{s.t.} \quad \|c - \varphi(x_i)\|^2 \le R^2, \quad i = 1, \dots, m. \tag{1} $$

Let $B(c^*, R^*)$ be the exact MEB($S$). Given an $\epsilon > 0$, a $(1+\epsilon)$-approximation of $B(c^*, R^*)$ is a ball $B(c, (1+\epsilon)R)$ such that $R \le R^*$ and $S \subset B(c, (1+\epsilon)R)$. A subset $Q$ of $S$ is called a core-set if $S \subset B(c, (1+\epsilon)R)$, where $B(c, R)$ is MEB($Q$). The approximate MEB finding algorithm [9] uses a simple iterative scheme: at the $t$-th iteration, $Q_t$ is expanded by including the farthest point of $S$ from $c_t$, then (1) is optimized to get $B(c_{t+1}, R_{t+1})$; this is repeated until $S \subset B(c_t, (1+\epsilon)R_t)$. A surprising property is that the number of iterations, and thus the size of the final core-set, depends only on $\epsilon$ but not on $d$ or $m$ [9, 5].

The dual of (1) is $\max_\alpha \alpha^\top \operatorname{diag}(K) - \alpha^\top K \alpha$ s.t. $\alpha \ge 0$, $\alpha^\top \mathbf{1} = 1$, where $\alpha$ is the vector of Lagrange multipliers and $K$ is the kernel matrix. Conversely, any QP of this form can be regarded as an MEB problem [4]. In particular, when

$$ k(x, x) = \kappa, \tag{2} $$

where $\kappa$ is a constant (this is true for many popular kernels), we can drop the linear term $\alpha^\top \operatorname{diag}(K)$ and obtain a simpler QP,

$$ \max_\alpha \, -\alpha^\top K \alpha \quad \text{s.t.} \quad \alpha \ge 0, \; \alpha^\top \mathbf{1} = 1. \tag{3} $$

Definition 1 (CVM [4]).

$$ \min_{w,b,\xi,\rho} \|w\|^2 + b^2 - 2\rho + C \sum_i \xi_i^2 \quad \text{s.t.} \quad y_i (w^\top \varphi(x_i) + b) \ge \rho - \xi_i, \quad i = 1, \dots, m, \tag{4} $$

where $C$ is a regularization parameter and $\xi_i$ are slack variables. The dual of (4) is analogous to the dual (3), in which $K$ is replaced by $\tilde K$ with $\tilde K_{ij} = \tilde\varphi(x_i, y_i)^\top \tilde\varphi(x_j, y_j)$, where $\tilde\varphi(x_i, y_i) = [y_i \varphi(x_i)^\top, y_i, \frac{1}{\sqrt{C}} e_i^\top]^\top$ ($e_i$ is all zeros except that the $i$-th component is 1). Hence the CVM is an MEB problem if (2) holds. To deal with the situation where (2) is not satisfied, Tsang et al. extend the MEB to the center-constrained MEB [10], and propose the generalized core vector machine [11], which is applicable to any kernel and can also be applied to kernel methods such as SVR and ranking SVM. Note that the 2-norm error is used here. In theory it could, to some extent, be less robust in the presence of outliers [5].
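As a concrete illustration of the iterative scheme above, the following minimal Python sketch runs the core-set expansion on plain Euclidean points. This is our own toy code, not the CVM implementation: the CVM works in the kernel-induced feature space and re-solves the small QP on the core-set at every iteration, whereas here a simple incremental center update is used.

```python
import numpy as np

def approx_meb(X, eps=1e-2, seed=0):
    """(1 + eps)-approximate minimum enclosing ball of the rows of X.

    A minimal sketch of the core-set scheme described above.  The CVM
    re-solves a small QP on the core-set at each step; here the simpler
    update c <- c + (p - c) / (t + 1) is enough to illustrate why the
    core-set size depends only on eps.
    """
    rng = np.random.default_rng(seed)
    idx = int(rng.integers(len(X)))
    c = X[idx].astype(float)                 # start from a random point
    core_set = {idx}
    # O(1 / eps^2) iterations suffice, independently of len(X) and the dimension
    for t in range(1, int(np.ceil(1.0 / eps ** 2)) + 1):
        far = int(np.argmax(np.linalg.norm(X - c, axis=1)))
        core_set.add(far)                    # expand the core-set with the farthest point
        c = c + (X[far] - c) / (t + 1)       # move the center toward it
    R = np.linalg.norm(X - c, axis=1).max()  # radius that covers every point
    return c, R, sorted(core_set)

# usage: enclosing ball of 1000 random 2-D points
c, R, Q = approx_meb(np.random.default_rng(1).normal(size=(1000, 2)), eps=0.05)
```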

3 Rough Margin based Support Vector Machine

Rough set theory, which is based on the concept of the lower and upper approximation of a set, is a mathematical tool for coping with uncertainty and incompleteness [12]. The rough margins [8] are expressed as a lower margin $\frac{2\rho_l}{\|w\|}$ and an upper margin $\frac{2\rho_u}{\|w\|}$, where $0 \le \rho_l \le \rho_u$. They correspond to the lower and upper approximations of the outlier set: samples inside the lower margin are considered outliers, samples outside the upper margin are not outliers, and samples between the two rough margins are possibly outliers. During training, the RMSVM tries to give a major penalty to samples lying within the lower margin, and a minor penalty to the other samples [8]. Notice that the Universum SVM [13] uses a similar strategy. In practice, the RMSVM introduces slack variables $\zeta_i$ and penalizes them $\tau$ times more heavily than $\xi_i$, since $\zeta_i > 0$ means that $\varphi(x_i)$ lies inside the lower margin. Formally,

Definition 2 (RMSVM [8]).

$$ \min_{w,b,\xi,\zeta,\rho_l,\rho_u} \frac{1}{2}\|w\|^2 - \nu\rho_l - \nu\rho_u + \frac{1}{m}\sum_i \xi_i + \frac{\tau}{m}\sum_i \zeta_i \quad \text{s.t.} \quad y_i (w^\top \varphi(x_i) + b) \ge \rho_u - \xi_i - \zeta_i, \; 0 \le \xi_i \le \rho_u - \rho_l, \; \zeta_i \ge 0, \; \rho_l \ge 0, \; \rho_u \ge 0, \; i = 1, \dots, m, \tag{5} $$

where $\nu \in (0, 1)$, $\tau > 1$ are regularization parameters and $\xi_i$, $\zeta_i$ are slack variables.

3.1 Justification of the Rough Margin

At first glance the RMSVM should suffer from a more severe overfitting problem, since it puts more emphasis on outliers than the $\nu$-SVM does. However, the dual of (5) is

$$ \min_\alpha \frac{1}{2} \alpha^\top Q \alpha \quad \text{s.t.} \quad y^\top \alpha = 0, \; \mathbf{1}^\top \alpha \ge 2\nu, \; 0 \le \alpha_i \le \frac{\tau}{m}, \; i = 1, \dots, m, \tag{6} $$

where $Q_{ij} = y_i y_j \varphi(x_i)^\top \varphi(x_j)$. Hence $2\nu/\tau$ is the fraction of samples permitted to lie within the lower margin. For fixed $\nu$, the larger $\tau$ is, the less the RMSVM overfits, and ultimately the RMSVM becomes an underfitting classifier.

Likewise the loss function, a topic absent from [8], justifies the rough margin well. Let $f(x_i) = w^\top \varphi(x_i) + b$.

Proposition 3. The loss function of the RMSVM is

$$ L_{\rho_l,\rho_u}(x_i, y_i, f) = \begin{cases} \rho_u + (\tau - 1)\rho_l - \tau y_i f(x_i), & y_i f(x_i) \le \rho_l \\ \rho_u - y_i f(x_i), & \rho_l < y_i f(x_i) \le \rho_u \\ 0, & y_i f(x_i) > \rho_u \end{cases} $$

Proof. (Sketch) In fact $\xi_i$ increases before $\zeta_i$: $\zeta_i$ has to stay at zero until $\xi_i$ reaches $\rho_u - \rho_l$, since $\xi_i$ incurs a smaller penalty in (5).
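The piecewise-linear loss of Proposition 3 can be written down directly; the sketch below is our own illustrative code (function and variable names are ours) and simply makes the three regimes explicit.

```python
def rmsvm_loss(margin, rho_l, rho_u, tau):
    """Rough-margin (1-norm) loss of Proposition 3.

    `margin` is y_i * f(x_i).  Samples inside the lower margin are
    penalized tau times more steeply, samples between the two rough
    margins get the usual hinge-style penalty, and samples beyond the
    upper margin are not penalized at all.
    """
    assert 0 <= rho_l <= rho_u and tau > 1
    if margin <= rho_l:                      # inside the lower margin: treated as an outlier
        return rho_u + (tau - 1) * rho_l - tau * margin
    if margin <= rho_u:                      # between the two rough margins
        return rho_u - margin
    return 0.0                               # outside the upper margin
```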

In other words, although $\frac{1}{m} < \alpha_i \le \frac{\tau}{m}$ when $y_i f(x_i) \le \rho_l$, since $\rho_l$ is smaller than $\rho$, the loss $L_{\rho_l,\rho_u}(x_i, y_i, f)$ may still not be large, and the number of samples satisfying $y_i f(x_i) \le \rho_l$ is usually smaller than the number of samples satisfying $y_i f(x_i) \le \rho$ in the $\nu$-SVM. Therefore the notion of the rough margin serves as a tool to avoid the overfitting problem. A side effect is that $\rho_u$ is usually larger than $\rho$, which makes the RMSVM generate many more support vectors satisfying $\rho_l < y_i f(x_i) \le \rho_u$, i.e., $\varphi(x_i)$ lies between the two rough margins with a tiny $\alpha_i$. This phenomenon slows down the RMSVM and increases its storage requirements.

4 Rough Margin based Core Vector Machine

In order to use the approximate MEB finding algorithm to solve the rough margin based SVM, we adopt the 2-norm error, because it allows a soft-margin L2-SVM to be transformed into a hard-margin one. Subsequently we have

Definition 4 (RMCVM).

$$ \min_{w,b,\xi,\zeta,\rho_l,\rho_u} \|w\|^2 + b^2 - 2\rho_l - 2\rho_u + C \sum_i \xi_i^2 + \tau C \sum_i \zeta_i^2 \quad \text{s.t.} \quad y_i (w^\top \varphi(x_i) + b) \ge \rho_u - \xi_i - \zeta_i, \; \xi_i \le \rho_u - \rho_l, \; i = 1, \dots, m, \tag{7} $$

where $C > 0$, $\tau > 1$ are regularization parameters and $\xi_i$, $\zeta_i$ are slack variables. The dual of (7) is

$$ \max_{\alpha,\beta} \, -\alpha^\top \Big( K \odot y y^\top + y y^\top + \frac{1}{\tau C} I \Big) \alpha - \frac{1}{C}\|\alpha - \beta\|^2 \quad \text{s.t.} \quad \mathbf{1}^\top \alpha = 2, \; \mathbf{1}^\top \beta = 1, \; \alpha \ge 0, \; \beta \ge 0, \tag{8} $$

where $y = [y_1, \dots, y_m]^\top$ and the operator $\odot$ denotes the Hadamard product.

Remark 5. We omit the constraint $\zeta_i \ge 0$ since it is dispensable for L2-SVMs. We omit the constraints $\rho_l, \rho_u \ge 0$ based on the fact that certain inequality constraints in the dual problem can be replaced by the corresponding equality constraints [14, 15]. Finally we omit $\xi_i \ge 0$; otherwise there would be a term $\frac{1}{C}\|\alpha - \beta + \gamma\|^2$ in the objective and an additional constraint $\gamma \ge 0$, and the optimal $\gamma = 0$ obviously. The constraint $\rho_u \ge \rho_l$ is already implied by (7).

Remark 6. Note that the regularization parameter of the original SVM [16] and of the $\nu$-SVM [17] is $C$ and $\nu$ respectively. In the CVM it is $C$, through which we control the trade-off between flatness and training errors. Since $w$ and $\xi_i$ appear with the same order in (4), their coefficients change simultaneously under scaling, which means that the coefficient of $\rho$ does not matter very much.

Remark 7. It is obvious that there is only a $\nu$-RMSVM insofar as the RMSVM applies the 1-norm error. Similarly, an RMSVM using the 2-norm error would inherently be a C-RMSVM.
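To make the structure of the dual (8) concrete, the following NumPy sketch (our own illustration; the function name and arguments are ours) evaluates its objective for given feasible $\alpha$ and $\beta$.

```python
import numpy as np

def rmcvm_dual_objective(alpha, beta, K, y, C, tau):
    """Objective of the RMCVM dual (8), to be maximized subject to
    1'alpha = 2, 1'beta = 1, alpha >= 0, beta >= 0.

    K is the kernel matrix k(x_i, x_j) and y the +/-1 label vector;
    `K * np.outer(y, y)` is the Hadamard product K ∘ yy'.
    """
    m = len(y)
    Q = K * np.outer(y, y) + np.outer(y, y) + np.eye(m) / (tau * C)
    return -(alpha @ Q @ alpha) - np.dot(alpha - beta, alpha - beta) / C
```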

Therefore we have demonstrated that Definition 4 is proper. Furthermore,

Proposition 8. The loss function of the RMCVM is

$$ L_{\rho_l,\rho_u}(x_i, y_i, f) = \begin{cases} (\rho_u - \rho_l)^2 + \tau(\rho_l - y_i f(x_i))^2, & y_i f(x_i) \le \frac{\tau+1}{\tau}\rho_l - \frac{1}{\tau}\rho_u \\ \frac{\tau}{\tau+1} (\rho_u - y_i f(x_i))^2, & \frac{\tau+1}{\tau}\rho_l - \frac{1}{\tau}\rho_u < y_i f(x_i) \le \rho_u \\ 0, & y_i f(x_i) > \rho_u \end{cases} $$

Proof. (Sketch) We have $\xi_i = \tau\zeta_i$ when $0 < \xi_i < \rho_u - \rho_l$, by equating the partial derivatives of the objective function of (7) w.r.t. $\xi_i$ and $\zeta_i$ respectively.

4.1 Solving the Rough Margin based CVM

We now prove that (7) can be solved approximately by the CVM.

Proposition 9. The RMSVM cannot be transformed into an MEB problem unless we drop the group of constraints $\alpha_i \le \frac{\tau}{m}$ in (6).

Lemma 10. Given a non-negative vector $\alpha \in \mathbb{R}^m$, the optimal value of

$$ \min_\beta \|\alpha - \beta\|^2 \quad \text{s.t.} \quad \mathbf{1}^\top \beta = \tfrac{1}{2}\mathbf{1}^\top \alpha, \; \beta \ge 0 \tag{9} $$

is between $\frac{1}{4m}(\mathbf{1}^\top\alpha)^2$ and $\frac{1}{4}\|\alpha\|^2$.

Proof. (Sketch) The upper bound is quite straightforward, by setting $\beta = \frac{1}{2}\alpha$. Let $V = \{\beta : \mathbf{1}^\top \beta = \tfrac{1}{2}\mathbf{1}^\top\alpha\}$; then $V$ is a hyperplane in $\mathbb{R}^m$. The lower bound is given by $\min_{\beta \in V} \|\alpha - \beta\| = \frac{\mathbf{1}^\top\alpha - \frac{1}{2}\mathbf{1}^\top\alpha}{\sqrt{m}} = \frac{\mathbf{1}^\top\alpha}{2\sqrt{m}}$. Moreover, the optimal $\beta$ has $\beta_i = 0$ unless $\alpha_i$ is sufficiently large; in other words, $\beta$ is even sparser than $\alpha$, which is consistent with the fact that there are fewer outliers than support vectors.

Theorem 11. The optimum of (8) is bounded by

$$ \max_\alpha \, -\alpha^\top \Big( K \odot y y^\top + y y^\top + \frac{\tau + 4}{4\tau C} I \Big) \alpha \quad \text{s.t.} \quad \mathbf{1}^\top \alpha = 2, \; \alpha \ge 0, \tag{10} $$

$$ \max_\alpha \, -\alpha^\top \Big( K \odot y y^\top + y y^\top + \frac{1}{4mC} \mathbf{1}\mathbf{1}^\top + \frac{1}{\tau C} I \Big) \alpha \quad \text{s.t.} \quad \mathbf{1}^\top \alpha = 2, \; \alpha \ge 0. \tag{11} $$

Proof. (Sketch) Write the objective of (8) as $\max_{\alpha,\beta} -h(\alpha, \beta) = -\min_{\alpha,\beta} \big(h_1(\alpha) + h_2(\alpha, \beta)\big)$, where $h_1(\alpha) = \alpha^\top (K \odot yy^\top + yy^\top + \frac{1}{\tau C} I)\alpha$ and $h_2(\alpha, \beta) = \frac{1}{C}\|\alpha - \beta\|^2$. Then

$$ \min_{\alpha,\beta} h(\alpha, \beta) \ge \min_\alpha h_1(\alpha) + \min_\beta h_2(\alpha, \beta) $$

since $h(\alpha, \beta)$ is convex. According to Lemma 10, $\min_\beta h_2(\alpha, \beta)$ for given $\alpha$ is bounded by $\frac{1}{4mC}(\mathbf{1}^\top\alpha)^2$ and $\frac{1}{4C}\alpha^\top\alpha$.

Corollary 12. The RMCVM can be solved approximately using the approximate MEB finding algorithm.

Proof. Let $Q^{(1)}$, $\tilde\varphi_1$, $Q^{(2)}$, $\tilde\varphi_2$ be

$$ Q^{(1)}_{ij} = \tilde\varphi_1(x_i, y_i)^\top \tilde\varphi_1(x_j, y_j), \quad \tilde\varphi_1(x_i, y_i) = \Big[ y_i \varphi(x_i)^\top, \; y_i, \; \sqrt{\tfrac{\tau+4}{4\tau C}}\, e_i^\top \Big]^\top, $$

$$ Q^{(2)}_{ij} = \tilde\varphi_2(x_i, y_i)^\top \tilde\varphi_2(x_j, y_j), \quad \tilde\varphi_2(x_i, y_i) = \Big[ y_i \varphi(x_i)^\top, \; y_i, \; \tfrac{1}{2\sqrt{mC}}, \; \tfrac{1}{\sqrt{\tau C}}\, e_i^\top \Big]^\top. $$

Then (10) and (11) can be regarded as MEB problems with parameters

$$ R_t = \sqrt{\tfrac{1}{2}\alpha^\top \operatorname{diag}(Q^{(t)}) - \tfrac{1}{4}\alpha^\top Q^{(t)} \alpha}, \quad c_t = \tfrac{1}{2}\sum_i \alpha_i \tilde\varphi_t(x_i, y_i), \quad t = 1, 2. $$

The convergence of (10) and (11) is the same as for the CVM, but we omit the proof here. Hence the approximate RMCVM algorithms based on (10) and (11) have linear asymptotic time complexity and constant asymptotic space complexity.

Recall that the RMSVM is slower than the $\nu$-SVM since it generates more support vectors. Surprisingly, the RMCVM most often generates fewer core vectors and is faster than the CVM, even though we solve two MEB problems.
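In code, Corollary 12 amounts to feeding two augmented kernel matrices to an approximate MEB solver; the sketch below (our own illustration, assuming some hypothetical solver `solve_meb_dual` that returns $\alpha$ with $\mathbf{1}^\top\alpha = 2$) only assembles the matrices $Q^{(1)}$ and $Q^{(2)}$ of the proof.

```python
import numpy as np

def rmcvm_meb_kernels(K, y, C, tau):
    """Kernel matrices of the MEB problems (10) and (11).

    Q1 and Q2 correspond to the augmented feature maps phi_1 and phi_2 in
    Corollary 12; running an approximate MEB solver on each one yields the
    two RMCVM solutions referred to below as RC_l and RC_u.
    """
    m = len(y)
    yy = np.outer(y, y)
    base = K * yy + yy                          # K ∘ yy' + yy'
    Q1 = base + (tau + 4) / (4 * tau * C) * np.eye(m)
    Q2 = base + np.ones((m, m)) / (4 * m * C) + np.eye(m) / (tau * C)
    return Q1, Q2

# hypothetical usage with some MEB/CVM solver (not provided here):
#   Q1, Q2 = rmcvm_meb_kernels(K, y, C=100, tau=5)
#   alpha_l, alpha_u = solve_meb_dual(Q1), solve_meb_dual(Q2)
```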

Corollary 2. The RMCVM can be solved approximately using the approximate MEB finding algorithm. Proof. Let Q (), ϕ, Q (2), ϕ 2 be Q () ij = ϕ (x i, y i ) ϕ (x j, y j ), ϕ (x i, y i ) = [ y i ϕ(x i ), y i, [ Q (2) ij = ϕ 2 (x i, y i ) ϕ 2 (x j, y j ), ϕ 2 (x i, y i ) = y i ϕ(x i ), y i, ] τ + 4 4τC e i, 2 mc, e i τc Then (0) and () can be regarded as MEB problems with parameters R t = 2 α diag(q (t) ) 4 α Q (t) α, c t = α i ϕ t (x i, y i ), t =, 2. 2 The convergence of (0) and () are as same as the CVM but we omit the proof here. Hence the approximate RMCVM algorithms based on (0) and () have linear asymptotic time complexity and constant asymptotic space complexity. Recall that the RMSVM is slower than the ν-svm since it generates more support vectors. Surprisingly the RMCVM most often generates less core vectors and are faster than the CVM, even though we solve two MEB problems. ]. 5 Experiments We implement the RMSVM using LibSVM [8] and the RMCVM using LibCVM. The parameters are fixed to τ = 5 as [8] suggested, and ɛ = 0 6 as [5] recommended. We tune ν {0., 0.2,..., 0.9} providing it is feasible. The kernel is Gaussian and its width is the better one computed by the default methods of LibSVM and LibCVM. The computation method of kernel width is kept unchanged over one dataset. The notation RC l stands for the solution f l (x) = α i y i k(x, x i ) + b from (0) and RC u for f u (x) from (), where b = y α respectively. We denote RC avg as the average solution of them. For multi-class tasks the default one-versus-one strategy is used. The datasets 2 are listed in Table. To begin with, we conduct experiments on a small dataset shown in Table 2. Our accuracy is almost always higher than the CVM. We find that RC avg is not always the best and RC u is usually better than RC l. The next are results on large datasets in the top of Table 3, where the time for reading input and writing output files is excluded from the training time. Note that the CVM and the RMCVM are very fast on extended-usps, on which all the RMSVM fail to give a solution in 24 hours. Moveover, the RMCVM is usually better than the CVM and even beat the RMSVM once on web. At last we display results 2 Download from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ and http://www.cse.ust.hk/~ivor/cvm.html.

Table. Data sets name #class #training #testing #feature satimage 6 4,435 2,000 36 web 2 49,749 4,95 300 ex-usps 2 266,079 75,383 676 intrusion 2 4,898,43 3,029 27 SensIT Vehicle: acoustic 3 78,823 9,705 50 SensIT Vehicle: seismic 3 78,823 9,705 50 Table 2. Experimental results on satimage (accuracy (%)) ɛ = 0 5 ɛ = 0 6 C CVM RC l RC u RC avg CVM RC l RC u RC avg ν ν-svm RMSVM 0. 82.5 83.5 84.8 84. 82.5 83.5 84.8 84. 0.5 8.9 88.8 85.7 86.9 87.7 86.8 85.6 86.9 87.8 86.9 0.4 84.8 89.5 0 89.0 88.3 88.8 88.3 88.6 89.2 89.4 89.3 0.3 87.2 89.8 00 89.3 88.3 84.9 88.4 89.6 89.8 90.0 90. 0.2 88.9 89.6 Table 3. Experimental results on large datasets accuracy (%) training time (seconds) C CVM RC l RC u RC avg RMSVM CVM RC l RC u RC avg RMSVM web 0 98.6 98.7 98.8 98.7 423 28 79 295 00 98.9 99.0 98.9 99. 39 29 23 53 98.9,000 96.2 97. 97.3 98.2 20.7.6 2.9 24.6 0,000 95.9 96. 93.0 97.4 8.88 7.75 4.53 2.22 extended-usps 00 99.46 99.50 99.49 99.49 66 2 09 229 -,000 99.45 99.47 99.46 99.47 8 92 08 20 34 > day intrusion 00 9.8 92.0 92. 92. 209 36 3 49 not 0,000 9.8 9.7 9.8 92. - 3.6 3.45 3.58 6.77 enough,000,000 9.8 92.0 88.3 92.3 2.4.65.6 2.84 memory SensIT Vehicle: acoustic 00 50.0 26.7 3.9 4.9 3923 706 38 832,000 39.4 37. 52.4 58.4 68.5 23.6 5. 38.2 66. 0,000 40.9 42. 4.0 46.6 9.47 7.46 6.98 4.,000,000 40.6 50.6 42.4 47.7 4.90 5.45 4.37 9.64 SensIT Vehicle: seismic 00 50. 23.2 23.5 26.2 3882 796 64 95,000 28. 46.5 59. 57.3 55.0 2.5.6 33.3 64.6 0,000 5.2 4.6 50.3 55.0 8.50 7.7 6.6 3.7,000,000 34.6 46.7 4. 45.9 4.60 4.25 4.67 8.70 79434 3594

on the extremely noisy SensIT Vehicle in the bottom of Table 3. Perhaps the RMCVM could always choose the right core vector and have less iteration before convergence as a result. When C is too small the RMCVM is inferior to the CVM since it is underfitting. 6 Conclusion Motivated by the rough margin, we propose the rough margin based core vector machine and demonstrate that it can be solved efficiently. Experimental results show that the derived algorithms can handle very large datasets and the accuracy is almost comparable to the rough margin based SVM. References [] Vapnik, V.N.: Statistical Learning Theory. John Wiley & Sons, New York (998) [2] Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel Methods. MIT Press (999) 85 208 [3] Smola, A.J., Schökopf, B.: Sparse greedy matrix approximation for machine learning. In: 7th ICML. (2000) 9 98 [4] Tsang, I.W., Kwok, J.T., Cheung, P.M.: Very large svm training using core vector machines. In: 20th AISTATS. (2005) [5] Tsang, I.W., Kwok, J.T., Cheung, P.M.: Core vector machines: Fast svm training on very large data sets. JMLR 6 (2005) 363 392 [6] Tsang, I.W., Kocsor, A., Kwok, J.T.: Simpler core vector machines with enclosing balls. In: 24th ICML. (2007) 9 98 [7] Asharaf, S., Murty, M.N., Shevade, S.K.: Multiclass core vector machine. In: 24th ICML. (2007) 4 48 [8] Zhang, J., Wang, Y.: A rough margin based support vector machine. Information Sciences 78 (2008) 2204 224 [9] Bădoiu, M., Clarkson, K.L.: Optimal core-sets for balls. In: DIMACS Workshop on Computational Geometry. (2002) [0] Tsang, I.W., Kwok, J.T., Lai, K.T.: Core vector regression for very large regression problems. In: 22nd ICML. (2005) 92 99 [] Tsang, I.W., Kwok, J.T., Zurada, J.M.: Generalized core vector machines. IEEE Transactions on Neural Networks 7 (2006) 26 40 [2] Pawlak, Z.: Rough sets. International Journal of Computer and Information Sciences (982) 34 356 [3] Weston, J., Collobert, R., Sinz, F., Bottou, L., Vapnik, V.: Inference with the universum. In: 23rd ICML. (2006) 009 06 [4] Crisp, D.J., Burges, C.J.C.: A geometric interpretation of ν-svm classifiers. In: NIPS. Volume 2. (999) 244 250 [5] Chang, C.C., Lin, C.J.: Training ν-support vector classifiers: Theory and algorithms. Neural Computation 3 (200) 29 247 [6] Cortes, C., Vapnik, V.: Support-vector networks. Machine Learning 20 (995) 273 297 [7] Schölkopf, B., Smola, A.J., Williamson, R.C., Bartlett, P.L.: New support vector algorithms. Neural Computation 2 (2000) 207 245 [8] Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. (200) Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.