Robust Support Vector Machine Using Least Median Loss Penalty

Size: px

Start display at page:

Download "Robust Support Vector Machine Using Least Median Loss Penalty"

Aubrie Pearl Wilkinson
5 years ago
Views:

1 Preprints of the 8th IFAC World Congress Milano (Italy) August 8 - September, Robust Support Vector Machine Using Least Median Loss Penalty Yifei Ma Li Li Xiaolin Huang Shuning Wang Department of Automation, Tsinghua University Tsinghua National Laboratory for Information Science and Technology Beijing, 84, China ( s: mayifei@gmail.com, li-li@mail.tsinghua.edu.cn, huangxl6@mails.tsinghua.edu.cn, swang@mail.tsinghua.edu.cn) Abstract: It is found that data points used for training may contain outliers that can generate unpredictabledisturbanceforsomesupportvectormachines(svms)classificationproblems.no theoretical limit for such bad influence is held in traditional convex SVM methods. We present a novel robust misclassification penalty function for SVM which is inspired by the concept of Least Median Regression. In our approach, total loss penalty in training is measured by the summation of two median hinge losses, each for a different class. We also propose a Rank and Convex Procedure to optimize our tasks. Though our approach is heuristic, it is faster than other known robust methods, such as SVM with Ramp Loss Penalty. Keywords: Statistical data analysis, Learning theory, Pattern recognition, Robust identification, Support vector machine.. Robust SVM. INTRODUCTION (a) The Support Vector Machine (SVM) is a binary classification tool developed by Cortes and Vapnik (995). It is characterized by its fast speed, good generalization ability, and unique solutions when convexity retains (Steinwart and Christmann, 8). However, in all of its recent developments we surveyed (section.), none has a robustness guarantee (which describes how the classifier works in the presenceofoutliers)underthecriteriaofbreakdownpoint, which has been widely used in robust regression (section.3). Moreover, our algorithm is shown to run faster than most other robust classification methods, especially the L-regularized Ramp Loss SVM (Collobert et al., 6). On the one hand, outliers in the training set prevent us from learning the true classification rules in practice. For example when SVM is used for intrusion detection in computer systems, Hu (3) suggested that in the assumed normal period used for training, intrusions may be already on the way. On the other hand, SVM is used for Outlier Detection through single class classification. Chandola et al. (7) did a survey of Outlier Detection. To better illustrate the need for robust SVM, we may see the toy example in Figure.. Known Robust SVM Methods Some of the methods below are not designed for robustness, but can be used as a robust approach. Other ap- This work was jointly supported by the National Natural Science Foundation of China (6748, 69748) and the Research Fund of Doctoral Program of Higher Education (839). Fig.. The outlier on the right side severely influence the Hinge Loss-SVM result in (a), but play no major role in (b), the Class Conditional Median Loss-SVM method that we propose for robust SVM classification. Support Vectors in both cases are circled. Note that (a) is tuned to maximize accuracy and vertical positions of points are not considered as features. proaches achieve robustness by modifying the loss penalty term in SVM objective function. Wang et al. (8) and Wu and Liu () proposed a fast probability estimation method using SVM, where decision rules for different posterior probabilities can be simulated by SVM results when multipliers for loss penalties are set differently on class margin violation and class margin violation. However, the focus in these methods is on probability estimation rather than robustness since they identify outliers after the estimation is done and their speed is slower than traditional SVMs. An earlier attempt (Song et al., ) allowed a small weight (and consequently a big margin) in areas far from the geologic centers of the training data points. However, model assumptions for data distribution are needed. The Ramp Loss-SVM or ψ-learning achieves robustness by setting a limit on maximal penalty for a single point violating its supposed margin. This can be achieved by (b) Copyright by the International Federation of Automatic Control (IFAC) 8

2 Preprints of the 8th IFAC World Congress Milano (Italy) August 8 - September,.5 Model Hinge Loss SVM Result.5 Ramp Loss SVM Result.5 Median Loss SVM Result Accuracy on the test set: 87.% 87.8% 88.% (a) (b) (c) (d) Fig.. Median Loss-SVM simplifies SVM problem in Banana Dataset (a). All results are with RBF and website suggested parameters (C = 36.,σ = ) and are learnt on train set 3, which contains 4 data points. For (c), the maximal penalty is found at maxl ramp (yf(x)) = L hinge () =. In (d), we use % upper quantile as the median (notations are described in section.3). From (b) to (d), the number of Support Vectors (circled) decreases and the margin increases. Clearly (d) has an easily controllable tolerance on outliers (crossed). Besides, in general, if we use a ( err%) quantile, CCML-SVM may improve classification accuracy. Dataset available at subtraction of two parallel Hinge Loss functions or by creating some piecewise linear functions (Collobert et al., 6; Ertekin et al., ; Liu et al., 5). The major disadvantage found in these methods is the lack of convexity, i.e. no known method can guarantee a global imum in polynomial time. Xu et al. (6) performed mathematical treatments to regain a semi-definite property under some relaxation, whose reformed problem was not yet scalable. Implementations of Ramp Loss-SVM or ψ-learning on a Concave-Convex Procedure (CCCP) can be found in most of the above papers..3 Breakdown Point & Least Median Squares Regression Least Median Squares (LMS)isarobustcriterionproposed by Rousseeuw (984) which is used in regression field to resolve a similar robust problem. LMS finds a regression linewithleastmediansquarederrorofsamplingdata.lms is proved to be the most robust criterion, when measured by the concept of breakdown point (Donoho and Huber, 98). The reason can be explained that a few outliers with large errors cannot affect the value of median squared error. Inspired by the idea of LMS, a novel robust SVM method is proposed in this paper. When LMS is used for linear regression, the regression problem can be solved by some exact algorithms (Steele and Steiger, 986; Stromberg, 993; Agull, 997). Unfortunately, when the data set grows up to some extent, the computing time of these algorithms will expose. Therefore, Rousseeuw and Hubert (997), Olson (997), and Chakraborty and Chaudhuri (8) proposed some approximation algorithms with less computation load. However, all these algorithms can only solve linear regression problems with LMS criterion. Therefore, when applying LMS in SVM, we will also introduce a new algorithm..4 Class Conditional Median Loss-SVM Approach The main contribution in our work is a Class Conditional Median Loss function (CCML) for the SVM loss penalty measurement. The total loss penalty for all data points is measured by the summation of the median individual hingelossesineachofthetwoclasses.datapointscontaining a bigger loss are likely to serve as outliers. Moreover, theclassificationhyperplaneremainssparse,yieldingafast training and testing speed comparable to (or even faster than) CCCP, the known fastest robust method. While discussion and experiments are included in this paper, our main focus is in the mathematical and pseudocode descriptions of the algorithms and their efficacy. Let us see a benchmark problem comparing Hinge Loss-SVM, Ramp Loss-SVM and CCML-SVM in Fig... FORMULATION OF SVM OPTIMIZATION WITH MEDIAN LOSS PENALTY. General Formulation of SVM Suppose the data is expressed as (x i,y i ) R d {,}, i Ω = {,...,n}, where d is the finite total number of features and y i is the class label of x i. By the above expression, the training data contains only two classes, denoted by their y attribute as class and class. All subscripts of data points in class are denoted by Ω {i Ω,y i = } and in class by Ω similarly. Ω sgn stands for either one of them provided sgn = or. Based on any continuous function f : R d R, a binary decision can be made that classifies any data point x to class if f(x) > or class otherwise. This function f(x)iscalledadecision function.givenadecisionfunction f(x) and any data point (x,y ), we measure the level of misclassification (called loss) by L(y f(x )), where L : R [, ) is called a Loss Function. A pair of decision margins are defined by the equation f(x) = ±. SVM learns a decision rule from an optimization which balances margin enlargement and train set loss suppression. In linear case, decision functionf(x) = ωxb, where (ω,b) are parameters to be learnt. Since / ω shows the width of the margin, ( ω C i Ω L(y i(ωx i b)) is to be imized. Here C is the balancing parameter. or of some other quantile 9

3 Preprints of the 8th IFAC World Congress Milano (Italy) August 8 - September, In general some kernel function k : R d R d R is used to measure the distance of any two data points in R d, which results in the construction of a so-called reproducing kernel Helbert Space (RKHS) H {k(,x) : x R d }. Decision function f(x) is restricted to the form f(x) = f H (x)b, where f H ( ) H and b is a real constant. SVM here imizes ( f H H C i Ω L(y if(x i )).. Traditional Hinge Loss-SVM In the traditional Hinge Loss-SVM, the loss function for each data point is L hinge (y i f(x i )) = max(, y i f(x i )) = [ y i f(x i )]. The loss term of SVM goal function and the SVM optimization problem are respectively, Total L hinge = C i Ω[ y i f(x i )] () f ( ω C ) [ y i f(x i )]. () i Ω Complete Formulation Taking (3) as loss penalty and (4) as constraint, our CCML-SVM optimization goal is, ( f( ) ω C median C median [ ( )f(x i )] [ ()f(x i )] i Ω ) s.t. median i Ω [ ()f(x i )] = median [ ( )f(x i )]. 3. SOLUTION OF CCML-SVM 3. Mixed Integer-Convex Description In order to form a programmable problem, we need to transform (5) into a mixed integer-convex optimization task presented in Theorem. The following two Lemmas lead to our result there. Lemma. Problem(5)isequivalenttotheproblembelow, (5).3 Class Conditional Median Loss-SVM (CCML-SVM) Class Conditional Median Loss Penalty We change the traditional Hinge Loss penalty term () to a novel Class Conditional Median Loss penalty term (CCML), ( ) f,r,r ω C max [ y i f(x i )] R { R = 5% Ω, R = 5% Ω s.t. y i f(x i ) y j f(x j ), i R R, j H H, (6) CCML = C median i Ω [ ()f(x i )] C median [ ( )f(x i )] (3) where median in general stands for a function that maps a list of values {φ[i],i =,...,N} to its ( N )-th maximal value φ. 3 In (3), it is used to find in {[ y i f(x i )],i Ω sgn } its ( Ωsgn )-th maximum, where sgn = stands for class and sgn = for class. Median 4 in this paper may also be replaced by some other upper quantiles in order to implement noise level control. A quantile is denoted here as the rank in percentage of an extracted single value in the given list of values it belongs to. For example, a 5% quantile means the ( 5% Ω sgn )-th maximum from the given list {[ y i f(x i )],i Ω sgn }. It may not exceed 5%. where R and H donate an arbitrary partition of Ω (i.e., Ω = R H and R H = ) that satisfies the constraints. R and H are defined similarly. Proof. One way to evaluate a median function median{φ[i],i =,...,N} is to partition the universe set A = {,...,N} into two subsets A R = {σ,...,σ k } and A H = {σ k,...,σ N }, where k = N and {σ,...,σ N } is a permutation of {,...,N}. If φ[σ i ] φ[σ j ] for all pairs (σ i A R,σ j A H ) (this condition exists for example when σ i is chosen such that φ[σ i ] s fall into an ascending order), it can be concluded that the maximal φ[σ i ] in σ i A R is ranked in value after A H elements but before the rest ( A R ) elements. It is then the ( A H )-th (the ( N )-th) maximum. Applying the above general treatment of median to the specific CCML penalty, we get the following, Balancing Constraint TheSVMsolutionfunctionf(x) = ωxb or f(x) = α i k(x,x i )b has a constant term b, which can be adjusted to balance the loss penalty for both classes. Hence we may add an additional constraint to (3), median[ y i f(x i )] = max[ y i f(x i )] { i Ω yi f(x s.t. i ) y j f(x j ), i R, j H R = 5% Ω, (7) median i Ω [ ()f(x i )] median [ ( )f(x i )] =. (4) Class Conditional is used to red the reader that our Median Loss is Class based. CCML is not a Loss Function but rather the term to replace the loss penalty term in the SVM goal function. 3 There is a difference between our definition of median and the statistical definition when N is even, but this difference is trivial. 4 For simpler notation, we still use the term median when we actually mean another specified quantile in this paper. where R and H denote an arbitrary partition of Ω satisfying the constraints. 5 Via replacing subscripts by, a similar result may be derived for class. We denote this counterpart by (7 ). 5 If φ is used to denote the optimal solution of the MIP problem (7), then the data points with penalty values (by evaluating [ y i f(x i )] ) less than φ are contained in R and those greater than φ are contained in H. There might be multiple data points whose penalty values are exactly at φ. In this case, those points are randomly picked up by R and H and have no influence on φ.

4 Preprints of the 8th IFAC World Congress Milano (Italy) August 8 - September, Constraint (4) makes the objective in (7) and (7 ) equal. Hence they are combined to form a single MIP (6). Lemma. The optimization task (6) has the same imized goal value as the following task, ( ) f,r,r ω C max [ y i f(x i )] R s.t. R = 5% Ω, R = 5% Ω (8) where R denotes a subset of Ω and R a subset of Ω. Proof. The only difference between (6) and (8) is that (6) has an additional constraint, y i f(x i ) y j f(x j ), i R R, j H H. (9) Clearly, relieved from the constraint (9), imized goal value in (8) imized goal target value in (6). In order to prove that the imized goal value in (6) is no greater than (8), we only need to construct a solution feasible in (6) (satisfying (9)) from an optimum solution in (8) and the goal value does not increase. Given that, imized goal in (8) goal value with our constructed solution imized goal in (6). Followed is the construction. Suppose one optimum in (8) is found at (f,r,r ) = (f,r,r ). Denote i as a subscript of the maximal penalty in R, i.e., [ y i )] = max [ y i f (x i )]. Also denote i the counterpart in R. Assume[ y i )] = [ y i )].Otherwise, by changing b in f, (8) may be further imized. When (9) does not hold as (8) reaches its imum, there exists at least a pair of i R R and j H H such that [ y i f (x i )] > [ y j f (x j )]. This pair is called a twisted pair because they violate the order between R R and H H described by (9). The following holds for either i R or i R and for either j H or j H, [ y i )] = [ y i )] [ y i f (x i )] > [ y j f (x j )]. In the case j H, we change R to R (R \{i }) {j} and H to H (H \{j}) {i }. Now max R [ y if (x i )] = [ y i )] max i R R [ y i f (x i )], which means that the goal value in (8) does not increase. It cannot decrease either since it is already the imum. The new pair ( R,R ) is in fact also a optimum solution for (8). For the case j H, we swap i R with this j and apply a similar analysis like the above. After the construction of a new R or R, the number of twisted pairs {(i,j) : [ y i f(x i )] > [ y j f(x j )],i R R,j H H } decrease. Hence, after finite steps repeating the above construction (from When (9) does not hold... ), twisted pairs will disappear. ConcludingLemmaand,ourcoding-friendlyMIPtasks are as followed. Theorem. Optimization task (5) is equivalent to the Mixed-Integer Programg task (8). Corollary. Optimization task (5) is equivalent to the following MIP task described with a slack variable ξ, ( ) f,r,r,ξ ω Cξ { yi f(x i ) ξ, i R R s.t. ξ R = 5% Ω, R = 5% Ω () where R denotes a subset of Ω and R a subset of Ω. 3. Dual Problem When R and R are fixed in (), MIP becomes a quadratic optimization problem. We call the problem for fixed R R R the convex part in (). The Lagrangian for this convex part is, L P = ω Cξ Cβξ α i ( y i (ωx i b) ξ), where α i and β are dual variables. For the above convex part, the Karush-Kuhn-Tucker (KKT) condition is, L P ω T = ω α i y i x i =, L P b = α i y i =, L P = α i C( β) =, () ξ α i ( y i (ωx i b) ξ) =, βξ =, β, α i, (i R is omitted for all conditions and summations). The dual problem for convex part of () is then, max α i α i α j y i y j k(x i,x j ) α i i,j R α i, i R, s.t. i C, α α i y i =, () where k(x i,x j ) stands for a specific kernel function 6 measuring the distance between x i and x j. Arelationshipbetweendualconvex part andprimalconvex part can be derived from KKT criteria () and the fact that the optimal goal values of () and () are equal. 3.3 Rank and Convex Procedure (RCP) A reasonable solution of () may be achieved by Algorithm. 6 for example an Euclidean norm distance for linear decision or a Gaussian RBF for a typical nonlinear decision

5 Preprints of the 8th IFAC World Congress Milano (Italy) August 8 - September, Algorithm : Rank and Convex Procedure (RCP) Step input : Ω denoting a subset of training set Ω that contains % random data from each class or f (x) denoting an initial decision function. output: A convergent decision function f(x). initialize Train % random data from each class with Hinge Loss-SVM for f (x) = j Ω α j y jk(x j x)b. repeat R Find new Rsgn t by evaluating f t (x i ),i Ω sgn and sorting them (sgn = or ). Assume R t = R t R t Solve optimum ( αi t in dual convex part max αi α t i ) i,j R α t i α j y i y j k(x i,x j ) { αi, i R s.t. t α t i C, α t i y i = C Back to ( primal convex part α t t i i,j R α t t i αt j y iy j k(x i,x j )) Step ξ t = C b t = ξ t y i j R t αt j y jk(x i,x j ), αi t >,i Rt f t (x) = j R α t t j y jk(x,x j )b t until R t == R t or t exceeds the iteration limit. Theorem. RCP converges in finite steps. Proof. For the goal function in (8), J(f,R,R ) = ω Cmax [ y i f(x i )], C step decreases its value directly by optimizing f( ) and R step does not increase its value. Also, for each choice of R, the C step is a convex problem and has a unique optimum. Since the combination of R is finite (because R Ω), RCP will converge in finite steps. 3.4 RCP Take Advantage of Traditional Solvers Algorithm solves our goal (5) through (8) directly. However, from observations of RCP, in some situations we can improve its coding feasibility in RCP form. Since we have neglected half of (or some other percentage of) data points from each class for consideration in C Step in RCP, it appears that the remained training data points (contained in R R ) are completely separable. Hence, a hard margin may be applied to the remaining R R subset. This can be implemented with C for any kind of loss function. Alternative C step adapted from total hinge loss 7 in R R is listed in Table. 3.5 Heuristic Methods for a Better Solution There are two mechanisms for heuristic methods to fiddle with the input of RCP. In one way, we can heuristically combine or modify convergent decision functions f( ) from thelastrcpoutputandbeginanewrcpwiththerevised 7 Of course, in terms of hard margin, a variety of loss penalty term for SVM may be used. Table. Alternative C Step for RCP (i,j R t is ommited for all conditions and summations) Name C Step : Hard Margin for RCP Alternative Formulation Solve optimum α t j in dual convex part ( ) max αi αi αi α j y i y j k(x i,x j ) { αi, i R t s.t. αi y i = Back to primal convex part b t = y i α t j y j k(x i,x j ), α t i > f t (x) = α t j y jk(x,x j )b t decisionfunctions.inanotherway,wemaychoosedifferent initializing subsets Ω heuristically according to the last RCP outputs and begin a new RCP with the new subset. In the latter mechanism, we may set bigger chances for the correctlyclassifiedpointssuchthattheconvergentdecision function in the following iteration will not change greatly. In general, the former mechanism is better in exploration while the latter in exploitation. 4. PROPERTIES 4. Choice of Class Conditional Quantile and Robustness Thechoiceofconditionalquantiles R sgn / Ω sgn isatradeoff between robustness and accuracy. This is shown in Figure 3. A bigger quantile means better accuracy but also smallermarginandlesstoleranceonoutliers,puttingsvm at risk of overfitting. We suggest that a proper quantile allow twice the neglecting percentage as the error rate of a typical classifier. Fig. 3. Decision line for toy -D data under different quantiles. 5 points are generated randomly with Ω 3 Ω. (a) shows class conditional distribution p X Y (x ) and p X Y (x ) by red and blue curves respectively. (b) shows the posterior possibility of P(Y = X = x) (red) and P(Y = X = x) (blue). (c) is a Hinge Loss-SVM result simulating Bayesian decision function f B (x) = P(Y = X = x) P(Y = X = x). (d), (e), (f) are CCML- SVM results for different R sgn / Ω sgn values (9%, 7%, and 5% respectively). Only horizontal position of each data point is considered as a feature. (a) (b) (c) (d) (e) (f)

6 Preprints of the 8th IFAC World Congress Milano (Italy) August 8 - September, 4. Comparison with Ramp Loss for Speed The Ramp Loss penalty function, or a ConCave-Convex Procedure (CCCP) is shown in Collobert et al. (6) to be faster than the classical convex approach of Hinge Loss penalty. Algorithm shows their approach. Algorithm :ConCave-ConvexProcedureasdepicted in Collobert et al. 8 (for purpose of comparison) initialize Train % random data with Hinge Loss-SVM for f (x) = w xb. repeat CC Step { Compute βi t C, if = yi f t (x i ) < s, otherwise C Step Solve αi t for ( max αi i Ω α i { s.t. i Ω α iy i = βi t α i C βi, t i Ω ) i,j Ω α iα j y i y j k(x i,x j ) Compute b t using y i ( i Ω αt j y jk(x i,x j )b t ) =, < α i < C C Step result is f t (x) = j Ω αt j y jk(x,x j )b t until β t == β t or t exceeds iteration limit. It is clear that CCCP and RCP (or RCP) follow similar routines, ) a C Step taking advantage of the SVM convex optimizer, and ) a CC Step or R Step in deciding which data points to train in the C Step. Their difference is primarily whether the data sets fed into C Step contain noises. In CCCP, they allow noises to the extent that s < y i f(x i ). In our approach however, noisesarepreconditioned.therefore,onsimilarconditions, the number of Support Vectors are decreased substantially in our approach and speed in C Step increased. Moreover, if the C Step is done with a Gaussian RBF and the primal result cannot be simplified, the number of SVs reflects the number of implicit C i exp( σ x x i ) terms and our approach greatly simplifies the result. This fact contributes to a much faster evaluation of y i f(x i ) in R Step than in CC Step. 5. CONCLUSION In this paper, we proposed a novel approach, Class Conditional Median Loss-SVM for robust classification. We described its equivalent optimization problems and a RCP algorithm to solve them. We also demonstrated its unique properties like adaptivity to different noise levels and even faster speed than ConCave-Convex Procedures. A demo in -D benchmark dataset is reported as well. ACKNOWLEDGEMENTS We highly appreciate the helpful comments from Dr. HuachunZhang(InstituteofElectronics,ChineseAcademy of Sciences), Dr. Jun Xu and Mr. Qiang Meng (Department of Automation, Tsinghua University). REFERENCES Agull, J. (997). Exact algorithms for computing the least median of squares estimate in multiple linear regression. In L-Statistical Procedures and Related Topics, Chakraborty, B. and Chaudhuri, P. (8). On an optimization problem in robust statistics. Journal of Computational and Graphical Statistics, 7(3), Chandola, V., Banerjee, A., Kumar, V., and Chandola, V. (7). Outlier detection: A survey. Collobert, R., Sinz, F., Weston, J., and Bottou, L. (6). Trading convexity for scalability. In ICML 6: Proceedings of the 3rd international conference on Machine learning, 8. ACM, New York, NY, USA. Cortes, C. and Vapnik, V. (995). Support-vector networks. In Machine Learning, Donoho, D. and Huber, P. (98). The notion of breakdown point. In A Festshrift for Erich L. Lehmann, Ertekin, S., Bottou, L., and Giles, C. (). Nonconvex Online Support Vector Machines. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(), Hu, W. (3). Robust support vector machines for anomaly detection. In Proc. 3 International Conference on Machine Learning and Applications, 3 4. Liu, Y., Shen, X., and Doss, H. (5). Multicategory ψ-learning and support vector machine. Journal of Computational and Graphical Statistics, 4(), Olson, C. (997). An approximation algorithm for least median of squares regression. Information Processing Letters, 63, Rousseeuw, P. (984). Least median of squares regression. Journal of American Statistical Association, 79(388), Rousseeuw, P. and Hubert, M. (997). Recent developments in progress. In L-Statistical Procedures and Related Topics, 4. Song, Q., Hu, W., and Xie, W. (). Robust support vector machine with bullet hole image classification. Systems, Man and Cybernetics, Part C, IEEE Transactions on, 3(4), Steele, J. and Steiger, W. (986). Algorithms and complexity for least median of squares regression. Discrete Applied Mathematics, 4(), 93. Steinwart, I. and Christmann, A. (8). Support Vector Machines. Information Science and Statistics. Springer, Dordrecht. Stromberg,A.J.(993). Computingtheexactleastmedian of squares estimate and stability diagnostics in multiple linear regression. SIAM Journal on Scientific Computing, 4(6), Wang, J., Shen, X., and Liu, Y. (8). Probability estimation for large-margin classifiers. Biometrika, 95(), Wu, Y. and Liu, Y. (). Non-crossing large-margin probability estimation and its application to robust svm via preconditioning. Statistical Methodology, 8(), Xu, L., Crammer, K., and Schuurmans, D. (6). Robust support vector machine training via convex outlier ablation. In AAAI 6: Proceedings of the st national conference on Artificial intelligence, AAAI Press. 3

Machine Learning. Support Vector Machines. Manfred Huber

Machine Learning. Support Vector Machines. Manfred Huber Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data