
Manuscript submitted to AIMS' Journals
Volume X, Number 0X, XX 200X

ON REGULARISATION PARAMETER TRANSFORMATION OF SUPPORT VECTOR MACHINES

Hong-Gunn Chew and Cheng-Chew Lim
School of Electrical and Electronic Engineering
The University of Adelaide, SA 5005, Australia

(Communicated by the associate editor name)

Abstract. The Dual-nu Support Vector Machine (SVM) is an effective method in pattern recognition and target detection. It improves on the Dual-C SVM, and offers detection and computation performance competitive with traditional classifiers. We show that the regularisation parameters of Dual-nu and Dual-C can be set such that the same SVM solution is obtained. We present the process of determining the related parameters of one form from the solution of a trained SVM of the other form, and test the relationship on a digit recognition problem. The link between the Dual-nu and Dual-C parameters allows users to use Dual-nu for ease of training, and to switch between the two forms readily.

2000 Mathematics Subject Classification. Primary: 68T10; Secondary: 90C20.
Key words and phrases. Support Vector Machine, Pattern recognition, Quadratic optimisation.

1. Introduction. The Support Vector Machine (SVM) implements structural risk minimisation, a learning principle that attempts to minimise both the error and the complexity of the decision function [1, 17]. This supervised learning paradigm has been used in many applications, including image classification [3, 10]. The SVM learns from a two-class training set by maximising the width of a margin between the two classes in a feature space induced by a kernel, and minimises complexity by using the fewest training points to support the decision hyperplane. Training an SVM is formulated as solving a linearly constrained quadratic programming problem. Its objective function consists of the width of the margin 2/‖w‖ and an error penalty term, and is constrained by a box constraint and an equality constraint. The optimisation problem is large and can be solved using numerical methods such as those in [4, 8, 12, 16, 18, 19]. The error penalty in the objective function is usually set by repeated trials; automated algorithms exist [13], but they still require additional time-consuming training. In many applications prior knowledge, such as the required detection rate, is available. Such prior knowledge can be incorporated into SVMs to give improved generalisation and computation performance. The ν-SVM [15] is one such formulation: it provides a bound on the selection of the error penalty and reduces the need to test different error penalty values to find the optimal one. The incorporation of prior knowledge can be pursued further for training datasets with uneven class sizes, commonly found in target detection applications and multi-class image recognition problems. The Dual-ν SVM is an effective way to incorporate prior knowledge [2, 4]. It is designed to match the detection and computation performance of other types of SVMs and of traditional classifiers, while retaining ν-SVM's reduced error penalty selection complexity.

This paper highlights three main points. First, we introduce the Dual-C and Dual-ν SVM formulations in Section 2. The Dual-C SVM is a proven classifier for a wide range of applications [?, ?, 10] and is the class-biasing extension of the original C-SVM, while the Dual-ν SVM is the extension of ν-SVM. Second, we show analytically in Section 3 that there is a relationship between the solutions of Dual-ν SVM and Dual-C SVM: the result of one SVM can be transformed into a solution of the other SVM, with identical decision functions. Last, an experiment using the benchmark pattern recognition dataset (MNIST) in Section 4 demonstrates the transformation between the Dual-ν SVM solution and the Dual-C SVM solution. The experiment also shows the simpler error penalty selection requirements of the Dual-ν SVM, while it achieves equal or better binary classification performance than the Dual-C SVM. The transformation demonstrates the ability of the new Dual-ν SVM formulation to obtain the same optimal solutions as Dual-C SVM while reducing the computational requirements.

2. Support Vector Machine Formulation. The Support Vector Machine is trained with a dataset in which each data point has one of two classification labels: positive (+1) or negative (−1). The C-SVM and ν-SVM formulations both use a single error parameter during training to weigh the cost of errors against the width of the decision margin. When the numbers of training data points in the two classes differ, a common situation in pattern recognition, the decision boundary is biased towards the class with less training data. The result is a classifier that makes more classification errors in that class. A more general formulation of each type of SVM has been introduced with class biasing: the Dual-C SVM (denoted 2C-SVM) [3] and the Dual-nu SVM (denoted 2ν-SVM) [4]. A separate error parameter for each classification label allows the resulting SVM to be biased towards one class, or to correct an existing training dataset bias, as documented in [3] for 2C-SVM and in [2, 7] for 2ν-SVM. We briefly discuss these two types of SVMs in this section, and the relationship between them in the following section.

2.1. Dual-C Support Vector Machines. The original C-SVM formulation [1] uses a single error parameter C as a regularisation factor between the width of the margin and the total distance of each error from the margin. A simple change in the formulation to two error parameters, one for each class, improves the capability of the SVM to incorporate classification biasing. The 2C-SVM formulation [3] introduces C_+ and C_- as the error parameters for the positive and negative classes respectively. 2C-SVM, being the more general formulation, reduces to C-SVM by setting C_+ = C_- = C.

Consider a set of l data vectors {x_i, y_i}, with x_i ∈ R^d, y_i ∈ {+1, −1}, i = 1, ..., l, where x_i is the i-th data vector, belonging to the binary class y_i. We seek the hyperplane that best separates the two classes with the widest margin, while minimising the cost of errors governed by the error parameters C_+, C_- > 0. The maximal margin hyperplane problem is formulated in the following primal problem:

Problem (P_2C).

    min_{w,b,ξ} { ½‖w‖² + Σ_i C_i ξ_i }

subject to

    y_i (w·Φ(x_i) + b) ≥ 1 − ξ_i,    ξ_i ≥ 0,

where

    C_i = { C_+,  y_i = +1
          { C_-,  y_i = −1.

The mapping function Φ : R^d → R^n moves from the data space to the feature space to provide generalisation for a decision function that may be a non-linear function of the training data. The vector w ∈ R^n and the bias b ∈ R describe the hyperplane w·Φ(x) + b = 0 in the feature space, and the ξ_i ∈ R are slack variables that relax the constraints for non-separable problems. The problem is equivalent to maximising the margin 2/‖w‖ while minimising the cost of the errors Σ_i C_i ξ_i. The margins are defined by w·Φ(x) + b = ±1.

The 2C-SVM training problem is convex. It can be formulated as a Wolfe dual Lagrangian problem [3, 5], expressed as

Problem (D_2C).

    max_{α_i} { Σ_i α_i − ½ Σ_{i,j} α_i α_j y_i y_j K(x_i, x_j) }

subject to

    0 ≤ α_i ≤ C_i,    Σ_i α_i y_i = 0,

where i, j ∈ 1, ..., l, the α_i are the Lagrange multipliers, and K(·,·) is the kernel function

    K(x_i, x_j) = Φ(x_i)·Φ(x_j).    (1)

The resulting decision variables α_i define the decision hyperplane that separates the feature space into the positive and negative classes. The decision function determines the side of the hyperplane, positive or negative, on which a data point lies, and is given by

    f(x) = sgn( Σ_i α_i y_i K(x_i, x) + b ).

The Lagrange multipliers α_i can be thought of as the weights of the training vectors that support the decision hyperplane. The corresponding training vectors are therefore named in the following remark.

Remark 1. Training data vectors x_i with corresponding decision variables α_i > 0 are termed support vectors, and support vectors with α_i = C_i are additionally termed bounded support vectors. In addition, only bounded support vectors can have ξ_i > 0 [14].
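For concreteness, the decision function can be evaluated as follows. This is our own sketch, not code from the paper: an RBF kernel stands in for K, and its exact parameterisation (the paper only states an RBF kernel of width 15 in Section 4) is an assumption.

```python
import numpy as np

def rbf_kernel(X_sv, x, width=15.0):
    # K(x_i, x) = exp(-||x_i - x||^2 / width^2) for every support vector x_i
    # (row of X_sv); the precise role of "width" is an assumption here.
    return np.exp(-np.sum((X_sv - x) ** 2, axis=1) / width ** 2)

def decision_function(x, alpha, y, X_sv, b):
    # f(x) = sgn( sum_i alpha_i y_i K(x_i, x) + b ); only support vectors
    # (alpha_i > 0) contribute, so X_sv need hold just those vectors.
    return np.sign(np.dot(alpha * y, rbf_kernel(X_sv, x)) + b)
```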

Figure 1. Support vectors (circled) of an SVM solution of two classes.

Figure 1 shows an example of a two-dimensional SVM solution. In the figure, there are a total of ten support vectors (five from each class), as indicated by the circular highlights. Of these, four are bounded support vectors (two from each class) that have crossed their associated margins. The numbers of support vectors and bounded support vectors for a problem form the basis of error parameter selection in 2ν-SVM.

2.2. Dual-ν Support Vector Machines. The ν-SVM formulation [15] was developed to simplify the selection of the error parameter. The error parameter was changed from C ∈ (0, ∞) to ν ∈ (0, 1). The parameter ν bounds the numbers of support vectors and bounded support vectors, such that

    (ratio of bounded support vectors) ≤ ν ≤ (ratio of support vectors).

The parameter C varies greatly across classification problems, requiring many iterations to find a suitable value. In contrast, we have found that ν can be set to 0.1 in most cases for the first iteration. However, ν-SVM has only one error parameter, and its training range becomes limited when the training class sizes differ [6]. The range that produces a feasible SVM is bounded from below by a non-separable training set and from above by an unbalanced training set. The extension to dual errors in Dual-ν allows more flexibility in the training process, and overcomes this limitation of ν-SVM. The Extended ν-SVM of Perez-Cruz et al. [11] extends the range of the error parameter ν but does not remove the effects of biasing. The new 2ν-SVM removes the restriction imposed by an unbalanced training set, as the data in each class are now weighted separately. Therefore, the range of the 2ν-SVM error parameters is limited only from below, by a non-separable training set, and this lower bound reveals the minimum number of training errors of the set.

We introduce ν_+ and ν_- in the Dual-ν formulation [4] as the error parameters of training for the positive and negative classes. The subscript ± is used to denote both the + and − subscripts of the corresponding variable; that is, ν_± means both ν_+ and ν_-.
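These bounds are easy to check numerically on a trained solution. A minimal sketch (ours, not from the paper; alpha and C are the per-point multipliers and box limits, with a tolerance for floating-point comparison):

```python
import numpy as np

def sv_ratios(alpha, C, tol=1e-8):
    # Support vectors: alpha_i > 0; bounded support vectors: alpha_i = C_i
    # (Remark 1). Returns (ratio of BSVs, ratio of SVs), which bracket nu.
    l = len(alpha)
    ratio_sv = np.count_nonzero(alpha > tol) / l
    ratio_bsv = np.count_nonzero(np.abs(alpha - C) < tol) / l
    return ratio_bsv, ratio_sv  # ratio_bsv <= nu <= ratio_sv
```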

Consider a set of l data vectors {x_i, y_i}, with x_i ∈ R^d, y_i ∈ {+1, −1}, i = 1, ..., l, where x_i is the i-th data vector, belonging to the binary class y_i. With the error parameters 0 ≤ ν_± ≤ 1, the 2ν-SVM primal formulation takes the form:

Problem (P_2ν).

    min_{w,b,ρ,ξ} { ½‖w‖² − νρ + Σ_i C_i ξ_i },    (2)

subject to

    y_i (w·Φ(x_i) + b) ≥ ρ − ξ_i,    ξ_i ≥ 0,    ρ ≥ 0,

where

    C_i = { C_+,  y_i = +1
          { C_-,  y_i = −1,    (3)

with

    ν = 2 ν_+ ν_- / (ν_+ + ν_-),    (4)

    C_+ = [ l_+ (1 + ν_+/ν_-) ]^(−1) = ν / (2 l_+ ν_+),    (5)

    C_- = [ l_- (1 + ν_-/ν_+) ]^(−1) = ν / (2 l_- ν_-).    (6)

The position of the margins, ρ, is defined by w·Φ(x) + b = ±ρ, and l_+ and l_- are the numbers of training points in the positive and negative classes respectively. The problem is now equivalent to maximising the margin 2/‖w‖ while minimising the position of the margins ±ρ and the cost of the errors Σ_i C_i ξ_i. The hyperplane is defined by the normal vector w and the bias b, and ξ_i is the slack variable for classification errors, as in the case of 2C-SVM.

Remark 2. The ν-SVM formulation of [15] can be derived from 2ν-SVM by letting ν_+ = ν_s l / (2 l_+) and ν_- = ν_s l / (2 l_-), where ν_s is the error parameter of ν-SVM. If the training class sizes are balanced, that is l_+ = l_-, it follows that ν_+ = ν_- = ν_s, which shows the similarity of the two formulations.

Remark 3. It can be seen in Problem (P_2ν) that we have made Σ_i C_i = 1 as a result of normalising the solution and simplifying the formulation. The sum follows from the definitions (5) and (6) together with (4):

    Σ_i C_i = l_+ C_+ + l_- C_- = ν/(2ν_+) + ν/(2ν_-) = 1.

The 2ν-SVM training problem (P_2ν) is convex. It can be formulated as a Wolfe dual Lagrangian problem [2], as

Problem (D_2ν).

    max_{α_i} { −½ Σ_{i,j} α_i α_j y_i y_j K(x_i, x_j) }

subject to

    0 ≤ α_i ≤ C_i,    (7)

    Σ_i α_i y_i = 0,    (8)

    Σ_i α_i ≥ ν,    (9)

where i, j ∈ 1, ..., l, the α_i are the Lagrange multipliers, and K(·,·) is the kernel function (1). In solving the 2ν-SVM problem, constraint (9) can be simplified from an inequality to an equality, as follows.

Lemma 2.1. The optimal solution of Problem (D_2ν) results in Σ_i α_i = ν.

Proof. It can be seen that Σ_i α_i > ν cannot form the optimal solution, as the objective function can be maximised further by decreasing the α_i.

Note that an equality result similar to Lemma 2.1 exists in ν-SVM, and is discussed in [15].

3. Relationship between 2ν-SVM and 2C-SVM. The different error parameters of 2ν-SVM and 2C-SVM are not unrelated. We proceed to show that, for a given classification problem, both SVMs can produce the same optimal solution under the proper setting of the corresponding error parameters. The easier selection of ν_± in 2ν-SVM simplifies the error parameter search compared with 2C-SVM, and can thus result in better performing SVMs. In this section, we denote the variables of the optimal solution of a 2C-SVM with the superscript C, and those of a 2ν-SVM with the superscript ν.

3.1. Relating 2ν to 2C. An optimal solution to 2ν-SVM has a corresponding optimal solution in 2C-SVM.

Proposition 1. If {w^ν, b^ν, ξ_i^ν, ρ^ν} with the corresponding {α_i^ν} is an optimal solution to a 2ν-SVM given the error parameters ν_+ and ν_-, then {w^C, b^C, ξ_i^C}, where w^C = w^ν/ρ^ν, b^C = b^ν/ρ^ν, ξ_i^C = ξ_i^ν/ρ^ν, with {α_i^C} = {α_i^ν/ρ^ν}, is an optimal solution to the corresponding 2C-SVM with error parameters

    C_+ = [ ρ^ν l_+ (1 + ν_+/ν_-) ]^(−1),
    C_- = [ ρ^ν l_- (1 + ν_-/ν_+) ]^(−1).    (10)

Proof. Consider the primal formulation of 2ν-SVM, where the optimal solution {w^ν, b^ν, ξ_i^ν, ρ^ν} minimises the objective function (2). Lemma 3.1 given below states that this solution is also the optimiser of

    min_{w,b,ξ,ρ} { ½‖w‖² + Σ_i C_i^ν ξ_i }

subject to the constraints of Problem (P_2ν) and νρ = νρ^ν, where C_i^ν is given by C_+ and C_- using Equation (3). The last constraint becomes ρ = ρ^ν and removes ρ as an optimising variable. However, the 2C-SVM formulation requires the margins to lie at ±1, that is ρ = 1. We can rescale the feature space by dividing by ρ^ν, setting w′ = w/ρ^ν, b′ = b/ρ^ν, ξ_i′ = ξ_i/ρ^ν and C_i^C = C_i^ν/ρ^ν, to get

    min_{w′,b′,ξ′} { ½‖w′‖² + Σ_i C_i^C ξ_i′ }

subject to

    y_i (w′·Φ(x_i) + b′) ≥ 1 − ξ_i′,    ξ_i′ ≥ 0,    ρ/ρ^ν = 1.

This is the same as the primal Problem (P_2C), and therefore the 2C-SVM solution is {w^C, b^C, ξ_i^C} with w^C = w^ν/ρ^ν, b^C = b^ν/ρ^ν, ξ_i^C = ξ_i^ν/ρ^ν. Note that C_i^ν, and thus Equations (4)-(6), are also divided by ρ^ν to give the 2C-SVM error parameters C_+ and C_-. The normal of the hyperplane, w, is the combination of all the training vectors weighted by the α_i [4]. Since w is scaled by ρ^ν, both C_i^ν and α_i^ν are also scaled by ρ^ν. The solution of the dual Problem (D_2C) is thus {α_i^C} = {α_i^ν/ρ^ν}.

Lemma 3.1. If x* is a feasible optimal solution of

    min_x a(x) + b(x)    (11)

subject to g(x) ≤ 0, h(x) = 0, then y* = x* is also a feasible optimal solution of

    min_y b(y)    (12)

subject to g(y) ≤ 0, h(y) = 0, a(y) = a(x*).

Proof. Suppose ŷ is an optimiser of (12) such that b(ŷ) < b(x*) and a(ŷ) = a(x*). Then a(ŷ) + b(ŷ) < a(x*) + b(x*), which contradicts the initial condition that x* is the optimiser of (11). Thus y* = x* is also a feasible minimiser of b(y) in (12).

Proposition 1 shows that the 2C-SVM solution is the 2ν-SVM solution scaled by the derived margin position ρ^ν. Indeed, the error parameters of 2C-SVM are scaled versions of those of 2ν-SVM.

Remark 4. Given the 2ν-SVM solution, the error parameters (10) of the corresponding 2C-SVM are

    C_+ = C_+^ν / ρ^ν,    C_- = C_-^ν / ρ^ν,    (13)

where C_+^ν, C_-^ν are the variable limits defined by Equations (5) and (6), and ρ^ν is the margin position of the 2ν-SVM solution.
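Equations (4)-(6) and Remark 4 give the whole 2ν-to-2C transformation in closed form. The following sketch (our illustration, not code from the paper; the function names are ours) computes the 2ν-SVM variable limits C_±^ν from ν_± and the class sizes, and then the 2C-SVM error parameters by dividing by the margin position ρ^ν taken from the trained 2ν-SVM solution.

```python
def two_nu_limits(nu_pos, nu_neg, l_pos, l_neg):
    # Equations (4)-(6): nu and the variable limits C_+^nu, C_-^nu.
    nu = 2.0 * nu_pos * nu_neg / (nu_pos + nu_neg)
    c_pos = 1.0 / (l_pos * (1.0 + nu_pos / nu_neg))  # = nu / (2 l_+ nu_+)
    c_neg = 1.0 / (l_neg * (1.0 + nu_neg / nu_pos))  # = nu / (2 l_- nu_-)
    return nu, c_pos, c_neg

def two_nu_to_two_c(nu_pos, nu_neg, l_pos, l_neg, rho):
    # Remark 4, Equation (13): C_+- = C_+-^nu / rho^nu, where rho^nu is the
    # margin position of the trained 2nu-SVM solution.
    _, c_pos_nu, c_neg_nu = two_nu_limits(nu_pos, nu_neg, l_pos, l_neg)
    return c_pos_nu / rho, c_neg_nu / rho
```

By Proposition 1, the rescaled multipliers {α_i^ν/ρ^ν} then solve the 2C-SVM dual for these parameters directly, without retraining.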

3.2. Relating 2C to 2ν. An optimal solution to 2C-SVM has a corresponding optimal solution in 2ν-SVM.

Proposition 2. If {w^C, b^C, ξ_i^C} with the corresponding {α_i^C} is an optimal solution to a 2C-SVM given the error parameters C_+ and C_-, then {w^ν, b^ν, ξ_i^ν, ρ^ν}, where ρ^ν = (l_+ C_+ + l_- C_-)^(−1) and w^ν = ρ^ν w^C, b^ν = ρ^ν b^C, ξ_i^ν = ρ^ν ξ_i^C, with {α_i^ν} = {ρ^ν α_i^C}, is an optimal solution to the corresponding 2ν-SVM with error parameters

    ν_+ = Σ_i α_i^C / (2 C_+ l_+),
    ν_- = Σ_i α_i^C / (2 C_- l_-).

Proof. Consider the dual formulation of 2C-SVM, where the optimal solution {α_i^C} maximises the objective function of Problem (D_2C). Lemma 3.2 given below states that this solution is also the optimiser of

    max_{α_i} { −½ Σ_{i,j} α_i α_j y_i y_j K(x_i, x_j) }

subject to the constraints of Problem (D_2C) and Σ_i α_i = Σ_i α_i^C, where C_i^C is given by C_+ and C_- using Equation (3). The last constraint becomes equal to the new ν after some scaling. However, the 2ν-SVM formulation requires Σ_i C_i = 1 (Remark 3). This requirement is met by dividing the dual space by Σ_i C_i^C = l_+ C_+ + l_- C_-. With ρ^ν = (l_+ C_+ + l_- C_-)^(−1), and thus α_i′ = ρ^ν α_i, C_i^ν = ρ^ν C_i^C and ν = ρ^ν Σ_i α_i^C, we get

    max_{α_i′} { −½ Σ_{i,j} α_i′ α_j′ y_i y_j K(x_i, x_j) }

subject to

    0 ≤ α_i′ ≤ C_i^ν,    Σ_i α_i′ y_i = 0,    Σ_i α_i′ = ν.

This optimisation problem is precisely the 2ν-SVM dual problem, and thus the 2ν-SVM solution is {α_i^ν} = {ρ^ν α_i^C}. Returning to the primal variables, the normal w is the combination of all the training vectors weighted by the α_i [4]. As the transformation from 2C-SVM to 2ν-SVM scales α_i by ρ^ν, the normal w is scaled in the same way, and the same argument follows for the other optimising variables. The 2ν-SVM error parameters are calculated from C_i^ν and ν using Equations (4)-(6).

Lemma 3.2. If x* is a feasible optimal solution of

    max_x a(x) + b(x)    (14)

subject to g(x) ≤ 0, h(x) = 0, then y* = x* is also a feasible optimal solution of

    max_y b(y)    (15)

subject to g(y) ≤ 0, h(y) = 0, a(y) = a(x*).

Proof. The proof follows from Lemma 3.1 by minimising [−a(x) − b(x)] and [−b(x)] as the two objective functions.

Figure 2. A separable dataset.

There is an interesting observation from Proposition 2 when a separable dataset gives a solution with no bounded support vectors. A separable dataset has data points that can be separated by a hyperplane in the feature space; Figure 2 shows an example of a separable two-dimensional dataset. There are no bounded support vectors when no data points cross the margin, so α_i < C_i for all i. The parameters ν_± are then inversely proportional to the parameters C_±, by the expressions in Proposition 2, since Σ_i α_i^C does not change with increasing C_± as long as α_i < C_i for all i, as is the case for a separable dataset. This property applies not only to separable problems but, over a wide range of parameter values, to problems in general.

Remark 5. For any given problem, the parameters ν_± increase as the corresponding parameters C_± decrease.

Similar to Remark 4, the transformation from 2C-SVM to 2ν-SVM involves scaling by the variable ρ^ν. If we consider the {C_+^ν, C_-^ν} parameters required for optimising the 2ν-SVM, it would appear that the regularisation parameters do not require the solution of the 2C-SVM, only the supplied error parameters C_+ and C_-. This is indeed correct, but there is another variable, ν, that is required for the optimisation of the 2ν-SVM, and that variable requires the optimisation variables from the solution of the 2C-SVM.

Remark 6. Given the 2C-SVM solution, the variable limits in Equations (4)-(6) of the corresponding 2ν-SVM are

    ν = ρ^ν Σ_i α_i^C,    C_+^ν = ρ^ν C_+,    C_-^ν = ρ^ν C_-,    (16)

where {α_i^C} is the solution of the 2C-SVM, and ρ^ν = (l_+ C_+ + l_- C_-)^(−1).
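The reverse transformation of Proposition 2 and Remark 6 needs only the supplied parameters C_±, the class sizes, and the trained 2C-SVM multipliers. A minimal sketch (again our illustration; the names are ours):

```python
import numpy as np

def two_c_to_two_nu(c_pos, c_neg, l_pos, l_neg, alpha_c):
    # Proposition 2: rho^nu = (l_+ C_+ + l_- C_-)^(-1) and
    # nu_+- = sum_i alpha_i^C / (2 C_+- l_+-); the 2nu-SVM multipliers are
    # the rescaled 2C-SVM multipliers {rho^nu * alpha_i^C}.
    rho = 1.0 / (l_pos * c_pos + l_neg * c_neg)
    alpha_sum = np.sum(alpha_c)
    nu_pos = alpha_sum / (2.0 * c_pos * l_pos)
    nu_neg = alpha_sum / (2.0 * c_neg * l_neg)
    return nu_pos, nu_neg, rho, rho * np.asarray(alpha_c)
```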

From Remark 4 and Remark 6, it is evident that the corresponding solutions of 2C-SVM and 2ν-SVM are related by ρ^ν. In addition, the respective decision functions are also related.

Remark 7. The decision functions for 2C-SVM (f_2C) and 2ν-SVM (f_2ν) are related by f_2C(x) = f_2ν(x)/ρ^ν.

We have shown with Proposition 1 and Proposition 2 that if an optimal solution exists in one formulation of SVMs, a corresponding optimal solution also exists in the other formulation. Therefore, with the correct error parameters chosen, one formulation can perform equally as well as the other. However, the search in 2C-SVM for the optimal error parameters C_± for a problem is often difficult and time-consuming because of the wide search range C_± ∈ (0, ∞). 2ν-SVM provides a more intuitive error parameter model that improves the parameter search, and thus results in simpler search and selection, and shorter overall training times.

4. Practical Results. To compare the results obtained using 2ν-SVM with the results obtained using 2C-SVM under the transformation of the parameters from νs to Cs, we use the results of the 2C-SVM to transform the parameters Cs back to νs, and compare these with the original results. The MNIST handwritten digit recognition dataset [9] is the primary source we use for comparisons between 2C-SVM and 2ν-SVM. The dataset is widely used as a benchmark in pattern recognition research. It has ten handwritten digits (0-9) digitised into 28×28 pixel images, in 60,000 training images and 10,000 test images. We select the one-against-rest (or winner-takes-all) strategy for its simple implementation and excellent classification performance [14]. In our experiment, we classify handwritten images of the 10 digits. The one-against-rest strategy takes each class and trains a classifier against the rest of the classes. This requires ten binary classifiers, one for each digit, to identify it against the other digits. The unbalanced training class sizes arising from this strategy can easily be handled with 2ν-SVM and 2C-SVM.

4.1. Comparing Classifiers. The main purpose is to compare the performance of 2C-SVM and 2ν-SVM with different error parameters. The parameters C_± ∈ (0, ∞) of 2C-SVM have no upper limit, and the optimal value varies from problem to problem. 2ν-SVM, on the other hand, is governed by ν_± ∈ (0, 1), a limited range. The starting value ν_± = 0.1 has been found to be a good one through extensive testing with different datasets and problems. We use the MNIST dataset to train both 2ν-SVM and 2C-SVM with varying parameter values, using a radial basis function kernel of width 15. Table 1 shows the classification performances of the SVMs. The 2C-SVM results clearly show that the number of trials needed to find the best performance depends on the starting parameter value. Since there is no upper limit to the parameters C_±, it is impossible to provide a general guide on where to start. The resulting effect is the need to complete more iterative trials of different parameter values before the optimal one is found.
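The one-against-rest strategy reduces to relabelling and a loop over the ten digits. The sketch below is our illustration only: train_2nu_svm is a hypothetical training routine and decision_value a hypothetical accessor for the pre-sign decision value; neither comes from the paper. It trains one 2ν-SVM per digit from the suggested starting point ν_± = 0.1 and classifies by the winner-takes-all rule.

```python
import numpy as np

def train_one_against_rest(X, digits, train_2nu_svm, nu=0.1):
    # One binary 2nu-SVM per digit: that digit's images form the positive
    # class (+1), all other digits the negative class (-1). The resulting
    # class sizes are unbalanced (about 1:9), which 2nu-SVM handles through
    # the separate error parameters nu_+ and nu_-.
    return [train_2nu_svm(X, np.where(digits == d, 1, -1),
                          nu_pos=nu, nu_neg=nu)
            for d in range(10)]

def classify(x, models):
    # Winner-takes-all: the digit whose classifier returns the largest
    # pre-sign decision value wins.
    return int(np.argmax([m.decision_value(x) for m in models]))
```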

Table 1. Classification performance comparison: classification performance for each digit and overall (%), for 2C-SVM at a range of values C_+ = C_- and for 2ν-SVM at a range of values ν_+ = ν_-.

The 2ν-SVM starting point of ν_± = 0.1 requires at least 10% of the training vectors to be support vectors. In most problems, this requirement results in a well-performing classifier, with the classifier neither over-fitting (too few support vectors) nor over-generalising (too many support vectors) to the training dataset. We can see from Table 1 that for this handwritten digit dataset, the performance of 2ν-SVM ranges between 95.2% and 98.5%, while 2C-SVM ranges between 89.9% and 98.5%. Choosing C_± = 0.01 as the starting value will result in a longer iterative search for the optimal value of C_± = 10. The strength of 2ν-SVM over 2C-SVM is the need for fewer iterations to select the optimal parameter value, as starting from ν_± = 0.1 will always result in a well-performing classifier.

4.2. Verifying the Transformation. Proposition 1 and Proposition 2 define the transformation of the error parameters between 2ν-SVM and 2C-SVM for a particular dataset. The results in the previous section show the setting of ν_± at which 2ν-SVM provided the best performance. We train a set of 2ν-SVMs (one for each digit) using the parameters of the previous section, and transform their solutions into the parameters for 2C-SVMs. The 2ν-SVM solution and the 2C-SVM solution can be compared by checking the Lagrange multipliers {α_i}, with Proposition 1 stating that the resulting multipliers should be {α_i^C} = {α_i^ν/ρ^ν}. The 2C-SVM solution is then transformed back into the parameters for 2ν-SVM to verify Proposition 2; the multipliers should again be {α_i^ν} = {ρ^ν α_i^C}. We can also compare this final solution with the initial 2ν-SVM solution.

Table 2. Parameter transformation from 2ν to 2C and back to 2ν, starting from the best-performing ν_± of Section 4.1. Columns: digit; the transformed parameters C_+ and C_-; the average multiplier error against the 2ν solution (×10⁻⁶); the recovered ν_+ and ν_- (×10⁻³ %); the average multiplier error against the 2C solution (×10⁻⁶); and the average multiplier error against the initial 2ν solution (×10⁻⁶).

Table 2 shows the results of the transformation from 2ν-SVM to 2C-SVM (top section), and then back to 2ν-SVM (bottom section). The 2C-SVM parameters {C_+, C_-} transformed from 2ν-SVM have an approximate ratio of 9:1. If we have ν_+ = ν_-, Equation (10) gives the only difference between C_+ and C_- as l_+ and l_-. That is, the ratio C_+ : C_- is the inverse of the ratio of the training class sizes, which in our dataset is about 1:9. This agrees with the strategy proposed in [3] for correcting the bias of unbalanced training class sizes. The numerical method used for training the SVMs induces a small numerical error that depends on the termination threshold used. Thus, the 2C-SVM solution is expected to differ insignificantly from the 2ν-SVM solution. The tabled errors show that we have achieved a similar solution. The second 2ν-SVM solution, transformed back from the 2C-SVM solution, has a set of parameters similar to the initial values of ν_+ = ν_-; the biggest difference, for digit 9, is a mere 0.015%. This set of parameters and the low error between the Lagrange multipliers verify that the transformation from 2C-SVM to 2ν-SVM works as proposed.

5. Conclusion. We have derived the relationship between the solutions of 2ν-SVM and 2C-SVM to show that the two formulations can and do result in the same solution. The relationship allows us to use 2ν-SVM, with its simpler error parameters ν_±, while obtaining the same performance as 2C-SVM. It can provide the user with a reasonable set of parameters for 2C-SVM by first training with 2ν-SVM and then transforming the results into the 2C-SVM parameters. This method removes the need to search for values of C_±, which are problem dependent. The transformation shows that the 2ν-SVM and the 2C-SVM both produce the same solution, and that any solution obtained by one formulation can be obtained by the other. The 2ν-SVM formulation provides intuitive parameter selection with a similar computational load, and thus should provide users with easier and faster classifier optimisation than 2C-SVM.
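The verification procedure of Section 4.2 amounts to a round-trip check. A sketch under the same assumptions as the earlier fragments (train_2nu_svm and train_2c_svm are hypothetical trainers whose results expose alpha and rho attributes; two_nu_to_two_c and two_c_to_two_nu are the helpers sketched above):

```python
import numpy as np

def round_trip_check(X, y, train_2nu_svm, train_2c_svm, nu=0.1):
    # Forward (Proposition 1 / Remark 4): train a 2nu-SVM, map its solution
    # to 2C-SVM parameters, retrain, and expect alpha^C = alpha^nu / rho^nu.
    m_nu = train_2nu_svm(X, y, nu_pos=nu, nu_neg=nu)
    l_pos, l_neg = np.sum(y == 1), np.sum(y == -1)
    c_pos, c_neg = two_nu_to_two_c(nu, nu, l_pos, l_neg, m_nu.rho)
    m_c = train_2c_svm(X, y, c_pos=c_pos, c_neg=c_neg)
    err_fwd = np.mean(np.abs(m_c.alpha - m_nu.alpha / m_nu.rho))

    # Backward (Proposition 2 / Remark 6): map the 2C-SVM solution back and
    # expect to recover the starting nu_+- and multipliers up to the
    # numerical error of the trainer's termination threshold.
    nu_pos, nu_neg, rho, alpha_nu = two_c_to_two_nu(
        c_pos, c_neg, l_pos, l_neg, m_c.alpha)
    err_back = np.mean(np.abs(alpha_nu - m_nu.alpha))
    return (nu_pos, nu_neg), err_fwd, err_back
```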

REFERENCES

[1] B. E. Boser, I. M. Guyon and V. N. Vapnik, A training algorithm for optimal margin classifiers, in D. Haussler, editor, "Proceedings of the 5th Annual ACM Workshop on COLT," ACM Press, Pittsburgh, PA.
[2] H. G. Chew, R. E. Bogner and C. C. Lim, Dual-nu support vector machine with error rate and training size biasing, in "Proceedings of the 26th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2001)," Salt Lake City, Utah, USA. IEEE, Piscataway, NJ, USA.
[3] H. G. Chew, D. J. Crisp, R. E. Bogner and C. C. Lim, Target detection in radar imagery using support vector machines with training size biasing, in "Proceedings of the Sixth International Conference on Control, Automation, Robotics and Vision (ICARCV 2000)," Singapore, 2000.
[4] H. G. Chew, C. C. Lim and R. E. Bogner, An implementation of training dual-nu support vector machines, in L. Qi, K. L. Teo and X. Q. Yang, editors, "Optimization and Control with Applications," Springer.
[5] E. K. P. Chong and S. H. Żak, "An Introduction to Optimization," 2nd edition, Wiley-Interscience Series, USA.
[6] D. J. Crisp and C. J. C. Burges, A geometric interpretation of ν-SVM classifiers, Advances in Neural Information Processing Systems, 12 (2000).
[7] M. A. Davenport, R. G. Baraniuk and C. D. Scott, Controlling false alarms with support vector machines, in "Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006)," Toulouse, France.
[8] S. C. Fang, D. Y. Gao, R. L. Sheu and S. Y. Wu, Canonical dual approach for solving 0-1 quadratic programming problems, Journal of Industrial and Management Optimization, 4 (2008).
[9] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86 (1998).
[10] E. Osuna, R. Freund and F. Girosi, Training support vector machines: An application to face detection, in "Proceedings of CVPR'97," Puerto Rico.
[11] F. Perez-Cruz, J. Weston, D. J. L. Hermann and B. Schölkopf, Extension of the ν-SVM range for classification, in J. A. K. Suykens, G. Horvath, S. Basu, C. Micchelli and J. Vandewalle, editors, "Advances in Learning Theory: Methods, Models and Applications," 190 (2003).
[12] J. Platt, Fast training of support vector machines using sequential minimal optimization, in B. Schölkopf, C. J. C. Burges and A. J. Smola, editors, "Advances in Kernel Methods: Support Vector Learning," MIT Press, Cambridge, MA.
[13] K. Schittkowski, Optimal parameter selection in support vector machines, Journal of Industrial and Management Optimization, 1 (2005).
[14] B. Schölkopf, "Support Vector Learning," R. Oldenbourg Verlag, Munich.
[15] B. Schölkopf, A. J. Smola, R. C. Williamson and P. L. Bartlett, New support vector algorithms, Neural Computation, 12 (2000).
[16] K. L. Teo, V. Rehbock and L. S. Jennings, A new computational algorithm for functional inequality constrained optimization problems, Automatica, 29 (1993).
[17] V. N. Vapnik, "Estimation of Dependences Based on Empirical Data," Springer Verlag, New York, USA. Original edition in Russian: Nauka, Moscow.
[18] Z. B. Wang, S. C. Fang, D. Y. Gao and W. X. Xing, Global extremal conditions for multi-integer quadratic programming, Journal of Industrial and Management Optimization, 4 (2008).
[19] Z. Wei, L. Qi and J. R. Birge, A new method for nonsmooth convex optimization, Journal of Inequalities and Applications, 2 (1998).

Received March 2008; revised September.

E-mail address: hgchew@eleceng.adelaide.edu.au
E-mail address: cclim@eleceng.adelaide.edu.au
