Kernels in Support Vector Machines
Based on lectures of Martin Law, University of Michigan

Non-linearly separable problems
[Figure: AND, OR, NOT(x) and XOR in the input plane]
The XOR problem cannot be solved with a perceptron.
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016 - University of Bologna

With NN: multi-layer feed-forward neural networks
Neurons are organized into hierarchical layers.
Each layer receives its inputs from the previous one and transmits its output to the next one:
$z^{(1)}_j = g\left(\sum_i w^{(1)}_{ij} x_i - \theta^{(1)}_j\right)$
$z^{(2)}_j = g\left(\sum_i w^{(2)}_{ij} z^{(1)}_i - \theta^{(2)}_j\right)$

XOR with a two-layer network
Hidden neuron 1: weights 0.7 and 0.7, threshold 0.5
Hidden neuron 2: weights 0.3 and 0.3, threshold 0.5
Output neuron: weights 0.7 (from hidden neuron 1) and -0.7 (from hidden neuron 2), threshold 0.5
Each neuron computes a = (weighted sum of its inputs) - threshold and outputs z = 1 if a > 0, z = 0 otherwise.

Input (x1, x2) | Hidden neuron 1 | Hidden neuron 2 | Output neuron
(0, 0) | a = -0.5, z = 0 | a = -0.5, z = 0 | a = -0.5, z = 0
(1, 0) | a = 0.2, z = 1 | a = -0.2, z = 0 | a = 0.2, z = 1
(0, 1) | a = 0.2, z = 1 | a = -0.2, z = 0 | a = 0.2, z = 1
(1, 1) | a = 0.9, z = 1 | a = 0.1, z = 1 | a = -0.5, z = 0

The hidden layer REMAPS the input into a new representation that is linearly separable.

Input (x1, x2) | Desired output | Activation of hidden neurons (z1, z2)
(0, 0) | 0 | (0, 0)
(1, 0) | 1 | (1, 0)
(0, 1) | 1 | (1, 0)
(1, 1) | 0 | (1, 1)
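
The remapping can be checked with a few lines of Python. This is a minimal sketch, assuming the weights 0.7/0.7, 0.3/0.3 and 0.7/-0.7 with thresholds of 0.5 as read from the slides, and a step activation; the function names are illustrative only.

```python
def step(a):
    """Threshold activation: 1 if the net input is positive, else 0."""
    return 1 if a > 0 else 0


def neuron(inputs, weights, theta):
    """Weighted sum of the inputs minus the threshold, passed through step()."""
    a = sum(w * x for w, x in zip(weights, inputs)) - theta
    return step(a)


def xor_net(x1, x2):
    """Return the hidden activations and the network output for one input."""
    h1 = neuron((x1, x2), (0.7, 0.7), 0.5)    # fires if at least one input is 1
    h2 = neuron((x1, x2), (0.3, 0.3), 0.5)    # fires only if both inputs are 1
    out = neuron((h1, h2), (0.7, -0.7), 0.5)  # h1 AND NOT h2
    return (h1, h2), out


for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    hidden, out = xor_net(*x)
    print(x, "hidden:", hidden, "output:", out)
```

In the hidden representation the two inputs with desired output 1 collapse onto the same point (1, 0), which a single line separates from (0, 0) and (1, 1).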

Extension to Non-linear Decision Boundary
So far, we have only considered large-margin classifiers with a linear decision boundary.
How can we generalize them to become nonlinear?
Key idea: transform x_i to a higher dimensional space to "make life easier".
  Input space: the space where the points x_i are located.
  Feature space: the space of φ(x_i) after transformation.
Why transform?
  A linear operation in the feature space is equivalent to a nonlinear operation in the input space.
  Classification can become easier with a proper transformation. In the XOR problem, for example, adding the new feature x1 x2 makes the problem linearly separable.
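
XOR in the (X, Y) plane is not linearly separable:

X | Y | XOR
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0

XOR in the (X, Y, XY) space is linearly separable:

X | Y | XY | XOR
0 | 0 | 0 | 0
0 | 1 | 0 | 1
1 | 0 | 0 | 1
1 | 1 | 1 | 0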
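
One way to see that the added feature XY makes XOR linearly separable is to write down a separating plane. The plane X + Y - 2 XY = 0.5 used below is a hypothetical choice made for this sketch, not one given in the slides; many other planes work.

```python
def xor_linear(x, y):
    """Linear classifier in the (X, Y, XY) feature space.

    The weights (1, 1, -2) and the offset 0.5 are assumptions chosen
    for illustration.
    """
    features = (x, y, x * y)
    weights = (1.0, 1.0, -2.0)
    score = sum(w * f for w, f in zip(weights, features)) - 0.5
    return 1 if score > 0 else 0


for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, "->", xor_linear(x, y))
```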

Find a feature space

Transforming the Data
[Figure: points in the input space are mapped by φ(.) to points φ(x) in the feature space]
Note: in practice the feature space is of higher dimension than the input space.
Computation in the feature space can be costly because it is high dimensional.
The feature space can be infinite-dimensional!
The kernel trick comes to the rescue.

The Kernel Trick
Recall the SVM optimization problem:
  maximize $W(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^T x_j$
  subject to $C \ge \alpha_i \ge 0$, $\sum_i \alpha_i y_i = 0$
The data points only appear as scalar products.
As long as we can calculate the inner product in the feature space, we do not need the mapping explicitly.
Many common geometric operations (angles, distances) can be expressed by inner products.
Define the kernel function K by $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$.

An Example for φ(.) and K(.,.)
Suppose φ(.) is given as follows:
  $\phi(x) = (1,\ x_1^2,\ \sqrt{2}\, x_1 x_2,\ x_2^2,\ \sqrt{2}\, x_1,\ \sqrt{2}\, x_2)^T$ for $x = (x_1, x_2)$
An inner product in the feature space is
  $\phi(x)^T \phi(y) = (1 + x_1 y_1 + x_2 y_2)^2$
So, if we define the kernel function as follows, there is no need to carry out φ(.) explicitly:
  $K(x, y) = (1 + x_1 y_1 + x_2 y_2)^2$
This use of a kernel function to avoid carrying out φ(.) explicitly is known as the kernel trick.
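
The equivalence between the explicit inner product and the kernel can be checked numerically. The sketch below assumes the degree-2 polynomial kernel K(x, y) = (1 + x·y)^2 on 2-dimensional vectors and one common ordering of the six feature-space coordinates; the test points are arbitrary.

```python
import math


def dot(u, v):
    """Plain Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit 6-dimensional feature map for the degree-2 polynomial kernel."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (1.0, x1 * x1, r2 * x1 * x2, x2 * x2, r2 * x1, r2 * x2)


def poly2_kernel(x, y):
    """Same inner product, computed without ever building phi(x)."""
    return (1.0 + dot(x, y)) ** 2


x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(y))   # inner product of two 6-dimensional vectors
implicit = poly2_kernel(x, y)    # one 2-dimensional inner product and a square
print(explicit, implicit)
```

The kernel evaluation touches only the 2-dimensional inputs, which is what makes very high dimensional feature spaces affordable.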

Kernels
Given a mapping $x \to \phi(x)$, a kernel is represented as the inner product
  $K(x, y) = \sum_i \phi_i(x)\, \phi_i(y)$
A kernel must satisfy Mercer's condition:
  for every $g(x)$ such that $\int g^2(x)\, dx$ is finite, $\iint K(x, y)\, g(x)\, g(y)\, dx\, dy \ge 0$
This is analogous to positive-semidefinite matrices M, for which
  $z^T M z \ge 0$ for all $z \ne 0$
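
On a finite set of points, Mercer's condition implies that the Gram matrix K_ij = K(x_i, x_j) is positive semidefinite. The sketch below probes the quadratic form z^T K z with random vectors z. It is only a probe: a negative value proves that a function is not a valid kernel, while non-negative values are evidence rather than proof. The "bad" similarity -(x - y)^2 and the sample points are assumptions chosen for illustration.

```python
import random


def gram(kernel, points):
    """Gram matrix K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


def min_quadratic_form(K, trials=2000, seed=0):
    """Smallest value of z^T K z found over random vectors z."""
    rng = random.Random(seed)
    n = len(K)
    best = float("inf")
    for _ in range(trials):
        z = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        q = sum(z[i] * K[i][j] * z[j] for i in range(n) for j in range(n))
        best = min(best, q)
    return best


def poly2(x, y):
    """Degree-2 polynomial kernel on numbers: a valid (Mercer) kernel."""
    return (x * y + 1.0) ** 2


def neg_sq_dist(x, y):
    """A similarity that is NOT positive semidefinite."""
    return -(x - y) ** 2


points = [1.0, 2.0, 4.0, 5.0, 6.0]
print("polynomial kernel:", min_quadratic_form(gram(poly2, points)))
print("negative squared distance:", min_quadratic_form(gram(neg_sq_dist, points)))
```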
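
Modification Due to Kernel Function
Change all inner products to kernel functions.
For training:
  Original: maximize $W(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^T x_j$ subject to $C \ge \alpha_i \ge 0$, $\sum_i \alpha_i y_i = 0$
  With kernel function: maximize $W(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, K(x_i, x_j)$ subject to $C \ge \alpha_i \ge 0$, $\sum_i \alpha_i y_i = 0$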
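
Modification Due to Kernel Function
For testing, the new data point z is classified as class 1 if f > 0, and as class 2 if f < 0.
  Original: $w = \sum_i \alpha_i y_i x_i$, $f = w^T z + b = \sum_i \alpha_i y_i \, x_i^T z + b$
  With kernel function: $f = \sum_i \alpha_i y_i \, K(x_i, z) + b$
The sums run over the support vectors (the points with $\alpha_i > 0$).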
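
The test-time rule can be sketched as a small function that only ever calls the kernel. The toy model below (two 1-D support vectors at -1 and +1 with a linear kernel, α = 0.5 each, b = 0) is an assumption made for illustration; it corresponds to the maximum-margin separator f(z) = z.

```python
def decision_value(z, support_vectors, alphas, labels, b, kernel):
    """f(z) = sum_i alpha_i * y_i * K(x_i, z) + b over the support vectors."""
    return sum(a * y * kernel(x, z)
               for x, a, y in zip(support_vectors, alphas, labels)) + b


def classify(z, *model):
    """Class 1 if f(z) > 0, class 2 otherwise (ties go to class 2 here)."""
    return 1 if decision_value(z, *model) > 0 else 2


def linear_kernel(x, z):
    return x * z


# Hypothetical toy model: (support vectors, alphas, labels, b, kernel).
toy_model = ([-1.0, 1.0], [0.5, 0.5], [-1, 1], 0.0, linear_kernel)

for z in (-2.0, -0.5, 1.0, 3.0):
    print(z, decision_value(z, *toy_model), classify(z, *toy_model))
```

Swapping `linear_kernel` for any other kernel function changes the decision boundary without changing the code of `decision_value`.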

More on Kernel Functions
Since the training of an SVM only requires the value of K(x_i, x_j), there is no restriction on the form of x_i and x_j.
  x_i can be a sequence or a tree, instead of a feature vector.
K(x_i, x_j) is just a similarity measure comparing x_i and x_j.
For a test object z, the discriminant function essentially is a weighted sum of the similarity between z and a preselected set of objects (the support vectors).

Example
Suppose we have 5 1D data points x1 = 1, x2 = 2, x3 = 4, x4 = 5, x5 = 6, with 1, 2, 6 as class 1 and 4, 5 as class 2:
  y1 = 1, y2 = 1, y3 = -1, y4 = -1, y5 = 1
[Figure: the five points on the number line; 1 and 2 are class 1, 4 and 5 are class 2, 6 is class 1]
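
Example
We use the polynomial kernel of degree 2: $K(x, y) = (xy + 1)^2$
C is set to 100.
We first find $\alpha_i$ (i = 1, ..., 5) by
  maximizing $\sum_{i=1}^{5} \alpha_i - \frac{1}{2} \sum_{i=1}^{5} \sum_{j=1}^{5} \alpha_i \alpha_j y_i y_j (x_i x_j + 1)^2$
  subject to $100 \ge \alpha_i \ge 0$, $\sum_{i=1}^{5} \alpha_i y_i = 0$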

Example
By using a QP solver, we get α1 = 0, α2 = 2.5, α3 = 0, α4 = 7.333, α5 = 4.833.
Note that the constraints are indeed satisfied.
The support vectors are {x2 = 2, x4 = 5, x5 = 6}.
The discriminant function is
  $f(z) = 2.5\,(1)(2z + 1)^2 + 7.333\,(-1)(5z + 1)^2 + 4.833\,(1)(6z + 1)^2 + b = 0.6667\, z^2 - 5.333\, z + b$
b is recovered by solving f(2) = 1 or f(5) = -1 or f(6) = 1. All three give b = 9, so
  $f(z) = 0.6667\, z^2 - 5.333\, z + 9$
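
The numbers of this example can be re-derived in a few lines. The sketch below does not solve the QP; it takes the reported multipliers as given and recovers b from the three support vectors. It assumes that the rounded values 7.333 and 4.833 stand for 22/3 and 29/6, which makes the constraint Σ α_i y_i = 0 hold exactly.

```python
xs = [1.0, 2.0, 4.0, 5.0, 6.0]
ys = [1, 1, -1, -1, 1]
# Multipliers from the QP solver; 22/3 and 29/6 are assumed exact values
# behind the rounded 7.333 and 4.833.
alphas = [0.0, 2.5, 0.0, 22.0 / 3.0, 29.0 / 6.0]


def kernel(x, y):
    """Polynomial kernel of degree 2 on numbers."""
    return (x * y + 1.0) ** 2


def f_without_bias(z):
    """sum_i alpha_i * y_i * K(x_i, z), i.e. f(z) - b."""
    return sum(a * y * kernel(x, z) for a, y, x in zip(alphas, ys, xs))


# Support vectors lie on the margin, so y_i = f(x_i) and b = y_i - (f(x_i) - b).
support = [i for i, a in enumerate(alphas) if a > 0]
b_estimates = [ys[i] - f_without_bias(xs[i]) for i in support]
b = sum(b_estimates) / len(b_estimates)


def f(z):
    return f_without_bias(z) + b


print("b estimates:", b_estimates)
for x, y in zip(xs, ys):
    print(x, y, round(f(x), 3))
```

The sign of f(x_i) agrees with y_i for all five training points, including the two that are not support vectors.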

Example
[Figure: value of the discriminant function f(z) along the number line; f(z) > 0 around the class 1 points 1, 2 and 6, f(z) < 0 around the class 2 points 4 and 5]

Kernel Functions
In practical use of SVM, the user specifies the kernel function; the transformation φ(.) is not explicitly stated.
Given a kernel function K(x_i, x_j), the transformation φ(.) is given by its eigenfunctions (a concept in functional analysis).
  Eigenfunctions can be difficult to construct explicitly.
  This is why people only specify the kernel function without worrying about the exact transformation.
Another view: the kernel function, being a scalar product, is really a similarity measure between the objects.

A kernel is associated with a transformation
Given a kernel, it should in principle be possible to recover the feature-space transformation that originates it.
  $K(x, y) = (xy + 1)^2 = x^2 y^2 + 2xy + 1$
If x and y are numbers, it corresponds to the transformation
  $\phi(x) = (x^2,\ \sqrt{2}\, x,\ 1)^T$
What if x and y are 2-dimensional vectors?
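
A kernel is associated with a transformation
For 2-dimensional vectors $x = (x_1, x_2)$ and $y = (y_1, y_2)$:
  $K(x, y) = (x_1 y_1 + x_2 y_2 + 1)^2 = x_1^2 y_1^2 + x_2^2 y_2^2 + 2 x_1 x_2 y_1 y_2 + 2 x_1 y_1 + 2 x_2 y_2 + 1$
  $\phi(x) = (1,\ x_1^2,\ \sqrt{2}\, x_1 x_2,\ x_2^2,\ \sqrt{2}\, x_1,\ \sqrt{2}\, x_2)^T$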
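
XOR
Simple example (XOR problem)
  $L(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$

Input vector | Y
[-1, -1] | -1
[-1, +1] | +1
[+1, -1] | +1
[+1, +1] | -1

  $K(x_i, x_j) = (1 + x_i^T x_j)^2$

K | (-1,-1) | (-1,+1) | (+1,-1) | (+1,+1)
(-1,-1) | 9 | 1 | 1 | 1
(-1,+1) | 1 | 9 | 1 | 1
(+1,-1) | 1 | 1 | 9 | 1
(+1,+1) | 1 | 1 | 1 | 9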
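
XOR
  $L(\alpha) = \alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 - \frac{1}{2} (9\alpha_1^2 - 2\alpha_1\alpha_2 - 2\alpha_1\alpha_3 + 2\alpha_1\alpha_4 + 9\alpha_2^2 + 2\alpha_2\alpha_3 - 2\alpha_2\alpha_4 + 9\alpha_3^2 - 2\alpha_3\alpha_4 + 9\alpha_4^2)$
Setting the derivatives to zero:
  $\partial L / \partial \alpha_1 = 0 \Rightarrow 9\alpha_1 - \alpha_2 - \alpha_3 + \alpha_4 = 1$
  $\partial L / \partial \alpha_2 = 0 \Rightarrow -\alpha_1 + 9\alpha_2 + \alpha_3 - \alpha_4 = 1$
  $\partial L / \partial \alpha_3 = 0 \Rightarrow -\alpha_1 + \alpha_2 + 9\alpha_3 - \alpha_4 = 1$
  $\partial L / \partial \alpha_4 = 0 \Rightarrow \alpha_1 - \alpha_2 - \alpha_3 + 9\alpha_4 = 1$
Solution: $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 1/8$, with $L = 1/4$.
The four input vectors are all support vectors.
  $w = \sum_{i=1}^{N} \alpha_i y_i \phi(x_i) = [0,\ 0,\ -1/\sqrt{2},\ 0,\ 0,\ 0]^T$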
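
XOR
  $\phi(x) = [1,\ x_1^2,\ \sqrt{2}\, x_1 x_2,\ x_2^2,\ \sqrt{2}\, x_1,\ \sqrt{2}\, x_2]^T$
  $w = \sum_{i=1}^{N} \alpha_i y_i \phi(x_i) = [0,\ 0,\ -1/\sqrt{2},\ 0,\ 0,\ 0]^T$
Only the component along $\sqrt{2}\, x_1 x_2$ is non-zero, so the decision function is $f(x) = w^T \phi(x) = -x_1 x_2$.

Input vector | Y | x1 x2
[-1, -1] | -1 | +1
[-1, +1] | +1 | -1
[+1, -1] | +1 | -1
[+1, +1] | -1 | +1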
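
The XOR solution can be reproduced numerically. The sketch below sets the gradient of the dual to zero and solves the resulting 4 x 4 linear system with a small Gauss-Jordan routine. It ignores the constraints α_i ≥ 0 and Σ α_i y_i = 0 while solving, which is acceptable here only because the unconstrained solution happens to satisfy them. The ordering of the feature-space coordinates is an assumption.

```python
import math

X = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
Y = [-1, 1, 1, -1]


def kernel(u, v):
    """Polynomial kernel of degree 2: (1 + u.v)^2."""
    return (1 + u[0] * v[0] + u[1] * v[1]) ** 2


def phi(u):
    """Explicit feature map matching the kernel above (assumed ordering)."""
    r2 = math.sqrt(2.0)
    return [1.0, u[0] ** 2, r2 * u[0] * u[1], u[1] ** 2, r2 * u[0], r2 * u[1]]


def solve(A, rhs):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [[float(v) for v in row] + [float(r)] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


# dL/d(alpha) = 0  <=>  H alpha = 1, with H_ij = y_i y_j K(x_i, x_j).
H = [[Y[i] * Y[j] * kernel(X[i], X[j]) for j in range(4)] for i in range(4)]
alpha = solve(H, [1, 1, 1, 1])

w = [sum(alpha[i] * Y[i] * phi(X[i])[k] for i in range(4)) for k in range(6)]
decisions = [sum(wk * pk for wk, pk in zip(w, phi(x))) for x in X]

print("alpha:", alpha)
print("w:", [round(v, 4) for v in w])
print("f(x):", [round(d, 4) for d in decisions])
```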

Examples of Kernel Functions
Polynomial kernel up to degree d: $K(x, y) = (x^T y + 1)^d$
Radial basis function kernel with width σ: $K(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$
Sigmoid with parameters κ and θ: $K(x, y) = \tanh(\kappa\, x^T y + \theta)$
  It does not satisfy the Mercer condition for all κ and θ.

Polynomial kernel
[Figure from Bishop C., Pattern Recognition and Machine Learning, Springer]
Source: http://www.ivanciuc.org/
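
Examples of Kernel Functions
Radial basis function (or Gaussian) kernel with width σ:
  $K(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right) = \exp\left(-\frac{x \cdot x}{2\sigma^2}\right) \exp\left(\frac{x \cdot y}{\sigma^2}\right) \exp\left(-\frac{y \cdot y}{2\sigma^2}\right)$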
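
Examples of Kernel Functions
With 1-dimensional vectors:
  $K(x, y) = \exp\left(-\frac{x^2}{2\sigma^2}\right) \exp\left(\frac{xy}{\sigma^2}\right) \exp\left(-\frac{y^2}{2\sigma^2}\right)$
It corresponds to the scalar product in the infinite-dimensional feature space:
  $\phi(x) = \exp\left(-\frac{x^2}{2\sigma^2}\right) \left[1,\ \frac{x}{\sigma},\ \frac{x^2}{\sqrt{2!}\,\sigma^2},\ \frac{x^3}{\sqrt{3!}\,\sigma^3},\ \ldots,\ \frac{x^n}{\sqrt{n!}\,\sigma^n},\ \ldots\right]^T$
For vectors in m dimensions the feature space is more complicated.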
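
The infinite-dimensional feature map of the 1-D Gaussian kernel can be approximated by truncating the series. The sketch below assumes the convention K(x, y) = exp(-(x - y)^2 / (2σ^2)); the number of retained terms (20) and the test values are arbitrary choices for illustration.

```python
import math


def rbf(x, y, sigma):
    """Gaussian kernel on numbers."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def phi_truncated(x, sigma, n_terms):
    """First n_terms coordinates of the infinite-dimensional feature map."""
    prefactor = math.exp(-(x * x) / (2.0 * sigma ** 2))
    return [prefactor * x ** n / (math.sqrt(math.factorial(n)) * sigma ** n)
            for n in range(n_terms)]


x, y, sigma = 0.5, -1.2, 1.0
approx = sum(a * b for a, b in zip(phi_truncated(x, sigma, 20),
                                   phi_truncated(y, sigma, 20)))
exact = rbf(x, y, sigma)
print(approx, exact)
```

The truncated inner product converges to the kernel value because the terms are those of the Taylor series of exp(xy / σ^2).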

Without slack variables
[Figure from Bishop C., Pattern Recognition and Machine Learning, Springer]

With slack variables
[Figure from Bishop C., Pattern Recognition and Machine Learning, Springer]

Gaussian RBF kernel
[Figure from Bishop C., Pattern Recognition and Machine Learning, Springer]

Building new kernels
If k1(x, y) and k2(x, y) are two valid kernels, then the following kernels are valid:
  Linear combination (c1, c2 ≥ 0): $k(x, y) = c_1 k_1(x, y) + c_2 k_2(x, y)$
  Exponential: $k(x, y) = \exp(k_1(x, y))$
  Product: $k(x, y) = k_1(x, y)\, k_2(x, y)$
  Polynomial transformation (Q: polynomial with non-negative coefficients): $k(x, y) = Q(k_1(x, y))$
  Function product (f: any function): $k(x, y) = f(x)\, k_1(x, y)\, f(y)$
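
These closure rules can be written as small higher-order functions. As an illustration, the sketch below builds the Gaussian kernel from the linear kernel by applying a positive scaling, the exponential rule and the function-product rule, and compares it with the direct formula. The helper names and the value σ = 1.5 are assumptions made for this sketch.

```python
import math


def linear(x, y):
    """Linear kernel: plain inner product."""
    return sum(a * b for a, b in zip(x, y))


def scale(k, c):
    """c * k is a valid kernel for c >= 0."""
    return lambda x, y: c * k(x, y)


def exponential(k):
    """exp(k) is a valid kernel."""
    return lambda x, y: math.exp(k(x, y))


def function_product(k, f):
    """f(x) * k(x, y) * f(y) is a valid kernel for any function f."""
    return lambda x, y: f(x) * k(x, y) * f(y)


sigma = 1.5


def damp(x):
    """f(x) = exp(-x.x / (2 sigma^2))."""
    return math.exp(-linear(x, x) / (2.0 * sigma ** 2))


# exp(-|x|^2 / 2s^2) * exp(x.y / s^2) * exp(-|y|^2 / 2s^2) = Gaussian kernel
built = function_product(exponential(scale(linear, 1.0 / sigma ** 2)), damp)


def gaussian(x, y):
    """Direct formula, for comparison."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


u, v = (0.3, -1.0, 2.0), (1.1, 0.4, -0.7)
print(built(u, v), gaussian(u, v))
```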

Choosing the Kernel Function
Probably the trickiest part of using SVM.
The kernel function is important because it creates the kernel matrix, which summarizes all the data.
Many principles have been proposed (diffusion kernel, Fisher kernel, string kernel, ...).
There is even research to estimate the kernel matrix from available information.
In practice, a low-degree polynomial kernel or an RBF kernel with a reasonable width is a good initial try.
Note that SVM with RBF kernel is closely related to RBF neural networks, with the centers of the radial basis functions automatically chosen for SVM.

Kernels can also be defined for structures other than vectors
Computational biology often deals with structures different from vectors:
  Sequences (DNA, RNA, proteins)
  Trees (phylogenetic relationships)
  Graphs (interaction networks)
  3-D structures (proteins)
Is it possible to build kernels for those structures?
  Transform the data onto a feature space made of n-dimensional real vectors and then compute the scalar product.
  Write a kernel without explicitly writing the feature space (but... what is a kernel?)

Defining kernels without defining the feature transformation
What does a kernel represent?
Distance in feature space: $\|\phi(x) - \phi(y)\|^2 = K(x, x) - 2K(x, y) + K(y, y)$
A kernel is a SIMILARITY measure.
Moreover, it has to fulfil a «positivity» condition.

Spectral kernel for sequences
Given a DNA sequence x, we can count the number of bases (4-D feature space):
  $\phi_1(x) = (n_A, n_C, n_G, n_T)$
Or the number of dimers (16-D space):
  $\phi_2(x) = (n_{AA}, n_{AC}, n_{AG}, n_{AT}, n_{CA}, n_{CC}, n_{CG}, n_{CT}, \ldots)$
Or l-mers ($4^l$-D space).
The spectral kernel is
  $k_l(x, y) = \phi_l(x) \cdot \phi_l(y)$
l is usually lower than
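
A spectral kernel is short to implement because the l-mer counts can be stored sparsely instead of as a 4^l-dimensional vector. The sketch below is a minimal version using overlapping l-mers; the function names and the example sequences are illustrative only.

```python
from collections import Counter


def spectrum(seq, l):
    """Sparse l-mer count vector of a sequence (overlapping windows)."""
    return Counter(seq[i:i + l] for i in range(len(seq) - l + 1))


def spectral_kernel(x, y, l):
    """Inner product of the two l-mer count vectors."""
    cx, cy = spectrum(x, l), spectrum(y, l)
    # Only l-mers present in x can contribute; a Counter returns 0 for
    # l-mers that are absent from y.
    return sum(count * cy[mer] for mer, count in cx.items())


print(spectral_kernel("ACGTACGT", "ACGTTTTT", 2))
print(spectral_kernel("AAC", "AAG", 2))
```

No alignment of the two sequences is needed, and sequences of different lengths can be compared directly.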

Kernels out of generative models
Given a generative model associating a probability p(x | θ) to a given input x, we define:
  $K(x, y) = p(x \mid \theta)\, p(y \mid \theta)$
Fisher kernel:
  $g(\theta, x) = \nabla_\theta \ln p(x \mid \theta)$
  $F = E[g(\theta, x)\, g(\theta, x)^T] \approx \frac{1}{N} \sum_{n=1}^{N} g(\theta, x_n)\, g(\theta, x_n)^T$
  $K(x, y) = g(\theta, x)^T F^{-1} g(\theta, y)$

Other Aspects of SVM
How to use SVM for multi-class classification?
  One can change the QP formulation to become multi-class.
  More often, multiple binary classifiers are combined.
  One can train multiple one-versus-all classifiers, or combine multiple pairwise classifiers "intelligently".

Other Aspects of SVM
How to interpret the SVM discriminant function value as a probability?
  By performing logistic regression on the SVM output of a set of data (validation set) that is not used for training.
Some SVM software (like libsvm) has these features built in.
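
Fitting a logistic curve to SVM outputs is commonly known as Platt scaling. The sketch below is a simplified version: it fits P(class 1 | f) = 1 / (1 + exp(A f + B)) by plain gradient descent on the log-loss, without the target smoothing used in the full method. The validation scores and labels are made-up numbers for illustration, and the learning rate and epoch count are arbitrary.

```python
import math


def fit_sigmoid(scores, labels, lr=0.1, epochs=5000):
    """Fit A, B in P(y=+1 | f) = 1 / (1 + exp(A*f + B)) by gradient descent."""
    A, B = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_A = grad_B = 0.0
        for f, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(A * f + B))
            t = 1.0 if y == 1 else 0.0
            # Gradient of the negative log-likelihood with respect to A and B.
            grad_A += (t - p) * f
            grad_B += (t - p)
        A -= lr * grad_A / n
        B -= lr * grad_B / n
    return A, B


def probability(f, A, B):
    """Calibrated probability of class +1 for an SVM output f."""
    return 1.0 / (1.0 + math.exp(A * f + B))


# Hypothetical SVM outputs on a held-out validation set, with true labels.
val_scores = [-2.0, -1.5, -0.5, 0.4, -0.3, 0.6, 1.2, 2.2]
val_labels = [-1, -1, -1, -1, 1, 1, 1, 1]

A, B = fit_sigmoid(val_scores, val_labels)
for f in (-2.0, 0.0, 2.0):
    print(f, round(probability(f, A, B), 3))
```

A negative fitted A means that larger discriminant values map to higher probabilities of class +1, as expected.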

Software
A list of SVM implementations can be found at http://www.kernel-machines.org/software.html
Some implementations (such as LIBSVM) can handle multi-class classification.
SVMLight is among the earliest implementations of SVM.
Several Matlab toolboxes for SVM are also available.

Summary: Steps for Classification
Prepare the pattern matrix.
Select the kernel function to use.
Select the parameter of the kernel function and the value of C.
  You can use the values suggested by the SVM software, or you can set apart a validation set to determine the values of the parameters.
Execute the training algorithm and obtain the α_i.
Unseen data can be classified using the α_i and the support vectors.

Strengths and Weaknesses of SVM
Strengths
  Training is relatively easy.
  No local optima, unlike in neural networks.
  It scales relatively well to high dimensional data.
  The tradeoff between classifier complexity and error can be controlled explicitly.
  Non-traditional data like strings and trees can be used as input to SVM, instead of feature vectors.
Weaknesses
  Need to choose a "good" kernel function.

Other Types of Kernel Methods
A lesson learnt from SVM: a linear algorithm in the feature space is equivalent to a non-linear algorithm in the input space.
Standard linear algorithms can be generalized to their nonlinear versions by going to the feature space.
  Kernel principal component analysis, kernel independent component analysis, kernel canonical correlation analysis, kernel k-means and 1-class SVM are some examples.