An Introduction to Support Vector Machines
Presenter: 黃立德
References:
- Simon Haykin, "Neural Networks: A Comprehensive Foundation", 2nd edition, 1999, Chapters 2, 6
- Nello Cristianini, John Shawe-Taylor, "An Introduction to Support Vector Machines", 2000, Chapters 3-6
Outline
- Drawbacks of learning
- Overview of SVM
- The Empirical Risk Minimization Principle
- VC-dimension
- Structural Risk Minimization
- Linearly separable patterns
- Non-linearly separable patterns
- How to build a SVM for pattern recognition
- Example: XOR problem
- Properties and expansions of SVM
- Conclusion
- Applications of SVM
- LIBSVM
Drawbacks of learning
- The class of functions from which the input/output mapping must be sought has to be chosen in advance.
- Learning in three-node neural networks is known to be NP-complete.
Drawbacks of learning (cont.)
In practice, the following problems arise:
- The learning algorithm may prove inefficient, as for example in the case of local minima.
- The size of the output hypothesis can frequently become very large and impractical.
- If there are only a limited number of training examples, the hypothesis found by the learning algorithm will lead to overfitting and hence poor generalization.
- The learning algorithm is usually controlled by a large number of parameters that are often chosen by tuning heuristics, making the system difficult and unreliable to use.
Overview of SVM
What is SVM?
- A linear machine with some very nice properties.
- The goal is to construct a decision surface such that the margin of separation between positive and negative samples is maximized.
- SVM is a learning system that uses a hypothesis space of linear functions in a high-dimensional feature space, trained with a learning algorithm from optimization theory that implements a learning bias derived from statistical learning theory.
The Empirical Risk Minimization Principle
- Given a set of data $(x_1, y_1), \ldots, (x_N, y_N)$, with $x_i \in R^n$ and $y_i \in \{-1, +1\}$.
- Also given a set of decision functions $\{f_\lambda : \lambda \in I\}$, where $f_\lambda : R^n \to \{-1, +1\}$.
- The expected risk is
  $R(\lambda) = \int |f_\lambda(x) - y| \, dP(x, y)$
The Empirical Risk Minimization Principle (cont.)
- The approximation (empirical risk):
  $R_{emp}(\lambda) = \frac{1}{N} \sum_{i=1}^{N} |f_\lambda(x_i) - y_i|$
- Theory of uniform convergence in probability:
  $\lim_{N \to \infty} P\{\sup_{\lambda \in I} (R(\lambda) - R_{emp}(\lambda)) > \varepsilon\} = 0, \quad \forall \varepsilon > 0$
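As a concrete illustration, a minimal Python sketch of the empirical risk above; the toy data, the fixed decision rule f, and the helper name empirical_risk are illustrative assumptions rather than anything from the slides:

```python
import numpy as np

# Minimal sketch: the empirical risk R_emp(lambda) = (1/N) * sum_i |f(x_i) - y_i|
# for labels y_i in {-1, +1}; each misclassified point contributes |f - y| = 2 here.
def empirical_risk(f_lambda, X, y):
    preds = np.array([f_lambda(x) for x in X])
    return np.mean(np.abs(preds - y))

# Illustrative toy sample and a fixed linear decision rule
X = np.array([[1.0, 2.0], [-1.0, -0.5], [2.0, 1.0], [-2.0, 0.0]])
y = np.array([1, -1, 1, -1])
f = lambda x: 1 if x[0] + x[1] > 0 else -1
print(empirical_risk(f, X, y))   # 0.0 here, since every point is classified correctly
```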
Vapnik-Chervonenkis dimension (VC-dimension)
- It is a measure of the capacity, or expressive power, of the family of classification functions realized by the learning machine.
Structural Risk Minimization
- Let $I_k$ be a subset of $I$, and $S_k = \{f_\lambda : \lambda \in I_k\}$.
- Define a structure of nested subsets $S_1 \subset S_2 \subset \ldots \subset S_n \subset \ldots$
- Each subset satisfies the condition $h_1 \le h_2 \le \ldots \le h_n \le \ldots$, where $h_i$ is the VC dimension of $S_i$.
Structural Risk Minimization (cont.)
- Implementing SRM can be difficult because the VC dimension of $S_n$ could be hard to compute.
- SRM selects the subset, and the function within it, that minimizes the guaranteed risk
  $\min_n \left[ R_{emp}(\lambda) + \sqrt{h_n / N} \right]$
- Support Vector Machines (SVM) are able to achieve the goal of minimizing the upper bound of $R(\lambda)$ by minimizing a bound on the VC dimension $h$ and $R_{emp}(\lambda)$ at the same time.
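For reference, the term $\sqrt{h_n / N}$ above abbreviates the VC confidence interval; the full bound from statistical learning theory (not spelled out on the original slide) states that, with probability $1 - \eta$,

$$R(\lambda) \;\le\; R_{emp}(\lambda) \;+\; \sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) - \ln\frac{\eta}{4}}{N}}$$

where $h$ is the VC dimension of the function class and $N$ is the number of training examples.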
Concepts of SVM
- SVM is an approximate implementation of the method of structural risk minimization.
- It does not incorporate problem-domain knowledge.
Linearly separable patterns
- A training sample $\{(x_i, d_i)\}_{i=1}^{N}$; the patterns are linearly separable.
- The equation of a decision surface that does the separation is
  $w^T x + b = 0$
- And we can write
  $w^T x_i + b \ge 0$ for $d_i = +1$
  $w^T x_i + b < 0$ for $d_i = -1$
- After rescaling $w$ and $b$ so that the points closest to the surface satisfy the equality, this becomes
  $d_i (w^T x_i + b) \ge 1$ for $i = 1, 2, \ldots, N$
Linearly separable patterns (cont.)
- The discriminant function of the optimal hyperplane is
  $g(x) = w_0^T x + b_0$
- Maximum margin rule: we select the hyperplane that maximizes the margin of separation to the nearest data points; those nearest points are the support vectors.
Linearly separable patterns (cont.)
[Figure: the optimal hyperplane $w_0^T x + b_0 = 0$ together with the two margin hyperplanes $w_0^T x + b_0 = +1$ and $w_0^T x + b_0 = -1$]
- Margin: $2 / \|w_0\|$ (each margin hyperplane lies at distance $1 / \|w_0\|$ from the separating hyperplane)
Linearly separable patterns (cont.)
- The final goal is to minimize the cost function
  $\Phi(w) = \frac{1}{2} w^T w$
- We may solve the constrained optimization problem using the method of Lagrange multipliers.
- Lagrangian function:
  $J(w, b, \alpha) = \frac{1}{2} w^T w - \sum_{i=1}^{N} \alpha_i \left[ d_i (w^T x_i + b) - 1 \right]$
  where the $\alpha_i$ are the Lagrange multipliers.
Linearly separable patterns (cont.): dual form
- Given the training sample $\{(x_i, d_i)\}_{i=1}^{N}$, find the Lagrange multipliers $\{\alpha_i\}_{i=1}^{N}$ that maximize the objective function
  $Q(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j x_i^T x_j$
- subject to the constraints
  (1) $\sum_{i=1}^{N} \alpha_i d_i = 0$
  (2) $\alpha_i \ge 0$ for $i = 1, 2, \ldots, N$
Linearly separable patterns (cont.)
- Having determined the optimum Lagrange multipliers, we can compute the optimum weight vector and bias:
  $w_0 = \sum_{i=1}^{N} \alpha_{0,i} d_i x_i$
  $b_0 = 1 - w_0^T x^{(s)}$, where $x^{(s)}$ is any support vector with $d^{(s)} = 1$
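A small numerical sketch of the dual problem on the previous slide and of recovering $w_0$ and $b_0$ as above; the four toy training points are illustrative, and SciPy's general-purpose SLSQP solver stands in for a dedicated QP solver:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative linearly separable sample (not from the slides)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -2.5]])
d = np.array([1.0, 1.0, -1.0, -1.0])
N = len(d)

H = (d[:, None] * d[None, :]) * (X @ X.T)           # H_ij = d_i d_j x_i^T x_j
neg_Q = lambda a: -(a.sum() - 0.5 * a @ H @ a)      # maximize Q(alpha) <=> minimize -Q(alpha)

res = minimize(neg_Q, np.zeros(N), method="SLSQP",
               bounds=[(0.0, None)] * N,                              # alpha_i >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ d}])  # sum_i alpha_i d_i = 0
alpha = res.x

w0 = (alpha * d) @ X                                 # w_0 = sum_i alpha_i d_i x_i
s = int(np.argmax((alpha > 1e-6) & (d > 0)))         # index of a support vector with d^(s) = +1
b0 = 1.0 - w0 @ X[s]                                 # b_0 = 1 - w_0^T x^(s)
print(w0, b0, d * (X @ w0 + b0))                     # every margin d_i (w_0^T x_i + b_0) >= 1
```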
Non-linearly separable patterns
- Allow training errors.
- The definition of the decision surface becomes
  $d_i (w^T x_i + b) \ge 1 - \xi_i$ for $i = 1, 2, \ldots, N$
  with slack variables $\xi_i \ge 0$.
- Finally, we only have to minimize the following function:
  $\Phi(w, \xi) = \frac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i$
Non-linearly separable patterns (cont.): dual form
- Given the training sample $\{(x_i, d_i)\}_{i=1}^{N}$, find the Lagrange multipliers $\{\alpha_i\}_{i=1}^{N}$ that maximize the objective function
  $Q(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j x_i^T x_j$
- subject to the constraints
  (1) $\sum_{i=1}^{N} \alpha_i d_i = 0$
  (2) $0 \le \alpha_i \le C$ for $i = 1, 2, \ldots, N$
  where $C$ is a user-specified positive parameter.
Non-linearly separable patterns (cont.)
- After the optimum Lagrange multipliers have been determined, we can compute the optimum weight vector and bias:
  $w_0 = \sum_{i=1}^{N_s} \alpha_{0,i} d_i x_i$, where $N_s$ is the number of support vectors
  $b_0 = 1 - w_0^T x^{(s)}$ for $d^{(s)} = 1$
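As a sketch of the soft-margin formulation in practice, scikit-learn's SVC (which is built on LIBSVM) exposes exactly this C parameter and returns the bounded dual coefficients; the overlapping toy data below are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

# Deliberately overlapping toy data so that some slack variables xi_i are non-zero
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.5, 0.4],      # mostly the +1 region
              [-1.0, -1.0], [-2.0, -2.5], [0.4, 0.5]])  # mostly the -1 region
d = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1.0).fit(X, d)
print(clf.coef_, clf.intercept_)    # the learned w_0 and b_0
print(np.abs(clf.dual_coef_))       # |alpha_i d_i| for the support vectors, each bounded by C
```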
How to build a SVM for pattern recognition
Steps for constructing a SVM:
- Nonlinear mapping of input vectors into a high-dimensional feature space that is hidden from both the input and the output.
- Construction of an optimal hyperplane for separating the features.
How to build a SVM for pattern recognition (cont.)
- $x$ denotes a vector from the input space.
- $\{\varphi_j(x)\}_{j=1}^{m}$ denotes a set of nonlinear mappings from the input space to the feature space.
- Define a hyperplane as follows:
  $\sum_{j=1}^{m} w_j \varphi_j(x) + b = 0$
- Define the vector
  $\varphi(x) = [\varphi_0(x), \varphi_1(x), \ldots, \varphi_m(x)]^T$, with $\varphi_0(x) = 1$,
  so that the bias $b$ is absorbed as the weight $w_0$ and the hyperplane becomes
  $\sum_{j=0}^{m} w_j \varphi_j(x) = 0$
How to build a SVM for pattern recognition (cont.)
- We can write the equation in the compact form
  $w^T \varphi(x) = 0$   (Eq. 1)
- Because the features are linearly separable, we may write
  $w = \sum_{i=1}^{N} \alpha_i d_i \varphi(x_i)$   (Eq. 2)
- Substituting Eq. 2 into Eq. 1, we get
  $\sum_{i=1}^{N} \alpha_i d_i \varphi^T(x_i) \varphi(x) = 0$
How to build a SVM for pattern recognition (cont.)
- Define the inner-product kernel, denoted by
  $K(x, x_i) = \varphi^T(x) \varphi(x_i) = \sum_{j=0}^{m} \varphi_j(x) \varphi_j(x_i)$ for $i = 1, \ldots, N$
- Now we may use the inner-product kernel to construct the optimal decision surface in the feature space without considering the feature space in explicit form:
  $\sum_{i=1}^{N} \alpha_i d_i K(x, x_i) = 0$
How to build a SVM for pattern recognition (cont.)
Optimum design of a SVM (dual form)
- Given the training sample $\{(x_i, d_i)\}_{i=1}^{N}$, find the Lagrange multipliers $\{\alpha_i\}_{i=1}^{N}$ that maximize the objective function
  $Q(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j K(x_i, x_j)$
- subject to the constraints
  (1) $\sum_{i=1}^{N} \alpha_i d_i = 0$
  (2) $0 \le \alpha_i \le C$ for $i = 1, 2, \ldots, N$
  where $C$ is a user-specified positive parameter.
How to build a SVM for pattern recognition (cont.)
- We may view $K(x_i, x_j)$ as the $ij$-th element of a symmetric N-by-N matrix $K$:
  $K = \{K(x_i, x_j)\}_{i,j=1}^{N}$
- Having found the optimum values $\alpha_{0,i}$, we can get
  $w_0 = \sum_{i=1}^{N} \alpha_{0,i} d_i \varphi(x_i)$
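A short sketch of working with the Gram matrix explicitly: the quadratic kernel of the XOR example that follows is evaluated into the symmetric N-by-N matrix K, which scikit-learn's SVC can consume directly through its precomputed-kernel mode (the data choice and C value are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# The four XOR patterns used in the example on the next slides
X = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
d = np.array([-1, 1, 1, -1])

K = (1.0 + X @ X.T) ** 2                        # Gram matrix for K(x, x_i) = (1 + x^T x_i)^2; K == K.T
clf = SVC(kernel="precomputed", C=1e6).fit(K, d)

# To classify a new point x, supply its row of kernel values K(x, x_i) against the training set
x_new = np.array([[0.5, -0.5]])
k_row = (1.0 + x_new @ X.T) ** 2
print(clf.predict(k_row))
```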
Example: XOR problem
- The four XOR input patterns $(-1, -1)$, $(-1, +1)$, $(+1, -1)$, $(+1, +1)$ have desired responses $-1$, $+1$, $+1$, $-1$ respectively.
- First, we choose the kernel as
  $K(x, x_i) = (1 + x^T x_i)^2$
- With $x = [x_1, x_2]^T$ and $x_i = [x_{i1}, x_{i2}]^T$, we get
  $K(x, x_i) = 1 + x_1^2 x_{i1}^2 + 2 x_1 x_2 x_{i1} x_{i2} + x_2^2 x_{i2}^2 + 2 x_1 x_{i1} + 2 x_2 x_{i2}$
  $\varphi(x) = [1, x_1^2, \sqrt{2}\, x_1 x_2, x_2^2, \sqrt{2}\, x_1, \sqrt{2}\, x_2]^T$
  $\varphi(x_i) = [1, x_{i1}^2, \sqrt{2}\, x_{i1} x_{i2}, x_{i2}^2, \sqrt{2}\, x_{i1}, \sqrt{2}\, x_{i2}]^T$, $i = 1, 2, 3, 4$
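A quick numerical check (with illustrative points, not from the slides) that this quadratic kernel really equals the inner product of the feature vectors $\varphi$ defined above:

```python
import numpy as np

# phi maps a 2-D input into the 6-D feature space of the slide
def phi(x):
    x1, x2 = x
    return np.array([1.0, x1**2, np.sqrt(2)*x1*x2, x2**2, np.sqrt(2)*x1, np.sqrt(2)*x2])

x  = np.array([0.3, -0.7])
xi = np.array([-1.0, 0.5])
print((1 + x @ xi) ** 2)        # kernel value K(x, x_i)
print(phi(x) @ phi(xi))         # identical value: phi(x)^T phi(x_i)
```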
Example: XOR problem (cont.)
- We also find that
  $K = \begin{bmatrix} 9 & 1 & 1 & 1 \\ 1 & 9 & 1 & 1 \\ 1 & 1 & 9 & 1 \\ 1 & 1 & 1 & 9 \end{bmatrix}$
- The objective function for the dual form is
  $Q(\alpha) = \alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 - \frac{1}{2} (9\alpha_1^2 - 2\alpha_1\alpha_2 - 2\alpha_1\alpha_3 + 2\alpha_1\alpha_4 + 9\alpha_2^2 + 2\alpha_2\alpha_3 - 2\alpha_2\alpha_4 + 9\alpha_3^2 - 2\alpha_3\alpha_4 + 9\alpha_4^2)$
Example: XOR problem (cont.)
- Optimizing $Q(\alpha)$ with respect to the Lagrange multipliers (setting $\partial Q(\alpha) / \partial \alpha_i = 0$) yields the system of equations
  $9\alpha_1 - \alpha_2 - \alpha_3 + \alpha_4 = 1$
  $-\alpha_1 + 9\alpha_2 + \alpha_3 - \alpha_4 = 1$
  $-\alpha_1 + \alpha_2 + 9\alpha_3 - \alpha_4 = 1$
  $\alpha_1 - \alpha_2 - \alpha_3 + 9\alpha_4 = 1$
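The four equations form a small linear system; a sketch solving it numerically with NumPy confirms the multipliers reported on the next slide:

```python
import numpy as np

# Stationarity conditions of Q(alpha) for the XOR example, written as A alpha = b
A = np.array([[ 9, -1, -1,  1],
              [-1,  9,  1, -1],
              [-1,  1,  9, -1],
              [ 1, -1, -1,  9]], dtype=float)
b = np.ones(4)
alpha = np.linalg.solve(A, b)
print(alpha)          # [0.125 0.125 0.125 0.125], i.e. alpha_i = 1/8 for every i
```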
Example: XOR problem (cont.)
- The optimum values of the $\alpha_{0,i}$ are
  $\alpha_{0,1} = \alpha_{0,2} = \alpha_{0,3} = \alpha_{0,4} = \frac{1}{8}$
- $Q(\alpha_0) = \frac{1}{4}$
- $\frac{1}{2} \|w_0\|^2 = \frac{1}{4} \;\Rightarrow\; \|w_0\| = \frac{1}{\sqrt{2}}$
Example: XOR problem (cont.)
- We find that the optimum weight vector is
  $w_0 = \frac{1}{8} \left[ -\varphi(x_1) + \varphi(x_2) + \varphi(x_3) - \varphi(x_4) \right] = \left[ 0, 0, -\frac{1}{\sqrt{2}}, 0, 0, 0 \right]^T$
Example: XOR problem (cont.)
- The optimal hyperplane is defined by
  $w_0^T \varphi(x) = 0$
  $\left[ 0, 0, -\frac{1}{\sqrt{2}}, 0, 0, 0 \right] \left[ 1, x_1^2, \sqrt{2}\, x_1 x_2, x_2^2, \sqrt{2}\, x_1, \sqrt{2}\, x_2 \right]^T = 0$
  which reduces to $-x_1 x_2 = 0$.
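To tie the pieces together, a short sketch evaluating the dual decision function $\sum_i \alpha_i d_i K(x, x_i)$ with $\alpha_i = 1/8$ and checking that it coincides with the surface $-x_1 x_2$ found above (the test points are arbitrary):

```python
import numpy as np

# XOR training data and the optimum multipliers from the slides
X = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
d = np.array([-1.0, 1.0, 1.0, -1.0])
alpha = np.full(4, 1.0 / 8.0)

def decision(x):
    K = (1.0 + X @ x) ** 2                 # K(x_i, x) for all four training points
    return np.sum(alpha * d * K)           # dual decision function

for x in [np.array([0.5, 0.5]), np.array([-2.0, 3.0]), np.array([1.0, -1.0])]:
    print(decision(x), -x[0] * x[1])       # the two values coincide
```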
Properties and expansions of SVM
Two important features:
- Duality is the first feature of SVM.
- SVMs operate in a kernel-induced feature space.
Several expansions of SVM:
- C-Support Vector Classification (binary case)
- ν-Support Vector Classification (binary case)
- Distribution Estimation (one-class SVM)
- ε-Support Vector Regression (ε-SVR)
- ν-Support Vector Regression (ν-SVR)
Conclusion
- The SVM is an elegant and highly principled learning method for designing classifiers for nonlinear input data.
- Compared with the back-propagation algorithm:
  - It operates only in a batch mode.
  - Whatever the learning task, it provides a method for controlling model complexity independently of dimensionality.
  - It is guaranteed to find a global extremum of the error surface.
  - The computation can be performed efficiently.
  - By using a suitable inner-product kernel, the SVM computes all the important network parameters automatically.
Applications of SVM
- Classification
- Regression
- Recognition
- Bioinformatics
LIBSVM
- A Library for Support Vector Machines
- Made by Chih-Jen Lin and Chih-Chung Chang
- Both C++ and Java sources
- http://www.csie.ntu.edu.tw/~cjlin/
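A minimal usage sketch of LIBSVM's Python interface on the XOR data from the example above; the import path assumes the pip-distributed libsvm bindings, which may differ from a source installation of the library:

```python
from libsvm.svmutil import svm_train, svm_predict

# LIBSVM's polynomial kernel is (gamma*u'*v + coef0)^degree,
# so -t 1 -d 2 -g 1 -r 1 reproduces K(x, x_i) = (1 + x^T x_i)^2 from the XOR example.
y = [-1, 1, 1, -1]
x = [[-1, -1], [-1, 1], [1, -1], [1, 1]]
model = svm_train(y, x, '-t 1 -d 2 -g 1 -r 1 -c 100')
labels, accuracy, values = svm_predict(y, x, model)   # all four XOR points should be classified correctly
```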