Natural Language Processing and Information Retrieval

Size: px

Start display at page:

Download "Natural Language Processing and Information Retrieval"

Joel Franklin
6 years ago
Views:

1 Natural Language Processng and Informaton Retreval Support Vector Machnes Alessandro Moschtt Department of nformaton and communcaton technology Unversty of Trento Emal:

2 Summary Support Vector Machnes Hard-margn SVMs Soft-margn SVMs

3 Whch hyperplane choose?

4 Classfer wth a Maxmum Margn Var 1 IDEA 1: Select the hyperplane wth maxmum margn Margn Margn Var 2

5 Support Vector Var 1 Support Vectors Margn Var 2

6 Support Vector Machne Classfers Var 1 The margn s equal to 2 k w w! w!! x! + b = k w! " x! + b =! k k k w!! x! + b = 0 Var 2

7 Support Vector Machnes Var 1 The margn s equal to We need to solve 2 k w w! w! " x! + b =! k k w!! x! + b = k k w!! x! + b Var 2 = 0 max 2! k w! w " x! + b # +k, f x! s postve! w " x! + b $ %k, f x! s negatve

8 Support Vector Machnes Var 1 There s a scale for whch k=1. The problem transforms n: w!!! w! x + b = " 1 1!! w! x + b = 1 1 w!! x! + b Var 2 = 0 max 2! w! w " x! + b # +1, f x! s postve! w " x! + b $ %1, f x! s negatve

9 Fnal Formulaton max 2! w! w " x! + b # +1, y =1! w " x! + b $ %1, y = -1 " max 2! w y ( w! " x! + b) #1 " mn! w 2 " " y ( w! " x! + b) #1 mn! w 2 2 y ( w! " x! + b) #1

10 Optmzaton Problem Optmal Hyperplane: Mnmze Subject to " (! w ) = 1 2! w 2 y ((! w #! x ) + b) $ 1, = 1,...,m The dual problem s smpler

11 Lagrangan Defnton "

12 Dual Optmzaton Problem

13 Dual Transformaton Gven the Lagrangan assocated wth our problem To solve the dual problem we need to evaluate: Let us mpose the dervatves to 0, wth respect to w!

14 Dual Transformaton (cont d) and wrt b Then we substtuted them n the Lagrange functon

15 Fnal Dual Problem

16 Khun-Tucker Theorem Necessary and suffcent condtons to optmalty L( w, α, β ) = 0 w L( w, α, β ) = 0 b α g ( w ) = 0, =1,.., m g ( w ) 0, =1,.., m α 0, =1,.., m

17 Propertes comng from constrants Lagrange constrants: m!! y = 0 =1 Karush-Kuhn-Tucker constrants! w = m!!! y x =1 " #[y (! x #! w + b) $1] = 0, = 1,...,m! Support Vectors have not null To evaluate b, we can apply the followng equaton

18 Warnng! On the graphcal examples, we always consder normalzed hyperplane (hyperplanes wth normalzed gradent) b n ths case s exactly the dstance of the hyperplane from the orgn So f we have an equaton not normalzed we may have! x "! w '+b = 0 wth! x = x,y ( ) and! w '= 1,1 ( ) and b s not the dstance

19 Warnng! Let us consder a normalzed gradent! w = 1/ 2,1/ 2 ( ) ( x,y) " ( 1/ 2,1/ 2) + b = 0 # x / 2 + y/ 2 = $b # y = $x $ b 2 Now we see that -b s exactly the dstance. For x =0, we have the ntersecton wth. Ths! w dstance projected on s -b "b 2

20 Soft Margn SVMs Var 1!! slack varables are added w!!! w! x + b = 1 Some errors are allowed but they should penalze the objectve functon!! w! x + b = " w!! x! + b = 0 Var 2

21 Soft Margn SVMs Var 1! The new constrants are y ( w! " x! + b) #1$% & x! where % # 0 w!!! w! x + b = " 1 1!! w! x + b = 1 1 w!! x! + b Var 2 = 0 The objectve functon penalzes the ncorrect classfed examples mn 1 2 w! 2 +C # " C s the trade-off between margn and the error

22 Dual formulaton mn 1 2 w + C m =1 ξ2 y ( w x + b) 1 ξ, =1,.., m ξ 0, =1,.., m L( w, b, ξ, α )= 1 2 w w + C 2 m ξ 2 m α [y ( w x + b) 1+ξ ], =1 =1 By dervng wrt! w,! " and b

23 Partal Dervatves

24 Substtuton n the objectve functon! j of Kronecker

25 Fnal dual optmzaton problem

26 Soft Margn Support Vector Machnes The algorthm tres to keep ξ low and maxmze the margn NB: The number of error s not drectly mnmzed (NP-complete problem); the dstances from the hyperplane are mnmzed If C, the soluton tends to the one of the hard-margn algorthm Attenton!!!: f C = 0 we get w! = 0, snce mn 1 2 w! 2 +C # " y (! w "! x + b) #1$% &! x % # 0 y b "1#$ %! x If C ncreases the number of error decreases. When C tends to nfnte the number of errors must be 0,.e. the hard-margn formulaton

27 Robusteness of Soft vs. Hard Margn SVMs Var 1 Var 1! ξ w!! x! + b = 0 Var 2 w!! x! + b = 0 Var 2 Soft Margn SVM Hard Margn SVM

28 Soft vs Hard Margn SVMs Soft-Margn has ever a soluton Soft-Margn s more robust to odd examples Hard-Margn does not requre parameters

29 Parameters mn 1 2 w! 2 +C! = mn 1 2 w! 2 +C +!! + + C "!! (! " ) = mn 1 2 w! 2 +C J!! + +!! " C: trade-off parameter J: cost factor

30 Theoretcal Justfcaton

31 Defnton of Tranng Set error Tranng Data N f : R! { ±1} (! Emprcal Rsk (error) Rsk (error) R emp [ f ] = 1 m x 1,y 1 ),...,( x! m,y m ) " R N # ±1 m # =1 1 2 f ( x! ) " y R[ f ] = 1 2 f (! x ) " y dp(! x,y) # { }

32 Error Characterzaton (part 1) From PAC-learnng Theory (Vapnk): R(") # R emp (") + $( d m, log(% ) m ) $( d m, log(% ) m ) = d (log 2m d +1)&log(% 4 ) m where d s thevc-dmenson, m s the number of examples, δ s a bound on the probablty to get such error and α s a classfer parameter.

33 There are many versons for dfferent bounds

34 Error Characterzaton (part 2)

35 Rankng, Regresson and Multclassfcaton

36 The Rankng SVM [Herbrch et al. 1999, 2000; Joachms et al. 2002] The am s to classfy nstance pars as correctly ranked or ncorrectly ranked Ths turns an ordnal regresson problem back nto a bnary classfcaton problem We want a rankng functon f such that x > x j ff f(x ) > f(x j ) or at least one that tres to do ths wth mnmal error Suppose that f s a lnear functon f(x ) = w x

37 Sec The Rankng SVM Rankng Model: f x f (x )

38 Sec The Rankng SVM Then (combnng the two equatons on the last slde): x > x j ff w x w x j > 0 x > x j ff w (x x j ) > 0 Let us then create a new nstance space from such pars: z k = x x k y k = +1, 1 as x, < x k

39 Support Vector Rankng mn 1 2 w + C m =1 ξ2 y k ( w ( x x j )+b) 1 ξ k, ξ k 0, k =1,.., m 2, j =1,.., m y k = 1 f rank( x ) > rank( x j ),"1 0 otherwse, where k = m + j Gven two examples we buld one example (x, x j )

40 Support Vector Regresson (SVR) f(x) ( x) = wx b f + +ε 0 -ε Mn y 1 2 Constrants: w T Soluton: w x T w b ε w T x + b y ε x

41 Support Vector Regresson (SVR) f(x) ( x) = wx b f + ξ ξ* +ε 0 -ε Mnmse: N 1 w + C " 2 wt # + " y w * ξ, ξ T x w T Constrants: x + b 0 b y ( * ) =1 ε + ξ ε + ξ * x

42 Support Vector Regresson y s not -1 or 1 anymore, now t s a value ε s the tollerance of our functon value

43 From Bnary to Multclass classfers Three dfferent approaches: ONE-vs-ALL (OVA) Gven the example sets, {E1, E2, E3, } for the categores: {C1, C2, C3, } the bnary classfers: {b1, b2, b3, } are bult. For b1, E1 s the set of postves and E2 E3 s the set of negatves, and so on For testng: gven a classfcaton nstance x, the category s the one assocated wth the maxmum margn among all bnary classfers

44 From Bnary to Multclass classfers ALL-vs-ALL (AVA) Gven the examples: {E1, E2, E3, } for the categores {C1, C2, C3, } buld the bnary classfers: {b1_2, b1_3,, b1_n, b2_3, b2_4,, b2_n,,bn-1_n} by learnng on E1 (postves) and E2 (negatves), on E1 (postves) and E3 (negatves) and so on For testng: gven an example x, all the votes of all classfers are collected where b E1E2 = 1 means a vote for C1 and b E1E2 = -1 s a vote for C2 Select the category that gets more votes

45 From Bnary to Multclass classfers Error Correctng Output Codes (ECOC) The tranng set s parttoned accordng to bnary sequences (codes) assocated wth category sets. For example, ndcates that the set of examples of C1,C3 and C5 are used to tran the C classfer. The data of the other categores,.e. C2 and C4 wll be negatve examples In testng: the code-classfers are used to decode one the orgnal class, e.g. C = 1 and C = 1 ndcates that the nstance belongs to C1 That s, the only one consstent wth the codes

46 SVM-lght: an mplementaton of SVMs Implements soft margn Contans the procedures for solvng optmzaton problems Bnary classfer Examples and descrptons n the web ste: (

47 Structured Output

48 Mult-classfcaton

49 References A tutoral on Support Vector Machnes for Pattern Recognton Downloadable artcle (Chrss Burges) The Vapnk-Chervonenks Dmenson and the Learnng Capablty of Neural Nets Downloadable Presentaton Computatonal Learnng Theory (Sally A Goldman Washngton Unversty St. Lous Mssour) Downloadable Artcle AN INTRODUCTION TO SUPPORT VECTOR MACHINES (and other kernel-based learnng methods) N. Crstann and J. Shawe-Taylor Cambrdge Unversty Press Check our lbrary The Nature of Statstcal Learnng Theory Vladmr Naumovch Vapnk - Sprnger Verlag (December, 1999) Check our lbrary

50 Exercse 1. The equatons of SVMs for Classfcaton, Rankng and Regresson (you can get them from my sldes). 2. The perceptron algorthm for Classfcaton, Rankng and Regresson (the last two you have to provde by lookng at what you wrote n pont (1)). 3. The same as pont (2) by usng kernels (wrte the kernel defnton as ntroducton of ths secton).

MACHINE LEARNING. Support Vector Machines. Alessandro Moschitti

MACHINE LEARNING. Support Vector Machines. Alessandro Moschitti MACHINE LEARNING Support Vector Machines Alessandro Moschitti Department of information and communication technology University of Trento Email: moschitti@dit.unitn.it Summary Support Vector Machines