Lecture 10: Support Vector Machines (Oct 20, 2008)
Linear Separators. Which of the linear separators is optimal?
Concept of Margin. Recall that in the Perceptron lectures, we learned that the convergence rate of the Perceptron algorithm depends on a quantity called the margin.
Intuition of Margin. Consider points A, B, and C relative to the decision boundary w·x + b = 0 (with w·x + b > 0 on one side and w·x + b < 0 on the other). We are quite confident in our prediction for A because it is far from the decision boundary. In contrast, we are not so confident in our prediction for C, because a slight change in the decision boundary may flip the decision. Given a training set, we would like to make all of our predictions correct and confident! This can be captured by the concept of margin.
Functional Margin. One possible way to define margin: the functional margin of the linear classifier (w, b) w.r.t. a training example (x, y) is

    y (w·x + b)

The larger the value, the better. Really? What if we rescale (w, b) by a factor α and consider the linear classifier specified by (αw, αb)? The decision boundary remains the same, yet the functional margin gets multiplied by α. So we can change the functional margin of a linear classifier without changing anything meaningful. We need something more meaningful.
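The rescaling issue is easy to verify numerically. Below is a minimal NumPy sketch; the classifier and example values are made up for illustration, not taken from the lecture:

```python
import numpy as np

def functional_margin(w, b, x, y):
    """Functional margin of the linear classifier (w, b) w.r.t. example (x, y)."""
    return y * (np.dot(w, x) + b)

# Illustrative values (not from the lecture)
w, b = np.array([2.0, 1.0]), -1.0
x, y = np.array([1.0, 1.0]), 1

m = functional_margin(w, b, x, y)                 # y (w.x + b) = 2.0
m_scaled = functional_margin(3 * w, 3 * b, x, y)  # same boundary, margin tripled
print(m, m_scaled)
```

Rescaling (w, b) to (3w, 3b) leaves the decision boundary unchanged but triples the functional margin, which is exactly why the functional margin alone is not meaningful.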
What We Really Want. Looking at points A, B, and C again: we want the distances between the examples and the decision boundary w·x + b = 0 to be large. This quantity is what we call the geometric margin. But how do we compute the geometric margin of a data point w.r.t. a particular line (parameterized by w and b)?
Some Basic Facts about Lines. For the line w·x + b = 0: the weight vector w is perpendicular (normal) to the line, and the distance from a point x to the line is |w·x + b| / ||w||.
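This distance fact can be checked with a few lines of NumPy (a sketch; the line and the point are arbitrary examples, not from the lecture):

```python
import numpy as np

def distance_to_line(w, b, x):
    """Distance from point x to the line/hyperplane w.x + b = 0."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

# For the line x1 + x2 - 2 = 0, the point (2, 2) should lie at distance
# |2 + 2 - 2| / sqrt(2) = sqrt(2)
d = distance_to_line(np.array([1.0, 1.0]), -2.0, np.array([2.0, 2.0]))
print(d)
```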
Geometric Margin. The geometric margin of (w, b) w.r.t. x^(i) is the distance from x^(i) to the decision surface. This distance can be computed as

    γ^(i) = y^(i) (w·x^(i) + b) / ||w||

Given a training set S = {(x^(i), y^(i)) : i = 1, …, N}, the geometric margin of the classifier w.r.t. S is

    γ = min_{i=1,…,N} γ^(i)

Note that the points closest to the boundary are called the support vectors; in fact these are the only points that really matter, and the other examples are ignorable.
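The two formulas on this slide translate directly into code. A small sketch with a hand-picked toy set (the data and the classifier are illustrative, not from the lecture):

```python
import numpy as np

def geometric_margin(w, b, X, y):
    """Geometric margin of (w, b) w.r.t. S: gamma = min_i y_i (w.x_i + b) / ||w||."""
    per_example = y * (X @ w + b) / np.linalg.norm(w)  # gamma^(i) for each example
    return per_example.min(), per_example

# Illustrative training set and classifier
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0]])
y = np.array([1, 1, -1])
w, b = np.array([1.0, 1.0]), -1.0

gamma, per_example = geometric_margin(w, b, X, y)
# The examples attaining the minimum are the ones closest to the boundary
closest = np.isclose(per_example, gamma)
print(gamma, closest)
```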
What We Have Done So Far. We have established that we want to find a linear decision boundary whose margin is the largest, and we know how to measure the margin of a linear decision boundary. Now what? We have a new learning objective: given a linearly separable (this will be relaxed later) training set S = {(x^(i), y^(i)) : i = 1, …, N}, we would like to find a linear classifier (w, b) with maximum margin.
Maximum Margin Classifier. This can be represented as a constrained optimization problem:

    max_{w,b} γ   subject to: y^(i) (w·x^(i) + b) / ||w|| ≥ γ,  i = 1, …, N

This optimization problem is in a nasty form, so we need to do some rewriting. Let γ' = γ ||w||; we can rewrite this as

    max_{w,b} γ' / ||w||   subject to: y^(i) (w·x^(i) + b) ≥ γ',  i = 1, …, N
Maximum Margin Classifier. Note that we can arbitrarily rescale w and b to make the functional margin γ' large or small, so we can rescale them such that γ' = 1:

    max_{w,b} 1 / ||w||   subject to: y^(i) (w·x^(i) + b) ≥ 1,  i = 1, …, N

(or equivalently, min_{w,b} ||w||² / 2 subject to the same constraints)

Maximizing the geometric margin is equivalent to minimizing the magnitude of w subject to maintaining a functional margin of at least 1.
Solving the Optimization Problem.

    min_{w,b} (1/2) ||w||²   subject to: y^(i) (w·x^(i) + b) ≥ 1,  i = 1, …, N

This is a quadratic optimization problem (QP) with linear inequality constraints, a well-known class of mathematical programming problems for which several (non-trivial) algorithms exist. In practice, we can just regard the QP solver as a black box without bothering about how it works. You will be spared the excruciating details, and we will jump straight to the solution.
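As a concrete illustration of the black-box view, the primal QP above can be handed to a generic constrained optimizer. The sketch below uses SciPy's SLSQP solver on a tiny made-up separable data set; both the data and the choice of SLSQP as a stand-in for a real QP solver are my assumptions, not part of the lecture:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable toy set (illustrative)
X = np.array([[2.0, 2.0], [4.0, 4.0], [-1.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Variables theta = (w1, w2, b); objective is (1/2) ||w||^2
def objective(theta):
    w = theta[:2]
    return 0.5 * np.dot(w, w)

# One linear inequality constraint per example: y_i (w.x_i + b) - 1 >= 0
constraints = [
    {"type": "ineq",
     "fun": lambda theta, xi=xi, yi=yi: yi * (np.dot(theta[:2], xi) + theta[2]) - 1.0}
    for xi, yi in zip(X, y)
]

res = minimize(objective, x0=np.zeros(3), method="SLSQP", constraints=constraints)
w_opt, b_opt = res.x[:2], res.x[2]
margins = y * (X @ w_opt + b_opt)  # every functional margin should be >= 1
print(w_opt, b_opt, margins)
```

On this toy set the optimum works out by hand (via the KKT conditions) to w = (1/3, 1/3), b = -1/3, and every training point ends up with functional margin at least 1. Dedicated QP solvers are what is used in practice, but the black-box principle is the same.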
The Solution. We cannot give you a closed-form solution that you can directly plug numbers into for an arbitrary data set. But the solution can always be written in the following form:

    w = Σ_{i=1}^{N} α_i y^(i) x^(i),   subject to  Σ_{i=1}^{N} α_i y^(i) = 0

This is the form of w; b can be calculated accordingly using some additional steps. The weight vector is a linear combination of all the training examples. Importantly, many of the α_i are zero; the points with non-zero α_i are the support vectors.
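To see this form emerge, here is a sketch that solves for the α_i numerically (via the dual objective) with SciPy and then reconstructs w = Σ α_i y^(i) x^(i). The toy data and the use of a generic SLSQP solver are my own illustrative choices; the lecture only promises that a solution of this form exists:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny separable toy set (illustrative)
X = np.array([[2.0, 2.0], [4.0, 4.0], [-1.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Dual objective: minimize (1/2) a' K a - sum(a), with K_ij = y_i y_j (x_i . x_j)
K = (X @ X.T) * np.outer(y, y)
def neg_dual(a):
    return 0.5 * a @ K @ a - a.sum()

res = minimize(neg_dual, x0=np.zeros(len(y)), method="SLSQP",
               bounds=[(0.0, None)] * len(y),                        # alpha_i >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])  # sum_i alpha_i y_i = 0

alpha = res.x
w = (alpha * y) @ X                # w = sum_i alpha_i y_i x_i
sv = np.flatnonzero(alpha > 1e-6)  # support vectors: non-zero alpha
b = y[sv[0]] - w @ X[sv[0]]        # b from any support vector: y_i (w.x_i + b) = 1
print(alpha, w, b, sv)
```

Only the points closest to the boundary pick up non-zero α_i; the far-away points get α_i = 0 and could be deleted without changing the classifier.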
A Geometrical Interpretation. [Figure: Class 1 and Class 2 separated by the maximum-margin boundary. Only the support vectors carry non-zero coefficients (α_1 = 0.8, α_6 = 1.4, α_8 = 0.6); all other points have α_i = 0 (α_2 = α_3 = α_4 = α_5 = α_7 = α_9 = α_10 = 0).]
A few important notes regarding the geometric interpretation: w·x + b = 0 gives the decision boundary; positive support vectors lie on the line w·x + b = 1; negative support vectors lie on the line w·x + b = -1. We can think of the decision boundary now as a tube of a certain width, with no points allowed inside the tube. Learning involves adjusting the location and orientation of the tube to find the largest tube that fits the given training set.
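The tube picture can be sanity-checked numerically. The sketch below uses a hand-worked max-margin solution, w = (1/3, 1/3) and b = -1/3, for a tiny illustrative data set of my own (not from the lecture): support vectors should sit exactly on the tube walls w·x + b = ±1, and no point should fall strictly inside the tube.

```python
import numpy as np

# Hand-worked max-margin solution for this toy set (illustrative example)
w, b = np.array([1/3, 1/3]), -1/3
X = np.array([[2.0, 2.0], [4.0, 4.0], [-1.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])

scores = X @ w + b                               # signed value w.x + b per point
func_margins = y * scores                        # all should be >= 1
inside_tube = np.abs(scores) < 1 - 1e-9          # points strictly inside the tube
on_tube_wall = np.isclose(np.abs(scores), 1.0)   # support vectors on w.x + b = +/-1
print(scores, inside_tube, on_tube_wall)
```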