SUPPORT VECTOR MACHINE
Mainly based on https://nlp.stanford.edu/ir-book/pdf/15svm.pdf
Overview
SVM is a huge topic. This lecture integrates material from MMDS, IIR, and Andrew Moore's slides.
Our foci:
- Geometric intuition → the primal form
- An alternative interpretation from the Empirical Risk Minimization point of view
- Understanding the final solution (in the dual form), which generalizes well to kernel functions: SVM + kernels
Ch. 15 Linear classifiers: Which Hyperplane?
A separating hyperplane: w^T x_i + b = 0. (In 2D, this line is ax + by − c = 0; there are lots of possible solutions for a, b, c.)
Some methods find a separating hyperplane, but not the optimal one [according to some criterion of expected goodness], e.g., the perceptron. The Support Vector Machine (SVM) finds an optimal* solution: it maximizes the distance between the hyperplane and the difficult points close to the decision boundary.
One intuition: if there are no points near the decision surface, then there are no very uncertain classification decisions.
Sec. 15.1 Another intuition
If you have to place a fat separator between the classes, you have fewer choices, and so the capacity of the model has been decreased.
Sec. 15.1 Support Vector Machine (SVM)
SVMs maximize the margin around the separating hyperplane (a.k.a. large margin classifiers). The decision function is fully specified by a subset of the training samples, the support vectors. Solving an SVM is a quadratic programming problem. SVMs are seen by many as the most successful current text classification method*.
[Figure: the support vectors lie on the margin boundaries; the SVM separator maximizes the margin, compared with a narrower-margin alternative.]
*but other discriminative methods often perform very similarly
Sec. 15.1 Maximum Margin: Formalization
w: decision hyperplane normal vector
x_i: data point i
y_i: class of data point i (+1 or -1; NB: not 1/0)
The classifier is: f(x_i) = sign(w^T x_i + b)
The functional margin of x_i is: y_i (w^T x_i + b)
But note that we can increase this margin simply by scaling w, b (NB: a common trick exploits this).
The functional margin of the dataset is twice the minimum functional margin over all points; the factor of 2 comes from measuring the whole width of the margin.
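The scaling issue above can be checked numerically. A minimal numpy sketch (the point, label, and hyperplane values are illustrative, not from the slides): rescaling (w, b) inflates the functional margin arbitrarily, while the geometric margin of the next slides is invariant.

```python
import numpy as np

# Illustrative point and separating hyperplane (arbitrary values).
x = np.array([2.0, 1.0])
y = 1
w = np.array([1.0, -1.0])
b = 0.5

def functional_margin(w, b, x, y):
    return y * (w @ x + b)

def geometric_margin(w, b, x, y):
    return functional_margin(w, b, x, y) / np.linalg.norm(w)

fm1, gm1 = functional_margin(w, b, x, y), geometric_margin(w, b, x, y)
# Rescaling (w, b) by 10 describes the same hyperplane...
fm2, gm2 = functional_margin(10 * w, 10 * b, x, y), geometric_margin(10 * w, 10 * b, x, y)
print(fm1, fm2)  # functional margin scales by 10
print(gm1, gm2)  # geometric margin is unchanged
```

This is why the formalization fixes the scale (e.g., requiring the closest points to have functional margin 1) before maximizing.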
Largest Margin
[Figure: points A (w^T x_i + b = 7.4) and B lie far from the hyperplane w^T x_i + b = 0; point C (w^T x_i + b = 3.7) lies closer to it.]
Distance from the separating hyperplane corresponds to the confidence of prediction. Example: we are more sure about the class of A and B than of C.
Sec. 15.1 Geometric Margin
The distance from an example to the separator is r = y (w^T x + b) / ‖w‖.
Examples closest to the hyperplane are support vectors. The margin ρ of the separator is the width of separation between the support vectors of the two classes.
Algebraic derivation of r: let x′ be the point on the decision boundary closest to x. The dotted line from x to x′ is perpendicular to the decision boundary, so it is parallel to w. The unit vector in that direction is w/‖w‖, so x′ = x − y r w/‖w‖ for some r. Since x′ lies on the boundary, it satisfies w^T x′ + b = 0:
w^T (x − y r w/‖w‖) + b = 0
Recall that ‖w‖ = sqrt(w^T w), so w^T x − y r ‖w‖ + b = 0.
Solving for r gives: r = y (w^T x + b) / ‖w‖
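A quick numerical check of this derivation (the hyperplane and point are arbitrary illustrative values): the point x′ = x − y r w/‖w‖ lands on the decision boundary, and r equals the Euclidean distance from x to x′.

```python
import numpy as np

# Arbitrary illustrative hyperplane w^T x + b = 0 and a point x with label y.
w, b = np.array([3.0, 4.0]), -2.0
x, y = np.array([1.0, 2.0]), 1

r = y * (w @ x + b) / np.linalg.norm(w)      # geometric margin of x
x_prime = x - y * r * w / np.linalg.norm(w)  # foot of the perpendicular

print(w @ x_prime + b)                       # ≈ 0: x' is on the boundary
print(abs(r), np.linalg.norm(x - x_prime))   # equal: r is the distance to the boundary
```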
Help from Inner Products
Remember the dot product / inner product: ⟨A, B⟩ = ‖A‖ ‖B‖ cos θ.
The coefficient of A's projection onto B is ⟨A, B⟩ / ⟨B, B⟩.
(Or use an algebraic derivation similar to the previous slide.)
Derivation of the Geometric Margin
[Figure: point A at distance d(A, L) from the line L: w^T x + b = 0; H is the foot of the perpendicular from A to L; the origin is at (0,0).]
d(A, L) = ⟨A, w⟩/‖w‖ − ⟨H, w⟩/‖w‖
Note that H is on the line L, so ⟨w, H⟩ + b = 0, i.e., ⟨H, w⟩ = −b.
Therefore d(A, L) = (⟨A, w⟩ + b)/‖w‖
Sec. 15.1 Linear SVM Mathematically
The linearly separable case. Assume that all data is at least distance 1 from the hyperplane; then the following two constraints hold for a training set {(x_i, y_i)}:
w^T x_i + b ≥ 1 if y_i = +1
w^T x_i + b ≤ −1 if y_i = −1
For support vectors, the inequality becomes an equality. Then, since each example's distance from the hyperplane is r = y (w^T x + b) / ‖w‖, the margin is ρ = 2/‖w‖.
Sec. 15.1 Derivation of ρ
[Figure: hyperplane w^T x + b = 0 with margin boundaries w^T x_a + b = 1 and w^T x_b + b = −1, separated by width ρ.]
Extra scale constraint: min over i = 1,…,n of |w^T x_i + b| = 1.
This implies: w^T (x_a − x_b) = 2
ρ = ‖x_a − x_b‖₂ = 2/‖w‖₂ (taking x_a − x_b parallel to w)
Sec. 15.1 Solving the Optimization Problem
Find w and b such that Φ(w) = ½ w^T w is minimized, and for all {(x_i, y_i)}: y_i (w^T x_i + b) ≥ 1.
This is optimizing a quadratic function subject to linear constraints. Quadratic optimization problems are a well-known class of mathematical programming problems, and many (intricate) algorithms exist for solving them (with many special ones built for SVMs).
The solution involves constructing a dual problem in which a Lagrange multiplier α_i is associated with every constraint in the primal problem:
Find α_1 … α_N such that Q(α) = Σα_i − ½ ΣΣ α_i α_j y_i y_j x_i^T x_j is maximized, subject to
(1) Σ α_i y_i = 0
(2) α_i ≥ 0 for all α_i
Sec. 15.1 Geometric Interpretation
Find w and b such that Φ(w) = ½ w^T w is minimized, and for all {(x_i, y_i)}: y_i (w^T x_i + b) ≥ 1.
What is fixed and what are the variables? (The training points (x_i, y_i) are fixed data; w and b are the variables.)
Linear constraint(s): where can the variables be?
Quadratic objective function: what do its level sets look like?
sha1_base64="k4b7e5oi/ah6ln+wpsjrzzeou5a=">aaacvnicbvfda9swfjxd9wppl9s+7uwymghpcxyptc+fsr70swnlwohcufbkrfssjssxbum/2b20p2uvy3iswnbugodonhulq6o0kmk6oh4nwpupq2vrgx9bm1vbo7vr3n7p5qvhvmtymzv7fc2xqvoue07y+8jwvknkd+nddepfpxjjra5/ugnbbwrhwmscofpsmflzivxojmlwpdvhcanulmpy2uegqsp3xg3tzlcdzapcauvzthbjorl1whl4e8jmpfksqjmxhem6jnpxj54b3pnkqdpkgdth9exhossv145jtlafxiubvgicyjlxlvpaxib7wdhve6prctuozrhu8muri8hy45d2mfoxoypu1k5v6iubye1brxh/5/vll10mkqgl0nhn5hdlpqsxq5mxjithzmmpj8im8lmcm6bb5vxpthwiydsnvye9004sd5jvz+2rr4s4nsgn8pkckosckytyq25jlzdyk/wkwmaleal+h6vh+rw0dby9b+qfhnefkko08q==</latexit> Sec. 15.1 The Optimization Problem Solution The solution has the form: w =Σα i y i x i b= y k - w T x k for any x k such that α k ¹ 0 Each non-zero α i indicates that corresponding x i is a support vector. Then the scoring function will have the form: f(x) = Σα i y i x it x + b Q: What are the model parameters? What does f(x) mean intuitively? Classification is based on the sign of f(x) f(x) = X sv2sv y sv sv hx sv, xi + b Notice that it relies on an inner product between the test point x and the support vectors x i We will return to this later. Also keep in mind that solving the optimization problem involved computing the inner products <x i,x j > between all pairs of training points. 15
Sec. 15.2.1 Soft Margin Classification
If the training data is not linearly separable, slack variables ξ_i can be added to allow misclassification of difficult or noisy examples. Allow some errors: let some points be moved to where they belong, at a cost. Still, try to minimize training set errors, and to place the hyperplane far from each class (large margin).
[Figure: two points on the wrong side of their margin, with slacks ξ_i and ξ_j.]
Sec. 15.2.1 Soft Margin Classification Mathematically [Optional]
The old formulation:
Find w and b such that Φ(w) = ½ w^T w is minimized, and for all {(x_i, y_i)}: y_i (w^T x_i + b) ≥ 1.
The new formulation, incorporating slack variables:
Find w and b such that Φ(w) = ½ w^T w + C Σ ξ_i is minimized, and for all {(x_i, y_i)}: y_i (w^T x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0 for all i.
The parameter C can be viewed as a way to control overfitting: it weights a regularization trade-off.
Alternative View: SVM in the natural form
arg min over w, b of ½ ‖w‖² + C Σ_{i=1..n} max{0, 1 − y_i (w^T x_i + b)}
The first term is the margin term; the second is the empirical loss L (how well we fit the training data), weighted by the hyper-parameter C, which is related to regularization.
Equivalently:
min over w, b of ½ ‖w‖² + C Σ_{i=1..n} ξ_i, s.t. for all i, y_i (w^T x_i + b) ≥ 1 − ξ_i
SVM uses the hinge loss max{0, 1 − z}, with z = y_i (w^T x_i + b); it is an upper bound on the 0/1 loss.
[Figure: hinge loss and 0/1 loss plotted against z over the range −1 to 2.]
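The hinge-loss form above can be minimized directly by (sub)gradient descent. A simplified full-batch sketch on synthetic data (the cluster centers, step size, and iteration count are arbitrary choices; production solvers use QP or SMO instead):

```python
import numpy as np

# Subgradient descent on the unconstrained hinge-loss objective:
#   1/2 ||w||^2 + C * sum_i max(0, 1 - y_i (w.x_i + b))
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)),    # positive cluster
               rng.normal(-2.0, 0.5, (20, 2))])  # negative cluster
y = np.array([1] * 20 + [-1] * 20)

C, lr = 1.0, 0.01
w, b = np.zeros(2), 0.0
for _ in range(500):
    margins = y * (X @ w + b)
    viol = margins < 1                # points with non-zero hinge loss
    # Subgradient: the regularizer contributes w; each violating point
    # contributes -C * y_i * x_i (and -C * y_i for b).
    grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
    grad_b = -C * y[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

acc = np.mean(np.sign(X @ w + b) == y)
print(acc)  # well-separated clusters, so accuracy should be (near) perfect
```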
Sec. 15.2.1 Soft Margin Classification: Solution [Optional]
The dual problem for soft margin classification:
Find α_1 … α_N such that Q(α) = Σα_i − ½ ΣΣ α_i α_j y_i y_j x_i^T x_j is maximized, subject to
(1) Σ α_i y_i = 0
(2) 0 ≤ α_i ≤ C for all α_i
Neither the slack variables ξ_i nor their Lagrange multipliers appear in the dual problem! Again, the x_i with non-zero α_i will be support vectors.
The solution to the dual problem is:
w = Σ α_i y_i x_i
b = y_k (1 − ξ_k) − w^T x_k, where k = argmax_k α_k
w is not needed explicitly for classification:
f(x) = Σ α_i y_i x_i^T x + b
Sec. 15.1 Classification with SVMs
Given a new point x, we can score its projection onto the hyperplane normal, i.e., compute the score w^T x + b = Σ α_i y_i x_i^T x + b, and decide the class based on whether the score is < or > 0.
We can also set a confidence threshold t:
Score > t: yes
Score < −t: no
Otherwise: don't know
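The thresholded decision rule is trivial to express in code. A small illustration (the threshold value 0.5 and the string labels are arbitrary choices; in practice t is tuned on validation data):

```python
def decide(score: float, t: float = 0.5) -> str:
    # Three-way decision with a confidence threshold t.
    if score > t:
        return "yes"
    if score < -t:
        return "no"
    return "don't know"

print(decide(1.2))   # -> yes
print(decide(-0.9))  # -> no
print(decide(0.1))   # -> don't know
```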
Sec. 15.2.1 Linear SVMs: Summary
The classifier is a separating hyperplane. The most important training points are the support vectors; they define the hyperplane. Quadratic optimization algorithms can identify which training points x_i are support vectors, i.e., which have non-zero Lagrange multipliers α_i.
Both in the dual formulation of the problem and in the solution, the training points appear only inside inner products:
Find α_1 … α_N such that Q(α) = Σα_i − ½ ΣΣ α_i α_j y_i y_j x_i^T x_j is maximized, subject to (1) Σ α_i y_i = 0 and (2) 0 ≤ α_i ≤ C for all α_i.
f(x) = Σ_{sv ∈ SV} α_sv y_sv ⟨x_sv, x⟩ + b
Sec. 15.2.3 Non-linear SVMs
Datasets that are linearly separable (with some noise) work out great. But what are we going to do if the dataset is just too hard, e.g., 1-D data with the classes interleaved along x?
How about mapping the data to a higher-dimensional space, e.g., x ↦ (x, x²), where it becomes separable?
Sec. 15.2.3 Non-linear SVMs: Feature spaces
General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x ↦ φ(x)
Sec. 15.2.3 The Kernel Trick
The linear classifier relies on an inner product between vectors: K(x_i, x_j) = x_i^T x_j. If every datapoint is mapped into a high-dimensional space via some transformation Φ: x ↦ φ(x), the inner product becomes K(x_i, x_j) = φ(x_i)^T φ(x_j).
A kernel function is a function that corresponds to an inner product in some expanded feature space. Usually, there is no need to construct the feature space explicitly.
Sec. 15.2.3 Example
K(u, v) = (1 + u^T v)²
= 1 + 2 u^T v + (u^T v)²
= 1 + 2 u^T v + Σ_i u_i² v_i² + Σ_{i≠j} u_i u_j v_i v_j
= φ(u)^T φ(v), where
φ(u) = (1, √2·u_1, …, √2·u_d, u_1², …, u_d², u_1 u_2, …, u_{d−1} u_d)^T
φ(v) = (1, √2·v_1, …, √2·v_d, v_1², …, v_d², v_1 v_2, …, v_{d−1} v_d)^T
(the cross terms u_i u_j range over all ordered pairs i ≠ j)
The three groups of terms are: linear, non-linear, and non-linear + feature combination.
What about K(u, v) = (1 + u^T v)³?
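This identity can be verified numerically for d = 2 (the vectors u, v are arbitrary illustrative values): the kernel evaluated directly matches the inner product in the explicit feature space, with the cross terms listed over all ordered pairs i ≠ j as in the expansion above.

```python
import numpy as np

def phi(u):
    # Explicit feature map for the degree-2 polynomial kernel, d = 2.
    u1, u2 = u
    return np.array([1.0,
                     np.sqrt(2) * u1, np.sqrt(2) * u2,   # linear terms
                     u1**2, u2**2,                       # squared terms
                     u1 * u2, u2 * u1])                  # feature combinations

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

k_direct = (1 + u @ v) ** 2   # kernel trick: no feature map needed
k_mapped = phi(u) @ phi(v)    # explicit 7-dimensional feature space
print(k_direct, k_mapped)     # both equal 4.0
```

The point of the trick: k_direct costs O(d) regardless of the (here 7-dimensional, in general O(d²)-dimensional) feature space.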
sha1_base64="sm6vytap6fyzmwqaxzfbyvkj7/y=">aaacvhicbvhlsgmxfm1mrdb6qrp0eyxcc7bmfefbhkibwu0f+4bolzn0thubezbkpgxor+pc8evcudb9lprwqudk3hosmxm34kwqy/o2znrwensns5vd2z84pmodnzrkgaskdrryulrciogzaoqkkq6tsadxxq5nd/gw7tffqugwbi9qhehhj/2aeywspalubvhuchyibq6xjczddomxdm+32jgs75mivsmojkjcctueidrxgiduspth0ootzxsvlulsmr+hybgby1tla1z4e9glkeelqnvzn04vplepgakcsnm2ruh1eiiuoxwmwsewebe6jh1oaxgqh2qnmyuywrea6wevfhofcs/yzudcfcnhvquv08hlem9k/tdrx8q76sqsigifaz1f5mucqxbpe8y9joaqptaaumh0rjgoii5n6x/i6hds9sdvgkalbftl+/kqx71fxjfbz+gcfzcnrlevpaiaqiokptcpgqzd+dj+zzsznktny+e5rstlhv4be+6yug==</latexit> Sec. 15.2.3 Kernels Why use kernels? Make non-separable problem separable. Map data into better representational space Can be learned to capture some notion of similarity Common kernels Linear Polynomial K(x,z) = (1+x T z) d Gives feature combinations Radial basis function (infinite dimensional space) Text classification: K(x i, x j ; Usually use linear or polynomial kernels )=exp( kx i x j k 2 2 2 ) 26
Sec. 15.1 Classification with SVMs with Kernels
Given a new point x, we can compute its score:

score(x) = Σ_{sv ∈ SV} α_sv K(x_sv, x) + b

where the sum runs only over the support vectors (the coefficients α_sv absorb the label y_sv). Decide the class based on whether the score is < or > 0. E.g., in scikit-learn this score is returned by decision_function. [figure: score values -1, 0, +1 marking the margin boundaries and the decision boundary] 27
Sec. 15.1 Pros and Cons of the SVM Classifier
Pros
- High accuracy
- Fast classification speed
- Works in cases where #features > #samples
- Can adapt to different kinds of objects via kernels: any symmetric, continuous, positive semi-definite K(u, v) can be used, or the feature space can be engineered explicitly
Cons
- Training does not scale well with the number of training samples (roughly O(d*n^2) to O(d*n^3))
- Hyper-parameters need tuning 28
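The hyper-parameter tuning mentioned above is usually done by cross-validated grid search; a minimal sketch (the data, parameter grid, and values are illustrative only) tuning C and gamma for an RBF-kernel SVC.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# toy data: two noisy 2-D blobs (values are illustrative only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)),
               rng.normal(3, 1, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# C (margin softness) and gamma (RBF width) are the usual knobs to tune
grid = GridSearchCV(SVC(kernel="rbf"),
                    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
                    cv=5)
grid.fit(X, y)
```

Each grid cell refits the model, so the poor training-time scaling noted under Cons is multiplied by the grid size times the number of folds.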
Ch. 15 Resources for today's lecture
- Christopher J. C. Burges. 1998. A Tutorial on Support Vector Machines for Pattern Recognition.
- S. T. Dumais. 1998. Using SVMs for text categorization. IEEE Intelligent Systems, 13(4).
- S. T. Dumais, J. Platt, D. Heckerman and M. Sahami. 1998. Inductive learning algorithms and representations for text categorization. CIKM '98, pp. 148-155.
- Yiming Yang and Xin Liu. 1999. A re-examination of text categorization methods. 22nd Annual International SIGIR Conference (SIGIR '99).
- Tong Zhang and Frank J. Oles. 2001. Text Categorization Based on Regularized Linear Classification Methods. Information Retrieval 4(1): 5-31.
- Trevor Hastie, Robert Tibshirani and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York.
- T. Joachims. 2002. Learning to Classify Text using Support Vector Machines. Kluwer.
- Fan Li and Yiming Yang. 2003. A Loss Function Analysis for Classification Methods in Text Categorization. ICML 2003: 472-479.
- Tie-Yan Liu, Yiming Yang, Hao Wan, et al. 2005. Support Vector Machines Classification with Very Large Scale Taxonomy. SIGKDD Explorations, 7(1): 36-43.
- Classic Reuters-21578 data set: http://www.daviddlewis.com/resources/testcollections/reuters21578/