SUPPORT VECTOR MACHINE


SUPPORT VECTOR MACHINE Mainly based on https://nlp.stanford.edu/ir-book/pdf/15svm.pdf 1

Overview. SVM is a huge topic; this lecture integrates MMDS, IIR, and Andrew Moore's slides. Our foci: geometric intuition → primal form; an alternative interpretation from the Empirical Risk Minimization point of view; understanding the final solution (in the dual form), which generalizes well to kernel functions (SVM + Kernels). 2

Ch. 15 Linear classifiers: Which Hyperplane? w^T x_i + b = 0. There are lots of possible solutions for a, b, c. Some methods find a separating hyperplane, but not the optimal one [according to some criterion of expected goodness], e.g., the perceptron. A Support Vector Machine (SVM) finds an optimal* solution: it maximizes the distance between the hyperplane and the difficult points close to the decision boundary. One intuition: if there are no points near the decision surface, then there are no very uncertain classification decisions. This line represents the decision boundary: ax + by − c = 0. 3

Sec. 15.1 Another intuition: if you have to place a fat separator between the classes, you have fewer choices, and so the capacity of the model has been decreased. 4

Sec. 15.1 Support Vector Machine (SVM). SVMs maximize the margin around the separating hyperplane (a.k.a. large margin classifiers). The decision function is fully specified by a subset of the training samples, the support vectors. Solving SVMs is a quadratic programming problem. SVMs are seen by many as the most successful current text classification method*. [Figure: support vectors on the margin boundaries; the maximized margin vs. a narrower margin.] *but other discriminative methods often perform very similarly 5

Sec. 15.1 Maximum Margin: Formalization. w: decision hyperplane normal vector; x_i: data point i; y_i: class of data point i (+1 or −1; NB: not 1/0). The classifier is f(x_i) = sign(w^T x_i + b). The functional margin of x_i is y_i(w^T x_i + b). But note that we can increase this margin simply by rescaling w and b (NB: a common trick). The functional margin of a dataset is twice the minimum functional margin over all points; the factor of 2 comes from measuring the whole width of the margin. 6

Largest Margin. [Figure: positive (+) and negative (−) points around the separating hyperplane w^T x + b = 0; point A lies on w^T x + b = 7.4 and point C near w^T x + b = 3.7.] Distance from the separating hyperplane corresponds to the confidence of the prediction. Example: we are more sure about the class of A and B than about the class of C. 7

Sec. 15.1 Geometric Margin. The distance from an example to the separator is r = y(w^T x + b)/‖w‖. Examples closest to the hyperplane are support vectors. The margin ρ of the separator is the width of separation between the support vectors of the two classes. Algebraic derivation of r: let x′ be the point where the perpendicular (dotted line) from x meets the decision boundary. That line is perpendicular to the decision boundary and hence parallel to w; the unit vector is w/‖w‖, so the segment from x to x′ is r·w/‖w‖ for some r, and x′ = x − y·r·w/‖w‖. Since x′ lies on the boundary it satisfies w^T x′ + b = 0, so w^T(x − y·r·w/‖w‖) + b = 0. Recalling that ‖w‖ = sqrt(w^T w), this gives w^T x − y·r·‖w‖ + b = 0, and solving for r yields r = y(w^T x + b)/‖w‖. 8
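
A quick sanity check of the formula r = y(w^T x + b)/‖w‖; this is a minimal NumPy sketch, and the separator w, b and the two labeled points are made up for illustration.

```python
import numpy as np

# Hypothetical separator and labeled points (not from the slides).
w = np.array([2.0, 1.0])
b = -1.0
X = np.array([[1.0, 2.0], [-1.0, 0.5]])
y = np.array([+1, -1])

# Geometric margin of each example: r = y * (w^T x + b) / ||w||
r = y * (X @ w + b) / np.linalg.norm(w)
print(r)  # positive values mean each point lies on its own class's side
```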

Help from the Inner Product. Remember the dot product / inner product: ⟨A, B⟩ = ‖A‖‖B‖ cos θ. (Or use an algebraic derivation similar to the previous slide.) A's projection onto B is ⟨A, B⟩/⟨B, B⟩ (as a multiple of B); its length is ⟨A, B⟩/‖B‖. 9

Derivation of the Geometric Margin. Let L be the line w^T x + b = 0 and let H be the foot of the perpendicular from a point A onto L (the origin is at (0,0)). Then d(A, L) = ⟨A, w⟩/‖w‖ − ⟨H, w⟩/‖w‖. Note that H is on the line L, so ⟨w, H⟩ + b = 0. Therefore d(A, L) = (⟨A, w⟩ + b)/‖w‖. 10

Sec. 15.1 Linear SVM Mathematically. The linearly separable case: assume that all data are at least distance 1 from the hyperplane; then the following two constraints follow for a training set {(x_i, y_i)}: w^T x_i + b ≥ 1 if y_i = 1, and w^T x_i + b ≤ −1 if y_i = −1. For the support vectors, the inequality becomes an equality. Then, since each example's distance from the hyperplane is r = y(w^T x + b)/‖w‖, the margin is ρ = 2/‖w‖. 11

Sec. 15.1 Derivation of ρ. Take support vectors x_a and x_b on the two margin hyperplanes, w^T x_a + b = 1 and w^T x_b + b = −1, on either side of the separating hyperplane w^T x + b = 0, with x_a − x_b along the normal direction w. Extra scale constraint: min_{i=1,...,n} |w^T x_i + b| = 1. This implies w^T(x_a − x_b) = 2, so ρ = ‖x_a − x_b‖_2 = 2/‖w‖_2. 12

Sec. 15.1 Solving the Optimization Problem. Find w and b such that Φ(w) = ½ w^T w is minimized and, for all {(x_i, y_i)}: y_i(w^T x_i + b) ≥ 1. This is now optimizing a quadratic function subject to linear constraints. Quadratic optimization problems are a well-known class of mathematical programming problems, and many (intricate) algorithms exist for solving them (with many special ones built for SVMs). The solution involves constructing a dual problem in which a Lagrange multiplier α_i is associated with every constraint in the primal problem: Find α_1 ... α_N such that Q(α) = Σ α_i − ½ ΣΣ α_i α_j y_i y_j x_i^T x_j is maximized and (1) Σ α_i y_i = 0, (2) α_i ≥ 0 for all α_i. 13
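
To make "constructing and solving the dual" concrete, here is a small sketch that hands the dual QP to a generic constrained optimizer (SciPy's SLSQP); real SVM solvers such as SMO/LIBSVM are far more specialized, and the four training points below are made up.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny made-up, linearly separable dataset.
X = np.array([[2.0, 2.0], [2.0, 0.0], [0.0, 0.0], [0.0, 2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

G = (y[:, None] * X) @ (y[:, None] * X).T        # G_ij = y_i y_j x_i^T x_j

def neg_Q(a):                                    # minimizing -Q(alpha) maximizes Q(alpha)
    return 0.5 * a @ G @ a - a.sum()

constraints = [{"type": "eq", "fun": lambda a: a @ y}]   # (1) sum_i alpha_i y_i = 0
bounds = [(0.0, None)] * len(y)                          # (2) alpha_i >= 0
res = minimize(neg_Q, np.zeros(len(y)), bounds=bounds, constraints=constraints)

alpha = res.x
w = (alpha * y) @ X                              # w = sum_i alpha_i y_i x_i
sv = alpha > 1e-6                                # support vectors have non-zero multipliers
b = np.mean(y[sv] - X[sv] @ w)                   # b = y_k - w^T x_k for any support vector
print(alpha, w, b)
```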

Sec. 15.1 Geometric Interpretation. Find w and b such that Φ(w) = ½ w^T w is minimized and, for all {(x_i, y_i)}: y_i(w^T x_i + b) ≥ 1. What is fixed and what are the variables? Given the linear constraint(s), where can the variables lie? And what does the quadratic objective function look like over that region? 14

Sec. 15.1 The Optimization Problem Solution. The solution has the form w = Σ α_i y_i x_i and b = y_k − w^T x_k for any x_k such that α_k ≠ 0. Each non-zero α_i indicates that the corresponding x_i is a support vector. The scoring function then has the form f(x) = Σ α_i y_i x_i^T x + b = Σ_{sv∈SV} α_sv y_sv ⟨x_sv, x⟩ + b. Q: What are the model parameters? What does f(x) mean intuitively? Classification is based on the sign of f(x). Notice that f(x) relies on an inner product between the test point x and the support vectors x_i; we will return to this later. Also keep in mind that solving the optimization problem involved computing the inner products ⟨x_i, x_j⟩ between all pairs of training points. 15
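
The same structure can be inspected in a fitted model. A minimal scikit-learn sketch on made-up data, assuming a linear-kernel SVC: dual_coef_ stores α_sv·y_sv for the support vectors, so the score can be reassembled by hand and compared with decision_function.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up 2-D data; a linear-kernel SVC exposes the dual solution directly.
X = np.array([[2.0, 2.0], [2.0, 0.0], [0.0, 0.0], [0.0, 2.0]])
y = np.array([1, 1, -1, -1])
clf = SVC(kernel="linear", C=10.0).fit(X, y)

# f(x) = sum_sv (alpha_sv * y_sv) <x_sv, x> + b ; dual_coef_ holds alpha_sv * y_sv.
x_new = np.array([1.5, 1.0])
score = clf.dual_coef_ @ (clf.support_vectors_ @ x_new) + clf.intercept_
print(score, clf.decision_function([x_new]))     # the two scores should agree
```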

Sec. 15.2.1 Soft Margin Classification. If the training data is not linearly separable, slack variables ξ_i can be added to allow misclassification of difficult or noisy examples. Allow some errors: let some points be moved to where they belong, at a cost. Still, try to minimize the training-set errors and to place the hyperplane far from each class (large margin). [Figure: two points on the wrong side of their margin, with slacks ξ_i and ξ_j.] 16

Sec. 15.2.1 [Optional] Soft Margin Classification Mathematically. The old formulation: Find w and b such that Φ(w) = ½ w^T w is minimized and, for all {(x_i, y_i)}: y_i(w^T x_i + b) ≥ 1. The new formulation incorporating slack variables: Find w and b such that Φ(w) = ½ w^T w + C Σ ξ_i is minimized and, for all {(x_i, y_i)}: y_i(w^T x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0 for all i. The parameter C can be viewed as a way to control overfitting: it trades the regularization term ½ w^T w off against the slack penalties. 17

Alternative View. SVM in the natural form:
arg min_{w,b} ½ w·w + C Σ_{i=1..n} max{ 0, 1 − y_i(w·x_i + b) }
The first term corresponds to the margin; the hyper-parameter C, related to regularization, trades it off against the empirical loss L (how well we fit the training data). SVM uses the hinge loss, max{0, 1 − z} with z = y_i(w·x_i + b), as a convex surrogate for the 0/1 loss. Equivalently, in constrained form:
min_{w,b} ½ ‖w‖² + C Σ_{i=1..n} ξ_i   s.t. ∀i: y_i(w·x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0
[Figure: hinge loss max{0, 1 − z} compared with the 0/1 loss, as a function of z.] 18
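
Equivalently, the unconstrained hinge-loss form can be minimized directly. This is a minimal subgradient-descent sketch (Pegasos-style) on made-up data; the learning rate and iteration count are arbitrary.

```python
import numpy as np

# Made-up data for the objective 1/2 ||w||^2 + C * sum_i max{0, 1 - y_i (w.x_i + b)}
X = np.array([[2.0, 2.0], [2.0, 0.0], [0.0, 0.0], [0.0, 2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
C, lr = 1.0, 0.01

w, b = np.zeros(2), 0.0
for _ in range(2000):
    margins = y * (X @ w + b)
    viol = margins < 1                       # points inside the margin incur hinge loss
    # Subgradient of 1/2||w||^2 + C * sum of hinge terms
    grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
    grad_b = -C * y[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)   # should approach the max-margin separator for this toy data
```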

Sec. 15.2.1 [Optional] Soft Margin Classification Solution. The dual problem for soft margin classification: Find α_1 ... α_N such that Q(α) = Σ α_i − ½ ΣΣ α_i α_j y_i y_j x_i^T x_j is maximized and (1) Σ α_i y_i = 0, (2) 0 ≤ α_i ≤ C for all α_i. Neither the slack variables ξ_i nor their Lagrange multipliers appear in the dual problem! Again, the x_i with non-zero α_i will be support vectors. The solution to the dual problem is w = Σ α_i y_i x_i and b = y_k(1 − ξ_k) − w^T x_k, where k = argmax_k α_k. w is not needed explicitly for classification: f(x) = Σ α_i y_i x_i^T x + b 19

Sec. 15.1 Classification with SVMs. Given a new point x, we can score its projection onto the hyperplane normal, i.e., compute the score w^T x + b = Σ α_i y_i x_i^T x + b, and decide the class based on whether it is < or > 0. We can also set a confidence threshold t: score > t means yes; score < −t means no; otherwise, don't know. 20
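
A small sketch of this thresholded decision rule, using scikit-learn's decision_function as the score; the threshold t = 0.5 and the query points are made up.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [2.0, 0.0], [0.0, 0.0], [0.0, 2.0]])
y = np.array([1, 1, -1, -1])
clf = SVC(kernel="linear").fit(X, y)

t = 0.5                                # hypothetical confidence threshold
scores = clf.decision_function([[1.6, 1.0], [1.0, 1.0], [0.2, 0.4]])
for s in scores:
    label = "yes" if s > t else "no" if s < -t else "don't know"
    print(f"score={s:+.2f} -> {label}")
```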

Sec. 15.2.1 Linear SVMs: Summary. The classifier is a separating hyperplane. The most important training points are the support vectors; they define the hyperplane. Quadratic optimization algorithms can identify which training points x_i are support vectors, i.e., have non-zero Lagrange multipliers α_i. Both in the dual formulation of the problem and in the solution, training points appear only inside inner products: Find α_1 ... α_N such that Q(α) = Σ α_i − ½ ΣΣ α_i α_j y_i y_j x_i^T x_j is maximized and (1) Σ α_i y_i = 0, (2) 0 ≤ α_i ≤ C for all α_i. f(x) = Σ_{sv∈SV} α_sv y_sv ⟨x_sv, x⟩ + b 21

Sec. 15.2.3 Non-linear SVMs. Datasets that are linearly separable (with some noise) work out great. But what are we going to do if the dataset is just too hard? How about mapping the data to a higher-dimensional space? [Figure: 1-D data on the x axis that no threshold can separate; after mapping to (x, x²) it becomes linearly separable.] 22
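
A tiny sketch of this idea, assuming the mapping x ↦ (x, x²): 1-D data whose negatives sit between the positives cannot be split by any threshold on x, but becomes linearly separable after the mapping (the threshold of 2 on x² is made up for this example).

```python
import numpy as np

# Made-up 1-D data that no single threshold on x can separate: negatives in the middle.
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([ 1,    1,   -1,  -1,  -1,   1,   1 ])

# Map each point to (x, x^2); in this 2-D space the classes are linearly separable.
phi = np.column_stack([x, x ** 2])
print(phi)
print((phi[:, 1] > 2) * 2 - 1)   # matches y, so a linear separator exists in feature space
```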

Sec. 15.2.3 Non-linear SVMs: Feature spaces. General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x ↦ φ(x). 23

Sec. 15.2.3 The Kernel Trick. The linear classifier relies on an inner product between vectors, K(x_i, x_j) = x_i^T x_j. If every data point is mapped into a high-dimensional space via some transformation Φ: x ↦ φ(x), the inner product becomes K(x_i, x_j) = φ(x_i)^T φ(x_j). A kernel function is a function that corresponds to an inner product in some expanded feature space. Usually, there is no need to construct the feature space explicitly. 24

Sec. 15.2.3 Example. What about K(u, v) = (1 + u^T v)^3? Consider first the degree-2 case:
K(u, v) = (1 + u^T v)² = 1 + 2 u^T v + (u^T v)²
        = 1 + 2 u^T v + Σ_i u_i² v_i² + Σ_{i≠j} u_i u_j v_i v_j
        = φ(u)^T φ(v),
where φ(u) = (1, √2·u_1, ..., √2·u_d, u_1², ..., u_d², u_i·u_j for all i ≠ j)^T and φ(v) is defined the same way. The feature groups correspond to linear terms, non-linear (squared) terms, and non-linear feature combinations. 25
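
A numeric check of this identity: the feature map phi below lists the constant, the √2-scaled linear terms, the squares, and all ordered cross terms u_i·u_j (i ≠ j), matching the expansion above; the vectors u and v are arbitrary.

```python
import numpy as np

def phi(u):
    d = len(u)
    cross = [u[i] * u[j] for i in range(d) for j in range(d) if i != j]
    return np.concatenate(([1.0], np.sqrt(2) * u, u ** 2, cross))

u = np.array([0.7, -1.2, 0.3])
v = np.array([1.5, 0.4, -0.8])

lhs = (1 + u @ v) ** 2       # kernel value computed in the original space
rhs = phi(u) @ phi(v)        # same value via the explicit feature map
print(lhs, rhs)              # should agree up to floating-point error
```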

Sec. 15.2.3 Kernels. Why use kernels? To make a non-separable problem separable; to map data into a better representational space; kernels can be learned to capture some notion of similarity. Common kernels: linear; polynomial, K(x, z) = (1 + x^T z)^d, which gives feature combinations; radial basis function (an infinite-dimensional feature space), K(x_i, x_j; σ) = exp(−‖x_i − x_j‖² / (2σ²)). For text classification we usually use linear or polynomial kernels. 26
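
For concreteness, a small sketch that evaluates the RBF kernel on a made-up set of points; sigma is the bandwidth from the formula above.

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]])
K = rbf_kernel(X, X, sigma=1.0)
print(K)   # 1s on the diagonal; off-diagonal entries shrink with distance
```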

Sec. 15.1 Classification with SVMs with Kernels. Given a new point x, we can compute its score, Σ_{sv∈SV} α_sv y_sv K(x_sv, x) + b, and decide the class based on whether it is < or > 0. E.g., in scikit-learn: 27
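
A minimal scikit-learn sketch along these lines, using an RBF-kernel SVC on a made-up XOR-style dataset; decision_function returns the score whose sign gives the class.

```python
import numpy as np
from sklearn.svm import SVC

# XOR-like data: not linearly separable, but an RBF kernel handles it.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X, y)

x_new = np.array([[0.9, 0.1]])
score = clf.decision_function(x_new)       # sum_sv alpha_sv y_sv K(x_sv, x) + b
print(score, np.sign(score))               # class decided by the sign of the score
```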

Sec. 15.1 Pros and Cons of the SVM Classifier. Pros: high accuracy; fast classification speed; works in cases where #features > #samples; can adapt to different objects (due to the kernel): any K(u, v) can be used as long as it is symmetric, continuous, and positive semi-definite, or the feature space can be engineered explicitly. Cons: training does not scale well with the number of training samples (roughly O(d·n²) to O(d·n³)); hyper-parameters need tuning. 28

Ch. 15 Resources for today's lecture:
Christopher J. C. Burges. 1998. A Tutorial on Support Vector Machines for Pattern Recognition.
S. T. Dumais. 1998. Using SVMs for text categorization. IEEE Intelligent Systems, 13(4).
S. T. Dumais, J. Platt, D. Heckerman and M. Sahami. 1998. Inductive learning algorithms and representations for text categorization. CIKM '98, pp. 148-155.
Yiming Yang, Xin Liu. 1999. A re-examination of text categorization methods. 22nd Annual International SIGIR.
Tong Zhang, Frank J. Oles. 2001. Text Categorization Based on Regularized Linear Classification Methods. Information Retrieval 4(1): 5-31.
Trevor Hastie, Robert Tibshirani and Jerome Friedman. Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.
T. Joachims. 2002. Learning to Classify Text using Support Vector Machines. Kluwer.
Fan Li, Yiming Yang. 2003. A Loss Function Analysis for Classification Methods in Text Categorization. ICML 2003: 472-479.
Tie-Yan Liu, Yiming Yang, Hao Wan, et al. 2005. Support Vector Machines Classification with Very Large Scale Taxonomy. SIGKDD Explorations, 7(1): 36-43.
Classic Reuters-21578 data set: http://www.daviddlewis.com/resources/testcollections/reuters21578/