Linear Classification, SVMs and Nearest Neighbors

CSE 473 Lecture 25 (Chapter 18)
Linear Classification, SVMs and Nearest Neighbors
CSE AI faculty + Chris Bishop, Dan Klein, Stuart Russell, Andrew Moore

Motivation: Face Detection
How do we build a classifier to distinguish between faces and other objects?

Binary Classification: Example
Faces (class C1) vs. non-faces (class C2), plotted against Feature 1 and Feature 2. How do we classify new data points?

Binary Classification: Linear Classifiers
Find a line (in general, a hyperplane) separating the two sets of data points: g(x) = w·x + b = 0, i.e., w1 x1 + w2 x2 + b = 0. The boundary g(x) = 0 divides the space into a region where g(x) > 0 and a region where g(x) < 0. For any new point x, choose class C1 if g(x) > 0 and class C2 otherwise.
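As a concrete illustration (not from the slides), here is a minimal Python sketch of this decision rule, assuming the weight vector w and bias b have already been learned; the parameter values below are made up for demonstration.

import numpy as np

def linear_classify(x, w, b):
    """Assign class C1 if g(x) = w·x + b > 0, otherwise class C2."""
    g = np.dot(w, x) + b
    return "C1" if g > 0 else "C2"

# Hypothetical learned parameters for a 2-D feature space.
w = np.array([1.5, -0.7])
b = -0.2

print(linear_classify(np.array([2.0, 1.0]), w, b))  # g = 2.1 > 0  -> C1
print(linear_classify(np.array([0.0, 2.0]), w, b))  # g = -1.6 < 0 -> C2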

Separating Hyperplane
A hyperplane w·x + b = 0 separates Class 1 (points with output +1) from Class 2 (points with output -1). We need to choose w and b based on the training data.

Separating Hyperplanes
Different choices of w and b give different separating hyperplanes. (This and the next few slides are adapted from Andrew Moore's.)

Which hyperplane is best?

How about the one right in the middle?
Intuitively, this boundary seems good: it avoids misclassification of new test points if they are generated from the same distribution as the training points.

Margin
Define the margin of a linear classifier as the width by which the boundary could be increased before hitting a data point.

Maximum Margin and Support Vector Machines
Support vectors are the data points that the margin pushes up against. The maximum margin classifier is called a Support Vector Machine (in this case, a Linear SVM or LSVM).
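To make the margin concrete, here is a small sketch (not from the slides) computing the perpendicular distance |w·x + b| / ||w|| from a point to the decision boundary; the hyperplane and points are made up for illustration.

import numpy as np

def distance_to_hyperplane(x, w, b):
    """Perpendicular distance from point x to the hyperplane w·x + b = 0."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

# Hypothetical hyperplane and points.
w, b = np.array([2.0, 1.0]), -1.0
for x in [np.array([1.0, 1.0]), np.array([0.0, 0.5])]:
    print(x, distance_to_hyperplane(x, w, b))

# The margin of the classifier is set by the training point closest to the boundary.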

Why Maximum Margin?
It is robust to small perturbations of data points near the boundary, there is theory showing it is best for generalization to new points, and it works great empirically.

Finding the Maximum Margin (For Math Lovers' Eyes Only)
One can show that we need to maximize 2/||w|| subject to y_i (x_i·w + b) ≥ 1 for all i. This constrained optimization problem leads to

  w = Σ_i a_i y_i x_i

where the a_i are obtained by maximizing

  Σ_i a_i - (1/2) Σ_{i,j} a_i a_j y_i y_j (x_i·x_j)   subject to a_i ≥ 0 and Σ_i a_i y_i = 0.

This is a quadratic programming (QP) problem, so a global maximum can always be found, and it depends only on dot products of the inputs. (Interested in more details? See Burges' SVM tutorial online.)
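As a hedged sketch of what solving this QP looks like in practice (assuming scikit-learn is available; the toy data is made up), one can fit a linear SVM with a very large C to approximate the hard-margin solution and read off the support vectors and the dual coefficients a_i y_i:

import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (invented for illustration).
X = np.array([[1.0, 1.0], [2.0, 2.5], [0.5, 2.0],
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # large C approximates the hard margin
clf.fit(X, y)

print("w =", clf.coef_[0], "b =", clf.intercept_[0])
print("support vectors:\n", clf.support_vectors_)
print("dual coefficients a_i * y_i:", clf.dual_coef_[0])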

What if the data is not linearly separable?
One common cause is outliers (due to noise).

Soft Margin SVMs
Allow errors ξ_i (deviations from the margin) and trade off margin width against errors. Minimize

  (1/2) ||w||² + C Σ_i ξ_i

subject to  y_i (w·x_i + b) ≥ 1 - ξ_i  and  ξ_i ≥ 0  for all i.
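A brief, hedged illustration of this trade-off (again assuming scikit-learn; the noisy toy data is invented): a small C tolerates more margin violations, while a large C penalizes them heavily.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two noisy, slightly overlapping blobs (made up for illustration).
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(2.5, 1.0, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {len(clf.support_)} support vectors, "
          f"training accuracy {clf.score(X, y):.2f}")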

Another Example
Here the data is not linearly separable.

Handling non-linearly separable data
Idea: map the original input space to a higher-dimensional feature space, x → φ(x), and use a linear classifier in the higher-dimensional space.
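A hedged sketch of the idea (the 1-D example below is invented for illustration): data that no single threshold can separate on a line becomes linearly separable after a simple nonlinear map.

import numpy as np

# 1-D data: class +1 at the extremes, class -1 in the middle (no threshold separates them).
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([1, -1, -1, 1])

# Map to the 2-D feature space phi(x) = (x, x^2); a horizontal line now separates the classes.
phi = np.column_stack([x, x ** 2])
print(phi)
# In phi-space the linear rule "x^2 > 2.5" classifies every point correctly.
print(np.where(phi[:, 1] > 2.5, 1, -1))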

Problem: High-dimensional spaces
Computation in the high-dimensional feature space is costly, and the high-dimensional projection function φ(x) may be too complicated to compute. Kernel trick to the rescue!

The Kernel Trick
Recall that the SVM maximizes the quadratic function

  Σ_i a_i - (1/2) Σ_{i,j} a_i a_j y_i y_j (x_i·x_j)   subject to a_i ≥ 0 and Σ_i a_i y_i = 0.

Insight: the data points only appear as dot products, so there is no need to compute the high-dimensional φ(x) explicitly. Just replace the inner product x_i·x_j with a kernel function K(x_i, x_j) = φ(x_i)·φ(x_j). E.g., the Gaussian kernel K(x_i, x_j) = exp(-||x_i - x_j||² / 2σ²), or the polynomial kernel K(x_i, x_j) = (x_i·x_j + 1)^d.
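A minimal, hedged sketch of the two kernels mentioned above (the parameter values and test points are illustrative):

import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma ** 2))

def polynomial_kernel(x, z, d=2):
    """K(x, z) = (x·z + 1)^d."""
    return (np.dot(x, z) + 1) ** d

x, z = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(gaussian_kernel(x, z), polynomial_kernel(x, z))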

Example of the Kernel Trick
Suppose φ(·) is given as follows: [feature map shown on the slide]. The dot product in the feature space is [expanded on the slide]. So, if we define the kernel function to equal this dot product, there is no need to compute φ(·) explicitly. Using a kernel function to avoid computing φ(·) explicitly is known as the kernel trick.

Face Detection using SVMs
Kernel used: polynomial of degree 2 (Osuna, Freund and Girosi, 1998).
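As a hedged illustration of the identity behind the kernel trick (a standard textbook example, not necessarily the exact φ from the slide): for 2-D inputs, the degree-2 polynomial kernel (x·z + 1)² equals the dot product of explicit 6-dimensional feature vectors, so the kernel evaluates the high-dimensional inner product without ever forming φ.

import numpy as np

def phi(x):
    """Explicit feature map whose inner product matches (x·z + 1)^2 for 2-D inputs."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2, 1.0])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
lhs = (np.dot(x, z) + 1) ** 2   # kernel evaluation in the original input space
rhs = np.dot(phi(x), phi(z))    # dot product in the 6-D feature space
print(lhs, rhs)                 # both equal 4.0 here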

Support Vectors
[The support vectors found by the classifier are shown on the slide.]

K-Nearest Neighbors
Idea: do as your neighbors do! Classify a new data point according to a majority vote of its k nearest neighbors. How do you measure "near"? For discrete x (e.g., strings), use the Hamming distance: d(x1, x2) = the number of features on which x1 and x2 differ. For continuous x (e.g., images), use the Euclidean distance: d(x1, x2) = ||x1 - x2||, the square root of the sum of squared differences between corresponding elements of the data vectors.
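A minimal, hedged sketch of k-nearest-neighbor classification with Euclidean distance (the training points are made up; a Hamming distance could be substituted for discrete features):

import numpy as np
from collections import Counter

def knn_classify(query, X, labels, k=4):
    """Majority vote among the k training points closest to the query (Euclidean distance)."""
    dists = np.linalg.norm(X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical 2-D training data for classes C1 and C2.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
              [5.0, 5.0], [6.0, 5.5], [5.5, 6.0]])
labels = np.array(["C1", "C1", "C1", "C2", "C2", "C2"])

# 3 of the 4 nearest neighbors of (2, 2) are in C1, so the vote returns C1.
print(knn_classify(np.array([2.0, 2.0]), X, labels, k=4))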

Example
Input data: 2-D points (x1, x2) with two classes, C1 and C2. Given a new data point "+" and K = 4, look at its 4 nearest neighbors: 3 are in C1, so classify "+" as C1.

K-NN produces a nonlinear decision boundary
Some points near the boundary may be misclassified (but that is perhaps okay because of noise).

Next Time
Regression (learning functions with continuous outputs): Linear Regression, Neural Networks.

To Do: Project 4; read Chapter 18.