Classification and Support Vector Machine

1 Classification and Support Vector Machine
Yiyong Feng and Daniel P. Palomar
The Hong Kong University of Science and Technology (HKUST)
ELEC Convex Optimization Fall , HKUST, Hong Kong

2 Outline of Lecture
1 Problem Statement
2 Linear/Logistic Regression
  - Linear Regression
  - Regression with Huberized Loss
  - Logistic Regression
3 Support Vector Machine (SVM)
  - Linearly Separable
  - Linearly Nonseparable
  - Nonlinear
4 Application: Multiclass Image Classification
Y. Feng and D. Palomar (HKUST) CvxOpt in Image Classification ELEC / 70

4 Classification Problem
A set of input points with binary labels: $x_i \in \mathbb{R}^n$, $y_i \in \{-1, +1\}$, $i = 1, \dots, N$.
How do we classify the $x_i$'s?
[Figure: scatter plot of the Pos Class and Neg Class points]

5 Classification Problem
Separate the two classes using a linear model: $\hat{y} = \beta^T x + \beta_0$.
We can classify as follows:
- predict Pos Class if $\operatorname{sign}(\hat{y}) = +1$
- predict Neg Class if $\operatorname{sign}(\hat{y}) = -1$
Thus, misclassification happens if $\operatorname{sign}(\hat{y}) \neq y$.
Note that the decision boundary is then the hyperplane $\beta^T x + \beta_0 = 0$.

6 Classification Problem
Our goal is then to minimize the number of misclassifications:
$$\begin{array}{ll}
\underset{\beta_0,\,\beta,\,\{\hat{y}_i\}}{\text{minimize}} & \sum_{i=1}^{N} 1_{\{\operatorname{sign}(\hat{y}_i) \neq y_i\}} \\
\text{subject to} & \hat{y}_i = \beta^T x_i + \beta_0, \quad \forall i
\end{array}$$
However, the objective loss function is nonconvex and nondifferentiable.

7 What could we do?

9 Classification as Linear Regression
A straightforward idea is to do linear regression, i.e., to replace the misclassification loss with the residual sum of squares loss:
$$\begin{array}{ll}
\underset{\beta_0,\,\beta,\,\{\hat{y}_i\}}{\text{minimize}} & \sum_{i=1}^{N} (\hat{y}_i - y_i)^2 \\
\text{subject to} & \hat{y}_i = \beta^T x_i + \beta_0, \quad \forall i
\end{array}$$
Note that, since $y_i^2 = 1$,
$$(\hat{y}_i - y_i)^2 = (\hat{y}_i - y_i)^2 \, y_i^2 = (1 - y_i \hat{y}_i)^2.$$
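
As a rough illustration of the idea above, the following sketch fits the least-squares problem on a small made-up data set and classifies by the sign of the fitted value. The data, the variable names, and the use of `numpy.linalg.lstsq` are assumptions for illustration, not part of the lecture.

```python
import numpy as np

# Toy, linearly separable data (made up for illustration).
X_toy = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
                  [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
y_toy = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

# Append a column of ones so the intercept beta_0 is fitted together with beta.
A = np.hstack([X_toy, np.ones((X_toy.shape[0], 1))])

# Least-squares fit: minimizes sum_i (y_hat_i - y_i)^2.
coef, *_ = np.linalg.lstsq(A, y_toy, rcond=None)
beta, beta0 = coef[:-1], coef[-1]

y_hat = X_toy @ beta + beta0
pred = np.sign(y_hat)  # classify by the sign of the linear model

# Because y_i^2 = 1, the squared error equals (1 - y_i * y_hat_i)^2.
sq_err = (y_hat - y_toy) ** 2
margin_form = (1.0 - y_toy * y_hat) ** 2
```

On this particular data set the sign rule recovers all labels, but that is a property of the toy data, not a guarantee of the method.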

10 From Loss Function Point of View
Misclassification loss $1_{\{\operatorname{sign}(\hat{y}) \neq y\}}$; squared error loss $(\hat{y} - y)^2 = (1 - y\hat{y})^2$.
[Figure: misclassification loss and squared error loss (Linear Reg) plotted against $y\hat{y}$]

11 Decision Boundary
[Figure: Pos Class and Neg Class points with the Linear Reg decision boundary]

12 Can we do better?

13 Some idea?
[Figure: misclassification loss and squared error loss (Linear Reg) plotted against $y\hat{y}$]
Can we find a better approximation of the loss function?

14 Famous Huber Loss
Huber loss (with parameter $M$):
$$\phi_{\text{hub}}(x) = \begin{cases} x^2, & |x| < M \\ M\,(2|x| - M), & |x| \geq M \end{cases}$$
Select $M = 2$ and define the loss as $\phi_{\text{hub}}(1 - y\hat{y})$.
[Figure: misclassification loss, squared error loss (Linear Reg), and Huber loss plotted against $y\hat{y}$]

15 Famous Huber Loss
Actually, we don't need to penalize when $y\hat{y} > 0$! Define
$$\phi_{\text{hub\_pos}}(x) = \begin{cases} \phi_{\text{hub}}(x), & x \geq 0 \\ 0, & x < 0 \end{cases}$$
Select $M = 2$ and define the loss as $\phi_{\text{hub\_pos}}(1 - y\hat{y})$. With this definition the loss vanishes once $y\hat{y} \geq 1$.
[Figure: misclassification loss, squared error loss (Linear Reg), and Class Huber loss plotted against $y\hat{y}$]

16 Minimize the Huberized Loss
We then take the Huberized loss as the approximation and minimize it:
$$\begin{array}{ll}
\underset{\beta_0,\,\beta,\,\{\hat{y}_i\}}{\text{minimize}} & \sum_{i=1}^{N} \phi_{\text{hub\_pos}}(1 - y_i \hat{y}_i) \\
\text{subject to} & \hat{y}_i = \beta^T x_i + \beta_0, \quad \forall i
\end{array}$$
[Figure: plot of $\phi_{\text{hub\_pos}}(x)$ against $x$ for $M = 2$]
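
The Huberized loss above can be written down directly. The sketch below is one possible NumPy transcription of $\phi_{\text{hub}}$ and $\phi_{\text{hub\_pos}}$ with $M = 2$ as the default; the function names and the small evaluation helper `huberized_loss` are illustrative choices, not taken from the lecture.

```python
import numpy as np

def phi_hub(x, M=2.0):
    """Huber loss: quadratic for |x| < M, linear beyond."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < M, x ** 2, M * (2.0 * np.abs(x) - M))

def phi_hub_pos(x, M=2.0):
    """One-sided ("Class") Huber loss: zero for negative arguments."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0.0, phi_hub(x, M), 0.0)

def huberized_loss(beta, beta0, X, y, M=2.0):
    """Objective value sum_i phi_hub_pos(1 - y_i * y_hat_i) for a linear model."""
    y_hat = X @ beta + beta0
    return float(np.sum(phi_hub_pos(1.0 - y * y_hat, M)))
```

For example, a point with margin $y\hat{y} = 2$ contributes nothing, while a point with margin $-0.5$ contributes $1.5^2 = 2.25$.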

17 Decision Boundary
[Figure: Pos Class and Neg Class points with the Linear Reg and Reg with Huber decision boundaries]

18 Can we find more approximations?

19 Logistic Model
Given a data point $x \in \mathbb{R}^n$, model the probability of the label $y \in \{-1, +1\}$ as (recall that $\hat{y} = \beta^T x + \beta_0$):
$$P(Y = y \mid x) = \frac{1}{1 + e^{-y\hat{y}}}$$
Here, the function $g(z) = \frac{1}{1 + e^{-z}}$ is called the logistic function.
[Figure: plot of the logistic function $g(z)$ against $z$]

20 Logistic Model
The above probability formula amounts to modeling the log-odds ratio as the linear model $\hat{y}$:
$$\log \frac{P(Y = 1 \mid x)}{P(Y = -1 \mid x)} = \log \frac{1 + e^{\hat{y}}}{1 + e^{-\hat{y}}} = \hat{y} = \beta^T x + \beta_0$$
Classification rule:
- predict Pos Class if $\operatorname{sign}(\hat{y}) = +1$
- predict Neg Class if $\operatorname{sign}(\hat{y}) = -1$
The above classification rule is exactly the same as the one we wanted before!

21 Negative Log-likelihood as Loss
Armed with the logistic model, we have the likelihood function:
$$\text{LH}(\beta, \beta_0) = \prod_{i=1}^{N} P(Y = y_i \mid x_i) = \prod_{i=1}^{N} \frac{1}{1 + e^{-y_i \hat{y}_i}}$$
The loss function can be defined as the negative log-likelihood:
$$\text{Loss}(\beta, \beta_0) = -\log \text{LH}(\beta, \beta_0) = \sum_{i=1}^{N} \log\left(1 + e^{-y_i \hat{y}_i}\right)$$

22 Logistic Regression
Now, minimize the loss function (i.e., maximize the likelihood):
$$\begin{array}{ll}
\underset{\beta_0,\,\beta,\,\{\hat{y}_i\}}{\text{minimize}} & \sum_{i=1}^{N} \log\left(1 + e^{-y_i \hat{y}_i}\right) \\
\text{subject to} & \hat{y}_i = \beta^T x_i + \beta_0, \quad \forall i
\end{array}$$
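
The lecture does not say how this problem is solved. As one possible approach, the sketch below minimizes the negative log-likelihood by plain gradient descent on a made-up data set; the step size, iteration count, data, and function names are all assumptions chosen for illustration.

```python
import numpy as np

# Toy, linearly separable data (made up for illustration).
X_toy = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
                  [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
y_toy = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

def neg_log_likelihood(beta, beta0, X, y):
    """sum_i log(1 + exp(-y_i * y_hat_i)), computed stably with logaddexp."""
    margins = y * (X @ beta + beta0)
    return float(np.sum(np.logaddexp(0.0, -margins)))

def fit_logistic(X, y, step=0.1, iters=500):
    """Gradient descent on the averaged negative log-likelihood."""
    n_samples, n_features = X.shape
    beta, beta0 = np.zeros(n_features), 0.0
    for _ in range(iters):
        margins = y * (X @ beta + beta0)
        # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m)); chain rule adds the factor y_i.
        w = -y * np.exp(-np.logaddexp(0.0, margins))
        beta = beta - step * (X.T @ w) / n_samples
        beta0 = beta0 - step * np.mean(w)
    return beta, beta0

beta_hat, beta0_hat = fit_logistic(X_toy, y_toy)
pred = np.sign(X_toy @ beta_hat + beta0_hat)
```

Since the problem is convex, any reasonable first- or second-order method could be used instead; gradient descent is chosen here only because it is short.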

23 From Loss Function Point of View
- Misclassification loss: $1_{\{\operatorname{sign}(\hat{y}) \neq y\}}$
- Squared error loss: $(\hat{y} - y)^2$
- Class Huber: $\phi_{\text{hub\_pos}}(1 - y\hat{y})$
- Negative log-likelihood: $\log\left(1 + e^{-y\hat{y}}\right)$
[Figure: misclassification, squared error (Linear Reg), Class Huber, and negative log-likelihood (Logistic Reg) losses plotted against $y\hat{y}$]

24 Decision Boundary
[Figure: Pos Class and Neg Class points with the Linear Reg, Reg with Huber, and Logistic Reg decision boundaries]

26 Optimal Separating Hyperplane
Many hyperplanes can separate the two classes. BUT, what is the optimal separating hyperplane?
The optimal separating hyperplane should:
- separate the two classes, and
- maximize the distance to the closest point from either class.

27 Signed Distance
Decision boundary: $\beta^T x + \beta_0 = 0$.
For a point $x_0$ on the decision boundary, the signed distance from a point $x$ to the boundary is
$$\frac{1}{\|\beta\|} \beta^T (x - x_0) = \frac{1}{\|\beta\|} \left(\beta^T x + \beta_0\right)$$
[Figure: the decision boundary $\beta^T x + \beta_0 = 0$, its unit normal $\beta / \|\beta\|$, a point $x_0$ on the boundary, and the signed distance of a point $x$]

28 Maximize the Margin
The margin is then defined as $\frac{1}{\|\beta\|}\, y \left(\beta^T x + \beta_0\right)$.
Now, maximize the distance to the closest point from either class:
$$\begin{array}{ll}
\underset{\beta_0,\,\beta,\,M}{\text{maximize}} & M \\
\text{subject to} & \frac{1}{\|\beta\|}\, y_i \left(\beta^T x_i + \beta_0\right) \geq M, \quad \forall i
\end{array}$$

29 Support Vector Machine
For any $\beta$ and $\beta_0$ satisfying these inequalities, any positively scaled multiple satisfies them too, so we can arbitrarily set $\|\beta\| = 1/M$. Maximizing $M$ is then the same as minimizing $\|\beta\|$.
The above problem amounts to
$$\begin{array}{ll}
\underset{\beta_0,\,\beta}{\text{minimize}} & \|\beta\| \\
\text{subject to} & y_i \left(\beta^T x_i + \beta_0\right) \geq 1, \quad \forall i
\end{array}$$

30 Boundary and Support Vectors
[Figure: Pos Class and Neg Class points with the Linear Reg, Reg with Huber, Logistic Reg, and SVM decision boundaries, and the SVM support lines]

31 The case we considered before is linearly separable. What if it's linearly nonseparable?

32 Linearly Nonseparable Case
[Figure: scatter plot of overlapping Pos Class and Neg Class points]

33 Linearly Nonseparable Case
Relax the constraints by introducing nonnegative slack variables $\xi_i$:
$$y_i \left(\beta^T x_i + \beta_0\right) \geq 1 - \xi_i$$
and then penalize $\sum_i \xi_i$ in the objective function:
$$\begin{array}{ll}
\underset{\beta_0,\,\beta,\,\{\xi_i\}}{\text{minimize}} & \frac{1}{2}\|\beta\|^2 + C \sum_{i=1}^{N} \xi_i \\
\text{subject to} & \xi_i \geq 0, \quad y_i \left(\beta^T x_i + \beta_0\right) \geq 1 - \xi_i, \quad \forall i
\end{array}$$
The linearly separable case corresponds to $C = \infty$.

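34 The SVM as a Penalization Method
Revisit the SVM primal problem:
$$\begin{array}{ll}
\underset{\beta_0,\,\beta,\,\{\xi_i\}}{\text{minimize}} & \frac{1}{2}\|\beta\|^2 + C \sum_{i=1}^{N} \xi_i \\
\text{subject to} & \xi_i \geq 0, \quad y_i \left(\beta^T x_i + \beta_0\right) \geq 1 - \xi_i, \quad \forall i
\end{array}$$
Consider the optimization problem:
$$\begin{array}{ll}
\underset{\beta_0,\,\beta,\,\{\hat{y}_i\}}{\text{minimize}} & \sum_{i=1}^{N} \left[1 - y_i \hat{y}_i\right]_+ + \frac{\lambda}{2}\|\beta\|^2 \\
\text{subject to} & \hat{y}_i = \beta^T x_i + \beta_0, \quad \forall i
\end{array}$$
where the loss $\sum_{i=1}^{N} \left[1 - y_i \hat{y}_i\right]_+$ is called the hinge loss.
The two problems are equivalent with $\lambda = 1/C$ (the linearly separable case corresponds to $\lambda = 0$).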

35 From Loss Function Point of View
- Misclassification loss: $1_{\{\operatorname{sign}(\hat{y}) \neq y\}}$
- Squared error loss: $(\hat{y} - y)^2$
- Class Huber: $\phi_{\text{hub\_pos}}(1 - y\hat{y})$
- Negative log-likelihood loss: $\log\left(1 + e^{-y\hat{y}}\right)$
- Hinge loss: $\left[1 - y\hat{y}\right]_+$
[Figure: misclassification, squared error (Linear Reg), Class Huber, negative log-likelihood (Logistic Reg), and hinge (SVM) losses plotted against $y\hat{y}$]
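
To make the comparison concrete, the sketch below evaluates the five losses as functions of the margin $m = y\hat{y}$ at a few sample points. The function names and the chosen margins are illustrative; $M = 2$ follows the earlier slides, and a margin of exactly zero is counted as a misclassification because $\operatorname{sign}(0) \neq y$.

```python
import numpy as np

def misclassification_loss(m):
    # sign(y_hat) != y exactly when the margin m = y * y_hat is not positive.
    return (np.asarray(m, dtype=float) <= 0.0).astype(float)

def squared_error_loss(m):
    return (1.0 - np.asarray(m, dtype=float)) ** 2

def class_huber_loss(m, M=2.0):
    x = 1.0 - np.asarray(m, dtype=float)
    hub = np.where(np.abs(x) < M, x ** 2, M * (2.0 * np.abs(x) - M))
    return np.where(x >= 0.0, hub, 0.0)

def logistic_loss(m):
    return np.logaddexp(0.0, -np.asarray(m, dtype=float))

def hinge_loss(m):
    return np.maximum(0.0, 1.0 - np.asarray(m, dtype=float))

margins = np.array([-1.0, 0.0, 1.0, 2.0])
table = {
    "misclassification": misclassification_loss(margins),
    "squared error": squared_error_loss(margins),
    "class Huber": class_huber_loss(margins),
    "logistic": logistic_loss(margins),
    "hinge": hinge_loss(margins),
}
```

Note how the squared error loss is the only one that grows again for margins larger than 1, i.e., it penalizes points that are classified correctly and lie far from the boundary.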

36 Solving SVM I
The Lagrange function is
$$L(\beta, \beta_0, \xi_i, \alpha_i, \mu_i) = \frac{1}{2}\|\beta\|^2 + C \sum_{i=1}^{N} \xi_i - \sum_{i=1}^{N} \mu_i \xi_i - \sum_{i=1}^{N} \alpha_i \left[ y_i \left(\beta^T x_i + \beta_0\right) - (1 - \xi_i) \right]$$
where $\alpha_i \geq 0$ and $\mu_i \geq 0$, $\forall i$, are dual variables.
Setting the derivatives w.r.t. $\beta$, $\beta_0$, $\{\xi_i\}$ to zero:
$$\beta = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad 0 = \sum_{i=1}^{N} \alpha_i y_i, \qquad \alpha_i = C - \mu_i, \quad \forall i$$

37 Solving SVM II
Dual function:
$$\begin{aligned}
g(\alpha_i, \mu_i) &= \inf_{\beta,\,\beta_0,\,\{\xi_i\}} L(\beta, \beta_0, \xi_i, \alpha_i, \mu_i) \\
&= \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j x_i^T x_j + C \sum_{i=1}^{N} \xi_i - \sum_{i=1}^{N} (\alpha_i + \mu_i)\, \xi_i \\
&\quad - \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j x_i^T x_j - \beta_0 \sum_{i=1}^{N} \alpha_i y_i + \sum_{i=1}^{N} \alpha_i \\
&= \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j x_i^T x_j
\end{aligned}$$

38 Solving SVM III
Dual problem:
$$\begin{array}{ll}
\underset{\{\alpha_i\}}{\text{maximize}} & \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j x_i^T x_j \\
\text{subject to} & 0 \leq \alpha_i \leq C, \quad \forall i \\
& \sum_{i=1}^{N} \alpha_i y_i = 0
\end{array}$$
The dual problem is a QP!
Vector-matrix representation of the objective, with $X = [x_1 \cdots x_N]$:
$$\mathbf{1}^T \alpha - \frac{1}{2} \alpha^T \left( \operatorname{Diag}(y)\, X^T X \operatorname{Diag}(y) \right) \alpha$$
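
As a quick sanity check of the vector-matrix form, the sketch below evaluates the dual objective both as the double sum and as the quadratic form, and compares the two. One assumption to be aware of: the data points are stored as rows of the array `X`, so the Gram matrix $X^T X$ of the slide is computed as `X @ X.T`. The function names and test data are made up.

```python
import numpy as np

def dual_objective_sum(alpha, X, y):
    """Dual objective written as the explicit double sum."""
    N = len(alpha)
    total = float(np.sum(alpha))
    for i in range(N):
        for j in range(N):
            total -= 0.5 * alpha[i] * alpha[j] * y[i] * y[j] * float(X[i] @ X[j])
    return total

def dual_objective_matrix(alpha, X, y):
    """Same objective as 1^T alpha - 0.5 * alpha^T Q alpha."""
    gram = X @ X.T                      # entries x_i^T x_j (rows of X are the x_i)
    Q = np.diag(y) @ gram @ np.diag(y)  # Diag(y) X^T X Diag(y) in the slide's notation
    return float(np.sum(alpha) - 0.5 * alpha @ Q @ alpha)

X_chk = np.array([[1.0, 0.0], [-1.0, 0.0], [0.5, 2.0]])
y_chk = np.array([1.0, -1.0, 1.0])
alpha_chk = np.array([0.2, 0.7, 0.5])
```

The matrix `Q` is what a generic QP solver would take as the quadratic term; it is positive semidefinite because it is a Gram matrix of the vectors $y_i x_i$.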

39 KKT Conditions
The KKT equations characterize the solution to the primal and dual problems:
$$\begin{aligned}
& \xi_i \geq 0, \quad y_i \left(\beta^T x_i + \beta_0\right) \geq 1 - \xi_i, \quad \forall i \\
& 0 = \sum_{i=1}^{N} \alpha_i y_i, \quad 0 \leq \alpha_i \leq C, \quad \forall i \\
& \beta = \sum_{i=1}^{N} \alpha_i y_i x_i, \quad \alpha_i = C - \mu_i, \quad \forall i \\
& 0 = \mu_i \xi_i, \quad \alpha_i \left[ y_i \left(\beta^T x_i + \beta_0\right) - (1 - \xi_i) \right] = 0, \quad \forall i
\end{aligned}$$

40 Finding Decision Boundary
Given the optimal dual variables $\alpha_i^\star$, we have
$$\beta^\star = \sum_{i=1}^{N} \alpha_i^\star y_i x_i.$$
Support vectors: those observations $i$ with nonzero $\alpha_i^\star$.
For any $0 < \alpha_j^\star < C$, we have $\xi_j^\star = 0$ and
$$y_j \left( \sum_{i=1}^{N} \alpha_i^\star y_i x_i^T x_j + \beta_0^\star \right) = 1,$$
hence we can solve for $\beta_0^\star$.
The decision function becomes:
$$D(x) = \operatorname{sign}(\hat{y}) = \operatorname{sign}\left(\beta^{\star T} x + \beta_0^\star\right) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i^\star y_i x_i^T x + \beta_0^\star \right)$$
Observation: in the dual problem and in the above decision function, the only operation on the $x_i$'s is the inner product!
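
The recovery of $\beta^\star$ and $\beta_0^\star$ from the dual solution can be sketched as follows. The two-point data set and its dual solution $\alpha^\star = (0.5, 0.5)$ are a hand-worked toy case (for this data $\beta^\star = [1, 0]^T$ and $\beta_0^\star = 0$), and the function names, the tolerance, and the value $C = 10$ are assumptions for illustration.

```python
import numpy as np

def recover_primal(alpha, X, y, C, tol=1e-8):
    """Recover (beta, beta0) from an optimal dual vector alpha."""
    beta = (alpha * y) @ X  # beta = sum_i alpha_i y_i x_i
    # Indices with 0 < alpha_j < C lie exactly on the margin (xi_j = 0).
    on_margin = np.where((alpha > tol) & (alpha < C - tol))[0]
    if on_margin.size == 0:
        raise ValueError("no index with 0 < alpha_j < C; beta0 needs another rule")
    j = on_margin[0]
    # y_j (beta^T x_j + beta0) = 1 and y_j in {-1, +1} give beta0 = y_j - beta^T x_j.
    beta0 = y[j] - X[j] @ beta
    return beta, float(beta0)

def decide(x, alpha, X, y, beta0):
    """Decision function using only inner products x_i^T x."""
    return float(np.sign(np.sum(alpha * y * (X @ x)) + beta0))

# Toy problem: one point per class, symmetric about the origin.
X_sv = np.array([[1.0, 0.0], [-1.0, 0.0]])
y_sv = np.array([1.0, -1.0])
alpha_sv = np.array([0.5, 0.5])
beta_sv, beta0_sv = recover_primal(alpha_sv, X_sv, y_sv, C=10.0)
```

In practice it is common to average the value of $\beta_0^\star$ over all on-margin indices for numerical stability; the sketch uses the first one only to stay close to the slide.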

41 Decision Boundary
[Figure: Pos Class and Neg Class points with the SVM decision boundary]
- Large C: focus attention more on points near the boundary ⇒ small margin
- Small C: involves data further away ⇒ large margin

42 [Figure: SVM decision boundaries for several values of C]

43 Decision Boundary: Revisit Linearly Separable Case
[Figure: Pos Class and Neg Class points with the Linear Reg, Reg with Huber, Logistic Reg, and SVM decision boundaries, and the SVM support lines]

44 Decision Boundary: Linearly Nonseparable Case
[Figure: Pos Class and Neg Class points with the Linear Reg, Reg with Huber, Logistic Reg, and SVM (C = 10) decision boundaries, and the SVM support lines]

45 For the linearly separable/nonseparable cases, SVM works well so far! But what if a linear decision boundary doesn't work any more?

46 Nonlinear Decision Surface

47 Feature Transformation
Naive method: use a curve instead of a line ⇒ not efficient.
Feature transformation: pre-process the data with
$$h : \mathbb{R}^n \to \mathcal{H}, \quad x \mapsto h(x)$$
$\mathbb{R}^n$: input space; $\mathcal{H}$: feature space.
[Figure: data in the input space mapped to the feature space by $h(x) = \left[ x_1^2 \;\; x_2^2 \right]^T$]

48 After Feature Transformation
Using the features $h(x)$ as inputs: $\hat{y} = \beta^T h(x) + \beta_0$.
Dual problem:
$$\begin{array}{ll}
\underset{\{\alpha_i\}}{\text{maximize}} & \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j\, h(x_i)^T h(x_j) \\
\text{subject to} & \sum_{i=1}^{N} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C, \quad \forall i
\end{array}$$
The decision function:
$$D(x) = \operatorname{sign}\left(\beta^T h(x) + \beta_0\right) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i y_i\, h(x_i)^T h(x) + \beta_0 \right)$$
where $\beta_0$ can be determined, as before, by solving $y_i \hat{y}_i = 1$ for any $x_i$ for which $0 < \alpha_i < C$.

49 Kernelized SVM
In fact, we need not specify the transformation $h(x)$ at all; we only require knowledge of the kernel function that computes inner products in the transformed space, i.e.:
$$K(x_i, x_j) \triangleq h(x_i)^T h(x_j)$$
Dual problem:
$$\begin{array}{ll}
\underset{\{\alpha_i\}}{\text{maximize}} & \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j\, K(x_i, x_j) \\
\text{subject to} & \sum_{i=1}^{N} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C, \quad \forall i
\end{array}$$
The decision function:
$$D(x) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i y_i\, K(x_i, x) + \beta_0 \right)$$
where $\beta_0$ can be determined, as before, by solving $y_i \hat{y}_i = 1$ for any $x_i$ for which $0 < \alpha_i < C$.

50 Kernel
$K$ should be a symmetric positive (semi-)definite function.
The previous linear SVM corresponds to $K(x_i, x_j) = x_i^T x_j$.
Popular kernels:
- $p$th-degree polynomial: $K(x_i, x_j) = \left(1 + x_i^T x_j\right)^p$
- Gaussian radial kernel: $K(x_i, x_j) = e^{-\|x_i - x_j\|^2 / \sigma^2}$
- Neural network: $K(x_i, x_j) = \tanh\left(\kappa_1 x_i^T x_j + \kappa_2\right)$
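
The three kernels listed above can be sketched in a few lines of NumPy. The default parameter values, the helper `gram`, and the explicit degree-2 feature map `h_poly2` (valid for two-dimensional inputs only) are illustrative additions; the feature map is included to show that the polynomial kernel really is an inner product in a transformed space.

```python
import numpy as np

def poly_kernel(xi, xj, p=2):
    return float((1.0 + xi @ xj) ** p)

def gaussian_kernel(xi, xj, sigma=1.0):
    return float(np.exp(-np.sum((xi - xj) ** 2) / sigma ** 2))

def nn_kernel(xi, xj, kappa1=1.0, kappa2=0.0):
    return float(np.tanh(kappa1 * (xi @ xj) + kappa2))

def gram(kernel, X):
    """Kernel matrix with entries K(x_i, x_j); rows of X are the data points."""
    N = X.shape[0]
    return np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])

def h_poly2(x):
    """Explicit feature map with h(x)^T h(z) = (1 + x^T z)^2 for 2-D inputs."""
    r2 = np.sqrt(2.0)
    return np.array([1.0, r2 * x[0], r2 * x[1], x[0] ** 2, x[1] ** 2, r2 * x[0] * x[1]])

X_k = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 2.0]])
K_gauss = gram(gaussian_kernel, X_k)
```

The Gram matrix of the Gaussian kernel is symmetric positive semidefinite, which can be checked numerically through its eigenvalues.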

51 Steps for Applying SVM
Steps:
1 In-sample training:
  1 Select the parameter $C$
  2 Select the kernel function $K(x_i, x_j)$ and related parameters
  3 Solve the dual problem to obtain $\alpha_i^\star$
  4 Compute $\beta_0^\star$, and then classify according to $\operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i^\star y_i K(x_i, x) + \beta_0^\star \right)$
2 Out-of-sample testing:
  1 Use the trained model to classify out-of-sample data
  2 If the out-of-sample result is not good, adjust the parameter $C$ or the kernel, and re-train the model until the out-of-sample result is good enough

52 Decision Boundary by Gaussian Radial Kernel
[Figure: nonlinear SVM decision boundary obtained with the Gaussian radial kernel, C = 1]

53 More than two classes?
What if there are more than TWO classes?

54 Multiclass Classification Setup
Labels: $\{-1, +1\} \to \{1, 2, \dots, K\}$
Classification decision rule: $f : x \in \mathbb{R}^n \to \{1, 2, \dots, K\}$
[Figure: scatter plot of points from Class 1, Class 2, and Class 3]

55 Multiclass Classification Methods
Main ideas:
- Decompose the multiclass classification problem into multiple binary classification problems
- Use the majority voting principle or a combined decision from a committee to predict the label
Common approaches:
- One-vs-Rest (One-vs-All) approach
- One-vs-One (Pairwise, All-vs-All) approach

56 One-vs-Rest Approach
Steps:
1 Solve $K$ different binary problems: classify class $i$ as $+1$ versus the remaining classes $j \in \{1, \dots, K\} \setminus \{i\}$ as $-1$
2 Assign a test sample to the class $\arg\max_i f_i(x)$, where $f_i(x)$ is the solution from the $i$-th problem
Properties:
- Simple to implement, performs well in practice
- Not optimal (asymptotically)
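
The prediction step of the one-vs-rest rule can be sketched as below for linear scorers $f_i(x) = x^T \beta^i + \beta_0^i$. The coefficient values are hypothetical placeholders chosen by hand, not the classifiers trained in the lecture example, and the function name is an assumption.

```python
import numpy as np

def one_vs_rest_predict(x, betas, beta0s):
    """Return the 1-based class index arg max_i f_i(x) and all scores."""
    scores = betas @ x + beta0s  # row i of betas holds beta^i
    return int(np.argmax(scores)) + 1, scores

# Hypothetical parameters for K = 3 linear scorers in R^2.
betas_ovr = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [-1.0, -1.0]])
beta0s_ovr = np.zeros(3)

label_ovr, scores_ovr = one_vs_rest_predict(np.array([0.0, 2.0]), betas_ovr, beta0s_ovr)
```

With these placeholder parameters the scores are $(0, 2, -2)$, so the test point is assigned to class 2. If two scores tie, `np.argmax` returns the lowest index, which is an arbitrary tie-breaking rule.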

57 One-vs-Rest Example: Step 1, training
[Figure: Class 1 vs. Class 2&3; Class 1: label +1, Class 2: label -1, Class 3: label -1]

58 One-vs-Rest Example: Step 1, training...
[Figure, left panel: Class 2 vs. Class 1&3; Class 1: label -1, Class 2: label +1, Class 3: label -1]
[Figure, right panel: Class 3 vs. Class 1&2; Class 1: label -1, Class 2: label -1, Class 3: label +1]

59 One-vs-Rest Example: Step 2, test
Test point $x = \left[ 0 \;\; 2 \right]^T$: using $f_i(x) = x^T \beta^i + \beta_0^i$, we compute $f_1(x)$, $f_2(x)$, and $f_3(x)$, and $f_2(x)$ is the largest.
Assign $\left[ 0 \;\; 2 \right]^T$ to class 2!
[Figure: Class 1, Class 2, and Class 3 points with the three one-vs-rest boundaries (Class 1 vs 2&3, Class 2 vs 1&3, Class 3 vs 1&2) and the test sample]

60 One-vs-One Approach
Steps:
1 Solve $\binom{K}{2}$ different binary problems: classify class $i$ as $+1$ versus each class $j \neq i$ as $-1$. Each classifier is called $f_{ij}$
2 For prediction at a point, each classifier is queried once and issues a vote. The class with the maximum number of votes is the winner
Properties:
- Training process is efficient: small binary problems
- There are too many problems when $K$ is large (if $K = 10$, we need to train 45 binary classifiers)
- Simple to implement, performs competitively in practice
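
The voting step can be sketched as follows. The pairwise classifiers are hypothetical hand-written functions standing in for trained SVMs, and the dictionary layout keyed by the class pair $(i, j)$ is an implementation choice, not something prescribed by the lecture.

```python
from itertools import combinations
from math import comb

import numpy as np

def one_vs_one_predict(x, classifiers, K):
    """Each f_ij votes for class i if f_ij(x) > 0, else for class j."""
    votes = np.zeros(K, dtype=int)
    for (i, j), f in classifiers.items():
        winner = i if f(x) > 0 else j
        votes[winner - 1] += 1
    # Ties are broken in favor of the lowest class index (arbitrary choice).
    return int(np.argmax(votes)) + 1, votes

# Hypothetical pairwise scorers for K = 3 classes.
classifiers_ovo = {
    (1, 2): lambda x: x[0] - x[1],
    (1, 3): lambda x: x[1] + 1.0,
    (2, 3): lambda x: x[1],
}

label_ovo, votes_ovo = one_vs_one_predict(np.array([0.0, 2.0]), classifiers_ovo, K=3)

# Number of pairwise problems for K = 10 classes.
n_pairs = len(list(combinations(range(1, 11), 2)))
```

With these placeholder scorers the votes are one for class 1 and two for class 2, so class 2 wins, and `n_pairs` reproduces the count of 45 classifiers for $K = 10$.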

61 One-vs-One Example: Step 1, training
[Figure: Class 1 vs. Class 2; Class 1: label +1, Class 2: label -1]

62 One-vs-One Example: Step 1, training...
[Figure, left panel: Class 1 vs. Class 3; Class 1: label +1, Class 3: label -1]
[Figure, right panel: Class 2 vs. Class 3; Class 2: label +1, Class 3: label -1]

63 One-vs-One Example: Step 2, test
The same test point $x = \left[ 0 \;\; 2 \right]^T$:
- $f_{12}(x) < 0$: vote for class 2
- $f_{23}(x) > 0$: vote for class 2
- $f_{13}(x) > 0$: vote for class 1
Conclusion: class 2 wins!
[Figure: Class 1, Class 2, and Class 3 points with the three one-vs-one boundaries (Class 1 vs 2, Class 2 vs 3, Class 1 vs 3) and the test sample]

65 Application: Histogram-based Image Classification
Multiclass: air shows, bears, Arabian horses, night scenes, and several more classes not shown.
[Figure: sample images from the classes, reproduced from IEEE Transactions on Neural Networks, vol. 10, no. 5, September 1999]

66 App: Histogram-based Image Multi-Classification, Modeling
Why not put the image pixels into a vector? Drawbacks:
- large size
- lack of invariance with respect to translations
Histogram-based image representation:
- color space: Hue-Saturation-Value (HSV)
- # of bins per color component = 16
- $x \in \mathbb{R}^{4096}$: histogram of the picture, dimension $= 16^3 = 4096$
- $y \in \{\text{airshows}, \text{bears}, \dots\}$: the class labels
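
A minimal sketch of the histogram feature is given below. It assumes the image has already been converted to HSV with each component scaled to $[0, 1]$ and flattened to an array of shape `(n_pixels, 3)`; the normalization to unit sum, the function name, and the random stand-in pixels are assumptions for illustration.

```python
import numpy as np

def hsv_histogram(hsv_pixels, bins=16):
    """Joint HSV color histogram flattened to a vector of length bins**3."""
    hist, _ = np.histogramdd(hsv_pixels,
                             bins=(bins, bins, bins),
                             range=((0.0, 1.0), (0.0, 1.0), (0.0, 1.0)))
    x = hist.ravel()
    return x / x.sum()  # normalize so images of different sizes are comparable

# Stand-in "image": 1000 random HSV pixels (not real image data).
rng = np.random.default_rng(0)
pixels = rng.random((1000, 3))
x_hist = hsv_histogram(pixels)
```

Because the histogram only counts colors, reordering the pixels (for example, by translating the image content) leaves the feature vector unchanged, which is the invariance the slide refers to.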

67 App: Histogram-based Image Multi-Classification, SVMs
Applied SVMs:
- Linear
- Poly 2: $K(x_i, x_j) = \left(1 + x_i^T x_j\right)^2$
- Radial basis function: $K(x_i, x_j) = e^{-\rho\, d(x_i, x_j)}$
where the distance measure $d(x_i, x_j)$ can be
- Gaussian: $d(x_i, x_j) = \|x_i - x_j\|^2$
- Laplacian ($\ell_1$ distance): $d(x_i, x_j) = \|x_i - x_j\|_1$
- $\chi^2$: $d(x_i, x_j) = \sum_k \frac{\left(x_i^{(k)} - x_j^{(k)}\right)^2}{x_i^{(k)} + x_j^{(k)}}$
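
The radial basis function kernel with the three distance measures can be sketched as below. Skipping histogram bins where both entries are zero in the $\chi^2$ distance is an assumption made here to avoid division by zero; the function names and the default $\rho = 1$ are also illustrative.

```python
import numpy as np

def gaussian_distance(xi, xj):
    return float(np.sum((xi - xj) ** 2))

def laplacian_distance(xi, xj):
    return float(np.sum(np.abs(xi - xj)))

def chi2_distance(xi, xj):
    num = (xi - xj) ** 2
    den = xi + xj
    mask = den > 0  # bins empty in both histograms contribute nothing
    return float(np.sum(num[mask] / den[mask]))

def rbf_kernel(xi, xj, rho=1.0, dist=chi2_distance):
    """K(x_i, x_j) = exp(-rho * d(x_i, x_j)) for a chosen distance d."""
    return float(np.exp(-rho * dist(xi, xj)))

# Two tiny made-up "histograms" with disjoint support.
h_a = np.array([1.0, 0.0, 0.0])
h_b = np.array([0.0, 1.0, 0.0])
```

For these two histograms all three distances equal 2, so the kernel value is $e^{-2\rho}$ in each case; the distances differ once the histograms overlap.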

68 App: Histogram-based Image Multi-Classification, Result
Criterion: error rate in percentage.
Benchmark: K-nearest neighbor (KNN) method.
[Table: error rates of KNN with the $\ell_2$ and $\chi^2$ distances on the two Corel databases]
[Table: error rates of SVM with the linear, Poly 2, and radial basis function (Gaussian, Laplacian, $\chi^2$) kernels on the two Corel databases]

69 For Further Reading
- Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning. Springer New York, .
- Olivier Chapelle, Patrick Haffner, and Vladimir N. Vapnik, "Support vector machines for histogram-based image classification," IEEE Transactions on Neural Networks, 10(5): , 1999.

70 Thanks For more information visit:
