Linear Support Vector Machine: Classification. Huiping Cao.


Huiping Cao, Slide 1/26. Classification: Linear SVM.

Huiping Cao, Slide 2/26. Support Vector Machines: find a linear hyperplane (decision boundary) that will separate the data.

Huiping Cao, Slide 3/26. Support Vector Machines: one possible solution, B1.

Huiping Cao, Slide 4/26. Support Vector Machines: another possible solution, B2.

Huiping Cao, Slide 5/26. Support Vector Machines: other possible solutions.

Huiping Cao, Slide 6/26. Support Vector Machines: which one is better, B1 or B2? How do you define "better"? The bigger the margin, the better.

Huiping Cao, Slide 7/26. Support Vector Machines: find the hyperplane that maximizes the margin, so B1 is better than B2.

Huiping Cao, Slide 8/26. Linear Support Vector Machine. The decision boundary and margin hyperplanes are
B1: w·x + b = 0, b11: w·x + b = 1, b12: w·x + b = -1.
A point is labeled 1 if w·x + b ≥ 1 and -1 if w·x + b ≤ -1. Margin = 2/‖w‖.

Huiping Cao, Slide 9/26. Support Vector Machines: the margin of the decision boundaries. Let x1 lie on b11 (w·x1 + b = +1) and x2 lie on b12 (w·x2 + b = -1). Subtracting gives w·(x1 - x2) = 2. Also, w·(x1 - x2) = ‖w‖ ‖x1 - x2‖ cos(θ), where ‖·‖ is the norm of a vector, and ‖x1 - x2‖ cos(θ) = d, where d is the length of the vector (x1 - x2) in the direction of w. Thus ‖w‖ d = 2, and the margin is d = 2/‖w‖.

Huiping Cao, Slide 10/26. Learn a linear SVM model. The training phase of SVM estimates the parameters w and b, which must satisfy the two conditions:
y_i = 1 if w·x_i + b ≥ 1, and y_i = -1 if w·x_i + b ≤ -1.

Huiping Cao, Slide 11/26. Formulate the problem: rationale. We want to maximize the margin d = 2/‖w‖, which is equivalent to minimizing L(w) = ‖w‖²/2, subject to the constraints
y_i = 1 if w·x_i + b ≥ 1, and y_i = -1 if w·x_i + b ≤ -1,
which are equivalent to y_i(w·x_i + b) ≥ 1, i = 1, ..., N.

Huiping Cao, Slide 12/26. Formulate the problem. This is formalized as the following constrained optimization problem. Objective: minimize ‖w‖²/2 subject to y_i(w·x_i + b) ≥ 1, i = 1, ..., N. This is a convex optimization problem: the objective function is quadratic and the constraints are linear, so it can be solved using the standard Lagrange multiplier method.
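As a quick sanity check of this primal formulation, the sketch below hands the same problem to an off-the-shelf solver. It assumes scikit-learn and NumPy are available (not something the slides require), approximates the hard margin with a very large penalty C, and uses the eight training points that appear later in the worked example on slide 21.

```python
import numpy as np
from sklearn.svm import SVC

# Training data from the worked example later in the slides (slide 21).
X = np.array([[0.4, 0.5], [0.5, 0.6], [0.9, 0.4], [0.7, 0.9],
              [0.17, 0.05], [0.4, 0.35], [0.9, 0.8], [0.2, 0.0]])
y = np.array([1, -1, -1, -1, 1, 1, -1, 1])

# A very large C approximates the hard-margin SVM on linearly separable data.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

print(clf.coef_, clf.intercept_)   # should be close to w = (-10, -10), b = 10
print(clf.support_)                # indices of the support vectors (the first two rows)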

Huiping Cao, Slide 13/26. Lagrange formulation. The Lagrangian for the optimization problem (taking the constraints into account by rewriting the objective function) is
L_P(w, b, λ) = ‖w‖²/2 - Σ_{i=1}^N λ_i (y_i(w·x_i + b) - 1),
where the λ_i are Lagrange multipliers; minimize w.r.t. w and b, and maximize w.r.t. each λ_i ≥ 0. To minimize the Lagrangian, take the derivatives of L_P(w, b, λ) w.r.t. w and b and set them to 0:
∂L_P/∂w = 0 ⇒ w = Σ_{i=1}^N λ_i y_i x_i,
∂L_P/∂b = 0 ⇒ Σ_{i=1}^N λ_i y_i = 0.

Huiping Cao, Slide 14/26. Substituting: from L_P(w, b, λ) to L_D(λ). Solving L_P(w, b, λ) directly is still difficult because it involves a large number of parameters: w, b, and the λ_i. Idea: transform the Lagrangian into a function of the Lagrange multipliers only by substituting for w and b in L_P(w, b, λ). This gives the dual problem
L_D(λ) = Σ_{i=1}^N λ_i - (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i·x_j.
Details are on the next slide. L_D(λ) is convenient because it is a simple quadratic form in the vector λ.

Huiping Cao, Slide 15/26. Substitution details: obtaining the dual problem L_D(λ).
‖w‖²/2 = (1/2) w·w = (1/2) (Σ_{i=1}^N λ_i y_i x_i)·(Σ_{j=1}^N λ_j y_j x_j) = (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i·x_j   (1)
-Σ_{i=1}^N λ_i (y_i(w·x_i + b) - 1) = -Σ_{i=1}^N λ_i y_i w·x_i - Σ_{i=1}^N λ_i y_i b + Σ_{i=1}^N λ_i   (2)
= -Σ_{i=1}^N λ_i y_i (Σ_{j=1}^N λ_j y_j x_j)·x_i + Σ_{i=1}^N λ_i   (using Σ_{i=1}^N λ_i y_i = 0)   (3)
= Σ_{i=1}^N λ_i - Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i·x_j   (4)
Adding (1) and (4) gives
L_D(λ) = Σ_{i=1}^N λ_i - (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i·x_j.

Huiping Cao, Slide 16/26. Finalized L_D(λ).
L_D(λ) = Σ_{i=1}^N λ_i - (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i·x_j.
Maximize w.r.t. each λ_i, subject to λ_i ≥ 0 for i = 1, 2, ..., N and Σ_{i=1}^N λ_i y_i = 0. Solve L_D(λ) using quadratic programming (QP) to obtain all the λ_i (the QP algorithm itself is out of the scope of this lecture).

Huiping Cao, Slide 17/26. Solve L_D(λ) with QP. We are maximizing L_D(λ):
max_λ ( Σ_{i=1}^N λ_i - (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i·x_j ),
subject to the constraints (1) λ_i ≥ 0 for i = 1, 2, ..., N and (2) Σ_{i=1}^N λ_i y_i = 0. Translate the objective into a minimization, because QP packages generally expect minimization:
min_λ ( (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i·x_j - Σ_{i=1}^N λ_i ).

Huiping Cao, Slide 18/26. Solve L_D(λ) with QP (matrix form). Let Q be the matrix of quadratic coefficients, with entries Q_ij = y_i y_j x_i·x_j:
Q = [ y_1 y_1 x_1·x_1   y_1 y_2 x_1·x_2   ...   y_1 y_N x_1·x_N
      y_2 y_1 x_2·x_1   y_2 y_2 x_2·x_2   ...   y_2 y_N x_2·x_N
      ...
      y_N y_1 x_N·x_1   y_N y_2 x_N·x_2   ...   y_N y_N x_N·x_N ].
The problem becomes
min_λ ( (1/2) λᵀQλ + (-1)ᵀλ ),   where (-1) is the vector of all -1s (linear term),
subject to yᵀλ = 0 (linear constraint) and 0 ≤ λ ≤ ∞ (lower and upper bounds).
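To make this concrete, here is a minimal sketch that builds Q and solves the dual minimization with a general-purpose solver. It assumes NumPy and SciPy are available (the slides themselves point to Octave/MATLAB QP packages instead) and uses the eight points from the worked example on slide 21, so the solver should return λ close to (100, 100, 0, ..., 0).

```python
import numpy as np
from scipy.optimize import minimize

# Training data from the worked example (slide 21).
X = np.array([[0.4, 0.5], [0.5, 0.6], [0.9, 0.4], [0.7, 0.9],
              [0.17, 0.05], [0.4, 0.35], [0.9, 0.8], [0.2, 0.0]])
y = np.array([1.0, -1.0, -1.0, -1.0, 1.0, 1.0, -1.0, 1.0])
N = len(y)

# Quadratic coefficients: Q[i, j] = y_i * y_j * (x_i . x_j).
Q = (y[:, None] * y[None, :]) * (X @ X.T)

# Dual in minimization form: (1/2) lam' Q lam - sum(lam).
objective = lambda lam: 0.5 * lam @ Q @ lam - lam.sum()
gradient  = lambda lam: Q @ lam - np.ones(N)

result = minimize(objective, np.zeros(N), jac=gradient, method="SLSQP",
                  bounds=[(0.0, None)] * N,                      # lambda_i >= 0
                  constraints=[{"type": "eq",
                                "fun": lambda lam: lam @ y}],     # y' lam = 0
                  options={"maxiter": 1000})
lam = result.x
print(np.round(lam, 2))   # expected: approximately [100, 100, 0, 0, 0, 0, 0, 0]
```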

Huiping Cao, Slide 19/26. Lagrange multipliers. QP solves for λ = (λ_1, λ_2, ..., λ_N), where most of the λ_i are zero. Karush-Kuhn-Tucker (KKT) conditions: λ_i ≥ 0, and the constraint holds with equality whenever the multiplier is nonzero, i.e. λ_i (y_i(w·x_i + b) - 1) = 0. So either λ_i is zero, or y_i(w·x_i + b) - 1 = 0. A support vector is a training instance x_i with y_i(w·x_i + b) - 1 = 0 and λ_i > 0; training instances that do not lie on these margin hyperplanes have λ_i = 0.
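The KKT conditions can be checked with numbers. The sketch below assumes NumPy and uses the λ, w, and b values from the worked example on the next two slides: only the two points that lie exactly on the margin hyperplanes have λ_i > 0, and every product λ_i (y_i(w·x_i + b) - 1) is zero.

```python
import numpy as np

# Training data, multipliers, and the resulting (w, b) from the worked example.
X = np.array([[0.4, 0.5], [0.5, 0.6], [0.9, 0.4], [0.7, 0.9],
              [0.17, 0.05], [0.4, 0.35], [0.9, 0.8], [0.2, 0.0]])
y = np.array([1.0, -1.0, -1.0, -1.0, 1.0, 1.0, -1.0, 1.0])
lam = np.array([100.0, 100.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
w, b = np.array([-10.0, -10.0]), 10.0

slack = y * (X @ w + b) - 1    # y_i (w . x_i + b) - 1, zero only on the margin
print(np.round(slack, 2))      # first two entries are 0: these are the support vectors
print(lam * slack)             # KKT: every product lambda_i * slack_i is 0
```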

Huiping Cao, Slide 20/26. Get w and b. Both w and b depend on the support vectors x_i and their class labels y_i:
w = Σ_{i=1}^N λ_i y_i x_i,   b = y_i - w·x_i   for any support vector (x_i, y_i).
Idea for b: given a support vector (x_i, y_i), we have y_i(w·x_i + b) - 1 = 0. Multiply both sides by y_i: y_i²(w·x_i + b) - y_i = 0. Since y_i = 1 or y_i = -1, y_i² = 1, so (w·x_i + b) - y_i = 0, i.e. b = y_i - w·x_i.

Huiping Cao, Slide 21/26. Get w and b: example.
x_i1    x_i2    y_i    λ_i
0.4     0.5      1     100
0.5     0.6     -1     100
0.9     0.4     -1       0
0.7     0.9     -1       0
0.17    0.05     1       0
0.4     0.35     1       0
0.9     0.8     -1       0
0.2     0.0      1       0
Solve for λ using a quadratic programming package. Then w = (w_1, w_2), where only the two support vectors contribute nonzero terms:
w_1 = Σ_i λ_i y_i x_i1 = 100·1·0.4 + 100·(-1)·0.5 = -10,
w_2 = Σ_i λ_i y_i x_i2 = 100·1·0.5 + 100·(-1)·0.6 = -10,
b = 1 - w·x_1 = 1 - ((-10)·0.4 + (-10)·0.5) = 1 - (-9) = 10.
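The arithmetic on this slide can be reproduced in a few lines; the sketch below (assuming NumPy) uses only the two rows with nonzero λ_i, since all other terms of the sums vanish.

```python
import numpy as np

# The two support vectors from the table (all other points have lambda_i = 0).
lam = np.array([100.0, 100.0])
y   = np.array([1.0, -1.0])
X   = np.array([[0.4, 0.5],
                [0.5, 0.6]])

w = (lam * y) @ X        # w = sum_i lambda_i y_i x_i  ->  [-10, -10]
b = y[0] - w @ X[0]      # b = y_i - w . x_i for any support vector  ->  10
print(w, b)
```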

Huiping Cao, Slide 22/26. SVM: classify a test data point z.
y_z = sign(w·z + b) = sign((Σ_{i=1}^N λ_i y_i x_i)·z + b).
If y_z = 1, the test instance is classified as the positive class; if y_z = -1, it is classified as the negative class.
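Here is a short sketch of this decision rule in its dual form (assuming NumPy), continuing the worked example; the test point z is made up for illustration.

```python
import numpy as np

# Support-vector quantities from the worked example:
# lambda = 100 for both support vectors, w = (-10, -10), b = 10.
lam = np.array([100.0, 100.0])
y   = np.array([1.0, -1.0])
X   = np.array([[0.4, 0.5],
                [0.5, 0.6]])
b   = 10.0

z  = np.array([0.3, 0.3])      # hypothetical test point
f  = (lam * y) @ (X @ z) + b   # sum_i lambda_i y_i (x_i . z) + b, i.e. w . z + b
yz = np.sign(f)
print(f, yz)                   # f = 4.0 > 0, so z is classified as the positive class
```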

Huiping Cao, Slide 23/26. SVM discussion. What if the problem is not linearly separable?

Huiping Cao, Slide 24/26. SVM discussion: nonlinearly separable data. What if the decision boundary is not linear? Do not work in the X space: transform the data into a higher-dimensional Z space in which the data are linearly separable. The dual changes from
L_D(λ) = Σ_{i=1}^N λ_i - (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j x_i·x_j
to
L_D(λ) = Σ_{i=1}^N λ_i - (1/2) Σ_{i=1}^N Σ_{j=1}^N λ_i λ_j y_i y_j z_i·z_j.
Apply the linear SVM; the support vectors live in the Z space.
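As a minimal sketch of this transformation idea (assuming NumPy, and using a hypothetical degree-2 feature map; the slide does not fix a particular map), the data are mapped into a Z space and the same linear-SVM dual is then built from z_i·z_j instead of x_i·x_j.

```python
import numpy as np

def phi(x):
    """Hypothetical degree-2 feature map from X space to Z space."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

X = np.array([[0.4, 0.5], [0.5, 0.6], [0.9, 0.4], [0.2, 0.0]])
y = np.array([1.0, -1.0, -1.0, 1.0])

# Map every training point into Z space, then build the dual's quadratic
# coefficients from z_i . z_j; the rest of the linear SVM is unchanged.
Z = np.array([phi(x) for x in X])
Q = (y[:, None] * y[None, :]) * (Z @ Z.T)
print(Z.shape, Q.shape)
```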

Huiping Cao, Slide 25/26. Quadratic programming packages: Octave (https://www.gnu.org/software/octave/doc/interpreter/quadratic-Programming.html). It solves the quadratic program
min_x (0.5 xᵀH x + xᵀq)   subject to   Aᵀx = b,   lb ≤ x ≤ ub,   A_lb ≤ A_in x ≤ A_ub.
For the SVM dual, x plays the role of λ, H = Q, q is the vector of all -1s, the equality constraint is yᵀλ = 0, lb = 0, and ub = ∞.

Huiping Cao, Slide 26/26. Quadratic programming packages: MATLAB. The Optimization Toolbox in MATLAB solves
min_x ( (1/2) xᵀH x + fᵀx )   such that   A·x ≤ b,   Aeq·x = beq,   lb ≤ x ≤ ub.