Machine Learning. Support Vector Machines. Eric Xing, Fall 2015. Lecture 9, October 8, 2015


Machine Learning 10-701, Fall 2015. Support Vector Machines. Eric Xing. Lecture 9, October 8, 2015. Reading: Chap. 6 & 7, C.B. book, and listed papers.

What is a good Decision Boundary? Consider a binary classification task with y = ±1 labels (not 0/1 as before). When the training examples are linearly separable, we can set the parameters of a linear classifier so that all the training examples are classified correctly. Many decision boundaries! Generative classifiers, logistic regressions... Are all decision boundaries equally good? (Figure: two classes, Class 1 and Class 2, with several candidate boundaries.)

Not All Decision Boundaries Are Equal! Why may we have such boundaries? Irregular distribution. Imbalanced training sizes. Outliers.

Classification and Margin. Parameterizing the decision boundary: let $\mathbf{w}$ denote a vector orthogonal to the decision boundary, and $b$ denote a scalar "offset" term; then we can write the decision boundary as $\mathbf{w}^T\mathbf{x} + b = 0$. (Figure: Class 1 and Class 2 points on either side of the boundary, at distances $d^-$ and $d^+$.)

Classification and Margin. Parameterizing the decision boundary: let $\mathbf{w}$ denote a vector orthogonal to the decision boundary and $b$ a scalar "offset" term; then the decision boundary is $\mathbf{w}^T\mathbf{x} + b = 0$. Margin: $(\mathbf{w}^T\mathbf{x}_i + b)/\|\mathbf{w}\| > +c/\|\mathbf{w}\|$ for all $\mathbf{x}_i$ in class 2, and $(\mathbf{w}^T\mathbf{x}_i + b)/\|\mathbf{w}\| < -c/\|\mathbf{w}\|$ for all $\mathbf{x}_i$ in class 1. Or more compactly: $y_i(\mathbf{w}^T\mathbf{x}_i + b)/\|\mathbf{w}\| > c/\|\mathbf{w}\|$. The margin between any two points: $m = d^- + d^+ = 2c/\|\mathbf{w}\|$.

Maximum Margin Classification. The minimum permissible margin is $m^* = d^{+*} + d^{-*} = 2c/\|\mathbf{w}\|$. Here is our maximum margin classification problem: $\max_{\mathbf{w}} \; \frac{2c}{\|\mathbf{w}\|}$ s.t. $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge c, \; \forall i$.

Maximum Margin Classification, cont'd. The optimization problem: $\max_{\mathbf{w},b} \; \frac{2c}{\|\mathbf{w}\|}$ s.t. $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge c, \; \forall i$. But note that the magnitude of $c$ merely scales $\mathbf{w}$ and $b$, and does not change the classification boundary at all! (Why?) So we instead work on this cleaner problem: $\max_{\mathbf{w},b} \; \frac{1}{\|\mathbf{w}\|}$ s.t. $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1, \; \forall i$. The solution to this leads to the famous Support Vector Machines, believed by many to be the best "off-the-shelf" supervised learning algorithm.

Support vector machine. A convex quadratic programming problem with linear constraints: $\max_{\mathbf{w},b} \; \frac{1}{\|\mathbf{w}\|}$ s.t. $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1, \; \forall i$. The attained margin is now given by $1/\|\mathbf{w}\|$. Only a few of the classification constraints are relevant: the support vectors. (Figure: the separating hyperplane with margins at distances $d^+$ and $d^-$.) Constrained optimization: we can directly solve this using commercial quadratic programming (QP) code. But we want to take a more careful investigation of Lagrange duality, and the solution of the above in its dual form. Deeper insight: support vectors, kernels. More efficient algorithms.
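The slides defer to commercial QP code; as a concrete illustration, here is a minimal sketch using the open-source cvxopt solver (an assumption, not the course's tooling), solving the equivalent form $\min_{\mathbf{w},b} \frac{1}{2}\|\mathbf{w}\|^2$ s.t. $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$:

```python
# Sketch: hard-margin primal SVM as a QP, using cvxopt (assumed installed).
# Variables z = [w; b]; minimize (1/2) w^T w  s.t.  y_i (w^T x_i + b) >= 1.
import numpy as np
from cvxopt import matrix, solvers

def primal_svm(X, y):
    """X: (n, d) data, y: (n,) labels in {-1, +1}. Returns (w, b)."""
    n, d = X.shape
    P = np.zeros((d + 1, d + 1))
    P[:d, :d] = np.eye(d)            # quadratic term acts on w only
    P[d, d] = 1e-8                   # tiny ridge on b so P stays numerically PSD
    q = np.zeros(d + 1)
    # y_i (w^T x_i + b) >= 1  <=>  -y_i [x_i; 1]^T z <= -1
    G = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    h = -np.ones(n)
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    z = np.array(sol['x']).ravel()
    return z[:d], z[d]
```

Points whose constraints are active at the solution are exactly the support vectors discussed below.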

Digression to Lagrangian Duality: The Primal Problem. Primal: $\min_{w} f(w)$ s.t. $g_i(w) \le 0, \; i = 1, \dots, k$, and $h_i(w) = 0, \; i = 1, \dots, l$. The generalized Lagrangian: $\mathcal{L}(w, \alpha, \beta) = f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w)$; the $\alpha$'s ($\alpha_i \ge 0$) and $\beta$'s are called the Lagrangian multipliers. Lemma: $\max_{\alpha, \beta:\, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta) = f(w)$ if $w$ satisfies the primal constraints, and $\infty$ otherwise. A re-written primal: $\min_w \max_{\alpha, \beta:\, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta)$.

Lagrangian Duality, cont. Recall the primal problem: $p^* = \min_w \max_{\alpha, \beta:\, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta)$. The dual problem: $d^* = \max_{\alpha, \beta:\, \alpha_i \ge 0} \min_w \mathcal{L}(w, \alpha, \beta)$. Theorem (weak duality): $d^* = \max_{\alpha, \beta:\, \alpha_i \ge 0} \min_w \mathcal{L}(w, \alpha, \beta) \;\le\; \min_w \max_{\alpha, \beta:\, \alpha_i \ge 0} \mathcal{L}(w, \alpha, \beta) = p^*$. Theorem (strong duality): iff there exists a saddle point of $\mathcal{L}(w, \alpha, \beta)$, we have $d^* = p^*$.

A sketch of strong and weak duality. Now, ignoring $h(x)$ for simplicity, let's look at what's happening graphically in the duality theorems: $d^* = \max_{\alpha \ge 0} \min_w \big[ f(w) + \alpha\, g(w) \big] \;\le\; \min_w \max_{\alpha \ge 0} \big[ f(w) + \alpha\, g(w) \big] = p^*$. (Figure: the $(g(w), f(w))$ region with supporting lines illustrating $d^*$ and $p^*$.)


The KKT conditions. If there exists some saddle point of $\mathcal{L}$, then the saddle point satisfies the following Karush-Kuhn-Tucker (KKT) conditions:
$\frac{\partial}{\partial w_i} \mathcal{L}(w, \alpha, \beta) = 0, \; i = 1, \dots, n$
$\frac{\partial}{\partial \beta_i} \mathcal{L}(w, \alpha, \beta) = 0, \; i = 1, \dots, l$
$\alpha_i g_i(w) = 0, \; i = 1, \dots, k$ (complementary slackness)
$g_i(w) \le 0, \; i = 1, \dots, k$ (primal feasibility)
$\alpha_i \ge 0, \; i = 1, \dots, k$ (dual feasibility)
Theorem: if $w^*$, $\alpha^*$ and $\beta^*$ satisfy the KKT conditions, then they are also a solution to the primal and the dual problems.

Solving the optimal margin classifier. Recall our opt problem: $\max_{\mathbf{w},b} \; \frac{1}{\|\mathbf{w}\|}$ s.t. $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1, \; \forall i$. This is equivalent to $\min_{\mathbf{w},b} \; \frac{1}{2}\mathbf{w}^T\mathbf{w}$ s.t. $1 - y_i(\mathbf{w}^T\mathbf{x}_i + b) \le 0, \; \forall i$ (*). Write the Lagrangian: $\mathcal{L}(\mathbf{w}, b, \alpha) = \frac{1}{2}\mathbf{w}^T\mathbf{w} - \sum_i \alpha_i \big[ y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 \big]$. Recall that (*) can be reformulated as $\min_{\mathbf{w},b} \max_{\alpha_i \ge 0} \mathcal{L}(\mathbf{w}, b, \alpha)$. Now we solve its dual problem: $\max_{\alpha_i \ge 0} \min_{\mathbf{w},b} \mathcal{L}(\mathbf{w}, b, \alpha)$.

The Dual Problem. $\max_{\alpha_i \ge 0} \min_{\mathbf{w},b} \mathcal{L}(\mathbf{w}, b, \alpha)$. We minimize $\mathcal{L}$ with respect to $\mathbf{w}$ and $b$ first: setting $\frac{\partial}{\partial \mathbf{w}} \mathcal{L} = 0$ implies $\mathbf{w}^* = \sum_i \alpha_i y_i \mathbf{x}_i$ (*), and setting $\frac{\partial}{\partial b} \mathcal{L} = 0$ implies $\sum_i \alpha_i y_i = 0$ (**). Plugging (*) back into $\mathcal{L}$ and using (**), we have: $\mathcal{L}(\mathbf{w}, b, \alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j$.

The Dual Problem, cont. Now we have the following dual opt problem: $\max_\alpha \; J(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j$ s.t. $\alpha_i \ge 0, \; i = 1, \dots, k$, and $\sum_i \alpha_i y_i = 0$. This is, again, a quadratic programming problem. A global maximum of $\alpha_i$ can always be found. But what's the big deal?? Note two things: 1. $\mathbf{w}$ can be recovered by $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ (see next). 2. The "kernel" $\mathbf{x}_i^T \mathbf{x}_j$ (more later).
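A minimal sketch of this dual QP, again with cvxopt (an assumption; any QP solver works). cvxopt minimizes, so we negate $J(\alpha)$; the equality $\sum_i \alpha_i y_i = 0$ becomes the `A`, `b` pair:

```python
# Sketch: hard-margin dual SVM. Minimize (1/2) a^T Q a - 1^T a
# s.t. a >= 0 and y^T a = 0, where Q_ij = y_i y_j x_i^T x_j.
import numpy as np
from cvxopt import matrix, solvers

def dual_svm(X, y):
    """X: (n, d), y: (n,) in {-1, +1}. Returns alpha, shape (n,)."""
    n = X.shape[0]
    Yx = y[:, None] * X
    Q = Yx @ Yx.T                                # Q_ij = y_i y_j x_i^T x_j
    P = matrix(Q + 1e-8 * np.eye(n))             # tiny ridge for numerical PSD
    q = matrix(-np.ones(n))
    G = matrix(-np.eye(n))                       # -alpha <= 0
    h = matrix(np.zeros(n))
    A = matrix(y.astype(float)[None, :])         # y^T alpha = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.array(sol['x']).ravel()
```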

Support vectors. Note the KKT condition: only a few $\alpha_i$'s can be nonzero!! $\alpha_i g_i(w) = 0, \; i = 1, \dots, k$. Call the training data points whose $\alpha_i$'s are nonzero the support vectors (SV). (Figure: a separable two-class data set where most points have $\alpha_i = 0$ and only the points on the margin carry nonzero weights, e.g. $\alpha_1 = 0.8$, $\alpha_6 = 1.4$, $\alpha_8 = 0.6$.)

Support vector machines. Once we have the Lagrange multipliers $\{\alpha_i\}$, we can reconstruct the parameter vector $\mathbf{w}$ as a weighted combination of the training examples: $\mathbf{w} = \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i$. For testing with a new data point $\mathbf{z}$: compute $\mathbf{w}^T \mathbf{z} + b = \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i^T \mathbf{z} + b$, and classify $\mathbf{z}$ as class 1 if the sum is positive, and class 2 otherwise. Note: $\mathbf{w}$ need not be formed explicitly.
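Continuing the hypothetical `dual_svm` sketch above, prediction can use only the support vectors; `b` is recovered from any margin point, where $y_i(\mathbf{w}^T\mathbf{x}_i + b) = 1$:

```python
# Sketch: predict from the dual solution without forming w explicitly.
import numpy as np

def predict(X_train, y_train, alpha, z, tol=1e-6):
    sv = alpha > tol                           # support-vector mask
    # b from any support vector i: b = y_i - sum_j alpha_j y_j x_j^T x_i
    i = np.argmax(sv)                          # index of the first support vector
    b = y_train[i] - np.sum(alpha[sv] * y_train[sv] * (X_train[sv] @ X_train[i]))
    score = np.sum(alpha[sv] * y_train[sv] * (X_train[sv] @ z)) + b
    return 1 if score > 0 else -1
```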

Interpretation of support vector machines. The optimal $\mathbf{w}$ is a linear combination of a small number of data points. This "sparse" representation can be viewed as data compression, as in the construction of a kNN classifier. To compute the weights $\{\alpha_i\}$, and to use support vector machines, we need to specify only the inner products (or kernel) between the examples $\mathbf{x}_i^T \mathbf{x}_j$. We make decisions by comparing each new example $\mathbf{z}$ with only the support vectors: $y^* = \mathrm{sign}\big( \sum_{i \in SV} \alpha_i y_i \mathbf{x}_i^T \mathbf{z} + b \big)$.

Non-linearly Separable Problems. (Figure: Class 1 and Class 2 with overlapping points.) We allow "error" $\xi_i$ in classification; it is based on the output of the discriminant function $\mathbf{w}^T\mathbf{x} + b$. $\sum_i \xi_i$ approximates the number of misclassified samples.

Non-linear Decision Boundary. So far we have only considered large-margin classifiers with a linear decision boundary. How to generalize it to become nonlinear? Key idea: transform $\mathbf{x}_i$ to a higher dimensional space to "make life easier". Input space: the space the points $\mathbf{x}_i$ are located in. Feature space: the space of $\phi(\mathbf{x}_i)$ after transformation. Why transform? Linear operation in the feature space is equivalent to non-linear operation in input space. Classification can become easier with a proper transformation. In the XOR problem, for example, adding a new feature $x_1 x_2$ makes the problem linearly separable (homework); see the sketch below.
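A quick illustrative sketch (mine, not the slides') of the XOR remark: with inputs in $\{-1,+1\}^2$ and labels $y = x_1 x_2$, no linear classifier on $(x_1, x_2)$ separates the classes, but the single added feature $x_1 x_2$ does:

```python
# Sketch: the XOR-style problem becomes linearly separable after adding x1*x2.
import numpy as np

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = X[:, 0] * X[:, 1]            # labels +1, -1, -1, +1: not linearly separable

Phi = np.hstack([X, (X[:, 0] * X[:, 1])[:, None]])   # add the feature x1*x2
w = np.array([0.0, 0.0, 1.0])    # a separating hyperplane in feature space
assert np.all(y * (Phi @ w) > 0)  # every example on the correct side
```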

Non-linear Decision Boundary. (Slide figure.)

Transforming the Data. Input space to feature space, via $\phi(\cdot)$. Note: the feature space is of higher dimension than the input space in practice. (Figure: points mapped from input space to feature space.)

The Kernel Trick. Recall the SVM optimization problem: $\max_\alpha \; J(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j$ s.t. $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$. The data points only appear as inner products. As long as we can calculate the inner product in the feature space, we do not need the mapping explicitly. Many common geometric operations (angles, distances) can be expressed by inner products. Define the kernel function $K$ by $K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^T \phi(\mathbf{x}_j)$.

An Example of feature mapping and kernels. Consider an input $\mathbf{x} = [x_1, x_2]$. Suppose $\phi(\cdot)$ is given as follows: $\phi([x_1, x_2]) = \big[ 1, \sqrt{2}\,x_1, \sqrt{2}\,x_2, x_1^2, x_2^2, \sqrt{2}\,x_1 x_2 \big]$. An inner product in the feature space is $\phi(\mathbf{x})^T \phi(\mathbf{x}') = (1 + \mathbf{x}^T \mathbf{x}')^2$. So, if we define the kernel function as $K(\mathbf{x}, \mathbf{x}') = (1 + \mathbf{x}^T \mathbf{x}')^2$, there is no need to carry out $\phi(\cdot)$ explicitly.
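A small numerical check (a sketch, not part of the slides) that this $\phi$ and $K$ agree:

```python
# Sketch: verify phi(x)^T phi(x') == (1 + x^T x')^2 for the mapping above.
import numpy as np

def phi(x):
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2, np.sqrt(2) * x1 * x2])

rng = np.random.default_rng(0)
x, xp = rng.normal(size=2), rng.normal(size=2)
assert np.isclose(phi(x) @ phi(xp), (1.0 + x @ xp) ** 2)
```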

More examples of kernel functions. Linear kernel (we've seen it): $K(\mathbf{x}, \mathbf{x}') = \mathbf{x}^T \mathbf{x}'$. Polynomial kernel (we just saw an example): $K(\mathbf{x}, \mathbf{x}') = (1 + \mathbf{x}^T \mathbf{x}')^p$, where $p = 2, 3, \dots$ To get the feature vectors, we concatenate all $p$th order polynomial terms of the components of $\mathbf{x}$ (weighted appropriately). Radial basis kernel: $K(\mathbf{x}, \mathbf{x}') = \exp\big( -\frac{1}{2} \|\mathbf{x} - \mathbf{x}'\|^2 \big)$. In this case the feature space consists of functions and results in a non-parametric classifier.
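For concreteness, a sketch of these three kernels as plain functions (the names are mine, not the slides'):

```python
# Sketch: the three kernels from the slide, operating on 1-D numpy vectors.
import numpy as np

def linear_kernel(x, xp):
    return x @ xp

def poly_kernel(x, xp, p=2):
    return (1.0 + x @ xp) ** p

def rbf_kernel(x, xp):
    return np.exp(-0.5 * np.sum((x - xp) ** 2))
```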

The essence of kernel. Feature mapping, but without paying a cost. E.g., for the polynomial kernel: how many dimensions have we got in the new space? How many operations does it take to compute $K(\mathbf{x}, \mathbf{x}')$? Kernel design: any principle? $K(\mathbf{x}, \mathbf{z})$ can be thought of as a similarity function between $\mathbf{x}$ and $\mathbf{z}$. This intuition can be well reflected in the following Gaussian function: $K(\mathbf{x}, \mathbf{z}) = \exp\big( -\frac{\|\mathbf{x} - \mathbf{z}\|^2}{2\sigma^2} \big)$. Similarly, one can easily come up with other $K(\cdot, \cdot)$ in the same spirit. Does this necessarily lead to a "legal" kernel? (In the above particular case, $K$ is a legal one; do you know how many dimensions the feature space has?)

Kernel matrix. Suppose for now that $K$ is indeed a valid kernel corresponding to some feature mapping $\phi$; then for $\mathbf{x}_1, \dots, \mathbf{x}_m$ we can compute an $m \times m$ matrix $K = \{K_{ij}\}$, where $K_{ij} = \phi(\mathbf{x}_i)^T \phi(\mathbf{x}_j)$. This is called a kernel matrix! Now, if a kernel function is indeed a valid kernel, and its elements are dot-products in the transformed feature space, it must satisfy: Symmetry, $K = K^T$ (proof?). Positive semidefiniteness (proof?).
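A quick empirical sketch of both properties for the Gaussian kernel (eigenvalues of a kernel matrix should be non-negative up to round-off; this checks, it does not prove):

```python
# Sketch: build a Gaussian kernel matrix and check symmetry / PSD numerically.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-0.5 * sq)                           # K_ij = exp(-||x_i - x_j||^2 / 2)

assert np.allclose(K, K.T)                      # symmetry
assert np.min(np.linalg.eigvalsh(K)) > -1e-10   # eigenvalues >= 0, up to round-off
```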

Mercer kernel. (Slide figure.)

SVM examples. (Slide figures.)

Examples for Non-Linear SVMs: Gaussian Kernel. (Slide figures.)

Soft Margin Hyperplane. Now we have a slightly different opt problem: $\min_{\mathbf{w},b} \; \frac{1}{2}\mathbf{w}^T\mathbf{w} + C \sum_i \xi_i$ s.t. $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0, \; \forall i$. The $\xi_i$ are "slack variables" in optimization. Note that $\xi_i = 0$ if there is no error for $\mathbf{x}_i$; $\sum_i \xi_i$ is an upper bound on the number of errors. $C$: tradeoff parameter between error and margin.

The Optimization Problem. The dual of this new constrained optimization problem is: $\max_\alpha \; J(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j$ s.t. $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$. This is very similar to the optimization problem in the linearly separable case, except that there is an upper bound $C$ on $\alpha_i$ now. Once again, a QP solver can be used to find $\alpha$.
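In the hypothetical `dual_svm` sketch earlier, the only change for the soft margin is the box constraint $0 \le \alpha_i \le C$, i.e. a second block of inequality rows:

```python
# Sketch: inequality blocks for the soft-margin dual, 0 <= alpha <= C.
import numpy as np
from cvxopt import matrix

n, C = 100, 1.0                                   # example sizes (assumed)
G = matrix(np.vstack([-np.eye(n), np.eye(n)]))    # -alpha <= 0  and  alpha <= C
h = matrix(np.concatenate([np.zeros(n), C * np.ones(n)]))
# P, q, A, b are unchanged from the hard-margin dual sketch.
```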

The SMO algorithm. Consider solving the unconstrained opt problem: $\max_\alpha W(\alpha_1, \alpha_2, \dots, \alpha_m)$. We've already seen three opt algorithms! (Which?) Coordinate ascent: loop until convergence, and for each $i$ in turn set $\alpha_i := \arg\max_{\hat\alpha_i} W(\alpha_1, \dots, \alpha_{i-1}, \hat\alpha_i, \alpha_{i+1}, \dots, \alpha_m)$, holding all $\alpha_j$, $j \ne i$, fixed.

Coordinate ascent. (Figure: contours of a quadratic objective; coordinate ascent proceeds in axis-aligned steps toward the maximum.)
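A toy sketch (mine, not the slides') of coordinate ascent maximizing a concave quadratic $W(\alpha) = -\frac{1}{2}\alpha^T Q \alpha + c^T \alpha$, where each one-dimensional update has a closed form:

```python
# Sketch: coordinate ascent on W(a) = -0.5 a^T Q a + c^T a (Q symmetric PD).
import numpy as np

def coordinate_ascent(Q, c, n_iters=100):
    a = np.zeros_like(c)
    for _ in range(n_iters):
        for i in range(len(c)):
            # Maximize W over a[i] alone: a[i] = (c_i - sum_{j!=i} Q_ij a_j) / Q_ii
            a[i] = (c[i] - Q[i] @ a + Q[i, i] * a[i]) / Q[i, i]
    return a

Q = np.array([[2.0, 0.5], [0.5, 1.0]])
c = np.array([1.0, 1.0])
assert np.allclose(coordinate_ascent(Q, c), np.linalg.solve(Q, c), atol=1e-6)
```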

Sequential minimal optimization. Constrained optimization: $\max_\alpha \; J(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j$ s.t. $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$. Question: can we do coordinate ascent along one direction at a time, i.e., hold all $\alpha_{[-i]}$ fixed and update $\alpha_i$? No: the equality constraint determines $\alpha_i$ exactly once the others are fixed, so at least a pair of $\alpha$'s must be updated together.

The SMO algorithm. Repeat till convergence: 1. Select some pair $\alpha_i$ and $\alpha_j$ to update next (using a heuristic that tries to pick the two that will allow us to make the biggest progress towards the global maximum). 2. Re-optimize $J(\alpha)$ with respect to $\alpha_i$ and $\alpha_j$, while holding all the other $\alpha_k$'s ($k \ne i, j$) fixed. Will this procedure converge?

Convergence of SMO. Let's hold $\alpha_3, \dots, \alpha_m$ fixed and re-optimize $J$ w.r.t. $\alpha_1$ and $\alpha_2$: $\max_\alpha \; J(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j$, s.t. $0 \le \alpha_i \le C$ and $\sum_{i=1}^{m} \alpha_i y_i = 0$. KKT: the equality constraint then pins $\alpha_1 y_1 + \alpha_2 y_2 = -\sum_{i \ge 3} \alpha_i y_i = \zeta$, a constant.

Convergence of SMO. The constraints: $0 \le \alpha_1, \alpha_2 \le C$ and $\alpha_1 y_1 + \alpha_2 y_2 = \zeta$, so $\alpha_1 = (\zeta - \alpha_2 y_2)\, y_1$. The objective: substituting for $\alpha_1$ makes $J$ a one-dimensional quadratic in $\alpha_2$. Constrained opt: maximize the quadratic analytically, then clip $\alpha_2$ to the feasible segment allowed by the box constraints.
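A hedged sketch of the analytic pair update (following the standard simplified-SMO derivation, not code from the slides; variable names are mine). `K` is the Gram matrix and `E[i] = f(x_i) - y_i` the current prediction errors:

```python
# Sketch: one SMO pair update for (i, j), soft-margin parameter C.
import numpy as np

def smo_pair_update(alpha, y, K, E, i, j, C):
    """Re-optimize alpha[i], alpha[j] holding the rest fixed. Returns True if changed."""
    if y[i] != y[j]:                                # feasible segment [L, H] for alpha[j]
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    eta = 2.0 * K[i, j] - K[i, i] - K[j, j]         # second derivative along the line
    if L >= H or eta >= 0:
        return False
    a_j = alpha[j] - y[j] * (E[i] - E[j]) / eta     # unconstrained 1-D optimum...
    a_j = np.clip(a_j, L, H)                        # ...clipped to the segment
    a_i = alpha[i] + y[i] * y[j] * (alpha[j] - a_j) # keep sum_i alpha_i y_i fixed
    alpha[i], alpha[j] = a_i, a_j
    return True
```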

Cross-validation error of SVM. The leave-one-out cross-validation error does not depend on the dimensionality of the feature space, but only on the number of support vectors! $\text{Leave-one-out CV error} \le \dfrac{\#\,\text{support vectors}}{\#\,\text{of training examples}}$.

Summary. Max-margin decision boundary. Constrained convex optimization. Duality. The KKT conditions and the support vectors. Non-separable case and slack variables. The kernel trick. The SMO algorithm.