Machine Learning. What is a good Decision Boundary? Support Vector Machines


Machine Learning 10-701/15-781, Spring 2010
Support Vector Machines
Eric Xing, Lecture 17, March 15, 2010
Reading: Chap. 6 & 7, C.B. book, and listed papers

What is a good Decision Boundary?
Consider a binary classification task with y = ±1 labels (not 0/1 as before). When the training examples are linearly separable, we can set the parameters of a linear classifier so that all the training examples are classified correctly. Many decision boundaries do this (generative classifiers, logistic regression, ...). Are all decision boundaries equally good? [Figure: linearly separable points from Class 1 and Class 2, with several candidate boundaries.]

What is a good Decision Boundary? [Figure: the same data with two different separating boundaries.]

Not All Decision Boundaries Are Equal!
Why may we have such boundaries? Irregular distributions, imbalanced training sizes, outliers. [Figure: boundaries skewed toward one class by such effects.]

Classification and Margin
Parameterizing the decision boundary: let $w$ denote a vector orthogonal to the decision boundary, and $b$ denote a scalar "offset" term; then we can write the decision boundary as

$$ w^T x + b = 0 $$

[Figure: two classes separated by the hyperplane, with distances $d_-$ and $d_+$ to the closest points on each side.]

Margin: $w^T x_i + b > +c$ for all $x_i$ in class 2, and $w^T x_i + b < -c$ for all $x_i$ in class 1. Or, more compactly: $(w^T x_i + b)\, y_i > c$. The margin between any two points is $m = d_- + d_+$.
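As a quick illustration of this geometry (a minimal sketch; the hyperplane and points below are made up, not from the slides), the quantity $(w^T x + b)/\|w\|$ is each point's signed distance to the boundary, with the sign indicating the side:

```python
import numpy as np

# Hypothetical hyperplane parameters and points (illustration only).
w = np.array([2.0, 1.0])   # vector orthogonal to the decision boundary
b = -1.0                   # scalar "offset" term
X = np.array([[1.0, 2.0],  # a point on the class-2 side
              [0.0, 0.0]]) # a point on the class-1 side

# Signed distance of each point to the hyperplane w^T x + b = 0;
# the sign tells the side, the magnitude is the distance d_- or d_+.
dist = (X @ w + b) / np.linalg.norm(w)
print(dist)  # positive for class 2, negative for class 1
```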

Maximum Margin Classification
The "minimum" permissible margin is

$$ m = \frac{2c}{\|w\|} $$

Here is our maximum margin classification problem:

$$ \max_{w,b} \; \frac{2c}{\|w\|} \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge c, \; \forall i $$

Maximum Margin Classification, cont'd.
But note that the magnitude of $c$ merely scales $w$ and $b$, and does not change the classification boundary at all! (Why?) So we instead work on this cleaner problem:

$$ \max_{w,b} \; \frac{1}{\|w\|} \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \; \forall i $$

The solution to this leads to the famous Support Vector Machines, believed by many to be the best "off-the-shelf" supervised learning algorithm.

Support vector machine
A convex quadratic programming problem with linear constraints:

$$ \max_{w,b} \; \frac{1}{\|w\|} \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \; \forall i $$

The attained margin is now given by $1/\|w\|$. Only a few of the classification constraints are relevant: the support vectors.

Constrained optimization: we can directly solve this using commercial quadratic programming (QP) code (a minimal sketch of that route follows). But we want to take a more careful investigation of Lagrangian duality, and the solution of the above in its dual form: deeper insight (support vectors, kernels) and a more efficient algorithm.
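A hedged sketch of the direct QP route, using the open-source cvxpy modeling library (my choice; the slides only say "QP code") and the equivalent $\min \frac{1}{2}\|w\|^2$ form derived a few slides later, on a made-up separable toy set:

```python
import numpy as np
import cvxpy as cp

# Tiny linearly separable toy data (hypothetical, for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Maximizing 1/||w|| is equivalent to minimizing (1/2)||w||^2,
# a convex QP with the linear margin constraints.
w = cp.Variable(2)
b = cp.Variable()
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```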

Digression to Lagrangian Duality
The Primal Problem:

$$ \min_w f(w) \quad \text{s.t.} \quad g_i(w) \le 0, \; i = 1, \dots, k; \qquad h_i(w) = 0, \; i = 1, \dots, l $$

The generalized Lagrangian:

$$ L(w, \alpha, \beta) = f(w) + \sum_{i=1}^k \alpha_i g_i(w) + \sum_{i=1}^l \beta_i h_i(w) $$

where the $\alpha$'s ($\alpha_i \ge 0$) and $\beta$'s are called the Lagrangian multipliers.

Lemma:

$$ \max_{\alpha, \beta:\, \alpha_i \ge 0} L(w, \alpha, \beta) = \begin{cases} f(w) & \text{if } w \text{ satisfies the primal constraints} \\ \infty & \text{otherwise} \end{cases} $$

A re-written primal: $\min_w \max_{\alpha, \beta:\, \alpha_i \ge 0} L(w, \alpha, \beta)$.

Lagrangian Duality, cont.
Recall the re-written primal problem. The Dual Problem: $\max_{\alpha, \beta:\, \alpha_i \ge 0} \min_w L(w, \alpha, \beta)$.

Theorem (weak duality):

$$ d^* = \max_{\alpha, \beta:\, \alpha_i \ge 0} \min_w L(w, \alpha, \beta) \;\le\; \min_w \max_{\alpha, \beta:\, \alpha_i \ge 0} L(w, \alpha, \beta) = p^* $$

Theorem (strong duality): iff there exists a saddle point of $L(w, \alpha, \beta)$, we have $d^* = p^*$.

A sketch of strong and weak duality
Now, ignoring $h(x)$ for simplicity, let's look at what's happening graphically in the duality theorems:

$$ d^* = \max_{\alpha \ge 0} \min_w \left[ f(w) + \alpha^T g(w) \right] \;\le\; \min_w \max_{\alpha \ge 0} \left[ f(w) + \alpha^T g(w) \right] = p^* $$

[Three figure slides sketch this inequality graphically in the $(g(w), f(w))$ plane.]
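The slides state weak duality without proof; as a supplement (not on the slides), the one-step argument written out in LaTeX is just that a min never exceeds an evaluation, which never exceeds a max:

```latex
% For every w' and every (\alpha, \beta) with \alpha \ge 0:
%   \min_{w} L(w,\alpha,\beta) \le L(w',\alpha,\beta)
%     \le \max_{\alpha',\beta':\,\alpha' \ge 0} L(w',\alpha',\beta').
% The left end does not depend on w', and the right end does not depend
% on (\alpha,\beta), so we may take the max over (\alpha,\beta) on the
% left and the min over w' on the right, preserving the inequality:
\[
  d^* \;=\; \max_{\alpha,\beta:\,\alpha_i \ge 0}\;\min_{w} L(w,\alpha,\beta)
  \;\le\;
  \min_{w}\;\max_{\alpha,\beta:\,\alpha_i \ge 0} L(w,\alpha,\beta) \;=\; p^* .
\]
```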

The KKT conditions
If there exists some saddle point of $L$, then the saddle point satisfies the following "Karush-Kuhn-Tucker" (KKT) conditions:

$$ \frac{\partial}{\partial w_i} L(w, \alpha, \beta) = 0, \quad i = 1, \dots, n $$
$$ \frac{\partial}{\partial \beta_i} L(w, \alpha, \beta) = 0, \quad i = 1, \dots, l $$
$$ \alpha_i g_i(w) = 0, \quad i = 1, \dots, k \qquad \text{(complementary slackness)} $$
$$ g_i(w) \le 0, \quad i = 1, \dots, k \qquad \text{(primal feasibility)} $$
$$ \alpha_i \ge 0, \quad i = 1, \dots, k \qquad \text{(dual feasibility)} $$

Theorem: if $w^*$, $\alpha^*$, and $\beta^*$ satisfy the KKT conditions, then they are also a solution to the primal and the dual problems.

Solving the optimal margin classifier
Recall our optimization problem:

$$ \max_{w,b} \; \frac{1}{\|w\|} \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \; \forall i $$

This is equivalent to

$$ \min_{w,b} \; \frac{1}{2} w^T w \quad \text{s.t.} \quad 1 - y_i (w^T x_i + b) \le 0, \; \forall i \qquad (*) $$

Write the Lagrangian:

$$ L(w, b, \alpha) = \frac{1}{2} w^T w - \sum_i \alpha_i \left[ y_i (w^T x_i + b) - 1 \right] $$

Recall that $(*)$ can be reformulated as $\min_{w,b} \max_{\alpha_i \ge 0} L(w, b, \alpha)$. Now we solve its dual problem: $\max_{\alpha_i \ge 0} \min_{w,b} L(w, b, \alpha)$.

The Dual Problem
We minimize $L$ with respect to $w$ and $b$ first. Setting the gradients to zero:

$$ \nabla_w L(w, b, \alpha) = w - \sum_i \alpha_i y_i x_i = 0 \quad (*) \qquad \frac{\partial}{\partial b} L(w, b, \alpha) = -\sum_i \alpha_i y_i = 0 \quad (**) $$

Note that $(*)$ implies $w = \sum_i \alpha_i y_i x_i$. Plugging this back into $L$, and using $(**)$, we have:

$$ L(w, b, \alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^T x_j $$

The Dual Problem, cont.
Now we have the following dual optimization problem:

$$ \max_\alpha \; J(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^T x_j \quad \text{s.t.} \quad \alpha_i \ge 0, \; \forall i; \qquad \sum_i \alpha_i y_i = 0 $$

This is, again, a quadratic programming problem. A global maximum of $\alpha$ can always be found. But what's the big deal?? Note two things:
1. $w$ can be recovered by $w = \sum_i \alpha_i y_i x_i$ (see next slide).
2. The "kernel" $x_i^T x_j$ (more later).
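For concreteness, a hedged sketch of solving this dual with cvxpy (again my choice of tool, on the same made-up toy data), then recovering $w$ from $w = \sum_i \alpha_i y_i x_i$ and $b$ from a point on the margin:

```python
import numpy as np
import cvxpy as cp

# Same hypothetical toy data as in the primal sketch.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m = len(y)

alpha = cp.Variable(m)
# The quadratic term: sum_ij a_i a_j y_i y_j x_i^T x_j = ||X^T (y * a)||^2,
# which keeps the objective concave in a DCP-recognizable way.
quad = cp.sum_squares(X.T @ cp.multiply(y, alpha))
objective = cp.Maximize(cp.sum(alpha) - 0.5 * quad)
constraints = [alpha >= 0, y @ alpha == 0]
cp.Problem(objective, constraints).solve()

# Recover w = sum_i alpha_i y_i x_i, and b from any support vector,
# using y_sv (w^T x_sv + b) = 1 on the margin.
a = alpha.value
w = X.T @ (a * y)
sv = int(np.argmax(a))          # an index with alpha_i > 0
b = y[sv] - X[sv] @ w
print(np.round(a, 3), w, b)     # alpha should come out sparse
```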

Support vectors
Note the KKT condition: only a few $\alpha_i$'s can be nonzero!!

$$ \alpha_i g_i(w) = 0, \quad i = 1, \dots, k $$

[Figure: ten training points from Class 1 and Class 2; only the three on the margin have nonzero multipliers, e.g. $\alpha_8 = 0.6$, $\alpha_6 = 1.4$, $\alpha_{10} = 0.8$; all other $\alpha_i = 0$.]

Call the training data points whose $\alpha_i$'s are nonzero the support vectors (SV).

Support vector machines
Once we have the Lagrange multipliers $\{\alpha_i\}$, we can reconstruct the parameter vector $w$ as a weighted combination of the training examples:

$$ w = \sum_{i \in SV} \alpha_i y_i x_i $$

For testing with a new data point $z$, compute

$$ w^T z + b = \sum_{i \in SV} \alpha_i y_i \, (x_i^T z) + b $$

and classify $z$ as class 1 if the sum is positive, and class 2 otherwise. Note: $w$ need not be formed explicitly.

Interpretation of support vector machines
The optimal $w$ is a linear combination of a small number of data points. This "sparse" representation can be viewed as data compression, as in the construction of a kNN classifier. To compute the weights $\{\alpha_i\}$, and to use support vector machines, we need to specify only the inner products (or kernel) between the examples, $x_i^T x_j$. We make decisions by comparing each new example $z$ with only the support vectors:

$$ y^* = \operatorname{sign}\left( \sum_{i \in SV} \alpha_i y_i \, (x_i^T z) + b \right) $$

Non-linearly Separable Problems
[Figure: overlapping Class 1 and Class 2 points; some fall on the wrong side of the margin.]
We allow "errors" $\xi_i$ in classification; they are based on the output of the discriminant function $w^T x + b$, and $\xi_i$ approximates the number of misclassified samples.
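A minimal sketch of this decision rule (the function name and arguments are hypothetical, not from the slides); note that $w$ never appears, only inner products with the support vectors:

```python
import numpy as np

def svm_predict(z, sv_x, sv_y, sv_alpha, b, kernel=np.dot):
    """Classify a new example z using only the support vectors:
    y* = sign( sum_{i in SV} alpha_i y_i K(x_i, z) + b ).
    kernel defaults to the plain inner product; swapping in another
    kernel function is the hook the slides allude to with "more later"."""
    score = sum(a * yi * kernel(xi, z)
                for a, yi, xi in zip(sv_alpha, sv_y, sv_x))
    return int(np.sign(score + b))
```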

Soft Margin Hyperplane
Now we have a slightly different optimization problem:

$$ \min_{w,b} \; \frac{1}{2} w^T w + C \sum_i \xi_i \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \; \forall i $$

The $\xi_i$ are "slack variables" in the optimization. Note that $\xi_i = 0$ if there is no error for $x_i$, and $\sum_i \xi_i$ is an upper bound on the number of errors. $C$ is a tradeoff parameter between error and margin.

The Optimization Problem
The dual of this new constrained optimization problem is

$$ \max_\alpha \; J(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^T x_j \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \; \forall i; \qquad \sum_i \alpha_i y_i = 0 $$

This is very similar to the optimization problem in the linearly separable case, except that there is now an upper bound $C$ on $\alpha_i$. Once again, a QP solver can be used to find $\alpha$.
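A quick illustration of the role of $C$, as a sketch using scikit-learn's SVC (my choice of solver, not the lecture's) on made-up overlapping data:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical overlapping two-class data (not from the slides).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),
               rng.normal(1.0, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

# C trades margin width against slack: small C tolerates many margin
# violations (wide margin, many support vectors); large C penalizes them.
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:<6} support vectors: {clf.support_.size}")
```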

The SMO algorithm
Consider solving an unconstrained optimization problem $\max_\alpha J(\alpha)$. We've already seen three optimization algorithms in this course. Coordinate ascent is another: maximize over one coordinate at a time, holding the others fixed.

Coordinate ascent
[Figure: contour plot of a quadratic objective; coordinate ascent moves in axis-parallel steps toward the maximum.]

Sequential minimal optimization
Constrained optimization:

$$ \max_\alpha \; J(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^T x_j \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \; \forall i; \qquad \sum_i \alpha_i y_i = 0 $$

Question: can we do coordinate ascent along one direction at a time, i.e., hold all $\alpha_{[-i]}$ fixed and update $\alpha_i$? (No: the equality constraint $\sum_i \alpha_i y_i = 0$ pins down $\alpha_i$ once all the others are fixed, so at least two coordinates must move together.)

The SMO algorithm
Repeat till convergence:
1. Select some pair $\alpha_i$ and $\alpha_j$ to update next (using a heuristic that tries to pick the two that will allow us to make the biggest progress towards the global maximum).
2. Re-optimize $J(\alpha)$ with respect to $\alpha_i$ and $\alpha_j$, while holding all the other $\alpha_k$'s ($k \ne i, j$) fixed.

Will this procedure converge? (A minimal sketch of the pair update appears below.)
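Below is a minimal sketch of that pair update, in the style of the "simplified SMO" often used for teaching: it picks the second index at random instead of with the heuristic in step 1, so it illustrates the update rather than an efficient solver. All names are mine, not the lecture's.

```python
import numpy as np

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10, seed=0):
    """Simplified SMO for the linear-kernel soft-margin dual (a sketch)."""
    rng = np.random.default_rng(seed)
    m = len(y)
    K = X @ X.T                     # Gram matrix of inner products
    alpha = np.zeros(m)
    b = 0.0
    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(m):
            Ei = (alpha * y) @ K[:, i] + b - y[i]   # prediction error on x_i
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = rng.choice([k for k in range(m) if k != i])
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                # Box ends keeping alpha_j in [0, C] while the pair's
                # constraint alpha_i y_i + alpha_j y_j stays fixed.
                if y[i] != y[j]:
                    L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if L == H or eta >= 0:
                    continue
                # One-dimensional quadratic maximization, then clipping.
                alpha[j] = np.clip(aj_old - y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj_old) < 1e-5:
                    continue
                alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
                # Recompute b from the KKT conditions on the updated pair.
                b1 = b - Ei - y[i] * (alpha[i] - ai_old) * K[i, i] \
                     - y[j] * (alpha[j] - aj_old) * K[i, j]
                b2 = b - Ej - y[i] * (alpha[i] - ai_old) * K[i, j] \
                     - y[j] * (alpha[j] - aj_old) * K[j, j]
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b
```

Production implementations keep the same clipped pair update but add the working-set heuristic and kernel caching; the update itself is exactly the one-dimensional constrained maximization described on the convergence slides.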

Convergence of SMO
Let's hold $\alpha_3, \dots, \alpha_m$ fixed and re-optimize $J$ w.r.t. $\alpha_1$ and $\alpha_2$:

$$ \max_\alpha \; J(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^T x_j \quad \text{s.t.} \quad 0 \le \alpha_k \le C, \; \forall k; \qquad \sum_i \alpha_i y_i = 0 $$

The KKT conditions of this subproblem provide the convergence check.

Convergence of SMO, cont.
The constraints: $\alpha_1 y_1 + \alpha_2 y_2 = -\sum_{k=3}^m \alpha_k y_k = \zeta$ (a constant), with $0 \le \alpha_1, \alpha_2 \le C$, so the pair $(\alpha_1, \alpha_2)$ lives on a line segment inside the box $[0, C]^2$. The objective: substituting $\alpha_1 = (\zeta - \alpha_2 y_2)\, y_1$ makes $J$ a one-dimensional quadratic in $\alpha_2$. Constrained optimization: maximize this quadratic analytically, then clip $\alpha_2$ to the feasible segment.

Cross-validation error of SVM
The leave-one-out cross-validation error does not depend on the dimensionality of the feature space, but only on the number of support vectors!

$$ \text{Leave-one-out CV error} \;\le\; \frac{\#\,\text{support vectors}}{\#\,\text{training examples}} $$

Summary
- Max-margin decision boundary
- Constrained convex optimization
- Duality
- The KKT conditions and the support vectors
- Non-separable case and slack variables
- The SMO algorithm
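A one-liner illustration of the bound (made-up data; scikit-learn's fitted SVC exposes the support-vector indices, so the slide's ratio is immediate):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data, as in the earlier soft-margin sketch.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),
               rng.normal(1.0, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
# The slide's bound: LOO CV error <= (#SV) / (#training examples).
print("LOO CV error bound:", clf.support_.size / len(y))
```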