Binary classification: Support Vector Machines

Transcription:

CS 1571 Introduction to AI. Lecture 6: Binary classification: Support Vector Machines. Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square.

Supervised learning. Data: D = {D_1, D_2, ..., D_n}, a set of n examples, where D_i = (x_i, y_i); x_i = (x_{i,1}, x_{i,2}, ..., x_{i,d}) is an input vector of size d and y_i is the desired output (given by a teacher). Objective: learn the mapping f: X → Y such that y_i ≈ f(x_i) for all i = 1, .., n. Regression: Y is continuous. Example: earnings, product orders → company stock price. Classification: Y is discrete. Example: handwritten digit (in binary form) → digit label.
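
A minimal sketch of this data representation in Python/NumPy (an assumption of this note; the slides contain no code), with illustrative values:

    import numpy as np

    # A toy supervised-learning dataset: n examples, each a d-dimensional input
    # vector x_i with a discrete label y_i in {-1, +1} (binary classification).
    n, d = 6, 2
    X = np.array([[2.0, 1.0], [1.5, 2.0], [3.0, 3.0],
                  [-1.0, -1.5], [-2.0, -0.5], [-1.5, -2.5]])   # inputs, shape (n, d)
    y = np.array([+1, +1, +1, -1, -1, -1])                     # desired outputs
    # The learning task is to find f: X -> Y with f(x_i) ≈ y_i for all i.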

Discriminant functions: review. A classification model is typically defined using discriminant functions. Idea: for each class i define a function g_i(x) mapping X to R. When the decision on input x should be made, choose the class with the highest value of g_i(x): class = arg max_i g_i(x). Works for binary and multi-class classification.

Discriminant functions: review. Assume a binary classification problem with classes 0 and 1, and discriminant functions g_1(x) and g_0(x). Decide class 1 when g_1(x) > g_0(x) and class 0 when g_1(x) < g_0(x); the set where g_1(x) = g_0(x) is the decision boundary.
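
A small sketch of the arg max decision rule; the linear form of each g_i and the example weights are illustrative assumptions, not from the slides:

    import numpy as np

    # Discriminant-function classification: choose the class with the highest g_i(x).
    # Each g_i is taken here to be linear, g_i(x) = w_i^T x + w_i0 (an illustrative choice).
    W = np.array([[1.0, -0.5],     # weights for class 0
                  [-0.3, 1.2]])    # weights for class 1
    w0 = np.array([0.1, -0.2])     # biases

    def classify(x):
        g = W @ x + w0             # g_i(x) for every class i
        return int(np.argmax(g))   # class = arg max_i g_i(x)

    print(classify(np.array([0.5, 2.0])))   # -> 1 for this input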

Logistic regression model: review. Model for binary (2-class) classification. Defined by discriminant functions: g_1(x) = g(w^T x) and g_0(x) = 1 - g(w^T x), where g(z) = 1/(1 + e^{-z}) is a logistic function applied to the linear combination z = w^T x of the input vector x = (1, x_1, ..., x_d).

Logistic regression model: decision boundary. The logistic regression model defines a linear decision boundary. Example: 2 classes (blue and red points); the regions where g_1(x) > g_0(x) and g_1(x) < g_0(x) are separated by the line where g_1(x) = g_0(x).
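
A short sketch of the logistic-regression discriminants and the resulting linear decision rule; the parameter values w, w_0 are illustrative:

    import numpy as np

    # Logistic regression discriminants for binary classification:
    #   g_1(x) = g(w^T x + w_0),  g_0(x) = 1 - g(w^T x + w_0),
    # where g(z) = 1 / (1 + e^{-z}) is the logistic (sigmoid) function.
    w, w0 = np.array([2.0, -1.0]), 0.5      # illustrative parameters

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def predict(x):
        g1 = sigmoid(w @ x + w0)
        return 1 if g1 >= 0.5 else 0        # g_1(x) >= g_0(x)  <=>  w^T x + w_0 >= 0

    # The decision boundary g_1(x) = g_0(x) is the linear surface w^T x + w_0 = 0.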

Decision boundary. An alternative way to define discriminant functions with a linear decision boundary: Class +1: g_{+1}(x) = w^T x + w_0; Class -1: g_{-1}(x) = -(w^T x + w_0). Decision boundary: w^T x + w_0 = 0.

Linearly separable classes. Linearly separable classes: there is a hyperplane that separates the training instances with no error. (Figure: points of class (+1) and class (-1) separated by a hyperplane.)

Learning linearly separable sets. Finding weights for linearly separable classes: linear program (LP) solution. It finds weights that satisfy the following constraints: for all i such that y_i = +1: w^T x_i + w_0 ≥ 0; for all i such that y_i = -1: w^T x_i + w_0 < 0. Together: y_i (w^T x_i + w_0) ≥ 0. Property: if there is a hyperplane separating the examples, the linear program finds a solution.

Optimal separating hyperplane. Problem: there are multiple hyperplanes that separate the data points. Which one to choose?
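
A sketch of the LP formulation as a feasibility problem using scipy.optimize.linprog (a library choice assumed here, not from the slides); to rule out the trivial solution w = 0, the constraint is tightened to y_i (w^T x_i + w_0) ≥ 1:

    import numpy as np
    from scipy.optimize import linprog

    # Find any separating hyperplane with a linear program (pure feasibility, zero objective).
    X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
    y = np.array([+1, +1, -1, -1])
    n, d = X.shape

    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])   # -y_i [x_i, 1].[w, w_0] <= -1
    b_ub = -np.ones(n)
    c = np.zeros(d + 1)                                     # no objective, just feasibility
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (d + 1))
    if res.success:
        w, w0 = res.x[:d], res.x[d]
        print("separating hyperplane:", w, w0)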

Optimal separating hyperplane. Problem: multiple hyperplanes that separate the data exist. Which one to choose? Maximum margin choice: maximize the distance d_+ + d_-, where d_+ is the shortest distance of a positive example from the hyperplane (similarly d_- for negative examples).

Maximum margin hyperplane. For the maximum margin hyperplane only the examples on the margin matter (only these affect the distances). These are called support vectors.

Finding maximum margin hyperplanes. Assume that the examples in the training set are (x_i, y_i) such that y_i ∈ {+1, -1}. Assume that all data satisfy: w^T x_i + w_0 ≥ 1 for y_i = +1 and w^T x_i + w_0 ≤ -1 for y_i = -1. The inequalities can be combined as: y_i (w^T x_i + w_0) - 1 ≥ 0 for all i. The equalities define two hyperplanes: w^T x + w_0 = 1 and w^T x + w_0 = -1.

Finding the maximum margin hyperplane. Geometrical margin: γ_{w,w_0}(x, y) = y (w^T x + w_0) / ‖w‖_2 measures the distance of a point from the hyperplane, where w is the normal to the hyperplane and ‖·‖_2 is the Euclidean norm. For points satisfying y_i (w^T x_i + w_0) = 1 the distance is 1/‖w‖_2. Width of the margin: d_+ + d_- = 2/‖w‖_2.
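
A short sketch of the geometric margin formula; the hyperplane parameters and point are illustrative:

    import numpy as np

    # Geometric margin of a labeled point (x, y) with respect to the hyperplane
    # w^T x + w_0 = 0, and the resulting margin width 2 / ||w||_2.
    w, w0 = np.array([2.0, 1.0]), -1.0

    def geometric_margin(x, y):
        return y * (w @ x + w0) / np.linalg.norm(w)

    print(geometric_margin(np.array([1.0, 1.0]), +1))   # distance of a correctly classified point
    print(2.0 / np.linalg.norm(w))                      # margin width when the closest points
                                                        # satisfy y_i (w^T x_i + w_0) = 1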

Maximum margin hyperplane. We want to maximize d_+ + d_- = 2/‖w‖_2. We do it by minimizing ‖w‖_2²/2 = w^T w / 2 in the variables w, w_0. But we also need to enforce the constraints on the points: y_i (w^T x_i + w_0) - 1 ≥ 0.

Maximum margin hyperplane. Solution: incorporate the constraints into the optimization. Optimization problem (Lagrangian): J(w, w_0, α) = ‖w‖²/2 - Σ_i α_i [y_i (w^T x_i + w_0) - 1], with Lagrange multipliers α_i ≥ 0. Minimize with respect to w, w_0 (primal variables); maximize with respect to α (dual variables). What happens to α_i: if y_i (w^T x_i + w_0) > 1 then α_i = 0; otherwise the constraint is active and α_i > 0.

Max margin hyperplane solution. Set the derivatives to 0 (Kuhn-Tucker conditions): ∂J(w, w_0, α)/∂w = w - Σ_i α_i y_i x_i = 0 and ∂J(w, w_0, α)/∂w_0 = -Σ_i α_i y_i = 0. Now we need to solve for the Lagrange parameters (Wolfe dual): maximize J(α) = Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j (x_i^T x_j) subject to the constraints α_i ≥ 0 for all i and Σ_i α_i y_i = 0. Quadratic optimization problem: solution α̂_i for all i.

Maximum margin solution. The resulting parameter vector can be expressed as ŵ = Σ_i α̂_i y_i x_i, where α̂ is the solution of the optimization; the parameter ŵ_0 is obtained from the Kuhn-Tucker conditions (e.g. ŵ_0 = y_s - ŵ^T x_s for any support vector x_s). Solution properties: α̂_i = 0 for all points that are not on the margin. The decision boundary ŵ^T x + ŵ_0 = Σ_{i ∈ SV} α̂_i y_i (x_i^T x) + ŵ_0 = 0 is therefore defined by the support vectors only (α_i > 0 on the margin, α_i = 0 elsewhere).
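
A sketch that solves this dual on a toy separable dataset with a general-purpose solver (SciPy's SLSQP, an assumption of this example; real SVM packages use specialized QP routines) and then recovers ŵ and ŵ_0:

    import numpy as np
    from scipy.optimize import minimize

    # Wolfe dual:  maximize  sum_i a_i - 1/2 sum_{i,j} a_i a_j y_i y_j (x_i . x_j)
    #              subject to a_i >= 0 and sum_i a_i y_i = 0.
    X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
    y = np.array([+1.0, +1.0, -1.0, -1.0])
    n = len(y)
    G = (y[:, None] * X) @ (y[:, None] * X).T      # G_ij = y_i y_j (x_i . x_j)

    def neg_dual(a):                               # minimize the negative dual
        return 0.5 * a @ G @ a - a.sum()

    res = minimize(neg_dual, np.zeros(n), method="SLSQP",
                   bounds=[(0, None)] * n,
                   constraints=[{"type": "eq", "fun": lambda a: a @ y}])
    a = res.x
    w = (a * y) @ X                                # w_hat = sum_i a_i y_i x_i
    s = np.argmax(a)                               # index of a support vector (a_s > 0)
    w0 = y[s] - w @ X[s]                           # from the complementarity condition
    print(a.round(3), w, w0)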

Support vector machines. The decision boundary: ŵ^T x + ŵ_0 = Σ_{i ∈ SV} α̂_i y_i (x_i^T x) + ŵ_0. Classification decision: ŷ = sign[ Σ_{i ∈ SV} α̂_i y_i (x_i^T x) + ŵ_0 ].

Support vector machines: solution property. The decision boundary is defined by the set of support vectors SV and their alpha values. Support vectors = a subset of the data points in the training data that define the margin. Classification decision: ŷ = sign[ Σ_{i ∈ SV} α̂_i y_i (x_i^T x) + ŵ_0 ]. Note that we do not have to explicitly compute ŵ. This will be important for the nonlinear (kernel) case.
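
A sketch using scikit-learn (an assumed library; the slides do not name one) showing that the classification decision can be computed from the support vectors and their α values alone:

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
    y = np.array([+1, +1, -1, -1])
    clf = SVC(kernel="linear", C=1e6).fit(X, y)     # large C approximates the hard margin

    x_new = np.array([0.5, 0.5])
    # sign( sum_{i in SV} alpha_i y_i (x_i^T x_new) + w_0 ), computed from the dual solution;
    # dual_coef_ stores the products alpha_i * y_i for the support vectors.
    score = clf.dual_coef_[0] @ (clf.support_vectors_ @ x_new) + clf.intercept_[0]
    print(np.sign(score), clf.predict([x_new])[0])  # the two decisions agree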

Support vector machines: inner product. The decision on a new x depends on the inner product between pairs of examples. The decision boundary: ŵ^T x + ŵ_0 = Σ_{i ∈ SV} α̂_i y_i (x_i^T x) + ŵ_0. Classification decision: ŷ = sign[ Σ_{i ∈ SV} α̂_i y_i (x_i^T x) + ŵ_0 ]. Similarly, the optimization depends on the inner products (x_i^T x_j): J(α) = Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j (x_i^T x_j).

Inner product of two vectors. The decision boundary for the SVM and its optimization depend on the inner product of two data points (vectors): x^T x' = Σ_j x_j x'_j. (The slide poses a small numeric example with two 3-dimensional vectors.)

Inner product of two vectors (cont.). The inner product x^T x' is the sum of the element-wise products of the two vectors (the slide evaluates it on the small 3-dimensional example).

Inner product of two vectors (cont.). The inner product is also equal to x^T x' = ‖x‖ ‖x'‖ cos(θ), where θ is the angle between the two vectors. If the angle between them is 0 then x^T x' = ‖x‖ ‖x'‖. If the angle between them is 90° then x^T x' = 0. The inner product measures how similar the two vectors are.
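
A tiny numeric sketch of these two views of the inner product; the vectors are illustrative (the slide's own numbers did not survive extraction):

    import numpy as np

    # Inner product as a similarity measure.
    x  = np.array([6.0, 3.0, 5.0])
    xp = np.array([1.0, 2.0, 2.0])

    dot = x @ xp                                           # sum_j x_j x'_j
    cos = dot / (np.linalg.norm(x) * np.linalg.norm(xp))   # x^T x' = ||x|| ||x'|| cos(theta)
    print(dot, cos)   # parallel vectors give cos = 1; orthogonal vectors give dot = 0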

Extension to a linearly non-separable case. Idea: allow some flexibility in crossing the separating hyperplane.

Extension to the linearly non-separable case. Relax the constraints with slack variables ξ_i ≥ 0: w^T x_i + w_0 ≥ 1 - ξ_i for y_i = +1, and w^T x_i + w_0 ≤ -1 + ξ_i for y_i = -1. An error occurs if ξ_i ≥ 1; Σ_i ξ_i is an upper bound on the number of errors. Introduce a penalty for the errors: minimize ‖w‖²/2 + C Σ_i ξ_i subject to the constraints. C is set by a user; a larger C leads to a larger penalty for an error.
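
A short sketch computing the slack variables and the penalized objective for a candidate hyperplane; the data, w, w_0, and C are illustrative:

    import numpy as np

    # Soft-margin quantities for a candidate (w, w_0): the slack of each point and
    # the penalized objective ||w||^2 / 2 + C * sum_i xi_i.
    X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -2.0], [0.5, 0.5]])   # last point is misclassified
    y = np.array([+1, +1, -1, -1])
    w, w0, C = np.array([1.0, 1.0]), 0.0, 10.0

    xi = np.maximum(0.0, 1.0 - y * (X @ w + w0))   # smallest slack satisfying the relaxed constraints
    objective = 0.5 * w @ w + C * xi.sum()
    print(xi)                 # xi_i >= 1 marks a misclassified point
    print(objective)          # a larger C penalizes margin violations more heavily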

Support vector machines: solution. The solution of the linearly non-separable case has the same properties as the linearly separable case: the decision boundary is defined only by a set of support vectors (points that are on the margin or that cross the margin), and the decision boundary and the optimization can be expressed in terms of the inner product between pairs of examples: ŷ = sign[ Σ_{i ∈ SV} α̂_i y_i (x_i^T x) + ŵ_0 ], J(α) = Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j (x_i^T x_j).

Nonlinear decision boundary. So far we have seen how to learn a linear decision boundary. But what if a linear decision boundary is not good? How can we learn non-linear decision boundaries with the SVM?

Nonlinear decision boundary. The non-linear case can be handled by using a set of features. Essentially we map input vectors x to (larger) feature vectors φ(x). Note that feature expansions are typically high dimensional; examples: polynomial expansions. Given the nonlinear feature mappings, we can use the linear SVM on the expanded feature vectors: replace (x^T x') with φ(x)^T φ(x'). Kernel function: K(x, x') = φ(x)^T φ(x').

Nonlinear case. The linear case requires us to compute (x^T x'). The non-linear case is handled by mapping the input vectors x to (larger) feature vectors φ(x) and using the SVM formalism on the feature vectors: (x^T x') → φ(x)^T φ(x'), i.e. the kernel function K(x, x') = φ(x)^T φ(x').
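
A sketch of the explicit feature-expansion route using scikit-learn's PolynomialFeatures and a linear SVM (the library and the XOR-style toy data are assumptions of this example):

    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.svm import SVC

    # Map x to a quadratic feature vector phi(x) and run a linear SVM in the expanded space.
    X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
    y = np.array([+1, +1, -1, -1])            # not linearly separable in the input space

    phi = PolynomialFeatures(degree=2, include_bias=True)   # 1, x1, x2, x1^2, x1*x2, x2^2
    X_feat = phi.fit_transform(X)
    clf = SVC(kernel="linear").fit(X_feat, y) # linear SVM, but in the feature space
    print(clf.predict(phi.transform([[0.5, 0.5], [0.5, -0.5]])))   # -> [ 1 -1]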

Support vector machines: solution for nonlinear decision boundaries. The decision boundary: ŵ^T φ(x) + ŵ_0 = Σ_{i ∈ SV} α̂_i y_i K(x_i, x) + ŵ_0. Classification: ŷ = sign[ Σ_{i ∈ SV} α̂_i y_i K(x_i, x) + ŵ_0 ]. A decision on a new x requires us to compute the kernel function defining the similarity between the examples. Similarly, the optimization depends on the kernel: J(α) = Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j K(x_i, x_j).

Kernel trick. The non-linear case maps the input vectors x to a (larger) feature space via φ(x); feature expansions are typically high dimensional (e.g. polynomial expansions). The kernel function defines the inner product of the expanded high-dimensional feature vectors and lets us use the SVM: (x^T x') → K(x, x') = φ(x)^T φ(x'). Problem: after the expansion we need to perform inner products in a very high dimensional space. Kernel trick: if we choose the kernel function wisely, we can compute the linear separation in the high dimensional feature space implicitly, by working in the original input space!
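
A sketch of the kernelized decision function evaluated directly from the support vectors, using an RBF kernel; scikit-learn and the toy data are assumptions of this example:

    import numpy as np
    from sklearn.svm import SVC

    # Kernelized decision:  sign( sum_{i in SV} alpha_i y_i K(x_i, x) + w_0 )
    # with the RBF kernel K(x, x') = exp(-gamma ||x - x'||^2).
    X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
    y = np.array([+1, +1, -1, -1])
    gamma = 0.5
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)

    def decision(x_new):
        k = np.exp(-gamma * np.sum((clf.support_vectors_ - x_new) ** 2, axis=1))
        return clf.dual_coef_[0] @ k + clf.intercept_[0]   # alpha_i y_i stored in dual_coef_

    x_new = np.array([0.8, 0.9])
    print(np.sign(decision(x_new)), clf.predict([x_new])[0])   # the two agree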

Kernel function example. Assume x = [x_1, x_2] and a feature mapping that maps the input to a quadratic feature set: φ(x) = [x_1², x_2², √2 x_1 x_2, √2 x_1, √2 x_2, 1]. Kernel function for this feature space: K(x, x') = φ(x)^T φ(x') = x_1² x_1'² + x_2² x_2'² + 2 x_1 x_2 x_1' x_2' + 2 x_1 x_1' + 2 x_2 x_2' + 1 = (x_1 x_1' + x_2 x_2' + 1)² = (x^T x' + 1)². The computation of the linear separation in the higher dimensional space is performed implicitly in the original input space.

Kernel function example. A linear separator in the feature space corresponds to a non-linear separator in the input space.
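
A quick numeric check of the kernel identity above; the test vectors are arbitrary:

    import numpy as np

    # Verify phi(x)^T phi(x') = (x^T x' + 1)^2 for the quadratic feature map above.
    def phi(x):
        x1, x2 = x
        return np.array([x1**2, x2**2, np.sqrt(2)*x1*x2, np.sqrt(2)*x1, np.sqrt(2)*x2, 1.0])

    x  = np.array([0.7, -1.3])
    xp = np.array([2.0,  0.5])
    lhs = phi(x) @ phi(xp)          # inner product in the 6-dimensional feature space
    rhs = (x @ xp + 1.0) ** 2       # kernel evaluated in the original 2-dimensional space
    print(np.isclose(lhs, rhs))     # -> True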

Kernel functions. Linear kernel: K(x, x') = x^T x'. Polynomial kernel: K(x, x') = (1 + x^T x')^k. Radial basis kernel: K(x, x') = exp(-(1/2) ‖x - x'‖²).

Kernels. A kernel expresses the similarity between pairs of objects. Kernels can be defined for more complex objects: strings, graphs, images.
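
The three kernels written out as plain Python functions, following the formulas above:

    import numpy as np

    def linear_kernel(x, xp):
        return x @ xp                                    # K(x, x') = x^T x'

    def polynomial_kernel(x, xp, k=2):
        return (1.0 + x @ xp) ** k                       # K(x, x') = (1 + x^T x')^k

    def rbf_kernel(x, xp):
        return np.exp(-0.5 * np.sum((x - xp) ** 2))      # K(x, x') = exp(-1/2 ||x - x'||^2)

    x, xp = np.array([1.0, 2.0]), np.array([2.0, 0.5])
    print(linear_kernel(x, xp), polynomial_kernel(x, xp), rbf_kernel(x, xp))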