Pattern Recognition 2014 Support Vector Machines


Pattern Recognition 2014: Support Vector Machines. Ad Feelders, Universiteit Utrecht.

Overview: 1. Separable Case, 2. Kernel Functions, 3. Allowing Errors (Soft Margin), 4. SVMs in R.

Linear classifier for two classes. Linear model

y(x) = w^T φ(x) + b (7.1)

with t_n ∈ {−1, +1}. Predict t_0 = +1 if y(x_0) ≥ 0 and t_0 = −1 otherwise. The decision boundary is given by y(x) = 0. This is a linear classifier in feature space φ(x).

Mapping φ. (Figure: the decision boundary y(x) = w^T φ(x) + b = 0.) φ maps x into a higher-dimensional space where the data is linearly separable.

Data linearly separable. Assume the training data is linearly separable in feature space, so there is at least one choice of w, b such that: 1. y(x_n) > 0 for t_n = +1; 2. y(x_n) < 0 for t_n = −1; that is, all training points are classified correctly. Putting 1. and 2. together: t_n y(x_n) > 0 for n = 1, ..., N.

Maximum Margin. There may be many solutions that separate the classes exactly. Which one gives the smallest prediction error? The SVM chooses the line with maximal margin, where the margin is the distance between the line and the closest data point.

Two-class training data (figure).

Many Linear Separators (figure).

Decision Boundary (figure).

Maximize Margin (figure).

Support Vectors (figure).

The weight vector is orthogonal to the decision boundary. Consider two points x_A and x_B, both of which lie on the decision surface. Because y(x_A) = y(x_B) = 0, we have (w^T x_A + b) − (w^T x_B + b) = w^T (x_A − x_B) = 0, and so the vector w is orthogonal to the decision surface.

Distance of a point to a line. (Figure: a point x at signed distance r from the line y(x) = w^T x + b = 0, with the weight vector w; axes x_1 and x_2.)

Distance to decision surface (case φ(x) = x). We have

x = x_⊥ + r · w/‖w‖ (4.6)

where w/‖w‖ is the unit vector in the direction of w, x_⊥ is the orthogonal projection of x onto the line y(x) = 0, and r is the (signed) distance of x to the line. Multiply (4.6) left and right by w^T and add b:

w^T x + b = w^T x_⊥ + b + r · w^T w/‖w‖

The left-hand side is y(x) and w^T x_⊥ + b = 0, so we get

r = y(x) ‖w‖ / ‖w‖² = y(x) / ‖w‖ (4.7)

Distance of a point to a line. The signed distance of x_n to the decision boundary is r = y(x_n)/‖w‖. For lines that separate the data perfectly, we have t_n y(x_n) = |y(x_n)|, so that the distance is given by

t_n y(x_n)/‖w‖ = t_n (w^T φ(x_n) + b)/‖w‖ (7.2)
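
As a quick illustration (not from the slides), the signed-distance formula can be evaluated in base R; w, b, x and t below are made-up values:

## signed distance of x to the hyperplane y(x) = w'x + b = 0, cf. (7.2)
w <- c(1, 2); b <- -3
x <- c(2, 2); t <- +1                 # class label in {-1, +1}
y <- sum(w * x) + b                   # y(x)
t * y / sqrt(sum(w^2))                # positive iff x is correctly classified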

Maximum margin solution. Solve

arg max_{w,b} { (1/‖w‖) min_n [ t_n (w^T φ(x_n) + b) ] }. (7.3)

Since ‖w‖ does not depend on n, the factor 1/‖w‖ can be moved outside of the minimization. A direct solution of this problem would be rather complex. A more convenient representation is possible.

Canonical Representation. The hyperplane (decision boundary) is defined by w^T φ(x) + b = 0. Then also κ(w^T φ(x) + b) = κw^T φ(x) + κb = 0, so rescaling w → κw and b → κb gives just another representation of the same decision boundary. Choose the scaling factor such that

t_i (w^T φ(x_i) + b) = 1 (7.4)

for the point x_i closest to the decision boundary.

Canonical Representation. (Figure: squares = class +1, circles = class −1, with the lines y(x) = 1, y(x) = 0 and y(x) = −1.)

Canonical Representation. In this case we have

t_n (w^T φ(x_n) + b) ≥ 1, n = 1, ..., N (7.5)

Quadratic program:

arg min_{w,b} (1/2)‖w‖² (7.6)

subject to the constraints (7.5). This optimization problem has a unique global minimum.

Lagrangian Function. Introduce Lagrange multipliers a_n ≥ 0 to get the Lagrangian function

L(w, b, a) = (1/2)‖w‖² − Σ_{n=1}^N a_n {t_n (w^T φ(x_n) + b) − 1} (7.7)

with

∂L(w, b, a)/∂w = w − Σ_{n=1}^N a_n t_n φ(x_n)

Lagrangian Function. And for b:

∂L(w, b, a)/∂b = −Σ_{n=1}^N a_n t_n

Equating the derivatives to zero yields the conditions

w = Σ_{n=1}^N a_n t_n φ(x_n) (7.8)

and

Σ_{n=1}^N a_n t_n = 0 (7.9)

Dual Representation. Eliminating w and b from L(w, b, a) gives the dual representation:

L(w, b, a) = (1/2)‖w‖² − Σ_{n=1}^N a_n {t_n (w^T φ(x_n) + b) − 1}
= (1/2)‖w‖² − Σ_{n=1}^N a_n t_n w^T φ(x_n) − b Σ_{n=1}^N a_n t_n + Σ_{n=1}^N a_n
= (1/2) Σ_{n=1}^N Σ_{m=1}^N a_n t_n a_m t_m φ(x_n)^T φ(x_m) − Σ_{n=1}^N Σ_{m=1}^N a_n t_n a_m t_m φ(x_n)^T φ(x_m) + Σ_{n=1}^N a_n
= Σ_{n=1}^N a_n − (1/2) Σ_{n=1}^N Σ_{m=1}^N a_n t_n a_m t_m φ(x_n)^T φ(x_m)

where the third line substitutes (7.8) for w and uses (7.9) to drop the term involving b.

Dual Representation. Maximize

L(a) = Σ_{n=1}^N a_n − (1/2) Σ_{n,m=1}^N a_n t_n a_m t_m φ(x_n)^T φ(x_m) (7.10)

with respect to a, subject to the constraints

a_n ≥ 0, n = 1, ..., N (7.11)
Σ_{n=1}^N a_n t_n = 0. (7.12)

Kernel Function. We map x to a high-dimensional space φ(x) in which the data is linearly separable. Performing computations in this high-dimensional space may be very expensive. Use a kernel function k that computes a dot product in this space (without performing the actual mapping):

k(x, x′) = φ(x)^T φ(x′)

Example: polynomial kernel. Suppose x ∈ ℝ³ and φ(x) ∈ ℝ¹⁰ with

φ(x) = (1, √2 x_1, √2 x_2, √2 x_3, x_1², x_2², x_3², √2 x_1 x_2, √2 x_1 x_3, √2 x_2 x_3)

Then

φ(x)^T φ(z) = 1 + 2x_1 z_1 + 2x_2 z_2 + 2x_3 z_3 + x_1² z_1² + x_2² z_2² + x_3² z_3² + 2x_1 x_2 z_1 z_2 + 2x_1 x_3 z_1 z_3 + 2x_2 x_3 z_2 z_3

But this can be written as (1 + x^T z)² = (1 + x_1 z_1 + x_2 z_2 + x_3 z_3)², which takes far fewer operations to compute.

Polynomial kernel: numeric example. Suppose x = (3, 2, 6) and z = (4, 1, 5). Then

φ(x) = (1, 3√2, 2√2, 6√2, 9, 4, 36, 6√2, 18√2, 12√2)
φ(z) = (1, 4√2, √2, 5√2, 16, 1, 25, 4√2, 20√2, 5√2)

Then φ(x)^T φ(z) = 1 + 24 + 4 + 60 + 144 + 4 + 900 + 48 + 720 + 120 = 2025. But

(1 + x^T z)² = (1 + (3)(4) + (2)(1) + (6)(5))² = 45² = 2025

is a more efficient way to compute this dot product.
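
This arithmetic is easy to check in base R; the explicit feature map phi below is written out only for this three-dimensional example:

## explicit feature map vs. kernel shortcut for the polynomial kernel
phi <- function(x) c(1, sqrt(2)*x[1], sqrt(2)*x[2], sqrt(2)*x[3],
                     x[1]^2, x[2]^2, x[3]^2,
                     sqrt(2)*x[1]*x[2], sqrt(2)*x[1]*x[3], sqrt(2)*x[2]*x[3])
x <- c(3, 2, 6); z <- c(4, 1, 5)
sum(phi(x) * phi(z))     # dot product in 10-dimensional feature space: 2025
(1 + sum(x * z))^2       # kernel shortcut (1 + x'z)^2:                 2025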

Kernels. Linear kernel: k(x, x′) = x^T x′. Two popular non-linear kernels are the polynomial kernel

k(x, x′) = (x^T x′ + c)^M

and the Gaussian (or radial) kernel

k(x, x′) = exp(−‖x − x′‖²/2σ²), (6.23)

or k(x, x′) = exp(−γ‖x − x′‖²), where γ = 1/(2σ²).
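
For reference, the three kernels written as plain R functions (a sketch; the defaults for c, M and gamma are arbitrary choices, not values from the lecture):

## kernel functions; x and xp are numeric vectors
k_linear <- function(x, xp) sum(x * xp)
k_poly   <- function(x, xp, c = 1, M = 2) (sum(x * xp) + c)^M
k_radial <- function(x, xp, gamma = 0.5) exp(-gamma * sum((x - xp)^2))
k_poly(c(3, 2, 6), c(4, 1, 5))   # 2025, matching the previous example
k_radial(c(0, 0), c(1, 1))       # exp(-1), since gamma = 0.5 means sigma^2 = 1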

Dual Representation with kernels. Using k(x, x′) = φ(x)^T φ(x′) we get the dual representation: maximize

L(a) = Σ_{n=1}^N a_n − (1/2) Σ_{n,m=1}^N a_n t_n a_m t_m k(x_n, x_m) (7.10)

with respect to a, subject to the constraints

a_n ≥ 0, n = 1, ..., N (7.11)
Σ_{n=1}^N a_n t_n = 0. (7.12)

Is this dual easier than the original problem?
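
It is a quadratic program in a, so a generic QP solver can handle it. A sketch using the quadprog package (not used in the lecture) on a small separable toy set; a tiny ridge is added because solve.QP requires a strictly positive definite matrix:

## hard-margin dual (7.10)-(7.12) with a linear kernel via quadprog
library(quadprog)
X <- rbind(c(1,3), c(3,1), c(3,6), c(4,4), c(6,5))   # toy data, two classes
t <- c(-1, -1, 1, 1, 1)
N <- nrow(X)
K <- X %*% t(X)                                      # linear kernel matrix
Q <- (t %o% t) * K + diag(1e-8, N)                   # Q[n,m] = t_n t_m k(x_n, x_m) + ridge
## minimize (1/2) a'Qa - 1'a  subject to  t'a = 0 (equality) and a >= 0
sol <- solve.QP(Dmat = Q, dvec = rep(1, N),
                Amat = cbind(t, diag(N)), bvec = rep(0, N + 1), meq = 1)
round(sol$solution, 3)   # roughly 0.125 0.125 0.000 0.250 0.000: three support vectors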

Prediction. Recall that y(x) = w^T φ(x) + b (7.1) and w = Σ_{n=1}^N a_n t_n φ(x_n) (7.8). Substituting (7.8) into (7.1), we get

y(x) = b + Σ_{n=1}^N a_n t_n k(x, x_n) (7.13)
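
Equation (7.13) translates directly into a small R function; svm_predict and its arguments are illustrative names for this sketch only:

## y(x) = b + sum_n a_n t_n k(x, x_n), cf. (7.13); linear kernel by default
svm_predict <- function(x, X, t, a, b, k = function(u, v) sum(u * v)) {
  b + sum(a * t * apply(X, 1, k, x))
}
## toy call: a single support vector at (1, 1) with a = 1, t = +1, b = 0
svm_predict(c(2, 3), X = rbind(c(1, 1)), t = 1, a = 1, b = 0)   # 5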

Prediction: support vectors. KKT conditions:

a_n ≥ 0 (7.14)
t_n y(x_n) − 1 ≥ 0 (7.15)
a_n {t_n y(x_n) − 1} = 0 (7.16)

From (7.16) it follows that for every data point, either 1. a_n = 0, or 2. t_n y(x_n) = 1. The former play no role in making predictions (see 7.13), and the latter are the support vectors that lie on the maximum margin hyperplanes. Only the support vectors play a role in predicting the class of new attribute vectors!

Prediction: computing b. Since for any support vector x_n we have t_n y(x_n) = 1, we can use (7.13) to get

t_n ( b + Σ_{m∈S} a_m t_m k(x_n, x_m) ) = 1, (7.17)

where S denotes the set of support vectors. Hence we have

t_n b + t_n Σ_{m∈S} a_m t_m k(x_n, x_m) = 1
t_n b = 1 − t_n Σ_{m∈S} a_m t_m k(x_n, x_m)
b = t_n − Σ_{m∈S} a_m t_m k(x_n, x_m) (7.17a)

since t_n ∈ {−1, +1} and so 1/t_n = t_n.

Prediction: computing b. A numerically more stable solution is obtained by averaging (7.17a) over all support vectors:

b = (1/N_S) Σ_{n∈S} ( t_n − Σ_{m∈S} a_m t_m k(x_n, x_m) ) (7.18)
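
A direct transcription of (7.18) into R (a sketch; the function name and arguments are illustrative, and only the support vectors are passed in):

## b = mean over support vectors of ( t_n - sum_m a_m t_m k(x_n, x_m) ), cf. (7.18)
svm_bias <- function(Xs, ts, a_s, k = function(u, v) sum(u * v)) {
  n <- nrow(Xs)
  K <- outer(1:n, 1:n, Vectorize(function(i, j) k(Xs[i, ], Xs[j, ])))  # kernel matrix on S
  mean(ts - K %*% (a_s * ts))
}
## toy call: one support vector at (4, 4) with t = +1, a = 0.25: b = 1 - 0.25*32 = -7
svm_bias(rbind(c(4, 4)), ts = 1, a_s = 0.25)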

Prediction: Example. We receive the following output from the optimization software for fitting a support vector machine with linear kernel and perfect separation of the training data:

n   x_n,1   x_n,2   t_n   a_n
1   2       2       −1    0
2   1       3       −1    1/8
3   3       1       −1    1/8
4   3       6       +1    0
5   4       4       +1    2/8
6   6       5       +1    0

Prediction: Example. The figure below is a plot of the same data set, where the dots represent points with class −1, and the crosses points with class +1. (Figure.)

Prediction: Example. (a) Compute the value of the SVM bias term b. Data points with a_n > 0 are support vectors. Let's take the point x_1 = 4, x_2 = 4 with class label +1:

b = t_m − Σ_{n=1}^N a_n t_n x_m^T x_n = 1 + (1/8)[4 4][1 3]^T + (1/8)[4 4][3 1]^T − (2/8)[4 4][4 4]^T = 1 + 2 + 2 − 8 = −3

(b) Which class does the SVM predict for the data point x_1 = 5, x_2 = 2?

y(x) = b + Σ_{n=1}^N a_n t_n x^T x_n = −3 − (1/8)[5 2][1 3]^T − (1/8)[5 2][3 1]^T + (2/8)[5 2][4 4]^T = −3 − 11/8 − 17/8 + 7 = 1/2

Since the sign is positive, we predict class +1.
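
Both computations can be replayed in base R from the table values (a sketch; the variable names are only for this check):

## worked example with a linear kernel
X <- rbind(c(2,2), c(1,3), c(3,1), c(3,6), c(4,4), c(6,5))
t <- c(-1, -1, -1, 1, 1, 1)
a <- c(0, 1/8, 1/8, 0, 2/8, 0)
S <- which(a > 0)                                    # support vectors: 2, 3, 5
b <- t[5] - sum(a[S] * t[S] * (X[S, ] %*% X[5, ]))   # (a) bias from x_5 = (4,4): -3
y <- b + sum(a[S] * t[S] * (X[S, ] %*% c(5, 2)))     # (b) y((5,2)) = 0.5 -> class +1
c(b = b, y = y)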

Prediction: Example. (Figure: decision boundary and support vectors.)

Allowing Errors. So far we assumed that the training data points are linearly separable in feature space φ(x). The resulting SVM gives exact separation of the training data in the original input space, with a non-linear decision boundary. Class distributions typically overlap, in which case exact separation of the training data leads to poor generalization (overfitting).

Allowing Errors. Data points are allowed to be on the wrong side of the margin boundary, but with a penalty that increases with the distance from that boundary. For convenience we make this penalty a linear function of the distance to the margin boundary. Introduce slack variables ξ_n ≥ 0, with one slack variable for each training data point.

Definition of Slack Variables. We define ξ_n = 0 for data points that are on or inside the correct margin boundary, and ξ_n = |t_n − y(x_n)| for all other data points. (Figure: points with ξ = 0, 0 < ξ < 1 and ξ > 1, shown relative to the lines y(x) = 1, y(x) = 0 and y(x) = −1.)

New Constraints. The exact classification constraints

t_n y(x_n) ≥ 1, n = 1, ..., N (7.5)

are replaced by

t_n y(x_n) ≥ 1 − ξ_n, n = 1, ..., N (7.20)

Check (7.20): ξ_n = 0 for data points that are on or inside the correct margin boundary; in that case y_n t_n ≥ 1 = 1 − ξ_n. Suppose t_n = +1 and x_n is on the wrong side of the margin boundary, i.e. y_n t_n < 1. Since y_n = y_n t_n, we have

ξ_n = |t_n − y_n| = 1 − y_n = 1 − y_n t_n

and therefore t_n y_n = 1 − ξ_n. The case t_n = −1 is analogous.

New objective function. Our goal is to maximize the margin while softly penalizing points that lie on the wrong side of the margin boundary. We therefore minimize

C Σ_{n=1}^N ξ_n + (1/2)‖w‖² (7.21)

where the parameter C > 0 controls the trade-off between the slack variable penalty and the margin. Alternative view (divide by C and put λ = 1/(2C)):

Σ_{n=1}^N ξ_n + λ‖w‖²

The first term represents lack of fit (hinge loss) and the second term takes care of regularization.
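
The objective (7.21) is easy to evaluate for a candidate (w, b); a base-R sketch with made-up values (C, w, b and the two data points are illustrative only):

## C * sum_n xi_n + (1/2)||w||^2, with xi_n = max(0, 1 - t_n y(x_n)) (hinge loss)
X <- rbind(c(1, 2), c(3, 1))
t <- c(1, -1)
w <- c(0.5, 0.5); b <- -2; C <- 10
y  <- as.vector(X %*% w + b)     # y(x_n) = w'x_n + b
xi <- pmax(0, 1 - t * y)         # slack variables
C * sum(xi) + 0.5 * sum(w^2)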

Optimization Problem. The Lagrangian is given by

L(w, b, a) = (1/2)‖w‖² + C Σ_{n=1}^N ξ_n − Σ_{n=1}^N a_n {t_n y(x_n) − 1 + ξ_n} − Σ_{n=1}^N µ_n ξ_n (7.22)

where a_n ≥ 0 and µ_n ≥ 0 are Lagrange multipliers. The KKT conditions are given by:

a_n ≥ 0 (7.23)
t_n y(x_n) − 1 + ξ_n ≥ 0 (7.24)
a_n (t_n y(x_n) − 1 + ξ_n) = 0 (7.25)
µ_n ≥ 0 (7.26)
ξ_n ≥ 0 (7.27)
µ_n ξ_n = 0 (7.28)

Dual. Take the derivative with respect to w, b and ξ_n and equate to zero:

∂L/∂w = 0 ⇒ w = Σ_{n=1}^N a_n t_n φ(x_n) (7.29)
∂L/∂b = 0 ⇒ Σ_{n=1}^N a_n t_n = 0 (7.30)
∂L/∂ξ_n = 0 ⇒ a_n = C − µ_n (7.31)

Dual. Using these to eliminate w, b and ξ_n from the Lagrangian, we obtain the dual Lagrangian: maximize

L(a) = Σ_{n=1}^N a_n − (1/2) Σ_{n,m=1}^N a_n t_n a_m t_m k(x_n, x_m) (7.32)

with respect to a, subject to the constraints

0 ≤ a_n ≤ C, n = 1, ..., N (7.33)
Σ_{n=1}^N a_n t_n = 0. (7.34)

Note: we have a_n ≤ C since µ_n ≥ 0 (7.26) and a_n = C − µ_n (7.31).

Prediction. Recall that y(x) = w^T φ(x) + b (7.1). Substituting w = Σ_{n=1}^N a_n t_n φ(x_n) (7.8) into (7.1), we get

y(x) = Σ_{n=1}^N a_n t_n k(x, x_n) + b (7.13)

with k(x, x_n) = φ(x)^T φ(x_n).

Interpretation of Solution. We distinguish two cases: points with a_n = 0 do not play a role in making predictions; points with a_n > 0 are called support vectors. It follows from the KKT condition a_n (t_n y(x_n) − 1 + ξ_n) = 0 (7.25) that for these points t_n y_n = 1 − ξ_n. Again we have two cases: If a_n < C then µ_n > 0, because a_n = C − µ_n. Since µ_n ξ_n = 0 (7.28), it follows that ξ_n = 0, and hence such points lie on the margin. Points with a_n = C can be on the margin or inside the margin, and can either be correctly classified if ξ_n ≤ 1 or misclassified if ξ_n > 1.

Computing the intercept. To compute the value of b, we use the fact that those support vectors with 0 < a_n < C have ξ_n = 0, so that t_n y(x_n) = 1, and as before we have

b = t_n − Σ_{m∈S} a_m t_m k(x_n, x_m) (7.17a)

Again a numerically more stable solution is obtained by averaging (7.17a) over all data points having 0 < a_n < C (this set is denoted M):

b = (1/N_M) Σ_{n∈M} ( t_n − Σ_{m∈S} a_m t_m k(x_n, x_m) ) (7.37)

Model Selection. As usual we are confronted with the problem of selecting the appropriate model complexity. The relevant parameters are C and any parameters of the chosen kernel function.
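
A common way to do this is a cross-validated grid search, for example with tune.svm from the e1071 package used in the R examples below. A sketch, assuming conn.dat has the columns sodium, co2 and a factor cause as in the code that follows:

## 10-fold CV grid search over gamma (radial kernel) and cost C
library(e1071)
conn.tune <- tune.svm(cause ~ sodium + co2, data = conn.dat,
                      gamma = 2^(-3:3), cost = 2^(-2:6))
summary(conn.tune)       # CV error for every (gamma, cost) combination
conn.tune$best.model     # SVM refitted with the best parameter pair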

How to in R.
> conn.svm.lin <- svm(cause ~ sodium + co2, data=conn.dat, kernel="linear")
> plot(conn.svm.lin, conn.dat)
> conn.svm.lin.predict <- predict(conn.svm.lin, conn.dat[,1:2])
> table(conn.dat[,3], conn.svm.lin.predict)
   conn.svm.lin.predict
      0  1
    0 17  3
    1  2  8

Conn's syndrome: linear kernel. (Figure: SVM classification plot of the two classes in the (co2, sodium) plane.)

How to in R.
> conn.svm.rad <- svm(cause ~ sodium + co2, data=conn.dat)
> plot(conn.svm.rad, conn.dat)
> conn.svm.rad.predict <- predict(conn.svm.rad, conn.dat[,1:2])
> table(conn.dat[,3], conn.svm.rad.predict)
   conn.svm.rad.predict
      0  1
    0 17  3
    1  2  8

Conn's syndrome: radial kernel, C = 1. (Figure: SVM classification plot of the two classes in the (co2, sodium) plane.)

How to in R.
> conn.svm.rad <- svm(cause ~ sodium + co2, data=conn.dat, cost=100)
> plot(conn.svm.rad, conn.dat)
> conn.svm.rad.predict <- predict(conn.svm.rad, conn.dat[,1:2])
> table(conn.dat[,3], conn.svm.rad.predict)
   conn.svm.rad.predict
      0  1
    0 19  1
    1  1  9

Conn's syndrome: radial kernel, C = 100. (Figure: SVM classification plot of the two classes in the (co2, sodium) plane.)

SVM in R. LIBSVM is available in package e1071 in R. It can also perform regression and non-binary classification. Non-binary classification is performed as follows: train K(K−1)/2 binary SVMs on all possible pairs of classes. To classify a new point, let it be classified by every binary SVM, and pick the class with the highest number of votes. This is done automatically by function svm in e1071.
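
A minimal illustration with a built-in three-class data set (iris, not from the lecture): svm fits the K(K−1)/2 = 3 pairwise classifiers internally and predict returns the majority-vote class:

## one-versus-one multiclass SVM in e1071
library(e1071)
iris.svm <- svm(Species ~ ., data = iris)              # radial kernel, C = 1 by default
table(iris$Species, predict(iris.svm, iris[, 1:4]))    # confusion matrix on training data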