Lecture 10: Support Vector Machines (Oct 20, 2008)

Linear Separators
Which of the linear separators is optimal?

Concept of Margin
Recall that for the Perceptron, we learned that the convergence rate of the Perceptron algorithm depends on a concept called the margin.

Intuition of Margin
Consider points A, B, and C. We are quite confident in our prediction for A because it is far from the decision boundary (w·x + b = 0). In contrast, we are not so confident in our prediction for C, because a slight change in the decision boundary may flip the decision. Given a training set, we would like to make all of our predictions correct and confident! This can be captured by the concept of margin.
[figure: decision boundary w·x + b = 0 separating the region w·x + b > 0 from w·x + b < 0, with point A far from the boundary and points B and C close to it]

Functional Margin
One possible way to define the margin: y_i (w·x_i + b). We define this as the functional margin of the linear classifier w.r.t. training example (x_i, y_i). The larger the value, the better. Really? What if we rescale (w, b) by a factor α and consider the linear classifier specified by (αw, αb)? The decision boundary remains the same, yet the functional margin gets multiplied by α. So we can change the functional margin of a linear classifier without changing anything meaningful. We need something more meaningful.
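
To make the rescaling issue concrete, here is a minimal numpy sketch (the data, w, b, and the factor α are made up for illustration) that computes the functional margin y_i (w·x_i + b) and shows that scaling (w, b) by α multiplies the margin by α while leaving the predictions unchanged:

```python
import numpy as np

# Toy data: two points with labels +1 and -1 (made-up example).
X = np.array([[2.0, 1.0],
              [-1.0, -2.0]])
y = np.array([1.0, -1.0])

w = np.array([1.0, 1.0])
b = -0.5

def functional_margin(w, b, X, y):
    # y_i * (w . x_i + b) for every training example
    return y * (X @ w + b)

alpha = 10.0
print(functional_margin(w, b, X, y))                  # original functional margins
print(functional_margin(alpha * w, alpha * b, X, y))  # same margins multiplied by alpha
print(np.sign(X @ w + b))                             # predictions with (w, b)
print(np.sign(X @ (alpha * w) + alpha * b))           # identical predictions with (alpha*w, alpha*b)
```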

What we really want
We want the distances between the examples and the decision boundary to be large; this quantity is what we call the geometric margin. But how do we compute the geometric margin of a data point w.r.t. a particular line (parameterized by w and b)?
[figure: points A, B, and C relative to the decision boundary w·x + b = 0]

Some basic facts about lines
The line is the set of points x satisfying w·x + b = 0. The vector w is perpendicular (normal) to the line, and the distance from a point x to the line is |w·x + b| / ||w||.

Geometric Margin
The geometric margin of (w, b) w.r.t. x_i is the distance from x_i to the decision surface. This distance can be computed as

    γ_i = y_i (w·x_i + b) / ||w||

Given a training set S = {(x_i, y_i): i = 1, ..., N}, the geometric margin of the classifier w.r.t. S is

    γ = min_{i=1,...,N} γ_i

Note that the points closest to the boundary are called the support vectors; in fact these are the only points that really matter, and the other examples are ignorable.
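
As a concrete illustration, the sketch below (toy data, made up for this example) computes the per-example geometric margins and the margin of the classifier w.r.t. the whole set:

```python
import numpy as np

def geometric_margins(w, b, X, y):
    # gamma_i = y_i * (w . x_i + b) / ||w||
    return y * (X @ w + b) / np.linalg.norm(w)

# Toy linearly separable data (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 0.5], [-1.0, -1.5], [-2.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = np.array([1.0, 1.0]), 0.0

gammas = geometric_margins(w, b, X, y)
print(gammas)             # per-example geometric margins
print(gammas.min())       # geometric margin of the classifier w.r.t. the whole set S
print(np.argmin(gammas))  # index of a closest point, i.e. a candidate support vector
```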

What we have done so far
We have established that we want to find a linear decision boundary whose margin is the largest, and we know how to measure the margin of a linear decision boundary. Now what? We have a new learning objective: given a linearly separable (this will be relaxed later) training set S = {(x_i, y_i): i = 1, ..., N}, we would like to find a linear classifier (w, b) with maximum margin.

Maximum Margin Classifier
This can be represented as a constrained optimization problem:

    max_{w,b} γ   subject to:   y_i (w·x_i + b) / ||w|| ≥ γ,   i = 1, ..., N

This optimization problem is in a nasty form, so we need to do some rewriting. Let γ' = γ ||w||; we can rewrite the problem as

    max_{w,b} γ' / ||w||   subject to:   y_i (w·x_i + b) ≥ γ',   i = 1, ..., N

Maximum Margin Classifier
Note that we can arbitrarily rescale w and b to make the functional margin γ' large or small, so we can rescale them such that γ' = 1:

    max_{w,b} 1 / ||w||   subject to:   y_i (w·x_i + b) ≥ 1,   i = 1, ..., N

or, equivalently,

    min_{w,b} ||w||² / 2   subject to:   y_i (w·x_i + b) ≥ 1,   i = 1, ..., N

Maximizing the geometric margin is equivalent to minimizing the magnitude of w subject to maintaining a functional margin of at least 1.
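
The rescaling step can be checked numerically. The sketch below (again with made-up toy data and a made-up (w, b)) rescales (w, b) so that the smallest functional margin is exactly 1 and confirms that the geometric margin then equals 1/||w||:

```python
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 0.5], [-1.0, -1.5], [-2.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = np.array([1.0, 1.0]), 0.0

gamma_prime = np.min(y * (X @ w + b))        # smallest functional margin under (w, b)
w_s, b_s = w / gamma_prime, b / gamma_prime  # rescale so that it becomes exactly 1

print(np.min(y * (X @ w_s + b_s)))                               # 1.0
print(np.min(y * (X @ w_s + b_s)) / np.linalg.norm(w_s))         # geometric margin
print(1 / np.linalg.norm(w_s))                                   # same value: 1 / ||w||
```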

Solving the Optimization Problem

    min_{w,b} (1/2) ||w||²   subject to:   y_i (w·x_i + b) ≥ 1,   i = 1, ..., N

This is a quadratic optimization problem with linear inequality constraints, a well-known class of mathematical programming problems for which several (non-trivial) algorithms exist. In practice, we can just regard the QP solver as a black box without bothering about how it works. You will be spared the excruciating details and jump straight to the solution.
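
For concreteness, one way to hand this QP to an off-the-shelf solver is sketched below; it assumes the cvxpy package is available and is only an illustrative black box, not the specific solver used in the lecture. The helper name train_hard_margin_svm and the toy data are made up:

```python
import numpy as np
import cvxpy as cp

def train_hard_margin_svm(X, y):
    """Solve  min 1/2 ||w||^2  s.t.  y_i (w . x_i + b) >= 1  for all i."""
    n, d = X.shape
    w = cp.Variable(d)
    b = cp.Variable()
    objective = cp.Minimize(0.5 * cp.sum_squares(w))
    constraints = [cp.multiply(y, X @ w + b) >= 1]
    cp.Problem(objective, constraints).solve()
    return w.value, b.value

# Toy linearly separable data (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 0.5], [-1.0, -1.5], [-2.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_hard_margin_svm(X, y)
print(w, b)
```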

The Solution
We cannot give you a closed-form solution that you can directly plug the numbers into for an arbitrary data set. But the solution can always be written in the following form:

    w = Σ_{i=1}^{N} α_i y_i x_i,   s.t.   Σ_{i=1}^{N} α_i y_i = 0

This is the form of w; b can be calculated accordingly using some additional steps. The weight vector is a linear combination of all the training examples. Importantly, many of the α_i are zero. The points with non-zero α_i are the support vectors.
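
If a solver returns the dual variables α_i instead of (w, b), the slide's formula gives w directly, and b can be recovered from any support vector since y_i (w·x_i + b) = 1 there. A minimal numpy sketch, assuming an array alphas has already been produced by such a solver (the helper name recover_w_b is made up):

```python
import numpy as np

def recover_w_b(alphas, X, y, tol=1e-8):
    # w = sum_i alpha_i * y_i * x_i
    w = (alphas * y) @ X
    # Support vectors are the points with non-zero alpha_i.
    sv = alphas > tol
    # Each support vector satisfies y_i (w . x_i + b) = 1, i.e. b = y_i - w . x_i;
    # averaging over all support vectors is numerically more stable.
    b = np.mean(y[sv] - X[sv] @ w)
    return w, b, sv
```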

A Geometrical Interpretation
[figure: training points from Class 1 and Class 2 labeled with their α values; most points have α_i = 0 (α_2, α_3, α_4, α_5, α_7, α_9, α_10), while the support vectors have non-zero values: α_1 = 0.8, α_6 = 1.4, α_8 = 0.6]

A few important notes regarding the geometric interpretation
The line w·x + b = 0 gives the decision boundary; the positive support vectors lie on the line w·x + b = 1, and the negative support vectors lie on the line w·x + b = -1. We can therefore think of the decision boundary as a tube of a certain width, with no points allowed inside the tube. Learning involves adjusting the location and orientation of the tube to find the largest tube that fits the given training set.
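
A small sanity check on this picture: given a trained (w, b), the hypothetical helper below (a numpy sketch, not part of the lecture) flags the points lying exactly on the tube boundaries w·x + b = ±1 (the support vectors), flags any points inside the tube (none should exist for a hard-margin solution), and computes the tube's width 2/||w||:

```python
import numpy as np

def margin_report(w, b, X, y, tol=1e-6):
    scores = y * (X @ w + b)
    on_boundary = np.abs(scores - 1.0) < tol  # support vectors: y_i (w . x_i + b) = 1
    inside_tube = scores < 1.0 - tol          # would violate the hard-margin constraints
    width = 2.0 / np.linalg.norm(w)           # distance between w.x+b = 1 and w.x+b = -1
    return on_boundary, inside_tube, width
```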