Linear Classification, SVMs and Nearest Neighbors

CSE 473 Lecture 25 (Chapter 18)
Linear Classification, SVMs and Nearest Neighbors
CSE AI faculty + Chris Bishop, Dan Klein, Stuart Russell, Andrew Moore

Motivation: Face Detection
How do we build a classifier to distinguish between faces and other objects?

Binary Classification: Example
Faces (class C1) vs. non-faces (class C2), plotted against Feature 1 and Feature 2. How do we classify new data points?

Binary Classification: Linear Classifiers
Find a line (in general, a hyperplane) separating the two sets of data points: g(x) = w·x + b = 0, i.e., w1 x1 + w2 x2 + b = 0. The boundary g(x) = 0 divides the space into a region where g(x) > 0 and a region where g(x) < 0. For any new point x, choose class C1 if g(x) > 0 and class C2 otherwise.
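As a concrete illustration (not from the slides), here is a minimal Python sketch of this decision rule, assuming the weight vector w and bias b have already been learned; the parameter values below are made up for demonstration.

import numpy as np

def linear_classify(x, w, b):
    """Assign class C1 if g(x) = w·x + b > 0, otherwise class C2."""
    g = np.dot(w, x) + b
    return "C1" if g > 0 else "C2"

# Hypothetical learned parameters for a 2-D feature space.
w = np.array([1.5, -0.7])
b = -0.2

print(linear_classify(np.array([2.0, 1.0]), w, b))  # g = 2.1 > 0  -> C1
print(linear_classify(np.array([0.0, 2.0]), w, b))  # g = -1.6 < 0 -> C2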

Separating Hyperplane
A hyperplane w·x + b = 0 separates Class 1 (points with output +1) from Class 2 (points with output -1). We need to choose w and b based on the training data.

Separating Hyperplanes
Different choices of w and b give different separating hyperplanes. (This and the next few slides are adapted from Andrew Moore's.)

Which hyperplane is best?

How about the one right in the middle?
Intuitively, this boundary seems good: it avoids misclassification of new test points if they are generated from the same distribution as the training points.

Margin
Define the margin of a linear classifier as the width by which the boundary could be increased before hitting a data point.

Maximum Margin and Support Vector Machines
Support vectors are the data points that the margin pushes up against. The maximum margin classifier is called a Support Vector Machine (in this case, a Linear SVM or LSVM).
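To make the margin concrete, here is a small sketch (not from the slides) computing the perpendicular distance |w·x + b| / ||w|| from a point to the decision boundary; the hyperplane and points are made up for illustration.

import numpy as np

def distance_to_hyperplane(x, w, b):
    """Perpendicular distance from point x to the hyperplane w·x + b = 0."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

# Hypothetical hyperplane and points.
w, b = np.array([2.0, 1.0]), -1.0
for x in [np.array([1.0, 1.0]), np.array([0.0, 0.5])]:
    print(x, distance_to_hyperplane(x, w, b))

# The margin of the classifier is set by the training point closest to the boundary.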

Why Maximum Margin?
It is robust to small perturbations of data points near the boundary, there is theory showing it is best for generalization to new points, and it works great empirically.

Finding the Maximum Margin (For Math Lovers' Eyes Only)
One can show that we need to maximize 2/||w|| subject to y_i (x_i·w + b) ≥ 1 for all i. This constrained optimization problem leads to

  w = Σ_i a_i y_i x_i

where the a_i are obtained by maximizing

  Σ_i a_i - (1/2) Σ_{i,j} a_i a_j y_i y_j (x_i·x_j)   subject to a_i ≥ 0 and Σ_i a_i y_i = 0.

This is a quadratic programming (QP) problem, so a global maximum can always be found, and it depends only on dot products of the inputs. (Interested in more details? See Burges' SVM tutorial online.)
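As a hedged sketch of what solving this QP looks like in practice (assuming scikit-learn is available; the toy data is made up), one can fit a linear SVM with a very large C to approximate the hard-margin solution and read off the support vectors and the dual coefficients a_i y_i:

import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (invented for illustration).
X = np.array([[1.0, 1.0], [2.0, 2.5], [0.5, 2.0],
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # large C approximates the hard margin
clf.fit(X, y)

print("w =", clf.coef_[0], "b =", clf.intercept_[0])
print("support vectors:\n", clf.support_vectors_)
print("dual coefficients a_i * y_i:", clf.dual_coef_[0])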

What if the data is not linearly separable?
One common cause is outliers (due to noise).

Soft Margin SVMs
Allow errors ξ_i (deviations from the margin) and trade off margin width against errors. Minimize

  (1/2) ||w||² + C Σ_i ξ_i

subject to  y_i (w·x_i + b) ≥ 1 - ξ_i  and  ξ_i ≥ 0  for all i.
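A brief, hedged illustration of this trade-off (again assuming scikit-learn; the noisy toy data is invented): a small C tolerates more margin violations, while a large C penalizes them heavily.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two noisy, slightly overlapping blobs (made up for illustration).
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(2.5, 1.0, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {len(clf.support_)} support vectors, "
          f"training accuracy {clf.score(X, y):.2f}")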

Another Example
Here the data is not linearly separable.

Handling non-linearly separable data
Idea: map the original input space to a higher-dimensional feature space, x → φ(x), and use a linear classifier in the higher-dimensional space.
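A hedged sketch of the idea (the 1-D example below is invented for illustration): data that no single threshold can separate on a line becomes linearly separable after a simple nonlinear map.

import numpy as np

# 1-D data: class +1 at the extremes, class -1 in the middle (no threshold separates them).
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([1, -1, -1, 1])

# Map to the 2-D feature space phi(x) = (x, x^2); a horizontal line now separates the classes.
phi = np.column_stack([x, x ** 2])
print(phi)
# In phi-space the linear rule "x^2 > 2.5" classifies every point correctly.
print(np.where(phi[:, 1] > 2.5, 1, -1))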

Problem: High-dimensional spaces
Computation in the high-dimensional feature space is costly, and the high-dimensional projection function φ(x) may be too complicated to compute. Kernel trick to the rescue!

The Kernel Trick
Recall that the SVM maximizes the quadratic function

  Σ_i a_i - (1/2) Σ_{i,j} a_i a_j y_i y_j (x_i·x_j)   subject to a_i ≥ 0 and Σ_i a_i y_i = 0.

Insight: the data points only appear as dot products, so there is no need to compute the high-dimensional φ(x) explicitly. Just replace the inner product x_i·x_j with a kernel function K(x_i, x_j) = φ(x_i)·φ(x_j). E.g., the Gaussian kernel K(x_i, x_j) = exp(-||x_i - x_j||² / 2σ²), or the polynomial kernel K(x_i, x_j) = (x_i·x_j + 1)^d.
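A minimal, hedged sketch of the two kernels mentioned above (the parameter values and test points are illustrative):

import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma ** 2))

def polynomial_kernel(x, z, d=2):
    """K(x, z) = (x·z + 1)^d."""
    return (np.dot(x, z) + 1) ** d

x, z = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(gaussian_kernel(x, z), polynomial_kernel(x, z))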

Example of the Kernel Trick
Suppose φ(·) is given as follows: [feature map shown on the slide]. The dot product in the feature space is [expanded on the slide]. So, if we define the kernel function to equal this dot product, there is no need to compute φ(·) explicitly. Using a kernel function to avoid computing φ(·) explicitly is known as the kernel trick.

Face Detection using SVMs
Kernel used: polynomial of degree 2 (Osuna, Freund and Girosi, 1998).
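As a hedged illustration of the identity behind the kernel trick (a standard textbook example, not necessarily the exact φ from the slide): for 2-D inputs, the degree-2 polynomial kernel (x·z + 1)² equals the dot product of explicit 6-dimensional feature vectors, so the kernel evaluates the high-dimensional inner product without ever forming φ.

import numpy as np

def phi(x):
    """Explicit feature map whose inner product matches (x·z + 1)^2 for 2-D inputs."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2, 1.0])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
lhs = (np.dot(x, z) + 1) ** 2   # kernel evaluation in the original input space
rhs = np.dot(phi(x), phi(z))    # dot product in the 6-D feature space
print(lhs, rhs)                 # both equal 4.0 here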

Support Vectors
[The support vectors found by the classifier are shown on the slide.]

K-Nearest Neighbors
Idea: do as your neighbors do! Classify a new data point according to a majority vote of its k nearest neighbors. How do you measure "near"? For discrete x (e.g., strings), use the Hamming distance: d(x1, x2) = the number of features on which x1 and x2 differ. For continuous x (e.g., images), use the Euclidean distance: d(x1, x2) = ||x1 - x2||, the square root of the sum of squared differences between corresponding elements of the data vectors.
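A minimal, hedged sketch of k-nearest-neighbor classification with Euclidean distance (the training points are made up; a Hamming distance could be substituted for discrete features):

import numpy as np
from collections import Counter

def knn_classify(query, X, labels, k=4):
    """Majority vote among the k training points closest to the query (Euclidean distance)."""
    dists = np.linalg.norm(X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical 2-D training data for classes C1 and C2.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
              [5.0, 5.0], [6.0, 5.5], [5.5, 6.0]])
labels = np.array(["C1", "C1", "C1", "C2", "C2", "C2"])

# 3 of the 4 nearest neighbors of (2, 2) are in C1, so the vote returns C1.
print(knn_classify(np.array([2.0, 2.0]), X, labels, k=4))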

Example
Input data: 2-D points (x1, x2) with two classes, C1 and C2. Given a new data point "+" and K = 4, look at its 4 nearest neighbors: 3 are in C1, so classify "+" as C1.

K-NN produces a nonlinear decision boundary
Some points near the boundary may be misclassified (but that is perhaps okay because of noise).

Next Time
Regression (learning functions with continuous outputs): Linear Regression, Neural Networks.

To Do: Project 4; read Chapter 18.