CS4495/6495 Introduction to Computer Vision. 8C-L3 Support Vector Machines

Discriminative classifiers
Discriminative classifiers find a division (a surface) in feature space that separates the classes. Several methods exist: nearest neighbors, boosting, and support vector machines.

Linear classifiers

Lines in R^2
Let w = [p, q]^T and x = [x, y]^T. Then a line is px + qy + b = 0, i.e., w · x + b = 0.

Lines in R^2
Distance from a point (x_0, y_0) to the line px + qy + b = 0:
D = (p x_0 + q y_0 + b) / sqrt(p^2 + q^2) = (w · x_0 + b) / ||w||

Linear classifiers
Find a linear function to separate the positive and negative examples:
x_i positive: x_i · w + b ≥ 0
x_i negative: x_i · w + b < 0
Which line is best?

Support Vector Machines (SVMs)
A discriminative classifier based on the optimal separating line (in the 2D case). Maximize the margin between the positive and negative training examples.

How to maximize the margin?
x_i positive (y_i = 1): x_i · w + b ≥ 1
x_i negative (y_i = -1): x_i · w + b ≤ -1
For support vectors, x_i · w + b = ±1.
Distance between a point and the line: |x_i · w + b| / ||w||
So for support vectors the distance is 1 / ||w||, and the margin is
M = 1/||w|| + 1/||w|| = 2/||w||.
C. Burges, 1998

Finding the maximum margin line
1. Maximize the margin 2/||w||.
2. Correctly classify all training data points:
   x_i positive (y_i = 1): x_i · w + b ≥ 1
   x_i negative (y_i = -1): x_i · w + b ≤ -1
3. Quadratic optimization problem:
   Minimize (1/2) w^T w
   subject to y_i (x_i · w + b) ≥ 1.
C. Burges, 1998

Finding the maximum margin line
Solution: w = Σ_i α_i y_i x_i, where the α_i are the learned weights and the x_i are the support vectors. The weights α_i are non-zero only at the support vectors. C. Burges, 1998

Finding the maximum margin line
Solution: w = Σ_i α_i y_i x_i, and b = y_i - w · x_i (for any support vector x_i).
Classification function:
f(x) = sign(w · x + b) = sign(Σ_i α_i y_i (x_i · x) + b)
If f(x) < 0, classify as negative; if f(x) > 0, classify as positive. Note that the function depends on the data only through dot products! C. Burges, 1998
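As a concrete sketch of this solution in Python (assuming scikit-learn is available; the toy points are made up for illustration): SVC exposes exactly the quantities above, with dual_coef_ holding the products α_i y_i and support_vectors_ holding the x_i, so f(x) can be evaluated from dot products alone.

    import numpy as np
    from sklearn.svm import SVC

    # Toy linearly separable data: two classes in 2D
    X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
                  [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])
    y = np.array([-1, -1, -1, 1, 1, 1])

    clf = SVC(kernel='linear', C=1e6)  # large C approximates a hard margin
    clf.fit(X, y)

    alpha_y = clf.dual_coef_.ravel()   # alpha_i * y_i for each support vector
    sv = clf.support_vectors_          # the support vectors x_i
    b = clf.intercept_[0]

    def f(x):
        # f(x) = sign(sum_i alpha_i y_i (x_i . x) + b) -- dot products only
        return np.sign(alpha_y @ (sv @ x) + b)

    print(f(np.array([2.0, 2.0])))  # -1.0, negative side
    print(f(np.array([7.0, 7.0])))  # +1.0, positive side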

Questions
What if the features are not 2D? What if the data is not linearly separable? What if we have more than just two categories?

Questions
What if the features are not 2D? The formulation generalizes to d dimensions: replace the line with a hyperplane.

Person detection with HoGs and linear SVMs
Map each grid cell in the input window to a histogram counting the gradients per orientation. Train a linear SVM using a training set of pedestrian vs. non-pedestrian windows. Dalal and Triggs, CVPR 2005. Code: http://pascal.inrialpes.fr/soft/olt/
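As an aside, OpenCV ships a pretrained version of this HoG-plus-linear-SVM pedestrian detector; a minimal sketch, where 'street.jpg' and the stride/scale values are illustrative assumptions rather than anything from the lecture:

    import cv2

    img = cv2.imread('street.jpg')  # hypothetical input image

    # HoG descriptor with OpenCV's pretrained Dalal-Triggs people detector
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

    # Slide the detection window over the image at multiple scales
    rects, weights = hog.detectMultiScale(img, winStride=(8, 8), scale=1.05)

    for (x, y, w, h) in rects:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)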

Questions
What if the features are not 2D? What if the data is not linearly separable? What if we have more than just two categories?

Non-linear SVMs
Datasets that are linearly separable with some noise work out great. (Figure: 1D data points on the x axis, separable by a threshold.)

Non-linear SVMs
But what are we going to do if the dataset is just too hard? (Figure: 1D data on the x axis that no single threshold separates.)

Non-linear SVMs
How about mapping the data to a higher-dimensional space? (Figure: the same 1D data mapped to (x, x^2), where a line separates the classes.)

Non-linear SVMs: Feature spaces
General idea: the original input space can be mapped to some higher-dimensional feature space Φ: x → φ(x) where the training set is easily separable. Andrew Moore

The kernel trick
We saw that the linear classifier relies on dot products between vectors. Define:
K(x_i, x_j) = x_i · x_j = x_i^T x_j

The kernel trick
If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes:
K(x_i, x_j) = φ(x_i)^T φ(x_j)
A kernel function is a similarity function that corresponds to an inner product in some expanded feature space.

E.g., 2D vectors x = [x_1, x_2]. Let K(x_i, x_j) = (1 + x_i^T x_j)^2.
We need to show that K(x_i, x_j) = φ(x_i)^T φ(x_j):
K(x_i, x_j) = (1 + x_i^T x_j)^2
            = 1 + x_i1^2 x_j1^2 + 2 x_i1 x_j1 x_i2 x_j2 + x_i2^2 x_j2^2 + 2 x_i1 x_j1 + 2 x_i2 x_j2
            = [1, x_i1^2, √2 x_i1 x_i2, x_i2^2, √2 x_i1, √2 x_i2]^T [1, x_j1^2, √2 x_j1 x_j2, x_j2^2, √2 x_j1, √2 x_j2]
            = φ(x_i)^T φ(x_j)
where φ(x) = [1, x_1^2, √2 x_1 x_2, x_2^2, √2 x_1, √2 x_2]. Andrew Moore
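A quick numeric check of this identity, as a NumPy sketch (the test vectors are arbitrary values chosen for illustration):

    import numpy as np

    def phi(x):
        # Explicit lifting for the degree-2 polynomial kernel above
        x1, x2 = x
        return np.array([1, x1**2, np.sqrt(2)*x1*x2, x2**2,
                         np.sqrt(2)*x1, np.sqrt(2)*x2])

    xi = np.array([0.5, -1.0])
    xj = np.array([2.0, 0.3])

    K = (1 + xi @ xj) ** 2        # kernel computed in the original 2D space
    K_lifted = phi(xi) @ phi(xj)  # dot product in the lifted 6D space

    print(np.isclose(K, K_lifted))  # True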

Nonlinear SVMs
The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K:
K(x_i, x_j) = φ(x_i)^T φ(x_j)
This gives a nonlinear decision boundary in the original feature space:
Σ_i α_i y_i (x_i^T x) + b  →  Σ_i α_i y_i K(x_i, x) + b

Examples of kernel functions
Linear: K(x_i, x_j) = x_i^T x_j
Number of dimensions: N (just the size of x).

Examples of kernel functions
Gaussian RBF: K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2))
Number of dimensions: infinite. In 1D,
exp(-(x - x')^2 / (2σ^2)) = Σ_{j=0}^∞ ((x x')^j / (σ^(2j) j!)) exp(-x^2 / (2σ^2)) exp(-x'^2 / (2σ^2))

Examples of kernel functions
Histogram intersection: K(x_i, x_j) = Σ_k min(x_i(k), x_j(k))
Number of dimensions: large. See Alexander C. Berg and Jitendra Malik, "Classification using intersection kernel support vector machines is efficient", CVPR 2008.
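For concreteness, minimal NumPy sketches of these three kernels, each computing the full kernel matrix between two sets of examples (rows are feature vectors; sigma is a free parameter):

    import numpy as np

    def linear_kernel(X, Y):
        # K[i, j] = x_i . y_j
        return X @ Y.T

    def rbf_kernel(X, Y, sigma=1.0):
        # K[i, j] = exp(-||x_i - y_j||^2 / (2 sigma^2))
        sq_dists = (np.sum(X**2, axis=1)[:, None]
                    + np.sum(Y**2, axis=1)[None, :]
                    - 2 * X @ Y.T)
        return np.exp(-sq_dists / (2 * sigma**2))

    def histogram_intersection_kernel(X, Y):
        # K[i, j] = sum_k min(x_i(k), y_j(k))
        return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)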

SVMs for recognition: Training
1. Define your representation.
2. Select a kernel function.
3. Compute pairwise kernel values between labeled examples.
4. Use this kernel matrix to solve for the SVM support vectors and weights.

SVMs for recognition: Prediction
To classify a new example:
1. Compute kernel values between the new input and the support vectors.
2. Apply the weights.
3. Check the sign of the output.
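A sketch of this training/prediction recipe using scikit-learn's precomputed-kernel interface, reusing the histogram_intersection_kernel defined above; Xtrain, ytrain, and Xtest are hypothetical feature matrices and labels:

    from sklearn.svm import SVC

    # Training: kernel matrix between all pairs of labeled examples
    K_train = histogram_intersection_kernel(Xtrain, Xtrain)
    clf = SVC(kernel='precomputed')
    clf.fit(K_train, ytrain)

    # Prediction: kernel values between new inputs and the training
    # examples; internally only the support-vector columns are used
    K_test = histogram_intersection_kernel(Xtest, Xtrain)
    ypred = clf.predict(K_test)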

Learning gender with SVMs
"Learning Gender with Support Faces" [Moghaddam and Yang, TPAMI 2002]

Face alignment processing
(Figure: input faces after alignment preprocessing; the resulting processed faces.)

Face gender classification by SVM
Training examples: 1044 males, 713 females. Images reduced to 21x12 pixels (!!). Experimented with various kernels and selected the Gaussian RBF:
K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2))

Support Faces

Moghaddam and Yang: Gender Classifier Performance

Gender perception experiment: How well can humans do?
Subjects: 30 people (22 male, 8 female), ages mid-20s to mid-40s.
Test data: 254 face images (60% males, 40% females), in low-res and high-res versions.
Task: classify as male or female, forced choice, no time limit.
Moghaddam and Yang, Face & Gesture 2000

Human performance
(Figure: human error rates at the two image resolutions.)

Careful how you do things?

Human vs. Machine
SVMs performed better than any single human test subject, at either resolution.

Hardest examples for humans
Moghaddam and Yang, Face & Gesture 2000

Questions
What if the features are not 2D? What if the data is not linearly separable? What if we have more than just two categories?

Multi-class SVMs
Combine a number of binary classifiers (see the sketch after this slide).
One vs. all:
  Training: learn an SVM for each class vs. the rest.
  Testing: apply each SVM to the test example and assign it the class of the SVM that returns the highest decision value.
One vs. one:
  Training: learn an SVM for each pair of classes.
  Testing: each learned SVM votes for a class to assign to the test example.
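A minimal scikit-learn sketch of both strategies (X, y, and Xtest are hypothetical multi-class training data and test inputs; the linear kernel is just an example choice):

    from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
    from sklearn.svm import SVC

    # One vs. all: one SVM per class vs. the rest; predict the class
    # whose SVM returns the highest decision value
    ova = OneVsRestClassifier(SVC(kernel='linear')).fit(X, y)

    # One vs. one: one SVM per pair of classes; the classes vote
    ovo = OneVsOneClassifier(SVC(kernel='linear')).fit(X, y)

    print(ova.predict(Xtest), ovo.predict(Xtest))

Note that scikit-learn's SVC already applies a one-vs-one scheme internally for multi-class labels; the wrappers just make the chosen strategy explicit.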

SVMs: Pros and cons
Pros:
- Many publicly available SVM packages: http://www.kernel-machines.org/software and http://www.csie.ntu.edu.tw/~cjlin/libsvm/
- The kernel-based framework is very powerful and flexible.
- Often a sparse solution: only the support vectors are needed, so the classifier is compact at test time.
- Works very well in practice, even with very small training sample sizes.
Adapted from Lana Lazebnik

SVMs: Pros and cons
Cons:
- No direct multi-class SVM; you must combine two-class SVMs.
- It can be tricky to select the best kernel function for a problem. (In practice, nobody writes their own SVMs.)
- Computation and memory: during training, you must compute a matrix of kernel values for every pair of examples, and learning can take a very long time for large-scale problems.
Adapted from Lana Lazebnik