Intro to Visual Recognition


CS 2770: Computer Vision. Intro to Visual Recognition. Prof. Adriana Kovashka, University of Pittsburgh, February 13, 2018

Plan for today: What is recognition? (a.k.a. classification, categorization.) Support vector machines: separable case / non-separable case; linear / non-linear (kernels). The importance of generalization. The bias-variance trade-off (applies to all classifiers).

Classification. Given a feature representation for images, how do we learn a model for distinguishing features from different classes? Decision boundary: Zebra vs. Non-zebra. Slide credit: L. Lazebnik

Classification. Assign an input vector to one of two or more classes. The input space is divided into decision regions separated by decision boundaries. Slide credit: L. Lazebnik

Examples of image classification. Two-class (binary): Cat vs. Dog. Adapted from D. Hoiem

Examples of image classification. Multi-class (often): Object recognition. Caltech 101 Average Object Images. Adapted from D. Hoiem

Examples of image classification. Fine-grained recognition. Visipedia Project. Slide credit: D. Hoiem

Examples of image classification. Place recognition. Places Database [Zhou et al. NIPS 2014]. Slide credit: D. Hoiem

Examples of image classification. Material recognition. [Bell et al. CVPR 2015]. Slide credit: D. Hoiem

Examples of image classification. Dating historical photos: 1940, 1953, 1966, 1977. [Palermo et al. ECCV 2012]. Slide credit: D. Hoiem

Examples of image classification. Image style recognition. [Karayev et al. BMVC 2014]. Slide credit: D. Hoiem

Recognition: A machine learning approach

The machine learning framework. Apply a prediction function to a feature representation of the image to get the desired output: f(image) = "apple", f(image) = "tomato", f(image) = "cow". Slide credit: L. Lazebnik

The machine learning framework. y = f(x): y is the output, f the prediction function, x the image or image feature. Training: given a training set of labeled examples {(x1, y1), ..., (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set. Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x). Slide credit: L. Lazebnik

The old-school way. Training: training images and training labels are converted into image features, and training produces a learned model. Testing: the test image is converted into image features, the learned model is applied, and a prediction comes out. Slide credit: D. Hoiem and L. Lazebnik

The simplest classifier. Given training examples from class 1, training examples from class 2, and a test example x: f(x) = label of the training example nearest to x. All we need is a distance function for our inputs. No training required! Slide credit: L. Lazebnik

K-Nearest Neighbors classification. For a new point, find the k closest points from the training data; the labels of those k points vote to classify. Example with black = negative, red = positive, k = 5: if the query lands here, the 5 nearest neighbors consist of 3 negatives and 2 positives, so we classify it as negative. Slide credit: D. Lowe
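A minimal NumPy sketch of this voting rule (the array names and the value of k are illustrative, not from the slides):

```python
import numpy as np

def knn_classify(query, train_X, train_y, k=5):
    """Classify `query` by a majority vote of its k nearest training points.

    train_X: (N, D) array of training features
    train_y: (N,) array of labels (e.g. -1 = negative, +1 = positive)
    """
    # Euclidean distance from the query to every training point
    dists = np.linalg.norm(train_X - query, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```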

im2gps: Estimating Geographic Information from a Single Image. James Hays and Alexei Efros, CVPR 2008. Where was this image taken? Nearest neighbors according to bag of SIFT + color histogram + a few others. Slide credit: James Hays

The Importance of Data. Slides: James Hays

Linear classifier. Find a linear function to separate the classes: f(x) = sgn(w1 x1 + w2 x2 + ... + wD xD) = sgn(w · x). Slide credit: L. Lazebnik

Linear classifier. Decision = sgn(w · x) = sgn(w1 x1 + w2 x2). What should the weights be?
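In code, the decision is just the sign of a dot product; a tiny sketch with made-up weights:

```python
import numpy as np

def linear_classify(x, w, b=0.0):
    # Decision = sgn(w . x + b): +1 on one side of the line, -1 on the other
    return 1 if np.dot(w, x) + b >= 0 else -1

# Example with made-up weights: classify the point (2, -1)
w = np.array([0.5, -1.0])
print(linear_classify(np.array([2.0, -1.0]), w))  # -> 1
```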

Lines in R². Let w = (a, c) and x = (x, y). Then the line ax + cy + b = 0 can be written as w · x + b = 0. The distance from a point (x0, y0) to this line is D = |a x0 + c y0 + b| / sqrt(a² + c²) = |w · x0 + b| / ||w||. Kristen Grauman
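A quick numeric check of the distance formula (the line and the point below are made up for illustration):

```python
import numpy as np

# Line 3x + 4y - 10 = 0, i.e. w = (3, 4), b = -10
w = np.array([3.0, 4.0])
b = -10.0
p = np.array([1.0, 1.0])  # the point (x0, y0)

# D = |w . p + b| / ||w||
D = abs(np.dot(w, p) + b) / np.linalg.norm(w)
print(D)  # |3 + 4 - 10| / 5 = 0.6
```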

Linear classifiers. Find a linear function to separate the positive and negative examples: positive (yi = 1): w · xi + b ≥ 0; negative (yi = -1): w · xi + b < 0. Which line is best? C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Support vector machines. Discriminative classifier based on an optimal separating line (for the 2-d case). Maximize the margin between the positive and negative training examples. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Support vector machines. Want the line that maximizes the margin. Constraints: positive (yi = 1): w · xi + b ≥ 1; negative (yi = -1): w · xi + b ≤ -1. For support vectors, w · xi + b = ±1. The distance between a point xi and the line is |w · xi + b| / ||w||, so for support vectors this distance is 1 / ||w||, and therefore the margin is M = 2 / ||w||. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Finding the maximum margin line. 1. Maximize the margin 2 / ||w||. 2. Correctly classify all training data points: positive (yi = 1): w · xi + b ≥ 1; negative (yi = -1): w · xi + b ≤ -1. Quadratic optimization problem: minimize (1/2) w · w subject to yi (w · xi + b) ≥ 1. One constraint for each training point; note the sign trick. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Finding the maximum margin line. Solution: w = Σi αi yi xi, where the αi are the learned weights and the xi with αi > 0 are the support vectors. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Finding the maximum margin line. Solution: w = Σi αi yi xi, and b = yi - w · xi (for any support vector). Classification function: f(x) = sgn(w · x + b) = sgn(Σi αi yi (xi · x) + b). Notice that it relies on an inner product between the test point x and the support vectors xi. (Solving the optimization problem also involves computing the inner products xi · xj between all pairs of training points.) If f(x) < 0, classify as negative; otherwise classify as positive. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
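As a minimal sketch, this classification function can be written directly from the alphas, support vectors, and bias (the variable names are placeholders; in practice a solver such as LIBSVM produces these values):

```python
import numpy as np

def svm_decision(x, support_vectors, alphas, labels, b):
    """f(x) = sgn( sum_i alpha_i * y_i * (x_i . x) + b )."""
    score = b
    for x_i, a_i, y_i in zip(support_vectors, alphas, labels):
        score += a_i * y_i * np.dot(x_i, x)
    # If f(x) < 0, classify as negative; otherwise positive
    return -1 if score < 0 else 1
```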

Inner product: f(x) = sgn(w · x + b) = sgn(Σi αi yi (xi · x) + b). Adapted from Milos Hauskrecht

Nonlinear SVMs. Datasets that are linearly separable work out great. But what if the dataset is just too hard? We can map it to a higher-dimensional space, e.g. x → (x, x²). Andrew Moore

Nonlinear SVMs. General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x). Andrew Moore

Nonlinear kernel: Example. Consider the mapping φ(x) = (x, x²). Then φ(x) · φ(y) = (x, x²) · (y, y²) = xy + x²y², so the corresponding kernel is K(x, y) = xy + x²y². Svetlana Lazebnik
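The equality φ(x) · φ(y) = K(x, y) in this example can be verified numerically; a small sketch (the test values are arbitrary):

```python
import numpy as np

def phi(x):
    # Lifting map from the example: x -> (x, x^2)
    return np.array([x, x**2])

def K(x, y):
    # Kernel computed directly in the original 1-d space
    return x * y + (x**2) * (y**2)

x, y = 3.0, -2.0
print(np.dot(phi(x), phi(y)))  # 30.0
print(K(x, y))                 # 30.0, same value without computing phi
```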

The Kernel Trick. The linear classifier relies on a dot product between vectors: K(xi, xj) = xi · xj. If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes K(xi, xj) = φ(xi) · φ(xj). A kernel function is a similarity function that corresponds to an inner product in some expanded feature space. The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that K(xi, xj) = φ(xi) · φ(xj). Andrew Moore

Examples of kernel functions. Linear: K(xi, xj) = xi · xj. Polynomials of degree up to d: K(xi, xj) = (xi · xj + 1)^d. Gaussian RBF: K(xi, xj) = exp(-||xi - xj||² / 2σ²). Histogram intersection: K(xi, xj) = Σk min(xi(k), xj(k)). Andrew Moore / Carlos Guestrin
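Straightforward NumPy versions of these four kernels, for reference (the degree d and bandwidth sigma defaults are arbitrary choices, not values from the slides):

```python
import numpy as np

def linear_kernel(xi, xj):
    return np.dot(xi, xj)

def poly_kernel(xi, xj, d=3):
    return (np.dot(xi, xj) + 1) ** d

def rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

def hist_intersection_kernel(xi, xj):
    # xi, xj are histograms (e.g. bag-of-features counts)
    return np.sum(np.minimum(xi, xj))
```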

Hard-margin SVMs. Find the w that minimizes (1/2) w · w (equivalently, maximizes the margin) subject to yi (w · xi + b) ≥ 1.

Soft-margin SVMs. Introduce a slack variable ξi for each of the n data samples and a misclassification cost C. Find the w that minimizes (1/2) w · w + C Σi ξi: maximize the margin while minimizing misclassification.

What about multi-class SVMs? Unfortunately, there is no definitive multi-class SVM formulation. In practice, we have to obtain a multi-class SVM by combining multiple two-class SVMs. One vs. others: Training: learn an SVM for each class vs. the others. Testing: apply each SVM to the test example and assign it to the class of the SVM that returns the highest decision value. One vs. one: Training: learn an SVM for each pair of classes. Testing: each learned SVM votes for a class to assign to the test example. Svetlana Lazebnik

Multi-class problems. One-vs-all (a.k.a. one-vs-others): train K classifiers; in each, pos = data from class i, neg = data from classes other than i. The class with the most confident prediction wins. Example: you have 4 classes, so train 4 classifiers. 1 vs others: score 3.5; 2 vs others: score 6.2; 3 vs others: score 1.4; 4 vs others: score 5.5. Final prediction: class 2.

Multi-class problems. One-vs-one (a.k.a. all-vs-all): train K(K-1)/2 binary classifiers (all pairs of classes); they all vote for the label. Example: you have 4 classes, so train 6 classifiers: 1 vs 2, 1 vs 3, 1 vs 4, 2 vs 3, 2 vs 4, 3 vs 4. Votes: 1, 1, 4, 2, 4, 4. Final prediction is class 4.
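Both decision rules are easy to sketch in code; here they are applied to the scores and votes from the two examples above:

```python
import numpy as np

# One-vs-all: pick the class whose classifier is most confident
ova_scores = {1: 3.5, 2: 6.2, 3: 1.4, 4: 5.5}
print(max(ova_scores, key=ova_scores.get))   # -> 2

# One-vs-one: each pairwise classifier votes; pick the most-voted class
ovo_votes = [1, 1, 4, 2, 4, 4]
labels, counts = np.unique(ovo_votes, return_counts=True)
print(labels[np.argmax(counts)])             # -> 4
```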

Using SVMs. 1. Define your representation for each example. 2. Select a kernel function. 3. Compute pairwise kernel values between labeled examples. 4. Use this kernel matrix to solve for the SVM support vectors and alpha weights. 5. To classify a new example: compute kernel values between the new input and the support vectors, apply the alpha weights, and check the sign of the output. Adapted from Kristen Grauman
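These five steps correspond to training an SVM with a precomputed kernel. A hedged sketch using scikit-learn (an assumption; the slides instead point to LIBSVM, LIBLINEAR, and SVMlight), with random histograms standing in for real image features:

```python
import numpy as np
from sklearn.svm import SVC

def hist_intersection(A, B):
    # Pairwise histogram-intersection kernel matrix between rows of A and B
    return np.array([[np.sum(np.minimum(a, b)) for b in B] for a in A])

# 1. Representations: random bag-of-features histograms as stand-ins
rng = np.random.default_rng(0)
X_train = rng.random((40, 20))
y_train = rng.integers(0, 2, 40)
X_test = rng.random((5, 20))

# 2-4. Choose a kernel, compute pairwise kernel values, fit the SVM
K_train = hist_intersection(X_train, X_train)
clf = SVC(kernel="precomputed").fit(K_train, y_train)

# 5. Classify new examples: kernel values between test inputs and training data
K_test = hist_intersection(X_test, X_train)
print(clf.predict(K_test))
```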

Example: Learning gender with SVMs. Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002. Moghaddam and Yang, Face & Gesture 2000. Kristen Grauman

Example: Learning gender with SVMs. Support faces. Kristen Grauman

Example: Learning gender with SVMs. SVMs performed better than humans, at either resolution. Kristen Grauman

Some SVM packages: LIBSVM http://www.csie.ntu.edu.tw/~cjlin/libsvm/ ; LIBLINEAR https://www.csie.ntu.edu.tw/~cjlin/liblinear/ ; SVM Light http://svmlight.joachims.org/

Linear classifiers vs. nearest neighbors. Linear pros: low-dimensional parametric representation; very fast at test time. Linear cons: can be tricky to select the best kernel function for a problem; learning can take a very long time for large-scale problems. NN pros: works for any number of classes; decision boundaries not necessarily linear; nonparametric method; simple to implement. NN cons: slow at test time (large search problem to find neighbors); storage of data; especially needs a good distance function (but true for all classifiers). Adapted from L. Lazebnik

Training vs. Testing. What do we want? High accuracy on training data? No: high accuracy on unseen/new/test data! Why is this tricky? Training data: features (x) and labels (y) are used to learn the mapping f. Test data: features (x) are used to make a prediction; labels (y) are only used to see how well we've learned f! Validation data: a held-out subset of the training data; we can use both features (x) and labels (y) to tune the parameters of the model we're learning.

Generalization. Training set (labels known), test set (labels unknown). How well does a learned model generalize from the data it was trained on to a new test set? Slide credit: L. Lazebnik

Components of generalization error. Noise in our observations: unavoidable. Bias: how much the average model over all training sets differs from the true model; caused by inaccurate assumptions/simplifications made by the model. Variance: how much models estimated from different training sets differ from each other. Underfitting: the model is too simple to represent all the relevant class characteristics; high bias and low variance; high training error and high test error. Overfitting: the model is too complex and fits irrelevant characteristics (noise) in the data; low bias and high variance; low training error and high test error. Slide credit: L. Lazebnik

Generalization. Models with too few parameters are inaccurate because of a large bias (not enough flexibility). Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample). In the figures: red dots = training data (all that we see before we ship off our model!), green curve = true underlying model, blue curve = our predicted model/fit, purple dots = possible test points. Adapted from D. Hoiem

Polynomial Curve Fitting. Slide credit: Chris Bishop

Sum-of-Squares Error Function. Slide credit: Chris Bishop

0th Order Polynomial. Slide credit: Chris Bishop

1st Order Polynomial. Slide credit: Chris Bishop

3rd Order Polynomial. Slide credit: Chris Bishop

9th Order Polynomial. Slide credit: Chris Bishop

Over-fitting. Root-Mean-Square (RMS) Error. Slide credit: Chris Bishop

Data Set Size: 9th Order Polynomial. Slide credit: Chris Bishop

Data Set Size: 9th Order Polynomial. Slide credit: Chris Bishop

Regularization. Penalize large coefficient values. (Remember: we want to minimize this expression.) Adapted from Chris Bishop
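A minimal sketch of this idea, assuming a ridge-style penalty λ‖w‖² added to the sum-of-squares error for polynomial fitting (the data, degree, and λ below are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)  # noisy targets

degree, lam = 9, 1e-3
# Design matrix with columns 1, x, x^2, ..., x^degree
Phi = np.vander(x, degree + 1, increasing=True)

# Minimize sum of squared errors + lam * ||w||^2  (closed-form solution)
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(degree + 1), Phi.T @ t)
print(w)  # coefficients are much smaller in magnitude than with lam = 0
```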

Regularization. Slide credit: Chris Bishop

Regularization. Slide credit: Chris Bishop

Polynomial Coefficients. Slide credit: Chris Bishop

Polynomial Coefficients: no regularization vs. huge regularization. Adapted from Chris Bishop

Regularization: training vs. test RMS error as a function of the regularization weight. Slide credit: Chris Bishop

Training vs. test error. As model complexity increases, training error keeps decreasing (toward low bias, high variance), while test error first falls and then rises again: underfitting on the left (high bias, low variance), overfitting on the right (low bias, high variance). Slide credit: D. Hoiem

The effect of training set size. Test error as a function of model complexity, shown for few training examples vs. many training examples (high bias / low variance at the left of the complexity axis, low bias / high variance at the right). Slide credit: D. Hoiem

Choosing the trade-off between bias and variance. We need a validation set (separate from the test set): compare validation error and training error as a function of model complexity (high bias / low variance at the left, low bias / high variance at the right). Slide credit: D. Hoiem
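A hedged sketch of using a held-out validation set to choose model complexity, reusing the regularized polynomial fit from the earlier sketch (the split sizes and candidate degrees are arbitrary):

```python
import numpy as np

def fit_poly(x, t, degree, lam=0.0):
    Phi = np.vander(x, degree + 1, increasing=True)
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(degree + 1), Phi.T @ t)

def rms_error(x, t, w):
    Phi = np.vander(x, len(w), increasing=True)
    return np.sqrt(np.mean((Phi @ w - t) ** 2))

rng = np.random.default_rng(2)
x = rng.random(30)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(30)

# Hold out part of the training data as a validation set (the test set stays untouched)
x_tr, t_tr, x_val, t_val = x[:20], t[:20], x[20:], t[20:]

# Pick the polynomial degree with the lowest validation error
errors = {d: rms_error(x_val, t_val, fit_poly(x_tr, t_tr, d)) for d in range(10)}
best_degree = min(errors, key=errors.get)
print(best_degree, errors[best_degree])
```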

Summary. Try simple classifiers first. Better to have smart features and simple classifiers than simple features and smart classifiers. Use increasingly powerful classifiers with more training data. As an additional technique for reducing variance, try regularizing the parameters. Slide credit: D. Hoiem