CS 2770: Computer Vision. Intro to Visual Recognition. Prof. Adriana Kovashka, University of Pittsburgh. February 13, 2018
Plan for today: What is recognition? (a.k.a. classification, categorization) Support vector machines: separable case / non-separable case, linear / non-linear (kernels). The importance of generalization. The bias-variance trade-off (applies to all classifiers)
Classification: Given a feature representation for images, how do we learn a model for distinguishing features from different classes? Decision boundary. Zebra / Non-zebra. Slide credit: L. Lazebnik
Classification: Assign an input vector to one of two or more classes. The input space is divided into decision regions separated by decision boundaries. Slide credit: L. Lazebnik
Examples of image classification: Two-class (binary): Cat vs. Dog. Adapted from D. Hoiem
Examples of image classification: Multi-class (often): Object recognition. Caltech 101 Average Object Images. Adapted from D. Hoiem
Examples of image classification: Fine-grained recognition. Visipedia Project. Slide credit: D. Hoiem
Examples of image classification: Place recognition. Places Database [Zhou et al. NIPS 2014]. Slide credit: D. Hoiem
Examples of image classification: Material recognition. [Bell et al. CVPR 2015]. Slide credit: D. Hoiem
Examples of image classification: Dating historical photos. 1940, 1953, 1966, 1977. [Palermo et al. ECCV 2012]. Slide credit: D. Hoiem
Examples of image classification: Image style recognition. [Karayev et al. BMVC 2014]. Slide credit: D. Hoiem
Recognition: A machine learning approach
The machine learning framework: Apply a prediction function to a feature representation of the image to get the desired output: f(image) = "apple", f(image) = "tomato", f(image) = "cow". Slide credit: L. Lazebnik
The machine learning framework: y = f(x), where y is the output, f the prediction function, and x the image (or image feature). Training: given a training set of labeled examples {(x1, y1), ..., (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set. Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x). Slide credit: L. Lazebnik
The old-school way. Training: Training Images + Training Labels → Image Features → Training → Learned model. Testing: Test Image → Image Features → Learned model → Prediction. Slide credit: D. Hoiem and L. Lazebnik
The simplest classifier: Training examples from class 1, training examples from class 2, and a test example x. f(x) = label of the training example nearest to x. All we need is a distance function for our inputs. No training required! Slide credit: L. Lazebnik
K-Nearest Neighbors classification: For a new point, find the k closest points from the training data; the labels of those k points vote to classify. Black = negative, red = positive. Example with k = 5: if the query lands here, the 5 nearest neighbors consist of 3 negatives and 2 positives, so we classify it as negative. Slide credit: D. Lowe
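The voting rule above is easy to sketch in code. A minimal NumPy version (the toy data, labels, and choice of k below are illustrative, not from the slides):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_query, k=5):
    """Classify x_query by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                    # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: negatives cluster near the origin, positives near (5, 5)
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])
```

Note that, as the slide says, the only design decision here is the distance function; no training happens at all.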
im2gps: Estimating Geographic Information from a Single Image. James Hays and Alexei Efros, CVPR 2008. Where was this image taken? Nearest neighbors according to bag of SIFT + color histogram + a few others. Slide credit: James Hays
The Importance of Data. Slides: James Hays
Linear classifier: Find a linear function to separate the classes: f(x) = sgn(w1 x1 + w2 x2 + ... + wD xD) = sgn(w · x). Slide credit: L. Lazebnik
Linear classifier: Decision = sgn(w^T x) = sgn(w1*x1 + w2*x2). [Plot: x2 vs. x1 axes through the origin (0, 0).] What should the weights be?
Lines in R2: Let w = [a, c] and x = [x, y]. Then the line ax + cy + b = 0 can be written as w · x + b = 0. Kristen Grauman
Lines in R2: The distance from a point (x0, y0) to the line w · x + b = 0 is D = |a x0 + c y0 + b| / sqrt(a^2 + c^2) = |w · x0 + b| / ||w||. Kristen Grauman
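The point-to-line distance formula on this slide can be checked numerically; a tiny sketch (the example line and point are made up):

```python
import numpy as np

def point_line_distance(w, b, p):
    """Distance from point p to the line w . x + b = 0, i.e. |w . p + b| / ||w||."""
    return abs(np.dot(w, p) + b) / np.linalg.norm(w)

# Line x + y - 1 = 0, i.e. w = [1, 1], b = -1; the origin is 1/sqrt(2) away.
d = point_line_distance(np.array([1.0, 1.0]), -1.0, np.array([0.0, 0.0]))
```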
Linear classifiers: Find a linear function to separate positive and negative examples: xi positive: w · xi + b ≥ 0; xi negative: w · xi + b < 0. Which line is best? C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Support vector machines: Discriminative classifier based on the optimal separating line (for the 2D case). Maximize the margin between the positive and negative training examples. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Support vector machines: Want the line that maximizes the margin. xi positive (yi = 1): w · xi + b ≥ 1; xi negative (yi = -1): w · xi + b ≤ -1. For support vectors, w · xi + b = ±1. Support vectors; Margin. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Support vector machines: Distance between a point xi and the line: |w · xi + b| / ||w||. For support vectors, (w · xi + b) / ||w|| = ±1 / ||w||, so the margin is M = 2 / ||w||. C. Burges, 1998
Support vector machines: Therefore, the margin is 2 / ||w||. C. Burges, 1998
Finding the maximum margin line: 1. Maximize the margin 2 / ||w||. 2. Correctly classify all training data points: xi positive (yi = 1): w · xi + b ≥ 1; xi negative (yi = -1): w · xi + b ≤ -1. Quadratic optimization problem: minimize (1/2) w^T w subject to yi(w · xi + b) ≥ 1. One constraint for each training point; note the sign trick that folds both cases into one constraint. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Finding the maximum margin line: Solution: w = Σi αi yi xi, a weighted combination of the support vectors xi with learned weights αi. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Finding the maximum margin line: Solution: w = Σi αi yi xi, and b = yi - w · xi (for any support vector). Classification function: f(x) = sgn(w · x + b) = sgn(Σi αi yi xi · x + b). Notice that it relies on an inner product between the test point x and the support vectors. (Solving the optimization problem also involves computing the inner products xi · xj between all pairs of training points.) If f(x) < 0, classify as negative; otherwise classify as positive. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
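The dual-form decision rule above can be sketched directly in code; the support vectors, alphas, and b below are hypothetical values, as if a solver had already been run (they are chosen so the effective w is [1, 0]):

```python
import numpy as np

def svm_decision(x, support_vectors, alphas, labels, b):
    """f(x) = sgn(sum_i alpha_i * y_i * <x_i, x> + b): the decision
    depends on x only through inner products with the support vectors."""
    score = sum(a * y * np.dot(sv, x)
                for a, y, sv in zip(alphas, labels, support_vectors)) + b
    return 1 if score >= 0 else -1

# Hypothetical solved SVM: support vectors at (1, 0) and (-1, 0)
svs = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
alphas, labels, b = [0.5, 0.5], [1, -1], 0.0
```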
Inner product: f(x) = sgn(w · x + b) = sgn(Σi αi yi xi · x + b). Adapted from Milos Hauskrecht
Nonlinear SVMs: Datasets that are linearly separable work out great. But what if the dataset is just too hard? We can map it to a higher-dimensional space, e.g. x → (x, x²). Andrew Moore
Nonlinear SVMs: General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x). Andrew Moore
Nonlinear kernel: Example. Consider the mapping φ(x) = (x, x²). Then φ(x) · φ(y) = (x, x²) · (y, y²) = xy + x²y², so the kernel K(x, y) = xy + x²y² computes the same inner product without explicitly lifting the points. Svetlana Lazebnik
The Kernel Trick: The linear classifier relies on dot products between vectors: K(xi, xj) = xi · xj. If every data point is mapped into a high-dimensional space via some transformation Φ: xi → φ(xi), the dot product becomes K(xi, xj) = φ(xi) · φ(xj). A kernel function is a similarity function that corresponds to an inner product in some expanded feature space. The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that K(xi, xj) = φ(xi) · φ(xj). Andrew Moore
Examples of kernel functions: Linear: K(xi, xj) = xi^T xj. Polynomial of degree up to d: K(xi, xj) = (xi^T xj + 1)^d. Gaussian RBF: K(xi, xj) = exp(-||xi - xj||² / (2σ²)). Histogram intersection: K(xi, xj) = Σk min(xi(k), xj(k)). Andrew Moore / Carlos Guestrin
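Each of these four kernels is a one-liner; a NumPy sketch for vectors x and z (the σ and d defaults are illustrative):

```python
import numpy as np

def linear_kernel(x, z):
    return np.dot(x, z)

def poly_kernel(x, z, d=2):
    # polynomial of degree up to d
    return (np.dot(x, z) + 1) ** d

def rbf_kernel(x, z, sigma=1.0):
    # Gaussian RBF: exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma ** 2))

def hist_intersection_kernel(x, z):
    # sum over bins of min(x(k), z(k))
    return np.minimum(x, z).sum()
```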
Hard-margin SVMs: The w that minimizes (1/2)||w||², subject to yi(w · xi + b) ≥ 1, maximizes the margin.
Soft-margin SVMs: minimize (1/2)||w||² + C Σi ξi subject to yi(w · xi + b) ≥ 1 - ξi, ξi ≥ 0, where C is the misclassification cost, the sum runs over the data samples, and the ξi are slack variables. The first term maximizes the margin; the second minimizes misclassification.
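The slides state the soft-margin objective but not a solver. One simple (non-authoritative) way to minimize the equivalent hinge-loss form, (1/2)||w||² + C Σi max(0, 1 - yi(w · xi + b)), is subgradient descent; the toy data, learning rate, and epoch count below are illustrative:

```python
import numpy as np

def train_soft_margin_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Subgradient descent on 0.5 ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b)).
    The hinge term plays the role of the slack-variable penalty."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                                # points inside the margin
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Linearly separable toy data
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_soft_margin_svm(X, y)
```

In practice one would use a dedicated package (see the SVM packages slide below) rather than this sketch.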
What about multi-class SVMs? Unfortunately, there is no definitive multi-class SVM formulation. In practice, we obtain a multi-class SVM by combining multiple two-class SVMs. One vs. others: Training: learn an SVM for each class vs. the others. Testing: apply each SVM to the test example, and assign it to the class of the SVM that returns the highest decision value. One vs. one: Training: learn an SVM for each pair of classes. Testing: each learned SVM votes for a class to assign to the test example. Svetlana Lazebnik
Multi-class problems: One-vs-all (a.k.a. one-vs-others). Train K classifiers; in each, pos = data from class i, neg = data from classes other than i. The class with the most confident prediction wins. Example: you have 4 classes, so train 4 classifiers. 1 vs. others: score 3.5; 2 vs. others: score 6.2; 3 vs. others: score 1.4; 4 vs. others: score 5.5. Final prediction: class 2.
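The one-vs-all rule above is just an argmax over decision values; the scores below are copied from the slide's 4-class example:

```python
# One-vs-all: run all K classifiers and pick the most confident one.
scores = {1: 3.5, 2: 6.2, 3: 1.4, 4: 5.5}   # decision values per class
predicted_class = max(scores, key=scores.get)
```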
Multi-class problems: One-vs-one (a.k.a. all-vs-all). Train K(K-1)/2 binary classifiers (all pairs of classes); they all vote for the label. Example: you have 4 classes, so train 6 classifiers: 1 vs 2, 1 vs 3, 1 vs 4, 2 vs 3, 2 vs 4, 3 vs 4. Votes: 1, 1, 4, 2, 4, 4. Final prediction is class 4.
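And the one-vs-one vote count from the slide, as a majority vote over pairwise winners:

```python
from collections import Counter

def one_vs_one_predict(votes):
    """Each pairwise classifier votes for one class; the most-voted class wins."""
    return Counter(votes).most_common(1)[0][0]

# Winners of the slide's six pairwise classifiers: 1v2, 1v3, 1v4, 2v3, 2v4, 3v4
pairwise_votes = [1, 1, 4, 2, 4, 4]
```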
Using SVMs: 1. Define your representation for each example. 2. Select a kernel function. 3. Compute pairwise kernel values between labeled examples. 4. Use this kernel matrix to solve for the SVM support vectors and alpha weights. 5. To classify a new example: compute kernel values between the new input and the support vectors, apply the alpha weights, and check the sign of the output. Adapted from Kristen Grauman
Example: Learning gender with SVMs. Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002. Moghaddam and Yang, Face & Gesture 2000. Kristen Grauman
Example: Learning gender with SVMs. Support faces. Kristen Grauman
Example: Learning gender with SVMs. SVMs performed better than humans, at either resolution. Kristen Grauman
Some SVM packages: LIBSVM http://www.csie.ntu.edu.tw/~cjlin/libsvm/ LIBLINEAR https://www.csie.ntu.edu.tw/~cjlin/liblinear/ SVMlight http://svmlight.joachims.org/
Linear classifiers vs. nearest neighbors. Linear pros: + Low-dimensional parametric representation. + Very fast at test time. Linear cons: - Can be tricky to select the best kernel function for a problem. - Learning can take a very long time for large-scale problems. NN pros: + Works for any number of classes. + Decision boundaries not necessarily linear. + Nonparametric method. + Simple to implement. NN cons: - Slow at test time (large search problem to find the neighbors). - Storage of data. - Especially needs a good distance function (but true for all classifiers). Adapted from L. Lazebnik
Training vs. Testing: What do we want? High accuracy on training data? No, high accuracy on unseen/new/test data! Why is this tricky? Training data: features (x) and labels (y) used to learn the mapping f. Test data: features (x) used to make a prediction; labels (y) only used to see how well we've learned f! Validation data: held-out set of the training data; can use both features (x) and labels (y) to tune the parameters of the model we're learning.
Generalization: Training set (labels known); test set (labels unknown). How well does a learned model generalize from the data it was trained on to a new test set? Slide credit: L. Lazebnik
Components of generalization error: Noise in our observations: unavoidable. Bias: how much the average model over all training sets differs from the true model; inaccurate assumptions/simplifications made by the model. Variance: how much models estimated from different training sets differ from each other. Underfitting: model is too simple to represent all the relevant class characteristics; high bias and low variance; high training error and high test error. Overfitting: model is too complex and fits irrelevant characteristics (noise) in the data; low bias and high variance; low training error and high test error. Slide credit: L. Lazebnik
Generalization: Models with too few parameters are inaccurate because of a large bias (not enough flexibility). Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample). Red dots = training data (all that we see before we ship off our model!); green curve = true underlying model; blue curve = our predicted model/fit; purple dots = possible test points. Adapted from D. Hoiem
Polynomial Curve Fitting. Slide credit: Chris Bishop
Sum-of-Squares Error Function: E(w) = (1/2) Σn {y(xn, w) - tn}². Slide credit: Chris Bishop
0th Order Polynomial. Slide credit: Chris Bishop
1st Order Polynomial. Slide credit: Chris Bishop
3rd Order Polynomial. Slide credit: Chris Bishop
9th Order Polynomial. Slide credit: Chris Bishop
Over-fitting: Root-Mean-Square (RMS) Error: E_RMS = sqrt(2 E(w*) / N). Slide credit: Chris Bishop
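The over-fitting behaviour is easy to reproduce: a degree-9 polynomial through 10 points drives training RMS error to (numerically) zero, while a low-order fit does not. A sketch with `numpy.polyfit` on noisy-sine data in the spirit of Bishop's running example (seed and noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)  # noisy sine

def rms_error(degree, x_fit, y_fit, x_eval, y_eval):
    coeffs = np.polyfit(x_fit, y_fit, degree)     # least-squares polynomial fit
    pred = np.polyval(coeffs, x_eval)
    return np.sqrt(np.mean((pred - y_eval) ** 2))

train_rms_9 = rms_error(9, x, y, x, y)  # degree 9 on 10 points: interpolates
train_rms_1 = rms_error(1, x, y, x, y)  # degree 1: underfits the sine
```

Low training error says nothing about test error; evaluating on fresh points drawn from the same curve would show the degree-9 fit doing far worse.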
Data Set Size: 9th Order Polynomial. Slide credit: Chris Bishop
Data Set Size: 9th Order Polynomial. Slide credit: Chris Bishop
Regularization: Penalize large coefficient values: Ẽ(w) = (1/2) Σn {y(xn, w) - tn}² + (λ/2) ||w||². (Remember: we want to minimize this expression.) Adapted from Chris Bishop
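For a linear-in-parameters model this penalized objective has the closed-form ridge solution w = (X^T X + λI)^(-1) X^T y. A sketch showing that a larger λ shrinks the coefficients (the data and λ values are illustrative):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam I)^(-1) X^T y.
    The lam * ||w||^2 penalty shrinks coefficients toward zero."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Design matrix for a degree-3 polynomial fit on a few points
x = np.linspace(0, 1, 8)
X = np.vander(x, 4)                 # columns: x^3, x^2, x, 1
y = np.sin(2 * np.pi * x)
w_small = ridge_fit(X, y, 1e-8)     # almost no regularization
w_big = ridge_fit(X, y, 100.0)      # huge regularization
```

This mirrors the "no regularization vs. huge regularization" coefficient tables on the following slides: the heavily regularized coefficients are much smaller in magnitude.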
Regularization. Slide credit: Chris Bishop
Regularization. Slide credit: Chris Bishop
Polynomial Coefficients. Slide credit: Chris Bishop
Polynomial Coefficients: no regularization vs. huge regularization. Adapted from Chris Bishop
Regularization: RMS error vs. ln λ. Slide credit: Chris Bishop
Training vs. test error: [plot of error vs. model complexity] Training error decreases steadily with complexity, while test error first decreases (underfitting: high bias, low variance) and then increases (overfitting: low bias, high variance). Slide credit: D. Hoiem
The effect of training set size: [plot of test error vs. model complexity] With few training examples, test error grows quickly as complexity increases; with many training examples, it stays lower for longer. Low complexity: high bias, low variance; high complexity: low bias, high variance. Slide credit: D. Hoiem
Choosing the trade-off between bias and variance: Need a validation set (separate from the test set). [plot of training and validation error vs. model complexity] Low complexity: high bias, low variance; high complexity: low bias, high variance. Slide credit: D. Hoiem
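Model selection with a validation set can be sketched as: fit each candidate complexity on the training split, evaluate RMS error on the held-out validation split, and keep the minimizer (the data, seed, and degree range below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    """Noisy samples of the true underlying curve sin(2 pi x)."""
    x = rng.uniform(0, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=n)

x_tr, y_tr = make_data(20)   # training split: used to fit each model
x_va, y_va = make_data(20)   # validation split: used only to compare models

def val_rms(degree):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    return np.sqrt(np.mean((np.polyval(coeffs, x_va) - y_va) ** 2))

best_degree = min(range(0, 10), key=val_rms)
```

The test set stays untouched throughout; it is only used once, at the end, to report performance of the chosen model.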
Summary: Try simple classifiers first. Better to have smart features and simple classifiers than simple features and smart classifiers. Use increasingly powerful classifiers with more training data. As an additional technique for reducing variance, try regularizing the parameters. Slide credit: D. Hoiem