CAP5415 Computer Vision, Lecture 18: Face Recognition. Dr. Ulas Bagci, bagci@ucf.edu
Lecture 18: Face Detection and Recognition
(Figure: detection finds the face in the image; recognition identifies who it is, e.g., "Sally".)
Why wasn't the Massachusetts bomber identified by the Massachusetts Department of Motor Vehicles system from the video surveillance images? He was enrolled in the MA DMV database! DMV face recognition system?
(Slide credits: Animetrics, Dr. Marc Valliant, VP & CTO)
Controlled Facial Photo
Today's FR technology will reliably find a controlled facial photo in a mugshot database of controlled photos. However, there are confounding variables in uncontrolled facial photos:
- Resolution (not enough pixels)
- Facial pose (angulated)
- Illumination
- Occluded facial areas
(Slide credits: Animetrics, Dr. Marc Valliant, VP & CTO)
Further Difficulties
Three goals:
- Feature computation: features must be computed as quickly as possible.
- Feature selection: select the most discriminating features.
- Real-timeliness: must focus on potentially positive image areas (those that may contain faces).
Face Detection
Before face recognition can be applied to a general image, the locations and sizes of any faces must first be found. Rowley, Baluja, Kanade (1998).
Face Detection/Recognition using Mobile Devices
- Face detection (the camera automatically adjusts its focus based on detected faces)
- Auto-login with recognized faces
Face Detection approaches: feature-based (eyes, mouth, ...); template-based (e.g., Active Appearance Models, AAM); appearance-based (patches).
Some of the representative works:
Rectangle (Haar-like) Features
Rectangle filters: value = Σ(pixels in white area) − Σ(pixels in black area).
Fast Computation with Integral Images
The integral image can be computed in one pass through the image.
Formal definition: ii(x, y) = Σ_{x' ≤ x, y' ≤ y} i(x', y').
Recursive definition:
  s(x, y) = s(x, y − 1) + i(x, y)
  ii(x, y) = ii(x − 1, y) + s(x, y)
where s(x, y) is the cumulative row sum.
Example:
  IMAGE:        INTEGRAL IMAGE:
  0 1 1 1       0  1  2  3
  1 2 2 3       1  4  7 11
  1 2 1 1       2  7 11 16
  1 3 1 0       3 11 16 21
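To make this concrete, here is a minimal NumPy sketch (illustrative, not the lecture's code) that builds an integral image and evaluates a two-rectangle Haar feature with a constant number of array lookups; the function names are assumptions:

    import numpy as np

    def integral_image(img):
        # ii(x, y) = sum of i(x', y') over x' <= x, y' <= y
        return img.cumsum(axis=0).cumsum(axis=1)

    def rect_sum(ii, r0, c0, r1, c1):
        # Sum of pixels in rows r0..r1, cols c0..c1 via at most 4 lookups.
        s = ii[r1, c1]
        if r0 > 0:
            s -= ii[r0 - 1, c1]
        if c0 > 0:
            s -= ii[r1, c0 - 1]
        if r0 > 0 and c0 > 0:
            s += ii[r0 - 1, c0 - 1]
        return s

    img = np.array([[0, 1, 1, 1],
                    [1, 2, 2, 3],
                    [1, 2, 1, 1],
                    [1, 3, 1, 0]])
    ii = integral_image(img)   # reproduces the integral-image table above
    # Two-rectangle feature: left (white) half minus right (black) half
    value = rect_sum(ii, 0, 0, 3, 1) - rect_sum(ii, 0, 2, 3, 3)  # 11 - 10 = 1

Once ii is built, every rectangle feature costs the same few lookups regardless of its size, which is what makes detection fast.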
Feature Selection
For a 24x24 detection region, the number of possible rectangle features is ~160,000! One tool for dealing with such dimensionality: PCA.
Local Binary Patterns (LBP): Alternative Features
- Gray-scale invariant texture measure, derived from a local neighborhood
- Powerful texture descriptor
- Computationally simple
- Robust against monotonic gray-scale changes
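A minimal sketch of the basic 8-neighbor LBP operator (illustrative, not the lecture's implementation). Because each pixel is only compared against its neighbors, the codes are invariant to monotonic gray-scale changes:

    import numpy as np

    def lbp_3x3(img):
        # Threshold the 8 neighbors of each pixel against the center
        # pixel and pack the comparison results into an 8-bit code.
        h, w = img.shape
        center = img[1:-1, 1:-1]
        code = np.zeros((h - 2, w - 2), dtype=np.uint8)
        # Clockwise neighbor offsets, starting at the top-left corner.
        offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                   (1, 1), (1, 0), (1, -1), (0, -1)]
        for bit, (dr, dc) in enumerate(offsets):
            neigh = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
            code |= (neigh >= center).astype(np.uint8) << bit
        return code

    # A histogram of the LBP codes over a region serves as the texture
    # descriptor that is fed to the classifier.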
Local Binary Patterns (LBP): Alternative Features (LBP from dynamic/video texture)
Principal Component Analysis (PCA)
- Mapping from the inputs in the original d-dimensional space to a new k-dimensional space (k < d), with minimum loss of information.
- PCA is an unsupervised method; it does not use output information.
- PCA centers the sample and then rotates the axes to line up with the directions of highest variance.
Principal Component Analysis (PCA)
The projection of x on the direction of w is z = w^T x.
The first principal component is w_1 such that the sample, after projection onto w_1, is most spread out, so that the differences between the sample points become most apparent. To have a unique solution, require ||w_1|| = 1.
With z_1 = w_1^T x and Cov(x) = Σ, we have Var(z_1) = w_1^T Σ w_1.
SEEK w_1 such that Var(z_1) is maximized!
(Figure: scatter of data points with the first and second principal components drawn over the original axes.)
Solution of PCA
Write it as a Lagrange problem: maximize w_1^T S w_1 − α(w_1^T w_1 − 1), where S = Cov(x). Taking derivatives w.r.t. w_1 and setting them to zero gives S w_1 = α w_1, so w_1 is the eigenvector of S with the largest eigenvalue. More generally, z = W^T(x − m), where m is the sample mean and the columns of W are eigenvectors of S; then Cov(z) = W^T S W = D is diagonal, using the spectral decomposition of the (centered) scatter matrix X^T X = W D W^T.
Solution of PCA
X^T X = W D W^T. To reduce the dimensionality to k < d, take the first k columns of W (those with the highest eigenvalues):
  z_i^t = w_i^T x^t,  i = 1, ..., k,  t = 1, ..., N.
Relation to the SVD X = U S V^T: U = evec(X X^T), V = evec(X^T X), S^2 = eval(X X^T), and the columns w_i of V solve (X^T X) w_i = λ_i w_i.
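The eigendecomposition and SVD routes give the same principal directions; a minimal NumPy sketch (variable names are illustrative) checking the correspondence:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))        # N samples x d dimensions
    Xc = X - X.mean(axis=0)              # center the sample

    # Route 1: eigendecomposition of the covariance matrix
    S = Xc.T @ Xc / (len(Xc) - 1)
    evals, W = np.linalg.eigh(S)         # ascending eigenvalues
    order = np.argsort(evals)[::-1]
    evals, W = evals[order], W[:, order]

    # Route 2: SVD of the centered data matrix
    U, sv, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Singular values relate to eigenvalues: lambda_i = s_i^2 / (N - 1),
    # and the right singular vectors match the eigenvectors up to sign.
    assert np.allclose(sv**2 / (len(Xc) - 1), evals)
    assert np.allclose(np.abs(Vt), np.abs(W.T))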
Scree plot: ability of PCs to explain variation in data
Keep enough PCs (principal components) that the cumulative proportion of variance explained,
  (λ_1 + ... + λ_k) / (λ_1 + ... + λ_N),
is above roughly 50-70%. Kaiser criterion: keep PCs with eigenvalues > 1.
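Continuing the sketch above, choosing k from the cumulative variance explained:

    # Smallest k whose PCs explain at least 70% of the variance
    explained = np.cumsum(evals) / evals.sum()
    k = int(np.searchsorted(explained, 0.70)) + 1
    # Kaiser criterion instead: k = int((evals > 1).sum())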
Recap: PCA calculations in cartoon
Step #1: Calculate the adjusted data set A from the data set D: A = D − M, where the mean value M_i is the mean of the values in dimension i (rows = dimensions, columns = data samples).
Recap: PCA calculations in cartoon
Step #2: Calculate the covariance matrix C from the adjusted data set A: C_ij = cov(i, j). Note: since the means of the dimensions in the adjusted data set A are 0, the covariance matrix can simply be written as C = A A^T / (n − 1).
Recap: PCA calculations in cartoon
Step #3: Calculate the eigenvectors and eigenvalues of C, giving a matrix E whose columns are the eigenvectors, each with its eigenvalue. If some eigenvalues are 0 or very small, we can essentially discard them and the corresponding eigenvectors, hence reducing the dimensionality of the new basis.
Recap: PCA calculations in cartoon
Step #4: Transform the data set to the new basis: F = E^T A, where F is the transformed data set, E^T is the transpose of the matrix E containing the eigenvectors, and A is the adjusted data set. Note that the dimensionality of F is less than that of A. To recover A from F: (E^T)^{-1} F = (E^T)^{-1} E^T A; since E is orthogonal, E^{-1} = E^T, so A = E F.
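Putting the four steps together, a minimal sketch (function name is illustrative; rows are dimensions and columns are samples, as in the cartoon):

    import numpy as np

    def pca_cartoon(D, k):
        # Step 1: adjusted data set A = D - M (subtract per-dimension mean)
        M = D.mean(axis=1, keepdims=True)
        A = D - M
        # Step 2: covariance matrix C = A A^T / (n - 1)
        n = D.shape[1]
        C = A @ A.T / (n - 1)
        # Step 3: eigenvectors of C; keep the k with largest eigenvalues
        evals, E = np.linalg.eigh(C)
        E = E[:, np.argsort(evals)[::-1][:k]]
        # Step 4: transform to the new basis
        F = E.T @ A
        return F, E, M

    # Recovery: E has orthonormal columns, so A is approximated by E @ F
    # (exact when k = d), and the data by E @ F + M.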
Holistic FR: Eigenfaces
Eigenfaces, fisherfaces, tensorfaces, ...
Gabor Feature-based FR
Earlier FR methods are mostly feature-based. The most successful feature-based FR is the elastic bunch graph matching system, with Gabor filter coefficients (computed at multiple scales) as features.
Gabor Features
A filter bank over 5 scales and 8 orientations (40 filters).
PCA on Faces: Eigenfaces
Average face, first principal component, and other components. For all except the average face, gray = 0, white > 0, black < 0.
Eigenfaces example: training faces.
Eigenfaces example: top eigenvectors u_1, ..., u_k; mean face µ.
Application to faces
Representing faces in this basis, and face reconstruction from the projection coefficients.
Simplest Approach to FR
The simplest approach is to think of face recognition as a template matching problem. Problems arise when performing recognition in a high-dimensional space; significant improvements can be achieved by first mapping the data into a lower-dimensional space.
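As a sketch of the idea (illustrative, not the lecture's code), template matching amounts to nearest-neighbor search in raw pixel space, which is exactly the high-dimensional comparison that PCA makes cheaper:

    import numpy as np

    def match_template(probe, gallery):
        # probe: flattened face image (d,); gallery: (n, d) known faces.
        # Return the index of the closest stored face.
        dists = np.linalg.norm(gallery - probe, axis=1)
        return int(np.argmin(dists))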
FR using eigenfaces
The distance e_r is called the distance within face space (difs). The Euclidean distance can be used to compute e_r; however, the Mahalanobis distance has been shown to work better:
  Euclidean:   ||Ω − Ω_k||^2 = Σ_{i=1}^{K} (w_i − w_i^k)^2
  Mahalanobis: ||Ω − Ω_k||^2 = Σ_{i=1}^{K} (1/λ_i) (w_i − w_i^k)^2
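A minimal sketch of the comparison (names are illustrative): project the probe into face space, then measure difs to each known class with either distance; the Mahalanobis form divides each coefficient by its eigenvalue λ_i:

    import numpy as np

    def difs(probe, mean_face, E, omegas, lams=None):
        # E: (d, K) top-K eigenfaces; omegas: (c, K) class coefficient
        # vectors Omega_k; lams: (K,) leading eigenvalues (Mahalanobis).
        w = E.T @ (probe - mean_face)       # project into face space
        diff = omegas - w                   # (c, K)
        if lams is None:
            return (diff**2).sum(axis=1)    # squared Euclidean distance
        return (diff**2 / lams).sum(axis=1) # Mahalanobis-weighted

    # Recognition: identity = argmin over classes of difs(...)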
Face detection (iPhoto)
Face Detection
The Nikon S60 finds 12 faces.
The Viola/Jones Face Detector
A seminal approach to real-time object detection. Training is slow, but detection is very fast. Key ideas:
- Integral images for fast feature evaluation
- Boosting for feature selection
- Attentional cascade for fast rejection of non-face windows
P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004.
The Viola/Jones Face Detector: Training
Initially, weight each training example equally. In each boosting round:
- Find the weak learner that achieves the lowest weighted training error.
- Raise the weights of training examples misclassified by the current weak learner.
Compute the final classifier as a linear combination of all weak learners (the weight of each learner is directly proportional to its accuracy). Exact formulas for re-weighting and combining weak learners depend on the particular boosting scheme (e.g., AdaBoost).
The Viola/Jones Face Detector: Testing
First two features selected by boosting: this feature combination can yield a 100% detection rate with a 50% false positive rate.
The Viola/Jones Face Detector: Testing
A 200-feature classifier can yield a 95% detection rate with a false positive rate of 1 in 14,084. Not good enough!
Attentional Cascade
We start with simple classifiers that reject many of the negative sub-windows while detecting almost all positive sub-windows. A positive response from the first classifier triggers the evaluation of a second (more complex) classifier, and so on. A negative outcome at any point leads to the immediate rejection of the sub-window.
(Figure: receiver operating characteristic, % detection vs. % false positives; each stage's threshold sets the trade-off between false positives and false negatives.)
Image sub-window → Classifier 1 —T→ Classifier 2 —T→ Classifier 3 —T→ FACE; any F → NON-FACE.
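The control flow of the cascade is simple; a sketch under an assumed interface in which each stage is a scoring function paired with a threshold:

    def cascade_classify(window, stages):
        # stages: list of (score_fn, threshold) pairs, ordered from the
        # simplest/cheapest classifier to the most complex one.
        for score_fn, threshold in stages:
            if score_fn(window) < threshold:
                return False    # rejected immediately: non-face
        return True             # passed every stage: face

Because the vast majority of sub-windows are rejected by the first cheap stages, the average cost per window stays very low.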
Cascaded Classifiers (Boosting)
(Diagram: the input is fed to a set of base learners whose outputs are combined into the final output.)
Boosting for FR (illustrated round by round)
- Fit weak classifier 1.
- Increase the weights of the misclassified examples.
- Fit weak classifier 2.
- Increase the weights of the misclassified examples.
- Fit weak classifier 3.
- The final classifier is a combination of the weak classifiers.
AdaBoost Algorithm
Given (x_1, y_1), ..., (x_m, y_m), with x_i ∈ X and y_i ∈ Y = {−1, +1}.
Initialize D_1(i) = 1/m, i = 1, ..., m.
For t = 1, ..., T:
- Find the classifier h_t : X → {−1, +1} that minimizes the error with respect to the distribution D_t: h_t = argmin_{h ∈ H} ε, where ε = Σ_{i=1}^{m} D_t(i) [y_i ≠ h(x_i)] is the weighted error rate of classifier h. If ε_t ≥ 0.5, then stop.
- Choose α_t ∈ R, typically α_t = (1/2) ln((1 − ε_t) / ε_t).
- Update D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t, where Z_t is a normalization factor (chosen so that D_{t+1} sums to 1).
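A minimal sketch of the loop above (assumed interface: `fit_weak` trains a weak learner on the weighted sample and returns a callable hypothesis with outputs in {−1, +1}; y is a NumPy array of ±1 labels):

    import numpy as np

    def adaboost(X, y, fit_weak, T):
        m = len(X)
        D = np.full(m, 1.0 / m)                # D_1(i) = 1/m
        ensemble = []
        for t in range(T):
            h = fit_weak(X, y, D)              # weak learner for D_t
            pred = h(X)
            eps = D[pred != y].sum()           # weighted error rate
            if eps >= 0.5:                     # no better than chance
                break
            alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
            D = D * np.exp(-alpha * y * pred)  # up-weight the mistakes
            D = D / D.sum()                    # normalize (the Z_t step)
            ensemble.append((alpha, h))
        # Strong classifier: H(x) = sign(sum_t alpha_t h_t(x))
        return lambda Xq: np.sign(sum(a * h(Xq) for a, h in ensemble))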
Boosting for FR
Define weak learners based on rectangle features. The strong classifier is H(x) = sign(Σ_{t=1}^{T} α_t h_t(x)).
Boosting & SVM
Advantages of boosting:
- Integrates classification with feature selection
- Complexity of training is linear instead of quadratic in the number of training examples
- Flexibility in the choice of weak learners and boosting scheme
- Testing is fast
- Easy to implement
Disadvantages:
- Needs many training examples
- Often does not work as well as SVM
Simple FR for Mobile Devices (LBP: local binary patterns)
References & Slide Credits
- Animetrics, Dr. Marc Valliant, VP & CTO.
- M. Turk and A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, 1991.
- Y. Freund and R. Schapire, "A Short Introduction to Boosting," Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, September 1999.
- S. Li et al., Handbook of Face Recognition, Springer.
- P. Viola and M. Jones, "Robust Real-Time Face Detection," Intl. J. Computer Vision 57(2), 137-154, 2004 (originally in CVPR 2001).
- Some slides adapted from Bill Freeman, MIT 6.869, April 2005.
- J. Friedman, T. Hastie, and R. Tibshirani, "Additive Logistic Regression: A Statistical View of Boosting," http://www-stat.stanford.edu/~hastie/papers/boost.ps