Lecture 17: Face Recognition. Dr. Juan Carlos Niebles, Stanford AI Lab. Professor Fei-Fei Li, Stanford Vision Lab. Lecture 17-1
What we will learn today: Introduction to face recognition, Principal Component Analysis (PCA), the Eigenfaces algorithm, Linear Discriminant Analysis (LDA).
Turk, M. and Pentland, A., "Eigenfaces for Recognition", Journal of Cognitive Neuroscience 3 (1): 71-86, 1991.
Belhumeur, P., Hespanha, J., and Kriegman, D., "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection", IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7): 711-720, 1997.
Faces in the brain. Courtesy of Johannes M. Zanker.
Faces in the brain. Kanwisher et al., 1997.
Face Recognition applications: digital photography, surveillance, album organization, person tracking/identification, emotions and expressions, security/warfare, tele-conferencing, etc.
The Space of Faces. An image is a point in a high-dimensional space: if represented in grayscale intensity, an N x M image is a point in R^(NM). E.g., a 100x100 image is a 10,000-dimensional point. However, relatively few high-dimensional vectors correspond to valid face images, so we want to effectively model the subspace of face images. Slide credit: Chuck Dyer, Steve Seitz, Nishino.
Image space vs. face space: compute the n-dimensional subspace such that the projection of the data points onto the subspace has the largest variance among all n-dimensional subspaces, i.e., maximize the scatter of the training images in face space.
Key idea: compress face images to a low-dimensional subspace that captures the key appearance characteristics of the visual degrees of freedom. Use PCA to estimate the subspace (dimensionality reduction). Compare two faces by projecting the images into the subspace and measuring the Euclidean distance between them.
What we will learn today: Introduction to face recognition, Principal Component Analysis (PCA), the Eigenfaces algorithm, Linear Discriminant Analysis (LDA).
technical note: PCA Formulation. Basic idea: if the data lives in a subspace, it will look very flat when viewed from the full space, e.g., a 1D subspace in 2D, or a 2D subspace in 3D. This means that if we fit a Gaussian to the data, the equiprobability contours will be highly skewed ellipsoids. Slide inspired by N. Vasconcelos.
technical note: PCA Formulation. If x is Gaussian with covariance Σ, the equiprobability contours are ellipses whose principal components φ_i are the eigenvectors of Σ and whose principal lengths λ_i are the eigenvalues of Σ. By computing the eigenvalues we know whether the data is flat: not flat if λ_1 ≈ λ_2, flat if λ_1 >> λ_2. Slide inspired by N. Vasconcelos.
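This flatness test can be sketched in a few lines of NumPy (my own illustration, not from the slides): fit a Gaussian to synthetic 2-D data that lies near a line, and compare the eigenvalues λ_1, λ_2 of its covariance.

```python
import numpy as np

# Illustration (not from the slides): data near a 1-D subspace of R^2 should
# have one large and one tiny covariance eigenvalue (lambda1 >> lambda2).
rng = np.random.default_rng(0)
t = rng.normal(size=(500, 1))                  # spread along a line
noise = 0.05 * rng.normal(size=(500, 1))       # small off-line spread
X = np.hstack([t, 0.5 * t + noise])            # (500, 2), nearly collinear

Sigma = np.cov(X, rowvar=False)                # sample covariance (2 x 2)
eigvals = np.linalg.eigvalsh(Sigma)            # eigenvalues, ascending order
lam2, lam1 = eigvals                           # principal lengths, lam1 >= lam2
is_flat = lam1 / lam2 > 100                    # lambda1 >> lambda2 => flat
```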
technical note: PCA Algorithm (training). Slide inspired by N. Vasconcelos.
technical note: PCA Algorithm (testing). Slide inspired by N. Vasconcelos.
technical note: PCA by SVD. An alternative way to compute the principal components is based on the singular value decomposition. Quick reminder (SVD): any real n x m matrix X (n > m) can be decomposed as X = M Π N^T, where M is an (n x m) column-orthonormal matrix of left singular vectors (the columns of M), Π is an (m x m) diagonal matrix of singular values, and N^T is an (m x m) row-orthonormal matrix of right singular vectors (the columns of N). Slide inspired by N. Vasconcelos.
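As a quick sanity check of this reminder, numpy.linalg.svd computes exactly such a decomposition (a sketch on arbitrary random data, not from the slides):

```python
import numpy as np

# Economy SVD of a tall real matrix A (n x m, n > m): A = M @ diag(pi) @ N^T,
# with column-orthonormal M (left singular vectors) and orthonormal N.
rng = np.random.default_rng(1)
A = rng.normal(size=(8, 3))                    # n = 8 > m = 3

M, pi, Nt = np.linalg.svd(A, full_matrices=False)
A_rebuilt = M @ np.diag(pi) @ Nt               # exact reconstruction of A
```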
technical note: PCA by SVD. To relate this to PCA, we consider the data matrix X = [x_1 ... x_n], with one example per column. The sample mean is μ = (1/n) Σ_i x_i. Slide inspired by N. Vasconcelos.
technical note: PCA by SVD. Center the data by subtracting the mean from each column of X. The centered data matrix is X_c = [x_1 − μ, ..., x_n − μ]. Slide inspired by N. Vasconcelos.
technical note: PCA by SVD. The sample covariance matrix is Σ = (1/n) Σ_i x_i^c (x_i^c)^T, where x_i^c is the i-th column of X_c. This can be written as Σ = (1/n) X_c X_c^T. Slide inspired by N. Vasconcelos.
technical note: PCA by SVD. The matrix X_c^T is real (n x d). Assuming n > d, it has the SVD decomposition X_c^T = M Π N^T, and therefore X_c = N Π M^T. Slide inspired by N. Vasconcelos.
technical note: PCA by SVD. Note that N is (d x d) and orthonormal, and Π² is diagonal, so Σ = (1/n) X_c X_c^T = (1/n) N Π M^T M Π N^T = N (Π²/n) N^T. This is just the eigenvalue decomposition of Σ. It follows that the eigenvectors of Σ are the columns of N and the eigenvalues of Σ are λ_i = π_i²/n. This gives an alternative algorithm for PCA. Slide inspired by N. Vasconcelos.
technical note: PCA by SVD. In summary, computation of PCA by SVD: given X with one example per column, 1) create the centered data matrix X_c; 2) compute its SVD, X_c^T = M Π N^T; 3) the principal components are the columns of N, and the eigenvalues are λ_i = π_i²/n. Slide inspired by N. Vasconcelos.
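The summary above can be sketched directly in NumPy (a minimal version under the slide's conventions, one example per column; the function name is my own):

```python
import numpy as np

def pca_by_svd(X):
    """PCA via SVD: X is (d x n), one example per column.
    Returns the principal components (columns of N) and the eigenvalues
    of the sample covariance. A sketch, not reference code."""
    n = X.shape[1]
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu                                # centered data matrix
    # SVD of Xc^T (n x d): Xc^T = M @ diag(pi) @ N^T
    M, pi, Nt = np.linalg.svd(Xc.T, full_matrices=False)
    components = Nt.T                          # columns are eigenvectors of Sigma
    eigenvalues = pi**2 / n                    # lambda_i = pi_i^2 / n
    return components, eigenvalues
```

Because the singular values come back in decreasing order, the eigenvalues are already sorted, which is convenient for the next slide's rule of thumb.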
technical note: Rule of thumb for finding the number of PCA components. A natural measure is to pick the eigenvectors that explain p% of the data variability. This can be done by plotting the ratio r_k = (Σ_{i=1}^k λ_i) / (Σ_{i=1}^n λ_i) as a function of k. E.g., we need 3 eigenvectors to cover 70% of the variability of this dataset. Slide inspired by N. Vasconcelos.
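A minimal sketch of this rule of thumb (the helper name is my own): compute r_k from a decreasing eigenvalue list and return the smallest k whose cumulative ratio reaches the target fraction p.

```python
import numpy as np

def components_for_variability(eigenvalues, p):
    """Smallest k such that r_k = sum(lam_1..lam_k) / sum(lam_1..lam_n) >= p.
    Eigenvalues are assumed sorted in decreasing order."""
    lam = np.asarray(eigenvalues, dtype=float)
    r = np.cumsum(lam) / lam.sum()             # r_k as a function of k
    return int(np.searchsorted(r, p) + 1)      # first index reaching p (1-based)
```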
What we will learn today: Introduction to face recognition, Principal Component Analysis (PCA), the Eigenfaces algorithm, Linear Discriminant Analysis (LDA).
Eigenfaces: key idea. Assume that most face images lie on a low-dimensional subspace determined by the first k (k << d) directions of maximum variance. Use PCA to determine the vectors, or "eigenfaces," that span that subspace. Represent all face images in the dataset as linear combinations of eigenfaces. M. Turk and A. Pentland, Face Recognition Using Eigenfaces, CVPR 1991.
technical note: Eigenface algorithm. Training: 1. Align training images x_1, x_2, ..., x_N. Note that each image is flattened into a long vector. 2. Compute the average face μ = (1/N) Σ_i x_i. 3. Compute the difference images (the centered data matrix X_c, whose columns are x_i − μ).
technical note: Eigenface algorithm. 4. Compute the covariance matrix Σ = (1/N) X_c X_c^T. 5. Compute the eigenvectors φ_i of the covariance matrix Σ. 6. Compute each training image x_i's projection as x_i → (x_i^c · φ_1, x_i^c · φ_2, ..., x_i^c · φ_K) ≡ (a_1, a_2, ..., a_K). 7. Visualize the estimated training face x̂_i = μ + a_1 φ_1 + a_2 φ_2 + ... + a_K φ_K.
technical note: Eigenface algorithm. Testing: 1. Take a query image t. 2. Project it into eigenface space: t → ((t − μ) · φ_1, (t − μ) · φ_2, ..., (t − μ) · φ_K) ≡ (w_1, w_2, ..., w_K). 3. Compare the projection w with all N training projections. Simple comparison metric: Euclidean distance. Simple decision rule: k-nearest neighbors (note: this k refers to the k-NN algorithm and is different from the previous K, the number of principal components).
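The training and testing steps above can be sketched together (my own minimal version: eigenvectors via the SVD route from earlier slides, and a 1-nearest-neighbor decision):

```python
import numpy as np

def train_eigenfaces(X, K):
    """Eigenface training sketch. X: (d x N) matrix of flattened,
    aligned face images, one image per column."""
    mu = X.mean(axis=1, keepdims=True)         # step 2: average face
    Xc = X - mu                                # step 3: centered data matrix
    _, _, Nt = np.linalg.svd(Xc.T, full_matrices=False)
    Phi = Nt.T[:, :K]                          # steps 4-5 via SVD: top-K eigenfaces
    A = Phi.T @ Xc                             # step 6: (K x N) training projections
    return mu, Phi, A

def classify(t, mu, Phi, A, labels):
    """Testing sketch: project query t, 1-nearest neighbor in Euclidean distance."""
    w = Phi.T @ (t.reshape(-1, 1) - mu)        # projection of the query
    d = np.linalg.norm(A - w, axis=0)          # distance to each training face
    return labels[int(np.argmin(d))]
```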
Visualization of eigenfaces. Eigenfaces look somewhat like generic faces.
Reconstruction and errors, for K = 4, K = 200, and K = 400. Selecting only the top K eigenfaces reduces the dimensionality. Fewer eigenfaces result in more information loss, and hence less discrimination between faces.
Summary for Eigenfaces. Pros: non-iterative, globally optimal solution. Limitations: the PCA projection is optimal for reconstruction from a low-dimensional basis, but may NOT be optimal for discrimination. See supplementary materials for Linear Discriminant Analysis, aka Fisherfaces.
What we will learn today: Introduction to face recognition, Principal Component Analysis (PCA), the Eigenfaces algorithm, Linear Discriminant Analysis (LDA).
Fisher's Linear Discriminant Analysis. Goal: find the best separation between two classes. Slide inspired by N. Vasconcelos.
Basic intuition: PCA vs. LDA.
technical note: Linear Discriminant Analysis (LDA). We have two classes with means μ_0, μ_1 and covariances Σ_0, Σ_1. We want to find the line z = w^T x that best separates them. One possibility would be to maximize the separation of the projected means, (w^T μ_1 − w^T μ_0)². Slide inspired by N. Vasconcelos.
technical note: Linear Discriminant Analysis (LDA). However, this difference can be made arbitrarily large by simply scaling w, and we are only interested in the direction, not the magnitude, so some type of normalization is needed. Fisher suggested maximizing the ratio of the projected mean separation to the projected within-class spread, J(w) = (w^T μ_1 − w^T μ_0)² / (w^T Σ_1 w + w^T Σ_0 w). Slide inspired by N. Vasconcelos.
technical note: Linear Discriminant Analysis (LDA). We have already seen that the numerator is (w^T μ_1 − w^T μ_0)² = w^T (μ_1 − μ_0)(μ_1 − μ_0)^T w = w^T S_B w, and also that the denominator is w^T Σ_1 w + w^T Σ_0 w = w^T (Σ_1 + Σ_0) w = w^T S_W w, so the objective can be written as J(w) = (w^T S_B w) / (w^T S_W w). Slide inspired by N. Vasconcelos.
technical note: Visualization. S_W = S_1 + S_2 is the within-class scatter; S_B is the between-class scatter.
technical note: Linear Discriminant Analysis (LDA). Maximizing the ratio J(w) = (w^T S_B w) / (w^T S_W w) is equivalent to maximizing the numerator while keeping the denominator constant, i.e., maximize w^T S_B w subject to w^T S_W w = K. This can be accomplished using Lagrange multipliers, where we define the Lagrangian L(w, λ) = w^T S_B w − λ (w^T S_W w − K) and maximize with respect to both w and λ. Slide inspired by N. Vasconcelos.
technical note: Linear Discriminant Analysis (LDA). Setting the gradient of L with respect to w to zero, we get 2(S_B − λ S_W) w = 0, or S_B w = λ S_W w. This is a generalized eigenvalue problem. The solution is easy when S_W^{-1} exists. Slide inspired by N. Vasconcelos.
technical note: Linear Discriminant Analysis (LDA). In this case S_W^{-1} S_B w = λ w, and using the definition S_B = (μ_1 − μ_0)(μ_1 − μ_0)^T, we get S_W^{-1} (μ_1 − μ_0)(μ_1 − μ_0)^T w = λ w. Noting that (μ_1 − μ_0)^T w = α is a scalar, this can be written as α S_W^{-1} (μ_1 − μ_0) = λ w, and since we don't care about the magnitude of w, w* ∝ S_W^{-1} (μ_1 − μ_0). Slide inspired by N. Vasconcelos.
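The closed-form solution w* ∝ S_W^{-1} (μ_1 − μ_0) is a one-liner in NumPy. Here is a sketch under my own conventions (one example per row per class; the function name is hypothetical):

```python
import numpy as np

def fisher_direction(X0, X1):
    """Fisher LDA direction sketch: w* proportional to S_W^{-1} (mu1 - mu0).
    X0, X1: (n_i x d) arrays of examples from each class."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter S_W = S_0 + S_1 (per-class scatter matrices)
    S0 = (X0 - mu0).T @ (X0 - mu0)
    S1 = (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(S0 + S1, mu1 - mu0)    # solve S_W w = (mu1 - mu0)
    return w / np.linalg.norm(w)               # magnitude does not matter
```

For two isotropic classes separated along the first axis, the returned direction points essentially along that axis, as the derivation predicts.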
PCA vs. LDA. Eigenfaces exploit the maximum scatter of the training images in face space. Fisherfaces attempt to maximize the between-class scatter while minimizing the within-class scatter.
Basic intuition: PCA vs. LDA.
Results: Eigenface vs. Fisherface (1). Input: 160 images of 16 people. Train: 159 images; test: 1 image. Variation in facial expression, eyewear, and lighting: with and without glasses, 3 lighting conditions, 5 expressions.
Eigenface vs. Fisherface (2).
What we have learned today: Introduction to face recognition, Principal Component Analysis (PCA), the Eigenfaces algorithm, Linear Discriminant Analysis (LDA).