Face Recognition: Dimensionality Reduction
Biometrics, CSE 190, Lecture 14 (Winter 2011)

Announcements
- HW1 returned.
- New attendance policy: on time, 1 point; five minutes or more late, 0.5 points; absent, 0 points.

Example: Face Detection
- Scan a window over the image.
- Classify each window as either Face or Non-face.

Discriminative Model [Sung and Poggio 98]
- Learn face and non-face models from examples: P(Window | Face) and P(Window | Non-face).
- Cluster samples of each class to create subclasses, and project the examples to a lower-dimensional space based on multi-discriminant analysis.
- Detect faces in the lower-dimensional space when P(Face | Window) > P(Non-face | Window).
- Add non-face examples using bootstrapping.

State of the Art Method: Viola-Jones [Viola and Jones, CVPR 01]
- A face is modeled as a set of Haar-like features.
- A fast way to compute simple rectangle features.
- AdaBoost is used to focus on a small set of features.
- A cascade of simple classifiers.
- Error rate comparable to the best methods.
- Fast: 15 fps on a 700 MHz Pentium with half-resolution video images.
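A minimal Matlab sketch of the scan-and-classify loop described above (not the Viola-Jones detector itself). The image file name, window size, stride, and the classifyWindow function are illustrative placeholders, not part of the original material.

% Slide a fixed-size window over the image and classify each position.
img    = double(imread('scene.jpg')) / 255;   % hypothetical input image (assumed grayscale)
winSz  = 24;                                  % window side length in pixels (illustrative)
stride = 4;                                   % step between window positions
detections = [];
for r = 1:stride:size(img,1) - winSz + 1
    for c = 1:stride:size(img,2) - winSz + 1
        w = img(r:r+winSz-1, c:c+winSz-1);    % current candidate window
        if classifyWindow(w)                  % placeholder face/non-face classifier
            detections(end+1, :) = [r c];     %#ok<AGROW> record the top-left corner
        end
    end
end

In a full detector the scan is repeated at multiple scales, as the later distance-to-face-space slide also notes.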
Image as a Feature Vector
- Consider an n-pixel image to be a point in an n-dimensional space, x ∈ R^n.
- Each pixel value is a coordinate of x.

Nearest Neighbor Classifier
- Classify a test image by the label of the nearest training image (template) in R^n.

Comments
- Sometimes called template matching.
- Variations on the distance function (e.g. L1, robust distances).
- Multiple templates per class: perhaps many training images per class.
- Expensive to compute the distances to all templates, especially when each image is big (n-dimensional).
- May not generalize well to unseen examples of a class.
- Some solutions: Bayesian classification; dimensionality reduction (the curse of dimensionality).

Eigenfaces: Linear Projection
- Use Principal Component Analysis (PCA) to determine the most discriminating features between images of faces.
- An n-pixel image x ∈ R^n can be projected to a low-dimensional feature space y ∈ R^m by y = Wx, where W is an m by n matrix. (Example: projecting from R^3 to R^2.)
- Recognition is performed using nearest neighbor in R^m.
- How do we choose a good W?
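A minimal Matlab sketch of nearest-neighbor template matching in the raw pixel space described above. The variable names X (vectorized training images as columns), labels, and xTest are assumptions for illustration.

% Nearest-neighbor (template-matching) classification in pixel space.
% X: n-by-k matrix of k vectorized training images; labels: 1-by-k vector of
% class labels; xTest: n-by-1 vectorized test image.
dists = sqrt(sum((X - xTest).^2, 1));   % L2 distance to every training image
                                        % (implicit expansion, R2016b+; use
                                        % bsxfun(@minus, X, xTest) on older Matlab)
[~, idx]  = min(dists);                 % index of the nearest template
predicted = labels(idx);                % predicted class label

Replacing the L2 distance with an L1 or robust distance only changes the dists line, which is the "variations on the distance function" point above.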
Eigenfaces: Principal Component Analysis (PCA)

How do you construct the Eigenspace?
- Stack the vectorized training images x_1, x_2, ..., x_5 as columns of a data matrix, and build the projection matrix W from it.
- Some details: how big is Σ (the covariance matrix)? For n-pixel images it is n by n, which is very large.
- Trick: use the Singular Value Decomposition. Construct the data matrix by stacking vectorized images and then apply SVD to compute the basis when the number of images is much smaller than their dimension (n << d).

Matrix Decompositions
- Definition: the factorization of a matrix M into two or more matrices M_1, M_2, ..., M_n such that M = M_1 M_2 ... M_n.
- Many decompositions exist: QR decomposition, LU decomposition, LDU decomposition, etc.

Singular Value Decomposition
- Excellent reference: Matrix Computations, Golub and Van Loan.
- Any m by n matrix A may be factored as A = UΣV^T, with sizes [m x n] = [m x m][m x n][n x n].
- U: m by m orthogonal matrix; the columns of U are the eigenvectors of AA^T.
- V: n by n orthogonal matrix; the columns of V are the eigenvectors of A^T A.
- Σ: m by n, diagonal with non-negative entries σ_1, σ_2, ..., σ_s, with s = min(m, n); these are called the singular values.
- The singular values are the square roots of the eigenvalues of both AA^T and A^T A.
- The SVD algorithm returns σ_1 >= σ_2 >= ... >= σ_s (sorted by significance).

SVD Properties
- In Matlab, [U, S, V] = svd(A), and you can verify that A = U*S*V'.
- r = rank(A) = number of non-zero singular values.
- U and V give us orthonormal bases for the subspaces of A:
  - first r columns of U: column space of A
  - last m - r columns of U: left nullspace of A
  - first r columns of V: row space of A
  - last n - r columns of V: nullspace of A
- For d <= r, the first d columns of U provide the best d-dimensional basis for the columns of A in the least-squares sense.

Thin SVD
- Any m by n matrix A (with m > n) may be factored as A = U'Σ'V^T, with sizes [m x n] = [m x n][n x n][n x n].
- In the full SVD with m > n, Σ can be viewed as Σ' stacked on top of an (m - n) by n block of zeros, where Σ' = diag(σ_1, σ_2, ..., σ_n) and U' consists of the first n columns of U.
- In Matlab, the thin (economy-size) SVD is [U, S, V] = svd(A, 'econ').
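A short Matlab sketch verifying the SVD properties listed above on a small random example matrix (all sizes are illustrative):

% Verify the basic SVD facts from the slide on an arbitrary 6-by-4 matrix.
A = randn(6, 4);
[U, S, V] = svd(A);              % full SVD: U is 6x6, S is 6x4, V is 4x4
norm(A - U*S*V', 'fro')          % ~0: the factorization reproduces A
r = rank(A)                      % equals the number of non-zero singular values
nnz(diag(S) > 1e-10)             % same count, up to numerical tolerance
[Ue, Se, Ve] = svd(A, 'econ');   % thin (economy-size) SVD: Ue is 6x4, Se is 4x4
norm(A - Ue*Se*Ve', 'fro')       % ~0 as well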
Performing PCA with SVD
- The singular values of A are the square roots of the eigenvalues of both AA^T and A^T A, and the columns of U are the corresponding eigenvectors.
- Given a collection of n vectors a_1, ..., a_n (taken to be zero mean), the covariance matrix is, ignoring the 1/n factor:
  Σ_{i=1}^{n} a_i a_i^T = [a_1 a_2 ... a_n][a_1 a_2 ... a_n]^T = AA^T
- (Figure: data points with the mean and the first principal component, the direction of maximum variance.)
- So: subtract the mean image µ from each input image, stack the results into a data matrix, perform a thin SVD on the data matrix, and take the top columns of U as the basis.

Eigenfaces [Turk, Pentland 91]
Modeling
1. Given a collection of n labeled training images,
2. compute the mean image and the covariance matrix.
3. Compute the k eigenvectors (note that these are images) of the covariance matrix corresponding to the k largest eigenvalues.
4. Project the training images to the k-dimensional Eigenspace.
Recognition
1. Given a test image, project it to the Eigenspace.
2. Perform classification against the projected training images.

Eigenfaces: Training Images [Turk, Pentland 91]
(Figures: training images; mean image and basis images; Eigenfaces under variable lighting.)
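A minimal Matlab sketch of the Eigenface modeling step above, assuming the training images have already been vectorized into the columns of a matrix; the variable names and the choice of k are illustrative.

% Eigenface modeling via thin SVD of the mean-subtracted data matrix.
% X: n-by-d matrix whose d columns are vectorized n-pixel training images.
mu = mean(X, 2);                 % mean image
A  = X - mu;                     % subtract the mean from every column (R2016b+;
                                 % use bsxfun(@minus, X, mu) on older Matlab)
[U, S, ~] = svd(A, 'econ');      % thin SVD of the centered data matrix
k  = 20;                         % number of Eigenfaces to keep (illustrative)
W  = U(:, 1:k)';                 % k-by-n projection matrix: y = W*(x - mu)
Ytrain = W * A;                  % k-by-d Eigenspace coordinates of the training set

Recognition then projects a test image the same way, yTest = W*(xTest - mu), and classifies it against the columns of Ytrain, e.g. with the nearest-neighbor step sketched earlier.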
Projection and Reconstruction
- An n-pixel image x ∈ R^n can be projected to a low-dimensional feature space y ∈ R^m by y = Wx.
- From y ∈ R^m, the reconstruction of the point is W^T y.
- The error of the reconstruction is ||x - W^T W x||.
- Reconstruction using Eigenfaces: given the image on the left, project it to the Eigenspace, then reconstruct an image (right). (Figure.)

Underlying Assumptions
- Background is not cluttered (or else we only look at the interior of the object).
- Lighting in the test image is similar to that in the training images.
- No occlusion.
- Size of the training image (window) is the same as the window in the test image.

Face Detection Using Distance to Face Space
Scan a window ω across the image, and classify the window as face/not-face as follows:
- Project the window to the subspace, and reconstruct it as described earlier.
- Compute the distance between ω and the reconstruction.
- Local minima of the distance over all image locations that are less than some threshold are taken as the locations of faces.
- Repeat at different scales.
- Possibly normalize the window intensity so that ||ω|| = 1.

Difficulties with PCA
- The projection may suppress important detail: the smallest-variance directions are not necessarily unimportant.
- Does not generalize well to unseen conditions.
- The method does not take the discriminative task into account: typically we wish to compute features that allow good discrimination, which is not the same as the directions of largest variance.
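A minimal Matlab sketch of the distance-to-face-space test for one candidate window, reusing W and mu from the Eigenface sketch above; the threshold and variable names are illustrative.

% Distance from face space for a single candidate window.
w    = window(:);                 % vectorized candidate window (n-by-1)
% w  = w / norm(w);               % optional intensity normalization (see above)
y    = W * (w - mu);              % project to the k-dimensional Eigenspace
wHat = W' * y + mu;               % reconstruct in image space
dffs = norm(w - wHat);            % distance from face space
isFace = dffs < threshold;        % accept if below a chosen threshold

In a full detector this test is evaluated at every window position and scale, and local minima of dffs below the threshold are reported as faces.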
Illumination Variability
- "The variations between the images of the same face due to illumination and viewing direction are almost always larger than image variations due to change in face identity." -- Moses, Adini, Ullman, ECCV 94

Fisherfaces: Class Specific Linear Projection
- P. Belhumeur, J. Hespanha, D. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," PAMI, July 1997, pp. 711--720.
- An n-pixel image x ∈ R^n can be projected to a low-dimensional feature space y ∈ R^m by y = Wx, where W is an m by n matrix.
- Recognition is performed using nearest neighbor in R^m.
- How do we choose a good W?

PCA & Fisher's Linear Discriminant
- Between-class scatter: S_B = Σ_{i=1}^{c} |χ_i| (µ_i - µ)(µ_i - µ)^T
- Within-class scatter: S_W = Σ_{i=1}^{c} Σ_{x_k ∈ χ_i} (x_k - µ_i)(x_k - µ_i)^T
- Total scatter: S_T = S_B + S_W
- where c is the number of classes, µ is the overall mean, µ_i is the mean of class χ_i, and |χ_i| is the number of samples of χ_i.
- PCA (Eigenfaces) maximizes the projected total scatter: W_PCA = argmax_W |W^T S_T W|.
- FLD maximizes the ratio of projected between-class scatter to projected within-class scatter: W_FLD = argmax_W |W^T S_B W| / |W^T S_W W|.

Computing the Fisher Projection Matrix
- Since S_W has rank N - c, project the training set to the subspace spanned by the first N - c principal components of the training set.
- Apply FLD in the (N - c)-dimensional subspace, yielding a (c - 1)-dimensional feature space.
- The w_i are orthonormal.
- There are at most c - 1 non-zero generalized eigenvalues, so m <= c - 1.
- Can be computed with eig in Matlab (see the sketch after this section).

Fisherfaces
- Fisher's Linear Discriminant projects away the within-class variation (lighting, expressions) found in the training set.
- Fisher's Linear Discriminant preserves the separability of the classes.
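A minimal Matlab sketch of the FLD computation described above, in the space already reduced by PCA to N - c dimensions. Xr (reduced data as columns) and labels are illustrative names; this is a sketch, not code from the paper.

% Fisher's Linear Discriminant in the PCA-reduced space.
% Xr: p-by-N matrix of reduced training vectors (p = N - c); labels: 1-by-N.
classes = unique(labels);
c  = numel(classes);
mu = mean(Xr, 2);                            % overall mean
Sb = zeros(size(Xr, 1));                     % between-class scatter
Sw = zeros(size(Xr, 1));                     % within-class scatter
for i = 1:c
    Xi  = Xr(:, labels == classes(i));
    mui = mean(Xi, 2);
    Sb  = Sb + size(Xi, 2) * (mui - mu) * (mui - mu)';
    Sw  = Sw + (Xi - mui) * (Xi - mui)';     % implicit expansion (R2016b+)
end
[Vg, D] = eig(Sb, Sw);                       % generalized eigenproblem Sb*v = lambda*Sw*v
[~, order] = sort(real(diag(D)), 'descend');
Wfld = Vg(:, order(1:c-1));                  % keep the c - 1 most discriminative directions

The overall Fisherface projection is then the composition of the PCA projection and Wfld.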
PCA vs. FLD
(Figure: comparison of PCA and FLD projections.)

Experimental Results - 1
Variation in facial expression, eyewear, and lighting.
- Input: 160 images of 16 people.
- Train: 159 images; Test: 1 image (leave-one-out performance evaluation).
- With glasses / without glasses.
- 3 lighting conditions.
- 5 expressions.
Evaluation of PCA and LDA on the Yale Face Database [Belhumeur, Hespanha, Kriegman 97].

Experimental Results - 2

Approach           Dim. of subspace   Error rate (close crop)   Error rate (full face)
Eigenface (PCA)    30                 24.4%                     19.4%
Fisherface (LDA)   15                 7.3%                      0.6%

Harvard Face Database
(Figure: recognition results, lighting extrapolation.)