Face Recognition. Subspace-Based Face Recognition Algorithms. Application of Face Recognition

CSED441: Introduction to Computer Vision (2017)
Lecture 10: Subspace Methods and Face Recognition
Bohyung Han, CSE, POSTECH, bhhan@postech.ac.kr

Face Recognition
Identify a person based on the appearance of the face.
[Sivic09] J. Sivic et al., "Who are you?" Learning Person Specific Classifiers from Video, CVPR 2009

Application of Face Recognition
- Authentication
- Visual surveillance
- Personal album organization
- Video management
- Tele-conferencing

Subspace-Based Face Recognition Algorithms
There are many face recognition algorithms based on subspace methods:
- Principal Component Analysis (PCA)
- Linear Discriminant Analysis (LDA)
- Independent Component Analysis (ICA)
- Local Non-negative Matrix Factorization (LNMF)
- Sparse representation by L1 minimization
- Many others

Why Use Subspace Methods?
Subspace method
- Dimensionality reduction: projecting the original, full-dimensional data onto a low-dimensional space
- Examples: PCA, LDA, ICA, NMF, ...
Reasons to use subspace methods
- Data often reside in a subspace.
- The number of data points is very small compared to the number of dimensions: curse of dimensionality.
- The data in the original dimension include too many variations and are not sufficiently general to construct a model.
- By projecting the data onto a subspace, we can obtain a general model of the data, which captures similar or discriminative characteristics among the data.

Eigenface
Eigenface: face recognition by PCA
- Project faces onto the subspace that maximizes the appearance variation of faces, obtained by eigen-decomposition.
- Compare two faces by projecting the images onto the subspace and measuring the Euclidean distance between them.
[Turk91] M. Turk and A. Pentland, Face recognition using eigenfaces, CVPR 1991

Dimensionality Reduction
Key observations
- High-dimensional images may be significantly correlated.
- It is possible to reduce the dimensionality by finding the subspace that minimizes information loss.
[Figure: data in 3D and the same data in 2D, dimensionality reduction without loss of information]

Dimensionality Reduction
One possible method: minimize the information loss when data are projected onto a low-dimensional subspace.
An example: conversion from 2D to 1D
[Figure: Projection 1 and Projection 2 of the same 2D data, with the error incurred by each projection]
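The 2D-to-1D example above can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions: the data are synthetic, and the function name `projection_error` and the two candidate directions are hypothetical choices, not taken from the slides.

```python
import numpy as np

def projection_error(points, direction):
    """Mean squared distance between 2D points and their projection onto
    the line through the data mean with the given direction."""
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)                    # unit direction
    centered = points - points.mean(axis=0)
    coords = centered @ d                        # 1D coordinates along the line
    residual = centered - np.outer(coords, d)    # what the projection throws away
    return float(np.mean(np.sum(residual ** 2, axis=1)))

# Synthetic, strongly correlated 2D data (an assumption for illustration).
rng = np.random.default_rng(0)
t = rng.normal(size=200)
points = np.column_stack([t, 0.5 * t + 0.05 * rng.normal(size=200)])

err_along = projection_error(points, [2.0, 1.0])    # roughly along the data
err_across = projection_error(points, [-1.0, 2.0])  # perpendicular to the first
print(err_along, err_across)
```

Because the two directions are orthogonal, the two errors add up to the total variance of the data, so the direction that keeps the most variance is also the one that loses the least information.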

Principal Components
Characteristics of principal components
- Identify the directions that have high variance.
- The first principal component has as high a variance as possible.
- Each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to (uncorrelated with) the preceding components.
[Figure: 2D data with the first principal component and the orthogonal second principal component]

Principal Component Analysis
- Translation of the origin
- Rotation of axes
- Drop the axes with the least amount of information

Principal Component Analysis
Concept
- An orthogonal transformation
- Conversion of correlated observations into uncorrelated variables
- Dimensionality reduction
- Minimize the loss of information = maximize the variance of data
- Provides an optimal solution if the samples are from a Gaussian distribution
Computation of PCA
- Eigenvalue decomposition of a data covariance matrix
- Singular value decomposition of a data matrix

How to Compute PCA
Given a dataset $x_i \in \mathbb{R}^m$ $(i = 1, \dots, n)$:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad (m \times 1 \text{ vector}) \]
\[ \Sigma_x = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top \qquad (m \times m \text{ matrix}) \]
\[ \Sigma_y = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})(y_i - \bar{y})^\top = W^\top \Sigma_x W \qquad (k \times k \text{ matrix}) \]
where $y_i = W^\top (x_i - \bar{x})$ and $W$ is an $m \times k$ matrix.
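The computation above (mean, covariance, eigen-decomposition, projection) can be sketched in a few lines of NumPy. This is a minimal sketch, not the lecture's implementation: the function name `pca_eig`, the row-wise data layout, and the synthetic data are assumptions.

```python
import numpy as np

def pca_eig(X, k):
    """X: (n, m) array with one sample x_i per row.
    Returns the mean (m,), W (m, k) and the k largest eigenvalues."""
    x_bar = X.mean(axis=0)
    Xc = X - x_bar
    cov = Xc.T @ Xc / X.shape[0]             # Sigma_x with the 1/n convention
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: symmetric input, ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    return x_bar, eigvecs[:, order], eigvals[order]

# Synthetic data with very different variance per axis (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.2])

x_bar, W, lam = pca_eig(X, k=2)
Y = (X - x_bar) @ W                          # rows are y_i = W^T (x_i - x_bar)
cov_y = Y.T @ Y / X.shape[0]                 # Sigma_y = W^T Sigma_x W
print(np.round(cov_y, 6))
```

The projected covariance `cov_y` comes out diagonal, which is the "correlated observations into uncorrelated variables" property listed under Concept.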

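How to Compute PCA
Objective function (one-dimensional subspace)
\[ w^* = \arg\max_{\|w\| = 1} w^\top \Sigma_x w \]
- $\Lambda$: diagonal matrix of eigenvalues (sorted in decreasing order)
- $U$: orthonormal matrix whose columns are the corresponding eigenvectors, the directions that maximize the objective function
With $\Sigma_x = U \Lambda U^\top$,
\[ w^\top \Sigma_x w = w^\top U \Lambda U^\top w = (U^\top w)^\top \Lambda (U^\top w) = v^\top \Lambda v, \qquad v = U^\top w \]
The maximizer is $v^* = [1, 0, \dots, 0]^\top$, that is, $w^* = u_1$, where $U = [u_1 \cdots u_m]$.

How to Compute PCA
Objective function (k-dimensional subspace)
\[ W^* = \arg\max_{W^\top W = I} \operatorname{tr}\left( W^\top \Sigma_x W \right) \]
\[ W^\top \Sigma_x W = W^\top U \Lambda U^\top W = (U^\top W)^\top \Lambda (U^\top W) = V^\top \Lambda V, \qquad V = U^\top W \]
The maximizer is $V^* = \begin{bmatrix} I_{k \times k} \\ 0_{(m-k) \times k} \end{bmatrix}$, that is, $W^* = U_{1:k}$.
The solution is the k eigenvectors corresponding to the k largest eigenvalues of $\Sigma_x$ if the subspace is k-dimensional.

Error in PCA
\[ W^\top \Sigma_x W = (U^\top W)^\top \Lambda (U^\top W) = V^\top \Lambda V \]
Partition $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_m)$ into a leading $k \times k$ block $\Lambda_{k \times k}$ and a trailing $(m-k) \times (m-k)$ block $\Lambda_{(m-k) \times (m-k)}$. Then
\[ V^\top \Lambda V = V_{1:k}^\top \Lambda_{k \times k} V_{1:k} + V_{k+1:m}^\top \Lambda_{(m-k) \times (m-k)} V_{k+1:m} \]
- First term: information contained in the subspace
- Second term: information lost by projection onto the subspace

Projection to Subspace
The projection onto the subspace obtained by PCA is a simple matrix-vector computation:
\[ y_i = W^\top (x_i - \bar{x}) \]
- $y_i$: $k \times 1$ vector in the subspace
- $W$: $m \times k$ matrix obtained by PCA
- $x_i$: $m \times 1$ vector in the original space
We lose some information.
[Figure: a point $x_i$ projected onto $w_1$ to give $y_i$, a 1D vector in this case; the residual is the lost information]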
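The claims above (the top eigenvector maximizes the variance objective, and the variance kept or lost by a k-dimensional subspace is a sum of eigenvalues) can be checked numerically. The following is a sketch under assumptions: `sigma_x` is a random symmetric positive semi-definite stand-in for a covariance matrix, and the random search over unit vectors is only a sanity check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))
sigma_x = A @ A.T / 5                      # stand-in covariance matrix (m = 5)

lam, U = np.linalg.eigh(sigma_x)
lam, U = lam[::-1], U[:, ::-1]             # sort eigenvalues in decreasing order

k = 2
W = U[:, :k]                               # W = U_{1:k}
retained = np.trace(W.T @ sigma_x @ W)     # lambda_1 + ... + lambda_k
lost = np.trace(sigma_x) - retained        # lambda_{k+1} + ... + lambda_m

# No random unit vector should beat u_1 on the objective w^T Sigma_x w.
trials = rng.normal(size=(1000, 5))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
best_random = max(w @ sigma_x @ w for w in trials)

print(retained, lost, best_random, lam[0])
```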

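Back-Projection to Original Space
Back-projection from the subspace to the original space is a simple algebraic computation:
\[ \hat{x}_i = W y_i = W W^\top (x_i - \bar{x}) \]
- $\hat{x}_i$: $m \times 1$ vector in the original space
- $W$: $m \times k$ matrix obtained by PCA
- $x_i$: $m \times 1$ vector in the original space
The reconstructed data differ from the original data because some information was lost during the projection.
[Figure: points $x_1, \dots, x_4$, their projections $y_i$ onto $w_1$, the reconstructions $\hat{x}_i$, and the reconstruction error]

PCA with Face Images
- Vectorize each face.
- Compute the mean and the covariance matrix.
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \Sigma_x = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top \]
- Perform eigen-decomposition of the covariance matrix and choose the first k eigenvectors.
\[ \Sigma_x = U \Lambda U^\top \approx U_{1:k} \Lambda_{1:k,1:k} U_{1:k}^\top \]

PCA with Face Images
Projection to the subspace
\[ y_i = U_{1:k}^\top (x_i - \bar{x}), \qquad U_{1:k} = [u_1 \cdots u_k] \]
\[ y_i = \left[ u_1^\top (x_i - \bar{x}),\; u_2^\top (x_i - \bar{x}),\; \dots,\; u_k^\top (x_i - \bar{x}) \right]^\top \]

PCA with Face Images
Back-projection from the subspace
\[ \hat{x}_i = U_{1:k} y_i = U_{1:k} U_{1:k}^\top (x_i - \bar{x}) = [u_1\; u_2\; \cdots\; u_k] \, [a_1\; a_2\; \cdots\; a_k]^\top = a_1 u_1 + a_2 u_2 + \cdots + a_k u_k \]
where $a_j = u_j^\top (x_i - \bar{x})$ is the j-th entry of $y_i$.
Reconstruction: $x_i \approx \hat{x}_i + \bar{x}$
[Figure: a face reconstructed as the mean face plus the eigenfaces $u_1, u_2, \dots, u_k$ weighted by $a_1, a_2, \dots, a_k$]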
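Projection, back-projection and the resulting reconstruction error can be sketched as follows. The data here are synthetic stand-ins for vectorized faces (an assumption, not real images), and the helper name `reconstruct` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-in for n = 100 vectorized faces with m = 64 pixels each.
X = rng.normal(size=(100, 6)) @ rng.normal(size=(6, 64)) \
    + 0.01 * rng.normal(size=(100, 64))

x_bar = X.mean(axis=0)
Xc = X - x_bar
lam, U = np.linalg.eigh(Xc.T @ Xc / X.shape[0])
U = U[:, ::-1]                       # columns u_1, u_2, ... by decreasing eigenvalue

def reconstruct(x, k):
    U_k = U[:, :k]
    y = U_k.T @ (x - x_bar)          # projection: y = U_{1:k}^T (x - x_bar)
    x_hat = U_k @ y                  # back-projection: a_1 u_1 + ... + a_k u_k
    return x_hat + x_bar             # x is approximated by x_hat + x_bar

ks = [1, 4, 16, 64]
errors = [float(np.mean([np.sum((x - reconstruct(x, k)) ** 2) for x in X]))
          for k in ks]
print(dict(zip(ks, errors)))
```

The error shrinks as k grows and vanishes (up to rounding) when k equals the full dimension m, since no information is discarded in that case.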

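Eigenfaces
[Figure: eigenfaces computed from the training set]

Reconstruction of Faces
[Figure: faces reconstructed from 4D, 200D and 400D subspaces]

Issues in PCA Computation
The size of the covariance matrix is too large.
\[ \Sigma_x = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top \]
This is an $m \times m$ matrix, hardly manageable since the number of pixels m is large. How can we obtain the eigenvectors?
Eigen-decomposition
\[ \Sigma_x = U \Lambda U^\top \]
- $\Sigma_x$: covariance matrix
- $\Lambda$: diagonal eigenvalue matrix
- $U$: eigenvector matrix
Use SVD instead of eigen-decomposition.

Singular Value Decomposition
Singular Value Decomposition (SVD) is more efficient when the dimensionality of the data is very high.
\[ X = U \Sigma V^\top \]
- $X$: $m \times n$ data matrix (its columns are the mean-subtracted samples)
- $U$: $m \times m$ eigenvector matrix
- $\Sigma$: $m \times n$ singular value matrix (not to be confused with the covariance matrix $\Sigma_x$)
- $V$: $n \times n$ matrix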
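A minimal sketch of the SVD route, assuming an m x n data matrix with one sample per column and far more pixels than images. The function name `pca_svd` is hypothetical, and the economy-size SVD (`full_matrices=False`) is a swapped-in convenience: it returns only the first min(m, n) columns of U rather than the full m x m matrix on the slide.

```python
import numpy as np

def pca_svd(X, k):
    """X: (m, n) data matrix, one vectorized sample per column.
    Returns the mean (m, 1), the top-k eigenvectors (m, k) and eigenvalues (k,)."""
    x_bar = X.mean(axis=1, keepdims=True)
    Xc = X - x_bar                                    # mean-subtracted columns
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False) # singular values in decreasing order
    eigvals = s ** 2 / X.shape[1]                     # eigenvalues of Sigma_x (1/n convention)
    return x_bar, U[:, :k], eigvals[:k]

rng = np.random.default_rng(3)
m, n, k = 400, 20, 5                                  # many pixels, few images
X = rng.normal(size=(m, n))
x_bar, U_k, lam = pca_svd(X, k)
print(U_k.shape, lam)
```

The covariance matrix is never formed here; the left singular vectors of the centered data matrix are the eigenvectors of $\Sigma_x$, and the squared singular values divided by n are its eigenvalues.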

Characteristics of PCA
Pros
- Non-iterative, globally optimal solution
Cons
- PCA projection is optimal for reconstruction from a low-dimensional basis, but may NOT be optimal for discrimination.
- PCA assumes that the data follow a Gaussian distribution. What if the data distribution is not Gaussian?
[Figure: a non-Gaussian data distribution with its principal axes]

Representation of Faces
Face representation by PCA
- Given a set of faces, $x_i \in \mathbb{R}^m$ $(i = 1, \dots, n)$
- Projection of faces to the subspace: $y_i = W^\top (x_i - \bar{x})$

Training for Face Recognition by PCA
- Align training images, $x_i \in \mathbb{R}^m$ $(i = 1, \dots, n)$
- Compute the average face
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
- Compute the covariance matrix
\[ \Sigma_x = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top \]

Training for Face Recognition by PCA
- Compute the eigenvectors of the covariance matrix (low-rank matrix reconstruction)
\[ \Sigma_x = U \Lambda U^\top \approx U_{1:k} \Lambda_{1:k,1:k} U_{1:k}^\top, \qquad U_{1:k} = [u_1 \cdots u_k] \]
- Compute the training projections
\[ y_i = U_{1:k}^\top (x_i - \bar{x}) \]
- Put all faces on the subspace = generate eigenfaces
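The training steps above can be collected into one function. This is a sketch under assumptions: the images are taken to be already aligned and stored as an (n, h, w) array, the name `train_eigenfaces` is hypothetical, and the eigenvectors are obtained by SVD of the centered data rather than by forming the covariance matrix. Random arrays stand in for real face images.

```python
import numpy as np

def train_eigenfaces(images, k):
    """images: (n, h, w) array of aligned grayscale faces (assumed layout).
    Returns the mean face (m,), eigenfaces U_k (m, k) and projections Y (n, k)."""
    n = images.shape[0]
    X = images.reshape(n, -1).astype(float)   # vectorize: one face per row
    mean_face = X.mean(axis=0)                # average face
    Xc = X - mean_face
    # With samples in rows, the right singular vectors of the centered data
    # are the eigenvectors of the covariance matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    U_k = Vt[:k].T                            # first k eigenvectors as columns
    Y = Xc @ U_k                              # rows are y_i = U_k^T (x_i - mean)
    return mean_face, U_k, Y

rng = np.random.default_rng(5)
images = rng.random(size=(12, 8, 8))          # placeholder data, not real faces
mean_face, U_k, Y = train_eigenfaces(images, k=5)
first_eigenface = U_k[:, 0].reshape(8, 8)     # an eigenface viewed as an image
print(U_k.shape, Y.shape, first_eigenface.shape)
```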

Recognition by Nearest Neighbor Classifier
Nearest neighbor classifier
- Compare the query with all training examples $y_1, y_2, \dots, y_n$ and assign the query image to the class of the nearest neighbor.
- No training required: all we need is a distance function for our inputs.
[Figure: a testing face between Class 1 and Class 2]
Note: You may use a different classifier.

Recognition by k-nearest Neighbor Classifier
- A simple extension of the nearest neighbor method
- Classification based on the majority label among the k nearest neighbors
- How to determine k?
[Figure: boundary for the 7 nearest neighbors of a testing face. 5 faces in Class 1 and 2 faces in Class 2: the testing face is classified as Class 1.]

Fisherface: Linear Discriminant Analysis (LDA)
- Also known as Fisher's Linear Discriminant (FLD)
- Find the subspace that maximizes the between-class scatter while minimizing the within-class scatter.
- PCA, in contrast, finds the subspace that maximizes the scatter of the training images.
Reference: Peter N. Belhumeur, João P. Hespanha, David J. Kriegman, Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection, IEEE Trans. Pattern Anal. Mach. Intell. 19(7): 711-720 (1997)

Linear Discriminant Analysis (LDA)
What is a good projection for classification?
Don't confuse it with Latent Dirichlet Allocation (LDA).
[Figure: two candidate projections of two classes, one labeled "Not good" and one labeled "Good"]
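A k-nearest-neighbor classifier over subspace coordinates can be sketched as below. The function name `knn_classify` and the toy 2D points are assumptions for illustration; the 5-versus-2 layout loosely mirrors the figure described above. Ties are not handled specially, so an odd k is a sensible choice for two classes.

```python
import numpy as np
from collections import Counter

def knn_classify(query, train_Y, train_labels, k=1):
    """Label a query by majority vote among its k nearest training projections,
    using Euclidean distance in the subspace."""
    dists = np.linalg.norm(train_Y - query, axis=1)
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy training projections: 5 samples of Class 1 and 2 samples of Class 2.
train_Y = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5],
                    [5.0, 5.0], [5.0, 6.0]])
train_labels = [1, 1, 1, 1, 1, 2, 2]
query = np.array([4.0, 4.0])

label_1nn = knn_classify(query, train_Y, train_labels, k=1)
label_7nn = knn_classify(query, train_Y, train_labels, k=7)
print(label_1nn, label_7nn)
```

With k = 1 the query takes the label of its single closest neighbor, while with k = 7 all seven training samples vote; the two answers differ here, which is why the choice of k matters.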

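PCA vs. LDA
- PCA maximizes the variance of the data.
- LDA maximizes the discriminativeness of the data.
- LDA may be a better choice for classification than PCA.
[Figure: the 1st component found by PCA and the 1st component found by LDA on the same two-class data]

Scatter
Variables
- Data: $x_i \in \mathbb{R}^m$ $(i = 1, \dots, n)$
- Classes: $c_1, c_2, \dots, c_C$, where $|c_j|$ is the number of samples in class $c_j$
Means
\[ \bar{x}_{c_j} = \frac{1}{|c_j|} \sum_{x_i \in c_j} x_i, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
Scatters
- Within-class scatter
\[ \Sigma_{c_j} = \sum_{x_i \in c_j} (x_i - \bar{x}_{c_j})(x_i - \bar{x}_{c_j})^\top, \qquad \Sigma_W = \sum_{j=1}^{C} \Sigma_{c_j} \]
- Between-class scatter
\[ \Sigma_B = \sum_{j=1}^{C} |c_j| \, (\bar{x}_{c_j} - \bar{x})(\bar{x}_{c_j} - \bar{x})^\top \]
- Total scatter
\[ \Sigma_T = \Sigma_W + \Sigma_B \]

Illustration
2-class problem
\[ \Sigma_B = \sum_{j=1}^{2} |c_j| \, (\bar{x}_{c_j} - \bar{x})(\bar{x}_{c_j} - \bar{x})^\top \qquad \text{(between-class scatter)} \]
\[ \Sigma_W = \Sigma_{c_1} + \Sigma_{c_2} \qquad \text{(within-class scatter)} \]
[Figure: two classes with their scatters $\Sigma_{c_1}$ and $\Sigma_{c_2}$ and the between-class scatter $\Sigma_B$]

Optimization
Find the subspace that
- maximizes the between-class scatter
- minimizes the within-class scatter
Mathematical formulation
- Subspace projection: $y_k = W^\top x_k$
- Scatters in the subspace: $\tilde{\Sigma}_B = W^\top \Sigma_B W$ and $\tilde{\Sigma}_W = W^\top \Sigma_W W$
\[ W^* = \arg\max_W \frac{|\tilde{\Sigma}_B|}{|\tilde{\Sigma}_W|} = \arg\max_W \frac{|W^\top \Sigma_B W|}{|W^\top \Sigma_W W|} \]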
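The scatter definitions can be checked with a short sketch. It assumes the unnormalized convention (plain sums, with the between-class term weighted by class size), under which total scatter equals within-class plus between-class scatter. The function name `scatter_matrices` and the synthetic three-class data are hypothetical.

```python
import numpy as np

def scatter_matrices(X, labels):
    """X: (n, m) samples in rows, labels: (n,) class ids.
    Returns within-class, between-class and total scatter (all m x m)."""
    x_bar = X.mean(axis=0)
    m = X.shape[1]
    S_W = np.zeros((m, m))
    S_B = np.zeros((m, m))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mean_c = Xc.mean(axis=0)
        D = Xc - mean_c
        S_W += D.T @ D                           # scatter of class c around its mean
        diff = (mean_c - x_bar)[:, None]
        S_B += Xc.shape[0] * (diff @ diff.T)     # weighted by the class size
    Dt = X - x_bar
    S_T = Dt.T @ Dt                              # scatter around the overall mean
    return S_W, S_B, S_T

rng = np.random.default_rng(6)
X = rng.normal(size=(30, 4))
labels = np.repeat([0, 1, 2], 10)
X[labels == 1] += 2.0                            # shift classes apart
X[labels == 2] -= 2.0
S_W, S_B, S_T = scatter_matrices(X, labels)
print(np.allclose(S_T, S_W + S_B))
```

Since the between-class scatter is built from C class means that are tied together by the overall mean, its rank is at most C - 1, which bounds the number of useful discriminant directions.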

Optimization
Solving the objective function
\[ w^* = \arg\max_w \frac{w^\top \Sigma_B w}{w^\top \Sigma_W w} \]
By generalized eigenvectors:
\[ \Sigma_B w_i = \lambda_i \Sigma_W w_i \quad\Longrightarrow\quad \Sigma_W^{-1} \Sigma_B w_i = \lambda_i w_i, \qquad i = 1, \dots, k \]
Find the eigenvectors of $\Sigma_W^{-1} \Sigma_B$ corresponding to the largest eigenvalues.

Face Recognition by Fisherface
Training
- Training data: given
- Compute the within-class scatters with the training data in each class.
- Compute the between-class scatter with the entire training data.
- Find the optimal projection.
Testing
- Take a query image, $x \in \mathbb{R}^m$.
- Project $x$ onto the subspace $W_{1:k} = [w_1 \cdots w_k]$ by $y = W_{1:k}^\top (x - \bar{x})$.
- Compare $y$ with all training examples $y_1, y_2, \dots, y_n$ and classify the query image by nearest neighbors or other classifiers.

Eigenfaces vs. Fisherfaces
Data: variations in lighting and facial expressions.

Eigenfaces vs. Fisherfaces
Results
[Figure: recognition results of Eigenfaces and Fisherfaces]
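The generalized eigenvector solution can be sketched on a toy two-class problem. Everything below is an assumption for illustration: the synthetic 2D classes, the function name `fisher_directions`, and in particular the requirement that the within-class scatter be invertible. For raw face images, where there are far more pixels than training samples, that matrix is singular, and the cited Fisherface paper reduces the dimension with PCA first; that step is omitted here.

```python
import numpy as np

def fisher_directions(S_W, S_B, k):
    """Top-k eigenvectors of S_W^{-1} S_B, assuming S_W is invertible."""
    M = np.linalg.solve(S_W, S_B)          # S_W^{-1} S_B without an explicit inverse
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1][:k]
    return eigvecs[:, order].real, eigvals[order].real

rng = np.random.default_rng(4)
# Two hypothetical classes: large spread along axis 0, separated along axis 1.
X1 = rng.normal(size=(100, 2)) * [3.0, 0.3]
X2 = rng.normal(size=(100, 2)) * [3.0, 0.3] + [0.0, 2.0]
X = np.vstack([X1, X2])
x_bar = X.mean(axis=0)

S_W = np.zeros((2, 2))
S_B = np.zeros((2, 2))
for Xc in (X1, X2):
    mean_c = Xc.mean(axis=0)
    S_W += (Xc - mean_c).T @ (Xc - mean_c)
    d = (mean_c - x_bar)[:, None]
    S_B += len(Xc) * (d @ d.T)

W, lam = fisher_directions(S_W, S_B, k=1)
w = W[:, 0]                                # Fisher direction

# First principal component of the same data, for comparison.
evals, evecs = np.linalg.eigh((X - x_bar).T @ (X - x_bar))
pc1 = evecs[:, -1]
print("LDA direction:", w, "PCA direction:", pc1)
```

On this data the Fisher direction lines up with the axis that separates the classes, while the first principal component follows the axis of largest spread, which carries no class information. A query would then be projected with `w` and classified by nearest neighbors as in the Testing steps.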

Challenges in Face Recognition
A lot of potential variations in
- Poses
- Lighting conditions
- Viewpoint
- Facial expressions
Image registration issues
Handling many people

Performance of Recent Algorithms
http://vis-www.cs.umass.edu/lfw/