Eigenimaging for Facial Recognition

Aaron Kosmatin, Clayton Broman
December 2, 21

Abstract

The interest of this paper is Principal Component Analysis (PCA), specifically its application to facial recognition, known as eigenfaces. PCA will be used to create a face space into which images can be projected and known faces recognized. Facial recognition has many applications. It can be used for security, verifying that a person requesting access has clearance. It can be used in surveillance, picking a person of interest out of a crowd. The same techniques also apply to differentiating other classes of objects. The camera industry currently markets digital cameras with facial recognition software that works in real time to adjust focus and highlight people's faces.

There are three main approaches to facial recognition: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Elastic Bunch Graph

Mapping (EBGM). LDA uses multiple images of the same person with different facial expressions to create a face class that belongs to a specific person. EBGM uses 3-dimensional analysis to create a wireframe of the face and uses that wireframe to recognize faces [3]. For the purposes of this project, PCA will be used. The approach was developed by Kirby and Sirovich [5]. This approach uses several images of different faces to create an average face. The differences of each face from the average face are used to create a vector representing the face in face space. Mathematically, PCA treats every image of the training set as a vector in a very high dimensional space. The eigenvectors of the covariance matrix of these vectors capture the variation among the face images [2].

A sample of 11 pictures from 15 different people will be used. Some of the pictures from each person will be used to create a training matrix, Γ. Using Γ, a program can be trained to project images into face space, assigning a vector to each image it is given. This vector can be saved for each person, creating a database composed of the persons and their associated vectors. A picture not included in Γ can be projected into face space to create its own vector, which can then be compared against the database of vectors to find the identity of the person.

Principal Component Analysis is a way of identifying patterns in data, and expressing the data in such a way as to highlight similarities and differences [6]. For the purpose of facial recognition, the patterns correspond to facial features that vary among people, e.g., the width of the nose, the distance between nose and mouth, and the distance between the eyes. The point of this section is to introduce PCA before applying it to the correlation, compression, and identification of an image. The majority of the steps taken in this introduction to PCA will be replicated, with only minor alterations, for the eigenface recognition algorithm.

A good way to understand PCA is to start in two dimensions so it can be represented visually. For the purpose of this demonstration, the representative dimensions will be X and Y, each containing a dataset of points in two dimensional space, Figure 1a. To begin, the covariance of the two datasets must be found. Covariance is related to variance, which measures how spread out data is and is defined as the standard deviation squared, $s^2$. While variance is only useful for a single dimension, covariance measures how the deviations of one dimension's dataset relate to those of another. Finding the covariance begins by calculating the average of each dimension of the dataset. Next, the average of each dimension is subtracted from the points in the respective dimension. This yields a new dataset comprised of the deviations from the mean in each dimension, $X - \bar{X}$ and $Y - \bar{Y}$, which will henceforth be known as the adjusted dataset. This step centers the data around the origin, see Figure 1b. The standard deviations of X and Y can be found with:

$$s_X = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}}, \qquad s_Y = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}{n-1}}$$

The standard deviation can be used to find both the variance and covariance of X and Y.
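To make the centering step concrete, here is a minimal NumPy sketch. The random sample arrays and variable names are illustrative stand-ins for the X and Y datasets of Figure 1, not data from the original demonstration.

```python
import numpy as np

# Hypothetical 2-D sample data standing in for the X and Y datasets in Figure 1a.
rng = np.random.default_rng(0)
X = rng.normal(5.0, 2.0, size=50)
Y = 0.8 * X + rng.normal(0.0, 0.5, size=50)

# Subtract each dimension's mean to form the adjusted (centered) dataset.
X_adj = X - X.mean()
Y_adj = Y - Y.mean()

# Sample standard deviations, matching the formulas above (n - 1 in the denominator).
s_X = np.sqrt(np.sum(X_adj**2) / (len(X) - 1))
s_Y = np.sqrt(np.sum(Y_adj**2) / (len(Y) - 1))
print(s_X, s_Y)  # equal to np.std(X, ddof=1) and np.std(Y, ddof=1)
```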

Figure 1: The original data (a) and the adjusted dataset (b).

The covariance of X and Y is:

$$\mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$$

The variance of one dimension is simply cov(X, X):

$$\mathrm{cov}(X, X) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})}{n-1} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1} = s_X^2$$

The covariance is a measure of how much the data in one dimension varies from that dimension's mean with respect to how much another dimension's data varies from its mean. If the value is positive, the two datasets increase together; if negative, one dataset increases while the other decreases. A covariance of zero means that the datasets do not vary with respect to one another and the two dimensions are uncorrelated. By putting the adjusted datasets, $X - \bar{X}$ and $Y - \bar{Y}$, into the columns of a matrix, the covariance matrix is born:

$$C = A^T A \quad \text{where} \quad A = [X - \bar{X},\; Y - \bar{Y}]$$

(The factor of $1/(n-1)$ is omitted here; it only rescales the eigenvalues and does not change the eigenvectors.) In this case the matrix has only two dimensions, but this technique can compute $N$ covariances at a time: as the dimension increases to $\mathbb{R}^N$, the matrix $A$ takes the shape $A = [X_1 - \bar{X}_1,\; X_2 - \bar{X}_2,\; \ldots,\; X_N - \bar{X}_N]$.
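A short sketch of this construction, continuing the illustrative X and Y arrays from the previous snippet; the $1/(n-1)$ scaling is included so the result can be checked against NumPy's built-in covariance routine.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(5.0, 2.0, size=50)
Y = 0.8 * X + rng.normal(0.0, 0.5, size=50)

# Columns of A are the mean-adjusted datasets.
A = np.column_stack([X - X.mean(), Y - Y.mean()])

# Covariance matrix C = A^T A (scaled by 1/(n-1) to match the usual definition).
C = A.T @ A / (len(X) - 1)

print(C)                       # symmetric, variances on the diagonal
print(np.cov(X, Y, ddof=1))    # NumPy's built-in gives the same matrix
```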

Figure 2: The adjusted dataset with its eigenvectors; the dominant eigenvector is labeled.

There are some important things to notice about a covariance matrix. First of all, since cov(X, Y) = cov(Y, X), the matrix is symmetric across the main diagonal. Also, each value along the main diagonal is the variance of a dimension, so the covariance matrix is always square and symmetric. The dominant eigenvector creates a line of best fit through the adjusted data, and the other eigenvectors are orthogonal to it. The eigenvectors will be contained in a matrix V. By observing Figure 2, it can be seen that the eigenvectors represent a line of best fit and a line perpendicular to it. Notice also that the number of eigenvectors corresponds to the number of dimensions the datasets occupy and is independent of the number of data entries in each dimension. Any point from the adjusted dataset can be represented exactly as a linear combination of these eigenvectors. As the number of dimensions increases to $\mathbb{R}^3$ and beyond, the additional eigenvectors will also be mutually perpendicular.

The largest eigenvalues of the covariance matrix correspond to eigenvectors that point in the directions of greatest variance in the data. These eigenvectors are the principal eigenvectors, or principal components. It is possible to represent the data very closely with relatively few principal eigenvectors and not lose much information about the original data. The principal eigenvector in the case of the sample dataset is the vector that passes through the bulk of the data points and is aptly labeled the dominant eigenvector in Figure 2. Since the second eigenvector contributes so little to the position of the adjusted data, a good representation of the data can be given in one dimension. This concept is at the heart of PCA.

The last thing PCA does to the dataset is express it in terms of the eigenvectors of the covariance matrix:

$$\text{NewData} = V^T A^T$$

The new data is the product of the transposed eigenvector matrix and the transposed adjusted dataset. This shifts the axes of the dataset to fit more naturally with the data. In the event only one principal eigenvector is chosen to represent the data (to perhaps decrease computing time), the data will be collapsed onto that principal eigenvector. Figure 3 plots the adjusted data (stars) alongside the adjusted data expressed in terms of the dominant eigenvector alone (circles). It is important to note that the last steps assumed the eigenvectors of the covariance matrix were normalized, so that multiplying by the adjusted dataset does not distort the data; most computing packages normalize the eigenvectors automatically. Unit eigenvectors are also important for retrieving the original dataset, since then $V^{-1} = V^T$, which simplifies the calculations greatly. The original dataset can be brought back easily by reversing the process just described, that is:

$$\text{OriginalData}_X = (V\,\text{NewData})_X + \bar{X}, \qquad \text{OriginalData}_Y = (V\,\text{NewData})_Y + \bar{Y}$$
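The projection and recovery steps can be sketched as follows, again using the illustrative 2-D arrays. The use of np.linalg.eigh here is an implementation choice of this sketch (it returns normalized eigenvectors of a symmetric matrix), not something specified in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(5.0, 2.0, size=50)
Y = 0.8 * X + rng.normal(0.0, 0.5, size=50)

A = np.column_stack([X - X.mean(), Y - Y.mean()])   # adjusted dataset
C = A.T @ A / (len(X) - 1)                           # 2x2 covariance matrix

# eigh returns normalized eigenvectors (columns of V) sorted by ascending eigenvalue.
eigvals, V = np.linalg.eigh(C)
dominant = V[:, [-1]]                                # principal (dominant) eigenvector

new_data = V.T @ A.T                                 # NewData = V^T A^T
compressed = dominant.T @ A.T                        # keep only the dominant component

# Reconstruct from the single component and add the means back.
restored = (dominant @ compressed).T + [X.mean(), Y.mean()]
print(restored[:5])                                  # close to the original (X, Y) pairs
```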

Figure 3: The adjusted dataset in terms of the dominant eigenvector.

Figure 4: The original dataset alongside the dataset restored from the dominant eigenvector.

Figure 4 plots the original data (stars) alongside the data retrieved using only the single principal eigenvector (circles). Some information about the original data was lost, but the general relation of the points remains.

The first step in identifying the faces is to create the training matrix. For the purpose of this paper, a set of images from Yale will be used. The set consists of 11 images each from 15 different persons, all 320 × 243 pixels. The faces are in 8-bit greyscale, allowing for $2^8$, or 256, unique values per pixel. Each image is converted to a $(320 \cdot 243) \times 1$ vector ($77{,}760 \times 1$), $\Gamma_i$. In this paper, the number of pixels, 77,760, will be referred to as N; the number of images in the training set will be referred to as M. The training matrix is created by grouping the vectors into a matrix such that:

$$\Gamma = [\Gamma_1, \Gamma_2, \ldots, \Gamma_M]$$

It is important to note that the values in the rows are not randomly distributed. Faces, being generally alike in overall structure, will have greyscale variations in roughly the same areas, corresponding to features such as eyes, cheeks, and lips. PCA can be used to find the areas of highest variance.

The first step in identification is to take the average of the columns of Γ to create the vector Ψ:

$$\Psi = \frac{1}{M}\sum_{i=1}^{M}\Gamma_i$$

Ψ, shown in Figure 5, is analogous to $\bar{X}$ and $\bar{Y}$ in the previous example: it contains the pixel-wise averages in a single vector. After Ψ has been created, a new $N \times M$ matrix, Φ, is created from the difference of the columns of Γ and Ψ.
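A possible sketch of building Γ, Ψ, and Φ with NumPy and Pillow. The directory name, file pattern, and image format are assumptions for illustration, not details given in the paper.

```python
import numpy as np
from PIL import Image
from pathlib import Path

# Hypothetical layout: training images stored as greyscale files under ./training/.
# The 320x243 size matches the Yale images described above; adjust as needed.
image_paths = sorted(Path("training").glob("*.pgm"))

# Each image becomes one column of the N x M training matrix Gamma.
columns = []
for path in image_paths:
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)  # 8-bit greyscale
    columns.append(img.reshape(-1))        # flatten to a length-N vector (N = 320*243)

Gamma = np.column_stack(columns)           # shape (N, M)
Psi = Gamma.mean(axis=1)                   # average face, shape (N,)
Phi = Gamma - Psi[:, np.newaxis]           # difference of each column from the average
```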

Figure 5: The average face, Ψ.

$$\Phi_i = \Gamma_i - \Psi, \qquad i = 1, \ldots, M$$

Each column of Φ contains the difference between the corresponding original image and the average of the images. The next step in PCA is to find the eigenvalues and eigenvectors of the covariance matrix. The covariance matrix, C, is usually calculated as:

$$C = A^T A$$

The normal convention has the related data points running down the columns of A. In our case, however, the related data points are the pixels, which run across the rows. As a result, $A = \Phi^T$. This implies:

$$C = (\Phi^T)^T \Phi^T = \Phi\Phi^T$$

For facial recognition, the number of pixels is usually much larger than the number of images. In this case, the images are 320 × 243 pixels, making the covariance matrix 77,760 × 77,760 in size. This is a prohibitively large computation even with computer automation. The non-trivial eigenvalues and eigenvectors of the covariance matrix can be found another way. As stated by Sirovich and Kirby, "If the number in the ensemble M is less than the dimension of C, then C is singular and cannot be of order greater than M" [5]. The rank of the covariance matrix is at most M: although the covariance matrix is N × N, it was composed from an N × M matrix, which implies that many of its columns are linear combinations of other columns. Sirovich and Kirby developed a method to find the eigenvectors of C without computing them from C itself, by starting with the eigenvectors of the inner product matrix:

$$\Phi^T \Phi\, v_i = \lambda_i v_i$$

they then multiply both sides on the left by Φ, giving:

$$\Phi\Phi^T \Phi v_i = \lambda_i \Phi v_i$$
$$C\,(\Phi v_i) = \lambda_i\,(\Phi v_i)$$

From this they found that the eigenvectors of the covariance matrix can be obtained by multiplying the eigenvectors of the $\Phi^T\Phi$ matrix on the left by Φ. This simplifies the process considerably, since $\Phi^T\Phi$ is only M × M and its eigenvalues are much easier to find. The remaining eigenvalues of $\Phi\Phi^T$ are zero; their eigenvectors span the null space and can be ignored for the purpose of facial recognition. After multiplying the eigenvectors of $\Phi^T\Phi$ on the left by Φ, the vectors are no longer normalized. They can easily be renormalized by dividing each by its length. This allows us to replace $V^{-1}$ with $V^T$ and simplifies some calculations.

Each column of Φ can be written as a linear combination of eigenvectors of the covariance matrix. Since the eigenvectors and the columns $\Phi_i$ are known, the specific linear combination can be found:

$$\Phi_i = c_1 v_1 + c_2 v_2 + \cdots + c_M v_M = V\,[c_1, c_2, \ldots, c_M]_i^T$$

This implies:

$$[c_1, c_2, \ldots, c_M]_i^T = V^T \Phi_i$$
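The small-matrix trick might look like the following sketch. The matrix sizes and the random stand-in for Φ are illustrative so the snippet runs on its own; in practice Φ would come from the training images as built above.

```python
import numpy as np

# Continuing the earlier sketch: Phi is the N x M matrix of mean-subtracted images.
# A small random stand-in is used here so the snippet is self-contained.
N, M = 77760, 30                      # illustrative sizes (pixels, training images)
rng = np.random.default_rng(1)
Phi = rng.standard_normal((N, M))

# Eigenvectors of the small M x M inner-product matrix Phi^T Phi.
small = Phi.T @ Phi                   # M x M, cheap compared to the N x N matrix
eigvals, v = np.linalg.eigh(small)

# Map them up to eigenvectors of C = Phi Phi^T and renormalize to unit length.
V = Phi @ v                           # N x M
V /= np.linalg.norm(V, axis=0)

# Coefficients of each training face in face space: c_i = V^T Phi_i.
coeffs = V.T @ Phi                    # M x M, column i holds [c_1, ..., c_M]_i
```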

The vector of coefficients, $[c_1, c_2, \ldots, c_M]_i$, written $c_i$, represents the projection of $\Phi_i$ into face space; $c_i$ corresponds to a point in face space.

To identify new faces not included in the training set, each new face is projected into face space, creating a representation of the face as a linear combination of the eigenvectors of the covariance matrix. First the new images are placed into a matrix, $\Gamma'$, converting them to columns as was done for the images in the original Γ training set. Next the original Ψ vector is subtracted, such that $\Phi'_k = \Gamma'_k - \Psi$. Then $\Phi'$ is projected onto face space with $V^T \Phi'$. This gives the linear combination, $c'$, in terms of the eigenvectors in V.

The last step is to identify the closest facial match, which is done by treating each face as a point in face space. The distance between faces is the Euclidean distance between them:

$$\sqrt{(c_{1i} - c'_1)^2 + (c_{2i} - c'_2)^2 + \cdots + (c_{Mi} - c'_M)^2}$$

The stored point $c_i$ that is closest to $c'$ is recognized as the same face.

The faces that have been projected into face space can be reconstructed; the reconstruction is a close approximation of the original face. The equation to reconstruct face k is $V c_k + \Psi$. Turk and Pentland found that the reconstructed face differed from the original face by about 2%. To make some of the computations faster, the eigenvectors can be sorted in descending order of the magnitude of their respective eigenvalues.
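A sketch of the identification and reconstruction steps under the same conventions. The random stand-ins for V, Ψ, the training coefficients, and the probe image are placeholders so the snippet is self-contained, not values from the paper.

```python
import numpy as np

# Stand-ins for quantities built in the earlier sketches: V (N x M eigenfaces),
# Psi (mean face), coeffs (training projections), and a probe image vector.
N, M = 77760, 30
rng = np.random.default_rng(2)
V = np.linalg.qr(rng.standard_normal((N, M)))[0]   # orthonormal columns as stand-in eigenfaces
Psi = rng.uniform(0, 255, N)                        # stand-in average face
coeffs = rng.standard_normal((M, M))                # stand-in training coefficients (one column per face)
Gamma_new = rng.uniform(0, 255, N)                  # stand-in probe image vector

# Project the probe image into face space: c' = V^T (Gamma' - Psi).
c_new = V.T @ (Gamma_new - Psi)

# Euclidean distance to every training face; the closest one is the match.
dists = np.linalg.norm(coeffs - c_new[:, np.newaxis], axis=0)
match = int(np.argmin(dists))

# Reconstruction of the probe as V c' + Psi, a close approximation of the original.
reconstructed = V @ c_new + Psi
```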

Figure 6: A reconstructed face.

Some of the eigenvectors contribute less to the faces than others; these can be omitted without losing too much information. In the Turk and Pentland study, about 4 images were needed to train the algorithm. The number of eigenvectors kept can be adjusted to balance computing requirements against accuracy.

The program identifies an input face as the face closest to it in face space. If the program is running on an insufficient basis, if the input image varies too much from its corresponding training set images, or if the input image was not part of the original training set, the possibility of a false positive arises. A false positive occurs when the program recognizes the wrong person. One way to confront this error is to check the distances between points in face space while identifying faces: if the distance between the new face and every previously stored face is too large, the program could add it to the training set as a new person; if the distance is too large to be certain of an identity but not large enough to be certain of a new face, the program could simply decline to identify that face (a sketch of this thresholding appears after the summary below).

Principal Component Analysis is an effective way to identify faces. It finds the areas of highest variance, creates a subspace, face space, from the pixels according to that variance, and places the images in that subspace. Many of the parameters of a program, such as the number of eigenvectors used and the size of the training set, can be adjusted to match accuracy needs with computation limitations.
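One way the eigenvector truncation and the distance thresholds described above could be sketched is shown below. The function names and the threshold values are hypothetical and would need to be tuned on real data.

```python
import numpy as np

def top_k_eigenfaces(V, eigvals, k):
    """Keep only the k eigenvectors with the largest eigenvalues (sorted descending)."""
    order = np.argsort(eigvals)[::-1][:k]
    return V[:, order]

def identify(c_new, database, theta_same=2500.0, theta_new=5000.0):
    """Match a projected face against stored coefficient vectors.

    c_new    -- coefficients of the new face in face space
    database -- dict mapping person name -> stored coefficient vector
    The two thresholds are illustrative placeholders.
    """
    # Euclidean distance from the new face to every stored face.
    distances = {name: np.linalg.norm(c_new - c) for name, c in database.items()}
    best_name = min(distances, key=distances.get)
    best_dist = distances[best_name]

    if best_dist < theta_same:
        return best_name            # close enough to a stored face: recognized
    if best_dist > theta_new:
        return "new person"         # far from every stored face: could be enrolled as new
    return "unknown"                # in between: decline to identify
```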

Figure 7: The finished program, using a graphical interface to control parameters and view images.

References

[1] H. Ganta, P. Tejwani. Face Recognition using Eigenfaces. Clemson University, 24.

[2] NSTC. Face Recognition. National Science and Technology Council, 26. http://www.biometrics.gov/documents/facerec.pdfl.

[3] K. Josic. Face Recognition. The Engines of Our Ingenuity, 23. http://www.uh.edu/engines/epi244.htm.

[4] M. Turk, A. Pentland. Eigenfaces for Recognition. Journal of Cognitive Neuroscience, 1991.

[5] L. Sirovich, M. Kirby. Low-Dimensional Procedure for the Characterization of Human Faces. Division of Applied Mathematics, Brown University, Providence, Rhode Island, 1986.

[6] L. Smith. A Tutorial on Principal Component Analysis. Cornell University, 22.

[7] D. Pissarenko. Eigenface-based Facial Recognition. 23.