Face Detection and Recognition. Charlotte Horgan


April 7, 2011

Abstract

This report looks at some aspects of the face recognition problem applied to three-dimensional range data. It considers the statistical approach of principal component analysis as a method of data reduction and face detection and recognition. It then looks at selected methods for finding faces in a scene, extracting the region of interest and, finally, two approaches for comparing facial surfaces via the shapes of level curves.

Declaration

This piece of work is a result of my own work except where it forms an assessment based on group project work. In the case of a group project, the work has been prepared in collaboration with other members of the group. Material from the work of others not involved in the project has been acknowledged and quotations and paraphrases suitably indicated.

Acknowledgments

Many thanks to Dr. Kasper Peeters and Dr. Ian Jermyn for their help and guidance.

Contents

1 Introduction
  1.1 What is Face Recognition?
  1.2 Why is Face Recognition Important?
  1.3 Why use 3D Range Data?
  1.4 Report Outline
2 Principal Component Analysis
  2.1 Motivation
  2.2 Covariance
  2.3 Principal Component Analysis Method
  2.4 Example of PCA in Two Dimensions
  2.5 Application to Face Detection and Recognition
  2.6 Using PCA to Classify Images
  2.7 Summary
3 Geometry of Surfaces
  3.1 Surfaces
  3.2 Fundamental Forms
    3.2.1 First Fundamental Form
    3.2.2 Second Fundamental Form
    3.2.3 Third Fundamental Form
  3.3 Gaussian and Mean Curvature
  3.4 HK Classification
  3.5 Isometries
    3.5.1 Theorema Egregium
  3.6 Summary
4 Finding Faces in a Scene
  Motivation for Choice of Eyes and Nose to Indicate the Presence of a Face
  HK Segmentation Method
  Estimating Partial Derivatives and Curvature
    Angle Deficit Method
    Spherical Image Method
    Local Method for Calculating Partial Derivatives
  Further Work
  Summary
5 Determining the Region of Interest
  Definitions
  Newton's Minimisation Approach
  Method to Compute Point-to-Parametric-Surface Distance
  Quaternions
    Multiplication of Quaternions
    Products of Quaternions
    Dot Products Between Quaternions
    Useful Properties of Quaternions
    Describing Rotation with Unit Quaternions
    Example: Using Quaternions to Represent Rotation
    Benefits of Using Quaternions Over Other Representations
  Finding the Optimal Rotation for Alignment
  Summary of Finding Optimal Rotation
  Iterative Closest Point
  Finding the Region of Interest
  Summary
6 Face Recognition Using Facial Curves
  The Shooting Method
    Constructing Geodesics Between Curves
    Geodesics on the Preshape Space
    Geodesics on the Shape Space
    Shooting Method Problem Statement
  Path-Shortening Method
    Path-Shortening Flows on the Preshape Space
    Path-Shortening Summary
    Geodesics on Shape Space
  Application to Face Recognition
    Illustration of Number of Curves Defining a Face
  Further Work
7 Conclusion
  Summary
  Further Work
A Covariance Calculation
B Finding the Tangent Space of the Preshape Space

Chapter 1: Introduction

1.1 What is Face Recognition?

Face recognition can be considered a computer vision problem; in particular, it is a type of pattern recognition. The human brain automatically processes input from the eyes and identifies the presence of objects in a scene, along with information about their location, orientation and identity. It is desirable to find a method that enables a computer to compute this information without human input. The problem can be split into the problem of finding a face in a scene (detection) and the problem of finding the identity of a face (identification or recognition). There are two main data types that can be considered: intensity data and range data. Intensity data assigns, at each point, a value that corresponds to the intensity at that point; range data assigns a depth value. This project will focus on range data.

1.2 Why is Face Recognition Important?

Face recognition has many applications, including border control, passports, law enforcement and the auto-focus capabilities of digital cameras. These applications have different requirements; for example, auto-focus would predominantly require the ability to locate multiple faces in a scene, whereas border control needs to find the identity of a face from a known group. Many of these applications require data to be acquired without explicit consent, that is, the images are not necessarily obtained under the same conditions, so the detection and recognition algorithms must be robust to changes in pose, illumination and expression, as well as dealing with occlusions from clothing and hair. Much international interest has been placed in the utilisation of automatic face recognition as a method for identification, which led to several major studies including the

Face Recognition Vendor Test [7] and the Face Recognition Grand Challenge [6]. With continued investment and study, face recognition is certainly a viable method for rigorous identification.

1.3 Why use 3D Range Data?

The application of range data to the face recognition problem is a relatively recent attempt at overcoming some of the failings of intensity data. The results of the 2002 Face Recognition Vendor Test [7] revealed that automatic face recognition could be improved by considering range data in addition to intensity data. This led to the target of improving recognition performance by an order of magnitude, as detailed in the Face Recognition Grand Challenge in 2005 [6]. It aimed to evaluate the performance of different face recognition techniques on real-life, large-scale databases which included large variation in pose, expression and illumination; all factors that traditional intensity-only-based techniques struggled with. By using range data more of the 3D geometry information is preserved, so there are more clues for handling changes in expression while still arriving at a correct identification [3]. Also, variation in lighting conditions affects range data to a lesser extent than intensity data, since range data gives a description of the shape and not the reflectance of the surface. However, it should be noted that some techniques used for obtaining range data are somewhat sensitive to illumination, such as structured light, in which deviations from a known pattern of light provide information about target objects [4]. Another advantage is the ease with which faces may be located and orientation information obtained, so that different faces may be aligned automatically for comparison. Whilst 3D range databases aren't as vast and comprehensive as intensity databases for faces, advances in technology mean that high resolution information can be obtained at a decreasing cost.
1.4 Report Outline

This report begins by looking at a classic statistical technique for dimensionality reduction and face detection and recognition: principal component analysis projects each face image to a point in "face space" by selecting the components which best describe variation among known faces. The "faceness" of novel images is tested by projecting into the face space and measuring the distance between this point and known face projections. This technique was first used on 2D intensity images but may also be applied to 3D range images and used in conjunction with other techniques as a simple method

for face detection, recognition and data reduction. In Chapter 3 the basic mathematical tools for understanding range data as a surface are established; in particular, a rigorous definition of a surface and explanations of several intrinsic and extrinsic quantities describing the shape and embedding of the surface in ambient space. This chapter is largely theoretical and may be skipped if the reader has an understanding of differential geometry. Chapter 4 looks at a method for finding faces in a scene by using a technique called HK classification, which segments the range image into regions based on the sign of the mean and Gaussian curvature at each point. From this the eyes and nose are located by identifying the most likely candidate regions and filtering to ensure that the arrangement of features is appropriate for a face. Once a face has been located in the scene, a frame of reference may be established so that different faces can be compared. An important mathematical tool that is also discussed in this chapter is the estimation of partial derivatives of a discretely sampled smooth surface, and a local method for their determination is discussed. In order for different faces to be compared it is useful to have all faces the same size and orientation, so that direct comparisons may be made using techniques such as principal component analysis. Chapter 5 looks at a procedure for aligning, or registering, different surfaces via scaling, rotating and translating. This can be used to align two different images or as a step in finding the region of interest of a single image. The region of interest of a facial surface can be thought of as the intersection of the surface and a sphere centered at the nose tip. The nose tip is found by aligning a face with its mirror image and then comparing points to find the central profile; the nose tip is the point of maximal distance from a line segment connecting the end points of the central profile.
Finally, in Chapter 6 the problem of face recognition is addressed using geodesics between facial curves. Two methods for the construction of geodesics, the shooting method and the path-straightening method, are explained and compared.

Chapter 2: Principal Component Analysis

2.1 Motivation

Principal Component Analysis (PCA) is a method for reducing the dimensionality of data by selecting those components which provide the most useful information and describing the data in terms of those components. This is important since it greatly reduces computational demands, yet still produces useful, workable data. The idea is to identify those directions in which the greatest change occurs; these are found from the covariance matrix, which provides a measure of the degree of linear relationship between two variables. The direction in which the variance is maximised is the first principal component, the orthogonal direction in which there is the second greatest variance is the second principal component, and so on. These vectors are then assembled into a matrix which has the effect of mapping the initial data to a lower dimensional space, on which further analysis is more easily carried out.

2.2 Covariance

An important tool in Principal Component Analysis is covariance. The following is adapted from Lindsay I. Smith [31]. Covariance is a measure of how much the dimensions of a data set vary from the mean with respect to one another. The covariance between two components of a data set X and Y is given by

cov(X, Y) = Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ) / (n − 1)

where n is the number of elements and X̄ is the mean value of X, similarly for Ȳ. The sign of the covariance reveals the nature of the relationship between X and Y: a positive value indicates that X and Y increase together, a negative value indicates that as one increases the other decreases, and a value of zero indicates that

the variables are uncorrelated. If there are more than two dimensions, say n, then one can write down a symmetric¹ matrix that covers all possible covariances:

C = (c_{i,j}),  c_{i,j} = cov(Dim_i, Dim_j)

where i, j = 1, ..., n, C is an n × n matrix and Dim_k is the k-th dimension. For example, in three dimensions (n = 3) we have

C = ( cov(x, x)  cov(x, y)  cov(x, z)
      cov(y, x)  cov(y, y)  cov(y, z)
      cov(z, x)  cov(z, y)  cov(z, z) ).

Note that the covariance between X and itself is just the variance:

var(X) = Σ_{i=1}^{n} (X_i − X̄)(X_i − X̄) / (n − 1).

Now that a notion of covariance has been established we can look at the method of principal component analysis.

2.3 Principal Component Analysis Method

A brief outline of the procedure for determining the principal components is described below, and then a simple example is given to demonstrate the concepts. Given an ensemble of vectors Γ_1, Γ_2, ..., Γ_M of size N, find the average of the set by calculating the mean value Ψ:

Ψ = (1/M) Σ_{n=1}^{M} Γ_n

then subtract this from each vector to give the difference vectors:

Φ_i = Γ_i − Ψ.

We seek a set of M orthonormal vectors to act as a basis for our new space, onto which the vectors will be projected. These can be found from the covariance matrix:

C = (1/M) Σ_{n=1}^{M} Φ_n Φ_n^T ∈ R^{N×N} = A A^T

¹ since cov(X, Y) = cov(Y, X).

where A = [Φ_1 Φ_2 ... Φ_M] ∈ R^{N×M}. Calculating the eigenvectors of C directly is computationally costly for large N; however, one may consider the smaller matrix A^T A ∈ R^{M×M}. Consider the eigenvectors v_i of A^T A such that

A^T A v_i = µ_i v_i

where µ_i are the corresponding eigenvalues. Premultiplying both sides by A gives

A A^T (A v_i) = µ_i (A v_i),  i.e.  C (A v_i) = µ_i (A v_i).

So the A v_i are eigenvectors of C. Let L = A^T A ∈ R^{M×M}, L_{mn} = Φ_m^T Φ_n. Find the M eigenvectors v_l of L in the usual way, i.e. diagonalise and read off the solutions. Discard the eigenvectors corresponding to eigenvalues of zero, since these correspond to directions in which there is no variation and so provide no useful information. From this the eigenvectors of C are given by

u_l = A v_l = Σ_{k=1}^{M} v_{lk} Φ_k,  l = 1, ..., M.

The M eigenvectors of L determine linear combinations of the initial difference vectors, and when applied to face data these are referred to as eigenfaces.

2.4 Example of PCA in Two Dimensions

Here we will consider an example that has two dimensions so that each step may be visualised via a scatter diagram. This method may, of course, be used for an arbitrary number of dimensions. We begin with 15 data points for X and Y as shown in Table 2.1. Firstly, subtract the mean value from each data point to give Table 2.2, i.e. X_i − X̄ and Y_i − Ȳ, i = 1, ..., n. This centres the data and ensures we obtain the correct principal components (not one that is directed towards the mean). This may be illustrated as a scatter diagram, Figure 2.1, and the centering procedure is easily identified from the mean-adjusted data in Figure 2.2. Secondly, calculate the covariance matrix. The calculation may be found in the table in Appendix A.1 and results in the following covariance matrix:

13 Figure.1: Initial Data. Figure.: Mean Adjusted Data. 11

14 X Y Table.1: Initial Data X Y Table.: Mean Adjusted Data C = ( Note that the sign of the non-diagonal elements indicates positive or negative correlation between the variables. Since the values here are positive would expect to see positive correlation between these variables, which indeed is true. Thirdly, calculate the unit eigenvectors and eigenvalues by diagonalising the covariance matrix. Here we find the eigenvalues to be ( ) ). and the associated unit eigenvectors ( These eigenvectors may be plotted on the mean adjusted data to illustrate that two orthogonal principal components have been found, Figure.3. The component with an associated eigenvalue of 11.7 is the first principal component and describes the direction in which there is most variation within the data. Next, order the eigenvectors as columns in a matrix from highest to lowest eigenvalue. If one wishes to perform data reduction then simply eliminating the eigenvectors with the lowest associated eigenvalues and projecting the original data will achieve this. If ). 1

15 Figure.3: Mean Adjusted Data with Eigenvectors. there are any eigenvectors with zero eigenvalues then these may be eliminated without loss of information, since the zero eigenvalue corresponds to zero covariance between those variables. We will first consider the case where both eigenvectors are retained and then when only one is kept. Ordering the eigenvectors gives ( Projecting the original mean-adjusted data with these eigenvectors has the effect of rotating the data so that the principal components become the axes. This is achieved by multiplying the transpose of the ordered eigenvector matrix by the mean-adjusted data arranged into columns in a matrix. This gives transformed data in terms of two new variables X and Y as shown in Table.3 and is illustrated by Figure.4 If only the first eigenvector is retained and the initial data projected, we find that only the change in the principal direction is retained, as shown in Figure.5..5 Application to Face Detection and Recognition The notion of using PCA in the problem of face recognition was first developed by Sirovich and Kirby in 1987 [30]. This was then implemented by Turk and Pentland in 1991 [33], and despite its apparent simplicity remains one of the most effective methods for the detection and recognition of faces. This method is well suited to the 13 ).

16 X Y Table.3: Transformed Data. Figure.4: Transformed Data. 14

17 Figure.5: Reduced Transformed Data. problem since the data from both intensity and range images of faces is described in extremely high dimensions, and so it is not viable to directly compare face images. PCA has been successfully applied to range data [1] and largely follows the same method as outlined in the example. Once one has obtained range data for a set of face images, it is necessary to apply some pre-processing before PCA is used. In particular, the salient features should be aligned so that, for example, the nose tip, is in the same position in each image; this ensures that the differences between the faces are due to variances in the shape of the facial surface and not the positioning of the face. For the same reason, it is necessary also to align each face so that it is vertical, Hesher et al. [1] use the nearest adjacent points (which correspond to the next closest depth) which produces a series of points along the bridge of the nose. This line is then used as a reference and each image is rotated so that this line is vertical. Since there may have been variations in distance at which each subject was from the camera, the depths are adjusted so that the nose tip is at an equal distance in each image. Finally the image is cropped to ensure just the salient features are compared, eliminating peripheral noise such as hair which impedes on the accuracy of PCA. Assuming a set of M range images in which each image is initially arranged as an N N matrix, where each element defines the distance of that point in space from the focal plane of the camera. Each image will first have to be arranged into a single column vector Γ i of length N, where the columns of the matrix are arranged sequentially. The mean of the M vectors is then subtracted from each element to give the 15

difference vectors, or mean-adjusted data. Combine the difference vectors into a matrix A of dimension N² × M. The covariance matrix calculated from this is an N² × N² matrix. Whereas in the example it was easy to diagonalise the matrix and find its eigenvectors and eigenvalues, this problem grows significantly for large N, i.e. when images have substantial detail. This means that the trick described in Section 2.3, in which the eigenvectors of A^T A are found, should be employed. The eigenvectors obtained determine linear combinations of the difference vectors which produce eigenfaces; those eigenvectors with the highest associated eigenvalues are the principal components. To reduce the dimensionality of the data, simply discard those eigenvectors with the lowest associated eigenvalues.

2.6 Using PCA to Classify Images

PCA can be used to classify images into known faces, unknown faces and non-faces. This is achieved by choosing suitable thresholds and noting that, since faces are broadly similar, they will project onto a small region called the face space. Images which fall outside of the face space are classified as non-faces and those inside as faces. If the input image is sufficiently close to a known face (from an initial training stage) then the face may be identified. In order to classify an unknown image, it is necessary to have an initial known set of faces (the training set). The eigenvectors of the covariance of their differences are found and, when multiplied with the difference vectors, give eigenfaces. The number of eigenfaces is much lower than the dimension of the original images (M ≪ N²). The M eigenfaces with the highest associated eigenvalues are retained. Each of the training images can be described by a linear combination of the eigenfaces. The relative weights of each component are given by

ω_k = u_k^T (Γ − Ψ),  k = 1, ..., M.  (2.1)

These weights may be arranged as a vector

Ω^T = [ω_1, ω_2, ..., ω_M].  (2.2)
This vector gives a description of the location of the image in the projected space with respect to the eigenfaces, which act as a basis. Determining whether a new image belongs to a known face class can be done simply by calculating the Euclidean distance between the test image and each known face class, and then minimising this distance:

min_k ε_k = ‖Ω − Ω_k‖.  (2.3)

If the distance is below some suitable threshold θ_1 then the test image is said to be in that face class, i.e. the identity of the image has been found. A similar approach reveals whether an image is actually a face. Rather than calculating the distance to a face class, find the distance to the face space, that is, the distance between the mean-adjusted input image and its projection onto the face space:

ε = ‖Φ − Φ_f‖  (2.4)

where Φ = Γ − Ψ and Φ_f = Σ_{i=1}^{M} ω_i u_i. Below some threshold value θ_2 the image is considered a face. In summary: if ε > θ_2 the image is not a face; if ε < θ_2 and ε_k > θ_1 the image is an unknown face; and if ε < θ_2 and ε_k < θ_1 the image is a known face.

2.7 Summary

We have seen how to reduce the dimensionality of data by finding those components between which there is greatest variation. For example, in two dimensions this is equivalent to finding the principal axes. If one considered a facial surface as a set of points in three dimensions then PCA would identify the directions in which there is greatest variation; that is, it would identify the up-down direction of the face as the principal component, as was done by [3]. The lower dimensional space is called the face space and each image is projected to a point in this space. After a training period, where a number of known faces are used to calibrate the location of the face space in the projection space, new images may be classified as face or non-face by considering the distance between their projection and the face space. This method can be used with range data by considering the depth value as one would an intensity value. For example, [9] used PCA on range images and showed good robustness when considering the rigid portion of the face, that is, the nose and brow region. Further, individual faces may be identified by finding the nearest neighbour to the projected point.
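The three-way classification rule just summarised can be sketched as below; the eigenfaces, known weight vectors and threshold values are synthetic stand-ins (θ1 bounds the distance to a face class, θ2 the distance to the face space), not values from the report.

```python
import numpy as np

def classify(Phi, U, known_weights, theta1, theta2):
    """Classify a mean-adjusted image Phi given orthonormal eigenfaces U."""
    Omega = U.T @ Phi                      # weights of the input image
    Phi_f = U @ Omega                      # projection onto the face space
    eps = np.linalg.norm(Phi - Phi_f)      # distance from the face space
    if eps > theta2:
        return "non-face"
    eps_k = np.linalg.norm(known_weights - Omega, axis=1).min()
    return "known face" if eps_k < theta1 else "unknown face"

# Synthetic face space: 4 orthonormal "eigenfaces" in a 64-dimensional space.
rng = np.random.default_rng(3)
U = np.linalg.qr(rng.normal(size=(64, 4)))[0]
known = rng.normal(size=(3, 4))            # weight vectors of 3 known classes
face = U @ known[0]                        # lies exactly in the face space
junk = rng.normal(size=64)                 # generic vector, far from face space

assert classify(face, U, known, 1e-6, 1.0) == "known face"
assert classify(junk, U, known, 1e-6, 1.0) == "non-face"
```

A vector that lies in the face space but far from every known weight vector falls into the "unknown face" branch.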
To ensure that all test faces are correctly identified as faces, one would require a large allowable distance to the face region; however, this would result in an increase in false positives. The same trade-off applies to the identification of individuals. We now would like to make the most of the three-dimensional information that the range data possesses. To do this we must have an understanding of differential geometry. The following chapter establishes the basis for understanding range data as an approximation to a smooth surface.
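Before moving on, the small-matrix trick at the heart of the PCA method can be sketched; the ensemble here is random stand-in data with M = 10 vectors of dimension N = 1000, so the M × M matrix is diagonalised instead of the N × N covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 1000, 10
Gamma = rng.normal(size=(N, M))          # ensemble of M vectors of size N
Psi = Gamma.mean(axis=1, keepdims=True)  # mean vector
A = Gamma - Psi                          # difference vectors Phi_i as columns

L = A.T @ A                              # M x M, cheap to diagonalise
mu, v = np.linalg.eigh(L)                # eigenvalues ascending
U = A @ v                                # u_l = A v_l, eigenvectors of A A^T

# Check: each u_l satisfies (A A^T) u_l = mu_l u_l, without forming the
# N x N matrix A A^T explicitly.
for l in range(M):
    assert np.allclose(A @ (A.T @ U[:, l]), mu[l] * U[:, l], atol=1e-6)
```

The columns of `U` play the role of the eigenfaces once normalised and sorted by descending eigenvalue.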

Chapter 3: Geometry of Surfaces

One method for identifying faces in a range image is to identify salient features through curvature analysis. Likely candidates may then be classified into faces and non-faces using techniques such as Principal Component Analysis. In order to be able to describe surfaces obtained from range data using curvature analysis, it is first necessary to establish the required notions of a surface, isometries, geodesics, the first and second fundamental forms, Gaussian curvature and mean curvature. The definitions and theory of this chapter are adapted from lecture notes for Differential Geometry by John Bolton [3].

3.1 Surfaces

In order to perform curvature analysis it is necessary to have a description of the surface in question. Ideally, this surface would be smooth so that the exact coordinates of each point are known. However, since the range data is obtained in a discrete manner, the description of the surface is also discrete. The upshot of this is that any information derived from it will be an approximation only. Intuitively, a surface S in R^n looks locally like an open subset of R². To properly define a surface we must insist on the following properties. Since we will be considering facial surfaces in R³ we give the definition for the case n = 3.

Definition (Surface). A non-empty subset S of R³ is a surface if for every point p ∈ S

1. there is an open subset U of R² and a smooth map x : U → R³ such that p ∈ x(U) ⊆ S. That is, we want a smooth map that takes an open subset U of the plane smoothly onto part of the surface S. We also require that this map can

be reversed, so that we can take part of the surface S back to an open subset of R²; this map is given by F; and

2. there is an open subset W of R³ which, when intersected with S, gives the image of U under the map x, and a smooth map F : W → R² such that F ∘ x(u, v) = (u, v) for all (u, v) ∈ U.

These conditions ensure that at a small enough scale S looks like an open subset of R². The map x must satisfy the three further conditions given below; it is called a local parametrisation and its image x(U) is called a coordinate neighbourhood of S. This means the surface S may be covered by many coordinate neighbourhoods and so we are able to study its differential geometry.

1. x(u, v) is a smooth map,
2. x(u, v) is a homeomorphism,
3. x_u and x_v are linearly independent for all (u, v) ∈ U.

The idea of projecting the partial derivatives of x onto the uv-plane is illustrated in Figure 3.1. When considering facial surfaces it is useful to consider the surface S as the graph of a function:

Γ(g) = {(u, v, g(u, v)) ∈ R³ : (u, v) ∈ U}

where g : U → R is a smooth function defined on U ⊆ R². To show that the graph Γ(g) is indeed a surface, take W (the open subset of R³ which gives the image x(U) when intersected with S) to be R³ itself, and let x and F be defined by

x(u, v) = (u, v, g(u, v)),  F(x, y, z) = (x, y).

Then we have

x(U) = W ∩ S = S,

that is, the image of U is the whole surface, and F ∘ x(u, v) = (u, v). So S is a surface given by Γ(g) which may be covered by a single coordinate neighbourhood. We now look at how distances are measured on given surfaces.

Figure 3.1: The partial derivatives of point p on surface S and their projection to the uv-plane.
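The graph-of-a-function parametrisation above can be checked symbolically; the particular g(u, v) here is a hypothetical choice, since any smooth g works.

```python
import sympy as sp

# x(u, v) = (u, v, g(u, v)) with F(x, y, z) = (x, y), so F(x(u, v)) = (u, v).
u, v = sp.symbols('u v', real=True)
g = sp.sin(u) * sp.cos(v)                      # any smooth g works
x = sp.Matrix([u, v, g])

F = lambda p: (p[0], p[1])                     # projection back to the plane
assert F(x) == (u, v)                          # F o x is the identity on U

# x_u and x_v are linearly independent: their cross product never vanishes,
# since its third component is identically 1.
xu, xv = x.diff(u), x.diff(v)
n = xu.cross(xv)                               # (-g_u, -g_v, 1)
assert sp.simplify(n[2]) == 1
```

This is exactly why a range image, viewed as a graph, is automatically covered by a single coordinate neighbourhood.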

3.2 Fundamental Forms

3.2.1 First Fundamental Form

Now that we have a way of describing the surface mathematically, we can use it to understand distances and angles on the surface. This is done through the first fundamental form. Consider the partial derivatives of x with respect to u and v. These vectors give a basis for the tangent plane T_p S of a surface S at a point p. The inner product gives us a metric, that is, a notion of how distances are measured. The restriction of the inner product to this tangent space is a symmetric positive definite bilinear form on the tangent space and gives the first fundamental form, denoted I:

I_p(a x_u + b x_v) = a² ⟨x_u, x_u⟩ + 2ab ⟨x_u, x_v⟩ + b² ⟨x_v, x_v⟩

where a, b are uniquely determined scalars. The coefficients of the first fundamental form, E, F, G, are determined by x and are given by

E = ⟨x_u, x_u⟩,  F = ⟨x_u, x_v⟩,  G = ⟨x_v, x_v⟩.

So the first fundamental form may be expressed as

I_p(a x_u + b x_v) = a² E + 2ab F + b² G.

This means that an element of distance on the surface S can be given by I = ds² = dx · dx, and the angle between two vectors w_1 and w_2 in the tangent plane is given by cos θ = ⟨w_1, w_2⟩ / (‖w_1‖ ‖w_2‖). The discriminant of the first fundamental form is the determinant of the matrix of coefficients, that is,

g = det( E F ; F G ) = EG − F².

Note the off-diagonal elements are both F since ⟨x_u, x_v⟩ = ⟨x_v, x_u⟩. The first fundamental form gives intrinsic information about the surface, that is, it provides information about the intrinsic invariants of the surface. It is also known as the metric on S.

3.2.2 Second Fundamental Form

The second fundamental form of S is a measure of the amount of bending a curve must do in order to stay on S. It is given by

II = −dx · dN

where N is the unit normal vector

N(x(u, v)) = (x_u × x_v) / ‖x_u × x_v‖.

The coefficients of the second fundamental form are defined by

L = x_uu · N,  M = x_uv · N,  N = x_vv · N.

The discriminant is again the determinant of the matrix of coefficients, b = LN − M². The second fundamental form gives information about the extrinsic invariants of the surface.

3.2.3 Third Fundamental Form

The third fundamental form is the first fundamental form of the spherical image of a surface S; more detail will be given in Chapter 5. It is given by

III = ds̄² = dN · dN.

Similarly, the discriminant of the third fundamental form is a number denoted by c. The discriminants are related by c = b²/g [18].

3.3 Gaussian and Mean Curvature

To understand the shape of a surface one may consider different types of curvature. In particular, the Gaussian curvature and mean curvature give information about the shape of the surface at each point. The Gaussian curvature and mean curvature at a point p on a surface can be described by the principal curvatures of S at p. The principal curvatures are the eigenvalues of the Weingarten map

−dN_p : T_p S → T_{N(p)} S²

(identified with T_p S), where N : S → S² with N(x(u, v)) = (x_u × x_v)/‖x_u × x_v‖ is the unit normal vector field, or Gauss map. The principal curvatures are the eigenvalues κ_1, κ_2 of −dN_p, and the associated eigenvectors are the principal directions.

Definition 3.3.1 (Gaussian curvature [10] p. 146, 155). The Gaussian curvature is the determinant of the Weingarten map, that is, the product of the principal curvatures; it can also be expressed in terms of the coefficients of the first and second fundamental forms:

K = det(−dN_p) = κ_1 κ_2 = (LN − M²) / (EG − F²) = b/g

where E, F, G and L, M, N are the coefficients of the first and second fundamental forms respectively.

Definition 3.3.2 (Mean curvature [10] p. 156). The mean curvature is half the trace of the Weingarten map, that is, the average of the principal curvatures, and may be expressed in terms of the first and second fundamental forms as follows:

H = (1/2) Tr(−dN_p) = (1/2)(κ_1 + κ_2) = (EN − 2FM + GL) / (2(EG − F²)).

3.4 HK Classification

The shape of a surface can be classified by analysing the signs of the Gaussian and mean curvatures at points on the surface. This was first done by Besl and Jain [1] and results in eight basic shapes, which are summarised in Table 3.1 and illustrated by Figures 3.2 to 3.7. For simplicity it is common to consider a general hyperbolic shape and not split it into concave, convex and symmetric.

        | K < 0                | K = 0               | K > 0
H < 0   | hyperbolic concave   | cylindrical concave | elliptical concave
H = 0   | hyperbolic symmetric | planar              | impossible
H > 0   | hyperbolic convex    | cylindrical convex  | elliptical convex

Table 3.1: HK Classification Table

3.5 Isometries

An isometry is a distance-preserving map between two surfaces. Two surfaces are said to be isometric if there exists an isometry between them. Since the notion of distance is preserved, and this is described wholly by the first fundamental form, it can be shown that the Gaussian curvature is also preserved. That is, it is possible to describe the Gaussian curvature solely in terms of the coefficients of the first fundamental form and their derivatives; this is shown in the Theorema Egregium which follows.
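As an illustration of the curvature formulas and of Table 3.1, the sketch below computes K and H for a hypothetical surface, the paraboloid z = u² + v², directly from the fundamental form coefficients and then looks up the local shape by sign.

```python
import sympy as sp

u, v = sp.symbols('u v', real=True)
x = sp.Matrix([u, v, u**2 + v**2])            # illustrative parametrisation
xu, xv = x.diff(u), x.diff(v)
E, F, G = xu.dot(xu), xu.dot(xv), xv.dot(xv)

n = xu.cross(xv)                              # (-2u, -2v, 1)
N = n / sp.sqrt(n.dot(n))                     # unit normal
L = x.diff(u, 2).dot(N)
M = x.diff(u, v).dot(N)
Nc = x.diff(v, 2).dot(N)                      # the coefficient N of II

K = sp.simplify((L * Nc - M**2) / (E * G - F**2))
H = sp.simplify((E * Nc - 2 * F * M + G * L) / (2 * (E * G - F**2)))

def hk_class(K, H):
    """Look up the local shape in Table 3.1 from the signs of K and H."""
    col = "hyperbolic" if K < 0 else ("cylindrical" if K == 0 else "elliptical")
    if H == 0:
        return {"hyperbolic": "hyperbolic symmetric",
                "cylindrical": "planar",
                "elliptical": "impossible"}[col]
    return col + (" concave" if H < 0 else " convex")

K0, H0 = K.subs({u: 0, v: 0}), H.subs({u: 0, v: 0})
assert K0 == 4 and H0 == 2        # kappa_1 = kappa_2 = 2 at the origin
assert hk_class(K0, H0) == "elliptical convex"    # a pit point
```

At the origin both principal curvatures equal 2, so K = 4 and H = 2, placing the point in the pit (elliptical convex) cell of the table.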

Figure 3.2: Saddle point or hyperbolic symmetric point, where H = 0, K < 0. Figure 3.3: Peak point or elliptical concave point, where H < 0, K > 0. Figure 3.4: Pit point or elliptical convex point, where H > 0, K > 0. Figure 3.5: Planar point, where H = 0, K = 0.

Figure 3.6: Valley point or cylindrical convex point, where H > 0, K = 0. Figure 3.7: Ridge point or cylindrical concave point, where H < 0, K = 0.

3.5.1 Theorema Egregium

Theorem (Theorema Egregium of Gauss [10] p. 234). The Gaussian curvature K of a surface S is invariant under local isometries. In other words, the Gaussian curvature K of a surface S in R³ may be expressed solely in terms of the coefficients of the first fundamental form and their derivatives.

Proof. The Gaussian curvature is given by K = (LN − M²)/(EG − F²), so it is sufficient to show that the numerator of K may be written in terms of E, F, G and their derivatives. First we must introduce the Christoffel symbols Γ^k_{ij}, defined by

x_uu = Γ¹₁₁ x_u + Γ²₁₁ x_v + L N,   (3.1)
x_uv = Γ¹₁₂ x_u + Γ²₁₂ x_v + M N,   (3.2)
x_vu = Γ¹₁₂ x_u + Γ²₁₂ x_v + M N,   (3.3)
x_vv = Γ¹₂₂ x_u + Γ²₂₂ x_v + N N    (3.4)

where x(u, v) is a local parametrisation of S ⊆ R³. Note (3.2) is the same as (3.3) since x_uv = x_vu. By taking the scalar product of each equation with x_u and x_v, three pairs of linear equations can be found:

{ E Γ¹₁₁ + F Γ²₁₁ = ½ E_u,   F Γ¹₁₁ + G Γ²₁₁ = F_u − ½ E_v }
{ E Γ¹₁₂ + F Γ²₁₂ = ½ E_v,   F Γ¹₁₂ + G Γ²₁₂ = ½ G_u }
{ E Γ¹₂₂ + F Γ²₂₂ = F_v − ½ G_u,   F Γ¹₂₂ + G Γ²₂₂ = ½ G_v }.

Note that the Christoffel symbols are uniquely determined by E, F and G, since the determinant of the matrix pre-multiplying the Christoffel symbols is $EG - F^2 \neq 0$ and the right-hand sides are all partial derivatives of E, F and G. Explicitly, for the first pair,
$$\begin{pmatrix} E & F \\ F & G \end{pmatrix}\begin{pmatrix} \Gamma^1_{11} \\ \Gamma^2_{11} \end{pmatrix} = \begin{pmatrix} \tfrac{1}{2}E_u \\ F_u - \tfrac{1}{2}E_v \end{pmatrix}, \qquad \det\begin{pmatrix} E & F \\ F & G \end{pmatrix} = EG - F^2.$$
So, on multiplying each term by the unit normal and rearranging Equations 3.1 to 3.4, we see
$$\begin{aligned} LN - M^2 &= (LN)\cdot(NN) - (MN)\cdot(MN) \\ &= (x_{uu} - \Gamma^1_{11}x_u - \Gamma^2_{11}x_v)\cdot(x_{vv} - \Gamma^1_{22}x_u - \Gamma^2_{22}x_v) - (x_{uv} - \Gamma^1_{12}x_u - \Gamma^2_{12}x_v)\cdot(x_{uv} - \Gamma^1_{12}x_u - \Gamma^2_{12}x_v) \\ &= x_{uu}\cdot x_{vv} - x_{uv}\cdot x_{uv} + \text{terms involving Christoffel symbols, } E, F, G \text{ and their derivatives.} \end{aligned}$$
So now we just have to show that $x_{uu}\cdot x_{vv} - x_{uv}\cdot x_{uv}$ can also be expressed solely in terms of E, F, G and their derivatives. Consider taking out a partial derivative of u or v from each term, so that we are left with a second partial derivative dotted with a first partial derivative, which have already been found in terms of E, F, G and derivatives:
$$x_{uu}\cdot x_{vv} - x_{uv}\cdot x_{uv} = \frac{\partial}{\partial u}(x_u\cdot x_{vv}) - \frac{\partial}{\partial v}(x_u\cdot x_{uv}) = \frac{\partial}{\partial u}\left(F_v - \tfrac{1}{2}G_u\right) - \frac{\partial}{\partial v}\left(\tfrac{1}{2}E_v\right) = -\tfrac{1}{2}(E_{vv} - 2F_{uv} + G_{uu}).$$
So, as required, Gaussian curvature may be expressed solely in terms of the coefficients of the first fundamental form.

3.6 Summary

We now have a mathematical description of a surface and have seen how to classify shape by considering the principal curvatures of the surface at each point, which can be used to define mean and Gaussian curvature. We have also seen that Gaussian curvature is independent of how the surface is embedded in ambient space, and that it is preserved under isometric mappings. This will be particularly useful in Chapter 6. Since range data is discrete, we must adapt our approach to the classification of shape to account for this. The following chapter looks at the problem of finding faces in a scene by identifying likely candidate shapes, in particular by searching for salient features which have characteristic shapes and arrangements.

Chapter 4

Finding Faces in a Scene

One method for identifying potential faces in a scene is to segment the scene into regions of shape defined by curvature and then analyse these, in particular looking for regions that correspond to the eyes and the nose. The first step is to obtain a 3D depth map, which gives a description of the physical 3D shape of an object from a focal plane. Figure 4.1 is the range image for the face shown in three dimensions in Figure 4.2. For every depth map there exists a set of objects that yield the given depth information from a particular perspective. When the target object is simple, ambiguities may arise in which a depth map may be generated by multiple objects. For example, both Figure 4.3 and Figure 4.4 will generate the same depth map when viewed front-on or from the left-hand side. Since we are dealing with faces, which are far more complicated and intricate, the chance of ambiguity will be assumed to be negligible, so that every depth image corresponds to at most one face. That is, given a range image of a face, the identity will be unique. The surface may be described as the graph of a function:
$$S = \{(u, v, f(u, v)) \in \mathbb{R}^3 \mid (u, v) \in U\}, \qquad f : U \to \mathbb{R}$$
where f is a smooth, twice differentiable function. So the partial derivatives of a point x on the surface S are given by Equations 4.1:
$$x = (u, v, f(u, v)), \quad x_u = (1, 0, f_u(u, v)), \quad x_v = (0, 1, f_v(u, v)),$$
$$x_{uu} = (0, 0, f_{uu}(u, v)), \quad x_{uv} = (0, 0, f_{uv}(u, v)), \quad x_{vv} = (0, 0, f_{vv}(u, v)) \qquad (4.1)$$
where $x_u$ is the partial derivative of $x = (u, v, f(u, v))$ with respect to u, and so on. The unit normal vector may be found by calculating the normalised cross product of $x_u$ and $x_v$:

Figure 4.1: Range image of a face from the Bosphorus 3D database [9], where colour represents depth and the background is set equal to zero. Figure 4.2: Three-dimensional range image of a face from the Bosphorus 3D database [9]. Figure 4.3: Simple Shape 1. Figure 4.4: Simple Shape 2.

$$N = \frac{x_u \times x_v}{\|x_u \times x_v\|} = \frac{(-f_u(u, v), -f_v(u, v), 1)}{\sqrt{1 + f_u^2(u, v) + f_v^2(u, v)}}.$$
The coefficients of the first and second fundamental forms are found, using the expressions given in Section 3.3, to be Equations 4.2:
$$E = 1 + f_u^2, \quad F = f_u f_v, \quad G = 1 + f_v^2, \quad L = \frac{f_{uu}}{\sqrt{1 + f_u^2 + f_v^2}}, \quad M = \frac{f_{uv}}{\sqrt{1 + f_u^2 + f_v^2}}, \quad N = \frac{f_{vv}}{\sqrt{1 + f_u^2 + f_v^2}}. \qquad (4.2)$$
Once these values have been found, the mean and Gaussian curvature may be calculated at each point by substituting the coefficients of the first and second fundamental forms given in Equations 4.2 into the definitions of H and K:
$$H = \frac{1}{2}\,\frac{EN - 2FM + GL}{EG - F^2} = \frac{f_{uu} + f_{vv} + f_{uu}f_v^2 + f_{vv}f_u^2 - 2f_u f_v f_{uv}}{2(1 + f_u^2 + f_v^2)^{3/2}},$$
$$K = \frac{LN - M^2}{EG - F^2} = \frac{f_{uu}f_{vv} - f_{uv}^2}{\big((1 + f_u^2)(1 + f_v^2) - (f_u f_v)^2\big)\,(1 + f_u^2 + f_v^2)} = \frac{f_{uu}f_{vv} - f_{uv}^2}{(1 + f_u^2 + f_v^2)^2}.$$
A point on a surface can be classified by analysing the signs of the Gaussian and mean curvature; see Besl and Jain, 1986 [2]. This is summarised in Table 4.1.

Table 4.1: HK Classification Table

        K < 0                  K = 0                  K > 0
H < 0:  hyperbolic concave     cylindrical concave    elliptical concave
H = 0:  hyperbolic symmetric   planar                 impossible
H > 0:  hyperbolic convex      cylindrical convex     elliptical convex
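The graph-surface formulas for H and K can be evaluated directly on a sampled depth map. A minimal Python sketch (not from the report; numpy finite differences stand in for the analytic partial derivatives, and real range data would first be smoothed as discussed below):

```python
import numpy as np

def mean_gaussian_curvature(f, spacing=1.0):
    """Estimate H and K on a depth map f(u, v) using the graph-surface
    formulas, with finite differences approximating f_u, f_v, f_uu, ..."""
    fu, fv = np.gradient(f, spacing)
    fuu, fuv = np.gradient(fu, spacing)
    fvv = np.gradient(fv, spacing)[1]
    g = 1.0 + fu**2 + fv**2
    H = (fuu*(1 + fv**2) - 2*fu*fv*fuv + fvv*(1 + fu**2)) / (2 * g**1.5)
    K = (fuu*fvv - fuv**2) / g**2
    return H, K

# Paraboloid f = (u^2 + v^2)/2: at the origin f_u = f_v = 0,
# f_uu = f_vv = 1 and f_uv = 0, so H = 1 and K = 1 there.
u, v = np.meshgrid(np.linspace(-1, 1, 201), np.linspace(-1, 1, 201),
                   indexing="ij")
H, K = mean_gaussian_curvature((u**2 + v**2) / 2, spacing=u[1, 0] - u[0, 0])
print(round(float(H[100, 100]), 3), round(float(K[100, 100]), 3))  # 1.0 1.0
```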

It is important to note that since the data obtained from the scanners is discrete, we must use an approximation to calculate the second derivatives. It is necessary to smooth the surface prior to the calculation of the mean and Gaussian curvature, due to the sensitivity of second derivatives to noise. This may be done by applying a Gaussian filter, which discards high-frequency fluctuations in the data. Once all the curvature values have been obtained, it is desirable to consider only those that have the highest curvature and are insensitive to changes in expression. Discard those values which fall below some threshold, to account for zero values not being picked up by discrete samples; for example, Moreno et al. [] and Colombo et al. [9] chose small thresholds of approximately $H_t = 0.05$, with a corresponding threshold $K_t$. The highest curvature values are found at the nose and the inside corners of the eyes. These are chosen as the points that determine the face triangle, which will be used to search for potential faces in a scene. Search for noses by considering the convex regions of the thresholded mean curvature map; similarly, search for eyes by looking for elliptical concave regions in the thresholded HK map. The curvature values inside these candidate regions are well described by their mean value, so the number of candidate regions may be reduced by filtering each candidate and eliminating those below some threshold value of the average mean curvature.

Motivation for Choice of Eyes and Nose to Indicate the Presence of a Face

Moreno et al. [] conducted a study to investigate which fiducial points on a face are best suited to locate a face in a scene. They found that the inside corners of the eyes and the nose tip were easily identifiable. The nose tip region was found to be the region with the highest number of convex elliptical points, and the bridge of the nose always neighboured the nose tip and was described by convex cylindrical points.
This suggests that searching for convex points in the thresholded mean curvature map will include all potential noses. Similarly, the inside corners of the eyes were found to be particularly useful in describing the face, since relatively few candidate regions appear. The eyes and nose were also shown to be particularly invariant to expression, which makes them ideal candidates for robust face detection.

4.1 HK Segmentation Method

Compute mean and Gaussian curvature at each point by estimating the partial derivatives and using the equations for H and K. Since the second derivatives are very sensitive to noise, apply a Gaussian filter to the data; this has the effect of smoothing and minimising high-frequency fluctuations due to errors when the image was scanned. Analyse the signs of the mean and Gaussian curvature. First, since it is difficult to get exactly zero curvature, take the values within some threshold to equal zero. That is, set values of H(u, v) with $|H(u, v)| < H_T$, and of K(u, v) with $|K(u, v)| < K_T$, equal to zero. Plotting the thresholded mean and Gaussian curvatures, and also the HK curvature (the product of the signs), allows eyes and noses to be easily identified visually. Search for potential eyes in the thresholded HK-classified map, considering only elliptical concave regions (K > 0, H < 0). Search for potential noses in the thresholded mean curvature map, considering only points with positive values, i.e. convex regions. Once one has a set of candidate eyes and noses, the search is narrowed to consider only those regions whose average curvature is high relative to the mean of the averages of all regions. That is, filter the candidate regions and retain only those which satisfy $\bar H_i \geq \bar H_{\min}$ for noses and $\bar K_i \geq \bar K_{\min}$ for eyes, where the bar indicates the average value. Having now a set of potential eyes and noses, we want to further limit our search to those which are arranged in a face triangle: a left eye, a right eye and a nose. For each candidate nose find its principal direction and use this to cut the image plane in half. A face triangle is composed of the candidate nose and a candidate eye from each of the left and right sections. Using prior knowledge of the arrangement of features in a face suggests that we only consider those face triangles that have their features a normal distance apart.
Retain only those face triangles that satisfy
$$LR_{\min} \leq d(el, er) \leq LR_{\max}, \qquad LN_{\min} \leq d(el, n) \leq LN_{\max}, \qquad RN_{\min} \leq d(er, n) \leq RN_{\max}$$
as illustrated by Figure 4.5.
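The distance-bound filter above can be sketched in a few lines of Python (illustrative only: the names and bound values are hypothetical, and this sketch omits the left/right split by the nose's principal direction):

```python
from itertools import product
import numpy as np

def face_triangles(noses, eyes, bounds):
    """Filter candidate (left eye, right eye, nose) triples by the distance
    bounds.  `bounds` holds (min, max) pairs for the eye-eye, left-eye-nose
    and right-eye-nose distances."""
    (lr_min, lr_max), (ln_min, ln_max), (rn_min, rn_max) = bounds
    dist = np.linalg.norm
    triangles = []
    for n, el, er in product(noses, eyes, eyes):
        if el is er:
            continue
        if (lr_min <= dist(np.subtract(el, er)) <= lr_max
                and ln_min <= dist(np.subtract(el, n)) <= ln_max
                and rn_min <= dist(np.subtract(er, n)) <= rn_max):
            triangles.append((el, er, n))
    return triangles

# One nose at the origin; two eyes 6 units apart, each 5 units from the nose.
eyes = [(-3.0, 4.0), (3.0, 4.0)]
tris = face_triangles([(0.0, 0.0)], eyes, [(4, 8), (3, 7), (3, 7)])
print(len(tris))  # 2 (both left/right orderings of the same eye pair)
```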

Figure 4.5: The distances between the left eye (el), right eye (er) and nose (n) are bounded above and below to give a range of allowable configurations.

4.2 Estimating Partial Derivatives and Curvature

Although the facial surface itself is smooth, the data obtained from the scanner is discrete. This means that rather than having exact values for the partial derivatives at points on the surface, approximations must be made. Due to the intrinsic nature and multiple definitions of Gaussian curvature, it is possible to find a description that does not rely on the explicit calculation of partial derivatives. In particular, a theorem from [18] p. 187 gives a relation between elements of area and the Gaussian curvature.

Theorem 4.2.1. At any point of a surface S, the absolute value of the Gaussian curvature K of S is equal to the quotient of the element of area, dA, of the spherical image of S and the corresponding element of area, dS, of S:
$$\frac{dA}{dS} = |K|. \qquad (4.3)$$
Strictly speaking, the ratio between the area of the unit sphere swept out by the normals of a region of a surface and the area of that region is the Gaussian curvature of the region. The Gaussian curvature at a point is the limit of this ratio as the area of the region of the surface tends to zero:
$$K = \lim_{S \to 0} \frac{A}{S}. \qquad (4.4)$$
There are several methods for estimating Gaussian curvature without directly computing partial derivatives. The angle deficit method and the spherical image method [20] will be introduced, and then an alternative method will be discussed, used by Besl and Jain in 1984 [1], which estimates the partial derivatives of a discretely sampled smooth surface.

4.2.1 Angle Deficit Method

The angle deficit method was developed from a theorem of Rodrigues which relates the differentials of the normal and position vectors in the principal directions.

Theorem 4.2.2 (Rodrigues, 1815 [5]).
$$dN + \kappa\, dx = 0$$
where N is the unit normal to the surface S and κ is the normal curvature in the direction dx of the line of curvature. That is, κ is the principal curvature and dx is the corresponding direction.

The spherical image of a surface S can be defined as follows. Each point of S has an associated outward-pointing unit normal N = n; these vectors may be considered as originating from the origin of a Cartesian coordinate system. Since each vector is of unit length, the tip of each vector will intersect a unit sphere centred at the origin at a point. At this point, the unit normal of the unit sphere is the same as the unit normal at the corresponding point on the surface. Consider a mapping from each point of the surface S to a point on the unit sphere A, related by the direction of the normal vectors. This mapping is called the Gauss map or Gaussian spherical map, illustrated by Figure 4.6.

Figure 4.6: The Gauss map between a surface S and a unit sphere A.

A surface may be approximated, on a small enough scale, by a polyhedron with N triangular faces, where each vertex $P_1$ to $P_N$ is a point on the surface (Figure 4.7). Each triangular face, labelled $S_{i,i+1}$, has a unit normal $N_{i,i+1}$, so the polyhedron may be mapped to its spherical image. The points on the surface of the unit sphere may be joined by arcs of great circles to form a spherical polygon (Figure 4.8). The area of the spherical polygon is the angle deficit of the polyhedron [7]:
$$2\pi - \sum_{i=1}^{N} \theta_{i,i+1}. \qquad (4.5)$$

Figure 4.7: Surface approximated by a polyhedron with 5 triangular faces, where each vertex is a point on the surface. Figure 4.8: Spherical image of the vertices $P_1$ to $P_5$, joined by arcs of great circles.

Next, split the area of each triangle into three equal areas and identify each of these with a vertex of the triangle; this ensures the total area is not over-counted. Therefore the area associated with the point O will be one third of the total area of the polyhedron, that is, one third of the sum of the areas of all triangles with a vertex at O:
$$\frac{1}{3}\sum_{i=1}^{N} S_{i,i+1}. \qquad (4.6)$$
From this, the Gaussian curvature may be approximated by the quotient of the angle deficit and the area associated with the point O [6]:
$$K = \frac{2\pi - \sum_{i=1}^{N}\theta_{i,i+1}}{\frac{1}{3}\sum_{i=1}^{N} S_{i,i+1}}. \qquad (4.7)$$

4.2.2 Spherical Image Method

The spherical image method is based upon the following theorem.

Theorem 4.2.3 (Gaussian curvature [10] p. 167). Gaussian curvature at a given point is the limit of the area of the spherical image of a closed path around that point divided by the area enclosed by the path, as the path shrinks around the point.
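Equation 4.7 is straightforward to apply at a mesh vertex. A minimal Python sketch (not from the report; it assumes the neighbouring vertices are given in order around O):

```python
import numpy as np

def angle_deficit_curvature(O, neighbours):
    """Estimate Gaussian curvature at vertex O of a triangulated mesh as the
    angle deficit 2*pi - sum(theta_i) divided by one third of the total area
    of the incident triangles (Equation 4.7)."""
    O = np.asarray(O, dtype=float)
    pts = [np.asarray(p, dtype=float) for p in neighbours]
    angle_sum, area_sum = 0.0, 0.0
    for a, b in zip(pts, pts[1:] + pts[:1]):
        u, v = a - O, b - O
        angle_sum += np.arccos(np.dot(u, v)
                               / (np.linalg.norm(u) * np.linalg.norm(v)))
        area_sum += 0.5 * np.linalg.norm(np.cross(u, v))
    return (2 * np.pi - angle_sum) / (area_sum / 3)

# Flat ring: six neighbours in the z = 0 plane around the origin give zero
# angle deficit, hence zero estimated curvature.
ring = [(np.cos(t), np.sin(t), 0.0) for t in np.linspace(0, 2*np.pi, 7)[:-1]]
print(angle_deficit_curvature((0, 0, 0), ring))
```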

Figure 4.9: The mesh of points, where each vertex has an approximated unit normal vector. Figure 4.10: Spherical image of the closed path.

This is equivalent to Theorem 4.2.1. For a triangulated mesh, each point is a vertex of some number of triangles; by calculating their normals and finding the Gauss map, we have sufficient information for an approximation of Gaussian curvature at each mesh point. In particular, if the point O is a vertex of N triangles, then at each of the other vertices of these triangles the unit normal vector may be calculated (Figure 4.9). Since the mesh surface is not smooth, an estimate must be made for the normal vectors; commonly this is taken as the vector parallel to an average of the unit normals of the triangular faces which surround that point. Different averages can be used, including the arithmetic mean or an area-weighted mean, so that faces with larger area contribute more to the average. An illustration of the spherical image of this path, as defined by the unit normals at each point along the path, is given in Figure 4.10.

4.2.3 Local Method for Calculating Partial Derivatives

This section is based on [1]. If one has a surface which is sampled discretely, then the partial derivatives may be estimated by finding a continuously differentiable function that best fits the data. The derivatives of this continuous function may be computed analytically and then evaluated at the corresponding discrete points. Since it is computationally costly to use just one surface to approximate all the data, it makes sense to describe the surface locally, that is, to fit the surface to a finite N × N window of discrete points. Consider first the one-dimensional curve-fitting problem. Assume we wish to fit a curve to a sample of N points, N odd. The points may be parametrised by the set U

as
$$U = \left\{ -\frac{N-1}{2}, \ldots, -1, 0, 1, \ldots, \frac{N-1}{2} \right\}.$$
The discrete data describes an unknown function f(u), which we wish to approximate with a function $\hat f(u)$. The best fit is determined to be the function $\hat f(u)$ that minimises the least-square error term
$$\epsilon = \sum_{u \in U} \big(f(u) - \hat f(u)\big)^2.$$
Expanding $\hat f(u + h)$ in a Taylor series gives
$$\hat f(u + h) = \hat f(u) + h \hat f'(u) + \frac{h^2}{2!}\hat f''(u) + \frac{h^3}{3!}\hat f'''(u) + \cdots$$
Since the calculation of mean and Gaussian curvature requires only constant, linear and quadratic terms, we can truncate the Taylor series after the second derivative. This motivates us to look for functions of the form
$$\hat f(u) = a_0 \phi_0(u) + a_1 \phi_1(u) + a_2 \phi_2(u)$$
where
$$\phi_0(u) = 1, \qquad \phi_1(u) = u, \qquad \phi_2(u) = u^2 - \frac{M(M+1)}{3}, \qquad M = \frac{N-1}{2}.$$
Note that N is the size of the set U. The $\phi_i(u)$ can be shown to be orthogonal on U:
$$\sum_{u \in U} \phi_i(u)\phi_j(u) = 0 \quad \text{for } i \neq j. \qquad (4.8)$$
$$\sum_{u \in U} \phi_0(u)\phi_1(u) = \sum_{u \in U} u = -\frac{N-1}{2} + \cdots + \frac{N-1}{2} = 0$$
since the terms cancel in pairs.
$$\sum_{u \in U} \phi_0(u)\phi_2(u) = \sum_{u \in U}\left(u^2 - \frac{M(M+1)}{3}\right) = \sum_{u \in U} u^2 - N\,\frac{M(M+1)}{3} = 0$$
using $\sum_{u \in U} u^2 = N M(M+1)/3$.

$$\sum_{u \in U} \phi_1(u)\phi_2(u) = \sum_{u \in U} u\left(u^2 - \frac{M(M+1)}{3}\right) = \sum_{u \in U} u^3 - \frac{M(M+1)}{3}\sum_{u \in U} u = 0.$$
So, the function f(u) is approximated by $\hat f(u)$ given by
$$\hat f(u) = \sum_{i=0}^{2} a_i \phi_i(u)$$
where the coefficients are given by
$$a_i = \sum_{u \in U} f(u)\, b_i(u)$$
where the $b_i(u)$ are normalised orthogonal polynomials
$$b_i(u) = \frac{\phi_i(u)}{\sum_{u \in U} \phi_i^2(u)}.$$
From this the $b_i(u)$ can be computed, and then the derivatives of $\hat f(u)$ are known.
$$b_0(u) = \frac{\phi_0(u)}{\sum_{u \in U}\phi_0^2(u)} = \frac{1}{\sum_{u \in U} 1} = \frac{1}{N}.$$
$$b_1(u) = \frac{\phi_1(u)}{\sum_{u \in U}\phi_1^2(u)} = \frac{u}{N M(M+1)/3};$$
using $\sum_{u \in U} u^2 = N M(M+1)/3$ and $N = 2M+1$, we find
$$b_1(u) = \frac{3u}{M(M+1)(2M+1)}.$$
$$b_2(u) = \frac{\phi_2(u)}{\sum_{u \in U}\phi_2^2(u)} = \frac{u^2 - M(M+1)/3}{\sum_{u \in U}\left(u^2 - M(M+1)/3\right)^2};$$
using $\sum_{u \in U} u^4 = M(M+1)(2M+1)(3M^2+3M-1)/15$, $\sum_{u \in U} u^2 = N M(M+1)/3$ and $N = 2M+1$, we find
$$b_2(u) = \frac{1}{P(M)}\left(u^2 - \frac{M(M+1)}{3}\right), \qquad \text{where } P(M) = \frac{8}{45}M^5 + \frac{4}{9}M^4 + \frac{2}{9}M^3 - \frac{1}{9}M^2 - \frac{1}{15}M.$$
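The fit above can be checked numerically. Since $\phi_1(u) = u$ and $\phi_2(u) = u^2 - M(M+1)/3$, the fitted coefficients give $\hat f'(0) = a_1$ and $\hat f''(0) = 2a_2$, and the fit is exact for quadratics. A Python sketch (illustrative, not from the report):

```python
import numpy as np

def window_derivatives(f_vals):
    """Fit the discrete orthogonal polynomials phi_0, phi_1, phi_2 to an
    odd-length window of samples and return (f', f'') at the centre u = 0."""
    N = len(f_vals)
    M = (N - 1) // 2
    u = np.arange(-M, M + 1, dtype=float)
    phi = [np.ones(N), u, u**2 - M * (M + 1) / 3.0]
    # a_i = sum_u f(u) b_i(u) with b_i = phi_i / sum(phi_i^2)
    a = [np.dot(f_vals, p) / np.dot(p, p) for p in phi]
    return a[1], 2 * a[2]

# For samples of f(u) = 3 + 2u + u^2 on u = -2..2 the fit is exact:
# f'(0) = 2 and f''(0) = 2.
u = np.arange(-2, 3, dtype=float)
d1, d2 = window_derivatives(3 + 2*u + u**2)
print(d1, d2)  # 2.0 2.0
```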

The derivatives of $\hat f(u)$ may now be computed at the centre of each region being considered (u = 0):
$$\frac{d\hat f(0)}{du} = a_1, \qquad \frac{d^2\hat f(0)}{du^2} = 2a_2.$$
To extend this to two dimensions, in which data is given for an N × N region parametrised by U × U, simply find the function
$$\hat f(u, v) = a_{00}\phi_0(u)\phi_0(v) + a_{10}\phi_1(u)\phi_0(v) + a_{01}\phi_0(u)\phi_1(v) + a_{11}\phi_1(u)\phi_1(v) + a_{20}\phi_2(u)\phi_0(v) + a_{02}\phi_0(u)\phi_2(v)$$
which minimises the error function
$$\epsilon = \sum_{(u,v) \in U \times U} \big(f(u, v) - \hat f(u, v)\big)^2.$$
The coefficients $a_{ij}$ generalise from the one-dimensional case as
$$a_{ij} = \sum_{(u,v) \in U \times U} f(u, v)\, b_i(u)\, b_j(v)$$
which gives the partial derivatives
$$f_u = a_{10}, \quad f_v = a_{01}, \quad f_{uu} = 2a_{20}, \quad f_{uv} = a_{11}, \quad f_{vv} = 2a_{02}.$$

4.3 Further Work

An alternative method for representing curvature information is the SC method, introduced by Koenderink in 1992 [17]. It decouples the shape of the surface at a point from the magnitude of curvedness, meaning the shape description is invariant, in terms of relative curvature, under changes in scale. It has also been shown to demonstrate advantages over HK classification at low thresholds, in complex scenes and when dealing with noise [8]. Rather than considering the Gaussian and mean curvature, it uses a shape index S (note this S is not the same as the surface being considered). The shape index is defined by the principal curvatures $\kappa_1, \kappa_2$:
$$S = \frac{2}{\pi}\arctan\left(\frac{\kappa_1 + \kappa_2}{\kappa_1 - \kappa_2}\right), \qquad \kappa_1 \geq \kappa_2,$$
and can take values in $[-1, 1]$. It describes the major shape types, excluding planar (which has indeterminate shape index), as defined in Table 4.2.

Shape                  Shape Index Range
Elliptical concave     S ∈ [−1, −5/8)
Cylindrical concave    S ∈ [−5/8, −3/8)
Hyperbolic             S ∈ [−3/8, 3/8)
Cylindrical convex     S ∈ [3/8, 5/8)
Elliptical convex      S ∈ [5/8, 1]

Table 4.2: Surface shape defined by shape index

In order to include planar points, the magnitude of curvedness C is also considered; if a point has no curvedness then it is a planar point. The magnitude of curvedness is defined by
$$C = \sqrt{\frac{\kappa_1^2 + \kappa_2^2}{2}}.$$
Like the HK method, the SC method cannot rely on values being exactly zero because of noise when working with scanned images. Again, we must use a zero threshold. Recall the thresholds for curvature were $H_T$ and $K_T$; one may be calculated from the other via
$$K_T = H_T\Big(H_T + 2\max_{x \in \text{Image}} |H(x)|\Big).$$
In SC classification one must choose a threshold $C_T$ to classify planar surface regions. To compare the two algorithms, [8] used $C_T = H_T$ and concluded that the SC algorithm is better at dealing with image noise in scenes that contain more than one surface. In order to understand the differences between the methods, Figures 4.11 and 4.12 show the regions of shape along axes of principal curvatures.

4.4 Summary

The method for finding faces we have considered segments the scene into regions based on their shape. This shape was defined by HK classification, which considers the signs of the mean and Gaussian curvature at each point. In order to compute the curvature values it was necessary to approximate the partial derivatives of the smooth surface, which was sampled discretely by the scanner in the data acquisition process. As we have already seen, Gaussian curvature is intrinsic, so it was possible to approximate it without explicit reference to partial derivatives; mean curvature, however, did require their computation. We looked at a local method for estimating partial derivatives which aimed to find a curve or surface of best fit for sections of the surface. Now that we have located the faces in the scene, the next task is to isolate the part of the face we wish to analyse.
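The SC classification of Table 4.2 can be sketched as a small Python function (illustrative only; the planar cutoff `C_T` is an arbitrary small value, not one from the source):

```python
import numpy as np

def sc_classify(k1, k2, C_T=1e-4):
    """Shape index / curvedness classification: points whose curvedness C
    falls below C_T are planar; otherwise the shape index S selects a label
    from the interval boundaries of Table 4.2."""
    C = np.sqrt((k1**2 + k2**2) / 2.0)
    if C < C_T:
        return "planar"
    k1, k2 = max(k1, k2), min(k1, k2)      # enforce kappa_1 >= kappa_2
    # S -> +/-1 as the curvatures become equal (umbilic points)
    S = (2/np.pi)*np.arctan((k1 + k2)/(k1 - k2)) if k1 != k2 else np.sign(k1)
    for name, lo, hi in [("elliptical concave", -1, -5/8),
                         ("cylindrical concave", -5/8, -3/8),
                         ("hyperbolic", -3/8, 3/8),
                         ("cylindrical convex", 3/8, 5/8)]:
        if lo <= S < hi:
            return name
    return "elliptical convex"

print(sc_classify(1.0, 1.0))    # elliptical convex
print(sc_classify(1.0, -1.0))   # hyperbolic
```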
39

Figure 4.11: The dashed lines separate regions of shape as classified by HK segmentation. Figure 4.12: The dashed lines separate regions of shape as classified by SC segmentation.

Chapter 5

Determining the Region of Interest

Given a large database it is useful to identify and select the region of interest (ROI) on each face. This can be achieved automatically using an algorithm [3] with two major steps: the first is to find the plane of bilateral symmetry of the facial surface and the second is to locate the nose tip. The ROI is defined to lie within a sphere centred at the nose tip. Prior to looking in more detail at this algorithm, some useful techniques will be considered, in particular the iterative closest point (ICP) method, which will be used in the alignment of facial surfaces.

5.1 Definitions

The Euclidean distance between two points $r_1 = (x_1, y_1, z_1)$ and $r_2 = (x_2, y_2, z_2)$ is
$$\|r_1 - r_2\| = d(r_1, r_2) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}.$$
The Euclidean distance between a point p and a point set A composed of $N_a$ points denoted $a_i$, $A = \{a_i\},\ i = 1, \ldots, N_a$, is
$$d(p, A) = \min_{i \in \{1, \ldots, N_a\}} d(p, a_i).$$
The closest point $a_j$ of A satisfies
$$d(p, a_j) = d(p, A).$$
Similar definitions may be given for the Euclidean distance between a point and a line segment or triangle. Since we will be comparing meshes of facial surfaces, the most relevant definition is that for the distance between a point and a parametric surface. The Euclidean distance between p and a parametric surface S defined by $r(u)$, $u = (u, v) \in \mathbb{R}^2$, is
$$d(p, S) = \min_{r(u,v) \in S} d(p, r(u, v)).$$
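The point-to-point-set definitions translate directly into code. A minimal Python sketch (not from the report):

```python
import numpy as np

def closest_point(p, A):
    """Return the closest point of the point set A to p and the distance
    d(p, A), directly implementing the definitions above."""
    A = np.asarray(A, dtype=float)
    dists = np.linalg.norm(A - np.asarray(p, dtype=float), axis=1)
    j = int(np.argmin(dists))
    return A[j], float(dists[j])

a, d = closest_point((0, 0, 0), [(1, 0, 0), (0, 2, 0), (3, 3, 3)])
print(a, d)  # [1. 0. 0.] 1.0
```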

If one has a reliable starting value $u_a$ such that $r(u_a)$ is very close to the closest point on the parametric surface, Newton's minimisation approach may be used to find the point-to-parametric-surface distance.

5.1.1 Newton's Minimisation Approach

To minimise a function f(u) of n variables, where $u = (u_1, u_2, \ldots, u_n)$, one may use Newton's minimisation approach. Start with an approximation $u_a$ to the minimum point u. Set k = 0.

Step 1. Evaluate the gradient and Hessian matrix of f at the point $u_k$.

Step 2. Calculate the next point $u_{k+1}$ by using the update formula
$$u_{k+1} = u_k - \left[\nabla^2 f(u_k)\right]^{-1} \nabla f(u_k). \qquad (5.1)$$

Step 3. Test for minimisation. If the minimum is found, stop; otherwise set k = k + 1 and repeat.

This approach may be used to give a method for computing the distance between a point and a parametric surface if an accurate initial guess is known.

5.2 Method to Compute Point-to-Parametric-Surface Distance

We wish to minimise the scalar function
$$f(u) = \|r(u) - p\|^2 \qquad (5.2)$$
which is the squared Euclidean distance between a parametric surface and a point. The minimum of f occurs where $\nabla f = 0$, with
$$\nabla f = \begin{pmatrix} f_u \\ f_v \end{pmatrix}$$
and the Hessian is given by
$$\nabla^2 f = \begin{pmatrix} f_{uu} & f_{uv} \\ f_{uv} & f_{vv} \end{pmatrix}.$$

The partial derivatives of (5.2) are
$$f_u(u) = 2\, r_u(u)\cdot(r(u) - p), \qquad f_v(u) = 2\, r_v(u)\cdot(r(u) - p),$$
$$f_{uu}(u) = 2\big(r_{uu}(u)\cdot(r(u) - p) + r_u(u)\cdot r_u(u)\big),$$
$$f_{uv}(u) = 2\big(r_{uv}(u)\cdot(r(u) - p) + r_u(u)\cdot r_v(u)\big),$$
$$f_{vv}(u) = 2\big(r_{vv}(u)\cdot(r(u) - p) + r_v(u)\cdot r_v(u)\big).$$
Substituting these into the update formula (5.1) gives (note the common factors of 2 cancel between the inverse Hessian and the gradient)
$$u_{k+1} = u_k - \begin{pmatrix} f_{uu} & f_{uv} \\ f_{uv} & f_{vv} \end{pmatrix}^{-1}\begin{pmatrix} f_u \\ f_v \end{pmatrix} = u_k - \begin{pmatrix} r_{uu}\cdot(r-p) + r_u\cdot r_u & r_{uv}\cdot(r-p) + r_u\cdot r_v \\ r_{uv}\cdot(r-p) + r_u\cdot r_v & r_{vv}\cdot(r-p) + r_v\cdot r_v \end{pmatrix}^{-1}\begin{pmatrix} r_u\cdot(r-p) \\ r_v\cdot(r-p) \end{pmatrix}$$
where the right-hand side is evaluated at $u_k$. To find the inverse of a 2 × 2 matrix, use the identity
$$A = \begin{pmatrix} a & c \\ b & d \end{pmatrix}, \qquad A^{-1} = \frac{1}{\det(A)}\begin{pmatrix} d & -c \\ -b & a \end{pmatrix}.$$
After some simplification this gives
$$u_{k+1} = u_k - \frac{1}{f_{uu}f_{vv} - f_{uv}^2}\begin{pmatrix} f_{vv}f_u - f_{uv}f_v \\ f_{uu}f_v - f_{uv}f_u \end{pmatrix}$$
with the dot products above substituted for the derivatives of f. The result can be tested to determine whether it is indeed a minimum by substituting into $\nabla f$ and checking whether the answer is suitably near zero. Note that the starting value is $u_0 = u_a$. This generally converges in one to five iterations [3]; however, it must be noted that an accurate initial guess is required in order for convergence to occur.

5.3 Quaternions

If one wishes to align two surfaces, that is, to minimise the distance between their points, then an elegant and simple method for describing the transformation uses

quaternions. A quaternion can be considered in one of three ways: as a vector with four components, as the composite of a scalar and an ordinary vector, or as a complex number with three imaginary parts. A quaternion is denoted by $\dot q$. We will consider first the quaternion as a complex number, and will derive some useful properties that will motivate us to use the quaternion as a description for rotation when dealing with the alignment of surfaces. As a complex number a quaternion may be written in the form
$$\dot q = q_0 + iq_x + jq_y + kq_z$$
where i, j, k are imaginary units.

5.3.1 Multiplication of Quaternions

To understand the action of quaternions upon vectors, we must first establish the effect of multiplying quaternions together. Let
$$i^2 = -1, \quad j^2 = -1, \quad k^2 = -1,$$
$$ij = k, \quad jk = i, \quad ki = j, \qquad ji = -k, \quad kj = -i, \quad ik = -j.$$
Then we can define multiplication of two quaternions $\dot r = r_0 + ir_x + jr_y + kr_z$ and $\dot q = q_0 + iq_x + jq_y + kq_z$:
$$\begin{aligned} \dot r \dot q &= (r_0 + ir_x + jr_y + kr_z)(q_0 + iq_x + jq_y + kq_z) \\ &= (r_0q_0 - r_xq_x - r_yq_y - r_zq_z) + i(r_xq_0 + r_0q_x - r_zq_y + r_yq_z) \\ &\quad + j(r_yq_0 + r_zq_x + r_0q_y - r_xq_z) + k(r_zq_0 - r_yq_x + r_xq_y + r_0q_z). \end{aligned}$$
This results in somewhat cumbersome notation, which may be simplified by arranging the components into a 4 × 4 matrix.

5.3.2 Products of Quaternions

The multiplication can be arranged into a matrix where the top row corresponds to the real part and the other rows correspond to the imaginary components:
$$\dot r \dot q = R \dot q \qquad (5.3)$$

where
$$R = \begin{pmatrix} r_0 & -r_x & -r_y & -r_z \\ r_x & r_0 & -r_z & r_y \\ r_y & r_z & r_0 & -r_x \\ r_z & -r_y & r_x & r_0 \end{pmatrix},$$
and similarly
$$\dot q \dot r = \bar R \dot q \qquad (5.4)$$
where
$$\bar R = \begin{pmatrix} r_0 & -r_x & -r_y & -r_z \\ r_x & r_0 & r_z & -r_y \\ r_y & -r_z & r_0 & r_x \\ r_z & r_y & -r_x & r_0 \end{pmatrix}.$$
Observe that $\bar R$ is the same as R but with the lower right-hand 3 × 3 sub-matrix transposed. In general $\dot r \dot q \neq \dot q \dot r$ because $ij \neq ji$, etc. We now have matrices which act on a given quaternion and represent the effect of multiplication.

5.3.3 Dot Products Between Quaternions

The dot product between two quaternions acts as one would expect, giving the sum of the products of corresponding pairs of components:
$$\dot p \cdot \dot q = p_0q_0 + p_xq_x + p_yq_y + p_zq_z.$$
A unit quaternion is defined as having unit length, that is $\|\dot q\|^2 = \dot q \cdot \dot q = 1$. Taking the complex conjugate of a quaternion negates its imaginary parts:
$$\dot q^* = q_0 - iq_x - jq_y - kq_z.$$
Note that the product of a quaternion and its conjugate is entirely real:
$$\dot q^* \dot q = (q_0 - iq_x - jq_y - kq_z)(q_0 + iq_x + jq_y + kq_z) = q_0^2 + q_x^2 + q_y^2 + q_z^2 = \dot q \cdot \dot q.$$
This implies that any non-zero quaternion has an inverse
$$\dot q^{-1} = \frac{1}{\dot q \cdot \dot q}\,\dot q^*$$
which, in the case of a unit quaternion, is just the conjugate itself.
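The correspondence between quaternion multiplication and the 4 × 4 matrix of Equation 5.3 can be verified numerically. A Python sketch (illustrative; quaternions are stored in the component order $(q_0, q_x, q_y, q_z)$):

```python
import numpy as np

def quat_mult(r, q):
    """Hamilton product of two quaternions stored as (q0, qx, qy, qz)."""
    r0, rx, ry, rz = r
    q0, qx, qy, qz = q
    return np.array([r0*q0 - rx*qx - ry*qy - rz*qz,
                     r0*qx + rx*q0 + ry*qz - rz*qy,
                     r0*qy - rx*qz + ry*q0 + rz*qx,
                     r0*qz + rx*qy - ry*qx + rz*q0])

def R_matrix(r):
    """The 4x4 matrix of Equation 5.3, so that r q = R q."""
    r0, rx, ry, rz = r
    return np.array([[r0, -rx, -ry, -rz],
                     [rx,  r0, -rz,  ry],
                     [ry,  rz,  r0, -rx],
                     [rz, -ry,  rx,  r0]])

r = np.array([1.0, 2.0, 3.0, 4.0])
q = np.array([5.0, 6.0, 7.0, 8.0])
print(np.allclose(quat_mult(r, q), R_matrix(r) @ q))  # True
```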

5.3.4 Useful Properties of Quaternions

The 4 × 4 matrix Q associated with multiplication by the quaternion $\dot q$ is given by
$$Q = \begin{pmatrix} q_0 & -q_x & -q_y & -q_z \\ q_x & q_0 & -q_z & q_y \\ q_y & q_z & q_0 & -q_x \\ q_z & -q_y & q_x & q_0 \end{pmatrix}$$
and the matrix associated with the conjugate $\dot q^*$ is the transpose of Q:
$$Q^T = \begin{pmatrix} q_0 & q_x & q_y & q_z \\ -q_x & q_0 & q_z & -q_y \\ -q_y & -q_z & q_0 & q_x \\ -q_z & q_y & -q_x & q_0 \end{pmatrix}.$$
Note that the product of these matrices gives
$$QQ^T = (q_0^2 + q_x^2 + q_y^2 + q_z^2)\,I = (\dot q \cdot \dot q)\,I$$
where I is the 4 × 4 identity matrix. This is expected since $\dot q^* \dot q = (\dot q \cdot \dot q)$. Similarly, $Q^T Q = (\dot q \cdot \dot q)\,I$. Note that the columns of Q and $Q^T$ are mutually orthogonal, and orthonormal when $\dot q$ is a unit quaternion; this means that dot products are preserved and the magnitude of a product is the product of the magnitudes. For example, the dot product between the quaternions given by the multiplication of $\dot q$ and $\dot p$, and of $\dot q$ and $\dot r$, is equal to the product of $\dot q \cdot \dot q$ with $\dot p \cdot \dot r$:
$$(\dot q \dot p)\cdot(\dot q \dot r) = (Q\dot p)\cdot(Q\dot r) = (Q\dot p)^T(Q\dot r) = \dot p^T Q^T Q\, \dot r = \dot p^T (\dot q \cdot \dot q) I\, \dot r = (\dot q \cdot \dot q)(\dot p \cdot \dot r).$$
For $\dot q$ a unit quaternion this gives
$$(\dot q \dot p)\cdot(\dot q \dot r) = \dot p \cdot \dot r.$$

To show that the magnitude of a product is the product of the magnitudes, set $\dot r = \dot p$; further, exchanging $\dot p$ and $\dot q$ gives
$$(\dot q \dot p)\cdot(\dot q \dot p) = (\dot q \cdot \dot q)(\dot p \cdot \dot p), \qquad (\dot p \dot q)\cdot(\dot p \dot q) = (\dot p \cdot \dot p)(\dot q \cdot \dot q).$$
Another useful result that will be utilised later is
$$(\dot p \dot q)\cdot \dot r = \dot p \cdot (\dot r \dot q^*).$$
This is derived explicitly as follows, using Equation 5.4:
$$(\dot p \dot q)\cdot \dot r = (\bar Q \dot p)\cdot \dot r = (\bar Q \dot p)^T \dot r = \dot p^T \bar Q^T \dot r = \dot p \cdot (\bar Q^T \dot r) = \dot p \cdot (\dot r \dot q^*),$$
using Equation 5.4 and the fact that the matrix associated with right-multiplication by $\dot q^*$ is $\bar Q^T$. Vectors and scalars may be represented by quaternions that are purely imaginary and purely real respectively. Note that the matrices associated with a purely imaginary quaternion $\dot q = 0 + iq_x + jq_y + kq_z$ are skew-symmetric:
$$Q^T = -Q, \qquad \bar Q^T = -\bar Q.$$

5.4 Describing Rotation with Unit Quaternions

Our aim is to find a way of transforming surfaces so that they are aligned. This usually involves scaling, translating and rotating. It will be shown that mapping imaginary

quaternions to imaginary quaternions preserves the dot product and the sense of the cross product, and that this is sufficient to represent rotation. We require a map from an imaginary quaternion to an imaginary quaternion, and because these represent vectors it makes sense to consider the dot and cross products. Rotation is characterised by not altering length, i.e. preserving the dot product, and is distinguished from reflection by not changing the sense of the cross product. Recall that multiplying a quaternion by a unit quaternion does preserve the dot product:
$$(\dot q \dot p)\cdot(\dot q \dot r) = \dot p \cdot \dot r, \qquad \text{where } \dot q \cdot \dot q = 1.$$
This, however, is insufficient, since we require a purely imaginary result (one that represents a vector), and this is not generally true. For example,
$$\dot q = \frac{1}{\sqrt 3} i + \frac{1}{\sqrt 3} j + \frac{1}{\sqrt 3} k$$
is clearly a unit quaternion, and for $\dot p = i + j + k$,
$$\dot q \dot p = \left(\frac{1}{\sqrt 3} i + \frac{1}{\sqrt 3} j + \frac{1}{\sqrt 3} k\right)(i + j + k) = -\sqrt 3,$$
which is real. So multiplying an arbitrary imaginary quaternion by a unit quaternion does not guarantee a purely imaginary result. Instead, consider the composite product, denoted
$$\dot m' = \dot q\, \dot m\, \dot q^*,$$
which multiplies by the unit quaternion from the left and by its conjugate from the right. This can be shown to give a purely imaginary result:
$$\dot m' = \dot q \dot m \dot q^* = (Q\dot m)\dot q^* = \bar Q^T (Q \dot m) = (\bar Q^T Q)\dot m.$$
Recalling Q and $\bar Q$,
$$\bar Q^T Q = \begin{pmatrix} \dot q \cdot \dot q & 0 & 0 & 0 \\ 0 & q_0^2+q_x^2-q_y^2-q_z^2 & 2(q_xq_y - q_0q_z) & 2(q_xq_z + q_0q_y) \\ 0 & 2(q_yq_x + q_0q_z) & q_0^2-q_x^2+q_y^2-q_z^2 & 2(q_yq_z - q_0q_x) \\ 0 & 2(q_zq_x - q_0q_y) & 2(q_zq_y + q_0q_x) & q_0^2-q_x^2-q_y^2+q_z^2 \end{pmatrix}. \qquad (5.5)$$

Each element is real, so multiplying an imaginary quaternion $\dot m$ by this matrix results in a purely imaginary quaternion. If $\dot q$ is a unit quaternion then $\bar Q^T Q$ is orthonormal; further, since $\dot q \cdot \dot q = 1$ and the rest of the first row and column are zeros, the lower right-hand 3 × 3 sub-matrix R must also be orthonormal. In fact, R is the rotation matrix that transforms $\mathbf m$ into $\mathbf m'$:
$$\bar Q^T Q = \begin{pmatrix} 1 & 0^T \\ 0 & R \end{pmatrix}.$$
It must now be shown that the composite product preserves the dot product and the sense of the cross product, to show that it in fact represents rotation and not reflection. Consider now the quaternion as a scalar and a vector with three components,
$$\dot q = q + \mathbf q, \qquad q = q_0, \quad \mathbf q = (q_x\ q_y\ q_z)^T.$$
Multiplication of quaternions $\dot r$ and $\dot s$, $\dot p = \dot r \dot s$, is given by
$$p = rs - \mathbf r \cdot \mathbf s, \qquad \mathbf p = r\mathbf s + s\mathbf r + \mathbf r \times \mathbf s \qquad (5.6)$$
where p and $\mathbf p$ are the scalar and vector components of the result respectively. This holds because
$$\begin{aligned} \dot p = p + \mathbf p = \dot r \dot s &= r_0s_0 - (r_xs_x + r_ys_y + r_zs_z) + r_0(is_x + js_y + ks_z) + s_0(ir_x + jr_y + kr_z) \\ &\quad + i(r_ys_z - s_yr_z) + j(r_zs_x - r_xs_z) + k(r_xs_y - r_ys_x) \\ &= rs - \mathbf r \cdot \mathbf s + r\mathbf s + s\mathbf r + \mathbf r \times \mathbf s. \end{aligned}$$
In the case when $\dot r$ and $\dot s$ are purely imaginary, that is $r = r_0 = 0$ and $s = s_0 = 0$, Equations 5.6 simplify to give
$$p = -\mathbf r \cdot \mathbf s, \qquad \mathbf p = \mathbf r \times \mathbf s.$$
The dot product can easily be shown to be preserved:
$$\dot r' \cdot \dot s' = (\dot q \dot r \dot q^*)\cdot(\dot q \dot s \dot q^*) = (\dot r \dot q^*)\cdot(\dot s \dot q^*) = \dot r \cdot \dot s,$$
since multiplication by a unit quaternion on either side preserves dot products; hence $\mathbf r' \cdot \mathbf s' = \mathbf r \cdot \mathbf s$. The cross product can be shown to be preserved by showing that the determinant of (5.5) is +1, or by showing that applying the composite product to the factors of the cross product is equivalent to applying the composite product to the result of the cross product.
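The composite product can be checked numerically on a familiar case. A Python sketch (illustrative; quaternions stored as $(q_0, q_x, q_y, q_z)$):

```python
import numpy as np

def quat_mult(r, q):
    """Hamilton product of quaternions stored as (q0, qx, qy, qz)."""
    r0, rx, ry, rz = r
    q0, qx, qy, qz = q
    return np.array([r0*q0 - rx*qx - ry*qy - rz*qz,
                     r0*qx + rx*q0 + ry*qz - rz*qy,
                     r0*qy - rx*qz + ry*q0 + rz*qx,
                     r0*qz + rx*qy - ry*qx + rz*q0])

def rotate(q, v):
    """Composite product q v q*: embed v as a purely imaginary quaternion,
    conjugate by the unit quaternion q, and read off the imaginary part."""
    v4 = np.concatenate(([0.0], v))
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mult(quat_mult(q, v4), q_conj)[1:]

# Rotation by 90 degrees about the z axis: q = cos(45) + sin(45) k,
# which should carry the x axis onto the y axis.
theta = np.pi / 2
q = np.array([np.cos(theta/2), 0.0, 0.0, np.sin(theta/2)])
print(rotate(q, np.array([1.0, 0.0, 0.0])))
```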

Figure 5.1: Illustration of Rodrigues's rotation formula, where r is transformed into r' by a rotation through θ about an axis in the direction of r_z.

Example - Using Quaternions to Represent Rotation

To show that applying the composite product with a unit quaternion gives a representation of rotation, consider the case shown graphically in Figure 5.1, from which Rodrigues's rotation formula, Equation 5.7, can be derived as follows. Take ω to be a unit vector along the axis of rotation; in Figure 5.1 this is the direction of the axial component r_z of the vector r. The vector r is transformed into r' by a rotation through the angle θ about the rotation axis ω. The component of r along the axis is

r_z = (r . ω) ω,

the component of r in the plane of rotation is

r_x = r - (r . ω) ω,

and the vector of the same length as r_x, orthogonal to both r_x and the axis, is

v = ω x r.

The transformed in-plane component of r becomes

r_x' = r_x cos θ + v sin θ = (r - (r . ω) ω) cos θ + (ω x r) sin θ.

The result of the transformation on r is simply the sum of the transformed in-plane component and the axial component, since the latter is unchanged:

r' = r_x' + r_z = (r - (r . ω) ω) cos θ + (ω x r) sin θ + (r . ω) ω,

and so we find

r' = r cos θ + (ω x r) sin θ + (1 - cos θ)(r . ω) ω,   (5.7)

which is Rodrigues's rotation formula. To show that the unit quaternion

q = cos(θ/2) + sin(θ/2) ω

represents the rotation of r to r', one must show that (Q̄^T Q) r gives Rodrigues's rotation formula. Substituting q_0 = cos(θ/2) and (q_x, q_y, q_z) = sin(θ/2) (ω_x, ω_y, ω_z) into (5.5),

(Q̄^T Q) r = i [ r_x (cos^2(θ/2) + (ω_x^2 - ω_y^2 - ω_z^2) sin^2(θ/2))
               + 2 r_y (ω_x ω_y sin^2(θ/2) - ω_z sin(θ/2) cos(θ/2))
               + 2 r_z (ω_x ω_z sin^2(θ/2) + ω_y sin(θ/2) cos(θ/2)) ]
          + j [ 2 r_x (ω_x ω_y sin^2(θ/2) + ω_z sin(θ/2) cos(θ/2))
               + r_y (cos^2(θ/2) + (-ω_x^2 + ω_y^2 - ω_z^2) sin^2(θ/2))
               + 2 r_z (ω_y ω_z sin^2(θ/2) - ω_x sin(θ/2) cos(θ/2)) ]
          + k [ 2 r_x (ω_x ω_z sin^2(θ/2) - ω_y sin(θ/2) cos(θ/2))
               + 2 r_y (ω_y ω_z sin^2(θ/2) + ω_x sin(θ/2) cos(θ/2))
               + r_z (cos^2(θ/2) + (-ω_x^2 - ω_y^2 + ω_z^2) sin^2(θ/2)) ].

Just considering the i component, and using the double-angle identities 2 sin(θ/2) cos(θ/2) = sin θ, cos^2(θ/2) - sin^2(θ/2) = cos θ and 2 sin^2(θ/2) = 1 - cos θ, together with ω_x^2 + ω_y^2 + ω_z^2 = 1 (so that ω_x^2 - ω_y^2 - ω_z^2 = 2 ω_x^2 - 1), the coefficient of r_x simplifies to

cos^2(θ/2) + (2 ω_x^2 - 1) sin^2(θ/2) = cos θ + ω_x^2 (1 - cos θ),

while the remaining terms give

2 (r_y ω_x ω_y + r_z ω_x ω_z) sin^2(θ/2) + (ω_y r_z - ω_z r_y) sin θ = (1 - cos θ) ω_x (r_y ω_y + r_z ω_z) + sin θ (ω x r)_x.

Adding these, the i component is

r_x cos θ + sin θ (ω x r)_x + (1 - cos θ) ω_x (r_x ω_x + r_y ω_y + r_z ω_z),

which is exactly the x component of Rodrigues's formula (5.7). Similar results hold for the j and k components, and these combine to give

(Q̄^T Q) r = (q_0^2 - q . q) r + 2 q_0 (q x r) + 2 (q . r) q,

where q_0 and q are the scalar and vector parts of the unit quaternion; substituting q_0 = cos(θ/2) and q = sin(θ/2) ω recovers (5.7),
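The equivalence just derived, between the 3 x 3 block of (5.5) and Rodrigues's formula (5.7), can also be confirmed numerically. This sketch uses my own helper names and an arbitrary axis and angle:

```python
import numpy as np

def rodrigues(r, omega, theta):
    """Rodrigues's rotation formula, Equation (5.7)."""
    return (r * np.cos(theta) + np.cross(omega, r) * np.sin(theta)
            + (1 - np.cos(theta)) * (r @ omega) * omega)

def rotation_matrix(q0, qv):
    """The lower-right 3x3 block of (5.5), for q = q0 + qx i + qy j + qz k."""
    qx, qy, qz = qv
    return np.array([
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz]])

omega = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)   # unit rotation axis
theta = 0.7                                        # arbitrary angle
r = np.array([0.3, -1.2, 2.0])

R = rotation_matrix(np.cos(theta / 2), np.sin(theta / 2) * omega)
print(np.allclose(R @ r, rodrigues(r, omega, theta)))   # True
```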

as required. Note that the vector part of the unit quaternion gives the axis of rotation, and the angle can be found from the magnitudes of the scalar and vector parts.

Simple example: rotate the vector r = i through θ = 2π/3 about the axis ω = (1/sqrt(3))(i + j + k). The corresponding unit quaternion is

q = cos(π/3) + sin(π/3) (1/sqrt(3))(i + j + k) = 1/2 + (1/2) i + (1/2) j + (1/2) k,

and the composite product gives q r q* = j: this rotation permutes the coordinate axes.

Benefits of Using Quaternions Over Other Representations

Having now established that it is possible to represent rotation via a unit quaternion q, it may also be observed that there are benefits to representing rotations in this form rather than, say, as a rotation matrix. Consider the composition of the rotation represented by q followed by that represented by p:

r'' = p r' p* = p (q r q*) p* = (p q) r (q* p*) = (p q) r (p q)*.

This shows that the composite rotation is represented by the unit quaternion p q. Further, note that the multiplication of two quaternions requires fewer arithmetic operations than the multiplication of two 3 x 3 matrices representing the same rotations.

Finding the Optimal Rotation for Alignment

Recall that we are aiming to find the best rotation to align two surfaces, along with suitable scaling and translation, so that the plane of bilateral symmetry of a facial surface may be found. The algorithm for finding the optimal rotation will be outlined, and then its application as part of the ICP method given.

Consider a point set with points at r_i, i = 1, ..., n, described in two different frames of reference, 1 and 2; the ith point is given by r_{1,i} and r_{2,i} respectively.

We aim to rotate frame 2 to best align with frame 1; that is, we wish to maximise

Σ_{i=1}^n (q r_{2,i} q*) . r_{1,i}.

This is equivalent to maximising

Σ_{i=1}^n (q r_{2,i}) . (r_{1,i} q).

If r_{1,i} and r_{2,i} have components (x_{1,i}, y_{1,i}, z_{1,i}) and (x_{2,i}, y_{2,i}, z_{2,i}) respectively then, writing the products in the matrix form introduced above,

q r_{2,i} = [ 0        -x_{2,i}  -y_{2,i}  -z_{2,i} ]
            [ x_{2,i}   0         z_{2,i}  -y_{2,i} ]  q = R̄_{2,i} q
            [ y_{2,i}  -z_{2,i}   0         x_{2,i} ]
            [ z_{2,i}   y_{2,i}  -x_{2,i}   0       ]

and, using Equation 5.3,

r_{1,i} q = [ 0        -x_{1,i}  -y_{1,i}  -z_{1,i} ]
            [ x_{1,i}   0        -z_{1,i}   y_{1,i} ]  q = R_{1,i} q.
            [ y_{1,i}   z_{1,i}   0        -x_{1,i} ]
            [ z_{1,i}  -y_{1,i}   x_{1,i}   0       ]

Substituting these in, we now wish to maximise

Σ_{i=1}^n (R̄_{2,i} q) . (R_{1,i} q) = Σ_{i=1}^n q^T R̄_{2,i}^T R_{1,i} q = q^T ( Σ_{i=1}^n R̄_{2,i}^T R_{1,i} ) q = q^T ( Σ_{i=1}^n N_i ) q = q^T N q,

where, dropping the index i inside the matrix and writing x_1 for x_{1,i} and so on,

N_i = R̄_{2,i}^T R_{1,i}
    = [ x_1 x_2 + y_1 y_2 + z_1 z_2    y_2 z_1 - z_2 y_1              z_2 x_1 - x_2 z_1              x_2 y_1 - y_2 x_1            ]
      [ y_2 z_1 - z_2 y_1              x_1 x_2 - y_1 y_2 - z_1 z_2    x_2 y_1 + y_2 x_1              x_2 z_1 + z_2 x_1            ]
      [ z_2 x_1 - x_2 z_1              x_2 y_1 + y_2 x_1             -x_1 x_2 + y_1 y_2 - z_1 z_2    y_2 z_1 + z_2 y_1            ]
      [ x_2 y_1 - y_2 x_1              x_2 z_1 + z_2 x_1              y_2 z_1 + z_2 y_1             -x_1 x_2 - y_1 y_2 + z_1 z_2  ].

Introduce some new notation to abbreviate these terms:

S_xx = Σ_{i=1}^n x_{2,i} x_{1,i},  S_xy = Σ_{i=1}^n x_{2,i} y_{1,i},  S_xz = Σ_{i=1}^n x_{2,i} z_{1,i},  ...,  S_zz = Σ_{i=1}^n z_{2,i} z_{1,i}.

In this notation,

N = [ S_xx + S_yy + S_zz    S_yz - S_zy           S_zx - S_xz           S_xy - S_yx         ]
    [ S_yz - S_zy           S_xx - S_yy - S_zz    S_xy + S_yx           S_xz + S_zx         ]
    [ S_zx - S_xz           S_yx + S_xy          -S_xx + S_yy - S_zz    S_yz + S_zy         ]
    [ S_xy - S_yx           S_zx + S_xz           S_zy + S_yz          -S_xx - S_yy + S_zz  ].

All of this information can be captured in the 3 x 3 matrix

M = [ S_xx  S_xy  S_xz ]
    [ S_yx  S_yy  S_yz ]
    [ S_zx  S_zy  S_zz ].

It can be shown that the unit quaternion that maximises q^T N q is the eigenvector corresponding to the largest eigenvalue of N; see [13].

Summary of Finding Optimal Rotation

In summary, to find the optimal rotation to align one point set with another, one must first find the centre of mass, or average point, of each set, given by r̄_1 and r̄_2. Next, calculate the position of each point relative to this centre of mass and label these r'_{1,i} and r'_{2,i}, i = 1, ..., n, where n is the number of points in each set. Since we are assuming that the sets can be aligned, we require that the number of points in each set is the same and that the ith point of the first set corresponds to the ith point of the second set. As we have seen, it is sufficient to know the matrix M in order to find the optimal rotation, so we must calculate the products of components between each pair of points. Let r'_{1,i} = (x'_{1,i}, y'_{1,i}, z'_{1,i}) and r'_{2,i} = (x'_{2,i}, y'_{2,i}, z'_{2,i}), and calculate the products x'_{2,i} x'_{1,i}, x'_{2,i} y'_{1,i}, ..., z'_{2,i} z'_{1,i} for each i = 1, ..., n. From here, calculate the total products of components between the sets, S_xx = Σ_{i=1}^n x'_{2,i} x'_{1,i}, ..., S_zz = Σ_{i=1}^n z'_{2,i} z'_{1,i}, and compute N as given above. Next compute the eigenvalues and corresponding unit eigenvectors of N, and select the eigenvector corresponding to the most positive eigenvalue as the unit quaternion describing the rotation.
The axis of rotation is given by the direction of the vector part, and the angle of rotation can be found from the magnitudes of the vector and scalar parts. To perform the rotation on the set of points, apply the composite product q r'_{2,i} q* to each point. The translation is given by the difference between the centroid of the first set and the centroid of the rotated and scaled second set.
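As a concrete illustration, the summarised procedure fits in a few lines of NumPy. This is only a sketch (the function names and test rotation are my own): it builds N from corresponding point sets, takes the eigenvector of the most positive eigenvalue, and converts the quaternion into the rotation matrix of (5.5):

```python
import numpy as np

def optimal_rotation(P1, P2):
    """Unit quaternion that best rotates point set P2 onto P1.
    P1, P2: (n, 3) arrays whose ith rows correspond."""
    A = P1 - P1.mean(axis=0)             # positions relative to the centroids
    B = P2 - P2.mean(axis=0)
    S = B.T @ A                          # S[a, b] = sum_i (set-2 coord a)(set-1 coord b)
    Sxx, Sxy, Sxz = S[0]; Syx, Syy, Syz = S[1]; Szx, Szy, Szz = S[2]
    N = np.array([
        [Sxx + Syy + Szz, Syz - Szy,        Szx - Sxz,        Sxy - Syx],
        [Syz - Szy,       Sxx - Syy - Szz,  Sxy + Syx,        Sxz + Szx],
        [Szx - Sxz,       Sxy + Syx,       -Sxx + Syy - Szz,  Syz + Szy],
        [Sxy - Syx,       Sxz + Szx,        Syz + Szy,       -Sxx - Syy + Szz]])
    vals, vecs = np.linalg.eigh(N)       # N is symmetric; eigenvalues ascending
    return vecs[:, -1]                   # eigenvector of the most positive eigenvalue

def rotation_matrix(q):
    q0, qx, qy, qz = q
    return np.array([
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz]])

# Recover a known rotation (90 degrees about z) plus a translation:
R_true = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
P2 = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.], [2., 1., 0.]])
P1 = P2 @ R_true.T + np.array([5., -2., 1.])

R_est = rotation_matrix(optimal_rotation(P1, P2))
print(np.allclose(R_est, R_true))   # True
```

Because the rotation matrix is quadratic in q, the sign ambiguity of the eigenvector (q versus -q) does not matter.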

5.7 Iterative Closest Point

We now have a method for finding the optimal rotation between two surfaces described by point clouds. This can be combined with an optimal translation to create an iterative method for aligning the two surfaces. The iterative closest point (ICP) method requires two point clouds that are to be aligned, an initial approximation of the transformation, and a threshold below which the surfaces are assumed to be aligned. Let

q = [ q_R^T  q_T^T ]^T,

where q_R = [q_0 q_x q_y q_z]^T = q_0 + i q_x + j q_y + k q_z is the unit quaternion describing the rotation and q_T = [q_u q_v q_w]^T is the 3-vector describing the translation, so q = [q_0 q_x q_y q_z q_u q_v q_w]^T. Begin with an initial approximation of q.

Step 1 Calculate the closest point in set 1 to each of the points in set 2 (the set we wish to align).

Step 2 Compute the transformation that minimises the mean-squared distance between corresponding points.

Step 3 Apply the transformation to the points in set 2.

Step 4 If the distance between set 1 and each point in set 2 is below some chosen threshold, stop; else, repeat from Step 1.

Label set 1 by A = {r_{1,i}}_{i=1}^n and set 2 by B = {r_{2,i}}_{i=1}^n. We wish to minimise

f(q) = (1/n) Σ_{i=1}^n || r_{1,i} - R(q_R) r_{2,i} - q_T ||^2,

where R(q_R) r_{2,i} is the rotation matrix generated by the quaternion q_R applied to the point r_{2,i}. The distance between a point r_{2,i} in B and the set of points A is given by

d(r_{2,i}, A) = min_{r_1 in A} || r_1 - r_{2,i} ||.

The resulting set of closest points is the set C = C(A, B), where C is the closest-points operator. The least-squares registration between B and C, generated by the vector q and corresponding to the operator Q(B, C), is then applied to B, and the points of B are updated via q. Set k = 0 and B_0 = B.

Step 1 Compute the set of closest points C_k = C(A, B_k).

Step 2 Compute the registration vector q_k, where (q_k, d_k) = Q(B_0, C_k).

Step 3 Apply the registration to B_0 via B_{k+1} = q_k(B_0).

Step 4 If |d_k - d_{k-1}| < τ, where τ is some pre-defined threshold, stop; else, repeat from Step 1.

5.8 Finding the Region of Interest

Given a facial surface, how can one determine where the region of interest (ROI) lies? In our case we wish to consider the region of the face that lies inside a sphere centred at the nose tip. The following method for finding the ROI is taken from [4] and [3]. First, find the plane of bilateral symmetry through the face; next, locate the nose tip. Finally, centre a sphere at the nose tip and consider only those points on and inside the sphere.

The plane of bilateral symmetry can be found by mirroring the surface in an initial approximation of the actual plane of symmetry. This takes the surface S to the mirrored surface S'. Next, using the idea that faces exhibit strong bilateral symmetry, align S and S' using the ICP method to rotate and translate S' onto S. Consider the composite S'' of S and the aligned S', which is self-symmetric. The true plane of symmetry A is the plane that bisects pairs of corresponding points of S''; that is,

A = { x : (x - (p + p')/2) . (p - p') = 0 },

where p is a point in S and p' is the corresponding point in S'.

Now that the plane of bilateral symmetry has been found, the nose tip can easily be located. The central profile of the facial surface is simply the curve of intersection between the facial surface and the plane of symmetry. The nose tip is defined to be the point of the central profile that maximises the Euclidean distance from the line segment joining the end points of the central profile. To isolate the region of interest, centre a sphere at the nose tip and consider only those points of the surface which lie on or inside the sphere. The radius of the sphere can be altered to consider more or less of the face as required.
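Combining the closed-form rotation of Section 5.6 with closest-point correspondences gives a minimal ICP loop. The sketch below is my own condensation of the steps above (brute-force nearest neighbours; the tolerance and iteration cap are arbitrary choices, not the report's):

```python
import numpy as np

def best_rigid_transform(P1, P2):
    """Least-squares R, t with P1 ~ P2 @ R.T + t, via the quaternion
    eigenvalue method (sketch of Section 5.6)."""
    c1, c2 = P1.mean(axis=0), P2.mean(axis=0)
    S = (P2 - c2).T @ (P1 - c1)
    Sxx, Sxy, Sxz = S[0]; Syx, Syy, Syz = S[1]; Szx, Szy, Szz = S[2]
    N = np.array([
        [Sxx + Syy + Szz, Syz - Szy,        Szx - Sxz,        Sxy - Syx],
        [Syz - Szy,       Sxx - Syy - Szz,  Sxy + Syx,        Sxz + Szx],
        [Szx - Sxz,       Sxy + Syx,       -Sxx + Syy - Szz,  Syz + Szy],
        [Sxy - Syx,       Sxz + Szx,        Syz + Szy,       -Sxx - Syy + Szz]])
    q0, qx, qy, qz = np.linalg.eigh(N)[1][:, -1]   # largest-eigenvalue eigenvector
    R = np.array([
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz]])
    return R, c1 - R @ c2

def icp(A, B, max_iter=50, tol=1e-10):
    """Iteratively align point cloud B to A; returns the moved copy of B."""
    Bk, prev = B.copy(), np.inf
    for _ in range(max_iter):
        # Step 1: closest point of A for each point of Bk (brute force)
        idx = np.argmin(((Bk[:, None, :] - A[None, :, :]) ** 2).sum(-1), axis=1)
        # Steps 2-3: best registration onto those correspondences, then apply it
        R, t = best_rigid_transform(A[idx], Bk)
        Bk = Bk @ R.T + t
        # Step 4: stop once the mean-squared error no longer changes
        d = ((A[idx] - Bk) ** 2).sum(-1).mean()
        if abs(prev - d) < tol:
            break
        prev = d
    return Bk

# A rigidly perturbed grid snaps back onto the original:
g = np.arange(3.0)
A = np.array([[x, y, z] for x in g for y in g for z in g])
c, s = np.cos(0.05), np.sin(0.05)
B = A @ np.array([[c, -s, 0.], [s, c, 0.], [0., 0., 1.]]).T + np.array([0.02, -0.01, 0.03])
print(np.allclose(icp(A, B), A, atol=1e-6))   # True
```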
For example, if one wishes to identify the face then more of the surface should be considered; if, however, one desires a representation that is more invariant to expression, then a small radius should be chosen so that only the nose and brow region is included, as this is

Figure 5.2: The surface S is reflected in a first guess of the plane of symmetry and then aligned with the original surface. The bisector of corresponding points gives the true plane of bilateral symmetry, A.

Figure 5.3: The nose tip is the point of the central profile at maximal distance from the line connecting the two ends of the profile.

largely expression invariant. In particular, [11] consider the comparison of faces by analysing the shapes of level curves on the nose. This will be discussed in the following chapter, which considers the construction of geodesics between curves that represent the shapes of facial curves.

5.9 Summary

We now have a method for extracting the region of interest, that is, the face, or part of it, from a scene. This was achieved by finding the plane of symmetry that runs down the centre of the face in order to locate the nose tip. From this we defined the ROI to be the part of the surface within a sphere centred at the nose tip. Having isolated individual faces from a scene, we have solved the face detection problem. Next we tackle the recognition aspect by considering the comparison of level curves on facial surfaces.
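The nose-tip step of Section 5.8 reduces to a few lines once the central profile is sampled: pick the profile point furthest from the chord joining its end points. A sketch with a synthetic profile (the helper and the toy data are mine):

```python
import numpy as np

def nose_tip(profile):
    """profile: (n, 3) array of points along the central profile.
    Returns the point at maximal distance from the end-to-end chord."""
    a, b = profile[0], profile[-1]
    u = (b - a) / np.linalg.norm(b - a)                      # unit chord direction
    rel = profile - a
    d = np.linalg.norm(rel - np.outer(rel @ u, u), axis=1)   # point-to-line distances
    return profile[np.argmax(d)]

# Synthetic central profile in the plane x = 0, with a "nose" bump at y = 0.5:
y = np.linspace(0.0, 1.0, 101)
z = np.exp(-((y - 0.5) / 0.1) ** 2)
profile = np.column_stack([np.zeros_like(y), y, z])
print(nose_tip(profile))   # ~ [0, 0.5, 1]
```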

Chapter 6

Face Recognition Using Facial Curves

A facial surface can be represented by a set of planar curves given by the level sets of the depth function of the surface. If one can find a way of comparing the shapes of these curves, then it may be possible to apply this to face recognition. The general idea is to construct geodesics, under some metric, between these planar curves, so that one shape may be smoothly deformed into another. Whole faces [8] or parts of faces [11] may then be compared using lengths of geodesics as a distance measure, and faces categorised via clustering algorithms that seek to minimise within-class variance and maximise between-class variance. New faces may then be identified by finding their nearest neighbour under the given metric.

Two methods will be considered: the shooting method and the path-shortening method. These will be derived in turn and the broad framework established. Comparisons will be made and then further improvements outlined. The application to face recognition will then be discussed.

6.1 The Shooting Method

Constructing Geodesics Between Curves

This section is adapted from [16]. Curves defined by level sets of the depth function, that is, of the z-component of a surface, can be described in terms of a direction function as follows. Given a smooth closed curve α : R → R^2 of period 2π that is parametrised by arc length, the tangent indicatrix is given by v : R → S^1 ⊂ R^2,

v(s) = α'(s) = e^{jθ(s)},

where S^1 is the unit circle, j = sqrt(-1), and θ : R → R is the angle that α'(s) makes with the positive x-axis, called the direction function, as illustrated in Figure 6.1.

Figure 6.1: For a closed curve α in R^2, the direction function θ(s) is the angle that the tangent to the curve makes with the positive x-axis at the point s.

Clearly, v(s) determines θ only up to the addition of an integer multiple of 2π, since we are assuming that the curve α has period 2π. The rotation index n gives the number of times α'(s) rotates as s varies between 0 and 2π. We will restrict our attention to curves with rotation index 1; that is, we will consider simple closed curves and avoid self-intersection.

The shape of a curve can be represented using θ(s) by constructing a shape space that takes into account the fact that shape is invariant to certain transformations: rigid rotations, which have an action of SO(2); translations, with group action R^2; and scaling, which can easily be dealt with by fixing the length of each curve to be 2π.

Consider first the unit circle S^1; this has direction function θ_0(s) = s. Other curves with rotation index 1 can be described via θ = θ_0 + f, where f is a real-valued function of period 2π that is square integrable on [0, 2π]. In fact, let L^2 be the space of all real-valued functions on R with period 2π that are square integrable on the interval [0, 2π], so that f ∈ L^2.

We are looking to define a space whose elements describe the shape of a curve. Since we know shape is not altered by rigid rotations, translations or scaling, we must impose the following restrictions. We are only interested in closed curves, so we restrict to those

curves which satisfy

∫_0^{2π} e^{jθ(s)} ds = 0.   (6.1)

This must hold since our curves are closed: the curve must return to its starting point. For invariance to planar rotation, that is, the action of adding a constant to elements of θ_0 + L^2, we insist that

(1/2π) ∫_0^{2π} θ(s) ds = π.   (6.2)

Note that the right-hand side could be set to any constant, but by choosing π we ensure that the identity θ_0 is included in the restricted set. Consider splitting the original space into regions whose elements are equivalent under planar rotation; this restriction then gives a slice of the action of rotation (adding a constant to θ), Figure 6.2. Note that this slice is perpendicular to the rotation orbits under the L^2 inner product. This is important since this slice will contain all geodesics in the quotient space, as we will see later.

Figure 6.2: Each point in a region corresponds to the shape of a curve in R^2. Segment the space of curves into regions which are equivalent under rotation. Points in a region (separated by dashed lines) are equivalent under rotation, so only one point per region need be considered for a description of shape.

Define the preshape space C to be the set of all elements that satisfy the restrictions given in Equations 6.1 and 6.2. Further, define the map φ = (φ_1, φ_2, φ_3) : (θ_0 + L^2) → R^3, where

φ_1(θ) = (1/2π) ∫_0^{2π} θ(s) ds,
φ_2(θ) = ∫_0^{2π} cos(θ(s)) ds,
φ_3(θ) = ∫_0^{2π} sin(θ(s)) ds;

then C = φ^{-1}(π, 0, 0). Further, note that choosing a different reference point (s = 0) also does not alter the shape of the curve, so this freedom must also be removed to arrive at

the shape space. Reparametrising the starting point generates an action of S^1, since the origin can be selected at any angle around the unit circle. Therefore, the shape space S is given by S = C/S^1.

Now we can begin to understand how to compare shapes by computing geodesics between elements of S. It is too difficult to compute geodesics directly on S or C, so the idea is to draw infinitesimal tangent lines in the larger affine space θ_0 + L^2 and then project these lines onto C. This requires a mechanism for projection, and we must also have a notion of the tangent space in order to construct tangent lines. Since, by definition, the tangent space is orthogonal to the normal space, it is easy to show that the tangent space of C is given by

T_θ(C) = { f ∈ L^2 : f ⊥ span{1, cos(θ), sin(θ)} }.

This is found by considering the directional derivatives of φ : θ_0 + L^2 → R^3 at a point θ ∈ θ_0 + L^2 in the direction f ∈ L^2, and is derived in Appendix B.

Geodesics on the Preshape Space

Next, we consider how to project points from L^2 to the preshape space C; we need a mechanism for finding the closest point on C given an arbitrary point in L^2. The basic idea is to travel orthogonal to the level sets of the φ_i, so that the images under the maps φ_i form straight lines in R^3. This projection is denoted P : L^2 → C; for further details see [16].

So, to construct geodesics on C, first approximate them by working in θ_0 + L^2, then project these lines onto C using P. The following algorithm from [16] provides an iterative method for the generation of these paths. Let θ ∈ C and let f ∈ T_θ(C) be a tangent vector to C at θ. We generate a geodesic path starting from θ in the direction f; this flow is denoted Ψ(θ, t, f). At time t = 0 the flow starts at θ and is directed along f: Ψ(θ, 0, f) = θ. The flow is analysed at discrete time steps of size ∆. At the first time step, t = ∆, the point θ + ∆f ∈ L^2 is reached; apply the projection P to get the corresponding point in C.
To find the next point along the geodesic, set Ψ(θ, ∆, f) = P(θ + ∆f) and iterate to build the whole path. It is important to note that at each time step one should ensure that f is transported suitably: it must remain tangent at each point and be renormalised to keep the rate of flow constant. If θ̃ is the next point along the geodesic, then we require the new direction f̃ to be tangent to C at θ̃ and also a parallel transport of f. By setting

f̃ = (||f|| / ||g||) g,  where g = f - Σ_{k=1}^3 ⟨f, h_k⟩ h_k,   (6.3)

where the h_k form an orthonormal basis of span{1, cos θ̃, sin θ̃}, we achieve the requirements that f̃ is a parallel transport of f and that f̃ is tangent to C at θ̃.

Start with a point θ ∈ C and a direction f ∈ T_θ(C). Set l = 0 and initialise the one-parameter flow Ψ(θ, 0, f) = θ. Choose a small ∆ > 0.

Step 1 Add an increment of ∆f to the flow and project: Ψ(θ, (l+1)∆, f) = P(Ψ(θ, l∆, f) + ∆f).

Step 2 Transport f to the next point θ̃ = Ψ(θ, (l+1)∆, f) as described in Equation 6.3.

Step 3 Set l = l + 1 and repeat from Step 1, replacing f with f̃.

As ∆ → 0 it can be shown that Ψ converges to a geodesic.

Geodesics on the Shape Space

We now have a method for generating geodesics on the preshape space by iteratively projecting tangent lines from the affine space θ_0 + L^2 onto C. The problem of finding geodesics in the shape space S reduces to finding those geodesics in C which are orthogonal to the S^1-orbits, since S is simply the quotient of C by S^1. This is easily done by restricting the allowable tangent directions to those which are orthogonal to the S^1-orbits; that is, we consider only those f ∈ T_θ(C) that also satisfy f ⊥ T_θ(S^1(θ)).

Shooting Method Problem Statement

The problem we are attempting to solve is: given two elements θ_1, θ_2 ∈ S, how does one construct a geodesic path from θ_1 to θ_2 that arrives in unit time? This essentially reduces to finding the direction f ∈ T_{θ_1}(S) that minimises the miss function between the end point of the flow and θ_2; that is, we seek to minimise

H[f] = inf_{s ∈ S^1} || Ψ(θ_1, 1, f) - (s · θ_2) ||^2.

This, however, is a non-trivial problem, since T_{θ_1}(S) is infinite-dimensional, so numerical methods must be employed using a finite-dimensional approximation

f ≈ Σ_{n=0}^m ( a_n cos(ns) + b_n sin(ns) ),  m >> 0.

The concepts of this method are illustrated by Figure 6.3. There is the risk that, using this method, one may get stuck in a local minimum of the miss function, which means the geodesic does not reach the target shape.
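A discrete sketch of this machinery may help. Representing θ on a uniform grid, the map φ can be evaluated directly, and the projection P can be approximated by Newton steps along the span of the constraint gradients (the report defers the details of P to [16]; this implementation is my own assumption of one reasonable choice):

```python
import numpy as np

n = 512
s = np.linspace(0.0, 2*np.pi, n, endpoint=False)   # uniform grid on [0, 2*pi)
ds = 2*np.pi / n

def phi(theta):
    """phi = (phi_1, phi_2, phi_3); the preshape space is C = phi^{-1}(pi, 0, 0)."""
    return np.array([theta.mean(),                 # (1/2pi) * integral of theta
                     np.sum(np.cos(theta)) * ds,   # integral of cos(theta)
                     np.sum(np.sin(theta)) * ds])  # integral of sin(theta)

def project(theta, iters=20):
    """Approximate projection P onto C: Newton steps taken within the span of
    the constraint gradients {1/(2*pi), -sin(theta), cos(theta)}."""
    target = np.array([np.pi, 0.0, 0.0])
    for _ in range(iters):
        r = target - phi(theta)
        if np.linalg.norm(r) < 1e-13:
            break
        B = np.stack([np.full(n, 1/(2*np.pi)), -np.sin(theta), np.cos(theta)])
        J = (B * ds) @ B.T                 # Gram matrix of the constraint gradients
        theta = theta + np.linalg.solve(J, r) @ B
    return theta

theta = s + 0.3*np.sin(2*s) + 0.1          # a perturbed, not-quite-closed curve
theta_c = project(theta)
print(np.allclose(phi(theta_c), [np.pi, 0.0, 0.0], atol=1e-8))   # True
```

Here θ_0(s) = s (the unit circle) satisfies the constraints up to grid error, while the perturbed direction function is pulled back onto C.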
Also, since this technique demands the use of numerical methods, there is the disadvantage that

Figure 6.3: The shooting method generates geodesics of unit length; iterations then minimise the miss function between the resulting point and the target point.

the iterations may become unstable if the shape space is highly curved near the target shape. This motivates us to consider other techniques which offer some solutions to these problems; in particular, the path-shortening method may offer such.

6.2 Path-Shortening Method

This section is based on [15]. Rather than starting from a fixed point and shooting out geodesics whose paths are iteratively altered in an attempt to reach a target point in unit time, the path-shortening method begins by selecting any path between two fixed points and then iteratively shortens it, ultimately resulting in a geodesic path. This method requires some further exploration of the differential geometry of C. Again, the basic idea is to represent curves, parametrised by arc length, on the unit 2-sphere. The task is to connect two curves with a path and iteratively straighten it, using the fact that the limit will be a geodesic, Figure 6.4. We will consider the shape space, as before, by removing transformations that do not alter shape, and use this space to search for the shortest geodesic over all possible shape-preserving transformations of two curves.

We will derive a technique based on differential geometry for constructing geodesics between closed curves in R^3. Given a curve p : [0, 2π) → R^3 of period 2π, parametrised by arc length, the direction function we will work with is given by

v(s) ≡ p'(s) ∈ R^3,  ||v(s)|| = 1 for all s ∈ [0, 2π),

where || · || is the Euclidean norm in R^3. The direction function v takes the interval [0, 2π) to a curve on the unit 2-sphere, v : [0, 2π) → S^2. We will assume that v is square integrable, that is, that the integral of its modulus squared over the interval is finite, but note that it is not necessarily continuous. See Figures 6.5 and 6.6 for an

Figure 6.4: The path-shortening method generates a path between two given points and then iteratively straightens it; the path tends to a geodesic in the limit.

illustration of this.

Figure 6.5: A closed curve p(s) in R^3 may be represented by a curve on S^2 by considering the unit tangent to the curve at each point.

Figure 6.6: The curve v on S^2 represents the curve p in R^3.

Call the set of all square integrable maps from [0, 2π) to the unit 2-sphere P, given by

P = { v | v : [0, 2π) → S^2,  ∫_0^{2π} ||v(s)||^2 ds < ∞ }.

We are only interested in closed curves, so define the map

µ : P → R^3,  µ(v) = ∫_0^{2π} v(s) ds,

so that the preshape space is given by

C = µ^{-1}(0) ≡ { v ∈ P : µ(v) = 0 } ⊂ P.

Note the difference between µ and the map φ defined for the shooting method: φ had the added restriction of invariance to planar rotation. Similarly, C is now the set of all closed curves in R^3 and has no added restrictions.

Consider now the geometry of the unit 2-sphere. Geodesics on S^2 are great circles (or arcs of great circles), given analytically at a point x ∈ S^2 in a tangent direction a ∈ T_x(S^2) by

χ_t(x; a) = cos(t ||a||) x + sin(t ||a||) a/||a||.

Geodesics on the unit 2-sphere will be denoted by χ_t. In order to generate curves on the shape space we need to understand how vectors rotate on S^2. If we have two elements x_1, x_2 ∈ S^2 and we know a tangent vector a at x_1, one can transform a to be tangent at the second point via parallel transport along the geodesic joining the points. This has the effect of rotating a and may be expressed as

π(· ; x_1, x_2) : T_{x_1}(S^2) → T_{x_2}(S^2),

π(a; x_1, x_2) = a - ( (a . x_2) / (1 + x_1 . x_2) ) (x_1 + x_2)   for x_1 ≠ -x_2,
π(a; x_1, x_2) = a                                                for x_1 = x_2,

where (.) is the standard Euclidean inner product in R^3.

We want to analyse the elements of C, and to do this a geometric framework must be developed. An outline will be given here, but for a complete derivation see [16]. Recall the set P, the set of all direction functions associated with arc-length parametrised curves of period 2π. A tangent vector f to P at v belongs to the tangent space T_v(P), but can also be thought of as a vector field of tangents to S^2 along v, as shown in Figure 6.7. This tangent space is given by

T_v(P) = { f | f : [0, 2π) → R^3,  (f(s) . v(s)) = 0 for all s }.

We are interested in those vector fields f ∈ T_v(P) on v that are also tangent to C. Let γ(t) be a path in C such that γ(0) = v. As C is the set of closed curves we know that ∫_0^{2π} γ(t)(s) ds = 0 for all t. Taking the derivative with respect to t and setting t = 0 yields

∫_0^{2π} γ̇(0)(s) ds = 0.
Note that γ can always be chosen so that f = γ̇(0), so it follows that ∫_0^{2π} f(s) ds = 0.

Figure 6.7: The tangent vector field f to S^2 along v.

Therefore, the tangent space to C at v is given by

T_v(C) = { f | f : [0, 2π) → R^3,  (f(s) . v(s)) = 0 for all s,  ∫_0^{2π} f(s) ds = 0 }.

To project vectors from T_v(P) into T_v(C) it is necessary to impose a Riemannian structure on P. This will not be dealt with here, but [16] developed such a structure using the inner product

⟨f, g⟩ = ∫_0^{2π} (f(s) . g(s)) ds

on T_v(P), where f, g are vectors belonging to this tangent space.

6.2.1 Path-Shortening Flows on the Preshape Space

The idea is to connect two shapes by an arbitrary path g in C and then iteratively shorten it using a gradient approach, see Figure 6.8. The task may be defined as follows: given two closed curves v_0 and v_1 in C, find a geodesic path between them. Begin with any path g(t) connecting v_0 and v_1,

g : [0, 1] → C  such that  g(0) = v_0, g(1) = v_1.

Finding a local minimum of the energy function

E(g) = (1/2) ∫_0^1 ⟨ (dg/dt)(t), (dg/dt)(t) ⟩ dt

will give a geodesic between v_0 and v_1 on C. This geodesic, however, is not necessarily the shortest geodesic.
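The two spherical building blocks used throughout this section, the geodesic χ_t and the parallel transport π, are straightforward to implement and sanity-check (a sketch with my own function names):

```python
import numpy as np

def chi(t, x, a):
    """Great-circle geodesic on S^2 starting at x with tangent direction a."""
    na = np.linalg.norm(a)
    if na == 0.0:
        return x
    return np.cos(t * na) * x + np.sin(t * na) * a / na

def transport(a, x1, x2):
    """Parallel transport of a from the tangent space at x1 to that at x2."""
    if np.allclose(x1, x2):
        return a
    return a - (a @ x2) / (1.0 + x1 @ x2) * (x1 + x2)

x1 = np.array([1.0, 0.0, 0.0])
a = np.array([0.0, 0.3, 0.4])          # tangent at x1, since a . x1 = 0
x2 = chi(1.0, x1, a)                   # end point of the unit-time geodesic
b = transport(a, x1, x2)

print(np.isclose(np.linalg.norm(x2), 1.0))               # stays on the sphere
print(np.isclose(b @ x2, 0.0))                           # result is tangent at x2
print(np.isclose(np.linalg.norm(b), np.linalg.norm(a)))  # length preserved
```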

Figure 6.8: g is a path of curves on S^2: at each time t ∈ [0, 1] the point g(t) is a curve on S^2 (for fixed s, g(τ/k)(s) is a point of S^2), and the vector field q is tangent to the path g.

Call H the set of all paths in C parametrised by t ∈ [0, 1], and H_0 the subset of those which start at v_0 and end at v_1. The tangent spaces of these sets are

T_g(H) = { w | for all t ∈ [0, 1], w(t) ∈ T_{g(t)}(C) },
T_g(H_0) = { w ∈ T_g(H) | w(0) = w(1) = 0 },

where T_{g(t)}(C) is just T_v(C) evaluated at v = g(t). For any t, g(t) is a curve on S^2, and so corresponds to a closed curve in R^3. An element w of the tangent space of H_0 at g also describes a path of vector fields, such that for all t ∈ [0, 1], w(t) is a tangent vector field to S^2 along g(t). Understanding paths on H requires some new definitions.

Definition 6.2.1 (Covariant derivative, [5] p. 305). Given a path g ∈ H, g : [0, 1] → C, and a vector field w ∈ T_g(H), the covariant derivative of w along g is the orthogonal projection of dw(t)/dt onto the tangent space T_{g(t)}(C). It is denoted by Dw/dt.

The covariant integral of w along g is simply the vector field u ∈ T_g(H) which satisfies Du/dt = w(t). The metric that will be used to make H a Riemannian manifold is the Palais metric, given by

⟨⟨w_1, w_2⟩⟩ = ⟨w_1(0), w_2(0)⟩ + ∫_0^1 ⟨ (Dw_1/dt)(t), (Dw_2/dt)(t) ⟩ dt

for w_1, w_2 ∈ T_g(H). With respect to this metric, T_g(H_0) is a closed linear subspace of T_g(H): we know T_g(H_0) is a subset of the linear vector space T_g(H), and it can be shown that there is a zero element in T_g(H_0) and that linear combinations and scalar multiples of elements of T_g(H_0) are again

elements of T_g(H_0); since the subspace is closed, it is a closed linear subspace. Further, H_0 is a closed subspace of H, meaning it inherits the topology of H.

We are trying to minimise the energy function E in H_0; this can be achieved by finding the gradient vector of E in T_g(H) and then projecting it onto T_g(H_0). The gradient vector of E in T_g(H) is given by a vector field q such that Dq/dt = dg/dt and q(0) = 0; that is, q is the covariant integral of dg/dt with initial value zero at t = 0. Given this, q can be found via numerical methods.

Definition 6.2.2 (Covariantly constant, [10] p. 41). A vector field w along a parametrised curve g(t) is covariantly constant (or parallel) if Dw/dt = 0 for all t ∈ [0, 1].

This definition motivates us to define a geodesic in a new way.

Definition 6.2.3 (Geodesic, [5] p. 308). A curve g ∈ H is a geodesic if the covariant derivative of the tangent vector to the curve is zero along g; that is, g is a geodesic if (D/dt)(dg/dt) = 0 for all t.

Definition 6.2.4 (Covariantly linear). A vector field w along a parametrised curve g(t) is covariantly linear if Dw/dt is a covariantly constant vector field; that is, (D/dt)(Dw/dt) = 0 for all t.

Definition 6.2.5 (Forward parallel translation). The forward parallel translation of a tangent vector w ∈ T_{g(0)}(C) along a parametrised curve g is the vector field u for which u(0) = w and (Du/dt)(t) = 0 for all t ∈ [0, 1].

Definition 6.2.6 (Backward parallel translation). The backward parallel translation of a tangent vector w ∈ T_{g(1)}(C) along a parametrised curve g is given by the vector field u which, for g̃(t) ≡ g(1 - t), is the forward parallel translation of w along g̃.

Note that both forward and backward parallel translation of vectors result in vector fields which are covariantly constant, since the covariant derivative is necessarily zero by definition.

Theorem. Let g : [0, 1] → C be a path in H_0. Then, with respect to the Palais metric:

1.
The gradient of the energy function E on H is the vector field q along g satisfying q(0) = 0 and (Dq/dt)(t) = (dg/dt)(t) for all t ∈ [0, 1].

2. The gradient of the energy function E restricted to H_0 is w(t) = q(t) - t q̃(t), where q is the vector field defined in the first part, and q̃ is the vector field obtained by the backward parallel translation of q(1) along g.

Proof. (Adapted from [15])

1. Define a variation of g to be a smooth function l : [0, 1] × (-ε, ε) → C such that l(t, 0) = g(t) for all t ∈ [0, 1]. Here l depends on the parameters t and τ, and may be considered as a path of curves in H. The variational vector field corresponding to l is v(t) = (∂l/∂τ)(t, 0). Define E(τ) to be the energy of the curve l restricted to [0, 1] × {τ}; using the definition of E given above,

E(τ) = (1/2) ∫_0^1 ⟨ (∂l/∂t)(t, τ), (∂l/∂t)(t, τ) ⟩ dt.

Taking the derivative of E and setting τ = 0 gives, after some calculation,

E'(0) = ∫_0^1 ⟨ (Dv/dt)(t), (dg/dt)(t) ⟩ dt,

using that (∂l/∂t)(t, 0) = (dg/dt)(t). So the gradient of E is a vector field q along g such that E'(0) = ⟨⟨v, q⟩⟩, that is,

E'(0) = ⟨v(0), q(0)⟩ + ∫_0^1 ⟨ Dv/dt, Dq/dt ⟩ dt.

Comparing the two expressions, it must hold that q(0) = 0 and Dq/dt = dg/dt.

2. For a full proof see [15].

Theorem. For two given closed curves v_0, v_1 ∈ C, a critical point of E on H_0 is a geodesic on C connecting v_0 and v_1.

Proof. Let g be a critical point of E on H_0. This means that the gradient vector w given in the previous theorem must be zero along g, so q(t) = t q̃(t) for all t. Recall that Dq/dt = dg/dt; hence

dg/dt = Dq/dt = D(t q̃)/dt = q̃ + t (Dq̃/dt) = q̃,

since q̃, the backward parallel transport of q(1) along g, is covariantly constant. Therefore the velocity field dg/dt is covariantly constant, and g is a geodesic, as required.

6.2.2 Path-Shortening Summary

Step 1 Compute representations of each curve in C. Denote the direction functions representing the two closed curves by v_0 and v_1. Recall that P is the set of all direction functions of curves in R^3, and C the subset representing closed curves; we want closed curves, so project v ∈ P to C.

Step 2. Initialise a path g between v₀ and v₁. A tangent vector to P at a point v is given by a function f, which may also be thought of as a vector field tangent to S² along v. We want a path g : [0, 1] → C such that g(0) = v₀ and g(1) = v₁. There are several methods for generating g; one method considers the points p₀ and p₁ in R³ associated with v₀ and v₁ and connects them via pₜ(s) = t p₁(s) + (1 − t) p₀(s). Alternatively, construct a path in S² parametrised by t: for all s ∈ [0, 2π) define

θ(s) = cos⁻¹(v₀(s) · v₁(s)),
f(s) = v₁(s) − (v₀(s) · v₁(s)) v₀(s),
f̃(s) = θ(s) f(s)/|f(s)|.

Then, for all t ∈ [0, 1] and s ∈ [0, 2π), define g(t)(s) = χₜ(v₀(s); f̃(s)), where χₜ is the geodesic on S². Finally, project g(t) into C.

Step 3. Compute the velocity vector field dg/dt along g. This is a necessary step towards computing the gradient of E in T_g(H), where H is the space of all paths in C which start at v₀ and end at v₁. For continuous paths dg/dt(t) is automatically in T_{g(t)}(C), but for discrete paths we must be more careful. For τ = 1, …, k and all s ∈ [0, 2π) define

θ(s) = k cos⁻¹( g(τ/k)(s) · g((τ−1)/k)(s) ),
f(s) = g(τ/k)(s) − ( g(τ/k)(s) · g((τ−1)/k)(s) ) g((τ−1)/k)(s),
dg/dt(τ/k)(s) = θ(s) f(s)/|f(s)|,

then project dg/dt(τ/k) into T_{g(τ/k)}(C).

Step 4. Compute the covariant integral of dg/dt along g, denoted q. We want the vector field q such that q(0) = 0 and Dq/dt = dg/dt. Parallel transport q(τ/k) from T_{g(τ/k)}(C) to T_{g((τ+1)/k)}(C), denoting the transported field q‖(τ/k), then set

q((τ+1)/k)(s) = (1/k) dg/dt((τ+1)/k)(s) + q‖(τ/k)(s).
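The spherical geodesic χₜ used in Step 2 (and again in the update step later) has a simple closed form on S². The sketch below is a hypothetical discretisation under assumed conventions, not the report's code:

```python
import numpy as np

def chi(t, v, f):
    """Geodesic on the unit sphere starting at v with tangent vector f: chi_t(v; f)."""
    norm_f = np.linalg.norm(f)
    if norm_f < 1e-12:          # zero tangent: stay at v
        return v
    return np.cos(t * norm_f) * v + np.sin(t * norm_f) * f / norm_f

def initial_path(v0, v1, t):
    """Pointwise initial path g(t)(s) between direction vectors v0(s) and v1(s)."""
    theta = np.arccos(np.clip(np.dot(v0, v1), -1.0, 1.0))
    f = v1 - np.dot(v0, v1) * v0            # project v1 onto tangent space at v0
    f_tilde = theta * f / np.linalg.norm(f)
    return chi(t, v0, f_tilde)

v0 = np.array([1.0, 0.0, 0.0])
v1 = np.array([0.0, 1.0, 0.0])
print(initial_path(v0, v1, 0.0))   # recovers v0
print(initial_path(v0, v1, 1.0))   # recovers v1
print(initial_path(v0, v1, 0.5))   # midpoint on the great circle
```

Recovering v₀ at t = 0 and v₁ at t = 1 is a quick sanity check that the initialisation really connects the two direction functions.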

Iterating this over τ gives the gradient vector field q(τ/k) ∈ T_{g(τ/k)}(C). If (1/k) Σ_{τ=1}^{k} ⟨dg/dt(τ/k), dg/dt(τ/k)⟩ is small then stop; else continue. The resulting g is a geodesic with length

d_C(v₀, v₁) = ⟨dg/dt(0), dg/dt(0)⟩^{1/2}.

Step 5. Backward parallel transport q(1) along g. Find the covariantly constant vector field q̃ by iterative backward parallel transport of q(1) along g.

Step 6. Compute the full gradient vector field of E along g, denoted w. For all τ ∈ {0, 1, …, k} and s ∈ [0, 2π), the gradient vector field of E along g is given by

w(τ/k)(s) = q(τ/k)(s) − (τ/k) q̃(τ/k)(s) ∈ T_{g(τ/k)(s)}(S²).

Step 7. Update g in the direction −w. Update g using the formula

g(τ/k)(s) = χ₁( g(τ/k)(s); −w(τ/k)(s) ),

then repeat from Step 3.

6.2.3 Geodesics on Shape Space

Shape space is again a quotient space of C, modulo shape-preserving transformations. By describing the initial curves by direction functions, translation variability is automatically accounted for; by demanding that the closed curves have length 2π, we account for scaling. To deal with rigid rotation, define rotated shapes by (Ov)(s) = O v(s), where s ∈ [0, 2π) and O ∈ SO(3), the special orthogonal group (the space of all 3 × 3 rotation matrices). For re-parametrisation, consider the curve whose origin has been moved through an angle θ ∈ S¹; the re-parametrised curve is (θ · v)(s) = v((s − θ) mod 2π). So we have the rotation group SO(3) and the re-parametrisation group S¹. These groups act on C, and the orbit associated with a curve v ∈ C is

C_v = { w ∈ C : w = O(θ · v), θ ∈ S¹, O ∈ SO(3) }.

This splits C into disjoint equivalence classes, each associated with a unique shape. The shape space is therefore the quotient space S = C/(S¹ × SO(3)). Note that geodesic distances on C are not altered, since the action of S¹ × SO(3) is isometric, that is, distance preserving. The geodesics in S correspond to those
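Steps 4 and 5 both rely on parallel transport between tangent spaces of the sphere. Along the great circle joining two unit vectors a and b this has a closed form; the sketch below is a minimal illustration of that formula, not code from the report:

```python
import numpy as np

def parallel_transport(w, a, b):
    """Parallel transport of tangent vector w from T_a(S^2) to T_b(S^2)
    along the great circle joining unit vectors a and b."""
    return w - (np.dot(w, b) / (1.0 + np.dot(a, b))) * (a + b)

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
w = np.array([0.0, 1.0, 0.0])       # tangent at a, pointing along the geodesic
wt = parallel_transport(w, a, b)

print(wt)                            # transported vector
print(np.dot(wt, b))                 # tangency at b: should vanish
print(np.linalg.norm(wt))            # norm is preserved
```

Both defining properties of parallel transport are visible here: the result is tangent at the new base point and has the same norm as the input.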

Figure 6.9: Projecting the tangent to the current geodesic onto the space C_{v₁} and updating v₀ along the projection iteratively solves the minimisation problem.

geodesics in C of shortest length. That is, we seek the shortest geodesic connecting the orbits C_{v₀} and C_{v₁}, with length

d_S(v₀, v₁) = min_{θ ∈ S¹, O ∈ SO(3)} d_C(θ · Ov₀, v₁).   (6.4)

This may be computed numerically using an iterative method which finds the θ and O that locally minimise this length, as illustrated in Figure 6.9.

6.3 Application to Face Recognition

Having found geodesic paths between curves on the shape space S, it is possible to use statistical techniques to define notions of mean shape and variance. This means curves can be classified by their shape using clustering [16]. Given a facial surface which has been aligned so that the gaze direction points towards the camera, one may represent the surface by a set of curves defined by level sets of the depth function, that is, the value of the coordinate directed into the face through the nose tip. Facial surfaces may be approximated by selecting a subset of these level curves. Choose those curves which are closed and scale each curve to length 2π. This, however, results in losing some information associated with the relative size of the curves, but labelling each curve ensures that surfaces represented by ordered sets of closed curves may be compared directly. [8] use a shooting method to generate geodesics between corresponding facial curves C¹_λ, C²_λ at a depth λ ∈ Λ on surfaces
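Equation (6.4) can be approximated by a crude discrete search: try each origin shift θ on a grid of S¹, solve for the best rotation with a Procrustes (SVD) step, and keep the minimum. The sketch below substitutes a plain L² distance for d_C and invents its own helper names, so it illustrates the minimisation structure rather than the actual geodesic computation:

```python
import numpy as np

def best_rotation(A, B):
    """Rotation R in SO(3) minimising sum_i ||R a_i - b_i||^2 (Kabsch/Procrustes)."""
    H = A.T @ B                                   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # keep det(R) = +1
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

def min_over_group(v0, v1):
    """Discrete approximation of the minimum over S^1 x SO(3) of an
    L2 stand-in for d_C(theta . O v0, v1)."""
    best = np.inf
    for shift in range(len(v0)):                  # theta ranges over a grid of S^1
        v0s = np.roll(v0, shift, axis=0)
        R = best_rotation(v0s, v1)
        best = min(best, np.linalg.norm(v0s @ R.T - v1))
    return best

s = np.linspace(0.0, 2 * np.pi, 100, endpoint=False)
v = np.stack([np.cos(s), np.sin(s), np.zeros_like(s)], axis=1)

# A rotated, re-parametrised copy of v should be at (numerically) zero distance.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
w = np.roll(v @ Rz.T, 17, axis=0)
print(min_over_group(w, v))   # near zero
```

The iterative method in the text refines θ and O locally instead of exhaustively searching, but the quotient structure being minimised over is the same.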

S¹, S² respectively, and then compare shapes using two metrics:

d_e(S¹, S²) = ( Σ_{λ∈Λ} d(C¹_λ, C²_λ)² )^{1/2},
d_g(S¹, S²) = ( Π_{λ∈Λ} d(C¹_λ, C²_λ) )^{1/|Λ|},

where d_e is the Euclidean length and d_g is the geometric mean. They found that both metrics were able to correctly identify faces pair-wise, even with changes in expression: the distance between curves representing faces of the same person was consistently lower than the distance between curves of faces of different people. Clearly, using more curves to represent a face will increase the recognition rate; however, it also greatly increases the computational demands. This raises the question: what is the optimal number of curves with which to represent a face? Both [16] and [11] found that performance increased significantly with more curves up to around 4 or 5 curves, where performance levelled off. This suggests that 5 curves is the best number with which to represent the surface in question, whether it be the whole face or the nose. Similarly, increasing the number of training faces increases the chances of correctly identifying the target face.

6.3.1 Illustration of Number of Curves Defining a Face

The following uses range data from the Bosphorus 3D Face Database [9]. Consider the face rendered in three dimensions in Figure 6.10 using MATLAB [19]. Using the imcontour command it is possible to plot level curves of this surface. Below are examples of the results obtained for 1, 2, 3, 4, 5, 10, 20, 50, 100 and 200 curves. Clearly, with increasing curves the face becomes more easily identifiable. However, note that in these examples the curves are not restricted to closed curves only.

6.4 Further Work

The approaches described so far have assumed that shapes are inelastic, since the curves are taken to be arc-length parametrised; to incorporate bending and stretching one must consider an elastic metric. One such metric was suggested by [34] and used by [1] to compute geodesics between shapes.
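The two surface metrics simply combine the per-curve distances d(C¹_λ, C²_λ) as a Euclidean norm and a geometric mean. A toy sketch (the function names and distance values are assumptions for illustration, not measurements from the report):

```python
import numpy as np

def euclidean_metric(dists):
    """d_e: Euclidean combination of per-curve geodesic distances."""
    d = np.asarray(dists, dtype=float)
    return float(np.sqrt(np.sum(d ** 2)))

def geometric_mean_metric(dists):
    """d_g: geometric mean of per-curve geodesic distances."""
    d = np.asarray(dists, dtype=float)
    return float(np.exp(np.mean(np.log(d))))   # = (prod d)^(1 / |Lambda|)

same_person = [0.10, 0.12, 0.08, 0.11, 0.09]   # toy values: small distances
different_people = [0.40, 0.35, 0.50, 0.45, 0.38]

print(euclidean_metric(same_person), euclidean_metric(different_people))
print(geometric_mean_metric(same_person), geometric_mean_metric(different_people))
```

With either combination, a consistently smaller per-curve distance for same-person pairs yields a smaller surface distance, which is the property the pair-wise identification experiment relies on.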

Figure 6.10: Three-dimensional rendering of a face from the Bosphorus 3D Face Database.

For a parametrised curve α in R², represent the velocity vector α̇(s) as r(s)e^{jθ(s)}, where r(s) is the instantaneous speed and θ(s) is again the angle made between α̇(s) and the positive x-axis. This representation has the advantage over other possible representations that the metric reduces to a simple L² metric, compared with the complicated forms which arise because of speed invariance. Further, the preshape space is just a subset of the unit sphere. Represent the parametrised elastic curve α by the function

Q(s) = α̇(s)/√|α̇(s)| ∈ Rⁿ.

Here |Q(s)| is the square root of the instantaneous speed and Q(s)/|Q(s)| is the direction function, for all s ∈ [0, 2π). Geodesic paths in preshape spaces may be found similarly to the path-shortening method discussed previously. See [1] and [14] for further details.
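The square-root-velocity function can be sketched for a discretised curve. A useful check is that ∫ |Q(s)|² ds recovers the curve length, which is part of what makes the L² metric natural for this representation. The sketch below is illustrative only, with an assumed finite-difference discretisation:

```python
import numpy as np

def srv(points, ds):
    """Square-root-velocity representation Q(s) = a'(s) / sqrt(|a'(s)|)."""
    vel = np.gradient(points, ds, axis=0)
    speed = np.linalg.norm(vel, axis=1, keepdims=True)
    return vel / np.sqrt(speed)

# Circle of radius 2 parametrised over [0, 2*pi): its length is 4*pi.
n = 2000
ds = 2 * np.pi / n
s = np.arange(n) * ds
curve = np.stack([2 * np.cos(s), 2 * np.sin(s)], axis=1)

Q = srv(curve, ds)
length = np.sum(np.linalg.norm(Q, axis=1) ** 2) * ds
print(length)
```

For this radius-2 circle the integral evaluates to approximately 4π, the circumference, confirming that |Q(s)|² equals the instantaneous speed.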

Figure 6.11: 1 curve. Figure 6.12: 2 curves. Figure 6.13: 3 curves. Figure 6.14: 4 curves. Figure 6.15: 5 curves. Figure 6.16: 10 curves. Figure 6.17: 20 curves. Figure 6.18: 50 curves. Figure 6.19: 100 curves. Figure 6.20: 200 curves.


More information

Modeling Classes of Shapes Suppose you have a class of shapes with a range of variations: System 2 Overview

Modeling Classes of Shapes Suppose you have a class of shapes with a range of variations: System 2 Overview 4 4 4 6 4 4 4 6 4 4 4 6 4 4 4 6 4 4 4 6 4 4 4 6 4 4 4 6 4 4 4 6 Modeling Classes of Shapes Suppose you have a class of shapes with a range of variations: System processes System Overview Previous Systems:

More information

Course Summary Math 211

Course Summary Math 211 Course Summary Math 211 table of contents I. Functions of several variables. II. R n. III. Derivatives. IV. Taylor s Theorem. V. Differential Geometry. VI. Applications. 1. Best affine approximations.

More information

Low-level Image Processing

Low-level Image Processing Low-level Image Processing In-Place Covariance Operators for Computer Vision Terry Caelli and Mark Ollila School of Computing, Curtin University of Technology, Perth, Western Australia, Box U 1987, Emaihtmc@cs.mu.oz.au

More information

The Symmetric Space for SL n (R)

The Symmetric Space for SL n (R) The Symmetric Space for SL n (R) Rich Schwartz November 27, 2013 The purpose of these notes is to discuss the symmetric space X on which SL n (R) acts. Here, as usual, SL n (R) denotes the group of n n

More information

Linear Algebra. Min Yan

Linear Algebra. Min Yan Linear Algebra Min Yan January 2, 2018 2 Contents 1 Vector Space 7 1.1 Definition................................. 7 1.1.1 Axioms of Vector Space..................... 7 1.1.2 Consequence of Axiom......................

More information

Face Detection and Recognition

Face Detection and Recognition Face Detection and Recognition Face Recognition Problem Reading: Chapter 18.10 and, optionally, Face Recognition using Eigenfaces by M. Turk and A. Pentland Queryimage face query database Face Verification

More information

Math 433 Outline for Final Examination

Math 433 Outline for Final Examination Math 433 Outline for Final Examination Richard Koch May 3, 5 Curves From the chapter on curves, you should know. the formula for arc length of a curve;. the definition of T (s), N(s), B(s), and κ(s) for

More information

Notes on Cellwise Data Interpolation for Visualization Xavier Tricoche

Notes on Cellwise Data Interpolation for Visualization Xavier Tricoche Notes on Cellwise Data Interpolation for Visualization Xavier Tricoche urdue University While the data (computed or measured) used in visualization is only available in discrete form, it typically corresponds

More information

Lecture: Face Recognition

Lecture: Face Recognition Lecture: Face Recognition Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 12-1 What we will learn today Introduction to face recognition The Eigenfaces Algorithm Linear

More information

A PROOF OF THE GAUSS-BONNET THEOREM. Contents. 1. Introduction. 2. Regular Surfaces

A PROOF OF THE GAUSS-BONNET THEOREM. Contents. 1. Introduction. 2. Regular Surfaces A PROOF OF THE GAUSS-BONNET THEOREM AARON HALPER Abstract. In this paper I will provide a proof of the Gauss-Bonnet Theorem. I will start by briefly explaining regular surfaces and move on to the first

More information

Face recognition Computer Vision Spring 2018, Lecture 21

Face recognition Computer Vision Spring 2018, Lecture 21 Face recognition http://www.cs.cmu.edu/~16385/ 16-385 Computer Vision Spring 2018, Lecture 21 Course announcements Homework 6 has been posted and is due on April 27 th. - Any questions about the homework?

More information

Gauss Theorem Egregium, Gauss-Bonnet etc. We know that for a simple closed curve in the plane. kds = 2π.

Gauss Theorem Egregium, Gauss-Bonnet etc. We know that for a simple closed curve in the plane. kds = 2π. Gauss Theorem Egregium, Gauss-Bonnet etc. We know that for a simple closed curve in the plane kds = 2π. Now we want to consider a simple closed curve C in a surface S R 3. We suppose C is the boundary

More information

Lecture 8: Interest Point Detection. Saad J Bedros

Lecture 8: Interest Point Detection. Saad J Bedros #1 Lecture 8: Interest Point Detection Saad J Bedros sbedros@umn.edu Last Lecture : Edge Detection Preprocessing of image is desired to eliminate or at least minimize noise effects There is always tradeoff

More information

ν(u, v) = N(u, v) G(r(u, v)) E r(u,v) 3.

ν(u, v) = N(u, v) G(r(u, v)) E r(u,v) 3. 5. The Gauss Curvature Beyond doubt, the notion of Gauss curvature is of paramount importance in differential geometry. Recall two lessons we have learned so far about this notion: first, the presence

More information

Complete Surfaces of Constant Gaussian Curvature in Euclidean Space R 3.

Complete Surfaces of Constant Gaussian Curvature in Euclidean Space R 3. Summary of the Thesis in Mathematics by Valentina Monaco Complete Surfaces of Constant Gaussian Curvature in Euclidean Space R 3. Thesis Supervisor Prof. Massimiliano Pontecorvo 19 May, 2011 SUMMARY The

More information

Outline of the course

Outline of the course School of Mathematical Sciences PURE MTH 3022 Geometry of Surfaces III, Semester 2, 20 Outline of the course Contents. Review 2. Differentiation in R n. 3 2.. Functions of class C k 4 2.2. Mean Value Theorem

More information

What is Principal Component Analysis?

What is Principal Component Analysis? What is Principal Component Analysis? Principal component analysis (PCA) Reduce the dimensionality of a data set by finding a new set of variables, smaller than the original set of variables Retains most

More information

MECH 576 Geometry in Mechanics November 30, 2009 Kinematics of Clavel s Delta Robot

MECH 576 Geometry in Mechanics November 30, 2009 Kinematics of Clavel s Delta Robot MECH 576 Geometry in Mechanics November 3, 29 Kinematics of Clavel s Delta Robot The DELTA Robot DELTA, a three dimensional translational manipulator, appears below in Fig.. Figure : Symmetrical (Conventional)

More information

A Tutorial on Data Reduction. Principal Component Analysis Theoretical Discussion. By Shireen Elhabian and Aly Farag

A Tutorial on Data Reduction. Principal Component Analysis Theoretical Discussion. By Shireen Elhabian and Aly Farag A Tutorial on Data Reduction Principal Component Analysis Theoretical Discussion By Shireen Elhabian and Aly Farag University of Louisville, CVIP Lab November 2008 PCA PCA is A backbone of modern data

More information

Methods for sparse analysis of high-dimensional data, II

Methods for sparse analysis of high-dimensional data, II Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional

More information

How curvature shapes space

How curvature shapes space How curvature shapes space Richard Schoen University of California, Irvine - Hopf Lecture, ETH, Zürich - October 30, 2017 The lecture will have three parts: Part 1: Heinz Hopf and Riemannian geometry Part

More information

Data Preprocessing Tasks

Data Preprocessing Tasks Data Tasks 1 2 3 Data Reduction 4 We re here. 1 Dimensionality Reduction Dimensionality reduction is a commonly used approach for generating fewer features. Typically used because too many features can

More information

δ-hyperbolic SPACES SIDDHARTHA GADGIL

δ-hyperbolic SPACES SIDDHARTHA GADGIL δ-hyperbolic SPACES SIDDHARTHA GADGIL Abstract. These are notes for the Chennai TMGT conference on δ-hyperbolic spaces corresponding to chapter III.H in the book of Bridson and Haefliger. When viewed from

More information

ECE 661: Homework 10 Fall 2014

ECE 661: Homework 10 Fall 2014 ECE 661: Homework 10 Fall 2014 This homework consists of the following two parts: (1) Face recognition with PCA and LDA for dimensionality reduction and the nearest-neighborhood rule for classification;

More information