Online Appearance Model Learning for Video-Based Face Recognition


Liang Liu¹, Yunhong Wang², Tieniu Tan¹
¹ National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China — {lliu, tnt}@nlpr.ia.ac.cn
² School of Computer Science and Engineering, Beihang University, Beijing, China — yhwang@buaa.edu.cn

Abstract

In this paper, we propose a novel online learning method which can learn appearance models incrementally from a given video stream. The data of each frame in the video can be discarded as soon as it has been processed. We only need to maintain a few linear eigenspace models and a transition matrix to approximately construct face appearance manifolds, and it is convenient to use these learnt models for video-based face recognition. This paper makes two main contributions. First, we propose an algorithm which can learn appearance models online without using a pre-trained model. Second, we propose a method for eigenspace splitting that prevents most samples from clustering into the same eigenspace, which is useful for clustering and classification. Experimental results show that the proposed method can both learn appearance models online and achieve a high recognition rate.

1. Introduction

For video-based face recognition, most state-of-the-art algorithms [1, 2, 6, 7, 8, 9, 10, 12, 16, 18, 19] can perform the recognition task in real time. However, the training process usually runs off-line in batch mode. Though a few online learning algorithms have been proposed recently, they generally perform online learning on top of a pre-trained model. A pre-trained model is typically trained in batch mode from a data set which is manually collected and labeled, and the task of collecting and labeling data is tedious. In addition, a pre-trained model is usually not flexible enough to cope with the variety of conditions encountered online. Therefore, learning appearance models online without a pre-trained model is a problem worth addressing.

Batch training has several drawbacks. First, it is inconvenient to add training samples: each time new samples are added, the batch algorithm must be rerun, and its complexity is generally at least proportional to the total number of training samples. Second, for a huge training set, the computational cost of batch algorithms is often prohibitive on current computers. Third, batch algorithms cannot be applied to real-time online training.

In contrast, online learning has several advantages. First, it is convenient to add training samples: each time new samples are added, the computational cost stays roughly the same, because data is discarded as soon as it has been processed and only a roughly constant amount of memory is needed to represent the models. This makes online learning well suited to processing video streams. Second, huge data sets can be handled easily by sequential processing. Third, online learning algorithms can be used for real-time online training. For video-based face recognition, these properties are quite desirable.

Based on the framework of probabilistic appearance manifolds proposed by Lee and Kriegman [8], we propose an online learning algorithm which can learn appearance manifolds without a pre-trained model. Similar to [8], we use a set of linear eigenspaces to represent sub-manifolds.
However, in our method the sub-manifolds are learnt completely online, whereas Lee and Kriegman's method [8] learns appearance manifolds from a pre-trained model. For each subject in the training data set, we construct K pose manifolds, each approximately represented by a linear eigenspace (K can be chosen empirically). To exploit the temporal information embedded in video streams, we maintain a transition matrix that records the number of transitions from one eigenspace to another; the transition probability between eigenspaces can then be computed easily from this matrix. In the online learning process, for each incoming frame the eigenspace models are updated using IPCA (Incremental Principal Component Analysis) [4] or the Eigenspace Merging and Splitting (EMS) method, taking transition probabilities into account.

For the recognition task, we compute the likelihood that a test frame is generated from each pose manifold and choose the one with the highest probability. Experimental results show that the proposed algorithm achieves a high recognition rate.

The remainder of this paper is organized as follows. In Section 2, we briefly review previous work on video-based face recognition. In Section 3, we describe the proposed method in detail. Experimental results are presented in Section 4, and conclusions are drawn in Section 5.

2. Previous work

A general review of the recent face recognition literature can be found in [17]. In this section, we only briefly review the literature which deals specifically with video-based face recognition.

In [16], the Mutual Subspace Method (MSM) is applied, in which similarity is defined by the angle between the input and reference subspaces. Krüeger and Zhou [6] proposed an exemplar-based method which selects representative face images as exemplars from training face videos; these exemplars are used to facilitate tracking and recognition. Liu and Chen [10] proposed to use adaptive Hidden Markov Models (HMM) for video-based face recognition. In [12], a KL-divergence-based algorithm was proposed. In [14] and [9], frame synchronization was used, and audio information in the videos was also exploited. In [1], a generic shape-illumination manifold is learnt offline; given a new sequence, the learnt model is used to decompose the face appearance manifold into albedo and shape-illumination manifolds. Zhou et al. [18, 19] proposed a generic framework for both tracking and recognition by estimating the joint posterior probability distribution of the motion vector and the identity variable. Lee et al. [7] proposed a method using probabilistic appearance manifolds, where an appearance manifold is approximated by piecewise linear subspaces and a transition matrix learnt from an image sequence. An online learning algorithm for constructing a probabilistic appearance manifold was proposed in [8]. In that method, an appearance model is incrementally learnt online using a prior generic model and successive frames from the video. Both the generic and individual appearances are represented as an appearance manifold that is approximated by a collection of sub-manifolds (namely pose manifolds) and the connectivity between them. One obvious limitation is that their algorithm requires a generic prior model which must be learnt offline.

Our work bears some resemblance to [8] in the sense that both methods use eigenspace models and a transition matrix to approximate pose manifolds. However, in this paper, we present an online learning algorithm which does not require a generic model.

3. Online appearance model learning

In Section 3.1, we describe the appearance models and the transition matrix to be learnt online. In Section 3.2, a framework for online appearance model learning using Eigenspace Merging and Splitting (EMS) is presented. Eigenspace update using IPCA and EMS is a critical part of our method and is discussed in Section 3.3. In Section 3.4, we discuss the computation of distances in more detail.

3.1. Model description

The problem we focus on in this paper can be described as follows.
For the training face video stream of each subject in the data set, we aim to construct K eigenspaces, Ω₁, Ω₂, ..., Ω_K, to approximately represent the appearance manifold of that subject. Each eigenspace has four parameters [4]:

\[ \Omega^{(i)} = \{\bar{x}^{(i)},\, U^{(i)},\, \Lambda^{(i)},\, N^{(i)}\}, \qquad i = 1, \dots, K. \tag{1} \]

The meaning of each parameter is as follows.
x̄: center of the eigenspace.
U: a matrix whose columns are an orthonormal basis of the eigenspace, namely the eigenvectors.
Λ: a diagonal matrix whose diagonal elements are the variances along each principal axis, namely the eigenvalues, arranged in descending order.
N: number of samples used to construct the eigenspace.

We use a transition matrix T to record the number of transitions from one eigenspace to another. To make our algorithm more efficient, we also maintain a distance matrix D recording the distance from one eigenspace to another:

\[ T = (T_{ij})_{K \times K}, \tag{2} \]
\[ D = (D_{ij})_{K \times K}. \tag{3} \]

How to learn these models online is the focus of this paper; our method is presented in Section 3.2.
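To make the bookkeeping concrete, the following is a minimal Python/NumPy sketch of the per-subject state described above (the class and field names are ours for illustration, not the paper's):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Eigenspace:
    """One linear eigenspace model, Eq. (1): Omega = {x, U, Lambda, N}."""
    x: np.ndarray     # (m,)   center of the eigenspace
    U: np.ndarray     # (m, q) orthonormal eigenvectors as columns
    lam: np.ndarray   # (q,)   eigenvalues in descending order
    N: float          # number of samples merged into this eigenspace

class SubjectModel:
    """K eigenspaces plus the transition and distance matrices, Eqs. (2)-(3)."""
    def __init__(self, K: int, T0: float = 1.0):
        self.spaces = [None] * K        # Omega^(1), ..., Omega^(K)
        self.T = np.full((K, K), T0)    # transition counts, uniform prior T0
        self.D = np.zeros((K, K))       # pairwise center distances
```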

3.2. Online appearance model learning method

Our method is motivated by hierarchical clustering: we simply perform hierarchical clustering in an incremental way. Initially, we assign the first K frames as the centers of K eigenspaces and compute the distance between every pair of eigenspace centers. For each incoming frame I_t, we compute the distance between I_t and each eigenspace center. The distance between two eigenspace centers, or between an incoming sample and an eigenspace center, is influenced by both the Euclidean distance and the transition probability between them (see Section 3.4 for details). We then find the nearest pair. If the smallest distance is between I_t and an eigenspace center, we update that eigenspace using IPCA. If the smallest distance is between two eigenspace centers, we merge those two eigenspaces into one. To prevent most frames from clustering into the same eigenspace, we split an eigenspace when it contains too many frames. Each time the eigenspaces are updated, the transition matrix T and the distance matrix D are updated as well. Our algorithm is summarized as follows.

Algorithm 1
Input: {I₁, I₂, ..., I_n}: a consecutive face image sequence from one person; K: the number of eigenspace models to be learnt.
Output: Ω^(i) = {x̄^(i), U^(i), Λ^(i), N^(i)}, i = 1, ..., K.
Method: Initialize matrix D with all zeros. Initialize each element of matrix T with the same positive number T₀ (the prior distribution of the transition probability is assumed uniform).
1.  for i ← 1 to n
2.    if i ≤ K
3.      x̄^(i) ← I_i, N^(i) ← 1, U^(i) ← {}, Λ^(i) ← {}
4.      Update D and T (Section 3.4).
5.    else
6.      Compute the distance from I_i to each eigenspace center (Section 3.4).
7.      Find the nearest pair.
8.      if the smallest distance is between I_i and an eigenspace center
9.        Update that eigenspace with I_i using IPCA (Section 3.3.1).
10.     else
11.       Merge the two nearest eigenspaces (Section 3.3.2).
12.       Assign I_i as the center of a new eigenspace model.
13.     end if
14.     If an eigenspace contains too many samples, split it into two (Section 3.3.3).
15.     Update D and T.
16.   end if
17. end for

We have also considered how to handle outliers. Generally, an eigenspace is more likely to be an outlier if it is constructed from fewer frames or its center is farther from the other eigenspace centers. In the eigenspace splitting step, we remove the eigenspace that is most likely to be an outlier according to these criteria.

In our algorithm, eigenspace update using IPCA and EMS is a critical part. In Section 3.3, we show how to update eigenspaces without knowing either the original samples or the covariance matrix.

3.3. Eigenspace update

In this section, we discuss the computation of IPCA, eigenspace merging and eigenspace splitting in more detail. At the end of this section, some tricks are provided to make the calculation more efficient.

3.3.1 IPCA

An IPCA algorithm was proposed by Hall et al. [4], but their method is somewhat complicated. A more concise method is as follows.

Algorithm 2
Input: Ω = {x̄, U, Λ, N}: constructed from x₁, x₂, ..., x_N; x: a new sample.
Output: Ω' = {x̄', U', Λ', N'}: constructed from x₁, x₂, ..., x_N, x.
Method:
1. N' ← N + 1.
2. α₁ ← N/N', α₂ ← 1 − α₁.
3. x̄' ← α₁ x̄ + α₂ x.
4. Generate artificial data: Y ← [√α₁ U Λ^{1/2}, √(α₁α₂) (x̄ − x)].
5. Compute the eigenvectors and eigenvalues of YᵀY: YᵀY = V Λ' Vᵀ.
6. Compute the eigenvectors of Ω': U' ← Y V Λ'^{−1/2}.
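For illustration, Algorithm 2 fits in a dozen lines of NumPy (a minimal sketch of ours, assuming Λ is stored as a 1-D array of eigenvalues; as the text notes below, only eigenvalues above a small tolerance are retained):

```python
import numpy as np

def ipca_update(x_bar, U, lam, N, x, tol=1e-10):
    """One IPCA step (Algorithm 2): fold a new sample x into (x_bar, U, lam, N)."""
    N_new = N + 1
    a1 = N / N_new
    a2 = 1.0 - a1
    x_bar_new = a1 * x_bar + a2 * x
    # Step 4: artificial data Y whose outer product Y Y^T equals the new covariance.
    Y = np.column_stack([np.sqrt(a1) * U * np.sqrt(lam),   # sqrt(a1) * U * Lam^(1/2)
                         np.sqrt(a1 * a2) * (x_bar - x)])
    # Steps 5-6: eigendecompose the small r x r matrix Y^T Y, then lift back.
    w, V = np.linalg.eigh(Y.T @ Y)                # eigenvalues in ascending order
    keep = w > tol                                # retain only significant axes
    w, V = w[keep][::-1], V[:, keep][:, ::-1]     # reorder to descending
    U_new = (Y @ V) / np.sqrt(w)                  # U' = Y V Lam'^(-1/2)
    return x_bar_new, U_new, w, N_new
```

When U is still empty (a freshly initialized eigenspace), Y degenerates to the single column √(α₁α₂)(x̄ − x), consistent with the observation below that IPCA is a special case of eigenspace merging.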

In Section 3.3.2, we will see that IPCA is just a special case of eigenspace merging. Therefore, the correctness of Algorithm 2 follows from that of Algorithm 3 in Section 3.3.2, for which a proof can be found in [13]. For eigenspace merging, if one of the two eigenspaces to be merged has zero dimensions, the problem degenerates to IPCA.

The time complexity of Algorithm 2 is dominated by Steps 5 and 6. If Y is an m × r matrix, Step 5 can be computed in time O(rm + r³) (see Section 3.3.4), and Step 6 takes time O(r²m). So the time complexity of Algorithm 2 is O(r²m + r³). In Steps 5 and 6, we need not retain all the eigenvalues and eigenvectors; it suffices to retain only the few relatively larger eigenvalues and the corresponding eigenvectors.

3.3.2 Eigenspace merging

Skarbek [13] developed an algorithm for eigenspace merging which is more concise than Hall's method [5]; neither method needs to store the covariance matrix of the previous training samples. Given two eigenspace models Ω₁ and Ω₂, we aim to find the eigenspace model of the union of the original data sets, assuming the original data is no longer available. Skarbek's algorithm is summarized as follows [13].

Algorithm 3
Input: Ω₁ = {x̄₁, U₁, Λ₁, N₁}: constructed from x₁, x₂, ..., x_{N₁}; Ω₂ = {x̄₂, U₂, Λ₂, N₂}: constructed from y₁, y₂, ..., y_{N₂}.
Output: Ω' = {x̄', U', Λ', N'}: constructed from x₁, x₂, ..., x_{N₁}, y₁, y₂, ..., y_{N₂}.
Method:
1. N' ← N₁ + N₂.
2. α₁ ← N₁/N', α₂ ← 1 − α₁.
3. x̄' ← α₁ x̄₁ + α₂ x̄₂.
4. Generate artificial data: Y ← [√α₁ U₁ Λ₁^{1/2}, √α₂ U₂ Λ₂^{1/2}, √(α₁α₂) (x̄₁ − x̄₂)].
5. Compute the eigenvectors and eigenvalues of YᵀY: YᵀY = V Λ' Vᵀ.
6. Compute the eigenvectors of Ω': U' ← Y V Λ'^{−1/2}.

The time complexity of Algorithm 3 is dominated by Steps 5 and 6. If the sizes of Y, U₁ and U₂ are m × r, m × q₁ and m × q₂ respectively, Step 5 can be computed in time O(q₁q₂m + r³) (see Section 3.3.4), and Step 6 takes time O(r²m). So the time complexity of Algorithm 3 is O(r²m + r³). As before, only the few relatively larger eigenvalues and their eigenvectors need be retained.

3.3.3 Eigenspace splitting

An eigenspace corresponds to a hyper-ellipsoid, and there are infinitely many ways to split it. Here we adopt an intuitive one: we split the hyper-ellipsoid with the hyperplane that passes through the center of the eigenspace and is perpendicular to the longest axis of the hyper-ellipsoid. The longest axis corresponds to the first principal eigenvector. If we split an eigenspace Ω into two new eigenspaces Ω₁ and Ω₂, each center of the two new eigenspaces should be a translation of the original center along the longest axis. Because the original data set from which Ω was constructed is unknown, splitting Ω equally is a reasonable choice. We can obtain Ω₁ and Ω₂ as follows, and it can be verified that Ω is recovered by merging Ω₁ and Ω₂.

Proposition. Assume that
Ω  = {x̄, (u₁, u₂, ..., u_q), diag(λ₁, λ₂, ..., λ_q), N},
Ω₁ = {x̄ + √λ₁ u₁, (u₂, ..., u_q), diag(λ₂, ..., λ_q), N/2},
Ω₂ = {x̄ − √λ₁ u₁, (u₂, ..., u_q), diag(λ₂, ..., λ_q), N/2}.
Then Ω can be produced by merging Ω₁ and Ω₂. (N/2 is allowed to be fractional.)

Proof. With the notation of Algorithm 3,

\[ \alpha_1 = \alpha_2 = 1/2, \tag{4} \]
\[ Y_1 = \sqrt{\alpha_1}\,(u_2, \dots, u_q)\,\mathrm{diag}(\lambda_2, \dots, \lambda_q)^{1/2}, \tag{5} \]
\[ Y_2 = \sqrt{\alpha_2}\,(u_2, \dots, u_q)\,\mathrm{diag}(\lambda_2, \dots, \lambda_q)^{1/2}, \tag{6} \]
\[ p = \sqrt{\alpha_1 \alpha_2}\; 2\sqrt{\lambda_1}\, u_1 = \sqrt{\lambda_1}\, u_1, \tag{7} \]
\[ Y = [Y_1, Y_2, p], \tag{8} \]
\[ Y Y^{\mathsf T} = (u_2, \dots, u_q)\,\mathrm{diag}(\lambda_2, \dots, \lambda_q)\,(u_2, \dots, u_q)^{\mathsf T} + \lambda_1 u_1 u_1^{\mathsf T} = U \Lambda U^{\mathsf T}. \tag{9} \]

The proposition now follows immediately from Algorithm 3 and the definition of the Singular Value Decomposition (SVD).
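A matching NumPy sketch of Algorithm 3 and of the splitting rule from the Proposition (again our own illustrative code, with the same conventions as the IPCA sketch above):

```python
import numpy as np

def merge_eigenspaces(x1, U1, lam1, N1, x2, U2, lam2, N2, tol=1e-10):
    """Skarbek-style merge (Algorithm 3) of two eigenspace models."""
    N = N1 + N2
    a1 = N1 / N
    a2 = 1.0 - a1
    x = a1 * x1 + a2 * x2
    # Step 4: stack the scaled bases and the between-centers column.
    Y = np.column_stack([np.sqrt(a1) * U1 * np.sqrt(lam1),
                         np.sqrt(a2) * U2 * np.sqrt(lam2),
                         np.sqrt(a1 * a2) * (x1 - x2)])
    # Steps 5-6: small eigenproblem on Y^T Y, then lift back to feature space.
    w, V = np.linalg.eigh(Y.T @ Y)
    keep = w > tol
    w, V = w[keep][::-1], V[:, keep][:, ::-1]
    U = (Y @ V) / np.sqrt(w)
    return x, U, w, N

def split_eigenspace(x, U, lam, N):
    """Split along the first principal axis, as in the Proposition."""
    shift = np.sqrt(lam[0]) * U[:, 0]
    child = (U[:, 1:], lam[1:], N / 2.0)   # drop the longest axis
    return (x + shift, *child), (x - shift, *child)
```

For clarity, this sketch forms YᵀY directly; Section 3.3.4 shows how it can instead be assembled blockwise at lower cost.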

For eigenspace splitting, the time complexity is dominated by copying (u₂, ..., u_q); eigenspace splitting can therefore be computed in time O(qm), where m is the dimension of the feature space.

3.3.4 More efficient calculation

The calculation of YᵀY in Section 3.3.2 can be simplified by some further analysis [3]:

\[
Y^{\mathsf T} Y = [Y_1, Y_2, p]^{\mathsf T} [Y_1, Y_2, p] =
\begin{pmatrix}
\alpha_1 \Lambda_1 & Y_1^{\mathsf T} Y_2 & Y_1^{\mathsf T} p \\
Y_2^{\mathsf T} Y_1 & \alpha_2 \Lambda_2 & Y_2^{\mathsf T} p \\
p^{\mathsf T} Y_1 & p^{\mathsf T} Y_2 & p^{\mathsf T} p
\end{pmatrix}. \tag{10}
\]

Because YᵀY is symmetric, we only need to calculate α₁Λ₁, α₂Λ₂, Y₂ᵀY₁ and pᵀY, which makes the computation of YᵀY much cheaper. Similarly, the calculation of YᵀY in Section 3.3.1 can be simplified:

\[
Y^{\mathsf T} Y = [Y_1, p]^{\mathsf T} [Y_1, p] =
\begin{pmatrix}
\alpha_1 \Lambda & Y_1^{\mathsf T} p \\
p^{\mathsf T} Y_1 & p^{\mathsf T} p
\end{pmatrix}, \tag{11}
\]

where we only need to calculate α₁Λ and pᵀY.

3.4. Computation of distance

Let p_{ij} denote the transition probability from Ω^{(i)} to Ω^{(j)}; it can be computed directly from the transition matrix T:

\[
p_{ij} = p(\Omega^{(j)} \mid \Omega^{(i)}) = \frac{T_{ij}}{\sum_{k=1}^{K} T_{ik}}. \tag{12}
\]

Since a larger p_{ij} usually corresponds to a smaller D_{ij}, D_{ij} can be computed as

\[
D_{ij} = \lVert \bar{x}^{(i)} - \bar{x}^{(j)} \rVert^2 \,(a - p_{ij}), \tag{13}
\]

where a is a constant chosen empirically. (For a vector v, ‖v‖² denotes the sum of the squares of its entries.) Similarly, the distance between an incoming sample x and the center of Ω^{(i)} can be computed as

\[
d_i = \lVert x - \bar{x}^{(i)} \rVert^2 \,(b - p_i), \tag{14}
\]

where b is a constant like a, and p_i is the transition probability from the eigenspace of the previous sample to Ω^{(i)}:

\[
p_i =
\begin{cases}
p_0 & \text{if the previous sample is in } \Omega^{(i)},\\[4pt]
\dfrac{1 - p_0}{K - 1} & \text{otherwise},
\end{cases} \tag{15}
\]

where p₀ is a constant with 1/K < p₀ < 1. A short code sketch of Eqs. (12)-(15) is given at the end of this section.

Figure 1. Typical samples of the videos used in our experiments. The images in each row come from a different video sequence.

4. Experimental results

To evaluate the effectiveness of the proposed algorithm, we conduct experiments on a 36-subject face video data set with large pose variation, collected by our lab. The data set contains 36 videos, one per subject. Each video sequence was captured indoors at 30 frames per second. The sequences contain large 2-D (in-plane) and 3-D (out-of-plane) head rotation, with slight expression and illumination changes. The number of frames per video ranges from 236 to 1270.

Since our experiments focus on learning and recognition, we do not pay much attention to automatic face detection and tracking. In each video sequence, the faces are cropped automatically using a boosted cascade face detector [15], or manually when the detection results are not good enough. All cropped images are converted to gray level and resized to a standard size of 20 × 20; a histogram equalization step is then applied to reduce the influence of illumination. Some samples are shown in Fig. 1.

In our experiments, we use the first half of each video sequence (118 to 635 frames) for online learning and the second half for the recognition task. Apart from the proposed method, we also run experiments with some other online learning methods. The methods used in our experiments are listed as follows.
- EMS + transition: the proposed method.
- EMS: online learning using Eigenspace Merging and Splitting without considering transition probabilities.
- EM + transition: online learning using Eigenspace Merging with transition probabilities taken into account; eigenspace splitting is not used.
- IPCA: online learning using IPCA; for each subject, only one eigenspace model is learnt.
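As noted in Section 3.4, here is a small NumPy sketch of the distance rules, Eqs. (12)-(15) (our own code; a, b and p0 are the empirical constants from the text):

```python
import numpy as np

def transition_probs(T):
    """Eq. (12): row-normalize the transition-count matrix T."""
    return T / T.sum(axis=1, keepdims=True)

def center_distance(xi, xj, p_ij, a):
    """Eq. (13): squared Euclidean distance scaled by (a - p_ij)."""
    return np.sum((xi - xj) ** 2) * (a - p_ij)

def sample_distances(x, centers, prev_idx, b, p0):
    """Eqs. (14)-(15): distance from sample x to every eigenspace center."""
    K = len(centers)
    p = np.full(K, (1.0 - p0) / (K - 1))  # non-stay transitions
    p[prev_idx] = p0                      # stay in the previous eigenspace
    return np.array([np.sum((x - c) ** 2) * (b - p[i])
                     for i, c in enumerate(centers)])
```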

For the first three methods, K, the number of eigenspace models, is a parameter that is difficult to optimize without experiments. We therefore run experiments with K = 6, 7, 8, 9 to evaluate the first three methods. The test sets are constructed by randomly sampling from the second half of each video sequence 10 times, each set containing 50 independent, identically distributed samples [2]. For each sample in a test set, we compute the likelihood that it is generated from each eigenspace model and classify it according to the maximum. The likelihood can be computed with the following formula [11]:

\[
p(I_t \mid \Omega^{(i)}) =
\frac{\exp\!\left(-\tfrac{1}{2}\sum_{j=1}^{q} \frac{y_j^2}{\lambda_j}\right)}
     {(2\pi)^{q/2} \prod_{j=1}^{q} \lambda_j^{1/2}}
\cdot
\frac{\exp\!\left(-\frac{\epsilon^2(x)}{2\rho}\right)}
     {(2\pi\rho)^{(m-q)/2}}, \tag{16}
\]

where [y₁, y₂, ..., y_q]ᵀ is the projection of I_t onto Ω^{(i)}, and ε²(x) is the squared Euclidean distance from I_t to Ω^{(i)}. The parameter ρ is chosen empirically as 0.3 λ_q. For each test set, we use majority voting to make the final decision. The recognition rates shown in Table 1 are averages over all runs.

Table 1. Average recognition rates (%) of different methods for K = 6, 7, 8, 9.

Method              K=6    K=7    K=8    K=9
EMS + transition    90.0   91.9   96.4   91.4
EMS                 90.8   88.3   90.0   87.5
EM + transition     79.7   90.8   89.2   89.4
IPCA                38.9 (one eigenspace per subject; independent of K)

The results show that the proposed method outperforms the other three online learning methods most of the time. IPCA gives the worst performance because it tries to learn non-linear manifolds in a simple linear way. EM + transition can sometimes give good results, but it is not stable enough: sometimes most frames cluster into the same eigenspace and the method degenerates to IPCA. EMS is more stable than EM + transition, but performs worse than EMS + transition most of the time, mainly because EMS does not exploit any temporal information. We can also notice that for K = 8 our method performs much better than the other three, with an average recognition rate as high as 96.4%. Fig. 2 shows some eigenspace centers learnt with K = 8.

We also implemented the Probabilistic Manifold online learning algorithm presented in [8] for comparison. This algorithm starts with a generic manifold which is trained off-line. The online learning process contains two steps.

Figure 2. Some eigenspace centers learnt with K = 8. Compared with Fig. 1, these eigenspace centers are fairly representative of the original sequences.

The first step is to identify the pose manifold to which the current image belongs with the highest probability. The second step is to update the appearance manifold using IPCA. The result of the first step is used to find a set of pre-training images that are expected to appear similar to the current subject in other poses; all the other eigenspaces in the appearance manifold are then updated with these synthetic images. We use 15 of the 36 video sequences for pre-training. The face images are manually classified into 5 different pose clusters, and a 10-D pose subspace is computed from the images in each cluster using PCA [8]. We use the remaining 21 video sequences for online learning and recognition: the first half of each video sequence is used for online learning, and the second half for the recognition task. Our proposed algorithm is also run on these 21 video sequences for comparison. The average recognition rates and processing times are listed in Table 2.
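For concreteness, the likelihood of Eq. (16) can be evaluated as follows (a minimal sketch of ours; it treats ε²(x) as the squared reconstruction residual, following [11], and works in log space to avoid underflow):

```python
import numpy as np

def log_likelihood(I_t, x_bar, U, lam, m, rho_factor=0.3):
    """Log of Eq. (16): in-subspace Gaussian term plus out-of-subspace residual."""
    q = len(lam)
    d = I_t - x_bar
    y = U.T @ d                              # projection onto the eigenspace
    eps2 = np.sum(d ** 2) - np.sum(y ** 2)   # squared distance from the subspace
    rho = rho_factor * lam[-1]               # rho = 0.3 * lambda_q, as in the paper
    in_space = (-0.5 * np.sum(y ** 2 / lam)
                - 0.5 * q * np.log(2 * np.pi) - 0.5 * np.sum(np.log(lam)))
    out_space = -eps2 / (2 * rho) - 0.5 * (m - q) * np.log(2 * np.pi * rho)
    return in_space + out_space
```

Classification then simply picks the eigenspace model maximizing this value, and majority voting over a test set gives the final label.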
We can notice that the recognition rate of EMS + transition in Table 2 is higher than that in Table 1. This is because the results in Table 2 are obtained on a 21-subject data set, while the results in Table 1 are obtained on a 36-subject data set; a smaller data set usually makes the recognition task easier.

Table 2. Performance of the Probabilistic Manifold algorithm and our proposed algorithm. For EMS + transition, we choose K = 8. Because EMS + transition needs no pre-training, that entry is left blank.

                     Prob. Manifold   EMS + transition
Recognition rate     92.4%            97.1%
Online learning      34.3 s           9.5 s
Pre-training         77.3 s           -

From Table 2, we can see that our proposed algorithm gives better performance than that of [8], and its processing time is much shorter than that of the Probabilistic Manifold algorithm.

This is mainly because the Probabilistic Manifold algorithm uses only 5 pose manifolds, which are not flexible enough to represent different poses, and because updating eigenspaces with synthetic face images significantly increases the time complexity. In contrast, the proposed algorithm is more flexible in generating representative eigenspace models.

5. Conclusions

In this paper we have presented a novel method for online appearance model learning which can be applied to video-based face recognition. For each person, we build K linear eigenspace models and a transition matrix to approximately construct the face appearance manifold. Each eigenspace model can be viewed as a pose model representing a particular pose. We update these eigenspace models incrementally, using IPCA, eigenspace merging, or eigenspace splitting as necessary. We also exploit temporal information by maintaining a transition matrix: the distance between two eigenspace centers, or between an incoming sample and an eigenspace center, is influenced by both the Euclidean distance and the transition probability between them. The learnt models are used for face recognition in our experiments. When K is chosen appropriately, the proposed method performs very well; the average recognition rate reaches 97.1%.

Eigenspace models may not fully capture the nonlinear character of face appearance manifolds. An interesting direction for future work is to develop algorithms which can learn nonlinear models of face appearance manifolds online.

6. Acknowledgement

This work was supported by the Program of New Century Excellent Talents in University, the National Natural Science Foundation of China (Nos. 60575003, 60332010, 60335010, 60121302, 60275003, 69825105, 60605008), a Joint Project supported by the National Science Foundation of China and the Royal Society of UK (60710059), the National Basic Research Program (Grant No. 2004CB318110), the Hi-Tech Research and Development Program of China (2006AA01Z133, 2006AA01Z193) and the Chinese Academy of Sciences.

References

[1] O. Arandjelović and R. Cipolla. Face recognition from video using the generic shape-illumination manifold. In Proc. European Conf. on Computer Vision, 3594:27-40, 2006.
[2] W. Fan and D.-Y. Yeung. Face recognition with image sets using hierarchically extracted exemplars from appearance manifolds. In Proc. 7th International Conf. on Automatic Face and Gesture Recognition, pages 177-182, 2006.
[3] A. Franco, A. Lumini, and D. Maio. Eigenspace merging for model updating. In Proc. 16th International Conference on Pattern Recognition, 2:156-159, 2002.
[4] P. M. Hall, D. Marshall, and R. R. Martin. Incremental eigenanalysis for classification. In Proc. British Machine Vision Conference, pages 286-295, 1998.
[5] P. M. Hall, D. Marshall, and R. R. Martin. Merging and splitting eigenspace models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(9):1042-1049, 2000.
[6] V. Krüeger and S. Zhou. Exemplar-based face recognition from video. In Proc. European Conf. on Computer Vision, 4:732-746, 2002.
[7] K.-C. Lee, J. Ho, M.-H. Yang, and D. Kriegman. Video-based face recognition using probabilistic appearance manifolds. In Proc. CVPR, pages 313-320, 2003.
[8] K.-C. Lee and D. Kriegman. Online learning of probabilistic appearance manifolds for video-based recognition and tracking. In Proc. CVPR, 1:852-859, 2005.
[9] W. Liu, Z. Li, and X. Tang. Spatio-temporal embedding for statistical face recognition from video. In Proc. European Conf. on Computer Vision, 3592:374-388, 2006.
[10] X. Liu and T. Chen. Video-based face recognition using adaptive hidden Markov models. In Proc. CVPR, pages 340-345, 2003.
[11] B. Moghaddam and A. Pentland. Probabilistic visual learning for object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):696-710, 1997.
[12] G. Shakhnarovich, J. W. Fisher, and T. Darrell. Face recognition from long-term observations. In Proc. European Conf. on Computer Vision, 3:851-865, 2002.
[13] W. Skarbek. Merging subspace models for face recognition. In Proc. CAIP, pages 606-613, 2003.
[14] X. Tang and Z. Li. Frame synchronization and multi-level subspace analysis for video based face recognition. In Proc. CVPR, pages 902-907, 2004.
[15] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proc. CVPR, 1:511-518, 2001.
[16] O. Yamaguchi, K. Fukui, and K.-i. Maeda. Face recognition using temporal image sequence. In Proc. International Conf. on Automatic Face and Gesture Recognition, pages 318-323, 1998.
[17] W. Zhao, R. Chellappa, P. Phillips, and A. Rosenfeld. Face recognition: A literature survey. ACM Computing Surveys, 35(4):399-458, 2003.
[18] S. Zhou, V. Krueger, and R. Chellappa. Probabilistic recognition of human faces from video. Computer Vision and Image Understanding, 91:214-245, 2003.
[19] S. K. Zhou and R. Chellappa. Probabilistic identity characterization for face recognition. In Proc. CVPR, 2:805-812, 2004.