When Fisher meets Fukunaga-Koontz: A New Look at Linear Discriminants

Sheng Zhang, Terence Sim
School of Computing, National University of Singapore
3 Science Drive 2, Singapore 117543
{zhangshe, tsim}@comp.nus.edu.sg

Abstract

The Fisher Linear Discriminant (FLD) is commonly used in pattern recognition. It finds a linear subspace that maximally separates class patterns according to Fisher's Criterion. Several methods of computing the FLD have been proposed in the literature, most of which require the calculation of the so-called scatter matrices. In this paper, we bring a fresh perspective to the FLD via the Fukunaga-Koontz Transform (FKT). We do this by decomposing the whole data space into four subspaces, and show where Fisher's Criterion is maximally satisfied. We prove the relationship between the FLD and the FKT analytically, and propose a method of computing the most discriminative subspace. This method is based on the QR decomposition, which works even when the scatter matrices are singular or too large to be formed. Our method is general and may be applied to different pattern recognition problems. We validate our method by experiments on synthetic and real data.

1. Introduction

In recent years, discriminant subspace analysis has been extensively studied in computer vision and pattern recognition. It has been widely used for feature extraction and dimension reduction in face recognition [1] and text classification [5]. One popular method is the Fisher Linear Discriminant (FLD), also known as Linear Discriminant Analysis (LDA) [2, 3]. It tries to find an optimal subspace in which the separability of the classes is maximized. This is achieved by simultaneously minimizing the within-class distance and maximizing the between-class distance. More specifically, in terms of the between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$, Fisher's Criterion can be written as

$$J_F(\Phi) = \frac{|\Phi^\top S_b \Phi|}{|\Phi^\top S_w \Phi|}. \qquad (1)$$

By maximizing the criterion $J_F$, the Fisher Linear Discriminant finds the subspaces in which the classes are most linearly separable. The solution [3] that maximizes $J_F$ is a set of eigenvectors $\{\phi_i\}$ which must satisfy

$$S_b \phi = \lambda S_w \phi. \qquad (2)$$

This is called the generalized eigenvalue problem [2, 3]. The discriminant subspace is spanned by the generalized eigenvectors, and the discriminability of each eigenvector is measured by the corresponding generalized eigenvalue; that is, the most discriminant subspace corresponds to the maximal generalized eigenvalue. The generalized eigenvalue problem can be solved by matrix inversion followed by eigendecomposition, i.e., applying the eigendecomposition to $S_w^{-1} S_b$. Unfortunately, for many applications with high-dimensional data and few training samples, such as face recognition, the scatter matrix $S_w$ is singular because the dimension of the sample data is generally greater than the number of samples. This is known as the undersampled or small sample size problem [3, 2]. A great number of methods have been proposed to circumvent the requirement that $S_w$ be nonsingular, such as Fisherface [1] and LDA/GSVD [5]. In [1], Fisherface first applies PCA [6, 8] to reduce the dimension so that $S_w$ becomes nonsingular, and then applies LDA. The LDA/GSVD algorithm [5] avoids the inversion of $S_w$ by simultaneous diagonalization via the Generalized Singular Value Decomposition (GSVD). However, these methods are designed to overcome the singularity problem and do not directly relate to the generalized eigenvalue, the essential measure of discriminability.
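To make Equations (1) and (2) concrete, here is a minimal NumPy/SciPy sketch, not from the paper (the function and variable names are ours), that computes the classical FLD by solving the generalized eigenvalue problem directly. It assumes $S_w$ is nonsingular, which is precisely the assumption that fails in the undersampled settings discussed above.

```python
import numpy as np
from scipy.linalg import eigh

def fisher_directions(X, y):
    """Classical FLD: solve S_b phi = lambda S_w phi (Eq. 2).

    X: (N, D) data matrix, y: (N,) integer class labels.
    Assumes S_w is nonsingular (roughly N - C >= D); otherwise the
    factorization of S_w inside eigh will fail.
    """
    D = X.shape[1]
    m = X.mean(axis=0)
    Sb = np.zeros((D, D))
    Sw = np.zeros((D, D))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - m, mc - m)   # between-class scatter
        Sw += (Xc - mc).T @ (Xc - mc)              # within-class scatter
    lam, Phi = eigh(Sb, Sw)              # generalized eigenpairs of (S_b, S_w)
    order = np.argsort(lam)[::-1]        # most discriminative directions first
    return lam[order], Phi[:, order]
```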
In this paper, we propose to apply the Fukunaga-Koontz Transform (FKT) [3] to the LDA problem. Based on the eigenvalue ratio of the FKT, we decompose the whole data space into four subspaces. Our theoretical analysis then establishes the relationship between LDA, the FKT and the GSVD. Our work has three main contributions:

1. We present a unifying framework for understanding different methods, namely LDA, the FKT and the GSVD. More specifically, for the LDA problem the GSVD is equivalent to the FKT, and the generalized eigenvalue of LDA is equal to both the eigenvalue ratio of the FKT and the square of the generalized singular value of the GSVD.

2. The proposed theory is useful for pattern recognition. Our theoretical analysis demonstrates how to choose the best subspaces for maximum discriminability, and experiments on synthetic and real data validate the theory.

3. In connecting the FKT with LDA, we show how the FKT, originally meant for 2-class problems, can be generalized to handle multiple classes.

The rest of this paper is organized as follows. Section 2 briefly reviews the mathematical background for LDA, the GSVD and the FKT. In Section 3, we first analyze the discriminant subspaces of LDA based on the FKT, and then establish the connections between LDA, the FKT and the GSVD. We apply our theory to classification problems on synthetic and real data in Section 4, and conclude in Section 5.

2. Mathematical Background

Notation. Let $A = \{a_1, \ldots, a_N\}$, $a_i \in \mathbb{R}^D$, denote a data set of $D$-dimensional vectors. Each data point belongs to exactly one of $C$ object classes $\{L_1, \ldots, L_C\}$. The number of vectors in class $L_i$ is denoted by $N_i$, so $N = \sum_i N_i$. Observe that for high-dimensional data, e.g., face images, generally $C < N \ll D$. The between-class scatter matrix $S_b$, the within-class scatter matrix $S_w$ and the total scatter matrix $S_t$ are defined as follows:

$$S_b = \sum_{i=1}^{C} N_i (m_i - m)(m_i - m)^\top = H_b H_b^\top, \qquad (3)$$

$$S_w = \sum_{i=1}^{C} \sum_{a \in L_i} (a - m_i)(a - m_i)^\top = H_w H_w^\top, \qquad (4)$$

$$S_t = \sum_{i=1}^{N} (a_i - m)(a_i - m)^\top = H_t H_t^\top, \qquad (5)$$

$$S_t = S_b + S_w. \qquad (6)$$

Here $m_i$ denotes the mean of class $L_i$ and $m$ is the global mean of $A$. The matrices $H_b \in \mathbb{R}^{D \times C}$, $H_w \in \mathbb{R}^{D \times N}$ and $H_t \in \mathbb{R}^{D \times N}$ are the precursor matrices of the between-class, within-class and total scatter matrices, respectively. Let us denote the rank of each scatter matrix by $r_w = \mathrm{Rank}(S_w)$, $r_b = \mathrm{Rank}(S_b)$ and $r_t = \mathrm{Rank}(S_t)$. Note that for high-dimensional data ($D \gg N$), $r_b \le C$, $r_w \le N$ and $r_t \le N$.
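As a companion to Equations (3)-(6), the following sketch (our own illustration; the function name is hypothetical) builds the precursor matrices $H_b$, $H_w$ and $H_t$ from a labelled data matrix and checks the identity $S_t = S_b + S_w$. Working with the precursors rather than the $D \times D$ scatter matrices is what later allows the method to scale when the scatter matrices are too large to be formed.

```python
import numpy as np

def precursor_matrices(X, y):
    """Precursor matrices of Eqs. (3)-(5): S_b = Hb Hb^T, S_w = Hw Hw^T, S_t = Ht Ht^T.

    X: (N, D) data matrix, y: (N,) class labels. The columns live in R^D, so the
    D x D scatter matrices never need to be formed explicitly.
    """
    m = X.mean(axis=0)
    Hb_cols, Hw_blocks = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Hb_cols.append(np.sqrt(len(Xc)) * (mc - m))   # one column per class
        Hw_blocks.append((Xc - mc).T)                 # one column per sample
    Hb = np.column_stack(Hb_cols)                     # D x C
    Hw = np.hstack(Hw_blocks)                         # D x N
    Ht = (X - m).T                                    # D x N
    return Hb, Hw, Ht

# Sanity check of Eq. (6), feasible here because D is small.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = rng.integers(0, 3, size=30)
Hb, Hw, Ht = precursor_matrices(X, y)
assert np.allclose(Hb @ Hb.T + Hw @ Hw.T, Ht @ Ht.T)
```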
2.1. Linear Discriminant Analysis

Given the data matrix $A$, which can be divided into $C$ classes, we seek a linear transformation matrix $\Phi \in \mathbb{R}^{D \times d}$, where $d < D$, that maps the high-dimensional data to a low-dimensional space. From the perspective of pattern classification, LDA aims to find the optimal transformation $\Phi$ such that the projected data are well separated. Usually, two types of criteria are used to measure the separability of classes [3]. One type gives an upper bound on the Bayes error, e.g., the Bhattacharyya distance. The other type is based on scatter matrices; as shown in Equation (1), Fisher's criterion belongs to the latter type. The solution of the criterion is given by the generalized eigenvectors and eigenvalues (see Equation 2). As mentioned above, if $S_w$ is nonsingular the problem can be solved by the eigendecomposition $S_w^{-1} S_b \phi = \lambda \phi$. Otherwise, $S_w$ is singular and we have to circumvent the nonsingularity requirement, for example via LDA/GSVD [5].

2.2. Generalized SVD

The Generalized Singular Value Decomposition (GSVD) was developed by Van Loan et al. [4]. We briefly review the mechanism of the GSVD using LDA as an example. Howland et al. [5] extended the applicability of LDA to the case when $S_w$ is singular. This is done by simultaneously diagonalizing the scatter matrices via the GSVD [4]. First, to reduce the computational load, $H_b$ and $H_w$ are used instead of $S_b$ and $S_w$. Then, based on the GSVD, there exist orthogonal matrices $Y \in \mathbb{R}^{C \times C}$, $Z \in \mathbb{R}^{N \times N}$ and a nonsingular matrix $X \in \mathbb{R}^{D \times D}$ such that

$$Y^\top H_b^\top X = [\Sigma_b,\; 0], \qquad (7)$$

$$Z^\top H_w^\top X = [\Sigma_w,\; 0], \qquad (8)$$

where

$$\Sigma_b = \begin{bmatrix} I_b & & \\ & D_b & \\ & & O_b \end{bmatrix}, \qquad \Sigma_w = \begin{bmatrix} O_w & & \\ & D_w & \\ & & I_w \end{bmatrix}.$$

Since $Y$, $Z$ are orthogonal matrices and $X$ is nonsingular, from Equations (7), (8) we obtain

$$H_b^\top = Y [\Sigma_b,\; 0]\, X^{-1}, \qquad (9)$$

$$H_w^\top = Z [\Sigma_w,\; 0]\, X^{-1}. \qquad (10)$$

The matrices $I_b \in \mathbb{R}^{(r_t - r_w) \times (r_t - r_w)}$ and $I_w \in \mathbb{R}^{(r_t - r_b) \times (r_t - r_b)}$ are identity matrices, $O_b \in \mathbb{R}^{(C - r_b) \times (r_t - r_b)}$ and $O_w \in \mathbb{R}^{(N - r_w) \times (r_t - r_w)}$ are rectangular zero matrices which may have no rows or no columns, and $D_b = \mathrm{diag}(\alpha_{r_t - r_w + 1}, \ldots, \alpha_{r_b})$ and $D_w = \mathrm{diag}(\beta_{r_t - r_w + 1}, \ldots, \beta_{r_b})$ satisfy

$$1 > \alpha_{r_t - r_w + 1} \ge \cdots \ge \alpha_{r_b} > 0, \qquad 0 < \beta_{r_t - r_w + 1} \le \cdots \le \beta_{r_b} < 1, \qquad \alpha_i^2 + \beta_i^2 = 1.$$

Thus,

$$\Sigma_b^\top \Sigma_b + \Sigma_w^\top \Sigma_w = I, \qquad (11)$$

where $I \in \mathbb{R}^{r_t \times r_t}$ is an identity matrix. The columns of $X$, the generalized singular vectors of the matrix pair $(H_b^\top, H_w^\top)$, can be used as the discriminant feature subspace based on the GSVD.

2.3. Fukunaga-Koontz Transform

The FKT was designed for the 2-class recognition problem. Given data matrices $A_1$ and $A_2$ from two classes, the autocorrelation matrices $S_1 = A_1 A_1^\top$ and $S_2 = A_2 A_2^\top$ are positive semi-definite (p.s.d.) and symmetric. The sum of these two matrices is still p.s.d. and symmetric, and can be factorized in the form

$$S = S_1 + S_2 = [U,\; U_0] \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} U^\top \\ U_0^\top \end{bmatrix}. \qquad (12)$$

Without loss of generality, $S$ may be singular with $r = \mathrm{Rank}(S) < D$; thus $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_r)$ with $\lambda_1 \ge \cdots \ge \lambda_r > 0$. Here $U \in \mathbb{R}^{D \times r}$ is the set of eigenvectors corresponding to the nonzero eigenvalues and $U_0 \in \mathbb{R}^{D \times (D - r)}$ is the orthogonal complement of $U$. We can now whiten $S$ by the transformation operator $P = U D^{-1/2}$. The sum of the two matrices $S_1$, $S_2$ becomes

$$P^\top S P = P^\top (S_1 + S_2) P = \tilde{S}_1 + \tilde{S}_2 = I, \qquad (13)$$

where $\tilde{S}_1 = P^\top S_1 P$, $\tilde{S}_2 = P^\top S_2 P$, and $I \in \mathbb{R}^{r \times r}$ is an identity matrix. Suppose $v$ is an eigenvector of $\tilde{S}_1$ with eigenvalue $\lambda_1$, that is, $\tilde{S}_1 v = \lambda_1 v$. Since $\tilde{S}_1 = I - \tilde{S}_2$, we can rewrite this as

$$(I - \tilde{S}_2)\, v = \lambda_1 v, \qquad (14)$$

$$\tilde{S}_2 v = (1 - \lambda_1)\, v. \qquad (15)$$

This means that $\tilde{S}_2$ has the same eigenvector as $\tilde{S}_1$, but the corresponding eigenvalue is $\lambda_2 = 1 - \lambda_1$. Consequently, the dominant eigenvector of $\tilde{S}_1$ is the weakest eigenvector of $\tilde{S}_2$, and vice versa.

3. Theory

In this section, we first apply the FKT to $S_b$ and $S_w$, which decomposes the whole data space $S_t$ into four subspaces. We then explain the relationship between LDA, the FKT and the GSVD based on the generalized eigenvalue. This gives insight into the different discriminant subspace analyses. In this paper we focus on linear discriminant subspace analysis, but our approach can easily be extended to nonlinear discriminant subspace analysis: for example, based on the kernel method, we can apply the FKT to kernelized $S_b$ and $S_w$.

Figure 1. The whole data space is decomposed into four subspaces via the FKT (the eigenvalues $\lambda_b$ and $\lambda_w$ are plotted against the dimension, with boundaries at $r_t - r_w$, $r_b$, $r_t$ and $D$). In $U_0$, the null space of $S_t$, there is no discriminant information. In $U$, the sum $\lambda_b + \lambda_w$ is equal to 1. Note that we represent all possible subspaces; in practice, some of them may not exist.

3.1. LDA/FKT

Generally speaking, the LDA problem involves more than 2 classes. To handle multiple classes, we replace the autocorrelation matrices $S_1$ and $S_2$ with the scatter matrices $S_b$ and $S_w$. Since $S_b$, $S_w$ and $S_t$ are p.s.d. and symmetric, and $S_t = S_b + S_w$, we can apply the FKT to $S_b$, $S_w$ and $S_t$, which we shall henceforth call LDA/FKT. The whole data space is decomposed into two subspaces, $U$ and $U_0$ (see Fig. 1). On the one hand, $U_0$ is the set of eigenvectors corresponding to the zero eigenvalues of $S_t$; it is the intersection of the null spaces of $S_b$ and $S_w$, and contains no discriminant information. On the other hand, $U$ is the set of eigenvectors corresponding to the nonzero eigenvalues of $S_t$; it contains the discriminant information. Based on the FKT, $\tilde{S}_b = P^\top S_b P$ and $\tilde{S}_w = P^\top S_w P$ share the same eigenvectors, and the two eigenvalues corresponding to the same eigenvector sum to 1:

$$\tilde{S}_b = V \Lambda_b V^\top, \qquad (16)$$

$$\tilde{S}_w = V \Lambda_w V^\top, \qquad (17)$$

$$I = \Lambda_b + \Lambda_w, \qquad (18)$$

where $V \in \mathbb{R}^{r_t \times r_t}$ is the orthogonal eigenvector matrix and $\Lambda_b, \Lambda_w \in \mathbb{R}^{r_t \times r_t}$ are diagonal eigenvalue matrices.
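The following sketch (ours; the function name is an assumption, not the authors' code) carries out LDA/FKT exactly as just described: whiten $S_t$ on its range with $P = U D^{-1/2}$, diagonalize the whitened $S_b$, and read off $\lambda_w = 1 - \lambda_b$ for the shared eigenvectors.

```python
import numpy as np

def lda_fkt_eigenvalues(Sb, Sw, tol=1e-10):
    """Apply the FKT to (S_b, S_w): whiten S_t = S_b + S_w on its range,
    then diagonalize the whitened S_b. Returns (lambda_b, lambda_w, V, P)."""
    St = Sb + Sw
    evals, evecs = np.linalg.eigh(St)
    keep = evals > tol * evals.max()      # range of S_t (columns of U)
    U, D = evecs[:, keep], evals[keep]
    P = U / np.sqrt(D)                    # whitening operator P = U D^{-1/2}
    Sb_t = P.T @ Sb @ P                   # whitened between-class scatter
    Sw_t = P.T @ Sw @ P                   # whitened within-class scatter
    lam_b, V = np.linalg.eigh(Sb_t)       # shared eigenvectors V
    lam_w = np.diag(V.T @ Sw_t @ V)       # same eigenvectors, eigenvalues 1 - lam_b
    return lam_b, lam_w, V, P
```

Calling this on the scatter matrices of any labelled sample and checking `np.allclose(lam_b + lam_w, 1)` reproduces Equation (18).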
According to the eigenvalue ratio $\lambda_b / \lambda_w$, $U$ can be further decomposed into three subspaces. To keep the integrity of the whole data space, we include $U_0$ as the fourth subspace (see Fig. 1):

1. Subspace 1: the span of the eigenvectors $\{v_i\}$ corresponding to $\lambda_w = 0$ and $\lambda_b = 1$. Since $\lambda_w = 0$, the eigenvalue ratio is maximized in this subspace.

2. Subspace 2: the span of the eigenvectors $\{v_i\}$ corresponding to $0 < \lambda_w < 1$ and $0 < \lambda_b < 1$. Since $0 < \lambda_b < 1$, the eigenvalue ratio is smaller than that of Subspace 1.

3. Subspace 3: the span of the eigenvectors $\{v_i\}$ corresponding to $\lambda_w = 1$ and $\lambda_b = 0$. Since $\lambda_b = 0$, the eigenvalue ratio is minimal.

4. Subspace 4: $U_0$, the span of the eigenvectors corresponding to the zero eigenvalues of $S_t$.

Note that in practice any of these four subspaces may fail to exist, depending on the ranks of $S_b$, $S_w$ and $S_t$. For example, if $r_t = r_w + r_b$, then Subspace 2 does not exist. As illustrated in Fig. 1, the null space of $S_w$ is the union of Subspaces 1 and 4, and the null space of $S_b$ is the union of Subspaces 3 and 4, if they exist.

3.2. Relationship between LDA, GSVD and FKT

How do these four subspaces help to maximize the Fisher criterion $J_F$? We explain this in the following theorem, which connects the generalized eigenvalue of $J_F$ to the eigenvalues of the FKT. We begin with a lemma.

Lemma 1. For the LDA problem, the GSVD is equivalent to the FKT, with $X = [U D^{-1/2} V,\; U_0]$, $\Lambda_b = \Sigma_b^\top \Sigma_b$ and $\Lambda_w = \Sigma_w^\top \Sigma_w$, where $X$, $\Sigma_b$ and $\Sigma_w$ are from the GSVD (see Equations 7, 8), and $U$, $U_0$, $D$, $V$, $\Lambda_b$ and $\Lambda_w$ are from the FKT (see Equations 16, 17, 18).

Proof. (GSVD $\Rightarrow$ FKT) Based on the GSVD,

$$S_b = H_b H_b^\top = X^{-\top} \begin{bmatrix} \Sigma_b^\top \Sigma_b & 0 \\ 0 & 0 \end{bmatrix} X^{-1}, \qquad (19)$$

$$S_w = H_w H_w^\top = X^{-\top} \begin{bmatrix} \Sigma_w^\top \Sigma_w & 0 \\ 0 & 0 \end{bmatrix} X^{-1}. \qquad (20)$$

Thus,

$$X^\top (S_b + S_w)\, X = \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}. \qquad (21)$$

Since $\Sigma_b^\top \Sigma_b + \Sigma_w^\top \Sigma_w = I \in \mathbb{R}^{r_t \times r_t}$, if we choose the first $r_t$ columns of $X$ as $P$, then $P^\top (S_b + S_w) P = I$. This is exactly the outcome of the FKT. Meanwhile, we conclude that $\Lambda_b = \Sigma_b^\top \Sigma_b$ and $\Lambda_w = \Sigma_w^\top \Sigma_w$.

(FKT $\Rightarrow$ GSVD) Based on the FKT, $P = U D^{-1/2}$ and

$$\tilde{S}_b = P^\top S_b P = D^{-1/2} U^\top H_b H_b^\top U D^{-1/2}, \qquad (22)$$

$$\tilde{S}_b = V \Lambda_b V^\top. \qquad (23)$$

Hence

$$D^{-1/2} U^\top H_b H_b^\top U D^{-1/2} = V \Lambda_b V^\top. \qquad (24)$$

In general, the decomposition in the above equation is not unique, because $H_b H_b^\top = H_b Y Y^\top H_b^\top$ for any orthogonal matrix $Y \in \mathbb{R}^{C \times C}$. That is,

$$D^{-1/2} U^\top H_b Y Y^\top H_b^\top U D^{-1/2} = V \Lambda_b V^\top, \qquad (25)$$

$$Y^\top H_b^\top U D^{-1/2} = \Sigma_b V^\top, \qquad (26)$$

$$Y^\top H_b^\top U D^{-1/2} V = \Sigma_b, \qquad (27)$$

where $\Sigma_b \in \mathbb{R}^{C \times r_t}$ and $\Lambda_b = \Sigma_b^\top \Sigma_b$. If we define $X = [U D^{-1/2} V,\; U_0] \in \mathbb{R}^{D \times D}$, then

$$Y^\top H_b^\top X = Y^\top H_b^\top [U D^{-1/2} V,\; U_0] \qquad (28)$$

$$= [Y^\top H_b^\top U D^{-1/2} V,\; 0] \qquad (29)$$

$$= [\Sigma_b,\; 0]. \qquad (30)$$

Here $H_b^\top U_0 = 0$ and $H_w^\top U_0 = 0$ because $U_0$ is the intersection of the null spaces of $S_b$ and $S_w$. Similarly, we obtain $Z^\top H_w^\top X = [\Sigma_w,\; 0]$, where $Z \in \mathbb{R}^{N \times N}$ is an arbitrary orthogonal matrix, $\Sigma_w \in \mathbb{R}^{N \times r_t}$, and $\Lambda_w = \Sigma_w^\top \Sigma_w$. Since $\Lambda_b + \Lambda_w = I$ with $I \in \mathbb{R}^{r_t \times r_t}$ an identity matrix, it is easy to check that $\Sigma_b^\top \Sigma_b + \Sigma_w^\top \Sigma_w = I$, which satisfies the constraint of the GSVD.

It remains to prove that $X$ is nonsingular:

$$X X^\top = [U D^{-1/2} V,\; U_0] \begin{bmatrix} V^\top D^{-1/2} U^\top \\ U_0^\top \end{bmatrix} = U D^{-1} U^\top + U_0 U_0^\top = [U,\; U_0] \begin{bmatrix} D^{-1} & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} U^\top \\ U_0^\top \end{bmatrix}, \qquad (31)$$

where $V \in \mathbb{R}^{r_t \times r_t}$ and $[U, U_0]$ are orthogonal matrices. Note that $U^\top U_0 = 0$ and $U_0^\top U = 0$. Thus $X X^\top$ has an eigendecomposition with strictly positive eigenvalues, which means $X$ is nonsingular. ∎
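The construction in the proof can be checked numerically. The sketch below is ours (it reuses the hypothetical `precursor_matrices` helper sketched in Section 2); it builds $X = [U D^{-1/2} V,\; U_0]$ for a small undersampled data set and verifies that $X^\top (S_b + S_w) X = \mathrm{diag}(I_{r_t}, 0)$ as in Equation (21), and that $X$ simultaneously diagonalizes $S_b$ and $S_w$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(12, 20))            # N=12 samples in D=20 dimensions (undersampled)
y = rng.integers(0, 3, size=12)

Hb, Hw, Ht = precursor_matrices(A, y)    # helper from the Section 2 sketch
Sb, Sw = Hb @ Hb.T, Hw @ Hw.T
St = Sb + Sw

evals, evecs = np.linalg.eigh(St)
keep = evals > 1e-10 * evals.max()
U, D, U0 = evecs[:, keep], evals[keep], evecs[:, ~keep]
P = U / np.sqrt(D)                       # P = U D^{-1/2}
lam_b, V = np.linalg.eigh(P.T @ Sb @ P)  # FKT on the whitened S_b

X = np.hstack([P @ V, U0])               # X = [U D^{-1/2} V, U_0]
r_t = int(keep.sum())
T = X.T @ St @ X                         # should equal diag(I_{r_t}, 0), cf. Eq. (21)
assert np.allclose(T[:r_t, :r_t], np.eye(r_t), atol=1e-8)
assert np.allclose(T[r_t:, :], 0, atol=1e-8) and np.allclose(T[:, r_t:], 0, atol=1e-8)

B = X.T @ Sb @ X                         # equals diag(Lambda_b, 0), hence diagonal
assert np.allclose(B, np.diag(np.diag(B)), atol=1e-8)
```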

Based on the above lemma, we can now relate the eigenvalue ratio of the FKT to the generalized eigenvalue $\lambda$ of the Fisher criterion $J_F$.

Theorem 1. If $\lambda$ is a solution of Equation (2) (a generalized eigenvalue of $S_b$ and $S_w$), and $\lambda_b$ and $\lambda_w$ are the corresponding eigenvalues obtained by applying the FKT to $S_b$ and $S_w$, then $\lambda = \lambda_b / \lambda_w$, where $\lambda_b + \lambda_w = 1$.

Proof. Based on the GSVD, it is easy to verify that

$$S_b = H_b H_b^\top = X^{-\top} \begin{bmatrix} \Sigma_b^\top \Sigma_b & 0 \\ 0 & 0 \end{bmatrix} X^{-1}. \qquad (32)$$

According to our lemma, $\Lambda_b = \Sigma_b^\top \Sigma_b$, thus

$$S_b = X^{-\top} \begin{bmatrix} \Lambda_b & 0 \\ 0 & 0 \end{bmatrix} X^{-1}. \qquad (33)$$

Similarly, we can rewrite $S_w$ with $\Lambda_w$. Since $S_b \phi = \lambda S_w \phi$,

$$X^{-\top} \begin{bmatrix} \Lambda_b & 0 \\ 0 & 0 \end{bmatrix} X^{-1} \phi = \lambda\, X^{-\top} \begin{bmatrix} \Lambda_w & 0 \\ 0 & 0 \end{bmatrix} X^{-1} \phi. \qquad (34)$$

Let $v = X^{-1} \phi$ and multiply both sides by $X^\top$ to obtain

$$\begin{bmatrix} \Lambda_b & 0 \\ 0 & 0 \end{bmatrix} v = \lambda \begin{bmatrix} \Lambda_w & 0 \\ 0 & 0 \end{bmatrix} v. \qquad (35)$$

If we add $\lambda \begin{bmatrix} \Lambda_b & 0 \\ 0 & 0 \end{bmatrix} v$ to both sides of Equation (35), then

$$(1 + \lambda) \begin{bmatrix} \Lambda_b & 0 \\ 0 & 0 \end{bmatrix} v = \lambda \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix} v. \qquad (36)$$

This means that $(1 + \lambda)\lambda_b = \lambda$, which can be rewritten as $\lambda_b = \lambda(1 - \lambda_b) = \lambda \lambda_w$ because $\lambda_b + \lambda_w = 1$. Hence $\lambda = \lambda_b / \lambda_w$. ∎

Corollary 1. If $\lambda$ is the generalized eigenvalue of $S_b$ and $S_w$, $\alpha$ and $\beta$ are the solutions of Equations (7), (8), and $\alpha/\beta$ is the generalized singular value of the matrix pair $(H_b^\top, H_w^\top)$, then $\lambda = \alpha^2 / \beta^2$, where $\alpha^2 + \beta^2 = 1$.

Proof. In Lemma 1 we proved that $\Lambda_b = \Sigma_b^\top \Sigma_b$ and $\Lambda_w = \Sigma_w^\top \Sigma_w$, that is, $\lambda_b = \alpha^2$ and $\lambda_w = \beta^2$. According to Theorem 1, $\lambda = \lambda_b / \lambda_w$. Therefore $\lambda = \alpha^2 / \beta^2$. ∎

Note that $\alpha/\beta$ is the generalized singular value of $(H_b^\top, H_w^\top)$ given by the GSVD, and $\lambda$ is the generalized eigenvalue of $(S_b, S_w)$. The Corollary suggests how to evaluate the discriminant subspaces of LDA/GSVD. Howland et al. [5] applied the Corollary implicitly; in this paper we explicitly connect the generalized singular value $\alpha/\beta$ with $\lambda$, the measure of discriminability. Based on our analysis, the eigenvalue ratio $\lambda_b / \lambda_w$ and the square of the generalized singular value, $\alpha^2 / \beta^2$, are both equal to the generalized eigenvalue $\lambda$. Therefore, according to Fig. 1, Subspace 1, with its infinite eigenvalue ratio, is the most discriminant subspace, followed by Subspace 2 and Subspace 3. Subspace 4, which contains no discriminant information, can be safely discarded.

Figure 2. Algorithm: applying the QR decomposition to compute LDA/FKT.
Input: the data matrix $A$. Output: a projection matrix $\Phi_F$ such that $J_F$ is maximized.
1. Compute $H_b$ and $H_t$ from the data matrix $A$: $H_b = [\sqrt{N_1}(m_1 - m), \ldots, \sqrt{N_C}(m_C - m)]$, $H_t = [a_1 - m, \ldots, a_N - m]$.
2. Apply the QR decomposition $H_t = QR$, where $Q \in \mathbb{R}^{D \times r_t}$, $R \in \mathbb{R}^{r_t \times N}$ and $r_t = \mathrm{Rank}(H_t)$.
3. Let $\bar{S}_t = R R^\top$, since $\bar{S}_t = Q^\top S_t Q = Q^\top H_t H_t^\top Q = R R^\top$.
4. Let $Z = Q^\top H_b$.
5. Let $\bar{S}_b = Z Z^\top$, since $\bar{S}_b = Q^\top S_b Q = Q^\top H_b H_b^\top Q = Z Z^\top$.
6. Compute the eigenvectors $\{v_i\}$ and eigenvalues $\{\lambda_i\}$ of $\bar{S}_t^{-1} \bar{S}_b$.
7. Sort the eigenvectors $v_i$ according to $\lambda_i$ in decreasing order: $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k > \lambda_{k+1} = \cdots = \lambda_{r_t} = 0$.
8. The final projection matrix is $\Phi_F = Q V$, where $V = [v_1, \ldots, v_{r_t}]$. Note that we can choose the subspaces based on the columns of $V$.

3.3. Algorithm

Although we proved that the FKT is equivalent to the GSVD for the LDA problem, LDA/GSVD is computationally expensive: its running time is $O(D(N + C)^2)$, where $D$ is the dimensionality, $N$ is the number of training samples and $C$ is the number of classes. We therefore propose an efficient way to compute Subspaces 1, 2 and 3 of LDA/FKT based on the QR decomposition (Subspace 4 contains no discriminant information and is omitted). Moreover, we use the smaller matrices $H_b$ and $H_t$ because the matrices $S_b$, $S_w$ and $S_t$ may be too large to be formed. Our LDA/FKT algorithm is shown in Fig. 2; its running time is $O(D N^2)$.
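The procedure in Fig. 2 translates almost line for line into NumPy. The sketch below is our reading of it (the function and variable names are ours, not the authors' code); step 6 is implemented as the equivalent generalized eigenproblem $\bar{S}_b v = \lambda_b \bar{S}_t v$ on the reduced $r_t \times r_t$ matrices.

```python
import numpy as np
from scipy.linalg import qr, eigh

def lda_fkt_qr(X, y):
    """LDA/FKT via QR decomposition, following the steps of Fig. 2.

    X: (N, D) data, y: (N,) labels. Returns the projection matrix Phi_F (D x r_t),
    columns sorted from most to least discriminative, and the eigenvalues lambda_b.
    """
    m = X.mean(axis=0)
    classes = np.unique(y)
    # Step 1: precursor matrices H_b (D x C) and H_t (D x N).
    Hb = np.column_stack([np.sqrt((y == c).sum()) * (X[y == c].mean(axis=0) - m)
                          for c in classes])
    Ht = (X - m).T
    # Step 2: rank-revealing (pivoted) QR of H_t; the column permutation does not
    # change H_t H_t^T, so R R^T still equals Q^T S_t Q below.
    Q, R, _ = qr(Ht, mode='economic', pivoting=True)
    r_t = np.linalg.matrix_rank(R)
    Q, R = Q[:, :r_t], R[:r_t, :]
    # Steps 3-5: reduced r_t x r_t scatter matrices.
    St_bar = R @ R.T                 # = Q^T S_t Q
    Z = Q.T @ Hb
    Sb_bar = Z @ Z.T                 # = Q^T S_b Q
    # Steps 6-7: eigenpairs of St_bar^{-1} Sb_bar, sorted in decreasing order.
    lam, V = eigh(Sb_bar, St_bar)    # generalized problem Sb_bar v = lam St_bar v
    order = np.argsort(lam)[::-1]
    lam, V = lam[order], V[:, order]
    # Step 8: final projection matrix.
    return Q @ V, lam
```

Columns of $\Phi_F$ whose eigenvalue is numerically 1 span Subspace 1, and columns with eigenvalue 0 span Subspace 3 and can be dropped.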

4. Experiments

4.1. A Toy Problem

First, we experiment on synthetic data: three Gaussian classes with the same covariance matrix but different means. In 3D space, the three classes share the same diagonal covariance matrix and contain the same number of points; they differ only in their means (see Fig. 3(a)).

Figure 3. The original 3 classes (triangle, circle and cross) in 3D space (a), and the 2D projections by (b) LDA/FKT and (c) Fisherface. Observe that the 2D projection of the FKT is more separable than that of Fisherface.

Table 1. Eigenvalue ratio and generalized eigenvalue for the toy problem: the LDA/FKT ratio $\lambda_b / \lambda_w$, the LDA/GSVD value $\alpha^2 / \beta^2$, and the LDA generalized eigenvalue $\lambda$ for each discriminant direction.

As shown in Fig. 4, LDA/FKT decomposes the whole data space into three subspaces: 1, 2 and 3. Here Subspace 4 does not exist because the number of samples is larger than the dimension (3). Note that each subspace consists of only one eigenvector. In terms of the eigenvalue ratio, Subspace 1 is the most discriminative subspace, followed by Subspaces 2 and 3. But if we use Fisherface (PCA followed by LDA), we cannot obtain Subspace 1, because Fisherface is restricted to Subspaces 2 and 3, where $\lambda_w > 0$ so that $S_w$ is invertible. In doing so, Fisherface discards the most discriminative information (Subspace 1). To compare LDA/FKT with Fisherface, we project the original 3D data onto Subspaces 1 and 2, and we also project to 2D via Fisherface. As illustrated in Figs. 3(b) and 3(c), the 2D projection of the FKT is more separable than that of Fisherface, which is consistent with our theory.

Let us recall the Corollary: the eigenvalue ratio of LDA/FKT is equal to the square of the generalized singular value of LDA/GSVD. To check this, we also apply the GSVD to $H_b$ and $H_w$, which is known as LDA/GSVD [5], and obtain the generalized singular values $\alpha/\beta$. From Table 1, we see that $\alpha^2/\beta^2 \approx \lambda_b/\lambda_w$ (the difference depends on the machine precision). Thus we also validate our theory experimentally. Moreover, this means we can compute the generalized eigenvalue $\lambda$ of LDA even though it cannot be obtained directly when $S_w$ is singular.

4.2. Face Recognition

We now apply LDA/FKT to a real problem: face recognition. We perform experiments on the CMU-PIE [7] face dataset. The CMU-PIE database consists of over 40,000 images of 68 subjects. Each subject was recorded across 13 different poses, under 43 different illuminations (with and without background lighting), and with 4 different expressions. Here we use a subset of PIE for our experiments: we choose 67 of the 68 subjects (one subject has fewer frontal face images), and each subject has 24 frontal face images with background lighting. All of these face images are aligned based on the eye coordinates and cropped. Fig. 5 shows a sample of the PIE face images used in our experiments; the major challenge on this data set is face recognition under different illuminations.

To evaluate the performance of LDA/FKT, we compare it with PCA and Fisherface. We employ 1-NN classification after projection onto the respective subspaces. For PCA, the projected subspace contains 66 dimensions; for Fisherface, we first take enough principal components to make $S_w$ nonsingular and then apply LDA; for LDA/FKT, we project the data onto Subspaces 1 and 2. In face recognition we usually have an undersampled problem, which is also the reason for the singularity of $S_w$. To evaluate performance in this situation, we randomly choose $n$ training samples from each subject, for a range of values of $n$, and use the remaining images for testing. Moreover, for each value of $n$ we repeat the random sampling to compute the mean and standard deviation of the classification accuracies.
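To make the protocol concrete, here is a minimal sketch of the evaluation loop (ours, not the authors' code). It assumes the hypothetical `lda_fkt_qr` helper from the Section 3.3 sketch and arrays `images` of shape (num_images, D) and `labels`; the number of repetitions and the retained dimensionality are our own choices.

```python
import numpy as np

def one_nn_accuracy(train_X, train_y, test_X, test_y):
    """1-NN classification in the projected space (Euclidean distance)."""
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    pred = train_y[np.argmin(d2, axis=1)]
    return (pred == test_y).mean()

def evaluate(images, labels, n_train, n_repeats=10, seed=0):
    """Random n_train/rest split per subject, repeated; returns (mean, std) accuracy."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_repeats):
        train_idx, test_idx = [], []
        for c in np.unique(labels):
            idx = rng.permutation(np.flatnonzero(labels == c))
            train_idx.extend(idx[:n_train])
            test_idx.extend(idx[n_train:])
        Xtr, ytr = images[train_idx], labels[train_idx]
        Xte, yte = images[test_idx], labels[test_idx]
        Phi, lam = lda_fkt_qr(Xtr, ytr)        # hypothetical helper from Sec. 3.3 sketch
        k = max(int((lam > 1e-8).sum()), 1)    # keep Subspaces 1 and 2 (lambda_b > 0)
        accs.append(one_nn_accuracy(Xtr @ Phi[:, :k], ytr, Xte @ Phi[:, :k], yte))
    return float(np.mean(accs)), float(np.std(accs))
```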
As illustrated in Fig. 6, the more training samples we use, the better the recognition accuracy. More specifically, for each method, increasing the number of training samples increases the mean recognition rate and decreases the standard deviation.

Figure 4. Eigenvalue curves ($\lambda_b$ and $\lambda_w$ versus dimension) for the toy problem under LDA/FKT. Note that each subspace consists of only one eigenvector.

Figure 5. A sample of face images from the PIE dataset. Each subject has 24 frontal face images under different lighting conditions.

Figure 6. Recognition rate (%) on PIE versus the number of training samples per class, for PCA, Fisherface and LDA/FKT. We show the mean rate and standard deviation over the repeated runs.

Fig. 6 also shows that no matter how many training samples are used, LDA/FKT consistently outperforms PCA and Fisherface. This is easily explained by the subspaces used in each method. LDA/FKT uses Subspaces 1 and 2, which are the most discriminative. Fisherface ignores Subspace 1 and operates in Subspaces 2 and 3. PCA has no notion of discriminative subspaces at all, since it is designed for pattern representation rather than classification [2, 3]. Note that even when we have only 2 or 4 training samples per subject, LDA/FKT still obtains the best recognition accuracy (about 99%) with the smallest standard deviation (below 1%). This means LDA/FKT can handle the small sample size problem with high and stable performance, which is not the case for PCA or Fisherface.

5. Conclusion

In this paper, we showed how the FKT provides valuable insights into LDA by exposing the four subspaces that make up the entire data space. These subspaces differ in discriminability because each has a different value of Fisher's Criterion. We also proved that the common technique of applying PCA followed by LDA (a.k.a. Fisherface) is not optimal, because it discards Subspace 1, the most discriminative subspace. Our results also showed how the FKT, originally proposed for 2-class problems, can be generalized to handle multiple classes; this is achieved by replacing the autocorrelation matrices $S_1$ and $S_2$ with the scatter matrices $S_b$ and $S_w$. In the near future, we intend to investigate further the insights that the FKT reveals about LDA. Another interesting direction is to extend our theory to nonlinear discriminant analysis, for example using the kernel trick employed in SVMs to construct kernelized between-class and within-class scatter matrices. The FKT may yet again reveal new insights into kernelized LDA.

References

[1] P. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 1997.
[2] R. Duda, P. Hart, and D. Stork. Pattern Classification, 2nd Edition. John Wiley and Sons, 2001.
[3] K. Fukunaga. Introduction to Statistical Pattern Recognition, 2nd Edition. Academic Press, 1990.
[4] G. Golub and C. Van Loan. Matrix Computations, 3rd Edition. Johns Hopkins Univ. Press, 1996.
[5] P. Howland and H. Park. Generalizing discriminant analysis using the generalized singular value decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(8):995-1006, August 2004.
[6] I. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1986.
[7] T. Sim, S. Baker, and M. Bsat. The CMU pose, illumination, and expression database. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12):1615-1618, December 2003.
[8] M. Turk and A. Pentland. Eigenfaces for Recognition. Journal of Cognitive Neuroscience, 3(1), 1991.
