Two-Dimensional Linear Discriminant Analysis


Jieping Ye, Department of CSE, University of Minnesota, jieping@cs.umn.edu
Ravi Janardan, Department of CSE, University of Minnesota, janardan@cs.umn.edu
Qi Li, Department of CIS, University of Delaware, qili@cis.udel.edu

Abstract

Linear Discriminant Analysis (LDA) is a well-known scheme for feature extraction and dimension reduction. It has been used widely in many applications involving high-dimensional data, such as face recognition and image retrieval. An intrinsic limitation of classical LDA is the so-called singularity problem, that is, it fails when all scatter matrices are singular. A well-known approach to deal with the singularity problem is to apply an intermediate dimension reduction stage using Principal Component Analysis (PCA) before LDA. The algorithm, called PCA+LDA, is used widely in face recognition. However, PCA+LDA has high costs in time and space, due to the need for an eigen-decomposition involving the scatter matrices. In this paper, we propose a novel LDA algorithm, namely 2DLDA, which stands for 2-Dimensional Linear Discriminant Analysis. 2DLDA overcomes the singularity problem implicitly, while achieving efficiency. The key difference between 2DLDA and classical LDA lies in the model for data representation. Classical LDA works with vectorized representations of data, while the 2DLDA algorithm works with data in matrix representation. To further reduce the dimension by 2DLDA, the combination of 2DLDA and classical LDA, namely 2DLDA+LDA, is studied, where LDA is preceded by 2DLDA. The proposed algorithms are applied to face recognition and compared with PCA+LDA. Experiments show that 2DLDA and 2DLDA+LDA achieve competitive recognition accuracy, while being much more efficient.

1 Introduction

Linear Discriminant Analysis [2, 4] is a well-known scheme for feature extraction and dimension reduction. It has been used widely in many applications such as face recognition [1], image retrieval [6], microarray data classification [3], etc. Classical LDA projects the data onto a lower-dimensional vector space such that the ratio of the between-class distance to the within-class distance is maximized, thus achieving maximum discrimination. The optimal projection (transformation) can be readily computed by applying an eigen-decomposition on the scatter matrices. An intrinsic limitation of classical LDA is that its objective function requires the nonsingularity of one of the scatter matrices. For many applications, such as face recognition, all scatter matrices in question can be singular since the data is from a very high-dimensional space, and in general, the dimension exceeds the number of data points.

This is known as the undersampled or singularity problem [5]. In recent years, many approaches have been brought to bear on such high-dimensional, undersampled problems, including pseudo-inverse LDA, PCA+LDA, and regularized LDA. More details can be found in [5]. Among these LDA extensions, PCA+LDA has received a lot of attention, especially for face recognition [1]. In this two-stage algorithm, an intermediate dimension reduction stage using PCA is applied before LDA. The common aspect of previous LDA extensions is the computation of an eigen-decomposition of certain large matrices, which not only degrades the efficiency but also makes it hard to scale them to large datasets.

In this paper, we present a novel approach to alleviate the expensive computation of the eigen-decomposition in previous LDA extensions. The novelty lies in a different data representation model. Under this model, each datum is represented as a matrix, instead of as a vector, and the collection of data is represented as a collection of matrices, instead of as a single large matrix. This model has been previously used in [8, 9, 7] for the generalization of SVD and PCA. Unlike classical LDA, we consider the projection of the data onto a space which is the tensor product of two vector spaces. We formulate our dimension reduction problem as an optimization problem in Section 3. Unlike classical LDA, there is no closed-form solution for the optimization problem; instead, we derive a heuristic, namely 2DLDA. To further reduce the dimension, which is desirable for efficient querying, we consider the combination of 2DLDA and LDA, namely 2DLDA+LDA, where the dimension of the space transformed by 2DLDA is further reduced by LDA.

We perform experiments on three well-known face datasets to evaluate the effectiveness of 2DLDA and 2DLDA+LDA and compare with PCA+LDA, which is used widely in face recognition. Our experiments show that: (1) 2DLDA is applicable to high-dimensional undersampled data such as face images, i.e., it implicitly avoids the singularity problem encountered in classical LDA; and (2) 2DLDA and 2DLDA+LDA have distinctly lower costs in time and space than PCA+LDA, and achieve classification accuracy that is competitive with PCA+LDA.

2 An overview of LDA

In this section, we give a brief overview of classical LDA. Some of the important notation used in the rest of this paper is listed in Table 1.

Given a data matrix $A \in \mathbb{R}^{N \times n}$, classical LDA aims to find a transformation $G \in \mathbb{R}^{N \times l}$ that maps each column $a_i$ of $A$, for $1 \le i \le n$, in the $N$-dimensional space to a vector $b_i$ in the $l$-dimensional space. That is,
$$G : a_i \in \mathbb{R}^{N} \rightarrow b_i = G^T a_i \in \mathbb{R}^{l} \quad (l < N).$$
Equivalently, classical LDA aims to find a vector space $\mathcal{G}$ spanned by $\{g_i\}_{i=1}^{l}$, where $G = [g_1, \dots, g_l]$, such that each $a_i$ is projected onto $\mathcal{G}$ by $(g_1^T a_i, \dots, g_l^T a_i)^T \in \mathbb{R}^{l}$.

Assume that the original data in $A$ is partitioned into $k$ classes as $A = \{\Pi_1, \dots, \Pi_k\}$, where $\Pi_i$ contains $n_i$ data points from the $i$th class, and $\sum_{i=1}^{k} n_i = n$. Classical LDA aims to find the optimal transformation $G$ such that the class structure of the original high-dimensional space is preserved in the low-dimensional space. In general, if each class is tightly grouped, but well separated from the other classes, the quality of the clustering is considered to be high.
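To make the vectorized model concrete, here is a minimal NumPy sketch (not from the paper; the sizes and the random placeholder $G$ are illustrative only) of how an $r \times c$ image becomes a column $a_i$ of $A$ and is mapped to $b_i = G^T a_i$:

```python
import numpy as np

# Illustrative sizes only: n images of size r x c.
r, c, n = 8, 6, 20
N = r * c
rng = np.random.default_rng(0)
images = rng.standard_normal((n, r, c))

# Vectorized representation: each image is flattened into a column of A (N x n).
A = images.reshape(n, N).T

# Any transformation G in R^{N x l} maps column a_i to b_i = G^T a_i in R^l.
l = 3
G = rng.standard_normal((N, l))   # placeholder; LDA chooses G to maximize class separation
B = G.T @ A                       # l x n; column i is the reduced representation of image i
print(B.shape)                    # (3, 20)
```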
In discriminant analysis, two scatter matrices, called the within-class ($S_w$) and between-class ($S_b$) scatter matrices, are defined to quantify the quality of the clustering, as follows [4]:
$$S_w = \sum_{i=1}^{k} \sum_{x \in \Pi_i} (x - m_i)(x - m_i)^T, \qquad S_b = \sum_{i=1}^{k} n_i (m_i - m)(m_i - m)^T,$$
where $m_i = \frac{1}{n_i} \sum_{x \in \Pi_i} x$ is the mean of the $i$th class, and $m = \frac{1}{n} \sum_{i=1}^{k} \sum_{x \in \Pi_i} x$ is the global mean.
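As a sketch of how these definitions translate into code (the helper below is mine, not the authors'; it assumes the data matrix holds one vectorized image per column and `labels` gives the class of each column):

```python
import numpy as np

def scatter_matrices(A, labels):
    """Within-class (S_w) and between-class (S_b) scatter of the columns of A (N x n)."""
    N = A.shape[0]
    m = A.mean(axis=1, keepdims=True)            # global mean m
    S_w = np.zeros((N, N))
    S_b = np.zeros((N, N))
    for cls in np.unique(labels):
        X = A[:, labels == cls]                  # columns belonging to class cls
        m_i = X.mean(axis=1, keepdims=True)      # class mean m_i
        D = X - m_i
        S_w += D @ D.T                           # sum over class of (x - m_i)(x - m_i)^T
        diff = m_i - m
        S_b += X.shape[1] * (diff @ diff.T)      # n_i (m_i - m)(m_i - m)^T
    return S_w, S_b
```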

Table 1: Notation

  $n$        number of images in the dataset
  $k$        number of classes in the dataset
  $A_i$      $i$th image in matrix representation
  $a_i$      $i$th image in vectorized representation
  $r$        number of rows in $A_i$
  $c$        number of columns in $A_i$
  $N$        dimension of $a_i$ ($N = r \times c$)
  $\Pi_j$    $j$th class in the dataset
  $L$        transformation matrix (left) by 2DLDA
  $R$        transformation matrix (right) by 2DLDA
  $I$        number of iterations in 2DLDA
  $B_i$      reduced representation of $A_i$ by 2DLDA
  $l_1$      number of rows in $B_i$
  $l_2$      number of columns in $B_i$

It is easy to verify that $\operatorname{trace}(S_w)$ measures the closeness of the vectors within the classes, while $\operatorname{trace}(S_b)$ measures the separation between classes. In the low-dimensional space resulting from the linear transformation $G$ (or the linear projection onto the vector space $\mathcal{G}$), the within-class and between-class scatter matrices become $S_b^L = G^T S_b G$ and $S_w^L = G^T S_w G$. An optimal transformation $G$ would maximize $\operatorname{trace}(S_b^L)$ and minimize $\operatorname{trace}(S_w^L)$. Common optimizations in classical discriminant analysis include (see [4]):
$$\max_G \operatorname{trace}\left( (S_w^L)^{-1} S_b^L \right) \quad \text{and} \quad \min_G \operatorname{trace}\left( (S_b^L)^{-1} S_w^L \right). \qquad (1)$$
The optimization problems in Eq. (1) are equivalent to the following generalized eigenvalue problem: $S_b x = \lambda S_w x$, for $\lambda \neq 0$. The solution can be obtained by applying an eigen-decomposition to the matrix $S_w^{-1} S_b$, if $S_w$ is nonsingular, or $S_b^{-1} S_w$, if $S_b$ is nonsingular. There are at most $k - 1$ eigenvectors corresponding to nonzero eigenvalues, since the rank of the matrix $S_b$ is bounded from above by $k - 1$. Therefore, the reduced dimension by classical LDA is at most $k - 1$. A stable way to compute the eigen-decomposition is to apply SVD on the scatter matrices. Details can be found in [6].

Note that a limitation of classical LDA in many applications involving undersampled data, such as text documents and images, is that at least one of the scatter matrices is required to be nonsingular. Several extensions, including pseudo-inverse LDA, regularized LDA, and PCA+LDA, were proposed in the past to deal with the singularity problem. Details can be found in [5].
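A minimal sketch of this computation, assuming $S_w$ is nonsingular (in the undersampled setting discussed above this assumption fails, which is exactly what motivates the extensions and 2DLDA); `scipy.linalg.eigh` solves the symmetric generalized eigenvalue problem $S_b\, g = \lambda S_w\, g$:

```python
import numpy as np
from scipy.linalg import eigh

def lda_transform(S_w, S_b, num_components):
    """Columns of G are generalized eigenvectors of (S_b, S_w) with the largest eigenvalues."""
    # eigh returns eigenvalues in ascending order for the problem S_b g = lambda S_w g,
    # and requires S_w to be positive definite (i.e., nonsingular).
    eigvals, eigvecs = eigh(S_b, S_w)
    order = np.argsort(eigvals)[::-1][:num_components]
    return eigvecs[:, order]                     # N x num_components transformation G

# Usage sketch: G = lda_transform(S_w, S_b, k - 1); reduced data = G.T @ A
```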

3 2-Dimensional LDA

The key difference between classical LDA and the 2DLDA algorithm that we propose in this paper is in the representation of data. While classical LDA uses the vectorized representation, 2DLDA works with data in matrix representation. We will see later in this section that the matrix representation in 2DLDA leads to an eigen-decomposition on matrices with much smaller sizes. More specifically, 2DLDA involves the eigen-decomposition of matrices of sizes $r \times r$ and $c \times c$, which are much smaller than the matrices in classical LDA. This dramatically reduces the time and space complexities of 2DLDA over LDA.

Unlike classical LDA, 2DLDA considers the following $(l_1 \times l_2)$-dimensional space $\mathcal{L} \otimes \mathcal{R}$, which is the tensor product of the following two spaces: $\mathcal{L}$ spanned by $\{u_i\}_{i=1}^{l_1}$ and $\mathcal{R}$ spanned by $\{v_i\}_{i=1}^{l_2}$. Define two matrices $L = [u_1, \dots, u_{l_1}] \in \mathbb{R}^{r \times l_1}$ and $R = [v_1, \dots, v_{l_2}] \in \mathbb{R}^{c \times l_2}$. Then the projection of $X \in \mathbb{R}^{r \times c}$ onto the space $\mathcal{L} \otimes \mathcal{R}$ is $L^T X R \in \mathbb{R}^{l_1 \times l_2}$.

Let $A_i \in \mathbb{R}^{r \times c}$, for $i = 1, \dots, n$, be the $n$ images in the dataset, clustered into classes $\Pi_1, \dots, \Pi_k$, where $\Pi_i$ has $n_i$ images. Let $M_i = \frac{1}{n_i} \sum_{X \in \Pi_i} X$ be the mean of the $i$th class, $1 \le i \le k$, and $M = \frac{1}{n} \sum_{i=1}^{k} \sum_{X \in \Pi_i} X$ be the global mean. In 2DLDA, we consider images as two-dimensional signals and aim to find two transformation matrices $L \in \mathbb{R}^{r \times l_1}$ and $R \in \mathbb{R}^{c \times l_2}$ that map each $A_i \in \mathbb{R}^{r \times c}$, for $1 \le i \le n$, to a matrix $B_i \in \mathbb{R}^{l_1 \times l_2}$ such that $B_i = L^T A_i R$.

Like classical LDA, 2DLDA aims to find the optimal transformations (projections) $L$ and $R$ such that the class structure of the original high-dimensional space is preserved in the low-dimensional space. A natural similarity metric between matrices is the Frobenius norm [8]. Under this metric, the (squared) within-class and between-class distances $D_w$ and $D_b$ can be computed as follows:
$$D_w = \sum_{i=1}^{k} \sum_{X \in \Pi_i} \|X - M_i\|_F^2, \qquad D_b = \sum_{i=1}^{k} n_i \|M_i - M\|_F^2.$$
Using the property of the trace, that is, $\operatorname{trace}(M M^T) = \|M\|_F^2$ for any matrix $M$, we can rewrite $D_w$ and $D_b$ as follows:
$$D_w = \operatorname{trace}\left( \sum_{i=1}^{k} \sum_{X \in \Pi_i} (X - M_i)(X - M_i)^T \right), \qquad D_b = \operatorname{trace}\left( \sum_{i=1}^{k} n_i (M_i - M)(M_i - M)^T \right).$$
In the low-dimensional space resulting from the linear transformations $L$ and $R$, the within-class and between-class distances become
$$D_w = \operatorname{trace}\left( \sum_{i=1}^{k} \sum_{X \in \Pi_i} L^T (X - M_i) R R^T (X - M_i)^T L \right), \qquad D_b = \operatorname{trace}\left( \sum_{i=1}^{k} n_i L^T (M_i - M) R R^T (M_i - M)^T L \right).$$
The optimal transformations $L$ and $R$ would maximize $D_b$ and minimize $D_w$. Due to the difficulty of computing the optimal $L$ and $R$ simultaneously, we derive an iterative algorithm in the following. More specifically, for a fixed $R$, we can compute the optimal $L$ by solving an optimization problem similar to the one in Eq. (1). With the computed $L$, we can then update $R$ by solving another optimization problem of the same form. Details are given below. The procedure is repeated a certain number of times, as discussed in Section 4.
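For concreteness, here is a small sketch (the helper name is mine) that evaluates the projected within-class and between-class distances for candidate transformations $L$ and $R$; this is the trade-off the iterative procedure below tries to optimize:

```python
import numpy as np

def projected_distances(images, labels, L, R):
    """Squared Frobenius within-class and between-class distances of the projections L^T A_i R."""
    M = images.mean(axis=0)                                   # global mean image
    D_w = 0.0
    D_b = 0.0
    for cls in np.unique(labels):
        X = images[labels == cls]                             # n_i x r x c
        M_i = X.mean(axis=0)                                  # class mean image
        for A_i in X:
            D_w += np.linalg.norm(L.T @ (A_i - M_i) @ R, 'fro') ** 2
        D_b += X.shape[0] * np.linalg.norm(L.T @ (M_i - M) @ R, 'fro') ** 2
    return D_w, D_b
```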

Algorithm 1: 2DLDA($A_1, \dots, A_n$, $l_1$, $l_2$)
Input: $A_1, \dots, A_n$, $l_1$, $l_2$
Output: $L$, $R$, $B_1, \dots, B_n$
1. Compute the mean $M_i$ of the $i$th class for each $i$ as $M_i = \frac{1}{n_i} \sum_{X \in \Pi_i} X$;
2. Compute the global mean $M = \frac{1}{n} \sum_{i=1}^{k} \sum_{X \in \Pi_i} X$;
3. $R_0 \leftarrow (I_{l_2}, 0)^T$;
4. For $j$ from 1 to $I$
5.   $S_w^R \leftarrow \sum_{i=1}^{k} \sum_{X \in \Pi_i} (X - M_i) R_{j-1} R_{j-1}^T (X - M_i)^T$,  $S_b^R \leftarrow \sum_{i=1}^{k} n_i (M_i - M) R_{j-1} R_{j-1}^T (M_i - M)^T$;
6.   Compute the first $l_1$ eigenvectors $\{\phi_l^L\}_{l=1}^{l_1}$ of $(S_w^R)^{-1} S_b^R$;
7.   $L_j \leftarrow [\phi_1^L, \dots, \phi_{l_1}^L]$;
8.   $S_w^L \leftarrow \sum_{i=1}^{k} \sum_{X \in \Pi_i} (X - M_i)^T L_j L_j^T (X - M_i)$,  $S_b^L \leftarrow \sum_{i=1}^{k} n_i (M_i - M)^T L_j L_j^T (M_i - M)$;
9.   Compute the first $l_2$ eigenvectors $\{\phi_l^R\}_{l=1}^{l_2}$ of $(S_w^L)^{-1} S_b^L$;
10.  $R_j \leftarrow [\phi_1^R, \dots, \phi_{l_2}^R]$;
11. EndFor
12. $L \leftarrow L_I$, $R \leftarrow R_I$;
13. $B_\ell \leftarrow L^T A_\ell R$, for $\ell = 1, \dots, n$;
14. return($L$, $R$, $B_1, \dots, B_n$).

Computation of L. For a fixed $R$, $D_w$ and $D_b$ can be rewritten as
$$D_w = \operatorname{trace}\left( L^T S_w^R L \right), \qquad D_b = \operatorname{trace}\left( L^T S_b^R L \right),$$
where
$$S_w^R = \sum_{i=1}^{k} \sum_{X \in \Pi_i} (X - M_i) R R^T (X - M_i)^T, \qquad S_b^R = \sum_{i=1}^{k} n_i (M_i - M) R R^T (M_i - M)^T.$$
Similar to the optimization problem in Eq. (1), the optimal $L$ can be computed by solving the following optimization problem: $\max_L \operatorname{trace}\left( (L^T S_w^R L)^{-1} (L^T S_b^R L) \right)$. The solution can be obtained by solving the following generalized eigenvalue problem: $S_w^R x = \lambda S_b^R x$. Since $S_w^R$ is in general nonsingular, the optimal $L$ can be obtained by computing an eigen-decomposition on $(S_w^R)^{-1} S_b^R$. Note that the size of the matrices $S_w^R$ and $S_b^R$ is $r \times r$, which is much smaller than the size of the matrices $S_w$ and $S_b$ in classical LDA.

Computation of R. Next, consider the computation of $R$, for a fixed $L$. A key observation is that $D_w$ and $D_b$ can be rewritten as
$$D_w = \operatorname{trace}\left( R^T S_w^L R \right), \qquad D_b = \operatorname{trace}\left( R^T S_b^L R \right),$$
where
$$S_w^L = \sum_{i=1}^{k} \sum_{X \in \Pi_i} (X - M_i)^T L L^T (X - M_i), \qquad S_b^L = \sum_{i=1}^{k} n_i (M_i - M)^T L L^T (M_i - M).$$
This follows from the property of the trace that $\operatorname{trace}(AB) = \operatorname{trace}(BA)$, for any two matrices $A$ and $B$. Similarly, the optimal $R$ can be computed by solving the following optimization problem: $\max_R \operatorname{trace}\left( (R^T S_w^L R)^{-1} (R^T S_b^L R) \right)$. The solution can be obtained by solving the following generalized eigenvalue problem: $S_w^L x = \lambda S_b^L x$. Since $S_w^L$ is in general nonsingular, the optimal $R$ can be obtained by computing an eigen-decomposition on $(S_w^L)^{-1} S_b^L$. Note that the size of the matrices $S_w^L$ and $S_b^L$ is $c \times c$, much smaller than $S_w$ and $S_b$.
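The following NumPy sketch mirrors Algorithm 1 above. It is a sketch under the paper's assumption that $S_w^R$ and $S_w^L$ are nonsingular; the function and variable names are mine, and the generalized eigenproblems are solved by forming $(S_w^R)^{-1} S_b^R$ and $(S_w^L)^{-1} S_b^L$ explicitly for clarity rather than efficiency:

```python
import numpy as np
from scipy.linalg import eig

def top_eigvecs(M, k):
    """First k eigenvectors of a square matrix M, ordered by decreasing real eigenvalue."""
    vals, vecs = eig(M)
    order = np.argsort(vals.real)[::-1][:k]
    return vecs[:, order].real

def two_d_lda(images, labels, l1, l2, n_iter=1):
    """Iterative 2DLDA in the spirit of Algorithm 1; images has shape (n, r, c)."""
    n, r, c = images.shape
    classes = np.unique(labels)
    M = images.mean(axis=0)                                              # global mean (Line 2)
    means = {cls: images[labels == cls].mean(axis=0) for cls in classes} # class means (Line 1)
    counts = {cls: int(np.sum(labels == cls)) for cls in classes}

    R = np.vstack([np.eye(l2), np.zeros((c - l2, l2))])                  # R_0 = (I_{l2}, 0)^T (Line 3)
    L = None
    for _ in range(n_iter):                                              # Line 4
        S_wR = np.zeros((r, r)); S_bR = np.zeros((r, r))                 # Line 5
        for cls in classes:
            for A in images[labels == cls]:
                D = (A - means[cls]) @ R
                S_wR += D @ D.T
            d = (means[cls] - M) @ R
            S_bR += counts[cls] * (d @ d.T)
        L = top_eigvecs(np.linalg.solve(S_wR, S_bR), l1)                 # Lines 6-7

        S_wL = np.zeros((c, c)); S_bL = np.zeros((c, c))                 # Line 8
        for cls in classes:
            for A in images[labels == cls]:
                D = (A - means[cls]).T @ L
                S_wL += D @ D.T
            d = (means[cls] - M).T @ L
            S_bL += counts[cls] * (d @ d.T)
        R = top_eigvecs(np.linalg.solve(S_wL, S_bL), l2)                 # Lines 9-10

    B = np.array([L.T @ A @ R for A in images])                          # Line 13
    return L, R, B
```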

The pseudo-code for the 2DLDA algorithm is given in Algorithm 1. It is clear that the most expensive steps in Algorithm 1 are in Lines 5, 8 and 13, and the total time complexity is $O(n \max(l_1, l_2)(r + c)^2 I)$, where $I$ is the number of iterations. The algorithm depends on the initial choice $R_0$. Our experiments show that choosing $R_0 = (I_{l_2}, 0)^T$, where $I_{l_2}$ is the identity matrix, produces excellent results. We use this initial $R_0$ in all the experiments. Since the number of rows ($r$) and the number of columns ($c$) of an image $A_i$ are generally comparable, i.e., $r \approx c \approx \sqrt{N}$, we set $l_1$ and $l_2$ to a common value $d$ in the rest of this paper, for simplicity. However, the algorithm works in the general case. With this simplification, the time complexity of the 2DLDA algorithm becomes $O(ndNI)$. The space complexity of 2DLDA is $O(rc) = O(N)$. The key to the low space complexity of the algorithm is that the matrices $S_w^R$, $S_b^R$, $S_w^L$, and $S_b^L$ can be formed by reading the matrices $A_\ell$ incrementally.

3.1 2DLDA+LDA

As mentioned in the Introduction, PCA is commonly applied as an intermediate dimension-reduction stage before LDA to overcome the singularity problem of classical LDA. In this section, we consider the combination of 2DLDA and LDA, namely 2DLDA+LDA, in which the dimension by 2DLDA is further reduced by LDA, since a small reduced dimension is desirable for efficient querying. More specifically, in the first stage of 2DLDA+LDA, each image $A_i \in \mathbb{R}^{r \times c}$ is reduced to $B_i \in \mathbb{R}^{d \times d}$ by 2DLDA, with $d < \min(r, c)$. In the second stage, each $B_i$ is first transformed to a vector $b_i \in \mathbb{R}^{d^2}$ (matrix-to-vector alignment), then $b_i$ is further reduced to $b_i^L \in \mathbb{R}^{k-1}$ by LDA, with $k - 1 < d^2$, where $k$ is the number of classes. Here, matrix-to-vector alignment means that the matrix is transformed to a vector by concatenating all its rows together consecutively. The time complexity of the first stage by 2DLDA is $O(ndNI)$. The second stage applies classical LDA to data in $d^2$-dimensional space, hence it takes $O(n(d^2)^2)$, assuming $n > d^2$. Hence the total time complexity of 2DLDA+LDA is $O\left( nd(NI + d^3) \right)$.
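A hedged sketch of the two-stage pipeline just described, reusing the illustrative `two_d_lda`, `scatter_matrices`, and `lda_transform` helpers sketched earlier (all of which are my own, not the authors' code):

```python
import numpy as np

def two_d_lda_plus_lda(images, labels, d, k):
    """2DLDA down to d x d, then matrix-to-vector alignment and classical LDA to k-1 dims."""
    # Stage 1: 2DLDA reduces each r x c image A_i to a d x d matrix B_i = L^T A_i R.
    L, R, B = two_d_lda(images, labels, d, d, n_iter=1)
    # Stage 2: flatten each B_i row by row (matrix-to-vector alignment), one column per image.
    vectors = B.reshape(B.shape[0], -1).T                 # d^2 x n
    S_w, S_b = scatter_matrices(vectors, labels)
    G = lda_transform(S_w, S_b, k - 1)                    # reduce to k - 1 dimensions
    return G.T @ vectors                                  # (k-1) x n final representation
```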

4 Experiments

In this section, we experimentally evaluate the performance of 2DLDA and 2DLDA+LDA on face images and compare with PCA+LDA, which is used widely in face recognition. For PCA+LDA, we use 200 principal components in the PCA stage, as this produces good overall results. All of our experiments are performed on a P4 1.80GHz Linux machine with 1GB memory. For all the experiments, the 1-Nearest-Neighbor (1NN) algorithm is applied for classification, and ten-fold cross-validation is used for computing the classification accuracy.

Datasets: We use three face datasets in our study: PIX, ORL, and PIE, which are publicly available. PIX (available at http://peipa.essex.ac.uk/ipa/pix/faces/manchester/testhard/) contains 300 face images of 30 persons. The image size is 512 x 512. We subsample the images down to a size of 100 x 100 = 10000. ORL (available at http://www.uk.research.att.com/facedatabase.html) contains 400 face images of 40 persons. The image size is 92 x 112. PIE is a subset of the CMU PIE face image dataset (available at http://www.ri.cmu.edu/projects/project_418.html). It contains 6615 face images of 63 persons. The image size is 640 x 480. We subsample the images down to a size of 220 x 175 = 38500. Note that PIE is much larger than the other two datasets.

Figure 1: Effect of the number of iterations on 2DLDA and 2DLDA+LDA using the three face datasets: PIX, ORL and PIE (from left to right).

The impact of the number, I, of iterations: In this experiment, we study the effect of the number of iterations ($I$ in Algorithm 1) on 2DLDA and 2DLDA+LDA. The results are shown in Figure 1, where the x-axis denotes the number of iterations, and the y-axis denotes the classification accuracy. $d = 10$ is used for both algorithms. It is clear that both accuracy curves are stable with respect to the number of iterations. In general, the accuracy curves of 2DLDA+LDA are slightly more stable than those of 2DLDA. The key consequence is that we need to run the for loop (from Line 4 to Line 11) in Algorithm 1 only once, i.e., $I = 1$, which significantly reduces the total running time of both algorithms.

The impact of the value of the reduced dimension d: In this experiment, we study the effect of the value of $d$ on 2DLDA and 2DLDA+LDA, where the value of $d$ determines the dimensionality of the space transformed by 2DLDA. We did extensive experiments using different values of $d$ on the face image datasets. The results are summarized in Figure 2, where the x-axis denotes the values of $d$ (between 1 and 15) and the y-axis denotes the classification accuracy with 1-Nearest-Neighbor as the classifier. As shown in Figure 2, the accuracy curves on all datasets stabilize around $d = 4$ to $6$.

Comparison on classification accuracy and efficiency: In this experiment, we evaluate the effectiveness of the proposed algorithms in terms of classification accuracy and efficiency, and compare with PCA+LDA. The results are summarized in Table 2. We can observe that 2DLDA+LDA has similar performance to PCA+LDA in classification, while it outperforms 2DLDA. Hence the LDA stage in 2DLDA+LDA not only reduces the dimension, but also increases the accuracy. Another key observation from Table 2 is that 2DLDA is almost one order of magnitude faster than PCA+LDA, while the running time of 2DLDA+LDA is close to that of 2DLDA. Hence 2DLDA+LDA is a more effective dimension reduction algorithm than PCA+LDA, as it is competitive with PCA+LDA in classification and has the same number of reduced dimensions in the transformed space, while it has much lower time and space costs.

5 Conclusions

An efficient algorithm, namely 2DLDA, is presented for dimension reduction. 2DLDA is an extension of LDA. The key difference between 2DLDA and LDA is that 2DLDA works on the matrix representation of images directly, while LDA uses a vector representation. 2DLDA has asymptotically minimum memory requirements, and lower time complexity than LDA, which is desirable for large face datasets, while it implicitly avoids the singularity problem encountered in classical LDA. We also study the combination of 2DLDA and LDA, namely 2DLDA+LDA, where the dimension by 2DLDA is further reduced by LDA. Experiments show that 2DLDA and 2DLDA+LDA are competitive with PCA+LDA in terms of classification accuracy, while they have significantly lower time and space costs.

Figure 2: Effect of the value of the reduced dimension d on 2DLDA and 2DLDA+LDA using the three face datasets: PIX, ORL and PIE (from left to right).

Table 2: Comparison on classification accuracy and efficiency. "--" means that PCA+LDA is not applicable for PIE, due to its large size; note that PCA+LDA involves an eigen-decomposition of the scatter matrices, which requires the whole data matrix to reside in main memory.

              PCA+LDA               2DLDA                 2DLDA+LDA
  Dataset     Accuracy  Time(Sec)   Accuracy  Time(Sec)   Accuracy  Time(Sec)
  PIX         98.00%    7.73        97.33%    1.69        98.50%    1.73
  ORL         97.75%    2.5         97.50%    2.4         98.00%    2.9
  PIE         --        --          99.32%    53          100%      57

Acknowledgment

Research of J. Ye and R. Janardan is sponsored, in part, by the Army High Performance Computing Research Center under the auspices of the Department of the Army, Army Research Laboratory cooperative agreement number DAAD19-01-2-0014, the content of which does not necessarily reflect the position or the policy of the government, and no official endorsement should be inferred.

References

[1] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711-720, 1997.
[2] R.O. Duda, P.E. Hart, and D. Stork. Pattern Classification. Wiley, 2000.
[3] S. Dudoit, J. Fridlyand, and T.P. Speed. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97(457):77-87, 2002.
[4] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, San Diego, California, USA, 1990.
[5] W.J. Krzanowski, P. Jonathan, W.V. McCarthy, and M.R. Thomas. Discriminant analysis with singular covariance matrices: methods and applications to spectroscopic data. Applied Statistics, 44:101-115, 1995.
[6] D.L. Swets and J. Weng. Using discriminant eigenfeatures for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):831-836, 1996.
[7] J. Yang, D. Zhang, A.F. Frangi, and J.Y. Yang. Two-dimensional PCA: a new approach to appearance-based face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(1):131-137, 2004.
[8] J. Ye. Generalized low rank approximations of matrices. In ICML Conference Proceedings, pages 887-894, 2004.
[9] J. Ye, R. Janardan, and Q. Li. GPCA: An efficient dimension reduction scheme for image compression and retrieval. In ACM SIGKDD Conference Proceedings, pages 354-363, 2004.