Weighted Generalized LDA for Undersampled Problems


JING YANG, School of Computer Science and Technology, Nanjing University of Science and Technology, Xiao Lin Wei Street, Nanjing, CHINA
LIYA FAN, School of Mathematics Sciences, Liaocheng University, Hunan Road, Liaocheng, CHINA, fanliya63@126.com
QUANSEN SUN, School of Computer Science and Technology, Nanjing University of Science and Technology, Xiao Lin Wei Street, Nanjing, CHINA, sunquansen@mail.njust.edu.cn

Abstract: Linear discriminant analysis (LDA) is a classical approach for dimensionality reduction. It aims to maximize the between-class scatter and minimize the within-class scatter, and thus to maximize class discrimination. However, for undersampled problems, where the data dimensionality is larger than the sample size, all scatter matrices are singular and classical LDA encounters computational difficulty. Recently, many LDA extensions have been developed to overcome the singularity, such as pseudo-inverse LDA (PILDA), LDA based on the generalized singular value decomposition (LDA/GSVD), null space LDA (NLDA) and range space LDA (RSLDA). However, they still rely on the Fisher criterion, which is not optimal with respect to classification rate. To remedy this problem, weighted schemes are introduced for several LDA extensions in this paper; we call the resulting methods weighted generalized LDA algorithms. Experiments on the Yale face database and the AT&T face database are performed to evaluate the effectiveness of the proposed algorithms and the effect of the weighting functions.

Key Words: Feature extraction, dimensionality reduction, undersampled problem, weighting function, misclassification rate

1 Introduction

Many machine learning, data mining and bioinformatics problems involve data in a very high-dimensional space. Undersampled problems, where the data dimension is larger than the sample size, frequently occur in many applications, including information retrieval [1-3], face recognition [4-6] and microarray data analysis [7]. Feature extraction is an important part of pattern recognition and machine learning; it can reduce computational cost and improve classification performance. An appropriate representation of the data in terms of its features is an important problem in machine learning and data mining. Not all original features are beneficial for classification or regression tasks: some features are irrelevant or redundant for the distribution of the data set and can degrade classification performance. In order to increase classification performance and to reduce the computational cost of the classifier, a feature selection process should therefore be used in classification or regression problems [8].

Linear discriminant analysis (LDA) [9] is one of the most popular linear projection techniques for feature extraction. However, classical LDA usually encounters two difficulties. One is the singularity problem caused by the undersampled problem. In recent years many LDA extensions have been developed to deal with this problem, such as pseudo-inverse LDA (PILDA) [10,11], LDA/GSVD, null space LDA (NLDA) [12,15], range space LDA [16], orthogonal LDA (OLDA) [11,17], uncorrelated LDA (ULDA) [11,18] and regularized LDA (RLDA) [19]. In PILDA the inverse of the scatter matrix is replaced by the pseudo-inverse, which is equivalent to approximating the solution by a least-squares method; the optimal transformation is computed through the simultaneous diagonalization of the scatter matrices.
LDA/GSVD is a generalization of LDA based on the GSVD; it overcomes the singularity of the scatter matrices by applying the GSVD to solve the generalized eigenvalue problem. The classical LDA solution is a special case of the LDA/GSVD method. In NLDA the between-class distance is maximized in the null space of the within-class scatter matrix. It is a two-stage approach: a transformation using a basis of the null space of the within-class scatter matrix is performed in the first stage, and then the projective directions are searched in the transformed space. In range space LDA the within-class distance is minimized in the range space of the between-class scatter matrix. Similarly, we propose a method based on a transformation by a basis of the range space of the within-class scatter matrix to handle undersampled problems.

Another drawback of LDA-based algorithms is that the Fisher separability criterion is not directly related to the classification rate. A promising solution to this problem is to introduce weighted schemes into the criteria. It can be seen that weighting functions are closely related to classification accuracy: different weighting functions lead to different classification errors, and selecting a suitable weighting function can increase classification accuracy. In this paper we focus on weighted versions of PILDA, LDA/GSVD, NLDA and range space LDA, with five weighting functions for each weighted scheme, in which the K-nearest neighbors (KNN) method [22] is used as the classifier. We apply the Euclidean distance $d_{ij} = \|m_i - m_j\|$ between the means of classes $i$ and $j$ in the weighting functions $w(d_{ij})$. A weighting function is generally a monotonically decreasing function, because classes that are closer to one another are more likely to be confused and should therefore be given a greater weight. For the weighting functions, we first apply two special cases of the weighting function $w(d_{ij}) = (d_{ij})^{-p}$ proposed by Lotlikar et al. in [20], with $p = 1$ and $p = 2$, and then an improved version of the weighting function $w(d_{ij}) = \frac{1}{2d_{ij}^2}\mathrm{erf}\!\left(\frac{d_{ij}}{2\sqrt{2}}\right)$ presented by Loog in [25], where the Mahalanobis distance is replaced by the Euclidean distance. In addition, according to the features of these weighting functions, we present two new weighting functions.

The rest of the paper is organized as follows. In Section 2 we briefly review generalized LDA algorithms. Weighted versions of the generalized LDA algorithms and the weighting functions are introduced in Section 3. Extensive experiments with the proposed algorithms are reported in Section 4; the results demonstrate the effectiveness of the proposed algorithms and the effect of the weighting functions. Conclusions follow in Section 5.

2 Generalized LDA

In this section we review several generalized LDA algorithms which can overcome the limitation of classical LDA. Given a data matrix $A = [a_1, \ldots, a_n] \in R^{m \times n}$, where $a_i \in R^m$, $i = 1, \ldots, n$, assume the original data are already clustered and partitioned into $r$ classes. Let $A = [A_1, \ldots, A_r]$, where $A_i \in R^{m \times n_i}$ is the data matrix belonging to the $i$-th class and $\sum_{i=1}^r n_i = n$. Let $N_i$ be the set of indices of the $i$-th class, i.e. $a_j$ belongs to the $i$-th class for $j \in N_i$. The aim of LDA is to find a linear transformation $G \in R^{m \times l}$ that maps each $a_i$ to $y_i \in R^l$ by $y_i = G^T a_i$ and optimally preserves the cluster structure in the reduced-dimensional space. The between-class, within-class and total scatter matrices are defined as $S_b = \sum_{i=1}^r n_i (c_i - c)(c_i - c)^T$, $S_w = \sum_{i=1}^r \sum_{j \in N_i} (a_j - c_i)(a_j - c_i)^T$ and $S_t = \sum_{j=1}^n (a_j - c)(a_j - c)^T$ (see [1]), where $c_i = (1/n_i)\sum_{j \in N_i} a_j$ and $c = (1/n)\sum_{j=1}^n a_j$ are the class centroids and the global centroid, respectively.
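To make the notation concrete, here is a minimal NumPy sketch of the three scatter matrices defined above; the data layout (samples as columns of A, one integer label per column) and the function name are illustrative assumptions, not part of the paper.

```python
import numpy as np

def scatter_matrices(A, labels):
    """Compute S_w, S_b, S_t for a data matrix A (m x n, one sample per column).

    `labels` holds the class index of each column; this layout is an
    illustrative assumption.
    """
    labels = np.asarray(labels)
    m, n = A.shape
    c = A.mean(axis=1, keepdims=True)                     # global centroid
    S_w = np.zeros((m, m))
    S_b = np.zeros((m, m))
    for k in np.unique(labels):
        A_k = A[:, labels == k]                           # samples of class k
        c_k = A_k.mean(axis=1, keepdims=True)             # class centroid
        D_k = A_k - c_k
        S_w += D_k @ D_k.T                                # within-class scatter
        S_b += A_k.shape[1] * (c_k - c) @ (c_k - c).T     # between-class scatter
    S_t = (A - c) @ (A - c).T                             # total scatter, S_t = S_b + S_w
    return S_w, S_b, S_t
```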
We can easily see that the trace of $S_w$ measures the within-class closeness and the trace of $S_b$ measures the between-class separation. In the lower-dimensional space obtained from the transformation $G$, the three scatter matrices above become $S_w^L = G^T S_w G$, $S_b^L = G^T S_b G$ and $S_t^L = G^T S_t G$. An optimal transformation $G$ would maximize $\mathrm{trace}(S_b^L)$ and minimize $\mathrm{trace}(S_w^L)$ simultaneously. In classical LDA, which requires $S_w$ or $S_b$ to be nonsingular, common optimizations include

$\max_G \{\mathrm{trace}((S_w^L)^{-1} S_b^L)\}$, $\min_G \{\mathrm{trace}((S_b^L)^{-1} S_w^L)\}$. (1)

Problem (1) is equivalent to finding the eigenvectors satisfying the generalized eigenequation $S_b x = \lambda S_w x$ for $\lambda \neq 0$. The solution can be obtained by solving an eigenvalue problem on the matrix $S_w^{-1} S_b$ if $S_w$ is nonsingular, or on $S_b^{-1} S_w$ if $S_b$ is nonsingular. There are at most $r-1$ eigenvectors corresponding to nonzero eigenvalues, since $\mathrm{rank}(S_b) \le r-1$; therefore the reduced dimension obtained by classical LDA is at most $r-1$. A stable way to solve this eigenvalue problem is to apply the singular value decomposition (SVD) to the scatter matrices; the details can be found in [5]. When the data dimensionality is larger than the sample size, all scatter matrices are singular and classical LDA is no longer applicable. In order to solve such small sample size problems, several generalizations of LDA have been proposed.
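The nonsingular case can be written in a few lines; the sketch below (a hypothetical helper, assuming the scatter matrices from the previous snippet) solves $S_b x = \lambda S_w x$ by an ordinary eigen-decomposition of $S_w^{-1} S_b$ and keeps at most $r-1$ leading eigenvectors.

```python
import numpy as np

def classical_lda(S_w, S_b, r):
    """Classical LDA transformation for nonsingular S_w.

    Solves S_b x = lambda S_w x via eig(S_w^{-1} S_b) and keeps the
    eigenvectors of the (at most r-1) largest eigenvalues.
    """
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(-evals.real)[: r - 1]
    return evecs[:, order].real        # columns span the reduced space
```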

2.1 Pseudo-inverse LDA

The pseudo-inverse of a matrix $A$, denoted $A^+$, is the unique matrix satisfying $A^+ A A^+ = A^+$, $A A^+ A = A$, $(AA^+)^T = AA^+$ and $(A^+A)^T = A^+A$. The pseudo-inverse $A^+$ is commonly computed by the SVD [23]: if $A = U\Sigma V^T$ is the SVD of $A$, where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal with positive diagonal entries, then $A^+ = V\Sigma^{-1}U^T$.

In this subsection we consider the criterion $F_1(G) = \mathrm{trace}((S_b^L)^+ S_w^L)$ proposed in [10], which is a natural extension of (1), since the inverse of a matrix may not exist but the pseudo-inverse is always well-defined; moreover, when the matrix is invertible its pseudo-inverse coincides with its inverse. In PILDA the optimal transformation matrix $G$ is obtained by solving the minimization problem

$\min_G F_1(G)$. (2)

The main technique for solving problem (2) is the GSVD. Define the matrices $H_w = [A_1 - c_1(e^1)^T, \ldots, A_r - c_r(e^r)^T]$, $H_b = [\sqrt{n_1}(c_1 - c), \ldots, \sqrt{n_r}(c_r - c)]$ and $H_t = [a_1 - c, \ldots, a_n - c]$, where $e^i = (1, \ldots, 1)^T \in R^{n_i}$. Then the scatter matrices can be expressed as $S_w = H_wH_w^T$, $S_b = H_bH_b^T$ and $S_t = H_tH_t^T$. Let the GSVD be applied to the matrix pair $(H_b^T, H_w^T)$ such that

$U_b^T H_b^T X = [\Gamma_b, 0]$, $U_w^T H_w^T X = [\Gamma_w, 0]$, (3)

where $U_b \in R^{r\times r}$ and $U_w \in R^{n\times n}$ are orthogonal, $X \in R^{m\times m}$ is a nonsingular matrix, $\Gamma_b^T\Gamma_b + \Gamma_w^T\Gamma_w = I_s$ with $s = \mathrm{rank}\begin{pmatrix} H_b^T \\ H_w^T \end{pmatrix}$, and $\Gamma_b^T\Gamma_b$ and $\Gamma_w^T\Gamma_w$ are diagonal matrices with nonincreasing and nondecreasing diagonal components, respectively. The simultaneous diagonalization of $S_b$ and $S_w$ is obtained as

$X^T S_b X = \begin{pmatrix}\Gamma_b^T\Gamma_b & 0\\ 0 & 0_{m-s}\end{pmatrix} = \mathrm{diag}(I_\mu,\, D_\tau,\, 0_{s-\mu-\tau},\, 0_{m-s}) = D_b$,
$X^T S_w X = \begin{pmatrix}\Gamma_w^T\Gamma_w & 0\\ 0 & 0_{m-s}\end{pmatrix} = \mathrm{diag}(0_\mu,\, E_\tau,\, I_{s-\mu-\tau},\, 0_{m-s}) = D_w$, (4)

where $I$ and $0$ denote identity and zero matrices, respectively, $D_\tau = \mathrm{diag}(\alpha_{\mu+1}^2, \ldots, \alpha_{\mu+\tau}^2)$ and $E_\tau = \mathrm{diag}(\beta_{\mu+1}^2, \ldots, \beta_{\mu+\tau}^2)$ with $\alpha_{\mu+1} \ge \cdots \ge \alpha_{\mu+\tau} > 0$, $0 < \beta_{\mu+1} \le \cdots \le \beta_{\mu+\tau}$ and $\alpha_i^2 + \beta_i^2 = 1$ for $i = \mu+1, \ldots, \mu+\tau$. From (4) we can deduce that $X^T S_t X = X^T S_b X + X^T S_w X = \begin{pmatrix} I_s & 0\\ 0 & 0\end{pmatrix}$.

To solve problem (2) we need the following three lemmas, where the proofs of the first two follow directly from the definition of the pseudo-inverse and the third can be found in [10].

Lemma 1. For any matrix $A \in R^{m\times n}$, $\mathrm{trace}(AA^+) = \mathrm{rank}(A)$.

Lemma 2. For any matrix $A \in R^{m\times n}$, $(AA^T)^+ = (A^+)^T A^+$.

Lemma 3. Let $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_s)$ be any diagonal matrix with $\sigma_1 \ge \cdots \ge \sigma_s > 0$. Then for any matrix $M \in R^{m\times s}$ with $\mathrm{rank}(M) = \delta$ the following inequality holds: $\mathrm{trace}((M\Sigma M^T)^+ MM^T) \ge \sum_{i=1}^{\delta}\frac{1}{\sigma_i}$, where equality holds if and only if $M = U\begin{pmatrix} D & 0\\ 0 & 0\end{pmatrix}$ for some orthogonal matrix $U \in R^{m\times m}$ and matrix $D = \Sigma_1 Q \Sigma_2 \in R^{\delta\times\delta}$, where $Q \in R^{\delta\times\delta}$ is orthogonal and $\Sigma_1, \Sigma_2 \in R^{\delta\times\delta}$ are diagonal matrices with positive diagonal entries.

Theorem 4. Let $X$ be the matrix specified by the GSVD of $(H_b^T, H_w^T)$ in (3) and let $X_\delta$ be the matrix consisting of the first $\delta$ columns of $X$, where $\delta = \mathrm{rank}(S_b)$. Then $G = X_\delta M$ minimizes $F_1$ for any nonsingular $M$.

Proof: By the GSVD of the matrix pair $(H_b^T, H_w^T)$ we have $S_b^L = \tilde G D_b \tilde G^T$ and $S_w^L = \tilde G D_w \tilde G^T$, where $\tilde G = (X^{-1}G)^T$. Let $\tilde G = [G_1, G_2, G_3, G_4]$ be a partition of $\tilde G$ such that $G_1 \in R^{l\times\mu}$, $G_2 \in R^{l\times\tau}$, $G_3 \in R^{l\times(s-\mu-\tau)}$ and $G_4 \in R^{l\times(m-s)}$. Putting $G_{12} = [G_1, G_2]$, we have $S_w^L + S_b^L = G_{12}G_{12}^T + G_3G_3^T$ and $S_b^L = G_{12}\Sigma G_{12}^T$, where $\Sigma = \mathrm{diag}(I_\mu, D_\tau)$. By Lemma 1 we get $\mathrm{trace}((S_b^L)^+ S_b^L) = \mathrm{rank}(S_b^L) \le \mathrm{rank}(S_b) = \delta$. Let $\hat G = G_{12}\Sigma^{1/2}$ and let

$\hat G = U\hat\Sigma V^T$ (5)

be the SVD of $\hat G$. Then $\mathrm{rank}(\hat G) = \mathrm{rank}(G_{12})$ and $\hat G^+ = V\hat\Sigma^+ U^T$. Consequently, by Lemmas 2 and 3 we can deduce that

$\mathrm{trace}((S_b^L)^+(S_w^L + S_b^L)) = \mathrm{trace}((G_{12}\Sigma G_{12}^T)^+(G_{12}G_{12}^T + G_3G_3^T)) \ge \mathrm{trace}((G_{12}\Sigma G_{12}^T)^+ G_{12}G_{12}^T) = \mathrm{trace}((\hat G^+)^T\hat G^+\hat G\Sigma^{-1}\hat G^T) = \mathrm{trace}(V_\delta^T\Sigma^{-1}V_\delta) \ge \mu + \sum_{i=\mu+1}^{\delta}\frac{1}{\alpha_i^2}$, (6)

where $V_\delta$ is the matrix consisting of the first $\delta$ columns of $V$. We can easily show that the last inequality in (6) becomes an equality if $V_\delta = \begin{pmatrix} Q_\delta\\ 0\end{pmatrix}$ for some orthogonal matrix $Q_\delta \in R^{\delta\times\delta}$ and $G_{12} = U\begin{pmatrix} D\\ 0\end{pmatrix} \in R^{l\times(\mu+\tau)}$, where $D = \hat\Sigma Q_\delta\Sigma_\delta^{-1/2}$ and $\Sigma_\delta$ is the $\delta$-th principal submatrix of $\Sigma$. It follows from (5)-(6) that

$F_1(G) = \mathrm{trace}((S_b^L)^+(S_w^L + S_b^L)) - \mathrm{trace}((S_b^L)^+ S_b^L) \ge \sum_{i=\mu+1}^{\delta}\left(\frac{\beta_i}{\alpha_i}\right)^2$,

where equality holds if $\mathrm{rank}(S_b^L) = \delta$, $G_{12} = U\begin{pmatrix}\hat\Sigma Q_\delta\Sigma_\delta^{-1/2}\\ 0\end{pmatrix}$ and $G_3 = 0$. We observe that the minimization of $F_1$ is independent of $G_4$ and simply set it to zero. Hence the minimum of $F_1$ is attained if the partition $\tilde G = [G_1, G_2, G_3, G_4]$ satisfies $G_{12} = U\begin{pmatrix}\hat\Sigma Q_\delta\Sigma_\delta^{-1/2}\\ 0\end{pmatrix}$, $G_3 = 0$ and $G_4 = 0$. Let $U = [U_1, U_2]$ be a partition of $U$ such that $U_1 \in R^{l\times\delta}$ and $U_2 \in R^{l\times(l-\delta)}$. It follows that $G_{12} = U\begin{pmatrix}\hat\Sigma Q_\delta\Sigma_\delta^{-1/2}\\ 0\end{pmatrix} = U_1\hat\Sigma Q_\delta\Sigma_\delta^{-1/2}$. Note that $U$ is an orthogonal matrix, $Q_\delta$ is an arbitrary orthogonal matrix, and $\hat\Sigma$ and $\Sigma_\delta$ are diagonal matrices; therefore $M$ is an arbitrary nonsingular matrix and $G = X_\delta M$ minimizes $F_1(G)$. This method is called PILDA and is summarized as Algorithm 2.1.

Algorithm 2.1. PILDA
1. Compute the SVD of $K = \begin{pmatrix} H_b^T\\ H_w^T\end{pmatrix}$ as $K = P\begin{pmatrix} R & 0\\ 0 & 0\end{pmatrix}U^T$, where $P$ and $U$ are orthogonal and $R$ is diagonal.
2. Let $s = \mathrm{rank}(K)$ and $\delta = \mathrm{rank}(H_b)$. Compute $W$ from the SVD of $P(1:r, 1:s)$, the submatrix consisting of the first $r$ rows and the first $s$ columns of the matrix $P$ from Step 1, as $P(1:r, 1:s) = V\Gamma W^T$.
3. Let $X = U\begin{pmatrix} R^{-1}W & 0\\ 0 & I\end{pmatrix}$.
4. Assign the first $\delta$ columns of the matrix $X$ to $X_\delta$.
5. $G = X_\delta M$, where $M$ is any nonsingular matrix.

2.2 LDA based on GSVD

In this subsection we review another method based on the GSVD, proposed by Howland et al.; this approach is justified to preserve the cluster structure after dimension reduction. When $S_w = H_wH_w^T$ is nonsingular, it is well known that $\mathrm{trace}((S_w^L)^{-1}S_b^L)$ is maximized if $G_h \in R^{m\times l}$ consists of the $l$ eigenvectors of $S_w^{-1}S_b$ corresponding to the $l$ largest eigenvalues [1]. Let $x_i$ denote the $i$-th column of $X$; then

$S_b x_i = \lambda_i S_w x_i$, (7)

which means that $\lambda_i$ and $x_i$ are an eigenvalue-eigenvector pair of $S_w^{-1}S_b$ and $\mathrm{trace}(S_w^{-1}S_b) = \lambda_1 + \cdots + \lambda_m$. Expressing $\lambda_i$ as $\alpha_i^2/\beta_i^2$, the eigenvalue problem (7) becomes

$\beta_i^2 H_bH_b^T x_i = \alpha_i^2 H_wH_w^T x_i$. (8)

Let $X = [X_1, X_2, X_3, X_4] \in R^{m\times m}$ be the partition of $X$ obtained in Section 2.1, with $X_1 \in R^{m\times\mu}$, $X_2 \in R^{m\times\tau}$, $X_3 \in R^{m\times(s-\mu-\tau)}$ and $X_4 \in R^{m\times(m-s)}$. Defining $\alpha_i = 1$, $\beta_i = 0$ for $i = 1, \ldots, \mu$ and $\alpha_i = 0$, $\beta_i = 1$ for $i = \mu+\tau+1, \ldots, s$, we can see that (8) is satisfied for $1 \le i \le s$. For the remaining $m-s$ columns of $X$, both $H_bH_b^T x_i$ and $H_wH_w^T x_i$ are zero, so (8) is satisfied for arbitrary values of $\alpha_i$ and $\beta_i$ when $s+1 \le i \le m$, and then the columns of $X$ are the generalized singular vectors of the matrix pair $(H_b^T, H_w^T)$.

A question that remains is which columns of $X$ to include in $G_h$. If $S_w$ is singular, [14] argues in terms of the simultaneous optimization

$\max_{G_h}\mathrm{trace}(G_h^T S_b G_h)$, $\min_{G_h}\mathrm{trace}(G_h^T S_w G_h)$. (9)

Letting $g_j$ represent a column of $G_h$, we have $\mathrm{trace}(G_h^T S_b G_h) = \sum_j g_j^T S_b g_j$ and $\mathrm{trace}(G_h^T S_w G_h) = \sum_j g_j^T S_w g_j$. If $x_i$ is one of the leftmost $\mu$ columns of $X$, then $x_i \in \mathrm{null}(S_w) \cap \mathrm{null}(S_b)^c$ (the superscript $c$ denotes the complement), which indicates that including $x_i$ in $G_h$ increases $\mathrm{trace}(G_h^T S_b G_h)$ while leaving $\mathrm{trace}(G_h^T S_w G_h)$ unchanged. If $x_i$ is one of the rightmost $m-s$ columns of $X$, then $x_i \in \mathrm{null}(S_w) \cap \mathrm{null}(S_b)$, which implies that adding $x_i$ to $G_h$ has no effect on these traces and contributes neither to the maximization nor to the minimization in (9). In either case, $G_h$ should comprise the leftmost $\mu + \tau = \mathrm{rank}(H_b)$ columns of $X$, as illustrated in [24]. Hence in LDA/GSVD the transformation matrix is $G_h = [X_1, X_2]$. An efficient algorithm for LDA/GSVD was presented in [12] as follows.

Algorithm 2.2. LDA/GSVD
1. Compute the EVD of $S_t$: $S_t = [U_1, U_2]\begin{pmatrix}\Sigma_1 & 0\\ 0 & 0\end{pmatrix}[U_1, U_2]^T$.
2. Compute $V$ from the EVD of $\tilde S_b = \Sigma_1^{-1/2}U_1^T S_b U_1\Sigma_1^{-1/2}$: $\tilde S_b = V\Gamma_b^T\Gamma_b V^T$.
3. Assign the first $r-1$ columns of $U_1\Sigma_1^{-1/2}V$ to $G_h$.
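Algorithm 2.2 translates almost line for line into NumPy. The sketch below is a minimal illustration (the function name, the tolerance used for the numerical rank of $S_t$, and the eigh-based decompositions are assumptions); as with Algorithms 3.1 and 3.2 in the next section, multiplying the returned matrix on the right by a nonsingular $M$ gives the corresponding PILDA transformation.

```python
import numpy as np

def lda_gsvd(S_b, S_t, r, tol=1e-10):
    """Sketch of Algorithm 2.2 (LDA/GSVD) using eigen-decompositions."""
    # Step 1: EVD of the total scatter matrix; keep the nonzero part.
    w, U = np.linalg.eigh(S_t)
    idx = w > tol * w.max()
    U1, Sig1 = U[:, idx], w[idx]
    # Step 2: EVD of the reduced between-class scatter.
    T = U1 / np.sqrt(Sig1)                  # U1 @ diag(Sigma1^{-1/2})
    wb, V = np.linalg.eigh(T.T @ S_b @ T)
    V = V[:, np.argsort(-wb)]               # sort by decreasing eigenvalue
    # Step 3: the first r-1 columns give the transformation G_h.
    return (T @ V)[:, : r - 1]
```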
2.3 Null space LDA

Chen et al. [15] proposed null space LDA (NLDA) for dimensionality reduction of undersampled problems, which is a two-stage LDA method. This approach projects the original space onto the null space of $S_w$ using an orthonormal basis of $\mathrm{null}(S_w)$, and then, in the projected space, computes a transformation that maximizes the between-class scatter. Let the EVD of $S_w \in R^{m\times m}$ be

$S_w = U_w\Sigma_w U_w^T = [U_{w1}, U_{w2}]\begin{pmatrix}\Sigma_{w1} & 0\\ 0 & 0\end{pmatrix}[U_{w1}, U_{w2}]^T$,

where $s_1 = \mathrm{rank}(S_w)$, $U_w$ is orthogonal, $\Sigma_{w1}$ is a diagonal matrix with nonincreasing positive diagonal elements, and $U_{w1}$ contains the first $s_1$ columns of the orthogonal matrix $U_w$. We can easily show that $\mathrm{null}(S_w) = \mathrm{span}(U_{w2})$ and that the transformation by $U_{w2}U_{w2}^T$ projects the original data onto $\mathrm{null}(S_w)$. The between-class scatter matrix in the transformed space is $\tilde S_b = U_{w2}U_{w2}^T S_b U_{w2}U_{w2}^T$. Consider the EVD of $\tilde S_b$:

$\tilde S_b = \tilde U_b\tilde\Sigma_b\tilde U_b^T = [\tilde U_{b1}, \tilde U_{b2}]\begin{pmatrix}\tilde\Sigma_{b1} & 0\\ 0 & 0\end{pmatrix}[\tilde U_{b1}, \tilde U_{b2}]^T$,

where $s_2 = \mathrm{rank}(\tilde S_b)$, $\tilde U_{b1} \in R^{m\times s_2}$ and $\tilde\Sigma_{b1} \in R^{s_2\times s_2}$. In NLDA the optimal transformation matrix is $G_e = U_{w2}U_{w2}^T\tilde U_{b1}$.

2.4 Range space LDA

In this subsection we propose another approach to undersampled problems, which we denote RSLDA for short. This method first transforms the original space using a basis of $\mathrm{range}(S_w)$, and then pursues the maximization of the between-class scatter in the transformed space. Let the EVD of $S_w \in R^{m\times m}$ be

$S_w = U_w\Sigma_w U_w^T = [U_{w1}, U_{w2}]\begin{pmatrix}\Sigma_{w1} & 0\\ 0 & 0\end{pmatrix}[U_{w1}, U_{w2}]^T$,

where $s_1 = \mathrm{rank}(S_w)$, $\Sigma_{w1}$ is a diagonal matrix and $U_{w1} \in R^{m\times s_1}$. We can easily show that $\mathrm{range}(S_w) = \mathrm{span}(U_{w1})$ and that the transformation by $V_y = U_{w1}\Sigma_{w1}^{-1/2}$ projects the original data onto $\mathrm{range}(S_w)$. The within-class scatter matrix in the transformed space is $\tilde S_w = V_y^T S_w V_y = I_{s_1}$. Let the EVD of $\tilde S_b = V_y^T S_b V_y$ be

$\tilde S_b = \tilde U_b\tilde\Sigma_b\tilde U_b^T = [\tilde U_{b1}, \tilde U_{b2}]\begin{pmatrix}\tilde\Sigma_{b1} & 0\\ 0 & 0\end{pmatrix}[\tilde U_{b1}, \tilde U_{b2}]^T$,

where $s_3 = \mathrm{rank}(\tilde S_b)$, $\tilde\Sigma_{b1}$ is a diagonal matrix and $\tilde U_{b1} \in R^{s_1\times s_3}$. In RSLDA the optimal transformation matrix is $G_y = V_y\tilde U_{b1} = U_{w1}\Sigma_{w1}^{-1/2}\tilde U_{b1}$.
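Both constructions reduce to two symmetric eigen-decompositions. A hedged NumPy sketch follows; the eigenvalue tolerances used to split the null and range spaces are illustrative assumptions, not part of the paper.

```python
import numpy as np

def nlda(S_w, S_b, tol=1e-10):
    """Null space LDA: maximize between-class scatter inside null(S_w)."""
    w, U = np.linalg.eigh(S_w)
    U_w2 = U[:, w <= tol * w.max()]          # orthonormal basis of null(S_w)
    P = U_w2 @ U_w2.T                        # projector onto null(S_w)
    wb, Ub = np.linalg.eigh(P @ S_b @ P)
    keep = np.argsort(-wb)[: int(np.sum(wb > tol))]   # nonzero eigenvalues only
    return P @ Ub[:, keep]                   # G_e = U_w2 U_w2^T Ub1

def rslda(S_w, S_b, tol=1e-10):
    """Range space LDA: whiten range(S_w), then maximize between-class scatter."""
    w, U = np.linalg.eigh(S_w)
    idx = w > tol * w.max()
    V_y = U[:, idx] / np.sqrt(w[idx])        # V_y = U_w1 Sigma_w1^{-1/2}
    wb, Ub = np.linalg.eigh(V_y.T @ S_b @ V_y)
    keep = np.argsort(-wb)[: int(np.sum(wb > tol))]
    return V_y @ Ub[:, keep]                 # G_y = V_y Ub1
```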

3 Weighted versions and weighting functions

As in [25], $S_b$ can be rewritten as

$S_b = \sum_{i=1}^{r-1}\sum_{j=i+1}^{r}\frac{n_i n_j}{n}(m_i - m_j)(m_i - m_j)^T$.

From this formulation it can be observed that classical LDA maximizes the Euclidean distance between the class means, and that all class pairs receive the same weight irrespective of their separability in the original space. As a result, the transformation preserves the distances of already well-separated classes and can cause a large overlap of neighboring classes in the transformed space. In fact, classes that are closer together are more likely to be confused and should therefore be weighted more heavily. Two similarly motivated solutions to this problem have been proposed: weighted pairwise Fisher criteria [25] and fractional-step LDA [20]; fractional-step LDA is iterative and very time-consuming. In this section, in order to improve classification accuracy, we study weighted versions of the generalized LDA methods reviewed in Section 2. We define a weighted between-class scatter matrix as

$\hat S_b = \sum_{i=1}^{r-1}\sum_{j=i+1}^{r}\frac{n_i n_j}{n} w(d_{ij})(m_i - m_j)(m_i - m_j)^T$, (10)

where $d_{ij} = \|m_i - m_j\|$ is the Euclidean distance between the means of classes $i$ and $j$ and $w(d_{ij})$ is a weighting function. Apparently the weighted between-class scatter matrix $\hat S_b$ degenerates to the conventional between-class scatter matrix $S_b$ if the weighting function in (10) gives a constant weight. If the between-class scatter matrix $S_b$ is replaced by the weighted between-class scatter matrix $\hat S_b$ in the algorithms obtained in Section 2, we get the weighted generalized LDA algorithms.

3.1 Weighted generalized LDA

With Algorithms 2.1 and 2.2 we can derive the weighted PILDA and weighted LDA/GSVD algorithms.

Algorithm 3.1. Weighted PILDA
1. Compute the EVD of $S_t$: $S_t = [U_{t1}, U_{t2}]\begin{pmatrix}\Sigma_{t1} & 0\\ 0 & 0\end{pmatrix}[U_{t1}, U_{t2}]^T$.
2. Compute the weighted between-class scatter matrix $\hat S_b$.
3. Compute $V$ from the EVD of $\check S_b = \Sigma_{t1}^{-1/2}U_{t1}^T\hat S_b U_{t1}\Sigma_{t1}^{-1/2}$: $\check S_b = V\hat\Gamma_b^T\hat\Gamma_b V^T$.
4. Assign the first $r-1$ columns of $U_{t1}\Sigma_{t1}^{-1/2}V$ to $\hat X_\delta$.
5. $\hat G = \hat X_\delta M$, where $M$ is any nonsingular matrix.

Algorithm 3.2. Weighted LDA/GSVD
1. Compute the EVD of $S_t$: $S_t = [U_{t1}, U_{t2}]\begin{pmatrix}\Sigma_{t1} & 0\\ 0 & 0\end{pmatrix}[U_{t1}, U_{t2}]^T$.
2. Compute the weighted between-class scatter matrix $\hat S_b$.
3. Compute $V$ from the EVD of $\check S_b = \Sigma_{t1}^{-1/2}U_{t1}^T\hat S_b U_{t1}\Sigma_{t1}^{-1/2}$: $\check S_b = V\hat\Gamma_b^T\hat\Gamma_b V^T$.
4. Assign the first $r-1$ columns of $U_{t1}\Sigma_{t1}^{-1/2}V$ to $\hat G_h$.

From Algorithms 3.1 and 3.2 we can see that weighted LDA/GSVD is a special case of weighted PILDA with $M$ being the identity matrix. We will discuss the effect of the choice of $M$ on classification accuracy in the next section. Similarly we can obtain weighted NLDA and weighted RSLDA.

Algorithm 3.3. Weighted NLDA
1. Compute the EVD of $S_w$: $S_w = [U_{w1}, U_{w2}]\begin{pmatrix}\Sigma_{w1} & 0\\ 0 & 0\end{pmatrix}[U_{w1}, U_{w2}]^T$.
2. Compute the weighted between-class scatter matrix $\hat S_b$.
3. Compute the EVD of $\tilde S_b = U_{w2}U_{w2}^T\hat S_b U_{w2}U_{w2}^T$: $\tilde S_b = [\tilde U_{b1}, \tilde U_{b2}]\begin{pmatrix}\tilde\Sigma_{b1} & 0\\ 0 & 0\end{pmatrix}[\tilde U_{b1}, \tilde U_{b2}]^T$.
4. $\hat G_e = U_{w2}U_{w2}^T\tilde U_{b1}$.

Algorithm 3.4. Weighted RSLDA
1. Compute the EVD of $S_w$: $S_w = [U_{w1}, U_{w2}]\begin{pmatrix}\Sigma_{w1} & 0\\ 0 & 0\end{pmatrix}[U_{w1}, U_{w2}]^T$.
2. Compute the weighted between-class scatter matrix $\hat S_b$.
3. Compute the EVD of $\tilde S_b = \Sigma_{w1}^{-1/2}U_{w1}^T\hat S_b U_{w1}\Sigma_{w1}^{-1/2}$: $\tilde S_b = [\tilde U_{b1}, \tilde U_{b2}]\begin{pmatrix}\tilde\Sigma_{b1} & 0\\ 0 & 0\end{pmatrix}[\tilde U_{b1}, \tilde U_{b2}]^T$.
4. $\hat G_y = U_{w1}\Sigma_{w1}^{-1/2}\tilde U_{b1}$.
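Equation (10) is straightforward to implement, and the weighted algorithms then reuse the unweighted machinery with $S_b$ replaced by $\hat S_b$. The sketch below assumes the data layout and the nlda/rslda helpers from the earlier snippets; the names are illustrative, not the paper's.

```python
import numpy as np

def weighted_Sb(A, labels, w):
    """Weighted between-class scatter matrix of Eq. (10).

    A has one sample per column, `labels` one class index per column and
    `w` is a weighting function of the distance between class means.
    """
    labels = np.asarray(labels)
    classes = np.unique(labels)
    means = [A[:, labels == k].mean(axis=1, keepdims=True) for k in classes]
    sizes = [int(np.sum(labels == k)) for k in classes]
    n = A.shape[1]
    Sb_hat = np.zeros((A.shape[0], A.shape[0]))
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            diff = means[i] - means[j]
            d_ij = float(np.linalg.norm(diff))
            Sb_hat += sizes[i] * sizes[j] / n * w(d_ij) * (diff @ diff.T)
    return Sb_hat

# Weighted NLDA / weighted RSLDA (Algorithms 3.3 and 3.4) then amount to
# running the earlier helpers on the weighted scatter matrix, e.g.:
#   G_e_hat = nlda(S_w, weighted_Sb(A, labels, w))
#   G_y_hat = rslda(S_w, weighted_Sb(A, labels, w))
```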

3.2 Weighting functions

It can be seen that weighting functions have a close relationship with classification accuracy: different weighting functions produce different classification errors, and selecting a suitable weighting function can increase classification accuracy. In this paper we consider the four weighted schemes, Algorithms 3.1-3.4, with five weighting functions for each weighted scheme. We apply the Euclidean distance $d_{ij} = \|m_i - m_j\|$ between the means of classes $i$ and $j$ in the weighting functions $w(d_{ij})$. A weighting function is generally a monotonically decreasing function, because classes that are closer to one another are more likely to be confused and should be given a greater weight. We first apply two special cases of the weighting function $w(d_{ij}) = (d_{ij})^{-p}$ proposed by Lotlikar et al. in [20], with $p = 1$ and $p = 2$, and then an improved version of the weighting function $w(d_{ij}) = \frac{1}{2d_{ij}^2}\mathrm{erf}\!\left(\frac{d_{ij}}{2\sqrt{2}}\right)$ presented by Loog in [25], where the Mahalanobis distance is replaced by the Euclidean distance. In addition, according to the features of the weighting functions mentioned above, we present two new weighting functions. The five functions are listed below:

$w_1$: $w(d_{ij}) = (d_{ij})^{-2}$
$w_2$: $w(d_{ij}) = \frac{1}{2d_{ij}^2}\mathrm{erf}\!\left(\frac{d_{ij}}{2\sqrt{2}}\right)$
$w_3$: $w(d_{ij}) = (d_{ij})^{-1}$
$w_4$: $w(d_{ij}) = e^{1/d_{ij}}$
$w_5$: $w(d_{ij}) = 1/e^{d_{ij}}$

4 Experiments and analysis

In this section, in order to demonstrate the effectiveness of the proposed methods and to illustrate the effect of the weighting functions and of the nonsingular matrix $M$ in PILDA and weighted PILDA on classification accuracy, we conduct a series of experiments on five different data sets drawn from the AT&T face database and the Yale face database. There are ten different images of 40 distinct subjects in the AT&T (also called ORL) face image database. For some subjects the images were taken at different times, with varying lighting, facial details and facial expressions. Each image has 256 grey levels per pixel. The Yale face database contains 165 face images of 15 individuals. There are 11 images per subject, taken under the following different facial expressions or configurations: center-light, wearing glasses, happy, left-light, wearing no glasses, normal, right-light, sad, sleepy, surprised and wink. In our experiments the images are cropped and the grey level values of all images are rescaled to [0, 1]. Some images of one person are shown in Figure 1.

Figure 1: Images of one person in Yale.

Data sets 1-3 are from the Yale face database. Data set 1 uses the 201st to the 600th dimension; it contains 15 classes and 165 examples with 400 dimensions. Data set 2 uses the 401st to the 900th dimension; it contains 15 classes and 165 examples with 500 dimensions. Data set 3 uses the 800th to the 1024th dimension; it contains 15 classes and 165 examples with 225 dimensions. Data sets 4-5 are from the AT&T face database. Data set 4 uses the first 150 samples and the first 350 dimensions; it contains 15 classes and 150 examples with 350 dimensions. Data set 5 uses the first 150 samples and the 333rd to the 665th dimension; it contains 15 classes and 150 examples with 333 dimensions. For all five data sets the number of data samples is smaller than the dimension of the data space, so all scatter matrices are singular. In the following experiments the KNN algorithm with K = 7 is used as the classifier for all data sets; a sketch combining the weighted between-class scatter matrix, the weighting functions above and this 7-NN evaluation protocol is given below.
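The five weighting functions and the evaluation protocol (7-NN with 5-fold cross-validation) can be sketched as follows; SciPy and scikit-learn are assumed to be available, and the exact forms of $w_4$ and $w_5$ follow the reconstruction given above.

```python
import numpy as np
from scipy.special import erf
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# The five weighting functions of Subsection 3.2, as functions of d_ij > 0.
WEIGHTS = {
    "w1": lambda d: d ** -2.0,
    "w2": lambda d: erf(d / (2.0 * np.sqrt(2.0))) / (2.0 * d ** 2),
    "w3": lambda d: d ** -1.0,
    "w4": lambda d: np.exp(1.0 / d),
    "w5": lambda d: np.exp(-d),   # underflows to 0 for large d_ij (data sets 4-5)
}

def misclassification_rate(A, labels, G, k=7, folds=5):
    """Mean cross-validated misclassification rate (%) of k-NN on the
    data projected by the transformation G."""
    Y = (G.T @ A).T                                   # one row per projected sample
    knn = KNeighborsClassifier(n_neighbors=k)
    accuracy = cross_val_score(knn, Y, labels, cv=folds).mean()
    return 100.0 * (1.0 - accuracy)
```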
For each method we applied 5-fold cross-validation to compute the misclassification rate, and experiments were repeated 5 times to obtain the mean misclassification rate.

4.1 Effect of the matrix M

In this subsection we study the effect of the matrix $M$ in PILDA on the classification accuracy for the five data sets. We randomly generate five nonsingular matrices $M_1, \ldots, M_5$ for $M$ and compute the misclassification rates using the optimal transformation matrices produced by PILDA and the 7-NN classifier. The experimental results are listed in Table 1, where $w_0$ denotes the constant weighting function $w(d_{ij}) = 1$ (the unweighted case).

Table 1: Misclassification rate (%) on the five data sets, for W-PILDA with the five random matrices $M_1$-$M_5$, W-LDA/GSVD, W-NLDA and W-RSLDA, under the weighting functions $w_0$ (unweighted) and $w_1$-$w_5$, grouped by data sets 1-5 (the numerical entries are not recoverable from the extracted text).

From Table 1 we can see that matrix $M_3$ produces the best classification accuracy for data set 1; none of the accuracies is higher than that of matrix $M_4$ for data set 2; matrix $M_4$ produces the best classification accuracy among the PILDA variants for data set 3, but it is not higher than that of RSLDA; matrix $M_1$ produces the best classification accuracy among the PILDA variants for data set 4, but it is not higher than that of RSLDA; and matrix $M_5$ produces the best classification accuracy among the PILDA variants for data set 5, but it is not higher than those of NLDA and RSLDA.

4.2 Effect of weighting functions

In this subsection we study the effect of the five weighting functions given in Subsection 3.2 on the classification accuracy for the five data sets. The experimental results are listed in Table 1.

Figure 2: The misclassification rates (%) for Data set 1.

Figure 3: The misclassification rates (%) for Data set 2.

From Figure 2 we can see that the weighting function $w_2$ is better than the other weighting functions for data set 1; in particular, we obtain the best classification accuracy with $M_3$ and $w_2$. Figure 3 shows that, for data set 2, the weighting function $w_3$ produces good overall results for PILDA, while the weighting function $w_2$ produces good overall results for LDA/GSVD, NLDA and RSLDA. For data set 3, the weighting function $w_4$ produces good overall results for PILDA and LDA/GSVD, the weighting function $w_2$ produces a good overall result for NLDA, and the weighting functions $w_3$ and $w_5$ produce the best results for RSLDA. For data sets 4 and 5 the weighting function $w_5$ cannot be applied, and the weighting functions $w_1$ and $w_2$ produce the same results. For data set 4 and $M_1$, $M_2$, the best weighting functions for LDA/GSVD, NLDA and RSLDA are $w_1$ and $w_2$; the weighting function $w_3$ produces the best result for $M_3$ and $M_4$; the best weighting function for $M_5$ is $w_4$. For data set 5 and $M_1$, $M_3$, $M_4$, the best weighting functions for LDA/GSVD and RSLDA are $w_1$ and $w_2$; the weighting function $w_3$ produces the best results for $M_2$, $M_5$ and NLDA.

5 Conclusion

In this paper, based on pseudo-inverse LDA, LDA/GSVD, null space LDA and range space LDA, we propose weighted pseudo-inverse LDA, weighted LDA/GSVD, weighted null space LDA and weighted range space LDA. Not only can these methods deal with the singularity problem caused by the undersampled problem, they also make the criteria more directly related to classification error. In order to demonstrate the effectiveness of the proposed methods and to illustrate the effect of the weighting functions and of the nonsingular matrix $M$ in PILDA and weighted PILDA on classification accuracy, we conducted a series of experiments on five different data sets from the AT&T face database and the Yale face database. The results show that different weighting functions and different nonsingular matrices $M$ affect the classification accuracy of the proposed methods.

Acknowledgements: This work is supported by the National Natural Science Foundation of China ( ).

References:

[1] K. Fukunaga, Introduction to Statistical Pattern Recognition, second ed., Academic Press, 1990.
[2] M. W. Berry, S. T. Dumais and G. W. O'Brien, Using linear algebra for intelligent information retrieval, SIAM Review 37, 1995.
[3] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer and R. Harshman, Indexing by latent semantic analysis, Journal of the American Society for Information Science 41, 1990.
[4] P. N. Belhumeur, J. P. Hespanha and D. J. Kriegman, Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7), 1997.
[5] D. L. Swets and J. Weng, Using discriminant eigenfeatures for image retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (8), 1996.
[6] M. A. Turk and A. P. Pentland, Face recognition using Eigenfaces, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1991.
[7] S. Dudoit, J. Fridlyand and T. P. Speed, Comparison of discrimination methods for the classification of tumors using gene expression data, Journal of the American Statistical Association 97 (457), 2002.
[8] B. Cao, D. Shen, J. Sun, Q. Yang and Z. Chen, Feature selection in a kernel space, in: International Conference on Machine Learning (ICML), Oregon, USA, June 20-24.
[9] K. Fukunaga, Introduction to Statistical Pattern Classification, Academic Press, San Diego, California, USA.
[10] J. Ye, R. Janardan, C. H. Park and H. Park, An optimization criterion for generalized discriminant analysis on undersampled problems, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (8), 2004.
[11] J. Ye, Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems, Journal of Machine Learning Research 6, 2005.
[12] C. H. Park and H. Park, A comparison of generalized linear discriminant analysis algorithms, Pattern Recognition 41, 2008.
[13] P. Howland and H. Park, Generalizing discriminant analysis using the generalized singular value decomposition, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (8), 2004.
[14] P. Howland, M. Jeon and H. Park, Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition, SIAM J. Matrix Anal. Appl. 25 (1), 2003.
[15] L. Chen, H. M. Liao, M. Ko, J. Lin and G. Yu, A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition 33, 2000.
[16] H. Yu and J. Yang, A direct LDA algorithm for high-dimensional data with application to face recognition, Pattern Recognition 34, 2001.
[17] J. Ye and T. Xiong, Computational and theoretical analysis of null space and orthogonal linear discriminant analysis, Journal of Machine Learning Research 7, 2006.
[18] J. Ye, R. Janardan, Q. Li and H. Park, Feature extraction via generalized uncorrelated linear discriminant analysis, in: Proc. International Conference on Machine Learning, 2004.
[19] J. H. Friedman, Regularized discriminant analysis, Journal of the American Statistical Association 84 (405), 1989.
[20] R. Lotlikar and R. Kothari, Fractional-step dimensionality reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (6), 2000.
[21] Y. Liang et al., Uncorrelated linear discriminant analysis based on weighted pairwise Fisher criterion, Pattern Recognition 40, 2007.
[22] R. O. Duda, P. E. Hart and D. Stork, Pattern Classification, Wiley.
[23] G. H. Golub and C. F. Van Loan, Matrix Computations, third ed., Johns Hopkins Univ. Press.
[24] P. Howland and H. Park, Equivalence of several two-stage methods for linear discriminant analysis, in: Proceedings of the Fourth SIAM International Conference on Data Mining, April 2004.
[25] M. Loog, R. P. W. Duin and R. Haeb-Umbach, Multiclass linear dimension reduction by weighted pairwise Fisher criteria, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (7), 2001.
[26] Jian-qiang Gao, Li-ya Fan and Li-zhong Xu, Median null(S_w)-based method for face feature recognition, Applied Mathematics and Computation.
[27] Lishan Qiao, Songcan Chen and Xiaoyang Tan, Sparsity preserving projections with applications to face recognition, Pattern Recognition 43 (1), 2010.


EIGENVALE PROBLEMS AND THE SVD. [5.1 TO 5.3 & 7.4] EIGENVALE PROBLEMS AND THE SVD. [5.1 TO 5.3 & 7.4] Eigenvalue Problems. Introduction Let A an n n real nonsymmetric matrix. The eigenvalue problem: Au = λu λ C : eigenvalue u C n : eigenvector Example:

More information

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) Principal Component Analysis (PCA) Additional reading can be found from non-assessed exercises (week 8) in this course unit teaching page. Textbooks: Sect. 6.3 in [1] and Ch. 12 in [2] Outline Introduction

More information

Foundations of Computer Vision

Foundations of Computer Vision Foundations of Computer Vision Wesley. E. Snyder North Carolina State University Hairong Qi University of Tennessee, Knoxville Last Edited February 8, 2017 1 3.2. A BRIEF REVIEW OF LINEAR ALGEBRA Apply

More information

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas Dimensionality Reduction: PCA Nicholas Ruozzi University of Texas at Dallas Eigenvalues λ is an eigenvalue of a matrix A R n n if the linear system Ax = λx has at least one non-zero solution If Ax = λx

More information

Linear Discriminant Analysis Using Rotational Invariant L 1 Norm

Linear Discriminant Analysis Using Rotational Invariant L 1 Norm Linear Discriminant Analysis Using Rotational Invariant L 1 Norm Xi Li 1a, Weiming Hu 2a, Hanzi Wang 3b, Zhongfei Zhang 4c a National Laboratory of Pattern Recognition, CASIA, Beijing, China b University

More information

Properties of Matrices and Operations on Matrices

Properties of Matrices and Operations on Matrices Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,

More information

Hierarchical Linear Discriminant Analysis for Beamforming

Hierarchical Linear Discriminant Analysis for Beamforming Abstract Hierarchical Linear Discriminant Analysis for Beamforming Jaegul Choo, Barry L. Drake, and Haesun Park This paper demonstrates the applicability of the recently proposed supervised dimension reduction,

More information

Image Analysis & Retrieval Lec 14 - Eigenface & Fisherface

Image Analysis & Retrieval Lec 14 - Eigenface & Fisherface CS/EE 5590 / ENG 401 Special Topics, Spring 2018 Image Analysis & Retrieval Lec 14 - Eigenface & Fisherface Zhu Li Dept of CSEE, UMKC http://l.web.umkc.edu/lizhu Office Hour: Tue/Thr 2:30-4pm@FH560E, Contact:

More information

Manning & Schuetze, FSNLP, (c)

Manning & Schuetze, FSNLP, (c) page 554 554 15 Topics in Information Retrieval co-occurrence Latent Semantic Indexing Term 1 Term 2 Term 3 Term 4 Query user interface Document 1 user interface HCI interaction Document 2 HCI interaction

More information

Adaptive Kernel Principal Component Analysis With Unsupervised Learning of Kernels

Adaptive Kernel Principal Component Analysis With Unsupervised Learning of Kernels Adaptive Kernel Principal Component Analysis With Unsupervised Learning of Kernels Daoqiang Zhang Zhi-Hua Zhou National Laboratory for Novel Software Technology Nanjing University, Nanjing 2193, China

More information

EIGENVALUES AND SINGULAR VALUE DECOMPOSITION

EIGENVALUES AND SINGULAR VALUE DECOMPOSITION APPENDIX B EIGENVALUES AND SINGULAR VALUE DECOMPOSITION B.1 LINEAR EQUATIONS AND INVERSES Problems of linear estimation can be written in terms of a linear matrix equation whose solution provides the required

More information

LEARNING from multiple feature sets, which is also

LEARNING from multiple feature sets, which is also Multi-view Uncorrelated Linear Discriminant Analysis with Applications to Handwritten Digit Recognition Mo Yang and Shiliang Sun Abstract Learning from multiple feature sets which is also called multi-view

More information

Mathematical foundations - linear algebra

Mathematical foundations - linear algebra Mathematical foundations - linear algebra Andrea Passerini passerini@disi.unitn.it Machine Learning Vector space Definition (over reals) A set X is called a vector space over IR if addition and scalar

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 12 Jan-Willem van de Meent (credit: Yijun Zhao, Percy Liang) DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Linear Dimensionality

More information

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR 1. Definition Existence Theorem 1. Assume that A R m n. Then there exist orthogonal matrices U R m m V R n n, values σ 1 σ 2... σ p 0 with p = min{m, n},

More information

Discriminant Kernels based Support Vector Machine

Discriminant Kernels based Support Vector Machine Discriminant Kernels based Support Vector Machine Akinori Hidaka Tokyo Denki University Takio Kurita Hiroshima University Abstract Recently the kernel discriminant analysis (KDA) has been successfully

More information

Linear Subspace Models

Linear Subspace Models Linear Subspace Models Goal: Explore linear models of a data set. Motivation: A central question in vision concerns how we represent a collection of data vectors. The data vectors may be rasterized images,

More information

Multiple Similarities Based Kernel Subspace Learning for Image Classification

Multiple Similarities Based Kernel Subspace Learning for Image Classification Multiple Similarities Based Kernel Subspace Learning for Image Classification Wang Yan, Qingshan Liu, Hanqing Lu, and Songde Ma National Laboratory of Pattern Recognition, Institute of Automation, Chinese

More information

Template-based Recognition of Static Sitting Postures

Template-based Recognition of Static Sitting Postures Template-based Recognition of Static Sitting Postures Manli Zhu 1, Aleix M. Martínez 1 and Hong Z. Tan 2 1 Dept. of Electrical Engineering The Ohio State University 2 School of Electrical and Computer

More information

Background Mathematics (2/2) 1. David Barber

Background Mathematics (2/2) 1. David Barber Background Mathematics (2/2) 1 David Barber University College London Modified by Samson Cheung (sccheung@ieee.org) 1 These slides accompany the book Bayesian Reasoning and Machine Learning. The book and

More information