Kernel Uncorrelated and Orthogonal Discriminant Analysis: A Unified Approach


Tao Xiong, Department of ECE, University of Minnesota, Minneapolis (txiong@ece.umn.edu); Vladimir Cherkassky, Department of ECE, University of Minnesota, Minneapolis (cherkass@ece.umn.edu); Jieping Ye, Department of CSE, Arizona State University, Tempe (jieping.ye@asu.edu)

Abstract

Several kernel algorithms have recently been proposed for nonlinear discriminant analysis. However, these methods mainly address the singularity problem in the high-dimensional feature space; less attention has been paid to the properties of the resulting discriminant vectors and feature vectors in the reduced-dimensional space. In this paper, we present a new formulation for kernel discriminant analysis. The proposed formulation includes, as special cases, kernel uncorrelated discriminant analysis (KUDA) and kernel orthogonal discriminant analysis (KODA). The feature vectors of KUDA are uncorrelated, while the discriminant vectors of KODA are orthogonal to each other in the feature space. We present theoretical derivations of the proposed KUDA and KODA algorithms. The experimental results show that both KUDA and KODA are very competitive with other nonlinear discriminant algorithms in terms of classification accuracy.

1. Introduction

In supervised learning, linear discriminant analysis (LDA) has become a classical statistical approach for dimension reduction and classification [3, 5]. LDA computes a linear transformation that simultaneously maximizes the between-class scatter and minimizes the within-class scatter, thus achieving maximum discrimination. Although conceptually simple, classical LDA requires the total scatter matrix to be nonsingular. In situations where the number of data points is smaller than the dimension of the data space, all scatter matrices are singular and we face the undersampled, or singularity, problem [8]. Moreover, classical LDA lacks the capacity to capture nonlinearly clustered structure in the data.

Nonlinear extensions of LDA use the kernel trick originally introduced in support vector machines (SVMs) [15]. The main idea of kernel-based methods is to map the data from the original space to a feature space in which inner products can be computed by a kernel function, without requiring explicit knowledge of the nonlinear mapping. However, such a nonlinear mapping must be applied to LDA with caution: the dimension of the feature space is often much larger than that of the original data space, so any singularity problem in the original data space becomes more severe. Several kernel LDA (KDA) algorithms have been proposed to address the singularity problem. Mika et al. [10] proposed the kernel Fisher discriminant algorithm, which regularizes the within-class scatter matrix. Xiong et al. [16] presented an efficient KDA algorithm based on a two-stage analysis framework and QR decomposition. Yang et al. [7] proposed the kernel principal component analysis (KPCA) [14] plus LDA framework, in which discriminant vectors are extracted from double discriminant subspaces. More recently, a KDA algorithm using the generalized singular value decomposition was proposed by Park and Park [11].

In many applications, it is desirable to remove the redundancy among the feature coordinates extracted in the reduced-dimensional space and among the discriminant vectors. Motivated by the goal of extracting feature vectors with statistically uncorrelated attributes, uncorrelated LDA (ULDA) was recently proposed in [6]. A direct nonlinear extension using kernel functions is presented in [9]. However, that algorithm involves solving a sequence of generalized eigenvalue problems and is computationally expensive for large, high-dimensional datasets.

Moreover, it does not address the singularity problem. Instead of adding constraints to the extracted coordinate vectors, Zheng et al. [18] extend the well-known Foley-Sammon optimal discriminant vectors (FSODV) using a kernel approach, which they call KFSODV. The resulting discriminant vectors are orthogonal to each other in the feature space. The KFSODV algorithm presented in [18] has the same problem of involving several generalized eigenvalue problems and may be computationally infeasible for large data sets. Recently, an efficient algorithm for orthogonal linear discriminant analysis (OLDA) was proposed by Ye in [17].

In this paper, we propose a unified approach to general nonlinear discriminant analysis. We present a new formulation for kernel discriminant analysis that includes, as special cases, kernel uncorrelated discriminant analysis (KUDA) and kernel orthogonal discriminant analysis (KODA). As with their linear counterparts, the feature coordinates of KUDA are uncorrelated, while the discriminant vectors of KODA are orthogonal to each other in the feature space. We show how KUDA and KODA can be computed efficiently under the same framework; more specifically, the solution to KODA can be obtained with little extra cost from the solution to the KUDA problem. The KUDA and KODA algorithms proposed in this paper provide a simple and efficient way of computing uncorrelated and orthogonal transformations in the KDA framework. We compare the proposed algorithms experimentally with other commonly used linear and kernel discriminant analysis algorithms.

This paper is organized as follows. In Section 2, we present a new framework for kernel discriminant analysis. Section 3 derives the KUDA and KODA algorithms. Experimental results are presented in Section 4. Finally, we conclude in Section 5.

2. A New Framework for Nonlinear Kernel Discriminant Analysis

Assume that X is a data set of n data vectors in an m-dimensional space, i.e.,

X = [x_1, ..., x_n] = [X_1, ..., X_k] ∈ R^{m×n}, (1)

where the data is clustered into k classes and each block X_i contains the n_i data vectors whose column indices belong to N_i. To extend LDA to the nonlinear case, X is first mapped into a feature space F ⊆ R^M through a nonlinear mapping function Φ. Without knowing the nonlinear mapping Φ or the feature space F explicitly, we can work in F using kernel functions, as long as the problem formulation involves only inner products between data points in F and not the data points themselves. This is based on the fact that for any kernel function κ satisfying Mercer's condition, there exists a mapping Φ such that ⟨Φ(x), Φ(y)⟩ = κ(x, y), where ⟨·, ·⟩ is the inner product in the feature space.

Let S_B^Φ, S_W^Φ and S_T^Φ denote the between-class, within-class and total scatter matrices in the feature space, respectively. Then

S_B^Φ = Σ_{i=1}^k n_i (c_i^Φ − c^Φ)(c_i^Φ − c^Φ)^T,
S_W^Φ = Σ_{i=1}^k Σ_{j∈N_i} (Φ(x_j) − c_i^Φ)(Φ(x_j) − c_i^Φ)^T,
S_T^Φ = Σ_{i=1}^k Σ_{j∈N_i} (Φ(x_j) − c^Φ)(Φ(x_j) − c^Φ)^T,

where c_i^Φ and c^Φ are the centroid of the i-th class and the global centroid, respectively, in the feature space. To reduce the computation, instead of working with the scatter matrices S_B^Φ, S_W^Φ and S_T^Φ directly, we express them as

S_B^Φ = H_B H_B^T,  S_W^Φ = H_W H_W^T,  S_T^Φ = H_T H_T^T, (2)

where

H_B = [√n_1 (c_1^Φ − c^Φ), ..., √n_k (c_k^Φ − c^Φ)], (3)
H_W = [Φ(X_1) − c_1^Φ e_1^T, ..., Φ(X_k) − c_k^Φ e_k^T], (4)
H_T = [Φ(X_1) − c^Φ e_1^T, ..., Φ(X_k) − c^Φ e_k^T], (5)

e_i = [1, ..., 1]^T ∈ R^{n_i×1} for i = 1, ..., k, and Φ(X_i) ∈ R^{M×n_i} denotes the result of applying Φ to each column of X_i.
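Since every quantity used below depends on the data only through inner products in F, the only object that ever has to be computed explicitly is the kernel (Gram) matrix K with K(i, j) = κ(x_i, x_j). As a minimal illustration (not code from the paper), the following Python/NumPy sketch builds K for the Gaussian RBF kernel κ(x, y) = exp(−‖x − y‖²/(2σ)) used later in the experiments; the function name and the row-per-sample convention are our own.

```python
import numpy as np

def rbf_kernel_matrix(X, Y=None, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 * sigma)).

    X: (n, m) array whose rows are data vectors. Y defaults to X, which gives
    the training kernel matrix K of Theorem 2.1 below.
    """
    if Y is None:
        Y = X
    # Squared Euclidean distances via ||x||^2 + ||y||^2 - 2 <x, y>
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma))
```

For a training matrix with rows x_1, ..., x_n, the call K = rbf_kernel_matrix(X_train, sigma=2.0) yields exactly the matrix K referred to throughout the rest of the paper.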
After mapping the data into the feature space, LDA can be computed by finding in F a transformation matrix Ψ = [ψ_1, ..., ψ_{k−1}] ∈ R^{M×(k−1)} whose columns are the generalized eigenvectors corresponding to the largest k−1 eigenvalues of

S_B^Φ ψ = λ S_T^Φ ψ. (6)

To solve the problem in (6) without knowing the mapping function Φ and the feature space F explicitly, we need the following lemma.

Lemma 2.1. The discriminant vectors that solve the generalized eigenvalue problem (6) lie in the space spanned by the training points in the feature space, i.e.,

ψ = Σ_{i=1}^n γ_i Φ(x_i),   γ = [γ_1, ..., γ_n]^T. (7)

Proof: Any vector ψ in the feature space can be represented as

ψ = ψ_s + ψ_⊥, (8)

where ψ_s ∈ span{Φ(X)}, ψ_⊥ ∈ span{Φ(X)}^⊥, and ⊥ denotes the orthogonal complement. For any ψ_⊥ ⊥ Φ(X), we have S_B^Φ ψ_⊥ = 0 and S_T^Φ ψ_⊥ = 0. Therefore, for any vector ψ that satisfies (6),

S_B^Φ ψ_s = S_B^Φ ψ = λ S_T^Φ ψ = λ S_T^Φ ψ_s. (9)

Hence, without losing any information, we can restrict the solution of (6) to span{Φ(X)}, which completes the proof.

Next we show how the generalized eigenvalue problem (6) can be expressed through kernel functions.

Theorem 2.1. With ψ expressed as in (7) and K the kernel matrix, i.e., K(i, j) = κ(x_i, x_j), the generalized eigenvalue problem in (6) is equivalent to

S_B^K γ = λ S_T^K γ, (10)

where S_B^K and S_T^K are the between-class and total scatter matrices of K, with each column of K regarded as a data point in the n-dimensional space.

Proof: For any ξ = Σ_{i=1}^n α_i Φ(x_i) with α = [α_1, ..., α_n]^T, left-multiplying both sides of (6) by ξ^T gives

ξ^T S_B^Φ ψ = λ ξ^T S_T^Φ ψ. (11)

Substituting ξ = Φ(X)α and ψ = Φ(X)γ into (11) and using (2),

α^T Φ(X)^T H_B H_B^T Φ(X) γ = λ α^T Φ(X)^T H_T H_T^T Φ(X) γ. (12)

Since (12) holds for any α, it follows that

Φ(X)^T H_B H_B^T Φ(X) γ = λ Φ(X)^T H_T H_T^T Φ(X) γ, (13)

or equivalently, K_B K_B^T γ = λ K_T K_T^T γ, where K_B = [b_ij] with

b_ij = √n_j ( (1/n_j) Σ_{s∈N_j} κ(x_i, x_s) − (1/n) Σ_{s=1}^n κ(x_i, x_s) ), (14)

and K_T = [t_ij] with

t_ij = κ(x_i, x_j) − (1/n) Σ_{s=1}^n κ(x_i, x_s). (15)

The derivations of K_B and K_T follow naturally from (3), (5) and (7). It can be seen that K_B K_B^T and K_T K_T^T are exactly the between-class scatter matrix S_B^K and the total scatter matrix S_T^K of K if we view each column of K as a data point in the n-dimensional space. This completes the proof.

Similarly, the within-class scatter matrix of K can be shown to be K_W K_W^T, where K_W = [w_ij] and

w_ij = κ(x_i, x_j) − (1/n_l) Σ_{s∈N_l} κ(x_i, x_s), (16)

with l denoting the class of x_j.

In summary, kernel discriminant analysis finds a coefficient matrix Γ = [γ_1, ..., γ_{k−1}] ∈ R^{n×(k−1)} satisfying

K_B K_B^T Γ = K_T K_T^T Γ Λ, (17)

where Λ is a diagonal matrix containing the generalized eigenvalues. Solving the generalized eigenvalue problem in (17) is equivalent to finding a transformation matrix Γ that maximizes the criterion [5]

F(Γ) = trace( (Γ^T K_T K_T^T Γ)^{−1} (Γ^T K_B K_B^T Γ) ). (18)

However, K_T K_T^T is usually singular, which means the optimization criterion in (18) cannot be applied directly. In this paper, we adopt a new criterion, defined as

F_1(Γ) = trace( (Γ^T K_T K_T^T Γ)^+ (Γ^T K_B K_B^T Γ) ), (19)

where (Γ^T K_T K_T^T Γ)^+ denotes the pseudo-inverse of (Γ^T K_T K_T^T Γ). The optimal transformation matrix Γ is computed so that F_1(Γ) is maximized. A similar criterion was used in [17] for linear discriminant analysis; the use of the pseudo-inverse in discriminant analysis can also be found in [12]. Once Γ is computed, for any test vector y ∈ R^{m×1} in the original space, the reduced-dimensional representation of y is given by

[ψ_1, ..., ψ_{k−1}]^T Φ(y) = Γ^T [κ(x_1, y), ..., κ(x_n, y)]^T. (20)

In order to solve the optimization problem (19), we need the following theorem.

Theorem 2.2. Let Z ∈ R^{n×n} be a matrix that diagonalizes K_B K_B^T, K_W K_W^T and K_T K_T^T simultaneously, i.e.,

Z^T K_B K_B^T Z = D_b = diag(Σ_b, 0), (21)
Z^T K_W K_W^T Z = D_w = diag(Σ_w, 0), (22)
Z^T K_T K_T^T Z = D_t = diag(I_t, 0), (23)

where Σ_b and Σ_w are diagonal matrices, the diagonal elements of Σ_b are sorted in non-increasing order, and I_t is an identity matrix. Let r = rank(K_B K_B^T) and let Z_r be the matrix consisting of the first r columns of Z. Then Γ = Z_r M, for any nonsingular matrix M, maximizes F_1 defined in (19).
Proof: The proof follows the derivations in [17], where a similar result is obtained for generalized linear discriminant analysis.
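For concreteness, the following NumPy sketch (our own helper, not code from the paper) forms the factors K_B, K_W and K_T of Eqs. (14)–(16) from the kernel matrix K and an integer label vector; the kernel-space scatter matrices then follow as S_B^K = K_B K_B^T, S_W^K = K_W K_W^T and S_T^K = K_T K_T^T.

```python
import numpy as np

def kernel_scatter_factors(K, y):
    """Factors K_B (n x k), K_W (n x n), K_T (n x n) of Eqs. (14)-(16).

    K: (n, n) kernel matrix with K[i, j] = kappa(x_i, x_j).
    y: length-n array of integer class labels.
    """
    classes = np.unique(y)
    row_mean = K.mean(axis=1, keepdims=True)            # (1/n) sum_s kappa(x_i, x_s)

    # K_B: one column per class, b_ij = sqrt(n_j) * (class-j row mean - global row mean)
    cols = []
    for c in classes:
        idx = np.where(y == c)[0]
        cols.append(np.sqrt(len(idx)) * (K[:, idx].mean(axis=1, keepdims=True) - row_mean))
    K_B = np.hstack(cols)

    # K_W: from each column j, subtract the mean of the columns belonging to j's class
    K_W = K.copy()
    for c in classes:
        idx = np.where(y == c)[0]
        K_W[:, idx] -= K[:, idx].mean(axis=1, keepdims=True)

    # K_T: subtract the global row mean from every column
    K_T = K - row_mean
    return K_B, K_W, K_T
```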

Next, we present an algorithm to compute such a Z, i.e., one that diagonalizes K_B K_B^T, K_W K_W^T and K_T K_T^T simultaneously. Let K_T = U Σ V^T be the SVD of K_T, where U and V are orthogonal, Σ = diag(Σ_t, 0), Σ_t ∈ R^{t×t} is diagonal, and t = rank(K_T). Then

K_T K_T^T = U Σ Σ^T U^T = U diag(Σ_t^2, 0) U^T. (24)

Let U be partitioned as U = (U_1, U_2), where U_1 ∈ R^{n×t} and U_2 ∈ R^{n×(n−t)}. From the fact that

K_T K_T^T = K_B K_B^T + K_W K_W^T (25)

and U_2^T K_T K_T^T U_2 = 0, we have U_2^T K_B K_B^T U_2 = 0, U_2^T K_W K_W^T U_2 = 0, U_1^T K_B K_B^T U_2 = 0 and U_1^T K_W K_W^T U_2 = 0, since both K_B K_B^T and K_W K_W^T are positive semidefinite. Therefore

U^T K_B K_B^T U = diag(U_1^T K_B K_B^T U_1, 0), (26)
U^T K_W K_W^T U = diag(U_1^T K_W K_W^T U_1, 0). (27)

From (25)–(27), we have Σ_t^2 = U_1^T K_B K_B^T U_1 + U_1^T K_W K_W^T U_1. It follows that

I_t = Σ_t^{−1} U_1^T K_B K_B^T U_1 Σ_t^{−1} + Σ_t^{−1} U_1^T K_W K_W^T U_1 Σ_t^{−1} = B B^T + W W^T, (28)

where B = Σ_t^{−1} U_1^T K_B and W = Σ_t^{−1} U_1^T K_W. Let B = P Σ Q^T be the SVD of B, where P and Q are orthogonal and Σ is diagonal. Then

Σ_t^{−1} U_1^T K_B K_B^T U_1 Σ_t^{−1} = P Σ^2 P^T = P Σ_b P^T, (29)

where Σ_b = Σ^2 = diag(λ_1, ..., λ_t), with λ_1 ≥ ... ≥ λ_r > 0 = λ_{r+1} = ... = λ_t and r = rank(K_B K_B^T). It follows that I_t = Σ_b + P^T W W^T P. Hence P^T Σ_t^{−1} U_1^T K_W K_W^T U_1 Σ_t^{−1} P = I_t − Σ_b = Σ_w is also diagonal. Therefore

Z = U diag(Σ_t^{−1} P, I) (30)

diagonalizes K_B K_B^T, K_W K_W^T and K_T K_T^T as in (21)–(23), where I in (30) is an identity matrix of the appropriate size.
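The construction of Z in (24)–(30) maps directly onto two SVDs. The sketch below (NumPy; the tolerance-based rank decisions are our own choice, since the paper does not specify how rank is determined numerically) returns the first t columns of Z together with r = rank(K_B K_B^T); the matrix Z_r of Theorem 2.2 is then simply the first r of these columns.

```python
import numpy as np

def simultaneous_diagonalizer(K_B, K_T, tol=1e-10):
    """Z = U_1 Sigma_t^{-1} P as in Eq. (30), plus r = rank(K_B K_B^T)."""
    # Reduced SVD of K_T: keep only the t numerically nonzero singular values (Eq. (24))
    U, s, _ = np.linalg.svd(K_T, full_matrices=False)
    t = int((s > tol * s[0]).sum())
    U1, s_t = U[:, :t], s[:t]

    # B = Sigma_t^{-1} U_1^T K_B and its SVD B = P Sigma Q^T (Eqs. (28)-(29))
    B = (U1.T @ K_B) / s_t[:, None]
    P, sb, _ = np.linalg.svd(B, full_matrices=True)
    r = int((sb > tol).sum())

    # Z diagonalizes K_B K_B^T, K_W K_W^T and K_T K_T^T as in (21)-(23);
    # only the first t columns of the full Z in (30) are needed, and Z_r = Z[:, :r].
    Z = (U1 / s_t[None, :]) @ P
    return Z, r
```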
3. Kernel Uncorrelated and Orthogonal Discriminant Analysis

3.1. Kernel Uncorrelated Discriminant Analysis

Uncorrelated linear discriminant analysis (ULDA) was recently proposed in [6] to extract feature vectors having uncorrelated attributes. Statistically, a correlation describes the strength of an association between variables: when variables are associated, the value of one can be predicted, to some extent, from the value of the other. Uncorrelated features therefore contain minimum redundancy, which is desirable in many applications.

Let S_b, S_w and S_t denote the between-class, within-class and total scatter matrices in the original data space. The algorithm in [6] finds discriminant vectors successively that are S_t-orthogonal; recall that two vectors x and y are S_t-orthogonal if x^T S_t y = 0. The algorithm proceeds as follows: the i-th discriminant vector ψ_i of ULDA is the eigenvector corresponding to the maximum eigenvalue of the generalized eigenvalue problem

P_i S_b ψ_i = λ_i S_w ψ_i,

where P_1 = I, D_i = [ψ_1, ..., ψ_{i−1}], P_i = I − S_t D_i^T (D_i S_t S_w^{−1} S_t D_i^T)^{−1} D_i S_t S_w^{−1}, and I is the identity matrix. The limitation of the ULDA algorithm in [6] lies in the expensive computation of s generalized eigenvalue problems, where s is the number of optimal discriminant vectors produced by ULDA. The algorithm presented in this paper, besides other advantages, finds all the uncorrelated optimal discriminant vectors simultaneously in the feature space, thus saving a large amount of computation.

The key to ULDA is the S_t-orthogonality of the discriminant vectors. In the feature space, S_t-orthogonality becomes ψ_i^T S_T^Φ ψ_j = 0 for two different discriminant vectors ψ_i and ψ_j in F. Using (2), (5), (7) and the equivalence in Theorem 2.1, we obtain a constrained optimization problem for kernel uncorrelated discriminant analysis (KUDA):

Γ = arg max_{Γ^T K_T K_T^T Γ = I_r} F_1(Γ), (31)

where F_1(Γ) is defined in (19) and I_r is the identity matrix of size r × r. In Theorem 2.2 we proved that, for any nonsingular matrix M, Γ = Z_r M maximizes the F_1 criterion. By choosing a specific M, we can solve the optimization problem in (31).

Corollary 3.1. Γ = Z_r, i.e., setting M = I_r, solves the optimization problem in (31).

Proof: From the derivation of Z in the last section, and from equation (23) in particular, Z_r^T K_T K_T^T Z_r = I_r. Since Z_r also maximizes F_1, Z_r is an optimal solution to (31). This completes the proof.

The pseudo-code for KUDA is given in Algorithm 1.

Algorithm 1 (KUDA):
Input: data matrix X, test data y.
Output: reduced representation of y.
1. Construct the kernel matrix K.
2. Form the three matrices K_B, K_W, K_T as in Eqs. (14)–(16).
3. Compute the reduced SVD of K_T as K_T = U_1 Σ_t V_1^T.
4. B ← Σ_t^{−1} U_1^T K_B.
5. Compute the SVD of B as B = P Σ Q^T; r ← rank(B).
6. Z ← U_1 Σ_t^{−1} P.
7. Γ ← Z_r.
8. Reduced representation of y: Γ^T [κ(x_1, y), ..., κ(x_n, y)]^T.
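In code, Algorithm 1 amounts to little more than chaining the helper functions sketched earlier; the following is an illustrative sketch under those same assumptions (NumPy, our own function names, labels passed as an integer vector), not the authors' implementation.

```python
import numpy as np

def kuda_transform(K, y, tol=1e-10):
    """KUDA coefficient matrix Gamma = Z_r (Corollary 3.1)."""
    K_B, K_W, K_T = kernel_scatter_factors(K, y)        # Eqs. (14)-(16)
    Z, r = simultaneous_diagonalizer(K_B, K_T, tol)     # Eqs. (24)-(30)
    return Z[:, :r]

def reduced_representation(Gamma, X_train, y_test, sigma=1.0):
    """Gamma^T [kappa(x_1, y), ..., kappa(x_n, y)]^T as in Eq. (20)."""
    k_vec = rbf_kernel_matrix(X_train, y_test[None, :], sigma)   # shape (n, 1)
    return (Gamma.T @ k_vec).ravel()
```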

3.2. Kernel Orthogonal Discriminant Analysis

LDA with orthogonal discriminant vectors is a natural alternative to ULDA. In the current literature, Foley-Sammon LDA (FSLDA) is well known for its orthogonal discriminant vectors [4, 2]. Specifically, assuming that the i vectors ψ_1, ψ_2, ..., ψ_i have been obtained, the (i+1)-th vector ψ_{i+1} of FSLDA is the one that maximizes the criterion function (18) subject to the constraints ψ_{i+1}^T ψ_j = 0, j = 1, ..., i. It has been proved that ψ_{i+1} is the eigenvector corresponding to the maximum eigenvalue of the matrix

(I − S_w^{−1} D_i^T S_i^{−1} D_i) S_w^{−1} S_b,

where D_i = [ψ_1, ..., ψ_i]^T and S_i = D_i S_w^{−1} D_i^T.

The orthogonality constraint in the feature space becomes ψ_i^T ψ_j = 0 for two different discriminant vectors ψ_i and ψ_j in F. Using the definition in (7) and the equivalence in Theorem 2.1, we obtain a constrained optimization problem for kernel orthogonal discriminant analysis (KODA):

Γ = arg max_{Γ^T K Γ = I_r} F_1(Γ), (32)

where F_1(Γ) is defined in (19) and K is the kernel matrix defined in Theorem 2.1.

Corollary 3.2. Let Z_r^T K Z_r = U Σ U^T be the eigendecomposition of Z_r^T K Z_r. Then Γ = Z_r U Σ^{−1/2} solves the optimization problem in (32).

Proof: In Theorem 2.2 we proved that, for any nonsingular matrix M, Γ = Z_r M maximizes the F_1 criterion. We are therefore free to choose M so that the constraint in (32) is satisfied. With M = U Σ^{−1/2},

M^T Z_r^T K Z_r M = Σ^{−1/2} U^T U Σ U^T U Σ^{−1/2} = I_r.

This completes the proof.

The pseudo-code for KODA is given in Algorithm 2.

Algorithm 2 (KODA):
Input: data matrix X, test data y.
Output: reduced representation of y.
1. Compute the matrix Z_r as in steps 1–6 of Algorithm 1.
2. Compute the eigendecomposition of Z_r^T K Z_r as Z_r^T K Z_r = U Σ U^T.
3. Γ ← Z_r U Σ^{−1/2}.
4. Reduced representation of y: Γ^T [κ(x_1, y), ..., κ(x_n, y)]^T.
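In code, Algorithm 2 adds only an r × r eigendecomposition on top of the KUDA computation. A sketch in the same style, under the same assumptions and reusing the helpers above:

```python
import numpy as np

def koda_transform(K, y, tol=1e-10):
    """KODA coefficient matrix Gamma = Z_r U Sigma^{-1/2} (Corollary 3.2)."""
    K_B, K_W, K_T = kernel_scatter_factors(K, y)
    Z, r = simultaneous_diagonalizer(K_B, K_T, tol)
    Z_r = Z[:, :r]
    # Eigendecomposition of the small r x r matrix Z_r^T K Z_r = U Sigma U^T
    w, U = np.linalg.eigh(Z_r.T @ K @ Z_r)
    # Scale the columns of Z_r U by Sigma^{-1/2} so that Gamma^T K Gamma = I_r
    return (Z_r @ U) / np.sqrt(np.maximum(w, tol))
```

Since r is at most k − 1, the extra eigendecomposition is negligible compared with the SVDs of the n × n kernel quantities, which is why KODA comes at little extra cost once the KUDA solution is available.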

4. Experimental Comparisons

In this section, we conduct experiments to demonstrate the effectiveness of the proposed KUDA and KODA algorithms in comparison with other kernel-based nonlinear discriminant analysis algorithms as well as kernel PCA [14]. In all experiments, for simplicity, we apply the nearest-neighbor classifier in the reduced-dimensional space [1].

4.1. UCI datasets

In this experiment, we test the performance of the proposed algorithms on data with relatively low dimensionality. The data sets were collected from the UCI machine learning repository (mlearn/mlrepository.html). The statistics of the data sets are summarized in Table 1. We randomly split the data into equally sized training and test sets, and cross-validation on the training data is used to tune the parameter σ of the Gaussian RBF kernel κ(x, y) = exp(−‖x − y‖² / (2σ)). The random split was repeated 30 times.

Table 1. Description of the UCI data sets (number of classes, attributes, and data points for Vehicle, Vowel, Car, and Mfeature).

A commonly used approach to deal with the singularity problem in the generalized eigenvalue problem (17) is regularization, in which a positive diagonal matrix ξI is added to K_T K_T^T to make it nonsingular. Then the eigenvalue problem

(K_T K_T^T + ξI)^{−1} K_B K_B^T γ = λ γ (33)

is solved. We call this approach kernel regularized discriminant analysis (KRDA). A similar approach for the two-class classification problem was proposed in [10]. The performance of KRDA is compared with that of KUDA and KODA in our experiments. To make the comparison fair, we choose five different values of the regularization parameter ξ and report the prediction accuracies in Table 2. The prediction accuracies obtained by LDA and KPCA are also reported. In KPCA, for simplicity, we set the reduced dimension to the number of data points minus the number of classes.

Table 2. Prediction accuracy on the UCI data sets (Vehicle, Vowel, Car, Mfeature), averaged over 30 runs, for LDA, KPCA, KUDA, KODA, and KRDA(0.1), KRDA(0.5), KRDA(1), KRDA(3), KRDA(5); the number in parentheses for KRDA is the regularization value.

The experimental results in Table 2 show that both KUDA and KODA perform competitively relative to the other methods, with KUDA having a slight edge in terms of prediction accuracy. KRDA also gives reasonably good results. However, the optimal regularization parameter in KRDA usually needs to be determined experimentally, and this procedure can be expensive, whereas KUDA and KODA require no additional parameter optimization and can be computed efficiently.

4.2. Face image databases

To test the effectiveness of the proposed algorithms on undersampled problems, we use two benchmark face databases: PIX and ORL.

4.2.1. PIX database

PIX contains 300 face images of 30 persons. We subsample each image down to a size of 100 × 100 = 10000 pixels. We use two settings in our experiments. In test A, a subset with 2 images per individual was randomly sampled to form the training set, and the rest of the data forms the test set. In test B, the number of randomly sampled training images per individual was increased from 2 to 4. Again, a Gaussian kernel was used, and σ was set to 2 since it gives good overall performance. We repeat the random split 30 times and report the average prediction error rates. We compare KUDA and KODA with KRDA (using five different regularization parameter values) as well as with KPCA and the PCA plus LDA approach [1]. For PCA plus LDA, we report the best result over different PCA dimensions. The result for orthogonal linear discriminant analysis (OLDA) [17] is also reported.

Figures 1 and 2 show the relative performance of the algorithms compared on the PIX data set. The vertical axis represents the error rate, while the horizontal axis represents the different algorithms; the name of each algorithm is listed on top of its bar, and for KRDA the number in parentheses is the regularization parameter value. We can see that in both tests, KRDA with parameter 0.5 achieves the lowest prediction error rate. KUDA outperforms KODA and is very close to the best KRDA classifier. Note that there is no extra parameter to tune in KUDA and KODA, while it may be expensive to find the optimal regularization parameter for KRDA.

Figure 1. Error rate of KUDA and KODA vs. other methods on the PIX dataset: Test A.
Figure 2. Error rate of KUDA and KODA vs. other methods on the PIX dataset: Test B.

4.2.2. ORL database

ORL is a well-known dataset for face recognition [13]. It contains the face images of 40 persons, for a total of 400 images. The image size is 92 × 112. The face images are perfectly centralized. The major challenge on this dataset is the variation in face pose and facial expressions. We use the whole image as an instance. We use the same experimental protocol as in Section 4.2.1: the data set is randomly partitioned into training and test sets, with 2 images per individual for training in test A and 4 images per individual for training in test B. The results in Figures 3 and 4 show the superiority of the proposed KUDA and KODA algorithms over all the other algorithms compared. Surprisingly, KODA achieves a lower prediction error rate than KUDA, which is not the case in our previous experiments. It is also interesting to see that KPCA also performs very well and outperforms the PCA plus LDA method by a large margin.

Figure 3. Error rate of KUDA and KODA vs. other methods on the ORL dataset: Test A.
Figure 4. Error rate of KUDA and KODA vs. other methods on the ORL dataset: Test B.
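To make the overall procedure concrete, the following sketch estimates the error rate for one random split using KUDA (or KODA) and a 1-nearest-neighbour rule in the reduced space, as in the experiments above. It reuses the helpers from the earlier sketches; the 50/50 split, the fixed σ and the seed handling are illustrative placeholders rather than the exact experimental settings.

```python
import numpy as np

def error_rate_one_split(X, labels, transform=kuda_transform, sigma=2.0, seed=0):
    """One random 50/50 split + nearest-neighbour error rate in the reduced space."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    tr, te = idx[: len(labels) // 2], idx[len(labels) // 2:]

    K = rbf_kernel_matrix(X[tr], sigma=sigma)            # training Gram matrix
    Gamma = transform(K, labels[tr])                     # KUDA or KODA coefficients

    # Rows are the reduced representations Gamma^T [kappa(x_i, .)] of Eq. (20)
    Z_tr = rbf_kernel_matrix(X[tr], X[tr], sigma).T @ Gamma
    Z_te = rbf_kernel_matrix(X[tr], X[te], sigma).T @ Gamma

    # 1-nearest-neighbour classification in the reduced space
    d2 = ((Z_te[:, None, :] - Z_tr[None, :, :]) ** 2).sum(axis=-1)
    pred = labels[tr][d2.argmin(axis=1)]
    return float((pred != labels[te]).mean())
```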

5. Conclusions

In this paper, we proposed a unified approach to kernel uncorrelated and orthogonal discriminant analysis. The derived KUDA and KODA algorithms solve the undersampled problem in the feature space using a new optimization criterion. KUDA has the property that the features in the reduced space are uncorrelated, while KODA has the property that the discriminant vectors obtained in the feature space are orthogonal to each other. The proposed KUDA and KODA algorithms are efficient. Experimental comparisons on a variety of real-world data sets and face image data sets indicate that KUDA and KODA are very competitive relative to existing nonlinear dimension reduction algorithms in terms of generalization performance.

References

[1] P. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. In ECCV, pages 45-58, 1996.
[2] L. Duchene and S. Leclerq. An optimal transformation for discriminant and principal component analysis. IEEE TPAMI, 10(6), 1988.
[3] R. Duda, P. Hart, and D. Stork. Pattern Classification. Wiley, 2000.
[4] J. H. Friedman. Regularized discriminant analysis. JASA, 84(405), 1989.
[5] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, California, USA, 1990.
[6] Z. Jin, J.-Y. Yang, Z.-S. Hu, and Z. Lou. Face recognition based on the uncorrelated discriminant transformation. Pattern Recognition, 34, 2001.
[7] J. Yang, A. Frangi, J. Yang, and Z. Jin. KPCA plus LDA: A complete kernel Fisher discriminant framework for feature extraction and recognition. IEEE TPAMI, 27(2):230-244, 2005.
[8] W. Krzanowski, P. Jonathan, W. McCarthy, and M. Thomas. Discriminant analysis with singular covariance matrices: methods and applications to spectroscopic data. Applied Statistics, 44:101-115, 1995.
[9] Z. Liang and P. Shi. Uncorrelated discriminant vectors using a kernel method. Pattern Recognition, 38:307-310, 2005.
[10] S. Mika, G. Ratsch, J. Weston, B. Schölkopf, and K.-R. Müller. Fisher discriminant analysis with kernels. In IEEE Neural Networks for Signal Processing Workshop, pages 41-48, 1999.
[11] C. Park and H. Park. Nonlinear discriminant analysis using kernel functions and the generalized singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 27(1):87-102, 2005.
[12] S. Raudys and R. Duin. On expected classification error of the Fisher linear classifier with pseudo-inverse covariance matrix. Pattern Recognition Letters, 19:385-392, 1998.
[13] F. Samaria and A. Harter. Parameterisation of a stochastic model for human face identification. In Proceedings of the 2nd IEEE Workshop on Applications of Computer Vision, 1994.
[14] B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299-1319, 1998.
[15] V. Vapnik. The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[16] T. Xiong, J. Ye, Q. Li, R. Janardan, and V. Cherkassky. Efficient kernel discriminant analysis via QR decomposition. In NIPS, 2004.
[17] J. Ye. Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems. Journal of Machine Learning Research, 6:483-502, 2005.
[18] W. Zheng, L. Zhao, and C. Zou. Foley-Sammon optimal discriminant vectors using kernel approach. IEEE Transactions on Neural Networks, 16(1):1-9, 2005.
